Semi-supervised Cardiac Image Segmentation via Label Propagation and Style Transfer
Yao Zhang, Jiawei Yang, Feng Hou, Yang Liu, Yixin Wang, Jiang Tian, Cheng Zhong, Yang Zhang, Zhiqiang He
Abstract.
Accurate segmentation of cardiac structures can assist doctors in diagnosing diseases and improving treatment planning, which is highly demanded in clinical practice. However, the shortage of annotations and the variance of data among different vendors and medical centers restrict the performance of advanced deep learning methods. In this work, we present a fully automatic method to segment cardiac structures, including the left ventricle (LV) and right ventricle (RV) blood pools as well as the left ventricular myocardium (MYO), in MRI volumes. Specifically, we design a semi-supervised learning method that leverages unlabelled MRI sequence timeframes by label propagation. We then exploit style transfer to reduce the variance among different centers and vendors for more robust cardiac image segmentation. We evaluate our method in the M&Ms challenge, ranking 2nd place among 14 competitive teams.

Cardiac disease is one of the leading threats to human health and causes massive numbers of deaths every year. In the clinical routine, advanced medical imaging techniques (such as MRI, CT, and ultrasound) are used for the diagnosis of cardiac disease. Accurate segmentation of cardiac structures from medical images is an essential step to quantitatively evaluate cardiac function and improve therapy planning, which is highly demanded in clinical practice.

In recent years, deep learning models have been widely used for cardiac image segmentation and have achieved promising results [4]. However, these models can fail on unseen datasets acquired from distinct clinical centers or medical imaging scanners [8]. The M&Ms challenge is motivated to contribute to the effort of building generalizable models that can be applied consistently across clinical centers [2]. In this challenge, the cohort is composed of 350 patients with hypertrophic and dilated cardiomyopathies as well as healthy subjects. All subjects were scanned in clinical centers in three different countries (Spain, Germany and Canada) using four different magnetic resonance scanner vendors (Siemens, General Electric, Philips and Canon). The variance of data among multiple centers and vendors poses an extreme challenge to the generalization of machine/deep learning models.
Fig. 1.
The pipeline of our method.
The success of these deep learning models usually relies on large-scale manually annotated datasets, and it requires expensive resources to manually label the images. In contrast to 3D MRI or CT images of other organs, a cardiac MRI sequence has four dimensions (i.e., height, width, depth, and time), which leads to a much heavier annotation workload. In the M&Ms challenge, only the End of Systole (ES) and the End of Diastole (ED) timeframes of each cardiac MRI sequence are annotated. Plenty of images between ES and ED remain unlabelled, which limits the performance of supervised learning.

In this paper, we propose to leverage unlabelled MRI sequence timeframes to improve cardiac segmentation by label propagation. We then exploit style transfer to reduce the variance among different centers and vendors for more robust cardiac image segmentation.
In this section, we describe our method in detail. We employ a semi-supervised method to achieve effective cardiac image segmentation using the unlabelled timeframes of the MRI sequences. The pipeline of our method is illustrated in Fig. 1. In contrast to [3], we exploit label propagation to leverage the unlabelled cardiac sequences. Firstly, a set of pseudo labels for the unlabelled images between ES and ED is generated. Secondly, a 3D UNet [7] is trained on the data with both manual and pseudo labels. Finally, as the pseudo labels are not as accurate as the manual ones, the trained UNet is fine-tuned on the manually labelled data. Furthermore, in order to reduce the gap between the data from different vendors, we augment the training data through style transfer to improve the generalization of semi-supervised learning.
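The two-stage schedule (train on manual plus pseudo labels, then fine-tune on manual labels only) can be sketched as follows. This is not the authors' code: the actual pipeline is built on nnU-Net, and the toy 3D network, random tensors, and learning rates below are placeholder assumptions so that the snippet is self-contained.

```python
# Schematic sketch of the two-stage training schedule (assumed values throughout).
import torch
import torch.nn as nn

# Stand-in for the 3D U-Net; 4 output channels: background, LV, MYO, RV.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 4, kernel_size=1),
)
criterion = nn.CrossEntropyLoss()

def run_stage(batches, lr, epochs):
    """Run a simple training stage over (image, label) batches."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.99)
    for _ in range(epochs):
        for image, label in batches:
            opt.zero_grad()
            loss = criterion(model(image), label)
            loss.backward()
            opt.step()

# Random tensors standing in for (image, segmentation) pairs.
manual = [(torch.randn(1, 1, 16, 64, 64), torch.randint(0, 4, (1, 16, 64, 64)))]
pseudo = [(torch.randn(1, 1, 16, 64, 64), torch.randint(0, 4, (1, 16, 64, 64)))]

run_stage(manual + pseudo, lr=1e-2, epochs=2)  # stage 1: manual + pseudo labels
run_stage(manual, lr=1e-3, epochs=1)           # stage 2: fine-tune on manual labels only
```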
Fig. 2. The procedure of label propagation: (a) unlabeled time frame; (b) labeled template time frames; (c) warp transformation matrices, with the one of smaller norm selected (argmin over the norm); (d) generated label.
Here we augment our training set with unlabeled time frames by leveraging the insights of label propagation. Label propagation is a family of semi-supervised algorithms. It advocates the exploitation of similarity between labeled and unlabeled data, granting us the ability to assign labels to previously unlabeled data.

Fortunately, this task is a well-formulated problem for label propagation because of its subject-level time-series property. Firstly, the annotations of End of Systole (ES) and End of Diastole (ED) are good priors, as they capture the two most extreme scenarios and thus can cover the transition frames between ES and ED. Meanwhile, data from different time frames within a patient share an almost identical distribution, which alleviates propagation errors caused by inter-subject variability. Hence, with ES and ED frames as priors, label propagation algorithms can propagate their labels to non-ES and non-ED time frames in an intra-subject manner. By doing so, the propagated labels are better constrained with specificity compared with inter-subject propagation.

Specifically, we use a registration algorithm to propagate labels, as shown in Fig. 2. For each patient, the ES and ED frames are the template frames. Given a target time frame T, the two template frames are first registered to T, resulting in two warp transforms, ES_warp and ED_warp. Naturally, we want the registered label to be smooth and not to deviate too much from the template. We therefore compare the norms of ES_warp and ED_warp and choose the one with the smaller norm as T_warp. Finally, the generated label is obtained by applying T_warp to the corresponding template label.
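As an illustration, a minimal sketch of this propagation for a single target frame is given below using the ANTsPy bindings to ANTs. The file names, the plain "SyN" transform choice, and the use of the dense displacement-field norm are illustrative assumptions; the paper only states that ANTs is used with default parameters and a rigid/affine/SyN registration.

```python
# Illustrative sketch of label propagation for one unlabelled time frame T.
# File names are placeholders; transform settings are assumptions (see above).
import ants
import nibabel as nib
import numpy as np

target = ants.image_read("patient001_frame05.nii.gz")  # unlabelled target frame T
templates = {
    "ED": ("patient001_ED.nii.gz", "patient001_ED_gt.nii.gz"),
    "ES": ("patient001_ES.nii.gz", "patient001_ES_gt.nii.gz"),
}

best_norm, best_transforms, best_label = np.inf, None, None
for name, (image_path, label_path) in templates.items():
    template = ants.image_read(image_path)
    reg = ants.registration(fixed=target, moving=template, type_of_transform="SyN")
    # The first forward transform of a SyN registration is the dense warp field;
    # its norm measures how strongly the template is deformed to reach the target.
    warp = nib.load(reg["fwdtransforms"][0]).get_fdata()
    norm = np.linalg.norm(warp)
    if norm < best_norm:  # keep the smoother of the ED/ES warps (argmin of the norm)
        best_norm, best_transforms = norm, reg["fwdtransforms"]
        best_label = ants.image_read(label_path)

# Warp the chosen template label to the target frame; nearest-neighbour
# interpolation keeps the label values discrete.
pseudo_label = ants.apply_transforms(fixed=target, moving=best_label,
                                     transformlist=best_transforms,
                                     interpolator="nearestNeighbor")
ants.image_write(pseudo_label, "patient001_frame05_pseudo.nii.gz")
```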
Fig. 3.
Histogram distributions of MRI from different clinical centers. Center 1, center 2, and center 3 are marked in blue, orange, and green, respectively. Note that the MRI scanners used in center 2 and center 3 are from the same vendor, while that in center 1 is from a different vendor.
Instead of involving a complicated deep learning network for style transfer [5], we utilize a simple yet effective method, histogram matching. Histogram matching is a process in which a time series, image, or higher-dimensional scalar data is modified such that its histogram matches that of a reference dataset. A common application is to match the images from two sensors with slightly different responses, or from a sensor whose response changes over time. We analyze the histogram distributions between different vendors and medical centers, and find that the histogram difference between vendors is much more remarkable, while that between centers is insignificant (see Fig. 3). Herein, we apply histogram matching to MRI images from different vendors.

The procedure is as follows. The cumulative histogram is computed for the data of each vendor. Any particular value x_i in the data to be adjusted has a cumulative histogram value S(x_i); the value x_j in the reference dataset with the same cumulative distribution value, i.e., T(x_j) = S(x_i), is found, and the input value x_i is replaced by x_j. Specifically, we sample 100 volumes from the training set and randomly select a slice from each volume to form the unified reference data. All other data are then matched to this reference. As large-scale datasets benefit deep learning methods, we also augment the training data by transferring the data from one vendor to another (i.e., from vendor A to vendor B or vice versa).
Fig. 4.
Visual examples of accurate segmentation. From left to right, each column shows the timeframe, the ground truth, the prediction of the proposed approach, and the difference between the ground truth and the prediction. LV, MYO, and RV are marked in red, green, and blue, respectively.

Fig. 5.
Visual examples of inaccurate segmentation. From left to right, each column shows the timeframe, the ground truth, the prediction of the proposed approach, and the difference between the ground truth and the prediction. LV, MYO, and RV are marked in red, green, and blue, respectively.

The challenge cohort is composed of 350 patients with hypertrophic and dilated cardiomyopathies as well as healthy subjects. All subjects were scanned in clinical centers in three different countries (Spain, Germany and Canada) using four different magnetic resonance scanner vendors (Siemens, General Electric, Philips and Canon). The training set contains 150 annotated images from two different MRI vendors (75 each) and 25 unannotated images from a third vendor. The CMR images have been segmented by experienced clinicians from the respective institutions, including contours for the LV and RV blood pools as well as the left ventricular MYO. The 200 test cases correspond to 50 new studies from each of the vendors provided in the training set and 50 additional studies from a fourth unseen vendor, which is used to test model generalizability. 20% of these datasets are used for validation and the rest is reserved for testing and ranking participants.
Our method is built upon nnUNet [6], a powerful implementation and out-of-the-box tool for medical image segmentation. We use ANTs [1] as the implementation of the label propagation algorithm with its default parameters. The registration uses a three-stage transform, including rigid, affine, and deformable SyN transforms. For histogram matching, we utilize scikit-image [9].
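As a minimal sketch of the histogram matching step, the snippet below uses skimage.exposure.match_histograms. The reference built from stacked slices and the random arrays standing in for real MRI data are assumptions for illustration; the paper forms its reference from one randomly chosen slice of each of 100 training volumes.

```python
# Minimal sketch of histogram matching with scikit-image (assumed stand-in data).
import numpy as np
from skimage.exposure import match_histograms

rng = np.random.default_rng(0)

# Stand-in for the unified reference: a stack of slices sampled from the training set.
reference = np.stack([rng.normal(100.0, 20.0, size=(64, 64)) for _ in range(100)])
# Stand-in for a cardiac MR volume from another vendor with a different intensity profile.
volume = rng.normal(300.0, 80.0, size=(10, 64, 64))

# Replace each value x_i in `volume` by the value x_j whose cumulative frequency in
# `reference` equals the cumulative frequency of x_i in `volume`, i.e. T(x_j) = S(x_i).
matched = match_histograms(volume, reference)
```

The same operation serves both purposes described above: unifying intensity styles across vendors and augmenting the training data by matching images from one vendor to the style of another.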
We first evaluate our method on the training set of 150 images, where 120 images are used for training and the rest for validation. The Dice similarity coefficient is employed for evaluation.
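For completeness, a generic per-structure Dice computation (not the challenge's official evaluation code) looks like the following; the toy label maps are assumptions for illustration.

```python
# Generic Dice similarity per structure (0 = background, 1 = LV, 2 = MYO, 3 = RV).
import numpy as np

def dice(pred, gt, label):
    """Dice similarity coefficient for one structure."""
    p, g = (pred == label), (gt == label)
    denom = p.sum() + g.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom

# Toy example label maps.
pred = np.array([[0, 1, 1], [2, 2, 3], [0, 3, 3]])
gt   = np.array([[0, 1, 1], [2, 3, 3], [0, 3, 3]])
print([round(dice(pred, gt, c), 3) for c in (1, 2, 3)])  # Dice for LV, MYO, RV
```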
Table 1. Results on the training set

Method    |            Dice Similarity [%]
          |   LV   |  MYO   |   RV   | Average
Baseline  |  92.88 |  86.48 |  87.78 |  89.05
LP        |  92.43 |  86.73 |  89.59 |  89.58
LP+HM     |  92.46 |  87.00 |  90.94 |  90.13
Table 1 collects the results. The baseline is the fully supervised method (i.e., the 3D UNet) trained solely on the end-systole and end-diastole timeframes for which manual segmentations are available. “LP” and “HM” denote the proposed semi-supervised method with label propagation and with histogram matching added, respectively. It is observed that label propagation exceeds the baseline by 0.53% and histogram matching further adds 0.55% in terms of Dice per case.

We then validate our method on the validation set, which is held out by the organizers. The docker image with our method is submitted to the organizers to get the results.
Table 2. Results on the validation set (mean ± std)

Method    |       Dice Similarity [%]     |    Hausdorff Distance [mm]
          |    LV    |   MYO   |    RV    |   LV    |   MYO   |   RV
Baseline  |  91.15 ± |    ±    |    ±     |    ±    |    ±    |    ±
Proposed  |     ±    |    ±    |    ±     |    ±    |    ±    |    ±

As shown in Table 2, our method consistently outperforms the baseline on LV, MYO, and RV. Specifically, it improves the Dice per case by 0.75% and reduces the Hausdorff Distance by 0.82 mm on average over the three targets. The results on different vendors are collected in Table 3.
Table 3. Results of different vendors on the validation set

Method    | Vendor |         Dice [%]         |          HD [mm]
          |        |   LV   |  MYO   |   RV   |   LV   |  MYO   |   RV
Baseline  |   A    |  91.70 |  86.31 |  87.46 |  10.05 |  11.28 |  14.21
          |   B    |  94.43 |  88.79 |  92.90 |   7.31 |  11.16 |   8.55
          |   C    |  88.05 |  86.02 |  86.26 |   9.25 |  12.07 |  13.16
          |   D    |  90.39 |  82.74 |  86.34 |   8.73 |  18.06 |  14.99
Proposed  |   A    |  91.80 |  86.53 |  87.95 |   9.26 |  11.47 |  13.58
          |   B    |  94.64 |  89.07 |  93.78 |   6.50 |  10.25 |   7.39
          |   C    |  89.58 |  87.20 |  88.12 |   8.83 |  12.35 |  11.32
          |   D    |  91.00 |  83.54 |  87.44 |   7.80 |  16.57 |  13.26
Table 4. Results on the test set

Method            |    Dice Similarity [%]
                  |   LV   |  MYO   |   RV
Peter M. Full     |  91.0  |  84.9  |  88.4
Yao Zhang (Ours)  |  90.6  |  84.0  |  87.8
Jun Ma            |  90.2  |  83.5  |  87.4
Mario Parreño     |  91.2  |  83.8  |  85.3
Fanwei Kong       |  90.2  |  82.8  |  85.7
We also submit our method as a docker image to the organizers for the online test. Please note that neither post-processing nor an ensemble strategy is employed in our evaluation. Table 4 shows the top 5 teams; our method ranks 2nd place, demonstrating the effectiveness of our method for cardiac segmentation from multiple vendors and centers.
Table 5. Detailed results of our method on the 4 vendors of the test set (mean ± std)

Vendor |       Dice Similarity [%]     |    Hausdorff Distance [mm]
       |    LV    |   MYO   |    RV    |   LV    |   MYO   |   RV
  A    |  91.87 ± |    ±    |    ±     |    ±    |    ±    |    ±
  B    |     ±    |    ±    |    ±     |    ±    |    ±    |    ±
  C    |     ±    |    ±    |    ±     |    ±    |    ±    |    ±
  D    |     ±    |    ±    |    ±     |    ±    |    ±    |    ±

Table 5 presents the detailed results of our method on the images from the 4 different vendors. It is observed that the proposed method obtains consistently promising results on both seen and unseen vendors. Fig. 4 and Fig. 5 show some accurate and inaccurate predictions generated by our method.

In this paper, we design and develop a semi-supervised method for cardiac image segmentation from multiple vendors and medical centers. We exploit label propagation and iterative refinement to leverage unlabelled data in a semi-supervised manner. We further reduce the distribution gap between MRI images from different vendors and centers by histogram matching. The results show that our framework is able to achieve superior performance for robust LV, MYO, and RV segmentation. The proposed method ranks 2nd place among 14 competitive teams in the M&Ms challenge.
References
1. Avants, B.B., Tustison, N., Song, G.: Advanced normalization tools (ANTs). Insight Journal (2009)
2. Campello, V.M., Palomares, J.F.R., Guala, A., Marakas, M., Friedrich, M., Lekadir, K.: Multi-Centre, Multi-Vendor & Multi-Disease Cardiac Image Segmentation Challenge (Mar 2020). https://doi.org/10.5281/zenodo.3715890
3. Chen, C., Ouyang, C., Tarroni, G., Schlemper, J., Qiu, H., Bai, W., Rueckert, D.: Unsupervised multi-modal style transfer for cardiac MR segmentation. arXiv preprint arXiv:1908.07344, pp. 209–219 (2019)
4. Chen, C., Qin, C., Qiu, H., Tarroni, G., Duan, J., Bai, W., Rueckert, D.: Deep learning for cardiac image segmentation: A review. Frontiers in Cardiovascular Medicine, 25 (2020)
5. Chen, L.C., Lopes, R.G., Cheng, B., Collins, M.D., Cubuk, E.D., Zoph, B., Adam, H., Shlens, J.: Leveraging semi-supervised learning in video sequences for urban scene segmentation. arXiv preprint arXiv:2005.10266 (2020)
6. Isensee, F., Petersen, J., Kohl, S.A.A., Jäger, P.F., Maier-Hein, K.H.: nnU-Net: Breaking the spell on successful medical image segmentation. arXiv preprint arXiv:1904.08128 (2019)
7. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
8. Tao, Q., Yan, W., Wang, Y., Paiman, E.H., Shamonin, D.P., Garg, P., Plein, S., Huang, L., Xia, L., Sramko, M., et al.: Deep learning-based method for fully automatic quantification of left ventricle function from cine MR images: a multivendor, multicenter study. Radiology (1), 81–88 (2019)
9. van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T.: scikit-image: Image processing in Python. PeerJ 2