Semi-supervised Cardiac Image Segmentation via Label Propagation and Style Transfer
Yao Zhang, Jiawei Yang, Feng Hou, Yang Liu, Yixin Wang, Jiang Tian, Cheng Zhong, Yang Zhang, Zhiqiang He
Abstract.
Accurate segmentation of cardiac structures can assist doctors in diagnosing diseases and improving treatment planning, which is highly demanded in clinical practice. However, the shortage of annotations and the variance of data among different vendors and medical centers restrict the performance of advanced deep learning methods. In this work, we present a fully automatic method to segment cardiac structures, including the left ventricle (LV) and right ventricle (RV) blood pools as well as the left ventricular myocardium (MYO), in MRI volumes. Specifically, we design a semi-supervised learning method that leverages unlabelled MRI sequence timeframes by label propagation. We then exploit style transfer to reduce the variance among different centers and vendors for more robust cardiac image segmentation. We evaluate our method in the M&Ms challenge, ranking 2nd place among 14 competitive teams.

Cardiac disease is one of the leading threats to human health and causes massive numbers of deaths every year. In the clinical routine, advanced medical imaging techniques (such as MRI, CT, and ultrasound) are used for the diagnosis of cardiac disease. Accurate segmentation of cardiac structures from medical images is an essential step to quantitatively evaluate cardiac function and improve therapy planning, which is highly demanded in clinical practice.

In recent years, deep learning models have been widely used for cardiac image segmentation and have achieved promising results [4]. However, these models can fail on unseen datasets acquired from distinct clinical centers or medical imaging scanners [8]. The M&Ms challenge is motivated to contribute to the effort of building generalizable models that can be applied consistently across clinical centers [2]. In this challenge, the cohort is composed of 350 patients with hypertrophic and dilated cardiomyopathies as well as healthy subjects. All subjects were scanned in clinical centers in three different countries (Spain, Germany and Canada) using four different magnetic resonance scanner vendors (Siemens, General Electric, Philips and Canon). The variance of data among multiple centers and vendors poses an extreme challenge to the generalization of machine/deep learning models.
Fig. 1.
The pipeline of our method.
The success of these deep learning models usually relies on large-scale manually annotated datasets, and it requires expensive resources to manually label the images. In contrast to 3D MRI or CT images of other organs, a cardiac MRI sequence has four dimensions (i.e., height, width, depth, and time), which leads to a much heavier annotation workload. In the M&Ms challenge, only the End of Systole (ES) and the End of Diastole (ED) timeframes of each cardiac MRI sequence are annotated. Plenty of images between ES and ED remain unlabelled, which limits the performance of supervised learning.

In this paper, we propose to leverage unlabelled MRI sequence timeframes to improve cardiac segmentation by label propagation. We then exploit style transfer to reduce the variance among different centers and vendors for more robust cardiac image segmentation.
In this section, we describe our method in detail. We employ a semi-supervised method to achieve effective cardiac image segmentation using the unlabelled timeframes of the MRI sequences. The pipeline of our method is illustrated in Fig. 1. In contrast to [3], we exploit label propagation to leverage the unlabelled cardiac sequences. Firstly, a set of pseudo labels for the unlabelled images between ES and ED is generated. Secondly, a 3D UNet [7] is trained on the data with both manual and pseudo labels. Finally, as the pseudo labels are not as accurate as the manual ones, the trained UNet is fine-tuned on the manually labelled data. Furthermore, in order to reduce the gap between the data from different vendors, we augment the training data through style transfer to improve the generalization of semi-supervised learning.
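The two-stage schedule (train on manual plus pseudo labels, then fine-tune on manual labels only) can be sketched as follows. This is not the authors' code: the actual pipeline is built on nnU-Net, and the toy 3D network, random tensors, and learning rates below are placeholder assumptions so that the snippet is self-contained.

```python
# Schematic sketch of the two-stage training schedule (assumed values throughout).
import torch
import torch.nn as nn

# Stand-in for the 3D U-Net; 4 output channels: background, LV, MYO, RV.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 4, kernel_size=1),
)
criterion = nn.CrossEntropyLoss()

def run_stage(batches, lr, epochs):
    """Run a simple training stage over (image, label) batches."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.99)
    for _ in range(epochs):
        for image, label in batches:
            opt.zero_grad()
            loss = criterion(model(image), label)
            loss.backward()
            opt.step()

# Random tensors standing in for (image, segmentation) pairs.
manual = [(torch.randn(1, 1, 16, 64, 64), torch.randint(0, 4, (1, 16, 64, 64)))]
pseudo = [(torch.randn(1, 1, 16, 64, 64), torch.randint(0, 4, (1, 16, 64, 64)))]

run_stage(manual + pseudo, lr=1e-2, epochs=2)  # stage 1: manual + pseudo labels
run_stage(manual, lr=1e-3, epochs=1)           # stage 2: fine-tune on manual labels only
```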
Fig. 2. The procedure of label propagation: (a) unlabeled time frame; (b) labeled template time frames; (c) warp transformation matrices, with the one of smaller norm selected (argmin over the norm); (d) generated label.
Here we augment our training set with unlabeled time frames by leveraging the insights of label propagation. Label propagation is a family of semi-supervised algorithms. It advocates the exploitation of similarity between labeled and unlabeled data, granting us the ability to assign labels to previously unlabeled data.

Fortunately, this task is a well-formulated problem for label propagation because of its subject-level time-series property. Firstly, the annotations of End of Systole (ES) and End of Diastole (ED) are good priors, as they capture the two most extreme scenarios and thus can cover the transition frames between ES and ED. Meanwhile, data from different time frames within a patient share an almost identical distribution, which alleviates propagation errors caused by inter-subject variability. Hence, with ES and ED frames as priors, label propagation algorithms can propagate their labels to non-ES and non-ED time frames in an intra-subject manner. By doing so, the propagated labels are better constrained with specificity compared with inter-subject propagation.

Specifically, we use a registration algorithm to propagate labels, as shown in Fig. 2. For each patient, the ES and ED frames are the template frames. Given a target time frame T, the two template frames are first registered to T, resulting in two warp transforms, ES_warp and ED_warp. Naturally, we want the registered label to be smooth and not to deviate too much from the template. We therefore compare the norms of ES_warp and ED_warp and choose the one with the smaller norm as T_warp. Finally, the generated label is obtained by applying T_warp to the corresponding template label.
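As an illustration, a minimal sketch of this propagation for a single target frame is given below using the ANTsPy bindings to ANTs. The file names, the plain "SyN" transform choice, and the use of the dense displacement-field norm are illustrative assumptions; the paper only states that ANTs is used with default parameters and a rigid/affine/SyN registration.

```python
# Illustrative sketch of label propagation for one unlabelled time frame T.
# File names are placeholders; transform settings are assumptions (see above).
import ants
import nibabel as nib
import numpy as np

target = ants.image_read("patient001_frame05.nii.gz")  # unlabelled target frame T
templates = {
    "ED": ("patient001_ED.nii.gz", "patient001_ED_gt.nii.gz"),
    "ES": ("patient001_ES.nii.gz", "patient001_ES_gt.nii.gz"),
}

best_norm, best_transforms, best_label = np.inf, None, None
for name, (image_path, label_path) in templates.items():
    template = ants.image_read(image_path)
    reg = ants.registration(fixed=target, moving=template, type_of_transform="SyN")
    # The first forward transform of a SyN registration is the dense warp field;
    # its norm measures how strongly the template is deformed to reach the target.
    warp = nib.load(reg["fwdtransforms"][0]).get_fdata()
    norm = np.linalg.norm(warp)
    if norm < best_norm:  # keep the smoother of the ED/ES warps (argmin of the norm)
        best_norm, best_transforms = norm, reg["fwdtransforms"]
        best_label = ants.image_read(label_path)

# Warp the chosen template label to the target frame; nearest-neighbour
# interpolation keeps the label values discrete.
pseudo_label = ants.apply_transforms(fixed=target, moving=best_label,
                                     transformlist=best_transforms,
                                     interpolator="nearestNeighbor")
ants.image_write(pseudo_label, "patient001_frame05_pseudo.nii.gz")
```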
Fig. 3.
Histogram distributions of MRI from different clinical centers. Center 1, center 2, and center 3 are marked in blue, orange, and green, respectively. Note that the MRI scanners used in center 2 and center 3 are from the same vendor, while that in center 1 is from a different vendor.
Instead of involving a complicated deep learning network for style transfer [5], we utilize a simple yet effective method, histogram matching. Histogram matching is a process in which a time series, image, or higher-dimensional scalar data is modified such that its histogram matches that of a reference dataset. A common application is to match the images from two sensors with slightly different responses, or from a sensor whose response changes over time. We analyze the histogram distributions between different vendors and medical centers, and find that the histogram difference between vendors is much more remarkable, while that between centers is insignificant (see Fig. 3). Herein, we apply histogram matching to MRI images from different vendors.

The procedure is as follows. The cumulative histogram is computed for the data of each vendor. Any particular value x_i in the data to be adjusted has a cumulative histogram value S(x_i); the value x_j in the reference dataset with the same cumulative distribution value, i.e., T(x_j) = S(x_i), is found, and the input value x_i is replaced by x_j. Specifically, we sample 100 volumes from the training set and randomly select a slice from each volume to form the unified reference data. All other data are then matched to this reference. As large-scale datasets benefit deep learning methods, we also augment the training data by transferring the data from one vendor to another (i.e., from vendor A to vendor B or vice versa).
Fig. 4.
Visual examples of accurate segmentation. From left to right, each column shows the timeframe, the ground truth, the prediction of the proposed approach, and the difference between the ground truth and the prediction. LV, MYO, and RV are marked in red, green, and blue, respectively.

Fig. 5.
Visual examples of inaccurate segmentation. From left to right, each column shows the timeframe, the ground truth, the prediction of the proposed approach, and the difference between the ground truth and the prediction. LV, MYO, and RV are marked in red, green, and blue, respectively.

The challenge cohort is composed of 350 patients with hypertrophic and dilated cardiomyopathies as well as healthy subjects. All subjects were scanned in clinical centers in three different countries (Spain, Germany and Canada) using four different magnetic resonance scanner vendors (Siemens, General Electric, Philips and Canon). The training set contains 150 annotated images from two different MRI vendors (75 each) and 25 unannotated images from a third vendor. The CMR images have been segmented by experienced clinicians from the respective institutions, including contours for the LV and RV blood pools as well as the left ventricular MYO. The 200 test cases correspond to 50 new studies from each of the vendors provided in the training set and 50 additional studies from a fourth unseen vendor, which is used to test model generalizability. 20% of these datasets are used for validation and the rest is reserved for testing and ranking participants.
Our method is built upon nnUNet [6], a powerful implementation and out-of-the-box tool for medical image segmentation. We use ANTs [1] as the implementation of the label propagation algorithm with its default parameters. The registration uses a three-stage transform, including rigid, affine, and deformable SyN transforms. For histogram matching, we utilize scikit-image [9].
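As a minimal sketch of the histogram matching step, the snippet below uses skimage.exposure.match_histograms. The reference built from stacked slices and the random arrays standing in for real MRI data are assumptions for illustration; the paper forms its reference from one randomly chosen slice of each of 100 training volumes.

```python
# Minimal sketch of histogram matching with scikit-image (assumed stand-in data).
import numpy as np
from skimage.exposure import match_histograms

rng = np.random.default_rng(0)

# Stand-in for the unified reference: a stack of slices sampled from the training set.
reference = np.stack([rng.normal(100.0, 20.0, size=(64, 64)) for _ in range(100)])
# Stand-in for a cardiac MR volume from another vendor with a different intensity profile.
volume = rng.normal(300.0, 80.0, size=(10, 64, 64))

# Replace each value x_i in `volume` by the value x_j whose cumulative frequency in
# `reference` equals the cumulative frequency of x_i in `volume`, i.e. T(x_j) = S(x_i).
matched = match_histograms(volume, reference)
```

The same operation serves both purposes described above: unifying intensity styles across vendors and augmenting the training data by matching images from one vendor to the style of another.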
We first evaluate our method on the training set of 150 images, where 120 images are used for training and the rest for validation. The Dice similarity coefficient is employed for evaluation.
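For completeness, a generic per-structure Dice computation (not the challenge's official evaluation code) looks like the following; the toy label maps are assumptions for illustration.

```python
# Generic Dice similarity per structure (0 = background, 1 = LV, 2 = MYO, 3 = RV).
import numpy as np

def dice(pred, gt, label):
    """Dice similarity coefficient for one structure."""
    p, g = (pred == label), (gt == label)
    denom = p.sum() + g.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom

# Toy example label maps.
pred = np.array([[0, 1, 1], [2, 2, 3], [0, 3, 3]])
gt   = np.array([[0, 1, 1], [2, 3, 3], [0, 3, 3]])
print([round(dice(pred, gt, c), 3) for c in (1, 2, 3)])  # Dice for LV, MYO, RV
```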
Table 1. Results on the training set

Method    |            Dice Similarity [%]
          |   LV   |  MYO   |   RV   | Average
Baseline  |  92.88 |  86.48 |  87.78 |  89.05
LP        |  92.43 |  86.73 |  89.59 |  89.58
LP+HM     |  92.46 |  87.00 |  90.94 |  90.13
Table 1 collects the results. The baseline is the fully supervised method (i.e., the 3D UNet) trained solely on the end-systole and end-diastole timeframes for which manual segmentations are available. “LP” and “HM” denote the proposed semi-supervised method with label propagation and with histogram matching added, respectively. It is observed that label propagation exceeds the baseline by 0.53% and histogram matching further adds 0.55% in terms of Dice per case.

We then validate our method on the validation set, which is held out by the organizers. The docker image with our method is submitted to the organizers to get the results.
Table 2. Results on the validation set (mean ± std)

Method    |       Dice Similarity [%]     |    Hausdorff Distance [mm]
          |    LV    |   MYO   |    RV    |   LV    |   MYO   |   RV
Baseline  |  91.15 ± |    ±    |    ±     |    ±    |    ±    |    ±
Proposed  |     ±    |    ±    |    ±     |    ±    |    ±    |    ±

As shown in Table 2, our method consistently outperforms the baseline on LV, MYO, and RV. Specifically, it improves the Dice per case by 0.75% and reduces the Hausdorff Distance by 0.82 mm on average over the three targets. The results on different vendors are collected in Table 3.
Table 3. Results of different vendors on the validation set

Method    | Vendor |         Dice [%]         |          HD [mm]
          |        |   LV   |  MYO   |   RV   |   LV   |  MYO   |   RV
Baseline  |   A    |  91.70 |  86.31 |  87.46 |  10.05 |  11.28 |  14.21
          |   B    |  94.43 |  88.79 |  92.90 |   7.31 |  11.16 |   8.55
          |   C    |  88.05 |  86.02 |  86.26 |   9.25 |  12.07 |  13.16
          |   D    |  90.39 |  82.74 |  86.34 |   8.73 |  18.06 |  14.99
Proposed  |   A    |  91.80 |  86.53 |  87.95 |   9.26 |  11.47 |  13.58
          |   B    |  94.64 |  89.07 |  93.78 |   6.50 |  10.25 |   7.39
          |   C    |  89.58 |  87.20 |  88.12 |   8.83 |  12.35 |  11.32
          |   D    |  91.00 |  83.54 |  87.44 |   7.80 |  16.57 |  13.26
Table 4. Results on the test set

Method            |    Dice Similarity [%]
                  |   LV   |  MYO   |   RV
Peter M. Full     |  91.0  |  84.9  |  88.4
Yao Zhang (Ours)  |  90.6  |  84.0  |  87.8
Jun Ma            |  90.2  |  83.5  |  87.4
Mario Parreño     |  91.2  |  83.8  |  85.3
Fanwei Kong       |  90.2  |  82.8  |  85.7
We also submit our method as a docker image to the organizers for the online test. Please note that neither post-processing nor an ensemble strategy is employed in our evaluation. Table 4 shows the top 5 teams; our method ranks 2nd place, demonstrating the effectiveness of our method for cardiac segmentation from multiple vendors and centers.
Table 5. Detailed results of our method on the 4 vendors of the test set (mean ± std)

Vendor |       Dice Similarity [%]     |    Hausdorff Distance [mm]
       |    LV    |   MYO   |    RV    |   LV    |   MYO   |   RV
  A    |  91.87 ± |    ±    |    ±     |    ±    |    ±    |    ±
  B    |     ±    |    ±    |    ±     |    ±    |    ±    |    ±
  C    |     ±    |    ±    |    ±     |    ±    |    ±    |    ±
  D    |     ±    |    ±    |    ±     |    ±    |    ±    |    ±

Table 5 presents the detailed results of our method on the images from the 4 different vendors. It is observed that the proposed method obtains consistently promising results on both seen and unseen vendors. Fig. 4 and Fig. 5 show some accurate and inaccurate predictions generated by our method.

In this paper, we design and develop a semi-supervised method for cardiac image segmentation from multiple vendors and medical centers. We exploit label propagation and iterative refinement to leverage unlabelled data in a semi-supervised manner. We further reduce the distribution gap between MRI images from different vendors and centers by histogram matching. The results show that our framework is able to achieve superior performance for robust LV, MYO, and RV segmentation. The proposed method ranks 2nd place among 14 competitive teams in the M&Ms challenge.
References
1. Avants, B.B., Tustison, N., Song, G.: Advanced normalization tools (ANTs). Insight Journal (2009)
2. Campello, V.M., Palomares, J.F.R., Guala, A., Marakas, M., Friedrich, M., Lekadir, K.: Multi-Centre, Multi-Vendor & Multi-Disease Cardiac Image Segmentation Challenge (Mar 2020). https://doi.org/10.5281/zenodo.3715890
3. Chen, C., Ouyang, C., Tarroni, G., Schlemper, J., Qiu, H., Bai, W., Rueckert, D.: Unsupervised multi-modal style transfer for cardiac MR segmentation. arXiv preprint arXiv:1908.07344, pp. 209–219 (2019)
4. Chen, C., Qin, C., Qiu, H., Tarroni, G., Duan, J., Bai, W., Rueckert, D.: Deep learning for cardiac image segmentation: A review. Frontiers in Cardiovascular Medicine, 25 (2020)
5. Chen, L.C., Lopes, R.G., Cheng, B., Collins, M.D., Cubuk, E.D., Zoph, B., Adam, H., Shlens, J.: Leveraging semi-supervised learning in video sequences for urban scene segmentation. arXiv preprint arXiv:2005.10266 (2020)
6. Isensee, F., Petersen, J., Kohl, S.A.A., Jäger, P.F., Maier-Hein, K.H.: nnU-Net: Breaking the spell on successful medical image segmentation. arXiv preprint arXiv:1904.08128 (2019)
7. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
8. Tao, Q., Yan, W., Wang, Y., Paiman, E.H., Shamonin, D.P., Garg, P., Plein, S., Huang, L., Xia, L., Sramko, M., et al.: Deep learning-based method for fully automatic quantification of left ventricle function from cine MR images: a multivendor, multicenter study. Radiology (1), 81–88 (2019)
9. van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T.: scikit-image: Image processing in Python. PeerJ 2