Automatic Breast Lesion Classification by Joint Neural Analysis of Mammography and Ultrasound
Gavriel Habib, Nahum Kiryati, Miri Sklair-Levy, Anat Shalmon, Osnat Halshtok Neiman, Renata Faermann Weidenfeld, Yael Yagil, Eli Konen, Arnaldo Mayer
School of Electrical Engineering, Tel-Aviv University, Tel Aviv-Yafo, Israel
[email protected]
The Manuel and Raquel Klachky Chair of Image Processing, School of Electrical Engineering, Tel-Aviv University, Tel Aviv-Yafo, Israel
Diagnostic Imaging, Sheba Medical Center, affiliated to the Sackler School of Medicine, Tel-Aviv University, Israel
Abstract.
Mammography and ultrasound are extensively used by radiologists as complementary modalities to achieve better performance in breast cancer diagnosis. However, existing computer-aided diagnosis (CAD) systems for the breast are generally based on a single modality. In this work, we propose a deep-learning based method for classifying breast cancer lesions from their respective mammography and ultrasound images. We present various approaches and show a consistent improvement in performance when utilizing both modalities. The proposed approach is based on a GoogleNet architecture, fine-tuned for our data in two training steps. First, a distinct neural network is trained separately for each modality, generating high-level features. Then, the aggregated features originating from each modality are used to train a multimodal network to provide the final classification. In quantitative experiments, the proposed approach achieves an AUC of 0.94, outperforming state-of-the-art models trained over a single modality. Moreover, it performs similarly to an average radiologist, surpassing two out of four radiologists participating in a reader study. The promising results suggest that the proposed method may become a valuable decision support tool for breast radiologists.
Keywords:
Deep Learning · Mammography · Ultrasound.
Breast cancer is the second most common type of cancer among American women after skin cancer. According to American Cancer Society estimates, 268,600 invasive breast cancer cases were diagnosed in 2019, leading to 41,760 deaths. However, early detection may save lives, as it enables better treatment options.
Fig. 1.
Benign (top) and malignant (bottom) lesions from our dataset. Malignant lesions tend to have less well-defined boundaries in both mammography (left) and ultrasound (right) screenings.

Mammography-based screening is the most widely used approach for breast cancer detection, with proven mortality reduction and early disease treatment benefits [1]. However, it suffers from poor lesion visibility in dense breasts [2]. To improve sensitivity in dense breasts, contrast-enhanced spectral mammography (CESM) has been developed. CESM is based on the subtraction of low and high energy images, acquired following the injection of a contrast agent [3]. Although CESM reaches MRI levels of lesion visibility for dense breasts [31], the technique is still in the early adoption phase.

Ultrasound imaging has proven to be a valuable tool in dense breasts, increasing cancer detection sensitivity by 17% [4]. Nevertheless, breast ultrasound may miss solid tumors that are easily detected with mammography. Devolli-Disha et al. [5] showed that ultrasound had a higher sensitivity (69.2%) than mammography (15.4%) in women younger than 40 years, whereas mammography (78.7%) beats ultrasound (63.9%) in women older than 60 years. Owing to these complementary strengths and weaknesses, radiologists suggest using breast ultrasound as a complementary screening test to mammography [6,7].

Classification of breast lesions is a challenging task for the radiologist. Malignant and benign lesions can be differentiated by their shape, boundary and texture. For example, malignant lesions may have irregular and poorly defined boundaries as they have the ability to spread (see Figure 1). Nevertheless, in many cases radiologists cannot classify the lesion and the patient is referred for a biopsy, which is a stressful and expensive process. Given that 65%-85% of biopsies turn out to be benign [8], there is a clear need for tools that will help radiologists reduce benign biopsies.
In recent years, deep learning techniques have been providing significant improvements in various medical imaging tasks, such as tumor detection and classification, image denoising and registration. In the field of breast cancer classification, existing methods are based mainly on mammograms [9,17,18,19], ultrasound [10,20], MRI [21,22] or histopathology images [23].

To deal with the limited amount of data, Chougrad et al. [9] used transfer learning over ImageNet and achieved state-of-the-art results over public mammography datasets. Cheng et al. [10] performed a semi-supervised learning approach over a large breast ultrasound dataset with only a few annotated images. Wu et al. [19] synthesized mammogram lesions using a class-conditional GAN and used them as additional training data instead of basic augmentations.

Emphasizing the importance of lesion context, Wu et al. [17] trained a deep multi-view CNN over a large private mammogram dataset. They used a breast-level model to create heatmaps that represent suspected areas, and a patch-level model to locally predict the presence of malignant or benign findings. Shen et al. [18] combined coarse and fine details using an attention mechanism to select informative patches for classification.

Common breast imaging modalities were also combined with additional data from other domains. Byra et al. [11] used the Nakagami parameter maps created from breast ultrasound images to train a CNN from scratch. Perek et al. [12] integrated CESM images with features of BI-RADS [13], a textual radiological lexicon for breast lesions, as inputs to a classifier.

Most previous studies utilized only a single modality, while some combined different types of breast images. Hadad et al. [14] classified MRI breast lesions using fine tuning of a network pre-trained on mammography images instead of natural images. Regarding mammography with ultrasound, Cong et al. [15] separately trained three base classifiers (SVM, KNN and Naive Bayes) for each modality, integrated some of them by a selective ensemble method and obtained the final prediction by majority vote. Shaikh et al. [16] proposed a learning-using-privileged-information approach, i.e. utilizing both modalities for training but avoiding one at test time. These papers suggested the potential of cross-modal learning.

In this paper, we propose a novel deep-learning method for the classification of breast lesions, using both mammography and ultrasound images of the lesion. To the best of our knowledge, it is the first reported attempt to combine these very different imaging modalities by fusing high-level perceptual representations for lesion classification. We use a unique dataset consisting of matched mammography and ultrasound lesions, acquired at our institution. The proposed methods are evaluated using a leave-one-out scheme, demonstrating significant improvement in AUC (area under the ROC curve) when features extracted from both modalities are combined into a single multi-modality classifier, in comparison to single-modality classification using only mammography or ultrasound.
Dataset

Although combining mammography and ultrasound imaging for breast cancer screening is common practice, to the best of our knowledge there are no public datasets containing corresponding lesions from both modalities. Therefore, we created our own retrospective dataset of 153 biopsy-proven lesions, consisting of 73 malignant and 80 benign cases. For each lesion, the corresponding mammography and ultrasound images were contoured by an expert breast radiologist, with biopsy-proven labelling. Figure 2 shows a sample from the dataset.
Fig. 2.
Matched malignant lesion contouring in both modalities.
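The paired structure of the data can be mirrored in code by a dataset object that returns the matched mammography patch, ultrasound patch, and biopsy label for each lesion. Below is a minimal PyTorch-style sketch under an assumed data organization; the CSV manifest and its column names are hypothetical and not part of the paper.

```python
import csv
from PIL import Image
import torch
from torch.utils.data import Dataset

class PairedLesionDataset(Dataset):
    """Yields (mammography patch, ultrasound patch, label) for each biopsy-proven lesion.

    Assumes a hypothetical CSV manifest with columns mg_path, us_path, label
    (1 = malignant, 0 = benign); the real data layout is not described in the paper.
    """
    def __init__(self, manifest_csv, transform=None):
        with open(manifest_csv) as f:
            self.rows = list(csv.DictReader(f))
        self.transform = transform

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        mg = Image.open(row["mg_path"]).convert("RGB")
        us = Image.open(row["us_path"]).convert("RGB")
        if self.transform is not None:
            mg, us = self.transform(mg), self.transform(us)
        return mg, us, torch.tensor(int(row["label"]))
```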
Single-modality networks

Two convolutional neural networks (CNNs), one for each modality, were trained to tell apart malignant and benign lesions. The contoured lesions were cropped into image patches and submitted to geometric transformations (translation, rotation, flipping) to augment the dataset and generate additional inputs. We experimented with two different architectures: (1) a basic CNN with ReLU activation maps, max pooling and fully connected layers, trained from scratch (Figure 3); (2) GoogleNet [24], previously trained over ImageNet.
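As a rough illustration of the single-modality branch, the sketch below applies the stated geometric augmentations (translation, rotation, flipping) and fine-tunes an ImageNet-pretrained GoogLeNet for two-class lesion classification, using standard torchvision components. The input size, augmentation ranges, and the particular pretrained weights are assumptions rather than details given in the paper.

```python
import torch.nn as nn
from torchvision import models, transforms

# Geometric augmentations named in the text; the parameter ranges are illustrative.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                       # assumed input size
    transforms.RandomAffine(degrees=20, translate=(0.1, 0.1)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def build_googlenet_branch(num_classes: int = 2) -> nn.Module:
    """ImageNet-pretrained GoogLeNet with its classification head replaced."""
    net = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # fine-tuned on lesion patches
    return net
```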
Multimodal network
Figure 4 presents the multimodal fully connected network, consisting of 7 layers. High-level perceptual descriptors of matched lesions are extracted from both trained single-modality networks and combined by concatenation. The concatenated vector is then used as the input to the multimodal network, which provides the final malignancy probability of the input lesion.
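A hedged sketch of the fusion step: the 512-dimensional descriptors from the two single-modality branches are concatenated and passed through a small fully connected network whose softmax output is the malignancy probability. The hidden-layer widths below are illustrative; the paper only states that the network has 7 layers and operates on the concatenated descriptor.

```python
import torch
import torch.nn as nn

class MultimodalHead(nn.Module):
    """Fully connected classifier over concatenated mammography and ultrasound descriptors."""
    def __init__(self, descriptor_dim: int = 512, num_classes: int = 2):
        super().__init__()
        # One assumed reading of the "7 layers": four linear layers with three ReLUs.
        self.net = nn.Sequential(
            nn.Linear(2 * descriptor_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, mg_descriptor: torch.Tensor, us_descriptor: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([mg_descriptor, us_descriptor], dim=1)
        return self.net(fused)   # logits; softmax at inference gives the malignancy probability
```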
Loss functions

Both single-modality and multimodal classifiers were trained using the same loss function in each experiment. To enrich diversity, we experimented with two loss functions: (1) BCE, the binary cross-entropy loss; (2) LMCL, the Large Margin Cosine Loss [25], commonly used in face recognition tasks. LMCL defines a decision margin in the cosine space and learns discriminative features by maximizing the inter-class and minimizing the intra-class cosine margin.
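For reference, a minimal sketch of LMCL as defined in [25]: class scores are cosine similarities between L2-normalized features and class-weight vectors, a margin m is subtracted from the target-class cosine, and the result is scaled by s before a standard cross-entropy. The values of s and m below are common defaults from the face-recognition literature, not values reported here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeMarginCosineLoss(nn.Module):
    """CosFace-style LMCL [25]; s and m are assumed defaults, not taken from the paper."""
    def __init__(self, feature_dim: int, num_classes: int, s: float = 30.0, m: float = 0.35):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.randn(num_classes, feature_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarities between normalized features and normalized class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        # Subtract the margin only from the target-class cosine, then scale and apply CE.
        margin = F.one_hot(labels, num_classes=cosine.size(1)).float() * self.m
        return F.cross_entropy(self.s * (cosine - margin), labels)
```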
Fig. 3.
Proposed single-modality basic CNN architecture (convolution+ReLU, max pooling, fully connected, and softmax layers). The input is a cropped lesion and the output is the softmax malignancy probability. A 512-dimensional vector ("descriptor") is the last layer before the output layer.
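The from-scratch branch of Figure 3 can be approximated by the following sketch: convolution+ReLU and max-pooling blocks, a 512-dimensional descriptor layer that feeds the fusion network, and a two-class head. The channel widths and number of blocks are assumptions, since the exact dimensions of the diagram were not preserved.

```python
import torch.nn as nn

class BasicLesionCNN(nn.Module):
    """From-scratch single-modality CNN (Figure 3); layer widths are illustrative assumptions."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.descriptor = nn.Sequential(nn.Flatten(), nn.Linear(128 * 4 * 4, 512), nn.ReLU())
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        desc = self.descriptor(self.features(x))  # 512-dim descriptor passed to the fusion head
        return desc, self.classifier(desc)         # logits; softmax applied at inference
```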
Fig. 4.
Multimodal fully connected network architecture. The input is a concatenation of corresponding lesion descriptors extracted from the mammography and ultrasound CNNs. The output is the softmax malignancy probability.
Fig. 5.
Same lesion captured in different views in ultrasound screening.
Training method
Lesion patches from different modalities can be utilized for classification in different manners. Unfortunately, image registration is practically impossible because of the differences between mammography and ultrasound imaging techniques. Moreover, even ultrasound images of the same lesion are highly different, as the images are captured in various views and the breast is easily deformed by the mechanical pressure applied by the transducer (see Figure 5). Therefore, we make use of the coupled mammography-ultrasound lesions by combining them in the feature space instead of the image space. For completeness, we show two different training methods: (1) we first train each single-modality network separately, then combine high-level feature data from both networks and feed it as the input for training the multimodal network; (2) end-to-end training, illustrated in Figure 6, in which we train all three networks (two single-modality networks and one fully connected network after feature combination) concurrently. The loss function is the sum of the losses from each of the three networks, so the performance of each network is tied to that of the other two.
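A schematic training step of the end-to-end variant, in which the two single-modality losses and the multimodal loss are summed and back-propagated together. It assumes each branch returns both its descriptor and its class logits (as in the BasicLesionCNN sketch above) and reuses the MultimodalHead defined earlier; all names are illustrative.

```python
import torch

def end_to_end_step(mg_net, us_net, fusion_head, criterion, optimizer, batch):
    """One optimization step of the end-to-end scheme (Figure 6):
    total loss = mammography loss + ultrasound loss + multimodal loss."""
    mg_img, us_img, labels = batch
    labels = labels.long()

    mg_desc, mg_logits = mg_net(mg_img)       # assumed interface: (descriptor, logits)
    us_desc, us_logits = us_net(us_img)
    fused_logits = fusion_head(mg_desc, us_desc)

    loss = (criterion(mg_logits, labels)
            + criterion(us_logits, labels)
            + criterion(fused_logits, labels))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```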
Fig. 6.
End-to-end training method: all three classifiers are trained at the same time,while the loss is the sum of all three losses.
Experiments and Results

Given 153 mammography-ultrasound lesion pairs, we randomly selected 120 fixed pairs for the leave-one-out experiments, with benign and malignant cases equally distributed. The remaining 33 lesions were held out as a validation set for hyper-parameter tuning.

We ran 8 different experiments, one for each combination of the previously mentioned configurations: training method, model architecture and loss function. All the experiments were performed using the leave-one-out methodology: in each round, 119 out of 120 lesions were used for training and a single lesion, different in every round, was used for testing. Finally, each test lesion obtained three scores, one for each modality and one combined, representing the average malignancy probabilities of all its appearances in the dataset.

Results were evaluated by means of AUC (area under the ROC curve), calculated from all test scores. As can be seen in Table 1, combining mammography and ultrasound features improved results in most experiments; in fact, only in one experiment did results deteriorate due to the combination of modalities. Clearly, using transfer learning outperforms training from scratch, likely because of the small dataset size. Moreover, two steps of training (each modality first and combined descriptors afterwards) achieve better results than end-to-end training. No significant difference is observed in the performance of the tested loss functions.
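The evaluation protocol can be written out schematically: one lesion pair is held out per round, the networks are retrained on the remaining 119 pairs, and the held-out scores accumulated over all rounds feed a single AUC computation. The train_model and score_lesion helpers below are placeholders for the training and inference code sketched earlier.

```python
from sklearn.metrics import roc_auc_score

def leave_one_out_auc(lesion_pairs, labels, train_model, score_lesion):
    """Leave-one-out evaluation over the 120 lesion pairs.

    train_model(train_pairs, train_labels) -> model                      (placeholder)
    score_lesion(model, pair) -> malignancy probability, averaged over
        all appearances of the held-out lesion                           (placeholder)
    """
    scores = []
    for i in range(len(lesion_pairs)):
        train_pairs = lesion_pairs[:i] + lesion_pairs[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train_model(train_pairs, train_labels)
        scores.append(score_lesion(model, lesion_pairs[i]))
    return roc_auc_score(labels, scores)
```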
Comparison to state-of-the-art models
Our method is based on combining mammography with ultrasound. However, as previous authors have not discussed the exact design of their models [15], we report the results of two baselines, each using a single modality, and compare them to our single-modality networks. As each model was originally trained over a different dataset, it may be confusing and even meaningless to directly compare reported results. Therefore, for a qualitative assessment of our model, we trained these models over our own dataset.
Mammography
The patch-level network proposed by [17] was trained on our mammography dataset. Based on DenseNet121 [27] and transfer learning, it achieved an AUC of 0.86, better than our CNN trained from scratch (0.76) but inferior to GoogleNet (0.89).
Ultrasound
Training a VGG16 model [29], previously trained over ImageNet, on our ultrasound dataset, as suggested by [28], yielded an AUC of 0.81. It is worth mentioning that the AUC reported on the original dataset of Hijab et al. was much higher (0.97), which may suggest that their method is sensitive to the specific training data used. The obtained AUC is better than our CNN trained from scratch (0.75) but inferior to GoogleNet (0.88).
Table 1.
AUC results of all experiments, reported on the test set. Scores are listed as mammography/ultrasound/combined.
Training method | Loss function | From scratch (CNN) | Transfer learning (GoogleNet)
Separate        | BCE           | 0.76/0.75/–        | –
Separate        | LMCL          | 0.73/0.79/–        | –
End-to-end      | BCE           | 0.74/0.78/–        | –/0.81/–
End-to-end      | LMCL          | 0.72/0.78/–        | –/0.80/0.78
Reader study

To compare the proposed method with human radiologists, we performed a simplified reader study with 4 experienced radiologists. 120 pairs of corresponding mammography-ultrasound lesion images, taken from the biopsy-proven leave-one-out experiment dataset, were visually assigned a malignancy rate (from 0 to 10) by each of the participating radiologists separately. The AUCs achieved by the readers were 0.931, 0.938, 0.967 and 0.979, compared to 0.942 for our best model. ROC curves are shown in Figure 7. These results suggest that the proposed model performed similarly to an average radiologist.
Is the model paying attention to the same attributes as radiologists when predicting whether a lesion is malignant? To gain insight into this question, we applied the Grad-CAM algorithm [26] to our best trained GoogleNet. Grad-CAM produces a gradient-based heat map that highlights the input regions that most influenced the output prediction.

In Figure 8, we present several malignant mammography and ultrasound examples from the training set, with their Grad-CAM computations; "hotter" areas indicate attended regions. We observe that for malignant lesions, the model appears to rely significantly (hot colors in the heat map) on the lesion boundaries, especially where irregular features are encountered, in agreement with the radiologists' diagnostic methodology.
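Grad-CAM [26] weights the feature maps of a chosen convolutional layer by the spatially averaged gradient of the target-class score and sums them into a heat map. A minimal hook-based sketch follows; picking GoogLeNet's last Inception block (e.g. inception5b in torchvision) as the target layer is an assumption, not a detail stated in the paper.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, target_class):
    """Gradient-weighted class activation map [26] for a single input image (C, H, W)."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output

    def bwd_hook(_, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(image.unsqueeze(0))           # (1, num_classes)
        model.zero_grad()
        logits[0, target_class].backward()           # gradients of the chosen class score
    finally:
        h1.remove()
        h2.remove()

    # Channel weights = global-average-pooled gradients; weighted sum of activations.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and normalize to [0, 1] for overlay visualization.
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()
```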
Fig. 7.
ROC curves of our model and each reader.
Fig. 8.
Examples of malignant lesions from the training set with their Grad-CAM visualizations (top: mammography, bottom: ultrasound).
Conclusion

We propose a deep-learning method for the classification of breast lesions that combines mammography and ultrasound input images. We show that by combining high-level perceptual features from both modalities, the classification performance is improved. Furthermore, the proposed method is shown to perform similarly to an average radiologist, surpassing two out of four radiologists participating in a reader study. The promising results suggest the proposed method may become a valuable decision support tool for multimodal classification of breast lesions. In future research, further validation on a larger dataset should be performed. The proposed method may be generalized by incorporating additional imaging modalities, such as breast MRI, as well as medical background information of the patient [30].
References
1. Heywang-Köbrunner, S.H., Hacker, A., Sedlacek, S.: Advantages and disadvantages of mammography screening. Breast Care 6(3), 199-207 (2011).
2. Carney, P.A., Miglioretti, D.L., Yankaskas, B.C., Kerlikowske, K., Rosenberg, R., Rutter, C.M., Geller, B.M., Abraham, L.A., Taplin, S.H., Dignan, M., Cutter, G.: Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 138(3), 168-175 (2003).
3. Lobbes, M.B.I., Smidt, M.L., Houwers, J., Tjan-Heijnen, V.C., Wildberger, J.E.: Contrast enhanced mammography: techniques, current results, and potential indications. Clin Radiol 68(9), 935-944 (2013).
4. Kolb, T.M., et al.: Occult cancer in women with dense breasts: detection with screening US - diagnostic yield and tumor characteristics. Radiology 207(1), 191-199 (1998).
5. Devolli-Disha, E., Manxhuka-Kerliu, S., Ymeri, H., Kutllovci, A.: Comparative accuracy of mammography and ultrasound in women with breast symptoms according to age and breast density. Bosn J Basic Med Sci 9, 131-136 (2009).
6. Kelly, K.M., Dean, J., Lee, S.-J., Comulada, W.S.: Breast cancer detection: radiologists' performance using mammography with and without automated whole-breast ultrasound. European Radiology 20(11), 2557-2564 (2010).
7. Skaane, P., Gullien, R., Eben, E.B., Sandhaug, M., Schulz-Wendtland, R., Stoeblen, F.: Interpretation of automated breast ultrasound (ABUS) with and without knowledge of mammography: a reader performance study. Acta Radiologica 56(4), 404-412 (2015).
8. Jesneck, J.L., Lo, J.Y., Baker, J.A.: Breast mass lesions: computer-aided diagnosis models with mammographic and sonographic descriptors. Radiology 244(2), 390 (2007).
9. Chougrad, H., Zouaki, H., Alheyane, O.: Deep convolutional neural networks for breast cancer screening. Comput Methods Programs Biomed 157, 19-30 (2018).
10. Cheng, J.-Z., Ni, D., Chou, Y.-H., Qin, J., Tiu, C.-M., Chang, Y.-C., et al.: Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci Rep 6, 24454 (2016).
11. Byra, M., Piotrzkowska-Wroblewska, H., Dobruch-Sobczak, K., Nowicki, A.: Combining Nakagami imaging and convolutional neural network for breast lesion classification. IEEE International Ultrasonics Symposium (IUS) (2017).
12. Perek, S., Kiryati, N., Zimmerman-Moreno, G., Sklair-Levy, M., Konen, E., Mayer, A.: Classification of contrast-enhanced spectral mammography (CESM) images. Int J Comput Assist Radiol Surg 14, 249-257 (2019).
13. American College of Radiology: ACR BI-RADS Atlas, 5th Edition, 125-143 (2013).
14. Hadad, O., Bakalo, R., Ben-Ari, R., Hashoul, S., Amit, G.: Classification of breast lesions using cross-modal deep learning. IEEE International Symposium on Biomedical Imaging (ISBI) (2017).
15. Cong, J., Wei, B., He, Y., Yin, Y., Zheng, Y.: A selective ensemble classification method combining mammography images with ultrasound images for breast cancer diagnosis. Computational and Mathematical Methods in Medicine 2017 (2017).
16. Shaikh, T.A., Ali, R., Beg, M.M.S.: Transfer learning privileged information fuels CAD diagnosis of breast cancer. Machine Vision and Applications 31(1), 9 (2020).
17. Wu, N., Phang, J., Park, J., Shen, Y., Huang, Z., Zorin, M., Jastrzębski, S., et al.: Deep neural networks improve radiologists'