Automatic Breast Lesion Classification by Joint Neural Analysis of Mammography and Ultrasound
Gavriel Habib, Nahum Kiryati, Miri Sklair-Levy, Anat Shalmon, Osnat Halshtok Neiman, Renata Faermann Weidenfeld, Yael Yagil, Eli Konen, Arnaldo Mayer
School of Electrical Engineering, Tel-Aviv University, Tel Aviv-Yafo, Israel
[email protected]
The Manuel and Raquel Klachky Chair of Image Processing, School of Electrical Engineering, Tel-Aviv University, Tel Aviv-Yafo, Israel
Diagnostic Imaging, Sheba Medical Center, affiliated to the Sackler School of Medicine, Tel-Aviv University, Israel
Abstract.
Mammography and ultrasound are extensively used by radiologists as complementary modalities to achieve better performance in breast cancer diagnosis. However, existing computer-aided diagnosis (CAD) systems for the breast are generally based on a single modality. In this work, we propose a deep-learning based method for classifying breast cancer lesions from their respective mammography and ultrasound images. We present various approaches and show a consistent improvement in performance when utilizing both modalities. The proposed approach is based on a GoogleNet architecture, fine-tuned for our data in two training steps. First, a distinct neural network is trained separately for each modality, generating high-level features. Then, the aggregated features originating from each modality are used to train a multimodal network to provide the final classification. In quantitative experiments, the proposed approach achieves an AUC of 0.94, outperforming state-of-the-art models trained over a single modality. Moreover, it performs similarly to an average radiologist, surpassing two out of four radiologists participating in a reader study. The promising results suggest that the proposed method may become a valuable decision support tool for breast radiologists.
Keywords:
Deep Learning · Mammography · Ultrasound.
Breast cancer is the second most common type of cancer among American women after skin cancer. According to American Cancer Society estimates, 268,600 invasive breast cancer cases were diagnosed in 2019, leading to 41,760 deaths. However, early detection may save lives, as it enables better treatment options.
Fig. 1.
Benign (top) and malignant (bottom) lesions from our dataset. Malignant lesions tend to have less well-defined boundaries in both mammography (left) and ultrasound (right) screenings.

Mammography-based screening is the most widely used approach for breast cancer detection, with proven mortality reduction and early disease treatment benefits [1]. However, it suffers from poor lesion visibility in dense breasts [2]. To improve sensitivity in dense breasts, contrast-enhanced spectral mammography (CESM) has been developed. CESM is based on the subtraction of low and high energy images, acquired following the injection of a contrast agent [3]. Although CESM reaches MRI levels of lesion visibility for dense breasts [31], the technique is still in the early adoption phase.

Ultrasound imaging has proven to be a valuable tool in dense breasts, increasing cancer detection sensitivity by 17% [4]. Nevertheless, breast ultrasound may miss solid tumors that are easily detected with mammography. Devolli-Disha et al. [5] showed that ultrasound had a higher sensitivity (69.2%) than mammography (15.4%) in women younger than 40 years, whereas mammography (78.7%) beats ultrasound (63.9%) in women older than 60 years. Owing to these complementary strengths and weaknesses, radiologists suggest using breast ultrasound as a complementary screening test to mammography [6,7].

Classification of breast lesions is a challenging task for the radiologist. Malignant and benign lesions can be differentiated by their shape, boundary and texture. For example, malignant lesions may have irregular and poorly defined boundaries as they have the ability to spread (see Figure 1). Nevertheless, in many cases radiologists cannot classify the lesion and the patient is referred for a biopsy, which is a stressful and expensive process. Given that 65%-85% of biopsies turn out to be benign [8], there is a clear need for tools that will help radiologists reduce benign biopsies.
In recent years, deep learning techniques have been providing significant improvements in various medical imaging tasks, such as tumor detection and classification, image denoising and registration. In the field of breast cancer classification, existing methods are based mainly on mammograms [9,17,18,19], ultrasound [10,20], MRI [21,22] or histopathology images [23].

To deal with the limited amount of data, Chougrad et al. [9] used transfer learning over ImageNet and achieved state-of-the-art results over public mammography datasets. Cheng et al. [10] performed a semi-supervised learning approach over a large breast ultrasound dataset with only a few annotated images. Wu et al. [19] synthesized mammogram lesions using a class-conditional GAN and used them as additional training data instead of basic augmentations.

Emphasizing the importance of lesion context, Wu et al. [17] trained a deep multi-view CNN over a large private mammogram dataset. They used a breast-level model to create heatmaps that represent suspected areas, and a patch-level model to locally predict the presence of malignant or benign findings. Shen et al. [18] combined coarse and fine details using an attention mechanism to select informative patches for classification.

Common breast imaging modalities were also combined with additional data from other domains. Byra et al. [11] used the Nakagami parameter maps created from breast ultrasound images to train a CNN from scratch. Perek et al. [12] integrated CESM images with features of BI-RADS [13], a textual radiological lexicon for breast lesions, as inputs to a classifier.

Most previous studies utilized only a single modality, while some combined different types of breast images. Hadad et al. [14] classified MRI breast lesions using fine tuning of a network pre-trained on mammography images instead of natural images. Regarding mammography with ultrasound, Cong et al. [15] separately trained three base classifiers (SVM, KNN and Naive Bayes) for each modality, integrated some of them by a selective ensemble method and obtained the final prediction by majority vote. Shaikh et al. [16] proposed a learning-using-privileged-information approach, i.e. utilizing both modalities for training but avoiding one at test time. These papers suggested the potential of cross-modal learning.

In this paper, we propose a novel deep-learning method for the classification of breast lesions, using both mammography and ultrasound images of the lesion. To the best of our knowledge, it is the first reported attempt to combine these very different imaging modalities by fusing high-level perceptual representations for lesion classification. We use a unique dataset consisting of matched mammography and ultrasound lesions, acquired at our institution. The proposed methods are evaluated using a leave-one-out scheme, demonstrating significant improvement in AUC (area under the ROC curve) when features extracted from both modalities are combined into a single multi-modality classifier, in comparison to single-modality classification using only mammography or ultrasound.
Dataset

Although combining mammography and ultrasound imaging for breast cancer screening is common practice, to the best of our knowledge there are no public datasets containing corresponding lesions from both modalities. Therefore, we created our own retrospective dataset of 153 biopsy-proven lesions, consisting of 73 malignant and 80 benign cases. For each lesion, the corresponding mammography and ultrasound images were contoured by an expert breast radiologist, with biopsy-proven labelling. Figure 2 shows a sample from the dataset.
Fig. 2.
Matched malignant lesion contouring in both modalities.
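The paired structure of the data can be mirrored in code by a dataset object that returns the matched mammography patch, ultrasound patch, and biopsy label for each lesion. Below is a minimal PyTorch-style sketch under an assumed data organization; the CSV manifest and its column names are hypothetical and not part of the paper.

```python
import csv
from PIL import Image
import torch
from torch.utils.data import Dataset

class PairedLesionDataset(Dataset):
    """Yields (mammography patch, ultrasound patch, label) for each biopsy-proven lesion.

    Assumes a hypothetical CSV manifest with columns mg_path, us_path, label
    (1 = malignant, 0 = benign); the real data layout is not described in the paper.
    """
    def __init__(self, manifest_csv, transform=None):
        with open(manifest_csv) as f:
            self.rows = list(csv.DictReader(f))
        self.transform = transform

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        mg = Image.open(row["mg_path"]).convert("RGB")
        us = Image.open(row["us_path"]).convert("RGB")
        if self.transform is not None:
            mg, us = self.transform(mg), self.transform(us)
        return mg, us, torch.tensor(int(row["label"]))
```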
Single-modality networks

Two convolutional neural networks (CNNs), one for each modality, were trained to tell apart malignant and benign lesions. The contoured lesions were cropped into image patches and submitted to geometric transformations (translation, rotation, flipping) to augment the dataset and generate additional inputs. We experimented with two different architectures: (1) a basic CNN with ReLU activation maps, max pooling and fully connected layers, trained from scratch (Figure 3); (2) GoogleNet [24], previously trained over ImageNet.
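As a rough illustration of the single-modality branch, the sketch below applies the stated geometric augmentations (translation, rotation, flipping) and fine-tunes an ImageNet-pretrained GoogLeNet for two-class lesion classification, using standard torchvision components. The input size, augmentation ranges, and the particular pretrained weights are assumptions rather than details given in the paper.

```python
import torch.nn as nn
from torchvision import models, transforms

# Geometric augmentations named in the text; the parameter ranges are illustrative.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                       # assumed input size
    transforms.RandomAffine(degrees=20, translate=(0.1, 0.1)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def build_googlenet_branch(num_classes: int = 2) -> nn.Module:
    """ImageNet-pretrained GoogLeNet with its classification head replaced."""
    net = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # fine-tuned on lesion patches
    return net
```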
Multimodal network
Figure 4 presents the multimodal fully connected network, consisting of 7 layers. High-level perceptual descriptors of matched lesions are extracted from both trained single-modality networks and combined by concatenation. The concatenated vector is then used as the input to the multimodal network, which provides the final malignancy probability of the input lesion.
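A hedged sketch of the fusion step: the 512-dimensional descriptors from the two single-modality branches are concatenated and passed through a small fully connected network whose softmax output is the malignancy probability. The hidden-layer widths below are illustrative; the paper only states that the network has 7 layers and operates on the concatenated descriptor.

```python
import torch
import torch.nn as nn

class MultimodalHead(nn.Module):
    """Fully connected classifier over concatenated mammography and ultrasound descriptors."""
    def __init__(self, descriptor_dim: int = 512, num_classes: int = 2):
        super().__init__()
        # One assumed reading of the "7 layers": four linear layers with three ReLUs.
        self.net = nn.Sequential(
            nn.Linear(2 * descriptor_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, mg_descriptor: torch.Tensor, us_descriptor: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([mg_descriptor, us_descriptor], dim=1)
        return self.net(fused)   # logits; softmax at inference gives the malignancy probability
```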
Loss functions

Both single-modality and multimodal classifiers were trained using the same loss function in each experiment. To enrich diversity, we experimented with two loss functions: (1) BCE, the binary cross-entropy loss; (2) LMCL, the Large Margin Cosine Loss [25], commonly used in face recognition tasks. LMCL defines a decision margin in the cosine space and learns discriminative features by maximizing the inter-class and minimizing the intra-class cosine margin.
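For reference, a minimal sketch of LMCL as defined in [25]: class scores are cosine similarities between L2-normalized features and class-weight vectors, a margin m is subtracted from the target-class cosine, and the result is scaled by s before a standard cross-entropy. The values of s and m below are common defaults from the face-recognition literature, not values reported here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeMarginCosineLoss(nn.Module):
    """CosFace-style LMCL [25]; s and m are assumed defaults, not taken from the paper."""
    def __init__(self, feature_dim: int, num_classes: int, s: float = 30.0, m: float = 0.35):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.randn(num_classes, feature_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarities between normalized features and normalized class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        # Subtract the margin only from the target-class cosine, then scale and apply CE.
        margin = F.one_hot(labels, num_classes=cosine.size(1)).float() * self.m
        return F.cross_entropy(self.s * (cosine - margin), labels)
```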
Fig. 3.
Proposed single-modality basic CNN architecture (convolution+ReLU, max pooling, fully connected, and softmax layers). The input is a cropped lesion and the output is the softmax malignancy probability. A 512-dimensional vector ("descriptor") is the last layer before the output layer.
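The from-scratch branch of Figure 3 can be approximated by the following sketch: convolution+ReLU and max-pooling blocks, a 512-dimensional descriptor layer that feeds the fusion network, and a two-class head. The channel widths and number of blocks are assumptions, since the exact dimensions of the diagram were not preserved.

```python
import torch.nn as nn

class BasicLesionCNN(nn.Module):
    """From-scratch single-modality CNN (Figure 3); layer widths are illustrative assumptions."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.descriptor = nn.Sequential(nn.Flatten(), nn.Linear(128 * 4 * 4, 512), nn.ReLU())
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        desc = self.descriptor(self.features(x))  # 512-dim descriptor passed to the fusion head
        return desc, self.classifier(desc)         # logits; softmax applied at inference
```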
Fig. 4.
Multimodal fully connected network architecture. The input is a concatenation of corresponding lesion descriptors extracted from the mammography and ultrasound CNNs. The output is the softmax malignancy probability.
Fig. 5.
Same lesion captured in different views in ultrasound screening.
Training method
Lesion patches from different modalities can be utilized for classification in different manners. Unfortunately, image registration is practically impossible because of the differences between mammography and ultrasound imaging techniques. Moreover, even ultrasound images of the same lesion are highly different, as the images are captured in various views and the breast is easily deformed by the mechanical pressure applied by the transducer (see Figure 5). Therefore, we make use of the coupled mammography-ultrasound lesions by combining them in the feature space instead of the image space. For completeness, we show two different training methods: (1) we first train each single-modality network separately, then combine high-level feature data from both networks and feed it as the input for training the multimodal network; (2) end-to-end training, illustrated in Figure 6, in which we train all three networks (two single-modality networks and one fully connected network after feature combination) concurrently. The loss function is the sum of the losses from each of the three networks, so the performance of each network is tied to that of the other two.
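A schematic training step of the end-to-end variant, in which the two single-modality losses and the multimodal loss are summed and back-propagated together. It assumes each branch returns both its descriptor and its class logits (as in the BasicLesionCNN sketch above) and reuses the MultimodalHead defined earlier; all names are illustrative.

```python
import torch

def end_to_end_step(mg_net, us_net, fusion_head, criterion, optimizer, batch):
    """One optimization step of the end-to-end scheme (Figure 6):
    total loss = mammography loss + ultrasound loss + multimodal loss."""
    mg_img, us_img, labels = batch
    labels = labels.long()

    mg_desc, mg_logits = mg_net(mg_img)       # assumed interface: (descriptor, logits)
    us_desc, us_logits = us_net(us_img)
    fused_logits = fusion_head(mg_desc, us_desc)

    loss = (criterion(mg_logits, labels)
            + criterion(us_logits, labels)
            + criterion(fused_logits, labels))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```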
Fig. 6.
End-to-end training method: all three classifiers are trained at the same time,while the loss is the sum of all three losses.
Experiments and Results

Given 153 mammography-ultrasound lesion pairs, we randomly selected 120 fixed pairs for the leave-one-out experiments, with benign and malignant cases equally distributed. The remaining 33 lesions were held out as a validation set for hyper-parameter tuning.

We ran 8 different experiments, one for each combination of the previously mentioned configurations: training method, model architecture and loss function. All the experiments were performed using the leave-one-out methodology: in each round, 119 out of 120 lesions were used for training and a single lesion, different in every round, was used for testing. Finally, each test lesion obtained three scores, one for each modality and one combined, representing the average malignancy probabilities of all its appearances in the dataset.

Results were evaluated by means of AUC (area under the ROC curve), calculated from all test scores. As can be seen in Table 1, combining mammography and ultrasound features improved results in most experiments; in fact, only in one experiment did results deteriorate due to the combination of modalities. Clearly, using transfer learning outperforms training from scratch, likely because of the small dataset size. Moreover, two steps of training (each modality first and combined descriptors afterwards) achieve better results than end-to-end training. No significant difference is observed in the performance of the tested loss functions.
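The evaluation protocol can be written out schematically: one lesion pair is held out per round, the networks are retrained on the remaining 119 pairs, and the held-out scores accumulated over all rounds feed a single AUC computation. The train_model and score_lesion helpers below are placeholders for the training and inference code sketched earlier.

```python
from sklearn.metrics import roc_auc_score

def leave_one_out_auc(lesion_pairs, labels, train_model, score_lesion):
    """Leave-one-out evaluation over the 120 lesion pairs.

    train_model(train_pairs, train_labels) -> model                      (placeholder)
    score_lesion(model, pair) -> malignancy probability, averaged over
        all appearances of the held-out lesion                           (placeholder)
    """
    scores = []
    for i in range(len(lesion_pairs)):
        train_pairs = lesion_pairs[:i] + lesion_pairs[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train_model(train_pairs, train_labels)
        scores.append(score_lesion(model, lesion_pairs[i]))
    return roc_auc_score(labels, scores)
```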
Comparison to state-of-the-art models
Our method is based on combining mammography with ultrasound. However, as previous authors have not discussed the exact design of their models [15], we report the results of two baselines, each using a single modality, and compare them to our single-modality networks. As each model was originally trained over a different dataset, it may be confusing and even meaningless to directly compare reported results. Therefore, for a qualitative assessment of our model, we trained these models over our own dataset.
Mammography
The patch-level network proposed by [17] was trained on our mammography dataset. Based on DenseNet121 [27] and transfer learning, it achieved an AUC of 0.86, better than our CNN trained from scratch (0.76) but inferior to GoogleNet (0.89).
Ultrasound
Training a VGG16 model [29], previously trained over ImageNet, on our ultrasound dataset, as suggested by [28], yielded an AUC of 0.81. It is worth mentioning that the AUC reported on the original dataset of Hijab et al. was much higher (0.97), which may suggest that their method is sensitive to the specific training data used. The obtained AUC is better than our CNN trained from scratch (0.75) but inferior to GoogleNet (0.88).
Table 1.
AUC results of all experiments, reported on the test set. Scores are listed as mammography/ultrasound/combined.
Training method | Loss function | From scratch (CNN) | Transfer learning (GoogleNet)
Separate        | BCE           | 0.76/0.75/–        | –
Separate        | LMCL          | 0.73/0.79/–        | –
End-to-end      | BCE           | 0.74/0.78/–        | –/0.81/–
End-to-end      | LMCL          | 0.72/0.78/–        | –/0.80/0.78
Reader study

To compare the proposed method with human radiologists, we performed a simplified reader study with 4 experienced radiologists. 120 pairs of corresponding mammography-ultrasound lesion images, taken from the biopsy-proven leave-one-out experiment dataset, were visually assigned a malignancy rate (from 0 to 10) by each of the participating radiologists separately. The AUCs achieved by the readers were 0.931, 0.938, 0.967 and 0.979, compared to 0.942 for our best model. ROC curves are shown in Figure 7. These results suggest that the proposed model performed similarly to an average radiologist.
Is the model paying attention to the same attributes as radiologists when predicting whether a lesion is malignant? To gain insight into this question, we applied the Grad-CAM algorithm [26] to our best trained GoogleNet. Grad-CAM produces a gradient-based heat map that highlights the input regions that most influenced the output prediction.

In Figure 8, we present several malignant mammography and ultrasound examples from the training set, with their Grad-CAM computations; "hotter" areas indicate attended regions. We observe that for malignant lesions, the model appears to rely significantly (hot colors in the heat map) on the lesion boundaries, especially where irregular features are encountered, in agreement with the radiologists' diagnostic methodology.
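Grad-CAM [26] weights the feature maps of a chosen convolutional layer by the spatially averaged gradient of the target-class score and sums them into a heat map. A minimal hook-based sketch follows; picking GoogLeNet's last Inception block (e.g. inception5b in torchvision) as the target layer is an assumption, not a detail stated in the paper.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, target_class):
    """Gradient-weighted class activation map [26] for a single input image (C, H, W)."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output

    def bwd_hook(_, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(image.unsqueeze(0))           # (1, num_classes)
        model.zero_grad()
        logits[0, target_class].backward()           # gradients of the chosen class score
    finally:
        h1.remove()
        h2.remove()

    # Channel weights = global-average-pooled gradients; weighted sum of activations.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and normalize to [0, 1] for overlay visualization.
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()
```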
Fig. 7.
ROC curves of our model and each reader.
Fig. 8.
Examples of malignant lesions from the training set with their Grad-CAM visualizations (top: mammography, bottom: ultrasound).
Conclusion

We propose a deep-learning method for the classification of breast lesions that combines mammography and ultrasound input images. We show that by combining high-level perceptual features from both modalities, the classification performance is improved. Furthermore, the proposed method is shown to perform similarly to an average radiologist, surpassing two out of four radiologists participating in a reader study. The promising results suggest the proposed method may become a valuable decision support tool for multimodal classification of breast lesions. In future research, further validation on a larger dataset should be performed. The proposed method may be generalized by incorporating additional imaging modalities, such as breast MRI, as well as medical background information of the patient [30].
References
1. Heywang-Köbrunner, S.H., Hacker, A., Sedlacek, S.: Advantages and disadvantages of mammography screening. Breast Care 6(3), 199-207 (2011).
2. Carney, P.A., Miglioretti, D.L., Yankaskas, B.C., Kerlikowske, K., Rosenberg, R., Rutter, C.M., Geller, B.M., Abraham, L.A., Taplin, S.H., Dignan, M., Cutter, G.: Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 138(3), 168-175 (2003).
3. Lobbes, M.B.I., Smidt, M.L., Houwers, J., Tjan-Heijnen, V.C., Wildberger, J.E.: Contrast enhanced mammography: techniques, current results, and potential indications. Clin Radiol 68(9), 935-944 (2013).
4. Kolb, T.M., et al.: Occult cancer in women with dense breasts: detection with screening US - diagnostic yield and tumor characteristics. Radiology 207(1), 191-199 (1998).
5. Devolli-Disha, E., Manxhuka-Kerliu, S., Ymeri, H., Kutllovci, A.: Comparative accuracy of mammography and ultrasound in women with breast symptoms according to age and breast density. Bosn J Basic Med Sci 9, 131-136 (2009).
6. Kelly, K.M., Dean, J., Lee, S.-J., Comulada, W.S.: Breast cancer detection: radiologists' performance using mammography with and without automated whole-breast ultrasound. European Radiology 20(11), 2557-2564 (2010).
7. Skaane, P., Gullien, R., Eben, E.B., Sandhaug, M., Schulz-Wendtland, R., Stoeblen, F.: Interpretation of automated breast ultrasound (ABUS) with and without knowledge of mammography: a reader performance study. Acta Radiologica 56(4), 404-412 (2015).
8. Jesneck, J.L., Lo, J.Y., Baker, J.A.: Breast mass lesions: computer-aided diagnosis models with mammographic and sonographic descriptors. Radiology 244(2), 390 (2007).
9. Chougrad, H., Zouaki, H., Alheyane, O.: Deep convolutional neural networks for breast cancer screening. Comput Methods Programs Biomed 157, 19-30 (2018).
10. Cheng, J.-Z., Ni, D., Chou, Y.-H., Qin, J., Tiu, C.-M., Chang, Y.-C., et al.: Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci Rep 6, 24454 (2016).
11. Byra, M., Piotrzkowska-Wroblewska, H., Dobruch-Sobczak, K., Nowicki, A.: Combining Nakagami imaging and convolutional neural network for breast lesion classification. IEEE International Ultrasonics Symposium (IUS) (2017).
12. Perek, S., Kiryati, N., Zimmerman-Moreno, G., Sklair-Levy, M., Konen, E., Mayer, A.: Classification of contrast-enhanced spectral mammography (CESM) images. Int J Comput Assist Radiol Surg 14, 249-257 (2019).
13. American College of Radiology: ACR BI-RADS Atlas, 5th Edition, 125-143 (2013).
14. Hadad, O., Bakalo, R., Ben-Ari, R., Hashoul, S., Amit, G.: Classification of breast lesions using cross-modal deep learning. IEEE International Symposium on Biomedical Imaging (ISBI) (2017).
15. Cong, J., Wei, B., He, Y., Yin, Y., Zheng, Y.: A selective ensemble classification method combining mammography images with ultrasound images for breast cancer diagnosis. Computational and Mathematical Methods in Medicine 2017 (2017).
16. Shaikh, T.A., Ali, R., Beg, M.M.S.: Transfer learning privileged information fuels CAD diagnosis of breast cancer. Machine Vision and Applications 31(1), 9 (2020).
17. Wu, N., Phang, J., Park, J., Shen, Y., Huang, Z., Zorin, M., Jastrzębski, S., et al.: Deep neural networks improve radiologists'