Context-Aware Learning using Transferable Features for Classification of Breast Cancer Histology Images
Ruqayya Awan⋆, Navid Alemi Koohbanani⋆, Muhammad Shaban, Anna Lisowska, and Nasir Rajpoot

Department of Computer Science, University of Warwick, Coventry, UK
The Alan Turing Institute, London, UK
Department of Pathology, University Hospitals Coventry & Warwickshire, UK

⋆ Joint co-authors
Abstract.
Convolutional neural networks (CNNs) have recently been used for a variety of histology image analysis tasks. However, the availability of a large dataset is a major prerequisite for training a CNN, which limits its use by the computational pathology community. In previous studies, CNNs have demonstrated their potential in terms of feature generalizability and transferability, accompanied by better performance. Considering these traits of CNNs, we propose a simple yet effective method which leverages the strengths of CNNs combined with the advantages of including contextual information, particularly designed for a small dataset. Our method consists of two main steps: first, it uses the activation features of a CNN trained for patch-based classification; it then trains a separate classifier on the features of overlapping patches to perform image-based classification using the contextual information. The proposed framework outperformed the state-of-the-art method for breast cancer classification.
Keywords:
Digital pathology, Convolutional neural network, Context-aware learning, Transferable features, Breast cancer
Breast cancer is the most commonly diagnosed cancer in women and, after lung cancer, the second most common cause of cancer mortality [1]. Due to the increased incidence of breast cancer and the subjectivity in its diagnosis, there is an increasing demand for automated systems. To this end, deep neural networks (DNNs) have been widely used to produce state-of-the-art results for a variety of histology image analysis tasks such as nuclei detection and classification [2], tissue classification [3,4] and segmentation [5,6]. The CAMELYON16 challenge [6] is the best demonstration of using deep learning for automatic tissue analysis, outperforming pathologists in terms of detection of tumors within whole slide images (WSIs). The objective of this challenge was to automatically detect metastases in haematoxylin and eosin
(H&E) stained WSIs of lymph node sections. Cruz-Roa et al. [3] presented a deep learning architecture for automated basal carcinoma detection. This method first learns an image representation via an autoencoder and then applies a CNN on this representation to capture both translation-invariant features and a compact image representation. Spanhol et al. [7] applied a simple CNN for classifying the BreaKHis database [8], consisting of microscopic images of benign and malignant breast tumor biopsies. Small patches were extracted at different magnification levels to train the network and, during inference, the final output was produced by combining the predictions of the small patches.

The generalizability property of DNNs makes their features transferable to other applications, which has encouraged researchers to employ transfer learning for histology images as in [5,9,10]. These features have also been used to train separate classifiers for prediction [11,12,13,14], which is particularly useful when there is not enough data for training a CNN from scratch. In some recent studies [15,16], a context-aware learning architecture has been introduced, in which a first CNN is trained using high pixel resolution patches to extract features at a cellular level; these features are then fed to a second CNN, stacked on top of the first, to expand the context from a single patch to a large tissue region. The experimental results of these studies suggest that contextual information plays a crucial role in identifying abnormalities in heterogeneous tissue structures.

Our contribution in this work is twofold. First, we propose to use CNN features as a generic descriptor for a small dataset, provided as part of a challenge dataset. We extract transferable features from a number of networks, each trained on a different dataset, for classification by a separate classifier trained on these features.
As our second contribution, we combine these features to learn the context of a large patch to improve our classification performance. To this end, we use transferable features for a block of consecutive patches to train an SVM model which classifies the H&E stained breast images into normal, benign, carcinoma in situ (CIS) and breast invasive carcinoma (BIC).
We used the dataset provided as part of the ICIAR 2018 challenge for the classification of breast cancer histology images. This dataset comprises 400 high resolution images of size 2048 × 1536 pixels, acquired at 200× magnification and stained with H&E. The pixel resolution for these images is 0.42 µm. Each image belongs to one of four classes: normal, benign, in situ carcinoma or invasive carcinoma. The ground truth was provided by two pathologists. To study the feature transferability of CNNs, we also experimented with the part of the challenge dataset provided for the segmentation task. Ten WSIs with coarse annotations were provided for this task. We extracted patches from these WSIs after manually refining the original annotations.

The challenge dataset for the classification task consists of the training images used in [14] along with 151 additional images. To evaluate the effectiveness of our proposed approach, we split the challenge dataset under two settings. In the first setting, we use the same images for training and testing as were used in [14], for a fair comparison. We included the additional images in our validation set while training the network. The test dataset contains two sets of images, with an equal number of images in each class. The testing data is not provided with the challenge data but has been made publicly available by the authors in two sets. The first test set contains 20 images, while the second set contains 16 images and is referred to as the test extended dataset in this paper. In the second setting, which is used for submission to the challenge, we combined the whole challenge dataset from task 1 with the test dataset and randomly split it into a 75% training and 25% validation set.

Regarding the implementation, we used a residual neural network with 50 layers for patch-based classification in TensorFlow. For context-aware image-based classification, a support vector machine (SVM) classifier with a radial basis function (RBF) kernel was used, implemented in MATLAB.
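The 75%/25% random split used in the second setting can be sketched as follows (a minimal Python sketch; the filenames and the `split_dataset` helper are illustrative, not the authors' code):

```python
import random

def split_dataset(image_paths, train_frac=0.75, seed=0):
    """Randomly split a list of image paths into training and validation sets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * train_frac)
    return paths[:n_train], paths[n_train:]

# Example: 400 challenge images split 75% / 25%, as in the second setting.
images = [f"img_{i:03d}.tiff" for i in range(400)]
train, val = split_dataset(images)
print(len(train), len(val))  # 300 100
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters when comparing configurations on such a small dataset.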
Further details on both these steps are given in the Methods section.
In this paper, we introduce an effective model for image-based classification using more contextual information, particularly for a small dataset. To this end, we design our model in two main steps: patch-based classification and context-aware image-based classification. The overall system architecture is shown in Figure 1.
Fig. 1.
Flow diagram of our classification framework. Twelve non-overlapping patches are extracted from the input image. An 8192-dimensional feature vector is then obtained for each patch using a trained ResNet. The class label for the overlapping blocks (2 × 2 patches) is then predicted using an SVM classifier.

Stain inconsistency of digitized WSIs is a significant issue affecting the performance of machine learning (ML) systems. The dataset provided for this challenge contains images with large stain variation. To this end, we performed stain normalization using the Reinhard method [17], available in our group's Stain Normalization Toolbox [18]. This method transforms the color distribution of an image to the color distribution of a target image by matching the mean and standard deviation of the source image to those of the target image. This transformation is carried out for each channel separately, in the Lab colorspace.
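The Reinhard matching step can be sketched as below (an illustrative Python/NumPy sketch that assumes the images have already been converted to the Lab colorspace; the RGB↔Lab conversion and the actual toolbox implementation [18] are omitted):

```python
import numpy as np

def reinhard_normalize(source_lab, target_lab):
    """Match the per-channel mean and std of a source image to a target image.

    Both inputs are float arrays of shape (H, W, 3) already in the Lab
    colorspace (a library such as scikit-image's rgb2lab/lab2rgb would
    typically handle the conversion).
    """
    src = source_lab.astype(np.float64)
    out = np.empty_like(src)
    for c in range(3):  # normalize each Lab channel independently
        s_mean, s_std = src[..., c].mean(), src[..., c].std()
        t_mean, t_std = target_lab[..., c].mean(), target_lab[..., c].std()
        out[..., c] = (src[..., c] - s_mean) / (s_std + 1e-8) * t_std + t_mean
    return out
```

After this transform, each channel of the source image has (up to floating-point error) the same mean and standard deviation as the corresponding channel of the target image.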
Fig. 2.
Output of stain normalization: A, B and C show the target image, the original image and the stain-normalized version of B, respectively.
ResNet [19], introduced in 2015 by Microsoft, has been shown to outperform several architectures including VGG [20], GoogLeNet [21], PReLU-net [22] and BN-Inception [23]. This network also outperformed the best performing networks by a significant margin for the classification of histopathology colorectal images [24]. The state-of-the-art results of ResNet on different datasets motivated us to use it for our patch-based classification. For our experiments, we used a ResNet with 50 layers. For network training, overlapping patches of size 512 × 512 pixels were extracted from the images. The network was trained for 16 epochs with a batch size of 12 and the best trained network was selected for further processing. The training was done using stochastic gradient descent with momentum set to 0.95. The learning rate was initially set to 0.001 and was decremented after each update. Due to the very small dataset, and also to make our network robust to feature transformation, we performed data augmentation involving random rotation (90 to 360 degrees in steps of 90 degrees) and flipping during the training stage.
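The rotation-and-flip augmentation described above can be sketched as follows (an illustrative NumPy sketch; `augment` is a hypothetical helper, not part of the authors' TensorFlow pipeline):

```python
import numpy as np

def augment(patch, rng):
    """Randomly rotate a patch by a multiple of 90 degrees and randomly
    flip it, mirroring the augmentation described above.

    `patch` is an (H, W, C) array; `rng` is a numpy Generator.
    """
    k = rng.integers(1, 5)        # 1..4 quarter-turns: 90, 180, 270 or 360 degrees
    patch = np.rot90(patch, k=k)
    if rng.random() < 0.5:        # random horizontal flip
        patch = patch[:, ::-1, :]
    return patch
```

Because rotations are restricted to multiples of 90 degrees, square patches keep their shape and no interpolation (and hence no pixel-value distortion) is introduced.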
The above patch-based classification network learns a limited contextual representation for each class by using small patches of size 512 × 512 pixels. To train a classifier with a larger context, we divided each image into twelve non-overlapping patches and, for each patch, we then extracted an 8192-dimensional feature vector from the last layer of our patch-based network. We then trained an SVM classifier on the flattened features of 2 × 2 blocks of patches.

For the evaluation of our proposed method, we experimented with different configurations to show the significance of contextual information and the effect of feature transferability using networks trained on different datasets, and also to compare our method with the results of [14]. Firstly, we experimented with the contextual information captured from blocks of patches of varying size. We trained the SVM with the context of 1 × 1 (512 × 512 pixels), 2 × 2 (1024 × 1024 pixels) and 3 × 3 (1536 × 1536 pixels) blocks. Our method achieved higher accuracy compared to [14], which demonstrates the capability of the contextual information for discriminating different classes.
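The patch-grid extraction and the flattening of overlapping 2 × 2 feature blocks can be sketched as follows (an illustrative NumPy sketch under the paper's settings of twelve 512 × 512 patches per 2048 × 1536 image and 8192-dimensional per-patch features; the function names are ours):

```python
import numpy as np

def extract_patch_grid(image, patch=512):
    """Split a 1536 x 2048 image into a 3 x 4 grid of non-overlapping
    512 x 512 patches (twelve patches, as described above)."""
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    return [image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            for r in range(rows) for c in range(cols)]

def block_features(feature_grid):
    """Flatten every overlapping 2 x 2 block of per-patch feature vectors.

    `feature_grid` has shape (rows, cols, d); the result has one
    (4 * d)-dimensional row per block, ready for SVM training.
    """
    rows, cols, d = feature_grid.shape
    blocks = [feature_grid[r:r + 2, c:c + 2].reshape(-1)
              for r in range(rows - 1) for c in range(cols - 1)]
    return np.stack(blocks)
```

For a 3 × 4 grid of 8192-dimensional features this yields six blocks of 32768 features per image; training the RBF-kernel SVM on them is omitted here (the paper uses MATLAB; `sklearn.svm.SVC(kernel="rbf")` would be the Python analogue).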
Fig. 3.
Summary of our experimental results. (a) Accuracy obtained using the context of blocks of various sizes, where Context(1 × 1), Context(2 × 2) and Context(3 × 3) represent contextual blocks of size 512 × 512, 1024 × 1024 and 1536 × 1536 pixels, respectively.

In this paper, we proposed a context-aware network for the automated classification of breast cancer histology images. The proposed method leverages the power of CNNs to encode the representation of a patch into a high dimensional space and uses a traditional machine learning method (an SVM) to aggregate the contextual information from the high dimensional features while having a limited dataset. Our proposed approach outperformed the existing methods proposed for the same task. The proposed method is not limited to the breast cancer classification task; it could be applied to other problems where both high resolution and contextual information are required to make an optimal prediction.
References
1. R. L. Siegel, K. D. Miller, and A. Jemal, "Cancer statistics, 2016," CA: A Cancer Journal for Clinicians, vol. 66, no. 1, pp. 7–30, 2016.
2. K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. Snead, I. A. Cree, and N. M. Rajpoot, "Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1196–1206, 2016.
3. A. Cruz-Roa, A. Basavanhally, F. González, H. Gilmore, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski, and A. Madabhushi, "Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks," in SPIE Medical Imaging, vol. 9041, p. 904103, International Society for Optics and Photonics, 2014.
4. D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck, "Deep learning for identifying metastatic breast cancer," arXiv preprint arXiv:1606.05718, 2016.
5. H. Chen, X. Qi, L. Yu, and P.-A. Heng, "DCAN: Deep contour-aware networks for accurate gland segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2487–2496, 2016.
6. B. E. Bejnordi, M. Veta, P. J. van Diest, B. van Ginneken, N. Karssemeijer, G. Litjens, J. A. van der Laak, M. Hermsen, Q. F. Manson, M. Balkenhol, et al., "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer," JAMA, vol. 318, no. 22, pp. 2199–2210, 2017.
7. F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, "Breast cancer histopathological image classification using convolutional neural networks," in Neural Networks (IJCNN), 2016 International Joint Conference on, pp. 2560–2567, IEEE, 2016.
8. F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, "A dataset for breast cancer histopathological image classification," IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1455–1462, 2016.
9. N. Bayramoglu and J. Heikkilä, "Transfer learning for cell nuclei classification in histopathology images," in Computer Vision–ECCV 2016 Workshops, pp. 532–539, Springer, 2016.
10. Z. Han, B. Wei, Y. Zheng, Y. Yin, K. Li, and S. Li, "Breast cancer multi-classification from histopathological images with structured deep learning model," Scientific Reports, vol. 7, no. 1, p. 4172, 2017.
11. Y. Xu, Z. Jia, L.-B. Wang, Y. Ai, F. Zhang, M. Lai, I. Eric, and C. Chang, "Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features," BMC Bioinformatics, vol. 18, no. 1, p. 281, 2017.
12. M. Valkonen, K. Kartasalo, K. Liimatainen, M. Nykter, L. Latonen, and P. Ruusuvuori, "Dual structured convolutional neural network with feature augmentation for quantitative characterization of tissue histology," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 27–35, 2017.
13. Y. Xu, Z. Jia, Y. Ai, F. Zhang, M. Lai, I. Eric, and C. Chang, "Deep convolutional activation features for large scale brain tumor histopathology image classification and segmentation," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 947–951, IEEE, 2015.
14. T. Araújo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, A. Polónia, and A. Campilho, "Classification of breast cancer histology images using convolutional neural networks," PLoS ONE, vol. 12, no. 6, p. e0177544, 2017.
15. A. Agarwalla, M. Shaban, and N. M. Rajpoot, "Representation-aggregation networks for segmentation of multi-gigapixel histology images," arXiv preprint arXiv:1707.08814, 2017.
16. B. E. Bejnordi, G. Zuidhof, M. Balkenhol, M. Hermsen, P. Bult, B. van Ginneken, N. Karssemeijer, G. Litjens, and J. van der Laak, "Context-aware stacked convolutional neural networks for classification of breast carcinomas in whole-slide histopathology images," arXiv preprint arXiv:1705.03678, 2017.
17. E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, "Color transfer between images," IEEE Computer Graphics and Applications, vol. 21, no. 5, pp. 34–41, 2001.
18. A. M. Khan, N. Rajpoot, D. Treanor, and D. Magee, "A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution," IEEE Transactions on Biomedical Engineering, vol. 61, no. 6, pp. 1729–1738, 2014.
19. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
20. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
21. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
22. K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034, 2015.
23. S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, pp. 448–456, 2015.
24. B. Korbar, A. M. Olofson, A. P. Miraflor, K. M. Nicka, M. A. Suriawinata, L. Torresani, A. A. Suriawinata, and S. Hassanpour, "Deep-learning for classification of colorectal polyps on whole-slide images," arXiv preprint arXiv:1703.01550, 2017.