Texture CNN for Histopathological Image Classification
Jonathan de Matos∗†, Alceu de S. Britto Jr.†‡, Luiz E. S. de Oliveira§ and Alessandro L. Koerich∗

†Universidade Estadual de Ponta Grossa, Brazil
‡Pontifícia Universidade Católica do Paraná, Brazil
§Universidade Federal do Paraná, Brazil
∗École de Technologie Supérieure, Montreal, Canada

Abstract: Biopsies are the gold standard for breast cancer diagnosis. This task can be improved by the use of Computer Aided Diagnosis (CAD) systems, reducing the time of diagnosis and reducing the inter- and intra-observer variability. The advances in computing have brought this type of system closer to reality. However, datasets of Histopathological Images (HI) from biopsies are quite small and unbalanced, which makes it difficult to use modern machine learning techniques such as deep learning. In this paper we propose a compact architecture based on texture filters that has fewer parameters than traditional deep models but is able to capture the difference between malignant and benign tissues with relative accuracy. The experimental results on the BreaKHis dataset have shown that the proposed texture CNN achieves almost 90% of accuracy for classifying benign and malignant tissues.
Keywords: Deep learning, texture, histopathological images, breast cancer.
I. INTRODUCTION
Current hardware capabilities and computing technologies provide the ability to solve problems in many fields. The medical field is a noble application of technology, as it can help to improve populations' health and quality of life. Medical diagnosis is a good example of the application of computing. One type of diagnosis is based on the analysis of images acquired from imaging devices such as Magnetic Resonance Imaging (MRI), X-rays, Computed Tomography (CT) or Ultrasound. On the other hand, a Histopathologic Image (HI) is another kind of medical image, obtained by means of microscopy of tissues from biopsies, which gives specialists the ability to observe tissue characteristics on a cell basis [1].

Imaging exams like mammography, ultrasound or CT can show the presence of masses growing in breast tissue, but the confirmation of the type of tumor can only be accomplished by a biopsy. However, biopsy is a time-consuming process that involves several steps: the acquisition procedure (e.g. fine needle aspiration or surgical open biopsy); tissue processing (creation of the slide with the staining process); and a final analysis of the slide by a pathologist. The pathologist's analysis is a highly specialized and time-consuming task prone to inter- and intra-observer discordance [2]. The variance in the analysis process can be caused by the staining with Hematoxylin and Eosin (H&E), which is the most common and accessible stain, but which can produce different color intensities depending on the brand, the storage time and the temperature. In this context, Computer Aided Diagnosis (CAD) may increase pathologists' throughput and improve the confidence of results by reducing observer subjectivity and assuring repeatability.

Recently, deep learning methods like Convolutional Neural Networks (CNN) have gained attention from the scientific community due to the state-of-the-art results achieved in several image classification tasks. However, CNNs usually have hundreds of thousands or even millions of trainable parameters and, to learn a good model, they require large amounts of training data [3]. Therefore, it is not straightforward to use such deep models with HIs due to the scarcity of data. Usually, HI datasets such as BreaKHis [4], CRC [5] and HICL [6] have few patients and consequently the number of images is very low. Basically, two approaches can be used to circumvent the data scarcity and allow the use of deep models in HI tasks: data augmentation or transfer learning [3]. For data augmentation, low-level transformations such as affine transforms are usually applied to generate modified images, avoiding other morphological operations that could insert biases into the classification process. Spanhol et al. [7] used a patching procedure that consists of cropping low-resolution regions (e.g. 100 × 100 pixels) from the original images.

The remainder of this paper is organized as follows. Section II presents the proposed approach. Section III presents the experimental results achieved by the proposed texture CNN as well as by other complex architectures that require data augmentation. In the last section we present our conclusions, perspectives and ideas for future work.

II. PROPOSED APPROACH
The BreaKHis dataset is composed of 7,909 histopathological images of 82 patients, labeled as malignant or benign breast tumors [4]. Each image also has a tumor type label, where four types are malignant and four types are benign, as presented in Table I. The dataset is imbalanced by a factor of roughly seven in the worst case, which means, e.g., that ductal carcinoma (malignant) has about seven times more images than adenosis (benign).
Table I
IMAGE AND PATIENT DISTRIBUTION OF THE BREAKHIS DATASET

Class      Tumor type            Images  Patients
Benign     Adenosis                 444         4
           Fibroadenoma            1014        10
           Phyllodes tumor          453         3
           Tubular adenoma          569         7
           Total                   2368        24
Malignant  Ductal carcinoma        3451        38
           Lobular carcinoma        626         5
           Mucinous carcinoma       792         9
           Papillary carcinoma      560         6
           Total                   5429        58
The images are Hematoxylin & Eosin stained slices of tissue with 700 × 460 pixels. For all patients there are images at four magnification factors: 40×, 100×, 200× and 400×, which are equivalent to 0.49, 0.20, 0.10 and 0.05 µm per pixel, respectively. The different magnifications represent the enlargement of regions of interest selected by the pathologist during the analysis.

HIs do not have the same shapes found in the large-scale image datasets commonly used to train CNNs, such as ImageNet or CIFAR. Therefore, instead of using pre-trained CNNs, we propose an architecture that is more suitable to capture the texture-like features present in HIs. For such an aim, we use an alternative architecture based on the texture CNN proposed by Andrearczyk and Whelan [9]. It consists of only two convolutional layers (Conv2D), an average pooling layer (AvgPool2D) over the entire feature map, also called global average pooling, and fully connected layers (Dense). The ReLU activation function is used in all convolutional and dense layers, except at the last layer, where the softmax activation function is used. This architecture, named TCNN, is described in Table II. One of the main advantages of such an architecture is that it leads to a very compact network, since it has about 11,900 trainable parameters. Besides capturing texture information, this architecture also addresses one of the main problems of using deep learning architectures with small-size datasets, because the amount of data required to train such a network is not so high. In Table II, kernel refers to the size of the convolutional filter, stride is the step of the filter, i.e., how many pixels the filter shifts at each operation, and size is the size of the feature map resulting from each layer operation.

We also propose a second architecture, named TCNN Inception, which is based both on the texture CNN and on the Inception V3 CNN. This architecture has parallel filters with different kernel sizes, like an Inception CNN, which are concatenated in a subsequent layer (Concatenation). This architecture, described in Table III, is more complex than the previous one due to the greater number of convolutional filters, which increases the number of trainable parameters to 1,252,392.

For both texture CNNs, the shape of the input image is defined as 350 × 230 pixels. The images of the BreaKHis dataset have 700 × 460 pixels, but in texture analysis using CNNs, halving the dimensions of the image does not impact the accuracy significantly. While the final prediction results do not suffer a large impact, the memory and processing requirements are reduced, making the approach more convenient for training.
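Halving the image dimensions is a simple preprocessing step. As an illustration, the following sketch reads one BreaKHis image with TensorFlow and downscales it to 350 × 230 pixels; the decoding and scaling choices are assumptions for illustration, not the authors' original pipeline.

import tensorflow as tf

def load_and_downscale(path):
    # Read a BreaKHis image (700 x 460) and halve it to 350 x 230.
    raw = tf.io.read_file(path)
    img = tf.image.decode_png(raw, channels=3)           # H x W x C, uint8
    img = tf.image.convert_image_dtype(img, tf.float32)  # scale to [0, 1]
    # tf.image.resize takes (height, width): 460 x 700 -> 230 x 350
    return tf.image.resize(img, (230, 350), method="bilinear")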
Table II
ARCHITECTURE OF THE PROPOSED TEXTURE CNN (TCNN)

#  Type of layer  Kernel     Stride  Size
1  Conv2D         3 × 3      1 × 1   348 × 228 × 32
2  Conv2D         3 × 3      1 × 1   346 × 226 × 32
3  AvgPool2D      346 × 226  1 × 1   1 × 1 × 32
4  Flatten        -          -       32
5  Dense          -          -       32
6  Dense          -          -       16
7  Dense          -          -       2
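To make Table II concrete, the sketch below builds a network with the same layer sizes using the Keras functional API: two 3 × 3 convolutions with 32 filters, global average pooling over the final feature map, and dense layers of 32, 16 and 2 units. Padding and other details not listed in the table are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def build_tcnn():
    # Input: a BreaKHis image halved to 350 x 230 (height x width = 230 x 350)
    inputs = tf.keras.Input(shape=(230, 350, 3))
    x = layers.Conv2D(32, (3, 3), activation="relu")(inputs)  # -> 348 x 228 x 32
    x = layers.Conv2D(32, (3, 3), activation="relu")(x)       # -> 346 x 226 x 32
    x = layers.GlobalAveragePooling2D()(x)  # average over the whole feature map
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dense(16, activation="relu")(x)
    outputs = layers.Dense(2, activation="softmax")(x)  # benign vs. malignant
    return tf.keras.Model(inputs, outputs)

build_tcnn().summary()  # about 12k trainable parameters, in line with Table IV

The global average pooling layer is what gives the network its texture orientation: it discards the spatial layout of the feature map and keeps only the mean response of each filter.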
Finally, for comparison purposes, we have also used an Inception V3 network [10]. However, such a network has more than 23 million trainable parameters and cannot be fully trained with HIs due to the limited number of images available in the dataset (Table I). To circumvent this problem, we fine-tuned an Inception V3 network pre-trained on the ImageNet dataset [10]. The fine-tuning process consists of freezing some layers of the network during the training process to reduce the number of trainable parameters (and the amount of data required to adjust these parameters). Usually, the layers in charge of learning a representation (convolutional layers) are kept frozen and the layers devoted to classification are trained on the target dataset. The assumption is that the convolutional layers were properly trained on the large dataset, so they are able to provide a meaningful representation of the input image in terms of relevant features. Different from the previous networks, the pre-trained Inception V3 requires input images of 299 × 299 pixels, so the HIs must be resized to this input shape, a procedure which was used in some previous works [7].
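A minimal Keras sketch of this fine-tuning strategy is shown below: the ImageNet pre-trained convolutional base is frozen and only a new two-class head is trained. The head layout and compile settings are assumptions for illustration; as noted in Section III, the whole network was ultimately fine-tuned, which corresponds to unfreezing the base afterwards.

import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained convolutional base without the original 1000-class ImageNet head
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the representation layers

inputs = tf.keras.Input(shape=(299, 299, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation="softmax")(x)  # new two-class head
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adadelta",
              loss="categorical_crossentropy", metrics=["accuracy"])
# To fine-tune the whole network instead, set base.trainable = True
# and recompile with a small learning rate.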
Table III
ARCHITECTURE OF THE PROPOSED TEXTURE CNN BASED ON THE INCEPTION CNN (TCNN INC)

#  Type of layer            Kernel  Stride  Size
1  Conv2D                   1 × 1   1 × 1   350 × 230 × 32
2  Conv2D                   3 × 3   1 × 1   350 × 230 × 32
3  Conv2D                   5 × 5   1 × 1   350 × 230 × 32
4  Concatenation (1, 2, 3)  -       -       350 × 230 × 96
5  Conv2D                   1 × 1   1 × 1   350 × 230 × 64
6  Conv2D                   3 × 3   1 × 1   350 × 230 × 64
7  Conv2D                   5 × 5   1 × 1   350 × 230 × 64
8  Concatenation (5, 6, 7)  -       -       350 × 230 × 192
⋮  (global average pooling over the 350 × 230 feature map and dense layers ending in a 2-unit softmax output; the remaining rows are not legible in the source)
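Based on the recoverable rows of Table III, the sketch below illustrates the inception-style blocks of the TCNN Inception in Keras: parallel 1 × 1, 3 × 3 and 5 × 5 convolutions over the same input, concatenated along the channel axis, followed by global average pooling. The 'same' padding and the head after the pooling layer are assumptions, since part of the table could not be recovered.

import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, filters):
    # Parallel 1x1, 3x3 and 5x5 convolutions, concatenated (cf. Table III)
    b1 = layers.Conv2D(filters, (1, 1), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, (5, 5), padding="same", activation="relu")(x)
    return layers.Concatenate()([b1, b3, b5])

inputs = tf.keras.Input(shape=(230, 350, 3))
x = inception_block(inputs, 32)  # rows 1-4: -> 96 channels
x = inception_block(x, 64)       # rows 5-8: -> 192 channels
x = layers.GlobalAveragePooling2D()(x)  # texture-style global pooling
outputs = layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)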
Table IV
COMPLEXITY OF THE CNN MODELS

Model              Number of Trainable Parameters
TCNN                                       11,900
TCNN Inc                                1,252,392
Inception V3 [3]                       23,851,784
AlexNet [7]                            62,378,344
III. EXPERIMENTAL RESULTS
The three deep networks described in the previous section were evaluated on the BreaKHis dataset using the experimental protocol proposed by Spanhol et al. [4], which uses five 30%/70% (test/training) hold-outs with repetition. Furthermore, we also split the training folds into training (85%) and validation (15%) subsets. The experiments were carried out only with images at the 200× magnification factor to limit the number of experiments. It is also worth noticing that the dataset split into training, validation and test does not strictly respect the 60%-10%-30% proportion for the three subsets. The reason is that the data split is patient-wise, to avoid having images of the same patient in both the training and test sets.

The deep networks were trained for 120 epochs using the Adadelta optimizer and an early stopping mechanism, which stops the training when the accuracy on the validation set remains stable for 15 iterations. We chose 120 epochs empirically by observing the training convergence. The Inception V3 was initialized with the ImageNet weights and we fine-tuned the whole network. TCNN and TCNN Inception were initialized with random weights.

The data augmentation mechanism progressively increases the number of generated images from 6× to 72×, since one of our goals is to evaluate the impact of the amount of data on the accuracy of the networks. For data augmentation we used composed random affine transforms including flipping, rotation, and translation. Table V presents the mean accuracy at the patient level over five repetitions: for each patient, the fraction of correctly classified images is computed, and the final accuracy is the average of these fractions over all patients. Table V also presents the specificity and sensitivity.
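Because the split is patient-wise, every hold-out must keep all images of a patient on the same side of the split. The sketch below shows such a grouped split with scikit-learn; the toy arrays and the choice of GroupShuffleSplit are assumptions for illustration, not the authors' code.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: one entry per image (real code would index BreaKHis files)
labels = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
patient_ids = np.array([1, 1, 2, 2, 3, 4, 5, 6, 7, 7])
images = np.arange(len(labels))  # placeholder for image references

# Five 70%/30% hold-outs, grouped by patient to avoid leakage
outer = GroupShuffleSplit(n_splits=5, test_size=0.30, random_state=42)
inner = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=42)

for train_idx, test_idx in outer.split(images, labels, groups=patient_ids):
    # Split the training fold again into training (85%) and validation (15%)
    tr, val = next(inner.split(images[train_idx], labels[train_idx],
                               groups=patient_ids[train_idx]))
    train_idx, val_idx = train_idx[tr], train_idx[val]
    assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])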
Table V
ACCURACY AT THE PATIENT LEVEL, SENSITIVITY AND SPECIFICITY FOR TCNN, TCNN INC AND INCEPTION V3 WITHOUT DATA AUGMENTATION (1×) AND WITH FIVE LEVELS OF DATA AUGMENTATION (6× TO 72×). RESULTS ARE GIVEN AS THE MEAN AND THE STANDARD DEVIATION OVER FIVE FOLDS.

Table V shows that increasing the number of images for training the networks leads to a slight improvement in accuracy for Inception V3 and TCNN Inc. For the TCNN, based on the critical distance (CD) graph shown in Figure 1, we can infer that the results without data augmentation (1×) and with 12× augmentation are not statistically different from the TCNN using 72× data augmentation, which means that such a compact CNN can be well trained with a small dataset and it is not worth using more images. Overall, the Inception V3 with 72× and 12× data augmentation is not statistically different from the TCNN Inc with 72×, as shown in the critical distance (CD) graph of Figure 2. Surprisingly, the TCNN trained without data augmentation (TCNN 1×) provides the fifth-best accuracy.

Figure 1. Critical distance graph for the TCNN based on patient-level accuracy, obtained from the results of the Nemenyi test.

Figure 2. Critical distance graph between the two best patient-level accuracies of each network, obtained from the results of the Nemenyi test.

Table VI compares the performance of the approaches proposed and evaluated in this paper with the state-of-the-art for the BreaKHis dataset. The Inception V3 trained with 72× augmentation outperformed the MI approach [11], which currently achieves the best performance for the 200× magnification factor. The MI approach also provides the best overall result for other magnification factors, reaching 92.1% for 40× magnification. Surprisingly, we achieved 85.1% of accuracy using the TCNN without data augmentation, a performance comparable to the baseline, which employs an AlexNet CNN with millions of trainable parameters.
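For reference, pairwise comparisons like those in Figures 1 and 2 can be produced with a Friedman test followed by the Nemenyi post-hoc test. The sketch below uses toy accuracies (not the paper's results) and assumes the third-party scikit-posthocs package is available.

import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # third-party package, assumed available

# Rows: five hold-out folds; columns: competing configurations (toy numbers)
acc = np.array([[0.85, 0.86, 0.88],
                [0.84, 0.87, 0.89],
                [0.86, 0.85, 0.87],
                [0.83, 0.86, 0.88],
                [0.85, 0.88, 0.90]])

stat, p = friedmanchisquare(acc[:, 0], acc[:, 1], acc[:, 2])
print(f"Friedman p-value: {p:.3f}")
print(sp.posthoc_nemenyi_friedman(acc))  # pairwise p-values between columns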
Table VI
COMPARISON WITH THE STATE-OF-THE-ART RESULTS FOR THE BREAKHIS DATASET. VALUES REPRESENT THE ACCURACY ON THE TWO-CLASS PROBLEM (MALIGNANT OR BENIGN).

Approach                   Accuracy (%)
CNN (AlexNet) [7]                  84.6
TCNN (DA 1×)                       85.1
TCNN Inc (DA 72×)                     -
Inception V3 FT (DA 72×)           87.4

IV. CONCLUSION
In this paper we proposed a texture CNN to deal with the problem of histopathological image classification. The proposed TCNN exploits the texture characteristics of HIs and has a reduced number of trainable parameters compared to other CNN architectures. Although the TCNN did not outperform a fine-tuned Inception V3 on the two-class problem (benign versus malignant), it has about 2,000× fewer trainable parameters than an Inception CNN. Therefore, this opens up the possibility of exploiting this architecture in other related problems where the size of the dataset is relatively small.

Finally, simply increasing the number of samples using low-level transformations does not seem to improve the performance of TCNNs. As future work, we also need to look at the quality of the generated samples, searching for samples that may lead to a meaningful improvement.

REFERENCES

[1] J. de Matos, A. S. Britto Jr., L. E. S. Oliveira, and A. L. Koerich, "Histopathologic image processing: A review," 2019.
[2] J.-P. Bellocq et al., "Sécuriser le diagnostic en anatomie et cytologie pathologiques en 2011. L'erreur diagnostique : entre discours et réalité," Annales de Pathologie, vol. 31, no. 5, pp. S92-S94, 2011.
[3] J. de Matos, A. S. Britto Jr., L. E. S. Oliveira, and A. L. Koerich, "Double transfer learning for breast cancer histopathologic image classification," in IEEE Intl Joint Conf Neural Netw, 2019.
[4] F. A. Spanhol, L. E. S. Oliveira, C. Petitjean, and L. Heutte, "A dataset for breast cancer histopathological image classification," IEEE Trans Biomed Eng, vol. 63, no. 7, pp. 1455-1462, 2016.
[5] J. N. Kather, C.-A. Weis, F. Bianconi, S. M. Melchers, L. R. Schad, T. Gaiser, A. Marx, and F. G. Zöllner, "Multi-class texture analysis in colorectal cancer histology," Sci Reports, vol. 6, no. 1, p. 27988, 2016.
[6] Kostopoulos et al., "Computer-based association of the texture of expressed estrogen receptor nuclei with histologic grade using immunohistochemically-stained breast carcinomas," Anal Quant Cytol Hist, vol. 31, no. 4, pp. 187-196, 2009.
[7] F. A. Spanhol, L. E. S. Oliveira, C. Petitjean, and L. Heutte, "Breast cancer histopathological image classification using convolutional neural networks," in IEEE Intl Joint Conf Neural Netw, 2016, pp. 2560-2567.
[8] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," Intl J Comp Vis, vol. 115, no. 3, pp. 211-252, 2015.
[9] V. Andrearczyk and P. F. Whelan, "Using filter banks in convolutional neural networks for texture classification," Patt Recog Lett, vol. 84, pp. 63-69, 2016.
[10] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception architecture for computer vision," in CVPR, 2016.
[11] P. J. Sudharshan, C. Petitjean, F. A. Spanhol, L. E. S. Oliveira, L. Heutte, and P. Honeine, "Multiple instance learning for histopathological breast cancer image classification," Exp Sys Appl, vol. 117, pp. 103-111, 2019.
[12] F. A. Spanhol, L. E. S. Oliveira, P. R. Cavalin, C. Petitjean, and L. Heutte, "Deep features for breast cancer histopathological image classification," in Intl Conf SMC, 2017, pp. 1868-1873.
[13] Y. Song, J. J. Zou, H. Chang, and W. Cai, "Adapting Fisher vectors for histopathology image classification," in IEEE Intl Symp Biomed Imaging (ISBI), 2017.