Texture CNN for Histopathological Image Classification
Jonathan de Matos∗†, Alceu de S. Britto Jr.†‡, Luiz E. S. de Oliveira§ and Alessandro L. Koerich∗

†Universidade Estadual de Ponta Grossa, Brazil
‡Pontifícia Universidade Católica do Paraná, Brazil
§Universidade Federal do Paraná, Brazil
∗École de Technologie Supérieure, Montreal, Canada

Abstract: Biopsies are the gold standard for breast cancer diagnosis. This task can be improved by the use of Computer Aided Diagnosis (CAD) systems, reducing the time of diagnosis and reducing the inter- and intra-observer variability. The advances in computing have brought this type of system closer to reality. However, datasets of Histopathological Images (HI) from biopsies are quite small and unbalanced, which makes it difficult to use modern machine learning techniques such as deep learning. In this paper we propose a compact architecture based on texture filters that has fewer parameters than traditional deep models but is able to capture the difference between malignant and benign tissues with relative accuracy. The experimental results on the BreaKHis dataset have shown that the proposed texture CNN achieves almost 90% of accuracy for classifying benign and malignant tissues.
Keywords: Deep learning, texture, histopathological images, breast cancer.
I. INTRODUCTION
Current hardware capabilities and computing technologies provide the ability to solve problems in many fields. The medical field is a noble application of technology, as it can help to improve populations' health and quality of life. Medical diagnosis is a good example of the application of computing. One type of diagnosis is based on the analysis of images acquired from imaging devices such as Magnetic Resonance Imaging (MRI), X-rays, Computed Tomography (CT) or Ultrasound. On the other hand, a Histopathologic Image (HI) is another kind of medical image, obtained by means of microscopy of tissues from biopsies, which gives specialists the ability to observe tissue characteristics on a cell basis [1].

Imaging exams like mammography, ultrasound or CT can show the presence of masses growing in breast tissue, but the confirmation of the type of tumor can only be accomplished by a biopsy. However, biopsy is a time-consuming process that involves several steps: the acquisition procedure (e.g. fine needle aspiration or surgical open biopsy); tissue processing (creation of the slide with the staining process); and a final analysis of the slide by a pathologist. The pathologist's analysis is a highly specialized and time-consuming task prone to inter- and intra-observer discordance [2]. The variance in the analysis process can be caused by the staining with Hematoxylin and Eosin (H&E), which is the most common and accessible stain, but which can produce different color intensities depending on the brand, the storage time and the temperature. In this context, Computer Aided Diagnosis (CAD) may increase pathologists' throughput and improve the confidence of results by reducing observer subjectivity and assuring repeatability.

Recently, deep learning methods like Convolutional Neural Networks (CNN) have gained attention from the scientific community due to the state-of-the-art results achieved in several image classification tasks. However, CNNs usually have hundreds of thousands or even millions of trainable parameters and, to learn a good model, they require large amounts of training data [3]. Therefore, it is not straightforward to use such deep models with HIs due to the scarcity of data. Usually, HI datasets such as BreaKHis [4], CRC [5] and HICL [6] have few patients and consequently the number of images is very low. Basically, two approaches can be used to circumvent the data scarcity and allow the use of deep models in HI tasks: data augmentation or transfer learning [3]. For data augmentation, low-level transformations such as affine transforms are usually applied to generate modified images, avoiding other morphological operations that could insert biases into the classification process. Spanhol et al. [7] used a patching procedure that consists of cropping low-resolution regions (e.g. 100 × 100 pixels) from the original images.

The remainder of this paper is organized as follows. Section II presents the proposed approach. Section III presents the experimental results achieved by the proposed texture CNN as well as by other complex architectures that require data augmentation. In the last section we present our conclusions, perspectives and ideas for future work.

II. PROPOSED APPROACH
The BreaKHis dataset is composed of 7,909 histopathological images of 82 patients, labeled as malignant or benign breast tumors [4]. Each image also has a tumor type label, where four types are malignant and four types are benign, as presented in Table I. The dataset is imbalanced by a factor of roughly seven in the worst case, which means, e.g., that ductal carcinoma (malignant) has about seven times more images than adenosis (benign).
Table I
IMAGE AND PATIENT DISTRIBUTION OF THE BREAKHIS DATASET

Class      Tumor type            Images  Patients
Benign     Adenosis                 444         4
           Fibroadenoma            1014        10
           Phyllodes tumor          453         3
           Tubular adenoma          569         7
           Total                   2368        24
Malignant  Ductal carcinoma        3451        38
           Lobular carcinoma        626         5
           Mucinous carcinoma       792         9
           Papillary carcinoma      560         6
           Total                   5429        58
The images are Hematoxylin & Eosin stained slices of tissue with 700 × 460 pixels. For all patients there are images at four magnification factors: 40×, 100×, 200× and 400×, which are equivalent to 0.49, 0.20, 0.10 and 0.05 µm per pixel, respectively. The different magnifications represent the enlargement of regions of interest selected by the pathologist during the analysis.

HIs do not have the same shapes found in the large-scale image datasets commonly used to train CNNs, such as ImageNet or CIFAR. Therefore, instead of using pre-trained CNNs, we propose an architecture that is more suitable to capture the texture-like features present in HIs. For such an aim, we use an alternative architecture based on the texture CNN proposed by Andrearczyk and Whelan [9]. It consists of only two convolutional layers (Conv2D), an average pooling layer (AvgPool2D) over the entire feature map, also called global average pooling, and fully connected layers (Dense). The ReLU activation function is used in all convolutional and dense layers, except at the last layer, where the softmax activation function is used. This architecture, named TCNN, is described in Table II. One of the main advantages of such an architecture is that it leads to a very compact network, since it has about 11,900 trainable parameters. Besides capturing texture information, this architecture also addresses one of the main problems of using deep learning architectures with small-size datasets, because the amount of data required to train such a network is not so high. In Table II, kernel refers to the size of the convolutional filter, stride is the step of the filter, i.e., how many pixels the filter shifts at each operation, and size is the size of the feature map resulting from each layer operation.

We also propose a second architecture, named TCNN Inception, which is based both on the texture CNN and on the Inception V3 CNN. This architecture has parallel filters with different kernel sizes, like an Inception CNN, which are concatenated in a subsequent layer (Concatenation). This architecture, described in Table III, is more complex than the previous one due to the greater number of convolutional filters, which increases the number of trainable parameters to 1,252,392.

For both texture CNNs, the shape of the input image is defined as 350 × 230 pixels. The images of the BreaKHis dataset have 700 × 460 pixels, but in texture analysis using CNNs, halving the dimensions of the image does not impact the accuracy significantly. While the final prediction results do not suffer a large impact, the memory and processing requirements are reduced, making the approach more convenient for training.
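Halving the image dimensions is a simple preprocessing step. As an illustration, the following sketch reads one BreaKHis image with TensorFlow and downscales it to 350 × 230 pixels; the decoding and scaling choices are assumptions for illustration, not the authors' original pipeline.

import tensorflow as tf

def load_and_downscale(path):
    # Read a BreaKHis image (700 x 460) and halve it to 350 x 230.
    raw = tf.io.read_file(path)
    img = tf.image.decode_png(raw, channels=3)           # H x W x C, uint8
    img = tf.image.convert_image_dtype(img, tf.float32)  # scale to [0, 1]
    # tf.image.resize takes (height, width): 460 x 700 -> 230 x 350
    return tf.image.resize(img, (230, 350), method="bilinear")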
Table II
ARCHITECTURE OF THE PROPOSED TEXTURE CNN (TCNN)

#  Type of layer  Kernel     Stride  Size
1  Conv2D         3 × 3      1 × 1   348 × 228 × 32
2  Conv2D         3 × 3      1 × 1   346 × 226 × 32
3  AvgPool2D      346 × 226  1 × 1   1 × 1 × 32
4  Flatten        -          -       32
5  Dense          -          -       32
6  Dense          -          -       16
7  Dense          -          -       2
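To make Table II concrete, the sketch below builds a network with the same layer sizes using the Keras functional API: two 3 × 3 convolutions with 32 filters, global average pooling over the final feature map, and dense layers of 32, 16 and 2 units. Padding and other details not listed in the table are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def build_tcnn():
    # Input: a BreaKHis image halved to 350 x 230 (height x width = 230 x 350)
    inputs = tf.keras.Input(shape=(230, 350, 3))
    x = layers.Conv2D(32, (3, 3), activation="relu")(inputs)  # -> 348 x 228 x 32
    x = layers.Conv2D(32, (3, 3), activation="relu")(x)       # -> 346 x 226 x 32
    x = layers.GlobalAveragePooling2D()(x)  # average over the whole feature map
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dense(16, activation="relu")(x)
    outputs = layers.Dense(2, activation="softmax")(x)  # benign vs. malignant
    return tf.keras.Model(inputs, outputs)

build_tcnn().summary()  # about 12k trainable parameters, in line with Table IV

The global average pooling layer is what gives the network its texture orientation: it discards the spatial layout of the feature map and keeps only the mean response of each filter.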
Finally, for comparison purposes, we have also used an Inception V3 network [10]. However, such a network has more than 23 million trainable parameters and cannot be fully trained with HIs due to the limited number of images available in the dataset (Table I). To circumvent this problem, we fine-tuned an Inception V3 network pre-trained on the ImageNet dataset [10]. The fine-tuning process consists of freezing some layers of the network during the training process to reduce the number of trainable parameters (and the amount of data required to adjust these parameters). Usually, the layers in charge of learning a representation (convolutional layers) are kept frozen and the layers devoted to classification are trained on the target dataset. The assumption is that the convolutional layers were properly trained on the large dataset, so they are able to provide a meaningful representation of the input image in terms of relevant features. Different from the previous networks, the pre-trained Inception V3 requires input images of 299 × 299 pixels, so the HIs must be resized to this input shape, a procedure which was used in some previous works [7].
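A minimal Keras sketch of this fine-tuning strategy is shown below: the ImageNet pre-trained convolutional base is frozen and only a new two-class head is trained. The head layout and compile settings are assumptions for illustration; as noted in Section III, the whole network was ultimately fine-tuned, which corresponds to unfreezing the base afterwards.

import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained convolutional base without the original 1000-class ImageNet head
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the representation layers

inputs = tf.keras.Input(shape=(299, 299, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation="softmax")(x)  # new two-class head
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adadelta",
              loss="categorical_crossentropy", metrics=["accuracy"])
# To fine-tune the whole network instead, set base.trainable = True
# and recompile with a small learning rate.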
Table III
ARCHITECTURE OF THE PROPOSED TEXTURE CNN BASED ON THE INCEPTION CNN (TCNN INC)

#  Type of layer            Kernel  Stride  Size
1  Conv2D                   1 × 1   1 × 1   350 × 230 × 32
2  Conv2D                   3 × 3   1 × 1   350 × 230 × 32
3  Conv2D                   5 × 5   1 × 1   350 × 230 × 32
4  Concatenation (1, 2, 3)  -       -       350 × 230 × 96
5  Conv2D                   1 × 1   1 × 1   350 × 230 × 64
6  Conv2D                   3 × 3   1 × 1   350 × 230 × 64
7  Conv2D                   5 × 5   1 × 1   350 × 230 × 64
8  Concatenation (5, 6, 7)  -       -       350 × 230 × 192
⋮  (global average pooling over the 350 × 230 feature map and dense layers ending in a 2-unit softmax output; the remaining rows are not legible in the source)
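Based on the recoverable rows of Table III, the sketch below illustrates the inception-style blocks of the TCNN Inception in Keras: parallel 1 × 1, 3 × 3 and 5 × 5 convolutions over the same input, concatenated along the channel axis, followed by global average pooling. The 'same' padding and the head after the pooling layer are assumptions, since part of the table could not be recovered.

import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, filters):
    # Parallel 1x1, 3x3 and 5x5 convolutions, concatenated (cf. Table III)
    b1 = layers.Conv2D(filters, (1, 1), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, (5, 5), padding="same", activation="relu")(x)
    return layers.Concatenate()([b1, b3, b5])

inputs = tf.keras.Input(shape=(230, 350, 3))
x = inception_block(inputs, 32)  # rows 1-4: -> 96 channels
x = inception_block(x, 64)       # rows 5-8: -> 192 channels
x = layers.GlobalAveragePooling2D()(x)  # texture-style global pooling
outputs = layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)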
Table IV
COMPLEXITY OF THE CNN MODELS

Model              Number of Trainable Parameters
TCNN                                       11,900
TCNN Inc                                1,252,392
Inception V3 [3]                       23,851,784
AlexNet [7]                            62,378,344
III. EXPERIMENTAL RESULTS
The three deep networks described in the previous section were evaluated on the BreaKHis dataset using the experimental protocol proposed by Spanhol et al. [4], which uses five 30%/70% (test/training) hold-outs with repetition. Furthermore, we also split the training folds into training (85%) and validation (15%) subsets. The experiments were carried out only with images at the 200× magnification factor to limit the number of experiments. It is also worth noticing that the dataset split into training, validation and test does not strictly respect the 60%-10%-30% proportion for the three subsets. The reason is that the data split is patient-wise, to avoid having images of the same patient in both the training and test sets.

The deep networks were trained for 120 epochs using the Adadelta optimizer and an early stopping mechanism, which stops the training when the accuracy on the validation set remains stable for 15 iterations. We chose 120 epochs empirically by observing the training convergence. The Inception V3 was initialized with the ImageNet weights and we fine-tuned the whole network. TCNN and TCNN Inception were initialized with random weights.

The data augmentation mechanism progressively increases the number of generated images from 6× to 72×, since one of our goals is to evaluate the impact of the amount of data on the accuracy of the networks. For data augmentation we used composed random affine transforms including flipping, rotation, and translation. Table V presents the mean accuracy at the patient level over five repetitions: for each patient, the fraction of correctly classified images is computed, and the final accuracy is the average of these fractions over all patients. Table V also presents the specificity and sensitivity.
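Because the split is patient-wise, every hold-out must keep all images of a patient on the same side of the split. The sketch below shows such a grouped split with scikit-learn; the toy arrays and the choice of GroupShuffleSplit are assumptions for illustration, not the authors' code.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: one entry per image (real code would index BreaKHis files)
labels = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
patient_ids = np.array([1, 1, 2, 2, 3, 4, 5, 6, 7, 7])
images = np.arange(len(labels))  # placeholder for image references

# Five 70%/30% hold-outs, grouped by patient to avoid leakage
outer = GroupShuffleSplit(n_splits=5, test_size=0.30, random_state=42)
inner = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=42)

for train_idx, test_idx in outer.split(images, labels, groups=patient_ids):
    # Split the training fold again into training (85%) and validation (15%)
    tr, val = next(inner.split(images[train_idx], labels[train_idx],
                               groups=patient_ids[train_idx]))
    train_idx, val_idx = train_idx[tr], train_idx[val]
    assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])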
Table V
ACCURACY AT THE PATIENT LEVEL, SENSITIVITY AND SPECIFICITY FOR TCNN, TCNN INC AND INCEPTION V3 WITHOUT DATA AUGMENTATION (1×) AND WITH FIVE LEVELS OF DATA AUGMENTATION (6× TO 72×). RESULTS ARE GIVEN AS THE MEAN AND THE STANDARD DEVIATION OVER FIVE FOLDS.

Table V shows that increasing the number of images for training the networks leads to a slight improvement in accuracy for Inception V3 and TCNN Inc. For the TCNN, based on the critical distance (CD) graph shown in Figure 1, we can infer that the results without data augmentation (1×) and with 12× augmentation are not statistically different from the TCNN using 72× data augmentation, which means that such a compact CNN can be well trained with a small dataset and it is not worth using more images. Overall, the Inception V3 with 72× and 12× data augmentation is not statistically different from the TCNN Inc with 72×, as shown in the critical distance (CD) graph of Figure 2. Surprisingly, the TCNN trained without data augmentation (TCNN 1×) provides the fifth-best accuracy.

Figure 1. Critical distance graph for the TCNN based on patient-level accuracy, obtained from the results of the Nemenyi test.

Figure 2. Critical distance graph between the two best patient-level accuracies of each network, obtained from the results of the Nemenyi test.

Table VI compares the performance of the approaches proposed and evaluated in this paper with the state-of-the-art for the BreaKHis dataset. The Inception V3 trained with 72× augmentation outperformed the MI approach [11], which currently achieves the best performance for the 200× magnification factor. The MI approach also provides the best overall result for other magnification factors, reaching 92.1% for 40× magnification. Surprisingly, we achieved 85.1% of accuracy using the TCNN without data augmentation, a performance comparable to the baseline, which employs an AlexNet CNN with millions of trainable parameters.
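For reference, pairwise comparisons like those in Figures 1 and 2 can be produced with a Friedman test followed by the Nemenyi post-hoc test. The sketch below uses toy accuracies (not the paper's results) and assumes the third-party scikit-posthocs package is available.

import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # third-party package, assumed available

# Rows: five hold-out folds; columns: competing configurations (toy numbers)
acc = np.array([[0.85, 0.86, 0.88],
                [0.84, 0.87, 0.89],
                [0.86, 0.85, 0.87],
                [0.83, 0.86, 0.88],
                [0.85, 0.88, 0.90]])

stat, p = friedmanchisquare(acc[:, 0], acc[:, 1], acc[:, 2])
print(f"Friedman p-value: {p:.3f}")
print(sp.posthoc_nemenyi_friedman(acc))  # pairwise p-values between columns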
Table VI
COMPARISON WITH THE STATE-OF-THE-ART RESULTS FOR THE BREAKHIS DATASET. VALUES REPRESENT THE ACCURACY ON THE TWO-CLASS PROBLEM (MALIGNANT OR BENIGN).

Approach                   Accuracy (%)
CNN (AlexNet) [7]                  84.6
TCNN (DA 1×)                       85.1
TCNN Inc (DA 72×)                     -
Inception V3 FT (DA 72×)           87.4

IV. CONCLUSION
In this paper we proposed a texture CNN to deal with the problem of histopathological image classification. The proposed TCNN exploits the texture characteristics of HIs and has a reduced number of trainable parameters compared to other CNN architectures. Although the TCNN did not outperform a fine-tuned Inception V3 on the two-class problem (benign versus malignant), it has about 2,000× fewer trainable parameters than an Inception CNN. Therefore, this opens up the possibility of exploiting this architecture in other related problems where the size of the dataset is relatively small.

Finally, simply increasing the number of samples using low-level transformations does not seem to improve the performance of TCNNs. As future work, we also need to look at the quality of the generated samples, searching for samples that may lead to a meaningful improvement.

REFERENCES

[1] J. de Matos, A. S. Britto Jr., L. E. S. Oliveira, and A. L. Koerich, "Histopathologic image processing: A review," 2019.
[2] J.-P. Bellocq et al., "Sécuriser le diagnostic en anatomie et cytologie pathologiques en 2011. L'erreur diagnostique : entre discours et réalité," Annales de Pathologie, vol. 31, no. 5, pp. S92-S94, 2011.
[3] J. de Matos, A. S. Britto Jr., L. E. S. Oliveira, and A. L. Koerich, "Double transfer learning for breast cancer histopathologic image classification," in IEEE Intl Joint Conf Neural Netw, 2019.
[4] F. A. Spanhol, L. E. S. Oliveira, C. Petitjean, and L. Heutte, "A dataset for breast cancer histopathological image classification," IEEE Trans Biomed Eng, vol. 63, no. 7, pp. 1455-1462, 2016.
[5] J. N. Kather, C.-A. Weis, F. Bianconi, S. M. Melchers, L. R. Schad, T. Gaiser, A. Marx, and F. G. Zöllner, "Multi-class texture analysis in colorectal cancer histology," Sci Reports, vol. 6, no. 1, p. 27988, 2016.
[6] Kostopoulos et al., "Computer-based association of the texture of expressed estrogen receptor nuclei with histologic grade using immunohistochemically-stained breast carcinomas," Anal Quant Cytol Hist, vol. 31, no. 4, pp. 187-196, 2009.
[7] F. A. Spanhol, L. E. S. Oliveira, C. Petitjean, and L. Heutte, "Breast cancer histopathological image classification using convolutional neural networks," in IEEE Intl Joint Conf Neural Netw, 2016, pp. 2560-2567.
[8] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," Intl J Comp Vis, vol. 115, no. 3, pp. 211-252, 2015.
[9] V. Andrearczyk and P. F. Whelan, "Using filter banks in convolutional neural networks for texture classification," Patt Recog Lett, vol. 84, pp. 63-69, 2016.
[10] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception architecture for computer vision," in CVPR, 2016.
[11] P. J. Sudharshan, C. Petitjean, F. A. Spanhol, L. E. S. Oliveira, L. Heutte, and P. Honeine, "Multiple instance learning for histopathological breast cancer image classification," Exp Sys Appl, vol. 117, pp. 103-111, 2019.
[12] F. A. Spanhol, L. E. S. Oliveira, P. R. Cavalin, C. Petitjean, and L. Heutte, "Deep features for breast cancer histopathological image classification," in Intl Conf SMC, 2017, pp. 1868-1873.
[13] Y. Song, J. J. Zou, H. Chang, and W. Cai, "Adapting Fisher vectors for histopathology image classification," in IEEE Intl Symp Biomed Imaging (ISBI), 2017.