Multiscale Deep Neural Networks for Multiclass Tissue Classification of Histological Whole-Slide Images
Medical Imaging with Deep Learning 2019 – MIDL 2019 Extended Abstract Track
Rune Wetteland [email protected]
Kjersti Engan [email protected]
Trygve Eftestøl [email protected]
Vebjørn Kvikstad [email protected]
Emilius A.M. Janssen [email protected]
Department of Electrical Engineering and Computer Science, University of Stavanger, Norway
Department of Pathology, Stavanger University Hospital, Norway
Department of Mathematics and Natural Sciences, University of Stavanger, Norway
Abstract
Correct treatment of urothelial carcinoma patients is dependent on accurate grading and staging of the cancer tumour. This is determined manually by a pathologist by examining the histological whole-slide images (WSI). The large size of these images makes this a time-consuming and challenging task. The WSI contain a variety of tissue types, and a method for defining diagnostically relevant regions would have several advantages for visualization as well as for further input to automated diagnosis systems. We propose an automatic multiscale method for classification of tiles from WSI of urothelial carcinoma patients into six classes. Three architectures based on convolutional neural networks (CNN) were tested: MONO-CNN (400x), DI-CNN (100x/400x) and TRI-CNN (25x/100x/400x). The preliminary results show that the two multiscale models performed significantly better than the mono-scale model, achieving an F1-score of 0.986, substantiating that utilising multiple scales in the model aids the classification accuracy.
1. Introduction
Bladder cancer is the 10th most common cancer type worldwide (Bray et al., 2018). More than 90% of bladder cancer cases are urothelial carcinomas, which have a particularly high recurrence (50-70%) and progression rate (10-30%), making correct treatment and follow-up vital for survivability. Treatment is dependent on the cancer grade and stage, determined manually by an expert pathologist examining the histological whole-slide images (WSI). This is a time-consuming and challenging task, and studies have shown that it may have a low reproducibility in some cases, such as grading of urothelial carcinoma (Mangrud, 2014).

Examination of the WSI is challenging because of the large size of the image, which contains several different tissue types, of which only some are useful for diagnostic information. An automatic tool for identification of such regions would be beneficial both for guiding a pathologist to the useful areas of the large WSI during examination, and for ROI extraction of useful tissue for a computer-aided diagnostic solution. In this paper we present an automatic method for classification of tiles from WSI of urothelial carcinoma patients into the classes: urothelium, stroma, muscle, damaged tissue, blood and background. The tiles are extracted at different magnification levels, to combine and utilise information at different scales in a similar fashion to that of a pathologist.

Multiscale approaches to tile-based classification have previously been applied to other cancer types. In the work of Li et al. (2017), a multiscale U-Net was proposed for segmentation of histological images from radical prostatectomies to classify tiles into four classes. Tiles of size 100x100, 200x200 and 400x400 pixels were all extracted from histological images at 200x magnification. Features from the different tiles were then concatenated and used as input to the multiscale U-Net.
The model achieved a mean Jaccard index of 65.8% over the four classes. In Sirinukunwattana et al. (2018), five single-scale and five multiscale architectures were compared on two datasets. Their best model (G) was a multiscale model which achieved an average F1-score of 0.782.
2. Data Material
The data material consists of Hematoxylin Eosin Saffron (HES) stained WSI from patients diagnosed with primary papillary urothelial carcinoma, collected at the University Hospital of Stavanger, Norway. An expert pathologist has carefully annotated 239 selected regions from 50 WSI from 32 unique patients, where each region includes one of the five foreground classes. Regions belonging to the background class were annotated on seven randomly selected patients.

Tiles were extracted from these regions at 25x, 100x and 400x magnification in such a manner that the centre pixel is the same in all three tiles. All tiles have the same size of 128x128x3 pixels. Tiles belonging to the test set were extracted from patients not present in the training data. The remaining data was augmented to balance the dataset and was further randomly shuffled and split into 85% training and 15% validation data. A random seed was set to ensure that the shuffling was the same for each model. The final datasets consist of 128K training tiles, 23K validation tiles and 11K test tiles.
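The co-centred extraction described above can be sketched as a small coordinate mapping. This is an illustrative sketch, not the paper's actual extraction code: the function name and the assumption that 400x is the base resolution, with 100x and 25x as 4x and 16x downsamples, are ours.

```python
def tile_coordinates(centre_x, centre_y, tile_size=128,
                     magnifications=(400, 100, 25), base_mag=400):
    """Map one centre pixel (given in base-magnification coordinates) to
    the top-left corner and downsample factor of a co-centred tile at
    each magnification, so all tiles share the same centre pixel."""
    coords = {}
    for mag in magnifications:
        factor = base_mag // mag                  # 1 at 400x, 4 at 100x, 16 at 25x
        half_extent = (tile_size * factor) // 2   # half the tile width, in base-mag pixels
        coords[mag] = {
            "top_left": (centre_x - half_extent, centre_y - half_extent),
            "downsample": factor,
        }
    return coords
```

Reading each region from the WSI would then amount to cropping `tile_size * factor` base-magnification pixels at `top_left` and downsampling by `factor`, yielding a 128x128x3 tile at every scale.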
3. Method and Results
This paper compares three architectures referred to as the MONO-, DI- and TRI-CNN model. The three architectures have one (400x), two (100x, 400x) and three (25x, 100x, 400x) inputs, respectively. Each input is fed into a pre-trained VGG16 network (Simonyan and Zisserman, 2014) which acts as a feature extractor. The fully-connected (FC) layers of VGG16 are replaced with a classification network consisting of two FC-layers, each followed by a dropout layer, and a final softmax layer with one output node for each of the six classes. The DI-CNN and TRI-CNN models have two and three parallel VGG16 branches, respectively, which are concatenated before entering the classification network.

The FC-layers were tested with 512, 1024, 1536, 2048 and 4096 neurons, and dropout rates of 0, 0.3 and 0.5. This 15-model hyperparameter search was conducted on each of the three architectures, resulting in 45 models. These 45 models were run three consecutive times and the results averaged for a more accurate estimate. Each model was trained using early stopping, halting training if the validation loss did not decrease within 30 epochs. All model selections were based on the validation set performance. After training, the weight parameters from the epoch which performed best on the validation dataset were restored, and a final evaluation of the model was performed on the test dataset. The VGG16 networks had their weight parameters frozen during training. The model was written in Python 3.5 using the Keras machine learning library (Chollet et al., 2015).

Table 1 shows the hyperparameters for the best performing models and their average result from the three consecutive runs. The MONO-CNN model achieves a result similar to that of the autoencoder. The two multiscale models perform equally and significantly better than the mono-scale models. The multiscale models also have a lower standard deviation on all metrics.
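The multi-branch architecture can be sketched in modern Keras as follows. This is a hedged reconstruction from the description above, not the authors' code: the function name, default hyperparameters and use of `tensorflow.keras` (rather than the original Python 3.5 Keras setup) are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_multiscale_cnn(num_scales=2, num_classes=6, fc_neurons=2048,
                         dropout_rate=0.3, tile_size=128, weights="imagenet"):
    """Sketch of the MONO/DI/TRI-CNN: one frozen pre-trained VGG16 branch
    per scale, features concatenated into a small classification head."""
    inputs, features = [], []
    for i in range(num_scales):
        inp = layers.Input(shape=(tile_size, tile_size, 3), name=f"scale_{i}")
        base = VGG16(include_top=False, weights=weights,
                     input_shape=(tile_size, tile_size, 3))
        base._name = f"vgg16_scale_{i}"   # avoid duplicate branch names in one model
        base.trainable = False            # VGG16 weights are frozen during training
        features.append(layers.Flatten()(base(inp)))
        inputs.append(inp)
    # DI/TRI-CNN concatenate the branch features; MONO-CNN has a single branch.
    x = layers.Concatenate()(features) if num_scales > 1 else features[0]
    for _ in range(2):                    # two FC layers, each followed by dropout
        x = layers.Dense(fc_neurons, activation="relu")(x)
        x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)
```

A TRI-CNN instance would be created with `build_multiscale_cnn(num_scales=3)` and fed one 128x128x3 tile per scale, all sharing the same centre pixel.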
Since both multiscale models achieve the same result, one could argue that the simplest model of the two should be chosen. In that case, DI-CNN with its 36M parameters is a simpler model than TRI-CNN, which has 47M parameters in total. DI-CNN also has a marginally lower standard deviation than TRI-CNN.

Table 1: Models evaluated on the test set. F1-score is presented as the total average and standard deviation calculated across all six classes over three consecutive runs. Parameters are shown as no. of trainable parameters / no. of total parameters.

Model | Input Scale | Dropout | FC-Neurons | Parameters | F1-Score
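The 45-model search described in Section 3 can be enumerated as a simple grid. This sketch is illustrative only; the constant and function names are ours, not the paper's.

```python
from itertools import product

FC_NEURONS = (512, 1024, 1536, 2048, 4096)
DROPOUT_RATES = (0.0, 0.3, 0.5)
ARCHITECTURES = ("MONO-CNN", "DI-CNN", "TRI-CNN")

def hyperparameter_grid():
    """Enumerate the 45 configurations: 15 classifier-head settings
    (FC width x dropout rate) for each of the three architectures.
    Each configuration is trained three consecutive times with early
    stopping (patience of 30 epochs on validation loss), best-epoch
    weights restored, and the metrics averaged."""
    return [
        {"architecture": arch, "fc_neurons": fc, "dropout": dr}
        for arch, fc, dr in product(ARCHITECTURES, FC_NEURONS, DROPOUT_RATES)
    ]
```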
4. Conclusion
In this paper, we present preliminary results from a multiscale tile-based classification model. Tiles from six classes were extracted at multiple scales from WSI of patients diagnosed with urothelial carcinoma. Three model architectures were compared: MONO-CNN (400x), DI-CNN (100x, 400x) and TRI-CNN (25x, 100x, 400x). Results for an autoencoder model from previous work were also included for reference. Both multiscale models outperform the two single-scale models and achieve a very good result, indicating the advantage of utilising multiple scales. The model can be used as an ROI extraction method for relevant tissue areas in the large WSI, useful for both pathologists and computer-aided diagnostic systems. Some more experiments should be performed to clarify if the behaviour stems from the multiscale approach or the extended field-of-view.
1. Model trained and evaluated on the same dataset (Wetteland et al., 2019).

References
Freddie Bray, Jacques Ferlay, Isabelle Soerjomataram, Rebecca L. Siegel, Lindsey Torre, and Ahmedin Jemal. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 68, 2018. doi: 10.3322/caac.21492.

François Chollet et al. Keras, 2015.

Jiayun Li, Karthik V. Sarma, King Chung Ho, Arkadiusz Gertych, Beatrice S. Knudsen, and Corey W. Arnold. A multi-scale U-Net for semantic segmentation of histological images from radical prostatectomies. In AMIA Annual Symposium Proceedings, volume 2017, page 1140. American Medical Informatics Association, 2017.

Ok Målfrid Mangrud. Identification of patients with high and low risk of progression of urothelial carcinoma of the urinary bladder stage Ta and T1. PhD thesis, University of Bergen, 2014.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825-2830, 2011.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

Korsuk Sirinukunwattana, Nasullah Khalid Alham, Clare Verrill, and Jens Rittscher. Improving whole slide segmentation through visual context: a systematic study. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 192-200. Springer, 2018.

Rune Wetteland, Kjersti Engan, Trygve Eftestøl, Vebjørn Kvikstad, and Emilius A. M. Janssen. Multiclass tissue classification of whole-slide histological images using convolutional neural networks. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, pages 320-327. INSTICC, SciTePress, 2019. ISBN 978-989-758-351-3. doi: 10.5220/0007253603200327.