Multiscale Deep Neural Networks for Multiclass Tissue Classification of Histological Whole-Slide Images
Medical Imaging with Deep Learning 2019 – MIDL 2019 Extended Abstract Track
Rune Wetteland [email protected]
Kjersti Engan [email protected]
Trygve Eftestøl [email protected]
Vebjørn Kvikstad [email protected]
Emilius A.M. Janssen [email protected]
Department of Electrical Engineering and Computer Science, University of Stavanger, Norway
Department of Pathology, Stavanger University Hospital, Norway
Department of Mathematics and Natural Sciences, University of Stavanger, Norway
Abstract
Correct treatment of urothelial carcinoma patients is dependent on accurate grading and staging of the cancer tumour. This is determined manually by a pathologist by examining the histological whole-slide images (WSI). The large size of these images makes this a time-consuming and challenging task. The WSI contain a variety of tissue types, and a method for defining diagnostically relevant regions would have several advantages for visualization as well as for further input to automated diagnosis systems. We propose an automatic multiscale method for classification of tiles from WSI of urothelial carcinoma patients into six classes. Three architectures based on convolutional neural networks (CNN) were tested: MONO-CNN (400x), DI-CNN (100x/400x) and TRI-CNN (25x/100x/400x). The preliminary results show that the two multiscale models performed significantly better than the mono-scale model, achieving an F1-score of 0.986, substantiating that utilising multiple scales in the model aids the classification accuracy.
1. Introduction
Bladder cancer is the 10th most common cancer type worldwide (Bray et al., 2018). More than 90% of bladder cancer cases are urothelial carcinomas, which have a particularly high recurrence (50-70%) and progression rate (10-30%), making correct treatment and follow-up vital for survivability. Treatment is dependent on the cancer grade and stage, determined manually by an expert pathologist examining the histological whole-slide images (WSI). This is a time-consuming and challenging task, and studies have shown that it may have a low reproducibility in some cases, such as grading of urothelial carcinoma (Mangrud, 2014).

Examination of the WSI is challenging because of the large size of the image, which contains several different tissue types, of which only some are useful for diagnostic information. An automatic tool for identification of such regions would be beneficial both for guiding a pathologist to the useful areas of the large WSI during examination, and for ROI extraction of useful tissue for a computer-aided diagnostic solution. In this paper we present an automatic method for classification of tiles from WSI of urothelial carcinoma patients into the classes: urothelium, stroma, muscle, damaged tissue, blood and background. The tiles are extracted at different magnification levels, to combine and utilise information at different scales in a similar fashion to that of a pathologist.

Multiscale approaches to tile-based classification have previously been applied to other cancer types. In the work of Li et al. (2017), a multiscale U-Net was proposed for segmentation of histological images from radical prostatectomies to classify tiles into four classes. Tiles of size 100x100, 200x200 and 400x400 pixels were all extracted from histological images at 200x magnification. Features from the different tiles were then concatenated and used as input to the multiscale U-Net.
The model achieved a mean Jaccard index of 65.8% over the four classes. In Sirinukunwattana et al. (2018), five single-scale and five multiscale architectures were compared on two datasets. Their best model (G) was a multiscale model which achieved an average F1-score of 0.782.
2. Data Material
The data material consists of Hematoxylin Eosin Saffron (HES) stained WSI from patients diagnosed with primary papillary urothelial carcinoma, collected at the University Hospital of Stavanger, Norway. An expert pathologist has carefully annotated 239 selected regions from 50 WSI from 32 unique patients, where each region includes one of the five foreground classes. Regions belonging to the background class were annotated on seven randomly selected patients.

Tiles were extracted from these regions at 25x, 100x and 400x magnification in such a manner that the centre pixel is the same in all three tiles. All tiles have the same size of 128x128x3 pixels. Tiles belonging to the test set were extracted from patients not present in the training data. The remaining data was augmented to balance the dataset and was further randomly shuffled and split into 85% training and 15% validation data. A random seed was set to ensure that the shuffling was the same for each model. The final datasets consist of 128K training tiles, 23K validation tiles and 11K test tiles.
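The co-centred extraction described above can be sketched as a small coordinate mapping. This is an illustrative sketch, not the paper's actual extraction code: the function name and the assumption that 400x is the base resolution, with 100x and 25x as 4x and 16x downsamples, are ours.

```python
def tile_coordinates(centre_x, centre_y, tile_size=128,
                     magnifications=(400, 100, 25), base_mag=400):
    """Map one centre pixel (given in base-magnification coordinates) to
    the top-left corner and downsample factor of a co-centred tile at
    each magnification, so all tiles share the same centre pixel."""
    coords = {}
    for mag in magnifications:
        factor = base_mag // mag                  # 1 at 400x, 4 at 100x, 16 at 25x
        half_extent = (tile_size * factor) // 2   # half the tile width, in base-mag pixels
        coords[mag] = {
            "top_left": (centre_x - half_extent, centre_y - half_extent),
            "downsample": factor,
        }
    return coords
```

Reading each region from the WSI would then amount to cropping `tile_size * factor` base-magnification pixels at `top_left` and downsampling by `factor`, yielding a 128x128x3 tile at every scale.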
3. Method and Results
This paper compares three architectures referred to as the MONO-, DI- and TRI-CNN model. The three architectures have one (400x), two (100x, 400x) and three (25x, 100x, 400x) inputs, respectively. Each input is fed into a pre-trained VGG16 network (Simonyan and Zisserman, 2014) which acts as a feature extractor. The fully-connected (FC) layers of VGG16 are replaced with a classification network consisting of two FC-layers, each followed by a dropout layer, and a final softmax layer with one output node for each of the six classes. The DI-CNN and TRI-CNN models have two and three parallel VGG16 branches, respectively, which are concatenated before entering the classification network.

The FC-layers were tested with 512, 1024, 1536, 2048 and 4096 neurons, and dropout rates of 0, 0.3 and 0.5. This 15-model hyperparameter search was conducted on each of the three architectures, resulting in 45 models. These 45 models were run three consecutive times and the results averaged for a more accurate estimate. Each model was trained using early stopping, halting training if the validation loss did not decrease within 30 epochs. All model selections were based on the validation set performance. After training, the weight parameters from the epoch which performed best on the validation dataset were restored, and a final evaluation of the model was performed on the test dataset. The VGG16 networks had their weight parameters frozen during training. The model was written in Python 3.5 using the Keras machine learning library (Chollet et al., 2015).

Table 1 shows the hyperparameters for the best performing models and their average result from the three consecutive runs. The MONO-CNN model achieves a result similar to that of the autoencoder. The two multiscale models perform equally and significantly better than the mono-scale models. The multiscale models also have a lower standard deviation on all metrics.
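The multi-branch architecture can be sketched in modern Keras as follows. This is a hedged reconstruction from the description above, not the authors' code: the function name, default hyperparameters and use of `tensorflow.keras` (rather than the original Python 3.5 Keras setup) are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_multiscale_cnn(num_scales=2, num_classes=6, fc_neurons=2048,
                         dropout_rate=0.3, tile_size=128, weights="imagenet"):
    """Sketch of the MONO/DI/TRI-CNN: one frozen pre-trained VGG16 branch
    per scale, features concatenated into a small classification head."""
    inputs, features = [], []
    for i in range(num_scales):
        inp = layers.Input(shape=(tile_size, tile_size, 3), name=f"scale_{i}")
        base = VGG16(include_top=False, weights=weights,
                     input_shape=(tile_size, tile_size, 3))
        base._name = f"vgg16_scale_{i}"   # avoid duplicate branch names in one model
        base.trainable = False            # VGG16 weights are frozen during training
        features.append(layers.Flatten()(base(inp)))
        inputs.append(inp)
    # DI/TRI-CNN concatenate the branch features; MONO-CNN has a single branch.
    x = layers.Concatenate()(features) if num_scales > 1 else features[0]
    for _ in range(2):                    # two FC layers, each followed by dropout
        x = layers.Dense(fc_neurons, activation="relu")(x)
        x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)
```

A TRI-CNN instance would be created with `build_multiscale_cnn(num_scales=3)` and fed one 128x128x3 tile per scale, all sharing the same centre pixel.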
Since both multiscale models achieve the same result, one could argue that the simplest model of the two should be chosen. In that case, DI-CNN with its 36M parameters is a simpler model than TRI-CNN, which has 47M parameters in total. DI-CNN also has a marginally lower standard deviation than TRI-CNN.

Table 1: Models evaluated on the test set. F1-score is presented as the total average and standard deviation calculated across all six classes over three consecutive runs. Parameters are shown as no. of trainable parameters / no. of total parameters.

Model | Input Scale | Dropout | FC-Neurons | Parameters | F1-Score
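The 45-model search described in Section 3 can be enumerated as a simple grid. This sketch is illustrative only; the constant and function names are ours, not the paper's.

```python
from itertools import product

FC_NEURONS = (512, 1024, 1536, 2048, 4096)
DROPOUT_RATES = (0.0, 0.3, 0.5)
ARCHITECTURES = ("MONO-CNN", "DI-CNN", "TRI-CNN")

def hyperparameter_grid():
    """Enumerate the 45 configurations: 15 classifier-head settings
    (FC width x dropout rate) for each of the three architectures.
    Each configuration is trained three consecutive times with early
    stopping (patience of 30 epochs on validation loss), best-epoch
    weights restored, and the metrics averaged."""
    return [
        {"architecture": arch, "fc_neurons": fc, "dropout": dr}
        for arch, fc, dr in product(ARCHITECTURES, FC_NEURONS, DROPOUT_RATES)
    ]
```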
4. Conclusion
In this paper, we present preliminary results from a multiscale tile-based classification model. Tiles from six classes were extracted at multiple scales from WSI of patients diagnosed with urothelial carcinoma. Three model architectures were compared: MONO-CNN (400x), DI-CNN (100x, 400x) and TRI-CNN (25x, 100x, 400x). Results for an autoencoder model from previous work were also included for reference. Both multiscale models outperform the two single-scale models and achieve a very good result, indicating the advantage of utilising multiple scales. The model can be used as an ROI extraction method for relevant tissue areas in the large WSI, useful for both pathologists and computer-aided diagnostic systems. Some more experiments should be performed to clarify if the behaviour stems from the multiscale approach or the extended field-of-view.
1. Model trained and evaluated on the same dataset (Wetteland et al., 2019).

References
Freddie Bray, Jacques Ferlay, Isabelle Soerjomataram, Rebecca L. Siegel, Lindsey Torre, and Ahmedin Jemal. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 68, 2018. doi: 10.3322/caac.21492.

François Chollet et al. Keras, 2015.

Jiayun Li, Karthik V. Sarma, King Chung Ho, Arkadiusz Gertych, Beatrice S. Knudsen, and Corey W. Arnold. A multi-scale U-Net for semantic segmentation of histological images from radical prostatectomies. In AMIA Annual Symposium Proceedings, volume 2017, page 1140. American Medical Informatics Association, 2017.

Ok Målfrid Mangrud. Identification of patients with high and low risk of progression of urothelial carcinoma of the urinary bladder stage Ta and T1. PhD thesis, University of Bergen, 2014.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825-2830, 2011.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

Korsuk Sirinukunwattana, Nasullah Khalid Alham, Clare Verrill, and Jens Rittscher. Improving whole slide segmentation through visual context: a systematic study. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 192-200. Springer, 2018.

Rune Wetteland, Kjersti Engan, Trygve Eftestøl, Vebjørn Kvikstad, and Emilius A. M. Janssen. Multiclass tissue classification of whole-slide histological images using convolutional neural networks. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, pages 320-327. INSTICC, SciTePress, 2019. ISBN 978-989-758-351-3. doi: 10.5220/0007253603200327.