Cell abundance aware deep learning for cell detection on highly imbalanced pathological data

Yeman Brhane Hagos*†, Catherine SY Lecat°, Dominic Patel‡, Lydia Lee°, Thien-An Tran°, Manuel Rodriguez-Justo‡, Kwee Yong°, Yinyin Yuan*†

* Division of Molecular Pathology, The Institute of Cancer Research, London, UK.
† Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK.
° University College London Cancer Institute, Research Department of Haematology.
‡ University College London Cancer Institute, Research Department of Pathology.
ABSTRACT
Automated analysis of tissue sections allows a better understanding of disease biology, and may reveal biomarkers that could guide prognosis or treatment selection. In digital pathology, less abundant cell types can be of biological significance, but their scarcity can result in biased and sub-optimal cell detection models. To minimize the effect of cell imbalance on cell detection, we proposed a deep learning pipeline that considers the abundance of cell types during model training. Cell weight images were generated, which assign larger weights to less abundant cells, and the weights were used to regularize the Dice overlap loss function. The model was trained and evaluated on myeloma bone marrow trephine samples. Our model obtained a cell detection F1-score of 0.78, an increase compared to baseline models, and it outperformed baseline models at detecting rare cell types. We found that scaling the deep learning loss function by the abundance of cells improves cell detection performance. Our results demonstrate the importance of incorporating domain knowledge in deep learning methods for pathological data with class imbalance.

Index Terms — Deep learning, convolutional neural network, cell detection, class imbalance, digital pathology, multiplex immunohistochemistry.
1. INTRODUCTION
In digital pathology, cell detection and classification are the first step to assessing tumour load, the surrounding micro-environment and immune phenotypes [1]. Multiplex immunohistochemistry (mIHC) is a staining method that allows simultaneous examination of multiple cell markers in a single image, where each cell is represented by a unique color or color combination (Fig. 1). Intrinsically, some cell types are fewer compared to others. For example, in bone marrow trephine samples, the number of CD4+/FOXP3- effector and CD4+/FOXP3+ regulatory T cells is lower than that of CD8+ T cells (Fig. 1). This imbalance causes instability and bias in the performance of discriminative models.

Recently, different deep learning techniques have been proposed to address the issue of class imbalance in medical image data. Some methods focus on sampling and/or augmentation [2]. The sampling method reduces the variability of the data [2]. Both methods are suited for segmentation (for example, background vs. foreground segmentation) and classification applications, because a training sample for such applications has a small number of instances/labels. However, in patch based cell detection, there might be hundreds of cells belonging to different classes in a small patch. Thus, patch level sampling and augmentation approaches might increase the degree of imbalance in the context of single cell detection. Other methods focused on developing a robust training loss function [2, 3]. Falk et al. [3] proposed an approach that assigns cell weights from cell segmentation. However, collecting manual single cell segmentation is costly.
Fig. 1. Sample mIHC image showing class imbalance. The number of CD8+ cells (red) is higher than that of CD4+/FOXP3- (brown) and CD4+/FOXP3+ (dark blue) cells.

In this work, we proposed a new class balancing approach in the context of single cell detection from single cell dot annotation. Our work has the following contributions:

• We implemented a cell detection and classification deep learning framework that uses a class balancing technique for the Dice overlap loss on a dataset with class imbalance.

• We implemented an algorithm that generates a cell weight image from expert dot annotation, based on the relative abundance of cell types in the training data.

• We implemented and compared the performance of different cell weighting strategies.

Table 1. Distribution of the dataset: number of slides (N_s) and annotated cells per type.

Split        N_s   CD8+   CD4+/FOXP3-   CD4+/FOXP3+
Training      5    2244       997            243
Validation    3    1555       689            140
Test          3    1306       702            138
2. MATERIAL
Our dataset contains newly diagnosed myeloma bone marrow mIHC whole slide images. It contains CD8+, CD4+/FOXP3- and CD4+/FOXP3+ cell types (Fig. 2a). To train and evaluate the proposed method, a total of 8014 cells were annotated in different regions of the whole slide images by experts, by putting a dot at the center of each cell (Fig. 2b). Table 1 shows the training, validation, and testing split.
3. METHODOLOGY

3.1. Cell detection training data preparation
For training, the annotated regions were divided into patches. Let n be the number of training patches; the training data T_d is represented by the set T_d = {(I_1, R_1, W_1), (I_2, R_2, W_2), ..., (I_n, R_n, W_n)}, where I_i, W_i, and R_i are the i-th input, weight, and reference images, respectively. Sample I, R and W images are shown in Fig. 2b.

Reference image: an artificial image generated from the expert single cell dot annotation using Equation (1):

R(i, j) = 1 if d < r; 0 otherwise    (1)

where R(i, j) is the pixel value at (i, j) and d is the Euclidean distance from (i, j) to the closest cell center. The value of r was set to 4 pixels (1.768 µm); it was chosen empirically, making sure blobs in R do not touch each other (Fig. 2b).

Weight image: assigns a weight to every cell in the input image (Fig. 2b). The weights are inferred from the relative abundance of each cell type in the training data; rare cells are given larger weights. Let m be the number of cell types in the training dataset, let C = {c_1, c_2, ..., c_m} be the m cell types, and let N_k be the number of c_k cells in the training data. Then N = {N_1, N_2, ..., N_m} represents the set of cell abundances for the m cell types. The weight image is generated using the information in N and the locations of cells. We implemented three cell abundance weighting functions: RatioWeight (Equation 2), ExpWeightType1 (Equation 3), and ExpWeightType2 (Equation 4):

W(i, j) = max(N) / N_k if d_k < r; 1 otherwise                  (2)

W(i, j) = exp(-N_k / max(N)) if d_k < r; exp(-1) otherwise      (3)

W(i, j) = exp(-(N_k / max(N))^2) if d_k < r; exp(-1) otherwise  (4)

where d_k is the Euclidean distance from (i, j) to the nearest center of a cell of type c_k.
For our dataset, C = {CD8+, CD4+/FOXP3-, CD4+/FOXP3+} and N = {2244, 997, 243} (the training counts in Table 1), and these values are used to generate the weights using Equations (2-4).

Fig. 2. Training data preparation. a) Sample patches for all cell types. b) Sample annotated, reference (R) and weight (W) images for an input image (I). In W, less abundant cell types are assigned larger weights: CD4+/FOXP3+ cells have a larger weight than CD4+/FOXP3- and CD8+ cells.

Fig. 3. Schematic of the cell detection model. The numbers on the top and side of the blocks indicate the size and spatial dimension of the features, respectively. The loss function is computed using the model output, reference and weight images.
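The reference and weight image generation described above can be sketched as follows, assuming (row, col) dot annotations. This is a minimal NumPy sketch, not the released code: the function name, the argument layout, and in particular the squared-ratio form used for ExpWeightType2 are our assumptions where the extracted equations were ambiguous.

```python
import numpy as np

def make_reference_and_weight(dots, labels, counts, shape, r=4, scheme="ratio"):
    """Build a binary reference image R and a weight image W from dot annotations.

    dots   : list of (row, col) cell centres
    labels : cell-type index of each dot (index into `counts`)
    counts : abundance N_k of each cell type in the training set
    shape  : (h, w) of the patch
    scheme : "ratio" (Eq. 2), "exp1" (Eq. 3), or "exp2" (assumed squared variant of Eq. 4)
    """
    counts = np.asarray(counts, dtype=float)
    n_max = counts.max()
    R = np.zeros(shape, dtype=float)
    # Background weight: 1 for RatioWeight, exp(-1) for the exponential schemes.
    bg = 1.0 if scheme == "ratio" else np.exp(-1.0)
    W = np.full(shape, bg, dtype=float)
    rows, cols = np.indices(shape)
    for (y, x), k in zip(dots, labels):
        # Pixels within radius r of the cell centre form one blob.
        inside = (rows - y) ** 2 + (cols - x) ** 2 < r ** 2
        R[inside] = 1.0
        ratio = counts[k] / n_max
        if scheme == "ratio":
            W[inside] = n_max / counts[k]          # rarer type -> larger weight
        elif scheme == "exp1":
            W[inside] = np.exp(-ratio)
        else:
            W[inside] = np.exp(-(ratio ** 2))
    return R, W
```

With the training counts from Table 1 ({2244, 997, 243}), RatioWeight gives CD4+/FOXP3+ blobs a weight of about 9.2 versus 1.0 for CD8+ blobs, matching the intent that rare cells dominate the loss.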
Our proposed cell detection pipeline is shown in Fig. 3. It is a U-net [4] convolutional neural network (CNN) inspired by Inception V3. We applied Inception V3 blocks, which extract multi-scale features at a given layer. The model has an encoder and a decoder part. The encoder learns a low dimensional representation of the input image, and the decoder reconstructs a target image. The 1x1 convolutional layer at the end of the architecture transforms the features to the size of the reference image (R). Parameters were initialized using uniform Glorot initialization [5], and optimized using Adam [6]. ReLU activation was applied to all layers, except Sigmoid in the last layer to transform the output to probability.

To minimize the effect of class imbalance, we applied a weighted Dice overlap loss, computed as

l = 1 - (2 * sum_{i,j} W(i,j) Y(i,j) R(i,j) + eps) / (sum_{i,j} W(i,j) (Y(i,j) + R(i,j)) + eps)    (5)

where W, R and Y are the weight, reference and output images, respectively, the sums run over all pixels (i, j) of the image, and eps is a small constant added to ensure numerical stability.

To train a cell classification model, we extracted patches as shown in Fig. 2a. We applied a VGG [7] style architecture, which contains three convolutional layers followed by two dense layers. ReLU activation was applied to all layers, except Softmax in the last layer to transform the tensors to probabilities. Parameters were initialized using uniform Glorot initialization [5], and optimized using Adam [6]. We applied a categorical cross entropy loss with the class weighting explained in Equations (2-4).
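The weighted Dice overlap loss of Equation (5) can be written directly in array form. The sketch below uses NumPy for clarity; the factor of 2 in the numerator (standard for Dice overlap) and the 1e-5 epsilon are our assumptions where the extracted text was illegible.

```python
import numpy as np

def weighted_dice_loss(W, Y, R, eps=1e-5):
    """Weighted Dice overlap loss.

    W : per-pixel weight image (larger on rare cell types)
    Y : model output probabilities in [0, 1]
    R : binary reference image
    eps : small constant for numerical stability (assumed value)
    """
    num = 2.0 * np.sum(W * Y * R) + eps
    den = np.sum(W * (Y + R)) + eps
    return 1.0 - num / den
```

A perfect prediction (Y equal to R) drives the loss to zero, while a completely disjoint prediction pushes it toward one; because W scales both numerator and denominator, mistakes on heavily weighted (rare) cells move the loss more than mistakes on abundant cells.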
4. RESULTS AND DISCUSSION
To evaluate the performance of the proposed cell detection models with different weighting strategies, and to compare them with the state-of-the-art U-Net [4] and CONCORDe-Net [8], we measured precision, recall, and F1-score on separately held-out test images. CONCORDe-Net [8] is a cell count regularized CNN designed for cell detection in mIHC images.

An F1-score of 0.78 was obtained using the ExpWeightType1 and RatioWeight models, an increase compared to both U-net and CONCORDe-Net [8], as shown in Table 2. Moreover, the recall of the ExpWeightType2 model was higher than that of the baseline models. A detection was considered a true positive if it was within a threshold Euclidean distance of a ground truth annotation; for all models, this distance was optimized independently by maximizing F1-score. This suggests that class weighting improves cell detection performance.

Table 2. Cell detection performance of different models. The U-net [4] model is the model in Fig. 3 trained without applying weights.
Method           Precision   Recall   F1-score
ExpWeightType1                           0.78
RatioWeight        0.80       0.75      0.78
ExpWeightType2     0.78

The classification performance per cell type is summarized by the receiver operating characteristic curves and AUC values in Fig. 4a, together with the overall accuracy.

To visualize the separability of cell types using deep learnt features, and to scrutinize misclassified cell types, we applied uniform manifold approximation and projection (UMAP) dimensionality reduction (Fig. 4b). The different cell types are mapped into different regions of the 2D UMAP space. CD8+ cells falling in the same region as CD4+ cells are cells expressing both CD4 and CD8 proteins.

Fig. 4. Cell classification model performance evaluation. a) Receiver operating characteristic curves and area under the curve (AUC). b) UMAP visualization of the deep learned features.

In our dataset, there are fewer CD4+/FOXP3+ cells than CD8+ cells. The visualization in Fig. 5 indicates that the model with ExpWeightType1 detected CD4+/FOXP3+ cells, which were under-represented in the training dataset, while the model trained without any weights (U-net) missed some of these cells. For CD8+ cells, the detection results remain similar with and without cell weighting. However, we observed that the weighted model sometimes failed to detect large (in size) CD4+/FOXP3+ cells. This could be because
Fig. 5. Sample results from different cell detection methods.

of the under-representation of such cells in the training data. This indicates that the proposed weighting method introduces an attention mechanism into the model to detect rare cell types, and thus improves overall cell detection performance.

For reproducibility, the model was trained using a Docker image within a Singularity container on an HPC cluster. Code is available at https://github.com/YemanBrhane/AwareNet. The Docker image is available at yhagos/tf2gpu:concordenet on Docker Hub.

Our study has limitations. Our samples were collected from different hospitals with potential differences in processing and fixation, but they were stained and scanned using the same protocols and platform. We trained and validated the model on a small scale dataset.

Overall, our results demonstrate the importance of incorporating domain knowledge for deep learning training on a dataset with class imbalance. In the future, we will apply the model to a larger cohort of bone marrow samples, to understand the composition of the bone marrow immune microenvironment and the changes imposed by malignant disease.
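The distance-threshold scoring used above (a detection counts as a true positive if it lies within an optimized Euclidean distance of a ground truth dot) can be sketched as follows. This is an illustrative implementation: the greedy nearest-first matching and the default `max_dist` value are our assumptions, and the paper optimizes the threshold per model.

```python
import numpy as np

def detection_f1(pred, gt, max_dist=5.0):
    """Score point detections against ground-truth dots.

    pred, gt : lists of (row, col) coordinates
    max_dist : matching threshold in pixels (assumed default)
    Returns (precision, recall, f1). Each ground-truth dot is matched
    to at most one prediction, closest pairs first.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if len(pred) == 0 or len(gt) == 0:
        return 0.0, 0.0, 0.0
    # Pairwise Euclidean distances, shape (n_pred, n_gt).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    tp, used_p, used_g = 0, set(), set()
    for idx in np.argsort(d, axis=None):       # closest pairs first
        i, j = np.unravel_index(idx, d.shape)
        if d[i, j] > max_dist:
            break
        if i in used_p or j in used_g:
            continue
        used_p.add(i); used_g.add(j); tp += 1
    precision = tp / len(pred)
    recall = tp / len(gt)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

Sweeping `max_dist` over a small range and keeping the value that maximizes F1 reproduces the per-model threshold optimization described in the results.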
5. CONCLUSION
In this paper, to minimize the effect of cell imbalance on cell detection, we proposed a deep learning method that considers the abundance of cells during training. Cell weight images were generated by assigning larger weights to less abundant cell types, and the weights were applied to regularize the Dice overlap loss function. Using negative exponential weighting, we obtained an increase in cell detection F1-score and better rare cell detection compared to baseline models.

Compliance with Ethical Standards
The data used in this study was obtained with appropriate ethical approval granted by HRA and HCRW (REC Reference 07/Q0502/17).
Acknowledgements
This project was funded by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 766030 and a CRUK Early Detection Program Award (C9203/A28770). KY receives funding from the National Institute for Health Research University College Hospital Biomedical Research Centre. LL is supported by the Medical Research Council, UK.
6. REFERENCES

[1] Yinyin Yuan, "Spatial heterogeneity in the tumor microenvironment," Cold Spring Harbor Perspectives in Medicine, vol. 6, no. 8, pp. a026583, 2016.

[2] Carole H Sudre, Wenqi Li, Tom Vercauteren, Sebastien Ourselin, and M Jorge Cardoso, "Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240–248. Springer, 2017.

[3] Thorsten Falk, Dominic Mai, Robert Bensch, Özgün Çiçek, Ahmed Abdulkadir, Yassine Marrakchi, Anton Böhm, Jan Deubner, Zoe Jäckel, Katharina Seiwald, et al., "U-Net: deep learning for cell counting, detection, and morphometry," Nature Methods, vol. 16, no. 1, pp. 67–70, 2019.

[4] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.

[5] Xavier Glorot and Yoshua Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.

[6] Diederik P Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.

[7] Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[8] Yeman Brhane Hagos, Priya Lakshmi Narayanan, Ayse U Akarca, Teresa Marafioti, and Yinyin Yuan, "ConCORDe-Net: Cell count regularized convolutional neural network for cell detection in multiplex immunohistochemistry images," in