Attention based Multiple Instance Learning for Classification of Blood Cell Disorders
Ario Sadafi, Asya Makhro, Anna Bogdanova, Nassir Navab, Tingying Peng, Shadi Albarqouni, Carsten Marr
Ario Sadafi, Asya Makhro, Anna Bogdanova, Nassir Navab, Tingying Peng*, Shadi Albarqouni*, and Carsten Marr*
Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Germany; Computer Aided Medical Procedures, Technical University of Munich, Germany; Helmholtz AI, Helmholtz Center Munich, Germany; Red Blood Cell Research Group, Institute of Veterinary Physiology, Vetsuisse Faculty and the Zurich Center for Integrative Human Physiology, University of Zurich, Zurich, Switzerland; Computer Aided Medical Procedures, Johns Hopkins University, USA; Computer Vision Lab (CVL), ETH Zurich, Switzerland
Abstract.
Red blood cells are highly deformable and present in various shapes. In blood cell disorders, only a subset of all cells is morphologically altered and relevant for the diagnosis. However, manual labeling of all cells is laborious and complicated, and it introduces inter-expert variability. We propose an attention based multiple instance learning method to classify blood samples of patients suffering from blood cell disorders. Cells are detected using an R-CNN architecture. With the features extracted for each cell, a multiple instance learning method classifies patient samples into one out of four blood cell disorders. The attention mechanism provides a measure of the contribution of each cell to the overall classification and significantly improves the network's classification accuracy as well as its interpretability for the medical expert.
Keywords:
Multiple Instance Learning · Attention · Red Blood Cells
* Shared senior authorship.
Historically, classification of hereditary hemolytic anemias, a particular class of blood disorders, is based on the abnormal shape of red blood cells. Sickle cell disease, spherocytosis, ovalocytosis, stomatocytosis: these types of anemia refer directly to the changes in cell shape, whereas the genetic causes of the diseases were identified later [8,7,14]. All but one (sickle cell disease) of the above-mentioned disorders are structural diseases of the red blood cell membrane caused by mutations in genes coding for the cytoskeletal proteins spectrin, ankyrin and band 3 protein, as well as protein 4.2 [7], making cells look like flowers, stars, hedgehogs, cups, droplets or spheres [2]. Characteristic changes in morphology are hallmarks of diseases caused by abnormalities in hemoglobin structure, such as sickle cell disease and thalassemia. The changes in shape are more subtle in red blood cells of patients harboring mutated glycolytic enzymes [9] or ion channels (like the Gardos channel or the PIEZO1 channel [18]). Independent of the cause and class of hereditary anemia, not all red blood cells but only a fraction of them (often as large as 5-10% of the total cell population) are abnormally shaped. This makes diagnosis based on shape changes alone difficult, and additional tests are currently required. Furthermore, detection of abnormal shapes suffers from the subjective view of a human observer, a skillful but possibly biased expert who may only process several hundred cells per patient. Machine learning approaches can overcome these limitations and have been shown to outperform human experts in a number of clinical tasks: dermatologist-level classification of skin cancer by Esteva et al. [6], human-level recognition of blast cells by Matek et al. [15] and an AI system for breast cancer screening developed by McKinney et al. [16] are just some of the cases in which machine learning excels over experts. Introducing an unbiased computer-based assessment of red blood cell shapes and their abundance in a blood sample may open new possibilities for diagnostics, assessment of disease severity, and monitoring of treatment success for patients with rare anemias.
Multiple instance learning (MIL) is used in medical image computing when all of the instances from a patient must be taken into account and no specific label exists for each of the instances. For example, Campanella et al. [3] propose an MIL method for whole slide pathology image classification: each whole slide image is weakly labeled as healthy or cancerous, but no specific label exists for each small image patch. Similarly, Ozdemir et al. [17] propose an MIL-based method to classify lung cancer CT scans. In a slightly different line of work, Conjeti et al. [5] propose MIL-based hashing for medical image retrieval, with an auxiliary branch that counters the vanishing gradient problem known to impede MIL approaches. While these approaches operate at the patch level, none of them is able to identify single cells, which are often crucial for the diagnosis.
To this end, we propose a method based on weak patient labels and attention based MIL to classify patient blood samples into disease classes. Cells are extracted from images by an R-CNN architecture previously trained on a single cell detection task, and feature maps of its ResNet backbone are passed to the proposed method. Without any cell-wise labels, the model learns to identify landmark cells for every disease in an unsupervised way by assigning them the highest attention.
A patient's blood sample may consist of several bright-field images (see Dataset for a detailed description), and each image contains several instances, i.e. single red blood cells (see Fig. 1). A previously trained R-CNN architecture is used to find the instances and extract their features (see Fig. 2). Based on these features, we propose to classify a sample into one of four diseases, taking into account the features of all of the instances present in the input. Additionally, an attention score improves both the performance of the method and its interpretability to medical experts for further verification. The proposed approach consists of three main blocks: (i) the multiple instance learning, (ii) an auxiliary single instance classifier and (iii) the attention mechanism (see Fig. 1).
[Fig. 1 panels: patient blood sample; bright-field microscopy; single cell detection (R-CNN); proposed method with attention (high to low); classification: Spherocytosis, Healthy, SCD, Thal, PIEZO1]
Fig. 1. Overview of the proposed method. Bright-field images of blood samples are acquired with a microscope. Using a previously trained R-CNN, all cells are detected in each image. Looking at all of the cells, our proposed approach classifies the sample and provides a cell-wise attention score for better interpretability.
More formally, our objective is a model $f$ that classifies a blood sample containing several instances into one of the classes $c_i \in C$ and generates a score $a_k \in A$ denoting the contribution of each instance to the final decision:
$$c_i,\ a_k = f(B), \qquad (1)$$
where $B = \{I_1, \ldots, I_N\}$ is the bag of instances and each instance $I_i$ is a tensor of size $256 \times \ldots \times \ldots$.
Any off-the-shelf detection algorithm can be used for detecting single cells in the images. We use a modified Mask R-CNN architecture [10] with a ResNet [11] backbone that generates the features. For every detected cell, the relevant features are extracted with RoI aligning and used as an instance $I$ given as input to our method. The Mask R-CNN was trained on a separate dataset from [20], consisting of 208 microscopic images, each containing 30-40 red blood cells (total of > …)
[Fig. 2 components: one sample; R-CNN; RoI aligning; Conv, ReLU; max pooling 2x2; fully connected; attention scores; matrix multiplication; bag classification (fully connected, softmax); Aux-SIC instance classification]
Fig. 2. Architecture of the proposed method. The R-CNN extracts features from the input images and detects the red blood cells. Based on the detected cells, a bag of cell features from all of the images is formed after RoI aligning. After passing through convolutional and fully connected layers and attention pooling, a feature vector is formed for the bag and classified. An auxiliary single instance classification (Aux-SIC) branch helps the training during the first steps. N is the number of instances in a bag and K the total number of classes.
… data for single cell extraction is limited to bounding boxes around the cells, which does not require any special expertise and can be performed by anyone with minimal training.
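The bag construction described above (RoI aligning of every detected cell, then pooling all cells of all images of one sample into a single bag) can be sketched as follows. This is a simplified RoI Align written in NumPy for illustration only, assuming one bilinear sample at the centre of each output bin, boxes given as (x1, y1, x2, y2) in feature-map coordinates, and a hypothetical 7 × 7 output size; the method itself uses the RoI aligning of the Mask R-CNN [10], and the helper names `roi_align` and `build_bag` are hypothetical.

```python
import numpy as np


def bilinear(fmap, y, x):
    """Bilinearly interpolate a (C, H, W) feature map at real-valued (y, x)."""
    _, height, width = fmap.shape
    # Clamp the sample point to the feature map.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * fmap[:, y0, x0]
            + (1 - dy) * dx * fmap[:, y0, x1]
            + dy * (1 - dx) * fmap[:, y1, x0]
            + dy * dx * fmap[:, y1, x1])


def roi_align(fmap, box, out_size=7, spatial_scale=1.0):
    """Simplified RoI Align: one bilinear sample at the centre of each bin."""
    bx1, by1, bx2, by2 = (c * spatial_scale for c in box)
    bin_h = (by2 - by1) / out_size
    bin_w = (bx2 - bx1) / out_size
    out = np.empty((fmap.shape[0], out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[:, i, j] = bilinear(fmap, by1 + (i + 0.5) * bin_h,
                                    bx1 + (j + 0.5) * bin_w)
    return out


def build_bag(feature_maps, boxes_per_image, out_size=7, spatial_scale=1.0):
    """Collect the RoI features of every detected cell in every image of one
    sample into a single bag of shape (N, C, out_size, out_size)."""
    instances = [roi_align(fmap, box, out_size, spatial_scale)
                 for fmap, boxes in zip(feature_maps, boxes_per_image)
                 for box in boxes]
    return np.stack(instances)
```

For two images with two and one detected cells, `build_bag` returns an array of shape (3, C, 7, 7). The bag size N varies from sample to sample, which is why the subsequent MIL pooling has to accept bags of any size.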
Single instance classification (SIC) is the most intuitive approach for the classification of samples. In our case, each instance is passed through several convolutional layers, and an embedding feature vector $h$ is generated for every instance in the bag (Fig. 2). SIC is a CNN architecture that classifies each instance embedding based on the weak label of its bag. At inference time, majority voting over the single cell classification results determines the class of a given sample. SIC is trained with
$$\mathcal{L}_{SIC}(\theta, \psi) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CE}(c_i, \hat{c}_i), \qquad (2)$$
where $\hat{c}_i = f_{SIC}(h_i; \psi)$, CE is the cross entropy loss, and
$$h_i = f_{EMB}(I_i; \theta) : I_i \in B, \qquad (3)$$
with $c_i$ being the label of each cell $i$, inherited from the bag label. $\psi$ and $\theta$ are learned model parameters.
In contrast to supervised methods, where one tries to find a target variable $\hat{c} \in C$ for every given instance, multiple instance learning (MIL) tries to find the target variable $\hat{c}$ based on an input that is a set of instances $B = \{I_1, \ldots, I_N\}$. There are two approaches to implement MIL: instance level and embedding level approaches. We use the embedding level approach to formulate the MIL problem. As defined in Eq. 3, the function $f_{EMB}$ maps every instance into a low dimensional space $h$, and a single representation $z$ for the whole bag is generated using a MIL pooling method. A bag level classifier classifies $z$ into one of the classes. Several MIL pooling methods exist. One popular method is max pooling [1], where the maximum over instances is used to generate the bag level representation:
$$z_m = \max_{k=1,\ldots,N} \{ h_{km} \}. \qquad (4)$$
The MIL approach can be formulated as follows:
$$\mathcal{L}_{MIL}(\theta, \phi) = \mathrm{CE}(c, \hat{c}), \qquad (5)$$
where $\hat{c} = f_{MIL}(\{h_1, \ldots, h_N\}, \{\alpha_1, \ldots, \alpha_N\}; \phi)$, $\alpha$ is the attention score (see Sec. 2.4), and $\phi$ represents learned model parameters.
Attention mechanisms are widely used in various deep learning tasks, from semantic segmentation [4] to conversational question answering [23]. Ilse et al. [13] proposed an attention mechanism where a weighted average is calculated over the instance embeddings and the weights are learned by the neural network. If $H = \{h_1, \ldots, h_N\}$ is a bag of instance embeddings, attention based MIL pooling is defined as
$$z = \sum_{k=1}^{N} \alpha_k h_k, \qquad (6)$$
where
$$\alpha_k = \frac{\exp\{w^\top \tanh(V h_k^\top)\}}{\sum_{j=1}^{N} \exp\{w^\top \tanh(V h_j^\top)\}}. \qquad (7)$$
$V$ and $w$ are parameters learned during training. These attention scores $\alpha_k$ help the interpretability of the trained model by revealing the contribution of each instance to the drawn conclusion and by acting as a similarity measure for comparison between instances.
One of the difficulties of MIL is sparse and vanishing gradients due to instance pooling.
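Max pooling (Eq. 4), attention pooling (Eqs. 6 and 7) and the majority vote used for SIC inference can be sketched in NumPy as below. The sketch covers only the pooling and voting steps: the embedding network $f_{EMB}$ is omitted, `V` and `w` would be learned parameters in practice (here they are plain arrays), and the function names are hypothetical. Vote ties are broken by first occurrence, an assumption the text does not specify.

```python
import numpy as np
from collections import Counter


def max_pool(H):
    """Eq. (4): element-wise maximum over the N instance embeddings (N, M)."""
    return H.max(axis=0)


def attention_pool(H, V, w):
    """Eqs. (6)-(7): attention based MIL pooling after Ilse et al.

    H: (N, M) instance embeddings, V: (L, M), w: (L,).
    Returns the bag embedding z of shape (M,) and attention scores alpha (N,).
    """
    scores = np.tanh(H @ V.T) @ w          # w^T tanh(V h_k^T) for every k
    scores = scores - scores.max()         # stabilise the softmax numerically
    alpha = np.exp(scores) / np.exp(scores).sum()
    z = alpha @ H                          # attention-weighted average
    return z, alpha


def majority_vote(instance_logits):
    """SIC inference: each cell votes for its argmax class; the most common wins."""
    votes = np.argmax(instance_logits, axis=1)
    return Counter(votes.tolist()).most_common(1)[0][0]
```

Both pooling operators are invariant to the order of the instances, so the order in which cells are detected does not change the bag representation. Unlike max pooling, attention pooling also returns `alpha`, which is what enables the cell-wise interpretation.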
Here, we propose a dynamic loss function that combines the MIL loss with the auxiliary SIC branch during training using a decaying coefficient:
$$\mathcal{L}(\theta, \phi, \psi) = (1 - \beta^{E})\, \mathcal{L}_{MIL} + \beta^{E}\, \mathcal{L}_{SIC}, \qquad (8)$$
where $\beta$ is a hyper-parameter with $0 < \beta < 1$ and $E$ is the epoch index, so that the weight of the SIC term decays as training proceeds.
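A minimal sketch of Eq. (8), together with the SIC loss of Eq. (2) in which every cell inherits its bag's label. It assumes the epoch index starts at 0, so that training begins with the SIC loss alone, and uses a hypothetical β = 0.5 in the usage example, since the value of β chosen for the experiments is not given here. The function names are hypothetical.

```python
import numpy as np


def cross_entropy(probs, label):
    """CE between a predicted class distribution and an integer label."""
    return -np.log(probs[label])


def sic_loss(instance_probs, bag_label):
    """Eq. (2): mean CE of every instance against the label of its bag."""
    return np.mean([cross_entropy(p, bag_label) for p in instance_probs])


def dynamic_loss(loss_mil, loss_sic, beta, epoch):
    """Eq. (8): (1 - beta**E) * L_MIL + beta**E * L_SIC, with 0 < beta < 1."""
    weight_sic = beta ** epoch
    return (1.0 - weight_sic) * loss_mil + weight_sic * loss_sic
```

With β = 0.5 the SIC weight is 1, 0.5, 0.25, ... over the first epochs: the auxiliary branch supplies a dense, per-cell gradient early on, and the objective gradually turns into the pure MIL loss.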
We validated the proposed method on a dataset of bright-field microscopy images of human blood cell genetic disorders. We designed an ablation study as follows: (i) single instance classification (SIC), (ii) multiple instance learning (MIL) with max pooling, (iii) MIL with max pooling and the auxiliary SIC branch, and (iv) MIL with attention pooling and the auxiliary SIC branch.
Dataset.
All images are obtained with an Axiocam mounted on an Axiovert 200m Zeiss microscope with a 100x objective. No preprocessing is done, and cells are not stained. The data consists of patient samples acquired at different time points or in different kinds of solutions. Each sample contains 4-12 images, and each image contains 12-45 cells. The dataset covers four genetic morphological disorders: thalassemia (3 patients, 25 samples), sickle cell disease (9 patients, 56 samples), PIEZO1 mutation (8 patients, 44 samples) and hereditary spherocytosis (13 patients, 89 samples). In addition, it includes a healthy control group (26 individuals, 137 samples). Training and test sets were split patient-wise to ensure a fair test set selection.
Patients previously diagnosed with hereditary spherocytosis were enrolled in the CoMMiTMenT study ( ). This study was approved by the Medical Ethical Research Board of the University Medical Center Utrecht, the Netherlands, under reference code 15/426M and by the Ethical Committee of Clinical Investigations of Hospital Clinic, Spain (IDIBAPS) under reference code 2013/8436. The genetic cause of the disease was identified by the research group of Richard van Wijk, University Medical Center Utrecht, the Netherlands. The healthy subjects study was approved by the ethics committees of the University of Heidelberg, Germany (S-066/2018) and of the University of Berne, Switzerland (2018-01766), and was performed according to the Declaration of Helsinki.
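A patient-wise split, which keeps every sample of a patient in the same fold so that no patient appears in both training and test data, can be sketched as follows. It assumes each sample is described by a (sample ID, patient ID, label) tuple and that the patients of each class are spread round-robin over the folds; the text does not state how patients were assigned, so this stratification and the function name `patient_wise_folds` are assumptions. The same grouping can serve the 3-fold cross validation described under Training.

```python
import random
from collections import defaultdict


def patient_wise_folds(samples, n_folds=3, seed=0):
    """Split samples into folds so that all samples of a patient share a fold.

    samples: list of (sample_id, patient_id, label) tuples.
    Returns a list of n_folds lists of sample IDs.
    """
    samples_by_patient = defaultdict(list)
    patient_label = {}
    for sample_id, patient_id, label in samples:
        samples_by_patient[patient_id].append(sample_id)
        patient_label[patient_id] = label
    # Group patients by class so that every fold receives patients of each class.
    patients_by_label = defaultdict(list)
    for patient_id, label in patient_label.items():
        patients_by_label[label].append(patient_id)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(patients_by_label):
        patients = sorted(patients_by_label[label])
        rng.shuffle(patients)
        for i, patient_id in enumerate(patients):
            # Whole patients are assigned round-robin, never single samples.
            folds[i % n_folds].extend(samples_by_patient[patient_id])
    return folds
```

With only three thalassemia patients, each of the three folds receives exactly one of them, which illustrates why splitting by patient rather than by sample matters for such a small cohort.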
Implementation Details.
The proposed method consists of three components: multiple instance embedding, auxiliary SIC, and attention & bag classifier. Fig. 2 shows the architecture of the method.
The multiple instance embedding is a multi-layer convolutional neural network used for embedding the features extracted by the R-CNN. It consists of five convolutional layers, a dropout layer with probability of 0.… An auxiliary single instance classifier looks at every instance embedding and tries to classify it with a fully connected layer. We chose β equal to 0.…
Table 1.
Comparison of the proposed method (MIL + att. + SIC) with other baselines. Mean and standard deviation over five runs are shown for accuracy, weighted F1 score and the area under the ROC curve averaged over all classes.
Method            | Accuracy  | F1 Score  | AUROC
SIC               | 0.… ± …   | 0.… ± …   | 0.… ± …
MIL + max         | 0.… ± …   | 0.… ± …   | 0.… ± …
MIL + max + SIC   | 0.… ± …   | 0.… ± …   | 0.… ± …
MIL + att. + SIC  | 0.… ± …   | 0.… ± …   | 0.… ± …
In the attention and bag classifier, the matrix of embedded instance representations ($H$) is multiplied by the attention matrix, which is generated dynamically from $H$. After the multiplication, the bag classifier, a fully connected layer with softmax, performs the final MIL classification.
Training.
We use 3-fold cross validation: one independent model is trained on each fold, and performance is averaged over the three models. The models are trained with the AMSGrad variant of the Adam optimizer [19] with a learning rate of 0.…; training continues for a maximum of 150 epochs, with early stopping if the MIL loss drops below a specific threshold (0.…). Code: https://github.com/marrlab/attMIL.
Evaluation metrics.
Accuracy, macro F1 score and average area under the ROCand precision recall curves are used for comparison between different approaches.
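The text uses both a macro F1 score (here) and a weighted F1 score (Table 1); the sketch below shows how the two averages differ, using the standard definitions in plain Python. It is a generic illustration, not the evaluation code of the paper, and the function names are hypothetical.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, classes):
    """F1 = 2TP / (2TP + FP + FN) for every class."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred, classes)
    return sum(scores.values()) / len(classes)


def weighted_f1(y_true, y_pred, classes):
    """Per-class F1 weighted by the number of true samples of each class."""
    scores = f1_per_class(y_true, y_pred, classes)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in classes) / len(y_true)
```

Macro averaging gives a rare class such as thalassemia (25 samples) the same weight as the healthy class (137 samples), whereas weighted averaging lets frequent classes dominate the score.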
Baselines.
SIC is the baseline for our approach. We further compare our approach with MIL using max pooling, both with and without the auxiliary SIC branch.
All experiments were run five times, and the mean and standard deviation of each metric are reported. For each experiment, we report the accuracy, weighted F1 score and area under the ROC curve in Table 1. Additionally, Fig. 3 reports the area under the precision recall curve for all five classes.
[Fig. 3: area under the precision recall curve per class; methods: SIC, MIL + max, MIL + max + SIC, MIL + att. + SIC]
Fig. 3. Area under the precision recall curve for all experiments and every class. We show mean and standard deviation of five runs.
[Fig. 4 panels: Spherocytosis, Sickle cell disease, Thalassemia, PIEZO 1, Healthy; attention scale from high to low]
Fig. 4. Exemplary whole slide images with attention values, shown as colored bounding boxes. White marks the highest attention score, while blue and dark blue mark the lowest.
The attention mechanism allows a qualitative evaluation of the model. Showing the cells that contribute most to the classification can be beneficial for the clinical adoption of the model, as it provides experts with some explanation of the decisions made by the neural network. If cells receiving high attention are known to be important for a specific morphological disorder, the model has not only learned to find them without cell-level labels, but this also suggests that it has learned what is relevant and what is not. Figure 4 shows cell attention in an exemplary image from samples belonging to each of the five classes.
Further, we extracted the eight cells with the highest attention from a sample of every class in the dataset (Fig. 5). These cells are clearly morphologically different. Interestingly, for the healthy control class, slightly atypical cells received the highest attention, as they might be flags for some disorders, although they are not different enough for the whole sample to be classified as a disorder. Note that the attention of healthy cells is generally lower than that of cells from disease samples (Fig. 4).
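Selecting the highest-attention cells of a sample, as for Fig. 5, and rescaling scores for display, as in the color-coded boxes of Fig. 4, can be sketched as follows. Here `alpha` is assumed to be the vector of attention scores from Eq. (7) for one sample; the min-max rescaling used for coloring is an assumption, since the text does not say how attention values were mapped to colors, and both function names are hypothetical.

```python
import numpy as np


def top_attention_cells(alpha, k=8):
    """Indices of the k instances with the highest attention, highest first."""
    alpha = np.asarray(alpha)
    k = min(k, alpha.shape[0])
    # A stable sort keeps the detection order among cells with equal attention.
    return np.argsort(-alpha, kind="stable")[:k]


def normalized_attention(alpha):
    """Min-max rescale attention to [0, 1], e.g. to colour bounding boxes."""
    alpha = np.asarray(alpha, dtype=float)
    span = alpha.max() - alpha.min()
    return np.zeros_like(alpha) if span == 0 else (alpha - alpha.min()) / span
```

Because Eq. (7) normalizes attention within each bag, raw scores are only directly comparable between bags of similar size; per-sample rescaling is one simple way to display them.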
Our proposed MIL-based approach improves the classification performance for genetic blood disorders. The attention mechanism is effective in terms of both accuracy and interpretability of the classification. The model automatically learns about diagnostic cells in the samples, giving them high attention. These results are promising and have great potential for decision support and clinical applications in diagnosing blood diseases, as well as for training medical students.
[Fig. 5 panels: Spherocytosis, Thalassemia, Sickle cell disease, Healthy, PIEZO 1]
Fig. 5. Eight exemplary single cells per class with the highest attention.
Future work could include uncertainty estimation of the classification [22], an active learning framework [21] and extra features in the attention. Additionally, a detailed analysis of the shape of the cells receiving high attention might be informative about the underlying pathological mechanisms and the severity of the disease manifestation. This might allow patient stratification and targeted treatment in terms of personalized medicine, which is especially important for rare anemias such as spherocytosis and xerocytosis, but also for hemolytic anemias with poor response to conventional treatment [12].
Acknowledgments
C.M. and A.S. have received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement No. 866411). C.M. was supported by the BMBF, grant 01ZX1710A-F (Micmode-I2T). S.A. is supported by the PRIME programme of the German Academic Exchange Service (DAAD) with funds from the German Federal Ministry of Education and Research (BMBF).
References
1. Amores, J.: Multiple instance classification: Review, taxonomy and comparative study. Artificial Intelligence, 81–105 (2013)
2. Bessis, M.: Corpuscles: Atlas of Red Blood Cell. Springer (1974)
3. Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine (8), 1301–1309 (2019)
4. Chen, L.C., Yang, Y., Wang, J., Xu, W., Yuille, A.L.: Attention to scale: Scale-aware semantic image segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
5. Conjeti, S., Paschali, M., Katouzian, A., Navab, N.: Deep multiple instance hashing for scalable medical image retrieval. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 550–558. Springer (2017)
6. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature (7639), 115–118 (2017)
7. Gallagher, P.G.: Red cell membrane disorders. ASH Education Program Book (1), 13–18 (2005)
8. Gallagher, P.: Update on the clinical spectrum and genetics of red blood cell membrane disorders. Current Hematology Reports (2), 85–91 (2004)
9. Grace, R.F., Glader, B.: Red blood cell enzyme disorders. Pediatric Clinics (3), 579–595 (2018)
10. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2961–2969 (2017)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
12. Huisjes, R., Makhro, A., Llaudet-Planas, E., Hertz, L., Petkova-Kirova, P., Verhagen, L.P., Pignatelli, S., Rab, M.A., Schiffelers, R.M., Seiler, E., et al.: Density, heterogeneity and deformability of red cells as markers of clinical severity in hereditary spherocytosis. Haematologica (2), 338–347 (2020)
13. Ilse, M., Tomczak, J.M., Welling, M.: Attention-based deep multiple instance learning. arXiv preprint arXiv:1802.04712 (2018)
14. Kato, G.J., Piel, F.B., Reid, C.D., Gaston, M.H., Ohene-Frempong, K., Krishnamurti, L., Smith, W.R., Panepinto, J.A., Weatherall, D.J., Costa, F.F., et al.: Sickle cell disease. Nature Reviews Disease Primers (1), 1–22 (2018)
15. Matek, C., Schwarz, S., Spiekermann, K., Marr, C.: Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nature Machine Intelligence (11), 538–544 (2019)
16. McKinney, S.M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G.C., Darzi, A., et al.: International evaluation of an AI system for breast cancer screening. Nature (7788), 89–94 (2020)
17. Ozdemir, O., Russell, R.L., Berlin, A.A.: A 3D probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose CT scans. arXiv preprint arXiv:1902.03233 (2019)
18. Picard, V., Guitton, C., Thuret, I., Rose, C., Bendelac, L., Ghazal, K., Aguilar-Martinez, P., Badens, C., Barro, C., Bénéteau, C., et al.: Clinical and biological features in PIEZO1-hereditary xerocytosis and Gardos channelopathy: a retrospective series of 126 patients. Haematologica 104