A Dataset and Benchmark for Malaria Life-Cycle Classification in Thin Blood Smear Images
Qazi Ammar Arshad, Mohsen Ali, Saeed-ul Hassan, Chen Chen, Ayisha Imran, Ghulam Rasul, Waqas Sultani
Neural Computing and Applications manuscript No. (This paper is under review at Neural Computing and Applications)
Abstract
Malaria microscopy, the microscopic examination of stained blood slides to detect the parasite Plasmodium, is considered the gold standard for detecting the life-threatening disease malaria. Detecting the plasmodium parasite requires a skilled examiner and may take 10 to 15 minutes to go through the whole slide. Due to a lack of skilled medical professionals in underdeveloped or resource-deficient regions, many cases are misdiagnosed, which results in unavoidable medical complications. We propose to complement medical professionals with a deep learning-based method that automatically detects (localizes) the plasmodium parasites in a photograph of stained film. To handle the unbalanced nature of the dataset, we adopt a two-stage approach: the first stage is trained to detect blood cells and classify them as healthy or infected, and the second stage is trained to classify each detected cell further into its malaria life-cycle stage. To facilitate research in machine learning-based malaria microscopy, we introduce a new large-scale microscopic image malaria dataset. Thirty-eight thousand cells are tagged from 345 microscopic images of different Giemsa-stained slides of blood samples. Extensive experimentation is performed using different Convolutional Neural Networks on this dataset. Our experiments and analysis reveal that the two-stage approach works better than the one-stage approach for malaria detection. To ensure the usability of our approach, we have also developed a mobile app that will be used by local hospitals for investigation and educational purposes. The dataset, its annotations, and implementation code will be released upon publication of the paper.

E-mail: [email protected] · Information Technology University, Lahore, Pakistan · University of North Carolina at Charlotte, USA · Chughtai Institute of Pathology, Lahore, Pakistan · Ittefaq Hospital, Lahore, Pakistan
Out of 400 species of the Anopheles mosquito, only 60 species carry plasmodium, the parasite that causes malaria. Of these, only a female mosquito's bite can transfer the parasite to humans. According to the World Health Organization, in 2018 alone, 228 million cases of malaria occurred worldwide and there were 405,000 deaths globally due to malaria [33]. The malaria-carrying mosquito, Anopheles, is found in tropical and subtropical regions including Africa, Latin America, and Asia, and adversely affects many resource-deprived regions. The African continent accounts for 90% of the deaths caused by malaria, with 68% of the population of Ethiopia and 97% of the population of the Democratic Republic of the Congo living in areas at risk of malaria [3,31].

Although rapid testing kits are becoming more common, the standard method of malaria diagnosis is microscopic analysis of stained blood films or slides [1,10,31]. Being a low-cost and simple technique, it is a vital tool for detecting both malarial parasites and their density. Early treatment, which is vital for avoiding medical complications, relies heavily on early diagnosis. Examination of slides is taxing and difficult, since only a small portion of the slide is visible through the microscope and the parasite itself is small and present in low density [32], resulting in 10-15 minutes for a thorough examination. In addition to the time constraint, low-quality equipment or insufficiently skilled medical personnel [13] further degrade the quality of malaria microscopy analysis, leading to inaccurate results [5,48].
For the successful recovery of a patient, it is vital to diagnose and treat malarial infection early. Counted among the diseases of the poor, malaria mostly affects the poorest communities of the world, where preventive measures are not affordable and medical treatment is not readily available.

We propose a method to support medical professionals by increasing the accuracy and decreasing the time required for the examination. Once the slides are prepared, the images can be taken either with a microscopic camera or with a mobile phone camera through the microscope's eye-piece. Malaria-infected cells are then detected, counted, and classified by our algorithm. The output of our algorithm, shown by overlaying results on the image, can help medical professionals analyze the decisions made by the algorithm. With the help of such technology, even a less trained examiner, unsurprisingly common in developing countries [5,48], can detect malaria parasites in field-prepared blood slides.

In this paper, we present a two-stage approach for malaria detection and malaria life-cycle-stage classification. During the first stage, given the image captured by a microscopic camera or mobile phone, microscopic cells are efficiently segmented using morphological operations and the watershed algorithm. This is followed by the extraction of deep convolutional features for malaria versus non-malaria classification. Once the malaria cells are detected, in the second stage, we employ another deep classifier for malaria stage classification.

Currently, no large public microscopic image dataset annotating malaria is available, especially one from a developing country. We have collected samples from a local hospital in Lahore, Pakistan, and had them annotated by an expert. Our dataset contains the P. vivax malaria species in four different life-cycle stages: Ring, Schizont, Trophozoite, and Gametocyte. Finally, to make our approach usable in practical scenarios, we have also developed a mobile app that makes our approach user-friendly. Extensive experiments are conducted to handle the bias introduced by the class imbalance in the dataset. The experimental results indicate the efficacy of our approach.
The automatic analysis of medical images, i.e., images containing microscopic cells [45,27,1,10,21], MRI [23,29], CT scans [47,50], etc., can have a huge impact on the accuracy and efficiency of diagnostics. For this reason, several recent works have devised automatic methods for medical image analysis, including for images containing malaria parasites.

Devi et al. [10] introduced a hybrid classifier that is a combination of three different classifiers (SVM, kNN, and ANN) and used it for the life-cycle stage classification of malaria cells. Their results show that the proposed hybrid classifier performs better than the individual classifiers. Gloria et al. [14] adopted a two-stage approach and used histogram features for the classification of malaria cells. The first stage is used to classify healthy and malaria cells, while the second stage is used to identify the malaria stage (ring, trophozoite, gametocyte, or schizont). Abbas et al. [1] proposed color segmentation through an adaptive Gaussian mixture model algorithm for the classification of healthy and malaria-infected cells in thin blood smears and estimated the level of parasitemia. Khashman [21] used convolutional neural networks for the classification of blood cells in microscopic images. Their neural network model classified the blood cells into three classes, i.e., red blood cells, white blood cells, and platelets. Kiskin et al. [22] developed an algorithm to detect mosquitoes from acoustic data and showed that audio recording classification through a CNN employing wavelet transformations achieves better results than classifiers that use handcrafted features.

Di Ruberto et al. [12] introduced automatic detection and classification of Giemsa-stained blood cells. They employed a morphological operation for cell segmentation and enhanced cell roundness with disk-based structuring elements. Tek et al. [42] used weighted k-nearest neighbors to classify Giemsa-stained blood cells. Rao et al. [37] used mathematical morphological operations on Giemsa-stained blood slides to analyze the different life-cycle stages of malaria species. Sio et al. [41] introduced software named MalariaCount that counts the number of infected blood cells. They used binary morphology to distinguish malaria from healthy cells. Their experimental results on P. falciparum-infected blood slides demonstrated the automatic calculation of the parasitemia value. However, their approach does not address the classification of different plasmodium species or the life-cycle stage of a specific malaria species. Tek et al. [44] presented a framework that used a color normalization technique to improve the segmentation of blood cells. They then extracted cells from these slides, classified them as infected or healthy, and also identified the species and life-cycle stage of plasmodium. Kumarasamy et al. [24] proposed a solution for malaria species and stage classification, in which a radial basis Support Vector Machine (SVM) was trained on binary morphology and texture features to identify the infection stage of the parasite. Linder et al. [25] used an SVM for malaria-infected cell identification in Giemsa-stained thin blood smears. Parasite candidate regions were selected based on their color and size. After feature extraction (SIFT, local binary pattern, and local contrast), an SVM is trained on these features for parasite detection. Bhowmick et al. [7] proposed to use the watershed algorithm for cell segmentation on scanning electron microscopic images captured at 2000X magnification. They then extracted different geometric features to train a multi-layer neural network for the classification task. Fatima et al. [16] used adaptive thresholding and mathematical morphological algorithms for malaria parasite detection in microscopic images of thin blood films. Similarly, Molina et al. [30] designed a sequential classification model that combines three different classification modules. The first module used an SVM, and the second and third modules used Linear Discriminant Analysis (LDA) for the classification tasks.

Due to the resurgence of deep convolutional neural networks, recent years have witnessed significant progress in different areas of computer vision, including malaria detection. Bibin et al. [8] introduced a deep learning-based binary classifier for the classification of infected vs non-infected cells using color and texture features and a deep belief network (DBN). A hybrid model for malaria-infected blood cell classification was introduced by Devi et al. [11]. They employed a combination of three classifiers (SVM, k-NN, and Naive Bayes) and also introduced an optimal feature set that combines some existing and new features.
The authors in [46] introduced a deep model in which they replaced the last layer of the VGG network with an SVM (named VGG-SVM) and used it for the classification of Plasmodium falciparum-infected blood cells.

Similar to malaria detection in thin blood smears, several approaches have been proposed for automatic malaria detection in thick blood smears. Mehanian et al. [28] used a Convolutional Neural Network (CNN) for the detection of the malaria parasite in thick blood smears with sufficient accuracy. Salamah et al. [39] proposed a thick blood smear segmentation technique using intensity slicing and morphological operations for malaria parasite detection in microscopic images. The proposed method is robust to noise, artifacts, and intensity variation in blood slides. Yang et al. [49] introduced a new publicly available thick blood smear dataset containing 1819 images from 150 patients. Classification of malaria parasites is performed in two stages. The first stage uses intensity-based Iterative Global Minimum Screening (IGMS) to generate the parasite candidates, and in the second stage, a customized CNN is used to classify each candidate region as parasite or background.

In contrast to the above approaches, we present a new indigenously collected dataset containing microscopic images of thin-smear malaria slides and put forward a two-stage approach for accurate cell classification. We also provide a mobile app and an educational use case to make our approach user-friendly.
Our goal is to detect and localize malaria-infected cells in images that are captured using either a mobile phone camera or a camera attached to a digital microscope. Our proposed approach to this challenging problem is based on five observations: (1) since the purpose is to assist doctors in a timely manner, the proposed method should be efficient and must have high classification accuracy; (2) in practical scenarios, doctors sometimes just want to know whether malaria is present or absent, while in other critical situations they want to know the actual malaria stage, so a two-stage approach that serves both purposes is preferable; (3) to make medical professionals comfortable with the method, the approach should be trained and tested on locally acquired microscopic data; (4) a user-friendly interface is needed to employ and interpret the output of the algorithm; (5) finally, to make new medical practitioners more comfortable with computer vision-based diagnostics, the approach should help them in their educational understanding of the problem. To make our paper self-contained and reproducible, we describe the details of our approach in the following sections. Specifically, we discuss cell segmentation and the details of the two-stage approach for malaria-infected cell detection. Finally, we discuss the experimental results, our mobile app, and the application of our approach for educational purposes.

3.1 Cropping the area of interest

Using a mobile phone camera to perform microscopic analysis introduces various artifacts. One of them is the dark region surrounding the region of interest, observed when viewing the slide through the mobile camera (Fig. 2.a). Due to the fixed focal length of most mobile phones, this cannot be removed through camera settings. These darker shaded regions represent the background and could affect the performance of the segmentation method. We remove them using basic image processing techniques. First, the image is converted to grayscale and its contrast is enhanced by applying histogram equalization.
Fig. 1
The flow diagram of the proposed malaria detection approach. The process starts with an image captured with a mobile phone/microscopic camera, followed by pre-processing and binary mask generation. After segmenting the cells using the watershed algorithm, individual cells are fed into our binary classification network, which separates healthy and infected cells. Finally, the infected cells are fed into another multi-class classification model for life-cycle-stage detection.
The resultant image is converted into a binary image by thresholding, with the threshold picked using the Otsu method. We then find the largest contour in the binary image and crop the darker region of the original image according to it, as shown in Fig. 2.a and Fig. 2.b respectively. Secondly, we compute the ratio of darker to non-darker pixels in each row and column and crop the image at the locations where this ratio goes below a certain threshold. Fig. 2.c shows the final image.
Fig. 2
Process of removing unwanted darker regions from images captured from a microscope through a mobile phone: a) shows the unwanted darker region around the original image captured with a mobile phone mounted over an optical microscope, b) shows the image after the first pre-processing step, in which the image is cropped by finding the largest contour on the original image, c) shows the image after applying the automatic thresholding pre-processing step to completely remove the darker region from the image.
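The thresholding part of the cropping procedure above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `otsu_threshold` and `crop_dark_border`, the `max_dark_ratio` parameter, and the use of the dark-pixel fraction per row and column (rather than the exact dark to non-dark ratio) are hypothetical choices, and the histogram equalization and largest-contour steps are omitted.

```python
def otsu_threshold(pixels):
    """Return the gray level (0..255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]              # pixels with value <= t (dark class)
        if w_bg == 0:
            continue
        w_fg = total - w_bg          # pixels with value > t (bright class)
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def crop_dark_border(gray, max_dark_ratio=0.5):
    """Crop rows/columns dominated by dark pixels.

    gray is a list of rows of 8-bit gray values. max_dark_ratio is an
    assumed cut-off; the paper does not state the value it uses.
    """
    t = otsu_threshold([p for row in gray for p in row])
    dark = [[p <= t for p in row] for row in gray]
    h, w = len(gray), len(gray[0])
    rows = [y for y in range(h) if sum(dark[y]) / w <= max_dark_ratio]
    cols = [x for x in range(w)
            if sum(dark[y][x] for y in range(h)) / h <= max_dark_ratio]
    if not rows or not cols:
        return gray                  # nothing sensible to crop
    return [row[cols[0]:cols[-1] + 1] for row in gray[rows[0]:rows[-1] + 1]]
```

On a synthetic image with a uniformly dark frame around a bright centre, `crop_dark_border` returns only the bright centre. A real pipeline would more likely rely on an image processing library for these steps.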
1. Binarization: We first convert the input image into grayscale and apply histogram equalization to enhance the contrast between the blood cells and the background. We then divide the input image into sub-parts and apply Otsu thresholding [34] to get a binary mask of each sub-image. In some cases, due to non-uniform staining of peripheral blood slides, the illumination may not be the same across the whole image; this division therefore yields different foreground and background thresholds for different regions of the same image. Finally, the sub-image masks are combined to get a single binary mask.
2. Noise Removal: To enhance the cell structure, we apply two mathematical morphological operations. First, we apply an opening operation to remove small artifacts from the binary image. Second, an erosion operation is performed to separate clustered cells from each other. Figure 3.c shows the binary mask generated after Otsu thresholding and morphological operations.
3. Segmentation: The watershed algorithm [38] is applied to the binary mask to generate a unique label for each cell. Finally, we estimate contours on these labels to get the individual cell localizations (Figure 3.d and Figure 3.e). A few cell localization results are shown in Figure 6.

3.3 Cell's infection-stage Recognition

Each localized cell is sent to the classification module, which labels each detected cell according to its stage of infection. Two strategies are adopted for the classification module. In the first, we train a single-stage classifier to predict all the labels, including whether the cell is healthy or not. This strategy, although reasonable, is adversely affected by the imbalanced distribution of healthy and non-healthy cell samples. In the second, a two-stage setup is introduced to handle this issue. Below we detail both designs.
Fig. 3 (a) shows the original image captured with a mobile phone/microscopic camera, (b) shows the gray image after histogram equalization is applied to enhance contrast, (c) shows a binary mask of the image after applying a set of morphological operations, and (d) shows the labels generated using the watershed algorithm. Finally, (e) shows segmentation results indicating red blood cells in a thin blood smear image.

Single-stage Classification (SSC):
A multi-class classification network is trained to classify the cells extracted in the localization step. The network consists of a Convolutional Neural Network-based feature extractor whose features are input to fully-connected layers. The output layer infers the probability of the cell belonging to each of the four malarial life-stage classes, i.e., ring, trophozoite, schizont, and gametocyte, or to the healthy class. To find the convolutional neural network architecture that performs best for our problem, we have used VGG16, VGG19 [40], ResNet50v2 [17], DenseNet169, DenseNet201 [18], and the architecture proposed in Rajaraman et al. [36]. All these networks are first trained on the ImageNet dataset [9] and then fine-tuned on our dataset.

Normally, a microscopic slide dataset contains a large number of healthy cells and very few infected cells. Due to this high class imbalance, the learning process is biased towards predicting the healthy cells correctly rather than differentiating between the stages. To counter the effects of class imbalance, we design a cascade structure consisting of two stages, as described below.
Two-stage Classification (TSC):
To address the limitations of SSC, we propose a two-stage classification approach. The first stage performs binary classification, which classifies cells as healthy or malaria-infected. The cells that are flagged as infected are then fed into a multi-class classification network to identify the life-cycle stage of the infected cells. To handle the situation where healthy cells are misclassified as infected, the second stage also predicts healthy as one of the possible labels. Figure 1 shows the architecture of our two-stage method, which is a combination of two different networks.

Similar to SSC, different CNN models (VGG16, VGG19, ResNet50v2, DenseNet169, DenseNet201, and the architecture proposed by Rajaraman et al. [36]) are used for binary classification (first stage). For the training of the second stage, a subset of the training data is generated that contains all malaria classes and randomly sampled healthy cells, such that the training data is balanced. From the different versions of the first stage, we select the best binary classifier to separate malaria-infected cells, and then use the second stage to further classify the malaria cells into their different types. Our experiments demonstrate that the two-stage approach outperforms the one-stage approach.
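The control flow of the two-stage design can be summarized with a short sketch. It is a minimal illustration under assumptions, not the released implementation: `balanced_subset`, `two_stage_predict`, and the callables `is_infected` and `stage_of` are hypothetical names standing in for the trained networks, and sampling as many healthy cells as the largest infected class is an assumed balancing rule, since the text does not state the exact sampling size.

```python
import random
from collections import defaultdict

STAGES = ("ring", "trophozoite", "schizont", "gametocyte")


def balanced_subset(samples, seed=0):
    """Build stage-two training data from (cell, label) pairs.

    Keeps every infected cell and randomly samples healthy cells.
    Assumption: the healthy sample size equals the largest infected class.
    """
    by_label = defaultdict(list)
    for cell, label in samples:
        by_label[label].append((cell, label))
    infected = [s for stage in STAGES for s in by_label[stage]]
    n_healthy = max(len(by_label[stage]) for stage in STAGES)
    healthy = by_label["healthy"]
    rng = random.Random(seed)
    picked = rng.sample(healthy, min(n_healthy, len(healthy)))
    return infected + picked


def two_stage_predict(cell, is_infected, stage_of):
    """Cascade inference.

    is_infected(cell) -> bool stands in for the stage-one binary network.
    stage_of(cell) -> str stands in for the stage-two network, which may
    still answer "healthy" to correct a stage-one false alarm.
    """
    if not is_infected(cell):
        return "healthy"
    return stage_of(cell)
```

Because the second stage keeps "healthy" among its outputs, a cell wrongly flagged by the first stage can still end up with the correct label.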
Machine learning models in general, and deep learning models in particular, depend heavily on good-quality annotated datasets of reasonable size. We found the currently available datasets lacking and hence collected a large one to help facilitate research in this important direction. Before describing our malaria dataset, we briefly review the existing microscopic blood smear datasets for malaria detection.
Malaria Dataset | Malaria Species | Stage Classification | Classification Labels | Number of annotated cells | BBX Localization | Slide Images
Segmented-Malaria [36] | P.f | No | Binary | 27,558 | No | No
MaMic Image Database [25] | P.f | No | Binary | 16,991 | No | Yes
P.vivax (malaria) [6] | P.v | Yes | Multiclass | 80,000 | Yes | Yes
Malaria-655 [43] | P.f, P.v, P.m, P.o | Yes | Multiclass | 4363 | No | Yes
MP-IDB [26] | P.f, P.v, P.m, P.o | Yes | Multiclass | 840 | No | Yes
IML-Malaria Dataset | P.v | Yes | Multiclass | 38,449 | Yes | Yes
Table 1
A comparison of different malaria datasets. Datasets with binary classification labels only describe cells as healthy or malaria, while multi-class datasets provide labels for the different life-cycle stages of plasmodium. P.f represents Plasmodium falciparum, P.v represents Plasmodium vivax, P.m represents Plasmodium malariae, and P.o represents Plasmodium ovale.
Segmented-Malaria [36]: This dataset contains 27 ,

MaMic Image Database [25]: This dataset contains images of blood slides that are used to identify P. falciparum. Images are captured with a microscopic camera that scans the whole slide and captures around 549 images of a single slide. Every single image has 1280 × that provide a whole-slide view. For analysis, these images can be downloaded as 1500 × , 991 blood cells in total.

P.vivax (malaria) [6]: This dataset consists of 1364 images of thin blood smears stained with Giemsa. The dataset contains around 80,000 blood cells. The annotation contains cell-level bounding box annotations of the healthy and infected cells. The dataset also contains annotations of the cells at different life-cycle stages, including ring, trophozoite, schizont, and gametocyte.

Malaria-655 [43]: This dataset contains 655 images of nine different thin blood slides that were prepared separately from each other at different times. To prepare their dataset, the blood smear is fixed on the slide with methanol and stained with Giemsa stain. This dataset includes images of four different species of malaria: P.falciparum, P.vivax, P.ovale, and P.malariae. An SP200 research microscope with a mounted Canon A60 camera is used for capturing images. Images are captured with a 100x objective lens. The images in this dataset are of 1600 ×

MP-IDB [26]: This dataset contains four different species of malaria parasites: P.vivax, P.falciparum, P.ovale, and P.malariae. Each species in the dataset has four distinct life-cycle stages of plasmodium. The dataset is acquired at 100x magnification with a Leica DM2000 optical laboratory microscope coupled with a built-in camera. It consists of only 229 images with a 2592 × , 000 blood cells and images are labeled by expert pathologists. They provide image-level labels that indicate the presence of a malaria parasite, and the name of each image contains the information about the life-cycle stage of the parasite in that specific image.

http://demo.webmicroscope.net/SlideCollections
Limitations of existing datasets:
Each of the above-mentioned datasets has its limitations. The Segmented-Malaria [36] dataset contains individual images of blood cells without image-level labels or information. The MaMic image database [25] contains a small number of blood cells. P. vivax (malaria) [6] is collected in the developed world and does not necessarily represent our indigenous hematology. Malaria-655 [43] lacks bounding box-level annotations, and similarly MP-IDB [26] only provides image-level labels. Table 1 gives a detailed comparison between the existing datasets and the proposed dataset.

Fig. 4
Different stages of malaria, including ring, trophozoite, schizont, and gametocyte, in the IML-Malaria dataset. The top row shows healthy cells. Please see Section 1.1 for the details of the different malaria stages.
Our Dataset–IML-Malaria:
We collected blood samples from malaria-infected people living in the province of Punjab, Pakistan. Pakistan, being an agricultural country with warm and humid weather and a large irrigation network, is among the regions with a high risk of malarial infection [19]. As in many developing countries, malaria control efforts have been hampered by a lack of expensive laboratory instruments and, more importantly, of skilled technicians and doctors, specifically in remote areas [35]. Every year, on average, 500,000 confirmed cases of malarial infection and 50,000 deaths caused by it are reported in Pakistan [20]. The reasons are agricultural practices, monsoon season rains, and the vast irrigation network [19]. Another reason for malaria infection in Pakistan is frequent floods [35]. The two major species causing malaria in Pakistan are P.vivax and P.falciparum. The extensive research done by Qureshi et al. [35] shows that in Pakistan 81.3% of cases were caused by P. vivax, 14.7% by P. falciparum, and the remaining 4% by a mixed infection. Pakistan is facing many challenges in malaria control due to misdiagnosis, a lack of expensive laboratory instruments, and, more importantly, a lack of skilled people, specifically in remote areas [35].

We have collected a new malaria dataset that is captured with an XSZ-107 series microscope. Globally, 49% of malaria cases [35,4] are reported to be due to P.vivax; we therefore concentrate on collecting images of microscopic slides infected by this species of plasmodium. Blood smears were prepared by spreading a thin blood film on the slide and leaving it to dry. After fixing the blood smear with methanol, the slide is stained using Giemsa solution for 15 - 20 minutes. Finally, the cover-slips are permanently fixed on the slides to keep them safe. Our thin-film blood slides are prepared by trained experts in labs. Images are captured with the help of a camera mounted on a microscope at 100x objective magnification.
Images are captured and annotated by an expert hematologist at a local institute. The dataset contains 345 images, each containing 111 blood cells on average, both healthy ones and ones infected by the parasite. We annotate the infected cells into one class per life-cycle stage, that is, Ring, Trophozoite, Schizont, and Gametocyte, as shown in Figure 4. In addition, a bounding box is annotated for each cell. It must be noted that trophozoite and schizont are very rare [26], and so they are present in very low numbers in our data collection as well. A comparison between the different datasets is shown in Table 1. Figure 5 shows some examples of images in our dataset.

The actual magnification is 100 × 10 (eye-piece)

Fig. 5
Examples of images in the IML-Malaria dataset. Slide images look different from each other due to the effect of different staining conditions. The malaria stages shown are a) Trophozoite and Ring stage, b) Schizont stage, c) Ring and Gametocyte stage. Blue circles in (d) show different regions containing artifacts in the blood slide.

We evaluate our proposed approach on two levels of microscopic analysis: the first is cell localization and the second is stage classification.

5.1 Cell Localization

As our CNNs are trained on individual cells, we first need to localize blood cells for classification. As mentioned in Section 3.2, we apply the watershed segmentation algorithm in combination with morphological operations to localize cells in the blood slide image. A cell is considered localized (true positive) if the intersection over union (IOU) between the predicted and ground-truth cell bounding boxes is greater than 50%; otherwise the predicted box is considered a false positive. Similarly, the cells that are not detected by the algorithm are considered false negatives. Table 2 shows the results of cell localization on our dataset. From the table, it can be seen that 34,367 blood cells are accurately detected, with an F1-score of 91.14%.

Fig. 6
Result of cell localization on the IML-Malaria dataset. The blue and red boxes show the ground-truth and predicted cell bounding boxes respectively.

Dataset | TP | FP | P | R | F1-score
IML-Malaria | 34367 | 4082 | 0.94 | 0.89 | 0.91

Table 2
Localization results on the IML-Malaria dataset. TP, FP, P, and R represent true positives, false positives, precision, and recall respectively.

5.2 Malarial life-cycle stage Classification

Plasmodium parasites have different life-cycle phases, including the ring, trophozoite, schizont, and gametocyte stages. The life-cycle stage helps in measuring the extent of the parasitic infection in the patient's blood. Our IML-Malaria dataset is labeled accordingly, including the label healthy for the cells without infection. We randomly divide the dataset into training (70%), testing (20%), and validation (10%) sets. All the base networks used were pre-trained on the ImageNet [9] dataset before being plugged into our pipeline and fine-tuned on the malaria microscopy task. In our experiments we use the same setting for all the networks (Figure 1), i.e., we use the convolutional layers of these state-of-the-art networks and add two fully connected layers at the end of each network. The accuracies of the trained SSC and TSC models on the testing dataset are given in Table 3 and Table 5. To compute the average accuracy, we first compute the class-wise accuracy and then divide the sum of these accuracies by the number of classes. Below we discuss the performance of SSC and TSC separately.
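The two evaluation rules described in this section, the IOU > 50% localization criterion and the class-averaged accuracy, can be sketched as follows. This is a hedged illustration rather than the evaluation code used for the reported numbers: the function names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the greedy one-to-one matching of predictions to ground truth is an assumption, since the matching procedure is not specified in the text.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def localization_scores(pred, truth, thr=0.5):
    """Return (precision, recall, F1) using greedy one-to-one matching."""
    unmatched = list(truth)
    tp = 0
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) > thr:
            tp += 1
            unmatched.remove(best)   # each ground-truth box matches once
    fp = len(pred) - tp              # predictions without a match
    fn = len(unmatched)              # ground-truth cells that were missed
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def class_averaged_accuracy(y_true, y_pred):
    """Mean of per-class accuracies, so rare classes weigh as much as common ones."""
    per_class = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        per_class.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(per_class) / len(per_class)
```

Averaging over classes is what makes the metric sensitive to imbalance: a model that labels every cell healthy on a set with eight healthy cells and two ring cells is right on 80% of the cells, yet its class-averaged accuracy is only 50%.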
Single Stage Classification (SSC) Results :
Results of SSC with different backbone networks are compared in Table 3. The CNN proposed by [36] shows an average accuracy of 46%, whereas DenseNet201 performs best with an average accuracy of 74%. Although the average accuracy appears reasonable, the confusion matrix (Figure 7) indicates that the results are skewed by the healthy cells.
Model | Avg. Accuracy | F1-score
Rajaraman et al. [36] | 46.1% | 51.26%
VGG-16 | 55.35% | 52.40%
VGG-19 | 70.44% | 76.48%
ResNet50v2 | 72.86% | 76.44%
DenseNet169 | 73.91% | 77.62%
DenseNet201 | 74.56% | 75.52%

Table 3
Single-stage multi-class classification results on the IML-Malaria dataset. Among the different CNN models used, DenseNet201 has the highest classification accuracy.
Fig. 7
Single-stage multi-class classification confusion matrix of DenseNet201, the best-performing model in the single-stage setting, on the test set of the IML-Malaria dataset.
Two-Stage Classification
Both the average accuracy and the F-score improve (Table 5) when the two-stage model is applied to the test data. As indicated by the confusion matrix (Figure 8), the prediction accuracy improves for the Gametocyte and Trophozoite stages, with a small improvement in the detection of the Ring stage. However, the Schizont stage does not show much improvement. This could be attributed to the very small number of Schizont samples in the training and testing data.

We perform extensive experiments to investigate how different design decisions affect the overall accuracy of the model. The first stage (binary stage) is trained using different backbones and tested on the validation set. ResNet50v2 was found to have higher accuracy than the other backbone models. Keeping ResNet50v2 as the backbone for the first stage, we trained our second stage with different backbone networks. Table 4 shows the accuracies of binary classification; the model with the ResNet50v2 backbone shows the best accuracy on the test data too.

                        Avg. Accuracy   F-score
Rajaraman et al. [36]   84.93%          76.27%
VGG-16                  83.66%          75.23%
VGG-19                  94.79%          90.12%
ResNet50v2              95.63%          89.91%
DenseNet121             94.77%          88.60%
DenseNet169             95.61%          89.16%
DenseNet201             95.18%          89.45%

Table 4
Classification results of stage 1 for different CNN models on the IML-Malaria dataset. ResNet50v2 shows the best accuracy for binary classification.

Table 5 shows the results of two-stage classification on our malaria dataset. The combination of ResNet50v2 at stage one and stage two achieves the top average accuracy of 79.61%. Table 6 compares single-stage and two-stage classification. From Table 6, it can be seen that the accuracy of each network improves in the two-stage setting. Viewed in the context of Table 4, these results show the benefit of using the first stage to balance the samples that reach the second stage.
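The two-stage scheme evaluated above can be sketched as a simple cascade: a binary stage filters out healthy cells, and only cells flagged as infected reach the life-cycle stage classifier. The sketch below is illustrative only. The function names and the toy stand-in classifiers are hypothetical, and in the actual pipeline both stages would be trained CNNs (for example, ResNet50v2 at both stages).

```python
HEALTHY = "healthy"
STAGES = ("ring", "trophozoite", "schizont", "gametocyte")


def two_stage_classify(cells, is_infected, predict_stage):
    """Label each cell as healthy or with a malaria life-cycle stage.

    is_infected:   stage-1 callable, cell -> bool (binary classifier).
    predict_stage: stage-2 callable, cell -> one of STAGES; it is only
                   called for cells that stage 1 flags as infected.
    """
    labels = []
    for cell in cells:
        if not is_infected(cell):
            labels.append(HEALTHY)
            continue
        stage = predict_stage(cell)
        if stage not in STAGES:
            raise ValueError(f"unexpected stage label: {stage!r}")
        labels.append(stage)
    return labels


# Toy stand-ins for the two trained networks (hypothetical, for illustration).
def toy_is_infected(cell):
    return cell["infection_score"] >= 0.5


def toy_predict_stage(cell):
    return cell["stage_hint"]


cells = [
    {"infection_score": 0.1, "stage_hint": "ring"},
    {"infection_score": 0.9, "stage_hint": "ring"},
    {"infection_score": 0.7, "stage_hint": "gametocyte"},
]
labels = two_stage_classify(cells, toy_is_infected, toy_predict_stage)
print(labels)
```

Because the second stage never sees cells rejected by the first, it is trained and evaluated on a much less imbalanced set of classes than a single-stage classifier that must also separate out the dominant healthy class.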
Fig. 8
Two-stage multi-class classification confusion matrix, using ResNet50v2 at both stages.

                        Avg. Accuracy   F-score
Rajaraman et al. [36]   59.03%          60.59%
VGG-16                  79.04%          80.72%
VGG-19                  71.1%           76.05%
ResNet50v2              79.61%          82.04%
DenseNet121             78.11%          79.40%
DenseNet169             76.35%          79.08%
DenseNet201             77.24%          79.27%

Table 5
Two-stage multi-class classification results on the IML-Malaria dataset. ResNet50v2 is used in the first stage for binary classification, and each CNN model is then used in the second stage for multi-class classification. The combination of ResNet50v2 at both stages shows the highest accuracy.

                        Avg. Accuracy   Avg. Accuracy
                        Single Stage    Two-Stage
Rajaraman et al. [36]   46.1%           59.03%
VGG-16                  55.35%          79.04%
VGG-19                  70.44%          71.1%
ResNet50v2              72.86%          79.61%
DenseNet169             73.91%          76.35%
DenseNet201             74.56%          77.24%

Table 6
Comparison of single-stage and two-stage classification. For all CNN models, two-stage classification achieves higher accuracy than single-stage classification.
Fig. 9
Snapshot of our mobile application, through which one can efficiently count and localize all cells and the malaria-infected cells in an image captured through a mobile camera mounted over the microscope.
To make our solution accessible and usable in practical scenarios, we have developed a mobile application. The mobile app provides a user-friendly interface to capture images from a microscope using a mobile camera and to obtain the total cell count, the number of malaria-infected cells, and a visualization of each infected cell on the fly. Note that all CNN-related processing is done on the server; the mobile app is used only as an interface. To use the app in the field, the user just needs to connect their mobile phone to the internet.

The details of the mobile app and of the malaria detection process through it are as follows. The mobile app is developed in Swift 5 using the Xcode IDE. Storyboard is used for app design, and the Alamofire API [2] is employed for HTTP networking. Server APIs are built using the Django REST framework [15], which connects the mobile app with the Python server. Images are captured through a smartphone attached to a conventional light microscope. The mobile app then sends the image to our server, which hosts the segmentation and classification code for cell counting, detection of malaria-infected blood cells, and classification of the malaria life-cycle stage. All of these results are shown to the user on the mobile phone. This type of application will help field technicians save time and effort, particularly in regions where malaria is endemic. Our application provides a complete, end-to-end solution for detecting malaria with conventional light microscopes. Figure 9 shows a brief overview of our mobile application.
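As a rough illustration of the server side of this exchange, the sketch below shows how per-cell predictions might be summarized into a JSON payload carrying the total cell count, the number of infected cells, and the location and stage of each infected cell. The function name, field names, and bounding-box format are all hypothetical. The endpoint wiring in the Django REST framework and the segmentation and classification steps are deliberately omitted.

```python
import json
from collections import Counter

HEALTHY = "healthy"


def build_response(cell_predictions):
    """Summarize per-cell predictions for the mobile client.

    cell_predictions: list of (bounding_box, label) pairs, where
    bounding_box is (x, y, width, height) in pixels and label is either
    "healthy" or a life-cycle stage name. All field names are illustrative.
    """
    infected = [(box, label) for box, label in cell_predictions
                if label != HEALTHY]
    return {
        "total_cells": len(cell_predictions),
        "infected_cells": len(infected),
        "stage_counts": dict(Counter(label for _, label in infected)),
        "infected_boxes": [
            {"box": list(box), "stage": label} for box, label in infected
        ],
    }


# Made-up predictions for a single captured image.
predictions = [
    ((10, 12, 40, 40), "healthy"),
    ((60, 15, 42, 38), "ring"),
    ((120, 80, 41, 41), "healthy"),
    ((200, 30, 45, 44), "trophozoite"),
]
response = build_response(predictions)
print(json.dumps(response, indent=2))
```

Keeping the payload to plain counts and boxes means the client only has to draw overlays on the captured image, which is consistent with the app acting purely as an interface.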
Constructing an interface for easy acquisition of the microscopic images gives us an opportunity to represent a complete slide as one image. Manual slide examination requires repeated mechanical movement and focus adjustment, which is a tedious task. We implement a stitching mechanism that generates a global slide view, as shown in Figure 10. Such a representation is not only helpful for practitioners but can also be vital in educating medical students and retraining professionals. Training medical practitioners and students on such recent technology will make them comfortable using it to augment their expertise. This module could also be extended to microscopic analysis beyond malaria.
In this paper, we provide a machine learning-based mechanism for automatic malaria microscopy. Malaria results in four hundred and five thousand fatalities every year from two hundred and twenty-eight million infections, many in the poor regions of the world. Machine learning-based intervention is necessary to counter the low-quality equipment and less-trained staff involved in diagnosis through blood slides, and to increase the speed of diagnosis, since timely diagnosis is vital for a speedy recovery. We have collected a new large dataset of malaria microscopic images, which supports deep learning-based research on this problem. A Convolutional Neural Network-based, two-stage pipeline is proposed for accurate malaria cell stage recognition. The cascaded nature of our architecture helps counter the effects of the imbalanced nature of the dataset. To keep the computational cost low, cells are localized with a watershed algorithm combined with morphological operations rather than with a deep learning-based method. Extensive experiments were performed to measure performance. The selected model achieved an average accuracy of 79.61% and an F-score of 82.04% (Table 5).

Declaration of Competing Interest:
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions, and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the Corresponding Author.
Acknowledgement
The project is partially supported by an unrestricted gift award from Facebook, USA. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect those of Facebook.
Fig. 10
Blood slide image stitching that gives a global view of the whole slide. Red circles show the malaria-infected blood cells.
References
1. Abbas, N., Saba, T., Mohamad, D., Rehman, A., Almazyad, A.S., Al-Ghamdi, J.S.: Machine aided malaria parasitemia detection in giemsa-stained thin blood smears. Neural Computing and Applications (3), 803–818 (2018)
2. Alamofire. https://github.com/Alamofire/Alamofire
3. Alemu, M., Tadesse, D., Hailu, T., Mulu, W., Derbie, A., Hailu, T., Abera, B.: Performance of laboratory professionals working on malaria microscopy in tigray, north ethiopia. Journal of parasitology research (2017)
4. Alias, H., Surin, J., Mahmud, R., Shafie, A., Zin, J.M., Nor, M.M., Ibrahim, A.S., Rundi, C.: Spatial distribution of malaria in peninsular malaysia from 2000 to 2009. Parasites & vectors (1), 186 (2014)
5. Ashraf, S., Kao, A., Hugo, C., Christophel, E.M., Fatunmbi, B., Luchavez, J., Lilley, K., Bell, D.: Developing standards for malaria microscopy: external competency assessment for malaria microscopists in the asia-pacific. Malaria journal (1), 352 (2012)
6. Image set BBBC041v1. https://data.broadinstitute.org/bbbc/BBBC041/
7. Bhowmick, S., Das, D.K., Maiti, A.K., Chakraborty, C.: Computer-aided diagnosis of thalassemia using scanning electron microscopic images of peripheral blood: a morphological approach. Journal of Medical Imaging and Health Informatics (3), 215–221 (2012)
8. Bibin, D., Nair, M.S., Punitha, P.: Malaria parasite detection from peripheral blood smear images using deep belief networks. IEEE Access, 9099–9108 (2017)
9. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE (2009)
10. Devi, S.S., Laskar, R.H., Sheikh, S.A.: Hybrid classifier based life cycle stages analysis for malaria-infected erythrocyte using thin blood smear images. Neural Computing and Applications (8), 217–235 (2018)
11. Devi, S.S., Roy, A., Singha, J., Sheikh, S.A., Laskar, R.H.: Malaria infected erythrocyte classification based on a hybrid classifier using microscopic images of thin blood smear. Multimedia Tools and Applications (1), 631–660 (2018)
12. Di Ruberto, C., Dempster, A., Khan, S., Jarra, B.: Analysis of infected blood cell images using morphological operators. Image and vision computing (2), 133–146 (2002)
13. malERA Consultative Group on Diagnoses, Diagnostics: A research agenda for malaria eradication: diagnoses and diagnostics. PLoS medicine (1), e1000396 (2011)
14. Díaz, G., González, F.A., Romero, E.: A semi-automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic images. Journal of Biomedical Informatics (2), 296–307 (2009)
15. Django REST framework.
16. Fatima, T., Farid, M.S.: Automatic detection of plasmodium parasites from microscopic blood images. Journal of Parasitic Diseases (1), 69–78 (2020)
17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)
18. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708 (2017)
19. Kakar, Q., Khan, M., Bile, K.: Malaria control in pakistan: new tools at hand but challenging epidemiological realities. EMHJ-Eastern Mediterranean Health Journal, 16 (Supp.), 54-60, 2010 (2010)
20. Khan, W., Rahman, A.U., Shafiq, S., Ihsan, H., Khan, K.: Malaria prevalence in malakand district, the north western region of pakistan. JPMA (946) (2019)
21. Khashman, A.: Investigation of different neural models for blood cell type identification. Neural Computing and Applications (6), 1177–1183 (2012)
22. Kiskin, I., Zilli, D., Li, Y., Sinka, M., Willis, K., Roberts, S.: Bioacoustic detection with wavelet-conditioned convolutional neural networks. Neural Computing and Applications (4), 915–927 (2020)
23. Konar, D., Bhattacharyya, S., Gandhi, T.K., Panigrahi, B.K.: A quantum-inspired self-supervised network model for automatic segmentation of brain mr images. Applied Soft Computing (2020)
24. Kumarasamy, S.K., Ong, S., Tan, K.S.: Robust contour reconstruction of red blood cells and parasites in the automated identification of the stages of malarial infection. Machine Vision and Applications (3), 461–469 (2011)
25. Linder, N., Turkki, R., Walliander, M., Mårtensson, A., Diwan, V., Rahtu, E., Pietikäinen, M., Lundin, M., Lundin, J.: A malaria diagnostic tool based on computer vision screening and visualization of plasmodium falciparum candidate areas in digitized blood smears. PLoS One (8) (2014)
26. Loddo, A., Di Ruberto, C., Kocher, M., Prod’Hom, G.: Mp-idb: the malaria parasite image database for image processing and analysis. In: Sipaim–Miccai Biomedical Workshop, pp. 57–65. Springer (2018)
27. Lu, Y., Qin, X., Fan, H., Lai, T., Li, Z.: Wbc-net: A white blood cell segmentation network based on unet++ and resnet. Applied Soft Computing (2021)
28. Mehanian, C., Jaiswal, M., Delahunt, C., Thompson, C., Horning, M., Hu, L., Ostbye, T., McGuire, S., Mehanian, M., Champlin, C., et al.: Computer-automated malaria diagnosis and quantitation using convolutional neural networks. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 116–125 (2017)
29. Mittal, M., Goyal, L.M., Kaur, S., Kaur, I., Verma, A., Jude Hemanth, D.: Deep learning based enhanced tumor segmentation approach for mr brain images. Applied Soft Computing (2019)
30. Molina, A., Alférez, S., Boldú, L., Acevedo, A., Rodellar, J., Merino, A.: Sequential classification system for recognition of malaria infection using peripheral blood cell images. Journal of Clinical Pathology (2020)
31. Mukadi, P., Gillet, P., Lukuka, A., Atua, B., Sheshe, N., Kanza, A., Mayunda, J.B., Mongita, B., Senga, R., Ngoyi, J., et al.: External quality assessment of giemsa-stained blood film microscopy for the diagnosis of malaria and sleeping sickness in the democratic republic of the congo. Bulletin of the World Health Organization, 441–448 (2013)
32. Organization, W.H., et al.: Methods for surveillance of antimalarial drug efficacy. 2009. Geneva, Switzerland (2015)
33. Organization, W.H., et al.: World malaria report 2019 (2019)
34. Otsu, N.: A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics (1), 62–66 (1979)
35. Qureshi, N.A., Fatima, H., Afzal, M., Khattak, A.A., Nawaz, M.A.: Occurrence and seasonal variation of human plasmodium infection in punjab province, pakistan. BMC infectious diseases (1), 935 (2019)
36. Rajaraman, S., Antani, S.K., Poostchi, M., Silamut, K., Hossain, M.A., Maude, R.J., Jaeger, S., Thoma, G.R.: Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ, e4568 (2018)
37. Rao, K.M.a., Dempster, A., Jarra, B., Khan, S.: Automatic scanning of malaria infected blood slide images using mathematical morphology (2002)
38. Roerdink, J.B., Meijster, A.: The watershed transform: Definitions, algorithms and parallelization strategies. Fundamenta informaticae (1, 2), 187–228 (2000)
39. Salamah, U., Sarno, R., Arifin, A.Z., Nugroho, A.S., Rozi, I.E., Asih, P.B.S.: A robust segmentation for malaria parasite detection of thick blood smear microscopic images. Int. J. Adv. Sci. Eng. Inf. Technol. (4), 1450–1459 (2019)
40. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
41. Sio, S.W., Sun, W., Kumar, S., Bin, W.Z., Tan, S.S., Ong, S.H., Kikuchi, H., Oshima, Y., Tan, K.S.: Malariacount: an image analysis-based program for the accurate determination of parasitemia. Journal of microbiological methods (1), 11–18 (2007)
42. Tek, F.B., Dempster, A.G., Kale, I.: Malaria parasite detection in peripheral blood images. BMVA (2006)
43. Tek, F.B., Dempster, A.G., Kale, I.: Images of thin blood smears with bounding boxes around malaria parasites (malaria-655). Computer Vision and Image Understanding (2010). URL
44. Tek, F.B., Dempster, A.G., Kale, I.: Parasite detection and identification for automated thin blood film malaria diagnosis. Computer vision and image understanding (1), 21–32 (2010)
45. Toğaçar, M., Ergen, B., Cömert, Z.: Classification of white blood cells using deep features obtained from convolutional neural network models based on the combination of feature selection methods. Applied Soft Computing (2020)
46. Vijayalakshmi, A., et al.: Deep learning approach to detect malaria from microscopic images. Multimedia Tools and Applications pp. 1–21 (2019)
47. Wang, B., Jin, S., Yan, Q., Xu, H., Luo, C., Wei, L., Zhao, W., Hou, X., Ma, W., Xu, Z., Zheng, Z., Sun, W., Lan, L., Zhang, W., Mu, X., Shi, C., Wang, Z., Lee, J., Jin, Z., Lin, M., Jin, H., Zhang, L., Guo, J., Zhao, B., Ren, Z., Wang, S., Xu, W., Wang, X., Wang, J., You, Z., Dong, J.: Ai-assisted ct imaging analysis for covid-19 screening: Building and deploying a medical ai system. Applied Soft Computing (2021)
48. Wongsrichanalai, C., Barcus, M.J., Muth, S., Sutamihardja, A., Wernsdorfer, W.H.: A review of malaria diagnostic tools: microscopy and rapid diagnostic test (rdt). The American journal of tropical medicine and hygiene (6 Suppl), 119–127 (2007)
49. Yang, F., Poostchi, M., Yu, H., Zhou, Z., Silamut, K., Yu, J., Maude, R.J., Jaeger, S., Antani, S.: Deep learning for smartphone-based malaria parasite detection in thick blood smears. IEEE Journal of Biomedical and Health Informatics 24