DeepCervix: A Deep Learning-based Framework for the Classification of Cervical Cells Using Hybrid Deep Feature Fusion Techniques
Md Mamunur Rahaman, Chen Li, Yudong Yao, Frank Kulwa, Xiangchen Wu, Xiaoyan Li, Qian Wang
A Preprint
Md Mamunur Rahaman
Microscopic Image and Medical Image Analysis Group, MBIE College, Northeastern University, Shenyang 110169, China
[email protected]

Chen Li
Microscopic Image and Medical Image Analysis Group, MBIE College, Northeastern University, Shenyang 110169, China
[email protected]

Yudong Yao
Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA

Frank Kulwa
Microscopic Image and Medical Image Analysis Group, MBIE College, Northeastern University, Shenyang 110169, China

Xiangchen Wu
Suzhou Ruiguan Technology Company Ltd., Suzhou 215000, China

Xiaoyan Li
Cancer Hospital of China Medical University, Liaoning Hospital and Institute, Shenyang 110042, China

Qian Wang
Cancer Hospital of China Medical University, Liaoning Hospital and Institute, Shenyang 110042, China

February 25, 2021

Abstract
Cervical cancer, one of the most common fatal cancers among women, can be prevented by regular screening, which detects precancerous lesions at early stages so they can be treated. The Pap smear test is a widely performed screening technique for the early detection of cervical cancer, but this manual screening method suffers from a high false-positive rate because of human error. To improve on manual screening practice, machine learning (ML) and deep learning (DL) based computer-aided diagnostic (CAD) systems have been investigated widely for classifying cervical Pap cells. Most existing studies require pre-segmented images to obtain good classification results, whereas accurate cervical cell segmentation is challenging because of cell clustering. Some studies rely on handcrafted features, which cannot guarantee the optimality of the classification stage. Moreover, DL provides poor performance for a multiclass classification task when there is an uneven distribution of data, which is prevalent in cervical cell datasets. This investigation addresses those limitations by proposing DeepCervix, a hybrid deep feature fusion (HDFF) technique based on DL, to classify cervical cells accurately. Our proposed method uses various DL models to capture more potential information and enhance classification performance. The HDFF method is tested on the publicly available SIPAKMED dataset, and its performance is compared with the base DL models and the late fusion (LF) method. For the SIPAKMED dataset, we obtain state-of-the-art classification accuracies of 99.85%, 98.38% and 99.14% for 2-class, 3-class and 5-class classification. Moreover, our method is tested on the Herlev dataset and achieves an accuracy of 98.91% for binary and 90.32% for 7-class classification.

Keywords: Cervical cancer · Classification · Ensemble learning · Feature fusion · Deep learning · Pap smear
Introduction

Cervical cancer, found in the woman's cervix, is the fourth most prevalent cancer among women [1]. According to the World Health Organization (WHO), approximately 570,000 women were diagnosed with cervical cancer globally, and about 311,000 women lost their lives to this fatal disease in 2018 alone [2]. More than 80% of cervical cancer cases and 85% of deaths occur in poor and developing nations because of the absence of screening and treatment facilities [3]. Improper menstrual hygiene, pregnancy at an early age, smoking and the use of oral contraceptives are the leading risk factors for infection with human papillomavirus (HPV) [4]. Research has revealed that long-term infection with HPV is the main cause of cervical cancer. However, cervical cancer is among the most treatable forms of cancer if it is detected early and treated adequately [5].

Routine screening of women over 30 years old plays a vital role in preventing cervical cancer by allowing early detection and treatment [6]. The most popular screening technique to detect cervical malignancy is cervical cytopathology (the Pap smear test or liquid-based cytology) due to its cost-effectiveness [5, 7]. In this technique, cells are collected from the squamocolumnar junction of the cervix, and malignancy is checked under a light microscope by expert cytologists [8, 9]. It usually takes 5-10 minutes to analyze a single slide, depending on the orientation and overlapping of the cells [10].
Moreover, manual screening is difficult, tedious, time-consuming, expensive and subject to errors, because each slide contains around three million cells with different orientations and overlaps. This motivates the development of an automated computerized system that can analyze Pap cells effectively and efficiently [11, 12]. With the growing availability of training data since the end of the 1990s, there has been extensive research into computer-aided diagnostic (CAD) systems to help doctors track cervical cancer [13]. The traditional CAD system consists of three steps: cell segmentation (cytoplasm, nuclei), feature extraction and classification. In this system, filtering-based preprocessing is first performed to enhance image quality. Then, cell nuclei are extracted using k-means [14], clustering [15] or super-pixel [16] methods, followed by a post-processing step to correct the segmented nuclei. After that, handcrafted features [17, 18, 19], such as morphological features, colorimetric features and texture features, are extracted from the segmented nuclei. Next, a feature selection technique is applied to find the most discriminant features, and finally a classifier is designed to classify the cell [20].

The above-described method requires many steps to process the data, and the extracted handcrafted features cannot ensure superior classification performance, which also highlights the lack of automatic learning. To obtain an enhanced CAD system, deep learning (DL) based feature extraction methods have a significant advantage over other machine learning (ML) algorithms. DL-based algorithms achieve state-of-the-art results on challenging computer vision tasks [21, 22]. One compromise with DL is that it demands a considerable amount of data to obtain good results compared with ML techniques, which is challenging in the medical domain [23].
Moreover, DL also provides poor performance when there is an uneven distribution of sample data in a multiclass classification problem, which is very prevalent in the medical domain. Therefore, the CAD technique for the analysis of Pap cells requires further research and development.

In this study, we introduce DeepCervix, a DL-based framework to accurately classify cervical cytopathology cells based on hybrid deep feature fusion (HDFF) techniques. In our proposed framework, we use pre-trained DL models that are trained on the ImageNet dataset and fine-tuned
Figure 1: Workflow diagram of the proposed DeepCervix network. (Global Max Pooling (GMP), Batch Normalization (BN), Dense Layer (D), SoftMax (SM))

on the SIPAKMED dataset, consisting of single-cell cervical cytopathology images. For the SIPAKMED dataset, we have achieved the highest classification accuracies of 99.85%, 98.38% and 99.14% for the 2-class, 3-class and 5-class classification problems, respectively. Moreover, we have also tested our method on the Herlev dataset and reached an accuracy of 98.91% for binary classification and 90.32% for the 7-class problem. The workflow of the suggested HDFF method is presented in Fig. 1. From the workflow diagram, we can see that:

• As shown in Fig. 1, the cervical Pap smear images are first retrieved from publicly accessible databases (e.g., SIPAKMED, Herlev) and considered as training samples.
• In the preprocessing step, two stages of data augmentation are implemented. The first uses geometric manipulations, such as affine transformations, added noise (Gaussian, Laplace), Canny filters, edge detection, colour filters, and changes of brightness and contrast, to increase the number of training samples. The second uses the in-place data augmentation technique of the Keras "ImageDataGenerator" API, where images are transformed randomly during training.
• After the preprocessing step, the images are supplied to four DL models: VGG16, VGG19, XceptionNet and ResNet50. From Fig. 1-(c), it can be seen that for the VGG16 model, we have fine-tuned the last convolutional block, from layer 13 to layer 18, along with the top-level classifier.
• In the feature fusion network (FFN) stage, we first extract the features from the last layer before the SM layer of each DL model to create feature arrays with 1024 features per model. Then, the feature arrays are fed into a sequential model connected to a dense layer, with BN and dropout layers in between, to perform the classification.
• In this step, unseen test images are provided to perform the classification.
• Finally, we assess the performance of the proposed model by calculating the precision, recall, F1 score and accuracy.

The main contributions of this paper are as follows: (1) To the best of our knowledge, this is the first study to classify cervical cytopathology cells using HDFF techniques. (2) Two different stages of data augmentation techniques are presented. (3) Four types of CNNs with enhanced structure, VGG16, VGG19, XceptionNet and ResNet50, are introduced to extract complementary features from various depths of the networks.
(4) An improved FFN is included to integrate the features adaptively by combining a dense layer with SM, and BN and dropout layers in between. (5) Our proposed method achieves the highest classification accuracy on the SIPAKMED dataset, which shows the potential for improved cervical cancer diagnostic systems.

The remainder of this paper is organized as follows: Sec. 3 presents relevant studies of DL for the analysis of cervical cytopathology images and relevant feature fusion studies in computer vision tasks. Sec. 4 investigates the data pre-processing techniques used in our experiments and our proposed methods. Sec. 5 explains the experimental dataset, data settings, experimental setup, evaluation method, and experimental results and analysis. Sec. 6 discusses our proposed method with some examples of misclassified images. Finally, Sec. 7 concludes this paper by pointing out some limitations of our method.

Related Studies

An overview of relevant DL approaches employed to analyze cervical cells, and of feature fusion techniques in imaging modalities, is compiled in this section.
Various DL- and ML-based techniques have been applied to classify cervical cells. For instance, [24] utilizes histogram features, texture features, grey-level features and local binary pattern features; the features are then supplied to a hybrid classifier system combining an SVM and an adaptive neuro-fuzzy inference system to classify cervical cells as normal or abnormal. A hybrid ensemble technique is introduced in [25] by combining 15 different machine learning algorithms, such as random forest, bagging, rotation forest and J graft, to classify the cervical cells; the authors observe that the hybrid ensemble performs better than any individual algorithm.

A deep CNN (based on AlexNet) feature extraction method is applied in [26], followed by an unsupervised feature selection task; the feature vectors are then supplied to a least-squares support vector machine (LSSVM) and SoftMax regression to classify the cervical cells. [27] designs a model that extracts features from cervical cells using VGG16 and feeds them into ML classifiers: a support vector machine (SVM), random forest and AdaBoost; they find that the SVM performs better than the other ML classifiers. A pre-trained AlexNet architecture is employed in [28] to extract the characteristics of cervical cells, which are then classified using an SVM. A CNN-based classification approach is presented in [29] that applies the VGG16 and ResNet architectures and finds that ResNet50 is more suitable than VGG16 based on performance. A deep transfer learning-based classification approach is presented in [30] to classify cervical cells into healthy and abnormal, with prior data
augmentation and patch extraction. [31] applies a deep transfer learning technique based on AlexNet to detect, segment and classify cervical cells and demonstrates that segmentation is not necessary for classification. Pre-trained and fine-tuned CNN architectures based on AlexNet, GoogleNet, ResNet and DenseNet are employed to classify cervical cells in [32], where segmentation of the cytoplasm and nucleus is a prerequisite.

Similarly, in [33], a VGG-like network consisting of seven layers uses pre-segmented cervical cells to perform the classification task. A comparative study based on five DL models, ResNet101, DenseNet161, AlexNet, VGG19 and SqueezeNet, checks their classification performance on a cervical dataset, where DenseNet161 provides the maximum accuracy [34]. Moreover, [35] couples the features of pre-trained Inception-V3, ResNet152 and InceptionResNetV2 to analyze biomedical images. For a detailed study of related work, we recommend our survey paper on cervical cytopathology image analysis using DL [1].

The literature review shows that most authors have conducted binary classification tasks, whereas, in practice, multiclass classification is more important. Moreover, transferred models are often unable to capture the characteristics of medical images, and traditional features cannot guarantee the optimality of the system. Therefore, this paper investigates methods to address those issues.
A hybrid fusion approach combining early and late fusion is presented in [36] for the diagnosis of glaucoma. Handcrafted features, such as the grey-level co-occurrence matrix and central and Hu moments, are consolidated with deep features; the feature vectors are then supplied to SVM- and CNN-based classifiers. A satellite remote sensing scene classification method based on multi-structure deep feature fusion is presented in [37]: CaffeNet, VGG-VD16 and GoogLeNet are applied to extract features, which are fused through a fusion network to perform the classification. [38] develops a CAD method to detect breast cancer by employing feature fusion with a CNN; deep features, morphological features, texture features and density features are combined and fed through an extreme learning machine classifier to classify breast masses as benign or malignant. In our previous study [39], we classified cervical histopathology images using weighted-voting-based ensemble learning techniques. In [40], an ensemble of different CNN structures is used to classify medical images; the proposed ensemble shows better predictive capability by combining the results of different classifiers. [41] uses pre-trained AlexNet and VGG16 to extract features from segmented skin lesions and classify them as benign or malignant.
Data Pre-processing

The cervical cytopathology cell images (SIPAKMED dataset) that we employ to evaluate our proposed method are in BMP format with varying dimensions. Therefore, we rescale all images to 224 × 224 pixels for all four CNN networks. In this respect, we utilize the Keras "preprocess_input" function, which transforms input images according to each model's requirements.

This subsection discusses the various geometric transformations and image processing functions that we use in our experiments. The data augmentation task is performed with the "imgaug" library, which supports a wide variety of augmentation techniques. The newly formed images are saved alongside the training images and increase the training data size by a factor of six, which helps to obtain better results.

• Affine transformations (ATs): ATs are geometric manipulations that move a pixel from a coordinate position $(a, b)$ to a new position $(a', b')$. A pair of transformations specifies the movement:

$a' = T_a(a, b), \quad b' = T_b(a, b)$   (1)

An AT combines linear transformations and translations. In our experiment, we perform rotation, scaling, translation, shearing, and horizontal and vertical flip operations on an image. For a batch of training images, one of these transformations is chosen at random.

• Contrast-limited adaptive histogram equalization (CLAHE): Histogram equalization (HE) enhances the contrast of images, which may lead to overly bright or dark regions. CLAHE instead performs histogram equalization on small blocks of the image, where each block performs HE. As a result, it
prevents the over-amplification of noise and contrast in an image. CLAHE, all-channel CLAHE and gamma contrast are employed in our experiment, and one of these augmenters is chosen at random for a batch of training samples.
• Edge detection: the "EdgeDetect" and "DirectedEdgeDetect" functions from the imgaug API transform the input images into edge images, where edges are detected from random angles, and non-edge regions are marked black and edge regions white.
• Canny filter: Canny edge detection augmenters are also utilized, where the input images are preprocessed using a Sobel filter.
• Photometric transformations (PTs): PTs are accomplished by shuffling the colour channels, turning images into grayscale, changing hue and saturation values, adding hue and saturation, and quantizing images to up to 16 colours.
• Contrast adaptation (CA): CA is performed by modifying the contrast and brightness of an image.
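As an illustration of Eq. (1), the affine map can be applied directly to pixel coordinates; a minimal numpy sketch (the function and parameter names here are ours, not imgaug's):

```python
import numpy as np

def affine_transform(points, angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Map pixel coordinates (a, b) to (a', b') as in Eq. (1):
    a linear part (rotation combined with isotropic scaling)
    followed by a translation (tx, ty)."""
    theta = np.deg2rad(angle_deg)
    linear = scale * np.array([[np.cos(theta), -np.sin(theta)],
                               [np.sin(theta),  np.cos(theta)]])
    return points @ linear.T + np.array([tx, ty])

# A 90-degree rotation sends the pixel at (1, 0) to (0, 1).
pts = np.array([[1.0, 0.0]])
out = affine_transform(pts, angle_deg=90.0)
```

Shearing and flipping fit the same $(a, b) \to (a', b')$ template with a different linear part.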
To enhance model performance further, the Keras "ImageDataGenerator" API is applied [42]. The images are transformed randomly during training, so the network sees different samples in each epoch, which improves the model's generalizability. In this process, we set the featurewise center to false, the rotation range to 5 degrees and the fill mode to "nearest". We enable horizontal and vertical flips, set the brightness range from 50% to 130% and keep channel shifting enabled.
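As a rough numpy stand-in for what this in-place augmentation does (the function names are ours, not Keras'; only flips and the 50%-130% brightness change are sketched here):

```python
import numpy as np

def random_augment(image, rng):
    """Return a randomly transformed copy of `image` (H x W x C, floats in [0, 1]).
    Flips and a 0.5-1.3x brightness change, a subset of the settings in the text."""
    out = image.copy()
    if rng.random() < 0.5:          # horizontal flip
        out = out[:, ::-1, :]
    if rng.random() < 0.5:          # vertical flip
        out = out[::-1, :, :]
    return np.clip(out * rng.uniform(0.5, 1.3), 0.0, 1.0)

def augmenting_batches(images, batch_size, rng):
    """Endless generator: each epoch the network sees freshly transformed samples,
    which is the key idea behind in-place augmentation."""
    while True:
        idx = rng.choice(len(images), size=batch_size, replace=False)
        yield np.stack([random_augment(images[i], rng) for i in idx])

rng = np.random.default_rng(0)
data = rng.random((10, 32, 32, 3))          # ten dummy RGB "cell" images
batch = next(augmenting_batches(data, batch_size=4, rng=rng))
```

Unlike the first augmentation stage, nothing is written to disk: the transformed batches exist only for the duration of a training step.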
Deep Learning Models

Lately, DL, one type of ML algorithm, has become the most commonly used and successful approach for analyzing medical images. The convolutional neural network (CNN) is the most prevalent deep learning architecture. Research has confirmed that CNNs are robust to image noise and invariant to translation, rotation and size, which increases their object-analyzing ability [43, 44]. A CNN architecture is composed of convolution, pooling and fully connected layers. The main building block of the CNN structure is the convolution layer, which extracts low-level and then high-level features of an image as the layers get deeper [45]. The pooling layer after the convolution layer reduces the size of the convolved features by extracting the maximum or average value through a max-pooling or average-pooling operation. A fully connected layer (FCL) connects every neuron of one layer to the next to classify the image, following the principle of the multilayer perceptron [46]. In this study, we utilize VGG16, VGG19, ResNet50 and XceptionNet as CNN architectures.

1. VGGNet: VGGNet introduced the idea of a deeper network with smaller filters. The model can have 16 to 19 layers with a fixed input size of 224 × 224 × 3. The convolution filter size is 3 × 3 with a stride of 1 pixel. A linear transformation of the input is also performed by a 1 × 1 convolution filter with a ReLU activation function. A total of five max-pooling operations are performed with window size 2 × 2, followed by three FCLs. The significant discovery of VGGNet is the small receptive field (3 × 3), which enables more weight layers and, consequently, improved performance [47].

2. ResNet: [48] observes that, as network depth increases, network performance improves up to a certain point and then degrades rapidly. ResNet therefore introduces skip connections so that performance keeps improving with depth; this makes it possible to have over 1000 weight layers.
For a feature input x of a convolution layer with F(x) as the residual function, the input of the first layer, x, is copied to the output:

$H(x) = F(x) + x, \quad \text{or} \quad F(x) = H(x) - x$   (2)

The structure of the residual learning block is shown in Fig. 2.

3. XceptionNet: XceptionNet is the extended version of the Inception model, based on depthwise separable convolutions followed by pointwise convolutions. The model is lighter, with fewer connections, and provides better results on ImageNet classification than InceptionV3, ResNet and VGGNet [49].

Training a CNN from scratch demands a considerable amount of data and high computing power, and it also costs longer training time. In the medical domain, image datasets are usually small, since arranging large annotated datasets is quite difficult, and the image quality is often inferior. The solution to this problem is transfer learning (TL), which helps to create an accurate model by starting the learning from patterns that have been
Figure 2: The structure of the residual learning block of ResNet.
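As a toy illustration of the identity shortcut in Eq. (2) (the function and weight names are ours, not from the paper's implementation), a residual unit can be sketched in numpy:

```python
import numpy as np

def residual_block(x, weights):
    """A toy residual unit: H(x) = F(x) + x, as in Eq. (2).
    F is a small two-layer transform with ReLU; the identity shortcut
    lets features (and gradients) bypass F entirely."""
    w1, w2 = weights
    f = np.maximum(0.0, x @ w1)    # first layer + ReLU
    f = f @ w2                     # second layer (pre-addition)
    return np.maximum(0.0, f + x)  # add the shortcut, then ReLU

# With zero weights F(x) = 0, so the block reduces to the identity (for x >= 0),
# which is exactly why very deep stacks of such blocks remain trainable.
x = np.array([1.0, 2.0, 3.0])
zero_w = (np.zeros((3, 3)), np.zeros((3, 3)))
out = residual_block(x, zero_w)
```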
Figure 3: (a) Visualization of the TL process, where parameters are transferred from another CNN and fine-tuned on the cervical cancer cell dataset; (b) visualization of the feature maps of three different convolutional layers of VGG16.

already learned while solving different problems, instead of learning from scratch [50, 51]. Therefore, TL is an approach in DL and ML that allows us to transfer knowledge from one model to another. There are two steps in a TL process. The first step is to select a pre-trained model that is trained on a large-scale benchmark dataset, which
is related to the problem we intend to solve. For instance, Keras offers a wide range of pre-trained networks, such as VGG, Inception, Xception and ResNet. The second step is to fine-tune the model, considering the size and similarity of our dataset relative to the pre-trained model's dataset. If we have a considerable amount of data that differs from the pre-trained model's dataset, it is wise to train the entire model; for a small dataset, however, we need to freeze most of the layers and train only a few.

In this study, we utilize the VGG series, XceptionNet and ResNet50 in the TL process, where the weights are pre-trained on the ImageNet dataset. ImageNet consists of 1.28 million training, 50,000 validation and 100,000 testing images belonging to 1,000 classes. As observed from our workflow diagram in Fig. 1-(c), the earlier layers of every CNN model, which are responsible for capturing more generic features, are frozen. We then retrain the latter layers of the network by fine-tuning on the cervical cancer cell dataset to capture more dataset-specific features. Finally, we fine-tune our own fully connected classifier. Fig. 3 presents the VGG16 network as an example, where the first few convolutional blocks use parameters (w1, w2, w3, ..., wk) transferred from another VGG16 network trained on the ImageNet dataset.

For all four CNNs, the input size is 224 × 224 × 3; the models are trained for 50 epochs with an initial learning rate and then for another 50 epochs with a reduced learning rate; the batch size is 32 for the training set and 1 for the testing set; and the Adam optimizer is employed. Fig. 3-(a) exhibits the whole TL process as an example on the VGG network, where the first few layers are pre-trained on the ImageNet dataset, and the latter convolutional blocks, along with the FCL, are fine-tuned. Fig.
3-(b) shows some representative feature maps extracted from various convolutional blocks of the VGG16 network, which demonstrates the capability of the TL process for extracting meaningful information from the images.
Figure 4: Framework of the proposed hybrid feature fusion network.
Late Fusion

Late fusion (LF) is a type of ensemble classifier that relies on the individual classifiers' decisions and weights those decisions to improve the classification performance. In this experiment, the classification results of four different DL models, namely VGG16, VGG19, ResNet50 and XceptionNet, are combined using a majority voting technique, where each class is determined based on the highest number of votes received for that class. Let $m = 1, 2, \ldots, X$ and $n = 1, 2, \ldots, Y$, where X is the number of classifiers and Y is the number of classes, and let the m-th classifier's decision for class n be $E(m, n) \in \{0, 1\}$. The LF technique for majority voting then selects the class $n^{*}$ that satisfies

$\sum_{m=1}^{X} E(m, n^{*}) = \max_{n=1}^{Y} \sum_{m=1}^{X} E(m, n)$   (3)

Hybrid Deep Feature Fusion

Feature representation plays a vital role in image classification. We observe that feature fusion (FF) is an efficient approach for cervical cytopathology cell image analysis. An FF strategy combines multiple relevant features into a single feature vector, which contains richer information and contributes more description than the initial input feature vectors. The traditional strategies for FF are serial and parallel FF [52]. In a serial FF method, two features are concatenated into a single feature: if two features $F_1$ and $F_2$ are extracted from an image with vector dimensions x and y, the fused feature $F_s$ has dimension $(x + y)$. Parallel FF instead merges the two components into a complex vector, $F_p = F_1 + iF_2$, with i indicating the imaginary unit.
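The majority vote of Eq. (3) can be sketched in plain numpy (the helper name and the toy predictions are illustrative only):

```python
import numpy as np

def late_fusion_majority(predictions, num_classes):
    """Majority voting over classifier decisions, as in Eq. (3).
    `predictions` has shape (X, S): the class index chosen by each of the
    X classifiers for each of S samples. Ties resolve to the lowest index."""
    num_samples = predictions.shape[1]
    votes = np.zeros((num_classes, num_samples), dtype=int)
    for cls_preds in predictions:                       # one row per classifier
        votes[cls_preds, np.arange(num_samples)] += 1   # E(m, n) accumulated
    return votes.argmax(axis=0)                         # class with the most votes

# Four models (e.g. VGG16, VGG19, ResNet50, XceptionNet) voting on three samples.
preds = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [1, 1, 2],
                  [0, 2, 2]])
fused = late_fusion_majority(preds, num_classes=3)   # -> [0, 1, 2]
```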
Figure 5: An example of the SIPAKMED database in five categories: (a) Superficial-Intermediate, (b) Parabasal, (c) Koilocytotic, (d) Dyskeratotic, (e) Metaplastic.

The problem with the above-mentioned FF techniques is that they are unable to use the original input features, since they create new ones, and they struggle to integrate more than two features. In our study, we propose an HDFF technique that integrates feature vectors from multiple CNN architectures. Fig. 4 shows our proposed FFN, where $F_{V16}$, $F_{V19}$, $F_{R}$ and $F_{X}$ are the normalized feature vectors extracted from the dense layer (FCL) with 1024 neurons of VGG16, VGG19, ResNet50 and XceptionNet, respectively. The FFN consists of one concatenation layer and one FCL with a softmax activation function to integrate the different features. Moreover, dropout and batch normalization layers are introduced to prevent overfitting and improve training. The concatenation layer generates a vector of 4096 dimensions. If $\bigcup$ denotes the concatenation operation and $F_n(i)$ the n-th feature vector of the i-th sample, the fused output vector $F(i)$ can be written as

$F(i) = \bigcup_{n=1}^{4} F_n(i)$   (4)

Dataset

To investigate the performance of our proposed DeepCervix network, we apply the publicly available SIPAKMED dataset, consisting of 4049 annotated cervical Pap smear cell images [53]. A sample of the dataset is displayed in Fig. 5. Based on cell appearance and morphology, expert cytopathologists classified the cells into five categories: superficial-intermediate, parabasal, koilocytotic, metaplastic and dyskeratotic. More precisely, superficial-intermediate and parabasal cells can be further categorized as normal cells, koilocytotic and dyskeratotic cells are recognized as abnormal cells, and metaplastic cells are counted as benign cells. Table 1 provides the distribution of cells according to their classes.

Table 1: Distribution of the SIPAKMED database
Category                  Class     Number of Cells
Superficial-Intermediate  Normal    831
Parabasal                 Normal    787
Koilocytotic              Abnormal  825
Dyskeratotic              Abnormal  813
Metaplastic               Benign    793
Total                               4049
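The fusion of Eq. (4) is, concretely, a concatenation of the four backbones' 1024-dimensional feature vectors into one 4096-dimensional descriptor; a minimal numpy sketch, with random stand-ins for the real activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# One 1024-d feature vector per backbone (VGG16, VGG19, ResNet50, XceptionNet),
# here random stand-ins for the activations of each model's last dense layer.
f_v16, f_v19, f_r, f_x = (rng.random(1024) for _ in range(4))

# Eq. (4): the fusion layer concatenates the four vectors into one 4096-d
# descriptor, which then feeds the dense/BN/dropout/softmax head of the FFN.
fused = np.concatenate([f_v16, f_v19, f_r, f_x])
```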
The SIPAKMED dataset comprises 4049 annotated cervical cell images. In each class, 60% of the images are used for training, 20% for validation and 20% for testing. We perform 5-class (superficial-intermediate, parabasal, koilocytotic, metaplastic and dyskeratotic), 3-class (normal, abnormal and benign) and 2-class (normal and abnormal) classification on the dataset. Moreover, data augmentation is applied to the training set, which increases the training data by a factor of 6. The resulting training, validation and test splits are shown in Table 2.
Table 2: The experimental data setting of the SIPAKMED dataset
            Total Number of Images
Dataset     5-Class   3-Class   2-Class
Training    16982     16989     13664
Validation  811       811       652
Test        812       811       652
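The 60/20/20 split described above can be sketched as follows (the helper name is ours; in practice, the split is applied within each class before augmentation):

```python
import numpy as np

def split_indices(n, rng, train=0.6, val=0.2):
    """Shuffle `n` sample indices and cut them into train/validation/test
    partitions (60/20/20 by default), as done per class in the text."""
    idx = rng.permutation(n)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

rng = np.random.default_rng(0)
# e.g. the 831 superficial-intermediate cells of Table 1
tr, va, te = split_indices(831, rng)
```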
In this experiment, we use Google Colaboratory, a cloud service based on Jupyter notebooks, to train and test our model [54]. Python 2 and 3 come pre-configured with many ML libraries, such as TensorFlow, Matplotlib, Keras, PyTorch and OpenCV. Colab provides a runtime with a fully functional GPU (NVIDIA Tesla K80) for running DL experiments, and the code is stored in Google Drive.
To avoid bias among the different algorithms, selecting suitable evaluation metrics is vital. Precision, recall, F1 score and accuracy are the most standard measures for evaluating classification performance [55]. Precision is the fraction of correctly identified samples among all samples predicted as positive, whereas recall measures the ability of a classification model to recognize all relevant samples. The F1 score combines precision and recall using the harmonic mean. Accuracy is the proportion of correctly predicted samples out of the total number of samples. The mathematical expressions of the evaluation metrics are shown in Table 3, where true positive (TP) is the number of accurately labeled positive samples, true negative (TN) is the number of correctly classified negative samples, false positive (FP) is the number of negative samples classified as positive, and false negative (FN) is the number of positive instances predicted as negative.

Table 3: Evaluation metrics
Assessment    Formula
Precision, P  TP / (TP + FP)
Recall, R     TP / (TP + FN)
F1 score      2 × P × R / (P + R)
Accuracy      (TP + TN) / (TP + TN + FP + FN)

To examine the performance of our proposed HDFF method, we calculate the precision, recall, F1 score and accuracy of each individual fine-tuned DL model (VGG16, VGG19, ResNet50, XceptionNet), along with late fusion (LF), where we implement majority voting of diverse classifiers (MVDC), and the HDFF method. The performance results for the classification of cervical cells on the unseen test dataset are shown in Table 4. The results are analyzed for the binary-class, 3-class and 5-class classification problems.
Binary classification:
In this case, we classify the cervical cells into Normal and Abnormal (Table 1). Table 4 shows that, among the four DL models, VGG16 gives the highest average precision, recall and F1 score of 1.00, 1.00 and 0.998, respectively, with an overall accuracy of 99.85%. After VGG16, ResNet-50 gives a classification accuracy of 99.38%, with an average precision, recall and F1 score of 0.995, 0.995 and 0.990. XceptionNet performs the worst among them, with an overall accuracy of 98.31%. Moreover, the MVDC-based LF and HDFF techniques achieve a similar result to VGG16.
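The MVDC-based LF amounts to a per-sample majority vote over the backbone predictions; a minimal sketch, where the class labels and the tie-break rule (falling back to the earliest model's vote) are illustrative assumptions:

```python
from collections import Counter

def majority_vote(model_predictions):
    """Fuse per-model label lists by per-sample majority voting."""
    fused = []
    for votes in zip(*model_predictions):
        # Counter preserves first-seen order, so ties fall back
        # to the earliest model's vote (an assumed tie-break rule).
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

preds = [
    ["normal", "abnormal", "benign"],    # e.g. VGG16
    ["normal", "abnormal", "abnormal"],  # e.g. ResNet-50
    ["normal", "normal", "abnormal"],    # e.g. VGG19
]
fused = majority_vote(preds)  # ["normal", "abnormal", "abnormal"]
```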
PREPRINT - FEBRUARY 25, 2021

Table 4: Performance analysis of the proposed HDFF method along with the base models. (Average Precision (Avg. P), Average Recall (Avg. R), Average F1 score (Avg. F1), Late Fusion (LF))
Cl. Pro.  CNN Model    Avg. P  Avg. R  Avg. F1  Acc. (%)
2-Class   VGG16        1.00    1.00    0.998    99.85
          VGG19        0.985   0.985   0.990    98.77
          ResNet-50    0.995   0.995   0.990    99.38
          XceptionNet  0.980   0.980   0.980    98.31
          LF           1.00    1.00    0.998    99.85
          HDFF         1.00    1.00    0.998    99.85
3-class classification: For ternary classification, we classify the cervical cells into the Normal, Abnormal and Benign classes (Table 1). Table 4 shows that VGG16 obtains a classification accuracy of 97.90%, with an average precision, recall and F1 score of 0.976, 0.970 and 0.973. VGG19 and ResNet-50 provide the same average precision of 0.963, recall values of 0.943 and 0.950, and F1 scores of 0.953 and 0.956, respectively; both obtain an accuracy of 96.18%. However, XceptionNet shows the worst performance, with an accuracy of 89.64%. Additionally, the LF technique obtains an accuracy of 98.52%, with precision, recall and F1 values of 0.987, 0.980 and 0.980, respectively. Our HDFF method obtains the highest classification accuracy of 99.38%, with an average precision, recall and F1 score of 0.993, 0.990 and 0.993, respectively.
5-class classification: In this experiment, we classify the cervical cells into five classes (Table 1). Table 4 shows that the highest overall accuracy, precision, recall and F1 score of 99.14%, 0.992, 0.990 and 0.990 are obtained by the HDFF technique, followed by the LF method, VGG16, VGG19, ResNet50 and XceptionNet with overall accuracies of 98.64%, 98.27%, 96.43%, 96.06% and 65.77%, respectively. XceptionNet gives the worst performance, with an average precision, recall and F1 score of 0.751, 0.650 and 0.639, respectively.

The performance results in Table 4 illustrate that our proposed HDFF method (DeepCervix) obtains the highest classification accuracy for the binary, 3-class and 5-class classification problems. After the HDFF method, LF achieves the top classification results. Among the four DL models, VGG16 consistently provides superior performance, whereas the performance of XceptionNet degrades as the number of classes grows. It is also observed that binary classification achieves the highest classification accuracy, followed by the 3-class and 5-class classification problems.
To better illustrate the classification performance, we present confusion matrices of our proposed HDFF and LF methods in Fig. 6. Moreover, Fig. 7 shows the accuracy of each DL, LF and HDFF model in histogram charts.
[Figure 6 panels: (a) LF and HDFF methods on binary classification; (b) LF method on 3-class classification; (c) HDFF method on 3-class classification; (d) LF method on 5-class classification; (e) HDFF method on 5-class classification.]
Figure 6: Confusion matrices of the LF and HDFF methods for the 2-class, 3-class and 5-class classification problems.

Looking at the confusion matrices for binary classification in Fig. 6-(a), both models (HDFF and LF) accurately recognize 328 images as abnormal and 323 images as normal, though one normal image is labeled as abnormal. According to Table 4, both models obtain the same accuracy. For the 3-class and 5-class classifications, the HDFF method has better recognition ability than the LF method. From Fig. 6-(c), the HDFF method accurately recognizes 326 images as abnormal, 324 as normal and 156 as benign, with only five images misclassified. For 5-class classification, the HDFF method correctly classifies 805 out of 812 images (Fig. 6-(e)).

According to the histogram in Fig. 7, all models obtain considerably high accuracy for the binary classification problem. As the number of classes increases, the overall accuracy of the individual DL models decreases, whereas our proposed HDFF method maintains good performance. For the 3-class classification problem, the accuracy of the HDFF method is 99.38%, which is 1.48%, 3.2%, 3.2%, 9.74% and 0.86% higher than VGG16, VGG19, ResNet-50, XceptionNet and the LF method, respectively. For 5-class classification, the highest accuracy of 99.14% is achieved by the HDFF method, an improvement of 0.87% over VGG16, 0.5% over LF, 2.71% over VGG19, 3.08% over ResNet50 and 33.37% over XceptionNet.
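Confusion matrices like those in Fig. 6 can be tallied directly from (true, predicted) label pairs; a minimal pure-Python sketch with illustrative labels and counts:

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count predictions per class; rows are true labels, columns predictions."""
    idx = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix

classes = ["abnormal", "normal"]
y_true = ["abnormal"] * 3 + ["normal"] * 3
y_pred = ["abnormal", "abnormal", "abnormal", "normal", "normal", "abnormal"]
m = confusion_matrix(y_true, y_pred, classes)
# m[1][0] counts normal cells mislabeled as abnormal
```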
Table 5 presents a comparative analysis of our proposed HDFF method with existing works that classify cervical cells using the SIPAKMED dataset. The table shows that our proposed HDFF method obtains the highest classification accuracies on the binary and multiclass classification problems. For the binary and 5-class classification problems,
Figure 7: Performance comparison among different TL models with the HDFF and LF methods.

our method obtained 1.60% and 0.19% higher accuracy than the existing studies. We also note that the 3-class classification problem has not been addressed in existing research.

Table 5: Comparison of classification accuracies on the SIPAKMED dataset
Ref.        Method                       Class    Accuracy
[53]        CNN                          5-Class  95.35%
[56]        Graph convolutional network  5-Class  98.37%
[57]        DenseNet-161                 5-Class  98.96%
[58]        Bagging ensemble classifier  2-Class  98.25%
                                         5-Class  94.09%
Our method  HDFF                         2-Class  99.85%
                                         3-Class  99.38%
                                         5-Class  99.14%
In our experiment, we first train the individual DL models (VGG16, VGG19, ResNet50, XceptionNet) and save them with their weights separately. Then, we use those saved models and their weights to perform further training in the HDFF stage. Training each DL model takes around six hours for 100 epochs (using Google Colab). Training the HDFF model from the saved models requires only a few minutes (3 seconds per epoch). Though training takes quite a long time, the testing time is around 2.5 seconds per cervical cell image.
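The fusion stage can be pictured as concatenating the feature vectors produced by the saved backbones and training a classification head on the fused vector; a NumPy sketch, where the two-backbone setup, feature sizes and random weights are illustrative, not our exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled deep features from two saved backbones for a batch of 4 images
feat_vgg16 = rng.standard_normal((4, 512))    # e.g. VGG16 features
feat_resnet = rng.standard_normal((4, 2048))  # e.g. ResNet-50 features

# Hybrid deep feature fusion: concatenate along the feature axis
fused = np.concatenate([feat_vgg16, feat_resnet], axis=1)  # shape (4, 2560)

# A softmax classification head over the fused features (randomly initialized
# here; in practice it is trained for a few epochs on the saved features)
W = rng.standard_normal((2560, 5)) * 0.01  # 5-class problem
logits = fused @ W
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```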
The publicly available pap smear benchmark dataset (Herlev dataset) [17], consisting of 917 single-cell images, is employed to evaluate our proposed HDFF method. The dataset is divided into seven classes, which can be further grouped into benign and malignant. The benign class consists of 242 images, and the malignant class consists of 675 images. The details of the dataset are given in Table 6.

Our experiment takes 60% of the images of each class for training, 20% for validation, and the rest for testing. Besides, the data augmentation technique is applied to the training set, which increases the training dataset by a factor of 14. The resulting training, validation and test datasets for the 7-class and 2-class classification problems are given in Table 7.
Table 6: Distribution of the Herlev dataset
Category               Class     Number of Cells
Normal squamous        Normal    74
Intermediate squamous            70
Columnar                         98
Mild dysplasia         Abnormal  182
Moderate dysplasia               146
Severe dysplasia                 197
Carcinoma in situ                150
Total                            917
Table 7: The experimental data setting of Herlev dataset
Dataset      Total Number of Images
             7-Class   2-Class
Training     8190      8235
Validation   185       184
Test         186       184
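The counts in Table 7 follow from the 60/20/20 split plus augmentation: each original training image gains 14 augmented copies, a 15-fold expansion. A quick consistency check:

```python
total = 917       # Herlev single-cell images
aug_factor = 14   # augmented copies added per training image

# 7-class setting: validation and test counts from Table 7
val_7, test_7 = 185, 186
train_7 = total - val_7 - test_7          # original training images
aug_train_7 = train_7 * (1 + aug_factor)  # originals plus augmented copies

# 2-class setting
val_2, test_2 = 184, 184
train_2 = total - val_2 - test_2
aug_train_2 = train_2 * (1 + aug_factor)

print(aug_train_7, aug_train_2)  # 8190 8235, matching Table 7
```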
Table 8 presents the classification performance of the four different DL models with the LF and HDFF methods; the four CNN models serve as backbone networks of the LF and HDFF models. For binary classification of the Herlev dataset, ResNet-50 provides the highest precision, recall and F1 score among the four CNN models for distinguishing normal cervical cells from abnormal ones, followed by VGG19, VGG16 and XceptionNet. Between the LF and HDFF methods, the HDFF method achieves the highest classification accuracy of 98.91%, which is 1.08% higher than the LF method.

For the 7-class classification of the Herlev dataset, ResNet-50 provides the highest classification accuracy of 83.87% among the four CNN models, whereas XceptionNet performs the worst, with an accuracy of 39.78%. The LF approach reaches 86.02% accuracy, with an average precision, recall and F1 score of 0.887, 0.872 and 0.877, respectively. Moreover, our proposed HDFF method obtains the highest classification accuracy of 90.32%, with an average precision, recall and F1 score of 0.915, 0.911 and 0.916, respectively.

For both the binary and multiclass classification problems, ResNet-50 obtains the highest classification accuracy among the four DL models. After ResNet50, the LF model achieves better results than the individual DL models, whereas the HDFF method obtains the highest classification accuracy overall.
Table 9 compares the performance results of existing studies with our proposed HDFF method in terms of overall classification accuracy for the 2-class and 7-class classification problems. A higher accuracy value indicates a higher rate of correct classifications. The table shows that most existing works perform binary classification and obtain accuracies above 90%. However, only a few papers address both the binary and multiclass classification of the Herlev dataset. For the multiclass classification problem, the classification accuracy ranges from 68.54% to 95.9%. [33] obtains the highest accuracy for 7-class classification, but it requires pre-segmented cervical cell images. Table 9 further shows that our proposed HDFF method outperforms existing methods in most cases, which demonstrates the robustness of our proposed algorithm.
Lately, advances in DL are solving critical tasks in the medical domain. Classification of cervical cells can help identify cancerous subjects early, which is a significant step toward preventing cervical cancer. This study proposes the HDFF method (DeepCervix) to classify cervical cells on the SIPAKMED and Herlev datasets and obtains excellent results.
Table 8: Performance analysis of the proposed HDFF method along with the base models on the Herlev dataset. (Average Precision (Avg. P), Average Recall (Avg. R), Average F1 score (Avg. F1), Late Fusion (LF))
Cl. Pro.  CNN Model    Avg. P  Avg. R  Avg. F1  Acc. (%)
2-Class   VGG16        0.880   0.895   0.885    90.76
          VGG19        0.910   0.845   0.870    90.76
          ResNet-50    0.950   0.930   0.940    95.11
          XceptionNet  0.850   0.815   0.835    87.50
          LF           0.985   0.960   0.975    97.83
          HDFF         0.995   0.980   0.985    98.91
Table 9: Comparison of classification accuracies on the Herlev dataset (BPNN (Back-propagation neural network), LSSVM (Least-squares support-vector machine), HVCA (Hybrid variational convolutional autoencoder), ETL (Ensembled transfer learning), Cl. Pro. (Classification problem), Acc. (Accuracy))
Ref.        Method                               Cl. Pro.  Acc.
[59]        BPNN                                 3-Class   79%
[25]        Hybrid ensemble                      2-Class   98%
                                                 7-Class   78%
[26]        AlexNet & LSSVM                      2-Class   94.61%
[28]        AlexNet & SVM                        2-Class   99.19%
[29]        VGG16 & ResNet                       2-Class   86%
[30]        CNN & TL                             2-Class   98.3%
[60]        CNN & TL                             2-Class   95.1%
[31]        AlexNet & TL & DT                    2-Class   99.3%
                                                 7-Class   93.2%
[32]        Morphology & CNN                     2-Class   94.5%
                                                 7-Class   64.5%
[33]        VGG-like network (segmented images)  2-Class   98.10%
                                                 7-Class   95.9%
[34]        DenseNet161                          2-Class   94.38%
                                                 7-Class   68.54%
[61]        HVCA                                 2-Class   -
[62]        Pretrained ResNet50                  2-Class   97.89%
[39]        ETL                                  2-Class   98.37%
Our method  HDFF                                 2-Class   98.91%
                                                 7-Class   90.32%
Imaging modality, image quality, dataset distribution, model structure, complexity, loss function, optimization function and number of epochs are some critical factors that influence a model's performance. Observing the performance metrics for the SIPAKMED dataset in Table 4, VGG16 performs relatively well compared to ResNet50, VGG19 and XceptionNet; thus a shallow network performs better than a very deep network on this dataset. Considering the network architecture of VGG16, it contains very small receptive fields, which allows more weight layers and thus improves performance. The LF model based on MVDC shows a slight improvement in the
Figure 8: Examples of misclassified cervical cells from the SIPAKMED dataset. (a) Original: Dyskeratotic, prediction: Koilocytotic, confidence score 0.898; (b) Original: Dyskeratotic, prediction: Koilocytotic, confidence score 0.898; (c) Original: Koilocytotic, prediction: Dyskeratotic, confidence score 0.994; (d) Original: Koilocytotic, prediction: Dyskeratotic, confidence score 0.993; (e) Original: Metaplastic, prediction: Koilocytotic, confidence score 0.694.
Figure 9: Examples of misclassified cervical cells from the Herlev dataset. (a) Original: Carcinoma, prediction: Severe dysplastic, confidence score 0.744; (b) Original: Carcinoma, prediction: Severe dysplastic, confidence score 0.953; (c) Original: Columnar, prediction: Light dysplastic, confidence score 0.919; (d) Original: Columnar, prediction: Severe dysplastic, confidence score 0.851; (e) Original: Moderate dysplastic, prediction: Carcinoma, confidence score 0.337.

overall result, but it cannot always guarantee leading performance. Besides, the HDFF method effectively improves the classification performance and provides the best result. As observed in Fig. 6, the HDFF method correctly classifies 805 out of 812 images in the 5-class classification task. It is also observed that Koilocytotic and Metaplastic cells are challenging to classify. For the Herlev dataset (Table 8), unlike SIPAKMED, ResNet-50 performs better than the other DL models; hence, for highly imbalanced and small datasets, ResNet-50 is preferable. Again, the best performance on the 2-class and 7-class classification problems is obtained by the HDFF method.

Fig. 8 and Fig. 9 provide examples of misclassified cervical cells on the SIPAKMED and Herlev datasets for the 5-class and 7-class classification problems. Fig. 8-(a) shows that, for the dyskeratotic class image, the cell boundary and nucleus are hard to distinguish, and it is wrongly labeled as Koilocytotic with a confidence score of 0.898. In Fig. 8-(b),(c), the Dyskeratotic and Koilocytotic class images look identical, with invisible nucleus boundaries, and are misclassified as Koilocytotic and Dyskeratotic, respectively. Fig. 8-(d) reveals that a dark-stained Koilocytotic cell is misclassified as Dyskeratotic. In Fig. 8-(e), the content of the Metaplastic cell is too dark to identify the cell and nucleus regions, and it is misclassified as Koilocytotic with a confidence score of 0.694. According to Fig. 9-(a),(b), two dark-stained carcinoma images are labeled as severe dysplastic. In Fig. 9-(c),(d), two columnar images, which look very different from each other, are misclassified as light and severe dysplastic. Fig. 9-(e) shows a moderate dysplastic cell image misclassified as carcinoma.
For all the misclassified images, we note that none of them contains adequate information about the cell.
This study proposes deep learning-based HDFF and LF methods to classify cervical cells. The performance metrics show that the HDFF method achieves higher classification accuracies than the LF method. Unlike other methods that rely on pre-segmentation of the cytoplasm/nucleus and hand-crafted features, our proposed method offers end-to-end classification of cervical cells using deep features. The SIPAKMED and Herlev datasets are utilized to evaluate the performance of our proposed model. For the SIPAKMED dataset, we obtain state-of-the-art accuracies of 99.85%, 99.38% and 99.14% for the 2-class, 3-class and 5-class classification problems. For the Herlev dataset, we reach 98.91% accuracy for the binary classification problem and 90.32% for the 7-class classification problem.
Though our method provides very good performance, there are a few limitations. First, despite the high accuracy on the SIPAKMED dataset, the performance of our method degrades for 7-class classification on the Herlev dataset. An ideal screening system should not miss any abnormal cells. To overcome this for the multiclass classification problem, we could integrate pre-segmented cell features into our model. Secondly, for our HDFF method, we have investigated four DL models, fine-tuned them, and integrated their features to obtain the final model. In the future, we can investigate other DL models and compare their multiclass classification accuracy. Thirdly, our proposed method should be generalized to classification involving overlapping cells. Finally, Poisson noise is a critical factor in cervical cell images that degrades model performance. Therefore, denoising methods, such as the adaptive Wiener filter [63], can be applied in the preprocessing step to improve the model's overall performance.
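An adaptive Wiener filter of the kind suggested in [63] estimates local mean and variance in a sliding window and pulls pixels toward the local mean where the local variance is close to the noise variance. A NumPy sketch, where the window size, noise estimate and loop-based implementation are illustrative choices, not the paper's preprocessing pipeline:

```python
import numpy as np

def adaptive_wiener(img, win=3, noise_var=None):
    """Adaptive (local) Wiener denoising of a 2-D grayscale image."""
    img = img.astype(float)
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")
    mean = np.empty_like(img)
    var = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + win, j:j + win]
            mean[i, j] = patch.mean()
            var[i, j] = patch.var()
    if noise_var is None:
        noise_var = var.mean()  # common default: average of the local variances
    # Shrink toward the local mean where local variance ~ noise variance
    gain = np.maximum(var - noise_var, 0.0) / np.maximum(var, 1e-12)
    return mean + gain * (img - mean)

# Flat test image corrupted by Poisson noise
rng = np.random.default_rng(0)
noisy = rng.poisson(100.0, size=(32, 32)).astype(float)
denoised = adaptive_wiener(noisy)
```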
References

[1] Md Mamunur Rahaman, Chen Li, Xiangchen Wu, Yudong Yao, Zhijie Hu, Tao Jiang, Xiaoyan Li, and Shouliang Qi. A survey for cervical cytopathology image analysis using deep learning.
IEEE Access , 8:61687–61710, 2020.[2] WHO et al. Who guidelines for the use of thermal ablation for cervical pre-cancer lesions. 2019.[3] Jacques Ferlay, M Ervik, F Lam, M Colombet, L Mery, M Piñeros, A Znaor, I Soerjomataram, and F Bray. Globalcancer observatory: cancer today.
Lyon, France: International Agency for Research on Cancer , 2018.[4] Tanja Šarenac and Momir Mikov. Cervical cancer, different treatments and importance of bile acids as therapeuticagents in this disease.
Frontiers in Pharmacology , 10:484, 2019.[5] Debbie Saslow, Diane Solomon, Herschel W Lawson, Maureen Killackey, Shalini L Kulasingam, Joanna Cain,Francisco AR Garcia, Ann T Moriarty, Alan G Waxman, David C Wilbur, et al. American cancer society, americansociety for colposcopy and cervical pathology, and american society for clinical pathology screening guidelinesfor the prevention and early detection of cervical cancer.
American journal of clinical pathology , 137(4):516–542,2012.[6] World Health Organization (WHO) et al. Screening as well as vaccination is essential in the fight against cervicalcancer. , 28:2018, 2014.[7] Elizabeth Davey, Alexandra Barratt, Les Irwig, Siew F Chan, Petra Macaskill, Patricia Mannes, and A MarionSaville. Effect of study design and quality on unsatisfactory rates, cytology classifications, and accuracy inliquid-based versus conventional cervical cytology: a systematic review.
The Lancet , 367(9505):122–132, 2006.[8] George N Papanicolaou. New cancer diagnosis.
CA: A Cancer Journal for Clinicians , 23(3):174–179, 1973.[9] George N Papanicolaou and Herbert F Traut. The diagnostic value of vaginal smears in carcinoma of the uterus.
American Journal of Obstetrics and Gynecology , 42(2):193–206, 1941.[10] Tarik M Elsheikh, R Marshall Austin, David F Chhieng, Fern S Miller, Ann T Moriarty, and Andrew A Renshaw.American society of cytopathology workload recommendations for automated pap test screening: Developed bythe productivity and quality assurance in the era of automated screening task force.
Diagnostic cytopathology ,41(2):174–178, 2013.[11] Aslı GençTav, Selim Aksoy, and Sevgen ÖNder. Unsupervised segmentation and classification of cervical cellimages.
Pattern recognition , 45(12):4151–4168, 2012.[12] Richard Lozano. Comparison of computer-assisted and manual screening of cervical cytology.
GynecologicOncology , 104(1):134–138, 2007.[13] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, MohsenGhafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning inmedical image analysis.
Medical image analysis , 42:60–88, 2017.[14] K Krishna and M Narasimha Murty. Genetic k-means algorithm.
IEEE Transactions on Systems, Man, andCybernetics, Part B (Cybernetics) , 29(3):433–439, 1999.[15] Tapas Kanungo, David M Mount, Nathan S Netanyahu, Christine D Piatko, Ruth Silverman, and Angela Y Wu.An efficient k-means clustering algorithm: Analysis and implementation.
IEEE transactions on pattern analysisand machine intelligence , 24(7):881–892, 2002.[16] Hansang Lee and Junmo Kim. Segmentation of overlapping cervical cells in microscopic images with superpixelpartitioning and cell-wise contour refinement. In
Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 63–69, 2016.

[17] Jan Jantzen, Jonas Norup, Georgios Dounias, and Beth Bjerregaard. Pap-smear benchmark data for pattern classification.
Nature inspired Smart Information Systems (NiSIS 2005) , pages 1–9, 2005.[18] Yannis Marinakis, Magdalene Marinaki, and Georgios Dounias. Particle swarm optimization for pap-smeardiagnosis.
Expert Systems with Applications , 35(4):1645–1656, 2008.[19] Yannis Marinakis, Georgios Dounias, and Jan Jantzen. Pap smear diagnosis using a hybrid intelligent schemefocusing on genetic algorithm based feature selection and nearest neighbor classification.
Computers in Biologyand Medicine , 39(1):69–78, 2009.[20] Khin Yadanar Win, Somsak Choomchuay, Kazuhiko Hamamoto, Manasanan Raveesunthornkiat, Likit Rangsirat-tanakul, and Suriya Pongsawat. Computer aided diagnosis system for detection of cancer cells on cytologicalpleural effusion images.
BioMed research international , 2018, 2018.[21] Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio.
Deep learning , volume 1. MIT pressCambridge, 2016.[22] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, AndrejKarpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge.
Internationaljournal of computer vision , 115(3):211–252, 2015.[23] Michael S Landau and Liron Pantanowitz. Artificial intelligence in cytopathology: a review of the literature andoverview of commercial landscape.
Journal of the American Society of Cytopathology , 8(4):230–241, 2019.[24] P Sukumar and RK Gnanamurthy. Computer aided detection of cervical cancer using pap smear images basedon hybrid classifier.
International Journal of Applied Engineering Research, Research India Publications ,10(8):21021–21032, 2015.[25] Abid Sarwar, Vinod Sharma, and Rajeev Gupta. Hybrid ensemble learning technique for screening of cervicalcancer using papanicolaou smear image analysis.
Personalized Medicine Universe , 4:54–62, 2015.[26] Kangkana Bora, Manish Chowdhury, Lipi B Mahanta, Malay K Kundu, and Anup K Das. Pap smear imageclassification using convolutional neural network. In
Proceedings of the Tenth Indian Conference on ComputerVision, Graphics and Image Processing , pages 1–8, 2016.[27] Jonghwan Hyeon, Ho-Jin Choi, Kap No Lee, and Byung Doo Lee. Automating papanicolaou test using deepconvolutional activation feature. In , pages 382–385. IEEE, 2017.[28] Bilal Taha, Jorge Dias, and Naoufel Werghi. Classification of cervical-cancer using pap-smear images: aconvolutional neural network approach. In
Annual Conference on Medical Image Understanding and Analysis ,pages 261–272. Springer, 2017.[29] Hakan Wieslander, Gustav Forslid, Ewert Bengtsson, Carolina Wahlby, Jan-Michael Hirsch, ChristinaRunow Stark, and Sajith Kecheril Sadanandan. Deep convolutional neural networks for detecting cellularchanges due to malignancy. In
Proceedings of the IEEE International Conference on Computer Vision Workshops ,pages 82–89, 2017.[30] Ling Zhang, Le Lu, Isabella Nogues, Ronald M Summers, Shaoxiong Liu, and Jianhua Yao. Deeppap: deepconvolutional networks for cervical cell classification.
IEEE journal of biomedical and health informatics ,21(6):1633–1643, 2017.[31] Srishti Gautam, Nirmal Jith, Anil K Sao, Arnav Bhavsar, Adarsh Natarajan, et al. Considerations for a pap smearimage analysis system with cnn features. arXiv preprint arXiv:1806.09025 , 2018.[32] Haoming Lin, Yuyang Hu, Siping Chen, Jianhua Yao, and Ling Zhang. Fine-grained classification of cervical cellsusing morphological and appearance based convolutional neural networks.
IEEE Access , 7:71541–71549, 2019.[33] Khalid Hamed S Allehaibi, Lukito Edi Nugroho, Lutfan Lazuardi, Anton Satria Prabuwono, Teddy Mantoro, et al.Segmentation and classification of cervical cells using deep learning.
IEEE Access , 7:116925–116941, 2019.[34] Yuttachon Promworn, Satjana Pattanasak, Chuchart Pintavirooj, and Wibool Piyawattanametha. Comparisons ofpap smear classification with deep learning models. In , pages 282–285. IEEE, 2019.[35] Long D Nguyen, Ruihan Gao, Dongyun Lin, and Zhiping Lin. Biomedical image classification based on a featureconcatenation and ensemble of deep cnns.
Journal of Ambient Intelligence and Humanized Computing , pages1–13, 2019.[36] Nacer Eddine Benzebouchi, Nabiha Azizi, Amira S Ashour, Nilanjan Dey, and R Simon Sherratt. Multi-modalclassifier fusion with feature cooperation for glaucoma diagnosis.
Journal of Experimental & Theoretical Artificial Intelligence, 31(6):841–874, 2019.

[37] Wei Xue, Xiangyang Dai, and Li Liu. Remote sensing scene classification based on multi-structure deep features fusion.
IEEE Access , 8:28746–28755, 2020.[38] Zhiqiong Wang, Mo Li, Huaxia Wang, Hanyu Jiang, Yudong Yao, Hao Zhang, and Junchang Xin. Breastcancer detection using extreme learning machine based on feature fusion with cnn deep features.
IEEE Access ,7:105146–105158, 2019.[39] Dan Xue, Xiaomin Zhou, Chen Li, Yudong Yao, Md Mamunur Rahaman, Jinghua Zhang, Hao Chen, JinpengZhang, Shouliang Qi, and Hongzan Sun. An application of transfer learning and ensemble learning techniques forcervical histopathology image classification.
IEEE Access , 8:104603–104618, 2020.[40] Ashnil Kumar, Jinman Kim, David Lyndon, Michael Fulham, and Dagan Feng. An ensemble of fine-tunedconvolutional neural networks for medical image classification.
IEEE journal of biomedical and health informatics ,21(1):31–40, 2016.[41] Javeria Amin, Abida Sharif, Nadia Gul, Muhammad Almas Anjum, Muhammad Wasif Nisar, Faisal Azam, andSyed Ahmad Chan Bukhari. Integrated design of deep features fusion for localization and classification of skincancer.
Pattern Recognition Letters , 131:63–70, 2020.[42] Md Mamunur Rahaman, Chen Li, Yudong Yao, Frank Kulwa, Mohammad Asadur Rahman, Qian Wang, ShouliangQi, Fanjie Kong, Xuemin Zhu, and Xin Zhao. Identification of covid-19 samples from chest x-ray images usingdeep learning: A comparison of transfer learning approaches.
Journal of X-Ray Science and Technology ,(Preprint):1–19, 2020.[43] David Rolnick, Andreas Veit, Serge Belongie, and Nir Shavit. Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694 , 2017.[44] Sander Dieleman, Kyle W Willett, and Joni Dambre. Rotation-invariant convolutional neural networks for galaxymorphology prediction.
Monthly notices of the royal astronomical society , 450(2):1441–1459, 2015.[45] Asifullah Khan, Anabia Sohail, Umme Zahoora, and Aqsa Saeed Qureshi. A survey of the recent architectures ofdeep convolutional neural networks.
Artificial Intelligence Review , pages 1–62, 2020.[46] Víctor Suárez-Paniagua and Isabel Segura-Bedmar. Evaluation of pooling operations in convolutional architecturesfor drug-drug interaction extraction.
BMC bioinformatics , 19(8):39–47, 2018.[47] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 , 2014.[48] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In
Proceedings of the IEEE conference on computer vision and pattern recognition , pages 770–778, 2016.[49] François Chollet. Xception: Deep learning with depthwise separable convolutions. In
Proceedings of the IEEEconference on computer vision and pattern recognition , pages 1251–1258, 2017.[50] Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, and Samy Bengio. Transfusion: Understanding transfer learningfor medical imaging. In
Advances in neural information processing systems , pages 3347–3357, 2019.[51] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning.
IEEE Transactions on knowledge and dataengineering , 22(10):1345–1359, 2009.[52] Jian Yang, Jing-yu Yang, David Zhang, and Jian-feng Lu. Feature fusion: parallel strategy vs. serial strategy.
Pattern recognition , 36(6):1369–1381, 2003.[53] Marina E Plissiti, Panagiotis Dimitrakopoulos, Giorgos Sfikas, Christophoros Nikou, O Krikoni, and AntoniaCharchanti. Sipakmed: A new dataset for feature and image based classification of normal and pathologicalcervical cells in pap smear images. In ,pages 3144–3148. IEEE, 2018.[54] Ekaba Bisong. Google colaboratory. In
Building Machine Learning and Deep Learning Models on Google CloudPlatform , pages 59–64. Springer, 2019.[55] P Sukumar and RK Gnanamurthy. Computer aided detection of cervical cancer using pap smear images based onadaptive neuro fuzzy inference system classifier.
Journal of Medical Imaging and Health Informatics , 6(2):312–319, 2016.[56] J. Shi, R. Wang, Yushan Zheng, Z. Jiang, and Lanlan Yu. Graph convolutional networks for cervical cellclassification. 2019.[57] Muhammed Talo. Diagnostic classification of cervical cell images from pap smear slides.
Academic Perspective Procedia, 2(3):1043–1050, 2019.

[58] Kyi Pyar Win, Yuttana Kitjaidure, Kazuhiko Hamamoto, and Thet Myo Aung. Computer-assisted screening for cervical cancer using digital image processing of pap smear images.
Applied Sciences , 10(5):1800, 2020.[59] Seema Singh, V Tejaswini, Rishya P Murthy, and Amit Mutgi. Neural network based automated system fordiagnosis of cervical cancer.
International Journal of Biomedical and Clinical Engineering (IJBCE) , 4(2):26–39,2015.[60] Loris Nanni, Stefano Ghidoni, and Sheryl Brahnam. Handcrafted vs. non-handcrafted features for computer visionclassification.
Pattern Recognition , 71:158–172, 2017.[61] Aditya Khamparia, Deepak Gupta, Joel JPC Rodrigues, and Victor Hugo C de Albuquerque. Dcavn: Cervicalcancer prediction and classification using deep convolutional and variational autoencoder network.
MultimediaTools and Applications , pages 1–17, 2020.[62] Aditya Khamparia, Deepak Gupta, Victor Hugo C de Albuquerque, Arun Kumar Sangaiah, and Rutvij H Jhaveri.Internet of health things-driven deep learning system for detection and classification of cervical cells using transferlearning.
The Journal of Supercomputing , pages 1–19, 2020.[63] TP Deepa and A Nagaraja Rao. A study on denoising of poisson noise in pap smear microscopic image.
Indian JSci Technol , 9:45, 2016.