DeepCervix: A Deep Learning-based Framework for the Classification of Cervical Cells Using Hybrid Deep Feature Fusion Techniques
Md Mamunur Rahaman, Chen Li, Yudong Yao, Frank Kulwa, Xiangchen Wu, Xiaoyan Li, Qian Wang
A Preprint
Md Mamunur Rahaman
Microscopic Image and Medical Image Analysis Group, MBIE College, Northeastern University, Shenyang 110169, China
[email protected]

Chen Li
Microscopic Image and Medical Image Analysis Group, MBIE College, Northeastern University, Shenyang 110169, China
[email protected]

Yudong Yao
Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA

Frank Kulwa
Microscopic Image and Medical Image Analysis Group, MBIE College, Northeastern University, Shenyang 110169, China

Xiangchen Wu
Suzhou Ruiguan Technology Company Ltd., Suzhou 215000, China

Xiaoyan Li
Cancer Hospital of China Medical University, Liaoning Hospital and Institute, Shenyang 110042, China

Qian Wang
Cancer Hospital of China Medical University, Liaoning Hospital and Institute, Shenyang 110042, China

February 25, 2021

Abstract
Cervical cancer, one of the most common fatal cancers among women, can be prevented by regular screening, which detects precancerous lesions at early stages so they can be treated. The Pap smear test is a widely performed screening technique for the early detection of cervical cancer, but this manual screening method suffers from a high false-positive rate because of human error. To improve on manual screening practice, machine learning (ML) and deep learning (DL) based computer-aided diagnostic (CAD) systems have been investigated widely for classifying cervical Pap cells. Most existing studies require pre-segmented images to obtain good classification results, whereas accurate cervical cell segmentation is challenging because of cell clustering. Some studies rely on handcrafted features, which cannot guarantee the optimality of the classification stage. Moreover, DL provides poor performance for a multiclass classification task when there is an uneven distribution of data, which is prevalent in cervical cell datasets. This investigation addresses those limitations by proposing DeepCervix, a hybrid deep feature fusion (HDFF) technique based on DL, to classify cervical cells accurately. Our proposed method uses various DL models to capture more potential information and enhance classification performance. The HDFF method is tested on the publicly available SIPAKMED dataset, and its performance is compared with the base DL models and the late fusion (LF) method. For the SIPAKMED dataset, we obtain state-of-the-art classification accuracies of 99.85%, 98.38% and 99.14% for 2-class, 3-class and 5-class classification. Moreover, our method is tested on the Herlev dataset and achieves an accuracy of 98.91% for binary and 90.32% for 7-class classification.

Keywords: Cervical cancer · Classification · Ensemble learning · Feature fusion · Deep learning · Pap smear
Introduction

Cervical cancer, found in the woman's cervix, is the fourth most prevalent cancer among women [1]. According to the World Health Organization (WHO), approximately 570,000 women were diagnosed with cervical cancer globally, and about 311,000 women lost their lives to this fatal disease in 2018 alone [2]. More than 80% of cervical cancer cases and 85% of deaths occur in poor and developing nations because of the absence of screening and treatment facilities [3]. Improper menstrual hygiene, pregnancy at an early age, smoking and the use of oral contraceptives are the leading risk factors for infection with human papillomavirus (HPV) [4]. Research has revealed that long-term infection with HPV is the main cause of cervical cancer. However, cervical cancer is among the most treatable forms of cancer if it is detected early and treated adequately [5].

Routine screening of women over 30 years old plays a vital role in preventing cervical cancer by allowing early detection and treatment [6]. The most popular screening technique to detect cervical malignancy is cervical cytopathology (the Pap smear test or liquid-based cytology) due to its cost-effectiveness [5, 7]. In this technique, cells are collected from the squamocolumnar junction of the cervix, and malignancy is checked under a light microscope by expert cytologists [8, 9]. It usually takes 5-10 minutes to analyze a single slide, depending on the orientation and overlapping of the cells [10].
Moreover, manual screening is difficult, tedious, time-consuming, expensive and subject to errors, because each slide contains around three million cells with different orientations and overlaps. This motivates the development of an automated computerized system that can analyze Pap cells effectively and efficiently [11, 12]. With the growing availability of training data since the end of the 1990s, there has been extensive research into computer-aided diagnostic (CAD) systems to help doctors track cervical cancer [13]. The traditional CAD system consists of three steps: cell segmentation (cytoplasm, nuclei), feature extraction and classification. In this system, filtering-based preprocessing is first performed to enhance image quality. Then, cell nuclei are extracted using k-means [14], clustering [15] or super-pixel [16] methods, followed by a post-processing step to correct the segmented nuclei. After that, handcrafted features [17, 18, 19], such as morphological features, colorimetric features and texture features, are extracted from the segmented nuclei. Next, a feature selection technique is applied to find the most discriminant features, and finally a classifier is designed to classify the cell [20].

The above-described method requires many steps to process the data, and the extracted handcrafted features cannot ensure superior classification performance, which also highlights the lack of automatic learning. To obtain an enhanced CAD system, deep learning (DL) based feature extraction methods have a significant advantage over other machine learning (ML) algorithms. DL-based algorithms achieve state-of-the-art results on challenging computer vision tasks [21, 22]. One compromise with DL is that it demands a considerable amount of data to obtain good results compared with ML techniques, which is challenging in the medical domain [23].
Moreover, DL also provides poor performance when there is an uneven distribution of sample data in a multiclass classification problem, which is very prevalent in the medical domain. Therefore, the CAD technique for the analysis of Pap cells requires further research and development.

In this study, we introduce DeepCervix, a DL-based framework to accurately classify cervical cytopathology cells based on hybrid deep feature fusion (HDFF) techniques. In our proposed framework, we use pre-trained DL models that are trained on the ImageNet dataset and fine-tuned
Figure 1: Workflow diagram of the proposed DeepCervix network. (Global Max Pooling (GMP), Batch Normalization (BN), Dense Layer (D), SoftMax (SM))

on the SIPAKMED dataset, consisting of single-cell cervical cytopathology images. For the SIPAKMED dataset, we have achieved the highest classification accuracies of 99.85%, 98.38% and 99.14% for the 2-class, 3-class and 5-class classification problems, respectively. Moreover, we have also tested our method on the Herlev dataset and reached an accuracy of 98.91% for binary classification and 90.32% for the 7-class problem. The workflow of the suggested HDFF method is presented in Fig. 1. From the workflow diagram, we can see that:

• As shown in Fig. 1, the cervical Pap smear images are first retrieved from publicly accessible databases (e.g., SIPAKMED, Herlev) and considered as training samples.
• In the preprocessing step, two stages of data augmentation are implemented. The first uses geometric manipulations, such as affine transformations, added noise (Gaussian, Laplace), Canny filters, edge detection, colour filters, and changes of brightness and contrast, to increase the number of training samples. The second uses the in-place data augmentation technique of the Keras "ImageDataGenerator" API, where images are transformed randomly during training.
• After the preprocessing step, the images are supplied to four DL models: VGG16, VGG19, XceptionNet and ResNet50. From Fig. 1-(c), it can be seen that for the VGG16 model, we have fine-tuned the last convolutional block, from layer 13 to layer 18, along with the top-level classifier.
• In the feature fusion network (FFN) stage, we first extract the features from the last layer before the SM layer of each DL model to create feature arrays with 1024 features per model. Then, the feature arrays are fed into a sequential model connected to a dense layer, with BN and dropout layers in between, to perform the classification.
• In this step, unseen test images are provided to perform the classification.
• Finally, we assess the performance of the proposed model by calculating the precision, recall, F1 score and accuracy.

The main contributions of this paper are as follows: (1) To the best of our knowledge, this is the first study to classify cervical cytopathology cells using HDFF techniques. (2) Two different stages of data augmentation techniques are presented. (3) Four types of CNNs with enhanced structure, VGG16, VGG19, XceptionNet and ResNet50, are introduced to extract complementary features from various depths of the networks.
(4) An improved FFN is included to integrate the features adaptively by combining a dense layer with SM, and BN and dropout layers in between. (5) Our proposed method achieves the highest classification accuracy on the SIPAKMED dataset, which shows the potential for improved cervical cancer diagnostic systems.

The remainder of this paper is organized as follows: Sec. 3 presents relevant studies of DL for the analysis of cervical cytopathology images and relevant feature fusion studies in computer vision tasks. Sec. 4 investigates the data pre-processing techniques used in our experiments and our proposed methods. Sec. 5 explains the experimental dataset, data settings, experimental setup, evaluation method, and experimental results and analysis. Sec. 6 discusses our proposed method with some examples of misclassified images. Finally, Sec. 7 concludes this paper by pointing out some limitations of our method.

Related Studies

An overview of relevant DL approaches employed to analyze cervical cells, and of feature fusion techniques in imaging modalities, is compiled in this section.
Various DL- and ML-based techniques have been applied to classify cervical cells. For instance, [24] utilizes histogram features, texture features, grey-level features and local binary pattern features; the features are then supplied to a hybrid classifier system combining an SVM and an adaptive neuro-fuzzy inference system to classify cervical cells as normal or abnormal. A hybrid ensemble technique is introduced in [25] by combining 15 different machine learning algorithms, such as random forest, bagging, rotation forest and J graft, to classify the cervical cells; the authors observe that the hybrid ensemble performs better than any individual algorithm.

A deep CNN (based on AlexNet) feature extraction method is applied in [26], followed by an unsupervised feature selection task; the feature vectors are then supplied to a least-squares support vector machine (LSSVM) and SoftMax regression to classify the cervical cells. [27] designs a model that extracts features from cervical cells using VGG16 and feeds them into ML classifiers: a support vector machine (SVM), random forest and AdaBoost; they find that the SVM performs better than the other ML classifiers. A pre-trained AlexNet architecture is employed in [28] to extract the characteristics of cervical cells, which are then classified using an SVM. A CNN-based classification approach is presented in [29] that applies the VGG16 and ResNet architectures and finds that ResNet50 is more suitable than VGG16 based on performance. A deep transfer learning-based classification approach is presented in [30] to classify cervical cells into healthy and abnormal, with prior data
augmentation and patch extraction. [31] applies a deep transfer learning technique based on AlexNet to detect, segment and classify cervical cells and demonstrates that segmentation is not necessary for classification. Pre-trained and fine-tuned CNN architectures based on AlexNet, GoogleNet, ResNet and DenseNet are employed to classify cervical cells in [32], where segmentation of the cytoplasm and nucleus is a prerequisite.

Similarly, in [33], a VGG-like network consisting of seven layers uses pre-segmented cervical cells to perform the classification task. A comparative study based on five DL models, ResNet101, DenseNet161, AlexNet, VGG19 and SqueezeNet, checks their classification performance on a cervical dataset, where DenseNet161 provides the maximum accuracy [34]. Moreover, [35] couples the features of pre-trained Inception-V3, ResNet152 and InceptionResNetV2 to analyze biomedical images. For a detailed study of related work, we recommend our survey paper on cervical cytopathology image analysis using DL [1].

The literature review shows that most authors have conducted binary classification tasks, whereas, in practice, multiclass classification is more important. Moreover, transferred models are often unable to capture the characteristics of medical images, and traditional features cannot guarantee the optimality of the system. Therefore, this paper investigates methods to address those issues.
A hybrid fusion approach combining early and late fusion is presented in [36] for the diagnosis of glaucoma. Handcrafted features, such as the grey-level co-occurrence matrix and central and Hu moments, are consolidated with deep features; the feature vectors are then supplied to SVM- and CNN-based classifiers. A satellite remote sensing scene classification method based on multi-structure deep feature fusion is presented in [37]: CaffeNet, VGG-VD16 and GoogLeNet are applied to extract features, which are fused through a fusion network to perform the classification. [38] develops a CAD method to detect breast cancer by employing feature fusion with a CNN; deep features, morphological features, texture features and density features are combined and fed through an extreme learning machine classifier to classify breast masses as benign or malignant. In our previous study [39], we classified cervical histopathology images using weighted-voting-based ensemble learning techniques. In [40], an ensemble of different CNN structures is used to classify medical images; the proposed ensemble shows better predictive capability by combining the results of different classifiers. [41] uses pre-trained AlexNet and VGG16 to extract features from segmented skin lesions and classify them as benign or malignant.
Data Pre-processing

The cervical cytopathology cell images (SIPAKMED dataset) that we employ to evaluate our proposed method are in BMP format with varying dimensions. Therefore, we rescale all images to 224 × 224 pixels for all four CNN networks. In this respect, we utilize the Keras "preprocess_input" function, which transforms input images according to each model's requirements.

This subsection discusses the various geometric transformations and image processing functions that we use in our experiments. The data augmentation task is performed with the "imgaug" library, which supports a wide variety of augmentation techniques. The newly formed images are saved alongside the training images and increase the training data size by a factor of six, which helps to obtain better results.

• Affine transformations (ATs): ATs are geometric manipulations that move a pixel from a coordinate position $(a, b)$ to a new position $(a', b')$. A pair of transformations specifies the movement:

$a' = T_a(a, b), \quad b' = T_b(a, b)$   (1)

An AT combines linear transformations and translations. In our experiment, we perform rotation, scaling, translation, shearing, and horizontal and vertical flip operations on an image. For a batch of training images, one of these transformations is chosen at random.

• Contrast-limited adaptive histogram equalization (CLAHE): Histogram equalization (HE) enhances the contrast of images, which may lead to overly bright or dark regions. CLAHE instead performs histogram equalization on small blocks of the image, where each block performs HE. As a result, it
prevents the over-amplification of noise and contrast in an image. CLAHE, all-channel CLAHE and gamma contrast are employed in our experiment, and one of these augmenters is chosen at random for a batch of training samples.
• Edge detection: the "EdgeDetect" and "DirectedEdgeDetect" functions from the imgaug API transform the input images into edge images, where edges are detected from random angles, and non-edge regions are marked black and edge regions white.
• Canny filter: Canny edge detection augmenters are also utilized, where the input images are preprocessed using a Sobel filter.
• Photometric transformations (PTs): PTs are accomplished by shuffling the colour channels, turning images into grayscale, changing hue and saturation values, adding hue and saturation, and quantizing images to up to 16 colours.
• Contrast adaptation (CA): CA is performed by modifying the contrast and brightness of an image.
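As an illustration of Eq. (1), the affine map can be applied directly to pixel coordinates; a minimal numpy sketch (the function and parameter names here are ours, not imgaug's):

```python
import numpy as np

def affine_transform(points, angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Map pixel coordinates (a, b) to (a', b') as in Eq. (1):
    a linear part (rotation combined with isotropic scaling)
    followed by a translation (tx, ty)."""
    theta = np.deg2rad(angle_deg)
    linear = scale * np.array([[np.cos(theta), -np.sin(theta)],
                               [np.sin(theta),  np.cos(theta)]])
    return points @ linear.T + np.array([tx, ty])

# A 90-degree rotation sends the pixel at (1, 0) to (0, 1).
pts = np.array([[1.0, 0.0]])
out = affine_transform(pts, angle_deg=90.0)
```

Shearing and flipping fit the same $(a, b) \to (a', b')$ template with a different linear part.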
To enhance model performance further, the Keras "ImageDataGenerator" API is applied [42]. The images are transformed randomly during training, so the network sees different samples in each epoch, which improves the model's generalizability. In this process, we set the featurewise center to false, the rotation range to 5 degrees and the fill mode to "nearest". We enable horizontal and vertical flips, set the brightness range from 50% to 130% and keep channel shifting enabled.
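As a rough numpy stand-in for what this in-place augmentation does (the function names are ours, not Keras'; only flips and the 50%-130% brightness change are sketched here):

```python
import numpy as np

def random_augment(image, rng):
    """Return a randomly transformed copy of `image` (H x W x C, floats in [0, 1]).
    Flips and a 0.5-1.3x brightness change, a subset of the settings in the text."""
    out = image.copy()
    if rng.random() < 0.5:          # horizontal flip
        out = out[:, ::-1, :]
    if rng.random() < 0.5:          # vertical flip
        out = out[::-1, :, :]
    return np.clip(out * rng.uniform(0.5, 1.3), 0.0, 1.0)

def augmenting_batches(images, batch_size, rng):
    """Endless generator: each epoch the network sees freshly transformed samples,
    which is the key idea behind in-place augmentation."""
    while True:
        idx = rng.choice(len(images), size=batch_size, replace=False)
        yield np.stack([random_augment(images[i], rng) for i in idx])

rng = np.random.default_rng(0)
data = rng.random((10, 32, 32, 3))          # ten dummy RGB "cell" images
batch = next(augmenting_batches(data, batch_size=4, rng=rng))
```

Unlike the first augmentation stage, nothing is written to disk: the transformed batches exist only for the duration of a training step.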
Deep Learning Models

Lately, DL, one type of ML algorithm, has become the most commonly used and successful approach for analyzing medical images. The convolutional neural network (CNN) is the most prevalent deep learning architecture. Research has confirmed that CNNs are robust to image noise and invariant to translation, rotation and size, which increases their object-analyzing ability [43, 44]. A CNN architecture is composed of convolution, pooling and fully connected layers. The main building block of the CNN structure is the convolution layer, which extracts low-level and then high-level features of an image as the layers get deeper [45]. The pooling layer after the convolution layer reduces the size of the convolved features by extracting the maximum or average value through a max-pooling or average-pooling operation. A fully connected layer (FCL) connects every neuron of one layer to the next to classify the image, following the principle of the multilayer perceptron [46]. In this study, we utilize VGG16, VGG19, ResNet50 and XceptionNet as CNN architectures.

1. VGGNet: VGGNet introduced the idea of a deeper network with smaller filters. The model can have 16 to 19 layers with a fixed input size of 224 × 224 × 3. The convolution filter size is 3 × 3 with a stride of 1 pixel. A linear transformation of the input is also performed by a 1 × 1 convolution filter with a ReLU activation function. A total of five max-pooling operations are performed with window size 2 × 2, followed by three FCLs. The significant discovery of VGGNet is the small receptive field (3 × 3), which enables more weight layers and, consequently, improved performance [47].

2. ResNet: [48] observes that, as network depth increases, network performance improves up to a certain point and then degrades rapidly. ResNet therefore introduces skip connections so that performance keeps improving with depth; this makes it possible to have over 1000 weight layers.
For a feature input x of a convolution layer with F(x) as the residual function, the input of the first layer, x, is copied to the output:

$H(x) = F(x) + x, \quad \text{or} \quad F(x) = H(x) - x$   (2)

The structure of the residual learning block is shown in Fig. 2.

3. XceptionNet: XceptionNet is the extended version of the Inception model, based on depthwise separable convolutions followed by pointwise convolutions. The model is lighter, with fewer connections, and provides better results on ImageNet classification than InceptionV3, ResNet and VGGNet [49].

Training a CNN from scratch demands a considerable amount of data and high computing power, and it also costs longer training time. In the medical domain, image datasets are usually small, since arranging large annotated datasets is quite difficult, and the image quality is often inferior. The solution to this problem is transfer learning (TL), which helps to create an accurate model by starting the learning from patterns that have been
Figure 2: The structure of the residual learning block of ResNet.
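As a toy illustration of the identity shortcut in Eq. (2) (the function and weight names are ours, not from the paper's implementation), a residual unit can be sketched in numpy:

```python
import numpy as np

def residual_block(x, weights):
    """A toy residual unit: H(x) = F(x) + x, as in Eq. (2).
    F is a small two-layer transform with ReLU; the identity shortcut
    lets features (and gradients) bypass F entirely."""
    w1, w2 = weights
    f = np.maximum(0.0, x @ w1)    # first layer + ReLU
    f = f @ w2                     # second layer (pre-addition)
    return np.maximum(0.0, f + x)  # add the shortcut, then ReLU

# With zero weights F(x) = 0, so the block reduces to the identity (for x >= 0),
# which is exactly why very deep stacks of such blocks remain trainable.
x = np.array([1.0, 2.0, 3.0])
zero_w = (np.zeros((3, 3)), np.zeros((3, 3)))
out = residual_block(x, zero_w)
```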
Figure 3: (a) Visualization of the TL process, where parameters are transferred from another CNN and fine-tuned on the cervical cancer cell dataset; (b) visualization of the feature maps of three different convolutional layers of VGG16.

already learned while solving different problems, instead of learning from scratch [50, 51]. Therefore, TL is an approach in DL and ML that allows us to transfer knowledge from one model to another. There are two steps in a TL process. The first step is to select a pre-trained model that is trained on a large-scale benchmark dataset, which
is related to the problem we intend to solve. For instance, Keras offers a wide range of pre-trained networks, such as VGG, Inception, Xception and ResNet. The second step is to fine-tune the model, considering the size and similarity of our dataset relative to the pre-trained model's dataset. If we have a considerable amount of data that differs from the pre-trained model's dataset, it is wise to train the entire model; for a small dataset, however, we need to freeze most of the layers and train only a few.

In this study, we utilize the VGG series, XceptionNet and ResNet50 in the TL process, where the weights are pre-trained on the ImageNet dataset. ImageNet consists of 1.28 million training, 50,000 validation and 100,000 testing images belonging to 1,000 classes. As observed from our workflow diagram in Fig. 1-(c), the earlier layers of every CNN model, which are responsible for capturing more generic features, are frozen. We then retrain the latter layers of the network by fine-tuning on the cervical cancer cell dataset to capture more dataset-specific features. Finally, we fine-tune our own fully connected classifier. Fig. 3 presents the VGG16 network as an example, where the first few convolutional blocks use parameters (w1, w2, w3, ..., wk) transferred from another VGG16 network trained on the ImageNet dataset.

For all four CNNs, the input size is 224 × 224 × 3; the models are trained for 50 epochs with an initial learning rate and then for another 50 epochs with a reduced learning rate; the batch size is 32 for the training set and 1 for the testing set; and the Adam optimizer is employed. Fig. 3-(a) exhibits the whole TL process as an example on the VGG network, where the first few layers are pre-trained on the ImageNet dataset, and the latter convolutional blocks, along with the FCL, are fine-tuned. Fig.
3-(b) shows some representative feature maps extracted from various convolutional blocks of the VGG16 network, which demonstrates the capability of the TL process for extracting meaningful information from the images.
Figure 4: Framework of the proposed hybrid feature fusion network.
Late Fusion

Late fusion (LF) is a type of ensemble classifier that relies on the individual classifiers' decisions and weights those decisions to improve the classification performance. In this experiment, the classification results of four different DL models, namely VGG16, VGG19, ResNet50 and XceptionNet, are combined using a majority voting technique, where each class is determined based on the highest number of votes received for that class. Let $m = 1, 2, \ldots, X$ and $n = 1, 2, \ldots, Y$, where X is the number of classifiers and Y is the number of classes, and let the m-th classifier's decision for class n be $E(m, n) \in \{0, 1\}$. The LF technique for majority voting then selects the class $n^{*}$ that satisfies

$\sum_{m=1}^{X} E(m, n^{*}) = \max_{n=1}^{Y} \sum_{m=1}^{X} E(m, n)$   (3)

Hybrid Deep Feature Fusion

Feature representation plays a vital role in image classification. We observe that feature fusion (FF) is an efficient approach for cervical cytopathology cell image analysis. An FF strategy combines multiple relevant features into a single feature vector, which contains richer information and contributes more description than the initial input feature vectors. The traditional strategies for FF are serial and parallel FF [52]. In a serial FF method, two features are concatenated into a single feature: if two features $F_1$ and $F_2$ are extracted from an image with vector dimensions x and y, the fused feature $F_s$ has dimension $(x + y)$. Parallel FF instead merges the two components into a complex vector, $F_p = F_1 + iF_2$, with i indicating the imaginary unit.
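The majority vote of Eq. (3) can be sketched in plain numpy (the helper name and the toy predictions are illustrative only):

```python
import numpy as np

def late_fusion_majority(predictions, num_classes):
    """Majority voting over classifier decisions, as in Eq. (3).
    `predictions` has shape (X, S): the class index chosen by each of the
    X classifiers for each of S samples. Ties resolve to the lowest index."""
    num_samples = predictions.shape[1]
    votes = np.zeros((num_classes, num_samples), dtype=int)
    for cls_preds in predictions:                       # one row per classifier
        votes[cls_preds, np.arange(num_samples)] += 1   # E(m, n) accumulated
    return votes.argmax(axis=0)                         # class with the most votes

# Four models (e.g. VGG16, VGG19, ResNet50, XceptionNet) voting on three samples.
preds = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [1, 1, 2],
                  [0, 2, 2]])
fused = late_fusion_majority(preds, num_classes=3)   # -> [0, 1, 2]
```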
Figure 5: An example of the SIPAKMED database in five categories: (a) Superficial-Intermediate, (b) Parabasal, (c) Koilocytotic, (d) Dyskeratotic, (e) Metaplastic.

The problem with the above-mentioned FF techniques is that they are unable to use the original input features, since they create new ones, and they struggle to integrate more than two features. In our study, we propose an HDFF technique that integrates feature vectors from multiple CNN architectures. Fig. 4 shows our proposed FFN, where $F_{V16}$, $F_{V19}$, $F_{R}$ and $F_{X}$ are the normalized feature vectors extracted from the dense layer (FCL) with 1024 neurons of VGG16, VGG19, ResNet50 and XceptionNet, respectively. The FFN consists of one concatenation layer and one FCL with a softmax activation function to integrate the different features. Moreover, dropout and batch normalization layers are introduced to prevent overfitting and improve training. The concatenation layer generates a vector of 4096 dimensions. If $\bigcup$ denotes the concatenation operation and $F_n(i)$ the n-th feature vector of the i-th sample, the fused output vector $F(i)$ can be written as

$F(i) = \bigcup_{n=1}^{4} F_n(i)$   (4)

Dataset

To investigate the performance of our proposed DeepCervix network, we apply the publicly available SIPAKMED dataset, consisting of 4049 annotated cervical Pap smear cell images [53]. A sample of the dataset is displayed in Fig. 5. Based on cell appearance and morphology, expert cytopathologists classified the cells into five categories: superficial-intermediate, parabasal, koilocytotic, metaplastic and dyskeratotic. More precisely, superficial-intermediate and parabasal cells can be further categorized as normal cells, koilocytotic and dyskeratotic cells are recognized as abnormal cells, and metaplastic cells are counted as benign cells. Table 1 provides the distribution of cells according to their classes.

Table 1: Distribution of the SIPAKMED database
Category                  Class     Number of Cells
Superficial-Intermediate  Normal    831
Parabasal                 Normal    787
Koilocytotic              Abnormal  825
Dyskeratotic              Abnormal  813
Metaplastic               Benign    793
Total                               4049
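The fusion of Eq. (4) is, concretely, a concatenation of the four backbones' 1024-dimensional feature vectors into one 4096-dimensional descriptor; a minimal numpy sketch, with random stand-ins for the real activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# One 1024-d feature vector per backbone (VGG16, VGG19, ResNet50, XceptionNet),
# here random stand-ins for the activations of each model's last dense layer.
f_v16, f_v19, f_r, f_x = (rng.random(1024) for _ in range(4))

# Eq. (4): the fusion layer concatenates the four vectors into one 4096-d
# descriptor, which then feeds the dense/BN/dropout/softmax head of the FFN.
fused = np.concatenate([f_v16, f_v19, f_r, f_x])
```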
The SIPAKMED dataset comprises 4049 annotated cervical cell images. In each class, 60% of the images are used for training, 20% for validation and 20% for testing. We perform 5-class (superficial-intermediate, parabasal, koilocytotic, metaplastic and dyskeratotic), 3-class (normal, abnormal and benign) and 2-class (normal and abnormal) classification on the dataset. Moreover, data augmentation is applied to the training set, which increases the training data by a factor of 6. The resulting training, validation and test splits are shown in Table 2.
Table 2: The experimental data setting of the SIPAKMED dataset
            Total Number of Images
Dataset     5-Class   3-Class   2-Class
Training    16982     16989     13664
Validation  811       811       652
Test        812       811       652
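The 60/20/20 split described above can be sketched as follows (the helper name is ours; in practice, the split is applied within each class before augmentation):

```python
import numpy as np

def split_indices(n, rng, train=0.6, val=0.2):
    """Shuffle `n` sample indices and cut them into train/validation/test
    partitions (60/20/20 by default), as done per class in the text."""
    idx = rng.permutation(n)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

rng = np.random.default_rng(0)
# e.g. the 831 superficial-intermediate cells of Table 1
tr, va, te = split_indices(831, rng)
```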
In this experiment, we use Google Colaboratory, a cloud service based on Jupyter notebooks, to train and test our model [54]. Python 2 and 3 come pre-configured with many ML libraries, such as TensorFlow, Matplotlib, Keras, PyTorch and OpenCV. Colab provides a runtime with a fully functional GPU (NVIDIA Tesla K80) for running DL experiments, and the code is stored in Google Drive.
To avoid bias among the different algorithms, selecting suitable evaluation metrics is vital. Precision, recall, F1 score and accuracy are the most standard measures for evaluating classification performance [55]. Precision is the fraction of correctly identified samples among all samples predicted as positive, whereas recall measures the ability of a classification model to recognize all relevant samples. The F1 score combines precision and recall using the harmonic mean. Accuracy is the proportion of correctly predicted samples out of the total number of samples. The mathematical expressions of the evaluation metrics are shown in Table 3, where true positive (TP) is the number of accurately labeled positive samples, true negative (TN) is the number of correctly classified negative samples, false positive (FP) is the number of negative samples classified as positive, and false negative (FN) is the number of positive instances predicted as negative.

Table 3: Evaluation metrics
Assessment    Formula
Precision, P  TP / (TP + FP)
Recall, R     TP / (TP + FN)
F1 score      2 × P × R / (P + R)
Accuracy      (TP + TN) / (TP + TN + FP + FN)

To examine the performance of our proposed HDFF method, we calculate the precision, recall, F1 score and accuracy of each individual fine-tuned DL model (VGG16, VGG19, ResNet50, XceptionNet), along with late fusion (LF), where we implement majority voting of diverse classifiers (MVDC), and the HDFF method. The performance results for the classification of cervical cells on the unseen test dataset are shown in Table 4. The results are analyzed for the binary-class, 3-class and 5-class classification problems.
Binary classification:
In this case, we classify the cervical cells into Normal and Abnormal (Table 1). Table 4 shows that, among the four DL models, VGG16 gives the highest average precision, recall and F1 score of 1.00, 1.00 and 0.998, respectively, with an overall accuracy of 99.85%. After VGG16, ResNet-50 gives a classification accuracy of 99.38%, with an average precision, recall and F1 score of 0.995, 0.995 and 0.990. XceptionNet performs the worst among them, with an overall accuracy of 98.31%. Moreover, the MVDC-based LF and HDFF techniques achieve a similar result to VGG16.
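The MVDC-based LF amounts to a per-sample majority vote over the backbone predictions; a minimal sketch, where the class labels and the tie-break rule (falling back to the earliest model's vote) are illustrative assumptions:

```python
from collections import Counter

def majority_vote(model_predictions):
    """Fuse per-model label lists by per-sample majority voting."""
    fused = []
    for votes in zip(*model_predictions):
        # Counter preserves first-seen order, so ties fall back
        # to the earliest model's vote (an assumed tie-break rule).
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

preds = [
    ["normal", "abnormal", "benign"],    # e.g. VGG16
    ["normal", "abnormal", "abnormal"],  # e.g. ResNet-50
    ["normal", "normal", "abnormal"],    # e.g. VGG19
]
fused = majority_vote(preds)  # ["normal", "abnormal", "abnormal"]
```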
PREPRINT - FEBRUARY 25, 2021

Table 4: Performance analysis of the proposed HDFF method along with the base models. (Average Precision (Avg. P), Average Recall (Avg. R), Average F1 score (Avg. F1), Late Fusion (LF))
Cl. Pro.  CNN Model    Avg. P  Avg. R  Avg. F1  Acc. (%)
2-Class   VGG16        1.00    1.00    0.998    99.85
          VGG19        0.985   0.985   0.990    98.77
          ResNet-50    0.995   0.995   0.990    99.38
          XceptionNet  0.980   0.980   0.980    98.31
          LF           1.00    1.00    0.998    99.85
          HDFF         1.00    1.00    0.998    99.85
3-class classification: For ternary classification, we classify the cervical cells into the Normal, Abnormal and Benign classes (Table 1). Table 4 shows that VGG16 obtains a classification accuracy of 97.90%, with an average precision, recall and F1 score of 0.976, 0.970 and 0.973. VGG19 and ResNet-50 provide the same average precision of 0.963, recall values of 0.943 and 0.950, and F1 scores of 0.953 and 0.956, respectively; both obtain an accuracy of 96.18%. However, XceptionNet shows the worst performance, with an accuracy of 89.64%. Additionally, the LF technique obtains an accuracy of 98.52%, with precision, recall and F1 values of 0.987, 0.980 and 0.980, respectively. Our HDFF method obtains the highest classification accuracy of 99.38%, with an average precision, recall and F1 score of 0.993, 0.990 and 0.993, respectively.
5-class classification: In this experiment, we classify the cervical cells into five classes (Table 1). Table 4 shows that the highest overall accuracy, precision, recall and F1 score of 99.14%, 0.992, 0.990 and 0.990 are obtained by the HDFF technique, followed by the LF method, VGG16, VGG19, ResNet50 and XceptionNet with overall accuracies of 98.64%, 98.27%, 96.43%, 96.06% and 65.77%, respectively. XceptionNet gives the worst performance, with an average precision, recall and F1 score of 0.751, 0.650 and 0.639, respectively.

The performance results in Table 4 illustrate that our proposed HDFF method (DeepCervix) obtains the highest classification accuracy for the binary, 3-class and 5-class classification problems. After the HDFF method, LF achieves the top classification results. Among the four DL models, VGG16 consistently provides superior performance, whereas the performance of XceptionNet degrades as the number of classes grows. It is also observed that binary classification achieves the highest classification accuracy, followed by the 3-class and 5-class classification problems.
To better illustrate the classification performance, we present confusion matrices of our proposed HDFF and LF methods in Fig. 6. Moreover, Fig. 7 shows the accuracy of each DL, LF and HDFF model in histogram charts.
[Figure 6 panels: (a) LF and HDFF methods on binary classification; (b) LF method on 3-class classification; (c) HDFF method on 3-class classification; (d) LF method on 5-class classification; (e) HDFF method on 5-class classification.]
Figure 6: Confusion matrices of the LF and HDFF methods for the 2-class, 3-class and 5-class classification problems.

Looking at the confusion matrices for binary classification in Fig. 6-(a), both models (HDFF and LF) accurately recognize 328 images as abnormal and 323 images as normal, though one normal image is labeled as abnormal. According to Table 4, both models obtain the same accuracy. For the 3-class and 5-class classifications, the HDFF method has better recognition ability than the LF method. From Fig. 6-(c), the HDFF method accurately recognizes 326 images as abnormal, 324 as normal and 156 as benign, with only five images misclassified. For 5-class classification, the HDFF method correctly classifies 805 out of 812 images (Fig. 6-(e)).

According to the histogram in Fig. 7, all models obtain considerably high accuracy for the binary classification problem. As the number of classes increases, the overall accuracy of the individual DL models decreases, whereas our proposed HDFF method maintains good performance. For the 3-class classification problem, the accuracy of the HDFF method is 99.38%, which is 1.48%, 3.2%, 3.2%, 9.74% and 0.86% higher than VGG16, VGG19, ResNet-50, XceptionNet and the LF method, respectively. For 5-class classification, the highest accuracy of 99.14% is achieved by the HDFF method, an improvement of 0.87% over VGG16, 0.5% over LF, 2.71% over VGG19, 3.08% over ResNet50 and 33.37% over XceptionNet.
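Confusion matrices like those in Fig. 6 can be tallied directly from (true, predicted) label pairs; a minimal pure-Python sketch with illustrative labels and counts:

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count predictions per class; rows are true labels, columns predictions."""
    idx = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix

classes = ["abnormal", "normal"]
y_true = ["abnormal"] * 3 + ["normal"] * 3
y_pred = ["abnormal", "abnormal", "abnormal", "normal", "normal", "abnormal"]
m = confusion_matrix(y_true, y_pred, classes)
# m[1][0] counts normal cells mislabeled as abnormal
```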
Table 5 presents a comparative analysis of our proposed HDFF method with existing works that classify cervical cells using the SIPAKMED dataset. The table shows that our proposed HDFF method obtains the highest classification accuracies on the binary and multiclass classification problems. For the binary and 5-class classification problems,
Figure 7: Performance comparison among different TL models with the HDFF and LF methods.

our method obtained 1.60% and 0.19% higher accuracy than the existing studies. We also note that the 3-class classification problem has not been addressed in existing research.

Table 5: Comparison of classification accuracies on the SIPAKMED dataset
Ref.        Method                       Class    Accuracy
[53]        CNN                          5-Class  95.35%
[56]        Graph convolutional network  5-Class  98.37%
[57]        DenseNet-161                 5-Class  98.96%
[58]        Bagging ensemble classifier  2-Class  98.25%
                                         5-Class  94.09%
Our method  HDFF                         2-Class  99.85%
                                         3-Class  99.38%
                                         5-Class  99.14%
In our experiment, we first train the individual DL models (VGG16, VGG19, ResNet50, XceptionNet) and save them with their weights separately. Then, we use those saved models and their weights to perform further training in the HDFF stage. Training each DL model takes around six hours for 100 epochs (using Google Colab). Training the HDFF model from the saved models requires only a few minutes (3 seconds per epoch). Though training takes quite a long time, the testing time is around 2.5 seconds per cervical cell image.
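The fusion stage can be pictured as concatenating the feature vectors produced by the saved backbones and training a classification head on the fused vector; a NumPy sketch, where the two-backbone setup, feature sizes and random weights are illustrative, not our exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled deep features from two saved backbones for a batch of 4 images
feat_vgg16 = rng.standard_normal((4, 512))    # e.g. VGG16 features
feat_resnet = rng.standard_normal((4, 2048))  # e.g. ResNet-50 features

# Hybrid deep feature fusion: concatenate along the feature axis
fused = np.concatenate([feat_vgg16, feat_resnet], axis=1)  # shape (4, 2560)

# A softmax classification head over the fused features (randomly initialized
# here; in practice it is trained for a few epochs on the saved features)
W = rng.standard_normal((2560, 5)) * 0.01  # 5-class problem
logits = fused @ W
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```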
The publicly available pap smear benchmark dataset (Herlev dataset) [17], consisting of 917 single-cell images, is employed to evaluate our proposed HDFF method. The dataset is divided into seven classes, which can be further grouped into benign and malignant. The benign class consists of 242 images, and the malignant class consists of 675 images. The details of the dataset are given in Table 6.

Our experiment takes 60% of the images of each class for training, 20% for validation, and the rest for testing. Besides, the data augmentation technique is applied to the training set, which increases the training dataset by a factor of 14. The resulting training, validation and test datasets for the 7-class and 2-class classification problems are given in Table 7.
Table 6: Distribution of the Herlev dataset
Category               Class     Number of Cells
Normal squamous        Normal    74
Intermediate squamous            70
Columnar                         98
Mild dysplasia         Abnormal  182
Moderate dysplasia               146
Severe dysplasia                 197
Carcinoma in situ                150
Total                            917
Table 7: The experimental data setting of Herlev dataset
Dataset      Total Number of Images
             7-Class   2-Class
Training     8190      8235
Validation   185       184
Test         186       184
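The counts in Table 7 follow from the 60/20/20 split plus augmentation: each original training image gains 14 augmented copies, a 15-fold expansion. A quick consistency check:

```python
total = 917       # Herlev single-cell images
aug_factor = 14   # augmented copies added per training image

# 7-class setting: validation and test counts from Table 7
val_7, test_7 = 185, 186
train_7 = total - val_7 - test_7          # original training images
aug_train_7 = train_7 * (1 + aug_factor)  # originals plus augmented copies

# 2-class setting
val_2, test_2 = 184, 184
train_2 = total - val_2 - test_2
aug_train_2 = train_2 * (1 + aug_factor)

print(aug_train_7, aug_train_2)  # 8190 8235, matching Table 7
```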
Table 8 presents the classification performance of the four different DL models with the LF and HDFF methods; the four CNN models serve as backbone networks of the LF and HDFF models. For binary classification of the Herlev dataset, ResNet-50 provides the highest precision, recall and F1 score among the four CNN models for distinguishing normal cervical cells from abnormal ones, followed by VGG19, VGG16 and XceptionNet. Between the LF and HDFF methods, the HDFF method achieves the highest classification accuracy of 98.91%, which is 1.08% higher than the LF method.

For the 7-class classification of the Herlev dataset, ResNet-50 provides the highest classification accuracy of 83.87% among the four CNN models, whereas XceptionNet performs the worst, with an accuracy of 39.78%. The LF approach reaches 86.02% accuracy, with an average precision, recall and F1 score of 0.887, 0.872 and 0.877, respectively. Moreover, our proposed HDFF method obtains the highest classification accuracy of 90.32%, with an average precision, recall and F1 score of 0.915, 0.911 and 0.916, respectively.

For both the binary and multiclass classification problems, ResNet-50 obtains the highest classification accuracy among the four DL models. After ResNet50, the LF model achieves better results than the individual DL models, whereas the HDFF method obtains the highest classification accuracy overall.
Table 9 compares the performance results of existing studies with our proposed HDFF method in terms of overall classification accuracy for the 2-class and 7-class classification problems. A higher accuracy value indicates a higher rate of correct classifications. The table shows that most existing works perform binary classification and obtain accuracies above 90%. However, only a few papers address both the binary and multiclass classification of the Herlev dataset. For the multiclass classification problem, the classification accuracy ranges from 68.54% to 95.9%. [33] obtains the highest accuracy for 7-class classification, but it requires pre-segmented cervical cell images. Table 9 further shows that our proposed HDFF method outperforms existing methods in most cases, which demonstrates the robustness of our proposed algorithm.
Lately, advances in DL are solving critical tasks in the medical domain. Classification of cervical cells can help identify cancerous subjects early, which is a significant step toward preventing cervical cancer. This study proposes the HDFF method (DeepCervix) to classify cervical cells on the SIPAKMED and Herlev datasets and obtains excellent results.
Table 8: Performance analysis of the proposed HDFF method along with the base models on the Herlev dataset. (Average Precision (Avg. P), Average Recall (Avg. R), Average F1 score (Avg. F1), Late Fusion (LF))
Cl. Pro.  CNN Model    Avg. P  Avg. R  Avg. F1  Acc. (%)
2-Class   VGG16        0.880   0.895   0.885    90.76
          VGG19        0.910   0.845   0.870    90.76
          ResNet-50    0.950   0.930   0.940    95.11
          XceptionNet  0.850   0.815   0.835    87.50
          LF           0.985   0.960   0.975    97.83
          HDFF         0.995   0.980   0.985    98.91
Table 9: Comparison of classification accuracies on the Herlev dataset (BPNN (Back-propagation neural network), LSSVM (Least-squares support-vector machine), HVCA (Hybrid variational convolutional autoencoder), ETL (Ensembled transfer learning), Cl. Pro. (Classification problem), Acc. (Accuracy))
Ref.        Method                               Cl. Pro.  Acc.
[59]        BPNN                                 3-Class   79%
[25]        Hybrid ensemble                      2-Class   98%
                                                 7-Class   78%
[26]        AlexNet & LSSVM                      2-Class   94.61%
[28]        AlexNet & SVM                        2-Class   99.19%
[29]        VGG16 & ResNet                       2-Class   86%
[30]        CNN & TL                             2-Class   98.3%
[60]        CNN & TL                             2-Class   95.1%
[31]        AlexNet & TL & DT                    2-Class   99.3%
                                                 7-Class   93.2%
[32]        Morphology & CNN                     2-Class   94.5%
                                                 7-Class   64.5%
[33]        VGG-like network (segmented images)  2-Class   98.10%
                                                 7-Class   95.9%
[34]        DenseNet161                          2-Class   94.38%
                                                 7-Class   68.54%
[61]        HVCA                                 2-Class   -
[62]        Pretrained ResNet50                  2-Class   97.89%
[39]        ETL                                  2-Class   98.37%
Our method  HDFF                                 2-Class   98.91%
                                                 7-Class   90.32%
Imaging modality, image quality, dataset distribution, model structure, complexity, loss function, optimization function and number of epochs are some critical factors that influence a model's performance. Observing the performance metrics for the SIPAKMED dataset in Table 4, VGG16 performs relatively well compared to ResNet50, VGG19 and XceptionNet; thus a shallow network performs better than a very deep network on this dataset. Considering the network architecture of VGG16, it contains very small receptive fields, which allows more weight layers and thus improves performance. The LF model based on MVDC shows a slight improvement in the
Figure 8: Examples of misclassified cervical cells from the SIPAKMED dataset. (a) Original: Dyskeratotic, prediction: Koilocytotic, confidence score 0.898; (b) Original: Dyskeratotic, prediction: Koilocytotic, confidence score 0.898; (c) Original: Koilocytotic, prediction: Dyskeratotic, confidence score 0.994; (d) Original: Koilocytotic, prediction: Dyskeratotic, confidence score 0.993; (e) Original: Metaplastic, prediction: Koilocytotic, confidence score 0.694.
Figure 9: Examples of misclassified cervical cells from the Herlev dataset. (a) Original: Carcinoma, prediction: Severe dysplastic, confidence score 0.744; (b) Original: Carcinoma, prediction: Severe dysplastic, confidence score 0.953; (c) Original: Columnar, prediction: Light dysplastic, confidence score 0.919; (d) Original: Columnar, prediction: Severe dysplastic, confidence score 0.851; (e) Original: Moderate dysplastic, prediction: Carcinoma, confidence score 0.337.

overall result, but it cannot always guarantee leading performance. Besides, the HDFF method effectively improves the classification performance and provides the best result. As observed in Fig. 6, the HDFF method correctly classifies 805 out of 812 images in the 5-class classification task. It is also observed that Koilocytotic and Metaplastic cells are challenging to classify. For the Herlev dataset (Table 8), unlike SIPAKMED, ResNet-50 performs better than the other DL models; hence, for highly imbalanced and small datasets, ResNet-50 is preferable. Again, the best performance on the 2-class and 7-class classification problems is obtained by the HDFF method.

Fig. 8 and Fig. 9 provide examples of misclassified cervical cells on the SIPAKMED and Herlev datasets for the 5-class and 7-class classification problems. Fig. 8-(a) shows that, for the dyskeratotic class image, the cell boundary and nucleus are hard to distinguish, and it is wrongly labeled as Koilocytotic with a confidence score of 0.898. In Fig. 8-(b),(c), the Dyskeratotic and Koilocytotic class images look identical, with invisible nucleus boundaries, and are misclassified as Koilocytotic and Dyskeratotic, respectively. Fig. 8-(d) reveals that a dark-stained Koilocytotic cell is misclassified as Dyskeratotic. In Fig. 8-(e), the content of the Metaplastic cell is too dark to identify the cell and nucleus regions, and it is misclassified as Koilocytotic with a confidence score of 0.694. According to Fig. 9-(a),(b), two dark-stained carcinoma images are labeled as severe dysplastic. In Fig. 9-(c),(d), two columnar images, which look very different from each other, are misclassified as light and severe dysplastic. Fig. 9-(e) shows a moderate dysplastic cell image misclassified as carcinoma.
For all the misclassified images, we note that none of them contains adequate information about the cell.
This study proposes deep learning-based HDFF and LF methods to classify cervical cells. The performance metrics show that the HDFF method achieves higher classification accuracies than the LF method. Unlike other methods that rely on pre-segmentation of the cytoplasm/nucleus and hand-crafted features, our proposed method offers end-to-end classification of cervical cells using deep features. The SIPAKMED and Herlev datasets are utilized to evaluate the performance of our proposed model. For the SIPAKMED dataset, we obtain state-of-the-art accuracies of 99.85%, 99.38% and 99.14% for the 2-class, 3-class and 5-class classification problems. For the Herlev dataset, we reach 98.91% accuracy for the binary classification problem and 90.32% for the 7-class classification problem.
Though our method provides very good performance, there are a few limitations. First, despite the high accuracy on the SIPAKMED dataset, the performance of our method degrades for 7-class classification on the Herlev dataset. An ideal screening system should not miss any abnormal cells. To overcome this for the multiclass classification problem, we could integrate pre-segmented cell features into our model. Secondly, for our HDFF method, we have investigated four DL models, fine-tuned them, and integrated their features to obtain the final model. In the future, we can investigate other DL models and compare their multiclass classification accuracy. Thirdly, our proposed method should be generalized to classification involving overlapping cells. Finally, Poisson noise is a critical factor in cervical cell images that degrades model performance. Therefore, denoising methods, such as the adaptive Wiener filter [63], can be applied in the preprocessing step to improve the model's overall performance.
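An adaptive Wiener filter of the kind suggested in [63] estimates local mean and variance in a sliding window and pulls pixels toward the local mean where the local variance is close to the noise variance. A NumPy sketch, where the window size, noise estimate and loop-based implementation are illustrative choices, not the paper's preprocessing pipeline:

```python
import numpy as np

def adaptive_wiener(img, win=3, noise_var=None):
    """Adaptive (local) Wiener denoising of a 2-D grayscale image."""
    img = img.astype(float)
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")
    mean = np.empty_like(img)
    var = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + win, j:j + win]
            mean[i, j] = patch.mean()
            var[i, j] = patch.var()
    if noise_var is None:
        noise_var = var.mean()  # common default: average of the local variances
    # Shrink toward the local mean where local variance ~ noise variance
    gain = np.maximum(var - noise_var, 0.0) / np.maximum(var, 1e-12)
    return mean + gain * (img - mean)

# Flat test image corrupted by Poisson noise
rng = np.random.default_rng(0)
noisy = rng.poisson(100.0, size=(32, 32)).astype(float)
denoised = adaptive_wiener(noisy)
```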
References

[1] Md Mamunur Rahaman, Chen Li, Xiangchen Wu, Yudong Yao, Zhijie Hu, Tao Jiang, Xiaoyan Li, and Shouliang Qi. A survey for cervical cytopathology image analysis using deep learning.
IEEE Access , 8:61687–61710, 2020.[2] WHO et al. Who guidelines for the use of thermal ablation for cervical pre-cancer lesions. 2019.[3] Jacques Ferlay, M Ervik, F Lam, M Colombet, L Mery, M Piñeros, A Znaor, I Soerjomataram, and F Bray. Globalcancer observatory: cancer today.
Lyon, France: International Agency for Research on Cancer , 2018.[4] Tanja Šarenac and Momir Mikov. Cervical cancer, different treatments and importance of bile acids as therapeuticagents in this disease.
Frontiers in Pharmacology , 10:484, 2019.[5] Debbie Saslow, Diane Solomon, Herschel W Lawson, Maureen Killackey, Shalini L Kulasingam, Joanna Cain,Francisco AR Garcia, Ann T Moriarty, Alan G Waxman, David C Wilbur, et al. American cancer society, americansociety for colposcopy and cervical pathology, and american society for clinical pathology screening guidelinesfor the prevention and early detection of cervical cancer.
American journal of clinical pathology , 137(4):516–542,2012.[6] World Health Organization (WHO) et al. Screening as well as vaccination is essential in the fight against cervicalcancer. , 28:2018, 2014.[7] Elizabeth Davey, Alexandra Barratt, Les Irwig, Siew F Chan, Petra Macaskill, Patricia Mannes, and A MarionSaville. Effect of study design and quality on unsatisfactory rates, cytology classifications, and accuracy inliquid-based versus conventional cervical cytology: a systematic review.
The Lancet , 367(9505):122–132, 2006.[8] George N Papanicolaou. New cancer diagnosis.
CA: A Cancer Journal for Clinicians , 23(3):174–179, 1973.[9] George N Papanicolaou and Herbert F Traut. The diagnostic value of vaginal smears in carcinoma of the uterus.
American Journal of Obstetrics and Gynecology , 42(2):193–206, 1941.[10] Tarik M Elsheikh, R Marshall Austin, David F Chhieng, Fern S Miller, Ann T Moriarty, and Andrew A Renshaw.American society of cytopathology workload recommendations for automated pap test screening: Developed bythe productivity and quality assurance in the era of automated screening task force.
Diagnostic cytopathology ,41(2):174–178, 2013.[11] Aslı GençTav, Selim Aksoy, and Sevgen ÖNder. Unsupervised segmentation and classification of cervical cellimages.
Pattern recognition , 45(12):4151–4168, 2012.[12] Richard Lozano. Comparison of computer-assisted and manual screening of cervical cytology.
GynecologicOncology , 104(1):134–138, 2007.[13] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, MohsenGhafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning inmedical image analysis.
Medical image analysis , 42:60–88, 2017.[14] K Krishna and M Narasimha Murty. Genetic k-means algorithm.
IEEE Transactions on Systems, Man, andCybernetics, Part B (Cybernetics) , 29(3):433–439, 1999.[15] Tapas Kanungo, David M Mount, Nathan S Netanyahu, Christine D Piatko, Ruth Silverman, and Angela Y Wu.An efficient k-means clustering algorithm: Analysis and implementation.
IEEE transactions on pattern analysisand machine intelligence , 24(7):881–892, 2002.[16] Hansang Lee and Junmo Kim. Segmentation of overlapping cervical cells in microscopic images with superpixelpartitioning and cell-wise contour refinement. In
Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 63–69, 2016.

[17] Jan Jantzen, Jonas Norup, Georgios Dounias, and Beth Bjerregaard. Pap-smear benchmark data for pattern classification.
Nature inspired Smart Information Systems (NiSIS 2005) , pages 1–9, 2005.[18] Yannis Marinakis, Magdalene Marinaki, and Georgios Dounias. Particle swarm optimization for pap-smeardiagnosis.
Expert Systems with Applications , 35(4):1645–1656, 2008.[19] Yannis Marinakis, Georgios Dounias, and Jan Jantzen. Pap smear diagnosis using a hybrid intelligent schemefocusing on genetic algorithm based feature selection and nearest neighbor classification.
Computers in Biologyand Medicine , 39(1):69–78, 2009.[20] Khin Yadanar Win, Somsak Choomchuay, Kazuhiko Hamamoto, Manasanan Raveesunthornkiat, Likit Rangsirat-tanakul, and Suriya Pongsawat. Computer aided diagnosis system for detection of cancer cells on cytologicalpleural effusion images.
BioMed research international , 2018, 2018.[21] Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio.
Deep learning , volume 1. MIT pressCambridge, 2016.[22] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, AndrejKarpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge.
Internationaljournal of computer vision , 115(3):211–252, 2015.[23] Michael S Landau and Liron Pantanowitz. Artificial intelligence in cytopathology: a review of the literature andoverview of commercial landscape.
Journal of the American Society of Cytopathology , 8(4):230–241, 2019.[24] P Sukumar and RK Gnanamurthy. Computer aided detection of cervical cancer using pap smear images basedon hybrid classifier.
International Journal of Applied Engineering Research, Research India Publications ,10(8):21021–21032, 2015.[25] Abid Sarwar, Vinod Sharma, and Rajeev Gupta. Hybrid ensemble learning technique for screening of cervicalcancer using papanicolaou smear image analysis.
Personalized Medicine Universe , 4:54–62, 2015.[26] Kangkana Bora, Manish Chowdhury, Lipi B Mahanta, Malay K Kundu, and Anup K Das. Pap smear imageclassification using convolutional neural network. In
Proceedings of the Tenth Indian Conference on ComputerVision, Graphics and Image Processing , pages 1–8, 2016.[27] Jonghwan Hyeon, Ho-Jin Choi, Kap No Lee, and Byung Doo Lee. Automating papanicolaou test using deepconvolutional activation feature. In , pages 382–385. IEEE, 2017.[28] Bilal Taha, Jorge Dias, and Naoufel Werghi. Classification of cervical-cancer using pap-smear images: aconvolutional neural network approach. In
Annual Conference on Medical Image Understanding and Analysis ,pages 261–272. Springer, 2017.[29] Hakan Wieslander, Gustav Forslid, Ewert Bengtsson, Carolina Wahlby, Jan-Michael Hirsch, ChristinaRunow Stark, and Sajith Kecheril Sadanandan. Deep convolutional neural networks for detecting cellularchanges due to malignancy. In
Proceedings of the IEEE International Conference on Computer Vision Workshops ,pages 82–89, 2017.[30] Ling Zhang, Le Lu, Isabella Nogues, Ronald M Summers, Shaoxiong Liu, and Jianhua Yao. Deeppap: deepconvolutional networks for cervical cell classification.
IEEE journal of biomedical and health informatics ,21(6):1633–1643, 2017.[31] Srishti Gautam, Nirmal Jith, Anil K Sao, Arnav Bhavsar, Adarsh Natarajan, et al. Considerations for a pap smearimage analysis system with cnn features. arXiv preprint arXiv:1806.09025 , 2018.[32] Haoming Lin, Yuyang Hu, Siping Chen, Jianhua Yao, and Ling Zhang. Fine-grained classification of cervical cellsusing morphological and appearance based convolutional neural networks.
IEEE Access , 7:71541–71549, 2019.[33] Khalid Hamed S Allehaibi, Lukito Edi Nugroho, Lutfan Lazuardi, Anton Satria Prabuwono, Teddy Mantoro, et al.Segmentation and classification of cervical cells using deep learning.
IEEE Access , 7:116925–116941, 2019.[34] Yuttachon Promworn, Satjana Pattanasak, Chuchart Pintavirooj, and Wibool Piyawattanametha. Comparisons ofpap smear classification with deep learning models. In , pages 282–285. IEEE, 2019.[35] Long D Nguyen, Ruihan Gao, Dongyun Lin, and Zhiping Lin. Biomedical image classification based on a featureconcatenation and ensemble of deep cnns.
Journal of Ambient Intelligence and Humanized Computing , pages1–13, 2019.[36] Nacer Eddine Benzebouchi, Nabiha Azizi, Amira S Ashour, Nilanjan Dey, and R Simon Sherratt. Multi-modalclassifier fusion with feature cooperation for glaucoma diagnosis.
Journal of Experimental & Theoretical Artificial Intelligence, 31(6):841–874, 2019.

[37] Wei Xue, Xiangyang Dai, and Li Liu. Remote sensing scene classification based on multi-structure deep features fusion.
IEEE Access , 8:28746–28755, 2020.[38] Zhiqiong Wang, Mo Li, Huaxia Wang, Hanyu Jiang, Yudong Yao, Hao Zhang, and Junchang Xin. Breastcancer detection using extreme learning machine based on feature fusion with cnn deep features.
IEEE Access ,7:105146–105158, 2019.[39] Dan Xue, Xiaomin Zhou, Chen Li, Yudong Yao, Md Mamunur Rahaman, Jinghua Zhang, Hao Chen, JinpengZhang, Shouliang Qi, and Hongzan Sun. An application of transfer learning and ensemble learning techniques forcervical histopathology image classification.
IEEE Access , 8:104603–104618, 2020.[40] Ashnil Kumar, Jinman Kim, David Lyndon, Michael Fulham, and Dagan Feng. An ensemble of fine-tunedconvolutional neural networks for medical image classification.
IEEE journal of biomedical and health informatics ,21(1):31–40, 2016.[41] Javeria Amin, Abida Sharif, Nadia Gul, Muhammad Almas Anjum, Muhammad Wasif Nisar, Faisal Azam, andSyed Ahmad Chan Bukhari. Integrated design of deep features fusion for localization and classification of skincancer.
Pattern Recognition Letters , 131:63–70, 2020.[42] Md Mamunur Rahaman, Chen Li, Yudong Yao, Frank Kulwa, Mohammad Asadur Rahman, Qian Wang, ShouliangQi, Fanjie Kong, Xuemin Zhu, and Xin Zhao. Identification of covid-19 samples from chest x-ray images usingdeep learning: A comparison of transfer learning approaches.
Journal of X-Ray Science and Technology ,(Preprint):1–19, 2020.[43] David Rolnick, Andreas Veit, Serge Belongie, and Nir Shavit. Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694 , 2017.[44] Sander Dieleman, Kyle W Willett, and Joni Dambre. Rotation-invariant convolutional neural networks for galaxymorphology prediction.
Monthly notices of the royal astronomical society , 450(2):1441–1459, 2015.[45] Asifullah Khan, Anabia Sohail, Umme Zahoora, and Aqsa Saeed Qureshi. A survey of the recent architectures ofdeep convolutional neural networks.
Artificial Intelligence Review , pages 1–62, 2020.[46] Víctor Suárez-Paniagua and Isabel Segura-Bedmar. Evaluation of pooling operations in convolutional architecturesfor drug-drug interaction extraction.
BMC bioinformatics , 19(8):39–47, 2018.[47] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 , 2014.[48] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In
Proceedings of the IEEE conference on computer vision and pattern recognition , pages 770–778, 2016.[49] François Chollet. Xception: Deep learning with depthwise separable convolutions. In
Proceedings of the IEEEconference on computer vision and pattern recognition , pages 1251–1258, 2017.[50] Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, and Samy Bengio. Transfusion: Understanding transfer learningfor medical imaging. In
Advances in neural information processing systems , pages 3347–3357, 2019.[51] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning.
IEEE Transactions on knowledge and dataengineering , 22(10):1345–1359, 2009.[52] Jian Yang, Jing-yu Yang, David Zhang, and Jian-feng Lu. Feature fusion: parallel strategy vs. serial strategy.
Pattern recognition , 36(6):1369–1381, 2003.[53] Marina E Plissiti, Panagiotis Dimitrakopoulos, Giorgos Sfikas, Christophoros Nikou, O Krikoni, and AntoniaCharchanti. Sipakmed: A new dataset for feature and image based classification of normal and pathologicalcervical cells in pap smear images. In ,pages 3144–3148. IEEE, 2018.[54] Ekaba Bisong. Google colaboratory. In
Building Machine Learning and Deep Learning Models on Google CloudPlatform , pages 59–64. Springer, 2019.[55] P Sukumar and RK Gnanamurthy. Computer aided detection of cervical cancer using pap smear images based onadaptive neuro fuzzy inference system classifier.
Journal of Medical Imaging and Health Informatics , 6(2):312–319, 2016.[56] J. Shi, R. Wang, Yushan Zheng, Z. Jiang, and Lanlan Yu. Graph convolutional networks for cervical cellclassification. 2019.[57] Muhammed Talo. Diagnostic classification of cervical cell images from pap smear slides.
Academic Perspective Procedia, 2(3):1043–1050, 2019.

[58] Kyi Pyar Win, Yuttana Kitjaidure, Kazuhiko Hamamoto, and Thet Myo Aung. Computer-assisted screening for cervical cancer using digital image processing of pap smear images.
Applied Sciences , 10(5):1800, 2020.[59] Seema Singh, V Tejaswini, Rishya P Murthy, and Amit Mutgi. Neural network based automated system fordiagnosis of cervical cancer.
International Journal of Biomedical and Clinical Engineering (IJBCE) , 4(2):26–39,2015.[60] Loris Nanni, Stefano Ghidoni, and Sheryl Brahnam. Handcrafted vs. non-handcrafted features for computer visionclassification.
Pattern Recognition , 71:158–172, 2017.[61] Aditya Khamparia, Deepak Gupta, Joel JPC Rodrigues, and Victor Hugo C de Albuquerque. Dcavn: Cervicalcancer prediction and classification using deep convolutional and variational autoencoder network.
MultimediaTools and Applications , pages 1–17, 2020.[62] Aditya Khamparia, Deepak Gupta, Victor Hugo C de Albuquerque, Arun Kumar Sangaiah, and Rutvij H Jhaveri.Internet of health things-driven deep learning system for detection and classification of cervical cells using transferlearning.
The Journal of Supercomputing , pages 1–19, 2020.[63] TP Deepa and A Nagaraja Rao. A study on denoising of poisson noise in pap smear microscopic image.
Indian JSci Technol , 9:45, 2016.