Dermo-DOCTOR: A framework for concurrent skin lesion detection and recognition using a deep convolutional neural network with end-to-end dual encoders

Abstract

Automated skin lesion analysis for simultaneous detection and recognition remains challenging because of inter-class homogeneity and intra-class heterogeneity, which limit the generalization capability of a single Convolutional Neural Network (CNN) trained on limited datasets. This article proposes an end-to-end deep CNN-based framework for simultaneous detection and recognition of skin lesions, named Dermo-DOCTOR, consisting of two encoders. The feature maps from the two encoders are fused channel-wise into a Fused Feature Map (FFM). The FFM is used for decoding in the detection sub-network, where the outputs of each stage of the two encoders are concatenated with the corresponding decoder layers to retrieve the spatial information lost to pooling in the encoders. For the recognition sub-network, the outputs of three fully connected layers, which use the feature maps of the two encoders and the FFM, are aggregated to obtain the final lesion class. We train and evaluate the proposed Dermo-DOCTOR on two publicly available benchmark datasets, ISIC-2016 and ISIC-2017. The segmentation results achieve mean intersection over union values of 85.0 % and 80.0 % on the ISIC-2016 and ISIC-2017 test datasets, respectively. The proposed Dermo-DOCTOR also performs well in lesion recognition, with areas under the receiver operating characteristic curve of 0.98 and 0.91 on those two datasets, respectively. The experimental results show that the proposed Dermo-DOCTOR outperforms the alternative methods for skin lesion detection and recognition reported in the literature. As Dermo-DOCTOR provides better results on two different test datasets, even with limited training data, it can be a promising computer-aided assistive tool for dermatologists.


Md. Kamrul Hasan a,1,∗, Shidhartho Roy a, Chayan Mondal a, Md. Ashraful Alam a, Md. Toufick E Elahi a, Aishwariya Dutta b, S. M. Taslim Uddin Raju c, Md. Tasnim Jawad a, Mohiuddin Ahmad a

a Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna-9203, Bangladesh
b Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna-9203, Bangladesh
c Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna-9203, Bangladesh


Keywords: Malignant melanoma, Skin lesion detection and recognition, Convolutional neural networks, Dual encoder networks, ISIC skin lesion datasets.

1. Introduction

Cancer is an abnormal and uncontrolled growth of dividing cells that damages different body cells and is the world's second-leading cause of death [76]. Although melanomas constitute less than 5 . . . 27 % men and 45.73 % women), and 105904 (58.14 % men and 41.86 % women) will die. The five-year survival rate of melanoma,

∗ Corresponding author.
1 Department of EEE, KUET, Khulna-9203, Bangladesh.

Preprint submitted to Biomedical Signal Processing and Control, February 24, 2021

[World map; legend: age-standardised rate per 100,000, with a "No Data" category]

Figure 1: A world heat map of the age-standardized rates per 1 . . .

the deadliest variety of early detection, is as high as 99 . . . 0, 26.0, and 59.34, which are very low compared to the required numbers [15, 16, 26]. However, automated Computer-aided Screening (CAS) systems have become popular among dermatologists to alleviate the above limitations, reduce the working burden of dermatologists, and accelerate diagnosis rates [50]. Such CAS systems essentially consist of several integral parts, including segmentation for Region of Interest (ROI) extraction and classification for lesion recognition. However, image-based automated CAS systems are highly challenging because of the following hurdles:

• Wide range of intra-class variance in colors, textures, edges, and shapes, and homogeneity between classes.
• Sometimes, low contrast and unclear boundaries (edges) in the malignant and other class images.
• Lesion ROIs frequently share similar visual characteristics and subtle distinctions due to lighting, perspective, and spatial information within an image.
• The appearance of different artifacts, such as natural (hairs, veins) or synthetic (air bubbles, ruler lines, color balance charts, marker signs, paint, ink color, artificial objects, etc.), LED lighting, darker borders (microscopic effects), and non-uniform vignetting, as depicted in Fig. 2.
• The lesion ROI only covers a small proportion of local, subtle grain, and global context information.
• Unavailability of a large number of manually annotated images, which is the core requirement of supervised learning systems.

[Figure 2 panel labels: Hair, Dark corners, Ink color, Ruler, Object, LED illumination, Non-uniform vignetting, Water bubble, Marker sign, Gel]

Figure 2: An example of the challenging dermoscopic images in the ISIC dataset [13, 29] with different artifacts [35].

1.2. Recent Methods

The state-of-the-art methods for skin lesion segmentation and recognition are reviewed and described in the following two subsections.

A multi-stage Fully Convolutional Network (mFCN) with parallel integration was introduced by Bi et al. [8]. During the training process, mFCN learned from both the training data and the coarse results obtained from the previous (m-1)FCN stage. The summation of the earlier results with the current result had two benefits: boosting the training data and optimizing lesion boundary learning. Navarro et al. [62] proposed a superpixel adaptation-based segmentation approach to obtain tight boundaries of the skin lesions. The authors applied the Scale-Invariant Feature Transform (SIFT) [54] and a Gaussian distribution to detect the feature points and place these points at the initial centers. Finally, they applied the simple linear iterative clustering technique to these points to generate the final lesion masks. An encoder-decoder network, called SLSDeep, was built by Sarker et al. [71]. The encoder in SLSDeep was a dilated residual network-based design, while the decoder had a pyramid pooling network. Additionally, they proposed a combined negative log-likelihood and end-point error-based cross-entropy loss function. Jahanifar et al. [45] developed an improved supervised saliency detection method for lesion segmentation, designed on the basis of discriminative regional feature integration. They used a thresholding algorithm to generate a new pseudo-background region. Goyal et al. [27] designed an automatic ensemble of Deep Learning (DL) methods, DeeplabV3+ [19] and Mask Region-based Convolutional Neural Network (Mask R-CNN) [38], to generate precise lesion boundaries. They combined the results of the two models in three ways: combining both masks, picking the larger segmented area from the outputs of the two methods, and picking the smaller area from those outputs. Finally, the authors found that the first method outperforms the other ensembling methods. Al-Masni et al. [2] proposed a Full Resolution Convolutional Network (FrCN) for skin lesion segmentation, which learns full-resolution features from each pixel of an input image. Hawas et al. [36] realized a segmentation method based on the neutrosophic graph cut algorithm. The initial clusters were obtained using Histogram-based Clustering Estimation (HBCE) with the corresponding centroids. A genetic algorithm was applied to optimize the HBCE to obtain the optimal threshold. Then, Neutrosophic C-means (NCM) mapped the lesions into a Neutrosophic Set (NS) domain. Finally, for the lesion segmentation, the graph cut algorithm was used as a cost function. Amin et al. [5] segmented the lesion in two steps. In the first step, the authors performed preprocessing to resize the images to 240 × × Lab to select the luminance channel. Finally, in the second step, a biorthogonal 2D wavelet transform and the Otsu algorithm were applied for the lesion segmentation. Xie et al. [86] produced a deep CNN, called mutual bootstrapping deep CNN (MB-DCNN). MB-DCNN has three networks: a coarse segmentation network (coarse-SN), a mask-guided classification network (mask-CN), and an enhanced segmentation network (enhanced-SN). Coarse-SN was used to roughly segment the lesion and feed the label region to mask-CN to boost the classification task. Al Nazi and Abir [4] compared the performance of two variations of the UNet [70] model for lesion segmentation: UNet without spatial dropout and UNet with spatial dropout. In the end, the authors showed that UNet with augmentation and dropout as regularization methods was less prone to overfitting and provided better-segmented lesion masks. Pour and Seker [67] offered a CNN-based segmentation model with CIELAB color space and transformed domain feature extraction. The authors initially implemented a model from scratch inspired by UNet and FCN, then gradually improved the model by injecting features from the transformed domain and adding the CIELAB color model of the input image. They succeeded in coping with constraints that included a small dataset, removal of artifacts, excessive data increase, and contrast stretching. They also confirmed that the CNN model's performance with a domain transfer feature is better than that of CNNs with a deep layer network.

Many image analysis-based methods have already been proposed and developed by researchers for dermoscopic image recognition, where the algorithms generally depend on the detection and extraction of low-level handcrafted features, such as colors, shapes, and textures.

Cheng et al. [11] extracted different features from the first-order histogram probability, such as area, roundness (thinness), mean, standard deviation, skew, energy, and entropy. Finally, the authors applied different classifiers, namely quadratic discriminant analysis and Multilayer Perceptron (MLP), to the features selected by Principal Component Analysis (PCA) [55]. Different visual cues, such as the ABCD (Asymmetry, Border, Color, and Differential structures) rule of dermoscopy [61] and texture (neighboring gray-level dependence matrix, angular second-moment, and kinetics of skin lesions), were extracted by Maglogiannis and Doukas [56]. The authors also selected the features using the sequential backward floating selection, PCA, and generalized sequential feature selection algorithms. Finally, they employed MLP and Support Vector Machine (SVM) [21] for the lesion classification. Oliveira et al. [64] computed different local features employing the bag-of-features [25] approach and texture features, where the authors selected the subset of features using a heuristic search approach. In the end, they applied SVM, Bayesian network, Decision Tree (DT) [1], and Artificial Neural Network (ANN) as classifiers. Hameed et al. [31] developed an intelligent Multi-Class Multi-Level (MCML) classification algorithm employing two approaches: traditional machine learning and deep learning. In the former method, they applied preprocessing, segmentation, feature extraction, and classification. As preprocessing, they removed the hair, black frames, and circles. Finally, the authors classified the texture and color features employing the ANN. Mporas et al. [60] applied a median filter followed by bottom-hat filtering to detect natural hair or hair-like artifacts. They segmented the ROIs using the active contour model on the grayscale image. Finally, they extracted different color-based features for classification using the MLP and other Machine Learning (ML) algorithms.

However, as described earlier, the lesion classification algorithms are very complex as they essentially rely on handcrafted feature extraction, requiring prior knowledge [20] and extensive parameter tuning. Extensive feature engineering is the key to achieving better performance from them, which is often impossible due to the presence of different artifacts in the dermoscopic images (see Fig. 2). Various CNN-based classifiers have achieved remarkable results on the ImageNet dataset [14]. Nowadays, in many computer vision problems, the contribution of both CNNs and DL techniques is undeniable [28]. A CNN is an excellent feature extractor that alleviates the manual feature engineering required by the algorithms mentioned above, and it is therefore applied to recognize medical images [87]. Mahbod et al. [57] presented an ensemble-based model for CNNs that combines inter- and intra-architecture network fusion. The authors fine-tuned pre-trained VGGNet, AlexNet [48], and two types of ResNet. Finally, the average prediction probability classification vectors from different sets were fused to provide the final prediction. Brinker et al. [10] used ResNet-50 with transfer learning [82]. To optimize the model, they adopted three techniques: first, they exclusively trained the adapted last layer; then, they fine-tuned all layers' parameters; and finally, they applied a sudden increment of the learning rate at specific time steps during fine-tuning. Zhang et al. [91] presented an Attention Residual Learning (ARL) CNN model for skin lesion recognition, which was composed of multiple ARL blocks, a global average pooling layer, and a classification layer. Each ARL block employed residual learning and novel attention learning mechanisms to improve its capability for discriminative representation. The authors proposed the attention learning mechanism, which aimed to utilize the intrinsic self-attention ability of DCNNs, i.e., using the feature maps learned from a high layer to generate a low-layer attention map instead of applying extra learnable layers. An integrated framework for skin lesion boundary detection as well as for skin lesion classification was described by Al-Masni et al. [3]. First, a deep learning method, named FrCN, was used for the lesion boundary extraction. Then, geometric augmentation and transfer learning were integrated with four CNN networks, Inception-V3 [79], ResNet-50, Inception-ResNet-V2, and DenseNet-201 [42], for the lesion classification. They also showed that segmented lesions improve lesion classification results. Yilmaz and Trocan [88] implemented three deep CNN models: AlexNet, GoogLeNet, and ResNet-50. They compared the classification performance as well as the time complexity of the implemented models. For data augmentation, a style-based Generative Adversarial Network (GAN) architecture was proposed by Qin et al. [68]. In the end, the authors applied ResNet-50, with transfer learning, for the lesion classification. Khan et al. [47] proposed a model for lesion classification, which included the localization of the lesion ROI via a faster region-based CNN, feature extraction, and feature selection by an iteration-controlled Newton-Raphson method. The ABC-based method was first used for contrast stretching and then used for lesion segmentation. DenseNet-201, via transfer learning, was used to extract deep-level features, and those features were classified using an MLP. Gessert et al. [24] ensembled different DL methods, such as EfficientNets [80], SENet [41], and ResNeXt, by a selection strategy. They used multi-resolution input by multi-crop evaluation and two different cropping strategies. The encoding of metadata as a feature vector was concatenated with the dense (fully connected) neural network. Valle et al. [83] optimized the hyperparameters of two deep CNN models, ResNet-101-V2 and Inception-V4, employing transfer learning with data augmentation. They selected the best-performing classifier using the ANOVA test [73]. Finally, the authors concluded that transfer learning and model ensembling are a better choice for lesion classification.

The above discussion of automatic skin lesion diagnosis methods confirms that deep CNN approaches are nowadays more commonly applied than systems relying on handcrafted features. The former approaches provide good reproducibility of results and boost the speed of diagnostic procedures while being end-to-end methods. However, the CNN-based skin lesion analysis methods suffer from data scarcity, which makes it hard to avoid overfitting. Ensembling different CNN architectures can mitigate those limitations of CNNs, as shown by Harangi [32]. In many articles [24, 27, 30, 32, 51, 58, 66, 75], the authors first trained different CNN models independently and then aggregated their outputs to develop ensemble models. Such ensembling is tedious and time-consuming, requiring massive time and resources for training and testing. To overcome those limitations, an end-to-end ensemble approach for skin lesion analysis that does not compromise state-of-the-art outcomes is highly desirable. With this in mind, this article aims to provide the following contributions:

• Develop an end-to-end ensembling model with dual encoders in our Dermo-DOCTOR framework, concatenating two different feature maps from those two encoders to broaden the lesion's depth information. Such a network with two different encoders and the same input is likely to learn more discriminating features with limited training samples.
• Incorporate segmented lesion ROIs for the recognition, as ROIs enable the classifier to learn the abstract region and detailed structural description while avoiding surrounding healthy regions.
• Apply geometry- and intensity-based image augmentations and transfer learning to alleviate overfitting, and class rebalancing techniques to protect the classifier from being biased towards any particular class with more samples.
• Develop and compare two other networks for detection (UNet and FCN8s) and recognition (ResNet-50 and Xception [12]) under the same experimental settings.
• Demonstrate state-of-the-art lesion detection and recognition results, to the best of our knowledge, on two IEEE International Symposium on Biomedical Imaging (ISBI) datasets, ISIC-2016 and ISIC-2017, which have different numbers of classes.
• Implement a possible application of our Dermo-DOCTOR, deploying its trained weights, which runs in a web browser (see on YouTube).

The rest of the paper is structured as follows. We explain the design of the Dermo-DOCTOR framework and the datasets in Section 2. The results and discussions of the extensive experiments, with the proper interpretation, are reported in Section 3. Finally, we conclude the article in Section 4.

2. Materials and Methods

This section presents the materials and methods, describing the proposed Dermo-DOCTOR pipeline in Section 2.1. We explain the utilized datasets, integral preprocessing,

Dermo-DOCTOR App: https://bit.ly/Dermo-DOCTOR

The overall Dermo-DOCTOR framework is illustrated in Fig. 3. We utilize two different input types (either $I_1$ or $I_2$), where $I_1$ or $I_2$ is a binary or a multi-class categorization task. An input, either $I_1$ or $I_2$, generates two different outputs: segmentation ($O_1$) and recognition ($O_2$). The outputs $O_1$ and $O_2$ are then processed to provide lesion detection and recognition results. We process the predicted lesion masks to generate the bounding box around the lesion, which we call lesion detection. The different integral parts of the Dermo-DOCTOR are explained in the following subsections.

[Figure 3 labels: ISIC Datasets (ISIC-2016: Nev, Mel; ISIC-2017: Nev, SK, Mel), Query Image, Preprocessing (Augmentation, Rebalancing), CNN, Predicted Mask, Result]

Figure 3: The proposed pipeline for concurrent detection and recognition systems, where the preprocessing has been incorporated with the proposed network to build a precise diagnostic system. An input $I_1$ or $I_2$ generates two outputs $O_1$ and $O_2$, where $O_1$ and $O_2$ respectively denote the segmentation and recognition results.

Two different datasets, ISIC-2016 [29] and ISIC-2017 [13], are used to validate our proposed pipeline, whose class-wise distributions are presented in Fig. 4. The ISIC-2016 is a binary categorization task, aiming to classify a lesion as either Nevus (Nev) or Melanoma (Mel), and its class samples are imbalanced (4 . Nev : Mel). On the other hand, the ISIC-2017 is a multi-class categorization task, intending to classify a lesion as either Nevus (Nev), Seborrheic Keratosis (SK), or Melanoma (Mel). The distribution of ISIC-2017 also shows that the class samples are highly imbalanced, where Nev : SK : Mel in the training set is 11 . .

[Figure 4 panels: (a) number of samples per class (Nev, Mel), bar values 727, 173, 304, 75, legend Training and Testing; (b) number of samples per class (Nev, SK, Mel)]

Figure 4: The distributions of the utilized ISIC datasets, where (a) is for ISIC-2016 and (b) is for ISIC-2017. The validation set for the ISIC-2016 is not publicly available (see the left figure).

We have applied class rebalancing and different image augmentations (both geometry- and intensity-based) as preprocessing, which are concisely explained as follows:

Rebalancing.

Class imbalance is a common phenomenon in the medical imaging domain, as manually annotated images are complex and arduous to obtain [32]. Such a class imbalance can be partially overcome using two commonly used approaches: the data-level method and the algorithmic-level method [37]. We have added images from the ISIC archive [44] to the underrepresented class and weighted the loss function. For weighting the loss function, we apply $W_i = N_i / N$, where $W_i$, $N$, and $N_i$ are the weight for the $i$th class, the total number of samples, and the number of samples in the $i$th class, respectively.

Augmentation.

One of the crucial challenges in the medical imaging domain is coping with small datasets, such as the ISIC datasets [32]. We have therefore applied different augmentations based on geometric transformations, such as rotation, flipping, shifting, and zooming, and image processing functions, such as gamma, logarithmic, and sigmoid corrections, and stretching or shrinking the intensity levels.

The input $I_1$ or $I_2$ produces the lesion recognition output by applying one of two preprocessing types: $P_1$, only segmentation, and $P_2$, rebalancing and augmentation with segmentation.

Nowadays, CNN-based methods outperform the radiologists with high values of balanced accuracy, as shown in [46, 69]. Nevertheless, those models were trained with an enormous number of annotated images. CNNs may be limited when employed with highly variable and distinctive image datasets with limited samples and with inter-class homogeneity and intra-class heterogeneity, as in the dermoscopic ISIC datasets [13, 29]. Ensembling networks is likely to alleviate the data scarcity limitation in CNN training [17, 32, 49, 59, 72]. In this context, we propose a CNN-based end-to-end ensemble network for simultaneous lesion detection and recognition, consisting of two encoders, a decoder, and three Fully-connected Layers (FCLs), as shown in Fig. 5. The different parts of the proposed dual encoder network are explained in the following paragraphs.
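As a rough orientation for the architecture just outlined, the two data-flow steps that the paper states explicitly (channel-wise fusion of the two encoder outputs into the FFM, and averaging of the three recognition outputs LT-1, LT-2, and LT-3) can be sketched with NumPy arrays standing in for feature maps. This is only a shape-level illustration under stated assumptions: the tensor sizes, the random weights, and the single-dense-layer `toy_head` helper are hypothetical and are not the trained encoders or the three-layer FCL heads of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: batch B, spatial H x W, and channel depths D1 and D2.
B, H, W, D1, D2 = 2, 6, 8, 16, 24
x_en1 = rng.standard_normal((B, H, W, D1))  # stand-in for the encoder-1 output
x_en2 = rng.standard_normal((B, H, W, D2))  # stand-in for the encoder-2 output

# Channel-wise fusion: concatenate along the last (depth) axis to form the FFM.
ffm = np.concatenate([x_en1, x_en2], axis=-1)


def toy_head(feature_map, n_classes, rng):
    """Global average pooling, one random dense layer, then softmax (illustrative only)."""
    pooled = feature_map.mean(axis=(1, 2))                   # shape (B, D)
    weights = rng.standard_normal((pooled.shape[1], n_classes))
    logits = pooled @ weights
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


n_classes = 3  # e.g. Nev, SK, Mel for the multi-class input
lt1 = toy_head(x_en1, n_classes, rng)  # branch on encoder-1 features
lt2 = toy_head(x_en2, n_classes, rng)  # branch on encoder-2 features
lt3 = toy_head(ffm, n_classes, rng)    # branch on the fused feature map

# Final lesion-type probabilities: plain average of the three branch outputs.
lesion_type = (lt1 + lt2 + lt3) / 3.0
print(ffm.shape, lesion_type.shape)
```

Because each branch output is a probability vector, their average is again a valid probability vector, which is why a plain mean suffices as the aggregation step in this sketch.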

Encoder-1.

The first encoder ($f_{en-1}$) in the proposed network is presented in Fig. 6. It includes Identity (Iden) and Convolutional (Conv) blocks, applying skip connections [40] in both blocks. There are two main advantages of such a skip connection. Firstly, the lack of regularization of the new layers does not affect their performance, and secondly, the new layers are not nil even when they are regulated. In encoder-1, an input convolution is adopted before the Iden and Conv blocks, followed by a max-pooling. By stacking these blocks on top of each other, encoder-1 has been designed to obtain a lesion feature map (see Fig. 6). The output feature map of the encoder-1 is defined as $X_{en-1} = f_{en-1}(I_{in})$, where $X_{en-1} \in \mathbb{R}^{B \times H \times W \times D}$, and $B$, $H$, $W$, $D$, and $I_{in}$ respectively denote the batch size, height, width, depth (channel), and input batch of images. The encoder-1 is divided into five sub-blocks ($E^1_n$ and $n = 1, 2, \ldots,$

[Figure 5 labels: Preprocessing (Yes or No); Encoder-decoder Network for Lesion Segmentation with encoder stages E and decoder stages D; Lesion Mask; Segmented Lesion Recognition with three branches of GAP, FC layers, and Softmax (LT-1, LT-2, LT-3); Concat; Averaging; Lesion Type]

Figure 5: The proposed network for the Dermo-DOCTOR application, where the first encoder-decoder sub-network is applied for the lesion segmentation, while the second sub-network is employed for the lesion recognition. The segmented lesion masks are used for ROI extraction for further classification and detection utilizing the bounding boxes around the lesions.

sub-block's outputs ($E^1_n$ and $n = 1, 2, \ldots, 5$) will be used as an input for the skip connections to regain the spatial information lost due to pooling in the encoders.

Encoder-2.

Within the encoder-2 ($f_{en-2}$), three block components are employed: entry flow, middle flow, and exit flow [12]. Fig. 7 depicts the constructional details of the encoder-2. The batch of input images first passes through the entry flow, then the middle flow, repeated eight times (8×), and finally through the exit flow. All flows employ Depth-wise Separable Convolution (DwSC) [12] and residual connections. The former has been

[Figure 6 legend: Conv Block, Iden Block, Conv, Batch Norm, ReLU, Max Pool, Addition (+); stage outputs E; panels: Convolutional Block (Conv Block), Identity Block (Iden Block)]

Figure 6: The encoder-1 of the proposed network, where the Conv and Iden blocks are stacked on top of each other. The notation ($n \times$) under the Iden block denotes the number of repetitions ($n$ times).

[Figure 7 legend: Addition (+); stage outputs E]

Figure 7: The encoder-2 of the proposed network, where depth-wise separable convolutions [12] were employed instead of traditional convolutions to make it lightweight for real-time applications.

$X_{en-2} = f_{en-2}(I_{in})$, where $X_{en-2} \in \mathbb{R}^{B \times H \times W \times D}$, and $B$, $H$, $W$, $D$, and $I_{in}$ respectively denote the batch size, height, width, depth (channel), and input batch of images. The encoder-2 is also divided into five sub-blocks ($E^2_n$ and $n = 1, 2, \ldots$, $E^2_n$ and $n = 1, 2, \ldots, 5$) are then used as the skip connections when it is decoded in the detection sub-network.
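To make the "lightweight" motivation for depth-wise separable convolution concrete, the following sketch compares weight counts for a standard convolution and its depth-wise separable counterpart. It relies on the standard counting argument (a depth-wise $K \times K$ filter per input channel followed by $1 \times 1$ point-wise filters), which gives a ratio of $1/N + 1/K^2$ for $N$ filters and kernel size $K$. The layer sizes and function names are hypothetical, and biases are ignored.

```python
def standard_conv_params(k, c_in, n_filters):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * n_filters


def separable_conv_params(k, c_in, n_filters):
    """Depth-wise k x k filters (one per input channel) plus 1 x 1 point-wise filters."""
    depthwise = k * k * c_in
    pointwise = c_in * n_filters
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 filters.
k, c_in, n_filters = 3, 128, 256
ratio = separable_conv_params(k, c_in, n_filters) / standard_conv_params(k, c_in, n_filters)
expected = 1 / n_filters + 1 / k**2
print(f"measured ratio = {ratio:.4f}, 1/N + 1/K^2 = {expected:.4f}")
```

For a 3 × 3 kernel the ratio is dominated by the $1/K^2$ term, so the separable layer needs roughly one ninth of the weights regardless of how many filters are used.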

Detection Sub-Network.

The decoder semantically projects the salient lower-resolution features from the encoders onto the higher-resolution pixel space to achieve a semantic lesion pixel label [22, 53, 70]. The reduced feature maps (to attain spatial invariance) from the encoder often cause a loss in spatial resolution, bringing zigzag edge information, coarseness, checkerboard artifacts, and over- and under-segmentation in the segmented masks [33, 35, 53, 63, 70]. Although there are many approaches to alleviate these problems in segmentation [2, 6, 33, 35, 53, 63, 70], there is still room for performance improvement. In our detection sub-network, the outputs obtained from the two encoders are concatenated channel-wise to enlarge the depth representation of the feature map, which is named the Fused Feature Map (FFM), where $FFM \in \mathbb{R}^{B \times H \times W \times D}$. We have applied skip connections, inspired by the UNet, to tackle the subsampling limitations. The $FFM \in \mathbb{R}^{B \times H \times W \times D}$ is an input to the decoder of our detection sub-network (see Fig. 8). Unlike the earlier networks, we skip the features from two different encoders to recover the lost spatial information (see Fig. 8). The channel concatenation in each stage of the decoder is presented as $[E^1_n \mathbin{+\!+} E^2_n \mathbin{+\!+} D_n]$, where $E^1_n$, $E^2_n$, $D_n$, and $\mathbin{+\!+}$ respectively denote the skipped feature maps from encoder-1 and encoder-2, the decoder feature map (at the $n$th stage), and channel concatenation. $E^1_n$, $E^2_n$, and $D_n$ are feature maps of the same scale, and $n = 4, 3, \ldots, 1$. Such a dual encoder skipped feature has enhanced depth information, which is likely to improve the segmentation accuracy by better retrieving the lost spatial information. Besides, we employ batch normalization [43] to overcome the internal covariate shift in the training phase. We also compact our network's design, employing a DwSC [12] in place of standard convolution. We decrease the parameters by a factor of $(1/N + 1/K^2)$ for each convolution in our Dermo-DOCTOR, where $N$ and $K$ respectively indicate the filter number and kernel size [34].

[Figure 8 legend: Batch Norm, Sep Conv, ReLU, Upsampling (UP), Concatenation (c); skipped encoder features E; Output]

Figure 8: The decoder of the proposed detection sub-network for reconstructing a segmentation mask with the input resolution from the encoders' low-resolution features.

Recognition Sub-Network.

Different feature maps from the encoder-1, encoder-2, and FFM are classified into the desired categories by applying the FCLs. We employ a Global Average Pooling (GAP) layer [52] before the FCL to vectorize the 2D feature maps into a single long continuous linear vector, as it improves generalization and prevents overfitting [52]. Additionally, each FCL is followed by a dropout layer [78] as a regularizer, where we randomly set 50 . . . ($O_{j=1,2}$) is the average of the LT-1, LT-2, and LT-3. The output ($O_{j=1,2}$) lies in $N$-dimensional space, where $O_1 \in \mathbb{R}^{N=2}$ and $O_2 \in \mathbb{R}^{N=3}$ respectively for the inputs $I_1$ or $I_2$ by applying the proposed preprocessing (either $P_1$ or $P_2$). It is noteworthy that the output lesion class ($O_{j=1,2}$) is obtained from the end-to-end training.

2.2. Designing of Web Application

The proposed web application for end-users, named Dermo-DOCTOR, is depicted in Fig. 9. We utilize browser-supported languages, such as Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript, for developing the Dermo-DOCTOR application. A Python web framework package, called Flask, is used for developing the application by deploying our proposed CNN-based detection and recognition models and their trained weights. We apply HTML and CSS to design a graphical user interface with three panels: an input panel, a processing panel, and an output panel (see Fig. 9). The user can select the query image (drag-and-drop or direct upload) and the number of query classes (binary or multi-class) in the input panel. Then, the user can start the process or reset the selections in the processing panel. The results returned from the host machine are displayed in the output panel.

Figure 9: The Dermo-DOCTOR prototype, where the user can select or drag a dermoscopic image (png, jpg, bmp, or jpeg) as an input. The desired number of output classes can also be selected in the input panel. The processing and output panels are dedicated to selecting the process types and displaying the recognized class with the probabilities.

2.3. Training Protocol

The encoder kernels are initialized with the pre-trained ImageNet weights, whereas the decoder kernels are initialized with the “he normal” distribution [39]. The Aspect Ratio (AS) distribution shows that most of the images in the ISIC-2016 and ISIC-2017 datasets have an AS of 3 : 4. Therefore, we resize all the images to 192 × 256 pixels using nearest-neighbor interpolation for the detection. Likewise, the AS distribution of the lesion ROIs extracted from both datasets reveals that most of the ROIs have an AS of 1 : 1. Hence, we resize the lesion ROIs to 192 × 192 pixels using nearest-neighbor interpolation for the recognition. Additionally, we standardize and rescale the training images to [0, 1] for both detection and recognition. We employ Eq. 1 as the loss function and intersection over union as the metric for training the detection sub-network of the proposed Dermo-DOCTOR.

$$L(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{N} y_i \times \hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \hat{y}_i - \sum_{i=1}^{N} y_i \times \hat{y}_i} - \frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right], \quad (1)$$

where $y$, $\hat{y}$, and $N$ respectively denote the true label, the predicted label, and the total number of pixels. In Eq. 1, $\log \hat{y}_i$ and $\log(1 - \hat{y}_i)$ are the estimations of the log-likelihood of a pixel being lesion or not, respectively. The product of $y$ and $\hat{y}$ in Eq. 1 is the estimation of similarity (intersection) between the true and predicted lesion masks. We employ categorical cross-entropy as the loss function and accuracy as the metric for training the recognition sub-network.
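The detection loss of Eq. 1 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `iou_bce_loss`, the flat-list mask representation, and the clipping constant `eps` are assumptions introduced here only to make the two terms (one minus the soft IoU, plus the mean binary cross-entropy) concrete.

```python
import math


def iou_bce_loss(y_true, y_pred, eps=1e-7):
    """Sketch of Eq. 1: (1 - soft IoU) plus mean binary cross-entropy.

    y_true: flat list of 0/1 ground-truth pixel labels.
    y_pred: flat list of predicted lesion probabilities in [0, 1].
    eps:    hypothetical clipping constant (not taken from the paper).
    """
    n = len(y_true)
    # Clip predictions so that log() never receives exactly 0 or 1.
    p = [min(max(v, eps), 1.0 - eps) for v in y_pred]

    # Soft intersection and union between true and predicted masks.
    intersection = sum(t * q for t, q in zip(y_true, p))
    union = sum(y_true) + sum(p) - intersection
    iou_term = 1.0 - intersection / (union + eps)

    # Mean binary cross-entropy over all N pixels.
    bce_term = -sum(
        t * math.log(q) + (1 - t) * math.log(1 - q)
        for t, q in zip(y_true, p)
    ) / n
    return iou_term + bce_term


if __name__ == "__main__":
    mask = [1, 0, 1, 0]
    print(iou_bce_loss(mask, [1.0, 0.0, 1.0, 0.0]))  # close to zero
    print(iou_bce_loss(mask, [0.5, 0.5, 0.5, 0.5]))  # 2/3 + ln 2
```

For an uninformative prediction of 0.5 at every pixel of this four-pixel mask, the IoU term is 1 - 1/3 and the cross-entropy term is ln 2, so both parts of the loss contribute on a comparable scale.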

3. Results and Discussion

This section presents the lesion detection results and the subsequent recognition results in Subsections 3.1 and 3.2, respectively. The segmented lesion masks are utilized for ROI extraction to classify the lesions and to detect the bounding boxes around them.

Firstly, we exhibit the quantitative and qualitative segmentation results, applying the proposed Dermo-DOCTOR and two other well-known networks: the UNet and the FCN8s. Secondly, we compare our outcomes with several state-of-the-art results utilizing the same datasets. To quantify the segmentation correctness, we use mean Recall (mRc), mean Specificity (mSp), and mean Intersection over Union (mIoU), which are defined in Eq. 2.

$$mRc = \frac{1}{M \times N} \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{TP_{ij}}{TP_{ij} + FN_{ij}},$$
$$mSp = \frac{1}{M \times N} \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{TN_{ij}}{TN_{ij} + FP_{ij}},$$
$$mIoU = \frac{1}{M \times N} \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{TP_{ij}}{TP_{ij} + FN_{ij} + FP_{ij}}, \quad (2)$$

where $M$ and $N$ denote the pixel and sample numbers, whereas $TP$, $TN$, $FN$, and $FP$ indicate true positive (lesion as a lesion), true negative (background as a background), false negative (lesion as a background), and false positive (background as a lesion), respectively. Table 1 presents the segmentation results of three methods on two separate datasets: the ISIC-2016 and the ISIC-2017. The mean overlapping between the actual and predicted masks

Table 1: Quantitative segmentation results on ISIC-2016 and ISIC-2017 test datasets using the Dermo-DOCTOR, UNet, and FCN8s. The winner metrics for the ISIC-2016 are presented in bold font, whereas they are underlined for the ISIC-2017.

Performance metricsTestingdatasets Models mRc mSp mIoUDermo-DOCTOR . ± . . ± . . ± . UNet 0 . ± . . ± . . ± . . ± .

15 0 . ± .

05 0 . ± . . ± .

17 0 . ± .

07 0 . ± . . ± .

21 0 . ± .

07 0 . ± . . ± .

18 0 . ± .

10 0 . ± . . . . . . . . . . .

Figure 10 panels: Query Image, FCN8s, UNet, Dermo-DOCTOR.

Figure 10: Qualitative segmentation results using the FCN8s, UNet, and Dermo-DOCTOR networks. Green, Red, and Yellow colors indicate the TP, FN, and FP regions, respectively. The top-right IoUs are given for quantitative evaluation.
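The segmentation metrics of Eq. 2, and the TP, FN, and FP regions coloured in Fig. 10, can be sketched in a few lines of Python. This is one possible reading rather than the authors' code: it assumes that TP, TN, FP, and FN are counted over the pixels of each test image and that recall, specificity, and IoU are then averaged over the images. The function names, the flat 0/1 mask representation, and the convention of returning 0 for an empty denominator are all hypothetical.

```python
def confusion_counts(true_mask, pred_mask):
    """Count TP, TN, FP, FN pixels for one pair of flat binary masks."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_mask, pred_mask):
        if t and p:
            tp += 1      # lesion predicted as lesion
        elif t and not p:
            fn += 1      # lesion predicted as background
        elif p:
            fp += 1      # background predicted as lesion
        else:
            tn += 1      # background predicted as background
    return tp, tn, fp, fn


def _ratio(numerator, denominator):
    # Assumption: an empty denominator yields 0.0 instead of an error.
    return numerator / denominator if denominator else 0.0


def mean_segmentation_metrics(true_masks, pred_masks):
    """Return (mRc, mSp, mIoU) averaged over all mask pairs."""
    recalls, specificities, ious = [], [], []
    for true_mask, pred_mask in zip(true_masks, pred_masks):
        tp, tn, fp, fn = confusion_counts(true_mask, pred_mask)
        recalls.append(_ratio(tp, tp + fn))
        specificities.append(_ratio(tn, tn + fp))
        ious.append(_ratio(tp, tp + fn + fp))
    n = len(recalls)
    return sum(recalls) / n, sum(specificities) / n, sum(ious) / n


if __name__ == "__main__":
    truths = [[1, 1, 0, 0], [1, 0, 0, 0]]
    preds = [[1, 0, 1, 0], [1, 0, 0, 0]]
    print(mean_segmentation_metrics(truths, preds))
```

In the toy example, the first mask pair has one pixel in each of the TP, FN, FP, and TN groups, while the second pair is a perfect match, so the averaged IoU lies halfway between 1/3 and 1.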

DOCTOR generates the best segmentation results for three out of the six cases while it performs as second-best for the remaining three cases. Our Dermo-DOCTOR outperforms the DCL-PSI [7] for ISIC-2016 by a margin of 1 [...] dataset. Moreover, the methods in Table 2 for the ISIC-2017 dataset have been defeated by the proposed Dermo-DOCTOR for mIoU, where it beats the nearby HRFB and iFCN [65] by a margin of 2 [...]

Figure 11 panels: Query Image, Result Overlaid (ISIC-2016 test dataset); Query Image, Result Overlaid (ISIC-2017 test dataset).

Figure 11: Qualitative segmentation results on two different test datasets employing our Dermo-DOCTOR, where the TP, FN, and FP regions are respectively denoted by the Green, Red, and Yellow colors. The top-right IoU exhibits quantitative validation.

Table 2: Comparative lesion segmentation results for the proposed Dermo-DOCTOR and other state-of-the-art methods on both the ISIC-2016 and ISIC-2017 test datasets.

Columns: Segmentation Methods; ISIC-2016 test dataset (mIoU, mRc, mSp); ISIC-2017 test dataset (mIoU, mRc, mSp)

FCN ensemble [90] 0 .

84 0 .

91 0 .

96 - - -Fusion Structure [81] 0 .

85 0 .

92 0 .

96 - - -DCL-PSI [7] 0 . . .

96 0 .

72 0 .

80 0 . . . . .

85 0 .

87 0 .

96 0 . . . .

78 0 . . . . .

97 0 . .

86 0 .

DCL-PSI: Deep Class-specific Learning with Probability based Step-wise Integration
HRFB: High-Resolution Feature Block
iFCN: improved Fully Convolutional Network

This subsection exhibits the quantitative and qualitative results for lesion recognition, applying the proposed Dermo-DOCTOR and two other implemented well-known networks: the ResNet-50 and the Xception. In the end, we compare our results with several state-of-the-art results for those datasets. We utilize recall, precision, and F1-score to quantify the recognition efficiency, where they respectively quantify the type-II error, the positive predictive value, and the harmonic mean of recall and precision for revealing the trade-off between them. Additionally, we also estimate the ROC curve and its corresponding AUC value to evaluate the prediction probability of any randomly selected query image.

Table 3 gives the lesion recognition results, showing the outcomes for three networks and two datasets. The weighted averages of recall for ResNet-50, Xception, and Dermo-DOCTOR have been respectively improved by the margins of 4 [...] $P_2$ instead of baseline $P_1$ (see Table 3). The highest possible recall (0 .

91) for ISIC-2016 is received from the proposed Dermo-DOCTORclassiﬁer, applying the proposed preprocessing P on the segmented masks from the pro-24 able 3: The recognition results from numerous comprehensive experiments on the ISIC-2016 and ISIC-2017test datasets on three separate networks, highlighting the weighted average metrics’ highest values.ResNet-50 Xception Dermo-DOCTORPreprocessing Preprocessing PreprocessingTest Dataset Class-wise Metrics P P P P P P Nev 0 .

94 0.87 0 .

97 0 .

91 0 .

96 0 . .

45 0 .

93 0 .

25 0 .

92 0.37 0 . .

84 0 .

88 0 .

83 0 .

91 0 . . Nev 0 .

87 0 .

98 0 .

84 0 .

98 0 .

86 0 . .

64 0 .

63 0 .

70 0 .

71 0 .

68 0 . .

83 0 .

91 0 .

81 0 .

93 0 . . Nev 0 .

90 0 .

92 0 .

90 0 .

94 0.91 0 . .

53 0 .

75 0 .

37 0 .

80 0 .

48 0 . .

83 0 .

89 0 .

80 0 .

91 0 . . Nev 0 .

80 0 .

86 0 .

80 0 .

87 0 . .

59 0 .

80 0 .

66 0 .

83 0 .

66 0 . .

45 0 .

59 0 .

42 0 .

53 0 .

50 0 . .

70 0 .

76 0 .

74 0 .

75 0 . . Nev 0 .

80 0 .

88 0 .

83 0 .

88 0 .

84 0 . .

64 0 .

59 0 .

61 0 .

52 0 .

65 0 . .

43 0 .

56 0 .

52 0 .

62 0 .

58 0 . .

70 0 .

78 0 .

74 0 .

77 0 . . Nev 0 .

80 0.84 0.85 0 .

84 0 .

85 0 . .

61 0 .

68 0 .

63 0 .

64 0 .

65 0 . .

44 0 .

57 0 .

46 0 .

57 0 .

53 0 . .

70 0 .

76 0 .

74 0 .

76 0 . . P : Segmentation; P : Segmentation+Rebalancing+Augmentation; W. Avg.: Weighted Average posed Dermo-DOCTOR segmentor. The rebalancing and augmentation, along with thesegmentation, in preprocessing P reduces the FN-rates (63 . . . . P and Dermo-DOCTOR comparing the baseline classiﬁers (ResNet-5025nd Xception) and preprocessing P . Likewise, the weighted average of precision and F1-score on ISIC-2016 are also respectively improved by 10 . . P . Remarkably, the harmonic mean of recall and precision for the Mel class of ISIC-2016has signiﬁcantly strengthened by a margin of 32 . P ) and Dermo-DOCTOR. However, comparing all the experiments on the ISIC-2016, the proposed P andDermo-DOCTOR are the best preprocessing and classiﬁer having a type-II error of 9 . . .

78, 0 .

79, and 0 . P and proposed Dermo-DOCTOR on the ISIC-2017. The recall ofSK and Mel classes in the ISIC-2017 dataset, applying the proposed Dermo-DOCTOR andpreprocessing P , have respectively updated by the margins of 16 . . . . P . However, comparing all the experiments on theISIC-2017, the proposed P and Dermo-DOCTOR are the best preprocessing and classiﬁer,with a type-II error of 22 . . . . . .

98 and 0 .

91 respectively for ISIC-2016 and ISIC-2017. The rebalancing and augmentation employment with segmentation ($P_2$) heightens the AUC of all classifiers for both the ISIC-2016 and ISIC-2017 datasets. The Dermo-DOCTOR with $P_2$ has beaten the baseline Xception and ResNet-50 respectively by 3 [...] P.

Figure 12: The ROC curves ((a) for ISIC-2016 and (b) for ISIC-2017) of skin lesion recognition, where we have plotted the ROC for the proposed Dermo-DOCTOR and the implemented ResNet-50 and Xception with two different preprocessing ($P_1$ and $P_2$). Axes: False Positive Rate (horizontal) and True Positive Rate (vertical).
(a) ISIC-2016: Our Proposed with P2 (AUC=0.98), Our Proposed with P1 (AUC=0.81), Xception with P2 (AUC=0.95), Xception with P1 (AUC=0.77), ResNet-50 with P2 (AUC=0.95), ResNet-50 with P1 (AUC=0.81).
(b) ISIC-2017: Our Proposed with P2 (AUC=0.91), Our Proposed with P1 (AUC=0.87), Xception with P2 (AUC=0.88), Xception with P1 (AUC=0.82), ResNet-50 with P2 (AUC=0.87), ResNet-50 with P1 (AUC=0.84).

The detailed class-wise performances of the lesion recognition by the proposed Dermo-DOCTOR and preprocessing $P_2$ are exhibited in Table 4. The ISIC-2016 confusion matrix in Table 4 (left) shows that, among 304 Nev samples, 273 (89.80 %) are correctly recognized, whereas only 31 (10.20 %) Nev samples are recognized as Mel (as FP). It also reveals that, among 75 Mel samples, 71 (94.67 %) are rightly recognized, whereas only 4 (5.33 %) Mel samples are improperly recognized as Nev (as FN). Again, the ISIC-2017 confusion matrix (see Table 4 (right)) demonstrates that 81.68 % of the Nev samples are correctly recognized as the Nev class, while 18.32 % of the Nev samples are wrongly assigned to other classes as FP (5.85 % as SK and 12.47 % as Mel). Similarly, 17.78 % and 38.46 % of the samples of the SK and Mel classes belong to FP and FN, respectively. Although 38.46 % of the positive samples (Mel) are improperly recognized, it is still better than the baseline 58 [...] P).

Table 4: The confusion matrices for the ISIC-2016 test dataset with 379 samples (left) and the ISIC-2017 test dataset with 600 samples (right), employing the proposed Dermo-DOCTOR and preprocessing $P_2$.

ISIC-2016 (379 samples)
                 Actual Nev       Actual Mel
Predicted Nev    273 (89.80 %)    4 (5.33 %)
Predicted Mel    31 (10.20 %)     71 (94.67 %)

ISIC-2017 (600 samples)
                 Actual Nev       Actual SK       Actual Mel
Predicted Nev    321 (81.68 %)    10 (11.11 %)    28 (23.93 %)
Predicted SK     23 (5.85 %)      74 (82.22 %)    17 (14.53 %)
Predicted Mel    49 (12.47 %)     6 (6.67 %)      72 (61.54 %)

Fig. 13 presents qualitative results from the proposed Dermo-DOCTOR classifier and preprocessing $P_2$ for recognizing the lesions into either two classes or three classes. For concurrent detection and recognition, we utilize the segmented masks and the predicted class for contouring the lesions (green color) and annotating the label on the image to help dermatologists in further assessment (see Fig. 13). More concurrent results for all the test images are available on YouTube (ISIC-2016 and ISIC-2017). However, the results in Fig. 13 illustrate a few challenging images, where we show some wrongly recognized images. Those qualitative results depict that the detection and recognition are precise even when the query test images contain different artifacts (see Fig. 2). Although the Dermo-DOCTOR incorrectly predicts some images, they visually resemble the predicted class.

Table 5 compares the results of our Dermo-DOCTOR with other methods, which were trained and tested on the same ISIC datasets. The proposed Dermo-DOCTOR produces the best recognition for two out of the six cases while performing second-best to the winning methods in the other four cases (see Table 5).

ISIC-2016 (Detection & Recognition): https://bit.ly/Dermo-DOCTOR_ISIC_16
ISIC-2017 (Detection & Recognition): https://bit.ly/Dermo-DOCTOR_ISIC_17

Figure 13 panels: ISIC-2016 test dataset, ISIC-2017 test dataset.

Figure 13: Examples of several qualitative classification results for the challenging images of the ISIC-2016 and ISIC-2017 test datasets using the Dermo-DOCTOR, where the recognition has been accomplished using the segmented ROIs (green color) from the Dermo-DOCTOR.
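The class-wise percentages reported in Table 4 can be reproduced from the raw counts with a short Python sketch. The counts below are the ones printed in Table 4; the variable and function names are hypothetical, and the layout (rows as predicted classes, columns as actual classes) follows the table as printed. The weighted-average recall is computed here as the sum of the diagonal divided by the number of samples, which is what a support-weighted mean of class-wise recalls reduces to.

```python
# Confusion matrices from Table 4: rows = predicted class, columns = actual class.
ISIC_2016_COUNTS = [
    [273, 4],    # predicted Nev
    [31, 71],    # predicted Mel
]
ISIC_2017_COUNTS = [
    [321, 10, 28],   # predicted Nev
    [23, 74, 17],    # predicted SK
    [49, 6, 72],     # predicted Mel
]


def class_wise_recall(counts):
    """Recall per actual class: diagonal entry divided by its column total."""
    size = len(counts)
    column_totals = [sum(counts[r][c] for r in range(size)) for c in range(size)]
    return [counts[c][c] / column_totals[c] for c in range(size)]


def weighted_average_recall(counts):
    """Support-weighted mean recall, i.e. correct predictions over all samples."""
    size = len(counts)
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(size))
    return correct / total


if __name__ == "__main__":
    for name, counts in (("ISIC-2016", ISIC_2016_COUNTS), ("ISIC-2017", ISIC_2017_COUNTS)):
        recalls = [round(100 * r, 2) for r in class_wise_recall(counts)]
        print(name, recalls, round(weighted_average_recall(counts), 2))
```

The class-wise values match the diagonal percentages of Table 4 (89.80 % and 94.67 % for ISIC-2016; 81.68 %, 82.22 %, and 61.54 % for ISIC-2017), and the matrices sum to 379 and 600 samples, respectively.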

Comparison of ISIC-2016.

The proposed network produces the best results for the AUC by beating the state-of-the-art of Yu et al. [89] with a 12 [...]

Table 5: The state-of-the-art comparison with the proposed Dermo-DOCTOR, where the methods were trained, validated, and tested on the ISIC-2016 and ISIC-2017 datasets.

Columns: Classification Methods; ISIC-2016 test dataset (Recall, Precision, AUC); ISIC-2017 test dataset (Recall, Precision, AUC)

ResNet-50 [10] 0 .

56 0 .

71 0 .

85 - - -GR [74] - - - 0 .

15 - 0 . .

77 - . IR [3] 0 .

82 - 0 .

77 0 .

76 - -FPRPN [77] . .

82 0 . .

98 0 . . .

60 0 .

69 0 .

86 - 0 .

72 0 . Proposed Dermo-DOCTOR (2020) . .

93 0 . .

78 0 .

79 0 .

GR: Gabor Wavelet-based CNN [74]
ARLCNN: Attention Residual Learning CNN (ResNet-14 & ResNet-50) [92]
IR: Inception-ResNet-V2 (ISIC-2016), ResNet-50 (ISIC-2017, ISIC-2018) [3]
FPRPN: Feature Pyramid Network (FPN) and Region Proposal Network (RPN) [77]
MFA: Multi-network based feature aggregation [89]

Comparison of ISIC-2017.

The Dermo-DOCTOR provides the second-best results concerning all the metrics, where it beats the state-of-the-art FPRPN [77] with a margin of 12 [...]

Fig. 14 illustrates the prototype of the developed web application deploying our Dermo-DOCTOR, which runs in a web browser at “http://127.0.0.1:5000/” by accessing the CNN environments of the local machine. The app takes a dermoscopic image (png, jpg, bmp, or jpeg) as an input and displays the diagnosis result to the user as soon as the back-end model has analyzed the given image. The recognized class with its probability is displayed in the output panel, highlighting the lesion ROI by a green bounding box. Therefore, it helps the dermatologists focus on the detected area for cross-checking the predicted class. A real-time utilization of the Dermo-DOCTOR has been uploaded to YouTube, which shows a small time latency in getting the results. The latency arises because the app sends images to the host and receives the host’s results through the internet, in addition to the prediction time in the host (higher traffic in the host will increase the latency). However, a dedicated machine with a GPU can alleviate this time-latency limitation. We have tested the app on our local machine and could not make it public due to resource limitations.

Dermo-DOCTOR App: https://bit.ly/Dermo-DOCTOR_App

Figure 14: Our designed web application to detect and recognize skin lesions simultaneously. Dermatologists can utilize this application by selecting or dragging the image as an input. The app will present the lesion detection and recognition results for either binary class or multi-classes.

4. Conclusion

Despite the present colossal challenges due to high visual and intra-class variability, inter-class similarity, and the presence of different artifacts, automated skin lesion detection and recognition are incredibly crucial. To this end, this article proposed and developed an automated CNN-based lesion detection and recognition network, integrating ROI extraction by segmentation, image augmentations, and class rebalancing. Our experimental results demonstrated that the proposed Dermo-DOCTOR could detect and recognize the lesion more accurately as we concatenated features from two different encoders. Such a concatenation provides more prominent and discriminating feature maps of the skin lesions compared with a single encoder. The segmented lesions, rather than the whole images, can provide more salient and representative features from the CNNs, leading to improved lesion recognition. Moreover, the rebalanced class distribution attained better recognition performance than the imbalanced distribution. Additionally, the augmentation led the CNN-based classifier to be more generic, as CNNs can learn from diverse training samples. Thus, it achieved state-of-the-art performance in detecting and recognizing the lesions from two different test datasets, ISIC-2016 and ISIC-2017. We will further explore and investigate the effects of improved segmentation and weighting of the underrepresented classes in the future. The deployment of our framework to a web application precisely detected and recognized the lesions concurrently. The developed web application will be improved, making it more user-friendly for dermatologists and deploying it to the Google Cloud Platform for clinical applications.

Author Contributions

M. K. Hasan: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - Review & Editing, Supervision;
S. Roy: Validation, Data Curation, Writing - Original Draft;
C. Mondal: Validation, Data Curation, Writing - Original Draft;
M. A. Alam: Conceptualization, Data Curation;
M. T. E. Elahi: Validation, Writing - Original Draft;
A. Dutta: Validation, Writing - Original Draft;
S. M. T. U. Raju: Software;
M. . Jawad: Validation, Writing - Review & Editing;
M. Ahmad: Supervision.

Acknowledgements

None. No funding to declare.

Conﬂict of Interest

All authors have no conﬂict of interest to publish this research.

References

[1] Acharya, U.R., Molinari, F., Sree, S.V., Chattopadhyay, S., Ng, K.H., Suri, J.S., 2012. Automated diagnosis of epileptic eeg using entropies. Biomedical Signal Processing and Control 7, 401–408.
[2] Al-Masni, M.A., Al-antari, M.A., Choi, M.T., Han, S.M., Kim, T.S., 2018. Skin lesion segmentation in dermoscopy images via deep full resolution convolutional networks. Computer methods and programs in biomedicine 162, 221–231.
[3] Al-Masni, M.A., Kim, D.H., Kim, T.S., 2020. Multiple skin lesions diagnostics via integrated deep convolutional networks for segmentation and classification. Computer Methods and Programs in Biomedicine 190, 105351.
[4] Al Nazi, Z., Abir, T.A., 2020. Automatic skin lesion segmentation and melanoma detection: Transfer learning approach with u-net and dcnn-svm, in: Proceedings of International Joint Conference on Computational Intelligence, Springer. pp. 371–381.
[5] Amin, J., Sharif, A., Gul, N., Anjum, M.A., Nisar, M.W., Azam, F., Bukhari, S.A.C., 2020. Integrated design of deep features fusion for localization and classification of skin cancer. Pattern Recognition Letters 131, 63–70.
[6] Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence 39, 2481–2495.
[7] Bi, L., Kim, J., Ahn, E., Kumar, A., Feng, D., Fulham, M., 2019. Step-wise integration of deep class-specific learning for dermoscopic image segmentation. Pattern recognition 85, 78–89.
[8] Bi, L., Kim, J., Ahn, E., Kumar, A., Fulham, M., Feng, D., 2017. Dermoscopic image segmentation via multistage fully convolutional networks. IEEE Transactions on Biomedical Engineering 64, 2065–2074.
[9] Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R.L., Torre, L.A., Jemal, A., 2018. Global cancer statistics 2018: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians 68, 394–424.
[10] Brinker, T.J., Hekler, A., Enk, A.H., von Kalle, C., 2019. Enhanced classifier training to improve precision of a convolutional neural network to identify images of skin lesions. PloS one 14.
[11] Cheng, Y., Swamisai, R., Umbaugh, S.E., Moss, R.H., Stoecker, W.V., Teegala, S., Srinivasan, S.K., 2008. Skin lesion classification using relative color features. Skin Research and Technology 14, 53–64.
[12] Chollet, F., 2017. Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258.
[13] Codella, N.C., Gutman, D., Celebi, M.E., Helba, B., Marchetti, M.A., Dusza, S.W., Kalloo, A., Liopyris, K., Mishra, N., Kittler, H., et al., 2018. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic), in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE. pp. 168–172.
[14] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database, in: 2009 IEEE conference on computer vision and pattern recognition, IEEE. pp. 248–255.
[15] Dennis Schmid, 2018. Number of dermatologists in selected European countries in 2015. https://tinyurl.com/y59xoc7n [Accessed: 4 Sept 2020].
[16] Department of Health (Commonwealth of Australia), 2017. Dermatology 2016 Factsheet. https://tinyurl.com/y3z39b9r [Accessed: 15 Jun 2020].
[17] Dolz, J., Desrosiers, C., Wang, L., Yuan, J., Shen, D., Ayed, I.B., 2020. Deep cnn ensembles and suggestive annotations for infant brain mri segmentation. Computerized Medical Imaging and Graphics 79, 101660.
[18] Esteva, A., Kuprel, B., Novoa, R., et al., 2017. Dermatologist level classification of skin cancer with deep neural networks. Nature 542, 115.
[19] Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A., 2010. The pascal visual object classes (voc) challenge. International journal of computer vision 88, 303–338.
[20] Fujisawa, Y., Inoue, S., Nakamura, Y., 2019. The possibility of deep learning-based, computer-aided skin tumor classifiers. Frontiers in Medicine 6, 191.
[21] Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D., 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16, 906–914.
[22] Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., Martinez-Gonzalez, P., Garcia-Rodriguez, J., 2018. A survey on deep learning techniques for image and video semantic segmentation. Applied Soft Computing 70, 41–65.
[23] Ge, Z., Demyanov, S., Chakravorty, R., Bowling, A., Garnavi, R., 2017. Skin disease recognition using deep saliency features and multimodal learning of dermoscopy and clinical images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 250–258.
[24] Gessert, N., Nielsen, M., Shaikh, M., Werner, R., Schlaefer, A., 2020. Skin lesion classification using ensembles of multi-resolution efficientnets with meta data. MethodsX, 100864.
[25] Ghadiyaram, D., Bovik, A.C., 2017. Perceptual quality prediction on authentically distorted images using a bag of features approach. Journal of vision 17, 32–32.
[26] Glazer, A.M., Rigel, D.S., 2017. Analysis of trends in geographic distribution of us dermatology workforce density. JAMA dermatology 153, 472–473.
[27] Goyal, M., Oakley, A., Bansal, P., Dancey, D., Yap, M.H., 2019. Skin lesion segmentation in dermoscopic images with ensemble deep learning methods. IEEE Access.
[28] Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., Lew, M.S., 2016. Deep learning for visual understanding: A review. Neurocomputing 187, 27–48.
[29] Gutman, D., Codella, N.C., Celebi, E., Helba, B., Marchetti, M., Mishra, N., Halpern, A., 2016. Skin lesion analysis toward melanoma detection: A challenge at the international symposium on biomedical imaging (isbi) 2016, hosted by the international skin imaging collaboration (isic). arXiv:1605.01397.
[30] Ha, Q., Liu, B., Liu, F., 2020. Identifying melanoma images using efficientnet ensemble: Winning solution to the siim-isic melanoma classification challenge. arXiv preprint arXiv:2010.05351.
[31] Hameed, N., Shabut, A.M., Ghosh, M.K., Hossain, M., 2020. Multi-class multi-level classification algorithm for skin lesions classification using machine learning techniques. Expert Systems with Applications 141, 112961.
[32] Harangi, B., 2018. Skin lesion classification with ensembles of deep convolutional neural networks. Journal of biomedical informatics 86, 25–32.
[33] Hasan, M.K., Alam, M.A., Elahi, M.T.E., Roy, S., Martí, R., 2021a. Drnet: Segmentation and localization of optic disc and fovea from diabetic retinopathy image. Artificial Intelligence in Medicine 111, 102001.
[34] Hasan, M.K., Calvet, L., Rabbani, N., Bartoli, A., 2021b. Detection, segmentation, and 3d pose estimation of surgical tools using convolutional neural networks and algebraic geometry. Medical Image Analysis, 101994.
[35] Hasan, M.K., Dahal, L., Samarakoon, P.N., Tushar, F.I., Martí, R., 2020. Dsnet: Automatic dermoscopic skin lesion segmentation. Computers in Biology and Medicine, 103738.
[36] Hawas, A.R., Guo, Y., Du, C., Polat, K., Ashour, A.S., 2020. Oce-ngc: A neutrosophic graph cut algorithm using optimized clustering estimation algorithm for dermoscopic skin lesion segmentation. Applied Soft Computing 86, 105931.
[37] He, H., Garcia, E.A., 2009. Learning from imbalanced data. IEEE Transactions on knowledge and data engineering 21, 1263–1284.
[38] He, K., Gkioxari, G., Dollár, P., Girshick, R., 2017. Mask r-cnn, in: Proceedings of the IEEE international conference on computer vision, pp. 2961–2969.
[39] He, K., Zhang, X., Ren, S., Sun, J., 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the IEEE international conference on computer vision, pp. 1026–1034.
[40] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
[41] Hu, J., Shen, L., Sun, G., 2018. Squeeze-and-excitation networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141.
[42] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708.
[43] Ioffe, S., Szegedy, C., 2015. Batch Normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167.
[44] ISIC, 2018. ISIC Archive. https://tinyurl.com/11hgg83u [Accessed: 09 May 2020].
[45] Jahanifar, M., Tajeddin, N.Z., Asl, B.M., Gooya, A., 2018. Supervised saliency map driven segmentation of lesions in dermoscopic images. IEEE journal of biomedical and health informatics 23, 509–518.
[46] Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C., Liang, H., Baxter, S.L., McKeown, A., Yang, G., Wu, X., Yan, F., et al., 2018. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172, 1122–1131.
[47] Khan, M.A., Sharif, M., Akram, T., Bukhari, S.A.C., Nayak, R.S., 2020. Developed newton-raphson based deep features selection framework for skin lesion recognition. Pattern Recognition Letters 129, 293–303.
[48] Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097–1105.
[49] Kumar, A., Kim, J., Lyndon, D., Fulham, M., Feng, D., 2016. An ensemble of fine-tuned convolutional neural networks for medical image classification. IEEE journal of biomedical and health informatics 21, 31–40.
[50] Lattoofi, N.F., Al-sharuee, I.F., Kamil, M.Y., Obaid, A.H., Mahidi, A.A., Omar, A.A., et al., 2019. Melanoma skin cancer detection based on abcd rule, in: 2019 First International Conference of Computer and Applied Sciences (CAS), IEEE. pp. 154–157.
[51] Lee, Y.C., Jung, S.H., Won, H.H., 2018. Wonderm: Skin lesion classification with fine-tuned neural networks. arXiv preprint arXiv:1808.03426.
[52] Lin, M., Chen, Q., Yan, S., 2013. Network in network. arXiv:1312.4400.
[53] Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440.
[54] Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. International journal of computer vision 60, 91–110.
[55] Maćkiewicz, A., Ratajczak, W., 1993. Principal components analysis (pca). Computers & Geosciences 19, 303–342.
[56] Maglogiannis, I., Doukas, C.N., 2009. Overview of advanced computer vision systems for skin lesions characterization. IEEE transactions on information technology in biomedicine 13, 721–733.
[57] Mahbod, A., Schaefer, G., Ellinger, I., Ecker, R., Pitiot, A., Wang, C., 2019. Fusing fine-tuned deep features for skin lesion classification. Computerized Medical Imaging and Graphics 71, 19–29.
[58] Mahbod, A., Schaefer, G., Wang, C., Dorffner, G., Ecker, R., Ellinger, I., 2020. Transfer learning using a multi-scale and multi-network ensemble for skin lesion classification. Computer methods and programs in biomedicine 193, 105475.
[59] Moitra, D., Mandal, R.K., 2020. Prediction of non-small cell lung cancer histology by a deep ensemble of convolutional and bidirectional recurrent neural network. Journal of Digital Imaging, 1–8.
[60] Mporas, I., Perikos, I., Paraskevas, M., 2020. Color models for skin lesion classification from dermatoscopic images, in: Advances in Integrations of Intelligent Methods. Springer, pp. 85–98.
[61] Nachbar, F., Stolz, W., Merkle, T., Cognetta, A.B., Vogt, T., Landthaler, M., Bilek, P., Braun-Falco, O., Plewig, G., 1994. The abcd rule of dermatoscopy: high prospective value in the diagnosis of doubtful melanocytic skin lesions. Journal of the American Academy of Dermatology 30, 551–559.
[62] Navarro, F., Escudero-Viñolo, M., Bescós, J., 2018. Accurate segmentation and registration of skin lesion images to evaluate lesion change. IEEE journal of biomedical and health informatics 23, 501–508.
[63] Odena, A., Dumoulin, V., Olah, C., 2016. Deconvolution and checkerboard artifacts. Distill 1, e3.
[64] Oliveira, R.B., Papa, J.P., Pereira, A.S., Tavares, J.M.R., 2018. Computational methods for pigmented skin lesion classification in images: review and future trends. Neural Computing and Applications 29, 613–636.
[65] Öztürk, Ş., Özkaya, U., 2020. Skin lesion segmentation with improved convolutional neural network. Journal of digital imaging.
[66] Pacheco, A.G., Ali, A.R., Trappenberg, T., 2019. Skin cancer detection based on deep learning and entropy to detect outlier samples. arXiv preprint arXiv:1909.04525.
[67] Pour, M.P., Seker, H., 2020. Transform domain representation-driven convolutional neural networks for skin lesion segmentation. Expert Systems with Applications 144, 113129.
[68] Qin, Z., Liu, Z., Zhu, P., Xue, Y., 2020. A gan-based image synthesis method for skin lesion classification. Computer Methods and Programs in Biomedicine, 105568.
[69] Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., et al., 2017. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv:1711.05225.
[70] Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer. pp. 234–241.
[71] Sarker, M.M.K., Rashwan, H.A., Akram, F., Banu, S.F., Saleh, A., Singh, V.K., Chowdhury, F.U., Abdulwahab, S., Romani, S., Radeva, P., et al., 2018. Slsdeep: Skin lesion segmentation based on dilated residual and pyramid pooling networks, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 21–29.
[72] Savelli, B., Bria, A., Molinara, M., Marrocco, C., Tortorella, F., 2020. A multi-context cnn ensemble for small lesion detection. Artificial Intelligence in Medicine 103, 101749.
[73] Scheffe, H., 1999. The analysis of variance. volume 72. John Wiley & Sons.
[74] Serte, S., Demirel, H., 2019. Gabor wavelet-based deep learning for skin lesion classification. Computers in biology and medicine 113, 103423.
[75] Shahin, A.H., Kamal, A., Elattar, M.A., 2018. Deep ensemble learning for skin lesion classification from dermoscopic images, in: 2018 9th Cairo International Biomedical Engineering Conference (CIBEC), IEEE. pp. 150–153.
[76] Siegel, R.L., Miller, K.D., Jemal, A., 2020. Cancer statistics, 2020. CA: A Cancer Journal for Clinicians 70, 7–30.
[77] Song, L., Lin, J.P., Wang, Z.J., Wang, H., 2020. An end-to-end multi-task deep learning framework for skin lesion analysis. IEEE Journal of Biomedical and Health Informatics.
[78] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1929–1958.
[79] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826.
[80] Tan, M., Le, Q.V., 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946.
[81] Tang, Y., Yang, F., Yuan, S., et al., 2019. A multi-stage framework with context information fusion structure for skin lesion segmentation, in: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE. pp. 1407–1410.
[82] Torrey, L., Shavlik, J., 2010. Transfer learning, in: Handbook of research on machine learning applications and trends: algorithms, methods, and techniques. IGI Global, pp. 242–264.
[83] Valle, E., Fornaciali, M., Menegola, A., Tavares, J., Bittencourt, F.V., Li, L.T., Avila, S., 2020. Data, depth, and design: Learning reliable models for skin lesion analysis. Neurocomputing 383, 303–313.
[84] Venugopal, A., Stoffel, E.M., 2019. Colorectal cancer in young adults. Current treatment options in gastroenterology 17, 89–98.
[85] Xie, F., Yang, J., Liu, J., Jiang, Z., Zheng, Y., Wang, Y., 2020a. Skin lesion segmentation using high-resolution convolutional neural network. Computer Methods and Programs in Biomedicine 186, 105241.
[86] Xie, Y., Zhang, J., Xia, Y., Shen, C., 2020b. A mutual bootstrapping model for automated skin lesion segmentation and classification. IEEE Transactions on Medical Imaging.
[87] Yadav, S.S., Jadhav, S.M., 2019. Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6, 113.
[88] Yilmaz, E., Trocan, M., 2020. Benign and malignant skin lesion classification comparison for three deep-learning architectures, in: Asian Conference on Intelligent Information and Database Systems, Springer. pp. 514–524.
[89] Yu, Z., Jiang, F., Zhou, F., He, X., Ni, D., Chen, S., Wang, T., Lei, B., 2020. Convolutional descriptors aggregation via cross-net for skin lesion recognition. Applied Soft Computing, 106281.
[90] Yuan, Y., 2017. Automatic skin lesion segmentation with fully convolutional-deconvolutional networks. arXiv preprint arXiv:1703.05165.
[91] Zhang, J., Xie, Y., Xia, Y., Shen, C., 2019. Attention residual learning for skin lesion classification. IEEE transactions on medical imaging 38, 2092–2103.
[92] Zhang, N., Cai, Y.X., Wang, Y.Y., Tian, Y.T., Wang, X.L., Badami, B., 2020. Skin cancer diagnosis based on optimized convolutional neural network. Artificial Intelligence in Medicine 102, 101756.
Benign and malignant skin lesion classiﬁcation comparison for threedeep-learning architectures, in: Asian Conference on Intelligent Information and Database Systems,Springer. pp. 514–524.[89] Yu, Z., Jiang, F., Zhou, F., He, X., Ni, D., Chen, S., Wang, T., Lei, B., 2020. Convolutional descriptorsaggregation via cross-net for skin lesion recognition. Applied Soft Computing , 106281.[90] Yuan, Y., 2017. Automatic skin lesion segmentation with fully convolutional-deconvolutional networks.arXiv preprint arXiv:1703.05165 .[91] Zhang, J., Xie, Y., Xia, Y., Shen, C., 2019. Attention residual learning for skin lesion classiﬁcation.IEEE transactions on medical imaging 38, 2092–2103.[92] Zhang, N., Cai, Y.X., Wang, Y.Y., Tian, Y.T., Wang, X.L., Badami, B., 2020. Skin cancer diagnosisbased on optimized convolutional neural network. Artiﬁcial Intelligence in Medicine 102, 101756.