k-Means Clustering and Ensemble of Regressions: An Algorithm for the ISIC 2017 Skin Lesion Segmentation Challenge
David Alvarez L.* and Monica Iglesias M.**

* Medical Physicist at Asturias University Central Hospital, Oviedo, Spain
** Medical Imaging Technologist

October 25, 2018
Abstract

This abstract briefly describes a segmentation algorithm developed for the ISIC [1] 2017 Skin Lesion Detection Competition hosted at [2]. The objective of the competition is to perform a segmentation (in the form of a binary mask image) of skin lesions in dermoscopic images as close as possible to a segmentation performed by trained clinicians, which is taken as ground truth. This project only takes part in the segmentation phase of the challenge. The other phases of the competition (feature extraction and lesion identification) are not considered.

The proposed algorithm consists of 4 steps: (1) lesion image preprocessing, (2) image segmentation using k-means clustering of pixel colors, (3) calculation of a set of features describing the properties of each segmented region, and (4) calculation of a final score for each region, representing the likelihood of corresponding to a suitable lesion segmentation. The scores in step (4) are obtained by averaging the results of 2 different regression models using the features of each region as input. Before using the algorithm these regression models must be trained using the training set of images and ground truth masks provided by the Competition. Steps 2 to 4 are repeated with an increasing number of clusters (so that the image is segmented into more regions) until there is no further improvement of the calculated scores.

In the following sections each step of the algorithm is described in detail, as well as the workflow between them and the processing of the final segmentation.
1 Preprocessing

Prior to segmentation, all images are preprocessed in order to minimize undesirable features that could affect the performance of the algorithm, such as bright reflections, the presence of hair, and color differences between images. The images are also normalized to a common size and shape. This normalization allows region features such as position and size to be compared between different images.
Corresponding author: David Alvarez ([email protected])

Size normalization

All images are rescaled to 1024 × … pixels.

Removal of reflections
Reflections in dermoscopic images typically appear as small, very bright areas. In this project the brightness of a color pixel is defined as the sum of the values of its 3 channels (red, green and blue). The value of the brightest pixels is replaced by the average of their neighbours, excluding any adjacent pixel also classified as bright. The following rule decides whether a pixel must be replaced: the value t of the 99th percentile of brightness in each image is found; then, all pixels with a brightness greater than a fixed fraction of t are classified as bright and replaced. This threshold was chosen by visually evaluating the results on a small number of images of the training set.
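A minimal sketch of this rule using NumPy and SciPy. The exact fraction of t used as the threshold is not recoverable from the text, so the `fraction` parameter below (0.9) is a placeholder assumption, as is the 3 × 3 neighbourhood used for the replacement average:

```python
import numpy as np
from scipy import ndimage

def remove_reflections(img, fraction=0.9):
    """Replace very bright pixels by the average of their non-bright
    neighbours. `fraction` (of the 99th-percentile brightness t) and
    the 3x3 neighbourhood are placeholder assumptions."""
    brightness = img.sum(axis=2).astype(float)      # R + G + B per pixel
    t = np.percentile(brightness, 99)
    bright = brightness > fraction * t              # pixels to replace
    ok = (~bright).astype(float)
    out = img.astype(float)
    for c in range(3):
        # Average of non-bright pixels in a 3x3 window around each pixel.
        num = ndimage.uniform_filter(out[:, :, c] * ok, size=3)
        den = ndimage.uniform_filter(ok, size=3)
        avg = np.divide(num, den, out=out[:, :, c].copy(), where=den > 0)
        out[bright, c] = avg[bright]
    return out.astype(img.dtype)
```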
Removal of hair

The method proposed by [3] is used. First, a morphological closing [4] operation is performed independently on each of the color channels of the image, using as structuring element a disk of radius equal to 5 pixels. Then a 3 × 3 median filter is applied.
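A sketch of this step with SciPy's greyscale morphology. The disk footprint is built by hand, and the 3 × 3 median window is an assumption (the text truncates at this point):

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Binary disk-shaped footprint of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

def remove_hair(img, radius=5):
    """Per-channel morphological closing with a disk footprint,
    followed by a 3x3 median filter (window size assumed)."""
    out = np.empty_like(img)
    fp = disk(radius)
    for c in range(img.shape[2]):
        closed = ndimage.grey_closing(img[:, :, c], footprint=fp)
        out[:, :, c] = ndimage.median_filter(closed, size=3)
    return out
```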
White balance

A simple white balance is performed, based on the Gray World assumption [5]. The values of all pixels on each color channel are summed; let us denote the 3 resulting values S_red, S_green and S_blue. Using these values, 2 color balance factors are calculated as k_red = S_green/S_red and k_blue = S_green/S_blue. Finally, multiplying the red and blue values of each pixel in the image by their corresponding factors results in an image with an equal "amount of color" in all 3 channels.

2 Segmentation and scoring

The preprocessed image is segmented using an iterative color clustering algorithm, starting with 3 clusters. All pixels in the image (considered as points in a 3-dimensional space whose components are the 3 color channels of the image) are assigned to one of the clusters using a k-means algorithm.

For each cluster a binary mask is constructed, showing the locations of the pixels belonging to it. Since the mask can contain many small non-connected regions, morphological opening and closing operations are performed, using a disk of radius 10 pixels. These operations allow small regions to be absorbed into their bigger neighbours, thereby reducing the number of small, probably insignificant regions.

Region features

A set of 10 features is calculated for each of the remaining regions in all clusters. Each feature encodes a different property of the region as a number in the range [0, 1].

Region area

A histogram of 500 bins in the interval [0, …] is built using the areas of the lesion masks provided in the training set, previously rescaled to 1024 × … pixels.

Region position
To calculate this feature, the centroids of all rescaled lesion masks in the training set are calculated. A 2-dimensional gaussian is then fit to the set of centroids. The maximum of the gaussian, which is situated very close to the center of the images, is set to 1. The value of the position feature for any region is obtained as the value of the gaussian at the coordinates of the centroid of the region. Thus, regions whose centroid is close to the center of the image are assigned the highest value of this feature, while the value diminishes the further the centroid is from that position.
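This fit can be sketched by moment matching (a simple stand-in for a full least-squares gaussian fit), with the gaussian rescaled so its maximum is 1:

```python
import numpy as np

def fit_position_feature(centroids):
    """Fit a 2-D gaussian to the training-set centroids by moment
    matching and return a function mapping a region centroid to the
    position feature, with the gaussian's maximum scaled to 1."""
    mu = centroids.mean(axis=0)          # gaussian centre
    cov = np.cov(centroids.T)            # gaussian shape
    inv = np.linalg.inv(cov)
    def feature(c):
        d = np.asarray(c, dtype=float) - mu
        return float(np.exp(-0.5 * d @ inv @ d))  # equals 1 at mu
    return feature
```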
Region circularity
Let A and p be the area and perimeter of a region, respectively. The circularity of the region is defined as

circularity = 4πA / p²

Note that for a perfectly circular region circularity = 1.
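In code, with the definition above:

```python
import math

def circularity(area, perimeter):
    """4*pi*A / p**2; equals 1 for a perfect circle and decreases
    as the region becomes less compact."""
    return 4 * math.pi * area / perimeter ** 2
```

For a circle of radius 5 (A = π·25, p = 2π·5) the value is exactly 1.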
Region solidity
Solidity is defined as the ratio of the number of pixels in the region to the number of pixels in the convex hull of the region.
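A sketch of this computation. Here the convex hull is rasterised by testing every pixel centre against a Delaunay triangulation of the region's pixel coordinates; scikit-image's `regionprops` provides solidity directly, but the SciPy version below keeps the definition explicit:

```python
import numpy as np
from scipy.spatial import Delaunay

def solidity(mask):
    """Region pixels / convex-hull pixels for a boolean mask.
    A simple, unoptimised sketch of the definition above."""
    pts = np.argwhere(mask)                    # region pixel coordinates
    tri = Delaunay(pts)                        # triangulates the hull
    grid = np.argwhere(np.ones(mask.shape, bool))
    inside = tri.find_simplex(grid) >= 0       # pixels inside the hull
    return mask.sum() / inside.sum()
```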
Region average color (3 channels)
The mean value of each color channel for every lesion in the training set is calculated. Then the mean and the standard deviation of the values obtained are calculated for each channel. A 1-dimensional gaussian function is defined for each color using the corresponding mean and deviation, with the maximum of the function set to 1. For any given region, the color feature for each channel is calculated by applying these functions to the mean color of that region. Note that those regions with a mean color more similar to the mean color of all lesions in the training set will have a higher value of these features.
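A sketch of this feature, assuming the per-lesion mean colours of the training set are available as an (n, 3) array:

```python
import numpy as np

def fit_color_feature(training_means):
    """training_means: (n_lesions, 3) per-lesion mean RGB values.
    Returns a function mapping a region's mean colour to 3 feature
    values via per-channel gaussians whose maximum is 1."""
    mu = training_means.mean(axis=0)
    sigma = training_means.std(axis=0)
    def feature(region_mean):
        z = (np.asarray(region_mean, dtype=float) - mu) / sigma
        return np.exp(-0.5 * z**2)   # one value per channel, max 1
    return feature
```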
Similarity between region color and color in the center of the image (3 channels)
The image is divided into 9 equal-sized squares, and only the pixels in the central square are considered. The mean and standard deviation of the pixel values in the central area are calculated for each of the 3 color channels separately. Using the same approach as in the previous feature, 3 feature values are calculated which account for the similarity between the mean color of the region and the mean color in the central area of the image.
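The central-square variant can be sketched the same way; the small floor on the standard deviation below is an added guard against constant central regions, not something stated above:

```python
import numpy as np

def center_color_feature(img, region_mean):
    """Similarity between a region's mean colour and the colour of the
    central ninth of the image, via per-channel gaussians with max 1."""
    h, w = img.shape[:2]
    center = img[h // 3: 2 * h // 3, w // 3: 2 * w // 3].reshape(-1, 3)
    mu = center.mean(axis=0)
    sigma = np.maximum(center.std(axis=0), 1e-6)  # guard (assumption)
    z = (np.asarray(region_mean, dtype=float) - mu) / sigma
    return np.exp(-0.5 * z**2)
```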
Region scoring

Using the set of 10 features, a score is calculated for each region by applying 2 different regression models. The average of the 2 scores is used as the final score for the region. These scores are predictions of the Jaccard index between the region and the ground truth mask of the image. The models must be previously trained with the training set provided by the Competition (see section 3). The models used were:

• Random forest [6] composed of 50 random trees
• Epsilon-Support Vector Regression [7] (SVR) with the following parameters: C = 100, γ = 0.5, and ε = 0.…

Other regression models, such as k-nearest neighbours, were considered, but the combination of random forest and SVR yielded the best results over the validation set.

Iteration and postprocessing

When all regions have been scored, the algorithm stores the highest score found and the corresponding region as the best candidate for the lesion segmentation. A new clustering is performed with the number of clusters increased by 1, and the resulting regions are scored again. This process is repeated until there is no improvement in the scores after increasing the number of clusters. At this point, the iteration stops and the region with the best score is considered the best estimate of the lesion segmentation.

When the best candidate region (herein the mask) has been determined, the following postprocessing steps are executed:

1. All "holes" (background regions completely surrounded by the mask) are deleted and their pixels are incorporated into the mask.
2. A morphological closing of the mask is performed using a disk of radius 30 pixels, followed by a dilation [4] using a disk of radius 14. These operations compensate for the fact that the algorithm tends to underestimate the margin to be left around the lesions compared to the ground truth masks. They also smooth the border of the mask. The optimal radii for the disks were found by optimizing over the local validation set.
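The stopping rule and the postprocessing can be sketched as follows. `segment` and `score` are hypothetical interfaces standing in for the clustering and regression steps described above:

```python
import numpy as np
from scipy import ndimage

def best_region(segment, score, k0=3):
    """Increase k until the best score stops improving.
    segment(k) -> iterable of candidate region masks for k clusters;
    score(region) -> predicted Jaccard index (both supplied by caller)."""
    best, best_score = None, -1.0
    k = k0
    while True:
        s, r = max((score(r), r) for r in segment(k))
        if s <= best_score:
            return best          # no improvement: stop iterating
        best, best_score = r, s
        k += 1

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x**2 + y**2 <= r**2

def postprocess(mask):
    """Fill holes, close with a radius-30 disk, dilate with a
    radius-14 disk, as described in the postprocessing steps."""
    m = ndimage.binary_fill_holes(mask)
    m = ndimage.binary_closing(m, structure=disk(30))
    return ndimage.binary_dilation(m, structure=disk(14))
```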
3 Training of the regression models

The regression algorithms need to be trained before being used for prediction. In order to generate a large and diverse sample set for the training step, the same clustering and feature calculation algorithm described in section 2 was used on the images of the training set, but the scoring step was replaced by a naive score calculation: each region was assigned a score equal to the sum of its 10 feature values. For each region generated, its array of feature values was added to the sample set, together with the Jaccard index of the region compared to the ground truth mask for that image. The set of all pairs of feature values and their associated Jaccard indexes can then be used to train the regression algorithms.
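With scikit-learn, the training and the averaging of the two models might look as follows; the SVR epsilon is left at its default because its value is not recoverable from the text above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

def train_ensemble(features, jaccards):
    """Train both regressors on (region features, Jaccard index)
    pairs. C=100 and gamma=0.5 are stated in the text; epsilon is
    left at sklearn's default (an assumption)."""
    rf = RandomForestRegressor(n_estimators=50).fit(features, jaccards)
    svr = SVR(C=100, gamma=0.5).fit(features, jaccards)
    return rf, svr

def predict_score(models, features):
    """Final region score: average of the two models' predictions."""
    rf, svr = models
    return (rf.predict(features) + svr.predict(features)) / 2
```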
References

[1] http://isdis.net/isic-project/
[2] https://challenge.kitware.com/
[3] M. Silveira et al., "Comparison of segmentation methods for melanoma diagnosis in dermoscopy images," IEEE Journal of Selected Topics in Signal Processing, vol. 3, pp. 35–45, Feb. 2009.
[4] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Pearson, 3rd ed., 2008.
[5] G. Zapryanov et al., "Automatic white balance algorithms for digital still cameras: a comparative study," Information Technologies and Control, vol. 1, pp. 16–22, 2012.
[6] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[7] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Statistics and Computing, vol. 14, pp. 199–222, 2004.