Approximate Lesion Localization in Dermoscopy Images
M. Emre Celebi, Hitoshi Iyatomi, Gerald Schaefer, William V. Stoecker
aa r X i v : . [ c s . C V ] S e p Approximate Lesion Localization in Dermoscopy Images
M. Emre CelebiDept. of Computer ScienceLouisiana State Univ., Shreveport, LA, [email protected] Hitoshi IyatomiDept. of Electrical InformaticsHosei Univ., Tokyo, [email protected] SchaeferSchool of Engineering and Applied ScienceAston Univ., Birmingham, [email protected] William V. StoeckerStoecker & Associates, Rolla, MO, [email protected] 3, 2018
AbstractBackground:
Dermoscopy is one of the major imaging modalities used in the diagnosis of melanoma and other pigmentedskin lesions. Due to the difficulty and subjectivity of human interpretation, automated analysis of dermoscopy images hasbecome an important research area. Border detection is often the first step in this analysis.
Methods:
In this article, wepresent an approximate lesion localization method that serves as a preprocessing step for detecting borders in dermoscopyimages. In this method, first the black frame around the image is removed using an iterative algorithm. The approximatelocation of the lesion is then determined using an ensemble of thresholding algorithms.
Results:
The method is tested ona set of 428 dermoscopy images. The localization error is quantified by a metric that uses dermatologist determined bordersas the ground truth.
Conclusion:
The results demonstrate that the method presented here achieves both fast and accuratelocalization of lesions in dermoscopy images.
Malignant melanoma, the most deadly form of skin cancer, is one of the most rapidly increasing cancers in the world, with anestimated incidence of 62,480 and an estimated total of 8,420 deaths in the United States in 2008 alone [1]. Early diagnosis isparticularly important since melanoma can be cured with a simple excision if detected early.Dermoscopy, also known as epiluminescence microscopy, has become one of the most important tools in the diagnosis ofmelanoma and other pigmented skin lesions. This non-invasive skin imaging technique involves optical magnification, whichmakes subsurface structures more easily visible when compared to conventional clinical images [2]. This in turn reducesscreening errors and provides greater differentiation between difficult lesions such as pigmented Spitz nevi and small, clinicallyequivocal lesions [3]. However, it has also been demonstrated that dermoscopy may actually lower the diagnostic accuracyin the hands of inexperienced dermatologists [4]. Therefore, in order to minimize the diagnostic errors that result from thedifficulty and subjectivity of visual interpretation, the development of computerized image analysis techniques is of paramountimportance [5].Automated border detection is often the first step in the automated or semi-automated analysis of dermoscopy images[6, 7, 8, 9, 10, 11, 12]. It is crucial for the image analysis for two main reasons. First, the border structure provides importantinformation for accurate diagnosis, as many clinical features, such as asymmetry, border irregularity, and abrupt border cutoff,are calculated directly from the border. Second, the extraction of other important clinical features such as atypical pigmentnetworks, globules, and blue-white areas, critically depends on the accuracy of border detection. Automated border detectionis a challenging task due to several reasons: (i) low contrast between the lesion and the surrounding skin, (ii) irregular andfuzzy lesion borders, (iii) artifacts such as black frames, skin lines, hairs, and air bubbles, (iv) variegated coloring inside thelesion.A number of methods have been developed for preprocessing dermoscopy images. Most of these focused on the removal ofartifacts such as hairs and bubbles. Of the studies dealing with hair removal, Lee et al. [13] and Schmid [6] approached theproblem using mathematical morphology. Fleming et al. [5] applied curvilinear structure detection with various constraintsfollowed by gap filling. More recently, Zhou et al. [14] improved Fleming et al. ’s approach using feature guided examplar-basedinpainting. A method for bubble removal was introduced in [5], where the authors utilized a morphological top-hat operatorfollowed by a radial search procedure.In this article, we present a method for approximate lesion localization in dermoscopy images. First, the black framearound the image is removed using an iterative algorithm. Then, the approximate location of the lesion is determined usingan ensemble of thresholding algorithms. 1kin Research and Technology, 15(3): 314–322, 2009
Dermoscopy images often contain black frames that are introduced during the digitization process. These need to be removedbecause they might interfere with the subsequent lesion localization procedure. In order to determine the darkness of a pixelwith (R, G, B) coordinates, the lightness component of the HSL color space [15] is utilized: L = max( R, G, B ) + min(
R, G, B )2 (1)In particular, a pixel is considered to be black if its lightness value is less than 20. Using this criterion, the image is scannedrow-by-row starting from the top. A particular row is labeled as part of the black frame if it contains 60% black pixels. Thetop-to-bottom scan terminates when a row that contains less than the threshold percentage of pixels is encountered. The samescanning procedure is repeated for the other three main directions. Fig. 1 shows the result of this procedure on a sample image. (a) Original image (b) After frame removal
Figure 1: Black frame removal
Although dermoscopy images can be quite large, the actual lesion often occupies a relatively small area. Therefore, if we candetermine the approximate location of the lesion, the border detection algorithm can focus on this region rather than thewhole image. An accurate bounding box (the smallest axis-aligned rectangular box that encloses the lesion) might be usefulfor various reasons: (i) it provides an estimate of the lesion size (certain image segmentation algorithms such as region growingand morphological flooding can use the size of the region as a termination criterion), (ii) it might improve the border detectionaccuracy since the procedure is focused on a region that is guaranteed to contain the lesion, (iii) it speeds up the borderdetection since the procedure is performed on a region that is often smaller than the whole image, (iv) its surrounding mightbe utilized in the estimation of the background skin color, which is useful for various operations including the elimination ofspurious regions that are discovered during the border detection procedure [10] and the extraction of dermoscopic features suchas blotches [16] and blue-white areas [17].In many dermoscopic images, the lesion can be roughly separated from the background skin using a grayscale thresholdingmethod applied to the blue channel [8, 9]. While there are a number of thresholding methods that perform well in general,the effectiveness of a method strongly depends on the statistical characteristics of the image [18]. Fig. 2 illustrates thisphenomenon . Here, methods 2(d), 2(e), and 2(g) perform quite well. In contrast, methods 2(c) and 2(h) underestimatethe optimal threshold, whereas method 2(f) overestimates the optimal threshold. Although method 2(c) is the most popularthresholding algorithm in the literature [25], for this particular image, it performs the second worst.A possible approach to overcome this problem is to fuse the results provided by an ensemble of thresholding algorithms.In this way, it is possible to exploit the peculiarities of the participating thresholding algorithms synergistically, thus arrivingat more robust final decisions than is possible with a single thresholding algorithm. We note that the goal of the fusion is notto outperform the individual thresholding algorithms, but to obtain accuracies comparable to that of the best thresholdingalgorithm independently of the image characteristics. In this study, we used the threshold fusion method proposed by Melgani[18], which we describe briefly in the following.Let X = { x mn : m = 0 , , . . . , M − , n = 0 , , . . . , N − } be the original scalar M × N image with L possible gray levels( x mn ∈ { , , . . . , L − } ) and Y = { y mn : m = 0 , , . . . , M − , n = 0 , , . . . , N − } be the binary output of the thresholdfusion. Consider an ensemble of P thresholding algorithms. Let T i and A i ( i = 1 , , . . . , P ) be the threshold value and the The frame of this image is left intact for visualization purposes. kin Research and Technology, 15(3): 314–322, 2009 (a) Original image (b) Blue channel(c) Otsu’s method [19] ( T = 137) (d) Kapur et al. ’s method [20] ( T = 178)(e) Huang & Wang’s method [21] ( T = 183) (f) Yen et al. ’s method [22] ( T = 200)(g) Sahoo et al. ’s method [23] ( T = 179) (h) Li & Tam’s method [24] ( T = 59) Figure 2: Comparison of various thresholding methods ( T : threshold)output binary image associated with the i-th algorithm of the ensemble, respectively. Within a Markov Random Field (MRF)framework the fusion problem can be formulated as an energy minimization task. Accordingly, the local energy function U mn to be minimized for the pixel ( m, n ) can be written as follows:U mn = β SP · U SP (cid:2) y mn , Y S ( m, n ) (cid:3) + P X i =1 β i · U II (cid:2) y mn , A Si ( m, n ) (cid:3) (2)where S is a predefined neighborhood system associated with pixel ( m, n ), U SP ( · ) and U II ( · ) refer to the spatial and inter-image energy functions, respectively, whereas β SP and β i ( i = 1 , , . . . , P ) represent the spatial and inter-image parameters,respectively. The spatial energy function can be expressed as:U SP (cid:2) y mn , Y S ( m, n ) (cid:3) = − X y pq ∈ Y S ( m,n ) I ( y mn , y pq ) (3)kin Research and Technology, 15(3): 314–322, 2009where I( ., . ) is the indicator function defined as:I ( y mn , y pq ) = (cid:26) y mn = y pq II (cid:2) y mn , A S ( m, n ) (cid:3) = − X A i ( p,q ) ∈ A Si ( m,n ) α i ( x pq ) · I [ y mn , A i ( p, q )] (5)where α i ( · ) is a weight function given by: α i ( x mn ) = 1 − exp ( − γ | x mn − T i | ) (6)This function controls the effect of unreliable decisions at the pixel level that can be incurred by the thresholding algorithms.At the global (image) level decisions are weighed by the inter-image parameters β i ( i = 1 , , . . . , P ), which are computed asfollows: β i = exp (cid:0) − γ (cid:12)(cid:12) ¯ T − T i (cid:12)(cid:12)(cid:1) (7)where ¯ T is the average threshold value: ¯ T = 1 P P X i =1 T i (8)The MRF fusion strategy proposed in [18] is as follows:1. Apply each thresholding algorithm of the ensemble to the image X to generate the set of thresholded images A i ( i =1 , , . . . , P )2. Initialize Y by minimizing for each pixel ( m, n ) the local energy function U mn defined in Eq. 2 without the spatial energyterm i.e., by setting β SP = 0.3. Update Y by minimizing for each pixel ( m, n ) the local energy function U mn defined in Eq. 2 including the spatial energyterm i.e., by setting β SP = 0.4. Repeat step 3 K max times or until the number of different labels in Y computed over the last two iterations becomesvery small.In our preliminary experiments, we observed that, besides being computationally demanding, the iterative part (step 3) ofthe fusion algorithm makes only marginal contribution to the quality of the results. Therefore, in this study, we consideredonly the first two steps. The γ parameter was set to the recommended value of 0 . α (Eq. 6)and β (Eq. 7) values were precalculated and the neighborhood system S was chosen as a 3 × et al. ’s [20], Huang & Wang’s[21], Yen et al. ’s [22] , Sahoo et al. ’s [23], and Li & Tam’s [24] methods. In order to determine the best combination, weevaluated ensembles with 3 (20 ensembles), 4 (15 ensembles), 5 (6 ensembles), and 6 (1 ensembles) methods.Fig. 3 shows the output of four particular ensembles: Otsu-Kapur-Huang, Yen-Sahoo-Li, Otsu-Kapur-Huang-Yen, andHuang-Yen-Sahoo-Li. Note that each ensemble contains at least one method that either underestimates or overestimates theoptimal threshold. It can be seen that each ensemble performs equally well, which demonstrates that failures in pathologicalcases might be prevented using a proper fusion strategy.Fig. 4(a) shows the result of the ensemble Otsu-Kapur-Huang-Sahoo. Here, the blue bounding box encloses the dermatologistdetermined border (see Section 3), whereas the red one encloses the binary output of the threshold fusion. It can be seen thatthe red box is completely contained within the blue box. This was observed in many cases because the automated thresholdingmethods tend to find the sharpest pigment change, whereas the dermatologists choose the outmost detectable pigment. Weexperimented with two different expansion methods to solve this problem. The first one involves expanding the automatic boxby P % in four main directions. In other words, an automatic box of size M B × N B is expanded by M B · P/
100 pixels in theWest and East directions and N B · P/
100 pixels in the North and South directions. The second one involves incrementingthe threshold values obtained by each algorithm in the ensemble by G gray levels. In the rest of this article, we will refer tothese expansion methods as non-adaptive and adaptive, respectively. Figs. 4(a) and 4(b) show the results of these methodswith the expanded box shown in green. In this particular example, the non-adaptive method performs better in bringing theautomatic box closer to the manual one. In order to determine the optimal expansion amounts we evaluated P ∈ { , , , } and G ∈ { , , , } .kin Research and Technology, 15(3): 314–322, 2009 (a) Otsu-Kapur-Huang (b) Yen-Sahoo-Li(c) Otsu-Kapur-Huang-Yen (d) Huang-Yen-Sahoo-Li Figure 3: Comparison of various threshold ensembles (a) Non-adaptive expansion with P = 3% (b) Adaptive expansion with G = 6 Figure 4: Comparison of the bounding box expansion methods
The proposed method was tested on a set of 428 dermoscopy images obtained from the EDRA Interactive Atlas of Dermoscopy[2] and the Keio University Hospital. These were 24-bit RGB color images with dimensions ranging from 771 ×
507 pixels to768 ×
512 pixels. An experienced dermatologist (WVS) determined the manual borders by selecting a number of points on thelesion border, which were then connected by a second-order B-spline. The bounding box error was quantified using the gradingsystem developed by Hance et al. [26] ε = Area( AutomaticBox ⊕ ManualBox )Area(
ManualBox ) ·
100 (9)where
AutomaticBox is the binary image obtained by filling the bounding box of the fusion output,
ManualBox is thebinary image obtained by filling the bounding box of the dermatologist-determined border, ⊕ is the exclusive-OR operation,which essentially determines the pixels for which the AutomaticBox and
ManualBox disagree, and Area( I ) denotes the numberkin Research and Technology, 15(3): 314–322, 2009Table 1: Ensemble statistics ( µ : mean, σ : std. dev., ε i : initial box error, ε x : expanded box error) Ensemble Expansion Method µ ε i σ ε i µ ε x σ ε x µ s σ s Otsu-Kapur-Huang-Sahoo Non-adaptive ( P = 2) 10.25 8.10 7.58 8.13 268.31 185.64Otsu-Huang-Yen-Li Non-adaptive ( P = 4) 11.92 7.59 7.89 6.30 260.55 183.85Otsu-Huang-Sahoo-Li Non-adaptive ( P = 4) 11.98 7.62 7.90 6.20 260.95 184.14Otsu-Huang-Sahoo Non-adaptive ( P = 2) 11.14 7.17 7.91 6.71 273.84 195.69Otsu-Kapur-Huang-Sahoo Adaptive ( G = 6) 10.25 8.10 9.27 7.68 276.92 192.14Kapur-Huang-Sahoo-Li Adaptive ( G = 8) 10.98 7.66 9.43 7.69 279.03 194.42Otsu-Kapur-Huang-Sahoo Adaptive ( G = 4) 10.25 8.10 9.44 7.56 279.98 194.26Kapur-Huang-Sahoo-Li Adaptive ( G = 6) 10.98 7.66 9.67 7.58 282.09 196.58Table 2: Individual statistics ( µ : mean, σ : std. dev., ε i : initial box error, ε x : expanded box error) Thresholding Method Expansion Method µ ε i σ ε i µ ε x σ ε x µ s σ s Otsu Non-adaptive ( P = 2) 12.05 9.10 9.00 8.95 275.07 199.28Kapur Non-adaptive ( P = 2) 12.87 16.86 12.68 17.56 261.95 197.94Huang Non-adaptive ( P = 2) 20.31 67.97 17.17 69.76 269.59 190.09Yen Non-adaptive ( P = 2) 14.98 27.12 15.74 27.74 255.61 250.53Sahoo Non-adaptive ( P = 2) 13.43 24.60 13.37 25.19 254.43 184.36Li Non-adaptive ( P = 2) 15.12 9.65 11.06 9.07 293.54 215.80Otsu Non-adaptive ( P = 4) 12.05 9.10 9.10 9.14 256.86 182.82Kapur Non-adaptive ( P = 4) 12.87 16.86 15.54 18.61 245.36 183.78Huang Non-adaptive ( P = 4) 20.31 67.97 16.83 70.69 251.99 174.44Yen Non-adaptive ( P = 4) 14.98 27.12 19.32 28.49 239.46 230.91Sahoo Non-adaptive ( P = 4) 13.43 24.60 16.43 25.98 238.32 170.38Li Non-adaptive ( P = 4) 15.12 9.65 9.41 7.99 273.93 198.61of pixels in the binary image I .We determined the optimal parameter combination for the presented approximate bounding box computation method asfollows. First, the black frame removal procedure described in Section 2.1 is performed on each image in the data set. Thelesion bounding box is then computed using the fusion method described in Section 2.2 with one of the 42 ensembles. Finally,the approximate bounding box is expanded using either the non-adaptive method with P ∈ { , , , } or the adaptive methodwith G ∈ { , , , } . Table 1 shows various statistics associated with the four most accurate ensembles for each expansionmethod. The last two columns refer to the mean and standard deviation values, respectively for the percentage image sizereduction, i.e. Area( AutomaticBox ) M · N · G parameter. In contrast, the latter always expands the approximate box by an amount specified by the P parameter.Table 2 shows the statistics for the individual thresholding methods. Note that, due to space limitations, we report onlythe results of the non-adaptive expansion method (as in the ensemble case, the adaptive method has inferior performance). Itcan be seen that, in most configurations, the individual methods obtain significantly higher mean errors than the best ensemblemethods, i.e. the first four rows of Table 1. This is because, as explained in Section 2.2, the individual methods are more proneto catastrophic failures when given pathological input images. The high standard deviation values also support this explanation.Only the performance of Otsu (with P = 2 ,
4) and Li et al. ’s (with P = 4) methods is close to the performance of the ensembles.However, as mentioned in Section 2.2, the goal of fusion is not to outperform the individual thresholding algorithms, but toobtain accuracies comparable to that of the best thresholding algorithm independently of the image characteristics.As mentioned in Section 2.2, an accurate bounding box can provide an estimate of the lesion size, i.e. Area( ManualBorder ).In order to verify this, we calculated the best fitting line for Area(
AutomaticBox ) vs. Area(
ManualBorder ) using the generalizedleast-squares method [27]: Area(
ManualBorder ) ≈ Area(
AutomaticBox ) · . − . ManualBorder is the binary image obtained by filling the dermatologist-determined border. The accuracy of this relationwas calculated by plugging the area of the approximate bounding box for each image into Eq. 10, and then comparing theresult with the actual area of the lesion, which is calculated from the dermatologist-determined border. The percentage meankin Research and Technology, 15(3): 314–322, 2009and standard deviation errors over the entire image set were 11 .
88 and 11 .
49, respectively. These results demonstrate that thelesion size can be estimated from the bounding box area with relatively high accuracy. An even better estimate can be madefrom the binary output of the threshold fusion. The best fitting line for Area(
FusionOutput ) vs. Area(
ManualBorder ) wascalculated as: Area(
ManualBorder ) ≈ Area(
FusionOutput ) · . − . FusionOutput is the binary output of the threshold fusion. The percentage mean and standard deviation errors for thisrelation were 8 .
16 and 8 .
54, respectively.Fig. 5 shows sample bounding box computation results obtained using the ensemble Otsu-Kapur-Huang-Sahoo with P = 2.It can be seen that the presented method determines an accurate bounding box even for lesions with fuzzy borders. We notethat while the expansion operation is useful in most cases, in some cases such as Fig. 5(d), it might deteriorate the resultsslightly. In this paper, an automated method for approximate lesion localization in dermoscopy images is presented. The method iscomprised of three main phases: black frame removal, initial bounding box computation using an ensemble of thresholdingalgorithms, and expansion of the initial bounding box. The execution time of the method is about 0.15 seconds for a typicalimage of size 768 ×
512 pixels on an Intel Pentium D 2.66Ghz computer.The presented method may not perform well on images with significant amount of hair or bubbles since these elementsalter the histogram, which in turn results in biased threshold computations. For images with hair, a preprocessor such as
DullRazor TM [13] might be helpful. Unfortunately, the development of a reliable bubble removal method remains an openproblem.Future work will be directed towards testing the utility of the presented method in a border detection study. The imple-mentation of the threshold fusion method will be made publicly available as part of the Fourier image processing and analysislibrary, which can be downloaded from http://sourceforge.net/projects/fourier-ipal Acknowledgments
This work was supported by grants from Louisiana Board of Regents (LEQSF2008-11-RD-A-12) and NIH (SBIR
References [1] Jemal A, Siegel R, Ward E et al.
Cancer Statistics, 2008. CA: A Cancer Journal for Clinicians 2008; 58(2): 71-96, 2008.[2] Argenziano G, Soyer HP, De Giorgi V et al.
Dermoscopy: A Tutorial. Milan, Italy: EDRA Medical Publishing & NewMedia, 2002.[3] Steiner K, Binder M, Schemper M et al.
Statistical Evaluation of Epiluminescence Dermoscopy Criteria for MelanocyticPigmented Lesions. Journal of American Academy of Dermatology 1993; 29(4): 581-588.[4] Binder M, Schwarz M, Winkler A et al.
Epiluminescence Microscopy. A Useful Tool for the Diagnosis of Pigmented SkinLesions for Formally Trained Dermatologists. Archives of Dermatology 1995; 131(3): 286-291.[5] Fleming MG, Steger C, Zhang J et al.
Techniques for a Structural Analysis of Dermatoscopic Imagery. ComputerizedMedical Imaging and Graphics 1998; 22(5): 375-389.[6] Schmid P. Segmentation of Digitized Dermatoscopic Images by Two-Dimensional Color Clustering. IEEE Trans. on MedicalImaging 1999; 18(2): 164-171.[7] Erkol B, Moss RH, Stanley RJ, Stoecker WV, Hvatum E. Automatic Lesion Boundary Detection in Dermoscopy ImagesUsing Gradient Vector Flow Snakes. Skin Research and Technology 2005; 11(1): 17-26.[8] Iyatomi H, Oka H, Saito M et al.
Quantitative Assessment of Tumor Extraction from Dermoscopy Images and Evaluationof Computer-based Extraction Methods for Automatic Melanoma Diagnostic System. Melanoma Research 2006; 16(2):183-190.[9] Celebi ME, Aslandogan YA, Stoecker WV et al.
Unsupervised Border Detection in Dermoscopy Images. Skin Researchand Technology 2007; 13(4): 454-462.[10] Celebi ME, Kingravi HA, Iyatomi H et al.
Border Detection in Dermoscopy Images Using Statistical Region Merging. SkinResearch and Technology 2008; 14(3): 347-353.kin Research and Technology, 15(3): 314–322, 2009 (a) ε i = 3 .
26% , ε x = 1 .
83% (b) ε i = 4 .
89% , ε x = 3 . ε i = 10 .
87% , ε x = 4 .
03% (d) ε i = 3 .
18% , ε x = 4 . ε i = 6 .
75% , ε x = 5 .
36% (f) ε i = 14 .
24% , ε x = 9 . ε i = 14 .
05% , ε x = 10 .
91% (h) ε i = 18 .
36% , ε x = 13 . Figure 5: Sample bounding box computation results ( ε i : initial box error, ε x : expanded box error)kin Research and Technology, 15(3): 314–322, 2009[11] Zhou H, Schaefer G, Sadka A, Celebi ME. Anisotropic Mean Shift Based Fuzzy C-Means Segmentation of DermoscopyImages. IEEE Journal of Selected Topics in Signal Processing 2009; 3(1): 26-34.[12] Celebi ME, Iyatomi H, Schaefer G, Stoecker WV. Lesion Border Detection in Dermoscopy Images. Computerized MedicalImaging and Graphics 2009; 33(2): 148-153.[13] Lee TK, Ng V, Gallagher R. et al. Dullrazor: A Software Approach to Hair Removal from Images. Computer in Biologyand Medicine 1997; 27(6): 533-543.[14] Zhou H, Chen M, Gass R et al.
Feature-Preserving Artifact Removal from Dermoscopy Images. Proc. of the SPIE MedicalImaging 2008 Conf., 6914: 1B1-9.[15] Levkowitz H, Herman GT. GLHS: A Generalized Lightness, Hue, and Saturation Color Model. CVGIP: Graphical Modelsand Image Processing 1993; 55(4): 271-285.[16] Stoecker WV, Gupta K, Stanley RJ et al.
Detection of Asymmetric Blotches in Dermoscopy Images of Malignant MelanomaUsing Relative Color. Skin Research and Technology 2005; 11(3): 179-184.[17] Celebi ME, Iyatomi H, Stoecker WV. et al.et al.