Tumor and microcalcification characterization using Entropy, Fractal Dimension and intensity values statistical analysis in mammography
Cristian Heber Zepeda Fernández, Minerva Guadalupe Vázquez Domínguez, Eduardo Moreno Barbosa, Benito de Celis Alonso, Karla Herrera, Mario Rodríguez Cahuantzi
C.H. Zepeda Fernández,∗ M.G. Vázquez Domínguez, E. Moreno Barbosa, B. De Celis Alonso, K. Herrera, and M. Rodríguez Cahuantzi
Cátedra CONACyT, 03940, CdMx, Mexico
Facultad de Ciencias Físico Matemáticas, Benemérita Universidad Autónoma de Puebla, Av. San Claudio y 18 Sur, Ciudad Universitaria 72570, Puebla, Mexico
Hospital de la Mujer Puebla, Antiguo Camino Guadalupe Hidalgo 11350, Agua Santa, Guadalupe Hidalgo, 72490 Puebla, Pue.
Digital analysis of mammographic images is a complementary tool to clinical evaluation, commonly used to identify tumors and/or microcalcifications in mammograms. Recent mammographic equipment can classify them automatically using this methodology. The difficulty in finding and classifying such areas arises from several factors, such as the image acquisition methodology, excess brightness, and the similar physiological and radiological properties of tissues. In this work it is proposed that the numerical computation of fractal dimension and entropy can be used to automatically segment and distinguish malignant tumors and/or microcalcifications in digital mammograms. The study consisted of segmenting the image into two areas, background and confirmed malignant tumor and/or microcalcification, for which the fractal dimension and entropy values were calculated and a correlation between them was found. The analysis was performed on grayscale re-digitized images, obtained from images provided locally by a hospital for microcalcifications and from public databases for malignant tumors. From these re-digitized images, it was possible to give the exact coordinates provided by the diagnosis. For every image, the highest intensity value was located in the tumor and/or microcalcification area; the fractal dimension of this area was higher than in the rest of the image, while the entropy was lower, due to the uniformity of the intensities in these areas. To complete the study, a data analysis was performed on the set of intensities in each area. This made it possible to distinguish the areas of interest by their intensity values, which lay more than 3σ from the mean intensity of the surrounding tissue. Finally, a technique to visually highlight the malignant tumor and/or microcalcification is shown. With these three techniques it is possible to distinguish a malignant tumor and/or microcalcification in a mammogram.
Keywords: Fractal dimension, entropy, data analysis, digital mammography, tumor, microcalcification
I. INTRODUCTION
According to the World Cancer Research Fund, breast cancer ranks second of all cancers based on its prevalence (lung cancer being the first) [1]; it is also the cancer that most commonly occurs in women. Due to this incidence, a great variety of clinical studies have been developed and are used to diagnose it. Mammograms are the current clinical gold standard and are used to find anomalous masses, which can be either malignant or benign tumors. With this technique, it is possible to observe microcalcifications, macrocalcifications and fatty tissue. In general, a tumor appears in the image as an amorphous mass, while a microcalcification appears as a dot; microcalcifications are usually not a manifestation of cancer unless many of them are present in a structured pattern [2]. In some cases, it can be difficult to distinguish these structures and make a diagnosis, due to fatty tissue, the image acquisition methodology, excess brightness, artefacts, etc. Therefore, it is necessary to use other techniques to determine the location of the malignant tumor and/or microcalcification (T/MC). The use of artificial intelligence (AI) analysis is of high interest, as it can complement clinical diagnosis and make it more precise [3]. There are several techniques based on mathematical treatments to characterize the area of the tumor; one of them is to calculate the entropy value [4]. Another kind of entropy is the Shannon entropy (S) from information theory [5], which is given by the equation:

$$S = -\sum_{i=1}^{n} p_i \log p_i \qquad (1)$$

where p_i is the grayscale value of the i-th pixel and n is the total number of pixels contained in an area of the image; this area can be a section of the total image.

Since mammograms are digital images, they can be analyzed in terms of topological properties. Therefore, it is possible to associate a Fractal Dimension (FD) with these images or with a given region in them. This quantity has been studied in mammographic densities [6] and is calculated with:

$$FD = 2 - \frac{\log[A(\epsilon)]}{\log[\epsilon]} \qquad (2)$$

where A(ε) is the area of the image where the FD is calculated and ε is the size of the pixel. The area of the region chosen for analysis should not be confused with the value of A(ε). The former represents the size of the image (or a section of it) using the Euclidean metric, while A(ε) depends on the intensity and on the pixel size of the image (or section). There are other techniques that calculate the fractal dimension using boxes [7]; however, these techniques do not consider the intensity information that this work is interested in for Eq. 2. Therefore, they were not used in this work.

Mammograms are obtained by the interaction of X-rays with matter and, due to the nature of these electromagnetic waves, they pass through differently depending on the material. The image obtained shows different values in a grayscale; therefore, this work aims to show that it is possible to characterize T/MC regions and distinguish them from the rest of the image. This is achieved by calculating the values of FD and S in these areas and comparing them with other areas of the image. It is shown that these parameters also appear to be correlated.

∗ Corresponding author, email: [email protected]
arXiv: [physics.med-ph] Jan

II. METHODS AND DATA SELECTION
Five microcalcification cases were selected from patients of the "Hospital de la Mujer" in Puebla. Ten mammograms of malignant tumors were obtained from PEIPA, the Pilot European Image Processing Archive: The mini-MIAS database of mammograms [8]. Both sets of mammograms were complemented with their respective clinical diagnoses.

In order to develop an algorithm to study FD and S in mammograms, the analysis began by transforming the mammographic images from PNG format into an intensity map using the CERN ROOT software [9]. These new images were called re-digitized. In other words, the mammograms were transformed into a 3D matrix in which two axes were the geometrical coordinates (no units) of the pixels and the third axis was the intensity value, which could be normalized by taking the maximum value. Each transformation consisted of 97,604 pixels, with a pixel size of 0 . × . sigmas (σ). Finally, because the intensities were normalized, it was possible to highlight the values associated with the T/MC by raising them to some power. In this work, three techniques are shown to differentiate the T/MC area from the rest of the image.

III. ANALYSIS AND RESULTS
As a first and important result, the intensity map made it possible to distinguish the T/MC area from the rest of the breast, because the highest intensity value was located in this area.
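The re-digitization described in Section II and the result above can be illustrated with a minimal sketch. It assumes the image is already available as a list of rows of grayscale values; the original work used CERN ROOT for this step, and the function names `normalize` and `brightest_pixel` and the 4x4 patch are hypothetical, introduced only for illustration.

```python
def normalize(image):
    """Divide every pixel by the maximum value so intensities lie in [0, 1]."""
    peak = max(max(row) for row in image)
    if peak <= 0:
        raise ValueError("image has no positive intensity to normalize by")
    return [[value / peak for value in row] for row in image]


def brightest_pixel(image):
    """Return (row, column, value) of the highest-intensity pixel."""
    best = (0, 0, image[0][0])
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best


# Hypothetical 4x4 grayscale patch (8-bit values); the single bright
# pixel stands in for a T/MC candidate.
patch = [
    [90, 100, 110, 95],
    [105, 120, 228, 115],
    [100, 130, 140, 110],
    [85, 95, 100, 90],
]

intensity_map = normalize(patch)
row, col, value = brightest_pixel(intensity_map)
print(row, col, value)
```

After normalization the brightest pixel has intensity 1 by construction, so its position is the candidate location to compare with the diagnosis.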
A. T-mammograms analysis
The real images were given in a matrix of 1024 ×

Figure 1. One of the mammographic images used in this study. (Top) Mammographic image of the microcalcification type and (bottom) its transformation into a 3D phase-space. Top panel: (1) MC, (2) C, and (3) and (4) t. Bottom panel: the same image with the same tissues in phase-space.

coordinates of (338,314) for the center of T, and the coordinates in the transformation for the highest intensity value were (346,310) or, in the coordinate system, (-0.320313,-0.400391). To show the consistency between the diagnosis and the measured value, Table I lists the percentage error for the x and y coordinates, using

$$P_{\mathrm{error}_i} = \frac{|m_i - d_i|}{d_i}, \qquad (3)$$

where m is the coordinate value obtained by the transformation, d is the value given by the diagnosis, and i = x, y.

Fractal dimension and entropy results
The FD and S values of each area for the 10 T-mammograms are shown in Figure 3. The FD is higher for the T area than for the t area, while S is higher for the t area than for the T area. This trend is the same for all samples, so it was possible to distinguish between the T and t areas.
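As a rough illustration of Eqs. 1 and 2, the sketch below evaluates both quantities in plain Python. It is not the authors' implementation. The entropy follows Eq. 1 literally, with p_i taken as the normalized grayscale value of pixel i, and assumes the natural logarithm and the convention 0 log 0 = 0. The text does not spell out how A(ε) is obtained, so it is passed in as an input. The function names and sample values are hypothetical.

```python
import math


def shannon_entropy(values):
    """Eq. 1 as written in the text: S = -sum(p_i * log(p_i)).

    `values` are normalized grayscale intensities in [0, 1].
    Assumptions: natural logarithm; terms with p_i = 0 contribute 0.
    """
    return -sum(p * math.log(p) for p in values if p > 0)


def fractal_dimension(area, pixel_size):
    """Eq. 2: FD = 2 - log(A(eps)) / log(eps).

    `area` is A(eps), supplied by the caller because the text does not
    specify its computation; `pixel_size` is eps and must differ from 1.
    """
    if area <= 0 or pixel_size <= 0 or pixel_size == 1:
        raise ValueError("need area > 0 and pixel_size > 0, pixel_size != 1")
    return 2 - math.log(area) / math.log(pixel_size)


# Hypothetical regions: uniformly bright pixels versus mixed intensities.
uniform_region = [0.95, 0.95, 0.95, 0.95]
varied_region = [0.3, 0.5, 0.7, 0.4]
s_uniform = shannon_entropy(uniform_region)
s_varied = shannon_entropy(varied_region)

# For a pixel size below 1, a larger A(eps) yields a larger FD.
fd_small = fractal_dimension(0.5, 0.5)
fd_large = fractal_dimension(1.0, 0.5)
print(s_uniform, s_varied, fd_small, fd_large)
```

With these sample values the uniform bright region gives the lower entropy and the larger A(ε) gives the higher FD, which is the direction of the trend reported in Figure 3.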
Figure 2. The location of the (1) T and (2) t areas for (top) the real image and (bottom) its transformation.

Table I. Percentage error between the diagnostic coordinate and the transformation coordinate.

Sample   P_error x (%)   P_error y (%)
1        2.36            5.75
2        2.93            6.42
3        1.77            0.35
4        1.06            4.12
5        0.18            8.51
6        2.36            0.02
7        0.02            3.45
8        4.02            0.01
9        1.85            2.65
10       1.42            4.16

Statistical analysis
A statistical analysis of the intensities was proposed as another method to distinguish between the areas. As an example, Figure 4 shows the T and t ROIs, chosen according to the diagnosis, for the mammogram of Figure 2. To start the analysis, Figure 5 shows the intensity distribution of the whole sample and Figure 6 shows the intensity distribution of the T-area. Finally, Figure 7 shows the Crystal Ball fit of the intensity distribution of a selected t-area, where the values σ = 0 . ± . 004 and mean = 0 . ± . were found.

Figure 3. Values of FD and S for the T and t areas. The trend of these values between the areas is the same for all samples.

Figure 4. a) T and b) t ROIs from Figure 2, chosen according to the diagnosis.
Comparing Figure 5 and Figure 6, it is simple to see that there are more intensity values from 0.5 to 0.6 in the whole distribution than in the T-area. This means that in the T-area there are intensity values that match those of tissue. As a first approximation, it was possible to separate these areas by taking the maximum intensity value in the T-area, 0.890625, and the mean value in the t-area. The maximum intensity value of the T-area was then located at 31-σ from the mean value in the t-area. This analysis was repeated for the rest of the mammograms. Table II lists the σ and mean values of the fitted t-area distribution and the maximum value for the T-area. It is simple to note that the maximum intensity value (located in the T-area) of each sample is more than 3-σ from the mean of its respective t-area, i.e., it lies outside the interval around the mean that contains about 99.7% of the t-area intensity values.

Another treatment was made in order to distinguish the T and t areas. It consisted of taking the average intensity value of these areas; these values are shown in Table III. From this table, it is possible to infer a cut that excludes the t-area and keeps the T-area. As an example, Figure 8 shows the intensity values greater than 0.6 from Figure 2, which highlights the T-area, and the corresponding intensity distribution is shown in Figure 9. Another area can be noticed at the top left. The diagnosis does not specify what it is; it may be due to a problem in the acquisition of the study, as already mentioned in the Introduction.

Figure 5. Distribution of intensity values from Figure 2.

Figure 6. Distribution of T intensity values from Figure 4.

Figure 7. Crystal Ball fit of the intensity distribution of a t-area of Figure 4. The fit values are σ = 0 . ± . 004 and mean = 0 . ± .

Table II. mean and σ fit values from the fitted intensity distribution in the t-area. Last column: maximum intensity value in the mammogram, i.e., the maximum value in the T-area.

Sample   mean (fit, t-area)   σ (fit, t-area)   Maximum intensity value in T-area
1 0.4317 ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ±

Table III. Average intensity values in T-mammograms for the t and T areas.

Sample   t-area   T-area
1        0.629    0.851
2        0.855    0.917
3        0.632    0.945
4        0.707    0.941
5        0.652    0.945
6        0.640    0.820
7        0.882    0.894
8        0.718    0.808
9        0.765    0.878
10       0.820    0.890

Figure 8. Result of selecting intensity values greater than 0.6 in Figure 2.
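The two statistical criteria of this subsection can be sketched as follows: the distance of the maximum intensity from the fitted t-area mean in units of σ, and an intensity cut that keeps only pixels above a threshold. The mean, σ, maximum and patch values are hypothetical and are not taken from Table II; only the 0.6 threshold comes from the text. The helper names `sigma_distance` and `apply_cut` are likewise introduced only for illustration.

```python
def sigma_distance(value, mean, sigma):
    """Distance of `value` from `mean`, in units of `sigma`."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (value - mean) / sigma


def apply_cut(image, threshold):
    """Keep intensities strictly above `threshold`; set the rest to 0."""
    return [[v if v > threshold else 0.0 for v in row] for row in image]


# Hypothetical fit result for a t-area and maximum intensity of a T-area.
t_mean, t_sigma = 0.45, 0.05
max_intensity = 0.90
distance = sigma_distance(max_intensity, t_mean, t_sigma)
beyond_three_sigma = distance > 3

# Hypothetical normalized patch; the 0.6 cut mirrors the one used for Figure 8.
patch = [
    [0.45, 0.55, 0.62],
    [0.50, 0.90, 0.58],
    [0.40, 0.61, 0.47],
]
highlighted = apply_cut(patch, 0.6)
print(distance, beyond_three_sigma, highlighted)
```

Setting the rejected pixels to zero rather than removing them keeps the geometry of the image, so the surviving pixels can still be compared with the diagnosed coordinates.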
Highlight tumor values by powers
Figure 9. Intensity values distribution after having selectedintensity values greater than 0.6 in Figure 2.
Finally, when the intensity values are raised to certain powers, the area of the tumor stands out. As an example, Figure 10 shows the results of raising the intensity values of Figure 2 to different powers. Again, the area described above remains visible.
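The effect of the power transformation can be sketched as follows: since the intensities are normalized to [0, 1], raising them to a power n suppresses lower values more strongly than values close to 1. The patch values and the function name are hypothetical; the exponents 3, 5 and 10 are those quoted for Figure 10.

```python
def raise_to_power(image, exponent):
    """Raise every normalized intensity to the given power."""
    return [[value ** exponent for value in row] for row in image]


# Hypothetical normalized patch: tissue-like pixels (0.6) around one
# tumor-like pixel (0.9).
patch = [
    [0.6, 0.6],
    [0.6, 0.9],
]

# Ratio between the bright pixel and a tissue pixel for each exponent.
ratios = {}
for n in (1, 3, 5, 10):
    powered = raise_to_power(patch, n)
    ratios[n] = powered[1][1] / powered[0][0]

for n, ratio in ratios.items():
    print(n, round(ratio, 3))
```

The ratio between the two intensities equals (0.9/0.6)^n, so the contrast grows with the exponent while every value stays within [0, 1].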
B. MC-mammograms analysis
In these mammograms the C-area stands out: its central bin was always found to have the second-highest intensity value, after the MC. As an example, the coordinates of the MC in Figure 1 are (0.681957, 0.622490). These coordinates cannot be compared with real coordinates in the PNG image (as was done for the T-mammograms), because the diagnosis was based on the opinion of Dra. Herrera. However, the location of the MC matches the diagnosis.
Fractal dimension and entropy analysis
The FD and S values for the five MC-images are shown in Figure 11. The FD is higher for the MC than for the C and t areas, while S is higher for t than for the MC. The FD and S values of C can fluctuate, because it contains more intensity values than the MC and t areas; however, its FD value is always lower than that of the MC. Then, the T-area can be excluded.
Statistical analysis
As for the T-mammograms, a statistical analysis of the intensity values was made. The statistics in these images were lower than in the previous case; however, they were sufficient for the analysis in the t-area. As an example, Figure 12 shows the MC, C and t areas from Figure 1; the sizes of these areas were chosen to enclose the MC-area first. Figure 13 shows the intensity distribution of the whole sample. Note that in the t-area there are many more values than in the MC-area. The distribution of values of one of many t-areas is shown in Figure 14; the low statistics are due to the size of the selected area, relative to the size of the MC-area. The fit distribution has the values of mean = 0 . ± . σ = 0 . ± . σ from the mean t-value.

Figure 10. Results of raising the image of Figure 2 to the third, fifth and tenth power. Comparing the images, it is simple to see that the T-area becomes prominent.

Figure 11. Correlation between FD and S values for the MC, C and t areas from MC-mammograms.

Figure 12. a) MC, b) C and c) t areas from Figure 1, chosen to enclose the MC area. The low statistics compared with Figure 4 can be seen.
This analysis was made for the rest of the samples. Table IV shows the fit values of the t-area intensity distribution, the maximum intensity value (MC central value) and the maximum intensity value of the C-area.

As mentioned, the statistics in the C and MC areas were very low, with fewer than 50 entries. The main goal was to distinguish the t and MC areas. From Table IV it is simple to note that the MC values are located more than 3-σ from the mean value in the t-area, so the MC can be distinguished.

As was done for the T-mammograms, the average values were calculated for each area to infer the cuts and distinguish the MC-area. Table V shows the average values of the MC and t areas.

As an example, taking values greater than 0.7, Figure 1 was transformed as shown in Figure 15. Clearly, with a higher cut the MC-area becomes more prominent.

Figure 13. Distribution of intensity values for Figure 1.

Figure 14. Distribution of t intensity values, obtained by choosing one of many t-areas from Figure 1. The Gaussian fit values are: mean = 0 . ± . σ = 0 . ± .

Figure 15. Result of selecting intensity values greater than 0.7 from Figure 1.
Highlight microcalcification values by powers
Finally, Figure 16 shows the intensity values of Figure 1 raised to the third, sixth and fifteenth power. A white spot stands out, which corresponds to the MC location. This analysis gives the same results for the rest of the MC-mammograms.

Table IV. First column: Sample. Next two columns: mean and σ fit values from the fitted intensity distribution in the t-area. Last columns: maximum intensity values in the C-area and MC-area.

Sample   mean (fit, t-area)   σ (fit, t-area)   Maximum intensity value in C-area   Maximum intensity value in MC-area
10 . ± . . ± . . . . ± . . ± . . . . ± . . ± . . . . ± . . ± . . . . ± . . ± . . .

Table V. Average intensity values in MC-mammograms for the t and MC areas.

Sample   t-area   MC-area
1        0.574    0.902
2        0.495    0.916
3        0.529    0.965
4        0.464    0.930
5        0.242    0.811
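The cut of 0.7 used for Figure 15 can be checked against the average intensities of Table V with a short sketch. The numbers are those listed in Table V; the helper names `separates` and `admissible_cut_range` are hypothetical, and the check is only an illustration of how such a cut might be validated, not the procedure used by the authors.

```python
# Average intensities from Table V (MC-mammograms).
t_area_means = [0.574, 0.495, 0.529, 0.464, 0.242]
mc_area_means = [0.902, 0.916, 0.965, 0.930, 0.811]


def separates(cut, lower_group, upper_group):
    """True if every lower value is below `cut` and every upper value above it."""
    return all(v < cut for v in lower_group) and all(v > cut for v in upper_group)


def admissible_cut_range(lower_group, upper_group):
    """Return (low, high) bounds for a separating cut, or None if the groups overlap."""
    low, high = max(lower_group), min(upper_group)
    return (low, high) if low < high else None


cut_is_valid = separates(0.7, t_area_means, mc_area_means)
cut_range = admissible_cut_range(t_area_means, mc_area_means)
print(cut_is_valid, cut_range)
```

For the five samples of Table V any cut between the largest t-area average and the smallest MC-area average separates the two groups, and 0.7 lies inside that interval.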
IV. DISCUSSION AND CONCLUSIONS
The most important result was that the highest-intensity pixel in a mammogram was located in the T/MC area. With this result, the area of interest can be located. Figures 3 and 11 show that there is a correlation between FD and S for the t-area and the T/MC area. Note that the trend is the same regardless of whether it is a T or an MC. It can be concluded that the value of FD is higher for the T/MC area than for the corresponding t area. This result can be understood as follows: the pixel size in each mammogram is constant; then, according to Eq. 2, the FD is determined by A(ε), which is proportional to the intensity values in that area. Thus, as the pixels with the highest values are located in the T/MC area, the FD value is higher for these areas. The value of S behaves as in thermodynamics, where entropy usually describes the disorder of the system; in this case, it describes the uniformity or non-uniformity of the intensity values. Therefore, the conclusion is that in the t-area there is a great variety of pixel values, while in the T/MC area the pixel values are uniform. Finally, the differences in FD or S between samples may be due to the fatty tissue in each breast, the brightness with which the image was obtained, etc.

Figure 16. Results of raising the image of Figure 1 to the a) third, b) sixth and c) fifteenth power. A white spot stands out, which corresponds to the MC.

To have another method of distinction, a statistical intensity analysis was made to characterize t and then exclude it, keeping the intensity values in the T/MC area. From Tables II and IV it was possible to identify the t and T/MC areas; i.e., a second method was shown with which it is possible to distinguish a T/MC in a mammogram, since these values were more than 3-σ from the mean t-value. From Tables III and V, a cut of up to 0.7 in intensity value was suggested to keep the T/MC area, because for this analysis it was not important to know the shape of the T/MC. The results of this analysis can help the diagnosis by locating the T/MC even when the breast is too dense for it to be distinguished by the naked eye. It was also shown that, with the re-digitized image, it was simple to highlight the T/MC area by raising the intensity values to some power.

Finally, it is important to note that the image format of the mammogram is not so relevant when using this method, because the transformation to the 3D phase-space converts an image into pixel intensity information.

ACKNOWLEDGMENTS
We thank Dra. Karla Herrera for her support with the microcalcification mammograms and the University of South Florida Digital Mammography Home Page for their support with the tumor mammograms, both provided with a diagnosis, which made it possible to carry out this work.

[1] World Cancer Research Fund, American Institute for Cancer Research.
[2] BreastCancer.org.
[3] McKinney, S. Mayer and Sieniek et al. International evaluation of an AI system for breast cancer screening. Nature
[4] Analysis of different types of entropy measures for breast cancer diagnosis using ensemble classification. Biomed. Res.
[5] Local Shannon entropy measure with statistical tests for image randomness. Information Sciences (2013), pp. 323-342.
[6] J. W. Byng, N. F. Boyd, E. Fishell, R. A. Jong and M. J. Yaffe. Automated analysis of mammographic densities. Phys. Med. Biol. (1996), https://doi.org/10.1088/0031-9155/41/5/007.
[7] K. A. Smitha et al. Fractal analysis: fractal dimension and lacunarity from MR images for differentiating the grades of glioma