Characterization of Covid-19 Dataset using Complex Networks and Image Processing
Josimar Edinson Chire Saire
Institute of Mathematics and Computer Science (ICMC), University of São Paulo (USP), São Carlos, SP, [email protected]
Esteban Wilfredo Vilca Zuñiga
Dept. of Computing and Mathematics (FFCLRP), University of São Paulo (USP), Ribeirão Preto, [email protected]
Abstract — This paper explores the structure of the patterns behind a Covid-19 dataset. The dataset includes medical images of positive and negative cases. A sample of 100 images is chosen, 50 per class. A frequency histogram is calculated to obtain features through statistical measurements, alongside a feature extraction using the Grey Level Co-Occurrence Matrix (GLCM). Complex Networks are built from each feature set to analyze the adjacency matrices and check for the presence of patterns. Initial experiments provide evidence of hidden patterns in the dataset for each class, which are visible using the Complex Networks representation.
Index Terms — Covid-19, Network Information, Coronavirus, Complex Networks, Image Processing, Pattern Recognition, GLCM
I. INTRODUCTION
Covid-19 is a turning point in human history. It is destroying powerful economies and collapsing emerging countries. During its early stage, the virus had a reproduction number of 4.22 in Germany and the Netherlands. Even though developed countries have reduced the impact of the virus, it remains a severe problem in countries with poverty and weak health systems. Consequently, many efforts are focused on finding a vaccine, studying the virus, and finding automatic tools to support prognosis of the illness.

Along this direction, many groups are working with tomography and x-ray images to build models using Artificial Intelligence techniques, e.g. Deep Learning [1] [2] [3]. At the same time, a limitation in the first months was access to images of covid-19 patients. Deep Learning algorithms are based on Artificial Neural Networks with many layers, where each layer or group of layers has a specific function, but one limitation is the need for a large quantity of images. For this reason, data augmentation is commonly used to generate artificial images through rotation or added noise.

One approach to meaningful feature extraction that represents the internal patterns of a dataset is Complex Networks [4] [5] [6] [7]; previous experiments showed the strength of approaches using graph representation in comparison to classical Machine Learning algorithms. The presented results suggest a good representation of the internal patterns. Besides, it is possible to affirm that this Complex Network representation can capture these patterns using a small number of samples in comparison with Deep Learning.

In this paper, we propose a new technique based on Complex Networks to identify the virus in x-ray images using high-level algorithms to exploit the structure of the features from these images.

II. DATA AND METHODS
A keyword search, e.g. covid-19 tomography dataset, returned many datasets, but most are too large to download and process, and they come in a variety of image formats, e.g. nii, dicom. One available dataset is chosen because its png format is ready for processing. Fig. 1 presents 16 samples each of positive and negative cases.

Fig. 1: Sample of Dataset

A dataset of 100 images is selected, with positive and negative classes balanced. Consequently, it is necessary to find a transformation from images to Complex Networks. A first proposal uses the Frequency Histogram, because it can reduce dimensionality and represent the distribution of pixels. Beforehand, color images are transformed to grayscale. Later, a proposal using GLCM is applied to obtain neighborhood features based on texture analysis.

A. Frequency Histogram

The frequency histogram was calculated to obtain a lower-dimensional representation, and statistical features were computed from it: median, mean, standard deviation, kurtosis and skew. This histogram considers the three channels of the classical RGB image representation. Fig. 2 shows the frequency histograms of the previous sample of images. This representation reveals a visual difference between positive and negative cases.
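The histogram features described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the image is an (H, W, 3) uint8 RGB array, that the five statistics are taken over the bin counts of each channel's histogram, and that kurtosis is reported as excess kurtosis. The function name `histogram_features` and the `bins` parameter are hypothetical.

```python
import numpy as np


def histogram_features(image, bins=256):
    """Statistical descriptors of the per-channel frequency histogram.

    Assumes `image` is an (H, W, 3) uint8 RGB array and that the statistics
    are taken over the histogram bin counts (an assumption of this sketch).
    """
    features = []
    for channel in range(image.shape[2]):
        counts, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        counts = counts.astype(float)
        mean, std = counts.mean(), counts.std()
        centered = counts - mean
        if std > 0:
            skew = (centered ** 3).mean() / std ** 3
            kurtosis = (centered ** 4).mean() / std ** 4 - 3.0  # excess kurtosis
        else:
            # A perfectly flat histogram has no spread; report zeros.
            skew, kurtosis = 0.0, 0.0
        features.extend([np.median(counts), mean, std, kurtosis, skew])
    return np.array(features)


# Synthetic stand-in for a dataset image (not real data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(histogram_features(image).shape)  # (15,)
```

Each image is thus reduced to 15 values (5 statistics for each of 3 channels), which is the kind of low-dimensional vector the graph construction can work with.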
[Figure 2 panels: Covid (1).png to Covid (1011).png and Non-Covid (1).png to Non-Covid (1011).png, 16 images per class]
Fig. 2: Characterization of dataset using Frequency Histogram

Then, the graph is created using the Euclidean distance between points of the same class, see Fig. 3. The left side represents the positive class and the right side the negative one. A filtering was performed using the median of the distances; the bottom of the image shows the results.

Fig. 3: Characterization of dataset using Frequency Histogram
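The graph construction described above can be sketched as follows, under stated assumptions. The text does not say which side of the median is kept by the filtering, so this sketch keeps links whose distance is at most the median of the pairwise distances; the opposite choice is equally compatible with the description. The function name `distance_graph` is hypothetical.

```python
import numpy as np


def distance_graph(features):
    """Build a weighted graph from the feature vectors of one class.

    `features` is an (n_samples, n_features) array. Returns the full
    Euclidean distance matrix and a filtered adjacency matrix.
    """
    diffs = features[:, None, :] - features[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=-1))

    # Median over distinct pairs only (upper triangle, no diagonal).
    pair_distances = distances[np.triu_indices(len(features), k=1)]
    threshold = np.median(pair_distances)

    # Assumption: keep links at or below the median, drop the rest.
    adjacency = np.where(distances <= threshold, distances, 0.0)
    np.fill_diagonal(adjacency, 0.0)
    return distances, adjacency


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
distances, adjacency = distance_graph(points)
print(adjacency)
```

For the three toy points, the pairwise distances are 5, 10 and 5, so the median is 5 and only the two links of length 5 survive the filter.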
B. Grey Level Co-Occurrence Matrix features
The Grey Level Co-Occurrence Matrix (GLCM) algorithm [8] is a second-order statistical method used for texture feature extraction. From this matrix, the following features are extracted:
• contrast: $\sum_{i,j=0}^{levels-1} P_{i,j}\,(i-j)^2$
• dissimilarity: $\sum_{i,j=0}^{levels-1} P_{i,j}\,|i-j|$
• homogeneity: $\sum_{i,j=0}^{levels-1} \frac{P_{i,j}}{1+(i-j)^2}$
• ASM: $\sum_{i,j=0}^{levels-1} P_{i,j}^2$
• energy: $\sqrt{ASM}$
• correlation: $\sum_{i,j=0}^{levels-1} P_{i,j}\left[\frac{(i-\mu_i)(j-\mu_j)}{\sqrt{\sigma_i^2\,\sigma_j^2}}\right]$
These features are computed for 4 orientations: 0, 45, 90 and 135 degrees. A sample of the dataset is presented in Tab. I. A transformation from the RGB representation to grayscale is performed using this formula:
Image(i, j) = 0.… ∗ R + 0.… ∗ G + 0.… ∗ B,   (1)

where R, G and B are the red, green and blue channels of the image.

Fig. 4 presents the results for GLCM features. Column 1 shows the positive cases and column 2 the negative ones.

Fig. 4: Characterization of dataset using GLCM

Considering the previous results, with both the Frequency Histogram and GLCM it is possible to build Complex Networks using the Euclidean distance. Besides, the representation of the Complex Network through adjacency matrices presents reticular patterns. These patterns differ: positive cases present a distribution of larger distances between the nodes/elements than negative ones. By contrast, negative samples present only a few links with high distances.

TABLE I: GLCM Features
Feature (angle)      Value
dissimilarity 45     26.738676
dissimilarity 90     25.823644
dissimilarity 135    27.982243
correlation 0        0.816670
correlation 45       0.816058
correlation 90       0.831263
correlation 135      0.800997
homogeneity 0        0.107910
homogeneity 45       0.090083
...                  ...
contrast 135         2343.495992
ASM 0                0.001265
ASM 45               0.000535
ASM 90               0.000520
ASM 135              0.000526
energy 0             0.035566
energy 45            0.023127
energy 90            0.022796
energy 135           0.022930
label                1.0
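The GLCM features listed in this section can be sketched in plain NumPy as follows. This is an illustrative sketch under assumptions the text does not state: a pixel distance of 1, a non-symmetric co-occurrence matrix normalized to sum to 1, and the row/column offsets in `OFFSETS` for the four orientations. The names `glcm`, `glcm_features` and `OFFSETS` are hypothetical.

```python
import numpy as np

# Assumed (row, column) offsets at distance 1 for each orientation in degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(gray, offset, levels=256):
    """Normalized co-occurrence matrix P of an integer grayscale image."""
    dr, dc = offset
    rows, cols = gray.shape
    # Restrict to pixels whose neighbour at (dr, dc) lies inside the image.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    src = gray[r0:r1, c0:c1]
    dst = gray[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    matrix = np.zeros((levels, levels), dtype=float)
    np.add.at(matrix, (src.ravel(), dst.ravel()), 1.0)
    return matrix / matrix.sum()


def glcm_features(p):
    """Texture features of a normalized GLCM, following the formulas above."""
    i, j = np.indices(p.shape)
    diff = i - j
    asm = (p ** 2).sum()
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    var_i = (p * (i - mu_i) ** 2).sum()
    var_j = (p * (j - mu_j) ** 2).sum()
    denom = np.sqrt(var_i * var_j)
    # A constant image has zero variance; treat it as perfectly correlated.
    corr = (p * (i - mu_i) * (j - mu_j)).sum() / denom if denom > 0 else 1.0
    return {
        "contrast": (p * diff ** 2).sum(),
        "dissimilarity": (p * np.abs(diff)).sum(),
        "homogeneity": (p / (1.0 + diff ** 2)).sum(),
        "ASM": asm,
        "energy": np.sqrt(asm),
        "correlation": corr,
    }


# Toy 4-level image (not taken from the dataset).
toy = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]])
features = {angle: glcm_features(glcm(toy, off, levels=4)) for angle, off in OFFSETS.items()}
print(round(float(features[0]["contrast"]), 4))  # 0.5833
```

Concatenating the six features over the four orientations gives 24 values per image, matching the feature/angle layout of Tab. I.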
III. CONCLUSIONS
Representing covid-19 tomography images using Complex Networks is feasible. The intensity of the links represented through adjacency matrices shows a strong difference between the two classes. Although GLCM is a more elaborate technique for extracting neighborhood patterns from the images, the frequency histogram yields a similar representation. Although the two processes for creating the Complex Networks are different, they show similar behaviour, which is visible in the adjacency matrices. Besides, a comparison between Complex Networks approaches for High Level Classification will be presented.

IV. FUTURE WORK
The authors are considering including a higher number of samples for each class to obtain a higher diversity of images. The Complex Networks representation can be leveraged for High Level Classification tasks, so experiments in that direction will be performed.

V. ACKNOWLEDGMENTS
The authors thank Research4tech, an Artificial Intelligence (AI) community of Latin American researchers whose aim is to promote AI and build Science communities to catapult and strengthen the development of Latin American countries supported by Science and Technology, integrating the academic community, technology groups/communities, government and society.

REFERENCES
[1] I. D. Apostolopoulos and T. A. Mpesiana, “Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks,” Physical and Engineering Sciences in Medicine, vol. 43, no. 2, pp. 635–640, Jun 2020. [Online]. Available: https://doi.org/10.1007/s13246-020-00865-4
[2] T. Ozturk, M. Talo, E. A. Yildirim, U. B. Baloglu, O. Yildirim, and U. Rajendra Acharya, “Automated detection of covid-19 cases using deep neural networks with x-ray images,” Computers in Biology and Medicine
[3] … Computers in Biology and Medicine
[4] … IEEE Transactions on Neural Networks and Learning Systems, vol. 29, pp. 3361–3373, 2018.
[5] S. A. Fadaee and M. A. Haeri, “Classification using link prediction,”