Classification of COVID-19 X-ray Images Using a Combination of Deep and Handcrafted Features
Weihan Zhang*    Bryan Pogorelsky*    Mark Loveland†    Trevor Wolf**
* Dept. of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin
† Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin
ABSTRACT
Coronavirus Disease 2019 (COVID-19) demonstrated the need for accurate and fast diagnosis methods for emergent viral diseases. Soon after the emergence of COVID-19, medical practitioners used X-ray and computed tomography (CT) images of patients' lungs to detect COVID-19. Machine learning methods are capable of improving the identification accuracy of COVID-19 in X-ray and CT images, delivering near real-time results, while alleviating the burden on medical practitioners. In this work, we demonstrate the efficacy of a support vector machine (SVM) classifier, trained with a combination of deep convolutional and handcrafted features extracted from X-ray chest scans. We use this combination of features to discriminate between healthy, common pneumonia, and COVID-19 patients. The performance of the combined feature approach is compared with a standard convolutional neural network (CNN) and with an SVM trained with handcrafted features alone. We find that combining the features in our novel framework improves the performance of the classification task compared to the independent application of convolutional and handcrafted features. Specifically, we achieve an accuracy of 0.988 with our combined approach, compared to 0.963 and 0.983 accuracy for the handcrafted features with SVM and the CNN, respectively.
Index Terms — COVID-19, Deep learning, SVM, Feature extraction, Classification
1. INTRODUCTION
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Since its emergence in Wuhan, China in December 2019, it has spread worldwide and has caused a severe pandemic. COVID-19 infection causes mild symptoms in the initial stage, but may lead to severe acute symptoms such as multi-organ failure and systemic inflammatory response syndrome [1, 2]. As of December 2020, there have been more than 1.8 million COVID-19 related deaths around the world, and daily new cases of the disease are still rising. Currently, the reverse transcription polymerase chain reaction (RT-PCR) test is the most accurate diagnostic test. However, it requires specialized materials, equipment, and personnel, and takes at least 24 hours to obtain a result. It may also require a second RT-PCR or a different test to confirm the diagnosis. Therefore, radiological imaging techniques like X-ray and CT scans can serve as a complement to improve diagnostic accuracy [3].

In recent years, machine learning has been used extensively for automatic disease diagnosis in the healthcare sector [4, 5]. Various standard supervised learning algorithms such as logistic regression, random forests, and support vector machines (SVM) have been applied to detecting COVID-19 in X-ray and CT images of patients' lungs [6, 7, 8, 9]. The convolutional neural network (CNN) is a deep learning algorithm that can extract features from images through a combination of convolutional, pooling, and fully connected layers. It has been used extensively for image recognition, classification, and object detection, and recent works [10, 11, 12, 13, 14] show that it can also provide accurate results in detecting COVID-19 in images. However, the lack of publicly available image databases and the limited amount of patient data are inevitable challenges for training a CNN.

In this study, we propose a fusion model that classifies X-ray images using a combination of handcrafted features and CNN deep features. The model is trained and tested on a large dataset with 1,143 COVID-19 cases, 2,000 normal cases, and 2,000 other pneumonia cases collected from [15, 16]. Feature fusion classifiers have been shown to be an effective way of boosting the performance of CNN models in face recognition [17] and biomedical image classification [18, 19]. Handcrafted and deep features extract different information from the same input image, so the fusion of the two has the potential to outperform the standard approaches [20]. Our key interest is whether a fusion model can also surpass the standard CNN and SVM for COVID-19 detection. The paper is organized as follows: the methodology and feature extraction techniques are presented in Section 2, the comparative classification performances are given in Section 3, and the final conclusions are made in Section 4.

2. METHODOLOGY

The proposed COVID-19 classifier is trained and tested on a collective dataset of 5,143 X-ray images categorized into three cases: COVID-19, Normal, and Pneumonia. All the images are resized to 224 × 224 pixels and the local contrast is enhanced by an adaptive histogram equalization algorithm during the preprocessing stage. Several preprocessed example images are shown in Figure 1. Both handcrafted features and VGG16/ResNet50 deep features are extracted from the dataset, then combined and fed into an SVM classifier. The entire process is shown in Figure 2.

Fig. 1. Sample images after preprocessing: (a) COVID-19, (b) Normal, (c) Pneumonia.

Fig. 2. Methodology.
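The preprocessing stage can be sketched as follows with scikit-image; this is a minimal illustration that assumes the CLAHE variant of adaptive histogram equalization, since the paper does not name the exact algorithm:

    import numpy as np
    from skimage import exposure, img_as_ubyte, io, transform

    def preprocess(path):
        # Load as grayscale, resize to 224 x 224, enhance local contrast.
        img = io.imread(path, as_gray=True)
        img = transform.resize(img, (224, 224), anti_aliasing=True)
        img = exposure.equalize_adapthist(img)   # adaptive histogram equalization
        return img_as_ubyte(img)                 # back to 8-bit intensities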
2.1. Handcrafted Features

Handcrafted features seek to characterize each image by computing properties from the information directly present in the image. These handcrafted features are computed for each image and used as input to the SVM. In total, 308 features are computed on each image by evaluating 14 different statistical measures on the output of each image under different transformations. The transformations can be categorized into six groups: Texture, Gray-Level Co-occurrence Matrix (GLCM), Gray-Level Difference Method (GLDM), Fast Fourier Transform (FFT), Wavelet transform, and Local Binary Pattern (LBP); a Python sketch of the full extraction pipeline is given after the list below. The features are computed by applying the same 14 statistical measures to the outputs of the aforementioned six transformations: area, mean, standard deviation, skewness, kurtosis, energy, entropy, maximum, mean absolute deviation, median, minimum, range, root mean square, and uniformity, as used in a COVID-19 image classifier built on handcrafted features only [21]. Of these 14 measures, the following 10 are calculated using the standard definitions: mean, standard deviation, maximum, minimum, median, range, root mean square, skewness, mean absolute deviation, and kurtosis. Energy was calculated using the following definition:
\mathrm{Energy} := \sum_{i=1}^{\mathrm{length}(p)} p_i^2 \qquad (1)

where p_i is the i-th value from the output vector p of a transformation. Area is defined as the plain sum of all components of the output vector. Entropy is calculated by first taking the frequency of each unique intensity via the numpy function unique() and normalizing that vector of counts; the entropy is then the negative elementwise sum of the normalized vector times the base-2 logarithm of itself. Uniformity is the sum of the squares of the same normalized vector. For clarity, the pseudo-code is reproduced below as runnable Python:

    import numpy as np

    def entropy_uniformity(p):
        # p: flattened output vector of a transformation
        values, counts = np.unique(p, return_counts=True)
        probs = counts / counts.sum()                # normalize frequencies
        entropy = -np.sum(probs * np.log2(probs))    # base-2 log, per the text
        uniformity = np.sum(probs ** 2)              # sum of squared frequencies
        return entropy, uniformity

• Texture: The texture features are calculated by treating each input image as a single row vector and then calculating each of the above measures on that vector. For example, the texture feature corresponding to the mean is simply the sum of all pixel values (integers from 0 to 255) divided by the number of pixels in the image. This results in a total of 14 features.
• GLCM: The GLCM transform characterizes an image by creating a histogram of co-occurring grayscale values at a given offset and direction over the image [22]. In this implementation, features are determined by applying the greycomatrix() function from the skimage library directly to each image with an offset of 1 in four different directions (0, π/4, π/2, 3π/4). This function returns a 4-D array with one slice per direction. Each directional slice is evaluated with the 14 statistical measures as before, resulting in a total of 56 features.
• GLDM: GLDM characterizes an image by creating a distribution of the absolute differences between each pixel's intensity and the intensities of surrounding pixels at a given distance and direction [23]. In this implementation, GLDM is computed in four directions (0, π/2, π, 3π/2) with a distance of 10 pixels. Each of the four directions gives an output vector, and the 14 statistical measures are computed on each output, resulting in 56 features.

• FFT: The FFT features are evaluated by transforming each image with a Fast Fourier Transform. Each image is passed to the numpy fft.fft2() function, and the resulting values are shifted with the numpy fft.fftshift() function. Next, the numpy floor() function is used to convert the output to integer values; the 14 statistical measures computed on this final output are the FFT features.
• Wavelet: The wavelet features are computed by applying the pywt package's dwt2() function directly to each image [24]. The output of this function gives four coefficient arrays, and the 14 statistical measures are computed on each array, yielding 56 features. The first (approximation) array from the dwt2() output is then passed back into dwt2(), producing another four arrays. These four arrays again yield 14 statistical measures each, for another 56 features, giving 112 wavelet features in total.
• LBP: LBP examines the points surrounding each pixel within a given distance and tests whether each point is greater than or less than the central point, producing a binary output [25]. In this implementation, scikit-image's local binary pattern function is used to compute the LBP outputs at distances of 2, 3, 5, and 7. The resulting four LBP maps are then used to compute the 14 statistical measures, resulting in 56 features.
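To make the pipeline above concrete, the following is a minimal sketch of the full 308-feature extraction using numpy, scipy, scikit-image, and PyWavelets. It is illustrative rather than the authors' code: the wavelet basis ('haar'), the number of LBP sampling points (P = 8R), and the shift-based approximation of GLDM are assumptions not specified in the paper, the FFT magnitude is taken before flooring since floor is undefined for complex values, and the names stats_14 and extract_handcrafted are hypothetical:

    import numpy as np
    import pywt
    from scipy.stats import kurtosis, skew
    from skimage.feature import greycomatrix, local_binary_pattern

    def stats_14(p):
        # The 14 statistical measures, applied to a flattened output vector.
        p = np.asarray(p, dtype=float).ravel()
        _, counts = np.unique(p, return_counts=True)
        probs = counts / counts.sum()
        return [
            p.sum(),                          # area
            p.mean(),                         # mean
            p.std(),                          # standard deviation
            skew(p),                          # skewness
            kurtosis(p),                      # kurtosis
            np.sum(p ** 2),                   # energy
            -np.sum(probs * np.log2(probs)),  # entropy
            p.max(),                          # maximum
            np.mean(np.abs(p - p.mean())),    # mean absolute deviation
            np.median(p),                     # median
            p.min(),                          # minimum
            p.max() - p.min(),                # range
            np.sqrt(np.mean(p ** 2)),         # root mean square
            np.sum(probs ** 2),               # uniformity
        ]

    def extract_handcrafted(img):
        # img: 2-D uint8 array (224 x 224); returns the 308-dim feature vector.
        feats = stats_14(img)                                        # texture: 14
        # graycomatrix in newer scikit-image releases
        glcm = greycomatrix(img, distances=[1],
                            angles=[0, np.pi/4, np.pi/2, 3*np.pi/4])
        for k in range(4):                                           # GLCM: 56
            feats += stats_14(glcm[:, :, 0, k])
        for dy, dx in [(0, 10), (10, 0), (0, -10), (-10, 0)]:        # GLDM: 56
            shifted = np.roll(img.astype(int), (dy, dx), axis=(0, 1))
            feats += stats_14(np.abs(img.astype(int) - shifted))
        fft = np.floor(np.abs(np.fft.fftshift(np.fft.fft2(img))))   # FFT: 14
        feats += stats_14(fft)
        cA, details = pywt.dwt2(img, 'haar')                         # wavelet: 112
        for arr in (cA,) + details:
            feats += stats_14(arr)
        cA2, details2 = pywt.dwt2(cA, 'haar')                        # second level
        for arr in (cA2,) + details2:
            feats += stats_14(arr)
        for r in (2, 3, 5, 7):                                       # LBP: 56
            feats += stats_14(local_binary_pattern(img, P=8 * r, R=r))
        return np.array(feats)                                       # 308 total

Applied to each preprocessed image, such a routine yields the 308-dimensional vector that is later concatenated with the deep features described next.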
2.2. Deep Features

Deep features are extracted from two CNN models, VGG16 [26] and ResNet50 [27]. More specifically, only the feature extraction layers of each model are utilized, that is, the layers positioned prior to the dense layers meant for the classification task. The model weights are pre-trained on the ImageNet dataset [28], which contains millions of images belonging to 1,000 classes. Importantly, no fine-tuning is applied to the models: the model weights are fixed and no further training is done.

The VGG16 CNN architecture contains 16 layers with trainable weights (3 of which are dense layers not used for feature extraction), organized into 5 blocks of convolutional and pooling layers, as shown in Figure 3. The model accepts RGB images of size 224 × 224 × 3, and its final pooling layer outputs feature maps of size 7 × 7 × 512, with subsequent flattening producing a vector containing 25,088 features.
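As a sketch of this step, the frozen VGG16 convolutional base can be instantiated in Keras roughly as follows; the helper name deep_features is illustrative:

    import numpy as np
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.applications.vgg16 import preprocess_input

    # Convolutional base only: include_top=False drops the dense layers,
    # and the ImageNet weights stay fixed (no fine-tuning).
    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False

    def deep_features(images):
        # images: (N, 224, 224, 3) array; output: (N, 25088) = 7*7*512 flattened
        maps = base.predict(preprocess_input(images.astype(np.float32)))
        return maps.reshape(maps.shape[0], -1)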
Fig. 3. VGG16 feature extraction layers.

As opposed to CNN architectures such as VGG, ResNets can have greater layer depth, with increasing accuracy, while at the same time having less overall complexity. This is achieved by utilizing shortcut connections that allow a residual mapping to skip one or more layers and perform an identity mapping, which can alleviate the problem of vanishing gradients. A residual block of this type is shown in Figure 4. The ResNet50 model contains 50 layers with trainable weights (of which a single dense layer is not used for feature extraction). As with VGG16, the model accepts RGB images of size 224 × 224 × 3; its feature extraction output has size 7 × 7 × 2048, which flattens to a vector of 100,352 features.

Fig. 4. Residual block.

After features are extracted from the models, kernel principal component analysis (PCA) is applied to reduce the dimensionality of the deep features. The number of components after the transformation is selected to be 1,000, as this is near the order of magnitude of the number of handcrafted features extracted.
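A minimal sketch of this reduction step with scikit-learn is shown below; the paper does not state the kernel, so the RBF kernel here is an assumption:

    from sklearn.decomposition import KernelPCA

    # Reduce the flattened deep features to 1,000 components, comparable in
    # magnitude to the 308 handcrafted features.
    kpca = KernelPCA(n_components=1000, kernel='rbf')    # kernel assumed
    deep_train_reduced = kpca.fit_transform(deep_train)  # deep_train: (N, 25088)
    deep_test_reduced = kpca.transform(deep_test)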
2.3. Classifier

A linear SVM with a one-vs-all approach is applied to classify the combined features. Although most deep learning models employ the softmax activation function for classification tasks, SVMs have been shown to work better on several standard datasets such as MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's facial expression recognition challenge [29].
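The classifier stage might then look like the following sketch; the feature scaling is a sensible default rather than a detail stated in the paper:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    # Concatenate reduced deep features with handcrafted features and train a
    # linear SVM; LinearSVC uses a one-vs-rest scheme for multiclass labels.
    X_train = np.hstack([deep_train_reduced, handcrafted_train])
    clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
    clf.fit(X_train, y_train)  # y_train: COVID-19 / Normal / Pneumonia labels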
3. RESULTS AND DISCUSSION
To evaluate the performance of the method outlined above, we compare the SVM trained on combined deep and handcrafted features against the baseline individual CNNs and against an SVM trained solely on the handcrafted features.

Both the VGG and ResNet CNNs were evaluated with the feature extraction layers frozen at their pre-trained ImageNet weights. Two layers were added to each model: a 1,000-neuron dense layer with a rectified linear activation function and a three-neuron output layer with a softmax activation function. These added layers make classification of the three classes possible and increase the number of trainable parameters, since the feature extraction layers are frozen. During training, both models use categorical cross-entropy loss with the Adam optimizer [30] and a learning rate of 0.005.

A parametric study was performed on the handcrafted features to evaluate which configuration produced the most accurate results as inputs to the SVM. The results in Table 1 show that, by itself, the Wavelet feature group resulted in the highest classification accuracy, followed by GLDM and GLCM. The lowest performing feature group was the Texture group, with an accuracy of 0.762. Inputting all 308 features into the SVM resulted in the highest accuracy and F1-score. A 95% confidence interval is given for all values in Table 1.

For each classification model outlined above, the dataset of 5,143 images was divided into the same train and test subsets with an 80/20 split, resulting in 4,114 training images and 1,029 test images. The results of each classification model are shown in Table 2. All metrics listed in the table are unweighted averages of the per-class statistics, with 95% confidence intervals. From these results, it is clear that all models incorporating deep features performed better than the SVM that uses only handcrafted features. The two models utilizing both deep features and handcrafted features with an SVM classifier slightly outperform the conventional VGG16 and ResNet50 CNNs. Additionally, the confusion matrices of the combined deep and handcrafted feature SVM models are shown in Figure 5. Both combined feature models achieve the same low false negative and false positive rates of 0.41% and 0.13%, respectively.
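For reference, the baseline VGG16 CNN described above can be assembled roughly as in the sketch below (the ResNet50 baseline is analogous); the batch size and number of epochs are not reported in the paper and are therefore omitted:

    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False  # feature extraction layers frozen

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(1000, activation='relu'),   # added trainable layer
        layers.Dense(3, activation='softmax'),   # COVID-19 / Normal / Pneumonia
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.005),
                  loss='categorical_crossentropy', metrics=['accuracy'])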
Table 1. Performance of X-ray image classification using SVM with handcrafted features only

Handcrafted Features    Accuracy    F1-Score
Texture                 0.762       –
GLCM                    –           –
GLDM                    –           –
FFT                     –           –
Wavelet                 –           –
LBP                     –           –
All features            –           –

Table 2. Performance of X-ray image classification models
Classification Model           Accuracy    F1-Score
Handcrafted Features (SVM)     0.963       –
VGG16 (CNN)                    –           –
ResNet50 (CNN)                 0.983       –
VGG16 DF + HF (SVM)            0.988       –
ResNet50 DF + HF (SVM)         –           –

Fig. 5. Confusion matrices: (a) VGG16 DF + HF, (b) ResNet50 DF + HF.
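The per-class metrics reported above can be reproduced from the test-set predictions with scikit-learn; macro averaging corresponds to the unweighted class averages used here (a sketch, with X_test and y_test denoting the held-out split):

    from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

    y_pred = clf.predict(X_test)
    print('accuracy:', accuracy_score(y_test, y_pred))
    print('macro F1:', f1_score(y_test, y_pred, average='macro'))
    print(confusion_matrix(y_test, y_pred))  # rows: true class, cols: predicted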
4. CONCLUSION
This work demonstrated the use of a combined handcrafted and deep feature approach for classifying COVID-19, pneumonia, and healthy patients in radiological images. This new approach was compared to seven handcrafted feature classifiers and two CNN architectures. With respect to all performance metrics, the combination of deep and handcrafted features surpassed that of handcrafted features or deep features alone. Notably, the proposed architecture achieved an accuracy of 0.988 by combining VGG16 deep features and handcrafted features. The next best accuracy among approaches that do not combine deep and handcrafted features was 0.983, achieved by ResNet50.
5. REFERENCES

[1] Zunyou Wu and Jennifer M. McGoogan, "Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72,314 cases from the Chinese Center for Disease Control and Prevention," JAMA, vol. 323, no. 13, pp. 1239–1242, 2020.

[2] Parag Goyal, Justin J. Choi, Laura C. Pinheiro, Edward J. Schenck, Ruijun Chen, Assem Jabri, Michael J. Satlin, Thomas R. Campion Jr., Musarrat Nahid, Joanna B. Ringel, et al., "Clinical characteristics of Covid-19 in New York City," New England Journal of Medicine, 2020.

[3] Tao Ai, Zhenlu Yang, Hongyan Hou, Chenao Zhan, Chong Chen, Wenzhi Lv, Qian Tao, Ziyong Sun, and Liming Xia, "Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases," Radiology, p. 200642, 2020.

[4] Paul Sajda, "Machine learning for detection and diagnosis of disease," Annu. Rev. Biomed. Eng., vol. 8, pp. 537–565, 2006.

[5] Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel T. Dudley, "Deep learning for healthcare: review, opportunities and challenges," Briefings in Bioinformatics, vol. 19, no. 6, pp. 1236–1246, 2018.

[6] Kunhua Li, Jiong Wu, Faqi Wu, Dajing Guo, Linli Chen, Zheng Fang, and Chuanming Li, "The clinical and chest CT features associated with severe and critical COVID-19 pneumonia," Investigative Radiology, 2020.

[7] Zhenyu Tang, Wei Zhao, Xingzhi Xie, Zheng Zhong, Feng Shi, Jun Liu, and Dinggang Shen, "Severity assessment of coronavirus disease 2019 (COVID-19) using quantitative features from chest CT images," arXiv preprint arXiv:2003.11988, 2020.

[8] Mucahid Barstugan, Umut Ozkaya, and Saban Ozturk, "Coronavirus (COVID-19) classification using CT images by machine learning methods," arXiv preprint arXiv:2003.09424, 2020.

[9] Prabira Kumar Sethy and Santi Kumari Behera, "Detection of coronavirus disease (COVID-19) based on deep features," Preprints, 2020030300, 2020.

[10] Ioannis D. Apostolopoulos and Tzani A. Mpesiana, "Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks," Physical and Engineering Sciences in Medicine, p. 1, 2020.

[11] Khalid El Asnaoui, Youness Chawki, and Ali Idri, "Automated methods for detection and classification pneumonia based on X-ray images using deep learning," arXiv preprint arXiv:2003.14363, 2020.

[12] Tulin Ozturk, Muhammed Talo, Eylul Azra Yildirim, Ulas Baran Baloglu, Ozal Yildirim, and U. Rajendra Acharya, "Automated detection of COVID-19 cases using deep neural networks with X-ray images," Computers in Biology and Medicine, p. 103792, 2020.

[13] Tanvir Mahmud, Md Awsafur Rahman, and Shaikh Anowarul Fattah, "CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization," Computers in Biology and Medicine, vol. 122, p. 103869, 2020.

[14] Lawrence O. Hall, Rahul Paul, Dmitry B. Goldgof, and Gregory M. Goldgof, "Finding Covid-19 from chest X-rays using deep learning on a small dataset," arXiv preprint arXiv:2004.02060, 2020.

[15] Daniel Kermany, Kang Zhang, and Michael Goldbaum, "Labeled optical coherence tomography (OCT) and chest X-ray images for classification," Mendeley Data, vol. 2, 2018.

[16] COVID-19 Radiography Database.

[17] Dat Tien Nguyen, Tuyen Danh Pham, Na Rae Baek, and Kang Ryoung Park, "Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors," Sensors, vol. 18, no. 3, p. 699, 2018.

[18] Loris Nanni, Sheryl Brahnam, Stefano Ghidoni, and Alessandra Lumini, "Bioimage classification with handcrafted and learned features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 16, no. 3, pp. 874–885, 2018.

[19] Varun Srivastava and Ravindra Kr. Purwar, "Classification of CT scan images of lungs using deep convolutional neural network with external shape-based features," Journal of Digital Imaging, vol. 33, no. 1, pp. 252–261, 2020.

[20] Loris Nanni, Stefano Ghidoni, and Sheryl Brahnam, "Handcrafted vs. non-handcrafted features for computer vision classification," Pattern Recognition, vol. 71, pp. 158–172, 2017.

[21] Abolfazl Zargari Khuzani, Morteza Heidari, and S. Ali Shariati, "COVID-Classifier: An automated machine learning model to assist in the diagnosis of COVID-19 infection in chest X-ray images," medRxiv, 2020.

[22] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, no. 6, pp. 610–621, 1973.

[23] R. W. Conners and C. A. Harlow, "A theoretical comparison of texture algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-2, no. 3, pp. 204–222, 1980.

[24] Gregory R. Lee, Ralf Gommers, Filip Wasilewski, Kai Wohlfahrt, and Aaron O'Leary, "PyWavelets: A Python package for wavelet analysis," Journal of Open Source Software, vol. 4, no. 36, p. 1237, 2019.

[25] Timo Ojala, Matti Pietikäinen, and David Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.

[26] Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[27] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[28] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.

[29] Yichuan Tang, "Deep learning using linear support vector machines," arXiv preprint arXiv:1306.0239, 2013.

[30] Diederik P. Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.