Development of a Machine-Learning System to Classify Lung CT Scan Images into Normal/COVID-19 Class
Seifedine Kadry, Venkatesan Rajinikanth, Seungmin Rho, Nadaradjane Sri Madhava Raja, Vaddi Seshagiri Rao, Krishnan Palani Thanaraj
Department of Mathematics and Computer Science, Faculty of Science, Beirut Arab University, Lebanon; Department of Electronics and Instrumentation Engineering, St. Joseph’s College of Engineering, Chennai 600119, India; Department of Software, Sejong University, Seoul 05006, Korea; Department of Mechanical Engineering, St. Joseph’s College of Engineering, Chennai 600 119, India
Abstract:
The lung infection caused by Coronavirus Disease (COVID-19) has recently affected a large population worldwide, and assessing the extent of infection in the lung is essential for treatment planning. This research proposes a Machine-Learning-System (MLS) to detect COVID-19 infection using CT scan Slices (CTS). The MLS implements a sequence of methods: multi-thresholding, image separation using a threshold filter, feature extraction, feature selection, feature fusion and classification. The initial stage applies Chaotic-Bat-Algorithm and Kapur’s Entropy (CBA+KE) thresholding to enhance the CTS. The threshold filter then separates the image into two segments based on a chosen threshold ‘Th’. The texture features of these images are extracted, refined and selected using the chosen procedures. Finally, a two-class classifier system is implemented to categorize the chosen CTS (n=500, each of 512x512x1 pixels) into the normal/COVID-19 group. The classifiers Naive Bayes (NB), k-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF) and Support Vector Machine with a linear kernel (SVM) are implemented, and the classification task is performed using various feature vectors. The SVM with the Fused-Feature-Vector (FFV) attained a detection accuracy of 89.80%.
Keywords:
Respiratory tract infection; COVID-19; CT scan slice; Feature fusion; SVM classifier; Performance validation.
1. Introduction
Abnormalities and infections in the internal body organs are more acute than many other diseases. Diseases of the internal organs are commonly prescreened using non-invasive methods such as bio-signals and bio-images [1-3]. Lung infections caused by climatic conditions and microorganisms are very common in humans, and they may cause symptoms ranging from cough, cold and fever to mild/severe pneumonia [4-6]. The respiratory tract infection due to Coronavirus Disease (COVID-19) has emerged as one of the major global threats because of its acuteness and infection rate. It is a major communicable disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), and according to recent reports [7,8], it has affected a large human community irrespective of race and gender. COVID-19 severely affects the respiratory system by causing severe pneumonia. Because of its severity and spreading rate, the World Health Organization (WHO) declared it a pandemic [9]. Even though various control and treatment procedures have been implemented since December 2019, the mortality due to COVID-19 infection is rapidly increasing.
Due to its acuteness, a considerable number of research works have already been initiated to find possible solutions to the problems caused by COVID-19. The earlier works relate to: (i) tracking and predicting the progression of COVID-19 to alert the public [10-12], (ii) precautionary measures to control its spread [13,14], and (iii) exploring the structure of the virus to find a solution, and clinical-level handling of the pneumonia caused by COVID-19 [15-22]. Among these, clinical-level handling gets priority, since doctors suggest and implement possible treatment practices to control and cure the infection in the respiratory tract. The clinical-level detection of COVID-19 relies on two accepted methodologies [23,24]: (i) clinical testing using the Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test to confirm the disease, and (ii) image-assisted procedures to identify the severity of pneumonia in the lungs using Computed-Tomography scan Slices (CTS) and/or chest radiograph (chest X-ray) assisted diagnosis. If the RT-PCR result is negative, the person is considered normal; if it is positive, the person is immediately admitted to hospital for further treatment. After admission, the doctor suggests a combination of treatment procedures, including image-assisted treatment, which helps to choose the possible drug and its dosage level [25]. In this practice, all RT-PCR positive patients are initially screened with CTS/chest X-ray to identify the severity of the lung infection. Once treatment is initiated, the patient is treated until the respiratory system functions well [13]. Treatment planning and implementation become challenging when a large number of COVID-19 patients are admitted to the hospital.
Furthermore, the diagnostic burden rises due to the mass screening of patients with the imaging modality. Hence, to reduce the burden on doctors, computer-assisted methods are essential for the initial diagnosis, and doctors can plan and execute the treatment based on the outcome of the computerized procedure. Normally, the computer-based detection procedure can be executed by a skilled technician or a doctor, and its findings can be shared with the pulmonologist for further assessment. Recently, a considerable number of CTS-assisted detection procedures for COVID-19 have been reported in the literature [26-33]. These procedures consider the axial/coronal view of the CTS, and most of them focus on extracting the pneumonia lesion to assess the severity of the disease. There is still a need for further image examination procedures that can be used in future clinical practice. This research proposes a Machine-Learning-System (MLS) to classify the CTS into normal/COVID-19 categories using a sequence of methods. The procedures implemented in this MLS are as follows: (i) tri-level thresholding with the Chaotic-Bat-Algorithm and Kapur’s Entropy (CBA+KE), (ii) separation of the image into Region-Of-Interest (ROI) and artifact by a threshold filter, (iii) feature extraction using the chosen methodology, (iv) feature ranking and selection based on a statistical test, (v) serial fusion to obtain the one-dimensional Fused-Feature-Vector (FFV), and (vi) classifier implementation and validation. The proposed work is experimentally investigated using MATLAB® software, and the CTS are collected from available benchmark datasets. In this work, 500 images (250 normal and 250 COVID-19) of dimension 512x512x1 pixels are used for the evaluation.
This work implements five-fold cross-validation during the classification task, and the best value attained across the five folds is reported as the final result. The proposed work also presents a performance evaluation of the Naive Bayes (NB), k-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM) classifiers using the chosen feature vectors. The proposed MLS achieved classification accuracies of 81.20% (SVM), 85.80% (RF) and 89.80% (SVM) with a single 47-feature vector, a 94-feature fused vector and a 141-feature fused vector, respectively. This study is organized as follows: Section 2 presents the context and Section 3 describes the methodology. Section 4 summarizes the experimental outcome and its discussion. Section 5 presents the conclusion.
2. Context
COVID-19 is a recently emerged infectious disease first identified in Wuhan, China, in December 2019 [9,14]. Drug discovery for this disease is still in the research phase, and no approved drug is available for COVID-19. For these reasons, the mortality rate is rising globally [7]. Researchers have recently discussed a number of image-assisted detection procedures for COVID-19, and Table 1 summarizes a few recent techniques.
Table 1.
Summary of image-based COVID-19 detection procedures
Reference | Implemented investigative procedure | Image modality | Findings
Rajinikanth et al. [26] | Harmony-Search and Otsu’s based image thresholding with watershed segmentation to extract the COVID-19 infection | CT | Disease-severity prediction based on the size of the infection relative to the lung is discussed
Rajinikanth et al. [27] | Firefly and Shannon’s entropy based image thresholding with Markov-random-field segmentation to extract the COVID-19 infection | CT | Segmentation accuracy of >92% during COVID-19 lesion extraction
Wu et al. [28] | Deep-learning procedure for the segmentation and classification of COVID-19 infection from CT images of 200 patients | CT | Dice score of 78.3% for segmentation; average sensitivity of 95.0% and specificity of 93.0% for classification
Khan et al. [29] | Deep-learning based diagnosis of COVID-19 using the CoroNet architecture | Chest X-ray | Classification accuracy of 89.50%
Rahimzadeh and Attar [30] | Modified deep convolutional neural network for COVID-19 diagnosis | Chest X-ray | Classification accuracy of 99.56% for the disease class and average accuracy of 91.4%
Ozkaya et al. [31] | Deep learning based on feature fusion and ranking | CT | Accuracy 98.27%, sensitivity 98.93%, specificity 97.60%, precision 97.63% and F1-score 98.28%
Zhou et al. [32] | U-Net with attention mechanism to segment the COVID-19 infection | CT | Dice score 69.1%, sensitivity 81.1% and specificity 97.2%

Table 1 summarizes recently implemented methodologies to segment and detect COVID-19 infection in CT and chest X-ray images. A detailed review of the image-assisted procedures in the literature can be found in the recent work of Shi et al. [33]. These earlier works indicate that an image-assisted COVID-19 detection system can support doctors during the diagnosis task. Hence, in this research work, an MLS is proposed to detect the disease using the CTS.
3. Methodology
This part of the work presents the methodology implemented in this system, which can work on the axial, coronal and sagittal views of the CTS; for the experimental demonstration, only the axial view is considered. The stages employed in the proposed work are depicted in Figure 1. The infected patient is primarily assessed with a radiology-assisted imaging procedure (CT scan), which provides a reconstructed three-dimensional (3D) image of the respiratory tract. Assessment of the 3D image requires complex computations; hence, the 3D images are separated into 2D slices during the examination. In this work, axial CTS of the normal/COVID-19 class are used to test the performance of the proposed MLS. Initially, the visibility of the infected section is enhanced by tri-level thresholding with the Chaotic-Bat-Algorithm and Kapur’s entropy (CBA+KE). After thresholding, the bi-level threshold filter discussed in [26,34] separates the image into Region-Of-Interest (ROI) and artifact. A feature extraction procedure then extracts the image features from the original, thresholded and ROI images. After feature extraction, the dominant features of each image category are selected using a statistical test, and the chosen features are used to train, test and validate the classifier system implemented in this work. Further, a feature fusion technique is employed to increase the classification accuracy.
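The bi-level threshold filter amounts to splitting the pixels at a single level Th. The following Python/NumPy sketch is an illustration only (`threshold_filter` is a hypothetical helper, not code from [26,34]): it returns the two complementary segments, and which one is kept as the ROI is left to the caller. The results section reports Th = 179 ± 4 for the dataset used here, which is why 179 appears in the toy example.

```python
import numpy as np


def threshold_filter(img, th):
    """Split a grayscale image at level th into two complementary segments.

    Pixels with intensity > th go to `above`, the rest to `below`;
    every other position is set to zero in each output.
    """
    img = np.asarray(img)
    mask = img > th
    above = np.where(mask, img, 0)
    below = np.where(~mask, img, 0)
    return above, below


# Toy example with a hypothetical 2x2 image and Th = 179
img = np.array([[10, 200], [180, 179]], dtype=np.uint8)
above, below = threshold_filter(img, 179)
print(above.tolist(), below.tolist())  # [[0, 200], [180, 0]] [[10, 0], [0, 179]]
```

Because the two outputs are complementary, adding them back together recovers the input image exactly.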
Figure 1.
Proposed Machine-Learning system to recognize COVID-19 from CT scan images
Image multi-thresholding
Image thresholding is a widely adopted enhancement technique used to improve the visibility of grayscale/RGB images. In this work, the Kapur’s entropy thresholding discussed in [35-37] is implemented to enhance the CTS for further assessment. The mathematical description of Kapur’s entropy is given below.
Consider a grayscale image with L gray levels (0 to L−1) and a total of G pixels. If F(i) denotes the frequency of the i-th intensity level, the pixel distribution of the image is

$$ G = F(0) + F(1) + \dots + F(L-1) \tag{1} $$

and the probability of the i-th intensity level is

$$ P(i) = \frac{F(i)}{G} \tag{2} $$

Let there be k thresholds T = (t_1, t_2, ..., t_k), where k < L. During thresholding, the image pixels are separated into k + 1 groups based on the assigned threshold values. The entropy of each group is computed separately, and the group entropies are summed to obtain the final entropy. For the tri-level threshold problem (k = 3), taking t_0 = 0 and t_4 = L, the computed entropy is

$$ F_{Kapur}(T) = F(t_1, t_2, t_3) = E_1 + E_2 + E_3 + E_4 \tag{3} $$

$$ E_j = -\sum_{i=t_{j-1}}^{t_j - 1} \frac{P(i)}{\omega_{j-1}} \ln \frac{P(i)}{\omega_{j-1}}, \qquad \omega_{j-1} = \sum_{i=t_{j-1}}^{t_j - 1} P(i), \qquad j = 1, \dots, 4 \tag{4} $$

where E = entropy, P = probability distribution, and ω = probability of occurrence of each group. During this operation, the objective is to find

$$ F_{Kapur}^{max}(T) = \max_{t_1 < t_2 < t_3} F(t_1, t_2, t_3) \tag{5} $$

In this research, F_Kapur^max(T) is identified by the CBA. In the literature, a number of procedures have been proposed to enhance the optimization performance of the bat algorithm (BA); in the proposed work, the search operator of the traditional BA is improved using the Lorenz attractor (λ) discussed in [38,39].

Figure 2.
Search pattern made by a single bat to find (t_1, t_2, t_3) using the Lorenz attractor

The BA has the following representations [40-42]:

Velocity update: $$ V_j^{n+1} = V_j^{n} + \left[ p_j^{n} - G_{best} \right] F_j \tag{6} $$

Location update: $$ p_j^{n+1} = p_j^{n} + V_j^{n+1} \tag{7} $$

Frequency alteration: $$ F_j = f_{min} + (f_{max} - f_{min})\, \beta \tag{8} $$

where β is a random value in the range [0,1]. Eq. (8) drives Eqs. (6) and (7); hence, the frequency range must be chosen appropriately. The updated value for every bat is produced by

$$ p_{new} = p_{old} + (\lambda \cdot A^{n}) \tag{9} $$

where λ is the Lorenz-attractor value and A is the loudness. The loudness variation is represented as

$$ A_j^{n+1} = \alpha A_j^{n} \tag{10} $$

where α is a constant with 0 < α < 1. The typical search of a bat in the three-dimensional search space is depicted in Figure 2. Every bat searches for the thresholds (t_1, t_2, t_3) that maximize F(t_1, t_2, t_3) for the considered grayscale image, and the pseudo-code below describes the proposed thresholding process.

The pseudo-code for CBA+KE thresholding:
Step 1: Initialize the CBA with the following parameters: number of bats = 25, search dimension = 3, objective function = F_Kapur(T), total iterations = 3000, stopping criterion = maximal iteration, and frequency bounds f_min and f_max (varied in steps of 0.05).
Step 2: Randomly initialize the bats in the 3D search space and compute F_Kapur(T) for each bat.
Step 3: Find the best solution G_best attained by a bat and update the velocity and position of every bat using Eqs. (6)-(8).
Step 4: As the search iterations proceed, update the position of every bat using Eq. (9) and its loudness using Eq. (10).
Step 5: Has the maximal iteration been reached, or have all the agents attained the maximum F_Kapur(T)? If yes, stop the search and declare the thresholds (t_1, t_2, t_3). Otherwise, re-evaluate F_Kapur(T) for every bat and repeat Steps 3 and 4 until the maximal iteration is attained.
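The CBA+KE stage can be sketched in Python/NumPy as below. This is a minimal illustration under stated assumptions, not the authors’ MATLAB code: `kapur_entropy` evaluates Eqs. (1)-(4) on a gray-level histogram, and `chaotic_bat_maximize` follows Eqs. (6)-(10), applying Eq. (9) as a local walk around the current best bat. Everything not given in the text is an assumption: f_min = 0 and f_max = 2, a fixed pulse rate of 0.5 that decides when the Lorenz walk replaces the move of Eq. (7), the step scaling, α = 0.9, the Euler-integrated Lorenz system with the standard parameters (10, 28, 8/3), and drawing λ at random positions of a precomputed trajectory so that the three threshold dimensions receive decorrelated steps. The default of 200 iterations is far below the 3000 listed in Step 1 and only keeps the example fast.

```python
import numpy as np


def kapur_entropy(hist, thresholds):
    """Kapur objective of Eqs. (1)-(4): sum of the class entropies.

    Sorted thresholds t1 < ... < tk split the gray levels into the classes
    [0, t1), [t1, t2), ..., [tk, L), where L = len(hist).
    """
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()                                  # Eq. (2): P(i) = F(i) / G
    L = len(p)
    cuts = sorted(min(max(int(round(t)), 0), L) for t in thresholds)
    edges = [0, *cuts, L]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        omega = p[lo:hi].sum()                       # class probability
        if omega <= 0:
            continue                                 # empty class adds nothing
        q = p[lo:hi][p[lo:hi] > 0] / omega
        total -= np.sum(q * np.log(q))               # class entropy E_j
    return total


def lorenz_samples(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """x-component of an Euler-integrated Lorenz trajectory, scaled to [-1, 1]."""
    x, y, z = 0.1, 0.0, 0.0
    out = np.empty(n)
    for i in range(n):
        x, y, z = (x + dt * sigma * (y - x),
                   y + dt * (x * (rho - z) - y),
                   z + dt * (x * y - beta * z))
        out[i] = x
    return out / np.max(np.abs(out))


def chaotic_bat_maximize(objective, lower, upper, n_bats=25, n_iter=200,
                         f_min=0.0, f_max=2.0, alpha=0.9, pulse_rate=0.5,
                         step_scale=0.05, seed=0):
    """Maximize `objective` inside the box [lower, upper] with a bat search."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    chaos = lorenz_samples(5000)                     # pool of chaotic step values
    pos = rng.uniform(lower, upper, size=(n_bats, dim))
    vel = np.zeros_like(pos)
    loud = np.ones(n_bats)                           # loudness A_j
    fit = np.array([objective(p) for p in pos])
    best, best_fit = pos[np.argmax(fit)].copy(), fit.max()
    for _ in range(n_iter):
        for j in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()        # Eq. (8)
            vel[j] += (pos[j] - best) * freq                     # Eq. (6)
            cand = np.clip(pos[j] + vel[j], lower, upper)        # Eq. (7)
            if rng.random() > pulse_rate:
                # Eq. (9): local walk around the best bat with Lorenz steps
                lam = chaos[rng.integers(chaos.size, size=dim)]
                step = lam * loud.mean() * step_scale * (upper - lower)
                cand = np.clip(best + step, lower, upper)
            f_cand = objective(cand)
            if f_cand >= fit[j] and rng.random() < loud[j]:
                pos[j], fit[j] = cand, f_cand
                loud[j] *= alpha                                 # Eq. (10)
            if f_cand > best_fit:
                best, best_fit = cand.copy(), f_cand
    return best, best_fit


if __name__ == "__main__":
    # Synthetic 8-bit stand-in for a CT slice (not real data)
    rng = np.random.default_rng(1)
    img = np.clip(rng.normal(100, 40, size=(64, 64)), 0, 255).astype(np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)               # F(i)
    T, score = chaotic_bat_maximize(lambda t: kapur_entropy(hist, t),
                                    [1] * 3, [254] * 3, n_iter=50)
    print(sorted(int(round(v)) for v in T), round(float(score), 4))
```

Rounding and sorting the returned vector gives (t_1, t_2, t_3). Keeping the optimizer generic (it only needs a callable objective) makes it easy to test on a simple function before plugging in the Kapur objective.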
Image separation
The accuracy of disease detection using bio-images depends mainly on the quality of the image considered. Besides the lung section to be examined, the lung CTS normally contains unwanted sections, such as bone and other body parts. For a better diagnosis with computer-assisted procedures, it is necessary to consider only the Region-Of-Interest (ROI) of the medical image. In this work, the threshold filter implemented in [34] separates the thresholded image into ROI and artifact. As discussed by Rajinikanth et al. [26], the threshold level (Th) of the filter is initially identified manually, and this threshold is then used for all other images. The extracted ROI contains the pneumonia-infected section due to COVID-19, and this section is considered for further assessment. From the ROI, the pneumonia infection is then segmented using the watershed segmentation discussed in [26]. The segmentation results confirm that the proposed methodology extracts the pneumonia-infected region from the axial, coronal and sagittal views of the CTS.
Feature extraction and selection
All the images (original, thresholded and ROI) considered in this work are 2D; hence, 2D feature extraction procedures, namely the Discrete Wavelet Transform (DWT), Gray-Level Co-Occurrence Matrix (GLCM) and Hu Moments (HuM), are implemented. Further, entropy features, namely Kapur, max [43-45], Renyi [46,47], Tsallis [48], Shannon [49], Vajda and Yager [50,51], are also extracted and considered as prime features.
DWT:
DWT evaluates the non-stationary details in an image. When a wavelet has the function ψ(t) ∈ W(r), its DWT is

$$ DWT(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{11} $$

where ψ(t) is the mother wavelet, the symbol ‘*’ denotes the complex conjugate, and a and b (a, b ∈ R) are the dilation (scale) and translation parameters, respectively. The proposed work extracts 40 DWT features [43]. After extracting these features from the normal/COVID-19 class images, a Student’s t-test based statistical evaluation is executed: the features are ranked by their t-value, and the DWT features whose p-value is > 0.05 are discarded. This selection retains 13 features, which are considered the dominant DWT features. Let the resulting feature vector (13x1) be

$$ f_{DWT} = (f'_{a1}, f'_{a2}, \dots, f'_{a13}) \tag{12} $$

GLCM and HuM:
In the image processing literature, a considerable number of research works have used GLCM features for image recognition and categorization tasks. The use of HuM for lung CTS and chest X-ray images is discussed in the recent work of Bhandary et al. [34]. In this work, 18 GLCM features and 9 HuM features are considered as dominant features:

$$ f_{GLCM} = (f'_{b1}, f'_{b2}, \dots, f'_{b18}), \qquad f_{HuM} = (f'_{c1}, f'_{c2}, \dots, f'_{c9}) \tag{13} $$

Entropy features:
Entropy is a measure of the abnormality existing in the image, and this feature provides essential information on lung abnormality in the CTS. In this work, 7 entropy features are considered; details of these features can be found in [43-47].

$$ f_{Entropy} = (f'_{d1}, f'_{d2}, \dots, f'_{d7}) \tag{14} $$

For a given image, the 1D Feature-Vector (FV) is obtained by combining and sorting the dominant features f_DWT, f_GLCM, f_HuM and f_Entropy. Feature extraction is implemented separately on the three image cases, and the resulting FVs, each with 47 features, are arranged as follows:

$$ \begin{aligned} FV_{Original} &= [f_{DWT}, f_{GLCM}, f_{HuM}, f_{Entropy}] \\ FV_{Threshold} &= [f_{DWT}, f_{GLCM}, f_{HuM}, f_{Entropy}] \\ FV_{ROI} &= [f_{DWT}, f_{GLCM}, f_{HuM}, f_{Entropy}] \end{aligned} \tag{15} $$
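The statistical selection of the DWT features can be sketched as follows (Python/NumPy; `ttest_rank_select` is a hypothetical helper, not the authors’ code). It computes the pooled two-sample Student’s t statistic for every feature, ranks features by |t| and discards those with p > 0.05, as described above. To avoid a SciPy dependency, the two-sided p-value uses the normal approximation to the t distribution, an assumption that is close for the 498 degrees of freedom of a 250/250 split; `scipy.stats.ttest_ind` would give exact values. Synthetic data stand in for real DWT features.

```python
import math

import numpy as np


def ttest_rank_select(x_normal, x_covid, alpha=0.05):
    """Rank features by |t| and keep those with p <= alpha.

    x_normal, x_covid: arrays of shape (n_images, n_features).
    Uses the pooled-variance two-sample Student's t statistic; the two-sided
    p-value uses a normal approximation (adequate for large samples).
    """
    x1, x2 = np.asarray(x_normal, float), np.asarray(x_covid, float)
    n1, n2 = len(x1), len(x2)
    pooled = ((n1 - 1) * x1.var(axis=0, ddof=1) +
              (n2 - 1) * x2.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    t = (x1.mean(axis=0) - x2.mean(axis=0)) / np.sqrt(pooled * (1 / n1 + 1 / n2))
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])  # two-sided
    order = np.argsort(-np.abs(t))              # strongest separation first
    selected = np.array([i for i in order if p[i] <= alpha], dtype=int)
    return selected, t, p


# Synthetic example: 250 images per class, 6 candidate features
rng = np.random.default_rng(0)
normal = rng.normal(size=(250, 6))
covid = rng.normal(size=(250, 6))
covid[:, 0] += 1.0      # strongly discriminative synthetic feature
covid[:, 3] += 0.5      # weaker discriminative synthetic feature
selected, t, p = ttest_rank_select(normal, covid)
print(selected)
```

In the reported experiments this step kept 13 of the 40 DWT features; concatenating them with the 18 GLCM, 9 HuM and 7 entropy features gives the 47-element FV of Eq. (15).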
Feature fusion
Feature fusion is widely adopted in Machine-Learning (ML) and Deep-Learning (DL) systems to enhance classification accuracy; it increases the size of the 1D FV available to the classifier. In this work, each FV contains relatively few features (47 per image; 47x3 = 141 in total). Hence, a serial fusion technique is employed to fuse the FVs. The fused feature vectors considered in the proposed MLS are

$$ FFV_1 = [FV_i, FV_j] \tag{16} $$

$$ FFV_2 = [FV_{Original}, FV_{Threshold}, FV_{ROI}] \tag{17} $$

where FV_i and FV_j denote two of the three FVs, so that FFV_1 has 94x1 features and FFV_2 has 141x1 features.
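Serial fusion is plain concatenation of the per-image feature vectors, so the sizes of the fused vectors follow directly (47 + 47 = 94 and 47 × 3 = 141). In this NumPy sketch, random matrices stand in for the real FVs (one row per image), and pairing the thresholded-image and ROI features for FFV_1 is an assumption made only for illustration, since the text does not state which two FVs form FFV_1. `serial_fusion` is a hypothetical helper.

```python
import numpy as np


def serial_fusion(*fvs):
    """Serially concatenate per-image feature matrices (one row per image)."""
    fvs = [np.asarray(f, dtype=float) for f in fvs]
    if len({f.shape[0] for f in fvs}) != 1:
        raise ValueError("every FV must describe the same images, in the same order")
    return np.hstack(fvs)


rng = np.random.default_rng(0)
# Hypothetical stand-ins: 500 images x 47 features for each image case
fv_original, fv_threshold, fv_roi = (rng.normal(size=(500, 47)) for _ in range(3))
ffv_1 = serial_fusion(fv_threshold, fv_roi)                 # assumed pairing -> 94 features
ffv_2 = serial_fusion(fv_original, fv_threshold, fv_roi)    # all three -> 141 features
print(ffv_1.shape, ffv_2.shape)  # (500, 94) (500, 141)
```

The row-count check guards against the most common fusion bug: concatenating feature matrices whose rows refer to different images.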
Classifier implementation
In ML and DL techniques, classifiers are implemented to separate the given dataset into two or more classes with the help of the feature vector. The choice of an appropriate classifier is essential to maintain the detection accuracy during medical data assessment. In the proposed work, a two-class problem is considered, and the implemented classifier is used to classify the CTS image dataset into the normal/COVID-19 class. Commonly used classifiers, namely Naive Bayes (NB), k-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM) with a linear kernel, are employed to classify the considered images using the feature vectors FV, FFV_1 (94 features) and FFV_2 (141 features). The theoretical background of NB, KNN, DT, RF and SVM can be found in the literature [43-47, 50-53].
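The five-fold evaluation protocol can be sketched in NumPy as below. This is an illustration under assumptions, not the authors’ MATLAB setup: it uses stratified folds, z-score scaling fitted on the training folds, and k-NN with k = 5 as a stand-in for one of the five classifiers; NB, DT, RF or a linear SVM would plug into the same loop in place of the hypothetical `knn_predict`. The synthetic 47-feature data only mimic the shape of an FV.

```python
import numpy as np


def knn_predict(x_train, y_train, x_test, k=5):
    """Majority vote of the k nearest training samples (Euclidean, labels 0/1)."""
    d = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def stratified_folds(y, n_folds=5, seed=0):
    """Split sample indices into folds that keep the class balance."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for f, chunk in enumerate(np.array_split(idx, n_folds)):
            folds[f].extend(chunk.tolist())
    return [np.array(f, dtype=int) for f in folds]


def cross_validate_knn(x, y, n_folds=5, k=5, seed=0):
    """Per-fold accuracy of k-NN under stratified k-fold cross-validation."""
    accs = []
    for test_idx in stratified_folds(y, n_folds, seed):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        mu, sd = x[train].mean(axis=0), x[train].std(axis=0) + 1e-12  # train-fold scaling
        pred = knn_predict((x[train] - mu) / sd, y[train], (x[test_idx] - mu) / sd, k)
        accs.append(np.mean(pred == y[test_idx]))
    return np.array(accs)


rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1, (250, 47)), rng.normal(0.5, 1, (250, 47))])
y = np.array([0] * 250 + [1] * 250)          # 0 = normal, 1 = COVID-19 (synthetic)
accs = cross_validate_knn(x, y)
print(accs.max(), accs.mean())
```

`accs.max()` corresponds to the best-of-five value used in the text, while `accs.mean()` gives the average over the folds.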
Performance validation
The quality of ML- and DL-based data analysis is generally validated by computing important performance measures. In the proposed MLS, the following performance measures are computed to validate the implemented classifier system:

$$ \text{False Negative Rate } (FNR) = \frac{F_{-ve}}{F_{-ve} + T_{+ve}} \tag{18} $$

$$ \text{False Positive Rate } (FPR) = \frac{F_{+ve}}{F_{+ve} + T_{-ve}} \tag{19} $$

$$ \text{Accuracy } (ACC) = \frac{T_{+ve} + T_{-ve}}{T_{+ve} + T_{-ve} + F_{+ve} + F_{-ve}} \tag{20} $$

$$ \text{Precision } (PRE) = \frac{T_{+ve}}{T_{+ve} + F_{+ve}} \tag{21} $$

$$ \text{Sensitivity } (SEN) = \frac{T_{+ve}}{T_{+ve} + F_{-ve}} \tag{22} $$

$$ \text{Specificity } (SPE) = \frac{T_{-ve}}{T_{-ve} + F_{+ve}} \tag{23} $$

$$ \text{Negative Predictive Value } (NPV) = \frac{T_{-ve}}{T_{-ve} + F_{-ve}} \tag{24} $$

$$ \text{F1-Score } (F1S) = \frac{2T_{+ve}}{2T_{+ve} + F_{-ve} + F_{+ve}} \tag{25} $$

where F_{+ve}, F_{-ve}, T_{+ve} and T_{-ve} represent false positives, false negatives, true positives and true negatives, respectively.
COVID-19 dataset
Clinical-level analysis of pneumonia due to COVID-19 is generally performed using CTS. This work considers 500 grayscale lung CTS (250 normal and 250 COVID-19 class) for the evaluation. The normal CTS are collected from LIDC-IDRI [54-56] and RIDER-TCIA [57,58], and the COVID-19 class images are collected from the Radiopaedia database [59-67] and the benchmark test images available at [68]. All images are resized to 512x512x1 pixels, and the resized images are used for the experimental investigation. Sample test images considered in the proposed study are depicted in Figure 3.
Figure 3.
Sample test images considered in the proposed study (Normal and COVID-19 classes)
4. Results and Discussion
In this section, the experimental outcomes are presented and discussed. The MLS is implemented on a workstation with an Intel i5 2.GHz processor, 8GB RAM and 2GB VRAM, equipped with MATLAB®. The experiments show that the MLS needs a mean time of 183±17 s to process the considered CTS dataset. A benefit of this MLS is that it is an automated technique and does not require operator assistance during CTS classification. The performance of the implemented thresholding and segmentation technique is first examined using the clinical-grade CTS provided in the Radiopaedia case study [69], and the results are presented in Figure 4. The axial, coronal and sagittal slices of the case study are assessed to confirm the performance of the proposed system, and finally the infection due to COVID-19 is extracted using the watershed segmentation recently discussed in [26]. The results confirm that the proposed work offers good segmentation on the considered CTS irrespective of orientation. Fig 4(a) and (b) depict the test image and the CBA+KE thresholded image, respectively. Fig 4(c) and (d) depict the outputs of the threshold filter for > Th and < Th, respectively. Fig 4(e) presents the infection extracted using the watershed algorithm, showing that the proposed scheme works on all orientations of the CTS and extracts the infection with good accuracy.
Figure 4.
Sample results attained with 2D slices of the CT scan images for the axial, coronal and sagittal views. (a) Test image, (b) Threshold image, (c) Separated section with threshold > Th, (d) Extracted section with threshold < Th, (e) Extracted COVID-19 infection

After confirming the segmentation performance on the case study, the proposed MLS is used to classify the CTS database into the normal/COVID-19 class. As discussed in Section 3, all images of the CTS database are first enhanced using CBA+KE thresholding, and the ROI is extracted from the enhanced image by the threshold filter with a chosen threshold of Th = 179 ± 4. The image features of the original image, the thresholded image and the ROI are then extracted using DWT, GLCM, HuM and entropies, as discussed in Section 3.3. Feature selection is applied to the DWT features, which yields a final feature vector of dimension 47x1 for each image case: FV_Original, FV_Threshold and FV_ROI. Since the features of two of these FVs are approximately similar, only one of them is fused with the third FV to obtain FFV_1 (94 features), while FFV_2 (141 features) fuses all three FVs. Classifier training, testing and validation are implemented separately using FV, FFV_1 and FFV_2, and the results are analyzed to identify the best classifier for the proposed MLS. Initially, a single FV is used to evaluate the classifiers, and the results are depicted in Figure 5. The performance of each classifier is verified using five-fold cross-validation, and the best value among the five trials is chosen for the assessment. Fig 5(a) and (b) show the confusion matrices obtained for the NB and KNN classifiers, and Fig 5(c)-(e) depict those of DT, RF and SVM, respectively. These confusion matrices show that the accuracy and overall performance offered by the SVM are superior to those of the other classifiers considered in this research; the classification accuracy attained by the SVM is 81.20%.

Confusion-matrix values shown in Figure 5 (actual COVID-19: TP, FN; actual Normal: FP, TN):
Panel | Classifier | TP | FN | FP | TN | SEN | SPE | PRE | NPV | ACC
(a) | NB | 203 | 47 | 52 | 198 | 0.8120 | 0.7920 | 0.7961 | 0.8082 | 0.8020
(b) | KNN | 198 | 52 | 44 | 206 | 0.7920 | 0.8240 | 0.8182 | 0.7984 | 0.8080
(c) | DT | 201 | 49 | 48 | 202 | 0.8040 | 0.8080 | 0.8072 | 0.8048 | 0.8060
(d) | RF | 196 | 54 | 49 | 201 | 0.7840 | 0.8040 | 0.8000 | 0.7882 | 0.7940
(e) | SVM | 209 | 41 | 53 | 197 | 0.8360 | 0.7880 | 0.7977 | 0.8277 | 0.8120
Figure 5.
Confusion matrices achieved during the classification task executed with FV

The same procedure is then repeated using FFV_1 (94 features) and FFV_2 (141 features), and the results are presented in Table 2 and Table 3. These tables show that the classifier accuracy achieved with the fused feature vectors is better, indicating that, for this dataset, increasing the number of features improved the classification accuracy. The accuracy attained with FFV_2 is better than that attained with FFV_1 and FV. Along with accuracy, it is necessary to assess the overall performance of the classifier to confirm its clinical significance; for this purpose, a Glyph plot is used in this work. The Glyph plot [] provides a graphical representation based on the amplitudes of the performance measures, and a Glyph plot with a larger dimension usually represents better overall performance. The Glyph plots obtained for the classifiers using FV, FFV_1 and FFV_2 are depicted in Figure 6. Fig 6(a) shows that the overall performance attained with SVM is superior, Fig 6(b) shows the better performance of RF, and Fig 6(c) confirms the performance of SVM. For all the feature cases, the overall performance of NB, KNN and DT is lower than that of RF and SVM.
Table 2.
Initial performance values achieved with the proposed system (FNR = false-negative rate, FPR = false-positive rate)
Features | Classifier | TP | FN | TN | FP | FNR | FPR
FFV_1 (two FVs, 94 features) | NB | 201 | 49 | 211 | 39 | 0.1960 | 0.1560
 | KNN | 207 | 43 | 209 | 41 | 0.1720 | 0.1640
 | DT | 209 | 41 | 212 | 48 | 0.1640 | 0.1846
 | RF | 216 | 34 | 213 | 37 | 0.1360 | 0.1480
 | SVM | 212 | 38 | 211 | 39 | 0.1520 | 0.1560
FFV_2 (three FVs, 141 features) | NB | 222 | 28 | 217 | 33 | 0.1120 | 0.1320
 | KNN | 226 | 24 | 210 | 40 | 0.0960 | 0.1600
 | DT | 224 | 26 | 218 | 32 | 0.1040 | 0.1280
 | RF | 228 | 22 | 213 | 37 | 0.0880 | 0.1480
 | SVM | 218 | 32 | 231 | 19 | 0.1280 | 0.0760
Table 3.
Performance measures attained using the proposed machine-learning scheme
Features | Classifier | ACC | PRE | SEN | SPE | F1S | NPV
FFV_1 (two FVs, 94 features) | NB | 0.8240 | 0.8375 | 0.8040 | 0.8440 | 0.8204 | 0.8115
 | KNN | 0.8320 | 0.8347 | 0.8280 | 0.8360 | 0.8313 | 0.8294
 | DT | 0.8255 | 0.8132 | 0.8360 | 0.8154 | 0.8245 | 0.8379
 | RF | 0.8580 | 0.8538 | 0.8640 | 0.8520 | 0.8588 | 0.8623
 | SVM | 0.8460 | 0.8446 | 0.8480 | 0.8440 | 0.8463 | 0.8474
FFV_2 (three FVs, 141 features) | NB | 0.8780 | 0.8706 | 0.8880 | 0.8680 | 0.8792 | 0.8857
 | KNN | 0.8720 | 0.8496 | 0.9040 | 0.8400 | 0.8760 | 0.8974
 | DT | 0.8840 | 0.8750 | 0.8960 | 0.8720 | 0.8854 | 0.8934
 | RF | 0.8820 | 0.8604 | 0.9120 | 0.8520 | 0.8854 | 0.9064
 | SVM | 0.8980 | 0.9198 | 0.8720 | 0.9240 | 0.8953 | 0.8783
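The performance measures of Eqs. (18)-(25) can be checked against the tabulated counts with a few lines of plain Python (`confusion_metrics` is a hypothetical helper, not part of the original toolchain). Feeding it the SVM/FFV_2 counts of Table 2 (TP = 218, FN = 32, TN = 231, FP = 19) reproduces the SVM/FFV_2 row of Table 3: ACC 0.8980, PRE 0.9198, SEN 0.8720, SPE 0.9240, F1S 0.8953 and NPV 0.8783.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Performance measures of Eqs. (18)-(25) from confusion-matrix counts."""
    return {
        "FNR": fn / (fn + tp),                      # Eq. (18)
        "FPR": fp / (fp + tn),                      # Eq. (19)
        "ACC": (tp + tn) / (tp + tn + fp + fn),     # Eq. (20)
        "PRE": tp / (tp + fp),                      # Eq. (21)
        "SEN": tp / (tp + fn),                      # Eq. (22)
        "SPE": tn / (tn + fp),                      # Eq. (23)
        "NPV": tn / (tn + fn),                      # Eq. (24)
        "F1S": 2 * tp / (2 * tp + fn + fp),         # Eq. (25)
    }


# SVM with FFV_2, counts copied from Table 2
m = confusion_metrics(tp=218, fn=32, tn=231, fp=19)
print({name: round(value, 4) for name, value in m.items()})

# SVM with a single FV, counts copied from Figure 5(e)
m_fv = confusion_metrics(tp=209, fn=41, tn=197, fp=53)
print(round(m_fv["ACC"], 4))
```

The same function applied to the Figure 5(e) counts gives the 81.20% accuracy quoted for the SVM with a single FV.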
Figure 6.
Glyph plot of the performance measures obtained during the classification task: (a) classification with FV, (b) classification with FFV_1 (94 features), (c) classification with FFV_2 (141 features)
Figure 7.
Classification accuracy achieved with the proposed system with various feature vectors
The results presented in Figure 7 indicate that increasing the number of features increased the classification accuracy. The accuracy attained using FFV_2 (141 features) is superior to that attained with the other feature vectors, irrespective of the classifier. In the proposed research, the MLS examined the CTS dataset of normal/COVID-19 class and attained an accuracy of >89% with the SVM classifier. In future, a suitable DL procedure can be implemented to improve the classification accuracy.
5. Conclusions
The aim of this research is to propose a computerized system to distinguish normal and COVID-19 CTS images in a considered image database. This work proposes an MLS with a sequence of procedures, ranging from image pre-processing to classification, to achieve good detection accuracy. The proposed MLS first applies CBA+KE image thresholding to enhance the test image and then a threshold filter to separate the ROI and artifact. Feature extraction, feature selection, feature fusion and classification are then employed. The classifiers NB, KNN, DT, RF and SVM are considered, and their performance is tested individually with the feature vectors FV, FFV_1 (94 features) and FFV_2 (141 features). The experimental investigation shows that the classification accuracy of SVM is 89.80% when FFV_2 is used to train, test and validate the classifier. This indicates that the proposed MLS equipped with the SVM classifier attains better classification on the considered CTS database.
References
1. Celik, Y. et al. (2020) Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images. Pattern Recognition Letters, 133, 232-239. https://doi.org/10.1016/j.patrec.2020.03.011
2. Das, A. et al. Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques. Cognitive Systems Research, 54, 165-175. https://doi.org/10.1016/j.cogsys.2018.12.009
3. Sharif, M. et al. A unified patch based method for brain tumor detection using features fusion. Cognitive Systems Research, 59, 273-286. https://doi.org/10.1016/j.cogsys.2019.10.001
4. Yan, R. et al. (2020) Chest CT Severity Score: An Imaging Tool for Assessing Severe COVID-19. Radiology: Cardiothoracic Imaging, 2(2). https://doi.org/10.1148/ryct.2020200047
5. Fong, S.J., Li, G., Dey, N., Crespo, R.G., Herrera-Viedma, E. (2020) Finding an Accurate Early Forecasting Model from Small Dataset: A Case of 2019-nCoV Novel Coronavirus Outbreak. International Journal of Interactive Multimedia and Artificial Intelligence, 6(1), 132-140. https://doi.org/10.9781/ijimai.2020.02.002
6. Fong, S.J., Li, G., Dey, N., Crespo, R.G., Herrera-Viedma, E. (2020) Composite Monte Carlo Decision Making under High Uncertainty of Novel Coronavirus Epidemic Using Hybridized Deep Learning and Fuzzy Rule Induction, 9. arXiv:2003.09868 [cs.AI]
7. …th April 2020)
8. …th April 2020)
9. …th April 2020)
10. Nascimento, I.B.D. et al. (2020) Novel Coronavirus Infection (COVID-19) in Humans: A Scoping Review and Meta-Analysis. J. Clin. Med., 9(4), 941. https://doi.org/10.3390/jcm9040941
11. Bernheim, A. et al. (2020) Chest CT Findings in Coronavirus Disease-19 (COVID-19): Relationship to Duration of Infection. Radiology. https://doi.org/10.1148/radiol.2020200463
12. Santosh, K.C. (2020) AI-Driven Tools for Coronavirus Outbreak: Need of Active Learning and Cross-Population Train/Test Models on Multitudinal/Multimodal Data. J Med Syst, 44, 93. https://doi.org/10.1007/s10916-020-01562-1
13. Chua, F. et al. (2020) The role of CT in case ascertainment and management of COVID-19 pneumonia in the UK: insights from high-incidence regions. Lancet Resp Med. https://doi.org/10.1016/S2213-2600(20)30132-6
14. Li, Q. et al. (2020) Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med. https://doi.org/10.1056/NEJMoa2001316
15.
Liu, K-C. et al. (2020) CT manifestations of coronavirus disease-2019: A retrospective analysis of 73 cases by disease severity. Eur J Radiol,, 126, 108941. https://doi.org/10.1016/j.ejrad.2020.108941. 16.
Verity, R. et al. (2020) Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases. https://doi.org/10.1016/S1473-3099(20)30243-7. 17.
Fang, Y. et al. (2020) Sensitivity of chest CT for COVID-19: comparison to RT-PCR. Radiology. DOI:10.1148/radiol.2020200432. 18.
Zhou, Z., Guo, D., Li, C. et al. (2020) Coronavirus disease 2019: initial chest CT findings. Eur Radiol . https://doi.org/10.1007/s00330-020-06816-7. 19.
Yoon, SH. et al. Chest Radiographic and CT Findings of the 2019 Novel Coronavirus Disease (COVID-19): Analysis of Nine Patients Treated in Korea. Korean J Radiol. 2020, 21(4):494-500. Doi: 10.3348/kjr.2020.0132. 20.
Li, K., Fang, Y., Li, W. et al. (2020) CT image visual quantitative evaluation and clinical classification of coronavirus disease (COVID-19). Eur Radiol. https://doi.org/10.1007/s00330-020-06817-6. 21.
Song, F, Shi, N, Shan, F, Zhang, Z, Shen, J, Lu, H et al. (2020) Emerging Coronavirus 2019-nCoV Pneumonia. Radiology, 295, 210–217. 22.
Chung, M, Bernheim, A, Mei, X, Zhang, N, Huang, M, Zeng, X et al. (2020) CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV). Radiology, 295, 202–207. https://doi.org/10.1148/radiol.2020200230. 23. https://healthcare-in-europe.com/en/news/ct-outperforms-lab-diagnosis-for-coronavirus-infection.html (Last accessed date 20 th April 2020) 24.
Bai, H.X. et al. (2020) Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT. Radiology. DOI: 10.1148/radiol.2020200823.
Wang, Y. et al. (2020) Temporal Changes of CT Findings in 90 Patients with COVID-19 Pneumonia: A Longitudinal Study. Thoracic Imaging. https://doi.org/10.1148/radiol.2020200843. 26.
V. Rajinikanth, N. Dey, A.N.J. Raj, A.E. Hassanien, K.C. Santosh, N.S.M. Raja. Harmony-Search and Otsu based System for Coronavirus Disease (COVID-19) Detection using Lung CT Scan Images, 2020. arXiv:2004.03431 [eess.IV]. 27.
V. Rajinikanth, S. Kadry, K.P. Thanaraj, K. Kamalanand, S. Seo. Firefly-Algorithm Supported Scheme to Detect COVID-19 Lesion in Lung CT Scan Images using Shannon Entropy and Markov-Random-Field, 2020, arXiv:2004.09239 [eess.IV]. 28.
Y-H. Wu, S-H. Gao, J. Mei, J. Hu, D-P. Fan, C-W. Zhao, M-M. Cheng, JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation, 2020. arXiv:2004.07054 [eess.IV]. 29.
A.I. Khan, J.L. Shah, M. Bhat, CoroNet: A Deep Neural Network for Detection and Diagnosis of Covid-19 from Chest X-ray Images, 2020. arXiv:2004.04931 [eess.IV]. 30.
M. Rahimzadeh, A. Attar, A New Modified Deep Convolutional Neural Network for Detecting COVID-19 from X-ray Images, 2020. arXiv:2004.08052 [eess.IV]. 31.
U. Ozkaya, S. Ozturk, M. Barstugan, Coronavirus (COVID-19) Classification using Deep Features Fusion and Ranking Technique, 2020. arXiv:2004.03698 [eess.IV]. 32.
T. Zhou, S. Canu, S. Ruan, An automatic COVID-19 CT segmentation based on U-Net with attention mechanism, 2020. arXiv:2004.06673 [eess.IV]. 33.
F. Shi, J. Wang, J. Shi, Z. Wu, Q. Wang, Z. Tang, K. He, Y. Shi, D. Shen, Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19, IEEE Reviews in Biomedical Engineering, 2020. Doi: 10.1109/RBME.2020.2987975. 34.
Bhandary, A. et al. (2020) Deep-learning framework to detect lung abnormality–A study with chest X-Ray and lung CT scan images. Pattern Recogn Lett, 129, 271-278. https://doi.org/10.1016/j.patrec.2019.11.013. 35.
P. Upadhyay, J.K. Chhabra, Kapur’s entropy based optimal multilevel image segmentation using Crow Search Algorithm. Applied Soft Computing, 105522, 2019. https://doi.org/10.1016/j.asoc.2019.105522. 36.
N.S.M. Raja et al., Segmentation of breast thermal images using Kapur's entropy and hidden Markov random field. Journal of Medical Imaging and Health Informatics, 7(8), 1825-1829, 2017. https://doi.org/10.1166/jmihi.2017.2267. 37.
J.N. Kapur, P.K. Sahoo, A.K.C. Wong, A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Process. 29, 273–285, 1985. 38.
I. Stewart, The Lorenz Attractor Exists. Nature 406, 948-949, 2000. 39.
U. Yüzgeç, M. Eser, Chaotic based differential evolution algorithm for optimization of baker's yeast drying process. Egyptian Informatics Journal, 19(3), 151-163, 2018. https://doi.org/10.1016/j.eij.2018.02.001. 40.
X.S. Yang, Nature-Inspired Metaheuristic Algorithms, 2nd edn. Luniver Press, Frome (2011) 41.
X.S. Yang Bat algorithm: literature review and applications. Int J Bio-Inspired Comput 5(3), 141–149, 2013. 42.
Satapathy, S.C., Raja, N.S.M., Rajinikanth, V., Ashour, A.S.: Dey, N: Multi-level image thresholding using Otsu and chaotic bat algorithm. Neural Comput. Appl. 29(12), 1285-1307, 2018. https://doi.org/10.1007/s00521-016-2645-5 43.
Acharya, UR. et al. (2019) Automated detection of Alzheimer’s disease using brain MRI images–a study with various feature extraction techniques. Journal of Medical Systems, 43(9), 302. https://doi.org/10.1007/s10916-019-1428-9 44.
J.H. Tan, E.Y.K. Ng, U.R. Acharya, C. Chee, Study of normal ocular thermogram using textural parameters. Infrared Physics & Technology, 53(2): 120-126, 2010. 45.
C.E.Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27: 623–656, 1948. 46.
G.A. Darbellay, I. Vajda, Entropy expressions for multivariate continuous distributions. IEEE Trans. Inf. Theory, 46, 709–712, 2000. 47.
U.R. Acharya et al., Application of nonlinear methods to discriminate fractionated electrograms in paroxysmal versus persistent atrial fibrillation, Computer methods and programs in biomedicine, vol.175, pp. 163-178, 2019. https://doi.org/10.1016/j.cmpb.2019.04.018. 48.
C. Tsallis, Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52(1), 479–487, 1988. 49.
P.L. Kannappan, On Shannon’s entropy, directed divergence and inaccuracy. Probab. Theory Rel. Fields 22, 95–100, 1972. https://doi.org/10.1016/S0019-9958(73)90246-5
J.E.W. Koh et al., Automated diagnosis of celiac disease using DWT and nonlinear features with video capsule endoscopy images, Future Generation Computer Systems, vol.90, pp. 86-93, 2019. https://doi.org/10.1016/j.future.2018.07.044. 51.
U.R. Acharya et al., Automated detection and classification of liver fibrosis stages using contourlet transform and nonlinear features, Computer methods and programs in biomedicine, vol.166, pp. 91-98, 2018. https://doi.org/10.1016/j.cmpb.2018.10.006. 52.
M. Tuceryan, A.K. Jain, Texture Analysis. In: The Handbook of Pattern Recognition and Computer Vision, 2nd edn., pp. 207–248. World Scientific Publishing Co., Singapore, 1998. 53.
Dey, N. et al. (2019) Social-Group-Optimization based tumor evaluation tool for clinical brain MRI of Flair/diffusion-weighted modality. Biocybernetics and Biomedical Engineering, 39(3), 843-856. https://doi.org/10.1016/j.bbe.2019.07.005 54.
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. (2013) The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, 26(6), 1045-1057. DOI: https://doi.org/10.1007/s10278-013-9622-7 55.
Armato, SG, et al. (2011) The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38, 915—931. DOI: https://doi.org/10.1118/1.3528204 56.
Armato, SG, et al. (2015) Data From LIDC-IDRI. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX 57.
Zhao, Binsheng, Schwartz, Lawrence H, & Kris, Mark G. (2015). Data From RIDER_Lung CT. The Cancer Imaging Archive. DOI: 10.7937/K9/TCIA.2015.U1X8A5NR 58.
Zhao, B. et al. (2009) Evaluating Variability in Tumor Measurements from Same-day Repeat CT Scans of Patients with Non–Small Cell Lung Cancer 1. Radiology. Radiological Society of North America (RSNA). DOI: 10.1148/radiol.2522081593 59. https://radiopaedia.org/articles/covid-19-3 (Last accessed date 20th April 2020) 60.
Case courtesy of Dr Chong Keng Sang, Sam, Radiopaedia.org, rID: 73893(Last accessed date 20th April 2020) 61.
Case courtesy of Dr Domenico Nicoletti, Radiopaedia.org, rID: 74724 (Last accessed date 20th April 2020) 62.
Case courtesy of Dr Fabio Macori, Radiopaedia.org, rID: 74867 (Last accessed date 20th April 2020) 63.
Case courtesy of Dr Fateme Hosseinabadi , Radiopaedia.org, rID: 74868 (Last accessed date 20th April 2020) 64.
Case courtesy of Dr Derek Smith, Radiopaedia.org, rID: 75249 (Last accessed date 20th April 2020) 65.
Case courtesy of Dr Bahman Rasuli, Radiopaedia.org, rID: 74880 (Last accessed date 20th April 2020) 66.
Case courtesy of Dr Mohammad Taghi Niknejad, Radiopaedia.org, rID: 75605 (Last accessed date 20th April 2020) 67.
Case courtesy of Dr Mohammad Taghi Niknejad, Radiopaedia.org, rID: 75662 (Last accessed date 20th April 2020) 68. http://medicalsegmentation.com/covid19/ (Last accessed date 20th April 2020) 69.