Classification of COVID-19 via Homology of CT-SCAN
Sohail Iqbal, H. Fareed Ahmed, Talha Qaiser, Muhammad Imran Qureshi, Nasir Rajpoot
Graphical Abstract
Figure 1: Graphical Abstract
Sohail Iqbal a,1,∗, Hafiz Fareed Ahmed a,1, Talha Qaiser b,1, Muhammad Imran Qureshi c, Nasir Rajpoot d
a Department of Mathematics, COMSATS University Islamabad, Park Road, Islamabad, Pakistan
b Department of Computing, Imperial College London, SW7 2AZ, United Kingdom
c Department of Mathematics, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia
d Department of Computer Science, The University of Warwick, Coventry, CV4 7AL, United Kingdom
Abstract
In this worldwide spread of SARS-CoV-2 (COVID-19) infection, it is of utmost importance to detect the disease at an early stage, especially in the hot spots of this epidemic. There are more than 110 million infected cases across the globe so far. Due to its promptness and effective results, computed tomography (CT)-scan imaging is preferred to the reverse-transcription polymerase chain reaction (RT-PCR). Early detection and isolation of the patient is the only possible way of controlling the spread of the disease. Automated analysis of CT-scans can provide enormous support in this process. In this article, we propose a novel approach to detect SARS-CoV-2 using CT-scan images. Our method is based on a very intuitive and natural idea of analyzing shapes, an attempt to mimic a professional medic. We mainly trace SARS-CoV-2 features by quantifying their topological properties. We primarily use a tool called persistent homology, from Topological Data Analysis (TDA), to compute these topological properties.

We train and test our model on the “SARS-CoV-2 CT-scan dataset” (Soares et al., 2020), an open-source dataset containing 2,481 CT-scans of normal and COVID-19 patients. Our model yielded an overall benchmark F1 score of 99 . . . .

∗ Corresponding author
Email addresses: [email protected] (Sohail Iqbal), [email protected] (Hafiz Fareed Ahmed), [email protected] (Talha Qaiser), [email protected] (Muhammad Imran Qureshi), [email protected] (Nasir Rajpoot)
1 These authors contributed equally to this project.
Preprint submitted to Elsevier, February 23, 2021

1. Introduction
On March 11, 2020, the WHO declared a pandemic caused by the novel coronavirus (nCoV). The disease is well known as SARS-CoV-2, or COVID-19, and originated in Wuhan, China, in December 2019. After more than a year the virus is still spreading exponentially and has reached 212 countries. At the time of writing this paper there are more than 110 million infected cases, whereas the death toll has crossed 2.4 million.

The main reason for the spread of the virus is its asymptomatic nature, where an affected person spreads the virus without any signs of illness. The only precaution that is required worldwide is immediate isolation of the affected person. For this purpose, effective and timely testing is the core factor for treatment and prevention.

COVID-19 is a member of the SARS (Severe Acute Respiratory Syndrome) family of viruses. It causes severe respiratory illness and primarily damages the lungs (Huang et al., 2020). Two methods are used for the detection of COVID-19: the reverse transcription polymerase chain reaction (RT-PCR) and the chest computed tomography (CT) scan. In RT-PCR, RNA is reverse-transcribed into DNA, and the presence of virus DNA is then detected. This method is a laboratory technique that requires trained personnel to carry out the whole process. The test might take hours, if not days, to give results. The situation worsens in severely affected areas with limited test kits and trained personnel. RT-PCR testing also gives false-negative results in some cases (Tahamtan & Ardebili, 2020). In the chest CT-scan, by contrast, the image analysis is done by radiologists (Kong & Agarwal, 2020). The latter is a much more sensitive method than RT-PCR. A study of 1014 cases shows the significance of chest CT-scans over RT-PCR (Ai et al., 2020). One study finds that 40 out of 41 (98%) patients had pneumonia with abnormal findings on chest CT-scans (Huang et al., 2020).

Image analysis by radiologists suggests that in COVID-19-positive patients, ground-glass opacities (GGOs), together with consolidations and crazy paving, appear at the peripheral portions of both lungs. The increased attenuation in the chest CT scan is the main feature in the detection of COVID-19. Detection of these features is relatively time efficient due to the higher sensitivity of this method (Li et al., 2020).

Over the past decade, one of the promising directions in health care innovation has been the application of artificial intelligence (AI) to medical imaging. In recent years, AI in general has revolutionized the fields of computer vision (Voulodimos et al., 2018) and natural language processing (Ruder et al., 2019; Ruder, 2019) by pushing state-of-the-art performance in various pattern recognition tasks. More recently, there is an upward trend in exploring the usability of machine learning algorithms for medical imaging data (Currie et al., 2019). In the current scenario, with the worldwide outbreak of SARS-CoV-2, it is imperative to develop screening tools to analyze COVID-19 chest CT-scans. One of the main challenges in deep learning is data hungriness. For a deep learning model to converge, one may require thousands, and in some cases millions, of images (Deng et al., 2009) for training. On the other hand, techniques of Topological Data Analysis (TDA) use geometrical features, making them efficient in terms of amount of data required, speed, predictability and interpretability.

TDA is one of the rapidly growing techniques in data analysis. It provides tools to analyze data by bridging techniques from machine learning, statistics, algebraic topology, topology, and algebra. One of the main tools in TDA is persistent homology (PH). It is a very effective technique that records the intrinsic topological properties of data. The essential idea is to produce topological features across a scale. On this scale, some features “die” early and some “live” longer.
The persistence times of these features are the key point of PH. The ideas of PH have been successfully applied to many areas of science and technology, including network structures (de Silva & Ghrist, 2007; Lee et al., 2012), computational biology (Kasson et al., 2007; Yao et al., 2009; Wang & Wei, 2016), data analysis (Carlsson, 2009; Liu et al., 2012; Rieck et al., 2012), image analysis (Carlsson et al., 2009; Frosini & Landi, 2013; Bendich et al., 2010), amorphous material structures (Hiraoka et al., 2016), etc.

In recent years these techniques have been successful in medical image analysis. For example, in (Qaiser et al., 2019, 2016, 2017) the authors developed models, based on PH, for efficient tumor segmentation in whole-slide images of histology slides; in (Garside et al., 2019) topological features are used to differentiate healthy patients from those with diabetic retinopathy; and in (Chung et al., 2018) segmentation of skin cancer in a given image was achieved using these techniques.

In this work we develop a state-of-the-art model to detect traces of COVID-19 infection in CT-scans. To train and test our model we use the “SARS-CoV-2 CT-scan dataset” (Soares et al., 2020).
Figure 2: A sample of CT-scan images from the “SARS-CoV-2 CT-scan dataset” for COVID-19 patients (first row) and normal cases (second row).
There are three main stages in the development of our model. In the first stage, we devise a way to construct a simplicial complex from a given image and then calculate the persistent diagrams (PDs) associated to it. In the second stage, we map our PDs onto a Hilbert sphere, following Anirudh et al. (2016). This step enables us to perform different statistical operations on the space of PDs. In the last stage, we use SVM to develop our classification model.

The rest of the paper is structured as follows. We review the basics of PH in Section 2. In Section 3 we describe our methodology to extract features from CT-scans. Our results are reported in Section 4. Finally, in Section 5, we draw our conclusions.

2. Mathematical Preliminaries
In this section we recall basic notions and definitions leading to persistent homology and persistent diagrams (PDs).
Persistent homology is one of the main tools used in topological data analysis (TDA). It provides a way to analyze the shape of point cloud data without actually calculating the precise geometry. It illuminates some qualitative features of data which persist across multiple scales. These persistent features provide an effective quantification of the shape of data. The method is based on techniques from algebraic topology, a branch of mathematics that deals with different “bridges” between algebra and topology known as “functors”. These functors take topologically equivalent (homotopic) spaces to algebraically equivalent (isomorphic) spaces. The functorial nature of persistent homology makes it robust to perturbations of an input point cloud, a feature rarely found in existing data analysis techniques.

A number of functors exist to deal with different classes of topological spaces (simplicial, cellular, singular, etc.). The development of PH is based on the functor known as simplicial homology. On the topological side we consider a simplicial complex $\Delta$, and on the algebraic side we get vector spaces $H_i(\Delta)$ for $i = 0, 1, 2, \dots$. The dimension of $H_0(\Delta)$ gives the number of connected components, that of $H_1(\Delta)$ the number of holes, that of $H_2(\Delta)$ the number of voids, and so on. In PH we construct simplicial complexes $\Delta_\epsilon$ from a point cloud depending on a scale parameter $\epsilon$. The homological features that remain persistent across scales provide an effective analysis of the shape of data. The summary of these features is shown either on a “bar diagram” (BD) or on a “persistent diagram” (PD).

In what follows we provide a formal overview of the aforementioned terminology. For more details see (Rotman, 2013; Carlsson, 2009; Otter et al., 2017; Edelsbrunner et al., 2000).

Definition 2.1.1 (Simplex).
Let $\{p_0, p_1, \dots, p_n\}$ be an affinely independent set in $\mathbb{R}^n$. The $n$-simplex generated by this set, denoted by $[p_0, p_1, \dots, p_n]$, is the convex hull of the points $p_0, p_1, \dots, p_n$. Every point $x$ of this simplex can be written uniquely in the form

$$x = \sum_{i=0}^{n} t_i p_i, \quad \text{where } \sum_{i=0}^{n} t_i = 1 \text{ and each } t_i \geq 0. \tag{1}$$

A $k$-face of $[p_0, p_1, \dots, p_n]$ is a simplex generated by a collection of $k + 1$ points from $\{p_0, p_1, \dots, p_n\}$. A $k$-face is in fact a $k$-dimensional geometric object. A simplicial complex is obtained by “gluing” together different simplices along their common faces.

Definition 2.1.2 (Simplicial Complex).
A finite simplicial complex $\Delta$ is a collection of simplices in $\mathbb{R}^n$ such that (1) if $\alpha \in \Delta$ then every face of $\alpha$ belongs to $\Delta$, and (2) for any two simplices $\alpha_1, \alpha_2 \in \Delta$, the intersection $\alpha_1 \cap \alpha_2$ is either empty or a common face of $\alpha_1$ and $\alpha_2$.

Figure 3: Orientation of a simplicial complex

These conditions guarantee that $\Delta$ records changes in each dimension. In order to calculate simplicial homology we need an orientation on the simplicial complex. An orientation of a simplicial complex is a partial order of its vertices which, when restricted to a particular simplex, gives a total order.
Figure 4: A simplicial complex with a partial order on its vertices

For an oriented simplicial complex $\Delta$ and an integer $m \geq -1$, the $m$-th chain group $C_m(\Delta)$ consists of formal sums of the form $a_1 \sigma_1 + a_2 \sigma_2 + \cdots + a_n \sigma_n$, where $a_1, \dots, a_n \in \mathbb{Z}$ and $\sigma_1, \dots, \sigma_n$ are oriented $m$-simplices of $\Delta$. For convenience, we define $C_m(\Delta) = \{0\}$ for $m > \dim \Delta$ and $m < 0$. An oriented simplex is an element of the chain group $C_m(\Delta)$ of the form $\pm \langle p_0, p_1, \dots, p_m \rangle$, where $p_0, \dots, p_m$ are distinct. The boundary operator $\partial_m : C_m(\Delta) \to C_{m-1}(\Delta)$ is defined by setting

$$\partial_m(\langle p_0, p_1, \dots, p_m \rangle) = \sum_{i=0}^{m} (-1)^i \langle p_0, \dots, \hat{p}_i, \dots, p_m \rangle,$$

where $\hat{p}_i$ means deleting $p_i$, and extends by linearity. Combining all this information, we get the chain complex

$$\cdots \xrightarrow{\partial_{m+1}} C_m(\Delta) \xrightarrow{\partial_m} C_{m-1}(\Delta) \xrightarrow{\partial_{m-1}} \cdots \xrightarrow{\partial_2} C_1(\Delta) \xrightarrow{\partial_1} C_0(\Delta) \xrightarrow{\partial_0} 0,$$

such that the composition of any two consecutive maps is the zero map. We define the $m$-th simplicial homology as

$$H_m(\Delta) = \operatorname{Kernel}(\partial_m) / \operatorname{Image}(\partial_{m+1}), \quad \text{where } 0 \leq m \leq \dim \Delta.$$

Its dimension $\beta_m(\Delta) = \dim H_m(\Delta)$ is called the $m$-th Betti number of the simplicial complex $\Delta$.

For a data set $X$, we can calculate its Betti numbers after imposing a simplicial complex $\Delta(X)$ on it. On its own this information is of limited use, since it only gives the number of connected components, 1-dimensional holes, etc. To extract further information, we construct a filtered complex, which consists of nested subcomplexes $\Delta(X, \epsilon)$ of $\Delta$ that depend on a scale parameter $\epsilon$, such that

$$\Delta(X, \epsilon_1) \subseteq \Delta(X, \epsilon_2) \quad \text{whenever } \epsilon_1 \leq \epsilon_2.$$

We can apply the simplicial homology functor to each subcomplex. The inclusion map on the complexes $\Delta(X, \epsilon_1) \subset \Delta(X, \epsilon_2)$ induces a linear map on the homology groups $H_m(\Delta(X, \epsilon_1)) \to H_m(\Delta(X, \epsilon_2))$, for $0 \leq m \leq \dim \Delta$. Hence the homology of this filtered complex consistently provides information about $\Delta$ at different values of $\epsilon$. One can represent these features, over various scales, using persistent diagrams (PDs).

To exemplify, consider a point cloud in $\mathbb{R}^2$, say $S$, given in Fig 5. To extract qualitative information from this data we compute topological features for different values of a scale parameter $\epsilon$. At each scale level $\epsilon$, we consider open discs, say $D(p, \epsilon)$, of radius $\epsilon$ around each point $p$. Then we build a simplicial complex $\Delta(S, \epsilon)$ using the following rule: a set of points $A = \{p_0, \dots, p_n\}$ forms an $n$-simplex if $\bigcap_{p \in A} D(p, \epsilon) \neq \emptyset$. At each level the simplicial complex is made up of the simplices shown in Fig 5. As the value of $\epsilon$ increases from 0, we get a filtration of simplicial complexes $\Delta(S, \epsilon)$.

Figure 5: A filtration of simplicial complexes built from a given point cloud
We can represent the birth and death of these topological features using the persistent diagram. A persistent diagram (PD) is a collection of ordered pairs in the extended plane. A point $(a, b)$ represents a feature born at scale parameter value $\epsilon = a$ that dies at $\epsilon = b$. The points that touch the infinity line are the persistent features that do not die before the last value of our filtration parameter. In Fig 6 we see a point on the line at infinity. This shows that there is a one-dimensional hole (or loop) that appears at $\epsilon = 2$ (see Fig 5) and never dies. This is consistent with the observation that the overall shape of the data in Fig 5 is circular.

Figure 6: PDs representing topological features of the point cloud from Fig 5
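To make the Betti numbers and the role of the scale parameter concrete, the sketch below builds a complex at a fixed scale from eight points on a circle and computes $\beta_0$ and $\beta_1$ as ranks of boundary matrices. It is an illustration only, not the authors' code. As a simplifying assumption it adds a simplex when all pairwise distances are below $\epsilon$ (the Vietoris–Rips rule) rather than using the disc-intersection rule of the example above, and it computes ranks over the rationals. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import cos, dist, pi, sin


def rank(matrix):
    """Rank of an integer matrix over the rationals (Gaussian elimination)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def boundary_matrix(faces, simplices):
    """Boundary operator as a matrix: rows are faces, columns are simplices."""
    index = {face: k for k, face in enumerate(faces)}
    matrix = [[0] * len(simplices) for _ in faces]
    for col, simplex in enumerate(simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # delete the i-th vertex
            matrix[index[face]][col] = (-1) ** i  # sign from the boundary formula
    return matrix


def betti_numbers(points, eps):
    """Return (beta_0, beta_1) of the complex built at scale eps."""
    n = len(points)
    vertices = [(i,) for i in range(n)]
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) < eps]
    edge_set = set(edges)
    triangles = [t for t in combinations(range(n), 3)
                 if all(e in edge_set for e in combinations(t, 2))]
    rank_d1 = rank(boundary_matrix(vertices, edges))
    rank_d2 = rank(boundary_matrix(edges, triangles))
    # beta_0 = dim C_0 - rank d_1; beta_1 = dim ker d_1 - rank d_2
    return n - rank_d1, len(edges) - rank_d1 - rank_d2


circle = [(cos(2 * pi * k / 8), sin(2 * pi * k / 8)) for k in range(8)]
for eps in (0.5, 1.0, 2.1):
    print(eps, betti_numbers(circle, eps))
```

At the smallest scale the points are isolated, at the middle scale neighbouring points join into a single loop, and at the largest scale the loop is filled in. A persistent diagram records exactly these appearances and disappearances.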
Apart from its stability, another important feature of PH is that it can be computed using different algorithms. There are many libraries available that implement algorithms for the computation of PH. These libraries include Perseus, javaPlex, Dionysus, ripser, Gudhi, etc. We use ripser (Tralie et al., 2018) due to its computational efficiency (Otter et al., 2017).

There are many ways to impose a simplicial complex on a given point cloud. The choice depends on the nature of the data, the computational cost, and the restrictions of the software package used. Some typical simplicial complexes are the Vietoris–Rips complex, Čech complex, Delaunay complex, clique complex, alpha complex, strong witness complex, weak witness complex, etc.
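As a small illustration of what such libraries compute, the sketch below derives the zero-dimensional persistence pairs of a Vietoris–Rips filtration with a union-find structure: every point is born at scale 0, and a component dies at the length of the edge that merges it into another one. This is a pure-Python sketch written for this text, not the ripser API, and the function name `vr_h0_persistence` is hypothetical.

```python
from itertools import combinations
from math import dist, inf


def vr_h0_persistence(points):
    """Return the 0-dimensional (birth, death) pairs of a Vietoris-Rips filtration."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j       # two components merge
            pairs.append((0.0, length))   # one of them dies at this scale
    if points:
        pairs.append((0.0, inf))          # the last component never dies
    return pairs


print(vr_h0_persistence([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]))
```

Higher-dimensional features (loops and voids) need the full boundary-matrix reduction, which is where an optimized library becomes necessary.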
3. Methods and Algorithm
Thoracic radiology evaluations found high rates of ground-glass opacities and consolidations in COVID-19 patients. One can observe the ground-glass opacities (GGOs) together with consolidations in the CT-scans of COVID-19 patients. These regions are isolated and differ in shape (left image in Fig. 7), which is captured by the PDs associated to $H_0$ and $H_1$. Moreover, these regions have a distinctive shape in the intensity plot; see the appearance of peaks and saddle points in Fig. 7 (right image). These are recorded by the PDs of the lower-star filtration and $H_2$.

Figure 7: Shape features and intensity plot in a COVID-19 infected CT-scan

All these shape features are captured by PDs associated to two different filtered complexes, namely the filtered Vietoris–Rips (VR) complex and the lower-star filtration. The whole process is summarized as two pipelines in Fig 8. In the following subsections, we describe each step of these two pipelines.
Figure 8: Extraction of topological features from a CT-scan of a COVID-19 patient. (A) Quantization of the input image, which is used to construct the feature point cloud (FPC). (B) Construction of two filtered complexes: the filtered VR complex constructed from the FPC, and the lower-star filtration constructed directly from the image. (C) Calculation of PDs. (D) Use of the Riemannian framework to map PDs onto a Hilbert sphere. Later, by using PGA, we construct a unique feature vector of the input image.
There are four PDs associated to an image: three from the filtered VR complex, corresponding to $H_0$, $H_1$ and $H_2$, and one from the lower-star filtration. To simplify statistical operations we map each PD onto a Hilbert sphere; this is explained in Subsection 3.2. On the Hilbert sphere, we reduce dimension by using principal geodesic analysis (PGA) (Fletcher et al., 2004). Eventually each PD is mapped onto a vector of length 2400, and a juxtaposition of these vectors gives a combined feature vector for each image. In the final step, in Subsection 3.3, we build a model from these feature vectors.

Before calculating PH we first build a feature point cloud (FPC) from a given point cloud. The FPC records changes for all values of some chosen feature while keeping the computational time within a feasible limit.

Let $X$ be the point cloud with a metric $d_X$, and $F$ be a compact feature space endowed with a metric $d_F$. We define the projection map $\pi : X \to F$ such that $\pi(p)$ is the feature of the point $p$. Let $U = \{U_i\}_{i \in I}$ be a covering of $F$, where $I$ is a finite indexing set; the finiteness is guaranteed by the compactness of $F$. Using the projection map $\pi$, we get a covering $C(X)$ of $X$, where $C(X) = \{\pi^{-1}(U_i)\}_{i \in I}$. Let $\bar{X}(U)$ be the set of connected components of the members of $C(X)$, that is,

$$\bar{X}(U) = \{V : \exists\, k \in I \text{ such that } V \text{ is a connected component of } \pi^{-1}(U_k)\}.$$

We build a feature point cloud (FPC) by taking the vertices to be the points of $\bar{X}(U)$. The distance between two connected components is taken as the distance between their centroids.

For a gray image $M$, let $X_M$ be the point cloud in $\mathbb{R}^3$ defined as

$$X_M = \{(i, j, p) : p \text{ is the intensity value at } (i, j)\}.$$

For the feature space $F_M$, the interval of intensity values, we define the projection $\pi : X_M \to F_M$ by $\pi(i, j, p) = p$. For some finite cover $W = \{W_i\}_{i \in J}$ of $F_M$ we get an FPC, denoted by $\bar{X}_M(W)$. The cover is chosen in such a way that the resulting FPC provides a good approximation of the PH of $X_M$.
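The construction of the feature point cloud can be sketched as follows. The sketch makes several assumptions that the text does not fix: the cover consists of half-open intensity intervals, two pixels of the same interval are connected when they are 8-neighbours in the image grid, and the centroid of a component is the mean of its $(i, j, p)$ triples. The function name and the toy image are hypothetical.

```python
from collections import deque


def feature_point_cloud(image, cover):
    """Centroids (i, j, p) of the connected components of each cover element.

    image: list of rows of intensity values
    cover: list of half-open intensity intervals (lo, hi)
    """
    n_rows, n_cols = len(image), len(image[0])
    centroids = []
    for lo, hi in cover:
        seen = set()
        for i in range(n_rows):
            for j in range(n_cols):
                if (i, j) in seen or not lo <= image[i][j] < hi:
                    continue
                # Breadth-first search over 8-neighbours inside this interval.
                queue = deque([(i, j)])
                seen.add((i, j))
                component = []
                while queue:
                    a, b = queue.popleft()
                    component.append((a, b, image[a][b]))
                    for da in (-1, 0, 1):
                        for db in (-1, 0, 1):
                            na, nb = a + da, b + db
                            if (0 <= na < n_rows and 0 <= nb < n_cols
                                    and (na, nb) not in seen
                                    and lo <= image[na][nb] < hi):
                                seen.add((na, nb))
                                queue.append((na, nb))
                size = len(component)
                centroids.append(tuple(sum(pt[k] for pt in component) / size
                                       for k in range(3)))
    return centroids


toy_image = [
    [10, 10, 200],
    [10, 90, 200],
    [90, 90, 10],
]
toy_cover = [(0, 64), (64, 128), (128, 256)]
fpc = feature_point_cloud(toy_image, toy_cover)
print(len(fpc), fpc)
```

The resulting list of centroids is the vertex set on which a Vietoris–Rips filtration can then be built, with the Euclidean distance between centroids as the metric. A finer cover gives more points and a closer approximation, at a higher computational cost.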
To calculate PH we use the filtered Vietoris–Rips complex

$$VR_{\epsilon_0}(\bar{X}_M(W)) \subset VR_{\epsilon_1}(\bar{X}_M(W)) \subset \cdots \subset VR_{\epsilon_k}(\bar{X}_M(W)),$$

for increasing scale values $\epsilon_0 < \epsilon_1 < \cdots < \epsilon_k$, where

$$VR_\epsilon(\bar{X}_M(W)) = \{A \subset \bar{X}_M(W) : d_X(a_1, a_2) < \epsilon, \ \forall\, a_1, a_2 \in A\}.$$

This filtered complex is able to capture the key features of a CT-scan, such as peaks and variations in intensity. Connected components and the appearance of loops at different intensity levels are captured in the PDs of $H_0$ and $H_1$ respectively (see Fig. 9). The PDs of $H_0$ and $H_1$ for the images in Fig. 9 are given in Fig. 11. It is evident that the difference in their visual appearance is captured by these PDs.

Figure 9: Comparison of COVID and non-COVID CT-scans. The difference in the variation of connected components ($H_0$) and (randomly shaped) loops ($H_1$) at different intensity levels can be seen.

A peak, which is topologically equivalent to a tetrahedron, appears as a point at (or near) the infinity line in the PD of $H_2$, that is, as a persistent 2-dimensional void (see Fig. 10, 11).

Figure 10: Peaks are captured by the VR complex and saddle points are captured by the lower-star filtration
The lower-star filtration (LSF) captures key features of the variation in intensities in an image. Local minima and saddle points in the intensity plot are vital shape features (see Fig 10). These features are recorded using the LSF. The birth time of a point in this PD is a local minimum and its death time is a saddle point.

To record these changes, we construct a simplicial complex, say $\Delta$, in the following way. Each pixel is taken as a vertex, and there is an edge from each vertex to its 8 neighbouring vertices (or fewer for a vertex on the image boundary). This allows us to construct a filtration of nested subcomplexes $\Delta_\epsilon$, indexed by increasing intensity thresholds $\epsilon$, where

$$\Delta_\epsilon = \{A \in \Delta : \max_{a \in A} p(a) \leq \epsilon\},$$

and $p(a)$ is the pixel value of the vertex $a$. Only the zero-dimensional PD is essential for our model development. Fig. 11 shows a major difference between the PDs associated to the LSF of a COVID-19 and a non-COVID-19 CT-scan image.

Figure 11: PDs of $H_0$, $H_1$, $H_2$, and the lower-star filtration of a COVID-19 infected and a non-COVID-19 CT-scan

There are many approaches to infer results from a collection of PDs, for example the bottleneck distance (Cohen-Steiner et al., 2007) (and its generalizations), the $L_p$-Wasserstein metric (Cohen-Steiner et al., 2010), persistence landscapes (Bubenik, 2015), and the Riemannian framework (Anirudh et al., 2016). Due to its efficiency and low computational cost, we use the Riemannian framework to perform our statistical analysis, which includes principal geodesic analysis and SVM on the space of PDs. In this approach (see Figure 12) we first approximate a given PD with a 2D probability density function (pdf), which is then mapped, using the square-root transformation, onto a Hilbert sphere. On the Hilbert sphere we have closed-form expressions to compare two PDs. Using these we first apply principal geodesic analysis to reduce the dimension, and then we build our model using SVM.

Figure 12: The Riemannian framework approach estimates a PD with a pdf, and then maps it onto a Hilbert sphere
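As a concrete illustration of the lower-star filtration step, the sketch below computes the zero-dimensional persistence pairs of a small image. Pixels are added in order of increasing intensity and joined to their already-present 8-neighbours. When two components meet, the one with the higher birth value dies, so each pair has a local minimum as birth and a merge (saddle) value as death. This is a pure-Python sketch, not the pipeline used in the paper, and the function name is hypothetical.

```python
import math


def lower_star_h0(image):
    """0-dimensional persistence pairs of the lower-star filtration of an image."""
    n_rows, n_cols = len(image), len(image[0])
    order = sorted((image[i][j], i, j)
                   for i in range(n_rows) for j in range(n_cols))
    parent = {}  # union-find forest over the pixels added so far
    birth = {}   # birth value, read at the root of each component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for value, i, j in order:
        parent[(i, j)] = (i, j)
        birth[(i, j)] = value
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                neighbour = (i + di, j + dj)
                if neighbour == (i, j) or neighbour not in parent:
                    continue
                elder, younger = find((i, j)), find(neighbour)
                if elder == younger:
                    continue
                if birth[elder] > birth[younger]:
                    elder, younger = younger, elder
                if value > birth[younger]:
                    # The younger component dies where the two components meet.
                    pairs.append((birth[younger], value))
                parent[younger] = elder
    roots = {find(pixel) for pixel in parent}
    pairs.extend((birth[root], math.inf) for root in roots)
    return sorted(pairs)


print(lower_star_h0([[1, 5, 2, 6, 3]]))
```

Pairs with equal birth and death are dropped, which is why a pixel that is not a local minimum contributes nothing to the diagram.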
For each point $(a, b)$ of a PD, we use a multivariate normal distribution with parameters $\Sigma$ (a fixed $2 \times 2$ covariance matrix) and $\mu = (a, b)$. We calculate the values of this 2D pdf on a bounded mesh grid with a uniform spacing of 0.5. Further, we compute the square-root representation of this pdf, which maps it onto a Hilbert sphere (cf. (Anirudh et al., 2016, Section 3.3)). Hence each PD is converted into a vector of length 1061. To reduce this dimension we apply Principal Geodesic Analysis (PGA) (Fletcher et al., 2004) on our Hilbert sphere. This whole procedure is applied to each collection of PDs for $H_0$, $H_1$, $H_2$, and to the PD coming from the lower-star filtration. Hence for a fixed image, each of the four PDs is represented as a vector of length 2400.

We build the classification model using SVM, which tends to perform relatively well on limited training data sets with high-dimensional features. Each of the 2,480 CT-scans is represented by a homology feature vector. We use Principal Component Analysis to reduce the dimension of each feature vector to 4800. Before training the SVM, we randomly split the data into $k$ folds (where $k = 5$); we used $(k - 1)$ [...] $H_0$, $H_1$, $H_2$ and LSF, achieves the classification accuracy of 99 [...]
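The mapping of a PD onto the Hilbert sphere described in this section can be sketched as follows. The grid bound, grid spacing and the isotropic standard deviation `sigma` used below are placeholder assumptions chosen for the illustration, not the values used in the paper, and both function names are hypothetical. The sketch evaluates a sum of Gaussians centred at the PD points on a grid, normalizes it to a discrete pdf, and takes the square root, which yields a unit vector. The geodesic distance between two such vectors is the arc cosine of their inner product.

```python
import math


def pd_to_sphere(diagram, grid_max=4.0, step=0.5, sigma=0.5):
    """Square-root representation of a PD as a unit vector (assumed parameters)."""
    if not diagram:
        raise ValueError("the diagram must contain at least one point")
    n = int(round(grid_max / step)) + 1
    axis = [k * step for k in range(n)]
    density = []
    for x in axis:
        for y in axis:
            # Sum of isotropic Gaussians, one centred at each (birth, death) point.
            density.append(sum(
                math.exp(-((x - a) ** 2 + (y - b) ** 2) / (2 * sigma ** 2))
                for a, b in diagram))
    total = sum(density)
    # After normalization the values sum to 1, so their square roots have norm 1.
    return [math.sqrt(value / total) for value in density]


def geodesic_distance(psi_1, psi_2):
    """Arc length between two points of the unit Hilbert sphere."""
    inner = sum(u * v for u, v in zip(psi_1, psi_2))
    return math.acos(max(-1.0, min(1.0, inner)))


psi_a = pd_to_sphere([(1.0, 2.0), (1.5, 3.0)])
psi_b = pd_to_sphere([(3.0, 3.5)])
print(len(psi_a), geodesic_distance(psi_a, psi_b))
```

Because all coordinates are non-negative, the distance between two diagrams never exceeds $\pi/2$. Closed-form expressions of this kind are what make statistics such as PGA tractable on the sphere.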
4. Results and Discussion
We use a publicly available data set, the “SARS-CoV-2 CT-scan dataset”, which contains 1252 CT scans that are positive for COVID-19 infection and 1230 CT scans of non-infected patients. These data were collected from patients in hospitals in Sao Paulo, Brazil, and made public in (Angelov & Almeida Soares, 2020; Soares et al., 2020).

We propose a model based on topological features that provides a binary classification of COVID and non-COVID CT-scans. Our model is an attempt to capture the features as observed by a professional medic. These features are picked up by the topological summaries provided by PDs, which makes the model biologically more interpretable than deep neural networks. Moreover, topological techniques do not need plenty of data to train a model. Table 1 compares the average values of the evaluation metrics achieved by different deep networks and our topological model.
Method  Accuracy  Precision  Recall  Specificity  F1 Score
Topological Approach  99 . ± . 27  99 . ± . 24  99 . ± . 62  99 . ± . 22  99 . ± .
[...] et al., 2020)  99 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  95 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  99 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  97 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  99 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  99 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  99 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  99 . ± .  . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  97.38  99.16  95.53  -  97.31
Contrastive Learning (Wang et al., 2020)  90 . ± .  . ± .  . ± .  . ± .
[...] et al., 2020)  95.0  95.3  94.0  94.7  94.3
COVID CT-Net (Yazdani et al., 2020)  -  -  85 . ± .  . ± .  . ± .
[...] et al., 2020)  96.2  96.2  96.2  96.2  96.2

Table 1: The topological model performs better in terms of all metrics than the other state-of-the-art approaches, including the latest deep learning models. See (Alshazly et al., 2020) for the definitions of evaluation metrics.

Figure 13: Some samples of COVID-19 and non-COVID-19 CT-scans with the predictions of our topological model.
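For reference, the five evaluation metrics reported in Table 1 can be computed from the entries of a binary confusion matrix as sketched below. The formulas are the standard ones, with COVID-19 taken as the positive class. The function name and the counts in the usage line are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)      # share of predicted positives that are correct
    recall = tp / (tp + fn)         # sensitivity: share of positives that are found
    specificity = tn / (tn + fp)    # share of negatives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print(example)
```

In a k-fold setting these values would be computed on each held-out fold and then reported as a mean with a standard deviation, which is the form of the entries in Table 1.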
Some of the topological features are more important than others, depending on the input image. Table 2 presents the performance of our model when using individual topological features.

Although the LSF feature performs exceptionally well compared to the other features, in some situations a lack of saddle points can make it less reliable. Combining all the features therefore makes our model more reliable in all situations.
Top. Feature  Accuracy  Precision  Recall  Specificity  F1 Score
$H_0$  . ± . . ± . ± . . ± . ± .
$H_1$  . ± . . ± . . ± . . ± . . ± .
$H_2$  . ± . . ± . . ± . . ± . . ± .
[...]  . ± . ± ± . ± . ± .

Table 2: Performance metrics for different combinations of topological features

Figure 14: A model using only the LSF feature was not able to detect the traces of COVID-19, due to the scarcity of saddle points in the infected region. However, all models using all or one of the features $H_0$, $H_1$ and $H_2$ detected it correctly.

To visualize the relative position of the topological feature vectors coming from the CT-scans of SARS-CoV-2 we apply t-SNE. In this process we take vectors of length 4800 and map them to 2D. In Fig 15 we can clearly see two segregated clusters of COVID-19 and non-COVID-19 images.

Figure 15: Visualization of the t-SNE embedding of the entire SARS-CoV-2 CT-scan dataset. We clearly see two different clusters representing COVID-19 (red) and non-COVID-19 (blue). It is clear that the embedding has some of the geometric properties of the Hilbert sphere.
Machine learning algorithms have achieved persuasive performance in several medical imaging problems, but the interpretability of these ML models is very limited and remains a significant hurdle in the adoption of these models in clinical practice. To check the robustness of our model we perform the following experiment.

In the first stage, we removed the critical regions of interest (GGOs and consolidations) from COVID-19 positive cases and then predicted the outcome. Secondly, we investigated the performance of our model on images with randomly chosen non-COVID-19 regions removed. We illustrate some COVID-19 cases in Fig. 16, where we covered the GGOs and consolidations and the model predicted the image to be non-COVID-19.

Hence the infected COVID-19 regions are very accurately captured by our chosen topological features and consequently by our model.

Figure 16: (A) Original images. (B) We covered COVID-19 regions. (C) & (D) We covered random non-COVID-19 regions.

5. Conclusion

This paper presents a new approach to detect COVID-19 from CT-scan images using persistent homology, and achieves state-of-the-art results. The work shows that the techniques of topological data analysis are effective and perform better than most deep neural networks.

This work provides a highly interpretable model based on topological features of CT-scans. The model is based on the slogan “mimic a professional medic”. It outperforms most cutting-edge deep neural network approaches. It would be interesting to combine the topological features of this work with deep convolutional neural networks; this is left as an open direction. Chest CT-scan imaging has high sensitivity for the diagnosis of COVID-19, so this work is a step forward in detecting and hence eliminating COVID-19.
Acknowledgments
The first author is thankful to David Epstein and Saqlain Raza for many fruitful discussions.
References
Ai, Tao, Yang, Zhenlu, Hou, Hongyan, Zhan, Chenao, Chen, Chong, Lv, Wenzhi, Tao, Qian, Sun, Ziyong, & Xia, Liming. 2020. Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology. https://doi.org/10.1148/radiol.2020200642.

Alshazly, Hammam, Linse, Christoph, Barth, Erhardt, & Martinetz, Thomas. 2020. Explainable COVID-19 Detection Using Chest CT Scans and Deep Learning. arXiv preprint arXiv:2011.05317.

Angelov, Plamen, & Almeida Soares, Eduardo. 2020. EXPLAINABLE-BY-DESIGN APPROACH FOR COVID-19 CLASSIFICATION VIA CT-SCAN. medRxiv.

Anirudh, Rushil, Venkataraman, Vinay, Natesan Ramamurthy, Karthikeyan, & Turaga, Pavan. 2016. A Riemannian framework for statistical analysis of topological persistence diagrams. 68–76.

Bendich, Paul, Edelsbrunner, Herbert, & Kerber, Michael. 2010. Computing robustness and persistence for images. IEEE transactions on visualization and computer graphics, (6), 1251–1260.

Bubenik, Peter. 2015. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, (1), 77–102.

Carlsson, Gunnar. 2009. Topology and data. Bulletin of the American Mathematical Society, (2), 255–308.

Carlsson, Gunnar, Singh, Gurjeet, & Zomorodian, Afra. 2009. Computing multidimensional persistence. Pages 730–739 of: International Symposium on Algorithms and Computation. Springer.

Chung, Yu-Min, Hu, Chuan-Shen, Lawson, Austin, & Smyth, Clifford. 2018. Topological approaches to skin disease image analysis. Pages 100–105 of: 2018 IEEE International Conference on Big Data (Big Data). IEEE.

Cohen-Steiner, David, Edelsbrunner, Herbert, & Harer, John. 2007. Stability of persistence diagrams. Discrete & computational geometry, (1), 103–120.

Cohen-Steiner, David, Edelsbrunner, Herbert, Harer, John, & Mileyko, Yuriy. 2010. Lipschitz functions have L p-stable persistence. Foundations of computational mathematics, (2), 127–139.

Currie, Geoff, Hawk, K Elizabeth, Rohren, Eric, Vial, Alanna, & Klein, Ran. 2019. Machine learning and deep learning in medical imaging: intelligent imaging. Journal of Medical Imaging and Radiation Sciences, (4), 477–487.

de Silva, Vin, & Ghrist, Robert. 2007. Coverage in sensor networks via persistent homology. Algebr. Geom. Topol., (1), 339–358.

Deng, Jia, Dong, Wei, Socher, Richard, Li, Li-Jia, Li, Kai, & Fei-Fei, Li. 2009. Imagenet: A large-scale hierarchical image database. Pages 248–255 of: 2009 IEEE conference on computer vision and pattern recognition. IEEE.

Edelsbrunner, Herbert, Letscher, David, & Zomorodian, Afra. 2000. Topological persistence and simplification. Pages 454–463 of: Proceedings 41st annual symposium on foundations of computer science. IEEE.

Fletcher, P Thomas, Lu, Conglin, Pizer, Stephen M, & Joshi, Sarang. 2004. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE transactions on medical imaging, (8), 995–1005.

Frosini, Patrizio, & Landi, Claudia. 2013. Persistent Betti numbers for a noise tolerant shape-based approach to image retrieval. Pattern Recognition Letters, (8), 863–872.

Garside, Kathryn, Henderson, Robin, Makarenko, Irina, & Masoller, Cristina. 2019. Topological data analysis of high resolution diabetic retinopathy images. PloS one, (5), e0217413.

Hiraoka, Yasuaki, Nakamura, Takenobu, Hirata, Akihiko, Escolar, Emerson G, Matsue, Kaname, & Nishiura, Yasumasa. 2016. Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences, (26), 7035–7040.

Huang, Chaolin, Wang, Yeming, Li, Xingwang, Ren, Lili, Zhao, Jianping, Hu, Yi, Zhang, Li, Fan, Guohui, Xu, Jiuyang, Gu, Xiaoying, Cheng, Zhenshun, Yu, Ting, Xia, Jiaan, Wei, Yuan, Wu, Wenjuan, Xie, Xuelei, Yin, Wen, Li, Hui, Liu, Min, Xiao, Yan, Gao, Hong, Guo, Li, Xie, Jungang, Wang, Guangfa, Jiang, Rongmeng, Gao, Zhancheng, Jin, Qi, Wang, Jianwei, & Cao, Bin. 2020. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, (10223), 497–506.

Jaiswal, Aayush, Gianchandani, Neha, Singh, Dilbag, Kumar, Vijay, & Kaur, Manjit. 2020. Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. Journal of Biomolecular Structure and Dynamics, 1–8.

Kasson, Peter M., Zomorodian, Afra, Park, Sanghyun, Singhal, Nina, Guibas, Leonidas J., & Pande, Vijay S. 2007. Persistent voids: a new structural metric for membrane fusion. Bioinformatics, (14), 1753–1759.

Kong, Weifang, & Agarwal, Prachi P. 2020. Chest Imaging Appearance of COVID-19 Infection. Radiology: Cardiothoracic Imaging, (1). https://doi.org/10.1148/ryct.2020200028.

Lee, H., Kang, H., Chung, M. K., Kim, B., & Lee, D. S. 2012. Persistent Brain Network Homology From the Perspective of Dendrogram. IEEE Transactions on Medical Imaging, (12), 2267–2277.

Li, Xiaoming, Zeng, Wenbing, Li, Xiang, Chen, Haonan, Shi, Linping, Li, Xinghui, Xiang, Hongnian, Cao, Yang, Chen, Hui, Liu, Chen, & Wang, Jian. 2020. CT imaging changes of corona virus disease 2019 (COVID-19): a multi-center study in Southwest China. Journal of Translational Medicine. https://doi.org/10.1186/s12967-020-02324-w.

Liu, Xu, Xie, Zheng, Yi, Dongyun, et al. 2012. A fast algorithm for constructing topological structure in large data. Homology, Homotopy and Applications, (1), 221–238.

Otter, Nina, Porter, Mason A, Tillmann, Ulrike, Grindrod, Peter, & Harrington, Heather A. 2017. A roadmap for the computation of persistent homology. EPJ Data Science, (1), 17.

Panwar, Harsh, Gupta, PK, Siddiqui, Mohammad Khubeb, Morales-Menendez, Ruben, Bhardwaj, Prakhar, & Singh, Vaishnavi. 2020. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos, Solitons & Fractals, 110190.

Qaiser, Talha, Sirinukunwattana, Korsuk, Nakane, Kazuaki, Tsang, Yee-Wah, Epstein, David, & Rajpoot, Nasir M. 2016. Persistent homology for fast tumor segmentation in whole slide histology images. Procedia Computer Science, 119–124.

Qaiser, Talha, Tsang, Yee-Wah, Epstein, David, & Rajpoot, Nasir. 2017. Tumor segmentation in whole slide images using persistent homology and deep convolutional features. Pages 320–329 of: Annual Conference on Medical Image Understanding and Analysis. Springer.

Qaiser, Talha, Tsang, Yee-Wah, Taniyama, Daiki, Sakamoto, Naoya, Nakane, Kazuaki, Epstein, David, & Rajpoot, Nasir. 2019. Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features.
Medical Image Analy-sis , , 1 – 14.Rieck, Bastian, Mara, Hubert, & Leitte, Heike. 2012. Multivariate data analysis usingpersistence-based filtering and topological signatures. IEEE Transactions on Visualizationand Computer Graphics , (12), 2382–2391.Rotman, Joseph J. 2013. An introduction to algebraic topology . Vol. 119. Springer Science& Business Media.Ruder, Sebastian. 2019.
Neural transfer learning for natural language processing . Ph.D.thesis, NUI Galway.Ruder, Sebastian, Peters, Matthew E, Swayamdipta, Swabha, & Wolf, Thomas. 2019. Trans-fer learning in natural language processing.
Pages 15–18 of: Proceedings of the 2019 Con-ference of the North American Chapter of the Association for Computational Linguistics:Tutorials .Soares, Eduardo, Angelov, Plamen, Biaso, Sarah, Froes, Michele Higa, & Abe, Daniel Kanda.2020. SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identification. medRxiv .Tahamtan, Alireza, & Ardebili, Abdollah. 2020. Real-time RT-PCR in COVID-19 detec-tion: issues affecting the results.
Expert review of molecular diagnostics , , 453–454.doi:10.1080/14737159.2020.1757437.Tralie, Christopher, Saul, Nathaniel, & Bar-On, Rann. 2018. Ripser.py: A Lean PersistentHomology Library for Python. The Journal of Open Source Software , (29), 925.Voulodimos, Athanasios, Doulamis, Nikolaos, Doulamis, Anastasios, & Protopapadakis,Eftychios. 2018. Deep learning for computer vision: A brief review. Computational intel-ligence and neuroscience , .Wang, Bao, & Wei, Guo-Wei. 2016. Object-oriented persistent homology. Journal of com-putational physics , , 276–299.Wang, Zhao, Liu, Quande, & Dou, Qi. 2020. Contrastive Cross-Site Learning With Re-designed Net for COVID-19 CT Classification. IEEE Journal of Biomedical and HealthInformatics , (10), 2806–2813.Yao, Yuan, Sun, Jian, Huang, Xuhui, Bowman, Gregory R, Singh, Gurjeet, Lesnick, Michael,Guibas, Leonidas J, Pande, Vijay S, & Carlsson, Gunnar. 2009. Topological methods forexploring low-density states in biomolecular folding pathways. The Journal of chemicalphysics , (14), 04B614. 21azdani, Shakib, Minaee, Shervin, Kafieh, Rahele, Saeedizadeh, Narges, & Sonka, Milan.2020. Covid ct-net: Predicting covid-19 from chest ct images using attentional convolu-tional network. arXiv preprint arXiv:2009.05096arXiv preprint arXiv:2009.05096