Classification of COVID-19 via Homology of CT-SCAN

Abstract

In the worldwide spread of SARS-CoV-2 (COVID-19) infection, it is of utmost importance to detect the disease at an early stage, especially in the hot spots of the epidemic. There are more than 110 million infected cases worldwide so far. Owing to its promptness and effective results, computed tomography (CT) imaging is preferred to the reverse-transcription polymerase chain reaction (RT-PCR) test. Early detection and isolation of the patient is the only possible way of controlling the spread of the disease. Automated analysis of CT scans can provide enormous support in this process. In this article, we propose a novel approach to detect SARS-CoV-2 using CT-scan images. Our method is based on the intuitive and natural idea of analyzing shapes, an attempt to mimic a professional medic. We trace SARS-CoV-2 features mainly by quantifying their topological properties, which we compute primarily with persistent homology, a tool from Topological Data Analysis (TDA). We train and test our model on the "SARS-CoV-2 CT-scan dataset" (Soares et al., 2020), an open-source dataset containing 2,481 CT scans of normal and COVID-19 patients. Our model yielded an overall benchmark F1 score of 99.42%, accuracy of 99.416%, precision of 99.41%, and recall of 99.42%. TDA techniques have great potential for efficient and prompt detection of COVID-19, which may be exploited in clinics for rapid and safe detection globally, in particular in low- and middle-income countries where RT-PCR labs and/or kits are in serious shortage.

Figure 1: Graphical Abstract

Sohail Iqbal (a,1,∗), Hafiz Fareed Ahmed (a,1), Talha Qaiser (b,1), Muhammad Imran Qureshi (c), Nasir Rajpoot (d)

(a) Department of Mathematics, COMSATS University Islamabad, Park Road, Islamabad, Pakistan
(b) Department of Computing, Imperial College London, SW7 2AZ, United Kingdom
(c) Department of Mathematics, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia
(d) Department of Computer Science, The University of Warwick, Coventry, CV4 7AL, United Kingdom

∗ Corresponding author

1 These authors contributed equally to this project.

Preprint submitted to Elsevier, February 23, 2021

1. Introduction

On March 11, 2020, the WHO declared a pandemic caused by the novel coronavirus (nCoV). The virus is known as SARS-CoV-2 and the disease it causes as COVID-19; it originated in Wuhan, China, in December 2019. After more than a year the virus is still spreading exponentially and has reached 212 countries. At the time of writing there are more than 110 million infected cases, and the death toll has crossed 2.4 million.

The main reason for the spread of the virus is its asymptomatic nature: an affected person can spread the virus without any signs of illness. The only precaution required worldwide is immediate isolation of the affected person. For this purpose, effective and timely testing is the core factor for treatment and prevention.

SARS-CoV-2 belongs to the family of viruses that cause SARS (Severe Acute Respiratory Syndrome). It causes severe respiratory illness and primarily damages the lungs (Huang et al., 2020). Two methods are used to detect COVID-19: the reverse transcription polymerase chain reaction (RT-PCR) test and the chest computed tomography (CT) scan. In RT-PCR, viral RNA is reverse-transcribed into DNA, and the presence of this DNA is then detected. This is a laboratory technique that requires trained personnel to carry out the whole process. The test might take hours, if not days, to give results. The situation worsens in severely affected areas with limited test kits and trained personnel. RT-PCR testing also gives false-negative results in some cases (Tahamtan & Ardebili, 2020). In a chest CT scan, by contrast, the image analysis is done by radiologists (Kong & Agarwal, 2020), and the method is much more sensitive than RT-PCR. A study of 1014 cases shows the significance of chest CT scans over RT-PCR (Ai et al., 2020). One study finds that 40 out of 41 (98%) patients had pneumonia with abnormal findings on chest CT scans (Huang et al., 2020).

Image analysis by radiologists suggests that in COVID-19-positive patients, ground-glass opacities (GGOs), together with consolidations and crazy paving, appear at the peripheral portions of both lungs. Increased attenuation in the chest CT scan is the main feature in the detection of COVID-19. Detection of these features is relatively time efficient due to the higher sensitivity of this method (Li et al., 2020).

Over the past decade, one of the promising directions in health care innovation has been the application of artificial intelligence (AI) to medical imaging. In recent years, AI has revolutionized computer vision (Voulodimos et al., 2018) and natural language processing (Ruder et al., 2019; Ruder, 2019) by pushing state-of-the-art performance in various pattern recognition tasks. More recently, there has been an upward trend in exploring machine learning algorithms for medical imaging data (Currie et al., 2019). With the worldwide outbreak of SARS-CoV-2, it is imperative to develop screening tools to analyze COVID-19 chest CT scans. One of the main challenges in deep learning is its data hungriness: for a deep learning model to converge, one may require thousands, and in some cases millions, of images (Deng et al., 2009) for training. Techniques of Topological Data Analysis (TDA), on the other hand, use geometrical features, making them efficient in terms of the amount of data required, speed, predictability and interpretability.

TDA is one of the rapidly growing techniques in data analysis. It provides tools to analyze data by bridging techniques from machine learning, statistics, algebraic topology, topology, and algebra. One of the main tools in TDA is persistent homology (PH). It is a very effective technique that records the intrinsic topological properties of data. The essential idea is to produce topological features across a scale. On this scale, some features “die” early and some “live” longer.
The persistence times of these features are the key point of PH. The ideas of PH have been successfully applied to many areas of science and technology, such as network structures (de Silva & Ghrist, 2007; Lee et al., 2012), computational biology (Kasson et al., 2007; Yao et al., 2009; Wang & Wei, 2016), data analysis (Carlsson, 2009; Liu et al., 2012; Rieck et al., 2012), image analysis (Carlsson et al., 2009; Frosini & Landi, 2013; Bendich et al., 2010), and amorphous material structures (Hiraoka et al., 2016).

In recent years these techniques have been successful in medical image analysis. For example, Qaiser et al. (2019, 2016, 2017) developed PH-based models for efficient tumor segmentation in whole-slide images of histology slides; Garside et al. (2019) used topological features to differentiate healthy patients from those with diabetic retinopathy; and Chung et al. (2018) achieved segmentation of skin cancer in a given image using these techniques.

In this work we develop a state-of-the-art model to detect traces of COVID-19 infection in CT scans. To train and test our model we use the “SARS-CoV-2 CT-scan dataset” (Soares et al., 2020).

Figure 2: A sample of CT-scan images from the “SARS-CoV-2 CT-scan dataset” for COVID-19 patients (first row) and normal cases (second row).

Our model is developed in three main stages. In the first stage, we devise a way to construct a simplicial complex from a given image and then calculate the persistent diagrams (PDs) associated to it. In the second stage, we map our PDs onto a Hilbert sphere, following Anirudh et al. (2016). This step enables us to perform different statistical operations on the space of PDs. In the last stage, we use an SVM to develop our classification model.

The rest of the paper is structured as follows. We review the basics of PH in Section 2. In Section 3 we describe our methodology to extract features from CT scans. Our results are reported in Section 4. Finally, in Section 5, we draw our conclusions.

2. Mathematical Preliminaries

In this section we recall basic notions and definitions leading to persistent homology and persistent diagrams (PDs).

Persistent homology is one of the main tools used in topological data analysis (TDA). It provides a way to analyze the shape of point cloud data without actually calculating the precise geometry. It illuminates qualitative features of data which persist across multiple scales. These persistent features provide an effective quantification of the shape of data. The method is based on techniques from algebraic topology, a branch of mathematics that deals with different “bridges” between algebra and topology known as “functors”. These functors take topologically equivalent (homotopic) spaces to algebraically equivalent (isomorphic) spaces. The functorial nature of persistent homology makes it robust to perturbations of an input point cloud, a feature rarely found in existing data analysis techniques.

A number of functors exist to deal with different classes of topological spaces (simplicial, cellular, singular, etc.). The development of PH is based on the functor known as simplicial homology. On the topological side we consider a simplicial complex $\Delta$, and on the algebraic side we get vector spaces $H_i(\Delta)$ for $i = 0, 1, 2, \dots$. The dimension of $H_0(\Delta)$ gives the number of connected components, that of $H_1(\Delta)$ the number of holes, that of $H_2(\Delta)$ the number of voids, and so on. In PH we construct simplicial complexes $\Delta_\epsilon$ from a point cloud depending on a scale parameter $\epsilon$. The homological features that remain persistent across scales provide an effective analysis of the shape of data. The summary of these features is shown either on a “bar diagram” (BD) or on a “persistent diagram” (PD).

In what follows we provide a formal overview of the aforementioned terminology. For more details see (Rotman, 2013; Carlsson, 2009; Otter et al., 2017; Edelsbrunner et al., 2000).

Definition 2.1.1 (Simplex).

Let $\{p_0, p_1, \dots, p_n\}$ be an affinely independent set in $\mathbb{R}^n$. The $n$-simplex generated by this set, denoted by $[p_0, p_1, \dots, p_n]$, is the convex hull of the points $p_0, p_1, \dots, p_n$. Every point $x$ of this simplex can be written uniquely in the form

$$x = \sum_{i=0}^{n} t_i p_i, \quad \text{where } \sum_{i=0}^{n} t_i = 1 \text{ and each } t_i \geq 0. \tag{1}$$

A $k$-face of $[p_0, p_1, \dots, p_n]$ is a simplex generated by a collection of $k + 1$ points from $\{p_0, p_1, \dots, p_n\}$. A $k$-face is in fact a $k$-dimensional geometric object. A simplicial complex is obtained by “gluing” together different simplices along their common faces.

Definition 2.1.2 (Simplicial Complex).

A finite simplicial complex $\Delta$ is a collection of simplices in $\mathbb{R}^n$ such that (1) if $\alpha \in \Delta$ then every face of $\alpha$ belongs to $\Delta$, and (2) for any two simplices $\alpha_1, \alpha_2 \in \Delta$, the intersection $\alpha_1 \cap \alpha_2$ is either empty or a common face of $\alpha_1$ and $\alpha_2$.

These conditions guarantee that $\Delta$ records changes in each dimension. In order to calculate simplicial homology we need an orientation on the simplicial complex. An orientation of a simplicial complex is a partial order of its vertices which, when restricted to a particular simplex, gives a total order.

Figure 3: Orientation of a simplicial complex

Figure 4: A simplicial complex with a partial order on its vertices.

For an oriented simplicial complex $\Delta$ and an integer $m \geq -1$, the $m$-th chain group $C_m(\Delta)$ consists of formal sums of the form $a_1\sigma_1 + a_2\sigma_2 + \cdots + a_n\sigma_n$, where $a_1, \dots, a_n \in \mathbb{Z}$ and $\sigma_1, \dots, \sigma_n$ are oriented $m$-simplices of $\Delta$. For convenience, we define $C_m(\Delta) = \{0\}$ for $m > \dim \Delta$ and $m < 0$. An oriented $m$-simplex is an element of the chain group $C_m(\Delta)$ of the form $\pm\langle p_0, p_1, \dots, p_m \rangle$, where $p_0, \dots, p_m$ are distinct vertices. The boundary operator $\partial_m : C_m(\Delta) \to C_{m-1}(\Delta)$ is defined by setting

$$\partial_m(\langle p_0, p_1, \dots, p_m \rangle) = \sum_{i=0}^{m} (-1)^i \langle p_0, \dots, \hat{p}_i, \dots, p_m \rangle,$$

where $\hat{p}_i$ means deleting $p_i$, and extending by linearity. Combining all this information, we get the chain complex

$$\cdots \xrightarrow{\partial_{m+1}} C_m(\Delta) \xrightarrow{\partial_m} C_{m-1}(\Delta) \xrightarrow{\partial_{m-1}} \cdots \xrightarrow{\partial_2} C_1(\Delta) \xrightarrow{\partial_1} C_0(\Delta) \xrightarrow{\partial_0} 0,$$

in which the composition of any two consecutive maps is the zero map. We define the $m$-th simplicial homology as

$$H_m(\Delta) = \ker \partial_m / \operatorname{im} \partial_{m+1}, \quad \text{where } 0 \leq m \leq \dim \Delta.$$

Its dimension $\beta_m(\Delta) = \dim H_m(\Delta)$ is called the $m$-th Betti number of the simplicial complex $\Delta$.

For a data set $X$, we can calculate its Betti numbers after imposing a simplicial complex $\Delta(X)$ on it. On its own this information is of limited use, since it only gives the number of connected components, 1-dimensional holes, etc. To extract further information, we construct a filtered complex, which consists of nested subcomplexes $\Delta(X, \epsilon)$ of $\Delta$ that depend on a scale parameter $\epsilon$, such that

$$\Delta(X, \epsilon_1) \subseteq \Delta(X, \epsilon_2) \quad \text{whenever } \epsilon_1 \leq \epsilon_2.$$

We can apply the simplicial homology functor to each subcomplex. The inclusion map on the complexes $\Delta(X, \epsilon_1) \subset \Delta(X, \epsilon_2)$ induces a linear map on the homology groups $H_m(\Delta(X, \epsilon_1)) \to H_m(\Delta(X, \epsilon_2))$, for $0 \leq m \leq \dim \Delta$. Hence the homology of this filtered complex consistently provides information about $\Delta$ at different values of $\epsilon$. One can represent these features, over various scales, using persistent diagrams (PDs).

To exemplify, consider a point cloud $S$ in $\mathbb{R}^2$, given in Fig. 5. To extract qualitative information from this data we compute topological features for different values of a scale parameter $\epsilon$. At each scale level $\epsilon$, we consider open discs $D(p, \epsilon)$ of radius $\epsilon$ around each point $p$. Then we build a simplicial complex $\Delta(S, \epsilon)$ using the following rule: a set of points $A = \{p_0, \dots, p_n\}$ forms an $n$-simplex if $\bigcap_{p \in A} D(p, \epsilon) \neq \emptyset$. At each level the simplicial complex is made up of the simplices shown in Fig. 5. As the values of $\epsilon$ increase from 0, we get a filtration of simplicial complexes $\Delta(S, \epsilon)$.

Figure 5: A filtration of simplicial complexes built from a given point cloud.
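The definition of the Betti numbers through kernels and images of boundary operators can be made concrete with a few lines of code. The following is a minimal sketch, not the implementation used in this work: it assumes rational coefficients, and the helper names `rank`, `boundary_matrix` and `betti_numbers` are hypothetical. It uses the identity $\beta_m = \dim C_m - \operatorname{rank} \partial_m - \operatorname{rank} \partial_{m+1}$.

```python
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Rank of a matrix (list of rows) over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def boundary_matrix(simplices_m, simplices_m_minus_1):
    """Matrix of the boundary operator from m-simplices to (m-1)-simplices."""
    index = {s: i for i, s in enumerate(simplices_m_minus_1)}
    matrix = [[0] * len(simplices_m) for _ in simplices_m_minus_1]
    for j, simplex in enumerate(simplices_m):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]   # delete the i-th vertex
            matrix[index[face]][j] = (-1) ** i
    return matrix


def betti_numbers(generating_simplices):
    """Betti numbers [b_0, b_1, ...] of the complex generated by the given simplices."""
    simplices = set()
    for s in generating_simplices:                 # close under taking faces
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            simplices.update(combinations(s, k))
    dim = max(len(s) for s in simplices) - 1
    by_dim = [sorted(s for s in simplices if len(s) == m + 1) for m in range(dim + 1)]
    ranks = [0]                                    # the boundary map on vertices is zero
    for m in range(1, dim + 1):
        ranks.append(rank(boundary_matrix(by_dim[m], by_dim[m - 1])))
    ranks.append(0)                                # no simplices above the top dimension
    return [len(by_dim[m]) - ranks[m] - ranks[m + 1] for m in range(dim + 1)]


hollow_triangle = [(0, 1), (1, 2), (0, 2)]
hollow_tetrahedron = list(combinations(range(4), 3))
print(betti_numbers(hollow_triangle))      # one component, one loop
print(betti_numbers(hollow_tetrahedron))   # one component, no loop, one void
```

The hollow tetrahedron is included because a void of this kind is exactly what $H_2$ detects.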

We can represent the birth and death of these topological features using the persistent diagram. A persistent diagram (PD) is a collection of ordered pairs in the extended plane. A point $(a, b)$ represents a feature born at scale parameter value $\epsilon = a$ that dies at $\epsilon = b$. The points that touch the infinity line are the persistent features that do not die up to the last value of the filtration parameter. In Fig. 6 we see a point on the line at infinity. It shows that there is a one-dimensional hole (or loop) that appears at $\epsilon = 2$ (see Fig. 5) and never dies. This is consistent with the observation that the overall shape of the data in Fig. 5 is circular.

Figure 6: PDs representing topological features of the point cloud from Fig. 5
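For the zero-dimensional part of a persistent diagram, births and deaths can be computed with a union-find structure alone. The sketch below is illustrative and is not the ripser computation used in this work; the function name `h0_persistence` is hypothetical. It follows the disc rule described above, so two points at distance $d$ become connected once their discs of radius $\epsilon$ overlap, that is, at $\epsilon = d/2$.

```python
import math


def h0_persistence(points):
    """Zero-dimensional (birth, death) pairs for discs growing around the points."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    # Every pair of points is a potential merge, at half their distance.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    pairs = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, eps))        # one component dies at each merge
    pairs.append((0.0, math.inf))           # the last component never dies
    return pairs


print(h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))
```

All components are born at $\epsilon = 0$; the pair with death at infinity corresponds to a point on the infinity line of the PD.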

Apart from its stability, another important feature of PH is that it can be computed using different algorithms. There are many libraries available that implement algorithms for the computation of PH. These libraries include Perseus, javaPlex, Dionysus, ripser, Gudhi, etc. We use ripser (Tralie et al., 2018) due to its computational efficiency (Otter et al., 2017).

There are many ways to impose a simplicial complex on a given point cloud. The choice depends on the nature of the data, the computational cost, and the restrictions of the software or package used. Some typical simplicial complexes are the Vietoris–Rips complex, Čech complex, Delaunay complex, clique complex, alpha complex, strong witness complex, weak witness complex, etc.

3. Methods and Algorithm

Thoracic radiology evaluations found high rates of ground-glass opacities and consolidations in COVID-19 patients. One can observe the ground-glass opacities (GGOs) together with consolidations in the CT scans of COVID-19 patients. These regions are isolated and differ in shape (left image in Fig. 7), which is captured by the PDs associated to $H_0$ and $H_1$. Moreover, these regions have a distinctive shape in the intensity plot; see the appearance of peaks and saddle points in Fig. 7 (right image). These are recorded by the PDs of the lower-star filtration and $H_2$.

Figure 7: Shape features and intensity plot in a COVID-19 infected CT-scan

All these shape features are captured by PDs associated to two different filtered complexes, namely the filtered Vietoris–Rips (VR) complex and the lower-star filtration. The whole process is summarized as two pipelines in Fig. 8. In the following subsections, we describe each step of these two pipelines.

Figure 8: Extraction of topological features from a CT scan of a COVID-19 patient. (A) Quantization of the input image, which is used to construct the feature point cloud (FPC). (B) Construction of two filtered complexes: the filtered VR complex constructed from the FPC, and the lower-star filtration directly from the image. (C) Calculation of PDs. (D) Use of the Riemannian framework to map PDs onto a Hilbert sphere. Later, by using PGA, we construct a unique feature vector of the input image.

There are four PDs associated to an image: three from the filtered VR complex, corresponding to $H_0$, $H_1$ and $H_2$, and one from the lower-star filtration. To simplify statistical operations we map each PD onto a Hilbert sphere; this is explained in Subsection 3.2. On the Hilbert sphere, we reduce dimension by using principal geodesic analysis (PGA) (Fletcher et al., 2004). Eventually each PD is mapped onto a vector of length 2400; a juxtaposition of these vectors gives a combined feature vector for each image. In the final step, in Subsection 3.3, we build a model from these feature vectors.

Before calculating PH we first build a feature point cloud (FPC) from a given point cloud. The FPC is a way to record changes over all values of some chosen feature while keeping the computational time within a feasible limit.

Let $X$ be the point cloud with a metric $d_X$, and let $F$ be a compact feature space endowed with a metric $d_F$. We define the projection map $\pi : X \to F$ such that $\pi(p)$ is the feature of the point $p$. Let $\mathcal{U} = \{U_i\}_{i \in I}$ be a covering of $F$, where $I$ is a finite indexing set; the finiteness is guaranteed by the compactness of $F$. Using the projection map $\pi$, we get a covering $C(X) = \{\pi^{-1}(U_i)\}_{i \in I}$ of $X$. Let $\bar{X}(\mathcal{U})$ be the set of connected components of $C(X)$, that is,

$$\bar{X}(\mathcal{U}) = \{V : \exists\, k \in I \text{ such that } V \text{ is a connected component of } \pi^{-1}(U_k)\}.$$

We build a feature point cloud (FPC) by taking the vertices to be the points of $\bar{X}(\mathcal{U})$. The distance between two connected components is taken as the distance between their centroids.

For a gray image $M$, let $X_M$ be the point cloud in $\mathbb{R}^3$ defined as

$$X_M = \{(i, j, p) : p \text{ is the intensity value at } (i, j)\}.$$

The feature space $F_M$ is the interval of intensity values, and the projection $\pi : X_M \to F_M$ is given by $\pi(i, j, p) = p$. For some finite cover $W = \{W_i\}_{i \in J}$ of $F_M$ we get an FPC, denoted $\bar{X}_M(W)$. The cover is chosen in such a way that the resulting FPC provides a good approximation of the PH of $X_M$. To calculate PH we use the filtered Vietoris–Rips complex

$$VR_{\epsilon_0}(\bar{X}_M(W)) \subset VR_{\epsilon_1}(\bar{X}_M(W)) \subset \cdots \subset VR_{\epsilon_n}(\bar{X}_M(W)), \quad \epsilon_0 < \epsilon_1 < \cdots < \epsilon_n,$$

where

$$VR_{\epsilon}(\bar{X}_M(W)) = \{A \subset \bar{X}_M(W) : d_X(a_1, a_2) < \epsilon,\ \forall\, a_1, a_2 \in A\}.$$

This filtered complex is able to capture the key features of a CT scan, such as peaks and variations in intensity. Connected components and the appearance of loops at different intensity levels are captured in the PDs of $H_0$ and $H_1$ respectively (see Fig. 9). The PDs of $H_0$ and $H_1$ for the images in Fig. 9 are given in Fig. 11. It is evident that the difference in their visual appearance is captured by these PDs.

Figure 9: Comparison of COVID and Non-COVID CT-scans. The difference in the variation of connected components ($H_0$) and (randomly shaped) loops ($H_1$) at different intensity levels can be seen.

A peak, which is topologically equivalent to a tetrahedron, appears as a point at (or near) the infinity line in the PD of $H_2$, that is, as a persistent 2-dimensional void (see Fig. 10, 11).

Figure 10: Peaks are captured by the VR complex and saddle points are captured by the lower-star filtration
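The construction of the feature point cloud from a gray image can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the text: the cover $W$ is a set of disjoint, equal-width intensity bins, intensities lie in `0..max_value`, connectivity is the 8-neighbourhood in the image plane, and the function name `feature_point_cloud` is hypothetical.

```python
from collections import deque


def feature_point_cloud(image, n_bins, max_value=255):
    """Centroids (row, col, mean intensity) of the connected components of each intensity bin."""
    rows, cols = len(image), len(image[0])
    width = (max_value + 1) / n_bins

    def bin_of(value):
        return min(int(value / width), n_bins - 1)

    seen = [[False] * cols for _ in range(rows)]
    cloud = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = bin_of(image[r][c])
            queue, members = deque([(r, c)]), []
            seen[r][c] = True
            while queue:                      # breadth-first flood fill within one bin
                y, x = queue.popleft()
                members.append((y, x, image[y][x]))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and bin_of(image[ny][nx]) == label):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            k = len(members)
            cloud.append(tuple(sum(m[i] for m in members) / k for i in range(3)))
    return cloud


toy_image = [
    [0, 0, 200],
    [0, 0, 200],
    [90, 90, 200],
]
print(feature_point_cloud(toy_image, n_bins=2))
```

Each returned centroid is one vertex of the FPC; the Vietoris–Rips filtration is then built on the pairwise distances between these centroids, so the number of vertices is governed by the number of components rather than by the number of pixels.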

The Lower Star Filtration (LSF) captures key features of the variation in intensities in an image. Local minima and saddle points in the intensity plot are vital shape features (see Fig. 10). These features are recorded using the LSF. The birth time of a point in this PD is the value at a local minimum and the death time is the value at a saddle point.

To record these changes, we construct a simplicial complex, say $\Delta$, in the following way. Each pixel is taken as a vertex, and there is an edge from each vertex to its 8 neighbouring vertices (or fewer if the pixel lies on the image border). This allows us to construct the filtration $\Delta_{\epsilon_0} \subset \Delta_{\epsilon_1} \subset \cdots \subset \Delta_{\epsilon_n}$, where $\Delta_\epsilon = \{A \in \Delta : \max_{a \in A} p(a) \leq \epsilon\}$ and $p(a)$ is the pixel value of the vertex $a$. Only the zero-dimensional PD is essential for our model development. Fig. 11 shows a major difference between the PDs associated to the LSF of a COVID-19 and a non-COVID-19 CT-scan image.

Figure 11: PDs of $H_0$, $H_1$, $H_2$, and lower-star filtration of a COVID-19 infected, and a Non-COVID-19 CT-scan

There are many approaches to infer results from a collection of PDs, for example the bottleneck distance (Cohen-Steiner et al., 2007) (and its generalizations), the $L_p$-Wasserstein metric (Cohen-Steiner et al., 2010), the persistence landscape (Bubenik, 2015), and the Riemannian framework (Anirudh et al., 2016). Due to its efficiency and computational cost, we use the Riemannian framework to perform our statistical analysis, which includes principal geodesic analysis and SVM on the space of PDs. In this approach (see Figure 12) we first approximate a given PD with a 2D probability density function (pdf), which is then mapped, using the square-root transformation, onto a Hilbert sphere. On the Hilbert sphere we have closed-form expressions to compare two PDs. Using this, we first apply principal geodesic analysis to reduce the dimension, and then we build our model using SVM.

Figure 12: The Riemannian framework approach estimates a PD with a pdf, and then maps it onto a Hilbert sphere
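The zero-dimensional persistence of the lower-star filtration described above can be sketched with a union-find pass over the pixels in increasing order of intensity. This is an illustrative sketch rather than the pipeline used in this work; the function name `lower_star_h0` is hypothetical, and pairs of zero persistence are discarded.

```python
def lower_star_h0(image):
    """(birth, death) pairs of connected components of the sublevel sets of an image."""
    rows, cols = len(image), len(image[0])
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent, birth = {}, {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path compression
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:
        parent[(r, c)] = (r, c)
        birth[(r, c)] = value
        for dr in (-1, 0, 1):               # the 8 neighbouring pixels
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q == (r, c) or q not in parent:
                    continue                # outside the image or not yet in the filtration
                a, b = find((r, c)), find(q)
                if a == b:
                    continue
                # Elder rule: the component born later dies at the current value.
                young, old = (a, b) if birth[a] > birth[b] else (b, a)
                if birth[young] < value:
                    pairs.append((birth[young], value))
                parent[young] = old
    root = find(order[0][1:])               # component of the global minimum
    pairs.append((birth[root], float("inf")))
    return sorted(pairs)


# Four local minima in the corners, separated by a ridge of value 5.
toy_image = [
    [1, 5, 2],
    [5, 5, 5],
    [3, 5, 0],
]
print(lower_star_h0(toy_image))
```

In the toy image each corner is a local minimum and the ridge plays the role of the saddle, so three components die at value 5 while the component of the global minimum persists.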

For each point $(a, b)$ of a PD, we use a multivariate normal distribution with covariance matrix $\Sigma$ and mean $\mu = (a, b)$. We calculate the values of this 2D pdf on a meshgrid with a uniform spacing of 0.5. Further, we compute the square-root representation of this pdf, which maps it onto a Hilbert sphere (cf. Anirudh et al., 2016, Section 3.3). Hence each PD is converted into a vector of length 1061. To reduce this dimension we apply Principal Geodesic Analysis (PGA) (Fletcher et al., 2004) on our Hilbert sphere. This whole procedure is applied to each collection of PDs for $H_0$, $H_1$, $H_2$, and to the PD coming from the lower-star filtration. Hence for a fixed image, each of the four PDs is represented as a vector of length 2400.

We build the classification model using an SVM, which tends to perform relatively well on limited training data sets with high-dimensional features. Each of the 2,480 CT scans is represented by a homology feature vector. We use Principal Component Analysis to reduce the dimension of each feature vector to 4800. Before training the SVM, we randomly split the data into $k$ folds (where $k = 5$); we used $(k - 1)$ folds […]. The model built on $H_0$, $H_1$, $H_2$ and LSF achieves a classification accuracy of 99.[…] […] between 1e−[…] and 1e[…].
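The mapping of a persistent diagram onto the Hilbert sphere described in this section can be sketched as follows. The parameters here are assumptions made for illustration and are not the values used in this work: the diagram is taken to be rescaled to $[0, 1]^2$, the covariance is isotropic ($\Sigma = \sigma^2 I$), the grid spacing is 0.1, and the names `pd_to_sphere` and `geodesic_distance` are hypothetical.

```python
import math


def pd_to_sphere(diagram, grid_step=0.1, sigma=0.1):
    """Square-root density representation of a diagram with points in [0, 1]^2."""
    n = int(round(1.0 / grid_step)) + 1
    grid = [i * grid_step for i in range(n)]
    density = []
    for x in grid:
        for y in grid:
            # Sum of isotropic Gaussian bumps, one per diagram point (assumed covariance).
            density.append(sum(
                math.exp(-((x - a) ** 2 + (y - b) ** 2) / (2.0 * sigma ** 2))
                for a, b in diagram
            ))
    total = sum(density)
    # The square root of a discrete probability vector has unit Euclidean norm.
    return [math.sqrt(v / total) for v in density]


def geodesic_distance(psi1, psi2):
    """Great-circle distance between two points of the unit sphere."""
    inner = sum(u * v for u, v in zip(psi1, psi2))
    return math.acos(max(-1.0, min(1.0, inner)))


psi_a = pd_to_sphere([(0.2, 0.8), (0.3, 0.4)])
psi_b = pd_to_sphere([(0.1, 0.9)])
print(len(psi_a), geodesic_distance(psi_a, psi_b))
```

Because every coordinate of a square-root density is non-negative, all diagrams land in the positive orthant of the sphere, so the geodesic distance between two of them never exceeds $\pi/2$. This closed-form distance is what makes statistics such as PGA tractable on the space of PDs.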

4. Results and Discussion

We use the publicly available “SARS-CoV-2 CT-scan dataset”, which contains 1252 CT scans that are positive for COVID-19 and 1230 CT scans of patients not infected with COVID-19. The data were collected from patients in hospitals in Sao Paulo, Brazil, and made public in (Angelov & Almeida Soares, 2020; Soares et al., 2020).

We propose a model based on topological features, which provides a binary classification of COVID and Non-COVID CT scans. Our model is an attempt to capture the features as observed by a professional medic. These features are picked up by the topological summaries provided by PDs, which makes the model biologically more interpretable than deep neural networks. Moreover, topological techniques do not need large amounts of data to train a model. Table 1 compares the average values of the evaluation metrics achieved by different deep networks and by our topological model.

Table 1 (columns: Method, Accuracy, Precision, Recall, Specificity, F1 Score; most entries are illegible in this copy, and lost method names or values are marked “…”):

Method                                      Accuracy  Precision  Recall  Specificity  F1 Score
Topological Approach                        …         …          …       …            …
… (… et al., 2020)                          97.38     99.16      95.53   -            97.31
Contrastive Learning (Wang et al., 2020)    …         …          …       …            …
… (… et al., 2020)                          95.0      95.3       94.0    94.7         94.3
COVID CT-Net (Yazdani et al., 2020)         -         -          …       …            …
… (… et al., 2020)                          96.2      96.2       96.2    96.2         96.2

Table 1: The topological model performs better in terms of all metrics than the other state-of-the-art approaches, including the latest deep learning models. See (Alshazly et al., 2020) for the definitions of the evaluation metrics.

Figure 13: Some samples of COVID-19 and Non-COVID-19 CT-Scans with the predictions of our topological model.
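The five evaluation metrics reported in Table 1 are the standard ones derived from a binary confusion matrix. The sketch below states them explicitly; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not results from this work.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
```

In a five-fold setting, each metric would be computed per fold and then summarized as a mean and a standard deviation, which is the "value ± deviation" form used in the table.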

Some of the topological features are more important than others, depending on the input image. Table 2 presents the performance of our model when individual topological features are used.

Although the LSF feature performs exceptionally well compared to the other features, in some situations a lack of saddle points can make it less reliable. Combining all the features therefore makes our model more reliable in all situations.

Table 2: Performance metrics for different combinations of topological features (columns: Top. Feature, Accuracy, Precision, Recall, Specificity, F1 Score; rows include $H_0$, $H_1$ and $H_2$; the numeric entries are illegible in this copy).

Figure 14: A model using only the LSF feature was not able to predict the traces of COVID-19 due to the scarcity of saddle points in the infected region. But all models using all or one of the features $H_0$, $H_1$, and $H_2$ detected it correctly.

To visualize the relative position of the topological feature vectors coming from the CT scans of SARS-CoV-2 we apply t-SNE. In this process we take vectors of length 4800 and map them to 2D. In Fig. 15 we can clearly see two segregated clusters of the COVID-19 and Non-COVID-19 images.

Figure 15: Visualization of the t-SNE embedding of the entire SARS-CoV-2 CT-Scan dataset. We clearly see two different clusters representing COVID-19 (red) and Non-COVID-19 (blue). It is clear that the embedding has some of the geometric properties of the Hilbert sphere.

Machine learning algorithms have achieved persuasive performance in several medical imaging problems, but the interpretability of these ML models is very limited, and this remains a significant hurdle to their adoption in clinical practice. To check the robustness of our model we perform the following experiment.

In the first stage, we removed the critical regions of interest (GGOs and consolidations) from COVID-19 positive cases and then predicted the outcome. Secondly, we investigated the performance of our model on images from which non-COVID-19 regions were randomly removed. We illustrate some COVID-19 cases in Fig. 16, where we covered the GGOs and consolidations and the model predicted the images to be non-COVID-19.

Hence the infected COVID-19 regions are very accurately captured by our chosen topological features and, consequently, by our model.

Figure 16: (A) Original images. (B) We covered COVID-19 regions. (C) & (D) We covered random non-COVID-19 regions.

5. Conclusion

This paper presents a new approach to detect COVID-19 from CT-scan images using persistent homology, and achieves state-of-the-art results. The work shows that the techniques of topological data analysis are effective and perform better than most deep neural networks.

This work provides a highly interpretable model based on topological features of CT scans. The model is based on the slogan “mimic a professional medic”, and it outperforms most cutting-edge deep neural network approaches. It will nevertheless be interesting to combine the topological features of this work with deep convolutional neural networks; this is left as an open direction. Chest CT-scan imaging has high sensitivity for the diagnosis of COVID-19, so this is a step forward in detecting and hence eliminating COVID-19.

Acknowledgments

The first author is thankful to David Epstein and Saqlain Raza for many fruitful discussions.

References

Ai, Tao, Yang, Zhenlu, Hou, Hongyan, Zhan, Chenao, Chen, Chong, Lv, Wenzhi, Tao, Qian, Sun, Ziyong, & Xia, Liming. 2020. Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology. https://doi.org/10.1148/radiol.2020200642.

Alshazly, Hammam, Linse, Christoph, Barth, Erhardt, & Martinetz, Thomas. 2020. Explainable COVID-19 Detection Using Chest CT Scans and Deep Learning. arXiv preprint arXiv:2011.05317.

Angelov, Plamen, & Almeida Soares, Eduardo. 2020. Explainable-by-design approach for COVID-19 classification via CT-scan. medRxiv.

Anirudh, Rushil, Venkataraman, Vinay, Natesan Ramamurthy, Karthikeyan, & Turaga, Pavan. 2016. A Riemannian framework for statistical analysis of topological persistence diagrams. 68–76.

Bendich, Paul, Edelsbrunner, Herbert, & Kerber, Michael. 2010. Computing robustness and persistence for images. IEEE Transactions on Visualization and Computer Graphics, (6), 1251–1260.

Bubenik, Peter. 2015. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, (1), 77–102.

Carlsson, Gunnar. 2009. Topology and data. Bulletin of the American Mathematical Society, (2), 255–308.

Carlsson, Gunnar, Singh, Gurjeet, & Zomorodian, Afra. 2009. Computing multidimensional persistence. Pages 730–739 of: International Symposium on Algorithms and Computation. Springer.

Chung, Yu-Min, Hu, Chuan-Shen, Lawson, Austin, & Smyth, Clifford. 2018. Topological approaches to skin disease image analysis. Pages 100–105 of: 2018 IEEE International Conference on Big Data (Big Data). IEEE.

Cohen-Steiner, David, Edelsbrunner, Herbert, & Harer, John. 2007. Stability of persistence diagrams. Discrete & Computational Geometry, (1), 103–120.

Cohen-Steiner, David, Edelsbrunner, Herbert, Harer, John, & Mileyko, Yuriy. 2010. Lipschitz functions have Lp-stable persistence. Foundations of Computational Mathematics, (2), 127–139.

Currie, Geoff, Hawk, K Elizabeth, Rohren, Eric, Vial, Alanna, & Klein, Ran. 2019. Machine learning and deep learning in medical imaging: intelligent imaging. Journal of Medical Imaging and Radiation Sciences, (4), 477–487.

de Silva, Vin, & Ghrist, Robert. 2007. Coverage in sensor networks via persistent homology. Algebr. Geom. Topol., (1), 339–358.

Deng, Jia, Dong, Wei, Socher, Richard, Li, Li-Jia, Li, Kai, & Fei-Fei, Li. 2009. ImageNet: A large-scale hierarchical image database. Pages 248–255 of: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE.

Edelsbrunner, Herbert, Letscher, David, & Zomorodian, Afra. 2000. Topological persistence and simplification. Pages 454–463 of: Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE.

Fletcher, P Thomas, Lu, Conglin, Pizer, Stephen M, & Joshi, Sarang. 2004. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, (8), 995–1005.

Frosini, Patrizio, & Landi, Claudia. 2013. Persistent Betti numbers for a noise tolerant shape-based approach to image retrieval. Pattern Recognition Letters, (8), 863–872.

Garside, Kathryn, Henderson, Robin, Makarenko, Irina, & Masoller, Cristina. 2019. Topological data analysis of high resolution diabetic retinopathy images. PloS one, (5), e0217413.

Hiraoka, Yasuaki, Nakamura, Takenobu, Hirata, Akihiko, Escolar, Emerson G, Matsue, Kaname, & Nishiura, Yasumasa. 2016. Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences, (26), 7035–7040.

Huang, Chaolin, Wang, Yeming, Li, Xingwang, Ren, Lili, Zhao, Jianping, Hu, Yi, Zhang, Li, Fan, Guohui, Xu, Jiuyang, Gu, Xiaoying, Cheng, Zhenshun, Yu, Ting, Xia, Jiaan, Wei, Yuan, Wu, Wenjuan, Xie, Xuelei, Yin, Wen, Li, Hui, Liu, Min, Xiao, Yan, Gao, Hong, Guo, Li, Xie, Jungang, Wang, Guangfa, Jiang, Rongmeng, Gao, Zhancheng, Jin, Qi, Wang, Jianwei, & Cao, Bin. 2020. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, (10223), 497–506.

Jaiswal, Aayush, Gianchandani, Neha, Singh, Dilbag, Kumar, Vijay, & Kaur, Manjit. 2020. Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. Journal of Biomolecular Structure and Dynamics, 1–8.

Kasson, Peter M., Zomorodian, Afra, Park, Sanghyun, Singhal, Nina, Guibas, Leonidas J., & Pande, Vijay S. 2007. Persistent voids: a new structural metric for membrane fusion. Bioinformatics, (14), 1753–1759.

Kong, Weifang, & Agarwal, Prachi P. 2020. Chest Imaging Appearance of COVID-19 Infection. Radiology: Cardiothoracic Imaging, (1). https://doi.org/10.1148/ryct.2020200028.

Lee, H., Kang, H., Chung, M. K., Kim, B., & Lee, D. S. 2012. Persistent Brain Network Homology From the Perspective of Dendrogram. IEEE Transactions on Medical Imaging, (12), 2267–2277.

Li, Xiaoming, Zeng, Wenbing, Li, Xiang, Chen, Haonan, Shi, Linping, Li, Xinghui, Xiang, Hongnian, Cao, Yang, Chen, Hui, Liu, Chen, & Wang, Jian. 2020. CT imaging changes of corona virus disease 2019 (COVID-19): a multi-center study in Southwest China. Journal of Translational Medicine. https://doi.org/10.1186/s12967-020-02324-w.

Liu, Xu, Xie, Zheng, Yi, Dongyun, et al. 2012. A fast algorithm for constructing topological structure in large data. Homology, Homotopy and Applications, (1), 221–238.

Otter, Nina, Porter, Mason A, Tillmann, Ulrike, Grindrod, Peter, & Harrington, Heather A. 2017. A roadmap for the computation of persistent homology. EPJ Data Science, (1), 17.

Panwar, Harsh, Gupta, PK, Siddiqui, Mohammad Khubeb, Morales-Menendez, Ruben, Bhardwaj, Prakhar, & Singh, Vaishnavi. 2020. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos, Solitons & Fractals, 110190.

Qaiser, Talha, Sirinukunwattana, Korsuk, Nakane, Kazuaki, Tsang, Yee-Wah, Epstein, David, & Rajpoot, Nasir M. 2016. Persistent homology for fast tumor segmentation in whole slide histology images. Procedia Computer Science, 119–124.

Qaiser, Talha, Tsang, Yee-Wah, Epstein, David, & Rajpoot, Nasir. 2017. Tumor segmentation in whole slide images using persistent homology and deep convolutional features. Pages 320–329 of: Annual Conference on Medical Image Understanding and Analysis. Springer.

Qaiser, Talha, Tsang, Yee-Wah, Taniyama, Daiki, Sakamoto, Naoya, Nakane, Kazuaki, Epstein, David, & Rajpoot, Nasir. 2019.
Fast and accurate tumor segmentation of histologyimages using persistent homology and deep convolutional features.

Medical Image Analy-sis , , 1 – 14.Rieck, Bastian, Mara, Hubert, & Leitte, Heike. 2012. Multivariate data analysis usingpersistence-based ﬁltering and topological signatures. IEEE Transactions on Visualizationand Computer Graphics , (12), 2382–2391.Rotman, Joseph J. 2013. An introduction to algebraic topology . Vol. 119. Springer Science& Business Media.Ruder, Sebastian. 2019.

Neural transfer learning for natural language processing . Ph.D.thesis, NUI Galway.Ruder, Sebastian, Peters, Matthew E, Swayamdipta, Swabha, & Wolf, Thomas. 2019. Trans-fer learning in natural language processing.

Pages 15–18 of: Proceedings of the 2019 Con-ference of the North American Chapter of the Association for Computational Linguistics:Tutorials .Soares, Eduardo, Angelov, Plamen, Biaso, Sarah, Froes, Michele Higa, & Abe, Daniel Kanda.2020. SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identiﬁcation. medRxiv .Tahamtan, Alireza, & Ardebili, Abdollah. 2020. Real-time RT-PCR in COVID-19 detec-tion: issues aﬀecting the results.

Expert review of molecular diagnostics , , 453–454.doi:10.1080/14737159.2020.1757437.Tralie, Christopher, Saul, Nathaniel, & Bar-On, Rann. 2018. Ripser.py: A Lean PersistentHomology Library for Python. The Journal of Open Source Software , (29), 925.Voulodimos, Athanasios, Doulamis, Nikolaos, Doulamis, Anastasios, & Protopapadakis,Eftychios. 2018. Deep learning for computer vision: A brief review. Computational intel-ligence and neuroscience , .Wang, Bao, & Wei, Guo-Wei. 2016. Object-oriented persistent homology. Journal of com-putational physics , , 276–299.Wang, Zhao, Liu, Quande, & Dou, Qi. 2020. Contrastive Cross-Site Learning With Re-designed Net for COVID-19 CT Classiﬁcation. IEEE Journal of Biomedical and HealthInformatics , (10), 2806–2813.Yao, Yuan, Sun, Jian, Huang, Xuhui, Bowman, Gregory R, Singh, Gurjeet, Lesnick, Michael,Guibas, Leonidas J, Pande, Vijay S, & Carlsson, Gunnar. 2009. Topological methods forexploring low-density states in biomolecular folding pathways. The Journal of chemicalphysics , (14), 04B614. 21azdani, Shakib, Minaee, Shervin, Kaﬁeh, Rahele, Saeedizadeh, Narges, & Sonka, Milan.2020. Covid ct-net: Predicting covid-19 from chest ct images using attentional convolu-tional network. arXiv preprint arXiv:2009.05096arXiv preprint arXiv:2009.05096