COVID-19 Infection Map Generation and Detection from Chest X-Ray Images
Aysen Degerli, Mete Ahishali, Mehmet Yamac, Serkan Kiranyaz, Muhammad E. H. Chowdhury, Khalid Hameed, Tahir Hamid, Rashid Mazhar, Moncef Gabbouj
Abstract—Computer-aided diagnosis has become a necessity for accurate and immediate coronavirus disease 2019 (COVID-19) detection to aid treatment and prevent the spread of the virus. Compared to other diagnosis methodologies, chest X-ray (CXR) imaging is an advantageous tool since it is fast, low-cost, and easily accessible. Thus, CXR has great potential not only to help diagnose COVID-19 but also to track the progression of the disease. Numerous studies have proposed to use Deep Learning techniques for COVID-19 diagnosis. However, they have used very limited CXR image repositories for evaluation with a small number, a few hundreds, of COVID-19 samples. Moreover, these methods can neither localize nor grade the severity of COVID-19 infection. For this purpose, recent studies proposed to explore the activation maps of deep networks. However, they remain inaccurate for localizing the actual infection, making them unreliable for clinical use. This study proposes a novel method for the joint localization, severity grading, and detection of COVID-19 from CXR images by generating the so-called infection maps that can accurately localize and grade the severity of COVID-19 infection. To accomplish this, we have compiled the largest COVID-19 dataset up to date with 2951 COVID-19 CXR images, where the annotation of the ground-truth segmentation masks is performed on CXRs by a novel collaborative expert human-machine approach. Furthermore, we publicly release the first CXR dataset with the ground-truth segmentation masks of the COVID-19 infected regions. A detailed set of experiments shows that state-of-the-art segmentation networks can learn to localize COVID-19 infection with an F1-score of 85.81%, which is significantly superior to the activation maps created by previous methods. Finally, the proposed approach achieved a COVID-19 detection performance with 98.37% sensitivity and 99.16% specificity.
Index Terms—SARS-CoV-2, COVID-19 Detection, COVID-19 Infection Segmentation, Deep Learning
I. INTRODUCTION

CORONAVIRUS disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first reported in December 2019 in Wuhan, China. The highly infectious disease rapidly spread around the world with millions of positive cases. As a result, COVID-19 was declared a pandemic by the World Health Organization in March 2020.
Aysen Degerli, Mete Ahishali, Mehmet Yamac, and Moncef Gabbouj are with the Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland (e-mail: name.surname@tuni.fi). Serkan Kiranyaz and Muhammad E. H. Chowdhury are with the Department of Electrical Engineering, Qatar University, Doha, Qatar (e-mail: [email protected] and [email protected]). Khalid Hameed is an MD at Reem Medical Center, Doha, Qatar (e-mail: [email protected]). Tahir Hamid is a consultant cardiologist at Hamad Medical Corporation Hospital and with Weill Cornell Medicine - Qatar, Doha, Qatar. Rashid Mazhar is an MD at Hamad Medical Corporation Hospital, Doha, Qatar.
Fig. 1: The COVID-19 sample CXR images, their corresponding ground-truth segmentation masks, which are annotated by the collaborative human-machine approach, and the infection maps generated by the state-of-the-art segmentation models.

The disease may lead to hospitalization, intubation, intensive care, and even death, especially for the elderly [1], [2]. Naturally, reliable detection of the disease is of the utmost importance. However, the diagnosis of COVID-19 is not straightforward, since its symptoms, such as cough, fever, breathlessness, and diarrhea, are generally indistinguishable from those of other viral infections [3], [4].

The diagnostic tools to detect COVID-19 are currently reverse transcription polymerase chain reaction (RT-PCR) assays and chest imaging techniques, such as Computed Tomography (CT) and X-ray imaging. Primarily, RT-PCR has become the gold standard in the diagnosis of COVID-19 [5], [6]. However, RT-PCR assays have a high false alarm rate, which may be caused by virus mutations in the SARS-CoV-2 genome, sample contamination, or damage to the sample acquired from the patient [7], [8]. In fact, it has been shown in hospitalized patients that RT-PCR sensitivity is low and the test results are highly unstable [6], [9]–[11]. Therefore, it is recommended to perform chest CT imaging initially on suspected COVID-19 cases [12], since it is a more reliable clinical tool with a higher diagnostic sensitivity than RT-PCR. Hence, several studies [12]–[14] suggest performing CT on suspected cases with negative RT-PCR findings. However, CT scans have several limitations: their sensitivity is limited in the early phase of COVID-19 [15], they are limited in recognizing specific viruses [16], and they are slow in image acquisition and costly. On the other hand, X-ray imaging is faster, cheaper, and less harmful to the body in terms of radiation exposure compared to CT [17], [18]. Moreover, unlike CT devices, X-ray devices are easily accessible, reducing the risk of COVID-19 contamination during the imaging process [19]. Currently, chest X-ray (CXR) imaging is widely used as an assistive tool in COVID-19 prognosis, and it has been reported to have a potential diagnosis capability in recent studies [20].

In order to automate COVID-19 detection/recognition from CXR images, many studies [17], [21]–[27] have proposed to use deep Convolutional Neural Networks (CNNs). However, the main limitation of these studies is that the data is scarce for the target COVID-19 class. Such a limited amount of data degrades the learning performance of deep networks. Two recent studies, [28] and [29], have addressed this drawback with a compact network structure and achieved state-of-the-art detection performance over the benchmark QaTa-COV19 and Early-QaTa-COV19 datasets, which consist of only a few hundred COVID-19 CXR images each. Despite the fact that these datasets were the largest available at that time, such a limited number of COVID-19 samples raises robustness and reliability issues for the proposed methods in general.

Moreover, all these previous machine learning solutions with X-ray imaging remain limited to COVID-19 detection only. However, as stated by Shi et al. [30], COVID-19 pneumonia screening is important for evaluating the status of the patient and the treatment.
Therefore, along with the detection, COVID-19 related infection localization is another crucial problem. Hence, several studies [31]–[33] produced activation maps generated from different Deep Learning (DL) models trained for the COVID-19 detection (classification) task in order to localize the COVID-19 infection in the lungs. Infection localization has two vital objectives: an accurate assessment of the infection location and of the severity of the disease. However, the results of previous studies show that the activation maps generated inherently by the underlying DL network may fail to accomplish both objectives; that is, irrelevant locations with biased severity grading appeared in many cases. To alleviate these problems, two studies [34], [35] proposed to perform lung segmentation as the first step of their approaches. This way, they narrowed the region of interest down to the lungs to increase the reliability of their methods. Overall, until this study, screening COVID-19 infection from such activation maps produced by classification networks was the only option for localization due to the absence of ground-truth masks in the datasets available in the literature. Many studies [30], [34], [36]–[38] have COVID-19 infection ground-truths for CT images; however, ground-truth segmentation masks for CXR images are non-existent.

In this study, in order to overcome the aforementioned limitations and drawbacks, first, the benchmark dataset QaTa-COVSeg, proposed by the researchers of Qatar University and Tampere University in [28] and [29], is extended to include
2951 COVID-19 samples. This new dataset is several times larger than those used in earlier studies. The extended benchmark dataset, QaTa-COVSeg, with nearly 15,500 CXR images, is not only the largest ever composed dataset, but it is also the first dataset that has the ground-truth segmentation masks for the COVID-19 infection regions; some samples are shown in Fig. 1. To obtain the ground-truth, an expert human-machine collaborative approach is introduced to improve the segmentation masks manually drawn by medical doctors (MDs). This is an iterative process, where MDs initiate the segmentation with "manually-drawn" segmentation masks for a subset of CXR images. Then, the segmentation networks trained over this subset generate their own "competing" masks, and the MDs are asked to compare them pair-wise (initial manual segmentation versus automatically segmented mask) for the same patient. The networks also segment the remaining CXR images, which are verified by the expert MDs. Such a verification improves the quality of the generated masks as well as the training. The human-machine collaboration continues until the MDs are fully satisfied, i.e., a satisfactory mask can be found among the masks generated by the networks for all CXR images in the dataset. In this study, we show that even with two stages (iterations), highly superior infection maps can be obtained, and an elegant COVID-19 detection performance can be achieved.

For the infection map generation, we use the following state-of-the-art deep networks: U-Net [39] and UNet++ [40], which provide top performances in biomedical image segmentation, and Deep Layer Aggregation (DLA) [41] encoder-decoder CNN (E-D CNN) type segmentation networks. Moreover, the encoder structure of the E-D CNN architectures is varied: CheXNet [42] (a fine-tuned version of DenseNet-121 [43]), DenseNet-121, Inception-v3 [44], and ResNet-50 [45]. Next, the infection maps are generated from the predictions of the E-D CNN models to visualize/detect COVID-19 infection.

The rest of the paper is organized as follows. In Section II-A, we introduce the benchmark QaTa-COVSeg dataset. Our novel human-machine collaborative approach for the ground-truth annotation is explained in Section II-B. Next, the details of COVID-19 infected region segmentation, and the infection map generation and COVID-19 detection, are presented in Sections II-C and II-D, respectively. The experimental setup and results with the benchmark dataset are reported in Sections III-A and III-B, respectively. Finally, we conclude the paper in Section IV.
II. MATERIALS AND METHODOLOGY
The proposed approach in this study is composed of three main phases: 1) training the state-of-the-art deep models for COVID-19 infected region segmentation using the ground-truth segmentation masks, 2) infection map generation from the trained segmentation networks, and 3) COVID-19 detection, as depicted in Fig. 2. In this section, we first detail the creation of the benchmark QaTa-COVSeg dataset. Then, the proposed approach for collaborative human-machine ground-truth generation is introduced.

Fig. 2: The pipeline of the proposed approach has three stages: COVID-19 infected region segmentation, infection map generation, and COVID-19 detection. The CXR image is the input to the trained E-D CNN, and the network's probabilistic prediction is used to generate infection maps. The generated infection maps are used for COVID-19 detection.
A. The Benchmark QaTa-COVSeg Dataset
The researchers of Qatar University and Tampere University have compiled the largest COVID-19 dataset up to date: QaTa-COVSeg, including 2951 COVID-19 and 12,544 normal (control group) CXR images. To create QaTa-COVSeg, we have utilized several publicly available datasets and repositories that are scattered and stored in different formats. The collected images therefore contained some duplicate, over-exposed, and low-quality samples, which were identified and removed in the pre-processing stage. Consequently, since the COVID-19 CXRs come from many different publicly available sources, the dataset exhibits a high intra-class dissimilarity, as depicted in Fig. 3. The image sources of the normal and COVID-19 CXRs are detailed as follows.

Fig. 3: The COVID-19 CXR samples from the benchmark QaTa-COVSeg dataset.

Normal CXRs: The RSNA pneumonia detection challenge dataset [46] comprises about 30K CXR images, part of which are labeled as normal. All CXRs in this dataset are in DICOM format, a popularly used format for medical imaging. The Padchest dataset [47] consists of 160,868 CXR images from 67,625 patients, part of which belong to the normal class. The images were evaluated and reported by radiologists at Hospital San Juan in Spain during 2009-2017. The dataset includes six different position views of CXR and additional information regarding image acquisition and patient demography. Paul Mooney [48] has released an X-ray dataset of several thousand CXR images, a portion of which are from the normal class. The data is collected from pediatric patients aged one to five years old at Guangzhou Women and Children's Medical Center, Guangzhou. The dataset in [49] consists of CXR images and the corresponding radiologist reports from the Indiana Network for Patient Care, in which a number of frontal CXR samples are labeled as normal. In [50], there are 80 normal CXRs from the tuberculosis control program of the Department of Health and Human Services of Montgomery County and 326 normal CXRs from Shenzhen Hospital. In this study, a total of 12,544 normal CXRs are gathered from the aforementioned datasets.

COVID-19 CXRs: BIMCV-COVID19+ [51] is the largest publicly available dataset of COVID-19 positive CXR images. The CXR images of the BIMCV-COVID19+ dataset were recorded with computed radiography (CR) and digital X-ray (DX) machines. Hannover Medical School and the Institute for Diagnostic and Interventional Radiology [52] released CXR images of COVID-19 patients. Additional CXR images come from public repositories: the Italian Society of Medical and Interventional Radiology (SIRM), GitHub, and Kaggle [35], [53]–[56]. As mentioned earlier, any duplicate and low-quality images were removed, since the COVID-19 CXR images are collected from different public datasets and repositories. In this study, a total of 2951 COVID-19 CXRs are gathered from the aforementioned sources; the COVID-19 CXRs therefore cover patients of different age groups, genders, and ethnicities.
Fig. 4: The two stages of the human-machine collaborative approach. Stage I: A subset of CXR images with manually drawn segmentation masks is used to train three different deep networks in a 5-fold cross-validation scheme. The manually drawn ground-truth (a) and the three predictions (b, c, d) are blindly shown to MDs, who select the best ground-truth mask. Stage II: Five deep networks are trained over the best segmentation masks selected. Then, they are used to produce the segmentation masks for the rest of the CXR dataset (a, b, c, d, e), which are shown to MDs.
B. Collaborative Human-Machine Ground-Truth Annotation
Recent developments in machine and deep learning techniques have led to state-of-the-art performance in many computer vision (CV) tasks, such as image classification, object detection, and image segmentation. However, supervised DL methods require a huge amount of annotated data; otherwise, the limited amount of data degrades the performance of deep network structures, since their generalization capability depends on the availability of large datasets. Nevertheless, producing ground-truth segmentation masks by pixel-accurate image segmentation is a cumbersome and highly subjective task for human experts, even for moderate-size datasets.

In order to overcome this challenge, in this study we propose a novel collaborative human-machine approach to accurately produce the ground-truth segmentation masks for the infected regions directly from the CXR images. The proposed approach is performed in two main stages. First, a group of expert MDs manually segment the infected regions of a subset of the CXR images. Then, several segmentation networks inspired by the U-Net [39] structure are trained over the initial ground-truth masks in a 5-fold cross-validation scheme. For each fold, the segmentation masks of the test samples are predicted by the networks. The network-predicted masks, the initial (MD-drawn) ground-truth masks, and the original CXR image are assessed by the MDs, and the best segmentation mask among them is selected. The steps of Stage-I are illustrated in Fig. 4 (top). At the end of the first stage, collaboratively annotated ground-truth masks for the subset of CXR images are formed, and they are naturally superior to the initial manually drawn masks, since they are selected by the MDs. An interesting observation in this stage was that the MDs preferred the machine-generated masks over the manually drawn masks in three out of five cases.

In the second stage, five deep networks inspired by the U-Net [39], UNet++ [40], and DLA [41] architectures are trained over the collaborative masks formed in Stage-I. The trained segmentation networks are used to predict the segmentation masks of the rest of the data, i.e., the remaining unannotated COVID-19 images. Among the five predictions, the expert MDs select the best one as the ground-truth, or deny all of them if none is found successful. For the latter case, the MDs were asked to draw the ground-truth masks manually; however, we noticed that this was indeed a minority case covering only a small fraction of the unannotated data. The steps of Stage-II are shown in Fig. 4 (bottom). As a result, the ground-truth masks for all 2951 COVID-19 CXR images are gathered to construct the benchmark QaTa-COVSeg dataset. The proposed approach not only saves valuable human labor time, but also improves the quality and reliability of the masks by reducing subjectivity through the Stage-II verification step.
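To make the Stage-II procedure concrete, the sketch below outlines it in Python under stated assumptions: trained_networks, select_best, and draw_manually are hypothetical placeholders for the network inference and the expert-review interface, none of which are specified in the paper.

```python
def stage_two(unannotated_images, trained_networks, select_best, draw_manually):
    """Sketch of Stage-II of the collaborative annotation (hypothetical interfaces).

    trained_networks: callables (image -> predicted mask) trained on the Stage-I masks.
    select_best: the MD review step; returns the chosen mask, or None if all are rejected.
    draw_manually: the MD fallback that produces a manual mask (a minority case).
    """
    annotated = []
    for image in unannotated_images:
        candidates = [net(image) for net in trained_networks]  # five competing masks
        mask = select_best(image, candidates)  # MDs pick the best, or reject all
        if mask is None:
            mask = draw_manually(image)
        annotated.append((image, mask))
    return annotated
```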
C. COVID-19 Infected Region Segmentation
Segmentation of the COVID-19 infection is the first step of our proposed approach, as depicted in Fig. 2. Once the ground-truth annotation for the QaTa-COVSeg benchmark dataset is formed as explained in the previous section, we perform infected region segmentation extensively with different network configurations. We have used three different segmentation models: U-Net, UNet++, and DLA, with four different encoder structures: CheXNet, DenseNet-121, Inception-v3, and ResNet-50, each with frozen and not-frozen encoder weight configurations.
1) Segmentation Models:
We have tried distinct segmentation model structures, from shallow to deep, with varied configurations as follows:
• U-Net [39] is a top-performing network for medical image segmentation applications with a u-shaped architecture, where the encoder part is symmetric with respect to its decoder part. This unique decoder structure with many feature channels allows the network to carry information through to its later layers.
• UNet++ [40] further develops the decoder structure of U-Net by connecting the encoder to the decoder with nested dense convolutional blocks. This way, the bridge between the encoder and decoder parts is more firmly knit; thus, the information can be transferred to the final layers more intensively compared to the classic U-Net.
• DLA [41] investigates the connecting bridges between the encoder and decoder, and proposes a way to fuse the semantic and spatial information with dense layers, which are progressively aggregated by iterative merging into deeper and larger scales.
2) Encoder Selections for Segmentation Models:
In this study, we use several deep CNNs to form the encoder part of the above-mentioned segmentation models, as follows:
• DenseNet-121 [43] is a deep network with 121 layers, each with additional input nodes connecting all the layers directly with each other. Therefore, the maximum information flow through the network is attained.
• CheXNet [42] is based on the architecture of DenseNet-121, trained over the 14-class ChestX-ray14 dataset [57] to detect pneumonia cases from CXR images. In [42], DenseNet-121 is initialized with the ImageNet weights and fine-tuned over more than 100K CXR images, yielding state-of-the-art results on the ChestX-ray14 dataset with a better performance compared to the conclusions of radiologists.
• Inception-v3 [44] achieves state-of-the-art results with much less computational complexity compared to its deep competitors by factorizing the convolutions and pruning the dimensions inside the network. Despite the lower complexity, it preserves a high performance.
• ResNet-50 [45] introduces a deep residual learning framework that recasts the desired mapping of the input as a residual mapping. This is achieved by shortcut connections over the stacked layers; these connections merge the input and output of the stacked layers by addition operations, thereby preventing the problem of vanishing gradients.

We perform transfer learning on the encoder side of the segmentation models by initializing the layers with the ImageNet weights, except for CheXNet, which is pre-trained on the ChestX-ray14 dataset. We tried two configurations: in the first, we freeze the encoder layers, while in the second, they are allowed to vary; a sketch of this setup is given below.
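As an illustration of this transfer-learning setup, the following Keras sketch builds a U-Net-style E-D CNN with a DenseNet-121 encoder initialized with ImageNet weights. The skip-connection tap points, decoder widths, and input size are our assumptions; the paper does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

def build_unet_densenet121(input_shape=(224, 224, 3), freeze_encoder=True):
    # Encoder: DenseNet-121 pre-trained on ImageNet (CheXNet weights would be
    # substituted here for the CheXNet encoder variant).
    base = DenseNet121(include_top=False, weights="imagenet", input_shape=input_shape)
    base.trainable = not freeze_encoder  # frozen vs. not-frozen configurations

    # Skip connections at decreasing resolutions (Keras layer names; assumed tap points).
    skip_names = ["conv1/relu", "pool2_relu", "pool3_relu", "pool4_relu"]
    skips = [base.get_layer(name).output for name in skip_names]

    x = base.output  # bottleneck feature map
    for skip in reversed(skips):  # decode from the deepest skip upwards
        x = layers.Conv2DTranspose(skip.shape[-1], 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(skip.shape[-1], 3, padding="same", activation="relu")(x)

    x = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(x)  # back to full size
    output = layers.Conv2D(1, 1, activation="sigmoid")(x)  # pixel-wise infection probability
    return Model(base.input, output)
```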
3) Hybrid Loss Function:
In this study, we train the segmentation networks with a hybrid loss function that combines the focal loss [58] with the dice loss [59] to achieve a better segmentation performance. We use the focal loss since COVID-19 infected region segmentation is an imbalanced problem: the number of background pixels far exceeds the number of foreground pixels. Let the ground-truth segmentation mask be $Y$, where each pixel class label is defined as $y$, and let the network prediction be $\hat{y}$. We define the pixel class probabilities as $P(y=1)=p$ for the positive class and $P(y=0)=1-p$ for the negative class. On the other hand, the network prediction probabilities are modeled by the logistic function using the sigmoid curve as

$$P(\hat{y}=1) = \frac{1}{1+e^{-z}} = q \quad (1)$$

$$P(\hat{y}=0) = 1 - \frac{1}{1+e^{-z}} = 1 - q \quad (2)$$

where $z$ is some function of the input CXR image $X$. Then, we define the cross-entropy (CE) loss as follows:

$$\mathrm{CE}(p,q) = -p\log q - (1-p)\log(1-q). \quad (3)$$

A common solution to address the class imbalance problem is to add a weighting factor $\alpha \in [0,1]$ for the positive class, and $1-\alpha$ for the negative class, which defines the balanced cross-entropy (BCE) loss as

$$\mathrm{BCE}(p,q) = -\alpha p\log q - (1-\alpha)(1-p)\log(1-q). \quad (4)$$

In this way, the importance of positive and negative samples is balanced. However, adding the $\alpha$ factor does not solve the issue for the large class imbalance scenario, because the network cannot distinguish outliers (hard samples) from inliers (easy samples) with the BCE loss. To overcome this drawback, the focal loss [58] sets a focusing parameter $\gamma \geq 0$ in order to down-weight the loss of easy samples that occur with small errors, so that the model is forced to learn the hard negative samples. The focal (F) loss is defined as

$$\mathrm{F}(p,q) = -\alpha(1-q)^{\gamma} p\log q - (1-\alpha)q^{\gamma}(1-p)\log(1-q) \quad (5)$$

where the F loss is equivalent to the BCE loss when $\gamma = 0$. In our experimental setup, we use the default setting of $\alpha = 0.25$ and $\gamma = 2$ for all the networks. To achieve a good segmentation performance, we combine the focal loss with the dice loss, which is based on the dice coefficient (DC) defined as follows:

$$\mathrm{DC} = \frac{2\,|Y \cap \hat{Y}|}{|Y| + |\hat{Y}|} \quad (6)$$

where $\hat{Y}$ is the predicted segmentation mask of the network. Hence, the DC can be interpreted as a dice (D) loss as follows:

$$\mathrm{D}(p,q) = 1 - \frac{2\sum_{h,w} p_{h,w}\, q_{h,w}}{\sum_{h,w} p_{h,w} + \sum_{h,w} q_{h,w}} \quad (7)$$

where $h$ and $w$ index the height and width of the ground-truth and prediction masks $Y$ and $\hat{Y}$, respectively. Finally, we sum the D and F losses to obtain the so-called hybrid loss function for the segmentation networks.
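A minimal TensorFlow sketch of this hybrid loss, following Eqs. (5)-(7), is given below. The per-batch reduction and the numerical-stability epsilon are our assumptions; they are not specified in the paper.

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    # Pixel-wise focal loss, Eq. (5): down-weights easy samples by (1-q)^gamma.
    q = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    pos = -alpha * tf.pow(1.0 - q, gamma) * y_true * tf.math.log(q)
    neg = -(1.0 - alpha) * tf.pow(q, gamma) * (1.0 - y_true) * tf.math.log(1.0 - q)
    return tf.reduce_mean(pos + neg)

def dice_loss(y_true, y_pred, eps=1e-7):
    # Dice loss, Eq. (7): one minus the (soft) dice coefficient of Eq. (6).
    intersection = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + eps) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)

def hybrid_loss(y_true, y_pred):
    # The hybrid loss is the sum of the focal and dice losses.
    return focal_loss(y_true, y_pred) + dice_loss(y_true, y_pred)
```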
Fig. 5: Three COVID-19 CXR test samples $X$ with the corresponding ground-truth masks $Y$. The color-coded network predictions $\hat{Y}_{R,G,B}$ are reflected translucently onto $X$ to generate the infection map on the lungs, where $\hat{Y} > 0$.

D. Infection Map Generation and COVID-19 Detection

Having obtained the training set of COVID-19 CXR images via the collaborative human-machine approach explained in Section II-B, we train the aforementioned segmentation networks to produce infection maps. We train the networks with a 5-fold cross-validation scheme, where in each fold we feed each test CXR sample $X$ into the network. We then obtain the network prediction mask $\hat{Y}$, which is used to generate an infection map: a measure of the infected region probabilities on the input $X$. Each pixel of $\hat{Y}$ is defined as $\hat{Y}_{h,w} \in [0,1]$, where $h$ and $w$ index the image dimensions. We then apply an RGB-based color transform, i.e., the jet color scale, to obtain the RGB version of the prediction mask, $\hat{Y}_{R,G,B}$, as shown in Fig. 5 for a pseudo-colored probability visualization. The infection map is generated as a reflection of the network prediction $\hat{Y}_{R,G,B}$ onto the CXR image $X$. Hence, for visualization, we form the imposed image by concatenating the hue and saturation components of $\hat{Y}_{H,S,V}$ with the value component of $X_{H,S,V}$. Finally, the imposed image is converted back to the RGB domain. In the infection map, we do not show the pixels/regions with zero probability for a better visualization effect. This way, the infected regions, where $\hat{Y} > 0$, are shown translucent as in Fig. 5.

Along with the infection map generation, which already provides localization and segmentation of the COVID-19 infection, COVID-19 detection can easily be performed using the proposed approach. The detection of COVID-19 is based on the predictions of the trained segmentation networks: a test sample is classified as COVID-19 if $\hat{Y} \geq 0.5$ at any pixel location.
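The following OpenCV/NumPy sketch reproduces the described jet-coloring, HSV composition, and pixel-level detection rule; the function names and colormap handling are our assumptions.

```python
import numpy as np
import cv2  # OpenCV, used here for the color-space conversions

def infection_map(cxr_gray, y_hat):
    """Overlay the predicted probabilities y_hat in [0,1] onto a grayscale CXR.
    Hue/saturation come from the jet-colored prediction, value from the CXR."""
    jet = cv2.applyColorMap((y_hat * 255).astype(np.uint8), cv2.COLORMAP_JET)
    jet_hsv = cv2.cvtColor(jet, cv2.COLOR_BGR2HSV)
    cxr_bgr = cv2.cvtColor(cxr_gray, cv2.COLOR_GRAY2BGR)
    cxr_hsv = cv2.cvtColor(cxr_bgr, cv2.COLOR_BGR2HSV)
    fused = np.dstack([jet_hsv[..., 0], jet_hsv[..., 1], cxr_hsv[..., 2]])
    overlay = cv2.cvtColor(fused, cv2.COLOR_HSV2BGR)
    # Zero-probability pixels are left as the plain CXR for better visualization.
    visible = y_hat > 0
    return np.where(visible[..., None], overlay, cxr_bgr)

def detect_covid19(y_hat, threshold=0.5):
    # A sample is classified as COVID-19 if any pixel probability reaches the threshold.
    return bool((y_hat >= threshold).any())
```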
III. EXPERIMENTAL RESULTS

In this section, first, the experimental setup is presented. Then, both numerical and visual results are reported with an extensive set of comparative evaluations over the benchmark QaTa-COVSeg dataset. Finally, visual comparative evaluations are presented between the infection maps and the activation maps extracted from state-of-the-art deep models.
A. Experimental Setup
Quantitative evaluations of the proposed approach are performed for both COVID-19 infected region segmentation and COVID-19 detection. COVID-19 infected region segmentation is evaluated at the pixel level, where we consider the foreground (infected region) as the positive class and the background as the negative class. For COVID-19 detection, the performance is computed per CXR sample, and we consider COVID-19 as the positive class and the control group (normal) as the negative class. Overall, the elements of the confusion matrix are formed as follows: true positives (TP): the number of correctly detected positive class members; true negatives (TN): the number of correctly detected negative class samples; false positives (FP): the number of misclassified negative class members; and false negatives (FN): the number of misclassified positive class samples. The standard performance evaluation metrics are defined as follows:
$$\mathrm{Sensitivity} = \frac{TP}{TP+FN} \quad (8)$$

where sensitivity (or recall) is the rate of correctly detected positive samples among all positive class samples,

$$\mathrm{Specificity} = \frac{TN}{TN+FP} \quad (9)$$

where specificity is the ratio of accurately detected negative class samples to all negative class samples,

$$\mathrm{Precision} = \frac{TP}{TP+FP} \quad (10)$$

where precision is the rate of correctly classified positive class samples among all the members classified as positive,

$$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \quad (11)$$

where accuracy is the ratio of correctly classified samples among all the data, and

$$F_{\beta} = \frac{(1+\beta^{2})\,(\mathrm{Precision} \times \mathrm{Sensitivity})}{\beta^{2} \times \mathrm{Precision} + \mathrm{Sensitivity}} \quad (12)$$

where the $F$-score is defined by the weighting parameter $\beta$. The F1-score is calculated with $\beta = 1$, which is the harmonic average of precision and sensitivity. The F2-score is calculated with $\beta = 2$, which emphasizes FN minimization over FPs. The main objective of both COVID-19 segmentation and detection is to maximize the sensitivity with a reasonable specificity in order to minimize FP COVID-19 cases or pixels; equivalently, a maximized F2-score is targeted with an acceptable F1-score value. The performances with their 95% confidence intervals (CI) for both COVID-19 infected region segmentation and detection are given in Tables I and III, respectively. The range of values can be calculated for each performance metric as

$$r = \pm z\sqrt{\mathrm{metric}\,(1-\mathrm{metric})/N} \quad (13)$$

where $z$ is the level of significance, $\mathrm{metric}$ is any performance evaluation metric, and $N$ is the number of samples. Accordingly, $z$ is set to 1.96 for the 95% CI.
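For reference, the metrics of Eqs. (8)-(12) and the CI half-width of Eq. (13) can be computed directly from the confusion-matrix counts, e.g., as in the short sketch below.

```python
import math

def evaluation_metrics(tp, tn, fp, fn, beta=1.0):
    # Eqs. (8)-(12) from the confusion-matrix counts.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = ((1 + beta**2) * precision * sensitivity
               / (beta**2 * precision + sensitivity))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f_score": f_score}

def ci_half_width(metric, n, z=1.96):
    # Eq. (13): half-width of the 95% confidence interval (z = 1.96).
    return z * math.sqrt(metric * (1.0 - metric) / n)
```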
TABLE I: Average performance metrics (%) for COVID-19 infected region segmentation, computed on the test (unseen) set over the 5 folds with three state-of-the-art segmentation models (U-Net, UNet++, and DLA), four encoder architectures (CheXNet, DenseNet-121, Inception-v3, and ResNet-50), and two weight configurations, where the initialized encoder layers are set to the frozen or not-frozen state. The reported metrics are Sensitivity, Specificity, Precision, F1-Score, F2-Score, Accuracy, and AUC with 95% CIs. [Table values not recoverable.]

We have evaluated the networks in a stratified 5-fold cross-validation scheme with a ratio of 80% training to 20% test (unseen folds) over the benchmark QaTa-COVSeg dataset. The input CXR images are resized to a fixed resolution. Table II shows the number of CXRs per class in the dataset. Since the two classes are imbalanced, we have applied data augmentation in order to balance them: the COVID-19 samples are augmented up to the same number of samples as the normal class in the training set of each fold. The data augmentation is performed using the Image Data Generator in Keras (see the sketch after Table II): the CXR samples are augmented by randomly shifting them both vertically and horizontally and randomly rotating them within a fixed range of degrees. After shifting and rotating the images, blank sections are filled using the nearest mode.

TABLE II: Number of CXR samples per class and per fold before and after data augmentation, listing the number of samples, training samples, augmented training samples, and test samples for the COVID-19 and Normal classes and their total. [Table values not recoverable.]
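A sketch of the described Keras augmentation follows. The shift and rotation magnitudes are placeholders, as the exact values did not survive extraction; only the nearest fill mode is stated in the text.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmenter for the COVID-19 class; the magnitudes below are assumed placeholders.
augmenter = ImageDataGenerator(
    width_shift_range=0.1,   # random horizontal shift
    height_shift_range=0.1,  # random vertical shift
    rotation_range=10,       # random rotation in degrees
    fill_mode="nearest",     # fill blank borders with the nearest pixels
)

# Usage: augmenter.flow(images, batch_size=..., seed=...) yields augmented batches;
# for segmentation, a second generator with the same seed keeps the masks aligned.
```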
We have implemented the deep networks with the TensorFlow library [60] using Python on an NVIDIA GeForce RTX 2080 Ti GPU card. For training, the Adam optimizer [61] is used with the default momentum parameters, $\beta_1 = 0.9$ and $\beta_2 = 0.999$, together with the aforementioned hybrid loss function. The segmentation networks are trained for a fixed number of epochs with a constant learning rate and batch size.

For comparison with the computed infection maps, the activation maps are obtained as follows: the encoder structures of the segmentation networks are trained for the classification task with a modification at the output layer, i.e., adding one output neuron per class. The activation maps extracted from the classification models are then compared with the infection maps of the segmentation models. The classification networks, CheXNet, DenseNet-121, Inception-v3, and ResNet-50, are fine-tuned using categorical cross-entropy as the loss function for a small number of epochs with a low learning rate, which is a sufficient setting to prevent over-fitting based on our previous study [29]. The other settings of the classifiers are kept the same as for the segmentation models.
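Tying the earlier sketches together, a hedged training configuration might look as follows. The epoch count, learning rate, and batch size are placeholders, since the original values did not survive extraction; build_unet_densenet121 and hybrid_loss are the sketches from Section II-C.

```python
import numpy as np
import tensorflow as tf

# Dummy stand-ins for the real dataset (shapes assumed).
train_images = np.zeros((8, 224, 224, 3), dtype="float32")
train_masks = np.zeros((8, 224, 224, 1), dtype="float32")

model = build_unet_densenet121(freeze_encoder=False)  # sketch from Sec. II-C.2
model.compile(
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=1e-4,             # assumed; original value lost
        beta_1=0.9, beta_2=0.999),      # Adam defaults, as stated in the text
    loss=hybrid_loss,                   # sketch from Sec. II-C.3
)
model.fit(train_images, train_masks, epochs=50, batch_size=8)  # assumed values
```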
B. Experimental Results

The experiments are carried out for both COVID-19 infected region segmentation and COVID-19 detection. We extensively tested the benchmark QaTa-COVSeg dataset using three different state-of-the-art segmentation networks with four different encoder options. We also investigated the effect of frozen encoder weights on the performance.
1) COVID-19 Infected Region Segmentation:
The performance of the segmentation models for COVID-19 infected region segmentation is presented in Table I. Each model structure is evaluated with two configurations: frozen and not-frozen encoder layers. We have used transfer learning on the encoder layers with ImageNet weights, except for the CheXNet model, which is pre-trained on the ChestX-ray14 dataset. The evaluation of the models with frozen encoder layers is also of interest, since this setting can lead to better convergence. However, as the results show, better performance is obtained when the network continues to learn on the encoder layers as well. For each model, we have observed that two encoders, DenseNet-121 and Inception-v3, are the top performers for the infected region segmentation task. The U-Net model with the DenseNet-121 encoder holds the leading performance, with an F1-score of 85.81%. DenseNet-121 produces better results than the other encoder types since it can preserve the information coming from earlier layers through to the output by concatenating the feature maps from each dense layer. In the other segmentation models, Inception-v3 outperforms the remaining encoder types. The presented segmentation performances are obtained by setting the threshold value to 0.5 to compute the segmentation mask from the network probabilities. The Precision-Recall curves in Fig. 6 are plotted by varying this threshold value.

TABLE III: Average COVID-19 detection performance results (%) computed from the 5 folds over the test (unseen) set with three network models (U-Net, UNet++, and DLA), four encoder architectures (CheXNet, DenseNet-121, Inception-v3, and ResNet-50), and two weight configurations, where the initialized encoder layers are set to the frozen or not-frozen state. The reported metrics are Sensitivity, Specificity, Precision, F1-Score, F2-Score, and Accuracy with 95% CIs. [Table values not recoverable.]
2) COVID-19 Detection:
The performances of the segmentation models for COVID-19 detection are presented in Table III. All the models are evaluated by a stratified 5-fold cross-validation scheme, and the table shows the averaged results over these folds. The most crucial metric here is the sensitivity, since missing any patient with COVID-19 is critical. In fact, the results indicate the robustness of the model, as the proposed approach achieves a sensitivity of 98.37%. Additionally, the proposed approach achieves an elegant specificity of 99.16%, indicating a significantly low false alarm rate.

Fig. 6: The Precision-Recall curves of the three leading models, all with the not-frozen encoder layers setting.

TABLE IV: Cumulative confusion matrices of COVID-19 detection by the best performing U-Net and UNet++ models with the DenseNet-121 encoder. [Normal-class counts not recoverable.]

(a) U-Net (DenseNet-121)
Ground Truth \ Predicted | Normal | COVID-19
Normal                   |   -    |    -
COVID-19                 |   48   |  2903

(b) UNet++ (DenseNet-121)
Ground Truth \ Predicted | Normal | COVID-19
Normal                   |   -    |    -
COVID-19                 |  103   |  2848
It can be observed from Table III that the DenseNet-121 encoder with the not-frozen encoder layer setting gives the most promising results. The confusion matrices, accumulated over each fold's test set, are presented in Table IV. The highest sensitivity in COVID-19 detection is achieved by the U-Net DenseNet-121 model (Table IVa), which misses only 48 COVID-19 patients out of 2951. On the other hand, the highest specificity is achieved by the UNet++ DenseNet-121 model (Table IVb), which misclassifies only a minor portion of the normal class samples.

Fig. 7: Several CXR images with their corresponding ground-truth masks. The activation maps extracted from the classification models are presented in the middle block, and the last block shows the infection maps generated by the segmentation models. It is evident that the infection maps yield a superior localization of the COVID-19 infection compared to the activation maps.

C. Infection vs Activation Maps
Several studies [31]–[33] propose to localize COVID-19 from CXRs by extracting activation maps from deep classification models trained for COVID-19 detection. Despite the simplicity of the idea, this approach has many limitations. First of all, without any infected region segmentation ground-truth masks, the network can only produce a rough localization, and the extracted activation maps may entirely fail to localize the COVID-19 infection.

In this study, we check the reliability of our proposed COVID-19 detection approach by comparing it with DL models trained for the classification task. To this end, we compare the infection maps and activation maps of CXR images, which are generated from the segmentation and classification networks, respectively. We have therefore trained the encoder structures of the segmentation networks, i.e., CheXNet, DenseNet-121, Inception-v3, and ResNet-50, to perform the COVID-19 classification task in a stratified 5-fold cross-validation scheme. We extract activation maps from these trained models with the Gradient-weighted Class Activation Mapping (Grad-CAM) approach proposed in [62]; a minimal implementation sketch is given at the end of this subsection. The Grad-CAM localization $L^{c}_{\text{Grad-CAM}} \in \mathbb{R}^{h \times w}$ of height $h$ and width $w$ for class $c$ is calculated from the gradient of the class score $m^{c}$ (before the softmax) with respect to the feature maps $A^{k}$ of a convolutional layer, i.e., $\frac{\partial m^{c}}{\partial A^{k}}$. The gradients are passed through global average pooling during back-propagation:

$$\alpha^{c}_{k} = \frac{1}{Z}\sum_{i}\sum_{j} \frac{\partial m^{c}}{\partial A^{k}_{ij}} \quad (14)$$

where $\alpha^{c}_{k}$ is the weight indicating the importance of feature map $k$ of $A$ for a target class $c$, and $Z$ is the number of spatial locations. Then, a linear combination followed by a ReLU yields the Grad-CAM:

$$L^{c}_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_{k} \alpha^{c}_{k} A^{k}\Big). \quad (15)$$

Despite the elegant performance of the underlying classifiers, the activation maps extracted from deep classification networks are not suitable for localizing the COVID-19 infection, as depicted in Fig. 7. In fact, the infections found by the activation maps are highly irrelevant, indicating false locations outside of the lung areas. On the other hand, infection maps can generate a highly accurate localization with an elegant severity grading of the COVID-19 infection. The proposed infection maps can conveniently be used by medical experts for an enhanced assessment of the disease. A real-time implementation of the infection maps will obviously speed up the detection process and can also monitor the progression of the COVID-19 infection in the lungs.
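A compact TensorFlow sketch of Eqs. (14) and (15) follows; the target layer name, input batching, and final normalization are our assumptions.

```python
import tensorflow as tf

def grad_cam(model, image, layer_name, class_idx):
    """Grad-CAM [62]: gradient-weighted combination of the feature maps A^k."""
    grad_model = tf.keras.Model(
        model.input, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        feature_maps, predictions = grad_model(image[None, ...])
        score = predictions[:, class_idx]        # the class score m^c
    grads = tape.gradient(score, feature_maps)   # dm^c / dA^k
    alpha = tf.reduce_mean(grads, axis=(1, 2))   # Eq. (14): global average pooling
    cam = tf.nn.relu(tf.reduce_sum(
        alpha[:, None, None, :] * feature_maps, axis=-1))  # Eq. (15)
    cam = cam[0].numpy()
    return cam / (cam.max() + 1e-8)              # normalize to [0, 1] for display
```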
D. Computational Complexity Analysis

In this section, we present the computational times of the networks and their numbers of trainable and non-trainable parameters. Table V shows the elapsed time in milliseconds (ms) during the inference step for each network used in the experiments; the results represent the running time per sample. It can be observed that the U-Net models are the fastest due to their shallower structure; the fastest network is U-Net Inception-v3 with frozen encoder layers. On the other hand, the UNet++ structure is the slowest, since it has the largest number of trainable parameters; the most computationally demanding model is UNet++ ResNet-50 with frozen encoder layers. We therefore conclude that all models can be used in real-time clinical applications.

TABLE V: The number of trainable and non-trainable parameters of the models with their inference time (ms) per sample, for the U-Net, UNet++, and DLA models with CheXNet, DenseNet-121, Inception-v3, and ResNet-50 encoders, where the initialized encoder layers are set to the frozen or not-frozen state. [Table values not recoverable.]
IV. CONCLUSIONS

The immediate and accurate detection of the highly infectious COVID-19 plays a vital role in preventing the spread of the virus. In this study, we used CXR images, since X-ray imaging is cheaper, more easily accessible, and faster than the commonly used alternatives such as RT-PCR and CT. As a major contribution, the largest CXR dataset, QaTa-COVSeg, which consists of 2951 COVID-19 and 12544 normal images, has been compiled and will be shared publicly as a benchmark dataset. Moreover, for the first time in the literature, we release the ground-truth segmentation masks of the infected regions along with the introduced benchmark QaTa-COVSeg. Furthermore, we proposed a human-machine collaborative approach, which can be used whenever fast and accurate ground-truth annotation is desired but manual segmentation is slow, costly, and subjective. Finally, we proposed a joint approach for COVID-19 infection map generation and detection using state-of-the-art segmentation models. Our extensive experiments on QaTa-COVSeg show that a reliable COVID-19 diagnosis can be achieved by generating infection maps, which can locate the infection in the lungs with an F1-score of 85.81%. Moreover, the proposed joint approach achieves an elegant COVID-19 detection performance with 98.37% sensitivity and 99.16% specificity. The most important aspect of this study is that the generated infection maps can be valuable from a medical perspective, as they can be used for a better and more objective COVID-19 assessment. It is clear that, when compared with the activation maps extracted from deep models, infection maps are highly superior and reliable mappings of the COVID-19 infection.

REFERENCES

[1] "Severe outcomes among patients with coronavirus disease 2019 (COVID-19) - United States, February 12-March 16, 2020," MMWR Morb. Mortal. Wkly. Rep., vol. 69, pp. 343-346, 2020. DOI: http://dx.doi.org/10.15585/mmwr.mm6912e2.
[2] World Health Organization, "Coronavirus disease 2019 (COVID-19): situation report, 88," 2020.
[3] C. Sohrabi, Z. Alsafi, N. O'Neill, M. Khan, A. Kerwan, A. Al-Jabir, C. Iosifidis, and R. Agha, "World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19)," International Journal of Surgery, 2020.
[4] T. Singhal, "A review of coronavirus disease-2019 (COVID-19)," The Indian Journal of Pediatrics, pp. 1–6, 2020.
[5] P. Kakodkar, N. Kaka, and M. Baig, "A comprehensive literature review on the clinical presentation, and management of the pandemic coronavirus disease 2019 (COVID-19)," Cureus, vol. 12, no. 4, 2020.
[6] Y. Li, L. Yao, J. Li, L. Chen, Y. Song, Z. Cai, and C. Yang, "Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19," Journal of Medical Virology, 2020.
[7] A. Tahamtan and A. Ardebili, "Real-time RT-PCR in COVID-19 detection: issues affecting the results," Expert Review of Molecular Diagnostics, vol. 20, no. 5, pp. 453–454, 2020.
[8] J. Xia, J. Tong, M. Liu, Y. Shen, and D. Guo, "Evaluation of coronavirus in tears and conjunctival secretions of patients with SARS-CoV-2 infection," Journal of Medical Virology, vol. 92, no. 6, pp. 589–594, 2020.
[9] A. T. Xiao, Y. X. Tong, and S. Zhang, "False-negative of RT-PCR and prolonged nucleic acid conversion in COVID-19: rather than recurrence," Journal of Medical Virology, 2020.
[10] Y. Yang, M. Yang, C. Shen, F. Wang, J. Yuan, J. Li, M. Zhang, Z. Wang, L. Xing, J. Wei et al., "Laboratory diagnosis and monitoring the viral shedding of 2019-nCoV infections," MedRxiv, 2020.
[11] World Health Organization, "Laboratory testing for coronavirus disease 2019 (COVID-19) in suspected human cases: interim guidance, 2 March 2020," World Health Organization, Tech. Rep., 2020.
[12] S. Salehi, A. Abedi, S. Balakrishnan, and A. Gholamrezanezhad, "Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients," American Journal of Roentgenology, pp. 1–7, 2020.
[13] Y. Fang, H. Zhang, J. Xie, M. Lin, L. Ying, P. Pang, and W. Ji, "Sensitivity of chest CT for COVID-19: comparison to RT-PCR," Radiology, p. 200432, 2020.
[14] T. Ai, Z. Yang, H. Hou, C. Zhan, C. Chen, W. Lv, Q. Tao, Z. Sun, and L. Xia, "Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases," Radiology, p. 200642, 2020.
[15] A. Bernheim, X. Mei, M. Huang, Y. Yang, Z. A. Fayad, N. Zhang, K. Diao, B. Lin, X. Zhu, K. Li et al., "Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection," Radiology, p. 200463, 2020.
[16] Y. Li and L. Xia, "Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and management," American Journal of Roentgenology, vol. 214, no. 6, pp. 1280–1286, 2020.
[17] A. Narin, C. Kaya, and Z. Pamuk, "Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks," arXiv preprint arXiv:2003.10849, 2020.
[18] D. J. Brenner and E. J. Hall, "Computed tomography—an increasing source of radiation exposure," The New England Journal of Medicine, vol. 357, no. 22, pp. 2277–2284, 2007.
[19] G. D. Rubin, C. J. Ryerson, L. B. Haramati, N. Sverzellati, J. P. Kanne, S. Raoof, N. W. Schluger, A. Volpi, J.-J. Yim, I. B. Martin et al., "The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society," Chest, vol. 158, no. 1, pp. 106–116, 2020.
[20] F. Shi, J. Wang, J. Shi, Z. Wu, Q. Wang, Z. Tang, K. He, Y. Shi, and D. Shen, "Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19," IEEE Reviews in Biomedical Engineering, 2020.
[21] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir, Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. Al-Emadi et al., "Can AI help in screening viral and COVID-19 pneumonia?" arXiv preprint arXiv:2003.13145, 2020.
[22] I. D. Apostolopoulos and T. A. Mpesiana, "Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks," Physical and Engineering Sciences in Medicine, p. 1, 2020.
[23] L. O. Hall, R. Paul, D. B. Goldgof, and G. M. Goldgof, "Finding COVID-19 from chest X-rays using deep learning on a small dataset," arXiv preprint arXiv:2004.02060, 2020.
[24] L. Wang and A. Wong, "COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images," arXiv preprint arXiv:2003.09871, 2020.
[25] P. K. Sethy and S. K. Behera, "Detection of coronavirus disease (COVID-19) based on deep features," Preprints, 2020030300, 2020.
[26] J. Zhang, Y. Xie, Y. Li, C. Shen, and Y. Xia, "COVID-19 screening on chest X-ray images using deep learning based anomaly detection," arXiv preprint arXiv:2003.12338, 2020.
[27] P. Afshar, S. Heidarian, F. Naderkhani, A. Oikonomou, K. N. Plataniotis, and A. Mohammadi, "COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images," arXiv preprint arXiv:2004.02696, 2020.
[28] M. Yamac, M. Ahishali, A. Degerli, S. Kiranyaz, M. E. Chowdhury, and M. Gabbouj, "Convolutional sparse support estimator based COVID-19 recognition from X-ray images," arXiv preprint arXiv:2005.04014, 2020.
[29] M. Ahishali, A. Degerli, M. Yamac, S. Kiranyaz, M. E. Chowdhury, K. Hameed, T. Hamid, R. Mazhar, and M. Gabbouj, "Advance warning methodologies for COVID-19 using chest X-ray images," arXiv preprint arXiv:2006.05332, 2020.
[30] F. Shi, L. Xia, F. Shan, D. Wu, Y. Wei, H. Yuan, H. Jiang, Y. Gao, H. Sui, and D. Shen, "Large-scale screening of COVID-19 from community acquired pneumonia using infection size-aware classification," arXiv preprint arXiv:2003.09860, 2020.
[31] C.-F. Yeh, H.-T. Cheng, A. Wei, K.-C. Liu, M.-C. Ko, P.-C. Kuo, R.-J. Chen, P.-C. Lee, J.-H. Chuang, C.-M. Chen et al., "A cascaded learning strategy for robust COVID-19 pneumonia chest X-ray screening," arXiv preprint arXiv:2004.12786, 2020.
[32] Y. Oh, S. Park, and J. C. Ye, "Deep learning COVID-19 features on CXR using limited training data sets," IEEE Transactions on Medical Imaging, 2020.
[33] T. Ozturk, M. Talo, E. A. Yildirim, U. B. Baloglu, O. Yildirim, and U. R. Acharya, "Automated detection of COVID-19 cases using deep neural networks with X-ray images," Computers in Biology and Medicine, p. 103792, 2020.
[34] M. Z. Alom, M. Rahman, M. S. Nasrin, T. M. Taha, and V. K. Asari, "COVID MTNet: COVID-19 detection with multi-task deep learning approaches," arXiv preprint arXiv:2004.03747, 2020.
[35] A. Haghanifar, M. M. Majdabadi, and S. Ko, "COVID-CXNet: Detecting COVID-19 in frontal chest X-ray images using deep learning," arXiv preprint arXiv:2006.13807, 2020.
[36] F. Shan, Y. Gao, J. Wang, W. Shi, N. Shi, M. Han, Z. Xue, and Y. Shi, "Lung infection quantification of COVID-19 in CT images with deep learning," arXiv preprint arXiv:2003.04655, 2020.
[37] K. Zhang, X. Liu, J. Shen, Z. Li, Y. Sang, X. Wu, Y. Zha, W. Liang, C. Wang, K. Wang et al., "Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography," Cell, 2020.
[38] Y. Qiu, Y. Liu, and J. Xu, "MiniSeg: An extremely minimum network for efficient COVID-19 segmentation," arXiv preprint arXiv:2004.09750, 2020.
[39] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[40] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "UNet++: A nested U-Net architecture for medical image segmentation," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2018, pp. 3–11.
[41] F. Yu, D. Wang, E. Shelhamer, and T. Darrell, "Deep layer aggregation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2403–2412.
[42] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya et al., "CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning," arXiv preprint arXiv:1711.05225, 2017.
[43] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[44] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[45] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[46] Radiological Society of North America, "RSNA pneumonia detection challenge," Kaggle, 2018. [Online]. Available: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge
[47] A. Bustos, A. Pertusa, J.-M. Salinas, and M. de la Iglesia-Vayá, "PadChest: A large chest x-ray image dataset with multi-label annotated reports," Medical Image Analysis, p. 101797, 2020.
[48] D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
[49] D. Demner-Fushman, M. D. Kohli, M. B. Rosenman, S. E. Shooshan, L. Rodriguez, S. Antani, G. R. Thoma, and C. J. McDonald, "Preparing a collection of radiology examinations for distribution and retrieval," Journal of the American Medical Informatics Association, vol. 23, no. 2, pp. 304–310, 2016.
[50] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wáng, P.-X. Lu, and G. Thoma, "Two public chest x-ray datasets for computer-aided screening of pulmonary diseases," Quantitative Imaging in Medicine and Surgery, vol. 4, no. 6, p. 475, 2014.
[51] M. d. l. I. Vayá, J. M. Saborit, J. A. Montell, A. Pertusa, A. Bustos, M. Cazorla, J. Galant, X. Barber, D. Orozco-Beltrán, F. Garcia et al., "BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients," arXiv preprint arXiv:2006.01174, 2020.
[52] Institute for Diagnostic and Interventional Radiology, Hannover Medical School, "COVID-19 image repository," 2020.
[53] J. P. Cohen, P. Morrison, and L. Dao, "COVID-19 image data collection," arXiv preprint arXiv:2003.11597, 2020.
[54]–[56] SIRM, GitHub, and Kaggle COVID-19 CXR repositories. [Full entries not recoverable.]
[57] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2097–2106.
[58] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
[59] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 565–571.
[60] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467, 2016.
[61] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[62] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.