ECOVNet: An Ensemble of Deep Convolutional Neural Networks Based on EfficientNet to Detect COVID-19 From Chest X-rays
Nihad Karim Chowdhury, Muhammad Ashad Kabir, Md. Muhtadir Rahman, Noortaz Rezoana
A PREPRINT
Nihad Karim Chowdhury
Department of Computer Science and Engineering, University of Chittagong, Bangladesh
[email protected]

Muhammad Ashad Kabir
School of Computing and Mathematics, Charles Sturt University, NSW, Australia
[email protected]

Md. Muhtadir Rahman
Department of Computer Science and Engineering, University of Chittagong, Bangladesh
[email protected]

Noortaz Rezoana
Department of Computer Science and Engineering, University of Chittagong, Bangladesh
[email protected]
October 19, 2020

ABSTRACT
The perilous COVID-19 disease has put the world in a state of emergency by overwhelming the global healthcare system. Consequently, people's lives are at greater risk due to high mortality, as there is no effective vaccine to prevent COVID-19 infection. Researchers all over the world are working to develop such vaccines, while at the same time striving for effective yet pragmatic screening technologies, such as medical imaging. One of the preeminent screening techniques for combating this disease is the chest X-ray, which has a long track record for lung diseases and can provide clinical insights. This paper proposes an ensemble of deep convolutional neural networks (CNNs) based on EfficientNet, named ECOVNet, to detect COVID-19 using a large chest X-ray data set. First, the open-access chest X-ray collection is augmented, and then ImageNet pre-trained weights are transferred to EfficientNet with customized fine-tuned top layers that are trained, followed by an ensemble of model snapshots to classify chest X-rays as COVID-19, normal, or pneumonia. The predictions of the model snapshots, which are created during a single training run, are combined through two ensemble strategies, i.e., hard ensemble and soft ensemble, to improve classification performance and generalization in the task of classifying chest X-rays. In addition, a visualization technique is incorporated into the proposed method to highlight areas that distinguish the categories, thereby enhancing the understanding of the primal components of COVID-19 infection. Empirical evaluations show that the ensemble strategy (especially the soft ensemble) can significantly improve prediction performance in terms of accuracy, as well as the precision and recall of detecting COVID-19.
We believe that ECOVNet can strengthen the resistance to COVID-19 disease and, more broadly, propel the field towards a fully automated and efficacious COVID-19 detection system.

Keywords: COVID-19 · Chest X-ray · Convolutional Neural Network · EfficientNet · Ensemble · Hard Ensemble · Soft Ensemble
1 Introduction

Coronavirus disease 2019 (COVID-19) is a contagious disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The disease was first detected in Wuhan City, Hubei Province, China in December 2019, was linked to contact with a seafood wholesale market, and quickly spread to all parts of the world [1]. The World Health Organization (WHO) declared the COVID-19 outbreak a pandemic on March 11, 2020. As of September 20, 2020, this perilous virus had not only overwhelmed the world but also affected millions of lives, with millions of confirmed COVID-19 cases and hundreds of thousands of confirmed deaths [2]. To limit the spread of this infection, affected countries have pursued many strategies, such as encouraging people to maintain social distancing and lead hygienic lives, enhancing the infection screening system through multi-functional testing, and seeking mass vaccination to curb the pandemic ahead of time. The reverse transcriptase-polymerase chain reaction (RT-PCR) test is the standard diagnostic method; however, it has certain limitations: accurate detection of suspect patients is delayed, since the testing procedures impose strict requirements on clinical laboratory conditions [3], and false-negative results may have a serious impact on the prevention and control of the disease [4]. To make up for the shortcomings of RT-PCR testing, researchers around the world are seeking a fast and reliable diagnostic method to detect COVID-19 infection. The WHO and Wuhan University Zhongnan Hospital have respectively issued quick guides [5, 6], suggesting that, in addition to detecting clinical symptoms, chest imaging can be used to evaluate the disease in the diagnosis and treatment of COVID-19. In [7], the authors contributed a prolific guideline for medical practitioners on using chest radiography and computed tomography (CT) to screen and assess the disease progression of COVID-19 cases.
Although CT scans have higher sensitivity, they also have some drawbacks, such as high cost and the need for high doses of radiation during screening, which exposes pregnant women and children to greater radiation risks [8]. On the other hand, diagnosis based on chest X-rays appears to be a propitious solution for COVID-19 detection and treatment. In [9], Ng et al. remarked that the pulmonary manifestation of COVID-19 infection is immensely delineated by chest X-ray images. Moreover, in the case of artificial intelligence (AI)-based disease recognition systems, medical practitioners have already emphasized chest X-rays to explore potential symptoms of COVID-19 infection, such as opaque patterns in the lungs [10].

The purpose of this study is to improve the accuracy of COVID-19 detection from chest X-ray images. In this context, we contemplate a CNN-based architecture, since CNNs are illustrious for their top-notch recognition performance in image classification and detection. For medical image analysis, high detection accuracy along with crucial findings is a top aspiration, and in recent years CNN-based architectures have comprehensively captured the critical findings related to medical imaging, which is why we constructed the proposed architecture with a CNN. To achieve this purpose, this paper presents a novel CNN-based architecture called ECOVNet, exploiting the cutting-edge EfficientNet [11] family of CNN models together with ensemble strategies. The pipeline of the proposed architecture commences with data augmentation, then optimizes and fine-tunes the pre-trained EfficientNet models, creating the respective model snapshots. After that, the generated model snapshots are integrated into an ensemble, i.e., soft voting or hard voting, to make predictions. The motivation for using EfficientNets is that they are known for their high accuracy, while being smaller and faster than the best existing CNN architectures.
Moreover, an ensemble technique has proven effective in prediction, since it produces a lower error rate than the prediction of a single model. Owing to the limited number of COVID-19 images currently available, diagnosing COVID-19 infection is more challenging, and thus a visual explanation approach is applied for further analysis. In this regard, we use a gradient-based class activation mapping algorithm, i.e., Grad-CAM [12], to provide explanations of the predictions and to identify the relevant features associated with COVID-19 infection. The key contributions of this paper are as follows:

• We propose a novel CNN-based architecture that includes front-end pre-trained EfficientNets for feature extraction and model snapshots to detect COVID-19 from chest X-rays.
• On the assumption that the decisions of multiple radiologists should be considered in the final prediction, we propose an ensemble in the proposed architecture to make predictions, thus making a credible and fair evaluation of the system.
• We visualize a class activation map through Grad-CAM to explain the predictions and to identify the critical regions in the chest X-ray.
• Finally, we compare our architecture with state-of-the-art architectures through empirical observations to highlight its effectiveness in detecting COVID-19.

The remainder of the paper is arranged as follows: Section 2 discusses related work. Section 3 explains the details of the data set and the proposed network architecture, as well as its adjustments for the detection of COVID-19 infection. The results of our experimental evaluation are presented in Section 4. Finally, Section 5 concludes the paper and highlights future work.
2 Related Work

Due to the need to identify COVID-19 infections faster, applications of CNN-based AI systems are booming, as they can speed up the analysis of various medical images. Chest X-ray screening is a state-of-the-art technology with historical prospects in image diagnosis systems for detecting pneumonia [13]. In addition, both pneumonia and COVID-19 share certain infection characteristics (such as the occurrence of severe lung infections). Hence, researchers around the world have been inspired to explore the ability of chest X-rays, through various feature extraction methods, especially CNN-based approaches, to detect COVID-19, thus playing a role when the current healthcare system is exhausted by the pandemic.

An in-depth survey of the application of CNN technology to COVID-19 detection and automatic lung segmentation is given in [14], with a focus on analysis using X-ray and computed tomography (CT) images. Halgurd et al. [15] tested a modified CNN model as well as a modified pre-trained AlexNet [16] on their own chest X-ray and CT scan data set, reporting accuracy of up to 98% with the modified pre-trained model and a high accuracy with the modified CNN. Narin et al. [17] achieved the highest accuracy of 98% by using three pre-trained models with ImageNet [18] weights (ResNet50 [19], Inception v3 [20], and Inception-ResNet v2 [21]), considering two types of images, i.e., COVID-19 and normal. A completely new CNN framework named COVID-Net, together with a large chest X-ray benchmark data set, COVIDx, was introduced by Wang et al. [22]; the proposed COVID-Net obtained a strong test accuracy, and the authors studied how COVID-Net makes predictions using an interpretability method. In [23], state-of-the-art CNN architectures (VGG19 [24], MobileNetV2 [25], Inception [20], Xception [26], Inception-ResNet v2 [21]) were trained using transfer learning from ImageNet, and different neural network architectures were used on top of each.
The results produced by the fine-tuned models demonstrated the proof-of-principle of using CNNs with transfer learning to extract radiological features. The authors of [27] prepared a data set of 5,000 chest X-rays from publicly available data sets, and a subset of their benchmark was utilized to develop a model by fine-tuning four popular pre-trained CNNs (ResNet18 [19], ResNet50, SqueezeNet [28], and DenseNet121 [29]). The proposed model was evaluated on the remaining images and produced promising results in terms of sensitivity and specificity. Eduardo et al. [30] proposed a new deep learning framework that extends the EfficientNet [11] family, which is well known for its excellent prediction performance with fewer computational steps; their experimental evaluation showed noteworthy classification performance, especially on COVID-19 cases. Next, Farooq et al. [31] proposed a method called COVID-ResNet, which uses a three-step technique, including gradually adjusting the image size, automatic learning rate selection, and fine-tuning the pre-trained ResNet50 architecture to improve model performance. A CNN model called DarkCovidNet [32] was proposed for automatic detection of COVID-19 from chest X-ray images, carrying out two types of classification: binary (COVID and no findings) and multi-class (COVID, no findings, and pneumonia). The authors also provided an intuitive explanation through a heat map, so it can assist radiologists in locating the affected areas on chest X-rays. In another study, Ucar et al.
[33] proposed a fine-tuned lightweight SqueezeNet, in which the fine-tuned hyper-parameters were obtained through Bayesian optimization; the performance of the proposed network was superior to some existing CNNs for detecting COVID-19 cases. Another work [34] proposed an explainable CNN-based method, named DeepCOVIDExplainer, built on a neural ensemble technique followed by highlighting of class-discriminating regions for automatic detection of COVID-19 from chest X-ray images. A study by Afshar et al. [35] contributed an efficacious COVID-19 detection system using a Capsule Network (CapsNet) [36] based CNN architecture, and the authors claimed efficacy not only in statistical performance but also in the smaller number of trainable parameters compared to its counterparts. Asif et al. [37] proposed a model named CoroNet that used the Xception architecture pre-trained on the ImageNet data set and trained on their benchmark created from two publicly available data sets, carrying out classification performance measurements for both three-class and four-class settings with high overall accuracy in each. In [38], Mohammad et al. proposed a CNN-based model called CovXNet, which uses depthwise dilated convolutions. The model was first trained with non-COVID pneumonia images, and the acquired learning was then transferred, with some additional fine-tuning layers, by training again on a smaller number of chest X-rays of COVID-19 and other pneumonia cases. As features are extracted from different resolutions of X-rays, a stacking algorithm is used in the prediction process, and CovXNet achieved high multi-class classification accuracy. In another work, Haghanifa et al.
[39] prepared a new benchmark by amassing the largest public data set of COVID-19 chest X-ray images from diverse sources and developed a fine-tuned model based on DenseNet121 using CheXNet [40] weights, providing statistical performance along with visual markers to efficaciously localize the critical regions of COVID-19 cases. Another CNN-based modular architecture, named PDCOVIDNet (parallel dilated convolution-based COVID-19 detection network), was proposed by Nihad et al. [41]; it consists of several blocks (such as a parallel stack of multi-layer filter blocks in cascade with classification and visualization blocks) in the workflow of COVID-19 detection from chest X-ray images. The authors demonstrated the effectiveness of the model compared with some well-known CNN architectures and reported high precision and recall in detecting COVID-19.

Table 1: Overview of CNN based architectures for detecting COVID-19 from chest X-rays

Method | Data Source | Architecture | Pre-trained Weights | Ensemble | Visualization
Halgurd et al. [15] | Cohen et al. [42], BSTI | AlexNet | ImageNet | No | No
Narin et al. [17] | Cohen et al. [42], P. Mooney [43] | ResNet50, Inception v3, Inception-ResNet v2 | ImageNet | No | No
Wang et al. [22] | COVIDx [22] | COVID-Net (custom CNN) | ImageNet | No | GSInquire [44]
Apostolopoulos et al. [23] | Cohen et al. [42], Kermany et al. [45], RSNA [46], SIRM [47] | Xception, Inception-ResNet v2, VGG19, MobileNet v2, Inception | ImageNet | No | No
Shervin et al. [27] | COVID-Xray-5k dataset [27] | ResNet18, ResNet50, SqueezeNet, DenseNet121 | ImageNet | No | Heat map, radiologist-marked
Eduardo et al. [30] | COVIDx | EfficientNet | ImageNet | No | Heat map
Farooq et al. [31] | COVIDx | ResNet50 | ImageNet | No | No
Ozturk et al. [32] | Cohen et al. [42], NIH Chest X-ray dataset [48] | DarkCovidNet (custom CNN) | No | No | Grad-CAM, radiologist-marked
Ucar et al. [33] | COVIDx | SqueezeNet | ImageNet | No | Heat map
Karim et al. [34] | COVIDx | VGG16, VGG19, ResNet18, ResNet34, DenseNet161, DenseNet201 | ImageNet | Yes | Grad-CAM, Grad-CAM++ [49], LRP [50]
Afshar et al. [35] | COVIDx | Capsule Networks (CapsNets) | NIH Chest X-ray dataset | No | No
Asif et al. [37] | Cohen et al. [42], P. Mooney [43] | Xception | ImageNet | No | No
Mohammad et al. [38] | Mendeley Data, V2 [51], 305 COVID-19 images | CovXNet (custom CNN) | Non-COVID X-rays | Yes | Grad-CAM
Haghanifa et al. [39] | COVID-19 Data Collection (https://github.com/armiro/COVID-CXNet) | DenseNet121 | ImageNet, CheXNet [13] | No | Grad-CAM, LIME [52]
Nihad et al. [41] | COVID-19 Radiography Database [53] | PDCOVIDNet (custom CNN) | No | No | Grad-CAM, Grad-CAM++

It can be seen from the literature review that most methods make prediction decisions based on the output of a single model rather than an ensemble; only a few methods [34, 38] rely on an ensemble. As we have seen, an ensemble brings a benefit: it can reduce prediction errors, thus making the model more versatile. One previous study used an ensemble of heterogeneous models, i.e., VGG19, ResNet18, and DenseNet161 [34], but that approach has some limitations: each model requires a separate training session, and each individual model carries the burden of training many parameters. Another method [38] performs an ensemble on a single model but uses various image resolutions; for each image resolution it creates a separate model and stacks them for prediction, which incurs computational overhead. Apart from ensembles, an advanced custom CNN architecture, COVID-Net [22], was implemented and tested on a large COVID-19 benchmark, but due to its large number of parameters, the computational overhead of this model is high. To address the aforementioned problems, we use a lightweight but effective model, EfficientNet, since it is 8.4 times smaller and 6.1 times faster than the best existing CNN [11]. Also, to mitigate the computational cost of training multiple deep learning models for ensemble prediction, we force large changes in model weights through a cyclic learning rate, creating model snapshots within the same training run, and then apply an ensemble to make the proposed architecture more robust.
3 Methodology

In this section, we briefly discuss our approach. First, we present the benchmark data set and the data augmentation strategy used in the proposed architecture. Next, we outline the proposed ECOVNet architecture, including network construction using a pre-trained EfficientNet, the training method, and the model ensemble strategies. Finally, to make disease detection more interpretable, we integrate decision visualizations to highlight pivotal facts with visual markers.
In this sub-section, we concisely introduce the benchmark data set, named COVIDx [22], used in our experiments. To the best of our knowledge, this data set is one of the largest open-access benchmarks in terms of the number of COVID-19 infection cases, with a total of 14,914 images for training and 1,579 images for testing across the three categories COVID-19, normal, and pneumonia. Figure 1 shows sample images from the benchmark data set. Table 2 depicts the distribution of images in the training and testing sets. To generate COVIDx, the authors [22] used five different publicly accessible data repositories (accessed on July 17, 2020):
• From the COVID-19 Image Data Collection [42], they gathered non-COVID-19 pneumonia and COVID-19 cases.
• The Figure 1 COVID-19 Chest X-ray Dataset Initiative [54] was utilized for COVID-19 cases.
• The ActualMed COVID-19 Chest X-ray Dataset Initiative [55] was selected for COVID-19 cases.
• The Radiological Society of North America (RSNA) Pneumonia Detection Challenge dataset [46] was employed for normal and non-COVID-19 pneumonia cases.
• The COVID-19 radiography database [53] was used for COVID-19 cases.
Figure 1: Sample chest X-ray images of each class in the benchmark dataset [22]

Table 2: Distribution of images in training and testing sets [22]

Category | COVID-19 | Normal | Pneumonia | Total
Training | 489 | 7,966 | 5,459 | 14,914
Testing | 100 | 885 | 594 | 1,579
Data augmentation is a process performed at training time to expand the training set. As long as the semantic information of an image is preserved, transformations of the images in the training data set can be used for augmentation. With data augmentation, the performance of a model can be improved by mitigating overfitting, thereby greatly improving generalization. Although CNN models have properties such as partial translation invariance, augmentation strategies, e.g., translated images, can often considerably enhance generalization capabilities [56]. Data augmentation strategies provide various alternatives, each of which has the advantage of interpreting images in multiple ways to expose important features, thereby improving the performance of the model. We considered the following transformations for augmentation during the training process: horizontal flip, rotation, shear, and zoom.
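As an illustrative sketch (not the authors' exact pipeline; in practice an image-processing library would supply the shear, zoom, and small-angle rotation transforms named above), the flip and rotation transforms can be written directly in numpy:

```python
import numpy as np

def hflip(img):
    # horizontal flip: mirror the image along its width axis
    return img[:, ::-1]

def random_augment(img, rng):
    """Randomly compose label-preserving transforms for one training image.
    Coarse 90-degree rotations stand in here for the small-angle rotations
    a full augmentation library would provide."""
    out = img
    if rng.random() < 0.5:
        out = hflip(out)
    return np.rot90(out, rng.integers(0, 4))
```

Applied on the fly, each epoch sees a slightly different version of every chest X-ray, which is what mitigates overfitting without collecting new images.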
In this section, we briefly describe the proposed ECOVNet architecture. After augmenting the COVIDx data set, we use a pre-trained EfficientNet as a feature extractor. This step ensures that the pre-trained EfficientNet can extract and learn useful chest X-ray features and generalize well. Indeed, EfficientNets are a family of models obtained from a base model, i.e., EfficientNet-B0. The proposed architecture is demonstrated with EfficientNet-B0; during the experimental evaluation, we also considered the other models. The output features of the pre-trained EfficientNet are fed to our proposed custom top layers through two fully connected layers, each integrated with batch normalization, activation, and dropout. We generate several snapshots in a single training session and then combine their predictions into an ensemble prediction. At the same time, a visualization approach, which can qualitatively analyze the relationship between input examples and model predictions, is incorporated into the final part of the proposed model. Figure 2 shows a graphical presentation of the proposed ECOVNet architecture using a pre-trained EfficientNet.
Figure 2: Graphical representation of the proposed ECOVNet architecture (pre-processing with augmentation, a pre-trained EfficientNet-B0 base with customized top layers, N snapshot models, and hard/soft ensemble prediction with Grad-CAM visualization over the COVID-19, normal, and pneumonia classes)
EfficientNets are a series of models (namely EfficientNet-B0 to B7) derived from a baseline network (EfficientNet-B0) by scaling it up. The advantages of EfficientNets are two-fold: they not only provide higher accuracy, but also improve the efficiency of the model by reducing parameters and FLOPs (floating-point operations). By adopting a compound scaling method across all dimensions of the network, i.e., width, depth, and resolution, EfficientNets have attracted attention due to their supremacy in prediction performance. Note that width refers to the number of channels in a layer, depth relates to the number of layers in the CNN, and resolution is the size of the input image. The intuition behind compound scaling is that scaling any single dimension of the network (width, depth, or image resolution) can increase accuracy, but for larger models the accuracy gain diminishes. To scale the dimensions of the network systematically, compound scaling uses a compound
coefficient that controls how many more resources are available for model scaling, and the dimensions are scaled by the compound coefficient in the following way [11]:

depth: d = α^φ, width: w = β^φ, resolution: r = γ^φ
s.t. α · β² · γ² ≈ 2, α ≥ 1, β ≥ 1, γ ≥ 1    (1)

where φ is the compound coefficient, and α, β, and γ are the scaling coefficients of each dimension, which can be fixed by a grid search. After determining the scaling coefficients, these coefficients are applied to the baseline network (EfficientNet-B0) to obtain the desired target model size. For instance, in the case of EfficientNet-B0, with φ = 1, a grid search yields the optimal values α = 1.2, β = 1.1, and γ = 1.15 under the constraint α · β² · γ² ≈ 2 [11]. By changing the value of φ in Equation 1, EfficientNet-B0 can be scaled up to obtain EfficientNet-B1 to B7.

The feature extraction part of the EfficientNet-B0 baseline architecture is comprised of several mobile inverted bottleneck convolution (MBConv) [25, 57] blocks with built-in squeeze-and-excitation (SE) [58], batch normalization, and Swish activation [59]. Compared with conventional convolution, EfficientNet's building block, i.e., MBConv, has proven to be more accurate in image classification, while reducing parameters and FLOPs by an order of magnitude. Table 3 shows detailed information on each layer of the EfficientNet-B0 baseline network. EfficientNet-B0 consists of a total of 16 MBConv blocks varying in several aspects, for instance, kernel size, feature map expansion phase, reduction ratio, etc. The complete workflows of the MBConv1,k3×3 and MBConv6,k3×3 blocks are shown in Figure 3. Both MBConv1,k3×3 and MBConv6,k3×3 use depthwise convolution with a 3×3 kernel and stride s. In these two blocks, batch normalization, activation, and convolution with a 1×1 kernel are integrated.
A skip connection and a dropout layer are also incorporated in MBConv6,k3×3, but this is not the case for MBConv1,k3×3. Furthermore, the expanded feature map of MBConv6,k3×3 is six times that of MBConv1,k3×3, and the reduction ratio r in the SE block likewise differs between the two blocks. Note that MBConv6,k5×5 performs the identical operations as MBConv6,k3×3, except that MBConv6,k5×5 applies a 5×5 kernel where MBConv6,k3×3 uses a 3×3 kernel.

Table 3: EfficientNet-B0 baseline network layers outline

Stage | Operator | Resolution | #Channels | #Layers
1 | Conv 3×3 | 224×224 | 32 | 1
2 | MBConv1, k3×3 | 112×112 | 16 | 1
3 | MBConv6, k3×3 | 112×112 | 24 | 2
4 | MBConv6, k5×5 | 56×56 | 40 | 2
5 | MBConv6, k3×3 | 28×28 | 80 | 3
6 | MBConv6, k5×5 | 14×14 | 112 | 3
7 | MBConv6, k5×5 | 14×14 | 192 | 4
8 | MBConv6, k3×3 | 7×7 | 320 | 1
9 | Conv 1×1 & Pooling & FC | 7×7 | 1,280 | 1

Instead of random initialization of network weights, we instantiate ImageNet pre-trained weights in the EfficientNet model, thereby accelerating the training process. Transferring pre-trained ImageNet weights has performed great feats in the field of image analysis, since ImageNet comprises more than 14 million images covering eclectic classes. The rationale for using pre-trained weights is that the imported model already has sufficient knowledge of the broader image domain. As has been manifested in several studies [17, 60], using pre-trained ImageNet weights in state-of-the-art CNN models remains beneficial even when the problem area (namely COVID-19 detection) is considerably distinct from the one in which the original weights were obtained. The optimization process fine-tunes the initial pre-trained weights in the new training phase, so that the pre-trained model can be fitted to a specific problem domain, such as COVID-19 detection.
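To make the compound scaling of Eq. (1) concrete, the following sketch computes the per-dimension multipliers from the grid-searched coefficients (α = 1.2, β = 1.1, γ = 1.15 for EfficientNet-B0 [11]); it is an illustration, not the authors' code:

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # grid-searched for EfficientNet-B0

def compound_scale(phi):
    """Depth, width, and resolution multipliers for compound coefficient
    phi, following Eq. (1): d = alpha^phi, w = beta^phi, r = gamma^phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# phi = 0 recovers the B0 baseline; larger phi scales towards B1...B7
d, w, r = compound_scale(1)
# the constraint alpha * beta^2 * gamma^2 ~= 2 roughly doubles FLOPs per unit phi
assert abs(ALPHA * BETA ** 2 * GAMMA ** 2 - 2.0) < 0.1
```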
Figure 3: The basic building blocks of EfficientNet-B0. All MBConv blocks take height, width, and channels (h, w, c) as input, and C is the output channel of the two blocks. (Note that MBConv = mobile inverted bottleneck convolution, DW Conv = depthwise convolution, SE = squeeze-and-excitation, Conv = convolution)
The final output of the EfficientNet architecture is a globally averaged feature vector followed by a classifier. To perform the classification task, we used a two-layer MLP (usually called fully connected (FC) layers), which processes the EfficientNet features through two neural layers of 512 nodes each. Between the FC layers, we included batch normalization, activation, and dropout layers. Batch normalization greatly accelerates the training of deep networks and increases the stability of neural networks [61]; it makes the optimization landscape smoother, resulting in more predictable and stable gradient behavior, thereby speeding up training [62]. As the activation function in this study, we preferred Swish, which is defined as [59]:

f(x) = x · σ(x)    (2)

where σ(x) = (1 + exp(−x))⁻¹ is the sigmoid function. In comparisons with other activation functions, Swish consistently outperforms others, including the Rectified Linear Unit (ReLU) [63], the most successful and widely used activation function, on deep networks applied to a variety of challenging tasks, e.g., image classification and machine translation. Swish has several characteristics, such as one-sided boundedness at zero, smoothness, and non-monotonicity, which play an important role in its improvements [59]. After the activation operation, we integrated a dropout [64] layer, one of the preeminent regularization methods for reducing overfitting and making better predictions. This layer randomly drops certain FC-layer nodes, removing each selected node along with all its incoming and outgoing weights. The nodes in each layer are dropped with a probability p independent of the other layers, where p can be chosen using either a validation set or a common heuristic (e.g., p = 0.5). In this study, we maintained a fixed dropout rate.
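As a quick illustration of Eq. (2), here is a minimal numpy implementation of Swish (an illustrative sketch, not the deep-learning framework's built-in op):

```python
import numpy as np

def swish(x):
    # Eq. (2): f(x) = x * sigmoid(x) = x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))
```

Swish is exactly zero at the origin, dips slightly below zero for negative inputs (non-monotonicity), and approaches the identity for large positive inputs, which is what distinguishes it from ReLU.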
Next, the classification layer uses the softmax activation function to render the activations from the previous FC layers into class scores, determining the class of the input chest X-ray image as COVID-19, normal, or pneumonia. The softmax activation function is defined in the following way:

s(yᵢ) = e^(yᵢ) / Σⱼ₌₁^C e^(yⱼ)    (3)

where C is the total number of classes. This normalization limits the output sum to 1, so the softmax output s(yᵢ) can be interpreted as the probability that the input belongs to the i-th class. In the training process, we apply the categorical
cross-entropy loss function, which uses the softmax activation in the classification layer, to measure the loss between the true class probability and the predicted class probability. The categorical cross-entropy loss function is defined as:

l = − Σₙ₌₁^N log( e^(y_{i,n}) / Σⱼ₌₁^C e^(y_{j,n}) )    (4)

where N denotes the total number of input samples and C is the total number of classes, that is, C = 3 in our case.

The main concept of building model snapshots is to train one model while continually reducing the learning rate to attain a local minimum, and to save a snapshot of the current model's weights there. The learning rate is then aggressively increased to escape the current local minimum, and this process repeats until the prescribed number of cycles completes. One of the prominent methods of creating model snapshots for a CNN is to collect multiple models during a single training run with cyclic cosine annealing [65]. The cyclic cosine annealing method starts from an initial learning rate, gradually decreases it to a minimum, and then increases it rapidly. The learning rate under cyclic cosine annealing at each epoch is defined as:

α(t) = (α₀ / 2) · ( cos( π · mod(t − 1, ⌈T/M⌉) / ⌈T/M⌉ ) + 1 )    (5)

where α(t) is the learning rate at epoch t, α₀ is the initial learning rate, T is the total number of training epochs, and M is the number of cycles. The weights at the bottom of each cycle are taken as the weights of a snapshot model. The next learning rate cycle starts from these weights, but allows the learning algorithm to converge to a different solution, thereby generating diverse snapshot models. After completing M cycles of training, we obtain M model snapshots s₁, ..., s_M, each of which is utilized in the ensemble prediction. An ensemble of model snapshots is more effective than a structure based on a single model only.
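The schedule of Eq. (5) can be sketched as follows (an illustrative implementation, assuming 1-based epoch indexing):

```python
import math

def snapshot_lr(t, alpha0, T, M):
    """Cyclic cosine annealing (Eq. 5): learning rate at epoch t, given
    the initial rate alpha0, T total training epochs, and M cycles."""
    cycle_len = math.ceil(T / M)
    return alpha0 / 2 * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1)
```

The rate restarts at α₀ at the top of every cycle and decays towards zero at the bottom, so each of the M minima yields one snapshot model.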
Therefore, compared with the prediction of a single model, the ensemble prediction reduces the generalization error, thereby improving the prediction performance. We have experimented with two ensemble strategies, i.e., hard ensemble and soft ensemble, to consolidate the predictions of the snapshot models to classify chest X-ray images as COVID-19, normal, or pneumonia. Both the hard ensemble and the soft ensemble use the softmax outputs of the last m (m ≤ M) models, since these models tend to have the lowest test error. We also apply class weights when obtaining a softmax score before applying the ensemble. Let O_i(x) be the softmax score of the test sample x under the i-th snapshot model. Using the hard ensemble, the prediction of the i-th snapshot model is defined as

H_i = \arg\max_c \, [O_i(x)]_c.   (6)

The final ensemble then aggregates the votes for the classification labels (i.e., COVID-19, normal, and pneumonia) across the snapshot models and predicts the category with the most votes. On the other hand, the soft ensemble averages the predicted class-label probabilities of the last m snapshot models, defined as

S = \frac{1}{m} \sum_{i=0}^{m-1} O_{M-i}(x).   (7)

Finally, the class label with the highest probability is used as the prediction.

Fine-tuned hyper-parameters have a great impact on the performance of the model because they directly govern its training. Moreover, well-tuned parameters help avoid overfitting and yield a generalized model. Since we deal with an imbalanced data set, the proposed architecture could easily run into overfitting. To address this, we use L1 and L2 weight decay regularization in the FC layers. In addition, dropout, another successful regularization technique, is integrated into the proposed architecture, especially in the FC layers with dropout probability p, to suppress overfitting.
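The two combination rules in Eqs. (6) and (7) can be sketched with NumPy as follows (a minimal sketch; the array shapes, function names, and toy scores are illustrative, not the paper's code):

```python
import numpy as np

def hard_ensemble(softmax_scores):
    """softmax_scores: (m, n_classes) array, one row per snapshot model.
    Each snapshot votes for its argmax class (Eq. (6)); majority wins."""
    votes = np.argmax(softmax_scores, axis=1)
    return int(np.bincount(votes, minlength=softmax_scores.shape[1]).argmax())

def soft_ensemble(softmax_scores):
    """Average the class probabilities over the last m snapshots (Eq. (7))
    and pick the most probable class."""
    return int(np.argmax(softmax_scores.mean(axis=0)))

# Toy example with m = 3 snapshots and 3 classes (COVID-19, normal, pneumonia)
scores = np.array([[0.6, 0.3, 0.1],
                   [0.4, 0.5, 0.1],
                   [0.7, 0.2, 0.1]])
print(hard_ensemble(scores), soft_ensemble(scores))  # both pick class 0 here
```

Note that the two rules can disagree: a snapshot with very confident probabilities pulls the soft ensemble towards its class, while the hard ensemble counts it as a single vote.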
In the experiments on the proposed architecture, we have used the Adam [66] optimizer, which converges faster. When creating snapshots, we set the number of epochs, the mini-batch size, the initial learning rate, and the number of cycles, thus producing one snapshot per cycle for each model, on which we build up the ensemble prediction.

Although CNN-based modular architectures provide encouraging recognition performance for image classification, it remains challenging to reveal why and how they produce such impressive results. Due to its black-box nature, a CNN can be problematic to apply in a medical diagnosis system, where we need an interpretable system, i.e., visualization as well as an accurate diagnosis. Despite these challenges, researchers continue to seek efficient visualization techniques, since visualization can bring the most critical facts in the health-care system into focus, assist medical practitioners in distinguishing correlations and patterns in imaging, and make data analysis more effective. In the field of detecting COVID-19 through chest X-rays, some early studies focused on visualizing the behavior of CNN models to distinguish between categories (such as COVID-19, normal, and pneumonia) and thereby produce explanatory models. In our proposed model, we applied a gradient-based approach named Grad-CAM [12], which measures the gradients of the feature maps in the final convolutional layer of a CNN for a target image, to highlight the critical regions via class-discriminative saliency maps. In Grad-CAM, the gradients flowing back to the final convolutional layer of the CNN are globally averaged to calculate the target-class weight of each filter. The Grad-CAM heat map is a weighted combination of feature maps, followed by a ReLU activation. The class-discriminative saliency map L^c for the target image class c is defined as follows [12]:

L^c_{i,j} = \mathrm{ReLU}\left( \sum_k w^c_k A^k_{i,j} \right),   (8)

where A^k_{i,j} denotes the activation map of the k-th filter at spatial location (i, j), and the ReLU captures the features with a positive influence on the target class. The target-class weight of the k-th filter is computed as

w^c_k = \frac{1}{Z} \sum_i \sum_j \frac{\partial Y^c}{\partial A^k_{i,j}},   (9)

where Y^c is the score for classifying the target category as c, and the total number of pixels in the activation map is denoted as Z.
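Given the activation maps A^k and the gradients ∂Y^c/∂A^k (obtained in practice from a framework's automatic differentiation, e.g., TensorFlow's `GradientTape`), Eqs. (8) and (9) reduce to a global average of the gradients followed by a weighted sum and a ReLU. A minimal NumPy sketch, with illustrative array shapes:

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (H, W, K) arrays for the final conv layer.
    Returns an (H, W) class-discriminative saliency map (Eq. (8))."""
    # Eq. (9): global-average-pool the gradients -> one weight per filter
    weights = gradients.mean(axis=(0, 1))              # shape (K,)
    # Eq. (8): weighted sum of activation maps, then ReLU
    cam = np.tensordot(activations, weights, axes=([2], [0]))
    return np.maximum(cam, 0.0)

# Toy check: 7x7 activation maps from K = 4 filters
rng = np.random.default_rng(0)
A = rng.normal(size=(7, 7, 4))
dY = rng.normal(size=(7, 7, 4))
heatmap = grad_cam(A, dY)  # non-negative 7x7 map; in practice it is
                           # upsampled and overlaid on the input X-ray
```

The resulting low-resolution map is resized to the input image size and rendered as a heat map over the chest X-ray.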
In this section, we present the results and consider several experimental settings to analyze the proposed ECOVNet and explore the robustness of the model. The performance of the proposed model on the three-class classification problem is compared with some state-of-the-art methods. The three-class classification problem is to determine whether a chest X-ray image belongs to the COVID-19, normal, or pneumonia category. All our programs are written in Python, and the software stack consists of Keras with the TensorFlow backend and scikit-learn.
In sub-sections 3.1 and 3.2, the benchmark data set and the augmentation approach used in the experiments are briefly described. We configure two test sets, namely an imbalanced test and a balanced test. The imbalanced test is the original test set that comes from COVIDx, while the balanced test is also drawn from the COVIDx test set, but we randomly choose 100 images each for normal and pneumonia, with the COVID-19 test size fixed at 100. During training, we set the training and validation ratios to 90% and 10%, respectively. The entire image distribution over training, validation, and testing is shown in Table 4. We use the pre-trained EfficientNet for feature extraction; as described in sub-section 3.3, EfficientNet is a family of models formed by selecting scaling factors. In our experiments, we consider the EfficientNet B0 to B5 base models; however, their input shapes differ. Table 5 lists the input shape for each base model as well as the total number of parameters during training.

Table 4: Image partition of the training, validation, and testing sets for the balanced and imbalanced tests

Category               COVID-19   Normal   Pneumonia   Total
Training                    441    7,170       4,914   12,525
Validation                   48      796         545    1,389
Testing (Balanced)          100      100         100      300
Testing (Imbalanced)        100      885         594    1,579

Table 5: Image resolution and total number of parameters of ECOVNet considering the base models (B0 to B5) of EfficientNet

Base Model        Image Resolution   Parameter Size (ECOVNet)
EfficientNet-B0   224 × 224          ≈ 4 million
EfficientNet-B1   240 × 240          ≈ 7 million
EfficientNet-B2   260 × 260          ≈ 8 million
EfficientNet-B3   360 × 360          ≈ 11 million
EfficientNet-B4   380 × 380          ≈ 18 million
EfficientNet-B5   456 × 456          ≈ 29 million

In order to evaluate the performance of the proposed method, we considered the following evaluation metrics: accuracy, precision, recall, F1 score, confidence interval (CI), receiver operating characteristic (ROC) curve, and area under the curve (AUC). The definitions of accuracy, precision, recall, and F1 score are as follows:

Accuracy = (TP + TN) / (Total Samples)   (10)
Precision = TP / (TP + FP)   (11)

Recall = TP / (TP + FN)   (12)

F1 = 2 × (Precision × Recall) / (Precision + Recall)   (13)

where TP stands for true positive, while TN, FP, and FN stand for true negative, false positive, and false negative, respectively. Since the benchmark data set is not balanced, the F1 score may be a more informative evaluation metric; for example, the COVID-19 class has far fewer images than the non-COVID classes, i.e., normal and pneumonia. Moreover, a CI is reported, as it is a more practical measure than a single performance figure: it raises the level of statistical significance and reflects the reliability of the result in the problem domain. Finally, we plot the ROC curve to display the results and measure the area under the ROC curve (usually called AUC) to quantify the effectiveness of the model. The ROC curve is plotted between the true positive rate (TPR)/recall and the false positive rate (FPR), where the FPR is defined as

FPR = FP / (FP + TN).   (14)
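For reference, the metrics in Eqs. (10)–(14) follow directly from the four confusion counts; the counts in the example below are hypothetical, not the paper's results:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1, and FPR from confusion counts,
    following Eqs. (10)-(14)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also the true positive rate (TPR)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)             # x-axis of the ROC curve
    return accuracy, precision, recall, f1, fpr

# Hypothetical counts for one class on a 300-image balanced test set
acc, prec, rec, f1, fpr = classification_metrics(tp=95, tn=195, fp=5, fn=5)
```

With these counts, accuracy is 290/300 while precision, recall, and F1 all equal 0.95, which illustrates why F1 is the more informative summary when the classes are imbalanced.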
In Table 6, the predictions of the proposed ECOVNet without any ensemble are shown. In this comparison, ECOVNet with EfficientNet-B5 pre-trained weights yields better results than the other base models, both with and without augmentation. This reflects the fact that feature extraction using a model optimized along three dimensions, namely greater depth and width and a higher image resolution, can capture more and finer details, thereby improving classification accuracy. Without augmentation, ECOVNet achieves its best accuracy on the imbalanced test set and performs slightly better on the balanced test set; with augmentation, ECOVNet attains the same best accuracy on both the imbalanced and balanced test sets. Moreover, in Table 6, we report a CI for accuracy to analyze the uncertainty inherent in ECOVNet. A tight CI means higher precision, while a wide CI indicates the opposite. For the imbalanced test set, the CI falls within a narrow range, but for the balanced case the CI is wider because it is computed on a smaller amount of test data. Furthermore, Figure 4 shows the training loss of ECOVNet with EfficientNet-B5. Note that values in bold indicate that the method performs statistically better than the other methods.

We implement two ensemble strategies, hard ensemble and soft ensemble, and each ensemble considers the model snapshots generated during a single training run. Tables 7 and 8 show the classification results under different evaluation metrics without and with augmentation, respectively, including the ensemble methods and no ensemble. As shown in Table 7, in handling COVID-19 cases, the ensemble methods are significantly better than the no-ensemble method. More specifically, the recall reaches its maximum value of 100%, and to a great extent this result demonstrates the robustness of our proposed architecture. In addition, on the balanced test set, the soft ensemble appears to be the preferred method, as its precision, recall, and F1 score all reach 100%. Comparing the two ensembles, since the averaged softmax score of each category influences the final decision, the soft ensemble performs better than the hard ensemble. Owing to the uneven distribution of the imbalanced test set, the F1 score may be more reliable than accuracy. It is clear from Table 7 that, for the imbalanced test set, the ensemble methods improve the F1 score for COVID-19 compared with no ensemble. For augmentation, Table 8 shows that the ensemble methods present better results than no ensemble, with the exception that the hard ensemble is slightly better than the soft ensemble. However, both with and without augmentation, the accuracy on the imbalanced test set is estimated with more precision than on the balanced test set, so the confidence interval computed from the imbalanced test set is tighter, since it covers a larger sample.

Table 6: Prediction performance of the proposed ECOVNet without using an ensemble
[Table 6 reports precision, recall, F1, and accuracy (with CI) of ECOVNet for base models EfficientNet-B0 through B5, without and with augmentation, on the imbalanced and balanced test sets; EfficientNet-B5 attains the best scores in both settings.]

w/o aug. = without augmentation; w/ aug. = with augmentation
Figure 4: Training and validation loss curves of ECOVNet (base model EfficientNet-B5): (a) without augmentation; (b) with augmentation.

It can be seen from Figure 5 that, since the ensemble methods combine the predictions of the model snapshots, they tend to improve the classification accuracy of the proposed ECOVNet. In addition, the classification accuracy of the proposed ECOVNet clearly increases as deeper base models are used. More specifically, in the case of the soft ensemble, the base models EfficientNet-B4 and EfficientNet-B5 provide the same, and the highest, accuracy. Meanwhile, when the base model is moderately deep, the hard ensemble and the soft ensemble give comparable results; when the model is deeper, the soft ensemble shows its superiority. Considering COVID-19 cases, Figure 6 shows the precision, recall, and F1 score of ECOVNet for the balanced test data with the soft ensemble. Comparing the precision of ECOVNet, almost all base models except EfficientNet-B0 show significantly better performance. In terms of recall, the value gradually increases as deeper base models are used, although it drops from ECOVNet-B0 to ECOVNet-B1. The same observation holds for the F1 score, which also drops from ECOVNet-B0 to ECOVNet-B1.

Table 7: Class-wise classification results of ECOVNet (base model EfficientNet-B5) without augmentation. [Per-class precision, recall, and F1, plus overall accuracy with CI, for the no-ensemble, hard-ensemble, and soft-ensemble variants on the imbalanced and balanced test sets; with the soft ensemble on the balanced test set, COVID-19 precision, recall, and F1 all reach 100%, with 97.00% accuracy.]

Table 8: Class-wise classification results of ECOVNet (base model EfficientNet-B5) with augmentation. [Same structure as Table 7.]

It is often useful to analyze the ROC curve, since it summarizes the trade-off between the true positive rate and the false positive rate of a model over different probability thresholds. In Figure 7, the ROC curves show the micro- and macro-average and class-wise AUC scores obtained by the proposed ECOVNet, where each curve refers to the ROC curve of an individual model snapshot. The AUC scores of all categories are consistent, indicating that the prediction of the proposed model is stable. However, the AUC scores of the third and fourth snapshots are better than those of the other snapshots. It is evident from Figure 7 that the areas under the curve of all classes are relatively similar, but COVID-19's AUC is higher than that of the other classes. Figure 8 shows the confusion matrices of the proposed ECOVNet with the EfficientNet-B5 base model. In Figure 8, it is clear that for COVID-19 the ensemble methods provide much better results than no ensemble; for both the balanced and imbalanced test sets, they outperform the no-ensemble variant. ECOVNet also shows the ability to detect normal and pneumonia chest X-rays, and it provides the same performance with or without an ensemble on the imbalanced test set, although it performs slightly better on the balanced test set with no ensemble. Finally, we can say that ECOVNet is an effective architecture for detecting COVID-19 cases from chest X-ray images, because it focuses on discriminative features that help distinguish COVID-19 from the other categories (normal and pneumonia).
Table 9 shows the comparison between the proposed method and the latest methods for detecting COVID-19 using chest X-rays; the proposed method outperforms the other methods. Some previous methods (namely COVID-Net [22], EfficientNet-B3 [30], and DeepCOVIDExplainer [34]) used ImageNet weights and the COVIDx data set; one of them, DeepCOVIDExplainer, also considered two ensemble strategies. On the other hand, CovXNet [38] used an ensemble method and a transfer learning scheme from non-COVID chest X-rays, while using training and testing data sets other than COVIDx. One of the previous methods [30] showed accuracy comparable to that of our proposed method. Another method, PDCOVIDNet [41], achieved an accuracy that trails our proposed method by a small margin. We observe that the proposed approach consistently exhibits better classification accuracy across different combinations of ensembles with the imbalanced and balanced test sets, considering a larger number of COVID-19 chest X-rays. When comparing the results of the two ensemble methods, the soft ensemble shows impressive results in classifying COVID-19, with precision and recall both reaching 100%.

Figure 5: Comparison between ensemble and no ensemble of the proposed ECOVNet in terms of accuracy for the balanced test data.

In our evaluation, we applied the Grad-CAM visual interpretation method to depict the salient areas that ECOVNet emphasizes in its classification decision for a given chest X-ray image. Accurate and definitive salient-region detection is crucial for analyzing classification decisions as well as for assuring the trustworthiness of the results. In order to locate the salient area, feature weights with various illuminations related to feature importance are used to create a two-dimensional heat map that is superimposed on the given input image. Figure 9 shows the Grad-CAM visualization results of ECOVNet for each model snapshot. The salient area locates the region of the lung that was identified when a given image is classified as COVID-19, normal, or pneumonia. As shown in Figure 9, for COVID-19, a ground-glass opacity (GGO) occurs along with some consolidation, thereby partially covering the markings of the lungs; hence, it leads to lung inflammation in both the upper and lower zones of the lung. When examining the heat maps generated from the COVID-19 chest X-ray, it can be seen that the heat maps created from snapshots 2 and 3 point to the salient area (such as GGO). However, in the case of the normal chest X-ray, no lung inflammation is observed, so there is no significant area, making it easily distinguishable from the other classes, i.e., COVID-19 and pneumonia.
Likewise, it can be observed from the pneumonia chest X-ray that there are GGOs in the middle and lower parts of the lungs. The heat maps generated for the pneumonia chest X-ray are localized in the salient regions with GGO, but the 4th snapshot model appears to fail to identify the salient regions, as its heat map highlights areas outside the lung. Accordingly, we believe that the proposed ECOVNet provides meaningful information about the manifestations of the COVID-19 disease through an intuitive heat map, and this type of heat map can help AI-based systems interpret the classification results obtained from the proposed architecture.
Figure 6: Precision, recall, and F1 score of the proposed ECOVNet for the balanced test data with the soft ensemble, considering COVID-19 cases.

Figure 7: ROC curves of model snapshots of the proposed ECOVNet considering the EfficientNet-B5 base model.
[Figure 8 panels: (a) soft ensemble (balanced test); (b) hard ensemble (balanced test); (c) no ensemble (balanced test); (d) soft ensemble (imbalanced test); (e) hard ensemble (imbalanced test); (f) no ensemble (imbalanced test).]
Figure 8: Confusion matrices of the proposed ECOVNet considering EfficientNet-B5 as a base model. In the confusion matrices, the predicted labels COVID-19, Normal, and Pneumonia are marked as 0, 1, and 2, respectively.
In this paper, we proposed a new CNN-based modular architecture, ECOVNet, which can effectively detect COVID-19, with class activation maps, using one of the largest publicly available chest X-ray data sets, i.e., COVIDx. In this work, a highly effective CNN structure (the EfficientNet base model with ImageNet pre-trained weights) is used as the feature extractor, and fine-tuned pre-trained weights are used for the COVID-19 detection task. Moreover, ensemble predictions improve performance by exploiting the predictions obtained from the proposed ECOVNet model snapshots. From empirical evaluations, we observe that the soft ensemble of the proposed ECOVNet model snapshots outperforms the other state-of-the-art methods. Finally, we performed a visualization study to locate significant areas in the chest X-ray via class activation maps, which are used to classify the chest X-ray into its expected category. We believe that our findings will make a useful contribution to the control of COVID-19 infection and to the widespread acceptance of automated applications in medical practice.

While this work contributes to reducing the effort of health professionals' radiological assessment, our further plan is to extend this work into a fully functional application using the guidelines of the design research paradigm [67, 68]. Such a methodological lens could offer further directions both for developing innovative clinical solutions and for adding associative knowledge to the relevant literature. Furthermore, we plan to develop a mobile application that can predict whether the disease will become deadly by analyzing a patient's short-term historical chest X-ray pattern if the patient manifests any clinical symptoms related to the COVID-19 disease. This might offer a new way to prevent and stop the spread of the COVID-19 pandemic.

Table 9: Comparison of the proposed ECOVNet with other state-of-the-art methods on COVID-19 detection
Method                                               Total chest X-rays                                        Precision (%) (COVID-19)   Recall (%) (COVID-19)   Accuracy (%)
COVID-Net [22]                                       COVID-19, Normal, Pneumonia
EfficientNet-B3 [30] (Flat Classification)           COVID-19, Normal, Pneumonia
EfficientNet-B3 [30] (Hierarchical Classification)   COVID-19, Normal, Pneumonia
DeepCOVIDExplainer [34]                              COVID-19, Normal, Pneumonia
CovXNet [38]                                         COVID-19, Viral Pneumonia, Bacterial Pneumonia
CovXNet [38]                                         COVID-19, Viral Pneumonia, Bacterial Pneumonia + Normal
PDCOVIDNet [41]                                      COVID-19, Normal, Viral Pneumonia
ECOVNet-Hard Ensemble (Proposed) a                   COVID-19, Normal, Pneumonia
ECOVNet-Soft Ensemble (Proposed) a                   COVID-19, Normal, Pneumonia
ECOVNet-Hard Ensemble (Proposed) b                   COVID-19, Normal, Pneumonia
ECOVNet-Soft Ensemble (Proposed) b                   COVID-19, Normal, Pneumonia                               100                        100                     97.0

a Imbalanced test set. b Balanced test set.
Figure 9: Grad-CAM visualization for the proposed ECOVNet considering the base model EfficientNet-B5. A total of 5 (five) model snapshots were generated during the training process.
References

[1] World Health Organization. COVID-2019 situation reports. Online.
[2] World Health Organization. COVID-19 pandemic. Online.
[3] C. Zheng, X. Deng, Q. Fu, Q. Zhou, J. Feng, H. Ma, W. Liu, and X. Wang. Deep learning-based detection for COVID-19 from chest CT using weak label. medRxiv:10.1101/2020.03.12.20027185, 2020.
[4] Yicheng Fang, Huangqi Zhang, Jicheng Xie, Minjie Lin, Lingjun Ying, Peipei Pang, and Wenbin Ji. Sensitivity of chest CT for COVID-19: Comparison to RT-PCR. Radiology, 296(2), 2020.
[5] World Health Organization. Use of chest imaging in COVID-19, 2020. Online.
[6] Jin Ying-Hui et al. A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version). Military Medical Research, 7:4(1), 2020.
[7] G. D. Rubin, C. J. Ryerson, L. B. Haramati, et al. The role of chest imaging in patient management during the COVID-19 pandemic: A multinational consensus statement from the Fleischner Society. Chest, 158(1):106–116, 2020.
[8] H. E. Davies, C. G. Wathen, and F. V. Gleeson. The risks of radiation exposure related to diagnostic imaging and how to minimise them. BMJ, 342:d947, 2011.
[9] Ming-Yen Ng, Elaine Y. P. Lee, Jin Yang, Fangfang Yang, Xia Li, Hongxia Wang, Macy Mei-Sze Lui, Christine Shing-Yen Lo, Barry Leung, Pek-Lan Khong, et al. Imaging profile of the COVID-19 infection: Radiologic findings and literature review. Radiology: Cardiothoracic Imaging, 2(1), 2020.
[10] BBC. BBC business reports. Online.
[11] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv:1905.11946, 2019.
[12] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision (ICCV), pages 618–626, 2017.
[13] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Yi Ding, Aarti Bagul, Curtis Langlotz, Katie S. Shpanskaya, Matthew P. Lungren, and Andrew Y. Ng. CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv:1711.05225v3, 2017.
[14] Afshin Shoeibi, Marjane Khodatars, Roohallah Alizadehsani, Navid Ghassemi, Mahboobeh Jafari, Parisa Moridian, Ali Khadem, Delaram Sadeghi, Sadiq Hussain, Assef Zare, Zahra Alizadeh Sani, Javad Bazeli, Fahime Khozeimeh, Abbas Khosravi, Saeid Nahavandi, U. Rajendra Acharya, and Peng Shi. Automated detection and forecasting of COVID-19 using deep learning techniques: A review. arXiv:2007.10785, 2020.
[15] Halgurd S. Maghdid, Aras T. Asaad, Kayhan Zrar Ghafoor, Ali Safaa Sadiq, and Muhammad Khurram Khan. Diagnosing COVID-19 pneumonia from X-ray and CT images using deep learning and transfer learning algorithms. arXiv:2004.00038, 2020.
[16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, volume 1, pages 1097–1105, 2012.
[17] A. Narin, K. Ceren, and P. Ziynet. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. arXiv:2003.10849, 2020.
[18] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255, 2009.
[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[20] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Z. B. Wojna. Rethinking the Inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[21] Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[22] L. Wang and A. Wong. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. arXiv:2003.09871, 2020.
[23] I. D. Apostolopoulos and T. A. Mpesiana. COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, 2020. DOI:10.1007/s13246-020-00865-4.
[24] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[25] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[26] François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.
[27] Shervin Minaee, Rahele Kafieh, Milan Sonka, Shakib Yazdani, and Ghazaleh Jamalipour Soufi. Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning. Medical Image Analysis, 65, 2020.
[28] Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360, 2016.
[29] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4700–4708, 2017.
[30] Eduardo Luz, Pedro Lopes Silva, Rodrigo Silva, Ludmila Silva, Gladston Moreira, and David Menotti. Towards an effective and efficient deep learning model for COVID-19 patterns detection in X-ray images. arXiv:2004.05717, 2020.
[31] Muhammad Farooq and Abdul Hafeez. COVID-ResNet: A deep learning framework for screening of COVID19 from radiographs. arXiv:2003.14395, 2020.
[32] Tulin Ozturk, Muhammed Talo, Eylul Azra Yildirim, Ulas Baran Baloglu, Ozal Yildirim, and U. Rajendra Acharya. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine, 121:103792, 2020. DOI:10.1016/j.compbiomed.2020.103792.
[33] Ferhat Ucar and Deniz Korkmaz. COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Medical Hypotheses, 140, 2020.
[34] Md. Rezaul Karim, Till Döhmen, Dietrich Rebholz-Schuhmann, Stefan Decker, Michael Cochez, and Oya Deniz Beyan. DeepCOVIDExplainer: Explainable COVID-19 predictions based on chest X-ray images. arXiv:2004.04582, 2020.
[35] Parnian Afshar, Shahin Heidarian, Farnoosh Naderkhani, A. Oikonomou, Konstantinos N. Plataniotis, and Arash Mohammadi. COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images. arXiv:2004.02696, 2020.
[36] Geoffrey Hinton, Sara Sabour, and Nicholas Frosst. Matrix capsules with EM routing. In ICLR, 2018.
[37] Asif Iqbal Khan, Junaid Latief Shah, and Mohammad Mudasir Bhat. CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images. Computer Methods and Programs in Biomedicine, 196, 2020.
[38] Tanvir Mahmud, Md Awsafur Rahman, and Shaikh Anowarul Fattah. CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization. Computers in Biology and Medicine, 122, 2020.
[39] Arman Haghanifar, Mahdiyar Molahasani Majdabadi, and Seok-Bum Ko. COVID-CXNet: Detecting COVID-19 in frontal chest X-ray images using deep learning. arXiv:2006.13807, 2020.
[40] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Y. Ng. CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv:1711.05225, 2017.
[41] Nihad Karim Chowdhury, Md. Muhtadir Rahman, and Muhammad Ashad Kabir. PDCOVIDNet: A parallel-dilated convolutional neural network architecture for detecting COVID-19 from chest X-ray images. Health Information Science and Systems, 8:27, 2020.
[42] Joseph Paul Cohen, Paul Morrison, and Lan Dao. COVID-19 image data collection. arXiv:2003.11597, 2020.
[43] Paul Mooney. Chest X-ray images (pneumonia). Online.
PREPRINT - O
CTOBER
19, 2020[44] Zhong Qiu Lin, Mohammad Javad Shafiee, Stanislav Bochkarev, Michael St. Jules, Xiao Yu Wang, and AlexanderWong. Do explanations reflect decisions? a machine-centric strategy to quantify the performance of explainabilityalgorithms. arXiv:1910.07387 , 2019.[45] Kermany DS, Goldbaum M, and Cai W.et al. Identifying medical diagnoses and treatable diseases by image-baseddeep learning.
Cell, 172(5):1122–1131, 2018.
[46] Radiological Society of North America. RSNA pneumonia detection challenge, 2019. Online: .
[47] SIRM. COVID-19 database. Online: .
[48] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2097–2106, 2017.
[49] A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 839–847, 2018.
[50] S. Bach, Alexander Binder, Grégoire Montavon, F. Klauschen, K. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10, 2015.
[51] Daniel S. Kermany, K. Zhang, and M. Goldbaum. Labeled optical coherence tomography (OCT) and chest X-ray images for classification. Mendeley Data, V2, 2018.
[52] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
[53] Muhammad E. H. Chowdhury, Tawsifur Rahman, Amith Khandakar, Rashid Mazhar, Muhammad Abdul Kadir, Zaid Bin Mahbub, Khandaker Reajul Islam, Muhammad Salman Khan, Atif Iqbal, Nasser Al-Emadi, and Mamun Bin Ibne Reaz. Can AI help in screening viral and COVID-19 pneumonia? arXiv:2003.13145, 2020.
[54] A. Chung. Figure1 COVID chest X-ray dataset, 2020. Online: https://github.com/agchung/Figure1-COVID-chestxray-dataset.
[55] A. Chung. Actualmed COVID-19 chest X-ray data initiative, 2020. Online: https://github.com/agchung/Actualmed-COVID-chestxray-dataset.
[56] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, 2016.
[57] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. In , 2019.
[58] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. In , pages 7132–7141, 2018.
[59] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for activation functions. arXiv:1710.05941, 2017.
[60] S. Rajaraman, J. Siegelman, P. O. Alderson, L. S. Folio, L. R. Folio, and S. K. Antani. Iteratively pruned deep learning ensembles for COVID-19 detection in chest X-rays. IEEE Access, 8, 2020.
[61] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
[62] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How does batch normalization help optimization? In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 2488–2498, 2018.
[63] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, pages 807–814, 2010.
[64] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
[65] Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, and Kilian Q. Weinberger. Snapshot ensembles: Train 1, get M for free. arXiv:1704.00109, 2017.
[66] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[67] Shah J. Miah and John G. Gammack. Ensemble artifact design for context sensitive decision support. Australasian Journal of Information Systems, 18(2), Jun. 2014.
[68] Shah Jahan Miah.