Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans
Xin He, Shihao Wang, Xiaowen Chu, Shaohuai Shi, Jiangping Tang, Xin Liu, Chenggang Yan, Jiyong Zhang, Guiguang Ding
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
School of Automation, Hangzhou Dianzi University, Hangzhou, China
School of Software, Tsinghua University, Beijing, China
Abstract
The COVID-19 pandemic has spread globally for several months. Because its transmissibility and high pathogenicity seriously threaten people's lives, it is crucial to accurately and quickly detect COVID-19 infection. Many recent studies have shown that deep learning (DL) based solutions can help detect COVID-19 based on chest CT scans. However, most existing work focuses on 2D datasets, which may result in low-quality models, as real CT scans are 3D images. Besides, the reported results span a broad spectrum on different datasets with a relatively unfair comparison. In this paper, we first use three state-of-the-art 3D models (ResNet3D101, DenseNet3D121, and MC3_18) to establish the baseline performance on three publicly available chest CT scan datasets. Then we propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D DL models for 3D chest CT scan classification, using the Gumbel Softmax technique to improve the searching efficiency. We further exploit the Class Activation Mapping (CAM) technique on our models to provide interpretability of the results. The experimental results show that our automatically searched models (CovidNet3D) outperform the baseline human-designed models on the three datasets with tens of times smaller model size and higher accuracy. Furthermore, the results also verify that CAM can be well applied in CovidNet3D for COVID-19 datasets to provide interpretability for medical diagnosis. Code: https://github.com/HKBU-HPML/CovidNet3D

Introduction
The Coronavirus Disease 2019 (COVID-19) pandemic is an ongoing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The SARS-CoV-2 virus can be easily spread among people via small droplets produced by coughing, sneezing, and talking. COVID-19 is not only easily contagious but also a severe threat to human lives. COVID-19 infected patients usually present pneumonia-like symptoms, such as fever, dry cough, dyspnea, and gastrointestinal symptoms, followed by a severe acute respiratory infection. The usual incubation period of COVID-19 ranges from one to 14 days. Many COVID-19 patients do not even know that they have been infected because they show no symptoms, which can easily cause delayed treatment and lead to a sudden exacerbation of the condition. Therefore, a fast and accurate method of diagnosing COVID-19 infection is crucial.

Currently, there are two commonly used methods for COVID-19 diagnosis. One is viral testing, which uses real-time reverse transcription-polymerase chain reaction (rRT-PCR) to detect viral RNA fragments. The other is making diagnoses based on characteristic imaging features on chest X-rays or computed tomography (CT) scan images. (Ai et al. 2020) compared the effectiveness of the two diagnosis methods and concluded that chest CT detects the turn from initial negative to positive faster than rRT-PCR. However, the manual process of analyzing and diagnosing based on CT images highly relies on professional knowledge, and analyzing the features of CT images is time-consuming. Therefore, many recent studies have tried to use deep learning (DL) methods to assist COVID-19 diagnosis with chest X-rays or CT scan images.

However, the reported accuracy of existing DL-based COVID-19 detection solutions spans a broad spectrum because they were evaluated on different datasets, making it difficult to achieve a fair comparison. Besides, most studies focus on 2D CT datasets (Singh et al. 2020; Ardakani et al.
2020; Alom et al. 2020). However, a real CT scan is usually 3D data; thus, it is necessary to use 3D models to classify 3D CT scan data. To this end, we use three state-of-the-art (SOTA) 3D DL models to establish the baseline performance on three open-source 3D chest CT scan datasets: CC-CCII (Zhang et al. 2020b), MosMedData (Morozov et al. 2020), and COVID-CTset (Rahimzadeh, Attar, and Sakhaei 2020). The details are shown in Table 2.

In addition, designing a high-quality model for a specific medical image dataset is a time-consuming task and requires much expertise, which hinders the development of DL technology in the medical field. Recently, neural architecture search (NAS) has become a prevalent topic, as it can efficiently discover high-quality DL models automatically. Many studies have applied the NAS technique to image classification and object detection tasks (Pham et al. 2018; Liu, Simonyan, and Yang 2019; Zoph et al. 2018; Tan et al. 2019).

We find there are some errors and noises in the original dataset (Version 1.0). Therefore we built our version based on it.

Table 1: Summary of the existing studies of DL-based methods for COVID-19 detection. NCP indicates the novel coronavirus pneumonia; Non-NCP includes CP (common pneumonia) and Normal. ‡: the number of scans. *: the number of patients.

Paper | Type | Open-source? | Dataset statistics | Class statistics | Dim | Acc (%)
(Ghoshal and Tucker 2020) | X-ray (2D) | Yes | – / – / 5,941 | 68 / 4,290 / 1,583 / 1,188 | 2D | 88.39
(Zhang et al. 2020a) | X-ray (2D) | Yes | – / – / 1,531 | 100 / 1,431 / 764 | 2D | –
(Narin, Kaya, and Pamuk 2020) | X-ray (2D) | Yes | – / – / 100 | 50 / 50 / 20 | 2D | 98.00
(Singh et al. 2020) | CT (2D) | No | – / – / 133 | 68 / 65 / 26 | 2D | 93.20
(Ardakani et al. 2020) | CT (2D) | No | 194 / – / 1,020 | 510 / 510 / 102 | 2D | 99.63
(Alom et al. 2020) | CT (2D) | Yes | – / – / 425 | 178 / 247 / 45 | 2D | 98.78
(He et al. 2020) | CT (2D) | Yes | 143 / – / 746 | 349 / 397 / 186 | 2D | 86.00
(Mobiny et al. 2020) | CT (2D) | Yes | – / – / 746 | 349 / 397 / 105 | 2D | 87.60
(Rahimzadeh, Attar, and Sakhaei 2020) | CT (3D) | Yes | 377 / 526 / 12,058 | 244‡ / …‡ | 2D | –
(Zheng et al. 2020) | CT (3D) | No | 542 / 630 / – | 313* / 229* / 131* | 3D | 90.10
(Li et al. 2020) | CT (3D) | No | 3,322 / 4,356 / – | 1,296‡ / …‡ | 3D | –
(Morozov et al. 2020) | CT (3D) | Yes | 1,110 / 1,110 / 46,411 | 856‡ / …‡ | 3D | –
(Zhang et al. 2020b) | CT (3D) | Yes | 2,778 / 4,356 / 444,034 | 1,578‡ / …‡ | 3D | 92.49
CovidNet3D. The experimental results show that CovidNet3D can achieve comparable results to human-designed SOTA models with a smaller size. Furthermore, medical diagnoses generally require interpretability of the decision, so we apply the Class Activation Mapping (CAM) (Zhou et al. 2016) technique to provide interpretability for our CovidNet3D models. In summary, our contributions are as follows:
• We use three manually designed 3D models to establish the baseline performance on three open-source COVID-19 chest CT scan datasets.
• To the best of our knowledge, we are the first to apply the NAS technique to search for 3D DL models for COVID-19 chest CT scan datasets. Our DNAS framework can efficiently discover competitive neural architectures that outperform the baseline models on the three CT datasets.
• We use the Class Activation Mapping (CAM) (Zhou et al. 2016) algorithm to add interpretability to our DNAS-designed models, which can help doctors quickly locate the discriminative lesion areas on CT scan images.
Related Work
In recent years, DL techniques have been proven to be effective in diagnosing diseases with X-ray and CT images (Litjens et al. 2017). To enable DL techniques to help detect COVID-19, an increasing number of publicly available COVID-19 datasets have been proposed, as shown in Table 1.
Publicly-available Datasets of COVID-19
We separate the publicly available datasets into two categories, the pre-pandemic datasets and the post-pandemic datasets, which mainly differ in quality and quantity.
Pre-pandemic Datasets
In the pre-pandemic period, gathering datasets for COVID-19 was a tough job, as there was not enough data for collection. Most datasets in this period were gathered from medical papers or uploaded by the public. The IEEE8023 covid-chestxray-dataset (Cohen, Morrison, and Dao 2020) is a dataset of COVID-19 cases with chest X-ray and CT images collected from public sources, but its quality is not guaranteed since the images are not verified by medical experts. Covid-ct-dataset (Zhao et al. 2020) is another CT dataset of COVID-19, mainly composed of CT images extracted from COVID-19 research papers, and its quality is low. The dataset only contains 2D information because each patient has only one to several CT images instead of a complete 3D scan volume.
Post-pandemic Datasets
During the pandemic, the number of confirmed cases of COVID-19 has risen rapidly, which has brought many high-quality COVID-19 chest CT scan datasets, such as CC-CCII (Zhang et al. 2020b) and COVID-CTset (Rahimzadeh, Attar, and Sakhaei 2020). Some of them have annotations by doctors, e.g., COVID-19-CT-Seg-Dataset (Jun et al. 2020) and MosMedData (Morozov et al. 2020). The three datasets we use in this work are all from this category.
DL-based methods for COVID-19 detection
Much research has been conducted on CT images, but the 3D information of CT images is under-explored; for example, the work by (He et al. 2020; Mobiny et al. 2020; Singh et al. 2020) mainly proposes 2D DL models for COVID-19 detection. (Ardakani et al. 2020) benchmarks ten 2D CNNs and compares their performance in classifying 2D CT images on a private dataset with 102 testing images. On the other hand, studies utilizing 3D CT images are relatively rare, mainly due to the lack of 3D COVID-19 CT scan datasets. (Li et al. 2020; Zheng et al. 2020) propose 3D CNNs with their private 3D CT datasets. There are also some other studies conducted on X-ray images. For example, (Narin, Kaya, and Pamuk 2020) proposes three 2D DL models for COVID-19 detection. (Zhang et al. 2020a) introduces a deep anomaly detection model for fast and reliable screening. (Ghoshal and Tucker 2020) investigates the estimation of uncertainty and interpretability with a Bayesian CNN on X-ray images. (Alom et al. 2020) uses both X-ray images and CT images for segmentation and detection.
Neural Architecture Search
In recent years, NAS has produced many SOTA results by automatically searching for neural architectures for many tasks (He, Zhao, and Chu 2021; Elsken, Metzen, and Hutter 2018). (Zoph and Le 2017; Zoph et al. 2018) first proposed using reinforcement learning (RL) to search for neural architectures and achieved results comparable to SOTA human-designed models. Since then, several types of NAS methods have been proposed, such as the evolutionary algorithm (EA) (Real et al. 2019), surrogate model-based optimization (SMBO) (Liu et al. 2018), and gradient descent (GD) based methods (Liu, Simonyan, and Yang 2019; Dong and Yang 2019). (Dong and Yang 2019; Wu et al. 2019) combine the GD-based method with the Gumbel Softmax (Jang, Gu, and Poole 2017) technique to further improve the searching efficiency.

Due to the success of NAS in natural image recognition (such as on ImageNet (Deng et al. 2009)), researchers have also tried to extend it to medical datasets, such as magnetic resonance imaging (MRI) segmentation (Kim et al. 2019). (Faes et al. 2019) uses five public datasets (MESSIDOR, OCT images, HAM 10000, paediatric images, and CXR images) to search for and train models with the Google Cloud AutoML platform. Their experimental results demonstrate that AutoML can generate competitive classifiers compared to manually designed DL models. But to the best of our knowledge, there is no study applying the NAS technique to search for 3D DL models for COVID-19 chest CT scan datasets. To this end, we apply the NAS technique to the three open-source COVID-19 chest CT scan datasets and successfully discover high-quality 3D models that achieve comparable performance with human-designed SOTA 3D models.
Method
In this section, we first describe our search space for 3D CT scan classification models. Then, we introduce the differentiable neural architecture search (DNAS) method combined with the Gumbel Softmax technique (Jang, Gu, and Poole 2017; Dong and Yang 2019).
Search Space
There are two critical points to consider before designing the search space. One is that all datasets we use are composed of 3D CT scans; therefore, the searched model should be good at extracting information from three-dimensional spatial data. The other is that the model should be lightweight, as the time required to process 3D data is much longer than for 2D image data.
Figure 1: The overview of our search space. The model is generated by stacking a predefined number of cells. Each cell contains a different number of blocks, and the blocks of different cells are different and need to be searched. Conv3d 1×1×1 denotes a 3D convolution with a 1×1×1 kernel, Dwise3d denotes 3D depthwise convolution, BN3d denotes 3D batch normalization, D×H×W×F denotes the tensor shape (depth, height, width, channel), and MBConv denotes mobile inverted bottleneck convolution.

Although the cell-based search space (Pham et al. 2018; Liu, Simonyan, and Yang 2019) is one of the most commonly used search spaces, it has several problems: 1) the final model is built by stacking the same cells, which precludes layer diversity; 2) many searched cells are very complicated and fragmented and are therefore inefficient for inference. MobileNetV2 (Sandler et al. 2018) is a lightweight model manually designed for efficient inference on mobile and embedded devices. Several NAS studies (Tan et al. 2019; Wu et al. 2019) have successfully used its layer modules (Sandler et al. 2018), including inverted residuals and linear bottlenecks, to search for neural architectures and achieved SOTA results on 2D image datasets. Therefore, we use MobileNetV2 as a reference to design our 3D search space.

As shown in Fig. 1, we represent the search space by a supernet, which consists of a stem layer, a fixed number of cells, and a linear layer. The stem layer performs convolutional operations, and the final linear layer follows a 3D global average pooling operation (Zhou et al. 2016). Each cell is composed of several blocks, and the structures of all blocks need to be searched. In different cells, the number of channels and the number of blocks are different and hand-picked empirically. By default, all blocks have a stride of 1; however, if a cell's input/output resolutions are different, its first block has a stride of 2. The blocks within the same cell have the same number of input/output channels. Inspired by MobileNetV2 (Sandler et al. 2018), each block is an MBConv-like module (see Fig. 1). It consists of three sub-modules: 1) a point-wise (1×1×1) convolution; 2) a 3D depthwise convolution with a K×K×K kernel, where K is a searchable parameter; 3) another point-wise (1×1×1) convolution.
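As an illustration, such a 3D MBConv-style block can be sketched in PyTorch as follows. This is our sketch under assumed hyperparameters (channel counts, stride handling), not the authors' implementation:

```python
import torch
import torch.nn as nn

class MBConv3D(nn.Module):
    """Hypothetical 3D mobile inverted bottleneck block:
    1x1x1 expansion -> KxKxK depthwise -> 1x1x1 projection,
    each followed by BN3d, with ReLU6 after the first two."""
    def __init__(self, channels, kernel_size=3, expansion=6, stride=1):
        super().__init__()
        hidden = channels * expansion
        self.use_residual = stride == 1
        self.body = nn.Sequential(
            # point-wise (1x1x1) expansion
            nn.Conv3d(channels, hidden, 1, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # KxKxK depthwise convolution (K is the searchable kernel size)
            nn.Conv3d(hidden, hidden, kernel_size, stride=stride,
                      padding=kernel_size // 2, groups=hidden, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # point-wise (1x1x1) projection, no activation
            nn.Conv3d(hidden, channels, 1, bias=False),
            nn.BatchNorm3d(channels),
        )

    def forward(self, x):
        out = self.body(x)
        return x + out if self.use_residual else out

block = MBConv3D(channels=8, kernel_size=3, expansion=3)
y = block(torch.randn(1, 8, 4, 16, 16))  # (bs, F, D, H, W)
```

With stride 1, the block preserves the D×H×W×F tensor shape, so an inverted residual connection can be added.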
All convolutional operations are followed by a 3D batch normalization and a ReLU6 activation function (Howard et al. 2017), denoted Conv3D-BN3D-ReLU6, except that the last convolution has no ReLU6 activation. Another searchable parameter is the expansion ratio e, which controls the ratio between the size of the input bottleneck and the inner size. For example, 5×5×5_MBConv6 denotes that the kernel size of the MBConv is 5×5×5 and the expansion ratio is 6.

In our experiments, the search space is a fixed macro-architecture supernet consisting of 6 cells, each with 4 blocks, except the last cell, which has only 1 block. We empirically collect the following set of candidate operations:
• 3×3×3_MBConv3
• 3×3×3_MBConv4
• 3×3×3_MBConv6
• 5×5×5_MBConv3
• 5×5×5_MBConv4
• 7×7×7_MBConv3
• 7×7×7_MBConv4
• Skip connection
Therefore, with 21 searchable blocks and 8 candidate operations, the search space contains 8^21 ≈ 9.2 × 10^18 possible architectures. Finding an optimal architecture in such a huge search space is a stupendous task. We introduce our search strategy in the following.

Differentiable NAS with Gumbel Softmax
According to (He, Zhao, and Chu 2021), gradient descent (GD) based NAS is an efficient method, and many studies use it to find competitive models in much shorter time and with fewer computational resources than other NAS methods (Dong and Yang 2019; Wu et al. 2019). Hence, in this paper, we use the GD-based method and combine it with the Gumbel Softmax (Jang, Gu, and Poole 2017) technique to discover models for COVID-19 detection.
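As a concrete illustration of this sampling technique, the following NumPy sketch draws a hard one-hot operation choice from architecture weights while keeping the soft probabilities that gradients would flow through. The shapes, seed, and temperature are illustrative, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_sample(log_alpha, tau=1.0):
    """Perturb the log-weights with Gumbel noise, take a
    temperature-tau softmax (the soft distribution), and pick
    the argmax as the hard one-hot choice."""
    g = -np.log(-np.log(rng.uniform(size=log_alpha.shape)))  # Gumbel noise
    y = (log_alpha + g) / tau
    soft = np.exp(y - y.max())
    soft = soft / soft.sum()          # softmax with temperature tau
    hard = np.zeros_like(soft)
    hard[soft.argmax()] = 1.0         # one-hot choice used in the forward pass
    return hard, soft

# three candidate operations with sampling probabilities 0.1, 0.2, 0.7
hard, soft = gumbel_softmax_sample(np.log(np.array([0.1, 0.2, 0.7])))
```

In a straight-through setup, the hard one-hot vector is used in the forward pass, while gradients are computed through the soft probabilities.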
Preliminary: DARTS
DARTS (Liu, Simonyan, and Yang 2019) was one of the first studies to use a GD-based method to search for neural architectures. Each cell is defined as a directed acyclic graph (DAG) of N nodes, where each node is a network layer, and each edge between node i and node j indicates a candidate operation (i.e., block structure) selected from the predefined operation space O. To make the search space continuous, DARTS (Liu, Simonyan, and Yang 2019) uses a Softmax over all possible operations to relax the categorical choice of a particular operation, i.e.,

o_{i,j}(x) = \sum_{k=0}^{K} P_k \, o_k(x) \quad \text{s.t.} \quad P_k = \frac{\exp(\alpha_{i,j}^{k})}{\sum_{l=0}^{K} \exp(\alpha_{i,j}^{l})},   (1)

where o_k indicates the k-th candidate operation performed on input x, \alpha_{i,j}^{k} indicates the weight for the operation o_k between a pair of nodes (i, j), and K is the number of predefined candidate operations. The training and validation losses are denoted by L_{train} and L_{val}, respectively. Therefore, the task of searching for architectures is transformed into a bilevel optimization problem over the neural architecture α and the weights ω_α of the architecture:

\min_{\alpha} \; L_{val}(\omega^{*}_{\alpha}, \alpha) \quad \text{s.t.} \quad \omega^{*}_{\alpha} = \arg\min_{\omega_{\alpha}} L_{train}(\omega_{\alpha}, \alpha)   (2)

Differentiable Model Sampling by Gumbel Softmax
In DARTS, as Fig. 2 (left) shows, the output of each node is the weighted average of the mixed operations during the whole search stage. This causes a linear increase in the required computational resources with the number of candidate operations. To alleviate this problem, we follow the same idea as (Dong and Yang 2019). Specifically, for each layer, only one operation is sampled and executed, with the sampling probability distribution P_α defined in Equation 1. For example, the sampling probabilities of the three operations in Fig. 2 (left) are 0.1, 0.2, and 0.7, respectively, but only one operation is sampled at a time. Therefore, the sampling distribution P_α of each layer is encoded into a one-hot random variable Z, e.g., P_α = [0.1, 0.2, 0.7] → Z = [0, 0, 1].

Figure 2: The comparison between two GD-based methods. (Left) Applying a mixture of all candidate operations, each with a different weight. (Right) Only one operation is sampled at a time. (Best viewed in color.)

However, each operation is sampled from a discrete probability distribution Z, so we cannot back-propagate gradients through Z to α. To enable back-propagation, we use a reparameterization trick named Gumbel Softmax (Jang, Gu, and Poole 2017), which can be formulated as

Z_k = \frac{\exp((\log \alpha_{i,j}^{k} + G_{i,j}^{k}) / \tau)}{\sum_{l=0}^{K} \exp((\log \alpha_{i,j}^{l} + G_{i,j}^{l}) / \tau)},   (3)

where G_{i,j}^{k} = -\log(-\log(u_{i,j}^{k})) is the k-th Gumbel sample, u_{i,j}^{k} is a uniform random variable, and τ is the softmax temperature. As τ → 0, the probability distribution over all operations between each pair of nodes approaches a one-hot distribution. Note that we apply the argmax function to Equation 3 during the forward pass, but return the gradients according to the soft distribution of Equation 3 during the backward pass.

Class Activation Mapping Algorithm
As mentioned above, the last linear layer follows a 3D global average pooling layer, which enables us to utilize the class activation mapping (CAM) algorithm to generate 3D activation maps for our model. CAM exploits the global average pooling layer to calculate the activation map M_c for class c, where each spatial element is given by

M_c(x, y, z) = \sum_k w_k^c f_k(x, y, z),   (4)

where, for a given image, f_k(x, y, z) is the activation of unit k at the last convolutional layer before the global average pooling layer at spatial location (x, y, z), and w_k^c is the corresponding linear-layer weight of class c for unit k. After obtaining the class activation map, we can simply upsample it to the size of the input scan images to visualize and identify the regions most relevant to the specific class.
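The computation in Eq. (4) reduces to a weighted sum over feature channels at every 3D location; a minimal NumPy sketch (array shapes and names are our own, illustrative choices):

```python
import numpy as np

def cam_3d(features, weights, cls):
    """Class activation map per Eq. (4): M_c(x, y, z) = sum_k w_ck * f_k(x, y, z).
    features: (K, X, Y, Z) activations of the last conv layer;
    weights:  (C, K) weights of the final linear layer."""
    return np.einsum('k,kxyz->xyz', weights[cls], features)

K, C = 4, 2
feats = np.random.default_rng(1).random((K, 3, 5, 5))  # toy activations
w = np.random.default_rng(2).random((C, K))            # toy linear weights
m = cam_3d(feats, w, cls=0)                            # one 3D map per class
```

The resulting 3D map can then be upsampled (e.g., trilinearly) to the input scan size for visualization.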
Figure 3: The pipeline of training 3D deep learning models. All CT scans are pre-processed by the slice sampling strategy to ensure that each scan contains the same number of slices. The input size of the network is bs × 1 × d × h × w, where bs is the batch size, d is the number of slices, and h and w indicate the height and width, respectively.

Experiments
Datasets and Pre-processing
In this paper, we use three publicly available datasets: CC-CCII (Zhang et al. 2020b), MosMedData (Morozov et al. 2020), and COVID-CTset (Rahimzadeh, Attar, and Sakhaei 2020). All three datasets consist of chest CT volumes. However, since the data format varies across the three datasets, it is necessary to pre-process each dataset so that they follow a unified way of reading data.

The original CC-CCII dataset contains a total of 617,775 slices from 6,752 CT scans of 4,154 patients, but it has five main problems (i.e., damaged data, non-unified data types, repeated and noisy slices, disordered slices, and non-segmented slices) that would have a high negative impact on model performance. To solve these problems, we manually remove the damaged, repeated, and noisy data. We then segment the lung part of each unsegmented slice image and convert the whole dataset to PNG format. After addressing the above problems, we obtain a clean CC-CCII dataset named Clean-CC-CCII, which consists of 340,190 slices from 3,993 scans of 2,698 patients.
Scan images construction
Table 2: The statistics of the three CT scan datasets.

Each CT scan contains a different number of slices, but DL models require inputs of the same dimensions. To this end, we propose two slice sampling algorithms: random sampling and symmetrical sampling. Specifically, the random sampling strategy is applied to the training set, where it can be regarded as data augmentation, while the symmetrical sampling strategy is performed on the test set to avoid introducing randomness into the testing results. The symmetrical sampling strategy refers to sampling from the middle to both sides at equal intervals. The relative order between slices remains the same before and after sampling.
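One possible reading of the two strategies, as a sketch (the function names and the exact interval computation are our assumptions, not the authors' code):

```python
import random

def symmetrical_sampling(n_slices, n_keep):
    """Deterministic test-time sampling: pick n_keep slice indices
    at (approximately) equal intervals across the scan, keeping
    the original slice order."""
    step = n_slices / n_keep
    return [int(i * step + step / 2) for i in range(n_keep)]

def random_sampling(n_slices, n_keep, rng):
    """Training-time sampling: draw n_keep distinct indices uniformly
    at random, then sort them so the relative slice order is preserved."""
    return sorted(rng.sample(range(n_slices), n_keep))

# e.g., reduce a 50-slice scan to 16 slices
test_idx = symmetrical_sampling(50, 16)
train_idx = random_sampling(50, 16, random.Random(0))
```

Both functions return sorted index lists, so the sampled sub-volume preserves the anatomical ordering of slices.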
Benchmarking
We use three manually designed 3D neural architectures as baseline methods: DenseNet3D121 (Diba et al. 2017), ResNet3D101 (Tran et al. 2018), and MC3_18 (Tran et al. 2018). As shown in Fig. 3, after building the scan volumes with the sampling algorithm, we further apply transformations to the scans, including resizing, center-cropping, and normalization. Besides, for the training set, we also perform a random flip in the horizontal or vertical direction. The other implementation details are as follows: we use the Adam (Kingma and Ba 2015) optimizer with a weight decay of 5e-4. We start with a learning rate of 0.001 and anneal it down to 1e-5. All baseline models are trained for 200 epochs.
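The random flip augmentation described above can be sketched as follows for a (depth, height, width) scan volume. This is an illustrative sketch, not the actual training code:

```python
import numpy as np

def random_flip(volume, rng):
    """Flip a (depth, height, width) volume in the vertical (H) or
    horizontal (W) direction with probability 0.5; the depth axis
    (slice order) is never flipped."""
    axis = rng.choice([1, 2])   # 1: vertical (H), 2: horizontal (W)
    if rng.random() < 0.5:
        volume = np.flip(volume, axis=axis)
    return volume

vol = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
out = random_flip(vol, np.random.default_rng(0))
```

Flipping only the spatial axes keeps the anatomical slice order intact, which matters for 3D models that exploit inter-slice context.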
NAS for CT Scan Classification
To verify the efficiency of the method, we apply the DNAS method combined with the Gumbel Softmax technique to search for neural architectures on the three datasets. The pipeline of our method is shown in Fig. 4, which contains two sequential stages: the search stage and the evaluation stage.
Search stage
In our experiments, the supernet consists of 6 cells with block counts of [4, 4, 4, 4, 4, 1]. Besides, the blocks within the same cell have the same number of channels. Here, we test two settings, small-scale and large-scale, where the numbers of channels of the blocks in the 6 cells are [24, …] and [32, …], respectively. We name the models searched under the two settings CovidNet3D-S and CovidNet3D-L, respectively. The stem block is a Conv3D-BN3D-ReLU6 sequential module with the number of output channels fixed to 32.

To improve searching efficiency, we set the input resolution to 64 × 64 and the number of slices in a scan to 16. We implement three independent search experiments for the three datasets. During the search stage, we split the training set into a training set D_T and a validation set D_V. In each step, we first use D_V to update the architecture parameters α, and then use the training set to update the sampled architecture weights ω_α. Besides, the architecture parameters α are optimized by the Adam (Kingma and Ba 2015) optimizer, and the architecture weights are optimized with the SGD optimizer with a momentum of 3e-4. The initial learning rate for both optimizers is 0.001. Each experiment is conducted on four Nvidia Tesla V100 GPUs (the 32GB PCIe version) and can be finished in about 2 hours. After each epoch, we save the sampled architecture and its performance (e.g., accuracy). Therefore, we generate 100 neural architectures for each experiment after the search stage.

Evaluation stage
As Fig. 4 shows, the search stage records the performance of the sampled architectures. In the evaluation stage, we select the top-10 architectures and train them on the training set for several batches; the best-performing architecture is then retrained for 200 epochs on the full training set and evaluated on the test set. We set different input resolutions for the three datasets to evaluate the generalization of the searched architectures. Besides, since the number of slices contained in the CT scans of different datasets differs, we set an intermediate value for each dataset, as shown in Table 3. Each evaluation experiment uses the same settings: we use the Adam (Kingma and Ba 2015) optimizer with an initial learning rate of 0.001, the cosine annealing scheduler (Loshchilov and Hutter 2017) is applied to adjust the learning rate, and we use cross-entropy as the loss function.
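The cosine annealing schedule (Loshchilov and Hutter 2017) has a simple closed form; a small sketch with the initial learning rate of 0.001 used above (the minimum rate of 0 is our assumption):

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=0.001, lr_min=0.0):
    """Cosine annealing: the learning rate decays from lr_max at
    epoch 0 to lr_min at epoch total_epochs along a half cosine."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs))

# schedule over the 200 training epochs used in the evaluation stage
lrs = [cosine_annealing_lr(e, 200) for e in range(201)]
```

The schedule decays slowly at first, fastest near the midpoint, and flattens again near the end, which often helps training settle into a good minimum.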
Results and Analysis
Evaluation Metrics
Figure 4: The pipeline of DNAS, which consists of two stages: the search stage and the evaluation stage.

Our experimental results are summarized in Table 3. We compare our searched models with SOTA efficient models. We use several commonly used evaluation metrics to compare model performance, as follows:
\mathrm{Precision} = \frac{TP}{TP + FP}   (5)

\mathrm{Sensitivity\ (Recall)} = \frac{TP}{TP + FN}   (6)

F_1\text{-score} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}}   (7)

\mathrm{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP}   (8)

Note that the positive and negative cases are assigned to the COVID-19 class and the non-COVID-19 class, respectively. Specifically, TP and TN indicate the numbers of correctly classified COVID-19 and non-COVID-19 scans, respectively, while FP and FN indicate the numbers of wrongly classified COVID-19 and non-COVID-19 scans, respectively. For the Clean-CC-CCII dataset, the non-COVID-19 class includes both normal and common pneumonia. The accuracy is the micro-averaged value over all test data, evaluating the overall performance of the model. Besides, we also take the model size as an evaluation metric to compare model efficiency.
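For clarity, Eqs. (5)-(8) can be computed directly from the four confusion counts; a small sketch with made-up counts (not results from the paper):

```python
def classification_metrics(tp, tn, fp, fn):
    """Metrics of Eqs. (5)-(8), with COVID-19 as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# illustrative confusion counts for a binary COVID-19 / non-COVID-19 split
p, r, f1, acc = classification_metrics(tp=90, tn=80, fp=10, fn=20)
```

Reporting all four together matters on imbalanced datasets such as MosMedData, where accuracy alone can hide a poor precision/recall trade-off.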
Results on the Three CT Datasets
Table 3 divides the results according to the datasets. We can see that our searched CovidNet3D models outperform all baseline models on the three datasets in terms of accuracy. Specifically, the CovidNet3D-L models achieve the highest accuracy on the three datasets. Besides, all CovidNet3D-S models have much smaller sizes than the baseline models, yet they can still achieve similar or even better results. For example, CovidNet3D-S (8.36 MB) achieves 94.27% accuracy on COVID-CTset, which is 41× smaller than ResNet3D101 (325.21 MB) with 0.4% higher accuracy. In summary, the results demonstrate that our DNAS method can discover well-performing models consistently across network sizes, input sizes, and scan depths (the number of slices).

Table 3: The experimental results of manually designed models and DNAS-designed models.

Dataset | Model | Model size (MB) | Input size | Depth | Acc (%) | Precision (%) | Sensitivity (%) | F1-score
Clean-CC-CCII | ResNet3D101 | 325.21 | 128 × 128 | 32 | 85.54 | 89.62 | 77.15 | 0.8292
Clean-CC-CCII | DenseNet3D121 | 43.06 | 128 × 128 | 32 | 87.02 | 88.97 | 82.78 | 0.8576
Clean-CC-CCII | MC3_18 | 43.84 | 128 × 128 | 32 | 86.16 | 87.11 | 82.78 | 0.8489
Clean-CC-CCII | CovidNet3D-S | 11.48 | 128 × 128 | 32 | 88.55 | 88.78 | … | …
Clean-CC-CCII | CovidNet3D-L | 53.26 | 128 × 128 | 32 | … | … | … | …
MosMedData | ResNet3D101 | 325.21 | 256 × 256 | 40 | 81.82 | 81.31 | 97.25 | …
MosMedData | DenseNet3D121 | 43.06 | 256 × 256 | 40 | 79.55 | … | … | …
MosMedData | MC3_18 | 43.84 | 256 × 256 | 40 | 80.40 | 79.43 | 98.43 | 0.8792
MosMedData | CovidNet3D-S | 12.48 | 256 × 256 | 40 | 81.17 | 78.82 | … | …
MosMedData | CovidNet3D-L | … | 256 × 256 | 40 | … | … | … | …
COVID-CTset | ResNet3D101 | 325.21 | 512 × 512 | 32 | 93.87 | 92.34 | … | …
COVID-CTset | DenseNet3D121 | 43.06 | 512 × 512 | 32 | 91.91 | 92.57 | 92.57 | 0.9257
COVID-CTset | MC3_18 | 43.84 | 512 × 512 | 32 | 92.57 | 90.95 | 94.55 | 0.9272
COVID-CTset | CovidNet3D-S | 8.36 | 512 × 512 | 32 | 94.27 | 92.68 | 90.48 | 0.9157
COVID-CTset | CovidNet3D-L | 62.82 | 512 × 512 | 32 | … | … | … | …

We can also see that the performance of both the baseline models and our CovidNet3D on the MosMedData dataset is not as good as that on the other two datasets. There are two possible reasons. One is that the MosMedData dataset's original data format is NIfTI, but none of our models converge when trained with NIfTI files; we therefore convert NIfTI to Portable Network Graphics (PNG) format, and this process loses information from the input files. The other possible reason is that the MosMedData dataset is imbalanced (shown in Table 2), which increases the difficulty of model training.

We also find through experiments that the random seed greatly influences the training of the searched CovidNet3D models. In other words, the results obtained by using different seeds for the same model can differ significantly. Hence, how to improve the robustness of NAS-based models is worth further exploration.

Interpretability
Although our model achieves promising results in detecting COVID-19 in CT images, the classification result itself does not help clinical diagnosis without showing that the inner mechanism leading to the final decision makes sense. To inspect the inner mechanism of our CovidNet3D model, we apply Class Activation Mapping (CAM) (Zhou et al. 2016) to it. CAM is an algorithm that can visualize the discriminative lesion regions that the model focuses on. In Fig. 5, we apply CAM to each slice of a whole 3D CT scan volume from the Clean-CC-CCII dataset. Regions that appear red and brighter have a larger impact on the model's decision to classify the scan as COVID-19.

Figure 5: The class activation mappings generated by CovidNet3D on a chest CT scan from the Clean-CC-CCII dataset. Regions colored red and brighter have more impact on the model's decision for the COVID-19 class, while blue and darker regions have less.

From the perspective of the scan volume, we can see that some slices have more impact on the model's decision than others. In terms of a single slice, the areas that CovidNet3D focuses on show ground-glass opacity, which has been shown to be a distinctive feature of chest CT images of COVID-19 (Bai et al. 2020). CAM thus enables the interpretability of our searched models (CovidNet3D), helping doctors quickly locate the discriminative lesion areas.
Conclusion
In this work, we introduce the differentiable neural architecture search (DNAS) framework combined with the Gumbel Softmax technique to search for 3D models on three open-source COVID-19 CT scan datasets. The results show that CovidNet3D, a family of models discovered by DNAS, can achieve comparable results to the baseline 3D models with smaller sizes, which demonstrates that NAS is a powerful tool for assisting in COVID-19 detection. In the future, we will apply NAS to the task of 3D medical image segmentation to locate the lesion areas in a more fine-grained manner.

Acknowledgments
This research was supported by grant RMGS2019 1 23 from the Hong Kong Research Matching Grant Scheme. We would like to thank the anonymous reviewers for their valuable comments.
References

[Ai et al. 2020] Ai, T.; Yang, Z.; Hou, H.; Zhan, C.; Chen, C.; Lv, W.; Tao, Q.; Sun, Z.; and Xia, L. 2020. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology.

[Ardakani et al. 2020] Ardakani, A. A.; Kanafi, A. R.; Acharya, U. R.; Khadem, N.; and Mohammadi, A. 2020. Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks. Computers in Biology and Medicine.

[Deng et al. 2009] Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; and Li, F. 2009. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 248–255. IEEE Computer Society.

[Diba et al. 2017] Diba, A.; Fayyaz, M.; Sharma, V.; Karami, A. H.; Arzani, M. M.; Yousefzadeh, R.; and Van Gool, L. 2017. Temporal 3D ConvNets: New architecture and transfer learning for video classification. arXiv preprint arXiv:1711.08200.

[Dong and Yang 2019] Dong, X., and Yang, Y. 2019. Searching for a robust neural architecture in four GPU hours. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, 1761–1770. Computer Vision Foundation / IEEE.

[Elsken, Metzen, and Hutter 2018] Elsken, T.; Metzen, J. H.; and Hutter, F. 2018. Neural architecture search: A survey. arXiv preprint arXiv:1808.05377.

[Faes et al. 2019] Faes, L.; Wagner, S.; Fu, D.; Liu, X.; Korot, E.; Ledsam, J.; Back, T.; Chopra, R.; Pontikos, N.; Kern, C.; Moraes, G.; Schmid, M.; Sim, D.; Balaskas, K.; Bachmann, L.; Denniston, A.; and Keane, P. 2019. Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study. The Lancet Digital Health.

[He, Zhao, and Chu 2021] He, X.; Zhao, K.; and Chu, X. 2021. AutoML: A survey of the state-of-the-art. Knowledge-Based Systems.

[Jang, Gu, and Poole 2017] Jang, E.; Gu, S.; and Poole, B. 2017. Categorical reparameterization with Gumbel-Softmax. In ICLR 2017. OpenReview.net.

[Jun et al. 2020] Jun, M.; Cheng, G.; Yixin, W.; Xingle, A.; Jiantao, G.; Ziqi, Y.; Minqing, Z.; Xin, L.; Xueyuan, D.; Shucheng, C.; Hao, W.; Sen, M.; Xiaoyu, Y.; Ziwei, N.; Chen, L.; Lu, T.; Yuntao, Z.; Qiongjie, Z.; Guoqiang, D.; and Jian, H. 2020. COVID-19 CT lung and infection segmentation dataset. Zenodo.

[Kim et al. 2019] Kim, S.; Kim, I.; Lim, S.; Baek, W.; Kim, C.; Cho, H.; Yoon, B.; and Kim, T. 2019. Scalable neural architecture search for 3D medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 220–228. Springer.

[Kingma and Ba 2015] Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. In Bengio, Y., and LeCun, Y., eds., ICLR 2015.

[Li et al. 2020] Li, L.; Qin, L.; Xu, Z.; Yin, Y.; Wang, X.; Kong, B.; Bai, J.; Lu, Y.; Fang, Z.; Song, Q.; et al. 2020. Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology.

[Loshchilov and Hutter 2017] Loshchilov, I., and Hutter, F. 2017. SGDR: Stochastic gradient descent with warm restarts. In ICLR 2017. OpenReview.net.

[Mobiny et al. 2020] Mobiny, A.; Cicalese, P. A.; Zare, S.; Yuan, P.; Abavisani, M.; Wu, C. C.; Ahuja, J.; de Groot, P. M.; and Van Nguyen, H. 2020. Radiologist-level COVID-19 detection using CT scans with detail-oriented capsule networks. arXiv preprint arXiv:2004.07407.

[Morozov et al. 2020] Morozov, S.; Andreychenko, A.; Pavlov, N.; Vladzymyrskyy, A.; Ledikhova, N.; Gombolevskiy, V.; Blokhin, I.; Gelezhe, P.; Gonchar, A.; Chernina, V.; and Babkin, V. 2020. MosMedData: Chest CT scans with COVID-19 related findings. medRxiv.

[Narin, Kaya, and Pamuk 2020] Narin, A.; Kaya, C.; and Pamuk, Z. 2020. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849.

[Pham et al. 2018] Pham, H.; Guan, M. Y.; Zoph, B.; Le, Q. V.; and Dean, J. 2018. Efficient neural architecture search via parameter sharing. In Dy, J. G., and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, 4092–4101. PMLR.

[Rahimzadeh, Attar, and Sakhaei 2020] Rahimzadeh, M.; Attar, A.; and Sakhaei, S. M. 2020. A fully automated deep learning-based network for detecting COVID-19 from a new and large lung CT scan dataset. medRxiv.

[Real et al. 2019] Real, E.; Aggarwal, A.; Huang, Y.; and Le, Q. V. 2019. Regularized evolution for image classifier architecture search. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 4780–4789. AAAI Press.

[Sandler et al. 2018] Sandler, M.; Howard, A. G.; Zhu, M.; Zhmoginov, A.; and Chen, L. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4510–4520. IEEE Computer Society.

[Singh et al. 2020] Singh, D.; Kumar, V.; Vaishali; and Kaur, M. 2020. Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks. European Journal of Clinical Microbiology & Infectious Diseases.

[Tan et al. 2019] Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; and Le, Q. V. 2019. MnasNet: Platform-aware neural architecture search for mobile. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, 2820–2828. Computer Vision Foundation / IEEE.

[Tran et al. 2018] Tran, D.; Wang, H.; Torresani, L.; Ray, J.; LeCun, Y.; and Paluri, M. 2018. A closer look at spatiotemporal convolutions for action recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6450–6459. IEEE Computer Society.

[Wu et al. 2019] Wu, B.; Dai, X.; Zhang, P.; Wang, Y.; Sun, F.; Wu, Y.; Tian, Y.; Vajda, P.; Jia, Y.; and Keutzer, K. 2019. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, 10734–10742. Computer Vision Foundation / IEEE.

[Zhang et al. 2020a] Zhang, J.; Xie, Y.; Li, Y.; Shen, C.; and Xia, Y. 2020a. COVID-19 screening on chest X-ray images using deep learning based anomaly detection. arXiv preprint arXiv:2003.12338.

[Zhang et al. 2020b] Zhang, K.; Liu, X.; Shen, J.; Li, Z.; Sang, Y.; Wu, X.; Zha, Y.; Liang, W.; Wang, C.; Wang, K.; et al. 2020b. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell.

[Zhao et al. 2020] Zhao, J.; He, X.; Yang, X.; Zhang, Y.; Zhang, S.; and Xie, P. 2020. COVID-CT-Dataset: A CT scan dataset about COVID-19.

[Zheng et al. 2020] Zheng, C.; Deng, X.; Fu, Q.; Zhou, Q.; Feng, J.; Ma, H.; Liu, W.; and Wang, X. 2020. Deep learning-based detection for COVID-19 from chest CT using weak label. medRxiv.

[Zhou et al. 2016] Zhou, B.; Khosla, A.; Lapedriza, À.; Oliva, A.; and Torralba, A. 2016. Learning deep features for discriminative localization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2921–2929. IEEE Computer Society.

[Zoph and Le 2017] Zoph, B., and Le, Q. V. 2017. Neural architecture search with reinforcement learning. In ICLR 2017. OpenReview.net.

[Zoph et al. 2018] Zoph, B.; Vasudevan, V.; Shlens, J.; and Le, Q. V. 2018. Learning transferable architectures for scalable image recognition. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018.