Comparative Evaluation of 3D and 2D Deep Learning Techniques for Semantic Segmentation in CT Scans
Abhishek Shivdeo, Rohit Lokwani, Viraj Kulkarni, Amit Kharat, Aniruddha Pant
DeepTek Inc
Abstract
Image segmentation plays a pivotal role in several medical-imaging applications by delineating the regions of interest. Deep learning-based approaches have been widely adopted for semantic segmentation of medical data. In recent years, in addition to 2D deep learning architectures, 3D architectures have been employed as predictive algorithms for 3D medical image data. In this paper, we propose a 3D stack-based deep learning technique for segmenting manifestations of consolidation and ground-glass opacities in 3D Computed Tomography (CT) scans. We also present a comparison between this 3D technique and a traditional 2D deep learning technique based on the segmentation results, the contextual information retained, and the inference time. We further define the area-plot, which represents the peculiar pattern observed in the slice-wise areas of the pathology regions predicted by these deep learning models. In our exhaustive evaluation, the 3D technique performs better than the 2D technique for the segmentation of CT scans: we obtain dice scores of 79% and 73% for the 3D and the 2D techniques, respectively. The 3D technique also results in a 5X reduction in inference time compared to the 2D technique. The results further show that the area-plots predicted by the 3D model are more similar to the ground truth than those predicted by the 2D model. Finally, we show how increasing the amount of contextual information retained during training can improve the 3D model's performance.
Introduction

Medical imaging techniques like X-rays, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), etc., provide precise anatomy of the human body and thus help detect abnormalities present in the body [1]. Effective and early identification of regions of infection in medical images can play a crucial role in assisting doctors in the treatment of various pathologies. For instance, finding early signs of pneumonia on X-rays, a disease which causes around 50,000 deaths per year in the US [2], and treating it in time can save many lives. However, X-rays compress a 3D volume into a single 2D image, which causes loss of information. X-rays also lack specificity when pathology regions are concealed by overlapping tissues, bones, or poor-contrast environments when detecting pathologies like COVID-19 [3]. This makes understanding and identifying a pathology using high-resolution CT scans a sought-after medical diagnosis technique [4].

CT scans are 3D medical images that comprise several slices, or images similar to X-rays, stacked upon each other, which combine to give a volumetric representation of the interior of the body [5]. Nevertheless, classifying and marking regions of interest in CT scans requires significant effort from radiologists. Hence, automated detection and segmentation of pathologies in CT scans, which reduces a radiologist's involvement, is seen as an essential tool for diagnosing and treating disease.

The rapid research and development in machine learning and graphics processing technologies, along with the availability of large amounts of data [6] [7] [8] [9], have advanced the field of computer vision [10] [6]. The availability of high-quality medical image datasets [11], combined with rapid advancement in CNN-based architectures, has led to increased adoption of deep learning models to assist radiologists in evaluating CT scans. Deep learning models are used to detect, classify, and segment fractures, tumors, and other pathologies in CT scans. However, there is a lot of variation in the contrast of images based on the different radiation doses [12] [13] given to patients. The quality of CT scanners across hospitals and the slice thickness can also differ from scan to scan in a multi-sourced dataset, making it challenging to train machine learning models for CT scans. Also, selecting a proper CT window, by manipulating the Hounsfield unit (HU) values in a CT, can affect the model's performance [14]. A deep learning model needs to be robust enough to handle these variations, or it may experience a covariance shift when tested on out-of-source data. Studies have tried to mitigate these effects with image noise reduction methods that reduce the radiation dose of CT imaging [15] [16].

Researchers have adopted both 2D and 3D approaches [17]. Studies conducted by Zhou et al. [18] compare the segmentation performance of 2D and 3D deep learning-based approaches using conventional segmentation metrics such as the dice score. In a 2D deep learning technique [19], the input to the model is a single 2D image, whereas in a 3D deep learning technique [20], the model takes a 3D volume as its input. Both approaches employ a Fully Convolutional Network (FCN) for segmentation. As opposed to training slice by slice in 2D FCNs, 3D FCNs analyze volumetric input data and utilize the global features shared between the CT slices.
Just like a word in a sentence gives a clue about what the next word could be [21] [22], a CT scan's slice can give a clue about the shape of the pathology in its adjacent slices. This is because, for most pathologies (consolidation and ground-glass opacities here), the regions of interest (ROI), or the areas of the pathology in a scan, follow a continuous pattern. As the slices of a CT scan are captured in a particular order, continuity of manifestations is observed in adjacent slices; the same can be seen in Figure 6. Using 3D models for the segmentation of CT scans is similar to using LSTMs with an attention module [23] for the formation of a sentence. This contextual information is lost when we use individual 2D slices, because the 2D model makes its prediction by considering a single slice as one data point; the prediction is thus not affected by the adjacent slices, which is evident from the results of our experiments (Figure 7).

In this paper, we propose a 3D technique for the segmentation of consolidation and ground-glass opacities in CT scans and present a comparison between this 3D technique and a traditional 2D technique. We compare the dice scores between the predicted masks and the radiologists' annotations for both techniques. We also plot the areas of the predicted masks against the position of the slice in the CT scan for both the 2D and 3D techniques, and compare these with the ground-truth area-plots. Finally, we compare the inference time of the two techniques, which is an important factor when a model runs inference in real-life situations post-deployment.

Related Work

Recent studies have emphasized the use of deep learning for medical imaging analysis. The problems solved using deep learning can be broadly classified into image classification and semantic segmentation. Convolutional Neural Networks (CNNs) are commonly used for image classification. Badea et al. [24] used LeNet [25] and NiN (Network in Network) [26] for classifying burns on the human body from camera images of size 320 x 240 and achieved accuracies of 75.91% and 58.01% for the classification of Skin vs. Burn and Skin vs. Light Burn vs. Serious Burn, respectively. Polsinelli et al. [27] used SqueezeNet, a CNN architecture, for classifying CT scan slices into COVID-19 or non-COVID-19 with an accuracy of 85%. Classification is simpler than segmentation because in classification all the pixels in an image are grouped into a single class, while in semantic segmentation each pixel must be assigned a class.

Image segmentation was initially addressed using conventional image processing approaches. Okada et al. [28] proposed an image processing technique for multi-organ segmentation that used statistical shape modeling and a probabilistic atlas; the segmentation of organs was done by combining intra-organ information with the inter-organ correlation, yielding average dice coefficients of 92% for the liver, spleen, and kidneys, and dice coefficients of 73% and 67% for the pancreas and gallbladder, respectively. This conventional approach demonstrated highly accurate multiple-organ segmentation for CT scans and presented a detailed evaluation of the observations.

After the development of encoder-decoder architectures [29], they became commonly used for segmentation. U-Net, a 2D deep learning approach proposed by Ronneberger et al. [30], segmented a single (512, 512) image in under a second on an NVidia Titan GPU.
U-Net was fast, efficient, and accurate, and thus was the first widely used deep learning architecture for image segmentation tasks on medical image data [30]. Christ et al. [31] cascaded two FCNs to segment the liver and its lesions from CT and MRI scans and achieved an accuracy of around 94% on the validation set in under 100 seconds per volume. Almotairi et al. [32] applied another deep learning architecture, SegNet, which employs a trained VGG-16 image classification network as its encoder and a corresponding decoder for pixel-wise classification at the end; it was able to achieve an accuracy of 99.99% for segmenting a liver tumor. In these 2D approaches, the slices of the MRI and CT scans in the dataset are treated as individual 2D images, which means that the 3D volumetric data is transformed into 2D planar data. Other such 2D image segmentation approaches have also been implemented [33] [34] [35] [36] [37]. Zhou et al. [38] proposed a segmentation approach in which 2D slice-wise results were later combined using 3D majority voting: a simple encoder-decoder network was made part of an all-in-one network that could segment complicated multiple organs, and it correctly segmented 89% of the voxels in the CT scans.

However, in recent years, due to improved 3D convolution architectures and advancements in computational power (GPUs), training highly complex 3D deep learning models that take a 3D volume as input has become much more accurate, efficient, and fast [39]. Cicek et al. [40] proposed a 3D U-Net architecture that predicted volumetric segmentation from 2D annotated slices, achieving an average Intersection over Union (IoU) of 0.863. They [40] were able to annotate unseen data as well as densify sparsely annotated data.

Milletari et al. [41] proposed V-Net, a novel fully convolutional neural network for volumetric medical image segmentation, which gave an average dice score of 86% for segmenting the prostate depicted in 30 MRI scans. These MRIs were converted into a constant volume of 128 × 128 × 64 using B-spline interpolation, which alters the global features during the conversion and can have detrimental effects on training.

VoxResNet, proposed by Chen et al. [42], borrows the spirit of deep residual learning from 2D image recognition tasks and extends it into a 3D variant for handling volumetric data; it has also been successfully applied to 3D medical image segmentation tasks. Alalwan et al. [43] proposed another 3D FCN architecture called "3D-DenseUNet-569" for liver and tumor segmentation, which used depthwise separable convolution (DS-Conv) as opposed to traditional convolution.

Although 3D FCNs provide more accurate results than 2D ones, they are more complex and require higher memory along with greater computational resources [44]. The higher complexity restrains the model from training on a larger dataset efficiently. Moreover, the high memory footprint leads to reduced network depth and filter size, which adversely affects the performance of the model [45].
Dataset

For our experiments, we obtained 182 CT scans from two private Indian hospitals. These CT scans have non-uniform volumes and are annotated for consolidation and ground-glass opacities [46]. Our team of expert radiologists marked out the regions of infection in the form of free-hand annotations, which served as the ground truth for our model. These precise annotations were done using ITK-SNAP, an open-source free-hand annotation tool [47]. Some examples of CT slices and their superimposed masks are shown in Figure 1 (a) and Figure 1 (b).
Figure 1: (a, b) CT slice (left) and free-hand annotated CT slice (right)
Dataset      Number of CT Scans   Number of Slices
Training     126                  56387
Validation   20                   9992
Test         36                   14727
Table 1: Scan-level and Slice-level Dataset splits
The positive class, COVID-19, comprised consolidation and ground-glass opacities. These chest CT scans were divided into training, validation, and test datasets, whose splits are given in Table 1. The prevalence for all the datasets is 20%, which means that 20% of the slices in each dataset are positive. We resized all the images to a standard image size of (512, 512). Windowing, also known as grey-level mapping, is a preprocessing step for CT scans in which the grey-scale component of the CT slice is changed to highlight particular features. We applied windowing to our scans [48] using the information stored in the metadata of the DICOM files of the CT scans' slices. The masks were stored as binary images of size (512, 512). The distribution of the number of slices in the CT scans for the whole dataset can be visualized from the histogram in Figure 2.
Figure 2: Volume variation in the CT scans for the whole dataset
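As an illustration of this preprocessing step, the sketch below applies windowing to a single slice. It is a minimal sketch assuming pydicom and the standard DICOM rescale and window tags (RescaleSlope, RescaleIntercept, WindowCenter, WindowWidth); window_slice is a hypothetical helper, not our exact pipeline.

```python
import numpy as np
import pydicom

def window_slice(dicom_path, center=None, width=None):
    """Apply CT windowing (grey-level mapping) to one DICOM slice.

    Falls back to the WindowCenter/WindowWidth stored in the slice's
    DICOM metadata, as described in the text above.
    """
    ds = pydicom.dcmread(dicom_path)
    # Convert raw pixel values to Hounsfield units (HU).
    hu = ds.pixel_array.astype(np.float32) * float(ds.RescaleSlope) \
         + float(ds.RescaleIntercept)
    # These tags may be multi-valued; take the first value if so.
    wc = center if center is not None else float(np.atleast_1d(ds.WindowCenter)[0])
    ww = width if width is not None else float(np.atleast_1d(ds.WindowWidth)[0])
    lo, hi = wc - ww / 2.0, wc + ww / 2.0
    # Clip HU values to the window and rescale to [0, 1] for the model.
    return np.clip((hu - lo) / (hi - lo), 0.0, 1.0)
```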
Methods

In this paper, we follow two approaches to address image segmentation of these CT scans: a 2D slice-based technique and a 3D stack-based technique.
Figure 3: 2D Technique
2D Technique

In this approach, we divide the CT scan volume into separate 2D slices (Figure 3). For example, a CT scan of volume (512, 512, 601), containing 601 slices, was divided into 601 individual images of size (512, 512). We used a U-Net with the Convolutional Block Attention Module (CBAM) [49] to focus attention on a region of the image and then segment the pathology from that region. U-Nets are fully convolutional networks with skip connections between the encoder and decoder, which provide the deconvolution layers with important features [50]. We used Xception [51], which has depthwise separable convolutions and residual connections, as the encoder. The model was trained using ADAM as its optimizer with an initial learning rate of 1e-3 and a learning rate scheduler that reduced the learning rate to 1/3 of the original learning rate every 5 epochs. We used dice loss as the loss function. Our architecture had 38 million trainable parameters.
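The loss and learning-rate schedule described above can be sketched in Keras as follows. This is a minimal sketch under stated assumptions: build_unet_cbam, train_ds, and val_ds are hypothetical stand-ins for our architecture and data pipeline, the smoothing constant in the dice loss is an assumption, and the schedule is interpreted as dividing the current rate by 3 every 5 epochs.

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft dice loss over flattened binary masks (smooth term assumed)."""
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
    return 1.0 - dice

def schedule(epoch, lr):
    """Reduce the learning rate to 1/3 of its current value every 5 epochs."""
    return lr / 3.0 if epoch > 0 and epoch % 5 == 0 else lr

model = build_unet_cbam(input_shape=(512, 512, 1))  # hypothetical builder
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=dice_loss)
model.fit(train_ds, validation_data=val_ds, epochs=50,  # epoch count assumed
          callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule)])
```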
3D Technique

In the 3D approach, instead of assigning 2D images as the input for the model, we provide 3D volumes as input, with the corresponding stacked annotated slices as the label. For our 3D approach, we use V-Net [41], which is a 3D implementation of U-Net. For our input size of (512, 512, 32), V-Net had 206 million trainable parameters. We used the dice loss as the loss function while training. The model was trained using ADAM as its optimizer with an initial learning rate of 1e-3 and a learning rate scheduler that reduced the learning rate to 1/3 of the original learning rate every 5 epochs. The input to V-Net is volumetric data of dimension (x, y, z) = (512, 512, 32), where x is the slice height, y is the slice width, and z is the number of slices along the depth of the volume. To prepare the input data for training V-Net, we standardize the volume of the CT scans to a common dimension equal to the input dimension of the model. To satisfy this condition without losing any information, we split CTs into smaller volumes that match the input dimensions. The CT scan being split into multiple stacks of slices can be seen in Figure 4.
Figure 4: 3D Technique
Three variables affect the stack creation process:
1. CT volume: the number of slices in the CT scan.
2. Stack size: the desired number of slices in the sub-volume.
3. Overlap factor = (number of overlapping slices) / (stack size), where the number of overlapping slices is the number of slices common to adjacent stacks.

For a CT scan having 601 slices, with a stack size of 32 and 20 overlapping slices (overlap factor = 0.625), we get a list of 49 stacks of dimensions (512, 512, 32). The first stack covers slice indices 0 to 32 (exclusive), which means that the first input data point for V-Net is the CT sub-volume from the 1st slice to the 32nd slice. The second input data point is the sub-volume of the CT from index 12 to 44 (exclusive); because we have 20 overlapping slices, the second stack starts from index 12 and not index 32. The same holds for the rest of the stacks. For the last stack, however, the indices are (576, 608); the 7 extra slices are padding added to keep the volume of the stack compatible with the 3D model's input dimension. During inference, we keep the overlap factor at 0 and stack the list of predicted mask volumes together, so during inference we have a total of 19 stacks for this scan. We remove the padding when the predictions are stacked together so that the result matches the volume of the whole CT scan.
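A minimal sketch of this stack-creation procedure is given below, assuming NumPy volumes of shape (H, W, num_slices) and zero-padding for the final stack; make_stacks and reassemble are illustrative helpers, not the exact implementation.

```python
import numpy as np

def make_stacks(volume, stack_size=32, overlap_factor=0.625):
    """Split a CT volume of shape (H, W, num_slices) into overlapping
    stacks of shape (H, W, stack_size); the last stack is zero-padded."""
    num_overlap = int(round(overlap_factor * stack_size))
    stride = stack_size - num_overlap        # e.g. 32 - 20 = 12
    depth = volume.shape[2]
    stacks, start = [], 0
    while start < depth:
        stack = volume[:, :, start:start + stack_size]
        if stack.shape[2] < stack_size:      # pad the final stack
            pad = stack_size - stack.shape[2]
            stack = np.pad(stack, ((0, 0), (0, 0), (0, pad)))
        stacks.append(stack)
        if start + stack_size >= depth:      # last stack reached the end
            break
        start += stride
    return np.stack(stacks), depth           # (num_stacks, H, W, stack_size)

def reassemble(pred_stacks, depth):
    """Concatenate non-overlapping predicted stacks (inference uses
    overlap factor 0) and drop the padding added to the last stack."""
    return np.concatenate(list(pred_stacks), axis=2)[:, :, :depth]
```

With the settings from the text (601 slices, stack size 32, overlap factor 0.625), this yields the 49 training stacks described above, and with an overlap factor of 0 it yields the 19 inference stacks.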
Results

We evaluate the 2D and the 3D models based on five criteria: the dice score, the predicted masks, the area-plots, the inference time, and the effect of the overlap factor.
Dice Score

We predicted masks for 32 scans in the test set, having a prevalence of 20%, and calculated the dice score for the whole dataset. We observed the dice scores given in Table 2.

Model      Dice Score
2D Model   73%
3D Model   79%
Table 2: Dice Scores for 2D and 3D techniques
We took the average of the dice scores of these 32 scans. For the 2D model, we arranged the individual slice predictions on top of each other to obtain the final predicted volume. For the 3D model, we combined the stacks of an individual CT and then removed the corresponding padding to match the true label volume.
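The scan-level dice computation can be sketched as follows, assuming the reassembled predicted volume and the ground-truth mask as NumPy arrays of the same shape; the 0.2 binarization threshold is the one mentioned in the next subsection, and the epsilon term is an assumption to avoid division by zero.

```python
import numpy as np

def dice_score(pred_probs, true_mask, threshold=0.2, eps=1e-7):
    """Dice coefficient between a binarized predicted volume and the
    ground-truth binary mask of the same shape."""
    pred = (pred_probs >= threshold).astype(np.float32)
    true = true_mask.astype(np.float32)
    intersection = np.sum(pred * true)
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Scan-level evaluation: average the per-scan dice over the test set.
# scores = [dice_score(pred_volume, true_volume)
#           for pred_volume, true_volume in test_pairs]  # hypothetical pairs
# mean_dice = float(np.mean(scores))
```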
Predicted Masks

Figures 5 (a) and 5 (b) show the predicted masks at the scan and slice level for the 2D and the 3D models. We applied a threshold of 0.2 on the predictions.
Figure 5: (a) 3D model, (b) 2D model. Left: original CT slice; middle: true label superimposed; right: predicted mask superimposed.
Area-Plots

Here, we compute the area of the annotated region relative to the total area of the image for each slice. We normalize this list of area ratios by dividing all the values by the maximum value in the list. Next, we plot these normalized values against the position of the slice in the CT scan to get the plot in Figure 6; we call these plots area-plots. We follow the same procedure for the masks predicted by the 2D and 3D models, plotting their normalized area-plots against the position of the slices in the CT scan in Figures 7 and 8, respectively. We compare the predicted area-plots of both the 2D and the 3D approaches to the area-plots of the true label.
Figure 6: True label's area-plot
Figure 7: Area-plot predicted by the 2D model
Figure 8: Area-plot predicted by the 3D model
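The area-plot computation described above can be sketched as follows. It is a minimal sketch assuming binary mask volumes of shape (H, W, num_slices) and matplotlib for plotting; plot_area_curve is an illustrative helper.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_area_curve(mask_volume, label):
    """Plot the normalized per-slice mask area against the slice index.

    mask_volume: binary masks of shape (H, W, num_slices).
    """
    h, w, depth = mask_volume.shape
    # Fraction of each slice's area covered by the pathology mask;
    # reshaping to (H*W, depth) keeps one column per slice.
    ratios = mask_volume.reshape(h * w, depth).sum(axis=0) / float(h * w)
    # Normalize by the maximum ratio so the curve peaks at 1.
    ratios = ratios / (ratios.max() + 1e-7)
    plt.plot(np.arange(depth), ratios, label=label)
    plt.xlabel("Slice index")
    plt.ylabel("Normalized mask area")
    plt.legend()
```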
1. From Figure 6, we observe a continuous pattern in the manifestations of the pathology in the CT scan. Similar continuous patterns were observed in the area-plots of the rest of the CT scans.
2. From Figure 7, we observe abrupt variations in the predicted mask's area across the slices within a CT scan when we use the 2D approach.
3. From Figure 8, we can see the predicted area-plot for a CT scan when using the 3D approach. We observe a smooth and continuous pattern in the masks predicted by the 3D model for the slices of that CT scan.

To further examine point 2, we plotted the predicted masks for consecutive slices in a CT scan using the 2D and the 3D approaches. For the four consecutive positive slices that we chose, the 3D model predicted all the slices as positive, i.e., (1, 1, 1, 1), whereas the 2D model's predictions were (0, 1, 0, 0). From Figure 9, we observe that the masks predicted by the 3D approach in all 4 consecutive slices closely match the labels.

Figure 9: Predicted masks in four consecutive slices (a, b, c, d). Left: original slice; middle: true mask; right: predicted mask. 2D (top three images) and 3D (bottom three images).

This represents the continuity in the predictions. We also observe that the shape and area of the predicted masks do not change abruptly across adjacent slices. In the masks predicted by the 2D approach, however, only some of the slices were marked positive. This represents the discontinuity in the predicted masks' pattern and indicates that the 2D approach does not consider the information in the slices adjacent to the input slice, giving an abrupt prediction pattern that is independent of the slices above and below the current slice.
Inference Time

We calculate the inference time of the 2D and 3D techniques for a single CT scan.

Technique   Inference time with GPU   Inference time without GPU
2D          70                        1145
3D          17                        229
Table 3: Inference time for 2D and 3D techniques in seconds
Table 3 shows the inference times for a single CT scan having 709 slices, with a stack size of 32 and an image size of (512, 512). We observe a roughly 5X speed-up for the 3D approach compared to the 2D approach. For inference, we used a Tesla T4 GPU with 15 GB of memory.
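Per-scan inference time can be measured with a rough wall-clock sketch like the one below, assuming a Keras-style model and the stacked inputs produced by make_stacks above; time_inference is a hypothetical helper, and a warm-up call is included so one-time graph construction is not counted.

```python
import time

def time_inference(model, stacks, runs=3):
    """Average wall-clock time of model inference over one CT's stacks."""
    model.predict(stacks[:1])             # warm-up (graph build, transfers)
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(stacks)
    return (time.perf_counter() - start) / runs
```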
Overlap Factor Experiment

In this experiment, we changed the overlap factor to see its effect on the predictions. We set the overlap factor to 0, 0.375, and 0.625. The predicted area-plots are given in Figures 11, 12, and 13, respectively, and the predicted masks are given in Figures 14 (a), 14 (b), and 14 (c), respectively. Figure 10 shows the true area-plot for the same CT scan.

Figure 10: True label's area-plot
Figure 11: Area-plot predicted by the 3D model, with overlap factor = 0
Figure 12: Area-plot predicted by the 3D model, with overlap factor = 0.375
Figure 13: Area-plot predicted by the 3D model, with overlap factor = 0.625

As we increase the overlap factor, the number of overlapping slices increases. This allows the model to interpret more global features. We observe that the predicted area-plot in Figure 13, with an overlap factor of 0.625, matches the true area-plot in Figure 10 most closely. This shift in the pattern from Figure 11 to Figure 13 demonstrates that increasing the amount of contextual information retained allows the model to perform better, giving predictions closer to the ground truth.
Figure 14: Predicted mask for the same slice with (a) overlap factor = 0, (b) overlap factor = 0.375, (c) overlap factor = 0.625. Left: original slice; middle: true mask; right: predicted mask.
Conclusion

We implement and compare two deep learning approaches for segmenting manifestations of consolidation and ground-glass opacities in CT scans on three major grounds: the predicted masks or segmentation results, the pattern observed in the area of the predicted masks within a CT scan, and the inference time.

We saw that the 3D approach provided a better dice score than the 2D approach, proving to be more accurate at segmenting pathology regions. We also observed from Figure 8 that the area-plots obtained using the 3D approach match the original annotation area-plots more closely than the area-plots of the 2D approach. We attribute these peculiar predicted patterns in the area-plots to the contextual information retained in the 3D stack volumes used to train the 3D model. The independent nature of the input images in the 2D model could be the reason for the discontinuous pattern in the predicted area-plots of the 2D approach. When we calculated the inference time, the 3D approach provided a 5X speed-up compared to the 2D approach. This is hugely beneficial for effective and quick diagnosis of patients in hospital settings, especially in Intensive Care Units (ICUs). We conclude that this 3D stack-based approach is a better choice than the conventional 2D approach.

Later, when we experimented with the overlap factor for the 3D approach, the predicted area-plots' patterns changed with increasing overlap factor (Figures 11, 12, 13). However, a higher overlap factor may result in overfitting the model to the given dataset; the model becomes more susceptible to a covariance shift on out-of-sample datasets when the overlap factor is high. Thus, the overlap factor should be treated as a hyperparameter of this 3D approach and should be set optimally.

Usually, in the case of 3D segmentation of CT scans, the whole CT scan is compressed, using spline interpolation [41] or other methods, to the size of the input dimension of a 3D model. This removes useful information when compressed for training and adds noise when decompressed for inference [52]. To avoid this alteration of the original data, we split the CT into smaller stacks to retain the valuable contextual information. Our stack-based 3D approach is therefore highly adaptable to any CT volume, since none of the information between the slices in the CT is altered.

The segmentation results could be improved further with more data and different augmentation techniques during training. In our case study, the goal was to compare the 2D and the 3D approaches on equal grounds, not to optimize the segmentation result itself. In conclusion, the 3D approach proposed in this paper outperformed the traditional 2D approach on all three criteria we set for evaluation.
References

[1] A. Umar and S. Atabo, "A review of imaging techniques in scientific research/clinical diagnosis," MOJ Anat & Physiol, vol. 6, no. 5, pp. 175–183, 2019.
[2] Centers for Disease Control and Prevention, et al., "Pneumonia can be prevented–vaccines can help," 2012.
[3] J. Zhang, Y. Xie, Z. Liao, G. Pang, J. Verjans, W. Li, Z. Sun, J. He, Y. Li, C. Shen, et al., "Viral pneumonia screening on chest x-ray images using confidence-aware anomaly detection," arXiv preprint arXiv:2003.12338, vol. 3, 2020.
[4] K. Doi, "Computer-aided diagnosis in medical imaging: historical review, current status and future potential," Computerized Medical Imaging and Graphics, vol. 31, no. 4-5, pp. 198–211, 2007.
[5] O. H. Karatas and E. Toy, "Three-dimensional imaging techniques: A literature review," European Journal of Dentistry, vol. 8, no. 1, p. 132, 2014.
[6] L. Rodríguez-Mazahua, C.-A. Rodríguez-Enríquez, J. L. Sánchez-Cervantes, J. Cervantes, J. L. García-Alcaraz, and G. Alor-Hernández, "A general perspective of big data: applications, tools, challenges and trends," The Journal of Supercomputing, vol. 72, no. 8, pp. 3073–3113, 2016.
[7] M. Minsky, "Steps toward artificial intelligence," Proceedings of the IRE, vol. 49, no. 1, pp. 8–30, 1961.
[8] I. M. Cockburn, R. Henderson, and S. Stern, "The impact of artificial intelligence on innovation," tech. rep., National Bureau of Economic Research, 2018.
[9] L. Yang, "Research on application of artificial intelligence based on big data background in computer network technology," Journal of Jiujiang Vocational & Technical College, vol. 392, no. 6, 2018.
[10] Y. Lu and S. Young, "A survey of public datasets for computer vision tasks in precision agriculture," Computers and Electronics in Agriculture, vol. 178, p. 105760, 2020.
[11] M. D. Kohli, R. M. Summers, and J. R. Geis, "Medical image data and datasets in the era of machine learning—whitepaper from the 2016 C-MIMI meeting dataset session," Journal of Digital Imaging, vol. 30, no. 4, pp. 392–399, 2017.
[12] R. Smith-Bindman, Y. Wang, P. Chu, R. Chung, A. J. Einstein, J. Balcombe, M. Cocker, M. Das, B. N. Delman, M. Flynn, et al., "International variation in radiation dose for computed tomography examinations: prospective cohort study," BMJ, vol. 364, 2019.
[13] S. Trattner, G. D. Pearson, C. Chin, D. D. Cody, R. Gupta, C. P. Hess, M. K. Kalra, J. M. Kofler Jr, M. S. Krishnam, and A. J. Einstein, "Standardization and optimization of CT protocols to achieve low dose," Journal of the American College of Radiology, vol. 11, no. 3, pp. 271–278, 2014.
[14] Z. Xue, S. Antani, L. R. Long, D. Demner-Fushman, and G. R. Thoma, "Window classification of brain CT images in biomedical articles," in AMIA Annual Symposium Proceedings, vol. 2012, p. 1023, American Medical Informatics Association, 2012.
[15] M. J. Willemink and P. B. Noël, "The evolution of image reconstruction for CT—from filtered back projection to artificial intelligence," European Radiology, vol. 29, no. 5, pp. 2185–2195, 2019.
[16] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, "Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
[17] I. R. I. Haque and J. Neubert, "Deep learning approaches to biomedical image segmentation," Informatics in Medicine Unlocked, vol. 18, p. 100297, 2020.
[18] X. Zhou, K. Yamada, T. Kojima, R. Takayama, S. Wang, X. Zhou, T. Hara, and H. Fujita, "Performance evaluation of 2D and 3D deep learning approaches for automatic segmentation of multiple organs on CT images," in Medical Imaging 2018: Computer-Aided Diagnosis, vol. 10575, p. 105752C, International Society for Optics and Photonics, 2018.
[19] A. D. Weston, P. Korfiatis, T. L. Kline, K. A. Philbrick, P. Kostandy, T. Sakinis, M. Sugimoto, N. Takahashi, and B. J. Erickson, "Automated abdominal segmentation of CT scans for body composition analysis using deep learning," Radiology, vol. 290, no. 3, pp. 669–679, 2019.
[20] W. Zhao, D. Jiang, J. P. Queralta, and T. Westerlund, "MSS U-Net: 3D segmentation of kidneys and tumors from CT images with a multi-scale supervised U-Net," Informatics in Medicine Unlocked, p. 100357, 2020.
[21] A. Akbik, D. Blythe, and R. Vollgraf, "Contextual string embeddings for sequence labeling," in Proceedings of the 27th International Conference on Computational Linguistics, pp. 1638–1649, 2018.
[22] A. Hassan and A. Mahmood, "Deep learning for sentence classification," pp. 1–5, IEEE, 2017.
[23] C. Olah and S. Carter, "Attention and augmented recurrent neural networks," Distill, vol. 1, no. 9, p. e1, 2016.
[24] M.-S. Badea, I.-I. Felea, L. M. Florea, and C. Vertan, "The use of deep learning in image segmentation, classification and detection," arXiv preprint arXiv:1605.09612, 2016.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[26] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[27] M. Polsinelli, L. Cinque, and G. Placidi, "A light CNN for detecting COVID-19 from CT scans of the chest," arXiv preprint arXiv:2004.12837, 2020.
[28] T. Okada, M. G. Linguraru, M. Hori, R. M. Summers, N. Tomiyama, and Y. Sato, "Abdominal multi-organ segmentation from CT images using conditional shape–location and unsupervised intensity priors," Medical Image Analysis, vol. 26, no. 1, pp. 1–18, 2015.
[29] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
[30] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
[31] P. F. Christ, F. Ettlinger, F. Grün, M. E. A. Elshaera, J. Lipkova, S. Schlecht, F. Ahmaddy, S. Tatavarty, M. Bickel, P. Bilic, et al., "Automatic liver and tumor segmentation of CT and MRI volumes using cascaded fully convolutional neural networks," arXiv preprint arXiv:1702.05970, 2017.
[32] S. Almotairi, G. Kareem, M. Aouf, B. Almutairi, and M. A.-M. Salem, "Liver tumor segmentation in CT scans using modified SegNet," Sensors, vol. 20, no. 5, p. 1516, 2020.
[33] X. Zhou, T. Ito, R. Takayama, S. Wang, T. Hara, and H. Fujita, "First trial and evaluation of anatomical structure segmentations in 3D CT images based only on deep learning," Medical Imaging and Information Sciences, vol. 33, no. 3, pp. 69–74, 2016.
[34] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[35] A. de Brebisson and G. Montana, "Deep neural networks for anatomical brain segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 20–28, 2015.
[36] H. R. Roth, A. Farag, L. Lu, E. B. Turkbey, and R. M. Summers, "Deep convolutional networks for pancreas segmentation in CT imaging," in Medical Imaging 2015: Image Processing, vol. 9413, p. 94131G, International Society for Optics and Photonics, 2015.
[37] K. H. Cha, L. Hadjiiski, R. K. Samala, H.-P. Chan, E. M. Caoili, and R. H. Cohan, "Urinary bladder segmentation in CT urography using deep-learning convolutional neural network and level sets," Medical Physics, vol. 43, no. 4, pp. 1882–1896, 2016.
[38] X. Zhou, T. Ito, R. Takayama, S. Wang, T. Hara, and H. Fujita, "Three-dimensional CT image segmentation by combining 2D fully convolutional network with 3D majority voting," in Deep Learning and Data Labeling for Medical Applications, pp. 111–120, Springer, 2016.
[39] A. Kayid, Y. Khaled, and M. Elmahdy, "Performance of CPUs/GPUs for deep learning workloads," The German University in Cairo, 2018.
[40] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 424–432, Springer, 2016.
[41] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571, IEEE, 2016.
[42] H. Chen, Q. Dou, L. Yu, and P.-A. Heng, "VoxResNet: Deep voxelwise residual networks for volumetric brain segmentation," arXiv preprint arXiv:1608.05895, 2016.
[43] N. Alalwan, A. Abozeid, A. A. ElHabshy, and A. Alzahrani, "Efficient 3D deep learning model for medical image semantic segmentation," Alexandria Engineering Journal, vol. 60, no. 1, pp. 1231–1239.
[44] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A. Heng, "H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes," IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2663–2674, 2018.
[45] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[46] M. Chung, A. Bernheim, X. Mei, N. Zhang, M. Huang, X. Zeng, J. Cui, W. Xu, Y. Yang, Z. A. Fayad, et al., "CT imaging features of 2019 novel coronavirus (2019-nCoV)," Radiology, vol. 295, no. 1, pp. 202–207, 2020.
[47] P. A. Yushkevich, Y. Gao, and G. Gerig, "ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images," in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 3342–3345, IEEE, 2016.
[48] H. Lee, M. Kim, and S. Do, "Practical window setting optimization for medical image deep learning," arXiv preprint arXiv:1812.00572, 2018.
[49] S. Woo, J. Park, J.-Y. Lee, and I. So Kweon, "CBAM: Convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19, 2018.
[50] M. H. Hesamian, W. Jia, X. He, and P. Kennedy, "Deep learning techniques for medical image segmentation: Achievements and challenges," Journal of Digital Imaging, vol. 32, no. 4, pp. 582–596, 2019.
[51] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258, 2017.
[52] K. Hahn, H. Schöndube, K. Stierstorfer, J. Hornegger, and F. Noo, "A comparison of linear interpolation models for iterative CT reconstruction."