Interpretative Computer-aided Lung Cancer Diagnosis: from Radiology Analysis to Malignancy Evaluation
Shaohua Zheng a, Zhiqiang Shen a, Chenhao Pei a, Wangbin Ding a, Haojin Lin a, Jiepeng Zheng b, Lin Pan a,∗, Bin Zheng b,∗∗, Liqin Huang a

a College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
b Thoracic Department, Fujian Medical University Union Hospital, Fuzhou 350001, China
ARTICLE INFO

Keywords: Computer-aided diagnosis; Malignancy evaluation; Pulmonary nodule; Radiology analysis
ABSTRACT
Background and Objective: Computer-aided diagnosis (CAD) systems promote diagnostic effectiveness and alleviate the workload of radiologists. A CAD system for lung cancer diagnosis includes nodule candidate detection and nodule malignancy evaluation. Recently, deep learning-based pulmonary nodule detection has reached performance satisfactory for clinical application. However, deep learning-based nodule malignancy evaluation relies on heuristic inference from the low-dose computed tomography (LDCT) volume to a malignant probability, which lacks clinical cognition.
Methods: In this paper, we propose a joint radiology analysis and malignancy evaluation network (R2MNet) that evaluates pulmonary nodule malignancy via radiology characteristics analysis. Radiological features are extracted as a channel descriptor to highlight the regions of the input volume that are critical for nodule malignancy evaluation. In addition, for model explanation, we propose channel-dependent activation mapping (CDAM) to visualize features and shed light on the decision process of the deep neural network (DNN).
Results: Experimental results on the LIDC-IDRI dataset demonstrate that the proposed method achieved an area under the curve (AUC) of 96.27% on nodule radiology analysis and an AUC of 97.52% on nodule malignancy evaluation. In addition, explanations of the CDAM features showed that the shape and density of nodule regions are two critical factors that influence whether a nodule is inferred as malignant, which conforms with the diagnostic cognition of experienced radiologists.
Conclusion: By incorporating radiology analysis into nodule malignancy evaluation, the network inference process conforms to the diagnostic procedure of radiologists and increases the confidence of the evaluation results. Besides, model interpretation with CDAM features sheds light on the regions on which DNNs focus when they estimate nodule malignancy probabilities.
1. Introduction
Lung cancer is the most common cause of cancer death worldwide [1]. Lung cancer screening using low-dose computed tomography (LDCT) scans has been proved an effective tool to reduce patient mortality [2]. However, a thorough inspection of a CT scan usually takes a radiologist around 10 minutes, and the diagnosis results are influenced by the doctor's experience and state of mind. With the increasing number of CT images, the data volumes to be analyzed overwhelm the capacity of radiologists. Computer-aided diagnosis (CAD) systems have the potential to reduce this burden. In recent years, deep learning-based methods have demonstrated impressive performance in medical image processing and taken up a dominant position in the design of CAD systems [3, 4, 5, 6, 7, 8].

A general deep learning-based CAD system for lung cancer diagnosis includes 1) a pulmonary nodule detection module that detects candidate pulmonary nodules, and 2) a nodule malignancy evaluation module that diagnoses the suspicious nodules proposed by the previous stage.

∗ Corresponding author at: College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China (E-mail: [email protected]).
∗∗ Corresponding author at: Thoracic Department, Fujian Medical University Union Hospital, Fuzhou 350001, China (E-mail: [email protected]).

Figure 1:
Examples of benign (the left column) and malignant nodules (the right column). The red rectangles emphasize the nodule locations and the yellow dashed rectangles highlight the nodule areas. Figure best viewed in color.

Deep learning-based nodule detection has achieved remarkable results. However, deep learning-based nodule malignancy evaluation models that straightforwardly predict malignant probabilities are short of explanations of which regions deep neural networks (DNNs) focus on [9, 10]. Doctors estimate nodule malignant
Preprint submitted to Elsevier
Figure 2:
Examples of nodules labeled as solid nodule (a), mix ground-glass opacity nodule (b), ground-glass opacity nodule (c), and calcified nodule (d). The red rectangles emphasize the nodule locations and the yellow dashed rectangles highlight the nodule areas. Figure best viewed in color.

risk mainly according to the shape and density of the nodules, as well as other pathology information. Qualitatively, compared to benign nodules, malignant ones often have larger volumes, varied density, and irregular shapes. Examples of benign and malignant nodules are illustrated in Fig.1. The inference results of the DNNs therefore lack confidence and interpretation. To overcome these problems, we propose a joint radiology analysis and malignancy evaluation network (R2MNet) that evaluates nodule malignancy based on radiology analysis. Specifically, radiology analysis aims to classify nodules as solid nodules (SN), ground-glass opacity nodules (GGO), mix-GGO nodules (MGGO), and calcified nodules (CN), as shown in Fig.2. The purpose of malignancy evaluation is to estimate the malignant risk of a nodule. R2MNet consists of two sub-networks, the radiology analysis network (RNet) and the malignancy evaluation network (MNet), which implement these two tasks, respectively. To consolidate the two sub-networks, we design assisted gating units (AGUs) embedded in MNet that transform the feature maps extracted by RNet into a channel descriptor to capture the channel dependencies of those extracted by MNet. Moreover, model interpretability is crucial in CAD. To make our model explainable, we propose channel-dependent activation mapping (CDAM), which adopts the channel dependencies of the activation maps themselves for feature interpretation. Extensive experiments on LIDC-IDRI [11] via five-fold cross-validation demonstrate that the proposed R2MNet achieves satisfactory performance on nodule malignancy evaluation.
Moreover, its inference process conforms to the clinical diagnosis procedure, which increases the confidence level of the evaluation results. Our contributions can be summarized as follows:

• We propose R2MNet, which integrates two sub-networks (RNet and MNet) to infer malignant risk via radiology analysis. RNet extracts radiological features using newly labeled data; MNet evaluates nodule malignancy.
• To conjoin the two sub-networks of R2MNet, we design AGUs embedded in MNet that transform the feature maps extracted by RNet into a channel descriptor to capture the channel dependencies of those extracted by MNet.
• To make our model interpretable, we propose CDAM, which exploits the channel dependencies of the activation maps for visual explanation.
• Extensive experiments on the LIDC-IDRI dataset indicate that our method achieves promising accuracy for nodule malignancy evaluation. Remarkably, the inference process conforms to the clinical diagnosis procedure.

The rest of this paper is organized as follows. In Section 2, we review the relevant literature. The dataset and the corresponding preprocessing are specified in Section 3. Section 4 elaborates on the proposed methods. Experimental settings and results are presented in Section 5. In Section 6, we discuss the experimental results and analyze the strengths and limitations of our approach. Section 7 concludes this paper.
2. Related work
In the following, we review the works related to pulmonary nodule classification, long-range dependencies, and Class Activation Map (CAM)-based explanation.
In a deep learning-based CAD system, nodule classifiers either reduce false-positive nodules following nodule detectors or evaluate nodule malignancy at the back end of the CAD system. Setio et al. extracted 2D patches from nine symmetrical planes of a cube for false-positive reduction [12]. Dou et al. encoded multi-level context information with a 3D Convolutional Neural Network (CNN) to reduce false positives [13]. MD-NDNet integrated nodule volumetric information and spatial nodule correlation features from sagittal, coronal, and axial planes to decrease the false-positive rate [14]. Winkels et al. developed a 3D version of group equivariant convolutional networks that generalizes automatically over discrete rotations and reflections for false-positive reduction [15]. False-positive reduction using CNNs that identify whether input CT volumes contain nodules conforms to the clinical basis. However, nodule benign/malignant evaluation directly from CT to a malignant probability lacks interpretation of the features extracted by the CNN [9, 10]. To improve model interpretability, Hussein et al. adopted multiple CNNs based on graph-regularized sparse multi-task learning for malignant risk stratification [16]. Similarly, Wu et al. integrated classification and segmentation tasks in a multi-task learning manner [17]. In this work, we exploit radiological features as a channel descriptor for nodule
malignancy evaluation. Besides, we employ the proposed CDAM for model explanation. An overview of the proposed model is given in Section 4.1.
Learning long-range dependencies is of great importance in deep neural networks. Long-range dependencies enable networks to capture a large receptive field and learn global features. Convolutions are local operations, so long-range dependencies can only be captured when these operations are applied repeatedly. The transformer was one of the first attempts to apply a self-attention mechanism to model long-range dependencies in machine translation [18]. The non-local operation captured pixel-level pairwise relations for computer vision tasks [19]. GCNet improved the non-local network with less computation while maintaining the effectiveness of long-range dependency capturing [20]. To learn channel-wise dependencies of feature maps, SENet recalibrated the channel dependency with global context features, as each channel of the feature maps corresponds to a specific region of the input image [21]. Motivated by the superiority of SENet, we propose an AGU for recalibrating channel relationships using specific features as a channel descriptor. Details of the AGU are presented in Section 4.2.
Activation map visualization has been the most mainstream method for CNN interpretation. Specifically, the Class Activation Map (CAM) is one of the widely adopted methods [22]. CAM-based explanations provide feature visualization with a weighted combination of activation maps learned from a CNN [22, 23, 24, 25]. CAM identifies discriminative regions by a linear weighted combination of the activation maps of the last convolutional layer before the global pooling layer [22]. However, it is only appropriate for a restricted class of CNNs that contain global average pooling layers and fully connected layers. To extend the range of application of CAM, Grad-CAM generalized the definition of the weighting coefficients as the gradient of the class confidence with respect to the activation map, and applies to a significantly broader range of CNN model families [23]. A variation of Grad-CAM, Grad-CAM++, aimed to provide better localization of objects as well as explanations of occurrences of multiple objects of a class in a single image [24]. Using gradients to incorporate the importance of each channel towards the class confidence is a natural choice. However, the gradient information of a deep neural network can be noisy and also tends to vanish due to saturation in sigmoid or the flat zero-gradient region of the Rectified Linear Unit (ReLU). Instead of using the gradient information flowing into the last convolutional layer to represent the importance of each activation map, Score-CAM expressed the importance as a linear combination of score-based weights and activation maps [25]. However, the aforementioned methods adopt weighting coefficients derived from external data, which may introduce noise and bias. Therefore, we propose CDAM for activation map visualization, where the weighting coefficients are calculated from the activation maps themselves. Details of CDAM are presented in Section 4.3.
3. Materials
In this section, we introduce the database used in our experiments. The data annotation and preprocessing methods are also specified.
In this study, we use a selected version of the LIDC database [11] provided in the LUNA16 challenge [26], which consists of 888 CT scans comprising a total of 1186 nodules. We obtained the nodule malignancy from the annotation files of the LIDC-IDRI dataset. Nodules with an average malignancy score higher than 3 were labeled as malignant, and those with an average score lower than 3 were labeled as benign. Nodules were removed from the experiments in the case of an average malignancy score of exactly 3, ambiguous IDs, or ratings by only one or two radiologists, which resulted in a final set of nodules comprising both malignant and benign cases.

For nodule radiological analysis, two experienced radiologists labeled nodules as SN, GGO, MGGO, or CN according to the 3D radiological features of the LDCT scans using ITK-SNAP [27]. The steps of data annotation are briefly listed as follows.
1) Two experienced doctors independently marked the class of all nodules based on radiological characteristics.
2) Then, they carefully inspected and corrected the labeled results, respectively.
3) The final version was obtained by discussing and remarking the nodules labeled differently in the previous steps.
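The label-curation rule described above can be sketched as follows; the data structures and the threshold of 3 (the midpoint of the 1-5 LIDC malignancy scale) are assumptions based on the description, not code from the paper:

```python
# Illustrative reimplementation of the malignancy label curation (hypothetical
# data structures; the paper reads scores from the LIDC-IDRI annotation files).
from statistics import mean

def assign_label(radiologist_scores, min_raters=3, threshold=3):
    """Map per-radiologist malignancy scores (1-5) to a binary label.

    Returns "malignant", "benign", or None when the nodule should be excluded
    (too few raters, or an ambiguous average equal to the threshold).
    """
    if len(radiologist_scores) < min_raters:
        return None                      # rated by only one or two radiologists
    avg = mean(radiologist_scores)
    if avg > threshold:
        return "malignant"
    if avg < threshold:
        return "benign"
    return None                          # ambiguous average score

# Usage: keep only nodules that receive a definite label.
nodules = {"n1": [5, 4, 4], "n2": [2, 1, 2, 3], "n3": [3, 3, 3], "n4": [4, 2]}
labels = {nid: lab for nid, s in nodules.items() if (lab := assign_label(s))}
```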
The data preprocessing follows four steps.
1) Normalization. We clipped the Hounsfield units (HU) of the raw CT data into [-1200, 600] and normalized them into [0, 1].
2) Extraction. Foreground regions of the normalized CT scans were extracted according to the ground-truth masks provided by the LUNA16 challenge.
3) Resample. We resampled all CT volumes to 1 mm spacing in the z-, y-, and x-dimensions.
4) Crop. The nodule regions used to train and test our method were cropped according to the experiment configuration (i.e., 2D/3D format and size).
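The four preprocessing steps can be sketched as follows, assuming NumPy/SciPy volumes; the patch size, interpolation order, and padding behavior are illustrative choices rather than the paper's exact configuration:

```python
# Sketch of the four preprocessing steps (normalize, extract, resample, crop).
import numpy as np
from scipy.ndimage import zoom

def normalize_hu(volume, hu_min=-1200.0, hu_max=600.0):
    """Step 1: clip HU values to [-1200, 600] and rescale to [0, 1]."""
    v = np.clip(volume.astype(np.float32), hu_min, hu_max)
    return (v - hu_min) / (hu_max - hu_min)

def extract_foreground(volume, lung_mask, fill_value=0.0):
    """Step 2: keep only the lung region given a ground-truth mask."""
    return np.where(lung_mask > 0, volume, fill_value)

def resample_to_1mm(volume, spacing_zyx):
    """Step 3: resample to 1 mm isotropic spacing (linear interpolation)."""
    return zoom(volume, zoom=np.asarray(spacing_zyx), order=1)

def crop_nodule(volume, center_zyx, size=32):
    """Step 4: crop a cubic patch around the nodule center (zero-padded at borders)."""
    half = size // 2
    padded = np.pad(volume, half, mode="constant")
    z, y, x = (int(c) + half for c in center_zyx)
    return padded[z - half:z + half, y - half:y + half, x - half:x + half]
```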
Figure 3:
The diagram of the proposed method. (a) R2MNet. Note that we omit the four max-pooling layers, each of which follows a residual block, for illustration convenience. (b) The convolutional block. (c) The residual blocks. (d) The AGU module.
4. Methods
In this section, we introduce our R2MNet and detail its components; the implementation details are then presented. The proposed R2MNet is shown in Fig.3, with its diagram illustrated in Fig.3(a). R2MNet is composed of two CNNs trained in a multi-task learning manner (Section 4.1). The AGU transforms radiological features into a channel descriptor to facilitate malignancy evaluation (Fig.3(d)). CDAM is proposed for model explanation (Fig.4).
Here we present our R2MNet and provide an overview of its key components. The proposed R2MNet takes a 3D CT volume as input and outputs a radiology class and a nodule malignancy score. Specifically, R2MNet consists of two improved residual networks [28], i.e., RNet and MNet, as illustrated in Fig.3 (a). MNet includes two convolutional blocks (Fig.3 (b)), four residual blocks each of which contains three residual units (Fig.3 (c)), four AGUs (Fig.3 (d)), and four max-pooling layers. The architecture of RNet is similar to MNet but without AGUs. The proposed method can combine nodule radiological features for nodule malignancy evaluation. RNet and MNet are trained simultaneously in a multi-task learning manner, which differs from current approaches that use a CNN directly for malignancy estimation [9, 10]. The goals of RNet are to extract radiological features of pulmonary nodules for nodule evaluation and to provide the radiological characteristics as a reference for practical diagnosis. The outputs of RNet are four category probabilities and the radiological features. The radiological features are transformed into a channel descriptor by the AGU (Section 4.2) to make MNet focus on the nodule area. MNet takes as inputs the CT volume data and the radiological features for pulmonary nodule malignancy evaluation. The loss function for training our networks is a weighted cross-entropy (CE) loss:

L(Y_r, Y_m, Ŷ_r, Ŷ_m) = λ L_CE(Y_r, Ŷ_r) + (1 − λ) L_CE(Y_m, Ŷ_m),   (1)

where Y_r and Y_m are the ground truths, and Ŷ_r and Ŷ_m are the predictions of R2MNet.
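The joint objective in Eq. (1) can be sketched as follows in PyTorch; the batch size, head dimensions, and λ = 0.5 are illustrative assumptions, not the paper's settings:

```python
# Sketch of the joint weighted cross-entropy objective of Eq. (1).
import torch
import torch.nn.functional as F

def r2m_loss(radiology_logits, malignancy_logits,
             radiology_target, malignancy_target, lam=0.5):
    """L = lam * CE(radiology head) + (1 - lam) * CE(malignancy head)."""
    l_rad = F.cross_entropy(radiology_logits, radiology_target)    # 4-class head (SN/MGGO/GGO/CN)
    l_mal = F.cross_entropy(malignancy_logits, malignancy_target)  # 2-class head (benign/malignant)
    return lam * l_rad + (1.0 - lam) * l_mal

# Usage with dummy batches of logits and targets.
rad_logits = torch.randn(8, 4)
mal_logits = torch.randn(8, 2)
rad_y = torch.randint(0, 4, (8,))
mal_y = torch.randint(0, 2, (8,))
loss = r2m_loss(rad_logits, mal_logits, rad_y, mal_y, lam=0.5)
```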
Figure 4:
The diagram of the proposed CDAM. Activation maps are linearly weighted to generate the visual explanation.
The vanilla SE layer [21] adopts the input features to capture channel dependencies in the 2D scenario. The AGU, instead, transforms the features extracted by RNet into a channel descriptor to capture the channel dependencies of those extracted by MNet in the 3D scenario. The diagram of the AGU is shown in Fig.3 (d). Specifically, we transform radiological features into a channel descriptor to capture the channel dependencies of the malignancy features. Similar to the SE block, we model channel interdependencies to recalibrate filter responses in two steps (i.e., squeeze and excitation), as follows.

1) Squeeze. Squeeze operations are adopted for global information embedding. In R2MNet, the radiological features R = [r_1, r_2, ..., r_N] are squeezed to a channel descriptor using global average pooling (GAP). Noting that more sophisticated aggregation strategies could be employed here as well, we adopt GAP as used in [21]. The channel descriptor T = [t_1, t_2, ..., t_N] ∈ ℝ^C is computed as:

t_n = F_sq(r_n) = (1 / (D × H × W)) Σ_{i=1}^{D} Σ_{j=1}^{H} Σ_{k=1}^{W} r_n(i, j, k),   (2)

where F_sq is the squeeze operation, and D, H, and W denote the depth, height, and width of the feature maps.

2) Excitation. The following operation takes as input the information aggregated in the last step to capture channel dependencies, i.e., S = [s_1, s_2, ..., s_N] ∈ ℝ^C. The excitation operation can be formulated as:

S = F_ex(T, W) = σ(g(T, W)) = σ(W_2 δ(W_1 T)),   (3)

where σ is the sigmoid function, δ refers to the ReLU activation, W_1 ∈ ℝ^{(C/r) × C}, and W_2 ∈ ℝ^{C × (C/r)}. Similar to [21], we form a bottleneck consisting of a dimensionality-reduction layer with parameters W_1 and reduction ratio r, a ReLU activation, and then a dimensionality-increasing layer with parameters W_2. Finally, the recalibrated malignancy features M̃ = [m̃_1, m̃_2, ..., m̃_N] ∈ ℝ^C are obtained by rescaling the malignancy features M = [m_1, m_2, ..., m_N] ∈ ℝ^C with the channel weights S:

m̃_n = F_scale(m_n, s_n) = s_n · m_n,   (4)

where F_scale denotes channel-wise multiplication.

We propose CDAM for 3D feature visualization, motivated by CAM-based methods, as shown in Fig.4. CAM is a technique for identifying discriminative regions by a linearly weighted combination of the activation maps of the last convolutional layer before the global pooling layer [22].
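The squeeze and excitation steps of the AGU described above can be sketched as follows in PyTorch; the channel count, reduction ratio, and feature-map sizes are illustrative, and this is a reimplementation from the description rather than the authors' code:

```python
# Sketch of the AGU: 3D squeeze-and-excitation driven by an external (RNet) descriptor.
import torch
import torch.nn as nn

class AGU(nn.Module):
    """Recalibrate MNet features using a channel descriptor squeezed from RNet features."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool3d(1)       # Eq. (2): global average pooling
        self.excite = nn.Sequential(                 # Eq. (3): W1 -> ReLU -> W2 -> sigmoid
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, malignancy_feat, radiology_feat):
        b, c = radiology_feat.shape[:2]
        t = self.squeeze(radiology_feat).view(b, c)  # channel descriptor T from RNet
        s = self.excite(t).view(b, c, 1, 1, 1)       # channel weights S in (0, 1)
        return malignancy_feat * s                   # Eq. (4): rescale MNet features

# Usage: recalibrate a batch of 3D feature maps.
agu = AGU(channels=32, reduction=8)
m = torch.randn(2, 32, 6, 6, 6)   # MNet features
r = torch.randn(2, 32, 6, 6, 6)   # RNet features
out = agu(m, r)                   # same shape as m
```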
The motivation behind CAM is that each activation map of a CNN layer contains different spatial information about the input X, and the importance of each channel is given by the weights of the linear combination in the fully connected layer following the global pooling. However, if there is no global pooling layer or no fully connected layer, CAM does not apply, because the weighting coefficients are undefined. Grad-CAM [23] and its variations [24] generalize CAM to models without global pooling layers by employing gradients as weights. Instead of using the weights of the fully connected layer or gradient information derived from external layers, CDAM
Figure 5:
ROC curves of RNet and R2MNet on radiology analysis (a), and of MNet and R2MNet on malignancy evaluation (b).

employs the activation maps themselves to obtain the weights for a linear combination of activation maps. Formally, CDAM is defined as:

L_CDAM = ReLU( Σ_{i=1}^{C} α_i A_i^l ),   (5)

where A^l denotes the activations of the l-th CNN layer, A_i^l refers to the activation map of the i-th channel of A^l, and α = [α_1, α_2, ..., α_C] ∈ ℝ^C is defined as:

α_n = (1 / (D × H × W)) Σ_{i=1}^{D} Σ_{j=1}^{H} Σ_{k=1}^{W} A_n^l(i, j, k).   (6)

We apply a ReLU activation to the linear combination of maps because we are only interested in the features that have a positive influence. Both α and A^l are used after a softmax activation, because the relative values after normalization are more reasonable for measuring relevance than the absolute values. Furthermore, to capture voxel-wise importance, we up-sample L_CDAM to the input resolution using bicubic interpolation.
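A minimal sketch of CDAM per Eqs. (5)-(6) might look as follows; since the text leaves the exact placement of the softmax ambiguous, this sketch applies it only to the channel weights, and it uses trilinear rather than bicubic interpolation for the 3D upsampling (both are assumptions):

```python
# Sketch of CDAM: channel weights from the activation maps themselves, Eqs. (5)-(6).
import torch
import torch.nn.functional as F

def cdam(activations, output_size):
    """activations: (C, D, H, W) maps of one conv layer; returns a saliency volume of output_size."""
    c = activations.shape[0]
    # Eq. (6): per-channel weight is the global average of that channel's map,
    # normalized here with a softmax over channels (one reading of the text).
    weights = F.softmax(activations.mean(dim=(1, 2, 3)), dim=0)
    # Eq. (5): ReLU over the weighted linear combination of the maps.
    saliency = F.relu((weights.view(c, 1, 1, 1) * activations).sum(dim=0))
    # Upsample to the input resolution for voxel-wise importance.
    saliency = F.interpolate(saliency[None, None], size=output_size,
                             mode="trilinear", align_corners=False)
    return saliency[0, 0]
```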
The networks were implemented in PyTorch [29]. The models were trained with the Adam optimizer [30] using standard backpropagation. Data augmentation operations, i.e., scaling, flipping, and rotation, were also employed in the experiments. The learning rate was fixed at 1e-4 and the networks were trained for a fixed number of epochs on a single NVIDIA GeForce GTX 1080Ti.
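The training setup described above can be sketched as follows; the Conv3d stand-in model, batch shapes, and augmentation details are illustrative assumptions, not the actual R2MNet implementation:

```python
# Illustrative training configuration: Adam, fixed learning rate, flip/rotation augmentation.
import torch

def augment(volume):
    """Randomly flip and rotate (by multiples of 90 degrees) a (D, H, W) volume."""
    if torch.rand(1).item() < 0.5:
        volume = torch.flip(volume, dims=[2])      # left-right flip
    k = int(torch.randint(0, 4, (1,)))
    return torch.rot90(volume, k, dims=[1, 2])     # in-plane rotation

model = torch.nn.Conv3d(1, 4, kernel_size=3, padding=1)   # stand-in for R2MNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative optimization step on a dummy batch.
x = torch.randn(2, 1, 16, 16, 16)
y = torch.randint(0, 4, (2,))
logits = model(x).mean(dim=(2, 3, 4))              # pooled (2, 4) class scores
loss = torch.nn.functional.cross_entropy(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```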
5. Experiments and results
In this section, we evaluate the proposed R2MNet on the LIDC-IDRI database and report the results. First, we performed nodule characteristics identification and malignancy evaluation individually. Then, we combined the two tasks in
multi-task learning, where radiology analysis assisted malignancy evaluation. For model explanation, we visualized the feature maps and analyzed their characteristics. Experimental results show that the proposed method achieved higher performance than the baseline.

Table 1
Performance comparison of RNet and R2MNet on radiology analysis.

Model     SN      MGGO    GGO     CN      AUC
RNet      95.50   89.88   91.01   96.63   95.21
R2MNet    96.63   92.13   91.01   97.75   97.08
Nodule radiology analysis aims to classify nodules as SN, MGGO, GGO, and CN. Identifying these characteristics makes the model learn radiological features that facilitate malignancy evaluation. In addition, these characteristics can assist radiologists in determining nodule attributes. Experimental results of nodule characteristics classification are listed in Table 1. Both RNet and R2MNet achieved accuracy above 89% in each of the four categories. After being combined with MNet, the performance of R2MNet either remained at the accuracy level of RNet (GGO, CN) or exceeded it (SN, MGGO). Also, the area under the curve (AUC) of R2MNet is larger than that of RNet. According to Fig.5 (a), the ROC curve of R2MNet nearly surrounds that of RNet.
Radiological features of pulmonary nodules can assist the CNN in malignancy classification because the inference procedure conforms to the diagnostic process. To test the effectiveness of the proposed method, we conducted experiments on nodule malignancy classification. As shown in Table 2, R2MNet outperforms MNet with an accuracy gain of 1.85% and an AUC gain of 2.28%, respectively. Moreover, the accuracy and AUC of R2MNet are more stable
Figure 6:
Comparison among RNet, MNet, R2MNetw/oAGU, and R2MNet in accuracy and AUC on radiology analysis (a) and malignancy evaluation (b), respectively. The first three columns are the accuracy boxes and the remaining ones are the AUC boxes. The scalar to the left of each box is its average value.
Table 2
Performance comparison measured by accuracy and AUC (mean ± s.d. %) for MNet, R2MNet_w/oAGU, and R2MNet on radiology analysis and malignancy evaluation.

Task             Radiological analysis          Malignant evaluation
Model            Accuracy       AUC             Accuracy       AUC
MNet             .82 ± 1.09     95.15 ± 1.05    92.89 ± 0.76   95.24 ± 1.
R2MNet_w/oAGU    .11 ± 0.95     96.41 ± 1.64    93.97 ± 1.33   96.08 ± 1.
R2MNet           .13 ± 1.10     96.27 ± 1.60    94.74 ± 0.62   97.52 ± 1.

compared to MNet, according to the standard deviation. The ROC curves of MNet and R2MNet are depicted in Fig.5 (b). To compare the overall performance of MNet and R2MNet through five-fold cross-validation, we also illustrate box plots of accuracy and AUC in Fig.6. As shown, compared to MNet, R2MNet achieved more stable and higher results.

We conducted an ablation study to investigate the individual contributions of R2MNet and the AGU module, with experiments on both radiology analysis and malignancy evaluation. The experiments were performed from two ends: on the one hand, we included only the radiology analysis in nodule malignancy evaluation, which resulted in a fundamental version of R2MNet (i.e., R2MNetw/oAGU); on the other hand, the AGU modules were introduced into the preliminary R2MNet to construct the final version of the proposed method (i.e., R2MNet).

In nodule radiology analysis, a comparison was made among RNet, R2MNetw/oAGU, and R2MNet. As indicated in Table 2, the accuracy and AUC scores of R2MNetw/oAGU are similar to those of R2MNet, and both slightly outperform RNet. Results are shown in Fig.6(a).

In nodule malignancy evaluation, a comparison was made among MNet, R2MNetw/oAGU, and R2MNet. The results of the five-fold cross-validation are listed in Table 2. We can observe from the table that combining radiological analysis with malignancy evaluation improves performance over performing the latter only. Further, when the AGU is introduced into R2MNet, the synergy between these two components generates the best performance. These results are illustrated in Fig.6 (b).
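The five-fold protocol behind the mean ± s.d. entries in Table 2 can be sketched with scikit-learn as follows, using dummy scores in place of actual model predictions:

```python
# Sketch of per-fold accuracy and AUC, summarized as mean and standard deviation.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)                                   # 0 = benign, 1 = malignant
scores = np.clip(0.5 * labels + rng.normal(0.25, 0.2, 200), 0.0, 1.0)   # dummy malignancy scores

accs, aucs = [], []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in cv.split(scores, labels):
    accs.append(accuracy_score(labels[test_idx], scores[test_idx] > 0.5))
    aucs.append(roc_auc_score(labels[test_idx], scores[test_idx]))

summary = (np.mean(accs), np.std(accs), np.mean(aucs), np.std(aucs))    # mean ± s.d.
```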
Direct approaches that classify a pulmonary nodule as benign or malignant, from the input CT data straight to a malignant probability, lack interpretation. To build explainable models, we provide visual explanations using the proposed CDAM. The experiments were performed on both malignancy evaluation and radiology analysis to investigate the voxel-wise importance regions on which the models focus in different tasks. Specifically, we employed the feature maps of size 256 × 6 × 6 × 6 after the last residual block of our model as activation maps. Since the activation maps are volume data, we adopted the center slice for visualization convenience. Fig.7 shows the CDAM features and the corresponding probabilities of MNet, R2MNetw/oAGU, and R2MNet for nodule malignancy evaluation, respectively. The value below each sub-figure is the probability predicted by the corresponding model. Besides, we illustrate the CDAM features
Figure 7:
Visualization of CDAM features derived from MNet, R2MNetw/oAGU, and R2MNet regarding malignancy evaluation, respectively. The value under each sub-figure is the probability predicted by the corresponding model. Note that we show the central slice only for visualization convenience. Figure best viewed in color.

with respect to nodule radiology analysis in Fig.8.
6. Discussion
Automatic pulmonary nodule malignancy evaluation is an essential component of a CAD system for lung cancer diagnosis. Deep learning-based methods have demonstrated promising results on this task. Table 3 summarizes the related works from the literature. Shen et al. introduced a multiscale CNN for nodule malignancy diagnosis and achieved an accuracy of 86.84% on a selected LIDC-IDRI dataset [31]. Nibali et al. adopted a ResNet with multi-view inputs for benign/malignant classification [9]; they evaluated their method on a dataset derived from the LIDC-IDRI and achieved an accuracy of 89.90%. Al-Shabi et al. employed non-local blocks to model global nodule features and residual blocks to capture local nodule features [10]; they evaluated the model on the selected LIDC-IDRI database with an accuracy of 88.46%. However, classifying lung nodules as benign or malignant directly from the CT volume (or slice) lacks a clinical basis and explanations of the features extracted by the CNN. Therefore, the results are short of a confidence level. Hussein et al. empirically established the significance of different high-level nodule attributes for malignancy determination [32]. Furthermore, they adopted CNNs to learn a series of features for nodule attributes and fused these features to predict the malignancy of a pulmonary nodule in a multi-task learning manner [16]. Similarly, Wu et al. proposed a multi-task learning CNN that integrated pulmonary nodule segmentation, attributes, and malignancy prediction [17]. Their approach simultaneously predicted the malignancy of lung nodules, segmented the nodule areas, and learned nodule attributes, aiming to tackle the problem of model interpretability. Note that it can be difficult to pursue an objective cross-study comparison due to differences in datasets, initialization methods, and experimental settings.

Table 3
Overview of previous methods for pulmonary nodule evaluation. Abbreviations: Information Processing in Medical Imaging (IPMI), International Symposium on Biomedical Imaging (ISBI), International Journal of Computer Assisted Radiology and Surgery (IJCARS).

Methods                                 Accuracy
MCNN [31], IPMI                         86.84%
TumorNet [32], ISBI                     82.47%
TumorNet (Attributes) [32], ISBI        92.31%
Nodule-ResNet [9], IJCARS               89.90%
MIT-3DCNN [16], IPMI                    91.26%
PN-SAMP [17], ISBI                      97.58%
Local-Global Networks [10], IJCARS      88.46%
R2MNet, ours                            94.74%

Our method leveraged radiological features as a channel descriptor to assist lung nodule evaluation in a multi-task
Figure 8:
Visualization of CDAM features derived from RNet, R2MNet_w/oAGU, and R2MNet concerning radiology analysis, respectively. Note that we show the central slice only for visualization convenience. Figure best viewed in color.

learning manner. Specifically, Table 1 shows the results of the radiological analysis. Although radiology analysis is an auxiliary component of R2MNet, R2MNet increased the accuracy among the four nodule categories and the AUC score compared with RNet. Moreover, the ROC curves in Fig.5(a), where the curve of R2MNet nearly surrounds that of RNet, illustrate that the classification performance of R2MNet is better than that of RNet. In nodule malignancy evaluation, Fig.5(b) depicts the ROC curves of MNet and R2MNet, in which the curve of the latter is higher than that of the former. As indicated in Table 2, in general, joint learning of radiology analysis and malignancy evaluation improved the performance compared to each task performed individually. Combined learning facilitates communication between different tasks; we can conclude that the two tasks reinforce each other. Furthermore, comparing R2MNetw/oAGU with MNet, we can see accuracy and AUC gains in both tasks. The performance of R2MNet on radiological analysis is nearly equal to that of R2MNetw/oAGU. This is reasonable because the AGU module adopts radiological features to facilitate nodule malignancy evaluation; indeed, the performance gain of R2MNet was obtained in malignancy estimation. On the other hand, Fig.6 depicts the box plots with average values and data distributions. The accuracy and AUC scores increase gradually among MNet, RNet, R2MNetw/oAGU, and R2MNet, which further proves the effectiveness of the proposed methods. Viewing the boxes of MNet/RNet and R2MNetw/oAGU, one can conclude that although multi-task learning can bring a performance gain, the results tend to fluctuate due to the hard convergence of the networks.
However, the results of R2MNet are stable compared with the others, because introducing the AGU into R2MNet_w/oAGU enables R2MNet to employ radiological features and thereby improves the adaptability of the model to different data.

Although performance improvement is one major purpose of developing deep learning-based methods, interpretability is essential as well. According to the experience of radiologists, the shape and density of nodule regions are two critical factors that influence whether a nodule is inferred as malignant. Fig. 7 shows the CDAM features of MNet, R2MNet_w/oAGU, and R2MNet concerning nodule malignancy evaluation, respectively. MNet tended to be disturbed by background noise and confused benign with malignant features. In contrast, both R2MNet and R2MNet_w/oAGU focused on nodule regions, except that they yielded a wrong identification for the first benign nodule. Furthermore, these two architectures paid higher attention to malignant nodules and lower attention to benign ones, which conforms to the risk of the nodules. According to the last two columns of benign and malignant nodules in Fig. 7, even though MNet generated high probabilities similar to the other models, the regions MNet attended to deviated slightly from the ground truth. On the contrary, R2MNet predicted low scores when it falsely located the nodule region, whereas MNet still generated a high probability (Fig. 7, the first column). We can conclude that incorporating malignancy evaluation with radiology analysis renders the network able to emphasize nodule regions and characterize the shape and density features of nodules.

Besides, the density of nodules plays a key role in nodule radiology analysis. As shown in Fig. 8, even though the four classes of nodules have different densities, the boundaries among them are confused, which led both RNet and R2MNet_w/oAGU to locate the nodule regions inaccurately. On the contrary, R2MNet accurately located the nodules and laid different emphasis on these regions according to their densities, conforming with the clinical basis. Therefore, we can conclude that even though the results of R2MNet and R2MNet_w/oAGU are similar, the inference process of R2MNet is more reasonable.

A major limitation of this work is that the input data depend on pulmonary nodule detection: the input volumes are derived either from manual selection by radiologists or from automatic detection by nodule detectors. Previous studies integrated multiple models into a synthetic system whose components were trained separately to perform different tasks. For example, Bonavita et al. developed a lung cancer classification pipeline that integrated a 3D CNN with an existing nodule detection framework [33]. Liao et al. adopted a 3D Faster R-CNN for patch-based nodule detection and integrated the leaky noisy-OR model into neural networks for lung cancer prediction [7]. Similarly, Zhu et al. built the DeepLung system to identify suspicious nodules and predict nodule malignancy [6]. Ozdemir et al. introduced a CAD system that included two sub-systems for nodule candidate segmentation and malignancy prediction [8]. An end-to-end explainable CAD system for lung cancer diagnosis that integrates nodule detection, segmentation, and malignancy prediction would be of extensive clinical application value; we leave this as future work.
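At a high level, the CDAM visualizations discussed in this section are channel-weighted activation maps rectified and normalized for overlay on the input slice. The sketch below shows that generic CAM-style computation under this assumption; the specific channel-importance scores CDAM uses are not reproduced here, so `channel_weights` is a hypothetical placeholder.

```python
import numpy as np

def cdam_map(feat, channel_weights):
    """CAM-style heatmap in the spirit of CDAM (illustrative sketch; the
    paper's exact channel weighting is not reproduced here): scale each
    channel's activation map by its importance score, sum over channels,
    keep positive evidence, and min-max normalize to [0, 1].

    feat: (C, D, H, W) activations from a late conv block
    channel_weights: (C,) per-channel importance scores
    """
    cam = np.tensordot(channel_weights, feat, axes=([0], [0]))  # (D, H, W)
    cam = np.maximum(cam, 0.0)   # ReLU: keep positive evidence only
    cam -= cam.min()             # shift so minimum is 0
    if cam.max() > 0:
        cam /= cam.max()         # normalize to [0, 1] for overlay
    return cam

rng = np.random.default_rng(1)
feat = rng.standard_normal((16, 4, 8, 8))   # toy activations
w = rng.standard_normal(16)                 # placeholder channel scores
heat = cdam_map(feat, w)
print(heat.shape)  # (4, 8, 8)
```

In practice the resulting volume would be upsampled to the input resolution and its central slice overlaid on the CT slice, as in Figs. 7 and 8.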
7. Conclusion
In this paper, we proposed R2MNet, which evaluates pulmonary nodule malignancy by resorting to radiology analysis instead of directly inferring a malignant probability; this conforms to the clinical diagnosis procedure and increases the confidence of prediction results. Specifically, the radiological features were transformed into a channel descriptor that emphasized the informative malignant features and suppressed the less useful ones, so that the network could estimate malignant risk based on radiological characteristics, as an experienced doctor does for a patient. Besides, model explanations with CDAM shed light on the voxel-wise nodule regions on which CNNs focus when they estimate nodule malignancy risk. The experimental results on the LIDC-IDRI database demonstrate the effectiveness of the proposed R2MNet.
Acknowledgement
This work was supported by the Natural Science Foundation of Fujian Province, China (Grant No. 2020J01472) and the Fujian Provincial Science and Technology Leading Project (Grant No. 2018Y0032). This work was also supported by the Fujian Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University).
References

[1] J. Ferlay, I. Soerjomataram, R. Dikshit, S. Eser, C. Mathers, M. Rebelo, D. M. Parkin, D. Forman, F. Bray, Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012, International Journal of Cancer 136 (2015) E359–E386.
[2] I. Sluimer, A. Schilham, M. Prokop, B. van Ginneken, Computer analysis of computed tomography scans of the lung: a survey, IEEE Transactions on Medical Imaging 25 (2006) 385–405.
[3] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
[4] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3D U-Net: learning dense volumetric segmentation from sparse annotation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016, pp. 424–432.
[5] F. Milletari, N. Navab, S.-A. Ahmadi, V-Net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571.
[6] W. Zhu, C. Liu, W. Fan, X. Xie, DeepLung: Deep 3D dual path nets for automated pulmonary nodule detection and classification, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, pp. 673–681.
[7] F. Liao, M. Liang, Z. Li, X. Hu, S. Song, Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-OR network, IEEE Transactions on Neural Networks and Learning Systems 30 (2019) 3484–3495.
[8] O. Ozdemir, R. L. Russell, A. A. Berlin, A 3D probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose CT scans, IEEE Transactions on Medical Imaging 39 (2019) 1419–1429.
[9] A. Nibali, Z. He, D. Wollersheim, Pulmonary nodule classification with deep residual networks, International Journal of Computer Assisted Radiology and Surgery 12 (2017) 1799–1808.
[10] M. Al-Shabi, B. L. Lan, W. Y. Chan, K.-H. Ng, M. Tan, Lung nodule classification using deep local–global networks, International Journal of Computer Assisted Radiology and Surgery 14 (2019) 1815–1819.
[11] S. G. Armato III, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman, et al., The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans, Medical Physics 38 (2011) 915–931.
[12] A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J. van Riel, M. M. W. Wille, M. Naqibullah, C. I. Sánchez, B. van Ginneken, Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks, IEEE Transactions on Medical Imaging 35 (2016) 1160–1169.
[13] Q. Dou, H. Chen, L. Yu, J. Qin, P.-A. Heng, Multilevel contextual 3-D CNNs for false positive reduction in pulmonary nodule detection, IEEE Transactions on Biomedical Engineering 64 (2016) 1558–1567.
[14] Z. Wu, R. Ge, G. Shi, L. Zhang, Y. Chen, L. Luo, Y. Cao, H. Yu, MD-NDNet: a multi-dimensional convolutional neural network for false-positive reduction in pulmonary nodule detection, Physics in Medicine & Biology 65 (2020) 235053.
[15] M. Winkels, T. S. Cohen, Pulmonary nodule detection in CT scans with equivariant CNNs, Medical Image Analysis 55 (2019) 15–26.
[16] S. Hussein, K. Cao, Q. Song, U. Bagci, Risk stratification of lung nodules using 3D CNN-based multi-task learning, in: International Conference on Information Processing in Medical Imaging, Springer, 2017, pp. 249–260.
[17] B. Wu, Z. Zhou, J. Wang, Y. Wang, Joint learning for pulmonary nodule segmentation, attributes and malignancy prediction, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, 2018, pp. 1109–1113.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[19] X. Wang, R. Girshick, A. Gupta, K. He, Non-local neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7794–7803.
[20] Y. Cao, J. Xu, S. Lin, F. Wei, H. Hu, GCNet: Non-local networks meet squeeze-excitation networks and beyond, in: Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019, pp. 0–0.
[21] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[22] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2921–2929.
[23] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.
[24] A. Chattopadhay, A. Sarkar, P. Howlader, V. N. Balasubramanian, Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, pp. 839–847.
[25] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, X. Hu, Score-CAM: Score-weighted visual explanations for convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 24–25.
[26] A. A. A. Setio, A. Traverso, T. de Bel, M. S. Berens, C. van den Bogaard, P. Cerello, H. Chen, Q. Dou, M. E. Fantacci, B. Geurts, et al., Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge, Medical Image Analysis 42 (2017) 1–13.
[27] P. A. Yushkevich, Y. Gao, G. Gerig, ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2016, pp. 3342–3345.
[28] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[29] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.
[30] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[31] W. Shen, M. Zhou, F. Yang, C. Yang, J. Tian, Multi-scale convolutional neural networks for lung nodule classification, in: International Conference on Information Processing in Medical Imaging, Springer, 2015, pp. 588–599.
[32] S. Hussein, R. Gillies, K. Cao, Q. Song, U. Bagci, TumorNet: Lung nodule characterization using multi-view convolutional neural network with Gaussian process, in: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), IEEE, 2017, pp. 1007–1010.
[33] I. Bonavita, X. Rafael-Palou, M. Ceresa, G. Piella, V. Ribas, M. A. G. Ballester, Integration of convolutional neural networks for pulmonary nodule malignancy assessment in a lung cancer classification pipeline, Computer Methods and Programs in Biomedicine 185 (2020) 105172.