DAN-Net: Dual-Domain Adaptive-Scaling Non-local Network for CT Metal Artifact Reduction

Abstract

Metal implants can heavily attenuate X-rays in computed tomography (CT) scans, leading to severe artifacts in reconstructed images, which significantly jeopardize image quality and negatively impact subsequent diagnoses and treatment planning. With the rapid development of deep learning in the field of medical imaging, several network models have been proposed for metal artifact reduction (MAR) in CT. Despite the encouraging results achieved by these methods, there is still much room to further improve performance. In this paper, a novel Dual-domain Adaptive-scaling Non-local network (DAN-Net) is proposed for MAR. We first correct the corrupted sinogram using adaptive scaling to preserve more tissue and bone details, which yields a more informative input. Then, an end-to-end dual-domain network is adopted to successively process the sinogram and its corresponding reconstructed image generated by the analytical reconstruction layer. In addition, to better suppress the existing artifacts and restrain the potential secondary artifacts caused by inaccurate results of the sinogram-domain network, a novel residual sinogram learning strategy and a nonlocal module are leveraged in the proposed network model. In the experiments, the proposed DAN-Net demonstrates performance competitive with several state-of-the-art MAR methods in both qualitative and quantitative aspects.


Tao Wang, Wenjun Xia, Yongqiang Huang, Huaiqiang Sun, Yan Liu, Hu Chen, Jiliu Zhou, Yi Zhang

1. College of Computer Science, Sichuan University, Chengdu 610065, China
2. Department of Radiology, West China Hospital of Sichuan University, Chengdu 610041, China
3. College of Electrical Engineering, Sichuan University, Chengdu 610065, China


Key words: computed tomography, metal artifact reduction, deep learning, dual-domain network

Corresponding author. E-mail address: [email protected].

Introduction

Computed tomography (CT) technology has developed rapidly in clinical, industrial, security and other spheres [1]. With the help of CT images, medical diagnosis and treatments can be conducted effectively. However, the effects of noise, photon starvation, beam hardening, scattered radiation and nonlinear partial volume effects are much more severe in the case of metallic implants in scanned regions [2]. Due to these metallic objects, the reconstructed CT images are contaminated by heavy artifacts, specifically those called “metal artifacts.” These artifacts degrade the imaging quality and severely compromise doctors’ diagnoses. In particular, some artifacts and certain lesions have considerable commonalities, leading to misdiagnosis, and subsequent medical image analysis is difficult [3]. Therefore, it is of great significance to reduce metal artifacts in CT images. During the past several decades, numerous metal artifact reduction (MAR) methods have been dedicated to addressing the abovementioned problem. Conventional MAR methods can be grouped into three categories: projection completion methods, iterative reconstruction methods and image postprocessing methods [4]. The projection completion methods regard projection data in the metal trace as missing information and fill in lost data with estimated values by different image inpainting [5-7] or interpolation strategies [8-12]. Linear or polynomial interpolations [8, 9, 12] are widely adopted to estimate the missed values in projection data. However, interpolation-based methods can hardly guarantee smoothness at the interpolation boundaries [13]. After filtering, discontinuities are amplified at the metal trace boundaries, which introduce new artifacts into the reconstructed CT images. To fully explore the local information in both dimensions of the angle and detector bin, some diffusion-based image inpainting methods were introduced for projection completion [5-7]. 
Although these methods may mitigate the discontinuity to some extent, extra artifacts are still inevitable in the reconstructed images. To smooth the transition region between the metal and nonmetal portions and to suppress secondary artifacts, some prior image-based methods have been proposed [14-16], such as the normalized metal artifact reduction (NMAR) method [15]. NMAR normalizes the projection data with the constraint of prior images obtained by multi-threshold segmentation based on interpolation methods. Corrected CT images can be derived from completed sinograms by filtered back projection (FBP). However, the result of NMAR is limited by the quality of the prior image. In addition, FBP is based on the line integral model, which does not take into account the statistical characteristics of measured data and simply assumes that the measured data are noiseless and that all response lines have the same weight, which is not always consistent with the real situation.

Iterative reconstruction is an alternative way to tackle these problems; it improves image quality gradually based on constrained optimization, such as the least squares method and maximum likelihood. Classical iterative reconstruction MAR methods can be divided into two groups. One uses projection data outside of the metal trace, which can be regarded as clean data [11, 17-22]. The other adopts a statistical objective function to suppress the contribution of corrupted projection data [23]. However, iterative methods are usually time-consuming and require carefully hand-designed regularizers [24, 25], both of which hinder clinical application.

Image postprocessing methods [26, 27] aim to reduce metal artifacts in the image domain without accessing raw projection data. However, since the noise and artifacts in CT images do not obey any specific statistical distribution, postprocessing methods usually cannot suppress the artifacts well and are apt to distort the anatomic structure [4, 28].
Recently, with the successful application of deep learning (DL) in many fields [29-33], DL-based methods have shown great potential for medical imaging [34, 35], and several DL-based MAR methods have been proposed. Different network architectures, such as convolutional neural networks (CNNs) and generative adversarial networks (GANs), have been utilized to recover the missing data in the metal trace [36-40]. Meanwhile, some studies have been dedicated to using DL methods to reduce metal artifacts in the image domain. Zhang et al. [41] proposed a CNN framework (CNNMAR) fusing different MAR methods to improve the performance of artifact reduction. To eliminate metal artifacts from original CT images, the authors of [42] introduced a novel unsupervised artifact disentanglement network (ADN). Gjesteby et al. [43] took detailed images derived from filtering and base images by NMAR as inputs and mapped them to metal artifact-free images with a dual-stream residual network.

Despite the encouraging results achieved by the abovementioned sinogram- or image domain-based DL methods, single-domain methods still have some limitations, as demonstrated in [44]. For sinogram domain-based DL methods, although corrupted projections within metal traces are local, it is difficult to preserve continuity at metal trace boundaries, where secondary artifacts can be introduced easily. For image domain-based methods, the input CT images reconstructed from corrupted sinograms are full of severe artifacts, which obscure most clinically important details. These low-quality images may cause some structures and artifacts to be confused with each other because of their similar patterns. In addition, artifacts are nonlocal in the image and are therefore hard to remove completely. Thus, the goal for MAR becomes twofold. The first goal is to eliminate existing artifacts as much as possible, and the other is to avoid introducing extra artifacts.
To this end, combining the merits of both projection and image domain-based methods is meaningful. Some end-to-end dual-domain networks were proposed very recently. Lin et al. [44] proposed DuDoNet, which progressively restores sinogram consistency and enhances CT images linked by a differentiable Radon inversion layer. Lyu et al. [45] proposed improving DuDoNet by specifying the metal mask projection and encoding it into the network. Yu et al. [28] proposed first employing an image-domain network to generate a prior image. Then, the sinogram obtained from the prior image was utilized to guide the sinogram-domain network. In [46], the authors proposed using partial convolution to recover irregular metal trace regions with only valid pixels outside the corrupted areas. Furthermore, an auxiliary inpainting network is introduced to suppress the secondary artifacts in the reconstructed image from the previous step. Both sinograms from the last two steps were fused to generate the final result.

Due to their state-of-the-art performance, dual-domain networks have become the mainstream for MAR. However, current dual domain-based methods still suffer from some critical limitations. The methods in [44, 46] regarded projection data in the metal trace as missing data, which resulted in the loss of details near the metal area in reconstructed CT images. The methods in [45] and [28] used metal-corrupted projection data and the corresponding reconstructed CT images as inputs directly. In fact, the data in the metal trace have a much higher amplitude than the data outside the metal trace, and there is a rapid change at the boundary of the metal trace. According to the CT imaging principle [45], due to this amplitude difference, data inside and outside of the metal trace can be regarded as obeying two different data distributions. It is difficult for neural networks to transform two different data distributions into a uniform distribution.
As demonstrated in [44], the authors experimentally found that their method did not perform well when taking the original sinogram and CT images as inputs. Meanwhile, the change at the boundary weakens the continuity of the first derivative of the projection data in a certain section, which is further amplified by filtering and generates extra artifacts [47].

To address the problems mentioned above, in this paper, a novel Dual-domain Adaptive-scaling Non-local network (DAN-Net) for MAR is proposed. The projection data are considered to be composed of two parts: one part comes from the tissues and the other part comes from the metal objects. A rough estimation of tissue-like projection data in the metal trace is obtained by a linear interpolation operation, and the residual between it and the original projection data is regarded as the contribution of the metal. To weaken the rapid change caused by metal implants and retain the data characteristics of this part, the residual in the metal trace is adaptively scaled [48, 49] and then filtered by an average filter to further improve the continuity at metal trace boundaries. The results of this adaptive scaling and the corresponding FBP reconstruction are used as the inputs of our network. In addition, a novel residual sinogram learning strategy is applied in the sinogram-domain network to weaken the rapid change in projection data and improve the smoothness of the projection. On the other hand, to handle the nonlocality of artifacts, a nonlocal U-Net architecture is employed for image-domain enhancement, capturing long-range dependencies via nonlocal operations. The whole network is trained in an end-to-end manner so that the image-domain enhancement and sinogram-domain enhancement can benefit each other.

Our main contributions are summarized as follows.

1) Different from current dual-domain networks, the original sinogram is preprocessed using adaptive scaling and accompanied by its corresponding FBP result as the inputs, which can preliminarily suppress metal artifacts.

2) A novel residual sinogram learning strategy is proposed to improve the smoothness of the projection and alleviate the secondary artifacts in the reconstructed CT images.

3) A nonlocal U-Net architecture is designed for image-domain enhancement, which can capture long-range dependencies of metal artifacts and further improve image quality.

The remainder of this paper is organized as follows. The proposed DAN-Net is elaborated in Section 2. The experiments on and results for the simulated and clinical data are presented in Section 3. The results of analytical studies are shown in Section 4. Discussion and the conclusion are provided in Section 5.

Method

Problem formulation

In our work, we consider the case of a 2D attenuation distribution. If there are metallic objects in the scanner field, the linear attenuation coefficients can be expressed as follows:

$$\mu(E) = \mu_t(E) \odot (1 - M) + \mu_m(E) \odot M \tag{1}$$

where $\mu_t(E)$ and $\mu_m(E)$ represent attenuation images of normal human tissue and metal parts, respectively; $M$ denotes the metal mask in the CT image; and $\odot$ is the elementwise multiplication. In this case, letting $\mathcal{P}$ be a forward projection operation, the projection data contaminated by metals, $S_{ma}$, can be calculated as follows [45]:

$$\begin{aligned}
S_{ma} &= -\ln \int \eta(E) \exp\big(-\mathcal{P}(\mu(E))\big)\, dE \\
&= -\ln \int \eta(E) \exp\big(-\mathcal{P}(\mu_t(E) \odot (1 - M) + \mu_m(E) \odot M)\big)\, dE \\
&= -\ln \int \eta(E) \exp\big(-\mathcal{P}(\mu_t(E) \odot (1 - M))\big)\, dE - \ln \int \eta(E) \exp\big(-\mathcal{P}(\mu_m(E) \odot M)\big)\, dE \\
&= S_{tissue} + S_{metal}
\end{aligned} \tag{2}$$

where $\eta(E)$ denotes the intensity distribution with spectral energy at $E$. Thus, $S_{ma}$ can be regarded as having two parts: one is contributed by the attenuation of tissues, denoted as $S_{tissue}$, and the other is produced by metal objects, denoted as $S_{metal}$. The tissue-like projection data in the metal trace, referred to as $S_{LI}$, are obtained by performing linear interpolation, and the residual of the original projection data and the linear interpolation result, denoted as $S_{sub} = S_{ma} - S_{LI}$, is regarded as the metal contribution. In most MAR methods, the target is to remove $S_{metal}$ from $S_{ma}$. Ideally, we assume $S_{LI} = S_{tissue}$ and $S_{sub} = S_{metal}$. However, $S_{LI}$ is just a coarse estimation of $S_{tissue}$, and some useful information is still retained in $S_{sub}$. Moreover, as shown in Eq. (1), if we simply discard the projection data in the metal trace, both tissue and metal projections will be lost, and the reconstructed CT image risks losing tissue details around the metallic implants. Based on these considerations, our method attempts to restore this valuable information $S_v$ from $S_{sub}$. Then, the artifact-suppressed image $X_{sino}$ can be reconstructed from $S_{LI} + S_v$, which can be expressed as $X_{sino} = \mathcal{P}^{-1}(S_{LI} + S_v)$, in which $\mathcal{P}^{-1}$ denotes the FBP operation.

The proposed DAN-Net

To simultaneously leverage the advantages of both sinogram- and image-domain information, we adopt a dual-domain joint learning strategy for CT MAR, and back-propagation of gradients is conducted by the analytical reconstruction layer. Fig. 1 depicts the overview of our proposed DAN-Net, which consists of three components: adaptive scaling, sinogram-domain network and image-domain network. More details are presented in subsequent sections.
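The data flow between the three components, together with the training objective defined later in this section (Eqs. (3), (4), (6), (7), (9) and (14)-(16)), can be summarized in a minimal NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: `fbp`, `g_sino` and `g_im` are hypothetical callables standing in for the analytical reconstruction layer and the two networks, the L1 terms are written as plain sums of absolute values, and the average filtering step is omitted.

```python
import numpy as np

def dan_net_forward(s_ma, s_li, m_p, fbp, g_sino, g_im, lam=0.4):
    """Sketch of the dual-domain data flow.

    s_ma: metal-corrupted sinogram, s_li: LI-corrected sinogram,
    m_p: projection of the metal mask. fbp, g_sino and g_im are
    injected stand-ins (assumptions), not real library calls.
    """
    m_t = (m_p > 0).astype(s_ma.dtype)        # Eq. (6): binary metal trace
    s_res = lam * (s_ma - s_li)               # Eq. (3): scaled residual
    s_pre = s_li + s_res                      # Eq. (4): adaptively scaled sinogram
    x_pre = fbp(s_pre)                        # reconstruction of the scaled sinogram
    s_sino = g_sino(s_res, m_p) * m_t + s_li  # Eq. (7): refine only inside the trace
    x_sino = fbp(s_sino)                      # analytical reconstruction layer
    x_im = x_pre + g_im(x_sino, x_pre)        # Eq. (14): residual image refinement
    return s_sino, x_sino, x_im, m_t

def total_loss(s_sino, x_sino, x_im, s_gt, x_gt, m_t, m, alpha=1.0, beta=1.0):
    """Eq. (16) with masked L1 terms (sum of absolute values assumed)."""
    l_sino = np.abs((s_sino - s_gt) * m_t).sum()     # Eq. (8)
    l_fbp = np.abs((x_sino - x_gt) * (1 - m)).sum()  # Eq. (9)
    l_im = np.abs((x_im - x_gt) * (1 - m)).sum()     # Eq. (15)
    return l_sino + alpha * l_fbp + beta * l_im
```

Passing the operators in as arguments keeps the sketch independent of any particular projector or network library; in an end-to-end setting each of them would need to be differentiable.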

Fig. 1: Overview of the proposed DAN-Net.

When X-rays pass through a metal material with a high attenuation coefficient, the low-energy part of the beam is significantly attenuated. The beam-hardening effect then becomes more pronounced, leading to an abrupt change in projection data at metal trace boundaries. As mentioned before, this change gives rise to more artifacts after filtering. To eliminate the rapid shift in projection data caused by the metal and maintain more useful information, Chen et al. [50] adopted a linear attenuation operation to restore the data in the metal trace, which can be written as the following formulas for simplicity:

$$S_{res} = \lambda \, S_{sub} \tag{3}$$

$$S_{pre} = S_{LI} + S_{res} \tag{4}$$

where $\lambda$ is the scaling parameter that controls the trade-off between artifact reduction and detail preservation around the metallic implant in the final reconstructed CT images. $S_{res}$ and $S_{pre}$ represent the scaled metal projection and the corrected projection after adaptive scaling, respectively. A greater $\lambda$ keeps more tissue details but also leads to more artifacts, while a smaller $\lambda$ generates fewer artifacts but loses more tissue details. Typically, the value of $\lambda$ is set between 0.3 and 0.5 according to [50]. We chose $\lambda = 0.4$ experimentally, and the corresponding adaptively scaled CT image is obtained as $X_{pre} = \mathcal{P}^{-1}(S_{pre})$.

Fig. 2: Visual comparison with LI and adaptive scaling. (A1) and (A2) refer to the sinogram image and the referenced CT image. (B1) and (B2) are the corrupted sinogram and reconstruction. The MAR sinogram and CT image using LI (C1) and (C2) and adaptive scaling (D1) and (D2) are also presented. The simulated metal masks are colored in red for better visualization.

Fig. 2 shows one example of the original sinogram, an LI corrected sinogram and our adaptively scaled sinogram, as well as their corresponding reconstructions. Although the adaptively scaled result (D2) has more artifacts than the LI corrected output (C2), many more bone and tissue details, especially around the metal, are preserved. This helps to compensate for inaccurate beam hardening corrections and reduces the impact of any errors within the projections of metal implants when backprojecting them to other bone or tissue positions in the reconstructed image.
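The preprocessing in Eqs. (3) and (4) can be illustrated with a short NumPy sketch. It assumes a sinogram array laid out as views by detector bins and a binary metal trace of the same shape. The function names are illustrative, the linear interpolation is done independently for each view along the detector axis, and the average filtering mentioned in the introduction is left out.

```python
import numpy as np

def linear_interpolate_trace(s_ma, m_t):
    """Estimate S_LI by 1D linear interpolation across the metal trace.

    s_ma: sinogram, shape (views, bins) (layout is an assumption).
    m_t:  binary metal trace with the same shape.
    """
    s_li = np.asarray(s_ma, dtype=float).copy()
    bins = np.arange(s_li.shape[1])
    for v in range(s_li.shape[0]):
        bad = np.asarray(m_t[v]).astype(bool)
        # Skip views without metal or without any clean bins to interpolate from.
        if bad.any() and not bad.all():
            s_li[v, bad] = np.interp(bins[bad], bins[~bad], s_li[v, ~bad])
    return s_li

def adaptive_scaling(s_ma, m_t, lam=0.4):
    """Return S_LI, S_res and S_pre following Eqs. (3) and (4)."""
    s_li = linear_interpolate_trace(s_ma, m_t)
    s_sub = s_ma - s_li   # residual, zero outside the metal trace
    s_res = lam * s_sub   # Eq. (3)
    s_pre = s_li + s_res  # Eq. (4)
    return s_li, s_res, s_pre
```

With `lam=0` this reduces to plain LI correction, and with `lam=1` it returns the uncorrected sinogram, which makes the trade-off controlled by the scaling parameter explicit.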

To complete the sinogram, we train a neural network, denoted as $G_{sino}$, to process the projection data. If we take only the LI corrected sinogram $S_{LI}$ as the input of $G_{sino}$, then due to the characteristics of $S_{LI}$, the CT image reconstructed from the output of $G_{sino}$ will be oversmoothed, and some tissue details will be lost [45]. On the other hand, it is challenging to restore information directly from the original corrupted sinogram because the projection data inside and outside the metal trace follow two different distributions. Moreover, as described before, there is some useful information in the adaptively scaled residual sinogram $S_{res}$. To remedy these drawbacks, instead of taking the original sinogram or the LI refined result as the input of $G_{sino}$, we propose a residual sinogram learning strategy for $G_{sino}$, i.e., taking $S_{res}$ as input to enhance the smoothness of the preprocessed projection data, retrieve useful information from the metal trace region $M_t$ and improve the continuity at the metal trace boundary. Meanwhile, current networks contain down-sampling operations, which cause information loss in metal traces [36, 38]. In this work, U-Net is utilized as the backbone of $G_{sino}$. To retain sufficient information on metal traces, a mask pyramid network (MPN) [51] is introduced to explicitly feed the mask information into each layer [28, 45]. To utilize the knowledge of the metal mask projection $M_p$ specifically, we concatenate $S_{res}$ and $M_p$ as the inputs of $G_{sino}$ [45]. Thus, we have

$$M_p = \mathcal{P}(X_{metal}) \tag{5}$$

$$M_t = \delta(M_p > 0) \tag{6}$$

where $X_{metal}$ represents the metal mask in the image domain, and $\delta(\cdot)$ is a binary indicator function. Since our main goal is to retrieve information from the metal trace, we only refine the adaptively scaled residual sinogram in the metal trace. The details of $G_{sino}$ are shown in Fig. 3, and the corrected sinogram can be written as

$$S_{sino} = G_{sino}(S_{res}, M_p) \odot M_t + S_{LI} \tag{7}$$

where $\odot$ stands for the elementwise multiplication. L1 loss is adopted to measure the differences between $S_{sino}$ and the ground truth $S_{gt}$ as

$$\mathcal{L}_{sino} = \| (S_{sino} - S_{gt}) \odot M_t \|_1 \tag{8}$$

Then, $X_{sino} = \mathcal{P}^{-1}(S_{sino})$ can be obtained using an analytical reconstruction layer, which is differentiable and easily injected into neural networks. To alleviate the secondary artifacts in the reconstructed CT image, the L1 reconstruction loss between $X_{sino}$ and the ground truth image $X_{gt}$ is utilized as

$$\mathcal{L}_{FBP} = \| (X_{sino} - X_{gt}) \odot (1 - M) \|_1 \tag{9}$$

Fig. 3: An illustration of the $G_{sino}$. K: kernel, S: stride, and P: padding sizes.

To suppress the secondary artifacts introduced by the errors of projection data completion in $G_{sino}$, we also utilize U-Net as the backbone to enhance the reconstructed CT images. For computational efficiency, we halve the channel numbers. It is well known that convolution is a local operator whose receptive field is limited by the size of filters. If the network is insufficiently deep, it is difficult to capture the latent features in long-range dependencies. For instance, since images have the property of self-similarity [52], and metal artifacts are nonlocal, convolution-based postprocessing methods may fail to remove the artifacts well. To tackle this problem, a nonlocal network (NLN) [53], which can capture long-range dependencies via nonlocal operations, is introduced into our proposed image-domain network $G_{im}$. NLNs originate from the nonlocal means denoising method [52]. Different from nonlocal means, which performs weighted summation with similar pixels, NLN captures feature maps globally. A generic nonlocal operation is defined as

$$y_i = \frac{1}{C(x)} \sum_{j \in S} f(x_i, x_j)\, g(x_j) \tag{10}$$

where $x_i$ represents the $i$-th element to be replaced, and $y_i$ is the result. $S$ represents a search window. The pairwise function $f$ computes the similarity between $x_i$ and $x_j$, which is expressed as follows:

$$f(x_i, x_j) = \exp\big(\theta(x_i)^T \phi(x_j)\big) \tag{11}$$

where $\theta(x_i) = W_\theta x_i$ and $\phi(x_j) = W_\phi x_j$ are two embeddings of feature maps, and $W_\theta$ and $W_\phi$ are the learnable weight matrices. The function $g$ serves to compute a representation of the input signal at the position of $j$. According to [53], $g$ is a linear embedding: $g(x_j) = W_g x_j$, where $W_g$ is a learned weighting matrix. $C(x)$ represents the normalization factor, which is defined as

$$C(x) = \sum_{j \in S} f(x_i, x_j) \tag{12}$$

Fig. 4: An illustration of the NLN.

To insert nonlocal operations into the neural network, a residual connection is adopted:

$$z_i = W_z y_i + x_i \tag{13}$$

where $x_i$ denotes the input data and the term $+\, x_i$ is a residual connection as in ResNet [54]. In this work, the NLN module is embedded into $G_{im}$ after the second and third down-sampling steps, as depicted in Fig. 1. In NLN, $W_\theta$, $W_\phi$, $W_g$ and $W_z$ are obtained by convolution. Fig. 4 shows the nonlocal module. To focus on the artifact-impacted regions, $X_{sino}$ and $X_{pre}$ are concatenated as inputs of $G_{im}$. A residual learning strategy is also adopted, which is written as:

$$X_{im} = X_{pre} + G_{im}(X_{sino}, X_{pre}) \tag{14}$$

The details of $G_{im}$ are shown in Fig. 5. $G_{im}$ is also optimized with L1 loss in the image domain:

$$\mathcal{L}_{im} = \| (X_{im} - X_{gt}) \odot (1 - M) \|_1 \tag{15}$$

Fig. 5: An illustration of the $G_{im}$. K: kernel, S: stride, P: padding sizes and D: dilation.

In summary, the total objective function is:

$$\mathcal{L} = \mathcal{L}_{sino} + \alpha \, \mathcal{L}_{FBP} + \beta \, \mathcal{L}_{im} \tag{16}$$

where $\alpha$ and $\beta$ are the weighting parameters of different components. In our experiments, we empirically set $\alpha = \beta = 1$.

Experiments
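For reference alongside the experiments, the nonlocal operation of Eqs. (10)-(13) used in the image-domain network can be sketched in NumPy. The sketch makes several assumptions that are not stated in the text: the feature map is flattened to N positions by C channels, the search window S covers all positions, the pairwise function is the embedded Gaussian form of [53], and the learned convolutions are replaced by plain weight matrices passed in by the caller.

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g, w_z):
    """Nonlocal operation with a residual connection.

    x:        (N, C) flattened feature map (N positions, C channels).
    w_theta, w_phi, w_g: (C, D) embedding matrices (stand-ins for convolutions).
    w_z:      (D, C) output projection.
    """
    theta = x @ w_theta      # embeddings theta(x_i), shape (N, D)
    phi = x @ w_phi          # embeddings phi(x_j), shape (N, D)
    g = x @ w_g              # representations g(x_j), shape (N, D)
    logits = theta @ phi.T   # theta(x_i)^T phi(x_j) for every pair (i, j)
    # Subtracting the row maximum is for numerical stability only;
    # the constant cancels when f is divided by C(x).
    logits = logits - logits.max(axis=1, keepdims=True)
    f = np.exp(logits)                       # Eq. (11), up to a per-row constant
    attn = f / f.sum(axis=1, keepdims=True)  # f / C(x), Eq. (12)
    y = attn @ g                             # Eq. (10)
    return y @ w_z + x                       # Eq. (13)
```

The pairwise matrix has N by N entries, so memory grows quadratically with the number of positions. This is one plausible reason for applying such a module to down-sampled feature maps rather than at full resolution, although the text does not state the motivation.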

In this section, the data generation, details of neural networks, training strategies and experimental results will be shown in detail.

Dataset

For data simulation, we followed the procedure of [44] and used the DeepLesion dataset [55], which has high diversity and good quality. For metal mask simulation, the shapes, sizes and positions of the masks should be carefully designed to cover real clinical scenes. In this work, we employed the masks from [41], which contain 100 manually segmented metal implants of various kinds, such as dental fillings, spine fixation screws, hip prostheses, coils and wires. Specifically, we randomly selected 1000 CT images from the DeepLesion dataset and 90 metal masks to synthesize 90,000 combinations for the training set. The remaining 200 CT images and 10 masks were adopted for evaluation. The original CT images were resized to 256 × 256 for computational efficiency. To simulate Poisson noise, a polychromatic X-ray source was employed, and the incident beam X-ray was set to 2 × 10 photons [56]. The partial volume effects and scatter were also taken into consideration. Without loss of generality, our experiments were restricted to 2D parallel-beam geometry, i.e., the sinograms of CT images were obtained with the radon function in MATLAB R2017b. For the sampling condition, 367 detector bins and 361 sampling views uniformly distributed from 0° to 180° were assumed. Therefore, the sinogram had a size of 367 × 361. Unlike [44], we truncated the CT values to [0, 4095], which better conforms to the real situation.

Implementation details

We trained our network in an end-to-end manner, and the model was implemented with the PyTorch framework [57]. The back-projection was implemented with the numba library in Python, which improves computational efficiency with the aid of CUDA. The network was optimized by the Adam optimizer with the parameters $(\beta_1, \beta_2) = (0.5, 0.999)$. The learning rate was initialized to 0.0002 and halved every 20 epochs. The network was trained for 200 epochs on an NVIDIA 1080Ti GPU with 11 GB memory, and the batch size was 4.
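The numbers quoted for the sampling geometry and the training schedule can be checked with a few lines of standard-library Python. Two assumptions are made here and should be verified against the respective documentation: the detector-bin formula is the row count of MATLAB's radon output as recalled from its documentation, and the 361 views are taken to be spaced 0.5° apart with both endpoints included. The step schedule counts epochs from 0.

```python
import math

def radon_num_bins(n_rows, n_cols):
    """Detector-bin count of a parallel-beam sinogram.

    Formula recalled from the MATLAB radon documentation (assumption).
    """
    d0 = n_rows - math.floor((n_rows - 1) / 2) - 1
    d1 = n_cols - math.floor((n_cols - 1) / 2) - 1
    return 2 * math.ceil(math.hypot(d0, d1)) + 3

def num_views(step_deg=0.5, start_deg=0.0, stop_deg=180.0):
    """Number of uniformly spaced views, both endpoints included (assumption)."""
    return int(round((stop_deg - start_deg) / step_deg)) + 1

def learning_rate(epoch, base_lr=2e-4, step=20, gamma=0.5):
    """Learning rate halved every `step` epochs, epochs counted from 0."""
    return base_lr * gamma ** (epoch // step)
```

Under these assumptions a 256 × 256 image yields 367 detector bins and 361 views, matching the stated sinogram size, and the learning rate in the last of the 200 epochs is 0.0002 × 0.5^9.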

Comparison with State-of-the-Art Methods

TABLE I: Quantitative comparison of different methods on the simulated dataset.

Methods     Uncorrected   LI       NMAR     CNNMAR   DuDoNet   ADN      DAN-Net
PSNR (dB)   15.33         30.74    30.83    32.15    36.82     33.60    [missing]
SSIM        0.6673        0.9224   0.9270   0.9508   0.9777    0.9275   [missing]

The proposed DAN-Net was compared with several state-of-the-art MAR methods: linear interpolation (LI) [8], NMAR [15], CNNMAR [41], DuDoNet [44] and ADN [42]. LI and NMAR are classic methods widely used in MAR. CNNMAR is a well-known application of DL in MAR that comprehensively demonstrates the effectiveness and potential of CNN-based methods. DuDoNet is a supervised dual-domain framework in MAR that incorporates an extra sinogram enhancement network to ease the learning of the image domain. ADN is a state-of-the-art unsupervised framework in MAR that disentangles CT images corrupted by metal artifacts into an artifact-free domain and a pure artifact domain and decodes disentangled representations of artifact-free domains to artifact-suppressed images. For the LI, NMAR, CNNMAR, and ADN methods, we used publicly released codes. Because there are no public implementations of the DuDoNet method, we reimplemented it following [44].

Fig. 6: Visual comparison using different methods on the simulated dataset with different metal sizes. (A1)-(A3) Reference images with different metal sizes; (B1)-(B3) metal corrupted images; (C1)-(C3) corresponding results of LI; (D1)-(D3) corresponding results of NMAR; (E1)-(E3) corresponding results of CNNMAR; (F1)-(F3) corresponding results of DuDoNet; (G1)-(G3) corresponding results of ADN; (H1)-(H3) corresponding results of our method.

Fig. 7: Sinogram visual comparison of case 3 in Fig. 6 using different sinogram enhancement methods on the simulated dataset. (A) Reference images; (B) metal corrupted images; (C) corresponding results of LI; (D) corresponding results of NMAR; (E) corresponding results of CNNMAR; (F) corresponding results of DuDoNet; and (G) corresponding results of DAN-Net.

Structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) are adopted as quantitative metrics. TABLE I lists the quantitative results obtained by calculating the mean values of both metrics on all of the test images using different methods. It is observed that the traditional MAR methods LI and NMAR significantly improve both SSIM and PSNR values compared with uncorrected CT images. NMAR achieves better scores than LI since it takes advantage of both prior images and the LI method. CNNMAR takes the outputs of different MAR methods as inputs, which is an open framework to fuse the merits of different MAR methods based on DL technology, and it outperforms conventional methods. ADN is an advanced unsupervised DL-based method that achieves similar performance to CNNMAR without the need for paired training data. DuDoNet and our method attain remarkable improvements on both SSIM and PSNR since they simultaneously leverage the advantages of the sinogram domain and image domain. Compared with DuDoNet, DAN-Net further raises the scores, which quantitatively demonstrates the effectiveness of our proposed method.

For qualitative comparisons, the visual results are shown in Fig. 6, presenting three representative metallic implants with different sizes. In Fig. 6, metal-free images, metal-corrupted images and the results using different MAR methods are included. For better visualization, the simulated metal masks are colored in red. It can be seen that in the case of small metallic implants, the results of the traditional methods, LI and NMAR, still contain some radial artifacts, while DL-based methods perform better. When metal objects get larger, LI and NMAR perform even worse. LI and NMAR introduce obvious new artifacts in Fig. 6 (C1-C3 and D1-D3). Although CNNMAR suppresses secondary artifacts, distorted structures and missing tissue details can be observed in Fig. 6 (E3). It is also worth noting that, in the third case, the other methods fail to preserve the details around metallic implants, while DAN-Net maintains these structural details more completely.

Fig. 7 shows the corresponding intermediate sinogram enhancement results. Considering that ADN is an image postprocessing method, its sinogram enhancement is not presented. In the regions indicated by the blue arrows in Fig. 7 (C), there are obvious artificial boundaries, whereas in the results of other methods, these boundaries are inconspicuous. In Fig. 7 (D-G), as indicated by the green arrows, NMAR, CNNMAR and DuDoNet generate visible differences from the reference sinogram (Fig. 7 (A)), and DAN-Net achieves the sinogram that is most visibly consistent with the reference.
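As a reminder of how the PSNR values in TABLE I are defined, the standard formula can be sketched as below. The choice of `data_range` and whether the metal region is excluded before averaging are not specified in the text, so both are left to the caller and should be treated as assumptions.

```python
import numpy as np

def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

For example, a uniform error of 0.1 on images with a data range of 1 gives an MSE of 0.01 and therefore a PSNR of 20 dB.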

Clinical study

To verify the performance of the proposed DAN-Net in a clinical scenario, two clinical CT images with small and large metal artifacts were tested. In this experiment, the metal regions were empirically segmented using 2000 HU as the threshold. The test images were normalized to the same range as the training data.

Fig. 8: Visual comparison with different MAR methods on a clinical CT image. A1-G1 and A2-G2 represent uncorrected CT images and corrected results using LI, NMAR, CNNMAR, DuDoNet, ADN and DAN-Net.

Fig. 9: Visual comparison with different MAR methods on a clinical CT image. A1-G1 and A2-G2 represent uncorrected CT images and corrected results using LI, NMAR, CNNMAR, DuDoNet, ADN and DAN-Net.

Fig. 8 and Fig. 9 present the MAR results using different methods. It is observed that DAN-Net suppresses most of the metal artifacts and preserves the fine-grained anatomical structures around the metals, which is consistent with the results on the simulated data and demonstrates the potential for real clinical application. Meanwhile, the performance of most MAR methods depends on the preceding segmentation results, and our method will also benefit from a more accurate segmentation algorithm.
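The threshold-based segmentation used in the clinical study, followed by the metal trace of Eqs. (5) and (6), can be sketched as follows. The strict inequality at 2000 HU is an assumption, and `project_two_views` is a toy stand-in for a real forward projector that only sums along the two image axes (0° and 90° parallel-beam views).

```python
import numpy as np

def segment_metal(hu_image, threshold_hu=2000.0):
    """Binary metal mask from a CT image in Hounsfield units."""
    return np.asarray(hu_image) > threshold_hu

def metal_trace(metal_mask, project):
    """Metal trace M_t from the projected mask M_p, Eqs. (5) and (6)."""
    m_p = project(np.asarray(metal_mask, dtype=float))  # Eq. (5)
    return m_p > 0                                      # Eq. (6)

def project_two_views(image):
    """Toy projector (assumption): column sums and row sums only."""
    return np.stack([image.sum(axis=0), image.sum(axis=1)])
```

In practice `project` would be the same forward projection operator used to generate the sinograms, so that the trace lines up with the measured data.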

TABLE II: Quantitative comparison of different variants of our method on the simulated dataset.

Methods: Uncorrected, Sino-Net, Res-Sino-Net, IM-Net, Nonlocal-IM-Net, Ma-Dual-Net, DAN-Net
PSNR: 15.33 31.43 31.71 33.79 34.75 34.15
SSIM: 0.6673 0.9232 0.9494 0.9520 0.9720 0.9597

Ablation Study

In this section, we investigate the effectiveness of different modules of the proposed DAN-Net. The ablation study configurations are listed as follows:

1) Sino-Net: the sinogram-domain network without residual learning;
2) Res-Sino-Net: the sinogram-domain network with residual learning;
3) IM-Net: the image-domain network without a nonlocal module;
4) Nonlocal-IM-Net: the image-domain network with the nonlocal module;
5) Ma-Dual-Net: a dual-domain network with sinogram-domain residual learning and an image-domain nonlocal module without an adaptively scaled sinogram; and
6) DAN-Net: the same architecture as Ma-Dual-Net with an adaptively scaled sinogram.

The quantitative results of the ablation study are given in TABLE II and the visual results are shown in Fig. 10 and Fig. 11.
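For readers who wish to reproduce this kind of comparison, PSNR can be computed with the standard definition below. This is a sketch rather than the evaluation code used for TABLE II: the function name `psnr` is illustrative, and the choice of `data_range` (here defaulting to the dynamic range of the reference) is an assumption. SSIM involves local statistics and is usually taken from an existing library, so it is not re-implemented here.

```python
import numpy as np


def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if data_range is None:
        # Assumption: use the dynamic range of the reference image.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


# Toy check: a constant error of 0.1 on an image with unit range
# gives MSE = 0.01, i.e. about 20 dB.
ref = np.zeros((4, 4))
ref[0, 0] = 1.0
est = ref + 0.1
value = psnr(ref, est)
```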

Fig. 10: Sinograms and corresponding reconstructions with sinogram-domain enhancement methods. The simulated metal masks are colored red. (A1) and (A2): ground truth. (B1) and (B2): Sino-Net. (C1) and (C2): Res-Sino-Net.

Fig. 11: Reconstructions using image-domain and dual-domain enhancement methods. The simulated metal masks are colored in red. The reference image is Fig. 8 (A1). (A): IM-Net, (B): Nonlocal-IM-Net, (C): Ma-Dual-Net and (D): DAN-Net.

To evaluate the performance of our residual sinogram learning strategy, a neural network that takes the original sinogram as input to complete the projection data within the metal trace was trained. In TABLE II, it is evident that the residual sinogram learning strategy improves both the SSIM and PSNR values. The visual results are shown in Fig. 10 (B) and (C). There are evident artifacts in the result of Sino-Net, experimentally demonstrating that it is difficult for a convolutional neural network to transform two different data distributions to the same distribution in the sinogram domain. In contrast, in the Res-Sino-Net results, residual information is recovered from the adaptively scaled projection data within the metal trace, thereby easing network learning.
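One way to read the residual strategy is that the network predicts only a correction, which is added to the adaptively scaled sinogram inside the metal trace while the data outside the trace are left untouched. The sketch below illustrates this under stated assumptions: `predict_residual` is a hypothetical stand-in for the sinogram-domain network, and the toy arrays are not real projection data.

```python
import numpy as np


def residual_sinogram_update(scaled_sino, trace_mask, predict_residual):
    """Add a predicted residual to the sinogram, restricted to the metal trace.

    scaled_sino:      (n_views, n_detectors) adaptively scaled sinogram
    trace_mask:       same shape, 1 inside the metal trace and 0 elsewhere
    predict_residual: callable standing in for the network (hypothetical)
    """
    residual = predict_residual(scaled_sino, trace_mask)
    return scaled_sino + trace_mask * residual


def dummy_network(sino, mask):
    """Placeholder 'network' that predicts a constant residual everywhere."""
    return np.full_like(sino, 0.5)


scaled = np.ones((4, 6))
trace = np.zeros((4, 6))
trace[:, 2:4] = 1.0  # toy metal trace covering two detector bins
refined = residual_sinogram_update(scaled, trace, dummy_network)
```

Because the mask multiplies the residual, values outside the metal trace pass through unchanged regardless of what the network outputs there.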

To further suppress artifacts in the image domain, a nonlocal U-Net architecture is adopted. To validate the effectiveness of this modification, a neural network is trained to refine the CT images directly without the nonlocal module. The quantitative comparison is shown in TABLE II, from which we can see that Nonlocal-IM-Net has higher SSIM and PSNR values than IM-Net. For the qualitative comparison, it is observed in Fig. 11 (A) and (B) that artifacts are better suppressed in the results of Nonlocal-IM-Net than in those of IM-Net.
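The way a nonlocal module captures long-range dependencies can be illustrated with a generic embedded-Gaussian non-local operation, in which every spatial position attends to all other positions. The NumPy sketch below is not the exact module used in DAN-Net: the function names, the random projection weights and the inner channel size are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def nonlocal_block(feat, w_theta, w_phi, w_g, w_out):
    """Generic non-local operation on a feature map.

    feat: (C, H, W); w_theta, w_phi, w_g: (C_inner, C); w_out: (C, C_inner).
    Returns the updated feature map and the (N, N) attention matrix, N = H * W.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # (C, N)
    theta = w_theta @ x                      # queries, (C_inner, N)
    phi = w_phi @ x                          # keys,    (C_inner, N)
    g = w_g @ x                              # values,  (C_inner, N)
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N), each row sums to 1
    y = g @ attn.T                           # weighted sum over all positions
    out = x + w_out @ y                      # residual connection
    return out.reshape(c, h, w), attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 5))
w_theta, w_phi, w_g = (rng.standard_normal((2, 4)) for _ in range(3))
w_out = rng.standard_normal((4, 2))
out, attn = nonlocal_block(feat, w_theta, w_phi, w_g, w_out)
```

The attention matrix grows quadratically with the number of positions, so such a block is typically placed on low-resolution feature maps.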

In this subsection, the impact of adaptive scaling is examined. Ma-Dual-Net takes the original sinogram and the corresponding reconstructed image as the inputs, and ours takes the adaptively scaled sinogram and the corresponding reconstructed image as the inputs. In TABLE II, our approach outperforms Ma-Dual-Net in quantitative aspects. The visual comparison is also presented in Fig. 11 (C) and (D), in which we can observe that our method retrieves many more structural details around the metallic implants.

Discussions and conclusion

Due to the insertion of metals, the imaging quality of CT images will significantly degrade. Over the past few decades, a large number of MAR methods have been proposed to alleviate the effects of metal artifacts in CT images. In conventional methods, projection data in the metal trace are regarded as missing, and MAR is formulated as an image interpolation or inpainting problem. Some interpolation methods, such as linear interpolation and cubic polynomial interpolation, are applied to fill the missing projection data. Nonetheless, since most interpolation methods cannot guarantee continuity near the interpolation boundary, there are apparent borderlines in the corrected sinogram, and secondary artifacts appear. Furthermore, since the projection data in the metal trace are simply abandoned and replaced with values estimated from the data outside the metal trace, the information within the metal trace is lost, leading to the loss of tissue details around the metal in the reconstructed CT image. Therefore, interpolation-based methods not only introduce secondary artifacts but also lose details. In practice, it is difficult for single-domain methods to suppress artifacts and preserve details simultaneously [44]. However, interpolation-based methods can generate a proper initial estimation for DL-based methods, which has been employed in several works. We also adopt this technique.

In this work, we combine the advantages of conventional MAR approaches and DL-based methods to further improve the performance. Although we adopt the same end-to-end training strategy, there are some significant differences. To restrain artifacts and maintain tissue details more efficiently, adaptive scaling on the original projection data in the metal trace is applied. Then, the preprocessed sinogram and corresponding reconstructed CT images are utilized as the inputs of our network.
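The linear interpolation baseline described in the discussion above can be sketched as follows: for each projection view, the detector bins inside the metal trace are discarded and refilled from the nearest valid bins on either side. This is a minimal illustration, assuming a parallel-layout sinogram stored as a (views, detectors) array. The function name `linear_interpolation_mar` is hypothetical.

```python
import numpy as np


def linear_interpolation_mar(sinogram, trace_mask):
    """Fill metal-trace bins by 1D linear interpolation along the detector axis.

    sinogram, trace_mask: arrays of shape (n_views, n_detectors);
    trace_mask is True (or nonzero) inside the metal trace.
    """
    sinogram = np.asarray(sinogram, dtype=np.float64)
    trace_mask = np.asarray(trace_mask, dtype=bool)
    corrected = sinogram.copy()
    bins = np.arange(sinogram.shape[1])
    for view in range(sinogram.shape[0]):
        bad = trace_mask[view]
        # Skip views without metal, and views with no valid bins to interpolate from.
        if bad.any() and not bad.all():
            corrected[view, bad] = np.interp(
                bins[bad], bins[~bad], sinogram[view, ~bad]
            )
    return corrected


# Toy example: a linear ramp per view, corrupted in bins 2 and 3.
clean = np.tile(np.arange(6, dtype=np.float64), (2, 1))
corrupted = clean.copy()
corrupted[:, 2:4] = 100.0
trace = np.zeros_like(clean, dtype=bool)
trace[:, 2:4] = True
restored = linear_interpolation_mar(corrupted, trace)
```

On this ramp the interpolation is exact, but on real projection data the original values inside the trace are discarded, which is the source of the lost detail noted in the text.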
Because metal has a much higher attenuation coefficient, the projection data inside and outside the metal trace can be regarded as two different data distributions. It is difficult for conventional networks to convert two different data distributions to a unified distribution. To tackle this problem, a residual learning strategy that only modifies the metal trace region values of the adaptively scaled sinogram is used. To alleviate the new artifacts introduced in image domain enhancement, we propose a novel nonlocal U-Net architecture that can capture long-range dependencies to suppress metal artifacts.

However, there are some limitations to our work, and we will dedicate ourselves to solving them in the future. In an end-to-end training manner, it is preferable to obtain the adaptive parameter by learning instead of through a manual setting. Fortunately, the subsequent filtering can reduce the influence of inaccurate parameter selection according to [50]. In the future, we will investigate how to integrate this parameter learning into the model to minimize human interference. In addition, we trained and evaluated our networks on simulated datasets, and only a few clinical CT images were used to validate the effectiveness of our model. In the future, we will collect large-scale clinical images to evaluate the performance of our method in the clinical scenario more comprehensively and systematically.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Tao Wang: Conceptualization, Methodology, Writing - original draft, Investigation, Software.

Wenjun Xia: Writing - review & editing, Formal analysis.

Yongqiang Huang: Writing - review & editing.

Huaiqiang Sun: Data curation.

Yan Liu: Writing - review & editing, Funding acquisition.

Hu Chen: Writing - review & editing, Funding acquisition.

Jiliu Zhou: Data curation, Funding acquisition, Project administration.

Yi Zhang: Methodology, Writing - review & editing, Funding acquisition, Project administration.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61871277 and 61902264, and in part by the Sichuan Science and Technology Program under Grants 2021JDJQ0024 and 2019YFS0125.

References

[1]. Thlin, G. and Jan, Basic Principles of Computed Tomography. Radiology, 1984. 151(1): p. 144-144.
[2]. Mouton, A., et al., An experimental survey of metal artefact reduction in computed tomography. Journal of X-ray Science and Technology, 2013. 21(2): p. 193-226.
[3]. Yazdi, M. and L. Beaulieu. A novel approach for reducing metal artifacts due to metallic dental implants. in Nuclear Science Symposium Conference Record. p. 2260-2263, 2007: IEEE.
[4]. Gjesteby, L., et al., Metal artifact reduction in CT: where are we after four decades? IEEE Access, 2016. 4: p. 5826-5849.
[5]. Zhang, Y., et al., A new CT metal artifacts reduction algorithm based on fractional-order sinogram inpainting. Journal of X-ray Science and Technology, 2011. 19(3): p. 373-384.
[6]. Xue, H., et al. Metal artifact reduction in dual energy CT by sinogram segmentation based on active contour model and TV inpainting. in 2009 IEEE Nuclear Science Symposium Conference Record (NSS/MIC). p. 904-908, 2009: IEEE.
[7]. Duan, X., et al. Metal artifact reduction in CT images by sinogram TV inpainting. in 2008 IEEE Nuclear Science Symposium Conference Record. p. 4175-4177, 2008: IEEE.
[8]. Lewitt, R.M. and R.H.T. Bates, Image reconstruction from projections III: projection completion methods (theory). Optik, 1978: p. 50: 189-204.
[9]. Kalender, W.A., R. Hebel and J. Ebersberger, Reduction of CT artifacts caused by metallic implants. Radiology, 1987. 164(2): p. 576-577.
[10]. Zhao, S., et al., A wavelet method for metal artifact reduction with multiple metallic objects in the field of view. Journal of X-Ray Science and Technology, 2002. 10(1): p. 67-76.
[11]. Mehranian, A., et al., X-ray CT Metal Artifact Reduction Using Wavelet Domain L-0 Sparse Regularization. IEEE Transactions on Medical Imaging, 2013. 32(9): p. 1707-1722.
[12]. Lin, Z. and Q. Shi, Reduction of metal artifact in X-ray CT by quartic-polynomial interpolation. Journal of Image and Graphic, 2001. 6(2): p. 142-147.
[13]. Gu, J., et al., Comparison among general interpolation methods for metal artifacts reduction in CT images. Nuclear Electronics and Detection Technology, 2006. 26(6): p. 905.
[14]. Wang, J., et al., Metal artifact reduction in CT using fusion based prior image. Medical Physics, 2013. 40(8): p. 081903.
[15]. Meyer, E., et al., Normalized metal artifact reduction (NMAR) in computed tomography. Medical Physics, 2010. 37(10): p. 5482-5493.
[16]. Li, M., et al., A prior-based metal artifact reduction algorithm for x-ray CT. Journal of X-ray Science and Technology, 2015. 23(2): p. 229-241.
[17]. Wang, G. and D.L. Snyder, Iterative deblurring for CT metal artifact reduction. IEEE Transactions on Medical Imaging, 1996. 15(5): p. 657-664.
[18]. Wang, G., M.W. Vannier and P. Cheng, Iterative X-ray cone-beam tomography for metal artifact reduction and local region reconstruction. Microscopy and microanalysis, 1999. 5(1): p. 58-65.
[19]. Wang, G., T. Frei and M.W. Vannier, Fast iterative algorithm for metal artifact reduction in X-ray CT. Academic radiology, 2000. 7(8): p. 607-614.
[20]. Peng, C., et al., GPU-accelerated dynamic wavelet thresholding algorithm for X-ray CT metal artifact reduction. IEEE Transactions on Radiation and Plasma Medical Sciences, 2017. 2(1): p. 17-26.
[21]. Zhang, H., B. Dong and B. Liu, A reweighted joint spatial-radon domain ct image reconstruction model for metal artifact reduction. SIAM Journal on Imaging Sciences, 2018. 11(1): p. 707-733.
[22]. Mehranian, A., et al., 3D prior image constrained projection completion for X-ray CT metal artifact reduction. IEEE Transactions on Nuclear Science, 2013. 60(5): p. 3318-3332.
[23]. De Man, B., et al., Reduction of metal streak artifacts in x-ray computed tomography using a transmission maximum a posteriori algorithm. IEEE Transactions on Nuclear Science, 2000. 47(3): p. 977-981.
[24]. Yu, W., et al., Low-dose computed tomography reconstruction regularized by structural group sparsity joined with gradient prior. Signal Processing, 2021. 182: p. 107945.
[25]. Gong, C. and L. Zeng, Adaptive iterative reconstruction based on relative total variation for low-intensity computed tomography. Signal Processing, 2019. 165: p. 149-162.
[26]. Soltanian-Zadeh, H., J.P. Windham and J. Soltanianzadeh. CT artifact correction: an image-processing approach. in Medical Imaging 1996: Image Processing. DOI:10.1117/12.237950, 1996: International Society for Optics and Photonics.
[27]. Ballhausen, H., et al., Post-processing sets of tilted CT volumes as a method for metal artifact reduction. Radiation Oncology, 2014. 9(1): p. 114.
[28]. Yu, L., et al., Deep Sinogram Completion with Image Prior for Metal Artifact Reduction in CT Images. IEEE Transactions on Medical Imaging, 2020. 40(1): p. 228-238.
[29]. Miotto, R., et al., Deep learning for healthcare: review, opportunities and challenges. Briefings in bioinformatics, 2018. 19(6): p. 1236-1246.
[30]. Guo, Y., et al., Deep learning for visual understanding: A review. Neurocomputing, 2016. 187: p. 27-48.
[31]. LeCun, Y., Y. Bengio and G. Hinton, Deep learning. Nature, 2015. 521(7553): p. 436-444.
[32]. Wang, G., et al., Image reconstruction is a new frontier of machine learning. IEEE Transactions on Medical Imaging, 2018. 37(6): p. 1289-1296.
[33]. Bai, Y., et al., Deep learning methods for solving linear inverse problems: Research directions and paradigms. Signal Processing, 2020: p. 107729.
[34]. Jin, K.H., et al., Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 2017. 26(9): p. 4509-4522.
[35]. Ye, J.C., Y. Han and E. Cha, Deep convolutional framelets: A general deep learning framework for inverse problems. SIAM Journal on Imaging Sciences, 2018. 11(2): p. 991-1048.
[36]. Park, H.S., et al., CT sinogram-consistency learning for metal-induced beam hardening correction. Medical Physics, 2018. 45(12): p. 5376-5384.
[37]. Ghani, M.U. and W.C. Karl, Deep Learning Based Sinogram Correction for Metal Artifact Reduction. Electronic Imaging, 2018. 2018(15): p. 472-1-4728.
[38]. Ghani, M.U. and W.C. Karl, Fast enhanced CT metal artifact reduction using data domain deep learning. IEEE Transactions on Computational Imaging, 2019. 6: p. 181-193.
[39]. Long, J., E. Shelhamer and T. Darrell, Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015. 39(4): p. 640-651.
[40]. Pimkin, A., et al. Multidomain CT Metal Artifacts Reduction Using Partial Convolution Based Inpainting. in 2020 International Joint Conference on Neural Networks (IJCNN). DOI: 10.1109/IJCNN48605.2020.9206625, 2020: IEEE.
[41]. Zhang, Y. and H. Yu, Convolutional neural network based metal artifact reduction in x-ray computed tomography. IEEE Transactions on Medical Imaging, 2018. 37(6): p. 1370-1381.
[42]. Liao, H., et al., ADN: Artifact disentanglement network for unsupervised metal artifact reduction. IEEE Transactions on Medical Imaging, 2019. 39(3): p. 634-643.
[43]. Gjesteby, L., et al., A dual-stream deep convolutional network for reducing metal streak artifacts in CT images. Physics in Medicine & Biology, 2019. 64(23): p. 235003.
[44]. Lin, W., et al. Dudonet: Dual domain network for ct metal artifact reduction. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. p. 10512-10521, 2019.
[45]. Lyu, Y., et al. Encoding Metal Mask Projection for Metal Artifact Reduction in Computed Tomography. in International Conference on Medical Image Computing and Computer-Assisted Intervention. p. 147-157, 2020: Springer.
[46]. Peng, C., et al., An irregular metal trace inpainting network for x-ray CT metal artifact reduction. Medical Physics, 2020. 47(9): p. 4087-4100.
[47]. Pan, X., Optimal noise control in and fast reconstruction of fan-beam computed tomography image. Medical Physics, 1999. 26(5): p. 689-697.
[48]. Watzke, O. and W.A. Kalender, A pragmatic approach to metal artifact reduction in CT: merging of metal artifact reduced images. European radiology, 2004. 14(5): p. 849-856.
[49]. Kachelrieß, M., O. Watzke and W.A. Kalender, Generalized multi-dimensional adaptive filtering for conventional and spiral single-slice, multi-slice, and cone ‐‐