Iterative Reconstruction for Low-Dose CT using Deep Gradient Priors of Generative Model
Zhuonan He, Yikun Zhang, Yu Guan, Shanzhou Niu, Yi Zhang, Yang Chen, Qiegen Liu
Abstract— —Dose reduction in computed tomography (CT) is essential for decreasing radiation risk in clinical applications. Iterative reconstruction is one of the most promising ways to compensate for the increased noise due to reduction of photon flux. Rather than most exist-ing prior-driven algorithms that benefit from manually designed prior functions or supervised learning schemes, in this work we integrate the data-consistency as a con-ditional term into the iterative generative model for low-dose CT. At first, a score-based generative network is used for unsupervised distribution learning and the gradient of generative density prior is learned from normal-dose images. Then, the annealing Langevin dy-namics is employed to update the trained priors with conditional scheme, i.e., the distance between the recon-structed image and the manifold is minimized along with data fidelity during reconstruction. Experimental com-parisons demonstrated the noise reduction and detail preservation abilities of the proposed method.
Index Terms—
Computed tomography, iterative re-construction, score-based generative network, gradient prior. I. I NTRODUCTION -RAY computed tomography (CT) is a popular imaging modality with applications in biology, medicine and other fields. The extensive use of CT examination has raised concerns about the potential risks of carcinogenesis or genetic damage from X-ray radiation [1]. Low-dose CT (LDCT) can image many clinical indications to minimize radiation-related risks without significantly affecting screening or diagnostic performance. Hence, making the radiation dose as low as reasonably achievable is commonly recognized, and it has been a hot research topic during the latest three decades. However, reducing the radiation dose will increase data noise and introduce artifacts into the reconstructed images, which adversely affects its diagnostic performance if these issues are not addressed [2]. Later than the classical filtered back-projection (FBP) algorithm, iterative reconstruction methods become the mainstream in the past few decades [3]-[7]. Incorporating photon statistics and prior information of the target image, these methods have great potentials in noise reduction and
This work was supported in part by the National Natural Science Foun-dation of China under 61871206, 61661031 and project of innovative spe-cial funds for graduate students in Jiangxi province (CX2019075). Z. He, Y Guan and Q. Liu are with the Department of Electronic Infor-mation Engineering, Nanchang University, Nanchang 330031, China. ({hezhuonan, guanyu}@email.ncu.edu.cn, {liuqiegen}@ncu.edu.cn). S. Niu is with the School of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, China. ([email protected]). Y. Zhang is with the College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China. ([email protected]). Y. Chen, and Y. Zhang are with the Laboratory of Image Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing 210096, China. ({yikun, chenyang.list}@seu.edu.cn). information preservation. Specifically, most of the priors are manually designed under a set of neighboring pixels in the image or transform domain, emphasizing on enhancing image smoothness while maintaining edges. Total variation (TV), sparsity in wavelet transform (WT), and non-local patch-based priors have shown promising results in LDCT [5]-[7]. Nevertheless, these reconstruction approaches may still lose some image details and suffer from remaining artifacts. Deep learning (DL) techniques, particularly convolutional neural networks (CNN), have been actively developed and applied to various applications recently. The explosive development of them suggests new thinking and huge potential for the medical imaging field. Broadly speaking, these approaches can be categorized into two types: One is using end-to-end supervised DL techniques as a post-processing method [8]-[14], and the other is integrating DL-driven prior techniques into an iterative scheme. In the first class,
Chen et al . [8] and Kang et al . [9] are the first tries to study the deep CNN for LDCT.
By recognizing that the WT operator is able to improve the denoising efficiency and preserves or even enhances the edge features, the results in [10] demonstrated that the directional wavelet utilizing deep CNN was more effective in getting rid of low-dose related noises. Although these algorithms attained promising results, they were usually designed for a particular task with specialized architectures or loss functions, and trained with paired data by taking one image as input and the other as supervision.
In another class of methods, it reuses the knowledge in learned priors to tackle various tasks without retraining or modification. As expected, it is possible to model the nonlinear manifold, so that knowledge of normal-dose image can be learned more precisely, subsequently, improved reconstruction quality is achieved. Specifically, Baguer et al . [15] used deep image priors for CT reconstruction to achieve promising results in the low-data regime. Meanwhile, based on the feature learning and mapping ability of the generative models, many CT reconstruction methods have been proposed from the perspectives of network structure and objective function. Recent progress is mainly driven by two approaches: likelihood-based methods [16]-[19] and generative adversarial network (GAN) [20]. The former use log-likelihood (or a suitable surrogate) as the training objective, while the latter uses adversarial training to minimize f -divergences [21] or integral probability metrics between model and data distribution [22], [23]. For example, Adler et al . [24] employed a Wasserstein GAN to draw samples from the conditional distribution. Regretfully, despite the success of generative models in tackling natural images, there have been few studies in field of medical imaging, especially in LDCT. In this work, to boost the LDCT reconstruction, we focus on exploring dEep grAdient priorS of genErative modeL Iterative Reconstruction for Low-Dose CT using Deep Gradient Priors of Generative Model
Zhuonan He , Yikun Zhang , Yu Guan, Shanzhou Niu, Yi Zhang,
Senior Member, IEEE , Yang Chen * , Senior Member, IEEE , Qiegen Liu * , Senior Member, IEEE X EASEL). As a newly developed unsupervised learning, the score-based generative model has exhibited great potential for diverse image representation [25]. Viewing the noisy observation as a conditional variable, a conditional score-based generative model is introduced into LDCT reconstruction in this work. We first estimate gradient of data density via denoising score matching, and then utilize it in annealing and conditional Langevin dynamics. More specifically, based on Bayesian rule, the data consistency is incorporated into the annealing Langevin dynamics procedure. During iterative reconstruction procedure, the distance between the reconstructed images and the learned manifold is minimized with the data fidelity. Since EASEL is implemented in a fully unsupervised way, it has more flexibility and less requirement on training data [26]. The remainder of the paper is organized as follows. Section II provides a brief description of preliminary work with regard to the generative model and gradient of genera-tive model. Section III presents forward formulation of LDCT and the corresponding iterative solver. Extensive experimental comparisons between the proposed EASEL and state-of-the-arts are conducted in Section IV. Finally, concluding remarks and directions for future research are given in Section V.
II. P RELIMINARIES A. Deep Generative Models
Recently advances with deep generative networks have shown promising results in modeling complex distributions such as images [27], audios [28] and texts [29]. As shown in Fig.1, the popular deep generative models can be primarily categorized into two groups: Explicit generative model and implicit generative model. The former model provides an explicit parametric specification of the data distribution, specifying a log-likelihood function log ( ) p x , including autoencoders (AE) and its variants [30], [31], flow-based generative models [32], [33], and deep Boltzmann machine [34]. Specially, score-based models [37]-[43] train paramet-ric network to approximate the likelihood gradient log ( ) x p x , which have tractable likelihood estimation. Alternatively, we can specify implicit probabilistic models that define a stochastic procedure to directly generate data. GANs [16] are the well-known implicit likelihood models, in the sense that they optimize the objection function through adversarial learning, and have been shown to pro-duce high quality images [20], [35], [36]. Despite its success, GANs still suffer from a remarkable difficulty in training and in interpretability due to the lack of theory guarantee. Fig. 1.
Classification of the popular deep generative models. B. Deep Gradient Priors of Generative Model
Recent works show that successful image generation can be achieved by score-based generative models [37]-[43]. They represent probability distributions through score func-tions log ( ) x p x —a vector field pointing in the direction where the likelihood of data ( ) p x increases most rapidly. Remarkably, these score functions can be learned from data without requiring adversarial optimization, and can produce realistic image samples that rival GAN [37]. The whole procedure consists of two steps: First, the de-noising score matching (DSM) is used to train a network ( ) θ s x to approximate log ( ) l x p x with different scale magnitude { } Ll l = ; Second, via annealing strategy of de-ceasing l , the Langevin dynamics is executed to draw sampling from a sequence of θ ( ; ) l s x and approaches to the final estimation of log ( ) p x . Denoising Score Matching:
In general, using score match-ing [41], a score network ( ) θ s x can be trained to estimate log( ( )) x p x by minimizing the objective function
2( ) 2 [ ( ) - log ( ) ] p x x
E s x p x . Alternatively, the objective is equiv-alent to
2( ) 2
1[ ( ( )) ( ) ]2 p x x
E tr s x s x + , (1) where ( ) x s x denotes the Jacobian of ( ) s x . Neverthe-less, because of the expensive computation of ( ( )) x tr s x , score matching is not scalable to deep neural network and high-dimensional data. To overcome this issue, Vincent [42] used DSM ( ) s x to match a non-parametric kernel density estimator:
2( ) 2 [ ( ) - log ( ) ]
P x x
E s x p x , (2) where the corresponding perturbed data distribution is ( ) ( | ) ( ) p x p x x p x dx = . Crucially, one caveat of DSM is that the optimal score * ( ) log ( ) log ( ) x x s x p x p x = is true only when the noise is small enough. Whereas, learning the score function with the single-noise perturbed data distribution will lead to inaccurate score estimation in the low data density region on high-dimension data space, which could be severe due to the low-dimensional manifold assumption. Thus, Song and Ermon [25] proposed learning a single neural network based on multiple perturbed data dis-tributions with Gaussian noises of varying magnitudes:
22 ( | ) ( ) 21 ll L l p x x p x l xl
E s x p x xL = − , (3) where { } Ll l = is a positive geometric sequence, the target log ( | ) ( ) l x l p x x x x = − has a simple closed-form and the empirical average is utilized to estimate all expectations. Annealing Langevin Dynamics:
In many situations, score function is easier to model and estimate than the original density function [43]. For example, for an unnormalized density it does not depend on the partition function. Once the score function is known, we can employ Langevin dy-namics to sample from the corresponding distribution. The Langevin dynamic algorithm is an efficient Markov chain Monte Carlo sampling method. Given a step size > 0, a total number of iterations T , an initial sample x , it eval-ates the gradient of the negative log-probability iteratively: log ( )2 ( ; )2 t t t tlt t t x l x x p x zx s x z − − −− − − = + += + + . (4) where ~ (0, ) z N I and
1, , t T = . However, when two modes of the data distribution are separated by low density regions, Langevin dynamics will not be able to correctly recover the relative weights of these two modes in reasonable time, and therefore might not con-verge to the true distribution. In addition, since Langevin dynamics will often be initialized in low-density regions of the data distribution, inaccurate score estimation in these regions will negatively affect the sampling process. Hence, mixing can be difficult because of the need of traversing low-density regions to transition between modes of the dis-tribution. To tackle these two challenges, Song and Ermon [25] proposed an annealed version of Langevin dynamics, where they initially used scores corresponding to the highest noise level, and gradually annealed down the noise level { 0} Ll l = → until it was small enough to be indistinguisha-ble from the original data distribution. As a remedy, large noise levels will produce samples in low density regions of the original data distribution, which can improve score es-timation. III. M ETHODOLOGY A. LDCT Imaging Model
The statistics of CT measure data are often described by a Poisson distribution [46]. Specifically, a Poisson model for the intensity measurement is { }, 1, , i Axi i i m
I Poisson b e r i N − + = , (5) where i I is the number of transmitted photons, A is a system matrix (projection operator), i b denotes the X-ray source intensity of the i -th ray, and i r denotes the back-ground contributions of scatter and electrical noise. x is a vector for the representation of attenuation coefficients with units of inverse length, m N is the number of measurements and v N is the number of image voxels. After taking the logarithm operation, the sinogram data are often approximated as a weighted Gaussian [10]: ( [ ] , ( ) ) i i i i i y N Ax I I r − , (6) where [ ] i i I E I = . In fact, the LDCT problem can be for-mulated as an inverse problem + y Ax = . In order to cover the uncertainties that occur especially with ill-posed prob-lems, the theory of Bayesian inversion considers the poste-rior distribution ( | ) p x y [47]. This posterior is the condi-tional density of the image x conditioned on the meas-urements y . B. Proposed EASEL Model
In this paper, we are devoted to using the generative model to tackle the LDCT problem, a process of noise addition in imaging [52]. Accordingly, the estimation problem can be reduced to ˆ arg max log( ( )). . + x x p xs t y Ax = = . (7) Owing to the observation y , the sampling object is not directly from ( ) p x , but from the posterior distribution ( | ) p x y , i.e., (log ( | ))2 t t tt x x x p x y z − − = + + . (8) Due to the Bayesian rule ( | ) ( | ) ( ) ( ) p x y p y x p x p y = , it becomes to be [log ( ) log ( | )]2 (log ( )) [ ]2 2 t t t tt t t tt xx x x x p x p y x zx p x y Ax z − − −− − − = + + += + + − + . (9) Here, log ( | ) p y x is given by the data model that is usu-ally derived from knowledge about how data is generated and log( ( )) p x is given by the prior model that represents information known beforehand about the true model param-eter. The hyperparameter balances the trade-off between priors and data fidelity. It has to be estimated if we do not know the standard deviation of the measurement noise. C. Optimization for EASEL
In the following, we elaborate on the details for implementing Eq. (9). First, we use annealing strategy to approximate Eq. (9). Especially, Let ( ) l p x denote the distribution of + l x for ~ x p and ~ (0, ) l l N I . At high noise levels l , ( ) l p x is approximately Gaussian and irreducible, so the Langevin dynamics Eq. (9) will mix quickly. The modified Langevin dynamics is as follows: (log ( )) [ ]2 2 t t tt t x x x x p x y Ax z − − − = + + − + (10) Second, as done in [24][53], we seek to an alternating technology to handle Eq. (10). Especially, we decompose Eq. (10) into two sub-iterative steps as: ( ; )2 t t tt l l l u x s x z − − = + + , (11) arg min[ ] t t x x y Ax x u = − + − , (12) where parameter ( ) = is related to . The separable quadratic surrogates (SQS) algorithm [54][55] is adopted on the convex problem (12): ( ) ( )1 1 t t T tt t T
A Ax y x ux x A A − −− − + −= − + , (13) where T A is the transpose of A and refers to back-projection, is an all-ones vector. We begin by initializing z as the standard Gaussian vector, and then take a gradient step in one of these while fixing the other. Additionally, for additional acceleration, we apply the Nesterov’s momentum [56] that exploits the pre-vious descent directions. A momentum term can be ( ) t t t t w x x x + + + = + − , where is a relaxation factor that lies between 0 and 1. In summary, the visual flowchart of training phase and it-erative reconstruction phase for LDCT is shown in Fig. 2. Furthermore, Algorithm 1 explains the reconstruction algo-rithm in detail. EASEL consists of two loops. The outer loop is composed of several stages to enable that ( ) l p x tends o ( ) p x with decreasing l -value. The inner loop conducts the conditional Langevin dynamics with alter-natingly updating of data-consistency and deep gradient prior. Since both the Langevin dynamics updating [57] and SQS updating [54][55] have convergence guarantee, the overall EASEL algorithm will come to the convergence region after finite iterations.
Algorithm 1
EASEL for Low Dose CT Imaging
Initialization : x and w , Ll l = , , , For
1, 2, , l L = = l l L For
1, 2, , t T = Draw -1 ~ (0,1) t z N ( ; )2 t t tt l l l u x s x z − − = + + ( ) ( )1 1 t t T tt t T
A Ax y x ux w A A − −− − + −= − + ( ) t t t t w x x x − = + − End for T x w End for D. Network Architecture of ( , ) s x As described above, the adaption of unsupervised network to the general LDCT reconstruction is the key contribution of EASEL. Besides, the architecture of the score-based network ( , ) s x is also an important factor that impacts the algorithm performance. In this subsection, following the idea of high-dimensional embedded denoising network in our previous work [51][52], we pre-sent a channel-copy guided RefineNet as ( , ) s x . Originally, the RefineNet [46] is a modern variant of U-Net [47] that also incorporates ResNet designs [48]. On this basis, Song and Ermon [25] replaced the max pooling layers in refine blocks with multi-path block, as multi-path block is reported to produce smoother and more diversity features for image generation tasks such as image recon-struction. More importantly, they used convolutional opera-tor to produce high-dimensional features before the mul-ti-path block, as seen in Fig. 3(a), such as to obtain flexible representation and excellent robustness abilities. Alternatively, in Fig. 3(b), we choose the channel-copy strategy to replace the convolutional operator in naïve Re-fineNet to form the channel-copy guided RefineNet. As seen in Fig. 3, both the naïve RefineNet and channel-copy guided RefineNet extend the generative ability via extend the in-formation/representation dimension, such as increasing channel dimension of the input. However, the latter strategy is simpler and favors to computation effectiveness. As demonstrated in Section IV. D, the naïve RefineNet needs 64 convolutional kernels, while the present channel-copy guided RefineNet only uses 10 coped channels. Fig. 4 depicts the whole architecture of the channel-copy guided RefineNet. At the prior training stage, we copy and rearrange the single-channel image to the same 10-channel images. Thus, the DSM network can be trained with these data injected with artificial Gaussian noise. At the iterative reconstruction stage, in order to pave the way to apply the trained 10-channel prior to the intermediate single-channel image from the previous iteration, it needs to conduct the channel-copy and channel-mean operators before and after the Langevin dynamics updating . Fig. 2.
The training and reconstruction paradigm of the generative model-based algorithm EASEL. It consists of two components, i.e., a denoising score matching for score estimation involving various noise magnitudes simultaneously, and an iterative cycle for reconstruction including the annealed and con-ditional Langevin dynamics.
Fig. 4.
The architecture of channel-copy guided RefineNet used in ( ; ) s x of EASEL. A distinct difference of the channel-copy guided RefineNet from the naïve RefineNet is that, we use channel-copy technique to attain high-dimensional features and then inject noise into them. A more detailed visual comparison is shown in Fig. 3. (a) (b) Fig. 3.
Visual comparison of the network input in the naïve RefineNet and the modified network ( ; ) s x used in EASEL. IV. E XPERIMENTAL RESULTS
In the experiments, the proposed EASEL is implemented in PyTorch framework, conducted on GPU platform of NVIDIA Titan XP with 12G memory. Five methods are compared with EASEL including the classical FBP recon-struction (Ramp-filter) [58], TV-based iterative method [59], dictionary learned by K-SVD algorithm [60], RED-CNN network [61] and domain progressive residual network DP-ResNet [62]. The involved parameters are set following the guidelines in their original papers. For more in-depth study and research, the source code of EASEL can be found at: https://github.com/yqx7150/EASEL. A. Data Specification
AAPM Challenge Data:
To evaluate the algorithm on a clinically realistic use-case, we consider reconstruction of simulated data from human abdomen CT scans as provided by Mayo Clinic for the AAPM Low Dose CT Grand Chal-lenge [63]. The data includes high-dose CT scans from 10 patients, of which we use 9 for training and 1 for evaluation. We use the 1 mm slice thickness reconstructions, resulting in 2961 training images, each 512×512 pixel in size. We use a two-dimensional fan-beam geometry with 1000 angles, 1000 pixels, source to axis distance 500 mm and axis to detector distance 500 mm. The corresponding LDCT images are simulated by adding Poisson noise (whose intensities are =5 4 i b e in Eq. (5)) into the sinogram data [64]. CIRS Phantom Data:
A high-quality set of CT volumes (512×512×100 voxels, voxel size 0.78×0.78×0.625 mm ) of an anthropomorphic CIRS phantom is obtained from a GE Discovery HD750 CT system, in which the tube current value is set to 600 mAs to guarantee a good image quality for low-dose simulation under 150 mAs. The source-to-axial distance is 573 mm, and the source-to-detector distance is 1010 mm. Fig. 5 displays some representative high-dose images, low-dose CT images and the differential images. B. Quantitative Indices
To evaluate the quality of the reconstructed images, three metrics, mean absolute error (MAE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), are selected for quantitative assessment. Specifically, in statistics, MAE is a measure of errors be-tween paired observations expressing the same phenomenon. It is defined as: ˆ| | / N i ii
MAE x x N = = − , (14) where N is the number of pixels in the reconstructed im-age. MAE approaches to zero if the reconstructed image is closer to the reference image. (a) (b) (c) (d) (e) (f) Fig. 5.
Visual illustration of LDCT data and its counterpart high-dose CT image. (a)(d) High-dose CT in AAPM challenge data and CIRS phantom data, respectively; (b)(e) The corresponding LDCT images; (c)(f) Differ-ence image between high-dose CT and LDCT image.
The PSNR measure describes the relationship of the maximum possible power of a signal with the power of noise corruption. Higher PSNR means better image quality. Denoting x and ˆ x to be the reconstructed image and ground truth, PSNR is expressed as:
10 2 ˆ ˆ ˆ( , ) 20log [Max( ) ]
PSNR x x x x x = − . (15) he SSIM-value is used to measure the similarity be-tween the original CT image and reconstructed images, evaluated on three aspects: luminance, contrast, and struc-tural correlation. SSIM values are normalized between 0 and 1, being 1 the situation in which both images are equal. SSIM is defined as: ˆ ˆ1 22 2 2 2ˆ ˆ1 2 (2 )(2 )ˆ( , ) ( )( ) x x xxx x x x c cSSIM x x c c + += + + + + , (16) where x and x are the average and variances of x . ˆ xx is the covariance of x and ˆ x . c and c are used to maintain a stable constant. C. Reconstruction Results
AAPM Challenge Data:
For better evaluation of recon-struction image quality, we calculate the MAE, PSNR and SSIM values for the LDCT images in the test dataset. Table I presents all the results, and the best value of each metric is marked in black bold. Intuitively, our method scores the highest PSNR, and ranks the second in terms of MAE and SSIM values, whereas the DP-ResNet ranks second in PSNR measure.
At the same time, comparing the experi-mental results between TV and K-SVD, both of them obtain worse image quality evaluation metrics. It implies that the performance of the classical algorithms may still lose some details and suffer from remaining artifacts. T ABLE
I Q
UANTITATIVE RESULTS ( MEAN STD)
ASSOCIATED WITH DIFFERENT METHODS FOR THE IMAGES IN THE TESTING LD PROJECTION DATASET . Method MAE PSNR SSIM FBP(Ramp-filter) 67.69 EASEL 15.44 To visually illustrate the reconstruction performance, we perform qualitative comparisons over the selected methods for CT images of different body parts, as shown in Fig. 6. It is noteworthy that the results focus on content restoration, artifact suppression, and noise reduction. In addition, for better evaluation of image quality, Fig. 6 depicts the zoomed regions-of-interest (ROI) marked by the red rectangles. From the FBP reconstructions in Fig. 6(b1)-(b3), we can see that FBP reconstruction leads to severely degraded LDCT images with obviously amplified noise and artifacts. As a traditional method, K-SVD produces reconstruction images (Fig. 6(c1)-(c3)) with visually smoother appearances. How-ever, some tiny structures might be smoothed out and lead to lowered tissue contrast, as indicated by the red arrows. Fur-thermore, it also can be seen that deep learning methods effectively reduce noise and remarkably overmatch TV and K-SVD in Fig. 6(e1)-(f3). They improve the effect of noise reduction and suppress most artifacts. However, they have incomplete preservation of details and texture information. Comparing the results in Fig. 6(g1)-(g3), we can see that the proposed EASEL method achieves the best performance in terms of noise-artifact suppression and tissue feature preservation. The corresponding residual images are shown in Fig. 7. Among them, we find that TV is able to perform well but tends to smooth textures and edges. Compared to the competitive reconstruction methods, EASEL has its own advantages, which can effectively reduce the noise, and its reconstruction performs better in terms of artifact reduction and detail preservation. To better illustrate the effectiveness of noise removal by EASEL, Fig. 8 plots the 1D line intensity profile passing through the red dashed line in Fig. 6(a1). It compares the same line intensity profiles from the CT image reconstructed by various methods. Through visual inspection, it is evident that the line intensity profile from our proposed method re-sembles most closely to the one from the normal-dose CT image. The comparison demonstrated the advantage of the proposed reconstruction method over the other reconstruc-tion algorithms on edge preservation.
Fig. 7.
The absolute difference images between the reference CT image and the CT images reconstructed from the different algorithms:(a) FBP (b) TV (c) K-SVD (d) RED-CNN (e) DP-ResNet (f) EASEL.
Fig. 8.
1D intensity profile passing through the solid red line in Fig. 6(a1). All the results in (a1)-(g1) are compared.
CIRS Phantom Data:
For the CIRS phantom data, quantita-tive results reconstructed from different reconstruction methods are tabulated in Table II. It can be observed that EASEL performs better than the other methods in a trend similar to what we have seen from the reconstruction images and produces the highest PSNR. The PSNR measure for the EASEL reconstruction image improves by about 0.37dB compared to the result from DP-ResNet. In fact, compared with the end-to-end DL algorithms, the unsupervised meth-od greatly reduces the image reconstruction effect after changing the test data. To visualize the benefits of the proposed method, recon-struction images with ROIs using different methods to the ground truth (full-dose CT images) are provided in Fig. 9. Specifically, a supervised method with the network structure would cause an obvious artifact around the center of the reconstruction, whereas the similar artifact would not appear in the reconstruction of K-SVD and TV. One possible reason s that end-to-end learning models that are particularly trained for a certain task with the same data. By comparing the result of K-SVD, we find that the boundary of recon-struction image is still visible, while it is blurrier in the CT images from other methods. Moreover, DP-ResNet works a bit better than RED-CNN method but some edges and small structures are oversmoothed. For attaining further perspec-tives of our adapted EASEL for LDCT reconstruction, the residual images between the reconstructions and the refer-ences are presented in Fig. 10, which demonstrates that the EASEL can achieve better reconstruction accuracy than that of the other algorithms on edge preservation. (a1) (b1) (c1) (d1) (e1) (f1) (g1) (a2) (b2) (c2) (d2) (e2) (f2) (g2) (a3) (b3) (c3) (d3) (e3) (f3) (g3)
Fig. 6.
Reconstruction results of AAPM challenge data for different methods. (a1)-(a3) reference image (b1)-(b3) FBP (c1)-(c3) TV (d1)-(d3) K-SVD (e1)-(e3) RED-CNN (f1)-(f3) DP-ResNet (g1)-(g3) EASEL. The display windows are [-1150, 350] HU, [-700, 1300] HU and [-160, 240] HU, respectively.
To further investigate the algorithms’ ability of recon-struction, Fig. 11 plots the image profiles for the six meth-ods together with that of the ground-truth image. Intuitively, the bias can be observed more clearly in the profile plots: The pixel intensities for the DP-ResNet reconstruction better follow those of the “true” clinical image, while those for the K-SVD reconstruction are much worse than the “true” val-ues. Moreover, the gap between the profiles of the TV method and the ground-truth shows the gigantic bias. This means that TV produces a strong bias in the reconstruction. Instead, it is obvious that the profiles for EASEL are closest to the ground-truth among the compared methods. To sum up, in almost existing supervised DL-based methods, the network is learned by training a large amount of data that acquired with specific imaging geometry. If the observed sinograms are acquired with different low-dose scanning conditions and inconsistent with the training data, the reconstruction performance might dramatically decrease. By contrast, the proposed EASEL method largely alleviates the deficiency. D. Variants of Hyperparameters
The hyperparameter selection is one of the most crucial factors for the image quality attainable with the proposed method. Since the -value is the most sensitive parameter to the image quality, it is estimated by comparing the nu-merical indicators between the full-dose CT images and the processed LDCT images. Fig. 12 depicts the MAEs and SSIMs of the reconstructed results with different hyperpa-rameters. It can be seen that the performance gradually im-roves as -value increase. Later, the difference of per-formance between large and the peak vanishes. Thus, we set =150 in our experiments. (a1) (b1) (c1) (d1) (e1) (f1) (g1) (a2) (b2) (c2) (d2) (e2) (f2) (g2) Fig. 9.
Reconstruction results of an abdominal CT scan from CIRS phantom data using different methods. (a) reference image, (b) FBP, (c), TV, (d) K-SVD, (e) RED-CNN, (f) DP-ResNet, (g) EASEL.
The display window of the full images is [-160, 240] HU. T
ABLE
II Q
UANTITATIVE RESULTS ( MEAN STD)
ASSOCIATED WITH DIFFERENT METHODS FOR THE IMAGES IN THE TESTING LD PROJECTION DATASET . Method MAE PSNR SSIM FBP(Ramp-filter) 9.48 EASEL 6.03 Fig. 10.
The absolute difference images between the original CT image and the CT images reconstructed from the different algorithms:(a) FBP (b) TV (c) K-SVD (d) RED-CNN (e) DP-ResNet (f) EASEL.
To examine the convergence of EASEL, the convergence tendency of SSIM curve versus iteration is plotted in Fig. 13. It can be seen that, the fluctuation of the curve is not obvi-ous as the iteration increases. Besides, it is evident that EASEL is effective for noise and artifact suppression of LDCT images after 1500 iterations. Therefore, EASEL has a reasonably fast convergence rate. In addition, the reconstruction results with regard to the number of input channels of the network ( ) θ s x are inves- tigated in Table III. It is predictable that, as the channel number increases, the representation ability of the prior lev-erages. Subsequently, the performance will be improved. Considering the computational complexity, the channel number is set to be 10. Fig. 11.
1D intensity profile passing through the red solid line in Fig. 9(a1).
Fig. 12.
Evolution of the regularization parameter for EASEL. T ABLE
III T
HE IMPACT OF CHANNEL NUMBER ON LDCT RECONSTRUCTION . Channel 3 5 10 Naïve RefineNet MAE 15.70 15.81
Fig. 13.
Convergence tendency of EASEL in LDCT reconstruction. E. Analysis of Loss Function
The loss function, despite being the effective driver of the network’s learning, has attracted little attention within the image processing community [65]. The choice of the cost function generally defaults to be the squared L norm of the error. In this study, we bring attention to loss function for LDCT reconstruction. Specifically, the mean squared error L penalizes larger errors, but it is more tolerant to small errors, regardless of the underlying structure in the image. By contrast, loss function with L -norm does not over-penalize larger errors. Consequently, they may have different convergence properties. Inspired by this observa-tion, we test whether a different local metric such as L can produce better results with the state-of-the-art metrics for image quality. The performance of several losses (i.e. L , L , L + L ) is recorded in Table IV. As can be observed, the re-construction quality varies scarcely with the loss functions. L -norm is arguably the dominant error measure across very diverse fields, from regression problems, to pattern recognition, to signal and image processing. The main rea-son for its popularity is the fact that it is convex and differ-entiable—very convenient properties for optimization prob-lems. Meanwhile, it provides the maximum likelihood esti-mate in case of independent and identically distributed Gaussian noise, to the fact that it is additive for independent noise sources. Overall, considering the case of our work that performs noise distribution mapping, i.e., the distribution obeys normal distribution and has the same mathematical expression as L , we choose L as loss function to avoid the selection of complex parameters. T ABLE
IV T
HE IMPACT OF LOSS FUNCTIONS ON LDCT RECONSTRUCTION . Loss function L L L + L MAE 15.78 15.84 15.70 PSNR 41.24 41.27 41.19 SSIM 0.9482 0.9520 0.9515 V. C ONCLUSIONS AND D ISCUSSIONS
In this work, a novel iterative reconstruction framework, coined EASEL, was proposed. By annealing the gradient of data of density, elaborate prior knowledge from generative models was incorporated into the iterative procedure. Ex-perimental results on two public datasets demonstrated the feasibility and efficiency of EASEL for LDCT imaging problem, improving image quality and avoiding noise effect. On the one hand, to effectively extract image features at multiple scales, we self-copied the LDCT image into ten channels, and then the generative network used the ten components as input. Moreover, the modification of network architecture was used in our experiments to better combine high-dimensional information for LDCT reconstruction. On the other hand, a novel generative model where samples were produced via Langevin dynamics using gradients of the data distribution estimated with DSM was utilized for LDCT reconstruction. In addition, the distance between the reconstructed images and the learned manifold was mini-mized along with the data fidelity by SQS algorithm during iterative reconstruction. As a result, the reconstructed imag-es become closer to the reference data. In future study, we will extend our model to find the im-age similarity search on latent space over huge clinical im-age dataset and deal with more challenging tasks. The most significant requirement will be the availability of a larger multi-GPU server architecture. Experiments that use such a computational infrastructure are under-way. Besides, we would also like to integrate the deep gradient priors of gen-erative model into an iterative reconstruction pipeline to mitigate possible image feature suppression imposed by the network. A CKNOWLEDGMENTS
The authors would like to thank Dr. C. McCollough of the Mayo Clinic, Rochester, MN, USA, for providing clinical projection data, as agreed under the American Association of Physicists in Medicine. R EFERENCES [1]
D. J. Brenner and E. J. Hall, “Computed tomography-an increasing source of radiation exposure,”
New Engl. J. Med. , vol. 357, no. 22, pp. 2277-2284, 2007. [2]
M.K. Kalra, M.M. Maher, T.L. Toth, L.M. Hamberg, and M.A. Blake et al. , “Strategies for CT radiation dose optimization,”
Radiology , vol. 230, no. 3, pp. 619–628, 2004. [3]
J. Wang, T. Li, H. Lu, and Z. Liang, “Penalized weighted least-squares approach to sinogram noise reduction and image recon-struction for low-dose X-ray computed tomography,”
IEEE Trans. Med. Imaging , vol. 25, no. 10, pp. 1272-1283, 2006. [4]
Y. Chen, W. Chen, X. Yin, X. Ye, and X. Bao et al. , “Improving low-dose abdominal CT images by weighted intensity averaging over large-scale neighborhoods,”
Eur. J. Radiol. , vol. 80, no. 2, pp. 42-49, 2011. [5]
Z. Tian, X. Jia, K. Yuan, T. Pan, and S. B. Jiang, “Low-dose CT reconstruction via edge-preserving total variation regularization,”
Phy. Med. Biol. , vol. 56, no. 18, p. 5949-5967, 2011. [6]
A. Mehranian, M.R. Ay, A. Rahmim, and H. Zaidi, “X-ray CT metal artifact reduction using wavelet domain L0 sparse regularization,”
IEEE Trans Med Imaging , vol 32, no. 9, pp. 1707-1722, 2013. [7]
D. Zeng, J. Huang, H. Zhang, Z. Bian, and S. Niu et al. , “Spectral CT image restoration via an average image-induced nonlocal means fil-ter,”
IEEE Trans. Biomed. Eng. , vol. 63, no. 5, pp. 1044-1057, 2015. [8]
H. Chen, Y. Zhang, W. Zhang, P. Liao, and K. Li et al. , “Low-dose CT via convolutional neural network,”
Biomed. Opt. Exp. , vol. 8, no. 2, pp. 679–694, 2017. [9]
E. Kang, W. Chang, J. Yoo, and J. C. Ye, “Deep convolutional framelet denosing for low-dose CT via wavelet residual network,”
IEEE Trans. Med. Imaging , vol. 37, no. 6, pp. 1358-1369, 2018. [10]
E. Kang, J. Min, and J.C. Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,”
Med. phys , vol. 44, no. 10, pp. e360-e375, 2017. [11]
J.M. Wolterink, T. Leiner, M.A. Viergever, and I. Isgum, “Genera-tive adversarial networks for noise reduction in LDCT,”
IEEE Trans. Med. Imaging. , vol. 36, no. 12, pp. 2536-2545, 2017. 12]
D. Wu, K. Kim, G. El Fakhri, and Q. Li, “Iterative low-dose CT reconstruction with priors trained by artificial neural network,”
IEEE Trans. Med. Imaging. , vol. 36, no. 12, pp. 2479-2486, 2017. [13]
H. Shan, Y. Zhang, Q. Yang, U. Kruger, and M. K Kalra et al. , “3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network,”
IEEE Trans. Med. Imaging. , vol. 37, no. 6, pp. 1522-1534, 2018. [14]
Q. Yang, P. Yan, Y. Zhang, H. Yu, and Y. Shi et al ., “Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,”
IEEE Trans. Med. Imaging. , vol. 37, no. 6, pp. 1348-1357,2018. [15]
D.O. Baguer, J. Leuschner, and M. Schmidt, “Computed tomography reconstruction using deep image prior and learned reconstruction methods,” arXiv preprint, arXiv :2003.04989, 2020. [16]
A. Graves, “Generating sequences with recurrent neural networks,” arXiv preprint arXiv :1308.0850, 2013. [17]
D.P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv :1312.6114, 2013. [18]
L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” arXiv preprint arXiv :1410.8516, 2014. [19]
A. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” arXiv preprint arXiv :1601.06759, 2016. [20]
I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, and D. WardeFarley et al ., “Generative adversarial nets,” in Adv. Neural Inf. Process. Syst. , pp: 2672–2680, 2014. [21]
S. Nowozin, B. Cseke, and R. Tomioka, “F-GAN: Training genera-tive neural samplers using variational divergence minimization,” in Adv. Neural Inf. Process. Syst. , pp: 271–279, 2016. [22]
M. Arjovsky, S. Chintala, and L. Bottou. “Wasserstein generative adversarial networks,” in
Proc. Int. Conf. Mach. Learn. , vol. 70, pp. 214–223, 2017. [23]
B.K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G.R. Lanckriet, “On integral probability metrics, -divergences and binary classification,” arXiv preprint arXiv :0901.2698, 2009. [24] J. Adler and O. Öktem, “Deep Bayesian inversion,” arXiv preprint, arXiv :1811.05910, 2018. [25]
Y. Song and S. Ermon, “Generative modeling by estimating gradi-ents of the data distribution,” in Adv. Neural Inf. Process. Syst. , pp. 11918-11930, 2019. [26]
V. Antun, F. Renna, C. Poon, B. Adcock, A.C. Hansen, “On instabili-ties of deep learning in image reconstruction and the potential costs of AI,”
In Proceedings of the National Academy of Sciences , 2020. [27]
A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv :1511.06434, 2015. [28]
A. Oord, S. Dieleman, H. Zen, K. Simonyan, and O. Vinyals et al ., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv :1609.03499, 2016. [29]
S.R. Bowman, L. Vilnis, O. Vinyals, A.M. Dai, R. Jozefowicz, and S. Bengio, “Generating sentences from a continuous space,” arXiv preprint arXiv :1511.06349, 2015. [30]
D.P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv :1312.6114, 2013. [31]
K.C. Tezcan, C.F. Baumgartner, R. Luechinger, K.P. Pruessmann, and E. Konukoglu, “MR image reconstruction using deep density pri-ors,”
IEEE Trans. Med. Imag. , vol. 38, pp. 1633-1642, 2018. [32]
T. Taskaya-Temizel and M.C. Casey, “A comparative study of auto-regressive neural network hybrids,”
Neural Networks , vol. 18, no. 5-6, pp. 781-789, 2005. [33]
M. Asim, A. Ahmed, and P. Hand, “Invertible generative models for inverse problems: mitigating representation error and dataset bias,” arXiv preprint arXiv :1905.11672, 2019. [34]
R. Salakhutdinov and G.E. Hinton, “Deep Boltzmann machines,” in Proc. Int. Conf. Artif. Intell. Statist., pp. 448–455, 2009. [35]
E.L. Denton, S. Chintala, and R. Fergus, “Deep generative image models using a Laplacian pyramid of adversarial networks,” in Adv. Neural Inf. Process. Syst. , pp. 1486-1494, 2015. [36]
A. Radford, L. Metz, and S. Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv :1511.06434, 2015. [37]
Y. Song and S. Ermon, “Improved techniques for training score-based generative models,” arXiv preprint arXiv :2006.09011, 2020. [38]
A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, and J. Yosinski, “Plug & play generative networks: Conditional iterative generation of images in latent space,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , pp. 4467-4477, 2017. [39]
W. Grathwohl, K.C. Wang, J.H. Jacobsen, D. Duvenaud, and M. Norouzi et al ., “Your classifier is secretly an energy-based model and you should treat it like one,” arXiv preprint arXiv :1912.03263, 2019. [40]
D. Kingma and LeCun, “Regularized estimation of image statistics by score matching,” in Adv. Neural Inf. Process. Syst. , pp. 1126–1134, 2010. [41]
Y. Song, S. Garg, J. Shi, and S. Ermon, “Sliced score matching: A scalable approach to density and score estimation,” arXiv preprint arXiv :1905.07088, 2019. [42]
P. Vincent, “A connection between score matching and denoising autoencoders,”
Neural computation , vol. 23, no. 7, pp.1661–1674, 2011. [43]
A. Hyvärinen, “Estimation of non-normalized statistical models by score matching,”
J. Machine Learning Res., pp. 695–709, 2005. [44]
J. Ma, Z. Liang, Y. Fan, Y. Liu, J. Huang et al ., “A study on CT sinogram statistical distribution by information divergence theory,” in Nucl. Sci. Symp. Med. Imag. Conf. Rec. , pp. 3191-3196, 2011. [45]
M. Dashti and A. M. Stuart, “The Bayesian approach to inverse problems,” arXiv preprint arXiv :1302.6989, 2014. [46]
G. Lin, A. Milan, C. Shen, and I. Reid, “Refinenet: Multi-path re-finement networks for high-resolution semantic segmentation,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , pp. 5168-5177, 2017. [47]
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Med. Image Comput. Comput.-Assisted Intervention , pp. 234-241, 2015 [48]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , pp. 770-778, 2016. [49]
A. Sawatzky, Q. Xu, C.O. Schirra, and M.A. Anastasio, “Proximal ADMM for multi-channel image reconstruction in spectral X-ray CT,”
IEEE Trans. Med. Imaging. , vol. 33, no. 8, pp. 1657-1668, 2014. [50]
Q. Liu, Q. Yang, H. Cheng, S. Wang, M. Zhang et al. , “Highly un-der-sampled magnetic resonance imaging reconstruction using autoen-coding priors,”
Magn. Reson. Med. , vol. 83, no. 1, pp. 322-336, 2020. [51]
S. Li, B. Qin, J. Xiao, Q. Liu, Y. Wang et al. , “Multi-channel and multi-model based autoencoding prior for grayscale image restoration,”
IEEE Trans Image Process. , vol. 29, pp. 142-156, 2020. [52]
M.K. Kalra, M.M. Maher, D.V. Sahani, M.A. Blake, P. F. Hahn et al ., “Low-dose CT of the abdomen: evaluation of image improvement with use of noise reduction filters-pilot study,”
Radiology , vol. 228, no. 1, pp. 251-256, 2003. [53]
J. Liu, Y. Hu, J. Yang, Y. Chen, H. Shu, “3D feature constrained reconstruction for low-dose CT imaging,”
IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 5, pp. 1232–1247, 2018. [54]
H. Erdogan and J.A. Fessler, “Monotonic algorithms for transmis-sion tomography,”
IEEE Trans. Med. Imag. , vol. 18, no. 9, pp. 801–814, 1999. [55]
K. Kim, G.E. Fakhri, and Q. Li, “Low dose CT reconstruction using spatially-encoded non-local penalty,”
Med. phys , vol. 44, no. 10, pp. e376-e390, 2017. [56]
O. Devolder, F. Glineur, and Y. Nesterov, “First-order methods of smooth convex optimization with inexact oracle,”
Math. Programm. , vol. 146, no. 1–2, pp. 37–75, 2014. [57]
A. Block, Y. Mroueh, and A. Rakhlin, “Generative modeling with denoising auto-encoders and Langevin sampling,” arXiv preprint arXiv:2002.00107 , 2020. [58]
J. Liu, J. Ma, Y. Zhang, Y. Chen, J. Yang et al ., “Discriminative feature representation to improve projection data inconsistency for low dose CT imaging,”
IEEE Trans. Med. Imaging. , vol. 36, no. 12, pp. 2499–2509, 2017. [59]
E. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained total-variation minimization,”
Phys. Med. Biol. , vol. 53, no. 17, pp. 4777–4807, 2008. [60]
M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,”
IEEE Trans. Signal Process. , vol. 54, no. 11, pp. 4311–4322, 2006. [61]
H. Chen, Y. Zhang, M.K. Kalra, F. Lin, and Y. Chen et al. , “Low-dose CT with a residual encoder decoder convolutional neural network,”
IEEE Trans. Med. Imaging. , vol. 36, no. 12, pp. 2524-2535, 2017. [62]
X. Yin, Q. Zhao, J. Liu, W. Yang, J. Yang et al. , “Domain progressive 3D residual convolution network to improve low-dose CT imaging,”
IEEE Trans. Med. Imaging. , vol. 38, no. 12, pp. 2903-2913, 2019. [63]
I. A. Elbakri and J. A. Fessler, “Statistical image reconstruction for polyenergetic X-ray computed tomography,”
IEEE Trans. Med. Imaging. , vol. 21, no. 2, pp. 89–99, 2002. [65]
B. Kim, M. Han, H. Shim, J. Baek, “A performance comparison of convolutional neural network-based image denoising methods: the effect of loss functions on low-dose CT images,”