Deep Iteration Assisted by Multi-level Obey-pixel Network Discriminator (DIAMOND) for Medical Image Recovery
Moran Xu, Dianlin Hu, Weifei Wu*, and Weiwen Wu

M. R. Xu and W. F. Wu are with the People's Hospital of China Three Gorges University, Yichang, 443000, China, and also with the First People's Hospital of Yichang, Yichang, 443000, China. D. L. Hu is with the Laboratory of Imaging Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing, 211189, China. W. W. Wu is with the Department of Radiology Diagnosis, the University of Hong Kong, 999077, SAR, China. (* refers to the corresponding author.) E-mail: [email protected], [email protected], [email protected], [email protected]

Abstract—Image restoration is a typical ill-posed problem comprising various tasks. In the medical imaging field, a degraded image hinders diagnosis and even subsequent image processing. Both traditional iterative methods and recent deep networks have attracted much attention and achieved significant improvements in reconstructing satisfying images. This study combines their advantages in one unified mathematical model and proposes a general image restoration strategy for such problems. The strategy consists of two modules. First, a novel generative adversarial network (GAN) with WGAN-GP training is built to recover image structures and subtle details. Then, a deep iteration module promotes image quality with a combination of pre-trained deep networks and compressed sensing algorithms through ADMM optimization. The (D)eep (I)teration module suppresses image artifacts and further recovers subtle image details, (A)ssisted by a (M)ulti-level (O)bey-pixel feature extraction (N)etwork (D)iscriminator that recovers general structures. The proposed strategy is therefore named DIAMOND.
Index Terms—Medical image recovery, WGAN-GP, compressed sensing, ADMM, iteration.
INTRODUCTION

Image recovery is an important class of inverse problems. Specifically, when an original image is polluted by noise, the recovery task aims to remove the noise while preserving fine details. When the original image is blurred because of motion, the task mainly focuses on recovering a sharp image from the blurred one. When the original image does not satisfy a resolution demand, the recovery task becomes resolution enhancement, and so on. In this study, we focus on a common approach to solving image recovery tasks; specifically, image denoising and image super-resolution are discussed.
There are two types of image super-resolution reconstruction technology. One synthesizes a high-resolution image from multiple low-resolution images, and the other obtains a high-resolution image from a single low-resolution image. In this work, we focus on single image super-resolution reconstruction (SISR). SISR methods can be divided into three categories: interpolation-based methods, reconstruction-based methods, and learning-based methods. Interpolation-based methods are simple to implement and have been widely used, but these linear models limit their ability to recover high-frequency details. Sparse-representation-based technologies [1] enhance the ability of linear models by using prior knowledge. This type of technology assumes that any natural image can be sparsely represented by the elements of a dictionary. The dictionary can form a database from which the mapping from low-resolution images to high-resolution images is learned. However, such methods are computationally complex and require many computing resources [2] [3]. Based on the convolutional neural network (CNN) model, SRCNN [4] first introduced CNNs into SISR; it used only a three-layer network and achieved advanced results. Subsequently, various deep-learning-based models entered the field of SISR, roughly divided into two significant directions. One pursues the recovery of details and is evaluated with PSNR, SSIM, and other standard metrics, with the SRCNN model as a representative. The other is a series of algorithms represented by SRGAN [5] and ESRGAN [6], which aim to reduce the perceptual loss and focus on the overall visual impression rather than on pixel-level details. The two directions have different application fields. In medical imaging, the details and features of the image are helpful for making a precise diagnosis, rather than the image's overall clarity. Therefore, in this work, we dig into the algorithms that pursue detail restoration and their applications in the medical field. Algorithms pursuing detail restoration are sorted into three categories.

1. Pre-upsampling super-resolution: traditional interpolation is used as a preprocessing step to obtain coarse higher-resolution images, which are then refined using deep neural networks [4] [7] [8] [9] [10] [11].

2. Post-upsampling super-resolution: most computation is performed in the low-resolution space, and the predefined upsampling is replaced with end-to-end learnable layers integrated at the end of the models [5] [12] [13] [14].

3. Progressive upsampling super-resolution: the networks are based on a cascade of CNNs that progressively reconstruct higher-resolution images; at each stage, the images are upsampled to a higher resolution and refined by CNNs [15] [16] [17].

The SRCNN model [4] is a pioneering work that introduced deep learning into SISR, using bicubic interpolation as the preprocessing step.
Subsequently, the VDSR model [7] introduced the residual structure into SISR. Instead of directly learning the mapping from low-resolution images to high-resolution images, VDSR learns the residual between the two images. The residual learning structure not only accelerates the convergence of model training but also allows deeper network structures in SISR, so that the model has a wider receptive field. The DRCN model [10] introduces a recursive structure into SISR and divides the model into three parts: an embedding network, an inference network, and a reconstruction network. The highlight of this model lies in the intermediate inference network and the loss function. The inference network shares convolutional parameters across its D recursive layers; the outputs of all layers are collected, and two losses are defined. The first, a local loss, is the difference between each layer's output and the HR image. The second computes the difference between the weighted average of all layers' outputs and the HR image; the two losses are combined to form the overall loss. The FSRCNN model [18] uses deconvolution to replace the interpolation of the SRCNN model and directly learns the mapping from low-resolution to high-resolution images to achieve end-to-end training. The core concept of ESPCN [19] is the sub-pixel convolutional layer (also called "pixel shuffle"). The input of the network is the original low-resolution image. After passing through three convolutional layers, a feature map with r² channels and the same spatial size as the input is obtained. Each group of r² channel values is then rearranged into an r × r area corresponding to a sub-block of the high-resolution image, so that a feature map of size H × W × r² is rearranged into an rH × rW × 1 high-resolution image. The sub-pixel convolutional layer proposed by the ESPCN model is widely used in subsequent studies; compared with the deconvolutional layer of the FSRCNN model, it can learn the nonlinear mapping from low-resolution to high-resolution images. The SRDenseNet model [13] introduces DenseNet into the SISR field. DenseNet feeds the features of each layer in a dense block to all subsequent layers so that the features of all layers are concatenated, instead of directly performing tensor summation as in ResNet [20]. This architecture reduces the problem of gradient vanishing, strengthens feature propagation, supports feature reuse, and reduces the number of weight parameters in the entire network. Hu et al. [21] deploy residual blocks in a U-Net [22] architecture to enhance video resolution and suppress blurring. Recently, generative adversarial nets (GANs) [23] have attracted much attention for resolution enhancement because of their advantages in promoting finer details and sharp edges and in removing inaccurate artifacts. SRGAN [5] first introduced GANs into the field of super-resolution. Compared with traditional GAN studies, SRGAN inputs a low-resolution image instead of noise samples. Moreover, SRGAN defines a content loss, which combines the MSE loss with a VGG-based feature loss, to replace a simple MSE loss in generator training; together with the adversarial loss, the authors name the whole generator loss the 'perceptual loss'. By making full use of a network such as VGG19 pre-trained on ImageNet, the perceptual loss can complement texture information in high-resolution outputs. Based on SRGAN, Wang et al. [6] proposed an enhanced version of that structure called ESRGAN, which outperforms SRGAN in many SR competitions.
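Returning to the sub-pixel convolutional layer of ESPCN [19] described above, the rearrangement is available as a single primitive in TensorFlow. The sketch below is only an illustration, with an assumed upscaling factor r = 2 and an arbitrary feature-map size.

```python
import tensorflow as tf

# Illustrative sketch of the ESPCN-style sub-pixel ("pixel shuffle") step.
# Assumed values: upscaling factor r = 2, batch of one 16x16 feature map.
r = 2
# A feature map with r*r channels, e.g. the output of the last convolution.
features = tf.random.normal([1, 16, 16, r * r])

# depth_to_space folds each group of r*r channel values into an r x r
# spatial block, turning an H x W x r^2 tensor into an rH x rW x 1 image.
hr_image = tf.nn.depth_to_space(features, block_size=r)
print(hr_image.shape)  # (1, 32, 32, 1)
```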
Compared with SRGAN, ESRGAN introduces four improvements. First, ESRGAN replaces residual blocks with dense blocks and removes all batch normalization during training. Batch normalization acts like a kind of contrast stretching: after any image passes through a batch norm layer, its color distribution is normalized. In other words, it destroys the original contrast information of the image, which is unsuitable for pixel-level image generation tasks such as the one studied here. Compared with residual blocks, dense blocks tend to converge to a globally better solution, especially without the BN constraint. Second, the discriminator part of the loss function is modified to the difference between the real-data loss and the generated-data loss. Third, the generator loss is calculated using the feature maps before ReLU activation. Finally, network interpolation is used to balance the objective evaluation indices and the subjective visual effect.

The importance of image denoising in low-level vision can be seen from several aspects:

1. Noise corruption is inevitable during image acquisition and processing, and it heavily degrades image quality and interferes with high-level vision tasks.

2. In medical imaging, even subtle noise may misguide diagnosis.

3. In step-progressive inference via variable splitting, many image restoration tasks can be addressed by embedding an intermediate denoising step, further expanding its application fields.

Image denoising has been a research hotspot for decades. For example, using non-local similarity [24] to optimize sparse methods can improve denoising performance. Dictionary learning [25] helps remove noise quickly. Prior knowledge [26] [27] [28] restores the details of the underlying clean image by smoothing the noisy image. More competitive denoising methods include BM3D [29], WNNM [30], NLR-MRF [31], and TNRD [32]. Although most of these methods achieve good denoising performance, they have the following drawbacks:

1. The testing phase involves complex optimization methods.

2. They require numerous manually set parameters.

3. The denoising models are fixed to certain denoising tasks.

Deep learning technology, with its strong self-learning ability, can address these shortcomings. Its applications in image denoising include additive white noisy image (AWNI) denoising, real noisy image denoising, blind denoising, and composite noisy image denoising. DnCNN [33] learns the residual end-to-end from the perspective of functional regression, using a convolutional neural network to separate noise from noisy images and achieving denoising results significantly better than other methods. Since then, a series of improvements based on the network structure have been proposed. The Residual Encoder-Decoder Network (REDNet) [34] uses a deep convolutional encoding-decoding framework with symmetric skip connections, so that information can be directly transferred from the top layer to the bottom layer in the backward pass; the Memory Network (MemNet) [8] further proposes a long-term memory model for image denoising; the Multi-level Wavelet CNN (MWCNN) [35] proposes a multi-level wavelet CNN framework, which is beneficial for restoring image details by combining the discrete wavelet transform with a convolutional network.
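To make the residual-learning idea behind DnCNN-style denoisers concrete, the following minimal Keras sketch builds a network that predicts the noise map and subtracts it from the noisy input; the depth and filter counts are illustrative assumptions, not the exact settings of [33] or of this paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_residual_denoiser(depth=7, filters=64):
    # The network maps a noisy image to its estimated noise (the residual).
    noisy = layers.Input(shape=(None, None, 1))
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(noisy)
    for _ in range(depth - 2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    residual = layers.Conv2D(1, 3, padding="same")(x)   # predicted noise map
    denoised = layers.Subtract()([noisy, residual])     # clean estimate
    return tf.keras.Model(noisy, denoised)

model = build_residual_denoiser()
model.compile(optimizer="adam", loss="mse")  # supervised on noisy/clean pairs
```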
The above methods usually require separate training models for different noise levels, which not only lacks flexibility but also cannot be applied to real noisy images with more complex degradation processes. CBDNet [36] is a blind denoising method that combines noise estimation and a non-blind denoising model. By modeling signal-dependent noise and the influence of the camera image signal processing pipeline on the noise, both synthetic and real noisy images are used for network training, which achieves sound denoising performance and generalization on real noisy images. Combining traditional mathematical models with deep learning priors has become a new research hotspot. For example, Regularization by Denoising (RED) [28] and its more efficient variant [37] incorporate deep learning priors into denoising models and achieve relatively good performance. Convolutional neural networks have achieved great success in image denoising tasks. However, most existing models rely on supervised learning from noisy-clean image pairs. In some specific applications, such as CT and MRI, clean reference images are difficult to obtain, so methods based on unsupervised learning show a wide range of application prospects. However, existing unsupervised convolutional denoising methods are still at an exploratory stage, and their training speed and recovery performance need improvement. Therefore, exploring self-supervised and unsupervised learning methods for real noisy images is of great significance.

In this study, we work on a strategy for general image recovery tasks, including image denoising and image super-resolution. In imaging scenarios such as camera imaging, CT, and MRI, we often face composite image recovery tasks; that is, to reconstruct a satisfying image, we need to deploy step-by-step techniques. For example, because of the low photon flux of a low-dose X-ray source, a CT image is polluted by severe noise; meanwhile, its resolution may need to be enhanced to satisfy the diagnosis demand because of the resolution limit of the CT machine. Our research focuses on a general post-processing strategy for such composite image recovery tasks. The main contributions are threefold. First, we combine a deep learning module with a deep iteration module to handle different kinds of image recovery tasks within one strategy. Second, we propose a novel GAN network for the deep learning module to recover image details and lost information. Third, this network, together with a compressed sensing technique, is deployed in the deep iteration module to further promote image quality. The proposed method is shown to be effective in both super-resolution and denoising tasks.

The rest of the paper is organized as follows. In section II, we briefly review related mathematical theories and then establish the deep network module and the deep iteration module of the proposed method. In section III, both image super-resolution and image denoising experiments on various datasets are performed, and several indexes are compared to evaluate the proposed method qualitatively and quantitatively. In section IV, we discuss some related issues and conclude.
PROPOSED METHOD
The proposed method is useful in various image reconstruction tasks, including but not limited to resolution enhancement and image denoising. It consists of two modules. First, a generative adversarial network with WGAN-GP training is built to recover general image structures. Second, a post-processing strategy, named Iteration Refinement (IR), deploys a compressed sensing method and the pre-trained network to recover details and suppress artifacts iteratively. When training the proposed network, for denoising tasks, the label images are first polluted by Gaussian white noise; for image super-resolution tasks, the label images are downsampled by a factor of two in both the x and y directions and then upsampled by a factor of two in both directions using bicubic interpolation. Every low-quality image and its corresponding label image are used for training. In testing, given a low-quality input image, the trained network predicts a high-quality image, which is then fed into the iterative module to further improve image quality.

The proposed DIAMOND architecture consists of a generator subnetwork and a discriminator subnetwork and performs WGAN-GP training. For the generator, we base our network architecture on RUNet [21], originally used to enhance image resolution in video sequences. To better apply RUNet to image recovery tasks, we use strided convolution in the contracting path to strengthen its multi-level feature extraction. In addition, to reduce the number of training parameters and speed up WGAN-GP training, we use pixel-wise summation instead of the original concatenation in the expanding path. For the discriminator, we set up a multi-level feature extraction network with a one-dimensional tensor output, as suggested by WGAN-GP, to estimate and minimize the Wasserstein distance.
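The degradation used to build the low-quality training inputs described above can be sketched as follows; the noise level and image size are illustrative assumptions rather than the exact training settings.

```python
import numpy as np
import tensorflow as tf

def make_denoising_input(label, sigma=15.0):
    # Additive white Gaussian noise for the denoising task (assumed sigma).
    noise = np.random.normal(0.0, sigma, size=label.shape).astype(np.float32)
    return label + noise

def make_sr_input(label):
    # 2x bicubic downsampling followed by 2x bicubic upsampling, so the
    # network input has the same size as the label image.
    h, w = label.shape
    img = tf.convert_to_tensor(label[None, :, :, None], dtype=tf.float32)
    low = tf.image.resize(img, (h // 2, w // 2), method="bicubic")
    pre_up = tf.image.resize(low, (h, w), method="bicubic")
    return pre_up[0, :, :, 0].numpy()

label = np.random.rand(128, 128).astype(np.float32) * 255.0  # dummy label image
noisy_input = make_denoising_input(label)
sr_input = make_sr_input(label)
```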
Following Arjovsky et al. [38] and Gulrajani et al. [39], we define a discriminator network D_ω which we optimize in an alternating manner along with G_θ to solve the adversarial min-max problem:

$$\min_{\theta}\max_{\omega}\; \mathbb{E}_{I\sim P_g}[D_\omega(I)] - \mathbb{E}_{I\sim P_r}[D_\omega(I)] + \Lambda\, \mathbb{E}_{\hat I\sim P_{\hat I}}\big[(\|\nabla_{\hat I} D_\omega(\hat I)\|_2 - 1)^2\big] \quad (1)$$

Figure 1. Architecture of (a) the generator and (b) the discriminator network, where kz nc denotes a convolutional layer with a kernel size of z × z and c feature maps. The residual blocks in black dotted boxes are used in image super-resolution tasks and are replaced by two sets of convolution-batch norm-ReLU layers in denoising tasks; similarly, the residual blocks in red dotted boxes are suitable for super-resolution tasks and are replaced by two sets of convolution-batch norm-ReLU layers in denoising tasks.

Suppose I_L is the input low-quality image and I_H is the high-quality label image. Here G_θ(I_L) is the generated image, ε ∼ Uniform[0, 1], Î = ε I_H + (1 − ε) G_θ(I_L), and Λ is the coefficient of the gradient penalty term. Unlike the traditional GAN of Goodfellow et al. [23], WGAN-GP training stabilizes the training process by removing the logarithm in the loss functions, discarding the sigmoid activation in the discriminator, and adding a gradient penalty term to the loss functions. With this approach, our generator can learn to create solutions that are highly similar to real label images and thus difficult for D to classify; meanwhile, the network converges much more easily.

Our generator network consists of several residual blocks, strided convolutions, and tensor operations, as shown in Fig. 1(a). We use the residual training method to optimize the training process, which means the proposed generator learns the residual image between the label images and the low-quality images. Unlike the conventional UNet architecture [22], the contracting path (left path) shown in Fig. 1(a) consists of a sequence of blocks, each followed by a tensor addition operation that feeds the block input forward to the subsequent block, i.e., a residual block [12]. This architecture allows the network to transfer shallow features directly to deep layers, and image features can be better preserved by using multiple residual blocks at every step of the contracting path. To efficiently upscale the low-resolution image, transposed convolution layers are used in the expanding path, the right path shown in Fig. 1(a). The number of residual blocks deployed at every step is further discussed in 3.3.2. The residual blocks in black dotted boxes are used in image super-resolution tasks and replaced by two groups of convolution-BN-ReLU layers in denoising tasks; similarly, residual blocks in red dotted boxes are suitable for super-resolution tasks and are replaced by two groups of convolution-BN-ReLU layers in denoising tasks. For the contracting path, the input image passes through one set of convolution-BN-ReLU layers to produce 64 feature maps, and its spatial size is then halved.

Moreover, our proposed generator modifies the classic RUNet in two aspects. First, instead of deploying pooling layers, we use convolutional layers with stride 2 and 1/2 for in-network downsampling and upsampling, which enlarges the receptive fields. Specifically, we utilize k (= 4) downsampling and upsampling steps in the modified RUNet, leading to k + 1 spatial scales of feature maps. Second, we adopt a simple pixel-wise summation operation to combine the feature maps from the encoder and decoder subnetworks instead of the concatenation utilized in UNet. We empirically find that element-wise summation effectively reduces the network parameters and leads to comparable reconstruction results.
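A minimal sketch of the critic objective in Eq. (1) is given below; `generator` and `discriminator` are assumed to be Keras models, and the penalty coefficient corresponds to Λ.

```python
import tensorflow as tf

LAMBDA_GP = 10.0  # gradient penalty coefficient (corresponds to Lambda in Eq. (1))

def critic_loss(discriminator, generator, real, low_quality):
    fake = generator(low_quality, training=True)
    # Random interpolates between real and generated images.
    eps = tf.random.uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        interp_score = discriminator(interp, training=True)
    grads = tape.gradient(interp_score, interp)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    penalty = tf.reduce_mean(tf.square(grad_norm - 1.0))
    # Discriminator loss: D(fake) - D(real) + Lambda * gradient penalty.
    return (tf.reduce_mean(discriminator(fake, training=True))
            - tf.reduce_mean(discriminator(real, training=True))
            + LAMBDA_GP * penalty)
```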
To achieve better perceptual performance, we use a perceptual loss function [40] in all training tasks, as described in the following section. However, the perceptual loss has a severe shortcoming: it introduces annular or rectangular artifacts in the reconstructed images. According to our experiments, the proposed discriminator structure performs well in suppressing such artifacts and attains more delicate features. A well-trained discriminator indicates the 'distance' from the generated image to the real image by minimizing the discriminant loss (the Wasserstein loss in our study). By alternately training the generator and the discriminator, annular artifacts can be effectively suppressed.

To discriminate real images from generated image samples, we train a discriminator network whose architecture is shown in Fig. 1(b). We follow the guidelines from Radford et al. [41] and use LeakyReLU activation (slope 0.2) to avoid 'dead neurons'. Unlike the original approach of using max-pooling to reduce image sizes, we apply strided convolution throughout the network to enlarge the receptive fields. The discriminator network is trained to solve the maximization problem in Equation 1. It contains eight convolutional layers with an increasing number of 3 × 3 filter kernels, increasing by a factor of 2 from 32 to 256 kernels. Strided convolutions are used to reduce the image resolution each time the number of feature maps is doubled. The resulting 256 feature maps are followed by one dense layer to obtain a one-dimensional tensor for WGAN-GP training. By deploying a discriminator network, we can suppress the annular artifacts introduced by the perceptual loss function; by removing the sigmoid function at the output layer, we follow the WGAN-GP training requirements, so the training process achieves better global convergence.

According to [38] and [39], WGAN-GP training is adaptable to various GAN training procedures. By removing the sigmoid activation in the discriminator's output layer, discarding all logarithms in the generator and discriminator losses, and adding a gradient penalty term to stabilize gradient descent, WGAN-GP training introduces the Wasserstein distance instead of the Jensen-Shannon divergence in the loss functions to prevent gradient vanishing problems. When applying WGAN-GP training, the exponential decay rates of the first-moment estimate (β₁) and the second-moment estimate (β₂) in the discriminator's Adam optimizer are empirically set to 0.5 and 0.9.

The definition of our perceptual loss function l is critical for the performance of our generator network. While l is commonly based on the MSE [12], we consider the perceptual loss functions [40], which map the predicted image Î and the target high-quality image I_H into a feature space and measure the distance between the two mapped images in that space. We formulate the perceptual loss as the weighted sum of a content loss (l_C) and an adversarial loss component (l_Gen):

$$ l = l_C + \lambda\, l_{Gen} \quad (2) $$

where λ is a hyper-parameter. In the following we describe our choices for the content loss (l_C) and the adversarial loss (l_Gen).

2.1.3.1 Content Loss: The pixel-wise MSE loss is calculated as:

$$ l_{MSE} = \frac{1}{WHC}\sum_{z=1}^{C}\sum_{x=1}^{W}\sum_{y=1}^{H}\big(I^{H}_{x,y,z} - G_\theta(I_L)_{x,y,z}\big)^2 \quad (3) $$

where I_L is the input low-quality image. This is the most widely used optimization target for image reconstruction tasks.
However, while achieving exceptionally high PSNR values, such reconstructions often lack sharp edges and fine details. In other words, the high-frequency content of the image is not preserved, resulting in unsatisfying solutions with overly smooth textures. To solve this problem, we rely on the ideas of Johnson et al. [40] and use a loss function that measures perceptual similarity. We use a pre-trained VGG-16 network proposed by Simonyan and Zisserman [42]. Let Φ = {φ_j, j = 1, 2, ..., N_p} denote a loss network consisting of N_p convolutional layers that extracts features from a given input image, where φ_j(I) denotes the feature map of size C_j × H_j × W_j obtained at the j-th convolutional layer for the input image I. Given a predicted image Î and a target image I_H, the feature distance ℓ_j at the j-th layer is computed as:

$$ \ell_j = \frac{1}{W_j H_j C_j}\,\big\|\phi_j(\hat I) - \phi_j(I_H)\big\|^2 \quad (4) $$

So the content loss can be written as:

$$ l_C = \sum_{j=1}^{N_p} \ell_j \quad (5) $$

2.1.3.2 Adversarial Loss: As mentioned in 2.1.1, merely using the perceptual loss introduces annular or rectangular artifacts in reconstructed images. To solve this problem, we combine the perceptual loss with an adversarial loss to further suppress artifacts. By trying to fool the discriminator, the generator encourages generated images to gradually approach real images. We absorb the idea of Arjovsky et al. [38] and Gulrajani et al. [39] and deploy WGAN-GP training. Specifically, the generator loss l_Gen is defined as:

$$ l_{Gen} = \sum_{n=1}^{N} -D_\omega\big(G_\theta(I_L)\big) \quad (6) $$

Here, −D_ω(G_θ(I_L)) is the generator loss term in WGAN-GP training, which reflects how real the reconstructed image G_θ(I_L) appears to the discriminator. Correspondingly, we use D_ω(G_θ(I_L)) − D_ω(I_H) + Λ(‖∇_Î D_ω(Î)‖₂ − 1)², where Î = ε I_H + (1 − ε) G_θ(I_L), as the discriminator loss in WGAN-GP training.
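A minimal sketch of the VGG-based content loss of Eqs. (4)-(5) is shown below; the chosen layers are an illustrative assumption rather than the exact configuration used here, and single-channel CT slices would have to be replicated to three channels before being fed to VGG-16.

```python
import tensorflow as tf

# Feature extractor built from an ImageNet-pretrained VGG-16; the layer names
# are assumptions for illustration, not necessarily those used in the paper.
vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
layer_names = ["block1_conv2", "block2_conv2", "block3_conv3", "block4_conv3"]
feature_model = tf.keras.Model(
    inputs=vgg.input,
    outputs=[vgg.get_layer(name).output for name in layer_names])
feature_model.trainable = False

def content_loss(predicted, target):
    # Inputs are batches of 3-channel images in the VGG input range.
    pred_feats = feature_model(tf.keras.applications.vgg16.preprocess_input(predicted))
    targ_feats = feature_model(tf.keras.applications.vgg16.preprocess_input(target))
    loss = 0.0
    for p, t in zip(pred_feats, targ_feats):
        # Mean squared feature distance: (1/(C_j H_j W_j)) * ||phi_j(pred) - phi_j(target)||^2
        loss += tf.reduce_mean(tf.square(p - t))
    return loss
```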
For a given low-quality image I_L, let I^(k) be the recovered image at the k-th iteration of the deep iteration module, where k ∈ [1, K] is the iteration index and K is the total number of iterations. H represents the degradation kernel, which is a blur kernel or a Gaussian kernel in this study. ψ(I) is a trained deep reconstruction network that maps an image of poor quality to a recovered image of good quality. f(·) is a regularization prior penalized on the recovered image. The goal of the deep iteration module is to search for a solution satisfying the measurement data within a neighborhood of the current iterate. In general, the optimization model based on the current image is formulated as follows:

$$ \{I^{(k+1)}, y^{(k+1)}\} = \arg\min_{\{I, y\}}\Big( \big\|y - (I_L - H I^{(k)})\big\|_F^2 + \mu\,\big\|I - I^{(k)} - \psi(y)\big\|_F^2 + \xi f(I) \Big) \quad (7) $$

where μ > 0 and ξ > 0 are weighting parameters that balance the deep learning component and the regularization term. The first term on the right enforces data fidelity in the measurement domain. The second term emphasizes that the recovered image needs to satisfy the requirement of the deep learning prior. The third term, based on f(I), is a general regularizer accounting for general priors. The mathematical model of Eq. (7) enables superior image reconstruction based on a combination of a deep image prior and a regularization prior.

Because the model of Eq. (7) contains the optimization of a neural network, i.e., ψ(y), which is complex, we replace ψ(y) with g, and Eq. (7) is converted into the following form:

$$ \{I^{(k+1)}, g^{(k+1)}, y^{(k+1)}\} = \arg\min_{\{I, g, y\}}\Big( \big\|y - (I_L - H I^{(k)})\big\|_F^2 + \mu\,\big\|I - I^{(k)} - g\big\|_F^2 + \xi f(I) \Big), \;\; \text{s.t. } g = \psi(y) \quad (8) $$

The model of Eq. (8) is a constrained optimization problem, which can be converted into the following unconstrained problem:

$$ \{I^{(k+1)}, g^{(k+1)}, y^{(k+1)}\} = \arg\min_{\{I, g, y\}}\Big( \big\|y - (I_L - H I^{(k)})\big\|_F^2 + \mu\,\big\|I - I^{(k)} - g\big\|_F^2 + \xi f(I) + \upsilon\,\big\|g - \psi(y)\big\|_F^2 \Big) \quad (9) $$

where three variables are to be optimized. Using an alternating optimization strategy, the problem can be divided into three sub-problems, in y, in g, and in I, which are respectively written as follows:

$$ y^{(k+1)} = \arg\min_{y}\Big( \big\|y - (I_L - H I^{(k)})\big\|_F^2 + \upsilon\,\big\|g^{(k)} - \psi(y)\big\|_F^2 \Big) \quad (10) $$

$$ g^{(k+1)} = \arg\min_{g}\Big( \mu\,\big\|I - I^{(k)} - g\big\|_F^2 + \upsilon\,\big\|g - \psi(y^{(k+1)})\big\|_F^2 \Big) \quad (11) $$

$$ I^{(k+1)} = \arg\min_{I}\Big( \big\|I - I^{(k)} - g^{(k+1)}\big\|_F^2 + \xi_1 f(I) \Big) \quad (12) $$

where ξ₁ = ξ/λ. The sub-problem in y is solved by a derivative descent method, giving

$$ y^{(k+1)} = \big(I_L - H I^{(k)} + \upsilon H g^{(k)}\big) / (1 + \upsilon) \quad (13) $$

To keep consistency with the original measurement, we assume that the initial condition H g^(0) = I_L is satisfied; in other words, the above formula also holds for the first iteration under this condition. Regarding the sub-problem in g, the solution can be obtained directly:

$$ g^{(k+1)} = \frac{\upsilon\, \psi(y^{(k+1)})}{\upsilon + \mu} \quad (14) $$

Regarding the regularization prior term, different selections of regularization priors result in different recovery results, and the regularization prior has an important effect on the final reconstruction. Among the many priors for image reconstruction, including dictionary learning [43], low-rank [44], sparsity [45], and others [46], we use a simple TV-type regularizer as an example to encourage sparsity:

$$ f(I) = \sum_{j_1=1}^{J_1}\sum_{j_2=1}^{J_2}\big( |I(j_1, j_2) - I(j_1-1, j_2)| + |I(j_1, j_2) - I(j_1, j_2-1)| \big) \quad (15) $$

where J₁ and J₂ represent the width and height of the reconstructed image, and the gradients on the image border are set to zero. Thus, I^(k+1) can be updated as follows:

$$ I^{(k+1)} = \arg\min_{I}\Big( \tfrac{1}{2}\big\|I - I^{(k)} - g^{(k+1)}\big\|_F^2 + \xi_1 \sum_{j_1=1}^{J_1}\sum_{j_2=1}^{J_2}\big( |I(j_1, j_2) - I(j_1-1, j_2)| + |I(j_1, j_2) - I(j_1, j_2-1)| \big) \Big) \quad (16) $$

Replacing I(j₁, j₂) − I(j₁−1, j₂) and I(j₁, j₂) − I(j₁, j₂−1) with d₁(j₁, j₂) and d₂(j₁, j₂), respectively, we have the unconstrained problem:

$$ \{I^{(k+1)}, d_1^{(k+1)}, d_2^{(k+1)}\} = \arg\min_{\{I, d_1, d_2\}}\Big( \tfrac{1}{2}\big\|I - I^{(k)} - g^{(k+1)}\big\|_F^2 + \xi_1 \sum_{j_1=1}^{J_1}\sum_{j_2=1}^{J_2}\big( |d_1(j_1, j_2)| + |d_2(j_1, j_2)| \big) + \rho \sum_{j_1=1}^{J_1}\sum_{j_2=1}^{J_2}\big( |d_1(j_1, j_2) - (I(j_1, j_2) - I(j_1-1, j_2))|^2 + |d_2(j_1, j_2) - (I(j_1, j_2) - I(j_1, j_2-1))|^2 \big) \Big) \quad (17) $$

The above optimization problem can be solved by alternately minimizing the objective function; an FFT-based algorithm, FTVd [47], is employed to find the solution. Note that there are two parameters in the above problem, ρ and ξ₁; they are set equal in this study, so we use a single variable δ to denote both.

Deep Iteration Module Mechanism: As demonstrated in Fig. 2, the mechanism of the deep iteration module is based on iterative refinement. The error feedback is essential to recover structural subtleties that can be lost when using a single neural network.
Figure 2. Architecture of the deep iteration module. This module consists of four components: deep reconstruction, compressed sensing, image degradation mapping, and iterative refinement. p^(0) is the original tomographic dataset, and p^(k), k = 1, 2, ..., K, represents the estimated residual dataset in the k-th iteration between p^(0) and the currently reconstructed counterpart. Φ_w(p^(k)) is the output of the deep reconstruction module, and f^(k) represents a reconstruction regularized via compressed sensing.

This mechanism helps effectively suppress the mismatches and/or inconsistencies caused by existing deep learning methods [48] [49]. The output of the neural network is combined with the data as the input to the DL reconstruction network. The trained neural network is employed to perform image recovery again so that one can obtain a residual image and add it to the previous recovery result. The deep learning network and compressed sensing thus produce, at each iteration, a residual image for gradually improved image recovery. It is easy to understand that the DL network is trained on original images, so it may not directly produce an ideal clean image that is consistent with the sparsity requirement of compressed sensing. This issue can be addressed with a regularization prior in terms of total variation [50], low-rank [51], dictionary learning [52], etc. In this study, the anisotropic TV is employed to perform this task [53].
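One iteration of the deep iteration module can be sketched as below. This is only an illustration under assumptions: `psi` stands for the pre-trained recovery network, `H` applies the degradation kernel, and `tv_denoise` stands for a TV-regularized solver of Eq. (16) (the paper uses the FFT-based FTVd algorithm [47]; any anisotropic-TV proximal solver could be substituted in this sketch).

```python
def deep_iteration_step(I_k, g_k, I_low, H, psi, tv_denoise,
                        upsilon=1.0, mu=1.0, delta=0.01):
    # y-subproblem (Eq. 13): combine the measurement residual with the
    # degraded current deep-prior estimate.
    y_next = (I_low - H(I_k) + upsilon * H(g_k)) / (1.0 + upsilon)

    # g-subproblem (Eq. 14): rescaled output of the trained network psi.
    g_next = upsilon * psi(y_next) / (upsilon + mu)

    # I-subproblem (Eq. 16): TV-regularized refinement around I_k + g_next,
    # with delta playing the role of the combined parameter of rho and xi_1.
    I_next = tv_denoise(I_k + g_next, weight=delta)
    return I_next, g_next, y_next
```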
EXPERIMENTS

We implemented the proposed models using the TensorFlow framework. For fair comparison, we use Python implementations of bicubic interpolation, SRGAN, and RUNet for the super-resolution tasks, and TensorFlow implementations of U-Net, DnCNN, and GAN for the denoising tasks. The performance of the proposed DIAMOND is evaluated on simulated and real datasets. For image super-resolution, we first conduct simulated experiments to verify DIAMOND's mechanism and then use real datasets to further prove the method's effectiveness; for image denoising, we follow the same experimental process. All experiments are implemented on Ubuntu (Intel Xeon E5-2683 v3 @ 2 GHz, 14 cores; Titan X GPU with 12.0 GB VRAM; 64.0 GB RAM).
Evaluation measures.
Three quantitative picture quality indices (PQIs) are employed for performance evaluation: root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). A smaller RMSE value indicates a subtler deviation between the reconstructed image and the reference image; a larger PSNR value means higher image quality; and a larger SSIM value reflects higher similarity in image structures.
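For reference, these indices can be computed as in the sketch below, which delegates SSIM to scikit-image; the dynamic range value is an assumption that depends on how the images are scaled.

```python
import numpy as np
from skimage.metrics import structural_similarity

def rmse(ref, rec):
    return float(np.sqrt(np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)))

def psnr(ref, rec, data_range=255.0):
    # PSNR in dB relative to the assumed dynamic range of the images.
    return 20.0 * np.log10(data_range / rmse(ref, rec))

def ssim(ref, rec, data_range=255.0):
    return structural_similarity(ref, rec, data_range=data_range)
```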
Implementation details.
We implement and train our network using the TensorFlow framework. We use the Adam optimizer to train the network for 200 epochs (1100 iterations). The learning rate l is halved every 100 epochs. The batch size b is set according to the number of training data. More implementation details are listed in Table 1.

In this study, an abdominal cavity CT dataset from the AAPM competition is first used to compare the performance of all reconstruction methods. After showing that DIAMOND outperforms all other methods, we further apply our method to real oral cavity CT data from Jiangsu Province Hospital, China.
Figure 3. CT image super-resolution results. The 1st-4th rows are random abdominal CT slices and their corresponding ROIs from the AAPM dataset. The 5th-8th rows are random oral CT slices and their corresponding ROIs from a local hospital.
Each original image is downsampled by a factor of two in each direction and then bicubic-interpolated back to its original size; the interpolated image serves as the input to the network module of DIAMOND. Root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) are employed to assess the reconstruction results quantitatively. To reach the optimal performance of the proposed method, we tune the hyper-parameter values empirically. For all methods, the hyper-parameter values that minimize RMSE (and thus maximize PSNR) have been selected; they are listed in Table 1.

To validate the performance of the proposed DIAMOND method for image super-resolution reconstruction, Fig. 3 shows the reconstruction results of all super-resolution methods. The downsampling scale is set as 2. To fairly compare the performance of all methods, the parameters have been optimized to obtain the best results. Results in the 5th column are obtained by the network module of DIAMOND alone and are denoted DIAMOND-. Pre-upsampled images are obtained by bicubic interpolation of the downsampled counterparts. Fig. 3 demonstrates that the proposed method leads to images with better edge preservation and more adequate feature discovery than those obtained with the other methods. More specifically, the pre-upsampled images suffer from severe blur and missing details, as shown in Fig. 3(b1)-(b4). Circular artifacts are observed in the RUNet results, as illustrated by Fig. 3(d1)-(d4). DIAMOND- achieves better results than the above methods in suppressing circular artifacts and restoring image details, which can be observed in the extracted regions of interest (ROIs) in Fig. 3(e1)-(e4). Compared with DIAMOND-, the proposed method has better performance in subtle detail preservation, as indicated by the arrows in the ROIs in Fig. 3(f1)-(f4).

Figure 4. CT image denoising results. The 1st-4th rows are random abdominal CT slices and their corresponding ROIs from the AAPM dataset. The 5th-8th rows are random oral CT slices and their corresponding ROIs from Jiangsu Province Hospital, China.

Table 2 shows the quantitative results (RMSE, PSNR, SSIM) of the super-resolution reconstructions in Fig. 3. In Fig. 3, we show two slices for both the abdominal data and the oral data; in Table 2, the quantitative results of the two slices are averaged. It can be seen that our proposed method has the smallest RMSE value and the highest PSNR and SSIM values, meaning that it achieves the closest distance to the ground truth while suppressing noise and preserving subtle details. It should be mentioned that the SRGAN and RUNet results are not quantitatively better than the bicubic interpolation results, although they maintain image structures and restore fine details much better visually; that is mainly due to the artifacts introduced by the perceptual loss function. In our method, we manage to remove the artifact pollution, which results in a quantitative improvement.
To validate the performance of the proposed DIAMOND method for image denoising tasks, we need to prepare a training dataset of input-output pairs {(y_i, M_i; x_i)}, i = 1, ..., N. Here, y_i is obtained by adding additive white Gaussian noise (AWGN) to the latent image x_i, and M_i is the noise level map. The reason for using AWGN to generate the training dataset is twofold. First, AWGN is a natural choice when there is no specific prior information on the noise source.

Table 1. Parameter values for all experiments

                                 Deep Network Module              Deep Iteration Module
                                 λ       Λ    l        b          s        δ      ε
Denoising         abdominal      0.005   10   0.00005  16         0.0005   -      0.0009
                  oral           0.001   10   0.0001   48         0.0001   -      0.0009
Super-resolution  abdominal      0.005   10   0.00002  16         0.05     0.01   0.00005
                  oral           0.001   10   0.00002  48         0.01     1      0.00025
Figure 5. Histograms of ablation results on contracting depth and number of residual blocks. (a) PSNR values on the two datasets for different contracting depths. (b) With the contracting depth fixed (= 4), PSNR values on the abdominal dataset for different numbers of residual blocks.
Table 2. Results of different super-resolution methods on the two datasets

AAPM abdominal CT data
         Bicubic   SRGAN     RUNet     DIAMOND-   DIAMOND
RMSE     6.9677    8.0404    7.9280    4.2147
PSNR     31.2399   30.0269   29.7839   35.6635
SSIM     0.8983    0.9264    0.7342    0.9497

Real oral CT data
         Bicubic   SRGAN     RUNet     DIAMOND-   DIAMOND
RMSE     11.2241   7.1176    5.3530    4.6239
PSNR     27.0790   31.0842   33.2968   35.2895
SSIM     0.7856    0.8503    0.8406    0.9037
Second, real-world noise can be approximated as locally AWGN. We found that the learned model still works on real noisy images. We compare our DIAMOND method with the DnCNN, U-Net, and DIAMOND- methods on the same datasets; all methods are trained through residual learning. The RMSE, PSNR, and SSIM values are shown in Table 3, which indicates that our method outperforms the others. The results are visualized in Fig. 4, showing that the DIAMOND method can effectively remove AWGN without generating annular artifacts. Fig. 4(d1)-(d4) shows that the U-Net results destroy image details even though they remove noise. Fig. 4(e1)-(e4) shows that DIAMOND- does well in removing AWGN and preserving image structures; however, it fails to suppress annular artifacts, which degrades image quality. Equipped with a post-processing deep iteration module and a GAN network, the proposed method has an obvious advantage in removing noise, suppressing artifacts, and preserving delicate features, as shown in Fig. 4(f1)-(f4). Moreover, Table 3 readily illustrates that the proposed method outperforms all the others in all three indexes.
Table 3. Results of different denoising methods on the two datasets

AAPM abdominal CT data
         AWGN      DnCNN     UNet      DIAMOND-   DIAMOND
RMSE     13.6934   5.6987    5.1152    5.1148
PSNR     25.4007   33.0254   33.9536   33.9542
SSIM     0.4124    0.7601    0.8638    0.8644

Real oral CT data
         AWGN      DnCNN     UNet      DIAMOND-   DIAMOND
RMSE     13.4655   5.0630    5.4125    4.3722
PSNR     25.5467   34.0429   33.6077   35.3486
SSIM     0.4316    0.9040    0.9296    0.9277

Fig. 5 analyzes the convergence speed of the proposed method in super-resolution tasks. Our method converges within tens of steps on the different datasets, and the index values (RMSE, PSNR, and SSIM) improve during the process. Fig. 6 shows the convergence curves in the denoising tasks. The deep iteration module serves as a useful tool to promote the recovered images, reducing the RMSE value and increasing the PSNR and SSIM values over the iterations.
This section examines the contracting/expanding depth of the generator network, the number of residual blocks in the contracting/expanding path, and the loss functions, respectively. First, considering the generator architecture in Fig. 1(a), we analyze how its depth influences the output image quality; given the input patch size, the contracting depth can range from one to five, and we compare the results of denoising tasks using the generator network. Second, after confirming the optimal contracting depth, the optimal number of residual blocks at each depth is further discussed. Third, pixel-wise, perceptual, and adversarial loss functions are compared in terms of feature preservation and artifact introduction.

We modify the generator network and obtain several results to analyze how the contracting/expanding depth affects the image denoising results. Table 4 shows the PSNR values of the denoised images for different contracting/expanding depths. Both datasets reach the highest PSNR value when the depth is set to 4; the histogram in Fig. 5(a) also supports this conclusion.
On the basis that the optimal contracting depth is four, further ablation experiments are performed to analyze the optimal number of residual blocks at each depth in super-resolution tasks. It should be pointed out that at least two residual blocks are needed at each depth: the earlier ones to extract and transmit features, and the last one to increase the number of feature maps. Moreover, when the number of residual blocks at a certain depth is varied, all other depths use their optimal numbers of residual blocks.

Figure 6. Ablation experiment results on loss functions. The 1st and 2nd rows are random abdominal CT slices for super-resolution tasks and their corresponding ROIs. The 3rd and 4th rows are random oral CT slices for denoising tasks and their corresponding ROIs.
Table 5 and Fig. 5(b) show the experimental PSNR results for the number of residual blocks at each depth. To reach the highest PSNR value, four residual blocks are used at the first depth, four at the second depth, six at the third depth, and two at the fourth depth.
Table 4. Analysis of the depth of the contracting/expanding path (/dB)

Depth        1         2         3         4         5
Abdominal    28.8802   28.1417   28.5645
Table 5. Analysis of the number of residual blocks (/dB)

           res-2     res-3     res-4     res-5     res-6     res-7
depth-1    35.5138   35.0753
In this section, three loss functions are discussed: the pixel-wise loss, the perceptual loss, and the adversarial loss.

The pixel-wise loss is defined in Eq. (3). It calculates the pixel loss between the predicted images and the target images. Standard pixel-wise loss functions, such as the MSE or L2 loss, are applied between each pair of predicted and target pixels. Since these loss functions evaluate each pixel separately and then average over all pixels, they assume that the same learning is done for each pixel in the image. The pixel-wise loss is widely used in image recovery tasks. However, it concentrates on pixel-level similarity and sometimes misses the overall image effect or loses subtle image details, as shown in Fig. 6(b1)-(b4). (b1) is the super-resolution result of one abdominal CT slice constrained by the MSE loss, and (b2) is its corresponding ROI; in this project, the MSE loss hardly works in super-resolution reconstruction tasks. (b3) is the denoising result of one oral cavity CT slice constrained by the MSE loss, and (b4) is its corresponding ROI. The MSE loss can remove noise and promote image quality to some extent; however, compared with (d3) and (d4), it fails to preserve subtle details.

The perceptual loss is defined in Eqs. (4) and (5). It compares two different images that look similar, such as the same image at different resolutions. Even when the images are very similar, the pixel-level loss function may output a considerable error value, whereas the perceptual loss function compares high-level perceptual and semantic differences between images and is good at preserving image details and delicate structures in image super-resolution tasks. Nevertheless, deep networks constrained by the perceptual loss tend to introduce artifacts into the reconstructions. From Fig. 6(d1)-(d4), we can see that although minimizing the perceptual loss is beneficial for recovering image details, it introduces annular artifacts into the reconstructed images and degrades the visual effect.

To suppress the artifacts introduced by the perceptual loss, the adversarial loss is introduced into the overall loss function with a weight parameter λ (see Eq. (2) and Eq. (6)). The discriminator is able to capture the latent attributes of high-resolution images. Compared with Fig. 6(d2) and (d4), the artifacts in (e2) and (e4) are visibly reduced. Finally, it should be mentioned that some artifacts remain in Fig. 6(e2) and (e4); our proposed method further removes the artifacts and predicts images with the best visual effect (see Fig. 6(f1)-(f4)).

In this section, we provide some suggestions on the parameter selection of the deep iteration module. There are three parameters in this module: the ADMM optimization parameter δ, the iterative step s, and the TV parameter ε. The results are shown in Fig. 7 and Fig. 8.

Figure 7. Super-resolution results (DIAMOND) with different parameters. ROIs are listed together at the right. After optimizing the step s and δ, the TV parameter ε is optimized in the last row.

Figure 8. RMSE, PSNR, and SSIM line diagrams of the Fig. 7 results (rows 1-4).

Figure 9. RMSE, PSNR, and SSIM line diagrams of the Fig. 7 results (row 5).

With a decrease of s, the iterative results tend to be more delicate. However, a smaller step will not only increase artifacts but also slow down convergence. Increasing δ can subtly preserve image details; however, a larger δ will also bring artifacts.
Increasing ε can suppress artifacts and smooth the whole image structure. It is important to trade off these parameters when performing deep iteration operations.

CONCLUSION
In this study, we propose a novel strategy to solve general medical image restoration tasks. Our contributions are threefold. First, we put forward a novel GAN network with multi-level residual blocks and WGAN-GP training. Second, a deep iteration module combines deep learning with compressed sensing and promotes restoration iteratively. Third, we incorporate the perceptual loss into the loss function and manage to suppress the artifacts introduced by that loss function.

Medical imaging is a widely applied field, and a distinct medical image is helpful to medical diagnosis in many ways. However, medical images are sometimes polluted by noise or cannot reach the resolution demand. An effective way to restore such polluted images and reach satisfying image quality, both in visual effect and in quantitative indexes, is highly significant in such cases. Our proposed method can restore these images and achieves outstanding performance on different datasets compared with competing methods. Moreover, we compare our full method with its network part alone to show that the DIAMOND strategy performs better than the mere network; the proposed network also performs better than state-of-the-art methods, as shown in Section 3.

It is also important to point out that our method has some shortcomings:

1. The proposed method only deals with the 2× super-resolution task and the noise-level-15 denoising task. The perceptual loss function restricts its performance in more difficult recovery tasks; more loss function constraints, such as L1 loss functions, can be combined into the current loss functions to preserve image structures.

2. This strategy consists of two steps, and for the iteration module the computational cost is relatively high. Further research should focus on simpler regularization priors to speed up convergence.

3. Whether this strategy applies to other image restoration tasks is worth exploring; other image recovery tasks, such as image deblurring and image inpainting, can be taken into consideration.

In future research, we will conduct further experiments based on the three points above.

REFERENCES

[1] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution as sparse representation of raw image patches,” in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008, pp. 1–8.
[2] W. Wu, H. Yu, P. Chen, F. Luo, F. Liu, Q. Wang, Y. Zhu, Y. Zhang, J. Feng, and H. Yu, “Dictionary learning based image-domain material decomposition for spectral ct,”
Physics in Medicine &Biology , vol. 65, no. 24, p. 245006, 2020.[3] D. Hu, W. Wu, M. Xu, Y. Zhang, J. Liu, R. Ge, Y. Chen, L. Luo, andG. Coatrieux, “Sister: Spectral-image similarity-based tensor withenhanced-sparsity reconstruction for sparse-view multi-energyct,”
IEEE Transactions on Computational Imaging , vol. 6, pp. 477–490, 2019.[4] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolutionusing deep convolutional networks,”
IEEE transactions on patternanalysis and machine intelligence , vol. 38, no. 2, pp. 295–307, 2015.[5] C. Ledig, L. Theis, F. Husz´ar, J. Caballero, A. Cunningham,A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al. , “Photo-realistic single image super-resolution using a generative adver-sarial network,” in
Proceedings of the IEEE conference on computervision and pattern recognition , 2017, pp. 4681–4690.[6] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, andC. Change Loy, “Esrgan: Enhanced super-resolution generativeadversarial networks,” in
Proceedings of the European Conference onComputer Vision (ECCV) , 2018, pp. 0–0. [7] J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in
Proceed-ings of the IEEE conference on computer vision and pattern recognition ,2016, pp. 1646–1654.[8] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memorynetwork for image restoration,” in
Proceedings of the IEEE interna-tional conference on computer vision , 2017, pp. 4539–4547.[9] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deeprecursive residual network,” in
Proceedings of the IEEE conferenceon computer vision and pattern recognition , 2017, pp. 3147–3155.[10] J. Kim, J. Kwon Lee, and K. Mu Lee, “Deeply-recursive convolu-tional network for image super-resolution,” in
Proceedings of theIEEE conference on computer vision and pattern recognition , 2016, pp.1637–1645.[11] A. Shocher, N. Cohen, and M. Irani, ““zero-shot” super-resolutionusing deep internal learning,” in
Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition , 2018, pp. 3118–3126.[12] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanceddeep residual networks for single image super-resolution,” in
Proceedings of the IEEE conference on computer vision and patternrecognition workshops , 2017, pp. 136–144.[13] T. Tong, G. Li, X. Liu, and Q. Gao, “Image super-resolution usingdense skip connections,” in
Proceedings of the IEEE InternationalConference on Computer Vision , 2017, pp. 4799–4807.[14] W. Han, S. Chang, D. Liu, M. Yu, M. Witbrock, and T. S. Huang,“Image super-resolution via dual-state recurrent networks,” in
Proceedings of the IEEE conference on computer vision and patternrecognition , 2018, pp. 1654–1663.[15] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep laplacianpyramid networks for fast and accurate super-resolution,” in
Proceedings of the IEEE conference on computer vision and patternrecognition , 2017, pp. 624–632.[16] Y. Wang, F. Perazzi, B. McWilliams, A. Sorkine-Hornung,O. Sorkine-Hornung, and C. Schroers, “A fully progressive ap-proach to single-image super-resolution,” in
Proceedings of the IEEEConference on Computer Vision and Pattern Recognition Workshops ,2018, pp. 864–873.[17] C. Ma, C.-Y. Yang, X. Yang, and M.-H. Yang, “Learning a no-reference quality metric for single-image super-resolution,”
Com-puter Vision and Image Understanding , vol. 158, pp. 1–16, 2017.[18] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in
European conferenceon computer vision . Springer, 2016, pp. 391–407.[19] W. Shi, J. Caballero, F. Husz´ar, J. Totz, A. P. Aitken, R. Bishop,D. Rueckert, and Z. Wang, “Real-time single image and videosuper-resolution using an efficient sub-pixel convolutional neuralnetwork,” in
Proceedings of the IEEE conference on computer visionand pattern recognition , 2016, pp. 1874–1883.[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning forimage recognition,” in
Proceedings of the IEEE conference on computervision and pattern recognition , 2016, pp. 770–778.[21] X. Hu, M. A. Naiel, A. Wong, M. Lamm, and P. Fieguth, “Runet: Arobust unet architecture for image super-resolution,” in
Proceedingsof the IEEE Conference on Computer Vision and Pattern RecognitionWorkshops , 2019, pp. 0–0.[22] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutionalnetworks for biomedical image segmentation,” in
InternationalConference on Medical image computing and computer-assisted inter-vention . Springer, 2015, pp. 234–241.[23] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,S. Ozair, A. Courville, and Y. Bengio, “Generative adversarialnets,” in
Advances in neural information processing systems , 2014, pp.2672–2680.[24] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm forimage denoising,” in , vol. 2. IEEE,2005, pp. 60–65.[25] J. Mairal, J. Ponce, G. Sapiro, A. Zisserman, and F. Bach, “Super-vised dictionary learning,”
Advances in neural information processingsystems , vol. 21, pp. 1033–1040, 2008.[26] A. Levin and B. Nadler, “Natural image denoising: Optimality andinherent bounds,” in
CVPR 2011 . IEEE, 2011, pp. 2833–2840.[27] S. D. Babacan, R. Molina, M. N. Do, and A. K. Katsaggelos,“Bayesian blind deconvolution with general sparse image priors,”in
European conference on computer vision . Springer, 2012, pp. 341–355. [28] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could:Regularization by denoising (red),” SIAM Journal on Imaging Sci-ences , vol. 10, no. 4, pp. 1804–1844, 2017.[29] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denois-ing by sparse 3-d transform-domain collaborative filtering,”
IEEETransactions on image processing , vol. 16, no. 8, pp. 2080–2095, 2007.[30] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear normminimization with application to image denoising,” in
Proceedingsof the IEEE conference on computer vision and pattern recognition , 2014,pp. 2862–2869.[31] J. Sun and M. F. Tappen, “Learning non-local range markovrandom field for image restoration,” in
CVPR 2011 . IEEE, 2011,pp. 2745–2752.[32] Y. Chen, W. Yu, and T. Pock, “On learning optimized reactiondiffusion processes for effective image restoration,” in
Proceedingsof the IEEE conference on computer vision and pattern recognition , 2015,pp. 5261–5269.[33] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyonda gaussian denoiser: Residual learning of deep cnn for imagedenoising,”
IEEE Transactions on Image Processing , vol. 26, no. 7,pp. 3142–3155, 2017.[34] J. Jiang, L. Zheng, F. Luo, and Z. Zhang, “Rednet: Residualencoder-decoder network for indoor rgb-d semantic segmenta-tion,” arXiv preprint arXiv:1806.01054 , 2018.[35] P. Liu, H. Zhang, K. Zhang, L. Lin, and W. Zuo, “Multi-levelwavelet-cnn for image restoration,” in
Proceedings of the IEEEconference on computer vision and pattern recognition workshops , 2018,pp. 773–782.[36] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, “Towardconvolutional blind denoising of real photographs,” in
Proceedingsof the IEEE Conference on Computer Vision and Pattern Recognition ,2019, pp. 1712–1722.[37] Y. Sun, J. Liu, and U. S. Kamilov, “Block coordinate regularizationby denoising,”
IEEE Transactions on Computational Imaging , vol. 6,pp. 908–921, 2020.[38] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” arXivpreprint arXiv:1701.07875 , 2017.[39] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C.Courville, “Improved training of wasserstein gans,” in
Advancesin neural information processing systems , 2017, pp. 5767–5777.[40] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in
European conference oncomputer vision . Springer, 2016, pp. 694–711.[41] A. Radford, L. Metz, and S. Chintala, “Unsupervised represen-tation learning with deep convolutional generative adversarialnetworks,” arXiv preprint arXiv:1511.06434 , 2015.[42] K. Simonyan and A. Zisserman, “Very deep convolutionalnetworks for large-scale image recognition,” arXiv preprintarXiv:1409.1556 , 2014.[43] S. Ravishankar and Y. Bresler, “Mr image reconstruction fromhighly undersampled k-space data by dictionary learning,”
IEEEtransactions on medical imaging , vol. 30, no. 5, pp. 1028–1041, 2010.[44] Y. Lee, S. Lee, and H.-J. Kim, “Comparison of spectral ct imagingmethods based a photon-counting detector: Experimental study,”
Nuclear Instruments Methods in Physics Research Section A: Accel-erators, Spectrometers, Detectors Associated Equipment , vol. 815, pp.68–74, 2016.[45] S. S. Vasanawala, M. T. Alley, B. A. Hargreaves, R. A. Barth,J. M. Pauly, and M. Lustig, “Improved pediatric mr imaging withcompressed sensing,”
Radiology , vol. 256, no. 2, pp. 607–616, 2010.[46] R. H. Chan, T. F. Chan, L. Shen, and Z. Shen, “Wavelet algorithmsfor high-resolution image reconstruction,”
SIAM Journal on Scien-tific Computing , vol. 24, no. 4, pp. 1408–1432, 2003.[47] Y. Wang, W. Yin, and Y. Zhang, “A fast algorithm for imagedeblurring with total variation regularization,” 2007.[48] D. Wu, K. Kim, G. El Fakhri, and Q. Li, “Iterative low-dose ctreconstruction with priors trained by artificial neural network,”
IEEE transactions on medical imaging , vol. 36, no. 12, pp. 2479–2486,2017.[49] C. Shen, Y. Gonzalez, L. Chen, S. B. Jiang, and X. Jia, “Intelligentparameter tuning in optimization-based iterative ct reconstructionvia deep reinforcement learning,”
IEEE transactions on medicalimaging , vol. 37, no. 6, pp. 1430–1439, 2018.[50] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variationbased noise removal algorithms,”
Physica D: nonlinear phenomena ,vol. 60, no. 1-4, pp. 259–268, 1992. [51] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery ofsubspace structures by low-rank representation,”
IEEE transactionson pattern analysis and machine intelligence , vol. 35, no. 1, pp. 171–184, 2012.[52] I. Tosic and P. Frossard, “Dictionary learning,”
IEEE Signal Process-ing Magazine , vol. 28, no. 2, pp. 27–38, 2011.[53] Z. Chen, X. Jin, L. Li, and G. Wang, “A limited-angle ct recon-struction method based on anisotropic tv minimization,”