Deep Iterative Residual Convolutional Network for Single Image Super-Resolution
Rao M. Umer, G. L. Foresti, Senior Member, IEEE, C. Micheloni, Member, IEEE
University of Udine, Italy.
Abstract—Deep convolutional neural networks (CNNs) have recently achieved great success in the single image super-resolution (SISR) task due to their powerful feature representation capabilities. The most recent deep learning based SISR methods focus on designing deeper/wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods do not take into account the image observation (physical) model and thus require a large number of network trainable parameters together with a great volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method, with few trainable parameters, improves the results for different scaling factors in comparison with the state-of-the-art methods.
I. INTRODUCTION
The goal of single image super-resolution (SISR) is to recover the high-resolution (HR) image from its low-resolution (LR) counterpart. SISR is a fundamental low-level vision and image processing problem with various practical applications in satellite imaging, medical imaging, astronomy, microscopy, seismology, remote sensing, surveillance, biometrics, image compression, etc. In the last decade, most photos have been taken with built-in smartphone cameras, where the resulting LR image is inevitable and undesirable due to their physical limitations. It is of great interest to restore sharp HR images because some captured moments are difficult to reproduce. On the other hand, we are also interested in designing low-cost (limited memory and CPU power) camera devices, where the deployment of our deep network would be possible in practice. Both are the ultimate goals for end users. Usually, SISR is described as a linear forward observation model by the following image degradation process:

y = H x̃ + η,   (1)

where y ∈ ℝ^(N/s²) is the observed LR image (here N = m × n is the total number of pixels in the HR image), H ∈ ℝ^((N/s²) × N) is a down-sampling operator (usually a bicubic, circulant matrix) that resizes the HR image x̃ ∈ ℝ^N by a scaling factor s, and η is considered as additive white Gaussian noise with standard deviation σ. However, in real-world settings, η also accounts for all possible errors during the image acquisition process, including inherent sensor noise, stochastic noise, compression artifacts, and the possible mismatch between the forward observation model and the camera device. The operator H is usually ill-conditioned or singular due to the presence of the unknown noise (η), which makes SISR a highly ill-posed inverse problem. Since, due to this ill-posedness, there are many possible solutions, regularization is required to select the most plausible ones. Generally, SISR methods can be classified into three main categories, i.e.
interpolation-based methods, model-based optimization methods, and discriminative learning methods. Interpolation-based methods, i.e. nearest-neighbor, bilinear, and bicubic interpolators, are efficient and simple but have very limited reconstruction image quality. Model-based optimization [1] methods have powerful image priors to reconstruct high-quality clean images, but require hundreds of iterations to achieve acceptable performance, thus making these methods computationally expensive. Model-based optimization [2] methods with the integration of deep CNN priors can improve efficiency, but due to hand-crafted parameters, they are not suitable for end-to-end deep learning methods. On the other hand, discriminative learning [3]–[5] methods have attracted significant attention due to their effectiveness and efficiency for SISR by using deep CNNs. Our work is inspired by discriminative and residual learning approaches with powerful image priors and large-scale optimization schemes, applied in an iterative manner in an end-to-end deep CNN to solve the SISR problem. The visualization of our proposed iterative SISR approach is shown in Figure 1, where the LR input (y) is given to the network and then the network reconstructs the SR output. A single optimizer is used for all network stages with shared structures and parameters. Our contributions in this paper are three-fold as follows:
1) We propose an end-to-end deep iterative residual CNN for image super-resolution. In contrast to the existing deep SISR networks, our proposed method strictly follows the image observation (physical) model (refer to Eq. (1)), and thus it is able to achieve better reconstruction results even with few network trainable parameters (refer to Table III).
2) A deep SISR network is proposed to solve image super-resolution in an iterative manner by minimizing the discriminative loss function with a residual learning approach.
3) The proposed ISRResCNet is inspired by powerful image regularization and large-scale optimization techniques that have been successfully used to solve general inverse problems in the past.

Fig. 1: The visualization of our proposed iterative SISR approach as described in Algorithm 1. Given an LR image (y) and an initial estimate (x^(0)), each network stage ERD (Encoder-Resnet-Decoder) produces a new estimate x^(k+1) from the previous-step estimate x^(k). A single optimizer is used for all network stages with shared structures and parameters over K steps.

II. RELATED WORKS
Recently, numerous works have addressed the SISR task based on deep CNNs for their powerful feature representation capabilities. A preliminary CNN-based method to solve SISR is the three-layer super-resolution convolutional network (SRCNN) [6]. Kim et al. [3] proposed a very deep SR (VDSR) network with a residual learning approach. Lim et al. [4] proposed an enhanced deep SR (EDSR) network by taking advantage of residual learning. Zeng et al. [7] proposed an ScSR method (conventional sparse coding) that learns HR/LR dictionaries exploiting iterative sparse coding. Jiang et al. [8] proposed a Recursive Inception (RISR) network that adopts an inception-like structure to extract HR/LR features. Liu et al. [9] proposed an attention-based approach for the SISR problem. Yaoman et al. [10] proposed a feedback network (SRFBN) based on feedback connections and a recurrent neural network like structure. Zhang et al. [11] proposed a deep plug-and-play super-resolution method for arbitrary blur kernels by following multiple degradations. In [12], the authors proposed SRWDNet to solve the joint deblurring and super-resolution task by following a realistic degradation. The above methods are deep or wide CNN networks that learn the non-linear mapping from LR to HR with a large number of training samples, while neglecting the image acquisition process. In contrast, our approach takes into account the physical image observation process, thereby greatly increasing its applicability.

III. PROPOSED METHOD
A. Problem Formulation
By referring to equation (1), the recovery of x from y mostly relies on the variational approach for combining the observation and prior knowledge, and is given by the following objective function:

J(x) = ½ ‖y − Hx‖₂² + λ R(x),   (2)

where ½‖y − Hx‖₂² is the data fidelity (also known as log-likelihood) term that measures the proximity of the solution to the observations, R(x) is the regularization term that is associated with image priors, and λ is the trade-off parameter that governs the compromise between the data fidelity and the regularizer term. Interestingly, the variational approach has a direct link to the Bayesian approach, and the derived solutions can be described either as penalized maximum likelihood or as maximum a posteriori (MAP) estimates [13], [14]. Thanks to the recent advances of deep learning, the regularizer (i.e. R(x)) is realized by deep convolutional neural networks (ConvNets) that have powerful image prior capabilities.

B. Objective Function Minimization Strategy
Besides the proper selection of the regularizer and the formulation of the objective function, another important aspect of the variational approach is the minimization strategy used to obtain the required solution. In the literature, there are several modern convex-optimization schemes for large-scale machine learning problems, such as Split-Bregman [15], the HQS method [16], ADMM [17], primal-dual algorithms [18], etc. In our work, we solve the problem under study (2) using the Majorization-Minimization (MM) framework [19], because J(x) is too complicated to manipulate (i.e. a convex but possibly non-differentiable function). In the MM [19]–[21] approach, an iterative algorithm for solving the minimization problem

x̂ = arg min_x J(x),   (3)

takes the form

x^(k+1) = arg min_x Q(x; x^(k)),   (4)

where Q(x; x^(k)) is a majorizer of the function J(x) at a fixed point x^(k), satisfying the following two conditions:

Q(x; x^(k)) > J(x), ∀ x ≠ x^(k)   and   Q(x^(k); x^(k)) = J(x^(k)).   (5)

Here, we upper-bound J(x) by a suitable majorizer Q(x; x^(k)), and instead of minimizing the actual objective function (3), due to its complexity, we minimize the majorizer Q(·) to produce the next estimate x^(k+1). By satisfying the properties of the majorizer given in (5), iteratively minimizing Q(·; x^(k)) also decreases the actual objective function J(·) [19]. Thus, we can write a quadratic majorizer for the complete objective function (2) in the following form:

Q(x; x^(k)) = ½ ‖y − Hx‖₂² + λ R(x) + Q_R(x; x^(k)),   (6)

where

Q_R(x; x^(k)) = ½ (x − x^(k))ᵀ [αI − HᵀH] (x − x^(k)),   (7)

and Q_R(·) is a distance function between x and x^(k). In order to obtain a valid majorizer Q_R(·), we need to satisfy the two conditions in (5), i.e. Q_R(x; x^(k)) > 0, ∀ x ≠ x^(k), and Q_R(x^(k); x^(k)) = 0. This requires that αI − HᵀH be a positive definite matrix, which only holds if α > ‖HᵀH‖, the largest eigenvalue of HᵀH. In most image restoration cases [20], such as inpainting, deblurring, demosaicking [22], and super-resolution, this quantity is approximately equal to one (α ≈ 1). Based on the above discussion, we can write the overall majorizer as:

Q(x; x^(k)) = (α/2) ‖x − z^(k)‖₂² + λ R(x) + const.,   (8)

where z^(k) = x^(k) + (1/α) Hᵀ(y − Hx^(k)), and the constant does not depend on x and thus is irrelevant to the optimization task. Finally, we proceed with the MM optimization scheme to iteratively minimize the quadratic majorizer function Q(·) via the following formulation:

x̂^(k) = arg min_x Q(x; x^(k))
      = arg min_x ½ ‖y − Hx‖₂² + λ R(x) + Q_R(x; x^(k))
      = arg min_x (α/2) ‖x − z^(k)‖₂² + λ R(x)
      = Prox_{(λ/α) R(·)}(z^(k)),   (9)

where Prox(·) is the proximal operator [23], defined as:

Prox_{(λ/α) R(·)}(z) = arg min_x (1/(2σ²)) ‖x − z‖₂² + (λ/α) R(x).   (10)

It can be noted that Eq. (10) is the objective function of a denoising problem, where z is the noisy observation with noise level σ. In this way, we heavily rely on a deep denoising neural network to obtain the required estimate x̂^(k) by unrolling the MM scheme as K finite steps. Another thing to notice in Eq. (9) is that we decouple the degradation operator H from x, so that we now need to tackle only a less complex denoising problem.

Fig. 2: The architecture of the ERD (Encoder-Resnet-Decoder) blocks used in the proposed ISRResCNet.

Algorithm 1: The proposed SISR iterative approach. The ERD structure and parameters are shared across all iterative steps.
  Input: y: LR input, H: down-sampling operator, Hᵀ: up-sampling operator, K: iterative steps, w ∈ ℝ^K: extrapolation weights, σ: estimated noise, λ, α: projection parameters
  Initialization: x^(0) = Hᵀ y, with Hᵀ a bilinear kernel; z^(1) = x^(0) + Hᵀ(y − Hx^(0))
  for k ← 1 to K do
    Extrapolation step: z^(k+1) = x^(k) + w^(k) (x^(k) − x^(k−1))
    Proximal step (ERD block): x̂^(k) = Prox_{(λ/α) R(·)}(z^(k) + Hᵀ(y − Hz^(k)))
  end
  Output: x^(K): SR output

However, obtaining the resulting solution x̂^(k) from (9) can be computationally expensive, since it demands K times the parameters of the employed denoiser and can exhibit slow convergence [24], [25]. To avoid these hurdles, we adopt a strategy similar to [22], where the trainable extrapolation weights w^(k) are learnt directly from the training data instead of being fixed [26]. Moreover, the convergence of our proposed method is sped up by adopting the continuation strategy [27]. Our overall proposed method is shown in Fig. 1 and also described in Algorithm 1, where the input settings, initialization, extrapolation steps, and proximal steps are defined. Our proposed Algorithm 1 has a close connection to other proximal algorithms such as ISTA [28] and FISTA [29], which require the exact form of the employed regularizer, such as total variation / Hessian Schatten-norm [21]. However, in our case, the regularizer is learned implicitly from the training data (i.e. it has a non-convex form), and therefore our algorithm acts as an inexact form of proximal gradient descent.

C. Network Architecture
The proposed network architecture for super-resolution is shown in Fig. 1. Given an LR image (y) and an initial estimate (x^(0)), each network stage ERD (Encoder-Resnet-Decoder) produces a new estimate x^(k+1) from the previous-step estimate x^(k). Algorithm 1 describes the inputs, initial conditions, and desired updates for each network stage. The ERD structure and parameters are shared across all iterative steps. Finally, a single optimizer is used to minimize the ℓ₁-loss between the estimated latent SR image (x^(k)) and the ground-truth (GT) image (x^(gt)) after k steps:

arg min_Θ L(Θ) = (1/(2N)) Σ_{n=1}^{N} ‖x_n^(k) − x_n^(gt)‖₁,   (11)

where N is the mini-batch size and Θ are the trainable parameters of our network. Fig. 2 shows the ERD block used in the network. In the ERD network, both Encoder (Conv) and Decoder (TConv) layers operate on C × H × W feature tensors, where C is the number of channels of the input image. The Resnet consists of residual blocks, each with two pre-activation Conv layers, where the pre-activation is the parametric rectified linear unit (PReLU) [30]. The Resnet also contains a feedback (FB) path after the residual blocks, with an initial concatenation pre-activation Conv layer that maps 128 feature channels to 64 before feeding them back into the residual blocks. The trainable projection layer [31] inside the Decoder computes the proximal map of Eq. (10) with the given noise standard deviation σ and handles the data fidelity and prior terms. The noise realization is estimated in the intermediate Resnet that is sandwiched between the Encoder and the Decoder. The estimated residual image after the Decoder is subtracted from the LR input image. Finally, the clipping layer incorporates our prior knowledge about the valid range of image intensities and enforces the pixel values of the reconstructed image to lie in the range [0, 1]. Reflection padding is also used before all Conv layers to ensure slowly-varying changes at the boundaries of the input images. Our ERD structure can also be described as a generalization of one-stage TNRD [32] and UDNet [31], which have good reconstruction performance for the image denoising problem.
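To make the iterative scheme concrete, the following minimal numpy sketch mimics the structure of Algorithm 1. It is an illustration only: the down-sampling operator H, its transpose Hᵀ, and the ERD proximal step are stand-ins (average pooling, nearest-neighbor upsampling, and a box-filter denoiser), not the trained network or bicubic/bilinear kernels described above.

```python
import numpy as np

def downsample(x, s):
    # Stand-in for H: s x s average pooling of a 2-D image
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(x, s):
    # Stand-in for H^T: nearest-neighbor upsampling (the paper uses a bilinear kernel)
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)

def erd_denoiser(z):
    # Placeholder for the trained ERD block acting as Prox_{(lambda/alpha) R(.)};
    # here a simple 3x3 box filter plays the role of the learned denoiser.
    h, w = z.shape
    p = np.pad(z, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def isr_rescnet_iterations(y, s, K=10, w=None):
    # Sketch of Algorithm 1: extrapolation + proximal (ERD) steps for K iterations
    if w is None:
        # FISTA-like extrapolation weights w_t = (t - 1) / (t + 2)
        w = [(t - 1.0) / (t + 2.0) for t in range(1, K + 1)]
    x_prev = upsample(y, s)                                  # x^(0) = H^T y
    # z^(1) = x^(0) + H^T (y - H x^(0)), then first proximal step
    x = erd_denoiser(x_prev + upsample(y - downsample(x_prev, s), s))
    for k in range(1, K):
        z = x + w[k] * (x - x_prev)                          # extrapolation step
        x_prev, x = x, erd_denoiser(z + upsample(y - downsample(z, s), s))
    return np.clip(x, 0.0, 1.0)                              # clipping layer: valid intensities
```

The data-consistency term z + Hᵀ(y − Hz) inside the proximal step is what ties each iteration back to the observation model (1); swapping the box filter for a trained network recovers the spirit of the ERD stage.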
D. Network Training via TBPTT
Due to the iterative nature of our SISR approach, the network parameters are updated using the back-propagation through time (BPTT) algorithm by unrolling K steps to train the network, as previously used in training recurrent neural networks such as LSTMs. However, this becomes computationally expensive as the number of iterative steps K increases, so both K and the mini-batch size N are upper-bounded by the GPU memory. Therefore, to tackle this problem, we use the truncated back-propagation through time (TBPTT) algorithm, as done in [22], to train our network: the sequence is unrolled into a small number of k steps out of the total K, and back-propagation is then performed on these k steps. Furthermore, we compute the ℓ₁-loss with respect to the GT images after k iterative steps according to Eq. (11).

IV. EXPERIMENTS
A. Data augmentation
We use the DIV2K [33] dataset, which contains 800 HR images, for training. We take the input LR image patches with a bicubic downsample (i.e. regarded as a standard degradation) together with their corresponding HR image patches.

TABLE I: The settings of input LR and corresponding HR patch sizes during training.

Scale factor | LR patch size | HR patch size
×2           | 60 × 60       | 120 × 120
×3           | 50 × 50       | 150 × 150
×4           | 40 × 40       | 160 × 160

We augment the training data with random vertical and horizontal flipping, and 90° rotations. Moreover, we also consider another effective data augmentation technique, called MixUp [34]. In
MixUp, we randomly take two samples (x̃_i, y_i) and (x̃_j, y_j) from the training HR/LR set (X̃, Y) and then form a new sample (x̃, y) by interpolating the pair, following the same degradation model (1) as done in [35]. This simple technique encourages our network to support linear behavior among training samples.

B. Technical details
We use RGB input LR and corresponding HR patches with different patch sizes according to the upscaling factor, as listed in Table I. We train the network for 300 epochs with a batch size of 4 using the Adam optimizer (with β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸) without weight decay to minimize the ℓ₁-loss (11). We use the method of Kaiming He [30] to initialize the Conv weights, with the biases set to zero. The learning rate is initially set to 10⁻⁴ for the first 100 epochs and is then multiplied by 0.5 every 50 epochs. We set the number of iterative steps (K) to 10 and the number of feedback steps (FB) to 4 for our method. The extrapolation weights w ∈ ℝ^K are initialized as w_t = (t − 1)/(t + 2), ∀ 1 ≤ t ≤ K, and are then further fine-tuned on the training data as done in [22]. The projection layer parameter σ is estimated from the input LR image according to [36]. We initialize the projection layer parameter α on a log scale from α_max = 2 to α_min = 1 and then further fine-tune it during training via back-propagation. In order to further enhance the performance of our network, we use a self-ensemble strategy [37] (denoted as ISRResCNet+), where the LR inputs are flipped/rotated and the SR results are aligned and averaged for an enhanced prediction.

C. Evaluation metrics and SR benchmarks
We evaluate the trained model under the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) metrics on four benchmark datasets: Set5 [38], Set14 [39], B100 [40], and Urban100 [41]. In order to keep a fair comparison with existing networks, the quantitative SR results are evaluated only on the Y (luminance) channel of the transformed YCbCr color space.
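As a reference for this evaluation protocol, here is a small numpy sketch (an illustrative implementation, not the authors' evaluation code) that computes PSNR on the Y channel, using the ITU-R BT.601 RGB-to-YCbCr conversion commonly used in SR benchmarks:

```python
import numpy as np

def rgb_to_y(img):
    # Y (luminance) channel of the ITU-R BT.601 YCbCr transform; img in [0, 1]
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 / 255.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr(ref, est, peak=1.0):
    # Peak Signal-to-Noise Ratio in dB between a reference and an estimate
    mse = np.mean((ref - est) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Usage: `psnr(rgb_to_y(hr), rgb_to_y(sr))` evaluates the luminance-only PSNR; SSIM would be computed on the same Y channel.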
D. Ablation study of iterative (K) and feedback (FB) steps
For our ablation study, we evaluate the performance of our proposed ISRResCNet and ISRResCNet+ on the Set5 benchmark dataset at the ×4 upscaling factor.

TABLE II: Average PSNR/SSIM values for scale factors ×2, ×3, and ×4 with the bicubic degradation model, comparing Bicubic, SRCNN [6] (ECCV-2014), VDSR [3] (CVPR-2016), EDSR-baseline [4] (CVPR-2017), RISR [8] (ICPR-2018), SRFBN-S [10] (CVPR-2019), and our ISRResCNet and ISRResCNet+. The best performance is shown in red and the second best in blue.

Fig. 3: Average PSNR/SSIM performance (Set5 at ×4) of the proposed ISRResCNet and ISRResCNet+ after each iterative step (K): (a) PSNR vs. K, (b) SSIM vs. K.

TABLE III: The impact of iterative (K) and feedback (FB) steps on ISRResCNet at the scale factor ×4. The average PSNR/SSIM values are evaluated on the Set5 test set.

FB steps | Iterative steps (K) | Params (K) | ResBlocks (D) | Feature maps (F) | ISRResCNet | ISRResCNet+
None     | 10                  | 380        | 5             | 64               | 31.44 / 0.8855 | 31.59 / 0.8876
None     | 20                  | 380        | 5             | 64               | 31.56 / 0.8874 | 31.69 / 0.8891
4        | 10                  | 388        | 5             | 64               | 31.63 / 0.8890 | 31.77 / 0.8908

Table III shows the average PSNR/SSIM performance for different iterative (K) and feedback (FB) steps. Without feedback connections, our trained model achieves better performance (PSNR/SSIM) by increasing the number of iterative steps to 20 with the shared network parameters (see Fig. 3 and Table III). When the FB connections are introduced into our network, the model converges in fewer iterative steps (i.e. 10) with better reconstruction results, at the cost of only a few additional parameters (i.e. +8K), because these error feedback connections [10] after the residual blocks provide a strong early reconstruction ability. Since these error feedbacks are beneficial at the higher scale (×4), we report the quantitative results in Table II with 4 feedback steps at the ×4 upscaling factor, while the other scales (×2, ×3) are evaluated without feedback steps. It can also be noted (see Fig. 3) that a few iterative steps are enough to obtain excellent SR results, giving a performance trade-off between the quantitative results and the computation time of our method.

Fig. 4: Visual comparison of our method with other state-of-the-art methods on ×4 super-resolution.

E. Comparison with the state-of-the-art methods
We compare our method with other state-of-the-art SISR methods, including SRCNN [6], VDSR [3], EDSR [4], RISR [8], and SRFBN [10], whose source codes are available online, except for the RISR method, for which the quantitative results are taken directly from the paper. We run all the source codes with the default test settings throughout the experiments. We report the quantitative results of our method and the others in Table II. Our method exhibits better improvement in PSNR and SSIM compared to the other methods, except for EDSR. However, EDSR is a much deeper network containing 16 residual blocks with far more parameters, while our model contains 5 residual blocks with 388K parameters, making it a much lighter model than EDSR with only a slight PSNR difference on Set5 at the ×4 upscaling factor. Moreover, the parameters of the proposed network are much fewer than those of the other state-of-the-art SISR networks, which makes it suitable for deployment in mobile devices where memory storage and CPU power are limited, while retaining good image reconstruction quality (see Section IV-D).

Regarding the visual quality, Fig. 4 shows the visual comparison of our method with other SR methods at a high (×4) upscaling factor. The proposed method successfully reconstructs good texture regions, sharp edges, and finer details of the SR image compared to the other methods.

V. CONCLUSION
We proposed a deep iterative residual CNN for the single image super-resolution task, following the image observation (physical / real-world) model. The proposed method solves the SISR problem in an iterative manner by minimizing the discriminative loss function with residual learning. Our model requires few trainable parameters in comparison with other competing methods. The proposed network exploits powerful image regularization and large-scale optimization techniques for image restoration. Our method achieves excellent SR results in terms of PSNR/SSIM and visual quality under real-world settings, with the limited memory storage and CPU power requirements of mobile/embedded deployment.

REFERENCES
[1] W. Dong, L. Zhang, G. Shi, and X. Li, "Nonlocally centralized sparse representation for image restoration," IEEE Transactions on Image Processing, vol. 22, pp. 1620–1630, 2013.
[2] K. Zhang, W. Zuo, S. Gu, and L. Zhang, "Learning deep CNN denoiser prior for image restoration," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2808–2817, 2017.
[3] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," CVPR, pp. 1646–1654, 2016.
[4] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," CVPRW, pp. 1132–1140, 2017.
[5] R. Muhammad Umer, G. Luca Foresti, and C. Micheloni, "Deep generative adversarial residual convolutional networks for real-world super-resolution," in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020.
[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," ECCV, pp. 184–199, 2014.
[7] K. Zeng, H. Zheng, Y. Qu, X. Qu, L. Bao, and Z. Chen, "Single image super-resolution with learning iteratively non-linear mapping between low- and high-resolution sparse representations," IEEE International Conference on Pattern Recognition (ICPR), pp. 507–512, 2018.
[8] T. Jiang, X. Wu, Z. Yu, W. Shui, G. Lu, S. Guo, H. Fei, and Q. Zhang, "Recursive inception network for super-resolution," IEEE International Conference on Pattern Recognition (ICPR), pp. 2759–2764, 2018.
[9] Y. Liu, Y. Wang, N. Li, X. Cheng, Y. Zhang, Y. Huang, and G. Lu, "An attention-based approach for single image super resolution," pp. 2777–2784, 2018.
[10] Y. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, and W. Wu, "Feedback network for image super-resolution," CVPR, 2019.
[11] K. Zhang, W. Zuo, and L. Zhang, "Deep plug-and-play super-resolution for arbitrary blur kernels," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 1671–1681.
[12] R. M. Umer, G. L. Foresti, and C. Micheloni, "Deep super-resolution network for single image super-resolution with realistic degradations," in ICDSC, September 2019, pp. 21:1–21:7.
[13] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging. CRC Press, 1998.
[14] M. Figueiredo, J. M. Bioucas-Dias, and R. D. Nowak, "Majorization–minimization algorithms for wavelet-based image restoration," IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2980–2991, 2007.
[15] T. Goldstein and S. Osher, "The split Bregman method for L1-regularized problems," SIAM Journal on Imaging Sciences, pp. 323–343, 2009.
[16] D. Geman and C. Yang, "Nonlinear image recovery with half-quadratic regularization," IEEE Transactions on Image Processing, pp. 932–946, July 1995.
[17] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, pp. 1–122, 2011.
[18] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, pp. 120–145, 2011.
[19] D. R. Hunter and K. Lange, "A tutorial on MM algorithms," The American Statistician, pp. 30–37, 2004.
[20] M. A. Figueiredo, J. M. Bioucas-Dias, and R. D. Nowak, "Majorization–minimization algorithms for wavelet-based image restoration," IEEE Transactions on Image Processing, pp. 2980–2991, 2007.
[21] S. Lefkimmiatis, A. Bourquard, and M. Unser, "Hessian-based norm regularization for image restoration with biomedical applications," IEEE Transactions on Image Processing, pp. 983–995, 2011.
[22] F. Kokkinos and S. Lefkimmiatis, "Iterative joint image demosaicking and denoising using a residual denoising network," IEEE Transactions on Image Processing, pp. 4177–4188, 2019.
[23] N. Parikh, S. Boyd et al., "Proximal algorithms," Foundations and Trends in Optimization, pp. 127–239, 2014.
[24] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, pp. 183–202, 2009.
[25] F. Kokkinos and S. Lefkimmiatis, "Deep image demosaicking using a cascade of convolutional residual denoising networks," IEEE European Conference on Computer Vision (ECCV), pp. 303–319, 2018.
[26] H. Li and Z. Lin, "Accelerated proximal gradient methods for nonconvex programming," Advances in Neural Information Processing Systems (NIPS), pp. 379–387, 2015.
[27] Q. Lin and L. Xiao, "An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization," International Conference on Machine Learning (ICML), pp. 73–81, 2014.
[28] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, pp. 1413–1457, 2004.
[29] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, pp. 183–202, 2009.
[30] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034, 2015.
[31] S. Lefkimmiatis, "Universal denoising networks: A novel CNN architecture for image denoising," CVPR, pp. 3204–3213, 2018.
[32] Y. Chen and T. Pock, "Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1256–1272, 2017.
[33] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: Dataset and study," in CVPRW, 2017, pp. 126–135.
[34] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, "Mixup: Beyond empirical risk minimization," International Conference on Learning Representations (ICLR), 2018.
[35] R. Feng, J. Gu, Y. Qiao, and C. Dong, "Suppressing model overfitting for image super-resolution networks," IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
[36] X. Liu, M. Tanaka, and M. Okutomi, "Single-image noise level estimation for blind denoising," IEEE Transactions on Image Processing, pp. 5226–5237, 2013.
[37] R. Timofte, R. Rothe, and L. Van Gool, "Seven ways to improve example-based single image super resolution," in CVPR, 2016, pp. 1865–1873.
[38] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, "Low-complexity single-image super-resolution based on nonnegative neighbor embedding," BMVC, 2012.
[39] R. Zeyde, M. Elad, and M. Protter, "On single image scale-up using sparse-representations," in International Conference on Curves and Surfaces, 2010, pp. 711–730.
[40] D. Martin, C. Fowlkes, D. Tal, J. Malik et al., "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in ICCV, 2001.
[41] J.-B. Huang, A. Singh, and N. Ahuja, "Single image super-resolution from transformed self-exemplars," in CVPR, 2015.