Converged Deep Framework Assembling Principled Modules for CS-MRI
Risheng Liu, Member, IEEE, Yuxi Zhang, Shichao Cheng, Zhongxuan Luo, and Xin Fan, Senior Member, IEEE
Abstract—Compressed Sensing Magnetic Resonance Imaging (CS-MRI) significantly accelerates MR data acquisition at a sampling rate much lower than the Nyquist criterion. A major challenge for CS-MRI lies in solving the severely ill-posed inverse problem of reconstructing aliasing-free MR images from sparse k-space data. Conventional methods typically optimize an energy function and produce reconstructions of high quality, but their iterative numerical solvers are unavoidably and extremely slow. Recent data-driven techniques provide fast restoration, either by learning a direct mapping to the final reconstruction or by plugging learned modules into an energy optimizer. Nevertheless, these data-driven predictors cannot guarantee that the reconstruction obeys the constraints underlying the regularizers of conventional methods, so the reliability of their reconstruction results is questionable. In this paper, we propose a converged deep framework assembling principled modules for CS-MRI that fuses a learning strategy with the iterative solver of a conventional reconstruction energy. The framework embeds an optimality-condition checking mechanism, fostering efficient and reliable reconstruction. We also apply the framework to two practical tasks, i.e., parallel imaging and reconstruction with Rician noise. Extensive experiments on both benchmark and manufacturer-testing images demonstrate that the proposed method reliably converges to the optimal solution more efficiently and accurately than the state of the art in various scenarios.
Index Terms—Compressed sensing, MRI reconstruction, Deep learning, Optimization, Parallel imaging, Rician noise
I. INTRODUCTION

MAGNETIC Resonance Imaging (MRI) is a widely applied non-invasive technology that reveals both functional and anatomical information. Unfortunately, traditional MRI suffers from inherently slow acquisition of a large volume of data in the k-space, i.e., the Fourier space [1]. Researchers have introduced the compressed sensing (CS) theory into MRI reconstruction, allowing fast acquisition at a sampling rate much lower than the Nyquist rate without degrading image quality. The main challenge for CS-MRI is the ill-posedness of the inherent inverse problem: estimating images of high quality from under-sampled data points.
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61922019, 61672125, 61572096 and 61632019, the Youth Tiptop Talent project in Liaoning Province (XLYC1807088), and the Fundamental Research Funds for the Central Universities. (Corresponding author: Xin Fan.)
R. Liu, Y. Zhang, Z. Luo and X. Fan are with the DUT-RU International School of Information Science and Technology, Dalian University of Technology, Dalian, 116620, China, and also with the Liaoning Key Laboratory of Ubiquitous Network and Service Software (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). S. Cheng is with the School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China (e-mail: [email protected]). Z. Luo is also with the Institute of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, China.

Moreover, the existence of acquisition noise and the computation of discrete Fourier transforms deteriorate the reconstruction stability.

Conventional approaches to CS-MRI devote significant efforts to incorporating prior knowledge on images as regularizers, in order to restore details as well as suppress artifacts. Sparsity constraints in a specific transform domain are the most commonly used [2]–[5]. Dictionary-learning-based methods consider the sparsity in the subspace spanned by reference images [6]–[8]. Non-local paradigms take advantage of the sparse representation by grouping similar local image patches [9], [10]. These methods derived from certain prior models are able to find a stable and satisfactory reconstruction by optimizing an energy functional with various forms of regularizers. Nevertheless, this optimization typically demands iterative calculation of gradient information in a high-dimensional image space, suffering from a rather slow process.

Recent deep-learning-based approaches directly apply pre-trained deep architectures to infer MR images from degraded sampled inputs [11]–[13], rendering fast reconstruction. Nevertheless, these methods built upon direct deep mappings are not as stable as conventional ones under data variations, because they abandon the principled prior knowledge and depend entirely on training examples. By taking advantage of both domain knowledge and available data, hybrid paradigms attempt to bridge the gap between data learning and numerical optimization of the energy functional with regularizers. For instance, Sun et al. [14] and Diamond et al. [15] unroll the iterative optimization as the prediction process of deep networks. Unfortunately, this type of ad hoc combination breaks the convergence of the optimization iterations, so it cannot guarantee the optimal solution to the original objective energy. Figure 1 illustrates a reconstruction example from the latest deep-learning-based method [16], where the circled anatomic structures mistakenly appear as artifacts while part of the tissues is smeared, despite an increase in PSNR. Therefore, neither medical research nor clinical diagnosis can trust reconstruction without theoretical guarantees.
Fig. 1. A reconstruction example from naive zero-filling, the recent RefineGAN, and ours (left to right: Ground Truth, Zero-Filling, RefineGAN, Ours). Undesired artifacts and smoothed details (red circles) emerge in the result of RefineGAN.

Existing methods fail to provide an efficient yet reliable treatment for CS-MRI. In this study, we design a converged deep framework comprising modules assembled in principle for trustworthy and fast MR reconstruction. Specifically, we provide an efficient iterative process by integrating task-driven domain knowledge with data-driven deep networks. Meanwhile, we introduce an automatic feedback mechanism that leads the deep iterations to converge to the optimal solution of the original energy. Further, we consider two extensions in practical scenarios, i.e., parallel imaging and Rician noise pollution, to investigate the robustness of our paradigm for real-world applications. Our framework flexibly adapts to the energies targeting these scenarios and robustly finds their solutions. Our contributions can be listed as follows:

• We propose a generic deep framework assembling modules in principle to efficiently and reliably solve the CS-MRI model, which is typically non-convex and non-smooth.
• We establish an automatic feedback mechanism, by virtue of the first-order optimality condition, to reject improper predictions from the embedded deep architectures and to guide the deep iterations towards the desired solution.
• We custom-tailor two extensions of our framework to handle the cases of parallel imaging and of Rician noise contamination in practical scenarios.

Extensive experimental results on benchmarks and raw data collected in the k-space of an MRI scanner demonstrate the superiority of the proposed paradigm against state-of-the-art techniques in both reconstruction accuracy and efficiency, as well as its robustness to noise contamination.

II. RELATED WORK
This section reviews approaches to CS-MRI reconstruction based on prior models and deep learning, followed by algorithms for two practical reconstruction scenarios, i.e., parallel imaging and Rician noise removal.
A. Model-based CS-MRI
Conventional CS-MRI solves an energy optimization with regularizations derived from image priors in a specific transform domain or subspace. This regularization term plays a vital role in the reconstruction quality. Researchers have devoted tremendous efforts to exploring the sparsity prior in various transform domains, including the gradient [2], discrete cosine transform [4], partial Fourier transform (RecPF) [17], and wavelet transform domains [3]. These techniques, developed upon sparse representation with a fixed set of basis functions, can hardly provide satisfactory quality. Alternatively, researchers resort to the data-driven sparse prior in the subspace spanned by an over-complete dictionary of reference images, which performs better in detail recovery and artifact removal [6]–[10]. These dictionary-based CS-MRI methods are able to generate reliable or plausible reconstructions following the underlying physical prior or constraint, but typically suffer from an extremely slow process, as the energy optimization turns out to be a computationally expensive iterative procedure.

Additionally, most regularization terms derive the sparsity prior in the form of the ℓ_p (0 ≤ p ≤ 1) norm. The norm has nice mathematical properties, but may not be able to characterize complex data distributions in real applications. Also, simply concatenating reference images into an over-complete dictionary as the data-dependent prior is limited in handling the flexible structures of real imaging problems.

B. Deep learning based CS-MRI
Recent studies introduce deep learning techniques into MRI reconstruction in order to utilize the information carried by the wealth of available data examples. Wang et al. design and train a convolutional neural network (CNN) to learn the direct mapping from the input under-sampled k-space data to the desired fully-sampled image [11]. A modified U-Net architecture is used to learn the aliasing artifacts in [12], and a deep cascade of convolutional networks is applied to accelerate MR imaging in [13]. Lately, generative adversarial networks (GANs) have proved able to remove aliasing artifacts by generating details, and gain state-of-the-art quality on specific benchmarks [16], [18], [19]. These methods provide an extremely fast feed-forward inference for reconstruction, but depend highly on training examples without any principled knowledge, failing to give the desired solution when the input deviates from the mode of the training data distribution.

There exist hybrid techniques that integrate deep architectures into an iterative optimization process for the reconstruction energy with prior knowledge. Sun et al. present ADMM-Net for CS-MRI by introducing learnable architectures into the Alternating Direction Method of Multipliers (ADMM) iterations [14]. A similar unrolling scheme is developed in [15]. These methods built upon deep priors can improve the reconstruction accuracy and reliability at a relatively faster speed. Unfortunately, unrolling schemes that directly replace iterations by deep architectures cannot guarantee the convergence of the iterative optimization process. Moreover, no mechanism is available to control the errors generated by the nested structures. Therefore, we can still hardly rely on these reconstructed results in medical analysis and diagnosis.

C. Parallel imaging algorithms
Parallel imaging (PI) techniques, similar to CS-MRI, are also commonly used in clinical applications to accelerate MRI acquisition by using the different sensitivities received from multiple coils installed at different locations. For the sake of saving time, PI conducts spatial encoding in the form of receiver coils with various sensitivity profiles instead of gradient encoding. SENSE and its variants directly restore the images from under-sampled data in the image domain [20], [21]. Another line of studies first interpolates the missing data in the frequency domain and then transfers the results to the image domain, e.g., GRAPPA [22], SPIRiT [23], and their derived techniques. Recent approaches combine a CS energy model with PI in order to further accelerate the acquisition, such as CS-SENSE [24], CS-GRAPPA [25], ℓ1-SPIRiT [26], and SAKE [27]. It is worth investigating how these energy-based PI approaches may benefit from deep learning.
Fig. 2. The fidelity F, data-driven N, optimal condition C, and prior P modules form one iteration of our reconstruction framework. The top row shows the module cascade within one iteration (Conv, ReLU, BN, and Prox blocks); the bottom row shows the iterations from the zero-filled input to the output.

D. Rician noise removal
Independent zero-mean Gaussian noise with equal variances may simultaneously corrupt both the real and imaginary parts of the complex data in the k-space during practical MR acquisition [28]. Typical MR imaging techniques convert the original complex Gaussian noise into Rician noise when computing the magnitude of the complex k-space data [29]. The non-additive Rician noise challenges traditional noise removal techniques. Researchers have designed special treatments targeting Rician noise removal, e.g., nonlocal-means algorithms [30], [31], wavelet packets [32], wavelet-domain filtering [33], and total variation (TV) denoisers [34], [35]. Deep-learning-based denoisers have also been developed to tackle Rician noise [14], [18]. Nevertheless, it remains an open problem to fuse Rician noise removal with efficient CS-MRI reconstruction in a unified framework for practical MR acquisition.

III. METHOD
We first give the problem formulation of CS-MRI and present our deep framework. Subsequently, we detail the reconstruction process that converges to the optimum of the original energy, and finally provide the theoretical analysis.
A. Problem formulation
According to the CS theory [10], typical CS-MRI reconstruction attempts to recover a fully-sampled image x from under-sampled k-space data y as

$$\min_{\mathbf{x}} \Phi(\mathbf{x}) \quad \mathrm{s.t.} \quad \mathbf{PFx} = \mathbf{y}, \quad (1)$$

where y ∈ C^M represents the under-sampled observation in the k-space, and x ∈ C^N (M ≪ N) denotes the column vector of the desired MR image to be reconstructed. The Fourier transform and the under-sampling operation are denoted as F and P ∈ R^{M×N}, respectively; the composite operator PF constitutes a linear operation C^N → C^M. Φ(x) represents the regularization imposed on the ill-posed inverse problem in order to constrain the search space of the desired solution x.

Existing CS-MRI techniques usually solve the energy optimization (1) by iterations upon gradient information with the prior penalty. These methods suffer from expensive time consumption owing to the iterative gradient calculations. Alternatively, an end-to-end inference upon deep networks can achieve efficient MR reconstruction from under-sampled data, but such direct approaches without explicit constraints from prior knowledge lack reliability and robustness to data variations. Hybrid strategies integrate both principled knowledge and deep architectures into the optimization procedure, but this simple combination has no theoretical guarantee on convergence, so the reliability issue remains unresolved. Such trading of reliability for efficiency may produce serious consequences in medical analysis and diagnosis, because one cannot tell whether a reconstruction algorithm restores authentic anatomical structures or fabricates/hallucinates details upon an available data distribution. In this work, we propose a deep reconstruction framework with a theoretical guarantee on convergence, enabling both efficiency and robustness.
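To make the operators in (1) concrete, below is a minimal NumPy sketch of the forward model PF and the zero-filled adjoint; the orthonormal FFT convention and the random mask are our illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def forward(x, mask):
    """Apply PF: an orthonormal 2-D FFT followed by binary under-sampling.
    y is kept on the full k-space grid with zeros at unsampled points,
    which plays the role of P^T y in later formulas."""
    return mask * np.fft.fft2(x, norm="ortho")

def zero_filled(y, mask):
    """Adjoint (zero-filling) reconstruction: F^T P^T y."""
    return np.fft.ifft2(mask * y, norm="ortho")

# Toy example: 20% random sampling of a 256x256 image.
rng = np.random.default_rng(0)
x = rng.standard_normal((256, 256))
mask = (rng.random((256, 256)) < 0.2).astype(float)
y = forward(x, mask)
x0 = zero_filled(y, mask)  # the "Zero-Filling" baseline shown in Fig. 1
```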
B. Deep optimization framework

We consider the classical CS-MRI model with the sparsity regularization on a wavelet basis:

$$\boldsymbol{\alpha}^{*} \in \arg\min_{\boldsymbol{\alpha}} \|\mathbf{PFA}\boldsymbol{\alpha} - \mathbf{y}\|_2^2 + \lambda \|\boldsymbol{\alpha}\|_p^p, \quad (2)$$

where α denotes the sparse code of x on the wavelet basis A (x = Aα), and λ is the trade-off parameter. We use the ℓ_p regularization with p ∈ (0, 1), so the problem turns out to be a challenging non-convex sparse optimization. To tackle the challenge, our framework embraces imaging principles, deep inference, and sparsity priors, each of which was proved critical for effective reconstruction in previous studies. Accordingly, we design the fidelity, data-driven, and prior modules in every optimization iteration to exploit these three types of information. Further, the framework embeds an optimality condition module that automatically checks the output of each iteration and rejects an improper one that would lead to an undesired result. This module maintains the convergence, yielding a reliable solution to the optimization.

The top row of Fig. 2 demonstrates the cascade of all ingredients at each iteration of the optimization in our framework, and we elaborate the four modules in the order shown in Fig. 2.
Fidelity module: The fidelity term reveals the image formation process of MR imaging in (2), which is necessary for reconstruction algorithms. This module constitutes a mapping at the (k+1)-th iteration from the previous output α^k (initialized as the degraded input α^0 = A^⊤F^⊤P^⊤y) to an intermediate reconstructed image u^{k+1}, characterizing the inverse imaging process as

$$\mathbf{u}^{k+1} = \mathcal{F}(\boldsymbol{\alpha}^{k}; \theta_{\mathcal{F}}), \quad (3)$$

where θ_F denotes the parameters of the mapping. The calculation of the mapping F and its parameters θ_F will be detailed later in the reconstruction process.
Data-driven module: Networks provide a flexible mechanism for characterizing underlying structures shared by numerous training pairs, and our framework includes deep architectures encoding these image priors. We regard the artifacts introduced by the fidelity mapping F as unknown noise, and thus employ a residual denoising network with shortcuts (IRCNN) [36] as the data-driven module:

$$\mathbf{v}^{k+1} = \mathcal{N}(\mathbf{u}^{k+1}; \theta_{\mathcal{N}}^{k+1}), \quad (4)$$

where θ_N^{k+1} denotes the parameters of the residual network in the (k+1)-th stage. We will give a detailed depiction of the choice of this CNN-based denoiser later.
Optimal condition module: A major issue arising with the data-driven module is whether the inference of the deep network N follows a direction converging to the optimum of the model energy. This module resolves the issue via a checking mechanism that automatically examines, and possibly rejects, the output of the data-driven module:

$$\mathbf{w}^{k+1} = \mathcal{C}(\mathbf{v}^{k+1}, \boldsymbol{\alpha}^{k}; \theta_{\mathcal{C}}), \quad (5)$$

where v^{k+1} and α^k are the outputs of the data-driven module at this stage and of the previous iteration, respectively, and θ_C denotes the parameters for the optimality condition. The module calculates an indicator using the first-order optimality condition that determines whether to accept the update from the deep architectures.
Prior module: The sparsity prior upon a specific basis is necessary to constrain the solution space in traditional CS-MRI. The prior naturally describes the intrinsic mutual characteristics of MR images, so reconstruction benefits from incorporating it. Thus, we append a prior module to the condition module to further improve the restoration quality:

$$\boldsymbol{\alpha}^{k+1} = \mathcal{P}(\mathbf{w}^{k+1}; \theta_{\mathcal{P}}), \quad (6)$$

where θ_P denotes the parameters for the prior. Consequently, we are able to include domain knowledge for MR images and recover more details. We apply an iterative process with all four modules cascaded to solve the optimization (2), as shown in the bottom row of Fig. 2. The converged optimum α* found by the process gives the final reconstruction.

C. Reconstruction process
We now elaborate the computation of each module in our deep optimization framework, forming an efficient and convergent reconstruction process.
Closed solution for module F: This module gives an intermediate solution to the fidelity term in (2), ‖PFAα − y‖₂², without the complex sparsity constraint. Instead, we constrain the solution to stay close to the reconstruction of the previous iteration α^k using the ℓ2 norm:

$$\mathbf{u}^{k+1} = \arg\min_{\mathbf{u}} \|\mathbf{PFAu} - \mathbf{y}\|_2^2 + \rho \|\mathbf{u} - \boldsymbol{\alpha}^{k}\|_2^2, \quad (7)$$

where ρ balances the fidelity term against α^k. This constraint guarantees that the recovery from the fidelity term does not deviate from the overall optimization direction. Moreover, the continuity of (7) yields a closed-form solution:

$$\mathbf{u}^{k+1} = \mathcal{F}(\boldsymbol{\alpha}^{k}; \rho) = \mathbf{A}^{\top}\mathbf{F}^{\top}(\mathbf{P}^{\top}\mathbf{P} + \rho\mathbf{I})^{-1}(\mathbf{P}^{\top}\mathbf{y} + \rho\mathbf{F}\mathbf{A}\boldsymbol{\alpha}^{k}). \quad (8)$$

Residual learning for module N: Inspired by the discovery that artifacts from under-sampled data have a topologically simpler structure than the original images [12], we choose a deep architecture with shortcuts to learn the structural information underlying the pre-reconstructed images u^{k+1} with noise/artifacts. The deep architecture consists of seven blocks. The first block is composed of a dilated convolution layer cascaded with a rectified linear unit (ReLU) layer. Each of the five middle blocks consists of three layers, i.e., a dilated convolution, a batch normalization, and a ReLU. The last block is a single convolution layer. Dilated convolutions with a larger receptive field are adopted in the network to differentiate anatomical structures from artifacts, resulting in better denoising performance.

Such a deep approach accommodates various image structures reflected by training examples without the need for explicit models. More importantly, a well-trained deep network provides a rather fast forward inference, needing no gradient calculation in the image domain at each iteration. Specifically, we formulate this forward process as

$$\mathbf{v}^{k+1} = \mathcal{N}(\mathbf{u}^{k+1}; \vartheta^{k+1}) = \mathbf{A}^{\top}(\mathrm{ResNet}(\mathbf{A}\mathbf{u}^{k+1}; \vartheta^{k+1})), \quad (9)$$

where ResNet denotes the inference via the trained deep network with shortcuts. The output u^{k+1} of the fidelity module is a set of coefficients in the wavelet domain rather than an image, so we convert u^{k+1} into the image domain by Au^{k+1} before feeding it to the network, and subsequently convert the network output back into coefficients by applying the inverse transformation of A.
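As an illustration, here is a PyTorch sketch of the seven-block residual denoiser just described; the channel width (64) and the dilation factors are our assumptions, since the text fixes only the block structure.

```python
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Seven-block dilated-convolution denoiser with a residual shortcut.
    Width 64 and dilations (1,2,3,4,3,2,1) are illustrative choices."""
    def __init__(self, channels=1, width=64, dilations=(1, 2, 3, 4, 3, 2, 1)):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=dilations[0],
                            dilation=dilations[0]),
                  nn.ReLU(inplace=True)]            # first block: conv + ReLU
        for d in dilations[1:-1]:                   # five middle blocks
            layers += [nn.Conv2d(width, width, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(width),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, channels, 3, padding=dilations[-1],
                             dilation=dilations[-1])]  # last: a single conv
        self.body = nn.Sequential(*layers)

    def forward(self, u):
        # Shortcut: the stack predicts the residual (noise/artifacts),
        # which is subtracted from the input image.
        return u - self.body(u)
```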
Optimal condition for module C: To maintain the theoretical convergence of the proposed framework, we design a checking mechanism that guarantees the iterations always converge towards the optimal solution of the original energy (2). First, we introduce a proximal gradient with a momentum term to connect the output of the data-driven module with the first-order optimality condition of the minimization energy. We define the momentum proximal gradient as

$$\boldsymbol{\beta}^{k+1} \in \mathrm{prox}_{\eta\lambda\|\cdot\|_p^p}\big(\mathbf{v}^{k+1} - \eta(\nabla f(\mathbf{v}^{k+1}) + \rho(\mathbf{v}^{k+1} - \boldsymbol{\alpha}^{k}))\big), \quad (10)$$

$$\mathrm{prox}_{\eta\lambda\|\cdot\|_p^p}(\mathbf{v}) = \arg\min_{\mathbf{x}} \lambda\|\mathbf{x}\|_p^p + \frac{1}{2\eta}\|\mathbf{x} - \mathbf{v}\|_2^2,$$

where η is the step size and f denotes the fidelity term in (2). Then, we establish a feedback mechanism by considering the first-order optimality condition of (10) as

$$\|\mathbf{v}^{k+1} - \boldsymbol{\beta}^{k+1}\| \le \varepsilon_k \|\boldsymbol{\alpha}^{k} - \boldsymbol{\beta}^{k+1}\|. \quad (11)$$

Here, ε_k is a positive constant revealing the tolerance scale of the distance between the current solution β^{k+1} and the previous update α^k at the k-th stage. The previous output α^k is re-used when (11) is not satisfied. Finally, we summarize the checking module as

$$\mathbf{w}^{k+1} = \mathcal{C}(\mathbf{v}^{k+1}, \boldsymbol{\alpha}^{k}; \eta) = \begin{cases} \boldsymbol{\beta}^{k+1} & \text{if (11) is satisfied,} \\ \boldsymbol{\alpha}^{k} & \text{otherwise.} \end{cases} \quad (12)$$

In this manner, the iterative process bypasses improper outputs of the deep network and heads toward the desired solution.

Proximal gradient for module P: The sparsity prior λ‖·‖_p^p in (2) acts as the last module of each iteration. Considering the non-convexity of the ℓ_p-norm regularization, we solve it via a step of proximal gradient:

$$\boldsymbol{\alpha}^{k+1} = \mathcal{P}(\mathbf{w}^{k+1}; \eta) \in \mathrm{prox}_{\eta\lambda\|\cdot\|_p^p}\big(\mathbf{w}^{k+1} - \eta\nabla f(\mathbf{w}^{k+1})\big), \quad (13)$$

where η is the step size. This operation enforces the prior term of the reconstruction model and thus preserves more details, avoiding over-smoothed reconstruction.

Algorithm 1 lists the iterative reconstruction process, where each iteration consists of the cascaded computations of the four principled modules. The process converges to the solution of (2), and we give the theoretical analysis below.

Algorithm 1 Reconstruction procedure
Require: α^0, P, F, y, and necessary parameters.
Ensure: Reconstructed MR image x.
  while not converged do
    u^{k+1} = F(α^k; ρ),
    v^{k+1} = N(u^{k+1}; ϑ^{k+1}),
    w^{k+1} = C(v^{k+1}, α^k; η),
    α^{k+1} = P(w^{k+1}; η),
  end while
  x = Aα*.
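For concreteness, the following NumPy sketch runs Alg. 1 for the single-coil model (2). The wavelet pair (A, A^⊤) is left as identity placeholders, the denoiser is an abstract callable, and the ℓ_p proximal step is shown for the p = 1 special case (soft-thresholding), since the general p ∈ (0, 1) prox has no simple closed form; constant factors of the fidelity gradient are absorbed into the step size η, and all parameter values are illustrative.

```python
import numpy as np

F  = lambda img: np.fft.fft2(img, norm="ortho")   # Fourier transform F
Ft = lambda ksp: np.fft.ifft2(ksp, norm="ortho")  # its inverse F^T

def soft_threshold(v, tau):
    """prox of tau*||.||_1, standing in for the general l_p prox."""
    mag = np.maximum(np.abs(v), 1e-12)
    return np.maximum(1.0 - tau / mag, 0.0) * v

def reconstruct(y, mask, denoiser, A=lambda a: a, At=lambda a: a,
                rho=5.0, eta=0.1, lam=1e-3, eps=0.9, n_iters=50):
    grad_f = lambda a: At(Ft(mask * (mask * F(A(a)) - y)))   # grad of fidelity
    alpha = At(Ft(mask * y))                                 # zero-filled init
    for _ in range(n_iters):
        # Module F, Eq. (8): (P^T P + rho I) is diagonal, so invert pointwise.
        u = At(Ft((mask * y + rho * F(A(alpha))) / (mask + rho)))
        v = denoiser(u)                                      # module N, Eq. (9)
        beta = soft_threshold(                               # Eq. (10)
            v - eta * (grad_f(v) + rho * (v - alpha)), eta * lam)
        # Module C, Eqs. (11)-(12): accept beta only if the check passes.
        if np.linalg.norm(v - beta) <= eps * np.linalg.norm(alpha - beta):
            w = beta
        else:
            w = alpha
        alpha = soft_threshold(w - eta * grad_f(w), eta * lam)  # module P, Eq. (13)
    return A(alpha)                                          # x* = A alpha*
```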
D. Theoretical investigations

Existing deep optimization strategies rely highly on the distribution of the data and discard the convergence guarantee of the iterations [14], [15]. Our scheme not only includes energy optimization with imaging principles and prior knowledge, but also embraces a learnable deep architecture for fast inference. Furthermore, we design an effective mechanism to determine whether the output of the network at the current iteration follows a descent direction towards the energy optimum. This subsection theoretically analyzes the convergence behavior of our method.

To simplify the following derivations, we first rewrite the objective in (2) as

$$\Phi(\boldsymbol{\alpha}) = \|\mathbf{PFA}\boldsymbol{\alpha} - \mathbf{y}\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_p^p.$$

Furthermore, we list the following properties of Φ that are helpful for the convergence analysis:
a) ‖PFAα − y‖₂² is proper and Lipschitz smooth;
b) λ‖α‖_p^p is proper and lower semi-continuous;
c) Φ(α) satisfies the KŁ property and is coercive.
(We place the details of these properties and all the proofs of the following propositions and theorem in the supplemental materials due to the page limit.) Then two propositions and one theorem are given to characterize the convergence properties of our method.
Proposition 1: Let {α^k}_{k∈N} and {β^k}_{k∈N} be the sequences generated by Alg. 1. Supposing that the error condition ‖v^{k+1} − α^k‖ ≤ ε_k‖β^{k+1} − α^k‖ in our module C is satisfied, there exists a sequence {C_k}_{k∈N} such that

$$\Phi(\boldsymbol{\beta}^{k+1}) \le \Phi(\boldsymbol{\alpha}^{k}) - C_k\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2, \quad (14)$$

where C_k = 1/(2η) − L_f/2 − (L_f + |ρ − 1/η|)ε_k > 0 and L_f is the Lipschitz coefficient of ∇f.

Proposition 2: If η < 1/L_f, let {α^k}_{k∈N} and {w^k}_{k∈N} be the sequences generated by the proximal operator; then we have

$$\Phi(\boldsymbol{\alpha}^{k+1}) \le \Phi(\mathbf{w}^{k+1}) - \big(1/(2\eta) - L_f/2\big)\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2. \quad (15)$$
Remark 1: The inequalities in Propositions 1 and 2 provide a descent sequence Φ(α^k) by illustrating the relationship between Φ(α^k) and Φ(w^k):

$$-\infty < \Phi(\boldsymbol{\alpha}^{k+1}) \le \Phi(\mathbf{w}^{k+1}) \le \Phi(\boldsymbol{\alpha}^{k}) \le \cdots \le \Phi(\boldsymbol{\alpha}^{0}).$$

Thus, the operator C is a key criterion for checking whether the output of the learnable deep module propagates along a descent direction toward the optimum. Moreover, it also ingeniously builds a bridge connecting the adjacent iterates Φ(α^k) and Φ(α^{k+1}).
Theorem 1: Let {α^k}_{k∈N} be a sequence generated by our algorithm. The following properties hold.
• The sequence {α^{k+1} − w^{k+1}}_{k∈N} is square summable, i.e., Σ_{k=1}^∞ ‖α^{k+1} − w^{k+1}‖² < ∞.
• The sequence {α^k}_{k∈N} converges to a critical point α* of Φ.
Remark 2: The second property in Theorem 1 implies that {α^k}_{k∈N} is a Cauchy sequence, so it globally converges to a critical point of Φ.

IV. EXTENSIONS TO PRACTICAL SCENARIOS
Our framework adapts to practical scenarios whose models are similar to (2). This section gives two examples, i.e., parallel imaging and reconstruction with Rician noise.
A. Extension 1: Parallel imaging based CS-MRI
Reconstruction of sparse multi-coil data can be formulated as

$$\boldsymbol{\alpha}^{*} \in \arg\min_{\boldsymbol{\alpha}} \left\{ \sum_{l=1}^{L} \|\mathbf{PFS}_l\mathbf{A}\boldsymbol{\alpha} - \mathbf{y}_l\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_p^p \right\}, \quad (16)$$

where L is the number of receivers used for multi-coil data acquisition, and y_l ∈ C^M denotes the under-sampled MR data recorded by the l-th receiver with the corresponding sensitivity map S_l ∈ C^{N×N}, which can be estimated via commonly used self-calibration techniques [37], [38].
This CS-PI model (16) shares a similar form with (2), so we can readily apply the proposed framework for efficient and reliable reconstruction. The fidelity term in (16) yields the closed-form solution for the module F:

$$\mathbf{u}^{k+1} = \mathcal{F}(\boldsymbol{\alpha}^{k}; \rho) = \sum_{l=1}^{L} \mathbf{A}^{\top}\mathbf{S}_l^{\top}\mathbf{F}^{\top}(\mathbf{P}^{\top}\mathbf{P} + \rho\mathbf{I})^{-1}(\mathbf{P}^{\top}\mathbf{y}_l + \rho\mathbf{F}\mathbf{S}_l\mathbf{A}\boldsymbol{\alpha}^{k}). \quad (17)$$

Accordingly, the derivative of the fidelity term, ∇f(α), in the modules C and P changes into

$$\nabla f(\boldsymbol{\alpha}) = \sum_{l=1}^{L} \mathbf{A}^{\top}\mathbf{S}_l^{\top}\mathbf{F}^{\top}\mathbf{P}^{\top}(\mathbf{PFS}_l\mathbf{A}\boldsymbol{\alpha} - \mathbf{y}_l). \quad (18)$$

We obtain the final reconstruction by running Alg. 1 with the other computations unchanged.
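A minimal sketch of the CS-PI fidelity update (17), under the same conventions as before; the names ys (per-coil k-space data) and sens (sensitivity maps) are ours, and the coil adjoint S_l^⊤ is realized as the complex conjugate of the map.

```python
import numpy as np

F  = lambda img: np.fft.fft2(img, norm="ortho")
Ft = lambda ksp: np.fft.ifft2(ksp, norm="ortho")

def pi_fidelity(alpha, ys, sens, mask, rho=5.0,
                A=lambda a: a, At=lambda a: a):
    """Module F for the CS-PI model, Eq. (17): a sum of per-coil
    closed-form updates over the L receivers."""
    u = np.zeros_like(alpha, dtype=complex)
    for y_l, s_l in zip(ys, sens):
        rhs = mask * y_l + rho * F(s_l * A(alpha))      # P^T y_l + rho F S_l A alpha
        u += At(np.conj(s_l) * Ft(rhs / (mask + rho)))  # A^T S_l^T F^T (P^T P + rho I)^{-1}(.)
    return u
```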
In order to enable the proposed framework to tackle reconstruction with non-additive Rician noise, we re-formulate the CS-MRI model in (2) as

$$\min_{\mathbf{x},\mathbf{z}} \|\mathbf{PFz} - \mathbf{y}\|_2^2 + \lambda_1\|\mathbf{A}_1^{\top}\mathbf{x}\|_p^p + \lambda_2\|\mathbf{A}_2^{\top}\mathbf{z}\|_p^p \quad \mathrm{s.t.} \quad \mathbf{z} = \sqrt{(\mathbf{x} + \mathbf{n}_1)^2 + \mathbf{n}_2^2}, \quad (19)$$

where x, y, and n₁, n₂ ∼ N(0, σ²) denote the fully sampled clean MR image, the observed k-space data, and the independent Gaussian noise in the real and imaginary components, respectively. The matrices A₁^⊤ and A₂^⊤ are the inverses of the wavelet transforms A₁ and A₂, respectively.

We then split (19) into two subproblems:

$$\mathbf{z}^{k+1} = \arg\min_{\mathbf{z}} \|\mathbf{PFz} - \mathbf{y}\|_2^2 + \lambda_2\|\mathbf{A}_2^{\top}\mathbf{z}\|_p^p + \rho\|\mathbf{z} - \mathbf{z}^{k}\|_2^2, \quad (20a)$$

$$\mathbf{x}^{k+1} = \arg\min_{\mathbf{x}} \|\sqrt{(\mathbf{x} + \mathbf{n}_1)^2 + \mathbf{n}_2^2} - \mathbf{z}^{k+1}\|_2^2 + \lambda_1\|\mathbf{A}_1^{\top}\mathbf{x}\|_p^p, \quad (20b)$$

where ρ is a penalty parameter. We can then alternately apply the iterative process in Alg. 1 to solve for z^{k+1} and x^{k+1}, finally converging to the optimal estimate x*.

The subproblem (20a) aims to reconstruct a fully sampled noisy MR image from the k-space observation y. Therefore, Alg. 1 is applicable to this subproblem with all modules unchanged except that the fidelity module takes the two square terms:

$$\mathbf{u}_z^{k+1} = \arg\min_{\mathbf{u}_z} \|\mathbf{PF}\mathbf{u}_z - \mathbf{y}\|_2^2 + \rho_1\|\mathbf{u}_z - \mathbf{z}^{k}\|_2^2 + \rho_2\|\mathbf{u}_z - \mathbf{x}^{k}\|_2^2, \quad (21)$$

where z^k and x^k represent the outputs at the previous iteration of the subproblems (20a) and (20b), respectively, and ρ₁ and ρ₂ balance among the fidelity term, z^k, and x^k. Again, the continuity of (21) yields a closed-form solution:

$$\mathbf{u}_z^{k+1} = \mathcal{F}(\mathbf{z}^{k}, \mathbf{x}^{k}; \rho) = \mathbf{F}^{\top}(\mathbf{P}^{\top}\mathbf{P} + \rho_1\mathbf{I} + \rho_2\mathbf{I})^{-1}(\mathbf{P}^{\top}\mathbf{y} + \rho_1\mathbf{F}\mathbf{z}^{k} + \rho_2\mathbf{F}\mathbf{x}^{k}). \quad (22)$$
Fig. 3. Loss (left) and PSNR (right) evolution during iterations of four module combinations.

Fig. 4. Images reconstructed with four module combinations (F→P, F→N, F→N→P, Ours).
Accordingly, we derive the derivative of the fidelity term ∇f(z) in both modules C and P as

$$\nabla f(\mathbf{z}) = \mathbf{F}^{\top}(\mathbf{P}^{\top}\mathbf{PFz} - \mathbf{P}^{\top}\mathbf{y}) + \rho_1(\mathbf{z} - \mathbf{z}^{k}). \quad (23)$$

The second subproblem (20b) targets noise removal, with the fidelity module written as

$$\mathbf{u}_x^{k+1} = \arg\min_{\mathbf{u}_x} \|\sqrt{(\mathbf{u}_x + \mathbf{n}_1)^2 + \mathbf{n}_2^2} - \mathbf{z}^{k+1}\|_2^2. \quad (24)$$

Unfortunately, we can hardly express a closed-form solution u_x for the module F, and thus resort to an efficient learnable strategy with two-stage IRCNNs, trained using pairs of noise-free and Gaussian-noisy MR images, in order to tackle the non-additive Rician noise. The first stage predicts (u_x + n₁)² from (z^{k+1})², which removes the contribution of the additive noise n₂ by noticing (z^{k+1})² = (u_x + n₁)² + n₂². The second stage feeds the square root of the output of the first stage, i.e., √((u_x + n₁)²), to the trained IRCNN, inferring u_x^{k+1} with the additive noise n₁ eliminated. Given u_x^{k+1}, we obtain one iteration step x^{k+1} by subsequently calculating the other three modules, N, C, and P, in Alg. 1. The gradient ∇f(x) in C and P changes into

$$\nabla f(\mathbf{x}) = (\mathbf{x} + \mathbf{n}_1)\Big(1 - \big(\sqrt{(\mathbf{x} + \mathbf{n}_1)^2 + \mathbf{n}_2^2}\big)^{-1}\mathbf{z}^{k+1}\Big), \quad (25)$$

where n₁ and n₂ are handily available as byproducts of the two-stage IRCNN inference.
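The two-stage inference can be sketched as below; denoiser_sq and denoiser_img are hypothetical handles to the two trained IRCNNs (the paper does not name them), and the clipping before the square root is our numerical safeguard.

```python
import numpy as np

def rician_fidelity(z_next, denoiser_sq, denoiser_img):
    """Learnable module F for subproblem (20b), as described above."""
    stage1 = denoiser_sq(np.abs(z_next) ** 2)  # predict (u_x + n1)^2 from (z^{k+1})^2
    stage1 = np.maximum(stage1, 0.0)           # keep the square root well-defined
    return denoiser_img(np.sqrt(stage1))       # infer u_x^{k+1}, eliminating n1
```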
V. EXPERIMENTS

This section first explores the impact of each module of our framework on accuracy and convergence behavior. Next, we compare our algorithm with state-of-the-art CS-MRI methods to demonstrate its superiority in accuracy, efficiency, and robustness. Besides benchmarks, we also experiment on raw k-space data provided by an MR manufacturer's Prodiva 1.5T scanner (Philips Healthcare, Best, Netherlands). Further, we conduct experiments on multi-coil data and Rician-noisy data. All experiments were executed on a PC with an Intel(R) CPU @ 3.0 GHz, 256 GB RAM, and an NVIDIA TITAN Xp GPU.
TABLE I. Quantitative comparison on different sampling patterns at a fixed sampling ratio.

                   |      T1-weighted MR data                      |      T2-weighted MR data
                   | Cartesian      Radial        Gaussian         | Cartesian      Radial        Gaussian
Method             | PSNR   RLNE    PSNR   RLNE   PSNR   RLNE      | PSNR   RLNE    PSNR   RLNE   PSNR   RLNE
ZeroFilling [39]   | 22.16  0.2874  25.19  0.2026  25.08  0.2051   | 23.80  0.3724  25.69  0.2636  26.48  0.2737
TV [4]             | 23.85  0.2366  27.76  0.1514  29.76  0.1204   | 26.76  0.2650  33.27  0.1258  35.85  0.0938
SIDWT [40]         | 23.43  0.2479  28.82  0.1449  29.64  0.1220   | 25.96  0.2905  33.51  0.1173  36.14  0.0905
PBDW [3]           | 25.96  0.1857  30.86  0.1145  32.38  0.0890   | 29.18  0.2008  35.33  0.0951  38.54  0.0685
PANO [9]           | 27.88  0.1492  30.51  0.1104  33.94  0.0746   | 31.74  0.1499  36.09  0.0912  40.43  0.0557
FDLCP [8]          | 26.47  0.1757  30.75  0.1075  26.47  0.1757   | 31.74  0.1500  37.33  0.0788  31.74  0.1500
BM3D-MRI [10]      | 26.20  0.1802  31.08  0.1035  35.41  0.0630   | 29.23  0.1999  37.55  0.0770  42.67  0.0428
DAMP [41]          | 25.90  0.1866  30.31  0.1128  35.30  0.0638   | 27.98  0.2306  35.98  0.0919  42.30  0.0447
ADMM-Net [14]      | 25.14  0.2041  29.42  0.1254  33.22  0.0807   | 28.62  0.2141  36.30  0.0888  39.19  0.0636
DAGAN [18]         | 24.57  0.1707  26.79  0.1324  27.79  0.1177   | 25.89  0.1196  29.99  0.0748  30.40  0.0714
RefineGAN [16]     | 25.27  0.2057  28.20  0.1462  26.72  0.1713   | 27.58  0.2506  33.54  0.1240  33.81  0.1226
Ours               | –      –       –      –       –      –        | –      –       –      –       –      –

TABLE II. Comparisons of time consumption on T-weighted data using the Radial pattern at a fixed sampling ratio.

Evaluation | ZeroFilling  SIDWT   PBDW    PANO    FDLCP   BM3D-MRI  DAMP    ADMM-Net  Ours
PSNR       | 22.85        26.66   26.55   26.53   25.65   24.88     26.23   24.90     –
SSIM       | 0.5353       0.5863  0.5553  0.5804  0.5293  0.5674    0.5282  0.5431    –
RLNE       | 0.3062       0.2339  0.2359  0.2289  0.2653  0.2729    0.2529  0.3085    –

Fig. 5. RLNE vs. running time (left) and PSNR vs. sampling ratio (right), reflecting efficiency and robustness to sampling variations, respectively. We omit ZeroFilling and TV, whose results fall beyond the range of these two plots.

TABLE III. Comparisons on parallel imaging.

Evaluation | SENSE    ESPIRiT  ESPIRiT-L1  GRAPPA   Ours
PSNR       | 27.82    29.14    30.13       30.96    31.02
RLNE       | 0.31     0.27     0.25        0.23     0.23
Time (s)   | 8.5570   9.8833   234.982     116.407  9.1078

A. Ablation analysis
First, we evaluate four combinations of modules in our framework to figure out their roles. The first case, F→P, performs reconstruction with the constraint of the sparsity prior, and the second one, F→N, combines the fidelity and data-driven modules in order to demonstrate the performance of deep networks. We use both sparsity and deep priors without the optimal condition module as the third case, F→N→P. The entire paradigm with all modules is the last choice, Ours. (Source codes and testing data are available at https://github.com/dlut-dimt/TGDF-TMI.) We apply these strategies on T1-weighted data using the Radial sampling pattern with a 20% ratio. The stopping criterion for the iterations is the relative change ‖x^{k+1} − x^k‖/‖x^k‖ dropping below a small fixed tolerance.

Figure 3 illustrates that the loss of F→P is slightly larger than that of F→N at the first several iterations: the data-driven module is more significant when the image quality is low in the beginning. As the process goes on, the loss for F→N fluctuates and its PSNR decreases, showing unstable inference, while the iterations for F→P stay stable. Combining deep and sparsity priors, F→N→P improves performance over the former two, but converges neither as stably nor as fast as Ours. The higher PSNR values of Ours demonstrate more accurate reconstruction. The execution time of Ours is 2.5225 s, less than that of F→P (4.4762 s), F→N (3.3240 s), and F→N→P (6.2760 s). The visual results in Fig. 4 also verify the superior quality of Ours over the other three.
B. Comparisons with the State-of-the-art
We compare our method with eleven CS-MRI techniques, including eight model-based ones, Zero-Filling [39], TV [4], SIDWT [40], PBDW [3], PANO [9], FDLCP [8], BM3D-MRI [10], and DAMP [41], as well as three learning-based ones, ADMM-Net [14], DAGAN [18], and RefineGAN [16]. The latter three methods and ours were tested on a GPU.
Comparisons on benchmark sets:
We randomly selected 25 T1-weighted and 25 T2-weighted MRI data from 50 different subjects in the IXI dataset (http://brain-development.org/ixi-dataset/) for testing, in order to evaluate reconstruction accuracy, time consumption, and robustness to data variations. Three commonly used sampling patterns are adopted, i.e., the Cartesian [3], Radial [14], and Gaussian [18] patterns, with the sampling ratio varying from 10% to 50%. The parameter ρ in F is set to 5, and the noise level in N ranges from 3.0 to 49.0. The parameters L (with η = 2/L), λ, and p in the modules C and P are set to fixed constants, and the maximum number of iterations is 50. The comparative approaches take the parameter settings of their respective papers. For fair comparisons, we re-finetune the deep models in DAGAN and RefineGAN using image pairs generated by the various patterns and sampling ratios.

First, we evaluate reconstruction accuracy on the benchmark data using the three sampling patterns in terms of PSNR and RLNE. Table I shows that ours leads all competitors by a large margin for all sampling patterns. Next, reconstruction efficiency is explored via comparisons of time consumption and reconstruction error with Radial sampling at a 50% ratio. The marker at the lower left of the left plot in Fig. 5 indicates that our method gives the least reconstruction error with comparably low processing time, balancing accuracy and efficiency. Third, we use the Gaussian mask at five sampling ratios varying from 10% to 50%. As shown in the right plot of Fig. 5, ours has the highest PSNR over all others at all sampling ratios, demonstrating its robustness.

These comparisons demonstrate that traditional methods give ideal accuracy but suffer from large time consumption, while learning-based ones enjoy fast forward inference but depend so highly on training data that testing data deviating from the training distribution may severely degrade their performance.

Fig. 6. Images, zoomed-in details, and error heat maps (blue: lower, red: higher) reconstructed from raw k-space data at an acceleration factor of 2.0 (Ground Truth; ZF, 23.94; PBDW, 27.21; PANO, 27.66; FDLCP, 26.75; DAMP, 27.20; Ours). Resultant PSNR values are parenthesized after the corresponding algorithms.

Fig. 7. Images for parallel imaging from multi-channel data collected by 8 coils with an acceleration factor of 2.0 (SENSE, 27.43; ESPIRiT, 29.59; ESPIRiT-L1, 30.59; GRAPPA, 30.43; Ours). PSNR values are given in parentheses after the corresponding algorithms.

Fig. 8. Heat maps (blue: lower, red: higher) of Rician denoisers at noise level 20 (Noisy, 20.77; LMMSE, 22.82; NLM, 25.76; UNLM, 29.23; RiceVST, 29.93; Ours, 30.84). PSNR values are given below the corresponding algorithms.

Fig. 9. Left: PSNR and running time at different noise levels. Right: PSNR values of reconstruction algorithms with RiceVST (x-axis) and the two-stage deep denoiser (y-axis) cascaded as post-processing at Rician noise level 20. Ours embeds the deep denoiser in every iteration.

Comparisons on Real Complex Data:
We conduct comparisons on real k-space data directly collected from a Philips machine to investigate the performance in practical applications. Many algorithms, like BM3D-MRI and the learning-based ones, do not consider the complex form of these real data. The flexibility of our framework enables us to deal with complex data directly via the modules F, C, and P. We handle the real and imaginary parts separately, and then merge them into the complex form for the module N.
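This split-and-merge strategy amounts to a few lines; a sketch, assuming a real-valued denoiser network:

```python
import numpy as np

def complex_denoise(u, denoiser):
    """Apply a real-valued CNN denoiser to complex MR data by filtering
    the real and imaginary parts separately and merging them back."""
    return denoiser(u.real) + 1j * denoiser(u.imag)
```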
Fig. 10. Images, zoomed-in details, and heat maps reconstructed by traditional CS-MRI methods with the two-stage deep denoiser as post-processing (Ground Truth; ZeroFilling, 24.22; SIDWT, 25.50; PBDW, 26.64; PANO, 27.03; FDLCP, 25.00; BM3D-MRI, 27.36; DAMP, 26.27; ADMM-Net, 26.09; DAGAN, 24.19; RefineGAN, 25.12; Ours, 27.88). Ours combines reconstruction with the denoiser in one framework. The parenthesized values give the resultant PSNR.

In this experiment, the noise level in N ranges from 20.0 to 49.0, the parameter L in C and P is set to 1.7, and the maximum iteration number is 60. The PSNR, SSIM, and RLNE values of our method in Tab. II demonstrate more accurate reconstruction than the others on real data. Figure 6 gives the qualitative comparisons with enlarged details and error heat maps. Our method produces sharper textures and less error, showing its superior capability.

C. Experiments on Parallel Imaging
We perform reconstruction from multiple coils by comparing with four PI techniques: SENSE [20], GRAPPA [22], SPIRiT [23], and ℓ1-SPIRiT [26]. Table III shows that our approach achieves higher accuracy in less time, possessing an ideal ability to handle multi-coil data. The top row of Fig. 7 visualizes the input from multiple coils, and the lower row shows the reconstructions using the different techniques, where our method restores richer details with higher PSNR.
The two-stage deep network solving the subproblem (20b) acts as a Rician denoiser. We first compare this strategy with classical Rician denoisers, including NLM [42], UNLM [43], LMMSE [44], and RiceVST [45]. We generate training data by adding Rician noise at different levels to 500 T1-weighted MRI data randomly chosen from the MICCAI 2013 grand challenge dataset (http://masiweb.vuse.vanderbilt.edu/workshop2013/index.php/Segmentation_Challenge_Details). Figure 8 shows the error heat maps between the ground truth and the images denoised by the various techniques. Our strategy successfully resolves the non-additivity, so it attenuates noise more effectively than the others, especially in the background. RiceVST performs close to ours, and hence we further compare with RiceVST on five noise levels varying from 10 to 50, as shown in Fig. 9. Ours outputs higher PSNR at levels below 30 with much less time than RiceVST.

Our framework naturally embeds the two-stage deep denoiser into the iterations for reconstruction. We investigate the performance of this one-set solution for reconstruction with Rician noise on T1-weighted data in the IXI dataset. The parameters ρ, λ₁, and λ₂ in (20a) and (20b) are set to 0.01, 1.0, and 1.0, respectively. For fair comparisons, we cascade RiceVST or our two-stage denoiser to the other reconstruction algorithms as post-processing. The right plot of Fig. 9 illustrates the PSNR values reconstructed by the comparative algorithms with RiceVST (x-axis) and our two-stage denoiser (y-axis) cascaded. The brown triangle denotes the PSNR obtained by our one-set solution. This figure shows that a) our framework combining reconstruction with denoising produces the best quality over the other reconstruction algorithms with post-processing, and b) our learnable two-stage denoiser performs better than RiceVST as post-processing for reconstruction. Figure 10 provides a visual illustration of reconstruction with noise, where our framework also preserves more details and introduces lower errors.

VI. CONCLUSION
We propose a theoretically converged deep framework for CS-MRI reconstruction. This paradigm takes advantage of both model-based and deep representations, improving accuracy and efficiency. Further, we devise an optimality condition that guides the iterative propagation to converge to a critical point of the energy with priors, guaranteeing reliability. We also apply this framework to parallel imaging and to reconstruction with Rician noise in practical scenarios, without needing significant data re-training. Extensive experiments demonstrate the superiority of our framework over the state of the art. Future work will be directed to applications for more tasks, like low-dose CT, and to the incorporation of other effective deep architectures.

ACKNOWLEDGMENTS
This paper extends our conference version published at AAAI 2019 [46]. The authors would like to thank MR method manager Dr. Chenguang Zhao and sequence engineer Yuli Huang at Philips Healthcare (Suzhou) Co. Ltd. for providing testing data from the Prodiva 1.5T scanner (Philips Healthcare, Best, Netherlands). This work is partially supported by the National Natural Science Foundation of China (Nos. 61922019, 61672125, 61572096 and 61632019).
SUPPLEMENTAL MATERIALS
Proof
To simplify the following derivations, we first rewrite the objective in Eq. (2) as Φ(α) = f(α) + g(α) = ‖PFAα − y‖₂² + λ‖α‖_p^p. We first provide detailed explanations of the properties of f, g, and Φ.
• f(α) is proper if dom f := {α ∈ R^D : f(α) < +∞} is nonempty.
• f(α) is L-Lipschitz smooth if f is differentiable and there exists L > 0 such that ‖∇f(α) − ∇f(β)‖ ≤ L‖α − β‖, ∀α, β ∈ R^D. If f is L-Lipschitz smooth, we have the inequality f(α) ≤ f(β) + ⟨∇f(β), α − β⟩ + (L/2)‖α − β‖², ∀α, β ∈ R^D.
• g(α) is lower semi-continuous if lim inf_{α→β} g(α) ≥ g(β) at any point β ∈ dom g.
• Φ(α) is coercive if Φ is bounded from below and Φ → ∞ as ‖α‖ → ∞, where ‖·‖ is the ℓ2 norm.
• Φ(α) is said to have the Kurdyka-Łojasiewicz (KŁ) property at ᾱ ∈ dom ∂Φ := {α ∈ R^D : ∂Φ(α) ≠ ∅} if there exist η ∈ (0, ∞], a neighborhood U_ᾱ of ᾱ, and a desingularizing function φ : [0, η) → R⁺ which satisfies (1) φ is continuous at 0 and φ(0) = 0; (2) φ is concave and C¹ on (0, η); (3) for all s ∈ (0, η), φ′(s) > 0; such that for all α ∈ U_ᾱ ∩ [Φ(ᾱ) < Φ(α) < Φ(ᾱ) + η], the following inequality holds: φ′(Φ(α) − Φ(ᾱ)) dist(0, ∂Φ(α)) ≥ 1. Moreover, if Φ satisfies the KŁ property at each point of dom ∂Φ, then Φ is called a KŁ function.
Proposition 3: Let {α^k}_{k∈N} and {β^k}_{k∈N} be the sequences generated by Alg. 1, and suppose that the error condition ‖v^{k+1} − α^k‖ ≤ ε_k‖β^{k+1} − α^k‖ in our checking module C is satisfied. Then there exists a sequence {C_k}_{k∈N} such that

$$\Phi(\boldsymbol{\beta}^{k+1}) \le \Phi(\boldsymbol{\alpha}^{k}) - C_k\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2, \quad (26)$$

where C_k = 1/(2η) − L_f/2 − (L_f + |ρ − 1/η|)ε_k > 0 and L_f is the Lipschitz coefficient of ∇f.
Proof 6.1: In our checking stage, if the error condition is satisfied, we have

$$\boldsymbol{\beta}^{k+1} \in \mathrm{prox}_{\eta\lambda\|\cdot\|_p^p}\big(\mathbf{v}^{k+1} - \eta(\nabla f(\mathbf{v}^{k+1}) + \rho(\mathbf{v}^{k+1} - \boldsymbol{\alpha}^{k}))\big).$$

Thus, by the definition of the proximal operator we get

$$\begin{aligned}
\boldsymbol{\beta}^{k+1} &\in \arg\min_{\boldsymbol{\beta}} \eta\lambda\|\boldsymbol{\beta}\|_p^p + \frac{1}{2}\big\|\boldsymbol{\beta} - \mathbf{v}^{k+1} + \eta\big(\nabla f(\mathbf{v}^{k+1}) + \rho(\mathbf{v}^{k+1} - \boldsymbol{\alpha}^{k})\big)\big\|^2 \\
&= \arg\min_{\boldsymbol{\beta}} \eta\lambda\|\boldsymbol{\beta}\|_p^p + \frac{1}{2}\big\|\boldsymbol{\beta} - \boldsymbol{\alpha}^{k} + \eta\nabla f(\mathbf{v}^{k+1}) + (\eta\rho - 1)(\mathbf{v}^{k+1} - \boldsymbol{\alpha}^{k})\big\|^2 \\
&= \arg\min_{\boldsymbol{\beta}} \lambda\|\boldsymbol{\beta}\|_p^p + \frac{1}{2\eta}\|\boldsymbol{\beta} - \boldsymbol{\alpha}^{k}\|^2 + \Big\langle \boldsymbol{\beta} - \boldsymbol{\alpha}^{k},\ \nabla f(\mathbf{v}^{k+1}) + \big(\rho - \tfrac{1}{\eta}\big)(\mathbf{v}^{k+1} - \boldsymbol{\alpha}^{k}) \Big\rangle.
\end{aligned} \quad (27)$$

Hence, taking β = β^{k+1} and β = α^k respectively, we obtain

$$\lambda\|\boldsymbol{\beta}^{k+1}\|_p^p + \frac{1}{2\eta}\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2 + \Big\langle \boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k},\ \nabla f(\mathbf{v}^{k+1}) + \big(\rho - \tfrac{1}{\eta}\big)(\mathbf{v}^{k+1} - \boldsymbol{\alpha}^{k}) \Big\rangle \le \lambda\|\boldsymbol{\alpha}^{k}\|_p^p. \quad (28)$$

Invoking the Lipschitz smoothness of f, we also have

$$f(\boldsymbol{\beta}^{k+1}) \le f(\boldsymbol{\alpha}^{k}) + \langle \boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}, \nabla f(\boldsymbol{\alpha}^{k}) \rangle + \frac{L_f}{2}\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2. \quad (29)$$

Combining Eqs. (28) and (29), we obtain

$$\begin{aligned}
\Phi(\boldsymbol{\beta}^{k+1}) &\le \Phi(\boldsymbol{\alpha}^{k}) - \frac{1}{2\eta}\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2 + \frac{L_f}{2}\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2 \\
&\quad + \Big\langle \boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k},\ \nabla f(\boldsymbol{\alpha}^{k}) - \nabla f(\mathbf{v}^{k+1}) - \big(\rho - \tfrac{1}{\eta}\big)(\mathbf{v}^{k+1} - \boldsymbol{\alpha}^{k}) \Big\rangle \\
&\le \Phi(\boldsymbol{\alpha}^{k}) - \frac{1}{2\eta}\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2 + \frac{L_f}{2}\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2 + \big(L_f + |\rho - \tfrac{1}{\eta}|\big)\varepsilon_k\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2 \\
&\le \Phi(\boldsymbol{\alpha}^{k}) - \Big(\frac{1}{2\eta} - \frac{L_f}{2} - \big(L_f + |\rho - \tfrac{1}{\eta}|\big)\varepsilon_k\Big)\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2 = \Phi(\boldsymbol{\alpha}^{k}) - C_k\|\boldsymbol{\beta}^{k+1} - \boldsymbol{\alpha}^{k}\|^2.
\end{aligned}$$

The last equality uses the assumption C_k = 1/(2η) − L_f/2 − (L_f + |ρ − 1/η|)ε_k > 0 at the k-th iteration. This proves that the inequality (26) in Proposition 3 holds.

Proposition 4: If η < 1/L_f, let {α^k}_{k∈N} and {w^k}_{k∈N} be the sequences generated by the proximal operator in Alg. 1. Then we have

$$\Phi(\boldsymbol{\alpha}^{k+1}) \le \Phi(\mathbf{w}^{k+1}) - \big(1/(2\eta) - L_f/2\big)\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2. \quad (30)$$
Proof 6.2: As the proximal stage of Alg. 1 shows, we have

$$\boldsymbol{\alpha}^{k+1} \in \mathrm{prox}_{\eta\lambda\|\cdot\|_p^p}\big(\mathbf{w}^{k+1} - \eta\nabla f(\mathbf{w}^{k+1})\big) = \arg\min_{\boldsymbol{\alpha}} \lambda\|\boldsymbol{\alpha}\|_p^p + \frac{1}{2\eta}\|\boldsymbol{\alpha} - \mathbf{w}^{k+1}\|^2 + \langle \boldsymbol{\alpha} - \mathbf{w}^{k+1}, \nabla f(\mathbf{w}^{k+1}) \rangle. \quad (31)$$
Similar to the deduction in Proposition 3, we obtain the following inequalities:

$$\lambda\|\boldsymbol{\alpha}^{k+1}\|_p^p + \frac{1}{2\eta}\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2 + \langle \boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}, \nabla f(\mathbf{w}^{k+1}) \rangle \le \lambda\|\mathbf{w}^{k+1}\|_p^p,$$

$$f(\boldsymbol{\alpha}^{k+1}) \le f(\mathbf{w}^{k+1}) + \langle \boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}, \nabla f(\mathbf{w}^{k+1}) \rangle + \frac{L_f}{2}\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2.$$

Thus we reach the conclusion

$$\Phi(\boldsymbol{\alpha}^{k+1}) \le \Phi(\mathbf{w}^{k+1}) - \Big(\frac{1}{2\eta} - \frac{L_f}{2}\Big)\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2.$$
Theorem 2: Let {α^k}_{k∈N} be a sequence generated by Alg. 1. The following assertions hold.
• The sequence {α^{k+1} − w^{k+1}}_{k∈N} is square summable, i.e.,

$$\sum_{k=1}^{\infty} \|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2 < \infty.$$

• The sequence {α^k}_{k∈N} converges to a critical point α* of Φ.
Proof 6.3: We first verify that {α^k − w^k}_{k∈N} is square summable. From Propositions 3 and 4, we deduce that

$$\Phi(\boldsymbol{\alpha}^{k+1}) \le \Phi(\mathbf{w}^{k+1}) \le \Phi(\boldsymbol{\alpha}^{k}) \le \Phi(\mathbf{w}^{k}) \le \cdots \le \Phi(\boldsymbol{\alpha}^{0})$$

holds, so both sequences {Φ(α^k)}_{k∈N} and {Φ(w^k)}_{k∈N} are non-increasing. Then, since both f and g are proper, Φ is bounded below, and the objective sequences {Φ(α^k)}_{k∈N} and {Φ(w^k)}_{k∈N} converge to the same value Φ*, i.e.,

$$\lim_{k\to\infty} \Phi(\boldsymbol{\alpha}^{k}) = \lim_{k\to\infty} \Phi(\mathbf{w}^{k}) = \Phi^*.$$

Moreover, using the assumption that Φ is coercive, both {α^k}_{k∈N} and {w^k}_{k∈N} are bounded and thus have accumulation points. Considering Eq. (30) and the relationship between Φ(α) and Φ(w), we get, for any k ≥ 0,

$$\Big(\frac{1}{2\eta} - \frac{L_f}{2}\Big)\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2 \le \Phi(\mathbf{w}^{k+1}) - \Phi(\boldsymbol{\alpha}^{k+1}) \le \Phi(\boldsymbol{\alpha}^{k}) - \Phi(\boldsymbol{\alpha}^{k+1}). \quad (32)$$

Summing over k, we further have

$$\Big(\frac{1}{2\eta} - \frac{L_f}{2}\Big)\sum_{k=0}^{\infty}\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2 \le \Phi(\boldsymbol{\alpha}^{0}) - \Phi^* < \infty. \quad (33)$$

So far, the first assertion holds.

Next, we prove that lim_{k→∞} α^k = α* is a critical point of Φ. Eq. (33) implies that ‖α^{k+1} − w^{k+1}‖ → 0 as k → ∞, i.e., there exist subsequences {w^{k_j}}_{j∈N} and {α^{k_j}}_{j∈N} sharing the same accumulation point α* as j → ∞. Then, by Eq. (31), we have

$$\lambda\|\boldsymbol{\alpha}^{k+1}\|_p^p + \frac{1}{2\eta}\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|^2 + \langle \boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}, \nabla f(\mathbf{w}^{k+1}) \rangle \le \lambda\|\boldsymbol{\alpha}^{*}\|_p^p + \frac{1}{2\eta}\|\boldsymbol{\alpha}^{*} - \mathbf{w}^{k+1}\|^2 + \langle \boldsymbol{\alpha}^{*} - \mathbf{w}^{k+1}, \nabla f(\mathbf{w}^{k+1}) \rangle.$$

Letting k + 1 = k_j and j → ∞, and taking lim sup of the above inequality, we have lim sup_{j→∞} ‖α^{k_j}‖_p^p ≤ ‖α*‖_p^p. Furthermore, lim inf_{j→∞} ‖α^{k_j}‖_p^p ≥ ‖α*‖_p^p since ‖·‖_p^p is lower semi-continuous, and thus lim_{j→∞} ‖α^{k_j}‖_p^p = ‖α*‖_p^p. Considering the continuity of f, we have lim_{j→∞} f(α^{k_j}) = f(α*). Thus, we obtain

$$\lim_{j\to\infty} \Phi(\boldsymbol{\alpha}^{k_j}) = \lim_{j\to\infty} f(\boldsymbol{\alpha}^{k_j}) + \lambda\|\boldsymbol{\alpha}^{k_j}\|_p^p = \Phi(\boldsymbol{\alpha}^{*}). \quad (34)$$

By the first-order optimality condition of Eq. (31) with k_j = k + 1, we deduce

$$\mathbf{0} \in \partial\lambda\|\boldsymbol{\alpha}^{k_j}\|_p^p + \nabla f(\mathbf{w}^{k_j}) + \frac{1}{\eta}(\boldsymbol{\alpha}^{k_j} - \mathbf{w}^{k_j}).$$

Hence we get

$$\nabla f(\boldsymbol{\alpha}^{k_j}) - \nabla f(\mathbf{w}^{k_j}) - \frac{1}{\eta}(\boldsymbol{\alpha}^{k_j} - \mathbf{w}^{k_j}) \in \partial\Phi(\boldsymbol{\alpha}^{k_j}) \ \Rightarrow\ \|\partial\Phi(\boldsymbol{\alpha}^{k_j})\| \le \Big(L_f + \frac{1}{\eta}\Big)\|\boldsymbol{\alpha}^{k_j} - \mathbf{w}^{k_j}\|. \quad (35)$$

Using the sub-differential of Φ and Eqs. (34) and (35), we finally deduce that 0 ∈ ∂Φ(α*), which means that {α^k}_{k∈N} is subsequence-convergent.

Furthermore, we prove that {α^k}_{k∈N} converges as a whole sequence. Since Φ is a KŁ function, we have

$$\varphi'(\Phi(\boldsymbol{\alpha}^{k+1}) - \Phi(\boldsymbol{\alpha}^{*}))\,\mathrm{dist}(0, \partial\Phi(\boldsymbol{\alpha}^{k+1})) \ge 1.$$

From Eq. (35) we get

$$\varphi'(\Phi(\boldsymbol{\alpha}^{k+1}) - \Phi(\boldsymbol{\alpha}^{*})) \ge \Big(\big(L_f + \tfrac{1}{\eta}\big)\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|\Big)^{-1}.$$

On the other hand, from the concavity of φ and Eqs. (32) and (35), we have

$$\varphi(\Phi(\boldsymbol{\alpha}^{k+1}) - \Phi(\boldsymbol{\alpha}^{*})) - \varphi(\Phi(\boldsymbol{\alpha}^{k+2}) - \Phi(\boldsymbol{\alpha}^{*})) \ge \varphi'(\Phi(\boldsymbol{\alpha}^{k+1}) - \Phi(\boldsymbol{\alpha}^{*}))(\Phi(\boldsymbol{\alpha}^{k+1}) - \Phi(\boldsymbol{\alpha}^{k+2})) \ge \frac{\big(\tfrac{1}{2\eta} - \tfrac{L_f}{2}\big)\|\boldsymbol{\alpha}^{k+2} - \mathbf{w}^{k+2}\|^2}{\big(L_f + \tfrac{1}{\eta}\big)\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|}.$$

For convenience, we define, for all m, n ∈ N and the given α*, the quantities

$$\Delta_{m,n} := \varphi(\Phi(\boldsymbol{\alpha}^{m}) - \Phi(\boldsymbol{\alpha}^{*})) - \varphi(\Phi(\boldsymbol{\alpha}^{n}) - \Phi(\boldsymbol{\alpha}^{*})) \quad \text{and} \quad E := \frac{2L_f\eta + 2}{1 - L_f\eta}.$$
These deduce that

$$\Delta_{k+1,k+2} \ge \frac{\|\boldsymbol{\alpha}^{k+2} - \mathbf{w}^{k+2}\|^2}{E\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|} \ \Rightarrow\ \|\boldsymbol{\alpha}^{k+2} - \mathbf{w}^{k+2}\|^2 \le E\Delta_{k+1,k+2}\|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\| \ \Rightarrow\ 2\|\boldsymbol{\alpha}^{k+2} - \mathbf{w}^{k+2}\| \le E\Delta_{k+1,k+2} + \|\boldsymbol{\alpha}^{k+1} - \mathbf{w}^{k+1}\|,$$

where the last step uses 2√(ab) ≤ a + b.
Summing the above inequality over i = l, …, k yields

$$2\sum_{i=l+1}^{k}\|\boldsymbol{\alpha}^{i+2} - \mathbf{w}^{i+2}\| \le \sum_{i=l+1}^{k}\|\boldsymbol{\alpha}^{i+1} - \mathbf{w}^{i+1}\| + E\sum_{i=l+1}^{k}\Delta_{i+1,i+2} \le \sum_{i=l+1}^{k}\|\boldsymbol{\alpha}^{i+2} - \mathbf{w}^{i+2}\| + \|\boldsymbol{\alpha}^{l+2} - \mathbf{w}^{l+2}\| + E\Delta_{l+2,k+2},$$

where the last inequality holds by the fact that Δ_{m,n} + Δ_{n,r} = Δ_{m,r} for all m, n, r ∈ N. Since φ > 0, we thus have, for any k > l,

$$\sum_{i=l+1}^{k}\|\boldsymbol{\alpha}^{i+2} - \mathbf{w}^{i+2}\| \le \|\boldsymbol{\alpha}^{l+2} - \mathbf{w}^{l+2}\| + E\Delta_{l+2,k+2} \le \|\boldsymbol{\alpha}^{l+2} - \mathbf{w}^{l+2}\| + E\varphi(\Phi(\boldsymbol{\alpha}^{l+2}) - \Phi(\boldsymbol{\alpha}^{*})). \quad (36)$$

Moreover, recalling the conclusion in Proposition 3, we also have

$$\min_i\{C_i\}\sum_{i=l+1}^{k}\|\mathbf{w}^{i+2} - \boldsymbol{\alpha}^{i+1}\|^2 \le \sum_{i=l+1}^{k}\big(\Phi(\boldsymbol{\alpha}^{i+1}) - \Phi(\mathbf{w}^{i+2})\big) \le \sum_{i=l+1}^{k}\big(\Phi(\boldsymbol{\alpha}^{i+1}) - \Phi(\boldsymbol{\alpha}^{i+2})\big) = \Phi(\boldsymbol{\alpha}^{l+2}) - \Phi(\boldsymbol{\alpha}^{k+2}). \quad (37)$$

Combining Eqs. (36) and (37), we easily deduce

$$\sum_{k=1}^{\infty}\|\boldsymbol{\alpha}^{k+1} - \boldsymbol{\alpha}^{k}\| < \infty. \quad (38)$$

Eq. (38) implies that the sequence {α^k}_{k∈N} is a Cauchy sequence and hence convergent. So far, the second assertion holds.

Sampling Masks
In the experiments we include three commonly used types of under-sampling masks, namely the Cartesian pattern in [3], the Radial pattern in [14], and the Gaussian mask in [18]. Fig. 11 visualizes the three kinds of patterns at a unified sampling ratio of 30%.
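For reference, illustrative NumPy generators of the three mask families are sketched below; the exact masks used in [3], [14], [18] differ in detail (e.g., fully sampled low-frequency bands), so these are rough stand-ins.

```python
import numpy as np

def cartesian_mask(shape, ratio, rng):
    """Randomly chosen, fully sampled phase-encoding rows."""
    mask = np.zeros(shape)
    rows = rng.choice(shape[0], int(ratio * shape[0]), replace=False)
    mask[rows, :] = 1.0
    return mask

def radial_mask(shape, n_spokes):
    """Straight spokes through the k-space center."""
    mask = np.zeros(shape)
    ci, cj = (shape[0] - 1) / 2.0, (shape[1] - 1) / 2.0
    r = np.hypot(ci, cj)
    for theta in np.linspace(0.0, np.pi, n_spokes, endpoint=False):
        t = np.linspace(-r, r, 4 * max(shape))
        i = np.clip(np.round(ci + t * np.sin(theta)), 0, shape[0] - 1).astype(int)
        j = np.clip(np.round(cj + t * np.cos(theta)), 0, shape[1] - 1).astype(int)
        mask[i, j] = 1.0
    return mask

def gaussian_mask(shape, ratio, rng, sigma=0.25):
    """Random samples drawn with a Gaussian density favoring low frequencies."""
    yy, xx = np.meshgrid(np.linspace(-1, 1, shape[0]),
                         np.linspace(-1, 1, shape[1]), indexing="ij")
    p = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    p *= ratio * p.size / p.sum()        # rescale so the mean hits the target ratio
    return (rng.random(shape) < np.clip(p, 0.0, 1.0)).astype(float)

rng = np.random.default_rng(0)
masks = {"Cartesian": cartesian_mask((256, 256), 0.3, rng),
         "Radial":    radial_mask((256, 256), 60),
         "Gaussian":  gaussian_mask((256, 256), 0.3, rng)}
```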
Fig. 11. Three types of sampling patterns (Cartesian, Radial, Gaussian).

REFERENCES

[1] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, "Compressed sensing MRI," IEEE Sig. Proc. Mag., vol. 25, no. 2, pp. 72–82, 2008.
[2] K. T. Block, M. Uecker, and J. Frahm, "Undersampled radial MRI with multiple coils: iterative image reconstruction using a total variation constraint," Magn. Reson. Med., vol. 57, no. 6, pp. 1086–1098, 2007.
[3] X. Qu, D. Guo, B. Ning, Y. Hou, Y. Lin, S. Cai et al., "Undersampled MRI reconstruction with patch-based directional wavelets," J. Magn. Reson. Imag., vol. 30, no. 7, pp. 964–977, 2012.
[4] M. Lustig, D. Donoho, and J. M. Pauly, "Sparse MRI: The application of compressed sensing for rapid MR imaging," Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, 2007.
[5] S. Gho, Y. Nam, S. Zho, E. Y. Kim, and D. Kim, "Three dimension double inversion recovery gray matter imaging using compressed sensing," J. Magn. Reson. Imag., vol. 28, no. 10, pp. 1395–1402, 2010.
[6] S. Ravishankar and Y. Bresler, "MR image reconstruction from highly undersampled k-space data by dictionary learning," IEEE Trans. Med. Imag., vol. 30, no. 5, p. 1028, 2011.
[7] S. D. Babacan, X. Peng, X. Wang, M. N. Do, and Z. Liang, "Reference-guided sparsifying transform design for compressive sensing MRI," in Proc. IEEE Eng. Med. Biol., 2011, pp. 5718–5721.
[8] Z. Zhan, J. Cai, D. Guo, Y. Liu, Z. Chen, and X. Qu, "Fast multiclass dictionaries learning with geometrical directions in MRI reconstruction," IEEE TBME, vol. 63, no. 9, pp. 1850–1861, 2016.
[9] X. Qu, Y. Hou, F. Lam, D. Guo, J. Zhong, and Z. Chen, "Magnetic resonance image reconstruction from undersampled measurements using a patch-based nonlocal operator," Med. Imag. Anal., vol. 18, no. 6, pp. 843–856, 2014.
[10] E. M. Eksioglu, "Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI," J. Math. Imag. Vis., vol. 56, no. 3, pp. 430–440, 2016.
[11] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang et al., "Accelerating magnetic resonance imaging via deep learning," in Proc. IEEE 13th Int. Symp. Biomed. Imag. (ISBI), 2016, pp. 514–517.
[12] D. Lee, J. Yoo, and J. C. Ye, "Deep residual learning for compressed sensing MRI," in Proc. IEEE 14th Int. Symp. Biomed. Imag. (ISBI), 2017, pp. 15–18.
[13] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, and D. Rueckert, "A deep cascade of convolutional neural networks for dynamic MR image reconstruction," IEEE Trans. Med. Imag., vol. 37, no. 2, pp. 491–503, 2018.
[14] Y. Yang, J. Sun, H. Li, and Z. Xu, "Deep ADMM-Net for compressive sensing MRI," in Advances in Neural Information Processing Systems, 2016, pp. 10–18.
[15] S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein, "Unrolled optimization with deep priors," arXiv preprint arXiv:1705.08041, 2017.
[16] T. M. Quan, T. Nguyen-Duc, and W. K. Jeong, "Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss," IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1488–1497, 2018.
[17] J. Yang, Y. Zhang, and W. Yin, "A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data," IEEE J. Sel. Top. Sig. Proc., vol. 4, no. 2, pp. 288–297, 2010.
[18] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye et al., "DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction," IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1310–1321, 2017.
[19] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing et al., "Deep generative adversarial neural networks for compressive sensing MRI," IEEE Trans. Med. Imag., vol. 38, no. 1, pp. 167–179, 2018.
[20] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, "SENSE: sensitivity encoding for fast MRI," Magn. Reson. Med., vol. 42, no. 5, pp. 952–962, 1999.
[21] L. Ying and J. Sheng, "Joint image reconstruction and sensitivity estimation in SENSE (JSENSE)," Magn. Reson. Med., vol. 57, no. 6, pp. 1196–1202, 2007.
[22] M. A. Griswold, M. Blaimer, F. Breuer, R. M. Heidemann, M. Mueller, and P. M. Jakob, "Parallel magnetic resonance imaging using the GRAPPA operator formalism," Magn. Reson. Med., vol. 54, no. 6, pp. 1553–1556, 2005.
[23] M. Lustig and J. M. Pauly, "SPIRiT: iterative self-consistent parallel imaging reconstruction from arbitrary k-space," Magn. Reson. Med., vol. 64, no. 2, pp. 457–471, 2010.
[24] D. Liang, B. Liu, J. Wang, and L. Ying, "Accelerating SENSE using compressed sensing," Magn. Reson. Med., vol. 62, no. 6, pp. 1574–1584, 2009.
OURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 13 [25] Y. Chang, D. Liang, and L. Ying, “Nonlinear grappa: A kernel approachto parallel MRI reconstruction,”
Magn. Reson. Med. , vol. 68, no. 3, pp.730–740, 2012.[26] M. Murphy, M. Alley, J. Demmel, K. Keutzer, S. Vasanawala, andM. Lustig, “Fast \ ell -spirit compressed sensing parallel imaging MRI:Scalable parallel implementation and clinically feasible runtime,” IEEETrans. Med. Imag. , vol. 31, no. 6, pp. 1250–1262, 2012.[27] P. J. Shin, P. E. Larson, M. A. Ohliger, M. Elad, J. M. Pauly, D. B.Vigneron et al. , “Calibrationless parallel imaging reconstruction basedon structured low-rank matrix completion,”
Magn. Reson. Med. , vol. 72,no. 4, pp. 959–970, 2014.[28] J. Rajan, J. Van Audekerke, A. Van der Linden, M. Verhoye, andJ. Sijbers, “An adaptive non local maximum likelihood estimationmethod for denoising magnetic resonance images,” in
Proc. IEEE 9thInt. Symp. Biomed. Imag. (ISBI) , 2012, pp. 1136–1139.[29] H. Gudbjartsson and S. Patz, “The rician distribution of noisy MRI data,”
Magn. Reson. Med. , vol. 34, no. 6, pp. 910–914, 1995.[30] N. Wiest-Daessl´e, S. Prima, P. Coup´e, S. P. Morrissey, and C. Barillot,“Rician noise removal by non-local means filtering for low signal-to-noise ratio MRI: applications to DT-MRI,” in
Proc. Int. Conf. Med.Image Comput. Comput.-Assist. , 2008, pp. 171–179.[31] J. V. Manj´on, P. Coup´e, L. Mart´ı Bonmat´ı, D. L. Collins, and M. Robles,“Adaptive non-local means denoising of MR images with spatiallyvarying noise levels,”
J. Magn. Reson. Imag , vol. 31, no. 1, pp. 192–203,2010.[32] J. C. Wood and K. M. Johnson, “Wavelet packet denoising of magneticresonance images: importance of rician noise at low snr,”
Magn. Reson.Med. , vol. 41, no. 3, pp. 631–635, 1999.[33] R. D. Nowak, “Wavelet-based rician noise removal for magnetic reso-nance imaging,”
IEEE Trans. Imag. Proc. , vol. 8, no. 10, pp. 1408–1419,1999.[34] L. Chen and T. Zeng, “A convex variational model for restoring blurredimages with large rician noise,”
J. Math. Imag. Vis. , vol. 53, no. 1, pp.92–111, 2015.[35] J. Liu and Z. Zhao, “Variational approach to second-order dampedhamiltonian systems with impulsive effects,”
J. Non. Sci. Appl. , vol. 9,no. 6, pp. 3459–3472, 2016.[36] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep cnn denoiserprior for image restoration,” in
CVPR , vol. 2, 2017.[37] M. Uecker, P. Lai, M. J. Murphy, P. Virtue, M. Elad, J. M. Pauly et al. ,“Espiritan eigenvalue approach to autocalibrating parallel MRI: wheresense meets grappa,”
Magn. Reson. Med. , vol. 71, no. 3, pp. 990–1001,2014.[38] L. El Gueddari, C. Lazarus, H. Carri´e, A. Vignaud, and P. Ciuciu,“Self-calibrating nonlinear reconstruction algorithms for variable densitysampling and parallel reception MRI,” in
IEEE 10th Sensor Array andMultichannel Signal Processing Workshop , 2018, pp. 415–419.[39] M. A. Bernstein, S. B. Fain, and S. J. Riederer, “Effect of windowingand zero-filled reconstruction of MRI data on spatial resolution andacquisition strategy,”
J. Magn. Reson. Imag , vol. 14, no. 3, pp. 270–280, 2001.[40] R. G. Baraniuk, “Compressive sensing [lecture notes],”
IEEE Sig. Proc.Mag. , vol. 24, no. 4, pp. 118–121, 2007.[41] E. M. Eksioglu and A. K. Tanc, “Denoising amp for MRI reconstruction:BM3D-AMP-MRI,”
SIAM Journal on Imaging Sciences , vol. 11, no. 3,pp. 2090–2109, 2018.[42] A. Buades, B. Coll, and J.-M. Morel, “A review of image denoisingalgorithms, with a new one,”
Multiscale Modeling & Simulation , vol. 4,no. 2, pp. 490–530, 2005.[43] J. V. Manj´on, J. Carbonell Caballero, J. J. Lull, G. Garc´ıa Mart´ı,L. Mart´ı Bonmat´ı, and M. Robles, “MRI denoising using non-localmeans,”
Med. Image Anal. , vol. 12, no. 4, pp. 514–523, 2008.[44] S. Aja Fern´andez, M. Niethammer, M. Kubicki, M. E. Shenton, andC. F. Westin, “Restoration of dwi data using a rician lmmse estimator,”
IEEE Trans. Med. Imag. , vol. 27, no. 10, pp. 1389–1403, 2008.[45] A. Foi, “Noise estimation and removal in MR imaging: The variance-stabilization approach.” in
Proc. IEEE 8th Int. Symp. Biomed. Imag.(ISBI) , 2011, pp. 1809–1814.[46] R. Liu, Y. Zhang, S. Cheng, X. Fan, and Z. Luo, “A theoreticallyguaranteed deep optimization framework for robust compressive sensingmri,” in