[PDF] MRI Reconstruction Using Deep Bayesian Estimation

Abstract

Purpose: To develop a deep learning-based Bayesian inference for MRI reconstruction. Methods: We modeled the MRI reconstruction problem with Bayes's theorem, following the recently proposed PixelCNN++ method. The image reconstruction from incomplete k-space measurement was obtained by maximizing the posterior possibility. A generative network was utilized as the image prior, which was computationally tractable, and the k-space data fidelity was enforced by using an equality constraint. The stochastic backpropagation was utilized to calculate the descent gradient in the process of maximum a posterior, and a projected subgradient method was used to impose the equality constraint. In contrast to the other deep learning reconstruction methods, the proposed one used the likelihood of prior as the training loss and the objective function in reconstruction to improve the image quality. Results: The proposed method showed an improved performance in preserving image details and reducing aliasing artifacts, compared with GRAPPA, ℓ 1 -ESPRiT, and MODL, a state-of-the-art deep learning reconstruction method. The proposed method generally achieved more than 5 dB peak signal-to-noise ratio improvement for compressed sensing and parallel imaging reconstructions compared with the other methods. Conclusion: The Bayesian inference significantly improved the reconstruction performance, compared with the conventional ℓ 1 -sparsity prior in compressed sensing reconstruction tasks. More importantly, the proposed reconstruction framework can be generalized for most MRI reconstruction scenarios.

Full PDF

MMRI Reconstruction Using Deep Bayesian Inference

GuanXiong Luo , Na Zhao , Wenhao Jiang , Peng Cao ∗ Department of Diagnostic Radiology, The University of Hong Kong,Hong Kong Department of Statistics and Actuarial Science, The University of HongKong ∗ Corresponding to:Peng CaoDepartment of Diagnostic Radiology, The University of Hong Kong, Hong KongAddress: 5 Sassoon Road, Pok Fu Lum, Hong KongPhone: 0852-53761014Email: [email protected]

Short Running Title:

Deep Bayesian MRI Reconstruction

Key words: generative network, Bayesian inference, deep learning reconstruction, com-pressed sensing, parallel imagingTotal Words: 3900Number of Figures: 8 a r X i v : . [ c s . C V ] S e p bstractPurpose: To develop a deep learning-based Bayesian inference for MRI recon-struction.

Methods:

We modeled the MRI reconstruction problem with Bayes’s theorem,following the recently proposed PixelCNN++ method. The image reconstructionfrom incomplete k-space measurement was obtained by maximizing the posteriorpossibility. A generative network was utilized as the image prior, which was compu-tationally tractable, and the k-space data ﬁdelity was enforced by using an equal-ity constraint. The stochastic backpropagation was utilized to calculate the de-scent gradient in the process of maximum a posterior, and a projected subgradientmethod was used to impose the equality constraint. In contrast to the other deeplearning reconstruction methods, the proposed one used the likelihood of prior asthe training loss and the objective function in reconstruction to improve the imagequality.

Results:

The proposed method showed an improved performance in preserving im-age details and reducing aliasing artifacts, compared with GRAPPA, (cid:96) -ESPRiT,and MODL, a state-of-the-art deep learning reconstruction method. The proposedmethod generally achieved more than 5 dB peak signal-to-noise ratio improvementfor compressed sensing and parallel imaging reconstructions compared with theother methods. Conclusion:

The Bayesian inference signiﬁcantly improved the reconstruction per-formance, compared with the conventional (cid:96) -sparsity prior in compressed sensingreconstruction tasks. More importantly, the proposed reconstruction frameworkcan be generalized for most MRI reconstruction scenarios. Introduction

In compressed sensing MRI reconstruction, the commonly used analytical regularizationsuch as (cid:96) regularization can ensure the convergence of the iterative algorithm and improveMR image quality [1]. The conventional iterative reconstruction algorithm with analyticalregularization has an explicit mathematical deduction in gradient descent, which ensuresthe convergence of the algorithm to a local or global optimal and the generalizability, de-pending on the convexity of the regularization function. Besides, the dictionary learningis an extension of analytical regularization, providing an improvement over the (cid:96) regu-larization in speciﬁc applications [2]. The study of analytically regularized reconstructionmainly focused on choosing the appropriate regularization function and parameters forminimizing the reconstruction error. As an extension of analytical regularization, thedeep learning reconstruction was employed as an unrolled iterative algorithm for solvingthe regularized optimization or used as a substitute for analytic regularization [3, 4, 5, 6].With the advances of deep learning methodology, research started shifting the paradigmto structured feature representation of MRI, such as cascade, deep residual, and generativedeep neural networks [3, 4, 5, 6]. Especially, the method proposed in [3] recast the com-pressed sensing reconstruction into a specially designed neural network that still partlyimitated the analytical data ﬁdelity and regularization terms. In that study, the analyt-ical regularization term was replaced with convolutional layers and a specially designedactivation function [3]. These deep learning methods may show improved performance insome predetermined acquisition settings or pre-trained imaging tasks. However, they alsolack ﬂexibility when used with changes in MRI under-sampling scheme, the number ofradio-frequency coils, and matrix size or spatial resolution. Such restriction is caused bythe mixing of k-space data ﬁdelity and the regularization terms in neural network imple-mentations. Therefore, it was preferable to separate the k-space data ﬁdelity and neuralnetwork-based regularization for improving the ﬂexibility in changing MRI acquisitionconﬁgurations.This study applied Bayesian inference to model the MRI reconstruction problem, andthe statistical representation of an MRI database was used as a prior model. In Bayesianinference, the prior model is required to be computationally scalable and tractable [7, 8].The scalability of the prior model indicates the likelihood of it being used as a meansfor measuring the image quality [7, 8]. The tractability of the prior model means thegradient that facilitates the maximization of posterior distribution can be calculated bystochastic backpropagation for the model [7, 8]. In such Bayesian inference, the imageto be reconstructed was referred to as the parameters of the Bayesian model, which wasconditioned on the measured k-space data (as the posterior). Bayes’s theorem expressedhe posterior as a function of the k-space data likelihood and the image prior. For theimage prior, Refs [9, 10] proposed a generative deep learning model, providing a tractableand scalable likelihood. In those studies, the image prior model was written as themultiplication of the conditional probabilities those indicated pixel-wise dependenciesof the input image. The k-space data likelihood described how the measured k-spacedata was computed from a given MR image. The relationship between k-space data andMR image can be described, using the well-known MRI encoding matrix in an equalityconstraint [1, 11]. With such computationally scalable and tractable prior model, themaximum a posterior can serve as an eﬀective estimator [8] for the high dimensionalimage reconstruction problem tackled in this study. To summarize, the Bayesian inferencefor MRI reconstruction had two separate models: the k-space likelihood model that wasused to encourage data consistency and the image prior model that was used to exploitknowledge learned from an MRI database.This paper presented a generic and interpretable deep learning-based reconstructionframework, using Bayesian inference. It employed a generative network as the MR im-age prior model. The proposed framework was capable of exploiting the MR imagedatabase with the prior model, regardless of the changes in MR imaging acquisition set-tings. Also, the reconstruction was achieved by a series of inferences those employedthe maximum likelihood of posterior with the image prior, i.e., applying the Bayesianinference repeatedly. The reconstruction iterated over the data ﬁdelity enforcement ink-space and the image reﬁnement, using the Bayesian inference. During the iteration,the projected sub-gradient algorithm was used to maximize the posterior. The methodis theoretically described, which was adapted from the methodology proposed by oth-ers [9], and then demonstrated in diﬀerent MRI acquisition scenarios, including parallelimaging, compressed sensing, and non-Cartesian reconstructions. The robustness and thereproducibility of the algorithm were also experimentally validated. Theory

The proposed method applied a generative neural network, as a data-driven MRI prior,to an MRI reconstruction method. This section contained an MRI reconstruction methodusing Bayes (cid:48) theorem and a generative neural network-based MRI prior model, a pixel-wise joint probability distribution for images, using the PixelCNN++ [9].

RI reconstruction using Bayes (cid:48) theorem

With Bayes (cid:48) theorem, one could write the posterior as a product of likelihood and prior: f ( x | y ) = f ( y | x ) g ( x ) f ( y ) ∝ f ( y | x ) g ( x ) (1)where f ( y | x ) was probability of the measured k-space data y for a given image x ,and g ( x ) was the prior model of the image. The image reconstruction was achieved byexploring the posterior f ( x | y ) with an appropriate estimator. The maximum a posteriorestimation (MAP) could provide the reconstructed image ˆ x that was given by:ˆ x MAP ( y ) = arg max x f ( x | y ) = arg max x f ( y | x ) g ( x ) (2)Following the PixelCNN++ [9], a deep neural network model, which was trained withMR image database, was used to approximate the prior g ( x ). Prior model for MR images

In this study, a deep autoregressive network [12] was used as the prior model. This deepneural network served as a generative autoencoder that provided a hierarchic represen-tation of the input image. The prior model predicted a mixture distribution of the inputimage [9]. For MRI reconstruction, we adopted the prior model in the PixelCNN++ [9],except that t he number of image channels was changed from three (i.e., RGB channelsfor color image) to two (i.e., real and imaginary parts for MR image). For each imagepixel, the variable ν had a continuous distribution that gave representation to real orimaginary signal intensity. Like in the VAE and pixelCNN++ [9, 13], the distribution of ν was a mixture of the logistic distribution, given by ν ∼ K (cid:88) i =1 π i logistic( µ i , s i ) . (3)Here, π i was the mixture indicator, µ i and s i were the mean and scale of logistic dis-tribution, respectively. Then the probability on each observed pixel ν was computed as[9] P ( ν ; π, µ, s ) = K (cid:88) i =1 π i [ σ ( ν + 0 . − µ i ) /s i − σ ( ν − . − µ i ) /s i ] (4)where σ was the logistic sigmoid function. Furthermore, in [9, 10], each pixel was depen-dent on all previous pixels up and to the left in an image, as shown in Figure 1a. Theconditional distribution of the subsequent pixel (Re( x i,j ) , Im( x i,j )) at position ( i, j ) wasiven by [9] p ( x i,j | C i,j ) = p (Re( x i,j ) , Im( x i,j ) | C i,j ) = P (Re( x i,j ) | µ Re ( C i,j ) , s Re ( C ij )) × (5) P (Im( x i,j ) | µ Im ( C i,j , Re( x i,j )) , s Im ( C i,j )) µ Im ( C i,j , Re( x i,j )) = µ Im ( C i,j ) + α ( C i,j ) Re( x i,j ) , (6)where the C i,j = { x i − ,j , x i − ,j , ..., x , } denoted the context information which was com-prised of the mixture indicator and the previous pixels as showed in Figure 1a, α was thecoeﬃcient related to mixture indicator and previous pixels. p (Re( x i,j ) , Im( x i,j ) | C i,j ) wasalso a joint distribution for both real and imaginary channels. The real part of the ﬁrstpixel, i.e., x , in Figure 1a, was predicted by a mixture of logistics as described in Eq. 3.This deﬁnition assumed that the mean of mixture components of the imaginary channelwas linearly dependent on the real channel. In this study, the number of mixture com-ponents was 10. In this model, mixture indicator was shared between two channels. The n × n image could be considered as an vectorized image x = ( x (1) , ..., x ( n ) ) by stackingpixels from left to right and up to bottom of one another, i.e., x (1) = x , , x (2) = x , , ..., and x ( n ) = x n,n . The joint distribution of the image vector could be expressed as follow-ing [9]: p ( x ; π, µ, s, α ) = p ( x (1) ) n (cid:89) i =2 p ( x ( i ) | x (1) , .., x ( i − ) (7) π, µ, s, α were the parameters of mixture distribution for each pixel intensity. The gen-erative network PixelCNN++ was expected to predict the joint probability distributionof all pixels in the input image [9], as illustrated in Figure 1b. Therefore, the network NET ( x , Θ) was trained by maximizing the likelihood in Eq. 7, as the training loss wasgiven by ˆΘ = arg max Θ p ( x ; NET ( x , Θ)) , (8)where Θ was the trainable parameter within the network. After training, the networkcould be used as the image prior. Here, we deﬁned the prior model g ( x ) as g ( x ) = p ( x ; NET ( x , ˆΘ)) (9)To summarize, a prior model of x was deﬁned in Eqs. from 2 to 9 that could be con-sidered as as data-driven model, utilizing the knowledge learned from an image database.Such prior model was computationally tractable, as was described in PixelCNN++ [9]. mage reconstruction by MAP The measured k-space data y was given by y = A x + ε , (10)where A was the encoding matrix, x was MR image, and ε was the noise. The matrix A consisted of Fourier matrix, sampling trajectory, and coil sensitivity map. SubstitutingEq. 9 into the log-likelihood for Eq. 2 yieldedˆ x MAP ( y ) = arg max x log f ( y | x ) + log p ( x | NET ( x , ˆΘ)) (11)From the data model, the log-likelihood term for f ( y | x ) had less uncertainty, consideringthe MR imaging principles, for a given image x , the probability for k-space, y , i.e., f ( y | x ) when y = A x + ε , was close to a constant with the uncertainty from noise thatwas irrelevant to x . Hence, Eq. 11 could be rewritten asˆ x MAP ( y ) = arg max x log p ( x | NET ( x , ˆΘ)) s . t . y = A x + ε (12)The equality constraint for data consistency was the result of eliminating the ﬁrst log-likelihood term in Eq. 11. The projected subgradient method was used to solve the equal-ity constrained problem [12, 14]. In [14], authors proposed a stochastic backpropagationmethod for computing gradients through random variables for deep generative models.In PixelCNN++, the stochastic backpropagation provided the subgradient ∇ x log g ( x ),where g ( x ) = p ( x | NET ( x , ˆΘ)), for minimizing the log-likelihood in Eq. 12. We empiri-cally found that the dropout (which applied to ˆΘ) was necessary, when using the gradientto update x in Eq. 12 [15]. To summarize, the MAP-based MRI reconstruction had thefollowing iterative steps:1. Get the descent direction ∇ x ( k ) log g ( x ( k ) )2. Pick up a step size α k = 1 /k

3. Update z ( k +1) = x ( k ) − α k ∇ x ( k ) log g ( x ( k ) )4. Projection x ( k +1) = arg max x ∈ X (cid:107) x − z ( k +1) (cid:107) The projection of z onto { x | y = A x + ε } was given by P ( z ) = z − A ∗ ( AA ∗ ) − ( A z − y ) . (13)herefore, the generative network as a prior model was incorporated into the reconstruc-tion of x through the Bayesian inference based on MAP. Methods

MRI data and pre-processing

Both knee and brain MRI data were used to test the reconstruction performance of theproposed method. The knee MRI data (multi-channel k-space data, 973 scans) weredownloaded from fastMRI reconstruction database [16]. As such, NYU fastMRI investi-gators provided data but did not participate in analysis or writing of this report. A listingof NYU fastMRI investigators, subject to updates, can be found at: fastmri.med.nyu.edu.The primary goal of fastMRI is to test whether machine learning can aid in the recon-struction of medical images. The knee data had two contrast weightings: proton-densitywith and without fat suppression (PDFS and PD). Scan parameters included 15-channelknee coil and 2D multi-slice turbo spin-echo (TSE) acquisition, and other settings whichcould be found in Ref. [16].For brain MRI, we collected 2D multi-slice T1 weighted, T2 weighted, T2 weightedFLAIR, and T2 ∗ weighted brain images from 16 healthy volunteers examined with clinicalstandard-of-care protocols. All brain data were acquired using our 3T MRI (Philips,Achieva), and an eight-channel brain RF coil. T1 weighted, T2 weighted, and T2 weightedFIAIR images were all acquired with TSE readout. Meanwhile, T2 ∗ -weighted imageswere obtained using a gradient-echo sequence. Brain MRI parameters for four contrastweightings were listed in Table 1.Training images were reconstructed from multi-channel k-space data without under-sampling. Then, these image datasets after coil combination were scaled to a magnituderange of [ − ,

1] and resized to an image size of 256 × ×

128 was the largest size that our 4-GPU server could handle.Hence, the original 256 ×

256 images were resized into 128 ×

128 low-resolution imagesby cropping in k-space for knee MRI. For brain MRI, we split each raw 256 ×

256 imageinto four 128 ×

128 image patches, before fed into the network for training. Real andimaginary parts of all 2D images were separated into two channels when inputted intothe neural network. For knee MRI, 15541 images were used as the training dataset, and100 images were used for testing. For brain MRI, 1300 images were used as the trainingdataset, and 100 images were used for testing. eep neural network

The PixelCNN++ was modiﬁed from the code in https://github.com/openai/pixel-cnn.We implemented the reconstruction algorithm using Python, as explained in Eq. 13 andAppendix. With the trained prior model, we implemented the iterative reconstructionalgorithm for maximizing the posterior while enforcing the k-space data ﬁdelity (as ex-plained in Appendix and Fig. 1)c. Only two deep learning models were trained andutilized, one for knee MRI with two contrast weightings, and another for brain MRI withfour contrast weightings. These two models can support all experiments performed inthis study with variable undersampling patterns, coil sensitivity maps, channel numbers,image sizes, and trajectory types. Our networks were trained in Tensorﬂow software, andon four NVIDIA RTX-2080Ti graphic cards. Other hyperparameters were 500 epochs,batch size = 4, and Adam optimizer. It took about ﬁve days to train the network forknee dataset and two days for brain dataset under the above-mentioned conﬁguration.

Parallel imaging and (cid:96) or (cid:96) regularization driven reconstruction The GRAPPA reconstruction was performed with a block size of 4 and 20 central k-space lines as the auto-calibration area [17]. We simulated GRAPPA accelerations withundersampling factors from 2 to 4. The representative undersampling masks were shownin Supplementary Figure 1. We chose l -ESPIRIT and MODL [6] as baseline methodsfor comparison. They were analytical regularizations. The l -ESPIRIT exploited thesparsity of image, and the MODL was a deep learning method for compressed sensingreconstruction, trained via minimizing l reconstruction error. In the l -ESPIRiT recon-struction, we set the l regularization parameter to be 0.01, using BART software. Onereason for choosing MODL was that it supported the coil sensitivity map for applyingparalleling imaging. We followed settings in Ref [6] when training MODL to reconstructthe undersampled knee data. The only diﬀerence was the k-space mask in Ref [6] was 2Dundersampled, while in the current study, the 1D undersampling was applied. The central20 k-space lines were sampled which account for 7% of the full k-space of one 256 × ×

256 matrix size were reconstructed,using the prior model in Eq. 9 that was trained by 128 ×

128 images or image patches.During inference, the 256 ×

256 image was split into four 128 ×

128 patches for applyingthe prior model, as shown in Figure 1c. After updating s ( k +1) , four patches for one imagewere concatenated to form an image with the original size of 256 × { x | y = A x + ε } in Eq. 13. The detailed algorithm was presented inhe Appendix. Non-Cartesian k-space acquisition

In this experiment, spiral sampled k-space from the acquired T2 ∗ -weighted k-space datawas simulated. The method proposed in Ref [18] was used to design the spiral trajectory.The full k-space coverage required 24 spiral interleaves for the spatial resolution used inthis study. The spiral trajectory was shown in Supplementary Figure 1. Besides, theimplementation of non-uniform fast Fourier transform was based on the method in Ref[19]. For comparison, we used the iterative SENSE, i.e., conjugate gradient SENSE (CGSENSE), proposed in Ref [20], as a baseline method. Results

Parallel imaging

Figures 2 and 3 show the comparison of knee and brain MRI reconstructed using GRAPPAand the proposed method. The proposed method had an improved performance in re-covering brain and knee image details and reducing the aliasing artifacts, compared withGRAPPA. As expected, parallel imaging ampliﬁed the noise in the low coil sensitivityregions and along the undersampled dimension. On the other hand, error maps demon-strated in Figure 2 and 3 showed that the proposed method eﬀectively eliminated the noiseampliﬁcation and the aliasing artifacts. Table 2 presents the comparison of GRAPPAreconstruction and the proposed method for knee (N = 100) and brain (N = 100) MRItesting images. With the increase of the undersampling factor, the peak signal to noiseratio (PSNR) of the proposed method decreased less, compared with that of GRAPPA.In addition, with acceleration factor R = 2 in brain MRI, the proposed method showed8 dB more improvement in the PSNR than GRAPPA.

Compressed sensing reconstruction

In Figures 4 and 5, the (cid:96) -ESPIRiT had caused apparent blurring in the reconstructedimages for both knee and brain MRI data. Both the (cid:96) -ESPIRiT and MODL methodscaused residual aliasing artifacts. Meanwhile, the proposed reconstruction recovered mostanatomical structures and sharp boundaries in knee and brain MR images, compared withthose from (cid:96) -ESPIRiT and MODL reconstructions, as shown on error maps in Figure 4and 5. Tables 3 summarized reconstruction results using (cid:96) regularization, MODL, andhe proposed method. The proposed method generally showed more than 5 dB PSNRimprovement compared with (cid:96) -ESPIRiT and MODL. Preliminary result in non-Cartesian MRI reconstruction and quan-titative susceptibility mapping (QSM)

In this study, we used a T2 ∗ weighted gradient-echo images to simulate the spiral k-spacedata with 4-fold acceleration. The reconstructed images from the CG SENSE and theproposed method were compared. The proposed method showed apparent improvementregarding the aliasing artifact reduction and the preservation of T2 ∗ contrast betweengray matter and white matter. The proposed method also showed a slight denoisingeﬀect on the reconstructed image compared with the ground truth. Noted that the samedeep learning model used in the previous Cartesian k-space reconstruction experiments inFigures 3 and 5 was applied to spiral reconstruction, without the need of re-training thedeep learning model. Figure 7 shows the preliminary result from the proposed acceleratedreconstruction in QSM with 4-fold acceleration. Noted that the same deep learningmodel used in the previous brain experiments was applied to this experiment, with phaseinformation preserved in all reconstructed images. The proposed deep learning methodalso showed an apparent de-noising eﬀect on QSM maps, while still preserved the majorphase contrast even with high acceleration. Discussion

The proposed method can reliably and consistently recover the nearly aliased-free imageswith relatively high acceleration factors. Meanwhile, as expected, the increase of imagesmoothing with high acceleration factors was noticed, reﬂecting the loss of intrinsic res-olution. The estimated image from the maximum of the posterior can not guarantee thefull recovery of the image details, i.e., PSNR >

40 dB for a full recovery. However, atmodest acceleration, the reconstruction from a maximum of posterior showed the success-ful reconstruction of the detailed anatomical structures, such as vessels, cartilage, andmembranes in-between muscle bundles.In this study, the results demonstrated the successful reconstruction of high-resolutionimage (i.e., 256 ×

256 matrix) with low-resolution prior (i.e., trained with 128 ×

128 ma-trix), conﬁrming the feasibility of reconstructing images of diﬀerent sizes without theneed for retraining the prior model. The prior model was trained by 128 x 128 images; itwas still valid and applicable for the reconstruction of a high-resolution image. The pro-posed methods provided more than 8 dB improvement over the conventional GRAPPAeconstruction at the 4-fold acceleration in knee MRI. Besides, in contrast to other deeplearning-based methods, which focused on the (cid:96) loss, the likelihood that was condi-tioned by pixel-wise dependencies of the whole image showed an improved representationcapacity, leading to a higher reconstruction accuracy. The applicability of the proposedmethod in the patch-based reconstruction also suggested its high representation capacityand ﬂexibility. Even when the inputs were image patches, the prior model could stillrecover the whole image.The projected subgradient approach to solving Eq. 12 was computationally inex-pensive but converged slowly, as shown in Figure 8. For a random initialization, thealgorithm needed about 500 iterations to converge with a ﬁxed step size. Meanwhile, wenoticed that if the zero-ﬁlled-reconstructed image was used for initialization, the num-ber of iterations required could be reduced to 100. Besides, the decay of residual normstopped earlier than that of the log-likelihood, i.e., when the residual norm stopped de-caying, the likelihood can still penalize the error. This evidence indicated that usingthe residual norm as the (cid:96) ﬁdelity alone was sub-optimal, and the deep learning-basedstatistical regularization can lead to a better reconstruction result compared with the (cid:96) ﬁdelity. Deep learning-based statistical regularization in the proposed method outper-formed other conventional regularizations trained by image-level (cid:96) loss. (cid:96) loss did notgive an explicit description of the relationship amid all pixels in the image, while thelikelihood used in conjunction with the proposed image prior model was conditioned bythe pixel-wise relationship and demonstrated superior performance compared with theconventional methods, under the current experimental setting.Furthermore, the demonstrated image prior can be extended to a more elaboratedform with clinical information, such as organs and contrast types, as the model inputs.For example, one could input the image prior with labels such as brain or knee. Thenhypothetically, the image prior can be designed as a conditional probability for the givenimage label. In other words, the posterior would be dependent on both the k-space dataand image labels. Moreover, the MR pulse sequence parameters could serve as imagelabels for the prior, such as echo time and repetition time. In short, the prior model canbe used to describe clinical information or acquisition parameters. This setting opens up afuture direction on a more elaborated image prior, incorporating clinical information andMR sequence parameters, for more intelligent image representation and pattern detection.In this study, the generative network solely served as an image prior model, in contrastto how neural network was used in other deep learning-based reconstructions [3, 4, 5, 6].Speciﬁcally, in previous studies [3, 4, 5, 6], embedding k-space ﬁdelity term into the net-work made the algorithm inﬂexible because image prior and undersampling artifacts weremixed during the training. The proposed method used the standard analytical term fordelity enforcement; therefore, its ﬂexibility was comparable to the traditional optimiza-tion algorithm, such as (cid:96) regularization. Due to unavoidable changes of the encodingscheme, e.g., the image size and the RF coils during MRI experiment in practice, it wasessentially needed to separate the learned component (the image prior) from the encodingmatrix used in the ﬁdelity term in reconstruction. Besides, the proposed method showedthe feasibility of incorporating the coil sensitivity information in the ﬁdelity term, whichenabled the changeable encoding scheme without the need of retraining the model [20, 21].In summary, the separation of the image prior and the encoding matrix embedded in theﬁdelity term made the proposed method more ﬂexibility and generalizable compared withconventional deep learning approaches. Conclusion

In summary, this study presented the application of Bayesian inference in MR imagingreconstruction with the deep learning-based prior model. We demonstrated that the deepMRI prior model was a computationally tractable and eﬀective tool for MR image recon-struction. The Bayesian inference signiﬁcantly improved the reconstruction performanceover that of conventional (cid:96) sparsity prior in compressed sensing. More importantly, theproposed reconstruction framework was generalizable for most reconstruction scenarios. Acknowledgment

We thank Dr. Hongjiang Wei for providing the QSM processing code.able 1: The scan parameters of diﬀerent weightings used in brain MRI experiments.Type Dimension Voxel(mm) TSE factor TR/TE(ms) TIT1 256 × ×

24 0.9 × × × ×

24 0.9 × × ∗ × ×

28 0.9 × × ± standard deviation, N = 100) for parallelimaging and the proposed method on knee and brain MRI.Undersampling factor organ GRAPPA OursR=2 knee 40.98 ± ± ± ± ± ± ± ± ± ± ± ± ± standard deviation, N = 100) for compressedsensing and the proposed method on knee and brain MRI.Undersampling rate organ (cid:96) -ESPIRiT MODL Ours15% + 7% knee 29.33 ± ± ± ± ± ± ± ± ± ± ± ± i,j x i,j (a)(b)(c) Figure 1: ( a ) The conditional model in [9, 10] deﬁned the probability of image pixel(yellow) x i,j dependent on all the pixels from its up and left side (green), and C i,j was aset that contains all the previous pixels. ( b ) Diagram shows the PixelCNN++ networkin [9], which was the prior model used in this study, i.e., g ( x ) in Eq. 9. Each ResNetblock (gray) consisted of 3 Resnet components. The input of network was x , outputsof network were parameters of mixture distribution ( π, µ, s, α ), which were fed into theconditional probability model in Eq. 7. ( c ) In this method, we reconstructed images with256 ×

256 matrix size, using the prior model g ( x ) that was trained with 128 ×

128 imagesand illustrated in Fig. (b). To reconcile this mismatch, we split one 256 ×

256 image intofour 128 ×

128 patches for applying the prior model. After updating s ( k +1) , four patchesfor one image were merged to form an image with the original size of 256 × { x | y = A x + ε } in Eq. 13. Furthermore, therandom shift along phase encoding direction was applied to mitigate the stitching linein-between patches.RAPPA Ours Ground truth PSNR(dB) 33.49 44.33NMSE(%) 1.61 0.14PSNR(dB) 30.11 41.58NMSE(%) 11.06 0.79

Figure 2: Comparisons on PD and PDFS contrasts using GRAPPA and the proposedreconstructions with R=3 acceleration and 256 ×

256 matrix size. The intensity of er-ror maps was ﬁve times magniﬁed. The proposed method eﬀectively eliminated noiseampliﬁcation and aliasing artifact in GRAPPA reconstruction. -ESPIRiT MODL Ours Ground truth PSNR (dB) 34.55 34.16 40.92NMSE(%) 2.69 2.94 0.56PSNR(dB) 32.59 33.33 34.15NMSE(%) 7.02 5.9 4.9

Figure 4: Comparison of diﬀerent methods on PD and PDFS contrasts, using 27% 1Dundersampled k-space and 256 ×

256 matrix size. The intensity of error maps was ﬁvetimes magniﬁed. The proposed method substantially reduced the aliasing artifact andpreserved image details in compressed sensing reconstruction.RAPPA Ours Ground truth

PSNR(dB) 34.88 47.62NMSE(%) 4.49 0.24PSNR(dB) 33.18 48.31NMSE(%) 2.64 0.10PSNR(dB) 33.21 46.27NMSE(%) 4.46 0.22

Figure 3: Comparisons on T1, T2, and FLAIR-T2 weighted image reconstruction, usingparallel imaging and the proposed reconstruction with R=3 acceleration and 256 × -ESPIRiT MODL Ours Ground truth PSNR(dB) 36.32 33.79 43.47NMSE(%) 3.22 5.78 0.62PSNR(dB) 34.23 32.98 42.41NMSE(%) 2.56 3.41 0.39PSNR(dB) 36.19 35.61 41.53NMSE(%) 2.25 2.56 0.66

Figure 5: Comparison of compressed sensing and deep learning approaches for T1, T2,and FLAIR-T2 weighted image reconstructions, using 22% 1D undersampled k-spaceand 256 ×

256 matrix size. The intensity of error maps was ten times magniﬁed. Theproposed method substantially reduced the aliasing artifact and preserved image detailsin compressed sensing reconstruction.G SENSE Ours Ground truth

PSNR(dB) 22.52 37.18NMSE(%) 15.69 0.54

Figure 6: Comparison of the CG SENSE and proposed reconstruction for simulated spi-ral k-space with 4-fold acceleration (i.e., 6 out of 24 spiral interleaves), acquired by T2 ∗ weighted gradient echo sequence. The intensity of error maps was ﬁve times magniﬁed.The proposed method substantially reduced the aliasing artifact in spiral reconstruction.Noted that the same deep learning model used in the previous Cartesian k-space recon-struction was applied to spiral reconstruction, without the need of re-training the deeplearning model.urs Ground truth ∆ χ [ pp m ] ∆ χ [ pp m ] Figure 7: The preliminary result from the proposed accelerated reconstruction in quan-titative susceptibility mapping (QSM), with R = 4 and GRAPPA type of 1D undersam-pling. The raw images were acquired by T2 ∗ weighted gradient echo sequence. Notedthat the same deep learning model used in the previous experiments was applied to thisexperiment, with phase information preserved in all reconstructed images. The proposeddeep learning method also showed an apparent de-noising eﬀect on QSM maps, while stillpreserved the major phase contrast even with high acceleration, i.e., R = 4. Two rowsshow maps on diﬀerent slices from one healthy volunteer.

100 200 300 400 500Number of iteration0246810121416 Residual norm1/log-likelihood

Figure 8: Convergence curves reﬂected stabilities of iterative steps: 1) maximizing theposterior, which eﬀectively minimized the log-likelihood of MRI prior model and 2) k-space ﬁdelity enforcement, which reduced the residual norm on k-space ﬁdelity. The 22%sampling rate and 1D undersampling scheme were used in this simulation. The residualnorm was written as (cid:107) y − A x (cid:107) in Eq. 13, and the reciprocal of log-likelihood for MRIprior model was given in Eq. 9. a) Mask in Figs. 2 and 3 (b) Mask in Fig. 4 top (c) Mask in Fig. 4 bottom(d) Mask in Fig. 5 top (e) Mask in Fig. 5 middle (f) Mask in Fig. 5 bottom (g) One spiral leaf in Fig. 6 Supporting Figure 1: k-space masks used in the compressed sensing, parallel imaging,and deep learning reconstructions. Bright lines indicate the sampled frequency encodinglines in the 2D k-space, i.e., the 1D undersamplings were simulated. eferences [1] Michael Lustig, David Donoho, and John M Pauly. Sparse mri: The applicationof compressed sensing for rapid mr imaging.

Magnetic Resonance in Medicine: AnOﬃcial Journal of the International Society for Magnetic Resonance in Medicine ,58(6):1182–1195, 2007.[2] Saiprasad Ravishankar and Yoram Bresler. Mr image reconstruction from highlyundersampled k-space data by dictionary learning.

IEEE transactions on medicalimaging , 30(5):1028–1041, 2010.[3] Yan Yang, Jian Sun, Huibin Li, and Zongben Xu. Deep admm-net for compressivesensing mri. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett,editors,

Advances in Neural Information Processing Systems 29 , pages 10–18. CurranAssociates, Inc., 2016.[4] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony N Price, and Daniel Rueck-ert. A deep cascade of convolutional neural networks for dynamic mr image recon-struction.

IEEE transactions on Medical Imaging , 37(2):491–503, 2017.[5] Morteza Mardani, Enhao Gong, Joseph Y Cheng, Shreyas S Vasanawala, Greg Za-harchuk, Lei Xing, and John M Pauly. Deep generative adversarial neural networksfor compressive sensing mri.

IEEE transactions on medical imaging , 38(1):167–179,2018.[6] Hemant K Aggarwal, Merry P Mani, and Mathews Jacob. Modl: Model-based deeplearning architecture for inverse problems.

IEEE transactions on medical imaging ,38(2):394–405, 2018.[7] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic back-propagation and approximate inference in deep generative models. arXiv preprintarXiv:1401.4082 , 2014.[8] Simon Arridge, Peter Maass, Ozan ¨Oktem, and Carola-Bibiane Sch¨onlieb. Solvinginverse problems using data-driven models.

Acta Numerica , 28:1–174, 2019.[9] Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P Kingma. Pixelcnn++:Improving the pixelcnn with discretized logistic mixture likelihood and other modi-ﬁcations. arXiv preprint arXiv:1701.05517 , 2017.[10] Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrentneural networks. arXiv preprint arXiv:1601.06759 , 2016.11] Martin Uecker, Peng Lai, Mark J Murphy, Patrick Virtue, Michael Elad, John MPauly, Shreyas S Vasanawala, and Michael Lustig. Espiritan eigenvalue approachto autocalibrating parallel mri: where sense meets grappa.

Magnetic resonance inmedicine , 71(3):990–1001, 2014.[12] Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, and Daan Wierstra.Deep autoregressive networks. arXiv preprint arXiv:1310.8499 , 2013.[13] Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, andMax Welling. Improving variational inference with inverse autoregressive ﬂow.(nips),2016.

URL http://arxiv. org/abs/1606.04934 .[14] Stephen Boyd, Lin Xiao, and Almir Mutapcic. Subgradient methods. lecture notesof EE392o, Stanford University, Autumn Quarter , 2004:2004–2005, 2003.[15] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Repre-senting model uncertainty in deep learning. In Maria Florina Balcan and Kilian Q.Weinberger, editors,

Proceedings of The 33rd International Conference on MachineLearning , volume 48 of

Proceedings of Machine Learning Research , pages 1050–1059,New York, New York, USA, 20–22 Jun 2016. PMLR.[16] Jure Zbontar, Florian Knoll, Anuroop Sriram, Matthew J. Muckley, Mary Bruno,Aaron Defazio, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chan-darana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, PascalVincent, James Pinkerton, Duo Wang, Naﬁssa Yakubova, Erich Owens, C. LawrenceZitnick, Michael P. Recht, Daniel K. Sodickson, and Yvonne W. Lui. fastmri: Anopen dataset and benchmarks for accelerated MRI.

CoRR , abs/1811.08839, 2018.[17] Mark A Griswold, Peter M Jakob, Robin M Heidemann, Mathias Nittka, VladimirJellus, Jianmin Wang, Berthold Kiefer, and Axel Haase. Generalized autocalibrat-ing partially parallel acquisitions (grappa).

Magnetic Resonance in Medicine: AnOﬃcial Journal of the International Society for Magnetic Resonance in Medicine ,47(6):1202–1210, 2002.[18] Michael Lustig, Seung-Jean Kim, and John M Pauly. A fast method for designingtime-optimal gradient waveforms for arbitrary k-space trajectories.

IEEE transac-tions on medical imaging , 27(6):866–873, 2008.[19] Jeﬀrey A Fessler and Bradley P Sutton. Nonuniform fast fourier transforms usingmin-max interpolation.

IEEE transactions on signal processing , 51(2):560–574, 2003.20] Klaas P Pruessmann, Markus Weiger, Peter B¨ornert, and Peter Boesiger. Advancesin sensitivity encoding with arbitrary k-space trajectories.

Magnetic Resonance inMedicine: An Oﬃcial Journal of the International Society for Magnetic Resonancein Medicine , 46(4):638–651, 2001.[21] Jeﬀrey A Fessler. Model-based image reconstruction for mri.

IEEE Signal ProcessingMagazine , 27(4):81–89, 2010.

A Reconstruction for the varied image size with deepprior model

Algorithm 1

Reconstruction algorithm with deep prior model

Input: y - k-space data A - encoding matrix λ - maximum iteration Output: x - the restored image Give a random initial point x (0) (cid:46) Initialization while (cid:107) g ( k ) (cid:107) > ε and k < λ do (cid:46) Iteration Generate a random shifting oﬀset d Shift x ( k ) d pixels away from the center circularly Split x ( k ) into pieces s ( k ) for feeding to network Get subgradient ∇ x log( g ( x )) at x ( k ) Pick a step size α k = 1 /k Update s ( k +1) = s ( k ) − α k ∇ s ( k ) log g ( s ( k ) ) Merge pieces s ( k +1) into z ( k +1) for projection Shift z ( k +1) − d pixels away from the center circularly Projection x ( k +1) = arg max x ∈ X (cid:107) z − z ( k +1) (cid:107) return x ( kk