Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction
Dominik Narnhofer, Alexander Effland, Erich Kobler, Kerstin Hammernik, Florian Knoll, Thomas Pock
Affiliations: Silicon Austria Labs (TU Graz SAL DES Lab), Graz, Austria; Technical University of Munich, Munich, Germany; Imperial College London, London, United Kingdom; NYU School of Medicine, New York, USA
Abstract
Recent deep learning approaches focus on improving quantitative scores of dedicated benchmarks, and therefore only reduce the observation-related (aleatoric) uncertainty. However, the model-immanent (epistemic) uncertainty is less frequently systematically analyzed. In this work, we introduce a Bayesian variational framework to quantify the epistemic uncertainty. To this end, we solve the linear inverse problem of undersampled MRI reconstruction in a variational setting. The associated energy functional is composed of a data fidelity term and the total deep variation (TDV) as a learned parametric regularizer. To estimate the epistemic uncertainty, we draw the parameters of the TDV regularizer from a multivariate Gaussian distribution, whose mean and covariance matrix are learned in a stochastic optimal control problem. In several numerical experiments, we demonstrate that our approach yields competitive results for undersampled MRI reconstruction. Moreover, we can accurately quantify the pixelwise epistemic uncertainty, which can serve radiologists as an additional resource to visualize reconstruction reliability.
A classical inverse problem related to magnetic resonance imaging (MRI) emerges from the undersampling of the raw data in the Fourier domain (known as k-space) to reduce acquisition time. When directly applying the inverse Fourier transform, the quality of the resulting image is deteriorated by undersampling artifacts since in general the sampling rate does not satisfy the Nyquist–Shannon sampling theorem. Prominent approaches to reduce these artifacts incorporate parallel imaging [PWSB99] on the hardware side, or compressed sensing on the algorithmic side [LDP07]. In further algorithmic approaches, the MRI undersampling problem is cast as an ill-posed inverse problem using a hand-crafted total variation-based regularizer [KCB+12]. We refer to [KHZ+20] for an overview of existing methods and their applicability to MRI.

An established technique for solving ill-posed inverse problems are variational methods, in which the minimizer of an energy functional defines the restored output image. A probabilistic interpretation of variational methods is motivated by Bayes' theorem, which states that the posterior distribution p(x|z) of a reconstruction x and observed data z is proportional to p(z|x) p(x). The maximum a posteriori (MAP) estimate [Mur15] in the negative log-domain is the minimizer of the energy

E(x, z) := D(x, z) + R(x)    (1)

among all x, where we define the data fidelity term as D(x, z) ∝ −log(p(z|x)) and the regularizer as R(x) ∝ −log(p(x)).

Figure 1: Illustration of the stochastic MRI undersampling reconstruction model to calculate the epistemic uncertainty. Here, N instances of the model parameters θ_1, …, θ_N are drawn from N(µ, LL^⊤), each of which drives the scheme x_{s+1}^{θ_i} = prox_D(x_s^{θ_i} − (T/S) ∇_x R(x_s^{θ_i}, θ_i), z, T) and leads to an output image X_S^{θ_i}. The associated pixelwise mean and standard deviation are depicted on the right.

Deep learning has been successfully integrated in this approach in a variety of papers [HKK+18, EKKP20, KEKP20b], in which the regularizer is learned from data. However, none of these publications addresses the quantification of the uncertainty in the model itself.

In general, two sources of uncertainty exist: aleatoric and epistemic. The former quantifies the uncertainty caused by observation-related errors, while the latter measures the inherent error of the model. Most of the aforementioned methods generate visually impressive reconstructions and thus reduce the aleatoric uncertainty, but the problem of quantifying the epistemic uncertainty is commonly not addressed. In practice, an accurate estimation of the epistemic uncertainty is vital for the identification of regions that cannot be reliably reconstructed, such as hallucinated patterns, which could potentially result in a misdiagnosis. This problem has been addressed in a Bayesian setting by several approaches [BCKW15, GG16, KG17, WRV+20]. For MRI reconstruction, Schlemper et al. [SCB+18] use Monte Carlo dropout and a heteroscedastic loss to estimate MRI reconstruction uncertainty in U-Net/DC-CNN based models. In contrast, Edupuganti et al. [EMVP21] advocated a probabilistic variational autoencoder, which in combination with a Monte Carlo approach and Stein's unbiased risk estimator allows for a pixelwise uncertainty estimation.

There are two main contributions of this paper: First, we adapt the total deep variation (TDV) [KEKP20b, KEKP20a], designed as a novel framework for general linear inverse problems, to undersampled MRI reconstruction, and show that we achieve competitive results on the fastMRI dataset [KZS+20, KMS+20]. Second, we estimate the pixelwise epistemic uncertainty by drawing the parameters of the TDV regularizer from a multivariate Gaussian distribution, whose mean and covariance matrix are learned in a stochastic optimal control problem.

Methods
In this section, we first recall the mathematical setting of undersampled MRI reconstruction. Then, we introduce the optimal control problems for deterministic and stochastic MRI reconstruction, where the latter additionally allows for an estimation of the epistemic uncertainty.
In what follows, we briefly recall the basic mathematical concepts of (undersampled) MRI. We refer the reader to [BCH+14] for further details.

In MRI, fully sampled raw data u are acquired in the Fourier domain, commonly known as k-space, with Q ≥ 1 coils and n = width × height pixels per coil, and are identified with vectors in C^{nQ} (e.g. u ∈ C^{nQ}). If Q = 1, we refer to a single-coil, in all other cases to a multi-coil MRI reconstruction problem. Here, the associated uncorrupted data in the image domain are given by y = F^{−1} u ∈ C^{nQ}, where F ∈ C^{nQ×nQ} denotes the channel-wise unitary matrix representation of the two-dimensional discrete Fourier transform and F^{−1} its inverse. In this case, the final root-sum-of-squares image estimate Y for y with a resolution of n = width × height is retrieved as

Y_i = sqrt( Σ_{q=1}^{Q} |y_{i,q}|² )

for i = 1, …, n, where y_{i,q} refers to the i-th pixel value of the q-th coil and |·| denotes the absolute value or magnitude. Henceforth, we frequently denote the root-sum-of-squares of an image by upper case letters. Acquiring the entirety of the Fourier space data (known as fully sampled MRI) results in long acquisition times and consequently a low patient throughput. To address this issue, only a subset of the k-space data along lines defined by a certain sampling pattern is acquired. However, this approach violates the Nyquist–Shannon theorem, which results in clearly visible backfolding artifacts. The aforementioned scheme is numerically realized by a downsampling operator M_R ∈ C^{(nQ/R)×nQ} representing R-fold Cartesian undersampling (R ∈ N), which only preserves every R-th line in frequency encoding direction. In this case, the linear forward operator is defined as A = M_R F ∈ C^{(nQ/R)×nQ}. Thus, the observations resulting from the forward formulation of the inverse problem are given by

z = Ay + ν ∈ C^{nQ/R},    (2)

where ν ∈ C^{nQ/R} is additive noise.

The starting point of the proposed framework is a variant of the energy formulation (1). In this paper, we use the specific data fidelity term D(x, z) = ½‖Ax − z‖² and the data-driven TDV regularizer R(x, θ), depending on the learned parameters θ ∈ Θ ⊂ R^p, originally proposed in [KEKP20b, KEKP20a]. In detail, R: R^{nQ} × Θ → R₀⁺ is computed by summing the pixelwise regularization energy r: R^{nQ} × Θ → R^n, i.e. R(x, θ) = Σ_{l=1}^{n} r(x, θ)_l, which is defined as r(x, θ) = w ψ(K₀ x). Note that we use the identification C ≅ R² to handle complex numbers. The building blocks of r are as follows:

• K₀ ∈ R^{nm×nQ} is the matrix representation of a learned 3×3 convolution kernel with m feature channels and a zero-mean constraint, which enforces an invariance with respect to global shifts,
• ψ: R^{nm} → R^{nm} is a convolutional neural network (CNN) described below,
• w ∈ R^{n×nm} is the matrix representation of a learned 1×1 convolution.

Thus, θ encodes K₀ and all convolutional weights in ψ and w. The CNN ψ is composed of 3 macroblocks connected by skip connections (Fig. 2, second row), where each macroblock consists of 7 residual blocks (Fig. 2, third row). We remark that the core architecture is inspired by a U-Net [RFB15] with additional residual connections and a more sophisticated structure. Each residual block has two bias-free 3×3 convolution kernels K₁, K₂ ∈ R^{nm×nm} and a smooth log-student-t activation function φ(x) = log(1 + x²), as depicted in Fig. 2 (last row).

Figure 2: The building blocks of the total deep variation with 3 macroblocks (figure adapted from [KEKP20b, Figure 1]). Complex data are transformed to a pixelwise energy as seen in the top right corner.
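To make this structure concrete, the following PyTorch sketch assembles the pixelwise energy r(x, θ) = w ψ(K₀ x) from the components above. It is a minimal illustration rather than the authors' implementation: the class names (`TDVRegularizer`, `ResidualBlock`), the channel count `m`, and the collapse of the three macroblocks into a plain residual chain (without the multiscale down-/upsampling and the zero-mean constraint on K₀) are our own simplifications.

```python
import torch
import torch.nn as nn

def log_student_t(x):
    # smooth log-student-t activation phi(x) = log(1 + x^2)
    return torch.log1p(x ** 2)

class ResidualBlock(nn.Module):
    """Two bias-free 3x3 convolutions with the log-student-t activation."""
    def __init__(self, m):
        super().__init__()
        self.conv1 = nn.Conv2d(m, m, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(m, m, 3, padding=1, bias=False)

    def forward(self, x):
        return x + self.conv2(log_student_t(self.conv1(x)))

class TDVRegularizer(nn.Module):
    """Simplified pixelwise energy r(x, theta) = w * psi(K0 x)."""
    def __init__(self, in_channels=2, m=32, num_blocks=7):
        super().__init__()
        # K0: learned 3x3 convolution lifting the (real, imag) input to m channels
        self.K0 = nn.Conv2d(in_channels, m, 3, padding=1, bias=False)
        # psi: residual chain standing in for the macroblocks of the paper
        self.psi = nn.Sequential(*[ResidualBlock(m) for _ in range(num_blocks)])
        # w: learned 1x1 convolution mapping features to a scalar energy per pixel
        self.w = nn.Conv2d(m, 1, 1, bias=False)

    def forward(self, x):
        # x: (batch, 2, H, W) real/imaginary parts, using C ~ R^2
        return self.w(self.psi(self.K0(x)))  # pixelwise energy, summed to get R

# R(x, theta) is the sum of the pixelwise energy; its gradient drives the scheme below
x = torch.randn(1, 2, 96, 96, requires_grad=True)
R = TDVRegularizer()(x).sum()
grad_R = torch.autograd.grad(R, x)[0]
```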
This choice of the activation function is motivated by the pioneering work of Mumford and coworkers [HM99]. The down-/upsampling operations are realized by learned 3×3 convolutions.

Let (y_i, z_i)_{i=1}^{I} ∈ (C^{nQ} × C^{nQ/R})^I be a collection of I pairs of uncorrupted data y_i in the image domain and associated observed R-fold undersampled k-space data z_i for i = 1, …, I, where both are related by (2).

Next, we approximate the MAP estimate (1) of E w.r.t. x. To this end, we use gradient descent with a semi-implicit discretization scheme to increase numerical stability, which encompasses an explicit step in the regularizer and an implicit step in the data fidelity term. This scheme can also be interpreted as a semi-implicit version of the Landweber iteration [Lan51], and reads as

x_{s+1} = x_s − (T/S) ( A*(A x_{s+1} − z) + ∇_x R(x_s, θ) )    (3)

for s = 0, …, S − 1, where S ∈ N denotes a fixed number of iteration steps, T > 0 is the time horizon of the scheme (so T/S is the step size), and ∇_x denotes the gradient with respect to the x-component. Here, we set x_0 = F^{−1} M_R^* z, and the terminal state x_S of the gradient descent is the approximation of the MAP estimator. Rearranging the terms in (3), we get

x_{s+1} = (Id + (T/S) A*A)^{−1} ( x_s + (T/S) (A*z − ∇_x R(x_s, θ)) ).    (4)

Note that (Id + (T/S) A*A)^{−1} describes the proximal map of the data fidelity term, denoted by prox_D. To derive a closed-form expression of prox_D, we first note that A*A = F^{−1} M_R^* M_R F since F is unitary, which readily implies

x_{s+1} = F^{−1} (Id + (T/S) M_R^* M_R)^{−1} F ( x_s + (T/S) (F^{−1} M_R^* z − ∇_x R(x_s, θ)) )
        = F^{−1} (Id + (T/S) M_R^* M_R)^{−1} ( F (x_s − (T/S) ∇_x R(x_s, θ)) + (T/S) M_R^* z ).

Hence, we can rephrase (4) as

x_{s+1} = prox_D ( x_s − (T/S) ∇_x R(x_s, θ), z, T ),    (5)

where the proximal map of the data fidelity term is

prox_D(x, z, T) = F^{−1} ( (Id + (T/S) M_R^* M_R)^{−1} (F x + (T/S) M_R^* z) ).

We denote by x_s(z, T, θ) the value of this scheme as a function of z, T, and θ after s iterations.
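The iteration (5) is cheap to evaluate because prox_D only requires Fourier transforms and a pixelwise division in k-space. The sketch below implements one reconstruction pass for a single-coil Cartesian setup with orthonormal FFTs; the binary k-space mask `mask` (standing in for M_R^* M_R), the helper names, and the reuse of the `TDVRegularizer` sketch from above are our assumptions, not the authors' code.

```python
import torch

def prox_D(x, z, mask, tau):
    """Closed-form proximal map of D(x, z) = 0.5 * ||A x - z||^2.
    x:    (B, 2, H, W) current iterate (real/imag channels)
    z:    (B, H, W) complex k-space data, zero outside the mask
    mask: (H, W) binary Cartesian sampling mask (M_R^* M_R as a diagonal)
    tau:  step size T / S
    """
    x_c = torch.complex(x[:, 0], x[:, 1])
    k = torch.fft.fft2(x_c, norm="ortho")
    # (Id + tau M^* M)^{-1} (F x + tau M^* z), applied pixelwise in k-space:
    # sampled frequencies are blended with the data, unsampled ones stay unchanged
    k = (k + tau * mask * z) / (1.0 + tau * mask)
    x_c = torch.fft.ifft2(k, norm="ortho")
    return torch.stack((x_c.real, x_c.imag), dim=1)

def reconstruct(z, mask, regularizer, S=15, T=1.0):
    """Semi-implicit scheme (5): explicit step in R, implicit step in D."""
    tau = T / S
    x_c = torch.fft.ifft2(mask * z, norm="ortho")        # x_0 = F^{-1} M^* z
    x = torch.stack((x_c.real, x_c.imag), dim=1)
    for _ in range(S):
        x = x.detach().requires_grad_(True)
        R = regularizer(x).sum()
        grad_R = torch.autograd.grad(R, x)[0]            # nabla_x R(x_s, theta)
        x = prox_D(x - tau * grad_R, z, mask, tau)
    return x
```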
Following [EKKP20, KEKP20b, KEKP20a, EHL19], we cast the training process as a discrete optimal control problem with control parameters T and θ. We remark that x_S(z, T, θ) refers to the terminal state of (5) using the parameters T and θ and the data z, and X_S(z, T, θ), defined as the root-sum-of-squares of x_S(z, T, θ), coincides with the reconstructed output image. We use the subsequent commonly used loss functional

J(y, z, T, θ) = ‖X_S(z, T, θ) − Y‖ + τ (1 − SSIM(X_S(z, T, θ), Y))    (6)

for τ > 0, which balances the norm-based distance and the SSIM score. Note that the loss functional only incorporates the difference of the magnitudes of the reconstruction X_S(z, T, θ) and the target Y. In the cost functional given by

inf_{T ∈ R₀⁺, θ ∈ Θ} (1/I) Σ_{i=1}^{I} J(y_i, z_i, T, θ)    (7)

the discrepancy of the reconstructions and the targets among the entire data set is minimized.

Inspired by [BCKW15], we estimate the epistemic uncertainty of the previous deterministic model by sampling the weights from a learned probability distribution. In detail, we draw the weights θ of the regularizer from the multivariate Gaussian distribution N(µ, Σ) with a learned mean µ ∈ Θ ⊂ R^p and covariance matrix Σ ∈ R^{p×p}. To decrease the number of learnable parameters, we reparametrize Σ = LL^⊤ ∈ R^{p×p}, where L ∈ R^{p×p} is a learned lower triangular matrix with non-vanishing diagonal entries. In particular, Σ is always positive definite and symmetric. Henceforth, we assume that T is fixed and that K₀, the down- and upsampling operators, and w are always deterministic. As a straightforward approach to model uncertainty, one could minimize (7) w.r.t. µ and L. However, in this case a deterministic minimizer with θ = µ and Σ being the null matrix is retrieved. Thus, to enforce a certain level of uncertainty we include the Kullback–Leibler divergence KL in the loss functional [Mac03]. We recall that KL for two multivariate probability distributions p₁ and p₂ with density functions f₁ and f₂ on a domain Ω reads as

KL(p₁ ‖ p₂) = ∫_Ω f₁(x) log( f₁(x) / f₂(x) ) dx.

In particular, KL is non-negative and in general non-symmetric, and can be regarded as a discrepancy measure of two probability distributions. In the special case of multivariate Gaussian probability distributions p₁ = N(µ₁, Σ₁) and p₂ = N(µ₂, Σ₂) on R^d, the Kullback–Leibler divergence admits the closed-form expression

KL(p₁ ‖ p₂) = ½ ( log(|Σ₂| / |Σ₁|) + tr(Σ₂^{−1} Σ₁) + (µ₂ − µ₁)^⊤ Σ₂^{−1} (µ₂ − µ₁) − d ).

In this paper, we advocate the particular choice p₁ = N(µ, Σ) and p₂ = N(µ, α^{−1} Id), where µ and Σ = LL^⊤ are computed during optimization and α > 0 is essential to control the level of uncertainty in the model. Neglecting constants and scaling the Kullback–Leibler divergence with β ≥ 0, the stochastic optimal control problem reads as

inf { E_{θ∼N(µ, LL^⊤)} [ (1/I) Σ_{i=1}^{I} J(y_i, z_i, T, θ) ] + β ( α tr(LL^⊤) − log(det(LL^⊤)) ) :
      T ∈ R₀⁺, µ ∈ Θ, L ∈ R^{p×p} lower triangular with det(L) ≠ 0 }.    (8)

Note that (8) coincides with the deterministic model if β = 0. Indeed, in this case the MAP estimate is retrieved, which minimizes the functional J since no uncertainty is promoted. We observe that the variational problem (8) is actually composed of the nonlinear term

J̃(T, µ, L) := E_{θ∼N(µ, LL^⊤)} [ (1/I) Σ_{i=1}^{I} J(y_i, z_i, T, θ) ]

as well as the convex component

f(L) := β ( α tr(LL^⊤) − log(det(LL^⊤)) ).
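For a single training sample, the stochastic objective in (8) can be evaluated with the reparametrization θ = µ + Lε, ε ∼ N(0, Id), so that gradients flow to µ and L. The sketch below shows this together with the penalty β(α tr(LL^⊤) − log det(LL^⊤)); it is a schematic under our own assumptions: a generic `loss_J` callable stands in for the loss (6), a single Monte Carlo sample approximates the expectation, and a single dense factor L replaces the per-kernel covariance blocks used in practice.

```python
import torch

def kl_penalty(L, alpha, beta):
    """beta * (alpha * tr(L L^T) - log det(L L^T)) for lower triangular L."""
    trace_term = (L ** 2).sum()                       # tr(L L^T) = sum_ij L_ij^2
    logdet_term = 2.0 * torch.log(L.diagonal().abs()).sum()
    return beta * (alpha * trace_term - logdet_term)

def stochastic_objective(mu, L, loss_J, alpha, beta):
    """One-sample Monte Carlo estimate of the objective in (8).
    mu:     (p,) learned mean of the weight distribution
    L:      (p, p) learned lower triangular factor of the covariance
    loss_J: callable mapping a weight vector theta to the loss J
    """
    eps = torch.randn_like(mu)
    theta = mu + L @ eps                              # theta ~ N(mu, L L^T)
    return loss_J(theta) + kl_penalty(L, alpha, beta)
```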
Following [CP16], to increase stability we employ a semi-implicit gradient descent-based scheme in (8) for the optimization of L with an implicit step on f, which is equivalent to an update with a proximal map on f. To this end, we recall that the proximal map of f with step size τ > 0 is

prox_{τf}(L) = argmin_{L̃} (1/(2τ)) ‖L̃ − L‖² + f(L̃),    (9)

where the minimum is taken among all non-singular lower triangular matrices.

Proposition 1. Given a regular lower triangular matrix L ∈ R^{p×p}, the proximal map of f admits the closed-form expression

prox_{τf}(L)_{ij} = ( L_{ii} + sqrt( L_{ii}² + 8βτ(1 + 2αβτ) ) ) / ( 2(1 + 2αβτ) )   if i = j,
prox_{τf}(L)_{ij} = (1 + 2αβτ)^{−1} L_{ij}   if i ≠ j.

Proof. We note that f can be rewritten as

f(L) = αβ Σ_{i,j=1}^{p} L_{ij}² − β Σ_{i=1}^{p} log(L_{ii}²).

Thus, the optimization problem (9) can be optimized component-wise and results in a quadratic problem for each entry. The expression for the proximal map immediately follows.
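A minimal sketch implementing Proposition 1, under the same PyTorch conventions as the previous snippets:

```python
import torch

def prox_f(L, alpha, beta, tau):
    """Closed-form proximal map of f from Proposition 1.
    Scales the off-diagonal entries and solves the scalar quadratic
    problem for each diagonal entry of the lower triangular factor L.
    """
    c = 1.0 + 2.0 * alpha * beta * tau
    out = L / c                                   # off-diagonal: L_ij / (1 + 2*alpha*beta*tau)
    d = L.diagonal()
    diag = (d + torch.sqrt(d ** 2 + 8.0 * beta * tau * c)) / (2.0 * c)
    out = out - torch.diag(out.diagonal()) + torch.diag(diag)  # replace the diagonal
    return torch.tril(out)                        # keep L lower triangular
```

Note that the diagonal update always returns a positive value, so the factor L stays non-singular during training.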
In summary, we propose the subsequent alternating update scheme

T^{k+1} = T^k − h_T^k ∇_T J̃(T^k, µ^k, L^k),
µ^{k+1} = µ^k − h_µ^k ∇_µ J̃(T^k, µ^k, L^k),
L^{k+1} = prox_{h_L^k f}( L^k − h_L^k ∇_L J̃(T^k, µ^k, L^k) ),

where the iteration-dependent step sizes h_T^k, h_µ^k, h_L^k > 0 are chosen appropriately. The weighting parameter β determines the level of entropy inherent in the model. We define the mean entropy H as a measure of the uncertainty in the model [Sha48] for the convolutional kernels K₁ and K₂ of all residual blocks as

H(Σ) = (1 / (2 N_K)) Σ_{i=1}^{N_K} ln(2π det(Σ_i)),

where N_K is the total number of stochastic convolutional kernels in the network and Σ_i is the collection of associated covariance matrices of each kernel.
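The mean entropy is straightforward to evaluate from the per-kernel covariance blocks; in the small sketch below, the list `Sigmas` of per-kernel covariance matrices is assumed to be extracted from Σ = LL^⊤ beforehand.

```python
import math
import torch

def mean_entropy(Sigmas):
    """Mean entropy H = (1 / (2 N_K)) * sum_i ln(2*pi*det(Sigma_i))
    over the per-kernel covariance blocks Sigma_i (small square matrices)."""
    logs = torch.stack([math.log(2.0 * math.pi) + torch.logdet(S) for S in Sigmas])
    return 0.5 * logs.mean()
```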
To optimize the optimal control problems in the deterministic (7) and stochastic (8) regime, we use the ADAM algorithm [KB15] with a batch size of 8 and momentum variables β₁ = 0.9 and β₂ = 0.999 for the parameter updates. The learning rate is halved every 50 k iterations, and the total number of iterations is 120 k for Q = 1 and 200 k for Q ≥ 2. The memory consumption is reduced by randomly extracting patches of size 96 × 368 in frequency encoding direction, as advocated by [SCH+18]. Moreover, training starts with S = 2 iterations of the scheme (5), which is successively incremented by 1 after 7500 iterations. For the same reason, we retrain our model for R = 8 starting from the terminal parameters for R = 4 using 100 k iterations for Q = 1 and 130 k for Q ≥ 2.

To retrieve estimates of the undersampled MRI reconstruction in the stochastic setting, we draw N ∈ N instances θ^N = (θ_1, …, θ_N) ∈ Θ^N from N(µ, LL^⊤), where µ and L are determined by (8). Then, the average x̄^N and the corresponding standard deviation σ^N are defined as

x̄_s^N(z, T, θ^N) = (1/N) Σ_{i=1}^{N} x_s(z, T, θ_i),
(σ_s^N(z, T, θ^N))_j = sqrt( (1/N) Σ_{i=1}^{N} ( (x_s(z, T, θ_i) − x̄_s^N(z, T, θ^N))_j )² )

for each pixel j = 1, …, n and s = 0, …, S. In particular, the reconstructed image is given by x̄_S^N(z, T, θ^N). This approach is a special form of posterior sampling and is summarized in Fig. 1. Finally, the root-sum-of-squares reconstruction X̄_s^N(z, T, θ^N) of x̄_s^N(z, T, θ^N) and the corresponding standard deviation are given by

X̄_s^N(z, T, θ^N) = (1/N) Σ_{i=1}^{N} X_s(z, T, θ_i),    (10)
(σ̂_s^N(z, T, θ^N))_j = sqrt( (1/N) Σ_{i=1}^{N} ( (X_s(z, T, θ_i) − X̄_s^N(z, T, θ^N))_j )² ).
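Posterior sampling as in Fig. 1 then amounts to running the reconstruction once per drawn weight vector and reducing the stack of outputs. The sketch below reuses the hypothetical `reconstruct` helper from above; loading a sampled θ into the regularizer is abstracted as `load_theta`, an assumed helper.

```python
import torch

def posterior_sample_reconstruction(z, mask, mu, L, load_theta, N=32, S=15, T=1.0):
    """Draw N weight instances theta_i ~ N(mu, L L^T), reconstruct once per
    instance, and return the pixelwise mean and standard deviation (Fig. 1)."""
    recons = []
    for _ in range(N):
        theta = mu + L @ torch.randn_like(mu)        # theta_i ~ N(mu, L L^T)
        regularizer = load_theta(theta)              # regularizer with sampled weights
        x = reconstruct(z, mask, regularizer, S=S, T=T)
        # magnitude image; for Q = 1 the root-sum-of-squares reduces to |x|
        recons.append(torch.sqrt((x ** 2).sum(dim=1)))
    X = torch.stack(recons)                          # shape (N, B, H, W)
    return X.mean(dim=0), X.std(dim=0, unbiased=False)   # mean and std, cf. (10)
```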
Results

In this section, we present numerical results for single and multi-coil undersampled MRI reconstruction in the deterministic and stochastic setting. In all experiments, we set the initial lower triangular matrix L₀ = √(α^{−1}) Id, α = 10, S = 15, and N = 32; in all multi-coil results we have Q = 15 coils.

Table 1 lists quantitative results for R ∈ {4, 8} of the initial zero filling, two state-of-the-art methods (U-Net [KZS+20] and iRim [PKT+20]), and the proposed deterministic and stochastic models. Note that the number of parameters θ used for reconstruction is identical to the deterministic case.

Fig. 3 depicts two prototypic ground truth images of the CORPD data in the single (first row) and multi-coil (second row) case, the corresponding zero filling results, the deterministic reconstruction X_S and the mean X̄_S (see (10)) for different values of β for Q = 1 and Q = 15, and the standard deviation σ̂ for an undersampling factor of R = 4. The corresponding entropy levels H(Σ) are fixed by the respective choices of β. In both the deterministic and the stochastic reconstructions even fine details and structures are clearly visible, and the noise level is substantially reduced compared to the ground truth, which can be seen in the zoom with magnification factor 3. Note that hardly any visual difference is observed between both reconstructions. Clearly, the quality in the single-coil case is inferior to the multi-coil case. Moreover, large values of the standard deviation are concentrated in regions with clearly pronounced texture patterns, which are caused by the lack of data in high-frequency k-space regions. Thus, the standard deviation can be interpreted as a local measure of the epistemic uncertainty. Since the proximal operator is applied after the update of the regularizer, high values of the standard deviation can only be found in regions where data is unknown.

The k-space associated with the aforementioned single-coil case is depicted in Fig. 4. In detail, the leftmost image visualizes the undersampling pattern resulting from the predefined Cartesian downsampling operator M_R, which yields the zero-filled observation (second image) when combined with the fully sampled raw data (third image), where we plot the magnitudes in a logarithmic scale. The fourth and the fifth image depict the mean x̄ and the standard deviation σ of the reconstruction in k-space. As a result, our proposed method accurately retrieves the central star-shaped structures of the k-space representing essential image features, although the undersampling pattern is still clearly visible. Moreover, the standard deviation peaks in the central star-shaped section where data is missing and thus empirically identifies regions with larger uncertainty.

Likewise, Fig. 5 depicts the corresponding results for CORPDFS data and R = 4 using the same entropy levels as before; all other parameters are the same as in the previous Fig. 3. We remark that the signal-to-noise ratio is smaller in CORPDFS data than in CORPD data, and thus the reconstructions have a tendency to include more noise and imperfections. The inferior quality compared to CORPD is also reflected in the higher average intensities of the standard deviations.

Finally, Fig. 6 shows the multi-coil reconstruction results for 8-fold undersampling and both data sets in the same arrangement as before.
As expected, the overall reconstruction quality is quantitatively and qualitatively inferior to the case R = 4. As before, the difference between the deterministic and the stochastic restored images is relatively small and the standard deviations properly identify regions with higher uncertainty.

Fig. 7 contains triplets of color-coded covariance matrices of the convolution kernels K₁ in different macroblocks and residual blocks (using the abbreviations MB_i for the i-th macroblock and R_j for the j-th residual block). Note that we use different scalings for positive and negative values among each residual block. Each visualized covariance matrix is the mean of the individual covariance matrices of each convolution kernel K₁ appearing in Σ. The resulting covariance matrices are clearly diagonally dominant with a similar magnitude of the diagonal entries among each residual block, but different magnitudes among different residual blocks. Furthermore, most of the off-diagonal entries significantly differ from 0. As a result, the entries of the covariance matrices associated with the first residual block in each macroblock have a tendency to smaller values compared to the ones of the last residual block. Thus, the uncertainty of the network is primarily aggregated at latter residual blocks within each macroblock, which is in correspondence with error propagation theory: perturbations occurring shortly after the initial period commonly have a larger impact on a dynamical system than perturbations occurring later.

Next, we perform a nonlinear eigenfunction analysis [Gil18] following the approach in [EKKP20, KEKP20a] to heuristically identify energy minimizing local structures.
Nonlinear eigenfunctions are defined as minimizers of the variational problem

min_{x ∈ C^{nQ}} ‖∇_x R(x, θ) − Λ(x) x‖²    (11)

subject to a fixed initial image, where the generalized Rayleigh quotient defining the eigenvalues is given by

Λ(x) = ⟨∇_x R(x, θ), x⟩ / ‖x‖².

In particular, minimizers of (11) satisfy ∇_x R(x, θ) ≈ Λ(x) x. We exploit Nesterov's projected accelerated gradient descent [Nes83] for the optimization in (11).

Fig. 8 depicts two pairs of reconstructed images X_S for the initialization and the corresponding root-sum-of-squares of the eigenfunctions in the deterministic single-coil case. The resulting eigenfunctions predominantly exhibit piecewise smooth regions, where additional high-frequency stripe patterns and lines in the proximity of bone structures as well as blood vessels are hallucinated. This behavior originates from two opposing effects: some backfolding artifacts caused by missing high-frequency components in k-space are removed by our approach, whereas certain high-frequency information is hallucinated.
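A gradient step for (11) requires differentiating through ∇_x R, which autograd provides. The sketch below minimizes the eigenfunction objective with plain gradient descent rather than Nesterov's accelerated scheme, reusing the hypothetical `TDVRegularizer` from above; step count and learning rate are arbitrary placeholders.

```python
import torch

def eigenfunction_objective(x, regularizer):
    """|| grad_x R(x) - Lambda(x) * x ||^2 with the Rayleigh quotient
    Lambda(x) = <grad_x R(x), x> / ||x||^2."""
    grad_R = torch.autograd.grad(regularizer(x).sum(), x, create_graph=True)[0]
    lam = (grad_R * x).sum() / (x * x).sum()
    return ((grad_R - lam * x) ** 2).sum()

def find_eigenfunction(x0, regularizer, steps=200, lr=1e-2):
    x = x0.clone().requires_grad_(True)
    for _ in range(steps):
        loss = eigenfunction_objective(x, regularizer)
        g = torch.autograd.grad(loss, x)[0]     # differentiates through grad_R
        x = (x - lr * g).detach().requires_grad_(True)
    return x.detach()
```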
In the final experiment, we analyze the effects of the entropy level and the averaging on the PSNR values. To this end, we draw 32 instances θ = (θ_1, …, θ_32) ∈ Θ^32 from N(µ, Σ). Then, for different levels of the entropy enforced by different values of β, we calculate the lower and upper bounds of the PSNR values of X̄_S^l(z, T, θ^l) for 1 ≤ l ≤ 32, where θ^l is any subset of θ with l elements. Fig. 9 depicts the resulting color-coded spans for several values of l and five different levels of entropy (including the limiting case H(Σ) = −∞ in the deterministic case). As a result, the PSNR curves monotonically decrease with higher levels of entropy, but even at the highest entropy level we observe only a relatively small decrease in the PSNR value. Moreover, the spans of the different averaging processes clearly show that higher values of l are beneficial, which is why an averaging among a larger number of realizations should be conducted whenever possible. Finally, we observe that the performance saturates with larger values of l.

Table 1: Quantitative results for various single and multi-coil MRI reconstruction methods for R ∈ {4, 8} (n.a.: not available).

Acquisition  Method            PSNR↑ (R=4)  NMSE↓   SSIM↑   PSNR↑ (R=8)  NMSE↓   SSIM↑   Parameters (×10³)
single-coil  zero filling      30.5         0.0438  0.687   26.6         0.0839  0.543   –
single-coil  U-Net [KZS+20]    32.2         0.032   0.754   29.5         0.048   0.651   214
single-coil  Σ-net [HSQ+19]    33.5         0.0279  0.777   n.a.         n.a.    n.a.    140.92
single-coil  iRim [PKT+20]     33.7         0.0271  0.781   30.6         0.0419  0.687   275
single-coil  TDV (ours)        …            …       …       …            …       …       …
multi-coil   zero filling      …            …       …       …            …       …       –
multi-coil   U-Net [KZS+20]    35.9         0.0106  0.904   33.6         0.0171  0.858   214
multi-coil   Σ-net [HSQ+19]    39.8         0.0051  0.928   36.7         0.0091  0.888   675.97
multi-coil   iRim [PKT+20]     39.6         0.0051  0.928   36.7         0.0091  0.888   329
multi-coil   TDV (ours)        …            …       …       …            …       …       …

Figure 3: Single (first row) and multi-coil (second row) MRI reconstruction results for CORPD data and R = 4. From left to right: ground truth images Y, zero filling, deterministic reconstructions X_S, stochastic reconstructions X̄_S, and standard deviations σ̂.

Figure 4: Visualization of the magnitude images in k-space (logarithmic scale) for R = 4 in the single-coil case. From left to right: undersampling pattern, zero-filled observation, fully sampled raw data, mean x̄, standard deviation σ.

Figure 5: Single (first row) and multi-coil (second row) MRI reconstruction results for CORPDFS data and R = 4. From left to right: ground truth images Y, zero filling, deterministic reconstructions X_S, stochastic reconstructions X̄_S, and standard deviations σ̂ (scale 0 to 0.04 for Q = 1 / 0 to 0.03 for Q = 15).

Figure 6: Multi-coil MRI reconstruction results for CORPD (first row) and CORPDFS (second row) data and R = 8. From left to right: ground truth images Y, zero filling, deterministic reconstructions X_S, stochastic reconstructions X̄_S, and standard deviations σ̂.

Figure 7: From left to right: triplets of color-coded covariance matrices of the convolution kernels K₁ in different macroblocks (MB_i for i = 1, 2, 3) and residual blocks (R_j).

Figure 9: Dependency of the PSNR value on the entropy and the averaging.
Conclusion
In this paper, we proposed a Bayesian framework for uncertainty quantification in single and multi-coil undersampled MRI reconstruction exploiting the total deep variation regularizer. To estimate the epistemic uncertainty, we introduced a stochastic optimal control problem, in which the weights of the regularizer are sampled from a learned multivariate Gaussian distribution. With the proposed Bayesian framework, we can generate visually appealing reconstruction results alongside a pixelwise estimation of the epistemic uncertainty, which might aid medical scientists in revising diagnoses based on structures clearly visible in the standard deviation plots.
Acknowledgements
The authors acknowledge support from the ERC starting grant HOMOVIS (No. 640156) and the US National Institutes of Health (R01 EB024532, P41 EB017183, and R21 EB027241).
References

[AMJ18] Hemant K. Aggarwal, Merry P. Mani, and Mathews Jacob. MoDL: Model-based deep learning architecture for inverse problems. IEEE Trans. Med. Imaging, 38(2):394–405, 2018.

[AMWU19] Mehmet Akçakaya, Steen Moeller, Sebastian Weingärtner, and Kâmil Uğurbil. Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging. Magn. Reson. Med., 81(1):439–453, 2019.

[BCH+14] Robert W. Brown, Y.-C. Norman Cheng, E. Mark Haacke, Michael R. Thompson, and Ramesh Venkatesan. Magnetic Resonance Imaging: Physical Principles and Sequence Design. John Wiley & Sons, 2014.

[BCKW15] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. In ICML, volume 37, pages 1613–1622, 2015.

[CP16] Antonin Chambolle and Thomas Pock. An introduction to continuous optimization for imaging. Acta Numer., 25:161–319, 2016.

[EHL19] Weinan E, Jiequn Han, and Qianxiao Li. A mean-field optimal control formulation of deep learning. Res. Math. Sci., 6(1):Paper No. 10, 41, 2019.

[EKKP20] Alexander Effland, Erich Kobler, Karl Kunisch, and Thomas Pock. Variational networks: an optimal control approach to early stopping variational methods for image restoration. J. Math. Imaging Vision, 62(3):396–416, 2020.

[EMVP21] Vineet Edupuganti, Morteza Mardani, Shreyas Vasanawala, and John Pauly. Uncertainty quantification in deep MRI reconstruction. IEEE Trans. Med. Imaging, 40:239–250, 2021.

[GG16] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, pages 1050–1059, 2016.

[Gil18] Guy Gilboa. Nonlinear Eigenproblems in Image Processing and Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, Cham, 2018.

[HKK+18] Kerstin Hammernik, Teresa Klatzer, Erich Kobler, Michael P. Recht, Daniel K. Sodickson, Thomas Pock, and Florian Knoll. Learning a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med., 79(6):3055–3071, 2018.

[HM99] Jinggang Huang and David Mumford. Statistics of natural images and models. In IEEE Conference on Computer Vision and Pattern Recognition, pages 541–547, 1999.

[HSQ+19] Kerstin Hammernik, Jo Schlemper, Chen Qin, Jinming Duan, Ronald M. Summers, and Daniel Rueckert. Σ-net: Systematic evaluation of iterative deep neural networks for fast parallel MR image reconstruction. arXiv, 2019.

[KB15] Diederik P. Kingma and Jimmy Lei Ba. ADAM: A method for stochastic optimization. In ICLR, 2015.

[KCB+12] Florian Knoll, Christian Clason, Kristian Bredies, Martin Uecker, and Rudolf Stollberger. Parallel imaging with nonlinear reconstruction using variational penalties. Magn. Reson. Med., 67(1):34–41, 2012.

[KEKP20a] Erich Kobler, Alexander Effland, Karl Kunisch, and Thomas Pock. Total deep variation: A stable regularizer for inverse problems. arXiv, 2020.

[KEKP20b] Erich Kobler, Alexander Effland, Karl Kunisch, and Thomas Pock. Total deep variation for linear inverse problems. In CVPR, 2020.

[KG17] Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In NIPS, pages 5574–5584. Curran Associates, Inc., 2017.

[KHZ+20] Florian Knoll, Kerstin Hammernik, Chi Zhang, Steen Moeller, Thomas Pock, Daniel K. Sodickson, and Mehmet Akçakaya. Deep learning methods for parallel magnetic resonance image reconstruction. IEEE Signal Process. Mag., 37(1):128–140, 2020.

[KMS+20] Florian Knoll, Tullie Murrell, Anuroop Sriram, Nafissa Yakubova, Jure Zbontar, Michael Rabbat, Aaron Defazio, Matthew J. Muckley, Daniel K. Sodickson, C. Lawrence Zitnick, and Michael P. Recht. Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge. arXiv, 2020.

[KZS+20] Florian Knoll, Jure Zbontar, Anuroop Sriram, Matthew J. Muckley, Mary Bruno, Aaron Defazio, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, Pascal Vincent, James Pinkerton, Duo Wang, Nafissa Yakubova, Erich Owens, C. Lawrence Zitnick, Michael P. Recht, Daniel K. Sodickson, and Yvonne W. Lui. fastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning. Radiol. Artif. Intell., 2020.

[Lan51] Louis Landweber. An iteration formula for Fredholm integral equations of the first kind. American Journal of Mathematics, 73(3):615–624, 1951.

[LDP07] Michael Lustig, David Donoho, and John M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn. Reson. Med., 58(6):1182–1195, 2007.

[LL19] Alexander Selvikvåg Lundervold and Arvid Lundervold. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik, 29(2):102–127, 2019.

[Mac03] David J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, New York, 2003.

[Mur15] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning. MIT Press, 2015.

[Nes83] Yuri E. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). Dokl. Akad. Nauk SSSR, 269(3):543–547, 1983.

[PKT+20] Patrick Putzky, Dimitrios Karkalousos, Jonas Teuwen, Nikita Miriakov, Bart Bakker, Matthan Caan, and Max Welling. i-RIM applied to the fastMRI challenge. arXiv, 2020.

[PWSB99] Klaas P. Pruessmann, Markus Weiger, Markus B. Scheidegger, and Peter Boesiger. SENSE: Sensitivity encoding for fast MRI. Magn. Reson. Med., 42(5):952–962, 1999.

[RFB15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234–241. Springer, 2015.

[SCB+18] Jo Schlemper, Daniel C. Castro, Wenjia Bai, Chen Qin, Ozan Oktay, Jinming Duan, Anthony N. Price, Jo Hajnal, and Daniel Rueckert. Bayesian deep learning for accelerated MR image reconstruction. In Florian Knoll, Andreas Maier, and Daniel Rueckert, editors, Machine Learning for Medical Image Reconstruction, pages 64–71. Springer International Publishing, 2018.

[SCH+18] Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony N. Price, and Daniel Rueckert. A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging, 37(2):491–503, 2018.

[Sha48] C. E. Shannon. A mathematical theory of communication. Bell System Tech. J., 27:379–423, 623–656, 1948.

[TZ18] Mark Tygert and Jure Zbontar. Simulating single-coil MRI from the responses of multiple coils. arXiv, 2018.

[WRV+20] Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, and Sebastian Nowozin. How good is the Bayes posterior in deep neural networks really? In ICML, 2020.

[YHC18] Jong Chul Ye, Yoseob Han, and Eunju Cha. Deep convolutional framelets: A general deep learning framework for inverse problems. SIAM J. Imaging Sci., 11(2):991–1048, 2018.