Variational Semi-blind Sparse Deconvolution with Orthogonal Kernel Bases and its Application to MRFM
Se Un Park a,∗, Nicolas Dobigeon b, Alfred O. Hero a

a University of Michigan, Department of EECS, Ann Arbor, MI 48109-2122, USA
b University of Toulouse, IRIT/INP-ENSEEIHT, 2 rue Camichel, BP 7122, 31071 Toulouse cedex 7, France
Abstract
We present a variational Bayesian method of joint image reconstruction and point spread function (PSF) estimation when the PSF of the imaging device is only partially known. To solve this semi-blind deconvolution problem, prior distributions are specified for the PSF and the 3D image. Joint image reconstruction and PSF estimation is then performed within a Bayesian framework, using a variational algorithm to estimate the posterior distribution. The image prior distribution imposes an explicit atomic measure that corresponds to image sparsity. Importantly, the proposed Bayesian deconvolution algorithm does not require hand tuning. Simulation results clearly demonstrate that the semi-blind deconvolution algorithm compares favorably with the previous Markov chain Monte Carlo (MCMC) version of myopic sparse reconstruction. It significantly outperforms mismatched non-blind algorithms that rely on the assumption of perfect knowledge of the PSF. The algorithm is illustrated on real data from magnetic resonance force microscopy (MRFM).
Keywords:
Variational Bayesian inference, posterior image distribution, image reconstruction, hyperparameter estimation, MRFM experiment.

∗ Corresponding author. Tel: +1 (734) 763-4497, fax: +1 (734) 763-8041. This research was partially supported by a grant from ARO, grant number W911NF-05-1-0403.
Email addresses: [email protected] (Se Un Park), [email protected] (Nicolas Dobigeon), [email protected] (Alfred O. Hero)
Preprint submitted to Elsevier, October 17, 2018.

1. Introduction

The standard and popular image deconvolution techniques generally assume that the space-invariant instrument response, i.e., the point spread function (PSF), is perfectly known. However, in many practical situations, the true PSF is either unknown or, at best, partially known. For example, in an optical system a perfectly known PSF does not exist because of light diffraction, apparatus/lens aberration, out-of-focus, or image motion [1, 2]. Such imperfections are common in general imaging systems including MRFM, where there exist additional model PSF errors in the sensitive magnetic resonance condition [3]. In such circumstances, the PSF required in the reconstruction process is mismatched with the true PSF. The quality of standard image reconstruction techniques may suffer from this disparity. To deal with this mismatch, deconvolution methods have been proposed to estimate the unknown image and the PSF jointly. When prior knowledge of the PSF is available, these methods are usually referred to as semi-blind deconvolution [4, 5] or myopic deconvolution [6, 7, 8].

In this paper, we formulate the semi-blind deconvolution task as an estimation problem in a Bayesian setting. Bayesian estimation has the great advantage of offering a flexible framework to solve complex model-based problems. Prior information available on the parameters to be estimated can be efficiently included within the model, leading to an implicit regularization of our ill-posed problem. In addition, the Bayes framework produces posterior estimates of uncertainty, via posterior variance and posterior confidence intervals. Extending our previous work, we propose a variational estimator for the parameters as contrasted to the Monte Carlo approach in [9]. This extension is non-trivial. Our variational Bayes algorithm iterates on a hidden variable domain associated with the mixture coefficients.
This algorithm is faster and more scalable than the sampler of [9] for equivalent image reconstruction quality.

As in [9], the PSF uncertainty is modeled as the deviation of the a priori known PSF from the true PSF. Applying an eigendecomposition to the PSF covariance, the deviation is represented as a linear combination of orthogonal PSF bases with unknown coefficients that need to be estimated. Furthermore, we assume the desired image is sparse, corresponding to the natural sparsity of the molecular image. The image prior is a weighted sum of a sparsity-inducing part and a continuous distribution: a positive truncated
Laplacian and atom at zero (LAZE) prior [10]. (A Laplace distribution used as a prior acts as a sparse regularizer via the ℓ1 norm; this can be seen by taking the negative logarithm of the distribution.) Similar priors have been applied to estimating mixtures of densities [11, 12, 13] and to sparse, nonnegative hyperspectral unmixing [14]. Here we introduce a hidden label variable for the contribution of the discrete mass (empty pixel) and the continuous density function (non-empty pixel). Similar to our 'hybrid' mixture model, inhomogeneous gamma-Gaussian mixture models have been proposed in [15].

Bayesian inference of parameters from the posterior distribution generally requires challenging computations, such as functional optimization and numerical integration. One widely advocated strategy relies on approximations to the minimum mean square error (MMSE) or maximum a posteriori (MAP) estimators using samples drawn from the posterior distribution. Generation of these samples can be accomplished using Markov chain Monte Carlo methods (MCMC) [16]. MCMC has been successfully adopted in numerous imaging problems such as image segmentation, denoising, and deblurring [17, 16]. Recently, to solve blind deconvolution, two promising semi-blind MCMC methods have been suggested [9, 18]. However, these sampling methods have the disadvantage that convergence may be slow.

An alternative to Monte Carlo integration is a variational approximation to the posterior distribution, and this approach is adopted in this paper. Such approximations have been extensively exploited to conduct inference in graphical models [19]. If properly designed, they can produce an analytical posterior distribution from which Bayesian estimators can be efficiently computed. Compared to MCMC, variational methods are of lower computational complexity; examples in blind deconvolution include the method of [31] and the related method of Shan et al. [32] under a motion blur model.

To implement variational Bayesian inference, prior distributions and the instrument-dependent likelihood function are specified. Then the posterior distributions are estimated by minimizing the Kullback-Leibler (KL) distance between the model and the empirical distribution. Simulations conducted on synthetic images show that the resulting myopic deconvolution algorithm outperforms previous mismatched non-blind algorithms and competes with the previous MCMC-based semi-blind method [9] at lower computational complexity.

We illustrate the proposed method on real data from magnetic resonance force microscopy (MRFM) experiments. MRFM is an emerging molecular imaging modality that has the potential for achieving 3D atomic-scale resolution [33, 34, 35]. Recently, MRFM has successfully demonstrated imaging [36, 37] of a tobacco mosaic virus [38]. The 3D image reconstruction problem for MRFM experiments was investigated with Wiener filters [39, 40, 37], iterative least squares reconstruction approaches [41, 38, 42], and recently the Bayesian estimation framework [10, 43, 8, 9]. The drawback of these approaches is that they require prior knowledge of the PSF. However, in many practical situations of MRFM imaging, the exact PSF, i.e., the response of the MRFM tip, is only partially known [3]. The proposed semi-blind reconstruction method accounts for this partial knowledge.

The rest of this paper is organized as follows. Section 2 formulates the imaging deconvolution problem in a hierarchical Bayesian framework. Section 3 covers the variational methodology and our proposed solutions. Section 4 reports simulation results and an application to the real MRFM data. Section 5 discusses our findings and concludes.
2. Formulation
As in [9, 43], the image model is defined as

y = Hx + n = T(κ, x) + n,    (1)

where y is the P × 1 observation vector, x = [x1, ..., xN]^T ⪰ 0 is the N × 1 unknown (nonnegative) image, T(κ, ·) is a convolution operator with the PSF κ, H = [h1, ..., hN] is the equivalent system matrix, and n is the measurement noise vector. In this work, the noise vector n is assumed to be Gaussian, n ∼ N(0, σ²I_P). The PSF κ is assumed to be unknown, but a nominal PSF estimate κ0 is available. The semi-blind deconvolution problem addressed in this paper consists of the joint estimation of x and κ from the noisy measurements y and the nominal PSF κ0.

The nominal PSF κ0 is assumed to be generated with known parameters (gathered in the vector ζ0) tuned during imaging experiments. However, due to model mismatch and experimental errors, the true PSF κ may deviate from the nominal PSF κ0. If the generation model for PSFs is complex, direct estimation of a parameter deviation, Δζ = ζ_true − ζ0, is difficult.

We model the PSF κ (resp. H) as a perturbation about the nominal PSF κ0 (resp. H0) with K basis vectors κ_k, k = 1, ..., K, that span a subspace representing possible perturbations Δκ. We empirically determined this basis using the following PSF variational eigendecomposition approach. A number of PSFs κ̃_j are generated following the PSF generation model, with parameters ζ randomly drawn according to a Gaussian distribution centered at the nominal values ζ0. Then a standard principal component analysis (PCA) of the residuals {κ̃_j − κ0}_{j=1,...} is used to identify K principal axes that are associated with the basis vectors κ_k. The necessary number of basis vectors, K, is determined empirically by detecting a knee in the scree plot. The first few eigenfunctions, corresponding to the largest eigenvalues, explain the major portion of the observed perturbations.
If there is no PSF generation model, then we can decompose the support region of the true (suspected) PSF to produce an orthonormal basis. The necessary number of basis vectors is again chosen to explain most of the support area carrying the major portion of the energy of the desired PSF. This approach is presented in our experiment with Gaussian PSFs.

(Here N(µ, Σ) denotes a Gaussian random variable with mean µ and covariance matrix Σ. The variances of the Gaussian distributions are carefully tuned so that their standard deviations produce a minimal-volume ellipsoid that contains the set of valid PSFs.)

We use a basis expansion to represent κ(c) as the following linear approximation to κ:

κ(c) = κ0 + Σ_{i=1}^{K} c_i κ_i,    (2)

where the {c_i} determine the PSF relative to this basis. With this parameterization, the objective of semi-blind deconvolution is to estimate the unknown image, x, and the linear expansion coefficients c = [c1, ..., cK]^T. The priors on the PSF, the image, and the noise are constructed as latent variables in a hierarchical Bayesian model.
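A minimal Python sketch of this basis construction (the function names and the synthetic residual model below are illustrative, not the paper's code): perturbed-PSF residuals are stacked, an SVD supplies the principal axes, and Eq. (2) rebuilds κ(c) from the coefficients.

```python
import numpy as np

def psf_basis(psf_samples, psf_nominal, K):
    """PCA of the residuals {psf_j - psf_nominal}: returns K orthonormal
    basis vectors (as rows) and the singular values for a scree plot."""
    R = np.stack([(p - psf_nominal).ravel() for p in psf_samples])
    # Rows of Vt are the principal axes of the residual cloud.
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt[:K], s

def psf_from_coeffs(psf_nominal, basis, c):
    """Linear PSF model of Eq. (2): kappa(c) = kappa_0 + sum_k c_k kappa_k."""
    return psf_nominal + (np.asarray(c) @ basis).reshape(psf_nominal.shape)
```

The knee of the singular-value curve `s` plays the role of the scree-plot criterion for choosing K.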
Under the hypothesis that the noise in (1) is white Gaussian, the likelihood function takes the form

p(y | x, c, σ²) = (1/(2πσ²))^{P/2} exp( −‖y − T(κ(c), x)‖² / (2σ²) ),    (3)

where ‖·‖ denotes the ℓ2 norm, ‖x‖² = x^T x.

To induce sparsity and positivity of the image, we use an image prior consisting of "a mixture of a point mass at zero and a single-sided exponential distribution" [10, 43, 9]. This prior is a convex combination of an atom at zero and an exponential distribution:

p(x_i | a, w) = (1 − w) δ(x_i) + w g(x_i | a).    (4)

In (4), δ(·) is the Dirac delta function, w = P(x_i ≠ 0) is the prior probability of a non-zero pixel, and g(x_i | a) = (1/a) exp(−x_i/a) 1_{R*+}(x_i) is a single-sided exponential distribution, where R*+ is the set of positive real numbers and 1_E(·) denotes the indicator function of the set E:

1_E(x) = 1 if x ∈ E; 0 otherwise.    (5)

A distinctive property of the image prior (4) is that it can be expressed as a latent variable model

p(x_i | a, z_i) = (1 − z_i) δ(x_i) + z_i g(x_i | a),    (6)

where the binary variables {z_i}_{i=1}^{N} are independent and identically distributed and indicate whether the pixel x_i is active:

z_i = 1 if x_i ≠ 0; 0 otherwise,    (7)

and have Bernoulli probabilities: z_i ∼ Ber(w).

The prior distribution of the pixel value x_i in (4) can be rewritten conditionally upon the latent variable z_i:

p(x_i | z_i = 0) = δ(x_i),
p(x_i | a, z_i = 1) = g(x_i | a),

which can be summarized in the following factorized form:

p(x_i | a, z_i) = δ(x_i)^{1−z_i} g(x_i | a)^{z_i}.    (8)

By assuming each component x_i to be conditionally independent given z_i and a, the following conditional prior distribution is obtained for x:

p(x | a, z) = Π_{i=1}^{N} δ(x_i)^{1−z_i} g(x_i | a)^{z_i},    (9)

where z = [z1, . . .
, zN]. This factorized form will turn out to be crucial for simplifying the variational Bayes reconstruction algorithm in Section 3.

We assume that the PSF parameters c1, ..., cK are independent and that c_k is uniformly distributed over the interval

S_k = [−Δc_k, Δc_k].    (10)

These intervals are specified a priori and are associated with the error tolerances of the imaging instrument. The joint prior distribution of c = [c1, ..., cK]^T is therefore

p(c) = Π_{k=1}^{K} (1/(2Δc_k)) 1_{S_k}(c_k).    (11)

A conjugate inverse-gamma distribution with parameters ς0 and ς1 is assumed as the prior distribution for the noise variance (see Appendix A.1 for the details of this distribution):

σ² | ς0, ς1 ∼ IG(ς0, ς1).    (12)

The parameters ς0 and ς1 will be fixed to a number small enough to obtain a vague hyperprior, unless we have good prior knowledge.

As reported in [10, 43], the values of the hyperparameters {a, w} greatly impact the quality of the deconvolution. Following the approach in [9], we propose to include them within the Bayesian model, leading to a second level of hierarchy in the Bayesian paradigm. This hierarchical Bayesian model requires the definition of prior distributions for these hyperparameters, also referred to as hyperpriors, which are defined below.

A conjugate inverse-gamma distribution is assumed for the Laplacian scale parameter a:

a | α ∼ IG(α0, α1),    (13)

with α = [α0, α1]^T. The parameters α0 and α1 will be fixed to a number small enough to obtain a vague hyperprior, unless we have good prior knowledge.

For w, we assume a Beta random variable with parameters (β0, β1), which are iteratively updated in accordance with data fidelity. The parameter values reflect the degree of prior knowledge, and we set β0 = β1 = 1 to obtain a non-informative prior (see Appendix A.2 for the details of this distribution):

w ∼ B(β0, β1).    (14)

The conditional relationships between the variables are illustrated in Fig. 1.
The resulting posterior of the hidden variables given the observation is

p(x, a, z, w, c, σ² | y) ∝ p(y | x, c, σ²) p(x | a, z) p(z | w) p(w) p(a) p(c) p(σ²).    (15)

Since it is too complex to derive exact Bayesian estimators from this posterior, a variational approximation of this distribution is proposed in the next section.

Figure 1: Conditional relationships between variables. A node at an arrow tail conditions the node at the arrow head.
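As a quick sanity check on the image prior (4) and its latent-variable form (6)-(7), pixels can be simulated by first drawing the labels z_i ∼ Ber(w) and then the active amplitudes from g(·|a). A minimal sketch (`sample_laze` is an illustrative name, not from the paper):

```python
import numpy as np

def sample_laze(n, w, a, rng):
    """Draw n pixels from the LAZE prior of Eq. (4):
    x_i = 0 with probability 1 - w, else x_i ~ Exp(mean a) on (0, inf)."""
    z = rng.random(n) < w                                   # labels z_i ~ Ber(w), Eq. (7)
    x = np.where(z, rng.exponential(scale=a, size=n), 0.0)  # Eq. (6)
    return x, z
```

The empirical fraction of nonzero pixels concentrates around w, and the mean of the active pixels around a, matching the roles of the two hyperparameters.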
3. Variational Approximation
In this section, we show how to approximate the posterior densities within a variational Bayes framework. Denote by U the set of all hidden parameter variables, including the image variable x, in the model, denoted by M. The hierarchical model implies the Markov representation p(y, U | M) = p(y | U, M) p(U | M). Our objective is to compute the posterior p(x | y, M) = ∫ p(y | U, M) p(U | M) dU\x / p(y | M), where U\x is the set of variables in U except x. Let q be any arbitrary distribution of U. Then

ln p(y | M) = L(q) + KL(q ‖ p),    (16)

with

L(q) = ∫ q(U | M) ln( p(y, U | M) / q(U | M) ) dU,    (17)

KL(q ‖ p) = −∫ q(U | M) ln( p(U | y, M) / q(U | M) ) dU.    (18)

We observe that maximizing the lower bound L(q) is equivalent to minimizing the Kullback-Leibler (KL) divergence KL(q ‖ p). Consequently, instead of directly evaluating p(y | M) given M, we will specify a distribution q(U | M) that approximates the posterior p(U | y, M). The best approximation maximizes L(q). We present Algorithm 1, which iteratively increases the value of L(q) by updating the posterior surrogate densities. To obtain a tractable approximating distribution q, we will assume a factorized form q(U) = Π_j q(U_j), where U has been partitioned into disjoint groups U_j. Subject to this factorization constraint, the optimal distribution q*(U) = Π_j q*(U_j) is given by

ln q*_j(U_j) = E_{\U_j}[ln p(U, y)] + const,  ∀j,    (19)

where E_{\U_j} denotes the expectation with respect to all factors U_i except i = j. We will call q*(U) the posterior surrogate for p.
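The decomposition (16) can be verified numerically on a toy discrete model (all numbers below are arbitrary illustrations): for any surrogate q, L(q) + KL(q‖p) reproduces ln p(y), and since KL ≥ 0, L(q) is a lower bound on the evidence.

```python
import numpy as np

# Toy model: hidden variable U in {0, 1, 2}, one fixed observation y.
p_U = np.array([0.5, 0.3, 0.2])          # prior p(U)
p_y_given_U = np.array([0.9, 0.2, 0.4])  # likelihood p(y | U)
p_joint = p_U * p_y_given_U              # p(y, U)
p_y = p_joint.sum()                      # evidence p(y)
p_post = p_joint / p_y                   # posterior p(U | y)

q = np.array([0.6, 0.3, 0.1])            # an arbitrary surrogate distribution

L = np.sum(q * np.log(p_joint / q))      # lower bound L(q), Eq. (17)
KL = np.sum(q * np.log(q / p_post))      # KL(q || p(U | y)), Eq. (18)

assert np.isclose(L + KL, np.log(p_y))   # Eq. (16) holds for any q
assert KL >= 0                           # hence L(q) <= ln p(y)
```

Maximizing L over q (here, moving q toward p_post) drives KL to zero, which is exactly what the coordinate updates (19) do factor by factor.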
Based on our assumptions on the image and the hidden parameters, the random vector is U ≜ {θ, φ} = {x, a, z, w, c, σ²}, with θ = {x, z, c} and φ = {a, w, σ²}. We propose the following factorized approximating distribution:

q(U) = q(x, a, z, w, c, σ²) = q(x, z, c) q(a, w, σ²).    (20)

Ignoring constants, (19) leads to

ln q(a, w, σ²) = E_{\a}[ln p(x | a, z) p(a)] + E_{\w}[ln p(z | w) p(w)] + E_{\σ²}[ln p(y | x, σ²) p(σ²)],    (21)

where the three terms are ln q(a), ln q(w), and ln q(σ²), respectively, which induces the factorization

q(φ) = q(a) q(w) q(σ²).    (22)

(In the sequel, we use both E[·] and ⟨·⟩ to denote the expectation. To make our expressions more compact, we use subscripts to denote expectation with respect to the random variables in the subscripts; a subscript '\v' denotes expectation with respect to all random variables except the variable v, e.g., E_{\U_j}. Constant terms with respect to the variables of interest are omitted in equations.)

Similarly, the factorization for x, z and c is

q(θ) = [Π_i q(x_i | z_i)] q(z) q(c),    (23)

leading to the fully factorized distribution

q(θ, φ) = [Π_i q(x_i | z_i)] q(a) q(z) q(w) q(c) q(σ²).    (24)

In this section, we specify the marginal distributions of the approximate posterior distribution required in (24); more details are described in Appendix B. The parameters of the posterior distributions are evaluated iteratively, due to the mutual dependence of the parameters in the distributions of the hidden variables, as illustrated in Algorithm 1.

3.3.1. Posterior surrogate for a

q(a) = IG(α̃0, α̃1),    (25)

with α̃0 = α0 + Σ_i ⟨z_i⟩ and α̃1 = α1 + Σ_i ⟨z_i x_i⟩.
3.3.2. Posterior surrogate for w

q(w) = B(β̃0, β̃1),    (26)

with β̃0 = β0 + N − Σ_i ⟨z_i⟩ and β̃1 = β1 + Σ_i ⟨z_i⟩.

3.3.3. Posterior surrogate for σ²

q(σ²) = IG(ς̃0, ς̃1),    (27)

with ς̃0 = P/2 + ς0, ς̃1 = ⟨‖y − Hx‖²⟩/2 + ς1, and

⟨‖y − Hx‖²⟩ = ‖y − ⟨H⟩⟨x⟩‖² + Σ_i var[x_i] [‖⟨κ⟩‖² + Σ_l σ²_{c_l} ‖κ_l‖²] + Σ_l σ²_{c_l} ‖H_l⟨x⟩‖²,

where σ²_{c_l} is the variance of the Gaussian distribution q(c_l) given in (33) and var[x_i] is computed under the distribution q(x_i) defined in the next section and described in Appendix B.3.

3.3.4. Posterior surrogate for x

We first note that

ln q(x, z) = ln q(x | z) q(z) = E[ln p(y | x, σ²) p(x | a, z) p(z | w)].    (28)

The conditional density of x given z is p(x | a, z) = Π_{i=1}^{N} g_{z_i}(x_i), where g_0(x_i) ≜ δ(x_i) and g_1(x_i) ≜ g(x_i | a). Therefore, the conditional posterior surrogate for x_i is

q(x_i | z_i = 0) = δ(x_i),    (29)

q(x_i | z_i = 1) = φ+(µ_i, η²_i),    (30)

where φ+(µ, σ²) is a positively truncated Gaussian density function with hidden mean µ and variance σ², η²_i = 1/[⟨‖h_i‖²⟩⟨1/σ²⟩], µ_i = η²_i [⟨h_i^T e_i⟩⟨1/σ²⟩ − ⟨1/a⟩], e_i = y − H x_{−i}, x_{−i} is x with its i-th entry replaced by 0, and h_i is the i-th column of H. Therefore,

q(x_i) = q(z_i = 0) δ(x_i) + q(z_i = 1) φ+(µ_i, η²_i),    (31)

which is a Bernoulli truncated-Gaussian density.

3.3.5. Posterior surrogate for z

For i = 1, ..., N,

q(z_i = 1) = 1/[1 + C′_i]  and  q(z_i = 0) = 1 − q(z_i = 1),    (32)

with C′_i = exp( (C_i/2) ς̃0/ς̃1 + µ_i α̃0/α̃1 + ln α̃1 − ψ(α̃0) + ψ(β̃0) − ψ(β̃1) ).
Here ψ is the digamma function and C_i = ⟨‖h_i‖²⟩(µ²_i + η²_i) − 2⟨e_i^T h_i⟩µ_i.

3.3.6. Posterior surrogate for c

For j = 1, ..., K,

q(c_j) = φ(µ_{c_j}, σ²_{c_j}),    (33)

where φ(µ, σ²) is the probability density function of the normal distribution with mean µ and variance σ²,

µ_{c_j} = ⟨x^T H_j^T y − x^T H_j^T H_0 x − Σ_{l≠j} x^T H_j^T H_l c_l x⟩ / ⟨x^T H_j^T H_j x⟩,

and 1/σ²_{c_j} = ⟨1/σ²⟩⟨x^T H_j^T H_j x⟩.
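The moments of the truncated-Gaussian surrogate (30), needed for ⟨x_i⟩ and var[x_i] in Algorithm 1, have a standard closed form based on the inverse Mills ratio. A sketch (`trunc_gauss_mean_var` is an illustrative name, not the paper's code):

```python
import math

def trunc_gauss_mean_var(mu, var):
    """Mean and variance of N(mu, var) truncated to (0, +inf),
    i.e. of the surrogate phi_+(mu_i, eta_i^2) for an active pixel."""
    s = math.sqrt(var)
    alpha = -mu / s
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)  # N(0,1) pdf
    cdf = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))             # N(0,1) cdf
    lam = pdf / (1.0 - cdf)                   # inverse Mills ratio at alpha
    mean = mu + s * lam
    variance = var * (1.0 + alpha * lam - lam * lam)
    return mean, variance
```

The marginal posterior mean of a pixel under (31) is then ⟨x_i⟩ = q(z_i = 1) times this truncated mean, since the δ component contributes zero.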
Algorithm 1: VB semi-blind image reconstruction algorithm

% Initialization:
Initialize the estimates ⟨x^(0)⟩, ⟨z^(0)⟩, and w^(0), and set c = 0 so that κ̂^(0) = κ0.
% Iterations:
for t = 1, 2, ... do
    Evaluate α̃0^(t), α̃1^(t) in (25) by using ⟨x^(t−1)⟩ and ⟨z^(t−1)⟩,
    Evaluate β̃0^(t), β̃1^(t) in (26) by using ⟨z^(t−1)⟩,
    Evaluate ς̃0^(t), ς̃1^(t) in (27) from ⟨‖y − Hx‖²⟩,
    for i = 1, 2, ..., N do
        Evaluate the statistics (µ_i, η²_i) for q(x_i | z_i = 1) in (30),
        Evaluate q(z_i = 1) in (32),
        Evaluate ⟨x_i⟩ and var[x_i],
        For l = 1, ..., K, evaluate µ_{c_l} and 1/σ²_{c_l} for q(c_l) in (33),
    end for
end for

The final iterative algorithm is presented in Algorithm 1, where the required shaping parameters under the distributional assumptions and the related statistics are iteratively updated.
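The shaping-parameter steps (25)-(26) of Algorithm 1 reduce to accumulating the current variational moments ⟨z_i⟩ and ⟨z_i x_i⟩. A minimal sketch (function names are illustrative):

```python
import numpy as np

def update_a(alpha0, alpha1, Ez, Ezx):
    """q(a) = IG(alpha0~, alpha1~) of Eq. (25): accumulate <z_i> and <z_i x_i>."""
    return alpha0 + Ez.sum(), alpha1 + Ezx.sum()

def update_w(beta0, beta1, Ez):
    """q(w) = Beta(beta0~, beta1~) of Eq. (26): count inactive vs. active pixels."""
    return beta0 + Ez.size - Ez.sum(), beta1 + Ez.sum()
```

With the vague choices α0 = α1 = β0 = β1 = 1 used in Section 4, even a single active pixel shifts the Beta surrogate for w away from the uniform prior.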
4. Simulation Results
We first present numerical results obtained for Gaussian and typical MRFM PSFs, shown in Fig. 2 and Fig. 6, respectively. Then the proposed variational algorithm is applied to a tobacco virus MRFM data set. There are many possible approaches to selecting hyperparameters, including the non-informative approach of [9] and the expectation-maximization approach of [12]. In our experiments, the hyperparameters ς0, ς1, α0, and α1 for the densities are chosen based on the framework advocated in [9]. This leads to vague priors corresponding to selecting the small values ς0 = ς1 = α0 = α1 = 1. For w, a noninformative initialization is made by setting β0 = β1 = 1, which gives flexibility to the surrogate posterior density for w. The resulting prior Beta distribution for w is a uniform distribution on [0, 1] for the mean proportion of non-zero pixels:

w ∼ B(β0, β1) ∼ U([0, 1]).    (34)

The initial image used to initialize the algorithm is obtained from one Landweber iteration [44].

4.1. Experiment with Gaussian PSF

The true image x used to generate the data, the observation y, the true PSF, and the initial, mismatched PSF are shown in Fig. 2. Some quantities of interest, computed from the outputs of the variational algorithm, are depicted as functions of the iteration number in Fig. 3. These plots indicate that convergence to the steady state is achieved after a few iterations. In Fig. 3, E[w] and E[1/a] get close to the true levels, but E[1/σ²] shows a deviation from the true value. This large deviation implies that our estimate of the noise level is conservative: the estimated noise level is larger than the true level. This relates to the large deviation of the projection error from the noise level (Fig. 3(a)). The drastic changes in the initial steps seen in the curves of E[1/a] and E[w] are due to the imperfect prior knowledge (initialization). The final estimated PSF and reconstructed image are depicted in Fig. 4, along with the reconstructed variances and the posterior probability of z_i ≠ 0. We decomposed the support region of the true PSF to produce the orthonormal bases {κ_i} shown in Fig. 5. We extracted 4 bases because these four PSF bases clearly explain the significant part of the true Gaussian PSF; in other words, little energy resides outside of this basis set in PSF space. The reconstructed PSF clearly matches the true one, as seen in Fig. 2 and Fig. 4.
Note that the restored image is slightly attenuated while the restored PSF is amplified, because of the intrinsic scale ambiguity.

Figure 2: Experiment with Gaussian PSF: (a) true image x, (b) observation, (c) true PSF, and (d) mismatched PSF (κ0).

Figure 3: Result of Algorithm 1: curves of (a) the residual log ‖y − E[H]E[x]‖² with the noise level, (b) the error log ‖x_true − E[x]‖², (c) E[1/a], (d) E[1/σ²], and (e) E[w], each with its true value, and (f) E[c], the four PSF coefficients, as functions of the number of iterations. These curves show how fast convergence is achieved.

4.2. Experiment with simplified MRFM PSF

The true image x used to generate the data, the observation y, the true PSF, and the initial, mismatched PSF are shown in Fig. 6. The PSF models the point spread function of the MRFM instrument, derived by Mamin et al. [3]. Convergence of the algorithm is achieved after the 10th iteration. The reconstructed image can be compared to the true image in Fig. 7, where the pixel-wise variances and the posterior probability of z_i ≠ 0 are rendered.

Figure 4: (a) Restored PSF, (b) estimated image, (c) map of pixel-wise (posterior) variance, and (d) weight map. κ̂ = E[κ] is close to the true PSF. The pixel-wise weight shown in (d) is the posterior probability of the pixel being a nonzero signal.

The PSF bases are obtained by the procedure proposed in Section 2.2 with the simplified MRFM PSF model and the nominal parameter values [10]. Specifically, by detecting a knee at K = 4 in the scree plot, explaining more than 98.69% of the observed perturbations (Fig. 3 in [9]), we use the first four eigenfunctions, corresponding to the four largest eigenvalues. The resulting K = 4 principal basis vectors are depicted in Fig. 8.
The reconstructed PSF computed with these bases clearly matches the true one, as seen in Fig. 6 and Fig. 7.

Figure 5: PSF bases κ1, ..., κ4 for the Gaussian PSF: (a) first basis, (b) second basis, (c) third basis, (d) fourth basis.

The results from the variational deconvolution algorithm with a mismatched Gaussian PSF and a mismatched MRFM-type PSF are presented in Fig. 9 and Fig. 10, respectively; the relevant PSFs and observations are presented in Fig. 2 in Section 4.1 and in Fig. 6 in Section 4.2, respectively. Compared with the results of our VB semi-blind algorithm (Algorithm 1), shown in Fig. 4 and Fig. 7, the reconstructed images from the mismatched non-blind VB algorithm in Fig. 9 and Fig. 10, respectively, inaccurately estimate the signal locations and blur most of the non-zero values.

Additional experiments (not shown here) establish that the PSF estimator is very accurate when the algorithm is initialized with the true image.
To quantify the comparison, we performed experiments with the same set of four sparse images and the MRFM-type PSFs as used in [9]. By generating 100 different noise realizations for 100 independent trials with each true image, we measured errors according to various criteria. We tested four sparse images with sparsity levels ‖x‖0 ranging from 6 to 30. Fig. 11 visualizes the reconstruction error performance for several measures of error. From these figures we conclude that the VB semi-blind algorithm performs at least as well as the previous MCMC semi-blind algorithm. (Note that the ℓ0 norm has been normalized. The true image has value 1; ‖x̂‖/‖x‖0 is used for the MCMC method, and E[w] × N/‖x‖0 for the variational method, since the latter does not produce zero pixels but E[w]. Note also that, for our simulated data, the true noise levels are normalized as ‖n‖/‖x‖0 at each sparsity level.)

Figure 6: Experiment with simplified MRFM PSF: (a) true image x, (b) observation, (c) true PSF, and (d) mismatched PSF (κ0).

In addition, the VB method outperforms the AM [45] and the mismatched non-blind MCMC [43] methods. In terms of PSF estimation, for very sparse images the VB semi-blind method seems to outperform the MCMC method. Also, the proposed VB semi-blind method converges more quickly and requires fewer iterations. For example, the VB semi-blind algorithm converges in approximately 9.6 seconds after 12 iterations, whereas the previous MCMC algorithm takes more than 19.2 seconds over 40 iterations to achieve convergence. (Convergence here is defined as the state where the change in the estimation curves over time is negligible.)

In addition, we compared our sparse image reconstruction method with other state-of-the-art blind deconvolution methods [27, 29, 30, 28, 31, 32], as shown in our previous work [9]. These algorithms were initialized with the nominal, mismatched PSF and were applied to the same sparse image as in our experiment above. For a fair comparison, we made a sparse-prior modification in the image model of the other algorithms, as needed. Most of these methods do not assume or fit the sparse model in our experiments, thus leading to poor performance in terms of image and PSF estimation errors.

Figure 7: Restored PSF and image with pixel-wise variance and weight map: (a) estimated PSF, (b) estimated image, (c) variance map, (d) weight map. κ̂ = E[κ] is close to the true PSF.

Among the tested algorithms, two, proposed by Tzikas et al. [28] and Almeida et al. [30], produced non-trivial and convergent solutions, and the corresponding results are compared to ours in Fig. 11. By using basis kernels, the method proposed by Tzikas et al. [28] uses a PSF model similar to ours.
Because a sparse image prior is not assumed in their algorithm [28], we applied their suggested PSF model along with our sparse image prior for a fair comparison. The method proposed by Almeida et al. [30] exploits the sharp-edge property of natural images and uses initial, high regularization for effective PSF estimation. Both of these perform worse than our VB method, as seen in Fig. 11. The remaining algorithms [27, 29, 31, 32], which focus on photo image reconstruction or motion blur, either produce a trivial solution (x̂ ≈ y) or are a special case of Tzikas's model [28].

To obtain a lower bound for our myopic reconstruction algorithm, we used the Iterative Shrinkage/Thresholding (IST) algorithm with the true PSF. This algorithm effectively restores sparse images under a sparsity constraint [46]. We compare the computation time of our proposed reconstruction algorithm with that of the others in Table 1. (Matlab is used under Windows 7 Enterprise on an HP Z200 (Quad 2.66 GHz) platform.)

Figure 8: PSF bases κ1, ..., κ4 for the MRFM PSF: (a) first basis, (b) second basis, (c) third basis, (d) fourth basis.

Table 1: Computation time of algorithms (in seconds), for the data in Fig. 6.
Our method:              9.58
Semi-blind MC [9]:       19.20
Bayesian non-blind [43]: 3.61
AM [45]:                 0.40
Almeida's method [30]:   5.63
Amizic's method [29]:    5.69
Tzikas's method [28]:    20.31
(oracle) IST [46]:       0.09
We applied the proposed variational semi-blind sparse deconvolution algorithm to the tobacco mosaic virus data, made available by our IBM collaborators [38] and shown in the first row of Fig. 12. Our algorithm is easily modified for these 3D raw image data and a 3D PSF, with an additional dimension in the basis functions used to evaluate each voxel value x_i. The noise is assumed Gaussian [36, 38] and the four PSF bases are obtained by the procedure proposed in Section 2.2 with the physical MRFM PSF model and the nominal parameter values [3].

Figure 9: (Mismatched) non-blind result with a mismatched Gaussian PSF: (a) true image, (b) estimated image, (c) variance map, (d) weight map.

The reconstruction of the 6th layer is shown in Fig. 12(b) and is consistent with the results obtained by other methods (see [9, 43]). The estimated deviation in the PSF is small, as predicted in [9]. While they now exhibit similar smoothness, the VB and MCMC images are still somewhat different, since each algorithm follows a different iterative trajectory in the high-dimensional space of 3D images, thus possibly converging to slightly different stopping points near the maximum of the surrogate distribution. We conclude that the two images from VB and MCMC are comparable in that both represent the 2D SEM image well, but VB is significantly faster.

In blind deconvolution, joint identifiability is a common issue. For example, because of scale ambiguity, unicity cannot be guaranteed in a general setting, and it is not proven for our solution either. However, the shift/time ambiguity noticed in [47] is implicitly addressed in our method through the use of a nominal PSF and basis PSFs. Moreover, our constraint on the PSF space using a basis approach effectively excludes a delta function as a PSF solution, thus avoiding the trivial solution.

Figure 10: (Mismatched) non-blind result with a mismatched MRFM-type PSF: (a) true image, (b) estimated image, (c) variance map, (d) weight map.
Secondly, the PSF solution is restricted to this linear spanning space, starting from the initial, mismatched PSF. We can therefore reasonably expect that the solution provided by the algorithm is close to the true PSF, away from both the trivial solution and the initial PSF. To resolve the scale ambiguity in an MCMC Bayesian framework, stochastic samplers have been proposed in [47] that impose a fixed variance on a certain distribution. Another approach to resolving the scale ambiguity is to assume a hidden scale variable that multiplies the PSF and divides the image (or vice versa), where the scale is drawn at each iteration of the Gibbs sampler [48].
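The basis constraint discussed above can be made concrete with a small sketch. The kernels and perturbations below are hypothetical (the actual MRFM bases κ₁, . . . , κ₄ come from the procedure of Section 2.2); the point is only that orthonormal perturbation bases around a nominal kernel κ₀ define a restricted affine PSF space κ₀ + Σ_l c_l κ_l, which does not contain a delta function for moderate coefficients.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of vectors (the PSF perturbation directions)."""
    basis = []
    for v in vectors:
        w = v - sum((v @ b) * b for b in basis)   # remove components along earlier bases
        if np.linalg.norm(w) > 1e-12:
            basis.append(w / np.linalg.norm(w))
    return basis

# Hypothetical nominal 1D PSF and perturbations of it (e.g. obtained by varying
# physical PSF parameters around their nominal values).
t = np.linspace(-3, 3, 31)
kappa0 = np.exp(-t**2)                          # nominal (mismatched) PSF
perturbs = [np.exp(-(t / 0.9)**2) - kappa0,     # width perturbation
            np.exp(-(t - 0.2)**2) - kappa0]     # shift perturbation
bases = gram_schmidt(perturbs)

# A candidate semi-blind PSF: the nominal PSF plus a small correction
# in the span of the orthonormal bases.
c = [0.3, -0.1]
kappa = kappa0 + sum(cl * kl for cl, kl in zip(c, bases))
```

Because any candidate PSF stays within this low-dimensional affine space anchored at κ₀, a delta-function PSF (and hence the trivial solution x̂ ≈ y) is excluded by construction.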
5. Conclusion
We have suggested a novel variational solution to a semi-blind sparse deconvolution problem. Our method uses Bayesian inference for image and PSF restoration with a sparsity-inducing image prior via the variational Bayes approximation. Its power in automatically producing all required parameter values from the data merits further attention for the extraction of image properties and the retrieval of necessary features. From the simulation results, we conclude that the performance of the VB method competes with MCMC methods in sparse image estimation, while requiring fewer computations.

(We note that the MCMC sampling method of [47], designed for 1D signal deconvolution, is not efficient for analyzing 2D and 3D images, since the grouped and marginalized samplers are usually slow to converge, requiring hundreds of iterations.)

Figure 11: For various image sparsity levels (x-axis: log‖x‖₀), performance of several blind, semi-blind, and non-blind deconvolution algorithms: the proposed method (red), AM (blue), Almeida's method (green), Tzikas's method (cyan), semi-blind MC (black), mismatched non-blind MC (magenta). Errors are illustrated with standard deviations. (a) Estimated sparsity ‖x̂‖₀/‖x‖₀; the normalized true level is 1 (black circles). (b) Normalized error ‖x − x̂‖/‖x‖ in the reconstructed image; for the lower bound, information about the true PSF is available only to the oracle IST (black circles). (c) Residual (projection) error ‖y − ŷ‖/‖x‖; the noise level appears in black circles. (d) PSF recovery error ‖κ − κ̂‖/‖κ‖, as a performance gauge of our semi-blind method; at the initial stage of the algorithm, this error is that of the mismatched PSF κ₀.
Compared to a non-blind algorithm, whose mismatched PSF leads to imprecise and blurred signal locations in the restored image, the VB semi-blind algorithm correctly produces sparse image estimates. The benefits of this solution compared to the previous solution [9] are the faster convergence and stability of the method.

Figure 12: (a) TMV raw data, (b) virus image estimated by VB, (c) virus image estimated by MCMC [9], and (d) virus image from the electron microscope [38].

Appendix A. Useful Distributions
Appendix A.1. Inverse Gamma Distribution
The density of an inverse Gamma random variable $X \sim \mathcal{IG}(a, b)$ is $\frac{b^a}{\Gamma(a)} x^{-a-1} \exp(-b/x)$, for $x \in (0, \infty)$. Moreover, $E[X^{-1}] = a/b$ and $E[\ln X] = \ln(b) - \psi(a)$, where $\psi$ is the digamma function.

Appendix A.2. Beta Distribution
The density of a Beta random variable $X \sim \mathcal{B}(a, b)$ is $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{b-1} (1-x)^{a-1}$, for $x \in (0, 1)$, where $\Gamma(c) = \int_0^\infty t^{c-1} e^{-t}\, dt$. The mean of $\mathcal{B}(a, b)$ is $\frac{b}{a+b}$ and $E[\ln X] = \psi(b) - \psi(a+b)$, where $\psi$ is the digamma function.

Appendix A.3. Positively Truncated Gaussian Distribution
The density of a positively truncated Gaussian random variable $x_i$ is denoted by $x_i \sim \mathcal{N}_+(x_i; \mu, \eta)$, and its statistics used in the paper are
$$E[x_i \mid x_i > 0] = E[\mathcal{N}_+(x_i; \mu, \eta)] = \mu + \sqrt{\eta}\, \frac{\phi(-\mu/\sqrt{\eta})}{1 - \Phi(-\mu/\sqrt{\eta})},$$
$$E[x_i^2 \mid x_i > 0] = \mathrm{var}[x_i \mid x_i > 0] + \big(E[x_i \mid x_i > 0]\big)^2 = \eta + \mu\, E[x_i \mid x_i > 0],$$
where $\phi$ and $\Phi$ are the density and the cumulative distribution function of the standard normal distribution.

Appendix B. Derivations of q(·)

In this section, we derive the posterior densities defined by the variational Bayes framework in Section 3.
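Before turning to the derivations, the moment identities collected in Appendix A can be verified numerically. The following SciPy sketch uses arbitrary example parameter values (a = 3, b = 2, μ = 0.5, η = 1.5, none taken from the paper) and exploits the fact that the Beta density in the convention above, with exponent b−1 on x, corresponds to SciPy's beta(b, a):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma
from scipy.stats import beta as beta_dist, invgamma, norm, truncnorm

# Arbitrary example parameter values.
a, b = 3.0, 2.0

# Appendix A.1: X ~ IG(a, b) => E[X^{-1}] = a/b and E[ln X] = ln(b) - psi(a).
ig_pdf = lambda x: invgamma.pdf(x, a, scale=b)
assert abs(quad(lambda x: ig_pdf(x) / x, 0, np.inf)[0] - a / b) < 1e-6
assert abs(quad(lambda x: ig_pdf(x) * np.log(x), 0, np.inf)[0]
           - (np.log(b) - digamma(a))) < 1e-6

# Appendix A.2: the density x^{b-1}(1-x)^{a-1} is SciPy's beta(b, a);
# mean = b/(a+b) and E[ln X] = psi(b) - psi(a+b).
assert abs(beta_dist.mean(b, a) - b / (a + b)) < 1e-12
assert abs(quad(lambda x: beta_dist.pdf(x, b, a) * np.log(x), 0, 1)[0]
           - (digamma(b) - digamma(a + b))) < 1e-6

# Appendix A.3: positively truncated Gaussian moments.
mu, eta = 0.5, 1.5
s = np.sqrt(eta)
alpha = -mu / s                                        # standardized truncation point
m1 = mu + s * norm.pdf(alpha) / (1 - norm.cdf(alpha))  # E[x | x > 0]
m2 = eta + mu * m1                                     # E[x^2 | x > 0]
tn = truncnorm(a=alpha, b=np.inf, loc=mu, scale=s)     # reference distribution
assert abs(tn.mean() - m1) < 1e-9
assert abs(tn.moment(2) - m2) < 1e-7
```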
Appendix B.1. Derivation of q(c)

We denote the expected value of the squared residual term by $R = E\|y - Hx\|^2$. For $c_l$, $l = 1, \ldots, K$,
$$R = E\Big\|y - H_0 x - \sum_{l \neq j} H_l x\, c_l - H_j x\, c_j\Big\|^2 = c_j^2 \big\langle x^T H_j^T H_j x \big\rangle - 2 c_j \Big\langle x^T H_j^T y - x^T H_j^T H_0 x - \sum_{l \neq j} c_l\, x^T H_j^T H_l x \Big\rangle + \mathrm{const},$$
where $H_j$ is the convolution matrix corresponding to the convolution with $\kappa_j$. For $i \neq j$ and $i, j > 0$,
$$E\big[(H_i x)^T (H_j x)\big] = \mathrm{tr}\big(H_i^T H_j (\mathrm{cov}(x) + \langle x \rangle \langle x^T \rangle)\big) = (H_i \langle x \rangle)^T (H_j \langle x \rangle),$$
since $\mathrm{tr}(H_i^T H_j \mathrm{cov}(x)) = \mathrm{tr}(H_i^T H_j D) = \sum_k d_k\, h_{ik}^T h_{jk} = 0$. Here, $\mathrm{cov}(x)$ is approximated as a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$. This is reasonable, especially when the expected recovered signal $\hat{x}$ exhibits high sparsity. Likewise, $E\big[(H_0 x)^T (H_j x)\big] = \kappa_0^T \kappa_j \sum_i \mathrm{var}[x_i] + (H_0 \langle x \rangle)^T (H_j \langle x \rangle)$ and $E\big[(H_j x)^T (H_j x)\big] = \|\kappa_j\|^2 \sum_i \mathrm{var}[x_i] + \|H_j \langle x \rangle\|^2$. Then, we factorize $E\big[-\frac{R}{2\sigma^2}\big] = -\frac{(c_j - \mu_{c_j})^2}{2\sigma_{c_j}^2}$, with
$$\mu_{c_j} = \frac{\big\langle x^T H_j^T y - x^T H_j^T H_0 x - \sum_{l \neq j} c_l\, x^T H_j^T H_l x \big\rangle}{\big\langle x^T H_j^T H_j x \big\rangle}, \qquad \frac{1}{\sigma_{c_j}^2} = \big\langle 1/\sigma^2 \big\rangle \big\langle x^T H_j^T H_j x \big\rangle.$$
If we set the prior $p(c_j)$ to be a uniform distribution over a wide range of the real line that covers error tolerances, we obtain a normally distributed variational density $q(c_j) = \phi(\mu_{c_j}, \sigma_{c_j}^2)$ with the mean $\mu_{c_j}$ and variance $\sigma_{c_j}^2$ defined above, because $\ln q(c_j) = E\big[-\frac{R}{2\sigma^2}\big]$. By the independence assumption, $q(c) = \prod_j q(c_j)$, so $q(c)$ can be easily evaluated.

Appendix B.2. Derivation of q(σ²)

We evaluate $R$ ignoring edge effects:
$$R = \|y - \langle H \rangle \langle x \rangle\|^2 + \sum_i \mathrm{var}[x_i] \Big[\|\langle \kappa \rangle\|^2 + \sum_l \sigma_{c_l}^2 \|\kappa_l\|^2\Big] + \sum_l \sigma_{c_l}^2 \|H_l \langle x \rangle\|^2.$$
Here $\|\kappa\|^2$ is the kernel energy in the $\ell_2$ sense, and the variance terms add uncertainty, due to the uncertainty in $\kappa$, to the estimation of the density. Applying (19) and ignoring constants,
$$\ln q(\sigma^2) = E_{\backslash \sigma^2}\big[\ln p(y \mid x, c, \sigma^2)\, p(\sigma^2)\, p(x \mid a, w)\, p(w)\, p(a)\big] = E_{x, c}\big[\ln p(y \mid x, \sigma^2)\big] + \ln p(\sigma^2) = -\frac{E_{x, c}\big[\|y - Hx\|^2\big]}{2\sigma^2} - \frac{P}{2} \ln \sigma^2 + \ln p(\sigma^2),$$
so that
$$\mathcal{IG}(\tilde\varsigma_0, \tilde\varsigma_1) \triangleq q(\sigma^2) = \mathcal{IG}\big(P/2 + \varsigma_0,\ \langle\|y - Hx\|^2\rangle/2 + \varsigma_1\big),$$
where $\varsigma_0, \varsigma_1$ are the prior parameters. ($E_{\backslash \sigma^2}$ denotes expectation with respect to all variables except $\sigma^2$.)

Appendix B.3. Derivation of q(x)

For $x_i$, $i = 1, \ldots, N$, $R = E\|e_i - h_i x_i\|^2$ with $e_i = y - Hx_{-i} = y - H_0 x_{-i} - \sum_l H_l c_l x_{-i}$ and $h_i = [H_0 + \sum_l H_l c_l]_i = h_{0i} + \sum_l h_{li} c_l$ (the $i$th column of $H$). Ignoring constants, $R = \langle\|h_i\|^2\rangle\, x_i^2 - 2 \langle h_i^T e_i \rangle\, x_i$. Using the orthogonality of the kernel bases and the uncorrelatedness of the $c_l$'s, we derive the following terms (necessary to evaluate $R$): $\langle\|h_i\|^2\rangle = \|h_{0i}\|^2 + \sum_l \sigma_{c_l}^2 \|h_{li}\|^2$ and $\langle h_i^T e_i \rangle = \langle h_i^T \rangle (y - \langle H \rangle \langle x_{-i} \rangle) - \sum_l \mathrm{var}[c_l]\, h_{li}^T H_l \langle x_{-i} \rangle$. Then,
$$\mathrm{var}[x_i] = w_i'\, E\big[x_i^2 \mid x_i > 0\big] - w_i'^2 \big(E[x_i \mid x_i > 0]\big)^2, \qquad E[x_i] = w_i'\, E[x_i \mid x_i > 0],$$
where $w_i' = q(z_i = 1)$ is the posterior weight for the normal distribution and $1 - w_i'$ is the weight for the delta function. The required statistics of $x_i$ used to derive the distribution above are obtained by applying Appendix A.3.

Appendix B.4. Derivation of q(z)

To derive $q(z_i = 1) = \langle z_i \rangle$, we evaluate the unnormalized version $\hat q(z_i)$ of $q(z_i)$ and normalize it.
We have
$$\ln \hat q(z_i = 1) = E_{\backslash z_i}\Big[-\frac{\|e_i - h_i x_i\|^2}{2\sigma^2} - \ln a - \frac{x_i}{a} + \ln w\Big] \quad \text{with } x_i \sim \mathcal{N}_+(\mu_i, \eta_i),$$
and
$$\ln \hat q(z_i = 0) = E_{\backslash z_i}\Big[-\frac{\|e_i\|^2}{2\sigma^2} + \ln(1 - w)\Big] \quad \text{with } x_i = 0.$$
The normalized version of the weight is $q(z_i = 1) = 1/[1 + C_i']$, where
$$C_i' = \exp\big(\ln \hat q(z_i = 0) - \ln \hat q(z_i = 1)\big) = \exp\Big(\frac{C_i}{2} \langle 1/\sigma^2 \rangle + \mu_i \langle 1/a \rangle + \langle \ln a \rangle + \langle \ln(1 - w) - \ln w \rangle\Big) = \exp\Big(\frac{C_i}{2}\, \tilde\varsigma_0/\tilde\varsigma_1 + \mu_i\, \tilde\alpha_0/\tilde\alpha_1 + \ln \tilde\alpha_1 - \psi(\tilde\alpha_0) + \psi(\tilde\beta_0) - \psi(\tilde\beta_1)\Big),$$
$\psi$ is the digamma function and $C_i = \langle\|h_i\|^2\rangle (\mu_i^2 + \eta_i) - 2 \langle e_i^T h_i \rangle \mu_i$.

References

[1] R. Ward, B. Saleh, Deblurring random blur, IEEE Trans. Acoustics, Speech, Signal Processing 35 (10) (1987) 1494–1498.
[2] D. Kundur, D. Hatzinakos, Blind image deconvolution, IEEE Signal Processing Magazine 13 (3) (1996) 43–64.
[3] J. Mamin, R. Budakian, D. Rugar, Point response function of an MRFM tip, Tech. rep., IBM Research Division (Oct. 2003).
[4] S. Makni, P. Ciuciu, J. Idier, J.-B. Poline, Joint detection-estimation of brain activity in functional MRI: A multichannel deconvolution solution, IEEE Trans. Signal Processing 53 (9) (2005) 3488–3502.
[5] G. Pillonetto, C. Cobelli, Identifiability of the stochastic semi-blind deconvolution problem for a class of time-invariant linear systems, Automatica 43 (4) (2007) 647–654.
[6] P. Sarri, G. Thomas, E. Sekko, P. Neveux, Myopic deconvolution combining Kalman filter and tracking control, in: Proc. IEEE Int. Conf. Acoust., Speech, and Signal (ICASSP), Vol. 3, 1998, pp. 1833–1836.
[7] G. Chenegros, L. M. Mugnier, F. Lacombe, M. Glanc, 3D phase diversity: a myopic deconvolution method for short-exposure images: application to retinal imaging, J. Opt. Soc. Am. A 24 (5) (2007) 1349–1357.
[8] S. U. Park, N. Dobigeon, A. O. Hero, Myopic sparse image reconstruction with application to MRFM, in: C. A. Bouman, I.
Pollak, P. J. Wolfe (Eds.), Proc. Computational Imaging Conference in IS&T SPIE Symposium on Electronic Imaging Science and Technology, Vol. 7873, SPIE, 2011, pp. 787303/1–787303/14.
[9] S. U. Park, N. Dobigeon, A. O. Hero, Semi-blind sparse image reconstruction with application to MRFM, IEEE Trans. Image Processing 21 (9) (2012) 3838–3849.
[10] M. Ting, R. Raich, A. O. Hero, Sparse image reconstruction for molecular imaging, IEEE Trans. Image Processing 18 (6) (2009) 1215–1227.
[11] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, NY, USA, 2006.
[12] N. Nasios, A. Bors, Variational learning for Gaussian mixture models, IEEE Trans. Systems, Man, Cybernet. Part B 36 (4) (2006) 849–862.
[13] A. Corduneanu, C. M. Bishop, Variational Bayesian model selection for mixture distributions, in: Proc. Conf. Artificial Intelligence and Statistics, 2001, pp. 27–34.
[14] K. Themelis, A. Rontogiannis, K. Koutroumbas, A novel hierarchical Bayesian approach for sparse semisupervised hyperspectral unmixing, IEEE Trans. Signal Processing 60 (2) (2012) 585–599.
[15] S. Makni, J. Idier, T. Vincent, B. Thirion, G. Dehaene-Lambertz, P. Ciuciu, A fully Bayesian approach to the parcel-based detection-estimation of brain activity in fMRI, Neuroimage.
[16] C. P. Robert, G. Casella, Monte Carlo Statistical Methods, 2nd Edition, Springer, 2004.
[17] W. R. Gilks, Markov Chain Monte Carlo In Practice, Chapman and Hall/CRC, 1999.
[18] F. Orieux, J.-F. Giovannelli, T. Rodet, Bayesian estimation of regularization and point spread function parameters for Wiener-Hunt deconvolution, J. Opt. Soc. Am. A 27 (7) (2010) 1593–1607.
[19] H. Attias, A variational Bayesian framework for graphical models, in: Proc. Adv. in Neural Inf. Process. Syst. (NIPS), MIT Press, 2000, pp. 209–215.
[20] A. M. Walker, On the asymptotic behaviour of posterior distributions, Journal of the Royal Statistical Society, Series B (Methodological) 31 (1) (1969) 80–88.
[21] B. Wang, D. Titterington, Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values, in: Proc. Conf. Uncertainty in Artificial Intelligence (UAI), AUAI Press, 2004, pp. 577–584.
[22] B. Wang, M. Titterington, Convergence properties of a general algorithm for calculating variational Bayesian estimates for a normal mixture model, Bayesian Anal. 1 (3) (2006) 625–650.
[23] C. M. Bishop, J. M. Winn, C. C. Nh, Non-linear Bayesian image modelling, in: Proc. Eur. Conf.
Comput. Vis. (ECCV), Springer-Verlag, 2000, pp. 3–17.
[24] Z. Ghahramani, M. J. Beal, Variational inference for Bayesian mixtures of factor analysers, in: Proc. Adv. in Neural Inf. Process. Syst. (NIPS), MIT Press, 2000, pp. 449–455.
[25] M. J. Beal, Z. Ghahramani, The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures, in: Bayesian Stat., Vol. 7, 2003, pp. 453–464.
[26] J. Winn, C. M. Bishop, T. Jaakkola, Variational message passing, Journal of Machine Learning Research 6 (2005) 661–694.
[27] S. Babacan, R. Molina, A. Katsaggelos, Variational Bayesian blind deconvolution using a total variation prior, IEEE Trans. Image Processing 18 (1) (2009) 12–26.
[28] D. Tzikas, A. Likas, N. Galatsanos, Variational Bayesian sparse kernel-based blind image deconvolution with Student's-t priors, IEEE Trans. Image Processing 18 (4) (2009) 753–764.
[29] B. Amizic, S. D. Babacan, R. Molina, A. K. Katsaggelos, Sparse Bayesian blind image deconvolution with parameter estimation, in: Proc. Eur. Signal Process. Conf. (EUSIPCO), Aalborg (Denmark), 2010, pp. 626–630.
[30] M. Almeida, L. Almeida, Blind and semi-blind deblurring of natural images, IEEE Trans. Image Processing 19 (1) (2010) 36–52.
[31] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, W. T. Freeman, Removing camera shake from a single photograph, in: ACM SIGGRAPH 2006 Papers, SIGGRAPH '06, ACM, New York, NY, USA, 2006, pp. 787–794.
[32] Q. Shan, J. Jia, A. Agarwala, High-quality motion deblurring from a single image, ACM Trans. Graphics 27 (3) (2008) 1.
[33] J. A. Sidles, Noninductive detection of single-proton magnetic resonance, Appl. Phys. Lett. 58 (24) (1991) 2854–2856.
[34] J. A. Sidles, Folded Stern-Gerlach experiment as a means for detecting nuclear magnetic resonance in individual nuclei, Phys. Rev. Lett. 68 (8) (1992) 1124–1127.
[35] J. A. Sidles, J. L. Garbini, K. J. Bruland, D. Rugar, O. Züger, S. Hoen, C. S. Yannoni, Magnetic resonance force microscopy, Rev. Mod. Phys. 67 (1) (1995) 249–265.
[36] D. Rugar, C. S. Yannoni, J. A. Sidles, Mechanical detection of magnetic resonance, Nature 360 (6404) (1992) 563–566.
[37] O. Züger, S. T. Hoen, C. S. Yannoni, D. Rugar, Three-dimensional imaging with a nuclear magnetic resonance force microscope, J. Appl. Phys. 79 (4) (1996) 1881–1884.
[38] C. L. Degen, M. Poggio, H. J. Mamin, C. T. Rettner, D. Rugar, Nanoscale magnetic resonance imaging, Proc. Nat. Academy of Science 106 (5) (2009) 1313–1317.
[39] O. Züger, D. Rugar, First images from a magnetic resonance force microscope, Applied Physics Letters 63 (18) (1993) 2496–2498.
[40] O. Züger, D. Rugar, Magnetic resonance detection and imaging using force microscope techniques, J. Appl. Phys. 75 (10) (1994) 6211–6216.
[41] S. Chao, W. M. Dougherty, J. L. Garbini, J. A. Sidles, Nanometer-scale magnetic resonance imaging, Review Sci. Instrum. 75 (5) (2004) 1175–1181.
[42] C. L. Degen, M. Poggio, H. J. Mamin, C. T. Rettner, D. Rugar, Nanoscale magnetic resonance imaging. Supporting information, Proc. Nat. Academy of Science 106 (5).
[43] N. Dobigeon, A. O. Hero, J.-Y. Tourneret, Hierarchical Bayesian sparse image reconstruction with application to MRFM, IEEE Trans. Image Processing 18 (9) (2009) 2059–2070.
[44] L. Landweber, An iteration formula for Fredholm integral equations of the first kind, Amer. J. Math. 73 (3) (1951) 615–624.
[45] K. Herrity, R. Raich, A. O. Hero, Blind deconvolution for sparse molecular imaging, in: Proc. IEEE Int. Conf. Acoust., Speech, and Signal (ICASSP), Las Vegas, USA, 2008, pp. 545–548.
[46] I. Daubechies, M. Defrise, C.
De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics 57 (11) (2004) 1413–1457.
[47] D. Ge, J. Idier, E. L. Carpentier, Enhanced sampling schemes for MCMC based blind Bernoulli-Gaussian deconvolution, Signal Processing 91 (4) (2011) 759–772.
[48] T. Vincent, L. Risser, P. Ciuciu, Spatially adaptive mixture modeling for analysis of fMRI time series, IEEE Trans. Med. Imag. 29 (4) (2010) 1059–1074.