Bayesian model selection for unsupervised image deconvolution with structured Gaussian priors
Benjamin Harroué, Jean-François Giovannelli, Marcelo Pereyra
B. Harroué (1), J.-F. Giovannelli (1) and M. Pereyra (2)
(1) IMS (Univ. Bordeaux, CNRS, B-INP), Talence, France
(2) MACS, Heriot-Watt University, Edinburgh, United Kingdom

ABSTRACT
This paper considers the objective comparison of stochastic models to solve inverse problems, more specifically image restoration. Most often, model comparison is addressed in a supervised manner, which can be time-consuming and partly arbitrary. Here we adopt an unsupervised Bayesian approach and objectively compare the models based on their posterior probabilities, directly from the data and without ground truth available. These probabilities depend on the marginal likelihood or "evidence" of the models, and we resort to the Chib approach, which relies on a Gibbs sampler. We focus on the family of Gaussian models with circulant covariances and unknown hyperparameters, and compare different types of covariance matrices for the image and the noise.
1. INTRODUCTION
Image restoration is a subject of interest in many fields: medical imaging, astronomy, physics in general, and the literature on the subject is extensive [1, 2]. The recurrent difficulty most often comes from the ill-conditioned character of the problem, so regularization is required. Regularization can be founded on probabilistic models, such as Markov models, Gaussian models, mixtures of distributions, etc. In practice, it is necessary to know the structure of these models: the neighborhood of a Markov model, the type of covariance of a Gaussian, the number of components of a mixture, and so on. A common pragmatic approach consists in choosing a family of models and comparing them empirically. This approach has two disadvantages: it requires time to supervise the study and it is partly arbitrary. The advantage of (automatic) model selection is then obvious. There are many approaches [3, 4]: posterior probabilities, information criteria (IC) such as AIC, BIC, Deviance IC, Predictive BIC, Generalized IC, Widely Applicable BIC, etc. The approach used here is optimal in the sense of Bayesian decision theory [5]: the model with the highest posterior probability is selected among the candidate models. These probabilities are based on an integral called the evidence, which is often difficult to compute, especially in large dimensions. Here we resort to the Chib approach, itself based on a Gibbs sampler. See also our previous papers on the subject [6–9].

More specifically, the restoration relies on zero-mean Gaussian models with stationary-circulant covariances, and includes the comparison of various types of covariance for the image and the measurement noise. See also our preliminary paper on the subject [10].
2. PROBLEM STATEMENT
Consider the Bayesian estimation of an unknown image $x \in \mathbb{R}^P$ from a noisy and blurred observation $y \in \mathbb{R}^P$ related to $x$ by a linear model $y = Hx + e$, where $H \in \mathbb{R}^{P \times P}$ is a known blur operator and $e \in \mathbb{R}^P$ is additive noise. In this paper, we suppose that there are $K$ alternative models available to recover $x$ from $y$ and investigate model selection procedures to objectively compare the models, directly from $y$, without ground truth available.

More precisely, we assume that $x$ and $e$ are Gaussian random vectors with zero mean and covariances $R_x, R_e \in \mathbb{R}^{P \times P}$. We adopt the representation $R_x = \gamma_x^{-1} C_x$ and $R_e = \gamma_e^{-1} C_e$, where $C_x, C_e \in \mathbb{R}^{P \times P}$ define the covariance structure and $\gamma_x, \gamma_e > 0$ control the energy of $x$ and $e$. We consider that $\gamma_x, \gamma_e$ are unknown and define $\gamma = [\gamma_x, \gamma_e]$. We focus on the case where $R_x$, $R_e$ and $H$ are circulant matrices, diagonalizable in a discrete Fourier basis $F$:
• $R_x = \gamma_x^{-1} F^\dagger S_x F$, with $S_x = \mathrm{diag}\big[s_x(p)\big]_{p=1,\ldots,P}$,
• $R_e = \gamma_e^{-1} F^\dagger S_e F$, with $S_e = \mathrm{diag}\big[s_e(p)\big]_{p=1,\ldots,P}$,
• $H = F^\dagger S_h F$, with $S_h = \mathrm{diag}\big[s_h(p)\big]_{p=1,\ldots,P}$,
where $S_x$ and $S_e$ determine the power spectral density (PSD) of $x$ and $e$, up to the scale factors $\gamma_x$ and $\gamma_e$.

Without loss of generality, here we consider the following four possible alternative models for $S_x$ and $S_e$:
Lorentz: $1 / \big[(\pi\omega)^2 (1 + [\nu_h/\omega]^2)(1 + [\nu_v/\omega]^2)\big]$
Gauss: $(2\pi\omega^2)^{-1} \exp\big[-(\nu_h^2 + \nu_v^2)/(2\omega^2)\big]$
Laplace: $(4\omega^2)^{-1} \exp\big[-(|\nu_h| + |\nu_v|)/\omega\big]$
White: $1$ for all $(\nu_h, \nu_v)$
where $\omega$ is a (fixed) bandwidth parameter and $(\nu_h, \nu_v)$ are the horizontal and vertical image frequencies. We have chosen these models because they capture a rich variety of different regularity and pixel correlation structures, see Fig. 1. Other models could be considered too.

Fig. 1. Different PSD structures. From left to right: Lorentz, Gauss, Laplace and White.

Moreover, we model the scale parameters $\gamma_x, \gamma_e$ as a priori independent and assign them conjugate gamma densities:
$$p(\gamma_x) \propto \gamma_x^{\alpha_x - 1} \exp(-\beta_x \gamma_x), \qquad p(\gamma_e) \propto \gamma_e^{\alpha_e - 1} \exp(-\beta_e \gamma_e),$$
with $\alpha_\star$ and $\beta_\star$ set to very small values to obtain vague priors. Fig. 2 depicts the graphical structure of the considered probabilistic models.

Fig. 2. Graphical structure of the considered hierarchical model; circles represent unknown quantities.

The four alternative models for $S_x$ and $S_e$ result in $K = 16$ possible models, indexed by the variable $M$ taking values in $\{1, \ldots, K\}$. Each model $M$ defines a different posterior distribution for $x$ and will hence lead to potentially very different estimates. The next section introduces a Bayesian approach to objectively compare the $K$ models in the absence of ground truth.
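To make the circulant model concrete, the following minimal NumPy sketch builds the four candidate PSDs on a discrete frequency grid and simulates data $y = Hx + e$ entirely in the Fourier domain. This is an illustration under our own conventions, not code from the paper: the function names, the image side n and the stand-in blur transfer function are assumptions.

```python
import numpy as np

def psd_2d(kind, n, omega=0.05):
    """One of the four candidate PSD shapes, evaluated on the n x n grid
    of discrete frequencies (nu_h, nu_v) in [-1/2, 1/2)."""
    nu_h, nu_v = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
    if kind == "lorentz":
        return 1.0 / ((np.pi * omega) ** 2
                      * (1 + (nu_h / omega) ** 2) * (1 + (nu_v / omega) ** 2))
    if kind == "gauss":
        return np.exp(-(nu_h**2 + nu_v**2) / (2 * omega**2)) / (2 * np.pi * omega**2)
    if kind == "laplace":
        return np.exp(-(np.abs(nu_h) + np.abs(nu_v)) / omega) / (4 * omega**2)
    if kind == "white":
        return np.ones((n, n))  # flat spectrum
    raise ValueError(f"unknown PSD kind: {kind}")

def sample_stationary(s, gamma, rng):
    """Real zero-mean stationary Gaussian field with PSD s/gamma, obtained
    by colouring white noise in the Fourier domain (circulant covariance)."""
    w = rng.standard_normal(s.shape)
    return np.real(np.fft.ifft2(np.sqrt(s / gamma) * np.fft.fft2(w)))

# Example: Lorentz-type image prior, white noise, and a stand-in low-pass blur.
rng = np.random.default_rng(0)
n = 128                                            # illustrative image side (assumption)
s_h = psd_2d("gauss", n); s_h = s_h / s_h.max()    # hypothetical transfer function of H
x = sample_stationary(psd_2d("lorentz", n), gamma=6.0, rng=rng)
e = sample_stationary(psd_2d("white", n), gamma=4.0, rng=rng)
y = np.real(np.fft.ifft2(s_h * np.fft.fft2(x))) + e
```

Because every operator is diagonal in the Fourier basis, simulation (and, below, inference) costs only a handful of FFTs per draw.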
3. BAYESIAN MODEL SELECTION USING THE CHIB-GIBBS EVIDENCE APPROXIMATION
Following Bayesian decision theory, we compare the $K$ competing models by calculating the posterior probabilities $p(M = k \mid y)$, given for any $k \in \{1, \ldots, K\}$ by
$$p(M = k \mid y) = \frac{p(y \mid M = k)\, p(M = k)}{p(y)} = \frac{p(y \mid M = k)\, p(M = k)}{\sum_{l=1}^{K} p(y \mid M = l)\, p(M = l)}, \quad (1)$$
where the so-called model evidence or marginal likelihood
$$p(y \mid M) = \int_{\gamma} \int_{x} p(y, x, \gamma \mid M)\, \mathrm{d}x\, \mathrm{d}\gamma$$
measures the likelihood of the data given the model $M$. We use the uniform prior $p(M = k) = 1/K$, reflecting that all models are equally likely a priori.

The key challenge in implementing this Bayesian decision-theoretic approach is to compute the model evidences. In this paper, we propose to address this difficulty by using the Chib approach [11]. More precisely, note that for all $\gamma \in (\mathbb{R}_+^*)^2$,
$$p(y \mid M) = \frac{p(y, \gamma \mid M)}{p(\gamma \mid y, M)} = \frac{p(y \mid \gamma, M)\, p(\gamma \mid M)}{p(\gamma \mid y, M)}, \quad (2)$$
and note that the numerator is tractable for the considered models. The denominator is not analytically tractable, but can be conveniently expressed as the expectation
$$p(\gamma \mid y, M) = \int_{x} p(\gamma, x \mid y, M)\, \mathrm{d}x = \int_{x} p(\gamma \mid x, y, M)\, p(x \mid y, M)\, \mathrm{d}x = \mathbb{E}_{x \mid y, M}\big[p(\gamma \mid x, y, M)\big],$$
which can be efficiently and accurately computed by Monte Carlo integration. Precisely, we draw $G$ samples $\{x^{[g]}\}_{g=1}^{G}$ from $p(x \mid y, M)$ and calculate the empirical mean
$$\widetilde{p}(\gamma \mid y, M) = \frac{1}{G} \sum_{g=1}^{G} p(\gamma \mid x^{[g]}, y, M). \quad (3)$$
While the above expressions are valid for all $\gamma$, they are usually evaluated at the posterior mean of $\gamma \mid y$ to reduce the variance of the empirical mean.

Markov chain Monte Carlo algorithms [5, 12] are a standard computational strategy to simulate the samples $x^{[g]}$ from $p(x \mid y, M)$. In particular, the Gibbs sampler is an approach of choice for the class of models considered in this paper. It iteratively constructs a Markov chain $\{x^{[g]}, \gamma^{[g]}\}_{g=1}^{G}$ targeting the joint density $p(x, \gamma \mid y, M)$ by alternately sampling $\gamma_e$, $\gamma_x$ and $x$ from the conditional distributions $p(\gamma_e \mid y, \gamma_x, x, M)$, $p(\gamma_x \mid y, \gamma_e, x, M)$ and $p(x \mid y, \gamma_e, \gamma_x, M)$, evaluated at the current state of the chain. By marginalisation through projection, the drawn samples $\{\gamma^{[g]}\}_{g=1}^{G} \sim p(\gamma \mid y, M)$ are used to calculate the mean $\bar{\gamma} = \sum_{g=1}^{G} \gamma^{[g]} / G$, followed by the computation of $\widetilde{p}(\bar{\gamma} \mid y, M)$ from the samples $\{x^{[g]}\}_{g=1}^{G} \sim p(x \mid y, M)$ and (3).

Implementing this approach requires knowledge of the following five densities (a schematic implementation is sketched below):
1. $p(y \mid \gamma, M = k)$, to compute the numerator of (2);
2. the conditional densities defining the Gibbs sampler:
• $p(x \mid y, \gamma_e, \gamma_x, M)$,
• $p(\gamma_e \mid y, \gamma_x, x, M)$ and $p(\gamma_x \mid y, \gamma_e, x, M)$;
3. $p(\gamma \mid x, y, M)$, to compute (3).
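The overall procedure can be summarised by the following schematic NumPy sketch of the Chib-Gibbs estimator of eqs. (2)-(3). It uses our own naming conventions, not the authors' MATLAB implementation: the five densities enter as callables, whose concrete Fourier-domain forms for the present models are derived next.

```python
import numpy as np
from scipy.special import logsumexp

def chib_log_evidence(y, sample_x, sample_gamma, log_lik, log_prior_gamma,
                      log_post_gamma_given_x, n_iter=1500, n_burn=500, rng=None):
    """Chib estimate of log p(y | M) for one candidate model M.

    The callables encapsulate the model:
      sample_x(y, gamma)                   draws from p(x | y, gamma, M)
      sample_gamma(y, x)                   draws from p(gamma | y, x, M)
      log_lik(y, gamma)                    returns log p(y | gamma, M)
      log_prior_gamma(gamma)               returns log p(gamma | M)
      log_post_gamma_given_x(gamma, y, x)  returns log p(gamma | x, y, M)
    """
    rng = rng or np.random.default_rng()
    gamma = np.ones(2)                 # crude initialisation of (gamma_x, gamma_e)
    xs, gammas = [], []
    for g in range(n_iter):            # Gibbs sweeps over x and gamma
        x = sample_x(y, gamma)
        gamma = sample_gamma(y, x)
        if g >= n_burn:
            xs.append(x)               # (for large images, accumulate online instead)
            gammas.append(gamma)
    gamma_bar = np.mean(gammas, axis=0)  # evaluation point: posterior mean of gamma
    # Rao-Blackwellised estimate of log p(gamma_bar | y, M), eq. (3), in log scale
    log_terms = [log_post_gamma_given_x(gamma_bar, y, x) for x in xs]
    log_post = logsumexp(log_terms) - np.log(len(log_terms))
    # Chib identity, eq. (2), rearranged in log scale
    return log_lik(y, gamma_bar) + log_prior_gamma(gamma_bar) - log_post
```

Repeating this for each of the $K$ candidate models yields $K$ log-evidences, which are then normalised into the posterior probabilities (1); Section 4 returns to the numerical precautions this last step requires.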
To derive the likelihood $p(y \mid \gamma, M = k)$, we use the fact that $x$ and $e$ are zero-mean Gaussian vectors. Accordingly, $y \mid \gamma$ is also a zero-mean Gaussian vector with covariance matrix $R_y^k = H R_x^i H^\dagger + R_e^j$. Because $R_x^i$, $R_e^j$ and $H$ are circulant, $R_y^k$ is also circulant:
$$R_y^k = F^\dagger \big( \gamma_x^{-1} S_h S_x^i S_h^\dagger + \gamma_e^{-1} S_e^j \big) F = F^\dagger S_y^k F,$$
where $S_y^k = \mathrm{diag}\big[s_y^k(p)\big]_{p=1,\ldots,P}$ is the PSD of the data and
$$s_y^k(p) = \gamma_x^{-1} |s_h(p)|^2 s_x^i(p) + \gamma_e^{-1} s_e^j(p)$$
is the variance associated to the $p$-th frequency. Moreover, denoting by $\mathring{y} = F y$ the DFT of the data,
$$\det R_y^k = \prod_{p=1}^{P} s_y^k(p) \quad \text{and} \quad y^\dagger \big(R_y^k\big)^{-1} y = \sum_{p=1}^{P} \frac{|\mathring{y}(p)|^2}{s_y^k(p)},$$
and hence the likelihood is given by
$$p(y \mid \gamma, M) = (2\pi)^{-P/2} \exp\Big[ -\frac{1}{2} \sum_{p=1}^{P} \Big( \log s_y^k(p) + \frac{|\mathring{y}(p)|^2}{s_y^k(p)} \Big) \Big]. \quad (4)$$

The derivation of the conditional densities defining the Gibbs sampler follows from standard conjugacy results. The vector $x \mid y, \gamma_e, \gamma_x, M$ is Gaussian with mean and covariance
$$\mu_k = \gamma_e\, \Sigma_k H^\dagger C_e^{j\,-1} y, \qquad \Sigma_k = \big[ \gamma_e H^\dagger C_e^{j\,-1} H + \gamma_x C_x^{i\,-1} \big]^{-1}.$$
Similarly, the precisions $\gamma_e \mid y, x, M$ and $\gamma_x \mid y, x, M$ have gamma densities with parameters given by
$$\begin{cases} \alpha_{e,k} = \alpha_e + P/2 \\ \alpha_{x,k} = \alpha_x + P/2 \end{cases} \qquad \begin{cases} \beta_{e,k} = \beta_e + \|y - Hx\|^2_{C_e^{j\,-1}} / 2 \\ \beta_{x,k} = \beta_x + \|x\|^2_{C_x^{i\,-1}} / 2 \end{cases}$$
Lastly, using the fact that $\gamma_e$ and $\gamma_x$ are conditionally independent given $y, x, M$, we obtain
$$p(\gamma \mid x, y, M) = p(\gamma_e \mid y, x, M)\, p(\gamma_x \mid y, x, M).$$
Note that all the required matrix and vector products are efficiently computed by using Fourier basis representations. Similarly, one can efficiently simulate from $p(x \mid y, \gamma_e, \gamma_x, M)$ by leveraging the fact that $\Sigma_k$ is diagonal in the Fourier basis.
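Concretely, these densities reduce to elementwise operations on DFT coefficients. The sketch below is our own code, not the paper's: it assumes numpy's unnormalised FFT convention, a square image, a single vague (alpha, beta) pair for both precisions, and drops the i, j superscripts for readability. It gives possible implementations of the likelihood (4), of exact sampling from $p(x \mid y, \gamma, M)$, and of the gamma conditional draws.

```python
import numpy as np

# s_h, s_x, s_e are n x n arrays: transfer function of H and candidate PSDs.
# With numpy's unnormalised fft2, |fft2(y)|^2 / P matches |y_ring(p)|^2 in eq. (4).

def log_likelihood(y, s_h, s_x, s_e, gamma):
    """log p(y | gamma, M), eq. (4)."""
    gamma_x, gamma_e = gamma
    P = y.size
    y_f = np.fft.fft2(y)
    s_y = np.abs(s_h) ** 2 * s_x / gamma_x + s_e / gamma_e   # PSD of the data
    return (-0.5 * P * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(s_y) + np.abs(y_f) ** 2 / (P * s_y)))

def sample_x(y, s_h, s_x, s_e, gamma, rng):
    """Exact draw from p(x | y, gamma, M): Sigma_k is diagonal in the
    Fourier basis, so perturb the posterior mean with coloured noise."""
    gamma_x, gamma_e = gamma
    q = gamma_e * np.abs(s_h) ** 2 / s_e + gamma_x / s_x     # per-frequency precision
    mean_f = gamma_e * np.conj(s_h) * np.fft.fft2(y) / s_e / q
    noise_f = np.fft.fft2(rng.standard_normal(y.shape)) / np.sqrt(q)
    return np.real(np.fft.ifft2(mean_f + noise_f))

def sample_gamma(y, x, s_h, s_x, s_e, alpha, beta, rng):
    """Draws from the conditional gamma densities of gamma_x and gamma_e."""
    P = y.size
    x_f = np.fft.fft2(x)
    r2_e = np.sum(np.abs(np.fft.fft2(y) - s_h * x_f) ** 2 / s_e) / P  # ||y-Hx||^2
    r2_x = np.sum(np.abs(x_f) ** 2 / s_x) / P                         # ||x||^2
    gamma_x = rng.gamma(alpha + P / 2, 1.0 / (beta + r2_x / 2))
    gamma_e = rng.gamma(alpha + P / 2, 1.0 / (beta + r2_e / 2))
    return np.array([gamma_x, gamma_e])
```

A closure binding these functions to a given (s_h, s_x, s_e, alpha, beta) yields the callables expected by the chib_log_evidence skeleton above.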
4. NUMERICAL EXPERIMENTS
We now present an experiment with synthetic data designed to demonstrate the feasibility of the proposed Bayesian model selection approach in an image processing setting.

For each of the $K = 16$ models (we have 4 alternative models for $S_x$ and 4 for $S_e$), we generated 50 synthetic blurred and noisy images. The blur is a cardinal sine of unitary width and the true values are $\gamma_x^\star = 6$ and $\gamma_e^\star = 4$. Then, for each of the $K$ (true) models and each of the images, we computed the $K$ probabilities $p(M = k \mid y)$ for $k = 1, \ldots, K$ by running $K$ Gibbs samplers and using the Chib approach. We then performed model selection by posterior maximisation, $\hat{k} = \arg\max_k p(M = k \mid y)$. The results are given in Fig. 3, which shows the percentage of times that each model was selected against the truth $k^\star$.

We observe that the results are very accurate for all the considered configurations (the configurations White/Laplace and White/White seem to be particularly easy to identify, whereas the configurations Lorentz/Gauss and Lorentz/White appear to be more difficult). The overall accuracy in this experiment is very high.

Fig. 3. Confusion matrix (percentage). Candidate model (x-axis) and true model (y-axis).

Note that the calculation of the $K$ posterior probabilities $p(M = k \mid y)$ for an image $y$ takes on the order of seconds using MATLAB on a standard PC, because all the computations, and especially the sampling from $p(x \mid y, \gamma_e, \gamma_x, M)$, are performed in the Fourier domain. Also, the evidences are computed in logarithmic scale to avoid overflow and underflow problems; it is important to translate the log values before converting back to the linear scale, and finally to apply a factor that corrects the translation (see the sketch at the end of this section). To produce this experiment we calculated $16 \times 16 \times 50 = 12\,800$ model evidences and never observed any convergence or numerical stability issues.

Furthermore, for illustration, Fig. 4 shows the evolution of the approximation of the log-evidence (2) based on the empirical mean (3) as a function of the number $G$ of Monte Carlo samples, for one specific dataset and model configuration. For comparison, we also include the "exact" evidence $p(y \mid M = k)$ calculated by a computationally intensive integration of $p(y, \gamma_x, \gamma_e \mid M = k)$ w.r.t. $\gamma_x, \gamma_e$ over a fine grid. Observe that a moderate number of samples is required to obtain a stable approximation. It must be kept in mind that a very good precision for the log-evidence is required for a good precision on the probabilities.

Fig. 4. Approximation of the log-evidence as a function of the sample number $G$. The exact value is computed by a brute-force method of integration.

Lastly, it is worth noting that the Gibbs sampler also produces approximations of the marginal posteriors $p(x \mid y, M = k)$, $p(\gamma_x \mid y, M = k)$ and $p(\gamma_e \mid y, M = k)$. Fig. 5 shows the traces of $\gamma_x$ and $\gamma_e$ and the associated histograms.

Fig. 5. Samples of the simulated chains shown as a function of the iteration index (left) and as histograms (right). Top: $\gamma_e$; bottom: $\gamma_x$.
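The log-domain normalisation mentioned above is the classic log-sum-exp trick. Below is a minimal sketch of how the $K$ log-evidences are turned into the posterior probabilities (1) under a uniform model prior; this is our own helper, not the paper's MATLAB code.

```python
import numpy as np
from scipy.special import logsumexp

def model_posteriors(log_evidences):
    """p(M = k | y) from log p(y | M = k), eq. (1), with p(M = k) = 1/K.
    Subtracting logsumexp translates the values before exponentiating, so
    no overflow/underflow occurs; the translation cancels in the ratio."""
    v = np.asarray(log_evidences)
    return np.exp(v - logsumexp(v))
```

Because the probabilities are ratios of evidences, any common translation of the log-evidences cancels, which is exactly the correction factor alluded to above.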
5. CONCLUSION AND PERSPECTIVES FOR FUTURE WORK
We have presented a contribution to the automatic selection of models for deconvolution. We have worked in the case of circulant Gaussian models, which allows the marginalization of the unknown image and the easy manipulation of covariances, the major difficulty concerning the marginalization of the hyperparameters. Our strategy is optimal in the sense of a Bayesian risk and is based on the choice of the most probable model. We therefore evaluate the probability of each candidate model, and for this we need the evidence resulting from the marginalization of the unknown image and hyperparameters. Several options are possible and we have opted for Chib's approach and a Gibbs algorithm. We show excellent performance in terms of selection, which encourages us to extend our work. In an extended version of the paper, we will make a complete comparison with other existing methods: Laplace's approximation, RJMCMC, and WBIC, which can also be used to compute model probabilities. It will also be interesting to compare with information criteria such as AIC or BIC.

Among the perspectives is the non-circulant Gaussian case: a direct extension will resort to [13–15] to sample the image, the rest of the algorithm remaining unchanged. The extension to non-Gaussian cases will be based on more advanced sampling tools [16, 17], but we will have to face a new difficulty related to hyperparameters and partition functions. We also intend to include new hyperparameters, e.g., shape parameters of the image and noise PSDs, such as the width $\omega$. Naturally, the processing of real data is also part of our plans.
6. REFERENCES

[1] J.-F. Giovannelli and J. Idier, Eds., Regularization and Bayesian Methods for Inverse Problems in Signal and Image Processing. London: ISTE and John Wiley & Sons Inc., 2015.
[2] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems. Berlin, Germany: Springer, 2005.
[3] T. Ando, Bayesian Model Selection and Statistical Modeling. Boca Raton, USA: Chapman & Hall/CRC, 2010.
[4] J. Ding, V. Tarokh, and Y. Yang, "Model selection techniques: An overview," IEEE Signal Processing Magazine, vol. 35, no. 6, pp. 16–34, November 2018.
[5] C. P. Robert, The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, ser. Springer Texts in Statistics. New York, USA: Springer Verlag, 2007.
[6] C. Vacar, J.-F. Giovannelli, and A.-M. Roman, "Bayesian texture model selection by harmonic mean," in Proceedings of the International Conference on Image Processing, Orlando, USA, September 2012.
[7] J.-F. Giovannelli and A. Giremus, "Bayesian noise model selection and system identification based on approximation of the evidence," in Proceedings of the International Conference on Statistical Signal Processing (special session), Gold Coast, Australia, June 2014, pp. 125–128.
[8] A. Barbos, A. Giremus, and J.-F. Giovannelli, "Bayesian noise model selection and system identification using Chib approximation based on the Metropolis-Hastings sampler," in Actes du 25e colloque GRETSI, Lyon, France, September 2015.
[9] M. Pereyra and S. McLaughlin, "Comparing Bayesian models in the absence of ground truth," in Proceedings of EUSIPCO, Budapest, Hungary, August 2016, pp. 528–532.
[10] B. Harroué, J.-F. Giovannelli, and M. Pereyra, "Sélection de modèles en restauration d'image. Approche bayésienne dans le cas gaussien," in Actes du 27e colloque GRETSI, Lille, France, August 2019.
[11] B. P. Carlin and S. Chib, "Bayesian model choice via Markov chain Monte Carlo methods," Journal of the Royal Statistical Society B, vol. 57, pp. 473–484, 1995.
[12] S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo. Boca Raton, USA: Chapman & Hall/CRC, 2011.
[13] F. Orieux, O. Féron, and J.-F. Giovannelli, "Sampling high-dimensional Gaussian distributions for general linear inverse problems," IEEE Signal Processing Letters, vol. 19, no. 5, pp. 251–254, May 2012.
[14] C. Gilavert, S. Moussaoui, and J. Idier, "Efficient Gaussian sampling for solving large-scale inverse problems using MCMC," IEEE Transactions on Signal Processing, vol. 63, no. 1, pp. 70–80, January 2015.
[15] Y. Marnissi, E. Chouzenoux, A. Benazza-Benyahia, and J.-C. Pesquet, "An auxiliary variable method for MCMC algorithms in high dimension," Entropy, vol. 20, p. 110, 2018.
[16] M. Pereyra, "Proximal Markov chain Monte Carlo algorithms," Statistics and Computing, vol. 26, 2016.
[17] M. Pereyra, P. Schniter, E. Chouzenoux, J.-C. Pesquet, J.-Y. Tourneret, A. O. Hero, and S. McLaughlin, "A survey of stochastic simulation and optimization methods in signal processing," IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 2, March 2016.