Bayesian model selection for unsupervised image deconvolution with structured Gaussian priors
Benjamin Harroué, Jean-François Giovannelli, Marcelo Pereyra
B. Harroué (1), J.-F. Giovannelli (1) and M. Pereyra (2)
(1) IMS (Univ. Bordeaux, CNRS, B-INP), Talence, France
(2) MACS, Heriot-Watt University, Edinburgh, United Kingdom

ABSTRACT
This paper considers the objective comparison of stochastic models to solve inverse problems, more specifically image restoration. Most often, model comparison is addressed in a supervised manner, which can be time-consuming and partly arbitrary. Here we adopt an unsupervised Bayesian approach and objectively compare the models based on their posterior probabilities, directly from the data and without ground truth available. These probabilities depend on the marginal likelihood or "evidence" of the models, and we resort to the Chib approach, which relies on a Gibbs sampler. We focus on the family of Gaussian models with circulant covariances and unknown hyperparameters, and compare different types of covariance matrices for the image and the noise.
1. INTRODUCTION
Image restoration is a subject of interest in many fields: medical imaging, astronomy, physics in general, and the literature on the subject is extensive [1, 2]. The recurrent difficulty most often comes from the ill-conditioned character of the problem, so regularization is required. Regularization can be founded on probabilistic models, such as Markov models, Gaussian models, mixtures of distributions, etc. In practice, it is necessary to know the structure of these models: the neighborhood of a Markov model, the type of covariance of a Gaussian, the number of components of a mixture, and so on. A common pragmatic approach consists in choosing a family of models and comparing them empirically. This approach has two disadvantages: it requires time to supervise the study and it is partly arbitrary. The advantage of (automatic) model selection is then obvious. There are many approaches [3, 4]: posterior probabilities, information criteria (IC) such as AIC, BIC, Deviance IC, Predictive BIC, Generalized IC, Widely Applicable BIC, etc. The approach used here is optimal in the sense of Bayesian decision theory [5]: the model with the highest posterior probability is selected among the candidate models. These probabilities are based on an integral called the evidence, which is often difficult to compute, especially in large dimensions. Here we resort to the Chib approach, itself based on a Gibbs sampler. See also our previous papers on the subject [6–9].

More specifically, the restoration relies on zero-mean Gaussian models with stationary-circulant covariances, and includes the comparison of various types of covariance for the image and the measurement noise. See also our preliminary paper on the subject [10].
2. PROBLEM STATEMENT
Consider the Bayesian estimation of an unknown image $x \in \mathbb{R}^P$ from a noisy and blurred observation $y \in \mathbb{R}^P$ related to $x$ by a linear model $y = Hx + e$, where $H \in \mathbb{R}^{P \times P}$ is a known blur operator and $e \in \mathbb{R}^P$ is additive noise. In this paper, we suppose that there are $K$ alternative models available to recover $x$ from $y$ and investigate model selection procedures to objectively compare the models, directly from $y$, without ground truth available.

More precisely, we assume that $x$ and $e$ are Gaussian random vectors with zero mean and covariances $R_x, R_e \in \mathbb{R}^{P \times P}$. We adopt the representation $R_x = \gamma_x^{-1} C_x$ and $R_e = \gamma_e^{-1} C_e$, where $C_x, C_e \in \mathbb{R}^{P \times P}$ define the covariance structure and $\gamma_x, \gamma_e > 0$ control the energy of $x$ and $e$. We consider that $\gamma_x, \gamma_e$ are unknown and define $\gamma = [\gamma_x, \gamma_e]$. We focus on the case where $R_x$, $R_e$ and $H$ are circulant matrices, diagonalizable in a discrete Fourier basis $F$:
• $R_x = \gamma_x^{-1} F^\dagger S_x F$, with $S_x = \mathrm{diag}\big[s_x(p)\big]_{p=1,\ldots,P}$,
• $R_e = \gamma_e^{-1} F^\dagger S_e F$, with $S_e = \mathrm{diag}\big[s_e(p)\big]_{p=1,\ldots,P}$,
• $H = F^\dagger S_h F$, with $S_h = \mathrm{diag}\big[s_h(p)\big]_{p=1,\ldots,P}$,
where $S_x$ and $S_e$ determine the power spectral density (PSD) of $x$ and $e$, up to the scale factors $\gamma_x$ and $\gamma_e$.

Without loss of generality, here we consider the following four possible alternative models for $S_x$ and $S_e$:
Lorentz: $1 / \big[(\pi\omega)^2 (1 + [\nu_h/\omega]^2)(1 + [\nu_v/\omega]^2)\big]$
Gauss: $(2\pi\omega^2)^{-1} \exp\big[-(\nu_h^2 + \nu_v^2)/(2\omega^2)\big]$
Laplace: $(4\omega^2)^{-1} \exp\big[-(|\nu_h| + |\nu_v|)/\omega\big]$
White: $1$ for all $(\nu_h, \nu_v)$
where $\omega$ is a (fixed) bandwidth parameter and $(\nu_h, \nu_v)$ are the horizontal and vertical image frequencies. We have chosen these models because they capture a rich variety of different regularity and pixel correlation structures, see Fig. 1. Other models could be considered too.

Fig. 1. Different PSD structures. From left to right: Lorentz, Gauss, Laplace and White.

Moreover, we model the scale parameters $\gamma_x, \gamma_e$ as a priori independent and assign them conjugate gamma densities:
$$p(\gamma_x) \propto \gamma_x^{\alpha_x - 1} \exp(-\beta_x \gamma_x), \qquad p(\gamma_e) \propto \gamma_e^{\alpha_e - 1} \exp(-\beta_e \gamma_e),$$
with $\alpha_\star$ and $\beta_\star$ set to very small values to obtain vague priors. Fig. 2 depicts the graphical structure of the considered probabilistic models.

Fig. 2. Graphical structure of the considered hierarchical model; circles represent unknown quantities.

The four alternative models for $S_x$ and $S_e$ result in $K = 16$ possible models, indexed by the variable $M$ taking values in $\{1, \ldots, K\}$. Each model $M$ defines a different posterior distribution for $x$ and will hence lead to potentially very different estimates. The next section introduces a Bayesian approach to objectively compare the $K$ models in the absence of ground truth.
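To make the circulant model concrete, the following minimal NumPy sketch builds the four candidate PSDs on a discrete frequency grid and simulates data $y = Hx + e$ entirely in the Fourier domain. This is an illustration under our own conventions, not code from the paper: the function names, the image side n and the stand-in blur transfer function are assumptions.

```python
import numpy as np

def psd_2d(kind, n, omega=0.05):
    """One of the four candidate PSD shapes, evaluated on the n x n grid
    of discrete frequencies (nu_h, nu_v) in [-1/2, 1/2)."""
    nu_h, nu_v = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
    if kind == "lorentz":
        return 1.0 / ((np.pi * omega) ** 2
                      * (1 + (nu_h / omega) ** 2) * (1 + (nu_v / omega) ** 2))
    if kind == "gauss":
        return np.exp(-(nu_h**2 + nu_v**2) / (2 * omega**2)) / (2 * np.pi * omega**2)
    if kind == "laplace":
        return np.exp(-(np.abs(nu_h) + np.abs(nu_v)) / omega) / (4 * omega**2)
    if kind == "white":
        return np.ones((n, n))  # flat spectrum
    raise ValueError(f"unknown PSD kind: {kind}")

def sample_stationary(s, gamma, rng):
    """Real zero-mean stationary Gaussian field with PSD s/gamma, obtained
    by colouring white noise in the Fourier domain (circulant covariance)."""
    w = rng.standard_normal(s.shape)
    return np.real(np.fft.ifft2(np.sqrt(s / gamma) * np.fft.fft2(w)))

# Example: Lorentz-type image prior, white noise, and a stand-in low-pass blur.
rng = np.random.default_rng(0)
n = 128                                            # illustrative image side (assumption)
s_h = psd_2d("gauss", n); s_h = s_h / s_h.max()    # hypothetical transfer function of H
x = sample_stationary(psd_2d("lorentz", n), gamma=6.0, rng=rng)
e = sample_stationary(psd_2d("white", n), gamma=4.0, rng=rng)
y = np.real(np.fft.ifft2(s_h * np.fft.fft2(x))) + e
```

Because every operator is diagonal in the Fourier basis, simulation (and, below, inference) costs only a handful of FFTs per draw.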
3. BAYESIAN MODEL SELECTION USING THE CHIB-GIBBS EVIDENCE APPROXIMATION
Following Bayesian decision theory, we compare the $K$ competing models by calculating the posterior probabilities $p(M = k \mid y)$, given for any $k \in \{1, \ldots, K\}$ by
$$p(M = k \mid y) = \frac{p(y \mid M = k)\, p(M = k)}{p(y)} = \frac{p(y \mid M = k)\, p(M = k)}{\sum_{l=1}^{K} p(y \mid M = l)\, p(M = l)}, \quad (1)$$
where the so-called model evidence or marginal likelihood
$$p(y \mid M) = \int_{\gamma} \int_{x} p(y, x, \gamma \mid M)\, \mathrm{d}x\, \mathrm{d}\gamma$$
measures the likelihood of the data given the model $M$. We use the uniform prior $p(M = k) = 1/K$, reflecting that all models are equally likely a priori.

The key challenge in implementing this Bayesian decision-theoretic approach is to compute the model evidences. In this paper, we propose to address this difficulty by using the Chib approach [11]. More precisely, note that for all $\gamma \in (\mathbb{R}_+^*)^2$,
$$p(y \mid M) = \frac{p(y, \gamma \mid M)}{p(\gamma \mid y, M)} = \frac{p(y \mid \gamma, M)\, p(\gamma \mid M)}{p(\gamma \mid y, M)}, \quad (2)$$
and note that the numerator is tractable for the considered models. The denominator is not analytically tractable, but can be conveniently expressed as the expectation
$$p(\gamma \mid y, M) = \int_{x} p(\gamma, x \mid y, M)\, \mathrm{d}x = \int_{x} p(\gamma \mid x, y, M)\, p(x \mid y, M)\, \mathrm{d}x = \mathbb{E}_{x \mid y, M}\big[p(\gamma \mid x, y, M)\big],$$
which can be efficiently and accurately computed by Monte Carlo integration. Precisely, we draw $G$ samples $\{x^{[g]}\}_{g=1}^{G}$ from $p(x \mid y, M)$ and calculate the empirical mean
$$\widetilde{p}(\gamma \mid y, M) = \frac{1}{G} \sum_{g=1}^{G} p(\gamma \mid x^{[g]}, y, M). \quad (3)$$
While the above expressions are valid for all $\gamma$, they are usually evaluated at the posterior mean of $\gamma \mid y$ to reduce the variance of the empirical mean.

Markov chain Monte Carlo algorithms [5, 12] are a standard computational strategy to simulate the samples $x^{[g]}$ from $p(x \mid y, M)$. In particular, the Gibbs sampler is an approach of choice for the class of models considered in this paper. It iteratively constructs a Markov chain $\{x^{[g]}, \gamma^{[g]}\}_{g=1}^{G}$ targeting the joint density $p(x, \gamma \mid y, M)$ by alternately sampling $\gamma_e$, $\gamma_x$ and $x$ from the conditional distributions $p(\gamma_e \mid y, \gamma_x, x, M)$, $p(\gamma_x \mid y, \gamma_e, x, M)$ and $p(x \mid y, \gamma_e, \gamma_x, M)$, evaluated at the current state of the chain. By marginalisation through projection, the drawn samples $\{\gamma^{[g]}\}_{g=1}^{G} \sim p(\gamma \mid y, M)$ are used to calculate the mean $\bar{\gamma} = \sum_{g=1}^{G} \gamma^{[g]} / G$, followed by the computation of $\widetilde{p}(\bar{\gamma} \mid y, M)$ from the samples $\{x^{[g]}\}_{g=1}^{G} \sim p(x \mid y, M)$ and (3).

Implementing this approach requires knowledge of the following five densities (a schematic implementation is sketched below):
1. $p(y \mid \gamma, M = k)$, to compute the numerator of (2);
2. the conditional densities defining the Gibbs sampler:
• $p(x \mid y, \gamma_e, \gamma_x, M)$,
• $p(\gamma_e \mid y, \gamma_x, x, M)$ and $p(\gamma_x \mid y, \gamma_e, x, M)$;
3. $p(\gamma \mid x, y, M)$, to compute (3).
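The overall procedure can be summarised by the following schematic NumPy sketch of the Chib-Gibbs estimator of eqs. (2)-(3). It uses our own naming conventions, not the authors' MATLAB implementation: the five densities enter as callables, whose concrete Fourier-domain forms for the present models are derived next.

```python
import numpy as np
from scipy.special import logsumexp

def chib_log_evidence(y, sample_x, sample_gamma, log_lik, log_prior_gamma,
                      log_post_gamma_given_x, n_iter=1500, n_burn=500, rng=None):
    """Chib estimate of log p(y | M) for one candidate model M.

    The callables encapsulate the model:
      sample_x(y, gamma)                   draws from p(x | y, gamma, M)
      sample_gamma(y, x)                   draws from p(gamma | y, x, M)
      log_lik(y, gamma)                    returns log p(y | gamma, M)
      log_prior_gamma(gamma)               returns log p(gamma | M)
      log_post_gamma_given_x(gamma, y, x)  returns log p(gamma | x, y, M)
    """
    rng = rng or np.random.default_rng()
    gamma = np.ones(2)                 # crude initialisation of (gamma_x, gamma_e)
    xs, gammas = [], []
    for g in range(n_iter):            # Gibbs sweeps over x and gamma
        x = sample_x(y, gamma)
        gamma = sample_gamma(y, x)
        if g >= n_burn:
            xs.append(x)               # (for large images, accumulate online instead)
            gammas.append(gamma)
    gamma_bar = np.mean(gammas, axis=0)  # evaluation point: posterior mean of gamma
    # Rao-Blackwellised estimate of log p(gamma_bar | y, M), eq. (3), in log scale
    log_terms = [log_post_gamma_given_x(gamma_bar, y, x) for x in xs]
    log_post = logsumexp(log_terms) - np.log(len(log_terms))
    # Chib identity, eq. (2), rearranged in log scale
    return log_lik(y, gamma_bar) + log_prior_gamma(gamma_bar) - log_post
```

Repeating this for each of the $K$ candidate models yields $K$ log-evidences, which are then normalised into the posterior probabilities (1); Section 4 returns to the numerical precautions this last step requires.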
To derive the likelihood $p(y \mid \gamma, M = k)$, we use the fact that $x$ and $e$ are zero-mean Gaussian vectors. Accordingly, $y \mid \gamma$ is also a zero-mean Gaussian vector with covariance matrix $R_y^k = H R_x^i H^\dagger + R_e^j$. Because $R_x^i$, $R_e^j$ and $H$ are circulant, $R_y^k$ is also circulant:
$$R_y^k = F^\dagger \big( \gamma_x^{-1} S_h S_x^i S_h^\dagger + \gamma_e^{-1} S_e^j \big) F = F^\dagger S_y^k F,$$
where $S_y^k = \mathrm{diag}\big[s_y^k(p)\big]_{p=1,\ldots,P}$ is the PSD of the data and
$$s_y^k(p) = \gamma_x^{-1} |s_h(p)|^2 s_x^i(p) + \gamma_e^{-1} s_e^j(p)$$
is the variance associated to the $p$-th frequency. Moreover, denoting by $\mathring{y} = F y$ the DFT of the data,
$$\det R_y^k = \prod_{p=1}^{P} s_y^k(p) \quad \text{and} \quad y^\dagger \big(R_y^k\big)^{-1} y = \sum_{p=1}^{P} \frac{|\mathring{y}(p)|^2}{s_y^k(p)},$$
and hence the likelihood is given by
$$p(y \mid \gamma, M) = (2\pi)^{-P/2} \exp\Big[ -\frac{1}{2} \sum_{p=1}^{P} \Big( \log s_y^k(p) + \frac{|\mathring{y}(p)|^2}{s_y^k(p)} \Big) \Big]. \quad (4)$$

The derivation of the conditional densities defining the Gibbs sampler follows from standard conjugacy results. The vector $x \mid y, \gamma_e, \gamma_x, M$ is Gaussian with mean and covariance
$$\mu_k = \gamma_e\, \Sigma_k H^\dagger C_e^{j\,-1} y, \qquad \Sigma_k = \big[ \gamma_e H^\dagger C_e^{j\,-1} H + \gamma_x C_x^{i\,-1} \big]^{-1}.$$
Similarly, the precisions $\gamma_e \mid y, x, M$ and $\gamma_x \mid y, x, M$ have gamma densities with parameters given by
$$\begin{cases} \alpha_{e,k} = \alpha_e + P/2 \\ \alpha_{x,k} = \alpha_x + P/2 \end{cases} \qquad \begin{cases} \beta_{e,k} = \beta_e + \|y - Hx\|^2_{C_e^{j\,-1}} / 2 \\ \beta_{x,k} = \beta_x + \|x\|^2_{C_x^{i\,-1}} / 2 \end{cases}$$
Lastly, using the fact that $\gamma_e$ and $\gamma_x$ are conditionally independent given $y, x, M$, we obtain
$$p(\gamma \mid x, y, M) = p(\gamma_e \mid y, x, M)\, p(\gamma_x \mid y, x, M).$$
Note that all the required matrix and vector products are efficiently computed by using Fourier basis representations. Similarly, one can efficiently simulate from $p(x \mid y, \gamma_e, \gamma_x, M)$ by leveraging the fact that $\Sigma_k$ is diagonal in the Fourier basis.
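Concretely, these densities reduce to elementwise operations on DFT coefficients. The sketch below is our own code, not the paper's: it assumes numpy's unnormalised FFT convention, a square image, a single vague (alpha, beta) pair for both precisions, and drops the i, j superscripts for readability. It gives possible implementations of the likelihood (4), of exact sampling from $p(x \mid y, \gamma, M)$, and of the gamma conditional draws.

```python
import numpy as np

# s_h, s_x, s_e are n x n arrays: transfer function of H and candidate PSDs.
# With numpy's unnormalised fft2, |fft2(y)|^2 / P matches |y_ring(p)|^2 in eq. (4).

def log_likelihood(y, s_h, s_x, s_e, gamma):
    """log p(y | gamma, M), eq. (4)."""
    gamma_x, gamma_e = gamma
    P = y.size
    y_f = np.fft.fft2(y)
    s_y = np.abs(s_h) ** 2 * s_x / gamma_x + s_e / gamma_e   # PSD of the data
    return (-0.5 * P * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(s_y) + np.abs(y_f) ** 2 / (P * s_y)))

def sample_x(y, s_h, s_x, s_e, gamma, rng):
    """Exact draw from p(x | y, gamma, M): Sigma_k is diagonal in the
    Fourier basis, so perturb the posterior mean with coloured noise."""
    gamma_x, gamma_e = gamma
    q = gamma_e * np.abs(s_h) ** 2 / s_e + gamma_x / s_x     # per-frequency precision
    mean_f = gamma_e * np.conj(s_h) * np.fft.fft2(y) / s_e / q
    noise_f = np.fft.fft2(rng.standard_normal(y.shape)) / np.sqrt(q)
    return np.real(np.fft.ifft2(mean_f + noise_f))

def sample_gamma(y, x, s_h, s_x, s_e, alpha, beta, rng):
    """Draws from the conditional gamma densities of gamma_x and gamma_e."""
    P = y.size
    x_f = np.fft.fft2(x)
    r2_e = np.sum(np.abs(np.fft.fft2(y) - s_h * x_f) ** 2 / s_e) / P  # ||y-Hx||^2
    r2_x = np.sum(np.abs(x_f) ** 2 / s_x) / P                         # ||x||^2
    gamma_x = rng.gamma(alpha + P / 2, 1.0 / (beta + r2_x / 2))
    gamma_e = rng.gamma(alpha + P / 2, 1.0 / (beta + r2_e / 2))
    return np.array([gamma_x, gamma_e])
```

A closure binding these functions to a given (s_h, s_x, s_e, alpha, beta) yields the callables expected by the chib_log_evidence skeleton above.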
4. NUMERICAL EXPERIMENTS
We now present an experiment with synthetic data designed to demonstrate the feasibility of the proposed Bayesian model selection approach in an image processing setting.

For each of the $K = 16$ models (we have 4 alternative models for $S_x$ and 4 for $S_e$), we generated 50 synthetic blurred and noisy images. The blur is a cardinal sine of unitary width and the true values are $\gamma_x^\star = 6$ and $\gamma_e^\star = 4$. Then, for each of the $K$ (true) models and each of the images, we computed the $K$ probabilities $p(M = k \mid y)$ for $k = 1, \ldots, K$ by running $K$ Gibbs samplers and using the Chib approach. We then performed model selection by posterior maximisation, $\hat{k} = \arg\max_k p(M = k \mid y)$. The results are given in Fig. 3, which shows the percentage of times that each model was selected against the truth $k^\star$.

We observe that the results are very accurate for all the considered configurations (the configurations White/Laplace and White/White seem to be particularly easy to identify, whereas the configurations Lorentz/Gauss and Lorentz/White appear to be more difficult). The overall accuracy in this experiment is very high.

Fig. 3. Confusion matrix (percentage). Candidate model (x-axis) and true model (y-axis).

Note that the calculation of the $K$ posterior probabilities $p(M = k \mid y)$ for an image $y$ takes on the order of seconds using MATLAB on a standard PC, because all the computations, and especially the sampling from $p(x \mid y, \gamma_e, \gamma_x, M)$, are performed in the Fourier domain. Also, the evidences are computed in logarithmic scale to avoid overflow and underflow problems; it is important to translate the log values before converting back to the linear scale, and finally to apply a factor that corrects the translation (see the sketch at the end of this section). To produce this experiment we calculated $16 \times 16 \times 50 = 12\,800$ model evidences and never observed any convergence or numerical stability issues.

Furthermore, for illustration, Fig. 4 shows the evolution of the approximation of the log-evidence (2) based on the empirical mean (3) as a function of the number $G$ of Monte Carlo samples, for one specific dataset and model configuration. For comparison, we also include the "exact" evidence $p(y \mid M = k)$ calculated by a computationally intensive integration of $p(y, \gamma_x, \gamma_e \mid M = k)$ w.r.t. $\gamma_x, \gamma_e$ over a fine grid. Observe that a moderate number of samples is required to obtain a stable approximation. It must be kept in mind that a very good precision for the log-evidence is required for a good precision on the probabilities.

Fig. 4. Approximation of the log-evidence as a function of the sample number $G$. The exact value is computed by a brute-force method of integration.

Lastly, it is worth noting that the Gibbs sampler also produces approximations of the marginal posteriors $p(x \mid y, M = k)$, $p(\gamma_x \mid y, M = k)$ and $p(\gamma_e \mid y, M = k)$. Fig. 5 shows the traces of $\gamma_x$ and $\gamma_e$ and the associated histograms.

Fig. 5. Samples of the simulated chains shown as a function of the iteration index (left) and as histograms (right). Top: $\gamma_e$; bottom: $\gamma_x$.
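The log-domain normalisation mentioned above is the classic log-sum-exp trick. Below is a minimal sketch of how the $K$ log-evidences are turned into the posterior probabilities (1) under a uniform model prior; this is our own helper, not the paper's MATLAB code.

```python
import numpy as np
from scipy.special import logsumexp

def model_posteriors(log_evidences):
    """p(M = k | y) from log p(y | M = k), eq. (1), with p(M = k) = 1/K.
    Subtracting logsumexp translates the values before exponentiating, so
    no overflow/underflow occurs; the translation cancels in the ratio."""
    v = np.asarray(log_evidences)
    return np.exp(v - logsumexp(v))
```

Because the probabilities are ratios of evidences, any common translation of the log-evidences cancels, which is exactly the correction factor alluded to above.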
5. CONCLUSION AND PERSPECTIVES FOR FUTURE WORK
We have presented a contribution to the automatic selection of models for deconvolution. We have worked in the case of circulant Gaussian models, which allows the marginalization of the unknown image and the easy manipulation of covariances, the major difficulty concerning the marginalization of the hyperparameters. Our strategy is optimal in the sense of a Bayesian risk and is based on the choice of the most probable model. We therefore evaluate the probability of each candidate model, and for this we need the evidence resulting from the marginalization of the unknown image and hyperparameters. Several options are possible and we have opted for Chib's approach and a Gibbs algorithm. We show excellent performance in terms of selection, which encourages us to extend our work. In an extended version of the paper, we will make a complete comparison with other existing methods: Laplace's approximation, RJMCMC, and WBIC, which can also be used to compute model probabilities. It will also be interesting to compare with information criteria such as AIC or BIC.

Among the perspectives is the non-circulant Gaussian case: a direct extension will resort to [13–15] to sample the image, the rest of the algorithm remaining unchanged. The extension to non-Gaussian cases will be based on more advanced sampling tools [16, 17], but we will have to face a new difficulty related to hyperparameters and partition functions. We also intend to include new hyperparameters, e.g., shape parameters of the image and noise PSDs, such as the width $\omega$. Naturally, the processing of real data is also part of our plans.
6. REFERENCES

[1] J.-F. Giovannelli and J. Idier, Eds., Regularization and Bayesian Methods for Inverse Problems in Signal and Image Processing. London: ISTE and John Wiley & Sons Inc., 2015.
[2] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems. Berlin, Germany: Springer, 2005.
[3] T. Ando, Bayesian Model Selection and Statistical Modeling. Boca Raton, USA: Chapman & Hall/CRC, 2010.
[4] J. Ding, V. Tarokh, and Y. Yang, "Model selection techniques: An overview," IEEE Signal Processing Magazine, vol. 35, no. 6, pp. 16–34, November 2018.
[5] C. P. Robert, The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, ser. Springer Texts in Statistics. New York, USA: Springer Verlag, 2007.
[6] C. Vacar, J.-F. Giovannelli, and A.-M. Roman, "Bayesian texture model selection by harmonic mean," in Proceedings of the International Conference on Image Processing, Orlando, USA, September 2012.
[7] J.-F. Giovannelli and A. Giremus, "Bayesian noise model selection and system identification based on approximation of the evidence," in Proceedings of the International Conference on Statistical Signal Processing (special session), Gold Coast, Australia, June 2014, pp. 125–128.
[8] A. Barbos, A. Giremus, and J.-F. Giovannelli, "Bayesian noise model selection and system identification using Chib approximation based on the Metropolis-Hastings sampler," in Actes du 25e colloque GRETSI, Lyon, France, September 2015.
[9] M. Pereyra and S. McLaughlin, "Comparing Bayesian models in the absence of ground truth," in Proceedings of EUSIPCO, Budapest, Hungary, August 2016, pp. 528–532.
[10] B. Harroué, J.-F. Giovannelli, and M. Pereyra, "Sélection de modèles en restauration d'image. Approche bayésienne dans le cas gaussien," in Actes du 27e colloque GRETSI, Lille, France, August 2019.
[11] B. P. Carlin and S. Chib, "Bayesian model choice via Markov chain Monte Carlo methods," Journal of the Royal Statistical Society B, vol. 57, pp. 473–484, 1995.
[12] S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo. Boca Raton, USA: Chapman & Hall/CRC, 2011.
[13] F. Orieux, O. Féron, and J.-F. Giovannelli, "Sampling high-dimensional Gaussian distributions for general linear inverse problems," IEEE Signal Processing Letters, vol. 19, no. 5, pp. 251–254, May 2012.
[14] C. Gilavert, S. Moussaoui, and J. Idier, "Efficient Gaussian sampling for solving large-scale inverse problems using MCMC," IEEE Transactions on Signal Processing, vol. 63, no. 1, pp. 70–80, January 2015.
[15] Y. Marnissi, E. Chouzenoux, A. Benazza-Benyahia, and J.-C. Pesquet, "An auxiliary variable method for MCMC algorithms in high dimension," Entropy, vol. 20, p. 110, 2018.
[16] M. Pereyra, "Proximal Markov chain Monte Carlo algorithms," Statistics and Computing, vol. 26, 2016.
[17] M. Pereyra, P. Schniter, E. Chouzenoux, J.-C. Pesquet, J.-Y. Tourneret, A. O. Hero, and S. McLaughlin, "A survey of stochastic simulation and optimization methods in signal processing," IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 2, March 2016.