The Maximum Entropy on the Mean Method for Image Deblurring
Gabriel Rioux, Rustum Choksi, Tim Hoheisel, Pierre Marechal, Christopher Scarvelis
Department of Mathematics and Statistics, McGill University, Montreal, QC H3G 1Y6, Canada
Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse 31062, France
E-mail: {gabriel.rioux,christopher.scarvelis}@mail.mcgill.ca, {rustum.choksi,tim.hoheisel}@mcgill.ca, [email protected]

Abstract.
Image deblurring is a notoriously challenging ill-posed inverse problem. In recent years, a wide variety of approaches have been proposed, based upon regularization at the level of the image or on techniques from machine learning. In this article, we adapt the principle of maximum entropy on the mean (MEM) to both deconvolution of general images and point spread function (PSF) estimation (blind deblurring). This approach shifts the paradigm towards regularization at the level of the probability distribution on the space of images, whose expectation is our estimate of the ground truth. We present a self-contained analysis of this method, reducing the problem to solving a differentiable, strongly convex, finite-dimensional optimization problem for which there exists an abundance of black-box solvers. The strength of the MEM method lies in its simplicity, its ability to handle large blurs, and its potential for generalization and modifications. When images are embedded with symbology (a known pattern), we show how our method can be applied to approximate the unknown blur kernel with remarkable effects.
Keywords: Image Deblurring, Maximum Entropy on the Mean, Kullback-Leibler Divergence, Convex Analysis, Fenchel-Rockafellar Duality.
1. Introduction
Ill-posed inverse problems permeate the fields of image processing and machine learning. Prototypical examples stem from non-blind (deconvolution) and blind deblurring of digital images. The vast majority of methods for image deblurring are based on some notion of regularization at the image level. In this article, we present, analyse and test a different method known in information theory as Maximum Entropy on the Mean (MEM).

The general idea of maximum entropy dates back to Jaynes in 1957 ([39, 40]). A vast literature originates in Jaynes' paper, on conceptual and theoretical aspects, but also on the multiple applications of the principle of maximum entropy. Based upon these ideas, Navaza and others developed a method for solving ill-posed problems in crystallography ([56, 57, 22, 25]). Further applications were made to deconvolution problems in astrophysics‡ ([82, 55, 5, 89]).

In image reconstruction (as in other fields of applied science), it is necessary to distinguish between the method of maximum entropy (ME) and the principle of Maximum Entropy on the Mean (MEM) as described in [34, 22]. In the former, the gray level of each pixel is interpreted as a probability; the image is then normalized, so that the values add up to one, and the maximum entropy principle is applied under the available constraints (see [82] and the references therein).
In the latter (see for example [37, 52, 46]), the pixelated image is considered as a random vector, and the principle of maximum entropy is applied to the probability distribution of this vector; the available constraints are imposed on the expectation under the unknown probability distribution, and thus become constraints on the probability of the image (see [6, 4, 52]). A definite advantage of the MEM is that, by moving to an upper-level object (a probability distribution on the object to be recovered), it becomes possible to incorporate nonlinear constraints via the introduction of a prior distribution. This results in a very flexible machinery, which makes the MEM very attractive for a wide variety of applications.

To the best of our knowledge, the first occurrence of the MEM inference scheme appeared in [81], for the purpose of spectral analysis. Surprisingly, the MEM has not been widely used for image deconvolution. Indeed, citations of the key articles suggest that MEM is not well known in the image processing§ and machine learning communities and, to our knowledge, its full potential has not been explored and implemented in the context of blind deblurring in image processing. One of the goals of this article is to rectify this by demonstrating that a reformulation of the MEM method can produce a general scheme for deconvolution and, in certain cases, kernel estimation, which compares favourably with the state of the art and is amenable to very large blurs. Moreover, its ability to elegantly incorporate prior information opens the door to many possible avenues for generalizations and other applications.

Following the presentation in Le Besnerais, Bercher, and Demoment [46], let us first state the classical MEM approach introduced by Navaza ([56, 57]) for solving linear inverse problems.

‡ The setting of astronomical imaging is ideal for deconvolution, as in most situations an accurate estimation of the point spread function can be obtained. Indeed, one can estimate the point spread function by calibrating the telescope using reference stars or by analysing the properties of the optics system; see for example [53, 83].

§ We remark that Noll [60] did implement a MEM framework for deblurring a few photo images but with only modest results.

We work with vectors in R^d, where in the context of an image, d represents the number of pixels. Given a d × d convolution matrix C and observed data z ∈ R^d, we wish to determine the ground truth x ∈ R^d and noise n ∈ R^d, where

  z = Cx + n, or z = Hy, where H = [C, I] and y = (x, n)ᵀ.  (1)

If ρ, µ are two probability measures on a subset Ω ⊂ R^d, we define the Kullback-Leibler divergence to be

  K(ρ, µ) = ∫_Ω log(dρ/dµ) dρ if ρ, µ ∈ P(Ω) and ρ ≪ µ, and K(ρ, µ) = +∞ otherwise,  (2)

where P(Ω) denotes the space of probability measures on Ω. One can think of ρ as the probability distribution associated with y and µ as some prior distribution. In the classical MEM approach, the best estimate of y is taken to be

  ŷ := E_ρ̄[X], where ρ̄ = argmin_ρ { K(ρ, µ) | H E_ρ[X] = z }.

This infinite-dimensional variational problem is recast as follows: setting

  F(y) := min_ρ { K(ρ, µ) | E_ρ[X] = y },

solve

  min_y F(y) s.t. Hy = z.  (3)

To solve (3), they introduce what they call the log-Laplace transform of µ, which turns out to be the convex conjugate of F under some assumptions on µ:

  F*(s) := log ∫ exp⟨s, u⟩ dµ(u).  (4)

Then, via Fenchel-Rockafellar duality, they show that the primal problem (3) has a dual formulation

  min_{Hy = z} F(y) = max_{λ ∈ R^d} D(λ), where D(λ) := λᵀz − F*(Hᵀλ).

The primal-dual recovery formula states that if λ̂ is a maximizer of D and ŝ := Hᵀλ̂, then the primal solution is

  ŷ = ∇F*(ŝ).  (5)

In this article, we present a general MEM-based deconvolution approach which does not directly include a noise component, but rather treats the constraint via an additive fidelity term. We also treat directly the infinite-dimensional primal problem over probability densities. A version of this approach was recently implemented by us (cf. [70]) for the blind deblurring of 2D QR and 1D UPC barcodes, where we showed its remarkable ability to blindly deblur highly blurred and noisy barcodes. In this article, we present a self-contained, short (yet complete) analysis of the general theory, including a stability analysis, and then apply it to both non-blind deblurring and kernel (PSF) estimation (blind deblurring). Our computational results are dramatic and compare well with the state of the art.

Let us provide a few details of our approach to MEM as a method for deblurring. In our notation, pixel values are taken from a compact subset Ω ⊂ R^d, ρ (the distribution of the image) and the prior µ are both in P(Ω), and b ∈ R^d is a measured signal representing the blurred and possibly noisy image. Our best guess of the ground truth is determined via

  x̄ := E_ρ̄[X], where ρ̄ := argmin_{ρ ∈ P(Ω)} { K(ρ, µ) + α‖b − C E_ρ[X]‖² }.

Here α > 0 controls the fidelity of C E_ρ[X] to the blurred image b (see (18)). This primal problem has a finite-dimensional dual (cf. Theorem 2) and a recovery formula for the minimizer ρ̄ in terms of the optimizer λ̄ of the dual problem. To compute the expectation of ρ̄, we use the moment-generating function M_X[t] of the prior µ ∈ P(Ω) to show in Section 3.3 that

  E_ρ̄[X] = ∇_t log(M_X[t]) |_{t = Cᵀλ̄}.  (6)

This is the analogous statement to (5). When the moment-generating function of µ is known, finding our estimate of the ground truth has thus been reduced to the optimization in d dimensions of an explicit strongly convex, differentiable function.

Of course, effective implementation of deconvolution is directly tied to the presence of noise.
In our MEM formulation, we do not directly include noise as a (solved-for) variable. We thus make three important remarks about noise:

• In Theorem 3 we prove a stability estimate, valid for any prior µ, which supports the fact that our method is stable with respect to small amounts of noise.

• With only a very modest prior (whose sole purpose is to impose box constraints), we demonstrate that for moderate to large amounts of noise, our method works well by preconditioning with expected patch log likelihood (EPLL) denoising [104].

• While the inclusion of noise-based priors will be addressed elsewhere, we briefly comment on directly denoising with our MEM deconvolution method in Section 6.

Our MEM deconvolution scheme can equally well be used for PSF estimation, hence blind deblurring. Indeed, given some approximation of the ground truth image, say x̃, we estimate the ground truth PSF c̄ as

  c̄ := E_η̄[X], where η̄ := argmin_{η ∈ P(Ω)} { K(η, ν) + γ‖E_η[X] ∗ x̃ − b‖² }.  (7)

Many blind deblurring algorithms iterate between estimating the image x and the PSF c. One could take a similar iterative approach by invoking prior information about c in ν. However, here we proceed via symbology; that is, we assume the image contains a known pattern analogous to a finder pattern in a QR or UPC barcode (cf. [70]). In these cases, one focuses (7) on x̃, the part of the image which is known. While our numerical results (cf. Figures 2, 3, 4) are dramatic, the presence and exploitation of a finder pattern is indeed restrictive and does weaken our notion of blind deblurring. Moreover, it renders comparisons with other methods unfair. Nevertheless, we feel that, besides synthetic images like barcodes, there are many possible applications wherein some part of an image is a priori known.
As a matter of fact, let us stress that the ability of the MEM to incorporate nonlinear constraints (via the prior measure) may tremendously improve the estimation of the convolution kernel. For example, the choice of the support of the prior enables us to confine the kernel to any prescribed closed convex subset. In this paper, we exploit very little of this potential yet already obtain remarkable reconstructions.

We reiterate that our approach directly treats the infinite-dimensional primal problem over probability measures ρ and its finite-dimensional dual, a setting often referred to as partially finite programming [9]. While we can also reformulate our primal problem in terms of a finite-dimensional analogue of (3) defined at the image level (see Section 6.1), an advantage of our approach lies in the fact that one may be interested in computing the optimal ρ̄, or a general linear functional L of ρ̄, for which a simple closed-form formula (analogous to (5) and (6) for the expectation) fails to exist. Here one could adapt a stochastic gradient descent algorithm to compute Lρ̄, wherein the only requirement on the prior would be that one can efficiently sample from µ.

The paper is organized as follows. We start by briefly recalling current methods for deblurring, highlighting the common theme of regularization at the level of the image, or at the level of the PSF (for blind deblurring). After a brief section on preliminaries, we present in Sections 3.1, 3.2, and 3.3 our MEM deconvolution method, which we have just outlined. We then address PSF estimation via the exploitation of symbology in Section 3.4. We prove a stability estimate in Section 4. Numerical examples with comparisons for both deconvolution and blind deblurring with symbology are presented in Section 5. We present a short discussion of the role of the prior and future work in Section 6. Finally, we end with a short conclusion, highlighting the MEM method and the novelties of this article.
The process of capturing one channel of a blurred image b ∈ R^{n×m} from a ground truth channel x ∈ R^{n×m} is modelled throughout by the relation b = c ∗ x, where ∗ denotes the two-dimensional convolution between the kernel c ∈ R^{k×k} (k < n, m) and the ground truth; this model represents spatially invariant blurring. For images composed of more than one channel, blurring is assumed to act on a per-channel basis. Therefore, we derive a method to deblur one channel and apply it to each channel separately.

Most current blind deblurring methods consist of solving

  inf_{x ∈ R^{n×m}, c ∈ R^{k×k}} { R(x, c) + α‖c ∗ x − b‖² },  (8)

where R : R^{n×m} × R^{k×k} → R serves as a regularizer which permits the imposition of certain constraints on the optimizers, and α > 0 is a fidelity parameter. This joint problem is typically solved by alternating between the two subproblems

  c̃ ∈ argmin_{c ∈ R^{k×k}} { R(x̃, c) + α‖c ∗ x̃ − b‖² },  x̃ ∈ argmin_{x ∈ R^{n×m}} { R(x, c̃) + α‖c̃ ∗ x − b‖² };

the first subproblem is a kernel estimation and the second is a deconvolution. We will utilize this approach in the sequel.

Taking R(c) = ‖·‖² and replacing x̃ and b by their derivatives to estimate the kernel has yielded good results in different methods (see e.g. [17, 49, 62, 64, 99]) and can be efficiently solved using spectral methods. The deconvolution step often involves more elaborate regularizers which may be non-differentiable (such as the ℓ₁ regularizer) or non-convex (such as the ℓ₀ regularizer); thus it is often the case that new optimization techniques must be developed to solve the resulting problem. For example, the half-quadratic splitting method [96] has been used to great effect when the ℓ₀ regularizer is employed [49, 62, 64, 99]. Optimization methods that are well suited for the ℓ₁ regularizer include primal-dual interior-point methods [43], iterative shrinkage thresholding algorithms [3, 24], and the split Bregman method [35].
In our case, standard convex optimization software can be used regardless of the choice of prior, as the problem to be solved is strongly convex and differentiable.

Approaches that are not based on machine learning differ mostly in the choice of regularizer; common choices include ℓ₂-, ℓ₁-, and ℓ₀-regularization (and combinations thereof), which promote sparsity of the solution to varying degrees (see [17, 41, 47, 101, 102], [7, 14, 27, 67, 75, 79, 92, 94, 100], and [49, 62, 63, 64] for some examples which employ, respectively, ℓ₂, ℓ₁ and ℓ₀ regularizers on the image or its derivatives).

Such regularizers may be employed to enforce sparsity of other quantities related to the image. For example, one can employ ℓ₀-regularization of the dark or bright channel to promote sparsity of a channel consisting of local minima or maxima of the intensity channel [65, 99]. Further, one can regularize the coefficients of some representation of the image. In particular, many natural images have sparse representations in certain transformation domains. Some frames which have seen use in deblurring and deconvolution include wavelets [2, 30, 32], the DCT (Discrete Cosine Transform) [58], contourlets [88], and framelets [12].
Moreover, combinations of different frames can also be utilized to exploit their individual properties [11, 31, 84].

Other notable choices of regularizer include the ratio of the ℓ₁ and ℓ₂ norms to enforce sparsity [44], the weighted nuclear norm, which ensures that the image or gradient matrices have low rank [69, 98], a penalization based on text-specific properties [16], the spatially variant hyper-Laplacian penalty [15, 80], a surface-aware prior which penalizes the surface area of the image [49], and approximations of the ℓ₀ penalty via concatenation of a quadratic penalty and a constant [95] or via a logarithmic prior [66].

Approaches based on machine learning include modelling the optimization problem as a deep neural network [78, 93] and estimating the ground truth image from a blurred input, without estimating the kernel, using convolutional neural networks (CNNs) [54, 61], recurrent neural networks (RNNs) [86] or generative adversarial networks (GANs) [45, 68]. Other popular methods consist of estimating the kernel via CNNs and using it to perform deconvolution [85], deconvolving via inversion in the Fourier domain and denoising the result using a neural network [21, 23, 77], and learning dictionaries for sparse representations of natural images [28, 29, 38, 50, 59, 97].
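For concreteness, the spatially invariant forward model b = c ∗ x described at the start of this section can be sketched as follows. This is our own illustrative code, not the paper's implementation; the kernel size, the Gaussian shape of the PSF, and the periodic boundary handling are arbitrary demo choices.

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative sketch of the forward model b = c * x for one channel:
# the PSF c is non-negative and normalized to sum to one.
def gaussian_kernel(k=9, sigma=2.0):
    ax = np.arange(k) - (k - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    c = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return c / c.sum()  # normalization preserves overall image intensity

rng = np.random.default_rng(0)
x = rng.random((64, 64))                              # one ground-truth channel
c = gaussian_kernel()
b = convolve2d(x, c, mode='same', boundary='wrap')    # blurred channel b = c * x
```

Multi-channel images would simply apply the same convolution to each channel separately, as described above.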
2. Preliminaries
We begin by recalling some standard definitions and establishing notation. We refer to [103] for convex analysis in infinite dimensions and [71] for the finite-dimensional setting. We follow [76] as a standard reference for real analysis.

Letting (X, τ) be a separated locally convex space, we denote by X* its topological dual. The duality pairing between X and its dual will be written as (·,·) : X × X* → R in order to distinguish it from the canonical inner product on R^d, ⟨·,·⟩ : R^d × R^d → R. For f : X → R̄ ≡ R ∪ {−∞, +∞}, an extended real-valued function on X, the (Fenchel) conjugate of f is f* : X* → R̄ defined by

  f*(x*) = sup_{x ∈ X} { (x, x*) − f(x) },

using the convention a − (−∞) = +∞ and a − (+∞) = −∞ for every a ∈ R. The subdifferential of f at x̄ ∈ X is the set

  ∂f(x̄) = { x* ∈ X* | (x − x̄, x*) ≤ f(x) − f(x̄) for all x ∈ X }.

We define dom f := { x ∈ X | f(x) < +∞ }, the domain of f, and say that f is proper if dom f ≠ ∅ and f(x) > −∞ for every x ∈ X. We say f is lower semicontinuous if f⁻¹([−∞, α]) is τ-closed for every α ∈ R.

A proper function f is convex provided for every x, y ∈ dom f and λ ∈ (0, 1),

  f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).

If the above inequality is strict whenever x ≠ y, f is said to be strictly convex. If f is proper and for every x, y ∈ dom f and λ ∈ (0, 1),

  f(λx + (1 − λ)y) + (c/2)λ(1 − λ)‖x − y‖² ≤ λf(x) + (1 − λ)f(y),

then f is called c-strongly convex.

For any set A ⊆ X, the indicator function of A is given by δ_A : X → R ∪ {+∞}, with δ_A(x) = 0 if x ∈ A and δ_A(x) = +∞ otherwise.

For any Ω ⊆ R^d, we denote by P(Ω) the set of probability measures on Ω. The set of all signed Borel measures with finite total variation on Ω will be denoted by M(Ω). We say that a measure µ is σ-finite (on Ω) if Ω = ∪_{i∈N} Ω_i with |µ(Ω_i)| < +∞ for every i.

Let µ be a positive σ-finite Borel measure on Ω and ρ be an arbitrary Borel measure on Ω; we write ρ ≪ µ to signify that ρ is absolutely continuous with respect to µ, i.e. if A ⊆ Ω is such that µ(A) = 0, then ρ(A) = 0.
If ρ ≪ µ, there exists a unique function dρ/dµ ∈ L¹(µ) for which

  ρ(A) = ∫_A (dρ/dµ) dµ for every measurable A ⊆ Ω.

The function dρ/dµ is known as the Radon-Nikodym derivative (cf. [76, Thm. 6.10]). These measure-theoretic notions were used previously to define the Kullback-Leibler divergence (2).

For Ω ⊆ R^d and η ∈ M(Ω), we denote, by a slight abuse of notation, E_η[X] to be the vector whose k-th component is (E_η[X])_k = ∫_Ω x_k dη(x). Thus, E_{(·)}[X] is a map from M(Ω) to R^d whose restriction to P(Ω) is the expectation of the random vector X = [X₁, ..., X_d] associated with the input measure.

Finally, the smallest (resp. largest) singular value σ_min(C) (resp. σ_max(C)) of a matrix C ∈ R^{m×n} is the square root of the smallest (resp. largest) eigenvalue of CᵀC.
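On a finite set, the Kullback-Leibler divergence (2) reduces to the familiar discrete sum, which makes the absolute-continuity condition ρ ≪ µ concrete: if µ assigns zero mass to a point carrying positive ρ-mass, the divergence is +∞. The sketch below is our own illustration; the weight vectors are hypothetical.

```python
import numpy as np

# Discrete specialization of (2): for weight vectors p (rho) and q (mu) on a
# finite set, K(p, q) = sum_i p_i log(p_i / q_i), with the convention
# 0 log 0 = 0, and K = +inf when p is not absolutely continuous w.r.t. q.
def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0.0) & (p > 0.0)):   # absolute continuity p << q fails
        return np.inf
    mask = p > 0.0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```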
3. The MEM Method
Notation:
We first establish some notation pertaining to deconvolution. The convolution operator c∗ will be represented by the matrix C : R^d → R^d acting on a vectorized image x ∈ R^d, for d = nm, and resulting in a vectorized blurred image whose k-th coordinate in R^d corresponds to the k-th pixel of the image. We assume throughout that the matrix C is nonsingular.

We recall that traditional deconvolution software functions by solving (8) with a fixed convolution kernel c. Our approach differs from previous work by adopting the maximum entropy on the mean framework, which posits that the state best describing a system is given by the mean of the probability distribution which maximizes some measure of entropy [39, 40]. As such, taking Ω ⊆ R^d to be compact, µ ∈ P(Ω) to be a prior measure and b ∈ R^d to be a blurred image, our approach is to determine the solution of

  inf_{ρ ∈ P(Ω)} { K(ρ, µ) + α‖b − C E_ρ[X]‖² } = inf_{P(Ω)} { f + g ∘ A },  (9)

for

  f = K(·, µ),  g = α‖b + (·)‖²,  A = −C E_{(·)}[X].  (10)

The following lemma establishes some basic properties of f.

Lemma 1.
The functional f : M(Ω) → R̄ is proper, weak* lower semicontinuous and strictly convex.

Proof. We begin with the strict convexity of f. Let x ∈ Ω and t ∈ (0, 1) be arbitrary; moreover, let ρ₁ ≠ ρ₂ be elements of P(Ω) and ρ_t = tρ₁ + (1 − t)ρ₂. We have

  log( (dρ_t/dµ)(x) / (t + (1 − t)) ) (dρ_t/dµ)(x)
    = log( [t(dρ₁/dµ)(x) + (1 − t)(dρ₂/dµ)(x)] / (t + (1 − t)) ) [t(dρ₁/dµ)(x) + (1 − t)(dρ₂/dµ)(x)]
    ≤ t log((dρ₁/dµ)(x)) (dρ₁/dµ)(x) + (1 − t) log((dρ₂/dµ)(x)) (dρ₂/dµ)(x).

The inequality is due to the log-sum inequality [20, Thm. 2.7.1], and since ρ₁ ≠ ρ₂, dρ₁/dµ and dρ₂/dµ differ on a set E ⊆ Ω such that µ(E) > 0. The strict log-sum inequality therefore implies that the inequality is strict for every x ∈ E. Since integration preserves strict inequalities,

  f(ρ_t) = ∫_{Ω\E} log(dρ_t/dµ) (dρ_t/dµ) dµ + ∫_E log(dρ_t/dµ) (dρ_t/dµ) dµ < t f(ρ₁) + (1 − t) f(ρ₂),

so f is, indeed, strictly convex.

It is well known that the restriction of f to P(Ω) is weak* lower semicontinuous and proper (cf. [26, Thm. 3.2.17]). Since f ≡ +∞ on M(Ω)\P(Ω), f preserves these properties.

Problem (9) is an infinite-dimensional optimization problem with no obvious solution and is thus intractable in its current form. However, existence and uniqueness of its solutions is established in the following remark.

Remark 1.
First, the objective function in (9) is proper, strictly convex and weak* lower semicontinuous, since f is proper, strictly convex and weak* lower semicontinuous, whereas g ∘ A is proper, weak* continuous and convex.

Now, recall that the Riesz representation theorem [33, Cor. 7.18] identifies M(Ω) as being isomorphic to the dual space of (C(Ω), ‖·‖_∞). Hence, by the Banach-Alaoglu theorem [33, Thm. 5.18], the unit ball B* of M(Ω) in the norm-induced topology‖ is weak*-compact.

‖ The norm here is given by the total variation; we make precise that the weak* topology will be the only topology considered in the sequel.

Since dom f ⊆ P(Ω) ⊆ B*, standard theory for the existence of minimizers of τ-lower semicontinuous functionals on τ-compact sets [1, Cor. 3.2.3] implies that (9) has a solution, and strict convexity of f guarantees that it is unique.

Even with this theoretical guarantee, direct computation of solutions to (9) remains infeasible. In the sequel, a corresponding finite-dimensional dual problem will be established which will, along with a method to recover the expectation of solutions of (9) from solutions of this dual problem, permit an efficient and accurate estimation of the original image.
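Although (9) is infinite-dimensional in general, it becomes a finite convex program when the prior is supported on finitely many points: ρ is then a weight vector on the probability simplex. The toy sketch below (our own illustration; the support points, the matrix C, and α are arbitrary choices) solves such a discretized instance directly with scipy's SLSQP, which is feasible only at this tiny scale and motivates the dual approach developed next.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, d = 6, 3                        # m candidate points in R^d (hypothetical)
support = rng.uniform(0.0, 1.0, (m, d))
mu = np.full(m, 1.0 / m)           # uniform prior weights
C = 0.8 * np.eye(d) + 0.1          # some nonsingular "blur" matrix
b = C @ support[0]                 # observation generated from the first point
alpha = 50.0

def objective(rho):
    # K(rho, mu) + alpha * ||b - C E_rho[X]||^2, with rho clipped away from 0
    # to avoid log(0) (negligible effect for this illustration).
    r = np.clip(rho, 1e-12, 1.0)
    kl = np.sum(r * np.log(r / mu))
    mean = rho @ support           # E_rho[X] for a discretely supported rho
    return kl + alpha * np.sum((b - C @ mean) ** 2)

res = minimize(objective, mu, method='SLSQP',
               bounds=[(0.0, 1.0)] * m,
               constraints={'type': 'eq', 'fun': lambda r: r.sum() - 1.0})
rho_opt = res.x                    # optimal weights on the simplex
```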
In order to derive the (Fenchel-Rockafellar) dual problem to (9), we provide the reader with the Fenchel-Rockafellar duality theorem in a form expedient for our study, cf. e.g. [103, Cor. 2.8.5].
Theorem 1 (Fenchel-Rockafellar Duality Theorem). Let (X, τ) and (Y, τ′) be locally convex spaces and let X* and Y* denote their dual spaces. Moreover, let f : X → R ∪ {+∞} and g : Y → R ∪ {+∞} be convex, lower semicontinuous (in their respective topologies) and proper functions, and let A be a continuous linear operator from X to Y. Assume that there exists ȳ ∈ A dom f ∩ dom g such that g is continuous at ȳ. Then

  inf_{x ∈ X} { f(x) + g(−Ax) } = max_{y* ∈ Y*} { −f*(A*y*) − g*(y*) },  (11)

with A* denoting the adjoint of A. Moreover, x̄ is optimal in the primal problem if and only if there exists ȳ* ∈ Y* satisfying A*ȳ* ∈ ∂f(x̄) and ȳ* ∈ ∂g(−Ax̄).

In (11), the minimization problem is referred to as the primal problem, whereas the maximization problem is called the dual problem. Under certain conditions, a solution to the primal problem can be obtained from a solution to the dual problem.
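Strong duality (11) can be sanity-checked numerically in one dimension. The toy instance below is our own (f, g, and the scalars are illustrative choices, with g of the same quadratic form as in (10)); both the primal infimum and the dual maximum evaluate to 4/3.

```python
import numpy as np

# Toy instance of Theorem 1 on X = Y = R with A the identity:
#   f(x) = x^2/2            (self-conjugate: f*(z) = z^2/2),
#   g(y) = alpha*(y + b0)^2 (conjugate: g*(z) = z^2/(4*alpha) - b0*z).
alpha, b0 = 1.0, 2.0
f  = lambda x: 0.5 * x**2
g  = lambda y: alpha * (y + b0)**2
fs = lambda z: 0.5 * z**2
gs = lambda z: z**2 / (4.0 * alpha) - b0 * z

grid = np.linspace(-10.0, 10.0, 200001)
primal = np.min(f(grid) + g(-grid))       # inf_x f(x) + g(-Ax)
dual   = np.max(-fs(grid) - gs(grid))     # max_z -f*(A*z) - g*(z)
# Strong duality (11): primal == dual (here both equal 4/3).
```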
Remark 2 (Primal-Dual Recovery). In the context of Theorem 1, f* and g* are proper, lower semicontinuous and convex; moreover, (f*)* = f and (g*)* = g [103, Thm. 2.3.3]. Suppose additionally that

  0 ∈ int(A*(dom g*) − dom f*).

Let ȳ* ∈ argmax_{Y*} { −f* ∘ A* − g* }. By the first-order optimality conditions [103, Thm. 2.5.7],

  0 ∈ ∂(f* ∘ A* + g*)(ȳ*) = A ∂f*(A*ȳ*) + ∂g*(ȳ*),

where the second expression is due to [8, Thm. 2.168] (the conditions to apply this theorem are satisfied by assumption). Consequently, there exist z̄ ∈ ∂g*(ȳ*) and x̄ ∈ ∂f*(A*ȳ*) for which z̄ = −Ax̄. Since f and g are proper, lower semicontinuous and convex, we have [103, Thm. 2.4.2 (iii)]:

  A*ȳ* ∈ ∂f(x̄),  ȳ* ∈ ∂g(z̄) = ∂g(−Ax̄).

Thus Theorem 1 demonstrates that x̄ is a solution of the primal problem; that is, if ȳ* is a solution of the dual problem, ∂f*(A*ȳ*) contains a solution to the primal problem. If, additionally, f*(A*ȳ*) < +∞, then [8, Prop. 2.118] implies that

  x̄ ∈ ∂f*(A*ȳ*) = argmax_{x ∈ X} { (x, A*ȳ*) − f(x) }.  (12)

We refer to (12) as the primal-dual recovery formula.
A particularly useful case of this theorem is when A is an operator between an infinite-dimensional locally convex space X and R^d, as the dual problem will then be a finite-dimensional maximization problem. Moreover, the primal-dual recovery is easy if f* is Gâteaux differentiable at A*ȳ*, in which case the subdifferential and the derivative coincide at this point [103, Cor. 2.4.10], so (12) reads x̄ = ∇f*(A*ȳ*). Some remarks are in order to justify the use of this theorem.

Remark 3.
It is clear that P(Ω) endowed with any topology is not a locally convex space; however, it is a subset of M(Ω). Previously, M(Ω) was identified with the dual of (C(Ω), ‖·‖_∞); thus the dual of M(Ω) endowed with its weak* topology, (M(Ω), w*)*, can be identified with C(Ω) [19, Thm. 1.3], with duality pairing (φ, ρ) ∈ C(Ω) × M(Ω) ↦ ∫_Ω φ dρ. Since dom f ⊆ P(Ω), the infimum in (9) can be taken over M(Ω) or P(Ω) interchangeably.

In the following, we verify that A is a bounded linear operator and compute its adjoint.

Lemma 2.
The operator A : M(Ω) → R^d in (10) is linear and weak* continuous. Moreover, its adjoint is the mapping z ∈ R^d ↦ ⟨Cᵀz, ·⟩ ∈ C(Ω).

Proof. We begin by demonstrating the weak* continuity of E_{(·)}[X] : M(Ω) → R^d. Letting π_i : R^d → R denote the projection of a vector onto its i-th coordinate, we have

  E_ρ[X] = ((π₁, ρ), ..., (π_d, ρ)).  (13)

Thus, A is the composition of a weak* continuous operator from M(Ω) to R^d and a continuous operator from R^d to R^d, and hence is weak* continuous. Eq. (13) equally establishes the linearity of A, since the duality pairing is a bilinear form.

The adjoint can be determined by noting that

  ⟨E_ρ[X], z⟩ = Σ_{i=1}^d ∫_Ω x_i dρ(x) z_i = ∫_Ω Σ_{i=1}^d x_i z_i dρ(x) = (⟨z, ·⟩, ρ),

so

  ⟨C E_ρ[X], z⟩ = ⟨E_ρ[X], Cᵀz⟩ = (⟨Cᵀz, ·⟩, ρ),

yielding A*(z) = ⟨Cᵀz, ·⟩.

We now compute the conjugates of f and g, respectively, and provide an explicit form for the dual problem of (9).

Lemma 3.
The conjugate of f in (10) is f* : φ ∈ C(Ω) ↦ log( ∫_Ω exp(φ) dµ ). In particular, f* is finite-valued. Moreover, for any φ ∈ C(Ω), argmax_{P(Ω)} { (φ, ·) − K(·, µ) } = { ρ̄_φ }, the unique probability measure on Ω for which

  dρ̄_φ/dµ = exp(φ) / ∫_Ω exp(φ) dµ.  (14)

Proof.
We proceed by direct computation:

  f*(φ) = sup_{ρ ∈ M(Ω)} { (φ, ρ) − K(ρ, µ) } = sup_{ρ ∈ P(Ω)} { (φ, ρ) − K(ρ, µ) } = sup_{ρ ∈ P(Ω)} { ∫_Ω log( exp(φ) / (dρ/dµ) ) dρ },

where we have used the fact that dom f ⊆ P(Ω), as noted in Remark 3. Note that exp(φ) ∈ C(Ω) ⊆ L¹(ρ), and since t ↦ log t is concave, Jensen's inequality [76, Thm. 3.3] yields

  f*(φ) ≤ sup_{ρ ∈ P(Ω)} { log( ∫_Ω (exp(φ) / (dρ/dµ)) dρ ) } = log( ∫_Ω exp(φ) dµ ).  (15)

Letting ρ̄_φ be the measure with Radon-Nikodym derivative

  dρ̄_φ/dµ = exp(φ) / ∫_Ω exp(φ) dµ,

one has that

  (φ, ρ̄_φ) − K(ρ̄_φ, µ) = (φ, ρ̄_φ) − ∫_Ω log( exp(φ) / ∫_Ω exp(φ) dµ ) dρ̄_φ = log( ∫_Ω exp(φ) dµ ),

so ρ̄_φ ∈ argmax_{P(Ω)} { (φ, ·) − K(·, µ) }, as ρ̄_φ saturates the upper bound for f*(φ) established in (15); thus f*(φ) = log( ∫_Ω exp(φ) dµ ). Moreover, ρ̄_φ is the unique maximizer since the objective is strictly concave.

With this expression in hand, we show that dom f* = C(Ω). To this effect, let φ ∈ C(Ω) be arbitrary and note that

  exp(φ(x)) ≤ exp( max_Ω φ ), (x ∈ Ω).

Thus,

  f*(φ) = log( ∫_Ω exp(φ) dµ ) ≤ log( exp( max_Ω φ ) ) = max_Ω φ < +∞,

since C(Ω) = C_b(Ω) by the compactness of Ω. Since φ is arbitrary, dom f* = C(Ω) and, coupled with the fact that f* is proper [103, Thm. 2.3.3], we obtain that f* is finite-valued.

Lemma 4.
The conjugate of g from (10) is g* : z ∈ R^d ↦ (1/(4α))‖z‖² − ⟨b, z⟩.

Proof. The assertion follows from the fact that (1/2)‖·‖² is self-conjugate [72, Ex. 11.11] and standard calculus rules for conjugacy [72, Eqn. 11(3)].

Combining these results, we obtain the main duality theorem.

Theorem 2.
The (Fenchel-Rockafellar) dual of (9) is given by

  max_{λ ∈ R^d} { ⟨b, λ⟩ − (1/(4α))‖λ‖² − log( ∫_Ω exp⟨Cᵀλ, x⟩ dµ(x) ) }.  (16)

Given a maximizer λ̄ of (16), one can recover the minimizer of (9) via

  dρ̄ = ( exp⟨Cᵀλ̄, ·⟩ / ∫_Ω exp⟨Cᵀλ̄, ·⟩ dµ ) dµ.  (17)

Proof.
The dual problem can be obtained by applying the Fenchel-Rockafellar duality theorem (Theorem 1), with f and g defined in (10), to the primal problem

  inf_{ρ ∈ M(Ω)} { K(ρ, µ) + α‖b − C E_ρ[X]‖² },

and substituting the expressions obtained in Lemmas 2, 3 and 4. All relevant conditions to apply this theorem have either been verified previously or are clearly satisfied. Note that 0 ∈ dom g* = R^d and A*(0) = 0 ∈ C(Ω), so

  A*(dom g*) − dom f* ⊇ −dom f* = { φ | −φ ∈ dom f* } = C(Ω),

since dom f* = C(Ω) by Lemma 3. Thus 0 ∈ int(A*(dom g*) − dom f*) = C(Ω), and Remark 2 is applicable. The primal-dual recovery formula (12) is given explicit form by Lemma 3, by evaluating dρ̄_φ at φ = ⟨Cᵀλ̄, ·⟩.

The utility of the dual problem is that it permits a staggering dimensionality reduction, passing from an infinite-dimensional problem to a finite-dimensional one. Moreover, the form of the dual problem makes precise the role of α in (9). Notably, in [9, Cor. 4.9] the problem

  inf_{ρ ∈ P(Ω) ∩ dom K(·, µ)} K(ρ, µ) s.t. ‖C E_ρ[X] − b‖ ≤ α  (18)

is paired in duality with (16). Thus the choice of α is directly related to the fidelity of C E_ρ[X] to the blurred image. The following section is devoted to the choice of a prior and to describing a method to directly compute E_ρ̄[X] from a solution of (16).

If no information is known about the original image, the prior µ is used to impose box constraints on the optimizer, such that its expectation lies in the box [0, 1]^d and non-zero probability is assigned only to measurable subsets of this box. With this consideration in mind, the prior distribution should be the distribution of a random vector X = [X₁, X₂, ...] with the X_i denoting independent random variables with uniform distributions on intervals [u_i, v_i].
If the $k$th pixel of the original image is unknown, we let $[u_k, v_k] = [0 - \epsilon, 1 + \epsilon]$ for some small $\epsilon > 0$. If instead the $k$th pixel of the ground truth image was known to have a value of $\ell$, one can enforce this constraint by taking the random variable $X_k$ to be distributed uniformly on $[\ell - \epsilon, \ell + \epsilon]$. Constructing $\mu$ in this fashion guarantees that its support (and hence $\Omega$) is compact.

To deal with the integrals in (16) and (17) it is convenient to note that (cf. [73, Sec. 4.4])
$$\int_\Omega \exp(\langle C^T \lambda, x \rangle) \, d\mu = M_X[C^T \lambda],$$
the moment-generating function of $X$ evaluated at $C^T \lambda$. Since the $X_i$ are independently distributed, $M_X[t] = \prod_{i=1}^d M_{X_i}[t_i]$ [73, Sec. 4.4], and since the $X_i$ are uniformly distributed on $[u_i, v_i]$ one has
$$M_X[t] = \prod_{i=1}^d \frac{e^{t_i v_i} - e^{t_i u_i}}{t_i (v_i - u_i)},$$
and therefore the dual problem (16) with this choice of prior can be written as
$$\max_{\lambda \in \mathbb{R}^d} \left\{ \langle b, \lambda \rangle - \frac{1}{2\alpha} \|\lambda\|^2 - \sum_{i=1}^d \log\left( \frac{e^{C_i^T \lambda v_i} - e^{C_i^T \lambda u_i}}{C_i^T \lambda (v_i - u_i)} \right) \right\}, \tag{19}$$
where $C_i$ denotes the $i$-th column of $C$. A solution of (19) can be determined using a number of standard numerical solvers. We opted for the implementation [10] of the L-BFGS algorithm due to its speed and efficiency.

Since only the expectation of the optimal probability measure for (9) is of interest, we compute the $i$th component $(\mathbb{E}_{\bar\rho}[X])_i$ of the expectation of the optimizer provided by the primal-dual recovery formula (17) via
$$\frac{\int_\Omega x_i e^{\langle C^T \bar\lambda, x \rangle} \, d\mu}{\int_\Omega e^{\langle C^T \bar\lambda, x \rangle} \, d\mu} = \left. \partial_{t_i} \log\left( \int_\Omega e^{\langle t, x \rangle} \, d\mu \right) \right|_{t = C^T \bar\lambda}.
$$
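To make the computation concrete, the following sketch (our own toy illustration, not the authors' code) minimizes the negative of the dual objective (19) with SciPy's L-BFGS-B solver for a small 1-D blur. The helper `log_mgf_uniform` guards the removable singularity at $t = 0$ with a short series expansion; the signal size and the value of $\alpha$ are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def log_mgf_uniform(t, u, v):
    """log M_{X_i}[t_i] for X_i ~ Uniform[u_i, v_i], componentwise:
    log((e^{tv} - e^{tu}) / (t(v-u))) = t*u + log(expm1(s)/s), s = t(v-u)."""
    s = t * (v - u)
    small = np.abs(s) < 1e-8
    safe = np.where(small, 1.0, s)
    core = np.where(small, s / 2, np.log(np.expm1(safe) / safe))
    return t * u + core

def neg_dual(lam, C, b, alpha, u, v):
    """Negative of the dual objective (19), to be minimized."""
    t = C.T @ lam
    return -(b @ lam - lam @ lam / (2 * alpha)
             - np.sum(log_mgf_uniform(t, u, v)))

rng = np.random.default_rng(0)
n = 16
x_true = rng.uniform(0.2, 0.8, n)                      # toy ground truth
kernel = np.array([0.25, 0.5, 0.25])
C = np.array([[kernel[i + 1 - j] if 0 <= i + 1 - j < 3 else 0.0
               for j in range(n)] for i in range(n)])  # 'same' convolution matrix
b = C @ x_true                                         # noiseless blurred signal

eps = 0.01
u, v = np.full(n, -eps), np.full(n, 1 + eps)           # box prior [0-eps, 1+eps]
alpha = 100.0                                          # fidelity weight (illustrative)

res = minimize(neg_dual, np.zeros(n), args=(C, b, alpha, u, v),
               method="L-BFGS-B")
```

Starting from $\lambda = 0$ (where the objective is exactly zero), the solver decreases the negative dual, and the maximizer is then fed into the recovery formula (20).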
Using the independence assumption on the prior, we obtain
$$\mathbb{E}_{\bar\rho}[X] = \left. \nabla_t \sum_{i=1}^d \log(M_{X_i}[t_i]) \right|_{t = C^T \bar\lambda},$$
such that the best estimate of the ground truth image is given by
$$(\mathbb{E}_{\bar\rho}[X])_i = \frac{v_i e^{C_i^T \bar\lambda v_i} - u_i e^{C_i^T \bar\lambda u_i}}{e^{C_i^T \bar\lambda v_i} - e^{C_i^T \bar\lambda u_i}} - \frac{1}{C_i^T \bar\lambda}. \tag{20}$$
With (19) and (20) in hand, our entropic method for deconvolution can be implemented.

In order to implement blind deblurring on images that incorporate a symbology, one must first estimate the convolution kernel responsible for blurring the image. This step can be performed by analyzing the blurred symbolic constraints. We propose a method that is based on the entropic regularization framework discussed in the previous sections.

In order to perform this kernel estimation step, we will use the same framework as (9), with $x$ taking the role of $c$. On the assumption that the kernel is of size $k \times k$, we take $\Omega = [0 - \epsilon, 1 + \epsilon]^{k^2}$ for some small $\epsilon > 0$ and consider
$$\inf_{\eta \in \mathcal{P}(\Omega)} \left\{ \mathcal{K}(\eta, \nu) + \frac{\gamma}{2} \left\| \mathbb{E}_\eta[X] * \tilde{x} - \tilde{b} \right\|^2 \right\}. \tag{21}$$
Here, $\gamma > 0$ is a fidelity parameter, and $\tilde{x}$ and $\tilde{b}$ are the segments of the original and blurred image which are known to be fixed by the symbolic constraints. That is, $\tilde{x}$ consists solely of the embedded symbology and $\tilde{b}$ is the blurry symbology. By analogy with (9), the expectation of the optimizer of (21) is taken to be the estimated kernel. The role of $\nu \in \mathcal{P}(\Omega)$ is to enforce the fact that the kernel should be normalized and non-negative (hence its components should be elements of $[0,1]$); we take it to be the distribution of $k^2$ independent uniform distributions on $[0 - \epsilon, 1 + \epsilon]$. As in the non-blind deblurring step, the expectation of the optimizer of (21) can be determined by passing to the dual problem (which is of the same form as (19)), solving the dual problem numerically and using the primal-dual recovery formula (20). A summary of the blind deblurring algorithm is compiled in Algorithm 1.
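Formula (20) has a removable singularity as $C_i^T \bar\lambda \to 0$, where the tilted mean reduces to the midpoint $(u_i + v_i)/2$. A guarded componentwise evaluation (our own sketch, with hypothetical names) might look like:

```python
import numpy as np

def tilted_mean_uniform(t, u, v):
    """Componentwise expectation (20) of the tilted Uniform[u, v] prior
    at tilt t = C^T lambda; the t -> 0 limit is the plain mean (u+v)/2."""
    t = np.asarray(t, dtype=float)
    small = np.abs(t) < 1e-8
    safe = np.where(small, 1.0, t)
    ev, eu = np.exp(safe * v), np.exp(safe * u)
    mean = (v * ev - u * eu) / (ev - eu) - 1.0 / safe
    return np.where(small, (u + v) / 2, mean)

u, v = 0.0, 1.0
m0 = tilted_mean_uniform(0.0, u, v)     # midpoint 0.5
m_pos = tilted_mean_uniform(5.0, u, v)  # positive tilt pushes the mass toward v
```

By construction the returned values always lie strictly inside $(u_i, v_i)$, which is how the box constraints of the prior are enforced on the estimate.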
We point out that the algorithm is not iterative; rather, only one kernel-estimation step and one deconvolution step are used.

This method can be further refined by comparing only those pixels of the symbology which are not convolved with unknown pixels of the image. By choosing these specific pixels, one can greatly improve the quality of the kernel estimate, as every pixel that was blurred to form the signal is known; however, this refinement limits the size of the convolution kernel which can be estimated.
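Because convolution is bilinear, the fidelity term in (21) is linear in the kernel: blurring the known symbology $\tilde{x}$ with a kernel $c$ can be written as $A c$, where the columns of $A$ contain shifted copies of $\tilde{x}$. The dual machinery of (19) then applies with $A$ in place of $C$. A 1-D sketch of this operator (our own illustration; the function name is hypothetical):

```python
import numpy as np

def symbology_operator(x_tilde, k):
    """Matrix A with A @ c == np.convolve(x_tilde, c, mode='same')
    for any kernel c of length k (k <= len(x_tilde))."""
    n = len(x_tilde)
    off = (k - 1) // 2                 # 'same'-mode centring offset
    A = np.zeros((n, k))
    for i in range(n):
        for j in range(k):
            src = i + off - j          # sample of x_tilde hit by c[j]
            if 0 <= src < n:
                A[i, j] = x_tilde[src]
    return A

x_tilde = np.array([0., 1., 1., 0., 1., 0., 0., 1., 1., 1.])  # toy finder pattern
A = symbology_operator(x_tilde, 3)
```

Since $\tilde{x}$ is known exactly, $A$ is known exactly, which is what makes the kernel-estimation problem well posed within the same convex framework.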
4. Stability Analysis for Deconvolution
In contrast to, say, total variation methods, our maximum entropy method does not actively denoise. However, its ability to perform well with a denoising preprocessing
Algorithm 1
Entropic Blind Deblurring
Require:
Blurred image $b$, prior $\mu$, kernel width $k$, fidelity parameters $\gamma, \alpha$;
Ensure:
Deblurred image $\bar{x}$
  $\nu \leftarrow$ density of $k^2$ uniformly distributed independent random variables
  $\lambda_{\bar c} \leftarrow$ argmax of the analog of (19) for the kernel estimate
  $\bar c \leftarrow$ expectation of the argmin of (21), computed via the analog of (20) for the kernel estimate evaluated at $\lambda_{\bar c}$
  $\lambda_{\bar x} \leftarrow$ argmax of (19)
  $\bar x \leftarrow$ expectation of the argmin of (9) with kernel $\bar c$, computed via (20) evaluated at $\lambda_{\bar x}$
  return $\bar x$

step highlights that it is "stable" to small perturbations in the data. In this section, we show that our convex analysis framework readily allows us to prove the following explicit stability estimate.

Theorem 3.
Let $b_1, b_2 \in \mathbb{R}^d$, let $C$ be non-singular, and let
$$\rho_i = \operatorname*{arg\,min}_{\rho \in \mathcal{P}(\Omega)} \left\{ \mathcal{K}(\rho, \mu) + \frac{\alpha}{2} \| C \mathbb{E}_\rho[X] - b_i \|^2 \right\} \quad (i = 1, 2).$$
Then
$$\| \mathbb{E}_{\rho_1}[X] - \mathbb{E}_{\rho_2}[X] \| \leq \frac{2}{\sigma_{\min}(C)} \| b_1 - b_2 \|,$$
where $\sigma_{\min}(C)$ is the smallest singular value of $C$.

We remark that the identical result holds true (with minor modifications to the proof) if the expectation is replaced with any linear operator $\mathcal{P}(\Omega) \to \mathbb{R}^d$.

The proof will follow from a sequence of lemmas. To this end we consider the optimal value function for (9), denoted $v : \mathbb{R}^d \to \mathbb{R}$, given by
$$v(b) := \inf_{\rho \in \mathcal{P}(\Omega)} \left\{ \mathcal{K}(\rho, \mu) + \frac{\alpha}{2} \| C \mathbb{E}_\rho[X] - b \|^2 \right\} = \inf_{\rho \in \mathcal{P}(\Omega)} \{ k(\rho, b) + h \circ L(\rho, b) \}, \tag{22}$$
where
$$k : (\rho, b) \in \mathcal{M}(\Omega) \times \mathbb{R}^d \mapsto \mathcal{K}(\rho, \mu), \qquad h = \frac{\alpha}{2} \|\cdot\|^2, \qquad L(\rho, b) = C \mathbb{E}_\rho[X] - b. \tag{23}$$
The following results will allow us to conclude that $\nabla v$ is (globally) $\alpha$-Lipschitz.

Lemma 5.
The operator $L$ in (23) is linear and continuous in the product topology; its adjoint is the map $z \mapsto (\langle C^T z, \cdot \rangle, -z) \in \mathcal{C}(\Omega) \times \mathbb{R}^d$.

Proof. Linearity and continuity of this operator follow from the linearity and weak* continuity of the expectation operator (cf. Lemma 2). The adjoint is obtained as in Lemma 2:
$$\langle C \mathbb{E}_\rho[X] - b, z \rangle = (\langle C^T z, \cdot \rangle, \rho) + \langle b, -z \rangle.$$

We can now compute the conjugate of $k + h \circ L$.

Lemma 6.
The conjugate of $k + h \circ L$ defined in (23) is the function
$$(\phi, y) \in \mathcal{C}(\Omega) \times \mathbb{R}^d \mapsto (\mathcal{K}(\cdot, \mu))^*(\phi + \langle C^T y, \cdot \rangle) + \frac{1}{2\alpha} \|y\|^2, \tag{24}$$
where $(\mathcal{K}(\cdot, \mu))^*$ is the conjugate computed in Lemma 3.

Proof. Since $\operatorname{dom} h = \mathbb{R}^d$, $h$ is continuous and $k$ is proper, there exists $x \in L(\operatorname{dom} k) \cap \operatorname{dom} h$ such that $h$ is continuous at $x$. This condition guarantees that [103, Thm. 2.8.3]
$$(k + h \circ L)^*(\phi, y) = \min_{z \in \mathbb{R}^d} \{ k^*((\phi, y) - L^*(z)) + h^*(z) \}. \tag{25}$$
The conjugate of $k$ is given by
$$k^*(\phi, y) = \sup_{\rho \in \mathcal{M}(\Omega), \, b \in \mathbb{R}^d} \{ (\phi, \rho) + \langle y, b \rangle - \mathcal{K}(\rho, \mu) \}.$$
For $y \neq 0$, $\sup_{\mathbb{R}^d} \langle y, \cdot \rangle = +\infty$. Thus,
$$k^*(\phi, y) = \sup_{\rho \in \mathcal{M}(\Omega)} \{ (\phi, \rho) - \mathcal{K}(\rho, \mu) \} + \delta_{\{0\}}(y) = (\mathcal{K}(\cdot, \mu))^*(\phi) + \delta_{\{0\}}(y).$$
The conjugate of $h$ was established in Lemma 4 and the adjoint of $L$ is given in Lemma 5. Substituting these expressions into (25) yields
$$(k + h \circ L)^*(\phi, y) = \min_{z \in \mathbb{R}^d} \left\{ (\mathcal{K}(\cdot, \mu))^*(\phi - \langle C^T z, \cdot \rangle) + \delta_{\{0\}}(y + z) + \frac{1}{2\alpha} \|z\|^2 \right\} = (\mathcal{K}(\cdot, \mu))^*(\phi + \langle C^T y, \cdot \rangle) + \frac{1}{2\alpha} \|y\|^2,$$
where the minimum is attained at $z = -y$.

The conjugate computed in the previous lemma can be used to establish that of the optimal value function.
Lemma 7.
The conjugate of $v$ in (22) is
$$v^* : y \in \mathbb{R}^d \mapsto (\mathcal{K}(\cdot, \mu))^*(\langle C^T y, \cdot \rangle) + \frac{1}{2\alpha} \|y\|^2,$$
which is $\frac{1}{\alpha}$-strongly convex.

Proof. We begin by computing the conjugate:
$$v^*(y) = \sup_{b \in \mathbb{R}^d} \left\{ \langle y, b \rangle - \inf_{\rho} \{ k(\rho, b) + h \circ L(\rho, b) \} \right\} = \sup_{\rho \in \mathcal{M}(\Omega), \, b \in \mathbb{R}^d} \{ (0, \rho) + \langle y, b \rangle - k(\rho, b) - h \circ L(\rho, b) \} = (k + h \circ L)^*(0, y).$$
In light of (24), $v^*(y) = (\mathcal{K}(\cdot, \mu))^*(\langle C^T y, \cdot \rangle) + \frac{1}{2\alpha} \|y\|^2$, which is the sum of a convex function and a $\frac{1}{\alpha}$-strongly convex function and is thus $\frac{1}{\alpha}$-strongly convex.

Remark 4.
Theorem 2 establishes attainment for the problem defining $v$ in (22), so $\operatorname{dom} v = \mathbb{R}^d$ and $v$ is proper. Moreover, [8, Prop. 2.152] and [8, Prop. 2.143] establish, respectively, continuity and convexity of $v$. Consequently, $(v^*)^* = v$ [103, Thm. 2.3.3], and since $v^*$ is $\frac{1}{\alpha}$-strongly convex, $v$ is Gâteaux differentiable with globally $\alpha$-Lipschitz derivative [103, Rmk. 3.5.3]. We now compute the derivative of $v$.

Lemma 8.
The derivative of $v$ is the map $b \mapsto \alpha(b - C \mathbb{E}_{\bar\rho}[X])$, where $\bar\rho$ is the solution of the primal problem (9), given in (17).

Proof. By [103, Thm. 2.6.1] and [103, Thm. 2.8.3],
$$s \in \partial v(b) \iff (0, s) \in \partial(k + h \circ L)(\bar\rho, b) = \partial k(\bar\rho, b) + L^*(\partial h(L(\bar\rho, b))),$$
for $\bar\rho$ satisfying $v(b) = k(\bar\rho, b) + h \circ L(\bar\rho, b)$. Since $k$ is independent of $b$, $\partial k(\bar\rho, b) = \partial \mathcal{K}(\bar\rho, \mu) \times \{0\}$. By Lemma 5,
$$L^*(\partial h(L(\bar\rho, b))) = L^*(\alpha(C \mathbb{E}_{\bar\rho}[X] - b)) = \left( \langle \alpha C^T (C \mathbb{E}_{\bar\rho}[X] - b), \cdot \rangle, \ \alpha(b - C \mathbb{E}_{\bar\rho}[X]) \right),$$
so $s = \nabla v(b) = \alpha(b - C \mathbb{E}_{\bar\rho}[X])$.

We now prove Theorem 3.

Proof of Theorem 3.
By Lemma 7, $v^*$ is $\frac{1}{\alpha}$-strongly convex, so $\nabla v$ computed in Lemma 8 is globally $\alpha$-Lipschitz (cf. Remark 4); thus
$$\| \nabla v(b_1) - \nabla v(b_2) \| \leq \alpha \| b_1 - b_2 \|,$$
while
$$\| \nabla v(b_1) - \nabla v(b_2) \| = \alpha \| b_1 - b_2 + C(\mathbb{E}_{\rho_2}[X] - \mathbb{E}_{\rho_1}[X]) \| \geq \alpha \| C(\mathbb{E}_{\rho_1}[X] - \mathbb{E}_{\rho_2}[X]) \| - \alpha \| b_1 - b_2 \| \geq \alpha \sigma_{\min}(C) \| \mathbb{E}_{\rho_1}[X] - \mathbb{E}_{\rho_2}[X] \| - \alpha \| b_1 - b_2 \|.$$
Consequently,
$$\| \mathbb{E}_{\rho_1}[X] - \mathbb{E}_{\rho_2}[X] \| \leq \frac{2}{\sigma_{\min}(C)} \| b_1 - b_2 \|.$$
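The bound in Theorem 3 can be sanity-checked numerically on a toy problem (our own illustration, not part of the paper's experiments): solve the dual (16) for two nearby data vectors, recover the expectations via (20), and compare against $\frac{2}{\sigma_{\min}(C)} \| b_1 - b_2 \|$.

```python
import numpy as np
from scipy.optimize import minimize

def log_mgf_uniform(t, u, v):
    """Componentwise log-MGF of Uniform[u, v], guarded near t = 0."""
    s = t * (v - u)
    small = np.abs(s) < 1e-8
    safe = np.where(small, 1.0, s)
    return t * u + np.where(small, s / 2, np.log(np.expm1(safe) / safe))

def tilted_mean(t, u, v):
    """Expectation (20) of the tilted uniform prior, guarded near t = 0."""
    small = np.abs(t) < 1e-8
    safe = np.where(small, 1.0, t)
    ev, eu = np.exp(safe * v), np.exp(safe * u)
    m = (v * ev - u * eu) / (ev - eu) - 1.0 / safe
    return np.where(small, (u + v) / 2, m)

def mem_expectation(C, b, alpha, u, v):
    """Solve the dual (16) by L-BFGS-B and return E_rho[X] via (20)."""
    def neg_dual(lam):
        t = C.T @ lam
        return -(b @ lam - lam @ lam / (2 * alpha)
                 - np.sum(log_mgf_uniform(t, u, v)))
    res = minimize(neg_dual, np.zeros(len(b)), method="L-BFGS-B")
    return tilted_mean(C.T @ res.x, u, v)

rng = np.random.default_rng(1)
d = 6
C = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # well-conditioned toy blur
u, v = np.zeros(d), np.ones(d)
alpha = 50.0
b1 = C @ rng.uniform(0.2, 0.8, d)
b2 = b1 + 0.05 * rng.standard_normal(d)             # perturbed data

E1 = mem_expectation(C, b1, alpha, u, v)
E2 = mem_expectation(C, b2, alpha, u, v)
sigma_min = np.linalg.svd(C, compute_uv=False)[-1]
lhs = np.linalg.norm(E1 - E2)
rhs = 2 / sigma_min * np.linalg.norm(b1 - b2)       # Theorem 3 bound
```

In practice the observed ratio is well below the bound, since the theorem only uses the Lipschitz constant of $\nabla v$.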
5. Numerical Results
We present results obtained using our method on simulated images. We begin with deconvolution, i.e. when the blurring kernel $c$ is known. Figure 1 provides an example in which a blurry and noisy image has been deblurred using the non-blind deblurring method. We note that the method does not actively denoise blurred images when a uniform prior is used, so a preprocessing step consisting of expected patch log likelihood (EPLL) denoising [104] is first performed. For the sake of consistency, the same preprocessing step is applied prior to using Cho et al's deconvolution method [18] (this step also improves the quality of the restoration for this method). The resulting

(a) PSNR: 20.83 dB (b) Cho et al [18], PSNR: 27.50 dB (c) Ours, PSNR: 26.68 dB

Figure 1:
Deconvolution with noise:
Original image is 512 ×
512 pixels. (a) is the blurred image, which is further degraded with 1% Gaussian noise, along with the 23 pixel wide convolution kernel. (b) is the result obtained using Cho et al's deconvolution method [18]. (c) is the result obtained from the blurred image via our non-blind deblurring method.

The resulting image is subsequently deblurred, and finally TV denoising [74] is used to smooth the image in our case (this step is unnecessary for the other method, as it already results in a smooth restoration). Note that for binary images such as text, TV denoising can be replaced by a thresholding step (see Figure 3).

Results for blind deblurring are compiled in Figures 2, 3 and 4. In this case γ = 10 and α = 10 provide good results in the noiseless case, and γ = 10, α = 10 is adequate for the noisy case; these parameters require manual tuning to yield the best results, however. Comparisons are provided with various state-of-the-art methods [49, 65, 99]. These methods estimate the kernel and subsequently use known deconvolution algorithms to generate the latent image; the same deconvolution method [18] was used for all three methods, as it yielded the best results.

We have stressed that the other methods in our comparison do not exploit the presence of a finder pattern; they are fully blind where ours is symbology-based. Hence it is natural to ask whether these methods can also benefit from known symbology. It is not immediate how to use these methods with symbology successfully. Most of them iterate between optimizing for the image and the kernel, and employ ℓ0 regularization, which is non-convex and can therefore terminate in a local minimum. Hence iteration could prove problematic. Following what we have done here, one could use a strictly convex regularization scheme to first estimate c using the finder pattern as the sole image and then deconvolve.
But this was precisely the approach taken by van Gennip et al in [90], using a strictly convex regularization scheme of the form (8) to exploit the known finder patterns in QR barcodes. Its performance was significantly inferior to our MEM method, as presented in [70]. The ability of MEM to incorporate nonlinear constraints via the introduction of a prior is a definite advantage.

PSNR: 13.79 dB; Liu et al [49], PSNR: 16.39 dB; Pan et al [65], PSNR: 16.49 dB; Yan et al [99], PSNR: 16.54 dB; Ours, PSNR: 29.44 dB (a)
PSNR: 13.54 dB; PSNR: 25.98 dB; PSNR: 23.39 dB; PSNR: 23.71 dB; PSNR: 27.79 dB (b)
PSNR: 15.37 dB; PSNR: 23.17 dB; PSNR: 21.60 dB; PSNR: 20.97 dB; PSNR: 25.67 dB (c)
Figure 2:
Blind deblurring with and without noise:
Original image is 256 × 256 pixels. We repeat the strong caveat of this comparison: unlike the other methods, ours directly exploits the known symbology.

(a) (b)

Figure 3:
Blind text deblurring with and without noise:
Original image is 500 ×
155 pixels. Top: blurred and noisy image. Middle: original convolution kernel on the left and estimated kernel on the right. Bottom: deblurred image obtained using our method with an EPLL denoising preprocessing step and a thresholding postprocessing step. (a) is noiseless with a 57 pixel kernel. (b) has 1% Gaussian noise with a 45 pixel wide kernel.

PSNR: 14.61 dB (a)
Liu et al [49], PSNR: 26.87 dB (b)
Pan et al [65], PSNR: 23.97 dB (c)
Yan et al [99], PSNR: 24.67 dB (d)
Ours, PSNR: 39.66 dB (e)
Figure 4:
Blind deblurring in color:
Original image is 512 ×
512 pixels. (a) is the image which has been blurred with a 17 ×
17 kernel (no noise). (b)-(e) are the latent image and estimated kernel obtained with different methods. We repeat the strong caveat of this comparison: unlike the other methods, ours directly exploits the known symbology.

In Appendix D, we compensate for the bias (in our favour) in the comparisons of our symbology-based blind deblurring method with fully blind methods by presenting a comparison which clearly gives the favorable bias to the other method. We consider the same examples as in Figures 2 and 4, but compare our symbology-based blind method with the deconvolution (non-blind) method of Cho et al [18]; that is, we give the comparison method the advantage of knowing the PSF.
In the presence of additive noise, attempting to deblur images using methods that are not tailored for noise is generally ineffective. Indeed, the image acquisition model $b = c * x$ is replaced by $b = c * x + n$, where $n$ denotes the added noise. The noiseless model posits that the captured image should be relatively smooth due to the convolution, whereas the added noise sharpens segments of the image randomly, so the two models are incompatible. However, Figures 1 and 2 show that our method yields good results in both deconvolution and blind deblurring when a denoising preprocessing step (the other methods use the preprocessed version of the image as well, for the sake of consistency) and a smoothing postprocessing step are utilized.

Remarkably, with a uniform prior, the blind deblurring method is more robust to the presence of additive noise in the blurred image than the non-blind method. Indeed, accurate results were obtained with up to 5% Gaussian noise in the blind case, whereas in the non-blind case the quality of the recovery diminished past 1% Gaussian noise. This is due to the fact that the preprocessing step fundamentally changes the blurring kernel of the image. We are therefore attempting to deconvolve the image with the wrong kernel, leading to aberrations. On the other hand, the estimated kernel for blind deblurring is likely to approximate the kernel modified by the preprocessing step, leading to better results. Moreover, a sparse (Poisson) prior was used in the kernel estimate for the results in Figure 2, so as to mitigate the effects of noise on the symbology.

Finally, we note that there is a tradeoff between the magnitude of blurring and the magnitude of noise. Indeed, large amounts of noise can be dealt with only if the blurring kernel is relatively small, and for large blurring kernels, only small amounts of noise can be considered.
This is due to the fact that for larger kernels, deviations in kernel estimation affect the convolved image to a greater extent than for smaller kernels.
6. The Role of the Prior, Denoising and Further Extensions
Our method is based upon the premise that, a priori, the probability density $\rho$ at each pixel is independent from the other pixels. Hence in our model, the only way to introduce correlations between pixels is via the prior $\mu$. Let us first recall the role of the prior $\mu$ in the deconvolution (and $\nu$ in the kernel estimation). In deconvolution for general images, the prior $\mu$ was only used to impose box constraints; otherwise, it was unbiased (uniform). For deconvolution with symbology, e.g. the presence of a known finder pattern, this information was directly imposed on the prior. For kernel estimation, the prior $\nu$ was used to enforce normalization and positivity of the kernel, but was otherwise unbiased.

Our general method, on the other hand, facilitates the incorporation of far more prior information. Indeed, we seek a prior probability distribution $\mu$ over the space of latent images that possesses at least one of the following two properties:

(i) $\mu$ has a tractable moment-generating function (so that the dual problem can be solved via gradient-based methods such as L-BFGS);
(ii) it is possible to efficiently sample from $\mu$ (so that the dual problem can be solved via stochastic optimization methods).

As a simple example, in Figure 5 we provide a comparison between a uniform and an exponential prior with large rate parameter ($\beta = 400$ at every pixel) to deblur a text image corrupted by 5% Gaussian noise, with no preprocessing or postprocessing. In the former case, we set the fidelity parameter α = 3 × and in the latter, α = 10. It is clear from this figure that the noise in the blurred image is better handled by the exponential prior. This fact will be further discussed in Section 6.1, which also introduces an efficient implementation of the MEM with an exponential prior. In this case, sparsity has been used to promote the presence of a white background by inverting the intensity of the channels during the deblurring process.
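For priors satisfying only property (ii), the log moment-generating term in the dual can be approximated from samples. A minimal sketch of such a Monte Carlo estimator (our own illustration, not the authors' implementation), using a numerically stable logsumexp:

```python
import numpy as np
from scipy.special import logsumexp

def log_mgf_mc(t, samples):
    """Monte Carlo estimate of log E[exp(<t, X>)] from rows of
    `samples` drawn from the prior, via a stable logsumexp."""
    s = samples @ t
    return logsumexp(s) - np.log(len(samples))

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(200_000, 1))  # i.i.d. Uniform[0, 1] "prior"

t = np.array([1.0])
est = log_mgf_mc(t, samples)
exact = np.log(np.e - 1.0)  # closed-form log M(1) for Uniform[0, 1]
```

With a GAN or other generative prior, `samples` would be draws from the generator, and the estimator would be embedded in a stochastic optimization loop rather than evaluated once.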
(a) PSNR: 15.69 dB (b) Uniform, PSNR: 19.18 dB (c) Exponential, PSNR: 20.73 dB

Figure 5:
Deconvolution with different priors:
Original image is 256 ×
256 pixels. (a) is the blurred image with added 5% Gaussian noise, along with the 19 pixel wide convolution kernel. (b) is the result obtained using a uniform prior. (c) is obtained using an exponential prior.

More generally, we believe our method could be tailored to contemporary approaches for priors used in machine learning, and this could be one way of blind deblurring without the presence of a finder pattern. A natural candidate for such a prior $\mu$ is a generative adversarial network (GAN) (cf. [36]) trained on a set of instances from a class of natural images (such as face images). GANs have achieved state-of-the-art performance in the generative modelling of natural images (cf. [42]) and it is possible, by design, to efficiently sample from the distribution implicitly defined by a GAN's generator. Consequently, when equipped with a pre-trained GAN prior, our dual problem (12) would be tractable via stochastic compositional optimization methods such as the ASC-PG algorithm of Wang et al in [91].

It can be advantageous, for example in directly relating the role of the prior to the image regularization, to reformulate our MEM primal problem (9) at the image level. Recall that previous MEM approaches for inverse problems (3), what we called the classical approach, were all based on a primal problem on the space of images. Our formulation can also be rephrased at the image level as follows: find $\bar{x}$, the estimate of the ground truth image, where
$$\bar{x} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^d} \left\{ v(x) + \frac{\alpha}{2} \| Cx - b \|^2 \right\} \quad \text{with} \quad v(x) := \inf_\rho \left\{ \mathcal{K}(\rho, \mu) \mid \mathbb{E}_\rho[X] = x \right\}.
$$
(26)

In this problem, which essentially appears in [51, 46], one can think of $v(x)$ as a regularizer for the image estimate $x$.

Given the structure of the above problem as the sum of a proper, lower semicontinuous convex function and a smooth convex function with $L$-Lipschitz gradient, the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [3] can be utilized, provided the proximal operator, defined for $t > 0$ by
$$\operatorname{prox}_{tv}(u) := \operatorname*{arg\,min}_{x \in \mathbb{R}^d} \left( v(x) + \frac{1}{2t} \| x - u \|^2 \right),$$
can be computed efficiently. As before, one can consider a dual formulation of the problem defining the proximal operator (see (4) for the conjugate of $v$):
$$\max_{\lambda \in \mathbb{R}^d} \left\{ \langle u, \lambda \rangle - \frac{t}{2} \|\lambda\|^2 - \log(M_\mu[\lambda]) \right\}, \tag{27}$$
such that $\operatorname{prox}_{tv}(u) = \nabla_t \log(M_\mu[t]) |_{t = \bar\lambda}$, for $\bar\lambda$ a solution to (27). Note that the Lipschitz constant of the derivative of $\frac{\alpha}{2} \| Cx - b \|^2$ (which dictates the step size used in the FISTA iterations) is $\alpha \sigma_{\max}(C)^2$. Even if the largest singular value of $C$ is unknown, one can determine the step size using a line search.

One example for which the proximal operator can be computed efficiently is when the prior consists of independent exponential distributions at each pixel, with respective rate parameters $\beta_i >$
0. Indeed, (27) reads in this case
$$\max_{\lambda \in \mathbb{R}^d} \left\{ \langle u, \lambda \rangle - \frac{t}{2} \|\lambda\|^2 + \sum_{i=1}^d \log\left( 1 - \frac{\lambda_i}{\beta_i} \right) \right\},$$
whose stationarity condition is solved componentwise by
$$(\bar\lambda_\pm)_i = \frac{u_i + \beta_i t \pm \sqrt{(u_i + \beta_i t)^2 - 4t(u_i \beta_i - 1)}}{2t} = \frac{u_i + \beta_i t \pm \sqrt{(u_i - \beta_i t)^2 + 4t}}{2t};$$
one takes the smaller root ($\bar\lambda_-$), since the log moment-generating function is well-defined only for $\lambda_i / \beta_i <$
1; thus
$$(\operatorname{prox}_{tv}(u))_i = \frac{1}{\beta_i - (\bar\lambda_-)_i}.$$
As such, one can implement the FISTA algorithm to perform deblurring via the MEM with an exponential prior. This method was used to generate the example in Figure 5.

It is natural to seek a correspondence between regularization at the level of the probability measure using a fixed prior and regularization at the image level (i.e. the reformulation). In the case of an exponential distribution, one has the following expression for the image-space regularization [46, Table 1]:
$$v(x) = \sum_{i=1}^d \left( x_i \beta_i - 1 - \log(x_i \beta_i) \right), \quad (x_i > 0).$$
Note that if $x_i \beta_i$ is large, the contribution of that summand is dominated by the linear term. As such, taking an exponential prior with large rate parameter yields an image-space regularization which approximates $\ell_1$ regularization.

This subsection is simply meant to highlight this approach: a full study (theory and applications) of the formulation (26) is in progress.

We have used throughout the duality pairing between $\mathcal{M}(\Omega)$ and $\mathcal{C}(\Omega)$ with $\Omega \subset \mathbb{R}^d$ compact. Notice that, since the Kullback-Leibler entropy takes finite values only for measures that are absolutely continuous with respect to the reference measure, it is also possible to work with the Radon-Nikodym derivatives, as in [51]. The primal problem is then expressed in a space of measurable functions. This setting also facilitates an interesting extension to the case where $\Omega$ is not bounded. As a matter of fact, in the latter case, first-order moment integrals such as
$$(\pi_k, \rho) = \int_\Omega x_k \, d\rho(x)$$
are not necessarily well-defined, since the coordinate functions $\pi_k : x \mapsto x_k$ may be unbounded on $\Omega$. As shown in [51], partially finite convex programming can still be carried out in this setting, offering interesting possible extensions to our analysis.
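Returning to the exponential-prior proximal operator derived in Section 6.1, it is straightforward to evaluate in closed form. The sketch below (our own illustration, with hypothetical names) computes the smaller root of the dual quadratic and verifies the optimality condition $\beta_i - 1/x_i + (x_i - u_i)/t = 0$ of the prox:

```python
import numpy as np

def prox_exponential(u, t, beta):
    """Componentwise prox of v(x) = sum(x_i*beta_i - 1 - log(x_i*beta_i))
    with step t > 0, via the smaller root of the dual quadratic."""
    u = np.asarray(u, dtype=float)
    beta = np.asarray(beta, dtype=float)
    disc = np.sqrt((u - beta * t) ** 2 + 4 * t)  # = sqrt((u+bt)^2 - 4t(ub-1))
    lam_minus = (u + beta * t - disc) / (2 * t)  # smaller root, always < beta
    return 1.0 / (beta - lam_minus)

u = np.array([-0.5, 0.0, 0.3, 2.0])   # point at which the prox is evaluated
beta = np.full(4, 400.0)              # large rate parameter, as in Figure 5
t = 0.01                              # FISTA step size (illustrative)
x = prox_exponential(u, t, beta)

# first-order optimality of the prox: v'(x) + (x - u)/t = 0
residual = beta - 1.0 / x + (x - u) / t
```

The prox output is always strictly positive, consistent with the domain $x_i > 0$ of the regularizer.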
In essence, in case $\Omega$ is unbounded, one must restrict the primal problem to spaces of functions defined by an integrability condition against a family of constraint functions. Such spaces are sometimes referred to as Köthe spaces, and their nature was shown to allow for the application of the convex duality machinery for entropy optimization [51]. The corresponding extensions are currently under consideration and will give rise to interesting future work.
7. Conclusion
The MEM method for the regularization of ill-posed deconvolution problems garnered much attention in the 80's and 90's, with imaging applications in astrophysics and crystallography. It is therefore surprising that since that time it has rarely been used for image deblurring (both blind and non-blind), and is not well known in the image processing and machine learning communities. We have shown that a reformulation of the MEM principle produces an efficient (comparable with the state of the art) scheme for both deconvolution and kernel estimation for general images. It is also amenable to large blurs, which are seldom used for testing methods. The scheme reduces to an unconstrained, smooth and strongly convex optimization problem in finite dimensions, for which there exists an abundance of black-box solvers. The strength of this higher-level method lies in its ability to incorporate prior information, often in the form of nonlinear constraints.

For kernel estimation (blind deblurring), we focused our attention on exploiting a priori assumed symbology (a finder pattern). While this assumption is indeed restrictive: (i) there are scenarios and applications where it holds, in addition to synthetic images like barcodes; (ii) it is far from clear how standard regularization-based methods of the form (8) can be used to exploit symbology to obtain a similar accuracy of kernel estimation.

In general, the MEM method is stable with respect to small amounts of noise, and this allowed us to successfully deblur noisy data by first preprocessing with a state-of-the-art denoiser. However, as shown in Section 6, the MEM method itself can be used for denoising with a particular choice of prior.

Finally, let us reiterate that in our numerical experiments we used only a modest amount of the potential of MEM to exploit prior information.
Future work will concern both kernel estimation without the presence of finder patterns and a full study of the effects of regularization via the image formulation discussed in Section 6.1.
Acknowledgments
G.R. was partially supported by the NSERC CGS-M program; R.C. and T.H. were partially supported by the NSERC Discovery Grants program. We thank Yakov Vaisbourd for proposing the use of FISTA in Section 6.1. We would also like to thank the anonymous referees for their many comments, which significantly improved the paper.
Appendix A. Implementation Details
All figures were generated by implementing the methods in the Python programming language using the Jupyter notebook environment. Images were blurred synthetically using motion blur kernels taken from [48] as well as Gaussian blur kernels to simulate out-of-focus blur. The relevant convolutions are performed using fast Fourier transforms. Images that are not standard test bank images were generated using the GNU Image Manipulation Program (GIMP); moreover, this software was used to add symbolic constraints to images that did not originally incorporate them. All testing was performed on a laptop with an Intel i5-4200U processor. The running time of this method depends on a number of factors, such as the size of the image being deblurred, whether the image is monochrome or colour, the desired quality of the reproduction (controlled by the parameter α), the size of the kernel, and whether or not the kernel is given. If a very accurate result is required, these runtimes vary from a few seconds for a small monochrome text image blurred with a small kernel to upwards of an hour for a highly blurred colour image.

Appendix B. The pre- and post-processing steps
We provide an example of the intermediate images generated in the process of deblurring a noisy image via our method. (a) (b) (c) (d)
Figure B1: (a) is the blurred and noisy image from Figure 2 (c). (b) is the denoised image obtained via the method of Zoran and Weiss [104]. (c) is the deblurred image obtained using our method. (d) is a smoothed version of (c) obtained via TV denoising [74], using the implementation of Chambolle [13].
Appendix C. Parameters for Comparisons
We compile the parameters used for the kernel-estimation step of the deblurring methods to which we compared our method.

Once the kernel has been estimated, the deconvolution method for images with outliers [18] was used to obtain the latent image. For this method, we set the standard deviation for inlier noise to and set the regularization strength for the sparse priors to 0.
003, decreasing it iteratively (with the same kernel) until we obtain a balance between the sharpness of the image and the amount of noise.

(Table: kernel sizes and kernel-prior parameters for the methods of Liu et al [49], Pan et al [65] and Yan et al [99] on the examples of Figures 2 and 4; e.g., kernel size 55 for Fig. 2 (a).) The ℓ0 gradient parameter is set to 4e−, and the parameters for the surface-aware prior, dark channel prior and the extreme channel prior are all set to 4e− in their respective methods.

Appendix D. Comparison of our symbology-based Blind Method to a Non-Blind Deblurring Method
Here we consider the same examples as in Figures 2 and 4, but compare our symbology-based blind method with the state-of-the-art deconvolution method of Cho et al [18]; that is, we give the comparison method the advantage of knowing the PSF. PSNR values for Cho's method were computed with a cropped version of the latent image, to reduce the effects of the boundary conditions for the convolution. The choice of boundary condition accounts for some of our higher PSNR values. For images with noise, the non-blind deconvolution method was applied to both the noisy image and the denoised image (via our pre-denoising step); the better result is presented in Figure D1.
Bibliography [1] H. Attouch, G. Buttazzo, and G. Michaille.
Variational Analysis in Sobolev and BV Spaces. Society for Industrial and Applied Mathematics, 2014.
[2] M. R. Banham and A. K. Katsaggelos. Spatially adaptive wavelet-based multiscale image restoration. IEEE Transactions on Image Processing, 5(4):619–634, 1996.
[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm with application to wavelet-based image deblurring. In , pages 693–696, 2009.
[4] G. Le Besnerais, J.-F. Bercher, and G. Demoment. A new look at entropy for solving linear inverse problems.
IEEE Transactions on Information Theory, 45(5):1565–1578, 1999.

Input, PSNR: 13.79 dB; Ours, PSNR: 29.44 dB; Cho et al, PSNR: 23.73 dB
Input, PSNR: 13.54 dB; Ours, PSNR: 27.79 dB; Cho et al, PSNR: 28.50 dB
Input, PSNR: 15.37 dB; Ours, PSNR: 25.67 dB; Cho et al, PSNR: 25.12 dB
Input, PSNR: 14.61 dB; Ours, PSNR: 39.66 dB; Cho et al, PSNR: 31.52 dB
Figure D1 [5] G. Le Besnerais, J. Navaza, and G. Demoment. Aperture synthesis in astronomical radio-interferometry using maximum entropy on the mean. In Su-Shing Chen, editor,
Stochastic and Neural Methods in Signal Processing, Image Processing, and Computer Vision, volume 1569, pages 386–395. International Society for Optics and Photonics, SPIE, 1991.
[6] G. Le Besnerais, J. Navaza, and G. Demoment. Aperture synthesis in astronomical radio-interferometry using maximum entropy on the mean. In Stochastic and Neural Methods in Signal Processing, Image Processing, and Computer Vision, volume 1569, pages 386–395. International Society for Optics and Photonics, 1991.
[7] P. Blomgren, T. F. Chan, P. Mulet, and C. K. Wong. Total variation image restoration: numerical methods and extensions. In Proceedings of International Conference on Image Processing, volume 3, pages 384–387, 1997.
[8] J. F. Bonnans and A. Shapiro.
Perturbation Analysis of Optimization Problems. Springer, 2000.
[9] J. M. Borwein and A. S. Lewis. Partially finite convex programming, Part I: Quasi relative interiors and duality theory. Math. Program., 57:15–48, 1992.
[10] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A Limited Memory Algorithm for Bound Constrained Optimization. SIAM J. Sci. Comput., 16:1190–1208, 1995.
[11] J. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring from a single image using sparse approximation. In , pages 104–111, 2009.
[12] J.-F. Cai, S. Osher, and Z. Shen. Linearized Bregman iterations for frame-based image deblurring.
SIAM Journal on Imaging Sciences , 2(1):226–252, 2009.[13] A. Chambolle. An Algorithm for Total Variation Minimization and Applications.
Journal ofMathematical Imaging and Vision , 20:89–97, 2004.[14] T. F. Chan and Chiu-Kwong Wong. Total variation blind deconvolution.
IEEE Transactions onImage Processing , 7(3):370–375, 1998.[15] J. Cheng, Y. Gao, B. Guo, and W. Zuo. Image restoration using spatially variant hyper-laplacianprior.
Signal, Image and Video Processing , 13(1):155–162, Feb 2019.[16] H. Cho, J. Wang, and S. Lee. Text image deblurring using text-specific properties. In , volume 7576, pages 524–537, 10 2012.[17] S. Cho and S. Lee. Fast motion deblurring.
ACM Trans. Graph. , 28(5):1–8, December 2009.[18] S. Cho, J. Wang, and S. Lee. Handling outliers in non-blind image deconvolution. In , pages 1–8, 2011.[19] J. B. Conway.
A Course in Functional Analysis . Springer, 2007.[20] T. M. Cover and J. A. Thomas.
Elements of Information Theory . Wiley, 2006.[21] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image restoration by sparse 3D transform-domain collaborative filtering. In Jaakko T. Astola, Karen O. Egiazarian, and Edward R.Dougherty, editors,
Image Processing: Algorithms and Systems VI , volume 6812, pages 62 –73. International Society for Optics and Photonics, SPIE, 2008.[22] D. Dacunha-Castelle and F. Gamboa. Maximum d’entropie et probl`eme des moments.
Annalesde l’I.H.P. Probabilit´es et statistiques , 26(4):567–596, 1990.[23] A. Danielyan, V. Katkovnik, and K. Egiazarian. Bm3d frames and variational image deblurring.
IEEE Transactions on Image Processing , 21(4):1715–1728, 2012.[24] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverseproblems with a sparsity constraint.
Communications on Pure and Applied Mathematics ,57(11):1413–1457, 2004.[25] A. Decarreau, D. Hilhorst, C. Lemar´echal, and J. Navaza. Dual methods in entropy maximization.application to some problems in crystallography.
SIAM Journal on Optimization , 2(2):173–197,1992.[26] J.-D. Deuschel and D. W. Stroock.
Large Deviations . Academic Press, 1989.[27] D. C. Dobson and F. Santosa. Recovery of blocky images from noisy and blurred data.
SIAMJournal on Applied Mathematics , 56(4):1181–1198, 1996.[28] W. Dong, L. Zhang, G. Shi, and X. Li. Nonlocally centralized sparse representation for imagerestoration.
IEEE Transactions on Image Processing , 22(4):1620–1630, 2013.[29] W. Dong, L. Zhang, G. Shi, and X. Wu. Image deblurring and super-resolution by adaptivesparse domain selection and adaptive regularization.
IEEE Transactions on Image Processing ,20(7):1838–1857, 2011.[30] M. Elad, M. A. T. Figueiredo, and Y. Ma. On the role of sparse and redundant representationsin image processing.
Proceedings of the IEEE , 98(6):972–982, 2010.[31] J. M. Fadili and J.-L. Starck. Sparse Representation-based Image Deconvolution by iterativeThresholding. In
Astronomical Data Analysis ADA’06 , Marseille, France, 2006. [32] M. A. T. Figueiredo and R. D. Nowak. An em algorithm for wavelet-based image restoration. IEEE Transactions on Image Processing , 12(8):906–916, 2003.[33] G. B. Folland.
Real Analysis: Modern Techniques and Their Applications . Wiley, 1999.[34] F. Gamboa.
M´ethode du maximum d’entropie sur la moyenne et applications . PhD thesis, Paris11, 1989.[35] Tom Goldstein and Stanley Osher. The split bregman method for l1-regularized problems.
SIAMJournal on Imaging Sciences , 2(2):323–343, 2009.[36] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, andY. Bengio. Generative Adversarial Networks.
Proceedings of the International Conference onNeural Information Processing Systems , pages 2672–2680, 2014.[37] C. Heinrich, J.-F. Bercher, and G. Demoment. The maximum entropy on the mean method,correlations and implementation issues. In
Workshop Maximum Entropy and Bayesian Methods, pages 52–61. Kluwer, 1996.
[38] Z. Hu, J. Huang, and M. Yang. Single image deblurring with adaptive dictionary learning. In IEEE International Conference on Image Processing, pages 1169–1172, 2010.
[39] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, May 1957.
[40] E. T. Jaynes. Information theory and statistical mechanics. II. Phys. Rev., 108:171–190, October 1957.
[41] N. Joshi, R. Szeliski, and D. J. Kriegman. PSF estimation using sharp edge prediction. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.
[42] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4396–4405, 2019.
[43] S. Kim, K. Koh, M. Lustig, and S. Boyd. An efficient method for compressed sensing. In IEEE International Conference on Image Processing, volume 3, pages III-117–III-120, 2007.
[44] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In IEEE Conference on Computer Vision and Pattern Recognition, pages 233–240, 2011.
[45] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 8183–8192, 2018.
[46] G. Le Besnerais, J. Bercher, and G. Demoment. A new look at entropy for solving linear inverse problems. IEEE Transactions on Information Theory, 45(5):1565–1578, 1999.
[47] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM Trans. Graph., 26(3):70–es, July 2007.
[48] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating deconvolution algorithms. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1964–1971, 2009.
[49] J. Liu, M. Yan, and T. Zeng. Surface-aware blind image deblurring. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[50] J. Mairal, G. Sapiro, and M. Elad. Learning multiscale sparse representations for image and video restoration. Multiscale Modeling & Simulation, 7(1):214–241, 2008.
[51] P. Maréchal. On the principle of maximum entropy on the mean as a methodology for the regularization of inverse problems. In B. Grigelionis et al., editors, Probability Theory and Mathematical Statistics, VSP/TEV, pages 481–492, 1999.
[52] P. Maréchal and A. Lannes. Unification of some deterministic and probabilistic methods for the solution of linear inverse problems via the principle of maximum entropy on the mean. Inverse Problems, 13(1):135–151, February 1997.
[53] J. G. Nagy and D. P. O'Leary. Restoring images degraded by spatially variant blur. SIAM Journal on Scientific Computing, 19(4):1063–1082, 1998.
[54] S. Nah, T. H. Kim, and K. M. Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3883–3891, 2017.
[55] R. Narayan and R. Nityananda. Maximum entropy image restoration in astronomy. Annual Review of Astronomy and Astrophysics, 24(1):127–170, 1986.
[56] J. Navaza. On the maximum-entropy estimate of the electron density function. Acta Crystallographica Section A, 41(3):232–244, 1985.
[57] J. Navaza. The use of non-local constraints in maximum-entropy electron density reconstruction. Acta Crystallographica Section A, 42(4):212–223, July 1986.
[58] M. K. Ng, R. H. Chan, and W.-C. Tang. A fast algorithm for deblurring models with Neumann boundary conditions. SIAM Journal on Scientific Computing, 21(3):851–866, 1999.
[59] T. M. Nimisha, A. K. Singh, and A. N. Rajagopalan. Blur-invariant deep learning for blind deblurring. In IEEE International Conference on Computer Vision, pages 4762–4770, 2017.
[60] D. Noll. Restoration of degraded images with maximum entropy. Journal of Global Optimization, 10:91–103, 1997.
[61] M. Noroozi, P. Chandramouli, and P. Favaro. Motion deblurring in the wild. In German Conference on Pattern Recognition (GCPR 2017), pages 65–77, 2017.
[62] J. Pan, Z. Hu, Z. Su, and M.-H. Yang. Deblurring text images via L0-regularized intensity and gradient prior. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2901–2908, 2014.
[63] J. Pan, Z. Hu, Z. Su, and M.-H. Yang. Deblurring face images with exemplars. In European Conference on Computer Vision, 2014.
[64] J. Pan, Z. Hu, Z. Su, and M.-H. Yang. L0-regularized intensity and gradient prior for deblurring text images and beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(2):342–355, 2017.
[65] J. Pan, D. Sun, H. Pfister, and M.-H. Yang. Blind image deblurring using dark channel prior. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1628–1636, 2016.
[66] D. Perrone and P. Favaro. A logarithmic image prior for blind deconvolution. International Journal of Computer Vision, 117, 2015.
[67] D. Perrone and P. Favaro. A clearer picture of total variation blind deconvolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(6):1041–1055, 2016.
[68] S. Ramakrishnan, S. Pachori, A. Gangopadhyay, and S. Raman. Deep generative filter for motion deblurring. In IEEE International Conference on Computer Vision Workshops, pages 2993–3000, 2017.
[69] W. Ren, X. Cao, J. Pan, X. Guo, W. Zuo, and M.-H. Yang. Image deblurring via enhanced low-rank prior. IEEE Transactions on Image Processing, 25(7):3426–3437, 2016.
[70] G. Rioux, C. Scarvelis, R. Choksi, T. Hoheisel, and P. Maréchal. Blind deblurring of barcodes via Kullback-Leibler divergence. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press, 2020.
[71] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[72] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer, 2009.
[73] V. K. Rohatgi and A. K. Md. E. Saleh. An Introduction to Probability and Statistics. Wiley, 2001.
[74] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
[75] L. I. Rudin and S. Osher. Total variation based image restoration with free local constraints. In Proceedings of 1st International Conference on Image Processing, volume 1, pages 31–35, 1994.
[76] W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987.
[77] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf. A machine learning approach for non-blind image deconvolution. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1067–1074, 2013.
[78] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf. Learning to deblur.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(7):1439–1451, 2015.
[79] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. In ACM SIGGRAPH 2008 Papers, SIGGRAPH '08, New York, NY, USA, 2008. Association for Computing Machinery.
[80] M. Shi, T. Han, and S. Liu. Total variation image restoration using hyper-Laplacian prior with overlapping group sparsity. Signal Processing, 126:65–76, 2016.
[81] J. Shore. Minimum cross-entropy spectral analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(2):230–237, 1981.
[82] J. Skilling and R. K. Bryan. Maximum entropy image reconstruction: general algorithm. Monthly Notices of the Royal Astronomical Society, 211:111, 1984.
[83] F. Soulez, E. Thiébaut, and L. Denis. Restoration of hyperspectral astronomical data with spectrally varying blur. EAS Publications Series, 59:403–416, 2013.
[84] J.-L. Starck, M. K. Nguyen, and F. Murtagh. Wavelets and curvelets for image deconvolution: a combined approach. Signal Processing, 83(10):2279–2283, 2003.
[85] J. Sun, W. Cao, Z. Xu, and J. Ponce. Learning a convolutional neural network for non-uniform motion blur removal. In IEEE Conference on Computer Vision and Pattern Recognition, pages 769–777, 2015.
[86] X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia. Scale-recurrent network for deep image deblurring. In IEEE Conference on Computer Vision and Pattern Recognition, pages 8174–8182, 2018.
[87] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. Winston and Sons, 1977.
[88] J. Tzeng, C. Liu, and T. Q. Nguyen. Contourlet domain multiband deblurring based on color correlation for fluid lens cameras. IEEE Transactions on Image Processing, 19(10):2659–2668, 2010.
[89] B. Urban. Retrieval of atmospheric thermodynamical parameters using satellite measurements with a maximum entropy method. Inverse Problems, 12:779–796, 1996.
[90] Y. van Gennip, P. Athavale, J. Gilles, and R. Choksi. A regularization approach to blind deblurring and denoising of QR barcodes. IEEE Transactions on Image Processing, 24(9):2864–2873, 2015.
[91] M. Wang, J. Liu, and E. X. Fang. Accelerating stochastic composition optimization. Journal of Machine Learning Research, 18:1–23, 2017.
[92] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3):248–272, 2008.
[93] J. Wu and X. Di. Integrating neural networks into the blind deblurring framework to compete with the end-to-end learning-based methods. IEEE Transactions on Image Processing, 29:6841–6851, 2020.
[94] L. Xu and J. Jia. Two-phase kernel estimation for robust motion deblurring. In European Conference on Computer Vision, volume 6311 of Lecture Notes in Computer Science, pages 157–170, 2010.
[95] L. Xu, S. Zheng, and J. Jia. Unnatural L0 sparse representation for natural image deblurring. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1107–1114, 2013.
[96] L. Xu, C. Lu, Y. Xu, and J. Jia. Image smoothing via L0 gradient minimization. ACM Trans. Graph., 30(6):1–12, 2011.
[97] S. Xu, X. Yang, and S. Jiang. A fast nonlocally centralized sparse representation algorithm for image denoising. Signal Processing, 131:99–112, 2017.
[98] N. Yair and T. Michaeli. Multi-scale weighted nuclear norm image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3165–3174, 2018.
[99] Y. Yan, W. Ren, Y. Guo, R. Wang, and X. Cao. Image deblurring via extreme channels prior. In IEEE Conference on Computer Vision and Pattern Recognition, pages 6978–6986, 2017.
[100] Y. You and M. Kaveh. Anisotropic blind image restoration. In Proceedings of 3rd IEEE International Conference on Image Processing, volume 2, pages 461–464, 1996.
[101] Y. You and M. Kaveh. A regularization approach to joint blur identification and image restoration. IEEE Transactions on Image Processing, 5(3):416–428, 1996.
[102] L. Yuan, J. Sun, L. Quan, and H.-Y. Shum. Image deblurring with blurred/noisy image pairs. ACM Trans. Graph., 26(3):1–es, July 2007.
[103] C. Zălinescu. Convex Analysis in General Vector Spaces. World Scientific, 2002.
[104] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In 2011 IEEE International Conference on Computer Vision, 2011.