A nonlinear PDE-based method for sparse deconvolution*

Yu Mao†, Bin Dong‡, Stanley Osher§

February 10, 2010

* Research supported by ONR grants N00014-08-1-1119 and N00014-07-0810, and by the Department of Defense.
† Department of Mathematics, UCLA, Los Angeles, CA 90095 ([email protected])
‡ Department of Mathematics, UCSD, 9500 Gilman Drive, La Jolla, CA 92093-0112 ([email protected])
§ Department of Mathematics, UCLA, Los Angeles, CA 90095 ([email protected])
Abstract
In this paper, we introduce a new nonlinear evolution partial differential equation for sparse deconvolution problems. The proposed PDE has the form of a continuity equation that arises in various research areas, e.g. fluid dynamics and optimal transportation, and thus has some interesting physical and geometric interpretations. The underlying optimization model that we consider is the standard $\ell_1$ minimization with linear equality constraints, i.e. $\min_u \{\|u\|_1 : Au = f\}$ with $A$ an under-sampled convolution operator. We show that our PDE preserves the $\ell_1$ norm while lowering the residual $\|Au - f\|^2$. More importantly, the solution of the PDE becomes sparser asymptotically, which we illustrate numerically. Therefore, it can be treated as a natural and helpful plug-in to algorithms for $\ell_1$ minimization problems, e.g. the Bregman iterative methods introduced for sparse reconstruction problems in [1]. Numerical experiments show great improvements in terms of both convergence speed and reconstruction quality.

1 Introduction

The sparse deconvolution problem is to estimate an unknown sparse signal which has been convolved with some known kernel function and corrupted by noise. Sparse deconvolution has many applications, e.g. in seismology, nondestructive ultrasonic evaluation, image restoration, and intracardiac electrograms (see e.g. [2, 3, 4, 5, 6, 7, 8]).

There has been an intensive development of sparse deconvolution techniques and algorithms. Due to their simplicity, least squares methods have been used to deconvolve sparse seismic signals [9, 10]. The major drawbacks of least squares methods are their lack of robustness and the ill-posedness of the problem when the number of unknowns is less than or equal to the number of equations. Many complementary methods have been presented, such as Tikhonov regularization [11] and the total least squares method [12, 13]. Another class of complementary methods to the classical least squares method is the $\ell_1$-penalized models [14, 15], where the $\ell_1$ norm of the unknown signal is used as an additional regularization term, allowing us to control the sparseness of the solution.

In the recent burst of research in compressive sensing (CS) [16, 17, 18], the following $\ell_1$ minimization problem (basis pursuit) is widely used (see e.g. [19, 1]):
$$\min_{u \in \mathbb{R}^n} \{ \|u\|_1 : Au = f \}, \qquad (1)$$
where $\|u\|_1 := \sum_i |u_i|$ denotes the $\ell_1$ norm of $u$. In this paper, we adopt the above optimization model with $A$ a convolution matrix with some known continuously differentiable kernel function (e.g. the impulse response of the system or a wavelet basis). When the kernel function is everywhere nonnegative and the sparse signal we want to recover is also known to be nonnegative, we can solve the constrained least squares problem $\min_u \{ \|Au - f\|^2 : u \ge 0 \}$ as suggested by [20]. However, this model does not work for general kernel functions and signals with negative entries. Hence in this paper we stick to the $\ell_1$ minimization model (1), although we believe our PDE can also be added to many nonnegative least squares approaches.

To solve the constrained optimization problem (1), standard second-order methods such as interior-point methods were introduced in [21, 22, 23].
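To make the setting concrete, the following Python/NumPy sketch builds the kind of data that model (1) is applied to in this paper: an under-sampled Gaussian convolution matrix $A$ and a sparse spike signal. It is only an illustration; all names and parameter values are our own choices, not taken from the paper.

    import numpy as np

    def gaussian_kernel(half_width, sigma):
        """Discrete Gaussian kernel K, normalized to unit sum."""
        x = np.arange(-half_width, half_width + 1)
        k = np.exp(-x**2 / (2.0 * sigma**2))
        return k / k.sum()

    def convolution_matrix(kernel, n, sample_idx):
        """Rows of the full convolution matrix restricted to a sampling set S."""
        half = len(kernel) // 2
        A_full = np.zeros((n, n))
        for i in range(n):
            for j, kv in enumerate(kernel):
                col = i + j - half
                if 0 <= col < n:
                    A_full[i, col] = kv
        return A_full[sample_idx, :]

    n = 200
    rng = np.random.default_rng(0)
    u_true = np.zeros(n)
    support = rng.choice(n, size=5, replace=False)
    u_true[support] = rng.uniform(0.5, 1.5, size=5)

    sample_idx = np.arange(0, n, 2)          # keep every other measurement
    A = convolution_matrix(gaussian_kernel(20, 4.0), n, sample_idx)
    f = A @ u_true                           # clean observed data

Solving (1) then amounts to finding the signal of smallest $\ell_1$ norm that is consistent with the measurements $f$.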
However, these methods, although accurate, become inefficient for large-scale problems. To overcome this, large-scale optimization methods have recently been developed to solve (1) or its siblings [1, 24, 25, 26, 27, 28, 29, 30, 31, 32]. These methods are very popular in CS, and all of them, especially the Bregman iterations introduced to CS in [1], have outstanding performance in solving CS problems, e.g. when the matrix $A$ is taken to be a random submatrix of some orthonormal matrix, such as a Fourier or DCT matrix.

For deconvolution problems, however, the above-mentioned methods are usually less efficient than for CS problems. The major reason is that each column vector of a convolution matrix $A$ is highly coherent with the nearby columns, especially the ones right next to it. This makes a signal much harder to deconvolve if it has two spikes that are very close to each other. Therefore, in contrast to CS problems, the spatial information of $u$ (i.e. the variable "$x$" of $u(x)$) is no longer irrelevant for sparse deconvolution problems, which motivates us to introduce what we call a spatial motion PDE into optimization algorithms for problem (1).

Intuitively speaking, this PDE should spatially move spikes around such that the constraint $Au = f$ is closer to being satisfied; however, the $\ell_1$ norm of the solution should not be altered during the process. We will prove in Section 3 that our motion PDE indeed satisfies these properties. Furthermore, the PDE alone tends to sparsify the solution, which, together with its other properties, makes it an effective and powerful add-on to some of the optimization algorithms for CS problems mentioned above. In this paper, we combine the motion PDE with the Bregman iteration of [1] (with the inner unconstrained problem (3) solved by FPC [30]) to enhance its performance. We note that it is also possible to combine the motion PDE with other iterative thresholding based algorithms (e.g. the linearized Bregman iterative method in [25] or the Bregmanized operator splitting method in [28]).

The rest of the paper is organized as follows. We first recall some basics of Bregman distances and Bregman iterations in Section 2. In Section 3, we introduce our motion PDE and explore its physical and geometric meaning. In Section 4, we combine our PDE with Bregman iteration, propose our new algorithm, and conduct numerical experiments. Finally, we draw our conclusions in Section 5.

2 Bregman Iterations

In this section, we briefly recall the concepts of the Bregman distance [33] and Bregman iterations [1].

2.1 Bregman Distance
The Bregman distance [33] based on a convex function $J$, between points $u$ and $v$, is defined by
$$D_J^p(u, v) = J(u) - J(v) - \langle p, u - v \rangle,$$
where $p \in \partial J(v)$ is an element of the subdifferential of $J$ at the point $v$. In general $D_J^p(u, v) \neq D_J^p(v, u)$ and the triangle inequality is not satisfied, so $D_J^p(u, v)$ is not a distance in the usual sense. However, it does measure the closeness between $u$ and $v$ in the sense that $D_J^p(u, v) \ge D_J^p(w, v)$ for all points $w$ on the line segment connecting $u$ and $v$. Moreover, if $J$ is convex, $D_J^p(u, v) \ge 0$; if $J$ is strictly convex, $D_J^p(u, v) > 0$ for $u \neq v$; and if $J$ is strongly convex, then there exists a constant $a > 0$ such that $D_J^p(u, v) \ge a \|u - v\|^2$.
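As a minimal numerical illustration (our own sketch, with the choice $J(u) = \mu\|u\|_1$ that anticipates the next subsection), the Bregman distance can be evaluated from any subgradient $p \in \partial J(v)$:

    import numpy as np

    def l1_subgradient(v, mu):
        """One element of the subdifferential of J(u) = mu * ||u||_1 at v.
        Where v_i = 0 the subdifferential is the interval [-mu, mu]; we pick 0."""
        return mu * np.sign(v)

    def bregman_distance(u, v, p, mu):
        """D_J^p(u, v) = J(u) - J(v) - <p, u - v> with J(u) = mu * ||u||_1."""
        J = lambda w: mu * np.abs(w).sum()
        return J(u) - J(v) - p @ (u - v)

    mu = 1.0
    v = np.array([1.0, -2.0, 0.0])
    u = np.array([0.5, -1.0, 3.0])
    print(bregman_distance(u, v, l1_subgradient(v, mu), mu))  # 3.0; nonnegative since J is convex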
2.2 Bregman Iterations

To solve (1), the Bregman iteration was proposed in [1]. Given $u^0 = p^0 = 0$, we define
$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \left\{ \mu\|u\|_1 - \mu\|u^k\|_1 - \langle u - u^k, p^k \rangle + \frac{1}{2}\|Au - f\|^2 \right\},$$
$$p^{k+1} = p^k - A^\top(Au^{k+1} - f). \qquad (2)$$
This can be written, for $J(u) = \mu\|u\|_1$, as
$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \left\{ D_J^{p^k}(u, u^k) + \frac{1}{2}\|Au - f\|^2 \right\}.$$
As shown in [1] (see also [34, 35]), the Bregman iteration (2) can equivalently be written, for $f^0 = 0$ and $u^0 = 0$, as
$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \left\{ \mu\|u\|_1 + \frac{1}{2}\|Au - f^k\|^2 \right\}, \qquad (3)$$
$$f^{k+1} = f^k + f - Au^{k+1}.$$
At each iteration, we solve the subproblem (3) via the fixed point continuation (FPC) method proposed in [30]. Other ways of solving the unconstrained problem (3) can be found in e.g. [31, 36, 37, 38, 32, 39] and the references therein; our improvement works with most of these solvers. Altogether, we have the following FPC Bregman iterations for solving problem (1): with $u^{k,0} = u^k$, for $l = 0, \dots, N-1$,
$$\begin{cases} u^{k,l+} \leftarrow u^{k,l} - \delta A^\top(Au^{k,l} - f^k), \\ u^{k,l+1} \leftarrow \mathrm{shrink}(u^{k,l+}, \mu\delta), \end{cases} \quad \text{(FPC)} \qquad u^{k+1} = u^{k,N}, \qquad f^{k+1} = f^k + f - Au^{k+1}. \qquad (4)$$
Here "shrink" is the soft thresholding function [40] defined as
$$\mathrm{shrink}(x, \mu) := \begin{cases} x - \mu, & \text{if } x > \mu, \\ 0, & \text{if } -\mu \le x \le \mu, \\ x + \mu, & \text{if } x < -\mu. \end{cases}$$
When $N = 1$, this gives another algorithm, called Bregmanized operator splitting, that also solves problem (1) [28].
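In code, the iterations (4) might look as follows. This is our own sketch: the parameters are illustrative, and the threshold $\mu\delta$ is the standard forward-backward choice for the subproblem (3).

    import numpy as np

    def shrink(x, t):
        """Componentwise soft thresholding [40]."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def fpc_bregman(A, f, mu=1e-3, delta=1.0, n_inner=10, n_outer=100):
        """FPC Bregman iterations (4) for min { ||u||_1 : Au = f }.
        The inner loop runs N = n_inner FPC steps on (3); the outer loop
        performs the Bregman update of the data f^k."""
        u = np.zeros(A.shape[1])
        fk = f.copy()                    # first subproblem uses f itself
        for _ in range(n_outer):
            for _ in range(n_inner):     # FPC loop for (3)
                u_plus = u - delta * (A.T @ (A @ u - fk))
                u = shrink(u_plus, mu * delta)
            fk = fk + f - A @ u          # f^{k+1} = f^k + f - A u^{k+1}
        return u

With A and f from the setup sketch above, fpc_bregman(A, f) should approximate the true spike signal in the noise-free case; setting n_inner = 1 gives the Bregmanized operator splitting variant [28].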
3 The Spatial Motion PDE

In this section, we introduce and analyze a spatial motion PDE. The main purpose of this PDE is to spatially move and properly combine spikes in a sparse signal $u$, such that the residual $\|Au - f\|^2$ is reduced while $\|u\|_1$ is preserved. We will also present some interesting physical and geometric interpretations of the motion PDE. Throughout this section, $u$ is understood as a function defined on the image domain $\Omega$, while $A$ is the composition of a convolution operator and a spatial restriction operator with respect to a sampling region $S$, and $A^\top$ is the conjugate operator of $A$. In other words,
$$(Au)(x) = \chi_S(x) \int_\Omega K(x - y)\, u(y)\, dy, \qquad (5)$$
where $K$ is a differentiable convolution kernel and $\chi_S$ is the characteristic function of $S$. The other operators in this section should be understood as functional operators as well.

3.1 Derivation of the Motion PDE

To analyze the desired spatial motion of the signal, we start from a characteristic point of view, i.e. we observe the moving trajectory of a point mass and describe its motion. Consider a simple sparse signal that consists of an isolated spike at location $c$:
$$u = \delta(x - c).$$
We want to find the proper spatial motion that decreases the energy $H(u) = \frac{1}{2}\|Au - f\|^2$. To do this, we take the partial derivative of the energy with respect to the spatial variable $c$ and obtain
$$\frac{\partial}{\partial c} H(u) = \nabla_x \big( H'(u) \big)\Big|_{x=c} = \nabla_x \big( A^\top(Au - f) \big)\Big|_{x=c}.$$
The above identity means that if we want to minimize $H(u)$, we should spatially move the spike along the direction $-\nabla_x \big( A^\top(Au - f) \big)\big|_{x=c}$. In other words, the trajectory of the moving spike that minimizes $H(u)$ can be described as $u(x,t) = \delta(x - c(t))$, where $c(t)$ satisfies
$$c'(t) = -\nabla_x \big( A^\top(Au - f) \big)\Big|_{x=c(t)}.$$
Similarly, if the initial signal is a negative spike, i.e. $u = -\delta(x - c)$, then
$$\frac{\partial}{\partial c} H(u) = -\nabla_x \big( A^\top(Au - f) \big)\Big|_{x=c},$$
so its correct direction of spatial movement is $+\nabla_x \big( A^\top(Au - f) \big)\big|_{x=c}$.

Based on this observation, we can formulate a spatial motion PDE as follows:
$$u_t + \nabla_x \cdot [\, u\, a(u) \,] = 0, \qquad (6)$$
where $a(u)$ is the velocity field of the spatial transport. As elaborated by the examples above, the spatial velocity field $a(u)$ should be defined as
$$a(u) = -\nabla_x \big( A^\top(Au - f) \big) \cdot \operatorname{sign}(u).$$
Thus (6) can be equivalently formulated as
$$u_t = \nabla_x \cdot \big[\, |u|\, \nabla_x \big( A^\top(Au - f) \big) \big]. \qquad (7)$$
For definiteness, we attach initial and boundary conditions to (7) and obtain the following Cauchy problem:
$$u_t = \nabla_x \cdot \big[\, |u|\, \nabla_x \big( A^\top(Au - f) \big) \big], \quad (x,t) \in \Omega \times (0, T], \qquad u(\cdot, 0) = u_0, \quad u|_{\partial\Omega} = 0, \qquad (8)$$
where $u_0$ is a continuous initial function. Problem (8) is the core spatial motion PDE we discuss throughout this paper.
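A quick one-dimensional experiment (ours, with a fully sampled $A$ and np.gradient standing in for $\nabla_x$) illustrates the derivation above: for a positive spike placed to the left of the true location, the velocity $-\nabla_x(A^\top(Au - f))$ evaluated at the spike is positive, so the spike is pushed to the right, toward the correct position.

    import numpy as np

    n = 200
    x = np.arange(n)
    K = lambda d: np.exp(-d**2 / (2.0 * 4.0**2))    # Gaussian kernel, sigma = 4
    A = K(x[:, None] - x[None, :])                  # fully sampled convolution matrix

    c_true, c = 120, 100                            # true vs. current spike location
    u_true = np.zeros(n)
    u_true[c_true] = 1.0
    u = np.zeros(n)
    u[c] = 1.0
    f = A @ u_true

    p = A.T @ (A @ u - f)                           # p = A^T(Au - f)
    velocity = -np.gradient(p)                      # velocity of a positive spike
    print(velocity[c])                              # positive: pushed toward c_true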
A locally integrable function $u$ is called a weak solution of (8) if $u$ satisfies the boundary condition and, for any $C^\infty$ test function $\eta$, the following identities hold:
$$\int_{\Omega \times (0,T]} \big[\, \eta_t\, u - |u|\, \nabla_x \eta \cdot \nabla_x \big( A^\top(Au - f) \big) \big]\, dx\, dt = 0, \qquad (9)$$
$$\lim_{t \to 0^+} \int_\Omega \eta(x, t)\, u(x, t)\, dx = \int_\Omega \eta(x, 0)\, u_0(x)\, dx. \qquad (10)$$

3.2 Physical and Geometric Interpretations of (7)

The equation (7) has a nice physical interpretation and is closely related to optimal transportation and nonlinear dissipative processes. To see this, we assume $u > 0$ and regard $u$ as the mass density of a certain fluid. Mass conservation can then be expressed as the following continuity equation for $u$:
$$u_t + \nabla_x \cdot [\, u\, \vec{w}\, ] = 0, \qquad (11)$$
where $\vec{w}$ is the velocity field that describes the spatial motion of the fluid. Darcy's law [41] connects the velocity field $\vec{w}$ with the pressure field $p$ by
$$\vec{w} = -\nabla_x p, \qquad (12)$$
and thermodynamics tells us that the pressure $p$ is determined by the potential $E(u)$:
$$p = \partial_u E(u). \qquad (13)$$
For an ideal gas in a homogeneous porous medium, the potential $E(u)$ is given by the free energy $\int C u^m$. Thus equations (11), (12) and (13) lead to the classical porous medium equation [41]: indeed, $p = C m u^{m-1}$ gives $u_t = \nabla_x \cdot (u \nabla_x p) = C(m-1)\Delta u^m$. If we instead replace the potential $E(u)$ by our residual $H(u)$, the combination of (11), (12) and (13) turns out to be our spatial motion equation (7).

Equation (7) can also be understood as a gradient flow of the potential $H(u)$ under the Wasserstein metric. In [42, 43], Otto et al. showed that the porous medium equation is a gradient flow under the Wasserstein metric for some given potential. Our following interpretation of (7) is in the same spirit as [42, 43].

The Wasserstein distance is widely used in optimal transportation problems, whose fundamental goal is to find the most efficient plan to transport a density function $u_1$ to another density function $u_2$ in a mass-preserving fashion. Monge's optimal transportation problem is to find the solution to
$$\inf_{M_\# u_1 = u_2} \int u_1(x)\, |x - M(x)|^2\, dx, \qquad (14)$$
where the infimum is taken over all maps $M$ that push the density $u_1$ forward to $u_2$, and the corresponding optimal value is the squared Wasserstein distance of order 2 (see [44, 45] for the precise definitions). The associated metric tensor $g$ on the tangent space at $u$ is formally defined by
$$\langle s_1, s_2 \rangle_g = \int u\, \nabla_x \phi_1 \cdot \nabla_x \phi_2, \qquad (15)$$
where $s_1$ and $s_2$ are any two infinitesimal variations of $u$ and $\phi_i$, $i = 1, 2$, are solutions of
$$s_i + \nabla_x \cdot (u\, \nabla_x \phi_i) = 0. \qquad (16)$$
Given the metric tensor $g$, the set of all density functions forms a Riemannian manifold $(\mathcal{M}, g)$.

Now we can formally interpret the motion PDE (7) as a gradient flow of $H(u) = \frac{1}{2}\|Au - f\|^2$ on $(\mathcal{M}, g)$. Under this Riemannian structure, the gradient of the energy $H(u)$ with respect to $g$, denoted $\operatorname{grad}_g H(u)$, is defined by the identity
$$\langle \operatorname{grad}_g H(u), \vec{v} \rangle_g = \partial_{\vec{v}} H(u) \qquad (17)$$
for all vector fields $\vec{v}$ on $(\mathcal{M}, g)$, where $\partial_{\vec{v}}$ denotes the directional derivative along $\vec{v}$. On the other hand, from (15) and (16) we see that $\langle s_1, s_2 \rangle_g = \int \phi_1 s_2$, where $\phi_1$ solves $s_1 + \nabla_x \cdot (u \nabla_x \phi_1) = 0$. Therefore, since $u$ solves $u_t + \nabla_x \cdot \big( u\, \nabla_x(-\partial_u H(u)) \big) = 0$, we have
$$\langle u_t, \vec{v} \rangle_g = \int -\partial_u H(u) \cdot \vec{v} = -\partial_{\vec{v}} H(u) \quad \text{for all } \vec{v}, \qquad (18)$$
which indicates that $u_t = -\operatorname{grad}_g H(u)$. This elegant interpretation leads to a substantial understanding of our motion PDE (7): it gives the most natural way (in the sense of optimal transport) to spatially move the signal such that the residual $H(u)$ decreases.
3.3 Properties of (8)

Now we come back to the general case where $u$ is not assumed to be positive. We have the following fairly well-known properties of equations that resemble (8), which explain, to some extent, why this PDE can be utilized to improve the spatial reconstruction.

Proposition 1. If $u(x,t)$, $(x,t) \in \Omega \times (0,T]$, is a weak solution of (8), then $H(u(\cdot,t)) = \frac{1}{2}\|Au - f\|^2$ is non-increasing over time.

Proof. We need to show that $\frac{d}{dt} H(u(\cdot,t)) \le 0$. Indeed,
$$\frac{d}{dt} H(u(\cdot,t)) = \int H'(u)\, u_t\, dx = -\int \nabla_x\big(A^\top(Au - f)\big) \cdot \big[\, |u|\, \nabla_x\big(A^\top(Au - f)\big) \big]\, dx = -\int |u|\, \big| \nabla_x\big(A^\top(Au - f)\big) \big|^2\, dx \le 0.$$

Proposition 2. If $u(x,t)$, $(x,t) \in \Omega \times (0,T]$, is a continuous weak solution of (8), then $\|u\|_1$ is conserved over time.

Proof. For simplicity, we prove this in the 1D case; the proof for higher dimensions is similar. For a fixed time $t$, suppose $I = [a,b]$ is one of the connected components of $\{u \neq 0\}$, i.e. $u$ has constant sign within $I$; then $u(a,t) = u(b,t) = 0$, so
$$\frac{d}{dt} \int_a^b |u|\, dx = \int_a^b u_t\, \operatorname{sign}(u)\, dx = \operatorname{sign}(u) \int_a^b \nabla_x \cdot \big[\, |u|\, \nabla_x\big(A^\top(Au - f)\big) \big]\, dx = \operatorname{sign}(u)\, \big[\, |u|\, \nabla_x\big(A^\top(Au - f)\big) \big]\Big|_{x=a}^{x=b} = 0.$$
Therefore, overall, $\frac{d}{dt}\|u\|_1 = 0$.

The above two propositions show that, although the total mass of $u$ is preserved over time, the spatial mass distribution is adjusted such that the equation $Au = f$ is better satisfied.

Figure 1: The first column shows observed images $f$ obtained from the convolution of a Gaussian with one or two spikes. The second column shows the initial functions $u_0$, and the remaining four columns show the corresponding solutions of the spatial motion PDE (7) at times $T = 250, 500, 750, 2000$, respectively. The difference between the second and third rows reflects the fact that, with the same $f$, different choices of initial value lead to different results.

Another and yet more important property of the motion PDE (8) is that the solution $u(x,t)$ becomes sparser asymptotically in $t$. We illustrate this phenomenon numerically in Figure 1, where the peaks of the solution $u$ move to the correct locations and, once they are close to the desired locations, $u$ begins to become sparser, which is desirable for sparse reconstruction problems. This explains why we get more accurate solutions when combining the motion PDE (8) with FPC Bregman iterations (see Section 4.3.2 for more details).

It is also worth noticing, comparing the second and third rows of Figure 1, that for different initial functions $u_0$ the solutions of PDE (8) have rather different asymptotic behavior. This means that the residual $H(u)$ is not convex under the Wasserstein metric. In fact, the solution in the second row of Figure 1 only approximates a local minimizer of $H(u)$ with respect to the Wasserstein metric but not a global one, i.e. $H(u) \neq 0$ in this case, but $\operatorname{grad}_g H(u) = 0$.

3.4 A Finite Difference Scheme for (8)

The finite difference scheme that we use for our motion PDE is
$$\frac{u^{n+1}_{i,j} - u^n_{i,j}}{dt} = D^x_-\Big[ \big|u^n_{i+\frac12,j}\big|\, D^x_+ p^n_{i,j} \Big] + D^y_-\Big[ \big|u^n_{i,j+\frac12}\big|\, D^y_+ p^n_{i,j} \Big], \qquad (19)$$
where
$$D^x_- u_{i,j} = u_{i,j} - u_{i-1,j}, \quad D^x_+ u_{i,j} = u_{i+1,j} - u_{i,j}, \quad D^y_- u_{i,j} = u_{i,j} - u_{i,j-1}, \quad D^y_+ u_{i,j} = u_{i,j+1} - u_{i,j},$$
$$\big|u^n_{i+\frac12,j}\big| := \begin{cases} \frac12\big(|u^n_{i+1,j}| + |u^n_{i,j}|\big), & u^n_{i+1,j}\, u^n_{i,j} > 0, \\ 0, & \text{otherwise}, \end{cases} \qquad \big|u^n_{i,j+\frac12}\big| := \begin{cases} \frac12\big(|u^n_{i,j+1}| + |u^n_{i,j}|\big), & u^n_{i,j+1}\, u^n_{i,j} > 0, \\ 0, & \text{otherwise}, \end{cases}$$
and $p^n = A^\top(Au^n - f)$, where the operator $A$ is implemented by standard matrix multiplication. The special definition of $|u^n_{i+\frac12,j}|$ and $|u^n_{i,j+\frac12}|$ above ensures that the $\ell_1$ norm of $u^n$ is exactly preserved in the discrete setting: no mass flows across sign changes of $u^n$.

To guarantee the stability of the numerical solution, the time step $dt$ must be chosen appropriately. Although it is hard to derive an explicit stability condition for our nonlinear PDE, at each step we can treat the PDE as a transport equation with a frozen velocity field, which yields the necessary stability condition
$$\frac{1}{dt^n} \ge \max_{i,j} \big\{ |D^x_+ p^n_{i,j}|,\ |D^y_+ p^n_{i,j}| \big\}. \qquad (20)$$
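A direct NumPy transcription of the scheme (19) together with the step-size rule (20) might read as follows. This is our sketch: images are 2D arrays, $p^n$ is supplied already reshaped to the grid, and zero fluxes through the boundary implement $u|_{\partial\Omega} = 0$.

    import numpy as np

    def motion_pde_step(u, p, dt):
        """One explicit step of scheme (19): u_t = div(|u| grad p)."""
        def half_abs(a, b):
            # |u| at the half grid point: average when a and b share a sign,
            # else 0; blocking flux across sign changes is what preserves
            # the discrete l1 norm exactly (cf. Proposition 2).
            return np.where(a * b > 0, 0.5 * (np.abs(a) + np.abs(b)), 0.0)

        # interior fluxes F_{i+1/2,j} = |u|_{i+1/2,j} (D+ p)_{i,j}, same in y
        fx = half_abs(u[:-1, :], u[1:, :]) * (p[1:, :] - p[:-1, :])
        fy = half_abs(u[:, :-1], u[:, 1:]) * (p[:, 1:] - p[:, :-1])

        # divergence D-[F]: each cell gains its right/upper flux and loses
        # its left/lower one; boundary fluxes are zero
        div = np.zeros_like(u)
        div[:-1, :] += fx
        div[1:, :] -= fx
        div[:, :-1] += fy
        div[:, 1:] -= fy
        return u + dt * div

    def stable_dt(p, safety=0.5):
        """Time step satisfying the necessary stability condition (20)."""
        gmax = max(np.abs(np.diff(p, axis=0)).max(),
                   np.abs(np.diff(p, axis=1)).max(), 1e-12)
        return safety / gmax

Since the update is assembled from fluxes and no flux crosses a zero of $u$, the scheme is conservative on every region of constant sign, which is the discrete counterpart of Proposition 2.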
4 PDE-FPC Bregman Iterations

4.1 The Algorithm

As we explained in Section 3.3, the PDE (8) does not by itself necessarily lead to a correct solution of the optimization problem (1). However, it helps the iterate become sparser while decreasing the residual, so it can be combined with existing deconvolution algorithms to improve their performance. Since the motion PDE (8) is time-dependent, it is natural to combine the numerical scheme (19) with the FPC Bregman iteration (4), which leads to the following algorithm: with $u^{k,0} = u^k$, for $l = 0, \dots, N-1$,
$$\begin{cases} u^{k,l+} \leftarrow u^{k,l} - \delta A^\top(Au^{k,l} - f^k), \\ u^{k,l+} \leftarrow u^{k,l+} + dt\, \nabla_x \cdot \big[\, |u^{k,l+}|\, \nabla_x\big(A^\top(Au^{k,l} - f^k)\big) \big] \quad \text{(PDE update)}, \\ u^{k,l+1} \leftarrow \mathrm{shrink}(u^{k,l+}, \mu\delta), \end{cases} \qquad u^{k+1} = u^{k,N}, \qquad f^{k+1} = f^k + f - Au^{k+1}. \qquad (21)$$
We shall refer to the above algorithm as the PDE-FPC Bregman iterations; a code sketch is given below.
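Combining the previous sketches gives the following transcription of (21). Again this is ours, not the authors' code; it reuses shrink, motion_pde_step, and stable_dt from above, with $A$ acting on the flattened image.

    import numpy as np

    def pde_fpc_bregman(A, f, shape, mu, delta, n_inner=10, n_outer=50):
        """PDE-FPC Bregman iterations (21): FPC step, PDE update, shrinkage."""
        u = np.zeros(shape)
        fk = f.copy()
        for _ in range(n_outer):
            for _ in range(n_inner):
                # p = A^T(Au - f^k) is computed once and reused by both updates
                p = (A.T @ (A @ u.ravel() - fk)).reshape(shape)
                u_plus = u - delta * p                        # FPC gradient step
                # PDE update; the paper tunes a fixed dt, we use the bound (20)
                u_plus = motion_pde_step(u_plus, p, stable_dt(p))
                u = shrink(u_plus, mu * delta)                # soft thresholding
            fk = fk + f - A @ u.ravel()                       # Bregman data update
        return u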
4.2 Implementation Remarks

We note that at each iteration the PDE update only slightly increases the complexity, because the term $A^\top(Au - f^k)$ need not be recalculated and the finite difference operator in (19) can be implemented rather efficiently. We also point out that this PDE update step can similarly be imposed on many other iterative thresholding based algorithms, e.g. the linearized Bregman iterative method [25] or the BOS method [28]. As far as we have tested, this complementary step always improves the performance and accuracy of these algorithms without increasing the computational complexity very much. In many cases the improvement is very significant.

Remark: The inner loop of (21) contains $N \ge 1$ FPC iterations. The choice of $N$ is not critical for the convergence of the scheme, but it does affect performance. When $N$ is small, the algorithm converges faster (in terms of the total number of iterations) while the decay of the residual is more oscillatory; when $N$ is large, it converges more slowly while the decay of the residual is more stable. In our experiments we choose $N = 10$, which seems to give a good balance between speed and stable decay of the residual.

4.3 Numerical Experiments

In this section, we compare the performance of the FPC Bregman iterations (4) with that of the PDE-FPC Bregman iterations (21). We first consider the case where there is no noise in $f$ and compare convergence speed; we then consider a noisy case and compare the accuracy of the solutions obtained by the two algorithms. The operator $A$ is taken to be a convolution operator with a Gaussian kernel. The clean and noisy $f$ are shown in Figure 2, where the size of the images is $50 \times 50$. The two kernels of the clean blurry images are generated by the Matlab function fspecial('gaussian',[41 41],sigma) with sigma = 4 and 4.5, respectively. The noisy image is generated by adding Gaussian white noise to the clean blurry image with kernel fspecial('gaussian',[21 21],sigma). The intensities of the spikes are generated randomly in a fixed positive interval.

Figure 2: Clean observed images generated by different Gaussian kernels (left and middle). Noisy observed image (right) with SNR = 15.87. Black circles indicate the locations of the spikes.
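For reference, test data of this kind can be generated along the following lines. This is a sketch only: fspecial is Matlab, so we use a NumPy analogue, and the spike count, intensity interval, and noise level are our assumptions, since they are not fully specified above.

    import numpy as np
    from scipy.signal import convolve2d

    def fspecial_gaussian(shape, sigma):
        """NumPy analogue of Matlab's fspecial('gaussian', shape, sigma)."""
        m = (shape[0] - 1) / 2.0
        n = (shape[1] - 1) / 2.0
        y, x = np.mgrid[-m:m + 1, -n:n + 1]
        h = np.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
        return h / h.sum()

    rng = np.random.default_rng(0)
    u_true = np.zeros((50, 50))
    idx = rng.integers(5, 45, size=(6, 2))                    # spike locations (assumed)
    u_true[idx[:, 0], idx[:, 1]] = rng.uniform(0.5, 1.0, 6)   # intensities (assumed)

    K = fspecial_gaussian((41, 41), 4.0)                      # sigma = 4, as in the text
    f_clean = convolve2d(u_true, K, mode="same")              # clean blurry image
    f_noisy = f_clean + 5e-3 * rng.standard_normal((50, 50))  # noise level (assumed)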
4.3.1 Noise-Free Case

We first consider the noise-free case (left two images in Figure 2). Here we stop when the relative error $\|u^k - \bar{u}\| / \|\bar{u}\|$ falls below a fixed small tolerance, where $\bar{u}$ is the true solution. Tables 1 and 2 summarize the comparison results for various choices of the parameter $\mu$; the time step $dt$ for the PDE is chosen to give the best performance for each $\mu$.

From Tables 1 and 2 we can see that PDE-FPC Bregman is generally faster than FPC Bregman in terms of computational time. Furthermore, the smaller $\mu$ gets, the more PDE-FPC Bregman outperforms FPC Bregman. The reason is that for small $\mu$ the iterate $u^k$ is more regular (i.e. less sparse) than for large $\mu$ due to thresholding, and hence the PDE update in (21) behaves much better; a larger time step $dt$ is also allowed when $u^k$ is more regular.

It is also worth noticing from Tables 1 and 2 that the best performance of PDE-FPC Bregman is 6 and 10 times faster, respectively, than the best performance of FPC Bregman (marked by "♠"). The more blurry the image, the bigger the improvement of PDE-FPC Bregman over FPC Bregman.

Table 1: Comparisons of FPC Bregman (4) and PDE-FPC Bregman (21) for various $(\delta, \mu)$ and $dt$. The symbol "♠" labels the best results for FPC Bregman and PDE-FPC Bregman.

Table 2: Comparisons of FPC Bregman (4) and PDE-FPC Bregman (21) for various $(\delta, \mu)$ and $dt$. The symbol "♠" labels the best results for FPC Bregman and PDE-FPC Bregman.

4.3.2 Noisy Case

We now consider the noisy case (right image in Figure 2); the experimental results are summarized in Table 3. Since there is noise in $f$, the error, taken to be $\|u^k - \bar{u}\| / \|\bar{u}\|$, generally decreases first and then increases, and $u^k$ eventually converges to something noisy. Therefore, in order to reflect the best performance of both algorithms, we record the number of iterations and the computation time that correspond to the smallest error attained for each given set of parameters. From Table 3 we can see that, for each given $\mu$, although PDE-FPC Bregman may not always be faster than FPC Bregman, it always produces a solution with a smaller error. If we consider the best set of parameters for the two algorithms, PDE-FPC Bregman outperforms FPC Bregman in terms of both computation time and accuracy.

Table 3: Comparisons of FPC Bregman (4) and PDE-FPC Bregman (21). The symbol "♠" labels the best computational time and error, respectively, for FPC Bregman and PDE-FPC Bregman.

    (delta, mu)   dt    FPC Bregman: iter / time (s) / error   PDE-FPC Bregman: iter / time (s) / error
    (2, 0.001)    40.0  2730 / 4.70 / 4.05e-01                 623 / 1.38 / 5.22e-02
5 Conclusions

In this paper, we introduced a new nonlinear evolution PDE for sparse deconvolution problems. We showed that this spatial motion PDE preserves the $\ell_1$ norm while lowering the residual $\|Au - f\|^2$. We also showed numerically that the solution of the motion PDE tends to become sparse and to converge to delta functions asymptotically. Utilizing these properties of the PDE, our proposed PDE-FPC Bregman iterations (21) outperform the original FPC Bregman iterations [1] in terms of both convergence speed and reconstruction quality, which is strongly supported by our numerical experiments. We finally note that our PDE can also be used to enhance the performance of many other iterative thresholding based algorithms, e.g. [25, 28, 39].

References

[1] W. Yin, S. Osher, D. Goldfarb, and J. Darbon. Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing. SIAM Journal on Imaging Sciences, 1(1):143–168, 2008.
[2] J.M. Mendel and C.S. Burrus. Maximum-likelihood deconvolution: a journey into model-based signal processing. 1990.

[3] M.S. O'Brien, A.N. Sinclair, and S.M. Kramer. Recovery of a sparse spike time series by L1 norm deconvolution. IEEE Transactions on Signal Processing, 42(12):3353–3365, 1994.

[4] T. Olofsson and E. Wennerstrom. Sparse deconvolution of B-scan images. IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, 54(8):1634–1641, 2007.

[5] T. Olofsson and T. Stepinski. Maximum a posteriori deconvolution of sparse ultrasonic signals using genetic optimization. Ultrasonics, 37(6):423, 1999.

[6] T. Olofsson. Semi-sparse deconvolution robust to uncertainties in the impulse responses. Ultrasonics, 42(1-9):969–975, 2004.

[7] N.P. Galatsanos, A.K. Katsaggelos, R.T. Chin, and A.D. Hillery. Least squares restoration of multichannel images. IEEE Transactions on Signal Processing, 39(10):2222–2236, 1991.

[8] W.S. Ellis, S.J. Eisenberg, D.M. Auslander, M.W. Dae, A. Zakhor, and M.D. Lesh. Deconvolution: a novel signal processing approach for determining activation time from fractionated electrograms and detecting infarcted tissue. Circulation, 94(10):2633–2640, 1996.

[9] A.J. Berkhout. Least-squares inverse filtering and wavelet deconvolution. Geophysics, 42:1369, 1977.

[10] C.L. Lawson and R.J. Hanson. Solving Least Squares Problems. Society for Industrial and Applied Mathematics, 1995.

[11] A.N. Tikhonov, V.Y. Arsenin, and F. John. Solutions of Ill-Posed Problems. V.H. Winston, Washington, DC, 1977.

[12] G.H. Golub and C.F. Van Loan. An analysis of the total least squares problem. SIAM Journal on Numerical Analysis, 17(6):883–893, 1980.

[13] S. Van Huffel and J. Vandewalle. The total least squares problem: computational aspects and analysis. SIAM, 1991.

[14] H.L. Taylor, S.C. Banks, and J.F. McCoy. Deconvolution with the ℓ1 norm. Geophysics, 44(1):39–52, 1979.

[15] H.W.J. Debeye and P. van Riel. Lp-norm deconvolution. Geophysical Prospecting, 38(4):381–403, 1990.

[16] E.J. Candès, J.K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 2006.

[17] E.J. Candès, J.K. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.

[18] D.L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.

[19] S.S. Chen, D.L. Donoho, and M.A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43(1):129–159, 2001.

[20] A.M. Bruckstein, M. Elad, and M. Zibulevsky. On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations. IEEE Transactions on Information Theory, 54(11):4813–4820, 2008.

[21] E. Candès and J. Romberg. ℓ1-magic. 2007.

[22] K. Koh, S.J. Kim, and S. Boyd. Solver for ℓ1-regularized least squares problems.

[23] M. Lustig, D. Donoho, and J.M. Pauly. Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.

[24] J. Darbon and S. Osher. Fast discrete optimizations for sparse approximations and deconvolutions. Preprint, 2007.

[25] S. Osher, Y. Mao, B. Dong, and W. Yin. Fast linearized Bregman iteration for compressive sensing and sparse denoising. Communications in Mathematical Sciences, 2009. To appear.

[26] J.-F. Cai, S. Osher, and Z. Shen. Linearized Bregman iterations for compressed sensing. Mathematics of Computation, 78:1515–1536, 2009.

[27] J.-F. Cai, S. Osher, and Z. Shen. Convergence of the linearized Bregman iteration for ℓ1-norm minimization. UCLA CAM Report 08-52, 2008.

[28] X. Zhang, M. Burger, X. Bresson, and S. Osher. Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. UCLA CAM Report 09-03, 2009.

[29] S. Becker, J. Bobin, and E. Candès. NESTA: a fast and accurate first-order method for sparse recovery. arXiv preprint arXiv:0904.3367, 2009.

[30] E. Hale, W. Yin, and Y. Zhang. A fixed-point continuation method for ℓ1-regularization with application to compressed sensing. CAAM Technical Report TR07-07, Rice University, Houston, TX, 2007.

[31] M.A.T. Figueiredo, R.D. Nowak, and S.J. Wright. Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing, 1(4):586–597, 2007.

[32] E. van den Berg and M.P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2):890–912, 2008.

[33] L.M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967.

[34] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation, 4(2):460–489, 2005.

[35] T.-C. Chang, L. He, and T. Fang. MR image reconstruction from sparse radial samples using Bregman iteration. In Proceedings of the 13th Annual Meeting of ISMRM, 2006.

[36] S.J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. A method for large-scale ℓ1-regularized least squares problems with applications in signal processing and statistics. Preprint, 2007.

[37] D.L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck. Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. Preprint, 2006.

[38] Y. Nesterov. Gradient methods for minimizing composite objective function. Technical Report 76, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, 2007.

[39] M.A.T. Figueiredo and R.D. Nowak. An EM algorithm for wavelet-based image restoration. IEEE Transactions on Image Processing, 12(8):906–916, 2003.

[40] D.L. Donoho. De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3):613–627, 1995.

[41] J.L. Vazquez. The Porous Medium Equation: Mathematical Theory. Oxford University Press, 2007.

[42] F. Otto. The geometry of dissipative evolution equations: the porous medium equation. Communications in Partial Differential Equations, 26(1-2):101–174, 2001.

[43] F. Otto and M. Westdickenberg. Eulerian calculus for the contraction in the Wasserstein distance. SIAM Journal on Mathematical Analysis, 37(4):1227–1255, 2006.

[44] C. Villani. Topics in Optimal Transportation. American Mathematical Society, 2003.

[45] C. Villani. Optimal transportation, dissipative PDE's and functional inequalities. 2003.

[46] Y. Rubner, C. Tomasi, and L.J. Guibas. A metric for distributions with applications to image databases. In Sixth International Conference on Computer Vision, pages 59–66, 1998.

[47] S. Haker, L. Zhu, A. Tannenbaum, and S. Angenent. Optimal mass transport for registration and warping. International Journal of Computer Vision, 60(3):225–240, 2004.

[48] K. Ni, X. Bresson, T.F. Chan, and S. Esedoglu. Local histogram based segmentation using the Wasserstein distance.