Fast Linearized Bregman Iteration for Compressive Sensing and Sparse Denoising
Stanley Osher∗   Yu Mao†   Bin Dong‡   Wotao Yin§

December 11, 2008

∗ Department of Mathematics, UCLA, Los Angeles, CA 90095 ([email protected]). This author's research was supported by ONR Grant N000140710810, a grant from the Department of Defense, and NIH Grant UH54RR021813.
† Department of Mathematics, UCLA, Los Angeles, CA 90095 ([email protected]). This author's research was supported by NIH Grant UH54RR021813.
‡ Department of Mathematics, UCLA, Los Angeles, CA 90095 ([email protected]). This author's research was supported by NIH Grant UH54RR021813.
§ Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005 ([email protected]). This author's research was supported by NSF Grant DMS-0748839 and an internal faculty research grant from the Dean of Engineering at Rice University.
Abstract
We propose and analyze an extremely fast, efficient and simple method for solving the problem

$$\min \{ \|u\|_1 : Au = f, \ u \in \mathbb{R}^n \}.$$

This method was first described in [1], with more details in [2] and rigorous theory given in [3] and [4]. The motivation was compressive sensing, which now has a vast and exciting history, and which seems to have started with Candes et al. [5] and Donoho [6]. See [2], [3] and [4] for a large set of references. Our method introduces an improvement, called "kicking", of the very efficient method of [1], [2], and also applies it to the problem of denoising of undersampled signals. The use of Bregman iteration for denoising of images began in [7] and led to improved results for total variation based methods. Here we apply it to denoise signals, especially essentially sparse signals, which might even be undersampled.
1 Introduction

Let $A \in \mathbb{R}^{m \times n}$, with $n > m$, and $f \in \mathbb{R}^m$ be given. The aim of a basis pursuit problem is to find $u \in \mathbb{R}^n$ by solving the constrained minimization problem

$$\min_{u \in \mathbb{R}^n} \{ J(u) : Au = f \}, \qquad (1.1)$$

where $J(u)$ is a continuous convex function. For basis pursuit, we take

$$J(u) = |u|_1 = \sum_{j=1}^{n} |u_j|. \qquad (1.2)$$

We assume that $AA^T$ is invertible. Thus $Au = f$ is underdetermined and has at least one solution, $u = A^T (AA^T)^{-1} f$, which minimizes the $\ell_2$ norm. We also assume that $J(u)$ is coercive, i.e., $J(u) \to \infty$ whenever $\|u\| \to \infty$. This implies that the set of all solutions of (1.1) is nonempty and convex. Finally, when $J(u)$ is strictly or strongly convex, the solution of (1.1) is unique.

Basis pursuit arises from many applications. In particular, there has been a recent burst of research in compressive sensing, which involves solving (1.1), (1.2). This was led by Candes et al. [5], Donoho [6], and others; see [2], [3] and [4] for extensive references. Compressive sensing guarantees, under appropriate circumstances, that the solution to (1.1), (1.2) gives the sparsest solution satisfying $Au = f$. The problem (1.1), (1.2) can in principle be solved by conventional linear programming, but generic solvers are not tailored for the large-scale matrices $A$ and the sparse solutions $u$ that arise here. To overcome this, a linearized Bregman iterative procedure was proposed in [1] and analyzed in [2], [3] and [4]. In [2], true, nonlinear Bregman iteration was also used quite successfully for this problem.

Bregman iteration applied to (1.1), (1.2) solves the constrained optimization problem through solving a small number of unconstrained optimization problems

$$\min_{u} \Big\{ \mu |u|_1 + \frac{1}{2} \|Au - f\|^2 \Big\} \qquad (1.3)$$

for $\mu > 0$. Bregman iteration was originally used for image restoration, where $J$ is the total variation, defined for $n \times n$ pixel images as

$$TV(u) = \sum_{i,j=1}^{n-1} \sqrt{ (u_{i+1,j} - u_{ij})^2 + (u_{i,j+1} - u_{ij})^2 }. \qquad (1.4)$$

We usually regard the success of the ROF TV based model [9],

$$\min_{u} \Big\{ TV(u) + \lambda \|f - u\|^2 \Big\} \qquad (1.5)$$

(we now drop the subscript 2 for the $L_2$ norm throughout the paper), as due to the fact that images have edges and in fact are almost piecewise constant (with texture added). Therefore, it is not surprising that sparse signals can be denoised using (1.3). The ROF denoising model was greatly improved in [7] and [10] with the help of Bregman iterative regularization. We will do the same thing here, using Bregman iteration with (1.3) to denoise sparse signals, with the added touch of undersampling the noisy signals.
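For concreteness, (1.4) can be evaluated in a few lines of MATLAB. The following sketch is our own illustration (the function name tv is ours, not from the paper); it uses forward differences over the first $n-1$ rows and columns, exactly as in (1.4):

    function t = tv(u)
    % Discrete total variation (1.4) of an n-by-n image u, using forward
    % differences u(i+1,j) - u(i,j) and u(i,j+1) - u(i,j) over i,j = 1,...,n-1.
    ux = u(2:end, 1:end-1) - u(1:end-1, 1:end-1);
    uy = u(1:end-1, 2:end) - u(1:end-1, 1:end-1);
    t  = sum(sum(sqrt(ux.^2 + uy.^2)));
    end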
The paper is organized as follows. In section 2 we describe Bregman iterative algorithms, as well as the linearized version; we motivate these methods and describe previously obtained theoretical results. In section 3 we prove some simple convergence properties of the linearized iteration. In section 4 we introduce an improvement of the linearized version, called "kicking", which greatly speeds up the method, especially for solutions $u$ with a large dynamic range. In section 5 we present numerical results, including sparse recovery for $u$ having large dynamic range, and the recovery of signals in large amounts of noise.

In another work in progress [11] we apply these ideas to denoising very blurry and noisy signals remarkably well, including sparse recovery for $u$. By blurry we mean situations where $A$ is perhaps a subsampled discrete convolution matrix whose elements decay to zero with $n$, e.g., random rows of a discrete Gaussian.

2 Bregman and linearized Bregman iterations

The Bregman distance [12], based on the convex function $J$, between points $u$ and $v$, is defined by

$$D_J^p(u, v) = J(u) - J(v) - \langle p, u - v \rangle, \qquad (2.6)$$

where $p \in \partial J(v)$ is an element of the subgradient of $J$ at the point $v$. In general $D_J^p(u, v) \neq D_J^p(v, u)$ and the triangle inequality is not satisfied, so $D_J^p(u, v)$ is not a distance in the usual sense. However, it does measure the closeness between $u$ and $v$, in the sense that $D_J^p(u, v) \geq D_J^p(w, v)$ for all points $w$ on the line segment connecting $u$ and $v$.
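To make the definition concrete, here is a small worked example (ours, for illustration). Take $n = 1$ and $J(u) = |u|$, and let $v > 0$, so that $\partial J(v) = \{1\}$ and $p = 1$. Then

$$D_J^p(u, v) = |u| - |v| - (u - v) = |u| - u,$$

which equals $0$ for every $u \geq 0$ and $2|u|$ for $u < 0$. In particular, $D_J^p(u, v) = 0$ does not imply $u = v$, and exchanging the roles of $u$ and $v$ generally changes the value, confirming that $D_J^p$ is not a distance in the usual sense.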
Moreover, if $J$ is convex, $D_J^p(u, v) \geq 0$; if $J$ is strictly convex, $D_J^p(u, v) > 0$ for $u \neq v$; and if $J$ is strongly convex, there exists a constant $a > 0$ such that $D_J^p(u, v) \geq a \|u - v\|^2$.

Starting from $u^0 = p^0 = 0$, the Bregman iteration is defined by

$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \Big\{ J(u) - J(u^k) - \langle u - u^k, p^k \rangle + \frac{1}{2} \|Au - f\|^2 \Big\}, \qquad (2.7)$$
$$p^{k+1} = p^k - A^T (A u^{k+1} - f).$$

This can be written as

$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \Big\{ D_J^{p^k}(u, u^k) + \frac{1}{2} \|Au - f\|^2 \Big\}.$$

It was proven in [2] that if $J(u) \in C(\Omega)$ and is strictly convex in $\Omega$, then $\|Au^k - f\|$ decays exponentially whenever $u^k \in \Omega$ for all $k$. Furthermore, when $u^k$ converges, its limit is a solution of (1.1). It was also proven in [2] that when $J(u) = |u|_1$, i.e., for problem (1.1), (1.2), or when $J$ is a convex function satisfying some additional conditions, the iteration (2.7) leads to a solution of (1.1) in finitely many steps.

As shown in [2] (see also [7], [10]), the Bregman iteration (2.7) can be written as

$$f^{k+1} = f^k + f - Au^k,$$
$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \Big\{ J(u) + \frac{1}{2} \|Au - f^{k+1}\|^2 \Big\}. \qquad (2.8)$$

This was referred to as "adding back the residual" in [7]. Here $f^0 = 0$, $u^0 = 0$. Thus the Bregman iteration uses solutions of the unconstrained problem

$$\min_{u \in \mathbb{R}^n} \Big\{ J(u) + \frac{1}{2} \|Au - f\|^2 \Big\} \qquad (2.9)$$

as a solver, applying this process iteratively.

Since there is generally no explicit expression for the solver of (2.7) or (2.8), we turn to iterative methods. The linearized Bregman iteration, which we will analyze, improve and use here, is generated by

$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \Big\{ J(u) - J(u^k) - \langle u - u^k, p^k \rangle + \frac{1}{2\delta} \big\| u - \big(u^k - \delta A^T (Au^k - f)\big) \big\|^2 \Big\},$$
$$p^{k+1} = p^k - \frac{1}{\delta}(u^{k+1} - u^k) - A^T (Au^k - f). \qquad (2.10)$$

In the special case considered here, where $J(u) = \mu |u|_1$, we have the two-line algorithm

$$v^{k+1} = v^k - A^T (Au^k - f), \qquad (2.11)$$
$$u^{k+1} = \delta \cdot \mathrm{shrink}(v^{k+1}, \mu), \qquad (2.12)$$

where $v^k$ is an auxiliary variable,

$$v^k = p^k + \frac{1}{\delta} u^k, \qquad (2.13)$$

and

$$\mathrm{shrink}(x, \mu) := \begin{cases} x - \mu, & x > \mu, \\ 0, & -\mu \leq x \leq \mu, \\ x + \mu, & x < -\mu \end{cases}$$

is the soft thresholding operator [13], applied componentwise.

This linearized Bregman iterative algorithm was invented in [1] and used and analyzed in [2], [3] and [4]. In fact it comes from the inner-outer iteration for (2.7): in [2] it was shown that the linearized Bregman iteration (2.10) is just one step of the inner iteration for each outer iteration. Here we repeat the argument from [2], which begins by summing the second equation in (2.10), arriving at (using the fact that $u^0 = p^0 = 0$)

$$p^k + \frac{1}{\delta} u^k + \sum_{j=0}^{k-1} A^T (Au^j - f) = p^k + \frac{1}{\delta} u^k - v^k = 0, \qquad k = 1, 2, \ldots.$$

It follows that (2.10) can be written as

$$u^{k+1} = \arg\min_{u \in \mathbb{R}^n} \Big\{ J(u) + \frac{1}{2\delta} \| u - \delta v^{k+1} \|^2 \Big\}, \qquad (2.14)$$

i.e., we are adding back the "linearized noise", where $v^{k+1}$ is defined in (2.11).
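In MATLAB, one pass of (2.11), (2.12) is literally two lines. The sketch below is our own illustration (not code from [1]-[4]); it assumes A, f, mu, delta and the current iterates u, v are already in scope:

    % componentwise soft thresholding, cf. the definition of shrink above
    shrink = @(x, mu) sign(x) .* max(abs(x) - mu, 0);

    v = v - A' * (A*u - f);       % (2.11): accumulate the residual correction
    u = delta * shrink(v, mu);    % (2.12): scaled soft thresholding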
In [2] and [3] some interesting analysis was done for (2.10) (and some for (2.14)). This was done first for $J(u)$ continuously differentiable in (2.10), with the gradient $\partial J(u)$ satisfying

$$\| \partial J(u) - \partial J(v) \|^2 \leq \beta \, \langle \partial J(u) - \partial J(v), \ u - v \rangle \qquad (2.15)$$

for all $u, v \in \mathbb{R}^n$ and some $\beta > 0$. In [3] it was shown that, if (2.15) is true, then both of the sequences $(u^k)_{k \in \mathbb{N}}$ and $(p^k)_{k \in \mathbb{N}}$ defined by (2.10) converge for $0 < \delta < 1/\|AA^T\|$.

In [4] the authors recently gave a theoretical analysis showing that the iteration (2.11), (2.12) converges to the unique solution of

$$\min_{u \in \mathbb{R}^n} \Big\{ \mu |u|_1 + \frac{1}{2\delta} \|u\|^2 : Au = f \Big\}. \qquad (2.16)$$

They also show the following interesting result: let $S$ be the set of all solutions of the basis pursuit problem (1.1), (1.2), and let

$$\hat{u} = \arg\min_{u \in S} \|u\|, \qquad (2.17)$$

which is unique. Denote the solution of (2.16) by $u^*_\mu$. Then

$$\lim_{\mu \to \infty} \| u^*_\mu - \hat{u} \| = 0. \qquad (2.18)$$

In passing they show that

$$\|u^*_\mu\| \leq \|\hat{u}\| \quad \text{for all } \mu > 0. \qquad (2.19)$$

We now turn to denoising. For $\{u^k\}, \{p^k\}$ satisfying (2.7), with $J$ continuous and convex, we have, for any $\tilde{u}$,

$$D_J^{p^k}(\tilde{u}, u^k) - D_J^{p^{k-1}}(\tilde{u}, u^{k-1}) \leq \langle A\tilde{u} - f, \ Au^{k-1} - f \rangle - \|Au^{k-1} - f\|^2. \qquad (2.20)$$

Besides implying that the Bregman distance between $u^k$ and any element $\tilde{u}$ satisfying $A\tilde{u} = f$ is monotonically decreasing, this also implies that, if $\tilde{u}$ is the "noise free" approximation to the solution of (1.1), the Bregman distance between $u^k$ and $\tilde{u}$ diminishes as long as

$$\|Au^k - f\| > \|A\tilde{u} - f\| = \sigma, \qquad (2.21)$$

where $\sigma$ is some measure of the noise; i.e., until we get too close to the noisy signal in the sense of (2.21). Note that in [7] we took $A$ to be the identity, but these more general results are also proven there. This gives us a stopping criterion for our denoising algorithm.

In [7] we obtained a result for the linearized Bregman iteration, following [15], which states that the Bregman distance between $\tilde{u}$ and $u^k$ diminishes as long as

$$\|A\tilde{u} - f\| < (1 - \delta \|AA^T\|) \, \|Au^k - f\|, \qquad (2.22)$$

so we need $0 < \delta \|AA^T\| < 1$.

3 Properties of the linearized Bregman iteration

We begin with the following simple results for the linearized Bregman iteration (2.11), (2.12), or the equivalent algorithm (2.10).
Theorem 3.1. If $u^k \to u^\infty$, then $Au^\infty = f$.

Proof. Assume $Au^\infty \neq f$. Then $A^T (Au^\infty - f) \neq 0$ since $A^T$ has full rank. This means that for some $i$, $(A^T (Au^k - f))_i$ converges to a nonzero value, and hence $v^{k+1}_i - v^k_i$ does as well. On the other hand, $\{v^k\} = \{u^k/\delta + p^k\}$ is bounded, since $\{u^k\}$ converges and $p^k_i \in [-\mu, \mu]$. Therefore $\{v^k_i\}$ is bounded while $v^{k+1}_i - v^k_i$ converges to a nonzero limit, which is impossible. □

Theorem 3.2. If $u^k \to u^\infty$ and $v^k \to v^\infty$, then $u^\infty$ minimizes $\{ J(u) + \frac{1}{2\delta} \|u\|^2 : Au = f \}$.

Proof. Let $\tilde{J}(u) = J(u) + \frac{1}{2\delta} \|u\|^2$; then

$$\partial \tilde{J}(u) = \partial J(u) + \frac{1}{\delta} u.$$

Since $\partial J(u^k) \ni p^k = v^k - u^k/\delta$, we have $v^k \in \partial \tilde{J}(u^k)$. Using the non-negativity of the Bregman distance we obtain

$$\tilde{J}(u^k) \leq \tilde{J}(u_{\mathrm{opt}}) - \langle u_{\mathrm{opt}} - u^k, \partial \tilde{J}(u^k) \rangle = \tilde{J}(u_{\mathrm{opt}}) - \langle u_{\mathrm{opt}} - u^k, v^k \rangle,$$

where $u_{\mathrm{opt}}$ minimizes (1.1) with $J$ replaced by $\tilde{J}$, which is strictly convex. Letting $k \to \infty$, we have

$$\tilde{J}(u^\infty) \leq \tilde{J}(u_{\mathrm{opt}}) - \langle u_{\mathrm{opt}} - u^\infty, v^\infty \rangle.$$

Since $v^k = \sum_{j=0}^{k-1} A^T (f - Au^j)$, we have $v^\infty \in \mathrm{range}(A^T)$. Since $Au_{\mathrm{opt}} = Au^\infty = f$, we have $\langle u_{\mathrm{opt}} - u^\infty, v^\infty \rangle = 0$, which implies $\tilde{J}(u^\infty) \leq \tilde{J}(u_{\mathrm{opt}})$. □

Equation (2.16) (from a result in [3]) implies that $u^\infty$ will approach a solution of (1.1), (1.2) as $\mu$ approaches $\infty$.

The linearized Bregman iteration has the following monotonicity property:

Theorem 3.3. If $u^{k+1} \neq u^k$ and $0 < \delta < 1/\|AA^T\|$, then $\|Au^{k+1} - f\| < \|Au^k - f\|$.

Proof.
Let $\Delta u^k = u^{k+1} - u^k$ and $\Delta v^k = v^{k+1} - v^k$. The shrinkage operation is such that

$$\Delta u^k_i = \delta \, q^k_i \, \Delta v^k_i, \qquad (3.23)$$

with

$$q^k_i \begin{cases} = 1, & \text{if } u^{k+1}_i u^k_i > 0, \\ = 0, & \text{if } u^{k+1}_i = u^k_i = 0, \\ \in (0, 1], & \text{otherwise.} \end{cases}$$

Let $Q^k = \mathrm{Diag}(q^k_i)$. Then (3.23) can be written as

$$\Delta u^k = \delta Q^k \Delta v^k = \delta Q^k A^T (f - Au^k), \qquad (3.24)$$

which implies

$$Au^{k+1} - f = (I - \delta A Q^k A^T)(Au^k - f). \qquad (3.25)$$

From (3.23), $Q^k$ is diagonal with $0 \preceq Q^k \preceq I$, so $0 \preceq A Q^k A^T \preceq AA^T$. If we choose $\delta > 0$ such that $\delta AA^T \prec I$, then $0 \preceq \delta A Q^k A^T \prec I$, or

$$-I \prec I - \delta A Q^k A^T \preceq I,$$

which implies that $\|Au^k - f\|$ is not increasing. To get the strict decrease, it remains to show that $A Q^k A^T (Au^k - f) = 0$ is impossible if $u^{k+1} \neq u^k$. Suppose $A Q^k A^T (Au^k - f) = 0$ holds. Then from (3.24) we have

$$\langle \Delta u^k, \Delta v^k \rangle = \delta \, \langle A^T (f - Au^k), \ Q^k A^T (f - Au^k) \rangle = 0.$$

By (3.23), this only happens if $u^{k+1}_i = u^k_i$ for all $i$, which is a contradiction. □

We are still faced with estimating how fast the residual decays. It turns out that if consecutive elements of $u^k$ do not change sign, then $\|Au^k - f\|$ decays exponentially. By "exponentially" we mean that the ratio of the residuals of two consecutive iterations converges to a constant; this type of convergence is sometimes called linear convergence. Here we define

$$S_u = \{ x \in \mathbb{R}^n : \mathrm{sign}(x_i) = \mathrm{sign}(u_i), \ \forall i \} \qquad (3.26)$$

(where $\mathrm{sign}(0) = 0$ and $\mathrm{sign}(a) = a/|a|$ for $a \neq 0$). Then we have the following:

Theorem 3.4. If $u^k \in S \equiv S_{u^{T_1}}$ for $k \in [T_1, T_2]$, then over this range $u^k$ converges to $u^*$, where $u^* \in \arg\min \{ \|Au - f\| : u \in S \}$, and $\|Au^k - f\|$ decays to $\|Au^* - f\|$ exponentially.

Proof. Since $u^k \in S$ for $k \in [T_1, T_2]$, we can define $Q \equiv Q^k$ for $T_1 \leq k \leq T_2 - 1$. From (3.23) we see that $Q^k$ is a diagonal matrix consisting of zeros and ones, so $Q = Q^T Q$. Moreover, it is easy to see that $S = \{ x : Qx = x \}$. Following the argument in Theorem 3.3, we have

$$u^{k+1} - u^k = \Delta u^k = \delta Q \Delta v^k = \delta Q A^T (f - Au^k), \qquad (3.27)$$
$$Au^{k+1} - f = [I - \delta A Q A^T](Au^k - f), \qquad (3.28)$$

and

$$-I \prec I - \delta A Q A^T \preceq I.$$

Let $\mathbb{R}^m = V_0 \oplus V_1$, where $V_0$ is the null space of $A Q A^T$ and $V_1$ is spanned by the eigenvectors corresponding to the nonzero eigenvalues of $A Q A^T$. Let $Au^k - f = w^{k,0} + w^{k,1}$, where $w^{k,j} \in V_j$ for $j = 0, 1$. From (3.28) we have

$$w^{k+1,0} = w^{k,0}, \qquad w^{k+1,1} = [I - \delta A Q A^T] \, w^{k,1}$$

for $T_1 \leq k \leq T_2 - 1$. Since $w^{k,1}$ is not in the null space of $A Q A^T$, (3.27) and (3.28) imply that $\|w^{k,1}\|$ decays exponentially. Let $w^0 = w^{k,0}$; then $A Q A^T w^0 = 0$ gives $\|Q A^T w^0\|^2 = \langle w^0, A Q^T Q A^T w^0 \rangle = 0$, hence $Q A^T w^0 = 0$. Therefore, from (3.27) we have

$$\Delta u^k = \delta Q A^T (f - Au^k) = -\delta Q A^T (w^0 + w^{k,1}) = -\delta Q A^T w^{k,1}.$$

Thus $\|\Delta u^k\|$ decays exponentially. This means $\{u^k\}$ forms a Cauchy sequence in $S$, so it has a limit $u^* \in S$. Moreover,

$$Au^* - f = \lim_k (Au^k - f) = \lim_k w^{k,0} + \lim_k w^{k,1} = w^0.$$

Since $V_0$ and $V_1$ are orthogonal,

$$\|Au^k - f\|^2 = \|w^{k,0}\|^2 + \|w^{k,1}\|^2 = \|Au^* - f\|^2 + \|w^{k,1}\|^2,$$

so $\|Au^k - f\|^2 - \|Au^* - f\|^2$ decays exponentially.

The only thing left to show is that $u^* = \arg\min \{ \|Au - f\| : u \in S \} = \arg\min \{ \|Au - f\| : Qu = u \}$. This is equivalent to saying that $A^T (Au^* - f)$ is orthogonal to the subspace $\{ u : Qu = u \}$. Since $Q$ is a projection operator, a vector $v$ is orthogonal to $\{ u : Qu = u \}$ if and only if $Qv = 0$; thus we need to show $Q A^T (Au^* - f) = 0$. This is immediate because we have shown that $Au^* - f = w^0$ and $Q A^T w^0 = 0$. So $u^*$ is the desired minimizer. □

An element of $u$ will change if and only if the corresponding element of $v$ crosses the boundary of the interval $[-\mu, \mu]$. If $\mu$ is relatively large compared with the size of $\Delta v$ (which is usually the case when applying the algorithm to a compressed sensing problem), then at most iterations the signs of the elements of $u$ stay unchanged, i.e., $u$ stays in the subspace $S_u$ defined in (3.26) for a long while. Theorem 3.4 tells us that in this scenario $u$ quickly converges to the point $u^*$ that minimizes $\|Au - f\|$ inside $S_u$, and the difference between $\|Au - f\|$ and $\|Au^* - f\|$ decays exponentially. After $u$ converges to $u^*$, it stays there until the sign of some element of $u$ changes; usually this means that a new nonzero element of $u$ comes up. After that, $u$ enters a different subspace $S$ and a new convergence process begins.

The phenomenon described above can be observed clearly in Fig. 1. The final solution $u$ contains five non-zero spikes. Each time a new spike appears, $u$ converges rapidly to the position that minimizes $\|Au - f\|$ in the subspace $S_u$. After that there is a long stagnation, during which $u$ simply waits until the accumulating $v$ brings out a new non-zero element of $u$. The larger $\mu$ is, the longer the stagnation takes. Although the convergence of the residual during each phase is fast, the total speed of convergence suffers greatly from the stagnation. The solution of this problem is described in the next section.

Figure 1: The left figure presents a simple signal with 5 non-zero spikes. The right figure shows how the linearized Bregman iteration converges (log of residual vs. iteration).
4 Kicking

The iterative formula in Algorithm 1 below is the basic linearized Bregman algorithm designed to solve (1.1), (1.2).
Algorithm 1 Bregman Iterative Regularization

    Initialize: u = 0, v = 0.
    while "‖f − Au^k‖ not converged" do
        v^{k+1} = v^k + A^T(f − Au^k)
        u^{k+1} = δ · shrink(v^{k+1}, µ)
    end while

This is an extremely concise algorithm, simple to program, involving only matrix multiplication and shrinkage. When $A$ consists of rows of a fast transform matrix such as the FFT, which is a common case in compressed sensing, it is even faster, because the matrix multiplications can be implemented efficiently using existing fast code for the transform. Storage also becomes a less serious issue.
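As a minimal runnable illustration, the following MATLAB script applies Algorithm 1 to a small random Gaussian instance. The sizes, the value mu = 1, the step size delta, and the 1e-5 tolerance are our own choices for this sketch, not prescriptions of the algorithm:

    % Algorithm 1 on a random compressed sensing instance
    m = 100; n = 400; k = 10;                  % sizes and sparsity
    A = randn(m, n);                           % Gaussian measurement matrix
    ubar = zeros(n, 1);                        % true sparse signal
    p = randperm(n);
    ubar(p(1:k)) = 2*rand(k, 1) - 1;           % k nonzeros from U(-1,1)
    f = A * ubar;

    mu = 1;
    delta = 0.5 / norm(A*A');                  % ensures 0 < delta*||AA'|| < 1
    shrink = @(x, th) sign(x) .* max(abs(x) - th, 0);

    u = zeros(n, 1); v = zeros(n, 1); it = 0;
    while norm(A*u - f) / norm(f) > 1e-5 && it < 50000
        v = v + A' * (f - A*u);                % v-update
        u = delta * shrink(v, mu);             % u-update
        it = it + 1;
    end
    fprintf('iterations: %d, relative error: %.2e\n', ...
            it, norm(u - ubar) / norm(ubar));

Plotting log(norm(A*u - f)/norm(f)) against the iteration count reproduces the staircase behavior of Fig. 1: fast decay within each sign pattern, separated by long stagnations.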
As observed above, during a stagnation $u$ converges to a limit $u^*$, so we will have $u^{k+1} \approx u^{k+2} \approx \cdots \approx u^{k+m} \approx u^*$ for some $m$. Therefore the increment of $v$ in each step, $A^T(f - Au)$, is fixed. This implies that during the stagnation $u$ and $v$ can be calculated explicitly as follows:

$$u^{k+j} \equiv u^{k+1}, \qquad v^{k+j} = v^k + j \cdot A^T (f - Au^{k+1}), \qquad j = 1, \ldots, m. \qquad (4.29)$$

If we denote the set of indices of the zero elements of $u^*$ by $I_0$, and let $I_1 = I_0^c$ be the support of $u^*$, then $v^k_i$ keeps changing only for $i \in I_0$, and the iteration can be formulated entry-wise as

$$u^{k+j}_i \equiv u^{k+1}_i \quad \forall i, \qquad v^{k+j}_i = v^k_i + j \cdot \big(A^T (f - Au^{k+1})\big)_i \quad i \in I_0, \qquad v^{k+j}_i \equiv v^{k+1}_i \quad i \in I_1, \qquad (4.30)$$

for $j = 1, \ldots, m$. The stagnation ends when $u$ begins to change again. This happens if and only if some element of $v$ in $I_0$ (which keeps changing during the stagnation) crosses the boundary of the interval $[-\mu, \mu]$. When $i \in I_0$ we have $v^k_i \in [-\mu, \mu]$, so from (4.30) we can estimate the number of steps needed for $v^k_i$ to cross the boundary, for each $i \in I_0$:

$$s_i = \left\lceil \frac{ \mu \cdot \mathrm{sign}\big( (A^T(f - Au^{k+1}))_i \big) - v^{k+1}_i }{ (A^T(f - Au^{k+1}))_i } \right\rceil, \qquad \forall i \in I_0, \qquad (4.31)$$

and

$$s = \min_{i \in I_0} \{ s_i \} \qquad (4.32)$$

is the number of steps needed; that is, $s$ is exactly the length of the stagnation. Using (4.29), we can predict the end state of the stagnation:

$$u^{k+s} \equiv u^{k+1}, \qquad v^{k+s} = v^k + s \cdot A^T (f - Au^{k+1}). \qquad (4.33)$$

Therefore, we can "kick" $u$ to the critical point of the stagnation as soon as we detect that $u$ has stayed unchanged for a while. Specifically, we have the following algorithm:
Algorithm 2 Linearized Bregman Iteration with Kicking

    Initialize: u = 0, v = 0.
    while "‖f − Au^k‖ not converged" do
        if "u^{k−1} ≈ u^k" then
            calculate s from (4.31) and (4.32)
            v^{k+1}_i = v^k_i + s · (A^T(f − Au^k))_i, ∀ i ∈ I_0
            v^{k+1}_i = v^k_i, ∀ i ∈ I_1
        else
            v^{k+1} = v^k + A^T(f − Au^k)
        end if
        u^{k+1} = δ · shrink(v^{k+1}, µ)
    end while

Indeed, this kicking procedure is similar to the line search commonly used in optimization; it does not modify the initial algorithm in any essential way, but simply accelerates it. More precisely, the output sequence $\{u^k, v^k\}$ is a subsequence of the original one, so all the previous theoretical conclusions on convergence still hold here.

An example of the algorithm is shown in Fig. 2. It is clear that all the stagnations in the original convergence collapse to single steps. The total amount of computation is reduced dramatically.

Figure 2: The left figure presents the convergence curve (log of residual vs. iteration) of the original linearized Bregman iteration using the same signal as Fig. 1. The right figure shows the convergence curve of the linearized Bregman iteration with the kicking modification.
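The kicking branch of Algorithm 2 is only a few lines in MATLAB. The following sketch (our own illustration) computes s as in (4.31), (4.32) and applies the jump (4.33); it assumes u, v, A, f, mu, delta and shrink from the earlier sketches are in scope, and that a stagnation has just been detected:

    w  = A' * (f - A*u);                    % fixed increment of v during stagnation
    I0 = (u == 0);                          % zero set of u, cf. (4.30)
    si = ceil((mu * sign(w(I0)) - v(I0)) ./ w(I0));   % steps to exit [-mu,mu], cf. (4.31)
    si = si(isfinite(si) & si > 0);         % guard against zero increments
    s  = min(si);                           % length of the stagnation, cf. (4.32)
    v(I0) = v(I0) + s * w(I0);              % kick v to the end of the stagnation, cf. (4.33)
    u = delta * shrink(v, mu);              % resume the ordinary iteration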
5 Numerical results

In this section, we demonstrate the effectiveness of the algorithm (with kicking) in solving basis pursuit and some related problems.
5.1 Basis pursuit

Consider the constrained minimization problem

$$\min |u|_1 \quad \text{s.t.} \quad Au = f,$$

where the constraints $Au = f$ are underdetermined linear equations with $A$ an $m \times n$ matrix, and $f$ is generated from a sparse signal $\bar{u}$ that has a number of nonzeros $\kappa < m$.

Our numerical experiments use two types of $A$ matrices: Gaussian matrices, whose elements were generated from i.i.d. normal distributions $N(0, 1)$ (randn(m,n) in MATLAB), and partial discrete cosine transform (DCT) matrices, whose $m$ rows were chosen randomly from the $n \times n$ DCT matrix. These matrices are known to be efficient for compressed sensing. The number of rows $m$ is chosen as $m \sim \kappa \log(n/\kappa)$ for Gaussian matrices and $m \sim \kappa \log n$ for DCT matrices (following [5]).

The tested original sparse signals $\bar{u}$ had numbers of nonzeros equal to $0.05n$ and $0.02n$, rounded to the nearest integers, in two sets of experiments; these were obtained by round(0.05*n) and round(0.02*n) in MATLAB, respectively. Given a sparsity $\|\bar{u}\|_0$, i.e., the number of nonzeros, an original sparse signal $\bar{u} \in \mathbb{R}^n$ was generated by randomly selecting the locations of the $\|\bar{u}\|_0$ nonzeros and sampling each of these nonzero elements from $U(-1, 1)$.
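In MATLAB, a test instance of this kind can be generated as follows (a sketch consistent with the commands quoted above; the sizes are our own example values, and the explicit DCT matrix at the end is for illustration only, since the actual experiments use implicit dct/idct operators):

    n = 2000; k = round(0.02*n); m = 400;   % example sizes, m ~ kappa*log(n/kappa)
    A = randn(m, n);                        % Gaussian measurement matrix
    ubar = zeros(n, 1);
    p = randperm(n);
    ubar(p(1:k)) = 2*rand(k, 1) - 1;        % nonzeros at random locations, from U(-1,1)
    f = A * ubar;

    % Partial DCT alternative: m rows of the n-by-n DCT matrix chosen at random
    rows = randperm(n); rows = rows(1:m);
    D = dct(eye(n));                        % full DCT matrix (small n only)
    Adct = D(rows, :);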
Then $f$ was computed as $A\bar{u}$. When $\|\bar{u}\|_0$ is small enough, we expect the basis pursuit problem, which we solved using our fast algorithm, to yield a solution $u^* = \bar{u}$ from the inputs $A$ and $f$.

Note that partial DCT matrices are implicitly stored fast transforms, for which the matrix-vector multiplications $Ax$ and $A^T x$ were computed by the MATLAB commands dct(x) and idct(x), respectively. Therefore, we were able to test partial DCT matrices of much larger sizes than Gaussian matrices. The sizes $m$-by-$n$ of these matrices are given in the first two columns of Table 1.

Our code was written in MATLAB and was run on a Windows PC with an Intel(R) Core(TM) 2 Duo 2.0GHz CPU and 2GB memory. The MATLAB version is 7.4. The set of computational results given in Table 1 was obtained using the stopping criterion

$$\frac{\|Au^k - f\|}{\|f\|} < \text{tol} \qquad (5.34)$$

for a small tolerance tol, and the reported relative error is $\|u^k - \bar{u}\| / \|\bar{u}\|$. Throughout the experiments in Table 1, we used $\mu = 1$ to ensure the correctness of the results.

Table 1: Experiment results of linearized Bregman with kicking, using 10 random instances for each configuration of $(m, n, \|\bar{u}\|_0)$, with the nonzero elements of $\bar{u}$ drawn from $U(-1, 1)$ and stopping tolerance on $\|Au^k - f\|/\|f\|$. For both Gaussian and partial DCT matrices, and for sparsity levels $\|\bar{u}\|_0 = 0.05n$ and $0.02n$, the table reports the mean, standard deviation and maximum of the stopping iteration $k$, the relative error $\|u^k - \bar{u}\| / \|\bar{u}\|$, and the CPU time in seconds.

5.2 Noisy data

In real applications, the measurement $f$ we obtain is usually contaminated by noise; the measurement we actually have is

$$\tilde{f} = f + n = A\bar{u} + n, \qquad n \sim N(0, \sigma^2).$$

To characterize the noise level, we use the SNR (signal to noise ratio) instead of $\sigma$ itself, defined as

$$\mathrm{SNR} := 20 \log_{10} \left( \frac{\|\bar{u}\|}{\|n\|} \right).$$

In this section we test our algorithm on recovering the true signal $\bar{u}$ from $A$ and the noisy measurement $\tilde{f}$. As in the last section, the nonzero entries of $\bar{u}$ are generated from $U(-1, 1)$, and $A$ is either a Gaussian random matrix or a partial DCT matrix. Our stopping criterion is

$$\mathrm{std}\big( Au^k - \tilde{f} \big) < \sigma \quad \text{or} \quad \text{Iter} \geq 1000,$$

i.e., we stop whenever the standard deviation of the residual $Au^k - \tilde{f}$ is less than $\sigma$ or the number of iterations exceeds 1000.
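With noisy data, only the stopping test of Algorithm 1 (or 2) changes. A minimal sketch of the loop with this criterion, assuming sigma is known and A, ftilde, mu, delta and shrink are in scope:

    u = zeros(size(A,2), 1); v = u; it = 0;
    while std(A*u - ftilde) >= sigma && it < 1000
        v = v + A' * (ftilde - A*u);
        u = delta * shrink(v, mu);
        it = it + 1;
    end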
Table 2 shows numerical results for different noise levels, sizes of $A$, and sparsity levels. We also show one typical result for a partial DCT matrix with size $n = 4000$ and $\|\bar{u}\|_0 = 0.02n = 80$ in Figure 3.

Figure 3: The left figure presents the clean (red dots) and noisy (blue circles) measurements, with SNR = 23.1084; the right figure shows the reconstructed signal (blue circles) vs. the original signal (red dots), where the relative error is 0.020764 and the number of iterations is 102.

Table 2: Experiment results of linearized Bregman with kicking on noisy data for each configuration of $(m, n, \|\bar{u}\|_0)$, with stopping criterion $\mathrm{std}(Au^k - \tilde{f}) < \sigma$. For both Gaussian and partial DCT matrices, the table reports the average SNR together with the mean, standard deviation and maximum of the stopping iteration $k$, the relative error $\|u^k - \bar{u}\| / \|\bar{u}\|$, and the CPU time in seconds.

5.3 Signals with high dynamic range

In this section, we test our algorithm on signals with high dynamic range. Precisely speaking, let $\mathrm{MAX} = \max \{ |\bar{u}_i| : i = 1, \ldots, n \}$ and $\mathrm{MIN} = \min \{ |\bar{u}_i| : \bar{u}_i \neq 0, \ i = 1, \ldots, n \}$. The signals we consider here have a ratio $\mathrm{MAX}/\mathrm{MIN}$ of several orders of magnitude. Our $\bar{u}$ is generated by multiplying a random number in $[0, 1]$ by another one randomly picked from a range of powers of ten. Here we adopt the stopping criterion $\|Au^k - f\| / \|f\| < \text{tol}$ for the case without noise (Figure 4), and the same stopping criterion as in the previous section for the noisy cases (Figures 5-7). In the experiments, we take the dimension $n = 4000$ and the number of nonzeros of $\bar{u}$ to be a small fraction of $n$ as before, and $\mu$ is chosen much larger than before, because the dynamic range of $\bar{u}$ is large.
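Such a high-dynamic-range signal can be generated, for instance, as follows (our own sketch; the particular range of scales, here $10^0$ through $10^5$, and the sparsity and value of mu are illustrative choices, not the exact experimental settings):

    n = 4000; k = round(0.02*n);             % example sparsity
    scales = 10.^(0:5)';                     % powers of ten spanning the dynamic range
    ubar = zeros(n, 1);
    p = randperm(n);
    ubar(p(1:k)) = rand(k, 1) .* scales(ceil(6*rand(k, 1)));
    mu = 1e5;                                % mu much larger than before (illustrative)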
Figure 4 shows results for the noise-free case, where the algorithm converges to a very small residual in fewer than 300 iterations. Figures 5-7 show the cases with noise (the noise is added in the same way as in the previous section). As one can see, the less noise contaminating the measurements, the smaller the magnitudes that are recovered well: in Figure 5 the SNR is approximately 118 and even the entries of the smallest magnitudes are well recovered; in Figure 6 the SNR is approximately 97 and the entries of moderate and large magnitudes are well recovered; and in Figure 7 the SNR is approximately 49 and only the entries of the largest magnitudes are well recovered.

Figure 4: Noise-free case (converged at iteration 298). Upper left, true signal (red dots) vs. recovered signal (blue circles); upper right, a zoom-in to the lower magnitudes; lower left, decay of the residual $\log(\|Au^k - f\|/\|f\|)$; lower right, decay of the error to the true solution $\log(\|u^k - \bar{u}\|/\|\bar{u}\|)$.

Figure 5: Noisy case with SNR = 117.8948, Iter = 181, Error = 3.4312e-007. Left, true signal (red dots) vs. recovered signal (blue circles); right, a zoom-in to the entries of small magnitude. The error is measured by $\|u^k - \bar{u}\|/\|\bar{u}\|$.

Figure 6: Noisy case with SNR = 96.9765, Iter = 148, Error = 2.6386e-006. Left, true signal (red dots) vs. recovered signal (blue circles); right, a zoom-in to the entries of small magnitude. The error is measured by $\|u^k - \bar{u}\|/\|\bar{u}\|$.

Figure 7: Noisy case with SNR = 49.289, Iter = 90, Error = 0.00067413. Left, true signal (red dots) vs. recovered signal (blue circles); right, a zoom-in to the entries of small magnitude. The error is measured by $\|u^k - \bar{u}\|/\|\bar{u}\|$.

5.4 Denoising of undersampled sinusoids

In this section we consider

$$\bar{u}(t) = a \sin(\alpha t) + b \cos(\beta t),$$

where $a, b, \alpha$ and $\beta$ are unknown. The observed signal $\tilde{u}$ is noisy and has the form $\tilde{u} = \bar{u} + n$ with $n \sim N(0, \sigma^2)$. In practice, the noise in $\tilde{u}$ can be huge, i.e., the SNR may even be negative, and we may only be able to observe partial information of $\tilde{u}$, i.e., only a subset of the values of $\tilde{u}$ is known. Notice that the signal is sparse (only four spikes) in the frequency domain. Therefore, this is essentially a compressed sensing problem, and $\ell_1$-minimization should work well here. The problem can thus be stated as reconstructing the original signal $\bar{u}$ from random samples of the observed signal $\tilde{u}$ using our fast $\ell_1$-minimization algorithm.

In our experiments, the magnitudes $a$ and $b$ are generated from $U(-1, 1)$, and $\alpha$ and $\beta$ are random multiples of $2\pi/n$, i.e., $\alpha = k_1 \cdot 2\pi/n$ and $\beta = k_2 \cdot 2\pi/n$, with the $k_i$ taken randomly from $\{0, 1, \ldots, n-1\}$, where $n$ denotes the dimension. We let $I$ be a random subset of $\{1, 2, \ldots, n\}$ and $f = \tilde{u}(I)$, and take $A$ and $A^T$ to be the partial inverse Fourier matrix and the corresponding Fourier matrix, respectively. We then run our algorithm, adopting the same stopping criterion as in section 5.2, and obtain a reconstructed signal denoted by $x$. Notice that the reconstructed signal $x$ is in the Fourier domain, not in the physical domain; thus we take an inverse Fourier transform to get the reconstructed signal in the physical domain, denoted by $u^*$. Since we know a priori that our solution should have four spikes in the Fourier domain, before taking the inverse Fourier transform we pick the four spikes with largest magnitudes and set the rest of the entries to zero.

Some numerical results are given in Figures 8-11. Our experiments show that the larger the noise level is, the more random samples we need for a reliable reconstruction, where reliable means that the correct frequencies are recovered with high probability. As for the magnitudes $a$ and $b$, our algorithm cannot guarantee to recover them exactly (as one can see in Figures 8-11). However, frequency information is much more important than magnitudes, in the sense that the reconstructed signal is less sensitive to errors in magnitudes than to errors in frequencies (see the bottom figures in Figures 8-11). On the other hand, once we recover the right frequencies, one can use hardware to estimate the magnitudes accurately.
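The following MATLAB sketch (our own illustration with made-up parameter values) sets up a small instance of this experiment with an explicit partial inverse-Fourier matrix. For simplicity it runs a fixed number of iterations rather than the stopping test of section 5.2, and it relies on the fact that MATLAB's sign(z) = z/|z| extends the shrink operator to complex entries:

    n = 256; t = (0:n-1)';
    a = 0.8; b = -0.5; k1 = 17; k2 = 53;             % illustrative magnitudes/frequencies
    ubar = a*sin(2*pi*k1*t/n) + b*cos(2*pi*k2*t/n);
    sigma = 0.5;
    utilde = ubar + sigma*randn(n, 1);               % noisy observation

    I = randperm(n); I = I(1:round(0.4*n));          % keep 40% random samples
    F = fft(eye(n)) / sqrt(n);                       % unitary Fourier matrix
    Finv = F';                                       % its inverse
    A = Finv(I, :);                                  % partial inverse-Fourier matrix
    f = utilde(I);

    mu = 1; delta = 0.5 / norm(A*A');
    shrink = @(z, th) sign(z) .* max(abs(z) - th, 0); % works for complex z in MATLAB
    x = zeros(n, 1); v = zeros(n, 1);
    for it = 1:2000                                  % fixed budget for this sketch
        v = v + A' * (f - A*x);
        x = delta * shrink(v, mu);
    end

    [srt, ord] = sort(abs(x), 'descend');            % keep the four largest spikes
    x(ord(5:end)) = 0;
    ustar = real(Finv * x);                          % reconstruction in the physical domain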
6 Conclusions

We have proposed the linearized Bregman iterative algorithm as a competitive method for solving the compressed sensing problem. Besides the simplicity of the algorithm, the special structure of the iteration enables the kicking scheme to accelerate the algorithm even when $\mu$ is extremely large. As a result, a sparse solution can always be approached efficiently.

It also turns out that our process has remarkable denoising properties for undersampled sparse signals. We will pursue this in further work.

Our results suggest there is a large class of problems that can be solved by linearized Bregman iterative algorithms. We hope that our method and its extensions will produce even more applications for problems under different scenarios, including very underdetermined inverse problems in partial differential equations.
Acknowledgments

S.O. was supported by ONR Grant N000140710810, a grant from the Department of Defense and NIH Grant UH54RR021813; Y.M. and B.D. were supported by NIH Grant UH54RR021813; W.Y. was supported by NSF Grant DMS-0748839 and an internal faculty research grant from the Dean of Engineering at Rice University.
References

[1] J. Darbon and S. Osher. Fast discrete optimizations for sparse approximations and deconvolutions. Preprint, 2007.

[2] W. Yin, S. Osher, D. Goldfarb, and J. Darbon. Bregman iterative algorithms for compressed sensing and related problems. SIAM J. Imaging Sciences, 1(1):143-168, 2008.

[3] J. Cai, S. Osher, and Z. Shen. Linearized Bregman iterations for compressed sensing. Math. Comp., 2008. To appear; see also UCLA CAM Report 08-06.

[4] J. Cai, S. Osher, and Z. Shen. Convergence of the linearized Bregman iteration for ℓ1-norm minimization. UCLA CAM Report 08-52, 2008.

[5] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489-509, 2006.

[6] D.L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52:1289-1306, 2006.

[7] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation based image restoration. Multiscale Model. Simul., 4(2):460-489, 2005.

[8] E. Hale, W. Yin, and Y. Zhang. A fixed-point continuation method for ℓ1-regularization with application to compressed sensing. CAAM Technical Report TR07-07, Rice University, Houston, TX, 2007.

[9] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Phys. D, 60:259-268, 1992.

[10] T.-C. Chang, L. He, and T. Fang. MR image reconstruction from sparse radial samples using Bregman iteration. Proceedings of the 13th Annual Meeting of ISMRM, 2006.

[11] Y. Li, S. Osher, and Y.-H. Tsai. Recovery of sparse noisy data from solutions to the heat equation. In preparation.

[12] L.M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200-217, 1967.

[13] D.L. Donoho. De-noising by soft-thresholding. IEEE Trans. Inform. Theory, 41(3):613-627, 1995.

[14] W. Yin. On the linearized Bregman algorithm. Private communication.

[15] M. Bachmayr. Iterative total variation methods for nonlinear inverse problems. Master's thesis, Johannes Kepler Universität, Linz, Austria, 2007.
Figure 8: Reconstruction using 20% random samples of $\tilde{u}$ with SNR = 2.6185. Top left shows the original and noisy waves in the physical domain; top right shows the reconstructed vs. original spectrum in the frequency domain ($|\hat{u}^*|$ vs. $|\hat{\bar{u}}|$); bottom left shows the reconstructed (blue) vs. original (red) signal in the physical domain; and bottom right shows one close-up of the figure at bottom left.
Figure 9: Reconstruction using 40% random samples of $\tilde{u}$ with SNR = -4.7836. Top left shows the original and noisy waves in the physical domain; top right shows the reconstructed vs. original spectrum in the frequency domain ($|\hat{u}^*|$ vs. $|\hat{\bar{u}}|$); bottom left shows the reconstructed (blue) vs. original (red) signal in the physical domain; and bottom right shows one close-up of the figure at bottom left.
Figure 10: Reconstruction using 60% random samples of $\tilde{u}$ with SNR = -6.7908. Top left shows the original and noisy waves in the physical domain; top right shows the reconstructed vs. original spectrum in the frequency domain ($|\hat{u}^*|$ vs. $|\hat{\bar{u}}|$); bottom left shows the reconstructed (blue) vs. original (red) signal in the physical domain; and bottom right shows one close-up of the figure at bottom left.
Figure 11: Reconstruction using 80% random samples of $\tilde{u}$ with SNR = -11.0016. Top left shows the original and noisy waves in the physical domain; top right shows the reconstructed vs. original spectrum in the frequency domain ($|\hat{u}^*|$ vs. $|\hat{\bar{u}}|$); bottom left shows the reconstructed (blue) vs. original (red) signal in the physical domain; and bottom right shows one close-up of the figure at bottom left.