Adaptive Damping and Mean Removal for the Generalized Approximate Message Passing Algorithm
Jeremy Vila⋆, Philip Schniter⋆, Sundeep Rangan†, Florent Krzakala‡, Lenka Zdeborová◦
⋆ Dept. of ECE, The Ohio State University, Columbus, OH 43202, USA.
† Dept. of ECE, Polytechnic Institute of New York University, Brooklyn, NY 11201, USA.
‡ Sorbonne Universités, UPMC Univ Paris 06 and École Normale Supérieure, 75005 Paris, France.
◦ Institut de Physique Théorique, CEA Saclay, and CNRS URA 2306, 91191 Gif-sur-Yvette, France.

This work has been supported in part by NSF grants IIP-0968910, CCF-1018368, and CCF-1218754, an allocation of computing time from the Ohio Supercomputer Center, and by the European Union's 7th Framework Programme (FP/2007-2013)/ERC Grant Agreement 307087-SPARCS.
ABSTRACT
The generalized approximate message passing (GAMP) algorithm is an efficient method of MAP or approximate-MMSE estimation of x observed from a noisy version of the transform coefficients z = Ax. In fact, for large zero-mean i.i.d. sub-Gaussian A, GAMP is characterized by a state evolution whose fixed points, when unique, are optimal. For generic A, however, GAMP may diverge. In this paper, we propose adaptive-damping and mean-removal strategies that aim to prevent divergence. Numerical results demonstrate significantly enhanced robustness to non-zero-mean, rank-deficient, column-correlated, and ill-conditioned A.

Index Terms: Approximate message passing, belief propagation, compressed sensing.
1. INTRODUCTION
Consider estimating a realization x ∈ R^N of a random vector x with statistically independent components x_n ∼ p_{x_n} from observations y = [y_m] ∈ R^M that are conditionally independent given the transform outputs

z = Ax (1)

for some known matrix A = [a_mn] ∈ R^{M×N}. Here, the likelihood function can be written as p_{y|z}(y|Ax) with separable p_{y|z}, i.e., p_{y|z}(y|z) = Π_{m=1}^M p_{y_m|z_m}(y_m|z_m). Such problems arise in a range of applications including statistical regression, inverse problems, and compressive sensing. Note that, for clarity, we use sans-serif fonts (e.g., x, x_n) to denote random quantities and serif fonts (e.g., x, x_n) to denote deterministic ones.

Assuming knowledge of the prior p_x(x) = Π_{n=1}^N p_{x_n}(x_n) and likelihood p_{y|z}(y|z), typical estimation goals are to compute the minimum mean-squared error (MMSE) estimate x̂_MMSE ≜ ∫_{R^N} x p_{x|y}(x|y) dx or the maximum a posteriori (MAP) estimate x̂_MAP ≜ arg max_x p_{x|y}(x|y) = arg min_x J_MAP(x) for the MAP cost

J_MAP(x̂) ≜ −ln p_{y|z}(y|Ax̂) − ln p_x(x̂). (2)

Recently, the generalized approximate message passing (GAMP) algorithm [1] has been proposed as a means of tackling these two problems in the case that M and N are large. Essentially, GAMP uses a high-dimensional approximation of loopy belief propagation to convert the MMSE or MAP inference problems into a sequence of tractable scalar inference problems.

GAMP is well motivated in the case that A is a realization of a large random matrix with i.i.d. zero-mean sub-Gaussian entries. For such A, in the large-system limit (i.e., M, N → ∞ for fixed M/N ∈ R+), GAMP is characterized by a state evolution whose fixed points, when unique, are MMSE or MAP optimal [1-3]. Furthermore, for generic A, it has been shown [4] that MAP-GAMP's fixed points coincide with the critical points of the cost function (2) and that MMSE-GAMP's fixed points coincide with those of a Bethe free entropy [5], as discussed in detail in Section 2.2.

For generic A, however, GAMP may not reach its fixed points, i.e., it may diverge (e.g., [6]). The convergence of GAMP has been fully characterized in [7] for the simple case that p_{x_n} and p_{y_m|z_m} are Gaussian. There, it was shown that Gaussian-GAMP converges if and only if the peak-to-average ratio of the squared singular values of A is sufficiently small. A damping modification was then proposed in [7] that guarantees the convergence of Gaussian-GAMP with arbitrary A, at the expense of a slower convergence rate. For strictly log-concave p_{x_n} and p_{y_m|z_m}, the local convergence of GAMP was also characterized in [7]. However, the global convergence of GAMP under generic A, p_{x_n}, and p_{y_m|z_m} is not yet understood.

Because of its practical importance, prior work has attempted to robustify the convergence of GAMP in the face of "difficult" A (e.g., high peak-to-average squared singular values) for generic p_{x_n} and p_{y_m|z_m}. For example, "swept" GAMP (SwAMP) [8] updates the estimates of {x_n}_{n=1}^N and {z_m}_{m=1}^M sequentially, in contrast to GAMP, which updates them in parallel. Relative to GAMP, experiments in [8] show that SwAMP is much more robust to difficult A, but it is slower and cannot facilitate fast implementations of A like an FFT. As another example, the public-domain GAMPmatlab implementation [9] has included "adaptive damping" and "mean removal" mechanisms for some time, but they have never been described in the literature.

In this paper, we detail the most recent versions of GAMPmatlab's adaptive-damping and mean-removal mechanisms, and we experimentally characterize their performance on non-zero-mean, rank-deficient, column-correlated, and ill-conditioned A matrices. Our results show improved robustness relative to SwAMP and enhanced convergence speed.
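To ground the notation, the sketch below instantiates the model (1) and the MAP cost (2) for an illustrative Bernoulli-Gaussian prior and AWGN likelihood; these distributional choices, and all variable names, are ours, since the paper's development holds for generic separable p_x and p_{y|z}.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
M, N, tau, nu_w = 500, 1000, 0.1, 1e-2

A = rng.normal(0.0, 1.0 / np.sqrt(N), (M, N))           # i.i.d. zero-mean A
x = rng.binomial(1, tau, N) * rng.normal(0.0, 1.0, N)   # Bernoulli-Gaussian x
z = A @ x                                               # transform outputs (1)
y = z + rng.normal(0.0, np.sqrt(nu_w), M)               # AWGN observations

def J_map(xhat):
    """MAP cost (2): -ln p(y | A xhat) - ln p(xhat)."""
    neg_loglik = -norm.logpdf(y, loc=A @ xhat, scale=np.sqrt(nu_w)).sum()
    # For a spike-and-slab prior, the point mass at zero makes the exact
    # density degenerate; this common surrogate scores zeros by ln(1 - tau).
    neg_logprior = -np.where(xhat == 0.0,
                             np.log(1 - tau),
                             np.log(tau) + norm.logpdf(xhat)).sum()
    return neg_loglik + neg_logprior
```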
2. ADAPTIVELY DAMPED GAMP
Damping is commonly used in loopy belief propagation to "slow down" the updates in an effort to promote convergence. (See, e.g., [10] for damping applied to the sum-product algorithm and [7, 9, 11] for damping applied to GAMP.) However, since not enough damping allows divergence while too much damping unnecessarily slows convergence, we are motivated to develop an adaptive damping scheme that applies just the right amount of damping at each iteration.

Table 1 details the proposed adaptively damped GAMP (AD-GAMP) algorithm. Lines (R3)-(R6) and (R10) use an iteration-t-dependent damping parameter β(t) ∈ (0,1] to slow the updates, and lines (R12)-(R18) adapt the parameter β(t). (The GAMPmatlab implementation [9] allows one to disable damping in (R6) and/or (R10).) When β(t) = 1 ∀t, AD-GAMP reduces to the original GAMP from [1]. Due to lack of space, we refer readers to [1, 4] for further details on GAMP.

Table 1. The adaptively damped GAMP algorithm. In lines (R1) and (R8), g′_{z_m} and g′_{x_n} denote the derivatives of g_{z_m} and g_{x_n} with respect to their first arguments.

definitions for MMSE-GAMP:
  g_{z_m}(p̂, ν^p) ≜ ∫ z f_{z_m}(z; p̂, ν^p) dz (D1)
    for f_{z_m}(z; p̂, ν^p) ≜ p_{y_m|z_m}(y_m|z) N(z; p̂, ν^p) / B_m(p̂, ν^p)
    and B_m(p̂, ν^p) ≜ ∫ p_{y_m|z_m}(y_m|z) N(z; p̂, ν^p) dz
  g_{x_n}(r̂, ν^r) ≜ ∫ x f_{x_n}(x; r̂, ν^r) dx (D2)
    for f_{x_n}(x; r̂, ν^r) ≜ p_{x_n}(x) N(x; r̂, ν^r) / C_n(r̂, ν^r)
    and C_n(r̂, ν^r) ≜ ∫ p_{x_n}(x) N(x; r̂, ν^r) dx
definitions for MAP-GAMP:
  g_{z_m}(p̂, ν^p) ≜ arg max_z [ ln p_{y_m|z_m}(y_m|z) − |z − p̂|²/(2ν^p) ] (D3)
  g_{x_n}(r̂, ν^r) ≜ arg max_x [ ln p_{x_n}(x) − |x − r̂|²/(2ν^r) ] (D4)
inputs: ∀m, n: g_{z_m}, g_{x_n}, x̂_n(1), ν^x_n(1), a_mn, T_max ≥ 1, ε ≥ 0, T_β ≥ 0, β_max ∈ (0,1], β_min ∈ [0, β_max], G_pass ≥ 1, G_fail < 1
initialize:
  ∀m: ν^p_m(1) = Σ_{n=1}^N |a_mn|² ν^x_n(1) (I1)
  ∀m: p̂_m(1) = Σ_{n=1}^N a_mn x̂_n(1) (I2)
  J(1) = ∞, β(1) = 1, t = 1 (I3)
while t ≤ T_max,
  ∀m: ν^z_m(t) = ν^p_m(t) g′_{z_m}(p̂_m(t), ν^p_m(t)) (R1)
  ∀m: ẑ_m(t) = g_{z_m}(p̂_m(t), ν^p_m(t)) (R2)
  ∀m: ν^s_m(t) = β(t) (1 − ν^z_m(t)/ν^p_m(t)) / ν^p_m(t) + (1 − β(t)) ν^s_m(t−1) (R3)
  ∀m: ŝ_m(t) = β(t) (ẑ_m(t) − p̂_m(t)) / ν^p_m(t) + (1 − β(t)) ŝ_m(t−1) (R4)
  ∀n: x̃_n(t) = β(t) x̂_n(t) + (1 − β(t)) x̃_n(t−1) (R5)
  ∀n: ν^r_n(t) = [ β(t) Σ_{m=1}^M |a_mn|² ν^s_m(t) + (1 − β(t)) / ν^r_n(t−1) ]^{−1} (R6)
  ∀n: r̂_n(t) = x̃_n(t) + ν^r_n(t) Σ_{m=1}^M a*_mn ŝ_m(t) (R7)
  ∀n: ν^x_n(t+1) = ν^r_n(t) g′_{x_n}(r̂_n(t), ν^r_n(t)) (R8)
  ∀n: x̂_n(t+1) = g_{x_n}(r̂_n(t), ν^r_n(t)) (R9)
  ∀m: ν^p_m(t+1) = β(t) Σ_{n=1}^N |a_mn|² ν^x_n(t+1) + (1 − β(t)) ν^p_m(t) (R10)
  ∀m: p̂_m(t+1) = Σ_{n=1}^N a_mn x̂_n(t+1) − ν^p_m(t+1) ŝ_m(t) (R11)
  J(t+1) = eqn (2) for MAP-GAMP or eqn (11) for MMSE-GAMP (R12)
  if J(t+1) ≤ max_{τ = max{t−T_β,1},...,t} J(τ) or β(t) = β_min (R13)
    then if ‖x̂(t) − x̂(t+1)‖ / ‖x̂(t+1)‖ < ε, (R14)
      then stop (R15)
      else β(t+1) = min{β_max, G_pass β(t)} (R16)
        t = t + 1 (R17)
    else β(t) = max{β_min, G_fail β(t)} (R18)
end
outputs: ∀m, n: r̂_n(t), ν^r_n(t), p̂_m(t+1), ν^p_m(t+1), x̂_n(t+1), ν^x_n(t+1)

2.1. Damping adaptation

The damping adaptation mechanism in AD-GAMP works as follows. Line (R12) computes the current cost J(t+1), as described in the sequel.
Line (R13) then evaluates whether the current iteration "passes" or "fails": it passes if the current cost is at least as good as the worst cost over the last T_β iterations, or if β(t) is already at its minimum allowed value β_min; otherwise, it fails. If the iteration passes, lines (R14)-(R15) implement a stopping condition, (R16) increases the damping parameter by a factor G_pass ≥ 1 (up to the maximum value β_max), and (R17) increments the counter t. If the iteration fails, (R18) decreases the damping parameter by a factor G_fail < 1 (down to the minimum value β_min) and the counter t is not advanced, causing AD-GAMP to re-try the t-th iteration with the new value of β(t).

In the MAP case, line (R12) simply computes the cost J(t+1) = J_MAP(x̂(t+1)) for J_MAP from (2). The MMSE case, which is more involved, will be described next.
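In code, the adaptation logic of lines (R12)-(R18) reduces to the short outer loop below; `gamp_iteration` (one damped pass of (R1)-(R11)) and `cost` are abstracted as hypothetical callables, and the numeric defaults are illustrative stand-ins rather than the paper's settings.

```python
import numpy as np

def ad_gamp(gamp_iteration, state, cost, T_max=1000, eps=1e-6, T_beta=0,
            beta_min=0.01, beta_max=1.0, G_pass=1.1, G_fail=0.5):
    """Outer loop implementing (R12)-(R18). `gamp_iteration(state, beta)`
    runs one damped pass (R1)-(R11) and returns a candidate state carrying
    an `xhat` field; `cost(state)` evaluates (2) or (11)."""
    beta, J, t = 1.0, [np.inf], 1            # (I3): J(1) = inf, beta(1) = 1
    while t <= T_max:
        cand = gamp_iteration(state, beta)   # (R1)-(R11) with damping beta
        J_new = cost(cand)                   # (R12)
        worst = max(J[max(t - T_beta, 1) - 1:t])
        if J_new <= worst or beta == beta_min:            # (R13): "pass"
            if (np.linalg.norm(state.xhat - cand.xhat)
                    < eps * np.linalg.norm(cand.xhat)):   # (R14)-(R15)
                return cand
            state = cand
            beta = min(beta_max, G_pass * beta)           # (R16)
            J.append(J_new); t += 1                       # (R17)
        else:
            beta = max(beta_min, G_fail * beta)           # (R18): retry t
    return state
```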
2.2. Evaluating the cost for MMSE-GAMP

As proven in [4] and interpreted in the context of Bethe free entropy in [5], the fixed points of MMSE-GAMP are critical points of the optimization problem

(f_x, f_z) = arg min_{b_x, b_z} J_Bethe(b_x, b_z) s.t. E{z|b_z} = A E{x|b_x}, (3)

J_Bethe(b_x, b_z) ≜ D(b_x ‖ p_x) + D(b_z ‖ p_{y|z} Z^{−1}) + H(b_z, ν^p), (4)

H(b_z, ν^p) ≜ Σ_{m=1}^M ( var{z_m|b_{z_m}} / (2ν^p_m) + (1/2) ln 2πν^p_m ), (5)

where b_x and b_z are separable pdfs, Z ≜ ∫ p_{y|z}(y|z) dz is the scaling factor that renders p_{y|z}(y|z) Z^{−1} a valid pdf over z ∈ R^M, D(·‖·) denotes Kullback-Leibler (KL) divergence, and H(b_z, ν^p) is an upper bound on the entropy of b_z that is tight when b_z is independent Gaussian with variances in ν^p. In other words, the pdfs f_x(x; r̂, ν^r) = Π_{n=1}^N f_{x_n}(x_n; r̂_n, ν^r_n) and f_z(z; p̂, ν^p) = Π_{m=1}^M f_{z_m}(z_m; p̂_m, ν^p_m) given in lines (D1) and (D2) of Table 1 are critical points of (3) for fixed-point versions of (r̂, ν^r, p̂, ν^p).

Since f_x and f_z are functions of (r̂, ν^r, p̂, ν^p), the cost J_Bethe can be written in terms of these quantities as well. For this, we first note that

D(f_{x_n} ‖ p_{x_n}) = ∫ f_{x_n}(x; r̂_n, ν^r_n) ln [ p_{x_n}(x) N(x; r̂_n, ν^r_n) / ( p_{x_n}(x) C_n(r̂_n, ν^r_n) ) ] dx (6)
= −ln C_n(r̂_n, ν^r_n) − (1/2) ln 2πν^r_n − ∫ f_{x_n}(x; r̂_n, ν^r_n) |x − r̂_n|²/(2ν^r_n) dx (7)
= −ln C_n(r̂_n, ν^r_n) − (1/2) ln 2πν^r_n − ( |x̂_n − r̂_n|² + ν^x_n ) / (2ν^r_n), (8)

where x̂_n and ν^x_n are the mean and variance of f_{x_n}(·; r̂_n, ν^r_n) from (R9) and (R8). Following a similar procedure,

D(f_{z_m} ‖ p_{y_m|z_m} Z_m^{−1}) = −ln [ B_m(p̂_m, ν^p_m) / Z_m ] − (1/2) ln 2πν^p_m − ( |ẑ_m − p̂_m|² + ν^z_m ) / (2ν^p_m), (9)

where ẑ_m and ν^z_m are the mean and variance of f_{z_m}(·; p̂_m, ν^p_m) from (R2) and (R1). Then, since D(f_x ‖ p_x) = Σ_{n=1}^N D(f_{x_n} ‖ p_{x_n}) and D(f_z ‖ p_{y|z} Z^{−1}) = Σ_{m=1}^M D(f_{z_m} ‖ p_{y_m|z_m} Z_m^{−1}), equations (4) and (5) imply

J_Bethe(r̂, ν^r, p̂, ν^p) = − Σ_{m=1}^M ( ln B_m(p̂_m, ν^p_m) + |ẑ_m − p̂_m|²/(2ν^p_m) ) − Σ_{n=1}^N ( ln C_n(r̂_n, ν^r_n) + (1/2) ln ν^r_n + ( ν^x_n + |x̂_n − r̂_n|² ) / (2ν^r_n) ) + const, (10)

where we have written J_Bethe(f_x, f_z) as "J_Bethe(r̂, ν^r, p̂, ν^p)" to make the (r̂, ν^r, p̂, ν^p)-dependence clear, and where const collects terms invariant to (r̂, ν^r, p̂, ν^p).
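As a concreteness check on (10), the sketch below evaluates the Bethe cost under an AWGN likelihood p_{y_m|z_m}(y_m|z_m) = N(z_m; y_m, ν^w), for which the scale factor has the closed form B_m(p̂_m, ν^p_m) = N(y_m; p̂_m, ν^w + ν^p_m) (a standard Gaussian-convolution identity); the prior-dependent factor ln C_n is left as a user-supplied callable, and all names here are ours, not GAMPmatlab's.

```python
import numpy as np

def bethe_cost(y, nu_w, zhat, phat, nu_p, xhat, nu_x, rhat, nu_r, ln_C):
    """Evaluate (10), dropping the 'const' term."""
    ln_B = -0.5 * (np.log(2 * np.pi * (nu_w + nu_p))
                   + (y - phat)**2 / (nu_w + nu_p))   # AWGN closed form
    term_m = ln_B + (zhat - phat)**2 / (2 * nu_p)
    term_n = (ln_C(rhat, nu_r) + 0.5 * np.log(nu_r)
              + (nu_x + (xhat - rhat)**2) / (2 * nu_r))
    return -term_m.sum() - term_n.sum()
```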
Note that the iteration-t MMSE-GAMP cost is not obtained simply by plugging (r̂(t), ν^r(t), p̂(t+1), ν^p(t+1)) into (10), because the latter quantities do not necessarily yield (f_x, f_z) satisfying the moment-matching constraint E{z|f_z} = A E{x|f_x} from (3). Thus, it was suggested in [5] to compute the cost as

J_MMSE(r̂(t), ν^r(t)) = J_Bethe(r̂(t), ν^r(t), p̃, ν^p(t+1)), (11)

for p̃ chosen to match the moment-matching constraint, i.e., for

[Ax̂(t+1)]_m = g_{z_m}(p̃_m, ν^p_m(t+1)) for m = 1, ..., M, (12)

where x̂_n(t+1) = g_{x_n}(r̂_n(t), ν^r_n(t)) for n = 1, ..., N from (R9). Note that, since ν^p(t+1) can be computed from (r̂(t), ν^r(t)) via (R8) and (R10), the left side of (11) uses only (r̂(t), ν^r(t)).

In the case of additive white Gaussian noise (AWGN), i.e., p_{y_m|z_m}(y_m|z_m) = N(z_m; y_m, ν^w) with ν^w > 0, the function g_{z_m}(p̃_m, ν^p_m) is linear in p̃_m. In this case, [5] showed that (12) can be solved in closed form, yielding the solution

p̃_m = ( (ν^p_m(t+1) + ν^w) [Ax̂(t+1)]_m − ν^p_m(t+1) y_m ) / ν^w. (13)

For general p_{y_m|z_m}, however, the function g_{z_m}(p̃_m, ν^p_m) is non-linear in p̃_m and difficult to invert in closed form. Thus, we propose to solve (12) numerically using the regularized Newton's method detailed in Table 2. There, α ∈ (0,1] is a stepsize, φ ≥ 0 is a regularization parameter that keeps the update's denominator positive, and I_max is a maximum number of iterations, all of which should be tuned in accordance with p_{y_m|z_m}. Meanwhile, p̃_m(1) is an initialization that can be set at p̂_m(t+1) or [Ax̂(t+1)]_m, and ε_inv is a stopping tolerance. Note that the functions g_{z_m} and g′_{z_m} employed in Table 2 are readily available from Table 1.

Table 2. A regularized Newton's method to find the value of p̃_m that solves [Ax̂]_m = g_{z_m}(p̃_m, ν^p_m) for a given [Ax̂]_m and ν^p_m.

inputs: g_{z_m}, [Ax̂]_m, ν^p_m, p̃_m(1), I_max ≥ 1, ε_inv ≥ 0, α ∈ (0,1], φ ≥ 0
for i = 1 : I_max,
  e_m(i) = [Ax̂]_m − g_{z_m}(p̃_m(i), ν^p_m) (F1)
  if | e_m(i) / g_{z_m}(p̃_m(i), ν^p_m) | < ε_inv, stop (F2)
  ∇_m(i) = g′_{z_m}(p̃_m(i), ν^p_m) (F3)
  p̃_m(i+1) = p̃_m(i) + α e_m(i) ∇_m(i) / ( ∇_m(i)² + φ ) (F4)
end
outputs: p̃_m(i)
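A minimal, runnable rendering of Table 2 follows, checked against the AWGN closed form (13); here `g_z` is the linear AWGN denoiser (ν^w p̃ + ν^p y)/(ν^w + ν^p), and the function names and numerical values are illustrative assumptions of ours.

```python
import numpy as np

def invert_gz(g_z, g_z_prime, target, nu_p, p0, I_max=50, eps_inv=1e-8,
              alpha=1.0, phi=1e-12):
    """Solve target = g_z(p, nu_p) for p via Table 2's damped Newton steps."""
    p = p0
    for _ in range(I_max):
        e = target - g_z(p, nu_p)                    # (F1) residual
        if abs(e / g_z(p, nu_p)) < eps_inv:          # (F2) stopping test
            break
        grad = g_z_prime(p, nu_p)                    # (F3)
        p = p + alpha * e * grad / (grad**2 + phi)   # (F4) regularized step
    return p

# AWGN example: g_z(p, nu_p) = (nu_w*p + nu_p*y) / (nu_w + nu_p)
y, nu_w, nu_p, target = 0.7, 0.1, 0.5, 0.3
g_z  = lambda p, v: (nu_w * p + v * y) / (nu_w + v)
g_zp = lambda p, v: nu_w / (nu_w + v)
p_newton = invert_gz(g_z, g_zp, target, nu_p, p0=target)
p_closed = ((nu_p + nu_w) * target - nu_p * y) / nu_w   # eq. (13)
assert np.isclose(p_newton, p_closed)
```

Because g_{z_m} is linear in the AWGN case, one undamped step (α = 1, φ ≈ 0) recovers (13) exactly; the damping matters only for strongly non-linear likelihoods.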
2.3. Mean removal

To mitigate the difficulties caused by A with non-zero-mean entries, we propose to rewrite the linear system "z = Ax" from (1) as

[ z ; z_{M+1} ; z_{M+2} ] = [ Ã , b₁γ , b₂1_M ; b₃1_N^H , −b₁b₃ , 0 ; b₄c^H , 0 , −b₂b₄ ] [ x ; x_{N+1} ; x_{N+2} ], (14)

with the three factors denoted z̄, Ā, and x̄, respectively, where (·)^H is the conjugate transpose, 1_P ≜ [1, ..., 1]^H ∈ R^P, and

μ ≜ (1/(MN)) 1_M^H A 1_N, (15)
γ ≜ (1/N) A 1_N, (16)
c^H ≜ (1/M) 1_M^H ( A − μ 1_M 1_N^H ), (17)
Ã ≜ A − γ 1_N^H − 1_M c^H. (18)

The advantage of (14) is that the rows and columns of Ā are approximately zero-mean. This can be seen by first verifying, via the definitions above, that c^H 1_N = 0, Ã 1_N = 0, and 1_M^H Ã = 0^H, which implies that the elements in every row and column of Ã are zero-mean. Thus, for large N and M, the elements in all but a vanishing fraction of the rows and columns of Ā will also be zero-mean. The mean-square coefficient size in the last two rows and columns of Ā can be made to match that in Ã via choice of b₁, b₂, b₃, b₄.

To understand the construction of (14), note that (18) implies

z = Ax = Ãx + b₁γ ( 1_N^H x / b₁ ) + b₂1_M ( c^H x / b₂ ), (19)

where we define x_{N+1} ≜ 1_N^H x / b₁ and x_{N+2} ≜ c^H x / b₂; this explains the first M rows of (14). To satisfy the definitions in (19), we then require that z_{M+1} = 0 and z_{M+2} = 0 in (14), which can be ensured through the Dirac-delta likelihood

p_{y_m|z_m}(y_m|z_m) ≜ δ(z_m) for m ∈ {M+1, M+2}. (20)

Meanwhile, we make no assumption about the newly added elements x_{N+1} and x_{N+2}, and thus adopt the improper uniform prior

p_{x_n}(x_n) ∝ 1 for n ∈ {N+1, N+2}. (21)

In summary, the mean-removal approach suggested here runs GAMP or AD-GAMP (as in Table 1) with Ā in place of A and with the likelihoods and priors augmented by (20) and (21). It is important to note that, if multiplication by A and A^H can be implemented using a fast transform (e.g., an FFT), then multiplication by Ā and Ā^H can too; for details, see the GAMPmatlab implementation [9].
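The sketch below builds Ā from a non-zero-mean A under the simplifying assumption b₁ = b₂ = b₃ = b₄ = 1 (the paper instead chooses these scalings to match the mean-square coefficient size of Ã), and numerically verifies the zero-mean properties and the equivalence (19); all names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 500, 1000
A = 0.1 + rng.normal(0.0, 1.0 / np.sqrt(N), (M, N))   # non-zero-mean A
one_M, one_N = np.ones((M, 1)), np.ones((N, 1))

mu = (one_M.T @ A @ one_N) / (M * N)                  # (15)
gamma = (A @ one_N) / N                               # (16)
cH = (one_M.T @ (A - mu * one_M @ one_N.T)) / M       # (17)
At = A - gamma @ one_N.T - one_M @ cH                 # (18): A-tilde

assert np.allclose(cH @ one_N, 0)                     # c^H 1_N = 0
assert np.allclose(At @ one_N, 0)                     # Atilde 1_N = 0
assert np.allclose(one_M.T @ At, 0)                   # 1_M^H Atilde = 0^H

b1 = b2 = b3 = b4 = 1.0                               # placeholder scalings
Abar = np.block([
    [At,           b1 * gamma,               b2 * one_M],
    [b3 * one_N.T, np.array([[-b1 * b3]]),   np.zeros((1, 1))],
    [b4 * cH,      np.zeros((1, 1)),         np.array([[-b2 * b4]])],
])                                                    # (14)

x = rng.normal(size=(N, 1))
xbar = np.vstack([x, (one_N.T @ x) / b1, (cH @ x) / b2])
assert np.allclose((Abar @ xbar)[:M], A @ x)          # first M rows: z = Ax
assert np.allclose((Abar @ xbar)[M:], 0)              # z_{M+1} = z_{M+2} = 0
```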
3. NUMERICAL RESULTS
We numerically studied the recovery NMSE ≜ ‖x̂ − x‖²/‖x‖² of SwAMP [8] and the MMSE version of the original GAMP from [1] relative to the proposed mean-removed (M-GAMP) and adaptively damped (AD-GAMP) modifications, as well as their combination (MAD-GAMP). In all experiments, the signal x was drawn Bernoulli-Gaussian (BG) with sparsity rate τ and length N = 1000, and performance was averaged over multiple realizations. Average NMSE was clipped to 0 dB for plotting purposes. The matrix A was drawn in one of four ways (illustrative constructions are sketched after this list):

(a) Non-zero mean: i.i.d. a_mn ∼ N(μ, N^{−1}) for a specified μ ≠ 0.
(b) Low-rank product: A = (1/N) UV with U ∈ R^{M×R}, V ∈ R^{R×N}, and i.i.d. u_mr, v_rn ∼ N(0,1), for a specified R. Note that A is rank deficient when R < min{M, N}.
(c) Column-correlated: the rows of A are independent zero-mean stationary Gauss-Markov processes with a specified correlation coefficient ρ = E{a_mn a*_{m,n+1}} / E{|a_mn|²}.
(d) Ill-conditioned: A = UΣV^H, where U and V^H are the left and right singular-vector matrices of an i.i.d. N(0,1) matrix and Σ is a singular-value matrix such that [Σ]_{i,i} / [Σ]_{i+1,i+1} = κ^{1/min{M,N}} for i = 1, ..., min{M,N} − 1, with a specified condition number κ > 1.

For all algorithms, we used T_max = 1000 and ε = 10−. Unless otherwise noted, for adaptive damping, we used T_β = 0, G_pass = 1., G_fail = 0., β_max = 1, and β_min = 0. . For SwAMP, we used the authors' publicly available code [12].
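A sketch of the four constructions follows; the 1/N normalization in (b) and the unit-variance AR(1) row model in (c) reflect our reading of the setup above and should be treated as assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def A_nonzero_mean(M, N, mu):            # (a) i.i.d. N(mu, 1/N)
    return mu + rng.normal(0.0, 1.0 / np.sqrt(N), (M, N))

def A_low_rank(M, N, R):                 # (b) A = UV/N, rank <= R
    U = rng.normal(0.0, 1.0, (M, R))
    V = rng.normal(0.0, 1.0, (R, N))
    return (U @ V) / N

def A_column_correlated(M, N, rho):      # (c) AR(1) across each row
    A = np.empty((M, N))
    A[:, 0] = rng.normal(0.0, 1.0, M)
    for n in range(1, N):                # stationary, unit marginal variance
        A[:, n] = rho * A[:, n - 1] + np.sqrt(1 - rho**2) * rng.normal(0.0, 1.0, M)
    return A

def A_ill_conditioned(M, N, kappa):      # (d) geometric singular-value decay
    G = rng.normal(0.0, 1.0, (M, N))
    U, _, Vh = np.linalg.svd(G, full_matrices=False)
    i = np.arange(min(M, N))
    s = kappa ** (-i / min(M, N))        # s[i]/s[i+1] = kappa^(1/min(M,N))
    return U @ np.diag(s) @ Vh
```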
Fig. 1. AWGN compressive sensing under (a) non-zero-mean, (b) low-rank product, (c) column-correlated, and (d) ill-conditioned A. (Each panel plots NMSE [dB] versus μ, R/N, ρ, or κ, respectively, for MAD-GAMP, M-GAMP, AD-GAMP, SwAMP, GAMP, and the support-aware genie.)
First, we experiment with compressive sensing (CS) in AWGN at SNR ≜ E{‖z‖²}/E{‖y − z‖²} = 60 dB. For this, we used M = 500 = N/2 measurements and sparsity rate τ = 0. . As a reference, we computed a lower bound on the achievable NMSE using a genie who knows the support of x. For non-zero-mean matrices, Fig. 1(a) shows that the proposed M-GAMP and MAD-GAMP provided near-genie performance for all tested means μ. In contrast, GAMP worked only with zero-mean A, and SwAMP only with small-mean A. For low-rank product, correlated, and ill-conditioned matrices, Figs. 1(b)-(d) show that AD-GAMP is slightly more robust than SwAMP and significantly more robust than GAMP.

Fig. 2. "Robust" compressive sensing under (a) non-zero-mean, (b) low-rank product, (c) column-correlated, and (d) ill-conditioned A.

Next, we tried "robust" CS by repeating the previous experiment with sparsity rate τ = 0. and with 10% of the observations (selected uniformly at random) replaced by "outliers" corrupted by AWGN at SNR = 0 dB. For (M)AD-GAMP, we set β_max = 0. and T_max = 2000. With non-zero-mean A, Fig. 2(a) shows increasing performance as we move from GAMP to M-GAMP to SwAMP to MAD-GAMP. For low-rank product, correlated, and ill-conditioned matrices, Figs. 2(b)-(d) show that SwAMP was slightly more robust than AD-GAMP, and both were much more robust than GAMP.

Finally, we experimented with noiseless 1-bit CS [13], where y = sgn(Ax), using M = 3000 measurements and sparsity ratio τ = 0. . In each realization, the empirical mean was subtracted from Ax to prevent y_m = 1 ∀m.
Fig. 3. 1-bit compressive sensing under (a) non-zero-mean, (b) low-rank product, (c) column-correlated, and (d) ill-conditioned A.
Table 3. Average runtime (in seconds) and number of iterations for AD-GAMP/MAD-GAMP versus SwAMP on the AWGN, robust, and 1-bit CS problems under the four matrix types.

For (M)AD-GAMP, we used β_max = 0. . For SwAMP, we increased the stopping tolerance to ε = 5 × 10−, as it significantly improved runtime without degrading accuracy. For non-zero-mean A, Fig. 3(a) shows that M-GAMP and MAD-GAMP were more robust than SwAMP, which was in turn much more robust than GAMP. For low-rank product, correlated, and ill-conditioned matrices, Figs. 3(b)-(d) show that MAD-GAMP and SwAMP gave similarly robust performance, while the original GAMP was very fragile.

Lastly, we compared the convergence speed of MAD-GAMP to SwAMP. For each problem, we chose a setting that allowed MAD-GAMP and SwAMP to converge for each matrix type. Table 3 shows that, on the whole, MAD-GAMP ran several times faster than SwAMP but used more iterations. Thus, it may be possible to reduce SwAMP's runtime to below that of MAD-GAMP using a more efficient (e.g., BLAS-based) implementation, at least for explicit A. When A has a fast O(N log N) implementation (e.g., an FFT), only (M)AD-GAMP will be able to exploit the reduced complexity.
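For concreteness, a sketch of the 1-bit measurement model just described, with the empirical mean of Ax subtracted before quantization; the sparsity rate and matrix mean below are placeholder values of ours, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 1000, 3000
tau, mu = 0.1, 0.02   # placeholder sparsity rate and matrix mean

x = rng.binomial(1, tau, N) * rng.normal(0.0, 1.0, N)   # Bernoulli-Gaussian x
A = mu + rng.normal(0.0, 1.0 / np.sqrt(N), (M, N))      # non-zero-mean A
z = A @ x
y = np.sign(z - z.mean())   # mean subtraction prevents y_m = 1 for all m
```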
4. CONCLUSIONS
We proposed adaptive-damping and mean-removal modifications of GAMP that help prevent divergence in the case of "difficult" A matrices. We then numerically demonstrated that the resulting modifications significantly increase GAMP's robustness to non-zero-mean, low-rank product, column-correlated, and ill-conditioned A matrices. Moreover, they provide robustness similar to the recently proposed SwAMP algorithm, while running faster than the current SwAMP implementation. For future work, we note that the sequential updates of SwAMP could in principle be combined with the proposed mean removal and/or adaptive damping, perhaps achieving a level of robustness greater than that of either SwAMP or (M)AD-GAMP.

5. REFERENCES

[1] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in Proc. IEEE Int. Symp. Inform. Thy., Aug. 2011, pp. 2168-2172 (full version at arXiv:1010.5141).
[2] M. Bayati, M. Lelarge, and A. Montanari, "Universality in polytope phase transitions and iterative algorithms," in Proc. IEEE Int. Symp. Inform. Thy., Boston, MA, June 2012, pp. 1643-1647 (full paper at arXiv:1207.7321).
[3] A. Javanmard and A. Montanari, "State evolution for general approximate message passing algorithms, with applications to spatial coupling," Inform. Inference, vol. 2, no. 2, pp. 115-144, 2013.
[4] S. Rangan, P. Schniter, E. Riegler, A. Fletcher, and V. Cevher, "Fixed points of generalized approximate message passing with arbitrary matrices," in Proc. IEEE Int. Symp. Inform. Thy., July 2013, pp. 664-668 (full version at arXiv:1301.6295).
[5] F. Krzakala, A. Manoel, E. W. Tramel, and L. Zdeborová, "Variational free energies for compressed sensing," in Proc. IEEE Int. Symp. Inform. Thy., July 2014, pp. 1499-1503 (see also arXiv:1402.1384).
[6] F. Caltagirone, F. Krzakala, and L. Zdeborová, "On convergence of approximate message passing," in Proc. IEEE Int. Symp. Inform. Thy., July 2014, pp. 1812-1816 (see also arXiv:1401.6384).
[7] S. Rangan, P. Schniter, and A. Fletcher, "On the convergence of generalized approximate message passing with arbitrary matrices," in Proc. IEEE Int. Symp. Inform. Thy., July 2014, pp. 236-240 (full version at arXiv:1402.3210).
[8] A. Manoel, F. Krzakala, E. W. Tramel, and L. Zdeborová, "Sparse estimation with the swept approximated message-passing algorithm," arXiv:1406.4311, June 2014.
[9] S. Rangan, P. Schniter, J. T. Parker, J. Ziniel, J. Vila, M. Borgerding, et al., "GAMPmatlab," https://sourceforge.net/projects/gampmatlab/.
[10] T. Heskes, "Stable fixed points of loopy belief propagation are minima of the Bethe free energy," in Proc. Neural Inform. Process. Syst. Conf., Vancouver, BC, Dec. 2002, pp. 343-350.
[11] P. Schniter and S. Rangan, "Compressive phase retrieval via generalized approximate message passing," in Proc. Allerton Conf. Commun. Control Comput., Monticello, IL, Oct. 2012, pp. 815-822 (full version at arXiv:1405.5618).
[12] A. Manoel, F. Krzakala, E. W. Tramel, and L. Zdeborová, "SwAMP demo user's manual," https://github.com/eric-tramel/SwAMP-Demo.
[13] U. S. Kamilov, V. K. Goyal, and S. Rangan, "Message-passing de-quantization with applications to compressed sensing," IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6270-6281, Dec. 2012.