Vector Approximate Message Passing Algorithm for Structured Perturbed Sensing Matrix
Jiang Zhu, Qi Zhang, Xiangming Meng and Zhiwei Xu
Abstract
In this paper, we consider a general form of noisy compressive sensing (CS) where the sensing matrix is not precisely known. Such cases arise when there are imperfections or unknown calibration parameters during the measurement process. In particular, the sensing matrix may have some structure, which makes the perturbation follow a fixed pattern. While previous work has focused on extending the approximate message passing (AMP) and LASSO algorithms to deal with independent and identically distributed (i.i.d.) perturbations, we propose a robust variant of the vector approximate message passing (VAMP) algorithm that handles structured perturbation, building on the recent VAMP algorithm. The performance of the robust version of VAMP is demonstrated numerically.
Keywords:
VAMP, structured perturbation, compressed sensing

I. INTRODUCTION
Compressed sensing (CS) aims to reconstruct an $N$-dimensional sparse signal from $M$ underdetermined linear measurements $y = Ax + w$, where $M < N$ and $w$ is additive noise. It has been shown that, in the absence of noise, perfect reconstruction is possible provided that the signal is exactly $K$-sparse and the measurement matrix satisfies certain properties (e.g., restricted isometry, spark, null space). In practical applications, the measurement matrix $A$ may not be known exactly due to, e.g., model mismatch, imperfect calibration, and imperfections in the signal acquisition hardware. Consequently, several works have studied recovery algorithms and performance bounds for general signals under independent and identically distributed (i.i.d.) perturbation [1]. In addition, measurement matrix uncertainty in quantized settings has also been studied [2]. For sparse signal recovery under i.i.d. perturbation, the recovery performance of algorithms such as basis pursuit (BP) and orthogonal matching pursuit (OMP) has been analyzed [3, 4]. While the above works study the effect of perturbation on established algorithms, there also exist algorithms which take the measurement matrix uncertainty into account. In [5], the sparsity-cognizant total least squares (S-TLS) approach is developed. A modified version of the Dantzig selector dealing with matrix uncertainty is proposed in [6]. To take the structure of the perturbation into account, a weighted S-TLS (WS-TLS) is proposed, and numerical results demonstrate that WS-TLS performs significantly better than S-TLS [5].

The approximate message passing (AMP) algorithm is a popular method for high-dimensional inference, due to its low computational complexity and good performance [7]. In [8], a generalized AMP (GAMP) algorithm is proposed to cope with the generalized linear model. Since then, AMP and GAMP have been applied in various signal processing applications, such as data detection and channel estimation [9]. Recently, the orthogonal AMP [10] and vector AMP (VAMP) [11] algorithms were proposed, which can deal with a larger ensemble of measurement matrices than the AMP algorithm. When some statistical parameters are unknown, expectation maximization AMP (EM-AMP) and expectation maximization VAMP (EM-VAMP) have been proposed to jointly recover the unknown signal and learn the statistical parameters [12, 13].

In [14], the AMP algorithm is extended to deal with sparse signal recovery under matrix uncertainty. The perturbation is treated as additive white Gaussian noise, and the matrix uncertainty GAMP (MU-GAMP) is proposed. Provided that the perturbation has some additional structure, an alternating MU-GAMP is proposed to jointly estimate the measurement matrix and the signal, in contrast with this paper, where the structured perturbation is also treated as random variables. In [15], a robust approximate message passing algorithm is proposed, and the mean square error of the Bayes-optimal reconstruction of sparse signals under matrix uncertainty is calculated via the replica method.

In this paper, we consider a general kind of structured perturbation. This structure arises because the sensing matrix has known structure, so that its elements cannot be chosen arbitrarily. For example, in signal processing and communication problems, the convolution between channel and data can be reformulated as a linear regression problem. For zero boundary conditions, the sensing matrix has a Toeplitz structure, while a circulant structure appears for periodic boundary conditions [16].
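To make this structure concrete, the following minimal sketch (Python with NumPy and SciPy, an illustrative choice of tools not used in the paper) builds the linear map induced by convolution with a short channel under the two boundary conditions just mentioned.

```python
# Sketch: convolution as a structured linear map. Zero boundary conditions
# give a Toeplitz sensing matrix; periodic boundary conditions give a
# circulant one. Channel taps and sizes below are illustrative.
import numpy as np
from scipy.linalg import toeplitz, circulant

h = np.array([0.9, 0.4, 0.1])            # illustrative channel taps
N = 8                                     # signal length

# Zero boundary: first column is the zero-padded channel, first row is zero
# past the (0, 0) entry, so (A @ x)[i] = sum_j h[i-j] * x[j].
col = np.r_[h, np.zeros(N - len(h))]
row = np.r_[h[0], np.zeros(N - 1)]
A_toeplitz = toeplitz(col, row)

# Periodic boundary: circulant matrix generated by the zero-padded channel,
# so A_circ @ x is the circular convolution of the channel with x.
A_circ = circulant(col)

x = np.arange(1.0, N + 1)
assert np.allclose(A_toeplitz @ x, np.convolve(h, x)[:N])        # linear conv.
assert np.allclose(A_circ @ x,
                   np.real(np.fft.ifft(np.fft.fft(col) * np.fft.fft(x))))
```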
As a result, the structure of the model uncertainty has to be taken into account to improve the reconstruction performance. Since the equivalent noise (perturbation plus additive noise) is coloured and depends on the unknown signal, in contrast with the white Gaussian noise in [14], conventional AMP and VAMP algorithms cannot be applied in this scenario. Here we propose to approximate the likelihood function in each iteration, and numerical results demonstrate the effectiveness of the proposed method.

Fig. 1. The factor graph used for the derivation of the robust VAMP algorithm, with factor nodes $p(x_1)$, $\delta(x_1 - x_2)$ and $\mathcal{N}(y; Ax_2, \Gamma(x_2))$ connected through the variable nodes $x_1$ and $x_2$. The circles represent variable nodes and the squares represent factor nodes from (5).
II. ALGORITHM
The mathematical model we consider in this paper is [17]

$y = \left( A + \sum_{i=1}^{q} e_i E_i \right) x + w,$  (1)

where $y \in \mathbb{R}^M$; $A \in \mathbb{R}^{M \times N}$ denotes the random known sensing matrix with $\|A\|_F^2 = N$, where $\|\cdot\|_F$ denotes the Frobenius norm; $E_i \in \mathbb{R}^{M \times N}$ denotes the known structure of the perturbation; $e_i$, $i = 1, \cdots, q$, are i.i.d. random variables satisfying $e_i \sim \mathcal{N}(e_i; 0, \gamma_e^{-1})$; and $x \in \mathbb{R}^N$. The prior distribution of the signal $x$ is $p(x) = \prod_{i=1}^{N} p(x_i)$, where $p(x_i)$ is a sparsity-inducing prior, and $w \sim \mathcal{N}(0, \gamma_w^{-1} I_M)$. Note that both [14] and [17] study model (1). However, [14] treats $\{e_i\}_{i=1}^{q}$ as unknown deterministic parameters, whereas [17] treats them as random parameters; these correspond to two classical ways to model measurement uncertainty. As shown in [17], modeling the measurement uncertainty with random parameters yields accurate results; the drawback is that one needs to estimate the statistics of those random parameters. Compared to [17], which assumes an unknown deterministic vector $x$, this paper enforces a prior distribution on $x$. The perturbation model in (1) is very general, and we list some specific perturbation structures below (a construction sketch for the circulant case follows the list):

• i.i.d. perturbation, where the perturbation takes the form $\sum_{i=1}^{M} \sum_{j=1}^{N} e_{ij} E_{ij}$, $e_{ij} \sim \mathcal{N}(0, \gamma_e^{-1})$, and $E_{ij}$ is an all-zero matrix except that the $(i,j)$-th element is one.

• Matrix-restricted structured perturbation, where the perturbation takes the form $DEC$ with $D$ and $C$ being known matrices. This structure can model the scenario in which the coefficients of the sensing matrix have unequal uncertainties, as shown in [17].

• Circulant structured perturbation. Here the $N \times N$ circulant matrix $A$ is of the form

$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_N \\ a_N & a_1 & \cdots & a_{N-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_2 & a_3 & \cdots & a_1 \end{bmatrix}.$  (2)

As a result, the perturbation also takes this form [17].
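As an illustration of the circulant case, the sketch below (Python/NumPy; the size $N$ and the precision $\gamma_e$ are illustrative, not taken from the experiments) builds the $q = N$ circulant basis matrices $E_i$ and draws one realization of $\sum_i e_i E_i$, which is itself circulant.

```python
# Sketch: structured perturbation for the circulant case of (1)-(2).
# Each basis matrix E_i is the circulant matrix generated by the i-th
# standard basis vector, so sum_i e_i E_i is circulant.
import numpy as np

def circulant_basis(N):
    """Return the q = N circulant basis matrices E_1, ..., E_N."""
    E = []
    for i in range(N):
        v = np.zeros(N)
        v[i] = 1.0
        # Row j of a circulant matrix is its first row cyclically shifted by j.
        E.append(np.stack([np.roll(v, j) for j in range(N)]))
    return E

rng = np.random.default_rng(0)
N, gamma_e = 64, 1e2                                   # illustrative values
E = circulant_basis(N)
e = rng.normal(0.0, np.sqrt(1.0 / gamma_e), size=N)    # e_i ~ N(0, gamma_e^{-1})
perturbation = sum(ei * Ei for ei, Ei in zip(e, E))    # sum_i e_i E_i
assert np.allclose(perturbation[1], np.roll(perturbation[0], 1))  # circulant
```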
By defining $z = \sum_{i=1}^{q} e_i E_i x + w$, model (1) is equivalent to

$y = Ax + z,$  (3)

where $z \sim \mathcal{N}\left(0, \sum_{i=1}^{q} \gamma_e^{-1} E_i x x^T E_i^T + \gamma_w^{-1} I_M\right) \triangleq \mathcal{N}(0, \Gamma(x))$.

In the following, we briefly introduce VAMP. We start with the joint probability density function of $x$ and $y$:

$p(y, x) = p(x)\, p(y|x) = p(x)\, \mathcal{N}(y; Ax, \Gamma(x)).$  (4)

By splitting $x$ into two identical variables $x_1$ and $x_2$, we obtain an equivalent factorization

$p(y, x_1, x_2) = p(x_1)\, \delta(x_1 - x_2)\, \mathcal{N}(y; Ax_2, \Gamma(x_2)).$  (5)

The factor graph corresponding to the factorization (5) is presented in Fig. 1. We then pass messages on this factor graph. We initialize the message from the factor node $\delta(x_1 - x_2)$ to the variable node $x_1$ as $\mu_{\delta \to x_1}(x_1) = \mathcal{N}(x_1; r_{1k}, \gamma_{1k}^{-1} I_N)$ with $k = 0$. Combining the factor node $p(x_1)$, the sum product (SP) belief on the variable node $x_1$ is

$b_{sp}(x_1) \propto p(x_1)\, \mathcal{N}(x_1; r_{1k}, \gamma_{1k}^{-1} I_N),$  (6)

where $\propto$ means proportional to. We calculate the posterior means and variances as

$\hat{x}_{1k} = \mathrm{E}[x_1 \mid b_{sp}(x_1)],$  (7a)
$\eta_{1k}^{-1} = \langle \mathrm{diag}(\mathrm{Cov}[x_1 \mid b_{sp}(x_1)]) \rangle,$  (7b)

where $\langle x \rangle = (\sum_{i=1}^{N} x_i)/N$, $\mathrm{Cov}[\cdot \mid b_{sp}(x_1)]$ is the covariance matrix with respect to the belief estimate $b_{sp}(x_1)$, and $\mathrm{diag}(A)$ returns a column vector whose elements are the main diagonal of $A$. (Here $\mathcal{N}(e_i; 0, \gamma_e^{-1})$ means that $e_i$ follows a Gaussian distribution with mean zero and variance $\gamma_e^{-1}$; we sometimes write $\mathcal{N}(0, \gamma_e^{-1})$ when the random variable is clear from context. For the detailed derivation of VAMP via expectation propagation, please refer to [11].) Exploiting expectation propagation, the above belief $b_{sp}(x_1)$ is approximated by a Gaussian distribution

$b_{app}(x_1) = \mathcal{N}(x_1; \hat{x}_{1k}, \eta_{1k}^{-1} I_N).$  (8)

Then we calculate the message from the variable node $x_1$ to the factor node $\delta(x_1 - x_2)$, which is the ratio of the most recent approximate belief $b_{app}(x_1)$ to the most recent message from $\delta(x_1 - x_2)$ to $x_1$, i.e.,

$\mu_{x_1 \to \delta}(x_1) = \mathcal{N}(x_1; r_{2k}, \gamma_{2k}^{-1} I_N) \propto \mathcal{N}(x_1; \hat{x}_{1k}, \eta_{1k}^{-1} I_N) / \mathcal{N}(x_1; r_{1k}, \gamma_{1k}^{-1} I_N),$  (9)

where $r_{2k}$ and $\gamma_{2k}$ are calculated according to line 5 of Algorithm 1. For the factor node $\delta(x_1 - x_2)$, the message from $\delta(x_1 - x_2)$ to the variable node $x_2$ follows directly as $\mu_{\delta \to x_2}(x_2) = \mathcal{N}(x_2; r_{2k}, \gamma_{2k}^{-1} I_N)$, which can be viewed as the prior of the variable node $x_2$. For the rightmost factor node $\mathcal{N}(y; Ax_2, \Gamma(x_2))$, the covariance matrix depends on the unknown $x_2$. As a result, we approximate $\Gamma(x)$ as

$\Gamma(x) \approx \mathrm{E}_{x \sim \mathcal{N}(r_{2k}, \gamma_{2k}^{-1} I)}[\Gamma(x)] = \sum_{i=1}^{q} \gamma_e^{-1} E_i (r_{2k} r_{2k}^T + \gamma_{2k}^{-1} I) E_i^T + \gamma_w^{-1} I_M \triangleq \Gamma_k.$  (10)

As a consequence, we obtain an approximate model with the likelihood $\mathcal{N}(y_k; A_k x_2, \gamma_{w,k}^{-1} I_M)$, where

$y_k = \gamma_{w,k}^{-1/2}\, \Gamma_k^{-1/2}\, y,$  (11)
$A_k = \gamma_{w,k}^{-1/2}\, \Gamma_k^{-1/2}\, A,$  (12)

and the factor $\gamma_{w,k}^{-1/2}$ ensures $\|A_k\|_F^2 = N$, i.e., $\gamma_{w,k} = \|\Gamma_k^{-1/2} A\|_F^2 / N$. With such an approximation, the SP belief on the variable $x_2$ is

$b_{sp}(x_2) \propto \mathcal{N}(y_k; A_k x_2, \gamma_{w,k}^{-1} I_M)\, \mathcal{N}(x_2; r_{2k}, \gamma_{2k}^{-1} I_N).$  (13)
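A minimal sketch of the covariance approximation (10) and the whitening in (11)-(12) might look as follows (Python/NumPy; all names and sizes are illustrative placeholders). As discussed at the end of this section, for i.i.d. perturbation $\Gamma_k$ is a scalar multiple of $I_M$ and this step reduces to a rescaling.

```python
# Sketch of the whitening step: form Gamma_k from (10), take its inverse
# square root via an eigendecomposition, and rescale so ||A_k||_F^2 = N.
import numpy as np

def whiten(y, A, E_list, r2k, gamma_2k, gamma_e, gamma_w):
    M, N = A.shape
    # r2k r2k^T + gamma_2k^{-1} I, the second moment under N(r2k, gamma_2k^{-1} I).
    C = np.outer(r2k, r2k) + np.eye(N) / gamma_2k
    Gamma_k = sum((Ei @ C @ Ei.T) / gamma_e for Ei in E_list) + np.eye(M) / gamma_w
    # Inverse square root of Gamma_k from its eigendecomposition.
    lam, U = np.linalg.eigh(Gamma_k)
    Gamma_isqrt = (U / np.sqrt(lam)) @ U.T
    Aw = Gamma_isqrt @ A
    gamma_wk = np.linalg.norm(Aw, 'fro') ** 2 / N      # enforces ||A_k||_F^2 = N
    y_k = Gamma_isqrt @ y / np.sqrt(gamma_wk)          # (11)
    A_k = Aw / np.sqrt(gamma_wk)                       # (12)
    return y_k, A_k, gamma_wk

# Tiny illustrative usage with random placeholder data.
rng = np.random.default_rng(1)
M, N, q = 8, 4, 3
A = rng.normal(size=(M, N))
E_list = [rng.normal(size=(M, N)) for _ in range(q)]
y_k, A_k, gamma_wk = whiten(rng.normal(size=M), A, E_list,
                            rng.normal(size=N), 1.0, 100.0, 100.0)
assert np.isclose(np.linalg.norm(A_k, 'fro') ** 2, N)  # normalization check
```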
Utilizing expectation propagation, the SP belief $b_{sp}(x_2)$ on the variable $x_2$ can be further approximated as

$b_{app}(x_2) = \mathcal{N}(x_2; \hat{x}_{2k}, \eta_{2k}^{-1} I_N),$  (14)

where

$\hat{x}_{2k} = (\gamma_{w,k} A_k^T A_k + \gamma_{2k} I)^{-1} (\gamma_{w,k} A_k^T y_k + \gamma_{2k} r_{2k}),$  (15a)
$\eta_{2k}^{-1} = \frac{1}{N} \mathrm{Tr}\left[ (\gamma_{w,k} A_k^T A_k + \gamma_{2k} I)^{-1} \right].$  (15b)

We then obtain the message from the variable node $x_2$ to the factor node $\delta(x_1 - x_2)$ as

$\mu_{x_2 \to \delta}(x_2) \propto b_{app}(x_2) / \mathcal{N}(x_2; r_{2k}, \gamma_{2k}^{-1} I_N) = \mathcal{N}(x_2; r_{1,k+1}, \gamma_{1,k+1}^{-1} I_N),$  (16)

where $r_{1,k+1}$ and $\gamma_{1,k+1}$ are given in line 10 of Algorithm 1. Similarly, we calculate the message from the factor node $\delta(x_1 - x_2)$ to the variable node $x_1$ as

$\mu_{\delta \to x_1}(x_1) = \mu_{x_2 \to \delta}(x_1),$  (17)

which closes the loop of the proposed VAMP algorithm, summarized in Algorithm 1.

Now we discuss the computational complexity of Algorithm 1. For the VAMP of [11], the main computational burden lies in the SVD of the sensing matrix, which is performed only once. For Algorithm 1, the main additional burden lies in line 7, which involves calculating the eigenvalue decomposition of $\Gamma_k$ and the singular value decomposition of $A_k$ in each iteration. In some cases, the computational complexity of Algorithm 1 is comparable to that of VAMP: given the model $y = (A + E)x + n$ with i.i.d. perturbation elements $E_{ij} \sim \mathcal{N}(0, \gamma_e^{-1})$, one can see that $\Gamma(x) = (\gamma_w^{-1} + \gamma_e^{-1} \|x\|_2^2) I_M$, and the whitening operation in line 7 is unnecessary.
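Before summarizing the overall procedure in Algorithm 1, note that the LMMSE step (15) might be implemented as in the following sketch (Python/NumPy with illustrative names; a practical implementation would typically reuse an SVD of $A_k$ rather than form an explicit inverse, as in the VAMP implementation of [11]).

```python
# Sketch of the LMMSE estimation step (15): posterior mean and average
# posterior precision of x_2 under the Gaussian pseudo-likelihood
# N(y_k; A_k x_2, gamma_wk^{-1} I) and the prior N(x_2; r2k, gamma_2k^{-1} I).
import numpy as np

def lmmse_step(y_k, A_k, gamma_wk, r2k, gamma_2k):
    N = A_k.shape[1]
    P = np.linalg.inv(gamma_wk * A_k.T @ A_k + gamma_2k * np.eye(N))
    x2_hat = P @ (gamma_wk * A_k.T @ y_k + gamma_2k * r2k)   # (15a)
    eta_2k = N / np.trace(P)                                 # (15b): 1/eta = Tr(P)/N
    return x2_hat, eta_2k
```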
Algorithm 1 Vector AMP in the perturbed setting

1:  Initialize $r_{10}$ and $\gamma_{10} \geq 0$, and set the maximum number of iterations $K_{it}$.
2:  for $k = 0, 1, \cdots, K_{it}$ do
3:      // Denoising
4:      Calculate $\hat{x}_{1k}$ and $\eta_{1k}$ according to (7).
5:      $\gamma_{2k} = \eta_{1k} - \gamma_{1k}$,  $r_{2k} = (\eta_{1k} \hat{x}_{1k} - \gamma_{1k} r_{1k}) / \gamma_{2k}$
6:      // Whitening
7:      Approximate $\Gamma(x)$ with $\Gamma_k$ and obtain the equivalent $y_k$ (11), $A_k$ (12) and $\gamma_{w,k}$.
8:      // LMMSE estimation
9:      Calculate $\hat{x}_{2k}$ and $\eta_{2k}$ according to (15).
10:     $\gamma_{1,k+1} = \eta_{2k} - \gamma_{2k}$,  $r_{1,k+1} = (\eta_{2k} \hat{x}_{2k} - \gamma_{2k} r_{2k}) / \gamma_{1,k+1}$
11: end for
12: Return $\hat{x}_{1,K_{it}}$.
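To make lines 4-5 of Algorithm 1 concrete, the following sketch (Python/NumPy) implements the denoising step (7) for the Bernoulli-Gaussian prior used in Section III, together with the extrinsic update of line 5 (line 10 is the same computation with the roles of the two halves swapped). Interpreting $\sigma_x$ as a variance here is an assumption on the paper's notation.

```python
# Sketch: Bernoulli-Gaussian denoiser for (7) and extrinsic update (line 5).
# Prior: p(x_i) = (1 - rho) delta(x_i) + rho N(x_i; mu_x, var_x).
import numpy as np

def bg_denoiser(r, gamma, rho, mu_x, var_x):
    """Posterior mean/precision of x_i given r_i = x_i + N(0, gamma^{-1})."""
    v = 1.0 / gamma
    # Log-evidence of the Gaussian ("on") and spike ("off") components.
    log_on = (np.log(rho) - 0.5 * np.log(2 * np.pi * (var_x + v))
              - (r - mu_x) ** 2 / (2 * (var_x + v)))
    log_off = np.log(1 - rho) - 0.5 * np.log(2 * np.pi * v) - r ** 2 / (2 * v)
    pi = 1.0 / (1.0 + np.exp(log_off - log_on))        # P(x_i nonzero | r_i)
    # Posterior of x_i conditioned on the Gaussian component.
    m = (var_x * r + v * mu_x) / (var_x + v)
    s = var_x * v / (var_x + v)
    x_hat = pi * m                                     # (7a)
    var_post = pi * (s + m ** 2) - x_hat ** 2          # law of total variance
    eta = 1.0 / np.mean(var_post)                      # (7b)
    return x_hat, eta

def extrinsic(x_hat, eta, r_old, gamma_old):
    """Line 5 (and, symmetrically, line 10) of Algorithm 1."""
    gamma_new = eta - gamma_old
    r_new = (eta * x_hat - gamma_old * r_old) / gamma_new
    return r_new, gamma_new
```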
III. NUMERICAL RESULTS

In this section, numerical experiments are performed to verify the effectiveness of the proposed robust VAMP. The performance of the following algorithms is evaluated:

• The oracle AMP algorithm with a precisely known sensing matrix.
• The PI-AMP algorithm, which does not take the perturbation into account.
• The MU-GAMP algorithm presented in [14].
• The oracle VAMP algorithm with a precisely known sensing matrix.
• The PI-VAMP algorithm, which ignores the perturbation.
• The PC-VAMP algorithm shown in Algorithm 1, which takes the model perturbation into account.

In the numerical simulations, we assume a Bernoulli-Gaussian prior, i.e., $p(x_i) = (1 - \rho)\, \delta(x_i) + \rho\, \mathcal{N}(x_i; \mu_x, \sigma_x)$, with $\mu_x = 0$ and $\sigma_x = 1$. For the first two numerical experiments, the elements of the matrix $A$ are i.i.d. drawn from a Gaussian distribution. We assume that each deterministic $E_i$ is also drawn from a Gaussian distribution, with $M$ a fixed fraction of $N$ and $q = N$. The normalized mean square error (NMSE) is defined as

$\mathrm{NMSE}(\hat{x}) = 10 \log_{10} \left( \|x - \hat{x}\|^2 / \|x\|^2 \right),$

where $x$ denotes the true value. We also define

$\mathrm{SNR}_w \triangleq 10 \log_{10} \left( \|Ax\|^2 / \|w\|^2 \right)$ and $\mathrm{SNR}_e \triangleq 10 \log_{10} \left( \|\sum_{i=1}^{q} e_i E_i x\|^2 / \|w\|^2 \right).$

The maximum number of iterations is $K_{it} = 60$.

In the first numerical simulation, the NMSE versus iteration is presented, with $\mathrm{SNR}_w = 30$ dB and $\mathrm{SNR}_e = 20$ dB. From Fig. 2, one can see that the oracle VAMP and oracle GAMP algorithms achieve the lowest NMSE, and the oracle VAMP achieves the fastest convergence. For unknown structured perturbation, PC-VAMP works better than MU-GAMP.
Fig. 2. NMSE (dB) versus algorithm iteration in a single realization, for PC-VAMP, PI-VAMP, oracle VAMP, GAMP, MU-GAMP and oracle GAMP.
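For reference, the performance metrics used throughout this section can be computed as in the following sketch (Python/NumPy; function names are illustrative).

```python
# Sketch: NMSE in dB and the two signal-to-noise ratios SNR_w and SNR_e.
import numpy as np

def nmse_db(x_hat, x):
    return 10 * np.log10(np.sum((x - x_hat) ** 2) / np.sum(x ** 2))

def snr_db(signal, noise):
    # SNR_w: signal = A @ x;  SNR_e: signal = (sum_i e_i E_i) @ x; noise = w.
    return 10 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))
```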
In the second simulation, all the parameters are the same as in the first simulation, with $\mathrm{SNR}_w = 30$ dB fixed while $\mathrm{SNR}_e$ varies. In Fig. 3, we see that there exists an obvious performance gap between the PI-VAMP algorithm and MU-GAMP when $\mathrm{SNR}_e$ is low. Compared to the MU-GAMP algorithm, the PC-VAMP algorithm works better. When the perturbation is small, i.e., $\mathrm{SNR}_e$ is large, the performances of all the AMP and VAMP algorithms are similar.
Fig. 3. Mean NMSE (dB) versus $\mathrm{SNR}_e$ (dB) for PC-VAMP, oracle LMMSE, PI-VAMP, GAMP and MU-GAMP. The reported NMSE is averaged over independent realizations.

The last experiment investigates the performance of the PC-VAMP algorithm for real image recovery. We threshold the wavelet coefficients to obtain a sparse representation, and the Bernoulli-Gaussian prior parameters $\rho$, $\mu_x$ and $\sigma_x$ are matched to the thresholded coefficients. We use an $N \times N$ circulant matrix of the form (2) with geometrically decaying entries $a_i$, $i = 0, \cdots, N-1$, and $N = 64$; the perturbation also has the circulant structure. We use a random matrix to compress the observations. For this compressed observation model, we set $\mathrm{SNR}_w = 40$ dB and $\mathrm{SNR}_e = 20$ dB. From Fig. 4, it can be seen that PC-VAMP yields the best recovery results when the perturbation is unknown, with a PSNR of 25 dB.

Fig. 4. Image recovery results: original image; oracle VAMP, PSNR = 37 dB; GAMP, PSNR = 21 dB; MU-GAMP, PSNR = 22 dB; PI-VAMP, PSNR = 20 dB; PC-VAMP, PSNR = 25 dB.

IV. CONCLUSION
In this paper, we propose a matrix-uncertainty extension of the VAMP algorithm for the setting where structured perturbation is added to the sensing matrix. By iteratively approximating the original likelihood function with a constant covariance matrix, we obtain a modified VAMP algorithm. Numerical results demonstrate the effectiveness of the proposed algorithm.

REFERENCES
[1] A. Wiesel, Y. C. Eldar, and A. Yeredor, "Linear regression with Gaussian model uncertainty: Algorithms and bounds," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2194-2205, Jun. 2008.
[2] J. Zhu, X. Wang, X. Lin, and Y. Gu, "Maximum likelihood estimation from sign measurements with sensing matrix perturbation," IEEE Trans. Signal Process., vol. 62, no. 15, pp. 3741-3753, 2014.
[3] M. A. Herman and T. Strohmer, "General deviants: An analysis of perturbations in compressed sensing," IEEE J. Sel. Top. Signal Process., vol. 4, no. 2, pp. 342-349, 2010.
[4] J. Ding, L. Chen, and Y. Gu, "Perturbation analysis of orthogonal matching pursuit," IEEE Trans. Signal Process., vol. 61, no. 2, pp. 398-410, 2013.
[5] H. Zhu, G. Leus, and G. Giannakis, "Sparsity-cognizant total least-squares for perturbed compressive sampling," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2002-2016, May 2011.
[6] M. Rosenbaum and A. Tsybakov, "Sparse recovery under matrix uncertainty," The Annals of Statistics, vol. 38, no. 5, pp. 2620-2651, 2010.
[7] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. Natl. Acad. Sci., pp. 18914-18919, 2009.
[8] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," available at https://arxiv.org/pdf/1010.5141v2.pdf, 2012.
[9] P. Schniter, "A message-passing receiver for BICM-OFDM over unknown clustered-sparse channels," IEEE J. Sel. Top. Signal Process., vol. 5, no. 8, pp. 1462-1474, Dec. 2011.
[10] J. Ma and L. Ping, "Orthogonal AMP," IEEE Access, vol. 5, pp. 2020-2033, Jan. 2017.
[11] S. Rangan, P. Schniter, and A. Fletcher, "Vector approximate message passing," available at https://arxiv.org/pdf/1610.03082.pdf.
[12] J. Vila and P. Schniter, "Expectation-maximization Gaussian-mixture approximate message passing," IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658-4672, Oct. 2013.
[13] A. K. Fletcher and P. Schniter, "Learning and free energies for vector approximate message passing," available at https://arxiv.org/pdf/1602.08207.pdf.
[14] J. T. Parker, V. Cevher, and P. Schniter, "Compressive sensing under matrix uncertainties: An approximate message passing approach," in Proc. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 2011, pp. 804-808.
[15] F. Krzakala, M. Mézard, and L. Zdeborová, "Compressed sensing under matrix uncertainty: Optimum thresholds and robust approximate message passing," in Proc. ICASSP, 2013.
[16] P. C. Hansen, J. G. Nagy, and D. P. O'Leary, Deblurring Images: Matrices, Spectra, and Filtering. Philadelphia, PA: SIAM, 2006.
[17] A. Beck and Y. C. Eldar, "Structured total maximum likelihood: An alternative to structured total least squares," SIAM J. Matrix Anal. Appl., vol. 31, no. 5, pp. 2623-2649, 2010.