Asymptotic Analysis of MAP Estimation via the Replica Method and Applications to Compressed Sensing
Sundeep Rangan, Alyson K. Fletcher, and Vivek K Goyal
Abstract—The replica method is a non-rigorous but well-known technique from statistical physics used in the asymptotic analysis of large, random, nonlinear problems. This paper applies the replica method, under the assumption of replica symmetry, to study estimators that are maximum a posteriori (MAP) under a postulated prior distribution. It is shown that with random linear measurements and Gaussian noise, the replica-symmetric prediction of the asymptotic behavior of the postulated MAP estimate of an n-dimensional vector "decouples" as n scalar postulated MAP estimators. The result is based on applying a hardening argument to the replica analysis of postulated posterior mean estimators of Tanaka and of Guo and Verdú. The replica-symmetric postulated MAP analysis can be readily applied to many estimators used in compressed sensing, including basis pursuit, lasso, linear estimation with thresholding, and zero norm-regularized estimation. In the case of lasso estimation the scalar estimator reduces to a soft-thresholding operator, and for zero norm-regularized estimation it reduces to a hard threshold. Among other benefits, the replica method provides a computationally-tractable method for precisely predicting various performance metrics including mean-squared error and sparsity pattern recovery probability.

Index Terms—Compressed sensing, Laplace's method, large deviations, least absolute shrinkage and selection operator (lasso), nonlinear estimation, non-Gaussian estimation, random matrices, sparsity, spin glasses, statistical mechanics, thresholding
I. INTRODUCTION
Estimating a vector x ∈ R^n from measurements of the form

y = Φx + w,   (1)

where Φ ∈ R^{m×n} represents a known measurement matrix and w ∈ R^m represents measurement errors or noise, is a generic problem that arises in a range of circumstances. When the noise w is i.i.d. zero-mean Gaussian with variance σ² and x is i.i.d. with components x_j having a probability distribution function p(x_j), the maximum a posteriori (MAP) estimate is given by

x̂_pmap(y) = argmin_{x ∈ R^n} (1/(2σ²)) ‖y − Φx‖² + Σ_{j=1}^n f(x_j),   (2)

where f(x_j) = −log p(x_j). Estimators of the form (2) are also used with the regularization function f(x_j) or noise level parameter σ² not matching the true prior or noise level, either because those quantities are not known or because the optimization in (2) using the true values is too difficult to compute. In such cases, the estimator (2) can be interpreted as a MAP estimate for a postulated distribution and noise level, and we will thus call estimators of the form (2) postulated MAP estimators.

Due to their prevalence, characterizing the behavior of postulated MAP estimators is of interest in a wide range of applications. However, for most regularization functions f(·), the postulated MAP estimator (2) is nonlinear and not easy to analyze.

This material is based upon work supported in part by a University of California President's Postdoctoral Fellowship and the National Science Foundation under CAREER Grant No. 0643836. S. Rangan (email: [email protected]) is with the Department of Electrical and Computer Engineering, Polytechnic Institute of New York University. A. K. Fletcher (email: [email protected]) is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. V. K. Goyal (email: [email protected]) is with the Department of Electrical Engineering and Computer Science and the Research Laboratory of Electronics, Massachusetts Institute of Technology.
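Returning to the estimator (2): for a small problem instance, the postulated MAP estimate can be computed by direct numerical minimization of the objective. The sketch below is only an illustration of the definition, not a method from this paper; the problem sizes, the choice of `scipy.optimize.minimize` with the derivative-free Powell method, and the function name `map_objective` are all our own. It takes f(x) = |x|, corresponding to a Laplacian postulated prior (the lasso-type estimator discussed later in the paper).

```python
import numpy as np
from scipy.optimize import minimize

def map_objective(x, y, Phi, sigma_sq, f):
    # Objective of eq. (2): ||y - Phi x||^2 / (2 sigma^2) + sum_j f(x_j)
    r = y - Phi @ x
    return r @ r / (2 * sigma_sq) + np.sum(f(x))

rng = np.random.default_rng(0)
n, m, sigma_sq = 8, 16, 0.1
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
x_true = rng.laplace(size=n)
y = Phi @ x_true + np.sqrt(sigma_sq) * rng.standard_normal(m)

f = np.abs   # f(x) = -log p(x) + const for a Laplacian postulated prior
res = minimize(map_objective, np.zeros(n), args=(y, Phi, sigma_sq, f),
               method='Powell')   # derivative-free; |x| is nonsmooth
x_map = res.x
```

For large n this direct minimization becomes the difficult, coupled problem described in the text; the point of the replica analysis is to predict the behavior of such estimates without computing them.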
Even if, for the purpose of analysis, one assumes separable priors on x and w, the analysis of the postulated MAP estimate may be difficult since the matrix Φ couples the n unknown components of x with the m measurements in the vector y.

This paper provides a general analysis of postulated MAP estimators based on the replica method—a non-rigorous but widely-used method from statistical physics for analyzing large random systems. It is shown that, under a key assumption of replica symmetry described below, the replica method predicts that with certain large random Φ and Gaussian w, there is an asymptotic decoupling of the vector postulated MAP estimate (2) into n scalar MAP estimators. Specifically, the replica method predicts that the joint distribution of each component x_j of x and its corresponding component x̂_j in the estimate vector x̂_pmap(y) is asymptotically identical to the outputs of a simple system where x̂_j is a postulated MAP estimate of the scalar random variable x_j observed in Gaussian noise. Using this scalar equivalent model, one can then readily compute the asymptotic joint distribution of (x_j, x̂_j) for any component j.

The replica method's non-rigorous but simple prescription for computing the asymptotic joint componentwise distributions has three key, attractive features:

• Sharp predictions:
Most importantly, the replica method provides—under the assumption of the replica hypotheses—not just bounds, but sharp predictions of the asymptotic behavior of postulated MAP estimators. From the joint distribution, various further computations
can be made to provide precise predictions of quantities such as the mean-squared error (MSE) and the error probability of any componentwise hypothesis test computed from a postulated MAP estimate.

• Computational tractability:
Since the scalar equivalent model involves only a scalar random variable x_j, scalar Gaussian noise, and a scalar postulated MAP estimate x̂_j, any quantity derived from the joint distribution can be computed numerically from one- or two-dimensional integrals.

• Generality:
The replica analysis can incorporate arbitrary separable distributions on x and regularization functions f(·). It thus applies to a large class of estimators and test scenarios.

A. Replica Method and Contributions of this Work
The replica method was originally developed by Edwards and Anderson [1] to study the statistical mechanics of spin glasses. Although not fully rigorous from the perspective of probability theory, the technique was able to provide explicit solutions for a range of complex problems where many other methods had previously failed. Indeed, the replica method and related ideas from statistical mechanics have found success in a number of classic NP-hard problems including the traveling salesman problem [2], graph partitioning [3], k-SAT [4] and others [5]. Statistical physics methods have also been applied to the study of error correcting codes [6], [7]. There are now several general texts on the replica method [8]–[11].

The replica method was first applied to the study of nonlinear MAP estimation problems by Tanaka [12]. That work applied what is called a replica symmetric analysis to multiuser detection for large CDMA systems with random spreading sequences. Müller [13] considered a mathematically-similar problem for MIMO communication systems. In the context of the estimation problem considered here, Tanaka's and Müller's papers essentially characterized the behavior of the MAP estimator of a vector x with i.i.d. binary components observed through linear measurements of the form (1) with a large random Φ and Gaussian w.

Tanaka's results were then generalized in a remarkable paper by Guo and Verdú [14] to vectors x with arbitrary separable distributions. Guo and Verdú's result was also able to incorporate a large class of postulated minimum mean-squared error (MMSE) estimators, where the estimator may assume a prior that is different from the actual prior. Replica analyses have also been applied to related communication problems such as lattice precoding for the Gaussian broadcast channel [15].
A brief review of the replica method analysis by Tanaka [12] and Guo and Verdú [14] is provided in Appendix A.

The result in this paper is derived from Guo and Verdú [14] by a standard hardening argument. Specifically, the postulated MAP estimator (2) is first expressed as a limit of the postulated MMSE estimators analyzed in [14]. Then, the behavior of the postulated MAP estimator can be derived by taking appropriate limits of the results in [14] on postulated MMSE estimators. This hardening technique is well-known and is used in Tanaka's original work [12] in the analysis of MAP estimators with binary and Gaussian priors.

Through the limiting analysis via hardening, the postulated MAP results here follow from the postulated MMSE results in [14]. Thus, the central contribution of this work is to work out these limits to provide a set of equations for a general class of postulated MAP estimators. In particular, while Tanaka has derived the equations for replica predictions of MAP estimates for binary and Gaussian priors, the results here provide explicit equations for general priors and regularization functions.

B. Replica Assumptions
The non-rigorous aspect of the replica method involves a set of assumptions that include a self-averaging property, the validity of a "replica trick," and the ability to exchange certain limits. Importantly, this work is based on an additional strong assumption of replica symmetry. As described in Appendix A, the replica method reduces the calculation of a certain free energy to an optimization problem over covariance matrices. The replica symmetric (RS) assumption is that the maxima in this optimization satisfy certain symmetry properties. This RS assumption is not always valid, and indeed Appendix A provides several examples from other applications of the replica method where replica symmetry breaking (RSB) solutions are known to be needed to provide correct predictions.

For the analysis of postulated MMSE estimators, [12] and [14] derive analytic conditions for the validity of the RS assumption only in some limited cases. Our analysis of postulated MAP estimators depends on [14], and, unfortunately, we have not provided a general analytic test for the validity of the RS assumption in this work. Following [14], our approach instead is to compare, where possible, the predictions under the RS assumption to numerical simulations of the postulated MAP estimator. As we will see in Section VI, the RS predictions appear to be accurate, at least for many common estimators arising in compressed sensing. That being said, the RS analysis can also provide predictions for optimal MMSE and zero norm-regularized estimators that cannot be simulated tractably. Extra caution must be applied in assuming the validity of the RS predictions for these estimators.

To emphasize our dependence on these unproven assumptions—notably replica symmetry—we will refer to the general MMSE analysis in Guo and Verdú's work [14] as the replica symmetric postulated MMSE decoupling property. Our main result will be called the replica symmetric postulated MAP decoupling property.

C. Connections to Belief Propagation
Although not explored in this work, it is important to point out that the results of the replica analysis of postulated MMSE and MAP estimation are similar to those derived for belief propagation (BP) estimation. Specifically, there is now a large body of work analyzing BP and approximate BP algorithms for estimation of vectors x observed through linear measurements of the form (1) with large random Φ. For both certain large sparse random matrices [16]–[22], and more recently for certain large dense random matrices [23]–[26], several results now show that BP estimates exhibit an asymptotic decoupling property similar to RS predictions for postulated MMSE and
MAP estimators. Graphical model arguments have also been used to establish a decoupling property under a very general, random sparse observation model [27].

The effective noise level in the scalar equivalent model for BP and approximate BP methods can be predicted by certain state evolution equations similar to density evolution analysis of BP decoding of LDPC codes [28], [29]. It turns out that in several cases, the fixed-point equations for state evolution are identical to the equations for the effective noise level predicted by the RS analysis of postulated MMSE and MAP estimators. In particular, the equations in [23], [24] agree exactly with the RS predictions for lasso estimation given in this work.

These connections are significant in several regards. Firstly, the state evolution analysis of BP algorithms can be made fully rigorous under suitable assumptions and thus provides an independent, rigorous justification for some of the RS claims. Secondly, the replica method provides only an analysis of estimators, but no method to actually compute those estimators. In contrast, the BP and approximate BP algorithms provide a possible tractable method for achieving the performance predicted by the replica method. Finally, the BP analysis provides an algorithmic intuition as to why decoupling may occur (and hence when replica symmetry may be valid): As described in [30], BP and approximate BP algorithms can be seen as iterative procedures where the vector estimation problem is reduced to a sequence of "decoupled" scalar estimation problems. This decoupling is based essentially on the principle that, in each iteration, when estimating one component x_j, the uncertainty in the other components {x_k, k ≠ j} can be aggregated as Gaussian noise. Based on the state evolution analysis of BP algorithms, we know that this Central Limit Theorem-based approximation is asymptotically valid when the components of the mixing matrix Φ are sufficiently dense and independent.
Thus, the validity of RS is possibly connected to the validity of this Gaussian approximation.

D. Applications to Compressed Sensing
As an application of our main result, we will develop a few analyses of estimation problems that arise in compressed sensing [31]–[33]. In compressed sensing, one estimates a sparse vector x from random linear measurements. A vector x is sparse when its number of nonzero entries k is smaller than its length n. Generically, optimal estimation of x with a sparse prior is NP-hard [34]. Thus, most attention has focused on greedy heuristics such as matching pursuit [35]–[38] and convex relaxations such as basis pursuit [39] or lasso [40]. While successful in practice, these algorithms are difficult to analyze precisely.

Compressed sensing of sparse x through (1) (using inner products with rows of Φ) is mathematically identical to sparse approximation of y with respect to columns of Φ. An important set of results for both sparse approximation and compressed sensing are the deterministic conditions on the coherence of Φ that are sufficient to guarantee good performance of the suboptimal methods mentioned above [41]–[43]. These conditions can be satisfied with high probability for certain large random measurement matrices. Compressed sensing has provided many sufficient conditions that are easier to satisfy than the initial coherence-based conditions. However, despite this progress, the exact performance of most sparse estimators is still not known precisely, even in the asymptotic case of large random measurement matrices. Most results describe the estimation performance via bounds, and the tightness of these bounds is generally not known.

There are, of course, notable exceptions including [44] and [45], which provide matching necessary and sufficient conditions for recovery of strictly sparse vectors with basis pursuit and lasso.
However, even these results only consider exact recovery and are limited to measurements that are noise-free or measurements with a signal-to-noise ratio (SNR) that scales to infinity.

Many common sparse estimators can be seen as MAP estimators with certain postulated priors. Most importantly, lasso and basis pursuit are MAP estimators assuming a Laplacian prior. Other commonly-used sparse estimation algorithms, including linear estimation with and without thresholding and zero norm-regularized estimators, can also be seen as postulated MAP-based estimators. For these postulated MAP-based sparse estimation algorithms, the replica method can provide non-rigorous but sharp, easily-computable predictions for the asymptotic behavior. In the context of compressed sensing, this analysis can predict various performance metrics such as MSE or fraction of support recovery. The expressions can apply to arbitrary ratios k/n, n/m, and SNR. Due to the generality of the replica analysis, the methodology can also incorporate arbitrary distributions on x including several sparsity models, such as Laplacian, generalized Gaussian, and Gaussian mixture priors. Discrete distributions can also be studied.

It should be pointed out that this work is not the first to use ideas from statistical physics for the study of sparse estimation. Guo, Baron and Shamai [46] have provided a replica analysis of compressed sensing that characterizes not just the postulated MAP or postulated MMSE estimate, but the asymptotic posterior marginal distribution. That work also shows an independence property across finite sets of components. Merhav, Guo and Shamai [47] consider, among other applications, the estimation of a sparse vector x from measurements of the form y = x + w. In their model, there is no measurement matrix such as Φ in (1), but the components of x are possibly correlated. Their work derives explicit expressions for the MMSE as a function of the probability distribution on the number of nonzero components.
The analysis does not rely on replica assumptions and is fully rigorous. More recently, Kabashima, Wadayama and Tanaka [48] have used the replica method to derive precise conditions on which sparse signals can be recovered with ℓp-based relaxations such as lasso. Their analysis does not consider noise, but can find conditions on recovery of the entire vector x, not just individual components. Also, using free probability theory [49], [50], a recent analysis [51] extends the replica analysis of compressed sensing to larger classes of matrices, including matrices Φ that are possibly not i.i.d.
E. Outline
The remainder of the paper is organized as follows. The precise estimation problem is described in Section II. We review the RS postulated MMSE decoupling property of Guo and Verdú in Section III. We then present our main result, an RS postulated MAP decoupling property, in Section IV. The results are applied to the analysis of compressed sensing algorithms in Section V, which is followed by numerical simulations in Section VI. Conclusions and possible avenues for future work are given in Section VII. The proof of the main result is somewhat long and given in a set of appendices; Appendix B provides an overview of the proof and a guide through the appendices with detailed arguments.

II. ESTIMATION PROBLEM AND ASSUMPTIONS
Consider the estimation of a random vector x ∈ R^n from linear measurements of the form

y = Φx + w = AS^{1/2}x + w,   (3)

where y ∈ R^m is a vector of observations; Φ = AS^{1/2}, with A ∈ R^{m×n}, is a measurement matrix; S is a diagonal matrix of positive scale factors,

S = diag(s_1, . . . , s_n),   s_j > 0,   (4)

and w ∈ R^m is zero-mean, white Gaussian noise. We consider a sequence of such problems indexed by n, with n → ∞. For each n, the problem is to determine an estimate x̂ of x from the observations y knowing the measurement matrix A and scale factor matrix S.

The components x_j of x are modeled as zero mean and i.i.d. with some prior probability distribution p_0(x_j). The per-component variance of the Gaussian noise is E|w_j|² = σ_0². We use the subscript "0" on the prior and noise level to differentiate these quantities from certain "postulated" values to be defined later. When we develop applications in Section V, the prior p_0(x_j) will incorporate presumed sparsity of x.

In (3), we have factored Φ = AS^{1/2} so that even with the i.i.d. assumption on {x_j}_{j=1}^n above and an i.i.d. assumption on entries of A, the model can capture variations in powers of the components of x that are known a priori at the estimator. Specifically, multiplication by S^{1/2} scales the variance of the j-th component of x by a factor s_j. Variations in the power of x that are not known to the estimator should be captured in the distribution of x.

We summarize the situation and make additional assumptions to specify the problem precisely as follows:

(a) The number of measurements m = m(n) is a deterministic quantity that varies with n and satisfies

lim_{n→∞} n/m(n) = β

for some β ≥ 0. (The dependence of m on n is usually omitted for brevity.)

(b) The components x_j of x are i.i.d. with probability distribution p_0(x_j). All moments of x_j are finite.

(c) The noise w is Gaussian with w ∼ N(0, σ_0² I_m).

(d) The components of the matrix A are i.i.d.
and distributed as A_ij ∼ (1/√m) A for some random variable A with zero mean, unit variance, and all other moments of A finite.

(e) The scale factors s_j are i.i.d., satisfy s_j > 0 almost surely, and all moments of s_j are finite.

(f) The scale factor matrix S, measurement matrix A, vector x, and noise w are all independent.

III. REVIEW OF THE REPLICA SYMMETRIC POSTULATED MMSE DECOUPLING PROPERTY
We begin by reviewing the RS postulated MMSE decoupling property of Guo and Verdú [14].
A. Postulated MMSE Estimators
To define the concept of a postulated MMSE estimator, suppose one is given a "postulated" prior distribution p_post and a postulated noise level σ_post² that may be different from the true values p_0 and σ_0². We define the postulated minimum MSE (PMMSE) estimate of x as

x̂_pmmse(y) = E(x | y; p_post, σ_post²) = ∫ x p_{x|y}(x | y; p_post, σ_post²) dx,   (5)

where p_{x|y}(x | y; q, σ²) is the conditional distribution of x given y under the x distribution q and noise variance σ² specified as parameters after the semicolon. We will use this sort of notation throughout the rest of the paper, including the use of p without a subscript for the p.d.f. of the scalar or vector quantity understood from context. In this case, due to the Gaussianity of the noise, we have

p_{x|y}(x | y; q, σ²) = C^{−1} exp( −(1/(2σ²)) ‖y − AS^{1/2}x‖² ) q(x),   (6)

where the normalization constant is

C = ∫ exp( −(1/(2σ²)) ‖y − AS^{1/2}x‖² ) q(x) dx

and q(x) is the joint p.d.f.

q(x) = ∏_{j=1}^n q(x_j).

In the case when p_post = p_0 and σ_post² = σ_0², so that the postulated and true values agree, the PMMSE estimate reduces to the true MMSE estimate.

B. Decoupling under Replica Symmetric Assumption
The essence of the RS PMMSE decoupling property is that the asymptotic behavior of the PMMSE estimator is described by an equivalent scalar estimator. Let q(x) be a probability distribution defined on some set X ⊆ R. Given μ > 0, let p_{x|z}(x | z; q, μ) be the conditional distribution

p_{x|z}(x | z; q, μ) = [ ∫_{x∈X} φ(z − x; μ) q(x) dx ]^{−1} φ(z − x; μ) q(x),   (7)

where φ(·) is the Gaussian distribution

φ(v; μ) = (1/√(2πμ)) e^{−|v|²/(2μ)}.   (8)

The distribution p_{x|z}(x | z; q, μ) is the conditional distribution of the scalar random variable x ∼ q(x) given an observation of the form

z = x + √μ v,   (9)

where v ∼ N(0, 1). Using this distribution, we can define the scalar conditional MMSE estimate

x̂_scalar^pmmse(z; q, μ) = ∫_{x∈X} x p_{x|z}(x | z; q, μ) dx.   (10)

Also, given two distributions p_1(x) and p_2(x) and two noise levels μ_1 > 0 and μ_2 > 0, define

mse(p_1, p_2, μ_1, μ_2, z) = ∫_{x∈X} |x − x̂_scalar^pmmse(z; p_1, μ_1)|² p_{x|z}(x | z; p_2, μ_2) dx,   (11)

which is the MSE in estimating the scalar x from the variable z in (9) when x has a true distribution x ∼ p_2(x) and the noise level is μ = μ_2, but the estimator assumes a distribution x ∼ p_1(x) and noise level μ = μ_1.

Replica Symmetric Postulated MMSE Decoupling Property [14]:
Consider the estimation problem in Section II. Let x̂_pmmse(y) be the PMMSE estimator based on a postulated prior p_post and postulated noise level σ_post². For each n, let j = j(n) be some deterministic component index with j(n) ∈ {1, . . . , n}. Then under replica symmetry, there exist effective noise levels σ_eff² and σ_p−eff² such that:

(a) As n → ∞, the random vectors (x_j, s_j, x̂_j^pmmse) converge in distribution to the random vector (x, s, x̂) consistent with the block diagram in Fig. 1. Here x, s, and v are independent with x ∼ p_0(x), s ∼ p_S(s), v ∼ N(0, 1), and

x̂ = x̂_scalar^pmmse(z; p_post, μ_p),   (12a)
z = x + √μ v,   (12b)

where μ = σ_eff²/s and μ_p = σ_p−eff²/s.

(b) The effective noise levels satisfy the equations

σ_eff² = σ_0² + β E[ s · mse(p_post, p_0, μ_p, μ, z) ],   (13a)
σ_p−eff² = σ_post² + β E[ s · mse(p_post, p_post, μ_p, μ_p, z) ],   (13b)

where the expectations are taken over s ∼ p_S(s) and z generated by (12b).

This result asserts that the asymptotic behavior of the joint estimation of the n-dimensional vector x can be described by n equivalent scalar estimators. In the scalar estimation problem, a component x ∼ p_0(x) is corrupted by additive Gaussian noise yielding a noisy measurement z. The additive noise variance is μ = σ_eff²/s, which is the effective noise level divided by the scale factor s. The estimate of that component is then described by the (generally nonlinear) scalar estimator x̂_scalar^pmmse(z; p_post, μ_p).

Fig. 1. Equivalent scalar model for the estimator behavior predicted by the replica symmetric postulated MMSE decoupling property.
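As a concrete illustration of the computational tractability noted in Section I, the scalar estimate (10) and the fixed-point equations (13) can be evaluated with one-dimensional numerics. The sketch below is our own illustration, not from the paper: it computes (10) by numerical integration on a grid, and iterates (13a) for the matched Gaussian case p_post = p_0 = N(0, 1), σ_post² = σ_0², s ≡ 1, where the scalar MMSE has the closed form μ/(1 + μ) and (13a) and (13b) coincide. The function names and all parameter values are our choices.

```python
import numpy as np

def phi(v, mu):
    # Gaussian density of eq. (8)
    return np.exp(-np.abs(v) ** 2 / (2 * mu)) / np.sqrt(2 * np.pi * mu)

def scalar_mmse(z, q, mu, grid):
    # Scalar conditional MMSE estimate of eq. (10) by 1-D numerical
    # integration; the uniform grid spacing cancels in the ratio.
    w = phi(z - grid, mu) * q(grid)
    return np.sum(grid * w) / np.sum(w)

def solve_sigma_eff(sigma0_sq, beta, iters=100):
    # Fixed-point iteration of eq. (13a) in the matched Gaussian case:
    # E[s * mse] reduces to mu / (1 + mu) with mu = sigma_eff^2 (s = 1).
    sig = sigma0_sq
    for _ in range(iters):
        sig = sigma0_sq + beta * sig / (1 + sig)
    return sig

# For a standard Gaussian prior, (10) has the closed form z / (1 + mu),
# which the numerical integral should reproduce.
grid = np.linspace(-20.0, 20.0, 20001)
q_gauss = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
est = scalar_mmse(1.3, q_gauss, 0.5, grid)   # analytic value: 1.3 / 1.5

sig_eff = solve_sigma_eff(sigma0_sq=0.1, beta=0.5)
```

For non-Gaussian priors the same `scalar_mmse` routine can be reused inside the expectations in (13), with the expectation over z evaluated by a second one-dimensional integral.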
The effective noise levels σ_eff² and σ_p−eff² are described by the solutions to the fixed-point equations (13). Note that σ_eff² and σ_p−eff² appear implicitly on the left- and right-hand sides of these equations via the terms μ and μ_p. In general, there is no closed-form solution to these equations. However, the expectations can be evaluated via (one-dimensional) numerical integration.

It is important to point out that there may, in general, be multiple solutions to the fixed-point equations (13). In this case, it turns out that the true solution is the minimizer of a certain Gibbs' function described in [14].

C. Effective Noise and Multiuser Efficiency
To understand the significance of the effective noise level σ_eff², it is useful to consider the following estimation problem with side information. Suppose that when estimating the component x_j an estimator is given as side information the values of all the other components {x_ℓ, ℓ ≠ j}. Then, this hypothetical estimator with side information can "subtract out" the effect of all the known components and compute

z_j = (1/(‖a_j‖² √s_j)) a_j' ( y − Σ_{ℓ≠j} √s_ℓ a_ℓ x_ℓ ),

where a_ℓ is the ℓ-th column of the measurement matrix A. It is easily checked that

z_j = (1/(‖a_j‖² √s_j)) a_j' ( √s_j a_j x_j + w ) = x_j + √μ_0 v_j,   (14)

where

v_j = (1/(σ_0 ‖a_j‖²)) a_j' w,   μ_0 = σ_0²/s_j.

Thus, (14) shows that with side information, estimation of x_j reduces to a scalar estimation problem where x_j is corrupted by additive noise √μ_0 v_j. Since w is Gaussian with mean zero and per-component variance σ_0², v_j is Gaussian with mean zero and variance 1/‖a_j‖². Also, since a_j is an m-dimensional vector whose components are i.i.d. with variance 1/m, ‖a_j‖² → 1 as m → ∞. Therefore, for large m, v_j will approach v_j ∼ N(0, 1).

Comparing (14) with (12b), we see that the equivalent scalar model predicted by the RS PMMSE decoupling property (12b) is identical to the estimation with perfect side information (14), except that the noise level is increased by a factor

1/η = μ/μ_0 = σ_eff²/σ_0².   (15)
In multiuser detection, the factor η is called the multiuser efficiency [52], [53].

The multiuser efficiency can be interpreted as a degradation in the effective signal-to-noise ratio (SNR): With perfect side information, an estimator using z_j in (14) can estimate x_j with an effective SNR of

SNR_0(s) = (1/μ_0) E|x_j|² = (s/σ_0²) E|x_j|².   (16)

In CDMA multiuser detection, the factor SNR_0(s) is called the post-despreading SNR with no multiple access interference. The RS PMMSE decoupling property shows that without side information, the effective SNR is given by

SNR(s) = (1/μ) E|x_j|² = (s/σ_eff²) E|x_j|².   (17)

Therefore, the multiuser efficiency η in (15) is the ratio of the effective SNR with and without perfect side information.

IV. ANALYSIS OF POSTULATED MAP ESTIMATORS VIA HARDENING
The main result of the paper is developed in this section.
A. Postulated MAP Estimators
Let X ⊆ R be some (measurable) set and consider an estimator of the form

x̂_pmap(y) = argmin_{x ∈ X^n} (1/(2γ)) ‖y − AS^{1/2}x‖² + Σ_{j=1}^n f(x_j),   (18)

where γ > 0 is an algorithm parameter and f : X → R is some scalar-valued, nonnegative cost function. We will assume that the objective function in (18) has a unique essential minimizer for almost all y.

The estimator (18) can be interpreted as a MAP estimator. To see this, suppose that for u sufficiently large,

∫_{x∈X^n} e^{−u f(x)} dx < ∞,   (19)

where we have extended the notation f(·) to vector arguments such that

f(x) = Σ_{j=1}^n f(x_j).   (20)

When (19) is satisfied, we can define a prior probability distribution depending on u:

p_u(x) = [ ∫_{x∈X^n} exp(−u f(x)) dx ]^{−1} exp(−u f(x)).   (21)

Also, let

σ_u² = γ/u.   (22)

Substituting (21) and (22) into (6), we see that

p_{x|y}(x | y; p_u, σ_u²) = C_u exp[ −u( (1/(2γ)) ‖y − AS^{1/2}x‖² + f(x) ) ]   (23)

for some constant C_u that does not depend on x. (The scaling of the noise variance along with p_u enables the factorization in the exponent of (23).) Comparing to (18), we see that

x̂_pmap(y) = argmax_{x ∈ X^n} p_{x|y}(x | y; p_u, σ_u²).

Thus for all sufficiently large u, we indeed have a MAP estimate—assuming the prior p_u and noise level σ_u².

B. Decoupling under Replica Symmetric Assumption
To analyze the postulated MAP (PMAP) estimator, we consider a sequence of postulated MMSE estimators indexed by u. For each u, let

x̂_u(y) = E(x | y; p_u, σ_u²),   (24)

which is the MMSE estimator of x under the postulated prior p_u in (21) and noise level σ_u² in (22). Using a standard large deviations argument, one can show that under suitable conditions

lim_{u→∞} x̂_u(y) = x̂_pmap(y)

for all y. A formal proof is given in Appendix D (see Lemma 4). Under the assumption that the behaviors of the postulated MMSE estimators are described by the RS PMMSE decoupling property, we can then extrapolate the behavior of the postulated MAP estimator. This will yield our main result.

In statistical physics the parameter u has the interpretation of inverse temperature (see a general discussion in [54]). Thus, the limit as u → ∞ can be interpreted as a cooling or "hardening" of the system.

In preparation for the main result, define the scalar MAP estimator

x̂_scalar^pmap(z; λ) = argmin_{x∈X} F(x, z, λ),   (25)

where

F(x, z, λ) = (1/(2λ)) |z − x|² + f(x).   (26)

The estimator (25) plays a similar role as the scalar MMSE estimator (10).

The main result pertains to the estimator (18) applied to the sequence of estimation problems defined in Section II. Our assumptions are as follows:

Assumption 1:
For all u > 0 sufficiently large, assume that the postulated MMSE estimator (5) with the postulated prior p_u in (21) and postulated noise level σ_u² in (22) satisfies the RS PMMSE decoupling property in Section III-B.

Assumption 2:
Let σ_eff²(u) and σ_p−eff²(u) be the effective noise levels when using the postulated prior p_u and noise level σ_u². Assume the following limits exist:

σ_eff,map² = lim_{u→∞} σ_eff²(u),
γ_p = lim_{u→∞} u σ_p−eff²(u).

Assumption 3:
Suppose for each n, x̂_j^u(n) is the MMSE estimate of the component x_j for some index j ∈ {1, . . . , n} based on the postulated prior p_u and postulated noise level σ_u². Then, assume that limits can be interchanged to give the following equality:

lim_{u→∞} lim_{n→∞} x̂_j^u(n) = lim_{n→∞} lim_{u→∞} x̂_j^u(n),

where the limits are in distribution.

Assumption 4:
For every n, A, and S, assume that for almost all y, the minimization in (18) achieves a unique essential minimum. Here, essential should be understood in the standard measure-theoretic sense in that the minimum and essential infimum agree.

Assumption 5:
Assume that f(x) is nonnegative and satisfies

lim_{|x|→∞} f(x)/log|x| = ∞,

where the limit must hold over all sequences in X with |x| → ∞. If X is compact, this limit is automatically satisfied (since there are no sequences in X with |x| → ∞).

Assumption 6:
For all λ ∈ R and almost all z, the minimization in (25) has a unique, essential minimum. Moreover, for all λ and almost all z, there exists a σ²(z, λ) such that

lim_{x→x̂} |x − x̂|² / [2(F(x, z, λ) − F(x̂, z, λ))] = σ²(z, λ), (28)

where x̂ = x̂^pmap_scalar(z; λ).

Assumption 1 is simply stated to again point out that we are assuming the validity of replica symmetry for the postulated MMSE estimates. We make the additional Assumptions 2 and 3, which are also difficult to verify but similar in spirit. Taken together, Assumptions 1–3 reflect the main limitations of the replica symmetric analysis and precisely state the manner in which the analysis is non-rigorous.

Assumptions 4–6 are technical conditions on the existence and uniqueness of the MAP estimate. Assumption 4 will be true for any strictly convex regularization f(x_j), although it is difficult to verify in the non-convex case. The other two assumptions, Assumptions 5 and 6, will be verified for the problems of interest. In fact, we will explicitly calculate σ²(z, λ).

We can now state our extension of the RS PMMSE decoupling property.

Replica Symmetric Postulated MAP Decoupling Property:
Consider the estimation problem in Section II. Let x̂_pmap(y) be the postulated MAP estimator (18) defined for some f(x) and γ > 0 satisfying Assumptions 1–6. For each n, let j = j(n) be some deterministic component index with j(n) ∈ {1, ..., n}. Then under replica symmetry (as part of Assumption 1):

(a) As n → ∞, the random vectors (x_j, s_j, x̂^pmap_j) converge in distribution to the random vector (x, s, x̂) consistent with the block diagram in Fig. 2 for the limiting effective noise levels σ²_eff,map and γ_p in Assumption 2. Here x, s, and v are independent with x ∼ p(x), s ∼ p_S(s), v ∼ N(0, 1), and

x̂ = x̂^pmap_scalar(z; λ_p), (29a)
z = x + √µ v, (29b)

where µ = σ²_eff,map/s and λ_p = γ_p/s.

Fig. 2. Equivalent scalar model for the estimator behavior predicted by the replica symmetric postulated MAP decoupling property.

(b) The limiting effective noise levels σ²_eff,map and γ_p satisfy the equations

σ²_eff,map = σ² + β E[s |x − x̂|²], (30a)
γ_p = γ + β E[s σ²(z, λ_p)], (30b)

where the expectations are taken over x ∼ p(x), s ∼ p_S(s), and v ∼ N(0, 1), with x̂ and z defined in (29).

Proof:
See Appendices B–F.

Analogously to the RS PMMSE decoupling property, the RS PMAP decoupling property asserts that the asymptotic behavior of the PMAP estimate of any single component of x is described by a simple equivalent scalar estimator. In the equivalent scalar model, the component of the true vector x is corrupted by Gaussian noise and the estimate of that component is given by a scalar PMAP estimate of the component from the noise-corrupted version.

V. ANALYSIS OF COMPRESSED SENSING
Our results thus far hold for any separable distribution for x (see Section II) and under mild conditions on the cost function f (see especially Assumption 5, but other assumptions also implicitly constrain f). In this section, we provide additional details on replica analysis for choices of f that yield PMAP estimators relevant to compressed sensing. Since the role of f is to determine the estimator, this is not the same as choosing sparse priors for x. Numerical evaluations of asymptotic performance with sparse priors for x are given in Section VI.

A. Linear Estimation
We first apply the RS PMAP decoupling property to the simple case of linear estimation. Linear estimators only use second-order statistics and generally do not directly exploit sparsity or other aspects of the distribution of the unknown vector x. Nonetheless, for sparse estimation problems, linear estimators can be used as a first step in estimation, followed by thresholding or other nonlinear operations [55], [56]. It is therefore worthwhile to analyze the behavior of linear estimators even in the context of sparse priors.

The asymptotic behavior of linear estimators with large random measurement matrices is well known. For example, using the Marčenko-Pastur theorem [57], Verdú and Shamai [58] characterized the behavior of linear estimators with large i.i.d. matrices A and constant scale factors S = I. Tse and Hanly [59] extended the analysis to general S. Guo and
Verdú [14] showed that both of these results can be recovered as special cases of the general RS PMMSE decoupling property. We show here that the RS PMAP decoupling property can also recover these results. Although the calculations are very similar to [14], and indeed we arrive at precisely the same results, walking through the computations will illustrate how the RS PMAP decoupling property is used.

To simplify the notation, suppose that the true prior on x is such that each component has zero mean and unit variance. Choose the cost function

f(x) = |x|²/2,

which corresponds to the negative log of a Gaussian prior also with zero mean and unit variance. With this cost function, the PMAP estimator (18) reduces to the linear estimator

x̂_pmap(y) = S^{1/2} A′(ASA′ + γI)^{−1} y. (31)

When γ = σ², the true noise variance, the estimator (31) is the linear MMSE estimate.

Now, let us compute the effective noise levels from the RS PMAP decoupling property. First note that F(x, z, λ) in (26) is given by

F(x, z, λ) = |z − x|²/(2λ) + |x|²/2,

and therefore the scalar MAP estimator in (25) is given by

x̂^pmap_scalar(z; λ) = z/(1 + λ). (32)

A simple calculation also shows that σ²(z, λ) in (28) is given by

σ²(z, λ) = λ/(1 + λ). (33)

As in part (a) of the RS PMAP decoupling property, let µ = σ²_eff,map/s and λ_p = γ_p/s. Observe that

E[s |x − x̂^pmap_scalar(z; λ_p)|²]
(a) = E[s |x − z/(1 + λ_p)|²]
(b) = E[s |λ_p x/(1 + λ_p) − √µ v/(1 + λ_p)|²]
(c) = s(λ_p² + µ)/(1 + λ_p)², (34)

where (a) follows from (32); (b) follows from (29b); and (c) follows from the fact that x and v are uncorrelated with zero mean and unit variance. Substituting (33) and (34) into the fixed-point equations (30), we see that the limiting noise levels σ²_eff,map and γ_p must satisfy

σ²_eff,map = σ² + β E[s(λ_p² + µ)/(1 + λ_p)²],
γ_p = γ + β E[s λ_p/(1 + λ_p)],

where the expectation is over s ∼ p_S(s). In the case when γ = σ², it can be verified that a solution to these fixed-point equations is σ²_eff,map = γ_p, which results in µ = λ_p and

σ²_eff,map = σ² + β E[s λ_p/(1 + λ_p)] = σ² + β E[s σ²_eff,map/(s + σ²_eff,map)]. (35)

The expression (35) is precisely the Tse-Hanly formula [59] for the effective interference. Given a distribution on s, this expression can be solved numerically for σ²_eff,map. In the special case of constant s, (35) reduces to Verdú and Shamai's result in [60] and can be solved via a quadratic equation.

The RS PMAP decoupling property now states that for any component index j, the asymptotic joint distribution of (x_j, s_j, x̂_j) is described by x_j corrupted by additive Gaussian noise with variance σ²_eff,map/s followed by a scalar linear estimator.

As described in [14], the above analysis can also be applied to other linear estimators including the matched filter (where γ → ∞) or the decorrelating receiver (γ → 0).

B. Lasso Estimation
We next consider lasso estimation, which is widely used for estimation of sparse vectors. The lasso estimate [40] (sometimes referred to as basis pursuit denoising [39]) is given by

x̂_lasso(y) = arg min_{x ∈ R^n} (1/(2γ)) ‖y − AS^{1/2} x‖² + ‖x‖₁, (36)

where γ > 0 is an algorithm parameter. The estimator is essentially a least-squares estimator with an additional ‖x‖₁ regularization term to encourage sparsity in the solution. The parameter γ is selected to trade off the sparsity of the estimate with the prediction error. An appealing feature of lasso estimation is that the minimization in (36) is convex; lasso thus enables computationally-tractable algorithms for finding sparse estimates.

The lasso estimator (36) is identical to the PMAP estimator (18) with the cost function f(x) = |x|. With this cost function, F(x, z, λ) in (26) is given by

F(x, z, λ) = |z − x|²/(2λ) + |x|,

and therefore the scalar MAP estimator in (25) is given by

x̂^pmap_scalar(z; λ) = T^soft_λ(z), (37)

where T^soft_λ(z) is the soft thresholding operator

T^soft_λ(z) =
  z − λ, if z > λ;
  0, if |z| ≤ λ;
  z + λ, if z < −λ. (38)

The RS PMAP decoupling property now states that there exist effective noise levels σ²_eff,map and γ_p such that for any component index j, the random vector (x_j, s_j, x̂_j) converges in distribution to the vector (x, s, x̂) where x ∼ p(x), s ∼ p_S(s), and x̂ is given by

x̂ = T^soft_{λ_p}(z), z = x + √µ v, (39)

where v ∼ N(0, 1), λ_p = γ_p/s, and µ = σ²_eff,map/s. Hence, the asymptotic behavior of lasso has a remarkably simple description: the asymptotic distribution of the lasso estimate x̂_j of the component x_j is identical to x_j being corrupted by Gaussian noise and then soft-thresholded to yield the estimate x̂_j.

This soft-threshold description has an appealing interpretation.
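Numerically, the scalar model (39) and the fixed-point equations (30) specialized to this soft-threshold case can be iterated by Monte Carlo. The sketch below is illustrative only: it assumes constant scale factors s ≡ 1 (so λ_p = γ_p and µ = σ²_eff,map) and a Bernoulli–Gaussian prior for x, neither of which is prescribed by the analysis.

```python
import numpy as np

def soft_threshold(z, lam):
    # T^soft_lam(z) from (38): shrink z toward zero by lam
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_fixed_point(sigma2, gamma, beta, rho, iters=100, n_mc=100000, seed=0):
    """Iterate the fixed-point equations (30) for the soft-threshold case
    by Monte Carlo, assuming s = 1 and an (illustrative) Bernoulli-Gaussian
    prior: x ~ N(0,1) with probability rho, x = 0 otherwise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_mc) * (rng.random(n_mc) < rho)
    v = rng.standard_normal(n_mc)
    sig2_eff, gamma_p = sigma2, gamma
    for _ in range(iters):
        z = x + np.sqrt(sig2_eff) * v                  # z = x + sqrt(mu) v, (39)
        xhat = soft_threshold(z, gamma_p)              # scalar PMAP estimate (37)
        sig2_eff = sigma2 + beta * np.mean((x - xhat) ** 2)
        gamma_p = gamma + beta * gamma_p * np.mean(np.abs(z) > gamma_p)
    return sig2_eff, gamma_p
```

The converged sig2_eff and gamma_p can then be plugged back into the scalar model to predict quantities such as the asymptotic MSE E[(x − x̂)²].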
Consider the case when the measurement matrix A = I. In this case, the lasso estimator (36) reduces to n scalar estimates,

x̂_j = T^soft_λ(x_j + √µ v_j), j = 1, 2, ..., n, (40)

where v_j ∼ N(0, 1), λ = γ/s, and µ = σ²/s. Comparing (39) and (40), we see that the asymptotic distribution of (x_j, s_j, x̂_j) with large random A is identical to the distribution in the trivial case where A = I, except that the noise levels γ and σ² are replaced by effective noise levels γ_p and σ²_eff,map.

To calculate the effective noise levels, one can perform a simple calculation to show that σ²(z, λ) in (28) is given by

σ²(z, λ) = λ, if |z| > λ; 0, if |z| ≤ λ. (41)

Hence,

E[s σ²(z, λ_p)] = E[s λ_p Pr(|z| > λ_p)] = γ_p Pr(|z| > γ_p/s), (42)

where we have used the fact that λ_p = γ_p/s. Substituting (37) and (42) into (30), we obtain the fixed-point equations

σ²_eff,map = σ² + β E[s |x − T^soft_{λ_p}(z)|²], (43a)
γ_p = γ + β γ_p Pr(|z| > γ_p/s), (43b)

where the expectations are taken with respect to x ∼ p(x), s ∼ p_S(s), and z in (39). Again, while these fixed-point equations do not have a closed-form solution, they can be relatively easily solved numerically given distributions of x and s.

C. Zero Norm-Regularized Estimation
Lasso can be regarded as a convex relaxation of the zero norm-regularized estimator

x̂_zero(y) = arg min_{x ∈ R^n} (1/(2γ)) ‖y − AS^{1/2} x‖² + ‖x‖₀, (44)

where ‖x‖₀ is the number of nonzero components of x. For certain strictly sparse priors, zero norm-regularized estimation may provide better performance than lasso. While computing the zero norm-regularized estimate is generally very difficult, we can use the replica analysis to provide a simple prediction of its performance. This analysis can provide a bound on the performance achievable by practical algorithms.

To apply the RS PMAP decoupling property to the zero norm-regularized estimator (44), we observe that the zero norm-regularized estimator is identical to the PMAP estimator (18) with the cost function

f(x) = 0, if x = 0; 1, if x ≠ 0. (45)

Technically, this cost function does not satisfy the conditions of the RS PMAP decoupling property. For one thing, without bounding the range of x, the bound (19) is not satisfied. Also, the minimum of (25) does not agree with the essential infimum. To avoid these problems, we can consider an approximation of (45),

f_{δ,M}(x) = 0, if |x| < δ; 1, if |x| ∈ [δ, M],

which is defined on the set X = {x : |x| ≤ M}. We can then take the limits δ → 0 and M → ∞. For space considerations and to simplify the presentation, we will just apply the decoupling property with f(x) in (45) and omit the details of taking the appropriate limits.

With f(x) given by (45), the scalar MAP estimator in (25) is given by

x̂^pmap_scalar(z; λ) = T^hard_t(z), t = √(2λ), (46)

where T^hard_t is the hard thresholding operator,

T^hard_t(z) = z, if |z| > t; 0, if |z| ≤ t. (47)

Now, similar to the case of lasso estimation, the RS PMAP decoupling property states that there exist effective noise levels σ²_eff,map and γ_p such that for any component index j, the random vector (x_j, s_j, x̂_j) converges in distribution to the vector (x, s, x̂) where x ∼ p(x), s ∼ p_S(s), and x̂ is given by

x̂ = T^hard_t(z), z = x + √µ v, (48)

where v ∼ N(0, 1), λ_p = γ_p/s, µ = σ²_eff,map/s, and

t = √(2λ_p) = √(2γ_p/s). (49)

Thus, the zero norm-regularized estimation of a vector x is equivalent to n scalar components corrupted by some effective noise level σ²_eff,map and hard-thresholded based on an effective noise level γ_p.

The fixed-point equations for the effective noise levels σ²_eff,map and γ_p can be computed similarly to the case of lasso. Specifically, one can verify that (41) and (42) are both satisfied for the hard thresholding operator as well. Substituting (42) and (46) into (30), we obtain the fixed-point equations

σ²_eff,map = σ² + β E[s |x − T^hard_t(z)|²], (50a)
γ_p = γ + β γ_p Pr(|z| > t), (50b)

where the expectations are taken with respect to x ∼ p(x), s ∼ p_S(s), z in (48), and t given by (49). These fixed-point equations can be solved numerically.

D. Optimal Regularization
The lasso estimator (36) and zero norm-regularized estimator (44) require the setting of a regularization parameter γ. Qualitatively, the parameter provides a mechanism to trade off the sparsity level of the estimate with the fitting error. One of the benefits of the replica analysis is that it provides a simple mechanism for optimizing the parameter level given the problem statistics.

Consider first the lasso estimator (36) with some β > 0 and distributions x ∼ p(x) and s ∼ p_S(s). Observe that there exists a solution to (43b) with γ > 0 if and only if

Pr(|z| > γ_p/s) < 1/β. (51)

This leads to a natural optimization: we consider an optimization over two variables σ²_eff,map and γ_p, where we minimize σ²_eff,map subject to (43a) and (51).

One simple procedure for performing this minimization is as follows. Start with t = 0 and some initial value of σ²_eff,map(0). For any iteration t ≥ 0, we update σ²_eff,map(t) with the minimization

σ²_eff,map(t + 1) = σ² + β min_{γ_p} E[s |x − T^soft_{λ_p}(z)|²], (52)

where, on the right-hand side, the expectation is taken over x ∼ p(x), s ∼ p_S(s), and z in (39), with µ = σ²_eff,map(t)/s and λ_p = γ_p/s. The minimization in (52) is over γ_p > 0 subject to (51). One can show that with a sufficiently high initial condition, the sequence σ²_eff,map(t) monotonically decreases to a local minimum of the objective function. Given the final value for γ_p, one can then recover γ from (43b). A similar procedure can be used for the zero norm-regularized estimator.

VI. NUMERICAL SIMULATIONS
A. Bernoulli–Gaussian Mixture Distribution
As discussed above, the replica method is based on certain unproven assumptions, and even then the decoupling results under replica symmetry are only asymptotic in the large-dimension limit. To validate the predictive power of the RS PMAP decoupling property for finite dimensions, we first performed numerical simulations where the components of x are a zero-mean Bernoulli–Gaussian process, or equivalently a two-component, zero-mean Gaussian mixture where one component has zero variance. Specifically,

x_j ∼ N(0, 1), with prob. ρ; 0, with prob. 1 − ρ,

where ρ represents a sparsity ratio. In the experiments, ρ = 0.1. This is one of many possible sparse priors.

We took the vector x to have n = 100 i.i.d. components with this prior, and we varied m for 10 different values of β = n/m from 0.5 to 3. For the measurements (3), we took a measurement matrix A with i.i.d. Gaussian components and a constant scale factor matrix S = I. The noise level σ² was set so that SNR = 10 dB, where SNR is the signal-to-noise ratio with perfect side information defined in (16).

We simulated various estimators and compared their performances against the asymptotic values predicted by the replica analysis.

Fig. 3. MSE performance prediction with the RS PMAP decoupling property. Plotted is the median normalized SE for various sparse recovery algorithms: linear MMSE estimation, lasso, zero norm-regularized estimation, and optimal MMSE estimation. Solid lines show the asymptotic predicted MSE from the replica method. For the linear and lasso estimators, the circles and triangles show the actual median SE over 1000 Monte Carlo simulations. The unknown vector has i.i.d. Bernoulli–Gaussian components with a 90% probability of being zero. The noise level is set so that SNR = 10 dB. See text for details.

For each value of β, we performed 1000 Monte Carlo trials of each estimator. For each trial, we measured the normalized squared error (SE) in dB
10 log₁₀ ( ‖x̂ − x‖² / ‖x‖² ),

where x̂ is the estimate of x. The results are shown in Fig. 3, with each set of 1000 trials represented by the median normalized SE in dB.

The top curve shows the performance of the linear MMSE estimator (31). As discussed in Section V-A, the RS PMAP decoupling property applied to the case of a constant scale matrix S = I reduces to Verdú and Shamai's result in [60]. As can be seen in Fig. 3, the result predicts the simulated performance of the linear estimator extremely well.

The next curve shows the lasso estimator (36) with the factor γ selected to minimize the MSE as described in Section V-D. To compute the predicted value of the MSE from the RS PMAP decoupling property, we numerically solve the fixed-point equations (43) to obtain the effective noise levels σ²_eff,map and γ_p. We then use the scalar MAP model with the estimator (37) to predict the MSE. We see from Fig. 3 that the predicted MSE matches the median SE within 0.3 dB over a range of β values. At the time of initial dissemination of this work [61], precise prediction of lasso's performance given a specific noise variance and prior was not achievable with any other method. Now, as discussed in Section I-C, such asymptotic performance predictions can also be proven rigorously through connections with approximate belief propagation.

Fig. 3 also shows the theoretical minimum MSE (as computed with the RS PMMSE decoupling property) and the theoretical MSE from the zero norm-regularized estimator as computed in Section V-C. For these two cases, the estimators cannot be simulated since they involve NP-hard computations. But we have depicted the curves to show that the replica method can be used to calculate the gap between practical and impractical algorithms. Interestingly, we see that there is about a 2.0 to 2.5 dB gap between lasso and zero norm-regularized estimation, and another 1 to 2 dB gap between zero norm-regularized estimation and optimal MMSE.

It is, of course, not surprising that zero norm-regularized estimation performs better than lasso for the strictly sparse prior considered in this simulation, and that optimal MMSE performs better yet. However, what is valuable is that replica analysis can quantify the precise performance differences.

Fig. 4. Convergence to the asymptotic limit from the RS PMAP decoupling property. Plotted are the CDFs of the SE over 1000 Monte Carlo trials of the lasso method for the Gaussian mixture distribution. Details are in the text. The CDF is shown for dimensions n = 100 and n = 500 and for β = 1 and β = 2. As vector dimension increases, the performance begins to concentrate around the limit predicted by the RS PMAP decoupling property.

In Fig. 3, we plotted the median SE since there is actually considerable variation in the SE over the random realizations of the problem parameters. To illustrate the degree of variability, Fig. 4 shows the CDF of the SE values over the 1000 Monte Carlo trials. Each trial has different noise and measurement matrix realizations, and both contribute to SE variations. We see that the variation of the SE is especially large at the smaller dimension n = 100. While the median value agrees well with the theoretical replica limit, any particular instance of the problem can vary considerably from that limit.
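For reference, the per-trial error metric and the median statistic plotted in Fig. 3 can be computed as follows (a trivial helper, included only to pin down the formula):

```python
import numpy as np

def normalized_se_db(xhat, x):
    """Normalized squared error 10*log10(||xhat - x||^2 / ||x||^2), in dB."""
    return 10.0 * np.log10(np.sum((xhat - x) ** 2) / np.sum(x ** 2))

def median_se_db(trials):
    """Median normalized SE over a list of (xhat, x) Monte Carlo trials."""
    return float(np.median([normalized_se_db(xh, x) for xh, x in trials]))
```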
This is a significant drawback of the replica method: at lower dimensions, the replica method may provide accurate predictions of the median behavior, but it does not bound the variations from the median.

As one might expect, at the higher dimension of n = 500, the level of variability is reduced and the observed SE begins to concentrate around the replica limit. In his original paper [12], Tanaka assumes that concentration of the SE will occur; he calls this the self-averaging assumption. Fig. 4 provides some empirical evidence that self-averaging does indeed occur. However, even at n = 500, the variation is not insignificant. As a result, caution should be exercised in using the replica predictions on particular low-dimensional instances.

B. Discrete Distribution with Dynamic Range
The RS PMAP decoupling property can also be used to study the effects of dynamic range in power levels. To validate the replica analysis with power variations, we ran the following experiment: the vector x was generated with i.i.d. components

x_j = √s_j u_j, (53)

where s_j is a random power level and u_j is a discrete three-valued random variable with probability mass function

u_j ∼ 1/√ρ, with prob. ρ/2; −1/√ρ, with prob. ρ/2; 0, with prob. 1 − ρ. (54)

As before, the parameter ρ represents the sparsity ratio, and we chose a value of ρ = 0.1. The measurements were generated by

y = Ax + w = AS^{1/2}u + w,

where A is an i.i.d. Gaussian measurement matrix and w is Gaussian noise. As in the previous section, the post-despreading SNR with side information was normalized to 10 dB.

The factor s_j in (53) accounts for power variations in x_j. We considered two random distributions for s_j: (a) s_j = 1, so that the power level is constant; and (b) s_j uniform (in dB scale) over a 10 dB range with unit average power.

In case (b), when there is variation in the power levels, we can analyze two different scenarios for the lasso estimator:

• Power variations unknown:
If the power level s_j in (53) is unknown to the estimator, then we can apply the standard lasso estimator

x̂(y) = arg min_{x ∈ R^n} (1/(2γ)) ‖y − Ax‖² + ‖x‖₁, (55)

which does not need knowledge of the power levels s_j. To analyze the behavior of this estimator with the replica method, we simply incorporate the variations of both u_j and s_j into the prior of x_j and assume a constant scale factor s in the replica equations.

• Power variations known:
If the power levels s_j are known, the estimator can compute

û(y) = arg min_{u ∈ R^n} (1/(2γ)) ‖y − AS^{1/2}u‖² + ‖u‖₁ (56)

and then take x̂ = S^{1/2}û. This can be analyzed with the replica method by incorporating the distribution of s_j into the scale factors.

Fig. 5 shows the performance of the lasso estimator for the different power range scenarios. As before, for each β, the figure plots the median SE over 1000 Monte Carlo simulation trials. Fig. 5 also shows the theoretical asymptotic performance as predicted with the RS PMAP decoupling property. Simulated values are based on a vector dimension of n = 100 and optimal selection of γ as described in Section V-D.

Fig. 5. MSE performance prediction by the replica method of the lasso estimator with power variations in the components. Plotted is the median SE of the lasso method in estimating a discrete-valued distribution. Three scenarios are considered: (a) all components have the same power; (b) the components have a 10 dB range in power that is unknown to the estimator; and (c) the power range is known to the estimator and incorporated into the measurement matrix. Solid lines represent the asymptotic prediction from the RS PMAP decoupling property, and the circles, triangles, and squares show the median SE over 1000 Monte Carlo simulations. See text for details.

We see that in all three cases (constant power, and power variations unknown and known to the estimator), the replica prediction is in excellent agreement with the simulated performance. With one exception, the replica method matches the simulated performance within 0.2 dB. The one exception is for β = 2.5 with constant power, where the replica method underpredicts the median SE by about 1 dB. A simulation at a higher dimension of n = 500 (not shown here) reduced this discrepancy to 0.2 dB, suggesting that the replica method is still asymptotically correct.

We can also observe two interesting phenomena in Fig. 5. First, the lasso method's performance with constant power is almost identical to the performance with unknown power variations at smaller values of β. However, at higher values of β, the power variations actually improve the performance of the lasso method, even though the average power is the same in both cases. Wainwright's analysis [44] demonstrated the significance of the minimum component power in dictating lasso's performance. The above simulation and the corresponding replica predictions suggest that dynamic range may also play a role in the performance of lasso.
That increased dynamic range can improve the performance of sparse estimation has been observed for other estimators [62], [63].

A second phenomenon we see in Fig. 5 is that knowing the power variations and incorporating them into the measurement matrix can actually degrade the performance of lasso. Indeed, knowing the power variations appears to result in a 1 to 2 dB loss in MSE performance.

Of course, one cannot conclude from this one simulation that these effects of dynamic range hold more generally. The study of the effect of dynamic range is interesting and beyond the scope of this work. The point is that the replica method provides a simple analytic method for quantifying the effect of dynamic range that appears to match actual performance well.

C. Support Recovery with Thresholding
In estimating vectors with strictly sparse priors, one important problem is to detect the locations of the nonzero components in the vector x. This problem, sometimes called support recovery, arises for example in subset selection in linear regression [64], where finding the support of the vector x corresponds to determining a subset of features with strong linear influence on some observed data y. Several works have attempted to find conditions under which the support of a sparse vector x can be fully detected [44], [56], [65] or partially detected [66]–[68]. Unfortunately, with the exception of [44], the only available results are bounds that are not tight.

One of the uses of the RS PMAP decoupling property is to exactly predict the fraction of the support that can be detected correctly. To see how to predict the support recovery performance, observe that the decoupling property provides the asymptotic joint distribution for the vector (x_j, s_j, x̂_j), where x_j is the component of the unknown vector, s_j is the corresponding scale factor, and x̂_j is the component estimate. Now, in support recovery, we want to estimate θ_j, the indicator function that x_j is nonzero,

θ_j = 1, if x_j ≠ 0; 0, if x_j = 0.

One natural estimate for θ_j is to compare the magnitude of the component estimate x̂_j to some scale-dependent threshold t(s_j),

θ̂_j = 1, if |x̂_j| > t(s_j); 0, if |x̂_j| ≤ t(s_j).

This idea of using thresholding for sparsity detection has been proposed in [55] and [69]. Using the joint distribution of (x_j, s_j, x̂_j), one can then compute the probability of sparsity misdetection

p_err = Pr(θ̂_j ≠ θ_j).

The probability of error can be minimized over the threshold levels t(s).

To verify this calculation, we generated random vectors x with n = 100 i.i.d. components given by (53) and (54). We used a constant power (s_j = 1) and a sparsity fraction of ρ = 0.1. As before, the observations y were generated with an i.i.d.
Gaussian matrix with SNR = 10 dB.

Fig. 6 compares the theoretical probability of sparsity misdetection predicted by the replica method against the actual probability of misdetection based on the average of 1000 Monte Carlo trials. We tested two algorithms: linear MMSE estimation and lasso estimation. For lasso, the regularization parameter was selected for minimum MMSE as described in Section V-D. The results show a good match.
Fig. 6. Support recovery performance prediction with the replica method. The solid lines show the theoretical probability of error in sparsity misdetection using linear and lasso estimation followed by optimal thresholding. The circles and triangles are the corresponding mean probabilities of misdetection over 1000 Monte Carlo trials.
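The misdetection probability above can be estimated directly from the equivalent scalar model. The sketch below is illustrative only: it assumes s ≡ 1, the three-point prior (54), and given values of the effective noise level µ and threshold t, and it accepts any scalar estimator function:

```python
import numpy as np

def misdetection_prob(rho, mu, t, estimator, n_mc=200000, seed=2):
    """Monte Carlo estimate of p_err = Pr(theta_hat != theta) for the
    scalar model: z = x + sqrt(mu)*v, xhat = estimator(z), and the
    support estimate theta_hat = 1{|xhat| > t}."""
    rng = np.random.default_rng(seed)
    # three-point prior (54): +/- 1/sqrt(rho) w.p. rho/2 each, else 0
    u = rng.choice([1.0, -1.0, 0.0], p=[rho / 2, rho / 2, 1 - rho], size=n_mc)
    x = u / np.sqrt(rho)
    z = x + np.sqrt(mu) * rng.standard_normal(n_mc)
    theta = x != 0.0
    theta_hat = np.abs(estimator(z)) > t
    return float(np.mean(theta != theta_hat))

# e.g. a plain linear (identity) scalar estimate with threshold t = 0.5
p = misdetection_prob(rho=0.1, mu=0.01, t=0.5, estimator=lambda z: z)
```

Minimizing the returned probability over t (and, with varying scale factors, over t(s)) gives the optimal-thresholding curves of Fig. 6.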
VII. CONCLUSIONS AND FUTURE WORK
We have applied the replica method from statistical physics for computing the asymptotic performance of postulated MAP estimators of non-Gaussian vectors with large random linear measurements, under a replica symmetric assumption. The method can be readily applied to problems in compressed sensing. While the method is not theoretically rigorous, simulations show an excellent ability to predict the performance for a range of algorithms, performance metrics, and input distributions. Indeed, we believe that the replica method provides the only method to date for asymptotically-exact prediction of performance of compressed sensing algorithms that can apply in a large range of circumstances.

Moreover, we believe that the availability of a simple scalar model that exactly characterizes certain sparse estimators opens up numerous avenues for analysis. For one thing, it would be useful to see if the replica analysis of lasso can be used to recover the scaling laws of Wainwright [44] and Donoho and Tanner [45] for support recovery and to extend the latter to the noisy setting. Also, the best known bounds for MSE performance in sparse estimation are given by Haupt and Nowak [70] and Candès and Tao [71]. Since the replica analysis is asymptotically exact (subject to various assumptions), we may be able to obtain much tighter analytic expressions. In a similar vein, several researchers have attempted to find information-theoretic lower bounds with optimal estimation [56], [65], [72]. Using the replica analysis of optimal estimators, one may be able to improve these scaling laws as well.

Finally, there is a well-understood connection between statistical mechanics and belief propagation-based decoding of error correcting codes [6], [7]. These connections may suggest improved iterative algorithms for sparse estimation as well.

APPENDIX A
REVIEW OF THE REPLICA METHOD
We provide a brief summary of the replica method, with a focus on some of the details of the replica symmetric analysis of postulated MMSE estimation in [12], [14]. This review will elucidate some of the key assumptions, notably the assumption of replica symmetry. General descriptions of the replica method can be found in texts such as [8]–[11].

The replica method is based on evaluating variants of the so-called asymptotic free energy

F = − lim_{n→∞} (1/n) E[log Z(y, Φ)], (57)

where Z(y, Φ) is the postulated partition function

Z(y, Φ) = p_y(y | Φ; p_post, σ²_post)

and the expectation in (57) is with respect to the true distribution on y. For the replica PMMSE and PMAP analyses in [12], [14], various joint moments of the variables x_j and x̂_j are computed from certain variants of the free energy, and the convergence of the joint distribution of (x_j, x̂_j) is then analyzed based on these moments.

To evaluate the asymptotic free energy, the replica method uses the identity that, for any random variable Z,

E[log Z] = lim_{ν→0} (∂/∂ν) log E[Z^ν].

Therefore, the asymptotic free energy (57) can be rewritten as

F = − lim_{n→∞} (1/n) lim_{ν→0} (∂/∂ν) log E[Z^ν(y, Φ)]. (58)

The "replica trick" involves evaluating the expectation E[Z^ν(y, Φ)] for positive integer values of ν and then assuming an analytic continuation so that the resulting expression is valid for real ν in the vicinity of zero. For positive integer values of ν, the quantity Z^ν(y, Φ) can be written as

Z^ν(y, Φ) = E[ ∏_{a=1}^{ν} p_{y|x}(y | x^a, Φ; p_post, σ²_post) ], (59)

where the expectation is over independent copies of the vectors x^a, a = 1, ..., ν, with i.i.d. components x^a_j ∼ p_post(x^a_j). The motivation for the replica trick is that the quantity Z^ν(y, Φ) in (59) can be thought of as a partition function of a new system with ν "replicated" copies of the variables x^a, a = 1, ..., ν.
The parameter $\nu$ is called the replica number. The replicated system is relatively easy to analyze. Specifically, to evaluate $\mathbf{E}[Z^\nu(\mathbf{y},\Phi)]$, the replica analysis in [12], [14] first assumes a self-averaging property, which essentially asserts that the variations in $Z^\nu(\mathbf{y},\Phi)$ due to the randomness of the measurement matrix $\Phi$ vanish in the limit as $n\to\infty$. Although a large number of statistical physics quantities exhibit such self-averaging, the self-averaging of the relevant quantities for the general PMMSE and PMAP analyses has not been rigorously established. Following [12], [14], self-averaging in this work is thus simply assumed.

Under the self-averaging assumption, the expectation in (59) is evaluated in [14] by first conditioning on the $(\nu+1)$-by-$(\nu+1)$ correlation matrix $Q = (1/n) X^T X$, where $X$ is the $n$-by-$(\nu+1)$ matrix

$X = [\mathbf{x}_0\ \mathbf{x}_1\ \cdots\ \mathbf{x}_\nu]$,

with $\mathbf{x}_0$ having i.i.d. components following the true distribution $x_{0j} \sim p(x_{0j})$, and the vectors $\mathbf{x}_a$, $a = 1,\ldots,\nu$, being independent with i.i.d. components following the postulated distribution $x_{aj} \sim p_{\rm post}(x_{aj})$. The conditioning on $Q$ reduces the expectation in (59) to an integral of the form

$\frac{1}{n}\log\mathbf{E}[Z^\nu(\mathbf{y},\Phi)] = \frac{1}{n}\log \int \exp\!\left( n\beta\, G^{(\nu)}(Q) \right) \mu^{(\nu)}_n(dQ) + O\!\left(\frac{1}{n}\right)$,   (60)

where $G^{(\nu)}(Q)$ is some function of the correlation matrix $Q$ and $\mu^{(\nu)}_n(Q)$ is a probability measure on $Q$. It is then argued that the measures $\mu^{(\nu)}_n(Q)$ satisfy a large deviations property with some rate function $I^{(\nu)}(Q)$.
Then, using standard large deviations arguments as in [73], the asymptotic value of the expectation in (60) reduces to a maximization of the form

$\lim_{n\to\infty} \frac{1}{n}\log\mathbf{E}[Z^\nu(\mathbf{y},\Phi)] = \sup_Q \left[ \beta\, G^{(\nu)}(Q) - I^{(\nu)}(Q) \right]$,   (61)

where the supremum is over the set of covariance matrices $Q$. The correlation matrix $Q$ plays a similar role as the so-called overlap matrix in replica analyses of systems with discrete energy states [10].

The maximization in (61) over all covariance matrices is, in general, difficult to perform. The key replica symmetry (RS) assumption used in [12] and [14], and hence implicitly used in this paper, is that the maxima are achieved with matrices $Q$ that are symmetric with respect to permutations of the $\nu$ replica indices. Under this symmetry assumption, the space of covariance matrices is greatly reduced, and the maxima in (61) can be explicitly evaluated.

The RS assumption is not always valid, even though the system itself is symmetric across the replica indices. For example, it is well known that even in the simple random energy model, the corresponding maximization may not satisfy the RS assumption, particularly at low temperatures [10]; see also [74]. More recently, it has been shown that replica symmetry may also be broken when analyzing lattice precoding for the Gaussian broadcast channel [15].

In the absence of replica symmetry, one must search through a larger class of overlap or covariance matrices $Q$. One such hierarchy of classes that is often used is described by the so-called $k$-step replica symmetry breaking (RSB) matrices, a description of which can be found in various texts [8]–[11]. In this regard, the analysis in this paper, which assumes replica symmetry, is thus only a 0-step RSB analysis, or 0th-level prediction.

It is difficult to derive general tests for whether the RS assumption is rigorously valid.
Tanaka's original work [12] derived an explicit condition for the validity of the RS assumption based on the Almeida–Thouless (AT) test [75], which considers asymmetric perturbations around the RS saddle points of the maximization (61). For the case of binary signals, the condition has a simple formula involving the SNR and the measurement ratio $\beta$. In [48], an AT condition was also derived for the RS analysis of $\ell_p$ reconstruction with Bernoulli–Gaussian priors. Unfortunately, no equivalent condition has been derived for the general scenario considered in Guo and Verdú's extension in [14].

In this work, we simply assume replica symmetry for all values of the scale factor $u > 0$. Since $u$ is analogous to an inverse temperature [54] and the validity of the RS assumption is more problematic at low temperatures, one must be cautious in interpreting our results. As stated in Section I, where possible we have confirmed the replica predictions by comparison to numerical experiments. However, such experiments are limited to computable estimators such as lasso and linear estimators. For other estimators, such as the true MMSE or zero norm-regularized estimator, the RS assumption may very well not hold.

APPENDIX B
PROOF OVERVIEW
Fix a deterministic sequence of indices $j = j(n)$ with $j(n) \in \{1,\ldots,n\}$. For each $n$, define the random vector triples

$\theta^u(n) = (\, x_j(n),\ s_j(n),\ \hat{x}^u_j(n) \,)$,   (62a)
$\theta^{\rm map}(n) = (\, x_j(n),\ s_j(n),\ \hat{x}^{\rm pmap}_j(n) \,)$,   (62b)

where $x_j(n)$, $\hat{x}^u_j(n)$, and $\hat{x}^{\rm pmap}_j(n)$ are the $j$th components of the random vectors $\mathbf{x}$, $\hat{\mathbf{x}}^u(\mathbf{y})$, and $\hat{\mathbf{x}}^{\rm pmap}(\mathbf{y})$, and $s_j(n)$ is the $j$th diagonal entry of the matrix $S$.

For each $u$, we will use the notation

$\hat{x}^u_{\rm scalar}(z;\lambda) = \hat{x}^{\rm pmmse}_{\rm scalar}(z;\, p_u, \lambda/u)$,   (63)

where $p_u$ is defined in (21) and $\hat{x}^{\rm pmmse}_{\rm scalar}(z;\cdot,\cdot)$ is defined in (10). Also, for every $\sigma^2 > 0$ and $\gamma > 0$, define the random vectors

$\theta^u_{\rm scalar}(\sigma^2,\gamma) = (\, x,\ s,\ \hat{x}^u_{\rm scalar}(z;\gamma/s) \,)$,   (64a)
$\theta^{\rm map}_{\rm scalar}(\sigma^2,\gamma) = (\, x,\ s,\ \hat{x}^{\rm pmap}_{\rm scalar}(z;\gamma/s) \,)$,   (64b)

where $x$ and $s$ are independent with $x \sim p(x)$, $s \sim p_S(s)$, and

$z = x + \frac{\sigma}{\sqrt{s}}\, v$   (65)

with $v \sim \mathcal{N}(0,1)$.

Now, to prove the RS PMAP decoupling property, we need to show that (under the stated assumptions)

$\lim_{n\to\infty} \theta^{\rm map}(n) = \theta^{\rm map}_{\rm scalar}(\sigma^2_{\rm eff,map}, \gamma_p)$,   (66)

where the limit is in distribution and the noise levels $\sigma^2_{\rm eff,map}$ and $\gamma_p$ satisfy part (b) of the claim. This desired equivalence is depicted in the right column of Fig. 7.

To show this limit, we first observe that under Assumption 1, for $u$ sufficiently large, the postulated prior distribution $p_u(x)$ in (21) and noise level $\sigma^2_u$ in (22) are assumed to satisfy the RS PMMSE decoupling property. This implies that

$\lim_{n\to\infty} (\, x_j(n),\ s_j(n),\ \hat{x}^u_j(n) \,) = (\, x,\ s,\ \hat{x}^{\rm pmmse}_{\rm scalar}(z;\, p_u, \sigma^2_{p\text{-}{\rm eff}}(u)/s) \,)$,   (67)

where the limit is in distribution, $x \sim p(x)$, $s \sim p_S(s)$, and

$z = x + \frac{\sigma_{\rm eff}(u)}{\sqrt{s}}\, v, \qquad v \sim \mathcal{N}(0,1)$.

Using the notation above, we can rewrite this limit as

$\lim_{n\to\infty} \theta^u(n)$
 (a) $= \lim_{n\to\infty} (\, x_j(n),\ s_j(n),\ \hat{x}^u_j(n) \,)$
 (b) $= (\, x,\ s,\ \hat{x}^{\rm pmmse}_{\rm scalar}(z;\, p_u, \sigma^2_{p\text{-}{\rm eff}}(u)/s) \,)$
 (c) $= (\, x,\ s,\ \hat{x}^u_{\rm scalar}(z;\, u\sigma^2_{p\text{-}{\rm eff}}(u)/s) \,)$
 (d) $= \theta^u_{\rm scalar}(\sigma^2_{\rm eff}(u),\, u\sigma^2_{p\text{-}{\rm eff}}(u))$,   (68)

where all the limits are in distribution and (a) follows from the definition of $\theta^u(n)$ in (62a); (b) follows from (67); (c) follows from (63); and (d) follows from (64a). This equivalence is depicted in the left column of Fig. 7.

The key part of the proof is to use a large deviations argument to show that for almost all $\mathbf{y}$,

$\lim_{u\to\infty} \hat{\mathbf{x}}^u(\mathbf{y}) = \hat{\mathbf{x}}^{\rm pmap}(\mathbf{y})$.

This limit in turn shows (see Lemma 5 of Appendix D) that for every $n$,

$\lim_{u\to\infty} \theta^u(n) = \theta^{\rm map}(n)$   (69)

almost surely and in distribution. A large deviations argument is also used to show that for every $\lambda$ and almost all $z$,

$\lim_{u\to\infty} \hat{x}^u_{\rm scalar}(z;\lambda) = \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$.

Combining this with the limits in Assumption 2, we will see (see Lemma 7 of Appendix E) that

$\lim_{u\to\infty} \theta^u_{\rm scalar}(\sigma^2_{\rm eff}(u),\, u\sigma^2_{p\text{-}{\rm eff}}(u)) = \theta^{\rm map}_{\rm scalar}(\sigma^2_{\rm eff,map}, \gamma_p)$   (70)

almost surely and in distribution.

The equivalences (69) and (70) are shown as rows in Fig. 7. As shown, they combine with the RS PMMSE decoupling property to prove the RS PMAP decoupling property.

Fig. 7. The RS PMAP decoupling property of this paper relates $\hat{x}^{\rm pmap}_j(n)$ to $\hat{x}^{\rm pmap}_{\rm scalar}(z;\gamma/s)$ through an $n\to\infty$ limit. We establish the equivalence of its validity to the validity of the RS PMMSE decoupling property [14] through two $u\to\infty$ limits: Appendix D relates $\hat{x}^u_j(n)$ and $\hat{x}^{\rm pmap}_j(n)$; Appendix E relates $\hat{x}^u_{\rm scalar}(z;\gamma/s)$ and $\hat{x}^{\rm pmap}_{\rm scalar}(z;\gamma/s)$.
In equations rather than diagrammatic form, the combination of limits is

$\lim_{n\to\infty} \theta^{\rm map}(n)$
 (a) $= \lim_{n\to\infty} \lim_{u\to\infty} \theta^u(n)$
 (b) $= \lim_{u\to\infty} \lim_{n\to\infty} \theta^u(n)$
 (c) $= \lim_{u\to\infty} \theta^u_{\rm scalar}(\sigma^2_{\rm eff}(u),\, u\sigma^2_{p\text{-}{\rm eff}}(u))$
 (d) $= \theta^{\rm map}_{\rm scalar}(\sigma^2_{\rm eff,map}, \gamma_p)$,

where all the limits are in distribution and (a) follows from (69); (b) follows from Assumption 3; (c) follows from (68); and (d) follows from (70). This proves (66) and part (a) of the claim.

Therefore, to prove the claim, we prove the limit (69) in Appendix D and the limit (70) in Appendix E, and show that the limiting noise levels $\sigma^2_{\rm eff,map}$ and $\gamma_p$ satisfy the fixed-point equations in part (b) of the claim in Appendix F. Before these results are given, we review in Appendix C some requisite results from large deviations theory.

APPENDIX C
LARGE DEVIATIONS RESULTS
The above proof overview shows that the RS predictions for the postulated MAP estimate are calculated by taking the limit as $u\to\infty$ of the RS predictions for the postulated MMSE estimates. These limits are evaluated with large deviations theory, and we begin, in this appendix, by deriving some simple modifications of standard large deviations results. The main result we need is Laplace's principle as described in [73]:

Lemma 1 (Laplace's Principle):
Let $\varphi(\mathbf{x})$ be any measurable function defined on some measurable subset $\mathcal{D} \subseteq \mathbb{R}^n$ such that

$\int_{\mathbf{x}\in\mathcal{D}} \exp(-\varphi(\mathbf{x}))\, d\mathbf{x} < \infty$.   (71)

Then

$\lim_{u\to\infty} \frac{1}{u} \log \int_{\mathbf{x}\in\mathcal{D}} \exp(-u\varphi(\mathbf{x}))\, d\mathbf{x} = -\operatorname*{ess\,inf}_{\mathbf{x}\in\mathcal{D}} \varphi(\mathbf{x})$.

Given $\varphi(\mathbf{x})$ as in Lemma 1, define the probability distribution

$q_u(\mathbf{x}) = \left[ \int_{\mathbf{x}\in\mathcal{D}} \exp(-u\varphi(\mathbf{x}))\, d\mathbf{x} \right]^{-1} \exp(-u\varphi(\mathbf{x}))$.   (72)

We want to evaluate expectations of the form

$\lim_{u\to\infty} \int_{\mathbf{x}\in\mathcal{D}} g(u,\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x}$

for some real-valued measurable function $g(u,\mathbf{x})$. The following lemma shows that this integral is described by the behavior of $g(u,\mathbf{x})$ in a neighborhood of the minimizer of $\varphi(\mathbf{x})$.

Lemma 2:
Suppose that $\varphi(\mathbf{x})$ and $g(u,\mathbf{x})$ are real-valued measurable functions such that:

(a) The function $\varphi(\mathbf{x})$ satisfies (71) and has a unique essential minimizer $\hat{\mathbf{x}} \in \mathbb{R}^n$ such that for every open neighborhood $U$ of $\hat{\mathbf{x}}$,

$\inf_{\mathbf{x}\notin U} \varphi(\mathbf{x}) > \varphi(\hat{\mathbf{x}})$.

(b) The function $g(u,\mathbf{x})$ is positive and satisfies

$\limsup_{u\to\infty}\ \sup_{\mathbf{x}\notin U}\ \frac{\log g(u,\mathbf{x})}{u(\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}}))} \le 0$

for every open neighborhood $U$ of $\hat{\mathbf{x}}$.

(c) There exists a constant $g_\infty$ such that for every $\epsilon > 0$, there exists a neighborhood $U$ of $\hat{\mathbf{x}}$ such that

$\limsup_{u\to\infty} \left| \int_U g(u,\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x} - g_\infty \right| \le \epsilon$.

Then

$\lim_{u\to\infty} \int g(u,\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x} = g_\infty$.

Proof:
Due to item (c), we simply have to show that for any open neighborhood $U$ of $\hat{\mathbf{x}}$,

$\limsup_{u\to\infty} \int_{\mathbf{x}\in U^c} g(u,\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x} = 0$.

To this end, let

$Z(u) = \log \int_{\mathbf{x}\in U^c} g(u,\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x}$.

It suffices to show that $Z(u) \to -\infty$ as $u\to\infty$. Using the definition of $q_u(\mathbf{x})$ in (72), it is easy to check that

$Z(u) = Z_1(u) - Z_2(u)$,   (73)

where

$Z_1(u) = \log \int_{\mathbf{x}\in U^c} g(u,\mathbf{x}) \exp\!\left( -u(\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}})) \right) d\mathbf{x}$,
$Z_2(u) = \log \int_{\mathbf{x}\in\mathcal{D}} \exp\!\left( -u(\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}})) \right) d\mathbf{x}$.

Now, let

$M = \operatorname*{ess\,inf}_{\mathbf{x}\in U^c} \varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}})$.

By item (a), $M > 0$. Therefore, we can find a $\delta > 0$ such that

$-M(1-\delta) + 3\delta < 0$.   (74)

Now, from item (b), there exists a $u_0$ such that for all $u > u_0$,

$Z_1(u) \le \log \int_{\mathbf{x}\in U^c} \exp\!\left( -u(1-\delta)(\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}})) \right) d\mathbf{x}$.

By Laplace's principle, we can find a $u_1$ such that for all $u > u_1$,

$Z_1(u) \le u\left[ \delta - \inf_{\mathbf{x}\in U^c} (1-\delta)(\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}})) \right] = u(-M(1-\delta) + \delta)$.   (75)

Also, since $\hat{\mathbf{x}}$ is an essential minimizer of $\varphi(\mathbf{x})$,

$\operatorname*{ess\,inf}_{\mathbf{x}\in\mathcal{D}} \varphi(\mathbf{x}) = \varphi(\hat{\mathbf{x}})$.

Therefore, by Laplace's principle, there exists a $u_2$ such that for $u > u_2$,

$Z_2(u) \ge u\left[ -\delta - \operatorname*{ess\,inf}_{\mathbf{x}\in\mathcal{D}} (\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}})) \right] = -u\delta$.   (76)

Substituting (75) and (76) into (73), we see that for $u$ sufficiently large,

$Z(u) \le u(-M(1-\delta) + \delta) + u\delta < -u\delta$,

where the last inequality follows from (74). This shows $Z(u) \to -\infty$ as $u\to\infty$, and the proof is complete.

One simple application of this lemma is as follows:

Lemma 3:
Let $\varphi(\mathbf{x})$ and $h(\mathbf{x})$ be real-valued measurable functions satisfying the following:

(a) The function $\varphi(\mathbf{x})$ has a unique essential minimizer $\hat{\mathbf{x}}$ such that for every open neighborhood $U$ of $\hat{\mathbf{x}}$,

$\inf_{\mathbf{x}\notin U} \varphi(\mathbf{x}) > \varphi(\hat{\mathbf{x}})$.

(b) The function $h(\mathbf{x})$ is continuous at $\hat{\mathbf{x}}$.

(c) There exist a $c > 0$ and a compact set $K$ such that for all $\mathbf{x} \notin K$,

$\varphi(\mathbf{x}) \ge c \log |h(\mathbf{x})|$.   (77)

Then

$\lim_{u\to\infty} \int h(\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x} = h(\hat{\mathbf{x}})$.

Proof:
We will apply Lemma 2 with $g(u,\mathbf{x}) = |h(\mathbf{x}) - h(\hat{\mathbf{x}})|$ and $g_\infty = 0$. Item (a) of this lemma shows that $\varphi(\mathbf{x})$ satisfies item (a) of Lemma 2.

To verify that item (b) of Lemma 2 holds, we first claim that there exists a constant $M > 0$ such that for all $\mathbf{x}$,

$\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}}) \ge M \log |h(\mathbf{x}) - h(\hat{\mathbf{x}})|$.   (78)

We find a valid constant $M$ for three regions. First, let $U_0$ be the set of $\mathbf{x}$ such that $|h(\mathbf{x}) - h(\hat{\mathbf{x}})| < 1$. Since $h(\mathbf{x})$ is continuous in $\mathbf{x}$, $U_0$ is an open neighborhood of $\hat{\mathbf{x}}$. Also, for $\mathbf{x} \in U_0$, the right-hand side of (78) is negative. Since the left-hand side of (78) is nonnegative, the inequality is satisfied in $U_0$ for any $M > 0$.

Next, consider the set $K_1 = K \setminus U_0$, where $K$ is the compact set in item (c) of this lemma. Since $K_1$ is compact and $h(\mathbf{x})$ is continuous, there exists a $c_1 > 0$ such that $\log |h(\mathbf{x}) - h(\hat{\mathbf{x}})| \le c_1$ for all $\mathbf{x} \in K_1$. Also, since $K_1$ excludes the open neighborhood $U_0$ of $\hat{\mathbf{x}}$, item (a) shows that $\inf_{\mathbf{x}\in K_1} \varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}}) > 0$, so (78) holds on $K_1$ for $M > 0$ sufficiently small. Finally, for $\mathbf{x}$ outside $K \cup U_0$, the growth bound (77) shows that $\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}})$ grows at least like $c \log |h(\mathbf{x})|$, which dominates $\log |h(\mathbf{x}) - h(\hat{\mathbf{x}})|$ for a suitable choice of $M$. This establishes (78).

Hence, for any open neighborhood $U$ of $\hat{\mathbf{x}}$,

$\limsup_{u\to\infty}\ \sup_{\mathbf{x}\notin U}\ \frac{\log g(u,\mathbf{x})}{u(\varphi(\mathbf{x}) - \varphi(\hat{\mathbf{x}}))} \le \lim_{u\to\infty} \frac{1}{uM} = 0$.

Now let us verify that item (c) of Lemma 2 holds. Let $\epsilon > 0$. Since $h(\mathbf{x})$ is continuous at $\hat{\mathbf{x}}$, there exists an open neighborhood $U$ of $\hat{\mathbf{x}}$ such that $g(u,\mathbf{x}) < \epsilon$ for all $\mathbf{x} \in U$ and $u$. This implies that for all $u$,

$\int_U g(u,\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x} < \epsilon \int_U q_u(\mathbf{x})\, d\mathbf{x} \le \epsilon$,

which shows that $g(u,\mathbf{x})$ satisfies item (c) of Lemma 2. Thus

$\left| \int h(\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x} - h(\hat{\mathbf{x}}) \right| = \left| \int (h(\mathbf{x}) - h(\hat{\mathbf{x}}))\, q_u(\mathbf{x})\, d\mathbf{x} \right| \le \int |h(\mathbf{x}) - h(\hat{\mathbf{x}})|\, q_u(\mathbf{x})\, d\mathbf{x} = \int g(u,\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x} \to 0$,

where the last limit is as $u\to\infty$ and follows from Lemma 2.

APPENDIX D
EVALUATION OF $\lim_{u\to\infty} \hat{\mathbf{x}}^u(\mathbf{y})$

We can now apply Laplace's principle from the previous section to prove (69). We begin by examining the pointwise convergence of the PMMSE estimator $\hat{\mathbf{x}}^u(\mathbf{y})$.

Lemma 4:
For every $n$, $A$, and $S$, and almost all $\mathbf{y}$,

$\lim_{u\to\infty} \hat{\mathbf{x}}^u(\mathbf{y}) = \hat{\mathbf{x}}^{\rm pmap}(\mathbf{y})$,

where $\hat{\mathbf{x}}^u(\mathbf{y})$ is the PMMSE estimator in (24) and $\hat{\mathbf{x}}^{\rm pmap}(\mathbf{y})$ is the PMAP estimator in (18).

Proof:
The lemma is a direct application of Lemma 3. Fix $n$, $\mathbf{y}$, $A$, and $S$, and let

$\varphi(\mathbf{x}) = \frac{1}{2\lambda} \|\mathbf{y} - A S^{1/2} \mathbf{x}\|^2 + f(\mathbf{x})$.   (79)

The definition of $\hat{\mathbf{x}}^{\rm pmap}(\mathbf{y})$ in (18) shows that

$\hat{\mathbf{x}}^{\rm pmap}(\mathbf{y}) = \arg\min_{\mathbf{x}\in\mathcal{X}^n} \varphi(\mathbf{x})$.

Assumption 4 shows that this minimizer is unique for almost all $\mathbf{y}$. Also, (23) shows that

$p_{\mathbf{x}|\mathbf{y}}(\mathbf{x}\,|\,\mathbf{y};\, p_u, \sigma^2_u) = \left[ \int_{\mathbf{x}\in\mathcal{X}^n} \exp(-u\varphi(\mathbf{x}))\, d\mathbf{x} \right]^{-1} \exp(-u\varphi(\mathbf{x})) = q_u(\mathbf{x})$,

where $q_u(\mathbf{x})$ is given in (72) with $\mathcal{D} = \mathcal{X}^n$. Therefore, using (24),

$\hat{\mathbf{x}}^u(\mathbf{y}) = \mathbf{E}(\mathbf{x}\,|\,\mathbf{y};\, p_u, \sigma^2_u) = \int_{\mathbf{x}\in\mathcal{X}^n} \mathbf{x}\, q_u(\mathbf{x})\, d\mathbf{x}$.   (80)

Now, to prove the lemma, we need to show that

$\lim_{u\to\infty} \hat{x}^u_j(\mathbf{y}) = \hat{x}^{\rm pmap}_j(\mathbf{y})$

for every component $j = 1,\ldots,n$. To this end, fix a component index $j$. Using (80), we can write the $j$th component of $\hat{\mathbf{x}}^u(\mathbf{y})$ as

$\hat{x}^u_j(\mathbf{y}) = \int_{\mathbf{x}\in\mathcal{X}^n} h(\mathbf{x})\, q_u(\mathbf{x})\, d\mathbf{x}$,

where $h(\mathbf{x}) = x_j$. The function $h(\mathbf{x})$ is continuous. To verify item (c) of Lemma 3, using Assumption 5, we first find a compact set $K$ such that $x_j \notin K$ implies

$f(x_j) > c \log |x_j|$.   (81)

Then, for $x_j \notin K$,

$\varphi(\mathbf{x}) \overset{(a)}{\ge} f(\mathbf{x}) \overset{(b)}{\ge} f(x_j) \overset{(c)}{\ge} c \log |x_j|$,

where (a) follows from the definition of $\varphi(\mathbf{x})$ in (79); (b) follows from (20) and the assumption that the cost functions $f(x_i)$ are non-negative; and (c) follows from (81). Therefore, item (c) of Lemma 3 follows since $h(\mathbf{x}) = x_j$. Thus, all the hypotheses of Lemma 3 are satisfied, and we have the limit

$\lim_{u\to\infty} \hat{x}^u_j(\mathbf{y}) = h(\hat{\mathbf{x}}^{\rm pmap}(\mathbf{y})) = \hat{x}^{\rm pmap}_j(\mathbf{y})$.

This proves the lemma.
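The hardening limit in Lemma 4 can be made concrete in a hypothetical scalar instance (the cost $f(x) = |x|$ and the numerical values below are illustrative choices, not from the paper): the posterior mean under the sharpened distribution $q_u(x) \propto \exp(-u\varphi(x))$ converges, as $u \to \infty$, to the minimizer of $\varphi$, which for the absolute-value cost is the soft-threshold operator.

```python
import numpy as np

# Sketch of the u -> infinity "hardening" limit in a scalar setting:
# q_u(x) ∝ exp(-u*phi(x)) with phi(x) = (z - x)^2/(2*lam) + |x|.
# The posterior mean under q_u approaches argmin phi = soft threshold.
grid = np.linspace(-10.0, 10.0, 40001)

def posterior_mean(z, lam, u):
    phi = (z - grid) ** 2 / (2 * lam) + np.abs(grid)
    w = np.exp(-u * (phi - phi.min()))   # subtract min for numerical stability
    return np.sum(grid * w) / np.sum(w)

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

z, lam = 1.7, 0.5
for u in [1.0, 10.0, 100.0, 1000.0]:
    print(u, posterior_mean(z, lam, u))   # approaches soft_threshold(z, lam)
print("MAP:", soft_threshold(z, lam))
```

For $u = 1000$ the quadrature estimate of the posterior mean agrees with the soft threshold $\operatorname{sign}(z)\max(|z|-\lambda, 0) = 1.2$ to within the grid resolution, illustrating the pointwise limit in the lemma.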
Lemma 5:
Consider the random vectors $\theta^u(n)$ and $\theta^{\rm map}(n)$ defined in (62a) and (62b), respectively. Then, for all $n$,

$\lim_{u\to\infty} \theta^u(n) = \theta^{\rm map}(n)$   (82)

almost surely and in distribution.

Proof:
The vectors $\theta^u(n)$ and $\theta^{\rm map}(n)$ are deterministic functions of $\mathbf{x}(n)$, $A(n)$, $S(n)$, and $\mathbf{y}$. Lemma 4 shows that the limit (82) holds for any values of $\mathbf{x}(n)$, $A(n)$, and $S(n)$, and almost all $\mathbf{y}$. Since $\mathbf{y}$ has a continuous probability distribution (due to the additive noise $\mathbf{w}$ in (3)), the set of values where this limit does not hold must have probability zero. Thus, the limit (82) holds almost surely, and therefore also in distribution.

APPENDIX E
EVALUATION OF $\lim_{u\to\infty} \hat{x}^u_{\rm scalar}(z;\lambda)$

We first show the pointwise convergence of the scalar MMSE estimator $\hat{x}^u_{\rm scalar}(z;\lambda)$.

Lemma 6:
Consider the scalar estimators $\hat{x}^u_{\rm scalar}(z;\lambda)$ defined in (63) and $\hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$ defined in (25). For all $\lambda > 0$ and almost all $z$, we have the deterministic limit

$\lim_{u\to\infty} \hat{x}^u_{\rm scalar}(z;\lambda) = \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$.

Proof:
The proof is similar to that of Lemma 4. Fix $z$ and $\lambda$, and consider the conditional distribution $p_{x|z}(x\,|\,z;\, p_u, \lambda/u)$. Using (7) along with the definition of $p_u(x)$ in (21) and an argument similar to the proof of Lemma 4, it is easily checked that

$p_{x|z}(x\,|\,z;\, p_u, \lambda/u) = q_u(x)$,   (83)

where $q_u(x)$ is given by (72) with $\mathcal{D} = \mathcal{X}$ and

$\varphi(x) = F(x, z, \lambda)$,   (84)

where $F(x,z,\lambda)$ is defined in (26). Using (63) and (10),

$\hat{x}^u_{\rm scalar}(z;\lambda) = \hat{x}^{\rm pmmse}_{\rm scalar}(z;\, p_u, \lambda/u) = \int_{x\in\mathcal{X}} x\, p_{x|z}(x\,|\,z;\, p_u, \lambda/u)\, dx = \int_{x\in\mathcal{X}} h(x)\, q_u(x)\, dx$,

with $h(x) = x$.

We can now apply Lemma 3. The definition of $\hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$ in (25) shows that

$\hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda) = \arg\min_{x\in\mathcal{X}} \varphi(x)$.   (85)

Assumption 6 shows that for all $\lambda > 0$ and almost all $z$, this minimization is unique, so $\varphi(x) > \varphi(\hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda))$ for all $x \neq \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$. Also, using (26),

$\lim_{|x|\to\infty} \varphi(x) \overset{(a)}{=} \lim_{|x|\to\infty} F(x,z,\lambda) \overset{(b)}{\ge} \lim_{|x|\to\infty} f(x) \overset{(c)}{=} \infty$,   (86)

where (a) follows from (84); (b) follows from (26); and (c) follows from Assumption 5. Equations (85) and (86) show that item (a) of Lemma 3 is satisfied. Item (b) of Lemma 3 is also clearly satisfied since $h(x) = x$ is continuous.

Also, similar to the proof of Lemma 4, one can show using Assumption 5 that item (c) of Lemma 3 is satisfied for some $c > 0$. Thus, all the hypotheses of Lemma 3 are satisfied, and we have the limit

$\lim_{u\to\infty} \hat{x}^u_{\rm scalar}(z;\lambda) = h(\hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)) = \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$.

This proves the lemma.

We now turn to convergence of the random variable $\theta^u_{\rm scalar}(\sigma^2_{\rm eff}(u),\, u\sigma^2_{p\text{-}{\rm eff}}(u))$.

Lemma 7:
Consider the random vectors $\theta^u_{\rm scalar}(\sigma^2, \gamma)$ defined in (64a) and $\theta^{\rm map}_{\rm scalar}(\sigma^2, \gamma)$ in (64b). Let $\sigma^2_{\rm eff}(u)$, $\sigma^2_{p\text{-}{\rm eff}}(u)$, $\sigma^2_{\rm eff,map}$, and $\gamma_p$ be as defined in Assumption 2. Then the following limit holds:

$\lim_{u\to\infty} \theta^u_{\rm scalar}(\sigma^2_{\rm eff}(u),\, u\sigma^2_{p\text{-}{\rm eff}}(u)) = \theta^{\rm map}_{\rm scalar}(\sigma^2_{\rm eff,map}, \gamma_p)$   (87)

almost surely and in distribution.

Proof:
The proof is similar to that of Lemma 5. For any $\sigma^2$ and $\gamma > 0$, the vectors $\theta^u_{\rm scalar}(\sigma^2, \gamma)$ and $\theta^{\rm map}_{\rm scalar}(\sigma^2, \gamma)$ are deterministic functions of the random variables $x \sim p(x)$, $s \sim p_S(s)$, and $z$ given by (65) with $v \sim \mathcal{N}(0,1)$. Lemma 6 shows that the limit

$\lim_{u\to\infty} \theta^u_{\rm scalar}(\sigma^2, \gamma) = \theta^{\rm map}_{\rm scalar}(\sigma^2, \gamma)$   (88)

holds for any values of $\sigma^2$, $\gamma$, $x$, and $s$, and almost all $z$. Also, if we fix $x$, $s$, and $v$, then by Assumption 6 the function

$\hat{x}^{\rm pmap}_{\rm scalar}(z;\gamma/s) = \hat{x}^{\rm pmap}_{\rm scalar}\!\left( x + \frac{\sigma}{\sqrt{s}} v;\ \gamma/s \right)$

is continuous in $\gamma$ and $\sigma^2$ for almost all values of $v$. Therefore, we can combine (88) with the limits in Assumption 2 to show that

$\lim_{u\to\infty} \theta^u_{\rm scalar}(\sigma^2_{\rm eff}(u),\, u\sigma^2_{p\text{-}{\rm eff}}(u)) = \theta^{\rm map}_{\rm scalar}(\sigma^2_{\rm eff,map}, \gamma_p)$

for almost all $x$ and $s$ and almost all $z$. Since $z$ has a continuous probability distribution (due to the additive noise $v$ in (65)), the set of values where this limit does not hold must have probability zero. Thus, the limit (87) holds almost surely, and therefore also in distribution.

APPENDIX F
PROOF OF THE FIXED-POINT EQUATIONS
For the final part of the proof, we need to show that the limits $\sigma^2_{\rm eff,map}$ and $\gamma_p$ in Assumption 2 satisfy the fixed-point equations (30). The proof is straightforward, but we need to keep careful track of the notation. We begin with the following limit.

Lemma 8:
The following limit holds:

$\lim_{u\to\infty} \mathbf{E}\!\left[ s\,\mathrm{mse}(p_u, p, \mu^u_p, \mu_u, z_u) \right] = \mathbf{E}\!\left[ s\, |x - \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)|^2 \right]$,

where the expectations are taken over $x \sim p(x)$ and $s \sim p_S(s)$, and $z$ and $z_u$ are the random variables

$z_u = x + \sqrt{\mu_u}\, v$,   (89a)
$z = x + \sqrt{\mu}\, v$,   (89b)

with $v \sim \mathcal{N}(0,1)$ and

$\mu_u = \sigma^2_{\rm eff}(u)/s$, $\quad \mu^u_p = \sigma^2_{p\text{-}{\rm eff}}(u)/s$, $\quad \mu = \sigma^2_{\rm eff,map}/s$, $\quad \lambda = \gamma_p/s$.

Proof:
Using the definitions of $\mathrm{mse}(\cdot)$ in (11) and $\hat{x}^u_{\rm scalar}(z;\cdot)$ in (63),

$\mathrm{mse}(p_u, p, \mu^u_p, \mu_u, z_u) = \int_{x\in\mathcal{X}} |x - \hat{x}^{\rm pmmse}_{\rm scalar}(z_u;\, p_u, \mu^u_p)|^2\, p_{x|z}(x\,|\,z_u;\, p, \mu_u)\, dx = \int_{x\in\mathcal{X}} |x - \hat{x}^u_{\rm scalar}(z_u;\, u\mu^u_p)|^2\, p_{x|z}(x\,|\,z_u;\, p, \mu_u)\, dx$.

Therefore, fixing $s$ (and hence $\mu^u_p$ and $\mu_u$), we obtain the conditional expectation

$\mathbf{E}\!\left[ \mathrm{mse}(p_u, p, \mu^u_p, \mu_u, z_u) \,\big|\, s \right] = \mathbf{E}\!\left[ |x - \hat{x}^u_{\rm scalar}(z_u;\, u\mu^u_p)|^2 \,\big|\, s \right]$,   (90)

where the expectation on the right is over $x \sim p(x)$ and $z_u$ given by (89a).

Also, observe that the definitions $\mu_u = \sigma^2_{\rm eff}(u)/s$ and $\mu = \sigma^2_{\rm eff,map}/s$, along with the limit in Assumption 2, show that

$\lim_{u\to\infty} \mu_u = \mu$.   (91)

Similarly, since $\mu^u_p = \sigma^2_{p\text{-}{\rm eff}}(u)/s$ and $\lambda = \gamma_p/s$, Assumption 2 shows that

$\lim_{u\to\infty} u\mu^u_p = \lambda$.   (92)

Taking the limit as $u\to\infty$,

$\lim_{u\to\infty} \mathbf{E}\!\left[ s\,\mathrm{mse}(p_u, p, \mu^u_p, \mu_u, z_u) \right]$
 (a) $= \lim_{u\to\infty} \mathbf{E}\!\left[ s\, |x - \hat{x}^u_{\rm scalar}(z_u;\, u\mu^u_p)|^2 \right]$
 (b) $= \lim_{u\to\infty} \mathbf{E}\!\left[ s\, |x - \hat{x}^u_{\rm scalar}(z_u;\lambda)|^2 \right]$
 (c) $= \lim_{u\to\infty} \mathbf{E}\!\left[ s\, |x - \hat{x}^u_{\rm scalar}(z;\lambda)|^2 \right]$
 (d) $= \mathbf{E}\!\left[ s\, |x - \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)|^2 \right]$,

where (a) follows from (90); (b) follows from (92); (c) follows from (91), which implies that $z_u \to z$; and (d) follows from Lemma 6.
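A toy Monte Carlo sketch of the type of limit in Lemma 8 (a hypothetical scalar channel; the $s$-scaling and the effective-noise bookkeeping of the lemma are omitted, and the cost $f(x) = |x|$, Laplacian true prior, and noise levels are illustrative choices): the squared error of the posterior-mean estimator under the sharpened prior approaches that of the postulated MAP (soft-threshold) estimator as $u$ grows.

```python
import numpy as np

# E[(x - xhat_u(z))^2] -> E[(x - xhat_pmap(z))^2] as u -> infinity,
# with cost f(x) = |x|, true prior x ~ Laplace, channel z = x + sqrt(mu)*v.
rng = np.random.default_rng(0)
grid = np.linspace(-12.0, 12.0, 4001)

def xhat_u(z, lam, u):
    # posterior mean under q_u(x) ∝ exp(-u*F(x,z,lam)), F = (z-x)^2/(2*lam)+|x|
    F = (z - grid) ** 2 / (2 * lam) + np.abs(grid)
    w = np.exp(-u * (F - F.min()))
    return np.sum(grid * w) / np.sum(w)

def xhat_pmap(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)   # soft threshold

lam, mu, n = 0.5, 0.25, 500
x = rng.laplace(size=n)
z = x + np.sqrt(mu) * rng.standard_normal(n)
mse_map = np.mean([(xi - xhat_pmap(zi, lam)) ** 2 for xi, zi in zip(x, z)])
mse_u = {u: np.mean([(xi - xhat_u(zi, lam, u)) ** 2 for xi, zi in zip(x, z)])
         for u in [1.0, 10.0, 100.0]}
print(mse_map, mse_u)   # mse_u[100.0] is close to mse_map
```

Since the same samples are used for both estimators, the gap between the two empirical MSEs shrinks with $u$ even at modest sample sizes.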
The previous lemma enables us to evaluate the limit of the MSE in (30a). To evaluate the limit of the MSE in (30b), we need the following lemma.
Lemma 9:
Fix $z$ and $\lambda$, and let

$g(u,x) = u\,|x - \hat{x}|^2, \qquad \hat{x} = \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$.   (93)

Also, let $\varphi(x)$ be given by (84) and $q_u(x)$ be given by (72) with $\mathcal{D} = \mathcal{X}$. Then, for any $\epsilon > 0$, there exists an open neighborhood $U \subseteq \mathcal{X}$ of $\hat{x}$ such that

$\limsup_{u\to\infty} \left| \int_{x\in U} g(u,x)\, q_u(x)\, dx - \sigma^2(z,\lambda) \right| < \epsilon$,

where $\sigma^2(z,\lambda)$ is given in Assumption 6.

Proof:
The proof is straightforward but somewhat tedious; we outline the main steps. Let $\delta > 0$. Using Assumption 5, one can find an open neighborhood $U \subseteq \mathcal{X}$ of $\hat{x}$ such that for all $x \in U$ and $u > 0$,

$\phi(x, \sigma^2_-(u)) \le \exp(-u(\varphi(x) - \varphi(\hat{x}))) \le \phi(x, \sigma^2_+(u))$,   (94)

where $\phi(x,\sigma^2)$ is the unnormalized Gaussian distribution

$\phi(x,\sigma^2) = \exp\!\left( -\frac{1}{2\sigma^2} |x - \hat{x}|^2 \right)$

and

$\sigma^2_+(u) = (1+\delta)\,\sigma^2(z,\lambda)/u, \qquad \sigma^2_-(u) = (1-\delta)\,\sigma^2(z,\lambda)/u$.

Combining the bounds in (94) with the definition of $q_u(x)$ in (72) and the fact that $U \subseteq \mathcal{X}$ shows that for all $x \in U$ and $u > 0$,

$q_u(x) = \left[ \int_{x\in\mathcal{X}} e^{-u\varphi(x)}\, dx \right]^{-1} e^{-u\varphi(x)} \le \left[ \int_{x\in U} \phi(x, \sigma^2_-(u))\, dx \right]^{-1} \phi(x, \sigma^2_+(u))$.

Therefore,

$\int_{x\in U} g(u,x)\, q_u(x)\, dx = \int_{x\in U} u\,|x - \hat{x}|^2\, q_u(x)\, dx \le \left[ \int_{x\in U} \phi(x, \sigma^2_-(u))\, dx \right]^{-1} \int_{x\in U} u\,|x - \hat{x}|^2\, \phi(x, \sigma^2_+(u))\, dx$.   (95)

Now, it can be verified that

$\lim_{u\to\infty} \int_{x\in U} u^{1/2}\, \phi(x, \sigma^2_-(u))\, dx = \sqrt{2\pi(1-\delta)\sigma^2(z,\lambda)}$   (96)

and

$\lim_{u\to\infty} \int_{x\in U} u^{3/2}\, |x - \hat{x}|^2\, \phi(x, \sigma^2_+(u))\, dx = \sqrt{2\pi}\,\left( (1+\delta)\sigma^2(z,\lambda) \right)^{3/2}$.   (97)

Substituting (96) and (97) into (95) shows that

$\limsup_{u\to\infty} \int_{x\in U} g(u,x)\, q_u(x)\, dx \le \frac{(1+\delta)^{3/2}}{(1-\delta)^{1/2}}\, \sigma^2(z,\lambda)$.

A similar calculation shows that

$\liminf_{u\to\infty} \int_{x\in U} g(u,x)\, q_u(x)\, dx \ge \frac{(1-\delta)^{3/2}}{(1+\delta)^{1/2}}\, \sigma^2(z,\lambda)$.

Therefore, with an appropriate selection of $\delta$, one can find a neighborhood $U$ of $\hat{x}$ such that

$\limsup_{u\to\infty} \left| \int_{x\in U} g(u,x)\, q_u(x)\, dx - \sigma^2(z,\lambda) \right| < \epsilon$,

and this proves the lemma.

Using the above result, we can evaluate the scalar MSE.

Lemma 10:
Using the notation of Lemma 8,

$\lim_{u\to\infty} \mathbf{E}\!\left[ u\, s\,\mathrm{mse}(p_u, p_u, \mu^u_p, \mu^u_p, z_u) \right] = \mathbf{E}\!\left[ s\, \sigma^2(z, \gamma_p/s) \right]$.

Proof:
This is an application of Lemma 2. Fix $z$ and $\lambda$, and define $g(u,x)$ as in (93). As in the proof of Lemma 6, the conditional distribution $p_{x|z}(x\,|\,z;\, p_u, \lambda/u)$ is given by (83) with $\varphi(x)$ given by (84). The definition of $\hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$ in (25) shows that $\hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)$ minimizes $\varphi(x)$. Similar to the proof of Lemma 6, one can show that items (a) and (b) of Lemma 2 are satisfied. Also, Lemma 9 shows that item (c) of Lemma 2 holds with $g_\infty = \sigma^2(z,\lambda)$. Therefore, all the hypotheses of Lemma 2 are satisfied, and

$\lim_{u\to\infty} \int_{x\in\mathcal{X}} u\,|x - \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)|^2\, q_u(x)\, dx = \sigma^2(z,\lambda)$   (98)

for all $\lambda$ and almost all $z$.

Now,

$\mathrm{mse}(p_u, p_u, \lambda/u, \lambda/u, z)$
 (a) $= \int_{x\in\mathcal{X}} |x - \hat{x}^{\rm pmmse}_{\rm scalar}(z;\, p_u, \lambda/u)|^2\, p_{x|z}(x\,|\,z;\, p_u, \lambda/u)\, dx$
 (b) $= \int_{x\in\mathcal{X}} |x - \hat{x}^{\rm pmmse}_{\rm scalar}(z;\, p_u, \lambda/u)|^2\, q_u(x)\, dx$
 (c) $= \int_{x\in\mathcal{X}} |x - \hat{x}^u_{\rm scalar}(z;\lambda)|^2\, q_u(x)\, dx$,   (99)

where (a) is the definition of $\mathrm{mse}(\cdot)$ in (11); (b) follows from (83); and (c) follows from (63). Taking the limit of this expression,

$\lim_{u\to\infty} u\,\mathrm{mse}(p_u, p_u, \lambda/u, \lambda/u, z)$
 (a) $= \lim_{u\to\infty} \int_{x\in\mathcal{X}} u\,|x - \hat{x}^u_{\rm scalar}(z;\lambda)|^2\, q_u(x)\, dx$
 (b) $= \lim_{u\to\infty} \int_{x\in\mathcal{X}} u\,|x - \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)|^2\, q_u(x)\, dx$
 (c) $= \sigma^2(z,\lambda)$,   (100)

where (a) follows from (99); (b) follows from Lemma 6; and (c) follows from (98).

The variables $z_u$ and $z$ in (89a) and (89b), as well as $\mu_u$ and $\mu^u_p$, are deterministic functions of $x$, $v$, $s$, and $u$.
Fixing $x$, $v$, and $s$ and taking the limit with respect to $u$, we obtain the deterministic limit

$\lim_{u\to\infty} u\,\mathrm{mse}(p_u, p_u, \mu^u_p, \mu^u_p, z_u)$
 (a) $= \lim_{u\to\infty} u\,\mathrm{mse}(p_u, p_u, \sigma^2_{p\text{-}{\rm eff}}(u)/s, \sigma^2_{p\text{-}{\rm eff}}(u)/s, z_u)$
 (b) $= \lim_{u\to\infty} \sigma^2(z_u,\, u\sigma^2_{p\text{-}{\rm eff}}(u)/s)$
 (c) $= \lim_{u\to\infty} \sigma^2(z,\, u\sigma^2_{p\text{-}{\rm eff}}(u)/s)$
 (d) $= \sigma^2(z, \gamma_p/s)$,   (101)

where (a) follows from the definitions of $\mu_u$ and $\mu^u_p$ in Lemma 8; (b) follows from (100); (c) follows from the limit (proved in Lemma 8) that $z_u \to z$ as $u\to\infty$; and (d) follows from the limit in Assumption 2.

Finally, observe that for any prior $p$ and noise level $\mu$,

$\mathrm{mse}(p, p, \mu, \mu, z) \le \mu$,

since the MSE must be smaller than the additive noise level $\mu$. Therefore, for any $u$ and $s$,

$u\, s\,\mathrm{mse}(p_u, p_u, \mu^u_p, \mu^u_p, z_u) \le u\, s\, \mu^u_p = u\, \sigma^2_{p\text{-}{\rm eff}}(u)$,

where we have used the definition $\mu^u_p = \sigma^2_{p\text{-}{\rm eff}}(u)/s$. Since $u\sigma^2_{p\text{-}{\rm eff}}(u)$ converges, there must exist a constant $M > 0$ such that

$u\, s\,\mathrm{mse}(p_u, p_u, \mu^u_p, \mu^u_p, z_u) \le u\, s\, \mu^u_p \le M$

for all $u$, $s$, and $z_u$. The lemma now follows from taking expectations of both sides of (101) and applying the Dominated Convergence Theorem.

We can now show that the limiting noise values satisfy the fixed-point equations.

Lemma 11:
The limiting effective noise levels $\sigma^2_{\rm eff,map}$ and $\gamma_p$ in Assumption 2 satisfy the fixed-point equations (30a) and (30b).

Proof:
The noise levels $\sigma^2_{\rm eff}(u)$ and $\sigma^2_{p\text{-}{\rm eff}}(u)$ satisfy the fixed-point equations (13a) and (13b) of the RS PMMSE decoupling property with the postulated prior $p_{\rm post} = p_u$ and noise level $\sigma^2_{\rm post} = \gamma/u$. Therefore, using the notation of Lemma 8,

$\sigma^2_{\rm eff}(u) = \sigma^2_0 + \beta\, \mathbf{E}\!\left[ s\,\mathrm{mse}(p_u, p, \mu^u_p, \mu_u, z_u) \right]$,   (102a)
$u\sigma^2_{p\text{-}{\rm eff}}(u) = \gamma + \beta\, \mathbf{E}\!\left[ u\, s\,\mathrm{mse}(p_u, p_u, \mu^u_p, \mu^u_p, z_u) \right]$,   (102b)

where (as defined in Lemma 8) $\mu_u = \sigma^2_{\rm eff}(u)/s$ and $\mu^u_p = \sigma^2_{p\text{-}{\rm eff}}(u)/s$, and the expectations are taken over $s \sim p_S(s)$, $x \sim p(x)$, and $z_u$ in (89a). Therefore,

$\sigma^2_{\rm eff,map}$
 (a) $= \lim_{u\to\infty} \sigma^2_{\rm eff}(u)$
 (b) $= \sigma^2_0 + \beta \lim_{u\to\infty} \mathbf{E}\!\left[ s\,\mathrm{mse}(p_u, p, \mu^u_p, \mu_u, z_u) \right]$
 (c) $= \sigma^2_0 + \beta\, \mathbf{E}\!\left[ s\, |x - \hat{x}^{\rm pmap}_{\rm scalar}(z;\lambda)|^2 \right]$,

where (a) follows from the limit in Assumption 2; (b) follows from (102a); and (c) follows from Lemma 8. This shows that (30a) is satisfied. Similarly,

$\gamma_p$
 (a) $= \lim_{u\to\infty} u\sigma^2_{p\text{-}{\rm eff}}(u)$
 (b) $= \gamma + \beta \lim_{u\to\infty} \mathbf{E}\!\left[ u\, s\,\mathrm{mse}(p_u, p_u, \mu^u_p, \mu^u_p, z_u) \right]$
 (c) $= \gamma + \beta\, \mathbf{E}\!\left[ s\, \sigma^2(z, \gamma_p/s) \right]$,

where (a) follows from the limit in Assumption 2; (b) follows from (102b); and (c) follows from Lemma 10. This shows that (30b) is satisfied.

ACKNOWLEDGMENT
The authors thank the reviewers for their helpful comments. They also thank Martin Vetterli and Amin Shokrollahi for discussions and support.

REFERENCES

[1] S. F. Edwards and P. W. Anderson, "Theory of spin glasses," J. Phys. F: Metal Physics, vol. 5, pp. 965–974, 1975.
[2] M. Mézard and G. Parisi, "A replica analysis of the travelling salesman problem," J. Phys., pp. 1285–1296, 1986.
[3] Y. Fu and P. W. Anderson, "Application of statistical mechanics to NP-complete problems in combinatorial optimisation," J. Phys. A: Math. Gen., vol. 19, pp. 1605–1620, 1986.
[4] R. Monasson and R. Zecchina, "Statistical mechanics of the random K-satisfiability model," Phys. Rev. E, vol. 56, no. 2, pp. 1357–1370, 1997.
[5] H. Nishimori, Statistical Physics of Spin Glasses and Information Processing: An Introduction, ser. International Series of Monographs on Physics. Oxford, UK: Oxford Univ. Press, 2001.
[6] N. Sourlas, "Spin-glass models as error-correcting codes," Nature, vol. 339, pp. 693–695, Jun. 1989.
[7] A. Montanari, "Turbo codes: The phase transition," Europ. Phys. J. B, vol. 18, pp. 121–136, 2000.
[8] V. Dotsenko, An Introduction to the Theory of Spin Glasses and Neural Networks. World Scientific Press, 1995.
[9] M. Mézard, G. Parisi, and M. Virasoro, Spin Glass Theory and Beyond: An Introduction. World Scientific Press, 2001.
[10] M. Mézard and A. Montanari, Information, Physics and Computation. Oxford University Press, 2009.
[11] M. Talagrand, Spin Glasses: A Challenge for Mathematicians. New York: Springer, 2003.
[12] T. Tanaka, "A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors," IEEE Trans. Inform. Theory, vol. 48, no. 11, pp. 2888–2910, Nov. 2002.
[13] R. R. Müller, "Channel capacity and minimum probability of error in large dual antenna array systems with binary modulation," IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2821–2828, Nov. 2003.
[14] D. Guo and S. Verdú, "Randomly spread CDMA: Asymptotics via statistical physics," IEEE Trans. Inform. Theory, vol. 51, no. 6, pp. 1983–2010, Jun. 2005.
[15] B. Zaidel, R. Müller, A. Moustakas, and R. de Miguel, "Vector precoding for Gaussian MIMO broadcast channels: Impact of replica symmetry breaking," arXiv:1001.3790 [cs.IT], Jan. 2010.
[16] J. Boutros and G. Caire, "Iterative multiuser joint decoding: Unified framework and asymptotic analysis," IEEE Trans. Inform. Theory, vol. 48, no. 7, pp. 1772–1793, Jul. 2002.
[17] T. Tanaka and M. Okada, "Approximate belief propagation, density evolution, and neurodynamics for CDMA multiuser detection," IEEE Trans. Inform. Theory, vol. 51, no. 2, pp. 700–706, Feb. 2005.
[18] A. Montanari and D. Tse, "Analysis of belief propagation for non-linear problems: The example of CDMA (or: How to prove Tanaka's formula)," arXiv:cs/0602028v1 [cs.IT], Feb. 2006.
[19] D. Guo and C.-C. Wang, "Asymptotic mean-square optimality of belief propagation for sparse linear systems," in Proc. IEEE Inform. Theory Workshop, Chengdu, China, Oct. 2006, pp. 194–198.
[20] ——, "Random sparse linear systems observed via arbitrary channels: A decoupling principle," in Proc. IEEE Int. Symp. Inform. Theory, Nice, France, Jun. 2007, pp. 946–950.
[21] S. Rangan, "Estimation with random linear mixing, belief propagation and compressed sensing," in Proc. Conf. on Inform. Sci. & Sys., Princeton, NJ, Mar. 2010, pp. 1–6.
[22] ——, "Estimation with random linear mixing, belief propagation and compressed sensing," arXiv:1001.2228v1 [cs.IT], Jan. 2010.
[23] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, Nov. 2009.
[24] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inform. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.
[25] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," arXiv:1010.5141 [cs.IT], Oct. 2010.
[26] ——, "Generalized approximate message passing for estimation with random linear mixing," in Proc. IEEE Int. Symp. Inform. Theory, Saint Petersburg, Russia, Jul.–Aug. 2011, pp. 2174–2178.
[27] A. Montanari, "Estimating random variables from random sparse observations," arXiv:0709.0145v1 [cs.IT], Feb. 2008.
[28] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001.
[29] A. Ashikhmin, G. Kramer, and S. ten Brink, "Extrinsic information transfer functions: Model and erasure channel properties," IEEE Trans. Inform. Theory, vol. 50, no. 11, pp. 2657–2673, Nov. 2004.
[30] A. Montanari, "Graphical models concepts in compressed sensing," arXiv:1011.4328v3 [cs.IT], Mar. 2011.
[31] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[32] D. L. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[33] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[34] B. K. Natarajan, "Sparse approximate solutions to linear systems,"
SIAMJ. Computing , vol. 24, no. 2, pp. 227–234, Apr. 1995.[35] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methodsand their application to non-linear system identification,”
Int. J. Control ,vol. 50, no. 5, pp. 1873–1896, Nov. 1989.[36] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequencydictionaries,”
IEEE Trans. Signal Process. , vol. 41, no. 12, pp. 3397–3415, Dec. 1993.[37] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matchingpursuit: Recursive function approximation with applications to waveletdecomposition,” in
Conf. Rec. 27th Asilomar Conf. Sig., Sys., & Comput. ,vol. 1, Pacific Grove, CA, Nov. 1993, pp. 40–44.[38] G. Davis, S. Mallat, and Z. Zhang, “Adaptive time-frequency decompo-sition,”
Optical Eng. , vol. 33, no. 7, pp. 2183–2191, Jul. 1994.[39] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decompositionby basis pursuit,”
SIAM J. Sci. Comp. , vol. 20, no. 1, pp. 33–61, 1999.[40] R. Tibshirani, “Regression shrinkage and selection via the lasso,”
J.Royal Stat. Soc., Ser. B , vol. 58, no. 1, pp. 267–288, 1996.[41] J. A. Tropp, “Greed is good: Algorithmic results for sparse approxima-tion,”
IEEE Trans. Inform. Theory , vol. 50, no. 10, pp. 2231–2242, Oct.2004.[42] ——, “Just relax: Convex programming methods for identifying sparsesignals in noise,”
IEEE Trans. Inform. Theory , vol. 52, no. 3, pp. 1030–1051, Mar. 2006.[43] D. L. Donoho, M. Elad, and V. N. Temlyakov, “Stable recovery of sparseovercomplete representations in the presence of noise,”
IEEE Trans.Inform. Theory , vol. 52, no. 1, pp. 6–18, Jan. 2006.[44] M. J. Wainwright, “Sharp thresholds for high-dimensional and noisysparsity recovery using ℓ -constrained quadratic programming (lasso),” IEEE Trans. Inform. Theory , vol. 55, no. 5, pp. 2183–2202, May 2009.[45] D. L. Donoho and J. Tanner, “Counting faces of randomly-projectedpolytopes when the projection radically lowers dimension,”
J. Amer.Math. Soc. , vol. 22, no. 1, pp. 1–53, Jan. 2009.[46] D. Guo, D. Baron, and S. Shamai, “A single-letter characterization ofoptimal noisy compressed sensing,” in
Proc. 47th Ann. Allerton Conf.on Commun., Control and Comp. , Monticello, IL, Sep.–Oct. 2009.[47] N. Merhav, D. Guo, and S. Shamai, “Statistical physics of signalestimation in Gaussian noise: Theory and examples of phase transitions,”
IEEE Trans. Inform. Theory , vol. 56, no. 3, pp. 1400–1416, 2010.[48] Y. Kabashima, T. Wadayama, and T. Tanaka, “Typical reconstruc-tion limit of compressed sensing based on l p -norm minimization,”arXiv:0907.0914 [cs.IT]., Jun. 2009. [49] D. Voiculescu, “Limit laws for random matrices and free products,” Inventiones Mathematicae , vol. 104, pp. 201–220, 1991.[50] R. R. M¨uller, “Random matrices, free probability and the replicamethod,” in
EUSIPCO , 2004.[51] G. Caire, S. Shamai, A. Tulino, and S. Verd´u, “Support recovery incompressed sensing: Information-theoretic bounds,” in
Proc. UCSDWorkshop Inform. Theory & Its Applications , La Jolla, CA, Jan. 2011.[52] S. Verd´u, “Minimum probability of error for asynchronous Gaussianmultiple-access channel,”
IEEE Trans. Inform. Theory , vol. 32, no. 1,pp. 85–96, Jan. 1986.[53] S. Verdu,
Multiuser Detection . New York: Cambridge University Press,1998.[54] N. Merhav, “Physics of the Shannon limits,” arXiv:0903.1484 [cs.IT].,Mar. 2009.[55] H. Rauhut, K. Schnass, and P. Vandergheynst, “Compressed sensing andredundant dictionaries,”
IEEE Trans. Inform. Theory , vol. 54, no. 5, pp.2210–2219, May 2008.[56] A. K. Fletcher, S. Rangan, and V. K. Goyal, “Necessary and sufficientconditions for sparsity pattern recovery,”
IEEE Trans. Inform. Theory ,vol. 55, no. 12, pp. 5758–5772, Dec. 2009.[57] V. A. Marˇcenko and L. A. Pastur, “Distribution of eigenvalues for somesets of random matrices,”
Math. USSR–Sbornik , vol. 1, no. 4, pp. 457–483, 1967.[58] S. Verd´u and S. Shamai, “Spectral efficiency of CDMA with randomspreading,”
IEEE Trans. Inform. Theory , vol. 45, no. 3, pp. 622–640,Mar. 1999.[59] D. Tse and S. Hanly, “Linear multiuser receivers: Effective interference,effective bandwidth and capacity,”
IEEE Trans. Inform. Theory , vol. 45,no. 3, pp. 641–675, Mar. 1999.[60] S. Verd´u and S. Shamai, “Multiuser detection with random spreadingand error-correction codes: Fundamental limits,” in
Proc. Allerton Conf.on Commun., Control and Comp. , Monticello, IL, Sep. 1997.[61] S. Rangan, A. K. Fletcher, and V. K. Goyal, “Asymptotic analysis ofMAP estimation via the replica method and applications to compressedsensing,” arXiv:0906.3234v1 [cs.IT]., Jun. 2009.[62] D. Wipf and B. Rao, “Comparing the effects of different weight distri-butions on finding sparse representations,” in
Proc. Neural InformationProcess. Syst. , Vancouver, Canada, Dec. 2006.[63] A. K. Fletcher, S. Rangan, and V. K. Goyal, “On–off random ac-cess channels: A compressed sensing framework,” arXiv:0903.1022v1[cs.IT]., Mar. 2009.[64] A. Miller,
Subset Selection in Regression , 2nd ed., ser. Monographs onStatistics and Applied Probability. New York: Chapman & Hall/CRC,2002, no. 95.[65] M. J. Wainwright, “Information-theoretic limits on sparsity recovery inthe high-dimensional and noisy setting,”
IEEE Trans. Inform. Theory ,vol. 55, no. 12, pp. 5728–5741, Dec. 2009.[66] M. Akc¸akaya and V. Tarokh, “Shannon-theoretic limits on noisy com-pressive sampling,”
IEEE Trans. Inform. Theory , vol. 56, no. 1, pp.492–504, Jan. 2010.[67] G. Reeves, “Sparse signal sampling using noisy linear projections,” Univ.of California, Berkeley, Dept. of Elec. Eng. and Comp. Sci., Tech. Rep.UCB/EECS-2008-3, Jan. 2008.[68] S. Aeron, V. Saligrama, and M. Zhao, “Information theoretic bounds forcompressed sensing,”
IEEE Trans. Inform. Theory , vol. 56, no. 10, pp.5111–5130, Oct. 2010.[69] V. Saligrama and M. Zhao, “Thresholded basis pursuit: LP algorithm foroder-wise optimal support recovery for sparse and approximately sparsesignals from noisy measurements,”
IEEE Trans. Inform. Theory , vol. 57,no. 3, pp. 1567–1586, Mar. 2011.[70] J. Haupt and R. Nowak, “Signal reconstruction from noisy randomprojections,”
IEEE Trans. Inform. Theory , vol. 52, no. 9, pp. 4036–4048,Sep. 2006.[71] E. J. Cand`es and T. Tao, “The Dantzig selector: Statistical estimationwhen p is much larger than n ,” Ann. Stat. , vol. 35, no. 6, pp. 2313–2351,Dec. 2007.[72] S. Sarvotham, D. Baron, and R. G. Baraniuk, “Measurements vs. bits:Compressed sensing meets information theory,” in
Proc. 44th Ann.Allerton Conf. on Commun., Control and Comp. , Monticello, IL, Sep.2006.[73] A. Dembo and O. Zeitouni,
Large Deviations Techniques and Applica-tions . New York: Springer, 1998.[74] H. Nishimori and D. Sherrington, “Absence of replica symmetry break-ing in a region of the phase diagram of the Ising spin glass,” in
Amer.Inst. Phys. Conf.: Disordered and Complex Sys. , 2001. [75] J. R. L. de Almeida and D. J. Thouless, “Stability of the Sherrington–Kirkpatrick solution of a spin glass model,”