[PDF] Asymptotic Analysis of ADMM for Compressed Sensing

Abstract

In this paper, we analyze the asymptotic behavior of alternating direction method of multipliers (ADMM) for compressed sensing, where we reconstruct an unknown structured signal from its underdetermined linear measurements. The analytical tool used in this paper is recently developed convex Gaussian min-max theorem (CGMT), which can be applied to various convex optimization problems to obtain its asymptotic error performance. In our analysis of ADMM, we analyze the convex subproblem in the update of ADMM and characterize the asymptotic distribution of the tentative estimate obtained at each iteration. The result shows that the update equations in ADMM can be decoupled into a scalar-valued stochastic process in the asymptotic regime with the large system limit. From the asymptotic result, we can predict the evolution of the error (e.g. mean-square-error (MSE) and symbol error rate (SER)) in ADMM for large-scale compressed sensing problems. Simulation results show that the empirical performance of ADMM and its theoretical prediction are close to each other in sparse vector reconstruction and binary vector reconstruction.

Full PDF

JJOURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 1

Asymptotic Analysis of ADMMfor Compressed Sensing

Ryo Hayakawa,

Member, IEEE

Abstract —In this paper, we analyze the asymptotic behaviorof alternating direction method of multipliers (ADMM) forcompressed sensing, where we reconstruct an unknown struc-tured signal from its underdetermined linear measurements. Theanalytical tool used in this paper is recently developed convexGaussian min-max theorem (CGMT), which can be applied tovarious convex optimization problems to obtain its asymptoticerror performance. In our analysis of ADMM, we analyze theconvex subproblem in the update of ADMM and characterizethe asymptotic distribution of the tentative estimate obtained ateach iteration. The result shows that the update equations inADMM can be decoupled into a scalar-valued stochastic processin the asymptotic regime with the large system limit. From theasymptotic result, we can predict the evolution of the error(e.g. mean-square-error (MSE) and symbol error rate (SER)) inADMM for large-scale compressed sensing problems. Simulationresults show that the empirical performance of ADMM and itstheoretical prediction are close to each other in sparse vectorreconstruction and binary vector reconstruction.

Index Terms —Compressed sensing, alternating directionmethod of multipliers, convex Gaussian min-max theorem,asymptotic performance

I. I

NTRODUCTION C OMPRESSED sensing [1]–[4] becomes a key technologyin the ﬁeld of signal processing such as image process-ing [5], [6] and wireless communication [7], [8]. A basicproblem in compressed sensing is to reconstruct an unknownsparse vector from its underdetermined linear measurements,where the number of measurements is less than that ofunknown variables. The compressed sensing techniques takeadvantage of the sparsity as the prior knowledge to reconstructthe vector. The idea of compressed sensing can also be appliedfor other structured signals by appropriately utilizing thestructures, e.g., group sparsity [9], low-rankness [10], [11],and discreteness [12], [13].For compressed sensing, various algorithms have been pro-posed in the literature. Greedy algorithms such as match-ing pursuit (MP) [14] and orthogonal matching pursuit(OMP) [15], [16] iteratively update the support of the esti-mate of the unknown sparse vector. Several improved greedyalgorithms have also been proposed to achieve better re-construction performance [17]–[22]. Another approach forcompressed sensing is based on message passing algorithmsusing Bayesian framework. Approximated belief propagation(BP) [23] and approximate message passing (AMP) [24]

This work has been submitted to the IEEE for possible publication.Copyright may be transferred without notice, after which this version mayno longer be accessible.Ryo Hayakawa is with Graduate School of Engineering Science, OsakaUniversity, Osaka 560-8531, Japan (e-mail: [email protected]). can reconstruct the unknown vector with low computationalcomplexity. Moreover, the asymptotic performance can bepredicted by the state evolution framework [24], [25]. TheAMP algorithm can also be used when the unknown vector hassome structure other than the sparsity [26], [27]. However, theAMP algorithm requires an assumption on the measurementmatrix, and hence other message passing-based algorithmshave also been proposed [28]–[31].Convex optimization-based approaches have also been wellstudied for compressed sensing. The most popular convexoptimization problem for compressed sensing is the (cid:96) op-timization, where we utilize the (cid:96) norm as the regularizer topromote the sparsity of the estimate. Although the objectivefunction is not differentiable, the iterative soft threshold-ing algorithm (ISTA) [32]–[34] and the fast iterative softthresholding algorithm (FISTA) [35] can solve the (cid:96) op-timization problem with feasible computational complexity.Another promising algorithm is alternating direction methodof multipliers (ADMM) [36]–[39], which can be applied towider class of optimization problems than ISTA and FISTA.Moreover, ADMM can provide a sufﬁciently accurate solutionwith relatively small number of iterations in practice [39].However, since the convergence speed largely depends onthe parameter, it is important to determine the appropriateparameter in practical applications. For the parameter selectionof ADMM, several approaches have been proposed [40]–[44]. However, they are rather heuristic or inapplicable tocompressed sensing problems.There are several theoretical analyses for convexoptimization-based compressed sensing, e.g., [45]–[47].In particular, recently developed convex Gaussian min-maxtheorem (CGMT) [48], [49] can be utilized to obtain theasymptotic error of various optimization problems in aprecise manner. For example, the asymptotic mean-square-error (MSE) has been analyzed for various regularizedestimators [49], [50]. The asymptotic symbol error rate (SER)has also been derived for convex optimization-based discrete-valued vector reconstruction [51], [52]. The CGMT-basedanalysis has been extended for the optimization problem inthe complex-valued domain [53], whereas above analysesconsider optimization problems in the real-valued domain.These analyses focus on the performance of the optimizer,and do not deeply discuss the optimization algorithm toobtain the optimizer.In this paper, we analyze the asymptotic behavior of ADMMfor convex optimization-based compressed sensing. The mainidea is that, when we use the squared loss function as the dataﬁdelity term, the subproblem in the iterations of ADMM can a r X i v : . [ ee ss . SP ] S e p OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 2 be analyzed by the CGMT framework. We thus analyze theasymptotic property of the tentative estimate of the unknownvector at each iteration in ADMM. We show that the asymp-totic distribution of the tentative estimate can be characterizedby a scalar-valued stochastic process, which depends on themeasurement ratio, the parameter in the optimization problem,the parameter in ADMM, the distribution of the unknownvector, and the noise variance. As a corollary, we can predictthe evolution of the error such as MSE and SER in ADMM forlarge-scale compressed sensing problems. We can also utilizethe asymptotic result to reveal the effect of the parameterin ADMM and tune it to achieve the fast convergence. Asexamples, we consider sparse vector reconstruction and binaryvector reconstruction and then evaluate the asymptotic resultvia computer simulations. Simulation results show that theasymptotic evolution of MSE converges to the MSE of theoptimizer, which can be obtained with the previous CGMT-based analysis in the literature [49]. We also observe that theempirical performance of ADMM and its theoretical predictionare close to each other in both sparse vector reconstruction andbinary vector reconstruction.The rest of the paper is organized as follows. In Sec-tion II, we describe the ADMM-based compressed sensingand CGMT as the preliminary. We then provide the mainanalytical results for ADMM in Section III. In Section IV,we consider two examples of the reconstruction problem andshow several simulation results. Finally, Section V presentssome conclusions.In this paper, we use the following notations. We denotethe transpose by ( · ) T and the identity matrix by I . Fora vector z = [ z · · · z N ] T ∈ R N , the (cid:96) norm and the (cid:96) norm are given by (cid:107) z (cid:107) = (cid:80) Nn =1 | z n | and (cid:107) z (cid:107) = (cid:113)(cid:80) Nn =1 z n , respectively. We denote the number of nonzeroelements of z by (cid:107) z (cid:107) . sign( · ) denotes the sign function.For a lower semicontinuous convex function ζ : R N → R ∪ { + ∞} , we deﬁne the proximity operator as prox ζ ( z ) =arg min u ∈ R N (cid:110) ζ ( u ) + (cid:107) u − z (cid:107) (cid:111) . The Gaussian distribu-tion with mean µ and variance σ is denoted as N ( µ, σ ) .When a sequence of random variables { Θ n } ( n = 1 , , . . . )converges in probability to Θ , we denote Θ n P −→ Θ as n → ∞ or plim n →∞ Θ n = Θ .II. P RELIMINARIES

A. ADMM-Based Compressed Sensing

In this paper, we consider the reconstruction of an N dimensional vector x = [ x · · · x N ] T ∈ R N from its linearmeasurements given by y = Ax + v ∈ R M . (1)Here, A ∈ R M × N is a known measurement matrix and v ∈ R M is an additive Gaussian noise vector. We denotethe measurement ratio by ∆ = M/N . In the scenario ofcompressed sensing, we focus on the underdetermined casewith ∆ < and utilize the structure of x as the priorknowledge for the reconstruction. Note that we can use not only the sparsity but also other structures such as boundednessand discreteness [13], [54].The convex optimization-based method is a promising ap-proach for compressed sensing because we can ﬂexibly designthe objective function to utilize the structure of the unknownvector x . In this paper, we consider the following convexoptimization problem minimize s ∈ R N (cid:26) (cid:107) y − As (cid:107) + λf ( s ) (cid:27) , (2)where f ( · ) : R N → R ∪ { + ∞} is a convex regularizer toutilize the prior knowledge of the unknown vector x . Forexample, (cid:96) regularization f ( s ) = (cid:107) s (cid:107) is a popular convexregularizer for the reconstruction of the sparse vector. Theregularization parameter λ ( > ) controls the balance betweenthe data ﬁdelity term (cid:107) y − As (cid:107) and the regularization term λf ( s ) .ADMM has been used in wide range of applications becauseit can be applied to various optimization problems. Moreover,we can obtain a sufﬁciently accurate solution with relativelysmall number of iterations in practice [39]. To derive ADMMfor the optimization problem (2), we ﬁrstly rewrite (2) as minimize s , z ∈ R N (cid:26) (cid:107) y − As (cid:107) + λf ( z ) (cid:27) subject to s = z . (3)The update equations of ADMM for (3) are given by s ( k +1) = arg min s ∈ R N (cid:26) (cid:107) y − As (cid:107) + ρ (cid:13)(cid:13)(cid:13) s − z ( k ) + w ( k ) (cid:13)(cid:13)(cid:13) (cid:27) (4) = (cid:0) A T A + ρ I (cid:1) − (cid:16) A T y + ρ (cid:16) z ( k ) − w ( k ) (cid:17)(cid:17) , (5) z ( k +1) = arg min z ∈ R N (cid:26) λf ( z ) + ρ (cid:13)(cid:13)(cid:13) s ( k +1) − z + w ( k ) (cid:13)(cid:13)(cid:13) (cid:27) (6) = prox λρ f (cid:16) s ( k +1) + w ( k ) (cid:17) , (7) w ( k +1) = w ( k ) + s ( k +1) − z ( k +1) , (8)where k ( = 0 , , , . . . ) is the iteration index in the algorithmand ρ ( > ) is the parameter. In this paper, we refer to s ( k +1) as the tentative estimate of the unknown vector x in ADMM.We use z (0) = w (0) = as the initial value in this paper. B. CGMT

CGMT associates the primary optimization (PO) problemwith the auxiliary optimization (AO) problem given by(PO): Φ( G ) = min w ∈S w max u ∈S u (cid:8) u T Gw + ξ ( w , u ) (cid:9) , (9)(AO): φ ( g , h ) = min w ∈S w max u ∈S u (cid:8) (cid:107) w (cid:107) g T u − (cid:107) u (cid:107) h T w + ξ ( w , u ) } , (10)respectively, where G ∈ R M × N , g ∈ R M , h ∈ R N , S w ⊂ R N , S u ⊂ R M , and ξ ( · , · ) : R N × R M → R . S w and S u are assumed to be closed compact sets. ξ ( · , · ) is a continuousconvex-concave function on S w ×S u . Also, the elements of G , OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 3 g , and h are i.i.d. standard Gaussian random variables. Fromthe following theorem, we can relate the optimizer ˆ w Φ ( G ) of(PO) with the optimal value of (AO) in the large system limitof M, N → ∞ with a ﬁxed ratio ∆ =

M/N . For simplicity,we denote the large system limit by N → ∞ in this paper. Theorem II.1 (CGMT [51]) . Let S be an open set in S w and S c = S w \ S . Also, let φ S c ( g , h ) be the optimal valueof (AO) with the constraint w ∈ S c . If there are constants η > and ¯ φ satisfying (i) φ ( g , h ) ≤ ¯ φ + η and (ii) φ S c ( g , h ) ≥ ¯ φ + 2 η with probability approaching , then wehave lim N →∞ Pr ( ˆ w Φ ( G ) ∈ S ) = 1 .III. M AIN R ESULT

In this section, we provide the main analytical result for thebehavior of ADMM for the problem (2). In the analysis, weuse the following assumptions.

Assumption III.1.

The unknown vector x is composed of in-dependent and identically distributed (i.i.d.) random variableswith a known distribution p X which has some mean and vari-ance. The measurement matrix A ∈ R M × N is composed ofi.i.d. Gaussian random variables with zero mean and variance /N . Moreover, the additive noise vector v ∈ R M is alsoGaussian with zero mean and the covariance matrix σ v I . Assumption III.2.

The regularizer f ( · ) : R N → R ∪ { + ∞} is a lower semicontinuous convex function. Moreover, f ( · ) is separable and expressed with the corresponding function ˜ f ( · ) : R → R ∪ { + ∞} as f ( s ) = (cid:80) Nn =1 ˜ f ( s n ) , where s =[ s · · · s N ] T ∈ R N . With the slight abuse of notation, we usethe same f ( · ) for the corresponding function ˜ f ( · ) .In Assumption III.1, we assume that the elements of themeasurement matrix A are Gaussian variables because CGMTrequires the Gaussian assumption in the proof [49]. However,the universality [55]–[57] of random matrices suggests that theresult of the analysis can be applied when the measurementmatrix is drawn from some other distributions. In fact, ourtheoretical result is valid for the random matrix from Bernoullidistribution with { / √ N , − / √ N } in computer simulations(See Example IV.1).In Assumption III.2, we assume the separability of theregularizer f ( · ) . Under this assumption, the proximity operator prox γf ( · ) : R N → R N ( γ > ) becomes an element-wisefunction, i.e., the n -th element of the output depends only onthe corresponding n -th element of the input.Under Assumptions III.1 and III.2, we present the followingtheorem. Theorem III.1.

We assume that x , A , v , and f ( · ) satisfythe Assumptions III.1 and III.2. We consider the followingstochastic process S k +1 = ˆ S k +1 ( α ∗ k , β ∗ k ) (11) Z k +1 = prox λρ f ( S k +1 + W k ) (12) W k +1 = W k + S k +1 − Z k +1 , (13) with the index k , where ˆ S k +1 ( α, β ) is deﬁned as ˆ S k +1 ( α, β )= 1 β √ ∆ α + ρ (cid:32) β √ ∆ α (cid:18) X + α √ ∆ H (cid:19) + ρ ( Z k − W k ) (cid:33) (14)with the random variables X ∼ p X and H ∼ N (0 , ( Z = W = 0 ). We here assume the optimization problem min α> max β> (cid:40) αβ √ ∆2 + βσ v √ ∆2 α − β + E (cid:104) J ( k +1) ( α, β ) (cid:105)(cid:41) (15)has a unique optimizer ( α ∗ k , β ∗ k ) , where J ( k +1) ( α, β )= β √ ∆2 α (cid:16) ˆ S k +1 ( α, β ) − X (cid:17) − βH (cid:16) ˆ S k +1 ( α, β ) − X (cid:17) + ρ (cid:16) ˆ S k +1 ( α, β ) − Z k + W k (cid:17) . (16)The expectation is taken over all random variables X , H , Z k ,and W k .Let µ s ( k +1) be the empirical distribution of s ( k +1) cor-responding to the cumulative distribution function (CDF)given by P s ( k +1) ( s ) = N (cid:80) Nn =1 I (cid:16) s ( k +1) n < s (cid:17) , where wedeﬁne I (cid:16) s ( k +1) n < s (cid:17) = 1 if s ( k +1) n < s and otherwise I (cid:16) s ( k +1) n < s (cid:17) = 0 . Moreover, we denote the distribution ofthe random variable S k +1 in (11) as µ S k +1 . Then, the empiricaldistribution µ s ( k +1) converges weakly in probability to µ S k +1 as N → ∞ , i.e., (cid:82) gdµ s ( k +1) P −→ (cid:82) gdµ S k +1 holds for anycontinuous compactly supported function g ( · ) : R → R . Proof:

See Appendix A.Theorem III.1 means that the distribution of the elementsof s k is characterized by the random variable S k in theasymptotic regime with M, N → ∞ ( M/N = ∆ ). The updateof S k in (11) can be regarded as the ‘decoupled’ version of theupdate of s ( k ) in (4). Figure 1 shows the comparison betweenthe update of s ( k ) in (4) and its decoupled version obtainedfrom Theorem III.1. In the update of s ( k ) , the measurementvector y is obtained through the linear transformation of x and additive Gaussian noise channel. On the other hand, inthe decoupled system, the random variable X goes throughonly the additive Gaussian noise channel. We can also seethat the update of s ( k ) and S k have the similar form becausethey can be rewritten as s ( k +1) = (cid:0) A T A + ρ I (cid:1) − (cid:16) A T Ax + A T v + ρ (cid:16) z ( k ) − w ( k ) (cid:17)(cid:17) , (17) S k +1 = 1 β ∗ k √ ∆ α ∗ k + ρ (cid:32) β ∗ k √ ∆ α ∗ k X + β ∗ k H + ρ ( Z k − W k ) (cid:33) , (18) OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 4 s ( k +1) = ( A 𝖳 A + ρ I ) −1 ( A 𝖳 y + ρ ( z ( k ) − w ( k ) ) ) x X A v y

Optimization problem in (4) z ( k ) w ( k ) α * k Δ H = 1 β * k Δ α * k + ρ β * k Δ α * k X + β * k H + ρ ( Z k − W k ) Update of in (11) S k Z k W k (a) update of in ADMM s ( k ) (b) decoupled scalar update of S k = ( A 𝖳 A + ρ I ) −1 ( A 𝖳 Ax + A 𝖳 v + ρ ( z ( k ) − w ( k ) ) ) S k +1 = 1 β * k Δ α * k + ρ β * k Δ α * k ( X + α * k Δ H ) + ρ ( Z k − W k ) Fig. 1. Comparison between the update of s ( k ) and its decoupled version. respectively. The update of S k in (11) and (14) shows that S k +1 is the weighted sum of X + α ∗ k √ ∆ H and Z k − W k with theweights β ∗ k √ ∆ α ∗ k and ρ , respectively. Since ρ is the parameterof ADMM, we can control the weight in the update of S k bytuning ρ .One of the most important performance measures for thereconstruction algorithm is MSE given by N (cid:13)(cid:13) s ( k ) − x (cid:13)(cid:13) . Asin the CGMT-based analysis [49], the optimal value of α is re-lated to the asymptotic MSE. Speciﬁcally, from Theorem III.1,the asymptotic MSE of the tentative estimate s ( k ) in ADMMcan be obtained as follows (See also [52, Remark IV. 1]). Corollary III.1.

Under the assumptions in Theorem III.1, theasymptotic MSE of s ( k +1) is given by plim N →∞ N (cid:13)(cid:13)(cid:13) s ( k +1) − x (cid:13)(cid:13)(cid:13) = ( α ∗ k ) − σ v . (19)From the theoretical result in Theorem III.1 (or Corol-lary III.1), we can tune the parameter ρ in ADMM to achievethe fast convergence. The conventional parameter tuning [40]–[44] focus on the difference between the tentative estimateand the optimizer of the optimization problem. On the otherhand, the parameter tuning based on Theorem III.1 can takeaccount of the error from the true unknown vector in theasymptotic regime. Since the effect of ρ to α ∗ k and β ∗ k is complicated, the explicit expression of the optimal ρ isdifﬁcult to obtain. By numerical simulations, however, we canpredict the performance of ADMM and select the parameter ρ achieving the fast convergence. For instance, see Example IV.1in Section IV. IV. E XAMPLES

In this section, we consider two examples of the recon-struction problem and compare the empirical performance ofADMM and its prediction obtained by Theorem III.1.

Example IV.1 (Sparse Vector Reconstruction) . The (cid:96) opti-mization minimize s ∈ R N (cid:26) (cid:107) y − As (cid:107) + λ (cid:107) s (cid:107) (cid:27) (20)with the (cid:96) norm is the most popular convex optimizationproblem for sparse vector reconstruction. The (cid:96) regularizationpromote the sparsity of the estimate of the unknown vectorin the reconstruction. We here assume that the distributionof the unknown vector x is given by the Bernoulli-Gaussiandistribution as p X ( x ) = p δ ( x ) + (1 − p ) p H ( x ) , (21)where p ∈ (0 , , δ ( · ) denotes the Dirac delta function,and p H ( · ) is the probability density function of the standardGaussian distribution. When p is large, the unknown vectorbecomes sparse. The proximity operator of the (cid:96) norm isgiven by (cid:104) prox γ (cid:107)·(cid:107) ( r ) (cid:105) n = sign( r n ) max( | r n | − γ, , (22)where r = [ r · · · r N ] T ∈ R N , γ > , and [ · ] n denotes the n -th element of the vector. By using (22), we can performADMM in (4)–(8) for the (cid:96) optimization (20). Theorem III.1enables us to predict the asymptotic behavior of ADMM forthe (cid:96) optimization.We ﬁrstly compare the empirical performance of the sparsevector reconstruction and its prediction obtained from The-orem III.1. Figure 2 shows that the MSE performance ofthe sparse vector reconstruction, where ∆ = 0 . , p = 0 . ,and σ v = 0 . . The measurement matrix A and the noisevector v satisfy Assumption III.1. The parameter ρ of ADMMis set as ρ = 0 . . In the ﬁgure, ‘empirical’ means theempirical performance obtained by ADMM in (4)–(8) when N = 50 , , , and . The empirical performanceis obtained by averaging the results for independentrealizations of x , A , and v . On the other hand, ‘prediction’ OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 5 − − − M S E empirical ( N = 50)empirical ( N = 100)empirical ( N = 500)empirical ( N = 1000)predictionasymptotic MSE of optimizer Fig. 2. MSE performance for different N in sparse vector reconstruction( ∆ = 0 . , p = 0 . , σ v = 0 . , ρ = 0 . ). shows the theoretical performance prediction obtained by The-orem III.1 (or Corollary III.1) in the large system limit. Sincethe exact computation of the distribution of ( S k , Z k , W k ) isdifﬁcult in practice, we make , realizations of therandom variables ( S k , Z k , W k ) and obtain the approximationof ( α ∗ k , β ∗ k ) . For the optimization of α and β , we can usesearching techniques such as ternary search and golden-sectionsearch [58]. In the simulations, we use the ternary searchwith the error tolerance − . We also show the asymptoticMSE of the optimizer obtained by applying CGMT to the (cid:96) optimization problem as in [49]. The parameter λ in (2)is determined by minimizing the asymptotic MSE. FromFig. 2, we can see that the empirical performance and itsprediction are close to each other. Moreover, they convergeto the asymptotic MSE of the optimizer in the original (cid:96) optimization problem. Precisely, there is a slight differencebetween the empirical performance and its prediction. One ofpossible reasons is that the empirical performance is evaluatedfor ﬁnite N , whereas the large system limit N → ∞ isassumed in the asymptotic analysis. Another reason is that wecreate the many realizations of ( S k , Z k , W k ) for the theoreticalprediction instead of computing their exact distributions.Next, we evaluate the MSE performance for different ma-trix structures. Figure 3 shows the MSE performance when N = 500 , M = 250 , p = 0 . , σ v = 0 . , and ρ = 0 . . Inthe ﬁgure, ’Gaussian’ means the performance when the mea-surement matrix A is composed of i.i.d. Gaussian elementsand satisﬁes Assumption III.1. On the other hand, ’Bernoulli’shows the performance when each element of measurementmatrix is drawn uniformly from { / √ N , − / √ N } . The em-pirical performance is obtained by averaging the results for independent realizations of x , A , and v . From Fig. 3, weobserve that the empirical performance for the both cases isclose to the theoretical prediction obtained by Theorem III.1(or Corollary III.1).We then evaluate the effects of the parameter ρ in ADMM.Figure 4 shows the asymptotic MSE performance for ρ =0 . , . , and . . In the ﬁgure, we set ∆ = 0 . , p = 0 . , − − − M S E GaussianBernoullipredictionasymptotic MSE of optimizer

Fig. 3. MSE performance for different measurement matrix in sparse vectorreconstruction ( N = 500 , M = 250 , p = 0 . , σ v = 0 . , ρ = 0 . ). − − − M S E empirical ( ρ = 0 . ρ = 0 . ρ = 0 . Fig. 4. MSE performance for different parameter ρ in sparse vector recon-struction ( ∆ = 0 . , p = 0 . , σ v = 0 . ). and σ v = 0 . . We can see that the value of the parameter ρ signiﬁcantly affects the convergence speed of ADMM. Byusing the theoretical prediction obtained from Theorem III.1,we can adjust ρ to achieve the fast convergence. Example IV.2 (Binary Vector Reconstruction) . We considerthe reconstruction of a binary vector x ∈ { , − } N with p X ( x ) = 12 ( δ ( x −

1) + δ ( x + 1)) . (23)A reasonable approach to reconstruct x ∈ { , − } N is thebox relaxation method [54], [59] given by minimize s ∈ [ − , N (cid:26) (cid:107) y − As (cid:107) (cid:27) , (24)which is a convex relaxation of the maximum likelihoodapproach minimize s ∈{ , − } N (cid:26) (cid:107) y − As (cid:107) (cid:27) . (25) OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 6 − − − − S E R empirical (∆ = 0 . . . Fig. 5. Asymptotic SER performance for different measurement ratio ∆ inbinary vector reconstruction ( N = 500 , σ v = 0 . , ρ = 0 . ). The asymptotic performance of the ﬁnal estimate obtained bythe box relaxation method has been analyzed with CGMTin [51]. The optimization problem (24) is equivalent to (2)with f ( s ) = (cid:80) Nn =1 ι ( s n ) , where ι ( s ) = (cid:40) s ∈ [ − , ∞ ( s / ∈ [ − , . (26)Since the proximity operator of ι ( · ) is given by the projectionto [ − , , i.e., prox γι ( r ) = min(max( r, − , , (27)we can perform ADMM in (4)–(8) by using (27). FromTheorem III.1, we can predict the asymptotic performance ofADMM for the box relaxation method.We evaluate the SER performance deﬁned as (cid:13)(cid:13) sign (cid:0) s ( k ) (cid:1) − x (cid:13)(cid:13) /N , which is important performancemeasure in binary vector reconstruction. From Theorem III.1,we can predict the asymptotic SER performance by E [sign( S k ) − X ] . Although the sign function is notcontinuous, we can approximate the function to use theresult of Theorem III.1 (cf. [51, Lemma A.4]). Figure 5shows the SER performance of ADMM for ∆ = 0 . , . ,and . , where N = 500 , σ v = 0 . , and ρ = 0 . . Theempirical performance is obtained by averaging resultsfor independent realizations of x , A , and v . The theoreticalprediction is computed by making , realizations of therandom variables ( S k , Z k , W k ) . We observe that the empiricalperformance and the theoretical prediction are close to eachother. We can see that the prediction of Theorem III.1 is validfor the binary vector reconstruction.Next, we compare the distributions of s ( k ) in ADMM and S k in (11). Figure 6 shows the histogram of the empiricalCDF P s ( k ) ( s ) and its prediction P S k ( s ) , where N = 500 , M = 400 , σ v = 0 . , and ρ = 0 . . The left, middle, andright ﬁgure denotes the distributions when k = 1 , k = 4 , and k = 7 , respectively. The empirical performance is obtainedby averaging results for independent realizations of x , A , and v . The theoretical prediction is computed by making , realizations of the random variables ( S k , Z k , W k ) .From Fig. 6, we observe that the empirical CDF agrees wellwith the theoretical prediction at each iteration. We can alsosee that the distributions concentrate near and − as theiteration proceeds. V. C ONCLUSION

In this paper, we have analyzed the asymptotic behavior ofADMM for compressed sensing. By using recently developedCGMT framework, we have shown that the asymptotic distri-bution of the tentative estimate in ADMM is characterizedby the stochastic process { S k } k =1 , ,... . The main theoremenables us to predict the error evolution of ADMM in thelarge system limit. We can also tune the parameter in ADMMfrom the asymptotic result. Simulation results show that theempirical performance obtained by ADMM and its theoreticalprediction are close to each other in terms of MSE and SER insparse vector reconstruction and binary vector reconstruction,respectively.We here show some possible research directions based onthe analysis in this paper. Although we consider the ﬁxedparameter ρ in ADMM, it is possible to use the differentparameter ρ k at each iteration and predict the asymptoticperformance in the same manner. The theoretical result inthis case would provide the faster convergence of the algo-rithm. Moreover, both ADMM and CGMT can be appliedto the convex optimization problem in the complex-valueddomain [53], [60], [61]. It would be also an interesting topicto analyze the performance of ADMM for compressed sensingproblems in the complex-valued domain, which often appearin communication systems.A PPENDIX AP ROOF OF T HEOREM

III.1In Appendices A–C, we provide the proof of the maintheorem in Theorem III.1. Figure 7 shows the overview ofthe proof in the appendices.The equations (11)–(13) in Theorem III.1 correspond to theupdates (4)–(8) in ADMM, respectively. Since the updates of z ( k ) and w ( k ) are element-wise from Assumption III.2, we cansee that these updates can be characterized by (12) and (13),respectively. Hence, it is sufﬁcient to show that the behaviorof s ( k +1) in (4) can be characterized with the random variable S k +1 in (11). By applying the standard approach with CGMTto the optimization problem (4), we can obtain the followinglemma, which implies that S k +1 has the probabilistic propertyof s ( k +1) in the asymptotic regime. Lemma A.1.

Let L = { ψ ( · , · ) : R × R → R | ψ ( · , x ) is Lipschitz continuous for any x ∈ R } . (28)For any function ψ ( · , · ) ∈ L , we have plim N →∞ N N (cid:88) n =1 ψ (cid:16) s ( k +1) n − x n , x n (cid:17) = E [ ψ ( S k +1 − X, X )] . (29) OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 7 − − s . . . . . . c u m u l a t i v e d i s tr i bu t i o n − − s . . . . . . c u m u l a t i v e d i s tr i bu t i o n − − s . . . . . . c u m u l a t i v e d i s tr i bu t i o n Fig. 6. Comparison between empirical CDF and its prediction in binary vector reconstruction ( N = 500 , M = 400 , σ v = 0 . , ρ = 0 . ). Appendix A (Proof of Theorem III.1)Appendix B (Proof of Lemma A.1)Appendix C (Proof of Lemma B.1)Analyze the solution of (AO)Analyze (4) via CGMT frameworkApply CGMT (Theorem II.1)Construct 𝒮 k +1 Obtain (PO) and (AO) from (4)Show the convergence of μ s ( k +1) Obtain the asymptotic distribution of s ( k +1) Fig. 7. Overview of Appendices A–C.

Proof:

See Appendix B.To prove Theorem III.1, we show lim N →∞ Pr (cid:18)(cid:12)(cid:12)(cid:12)(cid:12)(cid:90) gdµ s ( k +1) − (cid:90) gdµ S k +1 (cid:12)(cid:12)(cid:12)(cid:12) < ε (cid:19) = 1 (30)for any continuous compactly supported function g ( · ) : R → R and any ε ( > ). Since the function g ( · ) has a compactsupport, there exists a polynomial ν ( · ) : R → R such that | g ( x ) − ν ( x ) | < ε (31)for any x in the support from the Stone-Weierstrass theo- rem [62]. We thus have (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) gdµ s ( k +1) − (cid:90) gdµ S k +1 (cid:12)(cid:12)(cid:12)(cid:12) < (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) gdµ s ( k +1) − (cid:90) νdµ s ( k +1) (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) νdµ s ( k +1) − (cid:90) νdµ S k +1 (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) νdµ S k +1 − (cid:90) gdµ S k +1 (cid:12)(cid:12)(cid:12)(cid:12) (32) < (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) νdµ s ( k +1) − (cid:90) νdµ S k +1 (cid:12)(cid:12)(cid:12)(cid:12) + 23 ε. (33)Since the polynomial ν ( · ) is Lipschitz on the compact support,we deﬁne ψ ( e, x ) = ν ( e + x ) in Lemma A.1 and obtain plim N →∞ N N (cid:88) n =1 ν (cid:16) s ( k +1) n (cid:17) = E [ ν ( S k +1 )] . (34)Since we have (30) from (33) and (34), we can obtain (cid:82) gdµ s ( k +1) P −→ (cid:82) gdµ S k +1 as N → ∞ , which is the resultof Theorem III.1. A PPENDIX BP ROOF OF L EMMA

A.1We investigate the asymptotic behavior of the update equa-tion (4). Since the analysis for the optimization problem (4)is based on the standard approach with CGMT [49], we omitsome details and show only the outline of the proof. For detailsof the CGMT-based analysis, see [49]–[52] and referencestherein.

A. (PO) Problem

We ﬁrstly deﬁne the error vector e = s − x to rewrite theoptimization problem (4) as min e ∈ R N N (cid:26) (cid:107) Ae − v (cid:107) + ρ (cid:13)(cid:13)(cid:13) e + x − z ( k ) + w ( k ) (cid:13)(cid:13)(cid:13) (cid:27) , (35)where the objective function is normalized by N . By using (cid:107) Ae − v (cid:107) = max u ∈ R M (cid:26) √ N u T ( Ae − v ) − N (cid:107) u (cid:107) (cid:27) , (36) OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 8 we can obtain the equivalent (PO) problem given by min e ∈ R N max u ∈ R M (cid:40) N u T ( √ N A ) e − √ N v T u − (cid:107) u (cid:107) + 1 N ρ (cid:13)(cid:13)(cid:13) e + x − z ( k ) + w ( k ) (cid:13)(cid:13)(cid:13) (cid:41) . (37) B. (AO) Problem

The corresponding (AO) problem is given by min e ∈S e max u ∈S u (cid:40) N (cid:0) (cid:107) e (cid:107) g T u − (cid:107) u (cid:107) h T e (cid:1) − √ N v T u − (cid:107) u (cid:107) + 1 N ρ (cid:13)(cid:13)(cid:13) e + x − z ( k ) + w ( k ) (cid:13)(cid:13)(cid:13) (cid:41) . (38)Although the constraint set of the problem (37) is unbounded,we can introduce a bounded constraint with sufﬁciently largeconstraint sets S e and S u to apply CGMT (For details,see [49, Appendix A]). Since both g and v are Gaussian,the vector (cid:107) e (cid:107) √ N g − v is also Gaussian with zero mean andthe covariance matrix (cid:16) (cid:107) e (cid:107) N + σ v (cid:17) I . Hence, we can rewrite (cid:16) (cid:107) e (cid:107) √ N g − v (cid:17) T u as (cid:113) (cid:107) e (cid:107) N + σ v g T u with the slight abuse ofnotation, where g has i.i.d. standard Gaussian elements. Weapply this technique to (38), set (cid:107) u (cid:107) = β , and use the identity χ = min α> (cid:18) α χ α (cid:19) (39)for χ ( > ) to rewrite (38) as min α> max β> (cid:40) αβ (cid:107) g (cid:107) √ N + βσ v α (cid:107) g (cid:107) √ N − β + min e ∈S e N N (cid:88) n =1 J ( k +1) n ( e n , α, β ) (cid:41) , (40)where J ( k +1) n ( e n , α, β ) = β α (cid:107) g (cid:107) √ N e n − βh n e n + ρ (cid:16) e n + x n − z ( k ) n + w ( k ) n (cid:17) . (41)The minimum value of J ( k +1) n ( e n , α, β ) is achieved when ˆ e ( k +1) n ( α, β ) = 1 βα (cid:107) g (cid:107) √ N + ρ (cid:16) βh n − ρ (cid:16) x n − z ( k ) n + w ( k ) n (cid:17)(cid:17) . (42)We then deﬁne ˆ s ( k +1) n ( α, β ) = ˆ e ( k +1) n ( α, β ) + x n , which isgiven by ˆ s ( k +1) n ( α, β ) = 1 βα (cid:107) g (cid:107) √ N + ρ (cid:32) βα (cid:107) g (cid:107) √ N (cid:32) x n + √ N (cid:107) g (cid:107) αh n (cid:33) + ρ (cid:16) z ( k ) n − w ( k ) n (cid:17)(cid:33) . (43) The optimization problem (40) can be rewritten as min α> max β> (cid:40) αβ (cid:107) g (cid:107) √ N + βσ v α (cid:107) g (cid:107) √ N − β + 1 N N (cid:88) n =1 J ( k +1) n (cid:16) ˆ s ( k +1) n ( α, β ) − x n , α, β (cid:17)(cid:41) . (44)As N → ∞ , the objective function of (44) converges point-wise to αβ √ ∆2 + βσ v √ ∆2 α − β + E (cid:104) J ( k +1) ( α, β ) (cid:105) , (45)where J ( k +1) ( α, β )= β √ ∆2 α (cid:16) ˆ S k +1 ( α, β ) − X (cid:17) − βH (cid:16) ˆ S k +1 ( α, β ) − X (cid:17) + ρ (cid:16) ˆ S k +1 ( α, β ) − Z k + W k (cid:17) (46)and ˆ S k +1 ( α, β ) is deﬁned in (14). Note that the function (45)is the objective function of (15) in Theorem III.1. C. Applying CGMT

To apply CGMT for the above (PO) and (AO), we considerthe conditions (i) and (ii) in Theorem II.1. We denote the opti-mal value of the objective function in (44) and the correspond-ing solution as φ ∗ k,N and (cid:16) α ∗ k,N , β ∗ k,N (cid:17) , respectively. Theoptimal value of e in (AO) is given by ˆ e ( k +1) N ( α ∗ k,N , β ∗ k,N ) = (cid:104) ˆ e ( k +1)1 ( α ∗ k,N , β ∗ k,N ) · · · ˆ e ( k +1) N ( α ∗ k,N , β ∗ k,N ) (cid:105) T from (40)–(42). Moreover, let φ ∗ k be the optimal value of the objectivefunction in (15) ( = (45)) and recall that ( α ∗ k , β ∗ k ) is thecorresponding optimal value of ( α, β ) . By a similar dis-cussion to [51, Lemma IV.1], we have φ ∗ k,N P −→ φ ∗ k and ( α ∗ k,N , β ∗ k,N ) → ( α ∗ k , β ∗ k ) as N → ∞ . Thus, the optimal valueof (AO) satisﬁes the condition (i) in Theorem II.1 for ¯ φ = φ ∗ and any η ( > ).Next, we investigate the condition (ii) in Theorem II.1. Weuse the following lemma to construct the set S in CGMT. Lemma B.1.

For any function ψ ( · , · ) ∈ L ( L is deﬁnedin (28)), we have plim N →∞ N N (cid:88) n =1 ψ (cid:16) ˆ s ( k +1) n (cid:0) α ∗ k,N , β ∗ k,N (cid:1) − x n , x n (cid:17) = E (cid:104) ψ (cid:16) ˆ S k +1 ( α ∗ k , β ∗ k ) − X, X (cid:17)(cid:105) (47)

Proof:

See Appendix C.From Lemma B.1, we deﬁne S k +1 = (cid:40) z ∈ R N (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) N N (cid:88) n =1 ψ ( z n , x n ) − E (cid:104) ψ (cid:16) ˆ S k +1 ( α ∗ k , β ∗ k ) − X, X (cid:17)(cid:105)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < ε (cid:41) (48)

OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 9 and obtain ˆ e ( k +1) N ( α ∗ k,N , β ∗ k,N ) ∈ S k +1 with probability ap-proaching for any ε ( > ). By using the strong convexityof J ( k +1) n ( e n , α, β ) over e n , we can see that there exists aconstant η satisfying the condition (ii) in CGMT with S k +1 .Hence, from CGMT, Lemma B.1 holds not only for theoptimizer of (AO) in (38) but also for that of (PO) in (37),i.e., we have plim N →∞ N N (cid:88) n =1 ψ (cid:16) s ( k +1) n − x n , x n (cid:17) = E [ ψ ( S k +1 − X, X )] . (49)A PPENDIX CP ROOF OF L EMMA

B.1Deﬁne ¯ s ( k +1) n ( α, β )= 1 β √ ∆ α + ρ (cid:32) β √ ∆ α (cid:18) x n + α √ ∆ h n (cid:19) + ρ (cid:16) z ( k ) n − w ( k ) n (cid:17)(cid:33) , (50)where we replace (cid:107) g (cid:107) √ N in (43) with its asymptotic value √ ∆ .From the law of large numbers, we have plim N →∞ N N (cid:88) n =1 ψ (cid:16) ¯ s ( k +1) n ( α ∗ k , β ∗ k ) − x n , x n (cid:17) = E (cid:104) ψ (cid:16) ˆ S k +1 ( α ∗ k , β ∗ k ) − X, X (cid:17)(cid:105) . (51)Thus, it is sufﬁcient to show plim N →∞ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) N N (cid:88) n =1 ψ (cid:16) ˆ s ( k +1) n (cid:0) α ∗ k,N , β ∗ k,N (cid:1) − x n , x n (cid:17) − N N (cid:88) n =1 ψ (cid:16) ¯ s ( k +1) n ( α ∗ k , β ∗ k ) − x n , x n (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = 0 . (52)Since ψ ( · , x n ) is Lipschitz, there is a constant C ψ ( > ) suchthat (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) N N (cid:88) n =1 ψ (cid:16) ˆ s ( k +1) n (cid:0) α ∗ k,N , β ∗ k,N (cid:1) − x n , x n (cid:17) − N N (cid:88) n =1 ψ (cid:16) ¯ s ( k +1) n ( α ∗ k , β ∗ k ) − x n , x n (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C ψ N N (cid:88) n =1 (cid:12)(cid:12)(cid:12) ˆ s ( k +1) n (cid:0) α ∗ k,N , β ∗ k,N (cid:1) − ¯ s ( k +1) n ( α ∗ k , β ∗ k ) (cid:12)(cid:12)(cid:12) (53) P −→ N → ∞ ) , (54)which completes the proof. R EFERENCES[1] E. J. Cand`es and T. Tao, “Decoding by linear programming,”

IEEETrans. Inf. Theory , vol. 51, no. 12, pp. 4203–4215, Dec. 2005.[2] E. J. Cand`es, J. Romberg, and T. Tao, “Robust uncertainty principles:Exact signal reconstruction from highly incomplete frequency informa-tion,”

IEEE Trans. Inf. Theory , vol. 52, no. 2, pp. 489–509, Feb. 2006.[3] D. L. Donoho, “Compressed sensing,”

IEEE Trans. Inf. Theory , vol. 52,no. 4, pp. 1289–1306, Apr. 2006.[4] E. J. Cand`es and M. B. Wakin, “An introduction to compressivesampling,”

IEEE Signal Process. Mag. , vol. 25, no. 2, pp. 21–30, Mar.2008.[5] M. Lustig, D. L. Donoho, and J. M. Pauly, “Sparse MRI: The applicationof compressed sensing for rapid MR imaging,”

Magn. Reson. Med. ,vol. 58, no. 6, pp. 1182–1195, 2007.[6] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, “Compressedsensing MRI,”

IEEE Signal Process. Mag. , vol. 25, no. 2, pp. 72–82,Mar. 2008.[7] K. Hayashi, M. Nagahara, and T. Tanaka, “A user’s guide to compressedsensing for communications systems,”

IEICE Trans. Commun. , vol. E96-B, no. 3, pp. 685–712, Mar. 2013.[8] J. W. Choi, B. Shim, Y. Ding, B. Rao, and D. I. Kim, “Compressedsensing for wireless communications: Useful tips and tricks,”

IEEECommun. Surv. Tutor. , vol. 19, no. 3, pp. 1527–1550, thirdquarter 2017.[9] J. Huang and T. Zhang, “The beneﬁt of group sparsity,”

Ann. Statist. ,vol. 38, no. 4, pp. 1978–2004, Aug. 2010.[10] E. J. Cand`es and B. Recht, “Exact matrix completion via convexoptimization,”

Found Comput Math , vol. 9, no. 6, pp. 717–772, Dec.2009.[11] E. J. Cand`es and T. Tao, “The power of convex relaxation: Near-optimalmatrix completion,”

IEEE Trans. Inf. Theory , vol. 56, no. 5, pp. 2053–2080, May 2010.[12] A. A¨ıssa-El-Bey, D. Pastor, S. M. A. Sba¨ı, and Y. Fadlallah, “Sparsity-based recovery of ﬁnite alphabet solutions to underdetermined linearsystems,”

IEEE Trans. Inf. Theory , vol. 61, no. 4, pp. 2008–2018, Apr.2015.[13] M. Nagahara, “Discrete signal reconstruction by sum of absolute values,”

IEEE Signal Process. Lett. , vol. 22, no. 10, pp. 1575–1579, Oct. 2015.[14] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictio-naries,”

IEEE Trans. Signal Process. , vol. 41, no. 12, pp. 3397–3415,Dec. 1993.[15] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matchingpursuit: Recursive function approximation with applications to waveletdecomposition,” in

Proc. 27th Asilomar Conference on Signals, Systemsand Computers , Nov. 1993, pp. 40–44.[16] J. A. Tropp and A. C. Gilbert, “Signal recovery from random mea-surements via orthogonal matching pursuit,”

IEEE Trans. Inf. Theory ,vol. 53, no. 12, pp. 4655–4666, Dec. 2007.[17] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensingsignal reconstruction,”

IEEE Trans. Inf. Theory , vol. 55, no. 5, pp. 2230–2249, May 2009.[18] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery fromincomplete and inaccurate samples,”

Applied and Computational Har-monic Analysis , vol. 26, no. 3, pp. 301–321, May 2009.[19] E. Liu and V. N. Temlyakov, “The orthogonal super greedy algorithmand applications in compressed sensing,”

IEEE Trans. Inf. Theory ,vol. 58, no. 4, pp. 2040–2047, Apr. 2012.[20] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse solution ofunderdetermined systems of linear equations by stagewise orthogonalmatching pursuit,”

IEEE Trans. Inf. Theory , vol. 58, no. 2, pp. 1094–1121, Feb. 2012.[21] J. Wang, S. Kwon, and B. Shim, “Generalized orthogonal matchingpursuit,”

IEEE Trans. Signal Process. , vol. 60, no. 12, pp. 6202–6216,Dec. 2012.[22] S. Kwon, J. Wang, and B. Shim, “Multipath matching pursuit,”

IEEETrans. Inf. Theory , vol. 60, no. 5, pp. 2986–3001, May 2014.[23] Y. Kabashima, “A CDMA multiuser detection algorithm on the basis ofbelief propagation,”

J. Phys. A: Math. Gen. , vol. 36, no. 43, pp. 11 111–11 121, Oct. 2003.[24] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algo-rithms for compressed sensing,”

PNAS , vol. 106, no. 45, pp. 18 914–18 919, Nov. 2009.[25] M. Bayati and A. Montanari, “The Dynamics of Message Passing onDense Graphs, with Applications to Compressed Sensing,”

IEEE Trans.Inf. Theory , vol. 57, no. 2, pp. 764–785, Feb. 2011.

OURNAL OF L A TEX CLASS FILES, VOL. X, NO. X, SEPTEMBER 20XX 10 [26] C. Jeon, R. Ghods, A. Maleki, and C. Studer, “Optimality of large MIMOdetection via approximate message passing,” in

Proc. IEEE InternationalSymposium on Information Theory (ISIT) , Jun. 2015, pp. 1227–1231.[27] R. Hayakawa and K. Hayashi, “Discreteness-aware approximate mes-sage passing for discrete-valued vector reconstruction,”

IEEE Trans.Signal Process. , vol. 66, no. 24, pp. 6443–6457, Dec. 2018.[28] J. C´espedes, P. M. Olmos, M. S´anchez-Fern´andez, and F. Perez-Cruz,“Expectation propagation detection for high-order high-dimensionalMIMO systems,”

IEEE Trans. Commun. , vol. 62, no. 8, pp. 2840–2849,Aug. 2014.[29] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate messagepassing,” in

Proc. IEEE International Symposium on Information Theory(ISIT) , Jun. 2017, pp. 1588–1592.[30] J. Ma and L. Ping, “Orthogonal AMP,”

IEEE Access , vol. 5, pp. 2020–2033, 2017.[31] K. Takeuchi, “Rigorous dynamics of expectation-propagation-based sig-nal recovery from unitarily invariant measurements,”

IEEE Trans. Inf.Theory , vol. 66, no. 1, pp. 368–386, Jan. 2020.[32] I. Daubechies, M. Defrise, and C. D. Mol, “An iterative thresholding al-gorithm for linear inverse problems with a sparsity constraint,”

Commun.Pure Appl. Math. , vol. 57, no. 11, pp. 1413–1457, 2004.[33] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,”

Multiscale Model. Simul. , vol. 4, no. 4, pp. 1168–1200, Jan. 2005.[34] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projectionfor sparse reconstruction: Application to compressed sensing and otherinverse problems,”

IEEE J. Sel. Top. Signal Process. , vol. 1, no. 4, pp.586–597, Dec. 2007.[35] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholdingalgorithm for linear inverse problems,”

SIAM J. Imaging Sci. , vol. 2,no. 1, pp. 183–202, Jan. 2009.[36] D. Gabay and B. Mercier, “A dual algorithm for the solution of nonlinearvariational problems via ﬁnite element approximation,”

Computers &Mathematics with Applications , vol. 2, no. 1, pp. 17–40, Jan. 1976.[37] J. Eckstein and D. P. Bertsekas, “On the Douglas-Rachford splittingmethod and the proximal point algorithm for maximal monotone oper-ators,”

Mathematical Programming , vol. 55, no. 1, pp. 293–318, Apr.1992.[38] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signalprocessing,” in

Fixed-Point Algorithms for Inverse Problems in Scienceand Engineering , ser. Springer Optimization and Its Applications. NewYork, NY: Springer New York, 2011, vol. 49, pp. 185–212.[39] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributedoptimization and statistical learning via the alternating direction methodof multipliers,”

Found Trends Mach Learn , vol. 3, no. 1, pp. 1–122, Jan.2011.[40] A. U. Raghunathan and S. Di Cairano, “Alternating direction method ofmultipliers for strictly convex quadratic programs: Optimal parameterselection,” in

Proc. American Control Conference , Jun. 2014, pp. 4324–4329.[41] E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson, “Optimalparameter selection for the alternating direction method of multipliers(ADMM): Quadratic problems,”

IEEE Trans. Autom. Control , vol. 60,no. 3, pp. 644–658, Mar. 2015.[42] Z. Xu, M. Figueiredo, and T. Goldstein, “Adaptive ADMM with spectralpenalty parameter selection,” in

Proc. Artiﬁcial Intelligence and Statis-tics , Apr. 2017, pp. 718–727.[43] Y. Xu, M. Liu, Q. Lin, and T. Yang, “ADMM without a ﬁxed penaltyparameter: Faster convergence with new adaptive penalization,” in

Proc.Advances in Neural Information Processing Systems , 2017, pp. 1267–1277.[44] Y. Lin, B. Wohlberg, and V. Vesselinov, “ADMM penalty parameterselection with Krylov subspace recycling technique for sparse coding,”in

Proc. IEEE International Conference on Image Processing (ICIP) ,Sep. 2017, pp. 1945–1949.[45] D. L. Donoho, A. Maleki, and A. Montanari, “The noise-sensitivityphase transition in compressed sensing,”

IEEE Trans. Inf. Theory ,vol. 57, no. 10, pp. 6920–6941, Oct. 2011.[46] M. Bayati and A. Montanari, “The LASSO risk for Gaussian matrices,”

IEEE Trans. Inf. Theory , vol. 58, no. 4, pp. 1997–2017, Apr. 2012.[47] D. L. Donoho, I. Johnstone, and A. Montanari, “Accurate prediction ofphase transitions in compressed sensing via a connection to minimaxdenoising,”

IEEE Trans. Inf. Theory , vol. 59, no. 6, pp. 3396–3433, Jun.2013.[48] C. Thrampoulidis, S. Oymak, and B. Hassibi, “Regularized linear regres-sion: A precise analysis of the estimation error,” in

Proc. Conference onLearning Theory , Jun. 2015, pp. 1683–1709. [49] C. Thrampoulidis, E. Abbasi, and B. Hassibi, “Precise error analysis ofregularized M -estimators in high dimensions,” IEEE Trans. Inf. Theory ,vol. 64, no. 8, pp. 5592–5628, Aug. 2018.[50] I. B. Atitallah, C. Thrampoulidis, A. Kammoun, T. Y. Al-Naffouri,M. Alouini, and B. Hassibi, “The BOX-LASSO with application toGSSK modulation in massive MIMO systems,” in

Proc. IEEE Interna-tional Symposium on Information Theory (ISIT) , Jun. 2017, pp. 1082–1086.[51] C. Thrampoulidis, W. Xu, and B. Hassibi, “Symbol error rate perfor-mance of box-relaxation decoders in massive MIMO,”

IEEE Trans.Signal Process. , vol. 66, no. 13, pp. 3377–3392, Jul. 2018.[52] R. Hayakawa and K. Hayashi, “Asymptotic performance of discrete-valued vector reconstruction via box-constrained optimization with sumof (cid:96) regularizers,” IEEE Trans. Signal Process. , vol. 68, pp. 4320–4335,2020.[53] E. Abbasi, F. Salehi, and B. Hassibi, “Performance analysis of convexdata detection in MIMO,” in

Proc. IEEE International Conference onAcoustics, Speech and Signal Processing (ICASSP) , May 2019, pp.4554–4558.[54] P. H. Tan, L. K. Rasmussen, and T. J. Lim, “Constrained maximum-likelihood detection in CDMA,”

IEEE Trans. Commun. , vol. 49, no. 1,pp. 142–153, Jan. 2001.[55] M. Bayati, M. Lelarge, and A. Montanari, “Universality in polytopephase transitions and message passing algorithms,”

Ann. Appl. Probab. ,vol. 25, no. 2, pp. 753–822, Apr. 2015.[56] A. Panahi and B. Hassibi, “A universal analysis of large-scale regularizedleast squares solutions,” in

Proc. Advances in Neural InformationProcessing Systems , 2017, pp. 3381–3390.[57] S. Oymak and J. A. Tropp, “Universality laws for randomized dimensionreduction, with applications,”

Inf Inference , vol. 7, no. 3, pp. 337–446,Sep. 2018.[58] D. G. Luenberger and Y. Ye, “Basic Descent Methods,” in

Linear andNonlinear Programming , ser. International Series in Operations Research& Management Science. New York, NY: Springer US, 2008, pp. 215–262.[59] A. Yener, R. D. Yates, and S. Ulukus, “CDMA multiuser detection:A nonlinear programming approach,”

IEEE Trans. Commun. , vol. 50,no. 6, pp. 1016–1024, Jun. 2002.[60] L. Li, X. Wang, and G. Wang, “Alternating direction method of mul-tipliers for separable convex optimization of real functions in complexvariables,”

Math. Probl. Eng. , 2015.[61] R. Hayakawa and K. Hayashi, “Reconstruction of complex discrete-valued vector via convex optimization with sparse regularizers,”

IEEEAccess , vol. 6, pp. 66 499–66 512, 2018.[62] D. P´erez and Y. Quintana, “A survey on the Weierstrass approximationtheorem,”

Divulg. Matem´aticas , vol. 16, no. 1, pp. 231–247, 2008.PLACEPHOTOHERE