BER Analysis of the box relaxation for BPSK Signal Recovery
Christos Thrampoulidis*, Ehsan Abbasi*, Weiyu Xu†, Babak Hassibi*

*Department of Electrical Engineering, Caltech, Pasadena, USA
†Department of ECE, University of Iowa
ABSTRACT
We study the problem of recovering an n-dimensional vector x_0 ∈ {±1}^n of BPSK signals from m noise-corrupted measurements y = Ax_0 + z. In particular, we consider the box relaxation method, which relaxes the discrete set {±1}^n to the convex set [−1, 1]^n to obtain a convex optimization algorithm, followed by hard thresholding. When the noise z and the measurement matrix A have iid standard normal entries, we obtain an exact expression for the bit-wise probability of error P_e in the limit of n and m growing with m/n fixed. At high SNR our result shows that the P_e of the box relaxation is within 3 dB of the matched filter bound (MFB) for square systems, and that it approaches the MFB as m grows large compared to n. Our results also indicate that as m, n → ∞, for any fixed set of size k, the error events of the corresponding k bits in the box relaxation method are independent.

Index Terms — BER Analysis, Box Relaxation, BPSK, Matched Filter Bound, Maximum Likelihood Decoder
1. INTRODUCTION
The problem of recovering an unknown BPSK vector from a set of noise-corrupted, linearly related measurements arises in numerous applications, such as massive MIMO [1, 2, 3, 4]. As a result, a large host of exact and heuristic optimization algorithms have been proposed. Exact algorithms, such as sphere decoding and its variants, become computationally prohibitive as the problem dimension grows. Heuristic algorithms such as zero-forcing, MMSE, decision-feedback, etc. [5, 6, 7] have inferior performance that is often difficult to characterize precisely.

One popular heuristic is the so-called "Box Relaxation", which replaces the discrete set {±1}^n with the convex set [−1, 1]^n [8]. This allows one to recover the signal via convex optimization followed by hard thresholding. Despite its popularity, very little is known about the performance of this method. In this paper, we exactly characterize its bit-wise error probability in the regime of large dimensions and under Gaussian assumptions.

The remainder of the paper is organized as follows. Some background and the formal problem definition are given in the rest of this section. The main result and a detailed discussion follow in Section 2. An outline of the proof is the subject of Section 3. Finally, the paper concludes in Section 4.

Our goal is to recover an n-dimensional BPSK vector x_0 ∈ {±1}^n from the noisy multiple-input multiple-output (MIMO) relation y = Ax_0 + z ∈ R^m, where A ∈ R^{m×n} is the MIMO channel matrix (assumed to be known) and z ∈ R^m is the noise vector. We assume that A has entries iid N(0, 1/n) and z has entries iid N(0, σ²). The normalization is such that the reciprocal of the noise variance σ² equals the Signal-to-Noise Ratio, i.e. SNR = 1/σ².

The Maximum-Likelihood (ML) decoder. The ML decoder, which minimizes the probability of error (assuming the x_{0,i} are equally likely), is given by min_{x ∈ {±1}^n} ‖y − Ax‖².
Solving the above is often computationally intractable, especially when n is large, and therefore a variety of heuristics have been proposed (zero-forcing, MMSE, decision-feedback, etc.) [9].

Box Relaxation Optimization. The heuristic we shall use, which we refer to as Box Relaxation Optimization (BRO), consists of two steps. The first step solves a convex relaxation of the ML algorithm, where the constraint x ∈ {±1}^n is relaxed to x ∈ [−1, 1]^n. The output of the optimization is hard-thresholded in the second step to produce the final binary estimate. Formally, the algorithm outputs an estimate x* of x_0 given by

x̂ = arg min_{−1 ≤ x_i ≤ 1} ‖y − Ax‖², (1a)
x* = sign(x̂), (1b)

where the sign function returns the sign of its input and acts element-wise on vectors.

Bit error probability. We evaluate the performance of the detection algorithm by the bit error probability P_e, defined as the expectation of the Bit Error Rate (BER). Formally,
BER := (1/n) Σ_{i=1}^n 1{x*_i ≠ x_{0,i}}, (2a)

P_e := E[BER] = (1/n) Σ_{i=1}^n Pr(x*_i ≠ x_{0,i}). (2b)

2. MAIN RESULT

Our main result analyzes the P_e of the (BRO) in (1). We assume a large-system limit where m, n → ∞ at a proportional rate δ. The SNR is assumed constant; in particular, it does not scale with n. Let Q(·) denote the Q-function associated with the standard normal density p(h) = (1/√(2π)) e^{−h²/2}.

Theorem 2.1 (P_e of the (BRO)). Let P_e denote the bit error probability of the detection scheme in (1) for some fixed but unknown BPSK signal x_0 ∈ {±1}^n. For constant SNR and m/n → δ ∈ (1/2, ∞), it holds:

lim_{n→∞} P_e = Q(1/τ_*),

where τ_* is the unique solution to

min_{τ>0}  (τ/2)(δ − 1/2) + (1/SNR)/(2τ) + (τ/2) ∫_{2/τ}^∞ (h − 2/τ)² p(h) dh. (3)

Theorem 2.1 derives a precise formula for the bit error probability of the (BRO). The formula involves solving the convex and deterministic minimization problem in (3). We outline the proof in Section 3. First, a few remarks are in place.

Uniqueness of τ_*. It can be shown that the objective function of (3) is strictly convex when δ > 1/2. When δ < 1/2, it is well known that even the noiseless box relaxation fails [10]. (In fact, δ = 1/2 is the recovery threshold for this convex relaxation.) Thus, (3) has a unique solution τ_*. Observe that the problem parameters δ and SNR appear explicitly in (3); naturally, then, τ_* is a function of those. The minimization in (3) can be efficiently solved numerically. In addition, owing to the strict convexity of the objective function, τ_* can be equivalently expressed as the unique solution of the corresponding first-order optimality conditions.

Numerical illustration. Figure 1 illustrates the accuracy of the prediction of Theorem 2.1. Note that although the theorem requires n → ∞, the prediction is already accurate for n in the few hundreds.

Analysis of convex algorithms.
We are able to predict the P_e of the detection algorithm in (1) by analyzing the performance of the convex algorithm in (1a). These types of convex algorithms, which minimize a least-squares function of the residual y − Ax subject to a (typically non-smooth) constraint on x, are often referred to in the statistics and signal-processing literature as LASSO-type algorithms. When the performance of such algorithms is measured in terms of the squared error ‖x̂ − x_0‖², the recent line of works [11, 12, 13, 14] has led to precise results and a clear understanding of their asymptotic behavior. The analysis in these works builds upon the Convex Gaussian Min-max Theorem (CGMT) [14, Thm. 1], which is an extension of a classical Gaussian comparison inequality due to Gordon [15]. Of interest to us is not the squared error of (1a) but rather the P_e. Thus, we extend the precise analysis of LASSO-type algorithms beyond the squared error. To prove our result we require a slight generalization of the CGMT as it appears in [14].

P_e at high SNR. It can be shown that when SNR ≫ 1, then τ_* ≈ 1/√((δ − 1/2) SNR). This can be intuitively understood as follows: at high SNR, we expect τ_* to go to zero (correspondingly, P_e to be small). When this is the case, the last term in (3) is negligible; then, τ_* is the solution to min_{τ>0} (τ/2)(δ − 1/2) + (1/SNR)/(2τ), which gives the desired result. Hence, for SNR ≫ 1,

lim_{n→∞} P_e ≈ Q(√((δ − 1/2) · SNR)). (4)

In Figure 2 we have plotted this high-SNR expression for log(P_e) against its exact value as predicted by Theorem 2.1. It is interesting to observe that the former is actually a very good approximation of the latter even for small, practical values of SNR. The range of SNR values for which the approximation is valid becomes larger with increasing δ. Heuristically, for δ ≥ 0.7 the expression in (4) is a good proxy for the true probability of error at practical SNR values.
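The prediction of Theorem 2.1 is straightforward to evaluate numerically. The following is a minimal sketch (our own, not from the paper) that solves the deterministic minimization in (3) with SciPy and compares the resulting τ_* against the high-SNR formula; the chosen δ and SNR values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def pe_theory(delta, snr):
    """Solve the deterministic minimization (3) numerically and
    return (tau_star, Q(1/tau_star)), the Theorem 2.1 prediction."""
    def objective(tau):
        # Last term of (3): (tau/2) * integral_{2/tau}^inf (h - 2/tau)^2 p(h) dh.
        tail, _ = quad(lambda h: (h - 2.0 / tau) ** 2 * norm.pdf(h),
                       2.0 / tau, np.inf)
        return (tau / 2.0) * (delta - 0.5) + 1.0 / (2.0 * tau * snr) \
               + (tau / 2.0) * tail
    res = minimize_scalar(objective, bounds=(0.05, 10.0), method="bounded")
    tau_star = res.x
    return tau_star, norm.sf(1.0 / tau_star)   # Q(x) = norm.sf(x)

# Square system (delta = 1) at 20 dB: the tail term is negligible, so
# tau_star should be close to the high-SNR value 1/sqrt((delta - 1/2)*SNR).
delta, snr = 1.0, 100.0
tau_star, pe = pe_theory(delta, snr)
pe_high_snr = norm.sf(np.sqrt((delta - 0.5) * snr))   # expression (4)
```

At 20 dB the correction from the integral term in (3) is already far below numerical precision, which is consistent with the remark that (4) is accurate at practical SNR values.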
Comparison to the matched filter bound. Theorem 2.1 gives us a handle on the P_e of the (BRO) in (1) and therefore allows us to evaluate its practical performance. Here, we compare this performance to an idealistic case in which all but one of the n bits of x_0 are known to us. As is customary in the field, we refer to the bit error probability of this case as the matched filter bound (MFB) and denote it by P_e^MFB. The (MFB) corresponds to the probability of error in detecting (say) x_{0,n} ∈ {±1} from ỹ = x_{0,n} a_n + z, where ỹ = y − Σ_{i=1}^{n−1} x_{0,i} a_i is assumed known, and a_i denotes the i-th column of A. The ML estimate is just the sign of the projection of the vector ỹ onto the direction of a_n. Without loss of generality, assume that x_{0,n} = 1. Then, the output of the matched filter becomes sign(X̃), where X̃ = ‖a_n‖ + σν, with ν ∼ N(0, 1). When n → ∞, ‖a_n‖² →_P δ. Hence, with probability one,

lim_{n→∞} P_e^MFB = lim_{n→∞} P(X̃ < 0) = Q(√(δ · SNR)). (5)

A direct comparison of (5) to (4) shows that at high SNR, the performance of the (BRO) is
10 log₁₀(δ/(δ − 1/2)) dB off that of the (MFB). In particular, in the square case (δ = 1), where the numbers of receive and transmit antennas are the same, the (BRO) is 3 dB off the (MFB). When the number of receive antennas is much larger, i.e. when δ → ∞, the performance of the (BRO) approaches the (MFB).
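The comparison can be made concrete with a small Monte Carlo sketch. The code below (our own illustration; the dimensions and SNR are not from the paper) implements the (BRO) of (1) with SciPy's box-constrained least-squares solver, estimates its BER, and compares against the Theorem 2.1 prediction and the MFB of (5).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import lsq_linear, minimize_scalar
from scipy.stats import norm

def bro_detect(A, y):
    """Box Relaxation Optimization (1a)-(1b): box-constrained
    least squares over [-1, 1]^n followed by hard thresholding."""
    x_hat = lsq_linear(A, y, bounds=(-1.0, 1.0)).x
    return np.sign(x_hat)

def pe_theory(delta, snr):
    """tau_* from the minimization (3), then P_e = Q(1/tau_*)."""
    def obj(tau):
        tail, _ = quad(lambda h: (h - 2.0 / tau) ** 2 * norm.pdf(h),
                       2.0 / tau, np.inf)
        return (tau / 2.0) * (delta - 0.5) + 1.0 / (2.0 * tau * snr) \
               + (tau / 2.0) * tail
    tau = minimize_scalar(obj, bounds=(0.05, 10.0), method="bounded").x
    return norm.sf(1.0 / tau)

rng = np.random.default_rng(0)
n, delta, snr = 256, 1.0, 8.0                 # illustrative sizes
m, sigma = int(delta * n), 1.0 / np.sqrt(snr)

errors, trials = 0, 20
for _ in range(trials):
    x0 = rng.choice([-1.0, 1.0], size=n)
    A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))
    y = A @ x0 + sigma * rng.normal(size=m)
    errors += np.sum(bro_detect(A, y) != x0)
ber = errors / (trials * n)                   # empirical BER of the BRO

pe = pe_theory(delta, snr)                    # Theorem 2.1 prediction
pe_mfb = norm.sf(np.sqrt(delta * snr))        # matched filter bound (5)
```

As the paper notes, the asymptotic prediction is already accurate for n in the few hundreds, and the (BRO) error probability sits above the MFB by roughly the 3 dB gap in the square case.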
3. PROOF OUTLINE
For simplicity, we write ‖·‖ for the ℓ₂-norm.

The error vector. It is convenient to re-write (1a) by changing the variable to the error vector w := x − x_0:

ŵ := arg min_{−1 ≤ x_{0,i} + w_i ≤ 1} ‖z − Aw‖. (6)

(We use →_P to denote convergence in probability as n → ∞.)

Fig. 1: BER performance of the Box Relaxation: P_e as a function of SNR for δ = 1 and δ = 0.7, where δ = m/n. The theoretical prediction follows from Theorem 2.1. For the simulations, we used n = 512. The data are averages over 20 independent realizations of the channel matrix and of the noise vector for each value of the SNR.

Without loss of generality, we assume for the analysis that x_0 = 1_n = (1, 1, …, 1)^T. Then the constraint in (6) becomes −2 ≤ w_i ≤ 0, and we can write (2a) in terms of the error vector w as BER = (1/n) Σ_{i=1}^n 1{ŵ_i ≤ −1}.

The CGMT. The fundamental tool behind our analysis is the Convex Gaussian Min-max Theorem (CGMT) [14]; the CGMT builds upon a classical result due to Gordon [15]. It associates with a primary optimization (PO) problem a simplified auxiliary optimization (AO) problem from which we can tightly infer properties of the original (PO), such as the optimal cost, the optimal solution, etc. The idea of combining the GMT with convexity is attributed to Stojnic [11]. Thrampoulidis, Oymak and Hassibi built upon and significantly extended this idea, arriving at the CGMT as it appears in [14, Thm. 3]. For ease of reference we repeat a statement of the CGMT here; in this generality the theorem appears in [19]. Consider the following two min-max problems:

Φ(G) := min_{w ∈ S_w} max_{u ∈ S_u} u^T G w + ψ(w, u), (7a)

φ(g, h) := min_{w ∈ S_w} max_{u ∈ S_u} ‖w‖ g^T u − ‖u‖ h^T w + ψ(w, u), (7b)

where G ∈ R^{m×n}, g ∈ R^m, h ∈ R^n, S_w ⊂ R^n, S_u ⊂ R^m, and ψ : R^n × R^m → R.
Denote by w_Φ := w_Φ(G) and w_φ := w_φ(g, h) any optimal minimizers of (7a) and (7b), respectively.

(Gordon's original result is classically used to establish non-asymptotic probabilistic lower bounds on the minimum singular value of Gaussian matrices (e.g. [16]), and has a number of other applications in high-dimensional convex geometry (e.g. [17, 18]).)
Fig. 2: Bit error probability of the Box Relaxation Optimization (BRO) in (1) in comparison to the Matched Filter Bound (MFB) for δ = 0.7 (dashed lines) and δ = 1 (solid lines). The red curves follow the formula of Thm. 2.1, the green ones correspond to (4), and P_e^MFB of (5) is in blue.
Theorem 3.1 (CGMT). In (7), let S_w, S_u be convex and compact sets, let ψ be continuous and convex-concave on S_w × S_u, and let G, g and h all have entries iid standard normal. Let S be an arbitrary open subset of S_w and S^c = S_w \ S. Denote by φ_{S^c}(g, h) the optimal cost of the optimization in (7b) when the minimization over w is constrained to w ∈ S^c. If in the limit n → ∞ both φ(g, h) and φ_{S^c}(g, h) converge in probability, and lim_{n→∞} Pr(w_φ ∈ S) = 1, then it also holds that lim_{n→∞} Pr(w_Φ ∈ S) = 1.

Identifying the (PO) and the (AO). Using the CGMT for the analysis of the P_e requires, as a first step, expressing the optimization in (1a) in the form of a (PO) as it appears in (7a). It is easy to see that (6) is equivalent to

min_{−2 ≤ w_i ≤ 0} max_{‖u‖ ≤ 1} u^T A w − u^T z. (8)

Observe that the constraint sets above are both convex and compact; also, the objective function is convex in w and concave in u. Hence, according to the CGMT, we can perform the analysis of the BER for the corresponding (AO) problem instead, which becomes (note the normalization to account for the variance of the entries of A)

(1/√n) min_{−2 ≤ w_i ≤ 0} max_{‖u‖ ≤ 1} (‖w‖ g − √n z)^T u − ‖u‖ h^T w. (9)

We refer to the optimization in (9) as the (AO) problem.

Computing the
BER via the (AO). Call w̃ the optimal solution of the (AO). Fix any ε > 0 and consider the set

S = { v : | (1/n) Σ_{i=1}^n 1{v_i ≤ −1} − Q(1/τ_*) | < ε }, (10)

where τ_* is defined in the statement of Theorem 2.1. We will apply Theorem 3.1 for the above set S. In particular, we show that (i) the (AO) in (9) converges in probability (after proper normalization with √n), and (ii) w̃ ∈ S with probability one. These suffice to conclude that ŵ ∈ S with probability one, which completes the proof of Theorem 2.1.

Simplifying the (AO). We begin by simplifying the (AO) problem as it appears in (9). First, since both g and z have iid Gaussian entries, the vector ‖w‖g − √n z has entries iid N(0, ‖w‖² + nσ²). Hence, for our purposes, and with some abuse of notation so that g continues to denote a vector with iid standard normal entries, the first term in (9) can be treated as √(‖w‖² + nσ²) g^T u instead. As a next step, fix the norm of u, say ‖u‖ = β. Optimizing over its direction is now straightforward, and gives

min_{−2 ≤ w_i ≤ 0} max_{0 ≤ β ≤ 1} (β/√n) ( √(‖w‖² + nσ²) ‖g‖ − h^T w ).

In fact, it is easy to now optimize over β as well; its optimal value is 1 if the term in the parenthesis is non-negative, and is 0 otherwise. With this, the (AO) simplifies to the following:

( min_{−2 ≤ w_i ≤ 0} √(‖w‖²/n + σ²) ‖g‖ − (1/√n) h^T w )₊ ,

where we defined (χ)₊ := max{χ, 0}. To facilitate the optimization over w, we express the term in the square-root in a variational form, using √χ = min_{τ>0} (τ/2 + χ/(2τ)). With this trick, the minimization over the entries of w becomes separable:

min_{τ>0}  (τ‖g‖)/2 + (σ²‖g‖)/(2τ) + Σ_{i=1}^n min_{−2 ≤ w_i ≤ 0} [ (‖g‖/(2τn)) w_i² − (h_i/√n) w_i ].
Then, the optimal w̃_i satisfies

w̃_i = 0, if h_i ≥ 0,
w̃_i = (τ√n/‖g‖) h_i, if −2‖g‖/(τ√n) ≤ h_i < 0, (11)
w̃_i = −2, if h_i < −2‖g‖/(τ√n),

where τ is the solution to the following:

( min_{τ>0}  (τ‖g‖)/2 + (σ²‖g‖)/(2τ) + (1/√n) Σ_{i=1}^n υ(τ; h_i, ‖g‖) )₊ , (12)

υ(τ; h_i, ‖g‖) := 0, if h_i ≥ 0,
υ(τ; h_i, ‖g‖) := −(τ√n/(2‖g‖)) h_i², if −2‖g‖/(τ√n) ≤ h_i < 0,
υ(τ; h_i, ‖g‖) := 2‖g‖/(τ√n) + 2h_i, if h_i ≤ −2‖g‖/(τ√n).

Convergence of the (AO). Now that the (AO) is simplified as in (12), we can get a handle on the limiting behavior of the optimization itself, as well as of the optimal w̃. But first, we need to properly normalize the (AO) by dividing the objective in (12) by √n. Also, for convenience, re-define τ := τ/√δ. By the WLLN, we have ‖g‖/√n →_P √δ, and, for all τ > 0,

(1/n) Σ_{i=1}^n υ(τ; h_i, ‖g‖) →_P Y(τ) := −(τ/2) ∫_0^{2/τ} h² p(h) dh + (2/τ) Q(2/τ) − 2 ∫_{2/τ}^∞ h p(h) dh.

With these we can evaluate the point-wise (in τ) limit of the objective function in (12). Next, we use the fact that the objective is convex in τ and [20, Cor. II.1] to conclude that the convergence is in fact uniform in τ. Hence, the random optimization in (12) converges to the following deterministic optimization:

min_{τ>0}  (τδ)/2 + σ²/(2τ) + Y(τ);

some algebra shows that the latter is the same as (3). If δ > 1/2, then it can be shown via differentiation that its objective function is strictly convex. Also, it is nonnegative; thus, the entire expression in (12), which is nothing but the (AO) problem we started with, converges in probability to (3). What is more, using [21, Thm. 2.7] it can be shown that the optimal τ(g, h) of the (AO) converges in probability to the unique optimal solution τ_* of (3). This is crucial for the final step of the proof.
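The algebra connecting the deterministic limit above to the compact objective in (3) can be sanity-checked numerically. The following sketch (our own check, not part of the paper) evaluates both forms of the objective for several parameter values and confirms they coincide; 1/SNR is written as σ².

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def Y(tau):
    """Point-wise limit of (1/n) sum_i v(tau; h_i, ||g||) after rescaling."""
    c = 2.0 / tau
    i1, _ = quad(lambda h: h ** 2 * norm.pdf(h), 0.0, c)  # int_0^{2/tau} h^2 p
    i2, _ = quad(lambda h: h * norm.pdf(h), c, np.inf)    # int_{2/tau}^inf h p
    return -(tau / 2.0) * i1 + c * norm.sf(c) - 2.0 * i2  # (2/tau) Q(2/tau) = c*Q(c)

def limit_objective(tau, delta, sigma2):
    """Deterministic limit of the (AO): tau*delta/2 + sigma^2/(2 tau) + Y(tau)."""
    return tau * delta / 2.0 + sigma2 / (2.0 * tau) + Y(tau)

def objective_eq3(tau, delta, sigma2):
    """Compact form of (3): (tau/2)(delta - 1/2) + sigma^2/(2 tau)
    + (tau/2) * integral_{2/tau}^inf (h - 2/tau)^2 p(h) dh."""
    c = 2.0 / tau
    tail, _ = quad(lambda h: (h - c) ** 2 * norm.pdf(h), c, np.inf)
    return (tau / 2.0) * (delta - 0.5) + sigma2 / (2.0 * tau) + (tau / 2.0) * tail

# The two forms should agree (up to quadrature error) for all tau > 0.
diffs = [abs(limit_objective(t, d, s2) - objective_eq3(t, d, s2))
         for t in (0.3, 0.5, 1.0, 2.0)
         for d, s2 in ((0.7, 0.2), (1.0, 0.125))]
```

The agreement follows from ∫₀^∞ h² p(h) dh = 1/2 and expanding (h − 2/τ)² inside the tail integral, which is exactly the "some algebra" referred to above.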
Proving w̃ ∈ S. Recall the definition in (10). We prove that (1/n) Σ_{i=1}^n 1{w̃_i ≤ −1} →_P Q(1/τ_*). From (11), {w̃_i ≤ −1} = {h_i ≤ −(‖g‖/√n)(1/(√δ τ))}. Recall that ‖g‖/√n →_P √δ and τ →_P τ_*. Conditioning on these high-probability events, it can be shown that

(1/n) Σ_{i=1}^n 1{h_i ≤ −(‖g‖/√n)(1/(√δ τ))} →_P (1/n) Σ_{i=1}^n 1{h_i ≤ −1/τ_*} →_P Q(1/τ_*).
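The closed-form solution map (11) is nothing but a clipping of the unconstrained per-coordinate minimizer to the box [−2, 0]. The short check below (our own illustration, with arbitrary stand-in values for τ and ‖g‖) verifies it against a generic bounded scalar minimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n, tau = 1000, 0.8
h = rng.normal(size=n)
g_norm = np.sqrt(2.0 * n)        # stand-in for ||g|| (as if delta = 2)

# Closed form (11): clip the unconstrained minimizer of
# (||g||/(2*tau*n)) w^2 - (h_i/sqrt(n)) w  to the box [-2, 0].
w_closed = np.clip(tau * np.sqrt(n) * h / g_norm, -2.0, 0.0)

# Generic numeric check: minimize each scalar objective over [-2, 0].
a = g_norm / (2.0 * tau * n)
w_numeric = np.array([
    minimize_scalar(lambda w, hi=hi: a * w ** 2 - hi / np.sqrt(n) * w,
                    bounds=(-2.0, 0.0), method="bounded").x
    for hi in h
])
max_gap = np.max(np.abs(w_closed - w_numeric))
```

In particular, the error event {w̃_i ≤ −1} used in the proof is exactly the event that the unconstrained minimizer τ√n h_i/‖g‖ falls below −1, i.e. {h_i ≤ −‖g‖/(τ√n)}.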
4. DISCUSSION AND CONCLUSION
In this paper we have used the CGMT framework of [14] to precisely compute the P_e of the box relaxation method (BRO) for recovering BPSK signals in MIMO systems. At high SNR we obtain P_e ≈ Q(√((δ − 1/2) SNR)), compared to the matched filter bound (MFB) of Q(√(δ SNR)). As the interested reader may observe and expect, similar results can be achieved for higher-order constellations (m-PAM, m-QAM, m-PSK, etc.). However, we shall leave the detailed calculations for another occasion.

In the proof outline, we made use of the set

S = { v : | (1/n) Σ_{i=1}^n 1{v_i ≤ −1} − Q(1/τ_*) | < ε },

to establish that the (AO) and (PO) have the same expected BER. A study of our analysis of the (AO) reveals that the error events for each of the bits in the (AO) are iid. This means that if, for constant k, we define the set

S*_k = { v : | (1/C(n,k)) Σ_{T ⊂ {1,…,n}, |T| = k} 1{v_{i_1} ≤ −1, …, v_{i_k} ≤ −1} − Q^k(1/τ_*) | < ε },

then lim_{n→∞} P{w_φ ∈ S*_k} = 1. By Theorem 3.1, this implies lim_{n→∞} P{w_Φ ∈ S*_k} = 1, which means that the error events for any fixed k bits in the (PO) are also iid. This fact has significant consequences. For example, it implies that, when a block of data is in error, only a few of its bits are. This means that the output of the (BRO) can be used by various local methods to further reduce the BER. We shall explain this in future work.

5. REFERENCES

[1] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, "Energy and spectral efficiency of very large multiuser MIMO systems," IEEE Transactions on Communications, vol. 61, no. 4, pp. 1436–1449, 2013.
[2] C.-K. Wen, J.-C. Chen, K.-K. Wong, and P. Ting, "Message passing algorithm for distributed downlink regularized zero-forcing beamforming with cooperative base stations," IEEE Transactions on Wireless Communications, vol. 13, no. 5, pp. 2920–2930, 2014.
[3] T. L. Narasimhan and A. Chockalingam, "Channel hardening-exploiting message passing (CHEMP) receiver in large-scale MIMO systems," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 847–860, 2014.
[4] J. Charles, G. Ramina, M. Arian, and S. Christoph, "Optimality of large MIMO detection via approximate message passing," in 2015 IEEE International Symposium on Information Theory (ISIT). IEEE, 2015.
[5] M. Grötschel, L. Lovász, and A. Schrijver, Geometric Algorithms and Combinatorial Optimization. Springer Science & Business Media, 2012, vol. 2.
[6] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[7] B. Hassibi and H. Vikalo, "On the sphere-decoding algorithm I. Expected complexity," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2806–2818, 2005.
[8] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2009.
[9] S. Verdú, Multiuser Detection. Cambridge University Press, 1998.
[10] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex geometry of linear inverse problems," Foundations of Computational Mathematics, vol. 12, no. 6, pp. 805–849, 2012.
[11] M. Stojnic, "A framework to characterize performance of LASSO algorithms," arXiv preprint arXiv:1303.7291, 2013.
[12] S. Oymak, C. Thrampoulidis, and B. Hassibi, "The squared-error of generalized LASSO: A precise analysis," arXiv preprint arXiv:1311.0830, 2013.
[13] C. Thrampoulidis, A. Panahi, D. Guo, and B. Hassibi, "Precise error analysis of the LASSO," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
[14] C. Thrampoulidis, S. Oymak, and B. Hassibi, "Regularized linear regression: A precise analysis of the estimation error," in Proceedings of The 28th Conference on Learning Theory, 2015, pp. 1683–1709.
[15] Y. Gordon, "Some inequalities for Gaussian processes and applications," Israel Journal of Mathematics, vol. 50, no. 4, pp. 265–289, 1985.
[16] Y. Plan and R. Vershynin, "The generalized LASSO with non-linear observations," arXiv preprint arXiv:1502.04071, 2015.
[17] Y. Gordon, On Milman's Inequality and Random Subspaces which Escape through a Mesh in R^n. Springer, 1988.
[18] M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991, vol. 23.
[19] C. Thrampoulidis, E. Abbasi, and B. Hassibi, "Precise high-dimensional error analysis of regularized M-estimators under Gaussian measurement designs," in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2015.
[20] P. K. Andersen and R. D. Gill, "Cox's regression model for counting processes: A large sample study," The Annals of Statistics, pp. 1100–1120, 1982.
[21] W. K. Newey and D. McFadden, "Large sample estimation and hypothesis testing," Handbook of Econometrics, vol. 4, pp. 2111–2245, 1994.