A simple bound on the BER of the MAP decoder for massive MIMO systems
Christos Thrampoulidis⋆, Ilias Zadik†, Yury Polyanskiy†
⋆ University of California, Santa Barbara, USA. † Massachusetts Institute of Technology, USA.
ABSTRACT
The deployment of massive MIMO systems has revived much of the interest in the study of the large-system performance of multiuser detection systems. In this paper, we prove a non-trivial upper bound on the bit-error rate (BER) of the MAP detector for BPSK signal transmission and equal-power condition. In particular, our bound is approximately tight at high-SNR. The proof is simple and relies on Gordon's comparison inequality. Interestingly, we show that under the assumption that Gordon's inequality is tight, the resulting BER prediction matches that of the replica method under the replica symmetry (RS) ansatz. Also, we prove that, when the ratio of receive to transmit antennas exceeds 0.9251, the replica prediction matches the matched filter lower bound (MFB) at high-SNR. We corroborate our results by numerical evidence.

Index Terms: massive MIMO, large-system analysis, JO detector, Gaussian process inequalities, replica method.
1. INTRODUCTION
Massive multiple-input multiple-output (MIMO) systems, where the base station is equipped with hundreds or even thousands of antennas, promise improved spectral efficiency, coverage and range compared to small-scale systems. As such, they are widely believed to play an important role in 5G wireless communication systems [1]. Their deployment has revived much of the recent interest in the study of multiuser detection schemes in high dimensions, e.g., [2, 3, 4, 5].

A large host of exact and heuristic detection schemes have been proposed over the years. Decoders such as zero-forcing (ZF) and linear minimum mean square error (LMMSE) have inferior performance [6], and others such as local neighborhood search-based methods [7] and lattice reduction-aided (LRA) decoders [8, 9] are often difficult to characterize precisely. Recently, [10] studied in detail the performance of the box-relaxation optimization (BRO), which is a natural convex relaxation of the maximum a posteriori (MAP) decoder, and which allows one to recover the signal via efficient convex optimization followed by hard thresholding. In particular, [10] precisely quantifies the performance gain of the BRO compared to the ZF and the LMMSE. Despite such gains, the degree of sub-optimality of the convex relaxation compared to the combinatorial MAP detector remains unclear. The challenge lies in the complexity of analyzing the latter. In particular, predictions of the performance of the MAP detector are known only via the (non-rigorous) replica method from statistical physics [11].

In this paper, we derive a simple, yet non-trivial, upper bound on the bit error rate (BER) of the MAP detector. We show (in a precise manner) that our bound is approximately tight at high-SNR, since it is close to the matched filter lower bound (MFB). Our numerical simulations verify our claims and further include comparisons to the replica prediction and to the BER of the BRO. Our proof relies on Gordon's Gaussian comparison inequality [12]. While Gordon's inequality is not guaranteed to be tight, we make the following possibly interesting and useful observation. If Gordon's inequality were asymptotically tight, then its BER prediction would match the prediction of the replica method (under replica-symmetry).

This material is based upon work supported by the National Science Foundation under Grant No CCF-17-17842.
2. SETTING
We assume a real Gaussian wireless channel, additive Gaussian noise, and an uncoded modulation scheme. For concreteness, we focus on binary phase-shift keying (BPSK) transmission, but the techniques naturally extend to other constellations. Formally, we seek to recover an $n$-dimensional BPSK vector $x_0 \in \{\pm 1\}^n$ from the noisy MIMO relation $y = A x_0 + \sigma z \in \mathbb{R}^m$, where $A \in \mathbb{R}^{m \times n}$ is the channel matrix (assumed to be known) with entries iid $\mathcal{N}(0, 1/n)$, and $z \in \mathbb{R}^m$ is the noise vector with entries iid $\mathcal{N}(0, 1)$. The normalization is such that the reciprocal of the noise variance $\sigma^2$ is equal to the SNR, i.e., $\mathrm{SNR} = 1/\sigma^2$. The performance metric of interest is the bit-error rate (BER). The BER of a detector which outputs $\hat{x}$ as an estimate of $x_0$ is formally defined as $\mathrm{BER} := \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{\hat{x}_i \neq x_{0,i}\}$.

In this paper, we study the BER of the MAP (also commonly referred to in this context as the jointly-optimal (JO) multiuser) detector, which is defined by
$$\hat{x} = \arg\min_{x \in \{\pm 1\}^n} \|y - Ax\|_2. \tag{1}$$
We state our results in the large-system limit where $m, n \to \infty$, while the ratio of receive to transmit antennas is kept fixed at $\delta = m/n$.

It is well known that, in the worst case, solving (1) is an NP-hard combinatorial optimization problem in the number of users [13]. The BRO is a relaxation of (1) to an efficient convex quadratic program, namely $\hat{x} = \mathrm{sign}\big(\arg\min_{x \in [-1,1]^n} \|y - Ax\|_2\big)$. Its performance in the large-system limit has been recently analyzed in [10]. Regarding the performance of (1), Tse and Verdú [14] have shown that the BER approaches zero at high-SNR. Beyond that, there is by now a long literature that studies (1) using the replica method, developed in the field of spin glasses. The use of the method in the context of multiuser detection was pioneered by Tanaka [11], and several extensions have followed since then [15, 16].
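To make the setting concrete, the following is a minimal simulation sketch of the model $y = A x_0 + \sigma z$ and of the exhaustive-search MAP detector (1). It is not part of the original paper: the function name `simulate_map_ber`, the default sizes, and the seed are illustrative assumptions, and the search over all $2^n$ sign vectors is only feasible for very small $n$, far from the large-system limit studied here.

```python
import itertools
import random


def simulate_map_ber(n=8, m=12, sigma=0.5, trials=20, seed=0):
    """Monte Carlo BER estimate of the exhaustive-search MAP detector in (1).

    Hypothetical helper for illustration: it visits all 2**n sign vectors,
    so n must stay small.
    """
    rng = random.Random(seed)
    candidates = list(itertools.product((-1, 1), repeat=n))
    bit_errors = 0
    for _ in range(trials):
        x0 = tuple(rng.choice((-1, 1)) for _ in range(n))
        # Channel entries iid N(0, 1/n); noise entries iid N(0, 1).
        A = [[rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)] for _ in range(m)]
        y = [sum(a * x for a, x in zip(row, x0)) + sigma * rng.gauss(0.0, 1.0)
             for row in A]

        def cost(x):
            # Squared residual ||y - A x||_2^2; same minimizer as the norm in (1).
            return sum((yi - sum(a * xj for a, xj in zip(row, x))) ** 2
                       for yi, row in zip(y, A))

        x_hat = min(candidates, key=cost)
        bit_errors += sum(xh != xt for xh, xt in zip(x_hat, x0))
    return bit_errors / (n * trials)


if __name__ == "__main__":
    for snr_db in (0, 5, 10):
        sigma = 10 ** (-snr_db / 20)  # SNR = 1 / sigma^2
        print(snr_db, simulate_map_ber(sigma=sigma))
```

With the defaults above, $\delta = m/n = 1.5$. Such small-scale estimates are noisy and only meant to illustrate the definitions of the detector and of the BER.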
The replica method has the remarkable ability to yield highly nontrivial predictions, which in certain problem instances can be formally shown to be correct (e.g., [17, 18, 19]). However, it still lacks a complete rigorous mathematical justification.

The proof of our main result, Theorem 3.1, reveals that a non-asymptotic bound is also possible with only slightly more effort.

3. RESULTS

3.1. Upper bound

This section contains our main result: a simple upper bound on the BER of (1). First, we introduce some useful notation. We say that an event $\mathcal{E}(n)$ holds with probability approaching 1 (wpa 1) if $\lim_{n \to \infty} \Pr(\mathcal{E}(n)) = 1$. Let $X_n$ be a sequence of random variables indexed by $n$ and $X$ some constant. We write $X_n \overset{P}{=} X$ and $X_n \overset{P}{\le} X$ if, for all $\epsilon > 0$, the events $\{|X_n - X| \le \epsilon\}$ and $\{X_n \le X + \epsilon\}$, respectively, hold wpa 1. Finally, let $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, let $Q(x) = \int_x^{\infty} \phi(\tau)\,\mathrm{d}\tau$ be the Gaussian tail function, and let $Q^{-1}$ be its inverse.

Theorem 3.1.
Fix a constant noise variance $\sigma^2 > 0$ and $\delta > 0$. Let BER denote the bit-error rate of the MAP detector in (1) for fixed but unknown $x_0 \in \{\pm 1\}^n$. Define the function $\ell : (0, 1] \to \mathbb{R}$ as
$$\ell(\theta) := \sqrt{\delta}\,\sqrt{4\theta + \sigma^2} - \sqrt{\tfrac{2}{\pi}}\; e^{-\frac{(Q^{-1}(\theta))^2}{2}}, \tag{2}$$
and let $\theta_0 \in (0, 1]$ be the largest solution to the equation $\ell(\theta) = \sigma\sqrt{\delta}$. Then, in the limit of $m, n \to \infty$, $m/n = \delta$, it holds that $\mathrm{BER} \overset{P}{\le} \theta_0$.

Propositions A.1 and A.2 in the Appendix gather several useful properties of the function $\ell$. Notice that $\ell(1^-) > \ell(0^+) = \sqrt{\delta}\,\sigma$. Also, $\ell$ is continuous and $\ell'(0^+) < 0$. Thus, $\theta_0$ in Theorem 3.1 is well-defined. Moreover, we show in Proposition A.2(i) that if $\delta > 0.9251$ or $\sigma^2 > 0.1419$, then $\theta_0$ is the unique solution of the equation $\ell(\theta) = \sigma\sqrt{\delta}$ in $(0, 1]$.

Remark 1 (The function $\ell(\theta)$). Let us elaborate on the operational role of the function $\ell$. We partition the feasible vectors $x \in \{\pm 1\}^n$ according to their Hamming distance from the true vector $x_0$. Specifically, for $\theta \in [0, 1]$ let $\mathcal{S}_\theta := \{x \in \{\pm 1\}^n : \|x - x_0\|_0 = \theta n\}$ and consider the optimal cost of (1) for each partition, i.e.,
$$c_\star(\theta) := \min_{x \in \mathcal{S}_\theta} \frac{1}{\sqrt{n}} \|y - Ax\|_2. \tag{3}$$
Evaluating the BER of (1) is of course closely related to understanding the typical behavior of $c_\star(\theta)$ in the large-system limit. The proof of the theorem in Section 3.3 shows that $\ell(\theta)$ is a high-probability lower bound on $c_\star(\theta)$. Hence, we get an estimate on the BER via studying $\ell(\theta)$ instead. In this direction, note that the value $\sigma\sqrt{\delta}$, to which $\ell(\theta)$ is compared, is nothing but the typical value of $c_\star(0) = \frac{\|y - A x_0\|_2}{\sqrt{n}} = \frac{\sigma \|z\|_2}{\sqrt{n}}$.
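As a numerical companion to Theorem 3.1, the sketch below evaluates $\ell(\theta)$ from (2) and searches for the largest solution $\theta_0$ of $\ell(\theta) = \sigma\sqrt{\delta}$. This is an illustrative procedure, not the authors' code: the names `ell` and `theta0`, the logarithmic scan grid, and the bisection depth are assumptions. The scan walks downward from $\theta \approx 1$ and bisects at the first sign change, which returns the largest root provided the grid is fine enough to resolve it.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def ell(theta, delta, sigma):
    """The function ell(theta) of (2), for theta in (0, 1)."""
    u = -_STD.inv_cdf(theta)  # Q^{-1}(theta) = -Phi^{-1}(theta)
    return (math.sqrt(delta) * math.sqrt(4.0 * theta + sigma ** 2)
            - math.sqrt(2.0 / math.pi) * math.exp(-0.5 * u * u))


def theta0(delta, sigma, points_per_decade=500, decades=12):
    """Largest root of ell(theta) = sigma * sqrt(delta), by scan plus bisection.

    Assumption: the grid resolves the largest root. If no sign change is found
    down to 10**(-decades), the smallest grid point is returned, which is then
    still an upper bound on theta0.
    """
    target = sigma * math.sqrt(delta)

    def gap(t):
        return ell(t, delta, sigma) - target

    hi = 1.0 - 1e-12  # ell(1^-) exceeds the target, so gap(hi) > 0 is expected
    for k in range(1, decades * points_per_decade + 1):
        lo = 10.0 ** (-k / points_per_decade)
        if gap(lo) <= 0.0:
            # Sign change in [lo, hi]: refine by bisection.
            for _ in range(200):
                mid = 0.5 * (lo + hi)
                if gap(mid) > 0.0:
                    hi = mid
                else:
                    lo = mid
            return hi
        hi = lo
    return hi


if __name__ == "__main__":
    for snr_db in (5, 10):
        sigma = 10 ** (-snr_db / 20)  # SNR = 1 / sigma^2
        print(snr_db, theta0(1.0, sigma))
```

The returned value is the upper bound on the BER asserted by Theorem 3.1 for the given $(\delta, \sigma)$.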
Finally, we make the following note for later reference: the value $\inf_{\theta \in (0,1]} \ell(\theta)$ is a high-probability lower bound on the optimal cost of (1), i.e., on $c_\star = \inf_{\theta \in (0,1]} c_\star(\theta)$. An illustration of these is included in Figure 2.

Remark 2. A lower bound on the BER of (1) can be obtained easily via comparison to the idealistic matched filter bound (MFB), where one assumes that all but one of the $n$ bits of $x_0$ are known. In particular, the MFB corresponds to the probability of error in detecting (say) $x_{0,1} \in \{\pm 1\}$ from $\tilde{y} = x_{0,1} a_1 + \sigma z$, where $\tilde{y} = y - \sum_{i=2}^{n} x_{0,i} a_i$ is assumed known, and $a_i$ is the $i$th column of $A$ (equivalently, the MFB is the error probability of an isolated transmission of only the first bit over the channel). It can be shown (e.g., [10]) that the MFB is given by $Q(\sqrt{\delta\,\mathrm{SNR}})$. Combining this with (a straightforward re-parametrization of) Theorem 3.1, it follows that the BER of (1) satisfies
$$Q\big(\sqrt{\delta\,\mathrm{SNR}}\big) \le \mathrm{BER} \le Q(\tau_0), \tag{4}$$
where $\tau_0 \in \mathbb{R}$ is the smallest solution to the equation $\sqrt{\delta/\mathrm{SNR}} + 2\phi(\tau) = \sqrt{\delta/\mathrm{SNR}}\,\sqrt{1 + 4\,\mathrm{SNR}\,Q(\tau)}$, which is the equation $\ell(\theta) = \sigma\sqrt{\delta}$ under the substitution $\theta = Q(\tau)$ and $\sigma^2 = 1/\mathrm{SNR}$.

Fig. 1: Plots of the function $\ell(\theta)$ defined in (2) for two problem instances: ($\delta = 1$, SNR = 5 dB), ($\delta = 1$, SNR = 10 dB). Also depicted is the value of $\theta_0$ for each instance (see Theorem 3.1).

Fig. 2: The function $\ell(\theta)$ (in red) is a high-probability lower bound on the typical value of $c_\star(\theta)$ (in dashed blue) defined in (3). See Remark 1.

Remark 3. In Proposition A.2(iv) we prove that at high values of $\mathrm{SNR} \gg 1$: $\theta_0 \to 0$. Thus, from Theorem 3.1 we have that the BER approaches zero (thus providing an alternative proof of the corresponding result in [14]). This already confirms that our upper bound is non-trivial. In fact, an even stronger statement can be shown, namely, at $\mathrm{SNR} \gg 1$: $\theta_0 \approx Q(\sqrt{\delta\,\mathrm{SNR}} - \eta)$ for an arbitrarily small $\eta > 0$ (see Proposition A.2(iv) for the exact statement). This, when combined with the MFB in (4), shows that our upper bound is approximately tight at high-SNR.

Remark 4. The proof of Theorem 3.1 uses Gordon's comparison inequality for Gaussian processes (also known as the Gaussian min-max Theorem (GMT)). In essence, the GMT provides a simple lower bound on the typical value of $c_\star(\theta)$ in (3) in the large-system limit. Gordon's inequality is classically used to establish (non-)asymptotic probabilistic lower bounds on the minimum singular value of Gaussian matrices [20], and has a number of other applications in high-dimensional convex geometry [21]. In general, the inequality is not tight. Recently, Stojnic [22] proved that the inequality is tight when applied to convex problems. The result was refined in [23] and has been successfully exploited to precisely analyze the BER of the BRO [10]. Unfortunately, the minimization in (3) is not convex, thus there are no immediate tightness guarantees regarding the lower bound $\ell(\theta)$. Interestingly, in Section 3.4 we show that if the GMT were (asymptotically) tight, then it would result in a prediction that matches the replica prediction in [24].

Remark 5. The replica prediction on the BER of (1) is given by [11] (based on the ansatz of replica-symmetry (RS)) as the solution to a system of nonlinear equations. It is reported in [24, Eqn. (15)] that, as long as $\delta$ is not too small, the saddle-point equations reduce to the solution of the following fixed-point equation:
$$\theta = Q\left(\sqrt{\frac{\delta}{\sigma^2 + 4\theta}}\right). \tag{5}$$
Onwards, we refer to (5) as the replica-symmetry prediction. In Proposition A.1, we prove that equation (5) has either one or three solutions. In the latter case, the BER formula can exhibit complicated behavior, such as anomalous, non-monotonic dependence on the SNR [11]. On the other hand, the solution is unique if either $\delta > 0.9251$ or $\sigma^2 \ge 0.1419$. This proves the numerical observations reported in [25, Fig. 3]. Finally, Proposition A.2(iii) shows that when $\delta > 0.9251$ and $\mathrm{SNR} \gg 1$, the unique solution of (5) satisfies $\theta_\star \approx Q(\sqrt{\delta\,\mathrm{SNR}})$. This suggests that at high-SNR, the BER of (1) decreases at an optimal rate.

Fig. 3 (two panels, (a) and (b), one for each value of $\delta$; horizontal axis: SNR in dB): BER curve as a function of the SNR (in dB) for the following: matched-filter lower bound (MFB) (cf. Remark 2); replica prediction corresponding to (5); upper bound of Theorem 3.1 for (1); box-relaxation optimization (BRO) [10].
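The fixed-point equation (5) and the matched filter bound of Remark 2 are easy to evaluate numerically. The sketch below is an illustration rather than the authors' implementation: the names `q_func`, `mfb`, and `replica_theta` are hypothetical, and the bisection on $(0, 1/2)$ assumes the regime in which (5) has a unique solution (for instance $\delta > 0.9251$); when three solutions exist, it returns only one of them.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x), computed via erfc for accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mfb(delta, snr):
    """Matched filter bound Q(sqrt(delta * SNR)) from Remark 2."""
    return q_func(math.sqrt(delta * snr))


def replica_theta(delta, snr, iters=200):
    """A solution of theta = Q(sqrt(delta / (sigma^2 + 4 theta))), eq. (5).

    Assumption: (5) has a unique solution, so bisection on (0, 1/2) finds it.
    """
    sigma2 = 1.0 / snr

    def gap(t):
        return t - q_func(math.sqrt(delta / (sigma2 + 4.0 * t)))

    lo, hi = 0.0, 0.5  # gap(0) < 0 and gap(1/2) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for snr_db in (5, 10, 15):
        snr = 10 ** (snr_db / 10)
        print(snr_db, mfb(1.0, snr), replica_theta(1.0, snr))
```

Since $4\theta > 0$ shrinks the argument of $Q$ in (5), any solution of (5) is at least as large as the MFB, which the printed values reflect.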
3.2. Numerical illustrations

Figure 3 includes numerical illustrations that help visualize the prediction of Theorem 3.1 and several of the remarks that followed. For two values of $\delta$, we plot the BER as a function of $\mathrm{SNR} = 1/\sigma^2$. Each plot includes four curves: (i) the MFB; (ii) the solution to (5) corresponding to the replica prediction; (iii) the upper bound $\theta_0$ of Theorem 3.1; (iv) the BER of the BRO according to [10, Thm. II.I].

For the reader's convenience we note the following mapping between notation here and [24]: $\delta \leftrightarrow \alpha$, $\sigma^2 \leftrightarrow \beta_s^{-1}$ and $\mathrm{BER} \leftrightarrow (1 - m)/2$.

We make several observations. First, it is interesting to note that our upper bound follows the same trend as the replica prediction. For example, note the kink that appears in both curves in Figure 3a. Second, note that the upper bound of Theorem 3.1 approaches the MFB at high-SNR, confirming our theoretical findings in Remark 3. Also, as predicted in Remark 5, the solution $\theta_\star$ to (5) goes to zero exactly at the rate of the MFB. Finally, let us compare the upper bound $\theta_0$ of Theorem 3.1 to the BER of the BRO. At low SNR, $\theta_0$ takes values larger than the latter. We remark that Theorem 3.1 is not entirely to be blamed for this behavior, since the replica prediction experiences the very same one. There is no contradiction here: the MAP detector is not optimal for minimizing the BER (e.g., [11, Sec. 2]), thus it is likely that its convex approximation (i.e., the BRO) shows better BER performance at low-SNR. On the other hand, for high-SNR our upper bound takes values significantly smaller than the BER of the BRO. This shows that at high-SNR the performance of the BRO is still quite far from that of the combinatorial optimization it tries to approximate.

3.3. Proof of Theorem 3.1

Let $\hat{x}$ be the solution to (1). First, observe that $\|\hat{x} - x_0\|_2^2 = 2n - 2\big(n - 2\sum_{i=1}^{n} \mathbb{1}\{\hat{x}_i \neq x_{0,i}\}\big) = 4n\,\mathrm{BER}$. Hence, we will prove that
$$\frac{\|\hat{x} - x_0\|_2}{\sqrt{n}} \overset{P}{\le} \alpha_0 =: 2\sqrt{\theta_0} \in (0, 2]. \tag{6}$$
Second, due to rotational invariance of the Gaussian measure we can assume without loss of generality that $x_0 = +\mathbf{1}$. For convenience, define the (normalized) error vector $w := n^{-1/2}(x - \mathbf{1})$ and consider the set of feasible such vectors that do not satisfy (6), i.e.,
$$\mathcal{S}(\alpha_0) := \big\{ w \in \{-2/\sqrt{n},\, 0\}^n : \|w\|_2 \ge \alpha_0 + \epsilon \big\},$$
for some fixed (but arbitrary) $\epsilon > 0$. Also, denote the (normalized) objective function of (1) as $F(w) = F(w; z, G) := n^{-1/2} \|\sigma z - G w\|_2$, where $G = \sqrt{n}\,A$ has entries iid standard normal. With this notation, our goal towards establishing (6) is proving that there exists a constant $\eta := \eta(\epsilon) > 0$ such that the following holds wpa 1:
$$\min_{w \in \mathcal{S}(\alpha_0)} F(w) \ge \min_{w \in \{-2/\sqrt{n},\, 0\}^n} F(w) + \eta. \tag{7}$$
Our strategy in showing the above is as follows.

First, we use Gordon's inequality to obtain a high-probability lower bound on the left-hand side (LHS) of (7). In particular, it can be shown (see for example [10, Sec. D.3]) that the primary optimization (PO) in the LHS of (7) can be lower bounded with high probability by an auxiliary optimization (AO) problem, which is defined as follows:
$$\min_{w \in \mathcal{S}(\alpha_0)} G(w; g, h) := \sqrt{\|w\|_2^2 + \sigma^2}\,\|g\|_2 - h^T w, \tag{8}$$
where $g \in \mathbb{R}^m$ and $h \in \mathbb{R}^n$ have entries iid Gaussian $\mathcal{N}(0, 1/n)$. Specifically, the following statement holds for all $c \in \mathbb{R}$:
$$\Pr\Big( \min_{w \in \mathcal{S}(\alpha_0)} F(w; z, G) \le c \Big) \le 2 \Pr\Big( \min_{w \in \mathcal{S}(\alpha_0)} G(w; g, h) \le c \Big). \tag{9}$$
The AO can be easily simplified as follows:
$$\min_{2 \ge \alpha \ge \alpha_0 + \epsilon} \; \sqrt{\alpha^2 + \sigma^2}\,\|g\|_2 - \frac{2}{\sqrt{n}} \sum_{i=1}^{(\alpha^2/4)\,n} h_i^{\downarrow}. \tag{10}$$
Here, $h_1^{\downarrow} \ge h_2^{\downarrow} \ge \ldots \ge h_n^{\downarrow}$ denotes the order statistics of the entries of $h$ (the entries of $h$ and of $-h$ have the same distribution), and we have used the fact that for $w \in \{-2/\sqrt{n},\, 0\}^n$ it holds that $\|w\|_2 = \alpha \Leftrightarrow \|w\|_0 = (\alpha^2/4)\,n$. Furthermore, note that $\|g\|_2 \overset{P}{=} \sqrt{\delta}$ and, for any fixed $\theta \in (0, 1]$: $\frac{1}{\sqrt{n}} \sum_{i=1}^{\theta n} h_i^{\downarrow} \overset{P}{=} \phi\big(Q^{-1}(\theta)\big)$. Thus, the objective function in (10) converges in probability, pointwise in $\alpha$, to $\ell(\alpha^2/4)$ (cf. (2)). In fact, since the minimization in (10) is over a compact set, uniform convergence holds and the minimum value converges to $\min_{2 \ge \alpha \ge \alpha_0 + \epsilon} \ell(\alpha^2/4)$. Combining the above shows that for all $\eta > 0$ the following event holds wpa 1:
$$\min_{w \in \mathcal{S}(\alpha_0)} G(w; g, h) \ge \min_{2 \ge \alpha \ge \alpha_0 + \epsilon} \ell\big(\alpha^2/4\big) - \eta. \tag{11}$$
Hence, from (9), the above statement holds with $G(w; g, h)$ replaced by $F(w; z, G)$.

Next, we obtain a simple upper bound on the RHS in (7):
$$\min_{w \in \{-2/\sqrt{n},\, 0\}^n} F(w) \le F(\mathbf{0}) = \frac{\sigma \|z\|_2}{\sqrt{n}}, \tag{12}$$
which we combine with the fact that wpa 1 it holds that $\sigma \|z\|_2 / \sqrt{n} \le \sqrt{\delta}\,\sigma + \eta$.

Combining the two displays in (11) and (12), we have shown that (7) holds as long as there exists $\eta > 0$ such that
$$\min_{2 \ge \alpha \ge \alpha_0 + \epsilon} \ell\big(\alpha^2/4\big) \ge \sqrt{\delta}\,\sigma + 3\eta. \tag{13}$$
At this point, recall that $\alpha^2/4 = \theta$ and the definition of $\theta_0$ as the largest solution to the equation $\ell(\theta) = \sqrt{\delta}\,\sigma$. By this definition and the fact that $\ell(\theta)$ is continuous and satisfies $\ell(1^-) > \sqrt{\delta}\,\sigma$, we have that $\ell(\theta) > \sqrt{\delta}\,\sigma$ for all $\theta > \theta_0$. Thus, there always exists $\eta(\epsilon)$ satisfying (13) and the proof is complete.

3.4. Gordon's prediction and the replica prediction

Inspecting the proof of Theorem 3.1 reveals two possible explanations for why the resulting upper bound might be loose. First, recall that we obtain a lower bound on the LHS of (7) via Gordon's inequality. As mentioned in Remark 4, the inequality is not guaranteed to be tight in this instance. Second, recall that in upper bounding the RHS of (7) we use the crude bound (12). Specifically, we upper bound the optimal cost $c_\star$ of the MAP in (1) simply by the value of the objective function at a known feasible solution, namely $x = x_0$.

In this section, we make the following leap of faith. We assume that $\inf_{\theta \in (0,1]} \ell(\theta)$ is an asymptotically tight high-probability lower bound of $c_\star$, i.e., for all $\eta > 0$, wpa 1:
$$\min_{x \in \{\pm 1\}^n} \frac{1}{\sqrt{n}} \|y - Ax\|_2 \le \inf_{\theta \in (0,1]} \ell(\theta) + \eta. \tag{14}$$
Assuming (14) is true and repeating the arguments of Section 3.3 leads to the following conclusion: the BER of the MAP detector is upper bounded by $\theta_\star = \arg\min_{\theta \in (0,1]} \ell(\theta)$. This can also be expressed as the solution to the fixed-point equation $\ell'(\theta_\star) = 0$. Interestingly, this is shown in Proposition A.1(i) to be equivalent to (5). In other words, under the assumption above, Gordon's prediction on the BER of the MAP detector coincides with the replica prediction under the RS ansatz. While it is known that the MAP detector exhibits replica symmetry breaking (RSB) behavior [25], we believe that our observation on a possible connection between Gordon's inequality and the replica symmetric prediction is worth exploring further.

Let $\gamma_i \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$ and $\theta \in (0, 1]$. Then, for large $n$: $\frac{1}{n} \sum_{i=1}^{\theta n} \gamma_i^{\downarrow} \approx \frac{1}{n} \sum_{\{i \,:\, \gamma_i \ge Q^{-1}(\theta)\}} \gamma_i \approx \mathbb{E}\big[\gamma\, \mathbb{1}\{\gamma \ge Q^{-1}(\theta)\}\big] = \phi\big(Q^{-1}(\theta)\big)$.
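The order-statistics limit used in the proof, namely that the normalized sum of the largest $\theta n$ out of $n$ iid standard normals approaches $\phi(Q^{-1}(\theta))$, can be checked with a short Monte Carlo experiment. The sketch below is illustrative only; the function names, sample size, and seed are assumptions, not taken from the paper.

```python
import random
from statistics import NormalDist


def top_fraction_average(theta, n, seed=0):
    """(1/n) * sum of the floor(theta * n) largest of n iid N(0, 1) samples."""
    rng = random.Random(seed)
    gamma = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
    k = int(theta * n)
    return sum(gamma[:k]) / n


def limit_value(theta):
    """phi(Q^{-1}(theta)) for theta in (0, 1), using Q^{-1}(theta) = Phi^{-1}(1 - theta)."""
    std = NormalDist()
    return std.pdf(std.inv_cdf(1.0 - theta))


if __name__ == "__main__":
    for theta in (0.01, 0.1, 0.5):
        print(theta, top_fraction_average(theta, 100_000), limit_value(theta))
```

The two printed columns should agree up to Monte Carlo fluctuations of order $1/\sqrt{n}$. Note that the limit is the truncated mean $\mathbb{E}[\gamma\,\mathbb{1}\{\gamma \ge t\}] = \phi(t)$, not the conditional mean $\phi(t)/Q(t)$.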
4. CONCLUSION
In this paper, we prove a simple yet highly non-trivial upper bound on the BER of the MAP detector in the case of BPSK signals and of equal-power condition. Theorem 3.1 naturally extends to allow for other constellation types (such as M-PAM) and power control, and it also enjoys a non-asymptotic version. Perhaps more challenging, but certainly of interest, is the extension of our results to complex Gaussian channels. Also, we wish to develop a deeper understanding of the connection between Gordon's inequality and the replica-symmetric prediction.
Acknowledgement
The authors would like to thank Dr. Tom Richardson for sharing with us his proof of Proposition A.1. In particular, Proposition A.1 as it appears here is a refined version of what appeared in our initial submission.
A. PROPERTIES OF $\ell(\theta)$

Let $\ell : (0, 1] \to \mathbb{R}$ and $\theta_0$ be defined as in Theorem 3.1. The proofs of the propositions below are deferred to Appendices B.1 and B.2.

Proposition A.1. The following statements are true:
(i) $\theta \in (0, 1]$ is a critical point of $\ell$ if and only if it solves (5). All critical points belong in $(0, \tfrac{1}{2})$.
(ii) $\ell$ has either one or three critical points.
(iii) $\ell$ has a unique critical point if either one of the following two holds: $\delta > 0.9251$ or $\sigma^2 > 0.1419$.

Proposition A.2. If $\ell$ has a unique critical point (see Prop. A.1(iii)), then the following are true:
(i) $\theta_0$ is the unique solution of the equation $\ell(\theta) = \sigma\sqrt{\delta}$ in $(0, 1]$.
(ii) The unique solution of (5) is the unique $\theta_\star = \arg\min_\theta \ell(\theta)$.
Moreover, if $\delta > 0.9251$ it holds that:
(iii) The unique solution $\theta_\star = \theta_\star(\sigma)$ of (5) satisfies $\frac{\theta_\star}{Q(\sqrt{\delta}/\sigma)} \to 1$ in the limit of $\sigma \to 0$.
(iv) For $\eta > 0$, $\theta_0 = \theta_0(\sigma)$ satisfies $\limsup_{\sigma \to 0} \frac{\theta_0}{Q\left(\frac{\sqrt{\delta}}{\sigma} - \eta\right)} \le 1$.

B. APPENDIX

B.1. Proof of Proposition A.1
Setting $u := Q^{-1}(\theta)$, we will equivalently study the critical points of the function
$$\tilde{\ell}(u) := \ell(Q(u)) = \sqrt{\delta}\,\sqrt{4Q(u) + \sigma^2} - 2\phi(u).$$
By simple algebra,
$$\tilde{\ell}'(u) = -2Q'(u)\Big(u - \sqrt{\delta}\,\big(4Q(u) + \sigma^2\big)^{-1/2}\Big),$$
where $Q'(u) = -\phi(u)$. Clearly, $\tilde{\ell}'(u) < 0$ for all $u \le 0$. Thus, all critical points of $\tilde{\ell}$ are in $(0, +\infty)$. Now, let us define
$$F(u) := u^2\big(4Q(u) + \sigma^2\big). \tag{15}$$
Note that for $u > 0$:
$$\tilde{\ell}'(u) = 0 \;\Leftrightarrow\; F(u) = \delta. \tag{16}$$
Using the transformation $\theta = Q(u)$ and simple algebra shows that the equation on the RHS of (16) is identical to (5). This proves statement (i) of the proposition.

Next, we study the function $F(u)$. It can be shown that $F'(u) = 2u\big(G(u) + \sigma^2\big)$, where we define
$$G(u) := 4Q(u) + 2uQ'(u). \tag{17}$$
By differentiating $G$, setting the derivative equal to zero, and using the identity $Q''(u) = -uQ'(u)$, it can be shown that $G$ is decreasing in $[0, \sqrt{3}]$ and increasing in $[\sqrt{3}, +\infty)$. In particular,
$$\forall u > 0: \quad G(u) \ge G(\sqrt{3}) \approx -0.1418. \tag{18}$$
Thus, for $\sigma^2 \ge 0.1419 > -G(\sqrt{3})$, it holds that $F'(u) > 0$ for all $u > 0$. Thus, $F$ is increasing, which implies (cf. (16)) that $\tilde{\ell}$ has a unique critical point. Moreover, for $\sigma^2 \in \big[0, -G(\sqrt{3})\big)$, the equation $F'(u) = 0$ has two solutions. Thus, $F$ has exactly two critical points, which we denote by $u_A$ and $u_B$ onwards.

From the above properties of $F$, we conclude that $F$ is increasing in $[0, u_A]$, decreasing in $[u_A, u_B]$ and increasing in $[u_B, +\infty)$. Moreover, $u_A \le \sqrt{3}$. Thus, for $\sigma^2 \in \big[0, -G(\sqrt{3})\big)$: $\tilde{\ell}$ has three critical points if $\delta \in [F(u_B), F(u_A)]$ and one critical point otherwise. This proves statement (ii) of the proposition.

Next, note that if $\delta > F(u_A)$ then $\tilde{\ell}$ has a unique critical point. We will show that $F(u_A) \le 0.9251$, thus establishing statement (iii) of the proposition.
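The two numerical constants entering this proof can be checked directly. The sketch below is an illustrative check, not part of the paper: it uses $Q'(u) = -\phi(u)$ to write $G(u) = 4Q(u) - 2u\phi(u)$ and the quantity $-2u^3 Q'(u) = 2u^3\phi(u)$, which the proof uses to bound $F(u_A)$, and it locates their extrema on a uniform grid. The helper names and the grid range are assumptions.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def q_func(u):
    """Gaussian tail function Q(u)."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def G(u):
    """G(u) = 4 Q(u) + 2 u Q'(u), with Q'(u) = -phi(u); see (17)."""
    return 4.0 * q_func(u) - 2.0 * u * phi(u)


def H(u):
    """H(u) = -2 u^3 Q'(u) = 2 u^3 phi(u), the quantity that bounds F(u_A)."""
    return 2.0 * u ** 3 * phi(u)


def grid_argopt(f, lo, hi, steps, pick):
    """Grid point in [lo, hi] selected by `pick` (the builtin min or max) under f."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return pick(grid, key=f)


if __name__ == "__main__":
    r3 = math.sqrt(3.0)
    print("argmin G:", grid_argopt(G, 0.0, 6.0, 60000, min), "sqrt(3):", r3)
    print("G(sqrt(3)):", G(r3))   # about -0.1418, so 0.1419 > -G(sqrt(3))
    print("argmax H:", grid_argopt(H, 0.0, 6.0, 60000, max))
    print("H(sqrt(3)):", H(r3))   # about 0.92508, below 0.9251
```

Both extrema sit at $u = \sqrt{3}$, consistent with the thresholds $\sigma^2 \ge 0.1419$ and $\delta > 0.9251$ in Proposition A.1(iii).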
Using the fact that $G(u_A) + \sigma^2 = 0$, it follows that $F(u_A) = -2u_A^3 Q'(u_A)$. Now, setting $H(u) = -2u^3 Q'(u)$, it can be readily shown by studying the derivative $H'(u)$ that
$$\max_{u > 0}\; -2u^3 Q'(u) \le H(\sqrt{3}) \approx 0.92508 < 0.9251, \tag{19}$$
as desired. This concludes the proof of the proposition.

B.2. Proof of Proposition A.2

(i) We can continuously extend $\ell$ to include the endpoints of the interval $[0, 1]$. Note that $\ell(0) = \ell(0^+) = \sigma\sqrt{\delta}$. For the sake of contradiction, suppose that there exists $0 < \theta_1 < \theta_0$ such that $\ell(\theta_1) = \sigma\sqrt{\delta}$. Then, by Rolle's theorem applied on $[0, \theta_1]$ and on $[\theta_1, \theta_0]$, $\ell$ would have two distinct critical points, which contradicts the hypothesis.

(ii) For all $\theta \in (0, 1)$, we have $\ell'(\theta) = \frac{2\sqrt{\delta}}{\sqrt{4\theta + \sigma^2}} - 2Q^{-1}(\theta)$. Since $Q^{-1}(\theta)$ approaches $+\infty$ as $\theta$ approaches $0$, we conclude that for $\theta$ sufficiently small, $\ell'(\theta) < 0$. Similarly, since $Q^{-1}(\theta)$ approaches $-\infty$ as $\theta$ approaches $1$, we conclude that for $\theta$ sufficiently close to $1$, $\ell'(\theta) > 0$. Given that $\ell'(\theta) = 0$ has a unique solution $\theta_c$, we conclude that $\theta < \theta_c$ implies $\ell'(\theta) < 0$ and $\theta > \theta_c$ implies $\ell'(\theta) > 0$. In particular, $\theta_c$ is the global minimum of $\ell$, i.e., $\theta_c = \theta_\star$.

(iii) We first establish that $\theta_\star \to 0$ as $\sigma \to 0$. To see this, consider by contradiction a limiting point $\theta_L \in (0, 1/2]$ of the function $\theta_\star(\sigma)$. By (5), taking limits, it must be true that $\theta_L = Q\Big(\sqrt{\frac{\delta}{4\theta_L}}\Big)$. Thus, setting $u_L = Q^{-1}(\theta_L)$, it holds that $F(u_L) = \delta$, where we defined the function $F(u) := 4u^2 Q(u)$ for $u \ge 0$. To conclude with a contradiction, we prove next that
$$\max_{u \ge 0} F(u) < \delta. \tag{20}$$
Note that $F'(u) = 2uG(u)$, where $G$ is defined in (17). Let $u_A$ be the solution of $G(u) = 0$. From the proof of Proposition A.1, it is known that $u_A$ is unique and maximizes $F(u)$. Moreover, it is easily seen that $F(u_A) = -2u_A^3 Q'(u_A)$. Thus, $\max_{u \ge 0} F(u) \le F(u_A) \le 0.9251 < \delta$, where the bound $F(u_A) \le 0.9251$ was established in (19) and $\delta > 0.9251$ holds by assumption. This shows (20).

Now, by the mean value theorem, for some $\theta_T \in (0, \theta_\star)$,
$$\theta_\star - Q\Big(\frac{\sqrt{\delta}}{\sigma}\Big) = Q\Big(\frac{\sqrt{\delta}}{\sqrt{\sigma^2 + 4\theta_\star}}\Big) - Q\Big(\frac{\sqrt{\delta}}{\sqrt{\sigma^2}}\Big) = \Big[Q\Big(\frac{\sqrt{\delta}}{\sqrt{\sigma^2 + 4\theta}}\Big)\Big]'_{\theta = \theta_T}\, \theta_\star$$
$$= -\frac{2\sqrt{\delta}}{(4\theta_T + \sigma^2)^{3/2}}\; Q'\Big(\frac{\sqrt{\delta}}{\sqrt{\sigma^2 + 4\theta_T}}\Big)\, \theta_\star = \frac{\sqrt{2}\,\sqrt{\delta}}{\sqrt{\pi}\,(4\theta_T + \sigma^2)^{3/2}}\; e^{-\frac{\delta}{2(\sigma^2 + 4\theta_T)}}\, \theta_\star.$$
Therefore,
$$\limsup_{\sigma \to 0} \Big| 1 - \frac{Q(\sqrt{\delta}/\sigma)}{\theta_\star} \Big| \le \limsup_{\sigma \to 0} \frac{\sqrt{2}\,\sqrt{\delta}}{\sqrt{\pi}\,(4\theta_T + \sigma^2)^{3/2}}\; e^{-\frac{\delta}{2(\sigma^2 + 4\theta_T)}}. \tag{21}$$
Since $0 < \theta_T < \theta_\star$, we know that $\theta_T$ also goes to zero as $\sigma$ goes to zero. In particular, $\sigma^2 + 4\theta_T$ goes to zero and, as $\delta$ is fixed, the exponential factor dominates, so the RHS of (21) is zero. Hence, (21) implies the desired result.

(iv) By statement (i) of the proposition, $\theta_0$ is the unique solution in $(0, 1]$ of $\ell(\theta) = \sigma\sqrt{\delta} = \ell(0^+)$. Now, $\theta_0 > 0$ satisfies $\ell(\theta_0) = \sigma\sqrt{\delta}$, or,
$$\sqrt{\delta}\,\sqrt{4\theta_0 + \sigma^2} - 2\phi\big(Q^{-1}(\theta_0)\big) = \sqrt{\delta}\,\sigma. \tag{22}$$
We first prove that $\theta_0 \to 0$ as $\sigma \to 0$. Indeed, if not, suppose $\theta_N > 0$ is a positive limiting point of $\theta_0$ as $\sigma$ goes to zero. Then (22) implies $L(\theta_N) = 0$ for $L(\theta) := \sqrt{\delta}\,\sqrt{4\theta} - 2\phi\big(Q^{-1}(\theta)\big)$. Since $L(0) = 0$, by Rolle's theorem we have, for some $\theta_L \in (0, \theta_N)$, $L'(\theta_L) = 0$, which gives $\theta_L = Q\Big(\sqrt{\frac{\delta}{4\theta_L}}\Big)$. This leads to a contradiction, exactly as in the proof of statement (iii) above.

Now, by rearranging (22), with $\phi$ the Gaussian density, we have
$$\sqrt{\delta}\,\Big(\sqrt{4\theta_0 + \sigma^2} - \sigma\Big) = 2\phi\big(Q^{-1}(\theta_0)\big).$$
Taylor expansion around $0$ and the fact that $\theta_0 = o(1)$ give
$$2\phi\big(Q^{-1}(\theta_0)\big) = 2Q^{-1}(\theta_0)\,\theta_0 + o(\theta_0).$$
Hence, we have
$$\frac{\sqrt{\delta}}{\theta_0}\Big(\sqrt{4\theta_0 + \sigma^2} - \sigma\Big) = 2Q^{-1}(\theta_0) + o(1),$$
or, by simple algebra,
$$\frac{2\sqrt{\delta}}{\sqrt{4\theta_0 + \sigma^2} + \sigma} = Q^{-1}(\theta_0) + o(1).$$
Now, since $\eta > 0$, we conclude that for $\sigma$ sufficiently small,
$$\frac{2\sqrt{\delta}}{\sqrt{4\theta_0 + \sigma^2} + \sigma} - \eta < Q^{-1}(\theta_0),$$
or
$$\theta_0 < Q\Big(\frac{2\sqrt{\delta}}{\sqrt{4\theta_0 + \sigma^2} + \sigma} - \eta\Big). \tag{23}$$
Finally, by (23) and the mean value theorem, we have that for some $\theta_T \in (0, \theta_0)$, writing $s_T := \sqrt{4\theta_T + \sigma^2}$,
$$\theta_0 - Q\Big(\frac{\sqrt{\delta}}{\sigma} - \eta\Big) < Q\Big(\frac{2\sqrt{\delta}}{\sqrt{4\theta_0 + \sigma^2} + \sigma} - \eta\Big) - Q\Big(\frac{2\sqrt{\delta}}{2\sigma} - \eta\Big) = \Big[Q\Big(\frac{2\sqrt{\delta}}{\sqrt{4\theta + \sigma^2} + \sigma} - \eta\Big)\Big]'_{\theta = \theta_T}\, \theta_0$$
$$= -\frac{4\sqrt{\delta}}{s_T\,(s_T + \sigma)^2}\; Q'\Big(\frac{2\sqrt{\delta}}{s_T + \sigma} - \eta\Big)\, \theta_0 = \frac{4\sqrt{\delta}}{\sqrt{2\pi}\; s_T\,(s_T + \sigma)^2}\; e^{-\frac{1}{2}\left(\frac{2\sqrt{\delta}}{s_T + \sigma} - \eta\right)^2}\, \theta_0.$$
Therefore,
$$1 - \liminf_{\sigma \to 0} \frac{Q\big(\frac{\sqrt{\delta}}{\sigma} - \eta\big)}{\theta_0} \le \limsup_{\sigma \to 0} \frac{4\sqrt{\delta}}{\sqrt{2\pi}\; s_T\,(s_T + \sigma)^2}\; e^{-\frac{1}{2}\left(\frac{2\sqrt{\delta}}{s_T + \sigma} - \eta\right)^2}. \tag{24}$$
Since $0 < \theta_T < \theta_0$, we know that $\theta_T$ also goes to zero as $\sigma$ goes to zero. In particular, $\sigma^2 + 4\theta_T$ goes to zero and, as $\delta$ is fixed, the RHS of (24) is zero. Hence, (24) implies the desired result.

B.3. Tanaka's equations
For the reader's convenience we repeat Tanaka's fixed-point (FP) equations [11, Eqn. (43)] using our notation. This is based on the following mapping between notation here and [11]: $\delta \leftrightarrow \beta^{-1}$, $\sigma^2 \leftrightarrow B_0^{-1} = \sigma_0^2/\beta$, and $\mathrm{BER} \leftrightarrow (1 - m)/2$.
$$1 - 2\,\mathrm{BER} = \int \tanh\big(\sqrt{F}\,z + E\big)\,\phi(z)\,\mathrm{d}z$$
$$q = \int \tanh^2\big(\sqrt{F}\,z + E\big)\,\phi(z)\,\mathrm{d}z$$
$$E = \frac{\delta B}{1 + B(1 - q)}$$
$$F = \frac{\delta B^2\big(\sigma^2 + 4\,\mathrm{BER} + q - 1\big)}{\big(1 + B(1 - q)\big)^2},$$
where for the MAP decoder one needs to solve the equations above for $B \to \infty$. In this case,
$$1 - 2\,\mathrm{BER} = \int \tanh\big(\sqrt{F}\,z + E\big)\,\phi(z)\,\mathrm{d}z$$
$$q = \int \tanh^2\big(\sqrt{F}\,z + E\big)\,\phi(z)\,\mathrm{d}z$$
$$E = \frac{\delta}{1 - q}$$
$$F = \frac{\delta\big(\sigma^2 + 4\,\mathrm{BER} + q - 1\big)}{(1 - q)^2} \;\Leftrightarrow\; \sqrt{F} = E\,\sqrt{\frac{\sigma^2 + 4\,\mathrm{BER} + q - 1}{\delta}}. \tag{25}$$
It is empirically observed in [11, Sec. V.A] that for $\delta$ above a threshold reported there, the FP equations (25) have a unique solution. Now, consider (5). Let $\delta > 0.9251$, such that (5) has a unique solution $\theta_\star$. Then, it is not hard to see that the quadruple $\big(m \to 1 - 2\theta_\star,\; q \to 1,\; E \to +\infty,\; F \to +\infty\big)$ satisfies (25). Indeed, denoting
$$c := c(\mathrm{BER}) = \sqrt{\frac{\sigma^2 + 4\,\mathrm{BER}}{\delta}},$$
note that for $q \to 1$: $\sqrt{F} = c\,E$. Next, using that $\tanh\big(E(c\,z + 1)\big) \to \mathbb{1}\{z \ge -c^{-1}\} - \mathbb{1}\{z \le -c^{-1}\}$ as $E \to +\infty$ shows that the second equation in (25) is consistent with $q \to 1$. Finally, the first equation becomes
$$1 - 2\,\mathrm{BER} = \Pr(z \ge -c^{-1}) - \Pr(z \le -c^{-1}) = 1 - 2\Pr(z \le -c^{-1}) = 1 - 2Q(c^{-1}),$$
which agrees with (5), as desired.

C. REFERENCES

[1] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,”
IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, 2014.
[2] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Transactions on Communications, vol. 61, no. 4, pp. 1436–1449, 2013.
[3] C.-K. Wen, J.-C. Chen, K.-K. Wong, and P. Ting, “Message passing algorithm for distributed downlink regularized zero-forcing beamforming with cooperative base stations,” IEEE Transactions on Wireless Communications, vol. 13, no. 5, pp. 2920–2930, 2014.
[4] T. L. Narasimhan and A. Chockalingam, “Channel hardening-exploiting message passing (CHEMP) receiver in large-scale MIMO systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 847–860, 2014.
[5] C. Jeon, R. Ghods, A. Maleki, and C. Studer, “Optimality of large MIMO detection via approximate message passing,” in 2015 IEEE International Symposium on Information Theory (ISIT). IEEE, 2015.
[6] S. Verdú, Multiuser Detection. Cambridge University Press, 1998.
[7] K. V. Vardhan, S. K. Mohammed, A. Chockalingam, and B. S. Rajan, “A low-complexity detector for large MIMO systems and multicarrier CDMA systems,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 3, pp. 473–485, 2008.
[8] C. Windpassinger and R. F. Fischer, “Low-complexity near-maximum-likelihood detection and precoding for MIMO systems using lattice reduction,” in Proceedings of the 2003 IEEE Information Theory Workshop. IEEE, pp. 345–348.
[9] Q. Zhou and X. Ma, “Element-based lattice reduction algorithms for large MIMO detection,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 274–286, 2013.
[10] C. Thrampoulidis, W. Xu, and B. Hassibi, “Symbol error rate performance of box-relaxation decoders in massive MIMO,” IEEE Transactions on Signal Processing, vol. 66, no. 13, pp. 3377–3392, 2018.
[11] T. Tanaka, “A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors,” IEEE Transactions on Information Theory, vol. 48, no. 11, pp. 2888–2910, 2002.
[12] Y. Gordon, On Milman's inequality and random subspaces which escape through a mesh in $\mathbb{R}^n$. Springer, 1988.
[13] S. Verdú, “Computational complexity of optimum multiuser detection,” Algorithmica, vol. 4, no. 1-4, pp. 303–312, 1989.
[14] D. N. C. Tse and S. Verdú, “Optimum asymptotic multiuser efficiency of randomly spread CDMA,” IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2718–2722, 2000.
[15] D. Guo and S. Verdú, “Randomly spread CDMA: Asymptotics via statistical physics,” IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 1983–2010, 2005.
[16] G. Caire, R. R. Müller, and T. Tanaka, “Iterative multiuser joint decoding: Optimal power allocation and low-complexity implementation,” IEEE Transactions on Information Theory, vol. 50, no. 9, pp. 1950–1973, 2004.
[17] M. Talagrand, Spin Glasses: A Challenge for Mathematicians: Cavity and Mean Field Models. Springer Science & Business Media, 2003, vol. 46.
[18] G. Reeves and H. D. Pfister, “The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact,” in 2016 IEEE International Symposium on Information Theory (ISIT). IEEE, 2016, pp. 665–669.
[19] J. Barbier, M. Dia, N. Macris, and F. Krzakala, “The mutual information in random linear estimation,” in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2016, pp. 625–632.
[20] K. R. Davidson and S. J. Szarek, “Local operator theory, random matrices and Banach spaces,” Handbook of the Geometry of Banach Spaces, vol. 1, pp. 317–366, 2001.
[21] M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991, vol. 23.
[22] M. Stojnic, “A framework to characterize performance of LASSO algorithms,” arXiv preprint arXiv:1303.7291, 2013.
[23] C. Thrampoulidis, S. Oymak, and B. Hassibi, “Regularized linear regression: A precise analysis of the estimation error,” in Proceedings of The 28th Conference on Learning Theory, 2015.
[24] T. Tanaka, “Analysis of bit error probability of direct-sequence CDMA multiuser demodulators,” in Advances in Neural Information Processing Systems, 2001, pp. 315–321.
[25] T. Tanaka, “Statistical mechanics of CDMA multiuser demodulation,”