Channel Hardening-Exploiting Message Passing (CHEMP) Receiver in Large-Scale MIMO Systems

Abstract

In this paper, we propose a MIMO receiver algorithm that exploits the channel hardening that occurs in large MIMO channels. Channel hardening refers to the phenomenon where the off-diagonal terms of the $\mathbf{H}^H\mathbf{H}$ matrix become increasingly weaker compared to the diagonal terms as the size of the channel gain matrix $\mathbf{H}$ increases. Specifically, we propose a message passing detection (MPD) algorithm which works with the real-valued matched filtered received vector (whose signal term becomes $\mathbf{H}^T\mathbf{H}\mathbf{x}$, where $\mathbf{x}$ is the transmitted vector), and uses a Gaussian approximation on the off-diagonal terms of the $\mathbf{H}^T\mathbf{H}$ matrix. We also propose a simple estimation scheme which directly obtains an estimate of $\mathbf{H}^T\mathbf{H}$ (instead of an estimate of $\mathbf{H}$), which is used as an effective channel estimate in the MPD algorithm. We refer to this receiver as the channel hardening-exploiting message passing (CHEMP) receiver. The proposed CHEMP receiver achieves very good performance in large-scale MIMO systems (e.g., in systems with 16 to 128 uplink users and 128 base station antennas). For the considered large MIMO settings, the complexity of the proposed MPD algorithm is almost the same as or less than that of minimum mean square error (MMSE) detection. This is because the MPD algorithm does not need a matrix inversion. It also achieves significantly better performance compared to MMSE and other message passing detection algorithms using an MMSE estimate of $\mathbf{H}$. We also present a convergence analysis of the proposed MPD algorithm. Further, we design optimized irregular low density parity check (LDPC) codes specific to the considered large MIMO channel and the CHEMP receiver through EXIT chart matching. The LDPC codes thus obtained achieve improved coded bit error rate performance compared to off-the-shelf irregular LDPC codes.



T. Lakshmi Narasimhan and A. Chockalingam
Department of ECE, Indian Institute of Science, Bangalore

Keywords: Large-scale MIMO systems, channel hardening, message passing, detection, channel estimation, decoding.

I. INTRODUCTION

Wireless communication systems using multiple-input multiple-output (MIMO) configurations with a large number of antennas have attracted a lot of research attention [1],[2],[3],[4]. These systems can achieve high spectral and power efficiencies. An emerging architecture for large-scale multiuser MIMO communications is one where each base station (BS) is equipped with a large number of antennas and the user terminals are equipped with one antenna each. A key requirement on the uplink (user terminal to BS link) in such large-scale MIMO systems is to achieve reduced channel estimation, detection and decoding complexities at the BS receiver to enable practical implementation, while maintaining good performance. When the number of BS antennas is much larger than the number of uplink users (i.e., low system loading factors), linear detectors like the minimum mean square error (MMSE) detector are good in terms of both complexity and performance [5]. In recent years, several low-complexity detection algorithms which achieve near-optimal performance in large dimensions using complexities comparable to that of MMSE detection have been proposed [1],[2],[6]-[16]. These algorithms are based on local search (e.g., the likelihood ascent search (LAS) algorithm and variants in [1],[2],[6],[7]), meta-heuristics (e.g., reactive tabu search (RTS) and variants in [8],[9]), message passing techniques (e.g., belief propagation (BP) based algorithms in [11],[12]), lattice reduction techniques (e.g., lattice reduction (LR) aided detectors in [13],[14]), and Monte Carlo sampling techniques (e.g., Markov chain Monte Carlo (MCMC) algorithms in [15]). Issues related to channel estimation and low density parity check (LDPC) codes for large-scale MIMO systems are also being addressed [17],[18].

Message passing on graphical models is a promising low-complexity, high-performance approach for signal processing in large dimensions [19]. Decoding of turbo codes and LDPC codes, and equalization/detection [20]-[22] are popular examples of the use of message passing algorithms in communications. In [11], a MIMO detection algorithm based on approximate message passing on a factor graph is presented. The message passing algorithm in [12] uses a different approach. It obtains a tree that approximates the fully-connected MIMO graph and performs message passing on this tree.

In this paper, we propose a promising low-complexity receiver for large-scale MIMO systems. The receiver is based on message passing. The novelty in the proposed receiver lies in the exploitation of the 'channel hardening' phenomenon that occurs in large MIMO channels [23],[24],[25],[26]. Channel hardening refers to the phenomenon where the off-diagonal terms of the $\mathbf{H}^T\mathbf{H}$ matrix become increasingly weaker compared to the diagonal terms as the size of the channel gain matrix $\mathbf{H}$ increases. We exploit this for the purposes of detection and channel estimation. The proposed receiver, referred to as the channel hardening-exploiting message passing (CHEMP) receiver, consists of two components: a message passing detection (MPD) algorithm and an estimation scheme to obtain an estimate of $\mathbf{H}^T\mathbf{H}$. The highlights of our contributions in this paper can be summarized as follows:

• proposal of the MPD algorithm which works with the real-valued matched filtered received vector, and uses a Gaussian approximation on the off-diagonal terms of the $\mathbf{H}^T\mathbf{H}$ matrix.
• proposal of a simple estimation scheme which directly obtains an estimate of $\mathbf{H}^T\mathbf{H}$ (instead of an estimate of $\mathbf{H}$), which is used as an effective channel estimate in the MPD algorithm.
• less than the MMSE detection complexity (because matrix inversion is not needed in the MPD algorithm).
• significantly better performance compared to MMSE and other message passing detection algorithms which use an MMSE estimate of $\mathbf{H}$.
• convergence analysis of the MPD algorithm which proves the existence of a fixed point in the MPD algorithm.
• analysis of the mean square difference of the log-likelihood ratios (LLRs) in the proposed receiver with perfect and estimated channel state information (CSI).
• design of optimized irregular LDPC codes specific to the considered large MIMO channel and the CHEMP receiver through EXIT chart matching.

The rest of the paper is organized as follows. The system model and the channel hardening phenomenon are described in Section II. The proposed CHEMP receiver, and its performance and complexity are presented in Section III. An analysis of the CHEMP receiver is presented in Section IV. Section V presents an extension to higher-order QAM. The design and performance of LDPC codes matched to the large MIMO channel and the CHEMP receiver are presented in Section VI. Conclusions are presented in Section VII.

II. SYSTEM MODEL

Consider a large-scale multiuser MIMO system where $K$ uplink users, each transmitting with a single antenna, communicate with a BS having a large number of receive antennas. Let $N$ denote the number of BS antennas; $N$ is in the range of tens to hundreds. The ratio $\alpha = K/N$ is the system loading factor. We consider $\alpha \le 1$ (i.e., $K \le N$). The system model is illustrated in Fig. 1. Each user encodes a sequence of $k$ information bits to a sequence of $n$ coded symbols using an LDPC code of code rate $R = k/n$. The encoded bits are modulated and transmitted. Let $\mathbb{A}$ denote the modulation alphabet. The transmission of one LDPC code block requires $n/\log_2|\mathbb{A}|$ channel uses.

Fig. 1. Large-scale multiuser MIMO system model on the uplink.

Let $\mathbf{H}_c^{(t)} \in \mathbb{C}^{N \times K}$ denote the channel gain matrix in the $t$th channel use and $H^c_{ij}$ denote the complex channel gain from the $j$th user to the $i$th BS antenna. The channel gains $H^c_{ij}$ are assumed to be independent Gaussian with zero mean and variance $\sigma_j^2$, such that $\sum_j \sigma_j^2 = K$. The $\sigma_j^2$ models the imbalance in the received power from user $j$ due to path loss etc., and $\sigma_j^2 = 1$ corresponds to the case of perfect power control. Let $\mathbf{x}_c^{(t)} \in \mathbb{A}^K$ denote the modulated symbol vector transmitted in the $t$th channel use, where the $j$th element of $\mathbf{x}_c^{(t)}$ denotes the modulation symbol transmitted by the $j$th user. Assuming perfect synchronization, the received vector at the BS in the $t$th channel use, $\mathbf{y}_c^{(t)}$, is given by

$$\mathbf{y}_c^{(t)} = \mathbf{H}_c^{(t)}\mathbf{x}_c^{(t)} + \mathbf{w}_c^{(t)}, \tag{1}$$

where $\mathbf{w}_c^{(t)}$ is the noise vector. Dropping the channel use index for convenience, (1) can be written in the real domain as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}, \tag{2}$$

where

$$\mathbf{H} \triangleq \begin{bmatrix} \Re(\mathbf{H}_c) & -\Im(\mathbf{H}_c) \\ \Im(\mathbf{H}_c) & \Re(\mathbf{H}_c) \end{bmatrix}, \quad \mathbf{y} \triangleq \begin{bmatrix} \Re(\mathbf{y}_c) \\ \Im(\mathbf{y}_c) \end{bmatrix}, \quad \mathbf{x} \triangleq \begin{bmatrix} \Re(\mathbf{x}_c) \\ \Im(\mathbf{x}_c) \end{bmatrix}, \quad \mathbf{w} \triangleq \begin{bmatrix} \Re(\mathbf{w}_c) \\ \Im(\mathbf{w}_c) \end{bmatrix},$$

and $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts, respectively. Note that $\mathbf{H} \in \mathbb{R}^{2N \times 2K}$, $\mathbf{y} \in \mathbb{R}^{2N}$, $\mathbf{w} \in \mathbb{R}^{2N}$, and $\mathbf{x} \in \mathbb{R}^{2K}$. For a QAM alphabet $\mathbb{A}$, the elements of $\mathbf{x}$ take values from the underlying PAM alphabet $\mathbb{B}$, i.e., $\mathbf{x} \in \mathbb{B}^{2K}$. The elements of $\mathbf{w}$ are modeled as i.i.d. $\mathcal{N}(0, \sigma_n^2)$. The average received SNR per receive antenna is given by $\gamma = K E_s/\sigma_n^2$, where $E_s$ is the average energy of the transmitted symbols. For the real-valued system model in (2), the maximum-likelihood (ML) detection rule is given by

$$\hat{\mathbf{x}} = \underset{\mathbf{x} \in \mathbb{B}^{2K}}{\operatorname{argmin}}\ (\mathbf{y} - \mathbf{H}\mathbf{x})^T(\mathbf{y} - \mathbf{H}\mathbf{x}). \tag{3}$$

When the transmitted bits are equally likely, the ML decision rule is the same as the maximum a posteriori probability (MAP) decision rule, given by

$$\hat{\mathbf{x}} = \underset{\mathbf{x} \in \mathbb{B}^{2K}}{\operatorname{argmax}}\ \Pr(\mathbf{x} \mid \mathbf{y}, \mathbf{H}). \tag{4}$$

The exact computation of (3) and (4) requires exponential complexity in $K$. Message passing algorithms can provide approximate marginalization of the joint distribution in (4) at low complexities. In Section III, we propose such a message passing algorithm, whose novelty lies in exploiting the channel hardening phenomenon that happens in large MIMO channels. The channel hardening effect in large MIMO channels is described in the following subsection.

A. Channel hardening in large MIMO channels

Channel hardening refers to the phenomenon where the variance of the mutual information of the MIMO channel grows very slowly relative to its mean, or even shrinks, as the number of antennas grows [23]. Consider an $n_r \times n_t$ MIMO channel. As $n_r$ and $n_t$ are increased keeping their ratio fixed, the distribution of the singular values of the MIMO channel matrix becomes less sensitive to the actual distribution of the entries of the channel matrix (as long as the entries are i.i.d.) [24]. This is a result of the Marčenko-Pastur law [25], which states that if the entries of an $n_r \times n_t$ matrix $\mathbf{H}$ are zero mean i.i.d. with variance $1/n_r$, then the empirical distribution of the eigenvalues of $\mathbf{H}^H\mathbf{H}$ converges almost surely, as $n_r, n_t \to \infty$ with $n_t/n_r = \alpha$, to the density function [26]

$$p_\alpha(x) = \left(1 - \frac{1}{\alpha}\right)^{+}\delta(x) + \frac{\sqrt{(x-a)^{+}(b-x)^{+}}}{2\pi\alpha x}, \tag{5}$$

where $(x)^+ = \max(x, 0)$, $a = (1-\sqrt{\alpha})^2$, and $b = (1+\sqrt{\alpha})^2$. An effect of the Marčenko-Pastur law is that very tall or very wide matrices are very well conditioned. (In practice, the channel matrix in a multiuser system with tens of single-antenna users and hundreds of BS antennas will become a very tall matrix on the uplink, and a very wide matrix on the downlink.) The law also implies that the channel "hardens", i.e., the eigenvalue histogram of a single realization converges to the average asymptotic eigenvalue distribution.

Channel hardening can bring several advantages in large dimensional signal processing. For example, linear detection in large systems requires inversion of large matrices. Inversion of large random matrices can be done fast using series expansion techniques [27],[28],[29]. Because of channel hardening, approximate matrix inversions using series expansion and deterministic approximations from the limiting distribution become effective in large dimensions.

An interesting aspect of channel hardening is that as the size of $\mathbf{H}$ increases, the off-diagonal terms of the $\mathbf{H}^H\mathbf{H}$ matrix become increasingly weaker compared to the diagonal terms, i.e., $\mathbf{H}^H\mathbf{H}/n_r \to \mathbf{I}_{n_t}$ for $n_r, n_t \to \infty$ with $n_t/n_r = \alpha$. This phenomenon is pictorially illustrated in Fig. 2, where we have plotted $\mathbf{H}^T\mathbf{H}$ for the real-valued channel model in (2) for four different channel sizes. In proposing the new receiver algorithm in the next section, we will work with approximations to the off-diagonal terms of the $\mathbf{H}^T\mathbf{H}$ matrix and estimates of $\mathbf{H}^T\mathbf{H}$, which are found to achieve very good performance in large dimensions at low complexities.

Fig. 2. Magnitude plots of $\mathbf{H}^T\mathbf{H}$ for four different MIMO channel sizes.

III. THE PROPOSED CHEMP RECEIVER

In this section, we present the proposed CHEMP receiver. The proposed CHEMP receiver has two main components: a message passing based detection (MPD) algorithm, and a scheme to estimate $\mathbf{H}^T\mathbf{H}$. The proposed MPD algorithm works with the real-valued matched filtered received vector (whose signal term becomes $\mathbf{H}^T\mathbf{H}\mathbf{x}$), and uses a Gaussian approximation on the off-diagonal terms of the $\mathbf{H}^T\mathbf{H}$ matrix. Before we describe the proposed MPD algorithm, we state the following lemma, which will be used in the development and analysis of the detection algorithm.

Lemma 1. Let $X_i$ and $Y_i$ be Gaussian random variables with zero mean and variances $\sigma_x^2$ and $\sigma_y^2$, respectively. Let $Z_i \triangleq X_i Y_i$ and $Z \triangleq \frac{1}{n}\sum_{i=1}^{n} Z_i$.

• When $X_i$ and $Y_i$ are independent, $\mathbb{E}[Z_i] = 0$ and $\mathbb{E}[Z_i^2] = \sigma_x^2\sigma_y^2$. Then, by the central limit theorem, for large $n$, $Z \sim \mathcal{N}\!\left(0, \frac{\sigma_x^2\sigma_y^2}{n}\right)$. When $X_i$ and $Y_i$ are i.i.d., $Z \sim \mathcal{N}\!\left(0, \frac{\sigma_x^4}{n}\right)$.
• When $X_i = Y_i$, $Z$ is a (scaled) $\chi^2$ random variable with $n$ degrees of freedom, with $\mathbb{E}[Z] = \sigma_x^2$ and $\operatorname{Var}(Z) = \frac{2\sigma_x^4}{n}$.

A. Proposed MPD algorithm

Consider the real-valued system model in (2). We consider 4-QAM modulation in this section, i.e., $\mathbb{B} = \{\pm 1\}$. We will extend the algorithm to higher-order QAM in Section V. Performing the matched filter operation on $\mathbf{y}$, we have

$$\mathbf{H}^T\mathbf{y} = \mathbf{H}^T\mathbf{H}\mathbf{x} + \mathbf{H}^T\mathbf{w}. \tag{6}$$

From (6), we write the following:

$$\mathbf{z} = \mathbf{J}\mathbf{x} + \mathbf{v}, \tag{7}$$

where

$$\mathbf{z} \triangleq \frac{\mathbf{H}^T\mathbf{y}}{N}, \quad \mathbf{J} \triangleq \frac{\mathbf{H}^T\mathbf{H}}{N}, \quad \mathbf{v} \triangleq \frac{\mathbf{H}^T\mathbf{w}}{N}. \tag{8}$$

The $i$th element of $\mathbf{z}$ can be written as

$$z_i = J_{ii}x_i + \underbrace{\sum_{j=1, j \ne i}^{2K} J_{ij}x_j + v_i}_{\triangleq\, g_i}, \tag{9}$$

where $J_{ij}$ is the element in the $i$th row and $j$th column of $\mathbf{J}$, $x_i$ is the $i$th element of $\mathbf{x}$, and

$$v_i = \sum_{j=1}^{2N} \frac{H_{ji}w_j}{N} \tag{10}$$

is the $i$th element of $\mathbf{v}$, where $H_{ji}$ is the $(j,i)$th element of $\mathbf{H}$. Note that the variable $g_i$ defined in (9) denotes the interference-plus-noise term, which involves the off-diagonal elements of $\mathbf{H}^T\mathbf{H}/N$ (i.e., $J_{ij}$, $i \ne j$). We approximate the $g_i$ term to have a Gaussian distribution with mean $\mu_i$ and variance $\sigma_i^2$, i.e., the distribution of $g_i$ is approximated as $\mathcal{N}(\mu_i, \sigma_i^2)$. By the central limit theorem, this approximation is accurate for large $K$, $N$. The mean and variance in this approximation are given by

$$\mu_i = \mathbb{E}(g_i) = \sum_{j=1, j \ne i}^{2K} J_{ij}\,\mathbb{E}(x_j), \tag{11}$$

$$\sigma_i^2 = \operatorname{Var}(g_i) = \sum_{j=1, j \ne i}^{2K} J_{ij}^2 \operatorname{Var}(x_j) + \sigma_v^2. \tag{12}$$

Denoting the probability that the symbol $x_j$ equals $+1$ as $p_j$, we have

$$\mathbb{E}(x_j) = 2p_j - 1, \quad \operatorname{Var}(x_j) = 4p_j(1 - p_j). \tag{13}$$

Also, note that by Lemma 1, $\sigma_v^2 = \sigma_n^2/N$.
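The hardening of $\mathbf{J}$ and the value of $\sigma_v^2$ quoted above can be checked numerically. The following NumPy sketch is illustrative only: it assumes perfect power control (unit-variance complex channel gains, so each real entry of $\mathbf{H}$ has variance 1/2) and real noise entries of variance $\sigma_n^2$. Under these assumptions Lemma 1 gives an off-diagonal variance of $1/(2N)$ for $\mathbf{J}$. The helper names `real_channel`, `hardening_stats` and `empirical_sigma_v2` are hypothetical and do not come from the paper.

```python
import numpy as np

def real_channel(N, K, rng):
    """Real-valued 2N x 2K matrix H of (2), built from an N x K complex
    Gaussian channel with unit-variance entries (assumed power control)."""
    Hc = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
    return np.block([[Hc.real, -Hc.imag], [Hc.imag, Hc.real]])

def hardening_stats(N, K, rng):
    """Mean diagonal entry and RMS off-diagonal entry of J = H^T H / N."""
    H = real_channel(N, K, rng)
    J = H.T @ H / N
    off_diag = J[~np.eye(2 * K, dtype=bool)]
    return np.diag(J).mean(), np.sqrt(np.mean(off_diag ** 2))

def empirical_sigma_v2(N, K, sigma_n2, trials, rng):
    """Sample variance of the entries of v = H^T w / N over many noise draws."""
    H = real_channel(N, K, rng)
    W = np.sqrt(sigma_n2) * rng.standard_normal((2 * N, trials))
    return (H.T @ W / N).var()

rng = np.random.default_rng(0)
for N in (16, 64, 256):
    mean_diag, rms_off = hardening_stats(N, N // 2, rng)
    print(f"N={N:4d}  mean diag={mean_diag:.3f}  rms off-diag={rms_off:.3f}  "
          f"sqrt(1/(2N))={np.sqrt(0.5 / N):.3f}")

sigma_n2 = 2.0
print("empirical sigma_v^2:", empirical_sigma_v2(256, 64, sigma_n2, 2000, rng),
      " sigma_n^2/N:", sigma_n2 / 256)
```

As $N$ grows, the diagonal of $\mathbf{J}$ should concentrate near 1 while the off-diagonal entries shrink roughly as $1/\sqrt{2N}$, which is the behaviour the Gaussian approximation of $g_i$ relies on.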
Because of the above Gaussian approximation, the a posteriori probability (APP) of the symbol $x_i$ can be written as

$$p_i = \Pr(x_i \mid z_i, \mathbf{J}) \propto \exp\!\left(-\frac{1}{2\sigma_i^2}\left(z_i - J_{ii}x_i - \mu_i\right)^2\right). \tag{14}$$

From (14), the log-likelihood ratio (LLR) of $x_i$, denoted by $L_i$, can be written as

$$L_i = \ln\frac{\Pr(z_i \mid x_i = +1)}{\Pr(z_i \mid x_i = -1)} = \frac{2J_{ii}}{\sigma_i^2}\left(z_i - \mu_i\right). \tag{15}$$

From (15), the probability of symbol $x_i$ can be written as

$$p_i = \frac{e^{L_i}}{1 + e^{L_i}}. \tag{16}$$

Message passing: The system is modeled as a fully-connected graph, where the data symbols in $\mathbf{x}$ represent the nodes. There are $2K$ nodes in the graph, corresponding to the $2K$ elements in the vector $\mathbf{x}$. The $i$th node uses the knowledge of $\mathbf{J}$, $\mathbf{z}$ and the incoming APPs $\{p_1, p_2, \cdots, p_{i-1}, p_{i+1}, \cdots, p_{2K}\}$ to obtain a soft estimate of the interference to symbol $x_i$, and computes its APP, $p_i$. That is, each node is an approximate APP processor for its associated symbol, and message passing refers to the exchange of APP values computed at each iteration. Figure 3 illustrates the above message passing schedule. Note that the computation of the message $p_i$ in (16) requires the computation of (11), (12) and (15). The algorithm is initialized with $p_i = 0.5$, $\forall i$, and message passing is carried out for a certain number of iterations, after which the algorithm stops. The values of the $p_i$ at the end are taken as the soft values of the $x_i$. These soft values can be directly fed to the channel decoder in coded systems. In uncoded systems, a hard estimate of symbol $x_i$ can be obtained as

$$\hat{x}_i = \begin{cases} +1 & \text{if } p_i \ge 0.5 \\ -1 & \text{otherwise.} \end{cases} \tag{17}$$

Fig. 3. Message passing in the proposed MPD algorithm.

B. Improving convergence rate

At the end of the $t$th iteration of the detection algorithm described above, we obtain the probability of the $i$th user's information bit, $p_i^{(t)}$. The rate of convergence of the sequence $\{p_i^{(1)}, p_i^{(2)}, p_i^{(3)}, \cdots, p_i^{(t)}, \cdots\}$ can be improved by certain techniques. We discuss the following two techniques that help to improve the convergence.

• Aitken acceleration: Aitken's delta-squared process is a technique known in numerical analysis [30] for accelerating sequence convergence. This method is also used in [22] to accelerate the convergence of the Gaussian belief propagation algorithm. By this method, a linearly converging sequence of real numbers can be accelerated to converge quadratically. Although there is no rigorous proof guaranteeing this rate of convergence, empirical observations have shown that this method does accelerate the convergence of iterative algorithms. According to Aitken's acceleration method, we define a sequence

$$q_i^{(t)} = p_i^{(t)} - \frac{\left(p_i^{(t+1)} - p_i^{(t)}\right)^2}{p_i^{(t+2)} - 2p_i^{(t+1)} + p_i^{(t)}}. \tag{18}$$

This new sequence $q_i^{(t)}$ converges faster than $p_i^{(t)}$ and to the same limit, whenever $p_i^{(t)}$ converges. After the first three iterations, the $q_i$ can be used as the messages in the algorithm for faster convergence.

• Damping: Damping of messages passed in message passing algorithms is a scheme known to improve the rate of convergence of iterative algorithms [31]. At the $t$th iteration, the message is damped by obtaining a convex combination of the message computed at the $t$th iteration and the message at the $(t-1)$th iteration, with a damping factor $\Delta \in [0, 1)$. Thus, if $\tilde{p}_i^{(t)}$ is the computed probability at the $t$th iteration, the message at the end of the $t$th iteration is

$$p_i^{(t)} = (1 - \Delta)\,\tilde{p}_i^{(t)} + \Delta\, p_i^{(t-1)}. \tag{19}$$

In Section III-D, we will see the performance of these methods in improving the rate of convergence, and the optimal choice for $\Delta$.

A listing of the proposed MPD algorithm with damping is given in Algorithm 1, where $\mathbf{p} = [p_1\ p_2\ \cdots\ p_{2K}]^T$ and $\tilde{\mathbf{p}} = [\tilde{p}_1\ \tilde{p}_2\ \cdots\ \tilde{p}_{2K}]^T$.

Algorithm 1 Proposed MPD algorithm

  Require: $\mathbf{z}$, $\mathbf{J}$, $\sigma_v^2$, $\Delta$
  Initialize: $p_i^{(0)} \leftarrow 0.5$, $i = 1, \cdots, 2K$
  for $t = 1$ to number of iterations do
    for $i = 1$ to $2K$ do
      $\mu_i \leftarrow \sum_{j=1, j \ne i}^{2K} J_{ij}\left(2p_j^{(t-1)} - 1\right)$
      $\sigma_i^2 \leftarrow \sum_{j=1, j \ne i}^{2K} 4J_{ij}^2\, p_j^{(t-1)}\left(1 - p_j^{(t-1)}\right) + \sigma_v^2$
      $L_i \leftarrow \frac{2J_{ii}}{\sigma_i^2}\left(z_i - \mu_i\right)$
      $\tilde{p}_i^{(t)} \leftarrow \frac{e^{L_i}}{1 + e^{L_i}}$
    end for
    $\mathbf{p}^{(t)} \leftarrow (1 - \Delta)\,\tilde{\mathbf{p}}^{(t)} + \Delta\,\mathbf{p}^{(t-1)}$
  end for

C. Complexity comparison between MPD and MMSE

The computational complexity of the MPD algorithm is as follows. The complexity (in number of real operations) required to compute (11), (12) and (16) is of order $O(K^2)$. The complexities of computing $\mathbf{z}$ and $\mathbf{J}$ are of orders $O(NK)$ and $O(NK^2)$, respectively. So, the total complexity of the proposed MPD is $O(NK^2)$, which is attractive for large-scale MIMO systems.

In Table I, we present an interesting comparison between the complexities of MPD and MMSE detection for $N = 128, 256$, and $K$ varied from 16 to 256. Since we have used 20 iterations for MPD in all the BER simulations, we have taken the number of iterations to be 20 for the calculation of the MPD complexity. From Table I, the following interesting observations can be made: 1) for large $N$ (e.g., $N = 256$), MPD complexity is less than MMSE complexity. This is because MPD needs only matrix multiplication and not matrix inversion, whereas MMSE detection needs both matrix multiplication and inversion; 2) for $N = 128$, the MPD complexity for $K = 64, 96, 128$ is less than the MMSE complexity. For $K = 16, 32$, the MPD complexity is almost the same as (marginally higher than) MMSE complexity, because the number of iterations ($= 20$) is comparable with $K$ ($= 16, 32$). Also, MPD performs better than MMSE detection, and achieves close to optimal detection performance for large $K, N$, and different system loading factors. We will see this performance advantage of MPD in the following subsection.

TABLE I
COMPARISON BETWEEN THE COMPLEXITIES (IN NUMBER OF REAL OPERATIONS) OF THE PROPOSED MPD, MMSE DETECTOR, AND SUMIS DETECTOR IN [37] FOR DIFFERENT VALUES OF K, N. NUMBER OF ITERATIONS FOR MPD = 20, AND n_s = 3 FOR SUMIS.

Complexity in number of real operations (× 10^6)

        |            N = 128              |            N = 256
   K    |  MMSE    MPD (prop)  SUMIS [37] |  MMSE     MPD (prop)  SUMIS [37]
  ------+---------------------------------+----------------------------------
   16   |  0.177    0.179       0.483     |   0.333    0.296        0.917
   32   |  0.748    0.749       1.737     |   1.321    1.190        3.130
   64   |  3.593    3.200       7.538     |   5.789    4.773       12.420
   96   |  9.584    7.208      19.368     |  14.450   10.748       29.837
  128   | 19.770   12.814      39.194     |  28.355   19.116       57.347
  256   |   -        -           -        | 157.373   76.505      307.633
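To make the comparison between the two detectors concrete, the sketch below implements the damped iteration of Algorithm 1 next to a linear MMSE detector in NumPy. It is a minimal illustration under stated assumptions, not the authors' implementation: perfect power control, 4-QAM with $E_s = 2$, the SNR convention $\gamma = K E_s/\sigma_n^2$, a damping factor of 0.3 chosen purely for illustration, and an LLR clipping step added here for numerical safety. Aitken acceleration is omitted, and the function names are hypothetical.

```python
import numpy as np

def real_channel(N, K, rng):
    """Real-valued 2N x 2K channel from an N x K unit-variance complex Gaussian matrix."""
    Hc = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
    return np.block([[Hc.real, -Hc.imag], [Hc.imag, Hc.real]])

def mpd_detect(z, J, sigma_v2, n_iter=20, delta=0.3):
    """Damped message passing in the style of Algorithm 1 for symbols in {-1, +1}.
    delta=0.3 is an illustrative damping factor, not a value taken from the paper."""
    J_diag = np.diag(J).copy()
    J_off = J - np.diag(J_diag)            # off-diagonal part of J
    J_off_sq = J_off ** 2
    p = np.full(z.size, 0.5)               # p_i = Pr(x_i = +1), initialised to 0.5
    for _ in range(n_iter):
        mu = J_off @ (2.0 * p - 1.0)                        # (11) with (13)
        var = J_off_sq @ (4.0 * p * (1.0 - p)) + sigma_v2   # (12) with (13)
        llr = 2.0 * J_diag * (z - mu) / var                 # (15)
        llr = np.clip(llr, -30.0, 30.0)                     # numerical guard (added here)
        p_tilde = 1.0 / (1.0 + np.exp(-llr))                # (16)
        p = (1.0 - delta) * p_tilde + delta * p             # damping, (19)
    return p

def mmse_detect(H, y, sigma_n2):
    """Linear MMSE detector for unit-energy +/-1 symbols, followed by a sign decision."""
    G = H.T @ H + sigma_n2 * np.eye(H.shape[1])
    return np.sign(np.linalg.solve(G, H.T @ y))

def run_trial(N=128, K=16, snr_db=15.0, seed=1):
    """One uncoded transmission; returns (MPD symbol errors, MMSE symbol errors)."""
    rng = np.random.default_rng(seed)
    H = real_channel(N, K, rng)
    x = rng.choice([-1.0, 1.0], size=2 * K)
    # Assumed SNR convention: gamma = K * E_s / sigma_n^2 with E_s = 2 for 4-QAM.
    sigma_n2 = 2.0 * K / 10.0 ** (snr_db / 10.0)
    y = H @ x + np.sqrt(sigma_n2) * rng.standard_normal(2 * N)
    z, J = H.T @ y / N, H.T @ H / N        # matched-filter quantities of (8)
    p = mpd_detect(z, J, sigma_n2 / N)
    x_mpd = np.where(p >= 0.5, 1.0, -1.0)  # hard decision, (17)
    x_mmse = mmse_detect(H, y, sigma_n2)
    return int(np.sum(x_mpd != x)), int(np.sum(x_mmse != x))

errors_mpd, errors_mmse = run_trial()
print("MPD symbol errors:", errors_mpd, " MMSE symbol errors:", errors_mmse)
```

The sketch reflects the structural point made above: `mpd_detect` uses only matrix-vector products on the precomputed $\mathbf{J}$, whereas `mmse_detect` has to solve a $2K \times 2K$ linear system. A single trial at a low loading factor says nothing about BER; reproducing the curves would require averaging over many channel and noise realizations.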

D. BER performance of MPD

In this subsection, we present the uncoded BER performance of MPD obtained through simulations for different system parameter settings. For now we assume perfect knowledge of $\mathbf{H}$; we will relax this assumption later. First, in Fig. 4, we plot the uncoded BER of MPD at an average SNR of 12 dB for $N = K = 64$ for various values of the damping factor $\Delta$. The number of message passing iterations used is 20. From this figure, we observe that there is an optimal damping factor at which the BER is minimized. This value of $\Delta$ is found to give good performance for other values of system parameters as well, so we have used it in all the simulations. Next, Fig. 5 shows the uncoded BER of MPD as a function of iteration index with and without Aitken acceleration for $N = K = 64$, SNR = 12 dB, and the same value of $\Delta$. It can be observed that the convergence rate of the algorithm improves with Aitken acceleration.

Fig. 4. Uncoded BER performance of the proposed MPD algorithm as a function of damping factor $\Delta$. $N = K = 64$, 4-QAM, SNR = 12 dB.

Fig. 5. Comparison of the convergence behavior of the MPD algorithm without and with Aitken acceleration. $N = K = 64$, 4-QAM, SNR = 12 dB.

In Fig. 6, we plot the uncoded BER of MPD for different values of $N$ ($= 4, 8, 16, 32, 64, 128$) for a system loading factor of $\alpha = 1$ ($K = N$). Since optimal detection performance for large-dimension systems is hard to obtain, we have plotted single-input single-output (SISO) additive white Gaussian noise (AWGN) channel performance as a lower bound on the optimum detection performance. MMSE detection performance is also plotted for comparison. From Fig. 6, it is observed that the performance of MPD improves for increasing $N, K$, and moves closer to the SISO-AWGN performance for large $N, K$. For example, the MPD performance for $N = K = 128$ gets very close to SISO-AWGN performance. It is also observed that MPD performance is better than MMSE detection performance.

Fig. 6. Uncoded BER performance of the MPD algorithm and the MMSE detector for $N = K = 4, 8, 16, 32, 64, 128$, 4-QAM.

Figure 7 shows the uncoded BER of the MPD algorithm and the MMSE detector for a fixed number of receive antennas at the BS ($N = 128$) and a varying number of users ($K = 16, 32, 64, 96, 128$), i.e., for different values of the loading factor ($\alpha = 0.125, 0.25, 0.5, 0.75, 1$). It is observed that the BER performance improves considerably as the loading factor is reduced, which is expected. The MPD performance for different loading factors is better than MMSE detection performance. This observation is further illustrated in Fig. 8, where the average SNRs required to achieve a fixed target uncoded BER in MPD and MMSE detection are plotted. It can be observed from Fig. 8 that MPD outperforms MMSE detection by about 1.2 dB at a loading factor of $\alpha = 0.[\ldots]$. This performance advantage of MPD over MMSE detection increases for increasing values of $\alpha$. For example, the performance advantage of MPD over MMSE detection is about 6.5 dB and 12.5 dB for $\alpha = 0.[\ldots]$ and $\alpha = 1$, respectively. MMSE detection performs quite poorly at high loading factors because the spatial interference increases significantly at higher loading factors with large $N$ (e.g., $N = K = 128$) compared to lower loading factors, and MMSE detection does not perform interference cancellation/suppression. In contrast, MPD benefits from the channel hardening effect with large $N, K$. The performance advantage of MPD becomes very attractive given that MPD complexity is almost the same as or less than the MMSE detection complexity (as discussed in Section III-C).

Fig. 7. Uncoded BER performance of the MPD algorithm and the MMSE detector for different values of $K$ ($= 16, 32, 64, 96, 128$) for a fixed $N = 128$, 4-QAM.

Fig. 8. Comparison between the average SNR required to achieve a fixed target uncoded BER in MPD and MMSE detection at different loading factors with $N = 128$, 4-QAM.

The effect of channel hardening on the BER performance of the MPD algorithm is further illustrated in Fig. 9. This figure shows the SNRs required to achieve a fixed target BER with MPD as well as MMSE detection in $N = K = 2$ to $N = K = 256$ systems. We have also plotted the same for ML detection (using sphere decoding) in $N = K = 2$ to $N = K = 16$ systems. Since ML detection is prohibitive for larger dimensions, we have plotted the SNR required in a SISO AWGN system as a lower bound on the ML performance. In small systems like $N = K = 2, [\ldots]$, where channel hardening is not significant, both MPD and MMSE performances are far from ML performance, with MPD performing better than MMSE. For example, MPD performance is about 10 dB away from ML performance in $N = K = 4, [\ldots]$ systems, whereas MMSE performance is about 14 to 15 dB away from ML performance in the same systems. In systems with size larger than $N = K = 16$, channel hardening becomes more significant, and the performance of MPD shows significant improvement compared to MMSE and gets closer to ML performance. For example, for the $N = K = 128$ system, the MPD performance is just about 0.25 dB away from the ML lower bound, whereas the MMSE performance is away from the ML lower bound by about 10 dB. These observations illustrate that the harder the channel gets, the better the MPD performance.

Fig. 9. Comparison between the average SNR required to achieve a fixed target uncoded BER in ML (sphere decoding), MPD, and MMSE detection as a function of $N$ for $\alpha = 1$ (i.e., $N = K$) and 4-QAM.

E. Channel estimation for MPD

A key issue in large-scale MIMO systems is the estimation of channel gains. In conventional approaches, the $NK$ channel gains in the channel matrix are estimated and used for the detection of transmitted symbols. Note that in our transformed system model (7), the influence of the channel on vector $\mathbf{z}$ is through $\mathbf{H}^T\mathbf{H}$, rather than through $\mathbf{H}$ as such. We propose to exploit this observation on the structure of the system model (7). Specifically, we propose to directly obtain an estimate of $\mathbf{H}^T\mathbf{H}$ and use it in the MPD algorithm, rather than obtaining an estimate of $\mathbf{H}$ as done in conventional approaches. We note that this approach is simple and novel, and it works very well in the MPD algorithm (as we will see in the performance results). We present the scheme to obtain an estimate of the $\mathbf{H}^T\mathbf{H}$ matrix next.

Estimating the $\mathbf{H}^T\mathbf{H}$ matrix:

Recall from (8) that $\mathbf{J} = \mathbf{H}^T\mathbf{H}/N$. We are interested in obtaining $\hat{\mathbf{J}}$, an estimate of $\mathbf{J}$. We assume that the channel is slowly fading, where the channel matrix $\mathbf{H}$ remains constant over one frame duration (which is taken to be equal to the coherence time of the channel). The length of one frame is $L_f$ channel uses. Each frame consists of a pilot part and a data part. The pilot part consists of $K$ channel uses, and the data part consists of $L_f - K$ channel uses.

Let $\mathbf{X}_p = P\,\mathbf{I}_{2K}$ denote the pilot matrix in the real-valued representation, where in the $i$th channel use, $1 \le i \le K$, user $i$ transmits a pilot tone with amplitude $P$ and the other users remain silent. The received pilot matrix at the BS is then given by

$$\mathbf{Y}_p = \mathbf{H}\mathbf{X}_p + \mathbf{W}_p = P\,\mathbf{H} + \mathbf{W}_p, \tag{20}$$

where $P = \sqrt{K E_s}$, $E_s$ is the average symbol energy, and $\mathbf{W}_p$ is the noise matrix. Using Lemma 1, we obtain an estimate of the matrix $\mathbf{J}$ as

$$\hat{\mathbf{J}} = \frac{\mathbf{Y}_p^T\mathbf{Y}_p}{N P^2} - \frac{\sigma_v^2}{P^2}\,\mathbf{I}_{2K}. \tag{21}$$

An estimate of the vector $\mathbf{z}$ is obtained as

$$\hat{\mathbf{z}} = \frac{\mathbf{Y}_p^T\mathbf{y}}{N P}. \tag{22}$$

The estimates $\hat{\mathbf{J}}$ and $\hat{\mathbf{z}}$ are used as inputs to the MPD algorithm in place of $\mathbf{J}$ and $\mathbf{z}$.

Note on complexity:

A key advantage of the above estimation scheme is its low complexity. The computation of $\hat{\mathbf{J}}$ and $\hat{\mathbf{z}}$ in (21) and (22) requires only matrix and vector multiplications. Note that even when perfect knowledge of $\mathbf{H}$ or an estimate of $\mathbf{H}$ is available, similar computations are needed to compute $\mathbf{J}$ and $\mathbf{z}$. Further note that the additional complexity needed to obtain an estimate of $\mathbf{H}$ in the conventional approach is avoided in our approach.
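The estimation step can be sketched in a few lines of NumPy. This is an illustration under explicit assumptions rather than the paper's exact procedure: the pilot noise $\mathbf{W}_p$ is drawn as an i.i.d. real Gaussian $2N \times 2K$ matrix with entry variance $\sigma_n^2$, and the subtracted noise correction is computed from first principles for that model as $\mathbb{E}[\mathbf{W}_p^T\mathbf{W}_p]/(NP^2) = (2\sigma_n^2/P^2)\,\mathbf{I}$, which plays the role of the correction term in (21). The function name `estimate_J_z` and all parameter values are hypothetical.

```python
import numpy as np

def real_channel(N, K, rng):
    """Real-valued 2N x 2K channel from an N x K unit-variance complex Gaussian matrix."""
    Hc = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
    return np.block([[Hc.real, -Hc.imag], [Hc.imag, Hc.real]])

def estimate_J_z(Y_p, y, N, P, sigma_n2):
    """Estimates of J and z in the spirit of (21) and (22), using only the
    received pilot matrix Y_p and the received data vector y."""
    J_raw = Y_p.T @ Y_p / (N * P ** 2)
    # Noise bias of the raw estimate under the noise model assumed in this sketch:
    # E[W_p^T W_p] / (N P^2) = (2 sigma_n^2 / P^2) I.
    bias = 2.0 * sigma_n2 / P ** 2
    J_hat = J_raw - bias * np.eye(Y_p.shape[1])
    z_hat = Y_p.T @ y / (N * P)
    return J_hat, z_hat, J_raw

rng = np.random.default_rng(3)
N, K, sigma_n2, E_s = 128, 16, 1.0, 2.0
P = np.sqrt(K * E_s)                                   # pilot amplitude, as in (20)
H = real_channel(N, K, rng)
Y_p = P * H + np.sqrt(sigma_n2) * rng.standard_normal((2 * N, 2 * K))  # pilot part
x = rng.choice([-1.0, 1.0], size=2 * K)
y = H @ x + np.sqrt(sigma_n2) * rng.standard_normal(2 * N)             # data part

J_hat, z_hat, J_raw = estimate_J_z(Y_p, y, N, P, sigma_n2)
J, z = H.T @ H / N, H.T @ y / N                        # perfect-CSI references, (8)
print("mean |J_hat - J|:", np.abs(J_hat - J).mean())
print("mean diagonal error, raw vs corrected:",
      np.diag(J_raw - J).mean(), np.diag(J_hat - J).mean())
print("mean |z_hat - z|:", np.abs(z_hat - z).mean())
```

No estimate of $\mathbf{H}$ itself is ever formed: both outputs are products of received quantities, which is the low-complexity property noted above. The outputs `J_hat` and `z_hat` would then replace $\mathbf{J}$ and $\mathbf{z}$ as inputs to the detector.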

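Fig. 10. Comparison of the BER performance of the proposed CHEMP receiver with those of 1) MMSE detector with MMSE channel estimate, and 2) FG-GAI detector in [11] with MMSE channel estimate, for $N = K = 128$, 4-QAM. (Plot: uncoded BER versus average SNR in dB. The curves also include the proposed MPD and the FG-GAI detector with perfect CSI, and the SISO AWGN reference.)

F. BER performance of the CHEMP receiver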

As mentioned before, we refer to the combination of the proposed MPD algorithm and the channel estimation scheme proposed in the previous subsection as the CHEMP receiver. In this subsection, we present the uncoded BER performance of the CHEMP receiver. The number of iterations used in the MPD algorithm is 20. We compare the performance of the CHEMP receiver with two other receivers, namely, 1) MMSE detector with MMSE channel estimate, and 2) FG-GAI (factor graph with Gaussian approximation of interference) detector in [11] with MMSE channel estimate. We note that the FG-GAI detector in [11] is also a message passing algorithm that uses a Gaussian approximation of interference. However, that approximation is done on the original system model in (2), whereas in the proposed MPD algorithm, the Gaussian approximation is done on the matched filtered system model in (7).

In Fig. 10, we present an uncoded BER performance comparison between 1) proposed CHEMP receiver, 2) MMSE detector with MMSE channel estimate, and 3) FG-GAI detector in [11] with MMSE channel estimate. It can be seen that the performance of the proposed CHEMP receiver is significantly better than those of the MMSE and FG-GAI detectors with MMSE estimate of the channel. Observe that the performances of MPD and FG-GAI under perfect CSI conditions are almost the same, whereas under estimated CSI conditions, the CHEMP receiver performs significantly better than FG-GAI with MMSE channel estimate. An analytical reasoning for this is presented in Section IV-B.

Figure 11 shows the performance of the CHEMP receiver and MMSE detector with MMSE channel estimate for different numbers of users ($K = 16, 32, 64, 96, 128$) and a fixed number of BS antennas ($N = 128$). As expected, the performance improves for smaller values of $K$. Also, the CHEMP receiver performs better than the MMSE detector with MMSE channel estimate. In Fig. 12, we illustrate a comparison between the average SNR required to achieve a target uncoded BER in 1) proposed CHEMP receiver, 2) MMSE detector with MMSE channel estimate, and 3) FG-GAI detector in [11] with MMSE channel estimate, at different loading factors with $N = 128$. From this figure, we observe that the CHEMP receiver outperforms the other two receivers. For example, the CHEMP receiver outperforms the MMSE detector with MMSE channel estimate by about 0.6 dB to 11 dB over the range of loading factors considered, up to $\alpha = 1$. Likewise, the performance advantage of the CHEMP receiver over the FG-GAI detector with MMSE channel estimate is about 0.6 dB to 4 dB over the same range.

Fig. 11. BER performance of 1) proposed CHEMP receiver and 2) MMSE detector with MMSE channel estimate, for different values of $K$ ($= 16, 32, 64, 96, 128$) for a fixed value of $N$ ($= 128$), 4-QAM. (Plot: uncoded BER versus average SNR in dB, estimated CSI.)

Fig. 12. Comparison between the average SNR required to achieve a target uncoded BER in 1) proposed CHEMP receiver, 2) MMSE detector with MMSE channel estimate, and 3) FG-GAI detector in [11] with MMSE channel estimate, at different loading factors with $N = 128$, 4-QAM. (Plot: required average SNR in dB, estimated CSI.)

Comparison with SUMIS detector in [37]

A subspace marginalization with interference suppression (SUMIS) detector has been proposed recently in [37]. The SUMIS detector uses the ideas of partial marginalization (via a parameter $n_s \le K$) and soft interference suppression. The order of complexity of the SUMIS detector is K + 2 N K + K (2 n s + 6) [37]. Here, we present a performance and complexity comparison between the proposed MPD and the SUMIS detector. Figure 13 shows the BER performance of the proposed MPD and SUMIS detector (with $n_s = 3$) for various values of $K$, keeping $N$ fixed at 128, for 4-QAM and perfect CSI. For the same system parameters, Fig. 14 shows the comparison between the proposed CHEMP receiver and the SUMIS detector with MMSE channel estimate. These figures show that the proposed MPD performs better than SUMIS under perfect CSI, and that the proposed CHEMP receiver performs better than SUMIS with MMSE channel estimate. The proposed detector achieves better performance at less complexity than the SUMIS detector. This can be observed in Table I, which presents the complexities of MPD and SUMIS for different values of $N$ and $K$. The complexity advantage of the proposed MPD over SUMIS arises because MPD needs only matrix multiplication and not matrix inversion, whereas SUMIS needs both matrix multiplication and matrix inversion.
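The remark that MPD needs only multiplications can be made concrete with a minimal sketch of a Gaussian-approximation message update for a $\{\pm 1\}$ alphabet. This is an illustration under stated assumptions, not the authors' implementation: the function name `mpd_pm1`, the uninformative start $p_i = 0.5$, the absence of damping, the LLR form $2J_{ii}(z_i - \mu_i)/\sigma_i^2$, and the toy matrix are all choices made here.

```python
import numpy as np

def mpd_pm1(J, z, sigma_v2, n_iter=20):
    """Illustrative message passing for x in {-1,+1}^K from z = J x + v."""
    K = len(z)
    d = np.diag(J).copy()            # J_ii, the dominant diagonal terms
    J_off = J - np.diag(d)           # off-diagonal terms, treated as interference
    p = np.full(K, 0.5)              # p_i = Pr(x_i = +1), uninformative start
    for _ in range(n_iter):
        mean_x = 2.0 * p - 1.0       # E(x_j) for a +/-1 symbol
        var_x = 4.0 * p * (1.0 - p)  # Var(x_j) for a +/-1 symbol
        mu = J_off @ mean_x                      # interference mean
        sigma2 = (J_off**2) @ var_x + sigma_v2   # interference-plus-noise variance
        llr = 2.0 * d * (z - mu) / sigma2        # assumed LLR form (see lead-in)
        p = 0.5 * (1.0 + np.tanh(0.5 * llr))     # logistic function, overflow-safe
    return p

# Toy example: strongly diagonally dominant J, noise-free observation.
K = 4
J = np.eye(K) + 0.05 * (np.ones((K, K)) - np.eye(K))
x = np.array([1.0, -1.0, 1.0, -1.0])
p = mpd_pm1(J, J @ x, sigma_v2=0.01)
x_hat = np.where(p > 0.5, 1.0, -1.0)
```

Each iteration costs two matrix-vector products with the $K \times K$ matrix; there is no linear solve or inverse anywhere in the loop, in contrast to detectors built around an MMSE filter.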

Fig. 13. BER performance of 1) proposed MPD detector and 2) SUMIS detector in [37] for different values of $K$ ($= 16, 64, 96, 128$) for a fixed value of $N$ ($= 128$), 4-QAM, perfect CSI. (Plot: uncoded BER versus average SNR in dB.)

Fig. 14. BER performance of 1) proposed CHEMP receiver and 2) SUMIS detector with MMSE channel estimate for different values of $K$ ($= 16, 64, 96, 128$) for a fixed value of $N$ ($= 128$), 4-QAM. (Plot: uncoded BER versus average SNR in dB, estimated CSI.)

IV. ANALYSIS OF THE PROPOSED CHEMP RECEIVER

In this section, we carry out some analysis of the proposed CHEMP receiver. The analysis reported in this section has two parts. In the first part, we analyze the convergence of the proposed MPD algorithm, and give a sufficient condition for the algorithm to converge to the correct solution. In the second part, we present an analysis of the mean square difference (MSD) of the LLRs computed with estimated CSI and perfect CSI for the proposed CHEMP receiver as well as the FG-GAI receiver (i.e., FG-GAI detector in [11] with MMSE channel estimate).

A. Analysis of the convergence of MPD algorithm

First we state the lemmas that we require to prove results in the later parts of this subsection. Let $\mathcal{P}$ denote the set $\{\mathbf{p} \mid \mathbf{p} \in [0,1]^K\}$.

Lemma 2. The set $\mathcal{P}$ is a compact and convex set.

Proof: Since every element $p_i$ of any $\mathbf{p} \in \mathcal{P}$ is from the same closed and bounded interval $[0,1] \subset \mathbb{R}$, $\mathcal{P}$ is a closed and bounded subset of $\mathbb{R}^K$, and hence $\mathcal{P}$ is compact. Let $\mathbf{p}_1$ and $\mathbf{p}_2$ be any two elements of $\mathcal{P}$. Then it can be seen that for any $\lambda \in [0,1]$,

$$\lambda \mathbf{p}_1 + (1-\lambda)\mathbf{p}_2 \in \mathcal{P}. \tag{23}$$

Hence, $\mathcal{P}$ is a convex set. This set $\mathcal{P}$ is the compact convex subset of $\mathbb{R}^K$ consisting of all vectors of probabilities.

We define the following variables for convenience:

$$V_i^{+}(\mathbf{p}) \triangleq z_i - J_{ii} - \sum_{j=1,\, j \neq i}^{K} J_{ij}(2p_j - 1), \qquad V_i^{-}(\mathbf{p}) \triangleq z_i + J_{ii} - \sum_{j=1,\, j \neq i}^{K} J_{ij}(2p_j - 1), \tag{24}$$

$$A_i^{+}(\mathbf{p}) \triangleq -\frac{1}{2\sigma_i^2}\left(V_i^{+}(\mathbf{p})\right)^2, \quad A_i^{-}(\mathbf{p}) \triangleq -\frac{1}{2\sigma_i^2}\left(V_i^{-}(\mathbf{p})\right)^2, \quad f_i^{+}(\mathbf{p}) \triangleq \exp\left(A_i^{+}(\mathbf{p})\right), \quad f_i^{-}(\mathbf{p}) \triangleq \exp\left(A_i^{-}(\mathbf{p})\right), \tag{25}$$

where $z_i$, $J_{ij}$, $\sigma_i$ are constants in $\mathbb{R}$, $\sigma_i > 0$, and $\mathbf{p} \in \mathcal{P}$.

Lemma 3. Let $f(\mathbf{p})$ be a function such that if $\mathbf{p}' = f(\mathbf{p})$ then $p'_i = f_i(\mathbf{p}) \triangleq \dfrac{f_i^{+}(\mathbf{p})}{f_i^{+}(\mathbf{p}) + f_i^{-}(\mathbf{p})}$. Then $f(\mathbf{p})$ is continuous in $\mathcal{P}$.

Proof: We see that $f : \mathcal{P} \to \mathcal{P}$. Since $A_i^{+}(\mathbf{p})$ and $A_i^{-}(\mathbf{p})$ are polynomial functions in $p_j$, $j \in \{1, \cdots, K\} \setminus i$, and $\exp(\cdot)$ is a continuous monotone function, $f_i^{+}(\mathbf{p})$ and $f_i^{-}(\mathbf{p})$ are continuous functions in $\mathcal{P}$. Since $\exp(\cdot)$ is strictly positive, the term $\left(f_i^{+}(\mathbf{p}) + f_i^{-}(\mathbf{p})\right)$ is always positive. Hence, $f_i(\mathbf{p})$, being a ratio of two continuous functions with non-vanishing denominator, is also a continuous function. This proves that $f(\mathbf{p})$ is continuous in $\mathcal{P}$, as all its component functions are continuous in $\mathcal{P}$.

From Lemma 3 we see that $f(\mathbf{p})$ is a recursive map that represents the proposed MPD algorithm in Section III-A.

Proposition 1. The function $f(\mathbf{p})$ defined in Lemma 3 has a fixed point in $\mathcal{P}$.

Proof: By Lemma 2, $\mathcal{P}$ is a compact convex set and by Lemma 3, $f(\mathbf{p})$ is a continuous function such that $f : \mathcal{P} \to \mathcal{P}$. Hence, by Brouwer's fixed point theorem [36], $f(\mathbf{p})$ has a fixed point in $\mathcal{P}$.

Proposition 1 proves that the proposed MPD algorithm has a fixed point.

Now, we give a sufficient condition for the MPD algorithm to converge to the correct solution. Since the Gaussian distribution is a symmetric function with its positive part being monotone decreasing, we have $p'_i > 0.5$ in the function $\mathbf{p}' = f(\mathbf{p})$ whenever $|V_i^{+}(\mathbf{p})| < |V_i^{-}(\mathbf{p})|$. Let

$$d_i \triangleq \left(V_i^{+}(\mathbf{p})\right)^2 - \left(V_i^{-}(\mathbf{p})\right)^2 = -4 J_{ii}\Big[ J_{ii} x_i + \sum_{j=1,\, j \neq i}^{K} J_{ij}(x_j - 2p_j + 1) + n_i \Big]. \tag{26}$$

We know that $J_{ii} > 0$, $\forall i$. When $x_i = +1$, $p'_i > 0.5$ iff $d_i < 0$, and $d_i$ will be negative irrespective of $\mathbf{p}$ iff

$$J_{ii} + \sum_{j=1,\, j \neq i}^{K} J_{ij}(x_j - 2p_j + 1) + n_i > 0. \tag{27}$$

Since $x_j \in \{\pm 1\}$ and $p_j \in [0,1]$, the factor $(x_j - 2p_j + 1)$ lies in $[-2, 2]$. Bounding the $J_{ij}(x_j - 2p_j + 1)$ term on the LHS of (27) by $-2|J_{ij}|$, at high SNRs, we get

$$J_{ii} > 2\sum_{j=1,\, j \neq i}^{K} |J_{ij}|. \tag{28}$$

It can be similarly shown that (28) should be true for $d_i > 0$ when $x_i = -1$, irrespective of $\mathbf{p}$. Thus, when (28) is true, the update places every $p'_i$ on the correct side of 0.5 irrespective of $\mathbf{p}$, which is sufficient for the MPD algorithm to reach the correct solution.

When the algorithm starts with an initial vector of $p_i = 0.5$, $\forall i$, the factor $(x_j - 2p_j + 1)$ reduces to $x_j$, and the condition in (28) can be simplified to

$$J_{ii} > \sum_{j=1,\, j \neq i}^{K} |J_{ij}|, \tag{29}$$

which is nothing but the diagonal dominance condition for the matrix $\mathbf{J}$, and it gives a sufficient condition for the MPD algorithm to converge to the correct solution. It should be noted that (29) is not a necessary condition for convergence. From extensive simulations, it has been observed that the MPD algorithm performs very well for large $N$, $K$ even when the matrix $\mathbf{J}$ is not diagonally dominant.

B. Analysis of LLRs in CHEMP and FG-GAI receivers

In Fig. 10, we observed that while the performances of MPD and FG-GAI under perfect CSI conditions are almost the same, under estimated CSI conditions, the CHEMP receiver performs significantly better than FG-GAI with MMSE channel estimate. Here, we present an LLR analysis that explains the reason for this performance advantage of the CHEMP receiver under estimated CSI conditions.

We note that there are three different LLRs of interest here, which we call Type-1 LLR, Type-2 LLR, and Type-3 LLR. Type-1 LLR is the 'true' LLR in the 'exact' MAP detector. Type-2 LLR is an approximate LLR in a detector (e.g., MPD, FG-GAI detectors) with perfect CSI. Type-3 LLR is an approximate LLR in a detector with estimated CSI. A comparison between the Type-1 LLR and Type-2 LLR of MPD for large dimensions like $N = K = 128$ is infeasible because of the exponential complexity of the computation of LLRs in the exact MAP detector. For the purpose of analytically reasoning the performance advantage of the CHEMP receiver, we use a performance measure which is the mean square difference (MSD) between 1) Type-2 and Type-3 LLRs of the MPD detector, and 2) Type-2 and Type-3 LLRs of the FG-GAI detector. This MSD measure for a given detector can be viewed as an indicator of the relative degradation of the LLR of the detector computed under perfect CSI to that computed under estimated CSI. In the following, we derive upper bounds on the MSD of LLRs in the CHEMP receiver and in FG-GAI with MMSE channel estimate.

The signal vector $\hat{\mathbf{z}}$ in the CHEMP receiver given by (22) can be written as

$$\hat{\mathbf{z}} = \frac{1}{NP}\left(\mathbf{Y}_p^T \mathbf{H}\mathbf{x} + \mathbf{Y}_p^T \mathbf{w}\right) = \underbrace{\left(\mathbf{J} + \frac{\mathbf{W}_p^T \mathbf{H}}{NP}\right)}_{\triangleq\, \tilde{\mathbf{J}}} \mathbf{x} + \underbrace{\left(\frac{\mathbf{H}}{N} + \frac{\mathbf{W}_p}{NP}\right)^{T} \mathbf{w}}_{\triangleq\, \tilde{\mathbf{w}}} = \tilde{\mathbf{J}}\mathbf{x} + \tilde{\mathbf{w}}.$$
(30)Likewise, the matrix ˆ J in the CHEMP receiver given by (21)can be written as ˆ J (cid:44) ( P H + W p ) T ( P H + W p ) NP − σ v P I K = (cid:18) J + W Tp H NP (cid:19) + 1 NP (cid:18) H T W p + W Tp W p P (cid:19) − σ v P I K (cid:124) (cid:123)(cid:122) (cid:125) (cid:44) (cid:101) J (cid:48) = (cid:101) J + (cid:101) J (cid:48) . (31) x requires an estimateof (cid:101) J . But the CHEMP receiver uses ˆ J instead. This, as per(31), amounts to using an estimate of (cid:101) J with an estimationerror of (cid:101) J (cid:48) .Assume N and K are large and all the transmitted bitsare i.i.d. Let δ preﬁxed to a variable denote the differencebetween the variable computed under estimated CSI (i.e.,using ˆ J and ˆ z ) and perfect CSI (i.e., using J and z ). Forexample, δµ i = ˆ µ i − µ i , where ˆ µ i is obtained by substituting ˆ J in place of J in (11). Likewise, δL i = ˆ L i − L i , where ˆ L i obtained by substituting ˆ J and ˆ z in place of J and z ,respectively, in (15).Now, from (15), we can write the LLR computed by theCHEMP receiver as ˆ L i = 2 (cid:101) J ii + 2 (cid:101) J (cid:48) ii σ i + δσ i (ˆ z i − µ i − δµ i ) . (32)Now, δL i is bounded above as δL i ≤ (cid:101) J (cid:48) ii (ˆ z i − µ i − δµ i ) − (cid:101) J ii δµ i σ i . (33)By Lemma 1, we can write the following: (cid:101) J (cid:48) ij | i (cid:54) = j ∼ N (cid:18) , σ v N P + σ v N P (cid:19) , (34) (cid:101) J (cid:48) ii ∼ N (cid:18) , σ v N P + σ v N P (cid:19) , (35) δµ i ∼ N (cid:18) , σ v P + σ v P (cid:19) . (36)Without loss of generality, we can assume P = 1 . Therefore, E ( δL i ) = 0 , and E ( δL i ) ≤ σ v σ i (cid:26) α (cid:18) σ v + 12 (cid:19) + (cid:18) α ( σ v + σ v z i − µ i ) (cid:19) . (cid:18) σ v N + 2 N (cid:19)(cid:27) . (37) Note that E ( δL i ) is the MSD between the Type-2 and Type-3LLRs of the MPD.Next, we do a similar analysis of the MSD of LLRs for theFG-GAI detector. 
Using the deﬁnition of the LLRs Λ ki inthe FG-GAI detector as given in [11], the difference in LLRin FG-GAI computed with MMSE channel estimate and thatcomputed with perfect CSI is bounded above as δ Λ ki ≤ H (cid:48) ik ( y i − µ ik − δµ ik ) − H ik δµ ik σ ik , (38)where the terms µ ik and σ ik are as deﬁned in [11], H (cid:48) ij isthe error in estimating H ij , and, as deﬁned before, δ preﬁxedto a variable denotes the difference between that variablecomputed under estimated CSI and perfect CSI. The error inthe MMSE channel estimate in the FG-GAI receiver is H (cid:48) ij = W ij P − H ij σ n P + σ n , (39) Average SNR in dB M S D o f LL R Upper bound for FG−GAI receiverSimulated value for FG−GAI recevierUpper bound for CHEMP receiverSimulated value for CHEMP receiverN=K=128, 4−QAM

Fig. 15. MSD of LLRs in FG-GAI and CHEMP receivers for N = K =128 , 4-QAM. where W ij is the ( i, j ) th element in matrix W P . Thestatistics of H (cid:48) ij are computed by using Lemma 1 as follows: E ( H (cid:48) ij ) = 0 , σ e (cid:44) E ( H (cid:48) ij ) = σ n ( P + σ n )( P + σ n ) . (40)Without loss of generality, assume P = 1 and α = 1 .Now, we have H (cid:48) ij ∼ N (0 , σ e ) and δµ ij ∼ N (0 , N σ e ) .By Lemma 1, we have E ( δ Λ ji = 0) , and E (( δ Λ ji ) ) ≤ σ e σ ij (cid:18) N (cid:16) σ e + 12 (cid:17) + ( y i − µ ij ) (cid:19) . (41)The probability of the i th symbol is computed using theLLR value L Fi (cid:44) (cid:80) Nl (cid:54) = i Λ jl . Therefore, δL Fi = (cid:80) Nl (cid:54) = i δ Λ jl , E ( δL Fi ) = 0 , and E (( δL Fi ) ) = ( N − E ( δ Λ ji ) . It is notedthat E (( δL Fi ) ) is the MSD between the Type-2 and Type-3LLRs of the FG-GAI detector.It can be seen from (37) and (41) that the MSD of thecomputed LLR values in each iteration is less in the CHEMPreceiver compared to that in the FG-GAI receiver. This isfurther veriﬁed by simulation in Fig. 15, where it can beobserved that the simulated MSD of the LLRs in the CHEMPreceiver is less compared to that in the FG-GAI receiver.This makes the proposed CHEMP receiver robust to channelestimation errors when compared to the FG-GAI receiver.V. E XTENSION TO HIGHER - ORDER

QAM

In this section, we extend the MPD algorithm to higher-order QAM. For $M$-QAM alphabets, the elements of $\mathbf{x}$ in (2) belong to the underlying PAM alphabet; for example, when the transmitted symbols are from a 16-QAM alphabet, the elements of $\mathbf{x}$ are 4-PAM symbols. In such a scenario, we compute symbol-wise probability messages in the MPD algorithm. Specifically, in each iteration, for each element in $\mathbf{x}$, we compute the probability masses for all symbols in the PAM alphabet $\mathbb{B}$ as follows. The means are computed as

$$\mu_i = \sum_{j=1,\, j \neq i}^{K} J_{ij}\, E(x_j) = \sum_{j=1,\, j \neq i}^{K} J_{ij} \sum_{\forall s \in \mathbb{B}} s\, p_j(s). \tag{42}$$

The variances are computed as

$$\sigma_i^2 = \sum_{j=1,\, j \neq i}^{K} J_{ij}^2\, \mathrm{Var}(x_j) + \sigma_v^2 = \sum_{j=1,\, j \neq i}^{K} J_{ij}^2 \Big( \sum_{\forall s \in \mathbb{B}} s^2 p_j(s) - E(x_j)^2 \Big) + \sigma_v^2, \tag{43}$$

where $\sigma_v^2$ is as defined in Section III-A. The probability of $x_i$ being $s \in \mathbb{B}$ is computed as

$$p_i(s) \propto \exp\Big( -\frac{1}{2\sigma_i^2}\left(z_i - \mu_i - J_{ii}\, s\right)^2 \Big). \tag{44}$$

Finally, the bit probabilities are obtained as

$$\Pr(b_i^p = 1) = \sum_{\forall s \in \mathbb{B} \,:\, p\text{th bit in } s \text{ is } 1} p_i(s), \tag{45}$$

where $b_i^p$ is the $p$th bit in the $i$th user's symbol, which is detected as 1 if $\Pr(b_i^p = 1) \ge 0.5$ and 0 otherwise. It can be noted that the message passed by each node is a vector of length $|\mathbb{B}|$.

Complexity: The complexities of computing $\mathbf{z}$ and $\mathbf{J}$ are $O(NK)$ and $O(NK^2)$, respectively. The complexity of computing the messages is $O(\sqrt{M}K)$ for a square $M$-QAM constellation. This is due to the vector nature of the messages for an $M$-QAM alphabet as opposed to the scalar messages for the $\{\pm 1\}$ alphabet. In Table II, we present the complexity for 16-QAM (in number of real operations) for the proposed MPD, the MMSE detector and the SUMIS detector with $n_s = 3$. It can be seen that the complexity of the proposed MPD is comparable to or less than the MMSE complexity and is less than the SUMIS complexity. In addition, the performance of MPD is better than those of the MMSE and SUMIS detectors, as illustrated below.

TABLE II
Comparison between the complexities (in number of real operations, scaled) of the proposed MPD, MMSE detection, and SUMIS detection with $n_s = 3$ for $N = 128$, 16-QAM.

| K   | MMSE   | MPD (prop) | SUMIS in [37] |
| 16  | 0.177  | 0.240      | 0.483         |
| 32  | 0.748  | 0.964      | 1.737         |
| 64  | 3.593  | 3.861      | 7.538         |
| 96  | 9.584  | 8.692      | 19.368        |
| 128 | 19.770 | 15.456     | 39.194        |
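The symbol-wise update in (42) to (44) can be sketched in a few lines. The sketch is illustrative only: the function name `mpd_pam_step`, the array layout `probs[j, s]`, the log-domain normalization, and the toy 4-PAM example are assumptions introduced here, and the bit-probability step (45) is left out because it depends on a bit labelling that is not specified in this section.

```python
import numpy as np

def mpd_pam_step(J, z, probs, alphabet, sigma_v2):
    """One illustrative update of the symbol probabilities, following (42)-(44).

    probs[j, s] holds p_j(s) for the s-th entry of `alphabet` (the PAM set B).
    """
    d = np.diag(J).copy()                         # J_ii
    J_off = J - np.diag(d)                        # off-diagonal J_ij, j != i
    mean_x = probs @ alphabet                     # E(x_j)
    var_x = probs @ alphabet**2 - mean_x**2       # Var(x_j)
    var_x = np.maximum(var_x, 0.0)                # guard against round-off
    mu = J_off @ mean_x                           # (42)
    sigma2 = (J_off**2) @ var_x + sigma_v2        # (43)
    # (44) in the log domain: one row per user, one column per symbol in B.
    resid = z[:, None] - mu[:, None] - d[:, None] * alphabet[None, :]
    log_p = -resid**2 / (2.0 * sigma2[:, None])
    log_p -= log_p.max(axis=1, keepdims=True)     # keep the largest term at exp(0)
    new_probs = np.exp(log_p)
    return new_probs / new_probs.sum(axis=1, keepdims=True)

# Toy example: 4-PAM (one real dimension of 16-QAM), noise-free observation.
B = np.array([-3.0, -1.0, 1.0, 3.0])
K = 4
J = np.eye(K) + 0.02 * (np.ones((K, K)) - np.eye(K))
x = np.array([3.0, -1.0, 1.0, -3.0])
probs = np.full((K, B.size), 1.0 / B.size)        # uniform starting messages
for _ in range(5):
    probs = mpd_pam_step(J, J @ x, probs, B, sigma_v2=0.01)
x_hat = B[probs.argmax(axis=1)]
```

Each message is a row of length $|\mathbb{B}|$, which is where the dependence of the message cost on the constellation size comes from.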

Performance: In Fig. 16, we present a comparison between the BER performances of the proposed MPD, MMSE detection, and SUMIS detection with $n_s = 3$, for $N = 128$, $K = 16, 32, 64$, and 16-QAM. A similar comparison between the proposed CHEMP receiver, and the MMSE and SUMIS detectors with MMSE channel estimate is presented in Fig. 17. From these figures, we can see that the proposed MPD outperforms the MMSE and SUMIS detectors under perfect CSI and estimated CSI conditions.

Fig. 16. Comparison of uncoded BER performance of the proposed MPD, MMSE detector and SUMIS detector in [37] with $n_s = 3$ for 16-QAM, $N = 128$, $K = 16, 32, 64$. (Plot: uncoded BER versus average SNR in dB.)

Fig. 17. Comparison of uncoded BER performance of the proposed CHEMP receiver, MMSE and SUMIS detectors with MMSE channel estimate for 16-QAM, $N = 128$, $K = 16, 32, 64$. (Plot: uncoded BER versus average SNR in dB.)

VI. DESIGN OF LDPC CODES FOR CHEMP RECEIVER

Since both the proposed CHEMP receiver and the LDPC decoder employ message passing, a detection-decoding approach based on message passing on a joint graph is natural. In this section, we present a joint graph for the LDPC coded system model. We perform MPD and LDPC decoding by passing messages on the joint graph. We design optimized irregular LDPC codes specific to the considered large MIMO channel and the CHEMP receiver through EXIT chart matching. We also present the coded BER performance of the LDPC codes thus obtained.

When the detection and decoding operations are performed jointly, the receiver starts the detection-decoding process after receiving $n$ coded bits. In the joint detection-decoding approach, we marginalize the joint probability of the received coded symbols. The objective is to compute

$$\Pr(\mathbf{x} \mid C, \mathbf{y}) \propto \Pr(\mathbf{x}, C, \mathbf{y}) = \Pr(C \mid \mathbf{x}) \Pr(\mathbf{y} \mid \mathbf{x}) \Pr(\mathbf{x}), \tag{46}$$

where

$$\Pr(C \mid \mathbf{x}) = \prod_{l=1}^{n-k} \Pr(C_l \mid \mathbf{x}), \tag{47}$$

$C_l$ is the event of the $l$th check equation of the LDPC code being satisfied, and $C$ is the event of all $n-k$ check equations of the LDPC code being satisfied. We formulate a graph whose joint probability factorizes according to (46), and that upon marginalization gives the probability of the transmitted symbols.

A. Joint detector and decoder

Figure 18 shows the joint graph for the LDPC coded large-scale MIMO system with 4-QAM. The joint graph consistsof three sets of nodes, namely, variable nodes set, observationnodes set, and check nodes set. The nK observation nodescorrespond to the elements of the z vectors, the nK variablenodes correspond to the transmitted coded symbols over n channel uses, and ( n − k ) K check nodes correspond to thecheck equations of the LDPC code (see Fig. 18).Let i ∈ { , · · · , K } , j ∈ { , · · · , K } , m ∈ { , · · · , n } , m (cid:48) ∈ { , · · · , n } , and l ∈ { , · · · , n − k } . Now, the differentmessages passed over the graph are: • Observation node z m (cid:48) i to variable node s jm :These messages correspond to the probabilities Pr( x m (cid:48) i = +1) , the probability of the i th bit transmittedat the m (cid:48) = (cid:100) m (cid:101) th channel use, i.e., for a given m (cid:48) , m ∈ { m (cid:48) − , m (cid:48) } . • Variable node s jm to check node c jl :These messages correspond to the probabilities Pr( b jm = +1) , the probability of the m th bit inthe LDPC code block transmitted by the j th user. l ∈ N ( s jm ) , where N ( s jm ) is the neighborhood of s jm ,i.e., the set of all check nodes connected to s jm . • Check node c jl to variable node s jm :These messages correspond to the probabilities Pr( C jl | s jr , ∀ r ∈ N ( c jl ) \ s jm ) , where N ( c jl ) is the neighborhoodof c jl , i.e., the set of all variable nodes connected to c jl . This corresponds to the probability of the l th check equation of the LDPC code block transmitted by the j thuser to be satisﬁed. 
• Variable node s jm to observation node z m (cid:48) i :These messages correspond to the probabilities Pr( x m (cid:48) i = +1 | C jr , x mu , ∀ r ∈ N ( s jm ) , u ∈{ , · · · , K } \ i ) ,It should be noted that, due to the way messages are deﬁnedin the MPD of the CHEMP receiver, there is no message sentfrom the observation node z m (cid:48) i to the variable node s i m (cid:48) − when ≤ i ≤ K , and there is no message sent from theobservation node z m (cid:48) i to the variable node s i m (cid:48) when K +1 ≤ i ≤ K . Similarly, the variable node s jm sends no messageto any observation node except z m (cid:48) j and z m (cid:48) j . The iterationsare continued till all the LDPC check equations are satisﬁedby the estimated bits or a certain number of iterations arecompleted. B. Design of LDPC codes for the joint detector-decoder

We obtain the behavior of the proposed joint detector-decoder through EXIT curve analysis [32]. The EXIT function is $f(I_A) = I_E$, where $I_E$ is the average mutual information between the coded bits and the extrinsic output for a given value of $I_A$, and $I_A$ is the average mutual information between the coded bits and the input a priori information. First, we obtain the EXIT curves of the CHEMP receiver and combine them with those of the LDPC decoder to obtain the EXIT characteristics of the joint detector-decoder.

The EXIT characteristics of the CHEMP receiver are obtained through Monte Carlo simulations, as an analytical evaluation is intractable. We combine the CHEMP receiver's EXIT curves with those of the LDPC decoder, whose EXIT curves have known closed-form expressions [33]. Figure 19 shows the EXIT curves of the proposed MPD detector and those of the combination of the MPD detector and the variable nodes of the LDPC decoder for 4-QAM, $N = 128$, and two user loads, one of which is $K = 32$. To approach the capacity of the channel using LDPC codes, we need to match the EXIT curves of the check nodes set and the variable nodes set [34], by finding an appropriate degree distribution of the variable nodes and the check nodes that is specific to a channel and receiver. Using the evaluated EXIT curves and the method detailed in [18], we obtain the degree distribution of irregular LDPC codes specific to the large-scale MIMO channel and the proposed CHEMP receiver. The LDPC codes thus obtained for various system parameter settings are presented in Table III.

C. Coded BER performance

We evaluated the coded BER performance of the jointdetector-decoder by combining the CHEMP receiver and theLDPC decoder, for N = 128 and K = 16 , , , , .Figure 20 shows the coded BER performance of the opti-mized LDPC codes for the cases with 1) perfect channel13 ig. 18. The joint graph of the LDPC coded large-scale MIMO system. A M u t ua l i n f o r m a t i on a t t he ou t pu t, I E Fig. 19. EXIT curves of 1) proposed MPD, and 2) combination of MPDand variable nodes of the LDPC decoder (CMVLD).Parameters ( d v , p v ) ( d c , p c ) N = 128 , (2,0.3723), (4, 0.2798), (6, 0.7067), (12, 0.2531), α = 1 (5, 0.2254), (8,0.1152), (18, 0.0402)(12, 0.0073) N = 128 (2,0.5715), (4,0.3132), (4, 0.7045), (8, 0.091) α = 0 . (5, 0.1061), (8, 0.0091) (12, 0.2045) N = 128 (2,0.4794), (4,0.4201), (6, 0.7599), (12, 0.1003) α = 0 . (8, 0.0309), (16, 0.0696) (16, 0.1398)TABLE IIID EGREE PROFILES OF OPTIMIZED RATE -1/2 LDPC

CODES FORDIFFERENT LARGE

MIMO

CONFIGURATIONS . p v , p c : FRACTION OFVARIABLE NODES OF DEGREE d v AND CHECK NODES OF DEGREE d c . knowledge and 2) estimated channel knowledge (i.e., esti-mated H T H ), for N = K = 128 . The minimum SNRrequired to achieve capacity is also marked. The rate ofthe LDPC code is 1/2 and the LDPC code block lengthis n = − coded BER. Likewise,the optimized LDPC code with estimated channel knowledgeoutperforms the off-the-shelf LDPC code by about 0.8 dB.In Fig. 21, we plot the average SNRs required to achievea coded BER of − by the optimized LDPC codes withestimated channel knowledge and perfect channel knowledge,as a function of the system loading factor α . From Fig. 21, weobserve that the optimized LDPC code with perfect channelknowledge performs better than the off-the-shelf LDPC codein [35] by about . dB at α = 1 , and . dB at α = 0 . .Likewise, the optimized LDPC code with the estimatedchannel outperforms the off-the-shelf LDPC code by about . dB at α = 1 , and . dB at α = 0 . . This performanceimprovement is due to the LDPC code optimization throughEXIT curve matching and joint detection-decoding.In Fig. 22, we show a performance comparison between theproposed optimized code and the codes in [38] and in theWiMax standard [39], in a system with N = K = 128 ,4-QAM, n = 11520 , rate-1/2, and perfect CSI. At a blocklength of n = 11520 , the proposed optimized code is foundto perform close to within about 2.2 dB from capacity. Also,the optimized code is found to perform better than the codesin [38] and [39] by about 2 dB and 2.5 dB, respectively, at − coded BER. VII. C ONCLUSIONS

We proposed a promising message passing based receiver(referred to as the ‘CHEMP receiver’) for low complexity de-tection and channel estimation in large-scale MIMO systems.The proposed CHEMP receiver is simple and novel (leadingto low complexity), yet very effective in large dimensions(leading to near-optimal performance). The key idea is anovel way of exploiting the channel hardening effect thathappens in large MIMO channels. Speciﬁcally, the receiverworked with approximations to the off-diagonal terms of the H T H matrix, and directly obtained and used an estimate of H T H (instead of an estimate of H ). For the considered large-scale MIMO settings, the proposed message passing detectionalgorithm has almost the same or less complexity compared14 −5 −4 −3 −2 −1 Average SNR in dB C oded BE R S NR a t c apa c i t y ( . d B ) N=K=128, 4−QAMn=4000, rate=1/2LDPC code in [35], est. CSIOpt. LDPC code, est. CSILDPC code in [35], perfect CSIOpt. LDPC code, perfect CSI

Fig. 20. Coded BER performance of the irregular LDPC codes optimizedfor the joint detector-decoder with 1) perfect channel knowledge and 2)estimated channel knowledge (i.e., estimated H T H ), for N = K = 128 ,4-QAM, n = 4000 , rate-1/2. A v e r age S NR r equ i r ed t o a c h i e v e a c oded BE R o f − , i n d B Opt. LDPC code, perfect CSILDPC code in [35], perfect CSIOpt. LDPC code, est. CSILDPC in [35], est. CSIN=128, 4−QAMn=4000, rate=1/2

Fig. 21. Comparison of the average SNR required to achieve the target coded BER by the joint detector-decoder with 1) perfect channel knowledge and 2) estimated channel knowledge (i.e., estimated $H^T H$), for various loading factors with N = 128, 4-QAM, n = 4000, rate-1/2.

to MMSE detection complexity (since the proposed detection algorithm does not need a matrix inversion). Yet it achieved much better performance than MMSE detection. The proposed CHEMP receiver outperformed MMSE and other message passing receivers using an MMSE estimate of $H$. We presented an analysis of the convergence of the proposed detection algorithm and a mean square difference analysis of the LLRs in the proposed receiver with perfect and estimated CSI. The irregular LDPC codes obtained for the considered large MIMO channel and the proposed CHEMP receiver through EXIT chart matching achieved better coded BER performance than off-the-shelf irregular LDPC codes. Stronger conditions for convergence than the condition in (28), and a convergence analysis for the case of estimated channel knowledge, are potential topics for future research. Extending the proposed receiver to frequency-selective channels is another direction for future work.

[Figure 22 plot: coded BER versus average SNR in dB, with the SNR at capacity marked; curves for the LDPC codes in [35], [38], and [39] and the optimized LDPC code (proposed).]

Fig. 22. Coded BER performance comparison between the optimized LDPC code and other LDPC codes in [38] and in the WiMax standard [39]. N = K = 128, 4-QAM, n = 11520, rate-1/2, perfect CSI.

REFERENCES

[1] K. V. Vardhan, S. K. Mohammed, A. Chockalingam, and B. S. Rajan, “A low-complexity detector for large MIMO systems and multicarrier CDMA systems,” IEEE J. Sel. Areas Commun., vol. 26, no. 3, pp. 473-485, Apr. 2008.

[2] S. K. Mohammed, A. Chockalingam, and B. S. Rajan, “A low-complexity precoder for large multiuser MISO systems,” Proc. IEEE VTC’2008, pp. 797-801, May 2008.

[3] S. K. Mohammed, A. Zaki, A. Chockalingam, and B. S. Rajan, “High-rate space-time coded large-MIMO systems: low-complexity detection and channel estimation,” IEEE J. Sel. Topics Signal Proc., vol. 3, no. 6, pp. 958-974, Dec. 2009.

[4] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40-60, Jan. 2013.

[5] J. Hoydis, S. ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: how many antennas do we need?” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160-171, Feb. 2013.

[6] B. Cerato and E. Viterbo, “Hardware implementation of a low-complexity detector for large MIMO,” Proc. IEEE ISCAS’2009, pp. 593-596, May 2009.

[7] P. Li and R. D. Murch, “Multiple output selection-LAS algorithm in large MIMO systems,” IEEE Commun. Lett., vol. 14, no. 5, pp. 399-401, May 2010.

[8] N. Srinidhi, T. Datta, A. Chockalingam, and B. S. Rajan, “Layered tabu search algorithm for large-MIMO detection and a lower bound on ML performance,” IEEE Trans. Commun., vol. 59, no. 11, pp. 2955-2963, Nov. 2011.

[9] T. Datta, N. Srinidhi, A. Chockalingam, and B. S. Rajan, “Random-restart reactive tabu search algorithm for detection in large-MIMO systems,” IEEE Commun. Lett., vol. 14, no. 12, pp. 1107-1109, Dec. 2010.

[10] C. Knievel, M. Noemm, and P. A. Hoeher, “Low complexity receiver for large-MIMO space time coded systems,” Proc. IEEE VTC’2011-Fall, pp. 1-5, Sep. 2011.

[11] P. Som, T. Datta, N. Srinidhi, A. Chockalingam, and B. S. Rajan, “Low-complexity detection in large-dimension MIMO-ISI channels using graphical models,” IEEE J. Sel. Topics Signal Proc., vol. 5, no. 8, pp. 1497-1511, Dec. 2011.

[12] J. Goldberger and A. Leshem, “MIMO detection for high-order QAM based on a Gaussian tree approximation,” IEEE Trans. Inform. Theory, vol. 57, no. 8, pp. 4973-4982, Aug. 2011.

[13] Q. Zhou and X. Ma, “Element-based lattice reduction algorithms for large MIMO detection,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 274-286, Feb. 2013.

[14] K. A. Singhal, T. Datta, and A. Chockalingam, “Lattice reduction aided detection in large-MIMO systems,” Proc. IEEE SPAWC’2013, pp. 589-593, Jun. 2013.

[15] T. Datta, N. A. Kumar, A. Chockalingam, and B. S. Rajan, “A novel Monte-Carlo-sampling-based receiver for large-scale uplink multiuser MIMO systems,” IEEE Trans. Veh. Tech., vol. 62, no. 7, pp. 3019-3038, Sep. 2013.

[16] P. Svac, F. Meyer, E. Riegler, and F. Hlawatsch, “Soft-heuristic detectors for large MIMO systems,” IEEE Trans. Signal Proc., vol. 61, no. 18, pp. 4573-4586, Sep. 2013.

[17] L. Dai, Z. Wang, and Z. Yang, “Spectrally efficient time-frequency training OFDM for mobile large-scale MIMO systems,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 251-263, Feb. 2013.

[18] T. Lakshmi Narasimhan and A. Chockalingam, “EXIT chart based design of irregular LDPC codes for large-MIMO systems,” IEEE Commun. Lett., vol. 17, no. 1, pp. 115-118, Jan. 2013.

[19] B. J. Frey, Graphical Models for Machine Learning and Digital Communication, Cambridge: MIT Press, 1998.

[20] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, “Turbo decoding as an instance of Pearl’s ‘belief propagation’ algorithm,” IEEE J. Sel. Areas Commun., vol. 16, no. 2, pp. 140-152, Feb. 1998.

[21] B. M. Kurkoski, P. H. Siegel, and J. K. Wolf, “Joint message-passing decoding of LDPC codes and partial-response channels,” IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1410-1422, Jun. 2002.

[22] D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D. Dolev, “Linear detection via belief propagation,” Proc. 45th Allerton Conf. on Commun., Control, and Comput., Sep. 2007.

[23] B. M. Hochwald, T. L. Marzetta, and V. Tarokh, “Multiple-antenna channel hardening and its implications for rate feedback and scheduling,” IEEE Trans. Inform. Theory, vol. 50, no. 9, pp. 1893-1909, Sep. 2004.

[24] D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, 2005.

[25] V. A. Marčenko and L. A. Pastur, “Distribution of eigenvalues for some sets of random matrices,” Math. USSR Sbornik, vol. 1, pp. 457-483, 1967.

[26] A. Tulino and S. Verdu, Random Matrix Theory and Wireless Communications, Foundations and Trends in Communications and Information Theory, Now Publishers, Inc., 2004.

[27] S. Moshavi, E. Kanterakis, and D. Schilling, “Multistage linear receivers for DS-CDMA systems,” Intl. Jl. of Wireless Inform. Netw., vol. 3, no. 1, pp. 1-17, Jan. 1996.

[28] J. Hoydis, “Random matrix methods for advanced communication systems,” Ph.D. dissertation, Ecole superieure d’electricite, Gif-Sur-Yvette, France, 2009.

[29] M. Wu, B. Yin, A. Vosoughi, C. Studer, J. R. Cavallaro, and C. Dick, “Approximate matrix inversion for high-throughput data detection in the large-scale MIMO uplink,” Proc. IEEE ISCAS’2013, pp. 2155-2158, May 2013.

[30] A. C. Aitken, “On Bernoulli’s numerical solution of algebraic equations,” Proc. Roy. Soc. Edinburgh, vol. 46, pp. 289-305, 1926.

[31] M. Pretti, “A message passing algorithm with damping,” J. Stat. Mech.: Theory and Practice, Nov. 2005, P11008.

[32] S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727-1737, Oct. 2001.

[33] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670-678, Apr. 2004.

[34] A. Ashikhmin, G. Kramer, and S. ten Brink, “Extrinsic information transfer functions: A model and two properties,” in Proc. CISS, Princeton, pp. 742-747, Mar. 2002.

[35] T. Richardson, A. Shokrollahi, and R. Urbanke, “Design of capacity-approaching irregular codes,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 619-637, Feb. 2001.

[36] J. R. Webb, Functions of several real variables, Prentice Hall, 1991.

[37] M. Cirkic and E. G. Larsson, “SUMIS: a near-optimal soft-output MIMO detector at low and fixed complexity,” arXiv:1207.3316v2 [cs.IT], 13 Aug. 2013.

[38] H. Jilei, P. H. Siegel, and L. B. Milstein, “Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels,” IEEE J. Sel. Areas Commun., vol. 19, no. 5, pp. 924-934, May 2001.

[39] Air Interface for Fixed and Mobile Broadband Systems, IEEE P802.16e Draft, 2005.