[PDF] Probabilistic MIMO Symbol Detection with Expectation Consistency Approximate Inference

Abstract

In this paper we explore low-complexity probabilistic algorithms for soft symbol detection in high-dimensional multiple-input multiple-output (MIMO) systems. We present a novel algorithm based on the Expectation Consistency (EC) framework, which describes the approximate inference problem as an optimization over a non-convex function. EC generalizes algorithms such as Belief Propagation and Expectation Propagation. For the MIMO symbol detection problem, we discuss feasible methods to find stationary points of the EC function and explore their tradeoffs between accuracy and speed of convergence. The accuracy is studied, first in terms of input-output mutual information and show that the proposed EC MIMO detector greatly improves state-of-the-art methods, with a complexity order cubic in the number of transmitting antennas. Second, these gains are corroborated by combining the probabilistic output of the EC detector with a low-density parity-check (LDPC) channel code.

Full PDF

11 Probabilistic MIMO Symbol Detection withExpectation Consistency Approximate Inference

Javier C´espedes, Pablo M. Olmos, Matilde S´anchez-Fern´andez, Fernando Perez-Cruz

Abstract —In this paper we explore low-complexity probabilis-tic algorithms for soft symbol detection in high-dimensionalmultiple-input multiple-output (MIMO) systems. We present anovel algorithm based on the Expectation Consistency (EC)framework, which describes the approximate inference problemas an optimization over a non-convex function. EC genera-lizes algorithms such as Belief Propagation and ExpectationPropagation. For the MIMO symbol detection problem, wediscuss feasible methods to ﬁnd stationary points of the ECfunction and explore their tradeoffs between accuracy andspeed of convergence. The accuracy is studied, ﬁrst in termsof input-output mutual information and show that the proposedEC MIMO detector greatly improves state-of-the-art methods,with a complexity order cubic in the number of transmittingantennas. Second, these gains are corroborated by combiningthe probabilistic output of the EC detector with a low-densityparity-check (LDPC) channel code.

Index Terms —MIMO Communication Systems, ApproximateInference, Expectation Consistency, Low-density Parity-CheckCodes.

I. I

NTRODUCTION

With the increasing demand for higher data rates, multiple-input multiple-output (MIMO) systems have attracted muchattention over the last decade [1]. It is well known that MIMOcommunication systems achieve substantial gains in termsof spectral efﬁciency compared to conventional single-inputsingle-output (SISO) systems. In fact, it has been shown thatunder ideal conditions the capacity of a point-to-point MIMOsystem with m transmitting antennas and r receiving antennasscales linearly with min( m, r ) , which is referred to as themultiplexing gain [2].Modern channel-coding techniques, such as Turbo codes [3]or LDPC codes [4], are needed to achieve transmission ratesclose to the fundamental theoretical limits of the MIMO chan-nel. Efﬁcient decoding is possible using the belief propagation This work has been partly funded by the Spanish Government throughprojects MIMOTEX (TEC2014-61776-EXP), CIES (RTC-2015-4213-7),ELISA (TEC2014-59255-C3-3R) and FLUID (TEC2016-78434-C3-3-R), bythe Juan de la Cierva program (IJCI-2014-19150), by the European ResearchCouncil (ERC) through the European Unions Horizon 2020 research andinnovation program under Grant 714161, and by Comunidad de Madrid(project ’CASI-CAM-CM’, id. S2013/ICE-2845). We gratefully acknowledgethe support of NVIDIA Corporation with the donation of the Titan X PascalGPU used for this research.J. C´espedes, P. M. Olmos and M. S´anchez-Fern´andez are with the SignalTheory & Communications Department, Universidad Carlos III de Madrid.Pablo M. Olmos is also with the Gregorio Mara˜n´on Health Research Institute.E-mail: { jcespedes,olmos,mati } @tsc.uc3m.es F. Perez-Cruz is with the Signal Theory & Communications De-partment, Universidad Carlos III de Madrid. He is also the ChiefData Scientist at the Swiss Data Science Center (Switzerland). Email: [email protected] , [email protected] (BP) algorithm [4], [5], which is a low-complexity message-passing approximate inference method to estimate marginalsin a joint probability distribution. BP decoding needs as inputan estimate to the posterior probability of each coded bitgiven the vector of channel observations. This informationis provided by the so-called probabilistic symbol detector,which has to marginalize the posterior probability densityfunction (pdf) of the transmitted vector of symbols, given thechannel observation. For a MIMO channel, this has complexity O ( M m ) , where M is the constellation order.Multiple algorithms have been proposed to perform hard-output symbol detection in MIMO systems, see [6]–[15].On the contrary, the list of probabilistic symbol detectionalgorithms is comparatively much shorter. Soft-ouput spheredecoding (SD) methods solve the marginalization in a sub-space of the constellation alphabet A m [16]–[18]. However,to maintain good performance, the dimension of the sub-space must grow rapidly with m , the modulation order andthe inverse of the signal-to-noise ratio (SNR) [19]. Thus, SDmethods are not suitable for massive MIMO scenarios, whereboth m and M are potentially very large. Alternatively, someother works consider the use of Markov chain Monte Carlo(MCMC) algorithms to approximate the marginal posteriorprobabilities [20]–[23]. While this approach has been shownto be viable for hard-output symbol detection, probabilisticdetection requires a sufﬁciently large number of samples perconstellation point at each transmitter. For large m and high-order constellations, MCMC methods become excessivelyburdensome.The focus of this paper is on MIMO probabilistic symboldetection methods that can scale up to hundreds of antennasand high-order modulations based on quadrature amplitudemodulation (QAM). In particular, we focus on methods withpolynomial complexity with the number m of transmit an-tennas. The minimum-mean-squared error (MMSE) solutioncan be cast as a probabilistic detector since it computes themode of a Gaussian approximation to the posterior pdf ofthe MIMO symbols [13], [24], likewise its soft successiveinterference cancellation (soft MMSE-SIC) version [25]. Inboth implementations complexity is dominated by an m × m matrix inversion. The Gaussian tree approximation (GTA)algorithm [26], very close in hard detection performance toMMSE-SIC, is a detection algorithm that constructs a tree-factorized approximation to posterior pdf of the symbols,to then estimate marginals distributions using BP. Also, in-spired by their success in compressed sensing [27], in recentyears there has been an intense research interest on MIMOdetection techniques based on message passing algorithms. a r X i v : . [ c s . I T ] O c t We can mention the channel hardening-exploiting messagepassing (CHEMP) in [28] and the Gaussian Message PassingIterative Detector (GMPID) in [29], [30]. Both methods havebeen shown to be effective (close to SD methods) for largeMIMO systems with QPSK constellations. However, asymp-totic analysis of this type of algorithms shows that they donot perform well with high-order QAM constellations unlessthe number r of receiving antennas is much larger than thenumber m of transmitting antennas [30], [31]. An improvedversion of the GMPID algorithm called SA-GMPID is shownto asymptotically converge to the MMSE detection solutioneven for the case m/r > [32]. We remark that in this paperwe propose algorithms that, while having larger complexitycompared to these type of message-passing algorithms, theysigniﬁcantly improve the MMSE solution.In [33], we proposed Expectation Propagation (EP) [34],[35] to perform hard-output MIMO symbol detection in thehigh SNR regime. In that paper, EP is used to ﬁnd themode of the posterior probability distribution by projectingit into a Gaussian approximation. The method cannot beeasily generalized to perform probabilistic detection, as itsdescription is essentially an iterative algorithm that does notprovide the complete picture of the fundamental underlyinginference problem. Actually, in [36] we showed that, whilethe MIMO EP receiver in [33] is able to signiﬁcantly improveGTA as hard detector, achieving gains of around 2 dBs, bothmethods perform similarly when combined with an LDPCchannel decoder that requires a probabilistic input. In a simplerscenario, i.e. channel equalization for single-user intersymbol-interference (ISI) channels, different heuristics have been re-cently proposed in [37] to improve the EP probabilistic output,but it is shown that ultimately a turbo-like receiver, wherethe LDPC decoder output is fed back to the EP equalizer, isrequired to obtain a robust solution that is not tailored to aparticular modulation or channel instance.In this work, we consider one-shot receiver architectures,in which the channel decoder output is not fed back to theMIMO symbol detector to modify the original estimate. Inthis scenario, the design of the MIMO detector is particularlycrucial, as the overall system performance highly dependson its accuracy. One-shot receivers can be used in latency-constrained applications instead of iterative Turbo-like re-ceivers, as the latency in the latter case can become too largeif long block channel codes are used [38]. Furthermore,we show how probabilistic MIMO symbol detection can beimplemented using a general approximate inference frame-work called Expectation Consistency (EC), which was ﬁrstdescribed by Opper & Winther in [39]. In EC, we describethe inference problem as the search of a stationary point ofan approximation to the free energy associated to the trueposterior probability distribution of the transmitted symbols.Any stationary point satisﬁes a moment matching conditionbetween the involved distributions. In this paper, we tailor theoriginal EC formulation to the MIMO detection case and wediscuss feasible methods to ﬁnd such stationary points andshow the fundamental tradeoffs between accuracy and speedof convergence. In particular, we propose an update rule thatperforms very close to the moment matching EC solution, with a complexity comparable to running MMSE ten times. Also,we propose methods to overcome numerical instabilities thatmay arise in the MIMO detection scenario, particularly whenwe use large constellation alphabets. In all tested scenarios,we ﬁnd solutions that are robust and accurate across differ-ent modulation orders and system dimensions. Finally, theresulting EC probabilistic MIMO detector achieves excellentperformance results compared to state-of-the-art methods withthe same complexity order.To measure the accuracy of the EC MIMO detector proba-bilistic output, ﬁrst we use a Monte Carlo estimate to the mu-tual information between the transmitted MIMO symbol vectorand the corresponding output of the probabilistic symboldetection stage. At high SNRs, all detection methods saturateat the same mutual information level, i.e., log ( M ) bits perchannel use per antenna, due to the use of a ﬁnite discrete con-stellation of M points. Operating in the high-SNR region ofsaturation is undesirable, as the gap to channel capacity growsexponentially as we increase the SNR. However, at moderateSNR, our proposed detector outperforms other detectors in theliterature and, in those scenarios where we could obtain theoptimal detector solution, EC gets very close to it. Second, thepredicted gain at moderate-SNRs is corroborated by bit errorrate (BER) performance simulation using optimized irregularLDPC block codes [40] and terminated convolutional-LDPCblock codes [41], [42]. In all cases, we obtain remarkable SNRgains, proving that the accuracy of the MIMO probabilisticsymbol detection stage is crucial in the system’s performance.Overall, the contributions of this paper are summarized asfollows: • We introduce EC approximate inference framework andshow how it can be applied to the MIMO detectionscenario, developing the EC free energy approximationand computing its gradients. • We compare several approaches to ﬁnd EC stationarypoints, and propose iterative rules that are able to ap-proach the optimal solution at O ( m ) complexity. • We obtain the achievable rate (mutual information) of asingle-user MIMO system to show the accuracy in thepdf approximation to the true posterior, also proving thatwith EC detection we signiﬁcantly reduce the gap tocapacity. The predicted gains are corroborated via errorrate simulation with optimized LDPC codes.The paper is structured as follows. In Section II we re-view the system model. In Section III, we discuss on thetransmission rate and how it depends on the MIMO symboldetection method implemented, highlighting the importance ofa good approximation to the true posterior. Section IV brieﬂypresents the EC approximate inference framework and wetailor it to the MIMO detection case in Section V. In SectionVI, experimental results are presented. Final conclusions andpotential lines of future research are described in Section VII.Notation: Capital and lowercase boldface symbols representmatrices and vectors respectively. [ · ] (cid:62) is the transpose and [ · ] H is the Hermitian. Finally, [ n ] denotes the set { , , . . . , n } . Fig. 1. System model

II. S

YSTEM M ODEL

Consider a single-user MIMO system where m transmittingantennas communicate to a receiver with r antennas. The sys-tem model is shown in Fig. 1. Let b = [ b , b , ..., b k ] (cid:62) denotethe input information binary vector, which is Gray-mappedand modulated into QAM symbols. Then, an m -dimensionalvector of QAM symbols is generated, that is denoted by u = u re + j u im ∈ A m , where |A| = M . The symbol vector u ,is transmitted over a memoryless ﬂat-fading complex MIMOchannel, deﬁned as a matrix H with dimensions r × m of zero-mean unit-variance complex Gaussian coefﬁcients. Therefore, y = Hu + w , (1)where y ∈ C r and w ∈ C r is an additive white circular-symmetric complex Gaussian noise vector with independentzero-mean components and σ w -variance. We also assume thatthe receiver has perfect channel state information (CSI). Onthe other hand, the signal-to-noise ratio is deﬁned asSNR(dB) = 10 log (cid:18) m log ( M ) E b σ w (cid:19) , (2)where E b is the bit energy and the constellation energy E s can be written as E s = E b log ( M ) . (3)Note that the SNR deﬁned is taking into account thefull power transmission instead of the per-antenna power.Given the channel observation, the posterior distribution of thetransmitted symbols, that would lead to the optimal detectorand that is also denoted through this work as true posterior,is p ( u | y ) = p ( y | u ) p ( u ) p ( y ) ∝ N ( y : Hu , σ w I ) p ( u ) , (4)where N ( y : Hu , σ w I ) denotes a complex Gaussian withmean Hu and covariance matrix σ w I , and p ( u ) is the priorprobability density function for u . Assuming that we transmitindependent uniformly distributed symbols, we have p ( u ) = m (cid:89) i =1 p ( u i ) = m (cid:89) i =1 M I u i ∈A , (5)where I u i ∈A takes value one if u i belongs to A . Observe that,due to the likelihood term in (4), p ( u | y ) is a multidimensionaldiscrete distribution that maps over a fully connected factorgraph. Exact inference over p ( u | y ) , required to evaluatesymbol marginals p ( u i | y ) , i ∈ [ m ] , to later feed a modernchannel decoder, has cost O ( M m ) and quickly (in both M and m ) becomes unfeasible. A. Posterior approximation and inference

One of the alternatives to implement a low complexity prob-abilistic symbol detector is to construct a tractable distribution q ( u ) that approximates p ( u | y ) . By tractable we mean thatperforming inference over q ( u ) , namely marginalizing it orcomputing expectations, is feasible. Other options, reduce ormodify the constellation space, as for example SD.Focusing on the ﬁrst alternative, the MMSE method can beseen as a Gaussian approximation q ( u ) to p ( u | y ) obtained byreplacing the independent discrete priors in (5) by the productof univariate zero-mean and E s -variance complex circularly-symmetric Gaussian factors [13], [24]. The Gaussian treeapproximation (GTA) was ﬁrst proposed in [26]. The methodconstructs a tractable cycle-free discrete approximation to (4)by replacing the Gaussian likelihood term p ( y | u ) by a Gaus-sian distribution that factorizes in cycle-free graph, chosen tomatch the marginal and cross-moments of p ( y | u ) . Using thiscycle-free approximation to the likelihood, efﬁcient inferenceis carried out using BP. Finally, there exist several recent pro-posals that perform approximate inference for MIMO symboldetection based on approximate message passing (AMP) [27].AMP algorithms essentially implement the standard rules ofBP message passing [43] and all messages are approximatedwith univariate Gaussian distributions. Among AMP methodsfor MIMO detection, we can mention the CHEMP algorithmin [28] and GMPID in [30]. An approximation to p ( u | y ) can be constructed from the AMP marginals using the Bethereparameterization [43].In Section V-D, we have included a table summarizing thetheoretical complexity order of each of the MIMO detectionmethods we use in our experiments.III. T RANSMISSION RATE

Consider a ﬁxed and known channel matrix H , underthe system model deﬁned in Section II. With the powerconstraint E [ u T u ] ≤ SNR σ w , the ergodic channel capacityper transmitted antenna with perfect CSI at the receiver andno CSI at the transmitter is given by C = max p ( u ) I ( u , y ) m = log (det( I r + SNR m HH H )) m (6)bits per channel use and antenna. Capacity is achieved when u is Gaussian distributed with zero-mean and covariance matrixequal to identity [44].When u is a random vector uniformly distributed in A m ,the system transmission rate degrades and can be far from thecapacity limit in (6). The achievable rate per antenna can becomputed by evaluating the mutual information between u i ,the transmitted symbol at i -th antenna and ˆ u i ∼ p ( u i | y ) , i.e., I ( u i , ˆ u i ) = E p ( u i , ˆ u i ) (cid:20) log p (ˆ u i | u i ) p (ˆ u i ) (cid:21) (bits/channel use) , (7)for i ∈ [ m ] . Unfortunately, it is not possible to compute thismutual information in closed-form. We follow a Monte Carloprocedure to estimate m (cid:80) mi =1 I ( u i , ˆ u i ) in the same channelknowledge scenario as the one assumed in (6), namely perfectCSI only at the receiver. More precisely, we estimate I ( u i , ˆ u i ) , Fig. 2. Transmission rate in × scenario with QPSK modulation. i ∈ [ m ] , at one particular SNR point as follows: ﬁrst, wecollect N ∈ Z + samples from the joint distribution of u i , y and ˆ u i . Using this set of samples, we estimate p (ˆ u i ) , p (ˆ u i | u i ) for any u i , ˆ u i ∈ A , and, ﬁnally, compute a numerical estimateto I ( u i , ˆ u i ) in (7). As N → ∞ , the estimate to I ( u i , ˆ u i ) getstight. Samples of the joint ( u , y , ˆ u ) distribution are computedusing ancestral sampling [45]. Each of the N samples isgenerated following the next steps:1) Sample u from an uniform distribution in A m .2) Sample y from p ( y | u , H ) .3) Sample ˆ u i , i ∈ [ m ] , from p ( u i | y ) = (cid:88) u − i p ( u | y ) u i ∈ A , (8)where u − i denotes all elements in u except u i .When a probabilistic symbol detector does not use the trueposterior, the transmission rate can be evaluated by following asimilar procedure, but in 3) we sample ˆ u i after marginalizationover q ( u ) , namely the approximation constructed to p ( u | y ) .Thus, the average mutual information computed for each lowcomplexity detection method is used as a performance metricthat measures how close q ( u ) is to p ( u | y ) . At the sametime, the better the quality of the approximation is, the higherthe rate becomes. Note also that to compute this metric, weconsider uncoded transmission. For instance, Fig. 2 shows theaverage mutual information per antenna in a × scenariowith QPSK modulation for both the optimal detector (whichworks directly with the true posterior p ( u | y ) ), and for MMSE,GTA and CHEMP suboptimal detectors. It has been computedwith N = 10 samples per SNR point. Also, results havebeen averaged over 100 realizations of H . Observe that allmethods operate close to the limit of bits/channel use whenthe SNR is high, but the gap to channel capacity in this regimegrows exponentially fast with the SNR. For intermediate SNRvalues, optimal detection clearly outperforms MMSE, GTA and CHEMP detection . It is precisely in this regime wherewe must improve the accuracy of the probabilistic symboldetection stage.IV. E XPECTATION C ONSISTENCY A PPROXIMATE I NFERENCE FOR

MIMO

DETECTION

In this section we give a brief introduction to EC ap-proximate inference [39], to then tailor it for low-complexityprobabilistic MIMO detection. Let U be a random variablewith a probability density function that factors in the followingway p ( u ) = 1 Z f q ( u ) f r ( u ) , (9)where we assume that computing Z = (cid:82) f q ( u ) f r ( u ) d u orany expectation w.r.t. p ( u ) is unfeasible. However, we doassume that, separately, f q ( u ) and f r ( u ) are tractable w.r.t.a measure of the form exp( l T φ ( u )) for some function vector φ ( u ) = [ φ ( u ) , . . . , φ J ( u )] . Namely, we assume it is possibleto perform inference over the following two distributions usedto approximate p ( u ) : q ( u ) = 1 Z q ( l q ) f q ( u ) exp( l (cid:62) q φ ( u )) , (10) r ( u ) = 1 Z r ( l r ) f r ( u ) exp( l (cid:62) r φ ( u )) , (11)where the J × parameter vectors l q and l r belong to a certainconvex set Φ , and Z q ( l q ) = (cid:90) f q ( u ) exp( l (cid:62) q φ ( u )) d u , (12) Z r ( l r ) = (cid:90) f r ( u ) exp( l (cid:62) r φ ( u )) d u . (13)Note that both q ( u ) and r ( u ) deﬁne an exponential familyof distributions , where l q ( l r ) is the natural parameter vec-tor, φ ( u ) is the vector of sufﬁcient statistics and log Z q ( l q ) ( log Z r ( l r ) ) is a convex function of l q ( l r ) that satisﬁes ∇ l q log Z q ( l q ) = E q ( u ) [ φ ( u )] , (14) ∇ l r log Z r ( l r ) = E r ( u ) [ φ ( u )] . (15)The main idea behind EC approximate inference is to optimize l q and l r so that q ( u ) and r ( u ) have the same moments, i.e.,(14) is consistent with (15), keeping in mind that both q ( u ) and r ( u ) , being the functions used to approximate p ( u ) , contain“partial information” ( f q ( u ) and f r ( u ) respectively) of thistrue distribution p ( u ) .The ﬁrst step to derive the EC approximation is to note thatthe partition function Z in (9) can be expressed the followingway Z = Z q ( l q ) ZZ q ( l q ) = Z q ( l q ) (cid:90) f q ( u ) f r ( u ) d u Z q ( l q ) == Z q ( l q ) (cid:90) f q ( u ) Z q ( l q ) f r ( u ) exp(( l q − l q ) (cid:62) φ ( u )) d u (16) = Z q ( l q ) E q ( u ) [ f r ( u ) exp( − l (cid:62) q φ ( u ))] . (17) Note the similarities with the throughput results presented in [46]. See [43] for an introduction to exponential families and their properties.

And thus, log Z = log Z q ( l q ) + log (cid:0) E q ( u ) [ f r ( u ) exp( − l (cid:62) q φ ( u ))] (cid:1) . (18)where log Z is also known as the energy function . In order toestimate the expectation in the above expression, we replace q ( u ) by a simpler distribution s ( u ) that belongs to the sameexponential family than q ( u ) and r ( u ) , i.e., s ( u ) = 1 Z s ( l s ) exp( l (cid:62) s φ ( u )) , (19)where log Z s ( l s ) is a convex function of l s that satisﬁes ∇ l s log Z s ( l s ) = E s ( u ) [ φ ( u )] . While replacing q ( u ) by s ( u ) yields, in general, a poor approximation, it can be a fairlyreasonable solution if both q ( u ) and s ( u ) have the same mo-ments, namely if E q ( u ) [ φ ( u )] = E s ( u ) [ φ ( u )] . This conditionis naturally achieved as a stationary point of the resultingapproximation to log Z . By replacing q ( u ) by s ( u ) in (18), log Z is approximated by log Z EC ( l q , l s ) == log Z q ( l q ) + log( E s ( u ) [ f r ( u ) exp( − l (cid:62) q φ ( u ))]) , (20)and after simple manipulation this term can be expressed asfollows: log Z EC ( l q , l s ) == log Z q ( l q ) + log Z r ( l s − l q ) − log Z s ( l s ) . (21)Recall that by assumption Z q ( l q ) , Z r ( l s − l q ) and Z s ( l s ) can becomputed efﬁciently. And note that log Z EC depends only on l q and l s , while it depends on three probability distributions: q ( u ) with parameter vector l q , r ( u ) with parameter vector ( l s − l q ) and s ( u ) with parameter vector l s . Recall we seekmoment matching between q ( u ) and r ( u ) and also between q ( u ) and s ( u ) . While the ﬁrst condition ensures that thetwo approximations that we construct to p ( u ) are consistent,the latter is required so that the measure replacement in theexpectation in (18) is not too coarse. Both conditions aresatisﬁed at any point ( l ∗ q , l ∗ s ) where the gradient of the ECenergy function log Z EC ( l q , l s ) is zero, i.e. optimization over log Z EC ( l q , l s ) would lead to ( l ∗ q , l ∗ s ) . A. The EC free energy for MIMO detection

To simplify the low-complexity detector derivation, werewrite the probabilistic model in (4) to work with real-valued distributions, considering the real R ( · ) and imagi-nary I ( · ) parts separately. Deﬁne (cid:101) u = (cid:2) u (cid:62) re u (cid:62) im (cid:3) (cid:62) , (cid:101) y = (cid:2) R ( y ) (cid:62) I ( y ) (cid:62) (cid:3) (cid:62) , (cid:101) w = (cid:2) R ( w ) (cid:62) I ( w ) (cid:62) (cid:3) (cid:62) and (cid:101) H = (cid:20) R ( H ) −I ( H ) I ( H ) R ( H ) (cid:21) . Thus, the real-valued channel model is (cid:101) y = (cid:101) H (cid:101) u + (cid:101) w , (22)where σ (cid:101) w = σ w / is the variance of the real and imaginaryparts of the noise and we deﬁne (cid:101) A as the new alphabet for thereal and imaginary components of the M -QAM constellation, (cid:101) u ∈ (cid:101) A m , with energy (cid:101) E s = E s / . In the rest of thiswork we adopt the real-valued channel model formulationin (22) and we drop the model indicator (cid:102) ( · ) to keep thenotation uncluttered. Therefore, the a posteriori probability pdfof the transmitted symbol vector u , and that we propose toapproximate with tractable pdfs, can be expressed as follows p ( u | y ) = 1 Z N ( y : Hu , σ w I ) m (cid:89) i =1 I u i ∈A , (23)The matching of (23) with functions f q ( u ) and f r ( u ) in(9) will be done so that q ( u ) and r ( u ) in (10) and (11) aretractable w.r.t. a measure of the form exp( l T φ ( u )) , whichmeans that we have to be able to easily compute momentsof the form E [ φ ( u )] w.r.t. both distributions. For an EC basedlow-complexity detector we choose the vector of statistics andnatural parameters as follows φ ( u ) = (cid:20) u , u , . . . , u m , − u , − u , . . . , − u m (cid:21) (cid:62) , (24) l = [ γ , γ , . . . , γ m , Λ , Λ , . . . , Λ m ] (cid:62) = [ γ , Ł ] (cid:62) , (25)where γ ∈ R m and Ł ∈ R m + . According to (24), this choiceof φ ( u ) implies that at any zero-gradient point of the ECenergy function in (21), the distributions q ( u ) and r ( u ) mustbe consistent only in their marginal ﬁrst and second ordermoments. Under this assumption, if we choose functions f q ( u ) and f r ( u ) as follows f q ( u ) = N ( y : Hu , σ w I ) , and f r ( u ) = m (cid:89) i =1 p ( u i ) (26)then we conclude that q ( u ) and r ( u ) are tractable probabilitydensity functions, since q ( u ) is a Multivariate Normal distri-bution and r ( u ) is a discrete independent distribution. Moreprecisely, according to (10) and (26), we have q ( u ) = 1 Z q ( γ q , Ł q ) f q ( u ) exp (cid:18) γ (cid:62) q u − u (cid:62) diag( Ł q ) u (cid:19) = exp (cid:18) H (cid:62) y σ w + γ q (cid:19) (cid:62) (cid:124) (cid:123)(cid:122) (cid:125) g (cid:62) u − u (cid:62) (cid:18) H (cid:62) H σ w + diag( Ł q ) (cid:19)(cid:124) (cid:123)(cid:122) (cid:125) S u  Z q ( γ q , Ł q ) , (27)where diag( Ł q ) is a diagonal matrix with main diagonal givenby Ł q . Therefore q ( u ) = N ( u : µ , Σ ) , and Σ = S − and µ = S − g . Also, we obtain log Z q ( γ q , Ł q ) = 12 µ T Σ − µ T + 12 log | Σ | . (28)By applying standard rules for matrix derivatives, we cancheck that ∂ log Z q ( γ q , Ł q ) ∂γ q,i = E q [ u i ] = µ i , (29) ∂ log Z q ( γ q , Ł q ) ∂ Λ q,i = − E q [ u i ] = − (cid:0) Σ ii + µ i (cid:1) . (30) On the other hand, from the deﬁnition of f r ( u ) in (26) we get r ( u ) = 1 Z r ( γ r , Ł r ) exp (cid:18) γ Tr u − u T diag ( Ł r ) u (cid:19) m (cid:89) i =1 I u i ∈A = 1 Z r ( γ r , Ł r ) m (cid:89) i =1 exp (cid:18) γ ri u i − Λ ri u i (cid:19) I u i ∈A . (31)Therefore, r ( u ) is an independent discrete pmf over A m suchthat, for i ∈ [2 m ] , E r [ u i ] = (cid:80) u i ∈A u i exp (cid:16) γ ri u i − Λ ri u i (cid:17)(cid:80) a ∈A exp (cid:16) γ ri a − Λ ri a (cid:17) , (32) E r [ u i ] = (cid:80) u i ∈A u i exp (cid:16) γ ri u i − Λ ri u i (cid:17)(cid:80) a ∈A exp (cid:16) γ ri a − Λ ri a (cid:17) . (33)Also we have log Z r ( γ r , Ł r ) = log (cid:32)(cid:88) a ∈A exp (cid:18) γ ri a − Λ ri a (cid:19)(cid:33) , (34)where we can again check that, ∂ log Z r ( γ r , Ł r ) ∂γ r,i = E r [ u i ] and ∂ log Z r ( γ r , Ł r ) ∂ Λ r,i = − E r [ u i ] , for i ∈ [2 m ] . Finally, theaveraging distribution s ( u ) in (19) is given by s ( u ) = 1 Z s ( l s ) exp (cid:18) γ (cid:62) s u − u (cid:62) diag ( Ł s ) u (cid:19) , (35)and therefore s ( u ) is an independent Gaussian distribution,i.e. s ( u ) = N ( u : diag (cid:0) Ł − s (cid:1) γ s , diag (cid:0) Ł − s (cid:1) ) .Note that, given the vector of moments in (24), any choicefor the functions f q ( u ) and f r ( u ) different to (26), wheresome discrete priors are multiplied together with the Gaussianlikelihood term p ( y | u ) , would result in q ( u ) or r ( u ) beingan hybrid distribution, with some components taking valuesonly in A and some other components taking real values.In such a case, evaluating the moments E [ φ ( u )] would bean issue. On the other hand, while many other statistics canbe included in the vector φ ( u ) , e.g. cross moments of theform u i u j for some or all pairs of variables, we will showin the experimental results session that our choice in (24)drives a robust and accurate MIMO detector. For instance, inthe experimental section we show that the EC-based MIMOdetector average mutual information in (7) is very close to theoptimal detector for an scenario where the true posterior canbe evaluated. Hence, there is little room for improvement ofthe EC solution by including higher order moments in φ ( u ) .V. O PTIMIZING THE

MIMO EC

FREE ENERGY

As described in the previous section, the goal in ECinference is to ﬁnd ( γ q , Ł q ) and ( γ s , Ł s ) such that q ( u ) in(27), r ( u ) in (31) (evaluated at γ r = γ s − γ q and Ł r = Ł s − Ł q )and s ( u ) in (35) satisfy E q [ u i ] = E r [ u i ] = E s [ u i ] (36) E q [ u i ] = E r [ u i ] = E s [ u i ] (37)for i ∈ [2 m ] . To achieve such a point, we present two algorithms. Theso-called single loop (SL), iteratively updates either ( γ q , Ł q ) or ( γ r , Ł r ) and follows a message-passing procedure. Theresulting algorithm has approximately the MMSE complexityper iteration (see Table I). On the other hand, by exploiting thefact that the EC free energy in (21) is a convex function w.r.t. ( γ q , Ł q ) , the so-called double loop algorithm (DL) performsiteratively a convex optimization to set ( γ q , Ł q ) for ﬁxed ( γ s , Ł s ) to then update the latter. Simulation results in SectionV-C show that the DL algorithm typically converges to apoint closer to the stationarity conditions in (36)-(37). As acaveat, its complexity is extremely large (see Table I) and wewould rather use it as a benchmark to improve the single loopapproach.It is important to remark that, for both algorithms, conver-gence to (36)-(37) is not guaranteed [39]. Actually, in mostcases we observe that both algorithms get stuck in a ( l q , l r ) point for which these parameters do not change anymore butat the same time the moment matching (MM) condition is notfully met. Our goal is to design robust algorithms to optimizethe EC free energy such that they converge to stable ( l q , l r ) points that are as close to the MM condition as possible. A. The EC MIMO detector with single loop updates

We initialize ( γ q , Ł q ) such that q ( u ) in (27) coincideswith the MMSE Gaussian approximation, i.e., γ (0) q = and Λ (0) qi = E − s ∀ i ∈ [2 m ] [13], [24]. The main stepsare summarized Algorithm 1. The complexity per iteration isdominated by the computation of the covariance matrix ofthe q ( u ) distribution in (27) at step 1) of the algorithm. Thiscomplexity is O ( m ) , but independent on the constellationsize M . After the matrix inversion, computing the mean of Algorithm 1

The EC MIMO detector with SL updatesFix a damping factor β . Set maximum number of iterations I EC-S . Set (cid:96) = 0 .Initialize γ (0) q = and Λ (0) qi = E − s i ∈ [2 m ] . repeat

1) Given γ ( (cid:96) − q , Ł ( (cid:96) − q , compute E q [ u i ] and E q [ u i ] , i ∈ [2 m ] .2) Compute γ ( (cid:96) ) s , Ł ( (cid:96) ) s such that E s [ u i ] = E q [ u i ] and E s [ u i ] = E q [ u i ] , i ∈ [2 m ] .3) Update γ ( (cid:96) ) r = γ ( (cid:96) ) s − γ ( (cid:96) ) q , Ł ( (cid:96) ) r = Ł ( (cid:96) ) s − Ł ( (cid:96) ) q .4) Given γ ( (cid:96) ) r , Ł ( (cid:96) ) r , compute E r [ u i ] and E r [ u i ] , i ∈ [2 m ] .5) Compute γ ( (cid:96) ) s , Ł ( (cid:96) ) s such that E s [ u i ] = E r [ u i ] and E s [ u i ] = E r [ u i ] , i ∈ [2 m ] .6) Update γ ( (cid:96) ) q = β (cid:16) γ ( (cid:96) ) s − γ ( (cid:96) ) r (cid:17) + (1 − β ) γ ( (cid:96) − q Ł ( (cid:96) ) q = β (cid:16) Ł ( (cid:96) ) s − Ł ( (cid:96) ) r (cid:17) + (1 − β ) Ł ( (cid:96) − q (cid:96) = (cid:96) + 1 until convergence (or (cid:96) > I EC-S ) q ( u ) requires O ( m ) operations. Computing the r ( u ) meanand variance in (32) and (33) requires O ( mM ) operations.The complexity of the rest of steps does not depend on theconstellation and thus the complexity is O ( m ) . Therefore, ifthe algorithm is run for I EC-S iterations, the ﬁnal complexityis O ( m I EC-S + m I EC-S + mM I EC-S + mI EC-S ) .Numerical issues arise due to the fact that we are propagat-ing moments between a continuous and a discrete distribution,particularly in scenarios where all the mass of the marginal r ( u i ) distribution is concentrated in a small region of apotentially very large QAM constellation. This leads to smallvalues of the marginal variance Var r [ u i ] and, consequently, Λ si may diverge in step 5). In order to avoid numerical issues, weimplement a damping (low-pass ﬁlter) in the update of ( γ q , Ł q ) at step 6) of Algorithm 1. Smoothing parameter updates viadamping is a fairly common technique to stabilize approximateinference iterative algorithms. See for instance [47]–[49] fordiscussions on message-passing stabilization. B. The EC MIMO detector with double loop updates

The double loop algorithm is based on a simultaneousupdate of both q ( u ) and r ( u ) at every iteration by solving thefollowing convex optimization problem for a ﬁxed ( γ s , Ł s )( γ ∗ q , Ł ∗ q ) = arg min ( γ q , Ł q ) log Z EC ( γ q , γ s , Ł q , Ł s ) (38) = arg min ( γ q , Ł q ) (log Z q ( γ q , Ł q ) + log Z r ( γ s − γ q , Ł s − Ł q )) At ( γ ∗ q , Ł ∗ q ) , both q ( u ) and r ( u ) have the same moments.Then, ( γ s , Ł s ) is recomputed to enforce moment matching (asin step 2) of Algorithm 1). Instead of using the distribution s ( u ) to iteratively communicate the moments between q ( u ) and r ( u ) , as the single loop algorithm does, note that thedouble loop is directly optimizing together both q ( u ) and r ( u ) to then update s ( u ) . The main steps are outlined inAlgorithm 2. We could use standard gradient descend tonumerically solve (38) in step 1). Note that in (28), evaluatingthe gradient of log Z q ( γ q , Ł q ) w.r.t. ( γ q , Ł q ) , requires a matrix Algorithm 2

The EC MIMO detector with DL updatesFix a damping factor β . Set maximum number of iterations I EC-D . Set (cid:96) = 0 .Initialize γ (0) s = and Λ (0) si = E − s i ∈ [2 m ] . repeat

1) Given γ ( (cid:96) − s , Ł ( (cid:96) − s , solve the convex optimization in(38).2) Compute γ ( (cid:96) ) s , Ł ( (cid:96) ) s such that E s [ u i ] = E q [ u i ] and E s [ u i ] = E q [ u i ] , i ∈ [2 m ] .3) Update γ ( (cid:96) ) s = β (cid:16) γ ( (cid:96) ) s (cid:17) + (1 − β ) γ ( (cid:96) − s Ł ( (cid:96) ) s = β (cid:16) Ł ( (cid:96) ) s (cid:17) + (1 − β ) Ł ( (cid:96) − s (cid:96) = (cid:96) + 1 until convergence (or (cid:96) > I EC-D ) inversion and a matrix product and thus a complexity of O ( m + m ) . If D denotes the number of gradient descendsteps and I EC-D is the number of iterations, then the complexityis O ( m DI EC-D + m DI EC-D + mI EC-D ) . C. Assessing convergence

The moment matching condition in (36) and (37) representsthe optimal operational point of the EC approximation. We em-phasize that this notion of optimality is measured in terms ofmoment matching between tractable approximations to p ( u | y ) ( q ( u ) and r ( u ) respectively), and not w.r.t. the distribution p ( u | y ) itself.For our experiments, we study the evolution of the followingtwo quantities along iterations of the single loop EC MIMOdetector: ∆ u = 12 m m (cid:88) i =1 (cid:12)(cid:12)(cid:12) E q [ u i ] − E r [ u i ] (cid:12)(cid:12)(cid:12) , (39) ∆ u = 12 m m (cid:88) i =1 (cid:12)(cid:12)(cid:12) E q [ u i ] − E r [ u i ] (cid:12)(cid:12)(cid:12) . (40)In Fig. 3 we represent ∆ u and ∆ u for a × scenario withQPSK modulation at a SNR of dB, averaged over real-izations of both the channel matrix H and received vector y .According to Fig. 2, this SNR value is far from the saturationregime (largest gap to channel capacity), and it is in this rangewhere we aim the EC detector at substantially improving state-of-the-art methods. With dotted black line we represent thedouble loop benchmark, computed for I EC-D = 50 iterations.At every iteration, we found that D , the number of gradientdescend updates at step 1) of Algorithm 2, has to be to a verylarge value until the gradient norm was below a threshold of . . We set an upper limit of D = 2000 and a gradient descendstep-size of − . We remark that every gradient descend stepis as complex as a single iteration of the single loop ECalgorithm.Three implementations of the SL algorithm are comparedin Fig. 3. For the red solid line we have used β = 0 . , i.e.,a very slow parameter update in step 6) of Algorithm 1. Theopposite case is represented by the green dashed line, whichhas been computed with β = 0 . . While the β = 0 . caseapproaches the double loop solution, achieving ∆ u and ∆ u around − , it requires in average 25 iterations to converge tosuch a stationary point. Recall that each single loop iterationis as complex as computing the MMSE estimate, due to thematrix inversion in (27). On the other hand, the β = 0 . casequickly saturates (around 10 iterations), but its solution is stillfar from the MM condition.In order to achieve a better trade-off between accuracy andcomplexity, we maintain the fast updates using β = 0 . , butmodify the parameter update in Algorithm 1 and introducea gradual decrease in the variance per component allowed ateach iteration. More precisely, we set an iteration-dependentminimum value of the variance E s [ u i ] at step 5) of Algorithm1 of the following form:Var s [ u i ] = max (cid:16) − max( (cid:96) − , , Var r [ u i ] (cid:17) , (41) TABLE I:

Complexity order of different r × m MIMO detectors.In iterative algorithms, I X denotes the number of iterations. D is thenumber of gradient descend steps for the double-loop EC detector.MIMO detector Complexity orderOptimal detector M m MMSE m + m + mM soft MMSE-SIC [25] O ( m + m + mr + mr + mM ) GTA [26] m + m M CHEMP [28] rm I CHEMP

EC (Single L.) m I EC-S + m I EC-S + mMI EC-S + mI EC-S

EC (Double L.) m DI EC-D + m DI EC-D + mI EC-D namely during the ﬁrst iterations we set a reasonably mini-mum high variance per component ( . ) and, from iteration ,we let this minimum value to decrease exponentially fast with (cid:96) . The convergence of this implementation of the EC algorithmis represented in Fig. 3 with blue dashed-dotted lines. Observethat an improvement is achieved w.r.t. the β = 0 . case,reducing the gap w.r.t. to the stationary point achieved by β = 0 . , without a signiﬁcant penalty in speed of convergence,as it typically converges in less than 10 iterations. These effectsare even more evident when we move to higher-dimensionalscenarios. In Fig. 4 we consider a × scenario withQPSK (a)-(b) and 64-QAM modulation (c)-(d). Convergencespeed is actually maintained and the gap w.r.t. the β = 0 . case is clearly reduced. While the parameter update in (41)was obtained heuristically after an intense empirical evaluationof the algorithms, we interpret the improvement achieved asfollows. Setting a high-variance parameter during the ﬁrstiterations of the algorithm is crucial in the low-SNR regime inorder to avoid over-ﬁtting. For large values of β , we observedthat the single loop EC algorithm performance is degradedby very small values of the r ( u i ) variance (Var r [ u i ] ) at earlyiterations (step 4) of Algorithm 1, indicating a very peakydistribution around a small region of the QAM constellation.Note that a very small variance is propagated to the s ( u i ) distribution at step 5) of Algorithm 1 with very large valuesof Λ si . According to (35), we have Λ − si = Var s [ u i ] = E r [ u i ] − ( E r [ u i ]) = Var r [ u i ] , (42)and the same effect is propagated to Λ qi at step 6) ofthe algorithm unless β is small enough. Very large valuesof Λ qi will dominate the diagonal of the matrix in (27)and, ultimately, this implies that successive steps of the ECalgorithm will not be able to signiﬁcantly change the u i marginal distribution anymore. Note that this is dramatic to thealgorithm performance if the mode of the r ( u i ) distributionis placed at the wrong symbol, which is likely to happen athigh-noise levels.Instead of using small values of β to control sudden changesin parameter updates, with the update in (41), we proposean easy way to artiﬁcially control overconﬁdent distributionsat early steps of the algorithm, which would restrain the ECalgorithm to move far away from the MMSE initial estimate.We note that using the EC moment matching criterion manyother variants of the single loop update methods can be tested and compared with our proposal. However, no signiﬁcant dif-ferences have been appreciated when we measure the systemperformance in terms of the mutual information in (7) orsystem bit error rate (BER). In the rest of the paper, regardlessof the dimension of the system or constellation order, weimplement the EC detector using the single loop approachwith β = 0 . , the progressive variance limit in (41) and amaximum number of iterations of I EC-S = 10 . D. Complexity

In Table V-D we summarize the main complexity orderof the algorithms presented and those that will be used inour simulation experiments in the next section. In iterativealgorithms, I X denotes the number of iterations. As a rule ofthumb, if we run the EC MIMO detector using I EC-S = 10 iterations, the incurred complexity is around 10 times largerthan the MMSE, GTA and CHEMP complexities. However,the signiﬁcant gain in performance that we report in the nextsection can justify the increased-complexity of the proposedEC detector. VI. E

XPERIMENTAL R ESULTS

In the following, we include simulation performance resultsthat demonstrate the accuracy of the EC approximation. In ourexperiments, we compare our proposal with the soft outputMMSE solution in [13], [24], the soft version of the MMSE-SIC in [25], the GTA algorithm in [26], and the CHEMPmethod in [28]. To avoid cluttering, we do not include inour experiments the GMPID algorithm [30], since it performsclose to CHEMP. For similar reasons, we do not include theEP method proposed in [33], since it performs similarly toGTA when used for probabilistic detection [36].

A. A Low Dimensional MIMO System

Consider again the × scenario with QPSK modulationdescribed in Fig. 2. Recall that the dimensionality is smallenough so we are able to solve the marginalization in (8)exactly, which represents the optimal detector. In Fig. 5(a) weinclude now the results for the EC MIMO detector. Remark-ably, it essentially overlaps the optimal detector performance,achieving a large gain w.r.t. GTA, MMSE and CHEMP. Whenthe number of antennas is small ( in our case), the columnsof the channel matrix H are typically non-orthogonal and thislimits the MMSE performance [13], [24]. Also, the CHEMPmethod relies on the matrix m − H (cid:62) H being diagonal and fora small m , this assumption is unrealistic [28].Results in Fig. 5(a) indicate that the MIMO system perfor-mance will highly beneﬁt from the more accurate estimates tothe symbol posterior marginals p ( u i | y ) provided by the EC de-tector. To corroborate this fact, we augment the system modelin Fig. 1 by including an LDPC channel encoding stage at thetransmitter and an LDPC channel decoder at the receiver. TheLDPC channel decoder is fed by soft coded bit probabilitiescomputed using the symbol posterior marginals p ( u i | y ) (ortheir estimates), according to the bit-modulation mapping. Itis well known that the more accurate the probabilistic detector is, the better performance is obtained after the LDPC decodingstage using BP [24], [50], [51]. In Fig. 5(b), we show forthis scenario the simulated BER measured after the LDPCdecoding stage (solid lines). A (3 , -regular LDPC code withblock length equal to bits has been used. Note that, tosimulate the coded performance, the SNR deﬁnition in (2) iscorrected by the coding rate R (the coding rate is R = 0 . inthe case of (3 , -regular LDPC code). To avoid confusion, wedenote this by SNR c , and thus SNR c (dB)=SNR+

10 log ( R ) .Results have been averaged over 5000 realizations of thechannel matrix H . In terms of coded performance, the gapbetween optimal detection and EC is only about . dBmeasured at a BER of − while the gap to GTA is over . dB. In all scenarios observe that, while the soft MMSE-SIC method always improves MMSE, and also GTA al lowSNR values, its performance is still far from the EC detector. B. A × MIMO system

In a larger scenario, exact marginalization is not viableanymore and we fully rely on approximate methods. In Fig. 6,we represent the obtained achievable rates for a × MIMOsystem using QPSK modulation (a), -QAM modulation(b), -QAM modulation (c), and -QAM modulation (d).While CHEMP and EC are competitive for the QPSK case,CHEMP is no longer a viable option in the -QAM or -QAM cases. As discussed in [28], the variance of theinterference noise that CHEMP aims to iteratively cancelgrows with the constellation order. For m = r and highorder constellations the interference noise becomes excessivelylarge. Note that the soft MMSE-SIC method always improvesMMSE and GTA al low-intermediate SNR values but still itsperformance is far from the EC detector.Following [28], it can be checked that CHEMP becomeseffective again as we reduce the number of transmittingantennas, i.e., if m < r . In Fig. 7 (a), we compare the ECand CHEMP transmission rates for a -QAM modulationwith r = 32 and m = 16 , and . In (b), we includeBER simulation results using the (3 , -regular LDPC codewith block length equal to bits. For small m values,CHEMP is comparative to the EC solution. However, itsperformance is severely degraded as m approaches r . CHEMPcan be regarded as a Gaussian message-passing distributedimplementation of the EC algorithm for those cases whereinterference is “locally” tractable. Unlike CHEMP, the EC al-gorithm performs the update of all parameters at the same timein a centralized manner. These results show that EC MIMOdetector is robust against the increase in the constellation order.In the following we solely consider m = r scenarios withhigh order constellations and hence we omit CHEMP fromthe results.We complete the study of this scenario by including BERperformance results using LDPC constructions that are de-signed to improve the performance of the (3 , -regular LDPCcode used in previous experiments. In Fig. 8 with dashed lineswe show the performance of the rate- / irregular LDPC codein [6, Example 3.99] with block length equal to bits. Wealso include simulation results (solid lines) for a convolutional LDPC (LDPCC) code constructed by spatially-coupling independent copies of a (3 , -regular LDPC code, each havingblock length of bits, with low-rate terminations [52]. Theresulting coding rate is . and the total block length is bits. For the irregular LDPC code, at moderate SNR ECis able to provide a signiﬁcant gain, which vanishes at highSNR because of the LDPC error ﬂoor. In contrast, becausethe LDPCC code has large minimum distance, no error ﬂoorhas been observed in the range of SNR considered and EPachieves a stable gain of . dB with respect to GTA. Finally,with dotted lines we include simulation results for a LDPCCcode with the same block length but constructed by spatially-coupling independent copies of a (3 , -regular LDPCcode. The resulting coding rate is . .VII. C ONCLUSIONS

Probabilistic symbol detection is a fundamental problem inhigh-dimensional MIMO communications since the accuracyof the method employed to approach the true posterior solutionmay bring signiﬁcant performance gains when combined witha modern capacity-approaching channel coding scheme. Inthis paper we have shown how the EC approximate infer-ence methodology, when applied to the posterior probabilitydistribution of the transmitted symbols, can lead to accurateestimates of the marginal distribution for each transmittedsymbol. Further, by computing the average per-antenna mu-tual information between the transmitted symbols and thosedistributed according to the EC output, we have shown thatthe system achievable rate heavily depends on the probabilisticdetector accuracy and thus the importance of this stage cannotbe diminished by using a more powerful channel code. Thisis actually corroborated by testing the system performancewhen we combine the probabilistic output of the symboldetectors with an LDPC channel decoder based on beliefpropagation. The presented EC probabilistic MIMO detectorhas cubic complexity with the number of antennas and it isable to greatly improve state-of-the-art methods within only10 iterations, where a matrix inversion has to be performedper iteration. R

EFERENCES[1] J. Mietzner, R. Schober, L. Lampe, W. H. Gerstacker, and P. A. Hoeher,“Multiple-antenna techniques for wireless communications - a com-prehensive literature survey,”

IEEE Communications Surveys Tutorials ,vol. 11, pp. 87–105, June 2009.[2] L. Zheng, P. Viswanath, and D. N. C. Tse, “Diversity and multiplexing: afundamental tradeoff in multiple-antenna channels,”

IEEE Transactionson Information Theory , vol. 49, pp. 1073–1095, May 2003.[3] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limiterror-correcting coding and decoding: turbo-codes,” in

Proc. IEEE In-ternational Conference on Communications, Geneva, Switzerland , May1993.[4] T. J. Richardson and R. Urbanke,

Modern coding theory . CambridgeUniversity Press, 2008.[5] F. R. Kschischang, B. J. Frey, and H. Loeliger, “Factor graphs andthe sum-product algorithm,”

IEEE Transactions on Information Theory ,vol. 47, pp. 498–519, Feb. 2001. LDPCC codes are generated using protographs [53] in order to optimizeits minimum distance, as described in [54]. [6] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, andH. Bolcskei, “VLSI implementation of MIMO detection using the spheredecoding algorithm,” IEEE Journal of Solid-State Circuits , vol. 40,pp. 1566–1577, June 2005.[7] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-bestsphere decoding for MIMO detection,”

IEEE Journal on Selected Areasin Communications , vol. 24, pp. 491–503, March 2006.[8] G. D. Golden, C. J. Foschini, R. Valenzuela, and P. W. Wolniansky,“Detection algorithm and initial laboratory results using V-BLASTspace-time communication architecture,”

Electronics Letters , vol. 35,pp. 14–16, January 1999.[9] T.-h. Liu and Y.-L. Liu, “Modiﬁed fast recursive algorithm for efﬁcientMMSE-SIC detection of the V-BLAST system,”

IEEE Transactions onWireless Communications , vol. 7, pp. 3713–3717, October 2008.[10] H. Zhao, H. Long, and W. Wang, “Tabu Search Detection for MIMOSystems,” in

Proc. IEEE 18th International Symposium on Personal,Indoor and Mobile Radio Communications, Athens, Greece , September2007.[11] N. Srinidhi, T. Datta, A. Chockalingam, and B. S. Rajan, “LayeredTabu Search Algorithm for Large- MIMO Detection and a Lower Boundon ML Performance,”

IEEE Transactions on Communications , vol. 59,pp. 2955–2963, November 2011.[12] Q. Zhou and X. Ma, “Element-Based Lattice Reduction Algorithmsfor Large MIMO Detection,”

IEEE Journal on Selected Areas inCommunications , vol. 31, no. 2, pp. 274–286, 2013.[13] G. Caire, R. R. Muller, and T. Tanaka, “Iterative multiuser joint de-coding: optimal power allocation and low-complexity implementation,”

IEEE Transactions on Information Theory , vol. 50, pp. 1950–1973,September 2004.[14] J. Goldberger, “Improved MIMO Detection based on Successive TreeApproximations,” in

Proc. 2013 IEEE International Symposium onInformation Theory, Istanbul, Turkey , June 2013.[15] S. Yang and L. Hanzo, “Fifty years of MIMO detection: The road tolarge-scale MIMO,”

IEEE Communications Surveys & Tutorials , vol. 17,pp. 1941–1988, April 2015.[16] J. Boutros, N. Gresset, L. Brunel, and M. Fossorier, “Soft-input soft-output lattice sphere decoder for linear channels,” in

Proc. IEEE GlobalCommunications Conference, San Francisco, USA , December 2003.[17] C. Studer, A. Burg, and H. Bolcskei, “Soft-output sphere decoding:algorithms and VLSI implementation,”

IEEE Journal on Selected Areasin Communications , vol. 26, pp. 290–300, February 2008.[18] R. Wang and G. B. Giannakis, “Approaching MIMO channel capacitywith soft detection based on hard sphere decoding,”

IEEE Transactionson Communications , vol. 54, pp. 587–590, April 2006.[19] B. Steingrimsson, Z.-Q. Luo, and K. M. Wong, “Soft quasi-maximum-likelihood detection for multiple-antenna wireless channels,”

IEEETransactions on Signal Processing , vol. 51, pp. 2710–2719, November2003.[20] T. Datta, N. A. Kumar, A. Chockalingam, and B. S. Rajan, “A NovelMonte-Carlo-Sampling-Based Receiver for Large-Scale Uplink Mul-tiuser MIMO Systems,”

IEEE Transactions on Vehicular Technology ,vol. 62, pp. 3019–3038, September 2013.[21] M. Hansen, B. Hassibi, A. G. Dimakis, and W. Xu, “Near-OptimalDetection in MIMO Systems Using Gibbs Sampling,” in

Proc. IEEEGlobal Telecommunications Conference, Hawaii, USA , November 2009.[22] R.-R. Chen, R. Peng, A. Ashikhmin, and B. Farhang-Boroujeny, “Ap-proaching MIMO capacity using bitwise Markov Chain Monte Carlodetection,”

IEEE Transactions on Communications , vol. 58, pp. 423–428, February 2010.[23] Y. Jia, C. Andrieu, R. J. Piechocki, and M. Sandell, “Improving soft out-put quality of MIMO demodulation algorithm via importance sampling,”in

Proc. IEE International Conference on 3G Mobile CommunicationTechnologies, London, UK , 2004.[24] A. Sanderovich, M. Peleg, and S. Shamai, “LDPC coded MIMO multipleaccess with iterative joint decoding,”

IEEE Transactions on InformationTheory , vol. 51, pp. 1437–1450, April 2005.[25] J. Wang and S. Li, “Soft versus hard interference cancellation in MMSEOSIC MIMO detector: A comparative study,” in

Proc. 2007 4th Inter-national Symposium on Wireless Communication Systems, Trondheim,Norway , October 2007.[26] J. Goldberger and A. Leshem, “ MIMO Detection for High-OrderQAM Based on a Gaussian Tree Approximation,”

IEEE Transactionson Information Theory , vol. 57, pp. 4973–4982, August 2011.[27] D. L. Donoho, A. Javanmard, and A. Montanari, “Information-Theoretically Optimal Compressed Sensing via Spatial Coupling andApproximate Message Passing,”

IEEE Transactions on InformationTheory, , vol. 59, pp. 7434–7464, November 2013. [28] T. L. Narasimhan and A. Chockalingam, “Channel Hardening-ExploitingMessage Passing ( CHEMP ) Receiver in Large-Scale MIMO Systems,”

IEEE Journal of Selected Topics in Signal Processing , vol. 8, pp. 847–860, October 2014.[29] L. Liu, C. Yuen, Y. L. Guan, and Y. Li, “Capacity-achieving iterativeLMMSE detection for MIMO-NOMA systems,” in ,May 2016.[30] L. Liu, C. Yuen, Y. L. Guan, Y. Li, and Y. Su, “Convergence analysisand assurance for gaussian message passing iterative detector in massiveMU-MIMO systems,”

IEEE Transactions on Wireless Communications ,vol. 15, pp. 6487–6501, September 2016.[31] C. Jeon, R. Ghods, A. Maleki, and C. Studer, “Optimality of largeMIMO detection via approximate message passing,” in

Proc. 2015 IEEEInternational Symposium on Information Theory, Hong Kong, China ,June 2015.[32] L. Liu, C. Yuen, Y. L. Guan, Y. Li, and C. Huang, “Gaussian MessagePassing Iterative Detection for MIMO-NOMA Systems with MassiveAccess,” in , Dec 2016.[33] J. C´espedes, P. M. Olmos, M. S´anchez-Fern´andez, and F. Perez-Cruz,“Expectation Propagation Detection for High-Order High-DimensionalMIMO Systems,”

IEEE Transactions on Communications , vol. 62,pp. 2840–2849, August 2014.[34] T. P. Minka, “Expectation propagation for approximate Bayesian infer-ence,” in

Proc. of the Seventeenth Coference on Uncertainty in ArtiﬁcialIntelligence, Seattle, USA , August 2001.[35] M. W. Seeger, “Expectation propagation for exponential families,” tech.rep., 2005.[36] J. Cespedes, P. M. Olmos, M. S´anchez-Fern´andez, and F. Perez-Cruz,“Improved performance of LDPC-coded MIMO systems with EP-based soft-decisions,” in

Proc. 2014 IEEE International Symposium onInformation Theory, Hawaii, USA , June 2014.[37] I. Santos, J. J. Murillo-Fuentes, R. Boloix-Tortosa, E. A. de Reyna,and P. M. Olmos, “Expectation Propagation as Turbo Equalizer in ISIChannels,”

IEEE Transactions on Communications , vol. 65, pp. 360–370, January 2017.[38] G. M. Vitetta, D. P. Taylor, G. Colavolpe, F. Pancaldi, and P. A. Martin,

Wireless Communications: Algorithmic Techniques . John Wiley & Sons,Ltd, 2013.[39] M. Opper and O. Winther, “Expectation Consistent Approximate Infer-ence,”

Journal of Machine Learning Research , vol. 6, pp. 2177–2204,December 2005.[40] T. J. Richardson, M. A. Shokrollahi, and R. Urbanke, “Design ofcapacity approaching irregular low-density parity-check codes,”

IEEETransactions on Information Theory , vol. 47, pp. 619–637, February2001.[41] D. J. Costello, Jr., L. Dolecek, T. Fuja, J. Kliewer, D. G. M. Mitchell,and R. Smarandache, “Spatially coupled sparse codes on graphs: theoryand practice,”

IEEE Communications Magazine , vol. 52, pp. 168–176,July 2014.[42] S. Kudekar, T. Richardson, and R. Urbanke, “Spatially Coupled En-sembles Universally Achieve Capacity under Belief Propagation,”

IEEETransactions on Information Theory , vol. 59, pp. 7761–7813, December2013.[43] M. J. Wainwright and M. I. Jordan,

Graphical Models, ExponentialFamilies, and Variational Inference . Foundations and Trends in MachineLearning, 2008.[44] E. Telatar, “Capacity of multi-antenna Gaussian channels,”

EuropeanTransactions on Telecommunication , vol. 10, pp. 585–596, November1999.[45] C. M. Bishop,

Pattern Recognition and Machine Learning (InformationScience and Statistics) . Secaucus, NJ, USA: Springer-Verlag, New York,2006.[46] J. Ketonen, M. Juntti, and J. R. Cavallaro, “Performance-complexitycomparison of receivers for a LTE MIMO OFDM system,”

IEEETransactions on Signal Processing , vol. 58, pp. 3360–3372, June 2010.[47] T. Heskes, “Stable ﬁxed points of loopy belief propagation are minimaof the Bethe free energy,” in

Proc. 2002 Advances in Neural InformationProcessing Systems , vol. 14, MIT Press, 2003.[48] J. M. Mooij and H. J. Kappen, “Sufﬁcient conditions for convergence ofthe sum-product algorithm,”

IEEE Transactions on Information Theory ,vol. 53, pp. 4422–4437, December 2007.[49] G. Elidan, I. McGraw, and D. Koller, “Residual belief propagation:informed scheduling for asynchronous message passing,” in

Proceedingsof the Twenty-second Conference on Uncertainty in Artiﬁcial Intelli-gence, Cambridge, USA , July 2006. [50] P. M. Olmos, J. J. Murillo-Fuentes, and F. P´erez-Cruz, “Joint nonlinearchannel equalization and soft LDPC decoding with Gaussian processes,” IEEE Transactions on Signal Processing , vol. 58, pp. 1183–1192, March2010.[51] A. G. D. Uchoa, R. C. D. Lamare, and C. Healy, “Iterative Detectionand Decoding Algorithms For Block-Fading Channels Using LDPCCodes,” in

Proc. 2014 IEEE Wireless Communications and NetworkingConference, Istanbul, Turkey , April 2014.[52] D. G. M. Mitchell, A. E. Pusane, M. Lentmaier, and D. J. Costello, Jr.,“Exact Free Distance and Trapping Set Growth Rates for LDPC Convo-lutional Codes,” in

Proc. IEEE International Symposium on InformationTheory, St. Petersburg, Russia , 2011.[53] J. Thorpe, “Low-Density Parity-Check ( LDPC ) codes constructed fromprotographs,” INP Progress Report 42-154, Jet Propulsion Laboratory,Pasadena, CA, 2003.[54] D. Mitchell, M. Lentmaier, and D. J. Costello, Jr., “Spatially CoupledLDPC Codes Constructed From Protographs,”

IEEE Transactions onInformation Theory , vol. 61, pp. 4866–4889, September 2015. (a)(b)

Fig. 3. We represent ∆ u and ∆ u for an × scenario with QPSKmodulation at a SNR of dB, averaged over realizations of both thechannel matrix H and received vector y . (a) (c)(b) (d) Fig. 4. In (a)-(b), we represent ∆ u and ∆ u for an × scenario with QPSK modulation at a SNR of dB, averaged over realizations of both thechannel matrix H and received vector y . In (c)-(d), we reproduce the results for an × scenario with 64-QAM modulation at a SNR of dB. (a)(b) Fig. 5. For an × MIMO system with QPSK modulation, in (a) we showthe achievable transmission rates. In (b), we include simulated performancewhen a (3 , -regular LDPC code with block length bits is used. (a) (b)(c) (d) Fig. 6. Transmission rate computed for an m = r = 32 MIMO system and different constellation orders with QPSK modulation (a), -QAM modulation(b), -QAM modulation (c) and -QAM modulation (d). (a) (b) Fig. 7. For an × m MIMO system with -QAM modulation, in (a) we show the achievable transmission rates for different m values. In (b), we includesimulated performance when a (3 , -regular LDPC code with block length bits is used. Fig. 8. System performance of an ×

32 16 -QAM using the irregular rate- / LDPC code in [6, Example 3.99] with (dashed lines) block length bits, a (3 , -regular LDPC convolutional code (solid lines) with the same block-length and coding rate . , and a (3 , -regular LDPC convolutionalcode (dotted lines) with the same block-length and coding rate .8698