Performance Analysis of Quantized Uplink Massive MIMO-OFDM With Oversampling Under Adjacent Channel Interference

Abstract

Massive multiple-input multiple-output (MIMO) systems have attracted much attention lately due to the many advantages they provide over single-antenna systems. Owing to the many antennas, low-cost implementation and low power consumption per antenna are desired. To that end, massive MIMO structures with low-resolution analog-to-digital converters (ADCs) have been investigated in many studies. However, the effect of a strong interferer in the adjacent band on quantized massive MIMO systems has not been examined yet. In this study, we analyze the performance of uplink massive MIMO with low-resolution ADCs under frequency-selective fading with orthogonal frequency division multiplexing in the perfect and imperfect receiver channel state information cases. We derive analytical expressions for the bit error rate and ergodic capacity. We show that the interfering band can be suppressed by increasing the number of antennas or the oversampling rate when a zero-forcing receiver is employed.



Ali Bulut Üçüncü, Student Member, IEEE, Emil Björnson, Senior Member, IEEE, Håkan Johansson, Senior Member, IEEE, Ali Özgür Yılmaz, Member, IEEE, and Erik G. Larsson, Fellow, IEEE


Index Terms

Oversampling, analog-to-digital converter, interferer, massive MIMO, analysis, quantization, one-bit, low-resolution, uplink.

This version of the paper has been accepted for publication in IEEE Transactions on Communications. The digital object identifier of the accepted paper is 10.1109/TCOMM.2019.2954512.

The work of A. B. Üçüncü was supported by the ASELSAN Graduate Scholarship for Turkish Academicians.

A. B. Üçüncü and A. Ö. Yılmaz are with the Department of Electrical and Electronics Engineering, Middle East Technical University, 06800, Ankara, Turkey (e-mail: {ucuncu, aoyilmaz}@metu.edu.tr).

E. Björnson, H. Johansson and E. G. Larsson are with the Department of Electrical Engineering (ISY), Linköping University, SE-581 83 Linköping, Sweden (e-mail: {emil.bjornson, hakan.johansson, erik.g.larsson}@liu.se).

I. INTRODUCTION

Massive multiple-input multiple-output (MIMO) systems provide high spectral and energy efficiency with relatively simple signal processing, promoting their utilization in modern communication systems [1]–[4]. However, due to the large number of antennas, low-cost hardware and low power consumption per antenna may be necessary for feasibility reasons [5], [6].

With regard to the employment of low-cost and power-efficient hardware, low-resolution analog-to-digital converters (ADCs) can be beneficial [7]–[9]. It is known that the power consumption of ADCs increases by 2 to 4 times with each additional bit of resolution [10]. Among the low-resolution ADCs, the one-bit ADC has the lowest possible complexity (it is composed of a single comparator, and no automatic gain control (AGC) unit is needed) and power consumption, which is why it has been analyzed widely in the massive MIMO literature [8], [9], [11]–[19]. In particular, [8], [9], [11]–[13], [16], [17] analyze uplink massive MIMO systems with one-bit or low-resolution ADCs in terms of capacity, error rate and achievable rate, and propose channel estimation algorithms. Temporal oversampling for such systems is proposed in [18]–[21] for single-carrier transmission, along with a performance analysis. Many other studies have proposed optimal or near-optimal non-linear detectors for uplink massive MIMO systems with low-resolution ADCs (e.g., [14], [15], [22]). Another study [7] has analyzed the aggregate effect of an ADC, a non-linear power amplifier, and phase noise on uplink massive MIMO with orthogonal frequency division multiplexing (OFDM).

None of the aforementioned studies examines the performance of heavily quantized massive MIMO with an interferer in an adjacent channel. However, such interference can be at significant levels due to the near/far effect in a communication system in which users in the adjacent frequency band may be much closer to the receiver than the users in the desired band; thus, their signal may not be adequately suppressed by the receivers intending to extract the signals in the desired band. In fact, having the dynamic range to mitigate such interference is a key reason for using high-resolution ADCs in current systems [23]. Since distortion is large with low-resolution ADCs, there is a risk that such systems are practically nonoperational.

This study is the first to analyze heavily quantized and OFDM-modulated massive MIMO systems for an adjacent channel interference (ACI) scenario under frequency-selective fading and channel estimation errors. It is also the first to analyze the performance of oversampling ADCs for such a scenario.

Beyond the investigated scenario being different from the existing studies in the literature, this work differs from the aforementioned studies on the analysis of quantized uplink massive MIMO systems with low-resolution ADCs as follows. In [18]–[20], temporal oversampling and the corresponding performance analysis are carried out for an uplink massive MIMO system with one-bit ADCs in a single-carrier environment with flat fading, which results in a high-complexity receiver in terms of baseband signal processing, whereas our study considers an OFDM system under a frequency-selective channel with low-resolution ADCs (not only one-bit), whose receiver complexity changes almost linearly with oversampling. Another study [13] analyzes massive MIMO structures with one-bit ADCs under frequency-selective fading. In that work, quantization noise is regarded as a distortion that is uncorrelated in time and space, which fails to hold when oversampling is performed or when the number of users or the noise variance is not high [12], [24]. In contrast, our analysis takes into account the temporal and spatial correlation in the quantization distortion, which enables an accurate analysis.

Moreover, [12] provides an analysis for flat-fading channels for massive MIMO structures with one-bit ADCs and provides a short section on the frequency-selective channel case, claiming that the extension from the flat-fading case is straightforward. However, the sizes of the covariance matrices found for the quantized received signal in the frequency-selective case are $MN \times MN$, $M$ and $N$ being the number of antennas and the block length (the number of samples in a coherence interval or in the pilot duration). This large size makes their use in the performance analysis and channel estimation (the covariance matrix inverse is used in channel estimation) infeasible in terms of computational complexity (even their storage in memory during simulations is problematic) unless the block length or the number of antennas is very small, which is not the case for massive MIMO. There is a similar complexity problem in [24], where the matrix sizes involved in the calculation of the performance metrics are as large as $MN \times MN$.

This problem is addressed in [7] by combining frequency-domain operation with time-domain operation for the calculation of the necessary covariance matrices. However, many important points regarding the construction of the signal model using the Bussgang decomposition, as well as proofs regarding how to find correlation matrices in the frequency domain from time-domain matrices or vice versa, are omitted. Furthermore, the calculation of the quantization noise covariance matrix in [7, Eqn. 28] is valid only when the quantization noise is uncorrelated over the time dimension, which does not hold for very low-resolution ADCs. Although the quantization noise covariance matrix calculation of [7] is shown to provide accurate results in [7], this is mostly because the ADC bit resolution investigated in [7] (6 bits) is rather high. The more general version, which takes the time-domain correlation into account, is provided in Proposition 2 in this work. Moreover, the effect of system parameters such as the number of receive antennas or the oversampling rate cannot be deduced from the signal-to-interference-noise-and-distortion ratio (SINDR) expressions in [7], [24], whereas we provide approximate expressions for the SINDR in Proposition 4, in which the effect of system parameters can easily be followed. Finally, the analysis in [7], [24] is only performed for the perfect channel state information (CSI) case, whereas we propose a channel estimation algorithm and include the effect of imperfect CSI in this study.

In summary, the contributions are as follows.

• This study is the first to analyze the performance of uplink massive MIMO systems with low-resolution ADCs in terms of error rate and ergodic capacity under an ACI scenario. The analysis covers frequency-selective fading channel conditions. We obtain two types of analytical expressions for the SINDR, one being more precise, whose accuracy is verified with simulations, and the other being less accurate but able to provide clear insights into the system performance and parameters. We show both analytically and with simulations that it is possible to combat ACI by increasing the number of receive antennas.
• We analyze the effect of oversampling in such systems and show analytically that oversampling is also effective in suppressing ACI. We also show, through both simulations and theoretical analysis, that significant performance gains can be obtained by oversampling.
• We propose a linear minimum mean square error (LMMSE) based channel estimation algorithm that takes into account the effect of an adjacent channel interferer. The provided analysis is able to incorporate the effect of imperfect CSI on the system performance.
• We extend our analysis to multi-bit ADCs and discuss whether to increase the ADC resolution or the oversampling rate by making comparisons in terms of error-rate performance while the ADC power consumption is kept constant.
• The analysis in the paper is general in that it can also be applied to the scenario where no ACI is present. For the no-ACI case, the analysis in the paper
  – requires much less memory than the ones in [12], [24] for SINDR calculations while providing closed-form expressions for the quantization noise covariance matrix for one-bit ADCs as in (19) and (20) (more details on this item are given above),
  – takes into account the temporal and spatial correlation of the quantization noise, which is neglected in [7], [13], among which [13] does not cover the effect of oversampling,
  – can clearly show the impact of system parameters (such as the number of antennas, the oversampling rate, etc.) and imperfect CSI on the system performance, unlike [7], [24], where no channel estimation technique is proposed.

[Figure: block diagram. Blocks shown for each user in the desired band and in the interfering band: Data Symbols, S/P, IDFT, P/S, Up-Conversion. The signals pass through the Multipath Channel. Blocks shown for each receive antenna (desired band): Down-Conversion, ADC, S/P, DFT, P/S, followed by Channel Equalization/Data Detection.]

Fig. 1: Multi-user uplink massive MIMO-OFDM block diagram in an interfering band scenario.

A preliminary version of a subset of the mentioned contributions is to appear as a conference paper [25].

Notation: $c$ is a scalar, $\mathbf{c}$ is a column vector, $\mathbf{C}$ is a matrix, and $\mathbf{C}^H$, $\mathbf{C}^T$, and $\mathbf{C}^*$ represent the Hermitian, transpose, and conjugate of the matrix $\mathbf{C}$, respectively. $(\mathbf{C})_{m,n}$ stands for the element of the matrix $\mathbf{C}$ at its $m$th row and $n$th column, and $|\mathcal{G}|$ represents the cardinality of the set $\mathcal{G}$. $E[\cdot]$ takes the expectation of its operand. $\Re(\cdot)$ and $\Im(\cdot)$ take the real and imaginary parts of their operands, and $j = \sqrt{-1}$. $\|\cdot\|$ corresponds to the Euclidean norm. $\mathbf{0}_K$ and $\mathbf{I}_K$ are the zero and identity matrices of size $K \times K$, and $\mathrm{diag}(\mathbf{C})$ is the diagonal matrix whose diagonal entries are equal to the diagonal entries of $\mathbf{C}$. Moreover, $\mathrm{Tr}[\cdot]$ is the trace operator. Furthermore, $\Phi(\cdot)$ is the standard normal cumulative distribution function (CDF), that is, $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz$.

II. SYSTEM MODEL

We consider the uplink scenario depicted in Fig. 1. It is assumed that $U$ users send their information to a base station in an OFDM massive MIMO setting through a set of subcarriers that are assigned to them. This set of subcarriers will be denoted by $\mathcal{K}_D$ and will be referred to as the desired band in this study. The desired band users are illustrated with a green background in Fig. 1. The receiver side in Fig. 1 is a typical OFDM receiver, in which the ADC block is assumed to have low resolution in this study. Another group of $I$ users, whose assigned set of subcarriers is denoted by $\mathcal{K}_I$, acts as an interfering source to the users in the desired band. These users are shaded with a red background in Fig. 1. The set of subcarriers in $\mathcal{K}_I$ will also be referred to as the interfering band in the remainder of this study.

[Figure: four spectrum plots showing the desired band and the interfering band: at the RF filter input, at the mixer output (before the low-pass filter), the DTFT after sampling (axis in rad/sample, including a replica of the desired band), and the DFT after sampling (axis in subcarrier index).]

Fig. 2: Example plots for the spectrum of the signals at various receiver stages.

It should be noted that although the interference is from an adjacent band, it may not be suppressed enough due to the near/far effect despite all the analog filters involved in the down-conversion stages. Example plots for the spectrum of the signals at various points of a zero intermediate frequency (IF) receiver are provided in Fig. 2, where $f_c$ and $f_i$ are the carrier frequencies for the signals in the desired and interfering bands, respectively. The top left and top right plots in Fig. 2 represent the power spectral densities (PSDs) of the signals at the radio-frequency (RF) front end (just before the bandpass RF filter centered at $f_c$, where the interfering band signal is much stronger than the desired band signal) and at the mixer output (before the low-pass filter (LPF) in the down-converter), respectively. Such a scenario can be considered a typical long-term evolution (LTE) case in which different users are assigned to rectangular areas of resource blocks or subbands [26]. The discrete-time Fourier transform (DTFT) and discrete Fourier transform (DFT) of the sampled (but unquantized) signal are also shown in the bottom two plots in Fig. 2, where $\omega_c = 2\pi f_c/F_s$, $\omega_i = 2\pi f_i/F_s$ and $S = N(f_i - f_c)/(2F_s)$, $F_s$ being the sampling rate.

Moreover, it is assumed that the receiver samples fast enough to cover both the desired and interfering bands, so that no interference due to aliasing from the interfering band to the desired band occurs. This allows us to study in isolation the spectral leakage from the interfering band to the desired band caused by the non-linearity of low-resolution ADCs; ACI due to aliasing would occur even without any non-linearity and is not the focus of this work. The sets of users in the desired and interfering bands are denoted by $\mathcal{U}_D = \{1, 2, \ldots, U\}$ and $\mathcal{U}_I = \{U+1, U+2, \ldots, U+I\}$, respectively.

A. Signal Model

We denote the complex data symbol of a user $u$ transmitted at the $k$th subcarrier by $\tilde{s}_u[k]$, $k = 0, 1, \ldots, N-1$, where $N$ is the DFT size. Not all subcarriers are occupied, that is, $\tilde{s}_u[k] = 0$ for $k \notin \mathcal{K}_D$ when $u \in \mathcal{U}_D$ or $k \notin \mathcal{K}_I$ when $u \in \mathcal{U}_I$, as shown in the bottom right of Fig. 2. The oversampling rates for the users in the desired and interfering bands, $\beta_D$ and $\beta_I$, are defined as the ratio of the total number of subcarriers $N$ to the number of occupied subcarriers, that is, $\beta_D \triangleq N/|\mathcal{K}_D|$ and $\beta_I \triangleq N/|\mathcal{K}_I|$. Increasing $N$ while $|\mathcal{K}_D|$ or $|\mathcal{K}_I|$ is fixed is termed "oversampling" since we consider the case in which the OFDM symbol duration $NT_s$ is fixed, where $T_s$ is the sampling period, requiring that the sampling rate ($1/T_s$) be increased as $N$ is increased. This also ensures that the transmission bandwidth of the desired channel users is kept the same when the oversampling rate $\beta_D$ is increased, as the subcarrier spacing $1/(NT_s)$ and the number of occupied subcarriers $|\mathcal{K}_D|$ are fixed, which results in a fixed transmission bandwidth of $|\mathcal{K}_D|/(NT_s)$.

Following those definitions, the discrete-time signal of the $u$th user at the inverse DFT (IDFT) output, $s_u[n]$, can be expressed as

$$ s_u[n] = \begin{cases} \dfrac{\rho_d}{\sqrt{N}} \sum_{k \in \mathcal{K}_D} \tilde{s}_u[k]\, e^{j2\pi nk/N} & \text{if } u \in \mathcal{U}_D, \\[2mm] \dfrac{\rho_i}{\sqrt{N}} \sum_{k \in \mathcal{K}_I} \tilde{s}_u[k]\, e^{j2\pi nk/N} & \text{if } u \in \mathcal{U}_I, \end{cases} \tag{1} $$

for $n = 0, 1, \ldots, N-1$. Here, $\rho_d$ and $\rho_i$ are the average transmit power parameters for the desired and interfering band users, respectively. Moreover, data symbols have unit energy, that is, $E[|\tilde{s}_u[k]|^2] = 1$. For simple equalization at the receiver side, a cyclic prefix (CP) of length $L_{cp}$ is added to the beginning of the OFDM symbol such that $s_u[n] = s_u[N+n]$ for $n = -L_{cp}+1, \ldots, -1$.
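As a minimal sketch of (1) and of the cyclic prefix, the following standard-library Python snippet builds one OFDM block for a single desired-band user. The function name `ofdm_block`, the toy sizes, and the QPSK mapping are illustrative assumptions and not part of the paper; the prefix here simply copies the last `l_cp` samples of the block.

```python
import cmath
import math


def ofdm_block(freq_symbols, n_fft, rho, l_cp):
    """Return the CP-extended time-domain block built from (1).

    freq_symbols: dict mapping occupied subcarrier index k -> data symbol
    n_fft:        DFT size N
    rho:          amplitude parameter (rho_d or rho_i in the text)
    l_cp:         cyclic prefix length (toy value, an assumption)
    """
    scale = rho / math.sqrt(n_fft)
    body = []
    for n in range(n_fft):
        acc = 0j
        for k, sym in freq_symbols.items():
            acc += sym * cmath.exp(2j * math.pi * n * k / n_fft)
        body.append(scale * acc)
    # Prepend the last l_cp samples so that s[n] = s[N + n] on the prefix.
    prefix = body[n_fft - l_cp:] if l_cp > 0 else []
    return prefix + body


# Toy setup (all values are illustrative assumptions).
N = 16
K_D = [0, 1, 2, 3]
L_CP = 3
RHO_D = 1.0
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
symbols = {k: qpsk[k % 4] / math.sqrt(2) for k in K_D}  # unit-energy symbols

block = ofdm_block(symbols, N, RHO_D, L_CP)
beta_d = N / len(K_D)  # oversampling rate beta_D = N / |K_D|
avg_power = sum(abs(x) ** 2 for x in block[L_CP:]) / N
```

With unit-energy symbols, the average sample power of the block without its prefix equals $\rho_d^2 |\mathcal{K}_D| / N$, so doubling $N$ at fixed $|\mathcal{K}_D|$ halves the per-sample power while the occupied bandwidth stays the same.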
It is required that $L_{cp} \geq L - 1$, $L$ being the number of channel taps. (Footnote 1: $L$ will be increased when the sampling rate ($1/T_s$) is increased, as the delay spread of the channels does not change with the sampling rate.)

After the IDFT, the up-conversion block forms the OFDM-modulated continuous-time signal at the carrier frequency, whose baseband equivalent expression for user $u$ is

$$ s_u(t) = \sum_{n=-L+1}^{N-1} s_u[n]\, p(t - nT_s), \tag{2} $$

where $p(t) = \mathrm{sinc}(t/T_s)$ and $-L_{cp}T_s < t < NT_s$. In (2), the effect of the neighboring OFDM frames (blocks) is omitted, as inter-block interference will not be present when the CP is discarded at the receiver. Although the bandwidth of $p(t)$ seems to increase with the sampling rate, the bandwidth of $s_u(t)$ is limited to $|\mathcal{K}_D|/(NT_s)$ or $|\mathcal{K}_I|/(NT_s)$ due to the correlation statistics of $s_u[n]$ in time. The baseband equivalent received signal at the $m$th antenna can be expressed as

$$ r_m(t) = \sum_{u=1}^{U+I} h'_{m,u}(t) * s_u(t) + w_m(t), \tag{3} $$

where $*$ represents the convolution operation and $w_m(t)$ is the filtered additive white complex Gaussian noise process at the $m$th antenna. Moreover, $h'_{m,u}(t)$ is the continuous-time channel impulse response between the $u$th user and the $m$th antenna. In writing (3), it is assumed that the bandpass RF filter and the LPF after the mixer have a flat response over the frequency bands $[f_c - F_s/2,\ f_c + F_s/2]$ and $[-F_s/2,\ F_s/2]$, respectively.
Assuming that the sampling rate of the receiver is the same as that of the transmitter, (2) and (3) imply that the discrete-time received signal at the $m$th antenna, namely $r_m[n] = r_m(nT_s)$, can be expressed as follows:

$$ r_m[n] = r_m(nT_s) = \sum_{u=1}^{U+I} \sum_{\ell=0}^{L-1} h_{m,u}[\ell]\, s_u[n-\ell] + w_m[n], \tag{4} $$

where $w_m[n] = w_m(nT_s)$ and $h_{m,u}[\ell] = h_{m,u}(\ell T_s)$, in which $h_{m,u}(t) \triangleq h'_{m,u}(t) * \mathrm{sinc}(t/T_s)$. We assume that the channel coefficients $h_{m,u}[\ell]$ have a complex Gaussian distribution and are uncorrelated, that is, $E[h_{m,u}[\ell]\, h_{m,u}[\ell']^*] = p[\ell]\,\delta[\ell - \ell']$, as in the related studies [7], [13], [24]. Here, $p[\ell]$ is the power delay profile of the channel, satisfying $\sum_{\ell=0}^{L-1} p[\ell] = 1$. A more compact version of (4) is

$$ \mathbf{r}[n] = \sum_{\ell=0}^{L-1} \mathbf{H}[\ell]\, \mathbf{s}[n-\ell] + \mathbf{w}[n], \tag{5} $$

where $\mathbf{H}[\ell]$ is an $M \times (U+I)$ matrix whose element at the $m$th row and the $u$th column is $h_{m,u}[\ell]$. The elements of $\mathbf{H}[\ell]$ are also assumed to be uncorrelated. Moreover, $\mathbf{s}[n]$ is a column vector whose $u$th element is $s_u[n]$. Furthermore, $\mathbf{w}[n]$ and $\mathbf{r}[n]$ are column vectors whose $m$th elements are equal to $w_m[n]$ and $r_m[n]$, respectively.

The signal at the output of the one-bit quantizer (the case of higher bit resolutions is presented in Section IV), namely $\mathbf{d}[n]$, can be written in terms of its input $\mathbf{r}[n]$ as

$$ \mathbf{d}[n] = \mathrm{sign}(\Re(\mathbf{r}[n])) + j\, \mathrm{sign}(\Im(\mathbf{r}[n])), \tag{6} $$

where $\mathrm{sign}(\cdot)$ is the sign function. The DFT of the quantizer output is then taken to obtain the input signal to the channel equalization and data detection block, namely $\tilde{\mathbf{d}}[k]$, as follows:

$$ \tilde{\mathbf{d}}[k] = \sum_{n=0}^{N-1} \mathbf{d}[n]\, e^{-j2\pi nk/N}. \tag{7} $$

How to obtain the data estimates based on $\tilde{\mathbf{d}}[k]$ will be considered in the next section.

III. PERFORMANCE ANALYSIS

For a tractable analysis of the non-linear system with one-bit ADCs, the Bussgang decomposition [12], which enables a linear input-output relation for a non-linear system, will be employed. Before that, it is necessary to re-express (4) as

$$ \underline{\mathbf{r}} = \underline{\mathbf{H}}\, \underline{\mathbf{s}} + \underline{\mathbf{w}}, \tag{8} $$

where $\underline{\mathbf{r}} = [\mathbf{r}[N-1]^T\ \mathbf{r}[N-2]^T\ \cdots\ \mathbf{r}[0]^T]^T$, $\underline{\mathbf{s}} = [\mathbf{s}[N-1]^T\ \mathbf{s}[N-2]^T\ \cdots\ \mathbf{s}[0]^T]^T$, $\underline{\mathbf{w}} = [\mathbf{w}[N-1]^T\ \mathbf{w}[N-2]^T\ \cdots\ \mathbf{w}[0]^T]^T$ (underlined symbols denote quantities stacked over the $N$ time samples), and $\underline{\mathbf{H}}$ is a block circulant matrix of size $NM \times N(U+I)$ that can be expressed as follows:

$$ \underline{\mathbf{H}} = \begin{bmatrix}
\mathbf{H}[0] & \mathbf{H}[1] & \cdots & \mathbf{H}[L-1] & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{H}[0] & \mathbf{H}[1] & \cdots & \mathbf{H}[L-1] & \cdots & \mathbf{0} \\
\vdots & & \ddots & \ddots & & \ddots & \vdots \\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}[0] & \mathbf{H}[1] & \cdots & \mathbf{H}[L-1] \\
\mathbf{H}[L-1] & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}[0] & \cdots & \mathbf{H}[L-2] \\
\vdots & \ddots & & & & \ddots & \vdots \\
\mathbf{H}[2] & \cdots & \mathbf{H}[L-1] & \mathbf{0} & \cdots & \mathbf{H}[0] & \mathbf{H}[1] \\
\mathbf{H}[1] & \mathbf{H}[2] & \cdots & \mathbf{H}[L-1] & \cdots & \mathbf{0} & \mathbf{H}[0]
\end{bmatrix}. \tag{9} $$

This is the MIMO extension of the circulant channel matrix defined for a single-input single-output (SISO) OFDM scenario in [27]. According to the Bussgang decomposition [12], [28],

$$ \underline{\mathbf{d}} = \underline{\mathbf{A}}\, \underline{\mathbf{r}} + \underline{\mathbf{q}}, \tag{10} $$

where $\underline{\mathbf{q}}$ is the equivalent quantization noise vector, $\underline{\mathbf{d}} = [\mathbf{d}[N-1]^T\ \mathbf{d}[N-2]^T\ \cdots\ \mathbf{d}[0]^T]^T$, and $\underline{\mathbf{A}} = \mathbf{C}_{\underline{d}\,\underline{r}}^H \mathbf{C}_{\underline{r}}^{-1}$, where $\mathbf{C}_{\underline{r}} = E[\underline{\mathbf{r}}\, \underline{\mathbf{r}}^H]$ and $\mathbf{C}_{\underline{d}\,\underline{r}}^H = E[\underline{\mathbf{d}}\, \underline{\mathbf{r}}^H]$. The size of $\underline{\mathbf{A}}$ is $NM \times NM$. This choice of $\underline{\mathbf{A}}$ minimizes the variance of the quantizer noise or, equivalently, makes $\underline{\mathbf{r}}$ uncorrelated with $\underline{\mathbf{q}}$. For a one-bit quantizer, assuming zero-mean Gaussian inputs (see Footnote 2), the following holds [12]:

$$ \underline{\mathbf{A}} = \sqrt{\frac{4}{\pi}}\, \mathrm{diag}(\mathbf{C}_{\underline{r}})^{-0.5} = \sqrt{\frac{4}{\pi}}\, \mathrm{diag}\left( \underline{\mathbf{H}} \mathbf{R}_{\underline{s}} \underline{\mathbf{H}}^H + N_o \mathbf{I} \right)^{-0.5}, \tag{11} $$

where $\mathbf{R}_{\underline{s}} = E[\underline{\mathbf{s}}\, \underline{\mathbf{s}}^H]$.

(Footnote 2: This assumption is approximately true even when the transmitted symbols are from a finite-cardinality set rather than being Gaussian distributed. The ADC input at the $m$th antenna can be written as the sum of $S_{mU}$ and $S_{mI}$, where $S_{mU}$ is the sum of $U|\mathcal{K}_D|L$ independent identically distributed (i.i.d.) signals with finite variance from the desired band, and $S_{mI}$ is the sum of $I|\mathcal{K}_I|L$ i.i.d. signals from the interfering band. Due to the central limit theorem, $S_{mU}$ and $S_{mI}$ converge to Gaussian as $U|\mathcal{K}_D|L$ and $I|\mathcal{K}_I|L$ grow large, and so does the ADC input $S_{mU} + S_{mI}$. It can be shown by the Berry–Esseen inequality that the difference between the CDFs of $S_{mU}$ or $S_{mI}$ and the CDF of a Gaussian random variable with the same mean and variance is always less than 0.02, even when $U$ and $I$ are as low as […], $L = 3$ and $|\mathcal{K}_D|$ or $|\mathcal{K}_I|$ is […]. Since CDFs are between 0 and 1, an error of 0.02 is negligible. For higher $I|\mathcal{K}_I|L$ or $U|\mathcal{K}_D|L$, this error will be much smaller (the error decreases with $\sqrt{U|\mathcal{K}_D|L}$ or $\sqrt{I|\mathcal{K}_I|L}$).)

Although a closed-form expression as in (11) is available to calculate the matrix $\underline{\mathbf{A}}$, its calculation is not simple: its complexity grows polynomially in $NM$, which is very large for a typical massive MIMO scenario. Therefore, an alternative low-complexity approach will be presented. Since $\underline{\mathbf{A}}$ is a diagonal matrix, (10) can be modified as

$$ \mathbf{d}[n] = \mathbf{A}[n]\, \mathbf{r}[n] + \mathbf{q}[n], \tag{12} $$

for $n = 1, \ldots, N$, where $\mathbf{A}[n]$ is a diagonal matrix whose diagonal elements are equal to a set of diagonal elements of $\underline{\mathbf{A}}$, namely $[(\underline{\mathbf{A}})_{n,n}\ (\underline{\mathbf{A}})_{n+1,n+1}\ \cdots\ (\underline{\mathbf{A}})_{n+M-1,n+M-1}]$. Moreover, as $\mathbf{r}[n]$ is stationary owing to the addition of the cyclic prefix and $\mathbf{A}[n]$ is diagonal, it can be shown that $\mathbf{A}[n] = \mathbf{A}[n'] \triangleq \mathbf{A}$, $\forall n, n'$, thus

$$ \mathbf{d}[n] = \mathbf{A}\, \mathbf{r}[n] + \mathbf{q}[n], \tag{13} $$

where $\mathbf{A} = \sqrt{4/\pi}\, \mathrm{diag}(\mathbf{C}_r[0])^{-0.5}$ is an $M \times M$ matrix, in which $\mathbf{C}_r[m] \triangleq E[\mathbf{r}[n]\, \mathbf{r}[n-m]^H]$. It is still hard to find $\mathbf{C}_r[0]$ in the time domain using (5), as the time-domain symbols $\mathbf{s}[n]$ are correlated due to oversampling. It is also more convenient to work in the frequency domain, as the final detection of the data symbols is performed in that domain; thus, any SINDR expression to be used in the analysis should be found for the frequency-domain observations. Therefore, the analysis continues by taking the DFT of both sides of (13), yielding

$$ \tilde{\mathbf{d}}[k] = \mathbf{A}\, \tilde{\mathbf{r}}[k] + \tilde{\mathbf{q}}[k], \tag{14} $$

where $\tilde{\mathbf{r}}[k]$ has a simple expression that can be found by using (5) and the circulant property of the channel convolution matrix due to the addition of the cyclic prefix as

$$ \tilde{\mathbf{r}}[k] = \begin{cases} \rho_d \sqrt{N}\, \tilde{\mathbf{H}}[k]\, \tilde{\mathbf{s}}[k] + \tilde{\mathbf{w}}[k], & \text{if } k \in \mathcal{K}_D, \\ \rho_i \sqrt{N}\, \tilde{\mathbf{H}}[k]\, \tilde{\mathbf{s}}[k] + \tilde{\mathbf{w}}[k], & \text{if } k \in \mathcal{K}_I, \\ \mathbf{0}, & \text{otherwise}, \end{cases} \tag{15} $$

where $\tilde{\mathbf{s}}[k] \triangleq [\tilde{s}_1[k]\ \tilde{s}_2[k]\ \cdots\ \tilde{s}_{U+I}[k]]^T$, $k = 0, 1, \ldots, N-1$.
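The Bussgang gain in (13) can be checked numerically for a single antenna. The snippet below is a minimal Monte Carlo sketch using only the Python standard library; the function names, the input variance of 3.0, and the sample count are illustrative assumptions. It draws a scalar $r \sim \mathcal{CN}(0, \sigma^2)$, applies the one-bit quantizer of (6), and compares the empirical gain $E[d\, r^*]/E[|r|^2]$ with $\sqrt{4/(\pi\sigma^2)}$, which is the scalar form of $\sqrt{4/\pi}\, \mathrm{diag}(\mathbf{C}_r[0])^{-0.5}$.

```python
import math
import random


def one_bit(r):
    """One-bit quantizer of (6) applied to a complex scalar."""
    re = 1.0 if r.real >= 0 else -1.0
    im = 1.0 if r.imag >= 0 else -1.0
    return complex(re, im)


def bussgang_check(sigma2, num_samples=100_000, seed=1):
    """Monte Carlo estimates for a scalar input r ~ CN(0, sigma2)."""
    rng = random.Random(seed)
    std = math.sqrt(sigma2 / 2.0)  # standard deviation per real dimension
    a_theory = math.sqrt(4.0 / (math.pi * sigma2))
    sum_dr = 0j   # accumulates d * conj(r)
    sum_rr = 0.0  # accumulates |r|^2
    sum_qr = 0j   # accumulates q * conj(r), with q = d - a_theory * r
    sum_qq = 0.0  # accumulates |q|^2
    for _ in range(num_samples):
        r = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        d = one_bit(r)
        q = d - a_theory * r
        sum_dr += d * r.conjugate()
        sum_rr += abs(r) ** 2
        sum_qr += q * r.conjugate()
        sum_qq += abs(q) ** 2
    return {
        "a_theory": a_theory,
        "a_estimate": (sum_dr / sum_rr).real,
        "corr_q_r": abs(sum_qr) / num_samples,
        "power_q": sum_qq / num_samples,
    }


result = bussgang_check(sigma2=3.0)
```

Under the Gaussian-input assumption, the estimated gain should be close to the theoretical one, the residual correlation between $q$ and $r$ should be close to zero, and the quantization noise power should be close to $2 - 4/\pi$, the constant that also appears in the denominator of (32).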
Moreover, $\tilde{\mathbf{r}}[k] = \sum_{n=0}^{N-1} \mathbf{r}[n]\, e^{-j2\pi nk/N}$, $\tilde{\mathbf{H}}[k] = \sum_{n=0}^{N-1} \mathbf{H}[n]\, e^{-j2\pi nk/N}$, and $\tilde{\mathbf{w}}[k] = \sum_{n=0}^{N-1} \mathbf{w}[n]\, e^{-j2\pi nk/N}$. Next, we define $\mathbf{C}_{\tilde{r}}[k] \triangleq E[\tilde{\mathbf{r}}[k]\, \tilde{\mathbf{r}}[k]^H]$. Since $E[\tilde{\mathbf{r}}[k]\, \tilde{\mathbf{r}}[k']^H] = \mathbf{0}$ for $k \neq k'$, $\mathbf{C}_r[m]$ can be found according to the following proposition.

Proposition 1. $\mathbf{C}_r[m]$ can be computed from $\mathbf{C}_{\tilde{r}}[k]$ as

$$ \mathbf{C}_r[m] = \frac{1}{N^2} \sum_{k=0}^{N-1} \mathbf{C}_{\tilde{r}}[k]\, e^{j2\pi mk/N}. \tag{16} $$

Proof. See Appendix A. ∎

$\mathbf{C}_r[m]$ can thus be calculated by taking the IDFT of $\mathbf{C}_{\tilde{r}}[k]$, which can be found using (15) as

$$ \mathbf{C}_{\tilde{r}}[k] = \begin{cases} \rho_d^2 N\, \tilde{\mathbf{H}}[k]\, \tilde{\mathbf{H}}[k]^H + N N_o \mathbf{I}, & \text{if } k \in \mathcal{K}_D, \\ \rho_i^2 N\, \tilde{\mathbf{H}}[k]\, \tilde{\mathbf{H}}[k]^H + N N_o \mathbf{I}, & \text{if } k \in \mathcal{K}_I, \\ \mathbf{0}, & \text{otherwise}. \end{cases} \tag{17} $$

What remains is to find the covariance matrix of the quantization distortion $\tilde{\mathbf{q}}[k]$, namely $\mathbf{C}_{\tilde{q}}[k] = E[\tilde{\mathbf{q}}[k]\, \tilde{\mathbf{q}}[k]^H]$. Consider the stacked quantization noise vector $\underline{\mathbf{q}}$ in (10). Its covariance $\mathbf{C}_{\underline{q}} = E[\underline{\mathbf{q}}\, \underline{\mathbf{q}}^H]$ is given by

$$ \mathbf{C}_{\underline{q}} = \mathbf{C}_{\underline{d}} - \underline{\mathbf{A}}\, \mathbf{C}_{\underline{r}}\, \underline{\mathbf{A}}^H. \tag{18} $$

The matrix sizes in (18) are $MN \times MN$, which can be very large and require a vast amount of memory for the computations (for instance, with $M = 64$, $N = 1024$, each matrix in (18) requires about 32 GB of space in double-precision number format). Such large matrices are also present in the studies [12], [24]. Therefore, an alternative method will be proposed to work with matrices of feasible sizes. From the block Toeplitz structure of $\mathbf{C}_{\underline{q}}$ and the fact that $\underline{\mathbf{A}}$ is a diagonal matrix, it can be shown that

$$ \mathbf{C}_q[m] = \mathbf{C}_d[m] - \mathbf{A}\, \mathbf{C}_r[m]\, \mathbf{A}^H, \tag{19} $$

$$ \mathbf{C}_d[m] = \frac{4}{\pi} \left( \mathrm{asin}\left( \mathbf{D}_r[m]^{-0.5}\, \Re(\mathbf{C}_r[m])\, \mathbf{D}_r[m]^{-0.5} \right) + j\, \mathrm{asin}\left( \mathbf{D}_r[m]^{-0.5}\, \Im(\mathbf{C}_r[m])\, \mathbf{D}_r[m]^{-0.5} \right) \right), \tag{20} $$

where $\mathbf{D}_r[m] = \mathrm{diag}(\mathbf{C}_r[m])$ and $m = 0, 1, \ldots, N-1$. Equation (20) is a result of the arcsine law [29]. Note that the matrix sizes in (19) and (20) are $M \times M$, and the memory requirement to hold the matrices in (19) is $N$ times smaller compared to (18). Moreover, the temporal and spatial correlation of the quantization noise is taken into account, since $\mathbf{C}_q[m]$ can be calculated for $m \neq 0$ using (19) and is not necessarily a diagonal matrix. Now that every matrix in (19) is known, $\mathbf{C}_q[m]$ can be calculated. The next step is to find $\mathbf{C}_{\tilde{q}}[k] = E[\tilde{\mathbf{q}}[k]\, \tilde{\mathbf{q}}[k]^H]$ from $\mathbf{C}_q[m]$, which is performed in the following proposition.

Proposition 2. $\mathbf{C}_{\tilde{q}}[k]$ can be computed from $\mathbf{C}_q[m]$ as

$$ \mathbf{C}_{\tilde{q}}[k] = \mathrm{DFT}_{N,k}\{\boldsymbol{\Gamma}[m]\} + \mathrm{DFT}_{N,k}\{\boldsymbol{\Gamma}[m]\}^{*} - N\, \mathbf{C}_q[0], \tag{21} $$

where $\boldsymbol{\Gamma}[m] \triangleq (N - m)\, \mathbf{C}_q[m]$ and $\mathrm{DFT}_{N,k}\{\boldsymbol{\Gamma}[m]\} = \sum_{m=0}^{N-1} \boldsymbol{\Gamma}[m]\, e^{-j2\pi mk/N}$.

Proof. See Appendix B. ∎

A. Data Detection

For data detection, zero-forcing (ZF) combining is applied to the DFT output $\tilde{\mathbf{d}}[k]$ to obtain the data estimates $\hat{\mathbf{x}}[k]$ as

$$ \hat{\mathbf{x}}[k] = \hat{\mathbf{B}}[k]\, \tilde{\mathbf{d}}[k], \tag{22} $$

for $k \in \mathcal{K}_D$. In (22), $\hat{\mathbf{B}}[k] = \left( \hat{\mathbf{H}}[k]^H \hat{\mathbf{H}}[k] \right)^{-1} \hat{\mathbf{H}}[k]^H$, where $\hat{\mathbf{H}}[k]$ is the estimate of $\tilde{\mathbf{H}}[k]$. The details of the channel estimation are provided in Section III-B. Using (14), (15) and (22), the $u$th element of $\hat{\mathbf{x}}[k]$, namely $\hat{x}_u[k]$, can be found as

$$ \hat{x}_u[k] = g_u[k] + i_u[k] + n_u[k] + q_u[k], \tag{23} $$

where

$$ g_u[k] = g_u[k]'\, \tilde{s}_u[k], \quad n_u[k] = \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{w}}[k], \quad q_u[k] = \hat{\mathbf{b}}_u[k]^H \tilde{\mathbf{q}}[k], \tag{24} $$

$$ i_u[k] = \rho_d \sqrt{N} \left[ \sum_{z \neq u,\, z \in \mathcal{U}_D} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_z[k]\, \tilde{s}_z[k] + \sum_{z \neq u,\, z \in \mathcal{U}_D} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{e}}_z[k]\, \tilde{s}_z[k] \right], \tag{25} $$

where $g_u[k]' = \rho_d \sqrt{N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{h}}_u[k]$. In (23), $g_u[k]$ corresponds to the signal part, whereas the $i_u[k]$ term contains the interference from other users (the left summation in the $i_u[k]$ expression) and the distortion caused by imperfect CSI (the right summation in the $i_u[k]$ expression). Furthermore, $n_u[k]$ and $q_u[k]$ correspond to the distortion caused by thermal noise and quantization, respectively. Moreover, $\hat{\mathbf{b}}_u[k]^H$ is the $u$th row of the ZF combiner $\hat{\mathbf{B}}[k]$. Furthermore, $\tilde{\mathbf{h}}_u[k]$ and $\hat{\mathbf{h}}_u[k]$ are equal to the $u$th columns of the matrices $\tilde{\mathbf{H}}[k]$ and $\hat{\mathbf{H}}[k]$, respectively. In addition, $\tilde{\mathbf{e}}_u[k]$ corresponds to the $u$th column of the channel error matrix $\tilde{\mathbf{H}}[k] - \hat{\mathbf{H}}[k]$.

A lower bound on the ergodic capacity per user, for which the receiver has access to the side information $\hat{\mathcal{H}} \triangleq \{\hat{\mathbf{H}}[0], \hat{\mathbf{H}}[1], \ldots, \hat{\mathbf{H}}[N-1]\}$, and an SINDR expression for the data symbol of the $u$th user at the $k$th subcarrier, namely $\gamma_u[k]$, defined as in [1, eq. (2.46)], are found using (23)-(25) in the following proposition.

Proposition 3. A lower bound on the ergodic capacity per user, treating $\hat{\mathcal{H}}$ as side information, which is denoted by $R$, can be calculated as

$$ R = \frac{1}{U |\mathcal{K}_D|}\, E_{\hat{\mathcal{H}}} \left\{ \sum_{u=1}^{U} \sum_{k \in \mathcal{K}_D} \log_2 (1 + \gamma_u[k]) \right\}, \tag{26} $$

where

$$ \gamma_u[k] \triangleq \frac{\left| E[g_u[k]' \mid \hat{\mathcal{H}}] \right|^2}{\mathrm{Var}\left[ g_u[k]' \mid \hat{\mathcal{H}} \right] + \mathrm{Var}\left[ i_u[k] + n_u[k] + q_u[k] \mid \hat{\mathcal{H}} \right]} = \frac{\rho_d^2 N \left| \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_u[k] \right|^2}{I_u[k] + N_u[k] + Q_u[k]}, \tag{27} $$

$$ I_u[k] = \rho_d^2 N \sum_{z \neq u,\, z \in \mathcal{U}_D} \left[ \left| \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_z[k] \right|^2 + U \sigma_e^2 \rho_d^2 \left\| \hat{\mathbf{b}}_u[k]^H \mathbf{A} \right\|^2 \right], \tag{28} $$

$$ N_u[k] = N N_o \left\| \hat{\mathbf{b}}_u[k]^H \mathbf{A} \right\|^2, \quad Q_u[k] = \hat{\mathbf{b}}_u[k]^H\, \mathbf{C}_{\tilde{q}}[k]\, \hat{\mathbf{b}}_u[k], \tag{29} $$

when $\hat{\mathbf{H}}[k]$ is the LMMSE channel estimate. In (28), $\sigma_e^2$ is the LMMSE channel estimation error variance, which is defined in Section III-B.

Proof. See Appendix C. ∎
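To make the ZF step in (22) and the structure of (27)-(29) concrete, the following standard-library Python sketch evaluates them for a toy case with $M = 4$ antennas and $U = 2$ users. Several simplifications are assumptions of this sketch rather than statements from the paper: CSI is perfect ($\sigma_e^2 = 0$, so the CSI-error part of (28) vanishes), the Bussgang matrix is a scaled identity $\mathbf{A} = G\,\mathbf{I}$, and the quantization noise covariance at the subcarrier of interest is a scaled identity. The channel values and all function names are hypothetical.

```python
def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a, b):
    """Plain matrix product for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv2(m):
    """Inverse of a 2x2 matrix (enough for the U = 2 toy case)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def zf_combiner(h_hat):
    """B = (H^H H)^{-1} H^H as in (22), for an M x 2 channel estimate."""
    hh = hermitian(h_hat)
    return matmul(inv2(matmul(hh, h_hat)), hh)


def sindr_perfect_csi(b_row, a_diag, h, u, rho_d, n_fft, noise_var, c_q):
    """(27)-(29) with sigma_e^2 = 0; b_row is the u-th row of B (b_u^H)."""
    m_ant, n_users = len(h), len(h[0])
    ba = [b_row[m] * a_diag[m] for m in range(m_ant)]  # row vector b_u^H A
    proj = [sum(ba[m] * h[m][z] for m in range(m_ant))
            for z in range(n_users)]
    signal = rho_d ** 2 * n_fft * abs(proj[u]) ** 2
    interference = rho_d ** 2 * n_fft * sum(
        abs(proj[z]) ** 2 for z in range(n_users) if z != u)
    thermal = n_fft * noise_var * sum(abs(x) ** 2 for x in ba)
    quant = sum(b_row[i] * c_q[i][j] * b_row[j].conjugate()
                for i in range(m_ant) for j in range(m_ant)).real
    return signal / (interference + thermal + quant)


# Toy channel: M = 4 antennas, U = 2 users (arbitrary values).
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 - 1j, -1 + 0.4j],
     [-0.7 + 0.2j, 0.9 + 1j],
     [0.2 + 0.6j, -0.4 - 0.8j]]
B = zf_combiner(H)
BH = matmul(B, H)  # should be close to the 2 x 2 identity

G_TOY = 0.5   # hypothetical common Bussgang gain, A = G_TOY * I
Q_VAR = 2.0   # hypothetical white quantization noise level
A_DIAG = [G_TOY] * 4
C_Q = [[Q_VAR if i == j else 0.0 for j in range(4)] for i in range(4)]
gamma_0 = sindr_perfect_csi(B[0], A_DIAG, H, 0, rho_d=1.0, n_fft=16,
                            noise_var=0.1, c_q=C_Q)
```

In this simplified setting the ZF combiner removes the inter-user term exactly, so the SINDR is limited only by the thermal and quantization terms, both of which scale with $\|\hat{\mathbf{b}}_u[k]\|^2$. With a general diagonal $\mathbf{A}$ or a non-diagonal $\mathbf{C}_{\tilde{q}}[k]$, the same function still applies, but the interference term no longer vanishes identically.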

Moreover, to ﬁnd the bit error rate (BER) for gray coded K -ary phase shift keying modulation( K -PSK), we approximate ˆx [ k ] as a complex Gaussian random variable, so that [27] BER ≈ U |K D | E ˆH (cid:40) U (cid:88) u =1 (cid:88) k ∈K D ( K ) × (cid:16) − Φ (cid:16)(cid:112) γ u [ k ]log ( K )sin (cid:16) πK (cid:17)(cid:17)(cid:17)(cid:41) . (30)The BER calculation formula in (30) is applicable for a multi-band interference environment,as the calculation of the SINDR γ u [ k ] involves the received power from any interfering band. Asdetailed in Footnote 2, the approximation error in assuming Gaussian inputs for the quantizeris very limited, thus R and the BER expressions in (26) and (30) calculated using γ u [ k ] in(27) are expected to approximate the simulation based results precisely ( γ u [ k ] is the preciseSINDR expression mentioned in Section I). Although (26) and (30) are useful to calculatevalues of the ergodic capacity and BER, they are not able to provide clear insights into thesystem performance and system parameters such as M , N or ρ d /ρ i . Therefore, we propose amore tractable approximation of γ u [ k ] in Proposition 4, in which the conditioning on ˆH will bedropped. From an information theoretic view, this will correspond to the case that the channelestimates are used to perform ZF combining by a ﬁrst party, but the channel estimate knowledgeis not conveyed to a second party, which performs error correction decoding on the ZF outputwithout the knowledge of channel estimates [1]. The corresponding use-and-then-forget ergodiccapacity bound, namely R (cid:48) , is found in the following proposition. Proposition 4.

The use-and-then-forget ergodic capacity bound $R'$ can be computed as

$$R' = \frac{1}{U|\mathcal{K}_D|}\sum_{u=1}^{U}\sum_{k\in\mathcal{K}_D}\log_2\left(1+\gamma'_u[k]\right), \tag{31}$$

$$\gamma'_u[k] = \frac{\left|E\left[g_u[k]'\right]\right|^2}{\mathrm{Var}\left[g_u[k]'\right]+\mathrm{Var}\left[i_u[k]+n_u[k]+q_u[k]\right]} \approx \frac{\rho_d^2(1-\sigma_e^2)(M-U)G^2}{U\sigma_e^2\rho_d^2G^2+2-4/\pi+N_oG^2}, \tag{32}$$

in which $G = \frac{2}{\sqrt{\pi}}\left((|\mathcal{K}_D|U\rho_d^2+|\mathcal{K}_I|I\rho_i^2)/N+N_o\right)^{-0.5}$. The approximation error goes to zero as $L$ gets larger and as $|\mathcal{K}_D|+|\mathcal{K}_I|$ approaches $N$ (that is, as the oversampling rates get lower) when $\rho_d \approx \rho_i$. Proof.

See Appendix D. ∎
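As a rough illustration of how the closed-form expression in (32) can be evaluated, the following sketch computes $G$, $\gamma'_u[k]$ and $R'$ for the one-bit case. It assumes that the powers enter as $\rho_d^2$ and $\rho_i^2$ and that $\sigma_e^2$ is supplied by the caller (zero for perfect CSI); the function and parameter names are hypothetical and chosen only for this example.

```python
import math


def bussgang_gain(n_fft, k_d, k_i, n_users, n_interferers, rho_d2, rho_i2, noise):
    """G of Proposition 4 (one-bit ADC). rho_d2 and rho_i2 are powers (rho squared)."""
    received_power = (k_d * n_users * rho_d2 + k_i * n_interferers * rho_i2) / n_fft + noise
    return 2.0 / math.sqrt(math.pi * received_power)


def uatf_sindr(n_antennas, n_users, gain, rho_d2, noise, sigma_e2=0.0):
    """Rightmost expression of (32); sigma_e2 = 0 corresponds to perfect CSI."""
    g2 = gain ** 2
    numerator = rho_d2 * (1.0 - sigma_e2) * (n_antennas - n_users) * g2
    denominator = n_users * sigma_e2 * rho_d2 * g2 + 2.0 - 4.0 / math.pi + noise * g2
    return numerator / denominator


def uatf_rate(sindr):
    """Per-user rate of (31) when the SINDR is the same for all users and subcarriers."""
    return math.log2(1.0 + sindr)


if __name__ == "__main__":
    noise = 10 ** (-0.4)      # assumed normalization: rho_d^2 / N_o = 4 dB with rho_d = 1
    rho_i2 = 10 ** (20 / 10)  # assumed operating point: SIR of -20 dB
    for n_fft in (1024, 2048, 4096):
        g = bussgang_gain(n_fft, 300, 300, 4, 4, 1.0, rho_i2, noise)
        sindr = uatf_sindr(64, 4, g, 1.0, noise)
        print(n_fft, 10 * math.log10(sindr), uatf_rate(sindr))
```

Sweeping `n_fft` with the numbers of occupied subcarriers held fixed mimics an increase in the oversampling rate, and sweeping `n_antennas` mimics an increase in $M$.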

Note that no Monte-Carlo simulation is needed to obtain $\gamma'_u[k]$, since it can be calculated by plugging the system parameters into the rightmost expression in (32). $\gamma'_u[k]$ can be used in place of $\gamma_u[k]$ in (30) or in (31) to calculate the BER and the achievable rate, again without any Monte-Carlo simulations (as the averaging over $\hat{H}$ is already performed analytically to obtain $\gamma'_u[k]$). Owing to the simple form of $\gamma'_u[k]$ in (32), insights into the system performance and parameters can be obtained as follows. From Proposition 4, it is clear that the SINDR can always be increased by increasing the number of antennas $M$. The impact of the oversampling rate can be deduced by considering the case in which $|\mathcal{K}_D|$ and $|\mathcal{K}_I|$ are fixed as the block length $N$ increases, which by definition is equivalent to an increase in the oversampling rates $\beta_I$ and $\beta_D$. It can be shown that $\gamma'_u[k]$ is increasing with $N$ by writing $\gamma'_u[k] = \rho_d^2(M-U)(1-\sigma_e^2)\,\bar{\gamma}_u[k]$, where $\bar{\gamma}_u[k] = G^2/\left(2-4/\pi+(N_o+U\sigma_e^2\rho_d^2)G^2\right)$. Equivalently, $\bar{\gamma}_u[k] = 1/\left((2-4/\pi)/G^2+N_o+U\sigma_e^2\rho_d^2\right)$. Since $G$, and hence $G^2$, increases with $N$, it follows that $\bar{\gamma}_u[k]$ also increases with $N$, which in turn means that $\gamma'_u[k]$ increases with $N$. The channel estimation error will be calculated in (44) in the next section, and it can similarly be shown to decrease with the oversampling rates.

B. Channel Estimation

In this section, the details of channel estimation under quantization are presented. There are many channel estimation techniques for massive MIMO systems with low-resolution ADCs [8], [11], [13], [14], [16]. None of those studies discusses a channel estimation scheme under ACI. In this study, we propose an LMMSE channel estimator based on the Bussgang decomposition, which extends the channel estimation technique in [13]. For the channel estimation phase, orthogonal pilot sequences are transmitted at some of the subcarriers involved. The set of subcarriers on which pilot signals are transmitted is denoted by $\mathcal{K}_P$. Moreover, it is assumed that the interferers in the adjacent channel band transmit data symbols from a finite-cardinality set through the subcarriers in $\mathcal{K}_I$ during the channel estimation phase of the users in the desired band. The time-domain signal of length $N_p$ transmitted by the $u$th user during the channel estimation phase can be expressed as follows:

$$p_u[n] = \begin{cases} \dfrac{\rho_p}{\sqrt{N_p}}\displaystyle\sum_{k\in\mathcal{K}_P}\tilde{\theta}_u[k]\,e^{j2\pi nk/N_p}, & \text{if } u\in\mathcal{U}_D,\\[2ex] \dfrac{\rho_i}{\sqrt{N_p}}\displaystyle\sum_{k\in\mathcal{K}_I}\tilde{s}_u[k]\,e^{j2\pi nk/N_p}, & \text{if } u\in\mathcal{U}_I, \end{cases} \tag{33}$$

$$\tilde{\theta}_u[k] = \begin{cases} 0, & \text{if } (k \bmod U)+1 \neq u,\\ \sqrt{U}\,e^{j\phi_u[k]}, & \text{if } (k \bmod U)+1 = u. \end{cases} \tag{34}$$

Here, the phases $e^{j\phi_u[k]}$ are known by the base station. The selection of these phases affects the estimation performance. They are drawn from a uniform distribution, as suggested in [13], because when they are constant, that is, when $e^{j\phi_u[k]} = C\ \forall u,k$, the signal transmitted by the $u$th user satisfies $p_u[n] \neq 0$ only when $n = \mu N_p/U$, $\mu\in\mathbb{Z}$, and $p_u[n] = 0$ otherwise. This means that users do not transmit anything most of the time, which limits the average transmit power due to the peak power limitation of the power amplifiers at the transmitter side.
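The effect of constant pilot phases described above can be reproduced with a short numerical sketch. It assumes, for illustration only, that the pilot set covers all $N_p$ subcarriers, that $\rho_p = 1$, and that $N_p$ is a multiple of $U$; the helper names are hypothetical.

```python
import numpy as np


def pilot_time_signal(user, n_users, n_p, phases):
    """Time-domain pilot of one user for the comb pattern of (34).

    Assumes pilots on all n_p subcarriers and rho_p = 1 (illustrative choices).
    """
    k = np.arange(n_p)
    theta = np.where(k % n_users + 1 == user,
                     np.sqrt(n_users) * np.exp(1j * phases), 0.0)
    # np.fft.ifft applies a 1/n_p factor; rescale to the 1/sqrt(n_p) factor of (33).
    return np.fft.ifft(theta) * np.sqrt(n_p)


def active_samples(p, tol=1e-9):
    """Number of time samples in which the user actually transmits."""
    return int(np.count_nonzero(np.abs(p) > tol))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_users, n_p = 4, 64
    constant = pilot_time_signal(2, n_users, n_p, np.zeros(n_p))
    random_phase = pilot_time_signal(2, n_users, n_p, rng.uniform(0.0, 2.0 * np.pi, n_p))
    print("active samples:", active_samples(constant), active_samples(random_phase))
    print("peak amplitude:", np.max(np.abs(constant)), np.max(np.abs(random_phase)))
```

Both signals carry the same total energy, but the constant-phase pilot concentrates it in $U$ samples per block, which is what makes it unattractive under a peak power constraint.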
Introduction ofnon-constant phases, one example of which is when they are selected from uniform distribution( , π ), avoids this problem. Since the channel convolution matrix is circulant due to CP, thereceived signal at the m th antenna and the k th subcarrier can be expressed as follows: ˜ y m [ k ] =  ρ p (cid:112) N p U (cid:80) u ∈U D ˜ h m,u [ k ]˜ θ u [ k ] + z m [ k ] , if k ∈ K P ,ρ i (cid:112) N p U (cid:80) u ∈U I ˜ h m,u [ k ]˜ s u [ k ] + z m [ k ] , if k ∈ K I , (35)where ˜ s u [ k ] ’s for k ∈ K I , u ∈ U I represent the random data symbols transmitted by the interferingband users, ˜ h m,u [ k ] is the element of matrix ˜H [ k ] at its m th row and u th column, and z m [ n ] represents the additive white noise term at the m th receive antenna of spectral density N p N o .Due to (34), it can be written that ˜ y m [ k ] =  ρ p (cid:112) N p U ˜ h m,f ( k ) [ k ] e jφ f ( k ) [ k ] + z [ k ] , if k ∈ K P ,ρ i (cid:112) N p U (cid:80) u ∈U I ˜ h m,u [ k ]˜ s u [ k ] + z [ k ] , if k ∈ K I , (36)where f ( k ) = ( k mod U )+1 . Deﬁning the quantized observation vector v m [ n ] (cid:44) sign( (cid:60){ y m [ n ] } )+ j sign( (cid:61){ y m [ n ] } ) in time domain, the quantized observation ˜ v m [ k ] in the frequency domain canbe expressed as ˜ v m [ k ] = N p − (cid:88) k =0 v m [ n ] e j πnk/N p . (37)Here y m [ n ] is the IDFT of ˜ y m [ k ] . Deﬁning v [ n ] (cid:44) [ v [ n ] v [ n ] · · · v M [ n ] ] T and y [ n ] (cid:44) [ y [ n ] y [ n ] · · · y M [ n ] ] T , it can be written that v [ n ] = A (cid:48) y [ n ] + q (cid:48) [ n ] , (38)where the selection A (cid:48) = (cid:112) /π diag ( C y [0]) − . makes the quantization noise q (cid:48) [ n ] to be uncor-related with the unquantized observation vector y [ n ] . Let y (cid:44) [ y [ N − T y [ N − T . . . y [0] T ] T . Here, the quantizer inputs are again assumed to be Gaussian due to the same reasoning discussed in Footnote 2. 
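The Bussgang decomposition in (38) can be checked numerically for a single antenna. The sketch below is a Monte-Carlo illustration under the stated Gaussian-input assumption, not part of the estimator itself: it draws complex Gaussian samples, applies the one-bit quantizer, and compares the empirical linear gain with $\sqrt{4/\pi}$ divided by the input standard deviation. The function names and the sample size are arbitrary choices.

```python
import numpy as np


def one_bit_quantize(y):
    """sign(Re{y}) + j sign(Im{y}), applied element-wise."""
    return np.sign(y.real) + 1j * np.sign(y.imag)


def bussgang_check(variance, n_samples=200_000, seed=1):
    """Compare the theoretical one-bit Bussgang gain with a Monte-Carlo estimate."""
    rng = np.random.default_rng(seed)
    y = np.sqrt(variance / 2) * (rng.standard_normal(n_samples)
                                 + 1j * rng.standard_normal(n_samples))
    v = one_bit_quantize(y)
    gain_theory = np.sqrt(4 / np.pi / variance)
    gain_mc = np.mean(v * np.conj(y)).real / np.mean(np.abs(y) ** 2)
    q = v - gain_theory * y                  # quantization noise for the chosen gain
    corr = abs(np.mean(q * np.conj(y)))      # should be close to zero
    q_power = np.mean(np.abs(q) ** 2)        # should be close to 2 - 4/pi
    return gain_theory, gain_mc, corr, q_power


if __name__ == "__main__":
    print(bussgang_check(3.0))
```

With this gain, the residual is empirically uncorrelated with the input and its power is close to $2-4/\pi$.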
For the simplicity of the channel estimation part, C y is approximated as C y = HR p H H + N o I ≈ (cid:0) /N p ( ρ p |K p | U + ρ i |K I | U ) + N o (cid:1) I , (39)where R p = E [ pp H ] , in which, p = [ p [ N − T p [ N − T . . . p [0] T ] T , where p [ n ] is an M × column vector whose u th element is p u [ n ] . Without the approximation, M N × M N matrix C y will be non-diagonal in general, which implies that M N × M N quantization noisecovariance matrix will be non-diagonal. This will require taking the inverse of such a largematrix for LMMSE channel estimation as in [12] for frequency selective channel, which iscomputationally exhaustive. However, the approximation error goes to zero as L grows largewhich can be shown similarly as performed for C r [0] in Appendix D. Such an approximation isalso adopted in [13]. Taking the DFT of the quantized observation vector v [ n ] , it is found that ˜v [ k ] = G (cid:48) ˜y [ k ] + ˜q (cid:48) [ k ] , (40)where G (cid:48) = 2 / √ π (cid:0) ( ρ p |K p | U + ρ i |K I | I ) /N p + N o (cid:1) − . . (40) along with (36) implies that ˜ y m [ k ] = ρ p G (cid:48) (cid:112) N p U h m,f ( k ) [ k ] e jφ f ( k ) [ k ] + G (cid:48) z m [ k ] + p m [ k ] , (41)where k ∈ K P and p m [ k ] is the m th element of the DFT of p [ n ] . It can be seen from (41) thatthe observation ˜ y m [ vU + u − , when the noise terms are omitted, is a phase rotated and scaledversion of the channel coefﬁcient ˜ h m,u [ vU + u − for user u , sampled with a sampling periodof U , as f ( vU + u −

1) = ( vU + u − mod U ) + 1 = u . These samples will be denoted by ˇ h m,u [ v ] (cid:44) ˜ h m,u [ vU + u − . According to the Nyquist sampling theorem, if N p satisﬁes N p ≥ U L, (42)it is possible to obtain the channel coefﬁcients without any aliasing. Again for the simplicity ofthe channel estimation part, the covariance matrix of q (cid:48) [ n ] will be approximated as a diagonalmatrix (2 − /π ) I , with approximation error going to zero as L grows large for low oversamplingrates and ρ p ≈ ρ i as discussed in Appendix D. Under this approximation, the LMMSE estimatefor the channel coefﬁcient ˜ h m,u [ vU + u − , namely ˜ h m,u [ vU + u − ∗ , is found as ˜ h m,u [ vU + u − ∗ = e − jφ u [ vU + u − ˜ y m [ vU + u − ρ p G (cid:48) (cid:112) N p U (cid:0) N o / ( ρ p U ) + P/ (cid:0) ρ p U ( G (cid:48) ) (cid:1)(cid:1) , (43)where the quantization distortion variance P = E [ | p m [ k ] | ] ≈ − /π . Here, the channelestimation error σ e can also be found as σ e = 1 − ρ p (cid:112) N p U (cid:0) N o / ( ρ d U ) + P/ (cid:0) ρ d U ( G (cid:48) ) (cid:1)(cid:1) . (44) The parameter σ e will be used in the performance analysis of the investigated uplink systemmodel in this paper for imperfect CSI and the results from the analysis will be compared to thesimulated results. As can be noted in (43), for the u th user, we only have the channel coefﬁcientsestimates for ˇ h m,u [ v ] (cid:44) h m,u [ vU + u − ∗ , sampled with a period of U . To obtain the remainingchannel coefﬁcients, ˇ h m,u [ v ] can be upsampled by U . There are many possibilities to upsample ˇ h m,u [ v ] . The one adopted in this study is the spline interpolation [30].The most important parameter through which the ACI is taken into account in the proposedchannel estimation method is through the factor G (cid:48) in (43). By deﬁnition, G (cid:48) decreases withincreasing total ACI power ρ i |K I | I/N p . 
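The sampling condition in (42) can be illustrated in a noise-free, unquantized setting. Note that the sketch below does not use the spline interpolation adopted in this study; it swaps in an exact DFT-based recovery purely to show when the comb samples of one user determine all $L$ channel taps. All names are hypothetical, and $N_p$ is assumed to be a multiple of $U$.

```python
import numpy as np


def recover_taps(h_comb, user, n_users, n_p, n_taps):
    """Recover time-domain taps from comb samples h_comb[v] = H[v * n_users + user - 1]."""
    n_comb = n_p // n_users
    g = np.fft.ifft(h_comb)                       # taps, aliased modulo n_comb
    lags = np.arange(n_comb)
    g = g * np.exp(2j * np.pi * lags * (user - 1) / n_p)  # undo the comb-offset phase ramp
    return g[:n_taps]


def recovery_error(n_p, n_users, n_taps, user=2, seed=0):
    """Largest tap error after recovery from the comb samples of one user."""
    rng = np.random.default_rng(seed)
    h = (rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)) / np.sqrt(2 * n_taps)
    h_freq = np.fft.fft(h, n_p)                   # frequency response on all subcarriers
    h_comb = h_freq[user - 1::n_users]            # every n_users-th subcarrier
    h_hat = recover_taps(h_comb, user, n_users, n_p, n_taps)
    m = min(n_taps, h_hat.size)
    return float(np.max(np.abs(h_hat[:m] - h[:m])))


if __name__ == "__main__":
    print("N_p = U*L  :", recovery_error(64, 4, 16))   # condition (42) met
    print("N_p < U*L  :", recovery_error(64, 4, 20))   # condition (42) violated, aliasing
```

When $N_p \geq UL$ the taps are recovered up to numerical precision, whereas a longer channel makes the trailing taps fold onto the leading ones.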
The distortion caused by the quantization can be regarded as having two components: the additive quantization noise distortion $\tilde{q}'[k]$ and the magnitude distortion $G'$ in (40). Under the aforementioned approximations, the power of the additive quantization noise $\tilde{q}'[k]$ does not change with the ACI power. However, as $G'$ decreases with increasing ACI power, the power of the signal part $G'\tilde{y}[k]$ in (40) diminishes, resulting in a reduced signal power relative to the quantization noise power. Therefore, a worse estimation error performance can be expected. This can also be seen from (44): the channel estimation error variance $\sigma_e^2$ increases as $G'$ decreases with increasing ACI power. How the estimator combats the degradation due to ACI can be inferred from (43). Neglecting the $P/(\rho_p^2U(G')^2)$ term in the denominator of (43), it can be stated that as the ACI power is increased, which in turn decreases $G'$ and reduces the signal component $G'\tilde{y}[k]$ in (40), the estimator tries to cancel this effect by multiplying the observation by $1/G'$ (note the $G'$ factor in the denominator of (43)). However, such a normalization (multiplication by $1/G'$ when $G'$ is smaller than 1) enhances the quantization noise $\tilde{q}'[k]$; thus, the cancellation of the magnitude distortion $G'$ should be balanced against the quantization noise enhancement. This balancing is performed through the $P/(\rho_p^2U(G')^2)$ term in the estimator in (43).

IV. ADCS WITH HIGHER THAN ONE-BIT RESOLUTION

In this section, the details of the performance analysis for quantizers with more than one-bit resolution are presented. To begin with, we define the set of quantizer output values $\mathcal{L} = \{\ell_0, \ell_1, \ldots, \ell_{L'-1}\}$, where $L' = 2^q$ is the number of possible quantizer output values, $q$ being the number of ADC bits. Moreover, the quantization thresholds can be characterized by the set $\mathcal{B} = \{b_0, b_1, \ldots, b_{L'}\}$, where $-\infty = b_0 < b_1 < \cdots < b_{L'} = \infty$. The quantization function $Q(\cdot)$ is a point in the function space $\mathbb{C}^M \to \Upsilon^M$, where $\mathbb{C}^M$ denotes the complex vector space of dimension $M$ and $\Upsilon = \mathcal{L}\times\mathcal{L}$ is the set of possible quantizer output values (the Cartesian product $\mathcal{L}\times\mathcal{L}$ represents the combination of the outputs of the pair of ADCs quantizing the real and imaginary parts of the received signals separately). The $i$th element of the quantizer output, namely $Q(x)_i$, where $x$ is the quantizer input vector of size $M\times 1$, can be expressed as

$$Q(x)_i = \left(\ell_{f'(\Re(x_i))},\ \ell_{f'(\Im(x_i))}\right), \tag{45}$$

where $f'(\Re(x)) = k \in \{0, 1, \ldots, L'-1\}$ satisfies $b_k \leq \Re(x) < b_{k+1}$. Similarly, $f'(\Im(x)) = m \in \{0, 1, \ldots, L'-1\}$ satisfies $b_m \leq \Im(x) < b_{m+1}$. As an example, for a uniform midrise quantizer the possible quantizer output values are $\ell_i = \Delta(i - L'/2 + 1/2)$, $i = 0, 1, \ldots, L'-1$, whereas the quantization thresholds are $b_i = \Delta(i - L'/2)$, $i = 1, 2, \ldots, L'-1$ ($b_0 = -\infty$, $b_{L'} = \infty$ as previously specified).

An important point in the design of the quantizer is the selection of the step size $\Delta$. In fact, the AGC will dynamically adjust the gain of the input signal to the ADC according to the received signal power so that it fits the input signal range of the ADC. This corresponds to the approach in this study, in which the step size is selected according to the received signal power levels, which are assumed to stay nearly the same over a coherence interval. This results in a fixed step size during a coherence interval, enabling a tractable analysis.

There are two main considerations in the design of the step size $\Delta$. If the step size is selected to be small for the average received signal power level, the probability that the input signal is clipped will be high, causing a distortion referred to as overload distortion. On the other hand, if a large step size is preferred to avoid clipping or overload distortion, this will result in granular distortion, causing a large range of input signal levels to be mapped to the same level. 
Therefore, the step size should be selected properly to balance the aforementioned granular and overload distortions. The amount of the two distortions will affect the validity of the assumptions in the performance analysis, as will be discussed in the subsequent parts of this section.

To begin with the analysis, matrix $A$ in (13) should be evaluated. According to the Bussgang decomposition, $A = C_{d[n]r[n]}C_{r[n]}^{-1}$, where $C_{d[n]r[n]} = E[d[n]r[n]^H]$ and $C_{r[n]} = E[r[n]r[n]^H]$. For the example case of a midrise uniform quantizer with Gaussian inputs [7],

$$A = \frac{\Delta}{\sqrt{\pi}}\,\mathrm{diag}(C_r[0])^{-0.5} \times \sum_{i=1}^{2^q-1}\exp\left(-\Delta^2\left(i-2^{q-1}\right)^2\mathrm{diag}(C_r[0])^{-1}\right). \tag{46}$$

[Footnote: The multi-bit quantizer input is also assumed to be Gaussian, which is accurate owing to the same reasoning discussed in Footnote 2.]

As $C_r[0]$ in (46) can be found using Proposition 1, matrix $A$ can be calculated using (46) for the multi-bit quantizer case. What remains is the calculation of the covariance matrix of the quantization noise $C_q[k]$. The difficulty with the calculation of this matrix stems from the fact that, for multi-bit quantizers, there is no closed-form expression for the relation between the quantizer input and output covariance matrices, unlike the one-bit quantizer relation in (20), which was referred to as the arcsine law. However, a diagonal approximation can be made, for which all non-diagonal entries of the covariance matrix $C_d[0]$ are assumed to be zero and the $m$th diagonal entry of $C_d[0]$, namely $E[|d_m[n]|^2]$, can be found as follows:

$$E\left[|d_m[n]|^2\right] = 2\sum_{i=0}^{L'-1}\ell_i^2\,P\left[b_i \leq \Re(r_m) < b_{i+1}\right] = 2\sum_{i=0}^{L'-1}\ell_i^2\left(\Phi\left(\sqrt{2}\,b_{i+1}/\sigma_{r_m}\right)-\Phi\left(\sqrt{2}\,b_i/\sigma_{r_m}\right)\right), \tag{47}$$

where $\sigma_{r_m}^2$ is the $m$th diagonal element of $C_r[0]$, corresponding to the quantizer input variance at the $m$th antenna. 
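A scalar version of (46) and (47) can be evaluated and cross-checked against simulation as sketched below. The sketch assumes a single antenna with a complex Gaussian input of a given variance and treats the step size as a free parameter; the function names are hypothetical and the Monte-Carlo part is only a sanity check.

```python
import math

import numpy as np


def midrise_levels_thresholds(q_bits, step):
    """Output levels l_0..l_{L'-1} and finite thresholds b_1..b_{L'-1} of a midrise quantizer."""
    n_levels = 2 ** q_bits
    levels = step * (np.arange(n_levels) - n_levels / 2 + 0.5)
    thresholds = step * (np.arange(1, n_levels) - n_levels / 2)
    return levels, thresholds


def quantize(x, levels, thresholds):
    """Quantize real and imaginary parts separately, as in (45)."""
    re = levels[np.searchsorted(thresholds, x.real, side="right")]
    im = levels[np.searchsorted(thresholds, x.imag, side="right")]
    return re + 1j * im


def bussgang_gain(q_bits, step, variance):
    """Scalar form of (46) for a complex Gaussian input with the given variance."""
    i = np.arange(1, 2 ** q_bits)
    offsets = step * (i - 2 ** (q_bits - 1))
    return step / math.sqrt(math.pi * variance) * float(np.sum(np.exp(-offsets ** 2 / variance)))


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def output_power(q_bits, step, variance):
    """Scalar form of (47): quantizer output power for a complex Gaussian input."""
    levels, thresholds = midrise_levels_thresholds(q_bits, step)
    b = np.concatenate(([-np.inf], thresholds, [np.inf]))
    sigma = math.sqrt(variance)
    probs = np.array([std_normal_cdf(math.sqrt(2) * b[k + 1] / sigma)
                      - std_normal_cdf(math.sqrt(2) * b[k] / sigma)
                      for k in range(levels.size)])
    return 2.0 * float(np.sum(levels ** 2 * probs))


def monte_carlo(q_bits, step, variance, n_samples=200_000, seed=0):
    """Empirical gain and output power, for comparison with the closed forms."""
    rng = np.random.default_rng(seed)
    r = np.sqrt(variance / 2) * (rng.standard_normal(n_samples)
                                 + 1j * rng.standard_normal(n_samples))
    d = quantize(r, *midrise_levels_thresholds(q_bits, step))
    gain = np.mean(d * np.conj(r)).real / np.mean(np.abs(r) ** 2)
    return float(gain), float(np.mean(np.abs(d) ** 2))


if __name__ == "__main__":
    print(bussgang_gain(2, 1.0, 2.0), output_power(2, 1.0, 2.0), monte_carlo(2, 1.0, 2.0))
```

For $q = 1$ and $\Delta = 2$ the gain reduces to the one-bit value $\sqrt{4/\pi}$ divided by the input standard deviation, which is a convenient consistency check.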
In (47), it is assumed that the quantizer input has a Gaussian distribution. We denote the diagonal matrix whose diagonal entries are equal to the diagonal entries of $C_d[0]$ by $C_d^{\mathrm{diag}}[0]$. After finding $C_d^{\mathrm{diag}}[0]$ from (47), the diagonal approximation of $C_q[0]$, namely $C_q^{\mathrm{diag}}[0]$, can be found from (19) as

$$C_q^{\mathrm{diag}}[0] = C_d^{\mathrm{diag}}[0] - A\,\mathrm{diag}(C_r[0])\,A^H. \tag{48}$$

The covariance matrices of the quantization noise for nonzero lags, namely $C_q^{\mathrm{diag}}[m]$, $m \neq 0$, are also assumed to be zero for multi-bit quantizers, which means that the quantization noise is assumed to be uncorrelated in time. This assumption fails for very low ADC resolutions [24] or for high oversampling rates, as discussed in Appendix D; yet it becomes more accurate as the number of quantization bits is increased [24], provided that clipping or overload distortion occurs with low probability. To ensure this, we choose the step size $\Delta$ large enough, as will be discussed shortly. After finding $C_q^{\mathrm{diag}}[0]$, $C_q[k]$ for the diagonal approximation case, which is referred to as $C_q^{\mathrm{diag}}[k]$, can be found using Proposition 2 as follows:

$$C_q^{\mathrm{diag}}[k] = \mathrm{DFT}_{N,k}\{\Gamma[m]\} + \mathrm{DFT}_{N,k}\{\Gamma[m]\}^* - NC_{\tilde{q}}[0] = NC_{\tilde{q}}[0], \tag{49}$$

as it is assumed for multi-bit quantizers that $\Gamma[m] = (N-m)C_q[m] \approx (N-m)C_q^{\mathrm{diag}}[m] = 0$ for $m \neq 0$. Then, $C_q^{\mathrm{diag}}[k]$ can be used to find the SINDR expression in Proposition 3, which can be employed to find the error-rate performance using (30). To ensure that the overload distortion is negligible, we adjust the step size $\Delta$ as follows:

$$\Delta = 2A_{\max}/L', \tag{50}$$

where the maximum quantizer output level $A_{\max}$ is adjusted as $A_{\max} = \sqrt{G/2}\,\Phi^{-1}(1-P_c/2)$, where $G$ is as defined in Proposition 4, which corresponds to the average received power, and $P_c$ is the desired probability that a clipping occurs. 
Obviously, the received signal is also assumed to be Gaussian distributed in the adjustment of the step size, which is an accurate assumption according to Footnote 2. The clipping probability is chosen as a small number, owing to its impact on the validity of the diagonal approximations involved in the analysis. The imperfect CSI case for multi-bit quantizers cannot be covered due to limited space and is left for future work.

To see the effect of oversampling, the number of ADC bits and the number of antennas on the SINDR for the multi-bit quantizer case, the parameter $G$ in (32) can be found using (46) as

$$G = \frac{\Delta}{\sqrt{\pi}}\,\lambda^{-0.5}\sum_{i=1}^{2^q-1}\exp\left(-\Delta^2\left(i-2^{q-1}\right)^2\lambda^{-1}\right), \tag{51}$$

where $\lambda = (|\mathcal{K}_D|U\rho_d^2 + |\mathcal{K}_I|I\rho_i^2)/N + N_o$. The SINDR expression in (32) is the same for the multi-bit quantizer case, except that the parameter $G$ in (32) is found according to (51) and the quantization noise variance $\mathrm{Var}[q_u[k]]$ is another constant less than $2-4/\pi$, which decreases with the number of ADC resolution bits but does not change with the number of antennas or the oversampling rate. As mentioned before, increasing $N$ while $|\mathcal{K}_D|$ and $|\mathcal{K}_I|$ are fixed corresponds to an increase in the oversampling rates. In such a case, $G$ is increased as $\lambda$ decreases with $N$. Therefore, the same discussion that the SINDR $\gamma'_u[k]$ increases with $G$ or the oversampling rates for the one-bit ADC also applies to the multi-bit quantizer. Moreover, due to the $(M-U)$ factor in (32), it is also possible to increase the SINDR by increasing the number of antennas $M$. Furthermore, since the step size $\Delta$ decreases when the number of ADC bits is increased, this corresponds to an increase in $G$ according to (51) and a decrease in the quantization noise variance $\mathrm{Var}[q_u[k]]$, which in turn results in an increase in the SINDR $\gamma'_u[k]$ in (32). 
Therefore, it can be stated that the SINDR will increase when the oversampling rates, the number of antennas and the number of ADC bits are increased for the multi-bit quantizer case, in line with intuition.

V. SIMULATION RESULTS

For the simulations, the number of receive antennas is $M = 64$, while $U = I = 4$. Some other parameters are $N = 1024$ and $L = 10$. The desired band $\mathcal{K}_D$ occupies $|\mathcal{K}_D| = 300$ subcarriers placed symmetrically around the zeroth subcarrier, while the interfering band $\mathcal{K}_I$ occupies $|\mathcal{K}_I| = 300$ subcarriers, which makes the oversampling rates $\beta_D = \beta_I \approx 3.41$. The data symbols are quadrature phase-shift keying (QPSK) modulated. The subcarrier spacing is 15 kHz as in LTE, with a transmission bandwidth of $|\mathcal{K}_D|/(NT_s) = 4.5$ MHz for the desired channel, which does not change with the sampling rate.

[Fig. 3: (a) BER (QPSK) vs. SIR ($\rho_d^2/\rho_i^2$, dB); (b) $R$ (bpcu/user) vs. SIR ($\rho_d^2/\rho_i^2$, dB). $M = 64$, $U = I = 4$, 1-bit ADC, perfect CSI. Curves: Simulated, Analytical Approx., Analytical Tight Approx.]

Furthermore, the noise variance parameter $N_o$ is normalized such that $\rho_d^2/N_o = 4$ dB, taking $\rho_d$ as unity. The multi-bit quantizers are of the uniform midrise type, for which $P_c = 1\%$. The power delay profile is taken as uniform, that is, $p[\ell] = 1/L$ for $0 \leq \ell < L$. In the plots, the analytical curves for which the SINDR calculation is based on Proposition 3 are referred to as "Analytical Tight Approx." and are indicated with dashed lines. Moreover, the curves for which the SINDR is calculated based on Proposition 4 are named "Analytical Approx." and are indicated with circles in Fig. 3a and with solid lines in Fig. 3b. As the first case, we change the block length $N$ while all other parameters are fixed (except $L$, which should be directly proportional to $N$) for the perfect CSI condition. The BER vs. signal-to-interference ratio (SIR or $\rho_d^2/\rho_i^2$) curves are presented in Fig. 3a. 
Note that a single $\rho_d^2/\rho_i^2$ for each data point does not imply that the average received power for every band (desired or interference band) or subcarrier/user is the same for a given channel realization, due to (15). In Fig. 3a, the simulated values are indicated with solid lines. As can be noted in the three curves grouped as $M = 64$ curves on the right-hand side of Fig. 3a, the analytical calculations based on Proposition 3 and (30) are in good agreement with the simulated values. Moreover, the approximate analytical curves based on Proposition 4 generally follow the simulated curves. In addition, we see that increasing the oversampling rate (equivalently, increasing the block length while the number of occupied subcarriers is fixed) and the number of antennas $M$ are useful to combat ACI. We can observe up to 5 dB SIR gain by increasing the oversampling rate from $\beta_D = \beta_I \approx 3.41$ ($N = 1024$, $L = 10$) to $\beta_D = \beta_I \approx 13.65$ ($N = 4096$, $L = 40$) when the SIR levels required to achieve a given target BER are considered. Significant SIR gains are also observed in Fig. 3a when $M$ is increased, as expected from the analysis.

For the same simulation setting, the ergodic capacity in bits per channel use (bpcu) per user, calculated using (26), is plotted in Fig. 3b. An SIR gain of more than 5 dB is observed with increasing oversampling rate when the SIR levels to achieve an ergodic capacity of 3 bpcu per user are compared. Moreover, it should be noted that the approximate analytical curve based on Proposition 4 is close to the tight approximation curve in Proposition 3 for $N = 1024$, a relatively low oversampling rate case, verifying that the approximation in Proposition 4 is accurate for low oversampling rates and when $L$ is large.

[Footnote: In fact, $R'$ should be compared to a slightly modified version of $R$ in (26), for which the outer expectation is taken inside the logarithm, but the values obtained for this version are very similar to the values obtained for $R$ in (26), and are thus not shown in Fig. 3b for the simplicity of the plot.]

Furthermore, it can be noticed that the approximate curve based on Proposition 4 yields higher BER for low SIR values and lower BER for high SIR values. The reason for this is as follows. For high SIR values, it can be shown that each element of $C_q[m]$ will be the samples of an aliased sinc pulse ($m$ being the sample index) centered around $m = 0$, all samples being real valued as $\mathcal{K}_D$ is symmetrical around the zeroth subcarrier. Since the values of the tails of the sinc pulse are much lower than those of its main lobe, it is reasonable to assume that $C_q[m] \approx 0$ when $|m| > N/|\mathcal{K}_D| \triangleq W$ (as $2N/|\mathcal{K}_D|$ is the null-to-null bandwidth of the sinc pulse). Therefore, $\Gamma[m] \approx 0$ for $|m| > W$. As $C_q[m] = C_q[-m]$, $\Gamma[m] = \Gamma[-m]$, and $|\mathcal{K}_D| \gg 1$,

$$\mathrm{DFT}_{N,k}\{\Gamma[m]\} \approx \sum_{m=-W}^{W}\Gamma[m]\,e^{-j2\pi mk/N} = \sum_{m=-W}^{W}\Gamma[m]\cos(2\pi mk/N) > \Gamma[0].$$

Since the approximation assumes that $\Gamma[m] = 0$ for $m \neq 0$ and $\sum_{m=-W}^{W}\Gamma[m] > \Gamma[0]$, the quantization noise covariance matrix calculated using Proposition 2 under this assumption has lower values compared to its exact version, resulting in a higher SINDR calculation than the exact values. For the low SIR case, it can be shown that each element of $C_q[k]$ will mostly be concentrated inside the interfering band (for $k \in \mathcal{K}_I$) and the quantization noise in the desired band is due to the tails of sinc pulses, making the variance of the quantization noise in the desired band less than the assumed quantization distortion value. The poor performance for the low SIR case is mostly due to the magnitude distortion caused by the matrix $A$ in (13).

Simulations for the imperfect CSI case are also carried out. The pilot sequence length is taken to be $N$, whereas the set of subcarriers for pilot signals, $\mathcal{K}_P$, is selected as $\mathcal{K}_D$ together with a few additional subcarriers. The noise spectral density $N_o$ in the channel estimation phase is also normalized such that $\rho_p^2/N_o = 4$ dB, taking $\rho_p = 1$. The BER vs. SIR plots are presented in Fig. 4a.

[Fig. 4: Performance plots for imperfect CSI. (a) BER (QPSK) vs. SIR ($\rho_d^2/\rho_i^2$, dB), one-bit ADC; (b) $R$ (bpcu/user) vs. SIR, one-bit ADC; (c) BER (8-PSK) vs. SIR, one-bit ADC; (d) BER (QPSK) vs. SIR, multi-bit ADC, with 1-bit ($N = 1024, 2048, 4096$), 2-bit ($N = 1024, 2048$) and 3-bit ($N = 1024$) curves.]

As can be noted in Fig. 4a, the analytical BER curves obtained based on the SINDR calculation in Proposition 3 are very close to the simulated results. Moreover, while the approximate analytical curve is not as close to the simulated values as the "Analytical Tight Approx." curves, it follows the error rate curves in general. It can also be deduced from Fig. 4a that the 5 dB SIR gain achieved with oversampling for the perfect CSI case can also be attained under imperfect channel knowledge when the SIR levels required to achieve a given target BER are considered. In fact, it is even more than 5 dB (about 6.5 dB) for imperfect CSI. This is because oversampling also enhances the channel estimation quality, which in turn further improves the error rate performance of the one-bit quantized system with imperfect CSI. Comparing the perfect and imperfect CSI BER curves in Fig. 3a and Fig. 4a, we observe about 6.5 dB SIR loss due to channel estimation error. This is not an unexpected value, as it is known that the normalized channel estimation mean squared error converges to -4.4 dB for infinite training power with one-bit ADCs [12], resulting in a similar SIR loss according to (32). The remaining 2.1 dB loss is due to the finite training power, which is close to the SNR loss of about 2 dB for a MIMO setting with infinite ADC resolution [31]. 
Moreover, to compare the proposed channel estimation algorithm with an existing method of comparable complexity in [13], which neither considers the effect of ACI nor employs oversampling for channel estimation, we ran simulations to obtain the BER performance of our system with $N = 1024$ when it uses the channel estimates obtained with the channel estimation method in [13]. The corresponding BER curve is labeled as "Channel Est. [13] Sim. N=1024" in Fig. 4a. As can be noted, a significant performance loss is observed if the channel estimation method in [13] is used instead of our method.

In addition to the BER curves, the ergodic capacity curves for imperfect CSI are presented in Fig. 4b. As can be noted in Fig. 4b, more than 5 dB SIR advantage can be obtained with temporal oversampling when the SIR levels to achieve a target ergodic rate of 3 bpcu/user are compared. Moreover, the "Analytical Approx." curve generally follows the "Analytical Tight Approx." curve. The approximate ergodic capacity curve is not as close to the tight approximation curve for $N = 1024$ as in the perfect CSI case because of the additional approximation error due to the assumptions involved in the proposed channel estimation scheme, which distort the orthogonality of the channel estimates and the estimation errors.

Regarding the performance with higher-order modulations, BER vs. SIR plots for 8-PSK and imperfect CSI are presented in Fig. 4c. As can be expected, worse BER performance is observed compared to QPSK. An error floor is observed since the quantization and thermal noise exist even if the interference power is zero (infinite SIR). Moreover, a significant BER performance advantage is obtained with oversampling. 
The tight approximation based on Proposition 3 closely approximates the simulated values, while the approximate curves based on Proposition 4 provide accurate values for low oversampling rates, as expected from the discussion in Appendix D.

The error rate curves are also plotted for multi-bit quantizers (up to 3 bits) in Fig. 4d. As can be noted in Fig. 4d, the simulated values are very close to the analytical BER curves, which are based on the SINDR calculation in Proposition 3. In Fig. 4d, the oversampling rate increases from 3.41 to 13.65 ($L$ from 10 to 40) towards the left for all quantization resolutions (1 bit to 3 bits). However, it should be noted that for the 2-bit quantizer there are only two cases, namely $N = 1024$ and $N = 2048$, while for the 3-bit quantizer there is only the BER curve for $N = 1024$. As can also be noted in Fig. 4d, the simulated values are in very good agreement with the analytical values, even for the 2-bit quantizer; thus, it can be stated that the assumption of uncorrelated quantization noise in time is accurate unless the ADC resolution is as low as one bit (the quantization noise correlation in time is taken into account for the one-bit ADC case).

In order to make fair comparisons between the cases presented in Fig. 4d, we equate the power consumptions of the various quantization resolution and oversampling rate cases. It is assumed that the power consumption of an ADC is proportional to $2^q$, that is, the power consumption is doubled for each additional bit. This assumption is verified to be accurate in various studies [10], [32]. It is also assumed that the ADC power consumption grows linearly with the oversampling rate, as in [32].

Equating the power consumptions, the first cases to be compared are the 1-bit ADC with $N = 2048$ and the 2-bit ADC with $N = 1024$. As can be noted in Fig. 4d, ACI suppression in the 1-bit ADC with $N = 2048$ case is better than in the 2-bit ADC with $N = 1024$ case in the higher-BER region, while the 2-bit ADC with $N = 1024$ case is slightly better in the lower-BER region. Therefore, it can be stated that the 1-bit ADC with $N = 2048$ is preferable over the 2-bit ADC with $N = 1024$, as it provides better BER values over part of the range and the complexity of a 1-bit ADC is significantly lower (there is no need for an AGC unit in a 1-bit ADC; die area is doubled with every bit increase for flash ADCs; and component matching requirements are also doubled with every bit increase for flash, successive approximation (SAR) or pipelined ADCs [33]). Moreover, the time it takes to complete a conversion (conversion time) is also doubled with every additional bit for integrating ADCs, while the conversion time scales linearly with the number of bits for SAR or pipelined converters [33]. Therefore, by oversampling with a 1-bit ADC, we achieve competitive error rate performance relative to 2-bit ADCs at equal total power consumption, and we also gain advantages regarding ADC complexity and conversion time.

We can also compare two other cases, the 1-bit ADC with $N = 4096$ and the 2-bit ADC with $N = 2048$, as their power consumptions are equal. We see from Fig. 4d that their performances are nearly equal at sufficiently low BER values, while the 2-bit ADC with $N = 2048$ case has about 1.5 dB SIR advantage at a higher BER value. The design engineer should consider whether a 1.5 dB SIR advantage is worth using 2-bit ADCs, which have the aforementioned disadvantages regarding implementation complexity and conversion time compared to 1-bit ADCs.

The remaining performance comparison is between 3-bit ADCs with $N = 1024$ and 2-bit ADCs with $N = 2048$, as their power consumptions are equal. We can see that we have about 4 dB SIR advantage with the 3-bit ADC. 
However, such an SIR gain does not come for free: 2-bit ADCs are much more advantageous than 3-bit ADCs in terms of die area, component matching circuitry and conversion time. Thus, 2-bit ADCs with oversampling can be a reasonable choice over 3-bit ADCs despite the SIR disadvantage.

VI. CONCLUSIONS

In this paper, we have presented a performance analysis for an uplink massive MIMO-OFDM system with low-resolution oversampling ADCs under frequency selective fading in an adjacent channel interference scenario for perfect or imperfect receiver CSI. The analysis arrived at two important expressions, one of which gives very accurate results but limited insights, the other giving noticeable approximation errors but much clearer insights into the dependence of system performance on the system parameters. We have shown both with analysis and simulations that adjacent band interference can be suppressed by increasing the number of antennas or the oversampling rate. Moreover, we discussed whether to use lower-resolution ADCs with higher oversampling rates or higher-resolution ADCs with lower oversampling rates by comparing their error rate performances while their power consumptions are equated.

APPENDIX A
PROOF OF PROPOSITION 1

$$\mathbf{C}_{\mathbf{r}}[m] = E\left[\mathbf{r}[n]\,\mathbf{r}[n-m]^H\right] = E\left[\frac{1}{N}\sum_{k=0}^{N-1}\sum_{k'=0}^{N-1} \tilde{\mathbf{r}}[k]\,\tilde{\mathbf{r}}[k']^H e^{j2\pi nk/N} e^{-j2\pi (n-m)k'/N}\right] \tag{52}$$

$$= \frac{1}{N}\sum_{k=0}^{N-1} E\left[\tilde{\mathbf{r}}[k]\,\tilde{\mathbf{r}}[k]^H\right] e^{j2\pi nk/N} e^{-j2\pi (n-m)k/N} \tag{53}$$

$$= \frac{1}{N}\sum_{k=0}^{N-1} \mathbf{C}_{\tilde{\mathbf{r}}}[k]\, e^{j2\pi mk/N}. \tag{54}$$

Here (52) holds by definition, (53) is because $E\left[\tilde{\mathbf{r}}[k]\,\tilde{\mathbf{r}}[k']^H\right] = \mathbf{0}$ for $k \neq k'$ as the data symbols of the users are independent, and (54) also holds by definition. $\blacksquare$

APPENDIX B
PROOF OF PROPOSITION 2

$\mathbf{C}_{\tilde{\mathbf{q}}}[k]$ can be found from $\mathbf{C}_{\mathbf{q}}[m]$ as

$$\mathbf{C}_{\tilde{\mathbf{q}}}[k] = E\left[\tilde{\mathbf{q}}[k]\,\tilde{\mathbf{q}}[k]^H\right] = \sum_{m=0}^{N-1}\sum_{m'=0}^{N-1} E\left[\mathbf{q}[m]\,\mathbf{q}[m']^H\right] e^{-j2\pi mk/N} e^{j2\pi m'k/N} \tag{55}$$

$$= \sum_{m=0}^{N-1}\sum_{m'=0}^{N-1} \mathbf{C}_{\mathbf{q}}[m-m']\, e^{-j2\pi (m-m')k/N} \tag{56}$$

$$= \sum_{\ell=-(N-1)}^{N-1} (N-|\ell|)\, \mathbf{C}_{\mathbf{q}}[\ell]\, e^{-j2\pi \ell k/N} \tag{57}$$

$$= \sum_{\ell=0}^{N-1} (N-\ell)\, \mathbf{C}_{\mathbf{q}}[\ell]\, e^{-j2\pi \ell k/N} + \sum_{\ell=0}^{N-1} (N-\ell)\, \mathbf{C}_{\mathbf{q}}[\ell]^{*} e^{j2\pi \ell k/N} - N\mathbf{C}_{\mathbf{q}}[0] \tag{58}$$

$$= \mathrm{DFT}_{N,k}\{\boldsymbol{\Gamma}[\ell]\} + \mathrm{DFT}_{N,k}\{\boldsymbol{\Gamma}[\ell]\}^{*} - N\mathbf{C}_{\mathbf{q}}[0]. \tag{59}$$

Here (55)-(56) hold by definition and due to the stationarity of $\mathbf{q}[m]$. A change of variable ($\ell = m - m'$) is introduced in (57). Finally, (58) holds since $\mathbf{C}_{\mathbf{q}}[\ell]^{*} = \mathbf{C}_{\mathbf{q}}[-\ell]$, with the $\ell = 0$ term, which appears in both sums, subtracted once. $\blacksquare$

APPENDIX C
PROOF OF PROPOSITION 3

With $\gamma_u[k]$ as defined in (27), the fact that $R$ in (26) is a lower bound on the ergodic capacity follows from [1, eq. (2.46)], since the conditions $E[w_u[k] \mid \hat{\mathbf{H}}] = E[\tilde{s}_u[k]^{*} w_u[k] \mid \hat{\mathbf{H}}] = E[g_u[k]'\, \tilde{s}_u[k]^{*} w_u[k] \mid \hat{\mathbf{H}}] = 0$, where $w_u[k] \triangleq i_u[k] + n_u[k] + q_u[k]$, hold for the LMMSE channel estimate $\hat{\mathbf{H}}$.
$E[g_u[k]' \mid \hat{\mathbf{H}}]$ in the numerator of the $\gamma_u[k]$ expression can be found as follows:

$$E\left[g_u[k]' \mid \hat{\mathbf{H}}\right] = E\left[\sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{h}}_u[k] \mid \hat{\mathbf{H}}\right] \tag{60}$$

$$= E\left[\sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_u[k] \mid \hat{\mathbf{H}}\right] + E\left[\sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{e}}_u[k] \mid \hat{\mathbf{H}}\right] \tag{61}$$

$$= \sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_u[k], \tag{62}$$

where the last step is owing to the fact that LMMSE channel estimates are unbiased, thus the estimation errors are zero-mean. $\mathrm{Var}[g_u[k]' \mid \hat{\mathbf{H}}]$ can also be found as follows:

$$\mathrm{Var}\left[g_u[k]' \mid \hat{\mathbf{H}}\right] = \mathrm{Var}\left[\sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_u[k] + \sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{e}}_u[k] \mid \hat{\mathbf{H}}\right] \tag{63}$$

$$= \mathrm{Var}\left[\sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{e}}_u[k] \mid \hat{\mathbf{H}}\right] \tag{64}$$

$$= \rho_d N\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, E\left[\tilde{\mathbf{e}}_u[k]\, \tilde{\mathbf{e}}_u[k]^H\right] \mathbf{A}^H \hat{\mathbf{b}}_u[k] = \rho_d N \sigma_e^2\, \|\hat{\mathbf{b}}_u[k]^H \mathbf{A}\|^2, \tag{65}$$

where (65) follows from the fact that the LMMSE channel estimation errors are uncorrelated with the channel estimates, hence with the ZF matrix row vectors. Moreover,

$$\mathrm{Var}\left[i_u[k] + n_u[k] + q_u[k] \mid \hat{\mathbf{H}}\right] = \mathrm{Var}\left[i_u[k] \mid \hat{\mathbf{H}}\right] + \mathrm{Var}\left[n_u[k] \mid \hat{\mathbf{H}}\right] + \mathrm{Var}\left[q_u[k] \mid \hat{\mathbf{H}}\right]. \tag{66}$$

Equation (66) is due to the uncorrelatedness of the thermal and quantization noise with every other term. The uncorrelatedness of the quantization noise with the other terms is due to the Bussgang decomposition: the matrix $\mathbf{A}$ is selected to make $\mathbf{r}[n]$ uncorrelated with $\mathbf{q}[n]$ in (13), and this also holds for the frequency domain terms as the DFT is a unitary transformation which preserves inner products.
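The uncorrelatedness property invoked for (66) can be illustrated with a scalar Monte Carlo sketch of the Bussgang decomposition. The sketch assumes a single complex Gaussian input and a one-bit quantizer with outputs $\pm 1 \pm j$. The sample size, the seed and `sigma` are arbitrary illustrative choices, not values from the paper.

```python
import math
import random

random.seed(0)
sigma = 1.5          # standard deviation of the complex Gaussian input (illustrative)
num_samples = 200_000


def sign(x: float) -> float:
    return 1.0 if x >= 0 else -1.0


# Bussgang gain for this quantizer: a = E[y r*] / E[|r|^2] = 2 / (sqrt(pi) * sigma)
a = 2.0 / (math.sqrt(math.pi) * sigma)

cross = 0j       # accumulates q * conj(r)
q_power = 0.0    # accumulates |q|^2
for _ in range(num_samples):
    # r ~ CN(0, sigma^2): real and imaginary parts each have variance sigma^2 / 2
    r = complex(random.gauss(0, sigma / math.sqrt(2)),
                random.gauss(0, sigma / math.sqrt(2)))
    y = complex(sign(r.real), sign(r.imag))   # one-bit quantizer output
    q = y - a * r                             # Bussgang distortion term
    cross += q * r.conjugate()
    q_power += abs(q) ** 2

corr_qr = abs(cross) / num_samples    # estimate of |E[q r*]|, expected near 0
var_q = q_power / num_samples         # estimate of E[|q|^2], expected near 2 - 4/pi
print(corr_qr, var_q, 2 - 4 / math.pi)
```

With the gain chosen this way, the distortion term is empirically uncorrelated with the input, and its power is close to $2 - 4/\pi$ for this particular quantizer scaling.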
Moreover, $\mathrm{Var}[i_u[k] \mid \hat{\mathbf{H}}]$ can be derived as follows:

$$\mathrm{Var}\left[i_u[k] \mid \hat{\mathbf{H}}\right] = \rho_d N\, \mathrm{Var}\left[\sum_{z \neq u,\, z \in \mathcal{U}_d} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_z[k]\, \tilde{s}_z[k] + \sum_{z \neq u,\, z \in \mathcal{U}_d} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{e}}_z[k]\, \tilde{s}_z[k] \,\Big|\, \hat{\mathbf{H}}\right] \tag{67}$$

$$= \rho_d N \left[\sum_{z \neq u,\, z \in \mathcal{U}_d} |\hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_z[k]|^2 + \sum_{z \neq u,\, z \in \mathcal{U}_d} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, E\left[\tilde{\mathbf{e}}_z[k]\, \tilde{\mathbf{e}}_z[k]^H\right] \mathbf{A}^H \hat{\mathbf{b}}_u[k]\right] \tag{68}$$

$$= \rho_d N \left[\sum_{z \neq u,\, z \in \mathcal{U}_d} |\hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_z[k]|^2 + \sum_{z \neq u,\, z \in \mathcal{U}_d} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \sigma_e^2\, \mathbf{A}^H \hat{\mathbf{b}}_u[k]\right] \tag{69}$$

$$= \rho_d N \left[\sum_{z \neq u,\, z \in \mathcal{U}_d} |\hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_z[k]|^2 + \sigma_e^2 (U-1)\, \|\hat{\mathbf{b}}_u[k]^H \mathbf{A}\|^2\right], \tag{70}$$

where (68) follows from the fact that the channel estimation errors for all users are also uncorrelated with each other and with the ZF matrix row vectors. Moreover, (69) is owing to the fact that the estimation error variances for all channel coefficients are the same due to (44). It is also straightforward to show that $\mathrm{Var}[n_u[k] \mid \hat{\mathbf{H}}, \tilde{s}_u[k]] = N_u[k]$ and $\mathrm{Var}[q_u[k] \mid \hat{\mathbf{H}}, \tilde{s}_u[k]] = Q_u[k]$. $\blacksquare$

APPENDIX D
PROOF OF PROPOSITION 4

Denote the $m$th diagonal element of $\mathbf{C}_{\mathbf{r}}[0]$ by $\mathbf{C}_{\mathbf{r}}[0]_{m,m}$, which can be calculated as

$$\mathbf{C}_{\mathbf{r}}[0]_{m,m} = E\left[\left|\sum_{\ell=0}^{L-1}\sum_{u=1}^{U+I} h_{m,u}[\ell]\, s_u[n-\ell] + w_m[n]\right|^2 \,\Big|\, \mathbf{H}\right] = \sum_{\ell=0}^{L-1}\sum_{u=1}^{U+I} G_u |h_{m,u}[\ell]|^2 + N_o, \tag{71}$$

where $G_u = (|\mathcal{K}_D| \rho_d)/N$ if $u \in \mathcal{U}_D$ or $G_u = (|\mathcal{K}_I| \rho_i)/N$ if $u \in \mathcal{U}_I$.
Owing to the Chebyshev inequality,

$$\Pr\left[\,|\mathbf{C}_{\mathbf{r}}[0]_{m,m} - \mu| \geq \epsilon\,\right] \leq \frac{\mathrm{Var}[\mathbf{C}_{\mathbf{r}}[0]_{m,m}]}{\epsilon^2} \quad \forall m, \tag{72}$$

where $\mu \triangleq E_{\mathbf{H}}[\mathbf{C}_{\mathbf{r}}[0]_{m,m}] = (|\mathcal{K}_D| U \rho_d + |\mathcal{K}_I| I \rho_i)/N + N_o$, $\Pr[\xi]$ denotes the probability of an event $\xi$, and $\mathrm{Var}[\mathbf{C}_{\mathbf{r}}[0]_{m,m}]$ is the variance of $\mathbf{C}_{\mathbf{r}}[0]_{m,m}$ with respect to $\mathbf{H}$, which is equal to $G_u^2/L$. Since $\mathrm{Var}[\mathbf{C}_{\mathbf{r}}[0]_{m,m}]$ decreases with $L$, for any $\epsilon > 0$ the right-hand side of (72) can be made arbitrarily small by choosing $L$ sufficiently large; thus $\mathbf{C}_{\mathbf{r}}[0]_{m,m}$ converges in probability to $\mu$. Therefore, the error in the approximation $\mathbf{C}_{\mathbf{r}}[0]_{m,m} \approx (|\mathcal{K}_D| U \rho_d + |\mathcal{K}_I| I \rho_i)/N + N_o$ converges to zero in probability as $L$ grows large (goes to infinity). This implies that

$$\mathbf{A} = \frac{2}{\sqrt{\pi}}\, \mathrm{diag}(\mathbf{C}_{\mathbf{r}}[0])^{-0.5} \rightarrow \frac{2}{\sqrt{\pi}} \left(\left(|\mathcal{K}_D| U \rho_d + |\mathcal{K}_I| I \rho_i\right)/N + N_o\right)^{-0.5} \mathbf{I} = G\mathbf{I}, \tag{73}$$

as $L$ grows large (convergence is represented by the $\rightarrow$ symbol). It can similarly be shown that $\mathbf{C}_{\mathbf{r}}[0]_{m,i} \approx 0$ for $m \neq i$ as $L$ grows large, since $E_{\mathbf{H}}[\mathbf{C}_{\mathbf{r}}[0]_{m,i}] = 0$ as $h_{m,u}[\ell]$, $h_{i,u}[\ell]$, $w_m[n]$ and $w_i[n]$ are uncorrelated for $m \neq i$. The numerator of the $\gamma_u[k]'$ expression can be found as

$$\left|E[g_u[k]']\right|^2 = \left|E\left[\sqrt{\rho_d N}\, \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_u[k]\right]\right|^2 \approx \left|\sqrt{\rho_d N}\, G\, E\left[\hat{\mathbf{b}}_u[k]^H \hat{\mathbf{h}}_u[k]\right]\right|^2 \tag{74}$$

$$= \left|\sqrt{\rho_d N}\, G\right|^2 = \rho_d N G^2, \tag{75}$$

where the approximation in (74) is due to the approximation in (73), and (75) is owing to ZF combining.
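The concentration argument behind (71)-(73) can be illustrated numerically. The sketch below assumes independent Rayleigh-fading taps with a uniform power delay profile, $h_{m,u}[\ell] \sim \mathcal{CN}(0, 1/L)$, which is an assumption made here for illustration. The values in `gains` (playing the role of $G_u$), `noise_power` and the trial count are likewise hypothetical.

```python
import random
import statistics

random.seed(1)
gains = [1.0, 1.0, 0.5]   # hypothetical G_u values for U + I = 3 users
noise_power = 0.1         # hypothetical N_o
mu = sum(gains) + noise_power   # mean of the diagonal entry when tap powers sum to one


def diagonal_entry(num_taps: int) -> float:
    """One realization of sum_l sum_u G_u |h[l]|^2 + N_o for i.i.d. CN(0, 1/L) taps."""
    total = noise_power
    for g in gains:
        for _ in range(num_taps):
            # |h|^2 is exponentially distributed with mean 1/L when h ~ CN(0, 1/L)
            total += g * random.expovariate(num_taps)
    return total


results = {}
for num_taps in (4, 64):
    samples = [diagonal_entry(num_taps) for _ in range(2000)]
    results[num_taps] = (statistics.fmean(samples), statistics.pvariance(samples))
    print(num_taps, results[num_taps])
```

Under these assumptions the sample variance shrinks roughly in proportion to $1/L$ while the sample mean stays near $\mu$, which is the behavior the Chebyshev bound in (72) relies on.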
What remains is to find the denominator terms of $\gamma_u[k]'$ as follows:

$$\mathrm{Var}[g_u[k]'] + \mathrm{Var}[w_u[k]] = \mathrm{Var}[g_u[k]'] + \mathrm{Var}[i_u[k]] + \mathrm{Var}[n_u[k]] + \mathrm{Var}[q_u[k]], \tag{76}$$

$$\begin{aligned}
\mathrm{Var}[n_u[k]] &= E\left[|\hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{w}}[k]|^2\right] \approx G^2 E\left[\hat{\mathbf{b}}_u[k]^H \tilde{\mathbf{w}}[k]\, \tilde{\mathbf{w}}[k]^H \hat{\mathbf{b}}_u[k]\right] \\
&= G^2 E\left[\mathrm{Tr}\left[\tilde{\mathbf{w}}[k]\, \tilde{\mathbf{w}}[k]^H \hat{\mathbf{b}}_u[k]\, \hat{\mathbf{b}}_u[k]^H\right]\right] = G^2\, \mathrm{Tr}\left[E\left[\tilde{\mathbf{w}}[k]\, \tilde{\mathbf{w}}[k]^H\right] E\left[\hat{\mathbf{b}}_u[k]\, \hat{\mathbf{b}}_u[k]^H\right]\right] \\
&= G^2 N N_o\, E\left[\mathrm{Tr}\left[\hat{\mathbf{b}}_u[k]\, \hat{\mathbf{b}}_u[k]^H\right]\right] = G^2 N N_o\, E\left[\|\hat{\mathbf{b}}_u[k]\|^2\right] \approx \frac{G^2 N N_o}{(M-U)(1-\sigma_e^2)},
\end{aligned} \tag{77}$$

where in the last step the approximation $E[\|\hat{\mathbf{b}}_u[k]\|^2] \approx 1/((M-U)(1-\sigma_e^2))$ is used, whose proof can be found in [1]. $\mathrm{Var}[q_u[k]]$ can be found similarly as

$$\mathrm{Var}[q_u[k] \mid \tilde{s}_u[k]] \approx N(2 - 4/\pi)/\left((M-U)(1-\sigma_e^2)\right), \tag{78}$$

under the approximation $\mathbf{C}_{\tilde{\mathbf{q}}}[k] \approx N(2 - 4/\pi)\mathbf{I}$, which is accurate under two conditions. The first is when $L$ is large, for which the approximation $\mathbf{C}_{\mathbf{r}}[0] \approx \mu\mathbf{I}$ (equivalently $\mathbf{A} \approx G\mathbf{I}$) has been shown to be accurate, which along with (19) and (20) implies that $\mathbf{C}_{\mathbf{q}}[0] \approx (2 - 4/\pi)\mathbf{I}$ is accurate. The second condition is when the oversampling rates are low and $\rho_d \approx \rho_i$, in which case it can be shown that the approximation $\mathbf{C}_{\mathbf{r}}[\ell] \approx \mathbf{0}$ for $\ell \neq 0$ is accurate, which implies along with (19) and (20) that the approximation $\mathbf{C}_{\mathbf{q}}[\ell] \approx \mathbf{0}$ for $\ell \neq 0$ is accurate. Then, it follows from Proposition 2 that the approximation $\mathbf{C}_{\tilde{\mathbf{q}}}[k] \approx N(2 - 4/\pi)\mathbf{I}$ is accurate.

For the proof that the approximation $\mathbf{C}_{\mathbf{r}}[\ell] \approx \mathbf{0}$ for $\ell \neq 0$ is accurate for low oversampling rates, consider the received signal vector at the $m$th antenna, namely $\mathbf{r}_m \triangleq [r_m[0]\; r_m[1]\; \ldots\; r_m[N-1]]^T$.
It can be written as $\mathbf{r}_m = \mathbf{H}_m \mathbf{s} + \mathbf{w}_m$, where $\mathbf{H}_m$ is a block circulant channel matrix whose $k$th row is equal to the $(mk)$th row of $\mathbf{H}$ and $\mathbf{w}_m$ is a vector whose $k$th element is equal to the $(mk)$th element of $\mathbf{w}$. Moreover, defining $\tilde{\mathbf{s}} \triangleq (\mathbf{F}_N \otimes \mathbf{I}_{U+I})\, \mathbf{s}$, where $\otimes$ represents the Kronecker product and $\mathbf{F}_N^H$ is the $N \times N$ DFT matrix, the autocorrelation of $\mathbf{r}_m$ conditioned on $\mathbf{H}_m$, namely $\mathbf{C}_{\mathbf{r}_m} \triangleq E[\mathbf{r}_m \mathbf{r}_m^H \mid \mathbf{H}_m]$, can be found as

$$\mathbf{C}_{\mathbf{r}_m} = \mathbf{H}_m \mathbf{P} \mathbf{H}_m^H + N_o \mathbf{I}, \tag{79}$$

where $\mathbf{P} \triangleq \mathbf{F}^H E[\tilde{\mathbf{s}}\tilde{\mathbf{s}}^H]\, \mathbf{F}$ and $\mathbf{F}^H = (\mathbf{F}_N^H \otimes \mathbf{I}_{U+I})$. When $\rho_d \approx \rho_i = \rho$, the magnitude of the element of the matrix $\mathbf{P}$ at its $m$th row and $n$th column, denoted by $|P_{m,n}|$, can be written as

$$|P_{m,n}| = \left|\frac{\rho}{N}\sum_{k=0}^{N-1} E\left[|\tilde{s}_u[k]|^2\right] e^{-j2\pi \ell k/N}\right| \overset{(*)}{=} \left|-\frac{\rho}{N}\sum_{k \notin (\mathcal{K}_D \cup \mathcal{K}_I)} E\left[|\tilde{s}_u[k]|^2\right] e^{-j2\pi \ell k/N}\right| \tag{80}$$

$$< \frac{\rho}{N}\left(N - |\mathcal{K}_D| - |\mathcal{K}_I|\right), \tag{81}$$

where $u$ is the user index determined by the value of $m$, and $\ell = m - n$. Moreover, the equality (*) holds when $\ell \neq 0$. Since the $m$th diagonal element of $\mathbf{P}$ is $P_{m,m} = \rho/N$ $\forall m$, the ratio of the magnitude of any non-diagonal element of $\mathbf{P}$ to any diagonal element is bounded by $(N - |\mathcal{K}_D| - |\mathcal{K}_I|)$. Therefore, as $|\mathcal{K}_D| + |\mathcal{K}_I|$ approaches $N$, which occurs in a low oversampling rate scenario, the error in approximating $\mathbf{P}$ as a diagonal matrix goes to zero. In this case, $\mathbf{C}_{\mathbf{r}_m}$, each element of which is a weighted summation of the channel coefficients $h_{m,u}[\ell]$, converges to $E_{\mathbf{H}_m}[\mathbf{C}_{\mathbf{r}_m}] = E_{\mathbf{H}_m}[\mathbf{H}_m \mathbf{H}_m^H]\, \mathbf{P} + N_o \mathbf{I} = (U + I)\mathbf{P} + N_o \mathbf{I}$ as $L$ grows large, which can be shown rigorously following similar steps as in (71)-(72).
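The claim that $\mathbf{P}$ becomes nearly diagonal as the occupied bandwidth approaches $N$ can be checked with a small sketch. It assumes unit-power symbols on the occupied subcarriers and zero power elsewhere, and it uses the normalized kernel $p[\ell] = \frac{1}{N}\sum_{k \in \mathcal{K}} e^{-j2\pi \ell k/N}$, whose scaling is chosen here for convenience and may differ from the normalization of $\mathbf{P}$. The subcarrier counts are hypothetical.

```python
import cmath


def kernel(num_subcarriers: int, occupied: set, lag: int) -> complex:
    """p[lag] = (1/N) * sum over occupied subcarriers of exp(-j 2 pi lag k / N)."""
    return sum(cmath.exp(-2j * cmath.pi * lag * k / num_subcarriers)
               for k in occupied) / num_subcarriers


def max_off_diagonal(num_subcarriers: int, occupied: set) -> float:
    """Largest |p[lag]| over the non-zero lags."""
    return max(abs(kernel(num_subcarriers, occupied, lag))
               for lag in range(1, num_subcarriers))


N = 64
peaks = {}
for num_occupied in (32, 48, 60):
    occupied = set(range(num_occupied))   # contiguous occupied band (illustrative)
    peaks[num_occupied] = max_off_diagonal(N, occupied)
    bound = (N - num_occupied) / N        # magnitude bound from the unused subcarriers
    print(num_occupied, peaks[num_occupied], bound)
```

Because the full sum over all $N$ subcarriers vanishes for $\ell \neq 0$, the occupied-band sum equals minus the sum over the unused subcarriers, so its magnitude cannot exceed their count divided by $N$. The off-diagonal peak therefore shrinks as the number of occupied subcarriers approaches $N$.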
Since the proof is the same $\forall m$, the error in approximating the $\mathbf{C}_{\mathbf{r}_m}$ matrices as diagonal matrices $\forall m$, equivalently approximating $\mathbf{C}_{\mathbf{r}}[\ell] \approx \mathbf{0}$ for $\ell \neq 0$ or $\mathbf{C}_{\tilde{\mathbf{q}}}[k] \approx N(2 - 4/\pi)\mathbf{I}$, goes to zero as $L$ grows large and as $|\mathcal{K}_D| + |\mathcal{K}_I|$ approaches $N$ (which is the case for low oversampling rates) when $\rho_d \approx \rho_i$.

The proof continues with obtaining $\mathrm{Var}[i_u[k]]$ as follows:

$$\begin{aligned}
\mathrm{Var}[i_u[k]] &= \rho_d N\, \mathrm{Var}\left[\sum_{z \neq u,\, z \in \mathcal{U}_d} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \hat{\mathbf{h}}_z[k]\, \tilde{s}_z[k] + \sum_{z \neq u,\, z \in \mathcal{U}_d} \hat{\mathbf{b}}_u[k]^H \mathbf{A}\, \tilde{\mathbf{e}}_z[k]\, \tilde{s}_z[k]\right] \\
&\approx G^2 \rho_d N \sum_{z \neq u,\, z \in \mathcal{U}_d} E\left[|\hat{\mathbf{b}}_u[k]^H \hat{\mathbf{h}}_z[k]|^2\right] + \sum_{z \neq u,\, z \in \mathcal{U}_d} G^2 \rho_d N\, E\left[|\tilde{\mathbf{e}}_z[k]^H \hat{\mathbf{b}}_u[k]|^2\right].
\end{aligned} \tag{82}$$

Here, $E[|\hat{\mathbf{b}}_u[k]^H \hat{\mathbf{h}}_z[k]|^2] = 0$ $\forall z \neq u$ due to ZF combining and

$$E\left[|\tilde{\mathbf{e}}_z[k]^H \hat{\mathbf{b}}_u[k]|^2\right] = E\left[\mathrm{Tr}\left[\hat{\mathbf{b}}_u[k]\, \hat{\mathbf{b}}_u[k]^H \tilde{\mathbf{e}}_z[k]\, \tilde{\mathbf{e}}_z[k]^H\right]\right] = \mathrm{Tr}\left[E\left[\hat{\mathbf{b}}_u[k]\, \hat{\mathbf{b}}_u[k]^H\right] E\left[\tilde{\mathbf{e}}_z[k]\, \tilde{\mathbf{e}}_z[k]^H\right]\right] \tag{83}$$

$$= \sigma_e^2\, E\left[\|\hat{\mathbf{b}}_u[k]\|^2\right] \approx \frac{\sigma_e^2}{(M-U)(1-\sigma_e^2)}, \tag{84}$$

where (83) is due to the uncorrelatedness of the channel estimation errors $\tilde{\mathbf{e}}_z[k]$ with the channel estimates, hence with $\hat{\mathbf{b}}_u[k]$. Using (82) and (84), it can be written that

$$\mathrm{Var}[i_u[k]] \approx \frac{G^2 \rho_d N (U-1)\, \sigma_e^2}{(M-U)(1-\sigma_e^2)}. \tag{85}$$

Similarly, using (65), $\mathrm{Var}[g_u[k]']$ can be found as

$$\mathrm{Var}[g_u[k]'] \approx \frac{G^2 \rho_d N \sigma_e^2}{(M-U)(1-\sigma_e^2)}. \tag{86}$$

Then, from (76), (77), (78), (85) and (86) it follows that

$$\mathrm{Var}[g_u[k]'] + \mathrm{Var}[i_u[k] + n_u[k] + q_u[k]] \approx \frac{G^2 N N_o + N(2 - 4/\pi) + G^2 \rho_d N U \sigma_e^2}{(M-U)(1-\sigma_e^2)}, \tag{87}$$

which implies the proposition statement along with (75). $\blacksquare$

REFERENCES

[1] T. L. Marzetta, E. G. Larsson, Y. Hong, and H. Q. Ngo, Fundamentals of massive MIMO, 2016.
[2] E. G. Larsson et al., “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[3] J. G. Andrews et al., “What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
[4] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, Apr. 2013.
[5] E. Björnson et al., “Massive MIMO: Ten myths and one critical question,” IEEE Commun. Mag., vol. 54, no. 2, pp. 114–123, Feb. 2016.
[6] E. Björnson et al., “Massive MIMO with non-ideal arbitrary arrays: Hardware scaling laws and circuit-aware design,” IEEE Trans. Wireless Commun., vol. 14, no. 8, pp. 4353–4368, Aug. 2015.
[7] S. Jacobsson et al., “Massive MU-MIMO-OFDM uplink with hardware impairments: Modeling and analysis,” in Proc. Asilomar Conf. Signals Syst Comp., 2018, pp. 1829–1835.
[8] S. Jacobsson et al., “Throughput analysis of massive MIMO uplink with low-resolution ADCs,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 4038–4051, Jun. 2017.
[9] S. Jacobsson et al., “One-bit massive MIMO: Channel estimation and high-order modulations,” in Proc. IEEE Int. Conf. Commun., London, 2015, pp. 1304–1309.
[10] B. Murmann, “The race for the extra decibel: A brief review of current ADC performance trajectories,” IEEE Solid-State Circuits Mag., vol. 7, no. 3, pp. 58–66, Jul. 2015.
[11] C. Risi et al. (2014, Apr. 30) Massive MIMO with 1-bit ADC. [Online]. Available: http://arxiv.org/abs/1404.7736
[12] Y. Li et al., “Channel estimation and performance analysis of one-bit massive MIMO systems,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4075–4089, Aug. 2017.
[13] C. Mollen et al., “Uplink performance of wideband massive MIMO with one-bit ADCs,” IEEE Trans. Wireless Commun., vol. 16, no. 1, pp. 87–100, Jan. 2017.
[14] J. Choi et al., “Near maximum-likelihood detector and channel estimator for uplink multiuser massive MIMO systems with one-bit ADCs,” IEEE Trans. Commun., vol. 64, no. 5, pp. 2005–2018, May 2016.
[15] C. Studer and G. Durisi, “Quantized massive MU-MIMO-OFDM uplink,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2387–2399, Jun. 2016.
[16] T. Zhang et al., “Mixed-ADC massive MIMO detectors: Performance analysis and design optimization,” IEEE Trans. Wireless Comm., vol. 15, no. 11, pp. 7738–7752, Nov. 2016.
[17] J. Mo and R. W. Heath, “Capacity analysis of one-bit quantized MIMO systems with transmitter channel state information,” IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5498–5512, Oct. 2015.
[18] A. B. Üçüncü and A. O. Yılmaz, “Performance analysis of faster than symbol rate sampling in 1-bit massive MIMO systems,” in Proc. IEEE Int. Conf. Commun., 2017, pp. 1–6.
[19] A. B. Üçüncü and A. O. Yılmaz, “Oversampling in one-bit quantized massive MIMO systems and performance analysis,” IEEE Trans. Wireless Commun., vol. 17, no. 12, pp. 7952–7964, Dec. 2018.
[20] A. B. Üçüncü and A. O. Yılmaz, “Uplink performance analysis of oversampled wideband massive MIMO with one-bit ADCs,” in Proc. IEEE 88th Veh. Technol. Conf., 2018, pp. 1–5.
[21] A. B. Üçüncü and A. O. Yılmaz, “Sequential linear detection in one-bit quantized uplink massive MIMO with oversampling,” in Proc. IEEE 88th Veh. Technol. Conf., 2018, pp. 1–5.
[22] J. Choi et al., “Quantized distributed reception for MIMO wireless systems using spatial multiplexing,” IEEE Trans. Signal Process., vol. 63, no. 13, pp. 3537–3548, Jul. 2015.
[23] S. Berger et al., “Dynamic range-aware uplink transmit power control in LTE networks: Establishing an operational range for LTE’s open-loop transmit power control parameters ( α, p ),” IEEE Wireless Commun. Lett., vol. 3, no. 5, pp. 521–524, Oct. 2014.
[24] S. Jacobsson et al., “Linear precoding with low-resolution DACs for massive MU-MIMO-OFDM downlink,” IEEE Trans. Wireless Comm., vol. 18, no. 3, pp. 1595–1609, Mar. 2019.
[25] A. B. Üçüncü et al., “Performance analysis of one-bit massive MIMO with oversampling under adjacent channel interference,” to be presented in IEEE Global Comm. Conf., Waikoloa, HI, USA, Dec. 9-13, 2019.
[26] Evolved Universal Terrestrial Radio Access (E-UTRA) Physical channels and modulation, 3GPP Std. TS 36.211 V8.9.0.
[27] A. Goldsmith, Wireless Communications. New York, NY: Cambridge University Press, 2005.
[28] H. E. Rowe, “Memoryless nonlinearities with Gaussian inputs: Elementary results,” BELL Syst. Tech. J., vol. 61, no. 7, pp. 1519–1525, 1982.
[29] J. H. V. Vleck and D. Middleton, “The spectrum of clipped noise,” Proc. IEEE, vol. 54, no. 1, pp. 2–19, 1966.
[30] C. De Boor et al., A practical guide to splines. New York: Springer-Verlag, 1978, vol. 27.
[31] I. . Lai et al., “Asymptotic BER analysis for MIMO-BICM with zero-forcing detectors assuming imperfect CSI,” in , May. 2008, pp. 1238–1242.
[32] R. H. Walden, “Analog-to-digital converter survey and analysis,” IEEE J. Sel. Areas Commun., vol. 17, no. 4, pp. 539–550, Apr. 1999.
[33] S. Sumathi, P. Surekha, and P. Surekha,