Unequally Sub-connected Architecture for Hybrid Beamforming in Massive MIMO Systems

Abstract

A variety of hybrid analog-digital beamforming architectures have recently been proposed for massive multiple-input multiple-output (MIMO) systems to reduce energy consumption and the cost of implementation. In the analog processing network of these architectures, the practical sub-connected structure requires lower power consumption and hardware complexity than the fully connected structure but cannot fully exploit the beamforming gains, which leads to a loss in overall performance. In this work, we propose a novel unequal sub-connected architecture for hybrid combining at the receiver of a massive MIMO system that employs unequal numbers of antennas in sub-antenna arrays. The optimal design of the proposed architecture is analytically derived, and includes antenna allocation and channel ordering schemes. Simulation results show that an enhancement of up to 10% can be attained in the total achievable rate by unequally assigning antennas to sub-arrays in the sub-connected system at the cost of a marginal increase in power consumption. Furthermore, in order to reduce the computational complexity involved in finding the optimal number of antennas connected to each radio frequency (RF) chain, we propose three low-complexity antenna allocation algorithms. The simulation results show that they can yield a significant reduction in complexity while achieving near-optimal performance.


Nhan Thanh Nguyen and Kyungchun Lee,

Senior Member, IEEE


Index Terms

Hybrid precoding, analog combining, sub-connected architecture, massive MIMO, millimeter wave.

N. T. Nguyen is with the Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea (e-mail: [email protected]). K. Lee is with the Department of Electrical and Information Engineering and the Research Center for Electrical and Information Technology, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea (e-mail: [email protected]).

August 28, 2019 DRAFT (arXiv, cs.IT)

I. INTRODUCTION

In mobile communication, massive multiple-input multiple-output (MIMO) systems, where a base station (BS) is equipped with a large number of antennas, have recently been considered to drastically improve system performance in terms of spectral and energy efficiency [1]–[3]. In the conventional frequency band, precoding is typically processed only in the digital domain for interference mitigation among spatial substreams, which results in the requirement of a dedicated radio frequency (RF) chain and an analog-to-digital or digital-to-analog converter (ADC/DAC) for each antenna [4], [5]. As a result, the cost and power consumption of the transceiver increase approximately proportionally to the number of antennas [6], [7], which can lead to excessive power consumption in massive MIMO systems.

A. Related works

The hybrid analog-digital architecture, where signal processing is divided into the RF and baseband domains, is considered a practical transceiver design for massive MIMO systems because it can provide an enhanced tradeoff between the achievable spectral efficiency and power consumption [8]–[12].

A line of research has sought to optimize the tradeoff between the performance and power consumption of hybrid precoding/combining by optimally designing an analog precoding network based on phase shifters and switches [7], [8], [13]–[16]. In [7], Gholam et al. present three simplified analog combining architectures that rely on different combinations of phase shifters and variable gain amplifiers. Of these three architectures, the one based only on phase shifters provides the best bit-error-rate (BER) performance. Gholam et al. have shown that their proposed architecture can reduce overall system cost and power consumption; however, these reductions come at the expense of a signal-to-noise ratio (SNR) loss of at least 2 dB. In [13], Payami et al. propose a technique that successively approximates the desired overall analog precoder as a linear combination of practical analog precoders, which employ only a practical number of phase shifters. Although the approach proposed in [13] can reduce the power consumption on the RF end, improvements in spectral efficiency are seen only when the channel follows the Rayleigh fading model, which is generally impractical in mmWave communications due to limited scattering. The work of [14] proposes switch-only architectures that reduce complexity and power consumption while providing equivalent or better channel estimation performance and spectral efficiency when compared with structures based on phase shifters for which the number of quantization bits is low. However, when there are more than four quantization bits, switch-only architectures suffer from a significant loss in spectral efficiency compared to the architectures that employ phase shifters. In order to exploit the advantages of both switches and phase shifters, [15] proposes an analog architecture that employs both. This architecture profits from an improved tradeoff between performance and power consumption by optimizing the number of RF chains and phase shifters used in the analog part.

Most past studies have considered fully connected structures for hybrid precoding to achieve full precoding gains [4], [10], [17]–[22]. To further reduce power consumption and hardware complexity, the sub-connected structure, which connects each RF chain to only a subset of antennas, can also be considered [4], [22]. However, its beamforming gain is only $1/N$ that of the fully connected structure, where $N$ is the number of sub-antenna arrays in the sub-connected structure. In the literature, few studies [4], [9], [23]–[26] have focused on improving the performance of the sub-connected structure. In [4], an algorithm called successive interference cancellation (SIC)-based hybrid precoding was proposed to improve the energy efficiency of the sub-connected architecture. In [9], a dynamic sub-array structure was proposed to dynamically adapt to the spatial channel covariance matrix. In [23], an analog precoder was designed by using the idea of interference alignment. This precoding scheme exploits the alternating direction optimization method, where the phase shifter is adjusted using an analytical structure to optimize the analog precoder and combiner. In [24], a low-complexity hybrid combining design based on virtual path selection was introduced for both fully connected and sub-connected structures. In [25], a switch-based adaptive sub-connected architecture is proposed for hybrid precoding in multiuser massive MIMO systems.
Using a switching network, the precoding weights and their position in the precoding matrix are adaptively selected to match the channel entries with the largest amplitude, resulting in a significant improvement in the total achievable rate. In contrast, an adaptive antenna selection scheme is proposed for the transmitter with a limited number of RF chains in [26]. In this scheme, a subset of the transmit antennas is adaptively connected to RF chains using a switching network, whereas the corresponding subset of transmit antennas is selected by low-complexity transmit selection algorithms. All these schemes have been proposed to optimize the hybrid precoder/combiner for the sub-connected structure in which the same number of antennas or a single antenna is assigned to the sub-arrays. This work proposes a novel sub-connected architecture to further improve performance.

B. Contributions

Most past work in the area considered hybrid precoding schemes at the transmitter of massive MIMO systems with a focus on the fully connected architecture in analog processing. Compared with the fully connected architecture, the sub-connected structure is more advantageous in terms of practical deployment, complexity, and power consumption. However, its application to the receiver in massive MIMO systems has not been extensively investigated. In this work, we propose a novel unequal sub-connected architecture to improve the achievable rate of hybrid combining at the receiver in massive MIMO systems. The main idea of this scheme is to optimize the number of antennas assigned to the sub-antenna arrays based on channel conditions, which is achieved by theoretical analysis and low-complexity antenna allocation algorithms.

Although some adaptive hybrid beamforming schemes have been introduced in the literature [25], [26], our proposed scheme has a novel system structure. Specifically, in most of the prior works on sub-connected architectures, the same number of antennas is assigned to sub-arrays [4], [9], [25] or only a subset of the antennas is selected for signal precoding/combining [26]. Unlike those prior works, our proposed architecture allows forming sub-arrays with different numbers of antennas, and all the antennas are employed for signal combining. Further comparisons in terms of system structure and performance are given in Section V.

Our main contributions can be summarized as follows:

• We first investigate a system employing the conventional sub-connected structure, where the same number of antennas is assigned to each sub-antenna array. In particular, we derive the upper bound of its total achievable rate when factorization-aided analog combining is employed. We then show that this upper bound is unreachable owing to the fixed allocation of antennas in the conventional sub-connected structure, which limits overall system performance.

• We then propose a novel sub-connected architecture called the unequal sub-array (UESA) architecture, where different numbers of antennas can be assigned to sub-antenna arrays based on channel conditions. In the proposed architecture, the upper bound of the total achievable rate can be enhanced and, at the same time, the total achievable rate becomes more likely to reach its upper bound. As a result, overall performance can be improved.

• Although an improvement in performance is achieved, the proposed UESA architecture requires high computational complexity to find the optimal number of antennas for each sub-antenna array via an exhaustive search (ES). To solve this problem, we propose three low-complexity near-optimal algorithms that can substantially reduce the computational burden at the cost of only marginal performance losses.

• The power consumption and energy efficiency of the proposed architecture are also evaluated. Although the switching network in the proposed architecture causes additional power consumption compared with the conventional architecture, this increase is not significant. Finally, simulations are performed to justify the performance improvements of the proposed architecture and algorithms. Furthermore, the performance of the proposed schemes is compared to that of existing adaptive hybrid beamforming architectures, including the schemes in [25] and [26]. Our simulation results show that the proposed schemes perform far better than the adaptive antenna selection scheme in [26]. It is also shown that by combining the proposed schemes with the adaptive hybrid beamforming scheme in [25], significant improvements in the total achievable rates can be achieved.

The remainder of this paper is organized as follows: Section II introduces the system model, and Section III presents the factorization-aided analog-combining algorithm and the achievable rate of the conventional sub-connected structure. In Section IV, the total achievable rate and power consumption of the proposed sub-connected architecture are analyzed, and low-complexity antenna allocation algorithms are proposed. Section V presents simulation results, followed by the conclusion in Section VI.

Notations: Throughout this paper, scalars, vectors, and matrices are denoted by lower-case, bold-face lower-case, and bold-face upper-case letters, respectively. The $(i, j)$th element of a matrix $\mathbf{A}$ is denoted by $a_{i,j}$, whereas $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and conjugate transpose of a matrix, respectively. Furthermore, $\|\cdot\|$ represents the Frobenius norm of a matrix while $|\mathcal{S}|$ denotes the cardinality of a set $\mathcal{S}$. Moreover, $\mathbf{A}(i:j)$ and $\mathbf{A}(i:j, n)$ with $j > i$ denote a sub-matrix of $\mathbf{A}$ and a sub-column of the $n$th column of $\mathbf{A}$, respectively, consisting of the elements on rows $\{i, i+1, \ldots, j\}$ of $\mathbf{A}$. Finally, $\mathbf{I}_N$ denotes the identity matrix of size $N \times N$, and $\mathbf{0}$ is a vector of zeros with an appropriate number of dimensions.

II. SYSTEM MODEL

Consider the uplink of a multi-user MIMO system consisting of a BS equipped with $N_r$ receive antennas and $N$ RF chains, where $N < N_r$, and $K$ single-antenna mobile stations (MSs). For simplicity, we assume that $N = K$. In a massive MIMO setup, $N_r$ is assumed to be much larger than $K$. In this work, for analog combining, we focus on the sub-connected architecture where each RF chain is connected to a sub-array, which is a group of phase shifters and antenna elements. The conventional sub-connected architecture is illustrated in Fig. 1(a), where $N$ groups of $M$ antennas are equally assigned to $N$ sub-arrays, which are in turn connected to $N$ RF chains [5], [23]. In this work, we refer to this conventional structure as the equal sub-array (ESA) architecture. Furthermore, we assume that following processing by the analog combiner, the received signals are passed through capacity-achieving advanced digital receivers, such as sphere decoding [27] and tabu search [28].

[Fig. 1. Illustration of the conventional ESA and proposed UESA architectures: (a) conventional ESA architecture; (b) proposed UESA architecture.]

The analog-combined signal at the receiver can be expressed as

$$\mathbf{y} = \mathbf{W}^H \mathbf{H} \mathbf{x} + \mathbf{W}^H \mathbf{z}, \tag{1}$$

where $\mathbf{x} \in \mathbb{C}^{K \times 1}$ is the vector of symbols sent from $K$ MSs. We assume that the average transmit power of each MS is normalized to one, i.e., $\mathbb{E}\{|x_k|^2\} = 1$, $k = 1, \ldots, K$, and $\mathbf{z} \in \mathbb{C}^{N_r \times 1}$ is a vector of independent and identically distributed (i.i.d.) samples of additive white Gaussian noise (AWGN), where $z_i \sim \mathcal{CN}(0, N_0)$. Furthermore, $\mathbf{H} \in \mathbb{C}^{N_r \times K}$ denotes the channel matrix consisting of $K$ column vectors $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_K$ representing channels between the $K$ MSs and the BS. Each channel entry $h_{i,k}$ represents the complex channel gain between the $k$th MS and the $i$th receive antenna of the BS. The analog combining matrix $\mathbf{W} \in \mathbb{C}^{N_r \times N}$ is given by

$$\mathbf{W} = \begin{bmatrix} \mathbf{w}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_2 & & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{w}_N \end{bmatrix},$$

where $\mathbf{w}_n = [w_n^{(1)}, w_n^{(2)}, \ldots, w_n^{(M)}]^T$ is the analog weighting vector for the $n$th sub-array, and $w_n^{(m)}$ is the $m$th element of $\mathbf{w}_n$, which has the constant amplitude $1/\sqrt{M}$ but different phases, i.e., $w_n^{(m)} = \frac{1}{\sqrt{M}} e^{j\theta_n^{(m)}}$, $n = 1, \ldots, N$, $m = 1, \ldots, M$ [4], [29].

III. FACTORIZATION-AIDED ANALOG COMBINING AND THE CONVENTIONAL ESA SYSTEM

A. Factorization-aided analog combining algorithm

The idea of factorization-aided hybrid precoding was first proposed in [4] to optimize the hybrid precoding matrix at a transmitter employing the conventional ESA architecture. In this section, we extend it to optimize the analog combiner at the receiver. The total achievable rate $R$ for the analog-combined signal in (1) can be expressed as [10]

$$R = \log_2 \det\left( \mathbf{I}_N + \mathbf{R}^{-1} \mathbf{W}^H \mathbf{H} \mathbf{H}^H \mathbf{W} \right), \tag{2}$$

where $\mathbf{R} = N_0 \mathbf{W}^H \mathbf{W}$ is the noise covariance matrix after combining. Because $\mathbf{W}$ has a block-diagonal structure with entries in the form of $w_n^{(m)} = \frac{1}{\sqrt{M}} e^{j\theta_n^{(m)}}$, we have $\mathbf{W}^H \mathbf{W} = \mathbf{I}_N$. By letting $\rho = 1/N_0$, we have $\mathbf{R}^{-1} = \rho \mathbf{I}_N$, and (2) can be rewritten as

$$R = \log_2 \det\left( \mathbf{I}_K + \rho \mathbf{H}^H \mathbf{W} \mathbf{W}^H \mathbf{H} \right). \tag{3}$$

The optimal analog combiner is the solution of the problem of maximizing the total achievable rate, which can be formulated as

$$\mathbf{W}^\star = \arg\max_{\mathbf{W}} R, \quad \text{s.t. } \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_N \in \mathcal{F}, \tag{4}$$

where $\mathcal{F}$ is the set of feasible analog combiners. We define $\mathbf{H}_n \in \mathbb{C}^{M \times K}$ as the $n$th sub-matrix of $\mathbf{H}$ consisting of rows $\{M(n-1)+1, \ldots, Mn\}$ in $\mathbf{H}$. Owing to the block-diagonal structure of $\mathbf{W}$, we have

$$\mathbf{H}^H \mathbf{W} = \left[ \mathbf{H}_1^H \mathbf{w}_1, \mathbf{H}_2^H \mathbf{w}_2, \ldots, \mathbf{H}_N^H \mathbf{w}_N \right],$$

which leads to $\mathbf{H}^H \mathbf{W} \mathbf{W}^H \mathbf{H} = \sum_{n=1}^{N} \mathbf{G}_n$, where $\mathbf{G}_n = \mathbf{H}_n^H \mathbf{w}_n \mathbf{w}_n^H \mathbf{H}_n$. The following lemma expresses the total achievable rate $R$ in (3) as the sum of sub-rates:

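Lemma 1: The total achievable rate can be factorized into the sum of $N$ sub-rates corresponding to $N$ sub-arrays as follows:

$$R = \sum_{n=1}^{N} \log_2\left( 1 + \rho \mathbf{w}_n^H \mathbf{T}_n \mathbf{w}_n \right), \tag{5}$$

where $\mathbf{T}_n = \mathbf{H}_n \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H$, with $\mathbf{Q}_0 = \mathbf{I}_K$ and

$$\mathbf{Q}_{n-1} = \mathbf{Q}_{n-2} + \rho \mathbf{G}_{n-1}. \tag{6}$$

Proof: See Appendix A. □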
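As a numerical sanity check of Lemma 1, the following minimal sketch compares the rate in (3) with the sum of sub-rates in (5) for a random channel and random constant-modulus combining vectors. It assumes NumPy is available; the function name `total_and_sub_rates` and the chosen dimensions are illustrative and do not come from the paper.

```python
import numpy as np

def total_and_sub_rates(H, w_list, rho):
    """Return (rate from (3), sum of sub-rates from (5)) for given combining vectors."""
    n_r, K = H.shape
    W = np.zeros((n_r, len(w_list)), dtype=complex)
    Q = np.eye(K, dtype=complex)              # Q_0 = I_K
    sub_rate_sum, row = 0.0, 0
    for n, w in enumerate(w_list):
        m = len(w)
        H_n = H[row:row + m, :]               # rows of H seen by sub-array n
        W[row:row + m, n] = w                 # block-diagonal placement of w_n
        T_n = H_n @ np.linalg.solve(Q, H_n.conj().T)   # T_n = H_n Q_{n-1}^{-1} H_n^H
        sub_rate_sum += np.log2(1 + rho * np.real(w.conj() @ T_n @ w))
        g = H_n.conj().T @ w                  # G_n = g g^H
        Q = Q + rho * np.outer(g, g.conj())   # update as in (6)
        row += m
    A = np.eye(K) + rho * H.conj().T @ W @ W.conj().T @ H
    _, logabsdet = np.linalg.slogdet(A)       # natural-log determinant
    return logabsdet / np.log(2), sub_rate_sum

# Illustrative setup (assumed values): N = K = 4 sub-arrays of M = 4 antennas.
rng = np.random.default_rng(0)
N, M, K, rho = 4, 4, 4, 2.0
H = (rng.standard_normal((N * M, K)) + 1j * rng.standard_normal((N * M, K))) / np.sqrt(2)
w_list = [np.exp(1j * rng.uniform(0, 2 * np.pi, M)) / np.sqrt(M) for _ in range(N)]
total, summed = total_and_sub_rates(H, w_list, rho)
print(total, summed)
```

The two printed values should agree up to floating-point error, since each term in (5) follows from applying the matrix determinant lemma to one rank-one update of $\mathbf{Q}_{n-1}$.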

Lemma 1 reveals that the total achievable rate $R$ can be optimized by optimizing the sub-rates

$$R_n = \log_2\left( 1 + \rho \mathbf{w}_n^H \mathbf{T}_n \mathbf{w}_n \right), \quad n = 1, 2, \ldots, N, \tag{7}$$

where $R_n$ is the sub-rate corresponding to the $n$th sub-array. As a result, finding the optimal combining matrix $\mathbf{W}^\star$ in (4) can be factorized into finding the combining vectors $\mathbf{w}_n^\star$, $n = 1, \ldots, N$, that are the solutions of

$$\mathbf{w}_n^\star = \arg\max_{\mathbf{w}_n} R_n, \quad \text{s.t. } \mathbf{w}_n \in \mathcal{F}, \tag{8}$$

and the optimal solution is given by [4]

$$\mathbf{w}_n^\star = \arg\min_{\mathbf{w}_n \in \mathcal{F}} \left\| \mathbf{u}_n^\star - \mathbf{w}_n \right\|^2,$$

where $\mathbf{u}_n^\star$ is the eigenvector of $\mathbf{T}_n$ corresponding to its largest eigenvalue. We note that $\mathbf{T}_n$ is a positive semidefinite Hermitian matrix because, by (6), $\mathbf{Q}_{n-1}$ is a positive semidefinite matrix. The $n$th sub-rate $R_n$ in (7) can be rewritten as

$$R_n = \log_2\left( 1 + \rho \mathbf{w}_n^H \mathbf{H}_n \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H \mathbf{w}_n \right). \tag{9}$$

It is evident from (6) and (9) that $R_n$ depends not only on $\mathbf{H}_n$, but also on $\mathbf{H}_{n-1}, \mathbf{H}_{n-2}, \ldots, \mathbf{H}_1$. Thus, given the order of $\{\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_N\}$, the optimization problems in (8) can be solved one by one in the order $\{\mathbf{w}_1^\star, \mathbf{w}_2^\star, \ldots, \mathbf{w}_N^\star\}$. This procedure is presented in Algorithm 1. In particular, in step 3, the $n$th sub-matrix $\mathbf{H}_n$ is obtained from $\mathbf{H}$, which allows $\mathbf{T}_n$ and $\mathbf{u}_n^\star$ to be computed in steps 4 and 5, respectively. In step 6, the combining vector $\mathbf{w}_n^\star$ is obtained by quantizing $\mathbf{u}_n^\star$, which ensures that the resultant analog combiner belongs to the feasible set $\mathcal{F}$. Step 8 computes $\mathbf{G}_n$, which allows $\mathbf{Q}_n$ to be updated in step 9 based on (6).
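The sequential procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the quantizer `quantize_phases` (uniform phase grid with `bits` resolution) is a hypothetical stand-in for $\mathcal{Q}(\cdot)$, and the `sizes` argument, which lists the number of antennas per sub-array, is an illustrative generalization (the ESA case uses `[M] * N`).

```python
import numpy as np

def quantize_phases(u, bits=4):
    """Hypothetical quantizer: constant modulus 1/sqrt(m), phases on a uniform grid."""
    step = 2 * np.pi / (2 ** bits)
    phases = np.round(np.angle(u) / step) * step
    return np.exp(1j * phases) / np.sqrt(len(u))

def factorization_aided_combiner(H, sizes, rho, bits=4):
    """Build the block-diagonal analog combiner one sub-array at a time."""
    n_r, K = H.shape
    W = np.zeros((n_r, len(sizes)), dtype=complex)
    Q = np.eye(K, dtype=complex)                      # Q_0 = I_K
    row = 0
    for n, m in enumerate(sizes):
        H_n = H[row:row + m, :]                       # n-th sub-matrix of H
        T_n = H_n @ np.linalg.solve(Q, H_n.conj().T)  # T_n = H_n Q_{n-1}^{-1} H_n^H
        _, eigvecs = np.linalg.eigh(T_n)              # eigenvalues in ascending order
        u = eigvecs[:, -1]                            # dominant eigenvector
        w = quantize_phases(u, bits)                  # project onto the feasible set
        W[row:row + m, n] = w
        g = H_n.conj().T @ w
        Q = Q + rho * np.outer(g, g.conj())           # Q_n = Q_{n-1} + rho * G_n
        row += m
    return W

# Illustrative ESA setup (assumed values): N = 4 sub-arrays of M = 4 antennas.
rng = np.random.default_rng(1)
H = (rng.standard_normal((16, 4)) + 1j * rng.standard_normal((16, 4))) / np.sqrt(2)
W = factorization_aided_combiner(H, [4] * 4, rho=1.0)
```

Rounding each phase to the nearest grid point is one simple way to realize the nearest-point projection onto a constant-modulus set with quantized phases; other feasible sets $\mathcal{F}$ would need a different projection.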

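Algorithm 1 Factorization-aided analog-combining algorithm

1: $\mathbf{Q}_0 = \mathbf{I}_K$
2: for $n = 1 \to N$ do
3: $\mathbf{H}_n = \mathbf{H}(M(n-1)+1 : Mn)$
4: $\mathbf{T}_n = \mathbf{H}_n \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H$
5: Set $\mathbf{u}_n^\star$ to the eigenvector corresponding to the largest eigenvalue of $\mathbf{T}_n$.
6: $\mathbf{w}_n^\star = \mathcal{Q}(\mathbf{u}_n^\star)$
7: $\mathbf{W}^\star(M(n-1)+1 : Mn, n) = \mathbf{w}_n^\star$
8: $\mathbf{G}_n = \mathbf{H}_n^H \mathbf{w}_n^\star \mathbf{w}_n^{\star H} \mathbf{H}_n$
9: $\mathbf{Q}_n = \mathbf{Q}_{n-1} + \rho \mathbf{G}_n$
10: end for

B. Conventional ESA architecture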

Let $\sigma(n)$ be the largest eigenvalue of $\mathbf{T}_n$. The total achievable rate in (5) is upper-bounded by

$$R_{\text{ESA}} \leq R_{\text{ub,ESA}} = \sum_{n=1}^{N} \log_2\left[ 1 + \rho \sigma(n) \right]. \tag{10}$$

The equality occurs when $\mathbf{w}_n^\star = \mathbf{u}_n^\star$, $n = 1, 2, \ldots, N$, i.e., quantization is not applied to generate $\mathbf{w}_n^\star$. By applying Jensen's inequality, we have

$$R_{\text{ub,ESA}} \leq R_{\text{ESA}}^{\text{ub}} = N \log_2\left( 1 + \frac{\rho}{N} \sum_{n=1}^{N} \sigma(n) \right), \tag{11}$$

where the equality occurs when $\sigma(1) = \sigma(2) = \ldots = \sigma(N)$, i.e., the largest eigenvalues of $\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_N$ are the same.

Lemma 2: In the conventional ESA system, $\text{trace}(\mathbf{T}_n)$ is a decreasing function of $n$ when $N_r$ grows while $K$ is kept constant.

Proof: See Appendix B. □
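As a numerical illustration of the bounds (10) and (11) above, the following short sketch evaluates both for a given set of largest eigenvalues. The eigenvalue lists are made-up illustrative numbers, not simulation results from the paper, and the function name `rate_upper_bounds` is hypothetical.

```python
import math

def rate_upper_bounds(eigs, rho):
    """Return (bound (10), bound (11)) for the largest eigenvalues sigma(1..N)."""
    n = len(eigs)
    per_array = sum(math.log2(1 + rho * s) for s in eigs)   # right-hand side of (10)
    jensen = n * math.log2(1 + rho * sum(eigs) / n)         # right-hand side of (11)
    return per_array, jensen

# Equal eigenvalues: the two bounds coincide.
equal_case = rate_upper_bounds([4.0, 4.0, 4.0, 4.0], rho=1.0)

# Decreasing eigenvalues with the same sum: (10) falls strictly below (11).
decreasing_case = rate_upper_bounds([8.0, 4.0, 3.0, 1.0], rho=1.0)
```

Both cases use the same eigenvalue sum, so (11) is identical in the two cases while (10) drops as the eigenvalues become more unequal. This is the gap that a decreasing $\sigma(n)$ opens in the ESA system.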

Remark 1: By Lemma 2, $\text{trace}(\mathbf{T}_n)$, which is the sum of all eigenvalues of $\mathbf{T}_n$, decreases with $n$ in the ESA system. Under the typical assumption for massive MIMO systems whereby $N_r$ is much larger than $K$, it is nearly impossible for the largest eigenvalues $\sigma(n)$, $n = 1, 2, \ldots, N$, to satisfy the condition for equality in (11), i.e., $\sigma(1) = \ldots = \sigma(N)$, in the ESA system. This is also confirmed by the simulation results in Section V, which show that, similar to $\text{trace}(\mathbf{T}_n)$, $\sigma(n)$ tends to decrease with $n$.

This result shows that in the ESA system, it is difficult to reach the upper bound of the total achievable rate in (11), which motivates us to design a new architecture called the UESA in the next section.

IV. PROPOSED UESA ARCHITECTURE

To improve performance, we propose assigning unequal numbers of antennas to sub-antenna arrays. In the UESA architecture, the receive antennas are dynamically connected to RF chains via a switching network, as illustrated in Fig. 1(b). The switching network allows the UESA scheme to adaptively assign antennas to sub-antenna arrays or, equivalently, to adaptively connect antennas to RF chains. The feasibility of switch-based analog beamforming has been widely considered in the literature. For switching, tunable switches [14], [15], [30] or digital chips [25] with low power consumption, low hardware costs, and high turning speed [31] can be used.

Let $m_n$ be the number of antennas assigned to the $n$th sub-antenna array, which is processed in the $n$th iteration of Algorithm 1. We then have $m_n \geq 1$ and $\sum_{n=1}^{N} m_n = N_r$. Therefore, in the UESA architecture, $m_n$ is a value satisfying $1 \leq m_n \leq N_r - N + 1$. For simplicity, we define $\Psi$ as the set of numbers of antennas assigned to sub-arrays one to $N$, i.e., $\Psi = \{m_1, m_2, \ldots, m_N\}$.

A. Design of the UESA architecture

In the following analysis, to distinguish the proposed UESA system from the conventional ESA system, the largest eigenvalue of $\mathbf{T}_n$ in the UESA system is denoted by $\mu(n)$. In a similar manner to (10) and (11), we obtain the upper bounds of the total achievable rate of the proposed UESA architecture $R_{\text{UESA}}$ as follows:

$$R_{\text{UESA}} \leq R_{\text{ub,UESA}} = \sum_{n=1}^{N} \log_2\left( 1 + \rho \mu(n) \right), \tag{12}$$

and

$$R_{\text{ub,UESA}} \leq R_{\text{UESA}}^{\text{ub}} = N \log_2\left( 1 + \frac{\rho}{N} \sum_{n=1}^{N} \mu(n) \right). \tag{13}$$

The equality in (12) is obtained when $\mathbf{w}_n^\star = \mathbf{u}_n^\star$, $n = 1, 2, \ldots, N$, and that in (13) is obtained when $\mu(1) = \mu(2) = \ldots = \mu(N)$. To optimize the sum rate, based on (13), we design the UESA architecture such that $R_{\text{ub,UESA}} \approx R_{\text{UESA}}^{\text{ub}}$, i.e.,

$$\mu(1) \approx \mu(2) \approx \ldots \approx \mu(N). \tag{14}$$

At the same time, $R_{\text{UESA}}^{\text{ub}}$ should be maximized, and this can be achieved by maximizing $\sum_{n=1}^{N} \mu(n)$. To achieve these objectives, we need to optimize $\mu(n)$, $n = 1, 2, \ldots, N$. However, it is difficult to optimize them directly, so we consider $\text{trace}(\mathbf{T}_n)$ for optimization instead. In particular, considering that $\mu(n)$ is the largest eigenvalue of $\mathbf{T}_n$, we relax (14) to

$$\text{trace}(\mathbf{T}_1) \approx \text{trace}(\mathbf{T}_2) \approx \ldots \approx \text{trace}(\mathbf{T}_N). \tag{15}$$

The simulation results in Section V numerically verify that by achieving (15), $\mu(n)$, $n = 1, 2, \ldots, N$, can approach condition (14).

Similar to (15), because it is difficult to directly maximize $\sum_{n=1}^{N} \mu(n)$, we instead maximize its upper bound $\sum_{n=1}^{N} \text{trace}(\mathbf{T}_n)$. Consequently, we consider the following design objectives for the UESA:

• $\mathcal{O}_1$: $\text{trace}(\mathbf{T}_1) \approx \ldots \approx \text{trace}(\mathbf{T}_N)$
• $\mathcal{O}_2$: $\sum_{n=1}^{N} \text{trace}(\mathbf{T}_n)$ is maximized

1) UESA design for $\mathcal{O}_1$: As discussed in Remark 1, it is unlikely that objective $\mathcal{O}_1$ is satisfied in the conventional ESA system because $M$ is a constant value in (30). Therefore, in the UESA system, we propose assigning different numbers of antennas to sub-antenna arrays. In other words, $m_1, m_2, \ldots, m_N$ are not necessarily the same, and for a given set $\Psi = \{m_1, m_2, \ldots, m_N\}$, the following theorem suggests a constraint on $\Psi$ for optimizing the achievable rate.

Theorem 1: To satisfy the objective $\mathcal{O}_1$, $\Psi$ should be arranged in non-decreasing order, i.e., $m_1 \leq m_2 \leq \ldots \leq m_N$.

Proof: See Appendix C. □

Antenna allocation based on Theorem 1 can yield objective $\mathcal{O}_1$, which makes $R_{\text{UESA}}$ closer to its upper bound $R_{\text{UESA}}^{\text{ub}}$ and yields the enhanced achievable rate of the UESA system. This also implies that $R_{\text{UESA}}^{\text{ub}}$ becomes a tighter upper bound for the total achievable rate compared with $R_{\text{ESA}}^{\text{ub}}$. For further performance improvement, under the constraint $m_1 \leq m_2 \leq \ldots \leq m_N$, we consider the second design objective $\mathcal{O}_2$.

2) UESA design for $\mathcal{O}_2$: To achieve $\mathcal{O}_2$, we consider the following theorem:

Theorem 2: When $N_r$ grows while $K$ is kept constant, $\sum_{n=1}^{N} \text{trace}(\mathbf{T}_n)$ can be maximized by sorting the rows of $\mathbf{H}$ in decreasing order of their norms.

Proof: In a UESA architecture, $\mathbf{H}_n$ has size $m_n \times K$. If $N_r$ grows while $K$ is fixed in a massive MIMO system, $m_n$ becomes much larger than $K$. Then, we have [32]

$$\mathbf{H}_n^H \mathbf{H}_n \approx \text{diag}\left\{ \left\| \mathbf{h}_1^{(n)} \right\|^2, \left\| \mathbf{h}_2^{(n)} \right\|^2, \ldots, \left\| \mathbf{h}_K^{(n)} \right\|^2 \right\},$$

which is a diagonal matrix of size $K \times K$ with $\left\| \mathbf{h}_1^{(n)} \right\|^2, \left\| \mathbf{h}_2^{(n)} \right\|^2, \ldots, \left\| \mathbf{h}_K^{(n)} \right\|^2$ on the main diagonal, where $\mathbf{h}_k^{(n)}$ is the $k$th column of $\mathbf{H}_n$. Therefore, (29) can be rewritten as

$$\text{trace}(\mathbf{T}_n) = \sum_{k=1}^{K} \left\| \mathbf{h}_k^{(n)} \right\|^2 q_{k,k}^{(n-1)} \approx \left\| \mathbf{h}^{(n)} \right\|^2 \sum_{k=1}^{K} q_{k,k}^{(n-1)} \tag{16}$$

$$= \left\| \mathbf{h}^{(n)} \right\|^2 \text{trace}\left( \mathbf{Q}_{n-1}^{-1} \right), \tag{17}$$

where $q_{k,k}^{(n-1)}$ is the $k$th diagonal element of $\mathbf{Q}_{n-1}^{-1}$. Because $\left\| \mathbf{h}_k^{(n)} \right\|^2$ is approximately the same for all $k$, we can write $\left\| \mathbf{h}_k^{(n)} \right\|^2 \approx \left\| \mathbf{h}^{(n)} \right\|^2$, $k = 1, 2, \ldots, K$, and noting that $\text{trace}\left( \mathbf{Q}_{n-1}^{-1} \right) = \sum_{k=1}^{K} q_{k,k}^{(n-1)}$, we obtain (16) and (17). Letting

$$\boldsymbol{\phi}_h = \left[ \left\| \mathbf{h}^{(1)} \right\|^2, \ldots, \left\| \mathbf{h}^{(N)} \right\|^2 \right]^T, \quad \boldsymbol{\phi}_Q = \left[ \text{trace}\left( \mathbf{Q}_0^{-1} \right), \ldots, \text{trace}\left( \mathbf{Q}_{N-1}^{-1} \right) \right]^T,$$

we obtain

$$\sum_{n=1}^{N} \text{trace}(\mathbf{T}_n) = \boldsymbol{\phi}_h^T \boldsymbol{\phi}_Q. \tag{18}$$

We note that for a given $\boldsymbol{\phi}_Q$, (18) is maximized if $\boldsymbol{\phi}_h$ and $\boldsymbol{\phi}_Q$ are as close to parallel as possible. Because $\text{trace}\left( \mathbf{Q}_{n-1}^{-1} \right)$ decreases with $n$, as proved in Appendix B, the elements of $\boldsymbol{\phi}_Q$ are in decreasing order. Furthermore, in Appendix D, where the decreasing rate of $\text{trace}\left( \mathbf{Q}_{n-1}^{-1} \right)$ is considered, we show that it does not significantly change with channel ordering. Therefore, from (18), it can be concluded that $\boldsymbol{\phi}_h^T \boldsymbol{\phi}_Q$ can be maximized if the rows of $\mathbf{H}$ are ordered such that $\left\| \mathbf{h}_k^{(n)} \right\|$ also decreases with $n$. Under the constraint $m_1 \leq m_2 \leq \ldots \leq m_N$, this is achieved by ordering the channel rows in decreasing order of their norms. □

Theorems 1 and 2 give insights into the design of the UESA architecture that allow for improvements in the total achievable rate over the conventional ESA architecture. However, they do not guarantee the optimal total rate because a large number of combinations $\Psi$ satisfy $m_1 \leq m_2 \leq \ldots \leq m_N$. In the subsections below, we propose algorithms to find the optimal number of antennas for each sub-antenna array. Theorems 1 and 2 are used to find the near-optimal numbers of antennas in sub-arrays with low computational complexity.

B. Algorithms for antenna allocation in UESA

1) UESA with exhaustive search (UESA-ES):

Let $\Psi^\star = \{m_1^\star, m_2^\star, \ldots, m_N^\star\}$ be the set of optimal numbers of antennas for sub-arrays. This provides the highest achievable rate among all possible candidates of $\Psi$. Channel ordering based on Theorem 2 is first performed, and the ES algorithm then examines all candidates in

$$\mathcal{S} = \left\{ \Psi : 1 \leq m_n \leq N_r - N + 1, \ \sum_{n=1}^{N} m_n = N_r \right\}$$

to determine $\Psi^\star$. For each candidate, Algorithm 1 is used to find the analog combining matrix $\mathbf{W}^\star$. Finally, $\Psi^\star$ is set to the candidate that provides the highest sum rate. However, $|\mathcal{S}|$ becomes very large, especially for massive MIMO systems, and this results in excessively high computational complexity. To resolve this problem, we propose an algorithm that performs the search in a subset of $\mathcal{S}$ with much lower complexity.

Algorithm 2 Analog combining with the UESA-RES algorithm

Input: $\mathcal{S}_r$
Output: $\mathbf{W}^\star_{\text{UESA-RES}}$, $\Psi^\star$
1: Obtain $\mathbf{H}$ by ordering the channel rows in decreasing order of their norms.
2: $\tau = 0$
3: for $i = 1 \to |\mathcal{S}_r|$ do
4: Set $\Psi$ to the $i$th candidate in $\mathcal{S}_r$.
5: Use Algorithm 1 to find $\mathbf{W}_{\text{UESA}}$ for $\Psi$ and $\mathbf{H}$.
6: $R_{\text{UESA}} = \log_2 \det\left( \mathbf{I}_K + \rho \mathbf{H}^H \mathbf{W}_{\text{UESA}} \mathbf{W}_{\text{UESA}}^H \mathbf{H} \right)$
7: if $R_{\text{UESA}} > \tau$ then
8: $\mathbf{W}^\star_{\text{UESA-RES}} = \mathbf{W}_{\text{UESA}}$
9: $\Psi^\star = \Psi$
10: $\tau = R_{\text{UESA}}$
11: end if
12: end for
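To give a feel for how quickly the exhaustive-search space $\mathcal{S}$ grows, the following sketch enumerates all ordered allocations of $N_r$ antennas to $N$ sub-arrays with at least one antenna each, and compares the count with the standard stars-and-bars closed form $\binom{N_r - 1}{N - 1}$. The closed form is a well-known combinatorial fact rather than a result stated in the paper, and the function names are illustrative.

```python
import math

def ordered_allocations(total, parts):
    """Yield every tuple of `parts` positive integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    # The first sub-array can take at most total - (parts - 1) antennas.
    for first in range(1, total - parts + 2):
        for rest in ordered_allocations(total - first, parts - 1):
            yield (first,) + rest

def search_space_size(n_r, n):
    """Number of candidates in S by the stars-and-bars formula."""
    return math.comb(n_r - 1, n - 1)

small = list(ordered_allocations(8, 3))   # small case, enumerated explicitly
size_64_4 = search_space_size(64, 4)      # N_r = 64, N = 4
```

Each candidate requires one run of Algorithm 1, so the count above is also the number of combiner designs an exhaustive search would perform.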

2) UESA with reduced-ES (UESA-RES):

Based on Theorem 1, instead of searching for $\Psi^\star$ in the entire space $\mathcal{S}$, the RES algorithm performs the search only in $\mathcal{S}_r$, a subset of $\mathcal{S}$ that contains only those $\Psi$ such that $m_1 \leq m_2 \leq \ldots \leq m_N$, i.e.,

$$\mathcal{S}_r = \left\{ \Psi : \Psi \in \mathcal{S}, \ m_1 \leq m_2 \leq \ldots \leq m_N \right\}.$$

The UESA-RES algorithm is summarized in Algorithm 2. After ordering the channel rows based on Theorem 2 in step 1, the candidates in $\mathcal{S}_r$ are sequentially tested. Each candidate $\Psi$ is used as input to Algorithm 1 to find the combining matrix and the corresponding total achievable rate, as shown in steps 5 and 6, respectively. In steps 7–11, $\mathbf{W}^\star_{\text{UESA-RES}}$ and $\Psi^\star$ are updated whenever a combining matrix $\mathbf{W}_{\text{UESA}}$ and $\Psi$ with a higher sum rate are found. Simulation results in Section V show that the RES algorithm provides almost identical performance to the ES algorithm while incurring considerably lower computational complexity.
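The reduced search can be sketched as a generator over non-decreasing allocations plus a best-so-far loop. This is a minimal sketch under stated assumptions: `rate_fn` is a hypothetical callback standing in for "run Algorithm 1 and evaluate the sum rate", and the toy objective used below is made up purely to exercise the loop.

```python
def nondecreasing_allocations(total, parts, smallest=1):
    """Yield allocations m_1 <= ... <= m_N of positive integers summing to `total`."""
    if parts == 1:
        if total >= smallest:
            yield (total,)
        return
    # m_1 cannot exceed total / parts, because every later element is at least m_1.
    for first in range(smallest, total // parts + 1):
        for rest in nondecreasing_allocations(total - first, parts - 1, first):
            yield (first,) + rest

def reduced_search(total, parts, rate_fn):
    """Test every candidate in S_r and keep the one with the highest rate."""
    best, best_rate = None, 0.0            # tau = 0 initially
    for psi in nondecreasing_allocations(total, parts):
        rate = rate_fn(psi)
        if rate > best_rate:
            best, best_rate = psi, rate
    return best, best_rate

candidates = list(nondecreasing_allocations(8, 3))
# Toy objective (assumption, not the sum rate): product of the sub-array sizes.
best, best_rate = reduced_search(8, 3, lambda psi: psi[0] * psi[1] * psi[2])
```

For $N_r = 8$ and $N = 3$, the ordered space $\mathcal{S}$ has 21 candidates while the reduced space has 5, which illustrates where the complexity reduction comes from.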

3) UESA with RES and early termination (UESA-RES-ET):

In the search for $\Psi^\star$, we can first examine $\Psi$ with small differences among its elements. Alternatively, we can first examine $\Psi$ with large differences among its elements. The optimal search order depends on how fast $\mathrm{trace}(\mathbf{Q}_{n-1}^{-1})$ decreases with $n$ in (42). Through simulations, we found that $\mathrm{trace}(\mathbf{Q}_{n-1}^{-1})$ tends to decrease quickly with $n$, which implies that $\Psi$ with large differences between neighboring elements is more likely to be $\Psi^\star$. Based on this observation, in the UESA-RES-ET algorithm, candidates for $\Psi$ in $\mathcal{S}_r$ are sorted according to the differences between neighboring elements of $\Psi$. Furthermore, if no better candidate is found over a certain number of iterations, the search process is terminated.

The UESA-RES-ET algorithm is summarized in Algorithm 3. A vector $\Pi = \{\pi_1, \ldots, \pi_{|\mathcal{S}_r|}\}$ corresponding to the $|\mathcal{S}_r|$ candidates in $\mathcal{S}_r$ is first computed. Specifically, for a candidate $\Psi = \{m_1, \ldots, m_n, m_{n+1}, \ldots, m_N\}$, $\pi_i$ is given by $\pi_i = \prod_{n=1}^{N-1} (d_n + 1)$, as shown in step 5, where $d_n = |m_{n+1} - m_n|$, $d_n \in [0, N_r - N]$, is the difference between $m_{n+1}$ and $m_n$, which is computed in step 4. A larger $\pi_i$ implies larger differences among the elements of $\Psi$ (see the footnote below). As a result, $\Psi$ with a large $\pi_i$ is considered a more promising candidate and should be tested first. Therefore, in step 7, $\mathcal{S}_r$ is re-ordered by sorting its elements in descending order of the elements of $\Pi$, and the candidates in the sorted $\mathcal{S}_r$ are tested one by one to determine $\mathbf{W}^\star_{\text{UESA-RES-ET}}$ and $\Psi^\star$ in the remaining steps. The variable count records the number of consecutive iterations in which new candidates do not provide any performance improvement. For each candidate $\Psi$ that does not improve the achievable rate, count increases by one; otherwise, it is reset to zero, as shown in steps 16 and 21, respectively. When $\text{count}_{\max}$ is reached, it is likely that the optimal candidate has already been examined, and thus the search process is terminated in step 24.

Footnote: We employ this form of $\pi_i$ instead of the sum of differences, i.e., $\sum_{n=1}^{N-1} d_n$, because for the ordered candidates in $\mathcal{S}_r$ the latter only measures the difference between $m_1$ and $m_N$, i.e., $\sum_{n=1}^{N-1} d_n = m_N - m_1$, and a larger $\sum_{n=1}^{N-1} d_n$ does not imply larger differences among the elements of $\Psi$.
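The candidate ordering and the early-termination loop described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `candidates` assumes at least one antenna per sub-array, and `rate_of` is a hypothetical callable standing in for Algorithm 1 followed by the rate computation.

```python
from itertools import combinations_with_replacement
from math import prod


def candidates(n_r, n):
    """Non-decreasing allocations (m_1, ..., m_N), each >= 1, summing to n_r (n >= 2)."""
    for head in combinations_with_replacement(range(1, n_r + 1), n - 1):
        last = n_r - sum(head)
        if last >= head[-1]:
            yield head + (last,)


def spread_metric(psi):
    """pi = prod_n (d_n + 1) with d_n = |m_{n+1} - m_n| (steps 4 and 5)."""
    return prod(abs(b - a) + 1 for a, b in zip(psi, psi[1:]))


def search_with_early_termination(ranked, rate_of, count_max):
    """Test candidates in order; stop after count_max consecutive non-improvements."""
    best_psi, best_rate, count, examined = None, 0.0, 0, 0
    for psi in ranked:
        if count >= count_max:
            break  # early termination
        examined += 1
        rate = rate_of(psi)
        if rate < best_rate:
            count += 1
        else:
            best_psi, best_rate, count = psi, rate, 0
    return best_psi, best_rate, examined


# Sort the reduced search space in descending order of the spread metric.
ranked = sorted(candidates(64, 4), key=spread_metric, reverse=True)
print(ranked[0], spread_metric(ranked[0]))  # (1, 7, 17, 39) 1771

# Toy run with made-up rates (hypothetical numbers, only to exercise the loop).
toy_rates = [1, 3, 2, 2, 1, 5, 0, 0, 0, 0]
toy_result = search_with_early_termination(range(10), lambda i: toy_rates[i], 2)
print(toy_result)  # (1, 3, 4)
```

In the toy run, the search stops after two consecutive misses and never reaches the better candidate at index 5, which illustrates the trade-off controlled by $\text{count}_{\max}$. The first ranked candidate for $N_r = 64$, $N = 4$ has last element 39, in agreement with the example given for Fast-UESA.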

4) UESA with fast antenna allocation (Fast-UESA):

Although UESA-RES and UESA-RES-ET significantly reduce the complexity of antenna allocation, they still need to examine a large number of candidates when $|\mathcal{S}_r|$ is large. To further reduce complexity, we propose a fast antenna allocation algorithm for UESA (Fast-UESA).

Let $\Psi = \{m_1, \ldots, m_N\}$ be the first candidate in $\mathcal{S}_r$. The Fast-UESA algorithm starts by taking $\Psi$ as the initial solution and subsequently updates it based on the two design objectives in order to achieve an improved sum rate. Specifically, in order to increase the upper bound of the sum rate ($R^{\mathrm{ub}}_{\mathrm{ESA}}$) and to make it reachable, we update the elements of $\Psi$ such that the largest eigenvalues $\mu^{(1)}, \ldots, \mu^{(N)}$ are enhanced while their differences are minimized. Intuitively, if $\mu^{(n)}$ is much smaller than the others, more antennas should be assigned to the $n$th sub-array, i.e., $m_n$ should be increased. However, an increase in $m_n$ can violate the constraint $\sum_n m_n = N_r$. To solve this problem, we set the last element of $\Psi$ to $m_N = N_r - \sum_{n=1}^{N-1} m_n$. The motivation for this approach is that in $\Psi$, the last element $m_N$ is typically much larger than the others. We recall that $\Psi$ is the first candidate in $\mathcal{S}_r$ and that it has maximal distances between elements, as described in the UESA-RES-ET algorithm. For example, for $N_r = 64$, $N = 4$, we have $\Psi = \{1, 7, 17, 39\}$. An increase of $m_1$ by one leads to a decrease of $m_N = 39$ by one, which can significantly improve the sum rate of the first sub-array but does not significantly affect the sum rate of the last sub-array.

Algorithm 4 summarizes the Fast-UESA algorithm, whose inputs are the first element of $\mathcal{S}_r$ and the ordered channel matrix. Steps 1–3 initialize the algorithm by finding $\mathbf{W}_{\text{UESA}}$ and $\mu^{(1)}, \ldots, \mu^{(N)}$, computing the initial value of $\tau$, and assigning $\Psi$ to $\Psi^\star$. The elements of $\Psi$ are then updated over $I$ iterations, as presented in steps 5–26. First, in step 6, $\Delta_n$, $n = 1, \ldots, N$, are computed. These metrics measure the differences between $\mu^{(n)}$, $n = 1, \ldots, N$, and the average value of $\{\mu^{(1)}, \ldots, \mu^{(N)}\}$; these differences affect the corresponding design objective. For example, if $\Delta_n \ll 0$ and $\Delta_k \approx 0$, $k \ne n$, then $\mu^{(n)}$ is much smaller than $\mu^{(k)}$, $k \ne n$. In that case, it will be difficult to guarantee both design objectives. Therefore, we compare $\Delta_n$ to a predefined threshold $\gamma$, which

Algorithm 3 Analog combining with the UESA-RES-ET algorithm

Input: $\mathcal{S}_r$, $\text{count}_{\max}$
Output: $\mathbf{W}^\star_{\text{UESA-RES-ET}}$, $\Psi^\star$
 1: $\Pi = [0, 0, \ldots, 0]$ ($|\mathcal{S}_r|$ zeros)
 2: for $i = 1 \to |\mathcal{S}_r|$ do
 3:     Set $\Psi = \{m_1, m_2, \ldots, m_N\}$ to the $i$th candidate in $\mathcal{S}_r$.
 4:     Compute $\{d_1, d_2, \ldots, d_{N-1}\}$ with $d_n = |m_{n+1} - m_n|$.
 5:     Compute $\pi_i = \prod_{n=1}^{N-1} (d_n + 1)$ and set it to the $i$th element of $\Pi$.
 6: end for
 7: Re-order $\mathcal{S}_r$ by sorting its candidates in descending order of the elements of $\Pi$.
 8: Obtain $\mathbf{H}$ by ordering the channel rows in descending order of their norms.
 9: $\tau = 0$, count $= 0$
10: for $i = 1 \to |\mathcal{S}_r|$ do
11:     if count $< \text{count}_{\max}$ then
12:         Set $\Psi$ to the $i$th candidate in $\mathcal{S}_r$.
13:         Use Algorithm 1 to find $\mathbf{W}_{\text{UESA}}$ for $\Psi$ and $\mathbf{H}$.
14:         $R_{\text{UESA}} = \log_2 \det\left(\mathbf{I}_K + \rho \mathbf{H}^H \mathbf{W}_{\text{UESA}} \mathbf{W}_{\text{UESA}}^H \mathbf{H}\right)$
15:         if $R_{\text{UESA}} < \tau$ then
16:             count $=$ count $+ 1$
17:         else
18:             $\mathbf{W}^\star_{\text{UESA-RES-ET}} = \mathbf{W}_{\text{UESA}}$
19:             $\Psi^\star = \Psi$
20:             $\tau = R_{\text{UESA}}$
21:             count $= 0$
22:         end if
23:     else
24:         Terminate the search process.
25:     end if
26: end for

is set in step 4, to decide whether more antennas should be assigned to the $n$th sub-array, as shown in steps 8–10. In step 11, $m_N$ is adjusted to guarantee that the constraint $\sum_{n=1}^{N} m_n = N_r$ is satisfied. In steps 14–21, only those $\Psi$ that are members of $\mathcal{S}_r$ are examined. Specifically, in step 15, we obtain $\mu^{(1)}, \ldots, \mu^{(N)}$ based on Algorithm 1, which are then used in step 16 as the condition for updating $\Psi^\star$. This condition guarantees that the updated $\Psi^\star$ achieves a larger $\sum_{n=1}^{N} \mu^{(n)}$, which affects the upper bound of the total achievable rate, as discussed in Section IV-A. We note here that this step is different from the corresponding steps in the UESA-RES and UESA-RES-ET algorithms, which check the total achievable rate for updating $\Psi^\star$. This contributes to a reduction in the complexity of the Fast-UESA algorithm. In step 23, the update process is terminated early when the condition $m_N > m_{N-1}$ no longer holds. Finally, $\mathbf{W}^\star_{\text{Fast-UESA}}$ and $\Psi^\star$ are the best ones found during the search.

C. Power consumption and hardware cost

In the ESA and UESA architectures illustrated in Figs. 1(a) and 1(b), respectively, each receive antenna needs a low-noise amplifier (LNA) and a phase shifter, while each RF chain requires an ADC. Let $P_{\text{LNA}}$, $P_{\text{PS}}$, $P_{\text{SW}}$, $P_{\text{RF}}$, and $P_{\text{ADC}}$ be the power consumed by the LNA, phase shifter, switch, RF chain, and ADC, respectively. Then, the total power consumed by the ESA and UESA architectures can be expressed as

$$P_{\text{ESA}} = N_r \left(P_{\text{LNA}} + P_{\text{PS}}\right) + N \left(P_{\text{RF}} + P_{\text{ADC}}\right) \tag{19}$$

and

$$P_{\text{UESA}} = N_r \left(P_{\text{LNA}} + P_{\text{PS}}\right) + N \left(P_{\text{RF}} + P_{\text{ADC}}\right) + N_r P_{\text{SW}}, \tag{20}$$

respectively. It is observed from (19) and (20) that the UESA system requires an additional power consumption of $N_r P_{\text{SW}}$ compared with the conventional ESA system due to the use of a switching network. However, because the power consumed by switches is small [14], [30], [31], [33], [34], the corresponding increase in power consumption of the UESA scheme is not significant. Specifically, [14] and [30] demonstrate that the power consumption of a switch is six times lower than that of a phase shifter and 40 times lower than that of an ADC block. Therefore, the switching network does not significantly affect the total power consumption of the sub-connected architecture. We further investigate this issue in the next section.

Algorithm 4 Analog combining with Fast-UESA

Input: The first candidate $\Psi = [m_1, m_2, \ldots, m_N]$ of $\mathcal{S}_r$, and the ordered channel matrix $\mathbf{H}$
Output: $\mathbf{W}^\star_{\text{Fast-UESA}}$, $\Psi^\star$
 1: Use Algorithm 1 to find $\mu^{(n)}$, $n = 1, \ldots, N$, and $\mathbf{W}_{\text{UESA}}$ for $\Psi$ and $\mathbf{H}$.
 2: $\tau = \sum_{n=1}^{N} \mu^{(n)}$
 3: $\Psi^\star = \Psi$
 4: Set a threshold $\gamma$.
 5: for $i = 1 \to I$ do
 6:     $\Delta_n = \mu^{(n)} - \frac{1}{N} \sum_{n} \mu^{(n)}$, $n = 1, \ldots, N$
 7:     for $n = 1 \to N$ do
 8:         if $\Delta_n < \gamma$ then
 9:             $m_n = m_n + 1$
10:         end if
11:         $m_N = N_r - \sum_{n=1}^{N-1} m_n$
12:         if $m_N > m_{N-1}$ then
13:             Update $\Psi = [m_1, m_2, \ldots, m_N]$.
14:             if $\Psi \in \mathcal{S}_r$ then
15:                 Use Algorithm 1 to find $\mu^{(n)}$, $n = 1, \ldots, N$, and $\mathbf{W}_{\text{UESA}}$ for $\Psi$ and $\mathbf{H}$.
16:                 if $\sum_{n=1}^{N} \mu^{(n)} > \tau$ then
17:                     $\Psi^\star = \Psi$
18:                     $\mathbf{W}^\star_{\text{Fast-UESA}} = \mathbf{W}_{\text{UESA}}$
19:                     $\tau = \sum_{n=1}^{N} \mu^{(n)}$
20:                 end if
21:             end if
22:         else
23:             Break to terminate the updating process.
24:         end if
25:     end for
26: end for
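The inner update of Algorithm 4 (steps 6–13 and 23) can be sketched as follows. This is a simplified illustration under stated assumptions: the eigenvalues `mu` and the threshold `gamma` are made-up numbers, and in the actual algorithm each trial allocation would be passed to Algorithm 1, which is not reproduced here. The helper name `trial_allocations` is hypothetical.

```python
def trial_allocations(psi, mu, gamma):
    """Yield the allocations produced by one pass over n = 1..N.

    psi   : current allocation (m_1, ..., m_N)
    mu    : largest eigenvalues mu^(1..N) for psi (hypothetical inputs here)
    gamma : threshold on Delta_n = mu^(n) - mean(mu)
    """
    n_r, n_sub = sum(psi), len(psi)
    mean_mu = sum(mu) / n_sub
    delta = [value - mean_mu for value in mu]        # step 6
    m = list(psi)
    for n in range(n_sub):                           # step 7
        if delta[n] < gamma:                         # steps 8-10
            m[n] += 1
        m[-1] = n_r - sum(m[:-1])                    # step 11: keep sum(m) == n_r
        if m[-1] > m[-2]:                            # step 12
            yield tuple(m)                           # step 13: candidate to test
        else:
            return                                   # step 23: stop updating


first = (1, 7, 17, 39)
trials = list(trial_allocations(first, mu=[2.0, 5.0, 6.0, 7.0], gamma=-1.0))
print(trials[0])  # (2, 7, 17, 38)
```

With these illustrative inputs, only the first sub-array falls below the threshold, so one antenna moves from the last sub-array to the first. A full implementation would additionally check membership in $\mathcal{S}_r$ and skip allocations that have already been tested.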

August 28, 2019 DRAFT

The hardware cost of a receiver depends significantly on the number of RF chains. The proposed UESA architecture does not require any additional RF chains compared with the conventional ESA architecture. Furthermore, switches are low-cost devices [14], [33], which results in a relatively low cost of the switching network in the proposed UESA architecture. Therefore, the proposed UESA architecture can still be a cost-effective architecture for hybrid beamforming in massive MIMO systems.

V. SIMULATION RESULTS

In this section, we numerically evaluate the achievable rate, power consumption, energy efficiency, and computational complexity of the proposed UESA architecture. In the simulations, the channel coefficients between each MS and the BS are generated based on the geometric Saleh–Valenzuela channel model, which is a typical channel model for millimeter-wave communication systems [4], [20], [22], [35]. Specifically, the channel vector between the BS and the $k$th MS can be expressed as [15], [19], [36]

$$\mathbf{h}_k = \sqrt{\frac{N_r}{L_k}} \sum_{l=1}^{L_k} \alpha_{k,l}\, \mathbf{a}_{\text{BS}}(\phi_{k,l}), \tag{21}$$

where $L_k$ is the number of effective channel paths corresponding to a limited number of scatterers between the BS and the $k$th MS, and $\alpha_{k,l}$ and $\phi_{k,l}$ are the gain and the azimuth angle of arrival (AoA) of the $l$th path, respectively. All channel path gains $\alpha_{k,l}$ are assumed to be i.i.d. Gaussian random variables with zero mean and unit variance. Furthermore, $\mathbf{a}_{\text{BS}}$ represents the normalized receive array response vector at the BS, which depends on the structure of the antenna array. In this work, we consider a uniform linear array (ULA), for which the array response vector is given by [4], [19]

$$\mathbf{a}(\phi) = \frac{1}{\sqrt{N_r}} \left[1, e^{j\frac{2\pi}{\lambda} d \sin(\phi)}, \ldots, e^{j(N_r - 1)\frac{2\pi}{\lambda} d \sin(\phi)}\right]^T, \tag{22}$$

where $\lambda$ denotes the wavelength of the signal and $d$ is the antenna spacing in the antenna array.

For simplicity, we assume an identical number of effective channel paths between each MS and the BS, which is set to $L_k = 10$ for $k = 1, 2, \ldots, K$ [9], [10], [37], [38]. The AoAs $\phi_{k,l}$ are assumed to be uniformly distributed in $[0, 2\pi]$ [24], [37]. The ULA model is employed for the receive antenna array at the BS with antenna spacing of half a wavelength, i.e., $d/\lambda = 1/2$ in (22) [4], [15]. The phases in the analog combiner are restricted to $\Theta = \left\{0, \frac{2\pi}{Q}, \frac{4\pi}{Q}, \ldots, \frac{2(Q-1)\pi}{Q}\right\}$, where $Q$ is set to 16. Then, the analog combining vector corresponding to the $n$th sub-antenna array is given as $\mathbf{w}_n = \left[w_n^{(1)}, \ldots, w_n^{(M)}\right]^T$, where $w_n^{(m)} = r_n e^{j\theta_n^{(m)}}$ with $r_n = 1/\sqrt{M}$ for the ESA architecture and $r_n = 1/\sqrt{m_n}$ for the UESA architecture. The phase of $w_n^{(m)}$ is selected from $\Theta$ as the value closest to the phase of the $m$th element of $\mathbf{u}_n^\star$, as in step 6 of Algorithm 1. Finally, the SNR is defined as the ratio of the average symbol power of a user to the noise power $N_0$.

Fig. 2. Changing properties of the average largest eigenvalues of $\mathbf{T}_n$ for UESA and ESA with SNR = 12 dB: (a) $N_r = 32$, $N = K = 4$; (b) $N_r = 64$, $N = K = 4$.

A. Changing properties of the largest eigenvalues of $\mathbf{T}_n$

In Figs. 2(a) and 2(b), the average largest eigenvalue of $\mathbf{T}_n$ in the conventional ESA is compared with those of the proposed schemes, namely UESA-ES, UESA-RES, UESA-RES-ET, and Fast-UESA, for $N = K = 4$ and $N_r = \{32, 64\}$ at SNR = 12 dB. The parameter $\text{count}_{\max}$ of the UESA-RES-ET scheme and the parameters $I$ and $\gamma$ of the Fast-UESA algorithm are set separately for each value of $N_r$. We also show $\mathrm{trace}(\mathbf{T}_n)$, which is equal to the sum of all eigenvalues of $\mathbf{T}_n$ in these schemes. From Figs. 2(a) and 2(b), the following observations are noted:

• It is clear that in the ESA system, similar to $\mathrm{trace}(\mathbf{T}_n)$, $\sigma^{(n)}$ decreases with $n$.
• By optimizing $\mathrm{trace}(\mathbf{T}_n)$ in the UESA system, the differences among $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(N)}$ become smaller, which justifies the use of constraint (15) to optimize $\mu^{(n)}$.
• Among the UESA-ES, UESA-RES, and UESA-RES-ET algorithms, although slightly smaller differences among $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(N)}$ are seen in the UESA-RES-ET algorithm, UESA-ES and UESA-RES achieve higher values of $\sum_n \mu^{(n)}$, which results in their higher upper bounds of the achievable rate, as will be shown in the next subsection.

Fig. 3. Comparison of total achievable rates (in bps/Hz, versus SNR in dB) and their upper bounds (UB) for the conventional ESA and the proposed UESA-ES, UESA-RES, UESA-RES-ET, and Fast-UESA schemes: (a) $N_r = 32$, $N = K = 4$; (b) $N_r = 64$, $N = K = 4$.

Fig. 4. Comparison of total achievable rates (in bps/Hz) of the conventional ESA architecture and the proposed schemes with $N_r = 64$, various values of $N = K$, and SNR = 12 dB.
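The simulation setup described at the start of this section, i.e., the channel model in (21) and (22) and the quantized combiner phases, can be sketched with the Python standard library as follows. This is an illustrative sketch only: the path gains are drawn here as circularly symmetric complex Gaussians with unit variance, which is an assumption (the text states only zero mean and unit variance), and all function names are hypothetical.

```python
import cmath
import math
import random


def ula_response(phi, n_r, spacing=0.5):
    """Normalised ULA response (22); spacing is d / lambda (half wavelength)."""
    return [cmath.exp(1j * 2 * math.pi * spacing * i * math.sin(phi)) / math.sqrt(n_r)
            for i in range(n_r)]


def channel_vector(n_r, n_paths, rng):
    """Channel vector (21): sqrt(N_r / L) * sum_l alpha_l * a(phi_l)."""
    h = [0j] * n_r
    for _ in range(n_paths):
        # Assumption: complex Gaussian gain with unit variance.
        alpha = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
        phi = rng.uniform(0.0, 2 * math.pi)          # AoA uniform in [0, 2*pi]
        a = ula_response(phi, n_r)
        h = [h_i + alpha * a_i for h_i, a_i in zip(h, a)]
    scale = math.sqrt(n_r / n_paths)
    return [scale * h_i for h_i in h]


def quantize_phase(theta, q=16):
    """Nearest phase in {0, 2*pi/Q, ..., 2*(Q-1)*pi/Q}, with wrap-around."""
    step = 2 * math.pi / q
    return (round(theta / step) % q) * step


def combining_weight(u_element, m_n, q=16):
    """w = r_n * exp(j * theta) with r_n = 1/sqrt(m_n) and theta quantized."""
    theta = quantize_phase(cmath.phase(u_element), q)
    return cmath.exp(1j * theta) / math.sqrt(m_n)


rng = random.Random(0)
h_k = channel_vector(n_r=64, n_paths=10, rng=rng)
w = combining_weight(1 + 1j, m_n=16)
```

For the ESA architecture, `m_n` would simply be the common sub-array size $M$.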

B. Total achievable rate of the UESA architecture

In Figs. 3(a) and 3(b), the total achievable rates and their upper bounds (UBs) in (11) and (13) are presented for the ESA and UESA schemes with $N = K = 4$ and $N_r = \{32, 64\}$. As before, $\text{count}_{\max}$ for the UESA-RES-ET scheme and $I$ and $\gamma$ for the Fast-UESA scheme are set separately for each value of $N_r$. In both figures, the UESA-ES algorithm provides the highest total rates and upper bounds, while those of the UESA-RES, UESA-RES-ET, and Fast-UESA schemes are slightly lower. For example, with $N_r = 32$, $N = K = 4$, and SNR = 0 dB, the UESA-ES scheme achieves an approximately 10.5% higher total rate than the ESA scheme, while the corresponding enhancements attained by the UESA-RES, UESA-RES-ET, and Fast-UESA algorithms are approximately 10%. The improvement in total rate can be explained by the upper bounds plotted in these figures. Specifically, the proposed UESA schemes not only enhance the upper bounds, but also reduce the gaps between the upper bounds and the achievable rates. The total achievable rates of the UESA-RES-ET scheme are almost identical to those of the UESA-RES scheme in spite of its reduced search region. Although the Fast-UESA algorithm has much lower complexity than UESA-RES and UESA-RES-ET, as will be shown in Section V-D, only a marginal performance loss is seen for both systems.

Fig. 4 presents the total achievable rates of the proposed UESA and conventional ESA architectures for various numbers of RF chains $N$ with $N_r = 64$ at SNR = 12 dB. Because the antennas are equally allocated to sub-antenna arrays in the conventional ESA architecture, it cannot serve every considered value of $N$ without significant modifications. By contrast, the proposed UESA architecture works for every number of RF chains, which makes it more flexible for the implementation of sub-connected analog hybrid beamforming networks and makes it easier to achieve a desired tradeoff between spectral and energy efficiency. Furthermore, Fig. 4 shows that the proposed UESA schemes outperform the ESA, and the performance improvement is clearer for large values of $N$.

C. Energy efficiency

To compute the power consumption in (19) and (20), we use the same assumptions as in [14] and [34], i.e., $P_{\text{LNA}} = p$, $P_{\text{ADC}} = 10p$, $P_{\text{RF}} = 2p$, $P_{\text{PS}} = 1.5p$, and a switch power $P_{\text{SW}}$ that is a small fraction of $p$, where $p = 20$ mW is the reference power value. Then, from (19) and (20), we obtain

$$P_{\text{ESA}} = 50 N_r + 240 N \ \text{(mW)},$$

$$P_{\text{UESA}} = 54 N_r + 240 N \ \text{(mW)}.$$

The proposed UESA architecture is observed to require $N_r P_{\text{SW}}$ of additional power. However, this is much smaller than the total required power.

Fig. 5. Comparison of spectral efficiency (bps/Hz) and energy efficiency (bps/Hz/W) of the conventional ESA architecture and the proposed schemes: (a) $N_r = \{32, 64\}$, $N = K = 4$; (b) various values of $N_r$, $N = K = 4$, SNR = 12 dB.

In Fig. 5(a), we compare the energy efficiency of the considered architectures for $N = K = 4$ and $N_r = \{32, 64\}$. For the UESA architecture, the UESA-ES algorithm is used. Energy efficiency is defined as the ratio of the total achievable rate to the power consumed. We observe that even though the UESA architecture requires higher power than the ESA architecture, it achieves energy efficiency comparable to that of the conventional ESA for $N_r = 64$. For $N_r = 32$, the improvement in the energy efficiency of the UESA architecture is clear, especially at low SNR values.

In Fig. 5(b), the total achievable rates and energy efficiencies of the proposed schemes, namely, UESA-ES, UESA-RES, UESA-RES-ET, and Fast-UESA, are compared with those of the ESA scheme for various values of $N_r$ with $N = K = 4$ and SNR = 12 dB. In this figure, the enhancement in the achievable rate of the proposed schemes is clear for all values of $N_r$. Furthermore, because the additional power required by the UESA architecture grows linearly with $N_r$, the gains in terms of energy efficiency of the proposed schemes decrease as $N_r$ increases. However, even at $N_r = 64$, the energy efficiencies of the UESA and ESA architectures are still comparable.
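The power model in (19) and (20) and the energy-efficiency definition can be evaluated with a few lines of code. The following sketch uses the component powers implied by $p = 20$ mW; the switch power is left as an explicit argument, and the 4 mW used in the example is only the value implied by the difference between the two stated totals ($54 N_r$ versus $50 N_r$), not a figure quoted from the text. The rates in the example are made-up placeholders.

```python
# Component powers in mW for p = 20 mW: LNA = p, PS = 1.5p, RF = 2p, ADC = 10p.
P_LNA_MW, P_PS_MW, P_RF_MW, P_ADC_MW = 20.0, 30.0, 40.0, 200.0


def power_esa_mw(n_r, n):
    """Equation (19): total power of the ESA architecture in mW."""
    return n_r * (P_LNA_MW + P_PS_MW) + n * (P_RF_MW + P_ADC_MW)


def power_uesa_mw(n_r, n, p_sw_mw):
    """Equation (20): ESA power plus one switch per receive antenna."""
    return power_esa_mw(n_r, n) + n_r * p_sw_mw


def energy_efficiency(rate_bps_hz, power_mw):
    """Ratio of total achievable rate to consumed power, in bps/Hz/W."""
    return rate_bps_hz / (power_mw / 1000.0)


p_esa = power_esa_mw(64, 4)                  # 50 * 64 + 240 * 4
p_uesa = power_uesa_mw(64, 4, p_sw_mw=4.0)   # 54 * 64 + 240 * 4 with 4 mW per switch
overhead = (p_uesa - p_esa) / p_esa
print(p_esa, p_uesa, round(100 * overhead, 2))

# Hypothetical rates, only to show how the metric is used.
ee_esa = energy_efficiency(20.0, p_esa)
ee_uesa = energy_efficiency(22.0, p_uesa)
```

Under these assumptions, the switching network adds roughly 6% to the total power at $N_r = 64$, $N = 4$, which is why a rate gain of about 10% still translates into a comparable or better energy efficiency.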

Table I. Comparison of the average numbers of candidates examined by the UESA-ES, UESA-RES, UESA-RES-ET, and Fast-UESA algorithms for $N = K = 4$ and $N_r = \{32, 64\}$ at a fixed SNR

Algorithm      N_r = 32               N_r = 64
UESA-ES
UESA-RES       249                    1906
UESA-RES-ET    72 (count_max = 30)    770 (count_max = 280)
Fast-UESA      28 (I = 20)            35 (I = 40)

D. Computational complexities

We numerically analyze the complexity reduction of the proposed near-optimal algorithms, UESA-RES, UESA-RES-ET, and Fast-UESA, in comparison with the optimal UESA-ES algorithm. We note that the candidate ordering in steps 1–7 of Algorithm 3 can be done offline, and thus its complexity can be ignored. Furthermore, the complexity of ordering the rows of the channel is significantly lower than that of executing Algorithm 1 for multiple candidates. Consequently, the overall complexity is approximately proportional to the number of candidates $\Psi$ examined by each algorithm.

In Table I, we present the numbers of examined candidates in the UESA-ES, UESA-RES, UESA-RES-ET, and Fast-UESA algorithms for two environments, $N = K = 4$ and $N_r = \{32, 64\}$, with $\text{count}_{\max} = \{30, 280\}$ for the UESA-RES-ET algorithm and $I = \{20, 40\}$, together with the corresponding thresholds $\gamma$, for the Fast-UESA algorithm. Table I shows that the UESA-ES requires a substantially larger number of candidates than the UESA-RES, UESA-RES-ET, and Fast-UESA. By contrast, the UESA-RES and UESA-RES-ET algorithms explore the reduced search region, and thus significantly reduce complexity while producing only marginal performance losses, as observed in Fig. 4. Furthermore, the Fast-UESA algorithm has significantly lower complexity than the other proposed algorithms. Specifically, in a large MIMO system with $N_r = 64$ antennas and $N = 4$ RF chains, the UESA-RES and UESA-RES-ET algorithms need to test only 1906 and 770 candidates, respectively, each a small fraction of the number tested by the UESA-ES to find the optimal solution. Meanwhile, the Fast-UESA algorithm requires testing only 35 candidates, which is in turn a small fraction of the numbers tested by the UESA-ES, UESA-RES, and UESA-RES-ET algorithms.
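The sizes of the two search spaces can be counted without enumerating them. The sketch below assumes that every sub-array receives at least one antenna, so that the full space $\mathcal{S}$ consists of all ordered allocations and $\mathcal{S}_r$ of the non-decreasing ones; this assumption and the function names are illustrative, and the UESA-ES count is a derived quantity rather than a value quoted from Table I.

```python
from functools import lru_cache
from math import comb


def count_full_space(n_r, n):
    """Ordered allocations of n_r antennas to n sub-arrays, each >= 1 (assumption)."""
    return comb(n_r - 1, n - 1)


@lru_cache(maxsize=None)
def count_reduced_space(n_r, n):
    """Non-decreasing allocations: partitions of n_r into exactly n positive parts."""
    if n == 0:
        return 1 if n_r == 0 else 0
    if n_r < n:
        return 0
    # Either the smallest part equals 1 (remove it), or every part is >= 2
    # (subtract 1 from each of the n parts).
    return count_reduced_space(n_r - 1, n - 1) + count_reduced_space(n_r - n, n)


full_64 = count_full_space(64, 4)
reduced_64 = count_reduced_space(64, 4)
print(full_64, reduced_64)  # 39711 1906

# Examined candidates reported in Table I for N_r = 64.
for name, examined in [("UESA-RES", 1906), ("UESA-RES-ET", 770), ("Fast-UESA", 35)]:
    print(name, f"{100 * examined / full_64:.2f}% of the full space")
```

The reduced-space count reproduces the UESA-RES entries of Table I, and under the stated assumption the full space for $N_r = 64$, $N = 4$ contains 39711 allocations.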

Fig. 6. Comparison of total achievable rates (in bps/Hz) of the AS [26], ESA-AHB [25], UESA-RES, and the combined UESA-AHB schemes, namely, UESA-RES-AHB and UESA-RES-ET-AHB, for various values of $N_r$ with $N = K = 4$ and SNR = 12 dB.

E. Performance comparison between the ESA, UESA, AS, AHB, and combined UESA-AHB schemes

In Fig. 6, we compare the performance of the proposed UESA scheme to those of the antenna selection (AS) [26] and the adaptive hybrid beamforming (AHB) [25] schemes for various values of $N_r$ with $N = K = 4$ and SNR = 12 dB. In [25], the AHB scheme is proposed for the conventional ESA architecture, and we refer to it as ESA-AHB to avoid confusion. For the UESA architecture, the UESA-RES algorithm is used. We observe that the ESA-AHB scheme performs better than the UESA-RES scheme for the larger values of $N_r$, whereas the UESA-RES scheme achieves better performance for the smaller values of $N_r$. The AS scheme performs far worse than both the UESA-RES and ESA-AHB schemes for all the considered systems.

We note that the AHB algorithm can be combined with the UESA architecture to create an unequal sub-array architecture with adaptive hybrid beamforming (UESA-AHB). Specifically, by inheriting the design of the UESA and AHB schemes, the UESA-AHB scheme employs a combining matrix whose combining vectors have different numbers of non-zero elements, reflecting the unequal numbers of antennas in the sub-antenna arrays, and the positions of the non-zero elements are distributed over the range $[1, N_r]$. Furthermore, unlike in the ESA-AHB scheme, the combining matrix in the UESA-AHB scheme depends on the different numbers of antennas assigned to the sub-arrays. Therefore, the combining matrix in the UESA-AHB scheme should be jointly optimized with the antenna allocation of the UESA architecture. This can be done by applying the AHB scheme to step 5 of Algorithm 2 and step 13 of Algorithm 3, resulting in the UESA-RES-AHB and the UESA-RES-ET-AHB schemes, respectively.

In Fig. 6, we show the total achievable rates of the UESA-RES-AHB and UESA-RES-ET-AHB schemes, where $\text{count}_{\max} = 40$ is used for early termination in the UESA-RES-ET-AHB scheme. It is clearly seen that the combined UESA-AHB schemes achieve substantial gains over the existing schemes for all the considered values of $N_r$. In particular, the UESA-RES-AHB scheme improves the total achievable rate with respect to the ESA-AHB scheme at $N_r = 64$.

We note that the same switching network is employed by both the UESA architecture and the ESA-AHB scheme [25]. Specifically, the switching network in Fig. 1(b) can be used for both antenna allocation in the UESA architecture and RF chain-to-antenna connection in the ESA-AHB scheme. Therefore, the power consumptions of the UESA-RES, ESA-AHB, and UESA-AHB schemes are the same. As a result, with the significant improvement in the total achievable rate, the UESA-AHB schemes achieve higher energy efficiency than both the UESA-RES and ESA-AHB schemes.

VI. CONCLUSION

In this work, we propose a novel unequal sub-connected architecture for hybrid combining at the receiver of massive MIMO systems. Unlike the conventional ESA architecture, the proposed UESA architecture assigns different numbers of antennas to sub-antenna arrays based on channel conditions. Our analytical derivations suggest that when the factorization-aided analog-combining algorithm is employed, fewer antennas should be assigned to the first sub-antenna arrays to approach the upper bound of the total achievable rate, while more antennas should be assigned to the last sub-antenna arrays. We also show that channel rows should be ordered in decreasing order of their norms to enhance the upper bound of the total achievable rate. The simulation results demonstrate that the proposed architecture can achieve up to a 10% improvement in total rate with a small increase in power consumption. To reduce the complexity of determining the antenna-to-sub-array connections, the near-optimal UESA-RES, UESA-RES-ET, and Fast-UESA algorithms are proposed. The numerical results show that they can significantly reduce the complexity of the UESA-ES with marginal performance losses. The proposed UESA architecture requires lower power consumption while improving the spectral efficiency of the sub-connected architecture. These advantages can also be beneficial for the transmitter of the downlink. Therefore, in future research, the spectral-efficiency analysis and antenna allocation algorithms for the UESA architecture can be applied to the transmitter of downlink MIMO systems.

APPENDIX A
PROOF OF LEMMA

Let $\mathbf{Q} = \mathbf{I}_K + \rho \mathbf{H}^H \mathbf{W} \mathbf{W}^H \mathbf{H}$, and then we get

$$\mathbf{Q} = \mathbf{I}_K + \rho \mathbf{G}_1 + \rho \mathbf{G}_2 + \ldots + \rho \mathbf{G}_N, \tag{23}$$

where $\mathbf{G}_n = \mathbf{H}_n^H \mathbf{w}_n \mathbf{w}_n^H \mathbf{H}_n$. By defining $\mathbf{E}_1 = \mathbf{I}_K + \rho \mathbf{G}_1$, (23) can be rewritten as

$$\mathbf{Q} = \mathbf{E}_1 \left(\mathbf{I}_K + \rho \mathbf{E}_1^{-1} \mathbf{G}_2 + \ldots + \rho \mathbf{E}_1^{-1} \mathbf{G}_N\right).$$

Similarly, by defining $\mathbf{E}_2 = \mathbf{I}_K + \rho \mathbf{E}_1^{-1} \mathbf{G}_2$, we obtain

$$\mathbf{Q} = \mathbf{E}_1 \mathbf{E}_2 \left(\mathbf{I}_K + \rho \mathbf{E}_2^{-1} \mathbf{E}_1^{-1} \mathbf{G}_3 + \ldots + \rho \mathbf{E}_2^{-1} \mathbf{E}_1^{-1} \mathbf{G}_N\right).$$

In a similar manner, $\mathbf{Q}$ can be factorized into the product of $\mathbf{E}_n$, $n = 1, \ldots, N$, i.e.,

$$\mathbf{Q} = \mathbf{E}_1 \mathbf{E}_2 \ldots \mathbf{E}_N, \tag{24}$$

where $\mathbf{E}_n = \mathbf{I}_K + \rho \left(\mathbf{E}_1 \ldots \mathbf{E}_{n-1}\right)^{-1} \mathbf{G}_n$. By defining $\mathbf{Q}_0 = \mathbf{I}_K$ and $\mathbf{Q}_n = \mathbf{E}_1 \ldots \mathbf{E}_n$, $n = 1, \ldots, N-1$, we have

$$\mathbf{Q}_n = \mathbf{Q}_{n-1} \mathbf{E}_n, \tag{25}$$

and $\mathbf{E}_n$ can be expressed as

$$\mathbf{E}_n = \mathbf{I}_K + \rho \mathbf{Q}_{n-1}^{-1} \mathbf{G}_n = \mathbf{I}_K + \rho \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H \mathbf{w}_n \mathbf{w}_n^H \mathbf{H}_n. \tag{26}$$

From (24) and (26), (3) can be rewritten as

$$R = \log_2 \det \mathbf{Q} = \log_2 \det\left(\mathbf{E}_1 \mathbf{E}_2 \ldots \mathbf{E}_N\right) = \sum_{n=1}^{N} \log_2 \det\left(\mathbf{I}_K + \rho \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H \mathbf{w}_n \mathbf{w}_n^H \mathbf{H}_n\right) = \sum_{n=1}^{N} \log_2\left(1 + \rho \mathbf{w}_n^H \mathbf{H}_n \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H \mathbf{w}_n\right), \tag{27}$$

where the last equality follows from the fact that $\det(\mathbf{I}_K + \mathbf{a}\mathbf{b}^T) = 1 + \mathbf{b}^T \mathbf{a}$ with $\mathbf{a} = \rho \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H \mathbf{w}_n$ and $\mathbf{b}^T = \mathbf{w}_n^H \mathbf{H}_n$. For simplicity, we define $\mathbf{T}_n = \mathbf{H}_n \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H$. By inserting (26) into (25), we get

$$\mathbf{Q}_{n-1} = \mathbf{Q}_{n-2} + \rho \mathbf{G}_{n-1}. \tag{28}$$

Then, from (28), (27) can be written as (5). This completes the proof.

APPENDIX B
PROOF OF LEMMA

We have

$$\mathrm{trace}(\mathbf{T}_n) = \mathrm{trace}\left(\mathbf{H}_n \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H\right) = \mathrm{trace}\left(\mathbf{H}_n^H \mathbf{H}_n \mathbf{Q}_{n-1}^{-1}\right). \tag{29}$$

When $N_r$ grows while $K$ is kept constant, we have $\mathbf{H}_n^H \mathbf{H}_n \approx M \mathbf{I}_K$ [32], which yields

$$\mathrm{trace}(\mathbf{T}_n) \approx M\, \mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right). \tag{30}$$

Because we have $\mathbf{Q}_{n-1} = \mathbf{Q}_{n-2} \mathbf{E}_{n-1}$, $\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)$ in (30) can be expressed as

$$\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right) = \mathrm{trace}\left(\mathbf{E}_{n-1}^{-1} \mathbf{Q}_{n-2}^{-1}\right) \le \lambda_{\max}\left(\mathbf{E}_{n-1}^{-1}\right) \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1}\right) \tag{31}$$

$$= \lambda_{\min}^{-1}\left(\mathbf{E}_{n-1}\right) \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1}\right), \tag{32}$$

where $\lambda_{\max}(\mathbf{A})$ and $\lambda_{\min}(\mathbf{A})$ are the largest and smallest eigenvalues of $\mathbf{A}$, respectively. Here, (31) is obtained by the inequality $\mathrm{trace}(\mathbf{A}\mathbf{B}) \le \lambda_{\max}(\mathbf{A})\, \mathrm{trace}(\mathbf{B})$ [39], [40].
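From (26), we get

$$\lambda_{\min}\left(\mathbf{E}_{n-1}\right) = 1 + \rho \lambda_{\min}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1}\right). \tag{33}$$

Based on $\mathbf{G}_{n-1} = \mathbf{H}_{n-1}^H \mathbf{w}_{n-1} \mathbf{w}_{n-1}^H \mathbf{H}_{n-1}$, $\mathbf{Q}_0 = \mathbf{I}_N$, and (28), we find that $\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1}$ in (33) is a semidefinite Hermitian matrix, and $\lambda_{\min}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1}\right) > 0$. As a result, we have $\lambda_{\min}\left(\mathbf{E}_{n-1}\right) > 1$ and $\lambda_{\min}^{-1}\left(\mathbf{E}_{n-1}\right) < 1$. Then, from (32), we obtain

$$\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right) < \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1}\right), \tag{34}$$

which implies that $\mathrm{trace}\left(\mathbf{Q}_n^{-1}\right)$ is a decreasing function of $n$. In the conventional ESA system, $M$ in (30) is a constant. Consequently, $\mathrm{trace}(\mathbf{T}_n)$ decreases with $n$.

APPENDIX C
PROOF OF THEOREM

Because $\mathbf{G}_{n-1} = \mathbf{H}_{n-1}^H \mathbf{w}_{n-1} \mathbf{w}_{n-1}^H \mathbf{H}_{n-1}$ is of rank one, and according to (28) and the result in [41], we obtain

$$\mathbf{Q}_{n-1}^{-1} = \left(\mathbf{Q}_{n-2} + \rho \mathbf{G}_{n-1}\right)^{-1} = \mathbf{Q}_{n-2}^{-1} - \frac{\rho \mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}}{1 + \rho\, \mathrm{trace}\left(\mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}\right)}. \tag{35}$$

Then, the difference between $\mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1}\right)$ and $\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)$ is given by

$$\Delta_{n-1} = \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1}\right) - \mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right) = \frac{\rho\, \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}\right)}{1 + \rho\, \mathrm{trace}\left(\mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}\right)}. \tag{36}$$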
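The rank-one update in (35) and the trace difference in (36) can be checked numerically on a small example. The sketch below uses real $2 \times 2$ matrices and hand-picked illustrative values (`rho`, `q_prev`, and `g_vec` are not taken from the paper); `q_prev` is chosen with eigenvalues of at least one, as is the case for $\mathbf{Q}_{n-2} = \mathbf{I}_K + \rho \sum_i \mathbf{G}_i$.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def trace(a):
    return a[0][0] + a[1][1]


rho = 0.5
q_prev = [[2.0, 0.5], [0.5, 1.5]]           # plays the role of Q_{n-2}
g_vec = [1.0, -2.0]                          # G_{n-1} = g g^T has rank one
g = [[g_vec[i] * g_vec[j] for j in range(2)] for i in range(2)]

q_prev_inv = mat_inv(q_prev)
q_next = [[q_prev[i][j] + rho * g[i][j] for j in range(2)] for i in range(2)]
direct = mat_inv(q_next)                     # left-hand side of (35)

correction = mat_mul(mat_mul(q_prev_inv, g), q_prev_inv)
denom = 1.0 + rho * trace(mat_mul(g, q_prev_inv))
formula = [[q_prev_inv[i][j] - rho * correction[i][j] / denom for j in range(2)]
           for i in range(2)]                # right-hand side of (35)

max_err = max(abs(direct[i][j] - formula[i][j]) for i in range(2) for j in range(2))
delta = trace(q_prev_inv) - trace(direct)    # definition in (36)
delta_formula = rho * trace(correction) / denom
print(max_err, delta, delta_formula)
```

For these values, both sides of (35) agree to numerical precision and the trace difference lies strictly between zero and one, in line with the bound used later in this appendix.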

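Furthermore, $\mathbf{Q}_{n-2}$ can be expressed as $\mathbf{Q}_{n-2} = \mathbf{I}_K + \rho \mathbf{G}_1 + \ldots + \rho \mathbf{G}_{n-2} = \mathbf{I}_K + \rho \tilde{\mathbf{G}}_{n-2}$, where $\tilde{\mathbf{G}}_{n-2} = \mathbf{G}_1 + \ldots + \mathbf{G}_{n-2}$. According to the result in [42], we have

$$\mathbf{Q}_{n-2}^{-1} = \mathbf{I}_K - \rho \tilde{\mathbf{G}}_{n-2} \left(\mathbf{I}_K + \rho \tilde{\mathbf{G}}_{n-2}\right)^{-1} = \mathbf{I}_K - \rho \tilde{\mathbf{G}}_{n-2} \mathbf{Q}_{n-2}^{-1},$$

which yields

$$\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1} = \mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} - \rho \mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} \tilde{\mathbf{G}}_{n-2} \mathbf{Q}_{n-2}^{-1}. \tag{37}$$

In the second term on the right-hand side of (37), we have $\mathbf{G}_{n-1} \tilde{\mathbf{G}}_{n-2} = \sum_{i=1}^{n-2} \mathbf{G}_{n-1} \mathbf{G}_i$. Therefore, (37) yields

$$\mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}\right) = \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1}\right) - \rho \sum_{i=1}^{n-2} \underbrace{\mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} \mathbf{G}_i \mathbf{Q}_{n-2}^{-1}\right)}_{\triangleq \Upsilon}. \tag{38}$$

By using the inequality $\mathrm{trace}(\mathbf{A}\mathbf{B}) \ge \lambda_{\min}(\mathbf{A})\, \mathrm{trace}(\mathbf{B})$ [39], [40], we have

$$\Upsilon \ge \lambda_{\min}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1}\right) \mathrm{trace}\left(\mathbf{G}_i \mathbf{Q}_{n-2}^{-1}\right) \ge \lambda_{\min}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1}\right) \lambda_{\min}\left(\mathbf{G}_i\right) \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1}\right) \ge 0, \tag{39}$$

where the inequality in (39) is obtained because $\mathbf{Q}_{n-2}^{-1}$ and $\mathbf{G}_{n-1}$ are positive semidefinite Hermitian matrices for $n = 1, 2, \ldots, N$. From (38) and (39), we have

$$\mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}\right) \le \mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1} \mathbf{G}_{n-1}\right). \tag{40}$$

From (36) and (40), we have $\Delta_{n-1} < 1$, which leads to $\sum_{n=1}^{N-1} \Delta_n < N - 1$, and hence,

$$\mathrm{trace}\left(\mathbf{Q}_{N-1}^{-1}\right) = \mathrm{trace}\left(\mathbf{Q}_0^{-1}\right) - \sum_{n=1}^{N-1} \Delta_n > N - (N - 1) = 1 \tag{41}$$

with the note that $\mathbf{Q}_0 = \mathbf{I}_N$.

Now assume that the UESA architecture is designed such that

$$m_1\, \mathrm{trace}\left(\mathbf{Q}_0^{-1}\right) \approx \ldots \approx m_N\, \mathrm{trace}\left(\mathbf{Q}_{N-1}^{-1}\right), \tag{42}$$

which yields

$$m_n \approx \frac{m_1\, \mathrm{trace}\left(\mathbf{Q}_0^{-1}\right)}{\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)} = \frac{N m_1}{\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)},$$

where the second equality is obtained from the fact that $\mathbf{Q}_0 = \mathbf{I}_N$. Furthermore, due to $\sum_{n=1}^{N} m_n = N_r$ and the decreasing property of $\mathrm{trace}\left(\mathbf{Q}_n^{-1}\right)$, $n = 0, \ldots, N-1$, as proven in Appendix B, we have

$$N_r \approx N \sum_{n=1}^{N} \frac{m_1}{\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)} < \frac{N^2 m_1}{\mathrm{trace}\left(\mathbf{Q}_{N-1}^{-1}\right)},$$

which approximately leads to $m_1 > \frac{N_r\, \mathrm{trace}\left(\mathbf{Q}_{N-1}^{-1}\right)}{N^2}$. From (41), we have $m_1 > \frac{N_r}{N^2}$. In hybrid beamforming for massive MIMO systems, it is generally assumed that a relatively small number of RF chains is used, i.e., $N \ll N_r$. Therefore, when $N_r$ grows and $N$ is kept constant, $m_1$ also grows. Considering that $m_1 \le m_2 \le \ldots \le m_N$, we can conclude that for the proposed design of the UESA architecture, we have $\mathbf{H}_n^H \mathbf{H}_n \approx m_n \mathbf{I}_K$, $\forall n \le N$, in massive MIMO systems. From (29), we can write

$$\mathrm{trace}(\mathbf{T}_n) \approx m_n\, \mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right). \tag{43}$$

From (42) and (43), we obtain (15) as the design objective.

APPENDIX D
RATE OF DECREASE OF $\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)$

In the second term on the right-hand side of (38), we have

$$\mathbf{G}_{n-1} \mathbf{G}_i = \mathbf{H}_{n-1}^H \mathbf{w}_{n-1} \mathbf{w}_{n-1}^H \mathbf{H}_{n-1} \mathbf{H}_i^H \mathbf{w}_i \mathbf{w}_i^H \mathbf{H}_i. \tag{44}$$

In Appendix C, it is proven that in massive MIMO systems employing the UESA architecture, we have $\mathbf{H}_n^H \mathbf{H}_n \approx m_n \mathbf{I}_K$, $\forall n \le N$. Therefore, in (44), we have $\mathbf{H}_{n-1} \mathbf{H}_i^H \approx \mathbf{0}$, $i \ne n-1$, in massive MIMO systems [32], which yields $\mathbf{G}_{n-1} \mathbf{G}_i \approx \mathbf{0}$. Consequently, in (38) we have $\Upsilon \approx 0$. Now, the difference between $\mathrm{trace}\left(\mathbf{Q}_{n-2}^{-1}\right)$ and $\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)$ in (36) can be approximated by

$$\Delta_{n-1} \approx \frac{\rho\, \mathrm{trace}\left(\mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}\right)}{1 + \rho\, \mathrm{trace}\left(\mathbf{G}_{n-1} \mathbf{Q}_{n-2}^{-1}\right)}. \tag{45}$$

Furthermore, we have $\sigma^{(n)} = \mathbf{u}_n^{\star H} \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^H \mathbf{u}_n^\star = \mathrm{trace}\left(\mathbf{u}_n^{\star H} \mathbf{H}_n \mathbf{Q}_{n-1}^{-1} \mathbf{H}_n^H \mathbf{u}_n^\star\right) = \mathrm{trace}\left(\mathbf{H}_n^H \mathbf{u}_n^\star \mathbf{u}_n^{\star H} \mathbf{H}_n \mathbf{Q}_{n-1}^{-1}\right)$. Based on step 8 in Algorithm 1, we have $\mathrm{trace}\left(\mathbf{G}_n \mathbf{Q}_{n-1}^{-1}\right) = \mathrm{trace}\left(\mathbf{H}_n^H \mathbf{w}_n^\star \mathbf{w}_n^{\star H} \mathbf{H}_n \mathbf{Q}_{n-1}^{-1}\right)$ with $\mathbf{w}_n^\star = \mathcal{Q}(\mathbf{u}_n^\star)$. Therefore, when $m_n$ grows in massive MIMO systems, we have $\mathrm{trace}\left(\mathbf{G}_n \mathbf{Q}_{n-1}^{-1}\right) \to \infty$ as $\sigma^{(n)} \to \infty$. Then, $\Delta_{n-1}$ approaches one for a fixed SNR. Thus, channel ordering leads to no significant difference in the rate of decrease of $\mathrm{trace}\left(\mathbf{Q}_{n-1}^{-1}\right)$.

REFERENCES

[1] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,”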

IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, 2010.
[2] N. T. Nguyen and K. Lee, “Cell Coverage Extension With Orthogonal Random Precoding for Massive MIMO Systems,” IEEE Access, vol. 5, pp. 5410–5424, 2017.
[3] N. T. Nguyen and K. Lee, “Coverage and Cell-Edge Sum-Rate Analysis of mmWave Massive MIMO Systems With ORP Schemes and MMSE Receivers,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5349–5363, 2018.
[4] X. Gao, L. Dai, S. Han, I. Chih-Lin, and R. W. Heath, “Energy-efficient hybrid analog and digital precoding for mmWave MIMO systems with large antenna arrays,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 998–1009, 2016.
[5] Y. Niu, Z. Feng, Y. Li, Z. Zhong, and D. Wu, “Low complexity and near-optimal beam selection for millimeter wave MIMO systems,” in IEEE 13th Int. Conf. Wireless Commun. Mobile Computing (IWCMC), 2017, pp. 634–639.
[6] S. Sandhu and M. Ho, “Analog combining of multiple receive antennas with OFDM,” in IEEE Int. Conf. Commun., vol. 5, 2003, pp. 3428–3432.
[7] F. Gholam, J. Vía, and I. Santamaría, “Beamforming design for simplified analog antenna combining architectures,” IEEE Trans. Veh. Tech., vol. 60, no. 5, pp. 2373–2378, 2011.
[8] X. Gao, L. Dai, and A. M. Sayeed, “Low RF-complexity technologies to enable millimeter-wave MIMO with large antenna array for 5G wireless communications,” IEEE Commun. Mag., vol. 56, no. 4, pp. 211–217, 2018.
[9] S. Park, A. Alkhateeb, and R. W. Heath, “Dynamic subarrays for hybrid precoding in wideband mmWave MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 5, pp. 2907–2920, 2017.
[10] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, 2014.
[11] A. Alkhateeb, J. Mo, N. Gonzalez-Prelcic, and R. W. Heath, “MIMO precoding and combining solutions for millimeter-wave systems,” IEEE Commun. Mag., vol. 52, no. 12, pp. 122–131, 2014.
[12] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 436–453, 2016.
[13] S. Payami, M. Ghoraishi, and M. Dianati, “Hybrid beamforming for large antenna arrays with phase shifter selection,” IEEE Trans. Wireless Commun., vol. 15, no. 11, pp. 7258–7271, 2016.
[14] R. Méndez-Rial, C. Rusu, N. G. Prelcic, A. Alkhateeb, and R. W. Heath Jr, “Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?” IEEE Access, vol. 4, no. 8, pp. 247–267, 2016.
[15] T. E. Bogale, L. B. Le, A. Haghighat, and L. Vandendorpe, “On the number of RF chains and phase shifters, and scheduling design with hybrid analog–digital beamforming,” IEEE Trans. Wireless Commun., vol. 15, no. 5, pp. 3311–3326, 2016.
[16] V. Venkateswaran and A.-J. van der Veen, “Analog beamforming in MIMO communications with phase shift networks and online channel estimation,” IEEE Trans. Signal Process., vol. 58, no. 8, pp. 4131–4143, 2010.
[17] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, 2014.
[18] L. Liang, W. Xu, and X. Dong, “Low-complexity hybrid precoding in massive multiuser MIMO systems,” IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 653–656, 2014.
[19] A. Alkhateeb, G. Leus, and R. W. Heath, “Limited feedback hybrid precoding for multi-user millimeter wave systems,”

IEEE Trans. Wireless Commun. , vol. 14, no. 11, pp. 6481–6494, 2015.[20] Y.-Y. Lee, C.-H. Wang, and Y.-H. Huang, “A hybrid RF/baseband precoding processor based on parallel-index-selectionmatrix-inversion-bypass simultaneous orthogonal matching pursuit for millimeter wave MIMO systems,”

IEEE Trans.Signal Process. , vol. 63, no. 2, pp. 305–317, 2015.[21] M. Kim and Y. H. Lee, “MSE-based hybrid RF/baseband processing for millimeter-wave communication systems in MIMOinterference channels,”

IEEE Trans. Veh. Tech. , vol. 64, no. 6, pp. 2714–2720, 2015.[22] S. Han, I. Chih-Lin, Z. Xu, and C. Rowell, “Large-scale antenna systems with hybrid analog and digital beamforming formillimeter wave 5G,”

IEEE Commun. Mag. , vol. 53, no. 1, pp. 186–194, 2015.

August 28, 2019 DRAFT2 [23] S. He, C. Qi, Y. Wu, and Y. Huang, “Energy-efﬁcient transceiver design for hybrid sub-array architecture MIMO systems,”

IEEE Access , vol. 4, pp. 9895–9905, 2017.[24] A. Li and C. Masouros, “Hybrid precoding and combining design for millimeter-wave multi-user MIMO based on SVD,”in , 2017, pp. 1–6.[25] X. Zhu, Z. Wang, L. Dai, and Q. Wang, “Adaptive hybrid precoding for multiuser massive MIMO,”

IEEE Commun. Lett. ,vol. 20, no. 4, pp. 776–779, 2016.[26] R. Chen, J. G. Andrews, and R. W. Heath, “Efﬁcient transmit antenna selection for multiuser MIMO systems with blockdiagonalization,” in

IEEE Global Telecommun. Conf.

IEEE, 2007, pp. 3499–3503.[27] E. Viterbo and J. Boutros, “A universal lattice code decoder for fading channels,”

IEEE Trans. Inf. Theory , vol. 45, no. 5,pp. 1639–1642, 1999.[28] N. T. Nguyen, K. Lee, and H. Dai, “QR-decomposition-aided Tabu Search Detection for Large MIMO Systems,” to appearin

IEEE Trans. Veh. Technol. , 2019.[29] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, “Millimeter wave beamforming for wirelessbackhaul and access in small cell networks,”

IEEE Trans. Commun. , vol. 61, no. 10, pp. 4391–4403, 2013.[30] A. Alkhateeb, Y.-H. Nam, J. Zhang, and R. W. Heath, “Massive MIMO combining with switches,”

IEEE Wireless Commun.Lett. , vol. 5, no. 3, pp. 232–235, 2016.[31] R. L. Schmid, P. Song, C. T. Coen, A. C¸ . Ulusoy, and J. D. Cressler, “On the analysis and design of low-loss single-poledouble-throw W-band switches utilizing saturated SiGe HBTs,”

IEEE Trans. Microw. Theory and Techn. , vol. 62, no. 11,pp. 2755–2767, 2014.[32] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Beneﬁts andchallenges,”

IEEE J. Sel. Topics Signal Process. , vol. 8, no. 5, pp. 742–758, 2014.[33] N. Celik, W. Kim, M. F. Demirkol, M. F. Iskander, and R. Emrick, “Implementation and experimental veriﬁcation ofhybrid smart-antenna beamforming algorithm,”

IEEE Antenna Wireless Propag. Lett. , vol. 5, pp. 280–283, 2006.[34] R. M´endez-Rial, C. Rusu, A. Alkhateeb, N. Gonz´alez-Prelcic, and R. W. Heath, “Channel estimation and hybrid combiningfor mmWave: Phase shifters or switches?” in

IEEE Inf. Theory Appl. Workshop (ITA) , 2015, pp. 90–97.[35] C.-E. Chen, “An iterative hybrid transceiver design algorithm for millimeter wave MIMO systems,”

IEEE Wireless Commun.Lett. , vol. 4, no. 3, pp. 285–288, 2015.[36] J. Choi, “Analog beamforming for low-complexity multiuser detection in mm-wave systems,”

IEEE Trans. Veh. Tech. ,vol. 65, no. 8, pp. 6747–6752, 2016.[37] D. H. Nguyen, L. B. Le, and T. Le-Ngoc, “Hybrid MMSE precoding for mmWave multiuser MIMO systems,” in , pp. 1–6.[38] O. El Ayach, R. W. Heath, S. Rajagopal, and Z. Pi, “Multimode precoding in millimeter wave MIMO transmitters withmultiple antenna sub-arrays,” in

IEEE GLOBECOM, 2013 , pp. 3476–3480.[39] D. Kleinman and M. Athans, “The design of suboptimal linear time-varying systems,”

IEEE Trans. Automatic Control ,vol. 13, no. 2, pp. 150–159, 1968.[40] S.-D. Wang, T.-S. Kuo, and C.-F. Hsu, “Trace bounds on the solution of the algebraic matrix Riccati and Lyapunovequation,”

IEEE Trans. Automatic Control , vol. 31, no. 7, pp. 654–656, 1986.[41] K. S. Miller, “On the inverse of the sum of matrices,”

Mathematics Mag. , vol. 54, no. 2, pp. 67–72, 1981.[42] H. V. Henderson and S. R. Searle, “On deriving the inverse of a sum of matrices,”

Siam Review , vol. 23, no. 1, pp. 53–60,1981., vol. 23, no. 1, pp. 53–60,1981.