A Framework on Hybrid MIMO Transceiver Design based on Matrix-Monotonic Optimization

Abstract

Hybrid transceivers can strike a balance between complexity and performance of multiple-input multiple-output (MIMO) systems. In this paper, we develop a unified framework on hybrid MIMO transceiver design using matrix-monotonic optimization. The proposed framework addresses general hybrid transceiver design, rather than being limited to certain high frequency bands, such as millimeter wave (mmWave) or terahertz bands, or relying on the sparsity of some specific wireless channels. In the proposed framework, the analog and digital parts of a transceiver, either linear or nonlinear, are jointly optimized. Based on matrix-monotonic optimization, we demonstrate that the combination of the optimal analog precoders and processors is equivalent to eigenchannel selection for various optimal hybrid MIMO transceivers. From the optimal structure, several effective algorithms are derived to compute the analog transceivers under unit-modulus constraints. Furthermore, in order to reduce computational complexity, a simple random algorithm is introduced for analog transceiver optimization. Once the analog part of a transceiver is determined, the closed-form digital part can be obtained. Numerical results verify the advantages of the proposed design.


arXiv preprint [cs.IT], Aug.

Chengwen Xing, Member, IEEE, Xin Zhao, Wei Xu, Senior Member, IEEE, Xiaodai Dong, Senior Member, IEEE, and Geoffrey Ye Li, Fellow, IEEE


I. INTRODUCTION

The great success of multiple-input multiple-output (MIMO) technology has made it widely accepted for current and future high-data-rate communication systems [1]. Because MIMO acts as a pillar for satisfying data-hungry applications, a natural question is how to reduce the cost of MIMO technology, especially that of large-scale antenna arrays. The traditional setting of one radio-frequency (RF) chain per antenna element is too expensive for large-scale MIMO systems, especially at high frequencies such as the millimeter wave or terahertz bands. A hybrid analog-digital architecture is a promising way to alleviate this difficulty and to strike a balance between the cost and the performance of practical MIMO systems.

C. Xing and X. Zhao are with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China.
W. Xu is with the National Mobile Communications Research Lab, Southeast University, Nanjing 210096, China. He is also with the Department of ECE, University of Victoria, Victoria, BC, Canada.
X. Dong is with the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 3P6, Canada.
G. Y. Li is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA.

A typical hybrid analog-digital MIMO transceiver consists of four components, i.e., a digital precoder, an analog precoder, an analog processor, and a digital processor [2]. In early transceiver designs, hybrid MIMO technology often referred to antenna selection [3], [4], which reaps spatial diversity. In these works, analog switches are used in the radio-frequency domain. Phase-shifter based soft antenna selection [5]–[7] has been proposed to improve performance for correlated MIMO channels. Nowadays, the phase-shifter based hybrid structure is widely used.

For a phase shifter, only the signal phase, instead of both magnitude and phase, can be adjusted. Thus, the optimization of a MIMO transceiver with phase shifters becomes complicated due to the constant-modulus constraints on the analog precoder and the analog processor. It has been shown in [8] that the performance of a full-digital system can be achieved when the number of shifters is doubled in a phase-shifter based hybrid structure. However, this is hardly practical because it requires a large number of phase shifters, especially in large-scale MIMO systems. As a matter of fact, the phase shifters in large-scale MIMO systems are sometimes considered a burden. Thus, the sub-connected hybrid structure has emerged as an alternative option [9], [10] and has received much attention recently [9]–[14].

Unit-modulus and discrete-phase constraints make the optimization of analog transceivers nonconvex and thus difficult to address [3], [4]. There have been some works on hybrid transceiver optimization considering different design limitations and requirements. Their motivation is to exploit the underlying structures of the hybrid transceiver to achieve high performance with low complexity.

Early hybrid transceiver design is based on approximating digital transceivers in terms of the norm difference between the all-digital design and its hybrid counterpart. For millimeter wave (mmWave) channels, which are usually sparse, an orthogonal matching pursuit (OMP) algorithm has been used in signal recovery for the hybrid transceiver [15]. In order to overcome the nonconvexity in hybrid transceiver optimization, some distinct characteristics of mmWave channels must be exploited [15]. This methodology is a compromise on the constant-modulus constraint, and it has been validated in different environments, including multiuser and relay scenarios [16], [17]. However, it was found later that the OMP algorithm sometimes cannot achieve the optimal solution. A singular-value-decomposition (SVD) based descent algorithm [18] has been proposed, which is nearly optimal. An alternative fast constant-modulus algorithm [19] has also been developed to reduce the gap between the analog and digital precoders.

The above methods are hard to apply in complex scenarios because of their high computational complexity [20]–[22]. Therefore, based on the idea of unitary matrix rotation, several algorithms [23], [24] have been proposed to improve the approximation performance while maintaining a relatively low complexity.

On the other hand, some works on hybrid precoding design are based on codebooks, which relax the problem into a convex optimization problem [25]. However, the codebook-based algorithm suffers performance loss if channel state information (CSI) is inaccurate [26]. In order to reduce the complexity of codebook design and the impact of partial CSI, special structures of massive MIMO channels [27], [28] can be exploited. Recently, an angle-domain based method has been proposed from the viewpoint of array signal processing [29], [30], which provides useful insight into hybrid analog and digital signal processing. Based on the concept of the angle-domain design, some mathematical approaches, such as matrix decomposition algorithms, have been developed [31], [32]. Energy-efficient hybrid transceiver design for Rayleigh fading channels has been investigated in [33]. Hybrid transceiver optimization with partial CSI and with discrete phases has been discussed in [34] and [35], respectively.

Hybrid MIMO transceivers are not limited to mmWave or terahertz frequency bands but can potentially work in other frequency bands as well. The transceiver itself could be either linear or nonlinear. Moreover, the performance metrics for a MIMO transceiver could be different, including capacity, mean-squared error (MSE), bit-error rate (BER), etc. A unified framework on hybrid MIMO transceiver optimization is therefore of great interest. In this paper, we develop a unified framework for hybrid linear and nonlinear MIMO transceiver optimization. Our main contributions are summarized as follows.

• Both linear and nonlinear transceivers with Tomlinson-Harashima precoding (THP) or decision-feedback detection (DFD) are taken into account in the proposed framework for hybrid MIMO transceiver optimization.
• Different from the existing works in which a single performance metric is considered for hybrid MIMO transceiver designs, more general performance metrics are considered.
• Based on the matrix-monotonic optimization framework, the optimal structures of both digital and analog transceivers with respect to different performance metrics are analytically derived. From the optimal structures, the optimal analog precoder and processor correspond to selecting eigenchannels, which facilitates the analog transceiver design. Furthermore, several effective analog design algorithms are proposed.

The rest of this paper is organized as follows. In Section II, a general hybrid system model and the MSE matrices corresponding to different transceivers are introduced. In Section III, a unified hybrid transceiver is discussed in detail and the related transceiver optimization is presented. In Section IV, the optimal structure of digital transceivers is derived based on matrix-monotonic optimization. In Section V, basic properties of the optimal analog precoder and processor are investigated, based on which effective algorithms to compute the analog transceiver are proposed. Next, in Section VI, simulation results are provided to demonstrate the performance advantages of the proposed algorithms. Finally, conclusions are drawn in Section VII.

Notations: In this paper, scalars, vectors, and matrices are denoted by non-bold, bold lower-case, and bold upper-case letters, respectively. The notations $\mathbf{X}^H$ and $\mathrm{Tr}(\mathbf{X})$ denote the Hermitian transpose and the trace of a complex matrix $\mathbf{X}$, respectively. Matrix $\mathbf{X}^{1/2}$ is the Hermitian square root of a positive semidefinite matrix $\mathbf{X}$. The expression $\mathrm{diag}\{\mathbf{X}\}$ denotes a square diagonal matrix with the same diagonal elements as matrix $\mathbf{X}$. The $i$th row and the $j$th column of a matrix are denoted as $[\cdot]_{i,:}$ and $[\cdot]_{:,j}$, respectively, and the element in the $k$th row and the $\ell$th column of a matrix is denoted as $[\cdot]_{k,\ell}$. In the following derivations, $\boldsymbol{\Lambda}$ always denotes a diagonal matrix (square or rectangular diagonal matrix) with diagonal elements arranged in nonincreasing order. The representation $\mathbf{A} \preceq \mathbf{B}$ means that the matrix $\mathbf{B} - \mathbf{A}$ is positive semidefinite. The real and imaginary parts of a complex variable are represented by $\Re\{\cdot\}$ and $\Im\{\cdot\}$, respectively, and statistical expectation is denoted by $E\{\cdot\}$.

II. GENERAL STRUCTURE OF HYBRID MIMO TRANSCEIVER

In this section, we first introduce the system model of MIMO hybrid transceiver designs. Then a general signal model is introduced, which includes the nonlinear transceiver with THP or DFD and the linear transceiver as special cases. Based on the general signal model, the general linear minimum mean-squared error (LMMSE) processor and the data estimation mean-squared error (MSE) matrix are derived, which are the basis for the subsequent hybrid MIMO transceiver design.

A. System Model

As shown in Fig. 1, we consider a point-to-point hybrid MIMO system where the source and the destination are equipped with $N$ and $M$ antennas, respectively. Without loss of generality, it is assumed that both the source and the destination have $L$ RF chains. A transmit data vector $\mathbf{a} \in \mathbb{C}^{D \times 1}$ is first processed by a unit with a feedback operation and then goes through a digital precoder $\mathbf{F}_D \in \mathbb{C}^{L \times D}$ and an analog precoder $\mathbf{F}_A \in \mathbb{C}^{N \times L}$. This model is general in that it includes both the linear precoder and the nonlinear precoder as special cases. For the nonlinear transceiver with THP at the source, the feedback matrix $\mathbf{B}_{\mathrm{Tx}}$ is strictly lower triangular. The key idea behind THP is to exploit feedback operations to pre-eliminate mutual interference between different data streams. In order to keep the transmit signals in a predefined region, a modulo operation is introduced for the feedback operation [36]. Based on lattice theory, it can be proved that the modulo operation is equivalent to adding an auxiliary complex vector $\mathbf{d}$ whose elements have integer real and imaginary parts [36], [37]. The vector $\mathbf{d}$ ensures that $\mathbf{x} = \mathbf{a} + \mathbf{d}$ lies in a predefined region [36], [37].

Fig. 1. General hybrid MIMO transceiver.

Based on this fact, the output vector $\mathbf{b}$ of the feedback unit satisfies the following equation

$$\mathbf{b} = (\mathbf{a} + \mathbf{d}) - \mathbf{B}_{\mathrm{Tx}} \mathbf{b}, \tag{1}$$

that is,

$$\mathbf{b} = (\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})^{-1} \underbrace{(\mathbf{a} + \mathbf{d})}_{\triangleq\, \mathbf{x}}. \tag{2}$$

It is worth noting that $\mathbf{d}$ can be perfectly removed by a modulo operation [36], [37], and thus recovering $\mathbf{x}$ is equivalent to recovering $\mathbf{a}$. On the other hand, for a linear precoder there is no feedback operation, i.e., $\mathbf{B}_{\mathrm{Tx}} = \mathbf{0}$ and $\mathbf{d} = \mathbf{0}$ [38]. Moreover, based on (2) we then have $\mathbf{b} = \mathbf{a}$.

Then, the received signal $\mathbf{y}$ at the destination is

$$\mathbf{y} = \mathbf{H} \mathbf{F}_A \mathbf{F}_D (\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})^{-1} \mathbf{x} + \mathbf{n}, \tag{3}$$

where $\mathbf{n}$ is an $M \times 1$ additive Gaussian noise vector with zero mean and covariance $\mathbf{R}_n$, $\mathbf{H}$ is an $M \times N$ channel matrix, and $\mathbf{B}_{\mathrm{Tx}}$ is a general feedback matrix at the source, which is determined by the type of precoder. It is worth noting that $\mathbf{B}_{\mathrm{Tx}} = \mathbf{0}$ corresponds to a linear precoder without feedback operation. As shown in Fig. 1, after analog and digital processing at the destination, the recovered signal is given by

$$\hat{\mathbf{x}}_{\mathrm{General}} = \mathbf{G}_D \mathbf{G}_A \mathbf{y} - \mathbf{B}_{\mathrm{Rx}} \mathbf{x}, \tag{4}$$

where $\mathbf{G}_A \in \mathbb{C}^{L \times M}$ is an analog processor, $\mathbf{G}_D \in \mathbb{C}^{D \times L}$ is a digital processor, and $\mathbf{B}_{\mathrm{Rx}}$ is a general feedback matrix at the destination. Note that since the analog precoder $\mathbf{F}_A$ and the analog processor $\mathbf{G}_A$ are implemented through phase shifters, they are restricted to constant-modulus matrices, i.e., matrices whose elements all have constant magnitude. For DFD at the receiver, the decision feedback matrix $\mathbf{B}_{\mathrm{Rx}}$ in (4) is a strictly lower-triangular matrix. For linear detection, the feedback matrix in (4) is an all-zero matrix, i.e., $\mathbf{B}_{\mathrm{Rx}} = \mathbf{0}$. Based on (3) and (4), the recovered signal vector can be rewritten as

$$\hat{\mathbf{x}} = \mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D (\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})^{-1} \mathbf{x} - \mathbf{B}_{\mathrm{Rx}} \mathbf{x} + \mathbf{G}_D \mathbf{G}_A \mathbf{n}. \tag{5}$$

This is a general signal model that includes nonlinear hybrid transceivers with THP or DFD and the linear hybrid transceiver as special cases.

More specifically, for a linear hybrid transceiver, there is no feedback at either the source or the destination, i.e., $\mathbf{B}_{\mathrm{Tx}} = \mathbf{B}_{\mathrm{Rx}} = \mathbf{0}$.
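Therefore, the recovered signal in (5) becomes

$$\hat{\mathbf{x}}_{\mathrm{Linear}} = \mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D \mathbf{x} + \mathbf{G}_D \mathbf{G}_A \mathbf{n}. \tag{6}$$

For the nonlinear transceiver with THP at the source and linear detection at the destination, i.e., $\mathbf{B}_{\mathrm{Rx}} = \mathbf{0}$ [36], [37], the detected signal vector in (5) becomes

$$\hat{\mathbf{x}}_{\mathrm{THP}} = \mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D (\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})^{-1} \mathbf{x} + \mathbf{G}_D \mathbf{G}_A \mathbf{n}. \tag{7}$$

For the nonlinear transceiver with DFD at the destination and a linear precoder at the source, i.e., $\mathbf{B}_{\mathrm{Tx}} = \mathbf{0}$, the detected signal vector in (5) becomes

$$\hat{\mathbf{x}}_{\mathrm{DFD}} = (\mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D - \mathbf{B}_{\mathrm{Rx}}) \mathbf{x} + \mathbf{G}_D \mathbf{G}_A \mathbf{n}. \tag{8}$$

B. Unified MSE Matrix for Different Precoders and Processors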

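Based on the general signal model in (5), the general MSE matrix of the recovered signal at the destination equals

$$\begin{aligned} &\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_D, \mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}_{\mathrm{Tx}}, \mathbf{B}_{\mathrm{Rx}}) \\ &= E\{(\hat{\mathbf{x}} - \mathbf{x})(\hat{\mathbf{x}} - \mathbf{x})^H\} \\ &= E\{\left(\mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D (\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})^{-1} \mathbf{x} - (\mathbf{B}_{\mathrm{Rx}} + \mathbf{I}) \mathbf{x} + \mathbf{G}_D \mathbf{G}_A \mathbf{n}\right) \\ &\quad \times \left(\mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D (\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})^{-1} \mathbf{x} - (\mathbf{B}_{\mathrm{Rx}} + \mathbf{I}) \mathbf{x} + \mathbf{G}_D \mathbf{G}_A \mathbf{n}\right)^H\} \\ &= E\{\left(\mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D - (\mathbf{B}_{\mathrm{Rx}} + \mathbf{I})(\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})\right) \mathbf{b} \mathbf{b}^H \\ &\quad \times \left(\mathbf{G}_D \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D - (\mathbf{B}_{\mathrm{Rx}} + \mathbf{I})(\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})\right)^H\} + \mathbf{G}_D \mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H \mathbf{G}_D^H, \end{aligned} \tag{9}$$

where the third equality is based on $\mathbf{b} = (\mathbf{I} + \mathbf{B}_{\mathrm{Tx}})^{-1} \mathbf{x}$ given in (2) and on the independence of the signal and the noise.

Based on lattice theory, the elements of $\mathbf{b}$ are independent and identically distributed, i.e., $E\{\mathbf{b}\mathbf{b}^H\} \propto \mathbf{I}$ [37]. Thus, for notational simplicity, we can assume $E\{\mathbf{b}\mathbf{b}^H\} = \mathbf{I}$ in the following derivations. Denote $\mathbf{B} = \mathbf{B}_{\mathrm{Rx}} + \mathbf{B}_{\mathrm{Tx}} + \mathbf{B}_{\mathrm{Rx}} \mathbf{B}_{\mathrm{Tx}}$, then

$$(\mathbf{B}_{\mathrm{Rx}} + \mathbf{I})(\mathbf{I} + \mathbf{B}_{\mathrm{Tx}}) = \mathbf{I} + \mathbf{B}. \tag{10}$$

Because $\mathbf{B}_{\mathrm{Tx}}$ and $\mathbf{B}_{\mathrm{Rx}}$ are both strictly lower triangular, so is $\mathbf{B}$. This implies that using nonlinear precoding at the transmitter and nonlinear detection at the receiver at the same time is equivalent to using just one of the two. Therefore, nonlinear precoding at the transmitter and nonlinear detection at the receiver are equivalent, and only one of them is needed.

Direct matrix derivation [38] yields that the optimal $\mathbf{G}_D$ is

$$\mathbf{G}_D^{\mathrm{opt}} = (\mathbf{I} + \mathbf{B})(\mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D)^H \left[(\mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D)(\mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D)^H + \mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H\right]^{-1}. \tag{11}$$

That is, the general MSE matrix can be further simplified into

$$\begin{aligned} \boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}) &= \boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_D^{\mathrm{opt}}, \mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}_{\mathrm{Tx}}, \mathbf{B}_{\mathrm{Rx}}) \\ &= (\mathbf{I} + \mathbf{B})\left(\mathbf{I} + \mathbf{F}_D^H \mathbf{F}_A^H \mathbf{H}^H \mathbf{G}_A^H (\mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H)^{-1} \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D\right)^{-1} (\mathbf{I} + \mathbf{B})^H \\ &\preceq \boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_D, \mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}_{\mathrm{Tx}}, \mathbf{B}_{\mathrm{Rx}}) \end{aligned} \tag{12}$$

for any $\mathbf{G}_D$.

If $\mathbf{B} = \mathbf{0}$ in (11) and (12), the results reduce to those of the linear transceiver.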
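To make (11) and (12) concrete, the following minimal sketch checks them in the scalar case (one data stream, one RF chain, $\mathbf{B} = \mathbf{0}$), where every matrix collapses to a complex number. The numerical values and the helper names `mse`, `lmmse_gain`, and `min_mse` are illustrative assumptions, not part of the paper.

```python
# Scalar sanity check of the LMMSE processor (11) and the MSE expression (12)
# with B = 0. All numeric values are arbitrary assumptions for illustration.

def mse(g_d, c, r):
    """MSE of x_hat = g_d * (c * x + n') for unit-power x and noise power r."""
    return abs(g_d * c - 1) ** 2 + abs(g_d) ** 2 * r

def lmmse_gain(c, r):
    """Scalar form of (11): conj(c) / (|c|^2 + r)."""
    return c.conjugate() / (abs(c) ** 2 + r)

def min_mse(c, r):
    """Scalar form of (12): 1 / (1 + |c|^2 / r)."""
    return 1.0 / (1.0 + abs(c) ** 2 / r)

h = 0.8 - 0.6j               # channel coefficient (assumed)
f_a = 1j                     # unit-modulus analog precoder weight
f_d = 1.5                    # digital precoder gain
g_a = (1 + 1j) / 2 ** 0.5    # unit-modulus analog processor weight
sigma2 = 0.4                 # noise variance, playing the role of R_n

c = g_a * h * f_a * f_d      # effective channel G_A H F_A F_D
r = abs(g_a) ** 2 * sigma2   # effective noise power G_A R_n G_A^H

g_opt = lmmse_gain(c, r)
print("MSE at the LMMSE gain:", mse(g_opt, c, r))
print("closed form from (12):", min_mse(c, r))
print("MSE at a perturbed gain:", mse(1.1 * g_opt, c, r))
```

The first two printed values should agree up to rounding, and any perturbation of the gain should give a larger MSE, which is the scalar counterpart of the ordering $\preceq$ in (12).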
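Specifically, the corresponding digital LMMSE processor for the linear transceiver is given as follows

$$\mathbf{G}_{D,\mathrm{L}}^{\mathrm{opt}} = (\mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D)^H \left[(\mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D)(\mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D)^H + \mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H\right]^{-1}, \tag{13}$$

and the MSE matrix for the linear transceiver is

$$\begin{aligned} \boldsymbol{\Phi}_{\mathrm{LMSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D) &= \boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{0}) \\ &= \left[\mathbf{I} + \mathbf{F}_D^H \mathbf{F}_A^H \mathbf{H}^H \mathbf{G}_A^H (\mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H)^{-1} \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D\right]^{-1} \\ &\triangleq \left[\mathbf{I} + \boldsymbol{\Gamma}(\mathbf{G}_A, \mathbf{F}_D, \mathbf{F}_A)\right]^{-1}, \end{aligned} \tag{14}$$

where

$$\boldsymbol{\Gamma}(\mathbf{G}_A, \mathbf{F}_D, \mathbf{F}_A) = \mathbf{F}_D^H \mathbf{F}_A^H \mathbf{H}^H \mathbf{G}_A^H (\mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H)^{-1} \mathbf{G}_A \mathbf{H} \mathbf{F}_A \mathbf{F}_D, \tag{15}$$

which reduces to the signal-to-noise ratio (SNR) in the single-antenna case.

For the nonlinear transceivers, $\mathbf{B} = \mathbf{B}_{\mathrm{Tx}}$ for THP or $\mathbf{B} = \mathbf{B}_{\mathrm{Rx}}$ for DFD in (10)-(12). Based on (12) and (14), the general MSE matrix for nonlinear transceivers can also be written in the following unified formula

$$\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}) = (\mathbf{I} + \mathbf{B}) \boldsymbol{\Phi}_{\mathrm{LMSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D) (\mathbf{I} + \mathbf{B})^H, \tag{16}$$

which turns into the linear MSE matrix in (14) when $\mathbf{B} = \mathbf{0}$.

In the following, we investigate unified hybrid MIMO transceiver optimization, which is applicable to various objective functions based on the general MSE matrix (16).

III. THE UNIFIED HYBRID MIMO TRANSCEIVER OPTIMIZATION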

Because of the multi-objective optimization nature of MIMO systems with multiple data streams, there are different kinds of objectives that reflect different design preferences [39]. All of them can be regarded as matrix-monotonic functions of the data estimation MSE matrix in (16) [40]. A function $f(\cdot)$ is a matrix-monotone increasing function if $f(\mathbf{X}) \ge f(\mathbf{Y})$ for $\mathbf{X} \succeq \mathbf{Y} \succeq \mathbf{0}$ [40]. To avoid case-by-case discussion, in this section we investigate hybrid MIMO transceiver optimization with different performance metrics from a unified viewpoint.

Based on the MSE matrix in (16), the unified hybrid MIMO transceiver design can be formulated in the following form

$$\begin{aligned} \min_{\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}} \quad & f\left(\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})\right) \\ \mathrm{s.t.} \quad & \mathrm{Tr}(\mathbf{F}_A \mathbf{F}_D \mathbf{F}_D^H \mathbf{F}_A^H) \le P \\ & \mathbf{F}_A \in \mathcal{F}, \ \mathbf{G}_A \in \mathcal{G}, \end{aligned} \tag{17}$$

where $f(\cdot)$ is a matrix-monotone increasing function [40]. The sets $\mathcal{F}$ and $\mathcal{G}$ are the feasible analog precoder set and analog processor set satisfying the constant-modulus constraint, and $P$ denotes the maximum transmit power at the source.

A. Specific Objective Functions

There are many ways to choose the matrix-monotone increasing function. In this subsection, we investigate the properties of different objective functions of the MSE matrix in (16).

One group of matrix-monotone increasing functions can be expressed as

$$f_1\left(\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})\right) = f_{\mathrm{Schur}}\left(\mathbf{d}(\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}))\right), \tag{18}$$

where $\mathbf{d}(\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}))$ is a vector consisting of the diagonal elements of the matrix $\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})$ and $f_{\mathrm{Schur}}(\cdot)$ is a function of a vector satisfying one of the following four properties discussed in Appendix A:

1) Multiplicatively Schur-convex
2) Multiplicatively Schur-concave
3) Additively Schur-convex
4) Additively Schur-concave.

Many widely used metrics can be regarded as special cases of this group of functions [36], [37], [39].

Conclusion 1: For the linear transceiver, the feedback matrix $\mathbf{B}$ in (17) is an all-zero matrix, i.e., $\mathbf{B}^{\mathrm{opt}} = \mathbf{0}$. For the nonlinear transceiver, from Appendix B the optimal feedback matrix $\mathbf{B}$ for $f_1(\cdot)$ is

$$\mathbf{B}^{\mathrm{opt}} = \mathrm{diag}\{[[\mathbf{L}]_{1,1}, \cdots, [\mathbf{L}]_{L,L}]^T\} \mathbf{L}^{-1} - \mathbf{I}, \tag{19}$$

where $\mathbf{L}$ is the lower triangular matrix of the following Cholesky decomposition

$$\boldsymbol{\Phi}_{\mathrm{LMSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D) = \mathbf{L} \mathbf{L}^H. \tag{20}$$

It has been proved in [40] and [38] that, for nonlinear transceiver design, each data stream has the same performance if $f_{\mathrm{Schur}}(\cdot)$ in (18) is multiplicatively Schur-convex. On the other hand, if $f_{\mathrm{Schur}}(\cdot)$ in (18) is multiplicatively Schur-concave, the objective function for nonlinear transceiver design includes geometrically weighted signal-to-interference-plus-noise ratio (SINR) maximization as a special case. If $f_{\mathrm{Schur}}(\cdot)$ in (18) is additively Schur-convex, the objective function includes the maximum MSE minimization and the minimum BER with the same constellation on each data stream as special cases. If $f_{\mathrm{Schur}}(\cdot)$ in (18) is additively Schur-concave, the objective function includes weighted MSE minimization as a special case. Additive Schur functions are usually used for linear transceivers ($\mathbf{B} = \mathbf{0}$ in (18)) since closed-form solutions can be obtained in this case.

Besides the above group of matrix-monotone increasing functions, we can choose functions that reflect capacity and MSE for linear transceivers. Capacity is one of the most popular performance metrics in MIMO transceiver optimization. It can be expressed in terms of the MSE matrix through the well-known relationship between the MSE matrix and capacity [40], i.e., $C = -\log|\boldsymbol{\Phi}_{\mathrm{MSE}}|$. Then, the objective can be given as

$$f_2(\cdot) = \log\left|\boldsymbol{\Phi}_{\mathrm{LMSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D)\right|. \tag{21}$$

MSE is another widely used performance metric, which indicates how accurately a signal can be recovered.
The corresponding weighted MSE minimization objective is

$$f_3(\cdot) = \mathrm{Tr}\left[\mathbf{A}^H \boldsymbol{\Phi}_{\mathrm{LMSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D) \mathbf{A}\right], \tag{22}$$

where $\mathbf{A}$ is a general, not necessarily diagonal, weight matrix, even though it is diagonal in many applications.

B. Hybrid MIMO Transceiver Optimization

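Denote

$$\boldsymbol{\Pi}_L = (\mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H)^{-1/2} \mathbf{G}_A \mathbf{R}_n^{1/2}, \quad \boldsymbol{\Pi}_R = \mathbf{F}_A (\mathbf{F}_A^H \mathbf{F}_A)^{-1/2}, \quad \text{and} \quad \tilde{\mathbf{F}}_D = (\mathbf{F}_A^H \mathbf{F}_A)^{1/2} \mathbf{F}_D \mathbf{Q}^H, \tag{23}$$

where $\mathbf{Q}$ is a unitary matrix to be determined by digital transceiver optimization in the next section. With these definitions, $\boldsymbol{\Pi}_R \tilde{\mathbf{F}}_D \mathbf{Q} = \mathbf{F}_A \mathbf{F}_D$ and $\boldsymbol{\Pi}_L \mathbf{R}_n^{-1/2} = (\mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^H)^{-1/2} \mathbf{G}_A$, so (15) can be rewritten as

$$\boldsymbol{\Gamma}(\mathbf{G}_A, \mathbf{F}_D, \mathbf{F}_A) = \mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q}, \tag{24}$$

where

$$\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) = \tilde{\mathbf{F}}_D^H \boldsymbol{\Pi}_R^H \mathbf{H}^H \mathbf{R}_n^{-1/2} \boldsymbol{\Pi}_L^H \boldsymbol{\Pi}_L \mathbf{R}_n^{-1/2} \mathbf{H} \boldsymbol{\Pi}_R \tilde{\mathbf{F}}_D. \tag{25}$$

The optimal $\mathbf{B}$ is usually a function of $\boldsymbol{\Gamma}(\mathbf{G}_A, \mathbf{F}_D, \mathbf{F}_A)$ for all objective functions, as demonstrated by (19) for $f_{\mathrm{Schur}}(\cdot)$ in (18). From (24), we can conclude that the optimal $\mathbf{B}$ is a function of $\mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q}$. Therefore, using (14) and (24), the objective function of (17) can be expressed in terms of $\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)$ as

$$\begin{aligned} f\left(\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}^{\mathrm{opt}})\right) &= f\left((\mathbf{I} + \mathbf{B}^{\mathrm{opt}})(\mathbf{I} + \mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q})^{-1} (\mathbf{I} + \mathbf{B}^{\mathrm{opt}})^H\right) \\ &\triangleq f_S\left(\mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q}\right). \end{aligned} \tag{26}$$

After introducing $\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)$ and the new auxiliary matrix $\mathbf{Q}$, the objective function is expressed as $f_S(\cdot)$ rather than $f(\cdot)$. Note that this new function notation, $f_S(\cdot)$, is defined only for notational simplicity; it explicitly expresses the objective as a function of the matrix variables $\mathbf{Q}$ and $\tilde{\boldsymbol{\Gamma}}$. Therefore, the optimization problem in (17) is further rewritten as

$$\begin{aligned} \min_{\mathbf{Q}, \mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A} \quad & f_S\left(\mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q}\right) \\ \mathrm{s.t.} \quad & \mathrm{Tr}(\tilde{\mathbf{F}}_D \tilde{\mathbf{F}}_D^H) \le P \\ & \mathbf{F}_A \in \mathcal{F}, \ \mathbf{G}_A \in \mathcal{G}. \end{aligned} \tag{27}$$

We discuss in detail how to solve the optimization problem (27) with respect to $\mathbf{Q}$, $\mathbf{G}_A$, $\tilde{\mathbf{F}}_D$, and $\mathbf{F}_A$ subsequently. In (27), $\mathbf{B}$ has been formulated as a function of $\mathbf{Q}$, $\mathbf{G}_A$, $\tilde{\mathbf{F}}_D$, and $\mathbf{F}_A$. Once $\mathbf{Q}$, $\mathbf{G}_A$, $\tilde{\mathbf{F}}_D$, and $\mathbf{F}_A$ are calculated, the optimal $\mathbf{B}$ can be directly derived based on (19).

IV. DIGITAL TRANSCEIVER OPTIMIZATION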

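In the following, we focus on the digital transceiver optimization for the optimization problem (27). More specifically, we first derive the optimal unitary matrix $\mathbf{Q}$ and then find the optimal $\tilde{\mathbf{F}}_D$.

A. Optimal $\mathbf{Q}$

At the beginning of this section, two fundamental definitions are given based on the following eigenvalue decomposition (EVD) and SVD

$$\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) = \mathbf{U}_{\tilde{\boldsymbol{\Gamma}}} \boldsymbol{\Lambda}_{\tilde{\boldsymbol{\Gamma}}} \mathbf{U}_{\tilde{\boldsymbol{\Gamma}}}^H, \qquad \mathbf{A} = \mathbf{U}_{\mathbf{A}} \boldsymbol{\Lambda}_{\mathbf{A}} \mathbf{V}_{\mathbf{A}}^H, \tag{28}$$

where $\boldsymbol{\Lambda}_{\tilde{\boldsymbol{\Gamma}}}$ and $\boldsymbol{\Lambda}_{\mathbf{A}}$ denote diagonal matrices with the diagonal elements in nondecreasing order.

Denote $\mathbf{U}_{\mathrm{GMD}}$ as the unitary matrix that makes the lower triangular matrix $\mathbf{L}$ in (20) have identical diagonal elements. It has been shown in [38]–[40] that the optimal $\mathbf{Q}$ for the first group of matrix-monotonic functions can be expressed as

$$\mathbf{Q}^{\mathrm{opt}} = \begin{cases} \mathbf{U}_{\tilde{\boldsymbol{\Gamma}}} \mathbf{U}_{\mathrm{GMD}}^H & \text{if } f_1(\cdot) \text{ is multiplicatively Schur-convex} \\ \mathbf{U}_{\tilde{\boldsymbol{\Gamma}}} & \text{if } f_1(\cdot) \text{ is multiplicatively Schur-concave} \\ \mathbf{U}_{\tilde{\boldsymbol{\Gamma}}} \mathbf{U}_{\mathrm{DFT}}^H & \text{if } f_1(\cdot) \text{ is additively Schur-convex} \\ \mathbf{U}_{\tilde{\boldsymbol{\Gamma}}} & \text{if } f_1(\cdot) \text{ is additively Schur-concave.} \end{cases} \tag{29}$$

The above results are obtained by directly manipulating the objective function $f_1(\cdot)$ in (26), and thus the optimal $\mathbf{Q}$ varies with the matrix-monotone increasing function in (17).

For the capacity maximization in (21), the objective function of (27) can be written as

$$f_{S,2}(\cdot) = -\log\left|\mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q} + \mathbf{I}\right|. \tag{30}$$

Since the function in (30) is independent of $\mathbf{Q}$ as long as $\mathbf{Q}$ is a unitary matrix, the optimal $\mathbf{Q}$, namely $\mathbf{Q}^{\mathrm{opt}}$, can be any unitary matrix of proper dimension.

For the weighted MSE minimization given by (22), the objective function of (27) can be rewritten as

$$f_{S,3}(\cdot) = \mathrm{Tr}\left[\mathbf{A}^H (\mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q} + \mathbf{I})^{-1} \mathbf{A}\right]. \tag{31}$$

Based on the EVD and SVD defined in (28) and the matrix inequality in Appendix C, the optimal $\mathbf{Q}$ is

$$\mathbf{Q}^{\mathrm{opt}} = \mathbf{U}_{\tilde{\boldsymbol{\Gamma}}} \mathbf{U}_{\mathbf{A}}^H. \tag{32}$$

We have to stress that it is still hard to find a closed-form expression for the optimal $\mathbf{Q}$ for an arbitrary function $f(\cdot)$.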
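However, most of the meaningful and popular metric functions have been shown to belong to one of the above function families, and thus admit a closed-form expression for the optimal $\mathbf{Q}$.

B. Optimal $\tilde{\mathbf{F}}_D$

After substituting the optimal $\mathbf{Q}$ into the objective function of (27), the objective function becomes a function of the eigenvalues of $\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)$, i.e.,

$$f_S\left(\mathbf{Q}^H \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \mathbf{Q}\right) \triangleq f_E\left(\boldsymbol{\lambda}\left(\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)\right)\right), \tag{33}$$

where $\boldsymbol{\lambda}(\mathbf{X}) = [\lambda_1(\mathbf{X}), \cdots, \lambda_L(\mathbf{X})]^T$ and $\lambda_i(\mathbf{X})$ is the $i$th largest eigenvalue of $\mathbf{X}$. It is worth highlighting that for $f_{S,1}(\cdot)$ and $f_{S,3}(\cdot)$, (33) follows directly from (29) and (32). For $f_{S,2}(\cdot)$, the optimal $\mathbf{Q}$ can be an arbitrary unitary matrix, and minimizing $f_{S,2}(\cdot)$ is mathematically equal to minimizing $-\sum_{l=1}^{L} \log\left(1 + \lambda_l(\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A))\right)$ for any $\mathbf{Q}$. In other words, (33) always holds for the kinds of functions discussed above.

Note that the definition in (33) follows from the fact that the unitary matrix in $\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)$ has been removed by the optimal $\mathbf{Q}$ and only its eigenvalues remain to be optimized. Therefore, the unified hybrid MIMO transceiver optimization in (27) is simplified to

$$\begin{aligned} \min_{\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A} \quad & f_E\left(\boldsymbol{\lambda}\left(\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)\right)\right) \\ \mathrm{s.t.} \quad & \mathrm{Tr}(\tilde{\mathbf{F}}_D \tilde{\mathbf{F}}_D^H) \le P \\ & \mathbf{F}_A \in \mathcal{F}, \ \mathbf{G}_A \in \mathcal{G}. \end{aligned} \tag{34}$$

By applying the obtained results on $\mathbf{Q}^{\mathrm{opt}}$ and the fact that $f(\cdot)$ is a matrix-monotone increasing function, it can be concluded from the discussion in [38], [39] that $f_E(\cdot)$ is a vector-decreasing function for $f_{S,1}(\cdot)$. Moreover, substituting the optimal $\mathbf{Q}$ into the objective function of (27), for $f_{S,2}(\cdot)$ and $f_{S,3}(\cdot)$ we have

$$f_E(\cdot) = -\sum_{l=1}^{L} \log\left(1 + \lambda_l(\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A))\right), \tag{35}$$

$$f_E(\cdot) = \sum_{l=1}^{L} \frac{\lambda_l(\mathbf{A})}{1 + \lambda_l(\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A))}, \tag{36}$$

respectively, which implies that $f_E(\cdot)$ is also vector-decreasing.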
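The vector-decreasing property can be illustrated numerically. The sketch below evaluates the eigenvalue forms of the capacity objective, written as $-\sum_l \log(1+\lambda_l)$ in agreement with the determinant in (30), and of the weighted MSE objective in (36). The function names, the eigenvalue lists, and the nonnegative weights `weights` (standing in for $\lambda_l(\mathbf{A})$) are assumptions chosen only for illustration.

```python
import math

def capacity_objective(eigs):
    """Eigenvalue form of -log|I + Gamma|: -sum_l log(1 + lambda_l)."""
    return -sum(math.log(1.0 + lam) for lam in eigs)

def weighted_mse_objective(weights, eigs):
    """Eigenvalue form of the weighted MSE in (36): sum_l w_l / (1 + lambda_l)."""
    return sum(w / (1.0 + lam) for w, lam in zip(weights, eigs))

# Hypothetical eigenchannel SNRs; `better` raises one eigenvalue, keeps the rest.
eigs = [4.0, 2.0, 0.5]
better = [4.0, 2.5, 0.5]
weights = [1.0, 1.0, 2.0]   # assumed nonnegative weights standing in for lambda_l(A)

print(capacity_objective(eigs), capacity_objective(better))
print(weighted_mse_objective(weights, eigs), weighted_mse_objective(weights, better))
```

In both cases the objective evaluated at `better` is smaller than at `eigs`, which is what vector-decreasing means here: increasing any eigenchannel SNR never hurts the objective being minimized.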
In a nutshell, based on $\mathbf{Q}^{\mathrm{opt}}$ we can conclude that $f_E(\cdot)$ in (34) is a vector-decreasing function. Thus, from (34), the optimization becomes maximizing the eigenvalues of $\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)$. Each eigenvalue of $\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)$ corresponds to the SNR of an eigenchannel.

In problem (34), the variables are still matrix variables. To simplify the optimization, we first derive the diagonalizable structure of the optimal matrix variables. Based on the derived optimal structure, the dimensionality of the optimization problem is reduced significantly.

In order to derive the optimal structure and to avoid tedious case-by-case discussion, we consider a multi-objective optimization problem in the following. Its Pareto optimal solution set contains all the optimal solutions of the different types of transceiver optimization. In particular, as discussed in [40], the optimal solution of problem (34) with a specific objective function, i.e., $f_1(\cdot)$, $f_2(\cdot)$, or $f_3(\cdot)$, must be in the Pareto optimal solution set of the following vector optimization (multi-objective) problem

$$\begin{aligned} \max_{\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A} \quad & \boldsymbol{\lambda}\left(\tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A)\right) \\ \mathrm{s.t.} \quad & \mathrm{Tr}(\tilde{\mathbf{F}}_D \tilde{\mathbf{F}}_D^H) \le P \\ & \mathbf{F}_A \in \mathcal{F}, \ \mathbf{G}_A \in \mathcal{G}. \end{aligned} \tag{37}$$

Equivalently, the vector optimization problem in (37) can be rewritten as the following matrix-monotonic optimization problem

$$\begin{aligned} \max_{\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A} \quad & \tilde{\boldsymbol{\Gamma}}(\mathbf{G}_A, \tilde{\mathbf{F}}_D, \mathbf{F}_A) \\ \mathrm{s.t.} \quad & \mathrm{Tr}(\tilde{\mathbf{F}}_D \tilde{\mathbf{F}}_D^H) \le P \\ & \mathbf{F}_A \in \mathcal{F}, \ \mathbf{G}_A \in \mathcal{G}. \end{aligned} \tag{38}$$

It is worth noting that optimization (38) aims at maximizing a positive semidefinite matrix. Generally speaking, maximizing a positive semidefinite matrix includes two tasks, i.e., maximizing its eigenvalues and choosing a proper EVD unitary matrix. Note that in (38) there is no need to optimize the EVD unitary matrix, because the constraints remain satisfied if only the EVD unitary matrix changes. Using the definitions in (23), and given the analog precoder $\mathbf{F}_A$ and the analog processor $\mathbf{G}_A$, problem (38) is a standard matrix-monotonic optimization with respect to $\tilde{\mathbf{F}}_D$.
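It follows that

$$\begin{aligned} \max_{\tilde{\mathbf{F}}_D} \quad & \tilde{\mathbf{F}}_D^H \boldsymbol{\Pi}_R^H \mathbf{H}^H \mathbf{R}_n^{-1/2} \boldsymbol{\Pi}_L^H \boldsymbol{\Pi}_L \mathbf{R}_n^{-1/2} \mathbf{H} \boldsymbol{\Pi}_R \tilde{\mathbf{F}}_D \\ \mathrm{s.t.} \quad & \mathrm{Tr}(\tilde{\mathbf{F}}_D \tilde{\mathbf{F}}_D^H) \le P. \end{aligned} \tag{39}$$

Based on the matrix-monotonic optimization theory developed in [40], the optimal solution of (39) satisfies the following diagonalizable structure.

Conclusion 2: Defining the following SVD,

$$\boldsymbol{\Pi}_L \mathbf{R}_n^{-1/2} \mathbf{H} \boldsymbol{\Pi}_R = \mathbf{U}_{\mathbf{H}} \boldsymbol{\Lambda}_{\mathbf{H}} \mathbf{V}_{\mathbf{H}}^H, \tag{40}$$

with the diagonal elements of $\boldsymbol{\Lambda}_{\mathbf{H}}$ in decreasing order, the optimal $\tilde{\mathbf{F}}_D$ satisfies

$$\tilde{\mathbf{F}}_D^{\mathrm{opt}} = \mathbf{V}_{\mathbf{H}} \boldsymbol{\Lambda}_{\mathbf{F}} \mathbf{U}_{\mathrm{Arb}}^H, \tag{41}$$

where $\boldsymbol{\Lambda}_{\mathbf{F}}$ is a diagonal matrix determined by the specific objective function, e.g., sum MSE, capacity maximization, etc., as discussed in the previous section. The unitary matrix $\mathbf{U}_{\mathrm{Arb}}$ can be an arbitrary unitary matrix.

Thus far, by using Conclusion 2, the optimal $\tilde{\mathbf{F}}_D$ can be obtained by conducting basic manipulations as in [40] to optimize $\boldsymbol{\Lambda}_{\mathbf{F}}$ for a given objective function. As a result, the remaining key task is to optimize the analog precoder and processor, which is the focus of the following section.

V. ANALOG TRANSCEIVER OPTIMIZATION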

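Based on the optimal solution of the digital precoder given in the previous section, we optimize the analog precoder and processor under constant-modulus constraints. In the following, the optimal structure of the analog transceiver is first derived. Different from existing works, we show that the analog precoder and processor designs can be decoupled by using the optimal transceiver structure. This optimal structure greatly simplifies the analog transceiver design.

For the analog transceiver optimization in (38) and using (23), we have the following matrix-monotonic optimization problem

$$\begin{aligned} \max_{\mathbf{F}_A, \mathbf{G}_A} \quad & \tilde{\mathbf{F}}_D^H \boldsymbol{\Pi}_R^H \mathbf{H}^H \mathbf{R}_n^{-1/2} \boldsymbol{\Pi}_L^H \boldsymbol{\Pi}_L \mathbf{R}_n^{-1/2} \mathbf{H} \boldsymbol{\Pi}_R \tilde{\mathbf{F}}_D \\ \mathrm{s.t.} \quad & \mathbf{F}_A \in \mathcal{F}, \ \mathbf{G}_A \in \mathcal{G}. \end{aligned} \tag{42}$$

Denote the SVDs

$$\mathbf{R}_n^{-1/2} \mathbf{H} \triangleq \mathbf{U}_{\mathbf{H}} \boldsymbol{\Lambda}_{\mathbf{H}} \mathbf{V}_{\mathbf{H}}^H, \tag{43}$$

$$\mathbf{R}_n^{1/2} \mathbf{G}_A^H \triangleq \mathbf{U}_{\mathbf{RG}} \boldsymbol{\Lambda}_{\mathbf{RG}} \mathbf{V}_{\mathbf{RG}}^H. \tag{44}$$

In Appendix D, we prove the following conclusion on the optimal structure of $\mathbf{F}_A$ and $\mathbf{G}_A$.

Conclusion 3: Let the SVD of $\mathbf{F}_A$ be

$$\mathbf{F}_A \triangleq \mathbf{U}_{\mathbf{F}_A} \boldsymbol{\Lambda}_{\mathbf{F}_A} \mathbf{V}_{\mathbf{F}_A}^H. \tag{45}$$

The singular values in $\boldsymbol{\Lambda}_{\mathbf{F}_A}$ do not affect the objective function in (42), and the unitary matrix $\mathbf{U}_{\mathbf{F}_A}$ for the optimal $\mathbf{F}_A$ satisfies

$$[\mathbf{U}_{\mathbf{F}_A}]_{:,1:L}^{\mathrm{opt}} = \arg\max \left\{ \left\| [\mathbf{V}_{\mathbf{H}}]_{:,1:L} [\mathbf{U}_{\mathbf{F}_A}]_{:,1:L}^H \right\| \right\}. \tag{46}$$

On the other hand, denote the SVD of $\mathbf{R}_n^{1/2} \mathbf{G}_A^H$ as

$$\mathbf{R}_n^{1/2} \mathbf{G}_A^H \triangleq \mathbf{U}_{\mathbf{RG}} \boldsymbol{\Lambda}_{\mathbf{RG}} \mathbf{V}_{\mathbf{RG}}^H. \tag{47}$$

The singular values in $\boldsymbol{\Lambda}_{\mathbf{RG}}$ do not affect the objective in (42), and the unitary matrix $\mathbf{U}_{\mathbf{RG}}$ for the optimal $\mathbf{G}_A$ satisfies

$$[\mathbf{U}_{\mathbf{RG}}]_{:,1:L}^{\mathrm{opt}} = \arg\max \left\{ \left\| [\mathbf{U}_{\mathbf{H}}]_{:,1:L} [\mathbf{U}_{\mathbf{RG}}]_{:,1:L}^H \right\| \right\}. \tag{48}$$

Based on the optimal structure given in Conclusion 3, two kinds of algorithms are proposed in the following to compute the analog precoder and processor. The first one is based on phase projection and provides better performance, while the second one is based on a heuristic random selection and has low complexity.

A. Phase Projection Based Algorithm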

Analog Precoder Design

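From Conclusion 3, the optimal analog precoder should select the $L$ best eigenchannels. It is challenging to directly optimize $\mathbf{F}_A$ based on (46) because this involves the SVD of a constant-modulus matrix. Alternatively, we resort to finding a matrix in the constant-modulus space with the minimum distance to the space spanned by $[\mathbf{V}_H]_{:,1:L}$. The corresponding optimization problem of analog precoder design can then be formulated as

$$ \min_{\mathbf{F}_A, \boldsymbol{\Lambda}_A, \mathbf{Q}_A} \ \big\| [\mathbf{V}_H]_{:,1:L} \boldsymbol{\Lambda}_A \mathbf{Q}_A - \mathbf{F}_A \big\|_F^2 \quad \text{s.t.} \ \ \mathbf{Q}_A \mathbf{Q}_A^{\mathrm{H}} = \mathbf{I}, \ \ \mathbf{F}_A \in \mathcal{F}. \tag{49} $$

Different from the existing work [24], the diagonal matrix $\boldsymbol{\Lambda}_A$ and the unitary matrix $\mathbf{Q}_A$ in our work are jointly optimized to make $\mathbf{F}_A$ as close as possible to the space spanned by $[\mathbf{V}_H]_{:,1:L}$ in terms of the Frobenius norm. As there is no constraint on the diagonal matrix $\boldsymbol{\Lambda}_A$, given the matrices $\mathbf{Q}_A$ and $\mathbf{F}_A$, the optimal $\boldsymbol{\Lambda}_A$ is

$$ \boldsymbol{\Lambda}_A^{\mathrm{opt}} = \mathrm{diag}\big\{ \Re\big( \mathbf{Q}_A \mathbf{F}_A^{\mathrm{H}} [\mathbf{V}_H]_{:,1:L} \big) \big\}. \tag{50} $$

Then we rewrite the objective function in (49) as

$$ \big\| [\mathbf{V}_H]_{:,1:L} \boldsymbol{\Lambda}_A \mathbf{Q}_A - \mathbf{F}_A \big\|_F^2 = \mathrm{Tr}\big( [\mathbf{V}_H]_{:,1:L} \boldsymbol{\Lambda}_A^{\mathrm{opt}} (\boldsymbol{\Lambda}_A^{\mathrm{opt}})^{\mathrm{H}} [\mathbf{V}_H]_{:,1:L}^{\mathrm{H}} \big) + \mathrm{Tr}(\mathbf{F}_A \mathbf{F}_A^{\mathrm{H}}) - 2\Re\big\{ \mathrm{Tr}\big( \mathbf{F}_A^{\mathrm{H}} [\mathbf{V}_H]_{:,1:L} \boldsymbol{\Lambda}_A^{\mathrm{opt}} \mathbf{Q}_A \big) \big\}. \tag{51} $$

To minimize (51) given $\boldsymbol{\Lambda}_A$ and $\mathbf{F}_A$, the term $\Re\{ \mathrm{Tr}( \mathbf{F}_A^{\mathrm{H}} [\mathbf{V}_H]_{:,1:L} \boldsymbol{\Lambda}_A^{\mathrm{opt}} \mathbf{Q}_A ) \}$ should be maximized. By applying the matrix inequality in [41], the optimal $\mathbf{Q}_A$ is

$$ \mathbf{Q}_A^{\mathrm{opt}} = \mathbf{V}_Q \mathbf{U}_Q^{\mathrm{H}}, \tag{52} $$

where $\mathbf{V}_Q$ and $\mathbf{U}_Q$ are defined by the following SVD

$$ \mathbf{F}_A^{\mathrm{H}} [\mathbf{V}_H]_{:,1:L} \boldsymbol{\Lambda}_A = \mathbf{U}_Q \boldsymbol{\Sigma}_Q \mathbf{V}_Q^{\mathrm{H}}. \tag{53} $$

For given $\mathbf{Q}_A$ and $\boldsymbol{\Lambda}_A$, the optimal analog precoder $\mathbf{F}_A$ is [20]

$$ \mathbf{F}_A^{\mathrm{opt}} = P_{\mathcal{F}}\big( [\mathbf{V}_H]_{:,1:L} \boldsymbol{\Lambda}_A \mathbf{Q}_A \big), \tag{54} $$

where the phase projection $P_{\mathcal{F}}(\mathbf{A})$ is defined as

$$ [P_{\mathcal{F}}(\mathbf{A})]_{i,j} = \begin{cases} [\mathbf{A}]_{i,j} / |[\mathbf{A}]_{i,j}|, & \text{if } [\mathbf{A}]_{i,j} \neq 0 \\ 1, & \text{otherwise.} \end{cases} \tag{55} $$

Using (50), (52) and (54), the phase projection based analog precoder optimization is summarized in Algorithm 1.

Algorithm 1 Analog Precoder Design
Input: Right singular matrix of the equivalent channel $[\mathbf{V}_H]_{:,1:L}$, algorithm threshold $\zeta$.
1: Initialize $\mathbf{F}_A$ with $P_{\mathcal{F}}\{ [\mathbf{V}_H]_{:,1:L} \}$.
2: while the decrement of the objective function in (49) is larger than $\zeta$ do
3:   Calculate $\boldsymbol{\Lambda}_A$ based on (50).
4:   Calculate $\mathbf{Q}_A$ based on (52).
5:   Calculate $\mathbf{F}_A$ based on (54).
6:   Update the decrement value of the objective function in (49).
7: end while
8: Return: $\mathbf{F}_A$.

Analog Processor Design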
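Because the $\boldsymbol{\Lambda}_G$ and $\mathbf{Q}_G$ updates of the processor design reuse the precoder procedure, it helps to see Algorithm 1 in executable form first. The following NumPy sketch alternates the updates (50), (52) and (54). The identity initialization of $\mathbf{Q}_A$, the iteration cap `max_iter`, and the random test channel are assumptions made for illustration; they are not specified in the text.

```python
import numpy as np

def phase_projection(A):
    """Entrywise projection onto unit-modulus values, as in (55)."""
    mag = np.abs(A)
    out = np.ones_like(A, dtype=complex)   # zero entries map to 1
    nz = mag > 0
    out[nz] = A[nz] / mag[nz]
    return out

def analog_precoder(V, zeta=1e-6, max_iter=200):
    """Alternate (50), (52), (54) for an N x L matrix V with orthonormal columns."""
    L = V.shape[1]
    F = phase_projection(V)
    Q = np.eye(L, dtype=complex)           # assumed starting point for Q_A
    prev, history = np.inf, []
    for _ in range(max_iter):
        Lam = np.diag(np.real(np.diag(Q @ F.conj().T @ V)))   # (50)
        Uq, _, Vqh = np.linalg.svd(F.conj().T @ V @ Lam)       # SVD in (53)
        Q = Vqh.conj().T @ Uq.conj().T                         # (52)
        F = phase_projection(V @ Lam @ Q)                      # (54)
        obj = np.linalg.norm(V @ Lam @ Q - F, 'fro') ** 2      # objective of (49)
        history.append(obj)
        if prev - obj <= zeta:             # stop once the decrement is below zeta
            break
        prev = obj
    return F, history

rng = np.random.default_rng(1)
H = rng.standard_normal((16, 32)) + 1j * rng.standard_normal((16, 32))  # M x N test channel
V = np.linalg.svd(H)[2].conj().T[:, :4]    # first L = 4 right singular vectors
F_A, history = analog_precoder(V)
```

Each of the three updates minimizes the objective of (49) over one block of variables with the other two fixed, so the recorded objective values in `history` do not increase from one iteration to the next.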

Based on Conclusion 3, the optimal structure of the analog processor is similar to that of the analog precoder, but slightly more complicated because the noise covariance is entangled in the analog processor formulation. In this case, the left singular matrix of $\mathbf{R}_n^{1/2} \mathbf{G}_A^{\mathrm{H}}$ is required to match the first $L$ columns of the left singular matrix of the effective channel, i.e., $[\mathbf{U}_H]_{:,1:L}$. Thus, analogous to the analog precoder design in (49), we have the following optimization problem

$$ \min_{\mathbf{G}_A, \boldsymbol{\Lambda}_G, \mathbf{Q}_G} \ \big\| [\mathbf{U}_H]_{:,1:L} \boldsymbol{\Lambda}_G \mathbf{Q}_G - \mathbf{R}_n^{1/2} \mathbf{G}_A^{\mathrm{H}} \big\|_F^2 \quad \text{s.t.} \ \ \mathbf{Q}_G \mathbf{Q}_G^{\mathrm{H}} = \mathbf{I}, \ \ \mathbf{G}_A \in \mathcal{G}. \tag{56} $$

The optimization of the unitary matrix $\mathbf{Q}_G$ and the diagonal matrix $\boldsymbol{\Lambda}_G$ in (56) is exactly the same as that for the analog precoder optimization. However, the optimization of the analog processor $\mathbf{G}_A$ in (56) is different.

When the noises from different antennas are correlated, the analog processor design is more challenging than the analog precoder design. To overcome this challenge, problem (56) is relaxed to minimize an upper bound of the original objective function. Applying

$$ \big\| [\mathbf{U}_H]_{:,1:L} \boldsymbol{\Lambda}_G \mathbf{Q}_G - \mathbf{R}_n^{1/2} \mathbf{G}_A^{\mathrm{H}} \big\|_F^2 \le \lambda_{\max}(\mathbf{R}_n) \big\| \mathbf{R}_n^{-1/2} [\mathbf{U}_H]_{:,1:L} \boldsymbol{\Lambda}_G \mathbf{Q}_G - \mathbf{G}_A^{\mathrm{H}} \big\|_F^2, \tag{57} $$

the objective function of (56) is relaxed to $\| \mathbf{R}_n^{-1/2} [\mathbf{U}_H]_{:,1:L} \boldsymbol{\Lambda}_G \mathbf{Q}_G - \mathbf{G}_A^{\mathrm{H}} \|_F^2$. Solving the relaxed problem is the same as solving the analog precoder design. This relaxation is tight when $\mathbf{R}_n = \sigma_n^2 \mathbf{I}$.

This relaxation may result in some performance loss. Inspired by the work in [42], an iterative algorithm is also proposed to compute $\mathbf{G}_A$. The constant-modulus constraints are asymptotically satisfied by iteratively updating an additional constraint. This iterative algorithm is given in Algorithm 2, and the detailed derivation is given in Appendix E.

Algorithm 2 Iterative Analog Processor Design
Input: The matrix $[\mathbf{U}_H]_{:,1:L}$, the unitary matrix $\mathbf{Q}_G$, the diagonal matrix $\boldsymbol{\Lambda}_G$, controlling factor $\eta$, and convergence threshold $\upsilon$.
1: Compute $\mathbf{W}$ in (77).
2: Initialize the constant-modulus processor vector $\mathbf{r}_{(0)}$.
3: while the decrement of the objective function in (77) is larger than $\upsilon$ do
4:   Calculate $\mathbf{P}$ using (78) based on $\mathbf{G}_A$ computed in the previous iteration.
5:   Find the optimal solution of (77) based on (79).
6:   Update the decrement of the objective function in (77).
7: end while
8: Construct $\mathbf{G}_A$ based on the optimal solution in Step 5.
9: Return: $\mathbf{G}_A$.

B. Random Algorithm

The proposed phase projection based analog transceiver design suffers from high computational complexity, which may prohibit its practical implementation. To reduce complexity, we can randomly generate analog precoder and processor matrices to avoid the heavy computations involved in the phase projection based algorithms. In this random algorithm, we randomly select multiple matrices in the column or row space of $\mathbf{H}^{\mathrm{H}}$ and use their phase projections as the candidates for the analog transceiver design. Then the best candidate matrix is chosen according to some criterion.

Specifically, the random algorithm consists of three steps. First, a series of parameter matrices, denoted by $\{\mathbf{R}_k\}$ and $\{\mathbf{T}_k\}$, are generated, whose elements are randomly drawn from a specific distribution, e.g., a uniform or Gaussian distribution. Second, a series of candidate analog precoder and processor matrices are computed based on the parameter matrices: after computing $\mathbf{H}^{\mathrm{H}} \mathbf{R}_k$ and $\mathbf{T}_k \mathbf{H}^{\mathrm{H}}$, the constant-modulus candidate matrices are obtained as their phase projections. Finally, the analog precoder and processor are chosen from these candidates according to the determinant of a matrix-version SNR. The procedure is detailed in Algorithm 3.

Algorithm 3 Random Algorithm for Analog Transceiver Design
Input: The number of transmitter antennas $N$, number of RF-chains $L$, selection number $K$, probability density function $f_{\mathrm{trans}}(x)$, and $\mathbf{H}$.
1: Generate $K$ parameter matrices $\mathbf{R}_1, \ldots, \mathbf{R}_K \in \mathbb{C}^{M \times L}$, whose elements are randomly generated based on $f_{\mathrm{trans}}(x)$.
2: Rotate the channel as $\mathbf{H}^{\mathrm{H}} \mathbf{R}_k$.
3: Calculate $\mathbf{F}_k = P_{\mathcal{F}}(\mathbf{H}^{\mathrm{H}} \mathbf{R}_k)$.
4: $\mathbf{F}_{\max} = \arg\max_{\mathbf{F}_i} \big| \mathbf{F}_i^{\mathrm{H}} \mathbf{H}^{\mathrm{H}} \mathbf{R}_n^{-1} \mathbf{H} \mathbf{F}_i \big|$.
5: Generate $K$ parameter matrices $\mathbf{T}_1, \ldots, \mathbf{T}_K \in \mathbb{C}^{L \times N}$ randomly based on $f_{\mathrm{trans}}(x)$.
6: Rotate the channel as $\mathbf{T}_k \mathbf{H}^{\mathrm{H}}$.
7: Calculate $\mathbf{G}_k = P_{\mathcal{F}}(\mathbf{T}_k \mathbf{H}^{\mathrm{H}})$.
8: $\mathbf{G}_{\max} = \arg\max_{\mathbf{G}_i} \big| \mathbf{G}_i \mathbf{R}_n^{-1/2} \mathbf{H} \mathbf{H}^{\mathrm{H}} \mathbf{R}_n^{-1/2} \mathbf{G}_i^{\mathrm{H}} \big|$.
9: Return: $\mathbf{F}_A = \mathbf{F}_{\max}$, $\mathbf{G}_A = \mathbf{G}_{\max}$.

VI. SIMULATION RESULTS
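Before turning to the numerical results, the precoder half of Algorithm 3 (Steps 1 to 4) can be sketched as follows. This is a minimal illustration under stated assumptions: the parameter matrices are drawn real-valued and uniform on $[0, 1]$, the noise is white with unit variance, the channel is a random Gaussian matrix, and the function name `random_analog_precoder` is hypothetical. The log-determinant is used in place of the determinant, which selects the same candidate.

```python
import numpy as np

def phase_projection(A):
    """Entrywise unit-modulus projection; zero entries map to 1."""
    mag = np.abs(A)
    return np.where(mag > 0, A / np.where(mag > 0, mag, 1.0), 1.0 + 0j)

def random_analog_precoder(H, Rn_inv, L, K, rng):
    """Pick the best of K phase-projected random rotations of H^H by a determinant criterion."""
    M, _ = H.shape
    best_F, best_val = None, -np.inf
    for _ in range(K):
        R = rng.uniform(0.0, 1.0, size=(M, L))     # assumed uniform parameter matrix
        F = phase_projection(H.conj().T @ R)       # N x L constant-modulus candidate
        # log|det| is monotone in |det|, so it ranks candidates identically
        _, logdet = np.linalg.slogdet(F.conj().T @ H.conj().T @ Rn_inv @ H @ F)
        if logdet > best_val:
            best_F, best_val = F, logdet
    return best_F, best_val

rng = np.random.default_rng(0)
M, N, L = 16, 32, 4
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
Rn_inv = np.eye(M)                                  # white unit-variance noise (assumption)
F_A, score = random_analog_precoder(H, Rn_inv, L, K=10, rng=np.random.default_rng(1))
```

The processor half (Steps 5 to 8) follows the same pattern with $\mathbf{T}_k \mathbf{H}^{\mathrm{H}}$ and the corresponding determinant criterion. The cost grows linearly in `K`, since each candidate needs only one matrix product and one small $L \times L$ determinant.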

In this section, numerical results are provided to assess the performance of the proposed hybrid transceiver design. As our algorithms are applicable to any frequency band, both the microwave and mmWave frequency bands are simulated. In addition, quantization of the phase shifters is taken into account. More specifically, both the mmWave channel model $\mathbf{H}_m$ and the classic Rayleigh channel model $\mathbf{H}_r$ are tested. For the mmWave channel $\mathbf{H}_m$, a uniform linear array (ULA) is adopted. Unless otherwise specified, it is assumed that 1) the mmWave channel has $N_{cl} = 2$ clusters, each containing $N_{path} = 5$ paths; 2) the azimuth angle spread of the transmitter is restricted to . ° at the mean azimuth angle $\hat{\theta} = 45°$, and the receiver is omni-directional; 3) the path loss factors obey the standard Gaussian distribution; 4) the inter-antenna spacing $d$ equals half a wavelength. The channel is normalized to meet $E\{ \| \mathbf{H}_m \|_F^2 \} = NM$. For the random phase algorithm, we set $K = 10$, which means that the best analog precoder and processor are selected from 10 candidates, and a uniform distribution is utilized, i.e., $f_{\mathrm{trans}}(x) = 1$ for $\le x \le$ . We average the results over 2,000 independent trials. The transmit power is denoted by $P_{\mathrm{Tx}}$. The OMP and MaGiQ algorithms refer to the corresponding algorithms in [15] and [24], respectively. The analog precoder and processor for the direct phase algorithm are obtained by phase projection.

Fig. 2. Spectral efficiency comparison of 5 different hybrid transceiver design methods. Here, the 32 × 16 mmWave channel model is adopted in the simulation. Both the transmitter and receiver are equipped with $L = 4$ RF-chains and the system is conveying $D = 4$ data streams. [Plot: spectral efficiency (bit/s/Hz) versus $P_{\mathrm{Tx}}/\sigma_n^2$ from −5 to 20; curves: Digital, Projection, MaGiQ, Direct Phase, OMP.]

Fig. 3. Spectral efficiency comparison of 5 different hybrid transceiver design methods. The × mmWave channel model, which involves $N_{cl} = 3$ clusters with $N_{path} = 5$ multipaths in each cluster, is adopted in the simulation. Both the transmitter and receiver are equipped with $L = 6$ RF-chains and the system is conveying $D = 4$ data streams. [Plot: spectral efficiency (bit/s/Hz) versus $P_{\mathrm{Tx}}/\sigma_n^2$ from −5 to 20; curves: Digital, Projection, MaGiQ, Direct Phase, OMP.]

Fig. 2 shows the spectral efficiency versus the transmit power for different algorithms, where the hybrid transceiver has $N = 32$ transmit antennas, $M = 16$ receive antennas, and 4 data streams. Both the transmitter and receiver are equipped with $L = 4$ RF-chains. From Fig. 2, the proposed phase projection based hybrid transceiver design outperforms the other hybrid transceiver design algorithms, and its performance is very close to that of the full-digital one.

Fig. 3 shows the performance of the hybrid transceiver design with 6 RF-chains for a channel with $N_{cl} = 3$ clusters, each with $N_{path} = 5$ paths. From this figure, the proposed phase projection algorithm works well for different numbers of RF-chains, performs very close to the full-digital one, and is better than the other hybrid transceiver designs. It is worth noting that the direct phase projection method performs even better than OMP and MaGiQ. This is because the error bound of the method decreases when the number of RF-chains increases [20]. However, as the requirement that the number of data streams should be equal to that of RF-chains [24] is not satisfied in this case, the MaGiQ algorithm is the worst at high SNR.

Fig. 4. Spectral efficiency comparison of 5 different hybrid transceiver design methods. The × Rayleigh channel model is adopted in the simulation. Both the transmitter and receiver are equipped with $L = 6$ RF-chains and the system is conveying $D = 4$ data streams. [Plot: spectral efficiency (bit/s/Hz) versus $P_{\mathrm{Tx}}/\sigma_n^2$ from −5 to 20; curves: Digital, Projection, MaGiQ, Direct Phase, OMP.]

The following simulations focus on Rayleigh channels at microwave bands. Under this circumstance, a × system with $L = 6$ RF-chains transferring $D = 4$ data streams is adopted. After extensive simulations comparing with randomly generated codebooks and the DFT codebook, we found that the codebook constructed by phase projection, i.e., $\mathbf{C} = P_{\mathcal{F}}(\mathbf{H})$, has much better performance. This codebook is used for performance comparison in the following simulation.

Fig. 4 compares the performance of the different algorithms under Rayleigh channels. From the figure, the proposed algorithm obtains nearly the same performance as the optimal full-digital one, and it performs better than the MaGiQ algorithm in [24]. Moreover, it is worth noting that even with the carefully chosen codebook, the OMP algorithm exhibits a large performance gap compared with the full-digital one, which indicates that the OMP algorithm is not suitable for microwave frequency bands.

Fig. 5. Spectral efficiency comparison of 5 different hybrid transceiver design methods with 2-bit quantization of the phase shifters. The × mmWave channel model is adopted in the simulation. Both the transmitter and receiver are equipped with $L = 4$ RF-chains and the system is conveying $D = 4$ data streams. [Plot: spectral efficiency (bit/s/Hz) versus $P_{\mathrm{Tx}}/\sigma_n^2$ from −5 to 20; curves: Digital, Projection, OMP, Direct Phase, MaGiQ.]

As practical analog phase shifters are often implemented by a digital controller with finite resolution, Fig. 5 compares the performance of different hybrid transceiver designs for the × mmWave channel when phase quantization is taken into account. Each hybrid transceiver design uses only phase shifters with 2-bit resolution and $L = 4$. From the figure, the proposed hybrid transceiver design still outperforms the other hybrid transceiver designs with finite-resolution phase shifters.

Fig. 6. Spectral efficiency of the random algorithm under the mmWave channel. $L = 6$ RF-chains are assumed at both the transmitter and receiver. Both 32 × 16 and 64 × 16 systems are involved in the simulation. The number of data streams is set to $D = 4$. [Plot: spectral efficiency (bit/s/Hz) versus $P_{\mathrm{Tx}}/\sigma_n^2$ from −5 to 20; curves: Digital (32 × 16), Projection (32 × 16), Random (32 × 16), Random (64 × 16).]

In Fig. 6, both 32 × 16 and 64 × 16 mmWave channels are used to assess the performance. In this case, the number of RF-chains is 6. From Fig. 6, with the same number of transmit antennas, the random algorithm performs worse than the phase projection based algorithm. Although the random algorithm suffers nearly performance loss compared with the full-digital one, by involving more antennas at the base station, e.g., $N = 64$, the performance of the random algorithm becomes comparable to that of the full-digital transmitter with antennas. This implies that we can obtain appropriate performance using the low-complexity random algorithm by simply increasing the number of transmit antennas. Because of its low complexity, the random algorithm is well suited to hardware realization.

Fig. 7 shows the BER performance of different kinds of hybrid MIMO transceiver designs for the × Rayleigh channel with 4 RF-chains. In this case, there are 4 data streams and 16-QAM is used. From this figure, at high SNR, the BER performance of the hybrid nonlinear transceiver design is much better than that of the hybrid linear transceiver design. Furthermore, the hybrid nonlinear transceivers with THP and DFD have almost the same BER performance because of the duality between precoder design and processor design.

Fig. 7. BERs of the linear hybrid transceiver for capacity maximization, the nonlinear transceiver with DFD, and the nonlinear transceiver with THP. The × Rayleigh channel model is involved in the simulation. Both the transmitter and receiver are equipped with $L = 4$ RF-chains transferring $D = 4$ data streams simultaneously. [Plot: BER versus $P_{\mathrm{Tx}}/\sigma_n^2$ from −5 to 15; curves: Hybrid Linear Transceiver for Capacity Maximization, Hybrid Nonlinear Transceiver with DFE, Hybrid Nonlinear Transceiver with THP.]

VII. CONCLUSIONS

In this paper, we have investigated the hybrid digital and analog transceiver design for MIMO systems based on matrix-monotonic optimization theory. We have proposed a unified framework for both linear and nonlinear transceivers. Based on the matrix-monotonic optimization theory, the optimal transceiver structure for various MIMO transceivers has been derived, from which the function of the analog transceiver part can be regarded as eigenchannel selection. Using the derived optimal structure, effective algorithms have been proposed considering the constant-modulus constraint. Finally, it is shown that the proposed algorithms outperform existing hybrid transceiver designs.

APPENDIX A
PRELIMINARY DEFINITION OF MAJORIZATION THEORY

In this appendix, some fundamental functions in majorization theory are defined for the convenience of the unified framework analysis. These definitions are also given in [38] and are repeated here to make the paper self-contained.
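The additive and multiplicative majorization relations defined in this appendix compare partial sums and partial products of the decreasingly sorted elements. The following plain-Python sketch implements both checks; the function names and the tolerance `tol` are illustrative choices, not part of the paper.

```python
from itertools import accumulate
from math import isclose
import operator

def _dominates(x, y, op, tol):
    """Check that running op-aggregates of sorted y dominate those of x, with equal totals."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    px, py = list(accumulate(xs, op)), list(accumulate(ys, op))
    partial_ok = all(a <= b + tol for a, b in zip(px[:-1], py[:-1]))
    return partial_ok and isclose(px[-1], py[-1], abs_tol=tol)

def majorizes_additively(y, x, tol=1e-12):
    """True if y majorizes x additively (partial sums, equal total sum)."""
    return len(x) == len(y) and _dominates(x, y, operator.add, tol)

def majorizes_multiplicatively(y, x, tol=1e-12):
    """True if y majorizes x multiplicatively (partial products, equal total product)."""
    nonneg = all(v >= 0 for v in list(x) + list(y))
    return len(x) == len(y) and nonneg and _dominates(x, y, operator.mul, tol)

# The flat vector is majorized by any vector with the same sum (or product).
add_ok = majorizes_additively([3, 2, 1], [2, 2, 2])
add_rev = majorizes_additively([2, 2, 2], [3, 2, 1])
mul_ok = majorizes_multiplicatively([4, 2, 1], [2, 2, 2])

def sum_sq(v):
    """Sum of squares, a standard additively Schur-convex function."""
    return sum(t * t for t in v)
```

For instance, the sum of squares preserves the ordering: `sum_sq([2, 2, 2])` does not exceed `sum_sq([3, 2, 1])`, in line with the Schur-convexity property stated in this appendix.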

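Definition 1 [41]: For a $K \times 1$ vector $\mathbf{x} \in \mathbb{R}^K$, the $\ell$th largest element of $\mathbf{x}$ is denoted as $x_{[\ell]}$; in other words, we have $x_{[1]} \ge x_{[2]} \ge \cdots \ge x_{[K]}$. Based on this definition, for two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^K$, $\mathbf{y}$ is said to majorize $\mathbf{x}$ additively, denoted by $\mathbf{x} \prec_{+} \mathbf{y}$, if and only if the following properties are satisfied

$$ \sum_{k=1}^{p} x_{[k]} \le \sum_{k=1}^{p} y_{[k]}, \ \ p = 1, \ldots, K-1, \quad \text{and} \quad \sum_{k=1}^{K} x_{[k]} = \sum_{k=1}^{K} y_{[k]}. \tag{58} $$

Definition 2 [41]: A function $f(\cdot)$ is additively Schur-convex if and only if it satisfies the following property

$$ \mathbf{x} \prec_{+} \mathbf{y} \ \Longrightarrow \ f(\mathbf{x}) \le f(\mathbf{y}). \tag{59} $$

On the other hand, a function $f(\cdot)$ is additively Schur-concave if $-f(\cdot)$ is additively Schur-convex.

Definition 3 [38]: For two $K \times 1$ vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^K$ with nonnegative elements, the vector $\mathbf{y}$ is said to majorize the vector $\mathbf{x}$ multiplicatively, i.e., $\mathbf{x} \prec_{\times} \mathbf{y}$, if and only if the following properties are satisfied

$$ \prod_{k=1}^{p} x_{[k]} \le \prod_{k=1}^{p} y_{[k]}, \ \ p = 1, \ldots, K-1, \quad \text{and} \quad \prod_{k=1}^{K} x_{[k]} = \prod_{k=1}^{K} y_{[k]}. \tag{60} $$

Definition 4 [38]: A function $f(\cdot)$ is multiplicatively Schur-convex if and only if it satisfies the following property

$$ \mathbf{x} \prec_{\times} \mathbf{y} \ \Longrightarrow \ f(\mathbf{x}) \le f(\mathbf{y}). \tag{61} $$

On the other hand, a function $f(\cdot)$ is multiplicatively Schur-concave if $-f(\cdot)$ is multiplicatively Schur-convex.

APPENDIX B
THE OPTIMAL B

Note that this optimal $\mathbf{B}$ for the nonlinear transceiver was previously obtained in [36] when the objective function belongs to the family of multiplicatively Schur-concave/convex functions defined in Appendix A. The following presents a slightly different proof of the optimal $\mathbf{B}$, which generalizes the result to the case with an arbitrary monotonically increasing function $f(\cdot)$. Here, the function $f$ operates only on the diagonal elements of $\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})$, and $\mathbf{B}$ is restricted to be a strictly lower triangular matrix, which specifies the use of a nonlinear transceiver.

Based on the Cholesky decomposition

$$ \boldsymbol{\Phi}_{\mathrm{LMSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D) = \mathbf{L} \mathbf{L}^{\mathrm{H}}, \tag{62} $$

we have

$$ \boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B}) = (\mathbf{I} + \mathbf{B}) \boldsymbol{\Phi}_{\mathrm{LMSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D) (\mathbf{I} + \mathbf{B})^{\mathrm{H}} = (\mathbf{I} + \mathbf{B}) \mathbf{L} \mathbf{L}^{\mathrm{H}} (\mathbf{I} + \mathbf{B})^{\mathrm{H}}, \tag{63} $$

based on which the $n$th diagonal element of $\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})$ equals

$$ [\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})]_{n,n} = [(\mathbf{I} + \mathbf{B}) \mathbf{L}]_{n,:} [(\mathbf{I} + \mathbf{B}) \mathbf{L}]_{n,:}^{\mathrm{H}} = \big\| [(\mathbf{I} + \mathbf{B}) \mathbf{L}]_{n,:} \big\|^2. \tag{64} $$

In addition, as $\mathbf{B}$ is strictly lower triangular, the last nonzero element of the vector $[(\mathbf{I} + \mathbf{B}) \mathbf{L}]_{n,:}$ equals $[\mathbf{L}]_{n,n}$, i.e.,

$$ [(\mathbf{I} + \mathbf{B}) \mathbf{L}]_{n,:} = \big[ \cdots, \ [\mathbf{L}]_{n,n} \big]. \tag{65} $$

Therefore, from (64) and (65) the following relationship holds

$$ [\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})]_{n,n} = \big\| [(\mathbf{I} + \mathbf{B}) \mathbf{L}]_{n,:} \big\|^2 \ge [\mathbf{L}]_{n,n}^2. \tag{66} $$

The above inequality is achieved with equality, i.e., $[\boldsymbol{\Phi}_{\mathrm{MSE}}(\mathbf{G}_A, \mathbf{F}_A, \mathbf{F}_D, \mathbf{B})]_{n,n} = [\mathbf{L}]_{n,n}^2$, when the following equality holds for all $n$

$$ (\mathbf{I} + \mathbf{B}) \mathbf{L} = \mathrm{diag}\big\{ [ [\mathbf{L}]_{1,1}, \cdots, [\mathbf{L}]_{L,L} ]^{\mathrm{T}} \big\}, \tag{67} $$

based on which the optimal $\mathbf{B}$ equals

$$ \mathbf{B}_{\mathrm{opt}} = \mathrm{diag}\big\{ [ [\mathbf{L}]_{1,1}, \cdots, [\mathbf{L}]_{L,L} ]^{\mathrm{T}} \big\} \mathbf{L}^{-1} - \mathbf{I}. \tag{68} $$

APPENDIX C
FUNDAMENTAL MATRIX INEQUALITIES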
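The trace inequality stated in this appendix bounds $\mathrm{Tr}(\mathbf{X}\mathbf{Y})$ between the oppositely ordered and the identically ordered sums of eigenvalue products. A quick numerical sanity check with NumPy is sketched below; the matrix size and the random positive semi-definite test matrices are arbitrary choices made for illustration.

```python
import numpy as np

def random_psd(n, rng):
    """Random complex positive semi-definite matrix B B^H."""
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return B @ B.conj().T

rng = np.random.default_rng(0)
X, Y = random_psd(5, rng), random_psd(5, rng)

lx = np.sort(np.linalg.eigvalsh(X))[::-1]   # eigenvalues of X, decreasing
ly = np.sort(np.linalg.eigvalsh(Y))[::-1]   # eigenvalues of Y, decreasing

lower = float(lx @ ly[::-1])                # oppositely ordered pairing
upper = float(lx @ ly)                      # identically ordered pairing
trace_xy = float(np.trace(X @ Y).real)
```

With these quantities, `lower <= trace_xy <= upper` holds up to rounding, and the two bounds coincide only in degenerate cases such as when one matrix is a scaled identity.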

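In this appendix, two fundamental matrix inequalities are given. For two positive semi-definite matrices $\mathbf{X}$ and $\mathbf{Y}$, the following EVDs are defined

$$ \mathbf{X} = \mathbf{U}_X \boldsymbol{\Lambda}_X \mathbf{U}_X^{\mathrm{H}} \ \text{with} \ \boldsymbol{\Lambda}_X \searrow, \quad \mathbf{Y} = \mathbf{U}_Y \boldsymbol{\Lambda}_Y \mathbf{U}_Y^{\mathrm{H}} \ \text{with} \ \boldsymbol{\Lambda}_Y \searrow, \quad \mathbf{Y} = \bar{\mathbf{U}}_Y \bar{\boldsymbol{\Lambda}}_Y \bar{\mathbf{U}}_Y^{\mathrm{H}} \ \text{with} \ \bar{\boldsymbol{\Lambda}}_Y \nearrow. \tag{69} $$

For the trace of the product of the two matrices, we have the following fundamental matrix inequalities [40]

$$ \sum_{i=1}^{N} \lambda_{N-i+1}(\mathbf{X}) \lambda_i(\mathbf{Y}) \le \mathrm{Tr}(\mathbf{X}\mathbf{Y}) \le \sum_{i=1}^{N} \lambda_i(\mathbf{X}) \lambda_i(\mathbf{Y}), \tag{70} $$

where $\lambda_i(\mathbf{X})$ is the $i$th ordered eigenvalue of $\mathbf{X}$. The left equality holds when $\mathbf{U}_X = \bar{\mathbf{U}}_Y$, and the right equality holds when $\mathbf{U}_X = \mathbf{U}_Y$.

APPENDIX D
OPTIMAL STRUCTURE OF ANALOG TRANSCEIVER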
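This appendix starts from the fact that the nonzero singular values of $\boldsymbol{\Pi}_R = \mathbf{F}_A (\mathbf{F}_A^{\mathrm{H}} \mathbf{F}_A)^{-1/2}$ all equal one, whatever the singular values of $\mathbf{F}_A$ are. The following NumPy sketch checks this for a randomly generated unit-modulus matrix; the matrix dimensions and the EVD-based inverse square root are illustrative assumptions.

```python
import numpy as np

def inv_sqrt_hermitian(A):
    """Inverse square root of a Hermitian positive-definite matrix via its EVD."""
    w, U = np.linalg.eigh(A)
    return (U / np.sqrt(w)) @ U.conj().T     # U diag(w^{-1/2}) U^H

rng = np.random.default_rng(0)
N, L = 32, 4
F_A = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(N, L)))   # random unit-modulus matrix
Pi_R = F_A @ inv_sqrt_hermitian(F_A.conj().T @ F_A)

sv_F = np.linalg.svd(F_A, compute_uv=False)    # singular values of F_A, generally unequal
sv_Pi = np.linalg.svd(Pi_R, compute_uv=False)  # singular values of Pi_R, all ones up to rounding
```

Since $\boldsymbol{\Pi}_R$ has orthonormal columns spanning the same column space as $\mathbf{F}_A$, only that column space can influence the objective in (42).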

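It is worth noting that the nonzero singular values of the matrix $\boldsymbol{\Pi}_R = \mathbf{F}_A (\mathbf{F}_A^{\mathrm{H}} \mathbf{F}_A)^{-1/2}$ are all ones. Similarly, the nonzero singular values of $\boldsymbol{\Pi}_L = (\mathbf{G}_A \mathbf{R}_n \mathbf{G}_A^{\mathrm{H}})^{-1/2} \mathbf{G}_A \mathbf{R}_n^{1/2}$ are all ones. This implies that the singular values of $\mathbf{F}_A$ and $\mathbf{G}_A \mathbf{R}_n^{1/2}$ do not affect the optimization problem.

Based on the SVDs $\mathbf{R}_n^{-1/2} \mathbf{H} = \mathbf{U}_H \boldsymbol{\Lambda}_H \mathbf{V}_H^{\mathrm{H}}$, $\mathbf{R}_n^{1/2} \mathbf{G}_A^{\mathrm{H}} = \mathbf{U}_{RG} \boldsymbol{\Lambda}_{RG} \mathbf{V}_{RG}^{\mathrm{H}}$, and $\mathbf{F}_A = \mathbf{U}_{F_A} \boldsymbol{\Lambda}_{F_A} \mathbf{V}_{F_A}^{\mathrm{H}}$ with the singular values in decreasing order, the objective function in (42) becomes

$$ \tilde{\mathbf{F}}_D^{\mathrm{H}} \mathbf{V}_{F_A} \boldsymbol{\Lambda}_R^{\mathrm{T}} \mathbf{U}_{F_A}^{\mathrm{H}} \mathbf{H}^{\mathrm{H}} \mathbf{U}_{RG} \boldsymbol{\Lambda}_L^{\mathrm{T}} \boldsymbol{\Lambda}_L \mathbf{U}_{RG}^{\mathrm{H}} \mathbf{H} \mathbf{U}_{F_A} \boldsymbol{\Lambda}_R \mathbf{V}_{F_A}^{\mathrm{H}} \tilde{\mathbf{F}}_D, \tag{71} $$

where the diagonal elements of the diagonal matrices $\boldsymbol{\Lambda}_R$ and $\boldsymbol{\Lambda}_L$ satisfy

$$ [\boldsymbol{\Lambda}_R]_{i,i} = 1, \ i \le L, \quad [\boldsymbol{\Lambda}_R]_{i,i} = 0, \ i > L, \quad [\boldsymbol{\Lambda}_L]_{i,i} = 1, \ i \le L, \quad [\boldsymbol{\Lambda}_L]_{i,i} = 0, \ i > L. \tag{72} $$

Therefore, the singular values of $\mathbf{F}_A$ and $\mathbf{G}_A \mathbf{R}_n^{1/2}$ do not affect the optimal solution. Moreover, the unitary matrices $\mathbf{V}_{F_A}$ and $\mathbf{V}_{RG}$ do not affect the optimal solution, as the constraint on $\tilde{\mathbf{F}}_D$ is unitarily invariant.

Based on the above discussion and (71), the remaining task is to maximize the singular values of the matrix $[\mathbf{U}_{RG}^{\mathrm{H}} \mathbf{H} \mathbf{U}_{F_A}]_{1:L,1:L}$. Since $\mathbf{U}_{RG}$ and $\mathbf{U}_{F_A}$ are unitary matrices, for the optimal solution the left singular vectors associated with the $L$ largest singular values of $\mathbf{F}_A$ should have the maximum inner product with $[\mathbf{V}_H]_{:,1:L}$, i.e.,

$$ [\mathbf{U}_{F_A}]^{\mathrm{opt}}_{:,1:L} = \arg\max \big\{ \big\| [\mathbf{V}_H]_{:,1:L} [\mathbf{U}_{F_A}]^{\mathrm{H}}_{:,1:L} \big\| \big\}. \tag{73} $$

Similarly, for the optimal solution the left singular vectors associated with the $L$ largest singular values of $\mathbf{R}_n^{1/2} \mathbf{G}_A^{\mathrm{H}}$ should have the maximum inner product with $[\mathbf{U}_H]_{:,1:L}$, i.e.,

$$ [\mathbf{U}_{RG}]^{\mathrm{opt}}_{:,1:L} = \arg\max \big\{ \big\| [\mathbf{U}_H]_{:,1:L} [\mathbf{U}_{RG}]^{\mathrm{H}}_{:,1:L} \big\| \big\}. \tag{74} $$

APPENDIX E
ANALOG TRANSCEIVER DESIGN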

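For fixed $\boldsymbol{\Lambda}_G$ and $\mathbf{Q}_G$, the optimization problem (56) can be transformed into the following vector-variable optimization problem

$$ \min_{\mathbf{r}} \ \mathbf{r}^{\mathrm{T}} \mathbf{W} \mathbf{r} - \mathbf{p}^{\mathrm{T}} \mathbf{r} - \mathbf{r}^{\mathrm{T}} \mathbf{p} + q \quad \text{s.t.} \ \ \mathbf{r}^{\mathrm{T}} \mathbf{K}_i \mathbf{r} = a, \ \ i = 1, 2, \ldots, NL. \tag{75} $$

The vector $\mathbf{r}$ is constructed by vectorizing $\mathbf{G}_A$, i.e., $\mathbf{r} = \big[ \Re\{\mathrm{vec}(\mathbf{G}_A)\}^{\mathrm{T}}, \Im\{\mathrm{vec}(\mathbf{G}_A)\}^{\mathrm{T}} \big]^{\mathrm{T}}$, and the matrices $\mathbf{W}$, $\mathbf{K}_i$ and the vector $\mathbf{p}$ are defined as follows:

$$ \mathbf{W} = \begin{bmatrix} \Re\{\mathbf{I} \otimes \mathbf{R}_n\} & -\Im\{\mathbf{I} \otimes \mathbf{R}_n\} \\ \Im\{\mathbf{I} \otimes \mathbf{R}_n\} & \Re\{\mathbf{I} \otimes \mathbf{R}_n\} \end{bmatrix}, \quad \mathbf{K}_i = \mathrm{diag}\big\{ \big[ \mathbf{0}^{\mathrm{T}}_{(i-1) \times 1}, 1, \mathbf{0}^{\mathrm{T}}_{(NL-1) \times 1}, 1, \mathbf{0}^{\mathrm{T}}_{(NL-i) \times 1} \big] \big\}, $$

$$ \mathbf{p} = \begin{bmatrix} \Re\big\{ (\mathbf{I} \otimes \mathbf{R}_n^{1/2})^{\mathrm{H}} \mathrm{vec}\big( [\mathbf{U}_H]_{:,1:L} \boldsymbol{\Lambda}_G \mathbf{Q}_G \big) \big\} \\ \Im\big\{ (\mathbf{I} \otimes \mathbf{R}_n^{1/2})^{\mathrm{H}} \mathrm{vec}\big( [\mathbf{U}_H]_{:,1:L} \boldsymbol{\Lambda}_G \mathbf{Q}_G \big) \big\} \end{bmatrix}. \tag{76} $$

The constant scalar $q$ in (75) equals $q = \| \mathrm{vec}( [\mathbf{U}_H]_{:,1:L} \boldsymbol{\Lambda}_G \mathbf{Q}_G ) \|^2$. Note that because of the constant-modulus constraints, the term $\mathbf{r}^{\mathrm{T}} \mathbf{r}$ is a constant. As a result, for a constant real scalar $\eta$, the objective function in (75) is equivalent to $\mathbf{r}^{\mathrm{T}} (\mathbf{W} + \eta \mathbf{I}) \mathbf{r} - \mathbf{p}^{\mathrm{T}} \mathbf{r} - \mathbf{r}^{\mathrm{T}} \mathbf{p} + q$. As the constant-modulus constraints in (75) are all quadratic equalities, the optimization problem (75) is nonconvex. Following the idea of [42], an iterative algorithm is proposed that iteratively updates the constraints to guarantee the constant-modulus constraints. Specifically, at the $n$th iteration each constraint $\mathbf{r}^{\mathrm{T}} \mathbf{K}_i \mathbf{r} = a$ is replaced by $\tilde{\mathbf{r}}_{(n-1)}^{\mathrm{T}} \mathbf{K}_i \mathbf{r}_{(n)} = a$, where $\tilde{\mathbf{r}}_{(n-1)}$ is a vector computed from the $\mathbf{r}$ obtained in the $(n-1)$th iteration. After stacking $\tilde{\mathbf{r}}_{(n-1)}^{\mathrm{T}} \mathbf{K}_i$ for $i = 1, 2, \ldots, NL$ in $\mathbf{P}_{(n-1)}$, optimization problem (75) is transformed to

$$ \min_{\mathbf{r}_{(n)}} \ \mathbf{r}_{(n)}^{\mathrm{T}} (\mathbf{W} + \eta \mathbf{I}) \mathbf{r}_{(n)} - \mathbf{p}^{\mathrm{T}} \mathbf{r}_{(n)} - \mathbf{r}_{(n)}^{\mathrm{T}} \mathbf{p} + q \quad \text{s.t.} \ \ \mathbf{P}_{(n-1)} \mathbf{r}_{(n)} = a \mathbf{1}, \tag{77} $$

where the matrix $\mathbf{P}_{(n-1)}$ is defined as

$$ [\mathbf{P}_{(n-1)}]_{\ell,j} = \begin{cases} \cos\big( \angle [\mathrm{vec}(\mathbf{G}_{A,(n-1)})]_{\ell} \big), & \text{if } \ell = j, \ \ell \le NL \\ \sin\big( \angle [\mathrm{vec}(\mathbf{G}_{A,(n-1)})]_{\ell} \big), & \text{if } j = \ell + NL, \ \ell \le NL \\ 0, & \text{otherwise.} \end{cases} \tag{78} $$

The vector $\mathbf{1}$ is a column vector with all elements equal to 1. As proved in [42], when $\eta \ge \sigma_{\max} NL / \|\mathbf{p}\|$, where $\sigma_{\max}$ is the largest eigenvalue of $\mathbf{R}_n$, the optimal solution of the iterative optimization (77) minimizes the objective function and satisfies the constant-modulus constraints asymptotically. As (77) is convex at each iteration, based on its KKT conditions, the optimal solution of (77) at the $n$th iteration is

$$ \mathbf{r}_{(n)} = (\mathbf{W} + \eta \mathbf{I})^{-1} \big( \mathbf{p} + \mathbf{P}_{(n-1)}^{\mathrm{T}} \boldsymbol{\lambda} \big) \tag{79} $$

with

$$ \boldsymbol{\lambda} = \big( \mathbf{P}_{(n-1)} (\mathbf{W} + \eta \mathbf{I})^{-1} \mathbf{P}_{(n-1)}^{\mathrm{T}} \big)^{-1} \big( a \mathbf{1} - \mathbf{P}_{(n-1)} (\mathbf{W} + \eta \mathbf{I})^{-1} \mathbf{p} \big). \tag{80} $$

In a nutshell, the iterative algorithm is given in Algorithm 2. Using the iterative algorithm, the analog processor can be computed numerically.

REFERENCES

[1] Z. Pi and F. Khan, “An introduction to millimeter-wave mobile broadband systems,”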

IEEE Commun. Mag., vol. 49, no. 6, Jun. 2011.
[2] A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen, L. Li, and K. Haneda, “Hybrid beamforming for massive MIMO: A survey,” IEEE Commun. Mag., vol. 55, no. 9, pp. 134–141, Sep. 2017.
[3] A. Gorokhov, D. A. Gore, and A. J. Paulraj, “Receive antenna selection for MIMO spatial multiplexing: theory and algorithms,” IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2796–2807, Dec. 2003.
[4] A. F. Molisch, M. Z. Win, Y.-S. Choi, and J. H. Winters, “Capacity of MIMO systems with antenna selection,” IEEE Trans. Wireless Commun., vol. 4, no. 4, pp. 1759–1772, Jul. 2005.
[5] X. Zhang, A. F. Molisch, and S.-Y. Kung, “Variable-phase-shift-based RF-baseband codesign for MIMO antenna selection,” IEEE Trans. Signal Process., vol. 53, no. 11, pp. 4091–4103, Nov. 2005.
[6] P. Sudarshan, N. B. Mehta, A. F. Molisch, and J. Zhang, “Channel statistics-based RF pre-processing with antenna selection,” IEEE Trans. Wireless Commun., vol. 5, no. 12, Dec. 2006.
[7] Z. Xu, S. Sfar, and R. S. Blum, “Analysis of MIMO systems with receive antenna selection in spatially correlated rayleigh fading channels,” IEEE Trans. Veh. Technol., vol. 58, no. 1, pp. 251–262, Jan. 2009.
[8] Y.-P. Lin, “On the quantization of phase shifters for hybrid precoding systems,” IEEE Trans. Signal Process., vol. 65, no. 9, pp. 2237–2246, May 2017.
[9] C. Lin, G. Y. Li, and L. Wang, “Subarray-based coordinated beamforming training for mmwave and sub-thz communications,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 2115–2126, Aug. 2008.
[10] S. Park, A. Alkhateeb, and R. W. Heath, “Dynamic subarrays for hybrid precoding in wideband mmwave MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 5, pp. 2907–2920, May 2017.
[11] C. Lin and G. Y. Li, “Coordinated beamforming training for mmwave and sub-thz communications with antenna subarrays,” in Proc. IEEE WCNC, Jun. 2017, pp. 1–6.
[12] N. Song, T. Yang, and H. Sun, “Overlapped subarray based hybrid beamforming for millimeter wave multiuser massive MIMO,” IEEE Signal Process. Lett., vol. 24, no. 5, pp. 550–554, May 2017.
[13] D. Zhang, Y. Wang, X. Li, and W. Xiang, “Hybridly connected structure for hybrid beamforming in mmwave massive MIMO systems,” IEEE Trans. Commun., vol. 66, no. 2, pp. 662–674, Feb. 2018.
[14] S.-H. Wu, L.-K. Chiu, and J.-W. Wang, “Reconfigurable hybrid beamforming for dual-polarized mmwave MIMO channels: Stochastic channel modeling and architectural adaptation methods,” IEEE Trans. Commun., vol. 66, no. 2, pp. 741–755, Feb. 2018.
[15] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, Mar. 2014.
[16] M. Kim and Y. H. Lee, “MSE-based hybrid RF/baseband processing for millimeter-wave communication systems in MIMO interference channels,” IEEE Trans. Veh. Technol., vol. 64, no. 6, pp. 2714–2720, Jun. 2015.
[17] J. Lee and Y. H. Lee, “AF relaying for millimeter wave communication systems with hybrid RF/baseband MIMO processing,” in Proc. IEEE Int. Conf. Commun., Jun. 2014, pp. 5838–5842.
[18] W. Ni, X. Dong, and W.-S. Lu, “Near-optimal hybrid processing for massive MIMO systems via matrix decomposition,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 3922–3933, Aug. 2017.
[19] J. Tranter, N. D. Sidiropoulos, X. Fu, and A. Swami, “Fast unit-modulus least squares with applications in beamforming,” IEEE Trans. Signal Process., vol. 65, no. 11, pp. 2875–2887, Jun. 2017.
[20] R. Rajashekar and L. Hanzo, “Hybrid beamforming in mm-wave MIMO systems having a finite input alphabet,” IEEE Trans. Commun., vol. 64, no. 8, pp. 3337–3349, Aug. 2016.
[21] X. Gao, L. Dai, S. Han, I. Chih-Lin, and R. W. Heath, “Energy-efficient hybrid analog and digital precoding for mmwave MIMO systems with large antenna arrays,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 998–1009, Apr. 2016.
[22] Y. R. Ramadan, H. Minn, and A. S. Ibrahim, “Hybrid analog-digital precoding design for secrecy mmwave MISO-OFDM systems,” IEEE Trans. Commun., vol. 65, no. 11, pp. 5009–5026, Nov. 2017.
[23] R. Mai, T. Le-Ngoc, and D. H. N. Nguyen, “Joint hybrid Tx-Rx design for wireless backhaul with delay-outage constraint in massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 10, pp. 6736–6750, Oct. 2017.
[24] S. S. Ioushua and Y. C. Eldar, “Hybrid analog-digital beamforming for massive MIMO systems,” arXiv preprint arXiv:1712.03485, 2017.
[25] S. He, J. Wang, Y. Huang, B. Ottersten, and W. Hong, “Codebook-based hybrid precoding for millimeter wave multiuser systems,” IEEE Trans. Signal Process., vol. 65, no. 20, pp. 5289–5304, Oct. 2017.
[26] A. Liu and V. K. Lau, “Impact of CSI knowledge on the codebook-based hybrid beamforming in massive MIMO,” IEEE Trans. Signal Process., vol. 64, no. 24, pp. 6545–6556, Dec. 2016.
[27] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for large-scale antenna arrays,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 501–513, Apr. 2016.
[28] F. Sohrabi and W. Yu, “Hybrid analog and digital beamforming for mmwave OFDM large-scale antenna arrays,” IEEE J. Sel. Areas Commun., vol. 35, no. 7, pp. 1432–1443, Jul. 2017.
[29] H. Lin, F. Gao, S. Jin, and G. Y. Li, “A new view of multi-user hybrid massive MIMO: Non-orthogonal angle division multiple access,” IEEE J. Sel. Areas Commun., vol. 35, no. 10, pp. 2268–2280, Jul. 2017.
[30] J. Zhao, F. Gao, W. Jia, S. Zhang, S. Jin, and H. Lin, “Angle domain hybrid precoding and channel tracking for millimeter wave massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 10, pp. 6868–6880, Oct. 2017.
[31] G. Zhu, K. Huang, V. K. Lau, B. Xia, X. Li, and S. Zhang, “Hybrid beamforming via the kronecker decomposition for the millimeter-wave massive MIMO systems,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 2097–2114, Sep. 2017.
[32] M. M. Molu, P. Xiao, M. Khalily, K. Cumanan, L. Zhang, and R. Tafazolli, “Low-complexity and robust hybrid beamforming design for multi-antenna communication systems,” IEEE Trans. Wireless Commun., vol. 17, no. 3, pp. 1445–1459, Mar. 2018.
[33] S. Payami, M. Ghoraishi, and M. Dianati, “Hybrid beamforming for large antenna arrays with phase shifter selection,” IEEE Trans. Wireless Commun., vol. 15, no. 11, pp. 7258–7271, Nov. 2016.
[34] D. Zhu, B. Li, and P. Liang, “A novel hybrid beamforming algorithm with unified analog beamforming by subspace construction based on partial CSI for massive MIMO-OFDM systems,” IEEE Trans. Commun., vol. 65, no. 2, pp. 594–607, Feb. 2017.
[35] J.-C. Chen, “Hybrid beamforming with discrete phase shifters for millimeter-wave massive MIMO systems,” IEEE Trans. Veh. Technol., vol. 66, no. 8, pp. 7604–7608, Aug. 2017.
[36] C. Xing, M. Xia, F. Gao, and Y.-C. Wu, “Robust transceiver with Tomlinson-Harashima precoding for amplify-and-forward MIMO relaying systems,” IEEE J. Sel. Areas Commun., vol. 30, no. 8, pp. 1370–1382, Sep. 2012.
[37] A. A. D’Amico, “Tomlinson-Harashima precoding in MIMO systems: A unified approach to transceiver optimization based on multiplicative schur-convexity,” IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3662–3677, Aug. 2008.
[38] C. Xing, F. Gao, and Y. Zhou, “A framework for transceiver designs for multi-hop communications with covariance shaping constraints,” IEEE Trans. Signal Process., vol. 63, no. 15, pp. 3930–3945, Aug. 2015.
[39] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint Tx-Rx beamforming design for multicarrier MIMO channels: A unified framework for convex optimization,” IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2381–2401, Sep. 2003.
[40] C. Xing, S. Ma, and Y. Zhou, “Matrix-monotonic optimization for MIMO systems,” IEEE Trans. Signal Process., vol. 63, no. 2, pp. 334–348, Jan. 2015.
[41] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications. Springer-Verlag New York, 2011.
[42] O. Aldayel, V. Monga, and M. Rangaswamy, “Tractable transmit MIMO beampattern design under a constant modulus constraint,”