A General Robust Linear Transceiver Design for Multi-Hop Amplify-and-Forward MIMO Relaying Systems

Abstract

In this paper, linear transceiver design for multi-hop amplify-and-forward (AF) multiple-input multiple-output (MIMO) relaying systems with Gaussian distributed channel estimation errors is investigated. Commonly used transceiver design criteria, including weighted mean-square-error (MSE) minimization, capacity maximization, worst-MSE/MAX-MSE minimization and weighted sum-rate maximization, are considered and unified into a single matrix-variate optimization problem. A general robust design algorithm is proposed to solve the unified problem. Specifically, by exploiting majorization theory and properties of matrix-variate functions, the optimal structure of the robust transceiver is derived when either the covariance matrix of channel estimation errors seen from the transmitter side or the corresponding covariance matrix seen from the receiver side is proportional to an identity matrix. Based on the optimal structure, the original transceiver design problems are reduced to much simpler problems with only scalar variables, whose solutions are readily obtained by an iterative water-filling algorithm. A number of existing transceiver design algorithms are found to be special cases of the proposed solution. The differences between our work and the existing related work are also discussed in detail. The performance advantages of the proposed robust designs are demonstrated by simulation results.


arXiv [cs.IT], Feb.

Chengwen Xing, Zesong Fei, Shaodan Ma, Yik-Chung Wu and H. Vincent Poor

Chengwen Xing and Zesong Fei are with the School of Information and Electronics, Beijing Institute of Technology, Beijing, China. Phone: (86) 1068911841, Fax: (86) 1068912615, Email: [email protected], [email protected].

Shaodan Ma is with the Department of Electrical and Computer Engineering, University of Macau, Macao. Email: [email protected].

Yik-Chung Wu is with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong. Email: [email protected].

H. Vincent Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA. Email: [email protected].

The material in this paper was partially presented at the International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, Nov. 2011.

∗ The corresponding author is Shaodan Ma.

November 8, 2018 DRAFT

Index Terms

Amplify-and-forward (AF), MIMO relaying, matrix-variate optimization, robust transceiver design.

I. INTRODUCTION

With significant potential to meet the emerging requirements for high-speed ubiquitous wireless communications, cooperative communications has been adopted as one of the key components in future wireless communication standards such as long term evolution (LTE), international mobile telecommunications-advanced (IMT-Advanced), the Winner project, etc. Specifically, these developments involve the deployment of relays to enhance the coverage of base stations and to improve the communication quality of wireless links [1]. In general, relays can adopt different relaying strategies, e.g., amplify-and-forward (AF), decode-and-forward (DF) and compress-and-forward (CF). Among these relaying strategies, the AF scheme is the most attractive for practical implementation due to its low complexity and independence of the underlying modulation. On the other hand, it is well established that employing multiple antennas provides spatial diversity and multiplexing gain in a wireless communication system. It is straightforward to combine AF transmission with multi-input multi-output (MIMO) systems so that the virtues of both techniques can be obtained. The resulting system (termed an AF MIMO relaying system) has attracted considerable interest [2] in recent years.

Transceiver design for AF MIMO relaying systems, which refers to the design of the source precoder, relay amplifier and receiver equalizer, has been widely discussed in the literature [3]–[16]. Generally speaking, transceiver design varies from system to system and depends heavily on the design criteria and objectives. The most commonly used criteria are capacity maximization [3], [4], [8] and data mean-square-error (MSE) minimization [5]–[8]. Usually these two criteria conflict with each other and call for different algorithms to solve the optimization problems. Interestingly, in [8] a unified framework applicable to both capacity maximization and MSE minimization is proposed for transceiver design in dual-hop AF MIMO relay systems. Since multi-hop AF transmission is a promising technique to increase the coverage of a transmitter, transceiver design for a multi-hop system is further investigated in [10]. That work reveals that optimal solutions for both capacity and MSE criteria in a multi-hop system should have diagonal structures. However, in most of the previous works on transceiver design, including [8] and [10], channel state information (CSI) is assumed to be perfectly known/estimated. This is difficult to achieve in practice, and channel estimation errors are inevitable due to limited training and quantization, resulting in significant performance degradation. To mitigate this degradation, channel estimation errors must be taken into account in the transceiver design process. Such a transceiver is called a robust transceiver. It has been shown in [17] and [18] that robust transceiver design is essentially different from transceiver design with perfect CSI. It is more challenging, and different algorithms are required to solve the robust design problem.

In general, channel estimation errors can be modeled in two different ways: norm-bounded errors with a known error bound, and random errors with a certain distribution. Correspondingly, robust transceiver designs can be classified into two main categories: worst-case robust design for norm-bounded errors [19] and Bayesian robust design for randomly distributed errors [20]. For linear channel estimators, the estimation errors can be accurately modeled as being random with a Gaussian distribution [11]. Under this kind of Gaussian estimation error, Bayesian robust transceiver design for dual-hop AF relaying systems has been investigated in [14], and solutions for capacity maximization and MSE minimization, respectively, are proposed by implicitly approximating a design-variable dependent covariance matrix (the matrix A in [14]) as being constant. Since the approximation is tight only when the covariance matrix of channel estimation errors seen from the receiver side is proportional to an identity matrix, the proposed solutions are sub-optimal for general cases. In [11] and [12], Bayesian robust transceiver design targeting weighted MSE minimization is discussed for dual-hop AF relaying systems, and an optimal solution is found without considering the source precoder. The optimality of the proposed solution is proved to hold under a wide range of cases, i.e., when either the covariance matrix of channel estimation errors seen from the transmitter side or the corresponding covariance matrix seen from the receiver side is proportional to an identity matrix. These works have been extended to systems with source precoder design, and an iterative algorithm has been proposed to find a good solution without guaranteed optimality [15]. Similarly, robust transceiver design for maximizing the mutual information rate for dual-hop AF relaying systems under Gaussian channel estimation errors at all nodes has been investigated in [16], and a solution without global optimality is proposed with an iterative algorithm. Unfortunately, the aforementioned algorithms are applicable only to dual-hop AF systems, and their extension to multi-hop AF systems is by no means straightforward, as shown in [10].

In this paper, we investigate robust transceiver design for a general multi-hop AF relaying system with Gaussian distributed channel uncertainties. The robust design problem is significantly different from that in the literature and is challenging due to the existence of random channel uncertainties and the complexity of the multi-hop system. A number of widely used design criteria, including weighted MSE minimization, capacity maximization, worst-case MSE minimization, and weighted sum-rate maximization, are considered and their corresponding robust design problems are unified into one matrix-variate optimization problem. A general robust design algorithm is proposed to solve the unified problem, i.e., to jointly design the precoder at the source, multiple forwarding matrices at the relays, and the equalizer at the destination. Specifically, the structure of the optimal solution for the unified problem is derived based on majorization theory [21], [22] and properties of matrix-monotone functions [22]. It is demonstrated that the derived optimal structure is significantly different from its counterpart with perfect CSI [10] and that its optimality holds under a wide range of cases, i.e., when either the covariance matrix of channel estimation errors seen from the transmitter side or the corresponding covariance matrix seen from the receiver side is proportional to an identity matrix. With the optimal structure, the robust design problem is simplified into a design problem with only scalar variables. An iterative water-filling algorithm is then proposed to obtain the remaining unknown parameters in the transceiver. The performance of the proposed robust designs is finally corroborated by simulation results. In addition, it is shown that the proposed solutions cover some existing transceiver design solutions as special cases.

The rest of the paper is organized as follows. In Section II, the signal model for a multi-hop AF system is introduced. Then a unified robust transceiver design problem applicable to weighted MSE minimization, capacity maximization, MAX-MSE minimization and weighted sum-rate maximization is formulated in Section III. In Section IV, the optimal structure for the robust transceiver is derived and the unified transceiver design problem is reduced to a problem of finding a set of diagonal matrices, which can be solved by an iterative water-filling algorithm. The performance of the proposed robust designs is demonstrated by simulation results in Section V. Finally, this paper is concluded in Section VI.

The following notation is used throughout this paper. Boldface lowercase letters denote vectors, while boldface uppercase letters denote matrices. The notation $\mathbf{Z}^{\mathrm{H}}$ denotes the Hermitian transpose of the matrix $\mathbf{Z}$, and $\mathrm{Tr}(\mathbf{Z})$ is the trace of the matrix $\mathbf{Z}$. The symbol $\mathbf{I}_M$ denotes the $M \times M$ identity matrix, while $\mathbf{0}_{M,N}$ denotes the $M \times N$ all-zero matrix. The notation $\mathbf{Z}^{1/2}$ is the Hermitian square root of the positive semidefinite matrix $\mathbf{Z}$, such that $\mathbf{Z}^{1/2}\mathbf{Z}^{1/2} = \mathbf{Z}$ and $\mathbf{Z}^{1/2}$ is also a Hermitian matrix. The symbol $\lambda_i(\mathbf{Z})$ represents the $i$th largest eigenvalue of $\mathbf{Z}$. The symbol $\otimes$ denotes the Kronecker product. For two Hermitian matrices, $\mathbf{C} \succeq \mathbf{D}$ means that $\mathbf{C} - \mathbf{D}$ is a positive semidefinite matrix. The symbol $\boldsymbol{\Lambda} \searrow$ represents a rectangular diagonal matrix with decreasing diagonal elements.

II. SYSTEM MODEL

In this paper, a multi-hop AF MIMO relaying system is considered. As shown in Fig. 1, one source with $N_1$ antennas wants to communicate with a destination with $M_K$ antennas through $K-1$ relays. The $k$th relay has $M_k$ receive antennas and $N_{k+1}$ transmit antennas. It is obvious that the dual-hop AF MIMO relaying system is a special case of this configuration when $K = 2$.

At the source, an $N \times 1$ data vector $\mathbf{s}$ with covariance matrix $\mathbf{R}_{\mathbf{s}} = \mathbb{E}\{\mathbf{s}\mathbf{s}^{\mathrm{H}}\} = \mathbf{I}_N$ is transmitted after being precoded by a precoder matrix $\mathbf{P}_1$. The received signal $\mathbf{x}_1$ at the first relay is $\mathbf{x}_1 = \mathbf{H}_1\mathbf{P}_1\mathbf{s} + \mathbf{n}_1$, where $\mathbf{H}_1$ is the MIMO channel matrix between the source and the first relay, and $\mathbf{n}_1$ is an additive Gaussian noise vector at the first relay with zero mean and covariance matrix $\mathbf{R}_{\mathbf{n}_1} = \sigma_{n_1}^2\mathbf{I}_{M_1}$.

At the first relay, the received signal $\mathbf{x}_1$ is multiplied by a forwarding matrix $\mathbf{P}_2$ and then the resulting signal is transmitted to the second relay. The received signal $\mathbf{x}_2$ at the second relay is $\mathbf{x}_2 = \mathbf{H}_2\mathbf{P}_2\mathbf{x}_1 + \mathbf{n}_2$, where $\mathbf{H}_2$ is the MIMO channel matrix between the first relay and the second relay, and $\mathbf{n}_2$ is an additive Gaussian noise vector at the second relay with zero mean and covariance matrix $\mathbf{R}_{\mathbf{n}_2} = \sigma_{n_2}^2\mathbf{I}_{M_2}$. Similarly, the received signal at the $k$th relay can be written as

$$\mathbf{x}_k = \mathbf{H}_k\mathbf{P}_k\mathbf{x}_{k-1} + \mathbf{n}_k \tag{1}$$

where $\mathbf{H}_k$ is the channel matrix for the $k$th hop, and $\mathbf{n}_k$ is an additive Gaussian noise vector with zero mean and covariance matrix $\mathbf{R}_{\mathbf{n}_k} = \sigma_{n_k}^2\mathbf{I}_{M_k}$.
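The hop-by-hop recursion (1) can be sketched numerically as follows. This is only an illustration under assumed settings: the dimensions, the random channels and forwarding matrices, the noise variances, and the helper names `crandn` and `propagate` are hypothetical and not taken from the paper. Setting all noise variances to zero shows that the recursion collapses to the cascade $\mathbf{H}_K\mathbf{P}_K\cdots\mathbf{H}_1\mathbf{P}_1\mathbf{s}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(rows, cols):
    """Circularly symmetric complex Gaussian matrix with unit-variance entries."""
    re = rng.standard_normal((rows, cols))
    im = rng.standard_normal((rows, cols))
    return (re + 1j * im) / np.sqrt(2)

def propagate(s, H, P, sigma2):
    """Apply x_k = H_k P_k x_{k-1} + n_k hop by hop, starting from x_0 = s."""
    x = s
    for Hk, Pk, s2 in zip(H, P, sigma2):
        noise = np.sqrt(s2) * crandn(Hk.shape[0], 1)
        x = Hk @ (Pk @ x) + noise
    return x

# Assumed sizes: N data streams, K hops, n antennas at every node.
N, K, n = 2, 3, 4
H = [crandn(n, n) for _ in range(K)]
P = [crandn(n, N)] + [crandn(n, n) for _ in range(K - 1)]
s = crandn(N, 1)

y_noisy = propagate(s, H, P, sigma2=[0.1] * K)
y_clean = propagate(s, H, P, sigma2=[0.0] * K)

# Noise-free cascade H_K P_K ... H_1 P_1, built in the same order as the hops.
cascade = np.eye(N, dtype=complex)
for Hk, Pk in zip(H, P):
    cascade = Hk @ Pk @ cascade
```

With noise switched off, `y_clean` coincides with `cascade @ s`, which is the signal term of the end-to-end model.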

Finally, for a $K$-hop AF MIMO relaying system, the received signal at the destination is

$$\mathbf{y} = \Big[\prod_{k=1}^{K}\mathbf{H}_k\mathbf{P}_k\Big]\mathbf{s} + \sum_{k=1}^{K-1}\Big\{\Big[\prod_{l=k+1}^{K}\mathbf{H}_l\mathbf{P}_l\Big]\mathbf{n}_k\Big\} + \mathbf{n}_K, \tag{2}$$

where $\prod_{k=1}^{K}\mathbf{Z}_k$ denotes $\mathbf{Z}_K \times \cdots \times \mathbf{Z}_1$. It is generally assumed that $N_k$ and $M_k$ are greater than or equal to $N$ in order to guarantee that the transmitted data $\mathbf{s}$ can be recovered at the destination [5].

In practical systems, because of the limited length of training sequences, channel estimation errors are inevitable. With channel estimation errors, the channel matrix can be written as

$$\mathbf{H}_k = \bar{\mathbf{H}}_k + \Delta\mathbf{H}_k, \tag{3}$$

where $\bar{\mathbf{H}}_k$ is the estimated channel matrix in the $k$th hop and $\Delta\mathbf{H}_k$ is the corresponding channel estimation error matrix, whose elements are zero-mean Gaussian random variables. Moreover, the $M_k \times N_k$ matrix $\Delta\mathbf{H}_k$ can be decomposed using the widely used Kronecker model as $\Delta\mathbf{H}_k = \boldsymbol{\Sigma}_k^{1/2}\mathbf{H}_{W,k}\boldsymbol{\Psi}_k^{1/2}$ [11]–[13], [17], [18], [20]. The elements of the $M_k \times N_k$ matrix $\mathbf{H}_{W,k}$ are independent and identically distributed (i.i.d.) Gaussian random variables with zero means and unit variances. The specific properties of the row correlation matrix $\boldsymbol{\Sigma}_k$ and the column correlation matrix $\boldsymbol{\Psi}_k$ are determined by the training sequences and channel estimators being used [11], [17]. Note that $\boldsymbol{\Sigma}_k$ and $\boldsymbol{\Psi}_k$ correspond to the covariance matrices of the channel estimation errors seen from the transmitter and receiver sides, respectively.

At the destination, a linear equalizer $\mathbf{G}$ is employed to detect the desired data vector $\mathbf{s}$. The resulting data MSE matrix equals $\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}) = \mathbb{E}\{(\mathbf{G}\mathbf{y}-\mathbf{s})(\mathbf{G}\mathbf{y}-\mathbf{s})^{\mathrm{H}}\}$, where the expectation is taken with respect to random data, channel estimation errors, and noise.¹ Following a similar derivation in dual-hop systems [12], the MSE matrix is derived to be

$$\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}) = \mathbf{G}\big[\bar{\mathbf{H}}_K\mathbf{P}_K\mathbf{R}_{\mathbf{x}_{K-1}}\mathbf{P}_K^{\mathrm{H}}\bar{\mathbf{H}}_K^{\mathrm{H}} + \mathrm{Tr}(\mathbf{P}_K\mathbf{R}_{\mathbf{x}_{K-1}}\mathbf{P}_K^{\mathrm{H}}\boldsymbol{\Psi}_K)\boldsymbol{\Sigma}_K + \mathbf{R}_{\mathbf{n}_K}\big]\mathbf{G}^{\mathrm{H}} + \mathbf{I}_N - \Big[\prod_{k=1}^{K}\bar{\mathbf{H}}_k\mathbf{P}_k\Big]^{\mathrm{H}}\mathbf{G}^{\mathrm{H}} - \mathbf{G}\Big[\prod_{k=1}^{K}\bar{\mathbf{H}}_k\mathbf{P}_k\Big], \tag{4}$$

where the received signal covariance matrix $\mathbf{R}_{\mathbf{x}_k}$ at the $k$th relay satisfies the following recursive formula:

$$\mathbf{R}_{\mathbf{x}_k} = \bar{\mathbf{H}}_k\mathbf{P}_k\mathbf{R}_{\mathbf{x}_{k-1}}\mathbf{P}_k^{\mathrm{H}}\bar{\mathbf{H}}_k^{\mathrm{H}} + \mathrm{Tr}(\mathbf{P}_k\mathbf{R}_{\mathbf{x}_{k-1}}\mathbf{P}_k^{\mathrm{H}}\boldsymbol{\Psi}_k)\boldsymbol{\Sigma}_k + \mathbf{R}_{\mathbf{n}_k}, \tag{5}$$

and $\mathbf{R}_{\mathbf{x}_0} = \mathbf{R}_{\mathbf{s}} = \mathbf{I}_N$ represents the signal covariance matrix at the source.

¹ Here the channel estimation errors are assumed unknown at all the nodes. The data MSE matrix at the receiver should thus be computed by taking the expectation with respect to all the unknown random variables, including data, noise and channel estimation errors.

III. TRANSCEIVER DESIGN PROBLEMS

A. Objective Functions

There are various performance metrics for transceiver design. In the following, four widely used metrics are discussed.

(1) Weighted MSE: With the data MSE defined in (4), the weighted MSE can be directly written as

$$\text{Obj 1:}\quad \mathrm{Tr}[\mathbf{W}\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})] \tag{6}$$

where the weighting matrix $\mathbf{W}$ is a positive semidefinite matrix [23]. Here $\mathbf{W}$ is not restricted to be a diagonal matrix. Given any two matrices $\mathbf{N}$ and $\mathbf{M}$ satisfying $\mathbf{N} \succeq \mathbf{M} \succeq \mathbf{0}$, we have $\mathrm{Tr}[\mathbf{W}\mathbf{N}] \ge \mathrm{Tr}[\mathbf{W}\mathbf{M}]$. The weighted MSE is thus a monotonically matrix-increasing function of $\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})$ [22]. Clearly, transceiver design with weighted MSE minimization aims at minimizing the distortion between the recovered and the transmitted signal [5], [24], [25].

(2) Capacity: Capacity maximization is another important and widely used performance metric for transceiver design. Denoting the received pilot for channel estimation as $\mathbf{r}$, the channel capacity between the source and destination is $I(\mathbf{s};\mathbf{y}|\mathbf{r})$ [26]. To the best of our knowledge, the exact capacity of MIMO channels with channel estimation errors is still open even for point-to-point MIMO systems [18], [26]. However, a lower bound of the capacity can be found as

$$-\log|\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})| \le I(\mathbf{s};\mathbf{y}|\mathbf{r}). \tag{7}$$

The equality in (7) holds when perfect CSI is known [4], [24]. For imperfect CSI, the tightness of this bound is extensively investigated in [26], [27]. This lower bound $-\log|\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})|$ can be interpreted as the sum-rate of multiple transmitted data streams when the linear equalizer $\mathbf{G}$ is employed. It has been widely used to replace the unknown exact capacity as a performance metric. Based on this lower bound, the robust transceiver design maximizing capacity can be replaced by minimizing the following objective function [18], [27]:

$$\text{Obj 2:}\quad \log|\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})|. \tag{8}$$

(3) Worst MSE: Notice that the capacity maximization criterion (Obj 2) does not impose any fairness on the simultaneously transmitted multiple data streams, while the weighted MSE minimization criterion (Obj 1) imposes only a limited degree of fairness on the data, as it involves only a linear operation on the MSE. When fairness is required to balance the performance across different data streams, worst MSE minimization is a good alternative for such transceiver design. In general, the worst MSE can be represented as [24]

$$\text{Obj 3:}\quad \psi_1[\mathrm{d}(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}))] \tag{9}$$

where $\psi_1(\cdot)$ is an increasing Schur-convex function and $\mathrm{d}(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}))$ denotes a vector consisting of the diagonal elements of $\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})$, i.e.,

$$\mathrm{d}(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})) = \big[[\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})]_{1,1}\ \cdots\ [\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})]_{N,N}\big]^{\mathrm{T}}, \tag{10}$$

with the symbol $[\mathbf{Z}]_{i,j}$ representing the $(i,j)$th entry of $\mathbf{Z}$. It follows that $\psi_1[\mathrm{d}(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}))]$ is also a monotonically matrix-increasing function with respect to $\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})$. Note that the objective function in (9) is applicable to other design criteria involving fairness considerations.

(4) Weighted sum rate: When preference is to be given to certain data streams (e.g., loading more resources to the data streams with better channel state information so that the weighted sum rate is maximized), the objective function can be written as [24]

$$\text{Obj 4:}\quad \psi_2[\mathrm{d}(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}))] \tag{11}$$

where $\psi_2(\cdot)$ is an increasing Schur-concave function. Similarly to (9), this function $\psi_2[\mathrm{d}(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}))]$ is a monotonically matrix-increasing function with respect to $\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})$.

Remark 1: Some objective functions based on the signal-to-interference-plus-noise ratio (SINR) and bit error rate (BER) can also be formulated as (9) or (11) and thus can be incorporated into our framework. For example, when the objective is to maximize a sum of weighted SINRs, the objective function can be formulated in the form of (11) as an increasing Schur-concave function of the diagonal elements of the MSE matrix. Similarly, when the objective is to maximize the harmonic mean of SINRs or to maximize the minimal SINR, the objective function can be formulated in the form of (9) as an increasing Schur-convex function of the diagonal elements of the MSE matrix. On the other hand, when BER minimization is concerned and all the data streams are modulated using the same scheme, the average BER can be approximated as an increasing Schur-convex function of the diagonal elements of the MSE matrix [21] and can be incorporated into the category of Objective 3 in (9).
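The four objectives can be evaluated on a given MSE matrix as in the sketch below. The particular choices of $\psi$ are assumptions made only for illustration: the maximum of the diagonal is used as an increasing Schur-convex function for Obj 3, and the sum of logarithms of the diagonal as an increasing Schur-concave function for Obj 4. The matrices `Phi_small`, `Phi_big` and `W` are random stand-ins, constructed so that `Phi_big - Phi_small` is positive semidefinite.

```python
import numpy as np

def weighted_mse(Phi, W):
    """Obj 1: Tr[W Phi]."""
    return np.trace(W @ Phi).real

def log_det(Phi):
    """Obj 2: log|Phi|, the negative of the capacity lower bound."""
    _, logabsdet = np.linalg.slogdet(Phi)
    return logabsdet

def worst_mse(Phi):
    """Obj 3 with psi = max (an assumed increasing Schur-convex choice)."""
    return np.max(np.diag(Phi).real)

def sum_log_mse(Phi):
    """Obj 4 with psi = sum of logs (an assumed increasing Schur-concave choice)."""
    return np.sum(np.log(np.diag(Phi).real))

rng = np.random.default_rng(0)
N = 3

def rand_psd(n):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T / n

Phi_small = rand_psd(N) + 0.1 * np.eye(N)   # positive definite stand-in for an MSE matrix
Phi_big = Phi_small + rand_psd(N)           # Phi_big - Phi_small is positive semidefinite
W = rand_psd(N)                             # positive semidefinite weighting matrix

objectives = [lambda Phi: weighted_mse(Phi, W), log_det, worst_mse, sum_log_mse]
values_small = [f(Phi_small) for f in objectives]
values_big = [f(Phi_big) for f in objectives]
```

Each entry of `values_big` is at least as large as the matching entry of `values_small`, which is the matrix-monotonicity property that the unified formulation relies on.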

B. Problem Formulation

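Although the above four criteria aim at different objectives, they have one common feature: the objective functions are monotonically matrix-increasing functions with respect to the data MSE. The corresponding transceiver design problems can therefore be unified into a single form:

$$\min_{\mathbf{P}_k,\mathbf{G}}\ f(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})) \quad \text{s.t.}\quad \mathrm{Tr}(\mathbf{P}_k\mathbf{R}_{\mathbf{x}_{k-1}}\mathbf{P}_k^{\mathrm{H}}) \le P_k,\quad k = 1,\cdots,K \tag{12}$$

where $f(\cdot)$ is a real-valued, monotonically matrix-increasing function with $\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})$ as its argument. Notice that the constraints here are imposed on the powers averaged over the channel estimation errors.

With the definition of the data MSE (4), and by differentiating the trace of the MSE with respect to $\mathbf{G}$ and setting the result to zero, we can easily obtain a linear minimum MSE (LMMSE) equalizer as [29]

$$\mathbf{G}_{\mathrm{LMMSE}} = \Big[\prod_{k=1}^{K}\bar{\mathbf{H}}_k\mathbf{P}_k\Big]^{\mathrm{H}}\big[\bar{\mathbf{H}}_K\mathbf{P}_K\mathbf{R}_{\mathbf{x}_{K-1}}\mathbf{P}_K^{\mathrm{H}}\bar{\mathbf{H}}_K^{\mathrm{H}} + \mathrm{Tr}(\mathbf{P}_K\mathbf{R}_{\mathbf{x}_{K-1}}\mathbf{P}_K^{\mathrm{H}}\boldsymbol{\Psi}_K)\boldsymbol{\Sigma}_K + \mathbf{R}_{\mathbf{n}_K}\big]^{-1}, \tag{13}$$

with the following property [23], [24]:

$$\boldsymbol{\Phi}(\mathbf{G}_{\mathrm{LMMSE}},\{\mathbf{P}_k\}_{k=1}^{K}) \preceq \boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}). \tag{14}$$

The equality in (14) holds when $\mathbf{G} = \mathbf{G}_{\mathrm{LMMSE}}$. Because $f(\cdot)$ is monotonically matrix-increasing, it follows easily from (14) that $f(\boldsymbol{\Phi}(\mathbf{G}_{\mathrm{LMMSE}},\{\mathbf{P}_k\}_{k=1}^{K})) \le f(\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K}))$. This means that $f(\boldsymbol{\Phi}(\mathbf{G}_{\mathrm{LMMSE}},\{\mathbf{P}_k\}_{k=1}^{K}))$ is a tight lower bound of the objective function in (12). Together with the fact that the equalizer $\mathbf{G}$ is not involved in the constraints in (12), the optimization problem in (12) is equivalent to

$$\min_{\mathbf{P}_k}\ f(\boldsymbol{\Phi}(\mathbf{G}_{\mathrm{LMMSE}},\{\mathbf{P}_k\}_{k=1}^{K})) \quad \text{s.t.}\quad \mathrm{Tr}(\mathbf{P}_k\mathbf{R}_{\mathbf{x}_{k-1}}\mathbf{P}_k^{\mathrm{H}}) \le P_k,\quad k = 1,\cdots,K. \tag{15}$$

This implies that the optimal equalizer of (12) is $\mathbf{G}_{\mathrm{LMMSE}}$ in (13). Substituting the optimal equalizer into $\boldsymbol{\Phi}(\mathbf{G},\{\mathbf{P}_k\}_{k=1}^{K})$ in (4) and denoting $\boldsymbol{\Phi}(\mathbf{G}_{\mathrm{LMMSE}},\{\mathbf{P}_k\}_{k=1}^{K}) = \boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K})$ for simplicity, we have

$$\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K}) = \mathbf{I}_N - \Big[\prod_{k=1}^{K}\bar{\mathbf{H}}_k\mathbf{P}_k\Big]^{\mathrm{H}}\big[\bar{\mathbf{H}}_K\mathbf{P}_K\mathbf{R}_{\mathbf{x}_{K-1}}\mathbf{P}_K^{\mathrm{H}}\bar{\mathbf{H}}_K^{\mathrm{H}} + \mathrm{Tr}(\mathbf{P}_K\mathbf{R}_{\mathbf{x}_{K-1}}\mathbf{P}_K^{\mathrm{H}}\boldsymbol{\Psi}_K)\boldsymbol{\Sigma}_K + \mathbf{R}_{\mathbf{n}_K}\big]^{-1}\Big[\prod_{k=1}^{K}\bar{\mathbf{H}}_k\mathbf{P}_k\Big]. \tag{16}$$

For multi-hop AF MIMO relaying systems, the received signal at the $k$th relay depends on the forwarding matrices at all preceding relays, causing the power allocations at different relays to be coupled to each other (as seen in the constraints of (15)), and thus making the problem (15) difficult to solve. To proceed, we define the following new variables in terms of $\mathbf{P}_k$:

$$\mathbf{F}_1 \triangleq \mathbf{P}_1\mathbf{Q}_0^{\mathrm{H}},\qquad \mathbf{F}_k \triangleq \mathbf{P}_k\mathbf{K}_{\mathbf{F}_{k-1}}^{1/2}\big(\underbrace{\mathbf{K}_{\mathbf{F}_{k-1}}^{-1/2}\bar{\mathbf{H}}_{k-1}\mathbf{F}_{k-1}\mathbf{F}_{k-1}^{\mathrm{H}}\bar{\mathbf{H}}_{k-1}^{\mathrm{H}}\mathbf{K}_{\mathbf{F}_{k-1}}^{-1/2} + \mathbf{I}_{M_{k-1}}}_{\triangleq\,\boldsymbol{\Pi}_{k-1}}\big)^{1/2}\mathbf{Q}_{k-1}^{\mathrm{H}} \tag{17}$$

where $\mathbf{K}_{\mathbf{F}_k} \triangleq \mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}\boldsymbol{\Psi}_k)\boldsymbol{\Sigma}_k + \sigma_{n_k}^2\mathbf{I}_{M_k}$ and $\mathbf{Q}_k$ is an unknown unitary matrix. The introduction of $\mathbf{Q}_k$ is due to the fact that for a positive semidefinite matrix $\mathbf{M}$, its square root has the form $\mathbf{M}^{1/2}\mathbf{Q}$, where $\mathbf{Q}$ is a unitary matrix. With the new variables, the MMSE matrix $\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K})$ in (16) is reformulated as

$$\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K}) = \mathbf{I}_N - \mathbf{Q}_0^{\mathrm{H}}\Big[\prod_{k=1}^{K}\mathbf{Q}_k\mathbf{A}_k\Big]^{\mathrm{H}}\Big[\prod_{k=1}^{K}\mathbf{Q}_k\mathbf{A}_k\Big]\mathbf{Q}_0 = \mathbf{I}_N - \mathbf{Q}_0^{\mathrm{H}}\mathbf{A}_1^{\mathrm{H}}\mathbf{Q}_1^{\mathrm{H}}\cdots\mathbf{A}_K^{\mathrm{H}}\mathbf{Q}_K^{\mathrm{H}}\mathbf{Q}_K\mathbf{A}_K\cdots\mathbf{Q}_1\mathbf{A}_1\mathbf{Q}_0, \tag{18}$$

with $\mathbf{A}_k \triangleq \boldsymbol{\Pi}_k^{-1/2}\mathbf{K}_{\mathbf{F}_k}^{-1/2}\bar{\mathbf{H}}_k\mathbf{F}_k$. Meanwhile, the power constraint in the $k$th hop (i.e., $\mathrm{Tr}(\mathbf{P}_k\mathbf{R}_{\mathbf{x}_{k-1}}\mathbf{P}_k^{\mathrm{H}}) \le P_k$) can now be rewritten as

$$\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}) \le P_k. \tag{19}$$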
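The change of variables (17) can be checked numerically. The sketch below is an illustration under assumed settings: all nodes have the same number of antennas, the channels, precoders and error covariances are random stand-ins, and every unitary $\mathbf{Q}_k$ is set to the identity (one admissible choice). It builds $\mathbf{F}_k$ from $\mathbf{P}_k$, then compares the two power expressions in (19) and the two forms (16) and (18) of the MMSE matrix. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(r, c):
    return (rng.standard_normal((r, c)) + 1j * rng.standard_normal((r, c))) / np.sqrt(2)

def herm_power(Z, p):
    """Z**p for a Hermitian positive definite Z, via its eigen-decomposition."""
    w, V = np.linalg.eigh(Z)
    return (V * w**p) @ V.conj().T

def rand_pd(n):
    A = crandn(n, n)
    return A @ A.conj().T / n + 0.1 * np.eye(n)

N, K, n = 2, 3, 3            # assumed sizes: N streams, K hops, n antennas per node
sigma2 = 0.5                 # assumed noise variance, identical in all hops
Hbar = [crandn(n, n) for _ in range(K)]
Psi = [0.1 * rand_pd(n) for _ in range(K)]
Sigma = [rand_pd(n) for _ in range(K)]
P = [crandn(n, N)] + [crandn(n, n) for _ in range(K - 1)]

Rx = np.eye(N, dtype=complex)          # R_{x_0} = I_N
cascade = np.eye(N, dtype=complex)     # running product of Hbar_k P_k
A_prod = np.eye(N, dtype=complex)      # running product A_k ... A_1 (all Q_k = I)
to_F = np.eye(N, dtype=complex)        # K_{F_{k-1}}^{1/2} Pi_{k-1}^{1/2}; identity for k = 1
power_P, power_F = [], []
for k in range(K):
    T = P[k] @ Rx @ P[k].conj().T
    power_P.append(np.trace(T).real)                   # Tr(P_k R_{x_{k-1}} P_k^H)
    F = P[k] @ to_F                                    # (17) with Q_{k-1} = I
    FFh = F @ F.conj().T
    power_F.append(np.trace(FFh).real)                 # Tr(F_k F_k^H)
    KF = np.trace(FFh @ Psi[k]).real * Sigma[k] + sigma2 * np.eye(n)
    KF_mh = herm_power(KF, -0.5)
    Pi = KF_mh @ Hbar[k] @ FFh @ Hbar[k].conj().T @ KF_mh + np.eye(n)
    A_prod = herm_power(Pi, -0.5) @ KF_mh @ Hbar[k] @ F @ A_prod
    to_F = herm_power(KF, 0.5) @ herm_power(Pi, 0.5)
    Rx = (Hbar[k] @ T @ Hbar[k].conj().T
          + np.trace(T @ Psi[k]).real * Sigma[k] + sigma2 * np.eye(n))   # recursion (5)
    cascade = Hbar[k] @ P[k] @ cascade

Phi_16 = np.eye(N) - cascade.conj().T @ np.linalg.solve(Rx, cascade)     # form (16)
Phi_18 = np.eye(N) - A_prod.conj().T @ A_prod                            # form (18)
```

In this experiment `power_P` matches `power_F` hop by hop and `Phi_16` matches `Phi_18`, which is consistent with the identity $\mathbf{R}_{\mathbf{x}_k} = \mathbf{K}_{\mathbf{F}_k}^{1/2}\boldsymbol{\Pi}_k\mathbf{K}_{\mathbf{F}_k}^{1/2}$ implied by (5) and (17).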

It is clear that with the new variables $\mathbf{F}_k$, the constraints become independent of each other. Moreover, the subsequent transformations of the objective function in the unified problem will not affect the constraints, thus improving the tractability of the problem. Putting (18) and (19) into (15), the unified transceiver design problem can be reformulated as

$$\begin{aligned}\textbf{P1:}\quad \min_{\mathbf{F}_k,\mathbf{Q}_k}\ & f(\mathbf{I}_N - \mathbf{Q}_0^{\mathrm{H}}\boldsymbol{\Theta}\mathbf{Q}_0)\\ \text{s.t.}\ & \mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}) \le P_k,\quad k = 1,\cdots,K\\ & \boldsymbol{\Theta} = \mathbf{A}_1^{\mathrm{H}}\mathbf{Q}_1^{\mathrm{H}}\cdots\mathbf{A}_K^{\mathrm{H}}\mathbf{Q}_K^{\mathrm{H}}\mathbf{Q}_K\mathbf{A}_K\cdots\mathbf{Q}_1\mathbf{A}_1\\ & \mathbf{Q}_k^{\mathrm{H}}\mathbf{Q}_k = \mathbf{I}_{M_k}.\end{aligned} \tag{20}$$

From the definition of $\mathbf{A}_k$ in (18), and noticing that $\mathbf{K}_{\mathbf{F}_k} = \mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}\boldsymbol{\Psi}_k)\boldsymbol{\Sigma}_k + \sigma_{n_k}^2\mathbf{I}_{M_k}$, it can be seen that the design variable $\mathbf{F}_k$ appears at multiple positions in the objective function and is involved in matrix inversion and square root operations through $\mathbf{K}_{\mathbf{F}_k}$. This is significantly different from transceiver design for multi-hop MIMO relaying systems with perfect CSI in [10]. Therefore, the optimization problem is much more complicated than its counterpart with perfect CSI. Indeed, as demonstrated by, e.g., [11] and [17], [18], [20], robust transceiver design is much more complicated and challenging than its counterpart with perfect CSI even for point-to-point or dual-hop relaying MIMO systems.

IV. OPTIMAL SOLUTION FOR THE ROBUST TRANSCEIVER

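Clearly from the formulation of P1 in (20), two sets of matrix variables (i.e., $\mathbf{F}_k$, $\mathbf{Q}_k$) need to be determined. In this section, their optimal structures will be derived first, which enables the simplification of the optimization problem in (20) into a problem with only scalar variables. An iterative water-filling algorithm is then applied to solve the simplified problem. The relationship between our proposed solution and a number of existing solutions will also be discussed in detail.

A. Optimal $\mathbf{Q}_k$

Based on the formulations of the objectives given in (6), (8), (9) and (11), we have the following property of the optimization problem P1.

Property 1: At the optimal value of P1, $\mathbf{Q}_0^{\mathrm{H}}\boldsymbol{\Theta}\mathbf{Q}_0$ and the objective function $f(\mathbf{I}_N - \mathbf{Q}_0^{\mathrm{H}}\boldsymbol{\Theta}\mathbf{Q}_0)$ can be written respectively as

$$\mathbf{Q}_0^{\mathrm{H}}\boldsymbol{\Theta}\mathbf{Q}_0 = \mathbf{U}_{\Omega}\,\mathrm{diag}[\boldsymbol{\lambda}(\boldsymbol{\Theta})]\,\mathbf{U}_{\Omega}^{\mathrm{H}}, \tag{21}$$

$$f(\mathbf{I}_N - \mathbf{Q}_0^{\mathrm{H}}\boldsymbol{\Theta}\mathbf{Q}_0) = f(\mathbf{I}_N - \mathbf{U}_{\Omega}\,\mathrm{diag}[\boldsymbol{\lambda}(\boldsymbol{\Theta})]\,\mathbf{U}_{\Omega}^{\mathrm{H}}) \triangleq g[\boldsymbol{\lambda}(\boldsymbol{\Theta})], \tag{22}$$

where $g(\cdot)$ is a monotonically decreasing and Schur-concave function² with respect to $\boldsymbol{\lambda}(\boldsymbol{\Theta})$; the vector $\boldsymbol{\lambda}(\boldsymbol{\Theta}) = [\lambda_1(\boldsymbol{\Theta}),\cdots,\lambda_N(\boldsymbol{\Theta})]^{\mathrm{T}}$ with $\lambda_i(\boldsymbol{\Theta})$ being the $i$th largest eigenvalue of $\boldsymbol{\Theta}$; and

$$\mathbf{U}_{\Omega} = \begin{cases}\mathbf{U}_{\mathbf{W}} & \text{for Obj 1}\\ \mathbf{U}_{\mathrm{Arb}} & \text{for Obj 2}\\ \mathbf{Q}_{\mathrm{DFT}} & \text{for Obj 3}\\ \mathbf{I}_N & \text{for Obj 4}.\end{cases} \tag{23}$$

In (23), the matrix $\mathbf{U}_{\mathbf{W}}$ is unitary and defined from the eigen-decomposition of the weighting matrix $\mathbf{W}$, i.e., $\mathbf{W} = \mathbf{U}_{\mathbf{W}}\boldsymbol{\Lambda}_{\mathbf{W}}\mathbf{U}_{\mathbf{W}}^{\mathrm{H}}$ with $\boldsymbol{\Lambda}_{\mathbf{W}} \searrow$; the matrix $\mathbf{U}_{\mathrm{Arb}}$ is an arbitrary unitary matrix; and $\mathbf{Q}_{\mathrm{DFT}}$ is the discrete Fourier transform (DFT) matrix, which makes $\mathbf{Q}_{\mathrm{DFT}}\,\mathrm{diag}[\boldsymbol{\lambda}(\boldsymbol{\Theta})]\,\mathbf{Q}_{\mathrm{DFT}}^{\mathrm{H}}$ have identical diagonal elements.

Proof: See Appendix A. ∎

We notice that the equality in (21) will hold directly when

$$\mathbf{Q}_0 = \mathbf{U}_{\boldsymbol{\Theta}}\mathbf{U}_{\Omega}^{\mathrm{H}} \tag{24}$$

where $\mathbf{U}_{\boldsymbol{\Theta}}$ is the unitary matrix corresponding to the eigen-decomposition of $\boldsymbol{\Theta}$ with eigenvalues in decreasing order. Since $\mathbf{Q}_0$ is not involved in the constraints in (20), it follows from Property 1 that $\mathbf{Q}_0 = \mathbf{U}_{\boldsymbol{\Theta}}\mathbf{U}_{\Omega}^{\mathrm{H}}$ is the optimal solution of $\mathbf{Q}_0$ for P1.

Using Property 1, the objective function of (20) can be directly replaced by $g[\boldsymbol{\lambda}(\boldsymbol{\Theta})]$, and thus the optimization problem is equivalent to

$$\begin{aligned}\textbf{P2:}\quad \min_{\mathbf{F}_k,\mathbf{Q}_k}\ & g[\boldsymbol{\lambda}(\boldsymbol{\Theta})]\\ \text{s.t.}\ & \boldsymbol{\Theta} = \mathbf{A}_1^{\mathrm{H}}\mathbf{Q}_1^{\mathrm{H}}\cdots\mathbf{A}_K^{\mathrm{H}}\mathbf{Q}_K^{\mathrm{H}}\mathbf{Q}_K\mathbf{A}_K\cdots\mathbf{Q}_1\mathbf{A}_1\\ & \mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}) \le P_k,\quad \mathbf{Q}_k^{\mathrm{H}}\mathbf{Q}_k = \mathbf{I}_{M_k},\quad k = 1,\cdots,K.\end{aligned} \tag{25}$$

² The specific expressions for $g(\cdot)$ are given in Appendix A, but they are not important for the derivation of the optimal structures.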
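The role of $\mathbf{Q}_{\mathrm{DFT}}$ for Obj 3 can be illustrated with a small numerical check. In the sketch below the eigenvalues in `lam` are made-up values, not results from the paper; the unitary DFT matrix is built directly from its definition.

```python
import numpy as np

N = 4
lam = np.array([0.9, 0.6, 0.3, 0.1])   # assumed eigenvalues of Theta, in decreasing order

# Unitary DFT matrix with entries exp(-2*pi*j*m*n/N) / sqrt(N).
idx = np.arange(N)
Q_dft = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

M = Q_dft @ np.diag(lam) @ Q_dft.conj().T
diag_M = np.diag(M).real
```

Every entry of `diag_M` equals the mean of `lam`, because all entries of the DFT matrix have the same magnitude. With this choice the MSE matrix $\mathbf{I}_N - \mathbf{Q}_0^{\mathrm{H}}\boldsymbol{\Theta}\mathbf{Q}_0$ has equal diagonal elements, i.e., the per-stream MSEs are balanced.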

For this optimization problem, we have another property as follows.

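Property 2: As $g(\cdot)$ is a decreasing and Schur-concave function, the objective function in P2 satisfies

$$g(\boldsymbol{\lambda}(\boldsymbol{\Theta})) \ge g\big(\underbrace{[\gamma_1(\{\mathbf{F}_k\}_{k=1}^{K})\ \cdots\ \gamma_N(\{\mathbf{F}_k\}_{k=1}^{K})]^{\mathrm{T}}}_{\triangleq\,\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})}\big) \tag{26}$$

with

$$\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}) \triangleq \prod_{k=1}^{K}\frac{\lambda_i(\mathbf{F}_k^{\mathrm{H}}\bar{\mathbf{H}}_k^{\mathrm{H}}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}{1 + \lambda_i(\mathbf{F}_k^{\mathrm{H}}\bar{\mathbf{H}}_k^{\mathrm{H}}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}, \tag{27}$$

where the equality in (26) holds when

$$\mathbf{Q}_k = \mathbf{V}_{\mathbf{A}_{k+1}}\mathbf{U}_{\mathbf{A}_k}^{\mathrm{H}},\quad k = 1,\cdots,K-1, \tag{28}$$

and $\mathbf{Q}_K$ is an arbitrary unitary matrix. In (28), the unitary matrices $\mathbf{U}_{\mathbf{A}_k}$ and $\mathbf{V}_{\mathbf{A}_k}$ are defined based on the singular value decomposition (SVD) $\mathbf{A}_k = \mathbf{U}_{\mathbf{A}_k}\boldsymbol{\Lambda}_{\mathbf{A}_k}\mathbf{V}_{\mathbf{A}_k}^{\mathrm{H}}$ with $\boldsymbol{\Lambda}_{\mathbf{A}_k} \searrow$.

Proof: See Appendix B. ∎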
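The equality case of Property 2 can be sketched numerically. The example below assumes square random matrices in place of the $\mathbf{A}_k$ (their dimensions and values are hypothetical) and checks that, with $\mathbf{Q}_k$ chosen as in (28), the eigenvalues of $\boldsymbol{\Theta}$ equal the products of the squared singular values of the $\mathbf{A}_k$, matched in decreasing order.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n = 3, 4      # assumed: K hops, square n x n matrices A_k
A = [rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) for _ in range(K)]
svd = [np.linalg.svd(Ak) for Ak in A]    # tuples (U, singular values, V^H), values decreasing

def theta(Q):
    """Theta = B^H B with B = Q_K A_K ... Q_1 A_1."""
    B = np.eye(n, dtype=complex)
    for Ak, Qk in zip(A, Q):
        B = Qk @ Ak @ B
    return B.conj().T @ B

# (28): Q_k = V_{A_{k+1}} U_{A_k}^H for k < K; Q_K is arbitrary (identity here).
Q_opt = [svd[k + 1][2].conj().T @ svd[k][0].conj().T for k in range(K - 1)] + [np.eye(n)]

eig_opt = np.linalg.eigvalsh(theta(Q_opt))[::-1]          # decreasing order
prod_sq = np.prod([s**2 for _, s, _ in svd], axis=0)      # products of squared singular values
```

Here `eig_opt` agrees with `prod_sq`. Since $\lambda_i(\mathbf{A}_k^{\mathrm{H}}\mathbf{A}_k)$ equals the $k$th factor in (27), this is the alignment of singular vectors across hops that yields $\boldsymbol{\lambda}(\boldsymbol{\Theta}) = \boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})$.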

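It is clear from (28) that $\mathbf{Q}_k$, $k = 1,\cdots,K-1$, can be uniquely computed from $\mathbf{A}_k$, which is determined only by $\mathbf{F}_k$ as shown in (18). Similarly, according to (24) and the definition of $\boldsymbol{\Theta}$, it can be concluded that $\mathbf{Q}_0$ is determined by $\mathbf{Q}_k$, $k = 1,\cdots,K$, and $\mathbf{A}_k$, and therefore it is eventually determined only by $\mathbf{F}_k$. With this fact and Property 2, the optimization problem P2 in (25), with two sets of variables $\mathbf{F}_k$ and $\mathbf{Q}_k$, can be reduced to an optimization problem with only one set of variables $\mathbf{F}_k$ as follows:

$$\begin{aligned}\textbf{P3:}\quad \min_{\mathbf{F}_k}\ & g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})]\\ \text{s.t.}\ & \gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}) = \prod_{k=1}^{K}\frac{\lambda_i(\mathbf{F}_k^{\mathrm{H}}\bar{\mathbf{H}}_k^{\mathrm{H}}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}{1 + \lambda_i(\mathbf{F}_k^{\mathrm{H}}\bar{\mathbf{H}}_k^{\mathrm{H}}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}\\ & \mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}) \le P_k,\quad k = 1,\cdots,K.\end{aligned} \tag{29}$$

B. Optimal Structure of $\mathbf{F}_k$

Since $g(\cdot)$ is a monotonically decreasing function of its vector argument, we have the following additional property of the optimal solution of $\mathbf{F}_k$ in P3.

Property 3: The optimal solutions of the optimization problem P3 in (29) always occur on the boundary, i.e., $\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}) = P_k$, and the power constraint is equivalent to

$$\mathrm{Tr}[\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^2\mathbf{I}_{N_k})]/\eta_{f_k} = P_k, \tag{30}$$

where $\alpha_k$ is the constant $\alpha_k = \mathrm{Tr}(\boldsymbol{\Sigma}_k)/M_k$ and

$$\eta_{f_k} \triangleq \mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}\boldsymbol{\Psi}_k)\alpha_k + \sigma_{n_k}^2. \tag{31}$$

Proof: See Appendix C. ∎

With Property 3, the optimal solution of the optimization problem (29) is exactly the optimal solution of the following optimization problem with different constraints:

$$\begin{aligned}\textbf{P4:}\quad \min_{\mathbf{F}_k}\ & g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})]\\ \text{s.t.}\ & \gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}) = \prod_{k=1}^{K}\frac{\lambda_i(\mathbf{F}_k^{\mathrm{H}}\bar{\mathbf{H}}_k^{\mathrm{H}}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}{1 + \lambda_i(\mathbf{F}_k^{\mathrm{H}}\bar{\mathbf{H}}_k^{\mathrm{H}}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}\\ & \mathrm{Tr}[\mathbf{F}_k\mathbf{F}_k^{\mathrm{H}}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^2\mathbf{I}_{N_k})]/\eta_{f_k} = P_k.\end{aligned} \tag{32}$$

Now defining unitary matrices $\mathbf{U}_{\mathbf{H}_k}$ and $\mathbf{V}_{\mathbf{H}_k}$ based on the following SVD:

$$(\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k})^{-1/2}\bar{\mathbf{H}}_k(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^2\mathbf{I}_{N_k})^{-1/2} = \mathbf{U}_{\mathbf{H}_k}\boldsymbol{\Lambda}_{\mathbf{H}_k}\mathbf{V}_{\mathbf{H}_k}^{\mathrm{H}} \tag{33}$$

with singular values in decreasing order, we have the key result about the optimal structure of $\mathbf{F}_k$ as follows.

Property 4: When $\boldsymbol{\Psi}_k \propto \mathbf{I}$ or $\boldsymbol{\Sigma}_k \propto \mathbf{I}$, the matrix $\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k}$ is constant and independent of $\mathbf{F}_k$. Meanwhile, the optimal solution of the optimization problem (32) has the following structure:

$$\mathbf{F}_{k,\mathrm{opt}} = \sqrt{\xi_k(\boldsymbol{\Lambda}_{\mathbf{F}_k})}\,(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^2\mathbf{I}_{N_k})^{-1/2}\mathbf{V}_{\mathbf{H}_k,N}\boldsymbol{\Lambda}_{\mathbf{F}_k}\mathbf{U}_{\mathbf{F}_k,N}^{\mathrm{H}}, \tag{34}$$

where $\mathbf{V}_{\mathbf{H}_k,N}$ and $\mathbf{U}_{\mathbf{F}_k,N}$ are the matrices consisting of the first $N$ columns of $\mathbf{V}_{\mathbf{H}_k}$ and $\mathbf{U}_{\mathbf{F}_k}$, respectively; $\mathbf{U}_{\mathbf{F}_k}$ is an arbitrary unitary matrix; $\boldsymbol{\Lambda}_{\mathbf{F}_k}$ is an $N \times N$ unknown diagonal matrix; and the scalar $\xi_k(\boldsymbol{\Lambda}_{\mathbf{F}_k})$ is a function of $\boldsymbol{\Lambda}_{\mathbf{F}_k}$ and equals

$$\xi_k(\boldsymbol{\Lambda}_{\mathbf{F}_k}) = \frac{\sigma_{n_k}^2}{1 - \alpha_k\mathrm{Tr}[\mathbf{V}_{\mathbf{H}_k,N}^{\mathrm{H}}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^2\mathbf{I}_{N_k})^{-1/2}\boldsymbol{\Psi}_k(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^2\mathbf{I}_{N_k})^{-1/2}\mathbf{V}_{\mathbf{H}_k,N}\boldsymbol{\Lambda}_{\mathbf{F}_k}^2]} = \eta_{f_k}. \tag{35}$$

Proof: See Appendix D. ∎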
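The consistency of (30), (31), (34) and (35) can be checked numerically. The sketch below makes several assumptions for illustration only: real-valued matrices, a receive-side covariance $\boldsymbol{\Sigma}_k$ proportional to the identity, and random orthonormal stand-ins for $\mathbf{V}_{\mathbf{H}_k,N}$ and $\mathbf{U}_{\mathbf{F}_k}$. The squared diagonal of $\boldsymbol{\Lambda}_{\mathbf{F}_k}$ is chosen to sum to the power budget.

```python
import numpy as np

rng = np.random.default_rng(3)

def herm_power(Z, p):
    """Z**p for a symmetric positive definite Z, via its eigen-decomposition."""
    w, V = np.linalg.eigh(Z)
    return (V * w**p) @ V.T

n, N = 4, 2                   # assumed: n antennas on both sides of the hop, N streams
P_k, sigma2 = 2.0, 0.5        # assumed power budget and noise variance
R = rng.standard_normal((n, n))
Psi = 0.05 * (R @ R.T / n + np.eye(n))    # generic column correlation matrix
Sigma = 0.7 * np.eye(n)                   # row correlation matrix proportional to I
alpha = np.trace(Sigma) / n

B = alpha * P_k * Psi + sigma2 * np.eye(n)
B_mh = herm_power(B, -0.5)
V = np.linalg.qr(rng.standard_normal((n, N)))[0]   # stand-in for V_{H_k,N}
U = np.linalg.qr(rng.standard_normal((N, N)))[0]   # arbitrary orthogonal U_{F_k}
f2 = np.array([1.5, 0.5])                          # diagonal of Lambda_F^2, sums to P_k
Lam = np.diag(np.sqrt(f2))

xi = sigma2 / (1 - alpha * np.trace(V.T @ B_mh @ Psi @ B_mh @ V @ np.diag(f2)))   # (35)
F = np.sqrt(xi) * B_mh @ V @ Lam @ U.T                                              # (34)

FFh = F @ F.T
eta = alpha * np.trace(FFh @ Psi) + sigma2                       # (31)
K_F = np.trace(FFh @ Psi) * Sigma + sigma2 * np.eye(n)
lhs_30 = np.trace(FFh @ B) / eta                                  # left-hand side of (30)
```

In this experiment `xi` equals `eta`, `K_F / eta` is the identity (so it does not depend on `F`), and both `lhs_30` and `np.trace(FFh)` equal `P_k`.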

Remark 2: When reversing the direction of data transmission in the multi-hop system, we can get a dual multi-hop system where the estimated channel matrix in its $(K-k+1)$th hop becomes $\bar{\mathbf{H}}_k^{H}$ and the roles of row correlation matrices and column correlation matrices are interchanged. Using (17) and Property 4 and after some tedious manipulation, the optimal precoder matrices $\mathbf{P}'_{k,\mathrm{opt}}$ for the dual multi-hop system can be found to be $\beta_k\mathbf{P}_{k,\mathrm{opt}}^{H}$ where $\beta_k$ is a scalar. This means that there exists an uplink-downlink duality in the multi-hop AF MIMO relaying systems with channel estimation errors.

In the optimal structure given by (34), the scalar variable $\xi_k(\boldsymbol{\Lambda}_{\mathbf{F}_k})$ is uniquely determined by the matrix $\boldsymbol{\Lambda}_{\mathbf{F}_k}$, and therefore the only unknown variable in (34) is $\boldsymbol{\Lambda}_{\mathbf{F}_k}$. The computation of $\boldsymbol{\Lambda}_{\mathbf{F}_k}$ is addressed in detail in the following subsection.

C. Computation of $\boldsymbol{\Lambda}_{\mathbf{F}_k}$

Substituting the optimal structures given by Property 4 into P4 and defining $[\boldsymbol{\Lambda}_{\mathcal{H}_k}]_{i,i} = h_{k,i}$ and $[\boldsymbol{\Lambda}_{\mathbf{F}_k}]_{i,i} = f_{k,i}$ for $i = 1, \cdots, N$, the optimization problem for computing $\boldsymbol{\Lambda}_{\mathbf{F}_k}$ becomes
$$\begin{aligned}
\min_{f_{k,i}} \quad & g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})] \\
\text{s.t.} \quad & \gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}) = \frac{\prod_{k=1}^{K} f_{k,i}^{2}h_{k,i}^{2}}{\prod_{k=1}^{K}(f_{k,i}^{2}h_{k,i}^{2}+1)} \\
& \sum_{i=1}^{N} f_{k,i}^{2} = P_k.
\end{aligned} \tag{36}$$
The exact expression for $g(\cdot)$ depends on the specific design criterion used for transceiver design. For all four criteria discussed in Section III-A, a widely used and computationally efficient iterative algorithm [30] can be applied to solve for $f_{k,i}$ from (36), although the optimization problem (36) is non-convex in nature [31]. For completeness, the optimal solution for $f_{k,i}$ is given case by case in the following.
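As a small illustration of the scalar problem (36), the sketch below evaluates the per-stream gains $\gamma_i$ and the objective $g(\cdot)$ for the four criteria treated in the remainder of this subsection. It is a hedged example: the array layout (one row per hop, holding the squared diagonal entries of $\boldsymbol{\Lambda}_{\mathbf{F}_k}$ and $\boldsymbol{\Lambda}_{\mathcal{H}_k}$), the function names, the criterion labels and the use of the natural logarithm are assumptions made for illustration only.

```python
import numpy as np

def stream_gains(f2, h2):
    """gamma_i of (36); f2 and h2 hold f_{k,i}^2 and h_{k,i}^2 with shape (K, N)."""
    s = np.asarray(f2, dtype=float) * np.asarray(h2, dtype=float)
    return np.prod(s / (s + 1.0), axis=0)        # product over the K hops

def objective(gam, criterion, weights=None):
    """Evaluate g(gamma) for one of the four design criteria (labels are hypothetical)."""
    gam = np.asarray(gam, dtype=float)
    if criterion == "weighted_mse":              # sum_i w_i (1 - gamma_i)
        return float(np.sum(weights * (1.0 - gam)))
    if criterion == "capacity":                  # sum_i log(1 - gamma_i)
        return float(np.sum(np.log(1.0 - gam)))
    if criterion == "max_mse":                   # 1 - (1/N) sum_i gamma_i
        return float(1.0 - gam.mean())
    if criterion == "weighted_sum_rate":         # sum_i v_i log(1 - gamma_i)
        return float(np.sum(weights * np.log(1.0 - gam)))
    raise ValueError(f"unknown criterion: {criterion}")

# Two hops, three streams, unit gains and unit powers (toy numbers).
gam = stream_gains(np.ones((2, 3)), np.ones((2, 3)))
print(gam, objective(gam, "max_mse"))
```

All four objectives are minimized, so a smaller value is better in each case.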

1) Weighted MSE Minimization: For weighted MSE minimization, it is proved in Appendix A that $g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})] = \sum_{i=1}^{N}(w_i - w_i\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}))$ where $w_i = [\boldsymbol{\Lambda}_{\mathbf{W}}]_{i,i}$. Therefore, the optimization problem (36) can be rewritten as
$$\min_{f_{k,i}} \; \sum_{i=1}^{N}\left(w_i - w_i\frac{\prod_{k=1}^{K} f_{k,i}^{2}h_{k,i}^{2}}{\prod_{k=1}^{K}(f_{k,i}^{2}h_{k,i}^{2}+1)}\right) \quad \text{s.t.} \quad \sum_{i=1}^{N} f_{k,i}^{2} = P_k. \tag{37}$$

Using the iterative water-filling algorithm, $f_{k,i}$ can be directly computed with given $f_{l,i}$'s, where $l \neq k$, as
$$f_{k,i}^{2} = \left(\sqrt{\frac{w_i}{\mu_k h_{k,i}^{2}}}\sqrt{\prod_{l\neq k}\frac{f_{l,i}^{2}h_{l,i}^{2}}{f_{l,i}^{2}h_{l,i}^{2}+1}} - \frac{1}{h_{k,i}^{2}}\right)^{+}, \quad i = 1, \cdots, N, \tag{38}$$
where $\mu_k$ is the Lagrange multiplier that makes $\sum_{i=1}^{N} f_{k,i}^{2} = P_k$ hold. Notice that this iterative water-filling algorithm is guaranteed to converge, as discussed in [30].
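The hop-by-hop update in (38) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes the squared quantities $f_{k,i}^{2}$ and $h_{k,i}^{2}$ are stored row-wise per hop, starts from a uniform power allocation, and finds the multiplier $\mu_k$ by bisection over an assumed bracket. The function names, the bracket, the number of sweeps and the toy channel gains are all hypothetical.

```python
import numpy as np

def stream_gains(f2, h2):
    """Per-stream gains gamma_i; f2 and h2 have shape (K, N)."""
    s = f2 * h2
    return np.prod(s / (s + 1.0), axis=0)

def wmse_update(k, f2, h2, w, P_k):
    """Update hop k via (38) with the other hops fixed; returns f_{k,i}^2."""
    s = np.delete(f2 * h2, k, axis=0)
    a = np.prod(s / (s + 1.0), axis=0)           # product over hops l != k
    c = h2[k]

    def alloc(mu):
        return np.maximum(np.sqrt(w * a / (mu * c)) - 1.0 / c, 0.0)

    lo, hi = 1e-12, 1e12                         # assumed bracket for mu_k
    for _ in range(200):                         # bisection on log(mu)
        mu = np.sqrt(lo * hi)
        if alloc(mu).sum() > P_k:
            lo = mu                              # too much power: raise mu
        else:
            hi = mu
    return alloc(np.sqrt(lo * hi))

def iterative_waterfilling(h2, w, P, sweeps=50):
    """Cycle through the hops, updating one hop at a time."""
    K, N = h2.shape
    f2 = np.repeat(np.asarray(P, dtype=float)[:, None] / N, N, axis=1)  # uniform start
    for _ in range(sweeps):
        for k in range(K):
            f2[k] = wmse_update(k, f2, h2, w, P[k])
    return f2

h2 = np.array([[4.0, 2.0, 1.0, 0.5], [3.0, 1.5, 1.0, 0.2]])  # toy squared gains
w = np.array([0.4, 0.3, 0.2, 0.1])                            # toy weights
P = np.array([10.0, 10.0])
f2 = iterative_waterfilling(h2, w, P)
print("power per hop:", f2.sum(axis=1))
print("weighted MSE :", np.sum(w * (1.0 - stream_gains(f2, h2))))
```

Because each per-hop subproblem is convex in $f_{k,i}^{2}$ when the other hops are fixed, every update can only lower (or keep) the weighted MSE, so the final value should not exceed that of the uniform starting point.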

2) Capacity Maximization: As proved in Appendix A, the objective function for capacity maximization is given by $g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})] = \sum_{i=1}^{N}\log(1-\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}))$, based on which the optimization problem (36) can be written as
$$\min_{f_{k,i}} \; \sum_{i=1}^{N}\log\left(1-\frac{\prod_{k=1}^{K} f_{k,i}^{2}h_{k,i}^{2}}{\prod_{k=1}^{K}(f_{k,i}^{2}h_{k,i}^{2}+1)}\right) \quad \text{s.t.} \quad \sum_{i=1}^{N} f_{k,i}^{2} = P_k. \tag{39}$$
Similarly, the iterative water-filling algorithm can be used to solve for $f_{k,i}$ with guaranteed convergence. More specifically, when the $f_{l,i}$'s are given with $l \neq k$, the solution for $f_{k,i}$ can be derived as
$$f_{k,i}^{2} = \frac{1}{h_{k,i}^{2}}\left(\frac{-a_{k,i}+\sqrt{a_{k,i}^{2}+4(1-a_{k,i})a_{k,i}h_{k,i}^{2}/\mu_k}}{2(1-a_{k,i})}-1\right)^{+}, \quad i = 1, \cdots, N, \quad \text{with} \quad a_{k,i} = \prod_{l\neq k}\frac{f_{l,i}^{2}h_{l,i}^{2}}{f_{l,i}^{2}h_{l,i}^{2}+1}, \tag{40}$$
where $\mu_k$ is the Lagrange multiplier that makes $\sum_{i=1}^{N} f_{k,i}^{2} = P_k$ hold.
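The closed-form update in (40) can be exercised with a short NumPy sketch. As before, this is only an illustration under stated assumptions: at least two hops (so that $a_{k,i} < 1$), squared quantities stored row-wise per hop, a uniform starting allocation, and a bisection search for $\mu_k$ over an assumed bracket. The function names and the toy channel gains are hypothetical.

```python
import numpy as np

def capacity_objective(f2, h2):
    """Objective of (39): sum_i log(1 - gamma_i), to be minimized."""
    s = f2 * h2
    return np.sum(np.log(1.0 - np.prod(s / (s + 1.0), axis=0)))

def capacity_update(k, f2, h2, P_k):
    """Update hop k via (40) with the other hops fixed; assumes K >= 2."""
    s = np.delete(f2 * h2, k, axis=0)
    a = np.prod(s / (s + 1.0), axis=0)           # a_{k,i}, strictly below 1 for K >= 2
    c = h2[k]

    def alloc(mu):
        root = np.sqrt(a ** 2 + 4.0 * (1.0 - a) * a * c / mu)
        y = (-a + root) / (2.0 * (1.0 - a)) - 1.0
        return np.maximum(y, 0.0) / c

    lo, hi = 1e-12, 1e12                         # assumed bracket for mu_k
    for _ in range(200):                         # bisection on log(mu)
        mu = np.sqrt(lo * hi)
        if alloc(mu).sum() > P_k:
            lo = mu
        else:
            hi = mu
    return alloc(np.sqrt(lo * hi))

def iterative_waterfilling_capacity(h2, P, sweeps=50):
    """Cycle through the hops, updating one hop at a time."""
    K, N = h2.shape
    f2 = np.repeat(np.asarray(P, dtype=float)[:, None] / N, N, axis=1)  # uniform start
    for _ in range(sweeps):
        for k in range(K):
            f2[k] = capacity_update(k, f2, h2, P[k])
    return f2

h2 = np.array([[4.0, 2.0, 1.0, 0.5], [3.0, 1.5, 1.0, 0.2]])  # toy squared gains
P = np.array([10.0, 10.0])
f2 = iterative_waterfilling_capacity(h2, P)
print("power per hop:", f2.sum(axis=1))
print("objective    :", capacity_objective(f2, h2))
```

A stream is switched off in hop $k$ exactly when $\mu_k \geq a_{k,i}h_{k,i}^{2}$, which is the usual water-level interpretation of the $(\cdot)^{+}$ operation in (40).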

3) MAX-MSE Minimization: MAX-MSE is in fact a special case of Obj 3, and in this case $\psi(\mathbf{d}(\boldsymbol{\Phi}_{\mathrm{MSE}}(\{\mathbf{P}_k\}_{k=1}^{K}))) = \max_i\,[\boldsymbol{\Phi}_{\mathrm{MSE}}(\{\mathbf{P}_k\}_{k=1}^{K})]_{i,i}$. As shown in Appendix A, $g(\boldsymbol{\lambda}(\{\mathbf{F}_k\}_{k=1}^{K})) = \psi[\mathbf{1}_N - (\sum_{i=1}^{N}\lambda_i(\{\mathbf{F}_k\}_{k=1}^{K})/N)\otimes\mathbf{1}_N]$. It follows that $g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})]$ equals
$$g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})] = \max\left\{\mathbf{1}_N - \Big(\sum_{i=1}^{N}\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})/N\Big)\otimes\mathbf{1}_N\right\} = 1 - \frac{1}{N}\sum_{i=1}^{N}\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}). \tag{41}$$
Clearly, this expression for $g(\cdot)$ is similar to that for weighted MSE minimization. The optimal solution for $f_{k,i}$ can then be found from (38) with $w_i = 1$.

4) Weighted Sum-Rate Maximization: Under weighted sum-rate maximization, the objective function Obj 4 can be further specified as
$$\psi(\mathbf{d}(\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K}))) = \sum_{i=1}^{N} v_i\log(d_{[N-i+1]}(\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K})))$$
where $v_i$ is the $i$th largest positive weighting factor and $d_{[i]}(\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K}))$ is the $i$th largest diagonal element. Roughly speaking, this design scheme exhibits preference for data streams with better channel state information. It is proved in Appendix A that for this objective, $g(\boldsymbol{\lambda}(\{\mathbf{F}_k\}_{k=1}^{K})) = \psi[\mathbf{1}_N - \boldsymbol{\lambda}(\{\mathbf{F}_k\}_{k=1}^{K})]$. It follows that
$$g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})] = \sum_{i=1}^{N} v_i\log(1-\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})) \tag{42}$$
and the optimization problem is formulated as
$$\min_{f_{k,i}} \; \sum_{i=1}^{N} v_i\log\left(1-\frac{\prod_{k=1}^{K} f_{k,i}^{2}h_{k,i}^{2}}{\prod_{k=1}^{K}(f_{k,i}^{2}h_{k,i}^{2}+1)}\right) \quad \text{s.t.} \quad \sum_{i=1}^{N} f_{k,i}^{2} = P_k. \tag{43}$$
The optimization problem in (43) has a similar form to that in (39), except for the weighting factors $v_i$ in the objective function of (43). Therefore, the iterative water-filling solution for $f_{k,i}$ can be obtained similarly to that in (40), but with $\mu_k$ replaced by $\mu_k/v_i$ for the $i$th stream.

D. Relationship with Existing Solutions

By comparing our proposed optimal solution given by Property 4 with existing solutions for various systems in the literature, we find that our proposed solution reduces to the following existing solutions by setting some system parameters accordingly:
• the robust design with weighted MSE minimization for a dual-hop AF MIMO relaying system without source precoder in [11], by setting $K = 2$, $\boldsymbol{\Sigma} \propto \mathbf{I}_M$, and $\mathbf{P} = \mathbf{I}_N$;
• the robust design for a dual-hop AF MIMO relaying system in [12], by setting $K = 2$, $\boldsymbol{\Psi} \propto \mathbf{I}_N$, and $\mathbf{P} = \mathbf{I}_N$;
• the transceiver design with weighted MSE minimization for a dual-hop system with perfect CSI in [5], by setting $K = 2$, $\boldsymbol{\Psi}_k = \boldsymbol{\Sigma}_k = \mathbf{0}$, $\mathbf{W} = \mathbf{I}_N$ and $\mathbf{P} = \mathbf{I}_N$;
• the transceiver design for a dual-hop system with perfect CSI in [8], by setting $K = 2$, $\boldsymbol{\Psi}_k = \boldsymbol{\Sigma}_k = \mathbf{0}$ and $\mathbf{W} = \mathbf{I}_N$;
• the transceiver design with capacity maximization for a dual-hop system with perfect CSI in [4], by setting $K = 2$, $\boldsymbol{\Psi}_k = \boldsymbol{\Sigma}_k = \mathbf{0}$ and $\mathbf{P} = \mathbf{I}_N$;
• the robust design with weighted MSE minimization for a point-to-point MIMO system in [17], by setting $K = 1$; and
• the robust design with capacity maximization for a point-to-point MIMO system in [18], by setting $K = 1$.
In other words, our proposed solution covers the above designs as special cases, which further supports the correctness and optimality of our proposed solution.

E. Discussions

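The optimal structure of $\mathbf{F}_k$ in (34) is derived under the condition $\boldsymbol{\Psi}_k \propto \mathbf{I}$ or $\boldsymbol{\Sigma}_k \propto \mathbf{I}$. This condition is readily satisfied in several practical settings. We notice that the expressions for $\boldsymbol{\Psi}_k$ and $\boldsymbol{\Sigma}_k$ generally depend on the specific channel estimation algorithm. Denote the transmit and receive antenna correlation matrices and the channel estimation error variance in the $k$th hop as $\mathbf{R}_{T,k}$, $\mathbf{R}_{R,k}$ and $\sigma_{e,k}^{2}$, respectively. Applying the widely used channel estimation algorithms in [17] and [18], the covariance matrices for channel estimation errors can be written as $\boldsymbol{\Psi}_k = \mathbf{R}_{T,k}$ and $\boldsymbol{\Sigma}_k = \sigma_{e,k}^{2}(\mathbf{I}_{M_k} + \sigma_{e,k}^{2}\mathbf{R}_{R,k}^{-1})^{-1}$. Clearly, when the receive antennas are spaced widely, i.e., $\mathbf{R}_{R,k} \propto \mathbf{I}_{M_k}$, we directly have $\boldsymbol{\Sigma}_k \propto \mathbf{I}_{M_k}$. Moreover, when the length of training is long, the value of $\sigma_{e,k}^{2}$ will be small and $\mathbf{I}_{M_k} + \sigma_{e,k}^{2}\mathbf{R}_{R,k}^{-1} \approx \mathbf{I}_{M_k}$. As a result, $\boldsymbol{\Sigma}_k$ can be approximated as a scaled identity matrix even when $\mathbf{R}_{R,k} \not\propto \mathbf{I}_{M_k}$. Furthermore, when the channel statistics are unknown and the least-squares channel estimator is applied, it can be derived that $\boldsymbol{\Sigma}_k \propto \mathbf{I}_{M_k}$ [11]. On the other hand, when the transmit antennas are spaced widely, i.e., $\mathbf{R}_{T,k} \propto \mathbf{I}_{N_k}$, we obtain $\boldsymbol{\Psi}_k \propto \mathbf{I}_{N_k}$.

For the general case when $\boldsymbol{\Psi}_k \not\propto \mathbf{I}_{N_k}$ and $\boldsymbol{\Sigma}_k \not\propto \mathbf{I}_{M_k}$, to the best of our knowledge, finding a closed-form optimal solution of the robust design problem is still open, even for point-to-point MIMO systems [17], [20]. The main difficulty comes from the fact that when $\boldsymbol{\Psi}_k \not\propto \mathbf{I}_{N_k}$ and $\boldsymbol{\Sigma}_k \not\propto \mathbf{I}_{M_k}$, $\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k}$ varies with $\mathbf{F}_k$, and so is not a constant. However, for this general case, $\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k}$ in (33) can be replaced by an upper bound
$$\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k} \preceq \frac{P_k\lambda(\boldsymbol{\Psi}_k)}{P_k\lambda(\boldsymbol{\Psi}_k)\alpha_k+\sigma_{n_k}^{2}}\boldsymbol{\Sigma}_k + \frac{\sigma_{n_k}^{2}}{P_k\lambda(\boldsymbol{\Psi}_k)\alpha_k+\sigma_{n_k}^{2}}\mathbf{I}_{M_k}, \tag{44}$$
such that it is not a function of $\mathbf{F}_k$. Notice that equality in (44) holds when $\boldsymbol{\Psi}_k \propto \mathbf{I}_{N_k}$ or $\boldsymbol{\Sigma}_k \propto \mathbf{I}_{M_k}$. The proposed solution can then still be applied in this general case.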
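The long-training argument above, namely that $\boldsymbol{\Sigma}_k = \sigma_{e,k}^{2}(\mathbf{I}_{M_k} + \sigma_{e,k}^{2}\mathbf{R}_{R,k}^{-1})^{-1}$ approaches a scaled identity as $\sigma_{e,k}^{2}$ shrinks, can be illustrated numerically. The sketch below is a hedged example: it assumes an exponential receive correlation matrix with a hypothetical coefficient of 0.6 and four antennas, and the function names are illustrative only.

```python
import numpy as np

def exp_corr(n, rho):
    """Exponential correlation matrix with entries rho**|i-j| (illustrative choice)."""
    idx = np.arange(n)
    return float(rho) ** np.abs(idx[:, None] - idx[None, :])

def sigma_err(R_R, sigma_e2):
    """Sigma_k = sigma_e^2 (I + sigma_e^2 R_R^{-1})^{-1}."""
    M = R_R.shape[0]
    return sigma_e2 * np.linalg.inv(np.eye(M) + sigma_e2 * np.linalg.inv(R_R))

def deviation_from_identity(R_R, sigma_e2):
    """Spectral-norm distance between Sigma_k / sigma_e^2 and the identity."""
    S = sigma_err(R_R, sigma_e2) / sigma_e2
    return np.linalg.norm(S - np.eye(R_R.shape[0]), 2)

R_R = exp_corr(4, 0.6)                            # correlated receive antennas
for s2 in (0.1, 0.01, 0.001):
    print(f"sigma_e^2 = {s2}: deviation = {deviation_from_identity(R_R, s2):.4f}")
```

The deviation shrinks roughly in proportion to $\sigma_{e,k}^{2}$, which is why the condition $\boldsymbol{\Sigma}_k \propto \mathbf{I}_{M_k}$ is a reasonable approximation for accurate channel estimates even with correlated receive antennas.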

When there are two hops ($K = 2$), our proposed optimal structure is different from that derived in [14] (comparing (34) with Equation (16) in [14]). In [14], the solution structure is obtained by implicitly approximating a design-variable-dependent covariance matrix (the matrix A in [14]) as constant. Since the approximation is tight only when the covariance matrix of channel estimation errors seen from the receiver side is proportional to an identity matrix, i.e., $\boldsymbol{\Psi}_k \propto \mathbf{I}_{N_k}$, $k = 1, 2$, the proposed solution in [14] is sub-optimal when $\boldsymbol{\Psi}_k \not\propto \mathbf{I}_{N_k}$, $k = 1, 2$. In other words, for dual-hop systems, our proposed solution is optimal under a wider range of cases than that in [14], since it is optimal when either $\boldsymbol{\Sigma}_k$ or $\boldsymbol{\Psi}_k$ is proportional to an identity matrix.

With respect to complexity, it is clear from (34) that the complexity of our algorithm is due to two kinds of operations, i.e., the iterative water-filling computation for the inner diagonal matrix in (34) and the decomposition/multiplication for the matrices on the left-hand and right-hand sides of the diagonal matrix in (34). Comparing the structures of the solution in (34) and that in [14], similar operations are needed to obtain the solution in [14]. We can therefore expect the complexity of our approach to be comparable to that in [14].

V. SIMULATION RESULTS

In this section, the performance of the proposed robust designs is evaluated by simulations. In the simulations, the number of antennas at each node is set to four. At the source node, four independent data streams are transmitted, and in each data stream $N_{\mathrm{Data}} = 10$ independent quadrature phase-shift keying (QPSK) symbols are transmitted. The correlation matrices corresponding to the channel estimation errors are chosen according to the widely used exponential model, i.e., $[\boldsymbol{\Psi}_k]_{i,j} = \sigma_e^{2}\alpha^{|i-j|}$ and $[\boldsymbol{\Sigma}_k]_{i,j} = \beta^{|i-j|}$, where $\alpha$ and $\beta$ are the correlation coefficients, and $\sigma_e^{2}$ denotes the variance of the channel estimation error [12], [20]. The estimated channel matrices $\bar{\mathbf{H}}_k$ are generated following the widely used complex Gaussian distributions, $\bar{\mathbf{H}}_k \sim \mathcal{CN}_{M_k,N_k}(\mathbf{0}_{M_k,N_k}, (1-\sigma_e^{2})/\sigma_e^{2}\,\boldsymbol{\Sigma}_k\otimes\boldsymbol{\Psi}_k^{T})$ [12], [28], such that channel realizations $\mathbf{H}_k = \bar{\mathbf{H}}_k + \Delta\mathbf{H}_k$ have unit variance. The signal-to-noise ratio (SNR) for the $k$th link is defined as $P_k/\sigma_{n_k}^{2}$, and each point in the following figures shows the result averaged over the simulated trials.

A dual-hop system ($K = 2$) with error correlation coefficients of $\alpha = 0.[\ldots]$ and $\beta = 0$ (i.e., $\boldsymbol{\Psi}_k \not\propto \mathbf{I}$, $\boldsymbol{\Sigma}_k \propto \mathbf{I}$, $k = 1, 2$) is considered first. Fig. 2 shows the weighted MSE at the destination when the weighting matrix is arbitrarily chosen as $\mathbf{W} = \mathrm{diag}\{[0.[\ldots]\ \ \ldots\ \ 0.26\ \ 0.[\ldots]]\}$ and $P_k/\sigma_{n_k}^{2} = 30$ dB. For comparison, the performance of the algorithm based on the estimated channel only (labeled as non-robust design) [8], the robust algorithm proposed by Rong in [14] and the robust algorithm without source precoding in [12] is also shown. It is clear from the figure that our proposed robust design offers the best performance, while the non-robust design is the worst. Fig. 3 shows the sum-rates of various algorithms for the considered two-hop AF MIMO relaying system. It can be seen that the robust algorithms generally have better performance than the algorithm based on estimated CSI only. Furthermore, the performance of the proposed robust design is much better than that of the robust algorithm in [14].

Next, a three-hop AF MIMO relaying system, i.e., $K = 3$, is considered to further investigate the effectiveness of the proposed robust design. Since there are few (if any) robust transceiver design algorithms proposed for multi-hop AF MIMO systems in the literature, our proposed robust design is mainly compared with the non-robust design in [10] in the following. With the weighting matrix being arbitrarily selected as $\mathbf{W} = \mathrm{diag}\{[0.26\ \ 0.25\ \ 0.25\ \ 0.[\ldots]]\}$, Fig. 4 shows the weighted MSE at the destination when $P_k/\sigma_{n_k}^{2} = 30$ dB. Here two sets of error correlation coefficients, $(\alpha = 0.6, \beta = 0)$ and $(\alpha = 0, \beta = 0.6)$, are taken as examples. They correspond to the cases of $(\boldsymbol{\Psi}_k \not\propto \mathbf{I}, \boldsymbol{\Sigma}_k \propto \mathbf{I})$ and $(\boldsymbol{\Psi}_k \propto \mathbf{I}, \boldsymbol{\Sigma}_k \not\propto \mathbf{I})$, respectively. It can be seen that the proposed algorithm shows similar performance for the two cases and always outperforms the non-robust design based on the estimated CSI only. When there is no channel estimation error, i.e., $\sigma_e^{2} = 0$, the performance of the two algorithms is the same, as expected.

Fig. 5 shows the sum-rates at different SNRs ($\mathrm{SNR} = P_k/\sigma_{n_k}^{2}$) for the three-hop system. The SNRs at the various hops are set to be the same for simplicity. The correlation coefficients for the channel estimation errors are taken as $\alpha = 0.[\ldots]$ and $\beta = 0$. It is further demonstrated that the proposed algorithm shows better performance than the non-robust algorithm based on estimated CSI only. Furthermore, as the estimation errors increase, the performance gap between the two algorithms enlarges. This result coincides with that for the weighted-MSE-based robust design shown in Fig. 4. The performance of the maximum MSE across four data streams with $\alpha = 0$ and $\beta = 0.[\ldots]$ is then shown in Fig. 6. Similarly, it is observed that the performance gain of the proposed robust design over the non-robust design with estimated CSI only becomes larger as SNR increases. The performance gap is also more apparent when $\sigma_e^{2}$ increases.

Finally, Fig. 7 shows the bit-error-rate (BER) performance for the three-hop systems with different design criteria: capacity maximization, sum MSE minimization (i.e., weighted MSE minimization with $\mathbf{W} = \mathbf{I}$) and MAX-MSE minimization. The parameters are chosen as $\alpha = 0.[\ldots]$, $\beta = 0$ and $\sigma_e^{2} = 0.[\ldots]$. It can be seen that in terms of BER performance, the former two criteria perform worse than the latter one, since the latter criterion targets the BER performance more directly. Moreover, the non-robust design with capacity maximization based on estimated CSI only is also given, and the results further verify the performance advantage of the proposed robust designs over the non-robust design with estimated CSI only.

VI. CONCLUSIONS

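Bayesian robust transceiver design for multi-hop AF MIMO relaying systems with channel estimation errors has been considered. Various transceiver design criteria including weighted MSE minimization, capacity maximization, worst-MSE minimization and weighted sum-rate maximization have been discussed and formulated into a unified optimization problem. Using majorization theory and properties of matrix-variate functions, the optimal structure of the robust transceivers has been derived. Then the transceiver design problems have been greatly simplified and solved by an iterative water-filling algorithm. The performance of the proposed transceiver designs has been demonstrated via simulation results.

APPENDIX A
PROOF OF PROPERTY 1

Obj 1: With the objective function of (6) and the MMSE matrix in (18), we have
$$f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) = \mathrm{Tr}(\mathbf{W}) - \mathrm{Tr}(\mathbf{W}\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) \geq \underbrace{\mathrm{Tr}(\mathbf{W}) - \sum_{i=1}^{N}\lambda_i(\mathbf{W})\lambda_i(\boldsymbol{\Theta})}_{\triangleq\, g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))} \tag{45}$$
where the inequality follows from the fact that for two positive semi-definite matrices $\mathbf{W}$ and $\boldsymbol{\Theta}$, $\mathrm{Tr}(\mathbf{W}\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) \leq \sum_i\lambda_i(\mathbf{W})\lambda_i(\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)$, with $\lambda_i(\mathbf{Z})$ denoting the $i$th largest eigenvalue of $\mathbf{Z}$. Furthermore, equality in (45) holds when $\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0 = \mathbf{U}_{\mathbf{W}}\mathrm{diag}(\boldsymbol{\lambda}(\boldsymbol{\Theta}))\mathbf{U}_{\mathbf{W}}^{H}$, where $\boldsymbol{\lambda}(\boldsymbol{\Theta}) = [\lambda_1(\boldsymbol{\Theta}), \cdots, \lambda_N(\boldsymbol{\Theta})]^{T}$ and $\mathbf{U}_{\mathbf{W}}$ is the unitary matrix containing the eigenvectors of $\mathbf{W}$ as columns [32]. It implies that the optimal value of $f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)$ is $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ and is achieved when $\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0 = \mathbf{U}_{\mathbf{W}}\mathrm{diag}(\boldsymbol{\lambda}(\boldsymbol{\Theta}))\mathbf{U}_{\mathbf{W}}^{H}$.

Using the Lemma [32] and the definition of $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ in (45), it can be easily found that $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ is a Schur-concave function with respect to $\boldsymbol{\lambda}(\boldsymbol{\Theta})$. Furthermore, for two vectors $\mathbf{v} \leq \mathbf{u}$ (i.e., $v_i \leq u_i$), from the definition of $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ in (45), it can be concluded that $g(\mathbf{v}) \geq g(\mathbf{u})$. This means that $g(\cdot)$ is a decreasing function.

Obj 2: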

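For the second objective function given by (8), it is directly obtained that
$$f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) = \log|\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0| = \underbrace{\sum_{i=1}^{N}\log[1-\lambda_i(\boldsymbol{\Theta})]}_{\triangleq\, g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))}. \tag{46}$$
The above equality holds unconditionally, and thus the objective function $f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)$ is independent of $\mathbf{Q}_0$. It follows from the optimization problem (20) that $\mathbf{Q}_0$ can be an arbitrary unitary matrix, since it is only involved in the constraint $\mathbf{Q}_0^{H}\mathbf{Q}_0 = \mathbf{I}$. Therefore, $\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0 = \mathbf{U}_{\mathrm{Arb}}\mathrm{diag}(\boldsymbol{\lambda}(\boldsymbol{\Theta}))\mathbf{U}_{\mathrm{Arb}}^{H}$, with $\mathbf{U}_{\mathrm{Arb}}$ being an arbitrary unitary matrix, always holds. Meanwhile, the optimal value of $f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)$ can always be written as $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$. Based on the Lemma [32] and the definition of $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ in (46), it can also be proved that $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ is a decreasing Schur-concave function with respect to $\boldsymbol{\lambda}(\boldsymbol{\Theta})$.

Obj 3: For the diagonal elements of the positive semi-definite matrix $\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K}) = \mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0$, we have the following majorization relationship [32]:
$$\mathbf{d}(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) \succ \mathbf{1}_N - \Big(\sum_{i=1}^{N}\lambda_i(\boldsymbol{\Theta})/N\Big)\otimes\mathbf{1}_N \tag{47}$$
where the equality holds if and only if $[\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0]_{i,i} = \sum_{i=1}^{N}\lambda_i(\boldsymbol{\Theta})/N$, and $\mathbf{1}_N$ is the $N \times 1$ all-one vector.

For the third objective function in (9), as $\psi(\cdot)$ is increasing and Schur-convex, the objective function in (20) satisfies [24]
$$f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) = \psi(\mathbf{d}(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)) \geq \underbrace{\psi\Big(\mathbf{1}_N - \Big(\sum_{i=1}^{N}\lambda_i(\boldsymbol{\Theta})/N\Big)\otimes\mathbf{1}_N\Big)}_{\triangleq\, g[\boldsymbol{\lambda}(\boldsymbol{\Theta})]}, \tag{48}$$
with equality if and only if $[\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0]_{i,i} = \sum_{i=1}^{N}\lambda_i(\boldsymbol{\Theta})/N$. As shown in [24], when $\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0 = \mathbf{Q}_{\mathrm{DFT}}\mathrm{diag}(\boldsymbol{\lambda}(\boldsymbol{\Theta}))\mathbf{Q}_{\mathrm{DFT}}^{H}$, where $\mathbf{Q}_{\mathrm{DFT}}$ is a DFT matrix, $\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0$ has identical diagonal elements. It follows that when $\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0 = \mathbf{Q}_{\mathrm{DFT}}\mathrm{diag}(\boldsymbol{\lambda}(\boldsymbol{\Theta}))\mathbf{Q}_{\mathrm{DFT}}^{H}$, the objective function $f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)$ takes its minimum/optimal value of $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$.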
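The equal-diagonal property of the DFT rotation used in the Obj 3 argument is easy to confirm numerically. The following minimal sketch assumes a unitary (normalized) DFT matrix and an arbitrary, hypothetical eigenvalue vector; the function name is illustrative.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

lam = np.array([0.9, 0.6, 0.3, 0.1])             # hypothetical eigenvalues of Theta
Q = dft_matrix(lam.size)
M = Q @ np.diag(lam) @ Q.conj().T                # Q_DFT diag(lambda) Q_DFT^H

print("diagonal of rotated matrix:", np.diag(M).real)
print("mean of eigenvalues       :", lam.mean())
```

Every entry of the DFT matrix has squared magnitude $1/N$, so each diagonal element of the rotated matrix is the average of the eigenvalues, which is exactly the equality condition stated for (47) and (48).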

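Based on the fact that $\psi(\cdot)$ is an increasing and Schur-convex function, it can be directly concluded from (48) that $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ is a decreasing function of $\boldsymbol{\lambda}(\boldsymbol{\Theta})$. Furthermore, based on the Lemma [32], $\psi(\cdot)$ is also a Schur-concave function of $\boldsymbol{\lambda}(\boldsymbol{\Theta})$.

Obj 4: Notice that for the positive semi-definite matrix $\boldsymbol{\Phi}_{\mathrm{MMSE}}(\{\mathbf{P}_k\}_{k=1}^{K}) = \mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0$, $\mathbf{d}(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) \prec \boldsymbol{\lambda}(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)$ [24]. With the Schur-concave function $\psi(\cdot)$ in (11), we have
$$f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0) = \psi(\mathbf{d}(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)) \geq \underbrace{\psi([\mathbf{1}_N - \boldsymbol{\lambda}(\boldsymbol{\Theta})])}_{\triangleq\, g[\boldsymbol{\lambda}(\boldsymbol{\Theta})]}, \tag{49}$$
where the equality holds when $[\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0]_{i,i} = \lambda_i(\boldsymbol{\Theta})$. It is easy to see that when $\mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0 = \mathbf{I}_N\mathrm{diag}(\boldsymbol{\lambda}(\boldsymbol{\Theta}))\mathbf{I}_N$, the preceding condition is satisfied and the objective function $f(\mathbf{I}_N - \mathbf{Q}_0^{H}\boldsymbol{\Theta}\mathbf{Q}_0)$ achieves its minimum/optimal value of $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$.

Since $\psi(\cdot)$ is increasing and Schur-concave, it is clear that $g(\boldsymbol{\lambda}(\boldsymbol{\Theta}))$ is decreasing with respect to $\boldsymbol{\lambda}(\boldsymbol{\Theta})$. Moreover, using [32, 3.A.6.a], it can be proved that $\psi(\mathbf{1}_N - \boldsymbol{\lambda}(\boldsymbol{\Theta}))$ is also Schur-concave with respect to $\boldsymbol{\lambda}(\boldsymbol{\Theta})$.

APPENDIX B
PROOF OF PROPERTY 2

For two matrices $\mathbf{A}$ and $\mathbf{B}$ with compatible dimensions, $\lambda_i(\mathbf{A}\mathbf{B}) = \lambda_i(\mathbf{B}\mathbf{A})$ [32, 9.A.1.a]. Together with the fact that for two positive semi-definite matrices $\mathbf{A}$ and $\mathbf{B}$, $\prod_{i=1}^{k}\lambda_i(\mathbf{A}\mathbf{B}) \leq \prod_{i=1}^{k}\lambda_i(\mathbf{A})\lambda_i(\mathbf{B})$ [32, 9.H.1.a], we have
$$\prod_{i=1}^{k}\lambda_i(\mathbf{A}_1^{H}\mathbf{Q}_1^{H}\cdots\mathbf{A}_K^{H}\mathbf{Q}_K^{H}\mathbf{Q}_K\mathbf{A}_K\cdots\mathbf{Q}_1\mathbf{A}_1) \leq \prod_{i=1}^{k}\lambda_i(\mathbf{A}_2^{H}\mathbf{Q}_2^{H}\cdots\mathbf{A}_K^{H}\mathbf{Q}_K^{H}\mathbf{Q}_K\mathbf{A}_K\cdots\mathbf{Q}_2\mathbf{A}_2)\underbrace{\lambda_i(\mathbf{Q}_1\mathbf{A}_1\mathbf{A}_1^{H}\mathbf{Q}_1^{H})}_{=\lambda_i(\mathbf{A}_1\mathbf{A}_1^{H})}, \quad k = 1, \cdots, N. \tag{50}$$
Repeating this process, we have the following inequality:
$$\prod_{i=1}^{k}\lambda_i(\mathbf{A}_1^{H}\mathbf{Q}_1^{H}\cdots\mathbf{A}_K^{H}\mathbf{Q}_K^{H}\mathbf{Q}_K\mathbf{A}_K\cdots\mathbf{Q}_1\mathbf{A}_1) \leq \prod_{i=1}^{k}\underbrace{\lambda_i(\mathbf{A}_K^{H}\mathbf{A}_K)\lambda_i(\mathbf{A}_{K-1}^{H}\mathbf{A}_{K-1})\cdots\lambda_i(\mathbf{A}_1^{H}\mathbf{A}_1)}_{\triangleq\,\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})}. \tag{51}$$
Note that in general $\mathbf{A}_k$ is not a square matrix.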
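The two eigenvalue facts invoked at the start of Appendix B, $\lambda_i(\mathbf{A}\mathbf{B}) = \lambda_i(\mathbf{B}\mathbf{A})$ and $\prod_{i=1}^{k}\lambda_i(\mathbf{A}\mathbf{B}) \leq \prod_{i=1}^{k}\lambda_i(\mathbf{A})\lambda_i(\mathbf{B})$, can be spot-checked on random matrices. The sketch below is an illustrative numerical check, not a proof: the helper names, the well-conditioned positive definite test matrices and the tolerance are assumptions chosen for the example.

```python
import numpy as np

def sorted_eigs(M):
    """Eigenvalues in decreasing order (real parts; the spectra used here are real)."""
    return np.sort(np.linalg.eigvals(M).real)[::-1]

def random_pd(n, rng):
    """Random Hermitian positive definite matrix X X^H + I (well conditioned)."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + np.eye(n)

def product_inequality_holds(A, B, rtol=1e-8):
    """Check prod_{i<=k} lam_i(AB) <= prod_{i<=k} lam_i(A) lam_i(B) for every k."""
    lhs = np.cumprod(sorted_eigs(A @ B))
    rhs = np.cumprod(sorted_eigs(A) * sorted_eigs(B))
    return bool(np.all(lhs <= rhs * (1.0 + rtol)))

rng = np.random.default_rng(1)
A, B = random_pd(4, rng), random_pd(4, rng)
print("same spectrum for AB and BA:", np.allclose(sorted_eigs(A @ B), sorted_eigs(B @ A)))
print("product inequality holds   :", product_inequality_holds(A, B))
```

For $k = N$ both sides equal $\det(\mathbf{A})\det(\mathbf{B})$, so the check uses a small relative tolerance to absorb floating-point error at that last index.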

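Based on (51) and [32], we directly have
$$\boldsymbol{\lambda}(\mathbf{A}_1^{H}\mathbf{Q}_1^{H}\cdots\mathbf{A}_K^{H}\mathbf{Q}_K^{H}\mathbf{Q}_K\mathbf{A}_K\cdots\mathbf{Q}_1\mathbf{A}_1) \prec_w [\gamma_1(\{\mathbf{F}_k\}_{k=1}^{K}) \cdots \gamma_N(\{\mathbf{F}_k\}_{k=1}^{K})]^{T} \triangleq \boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K}) \tag{52}$$
where $\mathbf{a} \prec_w \mathbf{b}$ denotes that $\mathbf{a}$ is weakly majorized by $\mathbf{b}$ [32], and the equality holds if and only if the neighboring $\mathbf{A}_k$'s satisfy
$$\mathbf{Q}_k = \mathbf{V}_{\mathbf{A}_{k+1}}\mathbf{U}_{\mathbf{A}_k}^{H}, \quad k = 1, \cdots, K-1, \tag{53}$$
where $\mathbf{U}_{\mathbf{A}_k}$ and $\mathbf{V}_{\mathbf{A}_k}$ are defined based on the following singular value decomposition: $\mathbf{A}_k = \mathbf{U}_{\mathbf{A}_k}\boldsymbol{\Lambda}_{\mathbf{A}_k}\mathbf{V}_{\mathbf{A}_k}^{H}$ with $\boldsymbol{\Lambda}_{\mathbf{A}_k} \searrow$ (diagonal elements in decreasing order). As $g(\cdot)$ is a decreasing and Schur-concave function, we have [32]
$$g[\boldsymbol{\lambda}(\boldsymbol{\Theta})] \geq g[\boldsymbol{\gamma}(\{\mathbf{F}_k\}_{k=1}^{K})] \tag{54}$$
with equality if and only if (53) holds. Finally, based on the definition of $\mathbf{A}_k$ in (18), using the matrix inversion lemma, the following equality holds:
$$\mathbf{A}_k^{H}\mathbf{A}_k = \mathbf{I}_{M_k} - (\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k + \mathbf{I}_{M_k})^{-1}. \tag{55}$$
It follows that $\lambda_i(\mathbf{A}_k^{H}\mathbf{A}_k) = \lambda_i(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)/[1+\lambda_i(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)]$. Based on this result, $\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})$ in (51) equals
$$\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K}) = \prod_{k=1}^{K}\frac{\lambda_i(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}{1+\lambda_i(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)}. \tag{56}$$

APPENDIX C
PROOF OF PROPERTY 3

Suppose that, for the optimal $\mathbf{F}_k$, denoted by $\mathbf{F}_{k,\mathrm{opt}}$, transmission is not at the maximum power, i.e., $\mathrm{Tr}(\mathbf{F}_{k,\mathrm{opt}}\mathbf{F}_{k,\mathrm{opt}}^{H}) < P_k$; then we have $a \triangleq \sqrt{P_k/\mathrm{Tr}(\mathbf{F}_{k,\mathrm{opt}}\mathbf{F}_{k,\mathrm{opt}}^{H})} > 1$. Defining $\hat{\mathbf{F}}_k \triangleq a\mathbf{F}_{k,\mathrm{opt}}$, it follows that
$$\begin{aligned}
\hat{\mathbf{F}}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\hat{\mathbf{F}}_k}^{-1}\bar{\mathbf{H}}_k\hat{\mathbf{F}}_k &= \mathbf{F}_{k,\mathrm{opt}}^{H}\bar{\mathbf{H}}_k^{H}(\mathrm{Tr}(\mathbf{F}_{k,\mathrm{opt}}\mathbf{F}_{k,\mathrm{opt}}^{H}\boldsymbol{\Psi}_k)\boldsymbol{\Sigma}_k + \sigma_{n_k}^{2}/a^{2}\,\mathbf{I})^{-1}\bar{\mathbf{H}}_k\mathbf{F}_{k,\mathrm{opt}} \\
&\succeq \mathbf{F}_{k,\mathrm{opt}}^{H}\bar{\mathbf{H}}_k^{H}\underbrace{(\mathrm{Tr}(\mathbf{F}_{k,\mathrm{opt}}\mathbf{F}_{k,\mathrm{opt}}^{H}\boldsymbol{\Psi}_k)\boldsymbol{\Sigma}_k + \sigma_{n_k}^{2}\mathbf{I})^{-1}}_{\mathbf{K}_{\mathbf{F}_{k,\mathrm{opt}}}^{-1}}\bar{\mathbf{H}}_k\mathbf{F}_{k,\mathrm{opt}}.
\end{aligned} \tag{57}$$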

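Note that $\mathbf{A} \succeq \mathbf{B}$ implies $\lambda_i(\mathbf{A}) \geq \lambda_i(\mathbf{B})$ for all $i$, and therefore (57) implies
$$\lambda_i(\hat{\mathbf{F}}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\hat{\mathbf{F}}_k}^{-1}\bar{\mathbf{H}}_k\hat{\mathbf{F}}_k) \geq \lambda_i(\mathbf{F}_{k,\mathrm{opt}}^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_{k,\mathrm{opt}}}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_{k,\mathrm{opt}}). \tag{58}$$
Moreover, it is clear from the definition of $\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})$ in (27) that $\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})$ is an increasing function of $\lambda_i(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)$. It then follows that $\boldsymbol{\gamma}(\{\mathbf{F}_k = \hat{\mathbf{F}}_k\}_{k=1}^{K}) \geq \boldsymbol{\gamma}(\{\mathbf{F}_k = \mathbf{F}_{k,\mathrm{opt}}\}_{k=1}^{K})$. Together with the fact that $g(\cdot)$ is a decreasing function, it is concluded that $g[\boldsymbol{\gamma}(\{\mathbf{F}_k = \hat{\mathbf{F}}_k\}_{k=1}^{K})] \leq g[\boldsymbol{\gamma}(\{\mathbf{F}_k = \mathbf{F}_{k,\mathrm{opt}}\}_{k=1}^{K})]$. This result contradicts the optimality of $\mathbf{F}_{k,\mathrm{opt}}$, and therefore a necessary condition for the optimal $\mathbf{F}_k$ is $\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}) = P_k$.

Furthermore, when $\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}) = P_k$, the following equality holds:
$$\mathrm{Tr}[\mathbf{F}_k\mathbf{F}_k^{H}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I})] = \alpha_k P_k\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}\boldsymbol{\Psi}_k) + \sigma_{n_k}^{2}\underbrace{\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H})}_{=P_k} = \alpha_k P_k\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}\boldsymbol{\Psi}_k) + \sigma_{n_k}^{2}P_k. \tag{59}$$
Defining $\eta_{f_k} = \alpha_k\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}\boldsymbol{\Psi}_k) + \sigma_{n_k}^{2}$ with $\alpha_k = \mathrm{Tr}(\boldsymbol{\Sigma}_k)/M_k$, (59) can be rewritten as $\mathrm{Tr}[\mathbf{F}_k\mathbf{F}_k^{H}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I})] = P_k\eta_{f_k}$. In other words, the power constraint $\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}) = P_k$ is equivalent to
$$\mathrm{Tr}[\mathbf{F}_k\mathbf{F}_k^{H}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I})]/\eta_{f_k} = P_k. \tag{60}$$

APPENDIX D
PROOF OF PROPERTY 4

Problem reformulation: As shown in (27), $\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})$ is a complicated function of $\boldsymbol{\lambda}(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)$. Clearly, $\mathbf{F}_k$ appears in multiple positions. In particular, $\mathbf{K}_{\mathbf{F}_k}$ is a function of $\mathbf{F}_k$, which complicates the derivation of optimal solutions. In order to simplify the problem, $\boldsymbol{\lambda}(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k)$ is reformulated as
$$\boldsymbol{\lambda}(\mathbf{F}_k^{H}\bar{\mathbf{H}}_k^{H}\mathbf{K}_{\mathbf{F}_k}^{-1}\bar{\mathbf{H}}_k\mathbf{F}_k) = \boldsymbol{\lambda}[\tilde{\mathbf{F}}_k^{H}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I}_{N_k})^{-1/2}\bar{\mathbf{H}}_k^{H}(\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k})^{-1/2} \times \underbrace{(\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k})^{-1/2}\bar{\mathbf{H}}_k(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I}_{N_k})^{-1/2}}_{\triangleq\,\mathcal{H}_k}\tilde{\mathbf{F}}_k], \tag{61}$$
where $\tilde{\mathbf{F}}_k$ is defined as
$$\tilde{\mathbf{F}}_k = \frac{1}{\sqrt{\eta_{f_k}}}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I}_{N_k})^{1/2}\mathbf{F}_k. \tag{62}$$
The right-hand side of (61) is easier to handle than the left-hand side. This is because when $\boldsymbol{\Psi}_k \propto \mathbf{I}_{N_k}$ or $\boldsymbol{\Sigma}_k \propto \mathbf{I}_{M_k}$, $\mathcal{H}_k$ is independent of $\tilde{\mathbf{F}}_k$. In the following, we prove this in detail.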

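It is obvious that $\mathcal{H}_k$ being independent of $\tilde{\mathbf{F}}_k$ is equivalent to $\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k}$ being independent of $\tilde{\mathbf{F}}_k$. First consider $\boldsymbol{\Psi}_k \propto \mathbf{I}_{N_k}$, i.e., $\boldsymbol{\Psi}_k = \beta_k\mathbf{I}_{N_k}$. With the definitions of $\mathbf{K}_{\mathbf{F}_k}$ in (17) and $\eta_{f_k}$, $\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k}$ equals
$$\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k} = \frac{\beta_k\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H})\boldsymbol{\Sigma}_k + \sigma_{n_k}^{2}\mathbf{I}_{M_k}}{\beta_k\alpha_k\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}) + \sigma_{n_k}^{2}} = \frac{\beta_k P_k\boldsymbol{\Sigma}_k + \sigma_{n_k}^{2}\mathbf{I}_{M_k}}{\alpha_k\beta_k P_k + \sigma_{n_k}^{2}}, \tag{63}$$
where the second equality is based on the fact that $\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}) = P_k$ for the optimal $\mathbf{F}_k$. On the other hand, when $\boldsymbol{\Sigma}_k \propto \mathbf{I}_{M_k}$ (i.e., $\boldsymbol{\Sigma}_k = \alpha_k\mathbf{I}_{M_k}$), $\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k}$ equals
$$\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k} = \frac{\alpha_k\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}\boldsymbol{\Psi}_k)\mathbf{I}_{M_k} + \sigma_{n_k}^{2}\mathbf{I}_{M_k}}{\alpha_k\mathrm{Tr}(\mathbf{F}_k\mathbf{F}_k^{H}\boldsymbol{\Psi}_k) + \sigma_{n_k}^{2}} = \mathbf{I}_{M_k}. \tag{64}$$
Therefore, when $\boldsymbol{\Psi}_k \propto \mathbf{I}_{N_k}$ or $\boldsymbol{\Sigma}_k \propto \mathbf{I}_{M_k}$, $\mathbf{K}_{\mathbf{F}_k}/\eta_{f_k}$ is independent of $\tilde{\mathbf{F}}_k$.

Using the substitution (62), the optimization problem (32) is reformulated as
$$\begin{aligned}
\min_{\tilde{\mathbf{F}}_k} \quad & g[\boldsymbol{\gamma}(\{\tilde{\mathbf{F}}_k\}_{k=1}^{K})] \\
\text{s.t.} \quad & \gamma_i(\{\tilde{\mathbf{F}}_k\}_{k=1}^{K}) = \prod_{k=1}^{K}\frac{\lambda_i(\tilde{\mathbf{F}}_k^{H}\mathcal{H}_k^{H}\mathcal{H}_k\tilde{\mathbf{F}}_k)}{1+\lambda_i(\tilde{\mathbf{F}}_k^{H}\mathcal{H}_k^{H}\mathcal{H}_k\tilde{\mathbf{F}}_k)} \\
& \mathrm{Tr}(\tilde{\mathbf{F}}_k\tilde{\mathbf{F}}_k^{H}) = P_k.
\end{aligned} \tag{65}$$

Structure of optimal $\tilde{\mathbf{F}}_k$: For the optimal $\tilde{\mathbf{F}}_k$, denoted as $\tilde{\mathbf{F}}_{k,\mathrm{opt}}$, based on the following singular value decompositions:
$$\mathcal{H}_k\tilde{\mathbf{F}}_{k,\mathrm{opt}} = \mathbf{U}_{\mathcal{M}_k}\boldsymbol{\Lambda}_{\mathcal{M}_k}\mathbf{V}_{\mathcal{M}_k}^{H} \ \text{with} \ \boldsymbol{\Lambda}_{\mathcal{M}_k} \searrow \quad \text{and} \quad \mathcal{H}_k = \mathbf{U}_{\mathcal{H}_k}\boldsymbol{\Lambda}_{\mathcal{H}_k}\mathbf{V}_{\mathcal{H}_k}^{H} \ \text{with} \ \boldsymbol{\Lambda}_{\mathcal{H}_k} \searrow, \tag{66}$$
we can construct a matrix $\bar{\mathbf{F}}_k$,
$$\bar{\mathbf{F}}_k = \mathbf{V}_{\mathcal{H}_k}\boldsymbol{\Lambda}_{\mathbf{X}_k}\mathbf{V}_{\mathcal{M}_k}^{H}, \tag{67}$$
where $\boldsymbol{\Lambda}_{\mathbf{X}_k}$ is an unknown diagonal matrix with the same rank as $\boldsymbol{\Lambda}_{\mathcal{M}_k}$ and $\boldsymbol{\Lambda}_{\mathcal{H}_k}\boldsymbol{\Lambda}_{\mathbf{X}_k}/b = \boldsymbol{\Lambda}_{\mathcal{M}_k}$, and the scalar $b$ is chosen to make $\mathrm{Tr}(\bar{\mathbf{F}}_k\bar{\mathbf{F}}_k^{H}) = P_k$ hold.

Because $\mathcal{H}_k$ is independent of the unknown variable $\tilde{\mathbf{F}}_k$, using Lemma 12 in [24], we have
$$\bar{\mathbf{F}}_k^{H}\mathcal{H}_k^{H}\mathcal{H}_k\bar{\mathbf{F}}_k \succeq \tilde{\mathbf{F}}_{k,\mathrm{opt}}^{H}\mathcal{H}_k^{H}\mathcal{H}_k\tilde{\mathbf{F}}_{k,\mathrm{opt}}. \tag{68}$$

Taking eigenvalues of both sides, we have $\lambda_i(\bar{\mathbf{F}}_k^{H}\mathcal{H}_k^{H}\mathcal{H}_k\bar{\mathbf{F}}_k) \geq \lambda_i(\tilde{\mathbf{F}}_{k,\mathrm{opt}}^{H}\mathcal{H}_k^{H}\mathcal{H}_k\tilde{\mathbf{F}}_{k,\mathrm{opt}})$ [32]. Since $\gamma_i(\{\tilde{\mathbf{F}}_k\}_{k=1}^{K})$ is an increasing function of $\lambda_i(\tilde{\mathbf{F}}_k^{H}\mathcal{H}_k^{H}\mathcal{H}_k\tilde{\mathbf{F}}_k)$, we directly have that $\boldsymbol{\gamma}(\{\tilde{\mathbf{F}}_k = \bar{\mathbf{F}}_k\}_{k=1}^{K}) \geq \boldsymbol{\gamma}(\{\tilde{\mathbf{F}}_k = \tilde{\mathbf{F}}_{k,\mathrm{opt}}\}_{k=1}^{K})$. Furthermore, as the objective function $g(\cdot)$ of (65) is a decreasing function, we finally have $g[\boldsymbol{\gamma}(\{\tilde{\mathbf{F}}_k = \bar{\mathbf{F}}_k\}_{k=1}^{K})] \leq g[\boldsymbol{\gamma}(\{\tilde{\mathbf{F}}_k = \tilde{\mathbf{F}}_{k,\mathrm{opt}}\}_{k=1}^{K})]$. Because $\tilde{\mathbf{F}}_{k,\mathrm{opt}}$ is the optimal solution, $\tilde{\mathbf{F}}_{k,\mathrm{opt}}$ must be in the form of $\bar{\mathbf{F}}_k$. Therefore, the structure of the optimal $\tilde{\mathbf{F}}_k$ is given by (67), i.e., $\tilde{\mathbf{F}}_{k,\mathrm{opt}} = \mathbf{V}_{\mathcal{H}_k}\boldsymbol{\Lambda}_{\mathbf{X}_k}\mathbf{V}_{\mathcal{M}_k}^{H}$.

As the minimum dimension of $\mathbf{A}_k$ is $N$, on substituting (67) into $\gamma_i(\{\mathbf{F}_k\}_{k=1}^{K})$ in (51), it can be seen that for the optimal solution only the $N \times N$ principal submatrix of $\boldsymbol{\Lambda}_{\mathbf{X}_k}$ can be nonzero, which is denoted as $\boldsymbol{\Lambda}_{\mathbf{F}_k}$. As a result, $\tilde{\mathbf{F}}_{k,\mathrm{opt}}$ has the following structure:
$$\tilde{\mathbf{F}}_{k,\mathrm{opt}} = \mathbf{V}_{\mathcal{H}_k,N}\boldsymbol{\Lambda}_{\mathbf{F}_k}\mathbf{V}_{\mathcal{M}_k,N}^{H}. \tag{69}$$
It is clear that the values of the $\mathbf{V}_{\mathcal{M}_k}$'s do not affect the values of $\lambda_i(\tilde{\mathbf{F}}_k^{H}\mathcal{H}_k^{H}\mathcal{H}_k\tilde{\mathbf{F}}_k)$, the constraint $\mathrm{Tr}(\tilde{\mathbf{F}}_k\tilde{\mathbf{F}}_k^{H}) = P_k$ and the objective function in the optimization problem (65). Therefore, $\mathbf{V}_{\mathcal{M}_k}$ can be an arbitrary unitary matrix.

Structure of optimal $\mathbf{F}_k$: Based on the relationship between $\mathbf{F}_k$ and $\tilde{\mathbf{F}}_k$ given in (62),
$$\mathbf{F}_{k,\mathrm{opt}} = \sqrt{\eta_{f_k}}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I}_{N_k})^{-1/2}\tilde{\mathbf{F}}_{k,\mathrm{opt}}. \tag{70}$$
Putting the structure of $\mathbf{F}_{k,\mathrm{opt}}$ in (70) into $\eta_{f_k}$ in (31), $\eta_{f_k}$ can be solved to be
$$\eta_{f_k} = \frac{\sigma_{n_k}^{2}}{1-\alpha_k\mathrm{Tr}[\mathbf{V}_{\mathcal{H}_k,N}^{H}(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I}_{N_k})^{-1/2}\boldsymbol{\Psi}_k(\alpha_k P_k\boldsymbol{\Psi}_k + \sigma_{n_k}^{2}\mathbf{I}_{N_k})^{-1/2}\mathbf{V}_{\mathcal{H}_k,N}\boldsymbol{\Lambda}_{\mathbf{F}_k}^{2}]}. \tag{71}$$
Clearly, in (71), $\eta_{f_k}$ is a function of $\boldsymbol{\Lambda}_{\mathbf{F}_k}$, and it is denoted as $\eta_{f_k} = \xi_k(\boldsymbol{\Lambda}_{\mathbf{F}_k})$ for clarity.

REFERENCES

[1] J. N. Laneman, D. N. C. Tse, and G. W. Wornell, “Cooperative diversity in wireless networks: Efficient protocols and outage behavior,”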

IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3062–3080, Dec. 2004.
[2] S. Jin, M. R. McKay, C. Zhong, and K.-K. Wong, “Ergodic capacity analysis of amplify-and-forward MIMO dual-hop systems,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2204–2224, May 2010.
[3] O. Munoz-Medina, J. Vidal, and A. Agustin, “Linear transceiver design in nonregenerative relays with channel state information,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2593–2604, June 2007.
[4] X. Tang and Y. Hua, “Optimal design of non-regenerative MIMO wireless relays,” IEEE Trans. Wireless Commun., vol. 6, no. 4, pp. 1398–1407, Apr. 2007.
[5] W. Guan and H. Luo, “Joint MMSE transceiver design in non-regenerative MIMO relay systems,” IEEE Commun. Lett., vol. 12, no. 7, pp. 517–519, July 2008.
[6] F.-S. Tseng, W.-R. Wu, and J.-Y. Wu, “Joint source/relay precoder design in nonregenerative cooperative systems using an MMSE criterion,” IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 4928–4933, Oct. 2009.
[7] R. Mo and Y. Chew, “Precoder design for non-regenerative MIMO relay systems,” IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 5041–5049, Oct. 2009.
[8] Y. Rong, X. Tang, and Y. Hua, “A unified framework for optimizing linear nonregenerative multicarrier MIMO relay communication systems,” IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4837–4851, Dec. 2009.
[9] C. Li, X. Wang, L. Yang, and W.-P. Zhu, “A joint source and relay power allocation scheme for a class of MIMO relay systems,” IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4852–4860, Dec. 2009.
[10] Y. Rong and Y. Hua, “Optimality of diagonalization of multi-hop MIMO relays,” IEEE Trans. Wireless Commun., vol. 8, pp. 6068–6077, Dec. 2009.
[11] C. Xing, S. Ma, Y.-C. Wu, and T.-S. Ng, “Transceiver design for dual-hop non-regenerative MIMO-OFDM relay systems under channel uncertainties,” IEEE Trans. Signal Process., vol. 58, no. 12, pp. 6325–6339, Dec. 2010.
[12] C. Xing, S. Ma, and Y.-C. Wu, “Robust joint design of linear relay precoder and destination equalizer for dual-hop amplify-and-forward MIMO relay systems,” IEEE Trans. Signal Process., vol. 58, no. 4, pp. 2273–2283, Apr. 2010.
[13] B. K. Chalise and L. Vandendorpe, “Joint linear processing for an amplify-and-forward MIMO relay channel with imperfect channel state information,” EURASIP J. Adv. Signal Process., vol. 2010, Article ID 640186, doi:10.1155/2010/640186.
[14] Y. Rong, “Robust design for linear non-regenerative MIMO relays with imperfect channel state information,” IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2455–2460, May 2011.
[15] C. Xing, S. Ma, Z. Fei, Y.-C. Wu, and J. Kuang, “Joint robust weighted LMMSE transceiver design for dual-hop AF multiple-antenna relay systems,” in Proc. IEEE Global Commun. Conf., Houston, TX, USA, Dec. 2011.
[16] C. Xing, Z. Fei, Y.-C. Wu, S. Ma, and J. Kuang, “Robust transceiver design for AF MIMO relaying systems with column correlations,” in Proc. IEEE International Conference on Signal Processing, Communications and Computing, Xi’an, China, Sep. 2011.
[17] M. Ding and S. D. Blostein, “MIMO minimum total MSE transceiver design with imperfect CSI at both ends,” IEEE Trans. Signal Process., vol. 57, no. 3, pp. 1141–1150, Mar. 2009.
[18] M. Ding and S. D. Blostein, “Maximum mutual information design for MIMO systems with imperfect channel knowledge,” IEEE Trans. Inf. Theory, vol. 56, no. 10, pp. 4793–4801, Oct. 2010.
[19] J. Wang and D. P. Palomar, “Worst-case robust MIMO transmission with imperfect channel knowledge,” IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3575–3587, Sep. 2009.
[20] X. Zhang, D. P. Palomar, and B. Ottersten, “Statistically robust design of linear MIMO transceivers,” IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3678–3689, Aug. 2008.
[21] D. P. Palomar and Y. Jiang, “MIMO Transceiver Design via Majorization Theory,” Foundations and Trends in Communications and Information Theory, Now Publishers, vol. 3, no. 4–5, pp. 331–551, 2006.
[22] E. Jorswieck and H. Boche, “Majorization and Matrix-Monotone Functions in Wireless Communications,” Foundations and Trends in Communications and Information Theory, Now Publishers, vol. 3, no. 6, pp. 553–701, 2007.
[23] A. Beck, A. Ben-Tal, and Y. C. Eldar, “Robust mean-squared error estimation of multiple signals in linear systems affected by model and noise uncertainties,” Math. Programming, vol. 107, pp. 155–187, 2006.
[24] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint Tx-Rx beamforming design for multicarrier MIMO channels: A unified framework for convex optimization,” IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2381–2401, Sep. 2003.
[25] H. Sampath, P. Stoica, and A. Paulraj, “Generalized linear precoder and decoder design for MIMO channels using the weighted MMSE criterion,” IEEE Trans. Commun., vol. 49, no. 12, pp. 2198–2206, Dec. 2001.
[26] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links?” IEEE Trans. Inf. Theory, vol. 49, no. 4, pp. 951–963, Apr. 2003.
[27] T. Yoo and A. Goldsmith, “Capacity and power allocation for fading MIMO channels with channel estimation error,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2203–2214, May 2006.
[28] L. Musavian, M. R. Nakhai, M. Dohler, and A. H. Aghvami, “Effect of channel uncertainty on the mutual information of MIMO fading channels,” IEEE Trans. Veh. Technol., vol. 56, no. 5, pp. 2798–2806, Sep. 2007.
[29] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Englewood Cliffs, NJ: Prentice-Hall, 1993.
[30] W. Yu, W. Rhee, S. Boyd, and J. Cioffi, “Iterative water-filling for Gaussian vector multiple access channels,” IEEE Trans. Inf. Theory, vol. 50, no. 1, pp. 145–151, Jan. 2004.
[31] W. Zhang, U. Mitra, and M. Chiang, “Optimization of amplify-and-forward multicarrier two-hop transmission,” IEEE Trans. Commun., vol. 59, no. 5, pp. 1434–1445, May 2011.
[32] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications. New York: Academic Press, 1979.

Fig. 1. A multi-hop amplify-and-forward MIMO relaying system. (Block diagram: source, relays, destination.)

Fig. 2. Weighted MSE of the detected data in a dual-hop AF relaying system, when α = 0 . , β = 0 and P_k/σ²_{n_k} = 30 dB. (Plot: weighted MSE versus σ_e². Curves: the non-robust design in [8]; the robust design with P = I in [12]; Rong’s algorithm in [14]; the proposed robust design.)

Fig. 3. Sum-rate of a dual-hop AF relaying system when α = 0 . and β = 0 . (Plot: sum-rate in bits/s/Hz versus SNR in dB, for σ_e² = 0.04 and σ_e² = 0.02. Curves: the proposed robust design; Rong’s algorithm in [14]; the non-robust design in [8].)

Fig. 4. Weighted MSE of the detected data in a three-hop AF relaying system when P_k/σ²_{n_k} = 30 dB. (Plot: weighted MSE versus σ_e². Curves: the non-robust design in [10] with α = 0.6 and β = 0; the proposed robust design with α = 0.6 and β = 0; the non-robust design in [10] with α = 0 and β = 0.6; the proposed robust design with α = 0 and β = 0.6.)

Fig. 5. Sum-rate of a three-hop AF relaying system when α = 0 . and β = 0 . (Plot: sum-rate in bits/s/Hz versus SNR in dB, for σ_e² = 0.004, σ_e² = 0.01 and σ_e² = 0.04. Curves: the proposed robust design; the non-robust design in [10].)

Fig. 6. Maximum MSE among different received data streams in a three-hop relaying system with α = 0 and β = 0 . (Plot: MAX-MSE versus SNR in dB, for σ_e² = 0.01, σ_e² = 0.002 and σ_e² = 0. Curves: the non-robust design in [10]; the proposed robust design.)

Fig. 7. BERs of the proposed robust design with different design objectives, when α = 0 . , β = 0 and σ_e² = 0 . (Plot: BER versus SNR in dB. Curves: capacity maximization non-robust design in [10]; capacity maximization robust design; sum-MSE minimization robust design; MAX-MSE minimization robust design.)