Joint Source and Relay Precoding Designs for MIMO Two-Way Relaying Based on MSE Criterion

Abstract

Properly designed precoders can significantly improve the spectral efficiency of multiple-input multiple-output (MIMO) relay systems. In this paper, we investigate joint source and relay precoding design based on the mean-square-error (MSE) criterion in MIMO two-way relay systems, where two multi-antenna source nodes exchange information via a multi-antenna amplify-and-forward relay node. This problem is non-convex and its optimal solution remains unsolved. Aiming to find an efficient way to solve the problem, we first decouple the primal problem into three tractable sub-problems, and then propose an iterative precoding design algorithm based on alternating optimization. The solution to each sub-problem is optimal and unique, thus the convergence of the iterative algorithm is guaranteed. Secondly, we propose a structured precoding design to lower the computational complexity. The proposed precoding structure is able to parallelize the channels in the multiple access (MAC) phase and broadcast (BC) phase. It thus reduces the precoding design to a simple power allocation problem. Lastly, for the special case where only a single data stream is transmitted from each source node, we present a source-antenna-selection (SAS) based precoding design algorithm. This algorithm selects only one antenna for transmission from each source and thus requires lower signalling overhead. Comprehensive simulation is conducted to evaluate the effectiveness of all the proposed precoding designs.



Rui Wang and Meixia Tao*, Senior Member, IEEE


Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

The authors are with the Department of Electronic Engineering at Shanghai Jiao Tong University, Shanghai, 200240, P. R. China. Emails: {liouxingrui, mxtao}@sjtu.edu.cn.

This work is supported by the NSF of China under grant 60902019, the Joint Research Fund for Overseas Chinese, Hong Kong and Macao Young Scholars under grant 61028001, and the Innovation Program of Shanghai Municipal Education Commission under grant 11ZZ19.

Index Terms

Multiple-input multiple-output (MIMO), precoding, two-way relaying, non-regenerative relay, minimum mean-square-error (MMSE).

I. INTRODUCTION

Relay-assisted cooperative transmission can offer significant benefits including throughput enhancement, coverage extension and power reduction in wireless communications. It is therefore considered a promising technique for next-generation wireless communication systems, such as LTE-Advanced and WiMAX. Depending on whether the relay can receive and forward signals at the same time and frequency, there are two relay modes: full-duplex and half-duplex. Although the half-duplex relay is more favorable for practical implementation, it is less spectrally efficient than the full-duplex one. For instance, it takes four time slots for two source nodes to exchange information with the help of a half-duplex relay when there is no direct link. To overcome the spectral efficiency loss caused by the half-duplex constraint, two-way relaying has been recently proposed [1]–[4]. The notion of two-way relaying is to apply the principle of network coding at the relay so as to mix the signals received from two links for subsequent forwarding, and then to apply self-interference cancellation at each destination to extract the desired information. In contrast to conventional one-way relaying, two-way relaying needs only two time slots to complete one round of information exchange. Two-way relay strategies can be broadly divided into two categories, decode-and-forward (DF) and amplify-and-forward (AF), similar to those in one-way relaying. In DF-based two-way relaying, the relay decodes each individual received bit sequence, combines them using, for example, XOR or superposition coding, and then broadcasts the result to the two destinations. Decoding the combined bits directly may further improve the performance. In AF-based two-way relaying, the relay simply amplifies the received superimposed signals and forwards them to the destinations.
Compared with the DF relay strategy, the AF relay strategy is more attractive for its simplicity of implementation.

The multiple-input multiple-output (MIMO) technique is a significant technical breakthrough in wireless communications. By employing multiple antennas at the transmitter or the receiver, one can significantly improve the transmission reliability by leveraging spatial diversity. If multiple antennas are applied at both the transmitter and receiver sides, the channel capacity can be enhanced linearly with the minimum number of transmit and receive antennas. Among various MIMO techniques, transmit precoding is able to exploit the spatial multiplexing gain efficiently in both single-user and multi-user communication systems by making use of channel state information (CSI) at the transmitter. Incorporating the MIMO technique into two-way relaying is expected to further increase the system throughput. To fully realize the benefits of MIMO and two-way relaying, efficient transmit precoding that takes the relay nodes into account is crucial. In this paper, we consider joint design of source and relay precoding in the MIMO two-way relay system where each node is equipped with multiple antennas.

Recently, a few studies have focused on MIMO two-way relaying. The first category is based on the DF relay strategy. For example, in [5], the authors investigate and compare the capacity gain for two different re-encoding operations. In [6], the boundary of the capacity region of Gaussian MIMO two-way relay broadcast channels is derived. Furthermore, the authors in [7], [8] extend the DF-based MIMO two-way relay protocol to multi-user and cellular networks. From the aforementioned works, it is easy to find that the precoding design for MIMO two-way relaying under the DF relay strategy does not differ much from conventional multi-user MIMO precoding, and hence many existing techniques can be applied. The second category is based on the AF relay strategy.
The authors in [9] develop an algorithm to compute the globally optimal relay beamforming matrix for a system where only the relay node is equipped with multiple antennas and characterize the system capacity region. In [10], the optimal relay beamforming matrix is designed to minimize the total mean-square-error (MSE) of two sources. Under the same design criterion, the authors in [11] consider the scenario with multiple multi-antenna relay nodes. Different from [9]–[11], the works [12]–[14] consider a system where the two source nodes are also equipped with multiple antennas. In [12], applying the gradient descent algorithm, an iterative scheme is introduced to find the suboptimal relay precoder for sum-rate maximization. In [13], the authors consider joint source and relay precoding design to maximize the sum-rate. In [14], the authors propose a relay transceive precoding scheme by using zero-forcing (ZF) and minimum mean-square-error (MMSE) criteria with certain antenna configurations. The precoding of MIMO two-way relaying with the AF strategy has also been extended to multi-user networks. For example, the authors investigate the optimal relay precoding design for a MIMO two-way relay system with multiple pairs of users in [15] and further study the user scheduling problem in [16]. In [17], the authors design a new network-coded transmission protocol for the same model as [15] by combining ZF beamforming and signal alignment such that the intra-pair interference and inter-pair interference can be completely canceled. Other than using multiple antennas on each node, another way to achieve spatial diversity for the AF relay strategy is to employ network beamforming among multiple single-antenna relay nodes as in [18]–[23]. Nevertheless, the precoding design for AF MIMO two-way relaying is much more challenging than that for the DF case.

In this study, we focus on the joint precoding design at both the source and relay nodes for MIMO two-way relaying with the AF strategy. Our goal is to minimize the total mean-square-error (Total-MSE) of two users by assuming linear processing at both the transmitters and receivers. Different from [10], [11], we consider a two-way relay system where both the source and relay nodes are equipped with multiple antennas. Furthermore, we study the joint source and relay precoding design rather than relay precoding design only. The main contributions of this work are as follows:

• Iterative precoding design: The joint optimization of source and relay precoding for Total-MSE minimization is shown to be non-convex and the optimal solution is not easily tractable. We propose an iterative algorithm to decouple the joint design problem into three sub-problems and solve each of them in an alternating manner. In particular, we derive the optimal relay precoder in closed form when the source precoders and decoders are fixed. Since each sub-problem can be solved optimally, the convergence of the iterative algorithm is guaranteed.

• Channel-parallelization based precoding design: We further propose a heuristic channel parallelization (CP) based precoding design algorithm for certain antenna configurations. This method applies two joint matrix decomposition techniques so as to parallelize the channels in the multiple access (MAC) phase and broadcast (BC) phase, respectively, of two-way relay systems. Certain structures are hence imposed on the source and relay precoders. Based on the proposed structure, the joint precoding design is reduced to a simple joint source and relay power allocation problem.

• Source-antenna-selection based precoding design for single-data-stream transmission: For the special case where only a single data stream is transmitted from each source, we introduce a source-antenna-selection (SAS) based precoding design algorithm. We find that the SAS based precoding design can even outperform the iterative precoding design in certain scenarios and yet has lower signalling overhead.

The rest of the paper is organized as follows. In Section II, the MIMO two-way relaying model is introduced. The iterative precoding design algorithm is presented in Section III. Section IV describes the channel parallelization method and the corresponding power allocation algorithm. The source-antenna-selection based precoding algorithm for a single data stream is presented in Section V. Extensive simulation results are illustrated in Section VI. Finally, Section VII offers some concluding remarks.

Notations: A scalar is denoted by a lower-case letter, a bold-face lower-case letter is used for a vector, and a bold-face upper-case letter is used for a matrix. $E[\cdot]$ denotes expectation over the random variables within the bracket. $\otimes$ denotes the Kronecker operator. $\mathrm{vec}(\cdot)$ and $\mathrm{mat}(\cdot)$ signify the matrix vectorization operator and the corresponding inverse operation, respectively. $\mathrm{Tr}(\mathbf{A})$, $\mathbf{A}^{-1}$ and $\mathrm{Rank}(\mathbf{A})$ stand for the trace, the inverse and the rank of matrix $\mathbf{A}$, respectively, and $\mathrm{Diag}(\mathbf{a})$ denotes a diagonal matrix with $\mathbf{a}$ being its diagonal entries. Superscripts $(\cdot)^T$, $(\cdot)^*$ and $(\cdot)^H$ denote transpose, conjugate and conjugate transpose, respectively. $\mathbf{0}_{N \times M}$ denotes the $N \times M$ zero matrix and $\mathbf{I}_N$ denotes the $N \times N$ identity matrix. $\|\mathbf{x}\|_2^2$ denotes the squared Euclidean norm of a complex vector $\mathbf{x}$. $|z|$ denotes the modulus of the complex number $z$; $\Re(z)$ and $\Im(z)$ denote its real and imaginary parts, respectively. $\mathbb{C}^{x \times y}$ denotes the space of $x \times y$ matrices with complex entries. The distribution of a circularly symmetric complex Gaussian vector with mean vector $\mathbf{x}$ and covariance matrix $\mathbf{\Sigma}$ is denoted by $\mathcal{CN}(\mathbf{x}, \mathbf{\Sigma})$.

II. SYSTEM MODEL

Consider an $(N, M, N)$ MIMO two-way relay system where two source nodes, denoted as $S_1$ and $S_2$ and each equipped with $N$ antennas, want to exchange messages through a relay node, denoted as $R$ and equipped with $M$ antennas. The information exchange takes two time slots as shown in Fig. 1. In the first time slot (also referred to as the MAC phase), the two source nodes $S_1$ and $S_2$ simultaneously transmit their signals to the relay node $R$. After receiving the superimposed signal, the relay performs linear processing by multiplying it with a precoding matrix and then forwards it in the second time slot (also referred to as the BC phase). Without loss of generality, we assume that $N$ data streams are transmitted from each source in order to fully utilize the multiplexing gain. The special case with single data stream transmission is investigated in Section V.

Let $\mathbf{x}_i \in \mathbb{C}^{N \times 1}$ denote the transmit signal vector from source $S_i$, for $i = 1, 2$. It can be expressed as

$$\mathbf{x}_i = \mathbf{A}_i \mathbf{s}_i, \quad i = 1, 2,$$

where $\mathbf{s}_i \in \mathbb{C}^{N \times 1}$ represents the information signal vector with normalized power, i.e., $E(\mathbf{s}_i \mathbf{s}_i^H) = \mathbf{I}_N$, and $\mathbf{A}_i \in \mathbb{C}^{N \times N}$ denotes the transmit precoding matrix. Each column of $\mathbf{A}_i$ can be interpreted as the beamforming vector corresponding to the respective data stream in $\mathbf{s}_i$. The maximum transmission power at $S_i$ is assumed to be $\tau_i$, and thus we have

$$\mathrm{Tr}\left(\mathbf{A}_i \mathbf{A}_i^H\right) \le \tau_i, \quad i = 1, 2. \tag{1}$$

Let $\mathbf{y}_r$ denote the received $M \times 1$ signal vector at the relay node during the MAC phase. It can be expressed as

$$\mathbf{y}_r = \mathbf{H}_1 \mathbf{x}_1 + \mathbf{H}_2 \mathbf{x}_2 + \mathbf{n}_r,$$

where $\mathbf{H}_i \in \mathbb{C}^{M \times N}$ is the full-rank MIMO channel matrix from $S_i$ to $R$, and $\mathbf{n}_r$ denotes the additive noise vector at the relay node, following the distribution $\mathbf{n}_r \sim \mathcal{CN}(\mathbf{0}, \sigma_r^2 \mathbf{I}_M)$. Upon receiving $\mathbf{y}_r$, the relay amplifies it by multiplying it with a precoding matrix $\mathbf{A}_r \in \mathbb{C}^{M \times M}$. Therefore, the $M \times 1$ transmit signal vector from the relay node can be expressed as

$$\mathbf{x}_r = \mathbf{A}_r \mathbf{y}_r.$$
The maximum transmission power at the relay node is assumed to be $\tau_r$, which yields

$$\mathrm{Tr}\left\{\mathbf{A}_r \left(\sum_{i=1}^{2} \mathbf{H}_i \mathbf{A}_i \mathbf{A}_i^H \mathbf{H}_i^H + \sigma_r^2 \mathbf{I}_M\right) \mathbf{A}_r^H\right\} \le \tau_r. \tag{2}$$

Then the received signal at $S_i$ during the BC phase can be written as

$$\tilde{\mathbf{y}}_i = \mathbf{G}_i \mathbf{x}_r + \mathbf{n}_i = \mathbf{G}_i \mathbf{A}_r \mathbf{H}_i \mathbf{A}_i \mathbf{s}_i + \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar{i}} \mathbf{A}_{\bar{i}} \mathbf{s}_{\bar{i}} + \mathbf{G}_i \mathbf{A}_r \mathbf{n}_r + \mathbf{n}_i, \quad i = 1, 2, \tag{3}$$

where $\bar{i} = 2$ if $i = 1$ and $\bar{i} = 1$ if $i = 2$, $\mathbf{G}_i \in \mathbb{C}^{N \times M}$ is the full-rank channel matrix from $R$ to $S_i$, and $\mathbf{n}_i$ denotes the additive noise vector at $S_i$ with $\mathbf{n}_i \sim \mathcal{CN}(\mathbf{0}, \sigma_i^2 \mathbf{I}_N)$. Subtracting the back-propagated self-interference term $\mathbf{G}_i \mathbf{A}_r \mathbf{H}_i \mathbf{A}_i \mathbf{s}_i$ from (3) yields the equivalent received signal vector at each destination node as

$$\mathbf{y}_i = \mathbf{F}_i \mathbf{s}_{\bar{i}} + \mathbf{G}_i \mathbf{A}_r \mathbf{n}_r + \mathbf{n}_i, \quad i = 1, 2, \tag{4}$$

where $\mathbf{F}_i = \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar{i}} \mathbf{A}_{\bar{i}}$ is the equivalent end-to-end MIMO channel matrix for $S_i$.

The problem in this study is the joint design of the precoding matrices $\{\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_r\}$ given the global CSI $\{\mathbf{H}_1, \mathbf{H}_2, \mathbf{G}_1, \mathbf{G}_2\}$ based on the MSE criterion. Specifically, the objective is to minimize the Total-MSE of all the data streams of the two users. The Total-MSE has been widely chosen as a criterion for precoding design in the literature, e.g., [14]–[16], [24]–[27]. Although it may not be the best criterion from the overall performance aspect [28], the advantage of using the Total-MSE is that one can obtain the optimal precoder structure or even a closed-form solution for the precoders in some cases (see [24], [25]). For the considered MIMO two-way relay system, we show that a closed-form relay precoder can be obtained under the Total-MSE criterion for given source precoders and decoders.

Before leaving this section, we provide some discussion on the signalling overhead for obtaining the CSI and the precoding information in the system. First of all, we assume that the channel characteristics of each link change slowly enough so that they can be perfectly estimated by using pilot symbols or training sequences.
If channel reciprocity holds between the MAC phase and the BC phase (e.g., they are in time-division duplex mode) with $\mathbf{G}_1 = \mathbf{H}_1^T$ and $\mathbf{G}_2 = \mathbf{H}_2^T$, then the relay only needs to estimate the channel parameters during the MAC phase and the global CSI can be obtained. As a result, the joint precoding design can be conducted at the relay node, and the relay node then broadcasts $\mathbf{A}_i$ to $S_i$, $i = 1, 2$. To cancel self-interference and demodulate the received signals, the source nodes should estimate the corresponding channel parameters. For example, $S_1$ needs to estimate $\mathbf{G}_1 \mathbf{A}_r \mathbf{H}_1$ to subtract the self-interference from $\mathbf{s}_1$ and to estimate $\mathbf{G}_1 \mathbf{A}_r \mathbf{H}_2$ to demodulate $\mathbf{s}_2$. If, on the other hand, channel reciprocity does not hold between the MAC phase and the BC phase (e.g., they are in frequency-division duplex mode), more feedback channels and signalling overhead are required. The relay can only estimate $\mathbf{H}_1$ and $\mathbf{H}_2$ during the MAC phase. To obtain the global CSI, the relay node needs $S_1$ and $S_2$ to feed back $\mathbf{G}_1$ and $\mathbf{G}_2$, respectively.

III. ITERATIVE PRECODING DESIGN

In this section, we first formulate the joint optimization of the source and relay precoding for Total-MSE minimization in the considered MIMO two-way relay systems. This problem is shown to be non-linear and non-convex, and the optimal solution is not easily tractable. To approach the globally optimal solution, we propose an iterative algorithm based on alternating optimization that updates one precoder at a time while fixing the others.

According to the received signal in (4) and assuming a linear receiver, the MSE at $S_i$ can be written as

$$J_i = E\left\{\|\mathbf{W}_i \mathbf{y}_i - \mathbf{s}_{\bar{i}}\|_2^2\right\}, \quad i = 1, 2, \tag{5}$$

where $\mathbf{W}_i \in \mathbb{C}^{N \times N}$ is the linear decoding matrix at the destination $S_i$. Substituting (4) into (5) further yields

$$J_i = E\left\{\|\mathbf{W}_i (\mathbf{F}_i \mathbf{s}_{\bar{i}} + \mathbf{G}_i \mathbf{A}_r \mathbf{n}_r + \mathbf{n}_i) - \mathbf{s}_{\bar{i}}\|_2^2\right\} = \mathrm{Tr}\left\{\mathbf{W}_i \mathbf{F}_i \mathbf{F}_i^H \mathbf{W}_i^H - \mathbf{W}_i \mathbf{F}_i - \mathbf{F}_i^H \mathbf{W}_i^H + \sigma_r^2 \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{W}_i^H + \sigma_i^2 \mathbf{W}_i \mathbf{W}_i^H + \mathbf{I}_N\right\}, \quad i = 1, 2, \tag{6}$$

where we have used the fact that $\mathbf{s}_i$, $\mathbf{n}_i$ and $\mathbf{n}_r$ are mutually independent. The problem is to find the optimal precoding/decoding matrices $\{\mathbf{A}_r, \mathbf{A}_i, \mathbf{W}_i, i = 1, 2\}$ such that the Total-MSE of the two users is minimized. This is formulated as

$$\min_{\mathbf{A}_r, \mathbf{A}_i, \mathbf{W}_i, i = 1, 2} \; J_1 + J_2 \quad \text{s.t. (1), (2).} \tag{7}$$

Before solving (7), we present the following theorem. Based on this theorem, we only consider the case $M \ge N$ throughout this paper.

Theorem 1: When $M \ge N$, the Total-MSE $J_1 + J_2$ can be made arbitrarily small by increasing the power at both the source nodes and the relay node in the considered $(N, M, N)$ two-way relay system. Otherwise, if $M < N$, $J_1 + J_2$ is always lower bounded by $2(N - M)$.

Proof:

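We first provide an alternative expression for the MSE of each source, $J_i$. Since the constraints do not involve the decoding matrix $\mathbf{W}_i$ in the problem formulation (7), a necessary condition for the optimal solution is $\partial J_i / \partial \mathbf{W}_i^* = \mathbf{0}$. By using the matrix differentiation rules in [29], the optimal solution of $\mathbf{W}_i$, denoted as $\mathbf{W}_i^{\mathrm{opt}}$, can be expressed in closed form as

$$\mathbf{W}_i^{\mathrm{opt}} = \mathbf{F}_i^H \mathbf{R}_{w_i}^{-1}, \quad i = 1, 2, \tag{8}$$

where

$$\mathbf{R}_{w_i} = \mathbf{F}_i \mathbf{F}_i^H + \sigma_r^2 \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H + \sigma_i^2 \mathbf{I}_N. \tag{9}$$

By substituting $\mathbf{W}_i^{\mathrm{opt}}$ in (8) into (6), the MSE at $S_i$, $J_i$, transforms into

$$\hat{J}_i = \mathrm{Tr}\left\{\left[\mathbf{I}_N + \mathbf{F}_i^H \left(\sigma_i^2 \mathbf{I}_N + \sigma_r^2 \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H\right)^{-1} \mathbf{F}_i\right]^{-1}\right\}, \quad i = 1, 2. \tag{10}$$

Therefore, the minimum Total-MSE $J_1 + J_2$ of the original problem (7) is the same as the minimum of $\hat{J}_1 + \hat{J}_2$ subject to the same power constraints. For brevity of illustration, we take $\hat{J}_1$ as an example. Define $\mathbf{Q} = \left(\sigma_1^2 \mathbf{I}_N + \sigma_r^2 \mathbf{G}_1 \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_1^H\right)^{-1}$ for simplicity of notation. Note that the rank of $\mathbf{Q}$ is equal to $N$. When $M \ge N$, it is always possible to find precoders $\{\mathbf{A}_r, \mathbf{A}_2\}$ that make the rank of the term $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$ equal to $N$. Let $a_n$, $n = 1, 2, \cdots, N$, denote the positive eigenvalues of $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$; then $\hat{J}_1$ can be rewritten as

$$\hat{J}_1 = \mathrm{Tr}\left\{\left[\mathbf{I}_N + \mathrm{Diag}([a_1, a_2, \ldots, a_N])\right]^{-1}\right\} = \sum_{n=1}^{N} \frac{1}{1 + a_n}. \tag{11}$$

Next, we prove that by increasing the power at both $S_2$ and $R$, we can always increase $a_n$ and hence decrease the MSE $\hat{J}_1$. Let us define

$$\mathbf{E} = \mathbf{I}_N + \mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1 = \mathbf{I}_N + \theta_2 \theta_r \bar{\mathbf{A}}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \left(\sigma_1^2 \mathbf{I}_N + \theta_r \sigma_r^2 \mathbf{G}_1 \bar{\mathbf{A}}_r \bar{\mathbf{A}}_r^H \mathbf{G}_1^H\right)^{-1} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \bar{\mathbf{A}}_2,$$

where we have replaced $\mathbf{F}_1$ by $\mathbf{G}_1 \mathbf{A}_r \mathbf{H}_2 \mathbf{A}_2$ as defined in (4) when obtaining the second equality and set $\mathbf{A}_2 = \sqrt{\theta_2} \bar{\mathbf{A}}_2$ and $\mathbf{A}_r = \sqrt{\theta_r} \bar{\mathbf{A}}_r$, with $\theta_2$ and $\theta_r$ being power scaling parameters for $\mathbf{A}_2$ and $\mathbf{A}_r$, respectively. Then, we can rewrite the MSE in (10) as $\hat{J}_1 = \mathrm{Tr}\{\mathbf{E}^{-1}\}$. It is easy to verify that enlarging $\theta_2$ always increases the eigenvalues $a_n$ and thus decreases $\hat{J}_1$. However, due to the power constraint at the relay, we also need to check how $\theta_r$ affects $\hat{J}_1$. By defining $\beta = 1/\theta_r$, we rewrite $\mathbf{E}$ as

$$\mathbf{E} = \mathbf{I}_N + \mathbf{A}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \left(\beta \sigma_1^2 \mathbf{I}_N + \sigma_r^2 \mathbf{G}_1 \bar{\mathbf{A}}_r \bar{\mathbf{A}}_r^H \mathbf{G}_1^H\right)^{-1} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \mathbf{A}_2.$$

Then, with $\mathbf{P} = \sigma_r^2 \mathbf{G}_1 \bar{\mathbf{A}}_r \bar{\mathbf{A}}_r^H \mathbf{G}_1^H + \beta \sigma_1^2 \mathbf{I}_N$, we have

$$\frac{d\, \mathrm{Tr}(\mathbf{E}^{-1})}{d\beta} = \mathrm{Tr}\left\{-\mathbf{E}^{-1} \frac{d\left[\mathbf{A}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \mathbf{P}^{-1} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \mathbf{A}_2\right]}{d\beta} \mathbf{E}^{-1}\right\} = \mathrm{Tr}\left\{\sigma_1^2 \mathbf{E}^{-1} \mathbf{A}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \mathbf{P}^{-2} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \mathbf{A}_2 \mathbf{E}^{-1}\right\} > 0,$$

where we have used $d\mathbf{P}^{-1}/d\beta = -\sigma_1^2 \mathbf{P}^{-2}$ and the fact that both $\mathbf{E}$ and $\mathbf{P}$ are positive definite. Since $\hat{J}_1$ increases with $\beta = 1/\theta_r$, we conclude that $\hat{J}_1$ is a monotonically decreasing function of $\theta_r$. That is, enlarging $\theta_r$ also increases $a_n$ and decreases $\hat{J}_1$.

Secondly, we show that if $M \ge N$, the MSE $J_i$ can be made arbitrarily small by increasing the power at both source and relay nodes. To this end, we simply assume that when increasing the power at $S_2$ (i.e., increasing the scalar $\theta_2$), the relay just increases its power to keep $\theta_r$ unchanged. Thus, similar to (11), we have

$$\hat{J}_1 = \sum_{n=1}^{N} \frac{1}{1 + \theta_2 \bar{a}_n}, \tag{12}$$

where $\bar{a}_n$, $n = 1, 2, \cdots, N$, are the eigenvalues of $\bar{\mathbf{A}}_2^H \mathbf{H}_2^H \mathbf{A}_r^H \mathbf{G}_1^H \mathbf{Q} \mathbf{G}_1 \mathbf{A}_r \mathbf{H}_2 \bar{\mathbf{A}}_2$. For an arbitrarily small $\epsilon$, by defining $\bar{a}_{\min} = \min\{\bar{a}_1, \bar{a}_2, \cdots, \bar{a}_N\}$, we can always have

$$\hat{J}_1 = \sum_{i=1}^{N} \frac{1}{1 + \theta_2 \bar{a}_i} \le \frac{N}{1 + \theta_2 \bar{a}_{\min}} \le \epsilon \tag{13}$$

if $\theta_2 \ge \dfrac{N/\epsilon - 1}{\bar{a}_{\min}}$.

On the other hand, if $M < N$, the maximum rank of the term $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$ in $\hat{J}_1$ is $M$. Assuming that the $M$ non-zero eigenvalues of $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$ are denoted by $\{b_1, b_2, \cdots, b_M\}$, the resultant $\hat{J}_1$ can be expressed as

$$\hat{J}_1 = \sum_{n=1}^{M} \frac{1}{1 + b_n} + (N - M).$$

No matter how much power is provided at the source and relay nodes, $\hat{J}_1$ is always lower bounded by $N - M$. The same bound holds for $\hat{J}_2$. Theorem 1 is thus proven.

We now take a closer look at the problem (7), which can be proven to be non-linear and non-convex and hence is difficult to solve. To make the problem tractable, we propose an iterative algorithm which decouples the primal problem into three sub-problems and solves each of them in an alternating optimization approach.

First, given the precoding matrices at the source and relay nodes, i.e., $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$, we try to find the optimal decoder matrices $\mathbf{W}_1$ and $\mathbf{W}_2$. Since the power constraints in (1) and (2) are not related to $\mathbf{W}_1$ and $\mathbf{W}_2$, we simply get an unconstrained optimization problem

$$\min_{\mathbf{W}_1, \mathbf{W}_2} \; J_{w_1} + J_{w_2}, \tag{14}$$

where $J_{w_i} = \mathrm{Tr}\left\{\mathbf{W}_i \mathbf{R}_{w_i} \mathbf{W}_i^H - \mathbf{W}_i \mathbf{F}_i - \mathbf{F}_i^H \mathbf{W}_i^H + \mathbf{I}_N\right\}$, $i = 1, 2$, with $\mathbf{R}_{w_i}$ given in (9). Since $\mathbf{R}_{w_i}$, $i = 1, 2$, is positive definite, the objective function in (14) is convex with respect to $\mathbf{W}_i$. Therefore, applying the Karush-Kuhn-Tucker (KKT) conditions, we obtain the optimal decoding matrices as described in (8) by equating the gradient of the objective function in (14) to zero.

Second, we consider the optimization of the relay precoding matrix $\mathbf{A}_r$ by assuming that $\mathbf{W}_i$, $\mathbf{A}_i$, $i = 1, 2$, are fixed. From (6), this sub-problem is equivalent to

$$\min_{\mathbf{A}_r} \; J_{r_1} + J_{r_2} \tag{15}$$

$$\text{s.t.} \quad \mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\} \le \tau_r, \tag{16}$$

where $J_{r_i}$ is obtained by replacing $\mathbf{F}_i$ in (6) with $\mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar{i}} \mathbf{A}_{\bar{i}}$ as defined in (4) and using the circular property of the trace operator, $\mathrm{Tr}\{\mathbf{A}\mathbf{B}\} = \mathrm{Tr}\{\mathbf{B}\mathbf{A}\}$, given by

$$J_{r_i} = \mathrm{Tr}\left\{\mathbf{G}_i^H \mathbf{W}_i^H \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{R}_{x_{\bar{i}}} \mathbf{A}_r^H - \mathbf{H}_{\bar{i}} \mathbf{A}_{\bar{i}} \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r - \mathbf{G}_i^H \mathbf{W}_i^H \mathbf{A}_{\bar{i}}^H \mathbf{H}_{\bar{i}}^H \mathbf{A}_r^H + \sigma_i^2 \mathbf{W}_i \mathbf{W}_i^H + \mathbf{I}_N\right\}, \quad i = 1, 2, \tag{17}$$

with $\mathbf{R}_{x_i} = \mathbf{H}_i \mathbf{A}_i \mathbf{A}_i^H \mathbf{H}_i^H + \sigma_r^2 \mathbf{I}_M$, and (16) refers to the relay power constraint defined in (2) with $\mathbf{R}_x = \mathbf{H}_1 \mathbf{A}_1 \mathbf{A}_1^H \mathbf{H}_1^H + \mathbf{H}_2 \mathbf{A}_2 \mathbf{A}_2^H \mathbf{H}_2^H + \sigma_r^2 \mathbf{I}_M$. Note that the source power constraints (1) are irrelevant here since $\mathbf{A}_1$ and $\mathbf{A}_2$ are fixed.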
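As a numerical sanity check of the expressions so far, the following sketch evaluates the MSE (6) at the decoder (8) and compares it with the compact form (10) and with the relay-oriented form (17). It is only an illustration under assumed values: the random channels, the antenna numbers, the noise variances and the helper names (`crandn`, `herm`, `J6`, `J10`, `J17`, `W_opt`) are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def herm(X):
    """Conjugate transpose."""
    return X.conj().T

N, M = 2, 3                      # assumed antenna numbers, M >= N
sigma_r2 = 0.1                   # assumed relay noise variance
sigma2 = {1: 0.2, 2: 0.3}        # assumed noise variances at S_1, S_2
other = {1: 2, 2: 1}             # index map i -> i-bar

# Hypothetical random channels and (non-optimized) precoders
H = {i: crandn(M, N) for i in (1, 2)}
G = {i: crandn(N, M) for i in (1, 2)}
A = {i: crandn(N, N) for i in (1, 2)}
A_r = crandn(M, M)
I_N, I_M = np.eye(N), np.eye(M)

def F(i):
    """Equivalent end-to-end channel F_i from (4)."""
    j = other[i]
    return G[i] @ A_r @ H[j] @ A[j]

def J6(i, W):
    """MSE at S_i for decoder W, eq. (6)."""
    Fi, C = F(i), G[i] @ A_r
    T = (W @ Fi @ herm(Fi) @ herm(W) - W @ Fi - herm(Fi) @ herm(W)
         + sigma_r2 * W @ C @ herm(C) @ herm(W)
         + sigma2[i] * W @ herm(W) + I_N)
    return np.trace(T).real

def W_opt(i):
    """Linear MMSE decoder, eqs. (8)-(9)."""
    Fi, C = F(i), G[i] @ A_r
    R_w = Fi @ herm(Fi) + sigma_r2 * C @ herm(C) + sigma2[i] * I_N
    return herm(Fi) @ np.linalg.inv(R_w)

def J10(i):
    """MSE after substituting the optimal decoder, eq. (10)."""
    Fi, C = F(i), G[i] @ A_r
    Q = np.linalg.inv(sigma2[i] * I_N + sigma_r2 * C @ herm(C))
    return np.trace(np.linalg.inv(I_N + herm(Fi) @ Q @ Fi)).real

def J17(i, W):
    """Same MSE written as a function of A_r, eq. (17)."""
    j = other[i]
    R_xj = H[j] @ A[j] @ herm(A[j]) @ herm(H[j]) + sigma_r2 * I_M
    R_ri = herm(G[i]) @ herm(W) @ W @ G[i]
    quad = np.trace(R_ri @ A_r @ R_xj @ herm(A_r))
    lin = np.trace(H[j] @ A[j] @ W @ G[i] @ A_r)
    lin_h = np.trace(herm(G[i]) @ herm(W) @ herm(A[j]) @ herm(H[j]) @ herm(A_r))
    const = np.trace(sigma2[i] * W @ herm(W) + I_N)
    return (quad - lin - lin_h + const).real

for i in (1, 2):
    Wi = W_opt(i)
    print(i, J6(i, Wi), J10(i), J17(i, Wi))
```

For each user the three printed values should agree up to rounding, and perturbing the decoder away from (8) should only increase (6), which is consistent with the convexity argument for sub-problem (14).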
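Lemma 1: The problem of relay precoding design given source precoders and decoders for Total-MSE minimization in the considered $(N, M, N)$ MIMO two-way relay system, as formulated in (15), is convex.

Proof: Please refer to Appendix A.

Due to the convexity of the problem (15), we can readily design the optimal relay precoder by employing the KKT conditions. Specifically, the Lagrangian function of (15) is given as

$$\mathcal{L} = J_{r_1} + J_{r_2} + \lambda \left(\mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\} - \tau_r\right),$$

where $\lambda \ge 0$ is the Lagrangian multiplier. Thus, the KKT conditions are

$$\frac{\partial \mathcal{L}}{\partial \mathbf{A}_r^*} = \mathbf{R}_{r_1} \mathbf{A}_r \mathbf{R}_{x_2} + \mathbf{R}_{r_2} \mathbf{A}_r \mathbf{R}_{x_1} - \mathbf{R}_r + \lambda \mathbf{A}_r \mathbf{R}_x = \mathbf{0}, \tag{18}$$

$$\lambda \left(\mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\} - \tau_r\right) = 0, \tag{19}$$

$$\mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\} \le \tau_r, \tag{20}$$

where $\mathbf{R}_r = \mathbf{G}_1^H \mathbf{W}_1^H \mathbf{A}_2^H \mathbf{H}_2^H + \mathbf{G}_2^H \mathbf{W}_2^H \mathbf{A}_1^H \mathbf{H}_1^H$ and $\mathbf{R}_{r_i} = \mathbf{G}_i^H \mathbf{W}_i^H \mathbf{W}_i \mathbf{G}_i$, $i = 1, 2$. To obtain (18), the differentiation rule $\partial\, \mathrm{Tr}\{\mathbf{Z} \mathbf{A}_0 \mathbf{Z}^H \mathbf{A}_1\} / \partial \mathbf{Z}^* = \mathbf{A}_1 \mathbf{Z} \mathbf{A}_0$ in [29] is applied.

Based on (18), and using $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\, \mathrm{vec}(\mathbf{X})$, we further obtain

$$\mathbf{A}_r^{\mathrm{opt}} = \mathrm{mat}\left\{\left[\mathbf{R}_{x_2}^T \otimes \mathbf{R}_{r_1} + \mathbf{R}_{x_1}^T \otimes \mathbf{R}_{r_2} + \lambda \mathbf{R}_x^T \otimes \mathbf{I}_M\right]^{-1} \mathrm{vec}(\mathbf{R}_r)\right\}. \tag{21}$$

In the special case when $\lambda = 0$, we have

$$\mathbf{A}_r^{\mathrm{opt}} = \mathrm{mat}\left\{\left[\mathbf{R}_{x_2}^T \otimes \mathbf{R}_{r_1} + \mathbf{R}_{x_1}^T \otimes \mathbf{R}_{r_2}\right]^{-1} \mathrm{vec}(\mathbf{R}_r)\right\}. \tag{22}$$

If $\mathbf{A}_r^{\mathrm{opt}}$ in (22) meets the condition (20), then (22) is the optimal relay precoder. Otherwise, $\lambda$ in (21) should be chosen to satisfy $\mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\} = \tau_r$.

Lemma 2: The function $g(\lambda) = \mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\}$, with $\mathbf{A}_r$ given by (21), is monotonically decreasing with respect to $\lambda$, and the optimal $\lambda$ is upper-bounded by $\sqrt{\mathrm{Tr}\left(\mathbf{R}_r \mathbf{R}_x^{-1} \mathbf{R}_r^H\right) / \tau_r}$.

Proof: Please refer to Appendix B.

With Lemma 2, the optimal $\lambda$ meeting the condition $\mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\} = \tau_r$ can be readily obtained using bisection search.

The third sub-problem is to optimize the source precoders $\mathbf{A}_i$ for fixed $\mathbf{A}_r$ and $\mathbf{W}_i$, $i = 1, 2$. This is formulated as

$$\min_{\mathbf{A}_1, \mathbf{A}_2} \; J_{s_1} + J_{s_2} \tag{23}$$

$$\text{s.t.} \quad \mathrm{Tr}\left\{\mathbf{A}_i \mathbf{A}_i^H\right\} \le \tau_i, \quad i = 1, 2,$$

$$\mathrm{Tr}\left\{\mathbf{R}_{p_1} \mathbf{A}_1 \mathbf{A}_1^H + \mathbf{R}_{p_2} \mathbf{A}_2 \mathbf{A}_2^H\right\} \le \tau_r', \tag{24}$$

where $\tau_r' = \tau_r - \sigma_r^2 \mathrm{Tr}\left\{\mathbf{A}_r \mathbf{A}_r^H\right\}$, $\mathbf{R}_{p_i} = \mathbf{H}_i^H \mathbf{A}_r^H \mathbf{A}_r \mathbf{H}_i$, $i = 1, 2$, and

$$J_{s_i} = \mathrm{Tr}\left\{\mathbf{R}_{s_i,1} \mathbf{A}_{\bar{i}} \mathbf{A}_{\bar{i}}^H - 2\Re\left(\mathbf{R}_{s_i,2} \mathbf{A}_{\bar{i}}\right) + \mathbf{R}_{s_i,3}\right\}, \quad i = 1, 2, \tag{25}$$

with $\mathbf{R}_{s_i,1} = \mathbf{H}_{\bar{i}}^H \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{W}_i^H \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar{i}}$, $\mathbf{R}_{s_i,2} = \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar{i}}$, and $\mathbf{R}_{s_i,3} = \sigma_r^2 \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{W}_i^H + \sigma_i^2 \mathbf{W}_i \mathbf{W}_i^H + \mathbf{I}_N$. To obtain (25), the circular property of the trace operator is again applied to (6).

Note that a change of the source precoders affects the power constraint at the relay. Hence, the relay power constraint must be included as (24) in (23). By applying the conclusion derived in the lemma given in Appendix A, we can also prove that the optimization problem (23) is convex.

Lemma 3: The optimization problem in the form of (23) can be transformed into a convex quadratically constrained quadratic program (QCQP) problem.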

Proof: Please refer to Appendix C.

A QCQP problem can be efficiently solved by applying the available software package [30].

In summary, we outline the iterative precoding design algorithm as follows:

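Algorithm 1 (Iterative precoding)

• Initialize $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$
• Repeat
  – Update the decoder matrices $\mathbf{W}_1$ and $\mathbf{W}_2$ using (8) for fixed $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$;
  – Update the relay precoder matrix $\mathbf{A}_r$ using (21) or (22) for fixed $\mathbf{A}_1$, $\mathbf{A}_2$, $\mathbf{W}_1$ and $\mathbf{W}_2$;
  – For fixed $\mathbf{A}_r$, $\mathbf{W}_1$ and $\mathbf{W}_2$, solve the convex QCQP problem to get the optimal $\mathbf{A}_1$ and $\mathbf{A}_2$ as in Appendix C;
• Until the termination criterion is satisfied.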
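The relay update step of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it solves the vectorized linear system behind (21), tries $\lambda = 0$ as in (22), and otherwise bisects on $\lambda$ between zero and the upper bound of Lemma 2 (taken here as $\sqrt{\mathrm{Tr}(\mathbf{R}_r \mathbf{R}_x^{-1} \mathbf{R}_r^H)/\tau_r}$). The function name `relay_update`, the dictionary-based interface, the tolerance and the random test data are all assumptions made for the example; the case of a singular system matrix together with an inactive power constraint is not handled.

```python
import numpy as np

def herm(X):
    """Conjugate transpose."""
    return X.conj().T

def vec(X):
    """Column-stacking vectorization."""
    return X.reshape(-1, order="F")

def mat(x, M):
    """Inverse of vec for an M x M matrix."""
    return x.reshape(M, M, order="F")

def relay_update(H, G, A, W, sigma_r2, tau_r, tol=1e-10):
    """Relay precoder from (21)/(22); H, G, A, W are dicts keyed by 1 and 2.

    Returns (A_r, lam, power) with power = Tr{A_r R_x A_r^H}.
    """
    M = H[1].shape[0]
    I_M = np.eye(M)
    S = {i: H[i] @ A[i] @ herm(A[i]) @ herm(H[i]) for i in (1, 2)}
    R_x1, R_x2 = S[1] + sigma_r2 * I_M, S[2] + sigma_r2 * I_M
    R_x = S[1] + S[2] + sigma_r2 * I_M
    R_r1 = herm(G[1]) @ herm(W[1]) @ W[1] @ G[1]
    R_r2 = herm(G[2]) @ herm(W[2]) @ W[2] @ G[2]
    R_r = (herm(G[1]) @ herm(W[1]) @ herm(A[2]) @ herm(H[2])
           + herm(G[2]) @ herm(W[2]) @ herm(A[1]) @ herm(H[1]))
    # vec(A X B) = (B^T kron A) vec(X), applied term by term to (18)
    K = np.kron(R_x2.T, R_r1) + np.kron(R_x1.T, R_r2)
    B = np.kron(R_x.T, I_M)

    def solve(lam):
        A_r = mat(np.linalg.solve(K + lam * B, vec(R_r)), M)
        return A_r, np.trace(A_r @ R_x @ herm(A_r)).real

    # lambda = 0, eq. (22): usable only if K is invertible and (20) holds
    if np.linalg.matrix_rank(K) == K.shape[0]:
        A_r, power = solve(0.0)
        if power <= tau_r:
            return A_r, 0.0, power

    # Otherwise bisect on lambda; g(lambda) is decreasing (Lemma 2)
    lo = 0.0
    hi = np.sqrt(np.trace(R_r @ np.linalg.solve(R_x, herm(R_r))).real / tau_r)
    for _ in range(200):
        if hi - lo <= tol * hi:
            break
        mid = 0.5 * (lo + hi)
        if solve(mid)[1] > tau_r:
            lo = mid
        else:
            hi = mid
    A_r, power = solve(hi)       # hi is always on the feasible side
    return A_r, hi, power

# Hypothetical test data (random channels, precoders and decoders)
rng = np.random.default_rng(1)

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, M = 2, 3
H = {i: crandn(M, N) for i in (1, 2)}
G = {i: crandn(N, M) for i in (1, 2)}
A = {i: crandn(N, N) / np.sqrt(N) for i in (1, 2)}
W = {i: crandn(N, N) for i in (1, 2)}

# Unconstrained relay power, then a budget of half that value
_, _, p0 = relay_update(H, G, A, W, sigma_r2=0.1, tau_r=np.inf)
A_r, lam, power = relay_update(H, G, A, W, sigma_r2=0.1, tau_r=0.5 * p0)
print(lam, power, 0.5 * p0)
```

Halving the unconstrained power makes the relay constraint active by construction, so the returned multiplier is positive and the relay power sits at the budget, as the complementary slackness condition (19) requires.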

Theorem 2: The proposed iterative precoding design algorithm, Algorithm 1, is convergent and the limit point of the iteration is a stationary point of (7).

Proof:

Since the solution for each subproblem in the proposed algorithm is optimal, the Total-MSE does not increase from one iteration to the next. Meanwhile, the Total-MSE is lower bounded (at least by zero). Hence, the proposed algorithm is convergent. It further means that there must exist a limit point, denoted as $\{\bar{\mathbf{W}}_i, \bar{\mathbf{A}}_i, i = 1, 2, \bar{\mathbf{A}}_r\}$, after convergence. At the limit point, the solutions will not change if we continue the iteration. Otherwise, the Total-MSE could be further decreased, which contradicts the assumption of convergence. Since $\bar{\mathbf{W}}_i$, $\bar{\mathbf{A}}_i$ ($i = 1, 2$) and $\bar{\mathbf{A}}_r$ are local minimizers for each subproblem, we have

$$\mathrm{Tr}\left\{\nabla_{\mathbf{W}_i} J_w(\bar{\mathbf{W}}_i; \bar{\mathbf{A}}_i, \bar{\mathbf{A}}_r, i = 1, 2)^T (\mathbf{W}_i - \bar{\mathbf{W}}_i)\right\} \ge 0,$$

$$\mathrm{Tr}\left\{\nabla_{\mathbf{A}_r} J_r(\bar{\mathbf{A}}_r; \bar{\mathbf{A}}_i, \bar{\mathbf{W}}_i, i = 1, 2)^T (\mathbf{A}_r - \bar{\mathbf{A}}_r)\right\} \ge 0,$$

$$\mathrm{Tr}\left\{\nabla_{\mathbf{A}_i} J_s(\bar{\mathbf{A}}_i; \bar{\mathbf{W}}_i, \bar{\mathbf{A}}_r, i = 1, 2)^T (\mathbf{A}_i - \bar{\mathbf{A}}_i)\right\} \ge 0,$$

where $J_w = J_{w_1} + J_{w_2}$, $J_r = J_{r_1} + J_{r_2}$ and $J_s = J_{s_1} + J_{s_2}$. Summing up all the above inequalities, we get

$$\mathrm{Tr}\left\{\nabla_{\mathbf{X}} J(\bar{\mathbf{X}})^T (\mathbf{X} - \bar{\mathbf{X}})\right\} \ge 0, \tag{26}$$

where $J = J_1 + J_2$ and $\mathbf{X} = [\mathbf{W}_1, \mathbf{W}_2, \mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_r]$. Result (26) implies the stationarity of $\bar{\mathbf{X}}$ for (7) by definition.

Footnote (to the initialization step of Algorithm 1): Here, $\mathbf{A}_i$ and $\mathbf{A}_r$ can be randomly generated complex matrices or set as identity matrices, as long as they satisfy the given power constraints.

Remark 1: In this work, the precoders are designed to minimize the Total-MSE of all the data streams of the two users. This may lead to an unbalanced MSE distribution among the data streams. In general, the overall error performance is dominated by the data stream with the highest MSE [28]. Therefore, an alternative objective is to minimize the maximum per-stream MSE among all the data streams in order to improve the overall performance. Nevertheless, it has been proven in [28] that the min-max MSE problem can be solved through Total-MSE minimization. Specifically, the solution to the min-max problem can be obtained by multiplying the source precoder $\mathbf{A}_i$ of the Total-MSE problem with a rotation matrix that makes the MSE matrix have equal diagonal entries.

IV. LOW-COMPLEXITY PRECODING DESIGN BASED ON CHANNEL PARALLELIZATION

The iterative precoding design algorithm presented in Section III obtains good performance as verified in Section VI, but also has high computational complexity. In this section, we propose a new precoding design that offers a good balance between performance and complexity.

It has been proven in [24]–[26], [31]–[33] that the optimal precoding structure in one-way relaying is to first parallelize the channels between the source and the relay, as well as between the relay and the destination, using singular value decomposition (SVD) and then match the eigen-channels in the two hops. Taking the transmission of a single data stream in a one-way relay system as an example, as considered in [34] and [35], the idea of channel matching is as follows. The source should use the dominant right singular vector of the channel in the first hop as the beamformer to transmit its signal. After receiving the signal from the source, the relay should first multiply it with the dominant left singular vector of the same channel and then transmit it through the dominant right singular vector of the channel in the second hop.

Motivated by the findings in [24]–[26], [31]–[35], we aim to design $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$ so as to simultaneously parallelize the bidirectional links in the MIMO two-way relay system. In the following, we introduce a heuristic channel parallelization method for bidirectional communications by using two joint channel decomposition methods, namely, generalized singular value decomposition (GSVD) for the MAC phase and SVD for the BC phase. Using this method we then reduce the precoder design to a simple power allocation problem.

A. Channel Parallelization

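The major task of simultaneously parallelizing the bidirectional links is to jointly decompose the forward channel matrix pair $\{\mathbf{H}_1, \mathbf{H}_2\}$ in the MAC phase and the backward channel matrix pair $\{\mathbf{G}_1, \mathbf{G}_2\}$ in the BC phase. To do so, we first apply the GSVD technique to the MAC channels. The GSVD is elaborated in the lemma below.

Lemma 4 [36]: Assume that $\mathbf{A} \in \mathbb{C}^{m \times n}$ and $\mathbf{B} \in \mathbb{C}^{m \times n}$, $m \le n \le 2m$, are two full-rank matrices that satisfy $\mathrm{Rank}\begin{bmatrix}\mathbf{A} \\ \mathbf{B}\end{bmatrix} = n$. Then there exist two $m \times m$ unitary matrices $\mathbf{U}_A$, $\mathbf{U}_B$ and an $n \times n$ non-singular matrix $\mathbf{V}$ such that

$$\mathbf{A} = \mathbf{U}_A \mathbf{\Sigma}_A \mathbf{V}, \qquad \mathbf{B} = \mathbf{U}_B \mathbf{\Sigma}_B \mathbf{V},$$

where $\mathbf{\Sigma}_A = [\mathbf{0}_{m \times (n-m)}, \mathbf{\Lambda}_A]$, $\mathbf{\Sigma}_B = [\mathbf{\Lambda}_B, \mathbf{0}_{m \times (n-m)}]$ and they satisfy $\mathbf{\Sigma}_A^T \mathbf{\Sigma}_A + \mathbf{\Sigma}_B^T \mathbf{\Sigma}_B = \mathbf{I}_n$. Here $\mathbf{\Lambda}_A$ and $\mathbf{\Lambda}_B$ are two $m \times m$ non-negative diagonal matrices.

By applying Lemma 4 to the channel pair $\{\mathbf{H}_1^H, \mathbf{H}_2^H\}$, $\mathbf{H}_1$ and $\mathbf{H}_2$ can be expressed as

$$\mathbf{H}_1 = \mathbf{V}_h \mathbf{\Sigma}_{h_1} \mathbf{U}_{h_1}^H, \qquad \mathbf{H}_2 = \mathbf{V}_h \mathbf{\Sigma}_{h_2} \mathbf{U}_{h_2}^H, \tag{27}$$

where $\mathbf{V}_h$ is a non-singular $M \times M$ complex matrix, $\mathbf{U}_{h_1}$ and $\mathbf{U}_{h_2}$ are two $N \times N$ unitary matrices, $\mathbf{\Sigma}_{h_1} = \left[\mathbf{0}^T_{(M-N) \times N}, \mathbf{\Lambda}_{h_1}^T\right]^T$ and $\mathbf{\Sigma}_{h_2} = \left[\mathbf{\Lambda}_{h_2}^T, \mathbf{0}^T_{(M-N) \times N}\right]^T$, where $\mathbf{\Lambda}_{h_1}$ and $\mathbf{\Lambda}_{h_2}$ are two $N \times N$ non-negative diagonal matrices. If the relay precoder $\mathbf{A}_r$ contains $\mathbf{V}_h^{-1}$ at the right side and $\mathbf{A}_i$ has $\mathbf{U}_{h_i}$ at the left side, we can parallelize the two forward channels in the MAC phase.

For the BC phase, since the superimposed signal should be simultaneously transmitted to two destinations, we construct one virtual point-to-point MIMO channel as $\mathbf{G} = \left[\mathbf{G}_1^T, \mathbf{G}_2^T\right]^T$. By applying the SVD to $\mathbf{G}$, we have

$$\mathbf{G} = \mathbf{V}_g \mathbf{\Sigma}_g \mathbf{U}_g^H, \tag{28}$$

where $\mathbf{V}_g$ and $\mathbf{U}_g$ are $2N \times 2N$ and $M \times M$ unitary matrices, respectively, and $\mathbf{\Sigma}_g = \left[\mathbf{\Lambda}_g^T, \mathbf{0}^T_{(2N-M) \times M}\right]^T$, where $\mathbf{\Lambda}_g$ is an $M \times M$ non-negative diagonal matrix. If $\mathbf{A}_r$ contains $\mathbf{U}_g$ at its left side, the virtual point-to-point MIMO channel $\mathbf{G}$ is parallelized in the BC phase. Accordingly, we can rewrite $\mathbf{G}_1$ and $\mathbf{G}_2$ as

$$\mathbf{G}_1 = \mathbf{V}_{g_1} \mathbf{\Sigma}_g \mathbf{U}_g^H, \qquad \mathbf{G}_2 = \mathbf{V}_{g_2} \mathbf{\Sigma}_g \mathbf{U}_g^H,$$

where $\mathbf{V}_{g_1} = \mathbf{V}_g(1:N,\, 1:2N)$ and $\mathbf{V}_{g_2} = \mathbf{V}_g(N+1:2N,\, 1:2N)$, i.e., the first and the last $N$ rows of $\mathbf{V}_g$.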
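The BC-phase decomposition in (28) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function name `bc_phase_svd` and the random test channels are illustrative assumptions, and only the standard `numpy.linalg.svd` routine is used. It stacks $\mathbf{G}_1$ and $\mathbf{G}_2$, takes one full SVD, and splits $\mathbf{V}_g$ row-wise so that $\mathbf{G}_i = \mathbf{V}_{g_i} \mathbf{\Sigma}_g \mathbf{U}_g^H$.

```python
import numpy as np

def bc_phase_svd(G1, G2):
    """Decompose the stacked BC-phase channel G = [G1; G2] as in (28).

    G1, G2: N x M complex matrices (hypothetical relay-to-source channels).
    Returns (Vg1, Vg2, Sigma_g, Ug) such that G_i = Vg_i @ Sigma_g @ Ug^H.
    """
    N, M = G1.shape
    if G2.shape != (N, M) or M > 2 * N:
        raise ValueError("expects two N x M channels with M <= 2N")
    G = np.vstack([G1, G2])                        # virtual 2N x M channel
    # numpy returns G = u @ diag(s) @ vh; in the paper's notation u = V_g, vh = U_g^H
    Vg, s, UgH = np.linalg.svd(G, full_matrices=True)
    Sigma_g = np.zeros((2 * N, M))
    Sigma_g[:M, :M] = np.diag(s)                   # Sigma_g = [Lambda_g; 0]
    return Vg[:N, :], Vg[N:, :], Sigma_g, UgH.conj().T

# Illustrative use with randomly drawn channels (assumed values)
rng = np.random.default_rng(0)
N, M = 2, 3
G1 = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
G2 = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
Vg1, Vg2, Sigma_g, Ug = bc_phase_svd(G1, G2)
reconstruction_error = np.linalg.norm(Vg1 @ Sigma_g @ Ug.conj().T - G1)
```

Both sub-blocks share the same $\mathbf{\Sigma}_g$ and $\mathbf{U}_g$, which is what lets a single left factor $\mathbf{U}_g$ in $\mathbf{A}_r$ diagonalize the link toward both destinations at once.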
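Note that $\mathbf{V}_{g_1}$ and $\mathbf{V}_{g_2}$ no longer have the unitary property.

We now propose the following structure for the three precoders:

$$\mathbf{A}_1 = \mathbf{U}_{h_1} \mathbf{\Lambda}_{A_1} \mathbf{V}_{A_1}, \qquad \mathbf{A}_2 = \mathbf{U}_{h_2} \mathbf{\Lambda}_{A_2} \mathbf{V}_{A_2}, \qquad \mathbf{A}_r = \mathbf{U}_g \mathbf{\Lambda}_{A_r} \mathbf{V}_h^{-1}, \tag{29}$$

where $\mathbf{V}_{A_1}$ and $\mathbf{V}_{A_2}$ are arbitrary unitary matrices, and $\mathbf{\Lambda}_{A_1}$, $\mathbf{\Lambda}_{A_2}$ and $\mathbf{\Lambda}_{A_r}$ are $N \times N$, $N \times N$ and $M \times M$ real diagonal matrices, respectively, to be optimized in the next subsection.

Footnote: To apply Lemma 4, we here assume that $M \le 2N$.

The received signal in (4) can therefore be rewritten as

$$\mathbf{y}_i = \mathbf{V}_{g_i} \mathbf{\Sigma}_g \mathbf{\Lambda}_{A_r} \mathbf{\Sigma}_{h_{\bar i}} \mathbf{\Lambda}_{A_{\bar i}} \tilde{\mathbf{s}}_{\bar i} + \mathbf{V}_{g_i} \mathbf{\Sigma}_g \mathbf{\Lambda}_{A_r} \tilde{\mathbf{n}}_r + \mathbf{n}_i, \quad i = 1, 2, \tag{30}$$

where $\tilde{\mathbf{s}}_i = \mathbf{V}_{A_i} \mathbf{s}_i$ and $\tilde{\mathbf{n}}_r = \mathbf{V}_h^{-1} \mathbf{n}_r$. Note that since $\mathbf{V}_{A_i}$ is unitary, it affects neither the statistical properties of $\mathbf{s}_i$ nor the designed precoders. Given $M > N$, since $\mathbf{\Sigma}_{h_1} = \left[\mathbf{0}^T_{(M-N) \times N}, \mathbf{\Lambda}_{h_1}^T\right]^T$ and $\mathbf{\Sigma}_{h_2} = \left[\mathbf{\Lambda}_{h_2}^T, \mathbf{0}^T_{(M-N) \times N}\right]^T$ as given by the GSVD, the effective channel gains for the $N$ data streams of the two sources cannot be matched simultaneously. In other words, the gain of a certain data stream for $S_1$ may be very strong while the gain of the corresponding data stream for $S_2$ can be very weak. To avoid such imbalance, unless specified otherwise, we only consider the case $M = N$ in the remainder of this section, where all the channel gains can be utilized for transmission of both users. Then, (30) becomes

$$\mathbf{y}_i = \tilde{\mathbf{V}}_{g_i} \mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_{h_{\bar i}} \mathbf{\Lambda}_{A_{\bar i}} \tilde{\mathbf{s}}_{\bar i} + \tilde{\mathbf{V}}_{g_i} \mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \tilde{\mathbf{n}}_r + \mathbf{n}_i, \quad i = 1, 2,$$

where $\tilde{\mathbf{V}}_{g_1} = \mathbf{V}_g(1:N,\, 1:N)$ and $\tilde{\mathbf{V}}_{g_2} = \mathbf{V}_g(N+1:2N,\, 1:N)$, and $\mathbf{\Lambda}_k$, for $k \in \{A_1, A_2, A_r, g, h_1, h_2\}$, is an $N \times N$ non-negative diagonal matrix.

B. Joint Power Allocation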

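Based on the precoder structures proposed in (29), in this subsection we discuss the joint optimization of $\mathbf{\Lambda}_{A_1}$, $\mathbf{\Lambda}_{A_2}$ and $\mathbf{\Lambda}_{A_r}$ to minimize the Total-MSE of the two users. By substituting (29) into (10), we rewrite $\hat{J}_i$ as

$$\hat{J}_i = \mathrm{Tr}\left\{\left[\mathbf{I}_N + (\mathbf{\Lambda}_{A_{\bar i}} \mathbf{\Lambda}_{h_{\bar i}} \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_g)\left(\sigma_i^2 \mathbf{B}_{g_i} + \sigma_r^2 \mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{B}_h \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_g\right)^{-1} (\mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_{h_{\bar i}} \mathbf{\Lambda}_{A_{\bar i}})\right]^{-1}\right\}, \tag{31}$$

where $\mathbf{B}_{g_i} = \left(\tilde{\mathbf{V}}_{g_i}^H \tilde{\mathbf{V}}_{g_i}\right)^{-1}$ and $\mathbf{B}_h = \left(\mathbf{V}_h^H \mathbf{V}_h\right)^{-1}$. Although $\hat{J}_i$, $i = 1, 2$, has been simplified, the MSE covariance matrices are still non-diagonal, so solving the optimization problem directly remains difficult. However, we can resort to a tractable upper bound on the MSE to simplify the problem.

Lemma 5: An upper bound of $\hat{J}_i$ defined in (31) is given by

$$\hat{J}_i \le \mathrm{Tr}\left\{\left[\mathbf{I}_N + (\mathbf{\Lambda}_{A_{\bar i}} \mathbf{\Lambda}_{h_{\bar i}} \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_g)\left(\sigma_i^2 \mathbf{\Lambda}_{Bg_i} + \sigma_r^2 \mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_{Bh} \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_g\right)^{-1} (\mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_{h_{\bar i}} \mathbf{\Lambda}_{A_{\bar i}})\right]^{-1}\right\}, \tag{32}$$

where $\mathbf{\Lambda}_{Bg_i}$ and $\mathbf{\Lambda}_{Bh}$ are two diagonal matrices that contain the diagonal entries of $\mathbf{B}_{g_i}$ and $\mathbf{B}_h$, respectively.

Proof: Please refer to Appendix D.

The MSE upper bound matrix in (32) has a diagonal structure. Therefore, we can minimize the upper bound to design the precoders. By further defining $\mathbf{P}_k = \mathbf{\Lambda}_k^2$ for $k \in \{A_1, A_2, A_r, g, h_1, h_2\}$, the upper bound in Lemma 5, denoted as $J_i^u$, can be reformulated as

$$J_i^u = \sum_{n=1}^{N} \left(1 + \frac{p_g^n\, p_{A_r}^n\, p_{h_{\bar i}}^n\, p_{A_{\bar i}}^n}{\sigma_i^2 \lambda_{Bg_i}^n + \sigma_r^2 \lambda_{Bh}^n\, p_g^n\, p_{A_r}^n}\right)^{-1}, \quad i = 1, 2, \tag{33}$$

where the $p_k^n$'s are the diagonal entries of $\mathbf{P}_k$ and the $\lambda_k^n$'s with $k \in \{Bh, Bg_1, Bg_2\}$ are the diagonal entries of $\mathbf{\Lambda}_k$. It is interesting to find that $J_i^u$ is the Total-MSE of the parallelized sub-channels after zero-forcing $\mathbf{y}_i$ by $\tilde{\mathbf{V}}_{g_i}^{-1}$.

Finally, the precoder design can be simplified to the following optimization problem:

$$\begin{aligned}
\min_{p_{A_1}^n,\, p_{A_2}^n,\, p_{A_r}^n,\, \forall n} \quad & J_1^u + J_2^u \\
\text{s.t.} \quad & \sum_{n=1}^{N} p_{A_1}^n \le \tau_1, \quad \sum_{n=1}^{N} p_{A_2}^n \le \tau_2, \quad p_{A_1}^n \ge 0, \; p_{A_2}^n \ge 0, \; p_{A_r}^n \ge 0, \\
& \sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) \le \tau_r.
\end{aligned} \tag{34}$$

Compared with the original objective function in (31), the expression in (34) exhibits a simpler form and is more analytically tractable. Nevertheless, problem (34) is still a non-convex optimization problem. In the following, we apply an iterative approach that converts problem (34) into two convex sub-problems.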
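To make the relation between (31) and (33) concrete, the sketch below evaluates both quantities for one user and randomly drawn inputs. It is an illustrative NumPy check under stated assumptions, not code from the paper: the function names, the argument `d` (diagonal of $\mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_{h_{\bar i}} \mathbf{\Lambda}_{A_{\bar i}}$, assumed strictly positive), the argument `lam` (diagonal of $\mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r}$), and the random matrices standing in for $\mathbf{B}_{g_i}$ and $\mathbf{B}_h$ are all hypothetical.

```python
import numpy as np

def exact_mse(d, lam, B_g, B_h, sigma_i_sq, sigma_r_sq):
    """Evaluate (31): Tr{[I + D C^{-1} D]^{-1}} with the full matrices B_g, B_h."""
    D = np.diag(d)
    L = np.diag(lam)
    C = sigma_i_sq * B_g + sigma_r_sq * (L @ B_h @ L)
    inner = np.eye(len(d)) + D @ np.linalg.solve(C, D)
    return np.trace(np.linalg.inv(inner)).real

def upper_bound_mse(d, lam, B_g, B_h, sigma_i_sq, sigma_r_sq):
    """Evaluate (33): keep only the diagonal entries of B_g and B_h."""
    c = sigma_i_sq * np.diag(B_g).real + sigma_r_sq * lam**2 * np.diag(B_h).real
    return float(np.sum(1.0 / (1.0 + d**2 / c)))

def random_gram_inverse(n, rng):
    """Draw a matrix of the form (X^H X)^{-1}, the shape B_g and B_h take."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.inv(X.conj().T @ X)

# Illustrative comparison (all numbers are assumed values)
rng = np.random.default_rng(0)
n = 3
d = rng.uniform(0.5, 2.0, n)
lam = rng.uniform(0.5, 2.0, n)
B_g, B_h = random_gram_inverse(n, rng), random_gram_inverse(n, rng)
gap = upper_bound_mse(d, lam, B_g, B_h, 0.1, 0.1) - exact_mse(d, lam, B_g, B_h, 0.1, 0.1)
```

The gap is non-negative by Lemma 5 and vanishes when $\mathbf{B}_{g_i}$ and $\mathbf{B}_h$ are already diagonal, which is the situation in which the bound is tight.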

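1) Sub-problem 1: For given $p_{A_1}^n$ and $p_{A_2}^n$, $\forall n$, we formulate the following problem to obtain the optimal $\mathbf{P}_{A_r}$:

$$\begin{aligned}
\min_{p_{A_r}^n,\, \forall n} \quad & J_1^u + J_2^u \\
\text{s.t.} \quad & \sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) \le \tau_r, \quad p_{A_r}^n \ge 0, \; \forall n.
\end{aligned} \tag{35}$$

By verifying

$$\frac{\partial^2 J_i^u}{\partial (p_{A_r}^n)^2} = \frac{2 \sigma_i^2 \lambda_{Bg_i}^n\, p_{h_{\bar i}}^n\, p_g^n\, p_{A_{\bar i}}^n \left(\sigma_r^2 \lambda_{Bh}^n p_g^n + p_g^n p_{h_{\bar i}}^n p_{A_{\bar i}}^n\right)}{\left[\sigma_i^2 \lambda_{Bg_i}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{Bh}^n p_g^n + p_g^n p_{h_{\bar i}}^n p_{A_{\bar i}}^n\right)\right]^3} > 0, \quad i = 1, 2,$$

we conclude that this sub-problem is convex. Based on the KKT conditions (details presented in Appendix E), we derive the water-filling solution

$$p_{A_r}^n = \max\left[0, \mathrm{Root}(f)\right], \quad \forall n, \tag{36}$$

where $\mathrm{Root}(f)$ denotes the maximum real root of the equation $f$, which is given by

$$\mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) = \sum_{i=1}^{2} \frac{\sigma_i^2 \lambda_{Bg_i}^n\, p_{h_{\bar i}}^n\, p_g^n\, p_{A_{\bar i}}^n}{\left[\sigma_i^2 \lambda_{Bg_i}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{Bh}^n p_g^n + p_g^n p_{h_{\bar i}}^n p_{A_{\bar i}}^n\right)\right]^2}, \tag{37}$$

and the variable $\mu$ should be chosen to satisfy $\sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) = \tau_r$.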
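The water-filling rule in (36) and (37) can be sketched numerically. The code below is one possible realization under stated assumptions, not the authors' procedure: it swaps the explicit root computation for a nested bisection (an inner one on each $p_{A_r}^n$ for fixed $\mu$, an outer one on $\mu$ to meet the relay power budget), which relies only on the right-hand side of (37) being decreasing in $p_{A_r}^n$. The function name and the packed inputs `a`, `b`, `c`, `w` are hypothetical shorthand: `a[i, n]` $= p_g^n p_{h_{\bar i}}^n p_{A_{\bar i}}^n$, `b[n]` $= \sigma_r^2 \lambda_{Bh}^n p_g^n$, `c[i, n]` $= \sigma_i^2 \lambda_{Bg_i}^n$, and `w[n]` $= p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n$.

```python
import numpy as np

def relay_waterfilling(a, b, c, w, tau_r, n_iter=80):
    """Solve sub-problem (35) by bisection on the KKT condition (37).

    a, c: arrays of shape (2, N); b, w: arrays of shape (N,); tau_r: relay power budget.
    Returns (x, mu), where x[n] plays the role of p_{A_r}^n.
    """
    def marginal(x):
        # Right-hand side of (37); strictly decreasing in x for positive inputs
        return np.sum(a * c / (c + (a + b) * x) ** 2, axis=0)

    x_max = tau_r / w            # a single stream can at most use the whole budget

    def solve_streams(mu):
        lo, hi = np.zeros_like(w), x_max.copy()
        for _ in range(n_iter):
            mid = 0.5 * (lo + hi)
            grow = marginal(mid) > mu * w      # adding power is still worthwhile
            lo = np.where(grow, mid, lo)
            hi = np.where(grow, hi, mid)
        return 0.5 * (lo + hi)   # tends to 0 when mu * w >= marginal(0), as in (36)

    mu_lo = 0.0
    mu_hi = float(np.max(marginal(np.zeros_like(w)) / w))   # all streams switched off
    for _ in range(n_iter):
        mu = 0.5 * (mu_lo + mu_hi)
        if np.sum(solve_streams(mu) * w) > tau_r:
            mu_lo = mu           # budget exceeded: raise the water level price
        else:
            mu_hi = mu
    return solve_streams(mu_hi), mu_hi
```

Bisection is chosen here only for robustness and brevity; since (37) is a polynomial equation in $p_{A_r}^n$ after clearing denominators, a polynomial root finder would be an equally valid way to obtain $\mathrm{Root}(f)$.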

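2) Sub-problem 2: For given $p_{A_r}^n$, $\forall n$, we obtain $p_{A_1}^n$ and $p_{A_2}^n$ by solving the following optimization problem:

$$\begin{aligned}
\min_{p_{A_1}^n,\, p_{A_2}^n,\, \forall n} \quad & J_1^u + J_2^u \\
\text{s.t.} \quad & \sum_{n=1}^{N} p_{A_1}^n \le \tau_1, \quad \sum_{n=1}^{N} p_{A_2}^n \le \tau_2, \quad p_{A_1}^n \ge 0, \; p_{A_2}^n \ge 0, \; \forall n, \\
& \sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) \le \tau_r.
\end{aligned} \tag{38}$$

Likewise, by verifying

$$\frac{\partial^2 J_{\bar i}^u}{\partial (p_{A_i}^n)^2} = \frac{2 \left(\sigma_{\bar i}^2 \lambda_{Bg_{\bar i}}^n + \sigma_r^2 \lambda_{Bh}^n p_g^n p_{A_r}^n\right) \left(p_{h_i}^n p_g^n p_{A_r}^n\right)^2}{\left[\sigma_{\bar i}^2 \lambda_{Bg_{\bar i}}^n + \sigma_r^2 \lambda_{Bh}^n p_g^n p_{A_r}^n + p_{h_i}^n p_g^n p_{A_r}^n p_{A_i}^n\right]^3} > 0, \quad i = 1, 2,$$

we conclude that sub-problem (38) is also convex. However, a closed-form solution to this problem is generally not available. Standard numerical methods, such as the interior-point method, can be used to obtain the optimal solution.

The solutions of Sub-problem 1 and Sub-problem 2 show that $\mathbf{P}_{A_r}$, $\mathbf{P}_{A_1}$ and $\mathbf{P}_{A_2}$ are tightly coupled. Thus, we apply an iterative approach to find the final solution. As verified by our simulation, the algorithm converges in only a few iterations. After obtaining $\mathbf{\Lambda}_{A_1}$, $\mathbf{\Lambda}_{A_2}$ and $\mathbf{\Lambda}_{A_r}$ from the square roots of $\mathbf{P}_{A_1}$, $\mathbf{P}_{A_2}$ and $\mathbf{P}_{A_r}$, we substitute them into (29) to get the precoders.

The overall algorithm is outlined as follows:

Algorithm 2 (Channel parallelization based precoding)
• Decompose the channel pairs $\{\mathbf{H}_1, \mathbf{H}_2\}$ and $\{\mathbf{G}_1, \mathbf{G}_2\}$ by using (27) and (28), respectively, to get $\mathbf{\Lambda}_{h_1}$, $\mathbf{\Lambda}_{h_2}$, $\mathbf{\Lambda}_g$, $\mathbf{B}_{g_1}$, $\mathbf{B}_{g_2}$ and $\mathbf{B}_h$.
• Repeat
  – Update the relay power allocation $p_{A_r}^n$ using (36) to get $\mathbf{\Lambda}_{A_r}$;
  – Update the source power allocation $p_{A_1}^n$ and $p_{A_2}^n$ by solving (38) to get $\mathbf{\Lambda}_{A_1}$ and $\mathbf{\Lambda}_{A_2}$;
• Until the termination criterion is satisfied.
• Substitute the solved $\mathbf{\Lambda}_{A_1}$, $\mathbf{\Lambda}_{A_2}$ and $\mathbf{\Lambda}_{A_r}$ into (29) to get the precoders $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$.

V. SOURCE-ANTENNA-SELECTION BASED PRECODING FOR SINGLE DATA STREAM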

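In this section, we consider the precoding design for the special case where only a single data stream is transmitted from each source. The iterative approach proposed in Section III can be applied directly, except that the source precoding matrices reduce to beamforming vectors. In what follows, we introduce a new precoding strategy based on antenna selection at the two sources. Antenna selection can be viewed as a special case of beamforming. In general, it is computationally less complex and requires lower feedback overhead. This motivates us to consider source antenna selection while using precoding at the relay node only.

For single-data-stream transmission, the received signal $\mathbf{y}_i$ given in (4) at each destination node is simplified as

$$\mathbf{y}_i = \sqrt{\tau_{\bar i}}\, \mathbf{G}_i \mathbf{A}_r \mathbf{h}_{\bar i n} s_{\bar i} + \mathbf{G}_i \mathbf{A}_r \mathbf{n}_r + \mathbf{n}_i, \quad i = 1, 2,$$

where $\mathbf{h}_{\bar i n}$ is the selected forward channel vector for $S_{\bar i}$ in the MAC phase. After decoding by $\mathbf{w}_i$, the corresponding MSE at $S_i$ is denoted as

$$J_i = \mathbf{w}_i^H \mathbf{G}_i \mathbf{A}_r \mathbf{R}_{x_{\bar i}} \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{w}_i - \sqrt{\tau_{\bar i}}\, \mathbf{w}_i^H \mathbf{G}_i \mathbf{A}_r \mathbf{h}_{\bar i n} - \sqrt{\tau_{\bar i}}\, \mathbf{h}_{\bar i n}^H \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{w}_i + \sigma_i^2 \mathbf{w}_i^H \mathbf{w}_i + 1, \quad i = 1, 2,$$

where $\mathbf{R}_{x_i} = \tau_i \mathbf{h}_{in} \mathbf{h}_{in}^H + \sigma_r^2 \mathbf{I}_M$. Thus, for a given selected antenna pair $\{\mathbf{h}_{1n}, \mathbf{h}_{2m}\}$, the optimization problem is formulated as

$$\min_{\mathbf{A}_r, \mathbf{w}_1, \mathbf{w}_2} J_1 + J_2 \quad \text{s.t.} \quad \mathrm{Tr}\left\{\mathbf{A}_r \left(\tau_1 \mathbf{h}_{1n} \mathbf{h}_{1n}^H + \tau_2 \mathbf{h}_{2m} \mathbf{h}_{2m}^H + \sigma_r^2 \mathbf{I}_M\right) \mathbf{A}_r^H\right\} \le \tau_r.$$

Next, we take two steps to solve for $\mathbf{w}_1$, $\mathbf{w}_2$ and $\mathbf{A}_r$, respectively. First, for fixed $\mathbf{A}_r$, setting the gradient of $J_i$ with respect to $\mathbf{w}_i$ to zero gives the optimal $\mathbf{w}_i$ as

$$\mathbf{w}_i^{opt} = \sqrt{\tau_{\bar i}} \left[\mathbf{G}_i \mathbf{A}_r \mathbf{R}_{x_{\bar i}} \mathbf{A}_r^H \mathbf{G}_i^H + \sigma_i^2 \mathbf{I}_N\right]^{-1} \mathbf{G}_i \mathbf{A}_r \mathbf{h}_{\bar i n}, \quad i = 1, 2. \tag{39}$$

Subsequently, for fixed $\mathbf{w}_1$ and $\mathbf{w}_2$, we obtain the optimal $\mathbf{A}_r$ as

$$\mathbf{A}_r^{opt} = \mathrm{mat}\left\{\left[\mathbf{R}_{x_2}^T \otimes \left(\mathbf{G}_1^H \mathbf{w}_1 \mathbf{w}_1^H \mathbf{G}_1\right) + \mathbf{R}_{x_1}^T \otimes \left(\mathbf{G}_2^H \mathbf{w}_2 \mathbf{w}_2^H \mathbf{G}_2\right) + \mu \mathbf{R}_x^T \otimes \mathbf{I}_M\right]^{-1} \mathrm{vec}\{\mathbf{M}\}\right\}, \tag{40}$$

where $\mathbf{R}_x = \tau_1 \mathbf{h}_{1n} \mathbf{h}_{1n}^H + \tau_2 \mathbf{h}_{2m} \mathbf{h}_{2m}^H + \sigma_r^2 \mathbf{I}_M$, $\mathbf{M} = \sqrt{\tau_1}\, \mathbf{G}_2^H \mathbf{w}_2 \mathbf{h}_{1n}^H + \sqrt{\tau_2}\, \mathbf{G}_1^H \mathbf{w}_1 \mathbf{h}_{2m}^H$ and $\mu \in \left[0, \sqrt{\mathrm{Tr}\{\mathbf{M} \mathbf{R}_x^{-1} \mathbf{M}^H\} / \tau_r}\right]$ is chosen to satisfy the KKT conditions.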
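The decoder update for a fixed relay precoder can be illustrated with a short NumPy sketch. It assumes the MMSE form obtained by zeroing the gradient of the single-stream MSE $J_i$ stated above, with an $N \times M$ channel `G`, an $M \times M` relay precoder `A_r` and an $M$-vector `h`; the function names and the random test data are hypothetical and not taken from the paper.

```python
import numpy as np

def mmse_decoder(G, A_r, h, tau, sigma_sq, sigma_r_sq):
    """Decoder minimizing the single-stream MSE for a fixed relay precoder.

    G: N x M channel relay -> destination, A_r: M x M relay precoder,
    h: selected M-vector channel of the other source, tau: its transmit power.
    """
    M = A_r.shape[0]
    R_x = tau * np.outer(h, h.conj()) + sigma_r_sq * np.eye(M)
    GA = G @ A_r
    cov = GA @ R_x @ GA.conj().T + sigma_sq * np.eye(G.shape[0])
    return np.linalg.solve(cov, np.sqrt(tau) * (GA @ h))

def stream_mse(w, G, A_r, h, tau, sigma_sq, sigma_r_sq):
    """Evaluate the single-stream MSE J_i for an arbitrary decoder w."""
    M = A_r.shape[0]
    R_x = tau * np.outer(h, h.conj()) + sigma_r_sq * np.eye(M)
    GA = G @ A_r
    quad = w.conj() @ (GA @ R_x @ GA.conj().T) @ w + sigma_sq * (w.conj() @ w)
    cross = np.sqrt(tau) * (w.conj() @ GA @ h)
    return float(np.real(quad - cross - np.conj(cross) + 1.0))

# Illustrative use with random data (assumed values)
rng = np.random.default_rng(0)
N, M = 2, 2
G = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
A_r = np.eye(M, dtype=complex)
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
w_opt = mmse_decoder(G, A_r, h, tau=1.0, sigma_sq=0.1, sigma_r_sq=0.1)
j_min = stream_mse(w_opt, G, A_r, h, tau=1.0, sigma_sq=0.1, sigma_r_sq=0.1)
```

Because $J_i$ is a convex quadratic in $\mathbf{w}_i$, any perturbation of `w_opt` can only raise the returned MSE, which gives a quick sanity check when wiring this step into an alternating loop.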
The derivation is similar to the steps derived in Section III, and is hence omitted for brevity. In summary, we outline the algorithm as follows:

Algorithm 3 (Source antenna selection (SAS)-based precoding)
• For each source antenna pair $\{\mathbf{h}_{1n}, \mathbf{h}_{2m}\}$, $\forall n, m$
  – Initialize $\mathbf{A}_r$ randomly or as $\sqrt{\tau_r / \mathrm{Tr}\{\mathbf{R}_x\}}\, \mathbf{I}_M$ with $\mathbf{R}_x = \tau_1 \mathbf{h}_{1n} \mathbf{h}_{1n}^H + \tau_2 \mathbf{h}_{2m} \mathbf{h}_{2m}^H + \sigma_r^2 \mathbf{I}_M$
  – Repeat
    ∗ Update the decoding vectors by (39) for fixed $\mathbf{A}_r$;
    ∗ Update the relay precoder by (40) for fixed $\mathbf{w}_1$ and $\mathbf{w}_2$;
  – Until the termination criterion is satisfied.
• End
• Choose the source antenna pair and the corresponding $\mathbf{w}_1$, $\mathbf{w}_2$ and $\mathbf{A}_r$ that lead to the minimal Total-MSE $J_1 + J_2$.

Remark 2: Compared with the three-step iterative precoding algorithm, Algorithm 1, the SAS-based precoding algorithm, Algorithm 3, only needs two steps in each iteration. Additionally, a closed-form solution can be employed in each iteration. Thus, no advanced software package is needed here.

VI. SIMULATION RESULTS AND DISCUSSIONS

In this section, we present some simulation examples to evaluate the proposed precoding designs. The channel is set to be Rayleigh fading, i.e., the elements of each channel matrix are complex Gaussian random variables with zero mean and unit variance. For simplicity, we consider the reciprocal channel where $\mathbf{G}_1 = \mathbf{H}_1^T$ and $\mathbf{G}_2 = \mathbf{H}_2^T$ (our algorithm is also suitable for the general case where $\mathbf{G}_i$ and $\mathbf{H}_i$ are independent). The noise powers at the two destinations are set to be equal, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The average signal-to-noise ratios (SNRs) for the MAC phase and BC phase are defined as $\rho_1 = \tau_1 / \sigma_r^2$, $\rho_2 = \tau_2 / \sigma_r^2$ and $\rho_r = \tau_r / \sigma^2$, respectively. The average bit error rate (BER) using quadrature phase-shift keying (QPSK) modulation is simulated.

A. Convergence and Robustness of the Proposed Iterative Algorithm

Fig. 2 illustrates the convergence behavior of the iterative algorithm presented in Section III as a function of SNR at $N = M = 2$. It is found that, in the low SNR regime, the iterative algorithm converges within iterations. With medium SNR, it converges after about iterations. While in the high SNR regime, iterations are always enough.

Since the proposed iterative precoding algorithm only finds a locally optimal solution due to the non-convexity of the primal problem, different initialization points may result in different convergent solutions. Fig. 3 and Fig. 4 show the performance comparison with different initialization points at $N = M = 2$ and $N = M = 4$, respectively. Here, “Identity” means that the algorithm is initialized by the identity matrix, while “Random N ” means that N randomly generated initialization points are tried and the one with the best performance is finally chosen. We observe that the BER performance gain from choosing the best out of different initialization points is minimal. We thus conclude that the proposed iterative precoding algorithm is robust to the initialization points and hence near optimal. For the rest of the simulation, the “Identity” initialization point is adopted unless specified otherwise.

B. Performance Comparison for Multi-data-stream Transmission

In Fig. 5 and Fig. 6, we show the MSE and BER performance comparison of the proposed iterative precoding design and the channel-parallelization based precoding design (CP-precoding) as a function of $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$. For comparison, the CP-precoding design with uniform power allocation (uniform CP-precoding), i.e., equal power distribution among all data streams, is also simulated. We find that with both the iterative precoding and the CP-precoding, the system BER decreases considerably when the SNR increases. This demonstrates the effectiveness of the proposed precoding designs. We also find that the uniform CP-precoding only achieves a marginal gain over the non-precoding case. This is because uniform power allocation can lead to an unfair channel gain distribution among the data streams, and the system BER performance is dominated by the poorest sub-channel. We thus conclude that it is essential to optimize the power allocation among data streams for the channel-parallelization based precoding design. From Fig. 5 and Fig. 6, it is observed that the iterative precoding design exhibits the best performance among all the proposed precoding designs. We attribute the performance improvement to not enforcing any structure on the precoders.

Fig. 7 illustrates the BER performance comparison for different numbers of relay antennas $M$ when the number of source antennas is fixed at $N = 2$. We find that increasing the number of relay antennas significantly enhances the BER performance thanks to the increased diversity gain. Moreover, the gain of the proposed precoding scheme over the non-precoding scheme increases dramatically as the number of relay antennas increases. It further implies that when the relay node has more antennas than the source nodes, conducting the precoding is more beneficial.

Finally, the performance comparison between the proposed iterative joint source/relay precoding and the relay precoding scheme in [14] is depicted in Fig. 8 at $N = 2$. Since the antenna configuration in [14] should satisfy the condition M ≥ N , we choose M = 4 and in the simulation. It is shown that, with either the MMSE or the ZF receiver, the proposed joint source/relay precoding significantly outperforms the scheme in [14], where precoding is applied at the relay only. This implies that in two-way relay systems, precoding at the source nodes is very helpful in improving the system performance. It is also found that the MMSE and ZF receivers obtain almost the same performance for the proposed precoding algorithm.

C. Performance Comparison for Single-data-stream Transmission

In Fig. 9, we show the BER performance for single-data-stream transmission. Here, the proposed iterative precoding (proposed ite-precoding) and the source-antenna-selection based precoding (proposed SAS-precoding) are simulated. We find that the performance gained through precoding is more significant for single-data-stream transmission than for multi-data-stream transmission. This is because there is no interference from other data streams. In addition, with the “Identity” initialization point, the SAS-precoding has almost the same performance as the “Random ” and “Random ” cases, and it outperforms the ite-precoding method with both “Identity” and “Random ” initialization points although it needs lower feedback overhead. The reason is that the optimal beamforming vector at each source cannot be obtained due to the non-convex nature of the joint optimization problem, while by exhaustively searching for the most suitable source antenna pair, the SAS-precoding design can achieve better performance. However, as the number of randomly generated initialization points increases, the ite-precoding design starts to outperform the SAS-precoding design, as the ite-precoding design approaches the optimal solution. Moreover, it is shown that the “Random ” ite-precoding design scheme and the “Random ” ite-precoding design scheme obtain almost the same performance. However, such a near-optimal solution has substantially higher computational complexity and may not be practical for implementation.

VII. CONCLUSIONS

In this paper, we studied the joint source/relay precoding design for AF MIMO two-way relay systems based on the MSE criterion. An iterative method was first proposed to obtain locally optimal solutions for the Total-MSE minimization. Then, for the scenario in which all nodes are equipped with the same number of antennas, we proposed a channel-parallelization based precoding design algorithm to parallelize the channels in both the MAC and BC phases. By doing so, the joint precoder design is reduced to a simple power allocation problem. It was shown that the iterative precoding design outperforms the channel-parallelization based precoding design since no structure constraint is enforced on the precoders. Although the channel-parallelization method obtains degraded performance, it on the other hand reduces the computational complexity. When a single data stream is transmitted from each source, the precoding at the source nodes can be replaced by antenna selection. In this way, the system feedback overhead is reduced and no advanced software package is needed. Simulation results showed that all the proposed precoding designs are effective compared with conventional schemes.

Footnote: It implies that the “Identity” relay precoding matrix is usually a good initialization point, as in the multi-data-stream case.

Footnote: For the ite-precoding method, only the relay precoder is a matrix, while the two source precoders are actually vectors. Here, with a slight abuse of notation, the “Identity” source precoder means the vector with equal entries.

Footnote: Note that, unlike in the multi-data-stream iterative precoding, we find that the “Identity” source precoding vector is not a good initialization point here.

APPENDIX A
PROOF OF LEMMA 1

The convexity of $J_{r_1} + J_{r_2}$ can be verified by showing that $J_{r_1}$ and $J_{r_2}$ are both convex. We take $J_{r_1}$ as the example to illustrate the proof; the extension to $J_{r_2}$ is straightforward. For notational simplicity, we define $\mathbf{R}_1 = \mathbf{G}_1^H \mathbf{W}_1^H \mathbf{W}_1 \mathbf{G}_1$, $\mathbf{R}_2 = \mathbf{H}_2 \mathbf{A}_2 \mathbf{W}_1 \mathbf{G}_1$ and $a = \mathrm{Tr}\left\{\sigma_1^2 \mathbf{W}_1 \mathbf{W}_1^H + \mathbf{I}_N\right\}$. By applying the matrix manipulations in [36, Eq. 1.10.62, Eq. 1.10.64], $J_{r_1}$ can be reformulated as

$$J_{r_1} = \mathbf{a}_r^H \left(\mathbf{R}_{x_2}^T \otimes \mathbf{R}_1\right) \mathbf{a}_r + \mathrm{vec}(\mathbf{R}_2^T)^T \mathbf{a}_r + \mathbf{a}_r^H \mathrm{vec}(\mathbf{R}_2^H) + a,$$

where $\mathbf{a}_r = \mathrm{vec}(\mathbf{A}_r)$. Based on the vector differential rule in [29], the four Hessian matrices as defined in [37] are derived as

$$\mathcal{H}_{\mathbf{a}_r^*, \mathbf{a}_r} J_{r_1} = \left(\mathbf{R}_{x_2}^T \otimes \mathbf{R}_1\right)^T, \quad \mathcal{H}_{\mathbf{a}_r, \mathbf{a}_r^*} J_{r_1} = \mathbf{R}_{x_2}^T \otimes \mathbf{R}_1, \quad \mathcal{H}_{\mathbf{a}_r, \mathbf{a}_r} J_{r_1} = \mathbf{0}, \quad \mathcal{H}_{\mathbf{a}_r^*, \mathbf{a}_r^*} J_{r_1} = \mathbf{0}.$$

In order to show the convexity of $J_{r_1}$, the following block matrix should be positive semidefinite:

$$\mathcal{H}(J_{r_1}) = \begin{bmatrix} \mathbf{R}_{x_2}^T \otimes \mathbf{R}_1 & \mathbf{0} \\ \mathbf{0} & \left(\mathbf{R}_{x_2}^T \otimes \mathbf{R}_1\right)^T \end{bmatrix}.$$

Before confirming the positive semidefiniteness of $\mathcal{H}(J_{r_1})$, we introduce the following lemma.

Lemma A: The Kronecker product of any two positive semidefinite matrices is also positive semidefinite.

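Proof: Let $\mathbf{Z}_1$ and $\mathbf{Z}_2$ be any two positive semidefinite matrices. We can decompose them into $\mathbf{Z}_1 = \mathbf{Z}_1^{1/2} \mathbf{Z}_1^{1/2}$ and $\mathbf{Z}_2 = \mathbf{Z}_2^{1/2} \mathbf{Z}_2^{1/2}$, where $\mathbf{Z}_1^{1/2}$ and $\mathbf{Z}_2^{1/2}$ are also both positive semidefinite matrices. Applying the rule $\mathbf{A}\mathbf{B} \otimes \mathbf{C}\mathbf{D} = (\mathbf{A} \otimes \mathbf{C})(\mathbf{B} \otimes \mathbf{D})$, we have

$$\mathbf{Z} = \mathbf{Z}_1 \otimes \mathbf{Z}_2 = \left(\mathbf{Z}_1^{1/2} \mathbf{Z}_1^{1/2}\right) \otimes \left(\mathbf{Z}_2^{1/2} \mathbf{Z}_2^{1/2}\right) = \left(\mathbf{Z}_1^{1/2} \otimes \mathbf{Z}_2^{1/2}\right)\left(\mathbf{Z}_1^{1/2} \otimes \mathbf{Z}_2^{1/2}\right).$$

Since $\mathbf{Z}_1^{1/2} \otimes \mathbf{Z}_2^{1/2}$ is Hermitian, we conclude that the matrix $\mathbf{Z}$ is positive semidefinite.

By applying Lemma A, we derive that the matrix $\mathbf{R}_{x_2}^T \otimes \mathbf{R}_1$ is positive semidefinite since both $\mathbf{R}_{x_2}^T$ and $\mathbf{R}_1$ are positive semidefinite. Then, $\mathcal{H}(J_{r_1})$ is positive semidefinite. Hence, the convexity of $J_{r_1}$ is proven. The same result holds for $J_{r_2}$. Thus we conclude that the objective function $J_{r_1} + J_{r_2}$ is convex.

Next, we prove that the feasible set defined by $\mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\} \le \tau_r$ is convex. This can alternatively be proven by checking the convexity of the function $f = \mathrm{Tr}\left\{\mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H\right\}$ [38]. Similar to the previous manipulation, $f$ can be re-expressed as $f = \mathbf{a}_r^H (\mathbf{R}_x^T \otimes \mathbf{I}_M) \mathbf{a}_r$. In addition, the corresponding four Hessian matrices are derived as

$$\mathcal{H}_{\mathbf{a}_r^*, \mathbf{a}_r} f = (\mathbf{R}_x^T \otimes \mathbf{I}_M)^T, \quad \mathcal{H}_{\mathbf{a}_r, \mathbf{a}_r^*} f = \mathbf{R}_x^T \otimes \mathbf{I}_M, \quad \mathcal{H}_{\mathbf{a}_r, \mathbf{a}_r} f = \mathbf{0}, \quad \mathcal{H}_{\mathbf{a}_r^*, \mathbf{a}_r^*} f = \mathbf{0}.$$

Applying Lemma A, we can also show that the block matrix $\mathcal{H}(f)$ is positive semidefinite. Thus, we derive that the feasible set in (15) is convex. Since both the objective function and the feasible set are convex, the optimization problem (15) is a convex problem.

APPENDIX B
PROOF OF LEMMA 2

… $\lambda$, it is easy to verify that $g$ decreases with $\lambda$. Next we mainly focus on deriving the upper bound of $\lambda$. To this end, we first assume that $\mathbf{R}_r$ can be divided into two parts as $\mathbf{R}_r = \mathbf{Q}_1 + \mathbf{Q}_2$, and let

$$\mathbf{Q}_1 = \mathbf{R}_{r_1} \mathbf{A}_r^{opt} \mathbf{R}_{x_2} + \mathbf{R}_{r_2} \mathbf{A}_r^{opt} \mathbf{R}_{x_1}, \qquad \mathbf{Q}_2 = \lambda^{opt} \mathbf{A}_r^{opt} \mathbf{R}_x, \tag{41}$$

where $\mathbf{A}_r^{opt}$ and $\lambda^{opt}$ are the optimal primal and dual solutions of (15). Applying (41), we have

$$\mathbf{A}_r^{opt} = \frac{1}{\lambda^{opt}} \mathbf{Q}_2 \mathbf{R}_x^{-1}. \tag{42}$$

Substituting (42) into the power constraint (20) and requiring the equality to be satisfied, we have

$$\mathrm{Tr}\left\{\mathbf{A}_r^{opt} \mathbf{R}_x \mathbf{A}_r^{opt\,H}\right\} = \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_2 \mathbf{R}_x^{-1} \mathbf{R}_x \mathbf{R}_x^{-1} \mathbf{Q}_2^H\right\} = \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_2 \mathbf{R}_x^{-1} \mathbf{Q}_2^H\right\} = \tau_r.$$

On the other hand, we have

$$\begin{aligned}
\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{R}_r \mathbf{R}_x^{-1} \mathbf{R}_r^H\right\} &= \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} (\mathbf{Q}_1 + \mathbf{Q}_2) \mathbf{R}_x^{-1} (\mathbf{Q}_1 + \mathbf{Q}_2)^H\right\} \\
&= \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_1 \mathbf{R}_x^{-1} \mathbf{Q}_1^H\right\} + \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_1 \mathbf{R}_x^{-1} \mathbf{Q}_2^H\right\} \\
&\quad + \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_2 \mathbf{R}_x^{-1} \mathbf{Q}_1^H\right\} + \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_2 \mathbf{R}_x^{-1} \mathbf{Q}_2^H\right\}.
\end{aligned} \tag{43}$$

Since $\mathrm{Tr}\{\mathbf{Z}_1 \mathbf{Z}_2\} \ge 0$ whenever $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are positive semidefinite, we conclude that $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_1 \mathbf{R}_x^{-1} \mathbf{Q}_1^H\right\}$ in (43) is larger than or at least equal to zero. Next we prove $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_1 \mathbf{R}_x^{-1} \mathbf{Q}_2^H\right\} \ge 0$. Based on the definition in (41), we have

$$\mathrm{Tr}\left\{\mathbf{Q}_1 \mathbf{A}_r^{opt\,H}\right\} = \mathrm{Tr}\left\{\mathbf{R}_{r_1} \mathbf{A}_r^{opt} \mathbf{R}_{x_2} \mathbf{A}_r^{opt\,H} + \mathbf{R}_{r_2} \mathbf{A}_r^{opt} \mathbf{R}_{x_1} \mathbf{A}_r^{opt\,H}\right\} \ge 0. \tag{44}$$

Substituting (42) into (44), we obtain

$$\mathrm{Tr}\left\{\mathbf{Q}_1 \mathbf{A}_r^{opt\,H}\right\} = \mathrm{Tr}\left\{\frac{1}{\lambda^{opt}} \mathbf{Q}_1 \mathbf{R}_x^{-1} \mathbf{Q}_2^H\right\}.$$

Thus, we conclude that $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_1 \mathbf{R}_x^{-1} \mathbf{Q}_2^H\right\} \ge 0$ (the same holds for $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{Q}_2 \mathbf{R}_x^{-1} \mathbf{Q}_1^H\right\}$). Since all terms in (43) are larger than or at least equal to zero, we conclude

$$\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} \mathbf{R}_r \mathbf{R}_x^{-1} \mathbf{R}_r^H\right\} \ge \tau_r.$$

Thus, the proof of Lemma 2 is completed.

APPENDIX C
PROOF OF LEMMA 3

Using the identity $\mathrm{Tr}\{\mathbf{A}\mathbf{B}\mathbf{C}\mathbf{D}\} = \left(\mathrm{vec}(\mathbf{D}^T)\right)^T \left(\mathbf{C}^T \otimes \mathbf{A}\right) \mathrm{vec}(\mathbf{B})$ in [36], $J_{s_i}$ can be reformulated as

$$J_{s_i} = \hat{\mathbf{a}}_{\bar i}^H \hat{\mathbf{P}}_i \hat{\mathbf{a}}_{\bar i} - 2\Re\left\{\hat{\mathbf{b}}_i^T \hat{\mathbf{a}}_{\bar i}\right\} + \mathrm{Tr}\{\mathbf{R}_{s_i}\}, \quad i = 1, 2, \tag{45}$$

where $\hat{\mathbf{P}}_i = \mathbf{I}_N \otimes \mathbf{R}_{s_i}$, $\hat{\mathbf{b}}_i = \mathrm{vec}(\mathbf{R}_{s_i}^T)$ and $\hat{\mathbf{a}}_i = \mathrm{vec}(\mathbf{A}_i)$. Again, it is known from Lemma A that $\hat{\mathbf{P}}_i$ is a positive semidefinite matrix. Thus, (45) can be transformed into

$$J_{s_i} = \left\|\hat{\mathbf{P}}_i^{1/2} \hat{\mathbf{a}}_{\bar i}\right\|^2 - 2\Re\left\{\hat{\mathbf{b}}_i^T \hat{\mathbf{a}}_{\bar i}\right\} + \mathrm{Tr}\{\mathbf{R}_{s_i}\}, \quad i = 1, 2. \tag{46}$$

To further remove the $\Re(\cdot)$ operator, we redefine $\mathbf{a}_i = \left[\Re\{\hat{\mathbf{a}}_i^T\}, \Im\{\hat{\mathbf{a}}_i^T\}\right]^T$, $i = 1, 2$, and transform (46) into

$$J_{s_i} = \mathbf{a}_{\bar i}^T \mathbf{P}_i \mathbf{a}_{\bar i} - 2\mathbf{b}_i^T \mathbf{a}_{\bar i} + \mathrm{Tr}\{\mathbf{R}_{s_i}\}, \quad i = 1, 2,$$

where $\mathbf{P}_i = \tilde{\mathbf{P}}_i^T \tilde{\mathbf{P}}_i$ with

$$\tilde{\mathbf{P}}_i = \begin{bmatrix} \Re\{\hat{\mathbf{P}}_i^{1/2}\} & -\Im\{\hat{\mathbf{P}}_i^{1/2}\} \\ \Im\{\hat{\mathbf{P}}_i^{1/2}\} & \Re\{\hat{\mathbf{P}}_i^{1/2}\} \end{bmatrix}, \qquad \mathbf{b}_i = \left[\Re\{\hat{\mathbf{b}}_i^T\}, -\Im\{\hat{\mathbf{b}}_i^T\}\right]^T, \quad i = 1, 2.$$

It is easy to verify that $\mathbf{P}_i$ is a positive semidefinite matrix. Similarly, for the three power constraints, we have $\mathrm{Tr}\{\mathbf{A}_i^H \mathbf{A}_i\} = \mathbf{a}_i^T \hat{\mathbf{Q}}_i \mathbf{a}_i$ with $\hat{\mathbf{Q}}_i = \mathbf{I}_{2N^2}$, $i = 1, 2$, and $\mathrm{Tr}\left\{\mathbf{R}_{p_1} \mathbf{A}_1 \mathbf{A}_1^H + \mathbf{R}_{p_2} \mathbf{A}_2 \mathbf{A}_2^H\right\} = \mathbf{a}_1^T \hat{\mathbf{Q}}_3 \mathbf{a}_1 + \mathbf{a}_2^T \hat{\mathbf{Q}}_4 \mathbf{a}_2$, with $\hat{\mathbf{Q}}_3 = \tilde{\mathbf{Q}}_3^T \tilde{\mathbf{Q}}_3$ and $\hat{\mathbf{Q}}_4 = \tilde{\mathbf{Q}}_4^T \tilde{\mathbf{Q}}_4$ being two positive semidefinite matrices, where $\tilde{\mathbf{Q}}_3$ and $\tilde{\mathbf{Q}}_4$ are denoted as

$$\tilde{\mathbf{Q}}_3 = \begin{bmatrix} \Re\{(\mathbf{I}_N \otimes \mathbf{R}_{p_1})^{1/2}\} & -\Im\{(\mathbf{I}_N \otimes \mathbf{R}_{p_1})^{1/2}\} \\ \Im\{(\mathbf{I}_N \otimes \mathbf{R}_{p_1})^{1/2}\} & \Re\{(\mathbf{I}_N \otimes \mathbf{R}_{p_1})^{1/2}\} \end{bmatrix}, \qquad \tilde{\mathbf{Q}}_4 = \begin{bmatrix} \Re\{(\mathbf{I}_N \otimes \mathbf{R}_{p_2})^{1/2}\} & -\Im\{(\mathbf{I}_N \otimes \mathbf{R}_{p_2})^{1/2}\} \\ \Im\{(\mathbf{I}_N \otimes \mathbf{R}_{p_2})^{1/2}\} & \Re\{(\mathbf{I}_N \otimes \mathbf{R}_{p_2})^{1/2}\} \end{bmatrix}.$$

Finally, by combining $\mathbf{a}_1$ and $\mathbf{a}_2$ as $\mathbf{a} = [\mathbf{a}_1^T, \mathbf{a}_2^T]^T$, the optimization (23) has the following form:

$$\min_{\mathbf{a}} \; \mathbf{a}^T \mathbf{P} \mathbf{a} - 2\mathbf{b}^T \mathbf{a} + \mathrm{Tr}\{\mathbf{R}_{s_1} + \mathbf{R}_{s_2}\} \quad \text{s.t.} \quad \mathbf{a}^T \mathbf{Q}_1 \mathbf{a} \le \tau_1, \quad \mathbf{a}^T \mathbf{Q}_2 \mathbf{a} \le \tau_2, \quad \mathbf{a}^T \mathbf{Q}_3 \mathbf{a} \le \tau_r',$$

where

$$\mathbf{P} = \begin{bmatrix} \mathbf{P}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_1 \end{bmatrix}, \quad \mathbf{b} = [\mathbf{b}_2^T, \mathbf{b}_1^T]^T, \quad \mathbf{Q}_1 = \begin{bmatrix} \hat{\mathbf{Q}}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}, \quad \mathbf{Q}_2 = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \hat{\mathbf{Q}}_2 \end{bmatrix}, \quad \mathbf{Q}_3 = \begin{bmatrix} \hat{\mathbf{Q}}_3 & \mathbf{0} \\ \mathbf{0} & \hat{\mathbf{Q}}_4 \end{bmatrix}.$$

Since $\mathbf{P}$ and $\mathbf{Q}_i$, $i = 1, 2, 3$, are positive semidefinite, by definition the optimization problem (23) is transformed into a convex QCQP problem.

APPENDIX D
PROOF OF LEMMA 5

Considering the similarity between $\hat{J}_1$ and $\hat{J}_2$, we next focus on deriving the upper bound of $\hat{J}_1$; similar results hold for $\hat{J}_2$. By defining $\mathbf{C} = \sigma_1^2 \mathbf{B}_{g_1} + \sigma_r^2 \mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{B}_h \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_g$ and $\mathbf{D} = \mathbf{\Lambda}_g \mathbf{\Lambda}_{A_r} \mathbf{\Lambda}_{h_2} \mathbf{\Lambda}_{A_2}$, the MSE in (31) is rewritten as

$$\hat{J}_1 = \mathrm{Tr}\left\{\left[\mathbf{I}_N + \mathbf{D} \mathbf{C}^{-1} \mathbf{D}\right]^{-1}\right\} = \mathrm{Tr}\left[\mathbf{I}_N - \left(\mathbf{I}_N + \mathbf{D}^{-1} \mathbf{C} \mathbf{D}^{-1}\right)^{-1}\right],$$

where we have used the matrix inversion lemma $\left(\mathbf{I} + \mathbf{A}^{-1}\right)^{-1} = \mathbf{I} - (\mathbf{I} + \mathbf{A})^{-1}$. Since $\mathrm{Tr}\left\{\mathbf{A}^{-1}\right\} \ge \sum_i [\mathbf{A}(i, i)]^{-1}$ holds for any positive definite square matrix $\mathbf{A}$ [24], we have

$$\hat{J}_1 \le N - \sum_{i=1}^{N} \left[\left(\mathbf{I}_N + \mathbf{D}^{-1} \mathbf{C} \mathbf{D}^{-1}\right)(i, i)\right]^{-1} = \mathrm{Tr}\left[\mathbf{I}_N - \left(\mathbf{I}_N + \mathbf{D}^{-1} \mathbf{\Lambda}_C \mathbf{D}^{-1}\right)^{-1}\right] = \mathrm{Tr}\left\{\left[\mathbf{I}_N + \mathbf{D} \mathbf{\Lambda}_C^{-1} \mathbf{D}\right]^{-1}\right\}, \tag{47}$$

where $\mathbf{\Lambda}_C$ is the diagonal matrix containing the diagonal entries of $\mathbf{C}$. Thus, Lemma 5 is proven.

APPENDIX E
DERIVING THE CONCLUSION IN (36)

The Lagrangian function of (35) is given as

$$\begin{aligned}
L = \sum_{n=1}^{N} \Bigg[ & \frac{\sigma_1^2 \lambda_{Bg_1}^n + \sigma_r^2 \lambda_{Bh}^n p_g^n p_{A_r}^n}{\sigma_1^2 \lambda_{Bg_1}^n + \sigma_r^2 \lambda_{Bh}^n p_g^n p_{A_r}^n + p_{h_2}^n p_g^n p_{A_2}^n p_{A_r}^n} + \frac{\sigma_2^2 \lambda_{Bg_2}^n + \sigma_r^2 \lambda_{Bh}^n p_g^n p_{A_r}^n}{\sigma_2^2 \lambda_{Bg_2}^n + \sigma_r^2 \lambda_{Bh}^n p_g^n p_{A_r}^n + p_{h_1}^n p_g^n p_{A_1}^n p_{A_r}^n} \Bigg] \\
& + \mu \left[\sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) - \tau_r\right] - \sum_{n=1}^{N} \beta_n p_{A_r}^n,
\end{aligned}$$

where $\mu$ and $\beta_n$ are Lagrangian multipliers.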
The resultant set of KKT conditions is obtained as

$$\frac{\partial L}{\partial p_{A_r}^n} = \sum_{i=1}^{2} \frac{-\sigma_i^2 \lambda_{Bg_i}^n\, p_{h_{\bar i}}^n\, p_g^n\, p_{A_{\bar i}}^n}{\left[\sigma_i^2 \lambda_{Bg_i}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{Bh}^n p_g^n + p_g^n p_{h_{\bar i}}^n p_{A_{\bar i}}^n\right)\right]^2} + \mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) - \beta_n = 0, \tag{48}$$

$$\mu \left[\sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) - \tau_r\right] = 0, \quad \beta_n p_{A_r}^n = 0, \; \forall n, \quad \mu \ge 0, \quad \beta_n \ge 0, \; \forall n. \tag{49}$$

Based on (48) and (49), we have

$$p_{A_r}^n \left[\mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) - \sum_{i=1}^{2} \frac{\sigma_i^2 \lambda_{Bg_i}^n\, p_{h_{\bar i}}^n\, p_g^n\, p_{A_{\bar i}}^n}{\left[\sigma_i^2 \lambda_{Bg_i}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{Bh}^n p_g^n + p_g^n p_{h_{\bar i}}^n p_{A_{\bar i}}^n\right)\right]^2}\right] = p_{A_r}^n \beta_n = 0. \tag{50}$$

To satisfy (50), we discuss the following cases:

If $\mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) \ge \dfrac{p_{h_2}^n p_g^n p_{A_2}^n}{\sigma_1^2 \lambda_{Bg_1}^n} + \dfrac{p_{h_1}^n p_g^n p_{A_1}^n}{\sigma_2^2 \lambda_{Bg_2}^n}$, we must have $p_{A_r}^n = 0$.

Otherwise, if $\mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{Bh}^n\right) < \dfrac{p_{h_2}^n p_g^n p_{A_2}^n}{\sigma_1^2 \lambda_{Bg_1}^n} + \dfrac{p_{h_1}^n p_g^n p_{A_1}^n}{\sigma_2^2 \lambda_{Bg_2}^n}$, then combining with the condition $\beta_n \ge 0$, (50) can only be fulfilled with $p_{A_r}^n > 0$. This implies equation (37) given earlier. Since the right-hand side of (37) is a monotonic function of $p_{A_r}^n$ within $(0, +\infty)$, we choose the only positive root of (37) as $p_{A_r}^n$. By combining the two cases, we derive the conclusion in (36).

REFERENCES

[1] P. Larsson, N. Johansson, and K.-E. Sunell, “Coded bi-directional relaying,” in Proc. VTC’06, Spring, 2006.
[2] S. Zhang, S. C. Liew, and L. L. Lam, “Physical layer network coding,” in ACM SIGCOM’06, Sept. 2006.
[3] B. Rankov and A. Wittneben, “Spectral efficient protocols for half-duplex fading relay channels,” IEEE J. Sel. Areas Commun., vol. 25, no. 2, pp. 379–389, Feb. 2007.
[4] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Medard, and J. Crowcroft, “XORs in the air: Practical wireless network coding,” IEEE/ACM Trans. Netw., vol. 16, no. 3, pp. 497–510, June 2008.
[5] I. Hammerstrom, M. Kuhn, C. Esli, J. Zhao, A. Wittneben, and G. Bauch, “MIMO two-way relaying with transmit CSI at the relay,” in Proc. IEEE SPAWC’07, 2007.
[6] T. J. Oechtering, R. F. Wyrembelski, and H. Boche, “Multiantenna bidirectional broadcast channels - optimal transmit strategies,” IEEE Trans. Signal Process., vol. 57, no. 5, pp. 1948–1958, May 2009.
[7] C. Esli and A. Wittneben, “One- and two-way decode-and-forward relaying for wireless multiuser MIMO networks,” in Proc. IEEE GLOBECOM’08, 2008.
[8] C. Esli and A. Wittneben, “Multiuser MIMO two-way relaying for cellular communications,” in Proc. IEEE PIMRC’08, 2008.
[9] R. Zhang, Y.-C. Liang, C. C. Chai, and S. Cui, “Optimal beamforming for two-way multi-antenna relay channel with analogue network coding,” IEEE J. Sel. Areas Commun., vol. 27, no. 5, pp. 699–712, June 2009.
[10] G. Li, Y. Wang, and P. Zhang, “Optimal linear MMSE beamforming for two way multi-antenna relay systems,” IEEE Commun. Lett., vol. 15, no. 5, pp. 533–535, May 2011.
[11] C. Li, L. Yang, and W.-P. Zhu, “Two-way MIMO relay precoder design with channel state information,” IEEE Trans. Commun., vol. 58, no. 12, pp. 3358–3363, Dec. 2010.
[12] K.-J. Lee, K. W. Lee, H. Sung, and I. Lee, “Sum-rate maximization for two-way MIMO amplify-and-forward relaying systems,” in Proc. IEEE VTC’09, Spring, 2009.
[13] S. Xu and Y. Hua, “Optimal design of spatial source-and-relay matrices for a non-regenerative two-way MIMO relay system,” IEEE Trans. Wireless Commun., vol. 10, no. 5, pp. 1645–1655, May 2011.
[14] T. Unger and A. Klein, “Duplex schemes in multiple antenna two-hop relaying,” EURASIP Journal on Advances in Signal Processing, 2008, DOI 10.1155/2008/128592.
[15] J. Joung and A. H. Sayed, “Multiuser two-way amplify-and-forward relay processing and power control methods for beamforming systems,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1833–1846, March 2010.
[16] J. Joung and A. H. Sayed, “User selection methods for multiuser two-way relay communications using space division multiple access,” IEEE Trans. Wireless Commun., vol. 9, no. 7, pp. 2130–2136, July 2010.
[17] Z. Ding, T. Wang, M. Peng, W. Wang, and K. K. Leung, “On the design of network coding for multiple two-way relaying channels,” IEEE Trans. Wireless Commun., vol. 10, no. 6, pp. 1820–1832, June 2011.
[18] S. Shahbazpanahi and M. Dong, “A semi-closed form solution to the SNR balancing problem of two-way relay network beamforming,” in Proc. IEEE ICASSP’10, 2010.
[19] A. Schad, A. B. Gershman, and S. Shahbazpanahi, “Capacity maximization for distributed beamforming in one- and bi-directional relay networks,” in Proc. IEEE ICASSP’11, 2011.
[20] V. Havary-Nassab, S. Shahbazpanahi, and A. Grami, “Optimal distributed beamforming for two-way relay networks,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1238–1250, March 2010.
[21] M. Zeng, R. Zhang, and S. Cui, “On design of collaborative beamforming for two-way relay networks,” IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2284–2295, May 2011.
[22] S. Shahbazpanahi and M. Dong, “Achievable rate region and sum-rate maximization for network beamforming for bi-directional relay networks,” in Proc. IEEE ICASSP’10, 2010.
[23] R. Vaze and R. W. Heath, “On the capacity and diversity-multiplexing tradeoff of the two-way relay channel,” IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4219–4234, July 2011.
[24] R. Mo and Y. Chew, “MMSE-based joint source and relay precoding design for amplify-and-forward MIMO relay networks,” IEEE Trans. Wireless Commun., vol. 8, no. 9, pp. 4668–4676, Sept. 2009.
[25] W. Guan and H. Luo, “Joint MMSE transceiver design in non-regenerative MIMO relay systems,” IEEE Commun. Lett., vol. 12, no. 7, pp. 517–519, July 2008.
[26] F.-S. Tseng, W.-R. Wu, and J.-Y. Wu, “Joint source/relay precoder design in nonregenerative cooperative systems using an MMSE criterion,” IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 4928–4933, Oct. 2009.
[27] R. Hunger, M. Joham, and W. Utschick, “On the MSE-duality of the broadcast channel and the multiple access channel,” IEEE Trans. Signal Process., vol. 57, no. 2, pp. 698–713, Feb. 2009.
[28] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint Tx-Rx beamforming design for multicarrier MIMO channels: a unified framework for convex optimization,” IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2381–2401, Sept. 2003.
[29] A. Hjorungnes and D. Gesbert, “Complex-valued matrix differentiation: Techniques and key results,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2740–2746, June 2007.
[30] M. Grant and S. Boyd, CVX: Matlab Software for Disciplined Convex Programming. [Online] http://cvxr.com/cvx, July 2010.
[31] C. Li, X. Wang, L. Yang, and W.-P. Zhu, “A joint source and relay power allocation scheme for a class of MIMO relay systems,” IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4852–4860, Dec. 2009.
[32] R. Mo and Y. Chew, “Precoder design for non-regenerative MIMO relay systems,” IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 5041–5049, Oct. 2009.
[33] Z. Fang, Y. Hua, and J. C. Koshy, “Joint source and relay optimization for a non-regenerative MIMO relay,” in Proc. Fourth IEEE Workshop Sensor Array and Multichannel Processing, 2006.
[34] B. Khoshnevis, W. Yu, and R. Adve, “Grassmannian beamforming for MIMO amplify-and-forward relaying,” IEEE J. Sel. Areas Commun., vol. 26, no. 8, pp. 1397–1407, March 2008.
[35] V. Havary-Nassab, S. Shahbazpanahi, and A. Grami, “General-rank beamforming for multi-antenna relaying schemes,” in Proc. IEEE ICC ’09, 2009.
[36] X. Zhang, Matrix analysis and applications. Tsinghua University Press, 2004.
[37] A. Hjorungnes and D. Gesbert, “Hessians of scalar functions of complex-valued matrices: A systematic computational approach,” in Proc. 9th Int. Symp. Signal Processing and Its Applications (ISSPA) 2007, 2007.
[38] S. Boyd and L. Vandenberghe,

Convex Optimization . Cambridge University Press, 2004. A s r A A s H G G H Fig. 1. Illustration of the MIMO two-way relay system. −3 −2 −1 ρ = ρ = ρ r (dB) BE R

10 iterations30 iterations40 iterations50 iterations

Fig. 2. Convergence behavior of the proposed iterative precoding algorithm.

[Plot: BER versus $\rho_1 = \rho_2 = \rho_r$ (dB). Curves: Identity, Random 5, Random 10.]

Fig. 3. Performance comparison of the iterative algorithm with different initialization points at $N = M = 2$.

[Plot: BER versus SNR (dB). Curves: Identity, Random 5, Random 10.]

Fig. 4. Performance comparison of the iterative algorithm with different initialization points at $N = M = 4$.

[Plot: MSE versus $\rho_1 = \rho_2 = \rho_r$ (dB). Curves: Nonprecoding scheme, Proposed Iterative precoding, Proposed CP-precoding, Proposed uniform CP-precoding.]

Fig. 5. The MSE performance comparison with $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$.

[Plot: BER versus $\rho_1 = \rho_2 = \rho_r$ (dB). Curves: Nonprecoding scheme, Proposed Iterative precoding, Proposed CP-precoding, Proposed uniform CP-precoding.]

Fig. 6. The BER performance comparison with $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$.

[Plot: BER versus $\rho_1 = \rho_2 = \rho_r$ (dB). Curves: Nonprecoding scheme, Proposed Iterative precoding; curve groups labeled $M = 2$, $M = 3$, and $M = 4$.]

Fig. 7. The BER performance comparison for different numbers of relay antennas.

[Plot: BER versus $\rho_1 = \rho_2 = \rho_r$ (dB). Curves: Proposed Iterative precoding with ZF, Proposed Iterative precoding with MMSE, ZF in [14], MMSE in [14]; curve groups labeled $M = 4$ and $M = 5$.]

Fig. 8. The BER performance comparison with [14].

[Plot: BER versus $\rho_1 = \rho_2 = \rho_r$ (dB). Curves: Nonprecoding scheme, Proposed SAS-Identity, Proposed SAS-Random 5, Proposed SAS-Random 10, Proposed Ite-Identity, Proposed Ite-Random 1, Proposed Ite-Random 5, Proposed Ite-Random 10.]

Fig. 9. The BER performance comparison with $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$.