Predictive Relay Selection: A Cooperative Diversity Scheme Using Deep Learning
Wei Jiang
German Research Center for Artificial Intelligence (DFKI)
Kaiserslautern, Germany
https://orcid.org/0000-0002-3719-3710
Hans Dieter Schotten
University of Kaiserslautern
Kaiserslautern, Germany
https://orcid.org/0000-0001-5005-3635
Abstract
In this paper, we propose a novel cooperative multi-relay transmission scheme for mobile terminals to exploit spatial diversity. By improving the timeliness of measured channel state information (CSI) through deep learning (DL)-based channel prediction, the proposed scheme remarkably lowers the probability of wrong relay selection arising from outdated CSI in fast time-varying channels. It inherits the simplicity of opportunistic relaying by selecting a single relay, avoiding the complexity of multi-relay coordination and synchronization. Numerical results reveal that it can achieve full diversity gain in slow-fading channels and substantially outperforms the existing schemes in fast-fading wireless environments. Moreover, the computational complexity brought by the DL predictor is negligible for commercial off-the-shelf computing hardware.
Index Terms
Cooperative diversity, outdated CSI, channel prediction, deep learning, LSTM, opportunistic relaying
I. INTRODUCTION
Cooperative diversity [1] is an effective technique for mobile terminals without an antenna array to cultivate the spatial diversity that is typically achieved by co-located multi-antenna systems. A main challenge of cooperative diversity is the inherent asynchronization among spatially-distributed antennas (relays). Multiple timing offsets and multiple carrier frequency offsets [2] among simultaneously-transmitting relays make multi-relay transmission, such as distributed beamforming and distributed space-time coding [3], too complicated for practical systems. In contrast, a single-relay approach called opportunistic relay selection (ORS) or opportunistic relaying [4] achieves full diversity gain while avoiding the complexity of multi-relay synchronization and coordination. However, ORS is applicable only in slow-fading wireless environments, since the channel state information (CSI) used to select the best relay may become outdated quickly in fast-fading channels. Using a wrongly-selected relay substantially deteriorates the performance of ORS, as widely verified in the literature, e.g., [5]-[7]. With the proliferation of high-mobility applications (such as vehicle-to-X, high-speed trains, and unmanned aerial vehicles) and the utilization of higher frequency bands (e.g., millimeter wave and Terahertz) in 5G and beyond systems, the problem of outdated/aged CSI becomes more challenging. A cooperative method called generalized selection combining [8] shows robustness under aged channels, but it suffers from a substantial loss of spectral efficiency. The authors of [9] proposed a method utilizing the knowledge of channel statistics, achieving only a marginal gain at a clearly increased complexity. By far, opportunistic space-time coding (OSTC), proposed by the author of
this paper in [10]-[12], is the best method in fast-fading channels from the perspective of the diversity-multiplexing trade-off. But its performance gap to perfect selection using perfect knowledge of the channel is still large, motivating the follow-up work presented here. In this paper, therefore, we propose a novel cooperative method coined predictive relay selection (PRS) for mobile terminals to exploit the gain of spatial diversity. The probability of wrong relay selection due to outdated CSI is remarkably reduced by improving the timeliness of CSI through fading channel prediction [13]-[21]. A deep recurrent neural network is deliberately built to provide highly accurate CSI predictions. The proposed scheme inherits the simplicity of ORS by selecting a single opportunistic relay to avoid the complexity of multi-relay coordination and synchronization. Simulation results reveal that it can achieve full diversity order in slow-fading channels and substantially outperforms the existing schemes in fast-fading wireless environments. Moreover, the computational complexity brought by the deep learning (DL)-based predictor is analyzed and compared with commercial off-the-shelf (COTS) computing hardware. The rest of this paper is organized as follows: Section II introduces the system model. Sections III and IV present the proposed selection scheme and the principle of the channel predictor, respectively. Complexity analysis and numerical results are given in Sections V and VI. Finally, Section VII concludes this paper.

II. SYSTEM MODEL
Following the working assumption applied in most prior research works [2]-[12], we consider a dual-hop decode-and-forward (DF) cooperative network where a single source node s communicates with a single destination node d with the aid of K relays. Each node is equipped with a single antenna that is used for both signal transmission and reception over a narrow-band channel. The received signal in an arbitrary link A → B is modeled as y_B = h_{A,B} x_A + z_B, where x_A ∈ C is the transmitted symbol from node A with average power P_A = E[|x_A|^2], z_B stands for additive white Gaussian noise with zero mean and variance σ_n^2, i.e., z_B ~ CN(0, σ_n^2), and h_{A,B} represents the fading coefficient of the channel from A to B, which is a zero-mean circularly-symmetric complex Gaussian random variable h ~ CN(0, σ_h^2) under the assumption of Rayleigh fading. The instantaneous signal-to-noise ratio (SNR) is denoted by γ_{A,B} = |h_{A,B}|^2 P_A / σ_n^2 and the average SNR by γ̄_{A,B} = E[γ_{A,B}] = P_A σ_h^2 / σ_n^2.

In a practical system, there is a delay between the time of relay selection and the instant of using the selected relay to transmit signals. The actual CSI h may differ from its outdated version ĥ that is applied for selecting relays. To quantify the quality of the CSI, the correlation coefficient between h and ĥ is introduced, i.e., ρ_o = E[h ĥ*] / sqrt(E[|h|^2] E[|ĥ|^2]). With the classical Doppler spectrum of the Jakes model, it takes the value

    ρ_o = J_0(2π f_d τ),    (1)

where f_d is the maximal Doppler frequency, τ stands for the delay between the outdated and actual CSI, and J_0(·) denotes the zeroth-order Bessel function of the first kind. Due to severe signal attenuation, a single-antenna relay should operate in half-duplex transmission mode to prevent harmful self-interference between its transmitter and receiver.
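For concreteness, Eq. (1) can be evaluated numerically. The sketch below (Python with NumPy) approximates J_0 by its integral form rather than relying on a special-function library; the parameter values are the illustrative ones used later in the paper:

```python
import numpy as np

def bessel_j0(x: float, n: int = 100_000) -> float:
    """Zeroth-order Bessel function of the first kind, evaluated via its
    integral form J_0(x) = (1/pi) * int_0^pi cos(x*sin(theta)) dtheta,
    approximated with a midpoint rule."""
    theta = (np.arange(n) + 0.5) * (np.pi / n)
    return float(np.mean(np.cos(x * np.sin(theta))))

def csi_correlation(f_d: float, tau: float) -> float:
    """Correlation rho_o between outdated and actual CSI under the Jakes
    Doppler spectrum, Eq. (1): rho_o = J_0(2*pi*f_d*tau)."""
    return bessel_j0(2.0 * np.pi * f_d * tau)

# Example: f_d = 100 Hz with selection delays of 2 ms and 3 ms
rho_2ms = csi_correlation(100.0, 2e-3)   # roughly 0.64
rho_3ms = csi_correlation(100.0, 3e-3)   # roughly 0.29
```

The faster the fading or the longer the selection delay, the smaller ρ_o, i.e., the less the measured CSI tells us about the channel at transmission time.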
Therefore, its signal transmission is organized in two phases: the source broadcasts a signal over the source-to-relay (denoted by SR hereinafter) links, and then the relays retransmit this signal over the relay-to-destination (RD) links. In the first phase, as shown in Fig. 1, the source (e.g., the drone in the figure) sends a symbol x, and those relays that correctly decode x form a decoding subset (DS) of the SR link:

    DS ≜ { k | log_2(1 + γ_{s,k}) ⩾ 2R },    (2)
Fig. 1. Schematic diagram of a cooperative network using different DF relaying strategies: ORS, PRS, and OSTC.

where R is an end-to-end (EE) target rate for the dual-hop relaying. Note that the required data rate of either hop is doubled to 2R due to the adoption of half-duplex transmission. The best relay (denoted by k̇) in ORS is opportunistically selected from the DS in terms of k̇ = arg max_{k∈DS} γ̂_{k,d}, where γ̂_{k,d} is the SNR of the RD link at the instant of relay selection, which is an outdated version of the actual SNR γ_{k,d} during signal transmission. In contrast, the proposed PRS scheme replaces the outdated CSI with the predicted CSI ȟ and determines k̇ in terms of k̇ = arg max_{k∈DS} γ̌_{k,d}, where γ̌_{k,d} = |ȟ_{k,d}|^2 P_k / σ_n^2. In addition to the best relay, the OSTC scheme [10] needs another relay with the second strongest SNR, i.e., k̈ = arg max_{k∈DS−{k̇}} γ̂_{k,d}. In the first phase, the source broadcasts a pair of symbols (x_1, x_2) to all relays over two consecutive symbol periods. At the pair of selected relays, the regenerated signals are encoded by means of the Alamouti scheme [11], which is the unique space-time code achieving both full rate and full diversity. In the second phase, one relay transmits (x_1, −x_2*) while the other simultaneously transmits (x_2, x_1*) at the same frequency.

III. PREDICTIVE RELAY SELECTION
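Before detailing the predictive scheme, the three selection rules just described can be sketched in a few lines of Python (the SNR values are toy numbers for illustration, not taken from the simulations):

```python
import numpy as np

K, R = 8, 1.0   # number of relays and end-to-end target rate [bps/Hz]
snr_sr = np.array([5.0, 0.5, 8.0, 12.0, 2.0, 9.0, 20.0, 3.0])   # SR-link SNRs
snr_rd = np.array([4.0, 7.0, 1.0, 15.0, 6.0, 2.0, 10.0, 11.0])  # RD-link SNRs
                                                                # (outdated or predicted)

# Decoding subset, Eq. (2): half-duplex doubles the per-hop rate to 2R
ds = [k for k in range(K) if np.log2(1.0 + snr_sr[k]) >= 2.0 * R]

def best_relay(ds, snr):
    """ORS/PRS rule: the single relay with the strongest RD-link SNR.
    ORS feeds in outdated SNRs, PRS feeds in predicted ones."""
    return max(ds, key=lambda k: snr[k])

def ostc_pair(ds, snr):
    """OSTC rule: the best and the second-best relay, which then send
    an Alamouti-encoded symbol pair in the second phase."""
    first = best_relay(ds, snr)
    second = best_relay([k for k in ds if k != first], snr)
    return first, second
```

With these toy values, relays whose SR-link SNR cannot support rate 2R are excluded from the decoding subset before any RD-link comparison takes place.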
Taking advantage of the new degree of freedom opened by channel prediction, we propose the PRS scheme, which achieves high performance also in fast time-varying channels. The prediction horizon relaxes the tight timing requirement of the selection procedure and therefore provides the flexibility to design an advanced relaying strategy. As depicted in Algorithm 1, the implementation of PRS is detailed as follows:
1) At frame t, as illustrated in Fig. 2, the source broadcasts a packet containing a pilot called Ready-To-Send (RTS) [4] and the data payload. The CSI h_{s,k}[t] is acquired at relay k by estimating the RTS and is used for detecting the data symbols. Those relays that correctly decode the source's signal comprise a DS.
2) Clear-To-Send (CTS) is sent from the destination, so that relay k can estimate h_{d,k}[t]; h_{k,d}[t] is then known due to channel reciprocity. The relay feeds h_{k,d}[t] into its embedded channel predictor to generate ȟ_{k,d}[t+1] and buffers it for use at the upcoming frame t+1.
3) Meanwhile, relay k belonging to the DS fetches ȟ_{k,d}[t], which was buffered at the previous frame t−1. This operation starts once the CTS arrives, in parallel with Step 2.
4) Then, a timer with a duration T_t proportional (denoted by ∝) to 1/|ȟ_{k,d}[t]|^2 is started at relay k.
5) The timer at the relay with the largest channel gain expires first, and that relay sends a short packet to announce its presence.
6) Upon receiving the best relay's notification, the other relays terminate their timers and keep silent. The selected relay forwards the signal until the end of this frame.
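Steps 4-6 realize the selection in a fully distributed way: no relay needs global CSI, because the timer inversely tied to the channel gain makes the best relay announce itself first. A minimal sketch (the scaling constant t0 is a hypothetical value):

```python
def timer_contention(ds, h_pred, t0=1e-4):
    """Each relay k in the decoding subset starts a timer
    T_k = t0 / |h_pred[k]|^2; the relay whose timer expires first
    (i.e., the one with the largest predicted gain) wins the contention."""
    timers = {k: t0 / abs(h_pred[k]) ** 2 for k in ds}
    winner = min(timers, key=timers.get)
    return winner, timers[winner]

# Four relays with predicted channel magnitudes; only relays 0-2 decoded correctly
h_pred = [0.5, 2.0, 1.0, 3.0]
winner, expiry = timer_contention([0, 1, 2], h_pred)   # relay 3 is not eligible
```

Note that relay 3 has the strongest predicted channel but is excluded because it is outside the decoding subset; among the eligible relays, the largest gain wins.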
Fig. 2. Frame structure of PRS. CSI-E: CSI Estimation, CSI-P: CSI Prediction, CSI-B: CSI Buffering, CP: Contention Period.

Algorithm 1 Predictive Relay Selection
for t = 1, 2, ... do
  s sends RTS; s sends data payload x[t]
  for k = 1, ..., K do
    estimate h_{s,k}[t]; x̂[t] = f(y_{s,k}[t], h_{s,k}[t])
    if x̂[t] is error-free then
      fetch ȟ_{k,d}[t]; start a timer T_t ∝ 1/|ȟ_{k,d}[t]|^2
    end if
  end for
  d sends CTS
  k̇ = arg max_{k∈DS} (|ȟ_{k,d}[t]|^2) notifies its presence
  k̇ transmits x̂[t]
  for k = 1, ..., K do
    estimate h_{k,d}[t]; predict ȟ_{k,d}[t+1]
    write ȟ_{k,d}[t+1] into the buffer
  end for
end for

IV. DL-BASED CHANNEL PREDICTION
This section first introduces the principles of deep recurrent networks, including the simple recurrent neural network (RNN) [16], Long Short-Term Memory (LSTM) [22], and the Gated Recurrent Unit (GRU) [23], followed by an explanation of how a recurrent network is applied to build a channel predictor.
A. Deep Recurrent Networks
Unlike the uni-directional information flow in feed-forward neural networks, an RNN has recurrent self-connections, which are applied to memorize historical states, exhibiting great potential in time-series prediction. The activation of the previous time step is fed back as part of the input for the current step. In a simple RNN, the l-th recurrent layer is generally modeled as

    d_t^{(l+1)} = R^{(l)}(d_t^{(l)}) = δ_h( W^{(l)} d_t^{(l)} + U^{(l)} d_{t−1}^{(l+1)} + b^{(l)} ),    (3)

where W^{(l)} and U^{(l)} are weight matrices of the l-th layer, b^{(l)} is a bias vector, d_t^{(l)} and d_t^{(l+1)} represent the input and output of layer l at time t, respectively, d_{t−1}^{(l+1)} is the feedback from the previous step, R^{(l)}(·) stands for the relation function between the input and output of the l-th RNN hidden layer, and the activation function δ_h is usually the hyperbolic tangent (tanh), i.e., δ_h(x) = (e^{2x} − 1)/(e^{2x} + 1).
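A minimal NumPy sketch of the recurrence in (3), with toy dimensions (3 inputs, 4 hidden units; the weights are random placeholders, not trained values):

```python
import numpy as np

def rnn_layer_step(d_in, d_prev, W, U, b):
    """One time step of the simple-RNN layer in Eq. (3):
    d_out = tanh(W @ d_in + U @ d_prev + b), where d_prev is this
    layer's own output from the previous step (the self-connection)."""
    return np.tanh(W @ d_in + U @ d_prev + b)

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 3))   # input weights
U = 0.1 * rng.standard_normal((4, 4))   # recurrent weights
b = np.zeros(4)

d_prev = np.zeros(4)                    # initial hidden activation
for d_in in rng.standard_normal((5, 3)):
    d_prev = rnn_layer_step(d_in, d_prev, W, U, b)   # unroll over 5 steps
```

The hidden activation after the loop depends on the whole input sequence, which is precisely the memory effect exploited for time-series prediction.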
Fig. 3. Block diagram of the receiver of a relay in PRS. The DL-based predictor consists of an input layer, L LSTM hidden layers, and an output layer, where the l-th hidden layer is opened to illustrate the internal structure of an LSTM memory block.

When a typical stochastic gradient descent method is used to train a recurrent network, the back-propagated error signals tend toward zero, which implies a prohibitively long convergence time. To tackle this gradient-vanishing problem, Hochreiter and Schmidhuber proposed Long Short-Term Memory in their pioneering work [22], which introduced the cell and gate into the RNN structure. A typical LSTM cell has three gates: an input gate controlling the extent to which new information flows into the cell, a forget gate to filter out useless memory, and an output gate that controls the extent to which the memory is applied to generate the activation. The upper part of Fig. 3 shows the graphical depiction of a deep LSTM network consisting of an input layer, L hidden layers, and an output layer. Let us use the l-th hidden layer as an example to shed light on how an activation signal goes through the network. There are two hidden states: the short-term state s_{t−1}^{(l)} and the long-term state c_{t−1}^{(l)}. The input d_t^{(l)} and s_{t−1}^{(l)} jointly activate four fully connected (FC) layers, generating the activation vectors for the gates, i.e.,

    i_t^{(l)} = δ_g( W_i^{(l)} d_t^{(l)} + U_i^{(l)} s_{t−1}^{(l)} + b_i^{(l)} )
    o_t^{(l)} = δ_g( W_o^{(l)} d_t^{(l)} + U_o^{(l)} s_{t−1}^{(l)} + b_o^{(l)} )
    f_t^{(l)} = δ_g( W_f^{(l)} d_t^{(l)} + U_f^{(l)} s_{t−1}^{(l)} + b_f^{(l)} ),    (4)

where W and U are weight matrices for the FC layers, b represents a bias, the subscripts i, o, and f associate with the input, output, and forget gate, respectively, and δ_g stands for the logistic sigmoid function δ_g(x) = 1/(1 + e^{−x}).
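The gate activations in (4) combine with the cell and output updates described next to form one LSTM time step. A self-contained NumPy sketch with placeholder (untrained) weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(d, s_prev, c_prev, P):
    """One LSTM time step. P maps gate names to (W, U, b) triples for the
    input (i), output (o), forget (f) gates and the candidate content (g)."""
    gate = lambda name, act: act(P[name][0] @ d + P[name][1] @ s_prev + P[name][2])
    i = gate("i", sigmoid)        # how much new content enters the cell
    o = gate("o", sigmoid)        # how much memory reaches the output
    f = gate("f", sigmoid)        # how much old memory is kept
    g = gate("g", np.tanh)        # candidate content
    c = f * c_prev + i * g        # long-term state: drop old, add new
    s = o * np.tanh(c)            # short-term state = layer output
    return s, c

rng = np.random.default_rng(1)
n_in, n_h = 2, 3
P = {k: (0.1 * rng.standard_normal((n_h, n_in)),
         0.1 * rng.standard_normal((n_h, n_h)),
         np.zeros(n_h)) for k in "iofg"}
s, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), P)
```

Because the long-term state c is carried forward additively rather than through repeated squashing, gradients survive over many steps, which is the cure for the vanishing-gradient problem described above.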
The current long-term state c_t^{(l)} is obtained by first throwing away outdated memory at the forget gate and then adding new information selected by the input gate, i.e., c_t^{(l)} = f_t^{(l)} ⊗ c_{t−1}^{(l)} + i_t^{(l)} ⊗ g_t^{(l)}, where the operator ⊗ denotes the Hadamard product (element-wise multiplication) and g_t^{(l)} = δ_h( W_g^{(l)} d_t^{(l)} + U_g^{(l)} s_{t−1}^{(l)} + b_g^{(l)} ). The output of this hidden layer is computed as

    d_t^{(l+1)} = L^{(l)}( d_t^{(l)} ) = o_t^{(l)} ⊗ δ_h( c_t^{(l)} ),    (5)

where L^{(l)}(·) represents the input-output function of the l-th LSTM layer. Despite its relatively short history, LSTM has achieved great success and has been commercially applied in many AI products such as Apple Siri and Google Translate. Since its emergence, the research community has published a number of variants, among which the GRU proposed by Cho et al. in [23] draws a lot of attention. It is a simplified version with fewer parameters, yet it exhibits even better performance than LSTM on certain smaller and less frequent datasets. To simplify the structure, a GRU memory cell has only a single hidden state, and the number of gates is reduced to two: the update gate and the reset gate. The activation vector for the update gate is computed by z_t^{(l)} = δ_g( W_z^{(l)} d_t^{(l)} + U_z^{(l)} s_{t−1}^{(l)} + b_z^{(l)} ), which decides the extent to which the memory content from the previous state remains in the current state. The reset gate controls whether the previous state is ignored; when it tends to 0, the hidden state is reset with the current input. It is given by r_t^{(l)} = δ_g( W_r^{(l)} d_t^{(l)} + U_r^{(l)} s_{t−1}^{(l)} + b_r^{(l)} ).
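The two gates combine with the state blend given next in (6); one GRU step in the same NumPy style (placeholder weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(d, s_prev, P):
    """One GRU time step: update gate z, reset gate r, then the blend
    s = (1 - z) * s_prev + z * tanh(Ws d + Us (r * s_prev) + bs)."""
    z = sigmoid(P["Wz"] @ d + P["Uz"] @ s_prev + P["bz"])   # keep old vs. take new
    r = sigmoid(P["Wr"] @ d + P["Ur"] @ s_prev + P["br"])   # ignore previous state?
    s_new = np.tanh(P["Ws"] @ d + P["Us"] @ (r * s_prev) + P["bs"])
    return (1.0 - z) * s_prev + z * s_new

rng = np.random.default_rng(2)
n_in, n_h = 2, 3
P = {f"{m}{g}": (0.1 * rng.standard_normal((n_h, n_in)) if m == "W"
                 else 0.1 * rng.standard_normal((n_h, n_h)))
     for g in "zrs" for m in "WU"}
P.update({f"b{g}": np.zeros(n_h) for g in "zrs"})
s = gru_step(rng.standard_normal(n_in), np.zeros(n_h), P)
```

Compared with the LSTM step, the GRU keeps a single state vector and three gated blocks instead of four, which is the source of its reduced parameter count.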
Likewise, the previous hidden state s_{t−1}^{(l)} goes through the cell, drops outdated memory, and inserts some new content, generating the current hidden state, that is,

    s_t^{(l)} = (1 − z_t^{(l)}) ⊗ s_{t−1}^{(l)} + z_t^{(l)} ⊗ δ_h( W_s^{(l)} d_t^{(l)} + U_s^{(l)} ( r_t^{(l)} ⊗ s_{t−1}^{(l)} ) + b_s^{(l)} ).    (6)

The hidden state is also the output of this hidden layer, i.e., d_t^{(l+1)} = G^{(l)}( d_t^{(l)} ) = s_t^{(l)}, where G^{(l)}(·) denotes the input-output function.

B. DL-based Channel Predictor
To shed light on the principle of a DL-based predictor, the chain of signal reception at the receiver is shown in Fig. 3. The predictor is inserted after the channel estimator and generates predicted CSI to replace the outdated CSI as the input of the relay selector. It is transparent, so an ORS system can be smoothly upgraded to a PRS system without any other modifications. In such a distributed-selection method, each relay needs to process only its local CSI h_{k,d}[t]. A complex-valued fading coefficient can be expressed in polar form as h_{k,d}[t] = a_{k,d}[t] e^{jθ_{k,d}[t]}, where a_{k,d}[t] and θ_{k,d}[t] denote the magnitude and phase, respectively. Because the selection relies only on the value of the SNR, knowledge of the magnitude a_{k,d}[t] suffices, rather than the complex-valued h_{k,d}[t]; this in turn simplifies the implementation of the channel predictor by employing a neural network with real-valued weights and biases. Feeding a_{k,d}[t] into the input feed-forward layer yields the one-dimensional output d_t^{(1)} = δ_h( w^{(i)} a_{k,d}[t] + b^{(i)} ), where w^{(i)} and b^{(i)} denote the weight and bias of the input layer. The activation of the 1st hidden layer is exactly d_t^{(1)}; then d_t^{(2)} = L^{(1)}( d_t^{(1)} ) is generated and forwarded to the 2nd hidden layer, where L^{(1)}(·) is defined in (5). The activation goes through the network until the output layer obtains the predicted CSI, which is computed by ǎ_{k,d}[t+1] = δ_h( W^{(o)} d_t^{(L)} + b^{(o)} ), where W^{(o)} and b^{(o)} denote the weight matrix and bias of the output layer, and the activation of the last hidden layer equals d_t^{(L)} = L^{(L)}( . . . L^{(2)}( L^{(1)}( d_t^{(1)} ) ) ). The structure of a deep recurrent network is flexible; for example, we can apply a hybrid network consisting of RNN, GRU, and LSTM layers, such as d_t^{(L)} = G^{(L)}( . . . L^{(2)}( R^{(1)}( d_t^{(1)} ) ) ).
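The layered forward pass just described can be sketched end-to-end. For brevity the sketch uses simple tanh recurrent layers as stand-ins for the LSTM hidden layers; the point is the chain structure and the real-valued, magnitude-only interface, and all weights are untrained placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

class RecurrentLayer:
    """Stand-in hidden layer: d_out = tanh(W d_in + U d_prev + b)."""
    def __init__(self, n_in, n_h):
        self.W = 0.1 * rng.standard_normal((n_h, n_in))
        self.U = 0.1 * rng.standard_normal((n_h, n_h))
        self.b = np.zeros(n_h)
        self.state = np.zeros(n_h)
    def __call__(self, d):
        self.state = np.tanh(self.W @ d + self.U @ self.state + self.b)
        return self.state

class MagnitudePredictor:
    """Chain of Fig. 3: scalar magnitude a[t] -> input layer -> L hidden
    layers -> predicted magnitude for the next step. Only |h| is used,
    so every weight is real-valued."""
    def __init__(self, n_h=25, L=2):
        self.w_in = 0.1 * rng.standard_normal()
        self.layers = [RecurrentLayer(1, n_h)] + \
                      [RecurrentLayer(n_h, n_h) for _ in range(L - 1)]
        self.W_out = 0.1 * rng.standard_normal(n_h)
    def step(self, a_t):
        d = np.atleast_1d(np.tanh(self.w_in * a_t))   # input layer
        for layer in self.layers:                     # hidden stack
            d = layer(d)
        return float(np.tanh(self.W_out @ d))         # output layer

predictor = MagnitudePredictor()
trace = [predictor.step(a) for a in np.abs(rng.standard_normal(10))]
```

Swapping `RecurrentLayer` for the LSTM or GRU steps shown earlier yields the hybrid stacks mentioned in the text without changing the surrounding chain.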
V. COMPUTATIONAL COMPLEXITY
In the context of cooperative diversity, the computational complexity mainly arises from multi-relay coordination and synchronization [2]. The simplicity of ORS stems from single-relay transmission, which substantially lowers the amount of signalling overhead among multiple relays. A direct comparison of different schemes is not easy and provides little real insight, which is why most works in this field [2]-[12] did not provide a quantitative analysis. On the other hand, the complexity of the proposed scheme comes mainly from the DL-based predictor, which is always a concern for the application of deep learning. From a practical perspective, it is more meaningful to clarify its demand on computing resources in comparison with the capability of COTS hardware. Hence, we focus on assessing the complexity of the predictors in terms of floating-point operations per second (FLOPS).

A deep recurrent network can be quantitatively modelled as follows: an input layer with N_i neurons, an output layer with N_o neurons, and L hidden layers with N_h^l neurons at layer l = 1, . . . , L. Starting with the input layer, it computes δ_h( W^{(i)} d + b^{(i)} ), where the matrix multiplication generates N_i N_h^1 floating-point multiplications and (N_i − 1) N_h^1 additions, and the addition of the bias vector consumes N_h^1 operations, amounting to a total of 2 N_i N_h^1. Note that the amount of computation raised by the activation function is negligible compared to the matrix multiplication and is usually ignored in complexity calculations for deep learning. Likewise, the output layer corresponds to 2 N_h^L N_o operations. For an RNN hidden layer as given in (3), the number of operations equals O_l = (2 N_h^{l−1} − 1) N_h^l + (2 N_h^l − 1) N_h^l + N_h^l, where the first term corresponds to the calculation of W^{(l)} d_t^{(l)}, the second is for U^{(l)} d_{t−1}^{(l+1)}, and the third is due to the addition of the bias.
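Summing the per-layer counts above over all layers, and scaling the recurrent term by the number of gated blocks per layer (1 for a simple RNN, 4 for LSTM, 3 for GRU), yields the closed-form totals derived next. A small helper to evaluate them:

```python
def flops_per_step(n_i, n_o, hidden, blocks):
    """Approximate floating-point operations per prediction step of a deep
    recurrent network: blocks = 1 (simple RNN), 4 (LSTM), or 3 (GRU);
    `hidden` lists the hidden-layer widths N_h^1 ... N_h^L."""
    widths = [n_i] + list(hidden)          # convention N_h^0 = N_i
    core = sum(widths[l - 1] * widths[l] + widths[l] ** 2
               for l in range(1, len(widths)))
    return 2 * (n_i * hidden[0] + hidden[-1] * n_o + blocks * core)

# Two LSTM hidden layers of 25 neurons each, scalar input and output:
ops = flops_per_step(1, 1, [25, 25], blocks=4)    # operations per prediction
flops = ops * 1_000                               # at f_p = 1000 predictions/s
```

At roughly 15 million floating-point operations per second, the predictor occupies a negligible share of a modern DSP's floating-point budget.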
For simplicity, O_l can be approximated by 2 N_h^{l−1} N_h^l + 2 (N_h^l)^2. Then, the overall complexity of a simple RNN is given by

    O_rnn ≈ 2 [ N_i N_h^1 + N_h^L N_o + Σ_{l=1}^{L} ( N_h^{l−1} N_h^l + (N_h^l)^2 ) ],    (7)

where we apply N_h^0 = N_i for a simpler expression. As derived from (4)-(5), the number of operations for the matrix multiplications in an LSTM layer is four times that of an RNN layer, i.e., 4 O_l. The computation for the gate control, which takes only on the order of N_h^l operations, can be neglected. Therefore, the complexity of an LSTM network is approximated by

    O_lstm ≈ 2 [ N_i N_h^1 + N_h^L N_o + 4 Σ_{l=1}^{L} ( N_h^{l−1} N_h^l + (N_h^l)^2 ) ].    (8)

Similarly, we can derive the expression for GRU, i.e.,

    O_gru ≈ 2 [ N_i N_h^1 + N_h^L N_o + 3 Σ_{l=1}^{L} ( N_h^{l−1} N_h^l + (N_h^l)^2 ) ].    (9)

Note that the above expressions give the complexity per prediction step; we need to multiply (7)-(9) by the frequency of prediction, denoted by f_p, i.e., the number of steps performed per second, to obtain the FLOPS.

Given concrete values of these parameters, the complexity of the predictor can be quantified and compared with the capacity of COTS computing hardware. Suppose the applied deep neural network has two LSTM hidden layers with N_h^1 = N_h^2 = 25 neurons (the selection of these hyper-parameters will be justified in the next section). The input of the predictor at the k-th relay is a_{k,d}[t], corresponding to N_i = N_o = 1. This amounts to O_lstm ≈ 15,300 floating-point operations per prediction in terms of (8). With the prediction interval equal to the sampling interval of 1 ms, the frequency of prediction is f_p = 1,000, resulting in approximately 15.3 MFLOPS. In comparison with off-the-shelf Digital Signal Processors (DSPs),
February 8, 2021 DRAFT
Fig. 4. (a) Prediction accuracy in terms of the number of hidden neurons. (b) Comparison of outage probability for ORS, OSTC, and PRS in a cooperative network with K = 8 relays. (c) Comparison of channel capacity for ORS, OSTC, and PRS in a cooperative network with K = 8 relays.

e.g., the TI C6678, this demand occupies only a tiny fraction, well below one percent, of the capacity of a single DSP chip. Taking into account backward compatibility with legacy hardware, we further check low-end DSPs; taking the TI C6748 as an example, the resource required by the predictor remains a small share of its capacity. In a nutshell, the complexity of the DL-based channel predictor applied for PRS is quite affordable, if not negligible.

VI. SIMULATION RESULTS
In this section, we clarify how to select the hyper-parameters of a deep recurrent network to obtain high prediction accuracy, and then use Monte-Carlo simulations to evaluate the outage probability and channel capacity of PRS in comparison with the existing schemes, ORS and OSTC. Following the channel assumption adopted by most previous works in this field, we apply single-antenna flat-fading i.i.d. channels. Each channel follows the Rayleigh distribution with unit average power gain, i.e., its fading coefficient satisfies h ~ CN(0, 1). The default maximal Doppler frequency shift is set to f_d = 100 Hz, emulating a fast-fading environment. Continuous-time channel responses are sampled at a rate of f_s = 1 KHz, adhering to the assumption of flat fading, so the sampling interval is T_s = 1 ms. Each channel generates a series of consecutive samples { h[t] | t = 1, 2, . . . }. As usual, an EE target rate of R = 1 bps/Hz is applied for the outage calculation. The total transmit power P is equally allocated between the two phases, where the source's power is P_s = 0.5P, resulting in an average SNR γ̄_{s,k} = 0.5 P/σ_n^2, while γ̄_{k,d} = 0.5 P/σ_n^2 for the RD link.

A. Training the Predictor
The hyper-parameters of a deep network, such as the number of layers or neurons, have a substantial impact on prediction accuracy, so it is worth clarifying how to tune a deep network on demand. A training process starts from an initial state where all weights and biases are randomly selected. The input of the predictor at the relay is a_{k,d}[t] and the output is its D-step-ahead prediction ǎ_{k,d}[t + D]. To measure prediction accuracy, the mean squared error (MSE) is applied as the cost function, namely MSE = (1/T) Σ_{t=1}^{T} | a_{k,d}[t + D] − ǎ_{k,d}[t + D] |^2, where T is the total number of channel samples for evaluation. Using batch training, a batch of samples is fed into the network per step. The output is compared with the desired values, and the resultant error signals are propagated back through the network to update the weights by means of a training algorithm such as the Adam optimizer used in our simulation. After a sufficient number of epochs, the trained network is employed to predict the CSI.

Fig. 4a compares the prediction accuracy of predictors with different hyper-parameters. Let us first look at the impact of the number of layers and the number of neurons. Starting from an LSTM network with a single hidden layer, denoted by LSTM-1 in the legend of the figure, its accuracy curve as a function of the number of hidden neurons resembles a 'U' shape. That is because the network suffers from under-fitting when the hidden layer has too few neurons, while over-fitting appears when it has too many. To make a fair comparison, the horizontal axis represents the total number of hidden neurons, which are evenly allocated across layers. For instance, the point '60' on the horizontal axis means a 2-hidden-layer network with 30 neurons per layer (denoted by LSTM-2), a 3-hidden-layer network with 20 neurons per layer (denoted by LSTM-3), or a single layer with 60 hidden neurons.
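The D-step-ahead sample pairing and the MSE cost above can be sketched as follows (the actual training of the recurrent network with the Adam optimizer is omitted; the sinusoidal magnitude sequence is purely illustrative):

```python
import numpy as np

def make_pairs(a, D):
    """Pair each magnitude sample a[t] with its D-step-ahead target a[t+D]."""
    return a[:-D], a[D:]

def mse(target, prediction):
    """Cost function used for training and evaluation: mean squared error."""
    return float(np.mean(np.abs(target - prediction) ** 2))

a = np.abs(np.sin(0.1 * np.arange(100)))   # a toy magnitude sequence
x, y = make_pairs(a, D=2)                  # inputs and 2-step-ahead targets
baseline = mse(y, x)                       # "predict no change" baseline error
```

A trained predictor is only useful if its MSE falls clearly below such a no-change baseline, which is what the accuracy comparison in Fig. 4a quantifies.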
No matter how many neurons are in its single hidden layer, LSTM-1 cannot reach the high accuracy achieved by LSTM-2 and LSTM-3, justifying the benefit of deep learning. But this does not mean that more layers are always better, as shown by the worse result of LSTM-4, which has 4 hidden layers. Having established that two hidden layers are the best choice for LSTM, we further observe the recurrent networks with 2 RNN or GRU hidden layers, indicated by RNN-2 and GRU-2, respectively. As we can see, GRU performs as well as LSTM, whereas RNN is weak. As a result, we select a 2-hidden-layer LSTM network with 25 neurons per layer, upon which the numerical results in the following figures are derived.

B. Performance Comparison
We further compare the outage performance of the three relaying schemes in a cooperative network with K = 8 relays, as illustrated in Fig. 4b. Relay selection with perfect knowledge of the CSI (i.e., ρ = 1) is used as the benchmark, which has a diversity order of 8 and decays at a rate of 1/γ̄^8, where γ̄ = P/σ_n^2 is the average EE SNR. With a delay of τ = 2 ms and 3 ms, the quality of the outdated CSI drops to ρ_o = J_0(0.4π) ≈ 0.64 and ρ_o = J_0(0.6π) ≈ 0.29, respectively, which substantially deteriorates the performance. The diversity order of ORS falls to 1, i.e., no diversity, and its curve decays slowly at a rate of 1/γ̄ in the high-SNR regime. OSTC can redeem some of the loss and achieve a diversity order of 2 by using a pair of relays, but its gap to the benchmark is still large at low outage levels. Making use of channel prediction, the quality of the CSI can be improved so that ρ approaches 1. The proposed scheme achieves nearly optimal performance with a prediction horizon of two steps (by setting D = 2) and remarkably outperforms OSTC. Moreover, the channel capacities of the different schemes given τ = 3 ms are comparatively illustrated in Fig. 4c. At an SNR of γ̄ = 20 dB, for instance, ORS suffers a considerable capacity loss relative to perfect CSI, whereas PRS achieves a near-optimal capacity.
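The qualitative effect of outdated CSI on ORS can be reproduced with a few lines of Monte-Carlo simulation (second hop only, all relays assumed in the decoding subset; the parameter values are illustrative, not the exact simulation setup):

```python
import numpy as np

rng = np.random.default_rng(7)
K, trials, R = 8, 100_000, 1.0
rho, snr_avg = 0.64, 10.0     # CSI quality (f_d*tau = 0.2) and mean RD SNR

def cgauss(shape):
    """Unit-variance circularly-symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h_old = cgauss((trials, K))                                      # CSI at selection time
h_now = rho * h_old + np.sqrt(1 - rho**2) * cgauss((trials, K))  # CSI at transmission

sel = np.argmax(np.abs(h_old) ** 2, axis=1)            # ORS picks on outdated CSI
g_sel = np.abs(h_now[np.arange(trials), sel]) ** 2     # gain actually experienced
g_best = np.max(np.abs(h_now) ** 2, axis=1)            # genie-aided best gain

outage_ors = np.mean(0.5 * np.log2(1 + snr_avg * g_sel) < R)
outage_perfect = np.mean(0.5 * np.log2(1 + snr_avg * g_best) < R)
```

With ρ well below 1, the selected relay is often no longer the best one at transmission time, so `outage_ors` lies far above `outage_perfect`; PRS narrows this gap by pushing the effective ρ toward 1.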
VII. CONCLUSIONS

In this paper, we proposed a deep-learning-aided cooperative diversity method for mobile terminals without an antenna array to cultivate the benefit of spatial diversity. A recurrent neural network was deliberately built to improve the timeliness of the channel state information applied for selecting a single opportunistic relay. By simply inserting a channel predictor between the channel estimator and the relay selector, an ORS system can be upgraded to a PRS system without any other modifications, making the approach transparent and easy to keep compatible with existing systems and standards. It achieves the optimal performance with a full diversity order equaling the number of cooperating relays in slow-fading wireless environments, and substantially outperforms the existing schemes in fast-fading channels. It inherits the simplicity of ORS by avoiding multi-relay coordination and synchronization, and the computational complexity arising from fading channel prediction is negligible compared with COTS hardware. From the perspectives of performance, compatibility, and complexity, it is viewed as a good candidate for next-generation cooperative networks.

REFERENCES

[1] W. Jiang, "Device-to-device based cooperative relaying for 5G network: A comparative review," ZTE Commun., vol. 15, no. S1, pp. 60-66, Jun. 2017.
[2] A. A. Nasir et al., "Timing and carrier synchronization with channel estimation in multi-relay cooperative networks," IEEE Trans. Signal Process., vol. 60, no. 2, pp. 793-811, Feb. 2012.
[3] J. N. Laneman and G. W. Wornell, "Distributed space-time-coded protocols for exploiting cooperative diversity in wireless networks," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2415-2425, Oct. 2003.
[4] A. Bletsas et al., "A simple cooperative diversity method based on network path selection," IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 659-672, Mar. 2006.
[5] W. Jiang et al., "An MGF-based performance analysis of opportunistic relay selection with outdated CSI," in Proc. IEEE VTC'2014-Spring, Seoul, South Korea, May 2014.
[6] J. L. Vicario et al., "Opportunistic relay selection with outdated CSI: Outage probability and diversity analysis," IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 2872-2876, Jun. 2009.
[7] W. Jiang et al., "Opportunistic relaying over aerial-to-terrestrial and device-to-device radio channels," in Proc. IEEE ICC'2014, Sydney, Australia, Jul. 2014, pp. 206-211.
[8] L. Xiao and X. Dong, "Unified analysis of generalized selection combining with normalized threshold test per branch," IEEE Trans. Wireless Commun., vol. 5, no. 8, pp. 2153-2163, Aug. 2006.
[9] Y. Li et al., "On the design of relay selection strategies in regenerative cooperative networks with outdated CSI," IEEE Trans. Wireless Commun., vol. 10, no. 9, pp. 3086-3097, Sep. 2011.
[10] W. Jiang, T. Kaiser, and A. J. H. Vinck, "A robust opportunistic relaying strategy for co-operative wireless communications," IEEE Trans. Wireless Commun., vol. 15, no. 4, pp. 2642-2655, Apr. 2016.
[11] W. Jiang, H. Cao, and T. Kaiser, "Opportunistic space-time coding to exploit cooperative diversity in fast-fading channels," in Proc. IEEE ICC'2014, Sydney, Australia, Jun. 2014, pp. 4814-4819.
[12] W. Jiang et al., "Achieving high reliability in aerial-terrestrial networks: Opportunistic space-time coding," in Proc. IEEE Eur. Conf. on Netw. and Commun. (EuCNC), Bologna, Italy, Jun. 2014.
[13] W. Jiang and H. D. Schotten, "Neural network-based fading channel prediction: A comprehensive overview," IEEE Access, vol. 7, pp. 118112-118124, Aug. 2019.
[14] ——, "Deep learning for fading channel prediction," IEEE Open J. Commun. Society, vol. 1, pp. 320-332, Mar. 2020.
[15] W. Jiang and H. Schotten, "Neural network-based channel prediction and its performance in multi-antenna systems," in Proc. IEEE Veh. Tech. Conf. (VTC), Chicago, USA, Aug. 2018.
[16] W. Jiang and H. D. Schotten, "Recurrent neural network-based frequency-domain channel prediction for wideband communications," in Proc. IEEE Veh. Tech. Conf. (VTC), Kuala Lumpur, Malaysia, Apr. 2019.
[17] W. Jiang, H. Schotten, and J. Y. Xiang, "Neural network-based wireless channel prediction," in Machine Learning for Future Wireless Communications, F. L. Luo, Ed. United Kingdom: John Wiley & Sons and IEEE Press, 2019, ch. 16.
[18] W. Jiang and H. D. Schotten, "Multi-antenna fading channel prediction empowered by artificial intelligence," in Proc. IEEE Veh. Tech. Conf. (VTC), Chicago, USA, Aug. 2018.
[19] W. Jiang and H. Schotten, "Recurrent neural networks with long short-term memory for fading channel prediction," in Proc. IEEE Veh. Tech. Conf. (VTC), Antwerp, Belgium, May 2020.
[20] W. Jiang, M. Strufe, and H. Schotten, "Long-range MIMO channel prediction using recurrent neural networks," in Proc. IEEE Consumer Commun. & Netw. Conf. (CCNC), Las Vegas, USA, Jan. 2020.
[21] W. Jiang and H. D. Schotten, "A deep learning method to predict fading channel in multi-antenna systems," in Proc. IEEE Veh. Tech. Conf. (VTC), Antwerp, Belgium, May 2020.
[22] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Dec. 1997.
[23] K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," preprint arXiv:1406.1078, Jun. 2014.