Predictive Relay Selection: A Cooperative Diversity Scheme Using Deep Learning
Wei Jiang
German Research Center for Artificial Intelligence (DFKI)
Kaiserslautern, Germany
https://orcid.org/0000-0002-3719-3710
Hans Dieter Schotten
University of Kaiserslautern
Kaiserslautern, Germany
https://orcid.org/0000-0001-5005-3635
Abstract
In this paper, we propose a novel cooperative multi-relay transmission scheme for mobile terminals to exploit spatial diversity. By improving the timeliness of measured channel state information (CSI) through deep learning (DL)-based channel prediction, the proposed scheme remarkably lowers the probability of wrong relay selection arising from outdated CSI in fast time-varying channels. It inherits the simplicity of opportunistic relaying by selecting a single relay, avoiding the complexity of multi-relay coordination and synchronization. Numerical results reveal that it can achieve full diversity gain in slow-fading channels and substantially outperforms the existing schemes in fast-fading wireless environments. Moreover, the computational complexity brought by the DL predictor is negligible for commercial off-the-shelf computing hardware.
Index Terms
Cooperative diversity, outdated CSI, channel prediction, deep learning, LSTM, opportunistic relaying
I. INTRODUCTION
Cooperative diversity [1] is an effective technique for mobile terminals without an antenna array to cultivate the spatial diversity that is typically achieved by co-located multi-antenna systems. A main challenge of cooperative diversity is the inherent asynchronization among spatially-distributed antennas (relays). Multiple timing offsets and multiple carrier frequency offsets [2] among simultaneously-transmitting relays make multi-relay transmission, such as distributed beamforming and distributed space-time coding [3], too complicated for practical systems. In contrast, a single-relay approach called opportunistic relay selection (ORS) or opportunistic relaying [4] achieves full diversity gain while avoiding the complexity of multi-relay synchronization and coordination. However, ORS is applicable only in slow-fading wireless environments, since the channel state information (CSI) used to select the best relay may become outdated quickly in fast-fading channels. Using a wrongly-selected relay substantially deteriorates the performance of ORS, as widely verified in the literature, e.g., [5]-[7]. With the proliferation of high-mobility applications (such as vehicle-to-X, high-speed trains, and unmanned aerial vehicles) and the utilization of higher frequency bands (e.g., millimeter wave and Terahertz) in 5G and beyond systems, the problem of outdated/aged CSI becomes more challenging. A cooperative method called generalized selection combining [8] shows robustness under aged channels, but it suffers from a substantial loss of spectral efficiency. The authors of [9] proposed a method utilizing the knowledge of channel statistics, achieving only a marginal gain at a clearly increased complexity. By far, opportunistic space-time coding (OSTC), proposed by the author of
this paper in [10]-[12], is the best method in fast-fading channels from the perspective of the diversity-multiplexing trade-off. But its performance gap to perfect selection using perfect knowledge of the channel is still large, motivating the follow-up work presented here. In this paper, therefore, we propose a novel cooperative method coined predictive relay selection (PRS) for mobile terminals to exploit the gain of spatial diversity. The probability of wrong relay selection due to outdated CSI is remarkably reduced by improving the timeliness of CSI through fading channel prediction [13]-[21]. A deep recurrent neural network is deliberately built to provide highly accurate CSI predictions. The proposed scheme inherits the simplicity of ORS by selecting a single opportunistic relay to avoid the complexity of multi-relay coordination and synchronization. Simulation results reveal that it can achieve full diversity order in slow-fading channels and substantially outperforms the existing schemes in fast-fading wireless environments. Moreover, the computational complexity brought by the deep learning (DL)-based predictor is analyzed and compared with commercial off-the-shelf (COTS) computing hardware. The rest of this paper is organized as follows: Section II introduces the system model. Sections III and IV present the proposed selection scheme and the principle of the channel predictor, respectively. Complexity analysis and numerical results are given in Sections V and VI. Finally, Section VII concludes this paper.

II. SYSTEM MODEL
Following the working assumption applied in most prior research works [2]-[12], we consider a dual-hop decode-and-forward (DF) cooperative network where a single source node s communicates with a single destination node d with the aid of K relays. Each node is equipped with a single antenna that is used for both signal transmission and reception over a narrow-band channel. The received signal in an arbitrary link A → B is modeled as y_B = h_{A,B} x_A + z_B, where x_A ∈ C is the transmitted symbol from node A with average power P_A = E[|x_A|^2], z_B stands for additive white Gaussian noise with zero mean and variance σ_n^2, i.e., z_B ~ CN(0, σ_n^2), and h_{A,B} represents the fading coefficient of the channel from A to B, which is a zero-mean circularly-symmetric complex Gaussian random variable h ~ CN(0, σ_h^2) under the assumption of Rayleigh fading. The instantaneous signal-to-noise ratio (SNR) is denoted by γ_{A,B} = |h_{A,B}|^2 P_A / σ_n^2 and the average SNR by γ̄_{A,B} = E[γ_{A,B}] = P_A σ_h^2 / σ_n^2.

In a practical system, there is a delay between the time of relay selection and the instant of using the selected relay to transmit signals. The actual CSI h may differ from its outdated version ĥ that is applied for selecting relays. To quantify the quality of the CSI, the correlation coefficient between h and ĥ is introduced, i.e., ρ_o = E[h ĥ*] / sqrt(E[|h|^2] E[|ĥ|^2]). With the classical Doppler spectrum of the Jakes model, it takes the value

    ρ_o = J_0(2π f_d τ),    (1)

where f_d is the maximal Doppler frequency, τ stands for the delay between the outdated and actual CSI, and J_0(·) denotes the zeroth-order Bessel function of the first kind. Due to severe signal attenuation, a single-antenna relay should operate in half-duplex transmission mode to prevent harmful self-interference between its transmitter and receiver.
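For concreteness, Eq. (1) can be evaluated numerically. The sketch below (Python with NumPy) approximates J_0 by its integral form rather than relying on a special-function library; the parameter values are the illustrative ones used later in the paper:

```python
import numpy as np

def bessel_j0(x: float, n: int = 100_000) -> float:
    """Zeroth-order Bessel function of the first kind, evaluated via its
    integral form J_0(x) = (1/pi) * int_0^pi cos(x*sin(theta)) dtheta,
    approximated with a midpoint rule."""
    theta = (np.arange(n) + 0.5) * (np.pi / n)
    return float(np.mean(np.cos(x * np.sin(theta))))

def csi_correlation(f_d: float, tau: float) -> float:
    """Correlation rho_o between outdated and actual CSI under the Jakes
    Doppler spectrum, Eq. (1): rho_o = J_0(2*pi*f_d*tau)."""
    return bessel_j0(2.0 * np.pi * f_d * tau)

# Example: f_d = 100 Hz with selection delays of 2 ms and 3 ms
rho_2ms = csi_correlation(100.0, 2e-3)   # roughly 0.64
rho_3ms = csi_correlation(100.0, 3e-3)   # roughly 0.29
```

The faster the fading or the longer the selection delay, the smaller ρ_o, i.e., the less the measured CSI tells us about the channel at transmission time.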
Therefore, its signal transmission is organized in two phases: the source broadcasts a signal over the source-to-relay (denoted by SR hereinafter) links, and then the relays retransmit this signal over the relay-to-destination (RD) links. In the first phase, as shown in Fig. 1, the source (e.g., the drone in the figure) sends a symbol x, and those relays that correctly decode x form a decoding subset (DS) of the SR link:

    DS ≜ { k | log_2(1 + γ_{s,k}) ⩾ 2R },    (2)
Fig. 1. Schematic diagram of a cooperative network using different DF relaying strategies: ORS, PRS, and OSTC.

where R is an end-to-end (EE) target rate for the dual-hop relaying. Note that the required data rate of either hop is doubled to 2R due to the adoption of half-duplex transmission. The best relay (denoted by k̇) in ORS is opportunistically selected from the DS in terms of k̇ = arg max_{k∈DS} γ̂_{k,d}, where γ̂_{k,d} is the SNR of the RD link at the instant of relay selection, which is an outdated version of the actual SNR γ_{k,d} during signal transmission. In contrast, the proposed PRS scheme replaces the outdated CSI with the predicted CSI ȟ and determines k̇ in terms of k̇ = arg max_{k∈DS} γ̌_{k,d}, where γ̌_{k,d} = |ȟ_{k,d}|^2 P_k / σ_n^2. In addition to the best relay, the OSTC scheme [10] needs another relay with the second strongest SNR, i.e., k̈ = arg max_{k∈DS−{k̇}} γ̂_{k,d}. In the first phase, the source broadcasts a pair of symbols (x_1, x_2) to all relays over two consecutive symbol periods. At the pair of selected relays, the regenerated signals are encoded by means of the Alamouti scheme [11], which is the unique space-time code achieving both full rate and full diversity. In the second phase, one relay transmits (x_1, −x_2*) while the other simultaneously transmits (x_2, x_1*) at the same frequency.

III. PREDICTIVE RELAY SELECTION
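Before detailing the predictive scheme, the three selection rules just described can be sketched in a few lines of Python (the SNR values are toy numbers for illustration, not taken from the simulations):

```python
import numpy as np

K, R = 8, 1.0   # number of relays and end-to-end target rate [bps/Hz]
snr_sr = np.array([5.0, 0.5, 8.0, 12.0, 2.0, 9.0, 20.0, 3.0])   # SR-link SNRs
snr_rd = np.array([4.0, 7.0, 1.0, 15.0, 6.0, 2.0, 10.0, 11.0])  # RD-link SNRs
                                                                # (outdated or predicted)

# Decoding subset, Eq. (2): half-duplex doubles the per-hop rate to 2R
ds = [k for k in range(K) if np.log2(1.0 + snr_sr[k]) >= 2.0 * R]

def best_relay(ds, snr):
    """ORS/PRS rule: the single relay with the strongest RD-link SNR.
    ORS feeds in outdated SNRs, PRS feeds in predicted ones."""
    return max(ds, key=lambda k: snr[k])

def ostc_pair(ds, snr):
    """OSTC rule: the best and the second-best relay, which then send
    an Alamouti-encoded symbol pair in the second phase."""
    first = best_relay(ds, snr)
    second = best_relay([k for k in ds if k != first], snr)
    return first, second
```

With these toy values, relays whose SR-link SNR cannot support rate 2R are excluded from the decoding subset before any RD-link comparison takes place.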
Taking advantage of the new degree of freedom opened by channel prediction, we propose the PRS scheme, which achieves high performance also in fast time-varying channels. The prediction horizon relaxes the tight timing requirement of the selection procedure and therefore provides the flexibility to design an advanced relaying strategy. As depicted in Algorithm 1, the implementation of PRS is detailed as follows:
1) At frame t, as illustrated in Fig. 2, the source broadcasts a packet containing a pilot called Ready-To-Send (RTS) [4] and the data payload. The CSI h_{s,k}[t] is acquired at relay k by estimating the RTS and is used for detecting the data symbols. Those relays that correctly decode the source's signal comprise a DS.
2) Clear-To-Send (CTS) is sent from the destination, so that relay k can estimate h_{d,k}[t]; h_{k,d}[t] is then known due to channel reciprocity. The relay feeds h_{k,d}[t] into its embedded channel predictor to generate ȟ_{k,d}[t+1] and buffers it for use at the upcoming frame t+1.
3) Meanwhile, relay k belonging to the DS fetches ȟ_{k,d}[t], which was buffered at the previous frame t−1. This operation starts once the CTS arrives, in parallel with Step 2.
4) Then, a timer with a duration T_t proportional (denoted by ∝) to 1/|ȟ_{k,d}[t]|^2 is started at relay k.
5) The timer at the relay with the largest channel gain expires first, and that relay sends a short packet to announce its presence.
6) Upon receiving the best relay's notification, the other relays terminate their timers and keep silent. The selected relay forwards the signal until the end of this frame.
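Steps 4-6 realize the selection in a fully distributed way: no relay needs global CSI, because the timer inversely tied to the channel gain makes the best relay announce itself first. A minimal sketch (the scaling constant t0 is a hypothetical value):

```python
def timer_contention(ds, h_pred, t0=1e-4):
    """Each relay k in the decoding subset starts a timer
    T_k = t0 / |h_pred[k]|^2; the relay whose timer expires first
    (i.e., the one with the largest predicted gain) wins the contention."""
    timers = {k: t0 / abs(h_pred[k]) ** 2 for k in ds}
    winner = min(timers, key=timers.get)
    return winner, timers[winner]

# Four relays with predicted channel magnitudes; only relays 0-2 decoded correctly
h_pred = [0.5, 2.0, 1.0, 3.0]
winner, expiry = timer_contention([0, 1, 2], h_pred)   # relay 3 is not eligible
```

Note that relay 3 has the strongest predicted channel but is excluded because it is outside the decoding subset; among the eligible relays, the largest gain wins.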
Fig. 2. Frame structure of PRS. CSI-E: CSI Estimation, CSI-P: CSI Prediction, CSI-B: CSI Buffering, CP: Contention Period.

Algorithm 1 Predictive Relay Selection
for t = 1, 2, ... do
  s sends RTS; s sends data payload x[t]
  for k = 1, ..., K do
    estimate h_{s,k}[t]; x̂[t] = f(y_{s,k}[t], h_{s,k}[t])
    if x̂[t] is error-free then
      fetch ȟ_{k,d}[t]; start a timer T_t ∝ 1/|ȟ_{k,d}[t]|^2
    end if
  end for
  d sends CTS
  k̇ = arg max_{k∈DS} (|ȟ_{k,d}[t]|^2) notifies its presence
  k̇ transmits x̂[t]
  for k = 1, ..., K do
    estimate h_{k,d}[t]; predict ȟ_{k,d}[t+1]
    write ȟ_{k,d}[t+1] into the buffer
  end for
end for

IV. DL-BASED CHANNEL PREDICTION
This section first introduces the principles of deep recurrent networks, including the simple recurrent neural network (RNN) [16], Long Short-Term Memory (LSTM) [22], and the Gated Recurrent Unit (GRU) [23], followed by an explanation of how a recurrent network is applied to build a channel predictor.
A. Deep Recurrent Networks
Unlike the uni-directional information flow in feed-forward neural networks, an RNN has recurrent self-connections, which are applied to memorize historical states, exhibiting great potential in time-series prediction. The activation of the previous time step is fed back as part of the input for the current step. In a simple RNN, the l-th recurrent layer is generally modeled as

    d_t^{(l+1)} = R^{(l)}(d_t^{(l)}) = δ_h( W^{(l)} d_t^{(l)} + U^{(l)} d_{t−1}^{(l+1)} + b^{(l)} ),    (3)

where W^{(l)} and U^{(l)} are weight matrices of the l-th layer, b^{(l)} is a bias vector, d_t^{(l)} and d_t^{(l+1)} represent the input and output of layer l at time t, respectively, d_{t−1}^{(l+1)} is the feedback from the previous step, R^{(l)}(·) stands for the relation function between the input and output of the l-th RNN hidden layer, and the activation function δ_h is usually the hyperbolic tangent (tanh), i.e., δ_h(x) = (e^{2x} − 1)/(e^{2x} + 1).
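A minimal NumPy sketch of the recurrence in (3), with toy dimensions (3 inputs, 4 hidden units; the weights are random placeholders, not trained values):

```python
import numpy as np

def rnn_layer_step(d_in, d_prev, W, U, b):
    """One time step of the simple-RNN layer in Eq. (3):
    d_out = tanh(W @ d_in + U @ d_prev + b), where d_prev is this
    layer's own output from the previous step (the self-connection)."""
    return np.tanh(W @ d_in + U @ d_prev + b)

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 3))   # input weights
U = 0.1 * rng.standard_normal((4, 4))   # recurrent weights
b = np.zeros(4)

d_prev = np.zeros(4)                    # initial hidden activation
for d_in in rng.standard_normal((5, 3)):
    d_prev = rnn_layer_step(d_in, d_prev, W, U, b)   # unroll over 5 steps
```

The hidden activation after the loop depends on the whole input sequence, which is precisely the memory effect exploited for time-series prediction.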
Fig. 3. Block diagram of the receiver of a relay in PRS. The DL-based predictor consists of an input layer, L LSTM hidden layers, and an output layer, where the l-th hidden layer is opened to illustrate the internal structure of an LSTM memory block.

When a typical stochastic gradient descent method is used to train a recurrent network, the back-propagated error signals tend toward zero, which implies a prohibitively long convergence time. To tackle this gradient-vanishing problem, Hochreiter and Schmidhuber proposed Long Short-Term Memory in their pioneering work [22], which introduced the cell and gate into the RNN structure. A typical LSTM cell has three gates: an input gate controlling the extent to which new information flows into the cell, a forget gate to filter out useless memory, and an output gate that controls the extent to which the memory is applied to generate the activation. The upper part of Fig. 3 shows the graphical depiction of a deep LSTM network consisting of an input layer, L hidden layers, and an output layer. Let us use the l-th hidden layer as an example to shed light on how an activation signal goes through the network. There are two hidden states: the short-term state s_{t−1}^{(l)} and the long-term state c_{t−1}^{(l)}. The input d_t^{(l)} and s_{t−1}^{(l)} jointly activate four fully connected (FC) layers, generating the activation vectors for the gates, i.e.,

    i_t^{(l)} = δ_g( W_i^{(l)} d_t^{(l)} + U_i^{(l)} s_{t−1}^{(l)} + b_i^{(l)} )
    o_t^{(l)} = δ_g( W_o^{(l)} d_t^{(l)} + U_o^{(l)} s_{t−1}^{(l)} + b_o^{(l)} )
    f_t^{(l)} = δ_g( W_f^{(l)} d_t^{(l)} + U_f^{(l)} s_{t−1}^{(l)} + b_f^{(l)} ),    (4)

where W and U are weight matrices for the FC layers, b represents a bias, the subscripts i, o, and f associate with the input, output, and forget gate, respectively, and δ_g stands for the logistic sigmoid function δ_g(x) = 1/(1 + e^{−x}).
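The gate activations in (4) combine with the cell and output updates described next to form one LSTM time step. A self-contained NumPy sketch with placeholder (untrained) weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(d, s_prev, c_prev, P):
    """One LSTM time step. P maps gate names to (W, U, b) triples for the
    input (i), output (o), forget (f) gates and the candidate content (g)."""
    gate = lambda name, act: act(P[name][0] @ d + P[name][1] @ s_prev + P[name][2])
    i = gate("i", sigmoid)        # how much new content enters the cell
    o = gate("o", sigmoid)        # how much memory reaches the output
    f = gate("f", sigmoid)        # how much old memory is kept
    g = gate("g", np.tanh)        # candidate content
    c = f * c_prev + i * g        # long-term state: drop old, add new
    s = o * np.tanh(c)            # short-term state = layer output
    return s, c

rng = np.random.default_rng(1)
n_in, n_h = 2, 3
P = {k: (0.1 * rng.standard_normal((n_h, n_in)),
         0.1 * rng.standard_normal((n_h, n_h)),
         np.zeros(n_h)) for k in "iofg"}
s, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), P)
```

Because the long-term state c is carried forward additively rather than through repeated squashing, gradients survive over many steps, which is the cure for the vanishing-gradient problem described above.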
The current long-term state c_t^{(l)} is obtained by first throwing away outdated memory at the forget gate and then adding new information selected by the input gate, i.e., c_t^{(l)} = f_t^{(l)} ⊗ c_{t−1}^{(l)} + i_t^{(l)} ⊗ g_t^{(l)}, where the operator ⊗ denotes the Hadamard product (element-wise multiplication) and g_t^{(l)} = δ_h( W_g^{(l)} d_t^{(l)} + U_g^{(l)} s_{t−1}^{(l)} + b_g^{(l)} ). The output of this hidden layer is computed as

    d_t^{(l+1)} = L^{(l)}( d_t^{(l)} ) = o_t^{(l)} ⊗ δ_h( c_t^{(l)} ),    (5)

where L^{(l)}(·) represents the input-output function of the l-th LSTM layer. Despite its relatively short history, LSTM has achieved great success and has been commercially applied in many AI products such as Apple Siri and Google Translate. Since its emergence, the research community has published a number of variants, among which the GRU proposed by Cho et al. in [23] draws a lot of attention. It is a simplified version with fewer parameters, yet it exhibits even better performance than LSTM on certain smaller and less frequent datasets. To simplify the structure, a GRU memory cell has only a single hidden state, and the number of gates is reduced to two: the update gate and the reset gate. The activation vector for the update gate is computed by z_t^{(l)} = δ_g( W_z^{(l)} d_t^{(l)} + U_z^{(l)} s_{t−1}^{(l)} + b_z^{(l)} ), which decides the extent to which the memory content from the previous state remains in the current state. The reset gate controls whether the previous state is ignored; when it tends to 0, the hidden state is reset with the current input. It is given by r_t^{(l)} = δ_g( W_r^{(l)} d_t^{(l)} + U_r^{(l)} s_{t−1}^{(l)} + b_r^{(l)} ).
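The two gates combine with the state blend given next in (6); one GRU step in the same NumPy style (placeholder weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(d, s_prev, P):
    """One GRU time step: update gate z, reset gate r, then the blend
    s = (1 - z) * s_prev + z * tanh(Ws d + Us (r * s_prev) + bs)."""
    z = sigmoid(P["Wz"] @ d + P["Uz"] @ s_prev + P["bz"])   # keep old vs. take new
    r = sigmoid(P["Wr"] @ d + P["Ur"] @ s_prev + P["br"])   # ignore previous state?
    s_new = np.tanh(P["Ws"] @ d + P["Us"] @ (r * s_prev) + P["bs"])
    return (1.0 - z) * s_prev + z * s_new

rng = np.random.default_rng(2)
n_in, n_h = 2, 3
P = {f"{m}{g}": (0.1 * rng.standard_normal((n_h, n_in)) if m == "W"
                 else 0.1 * rng.standard_normal((n_h, n_h)))
     for g in "zrs" for m in "WU"}
P.update({f"b{g}": np.zeros(n_h) for g in "zrs"})
s = gru_step(rng.standard_normal(n_in), np.zeros(n_h), P)
```

Compared with the LSTM step, the GRU keeps a single state vector and three gated blocks instead of four, which is the source of its reduced parameter count.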
Likewise, the previous hidden state s_{t−1}^{(l)} goes through the cell, drops outdated memory, and inserts some new content, generating the current hidden state, that is,

    s_t^{(l)} = (1 − z_t^{(l)}) ⊗ s_{t−1}^{(l)} + z_t^{(l)} ⊗ δ_h( W_s^{(l)} d_t^{(l)} + U_s^{(l)} ( r_t^{(l)} ⊗ s_{t−1}^{(l)} ) + b_s^{(l)} ).    (6)

The hidden state is also the output of this hidden layer, i.e., d_t^{(l+1)} = G^{(l)}( d_t^{(l)} ) = s_t^{(l)}, where G^{(l)}(·) denotes the input-output function.

B. DL-based Channel Predictor
To shed light on the principle of a DL-based predictor, the chain of signal reception at the receiver is shown in Fig. 3. The predictor is inserted after the channel estimator and generates predicted CSI to replace the outdated CSI as the input of the relay selector. It is transparent, so an ORS system can be smoothly upgraded to a PRS system without any other modifications. In such a distributed-selection method, each relay needs to process only its local CSI h_{k,d}[t]. A complex-valued fading coefficient can be expressed in polar form as h_{k,d}[t] = a_{k,d}[t] e^{jθ_{k,d}[t]}, where a_{k,d}[t] and θ_{k,d}[t] denote the magnitude and phase, respectively. Because the selection relies only on the value of the SNR, knowledge of the magnitude a_{k,d}[t] suffices, rather than the complex-valued h_{k,d}[t]; this in turn simplifies the implementation of the channel predictor by employing a neural network with real-valued weights and biases. Feeding a_{k,d}[t] into the input feed-forward layer yields the one-dimensional output d_t^{(1)} = δ_h( w^{(i)} a_{k,d}[t] + b^{(i)} ), where w^{(i)} and b^{(i)} denote the weight and bias of the input layer. The activation of the 1st hidden layer is exactly d_t^{(1)}; then d_t^{(2)} = L^{(1)}( d_t^{(1)} ) is generated and forwarded to the 2nd hidden layer, where L^{(1)}(·) is defined in (5). The activation goes through the network until the output layer obtains the predicted CSI, which is computed by ǎ_{k,d}[t+1] = δ_h( W^{(o)} d_t^{(L)} + b^{(o)} ), where W^{(o)} and b^{(o)} denote the weight matrix and bias of the output layer, and the activation of the last hidden layer equals d_t^{(L)} = L^{(L)}( . . . L^{(2)}( L^{(1)}( d_t^{(1)} ) ) ). The structure of a deep recurrent network is flexible; for example, we can apply a hybrid network consisting of RNN, GRU, and LSTM layers, such as d_t^{(L)} = G^{(L)}( . . . L^{(2)}( R^{(1)}( d_t^{(1)} ) ) ).
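The layered forward pass just described can be sketched end-to-end. For brevity the sketch uses simple tanh recurrent layers as stand-ins for the LSTM hidden layers; the point is the chain structure and the real-valued, magnitude-only interface, and all weights are untrained placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

class RecurrentLayer:
    """Stand-in hidden layer: d_out = tanh(W d_in + U d_prev + b)."""
    def __init__(self, n_in, n_h):
        self.W = 0.1 * rng.standard_normal((n_h, n_in))
        self.U = 0.1 * rng.standard_normal((n_h, n_h))
        self.b = np.zeros(n_h)
        self.state = np.zeros(n_h)
    def __call__(self, d):
        self.state = np.tanh(self.W @ d + self.U @ self.state + self.b)
        return self.state

class MagnitudePredictor:
    """Chain of Fig. 3: scalar magnitude a[t] -> input layer -> L hidden
    layers -> predicted magnitude for the next step. Only |h| is used,
    so every weight is real-valued."""
    def __init__(self, n_h=25, L=2):
        self.w_in = 0.1 * rng.standard_normal()
        self.layers = [RecurrentLayer(1, n_h)] + \
                      [RecurrentLayer(n_h, n_h) for _ in range(L - 1)]
        self.W_out = 0.1 * rng.standard_normal(n_h)
    def step(self, a_t):
        d = np.atleast_1d(np.tanh(self.w_in * a_t))   # input layer
        for layer in self.layers:                     # hidden stack
            d = layer(d)
        return float(np.tanh(self.W_out @ d))         # output layer

predictor = MagnitudePredictor()
trace = [predictor.step(a) for a in np.abs(rng.standard_normal(10))]
```

Swapping `RecurrentLayer` for the LSTM or GRU steps shown earlier yields the hybrid stacks mentioned in the text without changing the surrounding chain.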
V. COMPUTATIONAL COMPLEXITY
In the context of cooperative diversity, the computational complexity mainly arises from multi-relay coordination and synchronization [2]. The simplicity of ORS stems from single-relay transmission, which substantially lowers the amount of signalling overhead among multiple relays. A direct comparison of different schemes is not easy and provides little real insight, which is why most works in this field [2]-[12] did not provide a quantitative analysis. On the other hand, the complexity of the proposed scheme comes mainly from the DL-based predictor, which is always a concern for the application of deep learning. From a practical perspective, it is more meaningful to clarify its demand on computing resources in comparison with the capability of COTS hardware. Hence, we focus on assessing the complexity of the predictors in terms of floating-point operations per second (FLOPS).

A deep recurrent network can be quantitatively modelled as follows: an input layer with N_i neurons, an output layer with N_o neurons, and L hidden layers with N_h^l neurons at layer l = 1, . . . , L. Starting with the input layer, it computes δ_h( W^{(i)} d + b^{(i)} ), where the matrix multiplication generates N_i N_h^1 floating-point multiplications and (N_i − 1) N_h^1 additions, and the addition of the bias vector consumes N_h^1 operations, amounting to a total of 2 N_i N_h^1. Note that the amount of computation raised by the activation function is negligible compared to the matrix multiplication and is usually ignored in complexity calculations for deep learning. Likewise, the output layer corresponds to 2 N_h^L N_o operations. For an RNN hidden layer as given in (3), the number of operations equals O_l = (2 N_h^{l−1} − 1) N_h^l + (2 N_h^l − 1) N_h^l + N_h^l, where the first term corresponds to the calculation of W^{(l)} d_t^{(l)}, the second is for U^{(l)} d_{t−1}^{(l+1)}, and the third is due to the addition of the bias.
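Summing the per-layer counts above over all layers, and scaling the recurrent term by the number of gated blocks per layer (1 for a simple RNN, 4 for LSTM, 3 for GRU), yields the closed-form totals derived next. A small helper to evaluate them:

```python
def flops_per_step(n_i, n_o, hidden, blocks):
    """Approximate floating-point operations per prediction step of a deep
    recurrent network: blocks = 1 (simple RNN), 4 (LSTM), or 3 (GRU);
    `hidden` lists the hidden-layer widths N_h^1 ... N_h^L."""
    widths = [n_i] + list(hidden)          # convention N_h^0 = N_i
    core = sum(widths[l - 1] * widths[l] + widths[l] ** 2
               for l in range(1, len(widths)))
    return 2 * (n_i * hidden[0] + hidden[-1] * n_o + blocks * core)

# Two LSTM hidden layers of 25 neurons each, scalar input and output:
ops = flops_per_step(1, 1, [25, 25], blocks=4)    # operations per prediction
flops = ops * 1_000                               # at f_p = 1000 predictions/s
```

At roughly 15 million floating-point operations per second, the predictor occupies a negligible share of a modern DSP's floating-point budget.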
For simplicity, O_l can be approximated by 2 N_h^{l−1} N_h^l + 2 (N_h^l)^2. Then, the overall complexity of a simple RNN is given by

    O_rnn ≈ 2 [ N_i N_h^1 + N_h^L N_o + Σ_{l=1}^{L} ( N_h^{l−1} N_h^l + (N_h^l)^2 ) ],    (7)

where we apply N_h^0 = N_i for a simpler expression. As derived from (4)-(5), the number of operations for the matrix multiplications in an LSTM layer is four times that of an RNN layer, i.e., 4 O_l. The computation for the gate control, which takes only on the order of N_h^l operations, can be neglected. Therefore, the complexity of an LSTM network is approximated by

    O_lstm ≈ 2 [ N_i N_h^1 + N_h^L N_o + 4 Σ_{l=1}^{L} ( N_h^{l−1} N_h^l + (N_h^l)^2 ) ].    (8)

Similarly, we can derive the expression for GRU, i.e.,

    O_gru ≈ 2 [ N_i N_h^1 + N_h^L N_o + 3 Σ_{l=1}^{L} ( N_h^{l−1} N_h^l + (N_h^l)^2 ) ].    (9)

Note that the above expressions give the complexity per prediction step; we need to multiply (7)-(9) by the frequency of prediction, denoted by f_p, i.e., the number of steps performed per second, to obtain the FLOPS.

Given concrete values of these parameters, the complexity of the predictor can be quantified and compared with the capacity of COTS computing hardware. Suppose the applied deep neural network has two LSTM hidden layers with N_h^1 = N_h^2 = 25 neurons (the selection of these hyper-parameters will be justified in the next section). The input of the predictor at the k-th relay is a_{k,d}[t], corresponding to N_i = N_o = 1. This amounts to O_lstm ≈ 15,300 floating-point operations per prediction in terms of (8). With the prediction interval equal to the sampling interval of 1 ms, the frequency of prediction is f_p = 1,000, resulting in approximately 15.3 MFLOPS. In comparison with off-the-shelf Digital Signal Processors (DSPs),
February 8, 2021 DRAFT
Fig. 4. (a) Prediction accuracy in terms of the number of hidden neurons. (b) Comparison of outage probability for ORS, OSTC, and PRS in a cooperative network with K = 8 relays. (c) Comparison of channel capacity for ORS, OSTC, and PRS in a cooperative network with K = 8 relays.

e.g., the TI C6678, this demand occupies only a tiny fraction, well below one percent, of the capacity of a single DSP chip. Taking into account backward compatibility with legacy hardware, we further check low-end DSPs; taking the TI C6748 as an example, the resource required by the predictor remains a small share of its capacity. In a nutshell, the complexity of the DL-based channel predictor applied for PRS is quite affordable, if not negligible.

VI. SIMULATION RESULTS
In this section, we clarify how to select the hyper-parameters of a deep recurrent network to obtain high prediction accuracy, and then use Monte-Carlo simulations to evaluate the outage probability and channel capacity of PRS in comparison with the existing schemes, ORS and OSTC. Following the channel assumption adopted by most previous works in this field, we apply single-antenna flat-fading i.i.d. channels. Each channel follows the Rayleigh distribution with unit average power gain, i.e., its fading coefficient satisfies h ~ CN(0, 1). The default maximal Doppler frequency shift is set to f_d = 100 Hz, emulating a fast-fading environment. Continuous-time channel responses are sampled at a rate of f_s = 1 KHz, adhering to the assumption of flat fading, so the sampling interval is T_s = 1 ms. Each channel generates a series of consecutive samples { h[t] | t = 1, 2, . . . }. As usual, an EE target rate of R = 1 bps/Hz is applied for the outage calculation. The total transmit power P is equally allocated between the two phases, where the source's power is P_s = 0.5P, resulting in an average SNR γ̄_{s,k} = 0.5 P/σ_n^2, while γ̄_{k,d} = 0.5 P/σ_n^2 for the RD link.

A. Training the Predictor
The hyper-parameters of a deep network, such as the number of layers or neurons, have a substantial impact on prediction accuracy, so it is worth clarifying how to tune a deep network on demand. A training process starts from an initial state where all weights and biases are randomly selected. The input of the predictor at the relay is a_{k,d}[t] and the output is its D-step-ahead prediction ǎ_{k,d}[t + D]. To measure prediction accuracy, the mean squared error (MSE) is applied as the cost function, namely MSE = (1/T) Σ_{t=1}^{T} | a_{k,d}[t + D] − ǎ_{k,d}[t + D] |^2, where T is the total number of channel samples for evaluation. Using batch training, a batch of samples is fed into the network per step. The output is compared with the desired values, and the resultant error signals are propagated back through the network to update the weights by means of a training algorithm such as the Adam optimizer used in our simulation. After a sufficient number of epochs, the trained network is employed to predict the CSI.

Fig. 4a compares the prediction accuracy of predictors with different hyper-parameters. Let us first look at the impact of the number of layers and the number of neurons. Starting from an LSTM network with a single hidden layer, denoted by LSTM-1 in the legend of the figure, its accuracy curve as a function of the number of hidden neurons resembles a 'U' shape. That is because the network suffers from under-fitting when the hidden layer has too few neurons, while over-fitting appears when it has too many. To make a fair comparison, the horizontal axis represents the total number of hidden neurons, which are evenly allocated across layers. For instance, the point '60' on the horizontal axis means a 2-hidden-layer network with 30 neurons per layer (denoted by LSTM-2), a 3-hidden-layer network with 20 neurons per layer (denoted by LSTM-3), or a single layer with 60 hidden neurons.
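The D-step-ahead sample pairing and the MSE cost above can be sketched as follows (the actual training of the recurrent network with the Adam optimizer is omitted; the sinusoidal magnitude sequence is purely illustrative):

```python
import numpy as np

def make_pairs(a, D):
    """Pair each magnitude sample a[t] with its D-step-ahead target a[t+D]."""
    return a[:-D], a[D:]

def mse(target, prediction):
    """Cost function used for training and evaluation: mean squared error."""
    return float(np.mean(np.abs(target - prediction) ** 2))

a = np.abs(np.sin(0.1 * np.arange(100)))   # a toy magnitude sequence
x, y = make_pairs(a, D=2)                  # inputs and 2-step-ahead targets
baseline = mse(y, x)                       # "predict no change" baseline error
```

A trained predictor is only useful if its MSE falls clearly below such a no-change baseline, which is what the accuracy comparison in Fig. 4a quantifies.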
No matter how many neurons are in its single hidden layer, LSTM-1 cannot reach the high accuracy achieved by LSTM-2 and LSTM-3, justifying the benefit of deep learning. But this does not mean that more layers are always better, as shown by the worse result of LSTM-4, which has 4 hidden layers. Having established that two hidden layers are the best choice for LSTM, we further observe the recurrent networks with 2 RNN or GRU hidden layers, indicated by RNN-2 and GRU-2, respectively. As we can see, GRU performs as well as LSTM, whereas RNN is weak. As a result, we select a 2-hidden-layer LSTM network with 25 neurons per layer, upon which the numerical results in the following figures are derived.

B. Performance Comparison
We further compare the outage performance of the three relaying schemes in a cooperative network with K = 8 relays, as illustrated in Fig. 4b. Relay selection with perfect knowledge of the CSI (i.e., ρ = 1) is used as the benchmark, which has a diversity order of 8 and decays at a rate of 1/γ̄^8, where γ̄ = P/σ_n^2 is the average EE SNR. With a delay of τ = 2 ms and 3 ms, the quality of the outdated CSI drops to ρ_o = J_0(0.4π) ≈ 0.64 and ρ_o = J_0(0.6π) ≈ 0.29, respectively, which substantially deteriorates the performance. The diversity order of ORS falls to 1, i.e., no diversity, and its curve decays slowly at a rate of 1/γ̄ in the high-SNR regime. OSTC can redeem some of the loss and achieve a diversity order of 2 by using a pair of relays, but its gap to the benchmark is still large at low outage levels. Making use of channel prediction, the quality of the CSI can be improved so that ρ approaches 1. The proposed scheme achieves nearly optimal performance with a prediction horizon of two steps (by setting D = 2) and remarkably outperforms OSTC. Moreover, the channel capacities of the different schemes given τ = 3 ms are comparatively illustrated in Fig. 4c. At an SNR of γ̄ = 20 dB, for instance, ORS suffers a considerable capacity loss relative to perfect CSI, whereas PRS achieves a near-optimal capacity.
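The qualitative effect of outdated CSI on ORS can be reproduced with a few lines of Monte-Carlo simulation (second hop only, all relays assumed in the decoding subset; the parameter values are illustrative, not the exact simulation setup):

```python
import numpy as np

rng = np.random.default_rng(7)
K, trials, R = 8, 100_000, 1.0
rho, snr_avg = 0.64, 10.0     # CSI quality (f_d*tau = 0.2) and mean RD SNR

def cgauss(shape):
    """Unit-variance circularly-symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h_old = cgauss((trials, K))                                      # CSI at selection time
h_now = rho * h_old + np.sqrt(1 - rho**2) * cgauss((trials, K))  # CSI at transmission

sel = np.argmax(np.abs(h_old) ** 2, axis=1)            # ORS picks on outdated CSI
g_sel = np.abs(h_now[np.arange(trials), sel]) ** 2     # gain actually experienced
g_best = np.max(np.abs(h_now) ** 2, axis=1)            # genie-aided best gain

outage_ors = np.mean(0.5 * np.log2(1 + snr_avg * g_sel) < R)
outage_perfect = np.mean(0.5 * np.log2(1 + snr_avg * g_best) < R)
```

With ρ well below 1, the selected relay is often no longer the best one at transmission time, so `outage_ors` lies far above `outage_perfect`; PRS narrows this gap by pushing the effective ρ toward 1.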
VII. CONCLUSIONS

In this paper, we proposed a deep-learning-aided cooperative diversity method for mobile terminals without an antenna array to cultivate the benefit of spatial diversity. A recurrent neural network was deliberately built to improve the timeliness of the channel state information applied for selecting a single opportunistic relay. By simply inserting a channel predictor between the channel estimator and the relay selector, an ORS system can be upgraded to a PRS system without any other modifications, making the approach transparent and easy to keep compatible with existing systems and standards. It achieves the optimal performance with a full diversity order equaling the number of cooperating relays in slow-fading wireless environments, and substantially outperforms the existing schemes in fast-fading channels. It inherits the simplicity of ORS by avoiding multi-relay coordination and synchronization, and the computational complexity arising from fading channel prediction is negligible compared with COTS hardware. From the perspectives of performance, compatibility, and complexity, it is viewed as a good candidate for next-generation cooperative networks.

REFERENCES

[1] W. Jiang, "Device-to-device based cooperative relaying for 5G network: A comparative review," ZTE Commun., vol. 15, no. S1, pp. 60-66, Jun. 2017.
[2] A. A. Nasir et al., "Timing and carrier synchronization with channel estimation in multi-relay cooperative networks," IEEE Trans. Signal Process., vol. 60, no. 2, pp. 793-811, Feb. 2012.
[3] J. N. Laneman and G. W. Wornell, "Distributed space-time-coded protocols for exploiting cooperative diversity in wireless networks," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2415-2425, Oct. 2003.
[4] A. Bletsas et al., "A simple cooperative diversity method based on network path selection," IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 659-672, Mar. 2006.
[5] W. Jiang et al., "An MGF-based performance analysis of opportunistic relay selection with outdated CSI," in Proc. IEEE VTC'2014-Spring, Seoul, South Korea, May 2014.
[6] J. L. Vicario et al., "Opportunistic relay selection with outdated CSI: Outage probability and diversity analysis," IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 2872-2876, Jun. 2009.
[7] W. Jiang et al., "Opportunistic relaying over aerial-to-terrestrial and device-to-device radio channels," in Proc. IEEE ICC'2014, Sydney, Australia, Jul. 2014, pp. 206-211.
[8] L. Xiao and X. Dong, "Unified analysis of generalized selection combining with normalized threshold test per branch," IEEE Trans. Wireless Commun., vol. 5, no. 8, pp. 2153-2163, Aug. 2006.
[9] Y. Li et al., "On the design of relay selection strategies in regenerative cooperative networks with outdated CSI," IEEE Trans. Wireless Commun., vol. 10, no. 9, pp. 3086-3097, Sep. 2011.
[10] W. Jiang, T. Kaiser, and A. J. H. Vinck, "A robust opportunistic relaying strategy for co-operative wireless communications," IEEE Trans. Wireless Commun., vol. 15, no. 4, pp. 2642-2655, Apr. 2016.
[11] W. Jiang, H. Cao, and T. Kaiser, "Opportunistic space-time coding to exploit cooperative diversity in fast-fading channels," in Proc. IEEE ICC'2014, Sydney, Australia, Jun. 2014, pp. 4814-4819.
[12] W. Jiang et al., "Achieving high reliability in aerial-terrestrial networks: Opportunistic space-time coding," in Proc. IEEE Eur. Conf. on Netw. and Commun. (EuCNC), Bologna, Italy, Jun. 2014.
[13] W. Jiang and H. D. Schotten, "Neural network-based fading channel prediction: A comprehensive overview," IEEE Access, vol. 7, pp. 118112-118124, Aug. 2019.
[14] ——, "Deep learning for fading channel prediction," IEEE Open J. Commun. Society, vol. 1, pp. 320-332, Mar. 2020.
[15] W. Jiang and H. Schotten, "Neural network-based channel prediction and its performance in multi-antenna systems," in Proc. IEEE Veh. Tech. Conf. (VTC), Chicago, USA, Aug. 2018.
[16] W. Jiang and H. D. Schotten, "Recurrent neural network-based frequency-domain channel prediction for wideband communications," in Proc. IEEE Veh. Tech. Conf. (VTC), Kuala Lumpur, Malaysia, Apr. 2019.
[17] W. Jiang, H. Schotten, and J. Y. Xiang, "Neural network-based wireless channel prediction," in Machine Learning for Future Wireless Communications, F. L. Luo, Ed. United Kingdom: John Wiley & Sons and IEEE Press, 2019, ch. 16.
[18] W. Jiang and H. D. Schotten, "Multi-antenna fading channel prediction empowered by artificial intelligence," in Proc. IEEE Veh. Tech. Conf. (VTC), Chicago, USA, Aug. 2018.
[19] W. Jiang and H. Schotten, "Recurrent neural networks with long short-term memory for fading channel prediction," in Proc. IEEE Veh. Tech. Conf. (VTC), Antwerp, Belgium, May 2020.
[20] W. Jiang, M. Strufe, and H. Schotten, "Long-range MIMO channel prediction using recurrent neural networks," in Proc. IEEE Consumer Commun. & Netw. Conf. (CCNC), Las Vegas, USA, Jan. 2020.
[21] W. Jiang and H. D. Schotten, "A deep learning method to predict fading channel in multi-antenna systems," in Proc. IEEE Veh. Tech. Conf. (VTC), Antwerp, Belgium, May 2020.
[22] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Dec. 1997.
[23] K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," preprint arXiv:1406.1078, Jun. 2014.