A Simple Cooperative Diversity Method Based on Deep-Learning-Aided Relay Selection

Abstract

Opportunistic relay selection (ORS) has been recognized as a simple but efficient method for mobile nodes to achieve cooperative diversity in slow fading channels. However, the wrong selection of the best relay arising from outdated channel state information (CSI) in fast time-varying channels substantially degrades its performance. With the proliferation of high-mobility applications and the adoption of higher frequency bands in 5G and beyond systems, the problem of outdated CSI will become more serious. Therefore, the design of a novel cooperative method that is applicable to not only slow fading but also fast fading is increasingly of importance. To this end, we develop and analyze a deep-learning-aided cooperative method coined predictive relay selection (PRS) in this article. It can remarkably improve the quality of CSI through fading channel prediction while retaining the simplicity of ORS by selecting a single opportunistic relay so as to avoid the complexity of multi-relay coordination and synchronization. Information-theoretic analysis and numerical results in terms of outage probability and channel capacity reveal that PRS achieves full diversity gain in slow fading wireless environments and substantially outperforms the existing schemes in fast fading channels.


Wei Jiang, Senior Member, IEEE, and Hans Dieter Schotten, Member, IEEE


Index Terms

Cooperative diversity, channel state information, channel prediction, deep learning, LSTM, opportunistic relaying

Corresponding author: Wei Jiang (e-mail: [email protected]).

W. Jiang is with the German Research Centre for Artificial Intelligence (DFKI), Kaiserslautern, Germany, and is also with the University of Kaiserslautern, Germany (e-mail: [email protected]). H. D. Schotten is with the German Research Centre for Artificial Intelligence (DFKI), Kaiserslautern, Germany, and is also with the University of Kaiserslautern, Germany (e-mail: [email protected]).

February 9, 2021 DRAFT (arXiv, cs.IT)

I. INTRODUCTION

In wireless communications [1], diversity is an important and essential technique, which can effectively combat the effect of multi-path channel fading by transmitting redundant signals over independent channels and then combining multiple faded copies at the receiver. Spatial diversity is particularly attractive as it can be easily combined with other forms of diversity and achieves a higher diversity order by simply installing more antennas. Because of the constraints on power supply, hardware size, and cost, it is difficult for mobile terminals in cellular systems or wireless nodes in ad hoc networks to exploit spatial diversity at sub-6 GHz carrier frequencies. Therefore, cooperative diversity (cf. user cooperation diversity of [2]) has been proposed to break through this barrier. Exploiting the broadcast nature of radio signals in a relay channel [3], cooperating terminals share their distributed antennas to form 'a virtual array'. In such a cooperative network, when a node sends a signal, its neighboring nodes can act as relays to decode-and-forward (DF) or amplify-and-forward (AF) this signal. By combining multiple copies of the original signal at the destination, the network achieves cooperative diversity that is equivalent to the spatial diversity gained from co-located multi-antenna systems [4].

To achieve cooperative diversity, a cooperation strategy is required to determine which nodes should participate in relaying and how they retransmit collaboratively. The repetition-based cooperative strategies presented in [5] simply repeat the signal on orthogonal channels to realize full diversity, but this gain comes at the price of a substantial loss in spectral efficiency. To avoid this penalty, a method called distributed beamforming has been discussed in [2], [6]. Assuming a priori knowledge of forward channels, the source and relays can simultaneously transmit signals for a coherent combination at the receiver. Beamforming is vulnerable to phase noise, and radio-frequency-chain calibration among distributed antennas (relays) to align phase distortion is difficult to implement. In [7], an approach called distributed space-time coding (DSTC) has been proposed. Although full diversity on the order of the number of relays can be achieved, designing such a code is still an open issue since the number of distributed antennas is unknown and time-varying.
Additionally, multiple timing offset (MTO) [8] and multiple carrier frequency offset (MCFO) [9] among spatially-distributed relays make the aforementioned multi-relay transmission too complicated for practical systems.

Inspired by the benefit of selection diversity from multi-user selection [10] and antenna selection [11], relay selection was proposed to simplify the implementation of cooperative networks. In [12], a location-based approach that selects the best relay based on ideas from geographical random forwarding [13] was presented. Assuming that each node knows its own position, as well as that of the destination, the node closest to the destination serves as the relay. Such schemes are more appropriate for static networks but less appropriate for mobile networks because the estimation of positions or distances among all nodes is not a trivial task. In contrast, another single-relay approach referred to as opportunistic relay selection (ORS) [14], which requires no topology information, was proposed. Using local channel measurements, this approach opportunistically selects a single relay with the best channel condition (in accordance with a given selection criterion [15]). From the viewpoint of the multiplexing-diversity trade-off, ORS has no performance loss compared to more complex protocols such as DSTC. Most importantly, it substantially lowers the complexity of implementation by avoiding synchronization among multiple transmitting relays, while the requirement of space-time codes is completely eliminated. It was recognized as a simple but efficient way to achieve cooperative diversity in slow fading channels. In fast fading wireless environments, however, the measured channel state information (CSI) for relay selection may differ from the actual channel quality at the instant of signal relaying due to processing and feedback delay. The outdated CSI causes wrong relay selection, which drastically deteriorates the performance of ORS, as extensively verified in [16]–[20].
With the proliferation of high-mobility applications and the adoption of higher frequency bands in 5G and beyond systems, the problem of outdated CSI will become more serious. According to the Doppler effect in signal propagation [1], transmitting signals at a higher frequency (such as millimeter wave and Terahertz communications) or moving at a higher speed (e.g., vehicular communications, high-speed trains, and unmanned aerial vehicles) will increase the frequency shift, leading to a faster time-varying channel. Hence, the design of a simple cooperative method that is also applicable to fast fading channels is of increasing significance for next-generation wireless communications.

To the best knowledge of the authors, a few proposals for cooperative diversity in the presence of outdated CSI have been reported in the literature. Generalized selection combining and its enhanced version [21]–[23], which select $N$ relays with good channel quality to retransmit in an orthogonal manner, exhibit robustness in the presence of outdated CSI, but the reduction of spectral efficiency to $1/N$ is not acceptable. The authors of [24] proposed a method utilizing the knowledge of channel statistics. It achieves only a marginal performance improvement, while the complexity grows noticeably. In [25], [26], one author of this article designed a scheme called opportunistic space-time coding (OSTC) that combines the benefits of both opportunistic relaying and distributed space-time coding. A fixed number of $N$ relays are opportunistically selected and $N$-dimensional orthogonal space-time block coding is employed on these relays. It can improve the performance of cooperative networks over fast fading channels while avoiding the loss of spectral efficiency.
However, its performance gap from the full diversity achieved with perfect CSI is still large, which motivates the work presented in this article.

Channel prediction [27]–[29], which can improve the timeliness of CSI without spending radio resources, is promising for combating outdated CSI. It provides a prediction horizon that can be used to counteract the induced delay. Modeling a wireless channel as a set of propagation parameters, two statistical predictive approaches, namely the auto-regressive [30] and parametric [31] models, have been proposed. But these models are rigid, leaving a gap from real channels, and, in addition, the parameter estimation relying on complex algorithms such as MUSIC and ESPRIT [32] is tedious, which harms their applicability in practical systems [33]. In 2016, when AlphaGo [34], a deep learning (DL) computer program, achieved a historic victory over a human champion, the passion for exploring Artificial Intelligence (AI) in almost every scientific and engineering branch was ignited [35]. As an important AI technique, recurrent neural networks show strong capability in time-series prediction [36] and are applied to provide a data-driven alternative for efficiently implementing wireless channel prediction [37]–[39].

Taking advantage of the new degree of freedom opened by channel prediction, we develop and analyze a novel cooperative diversity method coined predictive relay selection (PRS) in this article. Its key idea is to apply a DL-based channel predictor to improve the quality of CSI so as to lower the probability of wrong relay selection. To this end, a deep recurrent network that specifically adapts to the characteristics of CSI data is built. To avoid MTO and MCFO in multi-relay transmission, only a single relay is opportunistically selected in terms of predicted CSI.
Frame structures supporting either distributed or centralized PRS are designed accordingly. Information-theoretic analysis is conducted by deriving closed-form expressions for outage probability and channel capacity, which are corroborated by simulation results. Moreover, its computational complexity, robustness, and scalability are investigated. The contributions and organization of this article are listed as follows:

1) Section II models a half-duplex dual-hop cooperative network using either AF or DF relays, and reviews the existing schemes including ORS and OSTC.

2) Section III provides the principle of deep recurrent neural networks and the methodology to build channel predictors. The statistics of predicted CSI and the computational complexity of the predictors are analyzed.

3) Section IV presents the proposed scheme and the design of two frame structures for distributed and centralized PRS, respectively.

4) In Sections V and VI, information-theoretic analyses of the proposed scheme in both AF and DF relaying are conducted by deriving closed-form expressions of outage probability and channel capacity.

5) In Section VII, the acquisition of the CSI dataset and the selection of hyper-parameters for high-accuracy prediction are clarified. Performance evaluation is carried out through Monte-Carlo simulations to corroborate the theoretical analyses. Moreover, we study its robustness against additive noise, synchronization error, mobility, and fading statistics, its scalability in terms of the number of relays, and its computational complexity in comparison with the capability of commercial off-the-shelf (COTS) computing hardware.

6) Finally, Section VIII concludes this article.

Notations: Throughout this article, bold lower-case and upper-case letters denote vectors and matrices, respectively. For their operations, $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ denote the conjugate, transpose, and Hermitian transpose, respectively, $\|\cdot\|$ expresses the Frobenius norm, and $\otimes$ marks the Hadamard (element-wise) product. $\mathbb{E}$ denotes the statistical expectation, $\mathbb{P}$ is the notation of mathematical probability, and $\Re$ and $\Im$ take the real and imaginary parts of a complex quantity. $h$, $\hat{h}$, and $\check{h}$ represent the actual, outdated, and predicted CSI, respectively.

II. SYSTEM MODEL

Following the working assumption of the majority of prior research works in [6]–[9], [14]–[20], we consider a dual-hop cooperative network where a single source node $s$ communicates with a single destination node $d$ with the aid of $K$ relays, neglecting the direct link at the destination to simplify the analysis.¹ Each node is equipped with a single antenna that is used for both signal transmission and reception over a narrow-band channel. Although the proposed scheme is applicable to any kind of wireless channel statistics, without loss of generality, we adopt Rayleigh fading to analyze performance for simplicity. Thus, the channel realization is a zero-mean circularly-symmetric complex Gaussian random variable with variance $\sigma_h^2$, i.e., $h \sim \mathcal{CN}(0, \sigma_h^2)$. The received signal in an arbitrary link $A \to B$ is modeled as $y_B = h_{A,B} x_A + z_B$, where $x_A \in \mathbb{C}$ is the transmitted symbol from node $A$ with average power $P_A = \mathbb{E}[|x_A|^2]$, $h_{A,B}$ represents the fading coefficient of the channel from $A$ to $B$, and $z_B$ stands for additive white Gaussian noise with zero mean and variance $\sigma_n^2$, i.e., $z \sim \mathcal{CN}(0, \sigma_n^2)$. The instantaneous signal-to-noise ratio (SNR) is denoted by $\gamma_{A,B} = |h_{A,B}|^2 P_A / \sigma_n^2$ and the average SNR by $\bar{\gamma}_{A,B} = \mathbb{E}[\gamma_{A,B}] = \sigma_h^2 P_A / \sigma_n^2$. Node $A$ can be the source $A = s$ or a relay $A = k$, $k \in \{1, \ldots, K\}$, corresponding to $B = k$ or $B = d$. It is noted that relay selection depends on instantaneous channel realizations or equivalently on received instantaneous SNRs, which are used interchangeably in the context of relay selection hereinafter.

¹ With a direct link, the overall SNR is $\gamma_{tot} = \gamma_{s,d} + \gamma_{k,d}$ under maximal-ratio combining at the receiver, where $\gamma_{s,d}$ and $\gamma_{k,d}$ are the SNRs of the direct and relay link. Its achievable diversity order is $K + 1$ and $2$ with perfect and outdated CSI, respectively, compared to $K$ and $1$ in the case of no direct link. Neglecting the direct link does not affect the performance impact of outdated CSI on the relay selection, as illustrated by the results in the simulation section.

From a practical point of view, there exists a delay between the time of relay selection and the instant of using the selected relay to transmit. The actual CSI $h$ may differ from its outdated version $\hat{h}$ that is applied for selecting relays. To quantify the quality of CSI, the correlation coefficient between $h$ and $\hat{h}$ is introduced, i.e.,

$$\rho_o = \frac{\mathbb{E}[h \hat{h}^*]}{\sqrt{\mathbb{E}[|h|^2]\, \mathbb{E}[|\hat{h}|^2]}}. \tag{1}$$

According to [11], we have $\hat{h} = \sigma_{\hat{h}} \left( \frac{\rho_o}{\sigma_h} h + \varepsilon \sqrt{1 - \rho_o^2} \right)$, where $\varepsilon$ is a random variable with standard normal distribution $\varepsilon \sim \mathcal{CN}(0, 1)$ and $\sigma_{\hat{h}}^2$ is the variance of $\hat{h}$. With the classical Doppler spectrum of the Jakes model, it takes the value

$$\rho_o = J_0(2 \pi f_d \tau), \tag{2}$$

where $f_d$ is the maximal Doppler frequency, $\tau$ stands for the delay between the outdated and actual CSI, and $J_0(\cdot)$ denotes the zeroth-order Bessel function of the first kind.
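The outdated-CSI model above can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: unit channel variances, equal relay powers (so that ranking relays by $|\hat{h}|$ equals ranking by SNR), and a power-series evaluation of $J_0$ that is adequate only for moderate arguments. The function names (`jakes_correlation`, `outdated_csi`, `wrong_selection_rate`) are hypothetical and not taken from the article.

```python
import math
import random

def bessel_j0(x, terms=80):
    """Zeroth-order Bessel function of the first kind via its power series.
    Accurate for moderate |x|; large arguments need a different method."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def jakes_correlation(f_d, tau):
    """Eq. (2): rho_o = J0(2*pi*f_d*tau), f_d in Hz and tau in seconds."""
    return bessel_j0(2.0 * math.pi * f_d * tau)

def complex_gaussian(variance, rng):
    """Draw one sample from CN(0, variance)."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def outdated_csi(h, rho, rng, sigma_h=1.0, sigma_hat=1.0):
    """Outdated version of h with correlation rho, following the model of [11]."""
    eps = complex_gaussian(1.0, rng)
    return sigma_hat * (rho / sigma_h * h + eps * math.sqrt(1.0 - rho ** 2))

def wrong_selection_rate(rho, num_relays=4, trials=5000, seed=0):
    """Fraction of trials in which the relay picked from outdated CSI
    is not the relay with the strongest actual channel."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        actual = [complex_gaussian(1.0, rng) for _ in range(num_relays)]
        stale = [outdated_csi(h, rho, rng) for h in actual]
        picked = max(range(num_relays), key=lambda k: abs(stale[k]))
        best = max(range(num_relays), key=lambda k: abs(actual[k]))
        wrong += picked != best
    return wrong / trials

if __name__ == "__main__":
    rho = jakes_correlation(f_d=100.0, tau=2e-3)  # example values, assumed
    print("rho_o =", rho)
    print("wrong-selection rate =", wrong_selection_rate(rho))
```

With $\tau = 0$ the correlation equals one and the selection is always correct; as $f_d \tau$ grows, $\rho_o$ drops and wrong selections become more frequent, which is the effect the article sets out to counteract.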

Fig. 1. Schematic diagram of a cooperative network with different DF relaying strategies: ORS, PRS, and OSTC. In the 1st phase (broadcasting), the source broadcasts a signal, while the relays that successfully decode this signal form a decoding subset (DS). In the 2nd phase (relaying), the selected node(s) from the DS forward the regenerated signal. Examples of deployment scenarios for such a cooperative network include: a flying drone suffering from sparse signal coverage maintains its connectivity to a network via a group of ground terminals; a platoon of moving vehicles optimize their mutual communications via relaying; a set of Internet-of-Things (IoT) devices collaboratively improve the reliability of access to an edge server; and a few neighboring user terminals in a cell, especially at the cell edge, cooperatively boost their performance in the uplink.

A. Decode-and-Forward

Due to severe signal attenuation, a single-antenna relay should operate in half-duplex mode to prevent harmful self-interference between the transmitter and receiver. Without loss of generality, orthogonal transmission between the source and relays using time-division multiplexing is used for analysis throughout the sequel (while frequency-division multiplexing can also be equivalently applied). Therefore, the signal transmission is organized in two phases: the source broadcasts a signal in the source-to-relay (denoted by SR hereinafter) link, and then the relays retransmit this signal in the relay-to-destination (RD) link. In the first phase, as shown in Fig. 1, the source (e.g., the drone in the figure) sends a symbol $x$, and those relays which overhear and correctly decode this signal form a decoding subset ($\mathcal{DS}$) of the SR link

$$\mathcal{DS} \triangleq \left\{ k \,\middle|\, \tfrac{1}{2} \log_2 (1 + \gamma_{s,k}) \geq R \right\} = \left\{ k \mid \gamma_{s,k} \geq \gamma_o \right\}, \tag{3}$$

where $R$ is an end-to-end target rate for the dual-hop relaying, corresponding to a threshold SNR $\gamma_o = 2^{2R} - 1$. Note that the required data rate for either hop is doubled to $2R$ due to the adoption of half-duplex transmission. The best relay (denoted by $\dot{k}$) in the conventional ORS is opportunistically selected from $\mathcal{DS}$ in terms of $\dot{k} = \arg\max_{k \in \mathcal{DS}} \hat{\gamma}_{k,d}$, where $\hat{\gamma}_{k,d}$ is the SNR of the RD link at the instant of relay selection, which is an outdated version of the actual SNR $\gamma_{k,d}$ during signal transmission. In contrast, the proposed PRS scheme replaces outdated CSI with predicted CSI $\check{h}$, and determines $\dot{k}$ in terms of $\dot{k} = \arg\max_{k \in \mathcal{DS}} \check{\gamma}_{k,d}$, where $\check{\gamma}_{k,d} = |\check{h}_{k,d}|^2 P_k / \sigma_n^2$.

In addition to the best relay, the OSTC scheme [25] needs another relay with the second strongest SNR, i.e., $\ddot{k} = \arg\max_{k \in \mathcal{DS} - \{\dot{k}\}} \hat{\gamma}_{k,d}$. In the first phase, the source broadcasts a pair of symbols $(x_1, x_2)$ over two consecutive symbol periods. The regenerated symbols are encoded by means of the Alamouti scheme, which is the unique space-time code achieving both full rate and full diversity, at the pair of selected relays. In the second phase, one relay transmits $(x_1, -x_2^*)$ while the other transmits $(x_2, x_1^*)$ simultaneously at the same frequency over two symbol periods.

B. Amplify-and-Forward

Compared to DF, the main difference of AF is that the best relay does not detect the received signal but only amplifies it. In the first phase, the source broadcasts $x$, and thus the received signal at the $k$th relay is $y_k = h_{s,k} x + z_k$. Relay $k$ normalizes $y_k$ to form a retransmitted signal:

$$x_k = \frac{\sqrt{P_k}\, y_k}{\sqrt{\mathbb{E}[|y_k|^2]}} = \frac{\sqrt{P_k}\, (h_{s,k} x + z_k)}{\sqrt{P_s |h_{s,k}|^2 + \sigma_n^2}}, \tag{4}$$

where $P_s = \mathbb{E}[|x|^2]$ is the average transmit power of the source and $P_k = \mathbb{E}[|x_k|^2]$ is the average power for retransmission. The receiver at the destination gets

$$y_d = h_{k,d} x_k + z_d = \frac{\sqrt{P_k}\, h_{k,d} (h_{s,k} x + z_k)}{\sqrt{P_s |h_{s,k}|^2 + \sigma_n^2}} + z_d. \tag{5}$$

Thus, the received SNR for this end-to-end (EE) link is

$$\gamma_{skd} = \frac{\gamma_{s,k} \gamma_{k,d}}{\gamma_{s,k} + \gamma_{k,d} + 1}, \quad k \in \{1, \ldots, K\}. \tag{6}$$

For the sake of mathematical tractability, as recommended in [19], a tight upper bound is used to approximate (6), that is

$$\gamma_{skd} \leq \gamma_k = \min\{\gamma_{s,k}, \gamma_{k,d}\}, \quad k \in \{1, \ldots, K\}. \tag{7}$$

As explained previously, the instantaneous SNR used for relay selection is an outdated version of (7), i.e., $\hat{\gamma}_k = \min\{\hat{\gamma}_{s,k}, \hat{\gamma}_{k,d}\}$. The ORS scheme [18] opportunistically selects the best path out of $K$ possible EE links, i.e., $\dot{k} = \arg\max_{k \in \{1, \ldots, K\}} \{\hat{\gamma}_k\}$. In contrast, the proposed scheme chooses the best relay as $\dot{k} = \arg\max_{k \in \{1, \ldots, K\}} \{\min(\check{\gamma}_{s,k}, \check{\gamma}_{k,d})\}$, where $\check{\gamma}_{s,k}$ and $\check{\gamma}_{k,d}$ are the SNRs obtained from the predicted CSI of the SR and RD link, respectively.

III. DEEP LEARNING-BASED CHANNEL PREDICTION

This section first introduces the principle of deep recurrent networks, including the simple recurrent neural network (RNN), Long Short-Term Memory (LSTM) [40], and Gated Recurrent Unit (GRU) [41], followed by explaining how to apply a recurrent network to build a channel predictor [42]. The statistics of predicted CSI and the computational complexity of these predictors are also analyzed.


A. Deep Recurrent Networks

Unlike the unidirectional information flow in feed-forward neural networks, an RNN has recurrent self-connections to memorize historical information, exhibiting great potential in time-series prediction [36]. The activation of the previous time step is fed back as part of the input for the current step. In a simple RNN, the $l$th recurrent layer is generally modeled as

$$\mathbf{d}_t^{(l+1)} = R^{(l)}\left(\mathbf{d}_t^{(l)}\right) = \delta_h\left( \mathbf{W}^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}^{(l)} \mathbf{d}_{t-1}^{(l+1)} + \mathbf{b}^{(l)} \right), \tag{8}$$

where $\mathbf{W}^{(l)}$ and $\mathbf{U}^{(l)}$ are weight matrices of the $l$th layer, $\mathbf{b}^{(l)}$ is a bias vector, $\mathbf{d}_t^{(l)}$ and $\mathbf{d}_t^{(l+1)}$ represent the input and output of layer $l$ at time $t$, respectively, $\mathbf{d}_{t-1}^{(l+1)}$ is the feedback from the previous step, $R^{(l)}(\cdot)$ stands for the relation function between the input and output of the $l$th RNN hidden layer, and the activation function is often the hyperbolic tangent denoted by tanh, i.e., $\delta_h(x) = (e^{2x} - 1)/(e^{2x} + 1)$.

When a recurrent network is trained with the typical stochastic gradient descent (SGD) method, the back-propagated error signals tend to zero, which implies a prohibitively long convergence time. To tackle this gradient-vanishing problem, Hochreiter and Schmidhuber proposed Long Short-Term Memory in their pioneering work [40], which introduced the cell and the gate into the RNN structure. The former is a special memory unit and the latter regulates read and write access to the cell. In 1999, Gers et al. [43] further introduced a new gate that learns to reset the hidden state at appropriate times. A common LSTM cell thus has three gates: an input gate controlling the extent to which new information flows into the cell, a forget gate filtering out useless memory, and an output gate that controls the extent to which the memory is applied to generate the activation. The upper part of Fig. 2 shows the graphical depiction of a deep LSTM network consisting of an input layer, $L$ hidden layers, and an output layer.

Let us use the $l$th hidden layer as an example to shed light on how an activation signal goes through the network. There are two hidden states: the short-term state $\mathbf{s}_{t-1}^{(l)}$ and the long-term state $\mathbf{c}_{t-1}^{(l)}$. The input $\mathbf{d}_t^{(l)}$ and $\mathbf{s}_{t-1}^{(l)}$ jointly activate four fully connected (FC) layers, generating the activation vectors for the gates, i.e.,

$$\begin{cases} \mathbf{i}_t^{(l)} = \delta_g\left( \mathbf{W}_i^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}_i^{(l)} \mathbf{s}_{t-1}^{(l)} + \mathbf{b}_i^{(l)} \right) \\ \mathbf{o}_t^{(l)} = \delta_g\left( \mathbf{W}_o^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}_o^{(l)} \mathbf{s}_{t-1}^{(l)} + \mathbf{b}_o^{(l)} \right) \\ \mathbf{f}_t^{(l)} = \delta_g\left( \mathbf{W}_f^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}_f^{(l)} \mathbf{s}_{t-1}^{(l)} + \mathbf{b}_f^{(l)} \right) \end{cases}, \tag{9}$$

where $\mathbf{W}$ and $\mathbf{U}$ are weight matrices for the FC layers, $\mathbf{b}$ represents bias, the subscripts $i$, $o$, and $f$ are associated with the input, output, and forget gate, respectively, and $\delta_g$ stands for the logistic Sigmoid function $\delta_g(x) = 1/(1 + e^{-x})$. The current long-term state $\mathbf{c}_t^{(l)}$ is obtained by first throwing away outdated memory at the forget gate and then adding new information selected by the input gate, i.e., $\mathbf{c}_t^{(l)} = \mathbf{f}_t^{(l)} \otimes \mathbf{c}_{t-1}^{(l)} + \mathbf{i}_t^{(l)} \otimes \mathbf{g}_t^{(l)}$, where the operator $\otimes$ denotes the Hadamard product (element-wise multiplication) and $\mathbf{g}_t^{(l)} = \delta_h\left( \mathbf{W}_g^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}_g^{(l)} \mathbf{s}_{t-1}^{(l)} + \mathbf{b}_g^{(l)} \right)$. The output of this hidden layer is computed by

$$\mathbf{d}_t^{(l+1)} = L^{(l)}\left( \mathbf{d}_t^{(l)} \right) = \mathbf{o}_t^{(l)} \otimes \delta_h\left( \mathbf{c}_t^{(l)} \right), \tag{10}$$

where $L^{(l)}(\cdot)$ represents the input-output function of the $l$th LSTM layer. Note that the current short-term state is equal to the output, i.e., $\mathbf{s}_t^{(l)} = \mathbf{d}_t^{(l+1)}$.
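As a minimal sketch of (9) and (10), the following pure-Python function performs one time step of a single LSTM layer. It is not the authors' implementation: the dictionary layout of `params`, the helper `affine`, and the use of plain lists for vectors and matrices are assumptions made for illustration, and no training logic is included.

```python
import math

def sigmoid(x):
    """Logistic Sigmoid delta_g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, d, U, s, b):
    """Compute W d + U s + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * x for w, x in zip(W_row, d))
        + sum(u * x for u, x in zip(U_row, s))
        + b_j
        for W_row, U_row, b_j in zip(W, U, b)
    ]

def lstm_step(d, s_prev, c_prev, params):
    """One time step of one LSTM layer.

    d      : layer input d_t^(l)
    s_prev : short-term state s_{t-1}^(l)
    c_prev : long-term state c_{t-1}^(l)
    params : dict mapping 'i', 'o', 'f', 'g' to a tuple (W, U, b)
    Returns (s_t, c_t), where s_t is also the layer output d_t^(l+1).
    """
    def pre(gate):
        W, U, b = params[gate]
        return affine(W, d, U, s_prev, b)

    i = [sigmoid(v) for v in pre("i")]      # input gate, Eq. (9)
    o = [sigmoid(v) for v in pre("o")]      # output gate, Eq. (9)
    f = [sigmoid(v) for v in pre("f")]      # forget gate, Eq. (9)
    g = [math.tanh(v) for v in pre("g")]    # candidate content g_t
    # Long-term state: forget old memory, then add gated new content.
    c = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c_prev, i, g)]
    # Output and short-term state, Eq. (10).
    s = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]
    return s, c
```

Stacking $L$ such layers amounts to feeding the returned `s` of layer $l$ as the input `d` of layer $l+1$ at the same time step, while each layer keeps its own pair of states between time steps.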

Fig. 2. Block diagram of the receiver integrating a DL-based channel predictor that mainly consists of an input layer, an output layer, and $L$ hidden layers. The $l$th hidden layer is opened to detail the internal structure of an LSTM memory block and its information flow. To retain historical channel information, a tapped-delay line is applied to form a series of consecutive CSI samples for the input layer. The predictor is inserted between the channel estimator and relay selector, transforming measured CSI to predicted CSI transparently without any other modifications to an ORS system.

Despite its short history, LSTM has achieved great success and has been commercially applied in many AI products such as Apple Siri and Google Translate. After its emergence, the research community published a number of variants, among which the GRU proposed by Cho et al. in [41] drew a lot of attention. It is a simplified version with fewer parameters, but it exhibits even better performance than LSTM on certain smaller and less frequent datasets. To simplify the structure, a GRU memory cell has only a single hidden state, and the number of gates is reduced to two: the update and reset gate. The activation vector for the update gate is computed by $\mathbf{z}_t^{(l)} = \delta_g\left( \mathbf{W}_z^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}_z^{(l)} \mathbf{s}_{t-1}^{(l)} + \mathbf{b}_z^{(l)} \right)$, which decides the extent to which the memory content from the previous state will remain in the current state. The reset gate controls whether the previous state is ignored, and when it tends to $0$, the hidden state is reset with the current input. It is given by $\mathbf{r}_t^{(l)} = \delta_g\left( \mathbf{W}_r^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}_r^{(l)} \mathbf{s}_{t-1}^{(l)} + \mathbf{b}_r^{(l)} \right)$. Likewise, the previous hidden state $\mathbf{s}_{t-1}^{(l)}$ goes through the cell, drops outdated memory, and inserts some new content, generating the current hidden state, that is

$$\mathbf{s}_t^{(l)} = \left(1 - \mathbf{z}_t^{(l)}\right) \otimes \mathbf{s}_{t-1}^{(l)} + \mathbf{z}_t^{(l)} \otimes \delta_h\left( \mathbf{W}_s^{(l)} \mathbf{d}_t^{(l)} + \mathbf{U}_s^{(l)} \left( \mathbf{r}_t^{(l)} \otimes \mathbf{s}_{t-1}^{(l)} \right) + \mathbf{b}_s^{(l)} \right). \tag{11}$$

The hidden state is also equal to the output of this hidden layer, i.e., $\mathbf{d}_t^{(l+1)} = G^{(l)}\left(\mathbf{d}_t^{(l)}\right) = \mathbf{s}_t^{(l)}$, where $G^{(l)}(\cdot)$ denotes the input-output function.
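The GRU update can be sketched in the same style. The code below is a minimal illustration of the update gate, the reset gate, and (11) for one layer and one time step; the `params` layout and helper names are assumptions for this sketch, not part of the article.

```python
import math

def sigmoid(x):
    """Logistic Sigmoid delta_g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, d, U, s, b):
    """Compute W d + U s + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * x for w, x in zip(W_row, d))
        + sum(u * x for u, x in zip(U_row, s))
        + b_j
        for W_row, U_row, b_j in zip(W, U, b)
    ]

def gru_step(d, s_prev, params):
    """One time step of one GRU layer.

    d      : layer input d_t^(l)
    s_prev : previous hidden state s_{t-1}^(l)
    params : dict mapping 'z', 'r', 's' to a tuple (W, U, b)
    Returns s_t^(l), which is also the layer output d_t^(l+1).
    """
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Ws, Us, bs = params["s"]
    z = [sigmoid(v) for v in affine(Wz, d, Uz, s_prev, bz)]   # update gate
    r = [sigmoid(v) for v in affine(Wr, d, Ur, s_prev, br)]   # reset gate
    # The reset gate scales the previous state before it enters the candidate.
    gated = [r_j * s_j for r_j, s_j in zip(r, s_prev)]
    cand = [math.tanh(v) for v in affine(Ws, d, Us, gated, bs)]
    # Eq. (11): blend the previous state and the candidate via the update gate.
    return [(1.0 - z_j) * s_j + z_j * c_j
            for z_j, s_j, c_j in zip(z, s_prev, cand)]
```

Compared with the LSTM step, only three weight triples are needed instead of four and a single state is carried between time steps, which is where the parameter saving mentioned above comes from.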

B. DL-based Channel Predictor

To shed light on the principle of a DL-based predictor, the chain of signal reception at the receiver is shown in Fig. 2. A predictor is inserted between the channel estimator and the relay selector, transforming measured CSI to predicted CSI as the input for relay selection. It is transparent, and therefore an ORS system can be smoothly upgraded to a PRS system without any other modifications. Here, we use centralized relay selection as an example, where the CSI of all RD links at time $t$, denoted by $\mathbf{h}_d[t] = [h_{1,d}[t], \ldots, h_{K,d}[t]]^T$, is processed at the destination. For distributed selection, each relay requires only local CSI $h_{k,d}[t]$, which is simpler to handle and is therefore straightforwardly applicable. As illustrated in Fig. 2, the instantaneous CSI $\mathbf{h}_d[t]$ measured by the channel estimator is fed into the predictor. To retain some historical information, a tapped-delay line is applied. A series of consecutive CSI samples from $\mathbf{h}_d[t - \tau]$ to $\mathbf{h}_d[t]$ is available for the DL predictor to generate a $D$-step prediction $\check{\mathbf{h}}_d[t + D]$.

A complex-valued fading coefficient can be expressed in polar form as $h_{k,d}[t] = a_{k,d}[t] e^{j \theta_{k,d}[t]}$, where $a_{k,d}[t]$ and $\theta_{k,d}[t]$ denote the magnitude and phase, respectively. Because the selection relies on the value of the SNR, the knowledge of the magnitude $a_{k,d}[t]$ is sufficient, rather than the complex-valued $h_{k,d}[t]$, which in turn simplifies the implementation of the channel predictor by employing a neural network with real-valued weights and biases. A pre-processing layer is in charge of adapting the format of the CSI data to the input layer. In this case, the magnitudes need to be extracted, e.g., $\mathbf{a}_d[t] = [a_{1,d}[t], \ldots, a_{K,d}[t]]^T$ from $\mathbf{h}_d[t]$. After that, the extracted data $\{\mathbf{a}_d[t - \tau], \mathbf{a}_d[t - \tau + 1], \cdots, \mathbf{a}_d[t - 1], \mathbf{a}_d[t]\}$ are multiplexed as an input vector, i.e.,

$$\mathbf{d}_t^{(0)} = \left[ a_{1,d}[t - \tau], a_{2,d}[t - \tau], \cdots, a_{K,d}[t] \right]^T, \tag{12}$$

which contains $K \times (\tau + 1)$ entries. Feeding this input vector into the input feed-forward layer yields $\mathbf{d}_t^{(1)} = \delta_h\left( \mathbf{W}^{(I)} \mathbf{d}_t^{(0)} + \mathbf{b}^{(I)} \right)$, where $\mathbf{W}^{(I)}$ and $\mathbf{b}^{(I)}$ denote the weight matrix and bias vector of the input layer. The input of the 1st hidden layer is exactly $\mathbf{d}_t^{(1)}$, thus $\mathbf{d}_t^{(2)} = L^{(1)}\left(\mathbf{d}_t^{(1)}\right)$ is generated and forwarded to the 2nd hidden layer, where $L^{(1)}(\cdot)$ is defined in (10). The activation goes through the network until the output layer gets the predicted CSI $\check{\mathbf{a}}_d[t + 1] = [\check{a}_{1,d}[t + 1], \ldots, \check{a}_{K,d}[t + 1]]^T$ (assuming $D = 1$). It is computed by $\check{\mathbf{a}}_d[t + 1] = \delta_h\left( \mathbf{W}^{(O)} \mathbf{d}_t^{(L)} + \mathbf{b}^{(O)} \right)$, where $\mathbf{W}^{(O)}$ and $\mathbf{b}^{(O)}$ denote the weight matrix and bias vector of the output layer, and the activation of the last hidden layer equals $\mathbf{d}_t^{(L)} = L^{(L)}\left( \ldots L^{(2)}\left( L^{(1)}\left( \mathbf{d}_t^{(1)} \right) \right) \right)$. The building of a deep recurrent network is flexible; for example, we can apply a hybrid network consisting of RNN, GRU, and LSTM layers, like $\mathbf{d}_t^{(L)} = G^{(L)}\left( \ldots L^{(2)}\left( R^{(1)}\left( \mathbf{d}_t^{(1)} \right) \right) \right)$.

In addition to predicting the magnitude of CSI, deep learning also provides the capability of processing complex-valued CSI [44]. Instead of applying a deep neural network with complex-valued weights, which is currently not well supported by AI algorithms and software tools, we can decompose a fading coefficient into two real numbers, namely $h = \Re(h) + j \Im(h)$, where $\Re(\cdot)$ and $\Im(\cdot)$ take the real and imaginary parts of a complex number, and the imaginary unit satisfies $j^2 = -1$. Transforming $\mathbf{h}_d[t]$ into $\mathbf{c}_d[t] = [\Re(h_{1,d}[t]), \ldots, \Re(h_{K,d}[t]), \Im(h_{1,d}[t]), \ldots, \Im(h_{K,d}[t])]^T$ and training the predictor with such transformed CSI data, the prediction output is $\check{\mathbf{c}}_d[t + 1]$ when feeding $\mathbf{c}_d[t]$ at time $t$. The complex-valued prediction $\check{\mathbf{h}}_d[t + 1]$ is obtained simply by applying the reverse manipulation to $\check{\mathbf{c}}_d[t + 1]$.
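The pre-processing described above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the history is assumed to be a list of $\tau + 1$ snapshots ordered from oldest to newest, and `select_relay` assumes equal relay transmit powers and noise variances so that ranking by predicted magnitude is the same as ranking by predicted SNR.

```python
def magnitude_input(history):
    """Build the input vector of Eq. (12).

    history: list of tau + 1 CSI snapshots, oldest first; each snapshot is a
    list of K complex coefficients h_{k,d}. Returns K * (tau + 1) magnitudes
    ordered as [a_1[t-tau], ..., a_K[t-tau], ..., a_1[t], ..., a_K[t]].
    """
    return [abs(h) for snapshot in history for h in snapshot]

def to_real(snapshot):
    """Map K complex coefficients to the 2K real vector c_d[t]:
    all real parts first, then all imaginary parts."""
    return [h.real for h in snapshot] + [h.imag for h in snapshot]

def to_complex(c):
    """Reverse manipulation: rebuild K complex coefficients from c_d."""
    k = len(c) // 2
    return [complex(c[n], c[k + n]) for n in range(k)]

def select_relay(predicted_magnitudes):
    """Index of the relay with the largest predicted magnitude
    (assumes equal P_k and noise variance across relays)."""
    return max(range(len(predicted_magnitudes)),
               key=lambda n: predicted_magnitudes[n])

if __name__ == "__main__":
    # Two relays (K = 2) and two snapshots (tau = 1); values are made up.
    history = [[3 + 4j, 1j], [1 + 0j, 0.5 - 0.5j]]
    print(magnitude_input(history))
    print(to_complex(to_real(history[-1])) == history[-1])
```

The predictor itself sits between `magnitude_input` and `select_relay`; any of the recurrent layers of Section III-A could fill that slot.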

C. Statistics of Predicted CSI

To analyze the performance of the proposed scheme, the statistics of the predicted CSI are required. When training a DL-based predictor, the objective is to generate predicted CSI $\check{h}$ that approximates the actual CSI as closely as possible. It is therefore assumed that $\check{h}$ has the same distribution as $h$ and follows a zero-mean complex Gaussian distribution, i.e., $\check{h}\sim\mathcal{CN}(0,\sigma_h^2)$. Then, the instantaneous SNR $\gamma_{A,B}$ conditioned on its predicted version $\check{\gamma}_{A,B} = |\check{h}_{A,B}|^2 P_A/\sigma_n^2$ follows a non-central chi-square distribution with two degrees of freedom, whose probability density function (PDF) is

$$f_{\gamma_{A,B}|\check{\gamma}_{A,B}}(\gamma\,|\,\check{\gamma}) = \frac{1}{\bar{\gamma}_{A,B}(1-\rho^2)}\, e^{-\frac{\gamma+\rho^2\check{\gamma}}{\bar{\gamma}_{A,B}(1-\rho^2)}}\, I_0\!\left(\frac{2\sqrt{\rho^2\gamma\check{\gamma}}}{\bar{\gamma}_{A,B}(1-\rho^2)}\right), \tag{13}$$

where $I_0(\cdot)$ denotes the zeroth-order modified Bessel function of the first kind, and $\rho$ stands for the correlation coefficient between $\check{h}$ and $h$, like (1), defined as $\rho = \mathbb{E}[h\check{h}^*]\big/\sqrt{\mathbb{E}[|h|^2]\,\mathbb{E}[|\check{h}|^2]}$.

D. Computational Complexity

In the context of cooperative diversity, the computational complexity mainly arises from multi-relay coordination and synchronization [9]. The simplicity of ORS is achieved thanks to single-relay transmission, which substantially lowers the amount of signalling overhead among multiple relays. A direct comparison of different schemes is not easy and does not provide real insight. That is why most of the works in this field [14]–[25] did not provide a quantitative analysis of complexity. On the other hand, the complexity of the proposed scheme comes mainly from the DL-based predictor, which is always a concern for the application of deep learning. From a practical perspective, it is more meaningful to clarify its demand on computing resources in comparison with the availability of off-the-shelf hardware. Hence, let us focus on assessing the complexity of the DL predictors in terms of floating-point operations per second (FLOPS).

A deep recurrent network can be quantitatively modelled as follows: an input layer with $N_i$ neurons, an output layer with $N_o$ neurons, and $L$ hidden layers, which have $N_h^l$ neurons at layer $l = 1, \ldots, L$. To begin with the input layer, it computes $\delta_h\big(\mathbf{W}^{(I)}\mathbf{d} + \mathbf{b}^{(I)}\big)$, where the matrix multiplication generates $N_i N_h^1$ floating-point multiplicative operations and $(N_i - 1)N_h^1$ additive operations, and the addition of the bias vector consumes $N_h^1$ operations, amounting to a total of $O_i = 2N_i N_h^1$. Note that the amount of computation raised by the activation function is negligible compared to the matrix multiplication, and is usually ignored in the calculation of complexity for deep learning. Likewise, the output layer corresponds to $O_o = 2N_h^L N_o$.
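The layer-wise counting above, together with the recurrent-layer counts derived in the rest of this subsection, can be sketched as a small Python helper. It is a rough illustration rather than the authors' tool: the function names are hypothetical, the convention $N_h^0 = N_i$ is taken from the remark following (14), and the multipliers 1, 3, and 4 for RNN, GRU, and LSTM layers are the usual counts of weight-matrix pairs per layer, stated here as an assumption.

```python
from typing import Sequence

# Assumed number of (W, U) matrix pairs per recurrent layer type.
GATE_FACTOR = {"rnn": 1, "gru": 3, "lstm": 4}


def dense_ops(n_in: int, n_out: int) -> int:
    """Operations of one affine map W x + b: n_in*n_out multiplications,
    (n_in - 1)*n_out additions, and n_out bias additions."""
    return n_in * n_out + (n_in - 1) * n_out + n_out  # equals 2 * n_in * n_out


def ops_per_step(kind: str, n_i: int, hidden: Sequence[int], n_o: int) -> int:
    """Approximate floating-point operations per time step.

    Each hidden layer is counted as 2*(N_prev*N + N*N), scaled by the
    gate factor; the first hidden layer uses N_prev = n_i.
    """
    total = dense_ops(n_i, hidden[0]) + dense_ops(hidden[-1], n_o)
    prev = n_i
    for n in hidden:
        total += GATE_FACTOR[kind] * 2 * (prev * n + n * n)
        prev = n
    return total


def flops(kind: str, n_i: int, hidden: Sequence[int], n_o: int, f_p: float) -> float:
    """Operations per second for a prediction frequency of f_p steps/s."""
    return ops_per_step(kind, n_i, hidden, n_o) * f_p
```

When every layer has the same width $n$, `ops_per_step` reduces to $4(1+L)n^2$ for an RNN, which grows quadratically in $n$.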
For an RNN hidden layer as given in (8), the number of operations equals $O_l = (2N_h^{l-1} - 1)N_h^l + (2N_h^l - 1)N_h^l + N_h^l$, where the first term corresponds to the calculation of $\mathbf{W}^{(l)}\mathbf{d}^{(l)}_t$, the second is for $\mathbf{U}^{(l)}\mathbf{d}^{(l+1)}_{t-1}$, and the third is due to the addition of the bias. For simplicity, $O_l$ can be approximated by $2N_h^{l-1}N_h^l + 2(N_h^l)^2$. Then, the overall complexity for a simple RNN is given by

$$O_{\mathrm{rnn}} = O_i + O_o + \sum_{l=1}^{L} O_l \approx 2\left[N_i N_h^1 + N_h^L N_o + \sum_{l=1}^{L}\Big(N_h^{l-1}N_h^l + \big(N_h^l\big)^2\Big)\right], \tag{14}$$

where we apply $N_h^0 = N_i$ for a simpler expression. As derived from (9)-(10), the number of operations for the matrix multiplication on an LSTM layer is 4 times that of an RNN layer, i.e., $4O_l$. The computation for the gate control, which grows only linearly with $N_h^l$, can be neglected. Therefore, the complexity of an LSTM network is approximated by

$$O_{\mathrm{lstm}} \approx 2\left[N_i N_h^1 + N_h^L N_o + 4\sum_{l=1}^{L}\Big(N_h^{l-1}N_h^l + \big(N_h^l\big)^2\Big)\right]. \tag{15}$$

Similarly, we can derive the expression for GRU, i.e.,

$$O_{\mathrm{gru}} \approx 2\left[N_i N_h^1 + N_h^L N_o + 3\sum_{l=1}^{L}\Big(N_h^{l-1}N_h^l + \big(N_h^l\big)^2\Big)\right]. \tag{16}$$

Suppose all layers have an identical number $n$ of neurons; we can then further simplify (14)-(16). As listed in Table I, the complexity of recurrent networks is $O(n^2)$, which is moderate if the number of neurons per layer is not too large. During either the training phase using the typical SGD algorithm, or the predicting phase, the required floating-point operations at each time step are identical. Consequently, (14)-(16) are applicable for measuring the complexity of both training and prediction. Note that the above expressions are the complexity per step; we need to know the frequency of prediction denoted by $f_p$, i.e., the number of steps performed per second, to figure out FLOPS. Section VII will further discuss the complexity quantitatively after the values of these parameters are determined.

TABLE I
THE COMPLEXITY OF DEEP RECURRENT NETWORKS

Networks    Complexity per Step    FLOPS
RNN         $4(1+L)n^2$            $4(1+L)f_p n^2$
GRU         $4(1+3L)n^2$           $4(1+3L)f_p n^2$
LSTM        $4(1+4L)n^2$           $4(1+4L)f_p n^2$

IV. PREDICTIVE RELAY SELECTION

Taking advantage of the new degree of freedom opened by channel prediction, we propose the PRS scheme, which aims to achieve high performance in fast time-varying channels while retaining full diversity in slow fading. Implementations of cooperative relay-selection schemes fall mainly into two categories: distributed [14] and centralized [17]. The former relies on a timer at each relay, and applies a contention period (CP) to choose the best relay in a distributed manner. The latter has a centralized controller, e.g., the destination, which measures the CSI of all RD links and makes the decision. Instead of being immediately used to select the best relay for the current frame, the measured CSI is applied to generate predicted CSI for the next frame. Such a prediction horizon relaxes the tight timing requirement of the procedure and therefore provides the flexibility to design an advanced relaying strategy.

Fig. 3. Frame structure of the proposed scheme for the distributed (upper) and centralized (lower) relay-selection schemes. A frame is organized in three steps: a source packet for SR transmission, relay selection, and a forward packet for RD transmission. A packet consists of a header, payload, and a cyclic redundancy check (CRC) code. The header is a combination of some of the following fields: a synchronization (SYN) preamble, i.e., a sequence of known bits used for frequency offset correction and time alignment; a start frame delimiter (SFD), i.e., a pattern of bits applied to define the beginning of a packet; pilot signals called Ready-To-Send (RTS), Clear-To-Send (CTS), or training sequence (TS); and the length of payload (LP), representing the number of symbols in the payload. The instantaneous CSI of frame $t$ is measured through CSI-Estimation (CSI-E), and then CSI-Prediction (CSI-P) forecasts the possible CSI for frame $t+1$, which is buffered (CSI-B) and fetched at the next frame for a timely relay selection.

Without loss of generality, as depicted in Algorithm 1, we first describe an implementation example of the distributed PRS with the DF strategy, as follows:

1) At frame $t$, as illustrated in Fig. 3, the source broadcasts a packet consisting of a header, payload, and a CRC code. Other nodes (the relays and destination) synchronize with the source by means of the SYN preamble. Relay $k$, $k\in\{1,\cdots,K\}$, measures its local CSI $h_{s,k}[t]$ by estimating RTS, which is used to detect received data symbols. Those relays that correctly decode the source's signal (i.e., pass the CRC check) comprise a $\mathcal{DS}$.
2) A beacon containing CTS is sent from the destination, so that relay $k$ can estimate $h_{d,k}[t]$, and then $h_{k,d}[t]$ is known due to channel reciprocity. It feeds $h_{k,d}[t]$ into its local channel predictor to generate $\check{h}_{k,d}[t+1]$, and buffers it for use at the upcoming frame $t+1$.
3) Meanwhile, relay $k$ belonging to $\mathcal{DS}$ fetches $\check{h}_{k,d}[t]$ that was buffered at the previous frame $t-1$. This operation starts once the beacon arrives, in parallel with Step 2.
4) Each relay starts a timer with a duration inversely proportional to the magnitude of CSI, e.g., $T_t \propto 1/|\check{h}_{k,d}[t]|$. It is possible that this duration is too long due to a very small channel gain. To deal with this anomaly, a maximal duration $T_m$ is added.
5) The timer on the relay with the largest channel gain expires first, and then it sends a flag packet to announce itself as the best relay.
6) Once the best relay's notification is received, the other relays flush their timers and keep silent. The selected relay forwards the signal until the end of this frame.

The frame structure for the distributed PRS shown in Fig. 3 is also suitable for AF relaying networks. Only three main modifications are required: the best relay is determined in terms of $\min\big(|\check{h}_{s,k}[t]|, |\check{h}_{k,d}[t]|\big)$ rather than $|\check{h}_{k,d}[t]|$ as in DF relaying; the best relay only amplifies the received signal without detection; and therefore CRC is not needed, as detailed in Algorithm 2.

Algorithm 1

Distributed DF PRS

  for $t = 1, 2, \ldots$ do
    $s$ sends RTS
    $s$ sends data payload $x[t]$
    while $k = 1, \ldots, K$ do
      estimate $h_{s,k}[t]$
      detect: $\hat{x}[t] = f(y_{s,k}[t], h_{s,k}[t])$
      if $\hat{x}[t]$ is error-free then
        fetch $\check{h}_{k,d}[t]$ from Buffer
        start a timer $\big(T_t \propto 1/|\check{h}_{k,d}[t]|\big) \cap (T_t \leqslant T_m)$
      end if
    end while
    $d$ sends CTS
    $\dot{k} = \arg\max_{k\in\mathcal{DS}}\big(|\check{h}_{k,d}[t]|\big)$ sends a flag
    $\dot{k}$ transmits $\hat{x}[t]$
    while $k = 1, \ldots, K$ do
      estimate $h_{k,d}[t]$
      predict and buffer $\check{h}_{k,d}[t+1]$
    end while
  end for

Footnote: Due to the "hidden" node problem, signal propagation delay, and the switch time from receive to transmit mode in a transceiver, the probability of having two or more relay timers expire within an uncertainty interval is nonzero, causing transmission collisions among "best" relays. For a detailed analysis of the collision probability, see Section III of [14].

Algorithm 2
Distributed AF PRS

  for $t = 1, 2, \ldots$ do
    $s$ sends RTS
    $s$ sends $x[t]$
    while $k = 1, \ldots, K$ do
      estimate $h_{s,k}[t]$
      predict and buffer $\check{h}_{s,k}[t+1]$
    end while
    $d$ sends CTS
    while $k = 1, \ldots, K$ do
      fetch $\check{h}_{s,k}[t]$, $\check{h}_{k,d}[t]$ from Buffer
      start a timer $\big(T_t \propto 1/\min(|\check{h}_{s,k}[t]|, |\check{h}_{k,d}[t]|)\big) \cap (T_t \leqslant T_m)$
      estimate $h_{k,d}[t]$
      predict and buffer $\check{h}_{k,d}[t+1]$
    end while
    $\dot{k} = \arg\max_{k}\big(\min(|\check{h}_{s,k}[t]|, |\check{h}_{k,d}[t]|)\big)$ sends a flag
    $\dot{k}$ transmits $y_{\dot{k}}[t]$
  end for

Moreover, the proposed scheme is also applicable to cooperative networks with centralized relay selection. Its centralized version for DF relays is described as follows:

1) At frame $t$, as illustrated in Fig. 3, the source broadcasts a packet containing a header, payload, and a CRC code. The relays achieve synchronization via the SYN preamble, estimate RTS to get the local CSI, and detect received data symbols.
2) Upon termination of the SR transmission, the relays send out their respective TSs simultaneously.
3) The destination can estimate the CSI of all RD links, i.e., $\mathbf{h}_d[t] = \big[h_{1,d}[t], \ldots, h_{K,d}[t]\big]^T$, if the TSs are orthogonal. Feeding $\mathbf{h}_d[t]$ into the global predictor at the destination, $\check{\mathbf{h}}_d[t+1]$ is obtained and then buffered for use at the next frame. Note that only one global predictor is needed within a cooperative network, in contrast to the distributed PRS where each relay has a local predictor.
4) Meanwhile, the destination fetches the predicted CSI $\check{\mathbf{h}}_d[t] = \big[\check{h}_{1,d}[t], \ldots, \check{h}_{K,d}[t]\big]^T$ that was buffered at the previous frame $t-1$. This operation starts once the TSs arrive, in parallel with Step 3.
5) The destination selects the best relay in terms of $\dot{k} = \arg\max_{k}\big(|\check{h}_{k,d}[t]|\big)$ and the selection decision is fed back (FD) to the relays.
6) The selected relay checks whether it correctly detected the data symbols in the source packet by checking the CRC. If yes, it relays the signal in the forward packet. Otherwise, it sends a non-acknowledgement to trigger a relay re-selection process or the termination of this frame (whose data will be re-transmitted at the next frame [12]).

The centralized PRS using the DF strategy is also described in Algorithm 3, while its AF counterpart can be derived in the same way as Algorithm 2 is derived from Algorithm 1, and is therefore not repeated here because of the page limitation.
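The interplay of buffering and timer-based selection in the distributed DF procedure can be illustrated with a short Python sketch. It is a simplified model under several assumptions that are not part of the paper: the function names and the `scale` constant are hypothetical, the predictor is passed in as an arbitrary callable, and timer collisions are ignored (ties are broken arbitrarily).

```python
from typing import Callable, Dict, Iterable, List, Optional


def timer_duration(gain: float, scale: float, t_max: float) -> float:
    """Timer inversely proportional to the predicted gain, capped at t_max."""
    if gain <= 0.0:
        return t_max
    return min(scale / gain, t_max)


def select_relay(buffered: Dict[int, complex], decoding_set: Iterable[int],
                 scale: float = 1.0, t_max: float = 10.0) -> Optional[int]:
    """Return the relay in the decoding set whose timer expires first."""
    timers = {k: timer_duration(abs(buffered[k]), scale, t_max)
              for k in decoding_set if k in buffered}
    if not timers:
        return None  # nothing buffered yet or empty decoding set
    return min(timers, key=timers.get)


def run_frames(rd_channels: List[Dict[int, complex]],
               decoded: List[Iterable[int]],
               predictor: Callable[[complex], complex]) -> List[Optional[int]]:
    """rd_channels[t][k] is h_{k,d}[t]; decoded[t] lists relays passing CRC.

    At each frame the selection uses the predictions buffered one frame
    earlier, then the buffer is refreshed with predictions for t + 1.
    """
    buffer: Dict[int, complex] = {}
    choices: List[Optional[int]] = []
    for h_now, ds in zip(rd_channels, decoded):
        choices.append(select_relay(buffer, ds))
        buffer = {k: predictor(h) for k, h in h_now.items()}
    return choices
```

Passing `lambda h: h` as the predictor corresponds to simply reusing the outdated CSI; a trained predictor would be plugged in at the same place.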

Algorithm 3
Centralized DF PRS

  for $t = 1, 2, \ldots$ do
    $s$ sends RTS
    $s$ sends $x[t]$
    while $k = 1, \ldots, K$ do
      send $k$th TS
      estimate $h_{s,k}[t]$
      detect: $\hat{x}[t] = f(y_{s,k}[t], h_{s,k}[t])$
    end while
    $d$ fetches $\check{\mathbf{h}}_d[t]$ from Buffer
    select $\dot{k} = \arg\max_{k}\big(|\check{h}_{k,d}[t]|\big)$ and feed back
    if $\hat{x}[t]$ on $\dot{k}$ is error-free then
      $\dot{k}$ transmits $\hat{x}[t]$
    else
      $d$ re-selects $\dot{k}$ or terminates
    end if
    $d$ estimates $\mathbf{h}_d[t]$ from TS
    predict and buffer $\check{\mathbf{h}}_d[t+1]$
  end for

V. OUTAGE PROBABILITY ANALYSIS

The performance of PRS will be analyzed with respect to (w.r.t.) outage probability and channel capacity, which are key performance indicators to assess cooperative diversity techniques. In this section, we first derive the closed-form formulas of outage probabilities for DF and AF PRS, respectively, and then get their capacity expressions in the following section.

A. Outage Probability for DF PRS

In information theory [1], an outage refers to the event that the instantaneous channel capacity falls below a target rate $R$, where reliable communication is not achievable whatever channel coding is used. The metric to measure the probability of outage is referred to as the outage probability, which is defined as $P(R) = \mathbb{P}\{\log_2(1+\gamma) < R\}$, where $\mathbb{P}$ is the notation of mathematical probability. In DF relaying, the number of relays in a decoding subset varies from time to time due to channel fading. Let us categorize all decoding subsets containing $M$ relays into one group denoted by $\mathcal{DS}_M$, $M = 0, 1, \ldots, K$. These $M$ relays may differ, namely $M$ out of $K$ relays, resulting in $\binom{K}{M}$ combinations. In other words, $\mathcal{DS}_M$ is a set of decoding subsets, i.e., $\mathcal{DS}_M = \big\{\mathcal{DS}_M^p \,\big|\, p = 1, \ldots, \binom{K}{M}\big\}$, where $\mathcal{DS}_M^p$ denotes the $p$th element of $\mathcal{DS}_M$. Then, the outage probability of PRS with DF relays can be calculated by

$$P^{\mathrm{DF}}_{\mathrm{prs}}(R) = \sum_{M=0}^{K}\sum_{p=1}^{\binom{K}{M}} P(R\,|\,\mathcal{DS}_M^p)\,P(\mathcal{DS}_M^p), \tag{17}$$

where $P(\mathcal{DS}_M^p)$ is the occurrence probability of $\mathcal{DS}_M^p$, and $P(R\,|\,\mathcal{DS}_M^p)$ is the outage probability conditioned on $\mathcal{DS}_M^p$. Suppose that all SR links are independent and identically distributed (i.i.d.); then the values of $P(\mathcal{DS}_M^p)$ for any $p \in \big\{1, \ldots, \binom{K}{M}\big\}$ are equal, and so are those of $P(R\,|\,\mathcal{DS}_M^p)$ if all RD channels are i.i.d. Then, (17) can be simplified to

$$P^{\mathrm{DF}}_{\mathrm{prs}}(R) = \sum_{M=0}^{K} P\big(R\,\big|\,|\mathcal{DS}| = M\big)\,P\big(|\mathcal{DS}| = M\big), \tag{18}$$

where $|\cdot|$ represents the cardinality of a set and $P(|\mathcal{DS}| = M)$ denotes the probability that the number of relays in a decoding subset is $M$. With Rayleigh fading, the instantaneous SNR of each SR channel is exponentially distributed, and its cumulative distribution function (CDF) is given by

$$F_{\gamma_{s,k}}(\gamma) = 1 - e^{-\gamma/\bar{\gamma}_{s,k}}, \quad \gamma > 0. \tag{19}$$

According to (3), the probability that a relay correctly decodes the received signal, i.e., $\gamma_{s,k} \geqslant \gamma_o$, equals $1 - F_{\gamma_{s,k}}(\gamma_o)$. Since the number of relays out of $K$ falling into the $\mathcal{DS}$ follows the binomial distribution, we obtain

$$P\big(|\mathcal{DS}| = M\big) = \binom{K}{M}\Big(e^{-\gamma_o/\bar{\gamma}_{s,k}}\Big)^{M}\Big(1 - e^{-\gamma_o/\bar{\gamma}_{s,k}}\Big)^{K-M}. \tag{20}$$

So far, the second term in (18) is determined. Let us turn to the first term $P(R\,|\,|\mathcal{DS}| = M)$, which is derived, conditioned on the value of $M$, as follows:

$M = 0$: If no relay can successfully decode the original signal, the signal transmission fails, i.e.,

$$P\big(R\,\big|\,|\mathcal{DS}| = 0\big) = 1. \tag{21}$$

$M = 1$: Only one relay correctly decodes the signal, so it acts as the best relay directly without selection. Similar to (19), we obtain the CDF of the received SNR for this RD link as $F_{\gamma_{\dot{k},d}}(\gamma) = 1 - e^{-\gamma/\bar{\gamma}_{k,d}}$, resulting in

$$P\big(R\,\big|\,|\mathcal{DS}| = 1\big) = F_{\gamma_{\dot{k},d}}(\gamma_o) = 1 - e^{-\gamma_o/\bar{\gamma}_{k,d}}. \tag{22}$$

$M > 1$: In this case, the best relay is opportunistically selected from the $\mathcal{DS}$ in terms of predicted CSI, that is, $\dot{k} = \arg\max_{k\in\mathcal{DS}}(\check{\gamma}_{k,d})$. To simplify the derivation, we use $\mathcal{A}_{\dot{k}}$ to represent the event $\check{\gamma}_{\dot{k}} = \max_{k\in\mathcal{DS}}(\check{\gamma}_{k,d})$. The predicted CSI is applied only for relay selection, whereas the post-processing SNR during signal transmission is the actual SNR $\gamma_{\dot{k}}$, whose CDF can be calculated by

$$F_{\gamma_{\dot{k}}}(\gamma) = \sum_{\dot{k}=1}^{M} P\big(\gamma_{\dot{k}} \leqslant \gamma\,\big|\,\mathcal{A}_{\dot{k}}\big)\,P\big(\mathcal{A}_{\dot{k}}\big), \tag{23}$$

where $P(\mathcal{A}_{\dot{k}})$ denotes the occurrence probability of $\mathcal{A}_{\dot{k}}$. Under the assumption of i.i.d. channels, each relay in the decoding subset has the same chance to get the largest SNR, thus $P(\mathcal{A}_{\dot{k}}) = 1/M$.
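Besides, $P(\gamma_{\dot{k}} \leqslant \gamma\,|\,\mathcal{A}_{\dot{k}})$ is the probability that the actual SNR is below an arbitrary threshold $\gamma$ conditioned on $\mathcal{A}_{\dot{k}}$, which is computed by

$$P\big(\gamma_{\dot{k}} \leqslant \gamma\,\big|\,\mathcal{A}_{\dot{k}}\big) = \int_0^{\gamma}\!\!\int_0^{\infty} f_{\gamma_{\dot{k}}|\check{\gamma}_{\dot{k}}}(\gamma\,|\,\check{\gamma})\, f_{\check{\gamma}_{\dot{k}}|\mathcal{A}_{\dot{k}}}(\check{\gamma})\, d\check{\gamma}\, d\gamma, \tag{24}$$

where $f_{\gamma_{\dot{k}}|\check{\gamma}_{\dot{k}}}(\gamma\,|\,\check{\gamma})$ stands for the PDF of $\gamma_{\dot{k}}$ conditioned on its predicted version $\check{\gamma}_{\dot{k}}$, as given in (13), and $f_{\check{\gamma}_{\dot{k}}|\mathcal{A}_{\dot{k}}}(\check{\gamma})$ denotes the PDF of $\check{\gamma}_{\dot{k}}$ in the case of $\mathcal{A}_{\dot{k}}$. Analogous to the multi-user selection with a max-SNR scheduler in [10], we have

$$f_{\check{\gamma}_{\dot{k}}|\mathcal{A}_{\dot{k}}}(\check{\gamma}) = \frac{M}{\bar{\gamma}_{k,d}}\, e^{-\check{\gamma}/\bar{\gamma}_{k,d}}\Big(1 - e^{-\check{\gamma}/\bar{\gamma}_{k,d}}\Big)^{M-1}. \tag{25}$$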
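The selection mechanism behind (23)-(25), choosing by predicted SNR but transmitting over the actual channel, can be checked with a small Monte-Carlo experiment. The Python sketch below is illustrative only. It assumes unit-variance fading, generates the predicted coefficient as $\check{h} = \rho h + \sqrt{1-\rho^2}\,e$ with independent $e \sim \mathcal{CN}(0,1)$ so that $\check{h}$ has the same distribution as $h$ and correlation $\rho$ with it, and treats the threshold $\gamma_o$ from (3) as a plain input. The function names are hypothetical.

```python
import math
import random


def cgauss(rng: random.Random, var: float = 1.0) -> complex:
    """Draw one circularly-symmetric complex Gaussian sample CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def outage_given_m(m: int, rho: float, mean_snr: float, gamma_o: float,
                   trials: int = 100_000, seed: int = 1) -> float:
    """Empirical outage probability when m relays are in the decoding subset."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        best_pred, actual_of_best = -1.0, 0.0
        for _ in range(m):
            h = cgauss(rng)
            # Predicted coefficient: same CN(0,1) law, correlation rho with h.
            h_pred = rho * h + math.sqrt(1.0 - rho * rho) * cgauss(rng)
            if abs(h_pred) ** 2 > best_pred:
                best_pred = abs(h_pred) ** 2
                actual_of_best = abs(h) ** 2
        # Outage is decided by the actual SNR of the selected relay.
        if mean_snr * actual_of_best < gamma_o:
            outages += 1
    return outages / trials
```

Two limiting cases are easy to verify: with $\rho = 1$ the result approaches $(1 - e^{-\gamma_o/\bar{\gamma}})^M$, the CDF of the maximum of $M$ exponential variables, and with $\rho = 0$ the selection is effectively random, so the result approaches $1 - e^{-\gamma_o/\bar{\gamma}}$.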

Equation (24) is solved given (13) and (25), and then substituting $P(\gamma_{\dot{k}} \leqslant \gamma\,|\,\mathcal{A}_{\dot{k}})$ into (23), we have

$$F_{\gamma_{\dot{k}}}(\gamma) = M\sum_{m=0}^{M-1}\binom{M-1}{m}\frac{(-1)^m}{m+1}\left(1 - e^{-\frac{\gamma(m+1)}{\bar{\gamma}_{k,d}[1+m(1-\rho^2)]}}\right). \tag{26}$$

Thus, the conditional outage probability at $M > 1$ is

$$P\big(R\,\big|\,|\mathcal{DS}| = M\big) = F_{\gamma_{\dot{k}}}(\gamma_o). \tag{27}$$

Substituting (20), (21), (22), and (27) into (18), the closed-form expression of the outage probability for DF PRS is obtained:

$$P^{\mathrm{DF}}_{\mathrm{prs}}(\gamma_o) = \Big(1 - e^{-\gamma_o/\bar{\gamma}_{s,k}}\Big)^{K} + \sum_{M=1}^{K} M\sum_{m=0}^{M-1}\binom{M-1}{m}\frac{(-1)^m}{m+1}\left(1 - e^{-\frac{\gamma_o(m+1)}{\bar{\gamma}_{k,d}[1+m(1-\rho^2)]}}\right)\cdot\binom{K}{M}\Big(e^{-\gamma_o/\bar{\gamma}_{s,k}}\Big)^{M}\Big(1 - e^{-\gamma_o/\bar{\gamma}_{s,k}}\Big)^{K-M}. \tag{28}$$

B. Outage Probability for AF PRS

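In AF relaying, the best relay is selected in terms of the equivalent end-to-end CSI. The predicted SNR of the best relay is the largest, i.e., $\check{\gamma}_{\dot{k}} = \max_{k\in\{1,\ldots,K\}}\{\min(\check{\gamma}_{s,k}, \check{\gamma}_{k,d})\}$. However, the calculation of the outage probability requires the PDF of the actual SNR, rather than the predicted SNR, i.e.,

$$P^{\mathrm{AF}}_{\mathrm{prs}}(\gamma_o) = \int_0^{\gamma_o} f_{\gamma_{\dot{k}}}(\gamma)\, d\gamma, \tag{29}$$

where $\gamma_o$ is the threshold SNR defined in (3). Conditioned on its predicted version $\check{\gamma}_{\dot{k}}$, the PDF of $\gamma_{\dot{k}}$ is computed by

$$f_{\gamma_{\dot{k}}}(\gamma) = \int_0^{\infty} f_{\gamma_{\dot{k}}|\check{\gamma}_{\dot{k}}}(\gamma\,|\,\check{\gamma})\, f_{\check{\gamma}_{\dot{k}}}(\check{\gamma})\, d\check{\gamma}, \tag{30}$$

where $f_{\check{\gamma}_{\dot{k}}}(\check{\gamma})$ stands for the PDF of $\check{\gamma}_{\dot{k}}$. Under the assumption of i.i.d. Rayleigh fading, we can first figure out its CDF as

$$F_{\check{\gamma}_{\dot{k}}}(\check{\gamma}) = P(\check{\gamma}_{\dot{k}} < \check{\gamma}) = \prod_{k=1}^{K} P(\check{\gamma}_k < \check{\gamma}) = \prod_{k=1}^{K} F_{\check{\gamma}_k}(\check{\gamma}). \tag{31}$$

Since $\check{\gamma}_{s,k}$ and $\check{\gamma}_{k,d}$ are exponentially distributed, $\check{\gamma}_k = \min(\check{\gamma}_{s,k}, \check{\gamma}_{k,d})$ also follows the exponential distribution with a mean of $\bar{\gamma}_e = \frac{\bar{\gamma}_{s,k}\bar{\gamma}_{k,d}}{\bar{\gamma}_{s,k}+\bar{\gamma}_{k,d}}$. Like (19), we have $F_{\check{\gamma}_k}(\check{\gamma}) = 1 - e^{-\check{\gamma}/\bar{\gamma}_e}$, and then (31) becomes

$$F_{\check{\gamma}_{\dot{k}}}(\check{\gamma}) = \Big(1 - e^{-\check{\gamma}/\bar{\gamma}_e}\Big)^{K}. \tag{32}$$

Taking its derivative yields

$$f_{\check{\gamma}_{\dot{k}}}(\check{\gamma}) = \frac{\partial F_{\check{\gamma}_{\dot{k}}}(\check{\gamma})}{\partial\check{\gamma}} = \frac{K}{\bar{\gamma}_e}\, e^{-\check{\gamma}/\bar{\gamma}_e}\Big[1 - e^{-\check{\gamma}/\bar{\gamma}_e}\Big]^{K-1}. \tag{33}$$

For the sake of mathematical tractability, according to [45], (33) is transformed into another form as

$$f_{\check{\gamma}_{\dot{k}}}(\check{\gamma}) = \sum_{k=1}^{K}\binom{K}{k}(-1)^{k-1}\frac{k}{\bar{\gamma}_e}\, e^{-k\check{\gamma}/\bar{\gamma}_e}. \tag{34}$$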
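The step from (33) to (34) is a binomial expansion, and the two forms can be compared numerically. The following Python sketch, with hypothetical function names, evaluates both expressions along with the equivalent mean $\bar{\gamma}_e$; it serves only as a sanity check of the algebra.

```python
import math


def equivalent_mean(mean_sr: float, mean_rd: float) -> float:
    """Mean of min(X, Y) for independent exponentials with the given means."""
    return mean_sr * mean_rd / (mean_sr + mean_rd)


def pdf_product_form(g: float, k_relays: int, mean: float) -> float:
    """PDF of the largest of K i.i.d. exponential SNRs, product form (33)."""
    x = math.exp(-g / mean)
    return k_relays / mean * x * (1.0 - x) ** (k_relays - 1)


def pdf_sum_form(g: float, k_relays: int, mean: float) -> float:
    """Same PDF written as the alternating binomial sum (34)."""
    return sum(
        math.comb(k_relays, k) * (-1) ** (k - 1) * k / mean
        * math.exp(-k * g / mean)
        for k in range(1, k_relays + 1)
    )
```

The sum form is convenient for the integration in (30) because each term is a plain exponential in $\check{\gamma}$.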

Substituting (13) and (34) into (30) yields

$$f_{\gamma_{\dot{k}}}(\gamma) = \sum_{k=1}^{K}\binom{K}{k}(-1)^{k-1}\frac{k}{\bar{\gamma}_e^2(1-\rho^2)}\, e^{-\frac{\gamma}{\bar{\gamma}_e(1-\rho^2)}} \times \int_0^{\infty} e^{-\check{\gamma}\left(\frac{\rho^2}{(1-\rho^2)\bar{\gamma}_e}+\frac{k}{\bar{\gamma}_e}\right)} I_0\!\left(\frac{2\sqrt{\rho^2\gamma\check{\gamma}}}{\bar{\gamma}_e(1-\rho^2)}\right) d\check{\gamma}. \tag{35}$$

Applying Eq. (6.614.3) of [46], i.e., $\int_0^{\infty} e^{-\alpha x} I_0(2\sqrt{\beta x})\,dx = \frac{1}{\alpha}e^{\beta/\alpha}$, with $\alpha = \frac{\rho^2}{(1-\rho^2)\bar{\gamma}_e} + \frac{k}{\bar{\gamma}_e}$ and $\beta = \frac{\rho^2\gamma}{[\bar{\gamma}_e(1-\rho^2)]^2}$, we solve (35) as

$$f_{\gamma_{\dot{k}}}(\gamma) = \sum_{k=1}^{K}\binom{K}{k}(-1)^{k-1}\frac{k}{\bar{\gamma}_e[k(1-\rho^2)+\rho^2]}\, e^{-\frac{k\gamma}{[k(1-\rho^2)+\rho^2]\bar{\gamma}_e}}. \tag{36}$$

Substituting (36) into (29), the analytical expression of the outage probability for AF PRS is obtained, i.e.,

$$P^{\mathrm{AF}}_{\mathrm{prs}}(\gamma_o) = \sum_{k=1}^{K}\binom{K}{k}(-1)^{k}\left[e^{-\frac{k\gamma_o}{[k(1-\rho^2)+\rho^2]\bar{\gamma}_e}} - 1\right]. \tag{37}$$

VI. CAPACITY ANALYSIS

Channel capacity is another key performance metric, indicating the maximal transmission rate at which data can be delivered over a wireless channel with negligible error probability. In general, it can be calculated by averaging over the PDF of the received SNR, namely $C = \int_0^{\infty}\log_2(1+\gamma)f(\gamma)\,d\gamma$. In the context of cooperative relaying, a closed-form expression of channel capacity is usually hard to derive. For instance, in [18], the final expression still contains an exponential integral $\int_1^{\infty} t^{-1}e^{-\lambda t}\,dt$. To avoid such intractability in the PDF-based analysis, we apply another approach that takes advantage of the moment generating function (MGF) [47], defined as $M_\gamma(s) = \mathbb{E}[e^{-s\gamma}]$. The MGF-based approach is described as follows:

Lemma 1: The ergodic capacity of a wireless system can be derived through the MGF of the received SNR [48], that is,

$$C = \frac{1}{\ln(2)}\sum_{q=1}^{Q} w_q\,\Phi(s_q)\left[\frac{\partial}{\partial s}M_\gamma(s)\bigg|_{s\to s_q}\right], \tag{38}$$

where $\ln$ is the natural logarithm, $Q$ stands for the number of iterations (truncating at $Q = 200$ is already accurate enough), $\Phi(s)$ denotes a special mathematical function called Meijer's G, i.e., $\Phi(s) = -G^{2,0}_{1,2}[\cdot]$, the variable $s_q$ is a function of $q$, which is given by $s_q = \tan\big[0.25\pi\cos\big((q-0.5)\pi/Q\big) + 0.25\pi\big]$, and another variable

$$w_q = \frac{\pi^2\sin\big[(q-0.5)\pi/Q\big]}{4Q\cos^2\big[0.25\pi\cos[(q-0.5)\pi/Q] + 0.25\pi\big]}. \tag{39}$$

Proof: The derivation refers to Appendix A.

$$C^{\mathrm{DF}}_{\mathrm{prs}} = \frac{1}{2\ln(2)}\Bigg\{K\,e^{-\frac{\gamma_o}{\bar{\gamma}_{s,k}}}\Big(1-e^{-\frac{\gamma_o}{\bar{\gamma}_{s,k}}}\Big)^{K-1}\sum_{q=1}^{Q} w_q\Phi(s_q)\frac{-\bar{\gamma}_{k,d}}{(1+s_q\bar{\gamma}_{k,d})^2} + \sum_{M=2}^{K} M\binom{K}{M}\Big(e^{-\frac{\gamma_o}{\bar{\gamma}_{s,k}}}\Big)^{M}\Big(1-e^{-\frac{\gamma_o}{\bar{\gamma}_{s,k}}}\Big)^{K-M}\sum_{q=1}^{Q} w_q\Phi(s_q)\sum_{m=0}^{M-1}\binom{M-1}{m}\frac{(-1)^{m+1}\bar{\gamma}_{k,d}[1+m(1-\rho^2)]}{\big[m+1+s_q\bar{\gamma}_{k,d}[1+m(1-\rho^2)]\big]^2}\Bigg\} \tag{40}$$

$$C^{\mathrm{AF}}_{\mathrm{prs}} = \frac{1}{\ln(2)}\sum_{q=1}^{Q}\sum_{k=1}^{K}(-1)^{k}\, w_q\Phi(s_q)\binom{K}{k}\left(\frac{\bar{\gamma}_e k\,[k(1-\rho^2)+\rho^2]}{\big[k+s_q\bar{\gamma}_e[k(1-\rho^2)+\rho^2]\big]^2}\right) \tag{41}$$

A. Capacity of DF PRS

Analogous to (18), the channel capacity of the proposed scheme using DF relays can be computed by

$$C^{\mathrm{DF}}_{\mathrm{prs}} = \sum_{M=0}^{K} C_M \Pr\big(|\mathcal{DS}| = M\big), \tag{42}$$

where $\Pr(|\mathcal{DS}| = M)$ is given in (20), and $C_M$ denotes the capacity of the end-to-end (EE) channel conditioned on $|\mathcal{DS}| = M$, which is analyzed as follows:

$M = 0$: No relay can correctly decode the source's signal, leading to $C_0 = 0$.

$M = 1$: If only one relay is available, it directly serves as the best relay without the need for selection. For a Rayleigh channel, the MGF of $\gamma_{\dot{k}}$ is given by

$$M_{\gamma_{\dot{k}}}(s) = \frac{1}{1+s\bar{\gamma}_{k,d}}. \tag{43}$$

Due to the half-duplex mode in dual-hop cooperative systems, the capacity of the EE channel has to be halved, i.e., multiplied by a factor of $1/2$. Substituting (43) into (38), we have

$$C_1 = \frac{1}{2\ln(2)}\sum_{q=1}^{Q} w_q\Phi(s_q)\frac{-\bar{\gamma}_{k,d}}{(1+s_q\bar{\gamma}_{k,d})^2}. \tag{44}$$

$M > 1$: The best relay is opportunistically selected from the $\mathcal{DS}$, and the PDF of its received SNR, $f_{\gamma_{\dot{k}}}(\gamma)$, can be obtained by taking the derivative of (26). From this, we can derive the MGF of $\gamma_{\dot{k}}$ as follows:

$$M_{\gamma_{\dot{k}}}(s) = \int_0^{\infty} e^{-s\gamma} f_{\gamma_{\dot{k}}}(\gamma)\,d\gamma = \int_0^{\infty} e^{-s\gamma}\frac{\partial F_{\gamma_{\dot{k}}}(\gamma)}{\partial\gamma}\,d\gamma = M\sum_{m=0}^{M-1}\binom{M-1}{m}\frac{(-1)^m}{m+1+s\bar{\gamma}_{k,d}[1+m(1-\rho^2)]}. \tag{45}$$

Substituting (45) into (38), the capacity of the RD channel $C^{M}_{r,d}$ is obtained. Analogous to (44), a factor of $1/2$ is applied due to the half-duplex mode, which yields the EE capacity of

$$C_M = \frac{1}{2}C^{M}_{r,d}. \tag{46}$$

Looking back to (42), the required terms $\Pr(|\mathcal{DS}| = M)$ and $C_M$ are available. This enables the following theorem:

Theorem 1: The end-to-end ergodic capacity for the proposed scheme using DF relays over i.i.d. Rayleigh channels is given in a closed form by (40).

Proof: Substituting (20), (44), and (46) into (42) yields (40).
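The MGF-based evaluation of Lemma 1 can be sanity-checked on the simplest case, a single Rayleigh link with the MGF in (43). The Python sketch below builds the nodes $s_q$ and weights $w_q$ of (38)-(39) and compares the result with the well-known closed form $e^{1/\bar{\gamma}}E_1(1/\bar{\gamma})/\ln 2$ for the ergodic capacity of a Rayleigh channel. Two assumptions are made here and are not taken from the paper: $\Phi(s)$ is evaluated as $-E_1(s)$, using the identity $E_1(s) = G^{2,0}_{1,2}(s\,|\,1;0,0)$, and $E_1$ is computed by a textbook series and continued fraction instead of a library routine. The half-duplex factor $1/2$ is left out, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E_1(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 40):
            term *= -x / k  # term = (-x)^k / k!
            total -= term / k
        return total
    # Continued fraction e^{-x} / (x + 1/(1 + 1/(x + 2/(1 + ...)))),
    # evaluated bottom-up with a fixed depth.
    cf = 0.0
    for k in range(200, 0, -1):
        cf = k / (1.0 + k / (x + cf))
    return math.exp(-x) / (x + cf)


def nodes_and_weights(q_total: int = 200) -> List[Tuple[float, float]]:
    """Quadrature nodes s_q and weights w_q as defined with (38)-(39)."""
    pairs = []
    for q in range(1, q_total + 1):
        theta = (q - 0.5) * math.pi / q_total
        arg = 0.25 * math.pi * math.cos(theta) + 0.25 * math.pi
        s_q = math.tan(arg)
        w_q = math.pi ** 2 * math.sin(theta) / (4.0 * q_total * math.cos(arg) ** 2)
        pairs.append((s_q, w_q))
    return pairs


def capacity_from_mgf(dmgf: Callable[[float], float], q_total: int = 200) -> float:
    """Evaluate (38) given the derivative of the MGF; Phi(s) = -E_1(s)."""
    acc = sum(w * (-exp1(s)) * dmgf(s) for s, w in nodes_and_weights(q_total))
    return acc / math.log(2.0)


def rayleigh_dmgf(mean_snr: float) -> Callable[[float], float]:
    """Derivative of the MGF 1 / (1 + s * mean_snr) from (43)."""
    return lambda s: -mean_snr / (1.0 + s * mean_snr) ** 2
```

For the selected-relay case, the same `capacity_from_mgf` routine could be reused by passing the derivative of (45) in place of `rayleigh_dmgf`.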


B. Capacity of AF PRS

Given $f_{\gamma_{\dot{k}}}(\gamma)$ in (36), the MGF of the actual SNR is calculated by $M_{\gamma_{\dot{k}}}(s) = \int_0^{\infty} e^{-s\gamma}f_{\gamma_{\dot{k}}}(\gamma)\,d\gamma$, yielding

$$M_{\gamma_{\dot{k}}}(s) = \sum_{k=1}^{K}\frac{(-1)^{k-1}\binom{K}{k}}{1+s\bar{\gamma}_e[k(1-\rho^2)+\rho^2]/k}, \tag{47}$$

whose derivative is

$$\frac{\partial M_{\gamma_{\dot{k}}}(s)}{\partial s} = \sum_{k=1}^{K}(-1)^{k}\binom{K}{k}\times\left(\frac{k\bar{\gamma}_e[k(1-\rho^2)+\rho^2]}{\big[k+s\bar{\gamma}_e[k(1-\rho^2)+\rho^2]\big]^2}\right). \tag{48}$$

Theorem 2: The end-to-end ergodic capacity for the proposed scheme using AF relays over i.i.d. Rayleigh channels is provided in a closed form by (41).

Proof: Substituting (48) into (38) yields (41).

VII. NUMERICAL RESULTS

In this section, we first introduce the acquisition of CSI datasets for training deep recurrent networks, and clarify how to decide hyper-parameters to obtain high prediction accuracy. Monte-Carlo simulation is carried out to get numerical results w.r.t. outage probability and channel capacity, which are applied to corroborate the theoretical analysis and conduct performance comparison with the existing schemes. Moreover, the robustness, scalability, and complexity of the proposed scheme are evaluated.

A. CSI Datasets

A proper dataset is essential for training and testing a data-driven algorithm and plays a critical role in achieving high accuracy. We previously established a wireless test-bed [49] based on the open-source 4G implementation, i.e., OpenAirInterface, to acquire realistic channel data. Compared with synthetic data, it shows no evident difference for the task of channel prediction. For simplicity, the simulation results provided in this section are obtained from synthetic data generated in MATLAB® using its embedded wireless channel models. Following the channel assumption adopted by most of the previous works in this field, we apply single-antenna flat-fading i.i.d. channels. Each channel follows the Rayleigh distribution with an average power gain of 1, where its fading coefficient $h$ is a zero-mean circularly-symmetric complex Gaussian random variable with a variance of 1, i.e., $h\sim\mathcal{CN}(0,1)$. To emulate a fast fading environment, the maximal Doppler shift is set to $f_d = 100\,\mathrm{Hz}$, which corresponds to a moving speed of around 100 km/h at a carrier frequency of about 1 GHz. Continuous-time channel responses are sampled at a rate of $f_s = 1\,\mathrm{kHz}$, adhering to the assumption of flat fading, and therefore the sampling interval is $T_s = 1\,\mathrm{ms}$. Each channel generates a series of consecutive samples $\{h[t]\,|\,t = 1, 2, \ldots\}$. The lower part of Fig. 4b shows an example piece of such a channel.

Fig. 4. (a) Prediction accuracy (MSE) with different hyper-parameters in terms of the number of hidden neurons, for LSTM-1, LSTM-2, LSTM-3, LSTM-4, RNN-2, GRU-2, LSTM-2S, and LSTM-2L; (b) The upper: Comparison of correlation coefficient versus delay [ms] for outdated and predicted CSI (ORS and PRS at 50 Hz and 100 Hz), and the lower: Illustration of a time-varying channel ($|h|$ versus time [ms]) differentiating the training and predicting phase.

Fig. 5. (a) Comparison of outage probability versus SNR [dB] for ORS, OSTC, and PRS in a DF cooperative network with $K = 8$ relays and $f_d = 100\,\mathrm{Hz}$; (b) Comparison of outage probability versus SNR [dB] for ORS and PRS in an AF cooperative network with $K = 8$ relays and $f_d = 100\,\mathrm{Hz}$.

B. Training the Predictor

Hyper-parameters of a deep network, such as the number of layers or neurons, activation functions, training algorithms, and the length of training data, have a substantial impact on accuracy. It is worth clarifying how to tune a deep network on demand. A training process starts from an initial state where all weights and biases are randomly selected. Using the centralized relay selection as an example, the input of the predictor at the destination is a magnitude vector $\mathbf{a}_d[t]$, while the output is its $D$-step-ahead prediction $\check{\mathbf{a}}_d[t+D]$. To measure prediction accuracy, the mean squared error (MSE) is applied as the cost function, namely $\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\big\|\check{\mathbf{a}}_d[t+D] - \mathbf{a}_d[t+D]\big\|^2$, where $T$ is the total number of channel samples for evaluation and $\|\cdot\|$ denotes the Frobenius norm of a vector. Using batch training, a batch of samples is fed into the network per step. The output is compared with the desired values and the resultant error signals are propagated back through the network to update the weights by means of training algorithms such as the Adam optimizer [50] used in our simulation. After the training epochs are completed, the trained network is employed to predict CSI.

TABLE II
SIMULATION CONFIGURATION

Parameters            Values
Sampling rate         $f_s = 1000\,\mathrm{Hz}$
Max. Doppler shift    $f_d = 100\,\mathrm{Hz}$
Channel model         Rayleigh fading
Doppler spectrum      Jakes's model
Number of relays      $K = 8$
Dataset size
Deep learning         LSTM network ($L = 2$, $N_l = 25$)
Training algorithm    Adam optimizer [50]
Batch size            256
Tapped-delay line     $\tau = 4$
Cost function         MSE
Activation function   tanh
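The MSE cost function used for training can be written out in a few lines of Python. The sketch below is a plain-list illustration with hypothetical function names; it also shows how $D$-step-ahead predictions are aligned with their targets, which is an implementation detail assumed here rather than taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_norm(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def mse(predictions: Sequence[Vector], targets: Sequence[Vector]) -> float:
    """(1/T) * sum_t || prediction_t - target_t ||^2 over T aligned pairs."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("need equally many predictions and targets")
    total = sum(squared_norm(p, a) for p, a in zip(predictions, targets))
    return total / len(predictions)


def align(series: Sequence[Vector], predictions: Sequence[Vector],
          horizon: int) -> Tuple[List[Vector], List[Vector]]:
    """Pair predictions[t] (made at step t) with series[t + horizon]."""
    usable = len(series) - horizon
    return list(predictions[:usable]), list(series[horizon:])
```

Feeding the series itself as the prediction, i.e., `mse(*align(series, series, 1))`, gives the error of simply reusing the outdated sample, a natural baseline against which a trained predictor can be compared.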

Fig. 4a compares the prediction accuracy of the predictors with respect to different hyper-parameters. Without loss of generality, we select a cooperative network with $K = 8$ relays as the default scenario for simulation. The number of relays does not affect the superiority of the proposed scheme, which will be verified in the following part. The length of the tapped-delay line is selected according to the coherence time because CSI samples that are too old are uncorrelated and do not provide useful information. As an example, we select $\tau = 4$ for $f_d = 100\,\mathrm{Hz}$, which is verified as the optimal setting in simulation. The input vector defined in (12) is thus $\mathbf{d}^{(0)}_t = \big[a_{1,d}[t-4], a_{2,d}[t-4], \ldots, a_{8,d}[t]\big]^T$, which has a dimension of $K\times(\tau+1) = 40$. The one-step-ahead prediction $\check{\mathbf{a}}_d[t+1] = \big[\check{a}_{1,d}[t+1], \ldots, \check{a}_{8,d}[t+1]\big]^T$ is the output of the predictor.

Let us first look at the impact of the number of layers and the number of neurons. Starting from an LSTM network with a single hidden layer, denoted by LSTM-1 in the legend of the figure, its accuracy curve as a function of the number of hidden neurons has a 'U' shape. That is because the network suffers from under-fitting when the hidden layer has too few neurons, while over-fitting appears beyond the turning point of the curve. To make a fair comparison, the horizontal axis represents the total number of hidden neurons, which are evenly allocated to different layers. For instance, the point of '60' on the horizontal axis means a 2-hidden-layer network with 30 neurons at either layer (denoted by LSTM-2), a 3-hidden-layer network with 20 neurons per layer (denoted by LSTM-3), or a single layer with 60 hidden neurons. No matter how many neurons are used in its single hidden layer, LSTM-1 cannot reach the high accuracy achieved by LSTM-2 and LSTM-3, justifying the benefit of deep learning. But this does not mean that more layers are always better, as demonstrated by the worse result of LSTM-4, which has 4 hidden layers. Having found that two hidden layers are the best choice for LSTM, we further examine recurrent networks with 2 RNN or GRU hidden layers, indicated by RNN-2 and GRU-2, respectively. As we can see, GRU performs as well as LSTM, whereas RNN is weaker.

Furthermore, we check the impact of the length of training data. The aforementioned results are measured with the default length of channel samples. We first shorten it, as shown by LSTM-2S; the deep network then seems to be under-fitted, leading to an obvious loss. On the contrary, if the length is doubled, as shown by LSTM-2L, the performance remains good and is even better for part of the range of neuron numbers. As a result, we select a 2-hidden-layer LSTM network with 25 neurons at either layer and the default training length, upon which the numerical results in the following figures are derived.

Fig. 6. (a) Comparison of channel capacity for ORS, OSTC, and PRS in a cooperative network with $K = 8$ relays; (b) The impact of additive noise (the upper) and synchronization error (the lower) on the performance of outage probability.

C. Performance Comparison

Numerical results of outage probability and channel capacity for PRS, ORS, and OSTC in the presence of perfect, predicted, and outdated CSI are obtained from Monte-Carlo simulation. As usual, an EE target rate of R = 1 bps/Hz is applied for outage calculation. The total transmit power P is equally allocated between the two phases, where the source's power is $P_s = 0.5P$, resulting in an average SNR of $\bar{\gamma}_{s,k} = 0.5P/\sigma_n^2$, while $\bar{\gamma}_{k,d} = 0.5P/\sigma_n^2$ for the RD link. In these figures, the numerical results are indicated by markers, while the analytical results are plotted as curves. It can be seen that the markers fall onto their corresponding curves, corroborating our theoretical analyses in the previous sections of this article.

The performance of a cooperative network is directly affected by the quality of the applied CSI. Let's first take a glance at the superior quality of predicted CSI. The correlation coefficient of outdated CSI is calculated by (2), e.g., $\rho_o = J_0(200\pi\tau)$ for f_d = 100 Hz. As the delay τ increases, the similarity between the outdated and actual CSI falls off, as indicated by the ORS curve in Fig. 4b, until the two become totally uncorrelated at nearly τ = 4 ms. The predicted CSI has higher quality than the outdated CSI, e.g., at τ = 3 ms.
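To make the decay of the outdated-CSI correlation concrete, the sketch below evaluates $J_0(2\pi f_d \tau)$ for a few delays using only the Python standard library. It assumes that the function in (2) is the zeroth-order Bessel function of the first kind, as is usual for Jakes's model; `bessel_j0` and `outdated_csi_correlation` are hypothetical helper names, and the Bessel function is computed from its integral representation rather than from a library routine.

```python
import math


def bessel_j0(x: float, steps: int = 2000) -> float:
    """Zeroth-order Bessel function of the first kind.

    Uses J0(x) = (1/pi) * integral over [0, pi] of cos(x * sin(theta)),
    evaluated with the midpoint rule.
    """
    h = math.pi / steps
    total = sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(steps))
    return total * h / math.pi


def outdated_csi_correlation(doppler_hz: float, delay_s: float) -> float:
    """Correlation of outdated and actual CSI, assuming the J0 form of Jakes's model."""
    return bessel_j0(2.0 * math.pi * doppler_hz * delay_s)


for delay_ms in (1, 2, 3, 4):
    rho = outdated_csi_correlation(100.0, delay_ms * 1e-3)
    print(f"tau = {delay_ms} ms -> rho_o = {rho:.3f}")
```

Under this assumption the correlation passes through zero slightly before τ = 4 ms at f_d = 100 Hz, which matches the observation that the outdated CSI is nearly uncorrelated at that delay.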

At the point of τ = 4 ms, the quality of the predicted CSI suffers from a sudden drop, because the outdated CSI fed into the predictor is already uncorrelated and cannot provide any useful information about the actual CSI. This implies that the maximal prediction horizon is limited by the coherence time of a fading channel. In the other case considered, the predicted CSI also has remarkably better quality than the outdated CSI.

Next, we compare the outage performance of three DF relaying schemes in a cooperative network with K = 8 relays, as illustrated in Fig. 5a. Relay selection with perfect knowledge of CSI (i.e., ρ = 1) is used as the benchmark, which has a diversity order of K = 8 and decays at a rate of $1/\bar{\gamma}^{8}$, where $\bar{\gamma} = P/\sigma_n^2$ is the average EE SNR. With the delays considered, such as τ = 2 ms with $\rho_o = J_0(0.4\pi)$, the quality of outdated CSI drops, which substantially deteriorates the performance. The diversity order of ORS falls to 1, i.e., no diversity, and the curve decays slowly at a rate of $1/\bar{\gamma}$ in the high-SNR regime. OSTC can redeem some of the loss and achieve a diversity order of 2 by using a pair of relays, but its gap to the benchmark is still large. Making use of channel prediction, the quality of CSI can be improved. The proposed scheme achieves nearly optimal performance with a horizon of 2 ms (by setting D = 2 prediction steps), and remarkably outperforms OSTC at the other delay considered. Additionally, Fig. 5b compares the performance of ORS and PRS in a cooperative network with K = 8 AF relays (OSTC is only applicable to DF relaying due to its utilization of space-time coding). With the shortest horizon/delay, the proposed scheme attains the optimal performance, whereas ORS suffers a loss. As τ increases, PRS substantially outperforms ORS.
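The loss of diversity caused by selecting a relay from outdated CSI can be reproduced with a small Monte-Carlo experiment. The sketch below is a deliberately simplified model and not the simulation setup of this article: it assumes one Rayleigh-faded link per relay, jointly complex Gaussian outdated and actual gains with correlation ρ, and a two-phase transmission whose SNR threshold is $2^{2R} - 1$. The function name `outage_probability` and its parameters are hypothetical.

```python
import math
import random


def outage_probability(num_relays, snr_db, rho, rate=1.0, trials=20000, seed=1):
    """Estimate outage when the relay is chosen from outdated CSI.

    Simplified model (an assumption): a single Rayleigh-faded link per relay,
    selection on the outdated gain, outage decided by the actual gain.
    """
    rng = random.Random(seed)
    snr = 10.0 ** (snr_db / 10.0)
    threshold = 2.0 ** (2.0 * rate) - 1.0  # two-phase relaying
    scale = math.sqrt(0.5)                 # unit-power complex Gaussian
    innov = math.sqrt(1.0 - rho * rho)     # weight of the independent part
    outages = 0
    for _ in range(trials):
        best_old, best_now = -1.0, 0.0
        for _ in range(num_relays):
            old = complex(rng.gauss(0, scale), rng.gauss(0, scale))
            new = complex(rng.gauss(0, scale), rng.gauss(0, scale))
            now = rho * old + innov * new
            if abs(old) ** 2 > best_old:   # selection sees only old CSI
                best_old, best_now = abs(old) ** 2, abs(now) ** 2
        if snr * best_now < threshold:
            outages += 1
    return outages / trials


for rho in (1.0, 0.6, 0.0):
    print(rho, outage_probability(num_relays=8, snr_db=10.0, rho=rho))
```

With ρ = 0 the selected relay is effectively a random one, so the estimate approaches the single-link outage $1 - e^{-(2^{2R}-1)/\bar{\gamma}}$ regardless of the number of relays, whereas ρ = 1 recovers the full selection gain.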
The analytical results of PRS are given by (28) and (37), which are corroborated by the numerical results, and those of ORS and OSTC are from [25].

Channel capacities for different schemes are comparatively provided in Fig. 6a. Looking first at AF relaying, ORS suffers from a capacity loss if τ = 3 ms, but PRS can achieve a near-optimal capacity at the SNR of $\bar{\gamma}$ = 20 dB. In a DF cooperative network, the advantage of the proposed scheme is also obvious. For instance, among ORS, OSTC, and PRS, the capacity of PRS closely approaches the perfect one. As we can see, the numerical results tightly agree with the curves of analytical results, validating the correctness of (40) and (41).

D. Robustness

In addition to its performance, the robustness of the proposed scheme against additive noise, synchronization error, mobility, and different fading statistics is evaluated. Noise is unavoidable when acquiring CSI data, so it is necessary to clarify its impact on the performance. When the received SNR of the pilot signals used to estimate CSI is set to 30 dB, as shown in Fig. 6b, the performance loss is negligible. The impact gradually becomes visible when the SNR is decreased to 25 dB and 20 dB, but PRS still clearly outperforms ORS. Unlike ordinary data delivery, the acquisition of CSI data dedicated to the training of deep networks can use more transmission resources, e.g., higher transmit power for pilots. Hence, a high pilot SNR is practically expected and the proposed scheme can be regarded as robust enough in the face of noise.

Then, the effect of synchronization error between two communicating nodes is studied. If the maximal residual phase error is θ = 5°, the loss is not evident compared to the perfect CSI. With a growing value of θ, the performance deteriorates, but PRS with θ = 20° is still better than ORS. Unlike multi-relay transmission, PRS is a kind of single-relay transmission where MCFO and MTO need not be handled. Its synchronization is as simple as that of a point-to-point single-antenna communication link, which is a mature technique, and therefore keeping the residual error small is achievable in practice.

As a data-driven technique, a DL predictor treats a fading channel as a black box and only needs local channel measurements. The proposed scheme is therefore applicable to any kind of wireless channel statistics, isolated from radio propagation parameters such as fading distribution, the number of propagation paths, and the angle of arrival.
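Returning to the pilot-noise result above, one way to see why pilot SNRs of 30, 25, and 20 dB cost so little is to look at how strongly a noisy CSI estimate remains correlated with the true channel gain. The sketch below assumes a unit-power complex Gaussian gain h and an independent additive complex Gaussian estimation error n, which is a generic model and not necessarily the one used for Fig. 6b; both function names are hypothetical.

```python
import math
import random


def estimate_correlation(pilot_snr_db: float) -> float:
    """Closed-form correlation between h and h + n for unit-power h."""
    noise_var = 10.0 ** (-pilot_snr_db / 10.0)
    return 1.0 / math.sqrt(1.0 + noise_var)


def simulated_correlation(pilot_snr_db: float, samples: int = 50000, seed: int = 7) -> float:
    """Monte-Carlo check of the closed form with complex Gaussian h and n."""
    rng = random.Random(seed)
    noise_std = math.sqrt(0.5 * 10.0 ** (-pilot_snr_db / 10.0))
    sig_std = math.sqrt(0.5)
    cross = 0.0 + 0.0j
    p_h = 0.0
    p_est = 0.0
    for _ in range(samples):
        h = complex(rng.gauss(0, sig_std), rng.gauss(0, sig_std))
        n = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        est = h + n
        cross += h * est.conjugate()
        p_h += abs(h) ** 2
        p_est += abs(est) ** 2
    return abs(cross) / math.sqrt(p_h * p_est)


for snr_db in (30.0, 25.0, 20.0):
    print(snr_db, estimate_correlation(snr_db), simulated_correlation(snr_db))
```

Under these assumptions the correlation stays very close to one over the whole range, so the estimation noise degrades the CSI far less than a delay of a few milliseconds does at f_d = 100 Hz.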
It is interesting to examine the performance in Rician fading, where a dominant signal path exists between two communicating nodes in addition to the large number of reflected paths found in Rayleigh fading. Fig. 7a demonstrates the impact of mobility on outage probability over Rician fading channels. Normalized by carrier frequency, the moving speed of a node is measured by the Doppler shift, with 100 Hz, 50 Hz, and 25 Hz used in the figure. At high speed (f_d = 100 Hz), PRS shows much better performance than ORS. This superiority weakens as the moving speed decreases, until the ORS scheme achieves full diversity when the measured CSI is no longer outdated. This supports our argument that the proposed scheme retains full diversity in slow fading, the same as the ORS scheme, and substantially outperforms it in fast fading.

We also observe the impact of network scale on the performance of a DF cooperative network, as illustrated in Fig. 7b, in comparison with direct transmission (DT). If only one relay is available in the network (K = 1), selection is not needed. ORS and PRS achieve identical performance, which is inferior to that of DT because relaying in this case is inefficient (power and time resources must be shared between the source and the relay). However, DT has only one possible signal path, i.e., no diversity, and its performance curve drops at a rate of $1/\bar{\gamma}$. Even when only two relays are available (K = 2) and the one with better CSI is selected, the cooperative network can outperform DT thanks to diversity gain. As the network scale grows with more relays, the superiority of selection, especially PRS, becomes increasingly evident. That is because a diversity order of K is achievable and the performance curve drops at a rate of $1/\bar{\gamma}^{K}$.

E. Scalability

Due to the flexibility of a distributed system, the distributed PRS can scale up and down to support a dynamic network scale with different numbers of relays. If a new node wants to participate in a cooperative network, it is first admitted through some mechanism such as admission control and then synchronizes with the other nodes. As we can see in Algorithm 1, when RTS/CTS is broadcast, this node estimates and predicts its local CSI, starts a timer, and serves as the best relay if it has the largest CSI. This procedure is independent of and transparent to the other relays. Conversely, when some nodes leave, the remaining relays form a smaller cooperative network that carries out relay selection without any need to modify the algorithm. Simulation results reveal that the PRS scheme performs well with different numbers of relays, as shown in Fig. 7b.

In the centralized PRS, the relays have no predictor, and only the destination runs a global predictor. The destination knows the network topology and can manage the number of relays by employing admission control. Due to the flexibility of deep neural networks, only the dimensions of the input and output layers must be adjusted according to the number of relays. Keeping other hyper-parameters, such as two LSTM hidden layers with 25 neurons per layer, might cause a small deviation in prediction accuracy, but will not cause a breakdown of the cooperative network. Using freshly collected CSI data, an independent DL model is trained off-line to derive optimal hyper-parameters for the changed network, which can be applied to smoothly update the online DL predictor.

Fig. 7. (a) The impact of moving speed (measured by the Doppler shift: 100 Hz, 50 Hz, and 25 Hz) on the outage probability of ORS and PRS in a DF cooperative network with K = 8 relays over Rician fading channels; (b) Outage probability versus SNR in dB of a DF cooperative network with K = 1, 2, and 6 relays over Rayleigh fading channels, in comparison with direct transmission between the source and the destination. In the simulation, the source of DT transmits using power P and the full time duration T, in contrast to P/2 and T/2 used by the source in relaying.

F. Computational Complexity

Last but not least, the complexity of the predictor is quantified and compared with the capacity of COTS computing hardware. As recommended by Fig. 4a, the applied deep neural network has two LSTM hidden layers with 25 neurons each. In centralized PRS, the input vector $\mathbf{d}^{(0)}_t$ for the global predictor contains 40 entries due to K = 8 and τ = 4, while the output $\check{\mathbf{a}}_d[t+1]$ is an 8-dimensional vector, corresponding to $N_i = 40$ and $N_o = 8$. The resulting number of floating-point operations per prediction, $O_{lstm}$, follows from (15). For distributed selection, the local predictor at each relay is much simpler due to the reduced dimensions of input and output ($N_i = 5$ and $N_o = 1$), and therefore we skip it. Since the interval of a prediction step is 1 ms, the prediction frequency equals $f_p = 1000$ predictions per second, which determines the required computing throughput. Compared with off-the-shelf digital signal processors (DSPs), e.g., the TI C6678, which provides a computation capacity of up to 179 GFLOPS, the required computing resource occupies only a fraction of a single DSP chip. Taking into account backward compatibility with legacy hardware and applicability to low-cost IoT devices, we further check low-end DSPs, taking the TI C6748 as an example. In addition to DSPs, similar results can be expected on devices with a central processing unit (CPU) or graphics processing unit (GPU).

In contrast to the ORS system, the proposed scheme does not bring extra overhead on signal transmission. The increase in computational complexity arises only from channel prediction. When the load on a running DSP increases, the increase in energy consumption is marginal. In a nutshell, the complexity of the DL-based channel predictor applied for PRS, as well as its associated energy consumption, is quite affordable, if not negligible.

VIII. CONCLUSIONS

In this article, we proposed and analyzed a deep-learning-aided cooperative diversity method for mobile terminals without an antenna array to reap the benefit of spatial diversity. A deep recurrent neural network was deliberately built to improve the timeliness of channel state information. The predictor is applicable to any kind of wireless fading statistics, while specific examples for Rayleigh and Rician fading were given. The method achieves the optimal performance, with full diversity on the order of the number of cooperating relays, in slow fading wireless environments, and it substantially outperforms the existing schemes in fast time-varying channels. It supports both amplify-and-forward and decode-and-forward relaying strategies, and adapts to both distributed and centralized relay selection. By simply inserting a predictor between the channel estimator and the relay selector, an ORS system can be transparently upgraded to a PRS system without any other modifications, making it compatible with existing systems and standards. By selecting a single opportunistic relay, it inherits the simplicity of ORS and avoids multi-relay coordination and synchronization. The computational complexity and energy consumption arising from fading channel prediction are negligible. Moreover, it is robust against additive noise, synchronization error, mobility, and different network scales. From the perspective of performance, compatibility, complexity, robustness, and scalability, it is viewed as an excellent candidate for immediate implementation in next-generation cooperative networks.

APPENDIX A
DERIVATION OF LEMMA

$$\ln(1+\gamma) = -\int_0^\infty \left[\frac{\partial}{\partial s} e^{-s\gamma}\right] H^{\cdot,\cdot}_{\cdot,\cdot}\left[\, s \,\middle|\, \cdots \right] ds, \tag{49}$$

where $H^{\cdot,\cdot}_{\cdot,\cdot}[\cdot]$ is a special mathematical function named Fox's H. It is too complex to solve even with the aid of mathematical software. Another special function called Meijer's G is therefore utilized to replace Fox's H, since many software tools such as MATHEMATICA® and MATLAB® have already implemented it. It can be directly invoked to return a numerical value, facilitating the derivation of a closed-form expression. Defining

$$\Phi(s) = -H^{\cdot,\cdot}_{\cdot,\cdot}\left[\, s \,\middle|\, \cdots \right] = -G^{\cdot,\cdot}_{\cdot,\cdot}\left[\, s \,\middle|\, \cdots \right]$$

and substituting (49) into $C = \int_0^\infty \log_2(1+\gamma) f(\gamma)\, d\gamma$, with $\log_2(x) = \ln(x)/\ln(2)$, yields

$$C = \frac{1}{\ln(2)} \int_0^\infty \int_0^\infty \left[\frac{\partial}{\partial s} e^{-s\gamma}\right] \Phi(s) f(\gamma)\, d\gamma\, ds. \tag{50}$$

Since $M_\gamma(s) = \int_0^\infty e^{-s\gamma} f(\gamma)\, d\gamma$, (50) can be transformed into

$$C = \frac{1}{\ln(2)} \int_0^\infty \left[\frac{\partial M_\gamma(s)}{\partial s}\right] \Phi(s)\, ds. \tag{51}$$

To avoid the intractability of evaluating the integral, the Gauss-Chebyshev quadrature shown in [48] is utilized, and (51) is transformed to (38).

REFERENCES

[1] D. Tse and P. Viswanath,

Fundamentals of Wireless Communication. Cambridge, UK: Cambridge Univ. Press, 2005.
[2] A. Sendonaris et al., “User cooperation diversity-Part I and II,” IEEE Trans. Commun., vol. 51, no. 11, pp. 1927–1948, Nov. 2003.
[3] T. M. Cover and A. A. E. Gamal, “Capacity theorems for the relay channel,” IEEE Trans. Inf. Theory, vol. 25, no. 5, pp. 572–584, Sep. 1979.
[4] J. Hoydis et al., “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160–171, Feb. 2013.
[5] J. N. Laneman et al., “Cooperative diversity in wireless networks: efficient protocols and outage behaviour,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3062–3080, Dec. 2004.
[6] Y. Jing and H. Jafarkhani, “Network beamforming using relays with perfect channel information,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2499–2517, Jun. 2009.
[7] J. N. Laneman and G. W. Wornell, “Distributed space-time-coded protocols for exploiting cooperative diversity in wireless networks,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2415–2425, Oct. 2003.
[8] A. A. Nasir et al., “Timing and carrier synchronization with channel estimation in multi-relay cooperative networks,” IEEE Trans. Signal Process., vol. 60, no. 2, pp. 793–811, Feb. 2012.
[9] H. Mehrpouyan et al., “Bounds and algorithms for multiple frequency offset estimation in cooperative networks,” IEEE Trans. Wireless Commun., vol. 10, no. 4, pp. 1300–1311, Apr. 2011.
[10] L. Yang and M. Alouini, “Performance analysis of multiuser selection diversity,” IEEE Trans. Veh. Technol., vol. 55, no. 6, pp. 1848–1861, 2006.
[11] T. R. Ramya and S. Bhashyam, “Using delayed feedback for antenna selection in MIMO systems,” IEEE Trans. Wireless Commun., vol. 8, no. 12, pp. 6059–6067, Dec. 2009.
[12] B. Zhao and M. C. Valenti, “Practical relay networks: A generalization of hybrid-ARQ,” IEEE J. Sel. Areas Commun., vol. 23, no. 1, pp. 7–18, Jan. 2005.
[13] M. Zorzi and R. Rao, “Geographic random forwarding (GeRaF) for ad hoc and sensor networks: Multihop performance,” IEEE Trans. Mobile Comput., vol. 2, no. 4, pp. 337–348, Oct.-Dec. 2003.
[14] A. Bletsas et al., “A simple cooperative diversity method based on network path selection,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 659–672, Mar. 2006.
[15] A. Bletsas et al., “Cooperative communications with outage-optimal opportunistic relaying,” IEEE Trans. Wireless Commun., vol. 6, no. 9, pp. 3450–3460, Sep. 2007.
[16] J. L. Vicario et al., “Opportunistic relay selection with outdated CSI: Outage probability and diversity analysis,” IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 2872–2876, Jun. 2009.
[17] M. Seyfi et al., “Effect of feedback delay on the performance of cooperative networks with relay selection,” IEEE Trans. Wireless Commun., vol. 10, no. 12, pp. 4161–4171, Dec. 2011.
[18] M. Torabi et al., “Impact of outdated relay selection on the capacity of AF opportunistic relaying systems with adaptive transmission over non-identically distributed links,” IEEE Trans. Wireless Commun., vol. 10, no. 11, pp. 3626–3631, Nov. 2011.
[19] M. Torabi and D. Haccoun, “Capacity analysis of opportunistic relaying in cooperative systems with outdated channel information,” IEEE Commun. Lett., vol. 14, no. 12, pp. 1137–1139, Dec. 2010.
[20] M. Soysa et al., “Partial and opportunistic relay selection with outdated channel estimates,” IEEE Trans. Commun., vol. 60, no. 3, pp. 840–850, Mar. 2012.
[21] L. Xiao and X. Dong, “Unified analysis of generalized selection combining with normalized threshold test per branch,” IEEE Trans. Wireless Commun., vol. 5, no. 8, pp. 2153–2163, Aug. 2006.
[22] M. Chen et al., “Opportunistic multiple relay selection with outdated channel state information,” IEEE Trans. Veh. Technol., vol. 61, no. 3, pp. 1333–1345, Mar. 2012.
[23] W. Jiang et al., “Analysis of generalized selection combining in cooperative networks with outdated CSI,” in Proc. IEEE WCNC’14, Istanbul, Turkey, Apr. 2014, pp. 612–617.
[24] Y. Li et al., “On the design of relay selection strategies in regenerative cooperative networks with outdated CSI,” IEEE Trans. Wireless Commun., vol. 10, no. 9, pp. 3086–3097, Sep. 2011.
[25] W. Jiang et al., “A robust opportunistic relaying strategy for co-operative wireless communications,” IEEE Trans. Wireless Commun., vol. 15, no. 4, pp. 2642–2655, Apr. 2016.
[26] W. Jiang et al., “Opportunistic space-time coding to exploit cooperative diversity in fast-fading channels,” in Proc. IEEE ICC’2014, Sydney, Australia, Jun. 2014, pp. 4814–4819.
[27] A. Duel-Hallen, “Fading channel prediction for mobile radio adaptive transmission systems,” Proc. IEEE, vol. 95, no. 12, pp. 2299–2313, Dec. 2007.
[28] W. Jiang and H. D. Schotten, “Neural network-based fading channel prediction: A comprehensive overview,” IEEE Access, vol. 7, pp. 118112–118124, Aug. 2019.
[29] A. Duel-Hallen et al., “Long-range prediction of fading signals,” IEEE Signal Process. Mag., vol. 17, no. 3, pp. 62–75, May 2000.
[30] J.-Y. Wu and W.-M. Lee, “Optimal linear channel prediction for LTE-A uplink under channel estimation errors,” IEEE Trans. Veh. Technol., vol. 62, no. 8, pp. 4135–4142, Oct. 2013.
[31] R. O. Adeogun et al., “Extrapolation of MIMO mobile-to-mobile wireless channels using parametric-model-based prediction,” IEEE Trans. Veh. Technol., vol. 64, no. 10, pp. 4487–4498, 2014.
[32] W. Gardner, “Simplification of MUSIC and ESPRIT by exploitation of cyclostationarity,” Proc. IEEE, vol. 76, no. 7, pp. 845–847, Jul. 1988.
[33] W. Jiang et al., “A comparison of wireless channel predictors: Artificial Intelligence versus Kalman filter,” in Proc. of IEEE ICC’19, Shanghai, China, May 2019.
[34] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, Jan. 2016.
[35] W. Jiang et al., “Experimental results for artificial intelligence-based self-organized 5G networks,” in Proc. IEEE PIMRC’17, Montreal, QC, Canada, Oct. 2017.
[36] J. Connor et al., “Recurrent neural networks and robust time series prediction,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 240–254, Mar. 1994.
[37] W. Jiang and H. D. Schotten, “Deep learning for fading channel prediction,” IEEE Open J. Commun. Soc., vol. 1, pp. 320–332, Mar. 2020.
[38] W. Jiang et al., “Neural network based wireless channel prediction,” in Machine Learning for Future Wireless Communications, F. L. Luo, Ed. United Kingdom: John Wiley & Sons and IEEE Press, Dec. 2019, ch. 16.
[39] W. Jiang and H. D. Schotten, “Recurrent neural network-based frequency-domain channel prediction for wideband communications,” in Proc. IEEE VTC’19-Spring, Kuala Lumpur, Malaysia, Apr. 2019.
[40] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Dec. 1997.
[41] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” preprint arXiv:1406.1078, Jun. 2014.
[42] W. Jiang and H. Schotten, “Recurrent neural networks with long short-term memory for fading channel prediction,” in Proc. IEEE VTC’20-Spring, Antwerp, Belgium, May 2020.
[43] F. A. Gers et al., “Learning to forget: Continual prediction with LSTM,” in Proc. 9th Intl. Conf. on Artificial Neur. Netw. (ICANN), Edinburgh, UK, Sep. 1999, pp. 850–855.
[44] W. Jiang and H. Schotten, “A deep learning method to predict fading channel in multi-antenna systems,” in Proc. IEEE VTC’20-Spring, Antwerp, Belgium, May 2020.
[45] Q. T. Zhang and H. G. Lu, “A general analytical approach to multi-branch selection combining over various spatially correlated fading channels,” IEEE Trans. Commun., vol. 50, no. 7, pp. 1066–1073, Jul. 2002.
[46] I. Gradshteyn and I. Ryzhik, Table of Integrals, Series, and Products, 7th ed. Academic Press, 2007, p. 697.
[47] W. Jiang et al., “An MGF-based performance analysis of opportunistic relay selection with outdated CSI,” in Proc. IEEE VTC’14-Spring, Seoul, South Korea, May 2014.
[48] F. Yilmaz and M.-S. Alouini, “A unified MGF-based capacity analysis of diversity combiners over generalized fading channels,” IEEE Trans. Commun., vol. 60, no. 3, pp. 862–875, Mar. 2012.
[49] W. Jiang et al., “An SDN/NFV proof-of-concept test-bed for machine learning-based network management,” in Proc. IEEE ICCC’2018, Chengdu, China, Dec. 2018, pp. 1966–1971.
[50] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980v9, Jan. 2017.

WEI JIANG (M’09–SM’19) received the Ph.D. degree in Computer Science from Beijing University of Posts and Telecommunications in 2008. From 2008 to 2012, he was with the 2012 Laboratory, HUAWEI Technologies. From 2012 to 2015, he was with the Institute of Digital Signal Processing, University of Duisburg-Essen, Germany. Since 2015, he has been a Senior Researcher with the German Research Center for Artificial Intelligence (DFKI), which is the biggest European AI research institution and the birthplace of the “Industry 4.0” strategy. Meanwhile, he is a Senior Lecturer with the University of Kaiserslautern, Germany. He is the author of three book chapters and over 60 conference and journal papers, holds around 30 granted patents, and has participated in a number of EU and German research projects. He is an Associate Editor for IEEE Access and a Moderator for IEEE TechRxiv.

Hans D. Schotten (S’93–M’97) received the Ph.D. degree from the RWTH Aachen University of Technology, Germany, in 1997. From 1999 to 2003, he worked for Ericsson. From 2003 to 2007, he worked for Qualcomm, where he became manager of an R&D group, Research Coordinator for Qualcomm Europe, and Director for Technical Standards. In 2007, he accepted the offer to become a full professor at the University of Kaiserslautern. In 2012, he additionally became scientific director of the German Research Center for Artificial Intelligence (DFKI) and head of the department for Intelligent Networks. Professor Schotten served as dean of the department of Electrical Engineering of the University of Kaiserslautern from 2013 until 2017. Since 2018, he has been chairman of the German Society for Information Technology and a member of the Supervisory Board of the VDE. He is the author of more than 200 papers and has participated in 30+ European and national collaborative research projects.