Data-Rate Driven Transmission Strategy for Deep Learning Based Communication Systems

Abstract

The deep learning (DL) based autoencoder is a promising architecture for implementing end-to-end communication systems. One fundamental problem of such systems is how to increase the transmission rate. Two new schemes are proposed to address the limited data rate issue: an adaptive transmission scheme and a generalized data representation (GDR) scheme. In the first scheme, an adaptive transmission is designed to select the transmission vectors that maximize the data rate under different channel conditions. The block error rate (BLER) of the first scheme is 80% lower than that of the conventional one-hot vector scheme. This implies that a higher data rate can be achieved by the adaptive transmission scheme. In the second scheme, the GDR replaces the conventional one-hot representation. The GDR scheme can achieve a higher data rate than the conventional one-hot vector scheme with comparable BLER performance. For example, when the vector size is eight, the proposed GDR scheme can double the data rate of the one-hot vector scheme. Moreover, combining the two proposed schemes can create further benefits. The effect of signal-to-noise ratio (SNR) is analyzed for these DL-based communication systems. Numerical results show that training the autoencoder using a data set with various SNR values can attain robust BLER performance under different channel conditions.


Data-Rate Driven Transmission Strategies for Deep Learning Based Communication Systems

Xiao Chen, Julian Cheng, Senior Member, IEEE, Zaichen Zhang, Senior Member, IEEE, Liang Wu, Member, IEEE, Jian Dang, Member, IEEE, Jiangzhou Wang, Fellow, IEEE


Index Terms: Autoencoder, communication systems, data rate, deep learning, transmission strategy.

I. INTRODUCTION

To satisfy the growing demand for various communication applications and services, the next-generation network must deliver enhanced mobile broadband, ultra-reliable and low-latency communications, and massive Internet of Things (IoT) ecosystems [1]–[4]. One primary concern is to accommodate the exponential rise in the number of user equipments and the traffic capacity in future communication systems. Hence, several promising technologies have been proposed, including massive multi-input multi-output (MIMO) transmissions, millimeter wave communications, ultra-dense networks, and non-orthogonal multiple access [5]–[10]. These conventional communication systems have a number of limitations, such as unavailable channel state information in complex transmission scenarios, high complexity in processing big data, and sub-optimal performance caused by the conventional block structure. For these reasons, with the significant development of deep learning (DL) [11]–[13], researchers have applied machine learning (ML), especially DL technologies, to design communication systems for benefits that cannot be obtained using the conventional approaches [14]–[18].

Manuscript received March 30, 2019; revised August 01, November 30, 2019 and January 10, 2020; accepted January 13, 2020. This work was supported by NSFC projects (61501109, 61571105, and 61601119), the national key research and development plan (2016YFB0502202), the Scientific Research Foundation of Graduate School of Southeast University (YBJJ1816), the Scholarship from China Scholarship Council (201806090072), and the Zhishan Youth Scholar Program of SEU. The editor coordinating the review of this paper and approving it for publication was V. Aggarwal. (Corresponding authors: Zaichen Zhang; Liang Wu.)

X. Chen, Z. Zhang, L. Wu and J. Dang are with National Mobile Communications Research Laboratory, Southeast University, Nanjing, 210096, China (email: {chen xiao, zczhang, wuliang, dangjian}@seu.edu.cn). J. Cheng is with School of Engineering, The University of British Columbia, Kelowna, V1V 1V7, BC, Canada. J. Wang is with School of Engineering and Digital Arts, University of Kent, Canterbury, CT2 7NT, United Kingdom.

As a promising technique, deep learning implements communication systems using deep neural networks (NNs). Unlike the conventional communication system, which consists of multiple independent blocks (e.g., source/channel coding, modulation, channel estimation, equalization), the DL-based communication system can jointly optimize the transmitter and receiver for end-to-end performance without a block structure [19], [20]. DL-based system design is promising for the following reasons: (i) A DL-based communication system can be optimized for end-to-end performance by using deep NNs, which is fundamentally different from the block structure in conventional communication systems; (ii) A DL-based communication system can be optimized for a practical system over any type of channel without requiring a tractable mathematical model, and this includes channel models that take into account different transmission scenarios and non-linearities; (iii) DL algorithms can provide faster processing than conventional communication algorithms, since the execution of NNs can be highly parallelized on concurrent architectures and can be implemented using low-precision data types [21].

Attracted by these advantages, a number of studies have addressed DL-based communications and signal processing using state-of-the-art tools and hardware [19], [20], [22]–[37]. The DL method is used to deal with certain challenges in existing communication systems. For example, the DL-based belief propagation algorithm was originally used to improve the performance of channel decoding, where low-complexity and near-optimal decoder performance were obtained [22]–[24]. Around the same time, the autoencoder was developed to address the problem of learning an efficient physical layer [25]. In DL theory, an autoencoder describes a deep NN that finds a low-dimensional representation of its input at a certain intermediate layer, which allows reconstruction at the output with minimal error [38, Ch. 14]. The DL-based communication system can be represented and implemented by an autoencoder that is trained offline using the dataset. Then, the trained autoencoder can be directly applied to practical systems online. A DL-based communication system, interpreted as an autoencoder, performs an end-to-end reconstruction task that jointly optimizes the transmitter and receiver and learns the signal encoding [19], [25], [26], [34]. To address the challenges of frame synchronization, an autoencoder was proposed to represent a complete communication system [20], [28], and comparable performance can be achieved even without extensive hyperparameter tuning. More recently, a DL-based algorithm has been used to solve the channel state information feedback and channel estimation problems in massive MIMO systems, and it outperforms the state-of-the-art compressive sensing based algorithms [29]–[31].

For future communication systems, there is a huge demand for data rate due to an increasing number of communication devices and equipment types, and improved quality of services (QoS). Consequently, high data-rate schemes should be developed in DL-based communication systems for future wireless networks. However, the one-hot vector [39], being the most commonly used data representation in existing studies [16], [19], [20], [24], [26]–[28], [34], has a low data rate in DL-based communication systems. The reason is that an $M \times 1$ one-hot vector consists of 0s in all entries with the exception of a single 1, e.g., $[0, \ldots, 0, 1, 0, \ldots, 0]^T$, and there are only $M$ possible transmitted messages. Here, the value of $M$ cannot be large, since an oversized $M$ leads to prohibitive training complexity and time-consuming training for the autoencoder. Hence, a small value of $M$ leads to a limited data rate. This becomes a barrier for developing future DL-based communication systems. Besides, the autoencoder with one-hot vectors is typically trained using a fixed vector size $M$, which becomes a constraint when designing communication systems having different data rate requirements. Also, the conventional autoencoder is trained under a fixed signal-to-noise ratio (SNR) value with an unrealistic expectation to operate well for a wide range of SNR values in practical transmission scenarios [2]. It was reported that training the autoencoder at different SNR values affects autoencoder performance [19], but there is no detailed study of this effect. Therefore, our objective is to design a new transmission scheme that replaces the conventional one-hot vector scheme to achieve a higher data rate. We also investigate the effect of training SNR on the performance of DL-based communication systems. Here, training SNR denotes the fixed SNR used for training the autoencoder offline, and it can be different from the practical SNR of a communication system when it is operating online.

In this paper, an adaptive transmission scheme is first designed for different communication scenarios to maximize the data rate in DL-based communication systems having a QoS constraint. Then, we propose a generalized data representation (GDR) scheme to improve the data rate of DL-based communication systems. Finally, we analyze the effect of SNR and the mean squared error (MSE) performance in DL-based communication systems. Comparable block error rate (BLER) performance can be achieved by the proposed transmission schemes, which have lower complexity and a higher data rate than the conventional DL-based communication system.

Footnote: Notably, throughout this paper, the conventional DL-based communication system refers to an autoencoder based communication system that adopts the one-hot vector data representation.

The major contributions of this paper are summarized as follows:

1) In DL-based communication systems, we point out the limited data rate problem of the conventional one-hot vector scheme. To address this issue, we design an adaptive transmission scheme having a QoS constraint for different channel conditions. In the proposed scheme, the optimal transmission vectors are adaptively selected for different SNR values, where the goal is to maximize the data rate with a constraint on MSE performance. It is shown that, when the two schemes have the same data rate, the proposed adaptive transmission scheme can reduce the BLER of the conventional one-hot vector scheme by 80%.

2) Furthermore, we propose a generalized data representation scheme to improve the data rate in DL-based communication systems. The proposed scheme represents the message by using a probability vector having multiple non-zero elements, instead of the conventional one-hot vector having only one non-zero element. As expected, a higher data rate is obtained by the proposed GDR scheme with comparable BLER performance and low complexity. When the vector size is eight, as an example, the proposed GDR scheme can double the data rate of the conventional one-hot vector scheme. To the best of the authors' knowledge, this is the first time that the GDR scheme is proposed and its effectiveness is verified.

3) We investigate the effect of SNR on the system performance in DL-based communication systems. Simulation results show that a high training SNR can improve the convergence performance in training, but it can also degrade the BLER performance in practical transmission. As a compromise, we introduce a training SNR set strategy, which trades off convergence against BLER performance. Furthermore, it is shown that training the autoencoder at low SNR can achieve BLER and MSE performance gains when the trained autoencoder is applied to a high SNR scenario. These results provide reliable design guidance for selecting a suitable training SNR and achieving optimal system performance.

For potential applications, the DL-based autoencoder-represented communication system can be applied to complex channel conditions without a mathematically tractable model in, for example, massive IoT ecosystems and high-speed Internet of Vehicles systems.

The remainder of this paper is organized as follows. In Section II, we describe the system model of a DL-based communication system. Section III presents an adaptive transmission scheme. Section IV proposes the generalized data representation scheme for DL-based communication systems. Section V investigates the effect of SNR and analyzes the MSE performance of the autoencoder. In Section VI, we present the numerical results of the proposed schemes and system performance. Section VII concludes this paper.

Fig. 1. A DL-based communication system represented as an autoencoder with its NN structure [19]. [Block diagram: transmitter (vector expression, multiple dense layers, normalization layer), channel (noise layer), receiver (ReLU layer, softmax activation layer, probability vector).]

II. DEEP LEARNING BASED COMMUNICATION SYSTEMS

In this section, we describe the DL-based autoencoder for an end-to-end communication system, and then provide the research motivations of this paper.

A. Autoencoder for End-to-End Communication Systems

TABLE I
ACTIVATION FUNCTIONS AND LOSS FUNCTIONS

Activation functions
    Linear                      $s_i$
    ReLU                        $\max\{s_i, 0\}$
    Softmax                     $e^{u_i} / \sum_{j=1}^{M} e^{u_j}$
    Sigmoid                     $1 / (1 + e^{-u_i})$
    tanh                        $\tanh(u_i)$
Loss functions
    MSE                         $\|\mathbf{s} - \mathbf{p}\|^2$
    Categorical cross-entropy   $-\sum_{i=1}^{M} s_i \log(p_i)$

We consider a DL-based communication system represented as an autoencoder consisting of a transmitter, channel, and receiver, as shown in Fig. 1, where the corresponding NN structure is shown below the block diagram. The autoencoder describes a deep NN that applies unsupervised learning in order to reconstruct the input at the output [38, Ch. 14]. At the transmitter, a message $s \in \{1, 2, \ldots, M\}$ is first transformed to a vector $\mathbf{s} \in \mathbb{R}^M$ after the vector expression processing, where, say, $M$ takes one of five preset values. For example, if the message $s = 2$ is transmitted, the corresponding vector expression is the one-hot vector $\mathbf{s} = [0, 1, 0, \ldots, 0]^T$ in a conventional DL-based communication system. Then, the multiple dense layers, including a rectified linear unit (ReLU) layer and a linear layer, apply the transformation $f_t : \mathbb{R}^M \to \mathbb{R}^n$ to produce the transmitted signal for $n$ discrete channel uses [20]. The commonly used activation functions are shown in TABLE I. Finally, the normalization layer enforces the power constraint of the transmitted signal $\mathbf{x} = [x_1, \ldots, x_n]^T$ as $\mathbb{E}\{x_j^2\} \le 1$ ($j = 1, \ldots, n$), where $\mathbb{E}\{\cdot\}$ denotes expectation.

The transmit channel is implemented by a noise layer, whose output is the received signal $\mathbf{y}$ given by

$$\mathbf{y} = \mathbf{x} + \mathbf{n} \tag{1}$$

where $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n)$ denotes the zero-mean additive white Gaussian noise (AWGN) vector, each element of which has variance $\sigma^2 = (2 R E_b / N_0)^{-1}$; here $R$ is the data rate, $E_b$ is the energy per bit, and $N_0$ denotes the noise power spectral density. Notably, there is no complex operation in the existing NN architectures, and a complex number is represented by two real numbers [19]. Consequently, we assume that all the channel coefficients have real values.
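As a rough illustration of the transmitter-side processing and the channel in (1), the following NumPy sketch builds a one-hot vector, normalizes the transmit power, and adds AWGN with variance $\sigma^2 = (2 R E_b / N_0)^{-1}$. The function names, the random matrix standing in for the trained dense layers, the specific normalization (scaling each block so that its average per-symbol power is one), and the choice $R = \log_2(M)/n$ are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def one_hot(message, M):
    """Map a message in {1, ..., M} to an M-dimensional one-hot vector."""
    s = np.zeros(M)
    s[message - 1] = 1.0
    return s

def normalize_power(x):
    """Scale x so that its average per-symbol power is one (assumed normalization)."""
    n = x.shape[-1]
    return np.sqrt(n) * x / np.linalg.norm(x)

def noise_variance(rate, ebn0_db):
    """sigma^2 = (2 * R * Eb/N0)^(-1), with Eb/N0 given in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 1.0 / (2.0 * rate * ebn0)

def awgn_channel(x, sigma2, rng):
    """y = x + n with n ~ N(0, sigma^2 I_n), as in (1)."""
    return x + rng.normal(0.0, np.sqrt(sigma2), size=x.shape)

rng = np.random.default_rng(0)
M, n = 16, 7
rate = np.log2(M) / n            # assumed: bits per channel use for M one-hot messages
W_t = rng.normal(size=(n, M))    # hypothetical stand-in for the trained dense layers
s = one_hot(2, M)                # message s = 2
x = normalize_power(W_t @ s)     # transmitted signal over n channel uses
y = awgn_channel(x, noise_variance(rate, ebn0_db=7.0), rng)
```

In a trained system, `W_t` would be replaced by the learned transmitter layers; only the normalization and noise steps carry over unchanged.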
Furthermore, the autoencoder-represented communication system is suitable for any type of channel without a tractable mathematical model. That is to say, the autoencoder can be applied to any type of channel model as long as real datasets are available for training and learning.

Footnote: We note that a real-world communication channel often does not have a tractable mathematical model.

At the receiver, the received signal $\mathbf{y}$ is passed through the ReLU layer to realize the transformation $f_r : \mathbb{R}^n \to \mathbb{R}^M$. The last layer of the receiver has a softmax activation as shown in TABLE I, which is a generalization of the logistic function that compresses an $M$-dimensional vector of arbitrary real values to an $M$-dimensional probability vector $\mathbf{p} = [p_1, \ldots, p_M]^T$, where each element $p_i$ ($i = 1, 2, \ldots, M$) lies in the range (0, 1], and all the elements add up to one [38]. For the conventional autoencoder scheme, the estimated message $\hat{s}$ is obtained from the index of the element having the highest probability in $\mathbf{p}$. Here, the BLER of DL-based communication systems is defined as

$$\mathrm{BLER} = \frac{1}{M} \sum_{s} \Pr(\hat{s} \neq s). \tag{2}$$

Notably, the BLER equals the symbol error rate (SER) of the DL-based communication system.

Footnote: Simulations show that multiple ReLU layers do not improve the BLER performance for our problem.

The autoencoder based communication system can be trained offline using a large training dataset, and the iterative training process depends on the value of the loss function in each iteration. The most common loss functions are MSE and categorical cross-entropy, as shown in TABLE I, and these loss functions are determined by the vector expression $\mathbf{s}$ and the probability vector $\mathbf{p}$. The training parameters of the autoencoder are chosen to minimize the loss function. The trained autoencoder with fixed NN parameters is then applied to practical communication scenarios online.

B. Motivations

The one-hot vector is the conventional data representation having only one non-zero element. Thus, the data rate of the conventional DL-based communication system with one-hot vectors is limited to

$$R_C = \frac{\log_2 M}{n} \ \text{bits/channel use}. \tag{3}$$

Over the last few years, the demand for high data rates has experienced unprecedented growth in communication systems [1], [2]. Therefore, providing a high data rate is essential for DL-based communication systems in future communications. To improve the data rate, we propose two new autoencoder schemes:

1) Adaptive transmission scheme. For the conventional one-hot vector scheme, the DL-based autoencoder is trained over a fixed-size transmission vector with dimension $M$ at a fixed SNR value, which introduces two limitations. On one hand, an autoencoder trained for a certain value of $M$ cannot work in scenarios with different values of $M$. On the other hand, the performance of DL-based communication systems is suboptimal when the trained autoencoder is applied to different SNR values. For these reasons, there is a need for a new transmission scheme for the autoencoder that improves the system performance, such as maximizing the data rate while satisfying the QoS constraint [40], [41]. Therefore, we propose an adaptive transmission scheme that adaptively selects the optimal transmission vectors for different SNR values, where the optimization objective is to maximize the data rate under a certain MSE constraint.

2) Generalized data representation scheme. From the definition of the data rate $R \triangleq \frac{\text{Number of bits}}{\text{Channel uses}}$, it is obvious that, for the same channel environment, the data rate is proportional to the number of bits being conveyed. However, the size of the transmission vector $M$ cannot be infinite due to the high complexity associated with deep NNs. Therefore, a new data representation scheme is required to meet the high data rate requirements in future communication systems. To address this issue, we design a generalized data representation scheme that employs a new vector structure instead of the one-hot vector. The new vector structure can be generalized and used for communication scenarios having different data rate requirements.

Footnote: We comment that the bit-to-bit vector representation does not work in the autoencoder-represented communication system. The reason is that the bit-to-bit vector representation includes an all-zero vector, which cannot be compressed into a probability vector.

Based on the above discussions, we are motivated to develop data-rate driven transmission strategies for DL-based communication systems.

As for the system performance, the autoencoder that is trained offline using a fixed SNR value is expected to have robust performance over a wide SNR region online. In [19], it was found that an unsuitable training SNR results in performance degradation of DL-based communication systems, but there is little theoretical analysis. Consequently, the effect of the training SNR needs to be investigated and a reliable criterion needs to be developed for selecting training SNR values. Furthermore, the current literature on DL-based autoencoders does not analyze their performance. Therefore, we are motivated to develop an analytical framework to gain insights into the performance of DL-based communication systems.

III. ADAPTIVE TRANSMISSION SCHEME

In this section, an adaptive transmission scheme is employed in the DL-based communication system to maximize the data rate under an MSE constraint for different channel conditions.

Figure 2 shows the adaptive transmission scheme for the DL-based communication system, which consists of three parts.

The first part is offline training. The autoencoder, including the transmitter and receiver, is trained offline using one-hot vectors $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_M$ over a fixed training SNR ($\mathrm{SNR}_T$), where $M$ should be suitably large, for example $M = 64$. After training, the trained transmitter and receiver, which are used in the second and third parts, have fixed parameters.

Footnote: If $M$ is too large, the training complexity is prohibitive since the autoencoder must see every message at least once [19].

The second part is online transmission and selection of optimal vectors. It includes three steps. First, each one-hot vector $\mathbf{s}_i$ in the set $\mathcal{M} = \{\mathbf{s}_1, \ldots, \mathbf{s}_M\}$ is transmitted through the trained transmitter/receiver once over the practical channel using an operating SNR value ($\mathrm{SNR}_P$). Here, the receiver obtains the probability vector $\mathbf{p}_i$ corresponding to the transmitted vector $\mathbf{s}_i$. Second, the receiver calculates the MSE of each one-hot vector. If the MSE of the $j$th vector ($\mathrm{MSE}_j$) is less than or equal to an MSE threshold, the receiver sends the label $j$ back to the transmitter. In total, the receiver sends $\tilde{M}$ labels. Third, according to the feedback labels, the transmitter forms a new vector set $\tilde{\mathcal{M}}$, defined as $\tilde{\mathcal{M}} = \{\tilde{\mathbf{s}}_j\}$, $j = 1, \ldots, \tilde{M}$, where $\tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_2, \ldots, \tilde{\mathbf{s}}_{\tilde{M}}$ are the $\tilde{M}$ one-hot vectors selected from $\{\mathbf{s}_i\}$ with the $\tilde{M}$ smallest MSE values. The selection goal is to maximize the data rate while satisfying the MSE requirement:

$$R = \max \frac{\log_2 \tilde{M}}{n} \quad \text{s.t.} \quad \|\mathbf{s}_j - \mathbf{p}_j\|^2 \le \mathrm{MSE}_{th}, \quad j = 1, \ldots, \tilde{M} \tag{4}$$

where $\tilde{M} \le M$ takes one of five preset values, and $\mathrm{MSE}_{th}$ is a preset MSE threshold.

Fig. 2. Adaptive transmission scheme applied to the DL-based communication system. [Panels: Part 1: Offline Training; Part 2: Online Transmission and Selection of Optimal Vectors; Part 3: Online Transmission with Selected Vectors.]

The third part is online transmission with the selected vectors.
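The selection in the second part, i.e., the constrained maximization in (4), can be sketched as follows. The sketch assumes that the receiver already holds one probability vector per transmitted one-hot vector and that the admissible set sizes are supplied by the caller; the function name `select_vectors`, its arguments, and the toy numbers are hypothetical and serve only to illustrate the thresholding step.

```python
import numpy as np

def select_vectors(S, P, mse_th, allowed_sizes, n):
    """Pick the largest allowed number of vectors whose MSE meets the threshold.

    S: (M, M) array, row i is the transmitted one-hot vector s_i.
    P: (M, M) array, row i is the probability vector p_i observed for s_i.
    Returns (labels, rate): 0-based labels of the kept vectors and the rate
    log2(M_tilde) / n, or ([], 0.0) if no allowed size is feasible.
    """
    mse = np.sum((S - P) ** 2, axis=1)        # ||s_i - p_i||^2 for every vector
    order = np.argsort(mse, kind="stable")    # smallest MSE first
    n_ok = int(np.sum(mse <= mse_th))         # vectors that satisfy the constraint
    feasible = [m for m in allowed_sizes if m <= n_ok]
    if not feasible:
        return [], 0.0
    m_tilde = max(feasible)                   # maximize log2(M_tilde) / n
    labels = sorted(order[:m_tilde].tolist())
    return labels, float(np.log2(m_tilde) / n)

# Toy example with M = 4 one-hot vectors and n = 2 channel uses.
S = np.eye(4)
P = np.array([[0.90, 0.10, 0.00, 0.00],
              [0.40, 0.60, 0.00, 0.00],
              [0.00, 0.00, 0.95, 0.05],
              [0.00, 0.00, 0.30, 0.70]])
labels, rate = select_vectors(S, P, mse_th=0.05, allowed_sizes=[2, 4], n=2)
```

Here only the first and third vectors meet the threshold, so two vectors are kept and the rate is $\log_2(2)/2 = 0.5$ bits/channel use.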
The selected $\tilde{M}$ one-hot vectors are used by the autoencoder online over the current channel with $\mathrm{SNR}_P$.

The main steps of the adaptive transmission scheme are summarized as follows:

Steps of the Adaptive Transmission Scheme
1) Train the autoencoder offline with a large training dataset consisting of all $M$ possible one-hot vectors.
2) Transmit each one-hot vector in $\mathcal{M}$ through the trained autoencoder over the practical channel online.
3) Calculate the practical MSE of each vector and select $\mathbf{s}_j$ according to (4).
4) Feed back the label $j$ and form $\tilde{\mathcal{M}} = \{\tilde{\mathbf{s}}_j\}$.
5) Encode the message symbol using $\tilde{\mathcal{M}}$ and transmit.

IV. GENERALIZED DATA REPRESENTATION SCHEME

In this section, we propose a generalized data representation scheme to improve the data rate for DL-based communication systems.

Instead of the conventional one-hot vector containing one non-zero entry, we consider a bit vector containing $m$ non-zero entries to improve the data rate for DL-based communication systems. An $m$-order bit vector $\mathbf{b} \in \mathbb{R}^M$ is defined as

$$\mathbf{b} = [\underbrace{1 \;\; 0 \;\; \cdots \;\; 1 \;\; \cdots \;\; 1}_{m \text{ ones}}]^T \tag{5}$$

where $m = 1, 2, \ldots, \lfloor M/2 \rfloor$ denotes the number of non-zero entries in $\mathbf{b}$, and $\lfloor \cdot \rfloor$ is the floor operation. The bit vector provides $\binom{M}{m}$ possible messages for the transmission. In general, the number of possible symbols in the constellation diagram is a power of 2. For this reason, we only select $2^{\lfloor \log_2 \binom{M}{m} \rfloor}$ out of the $\binom{M}{m}$ possible symbols for communications.

Furthermore, for the autoencoder shown in Fig. 1, the vector $\mathbf{s}$ at the transmitter can be viewed as a probability distribution, and the probability vector $\mathbf{p}$ at the receiver is the corresponding estimated probability distribution. The training goal of the autoencoder is to optimize $\mathbf{p}$ and reconstruct $\mathbf{s}$ while minimizing the loss function.

Thus, motivated by the above discussions, we propose a generalized data representation as a probability distribution

$$\mathbf{s} = [\underbrace{\tfrac{1}{m} \;\; \cdots \;\; \tfrac{1}{m} \;\; \cdots \;\; \tfrac{1}{m}}_{m \text{ non-zero entries}}]^T \tag{6}$$

where the estimated message $\hat{s}$ can be obtained from the indices of the elements with the $m$ highest probabilities in $\mathbf{p}$. The conventional one-hot vector is a special case of the proposed GDR scheme when $m = 1$. Furthermore, the proposed GDR is employed for the vector expression processing of the transmitter in Fig. 1. As an example, when $M = 16$, there are 16 messages to be transmitted. For the conventional one-hot scheme, the corresponding vectors are 16 different $16 \times 1$ one-hot vectors $\mathbf{s}_i$, which are shown in the first column of TABLE II. For the proposed GDR scheme, the corresponding vectors are also 16 different vectors, which can be $8 \times 1$ GDR vectors with $m = 2$.

TABLE II
RESULTS OF MESSAGES TRANSFORMED TO VECTORS

Message   16 × 1 One-hot Vector            8 × 1 GDR Vector
1         $[1, 0, 0, 0, 0, \ldots, 0]^T$   $[·, ·, ·, ·, ·, ·, ·, ·]^T$
2         $[0, 1, 0, 0, 0, \ldots, 0]^T$   $[·, ·, ·, ·, ·, ·, ·, ·]^T$
3         $[0, 0, 1, 0, 0, \ldots, 0]^T$   $[·, ·, ·, ·, ·, ·, ·, ·]^T$
...       ...                              ...
14        $[0, \ldots, 0, 1, 0, 0]^T$      $[0, ·, ·, ·, ·, ·, ·, ·]^T$
15        $[0, \ldots, 0, 0, 1, 0]^T$      $[0, ·, ·, ·, ·, ·, ·, ·]^T$
16        $[0, \ldots, 0, 0, 0, 1]^T$      $[0, ·, ·, ·, ·, ·, ·, ·]^T$

Entries marked · in the GDR column are illegible in this copy; by (6) with $m = 2$, each GDR vector has two entries equal to $1/2$ and six entries equal to 0.
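To make the GDR construction in (6) concrete, the following sketch enumerates the supports of size $m$ in a length-$M$ vector, keeps a power-of-two number of them, and decodes by taking the $m$ largest entries of the received probability vector. The function names and the use of lexicographic order to pick the vectors are assumptions for illustration; the text only requires that $2^{\lfloor \log_2 \binom{M}{m} \rfloor}$ of the $\binom{M}{m}$ candidates be selected.

```python
from itertools import combinations
from math import comb

import numpy as np

def gdr_codebook(M, m):
    """Build GDR vectors of size M with m entries equal to 1/m.

    Keeps the first 2**floor(log2(C(M, m))) supports in lexicographic order
    (an assumed selection rule). Returns (book, supports, n_bits).
    """
    n_bits = comb(M, m).bit_length() - 1      # exact floor(log2(C(M, m)))
    supports = list(combinations(range(M), m))[: 2 ** n_bits]
    book = np.zeros((len(supports), M))
    for row, support in enumerate(supports):
        book[row, list(support)] = 1.0 / m    # probability mass split over m entries
    return book, supports, n_bits

def gdr_decode(p, m, supports):
    """Return the message index whose support equals the m largest entries of p."""
    top = tuple(sorted(np.argsort(p)[-m:].tolist()))
    return supports.index(top) if top in supports else None

# 8 x 1 GDR vectors with m = 2: C(8, 2) = 28 candidates, 16 of them are kept.
book, supports, n_bits = gdr_codebook(8, 2)
p_noisy = 0.9 * book[5] + 0.0125              # a softened copy of message index 5
decoded = gdr_decode(p_noisy, 2, supports)
```

With $M = 8$ and $m = 2$ the codebook carries 4 bits per block, the same as a $16 \times 1$ one-hot vector, while using a vector of half the size.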
The GDR scheme provides $\binom{M}{m} = 28$ possible vectors for transmission, and we can randomly choose 16 vectors as shown in the second column of TABLE II.

The data rate of the DL-based communication system can be improved by employing the proposed GDR as

$$R = \frac{\left\lfloor \log_2 \binom{M}{m} \right\rfloor}{n} \ \text{bits/channel use}. \tag{7}$$

When $m = 1$, the data rate reduces to that of the conventional one-hot vector scheme in (3). The data rate increases with $m$, while the value of $M$ is suitably chosen and remains fixed. The performance gain of the proposed GDR scheme increases with the vector size $M$.

The maximum achievable rate of the proposed GDR scheme in the DL-based communication system is derived as

$$C = \log_2(1 + \mathrm{SNR}) = \log_2\left(1 + \frac{1}{\sigma^2}\right) = \log_2\left(1 + \frac{2 E_b \cdot \left\lfloor \log_2 \binom{M}{m} \right\rfloor}{N_0 \cdot n}\right) \ \text{bits/s/Hz}. \tag{8}$$

It can be shown that the achievable rate can be improved by using the proposed GDR scheme in the DL-based communication system. For example, when $M = 16$, the proposed GDR scheme with $m = 6$ has a performance gain of nearly 1.6 bits/s/Hz compared with the conventional one-hot vector scheme at $E_b/N_0 = 20$ dB for seven channel uses; evaluating (8) gives about 8.43 bits/s/Hz for the GDR scheme ($\lfloor \log_2 \binom{16}{6} \rfloor = 12$) against about 6.85 bits/s/Hz for the one-hot scheme ($\log_2 16 = 4$).

Furthermore, the proposed GDR can be directly applied within the proposed adaptive transmission scheme. Combining the two proposed schemes, we obtain an adaptive GDR-based transmission scheme that can create further benefits for the DL-based communication system.

V. PERFORMANCE ANALYSIS OF THE AUTOENCODER

In this section, we provide a theoretical analysis of MSE performance for DL-based communication systems. Such an analysis can be applied to the two proposed schemes and other autoencoder-represented schemes.

A. MSE Performance Analysis

In Fig. 1, the output of the ReLU layer at the receiver can be written as

$$\mathbf{u} = f_r(\mathbf{y}) \triangleq f_{\mathrm{ReLU}}(\mathbf{W}_r \mathbf{y} + \mathbf{b}_r) \tag{9}$$

where $f_{\mathrm{ReLU}}(a) = \max\{a, 0\}$; $\mathbf{W}_r$ and $\mathbf{b}_r$ denote the trainable parameters of the ReLU layer, and they are defined as

$$\mathbf{W}_r = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{Mn} \end{bmatrix} \quad \text{and} \quad \mathbf{b}_r = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix} \tag{10}$$

respectively, where $w_{ij}$, $i = 1, \ldots, M$, $j = 1, \ldots, n$, represents the symmetric interaction term between unit $u_i$ and unit $y_j$ in Fig. 1, and $b_i$ is the bias term. Thus, from (1) and (9), the $i$th element of $\mathbf{u}$ is given by

$$u_i = \max\{[\mathbf{W}_r]_{i,:}(\mathbf{x} + \mathbf{n}) + b_i, \, 0\} \tag{11}$$

where $[\mathbf{W}_r]_{i,:}$ is the $i$th row of $\mathbf{W}_r$.

Footnote (to the choice of GDR vectors in TABLE II): The vector selection is done here arbitrarily, and we leave the optimal vector selection as an open research problem.

Next, a probability vector is derived from the softmax function at the receiver, and its $i$th element can be written as

$$p_i = \frac{e^{u_i}}{\sum_{k=1}^{M} e^{u_k}}. \tag{12}$$

From (11)-(12), in the offline training process, a different $\mathrm{SNR} = 1/\sigma^2$ leads to different trainable parameters $\mathbf{W}_r$ and $\mathbf{b}_r$, which affects $u_i$ in (11). As a result, $p_i$, the probability of the $i$th element, is directly affected by the training SNR. Also, in the online practical transmission, the trainable parameters $\mathbf{W}_r$ and $\mathbf{b}_r$ are constant since the autoencoder has been trained. When the autoencoder is applied to a different SNR scenario online, it leads to a different estimated probability vector $\mathbf{p}$ as well. The effect of SNR will also be studied through simulations.

In Appendix A, it is shown that, based on (12), the probability vector at the receiver in Fig. 1 can be approximated as

$$\mathbf{p} \approx \mathbf{F}\mathbf{u} \tag{13}$$

where $\mathbf{F} \in \mathbb{R}^{M \times M}$ is a diagonal matrix that is equivalent to the effect of the softmax activation layer. It must be highlighted that, after training, the obtained $\mathbf{F}$ is constant when applied to online transmissions.

At the receiver, the output of the ReLU layer $\mathbf{u}$ consists of zero and non-zero elements as shown in (11).
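The receiver mapping in (9)-(12) can be sketched in a few lines of NumPy. The weights below are random placeholders rather than trained parameters, and the function name `receiver_forward` is hypothetical; the point is only to show how a fixed receiver turns the same transmitted signal into different probability vectors at different noise levels.

```python
import numpy as np

def receiver_forward(y, W_r, b_r):
    """Compute u = ReLU(W_r y + b_r) as in (9) and p = softmax(u) as in (12)."""
    u = np.maximum(W_r @ y + b_r, 0.0)
    e = np.exp(u - np.max(u))        # subtracting max(u) is for numerical stability only
    p = e / np.sum(e)
    return u, p

rng = np.random.default_rng(1)
M, n = 8, 7
W_r = rng.normal(size=(M, n))        # hypothetical, untrained weights
b_r = rng.normal(size=M)             # hypothetical, untrained biases
x = rng.normal(size=n)               # placeholder transmitted signal

for sigma2 in (0.5, 0.05):           # the same fixed receiver at two noise levels
    y = x + rng.normal(0.0, np.sqrt(sigma2), size=n)
    u, p = receiver_forward(y, W_r, b_r)
    s_hat = int(np.argmax(p)) + 1    # one-hot decision: index of the largest probability
```

For a GDR system, the final decision would instead take the indices of the $m$ largest entries of `p`.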
In this paper, we aim to analyze the effect of SNR on the MSE performance. Since the zero elements cannot reflect the characteristics of the MSE, only the non-zero output of the ReLU layer is considered, which can be derived from (11) as

$$\mathbf{u}^+ = \mathbf{W}_r(\mathbf{x} + \mathbf{n}) + \mathbf{b}_r \tag{14}$$

if

$$[\mathbf{W}_r]_{i,:}(\mathbf{x} + \mathbf{n}) + b_i > 0. \tag{15}$$

Thus, the probability vector $\mathbf{p}$ under the assumption of (15) can be expressed as

$$\mathbf{p}^+ \approx \mathbf{F}^+\mathbf{u}^+ \tag{16}$$

where $\mathbf{F}^+ \in \mathbb{R}^{M \times M}$ is the equivalent matrix of the softmax activation layer in the non-zero case (15), and the entries of $\mathbf{F}^+$ are fixed after training.

Here, the average MSE of the DL-based communication system in the case of (15) can be given from (14) and (16) as

$$\begin{aligned} \mathrm{MSE} &= \mathbb{E}\left\{\|\mathbf{p}^+ - \mathbf{s}\|^2\right\} \\ &\approx \mathbb{E}\left\{\|\mathbf{F}^+(\mathbf{W}_r\mathbf{x} + \mathbf{b}_r) + \mathbf{F}^+\mathbf{W}_r\mathbf{n} - \mathbf{s}\|^2\right\} \\ &= \mathbb{E}\left\{\|\mathbf{F}^+(\mathbf{W}_r\mathbf{x} + \mathbf{b}_r) - \mathbf{s}\|^2\right\} + \|\mathbf{F}^+\mathbf{W}_r\|^2\sigma^2. \end{aligned} \tag{17}$$

TABLE III
PARAMETERS FOR THE AUTOENCODER SETUP

| Parameter | Value |
|---|---|
| Optimizer | Adam [42] |
| Loss function | MSE |
| Epoch | 150 |
| Batch size | 45 |
| Trained samples | 2 × |
| Test samples | 1 × |

After the autoencoder is trained over $\mathrm{SNR}_T$, the transformation parameters $\mathbf{F}^+$, $\mathbf{W}_r$ and $\mathbf{b}_r$ in (17) are constant, where $\sigma_{n_T}^2$ is the noise variance in the training scenario. When the trained autoencoder is applied to a practical communication scenario with $\mathrm{SNR}_P$, the noise variance of the current practical channel is $\sigma_{n_P}^2$. For the non-zero case, it can be observed from (17) that, when $\sigma_{n_P}^2 < \sigma_{n_T}^2$, the practical MSE performance is better than that of the training scenario; when $\sigma_{n_P}^2 > \sigma_{n_T}^2$, the converse is true. This indicates that the trained autoencoder attains better system performance when it is applied to a higher SNR scenario. For the zero case in (11), the noise variance has no effect on the MSE performance. The MSE performance of the DL-based communication system will also be verified through simulations.

B. Training SNR Set Strategy
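As a rough illustration of how a training SNR set could be used when generating noisy training data, the sketch below draws one SNR from the set for each training vector and adds Gaussian noise accordingly. Several points are assumptions rather than details given in the paper: the set values are read as the five fixed training SNRs from -20 dB to 20 dB listed for Fig. 11, the SNR is taken as signal power over noise variance with unit signal power, and the per-vector draw is only one possible design (a per-batch or per-epoch draw would be equally plausible).

```python
import random

# Assumed training SNR set in dB (read from the fixed SNRs listed for Fig. 11).
SNR_SET_DB = [-20, -10, 0, 10, 20]

def noise_std(snr_db, signal_power=1.0):
    """Noise standard deviation, assuming SNR = signal_power / sigma^2."""
    return (signal_power / 10 ** (snr_db / 10)) ** 0.5

def noisy_batch(batch, rng):
    """Add AWGN to each vector, drawing one SNR from the set per vector."""
    out = []
    for x in batch:
        sigma = noise_std(rng.choice(SNR_SET_DB))
        out.append([v + rng.gauss(0.0, sigma) for v in x])
    return out

rng = random.Random(0)                         # fixed seed for repeatability
batch = [[1.0] * 7 for _ in range(45)]         # batch size 45, n = 7 channel uses
received = noisy_batch(batch, rng)
```

Drawing the SNR at random mixes clean and very noisy examples in the same batch, which is the diversity of training data that this strategy aims for.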

In conventional DL-based communication systems, the autoencoder is trained offline over a fixed SNR value, and it can suffer performance degradation when operating in environments having mismatched SNR values. Here, we propose a training SNR set strategy that employs multiple training SNRs, which improves the diversity of the training dataset to obtain robust performance. For example, the training SNR set can be designed as $\mathrm{SNR}_T = \{-20, -10, 0, 10, 20\}$ dB for offline training. The system performance gain of the proposed training SNR set strategy will be shown by simulation results.

VI. NUMERICAL RESULTS

In this section, we evaluate the proposed adaptive transmission scheme, the GDR scheme, the adaptive GDR-based transmission scheme, and the system performance of the DL-based communication system via simulations on the TensorFlow framework. In all the simulations, the autoencoder is trained over the stochastic AWGN channel model with $n = 7$ channel uses without exhaustive hyperparameter tuning. Here, we use the same set of parameters for the autoencoder setup as described in TABLE III.

TABLE IV presents the simulated and theoretical numbers of training parameters in the autoencoder, where different sizes of the data representation $M$ are employed. From TABLE IV, it is clear that the simulated number of trainable parameters increases as $M$ grows from 4 to 64, both in total and in each layer except for the normalization layer. The simulated results agree with the theoretical number of parameters shown in the last row of TABLE IV. The increasing number of training parameters leads to increased training complexity. For the conventional one-hot vector, the data rate can be improved by increasing $M$ as shown in (3), at the cost of high complexity. In contrast, the data rate of the proposed GDR scheme can be improved by controlling the number of non-zero elements $m$ as well as the value of $M$, as shown in (7).

Footnote: For convenience, the MSE loss function is used to show the effect of SNR on the MSE performance and to verify the analysis in Subsection V-A.

Fig. 3. Simulated BER performance for the autoencoder and conventional communication schemes. (Axes: BER versus SNR (dB). Curves: One-hot: M=16, m=1; Hamming with ML; Hamming with HD.)

A. Performance of the Autoencoder and Conventional Communication System
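For readers unfamiliar with the conventional baseline of this subsection, the following sketch shows one standard systematic (7,4) Hamming code with syndrome-based hard-decision decoding, which carries four information bits in seven coded bits. The particular parity-check matrix `H` and the bit ordering are assumptions made for illustration; the paper does not state which equivalent form of the code was simulated, and BPSK mapping and the channel are omitted here.

```python
# Parity-check matrix of a systematic (7,4) Hamming code (assumed form):
# codeword layout is [d1, d2, d3, d4, p1, p2, p3].
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def encode(d):
    """Append three parity bits to four data bits."""
    d1, d2, d3, d4 = d
    return [d1, d2, d3, d4, d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]

def decode_hd(r):
    """Hard-decision decoding: correct at most one bit error via the syndrome."""
    syndrome = [sum(h * b for h, b in zip(row, r)) % 2 for row in H]
    r = list(r)
    if any(syndrome):
        for j in range(7):
            # A single error at position j produces the j-th column of H.
            if [row[j] for row in H] == syndrome:
                r[j] ^= 1
                break
    return r[:4]

codeword = encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[2] ^= 1                     # flip one bit to mimic a channel error
recovered = decode_hd(corrupted)      # the single error is corrected
```

Because all seven columns of `H` are distinct and non-zero, every single-bit error maps to a unique syndrome, which is why hard-decision decoding corrects exactly one error per block.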

This subsection shows the simulated bit-error rate (BER) performance of the autoencoder scheme with one-hot vectors and of the conventional communication scheme employing a Hamming code, where the autoencoder is trained at a single fixed SNR.

Figure 3 shows the simulated BER performance of the DL-based autoencoder scheme with $M = 16$ and $m = 1$ (one-hot vector) and of the conventional communication scheme, where the conventional scheme employs binary phase-shift keying (BPSK) modulation and a (7,4) Hamming code with either binary hard-decision (HD) or maximum-likelihood (ML) decoding. Given the same information transmission rate (four information bits over seven channel uses), the BER performance of the autoencoder scheme is better than that of the conventional scheme employing the Hamming code with either ML or HD decoding. It is worth pointing out that the autoencoder approach does not use any explicit error control strategy for the noisy channel, yet it still outperforms a classical scheme that does. It was reported in [19] that an autoencoder can achieve BLER performance similar to that of a conventional channel-coded scheme.

Figure 4 depicts the simulated BER performance of the DL-based autoencoder that employs Gray coding and the conventional one-hot vector with vector sizes $M = 4, 8, 16, 32$, again for a single fixed training SNR. In Fig. 4, the BER of the conventional one-hot vector scheme increases when $M$ is varied from 4 to 32.

TABLE IV
TRAINING PARAMETERS OF AUTOENCODER

| Vector size | Multiple dense layers | Normalization layer | ReLU layer | Softmax layer | Total |
|---|---|---|---|---|---|
| Simulated parameters, M = 4 | 55 | 14 | 32 | 20 | 121 |
| Simulated parameters, M = 8 | 135 | 14 | 64 | 72 | 285 |
| Simulated parameters, M = 16 | 391 | 14 | 128 | 272 | 805 |
| Simulated parameters, M = 32 | 1287 | 14 | 256 | 1056 | 2613 |
| Simulated parameters, M = 64 | 4615 | 14 | 512 | 4160 | 9301 |
| Theoretical parameters, M | (M + 1)(M + n) | 2n | M(n + 1) | M(M + 1) | (2M + 3)(M + n) |

TABLE V
THE NUMBER OF ADAPTIVELY SELECTED VECTORS M FOR DIFFERENT SNR VALUES AND MSE_th
(Rows: three MSE thresholds; columns: six SNR values in dB.)

Fig. 4. Simulated BER for the autoencoder employing the conventional one-hot vector scheme with different vector size M. (Axes: BER versus SNR (dB). Curves: One-hot with M = 32, 16, 8, 4 and m = 1.)

B. Performance of the Proposed Adaptive Transmission Scheme

In this subsection, we show the simulated BLER and MSE performance of the proposed adaptive transmission scheme in the DL-based communication system. Here, the autoencoder is trained using $\mathrm{SNR}_T = 5$ dB.

Fig. 5. Simulated BLER for the autoencoder with conventional one-hot vector and proposed adaptive transmission schemes, while the MSE thresholds are $10^{-4}$, $10^{-5}$, and $10^{-6}$. (Axes: BLER versus SNR (dB), from -5 dB to 5 dB. Curves: One-hot with M = 4, 16, 32, 64 and m = 1; Proposed with MSE_th = $10^{-4}$, $10^{-5}$, $10^{-6}$.)

Figure 5 depicts the simulated BLER performance of the DL-based autoencoder that employs the proposed adaptive transmission scheme and the conventional one-hot vector scheme, where the MSE thresholds are $10^{-4}$, $10^{-5}$, and $10^{-6}$. First, it can be seen from Fig. 5 that the BLER of the conventional one-hot vector scheme increases when $M$ is varied from 4 to 64, since a smaller value of $M$ requires fewer trainable parameters, as shown in TABLE IV. With the same training dataset, fewer trainable parameters contribute to better training accuracy. Second, for the proposed adaptive transmission scheme, the BLER increases when the MSE threshold is increased from $10^{-6}$ to $10^{-4}$ in Fig. 5. The reason is that, to maximize the data rate, a lower MSE threshold (i.e., a tighter bound) requires a smaller $M$ to satisfy the MSE constraint, which results in lower BLER. As shown in TABLE V, for each MSE threshold, the number of selected vectors adaptively increases from 4 to 64 with increasing SNR; for a given threshold, it changes step by step across the six SNR values listed in the table. The reason is that a higher SNR value makes it easier to meet the MSE requirement and, as a result, a larger number of vectors is obtained for maximizing the data rate. Fig. 5 shows that, when the data rates are the same, i.e., when the number of selected vectors equals the one-hot vector size $M$, the adaptive transmission scheme can reduce the BLER of the one-hot vector scheme by 80%. The reason for the performance gain is that the proposed adaptive transmission scheme can select the optimal vectors that meet the MSE requirement, as shown in (4).
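The selection rule described above can be sketched as picking, at a given SNR, the largest number of vectors whose estimated MSE still meets the threshold. This is a simplified illustration, not the authors' procedure: the MSE estimates in `mse_by_m` are made-up numbers, the fallback to the smallest candidate when nothing meets the threshold is an assumption, and the one-hot rate is taken as $\log_2(M)/n$ with $n = 7$, which matches the rate of 3/7 bits per channel use quoted for M = 8.

```python
import math

N_CHANNEL_USES = 7

def select_m(mse_by_m, mse_th):
    """Largest candidate M whose estimated MSE meets the threshold.

    Falls back to the smallest candidate if none is feasible (assumption).
    """
    feasible = [m for m, mse in mse_by_m.items() if mse <= mse_th]
    return max(feasible) if feasible else min(mse_by_m)

def one_hot_rate(m, n=N_CHANNEL_USES):
    """Data rate in bits per channel use, assuming R = log2(M) / n."""
    return math.log2(m) / n

# Hypothetical MSE estimates at one SNR (illustrative numbers only).
mse_by_m = {4: 2e-7, 16: 3e-6, 32: 4e-5, 64: 6e-4}

choices = {th: select_m(mse_by_m, th) for th in (1e-4, 1e-5, 1e-6)}
rates = {th: one_hot_rate(m) for th, m in choices.items()}
```

With these made-up estimates a tighter threshold selects a smaller M and therefore a lower rate, which mirrors the trend reported for Fig. 5 and Fig. 6.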
Fig. 6. Data rate performance for the autoencoder with conventional and adaptive transmission schemes, while the MSE thresholds are $10^{-4}$, $10^{-5}$, and $10^{-6}$. (Axes: data rate (bits/channel use) versus SNR (dB), from -5 dB to 5 dB. Curves: One-hot with M = 4, 16, 32, 64; Proposed with MSE_th = $10^{-4}$, $10^{-5}$, $10^{-6}$.)

Fig. 7. Simulated MSE for the autoencoder employing the adaptive transmission scheme with the MSE thresholds being $10^{-4}$, $10^{-5}$ and $10^{-6}$. (Axes: simulated MSE versus SNR (dB), from -5 dB to 5 dB. Curves are annotated with the number of selected vectors, ranging from 4 to 64.)

Figure 6 illustrates the data rate performance of the autoencoder that employs the conventional one-hot vector scheme and the proposed adaptive transmission scheme with the MSE thresholds being $10^{-4}$, $10^{-5}$, and $10^{-6}$. From Fig. 6, we observe that the data rates of the conventional one-hot vector scheme are constant for all SNR values, whereas the data rate of the proposed adaptive transmission scheme increases with SNR, as shown in TABLE V. From Fig. 5 and Fig. 6, it can be seen that the proposed adaptive transmission scheme obtains better BLER performance than the conventional one-hot vector scheme when operating at the same data rate.

Figure 7 presents the simulated MSE performance of a practical communication system that employs the proposed adaptive transmission scheme with MSE thresholds being $10^{-4}$, $10^{-5}$ and $10^{-6}$. It is seen from Fig. 7 that the simulated MSE of the proposed adaptive transmission scheme increases with the MSE threshold. Furthermore, the simulated MSE of the proposed scheme decreases as the SNR increases, which is consistent with the prediction in (17). As expected, when the simulated MSE reaches the corresponding MSE threshold, the number of selected vectors is almost 64, which is the maximum value, and the maximum data rate is obtained.

Fig. 8. Simulated BLER for the autoencoder employing different data representations with (a) M = 8 and (b) R = 6/7 (bits/channel use), while the trained SNR is 5 dB. (Axes: BLER versus SNR (dB). Panel (a) curves: GDR with m = 4, 3, 2; One-hot with m = 1. Panel (b) curves: One-hot: M=64, m=1; GDR: M=16, m=2; GDR: M=8, m=4.)

C. Performance of the Proposed GDR Scheme
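The data rates compared in this subsection can be reproduced with a short calculation. The sketch assumes that a GDR vector of size M carries exactly m non-zero entries chosen from M positions, so that the rate is $R = \lfloor \log_2 \binom{M}{m} \rfloor / n$ with $n = 7$ channel uses. This form is an assumption about (7), which is not restated here; it is used because it matches the rates quoted in the text (3/7 for M = 8, m = 1 and 6/7 for M = 8, m = 4, for M = 16, m = 2 and for M = 64, m = 1).

```python
from fractions import Fraction
from math import comb

def gdr_bits(M, m):
    """Bits per block, assuming floor(log2(C(M, m))) usable messages."""
    # For a positive integer k, k.bit_length() - 1 equals floor(log2(k)) exactly.
    return comb(M, m).bit_length() - 1

def gdr_rate(M, m, n=7):
    """Data rate in bits per channel use as an exact fraction."""
    return Fraction(gdr_bits(M, m), n)

# One-hot is the special case m = 1.
one_hot_8 = gdr_rate(8, 1)
gdr_8_4 = gdr_rate(8, 4)
gdr_16_2 = gdr_rate(16, 2)
one_hot_64 = gdr_rate(64, 1)
```

Using the integer `bit_length` instead of a floating-point logarithm avoids rounding problems when the binomial coefficient is close to a power of two.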

This subsection shows the BLER performance and the maximum achievable rate of the proposed GDR scheme in the DL-based communication system, where a fixed training SNR is used.

Figure 14 shows the simulated BLER performance of the DL-based communication system that employs the proposed GDR and conventional one-hot vector schemes, where the schemes in (a) have the same vector size $M = 8$ and the schemes in (b) have the same data rate $R = 6/7$ (bits/channel use). In Fig. 14 (a), for the same vector size $M = 8$, the proposed GDR schemes ($m = 2, 3, 4$) obtain BLER performance comparable to that of the conventional one-hot vector scheme ($m = 1$). This indicates that, with the same vector size, the number of non-zero elements in $\mathbf{s}$ has little effect on the BLER performance. Even though the BLER performances are similar, the data rates of the GDR schemes and the one-hot vector scheme are different, as shown in TABLE VI. It can be seen from TABLE VI that, with $M = 8$, the data rates of the proposed GDR schemes are $R = 6/7, 5/7, 4/7$ (bits/channel use) for $m = 4, 3, 2$, respectively. The data rates of all GDR schemes are greater than that of the conventional one-hot vector scheme, $R = 3/7$ (bits/channel use), and the GDR scheme with $m = 4$ doubles the data rate of the one-hot vector scheme. In Fig. 14 (b), for the same data rate $R = 6/7$ (bits/channel use), which covers the proposed schemes $M = 8$ with $m = 4$ and $M = 16$ with $m = 2$ and the conventional scheme $M = 64$ with $m = 1$, the proposed GDR schemes have better BLER performance than the conventional one-hot vector scheme. These performance gains are achieved by the GDR schemes with lower training complexity, i.e., fewer training parameters, as shown in TABLE IV. The BLER decreases as the vector size $M$ decreases, for the same reason as in Fig. 5. Furthermore, the proposed GDR scheme can avoid the performance degradation by increasing $m$ when the transmission message size is large. For example, the proposed GDR scheme $M = 16$ with $m = 8$ can transmit 32768 messages by using $16 \times 1$ vectors. However, to achieve the same data rate, the size of the one-hot vector should be at least $32768 \times 1$, which will lead to significant BLER performance degradation. In both Fig. 14 (a) and (b), the simulated BLER drops to a small value when the SNR is 5 dB, which demonstrates that the autoencoder attains a high accuracy with sufficient training over $\mathrm{SNR}_T = 5$ dB.

TABLE VI
DATA RATE OF THE DL-BASED COMMUNICATION SYSTEM

| Scheme | M | m | Data rate (bits/channel use) |
|---|---|---|---|
| One-hot | 8 | 1 | 3/7 |
| GDR | 8 | 2 | 4/7 |
| GDR | 8 | 3 | 5/7 |
| GDR | 8 | 4 | 6/7 |
| GDR | 16 | 2 | 6/7 |
| One-hot | 64 | 1 | 6/7 |

Fig. 9. Maximum achievable rate for the autoencoder with different data representations. (Axes: maximum achievable rate (bits/s/Hz) versus $E_b/N_0$ (dB), from -15 dB to 15 dB. Curves: GDR: M=64, m=2; One-hot: M=64, m=1; GDR: M=16, m=2; GDR: M=8, m=4; GDR: M=8, m=3; GDR: M=8, m=2; One-hot: M=8, m=1.)

Figure 9 illustrates the maximum achievable rate of a DL-based communication system employing different data representations. It can be seen from Fig. 9 that, with $M = 8$, the maximum achievable rate increases when the order $m$ increases from 1 to 4, which is consistent with the result in (8). This shows that the proposed GDR scheme can obtain a remarkable achievable rate improvement. Notably, the performance gain of the proposed GDR scheme grows as the vector size $M$ increases. As shown in Fig. 9, the GDR scheme employing $M = 64$ with $m = 2$ has a large performance gain compared with the conventional scheme employing $M = 64$ with $m = 1$. Besides, the maximum achievable rate of the proposed GDR schemes ($M = 8$ with $m = 4$ and $M = 16$ with $m = 2$) is the same as that of the conventional one-hot vector scheme ($M = 64$, $m = 1$) in Fig. 9. To obtain the same achievable rate as the GDR scheme, the conventional one-hot vector scheme needs to increase the vector size $M$, which has been shown in Fig. 14 to degrade the BLER performance.

Fig. 10. Simulated BLER for the autoencoder employing four different schemes, while the trained SNR is 5 dB. (Axes: BLER versus SNR (dB), from -5 dB to 5 dB. Curves: Conventional One-hot: M=64, m=1; Adaptive One-hot: M=64, m=1; GDR: M=8, m=4; Adaptive GDR: M=8, m=4.)

Figure 10 presents the simulated BLER performance of the DL-based communication system that employs four schemes: the conventional one-hot vector ($M = 64$, $m = 1$), the adaptive transmission (based on one-hot vector), the GDR ($M = 8$, $m = 4$), and the adaptive GDR-based transmission schemes. The two adaptive schemes use the same MSE threshold $\mathrm{MSE}_{\mathrm{th}}$. The achievable maximal data rate of all four schemes is the same, $R = 6/7$ (bits/channel use). In Fig. 10, the adaptive GDR-based transmission scheme achieves the best BLER performance in all SNR regions, i.e., it outperforms the GDR ($M = 8$, $m = 4$) scheme in the low SNR region and the adaptive transmission (based on one-hot vector) scheme in the high SNR region. With this MSE threshold, the adaptive GDR-based transmission scheme selects a number of vectors for transmission that changes with the SNR across the six simulated SNR values. It can be seen that, at the three negative SNR values, the BLER of the adaptive GDR-based transmission scheme is similar to that of the adaptive transmission (based on one-hot vector) scheme, since the two schemes select similar numbers of vectors. At the three positive SNR values, starting from 1 dB, the BLER of the adaptive GDR-based transmission scheme is similar to that of the GDR scheme ($M = 8$, $m = 4$). The reason is that the adaptive GDR-based transmission scheme then selects all vectors for usage, in which case it is equal to the GDR scheme ($M = 8$, $m = 4$).

Footnote: Except for the adaptive GDR-based transmission scheme, the BLER performance results of the other three schemes have been shown in Fig. 5 and Fig. 14 (b).

Fig. 11. Simulated loss function performance of autoencoder in training process, while different fixed training SNRs and training SNR set are employed. (Axes: loss function value versus epoch. Curves: SNR_T = 20, 10, 0, -10, -20, -30 dB and SNR_T ∈ [-20, 20] dB.)

D. Performance Comparison of Different Training SNR
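The comparisons in this subsection rest on the prediction of (17) that the noise contribution to the MSE grows linearly with the noise variance of the channel on which the trained autoencoder is used. The sketch below checks that term by Monte Carlo simulation. It is an illustration under stated assumptions: the matrix `A` is a hypothetical stand-in for the fixed product $\mathbf{F}^+\mathbf{W}_r$, the norm in (17) is read as the Frobenius norm, the noise is i.i.d. Gaussian, and the SNR is converted to a noise variance assuming unit signal power.

```python
import random

def frobenius_sq(A):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(a * a for row in A for a in row)

def mc_noise_term(A, sigma, trials, rng):
    """Monte Carlo estimate of E||A n||^2 for n ~ N(0, sigma^2 I)."""
    cols = len(A[0])
    total = 0.0
    for _ in range(trials):
        n = [rng.gauss(0.0, sigma) for _ in range(cols)]
        total += sum(sum(a * v for a, v in zip(row, n)) ** 2 for row in A)
    return total / trials

# Hypothetical stand-in for F+ W_r (fixed after training).
A = [[0.5, -0.2, 0.1],
     [0.0, 0.3, 0.4]]

rng = random.Random(1)
results = {}
for snr_db in (0, 10, 20):
    sigma = 10 ** (-snr_db / 20)               # sigma^2 = 10^(-SNR/10), unit signal power
    analytic = frobenius_sq(A) * sigma ** 2    # noise term of (17)
    simulated = mc_noise_term(A, sigma, 20000, rng)
    results[snr_db] = (analytic, simulated)
```

Each 10 dB increase in the practical SNR reduces the analytic noise term by a factor of ten, in line with the statement that a trained autoencoder performs better when applied to a higher SNR scenario.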

In this subsection, we investigate the effect of the training SNR on system performance, including the loss function performance in the training process and the simulated BLER and MSE performance in the practical transmission process. Here, the data representation parameters are $M = 8$ and $m = 1$, and $\mathrm{SNR}_T$ denotes the training SNR.

Figure 11 shows the simulated loss function performance in the training process, when the autoencoder is trained over different fixed SNRs and over the SNR set. The SNR set is designed as $\mathrm{SNR}_T = \{-20, -10, 0, 10, 20\}$ dB, which includes all the fixed SNRs except for $-30$ dB. In Fig. 11, an epoch is one pass of the entire training dataset through the autoencoder. As shown in Fig. 11, as $\mathrm{SNR}_T$ is increased, the loss function value decreases and the convergence of the loss function improves, which indicates that a good channel environment contributes to better training performance. However, with $\mathrm{SNR}_T = -30$ dB, the loss value does not converge within 150 epochs. Furthermore, it can be seen from Fig. 11 that the loss value of the autoencoder trained with the SNR set is similar to that of the autoencoder trained with one of the negative fixed SNRs. The simulated results suggest that the training SNR has a significant effect on the training performance of the autoencoder.

Fig. 12. Simulated BLER for the DL-based communication system employing the trained autoencoder with different fixed training SNRs and training SNR set. (Axes: BLER versus SNR (dB), from -20 dB to 20 dB. Curves: SNR_T = 20, 10, 0, -10, -20, -30 dB and SNR_T ∈ [-20, 20] dB.)

Figure 15 depicts the simulated BLER performance of the practical DL-based communication system employing the trained autoencoder with different fixed training SNRs and with the training SNR set. In Fig. 15, the BLER decreases as the fixed training SNR is lowered from positive to negative values. The reason is that, with a lower training SNR (that is, a worse channel environment), the autoencoder needs to learn more features to reconstruct the input at the output, which leads to a robust autoencoder and better BLER performance. However, the training SNR has a lower bound for the autoencoder. At the lowest training SNR, the BLER is approximately 0.6, which demonstrates that the autoencoder trained over this channel environment can no longer learn the features. This is consistent with the non-convergence of the loss function with $\mathrm{SNR}_T = -30$ dB in Fig. 11. Besides, Fig. 15 shows that the BLER performance of the training SNR set scheme is similar to that of one of the fixed negative-SNR schemes, which is almost the best performance and is surpassed only by one other fixed-SNR scheme. This shows that training with the SNR set can improve the generalization performance of the autoencoder. From Fig. 11 and Fig. 15, it can be found that, with a higher training SNR value, the autoencoder obtains better convergence in training but worse BLER performance. The simulated results indicate that the training SNR directly affects the system performance, which agrees with the analysis in Subsection V-A.

Figure 13 illustrates the simulated MSE performance of the practical DL-based communication system employing different trained autoencoders, where the training SNRs include different fixed SNRs and the SNR set. In Fig. 13, it is clear that the MSE decreases when the SNR is increased. This indicates that the MSE performance improves when the trained autoencoder is applied to a higher SNR scenario, which is consistent with the analysis in (17). Furthermore, the simulated MSE performance in Fig. 13 follows the same trends as the BLER performance shown in Fig. 15, for the same reasons.

VII. CONCLUSION AND FUTURE RESEARCH

Fig. 13. Simulated MSE for the DL-based communication system employing the trained autoencoder with different fixed training SNRs and training SNR set. (Axes: simulated MSE versus SNR (dB), from -20 dB to 20 dB. Curves: SNR_T = 20, 10, 0, -10, -20, -30 dB and SNR_T ∈ [-20, 20] dB.)

In this paper, we proposed two new transmission schemes to address the problem of limited data rate in DL-based communication systems using an autoencoder. We designed an adaptive transmission scheme for different channel conditions to maximize the data rate under a mean square error constraint. Furthermore, we proposed the GDR scheme to obtain a higher data rate than the conventional one-hot vector scheme with similar BLER performance. Besides, the effect of the training SNR and the MSE performance were analyzed and verified by simulations. We found that a high training SNR can lead to good convergence in the training process but worse BLER performance in practical transmission. We also introduced a training SNR set strategy to address the tradeoff between convergence and error rate. It was shown that the autoencoder trained over a low SNR can attain better BLER and MSE performance when operating in the high SNR region. As a result, it is concluded that training the autoencoder at a lower SNR value will, in general, lead to good system performance. For a very low SNR value, say, $\mathrm{SNR}_T = -30$ dB, numerical results indicate that the loss function value does not converge and the BLER degrades dramatically. This suggests that the studied autoencoder system is unable to learn from a very noisy data set. It should be emphasized that the current system assumes knowledge of neither the noise nor the system model. Therefore, one interesting research problem is to study low SNR communication using DL techniques when partial knowledge about the noise and the system model is available.

To further improve the performance of DL-based communication systems, we can possibly employ the ensemble method, where the results of a set of individual NNs are combined to estimate the transmitted message. In classifier problems, it has been shown that the ensemble method is effective in improving accuracy or in decomposing a complex problem into easier subproblems [43].

APPENDIX A
DERIVATION OF (13)

Let

$$\mathbf{p} = \mathbf{F}\mathbf{u}. \tag{18}$$

According to (12), eq. (18) can be formulated as

$$\frac{1}{\sum_{k=1}^{M} e^{u_k}} \begin{bmatrix} e^{u_1} \\ e^{u_2} \\ \vdots \\ e^{u_M} \end{bmatrix} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1M} \\ f_{21} & f_{22} & \cdots & f_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ f_{M1} & f_{M2} & \cdots & f_{MM} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \end{bmatrix} \tag{19}$$

and we can obtain that

$$e^{u_i} = (f_{i1}u_1 + f_{i2}u_2 + \cdots + f_{ii}u_i + \cdots + f_{iM}u_M)\sum_{k=1}^{M} e^{u_k}. \tag{20}$$

Next, $e^{u_i}$ can be approximated according to Taylor's theorem as

$$e^{u_i} \approx 1 + u_i + \frac{u_i^2}{2!} + \cdots + \frac{u_i^N}{N!} \tag{21}$$

where $N$ is a sufficiently large integer.

Finally, combining (20) and (21), we can derive the elements of matrix $\mathbf{F}$ as

$$f_{ij} \approx \begin{cases} \dfrac{1}{\sum_{k=1}^{M} e^{u_k}}\left(u_i^{-1} + 1 + \dfrac{u_i}{2!} + \cdots + \dfrac{u_i^{N-1}}{N!}\right), & i = j \\ 0, & i \neq j. \end{cases} \tag{22}$$

Thus, eq. (22) shows that the probability vector $\mathbf{p}$ at the receiver can be approximated as (13).

ACKNOWLEDGMENT

We thank all anonymous reviewers and the editor for their constructive comments that have significantly improved the original manuscript. In particular, we thank the editor Prof. Vaneet Aggarwal for suggesting that we investigate the adaptive GDR-based transmission scheme.

REFERENCES

[1] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Commun. Mag., vol. 52, no. 2, pp. 74–80, Feb. 2014.
[2] M. Agiwal, A. Roy, and N. Saxena, “Next generation 5G wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tut., vol. 18, no. 3, pp. 1617–1655, 3rd Quart. 2016.
[3] P. Popovski, “Ultra-reliable communication in 5G wireless systems,” in Proc. 1st Int. Conf. 5G Ubiq. Connect., Nov. 2014, pp. 1–6.
[4] Z. Dawy, W. Saad, A. Ghosh, J. G. Andrews, and E. Yaacoub, “Toward massive machine type cellular communications,” IEEE Wireless Commun., vol. 24, no. 1, pp. 120–128, Feb. 2017.
[5] J. Hoydis, S. ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160–171, Feb. 2013.
[6] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[7] Z. Pi and F. Khan, “An introduction to millimeter-wave mobile broadband systems,” IEEE Commun. Mag., vol. 49, no. 6, pp. 101–107, June 2011.
[8] X. Ge, S. Tu, G. Mao, C. Wang, and T. Han, “5G ultra-dense cellular networks,” IEEE Wireless Commun., vol. 23, no. 1, pp. 72–79, Feb. 2016.
[9] H. Zhang, Y. Dong, J. Cheng, M. J. Hossain, and V. C. M. Leung, “Fronthauling for 5G LTE-U ultra dense cloud small cell networks,” IEEE Wireless Commun., vol. 23, no. 6, pp. 48–53, Dec. 2016.
[10] J. Zhu, J. Wang, Y. Huang, S. He, X. You, and L. Yang, “On optimal power allocation for downlink non-orthogonal multiple access systems,” IEEE J. Sel. Areas Commun., vol. 35, no. 12, pp. 2744–2757, Dec. 2017.
[11] X. Chen and X. Lin, “Big data deep learning: Challenges and perspectives,” IEEE Access, vol. 2, pp. 514–525, May 2014.
[12] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
[13] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tut., vol. 20, no. 4, pp. 2595–2621, 4th Quart. 2018.
[14] D. Yu and L. Deng, “Deep learning and its applications to signal and information processing, exploratory DSP,” IEEE Signal Process. Mag., vol. 28, no. 1, pp. 145–154, Jan. 2011.
[15] C. Jiang, H. Zhang, Y. Ren, Z. Han, K. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.
[16] T. Wang, C. K. Wen, H. Wang, F. Gao, T. Jiang, and S. Jin, “Deep learning for wireless physical layer: Opportunities and challenges,” China Commun., vol. 14, no. 11, pp. 92–111, Nov. 2017.
[17] X. You, C. Zhang, X. Tan, S. Jin, and H. Wu, “AI for 5G: Research directions and paradigms,” Science China Information Sciences, Sep. 2018.
[18] N. Huang, M. Chen, W. Xu, and J.-Y. Wang, “Incorporating importance sampling in EM learning for sequence detection in SPAD underwater OWC,” IEEE Access, vol. 7, pp. 4529–4537, 2019.
[19] T. O'Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[20] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep learning based communication over the air,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018.
[21] V. Vanhoucke, A. Senior, and M. Z. Mao, “Improving the speed of neural networks on CPUs,” in Deep Learn. Unsupervised Feature Learn. NIPS Workshop, 2011.
[22] E. Nachmani, Y. Be'ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” in IEEE Annu. Allerton Conf. Commun. Control Comput. (Allerton), Sep. 2016, pp. 341–346.
[23] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be'ery, “Deep learning methods for improved decoding of linear codes,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 119–131, Feb. 2018.
[24] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in IEEE Annu. Conf. Inf. Sci. Syst. (CISS), Mar. 2017, pp. 1–6.
[25] T. O'Shea, K. Karra, and T. Clancy, “Learning to communicate: Channel auto-encoders, domain specific regularizers, and attention,” in IEEE Int. Symp. Signal Process. Inf. Technol., Dec. 2016, pp. 1–6.
[26] T. Erpek, T. J. O'Shea, and T. C. Clancy, “Learning a physical layer scheme for the MIMO interference channel,” in IEEE Int. Conf. Commun. (ICC), May 2018, pp. 1–5.
[27] T. J. O'Shea, T. Roy, and T. C. Clancy, “Over-the-air deep learning based radio signal classification,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 168–179, Feb. 2018.
[28] A. Felix, S. Cammerer, S. Dörner, J. Hoydis, and S. ten Brink, “OFDM-autoencoder for end-to-end learning of communications systems,” arXiv preprint arXiv:1803.05815, 2018.
[29] C. Wen, W. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018.
[30] H. He, C. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
[31] T. Wang, C. Wen, S. Jin, and G. Y. Li, “Deep learning-based CSI feedback approach for time-varying massive MIMO channels,” IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 416–419, Apr. 2019.
[32] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” arXiv preprint arXiv:1706.01151, 2017.
[33] M. Kim, N. Kim, W. Lee, and D. Cho, “Deep learning-aided SCMA,” IEEE Commun. Lett., vol. 22, no. 4, pp. 720–723, Apr. 2018.
[34] S. Xue, Y. Ma, N. Yi, and R. Tafazolli, “Unsupervised deep learning for MU-SIMO joint transmitter and noncoherent receiver design,” IEEE Wireless Commun. Lett., vol. 8, no. 1, pp. 177–180, Feb. 2019.
[35] K. Kim, J. Lee, and J. Choi, “Deep learning based pilot allocation scheme (DL-PAS) for 5G massive MIMO system,” IEEE Commun. Lett., vol. 22, no. 4, pp. 828–831, Apr. 2018.
[36] H. Ye, G. Y. Li, and B. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 114–117, Feb. 2018.
[37] H. Kim, Y. Jiang, S. Kannan, S. Oh, and P. Viswanath, “Deepcode: Feedback codes via deep learning,” in Advances in Neural Inf. Process. Syst., 2018, pp. 9436–9446.
[38] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[39] A. D. Stefano, O. Mirabella, G. D. Cataldo, and G. Palumbo, “On the use of neural networks for Hamming coding,” in IEEE Int. Symp. Circuits and Systems, vol. 3, Jun. 1991, pp. 1601–1604.
[40] L. Wu, Z. Zhang, and H. Liu, “Adaptive modulation with finite rate feedback for QR decomposition-successive interference cancellation-based multiple-in multiple-out systems,” IET Commun., vol. 7, no. 5, pp. 456–462, Mar. 2013.
[41] X. Chen, L. Wu, Z. Zhang, J. Dang, and J. Wang, “Adaptive modulation and filter configuration in universal filtered multi-carrier systems,” IEEE Trans. Wireless Commun., vol. 17, no. 3, pp. 1869–1881, Mar. 2018.
[42] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[43] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Woźniak, “Ensemble learning for data stream analysis: A survey,” Information Fusion, vol. 37, pp. 132–156, 2017.
[44] X. Chen, J. Cheng, Z. Zhang, L. Wu, J. Dang, and J. Wang, “Data-rate driven transmission strategies for deep learning based communication systems,” IEEE Trans. Commun., vol. 68, no. 4, pp. 2129–2142, Apr. 2020.
[45] T. Erseghe, “On the evaluation of the Polyanskiy-Poor-Verdú converse bound for finite block-length coding in AWGN,” IEEE Trans. Inform. Theory, vol. 61, no. 12, pp. 6578–6590, Dec. 2015.
[46] X. Chen, Source Code. [Online]. Available: https://github.com/EveraChen/GDR-for-autoencoder

CORRECTIONS TO “DATA-RATE DRIVEN TRANSMISSION STRATEGIES FOR DEEP LEARNING BASED COMMUNICATION SYSTEMS”

In [44], the simulation results are obtained by using the BatchNormalization layer in the TensorFlow framework. The BatchNormalization was used to enforce the average power constraint on each element $x_j$ of the transmitted signal $\mathbf{x} = [x_1, \ldots, x_n]^T$. However, in the testing phase, it was found that the trainable parameters of the BatchNormalization layer were not updated from the training results, which led to unbounded transmission power. Thus, the results in Fig. 8 (a) and Fig. 12 are incorrect, because they did not satisfy the converse bounds under an average power constraint over the additive white Gaussian noise channel [45].

With correct normalization (e.g., l2 normalization), Fig. 8 (a) and Fig. 12 in [44] should appear as Fig. 14 and Fig. 15, respectively.

Fig. 14. Simulated BLER for the autoencoder employing different data representations, when the training SNR is 10 dB. (Axes: BLER versus SNR (dB). Curves: One-hot: M=8, m=1; GDR: M=8, m=2; GDR: M=8, m=3; GDR: M=8, m=4.)

Fig. 15. Simulated BLER for the communication system employing the trained autoencoder having different fixed training SNRs and training SNR set. (Axes: BLER versus SNR (dB), from -10 dB to 20 dB. Curves: SNR_T = -10, 0, 10, 20, 30 dB and SNR_T ∈ [0, 30] dB.)