MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders
Kumar Yashashwi*, Deepak Anand*, Sibi Raj B Pillai*, Prasanna Chaporkar*, K Ganesh†
*Department of Electrical Engineering, Indian Institute of Technology Bombay, India
†Manufacturing & Supply Chain Center of Competence, McKinsey & Company, India
Email: *{kryashashwi, deepakanand, bsraj, chaporkar}@ee.iitb.ac.in, †k [email protected]

Abstract—In this paper, we propose a low latency, robust and scalable neural net based decoder for convolutional and low-density parity-check (LDPC) coding schemes. The proposed decoders are demonstrated to have bit error rate (BER) and block error rate (BLER) performances on par with the state-of-the-art neural net based decoders while achieving more than 8 times higher decoding speed. The enhanced decoding speed is due to the use of a convolutional neural network (CNN) as opposed to the recurrent neural network (RNN) used in the best known neural net based decoders. This contradicts the existing doctrine that only RNN based decoders can provide a performance close to the optimal ones. The key ingredient of our approach is a novel Mixed-SNR Independent Samples based Training (MIST), which allows for training of the CNN with only 1% of the possible datawords, even for block lengths as high as 1000. The proposed decoder is robust: once trained, the same decoder can be used for a wide range of SNR values. Finally, in the presence of channel outages, the proposed decoders outperform the best known decoders, viz. the unquantized Viterbi decoder for convolutional codes, and belief propagation for LDPC codes. This gives the CNN decoder a significant advantage in 5G millimeter wave systems, where channel outages are prevalent.
Index Terms—Deep learning, channel decoding, machine learning, neural net decoders, 5G mmWave
I. INTRODUCTION
Efficient communication of messages over a noisy channel is governed by information theoretic principles. While communication efficiency, measured in terms of the successful transmission rate, can be optimized using careful code design, computational efficiency is also of utmost importance in modern wireless devices. In fact, thoughtful design of encoding and decoding schemes such as convolutional, turbo, LDPC and polar codes has successively pushed the operational rates closer to the maximum possible limit, known as the Shannon capacity of the channel [1]. In conjunction with iterative decoding, the computational complexity also stays within manageable limits, making these schemes good candidates for current and future wireless standards. Recently, new decoding schemes based on deep learning have been shown to have good performance for decoding polar codes [2]. This was later extended to turbo and convolutional codes [3].

The most attractive feature of a learning based decoder is that it provides a kind of universality to the decoding scheme, i.e. the same network architecture can be properly trained to decode on a variety of channels. This leads to a certain ease of implementation, which along with parallel processing can also bring hardware advantages. Another significant advantage of learning based methods over conventional decoding approaches is that they can be made more agnostic to the exact statistics of the underlying channel. More specifically, while variations from the designed system parameters can be harmful even for the optimal MAP decoder, a learning based decoder is more robust against channel variations [4]. In other words, the network can also be trained to guard against channel variations, albeit at the expense of some training cost, a one-time expenditure. In communication systems, deep learning has also been shown to be highly useful in tasks such as estimating channel state information (CSI) [5], noise parameter estimation [6] and modulation recognition [7]. Designing neural net based decoders for convolutional and LDPC coding schemes is the main aim of the current paper.

Some references to the prior work are in order here. In [8], a deep learning based auto-encoder strategy for designing an end-to-end communication system was proposed for low block lengths. In fact, an error correction performance comparable to that of a ( , , ) Hamming code was demonstrated, under suitable assumptions on the channel model. [6] proposes a CNN based method to estimate noise parameters and use it in tandem with a belief propagation decoder to reduce the decoding error rates for LDPC codes. Short BCH codes are decoded using RNN based methods in [9]. In [2], a Multi-Layer Perceptron (MLP) based method for decoding polar codes is proposed. This work is extended in [10], where decoders for polar codes using CNN and RNN are proposed, and their accuracy and time complexity are compared with the MLP. It is shown that though the RNN based decoder has worse time complexity, it achieves the lowest error rates among the deep learning based methods. This work also argues that the achieved performance can be close to optimal only when at least 90% of the codewords are used for training. Furthermore, the CNN decoding performance was shown to be inferior even with this disproportionate training requirement. Clearly, such training constraints are only feasible for low block lengths like 32. In their seminal paper, Kim et al. have proposed an RNN based decoder for recursive convolutional codes [3].
The proposed decoder achieves close to optimal performance and can also scale to higher block lengths. The training was done using a sufficiently large, yet computationally feasible, set of input codewords. In spite of these desirable characteristics, the decoder in [3] has high decoding latency that limits its use in practice. While CNNs are known to have low computational latency, their performance so far was perceived to be poor when compared to RNNs. The main question is whether one can realize a CNN based decoder which can perform on par with RNN decoders, at comparable block lengths. In this paper, we show that the key to building such a CNN is to employ an efficient training mechanism. In fact, our novel training scheme in conjunction with a better network architecture allows for building low latency CNN decoders for both convolutional and LDPC codes.

Figure 1: System model. An encoder maps the dataword m to the codeword x, BPSK modulation yields the symbols b, and the channel output y is fed to the neural net, which produces the estimate m̂.

The key contributions of the paper are:
• A novel Mixed-SNR Independent Samples based Training (MIST) scheme is proposed. With this training scheme, it is shown that deep learning based CNN decoders for convolutional codes can be designed, achieving better BER and BLER performance than that of a hard Viterbi decoder.
• CNN based decoders with MIST are shown to achieve similar performance as the state-of-the-art RNN based decoders for convolutional codes, with less than 1% of the codebook used for training, as against the existing 90% training requirement [10].
• The same neural net architecture can be trained using MIST to decode LDPC codes. In fact, this decoder achieves a better performance than that of the popular bit flipping algorithm [1].
• The proposed method achieves a lower BER compared to both hard and soft decoding techniques for convolutional and LDPC codes when there are random outages in the channel. Random outages are expected to play a key role in 5G systems [11].
• Finally, the MIST procedure scales well with block length. In our experiments using comparable block lengths, it achieves 8 times lower latency than existing state-of-the-art neural net methods, while guaranteeing the same BER and BLER performance.

The rest of the paper is organized as follows: Section II describes the system model. Section III describes the proposed training strategy and neural net architecture. Section IV compares the performance of the proposed method with the RNN decoder and traditional decoding methods. Section V gives concluding remarks.

II. SYSTEM MODEL
The problem of designing an optimal decoder can be formulated as follows. Consider the typical communication system shown in Figure 1, where an l-bit dataword m ∈ {0, 1}^l is encoded to an n-length codeword x ∈ {0, 1}^n. Here n is the block length, and r = l/n is called the rate of the code. Let m ∈ M, where M = {0, 1}^l. The chosen codeword x is modulated to yield the baseband transmitted symbols b. In this paper we assume binary phase shift keying (BPSK) as the digital modulation scheme. The received baseband waveform after sampling yields the observation vector y. We consider an AWGN model in which y is given by

    y = b + z,    (1)

where z is a real zero-mean Gaussian noise vector with covariance matrix σ²I_n, independent of the transmitted symbols. While the above model depicts a block-coding scheme, with a slight abuse of terminology it also applies to non-block coding schemes like convolutional codes. Note that the same system model is considered in previous work on deep learning based decoder design [2], [3], [10]. The decoder's goal now is to find a mapping f* : ℝ^n → M such that

    f*(y) = arg max_{m ∈ M} P(m | y).    (2)

A learning based decoder effectively learns the function f* during the training stage, typically using labelled data comprising input codewords and the corresponding noisy received vectors.
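For concreteness, the channel of equation (1) can be simulated in a few lines. The sketch below is a minimal illustration, not code from the paper: the helper name, the 0 → +1 BPSK mapping convention and the use of NumPy are our assumptions, while setting the noise variance to SNR⁻¹ follows the training setup of Section III-A.

```python
import numpy as np

def awgn_channel(x, snr_linear, rng=None):
    """Simulate y = b + z of equation (1) for a batch of codewords.

    x          : (batch, n) array of code bits in {0, 1}
    snr_linear : SNR as a linear ratio (not dB); the noise variance is
                 SNR^-1 for the unit-energy BPSK symbols assumed here
    """
    rng = rng or np.random.default_rng()
    b = 1.0 - 2.0 * x                  # BPSK mapping (0 -> +1, 1 -> -1)
    sigma = np.sqrt(1.0 / snr_linear)  # per-symbol noise standard deviation
    z = rng.normal(0.0, sigma, size=b.shape)
    return b + z
```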
Next, we describe the proposed method.

III. PROPOSED METHOD

In this section, we explain our training procedure, neural net design, choice of hyperparameters, and the optimization performed for tuning the weights of the neural net. Recall that our objective is to decode convolutional and LDPC codes.

The training procedure plays a key role in determining the performance of a neural net. For most classification problems, conventional training methods partition the given labelled data into three sets, viz. 1) a training dataset, 2) a validation dataset and 3) a testing dataset. The training dataset is first used to train the network, i.e., the weights in the neural net are tuned to minimize an appropriately chosen loss function. Subsequently, the validation data is used to detect over-fitting, following which the hyperparameters of the network are tuned to eliminate the over-fitting. The training and validation steps are typically iterated until the hyperparameters are tuned to yield the desired level of accuracy. Once the neural net is optimized, the results are reported on the testing data.

Note that decoding is effectively a classification problem; thus the three step methodology described above is universally employed for decoding schemes as well (see [3] and [4] for details). However, for a typical classification problem, training is done using a fixed labelled dataset, as obtaining more data is often expensive. On the other hand, for a given encoding rule and channel probability law, the training data for a decoder can be generated easily. We leverage this to propose a training procedure that generates labelled training data on-the-fly. Thus, unlike the classification framework which employs a fixed fraction of the codewords in training, we randomize the set of inputs during the training stages, yielding significant advantages while decoding. Another key difference in our training methodology is the way in which we account for the various SNR values encountered by the decoder. Note that previous works use an appropriately chosen SNR for training, and this does not change during the training process. In contrast, the proposed training method randomly samples an SNR value from a desired range for each training codeword. We call our training strategy Mixed-SNR Independent Sampling based Training (MIST), which is described next.

Before getting into further details, it is worthwhile to note that extensive simulations were conducted before a satisfactory CNN architecture emerged, and the two above-mentioned essential training features of MIST give dramatic performance improvements over conventionally trained CNNs.
A. Mixed-SNR Independent Sampling based Training (MIST)
Let S be the set of SNR values for which a decoder has to be designed. The training is done using batches of valid codewords generated for the coding scheme under consideration. Here, we are interested in convolutional and LDPC codes. The following steps are performed β times to obtain a training batch with β codewords.
• Uniformly sample an SNR value from S and set the noise variance σ² to SNR⁻¹. The noise vector z is then obtained by independently sampling the noise distribution n times.
• Sample a message m from M, generate the codeword x(m), and obtain the output symbols y using (1).

Let the training batch be denoted as y¹, y², ..., y^β, having dimensions β × n. The training batch is fed forward through the neural net, which outputs an l-dimensional vector p̂ⁱ for each input yⁱ. The mean squared error (MSE) between the outputs and the messages is calculated as

    L(m, p̂) = (1 / βl) Σ_{i=1}^{β} Σ_{j=1}^{l} (m_j^i − p̂_j^i)²,    (3)

where m_j^i (p̂_j^i, resp.) denotes the j-th value in the vector mⁱ (p̂ⁱ, resp.). The function L(·, ·) in equation (3) is our loss function, and it is back-propagated to learn the weights of the neural net using the Adam optimizer [2]. Choosing a sufficiently small learning rate for the CNN training is important; however, too small a value will delay the convergence of the learning algorithm. Experimental validation suggests a suitable initial learning rate of 10⁻ . A new training batch is generated in each iteration.

The key differences between the existing training strategies and MIST are as follows:
(a) Existing works fix the training and validation datawords in the beginning, and use the same data in every training iteration. On the contrary, MIST generates the training data randomly using a Monte-Carlo technique in every training iteration.
(b) MIST does not require any separate validation data for hyperparameter tuning, as over-fitting is highly improbable. Rather, as explained in Section III-B, hyperparameters are tuned directly based on the value of the loss function.
(c) Existing works train the network for a single suitable SNR value, and then test on different SNR values. In MIST, each sample in a batch is allowed to have a randomly chosen SNR from the given set S.

Some advantages of MIST are:
(1) Since the storage requirement of the training data is low in MIST, the network can be trained on a larger set of datawords even for large block lengths.
(2) As the training samples vary across training iterations, the neural net actually learns the decoding technique instead of becoming a simple recording and reading algorithm, thus avoiding an overfit to the training data.
(3) As the noise is varied across samples, the neural net gets to see samples corresponding to various SNRs. This helps it tune the decoding function to accommodate varying SNR.
(4) MIST allows a low complexity neural net to perform close to optimal, with significantly less decoding time per message than that of the existing neural net based approaches. A minimal sketch of one MIST training iteration is given below.

Figure 2: Training procedures to decide on hyperparameters for the 3-layer CNN (mean squared error vs. training iterations). Each entry in the legend denotes the kernel size used in all 3 layers followed by the number of kernels in each layer: kernel size 3 with (10,10,10), (10,30,30) and (10,50,50) kernels, and kernel sizes 6, 12 and 24 with (10,50,50) kernels.
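As an illustration of the procedure above, one MIST training iteration can be sketched as follows. This assumes a generic encode(m) function, the awgn_channel helper sketched in Section II, and a Keras-style model compiled with the MSE loss of (3); the function names and the batch handling are ours, not the authors' released code.

```python
import numpy as np

def mist_batch(encode, l, snr_set_db, beta, rng):
    """One MIST training batch: every sample draws its own SNR from S."""
    m = rng.integers(0, 2, size=(beta, l))             # datawords sampled from M
    x = np.stack([encode(row) for row in m])           # (beta, n) codewords
    snr_db = rng.choice(snr_set_db, size=(beta, 1))    # per-sample SNR value
    y = awgn_channel(x, 10.0 ** (snr_db / 10.0), rng)  # noise variance SNR^-1
    return y, m

# Training loop sketch: a fresh batch is generated in every iteration,
# so there is no fixed training set to overfit. Assuming the model was
# compiled with optimizer='adam' and loss='mse' as in equation (3):
#   for _ in range(num_iterations):
#       y, m = mist_batch(encode, l, snr_set_db, beta, rng)
#       model.train_on_batch(y[..., None], m)
```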
Next, we describe the neural net architecture used for decoding.

B. Network Architecture
Among the different deep learning based approaches, we choose CNNs. One-dimensional CNN kernels are capable of learning the decoding algorithm, as they model the sense of sequence necessary for learning dependencies among encoded message bits. Moreover, CNNs have lower training and testing complexity, which is highly desired. The architecture of a CNN is decided by 3 hyperparameters, viz. 1) the number of layers, 2) the number of kernels in each layer, and 3) the kernel size in each layer. We reiterate that extensive computational experiments led to the final working architecture and hyperparameters presented below.

A network architecture with 2 layers was found insufficient to achieve reasonable performance in most of our experiments. While this behaviour changed dramatically with 3 layers, our experiments further showed that increasing the number of layers beyond 3 did not improve the performance by much. In order to choose the number of kernels, we started with lower values and systematically increased the number till reasonable performance was achieved. This is depicted in Figure 2, where the error performance for various kernel choices is listed. While choosing about 50 kernels for the last two layers gave reasonable results, the first layer could operate with just 10 kernels. Increasing the number of kernels beyond 50 did not provide much improvement in the error rate. These comparisons were done for a kernel size of 3. Once the number of kernels is fixed, the kernel size is determined. Since a codeword has a fixed bit sequence, a larger sequence context is required by the CNN to learn the decoding scheme. Thus, using a larger kernel size can help reduce the error further, as illustrated in Figure 2. Notice that when the kernel size is 3, the error is higher compared to the case when the kernel size is 6. While increasing the kernel size to 12 and then 24 decreases the error further, no noticeable improvement was obtained after this. The activation function used in each convolutional layer is the rectified linear unit (ReLU). The convolution is performed by padding with zeros at the signal edges. Since MIST ensures no overfit, dropouts are not used while training the deep network.
Table I: Network architecture for the proposed decoders

    Layer            Output shape
    Conv1 (ReLU)     (β, n, 10)
    Conv2 (ReLU)     (β, n, 50)
    Conv3 (ReLU)     (β, n, 50)
    Dense (Sigmoid)  (β, l)

Figure 3: Average time taken to decode a dataword using the CNN decoder and the Bi-GRU decoder for block lengths 100, 200 and 1000.
It is experimentally verified that adding dropout does not reduce the loss function any further. Once the signal features are extracted by the convolutional layers, they are passed through a dense layer having l neurons with sigmoid activation to get the posterior probability corresponding to each message bit. This posterior probability is quantized to obtain the message bits. The experimentally obtained CNN architecture is given in Table I. The same neural net design is used for decoding both convolutional as well as LDPC codes.
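As a concrete illustration, the architecture of Table I can be expressed in a few lines of Keras. This is a sketch, not the authors' released implementation (linked in Section IV): the kernel counts (10, 50, 50), kernel size 12, zero ("same") padding, ReLU activations and the sigmoid dense output follow the text above, while the Flatten layer and the example sizes are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_decoder(n, l):
    """3-layer 1D CNN decoder following Table I and Section III-B."""
    return tf.keras.Sequential([
        layers.Conv1D(10, 12, padding="same", activation="relu",
                      input_shape=(n, 1)),         # input: received vector y
        layers.Conv1D(50, 12, padding="same", activation="relu"),
        layers.Conv1D(50, 12, padding="same", activation="relu"),
        layers.Flatten(),                          # assumed bridge to the dense layer
        layers.Dense(l, activation="sigmoid"),     # per-bit posterior probabilities
    ])

model = build_decoder(n=100, l=50)  # illustrative block length and dataword size
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
```

Decoded bits are then obtained by quantizing the sigmoid outputs, e.g. (model.predict(y[..., None]) > 0.5).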
IV. RESULTS AND DISCUSSION

We present the BER and BLER curves for the CNN decoder on AWGN channels, plotted against the SNR. The performance is evaluated on 10 samples for each SNR value. Comparisons with analytically derived decoders are performed for both convolutional and LDPC codes.
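The evaluation protocol can be summarized by a short Monte-Carlo harness. The sketch below reuses the hypothetical encode and awgn_channel helpers from the earlier sketches together with a trained Keras model; it is illustrative, not the authors' evaluation code.

```python
import numpy as np

def ber_bler(model, encode, l, snr_db_list, num_words, rng):
    """Monte-Carlo BER/BLER estimates for a trained decoder at each SNR."""
    results = {}
    for snr_db in snr_db_list:
        m = rng.integers(0, 2, size=(num_words, l))
        x = np.stack([encode(row) for row in m])
        y = awgn_channel(x, 10.0 ** (snr_db / 10.0), rng)
        m_hat = (model.predict(y[..., None], verbose=0) > 0.5).astype(int)
        errors = m_hat != m
        results[snr_db] = (errors.mean(),               # BER
                           errors.any(axis=1).mean())   # BLER
    return results
```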
A. Decoding of convolutional codes

Generator polynomials for convolutional codes with good minimum distance properties are readily available [1]. For illustration, the performance on a non-systematic rate-1/2 code is compared against a hard Viterbi decoder (the same code and comparison as in [3]). The code polynomials in octal notation are ( , ). The minimum free distance is 5 for this code [1]. In Figure 4, comparisons using block lengths 100, 200 and 1000 for different SNR values are shown. Here, the CNN decoder is compared with the state-of-the-art RNN-based method proposed in [3]. The architecture in [3] uses 2 bi-directional Gated Recurrent Unit (bi-GRU) layers with batch normalization, followed by a final dense layer with sigmoid activation. So far, the general perception in the literature has been that recurrent layers outperform CNN based decoders; see [3], [4], [10]. However, as evident from Figure 4, the training strategy MIST with a carefully designed CNN architecture achieves the same performance as the RNN based method. Moreover, CNN decoders have nearly 8 times lower decoding latency when compared to RNN decoders, as shown in Fig. 3. The time taken to decode a codeword becomes extremely important when comparing different decoders; the latency gap reported in Fig. 3 persists even at n = 1000. Thus, the proposed CNN based decoder performs on par with the state-of-the-art complex RNN based decoders in terms of BER and BLER, albeit at a much lower latency.

(Code available at https://github.com/kryashashwi/MIST_CNN_Decoder)
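For completeness, a rate-1/2 feed-forward convolutional encoder can be sketched as below. Since the octal generator pair is elided in our source, the defaults shown are illustrative: (7, 5) in octal is a classic non-systematic pair whose free distance is 5, matching the value stated above, but it is our assumption rather than a confirmed detail of the paper.

```python
import numpy as np

def conv_encode(m, gens=(0o7, 0o5)):
    """Rate-1/2 feed-forward convolutional encoder (no termination tail).

    gens : generator polynomials in octal; (7, 5)_8 is an illustrative
           choice with free distance 5. Output bits are interleaved,
           giving n = 2 * len(m).
    """
    K = max(g.bit_length() for g in gens)          # constraint length
    state = np.zeros(K - 1, dtype=int)
    out = []
    for bit in m:
        window = np.concatenate(([bit], state))    # current input + past inputs
        for g in gens:
            taps = [(g >> (K - 1 - i)) & 1 for i in range(K)]
            out.append(int(np.dot(taps, window)) % 2)
        state = window[:-1]                        # shift the register
    return np.array(out)
```

This encoder can be passed directly as the encode argument of the mist_batch and ber_bler sketches above.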
Now that the usefulness of the proposed training method and network architecture is established, we next show the performance of the CNN decoder for LDPC codes.
B. Decoding of LDPC Codes
We consider a rate 1/ ( , ) LDPC code here. The received noisy signal, paired with the original message, is used to train the decoder. The same network architecture described in Table I is used for training. The decoding performance is compared against the bit flipping (BF) method for LDPC codes. In Figure 5, performance comparisons are given for block lengths of 100, 200 and 1000. For both BER and BLER, the CNN decoder outperforms the BF decoder; a sketch of the BF baseline is given below. To further evaluate the decoder performance, we now compare the CNN decoder with soft decoding schemes.
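For reference, the BF baseline can be sketched as follows. This is the textbook hard-decision bit flipping algorithm under an assumed parity-check matrix H; the paper does not provide its BF implementation, so the details below are illustrative.

```python
import numpy as np

def bit_flip_decode(y, H, max_iter=50):
    """Hard-decision bit flipping decoding for an LDPC code.

    y : received real vector (BPSK with the 0 -> +1 mapping assumed earlier)
    H : binary parity-check matrix of shape (n - k, n)
    """
    c = (y < 0).astype(int)                # hard decisions on the channel output
    for _ in range(max_iter):
        syndrome = (H @ c) % 2
        if not syndrome.any():             # all parity checks satisfied
            break
        fails = H.T @ syndrome             # per-bit count of failed checks
        c[fails == fails.max()] ^= 1       # flip the most suspicious bits
    return c
```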
C. Comparison with soft decoding schemes

For comparing the performance on decoding convolutional codes, the unquantized Viterbi decoder is used as a reference. This is presented in Fig. 6a and 6b. Although the CNN decoder outperforms the hard Viterbi decoder, it still has some way to go in reaching the optimal unquantized Viterbi decoder. For LDPC codes, the belief propagation (BP) decoder performs better than the CNN decoder in terms of both BER and BLER, as shown in Fig. 6c and 6d. Thus, more work is required for any existing learning based method, be it CNN or RNN based, to perform at the level of a near-optimal soft decoding scheme. However, this situation changes dramatically when the channel statistics are not exactly known. In particular, for models having channel variations which are not exactly tracked at the receiver, the optimal decoder design is much more complex. An optimal decoder designed for a particular channel law may fare poorly due to mismatched decoding when the encountered channel is different. However, a properly trained CNN scheme can still offer reliability, which we now demonstrate by experiments.
D. Random channel outages
In systems like mmWave 5G links, random channel outages can occur. During an outage, the channel SNR suddenly drops to a lower value before getting restored to its proper level. This results in different noise distributions within the same code block, a condition akin to arbitrarily varying channels in information theory [12]. In 5G systems, the SNR difference can be as high as 20 dB [11]. In order to show the robustness of the CNN decoder, a channel where each symbol independently experiences an outage with probability α is considered; the SNR during an outage is taken as − dB. A sketch of this outage model is given after the figure captions below.

(Footnote: time comparisons were made on a machine running Ubuntu 16.04 with an 8-core Intel i7 CPU.)

Figure 4: BER and BLER comparison of the hard Viterbi decoder, the Bi-GRU decoder and the CNN decoder; panels (a)-(f) show BER and BLER vs SNR for n = 100, 200 and 1000.

Figure 5: BER and BLER comparison of the bit flipping based decoder and the CNN decoder; panels (a)-(f) show BER and BLER vs SNR for n = 100, 200 and 1000.
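The outage channel just described can be simulated per symbol. The sketch below is illustrative: the exact outage SNR drop and the values of α are elided in our source, so the 20 dB default merely mirrors the difference reported for 5G systems in [11].

```python
import numpy as np

def outage_channel(x, snr_db, alpha, outage_drop_db=20.0, rng=None):
    """AWGN channel where each symbol is independently in outage w.p. alpha.

    During an outage the per-symbol SNR drops by outage_drop_db (placeholder
    default; [11] reports SNR differences as high as 20 dB).
    """
    rng = rng or np.random.default_rng()
    b = 1.0 - 2.0 * x                                  # BPSK as before
    in_outage = rng.random(b.shape) < alpha            # i.i.d. outage indicator
    snr = np.where(in_outage, snr_db - outage_drop_db, snr_db)
    sigma = np.sqrt(10.0 ** (-snr / 10.0))             # variance = SNR^-1
    return b + rng.normal(0.0, sigma)
```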
Figure 6: Comparison of CNN decoders with the optimal decoding techniques for the convolutional code (panels (a) and (b)) and the LDPC code (panels (c) and (d)). Here, n = .

Figure 7: BER vs SNR comparison of CNN decoders with the existing decoding techniques for a channel with random outage. Here, n = . Panels (a) and (b) show the BER comparison of the hard Viterbi, unquantized Viterbi and CNN decoders with α as 0.  and 0.5, resp. Panels (c) and (d) show the BER comparison of three LDPC decoders, viz. the CNN decoder, the bit-flipping decoder and the log-domain belief propagation decoder, with α as 0.  and 0.5, resp.

For the convolutional code, the results are presented in Fig. 7a and 7b respectively. Clearly, the CNN decoder outperforms the Viterbi decoder in both cases. Notice that the unquantized Viterbi decoder is optimal for a fixed channel, but becomes mismatched in the presence of unknown channel quality changes. The same experiment is now repeated for LDPC codes. Fig. 7c and 7d compare the CNN decoder with the bit flipping as well as belief propagation methods for the same two values of α. Here, the CNN decoder outperforms the bit flipping method, and it outperforms the belief propagation method when α = 0.5.

V. CONCLUSION
In this paper, we proposed a novel training strategy called MIST, and showed that CNN based neural net decoders can be trained with it for convolutional and LDPC codes. The training method scales to higher block lengths. The proposed decoder matches the state-of-the-art RNN based decoders in performance while providing significantly lower decoding latency. Furthermore, when the channel suffers random outages, the neural net based decoders outperform even the analytically derived decoders which are optimal for a fixed channel law.

Designing decoders for arbitrarily varying channels is left as future work. Improving the CNN architecture to match soft decoding schemes for fixed channels is another open problem of interest.
REFERENCES

[1] S. Lin and D. J. Costello, Error Control Coding. Pearson Education India, 2001.
[2] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, "On deep learning-based channel decoding," in Proc. 51st Annual Conference on Information Sciences and Systems (CISS). IEEE, 2017, pp. 1-6.
[3] H. Kim, Y. Jiang, R. Rana, S. Kannan, S. Oh, and P. Viswanath, "Communication algorithms via deep learning," arXiv preprint arXiv:1805.09317, 2018.
[4] N. Farsad and A. Goldsmith, "Neural network detection of data sequences in communication systems," IEEE Trans. on Signal Processing, vol. 66, no. 21, pp. 5663-5678, 2018.
[5] H. Ye, G. Y. Li, and B.-H. Juang, "Power of deep learning for channel estimation and signal detection in OFDM systems," IEEE Wireless Comm. Letters, vol. 7, no. 1, pp. 114-117, 2018.
[6] F. Liang, C. Shen, and F. Wu, "An iterative BP-CNN architecture for channel decoding," IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 144-159, 2018.
[7] K. Yashashwi, A. Sethi, and P. Chaporkar, "A learnable distortion correction module for modulation recognition," IEEE Wireless Comm. Letters, vol. 8, no. 1, pp. 77-80, 2019.
[8] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Trans. on Cognitive Comm. and Networking, vol. 3, no. 4, pp. 563-575, 2017.
[9] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be'ery, "Deep learning methods for improved decoding of linear codes," IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 119-131, 2018.
[10] W. Lyu, Z. Zhang, C. Jiao, K. Qin, and H. Zhang, "Performance evaluation of channel decoding with deep neural networks," in Proc. IEEE International Conference on Communications (ICC), 2018, pp. 1-6.
[11] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, "Millimeter wave mobile communications for 5G cellular: It will work!," IEEE Access, vol. 1, pp. 335-349, 2013.
[12] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.