ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning
Yi Wei, Ming-Min Zhao, Min-Jian Zhao, and Ming Lei
The authors are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: { }@zju.edu.cn).
Abstract — Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. In addition, we propose two improved versions of the proposed network. The first one transforms the penalty parameter into a set of iteration-dependent ones, and the second one adopts a specially designed penalty function, which is based on a piecewise linear function with adjustable slopes. Numerical results show that the resulting DL-aided decoders outperform the original ADMM-penalized decoder for various low density parity check (LDPC) codes with similar computational complexity.
Index Terms — ADMM, binary linear codes, channel decoding, deep learning, deep unfolding.
I. INTRODUCTION
The linear programming (LP) decoder [1], which is based on LP relaxation of the original maximum likelihood (ML) decoding problem, is one of many important decoding techniques for binary linear codes. Due to its strong theoretical guarantees on decoding performance, the LP decoder has been extensively studied in the literature, especially for decoding low-density parity-check (LDPC) codes [1], [2]. However, compared with the classical belief propagation (BP) decoder, the LP decoder has higher computational complexity and poorer error-correcting performance in the low signal-to-noise-ratio (SNR) region.

In order to address the above drawbacks, the alternating direction method of multipliers (ADMM) has recently been used to solve the LP decoding problem [3]-[8]. Specifically, the work [3] first presented the ADMM formulation of the LP decoding problem by exploiting the geometry of the LP decoding constraints. The works [4] and [5] further reduced the computational complexity of the ADMM decoder. The authors in [6] improved the error-correcting performance through an ADMM-penalized decoder, where the idea is to make pseudocodewords more costly by adding various penalty terms to the objective function. Moreover, in [7], the ADMM-penalized decoder was further improved by using piecewise penalty functions, and for irregular LDPC codes, the work [8] proposed to modify the penalty term and assign different penalty parameters to variable nodes with different degrees.

Recent advances in deep learning (DL) provide a new direction to tackle tough signal processing tasks in communication systems, such as channel estimation [9], MIMO detection [10] and channel coding [11]-[13]. For channel coding, the work [11] proposed to use a fully connected neural network and showed that the performance of the network approaches that of the ML decoder for very small block codes. Then, in [12], the authors proposed to employ the recurrent neural network (RNN) to improve the decoding performance, or alternatively reduce the computational complexity, of a close-to-optimal decoder of short BCH codes. The work [13] converted the message-passing graph of polar codes into a conventional sparse Tanner graph and proposed a sparse neural network decoder for polar codes.

In this work, we propose to integrate deep unfolding [14] with the ADMM-penalized decoder to improve the decoding performance of binary linear codes. This is the first work to construct a deep network by unrolling ADMM-based decoders for binary linear codes. Different from classical DL techniques such as the fully connected neural network and the convolutional neural network, which essentially operate as a black box, deep unfolding can make full use of the inherent mechanism of the problem itself and utilize multiple training data to improve the performance with lower training complexity [15]. Based on the ADMM-penalized decoder with the cascaded reformulation of the parity check (PC) constraints [2], we propose to construct a learnable ADMM decoding network (LADN) by unfolding the corresponding ADMM iterations, i.e., each stage of LADN can be viewed as one ADMM iteration with some additional adjustable parameters. By following the prototype of LADN, two improved versions are further proposed, which are referred to as LADN-I and LADN-P, respectively. In LADN-I, we propose to transform the original scalar penalty parameter into a series of iteration-dependent parameters, which can improve the convergence, reduce the number of iterations and make the performance less dependent on the initial choice of the penalty parameter. Moreover, in LADN-P, a specially designed penalty function, i.e., a piecewise linear function with adjustable slopes, is introduced into the proposed LADN in order to penalize pseudocodewords more effectively and improve the decoding performance. Essentially, we provide a deep learning-based method to obtain a good set of parameters and penalty functions for the ADMM-penalized decoders. Simulation results demonstrate that the proposed decoders are able to outperform the plain ADMM-penalized decoders with a similar computational complexity.

II. PROBLEM FORMULATION
A. ML Decoding Problem
We consider binary linear codes C of length N, each specified by an M × N PC matrix H. Throughout this letter, we let I ≜ {1, ···, N} and J ≜ {1, ···, M} denote the sets of variable nodes and check nodes of C, respectively, and let d_j represent the degree of check node j. We focus on memoryless binary-input symmetric-output channels, where x = [x_1, ···, x_N]^T with x_i ∈ {0, 1}, i ∈ I, is the codeword transmitted over the considered channel and y is the received signal. Then, the ML decoding problem can be formulated as follows:

min_x  v^T x                                        (1a)
s.t.   [Σ_{i=1}^{N} H_{ji} x_i]_2 = 0,  j ∈ J,      (1b)
       x ∈ {0, 1}^{N×1},                            (1c)

where [·]_2 denotes the modulo-2 operator, and v = [v_1, ···, v_N]^T ∈ R^{N×1} represents the log-likelihood ratio (LLR) vector whose i-th element is defined as

v_i = log( Pr(y_i | x_i = 0) / Pr(y_i | x_i = 1) ),  i ∈ I.   (2)

In particular, v_i can also be viewed as the cost of deciding x_i = 1.

Note that the difficulty of solving problem (1) lies in the PC constraints (1b) and the discrete constraints (1c). In the next subsection, we will review the idea of the cascaded reformulation of the PC constraints proposed in [2] and present the resulting equivalent form of problem (1).
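To make (2) concrete, the following minimal sketch (our illustration, not code from the paper) computes the LLR vector for BPSK transmission over an AWGN channel, assuming the common mapping 0 → +1, 1 → −1 and noise variance σ²:

```python
import numpy as np

def bpsk_awgn_llr(y, sigma2):
    """LLR of (2) for BPSK over AWGN, assuming the mapping 0 -> +1, 1 -> -1.

    Pr(y_i | x_i = 0) ~ N(+1, sigma2) and Pr(y_i | x_i = 1) ~ N(-1, sigma2),
    so log Pr(y|0) - log Pr(y|1) = (-(y-1)^2 + (y+1)^2) / (2*sigma2) = 2*y/sigma2.
    """
    return 2.0 * np.asarray(y) / sigma2

# Example: a received vector at noise variance 0.5; a positive LLR favors x_i = 0.
v = bpsk_awgn_llr([0.9, -1.2, 0.1], 0.5)
```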
B. Cascaded Formulation of PC Constraints

The key to addressing the PC constraints in [3] is to decompose the high-degree check nodes into low-degree ones by introducing auxiliary variables and then recursively employing the three-variable PC transformation, i.e., [x_1 + x_2 + x_3]_2 = 0, x_i ∈ {0, 1}, i ∈ {1, 2, 3}, is transformed into T x ⪯ t, x ∈ {0, 1}^{3×1}, where x = [x_1, x_2, x_3]^T, t = [0, 0, 0, 2]^T and

T = [  1  −1  −1
      −1   1  −1
      −1  −1   1
       1   1   1 ].                                  (3)

In order to express the PC constraints in a more compact form, we define

u = [x^T, x̃^T]^T ∈ {0, 1}^{(N+Γ_a)×1},  b = 1_{Γ_c×1} ⊗ t,  q = [v; 0_{Γ_a×1}],
A = [T Q_1; ···; T Q_τ; ···; T Q_{Γ_c}] ∈ {0, ±1}^{4Γ_c×(N+Γ_a)},   (4)

where Γ_a = Σ_{j=1}^{M} (d_j − 3) and Γ_c = Σ_{j=1}^{M} (d_j − 2) represent the total numbers of auxiliary variables and three-variable PC equations, x̃ = [x̃_1, x̃_2, ···, x̃_{Γ_a}]^T, and Q_τ ∈ {0, 1}^{3×(N+Γ_a)} denotes a selection matrix that chooses the variables in u which are involved in the τ-th three-variable PC equation. Then, we can see that the ML decoding problem (1) is equivalent to the following linear integer programming problem:

min_u  q^T u                                        (5a)
s.t.   A u − b ⪯ 0,                                 (5b)
       u ∈ {0, 1}^{(N+Γ_a)×1}.                      (5c)
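As a quick sanity check of the three-variable transformation, the sketch below (our illustration, using the T and t reconstructed above from [2]) enumerates all binary triples and confirms that T x ⪯ t holds exactly for the even-parity ones:

```python
import itertools
import numpy as np

# Three-variable PC transformation of (3): a binary x satisfies [x1+x2+x3]_2 = 0
# iff T @ x <= t, with T and t as given above.
T = np.array([[ 1, -1, -1],
              [-1,  1, -1],
              [-1, -1,  1],
              [ 1,  1,  1]])
t = np.array([0, 0, 0, 2])

for x in itertools.product([0, 1], repeat=3):
    x = np.array(x)
    even_parity = (x.sum() % 2 == 0)
    satisfies_ineq = bool(np.all(T @ x <= t))
    assert even_parity == satisfies_ineq  # the two characterizations agree
```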
III. LEARNED ADMM DECODER
In this section, we first review the ADMM-penalized decoder for addressing problem (5), and then we present a detailed description of the proposed LADN and its improved versions. Finally, we provide the loss function of the proposed networks, which is essential to achieve better decoding performance.
A. ADMM-Penalized Decoder
The essence of the ADMM-penalized decoder is the introduction of a penalty term into the linear objective of the LP decoding problem, with the intent of suppressing the pseudocodewords. In order to put problem (5) into the standard ADMM framework, an auxiliary variable z is first added to constraint (5b), and consequently, problem (5) can be equivalently formulated as the following optimization problem:

min_{u, z}  q^T u                                   (6a)
s.t.   A u + z = b,                                 (6b)
       u ∈ {0, 1}^{(N+Γ_a)×1},  z ∈ R_+^{4Γ_c×1}.   (6c)

Next, the discrete constraint (6c) is relaxed to u ∈ [0, 1]^{(N+Γ_a)×1} and we penalize the pseudocodewords using penalty functions that make integral solutions more favorable than fractional solutions, which leads to the following problem:

min_{u, z}  q^T u + Σ_i g(u_i)
s.t.   A u + z = b,  u ∈ [0, 1]^{(N+Γ_a)×1},  z ∈ R_+^{4Γ_c×1}.   (7)

In (7), g(·): [0, 1] → R ∪ {±∞} is the introduced penalty function, e.g., the L1 or L2 function used in [6]. The augmented Lagrangian function of problem (7) can be formulated as

L_µ(u, z, y) = q^T u + Σ_i g(u_i) + y^T (A u + z − b) + (µ/2) ||A u + z − b||_2^2,   (8)

where y ∈ R^{4Γ_c×1} denotes the Lagrangian multiplier and µ represents a positive penalty parameter. Then, the iterations of ADMM can be written as

u^{k+1} = arg min_{u ∈ [0,1]^{(N+Γ_a)×1}}  L_µ(u, z^k, y^k),   (9a)
z^{k+1} = arg min_{z ∈ R_+^{4Γ_c×1}}  L_µ(u^{k+1}, z, y^k),    (9b)
y^{k+1} = y^k + µ (A u^{k+1} + z^{k+1} − b).                    (9c)

Since A has orthogonal columns, A^T A is a diagonal matrix and the variables in (9a) are separable. Therefore, step (9a) can be conducted by solving the following N + Γ_a parallel subproblems:

min_{u_i}  (µ/2) e_i u_i^2 + g(u_i) + ( q_i + a_i^T (y^k + µ(z^k − b)) ) u_i,   (10a)
s.t.   u_i ∈ [0, 1],  i ∈ {1, ···, N + Γ_a},                                    (10b)

where a_i denotes the i-th column of A and e = diag(A^T A) = [e_1, ···, e_{N+Γ_a}]. With a well-designed penalty function g(·), (10a) is guaranteed to be convex and the optimal solution of problem (10) can be easily obtained by setting the gradient of (10a) w.r.t. u_i to zero and then projecting the resulting solution onto the interval [0, 1]. Similarly, the optimal solution of problem (9b) can be obtained by

z^{k+1} = Π_{[0,+∞]^{4Γ_c}} ( b − A u^{k+1} − y^k / µ ),   (11)

where Π_{[0,+∞]}(·) denotes the Euclidean projection operator which projects the resulting solution onto the interval [0, +∞).

To summarize, the ADMM-penalized decoder iterates over the three steps (9a)-(9c), and the final estimated codeword x̂ is obtained by x̂ = Π_{{0,1}}([u_1, ···, u_N]), where Π_{{0,1}}(s) = 0 if s < 0.5, and Π_{{0,1}}(s) = 1 otherwise.

Fig. 1: The structure of the proposed LADN.
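For concreteness, a minimal NumPy sketch of one iteration (9a)-(9c) is given below. It is our illustration rather than the authors' reference code, and the closed-form u-update assumes the L2 penalty g(u) = −α(u − 0.5)², whose explicit solution is given as (12) in the next subsection:

```python
import numpy as np

def admm_l2_iteration(u, z, y, A, b, q, mu, alpha):
    """One iteration (9a)-(9c) of the ADMM L2 decoder (a sketch, not the
    authors' implementation). Vectors u, z, y, b, q are 1-D NumPy arrays."""
    e = np.sum(A * A, axis=0)                       # e = diag(A^T A)
    # (9a): per-coordinate closed-form minimizer, then projection onto [0, 1]
    c = q + A.T @ (y + mu * (z - b))
    u = np.clip((c + alpha) / (2.0 * alpha - mu * e), 0.0, 1.0)
    # (9b)/(11): project the unconstrained minimizer onto the nonnegative orthant
    z = np.maximum(b - A @ u - y / mu, 0.0)
    # (9c): multiplier (dual) update
    y = y + mu * (A @ u + z - b)
    return u, z, y

# After the last iteration, the hard decision on the first N coordinates is:
# x_hat = (u[:N] >= 0.5).astype(int)
```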
B. The Proposed LADN
Unfolding a well-understood iterative algorithm (also known as deep unfolding) is one of the most popular and powerful techniques to build a model-driven DL network. The resulting networks have been shown to outperform their baseline algorithms in many cases, such as LAMP [16], ADMM-Net [17] and LcgNet [10], etc. Based on the aforementioned ADMM-penalized decoder, we construct our LADN by unfolding the iterations of (9a)-(9c) and regarding µ and the coefficient in g(·) as learnable parameters, i.e., {α, µ}. Note that training a single parameter α or µ also helps to improve the baseline ADMM L2 decoder; however, the performance gain is inferior to that achieved by LADN with the two parameters learned jointly.

For the purpose of illustration, we consider the L2 penalty function here, whose definition is given by g_{L2}(u) = −α(u − 0.5)^2, where α is the coefficient that controls the slope of g_{L2}(·). Then, the solution of problem (10) can be explicitly obtained by

u_i^{k+1} = Π_{[0,1]} ( ( q_i + a_i^T (y^k + µ(z^k − b)) + α ) / ( 2α − µ e_i ) ).   (12)

For convenience, the ADMM-penalized decoder with the L2 penalty function is referred to as the ADMM L2 decoder in the following.

The proposed LADN is defined over a data flow graph based on the ADMM iterations, which is depicted in Fig. 1. The nodes in the graph correspond to different operations in the ADMM L2 decoder, and the directed edges represent the data flows between these operations. LADN consists of K stages, each with the same structure, and the k-th stage corresponds to the k-th iteration of the ADMM-penalized decoder. Each stage includes three nodes, i.e., the u-node, the z-node and the y-node, which correspond to the updating steps in (9a)-(9c). Let (u^{(k)}, z^{(k)}, y^{(k)}) denote the outputs of these nodes in the k-th stage; the detailed steps when calculating (u^{(k)}, z^{(k)}, y^{(k)}) can be expressed as (also shown in Fig. 2)

u^{(k+1)} = Π_{[0,1]} ( η ⊙ ( q + A^T (y^{(k)} + µ(z^{(k)} − b)) + α ) ),   (13a)
z^{(k+1)} = ReLU ( b − A u^{(k+1)} − y^{(k)} / µ ),                         (13b)
y^{(k+1)} = y^{(k)} + µ ( A u^{(k+1)} + z^{(k+1)} − b ),                    (13c)

where η ∈ R^{N+Γ_a} is the output of the function η(A; α, µ) ≜ diag( (2α I − µ A^T A)^{−1} ), i.e., η_i = 1/(2α − µ e_i), and the symbol ⊙ represents the Hadamard product. Besides, ReLU(·) denotes the classical activation function in the DL field, i.e., ReLU(x) = max(x, 0), which is equivalent to the projection operation Π_{[0,+∞]}(·) in (11). The parameters {µ, α} in (13) are considered as learnable parameters to be trained and the final decoding output of the proposed network, i.e., x̂, can be obtained by x̂ = Π_{{0,1}}([u_1^{(K)}, ···, u_N^{(K)}]).

Fig. 2: The k-th stage of LADN with learnable parameters α and µ.
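A possible TensorFlow realization of one unfolded stage (13a)-(13c) with trainable {α, µ} is sketched below. The class and variable names are illustrative, initial values follow the α = 1, µ = 1.2 setting reported in Section IV, and batching details of the actual implementation may differ:

```python
import tensorflow as tf

class LADNStage(tf.keras.layers.Layer):
    """One unfolded LADN stage implementing (13a)-(13c); alpha and mu are
    shared, trainable scalars (a sketch of the idea, not the authors' code)."""
    def __init__(self, A, b, alpha0=1.0, mu0=1.2):
        super().__init__()
        self.A = tf.constant(A, dtype=tf.float32)
        self.b = tf.constant(b, dtype=tf.float32)
        self.e = tf.reduce_sum(self.A * self.A, axis=0)          # diag(A^T A)
        self.alpha = tf.Variable(alpha0, dtype=tf.float32)
        self.mu = tf.Variable(mu0, dtype=tf.float32)

    def call(self, q, u, z, y):
        eta = 1.0 / (2.0 * self.alpha - self.mu * self.e)
        c = q + tf.linalg.matvec(self.A, y + self.mu * (z - self.b),
                                 transpose_a=True)
        u = tf.clip_by_value(eta * (c + self.alpha), 0.0, 1.0)               # (13a)
        z = tf.nn.relu(self.b - tf.linalg.matvec(self.A, u) - y / self.mu)   # (13b)
        y = y + self.mu * (tf.linalg.matvec(self.A, u) + z - self.b)         # (13c)
        return u, z, y
```

Stacking K such stages and feeding (q, u^{(0)}, z^{(0)}, y^{(0)}) through them reproduces the data flow of Fig. 1, with gradients flowing back to α and µ during training.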
C. LADN-I with Iteration-Dependent Penalty Parameters

In order to improve the performance of LADN, we propose to increase the number of learnable parameters, and the resulting network is referred to as LADN-I. The main idea of this improved network is to transform the penalty parameter µ into a series of iteration-dependent ones. This is based on the intuition that increasing the number of learnable parameters (or the network scale) is able to improve the generalization ability of neural networks. Besides, using possibly different penalty parameters for each iteration/stage can potentially improve the convergence in practice, as well as make the performance less dependent on the initial choice of the penalty parameter. Since the proposed varying-penalty decoder includes the conventional fixed-penalty decoder as a special case, we can infer that LADN-I incurs no loss of optimality in general.

More specifically, we employ µ = [µ_1, ···, µ_K] as learnable parameters, where µ_k denotes the penalty parameter in the k-th stage. With {α, µ}, the decoding steps in (13) can be rewritten as follows:

u^{(k+1)} = Π_{[0,1]} ( η ⊙ ( q + A^T (y^{(k)} + µ_k (z^{(k)} − b)) + α ) ),   (14a)
z^{(k+1)} = ReLU ( b − A u^{(k+1)} − y^{(k)} / µ_k ),                          (14b)
y^{(k+1)} = y^{(k)} + µ_k ( A u^{(k+1)} + z^{(k+1)} − b ).                      (14c)
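In code, the only change with respect to the LADN sketch above is that each stage owns its own trainable penalty parameter, e.g. (illustrative only, with an assumed number of stages K):

```python
import tensorflow as tf

# Sketch of the LADN-I parameterization: one trainable penalty parameter per
# unfolded stage; K is an assumed stage count, and each stage applies
# (14a)-(14c) with mu[k] in place of the shared mu of (13).
K = 50
alpha = tf.Variable(1.0, dtype=tf.float32, name="alpha")
mu = [tf.Variable(1.2, dtype=tf.float32, name=f"mu_{k}") for k in range(K)]
```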
D. LADN-P with Learnable Penalty Function

In this subsection, we propose another improved version of LADN by introducing a novel adjustable penalty function, and the resulting network is named LADN-P. This network is based on the observation that the choice of the penalty function g(·) has a vital impact on the performance of the ADMM-penalized decoder, and designing a good penalty function with the aid of DL can potentially improve the decoding performance. According to [6], g(·) should satisfy the following properties: 1) g(·) is an increasing function on [0, 0.5]; 2) g(u) = g(1 − u) for u ∈ [0, 1]; 3) g(·) is differentiable on [0, 0.5]; 4) g(·) is such that the solution of the u-update problem (10) is well-defined.

Note that an improved piecewise penalty function was proposed in [7] by increasing the slope of the penalty function at the points near u = 0 and u = 1, where the parameters that control these slopes are decided by first choosing a possible set of parameters empirically and then simulating the FER performance for all combinations of the parameters. The number of pieces cannot be large because the search process for these parameters is complex and time-consuming. To address this difficulty, we propose a learnable piecewise linear penalty function whose slope parameters can be obtained by training. Since a piecewise linear function is able to approximate any nonlinear function when the number of pieces is large enough, we can learn a flexible penalty function which is able to outperform the conventional L1 and L2 functions. Besides, by resorting to the power of DL, the corresponding parameters can be trained from data through back propagation, instead of empirical tuning or exhaustive search. The proposed adjustable penalty function is defined piecewise as

g_l(x) = φ_l x + β_l,  c_{l−1} ≤ x < c_l,  l = 1, ···, L,   (15)

where the breakpoints {c_l} partition the interval on which the penalty is defined, the slopes {φ_l}_{l=1}^{L} are learnable parameters, and the intercepts {β_l} are determined such that g(·) is continuous and satisfies the symmetry property 2).

E. Loss Function

Let {v_p, x_p}_{p=1}^{P} denote the set of training samples with size P, where the LLR vector v_p and the transmitted codeword x_p are viewed as the p-th feature and label, respectively. After accepting v_p as input, the proposed networks are expected to predict the x_p that corresponds to this v_p. In the following, we let F_LADN(·) denote the underlying mapping performed by the proposed networks, which satisfies x̂ = F_LADN(v; Θ), and Θ denotes the collection of learnable parameters in the proposed networks, e.g., {α, µ} in LADN, {α, µ_1, ···, µ_K} in LADN-I and {{φ_l}_{l=1}^{L}, µ} in LADN-P.

All learnable parameters in the proposed networks can be optimized by utilizing the training samples {v_p, x_p}_{p=1}^{P} to minimize a certain loss function. For the purpose of improving the decoding performance, we design a novel loss function based on the mean square error (MSE) criterion, whose definition is given by

L(Θ) = (1/P) Σ_{p=1}^{P} ( σ ||A u_p^{(K)} + z_p^{(K)} − b||_2^2 + (1 − σ) ||F_LADN(v_p; Θ) − x_p||_2^2 ).   (17)

As shown in (17), L(·) is composed of the weighted sum of an unsupervised term and a supervised term, i.e., ||A u_p^{(K)} + z_p^{(K)} − b||_2^2 and ||F_LADN(v_p; Θ) − x_p||_2^2, with σ being the weighting factor. More specifically, the unsupervised term measures the power of the residual between A u^{(k)} + z^{(k)} and b. The decoding process is considered to be completed when this residual is sufficiently small, i.e., ||A u^{(k)} + z^{(k)} − b||_2^2 ≤ ε, which also means that constraint (6b) is nearly satisfied. Therefore, employing this residual as part of the loss function helps to accelerate the decoding process. Besides, the supervised term aims to minimize the distance between the network output F_LADN(·) and the true transmitted codeword x_p, which is beneficial for improving the decoding accuracy. Note that in order to learn a decoder which is effective under various numbers of iterations, the proposed loss function takes the outputs of all layers into consideration.
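A minimal sketch of the loss in (17) is given below (our illustration; the tensor shapes and helper names are assumptions, and the default σ = 0.3 corresponds to the C_1 setting reported in Section IV):

```python
import tensorflow as tf

def ladn_loss(u_K, z_K, x_true, x_pred, A, b, sigma=0.3):
    """Loss of (17) for a batch of P samples (a sketch, not the authors' code).

    u_K, z_K : last-stage outputs, shapes [P, N+Gamma_a] and [P, 4*Gamma_c]
    x_true   : transmitted codewords, shape [P, N]
    x_pred   : first N entries of u_K before the hard decision, shape [P, N]
    """
    residual = tf.matmul(u_K, A, transpose_b=True) + z_K - b   # A u + z - b per sample
    unsupervised = tf.reduce_sum(tf.square(residual), axis=-1)
    supervised = tf.reduce_sum(tf.square(x_pred - x_true), axis=-1)
    return tf.reduce_mean(sigma * unsupervised + (1.0 - sigma) * supervised)
```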
IV. SIMULATION RESULTS

In this section, computer simulations are carried out to evaluate the performance of the proposed LADN and its improved versions. All networks are implemented in Python using the TensorFlow library with the Adam optimizer [18]. In the simulations, we focus on the additive white Gaussian noise (AWGN) channel with binary phase shift keying (BPSK) modulation. The considered binary linear codes are the [96, 48] MacKay 96.33.964 LDPC code C_1 and the [128, 64] CCSDS LDPC code C_2 [19].

It is noteworthy that the training SNR plays an important role in generating the training samples. If the training SNR is set too high, very few decoding errors exist and the networks may fail to learn the underlying error patterns. However, if the SNR is too low, only few transmitted codewords can be successfully decoded and this will prevent the proposed networks from learning the effective decoding mechanism. In this work, we set the training SNR to 2 dB, which is obtained by cross-validation [20]. The training and validation data sets contain ... and ... samples, respectively, and the message bits can be all-zeros or randomly generated. The hyper-parameter σ in L(·) is set to 0.3 and 0.9 for C_1 and C_2, respectively, which are chosen by cross-validation [20]. In LADN-P, we set the number of pieces in (15) to L = 10. We employ a decaying learning rate which is set to 0.001 initially and then reduced by half every epoch. The training process is terminated when the average validation loss stops decreasing. Besides, it is important to note that in the offline training phase, the number of stages is fixed to ... and ... for the C_1 and C_2 codes, respectively.

Note that although we have verified by simulations that all-zero codewords can also be used for training, a rigorous proof on whether the proposed decoders satisfy the all-zero assumption [6] (i.e., the symmetry condition) remains an open problem, which is out of the scope of this paper.

According to (9c), (11) and (12), the total computational complexity of the ADMM L2 decoder in each iteration is roughly O(N + Γ_a) real multiplications + O(10(N + Γ_a)) real additions + 2 real divisions. Since LADN (LADN-I) finally performs as the ADMM L2 decoder loaded with the learned parameters {α, µ} ({α, µ_1, ···, µ_K}), its computational complexity is the same as that of the ADMM L2 decoder, which is lower than that of the ML decoder, i.e., O(2^N).

Fig. 3: BLER performance comparison of the C_1 and C_2 codes.

Fig. 3 demonstrates the block error rate (BLER) performance comparison of the BP decoder, the ADMM L2 decoder and the proposed decoding networks. For the ADMM L2 decoder, α and µ are set to 1 and 1.2, respectively. For all the curves, we collect at least 100 block errors at each data point. It can be observed that for both codes, our proposed networks show better BLER performance than the original ADMM L2 decoder in both the low and high SNR regions. The proposed LADN-I achieves the best BLER performance in the low SNR region, and LADN-P outperforms the other counterparts when the SNR is high.
Note that with increasing code length, the training complexity also increases, and it remains to be investigated whether the proposed method can still achieve a noticeable performance gain for longer codes.

Moreover, in Fig. 4, we show the curves of the penalty functions employed in the considered decoders (for the C_1 code) to illustrate their properties, where L2, Learned L2 and Learned PL denote the L2 penalty function in the ADMM L2 decoder, the L2 penalty function with learned parameter α in LADN and the adjustable piecewise linear penalty function in LADN-P, respectively. Note that the absolute values of the slopes of the penalty function |g'_h(u)| at the points near u = 0.5 should be small, due to the fact that a larger slope may prevent the ADMM-penalized decoder from forcing the variables with values near 0.5 to 0 or 1. On the contrary, the values of |g'_h(u)| at the points near u = 0 or 1 should be large. Therefore, empirically, higher-order polynomial functions, e.g., g_h(u) = −α(u − 0.5)^h with even h > 2, are better than the L2 penalty function. However, the solution of problem (10) is not well-defined if these higher-order functions are employed as penalty functions. It can be observed from Fig. 4(a) that the Learned PL function has the largest absolute values of the slopes when u = 0 or 1 and the smallest ones when u = 0.5, compared with those of L2 and Learned L2. Furthermore, from Fig. 4(b), we can see that the curve of the Learned PL function is similar to that of a higher-order (larger than 2) polynomial function; however, the solution of the u-update problem (10) in this case is well-defined since the Learned PL function is composed of linear functions.

Fig. 4: The learned penalty functions and their corresponding derivative functions for the C_1 code.

V. CONCLUSION

In this work, we adopted the deep unfolding technique to improve the performance of the ADMM-penalized decoder for binary linear codes. The proposed decoding network is essentially obtained by unfolding the iterations of the ADMM-penalized decoder and transforming some preset parameters into learnable ones. Furthermore, we presented two improved versions of the proposed network by transforming the penalty parameter into a series of iteration-dependent parameters and introducing a specially designed adjustable penalty function. Numerical results were presented to show that the proposed networks outperform the plain ADMM-penalized decoder with similar complexity.

REFERENCES
[1] J. Feldman, M. J. Wainwright, and D. R. Karger, "Using linear programming to decode binary linear codes," IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 954–972, Mar. 2005.
[2] K. Yang, X. Wang, and J. Feldman, "A new linear programming approach to decoding linear block codes," IEEE Trans. Inf. Theory, vol. 54, no. 3, pp. 1061–1072, Mar. 2008.
[3] S. Barman, X. Liu, S. C. Draper, and B. Recht, "Decomposition methods for large scale LP decoding," IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7870–7886, Dec. 2013.
[4] X. Zhang and P. H. Siegel, "Adaptive cut generation algorithm for improved linear programming decoding of binary linear codes," IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6581–6594, Oct. 2012.
[5] H. Wei and A. H. Banihashemi, "An iterative check polytope projection algorithm for ADMM-based LP decoding of LDPC codes," IEEE Commun. Lett., vol. 22, no. 1, pp. 29–32, Jan. 2018.
[6] X. Liu and S. C. Draper, "The ADMM penalized decoder for LDPC codes," IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 2966–2984, Jun. 2016.
[7] B. Wang, J. Mu, X. Jiao, and Z. Wang, "Improved penalty functions of ADMM penalized decoder for LDPC codes," IEEE Commun. Lett., vol. 21, no. 2, pp. 234–237, Feb. 2017.
[8] X. Jiao, H. Wei, J. Mu, and C. Chen, "Improved ADMM penalized decoder for irregular low-density parity-check codes," IEEE Commun. Lett., vol. 19, no. 6, pp. 913–916, Jun. 2015.
[9] Y. Wei, M.-M. Zhao, M.-J. Zhao, M. Lei, and Q. Yu, "An AMP-based network with deep residual learning for mmWave beamspace channel estimation," IEEE Wireless Commun. Lett., vol. 8, no. 4, pp. 1289–1292, Aug. 2019.
[10] Y. Wei, M.-M. Zhao, M.-J. Zhao, and M. Lei, "Learned conjugate gradient descent network for massive MIMO detection," arXiv:1906.03814, 2019.
[11] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, "On deep learning-based channel decoding," in CISS, Mar. 2017, pp. 1–6.
[12] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be'ery, "Deep learning methods for improved decoding of linear codes," IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 119–131, Feb. 2018.
[13] W. Xu, X. You, C. Zhang, and Y. Be'ery, "Polar decoding on sparse graphs with deep learning," in ACSSC, Oct. 2018, pp. 599–603.
[14] J. R. Hershey, J. L. Roux, and F. Weninger, "Deep unfolding: Model-based inspiration of novel deep architectures," arXiv:1409.2574, Nov. 2014.
[15] A. Balatsoukas-Stimming and C. Studer, "Deep unfolding for communications: A survey and some new directions," arXiv:1906.05774, Jun. 2019.
[16] M. Borgerding, P. Schniter, and S. Rangan, "AMP-inspired deep networks for sparse linear inverse problems," IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4293–4308, Aug. 2017.
[17] Y. Yang, J. Sun, H. Li, and Z. Xu, "Deep ADMM-Net for compressive sensing MRI," in NIPS, 2016, pp. 10–18.
[18] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in ICLR, 2015.