On the Construction of G N -coset Codes for Parallel Decoding
Xianbin Wang, Huazi Zhang, Rong Li, Jiajie Tong, Yiqun Ge, Jun Wang
∗Hangzhou Research Center, Huawei Technologies, Hangzhou, China
†Ottawa Research Center, Huawei Technologies, Ottawa, Canada
Emails: {wangxianbin1, zhanghuazi, lirongone.li, tongjiajie, yiqun.ge, justin.wangjun}@huawei.com

Abstract—In this work, we propose a type of G_N-coset codes for parallel decoding. The parallel decoder exploits two equivalent decoding graphs of G_N-coset codes. For each decoding graph, the inner code part is composed of independent component codes that are decoded in parallel. The extrinsic information of the code bits is obtained and iteratively exchanged between the two graphs until convergence. Accordingly, we explore a heuristic and flexible code construction method (information set selection) for various information lengths and coding rates. Compared to the previous successive cancellation algorithm, the parallel decoder avoids the serial outer code processing and enjoys a higher degree of parallelism. Furthermore, a flexible trade-off between performance and decoding latency can be achieved with three types of component decoders. Simulation results demonstrate that the proposed encoder-decoder framework achieves comparable error correction performance to polar codes with a much lower decoding latency.

I. INTRODUCTION
A. Preliminary

G_N-coset codes, as defined by Arıkan in [1], are a class of linear block codes with the generator matrix G_N, an N × N binary matrix defined as

  G_N ≜ F^{⊗n},   (1)

in which N = 2^n and F^{⊗n} denotes the n-th Kronecker power of

  F = [1 0; 1 1].

The encoding process is

  x_1^N = u_1^N G_N,   (2)

where x_1^N ≜ {x_1, x_2, ..., x_N} and u_1^N ≜ {u_1, u_2, ..., u_N} denote the code bit sequence and the information bit sequence, respectively.

An (N, K) G_N-coset code [1] is defined by an information set A ⊂ {1, 2, ..., N}, |A| = K. Its generator matrix G_N(A) is composed of the rows of G_N indexed by A. Thus (2) is rewritten as

  x_1^N = u(A) G_N(A),   (3)

where u(A) ≜ {u_i | i ∈ A}. The key to constructing G_N-coset codes is to properly determine the information set A. RM codes [2] and polar codes [1], two well-known examples of G_N-coset codes, determine A in terms of Hamming weight and sub-channel reliability, respectively; these criteria are referred to as the RM principle and the polar principle.
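The encoding in (1)-(3) can be sketched numerically. The snippet below is a minimal illustration, not the authors' code; the information set and message are chosen arbitrarily, and indices are 0-based (the paper uses 1-based indexing).

```python
import numpy as np

def g_matrix(n):
    """Build G_N = F^{(x)n}, the n-th Kronecker power of F = [[1, 0], [1, 1]]."""
    F = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, F) % 2
    return G

def encode(u_bits, info_set, n):
    """Encode |A| information bits into a length-2^n G_N-coset codeword.

    info_set lists the (0-based) rows of G_N carrying information;
    all other positions of u are frozen to 0, as in Eq. (3).
    """
    N = 2 ** n
    u = np.zeros(N, dtype=np.uint8)
    u[info_set] = u_bits
    return u.dot(g_matrix(n)) % 2

# A (16, 3) toy code; this information set is for illustration only.
x = encode(np.array([1, 0, 1], dtype=np.uint8), [13, 14, 15], 4)
```

Because the code is linear, the all-zero message always maps to the all-zero codeword, which is a quick sanity check on the implementation.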
Fig. 1. For G_N-coset codes, equivalent encoding graphs may be obtained based on stage permutations: (a) Arıkan's original encoding graph [1] and (b) the stage-permuted encoding graph. Each node adds (mod-2) the signals on all incoming edges from the left and sends the result out on all edges to the right.

Polar codes are the first capacity-achieving channel codes [1]. RM codes are proved to achieve the binary erasure channel capacity under the maximum-a-posteriori (MAP) decoding algorithm [2]. Both codes have been adopted for the 5G control channel.
B. Motivations and Contributions
Neither RM codes nor polar codes are designed for parallel decoding. RM codes are only adopted for very short code lengths due to the lack of linear-complexity decoding algorithms. Polar codes exhibit superior performance with successive cancellation (SC) based decoders, but an SC decoder is serial in nature [1], as it requires 2N − 2 time steps for a length-N code.

To seek parallelism on the decoding side, we propose a novel stage-permuted turbo-like decoding framework. As shown in Fig. 1(a), the encoding process of G_N-coset codes can be described by an n-stage encoding graph. Therefore, G_N-coset codes can be considered as concatenated codes, in which the former and latter stages correspond to outer and inner codes, respectively. The inner code parts consist of independent component codes that can be decoded in parallel (the j-th code bit of the i-th inner component code is denoted by x(i, j) in Fig. 1(a)) [3]. In contrast, the outer code parts must be decoded successively, which is the major source of latency of all SC-based decoding algorithms.

Based on the above observation, the proposed algorithm improves decoding parallelism as follows. First, equivalent encoding/decoding graphs (see Fig. 1) of the same G_N-coset code can be obtained by permuting the encoding stages [4]. Second, decoding is performed on each of these equivalent graphs. Within each graph, √N inner component codes of length √N are decoded in parallel, but the outer component codes are not processed. Finally, decoding results from different graphs about the same code bit are exchanged to reach a consensus. Fig. 1(a) and Fig. 1(b) show two equivalent graphs for N = 16. In each decoding graph, the inner code parts consist of component codes of length 4.
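The two-stage view above follows from the Kronecker structure G_N = G_√N ⊗ G_√N: reshaping u row-major into a √N × √N matrix, encoding amounts to one transform along each axis, and each row (or column, depending on the bit-ordering convention) is an independent component code. The following check is our own illustration of this standard identity, not code from the paper:

```python
import numpy as np

F = np.array([[1, 0], [1, 1]], dtype=np.uint8)

def kron_power(M, n):
    """n-th Kronecker power of M over GF(2)."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, M) % 2
    return G

n = 4                                  # N = 16, sqrt(N) = 4, as in Fig. 1
N, r = 2 ** n, 2 ** (n // 2)
G_N, G_r = kron_power(F, n), kron_power(F, n // 2)

u = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0], dtype=np.uint8)
x_direct = u.dot(G_N) % 2

# Two-stage view: reshape u row-major into an r x r matrix U; one half of
# the stages acts along one axis (G_r^T on the left) and the other half
# along the other axis (G_r on the right). Each length-r slice along an
# axis is an independent component code.
U = u.reshape(r, r)
x_two_stage = (G_r.T.dot(U).dot(G_r) % 2).reshape(-1)

assert np.array_equal(x_direct, x_two_stage)
```

The assertion holds for every u, since F^{⊗n} = F^{⊗n/2} ⊗ F^{⊗n/2} and u(A ⊗ B), reshaped row-major, equals AᵀUB.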
Since only the inner code parts are decoded in parallel while the outer code processing is avoided, the proposed decoding algorithm exhibits a higher degree of parallelism.

Furthermore, we propose a new code construction principle (selection of A) for the stage-permuted turbo-like decoding algorithm. In particular, we show that the principle for selecting A is to reduce the code rate of the inner codes. Accordingly, we explore a heuristic code construction that outperforms the RM and polar constructions under the stage-permuted decoder.

C. Related works
In [5], [6], product codes with polar codes as component codes are studied, with the same aim of improving decoding parallelism. As product codes, these codes are constructed from the component codes, which leads to a square number (k²) of information bits. In contrast, we follow Arıkan's G_N-coset code framework [1], which is more general and flexible in two respects. First, the code construction is defined directly by A; it naturally supports "1-bit" fine granularity of information length. Second, stage permutation potentially supports a more flexible framework with richer (n! instead of two) combinations of outer-inner code concatenations. Accordingly, iterative decoding can be performed on at most n! stage-permuted graphs.

II. STAGE-PERMUTED TURBO-LIKE DECODING ALGORITHM
The aforementioned stage-permuted turbo-like decoder is formally described in Algorithm 1. Denote by G the original decoding graph consisting of n stages. There are n! equivalent stage-permuted graphs [4]. Among them, we choose the permuted graph G_π with stage permutation π{1, 2, ..., n} = {n/2 + 1, ..., n, 1, ..., n/2}. This results in a swap between the inner and outer code parts of the original decoding graph G (see Fig. 1). Because the inner code part of G_π is the outer code part of G, by decoding the inner code parts of G and G_π, full information (all parity check functions) about G is exploited.

The decoding algorithm iterates by decoding the two graphs G and G_π alternately (line 3 in Algorithm 1) as follows. For decoding graph G (resp. G_π), the j-th code bit of the i-th inner component code is denoted by x(i, j) (resp. x_π(j, i)).

Algorithm 1
A stage-permuted turbo-like decoder.
Input: the received signal y = {y_i, i = 1 ... N};
Output: the recovered codeword x̂ = {x̂_i, i = 1 ... N};
1:  Initialize L_chan,i ≜ 2y_i/σ² for i = 1 ... N; T_{π,i,j} = 0 ∀i, j; Λ = G;
2:  for iterations t = 1 ... t_max do
3:    Select decoding graph: Λ = (Λ == G) ? G_π : G;
4:    if Λ is G then
5:      for inner component codes i = 1 ... √N (in parallel) do
6:        L^t_{i,j} = L_chan,i+(j−1)√N + α_t T^{t−1}_{π,i,j} for j = 1 ... √N;
7:        T^t_{i,j=1...√N} = SoftDecoder(L^t_{i,j=1...√N});
8:      end for
9:    else
10:     for inner component codes i = 1 ... √N (in parallel) do
11:       L^t_{π,j,i} = L_chan,(i−1)√N+j + α_t T^{t−1}_{j,i} for j = 1 ... √N;
12:       T^t_{π,j=1...√N,i} = SoftDecoder(L^t_{π,j=1...√N,i});
13:     end for
14:   end if
15: end for
16: for inner component codes i = 1 ... √N do
17:   x̂_{i+(j−1)√N} = (L_chan,i+(j−1)√N + T^{t_max}_{i,j} + T^{t_max}_{π,i,j} < 0), for j = 1 ... √N;
18: end for

Take the non-permuted graph G for example: L^t_{i,j} is the log likelihood ratio (LLR) of the code bit x(i, j) in the t-th iteration. Specifically, L^t_{i,j} is the sum of the channel LLR L_chan,i+(j−1)√N and the soft extrinsic information T^{t−1}_{π,i,j} from the previous decoding iteration (line 6). Following the method in [11], the extrinsic information is multiplied by a damping factor α_t to improve performance. The i-th soft-output component decoder, denoted by SoftDecoder(), takes L^t_{i,j}, j = 1, 2, ..., √N as input, and generates extrinsic information T^t_{i,j}, j = 1 ... √N as output (line 7). There are √N inner component codes in each decoding graph, and the √N component decoders can be implemented in parallel. After t_max iterations, the algorithm outputs the hard decisions of the combined LLRs as the recovered code bits.

The decoding algorithm can exploit the parity check functions from both graphs. While decoding each graph, a √N-times parallelism gain is obtained thanks to the fully parallel decoding of the inner component codes. The extrinsic information output of these component codes is iteratively exchanged until reaching a consensus. We will show next that various types of soft-output decoders can be implemented to trade off between performance and decoding latency.

A. Soft-output decoders for inner codes
Since each inner component code is itself a short G_N-coset code, it is feasible to adopt low-complexity SC-based decoders [7], [8] as follows.

• Type-1: The soft-output SC list decoder provides the best performance but has the highest complexity and latency. A Chase-like algorithm [11] is used to generate soft LLR estimates from the decoding paths.

• Type-2: The soft-output SC permutation list decoder runs several permuted SC decoders in parallel. These independent SC decoders require no sorting and are thus faster than the SC list decoder. The same Chase-like soft LLR generation method is used.

• Type-3: The soft cancellation decoder [13] can directly output soft LLRs. It has the smallest complexity and latency.

The above SC-based component decoders imply that the inner component codes could be constructed as polar codes. Specifically, we may decode the inner codes using SC list L decoders (Type-1). A recently proposed SC permutation list decoding method [9], [10] can also be adopted (Type-2): for each inner component code, we perform SC decoding in parallel on L permuted codewords. This does not involve any sorting operations among the list paths, and therefore can further improve the parallelism and reduce the latency within each component code. Either way, for the i-th inner code, we obtain L estimated codewords denoted by x̂_i^l = {x̂^l_{i,1}, x̂^l_{i,2}, ..., x̂^l_{i,√N}} for l ∈ {1, 2, ..., L}. The decoding results are then used to calculate the extrinsic information about the code bits as follows. For each estimated codeword, a mean square error is calculated:

  M_i^l = Σ_{j=1}^{√N} ( (σ²/2) L^t_{i,j} − (1 − 2x̂^l_{i,j}) )².   (4)

Then, inspired by Chase decoding [11], we take M_i^l as the path metric to calculate the soft output T^t_{i,j}:

  T^t_{i,j} = ( min_{l | x̂^l_{i,j}=1} M_i^l − min_{l | x̂^l_{i,j}=0} M_i^l ) / (2σ²).   (5)

When the decoded bits x̂^l_{i,j} are the same in all L estimated codewords, the bit value is very likely to be correct, and the soft output T^t_{i,j} is simply set to a large value.

The soft cancellation decoder (Type-3), proposed in [13], can also be adopted as the inner code decoder. This algorithm produces (extrinsic) reliability information for the estimated code bits in a much simpler way: only soft decisions are made and propagated in the factor graph, following the same scheduling as an SC decoder. It does not need to maintain L list paths as the SC list and SC permutation list decoders do. Therefore, the soft cancellation decoder has a latency similar to that of an SC decoder, and requires much less memory than the two list decoders.

III. CODE CONSTRUCTION PRINCIPLE FOR THE STAGE-PERMUTED DECODER
As discussed, the extrinsic information from the inner code decoders is exchanged between the two graphs during decoding. Therefore, the decoding performance of the inner codes is essential for the overall performance. This requires
specific code constructions (different from both the polar and RM principles), as elaborated in the following example.

Consider a length-16 G_N-coset code consisting of four length-4 component codes, and assume that an outer component code has one information bit. As shown in Fig. 2(a), either the RM or the polar principle would select the last bit as the information bit [12]. As a result, all code bits are unknown and thus regarded as information bits of the inner component codes. In other words, the inner code rates become higher, leading to poorer performance of the corresponding inner component decoders.

In contrast, suppose the third bit is the information bit. As shown in Fig. 2(b), two of the code bits are known bits (set to 0). For the inner component codes, these two code bits become frozen bits and thus reduce the code rate of the inner codes.

This example demonstrates the disadvantage of the RM/polar principle, and illustrates the heuristic behind our code construction algorithm. In the following, we propose an information set selection rule that maximally reduces the inner code rates.

Fig. 2. The optimal code construction under stage-permuted decoding differs from both the polar and RM constructions. For a length-4 outer component code, the polar or RM principle selects the last bit as the information bit, as shown in (a). As a result, all code bits are unknown and are regarded as information bits while decoding the inner codes. Alternatively, if the third bit is the information bit, as shown in (b), two code bits become known bits. This reduces the code rate of the inner codes.

A. Choose information set for K = k²

The construction involves two steps:

1) (√N, k) component codes: since we propose SC-based decoders for the inner codes, the ideal construction is a (√N, k) short polar code. Denote by P = [p_1, p_2, ..., p_√N] the information vector:

  p_i = 1 if the i-th bit is an information bit; p_i = 0 if the i-th bit is a frozen bit.   (6)

2) G_N-coset codes: denote by I the information vector of the stage-permuted G_N-coset code. It can be derived from P as

  I = P ⊗ P.   (7)
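The two construction steps amount to one Kronecker product. A minimal illustration (our own, using NumPy) for the (16, 9) case:

```python
import numpy as np

# Step 1: information vector of a (4, 3) inner polar code (cf. Eq. (8)):
# position 1 is frozen, positions 2-4 carry information.
P = np.array([0, 1, 1, 1], dtype=np.uint8)

# Step 2 (Eq. (7)): Kronecker-square P to obtain the information vector
# of the (16, 9) stage-permuted G_N-coset code.
I = np.kron(P, P)

print(I)  # -> [0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1], i.e. the vector of Eq. (9)
```

Note that the number of information bits is automatically K = k², since the Kronecker product of two 0/1 vectors has (ΣP)² ones.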
For example, consider a (16, 9) stage-permuted G_N-coset code construction. In the first step, we construct a (4, 3) polar code with information vector

  P = [0 1 1 1].   (8)
Fig. 3. Under stage-permuted decoding, the proposed stage-permuted G_N-coset code achieves significantly better performance than the polar and RM constructions. N = 65536 and K = 57121.

Then, the information vector I of the stage-permuted G_N-coset code is obtained as

  I = P ⊗ P = [0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1].   (9)

Compared with the polar and RM constructions, this code construction reduces the perceived coding rates at the inner component decoders. All the inner component codes have the same information vector P. Since P is constructed by the polar principle, the inner component codes are efficiently decoded by SC-based decoders.

With either the polar or the RM principle, several information bits would be allocated to consecutive bit positions at the end of an information block. This would significantly degrade the performance under a stage-permuted turbo-like decoder, because such positions may yield all-rate-1 inner component codes. Fig. 3 provides a numerical simulation result to support this assertion. As expected, our code construction principle, which avoids higher coding rates for the inner codes, achieves better performance than both the polar and RM constructions when the stage-permuted turbo-like decoder is applied.

B. General code construction
To construct an (N, K) code, we first construct an (N, K′) stage-permuted G_N-coset code according to the previous subsection, where K′ ≜ ⌈√K⌉² is the smallest square number no less than K. Then, among the K′ information bit positions, we additionally freeze K′ − K positions.

Unlike the polar principle, which would freeze the K′ − K least reliable bit positions, our heuristic construction reduces the code rates of the inner codes in an iterative way. In each iteration, we freeze one bit position that reduces the inner code rate. This incremental freezing is performed alternately on the original decoding graph G and the stage-permuted decoding graph G_π until K information bit positions are left. The details are given in Algorithm 2, Algorithm 3 and Algorithm 4, and explained as follows.

Algorithm 2
A successive freezing algorithm.
Input: information vector I;
Output: newly-constructed information vector I_o;
  N = length(I), K′ = Σ I, I_o = I;
  for i = 1; i ≤ K′ − K; i = i + 1 do
    if i is odd then
      j = SelectOneBitPositionToFreeze(I_o, G);
    else
      j = SelectOneBitPositionToFreeze(I_o, G_π);
    end if
    I_o(j) = 0;
  end for
Fig. 4. For a length-4 outer component code whose last two bit positions form the information set, the last bit position is an RRBP while the third one is not. After freezing the last bit position, two code bits become frozen bits (as shown in the right graph), which reduces the code rate. In contrast, if the third one is frozen, all the code bits remain unknown (as shown in the left graph).
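The freezing step of Algorithms 3 and 4 can be sketched in Python as follows. This is a best-effort reconstruction under stated assumptions, not the authors' code: indices are 0-based, the helper names are ours, the reliability vector r is taken as an input, and the index arithmetic that is garbled in the listings is reconstructed from context.

```python
import math

def inner_information_vector(I, permuted=False):
    """Sketch of Algorithm 4. Propagates 'unknown' flags through half of the
    encoding stages: after a butterfly (a, b) -> (a XOR b, b), the upper
    output is unknown whenever the lower input is. The result marks which
    inputs of the inner component codes are non-frozen."""
    I_o = list(I)
    N = len(I_o)
    half = int(math.log2(N)) // 2
    n = 2 * half
    stages = range(half, n) if permuted else range(half)
    for s in stages:
        span = 2 ** (s + 1)          # butterfly block size at this stage
        delta = span // 2
        for m in range(N // span):
            for z in range(delta):
                if I_o[m * span + delta + z]:
                    I_o[m * span + z] = 1
    return I_o

def select_bit_to_freeze(I, reliabilities, permuted=False):
    """Sketch of Algorithm 3: among information bits whose freezing turns at
    least one inner-code input into a frozen bit (the RRBPs), pick the one
    whose newly frozen inner position has the least reliability."""
    base = inner_information_vector(I, permuted)
    N = len(I)
    root = int(round(N ** 0.5))
    r_of = {}
    for i in range(N):
        if not I[i]:
            continue
        trial = list(I)
        trial[i] = 0
        gamma = inner_information_vector(trial, permuted)
        diff = [k for k in range(N) if gamma[k] != base[k]]
        if diff:                      # i is an RRBP
            eta = diff[0]
            # map the flat index to a position within one inner component code
            eta = (eta // root) if not permuted else (eta % root)
            r_of[i] = reliabilities[eta]
    return min(r_of, key=r_of.get) if r_of else None
```

On the (16, 9) example of Eq. (9), `inner_information_vector` reproduces one frozen inner input per component code, and `select_bit_to_freeze` picks the RRBP whose induced frozen position is least reliable.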
Firstly, we narrow the candidates down to the rate-reducing bit positions (RRBPs), which have the following property (also illustrated in Fig. 4): when an RRBP u_i is frozen, at least one of the corresponding outer component code bits becomes a known bit, denoted by x_i. From the inner code's perspective, the bit x_i is decoded as a frozen bit and thus reduces the code rate.

Secondly, we freeze only one bit position among the RRBPs in each iteration. When there are multiple RRBPs, we choose the RRBP u_i whose resultant frozen bit x_i in the inner component codes has the least reliability r_i. As a result, the remaining information set of each inner component code is still optimal according to the polar principle, and thus can be efficiently decoded by SC-based decoders. The details are given in Algorithm 3 and Algorithm 4.

With the general algorithm, G_N-coset codes with arbitrary code rates can be constructed. The proposed method is designed such that the performance under the stage-permuted turbo-like decoder is optimized. The heuristic is to reduce the coding rate of the inner codes while maximally preserving their sub-channel reliabilities.

Algorithm 3
SelectOneBitPositionToFreeze(I, Λ)
Input: information vector I, decoding graph Λ;
Output: the index of the bit to freeze;
  N = length(I), Φ = [];
  Γ = InnerInformationVector(I, Λ);
  for i = 1; i ≤ N; i = i + 1 do
    if I(i) is 1 then
      I_i = I, I_i(i) = 0;
      Γ_i = InnerInformationVector(I_i, Λ);
      if Γ_i is not equal to Γ then
        η_i = min{index(Γ_i ≠ Γ)};
        if Λ is G then
          η_i = int((η_i − 1)/√N) + 1;
        else
          η_i = ((η_i − 1) mod √N) + 1;
        end if
        Φ.append({i, r_{η_i}});
      end if
    end if
  end for
  return argmin_i {r_{η_i} ∈ Φ}.

Algorithm 4
InnerInformationVector(I, Λ)
Input: information vector I, decoding graph Λ;
Output: information vector I_o;
  I_o = I;
  if Λ is G then
    i_s = 1;
  else
    i_s = log2(N)/2 + 1;
  end if
  for i = i_s; i < log2(N)/2 + i_s; i = i + 1 do
    N′ = 2^i, Δ = N′/2, M = N/N′;
    for m = 0; m < M; m = m + 1 do
      for z = 1; z ≤ Δ; z = z + 1 do
        if I_o(z + Δ + m·N′) is 1 then
          I_o(z + m·N′) = 1;
        end if
      end for
    end for
  end for
  return I_o.

IV. PERFORMANCE EVALUATION
In this section, we evaluate the performance and parallelism of several coding schemes. The coded symbols are modulated with binary phase-shift keying (BPSK) and transmitted over an additive white Gaussian noise (AWGN) channel.

The proposed G_N-coset codes are decoded by the stage-permuted decoding algorithm with 8 iterations. For the decoding of the inner codes, the SC list 8 decoding algorithm (Type-1), the SC permutation algorithm with 8 permutations (Type-2), and the soft successive cancellation algorithm (Type-3)
Fig. 5. BLER performance in the case of a square number of information bits (N = 65536, K = 57121, BPSK/AWGN). Curves: the proposed construction under the Type-1, Type-2 and Type-3 stage-permuted turbo decoders; Scheme-1: a (65536, 57121+19) polar code under a CRC-aided SC list 8 decoder and a (65536, 57121) polar code under an SC decoder; Scheme-2: 256 parallel (256, 223) polar codes under SC list 8 decoders. Compared to Scheme-2, our scheme achieves significantly better performance; compared to Scheme-1, it achieves comparable error correction performance.
Fig. 6. BLER performance in the case of a general number of information bits (N = 65536, K = 56121, BPSK/AWGN). Curves: the proposed construction under the Type-1, Type-2 and Type-3 stage-permuted turbo decoders; Scheme-1: a (65536, 56121+19) polar code under a CRC-aided SC list 8 decoder and a (65536, 56121) polar code under an SC decoder; Scheme-2: 256 parallel (256, 219) polar codes under SC list 8 decoders. Compared to Scheme-2, our scheme achieves significantly better performance; compared to Scheme-1, it achieves comparable error correction performance.

are evaluated. In the simulations, per-iteration damping factors α_t are applied, with one set of values for the list-based decoders (Type-1 and Type-2) and another for the soft successive cancellation decoder (Type-3).

We first evaluate the code construction with a square number of information bits: we construct (65536, 57121) stage-permuted G_N-coset codes and decode them with all three types of decoding algorithms. We then evaluate the general code construction with N = 65536 and K = 56121.

Polar codes with different configurations are compared as benchmarks. In Scheme-1, the same number of information bits is encoded into a single length-65536 polar code. This long-code configuration obtains more coding gain but incurs a larger decoding latency. In Scheme-2, 256 length-256 short polar codes are encoded and decoded in parallel. This short-code configuration exhibits a degree of parallelism similar to ours, but suffers from a performance loss. The polar codes are decoded by SC decoders and by CRC-aided (19 CRC bits) SC list 8 decoders.

The block error rate (BLER) performances are provided in Fig. 5 and Fig. 6. Compared to Scheme-2, our scheme achieves significantly better performance. Compared to Scheme-1, our scheme achieves comparable error correction performance; however, the decoding latency of our scheme is much smaller, as discussed below.

The decoding latency is evaluated with an ASIC implementation in a 16nm TSMC FinFET technology [14]. The required time steps of these coding schemes are given in Table I.
The comparison demonstrates that our scheme significantly reduces the decoding latency thanks to its high degree of parallelism. Therefore, the proposed G_N-coset codes possess the benefits of both coding gain (comparable to that of Scheme-1) and parallelism (comparable to that of Scheme-2).

Finally, we compare the proposed three types of soft-output component decoders. The Type-1 (SC list) component decoder achieves the best decoding performance but requires the most time steps. With the Type-2 (SC permutation list) and Type-3 (soft cancellation) component decoders, the required time steps can be reduced significantly, at a cost of only 0.3 dB and 0.1 dB performance loss, respectively. The diverse choices of component decoders provide a flexible trade-off between performance and decoding latency to meet the requirements of various communication scenarios.

V. CONCLUSION
We study the construction of G_N-coset codes decoded by a stage-permuted turbo-like decoding algorithm. Through stage permutation, the decoding algorithm can exploit the parity check functions from multiple equivalent factor graphs. Since only the inner code parts are decoded (in parallel) and the outer code processing is avoided, the decoding algorithm exhibits a higher degree of parallelism. Based on this decoding algorithm, we propose a new G_N-coset code construction for arbitrary information lengths and coding rates. The novel encoder-decoder framework is evaluated in terms of both performance and decoding latency. The simulations suggest that the constructed G_N-coset codes achieve error correction performance comparable to polar codes of the same length. The ASIC implementation evaluation verifies that the stage-permuted turbo-like decoding algorithm has a much lower decoding latency.

TABLE I
A COMPARISON OF THE REQUIRED TIME STEPS BETWEEN THE PROPOSED CODING SCHEMES AND POLAR CODES.

Scheme                                        | N     | Rate | Time steps† | Parallelism
Type-1: soft-output SC list as inner decoder  | 65536 | –    | –           | –
CA SC list 8                                  | 65536 | –    | –           | –

†Evaluated in our ASIC implementation [14] with the double-package mode turned off.

REFERENCES

[1] E. Arıkan, "Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051-3073, Jul. 2009.
[2] S. Kudekar, S. Kumar, M. Mondelli, H. D. Pfister, E. Sasoglu, and R. Urbanke, "Reed-Muller codes achieve capacity on erasure channels," IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4298-4316, Feb. 2017.
[3] H. Zhang, J. Tong, R. Li, P. Qiu, Y. Huangfu, C. Xu, X. Wang, and J. Wang, "A flip-syndrome-list polar decoder architecture for ultra-low-latency communications," IEEE Access, vol. 7, pp. 1149-1159, Dec. 2018.
[4] H. Vangala, E. Viterbo, and Y. Hong, "Permuted successive cancellation decoder for polar codes," in Proc. IEEE International Symposium on Information Theory and Applications, pp. 1-5, Oct. 2014.
[5] T. Koike-Akino, C. Cao, Y. Wang, K. Kojima, D. S. Millar, and K. Parsons, "Irregular polar turbo product coding for high-throughput optical interface," in Optical Fiber Communication Conference and Exhibition, Mar. 2018.
[6] V. Bioglio, C. Condo, and I. Land, "Construction and decoding of product codes with non-systematic polar codes." [Online]. Available: https://arxiv.org/abs/1901.06892, 2019.
[7] I. Tal and A. Vardy, "List decoding of polar codes," IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2213-2226, Mar. 2015.
[8] K. Chen, K. Niu, and J. R. Lin, "Improved successive cancellation decoding of polar codes," IEEE Transactions on Communications, vol. 61, no. 8, pp. 3100-3107, Aug. 2013.
[9] M. Kamenev, Y. Kameneva, O. Kurmaev, and A. Maevskiy, "A new permutation decoding method for Reed-Muller codes," in Proc. IEEE International Symposium on Information Theory, pp. 1-5, Jul. 2019.
[10] M. Kamenev, Y. Kameneva, O. Kurmaev, and A. Maevskiy, "Permutation decoding of polar codes." [Online]. Available: https://arxiv.org/abs/1901.05459, 2019.
[11] R. M. Pyndiah, "Near-optimum decoding of product codes: block turbo codes," IEEE Transactions on Communications, vol. 46, no. 8, pp. 1003-1010, Aug. 1998.
[12] S. Kahraman, "Strange attractor for efficient polar code design." [Online]. Available: https://arxiv.org/abs/1708.04167, 2017.
[13] U. U. Fayyaz and J. R. Barry, "Polar codes for partial response channels," in Proc. IEEE International Conference on Communications, Jun. 2013.
[14] X. Liu, Q. Zhang, P. Qiu, J. Tong, H. Zhang, C. Zhao, and J. Wang, "A 5.16Gbps decoder ASIC for polar code in 16nm FinFET," in Proc.