Codes approaching the Shannon limit with polynomial complexity per information bit
Ilya Dumer and Navid Gharavi
University of California, Riverside, USA
Email: [email protected], [email protected]
Abstract
We consider codes for channels with extreme noise that emerge in various low-power applications. Simple LDPC-type codes with parity checks of weight 3 are first studied for any dimension m → ∞. These codes form modulation schemes: they improve the original channel output for any SNR above a fixed threshold, and this gain grows as SNR grows. However, they also have a floor on the output bit error rate (BER) irrespective of their length. Tight lower and upper bounds, which are virtually identical to simulation results, are then obtained for BER at any SNR. We also study a combined scheme that splits m information bits into b blocks and protects each with some polar code. Decoding moves back and forth between polar and LDPC codes, every time using a polar code of a higher rate. For a sufficiently large constant b and m → ∞, this design yields a vanishing BER at any SNR that is arbitrarily close to the Shannon limit of −1.59 dB. Unlike other existing designs, this scheme has polynomial complexity of order m ln m per information bit.

Introduction

In this work, we address code design that protects information transmitted on the AWGN channels with extreme noise. One particularly ubiquitous application is the Internet of Things (IoT). To efficiently employ it, prospective standards [2] are supposed to achieve a 20 dB reduction in snr per channel bit (below, SNR denotes the signal-to-noise ratio per information bit, and notation snr refers to the channel output).

From the theoretical standpoint, we consider binary linear codes C(n, k) of length n → ∞ and dimension k used on the BI-AWGN channels N(0, σ_n²) with noise power σ_n² → ∞. To achieve a fixed signal-to-noise ratio SNR = 1/(σ_n² R_n), these codes must have vanishing code rates R_n of order σ_n^{−2}. Moreover, the fundamental Shannon limit shows that any such code may achieve vanishing BER only if SNR > ln 2 (equivalently, this limit corresponds to 10 log₁₀ ln 2 = −1.59 dB). Existing designs approach this limit with code rates R_n that decline exponentially in code dimension m. In turn, this yields an exponential growth in bandwidth and decoding complexity, both proportional to R_n^{−1}.

For example, biorthogonal codes C(2^{m−1}, m) achieve the Shannon limit; however, their code rate R_n = m/2^{m−1} declines exponentially in m. By contrast, the output word error rates (WER) of these codes experience very slow decline, which is only polynomial in blocklength n. In particular, for low SNR above ln 2, codes C(2^{m−1}, m) have WER [1] bounded from above by

P_m = exp{−m(√SNR − √(ln 2))²}

For a practically important range of
SNR ∈ [1, 2] (which gives the range of [0, 3] dB), long codes C_m – up to billions of bits – still have very high error rates P_m. This is shown below for m = 18 and m = 30.

SNR (dB) | 0 | 1 | 2
P_18     | … | … | …
P_30     | … | … | …

Concatenations of codes C(2^{m−1}, m) with the outer RS codes or AG codes still have similar shortcomings, due to the fact that codes C(2^{m−1}, m) should have length n proportional to σ_n² → ∞. In summary, codes C_m or their concatenations fail to yield acceptable output error rates on the high-noise AWGN channels with SNR of [0, 2] dB for blocks of any practical length n.

As the second example, consider general RM codes or their bit-frozen subcodes. Let W_m be a sequence of the binary symmetric channels BSC(p_m) with transition error probabilities p_m = (1 − ε_m)/2, where ε_m → 0 as m → ∞. It is well known that channels W_m yield a sequence of vanishing capacities C_m ∼ ε_m²/ln 4 as m → ∞. It was proven in [3, 4] that long low-rate RM codes RM(m, r) of order r = o(m) and length n = 2^m approach the maximum possible code rates C_m on channels W_m under the maximum-likelihood (ML) decoding. Even in this case, code rates R_n decline exponentially, of order m^r 2^{−m}, and require exponential decoding complexity.

Consider also the existing low-complexity algorithms known for RM codes [7], [8], [9] or their bit-frozen subcodes [5, 6]. For low SNR, these algorithms either fail or require unacceptably large lists under successive cancellation list (SCL) decoding.

Finally, consider polar codes [12] of rate R_n → 0 used with noise power σ_n² ∼ 1/(2SR_n) for a fixed SNR = S. One construction of such codes is considered in [10]. For σ_n² → ∞, these codes begin with a growing number µ ∼ log₂ σ_n² of upgrading channels and employ long repetition codes B(2^µ, 1) or RM codes C(2^µ, µ + 1). This design again results in a rapid complexity increase as σ_n² → ∞. To advance polar design, it is important to analyze how polar codes of length n → ∞ operate within a vanishing margin ε_n → 0. One recent design achieves WER of 0.002 at the SNR of 2 dB and improves the NB-IoT standard [2] by 1 dB. Another recent design is presented in [13].

Statement 1. There exist codes Ĉ_m of dimension k → ∞ and length O(k²) that have complexity of order O(k² log k) and limit BER to the order of exp{−c_SNR √k}, where c_SNR depends on SNR and is positive for any SNR above the Shannon limit of ln 2.

Statement 1 is predicated on our "weak-independence" assumption discussed in Section 4.
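The two baseline facts quoted above, the ultimate Shannon limit 10 log₁₀ ln 2 ≈ −1.59 dB and the exponential rate decline of biorthogonal codes, can be checked numerically. A small sketch (plain Python; the sample values of m are arbitrary illustrations):

```python
import math

# Ultimate Shannon limit for vanishing-rate codes on the BI-AWGN channel:
# reliable decoding requires SNR > ln 2 per information bit,
# i.e. 10*log10(ln 2) ≈ -1.59 dB.
shannon_limit_db = 10 * math.log10(math.log(2))
print(f"Shannon limit: {shannon_limit_db:.2f} dB")

# Biorthogonal codes C(2^(m-1), m): the rate R_n = m / 2^(m-1) declines
# exponentially in m, so bandwidth and complexity ~ 1/R_n grow exponentially.
for m in (10, 18, 30):
    rate = m / 2 ** (m - 1)
    print(f"m = {m:2d}: R_n = {rate:.3e}, 1/R_n = {1 / rate:.3e}")
```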
Basic construction and its decoding algorithm
Our basic code, which we denote C_m, has generator matrix G_m = [I_m | J_m], where I_m is an m × m identity matrix and J_m is an m × m(m−1)/2 matrix that includes all columns of weight 2. Clearly, n = m(m+1)/2 and k = m. Let a(s) be any codeword generated by s rows of G_m. Note that every row in J_m has weight m − 1, every two rows have a single common 1, and every s ≥ 2 rows have s(s−1)/2 common 1s. Any codeword a(s) that has weight s in I_m has overall weight

w_s = ms − s(s − 1) = s(m − s + 1)   (1)

Thus, code C_m has distance m, which is achieved if s = 1, m.

Let [i, j] = [j, i] denote code positions in G_m, where 0 ≤ i ≠ j ≤ m. Encoder aG_m receives a string a = (a_{0,1}, ..., a_{0,m}) of m information bits and adds m(m−1)/2 parity bits a_{1,2}, ..., a_{m−1,m} such that a_{i,j} = a_{0,i} + a_{0,j}. Note that encoding has complexity O(n).

Let code C_m of rate R = 2/(m+1) be used on an AWGN channel with p.d.f. N(0, σ²) and constant SNR = (σ²R)^{−1} per information bit. In the sequel, it will be convenient for us to use the constant c = 2(SNR). We use a map {0, 1} → {±1} for each transmitted symbol a_{i,j}, where 0 ≤ i ≠ j ≤ m. Then the parity checks a_{i,j} form the real-valued products

a_{0,i} = a_{0,j} a_{i,j}   (2)

Let the all-one codeword 1_n be transmitted. Then the received symbols y_{i,j} ≡ y_{j,i} form independent Gaussian r.v. N(1, σ²). We will use the rescaled r.v. z_{i,j} = δ y_{i,j}, where δ = 1/(σ² + 1) = c/(m + c + 1). It is easy to verify that this scaling gives power moments x = E(z_{i,j}) and σ₀² = E(z²_{i,j}) such that

x = σ₀² = δ   (3)

Given some z_{i,j}, an input a_{i,j} = 1 has posterior probability q_{i,j} ≜ Pr{1 | z_{i,j}} = 1/(exp(−2z_{i,j}) + 1). Decoding algorithm Ψ_soft(z) described below employs two closely related quantities, the log-likelihoods (l.l.h.) h_{i,j} and the "probability offsets" u_{i,j}:

h_{i,j} = ln[q_{i,j}] − ln[1 − q_{i,j}] = 2z_{i,j}   (4)
u_{i,j} = 2q_{i,j} − 1 = tanh(z_{i,j})   (5)

Given the offsets u_{0,j} and u_{i,j} in (2), it is easy to verify that symbol a_{0,i} has offset u_{0,i} = u_{0,j} u_{i,j}. Also, u_{i,j} = tanh(z_{i,j}) = tanh(h_{i,j}/2) has derivatives tanh′(0) = 1 and tanh″(0) = 0 at the origin. Thus, for the vanishing values z_{i,j} → 0,

u_{i,j} = z_{i,j} + o(z_{i,j}) = h_{i,j}/2 + o(h_{i,j})   (6)

Algorithm Ψ_soft performs several steps of belief propagation. Unlike conventional algorithms, we estimate only information bits a_{0,i}. We will show that Ψ_soft requires L ∼ 2 ln m / ln c iterations to achieve the best performance. For every step ℓ = 1, ..., L and every symbol a_{0,i}, consider its j-th parity check a_{0,i} = a_{0,j} a_{i,j} of (2). To re-evaluate a_{0,i}, we use the offset u_{j|ℓ}(i) of the symbol a_{0,j} in this parity check. Then the estimate u_{i,j} u_{j|ℓ}(i) re-evaluates symbol a_{0,i} via the product a_{0,j} a_{i,j}. We then obtain the l.l.h. h_{i|ℓ+1}(j) of the j-th parity check using transforms (4) and (5). Next, the sum of l.l.h. h_{i|ℓ+1}(j) over j gives the compound estimate h_{i|ℓ+1} of the symbol a_{0,i}. Finally, we derive the partial l.l.h. h_{j|ℓ+1}(i) of the symbol a_{0,i} that will be used in the next round to estimate a_{0,j} via its i-th parity check a_{0,j} = a_{0,i} a_{i,j}. This excludes the intrinsic information h_{i|ℓ+1}(j) that symbol a_{0,j} already used in round ℓ. Our recalculations begin with the original estimates u_{i|0}(j) ≜ u_{0,i}. Round ℓ of Ψ_soft is done as follows. For all i, j ∈ {1, ..., m} and j ≠ i:

A. Derive quantities u_{i|ℓ+1}(j) = u_{i,j} u_{j|ℓ}(i) and h_{i|ℓ+1}(j) = 2 tanh^{−1}[u_{i|ℓ+1}(j)].
B. Derive quantities h_{i|ℓ+1} = Σ_j h_{i|ℓ+1}(j) and h_{j|ℓ+1}(i) = h_{i|ℓ+1} − h_{i|ℓ+1}(j).
C. If ℓ < L, find u_{j|ℓ+1}(i) = tanh(h_{j|ℓ+1}(i)/2). Go to A with ℓ := ℓ + 1. If ℓ = L: estimate BER τ_L = (1/m) Σ_i Pr{h_{i|L} < 0}; output numbers h_{i|L} and a_i = sign(h_{i|L}).   (7)

To estimate the complexity of Ψ_soft, note that Step A uses at most n multiplications and n two-way conversions u ↔ h. Step B calculates the sums h_{i|ℓ+1} using m operations for each i. It also requires 2n operations to derive the residual sums h_{j|ℓ+1}(i) and their offsets u_{j|ℓ+1}(i) for all pairs i, j. Then the overall complexity has the order O(n) for every iteration ℓ. Assuming that we have L = O(log m) iterations, we obtain complexity O(n log n).

Output BER of codes C_m

We will now study the output BER of codes C_m. We first show that long codes C_m fail to achieve vanishing BER P_c → 0 for any fixed SNR = c/2. This is similar to the uncoded modulation (UM). Let

Q(x) = (2π)^{−1/2} ∫_x^∞ exp{−y²/2} dy

Assume that the all-one codeword 1_n (formerly, the codeword 0_n in F₂^n) is transmitted and z = (z_{i,j}) is received. Consider the sets of positions I₀ = {(0, j) | j ≠ 0, 1} and I₁ = {(1, j) | j ≠ 0, 1}. For any vector z, we will define the corresponding r.v.

Y₀ = Σ_{j≠0,1} z_{0,j},   Y₁ = Σ_{j≠0,1} z_{1,j}

Below we use asymptotic pdfs as m → ∞. Then r.v. z_{i,j} have asymptotic pdf N(δ, δ). It is also easy to verify that r.v. Z_i = Σ_j z_{i,j}, Y₀, and Y₁ have asymptotic pdf N(c, c). Codewords of minimum weight in C_m include m generator rows g(p), p = 1, ..., m, of the generator matrix G_m and their sum g(0) = g(1) + ... + g(m). Under ML decoding, any two-word code {1_n, g(p)} has BER

P_c = Pr{Y_p < 0} = Q((mδ − δ) / √(m(δ − δ²))) ∼ Q(√c)   (8)

Here we write f(m) ∼ g(m) if lim f(m)/g(m) = 1 as m → ∞. Similarly, we use notation f(m) ≳ g(m) if lim f(m)/g(m) ≥ 1.

Theorem 1. Let codes C_m be used on an AWGN channel with an SNR of c/2 per information bit. Then for m → ∞, ML decoding of codes C_m has BER

p_ML(c) ≳ 2P_c(1 − P_c) = 2Q(√c) − 2Q²(√c)   (9)

Proof.
Without loss of generality, we consider the BER of symbol a_{0,1}. In essence, we prove that ML decoding outputs a_{0,1} = −1 at least as often as it does in the two-word codes {1_n, g(p)}, p = 0, 1. All received vectors z form four disjoint subsets U = A, B, C, D, where

A = {z | Y₀ < 0, Y₁ > 0},  B = {z | Y₀ > 0, Y₁ < 0}   (10)
C = {z | Y₀ > 0, Y₁ > 0},  D = {z | Y₀ < 0, Y₁ < 0}   (11)

Clearly, Pr{A} = Pr{B} = P_c(1 − P_c). We will prove that p_ML(c) ≳ Pr{A} + Pr{B}. Two vectors g(p), p = 0, 1, have supports J_p = {(p, j)}, where j ∈ {0, ..., m} \ {p}. For any z, consider bitwise products g(p)z that flip symbols of z on the supports J_p. Then

g(0)A = C,  g(1)A = D,  g(0)B = D,  g(1)B = C   (12)

Let z be decoded into some a(z) ∈ C_m and let a_{0,1}(z) be the first symbol of a(z). We decompose each set U into

U⁺ = {z ∈ U : a_{0,1}(z) = 1},  U⁻ = {z ∈ U : a_{0,1}(z) = −1}

Note that a(g(p)z) = g(p)a(z). Then

g(0)A⁺ = C⁻,  g(1)A⁺ = D⁻   (13)
g(1)B⁺ = C⁻,  g(0)B⁺ = D⁻

Conditions (12) and (13) show that maps g(0) and g(1) flip full sets U and their subsets U⁺ and U⁻. In the next step, we remove the first symbol a_{0,1} from each vector z and obtain four sets U′ = A′, B′, C′, D′ with a punctured symbol a_{0,1}. Let U′⁺ and U′⁻ denote the punctured subsets of U⁺ and U⁻. Below we show in Lemma 2 that the maps g(0) and g(1) cannot reduce the probability of the sets A′, B′. Namely,

Pr{C′⁻} + Pr{D′⁻} ≥ 2 Pr{A′⁺}   (14)
Pr{C′⁻} + Pr{D′⁻} ≥ 2 Pr{B′⁺}   (15)

Finally, consider p_ML(c) ≡ Σ_U Pr{U⁻}. We then prove in Lemma 3 that removing one bit a_{0,1} has immaterial impact on Pr{U} as m → ∞, so that Pr{U} ∼ Pr{U′}. Then

p_ML(c) = Σ_U Pr{U⁻} ∼ Σ_U Pr{U′⁻}

We can now use (14) and (15), which give

p_ML(c) ∼ Pr{A′⁻} + Pr{B′⁻} + Pr{C′⁻} + Pr{D′⁻} ≥ Pr{A′⁻} + Pr{B′⁻} + Pr{A′⁺} + Pr{B′⁺} = Pr{A′} + Pr{B′}

Thus, we obtain (9). ∎

Lemma 2. Punctured sets U′ = A′, B′, C′, D′ satisfy inequalities (14) and (15).

Proof. Recall that 1_n is the transmitted vector. In this case, the set C has the highest probability among all sets U, while D is the least likely. We can now establish stronger conditions. In essence, we show that the transition A ↦ C (or B ↦ C) produces a greater increase Pr(C) − Pr(A) than the drop Pr(A) − Pr(D) incurred in the transition A ↦ D. We say that any x ∈ A′, B′ is a (θ, ρ) vector if Y₀ = θ, Y₁ = ρ. According to (10), any x ∈ A has θ < 0, ρ > 0, whereas it is vice versa for x ∈ B. Recall that r.v. Y₀, Y₁ have asymptotic pdf N(c, c). (The exact pdf is N(cλ, cλ − cδλ) for some λ → 1.) Consider (θ, ρ)-vectors x ∈ A. On the subset I₀ = {(0, j)}, these vectors x have pdf

p₀(θ) ∼ (2πc)^{−1/2} e^{−(θ−c)²/2c}

For any x, the transform g(0)x only flips symbols x_{0,j}, thus replacing p.d.f. p₀(θ) on the set I₀ with p₀(−θ). This gives the ratio

r₀(θ) = p₀(−θ)/p₀(θ) = e^{−2θ}

The other transform g(1)x of any (θ, ρ)-vector x flips symbols x_{1,j}. Then we obtain the ratio

r₁(ρ) = p₁(−ρ)/p₁(ρ) = e^{−2ρ}

Now we consider two vectors from A⁺, namely, x = x(θ, ρ) and y = y(−ρ, −θ). Then g(0)x ∈ C and g(1)x ∈ D. The same inclusions hold for vector y. Also, both vectors x and y have the same pdf p(x) = p(y) = p generated on the sets I₀ and I₁, since both r.v. Y₀ and Y₁ have the same distribution. We can now estimate the total pdf of vectors g(p)x and g(p)y as follows:

p(g(0)x) + p(g(1)x) = (e^{−2θ} + e^{−2ρ}) p
p(g(0)y) + p(g(1)y) = (e^{2θ} + e^{2ρ}) p

Since exp{−a} + exp{a} ≥ 2, we can reduce the latter equalities to

Σ_{p=0,1} [p(g(p)x) + p(g(p)y)] ≥ 4p

This immediately leads to inequality (14). Inequality (15) is identical if we replace A⁺ with B⁺. Other inequalities of the same kind can be obtained if we consider subsets A′, B′ (or A′⁻, B′⁻). ∎

We now prove that removing position (0, 1) is immaterial for our proof.
Lemma 3. Any set U and its one-bit puncturing U′ satisfy the asymptotic equality Pr{U} ∼ Pr{U′}.

Proof. Note that r.v. z_{0,1} has pdf N(δ, δ), where δ ∼ c/m → 0 as m → ∞, whereas r.v. Y₀ (or Y₁) has asymptotic pdf N(c, c). Let r = √(c/m) ln m and r′ = r ln m. Then with probability tending to 1, we have the following conditions:

z_{0,1} ∈ [−r, r],  Y₀ ∉ [−r′, r′]   (16)

Thus, Pr{z_{0,1}/Y₀ → 0} → 1 as m → ∞. Now we see that the equalities Pr{U} ∼ Pr{U′} hold for any set U or U⁺ or U⁻ as m → ∞. ∎

Probabilistic Bounds for BP decoding
Our next goal is to study the BP algorithm Ψ_soft of (7). We first slightly expand our notation. We say that events U_m hold with high probability if Pr{U_m} → 1 as m → ∞. Let N(a, b) denote the pdf of a Gaussian r.v. that has mean a, variance b, and the second power moment a² + b. Consider a sequence of Gaussian r.v. x_m that have pdf N(a, b_m), where b_m = b(1 + θ_m), b > 0, and θ_m → 0 as m → ∞. Consider also any sequence t_m such that t_m = o(θ_m^{−1/2}). Then Pr{x_m > t_m} ∼ Q((t_m − a) b^{−1/2}) and we write N(a, b_m) ∼ N(a, b).

Consider also r.v. z_{i,j} that have asymptotic pdf N(δ, δ) as m → ∞. Then restriction (16) shows that with high probability z_{i,j} → 0. Then equality (6) shows that u_{i,j} = z_{i,j} + o(z_{i,j}) ∼ z_{i,j}. Thus, we will replace r.v. u_{i,j} in algorithm Ψ_soft with z_{i,j}.

To derive analytical bounds, we will slightly simplify algorithm Ψ_soft and assign the same value h_{i|ℓ+1}(j) = h_{i|ℓ+1} for all j instead of the different assignments h_{j|ℓ+1}(i) := h_{i|ℓ+1} − h_{i|ℓ+1}(j). It can be shown that this change is immaterial for our asymptotic analysis. It also makes only negligible changes even on short blocks. The simplified version of the algorithm Ψ_soft, described below, begins with the initial assignment u_{j|0} = z_{0,j} in round ℓ = 0. We will perform L = 2 ln m / ln c rounds. In round ℓ, Ψ_soft proceeds as follows.

A. Derive quantities u_{i|ℓ+1}(j) = z_{i,j} u_{j|ℓ} and h_{i|ℓ+1}(j) = 2 tanh^{−1}[u_{i|ℓ+1}(j)].
B. Derive quantities h_{i|ℓ+1} = Σ_j h_{i|ℓ+1}(j).
C. If ℓ < L, find u_{i|ℓ+1} = tanh(h_{i|ℓ+1}/2). Go to A with ℓ := ℓ + 1. If ℓ = L: estimate BER τ_L = (1/m) Σ_i Pr{h_{i|L} < 0}; output numbers h_{i|L} and a_{0,i} = sign(h_{i|L}).   (17)

To derive analytical bounds, we will also assume that different r.v. h_{i|ℓ} are "weakly dependent". Namely, we call r.v. ξ₁, ..., ξ_m weakly dependent if for m → ∞ we have the asymptotic equality

E(ξ_i | ξ_{j₁}, ..., ξ_{j_b}) → E(ξ_i)

for any constant b, index i, and any subset J = {j₁, ..., j_b} such that i ∉ J. In particular, we will assume that the conditional moment E(h_{i|ℓ+1} | h_{j₁|ℓ}, ..., h_{j_b|ℓ}) tends to the unconditional moment E(h_{i|ℓ+1}). This assumption does not necessarily hold if b is a growing number. However, in our case, r.v. h_{i|ℓ+1} includes m − 1 terms h_{i|ℓ+1}(j) for all j ≠ i. On the other hand, only one related term h_{j|ℓ}(i) is included in each sum h_{j|ℓ} for any j ∈ J. (Both terms include the same factor u_{i,j} used to evaluate symbols a_{0,i} and a_{0,j} in parity check (2).) The above assumption is also corroborated by the simulation results, which essentially coincide with the theoretical bounds derived below (see Fig. 3, in particular).

Our goal is to derive the BER P_soft(c) = lim τ_L for Ψ_soft as L, m → ∞. Given c > 0, consider the equation

x = (2π)^{−1/2} ∫_{−∞}^{∞} tanh(t√(xc)) e^{−(t−√(xc))²/2} dt   (18)

In Lemma 8, we will show that for c ≤ 1, the only root of (18) is x = 0. For c > 1, (18) has the root x = 0 and two other roots x* and −x*, where x* ∈ (0, 1). For any ℓ = 0, 1, ..., L and any m → ∞, we introduce the parameter c_ℓ = c^{(ℓ+1)/2}. We then derive probabilities P_ℓ using the recursion P_{ℓ+1} = S_ℓ + P_ℓ T_ℓ, where

S_ℓ = (2π)^{−1/2} ∫_{−∞}^{∞} Q(t√c) e^{−(t−c_ℓ)²/2} dt   (19)
T_ℓ = (2π)^{−1/2} ∫_{−∞}^{∞} Q(t√c) (e^{−(t+c_ℓ)²/2} − e^{−(t−c_ℓ)²/2}) dt   (20)

and P₀ = Q(√c). For any ℓ, probabilities P_ℓ depend on c only. We will also show that quantities P_ℓ converge exponentially fast as ℓ → ∞. Let P_∞ = lim_{ℓ→∞} P_ℓ. We can now establish the asymptotic value of BER as m → ∞.

Theorem 4.
Let codes C_m be used on an AWGN channel with an SNR of c/2 per information bit. For m → ∞ and c ≤ 1, algorithm Ψ_soft has BER P_soft(c) → 1/2. For c > 1,

P_soft(c) ∼ (1 − P_∞) Q(√(x*c)) + P_∞ (1 − Q(√(x*c)))   (21)

In Fig. 2 of this section, we plot the analytical bound (21) along with simulation results and the lower bound (9) of ML decoding. We will see that all three bounds of Fig. 2 give very tight approximations. We begin the proof of Theorem 4 with Lemma 5. Here we analyze the sums of r.v. z_j that have asymptotic pdf N(δ, δ) with a small bias δ → 0.

Lemma 5. Consider m independent r.v. z₁, ..., z_m with pdf N(δ, δ), where δ ∼ c/m. Let Z = Σ_j z_j and Y = Σ_j z_j². Then for m → ∞,

E(Z | Y) ∼ E(Z) ∼ c   (22)

Proof.
Consider r.v. ε_j = z_j − δ that have pdf N(0, δ). Let R = Σ_j ε_j². This r.v. has a rescaled χ² distribution that tends to N(c, 2δc) as m → ∞. Next, note that r.v. z_j² and ε_j² are equivalent with high probability. Indeed,

z_j² = ε_j² + 2δε_j + δ² ∼ ε_j²   (23)

Here with high probability we have two events. First, ε_j² ≥ δ/ln m, whereas the terms |2δε_j| and δ² are bounded from above by δ^{3/2} ln m = o(δ/ln m). Thus, z_j² ∼ ε_j² and Y ∼ R as m → ∞. In turn, this implies that r.v. Y has asymptotic pdf N(c, 2δc).

To prove (22), we now may consider the unbiased r.v. ε_j and prove the asymptotic equality

E(Σ_j ε_j | R) ∼ E(Σ_j ε_j) = 0   (24)

Consider any subset S of 2^m unbiased vectors (±ε₁, ..., ±ε_m) that give the same sum R = Σ_j ε_j². Then asymptotic equality (24) holds for each subset S, which proves Lemma 5. ∎

To prove Theorem 4, we will first study r.v. u_{i|ℓ} and their average power moments

x_ℓ = E Σ_i (u_{i|ℓ}/m)   (25)
σ_ℓ² = E Σ_i (u²_{i|ℓ}/m)   (26)

Then r.v. u_ℓ = Σ_i u_{i|ℓ}/m has power moments x_ℓ and σ_ℓ²/m (here we assume that r.v. u_{i|ℓ} are weakly dependent).

In the following statements (Lemmas 6–8 and Theorem 4), we will show that r.v. u_ℓ undergo two different processes as ℓ → ∞. In the initial iterations ℓ = 1, 2, ..., r.v. u_ℓ take vanishing values with high probability as m → ∞. In these iterations, they also may take multiple random walks across the origin. For c < 1 and ℓ → ∞, r.v. u_ℓ converge to 0. By contrast, for c > 1, r.v. u_ℓ gradually move away from the origin in opposite directions, albeit with different probabilities. In the process, r.v. u_ℓ cross 0 with rapidly declining probabilities as ℓ → ∞. They approach two end points, x* and −x*, with probabilities 1 − P_∞ and P_∞, respectively, and converge to these points after ℓ ≳ ln m / ln c iterations. At this point, any r.v. u_{i|ℓ} (that represents a specific bit i) has BER of Q(√(x*c)) or 1 − Q(√(x*c)), respectively. This constitutes bound (21).

We first derive how quantities x_ℓ and σ_ℓ² change in consecutive iterations. Let σ > 0 and −σ ≤ x ≤ σ. Below we use two functions

F_c(x, σ) = (2π)^{−1/2} ∫_{−∞}^{∞} tanh(σt√c) e^{−(t−x√c/σ)²/2} dt   (27)
G_c(x, σ) = (2π)^{−1/2} ∫_{−∞}^{∞} tanh²(σt√c) e^{−(t−x√c/σ)²/2} dt   (28)

Lemma 6.
Let r.v. u_{i|ℓ}, i = 1, ..., m, have average power moments x_ℓ and σ_ℓ² of (25) and (26). Then any r.v. u_{i|ℓ+1} has conditional power moments

E(x_{ℓ+1} | x_ℓ, σ_ℓ) = F_c(x_ℓ, σ_ℓ)   (29)
E(σ²_{ℓ+1} | x_ℓ, σ_ℓ) = G_c(x_ℓ, σ_ℓ)   (30)

Proof. Below we consider r.v. z_{i,j}, Z_i = Σ_j z_{i,j}, and Y_i = Σ_j z²_{i,j}. The proof of Lemma 5 shows that these r.v. have pdfs N(δ, δ), N(c, c), and N(c, 2δc), respectively. For m → ∞, we will use three restrictions, all of which hold with high probability. Firstly, |z_{i,j}| ≤ ∆, where ∆ = 2√δ ln m → 0. Indeed,

Pr{|z_{i,j}| > ∆} ≤ 2Q(2 ln m − √δ) = m^{−2 ln m (1+o(1))}   (31)

Also,

c − √c ln m ≤ Z_i ≤ c + √c ln m   (32)
Y_i ∈ (c − ∆₁, c + ∆₁),  ∆₁ = m^{−1/2} c ln m   (33)

Since z_{i,j} → 0 for all i, j, algorithm Ψ_soft can use the following approximations:

u_{i|ℓ+1}(j) = u_{i,j} u_{j|ℓ} ∼ z_{i,j} u_{j|ℓ}   (34)
h_{i|ℓ+1}(j) = 2 tanh^{−1}[z_{i,j} u_{j|ℓ}] ∼ 2 z_{i,j} u_{j|ℓ}   (35)

Here we assume that r.v. z_{i,j} and u_{j|ℓ} are "weakly dependent". Indeed, any estimate of u_{j|ℓ} is formed by m − 1 terms, only one of which includes z_{i,j}. We then fix the sums Z_i = Σ_j z_{i,j} and consider the conditional r.v. z_{i,j} u_{j|ℓ} | Z_i. Given restrictions (32) and (33), we obtain the moments

E(z_{i,j} u_{j|ℓ} | Z_i) = E(z_{i,j} | Z_i) E(u_{j|ℓ}) = x_ℓ Z_i/m   (36)
D(z_{i,j} u_{j|ℓ} | Z_i) = E(z²_{i,j} | Z_i) E(u²_{j|ℓ}) − (x_ℓ Z_i/m)² ∼ δσ_ℓ²   (37)

Similarly to the proof of Lemma 5, we consider r.v. z²_{i,j} and the sums Z_i to be independent. We also remove the term (x_ℓ Z_i/m)² in (37). Indeed, this term is immaterial since x_ℓ² ≤ σ_ℓ² and (Z_i/m)² ≲ c² m^{−2} ln² m = o(δ), according to (32). In essence, here r.v. z_{i,j} u_{j|ℓ} have negligible means, which yield similar values of the conditional variances D(z_{i,j} u_{j|ℓ} | Z_i) and the second moments E(z²_{i,j} u²_{j|ℓ} | Z_i).

We can now proceed with the r.v. h_{i|ℓ+1} = 2 Σ_j z_{i,j} u_{j|ℓ} that sums up the independent r.v. z_{i,j} u_{j|ℓ} derived in Step B of Ψ_soft. Here we obtain

E(h_{i|ℓ+1} | Z_i) = 2m E(z_{i,j} u_{j|ℓ} | Z_i) ∼ 2 x_ℓ Z_i   (38)
D(h_{i|ℓ+1} | Z_i) = 4m D(z_{i,j} u_{j|ℓ} | Z_i) ∼ 4cσ_ℓ²   (39)

We can now proceed with the r.v. u_{i|ℓ+1} ∼ tanh(h_{i|ℓ+1}/2) used in Step C of Ψ_soft. For a given Z_i, r.v. h_{i|ℓ+1} has Gaussian pdf N(2x_ℓZ_i, 4cσ_ℓ²). By using the variables z ≡ h_{i|ℓ+1}/2 and t = z/(σ_ℓ√c), we obtain (29):

E(u_{i|ℓ+1}) ∼ (2πσ_ℓ²c)^{−1/2} ∫_{−∞}^{∞} tanh(z) e^{−(z−x_ℓc)²/2cσ_ℓ²} dz = (2π)^{−1/2} ∫_{−∞}^{∞} tanh(σ_ℓ t√c) e^{−(t−x_ℓ√c/σ_ℓ)²/2} dt = F_c(x_ℓ, σ_ℓ)   (40)

Similarly, we obtain (30):

E(u²_{i|ℓ+1}) ∼ G_c(x_ℓ, σ_ℓ)   (41)

which completes the proof. ∎

Recall that the original r.v. u_{i|0} have equal power moments x₀ = σ₀² of (3). The following lemma shows that the nonlinear transformations (40) and (41) preserve this equality. It is for this reason that we rescaled the original r.v. y_{i,j} into z_{i,j} to achieve equality (3). Consider the function F_c(x, σ) of (27) for |x| = σ². For any c, this gives the function

R_c(x) = (2π)^{−1/2} ∫_{−∞}^{∞} tanh(t√(|x|c)) e^{−(t−√(|x|c))²/2} dt   (42)

Lemma 7.
For any two quantities x, σ such that |x| = σ² and any c > 0, functions F_c(x, σ) and G_c(x, σ) satisfy the relation

F_c(x, σ) = G_c(x, σ) = R_c(x),  if x ≥ 0
F_c(x, σ) = −G_c(x, σ) = −R_c(x),  if x < 0

Proof. Let x = σ² and r = t√(xc). Then e^{−(t−√(xc))²/2} = e^{r} e^{−t²/2} e^{−xc/2}. Consider the function

f(r) = e^{r} (tanh(r) − tanh²(r)) = 2(e^{r} − e^{−r})/(e^{r} + e^{−r})²

Clearly, f(r) is an odd function of r. Then

F_c(x, σ) − G_c(x, σ) = (2πxc)^{−1/2} e^{−xc/2} ∫_{−∞}^{∞} f(r) e^{−r²/2xc} dr = 0

The case of x < 0 is similar, since F_c(x, σ) is an odd function of x and G_c(x, σ) is an even function. Then we proceed as above. ∎

Lemma 8.
For c ≤ 1, equation (18) has the single solution x = 0. For c > 1, equation (18) has three solutions: x = 0, x* ∈ (0, 1), and −x*.

Proof. Let x > 0. Integration in (42) includes the pdf of N(√(xc), 1), which gives a negligible contribution beyond an interval t ∈ (−x^{−1/4}, x^{−1/4}). For x → 0, we can now limit (42) to this interval. In this case, t√(xc) → 0 and tanh(t√(xc)) ∼ t√(xc). Then

R_c(x) ∼ (2π)^{−1/2} ∫_{−∞}^{∞} t√(xc) e^{−(t−√(xc))²/2} dt = xc   (44)

Thus, the inequality R_c(x) > x holds for sufficiently small x iff c > 1. On the other hand, tanh(t√(xc)) < 1 yields R_c(x) < 1 for all x. Now we see that the functions y = R_c(x) and y = x intersect at some point x* ∈ (0, 1) for any c > 1. Finally, it can be verified that R_c(x) has a declining positive derivative R′_c(x), unlike the constant derivative 1 of the function y = x. Therefore, equation (18) has a single positive solution x*. ∎

In Fig. 1, the function y = R_c(x) is shown for different values of x ∈ [0, 1] and SNR = 10 log₁₀(c/2). The cross-point of the functions y = R_c(x) and y = x represents the root x*. Here the threshold c = 1 corresponds to SNR = −3 dB.

Figure 1: Functions y = R_c(x) and y = x for different values of SNR = 10 log₁₀(c/2) (SNR = −6, −4, −2, 0, 2, 4, 6 dB).

Summarizing Lemmas 6-8, we have
Corollary 9.
Let m → ∞. Then r.v. u_{i|ℓ}, i = 1, ..., m, have power moments x_ℓ and σ_ℓ² that satisfy the equality |x_ℓ| = σ_ℓ² for any iteration ℓ. Iteration ℓ transforms x_ℓ and σ_ℓ² into

|x_{ℓ+1}| = σ²_{ℓ+1} = R_c(x_ℓ)   (45)

Proof of Theorem 4.

1. For c > 1, the function R_c(x_ℓ) grows for positive x_ℓ. Thus, the equality R_c(x_ℓ) = x_ℓ holds iff x_ℓ = x*, where x* is the root of (18). Next, consider the initial iterations ℓ = 0, 1, ... Here r.v. u₀ has pdf N(δ, δ/m) and (with high probability) has vanishing values |u₀| ≤ √(δ/m) ln m. In further iterations ℓ, transform (44) performs simple scaling x_{ℓ+1} ∼ cx_ℓ as long as x_ℓ → 0, m → ∞. Thus, algorithm Ψ_soft fails for c < 1, whereas x_ℓ moves away from 0 for c > 1 after L = ln m / ln c iterations. Note that Pr{u₀ < 0} = Q(√(δm)) ∼ Q(√c).

2. For iterations ℓ = o(L) and m → ∞, we still obtain vanishing moments |E(u_ℓ)| ≲ c^ℓ δ → 0. It can also be verified that E(u_ℓ) moves away from 0 in µ = αL iterations for some α > 0. Note also that r.v. u_ℓ has variance D(u_ℓ) ≤ D(u_{i|ℓ})/m ≤ 1/m. Thus, both cases, u_ℓ → x* or u_ℓ → −x*, hold with high probability as ℓ → ∞.

3. We can now derive the BER for both cases. From (38) and (37), we see that the Gaussian random variable h_{i|ℓ+1} has the moments

E(h_{i|ℓ+1}) ∼ 2x_ℓ E(Z_i) = 2x_ℓ c,  D(h_{i|ℓ+1}) ∼ 4cσ_ℓ²

For any iteration ℓ, we can now estimate the BER p_{i|ℓ+1} = Pr{h_{i|ℓ+1} < 0} as

p_{i|ℓ+1} = Q(x_ℓ c/(σ_ℓ√c)) = { Q(√(x_ℓ c)), if x_ℓ > 0;  1 − Q(√(−x_ℓ c)), if x_ℓ < 0 }   (46)

4. Let P_ℓ = Pr{x_ℓ < 0} and 1 − P_ℓ = Pr{x_ℓ > 0}, which define the conditions of (46).
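As a quick sanity check of the reduction in (46): with σ_ℓ² = |x_ℓ|, the Gaussian moments E(h_{i|ℓ+1}) ∼ 2x_ℓc and D(h_{i|ℓ+1}) ∼ 4cσ_ℓ² indeed collapse Q(E/√D) to Q(√(x_ℓc)). A minimal sketch with purely illustrative values of c and x_ℓ:

```python
import math

def Q(x):
    # Gaussian tail function Q(x) via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

c, x_l = 2.0, 0.3               # illustrative values with x_l > 0
sigma2 = abs(x_l)               # |x_l| = sigma_l^2
mean = 2 * x_l * c              # E(h) ~ 2 x_l c
var = 4 * c * sigma2            # D(h) ~ 4 c sigma_l^2
ber = Q(mean / math.sqrt(var))  # Pr{h < 0}
print(ber, Q(math.sqrt(x_l * c)))  # the two expressions coincide
```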
We will now use two partial distributions of r.v. u (cid:96) that have opposite means ± b (cid:96) , where b (cid:96) = | x (cid:96) | . According to (45), r.v. u i | (cid:96) have the second moment E ( u i | (cid:96) ) = b (cid:96) . Thenr.v. u (cid:96) = (cid:80) i (cid:0) u i | (cid:96) /m (cid:1) has the pdf N ( ± b (cid:96) , η (cid:96) ) with the variance η (cid:96) = (cid:0) b (cid:96) − x (cid:96) (cid:1) /m = b (cid:96) (1 − b (cid:96) ) /m Note that b (cid:96) → x ∗ for (cid:96) > L, whereas η (cid:96) → (cid:96), m → ∞ . Thus, r.v. u (cid:96) cross 0 with a vanishingprobability for any iteration (cid:96) > L. On the other hand, r.v. u (cid:96) may cross 0 multiple times if (cid:96) = o ( L ) . From now on, we take (cid:96) = o ( L ) . Then we will express P (cid:96) +1 via P (cid:96) using the mean b (cid:96) = c (cid:96) δ
5. Consider both distributions $N(x_\ell, \eta_\ell)$, where $x_\ell = \pm b_\ell = \pm c^\ell \delta$. Given some value $u$ of the r.v. $u_\ell$, define the r.v. $u_{\ell+1}|u = m^{-1}\sum_i \bigl(u_{i|\ell+1}\,|\,u\bigr)$. This r.v. has pdf $N(cu, c\eta_\ell)$, whereas the value $u$ itself has pdf
$$p(u) = (2\pi\eta_\ell)^{-1/2}\, e^{-(u - x_\ell)^2/2\eta_\ell}$$
First, let $E(u_\ell) = b_\ell$. Clearly, $\Pr\{u_{\ell+1} < 0 \mid u\} = Q\bigl(u\sqrt{c/\eta_\ell}\bigr)$. Then we average over all values $u$ of $u_\ell$ and obtain the probability
$$S_\ell = \Pr\{u_{\ell+1} < 0 \mid E(u_\ell) = b_\ell\} = \int_{-\infty}^{\infty} Q\bigl(u\sqrt{c/\eta_\ell}\bigr)\, p(u)\, du \sim (2\pi)^{-1/2}\int_{-\infty}^{\infty} Q\bigl(t\sqrt{c}\bigr)\, e^{-(t - b_\ell/\sqrt{\eta_\ell})^2/2}\, dt$$
Here we use the variable $t = u/\sqrt{\eta_\ell}$. Next, we consider the initial iterations $\ell = o(\ln m/\ln c)$ and introduce the parameter
$$C_\ell = b_\ell/\sqrt{\eta_\ell} \sim \sqrt{c^{\ell+1}\big/\bigl(1 - m^{-1}c^{\ell+1}\bigr)} \sim c^{(\ell+1)/2} \quad (47)$$
Note that $b_\ell/\sqrt{\eta_\ell} = C_\ell \sim c^{(\ell+1)/2}$, which gives (19). Similarly, for $E(u_\ell) = -b_\ell$, we obtain the probability
$$Q_\ell = \Pr\{u_{\ell+1} < 0 \mid E(u_\ell) = -b_\ell\} = \int_{-\infty}^{\infty} Q\bigl(u\sqrt{c/\eta_\ell}\bigr)\, p(-u)\, du$$
For $\ell < L = \ln m/\ln c$, this gives the probability
$$P_{\ell+1} = \Pr\{u_{\ell+1} < 0\} = (1 - P_\ell)\, S_\ell + P_\ell\, Q_\ell = S_\ell + P_\ell T_\ell \quad (48)$$
where $T_\ell = Q_\ell - S_\ell$ is given by (20). We can also slightly tighten the estimates (19) and (20) by using the quantity $C_\ell$ of (47) instead of $c^{(\ell+1)/2}$. We can now proceed with the iterations $P_\ell$, which begin with $P_0 = Q(\sqrt{c})$. For any $\ell$, the quantities $S_\ell$ and $T_\ell$ depend on $c$ only. Also, the quantities $C_\ell \sim c^{(\ell+1)/2}$ grow exponentially, in which case $S_\ell \to 0$ and $Q_\ell \to 1$.
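The recursion (48) is easy to iterate numerically. In the sketch below, $S_\ell$ and $Q_\ell$ are computed in the shifted variable $s = t \mp C_\ell$, so that $S_\ell = E_s\,Q\bigl((s + C_\ell)\sqrt{c}\bigr)$ and $Q_\ell = E_s\,Q\bigl((s - C_\ell)\sqrt{c}\bigr)$ for a standard normal $s$; these explicit forms are read off the derivation above, since (19) and (20) themselves lie outside this section, and all parameter choices are illustrative.

```python
import numpy as np
from math import erfc, sqrt

def Qf(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def walk_probabilities(c, iters=10):
    """Iterate P_{l+1} = S_l + P_l * T_l of (48), starting from P_0 = Q(sqrt(c))."""
    s = np.linspace(-10.0, 10.0, 2001)        # standard-normal grid (ad hoc)
    phi = np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)
    ds = s[1] - s[0]
    P = Qf(sqrt(c))
    hist = [P]
    for l in range(iters):
        C = c ** ((l + 1) / 2.0)              # leading term of C_l in (47)
        S = float(np.sum(np.array([Qf((v + C) * sqrt(c)) for v in s]) * phi) * ds)
        Qn = float(np.sum(np.array([Qf((v - C) * sqrt(c)) for v in s]) * phi) * ds)
        P = S + P * (Qn - S)                  # recursion (48)
        hist.append(P)
    return hist
```

The resulting sequence is nondecreasing and flattens quickly once $C_\ell$ becomes large, which mirrors the convergence $S_\ell \to 0$, $Q_\ell \to 1$ noted above.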
Thus, the quantities $P_\ell$ converge, since $P_{\ell+1} \sim P_\ell Q_\ell$ for sufficiently large $\ell \ge L$. We can now evaluate $P_{\mathrm{soft}}$. For $\ell \to \infty$, we replace $P_\ell$ with $P_\infty$ in (48) and use $x^\ast$ of (18). Finally, note that (21) is only an asymptotic estimate. Here we excluded the residual term $O(\ln m/\sqrt{m})$ used in the approximations (31) and (33). $\blacksquare$

Figure 2: Simulation results and analytical bounds for the algorithm $\Psi_{\mathrm{soft}}$ applied to modulation-type codes $C$ of length 8256. High-signal case. (The plot shows BER versus SNR [dB] for the curves $p_{\mathrm{soft}}$, $p_{\mathrm{finite\ length}}$, $p_{\mathrm{sim}}$, and $p_{\mathrm{ML}}$.)
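Simulations of the kind reported in Fig. 2 can be reproduced at a small scale. The sketch below is a hedged reconstruction, not the authors' code: it assumes that a codeword of $C_m$ consists of the $m$ information symbols $a_i$ together with all $\binom{m}{2}$ pairwise products $a_i a_j$ (which matches the rate $R_m = 2/(m+1)$ and the weight-3 parity checks), that $u_{j|0}$ is initialized from the systematic channel LLRs, and that the noise power follows the normalization $\sigma^2 = 1/(SNR \cdot R)$ per information bit; all parameter values are illustrative.

```python
import numpy as np

def simulate_psi_soft(m=64, snr_bit=8.0, L=12, blocks=10, seed=1):
    """Monte Carlo BER of the BP recursion h_i = sum_j 2*atanh(z_ij * u_j), u = tanh(h/2).
    Assumed code structure: codeword = {a_i} + {a_i * a_j, i < j}, rate R = 2/(m+1)."""
    rng = np.random.default_rng(seed)
    R = 2.0 / (m + 1)
    sigma2 = 1.0 / (snr_bit * R)              # assumed SNR normalization
    sig = np.sqrt(sigma2)
    err_dec = err_raw = 0
    for _ in range(blocks):
        a = np.ones(m)                        # all-ones info block, antipodal signaling
        y_sys = a + rng.normal(0.0, sig, m)   # systematic observations of a_i
        W = np.triu(rng.normal(0.0, sig, (m, m)), 1)
        Y = np.triu(np.outer(a, a), 1) + W    # observations of products a_i a_j, i < j
        Y = Y + Y.T                           # symmetrize; diagonal stays 0
        Z = np.tanh(Y / sigma2)               # z_ij = tanh(LLR_ij / 2), LLR = 2y/sigma^2
        u = np.tanh(y_sys / sigma2)           # u_{j|0} from systematic LLRs (assumption)
        for _ in range(L):
            prod = np.clip(Z * u[None, :], -1 + 1e-12, 1 - 1e-12)
            h = 2.0 * np.arctanh(prod).sum(axis=1)   # step B of the recursion
            u = np.tanh(h / 2.0)                     # step C
        err_dec += int(np.count_nonzero(h < 0))
        err_raw += int(np.count_nonzero(y_sys < 0))
    nbits = m * blocks
    return err_dec / nbits, err_raw / nbits
```

With these toy parameters the decoded BER falls far below the raw channel BER of the systematic symbols, illustrating the modulation gain of the scheme.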
Consider the functions $S_\ell$ and $T_\ell$ of (19) and (20) as $c \to \infty$. Then $S_\ell \to 0$, $T_\ell \to 1$, and $P_\infty \to P_0 = Q(\sqrt{c})$. In this case, $P_{\mathrm{soft}} \sim 2Q(\sqrt{c}) \sim (2/\pi c)^{1/2} e^{-c/2}$. The latter represents a 3 dB gain over the uncoded modulation, whose BER has the order of $e^{-c/4}$.

Complexity. Given $m$ information bits, the algorithm $\Psi_{\mathrm{soft}}$ has complexity of order $m^2 \log m$. Indeed, each iteration $\ell$ recalculates the quantities $u_{i|\ell}(j)$ and $h_{i|\ell}(j)$ for all ordered pairs $(i, j)$. This requires $O(m^2)$ operations. We also need $O(\log m/\log c)$ iterations $\ell$ to make the estimates $u_{i|\ell}$ bounded away from 0 as $m \to \infty$. Also, it can be shown that the stable point $x^\ast$ can be reached within a margin $\varepsilon \to 0$ in $O\bigl(\ln \varepsilon^{-1}/\ln c\bigr)$ iterations. For $\varepsilon = m^{-1}$, this gives the overall complexity of $m^2 \ln m/\ln c$ operations.

Simulation results vs analytical bounds.
In Fig. 2, we plot the analytical bound $P_{\mathrm{soft}}$ of (21) along with the simulation results $P_{\mathrm{sim}}$ and the lower bound $P_{\mathrm{ML}}$ of (9). Here we consider codes $C_m$ of dimension $m = 128$ on the AWGN channels with various SNRs $10\log_{10}(c/2)$. We see that both bounds (21) and (9) tightly follow the simulation results and each other. This also supports our main assumption that the algorithm $\Psi_{\mathrm{soft}}$ can be analyzed using independent random variables. For completeness, we also plot the non-asymptotic bound $P_{\mathrm{finite\ length}}$ obtained by using the parameters $C_\ell$ of (47) in both formulas (19) and (20). Unexpectedly, this bound completely coincides with the much simpler lower bound $P_{\mathrm{ML}}$ for high SNR.

Let $B_i = B_i(\mu, \mu r_i)$ be a sequence of $b$ capacity-achieving polar codes. The rates $0 \le r_0 < \dots < r_{b-1}$ will be specified later. We first encode each data block $a_i$ of length $\mu r_i$ into some vector $A_i \in B_i$ and then form a compound block $A = (A_0, \dots, A_{b-1})$ of length $m = \mu b$. Below $\mu \to \infty$ and $b$ is a constant. Block $A$ is further encoded by the code $C_m$ of rate $R_m = 2/(m+1)$ and length $n = \binom{m+1}{2}$. We use the notation $\widehat{C}_m$ for the compound code of rate $R \sim R_m \bar r$, where $\bar r = \sum_i r_i/b$. Thus, the code $\widehat{C}_m$ reduces the code rate $R_m$ by a factor of $\bar r$, which gives an SNR of $c/2\bar r$ per information bit.

Let $I_s = \{\mu s + 1, \dots, \mu(s+1)\}$ for any $s = 0, \dots, b-1$. The received block $\widehat{C} = \widehat{C}(0)$ of length $n$ is first decoded by the algorithm $\Psi_{\mathrm{soft}}$ using $L = O(\ln m)$ iterations. The result is some block $\widehat{A}(0)$ of length $m$. We then retrieve the first $\mu$ decoded bits in $\widehat{A}(0)$ that form the sub-block $\widehat{A}_0 = (\widehat{a}_1, \dots, \widehat{a}_\mu)$ of length $\mu$. Block $\widehat{A}_0$ is decoded by the polar code $B_0$ into some block $A_0 = \{a_1, \dots, a_\mu\}$. We assume that the corrected block $A_0$ has $WER \to 0$ as $\mu \to \infty$. We then use $A_0$ to replace the first $\mu$ symbols of the block $\widehat{C}(0)$. The result is a new block $\widehat{C}(1)$ of length $n$.
This completes round $s = 0$. Round $s = 1$ is similar. The algorithm $\Psi_{\mathrm{soft}}$ now also employs block $A_0$ to recalculate the remaining $m - \mu$ information bits of $\widehat{C}(1)$. The obtained sub-block $\widehat{A}_1 = (\widehat{a}_{\mu+1}, \dots, \widehat{a}_{2\mu})$ is decoded into some vector $A_1 = \{a_{\mu+1}, \dots, a_{2\mu}\}$ using the code $B_1$. Then $A_1$ replaces $\widehat{A}_1$ in positions $i \in I_1$ and yields a new block $\widehat{C}(2)$. Similarly, rounds $s = 2, \dots, b-1$ recover the blocks $A_s$ on positions $i \in I_s$. Then we obtain a block $\widehat{C}(s+1)$ that includes the corrected bits $a_1, \dots, a_{(s+1)\mu}$.

In any round $s$, the $\mu s$ corrected information bits serve as frozen bits and aid the algorithm $\Psi_{\mathrm{soft}}$. Indeed, with high probability, we use the correct estimates $u_{j|\ell} = a_j$ for all $j \le \mu s$. Then the parity checks $u_{i|\ell+1}(j) = u_{i,j} u_{j|\ell}$ are reduced to the repetitions/inversions $u_{i|\ell+1}(j) = a_j u_{i,j}$ of the symbols $u_{i,j}$. Also, recall that algorithm (7) outputs the likelihoods $h_{i|L}$ of all symbols $a_i$. Thus, we use $h_{i|L}$ as our bit estimates in every round $s$ as follows.

For all $i \in \{\mu s + 1, \dots, m\}$ and $j \in \{1, \dots, m\}$:
A. Use block $\widehat{C}(s)$. Derive $u_{i|\ell+1}(j) = u_{i,j} u_{j|\ell}$ and $h_{i|\ell+1}(j) = 2\tanh^{-1}\bigl(u_{i,j} u_{j|\ell}\bigr)$.
B. Derive $h_{i|\ell+1} = \sum_j h_{i|\ell+1}(j)$.
C. If $\ell < L$, find $u_{i|\ell+1} = \tanh\bigl(h_{i|\ell+1}/2\bigr)$. Go to A with $u_{i|\ell+1}$ and $\ell := \ell + 1$.
D. If $\ell = L$, use the block $\widehat{A}_s = (h_{i|L},\ i \in I_s)$. Decode it into $A_s \in B_s(\mu, \mu r_s)$.
E. Replace $\widehat{A}_s$ with $A_s$ to form $\widehat{C}(s+1)$. If $s < b-1$, let $s := s + 1$, $\ell := 0$. Go to A. If $s = b-1$, output the bits $a_1, \dots, a_m$.

Let an information block $A$ consist of $m$ zeros. We then use antipodal signaling and transmit the codeword $1^n$ over an AWGN channel. Round $s$ includes $\mu s$ correct information bits $u_{i|\ell} = a_i = 1$. Let $\lambda_s = s/b$. Then the remaining $m - \mu s$ r.v.
$u_{i|\ell}$, $i > \mu s$, have the average power moments
$$x_\ell = [m(1 - \lambda_s)]^{-1} \sum_{i > \mu s} E\bigl(u_{i|\ell}\bigr) \quad (49)$$
$$\sigma_\ell^2 = [m(1 - \lambda_s)]^{-1} \sum_{i > \mu s} E\bigl(u_{i|\ell}^2\bigr) \quad (50)$$
In particular, the initial setup with $\ell = 0$ employs the original r.v. $u_{i|0}$ that have the asymptotic pdf $N(\delta, \delta)$ for all $i > \mu s$ and satisfy the equalities $x_0 = \sigma_0^2 = \delta$.

Theorem 10.
Let the algorithm $\Psi_{\mathrm{soft}}$ have $\lambda m$ correct information symbols $a_1 = \dots = a_{\lambda m} = 1$, where $\lambda \in (0, 1)$. Then the remaining $(1 - \lambda) m$ symbols $a_i$ have the BER
$$P_{\mathrm{soft}}(\lambda, c) \sim Q\Bigl(\sqrt{c X(\lambda)}\Bigr) \quad (51)$$
where $X(\lambda)$ satisfies the equations
$$X(\lambda) = \lambda + (1 - \lambda)\, x(\lambda) \quad (52)$$
$$x(\lambda) = (2\pi)^{-1/2} \int_{-\infty}^{\infty} \tanh\Bigl(t\sqrt{c X(\lambda)}\Bigr)\, e^{-\bigl(t - \sqrt{c X(\lambda)}\bigr)^2/2}\, dt \quad (53)$$
Proof.
In essence, we follow the proof of Theorem 4. The main difference, which simplifies the current proof, is that the former vanishing starting point $x_0 = \delta \to 0$ is now replaced with $X_0 \to \lambda$. This removes the random walks across 0 analyzed in parts 4 and 5 of the former proof. Thus, we now have the case $P_\infty = 0$. The details are as follows.

For any $j \ge \mu s + 1$, we use the approximations (34) and (35) and take $u_{j|\ell} = 1$ for $j \le \mu s$. Then
$$h_{i|\ell+1}(j)/2 \sim u_{i|\ell+1}(j) \sim \begin{cases} z_{i,j}\, u_{j|\ell}, & \text{if } j \ge \mu s + 1 \\ z_{i,j}, & \text{if } j \le \mu s \end{cases}$$
For any given $Z_i$, consider the sums $Z_i' = \sum_{j \le \mu s} z_{i,j}$ and $Z_i'' = \sum_{j \ge \mu s + 1} z_{i,j}$. These sums have the expected values $E(Z_i') = \lambda Z_i$ and $E(Z_i'') = (1 - \lambda) Z_i$. Let
$$X_\ell = \lambda + (1 - \lambda)\, x_\ell, \qquad \theta_\ell = \lambda + (1 - \lambda)\, \sigma_\ell^2$$
Then we define the moments
$$E\bigl(h_{i|\ell+1}\bigr) \sim 2 x_\ell Z_i'' + 2 Z_i' \sim 2 Z_i\, [\lambda + x_\ell(1 - \lambda)] = 2 Z_i X_\ell \quad (54)$$
$$D\bigl(h_{i|\ell+1}\bigr) \sim 4 c (1 - \lambda)\, \sigma_\ell^2 + 4 c \lambda = 4 c\, \theta_\ell \quad (55)$$
Thus, the r.v. $h_{i|\ell+1}/2$ has pdf $N(X_\ell c,\, \theta_\ell c)$. Next, consider the r.v. $u_{i|\ell+1} \sim \tanh\bigl(h_{i|\ell+1}/2\bigr)$. Similarly to the equalities (29) and (30), we have
$$E\bigl(u_{i|\ell+1}\bigr) \sim \bigl(2\pi\theta_\ell c\bigr)^{-1/2} \int_{-\infty}^{\infty} \tanh(z)\, e^{-(z - X_\ell c)^2/2 c\theta_\ell}\, dz = F_c(X_\ell, \theta_\ell) \quad (56)$$
$$E\bigl(u_{i|\ell+1}^2\bigr) \sim \bigl(2\pi\theta_\ell c\bigr)^{-1/2} \int_{-\infty}^{\infty} \tanh^2(z)\, e^{-(z - X_\ell c)^2/2\theta_\ell c}\, dz = G_c(X_\ell, \theta_\ell)$$
Any round $s = \lambda b$ begins with the initial values $X_0(\lambda)$ and $\theta_0(\lambda)$ that satisfy the equalities
$$X_0(\lambda) = \theta_0(\lambda) = \lambda + \delta(1 - \lambda) \sim \lambda \quad (57)$$
which are similar to the former equality $x_0 = \sigma_0^2$. Thus, we may follow the proof of Theorem 4 and obtain the equality $F_c(X_\ell, \theta_\ell) = G_c(X_\ell, \theta_\ell)$ for any iteration $\ell$.
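The fixed point of (52) and (53) is straightforward to compute numerically. After the substitution $z = t\sqrt{cX(\lambda)}$, equation (53) reads $x(\lambda) = E[\tanh(z)]$ with $z \sim N\bigl(cX(\lambda),\, cX(\lambda)\bigr)$, which the following sketch iterates together with (52); the quadrature grid and iteration counts are illustrative.

```python
import numpy as np
from math import erfc, sqrt

S = np.linspace(-8.0, 8.0, 4001)              # standard-normal grid (ad hoc)
PHI = np.exp(-S**2 / 2) / np.sqrt(2 * np.pi)
DS = S[1] - S[0]

def x_from_X(X, c):
    """Right-hand side of (53): E tanh(z) with z ~ N(cX, cX)."""
    z = c * X + S * np.sqrt(c * X)
    return float(np.sum(np.tanh(z) * PHI) * DS)

def solve_X(lam, c, iters=300):
    """Fixed point of (52)-(53) for a fraction lam of frozen (correct) bits."""
    x = 0.0
    for _ in range(iters):
        X = lam + (1.0 - lam) * x             # equation (52)
        x = x_from_X(X, c)                    # equation (53)
    return lam + (1.0 - lam) * x

def P_soft(lam, c):
    """BER estimate (51): Q(sqrt(c * X(lambda)))."""
    X = solve_X(lam, c)
    return 0.5 * erfc(sqrt(c * X) / sqrt(2.0))
```

Since $X(\lambda) = \lambda + (1-\lambda)x(\lambda) \ge \lambda$ grows with $\lambda$, the BER estimate (51) decreases as more bits are frozen, in agreement with Fig. 3.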
Now we see that $x_{\ell+1} = \sigma_{\ell+1}^2$ and $X_\ell = \theta_\ell$. Then for any $\lambda$ and $\ell \to \infty$, we use the variables $x(\lambda)$ and $X(\lambda) = \lambda + (1 - \lambda)\, x(\lambda)$. The equalities (49) and (56) then give
$$x(\lambda) = E\bigl(u_i^\infty\bigr) = F_c\bigl(X(\lambda), X(\lambda)\bigr)$$
which can be rewritten as (53).

This also gives the estimate (51). Indeed, the iterations (54) and (55) show that the original iteration for $\ell = 0$ gives the r.v. $h_{i|1}$ that has the Gaussian pdf $N(2\lambda c, 4\lambda c)$. Then for any round $s = \lambda b$, the r.v. $u_1 = m^{-1}\sum_{i > \mu s} u_{i|1}$ has the mean $F_c(\lambda, \lambda) = R_c(\lambda)$ and the vanishing variance $D \le R_c(\lambda)\big/(1 - \lambda) m$, where $R_c(x)$ is defined in (42). Thus, for any $\lambda > 0$, our iterations begin with the crossover probability $P_0 = \Pr\{u_1 \le 0\} \to 0$ as $m \to \infty$. The latter implies that $P_\ell \to 0$ as $\ell \to \infty$, as defined in (48). In turn, we can set $P_\infty = 0$ in (21). Now we can use the r.v. $h_{i|\ell+1}$ that have pdf $N(2 X_\ell c, 4 X_\ell c)$, according to (54) and (55). For $\ell \to \infty$, this gives (51) as
$$P_{\mathrm{soft}}(\lambda, c) = \Pr\{h_{i|\infty} < 0\} \sim Q\Bigl(\sqrt{X(\lambda)\, c}\Bigr) \quad (58)$$
$\blacksquare$

The absence of random walks in our current setup also makes the bound (51) very tight. This is shown in Fig. 3, where we plot the analytical BER of (51) along with the simulation results obtained for the algorithm $\Psi_{\mathrm{soft}}(\lambda)$. Here we consider codes $C_m$ with $m = 128$ and test various fractions of frozen bits $\lambda = s/m$ and different S/N ratios $10\log_{10}(c/2)$.
Figure 3: Simulation results and analytical bounds for the algorithm $\Psi_{\mathrm{soft}}$ applied to modulation-type codes $C$ with a fraction $\lambda$ of frozen bits. Simulated and analytical values of $P_{\mathrm{soft}}$ are shown for $\lambda = 0.125, 0.25, 0.375, 0.5, 0.625, 0.75$, and $0.875$.

Recall that the likelihoods $h_{i|L}(\lambda)$ give the BER (51) in round $s = \lambda b$. We can now represent any Gaussian r.v. $h_{i|L}(\lambda)$ as a channel symbol that has pdf $N(1, \sigma^2)$ and a BER $Q(1/\sigma)$. Thus, $\sigma^2 = 1/c X(\lambda)$. An important note is that the codes $B_s(\mu, \mu r_s)$ now operate on the AWGN channels $N(0, \sigma^2)$ that have a limited noise power $1/c X(\lambda)$. Unlike the original code $C_m$, we can now use codes $B_s(\mu, \mu r_s)$ with non-vanishing code rates that grow from $r_0$ to $r_{b-1}$.

Theorem 11.
Codes $\widehat{C}_m$ of dimension $k \to \infty$ and length $n = O(k^2)$ precoded with $b$ polar codes have overall complexity of $O(n \ln n)$. For sufficiently large $b$, these codes achieve a vanishing BER if used arbitrarily close to the Shannon limit of $-1.59$ dB per information bit.

Proof. In round $s = \lambda b$, we use a capacity-achieving code $B_s(\mu, \mu r_s)$. The corresponding BI-AWGN channel $N_s(0, \sigma_s^2)$ has noise power $\sigma_s^2 = (X(\lambda)\, c)^{-1}$ and achieves the capacity [14]
$$\rho_c(\lambda) = \log_2 \sqrt{\frac{c X(\lambda)}{2\pi e}} - \int_{-\infty}^{\infty} f(y) \log_2 f(y)\, dy \quad (59)$$
$$f(y) = \sqrt{\frac{c X(\lambda)}{8\pi}} \Bigl[ e^{-(y+1)^2\, c X(\lambda)/2} + e^{-(y-1)^2\, c X(\lambda)/2} \Bigr]$$
Here the parameter $\lambda$ changes from 0 to 1 in small increments $1/b$, which tend to 0 as $b \to \infty$. The average capacity over all AWGN channels $N_s(0, \sigma_s^2)$ is $\bar\rho_c = \int_0^1 \rho_c(\lambda)\, d\lambda$. Thus, for $m \to \infty$, the code $\widehat{C}_m$ achieves a vanishing BER for any code rate $\bar r < \bar\rho_c$, which gives $SNR > c/2\bar\rho_c$.

We now proceed with code complexity. For $b$ polar codes $B_s(\mu, \mu r_s)$, the design complexity has the order of $b\mu^2 \sim 2n/b$ or less. Their decoding requires the order of $b\mu \ln \mu < m \ln m$ operations. The algorithm $\Psi_{\mathrm{soft}}$ includes $b$ rounds with $L = O(\ln m)$ iterations in each round. This gives a complexity order of $n \ln n$ if $b$ is a constant, or $n \ln^2 n$ for growing $b < \ln m$. Thus, the overall complexity has the order of $k \ln k$ operations per information bit, where $k \to \bar\rho_c\, m$ is the number of information bits.

To calculate the minimum SNR $\kappa = \min_c c/2\bar\rho_c$, we select the parameters $c$ and $b$. Then we solve equation (52) for different values of $\lambda = s/b$, where $s = 0, \dots, b -$
1, and calculate $\bar\rho_c$. The following table gives the highest value of the code rate $\bar\rho_c$ and the corresponding value of $\kappa = \kappa(c, b)$. Here we count $\kappa$ in dB, as $10 \log_{10} \kappa$. The last line shows the gap $\kappa/\ln 2 - 1$:

$b$                  …      …      …      …
$\bar\rho_c$         …      …      …      …
$\kappa$ (in dB)     …      …      …      …
$\kappa/\ln 2 - 1$   …      …      …      …

Thus, $b$ is a constant for any $SNR > \ln 2$. Statement 1 now follows directly from the existing bounds [12] on the BER of polar codes. Here the polar codes $B_i$ have length $\mu = m/b > k/b$. $\blacksquare$

In this paper, we study new codes that can approach the Shannon limit on the BI-AWGN channels. We first employ "modulation" codes $C_m$ that use parity checks of weight 3. These codes can be aided by other codes $B_m$ via back-and-forth data recovery. Using BP algorithms that decode information bits only, the codes $C_m$ achieve a complexity order of $n \ln n$. Then new analytical techniques give tight lower and upper bounds on the output BER, which are almost identical to the simulation results. Finally, we employ multilevel codes of dimension $k \to \infty$ that approach the Shannon limit with a complexity order of $k \ln k$. One open problem is to find out if there exists a closed-form solution to the transcendental equations (52), which (unexpectedly) give the Shannon limit using numerical integration in (59).

Our future goal is to improve the code design for moderate lengths. This work in progress uses more advanced combinatorial designs for modulation codes. We conjecture that it may also reduce the code complexity to the order of $\ln k$ operations per information bit for dimensions $k \to \infty$.

References

[1] G. D. Forney, Jr. and G. Ungerboeck, "Modulation and coding for linear Gaussian channels,"
IEEE Trans. Info. Theory, vol. 44, pp. 2384-2415, Nov. 1998.
[2] R. Ratasuk, N. Mangalvedhe, Y. Zhang, M. Robert, and J.-P. Koskinen, "Overview of narrowband IoT in LTE Rel-13," Proc. IEEE Conf. Standard Commun. Netw., Berlin, Germany, Oct./Nov. 2016, pp. 1-7.
[3] E. Abbe, A. Shpilka, and A. Wigderson, "Reed-Muller codes for random erasures and errors," IEEE Trans. Info. Theory, vol. 61, pp. 5229-5252, Oct. 2015.
[4] R. Saptharishi, A. Shpilka, and B. L. Volk, "Efficiently decoding Reed-Muller codes from random errors," Proc. 48th Symp. Theory of Comp. (STOC '16), Cambridge, MA, USA, June 2016, pp. 227-235.
[5] I. Dumer and K. Shabunov, "Near-optimum decoding for subcodes of Reed-Muller codes," Proc. IEEE Intern. Symp. Info. Theory, Washington DC, USA, June 24-29, 2001, p. 329.
[6] I. Dumer and K. Shabunov, "Soft decision decoding of Reed-Muller codes: recursive lists," IEEE Trans. Info. Theory, vol. 52, no. 3, pp. 1260-1266, 2006.
[7] V. Sidel'nikov and A. Pershakov, "Decoding of Reed-Muller codes with a large number of errors," Probl. Info. Transmission, vol. 28, no. 3, pp. 80-94, 1992.
[8] P. Loidreau and B. Sakkour, "Modified version of Sidel'nikov-Pershakov decoding algorithm for binary second order Reed-Muller codes," Proc. 9th Intern. Workshop on Algebraic and Combinatorial Coding Theory (ACCT-9), Kranevo, Bulgaria, 2004, pp. 266-271.
[9] M. Ye and E. Abbe, "Recursive projection-aggregation decoding of Reed-Muller codes," arXiv:1902.01470v3 [cs.IT], 26 Feb. 2020.
[10] I. Dumer, "Polar codes with a stepped boundary," Proc. IEEE Intern. Symp. Info. Theory, Aachen, Germany, July 2017, pp. 2613-2617.
[11] M. Fereydounian, M. V. Jamali, H. Hassani, and H. Mahdavifar, "Channel coding at low capacity," Proc. 2019 IEEE Intern. Workshop Info. Theory, Gotland, Sweden, August 2019, 5 pp.
[12] E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Info. Theory, vol. 55, pp. 3051-3073, July 2009.
[13] I. Dumer and N. Gharavi, "Codes for high-noise memoryless channels," (virtual symposium), October 25-27, 2020, paper A03-04, pp. 101-105.
[14] T. M. Cover and J. A. Thomas, "Elements of Information Theory," Wiley.