A Packing Lemma for Polar Codes
Erdal Arıkan
Bilkent University, Ankara, Turkey
Email: [email protected]
Abstract—A packing lemma is proved using a setting where the channel is a binary-input discrete memoryless channel (X, w(y|x), Y), the code is selected at random subject to parity-check constraints, and the decoder is a joint typicality decoder. The ensemble is characterized by (i) a pair of fixed parameters (H, q), where H is a parity-check matrix and q is a channel input distribution, and (ii) a random parameter S representing the desired parity values. For a code of length n, the constraint is sampled from p_S(s) = Σ_{x^n ∈ X^n} φ(s, x^n) q^n(x^n), where φ(s, x^n) is the indicator function of the event {s = x^n H^T} and q^n(x^n) = ∏_{i=1}^n q(x_i). Given S = s, the codewords are chosen conditionally independently from p_{X^n|S}(x^n|s) ∝ φ(s, x^n) q^n(x^n). It is shown that the probability of error for this ensemble decreases exponentially in n provided the rate R is kept bounded away from I(X;Y) − (1/n) I(S; Y^n), with (X, Y) ∼ q(x) w(y|x) and (S, Y^n) ∼ p_S(s) Σ_{x^n} p_{X^n|S}(x^n|s) ∏_{i=1}^n w(y_i|x_i). In the special case where H is the parity-check matrix of a standard polar code, it is shown that the rate penalty (1/n) I(S; Y^n) vanishes as n increases. The paper also discusses the relation between ordinary polar codes and random codes based on polar parity-check matrices.

I. INTRODUCTION
Packing and covering lemmas are basic building blocks of coding theorems in information theory. The book by El Gamal and Kim [1] exemplifies this; it relies on a small number of packing and covering lemmas (such as Lemma 3.1 [1, p. 46] and Lemma 3.3 [1, p. 64]) to prove a vast number of coding theorems for multi-terminal source and channel coding problems. Unfortunately, the packing and covering lemmas used for proving theorems in a clean way rely on joint, or at least pairwise, independence among the codewords. Joint or pairwise independence are too strong assumptions for various practical code ensembles, including those for polar codes. The goal of this paper is to prove a packing lemma under less stringent conditions on the code ensemble. The motivation behind this work is to develop packing and covering lemmas that are applicable to polar codes so that existing proofs based on standard code ensembles can be translated readily to similar proofs for polar codes. In this paper, we address only the packing problem. The results are preliminary. More work is needed to establish the desired links between random-coding methods and explicit polar code constructions.

In Sect. II, we review the random-coding method in the absence of any constraints. In Sect. III, we extend the method of Sect. II to the case of random coding subject to parity-check constraints. In Sect. IV, we further specialize the results to the case of parity-check matrices obtained from polar coding. The paper concludes in Sect. V with a summary and remarks.

II. STANDARD RANDOM-CODING METHOD
This section reviews the standard random-coding method. We follow the presentation given in [1, Sect. 3.1.2] and, for the most part, adopt the notation and conventions there.

Consider a communication system employing block coding over a discrete memoryless channel (DMC) (X, w(y|x), Y) with input alphabet X, output alphabet Y, and transition probabilities w(y|x), x ∈ X, y ∈ Y. Let R denote the code rate, n the length of the codewords, and c = {x^n(1), ..., x^n(2^⌈nR⌉)} the code itself. To send message m, one transmits the codeword x^n(m) into the channel; in response, the channel outputs a word y^n with probability

    w^n(y^n | x^n(m)) ≜ ∏_{i=1}^n w(y_i | x_i(m));   (1)

and the decoder in the system maps y^n to a decision m̂ ∈ [1 : 2^⌈nR⌉] ∪ {e}, where e is a special symbol indicating decoder failure. Here, the decoder is assumed to be a joint typicality decoder designed for a channel input-output ensemble (X, Y) ∼ q(x) w(y|x), where q(x) is a given probability distribution on X. Given y^n, the joint typicality decoder outputs m̂(y^n) = j if j is the unique message index in [1 : 2^⌈nR⌉] such that (x^n(j), y^n) ∈ T_ε^(n)(X, Y); otherwise, the output is m̂ = e. Here, T_ε^(n) is defined as in [1, p. 27], namely, as the set of all (x^n, y^n) ∈ X^n × Y^n such that the inequalities

    |π(x, y | x^n, y^n) − q(x) w(y|x)| ≤ ε q(x) w(y|x)

hold for each (x, y) ∈ X × Y, where π(x, y | x^n, y^n) is the fraction of times (x, y) appears as a coordinate of (x^n, y^n).

In random-coding analysis of such a system, one regards the code c as a sample of a random code C, drawn with probability

    p_C(c) = ∏_{j=1}^{2^⌈nR⌉} q^n(x^n(j)),   (2)

where x^n(j) denotes the jth codeword in c and q^n(x^n) ≜ ∏_{i=1}^n q(x_i).
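The typicality test that defines this decoder is simple to express in code. The following sketch checks the defining inequalities of T_ε^(n); the alphabets, distributions, and sequences are illustrative assumptions (a BSC-like channel with uniform input), not taken from the paper:

```python
from collections import Counter

def is_jointly_typical(xn, yn, q, w, eps):
    """Test (x^n, y^n) in T_eps^(n): |pi(x,y) - q(x)w(y|x)| <= eps*q(x)w(y|x)
    for every (x, y), where pi is the empirical joint distribution."""
    n = len(xn)
    counts = Counter(zip(xn, yn))
    for x in q:
        for y in w[x]:
            p = q[x] * w[x][y]
            pi = counts.get((x, y), 0) / n
            if abs(pi - p) > eps * p:
                return False
    return True

# Hypothetical example: uniform input, BSC-like channel with crossover 0.25.
q = {0: 0.5, 1: 0.5}
w = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
xn = (0, 0, 0, 0, 1, 1, 1, 1)
yn = (0, 0, 0, 1, 1, 1, 1, 0)   # empirical pi matches q(x)w(y|x) exactly
print(is_jointly_typical(xn, yn, q, w, eps=0.1))        # True
print(is_jointly_typical(xn, (0,) * 8, q, w, eps=0.1))  # False
```

Note that this is the ε-scaled ("robust") typicality of [1], in which the allowed deviation for each pair (x, y) is proportional to its probability.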
The entire system is represented by an ensemble (M, C, Y^n, M̂) with a probability assignment p_{M,C,Y^n,M̂}(m, c, y^n, m̂) of the form

    p_M(m) p_C(c) p_{Y^n|M,C}(y^n | m, c) p_{M̂|C,Y^n}(m̂ | c, y^n),   (3)

where p_M(m) is uniform on [1 : 2^⌈nR⌉], p_{Y^n|M,C}(y^n | m, c) is given by (1) with x^n(m) as the mth codeword of c, and M̂ is a function of (C, Y^n) as determined by the operation of the joint typicality decoder.

Let E = {M̂ ≠ M} denote the error event and P(E) the probability of error w.r.t. the above ensemble. The goal of the random-coding analysis is to show that, for any fixed R < I(X;Y) with (X, Y) ∼ q(x) w(y|x), the probability of error P(E) goes to zero as the block length n increases. The analysis begins by observing that, due to symmetry, P(E) = P(E | M = 1). Define the events E_1 = {(X^n(1), Y^n) ∉ T_ε^(n)} and E_2 = {(X^n(j), Y^n) ∈ T_ε^(n) for some j ≠ 1}, so that

    P(E | M = 1) = P(E_1 ∪ E_2 | M = 1) ≤ P(E_1 | M = 1) + P(E_2 | M = 1).

By the law of large numbers, P(E_1 | M = 1) goes to 0 (exponentially) in n. For the second term, the union bound is used to write

    P(E_2 | M = 1) ≤ Σ_{j=2}^{2^⌈nR⌉} P(D_j | M = 1),   (4)

where D_j ≜ {(X^n(j), Y^n) ∈ T_ε^(n)}; then, a joint typicality lemma is invoked to bound each term in the union bound as

    P(D_j | M = 1) ≐ 2^{−n I(X;Y)},   j ≠ 1,   (5)

which establishes that P(E_2 | M = 1) ≤̇ 2^{n(R − I(X;Y))}. This completes the proof that P(E) goes to zero (exponentially) in n provided R < I(X;Y). If one chooses q(x) as a distribution that maximizes I(X;Y), one obtains a proof of achievability of the channel capacity C ≜ max_{q(x)} I(X;Y).

III. RANDOM CODING UNDER CONSTRAINTS
In this section, we consider the same channel coding problem as in Sect. II, with the difference that here the code ensemble C is subject to certain constraints. The target application of the method developed in this section is polar coding; however, for broader applicability and a wider perspective, the initial formulation is given in a fairly general manner.

A. Code generation under constraints
The constraints on code generation will be represented by a parameter s taking values over a space S. We will consider codes of length n and let x^n ∈ X^n denote a generic channel input word of length n. We will model the constraints by a function φ : S × X^n → {0, 1} such that φ(s, x^n) = 1 iff x^n satisfies the constraint s. As a simple example, let S = {o, e} and let φ(e, x^n) = 1 iff the parity of x^n is even and φ(o, x^n) = 1 iff the parity of x^n is odd. A more general parity-check constraint will be treated in the next section.

We will say that a constraint function φ is symmetric if there exist non-zero reals (α_s : s ∈ S) such that

    Σ_{s ∈ S} α_s φ(s, x^n) = 1,   for all x^n ∈ X^n.   (6)

For example, the odd-even parity constraint is symmetric with α_s = 1. We will restrict attention to symmetric constraint functions.

The random code ensembles that we will consider will be denoted as (S, C), with S denoting a random constraint variable that takes values in S and C = {X^n(1), ..., X^n(2^⌈nR⌉)} denoting a code chosen at random subject to the constraint S. We take q(x), the target channel input distribution, as given. For any particular constraint s ∈ S and code c = {x^n(1), ..., x^n(2^⌈nR⌉)}, we specify the probability assignment on (S, C) as

    p_{S,C}(s, c) = p_S(s) ∏_{m=1}^{2^⌈nR⌉} q_s(x^n(m)),   (7)

where

    p_S(s) ≜ α_s Σ_{x^n} φ(s, x^n) q^n(x^n),   s ∈ S,   (8)

and

    q_s(x^n) ≜ φ(s, x^n) q^n(x^n) / Σ_{x̃^n} φ(s, x̃^n) q^n(x̃^n),   x^n ∈ X^n.   (9)

Thus, the codewords {X^n(m)} are selected in a conditionally i.i.d. manner from q_s, given the constraint S = s. Note that the marginal distribution of individual codewords is given by

    p_{X^n(m)}(x^n) = Σ_s p_S(s) q_s(x^n) = q^n(x^n),   x^n ∈ X^n,   (10)

which is in agreement with the target channel-input distribution. Also note that the channel output follows a product-form distribution

    p_{Y^n}(y^n) = t^n(y^n) ≜ ∏_{i=1}^n t(y_i)   (11)

with t(y) ≜ Σ_x q(x) w(y|x).

B. Analysis of probability of error
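Before proceeding with the error analysis, the defining properties of the constrained ensemble of Sect. III-A — the symmetry condition (6), the constraint distribution (8)-(9), and the marginal identity (10) — can be verified exactly for a toy case. A sketch under assumed parameters (binary alphabet, a single overall parity bit as the constraint, n = 3, a non-uniform q; all illustrative, not from the paper):

```python
from itertools import product

n = 3
q = {0: 0.6, 1: 0.4}                        # hypothetical input distribution

def qn(x):                                   # product distribution q^n(x^n)
    p = 1.0
    for xi in x:
        p *= q[xi]
    return p

def phi(s, x):                               # constraint: s = x_1 + ... + x_n mod 2
    return int(sum(x) % 2 == s)

S, words = (0, 1), list(product((0, 1), repeat=n))

# (6): symmetry holds with alpha_s = 1 for this constraint
assert all(sum(phi(s, x) for s in S) == 1 for x in words)
# (8): p_S(s) = alpha_s * sum_x phi(s, x^n) q^n(x^n)
pS = {s: sum(phi(s, x) * qn(x) for x in words) for s in S}
# (9): q_s(x^n) = phi(s, x^n) q^n(x^n) / normalizer
qs = {s: {x: phi(s, x) * qn(x) / pS[s] for x in words} for s in S}
# (10): the codeword marginal sum_s p_S(s) q_s(x^n) recovers q^n(x^n)
assert all(abs(sum(pS[s] * qs[s][x] for s in S) - qn(x)) < 1e-12 for x in words)
print("ensemble properties (6), (8)-(10) verified")
```

The same enumeration extends directly to the parity-check matrices of Sect. III-C, with the single parity bit replaced by the syndrome map x^n ↦ x^n H^T.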
We now analyze the average performance of the constrained code ensemble defined by (7). As in Sect. II, we assume that the message random variable M is uniformly distributed over [1 : 2^⌈nR⌉] and that a joint typicality decoder is being used. The joint ensemble for the system will be (M, S, C, Y^n, M̂) with a probability assignment

    p_M(m) p_{S,C}(s, c) p_{Y^n|M,C}(y^n | m, c) p_{M̂|C,Y^n}(m̂ | c, y^n),   (12)

which is the same as (3), except here the code ensemble is defined by (7). A property of this ensemble, which will be important in the sequel, is the independence of (S, Y^n) and M. This can be verified by writing

    p_{S,Y^n|M}(s, y^n | m) = Σ_{x^n} p_{S,X^n(m),Y^n|M}(s, x^n, y^n | m) = Σ_{x^n} p_S(s) q_s(x^n) w^n(y^n | x^n),

and observing that the final sum is independent of m.

We now turn to the error analysis and define the error events E, E_1, E_2 as in Sect. II. As before, by symmetry, we have P(E) ≤ P(E_1 | M = 1) + P(E_2 | M = 1), and P(E_1 | M = 1) goes to zero exponentially in n. To bound the second term P(E_2 | M = 1), we use the events D_j as defined in Sect. II, as well as the mutual information random variable

    i(s; y^n) = log [p_{S,Y^n}(s, y^n) / (p_S(s) p_{Y^n}(y^n))] = log [p_{S,Y^n}(s, y^n) / (p_S(s) t^n(y^n))],   (13)

and the event

    A = {i(S; Y^n) > nγ}.   (14)

The γ in the definition of A is a real number that will be specified later. In terms of these, we have the following bound.

    P(E_2 | M = 1) = P(E_2 ∩ A | M = 1) + P(E_2 ∩ A^c | M = 1)
                   ≤ P(A | M = 1) + Σ_{j=2}^{2^⌈nR⌉} P(D_j ∩ A^c | M = 1)
                   = P(A) + (2^⌈nR⌉ − 1) P(D_2 ∩ A^c | M = 1).

Here, we have replaced P(A | M = 1) with P(A) by noting that A, being an event defined in terms of (S, Y^n), is independent of {M = 1}. We now define B as the set of all (s, x^n, y^n) ∈ S × X^n × Y^n such that (x^n, y^n) ∈ T_ε^(n) and i(s; y^n) ≤ nγ, and continue as follows.

    P(D_2 ∩ A^c | M = 1) = Σ_{(s,x^n,y^n) ∈ B} p_{S,Y^n}(s, y^n) q_s(x^n)
      (a) ≤ Σ_{(s,x^n,y^n) ∈ B} 2^{nγ} p_S(s) t^n(y^n) q_s(x^n)
      (b) ≤ Σ_{(s,x^n,y^n) ∈ S × T_ε^(n)} 2^{nγ} p_S(s) t^n(y^n) q_s(x^n)
      (c) = Σ_{(x^n,y^n) ∈ T_ε^(n)} 2^{nγ} t^n(y^n) q^n(x^n)
      (d) ≐ 2^{−n(I(X;Y) − γ)},

where (a) follows by the fact that, for any (s, x^n, y^n) ∈ B, p_{S,Y^n}(s, y^n) ≤ 2^{nγ} p_S(s) t^n(y^n); (b) by extending the range of the sum from B to the larger set S × T_ε^(n); (c) by carrying out the sum over s ∈ S; and (d) by the joint typicality lemma [1, p. 43]. Collecting the results, we have the bound

    P(E_2 | M = 1) ≤ P(A) + 2^{n(R − I(X;Y) + γ)}.

To keep the upper bound on P(E_2 | M = 1) under control, we need a large enough γ so that P(A) is small, but also a rate R smaller than I(X;Y) − γ. These two conflicting objectives put into evidence that there is a trade-off between performance and structure. For a more quantitative asymptotic statement, consider a sequence of ensembles {(S_n, C_n)}, with each ensemble in the sequence having the same code rate R. Let P_{e,n} denote the probability of error for the nth ensemble. Let

    γ* = inf { γ : limsup_{n→∞} P(i(S_n; Y^n) > nγ) = 0 }.   (15)

Then, P_{e,n} goes to zero if R < I(X;Y) − γ*. If the sequence {(S_n, C_n)} has a convergence property such as

    limsup_{n→∞} P(|i(S_n; Y^n) − I(S_n; Y^n)| ≥ nε) = 0 for every ε > 0,

then we may take

    γ* = limsup_{n→∞} (1/n) I(S_n; Y^n).   (16)

In any case, it is apparent that the cost of placing constraints on the code is a rate penalty given by γ*. We summarize the above discussion as follows.

Lemma 1. Let {(S_n, C_n)} be a sequence of constrained code ensembles indexed by code length n, with each ensemble in the sequence defined by (7) and having a common rate R. Let P_{e,n} denote the probability of error for the nth ensemble, under joint typicality decoding. Then, P_{e,n} goes to zero as n increases provided R < I(X;Y) − γ*, where γ* is defined by (15).

C. Parity-check constraints

In this part, we continue the above discussion for the important special case of parity-check constraints. For simplicity, we restrict the discussion to channels with binary input alphabets, X = {0, 1}. We will identify X with the binary field F_2 and use vector space operations over F_2 to define the code constraints. The joint ensemble for the system will still be (M, S, C, Y^n, M̂) with the probability assignment (12), except here we will consider a constraint function φ defined in terms of a parity-check matrix H ∈ F_2^{r×n} with r rows and n columns. We leave r as an arbitrary parameter, 0 ≤ r ≤ n, throughout the following analysis and discuss its effect on the results following the analysis. We take the constraint set as S = F_2^r and, for any (s, x^n) ∈ S × X^n, define the constraint function as

    φ(s, x^n) = 1 if s = x^n H^T, and 0 otherwise.   (17)

Note that φ is symmetric with α_s = 1 for all s ∈ S. Also note that φ splits the set X^n into cosets K_s ≜ {x^n ∈ X^n : x^n H^T = s} indexed by s ∈ S. Each coset has |K_s| = 2^{n−r} elements, and K_s = x_s^n + K_0, where x_s^n ∈ K_s is a coset representative for K_s and K_0 denotes the coset for s = 0^r.

Lemma 2. Let A be as in (14) with γ = (1/n) I(S; Y^n) + ε for some ε > 0. Then, for the parity-check code ensemble,

    P(A) ≤ exp(−2nε²/d²),   (18)

where d is a constant determined by q(x) and w(y|x).

Proof: Note that i(S; Y^n) = f(X^n, Y^n), where f(x^n, y^n) ≜ i(x^n H^T; y^n). Writing i(S; Y^n) in this way as a function of (X^n, Y^n) is useful because the function f is Lipschitz: Let (x^n, y^n) ∈ X^n × Y^n and (x̃^n, ỹ^n) ∈ X^n × Y^n be any two points such that (a) (x_i, y_i) ≠ (x̃_i, ỹ_i) for some i ∈ [1 : n] but (x_j, y_j) = (x̃_j, ỹ_j) for all j ≠ i, 1 ≤ j ≤ n, and (b) q^n(x^n) w^n(y^n | x^n) > 0 and q^n(x̃^n) w^n(ỹ^n | x̃^n) > 0. We claim that

    |f(x^n, y^n) − f(x̃^n, ỹ^n)| ≤ d_i,   (19)

for some constant d_i that depends only on the distributions q(x) and w(y|x).

Assuming for a moment that the claim (19) is true, the lemma follows from the Azuma-Hoeffding inequality, specifically, from the form of this inequality as given in [2, Corol. 5.2], with d² = (1/n) Σ_{i=1}^n d_i². Therefore, it suffices to prove only (19), or, equivalently,

    2^{−d_i} ≤ 2^{f(x^n, y^n) − f(x̃^n, ỹ^n)} ≤ 2^{d_i}.

To that end, we write

    2^{f(x^n, y^n) − f(x̃^n, ỹ^n)} = [p_{S,Y^n}(s, y^n) / p_{S,Y^n}(s̃, ỹ^n)] · [p_S(s̃) / p_S(s)] · [p_{Y^n}(ỹ^n) / p_{Y^n}(y^n)],

where we put for shorthand s ≜ x^n H^T, s̃ ≜ x̃^n H^T. Using the coset structure of the constraints, we have

    p_{S,Y^n}(s, y^n) = Σ_{x^n ∈ X^n} p_S(s) q_s(x^n) w^n(y^n | x^n)
                      = Σ_{x^n ∈ X^n} φ(s, x^n) q^n(x^n) w^n(y^n | x^n)
                      = Σ_{x^n ∈ K_s} q^n(x^n) w^n(y^n | x^n)
                      = Σ_{v^n ∈ K_0} q^n(v^n + x_s^n) w^n(y^n | v^n + x_s^n).

Thus, we have

    p_{S,Y^n}(s, y^n) / p_{S,Y^n}(s̃, ỹ^n) = [Σ_{v^n ∈ K_0} q^n(v^n + x_s^n) w^n(y^n | v^n + x_s^n)] / [Σ_{v^n ∈ K_0} q^n(v^n + x_s̃^n) w^n(ỹ^n | v^n + x_s̃^n)].

Taking x_s^n = x^n and x_s̃^n = x̃^n as the coset representatives, the paired terms in the two sums differ only in the ith coordinate, and, term by term, we have the bound

    [q^n(v^n + x^n) w^n(y^n | v^n + x^n)] / [q^n(v^n + x̃^n) w^n(ỹ^n | v^n + x̃^n)] = [q(v_i + x_i) w(y_i | v_i + x_i)] / [q(v_i + x̃_i) w(ỹ_i | v_i + x̃_i)] ≤ β_{q,w},

where

    β_{q,w} ≜ max{q(x) w(y|x) : (x, y) ∈ supp(q(x) w(y|x))} / min{q(x) w(y|x) : (x, y) ∈ supp(q(x) w(y|x))},

where "supp" denotes the support of a distribution.
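These support ratios are concrete numbers for any given channel. As an illustration, under an assumed BSC with crossover probability 0.11 and uniform input (an example chosen here for concreteness, not one worked out in the paper), the constants β_{q,w}, β_q, β_t, and hence the per-coordinate Lipschitz constant d_i = log(β_{q,w} β_q β_t) obtained below, can be computed as follows:

```python
import math

# Assumed example: BSC with crossover p = 0.11 and uniform input q.
p = 0.11
q = {0: 0.5, 1: 0.5}
w = {x: {y: (1 - p) if y == x else p for y in (0, 1)} for x in (0, 1)}
t = {y: sum(q[x] * w[x][y] for x in q) for y in (0, 1)}  # t(y) = sum_x q(x)w(y|x)

joint = [q[x] * w[x][y] for x in q for y in w[x]]
support = [v for v in joint if v > 0]
beta_qw = max(support) / min(support)
beta_q = max(q.values()) / min(q.values())   # = 1 for uniform input
beta_t = max(t.values()) / min(t.values())   # = 1 for this symmetric channel
d_i = math.log2(beta_qw * beta_q * beta_t)   # per-coordinate Lipschitz constant
print(round(d_i, 3))   # log2(0.89/0.11), about 3.016
```

For this symmetric example only the β_{q,w} term contributes; asymmetric channels or non-uniform inputs would also make β_q and β_t nontrivial.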
So,

    (β_{q,w})^{−1} ≤ p_{S,Y^n}(s, y^n) / p_{S,Y^n}(s̃, ỹ^n) ≤ β_{q,w}.

Using the same type of argument, we get

    (β_q)^{−1} ≤ p_S(s̃) / p_S(s) ≤ β_q,   (β_t)^{−1} ≤ p_{Y^n}(ỹ^n) / p_{Y^n}(y^n) ≤ β_t,

where β_q is defined as the ratio of max{q(x) : x ∈ supp(q(x))} to min{q(x) : x ∈ supp(q(x))}, and β_t as the ratio of max{t(y) : y ∈ supp(t(y))} to min{t(y) : y ∈ supp(t(y))}. Combining these, we obtain the proof of (19) with d_i = log(β_{q,w} β_q β_t). The lemma follows, with d = log(β_{q,w} β_q β_t).

This shows that P(A) goes to zero exponentially in n regardless of the size (number of rows r) and form of H; it should be clear, however, that the specific form of H affects the rate penalty (1/n) I(S; Y^n). To gain a more intuitive understanding of this issue, let us interpret I(S; Y^n) as the average information leaked by the received word Y^n about the constraint S in a one-shot transmission scenario where a codeword X^n satisfying the constraint φ(S, X^n) = 1 is transmitted. A trivial example is H = I_n (the identity matrix), with I(S; Y^n) = I(X^n; Y^n) = n I(X;Y), corresponding to maximum information leakage. A non-trivial example in the same vein is Gallager's proof [3, §3.8] that (1/n) I(S; Y^n) is bounded away from zero when H is the parity-check matrix of a regular LDPC code of a given rate. At the other extreme, we have the well-known fact that random parity-check codes achieve capacity, which a fortiori implies that I(S; Y^n) is typically o(n).

IV. POLAR PARITY-CHECK MATRICES
In this part, we apply the results of Sect. III-C to the situation where H is a parity-check matrix derived from polar coding and show that there is no rate penalty in this case. For brevity, we will refer to parity-check matrices obtained from polar coding as "polar parity-check" matrices. We first give a brief description of polar codes; for details, we refer to [4]. Let

    F = [1 0; 1 1]

and let G_ℓ = F^{⊗ℓ} denote the ℓth Kronecker power of F. Note that G_ℓ is an n × n matrix with n = 2^ℓ and its inverse is itself, G_ℓ^{−1} = G_ℓ. Polar codes are defined in terms of the mapping x^n = u^n G_ℓ, where x^n denotes the codeword and u^n denotes the source word. In polar coding, we "freeze" a certain subset of coordinates of the source word u^n and insert the data payload in the remaining portion of u^n. To be specific, let F ⊂ [1 : n] denote the indices marking the frozen part of u^n and let u_F = (u_i : i ∈ F) denote the frozen part. By convention, we set u_F = s for some fixed pattern s ∈ X^|F| and keep this part unchanged from one transmission to the next, while we leave the other part u_{F^c} free. The parity-check matrix for polar codes can be derived as follows. We begin with the definition that a word x^n is a polar codeword iff x^n = u^n G_ℓ for some u^n with u_F = s. Using the inverse relation u^n = x^n G_ℓ^{−1}, we obtain that x^n is a codeword iff s = (x^n G_ℓ^{−1})_F. Next, we observe that (x^n G_ℓ^{−1})_F = x^n (G_ℓ^{−1})_F, where (G_ℓ^{−1})_F denotes the submatrix of G_ℓ^{−1} obtained by taking the columns with indices in F. Thus, we obtain a parity-check matrix for polar codes, namely,

    H = ((G_ℓ^{−1})_F)^T.   (20)

Now, we consider Lemma 2 in connection with an ensemble (S, X^n, Y^n) based on a polar parity-check matrix.
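The derivation of (20) can be checked mechanically: G_ℓ is its own inverse over F_2, and the syndrome x^n H^T recovers u_F. A minimal sketch (0-based indices, in contrast to the paper's 1-based convention; the frozen set below is an arbitrary placeholder rather than a designed one):

```python
def kron(A, B):
    """Kronecker product of 0/1 matrices given as lists of lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul2(A, B):
    """Matrix product over F_2."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

ell = 3
n = 2 ** ell
G = [[1]]
for _ in range(ell):
    G = kron(G, [[1, 0], [1, 1]])       # G_ell = F^{(x) ell}

identity = [[int(i == j) for j in range(n)] for i in range(n)]
assert matmul2(G, G) == identity         # G_ell is its own inverse over F_2

frozen = [0, 1, 2, 4]                    # placeholder frozen set F
# H = ((G^{-1})_F)^T: row k of H is column frozen[k] of G^{-1} = G_ell
H = [[G[i][f] for i in range(n)] for f in frozen]

u = [1, 0, 1, 1, 0, 0, 1, 0]             # any source word; x^n = u^n G_ell
x = matmul2([u], G)[0]
syndrome = [sum(xi * hi for xi, hi in zip(x, row)) % 2 for row in H]
assert syndrome == [u[f] for f in frozen]   # x^n H^T = u_F, as in (20)
print("H = ((G^-1)_F)^T verified: x^n H^T = u_F")
```

Because the assertion s = x^n H^T = u_F holds for every source word u^n, the cosets K_s of Sect. III-C correspond exactly to the polar codes obtained by varying the frozen pattern s.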
We annex to this ensemble the random vector U^n ≜ X^n G_ℓ^{−1} that corresponds to the source word in polar coding, so that we have the relation S = (X^n G_ℓ^{−1})_F = U_F. We wish to show that if F is chosen using the usual polar code design rules, then the rate penalty (1/n) I(S; Y^n) will be negligible. The specific design rule we use here fixes a β < 1/2 and sets

    F = { i ∈ [1 : n] : H(U_i | Y^n, U^{i−1}) > 2^{−n^β} }.   (21)

Now, by standard facts about the entropy function, we have

    I(S; Y^n) = I(U_F; Y^n)
      (a) = Σ_{i ∈ F} I(U_i; Y^n | U_{F_{i−1}})
          = Σ_{i ∈ F} [H(U_i | U_{F_{i−1}}) − H(U_i | Y^n, U_{F_{i−1}})]
          ≤ Σ_{i ∈ F} [1 − H(U_i | Y^n, U^{i−1})]
      (b) ≤ |M| + Σ_{i ∈ H} 2^{−n^β}
      (c) ≤ o(n) + n 2^{−n^β} = o(n),

where in (a) we defined F_{i−1} ≜ {j ∈ F : j ≤ i − 1}, in (b) we split F into

    M = { i ∈ [1 : n] : 2^{−n^β} < H(U_i | Y^n, U^{i−1}) ≤ 1 − 2^{−n^β} }

and

    H = { i ∈ [1 : n] : H(U_i | Y^n, U^{i−1}) > 1 − 2^{−n^β} },

and in (c) we used polarization results [5] to write the bound |M| = o(n). Thus, by Lemma 1 and Lemma 2, we conclude that the rate penalty (1/n) I(S; Y^n) is o(1), and I(X;Y) is achievable using the polar parity-check ensemble.

The number of constraints imposed by polar parity-checks is |F|, which is nH(X|Y) + o(n) [5]. The dimensionality of the ensemble X^n is reduced from nH(X) + o(n) to nI(X;Y) + o(n) by the polar parity-checks; this is the smallest possible dimensionality (to order o(n)) for an ensemble that achieves I(X;Y).

We refrained from calling the codes generated under polar parity-checks "polar codes" because there are major differences between the two classes of codes. To discuss this further, let us refer to the polar parity-check codes of this paper as PPC codes and reserve the term "polar code" for ordinary polar codes as defined in [4]. The results of this paper establish that PPC codes achieve I(X;Y) with a probability of error that goes to zero exponentially in n, while for polar codes the error exponent is not better than √n even under ML decoding.
The √n exponent arises from the fact that a code generated by a submatrix of G_ℓ cannot have a minimum distance better than O(√n) for any fixed non-zero code rate. It must be that, on average, PPC codes have a minimum distance proportional to n; otherwise, their error exponent would not be proportional to n. This significant increase in minimum distance can be attributed to the random selection of codewords; a PPC code may be seen as an expurgated polar code. The expurgation removes the defects in the polar code, but it also destroys the linear structure in the code. In standard polar coding, the mapping from messages to codewords is a linear relation of the form x^n = u^n G_ℓ, which can be implemented in complexity O(n log n). Under PPC coding, there is no linear relationship of this type between data bits and codewords; hence, one can no longer claim that the encoding complexity is O(n log n). Thus, PPC codes show a gain in performance at the expense of giving up the low-complexity encoding properties of polar codes. Clearly, similar remarks apply to the complexity of decoding.

For PPC codes, achieving I(X;Y) under an arbitrary target distribution q(x) is no different than achieving it under a uniform q(x). With polar codes, achieving I(X;Y) for a non-uniform q(x) is not a straightforward task; it requires extension of the standard method and employing common randomness between the encoder and decoder in order to shape the channel input distribution [6]. With PPC codes, the shaping is built into the code selection procedure.

V. SUMMARY
The main motivation for this work has been to develop a packing lemma for polar codes that would enable translation of proofs by standard packing lemmas to similar results for polar coding. More work needs to be done to accomplish this broader goal. The main contribution of the paper has been the development of a technique for analyzing the performance of a random code ensemble defined by a fixed parity-check matrix. In this sense, the results may have relevance to a broader class of codes than polar codes. An interesting observation in the paper has been that the polar parity-check ensemble shows markedly better performance than the standard polar code of the same size. A better understanding of this phenomenon may be useful in designing better polar codes.

ACKNOWLEDGMENT
This work was supported in part by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM.
REFERENCES

[1] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[2] D. Dubhashi and A. Panconesi, Concentration of Measure for the Analysis of Randomised Algorithms. Cambridge University Press, 2009.
[3] R. G. Gallager, Low-Density Parity-Check Codes. M.I.T. Press, 1963.
[4] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inform. Theory, vol. 55, pp. 3051-3073, July 2009.
[5] E. Arıkan, "Source polarization," in Proc. 2010 IEEE Int. Symp. Inform. Theory, Austin, TX, pp. 899-903, June 2010.
[6] J. Honda and H. Yamamoto, "Polar coding without alphabet extension for asymmetric channels," in Proc. 2012 IEEE Int. Symp. Inform. Theory, 2012.