A Deterministic Algorithm for Computing the Weight Distribution of Polar Codes
Hanwen Yao
University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, [email protected]
Arman Fazeli
University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, [email protected]
Alexander Vardy
University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, [email protected]
February 16, 2021
Abstract
We present a deterministic algorithm for computing the entire weight distribution of polar codes. As the first step, we derive an efficient recursive procedure to compute the weight distributions that arise in successive cancellation decoding of polar codes along any decoding path. This solves the open problem recently posed by Polyanskaya, Davletshin, and Polyanskii. Using this recursive procedure, we can compute the entire weight distribution of certain polar cosets in time O(n^2). Any polar code can be represented as a disjoint union of such cosets; moreover, this representation extends to polar codes with dynamically frozen bits. This implies that our methods can also be used to compute the weight distribution of polar codes with CRC precoding, of polarization-adjusted convolutional (PAC) codes and, in fact, of general linear codes. However, the number of polar cosets in such a representation scales exponentially with a parameter introduced herein, which we call the mixing factor. To reduce the exponential complexity of our algorithm, we make use of the fact that polar codes have a large automorphism group, which includes the lower-triangular affine group LTA(m,2). We prove that LTA(m,2) acts transitively on certain sets of monomials, thereby drastically reducing the number of polar cosets we need to evaluate. This complexity reduction makes it possible to compute the weight distribution of any polar code of length up to n = 128.

The weight distribution of an error-correcting code counts the number of codewords of each given weight in the code. The weight distribution is one of the main characteristics of a code, and it plays a significant role in determining the error-detection and error-correction capabilities of the code. Polar coding, pioneered by Arıkan [1], gives rise to the first explicit family of codes that provably achieve capacity with efficient encoding and decoding for a wide range of channels. Since its invention, interest in and research on polar codes have been constantly rising in academia and industry over the past decade. Polar codes now have rich applications in wireless communication, storage systems, and beyond, and they have been adopted as part of the 5th generation wireless systems (5G) standard. Understanding the weight distribution of polar codes is thus of great importance in both theory and practice.

There are many prior attempts at characterizing the weight distribution of polar codes. In [2], the authors provide an explicit formula for the number of codewords of minimal weight in polar codes. In [3], the authors propose a way to search for low-weight codewords of polar codes by transmitting an all-zero codeword through a high-SNR AWGN channel in simulation, and decoding the received word using successive cancellation list (SCL) decoding with a huge list size. The authors of [4] improve this approach in terms of its memory usage. In [5], a probabilistic computation method is proposed to estimate the weight distribution of polar codes. This method is later improved in [6] in both accuracy and complexity. We remark that, apart from the results in [2], all the aforementioned approaches in the literature are non-deterministic, and they only provide an estimate of the weight distribution of polar codes. Moreover, in [3] and [4], only part of the weight distribution can be derived.
In this paper, we present a deterministic algorithm for computing the entire weight distribution of polar codes. Our algorithm is based on an efficient recursive procedure to compute the weight enumerating functions of certain polar cosets, to be defined later, that arise in successive cancellation decoding. In a prior work, Polyanskaya, Davletshin, and Polyanskii [7] introduced an algorithm that computes the weight distribution of these polar cosets along the all-zero decoding path; how to compute the weight distribution of polar cosets along any decoding path remained open. In this work, we solve this problem by establishing a recursive relation satisfied by the weight enumerating functions (WEFs) of those cosets. Our recursive computation procedure has two applications: computing the entire weight distribution of polar codes, and analyzing successive cancellation (SC) decoding performance as in [7]. According to [7], deriving the weight distributions of polar cosets along any decoding path would also be helpful in analyzing the performance of SCL decoding for polar codes.

To compute the entire weight distribution of a polar code, we first represent the code as a disjoint union of certain polar cosets, and then obtain the WEF of the entire code as the sum of the WEFs of those cosets. This representation extends to polar codes with dynamically frozen bits. This implies that our method can be used to compute the weight distribution of polar codes with CRC precoding [8], of polarization-adjusted convolutional (PAC) codes [9], and so on. Since any binary linear code can be represented as a polar code with dynamically frozen bits [10], our algorithm applies to general linear codes as well. However, the number of polar cosets in this representation scales exponentially with a code parameter that we refer to as the mixing factor. In this paper, we also discuss the mixing factors of polar codes.

Our algorithm works for polar codes in a general setting, where we are allowed to select any subset of rows of the polar transformation matrix. Under a more restricted definition of polar codes, where we only select the bit channels with the smallest Bhattacharyya parameters, we can reduce the exponential complexity of our algorithm using the automorphism group of polar codes. Polar codes, as decreasing monomial codes [2], have a large automorphism group, which includes the lower triangular affine group LTA(m,2) [2]. We prove that LTA(m,2) acts transitively on certain sets of monomials, which allows us to drastically reduce the number of cosets we need to evaluate. This complexity reduction makes it possible to compute the weight distribution of any polar code of length up to 128. In particular, since Reed-Muller codes can also be viewed as decreasing monomial codes, our complexity reduction applies to Reed-Muller codes as well. This enables our algorithm to compute the entire weight distribution of Reed-Muller codes for all rates and lengths up to 128 with reasonable complexity.

Notation

We use the following notation conventions throughout the paper. We use bold letters like u to denote vectors, and non-bold letters like u_i to denote symbols within that vector. We use u^i to denote (u_0, u_1, ..., u_i), the subvector of u consisting of its first (i+1) symbols.
Also, we use u_even and u_odd to denote the subvectors (u_0, u_2, ...) and (u_1, u_3, ...) of u with only even indices and only odd indices, respectively, and we use u^i_even and u^i_odd to denote the subvectors of u^i with only even indices and only odd indices, respectively. We use ψ to denote the empty vector.

In this section, we give a brief review of polar codes, and then define polar cosets and their weight enumerating functions (WEFs). For the details of polar codes, we refer the reader to Arıkan's seminal paper [1]. Assuming n = 2^m, an (n, k) polar code is a binary linear block code generated by k rows of the polar transformation matrix G_n = B_n K^{⊗m}, where

    K = [ 1 0 ; 1 1 ],

K^{⊗m} is the m-th Kronecker power of K, and B_n is the n × n bit-reversal permutation matrix. We denote by u the information vector of length n, by c = u G_n the codeword for transmission, and by y the length-n received word.
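As a small worked instance of this definition (added here for concreteness; the matrices below follow directly from G_n = B_n K^{⊗m} and are easy to verify by hand), for m = 2 we have

    K^{⊗2} =
        1 0 0 0
        1 1 0 0
        1 0 1 0
        1 1 1 1

and the bit-reversal permutation B_4 swaps rows 1 and 2, so that

    G_4 = B_4 K^{⊗2} =
        1 0 0 0
        1 0 1 0
        1 1 0 0
        1 1 1 1 .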
Let W : {0,1} → Y be a binary memoryless symmetric (BMS) channel, characterized in terms of its transition probabilities W(y | x) for all x ∈ {0,1} and y ∈ Y. Let W^n : {0,1}^n → Y^n be the channel corresponding to n independent uses of W. The i-th bit channel W^(i)_n : {0,1} → {0,1}^i × Y^n, for 0 ≤ i ≤ n − 1, can be defined as

    W^(i)_n(y, u^{i−1} | u_i) = (1 / 2^{n−1}) Σ_{u' ∈ {0,1}^{n−1−i}} W^n( y | (u^{i−1}, u_i, u') G_n ),   (1)

where (u^{i−1}, u_i, u') denotes the concatenation of u^{i−1}, u_i, and u'. These bit channels satisfy a recursive relation, shown in [1, Equations (22) and (23)] with different notation. We can define polar cosets associated with the polar transformation matrix G_n as follows:

Definition 1.
Let u^{i−1} ∈ {0,1}^i and u_i ∈ {0,1}. We define the polar coset C^(i)_n(u^{i−1}, u_i), given u^{i−1} and u_i, as

    C^(i)_n(u^{i−1}, u_i) = { (u^{i−1}, u_i, u') G_n : u' ∈ {0,1}^{n−1−i} },

where, for i = 0, u^{i−1} is the empty vector ψ. The reason that we use the notation (u^{i−1}, u_i, u') instead of (u^i, u') will become clear from Proposition 1 below. For the i-th bit channel W^(i)_n, (1) can also be written in an equivalent form using the polar coset:

    W^(i)_n(y, u^{i−1} | u_i) = (1 / 2^{n−1}) Σ_{c ∈ C^(i)_n(u^{i−1}, u_i)} W^n(y | c).

Next, we define the WEFs of the polar cosets.

Definition 2.
Let u^{i−1} ∈ {0,1}^i and u_i ∈ {0,1}. Define A^(i)_n(u^{i−1}, u_i)(X), the weight enumerating function of C^(i)_n(u^{i−1}, u_i), to be the polynomial

    A^(i)_n(u^{i−1}, u_i)(X) = Σ_{w=0}^{n} A_w X^w,

where A_w is the number of words of Hamming weight w in C^(i)_n(u^{i−1}, u_i).

Later, we will show that computing these coset WEFs is the fundamental step in our computation of the entire weight distribution of polar codes. In the next section, we present a recursive procedure that computes A^(i)_n(u^{i−1}, u_i)(X) for any u^{i−1} and u_i.

In this section, we present one of the key results of this paper: a recursive procedure that computes the coset WEF A^(i)_n(u^{i−1}, u_i)(X) for any u^{i−1} and u_i. Recently, the authors of [7] introduced an algorithm that computes the weight distribution of the polar coset C^(i)_n(u^{i−1}, u_i) with u^{i−1} = 0; how to efficiently compute the weight distribution of C^(i)_n(u^{i−1}, u_i) for arbitrary u^{i−1} remained open. In this section, we present a recursive computation procedure that solves this problem. The procedure is based on the recursive relation given in the following proposition.

Proposition 1.
For any m > 0, n = 2^m, and 0 ≤ i ≤ n − 1,

    A^(2i)_{2n}(u^{2i−1}, u_{2i})(X) = Σ_{u_{2i+1} ∈ {0,1}} A^(i)_n(u^{2i−1}_even ⊕ u^{2i−1}_odd, u_{2i} ⊕ u_{2i+1})(X) · A^(i)_n(u^{2i−1}_odd, u_{2i+1})(X),   (2)

and

    A^(2i+1)_{2n}(u^{2i}, u_{2i+1})(X) = A^(i)_n(u^{2i−1}_even ⊕ u^{2i−1}_odd, u_{2i} ⊕ u_{2i+1})(X) · A^(i)_n(u^{2i−1}_odd, u_{2i+1})(X).   (3)

Proof. Let m > 0 and n = 2^m. First, for any u ∈ {0,1}^{2n} we have

    u G_{2n} = (u B_{2n}) K^{⊗(m+1)} = (u_even B_n, u_odd B_n) [ K^{⊗m} 0 ; K^{⊗m} K^{⊗m} ] = ((u_even ⊕ u_odd) B_n K^{⊗m}, u_odd B_n K^{⊗m}) = ((u_even ⊕ u_odd) G_n, u_odd G_n).

Then, according to Definition 1,

    C^(2i)_{2n}(u^{2i−1}, u_{2i}) = { (u^{2i−1}, u_{2i}, u') G_{2n} : u' ∈ {0,1}^{2n−1−2i} }.

In this expression, writing u' = (u_{2i+1}, u''), we can write

    (u^{2i−1}, u_{2i}, u_{2i+1}, u'') G_{2n} = ( (u^{2i−1}_even ⊕ u^{2i−1}_odd, u_{2i} ⊕ u_{2i+1}, u''_even ⊕ u''_odd) G_n , (u^{2i−1}_odd, u_{2i+1}, u''_odd) G_n ).

Since, for fixed u''_odd, the vector u''_even ⊕ u''_odd ranges over {0,1}^{n−1−i} as u''_even ranges over {0,1}^{n−1−i}, we have

    C^(2i)_{2n}(u^{2i−1}, u_{2i}) = ⋃_{u_{2i+1} ∈ {0,1}} { (c_1, c_2) : c_1 ∈ C^(i)_n(u^{2i−1}_even ⊕ u^{2i−1}_odd, u_{2i} ⊕ u_{2i+1}), c_2 ∈ C^(i)_n(u^{2i−1}_odd, u_{2i+1}) }.

Therefore, C^(2i)_{2n}(u^{2i−1}, u_{2i}) can be expressed as a union of concatenations of two polar cosets. Since the WEF of the concatenation of two polar cosets equals the product of their respective WEFs, we obtain (2). Equation (3) can be proved in a similar way.

Notice that this recursive relation satisfied by the coset WEFs is very similar to the recursive relation satisfied by the bit channels, shown in [1, Equations (22) and (23)] with different notation.

Using (2) and (3) in Proposition 1, we can compute the WEF A^(i)_n(u^{i−1}, u_i)(X) of any polar coset recursively, with the stopping conditions

    A^(0)_1(ψ, 0)(X) = 1,   A^(0)_1(ψ, 1)(X) = X.   (4)

The steps of this recursive procedure are shown in Algorithm 1.

Algorithm 1: CalcA(n, u^{i−1})
  Input: block length n and vector u^{i−1}
  Output: the pair of polynomials ( A^(i)_n(u^{i−1}, 0)(X), A^(i)_n(u^{i−1}, 1)(X) )
  if n = 1 then return (1, X)                                // stopping condition (4)
  else if i mod 2 = 0 then
      (f_0, f_1) ← CalcA(n/2, u^{i−1}_even ⊕ u^{i−1}_odd)
      (g_0, g_1) ← CalcA(n/2, u^{i−1}_odd)
      return (f_0 g_0 + f_1 g_1, f_1 g_0 + f_0 g_1)          // use Equation (2)
  else
      (f_0, f_1) ← CalcA(n/2, u^{i−2}_even ⊕ u^{i−2}_odd)
      (g_0, g_1) ← CalcA(n/2, u^{i−2}_odd)
      if u_{i−1} = 0 then return (f_0 g_0, f_1 g_1)          // use Equation (3)
      else return (f_1 g_0, f_0 g_1)                         // use Equation (3)

Next, we analyze its complexity. Denote by T(n) the running time of Algorithm 1, where n is the block length. Notice that n is also the maximal degree of the polynomials f_0, f_1, g_0, g_1 in the algorithm. In every recurrence, the computation is divided into two recursive calls of the same algorithm, each with half of the parameter n, plus up to three extra polynomial operations (additions and multiplications). Assuming that multiplication of two degree-n polynomials takes time O(n^2), the recurrence is T(n) = 2T(n/2) + O(n^2), which by the Master theorem [11] gives T(n) = O(n^2). So Algorithm 1 has complexity O(n^2). This complexity may be further improved if multiplication of two degree-n polynomials is carried out in time O(n log n) with the Fast Fourier Transform.
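To make the recursion concrete, the following short Python sketch (ours, not part of the paper) implements Algorithm 1 with polynomials stored as coefficient lists, and cross-checks the result against a brute-force enumeration of the coset in Definition 1. The helper names calc_A, polar_transform, and brute_force_wef are our own.

    import itertools

    def polymul(p, q):
        """Multiply two polynomials given as coefficient lists (index = degree)."""
        r = [0] * (len(p) + len(q) - 1)
        for i, a in enumerate(p):
            for j, b in enumerate(q):
                r[i + j] += a * b
        return r

    def polyadd(p, q):
        """Add two polynomials given as coefficient lists."""
        r = [0] * max(len(p), len(q))
        for i, a in enumerate(p):
            r[i] += a
        for j, b in enumerate(q):
            r[j] += b
        return r

    def calc_A(n, u):
        """Algorithm 1: return (A_n^(i)(u,0)(X), A_n^(i)(u,1)(X)) for i = len(u)."""
        i = len(u)
        if n == 1:
            return [1], [0, 1]                                  # stopping condition (4)
        if i % 2 == 0:                                          # Equation (2)
            ue, uo = u[0::2], u[1::2]
            f0, f1 = calc_A(n // 2, [a ^ b for a, b in zip(ue, uo)])
            g0, g1 = calc_A(n // 2, uo)
            return (polyadd(polymul(f0, g0), polymul(f1, g1)),
                    polyadd(polymul(f1, g0), polymul(f0, g1)))
        ve, vo = u[:-1][0::2], u[:-1][1::2]                     # Equation (3): drop u_{i-1}
        f0, f1 = calc_A(n // 2, [a ^ b for a, b in zip(ve, vo)])
        g0, g1 = calc_A(n // 2, vo)
        if u[-1] == 0:
            return polymul(f0, g0), polymul(f1, g1)
        return polymul(f1, g0), polymul(f0, g1)

    def polar_transform(u):
        """u G_n over GF(2), using the identity u G_{2n} = ((u_even + u_odd) G_n, u_odd G_n)."""
        if len(u) == 1:
            return list(u)
        ue, uo = u[0::2], u[1::2]
        return polar_transform([a ^ b for a, b in zip(ue, uo)]) + polar_transform(uo)

    def brute_force_wef(n, prefix, ui):
        """Weight distribution of C_n^(i)(prefix, ui) by direct enumeration (Definition 1)."""
        counts = [0] * (n + 1)
        for tail in itertools.product([0, 1], repeat=n - len(prefix) - 1):
            counts[sum(polar_transform(list(prefix) + [ui] + list(tail)))] += 1
        return counts

    # Cross-check on a small case: an arbitrary prefix u^{i-1}, here n = 16 and i = 6.
    prefix = [0, 1, 1, 0, 1, 0]
    A0, A1 = calc_A(16, prefix)
    for ui, A in ((0, A0), (1, A1)):
        bf = brute_force_wef(16, prefix, ui)
        assert all(bf[w] == (A[w] if w < len(A) else 0) for w in range(17))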
In this section, we introduce our main deterministic algorithm, which computes the entire weight distribution of polar codes, and of polar codes with dynamically frozen bits.
First, we show that any polar code can be represented as a disjoint union of polar cosets. This allows us to obtain the WEF of the entire code as the sum of the WEFs of those cosets.
Proposition 2.
For any polar code C with at least one frozen bit, denote its last frozen bit by u_s, and call the unfrozen bits before u_s the red bits. Then C can be represented as a disjoint union of the following polar cosets:

    C = ⋃_{u_i ∈ {0,1} for red u_i; u_i = 0 for frozen u_i} C^(s)_n(u^{s−1}, u_s = 0).   (5)

Denote by A_C(X) the WEF of the entire code C. Then A_C(X) is the sum of the WEFs of those cosets:

    A_C(X) = Σ_{u_i ∈ {0,1} for red u_i; u_i = 0 for frozen u_i} A^(s)_n(u^{s−1}, 0)(X).   (6)

We illustrate Proposition 2 with the following example; the proof for general polar codes follows naturally.

Example 1. The (16, 11, 4) extended Hamming code C can be generated by 11 rows of the polar transformation matrix G_16, so we can also view C as a polar code of length 16. The polar transformation matrix G_16 is given in (7).

(7) [The 16 × 16 polar transformation matrix G_16 = B_16 K^{⊗4}, with rows labeled u_0, ..., u_15; the frozen rows u_0, u_1, u_2, u_4, u_8 are printed in black, and the unfrozen rows are highlighted in red (u_3, u_5, u_6, u_7) and blue (u_9, ..., u_15).]

In (7), the information bits in black are frozen, and the information bits highlighted in red and blue are unfrozen. Here the last frozen bit is u_8. We call the unfrozen bits before u_8 the red bits, and the unfrozen bits after u_8 the blue bits. For a vector u with u_0 = u_1 = u_2 = u_4 = 0 and u_3, u_5, u_6, u_7 taking any values in {0,1}, the polar coset C^(8)_16(u^7, u_8 = 0) is a subset of C. There are 2^4 = 16 such disjoint polar cosets, and the code C is their union:

    C = ⋃_{u_3, u_5, u_6, u_7 ∈ {0,1}; u_0 = u_1 = u_2 = u_4 = 0} C^(8)_16(u^7, u_8 = 0).   (8)

Therefore, we can compute the WEF A_C(X) of the entire code C as the sum of the WEFs of those cosets:

    A_C(X) = Σ_{u_3, u_5, u_6, u_7 ∈ {0,1}; u_0 = u_1 = u_2 = u_4 = 0} A^(8)_16(u^7, 0)(X).   (9)

Algorithm 2: Compute the WEF of polar codes, or polar codes with dynamically frozen bits
  Input: block length n and the dynamic constraints for the frozen bits
  Output: the WEF A_C(X)
  s ← index of the last frozen bit
  A_C(X) ← 0
  for every assignment with u_i ∈ {0,1} for each red bit and u_i following the dynamic constraints for each frozen bit do
      (f_0, f_1) ← CalcA(n, u^{s−1})           // use Algorithm 1 to compute the coset WEF
      if u_s = 0 then A_C(X) ← A_C(X) + f_0
      else A_C(X) ← A_C(X) + f_1
  return A_C(X)
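The following Python sketch (ours, not from the paper; it reuses calc_A and polyadd from the sketch given after Algorithm 1) implements the enumeration of Algorithm 2 for ordinary polar codes, i.e., with all frozen bits set to zero. As a check, for the (16, 11, 4) code of Example 1 it returns the known weight distribution of the extended Hamming code.

    def code_wef(n, frozen):
        """WEF of a polar code with the given frozen index set (frozen bits = 0),
        following Proposition 2 / Algorithm 2: enumerate the red bits and sum the
        coset WEFs returned by calc_A."""
        s = max(frozen)                                   # index of the last frozen bit
        red = [i for i in range(s) if i not in frozen]    # unfrozen bits before u_s
        total = [0] * (n + 1)
        for bits in itertools.product([0, 1], repeat=len(red)):
            prefix = [0] * s                              # u_0, ..., u_{s-1}
            for idx, b in zip(red, bits):
                prefix[idx] = b
            A0, _ = calc_A(n, prefix)                     # coset WEF with u_s = 0
            total = polyadd(total, A0)
        return total

    # The (16,11,4) code of Example 1: frozen bits u_0, u_1, u_2, u_4, u_8.
    # Its WEF should be that of the extended Hamming code:
    # 1 + 140 X^4 + 448 X^6 + 870 X^8 + 448 X^10 + 140 X^12 + X^16.
    print(code_wef(16, [0, 1, 2, 4, 8]))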
From Proposition 2, any polar code can be represented as a disjoint union of polar cosets. This representation extends to polar codes with dynamically frozen bits. Polar codes with dynamically frozen bits, first introduced in [12], are polar codes in which each frozen bit u_i is not fixed to zero but is instead set to a Boolean function (usually, a linear function) of the previous bits, u_i = f_i(u_0, u_1, ..., u_{i−1}). We refer to the collection of these functions for the frozen bits as the dynamic constraints of the polar code. Polar codes with dynamically frozen bits include polar codes with CRC precoding [8, 13], polarization-adjusted convolutional (PAC) codes [9], and so on. In fact, since any binary linear code can be represented as a polar code with dynamically frozen bits [10], our algorithm extends to all binary linear codes as well. Proposition 2 extends to polar codes with dynamically frozen bits as follows:

Proposition 3. For any polar code C with dynamically frozen bits, denote its last frozen bit by u_s, and call the unfrozen bits before u_s the red bits. Then C can be represented as a disjoint union of the following polar cosets:

    C = ⋃_{u_i ∈ {0,1} for red u_i; u_i follows the dynamic constraints for frozen u_i} C^(s)_n(u^{s−1}, u_s).   (10)

Denote by A_C(X) the WEF of the entire code C. Then A_C(X) is the sum of the WEFs of those cosets:

    A_C(X) = Σ_{u_i ∈ {0,1} for red u_i; u_i follows the dynamic constraints for frozen u_i} A^(s)_n(u^{s−1}, u_s)(X).   (11)

Therefore, by representing any polar code, and any polar code with dynamically frozen bits, as a disjoint union of polar cosets, we can obtain its WEF as the sum of the WEFs of those cosets. The WEF of each polar coset can be computed with Algorithm 1. This procedure is summarized in Algorithm 2. In Algorithm 2, the number of coset WEFs we need to evaluate depends on the number of red bits. We define the number of red bits as the mixing factor of the code:

Definition 3. For any polar code, and any polar code with dynamically frozen bits, define the number of unfrozen bits before the last frozen bit as the mixing factor of the code, denoted γ.

In Algorithm 2, the number of coset WEFs we need to compute equals 2^γ, so Algorithm 2 has complexity O(2^γ n^2). We can see that, for any polar code, the complexity of Algorithm 2 is largely governed by the mixing factor of the code. We remark that although any binary linear code can be represented as a polar code with dynamically frozen bits, many such representations have a large mixing factor γ, making Algorithm 2 impractical for relatively large block lengths.

In Section 2, we introduced (n, k) polar codes in a broad sense, where we may pick any k rows of the polar transformation matrix G_n as generators of the code. If we follow the more restricted definition, where we pick the k bit channels having the smallest Bhattacharyya parameters, as in Arıkan's original construction [1], the constructed polar code becomes a decreasing monomial code, as introduced in [2]. In this section, we present results on the largest mixing factor that polar codes, as decreasing monomial codes, can have at each block length. We first recall the definition of decreasing monomial codes. For the details of decreasing monomial codes and their algebraic properties, we refer the reader to the ground-breaking paper [2] by Bardet, Dragoi, Otmani, and Tillich.

For n = 2^m, define the polynomial ring R_m = F_2[x_0, ..., x_{m−1}] / (x_0^2 − x_0, ..., x_{m−1}^2 − x_{m−1}). We associate each polynomial g ∈ R_m with a binary vector in F_2^n given by the evaluations of g at all binary points b = (b_0, ..., b_{m−1}) ∈ F_2^m. In other words, we associate the polynomial g with ev(g) = (g(b))_{b ∈ F_2^m}, where ev : R_m → F_2^n is a homomorphism. For the evaluation, we order the points b ∈ F_2^m so that the numbers Σ_{i=0}^{m−1} (1 − b_i) 2^{m−1−i} appear in natural order from 0 to 2^m − 1. The order of b in our evaluation differs from that in [2], since in this paper we multiply the information vector by the polar transformation matrix G_n, whereas [2] uses a different matrix. Denote the set of monomials in R_m by M_m = { x_0^{g_0} ⋯ x_{m−1}^{g_{m−1}} : (g_0, ..., g_{m−1}) ∈ F_2^m }. Monomial codes are defined as follows:
Definition 4.
Let n = 2^m and I ⊆ M_m. The monomial code C(I) generated by I is the linear space spanned by { ev(f) : f ∈ I }.

Since every row of the polar transformation matrix G_n can be obtained as ev(g) for some g ∈ M_m, polar codes can be viewed as monomial codes. Considering the binary expansion of each row index from 0 to 2^m − 1, the row with index Σ_{i=0}^{m−1} b_i 2^i in G_n can be obtained as ev( x_0^{1−b_0} x_1^{1−b_1} ⋯ x_{m−1}^{1−b_{m−1}} ). For the rest of this paper, we will also use monomials to represent the rows of G_n, and we refer to the monomials in G_n corresponding to the red bits of a code as the red monomials.

For the monomials in M_m, we can define the following partial order, as in [2]:

Definition 5. Two monomials in M_m of the same degree are ordered as x_{i_1} ⋯ x_{i_t} ⪯ x_{j_1} ⋯ x_{j_t} if and only if i_ℓ ≤ j_ℓ holds for every ℓ ∈ {1, ..., t}, assuming i_1 < ⋯ < i_t and j_1 < ⋯ < j_t. Two monomials f, g ∈ M_m of different degrees are ordered as f ⪯ g if there is a divisor g* of g having the same degree as f such that f ⪯ g*.

Decreasing monomial codes, which respect this order, are defined as follows:
Definition 6.
A set I ⊆ M_m is decreasing if (g ∈ I and f ⪯ g) implies f ∈ I. We call the monomial code generated by a decreasing set a decreasing monomial code.

For the connection between ⪯ and the indices of monomials in G_n, we state the following lemma without proof.

Lemma 1.
Let n = 2^m and f, g ∈ M_m. If f ⪯ g, then the index of f must be larger than the index of g in G_n.

To compute the largest mixing factor of decreasing monomial codes at a given length, we state the following proposition.
Proposition 4.
Let C be a length-n decreasing monomial code whose last frozen row in G_n is the monomial τ. Then the largest mixing factor C can have equals

    |{ g lying above τ in G_n : g and τ are incomparable with respect to ⪯ }|.   (12)

Proof.
For any decreasing monomial code with last frozen row τ, we first prove that (12) is an upper bound on its mixing factor. For every row g lying above τ in G_n, by Lemma 1 we know that either τ ⪯ g, or τ and g are incomparable. If τ ⪯ g, then g must be frozen: otherwise, by the definition of decreasing monomial codes, g ∈ I would imply τ ∈ I, contradicting the fact that τ is frozen. So if g is an unfrozen row lying above τ, then g and τ must be incomparable. Therefore, (12) is an upper bound on the mixing factor of every decreasing monomial code with last frozen row τ.

Next, we construct a decreasing monomial code that attains the mixing factor (12), with the generating set

    I = { g lying above τ in G_n : g and τ are incomparable with respect to ⪯ } ∪ { g lying below τ in G_n }.

We prove that I is a decreasing set as follows. First, let f ∈ I be a monomial lying above τ in G_n. We need to prove that every g with g ⪯ f satisfies g ∈ I. If g lies below τ, then we are done. Otherwise, we need to prove that g and τ are incomparable, which we do by contradiction. If g and τ were comparable with respect to ⪯, then by Lemma 1 we must have τ ⪯ g; together with g ⪯ f this gives τ ⪯ f, which contradicts f ∈ I. So g and τ must be incomparable, which means g ∈ I. Next, let f ∈ I be a monomial lying below τ in G_n. We again need to prove that every g with g ⪯ f satisfies g ∈ I. By Lemma 1, such a g must lie below f, so g also lies below τ in G_n, which gives g ∈ I. This finishes the proof.

Note that, for any two monomials f, g ∈ M_m, we can decide with complexity O(log n) whether f ⪯ g, g ⪯ f, or f and g are incomparable with respect to ⪯; the steps of the comparison are shown in Algorithm 3. Then, using Proposition 4, we can compute the largest mixing factor of decreasing monomial codes at a given length n as the maximum of (12) over all τ in G_n. This computation has complexity O(n^2 log n) at length n. The results for block lengths up to 1024 are shown in Table 1. Since, by the MacWilliams identity [14], one can easily obtain the weight distribution of a code from the weight distribution of its dual, we can further restrict our search space to codes with rates at most 1/2 to get a better complexity cap.

Algorithm 3: Compare two different monomials with respect to the partial order ⪯
  Input: monomials f = x_{i_1} ⋯ x_{i_s} and g = x_{j_1} ⋯ x_{j_t} in M_m, assuming i_1 < ⋯ < i_s and j_1 < ⋯ < j_t
  Output: the order of f and g with respect to ⪯
  if s > t then                            // either g ⪯ f or incomparable
      for k = 1, ..., t do
          if i_{k+s−t} < j_k then return "f and g are incomparable"
      return "g ⪯ f"
  else if s < t then                       // either f ⪯ g or incomparable
      for k = 1, ..., s do
          if i_k > j_{k+t−s} then return "f and g are incomparable"
      return "f ⪯ g"
  else                                     // f and g have the same degree
      r ← smallest integer such that i_r ≠ j_r
      if i_r > j_r then                    // either g ⪯ f or incomparable
          for k = r + 1, ..., s do
              if i_k < j_k then return "f and g are incomparable"
          return "g ⪯ f"
      else                                 // either f ⪯ g or incomparable
          for k = r + 1, ..., s do
              if i_k > j_k then return "f and g are incomparable"
          return "f ⪯ g"

Table 1. Largest mixing factors γ of decreasing monomial codes at each length, indexed by log_2(n). (The numerical entries of the table are not reproduced here.)
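As an illustration (ours, not part of the paper), the comparison of Algorithm 3 and the maximization of (12) can be sketched in a few lines of Python. The row-to-monomial map below assumes the correspondence of this section, namely that the row with index r = Σ b_i 2^i carries the variables x_i with b_i = 0; this is consistent with Example 2 below, and the incomparability count in (12) does not depend on the direction of the variable labeling.

    def leq(f, g):
        """True iff monomial f ⪯ g (Definition 5); monomials are sorted index tuples."""
        if len(f) > len(g):
            return False
        off = len(g) - len(f)            # align f with the largest divisor of g of deg(f)
        return all(f[k] <= g[off + k] for k in range(len(f)))

    def monomial(r, m):
        """Monomial attached to row r of G_n: the variables x_i with bit i of r equal to 0."""
        return tuple(i for i in range(m) if not (r >> i) & 1)

    def largest_mixing_factor(m):
        """Maximum of (12) over all choices of the last frozen row τ, for n = 2^m."""
        n = 1 << m
        best = 0
        for tau in range(n):
            t = monomial(tau, m)
            cnt = sum(1 for r in range(tau)        # rows lying above τ
                      if not leq(monomial(r, m), t) and not leq(t, monomial(r, m)))
            best = max(best, cnt)
        return best

Calling largest_mixing_factor(m) for m = 1, ..., 10 should regenerate the entries of Table 1.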
Proposition 5.
Let C be a length-n decreasing monomial code with rate at most 1/2, and let its last frozen row in G_n be the monomial τ. Then an upper bound on the mixing factor of C is the minimum of (12) and (index of τ in G_n) + 1 − n/2.

Proof. By Proposition 4, we only need to show that for any length-n decreasing monomial code with rate at most 1/2 and last frozen row τ in G_n, the mixing factor is upper bounded by (index of τ in G_n) + 1 − n/2. To show this, notice that since τ is the last frozen bit, there are n − 1 − (index of τ in G_n) unfrozen rows lying below τ in G_n. Since the code has rate at most 1/2, the number of remaining unfrozen rows, which lie above τ, is at most n/2 − (n − 1 − (index of τ in G_n)) = (index of τ in G_n) + 1 − n/2, and we are done.

Using Proposition 5, we can compute an upper bound on the largest mixing factor of decreasing monomial codes with rates at most 1/2 at length n as the maximum of the upper bounds in Proposition 5 over all τ in G_n.

Table 2. Upper bounds on the largest mixing factor of decreasing monomial codes with rates at most 1/2 at each length, indexed by log_2(n). (The numerical entries of the table are not reproduced here.)
The results for block lengths up to 1024 are shown in Table 2. We observe that at block lengths n = 2^3, 2^5, 2^7, 2^9, those upper bounds in Table 2 are met by the rate-1/2 Reed-Muller codes. We thus conjecture that, given the option of applying Algorithm 2 to either the code or its dual, the rate-1/2 Reed-Muller code has, in general, the highest complexity among decreasing monomial codes of the same length.
Polar codes, as decreasing monomial codes, have a large automorphism group that includes the lower triangular affine group LTA(m,2) [2, Theorem 2]. In this section, we show that LTA(m,2) acts transitively on certain sets of monomials, thereby drastically reducing the number of coset WEFs we need to evaluate in Algorithm 2. Since Reed-Muller codes can also be viewed as decreasing monomial codes, our complexity reduction applies to Reed-Muller codes as well. As we will see in Section 7, this complexity reduction makes it possible to compute the weight distribution of any polar code and any Reed-Muller code, for all rates, of length up to n = 128. For decreasing monomial codes, we prove two theorems that will help us with the complexity reduction. The first theorem states the following:

Theorem 1.
Let C(I) be a decreasing monomial code generated by a decreasing set I ⊆ M_m, and let C(I \ {f}) be the subcode obtained by freezing one extra monomial f of I, where f is the first unfrozen row in G_n. Then C(I \ {f}) is also a decreasing monomial code.

Proof. We prove this by contradiction. Let f be the first unfrozen row in G_n. If C(I \ {f}) is not a decreasing monomial code, then there exists a row g ∈ I \ {f} with f ⪯ g. By Lemma 1, the index of f must then be larger than the index of g in G_n; but since f is the first unfrozen row, every g ∈ I \ {f} has a larger index than f, which gives a contradiction.

Next, we turn to the second theorem, which involves LTA(m,2).

LTA(m,2) Acts Transitively on Certain Subsets of Decreasing Monomial Codes
First, we define the lower triangular affine group LTA ( m , 2 ) for the decreasing monomial codes as follows: Definition 7.
The lower triangular affine group over F_2^m, denoted LTA(m,2), consists of all affine transformations over F_2^m of the form x ↦ Ax + b, where A ∈ F_2^{m×m} is an m × m lower triangular binary matrix with an all-one diagonal, and b ∈ F_2^m.

Within LTA(m,2), we can define a subgroup with respect to a monomial g ∈ M_m, as in [2]:

Definition 8. For any g ∈ M_m, we define LTA(m,2)_g as the subgroup of LTA(m,2) such that, for any (A, b) ∈ LTA(m,2)_g with A = (a_ij), we have b_i = 0 if i ∉ ind(g), and, for i ≠ j,

    a_ij = 0   if i ∉ ind(g) or if j ∈ ind(g),   (13)

where ind(g) denotes the set of indices of the variables appearing in g.

Before we state the second theorem, we define a new relation on the monomials in M_m:

Definition 9. Two monomials f, g ∈ M_m are related as f ⪯_s g if one of the following holds:
1. f = x_{i_1} ⋯ x_{i_t} x_j and g = x_{i_1} ⋯ x_{i_t} x_k with j < k;
2. f = x_{i_1} ⋯ x_{i_t} and g = x_{i_1} ⋯ x_{i_t} x_k.
The subscript "s" in ⪯_s indicates that f differs from g by a single variable.

For the relation between ⪯_s and ⪯, we state the following proposition without proof:

Proposition 6.
For any f, g ∈ M_m:
1. If f ⪯_s g, then f ⪯ g.
2. If f ⪯ g, then there exists a finite sequence of monomials f = f_0 ⪯_s f_1 ⪯_s f_2 ⪯_s ⋯ ⪯_s f_t = g.

Now we state our second theorem involving LTA(m,2), together with its corollary.

Theorem 2.
For n = 2^m, let C(I) be a decreasing monomial code generated by a decreasing set I ⊆ M_m, let f be a monomial in I, and let S be a subset S ⊆ { g ∈ M_m : g ⪯_s f }. Consider the set X_{f,S} of all subsets of C(I) in which f is frozen to 1 and all the monomials in S are frozen to either 0 or 1. Then the group action of LTA(m,2)_f on X_{f,S} is transitive.

Before giving the proof of Theorem 2, we make several remarks:
1. The choice of f is arbitrary, but in our algorithm we always pick f as the first unfrozen monomial in G_n.
2. Since I is decreasing and ⪯_s implies ⪯ (Proposition 6), if f ∈ I, then every g with g ⪯_s f is in I as well. So S is a subset of I.
3. For the subsets in X_{f,S}, after we freeze f to 1, we have 2^{|S|} options for freezing the monomials in S. So X_{f,S} contains 2^{|S|} subsets, and their union is the large subset of C(I) in which only f is frozen to 1. Therefore,

    C(I) = C(I \ {f}) ∪ ⋃_{X ∈ X_{f,S}} X.   (14)

An important corollary of Theorem 2 is the following:

Corollary 1.
All the subsets in X_{f,S} have the same weight distribution.

Proof. For any two subsets X_1, X_2 ∈ X_{f,S}, there exists a permutation in LTA(m,2)_f that transforms X_1 into X_2. Since a permutation of coordinates preserves Hamming weights, X_1 and X_2 have the same weight distribution.

From Corollary 1, we know that many subsets of a decreasing monomial code, and in particular of a polar code, share the same weight distribution. Let us look at an example.

Example 2. Consider the same (16, 11, 4) extended Hamming code C as in Example 1. It can be viewed as a decreasing monomial code generated by rows of the polar transformation matrix G_16. The monomials associated with the rows of G_16 are given in (15).

(15) [The rows of G_16 labeled u_0, ..., u_15, each with its associated monomial; the monomial f and the three monomials of S are marked.]

As in Example 1, we call the monomials associated with the red bits the red monomials, and the monomials associated with the blue bits the blue monomials. In the generating set I, we can pick the monomial f to be the first red monomial x_2 x_3, and pick S to be the set S = { x_1 x_3, x_0 x_3, x_3 }, as shown in (15). We can check that x_1 x_3 ⪯_s x_2 x_3, x_0 x_3 ⪯_s x_2 x_3, and x_3 ⪯_s x_2 x_3. Define X_{f,S} as the set of all subsets of C in which we freeze f to 1 and freeze the three monomials in S to either 0 or 1. Then X_{f,S} consists of 2^3 = 8 polar cosets:

    X_{f,S} = { C^(8)_16(u^7, u_8 = 0) : u_3 = 1, u_0 = u_1 = u_2 = u_4 = 0 }.   (16)

In this example, f, the monomials in S, and all the frozen rows are consecutive in G_16, so the subsets in X_{f,S} can be written as polar cosets of G_16; in general, the subsets in X_{f,S} are not necessarily polar cosets. By Theorem 2, the group LTA(m,2)_f acts transitively on the 8 subsets in X_{f,S}, which, by Corollary 1, tells us that all the subsets in X_{f,S} have the same weight distribution. Therefore, among all the polar cosets we need to evaluate in Algorithm 2, these 8 cosets in X_{f,S} share the same WEF, so we can evaluate a single subset in X_{f,S} to obtain the weight distributions of all subsets in X_{f,S}.
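The invariance asserted in Example 2 is easy to check numerically. The following self-contained Python sketch (ours, not from the paper) enumerates the eight cosets of (16) by brute force and verifies that they all share one weight distribution.

    import itertools

    def polar_transform(u):
        """u G_n over GF(2), with G_n = B_n K^{⊗m} as in Section 2."""
        if len(u) == 1:
            return list(u)
        ue, uo = u[0::2], u[1::2]
        return polar_transform([a ^ b for a, b in zip(ue, uo)]) + polar_transform(uo)

    def coset_weights(n, prefix):
        """Weight distribution of the polar coset fixing u_0, ..., u_{len(prefix)-1}."""
        counts = [0] * (n + 1)
        for tail in itertools.product([0, 1], repeat=n - len(prefix)):
            counts[sum(polar_transform(list(prefix) + list(tail)))] += 1
        return counts

    # The eight cosets of (16): u_0 = u_1 = u_2 = u_4 = u_8 = 0, u_3 = 1,
    # and (u_5, u_6, u_7) ranging over {0,1}^3.
    dists = {tuple(coset_weights(16, [0, 0, 0, 1, 0, u5, u6, u7, 0]))
             for u5, u6, u7 in itertools.product([0, 1], repeat=3)}
    assert len(dists) == 1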
Now we state a lemma, and then present the proof of Theorem 2 using that lemma. For a clear description in the proofs, recall that ind(f) denotes the set of indices of the variables appearing in a monomial f ∈ M_m.

Lemma 2. Let f, g, h ∈ M_m be three different monomials with h ⪯_s f. Then, for (A, b) ∈ LTA(m,2)_f, the coefficient of h in (A, b) · g equals zero.

Proof. We prove this lemma by contradiction. Since (A, b) ∈ LTA(m,2)_f, we can first check that (A, b) · g can be written in the following way:

    (A, b) · g = ∏_{i ∈ ind(g) \ ind(f)} x_i  ·  ∏_{i ∈ ind(g) ∩ ind(f)} ( x_i + Σ_{j < i, j ∉ ind(f)} a_ij x_j + b_i ).   (17)

Assume the coefficient of h in (A, b) · g is not zero. We first consider the case f = x_{i_1} ⋯ x_{i_t} x_s and h = x_{i_1} ⋯ x_{i_t} x_k with k < s. Since i_1, ..., i_t ∈ ind(f), the only way to obtain the variables x_{i_1} ⋯ x_{i_t} of h from (17) is to have i_1, ..., i_t ∈ ind(g), contributed by the second product. But then, to obtain the last variable x_k of h, we either require ind(g) \ ind(f) = {k} and ind(g) ∩ ind(f) = {i_1, ..., i_t}, which gives g = h, or we require ind(g) \ ind(f) = ∅ and ind(g) ∩ ind(f) = ind(f), which gives g = f. Both situations give a contradiction. The case f = x_{i_1} ⋯ x_{i_t} x_s and h = x_{i_1} ⋯ x_{i_t} can be ruled out in a similar way.

Proof of Theorem 2. First, to prove that the permutations in LTA(m,2)_f act on the subsets in X_{f,S} as a group action, we need to show that for any (A, b) ∈ LTA(m,2)_f and X ∈ X_{f,S}, we have (A, b) · X ∈ X_{f,S}. For any subset X ∈ X_{f,S}, assuming each h ∈ S is frozen to u_h, we can express X as follows:

    X = f + Σ_{h ∈ S} u_h · h + span( { g ∈ I : g ≠ f, g ∉ S } ).   (18)

If we let (A, b) ∈ LTA(m,2)_f act on X, then on the right-hand side of (18): f becomes (A, b) · f = f + v, where v is a polynomial in the span of I \ {f}; each h ∈ S becomes (A, b) · h = h + v_h, where v_h is a polynomial in the span of { g ∈ I : g ≠ f, g ∉ S }; and span( { g ∈ I : g ≠ f, g ∉ S } ) is invariant under (A, b) by Lemma 2. Therefore, (A, b) · X becomes

    (A, b) · X = f + Σ_{h ∈ S} u'_h · h + span( { g ∈ I : g ≠ f, g ∉ S } ),

where the u'_h are some other frozen values for the h ∈ S. This shows that (A, b) · X ∈ X_{f,S}, and thus the permutations in LTA(m,2)_f act on the subsets in X_{f,S} as a group action. Next, we show that this action is transitive.

Let X_0 be the subset in X_{f,S} with

    X_0 = f + span( { g ∈ I : g ≠ f, g ∉ S } ).

Since LTA(m,2)_f is a subgroup, to prove that the group action of LTA(m,2)_f on X_{f,S} is transitive, it suffices to prove that for any X ∈ X_{f,S}, there exists a permutation (A, b) ∈ LTA(m,2)_f with (A, b) · X_0 = X. Using expression (17) and Lemma 2, we can check that

    (A, b) · X_0 = f + Σ_{h ∈ S} v_h · h + span( { g ∈ I : g ≠ f, g ∉ S } ),   (19)

where each v_h equals a single entry of (A, b), as follows: if f = x_{i_1} ⋯ x_{i_t} x_s and h = x_{i_1} ⋯ x_{i_t} x_k, then v_h = a_sk; if f = x_{i_1} ⋯ x_{i_t} x_s and h = x_{i_1} ⋯ x_{i_t}, then v_h = b_s. Therefore, for any X as in (18), if we pick (A, b) ∈ LTA(m,2)_f such that every entry of (A, b) that appears as some v_h in (19) equals the corresponding frozen value u_h in X, then (A, b) · X_0 = X. This completes the proof.

Next, we use Corollary 1 to reduce the number of cosets we need to evaluate in Algorithm 2. Assume C is a decreasing monomial code generated by a decreasing set I ⊆ M_m with positive mixing factor, and f ∈ I is its first unfrozen row in G_n. We select the subset S in Theorem 2 to be the set of all red monomials g with g ⪯_s f.
Let X_{f,S} be the set of all subsets of C in which f is frozen to 1 and all the monomials in S are frozen to either 0 or 1. X_{f,S} contains 2^{|S|} subsets, and their union is the large subset of C in which only f is frozen to 1. Therefore, C(I) can be represented as the disjoint union

    C(I) = C(I \ {f}) ∪ ⋃_{X ∈ X_{f,S}} X.   (20)

In this representation, C(I \ {f}) is another decreasing monomial code by Theorem 1, whose mixing factor is smaller than that of C by 1, and all the subsets in X_{f,S} share the same weight distribution by Corollary 1. Therefore, after we compute the WEF of C(I \ {f}) as B(X), and the WEF of one subset in X_{f,S} as C(X), we can obtain the WEF A_C(X) of the entire code C as

    A_C(X) = B(X) + 2^{|S|} · C(X).

This procedure is shown in Algorithm 4, where B(X), the WEF of C(I \ {f}), is computed recursively with Algorithm 4 itself.

Algorithm 4: Compute the WEF of decreasing monomial codes with reduced complexity
  Input: block length n and generating set I for the code C
  Output: the WEF A_C(X)
  if C has mixing factor zero then
      s ← index of the last frozen bit
      (f_0, f_1) ← CalcA(n, u^{s−1} = 0)
      return f_0
  else
      f ← first unfrozen monomial in G_n
      S ← set of all red monomials g with g ⪯_s f
      X ← the subset of C with f frozen to 1 and all monomials in S frozen to zero
      compute the WEF of C(I \ {f}) as B(X), recursively with Algorithm 4 itself
      compute the WEF of X as C(X) with Algorithm 2
      A_C(X) ← B(X) + 2^{|S|} · C(X)
      return A_C(X)

If we denote by λ_f the number of red monomials g lying below f in G_n for which g ⪯_s f does not hold, then the total number of polar cosets we need to evaluate in Algorithm 4 equals (1 + Σ_{f red} 2^{λ_f}). So Algorithm 4 has complexity O((1 + Σ_{f red} 2^{λ_f}) · n^2). Notice that the number of cosets we need to evaluate in Algorithm 4 can be easily calculated for any decreasing monomial code, so, by the MacWilliams identity, we again have the option of applying Algorithm 4 to either the code itself or its dual, whichever has the smaller complexity.
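For the (16, 11, 4) code of Examples 1 and 2, the reduction can be exercised directly with the helpers defined in the earlier sketches (calc_A, polyadd, code_wef). The snippet below is ours and only illustrates equation (20), with f the first red monomial and with the representative coset taken to be the one in which all of S is frozen to zero.

    frozen = [0, 1, 2, 4, 8]                       # the (16,11,4) code of Example 1
    full = code_wef(16, frozen)                    # Algorithm 2: 2^4 = 16 coset evaluations
    sub = code_wef(16, sorted(frozen + [3]))       # C(I \ {f}): u_3 also frozen, 8 evaluations
    rep, _ = calc_A(16, [0, 0, 0, 1, 0, 0, 0, 0])  # one representative coset: u_3 = 1, S frozen to 0
    reduced = polyadd(sub, [8 * c for c in rep])   # B(X) + 2^{|S|} C(X), with |S| = 3
    assert reduced == full                         # nine coset evaluations instead of sixteen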
In this section, we present the weight distribution of a (128, 64) polar code computed with our algorithm, and discuss the complexity of our algorithm on the self-dual (128, 64) Reed-Muller code.

First, we run our algorithms on a (128, 64) polar code, the same polar code used in [15], with its unfrozen indices in G_128 shown in Table 3. For this code, the number of coset WEFs we need to evaluate in Algorithm 2 equals 2^γ, and after the complexity reduction of Section 6, the number of coset WEFs we need to evaluate in Algorithm 4 is drastically smaller. The computed entire weight distribution of this code is shown in Table 3. In [15], we obtained lower bounds on the number of its low-weight codewords using the method of [3], presented in the first row of [15, Table 1]; those numbers coincide with the weight distribution shown in Table 3.

We then look at the (128, 64) Reed-Muller code, which attains the mixing factor upper bound at length 128 shown in Table 2. For this code, the number of coset WEFs we need to evaluate in Algorithm 2 equals 2^γ, and after the complexity reduction, the number of coset WEFs we need to evaluate in Algorithm 4 turns out to be feasible.

Table 3. Left: the unfrozen indices of the (128, 64) polar code in G_128; right: the weight distribution of the code, where all unlisted A_w equal zero. (The numerical entries of the table are not reproduced here.)
Since, at length 128, the self-dual (128, 64) Reed-Muller code is the unique decreasing monomial code that attains the mixing factor upper bound shown in Table 2, we expect all other complexities for polar codes and Reed-Muller codes at length 128 to be lower.

References

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] M. Bardet, V. Dragoi, A. Otmani, and J.-P. Tillich, "Algebraic properties of polar codes from a new polynomial formalism," in 2016 IEEE International Symposium on Information Theory (ISIT). IEEE, 2016, pp. 230–234.
[3] B. Li, H. Shen, and D. Tse, "An adaptive successive cancellation list decoder for polar codes with cyclic redundancy check," IEEE Communications Letters, vol. 16, no. 12, pp. 2044–2047, 2012.
[4] Z. Liu, K. Chen, K. Niu, and Z. He, "Distance spectrum analysis of polar codes," in 2014 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2014, pp. 490–495.
[5] M. Valipour and S. Yousefi, "On probabilistic weight distribution of polar codes," IEEE Communications Letters, vol. 17, no. 11, pp. 2120–2123, 2013.
[6] Q. Zhang, A. Liu, and X. Pan, "An enhanced probabilistic computation method for the weight distribution of polar codes," IEEE Communications Letters, vol. 21, no. 12, pp. 2562–2565, 2017.
[7] R. Polyanskaya, M. Davletshin, and N. Polyanskii, "Weight distributions for successive cancellation decoding of polar codes," IEEE Transactions on Communications, vol. 68, no. 12, pp. 7328–7336, 2020.
[8] I. Tal and A. Vardy, "List decoding of polar codes," IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2213–2226, 2015.
[9] E. Arıkan, "From sequential decoding to channel polarization and back again," arXiv preprint arXiv:1908.09594, 2019.
[10] A. Fazeli, A. Vardy, and H. Yao, "Hardness of successive-cancellation decoding of linear codes," in 2020 IEEE International Symposium on Information Theory (ISIT). IEEE, 2020, pp. 455–460.
[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2009.
[12] P. Trifonov and V. Miloslavskaya, "Polar codes with dynamic frozen symbols and their decoding by directed search," in 2013 IEEE Information Theory Workshop (ITW). IEEE, 2013, pp. 1–5.
[13] K. Niu and K. Chen, "CRC-aided decoding of polar codes," IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, 2012.
[14] F. J. MacWilliams, "A theorem on the distribution of weights in a systematic code," Bell System Technical Journal, vol. 42, no. 1, pp. 79–94, 1963.
[15] H. Yao, A. Fazeli, and A. Vardy, "List decoding of Arıkan's PAC codes," in 2020 IEEE International Symposium on Information Theory (ISIT). IEEE, 2020.