Polar Codes with Mixed-Kernels
Noam Presman, Ofer Shapira and Simon Litsyn
School of Electrical Engineering, Tel Aviv University, Ramat Aviv 69978, Israel.
E-mails: {presmann, ofershap, litsyn}@eng.tau.ac.il.

Abstract
A generalization of the polar coding scheme called mixed-kernels is introduced. This generalization exploits several homogeneous kernels over alphabets of different sizes. An asymptotic analysis of the proposed scheme shows that its polarization properties are strongly related to those of the constituent kernels. Simulations of finite-length instances of the scheme indicate their advantages, both in error-correction performance and in complexity, compared to the known polar coding structures.
1 Introduction

Polar codes were introduced by Arikan [1] and provide an error correction scheme achieving the symmetric capacity of binary memoryless channels (B-MC) with polynomial encoding and decoding complexity. Originally, Arikan considered binary linear polar codes that are based on a two-dimensional kernel, known as the $(u+v, v)$ mapping. This mapping is extended to generate codes of arbitrary length $N = 2^n$ bits by a Kronecker power of the generating matrix that defines the transformation. Multiplying a permutation of an $N$ bits input vector $\mathbf{u}$ by this matrix results in a vector $\mathbf{x}$ that is transmitted over $N$ independent copies of a memoryless channel $W$. As a result, $N$ dependent channels between the components of $\mathbf{u}$ and the outputs of the copies of the channel $W$ are created. These channels exhibit polarization under successive-cancellation (SC) decoding: as $n$ grows, a proportion $I(W)$ (the symmetric channel capacity) of the channels have their capacity approaching 1, while the rest of the channels have their capacity approaching 0.

The exponent of the kernel, as a measure of the asymptotic rate of polarization for arbitrary binary linear polar codes, was introduced by Korada et al. [2] and further generalized to arbitrary polarizing kernels by Mori and Tanaka [3]. The authors suggested designing binary kernels based on the idea of code decomposition [4]. It was noted in that paper that taking advantage of explicit (non-binary) code decompositions in order to construct polar codes introduces more flexibility into the design. This, however, usually requires utilizing at least two kernels and combining them appropriately. This technique results in a mixed-kernels structure. Our objective in this paper is to explore such mixed-kernels constructions and to analyze them.

This paper is organized as follows. In Section 2 we review the idea of code decomposition and its relation to the design of polar code kernels. This notion is the main motivation for the introduction of mixed-kernels. For simplicity, we decided to first present the concept of mixed-kernels through an example of a specific construction based on a binary kernel and a quaternary kernel. This is done in Section 3. The discussion is then broadened in Section 4 by introducing general mixed-kernels. Section 5 elaborates on two advantages that finite-length mixed-kernels structures may have over known polar coding schemes: improved code decomposition, leading to better error correction (Subsection 5.1), and moderate decoding complexity (Subsection 5.2). Simulation results demonstrating these advantages are given in Section 6.

Throughout we use the following notations. For a natural number $\ell$, we denote $[\ell] = \{1, 2, 3, \ldots, \ell\}$ and $[\ell]^- = \{0, 1, 2, \ldots, \ell-1\}$. We denote vectors by bold letters. For $i \geq j$, let $\mathbf{u}_j^i = [u_j\ u_{j+1}\ \ldots\ u_i]$ be the sub-vector of $\mathbf{u}$ of length $i-j+1$ (if $i < j$ we say that $\mathbf{u}_j^i = [\,]$, the empty vector, and its length is 0). We also occasionally use the notation $[u_k]_{k=j}^{k=i}$ to refer to the same sub-vector $\mathbf{u}_j^i$. For two vectors $\mathbf{u}$ and $\mathbf{v}$ of lengths $n_u$ and $n_v$, we denote the $(n_u+n_v)$-length vector which is the concatenation of $\mathbf{u}$ and $\mathbf{v}$ by $[\mathbf{u}\ \mathbf{v}]$, $[\mathbf{u}, \mathbf{v}]$, or just $\mathbf{u}\bullet\mathbf{v}$. For a scalar $x$, the $(n_u+1)$-length vector $\mathbf{u}\bullet x$ is the concatenation of the vector $\mathbf{u}$ with the length-one vector containing $x$.

2 Preliminaries
In this paper we consider kernels that are based on bijective transformations over a field $F$. A channel polarization kernel of $\ell$ dimensions, denoted by $g(\cdot)$, is a mapping $g(\cdot): F^\ell \to F^\ell$. This means that $g(\mathbf{u}) = \mathbf{x}$, $\mathbf{u}, \mathbf{x} \in F^\ell$. We refer to this type of kernel as a homogeneous kernel, because its $\ell$ input coordinates and $\ell$ output coordinates are from the same alphabet $F$. Symbols from an alphabet $F$ are called $F$-symbols in this paper. The homogeneous kernel $g(\cdot)$ may generate a polar code of length $\ell^n$ $F$-symbols by inducing a larger mapping from it, in the following way [3].

Definition 1 (Homogeneous Polar Code Generation)
Given an $\ell$-dimensional transformation $g(\cdot)$, we construct a mapping $g^{(n)}(\cdot)$ of $N = \ell^n$ dimensions (i.e. $g^{(n)}(\cdot): F^{\ell^n} \to F^{\ell^n}$) in the following recursive fashion:

$$g^{(1)}\left(\mathbf{u}_0^{\ell-1}\right) = g\left(\mathbf{u}_0^{\ell-1}\right);$$

$$g^{(n)}(\mathbf{u}) = \left[\, g\left(\gamma_{0,0}, \gamma_{1,0}, \ldots, \gamma_{\ell-1,0}\right),\ g\left(\gamma_{0,1}, \gamma_{1,1}, \ldots, \gamma_{\ell-1,1}\right),\ \ldots,\ g\left(\gamma_{0,N/\ell-1}, \gamma_{1,N/\ell-1}, \ldots, \gamma_{\ell-1,N/\ell-1}\right) \,\right],$$

where $[\gamma_{i,j}]_{j=0}^{j=N/\ell-1} = g^{(n-1)}\left(\mathbf{u}_{i\cdot(N/\ell)}^{(i+1)\cdot(N/\ell)-1}\right)$, $i \in [\ell]^-$.

Generalized concatenated codes (GCC) are error correcting codes constructed by a technique introduced by Blokh and Zyablov [6] and Zinoviev [7]. (The GCC construction is a generalization of Forney's code concatenation method [5].) In this construction, we have $\ell$ outer-codes $\{\mathcal{C}_i\}_{i=0}^{\ell-1}$, where $\mathcal{C}_i$ is a length $N_{out}$ code of size $M_i$ over alphabet $F_i$. We also have an inner-code of length $N_{in}$ and size $\prod_{i=0}^{\ell-1}|F_i|$ over alphabet $F$, with a nested encoding function $\varphi(\cdot): F_0 \times F_1 \times \cdots \times F_{\ell-1} \to F^{N_{in}}$. The GCC generated by these components is a code of length $N_{out}\cdot N_{in}$ $F$-symbols and of size $\prod_{i=0}^{\ell-1} M_i$. It is created by taking an $\ell \times N_{out}$ matrix, in which the $i$th row is a codeword from $\mathcal{C}_i$, and applying the inner mapping $\varphi$ on each of the $N_{out}$ columns of the matrix. As Dumer describes in his survey [8], GCCs can provide good code parameters at short lengths when a good combination of outer-codes and a nested inner-code is used; in fact, some of them give the best parameters known. Moreover, decoding algorithms may utilize their structure by performing local decoding steps on the (short) outer-codes and using the inner-code layer for exchanging decisions between the outer-codes.

As Arikan already noted, polar codes are examples of recursive GCCs [1, Section I.D]. This observation is useful, as it allows us to formalize the construction of a long polar code as a concatenation of several shorter polar codes (outer-codes) by means of a kernel mapping (an inner-code). Applying this notion to Definition 1, we observe that a polar code of length $N = \ell^n$ symbols may be regarded as a collection of $\ell$ outer polar codes of length $\ell^{n-1}$ (the $i$th outer-code is $[\gamma_{i,j}]_{j=0}^{j=N/\ell-1} = g^{(n-1)}\left(\mathbf{u}_{i\cdot N/\ell}^{(i+1)\cdot N/\ell-1}\right)$ for $i \in [\ell]^-$). These codes are then joined together by employing an inner-code (defined by the function $g(\cdot)$) on the outputs of these mappings. There are $N/\ell$ instances of the inner mapping, such that instance number $j \in [N/\ell]^-$ is applied on the $j$th symbol of each outer-code.

The above GCC formalization is illustrated in Figure 1. In this figure, we see the $\ell$ outer-codewords of length $\ell^{n-1}$ depicted as gray horizontal rectangles (resembling rows of a matrix). The instances of the inner-codeword mapping are depicted as vertical rectangles located on top of the gray outer-code rows (resembling columns of a matrix). This is appropriate, as this mapping operates on the columns of the matrix whose rows are the outer-codewords. Note that for brevity we only drew three instances of the inner mapping, but there should be $\ell^{n-1}$ instances of it, one for each column of this matrix. In the homogeneous case, the outer-codes themselves are constructed in the same manner.

Figure 1: A GCC representation of a polar code of length $\ell^n$ symbols constructed by a homogeneous kernel according to Definition 1.
Figure 2: GCC representation corresponding to Example 1 (Arikan's construction).
However, note that even though the outer-codes have the same structure, they are in general different codes, since they may have different sets of frozen symbols associated with them.

Example 1 (Arikan's Construction)
Let $\mathbf{u}$ be an $N = 2^n$ length binary vector. The vector $\mathbf{u}$ is transformed into an $N$ length vector $\mathbf{x}$ by using a bijective mapping $g^{(n)}(\cdot): \{0,1\}^N \to \{0,1\}^N$. The transformation is defined recursively: for $n = 1$,

$$g^{(1)}(\mathbf{u}) = [u_0 + u_1,\ u_1];$$

for $n > 1$,

$$g^{(n)}(\mathbf{u}) = \mathbf{x}_0^{N-1}, \quad (1)$$

where $[x_{2j}, x_{2j+1}] = [\gamma_{0,j} + \gamma_{1,j},\ \gamma_{1,j}]$ for $j \in [N/2]^-$, and $[\gamma_{0,j}]_{j=0}^{N/2-1} = g^{(n-1)}\left(\mathbf{u}_0^{N/2-1}\right)$, $[\gamma_{1,j}]_{j=0}^{N/2-1} = g^{(n-1)}\left(\mathbf{u}_{N/2}^{N-1}\right)$ are the two outer-codes (each one of length $N/2$ bits). Figure 2 depicts the GCC block diagram for this example.

The GCC structure of polar codes can also be represented by a layered Forney normal factor graph [9]. (The vertices of a layered graph can be partitioned into a sequence of sub-sets called layers, denoted by $L_0, L_1, \cdots, L_{k-1}$; the edges of the graph connect only vertices within the same layer, or vertices in layers with successive ordinals.) Layer 0 is the inner layer. Layer $i$ is generated by considering the outer-codes that are concatenated by layer $(i-1)$ and including in this layer all the vertices describing their inner mappings. This recursive construction process may continue until we reach outer-codes that cannot be decomposed into non-trivial inner-codes and outer-codes. Edges (representing variables) connect the outputs of the outer-codes to the inputs of the inner mappings. This presentation can be viewed as observing the GCC structure in Figure 1 from its side.

Figure 3: A polar code of length $N = 2^n$ symbols with $\ell = 2$ as a layered factor graph.
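To make the recursion of Definition 1 concrete for Arikan's kernel, here is a minimal Python sketch of the $(u+v,v)$ transform of Example 1 (the function names are ours, not the paper's):

```python
# A minimal sketch of Example 1: the recursive (u+v, v) polar transform.

def arikan_transform(u):
    """Apply g^(n) to a binary vector u of length N = 2^n (bits as 0/1 ints)."""
    N = len(u)
    if N == 1:
        return list(u)
    # Two outer-codes of length N/2 (Definition 1 with ell = 2).
    gamma0 = arikan_transform(u[: N // 2])   # gamma_{0,j}
    gamma1 = arikan_transform(u[N // 2:])    # gamma_{1,j}
    # N/2 instances of the inner mapping g(u0, u1) = (u0 + u1, u1) over GF(2).
    x = []
    for j in range(N // 2):
        x.append(gamma0[j] ^ gamma1[j])      # x_{2j}   = gamma_{0,j} + gamma_{1,j}
        x.append(gamma1[j])                  # x_{2j+1} = gamma_{1,j}
    return x

# Example: a length-8 transform.
print(arikan_transform([1, 0, 1, 1, 0, 0, 1, 0]))
```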
Example 2 (Layered Normal Factor Graph for Arikan’s Construction)
Figures 3 and 4 depict a layered factor graph representation of a length $N = 2^n$ symbols polar code with a kernel of $\ell = 2$ dimensions. Figure 3 gives only the block structure of the graph, in which the two outer-codes of length $N/2$ are connected by the inner layer (note the similarity to the GCC block diagram in Figure 2). Half-edges represent the inputs $\mathbf{u}_0^{N-1}$ and the outputs $\mathbf{x}_0^{N-1}$ of the transformation. The edges (denoted by $\gamma_{i,j}$, $j \in [N/2]^-$, $i \in [2]^-$) connect the outputs of the two outer-codes to the inputs of the inner mapping blocks, $g(\cdot)$. A more elaborate version of this figure is given in Figure 4, in which we expanded the recursive construction.

Strictly speaking, the green blocks that represent the $g(\cdot)$ inner mapping are themselves factor graphs (i.e. collections of vertices and edges). An example of a normal factor graph specifying such a block is given in Figure 5 for Arikan's $(u+v,v)$ construction (see Example 1). Vertex $a$ represents a parity constraint and vertex $e$ represents an equivalence constraint. The half-edges $u_0, u_1$ represent the inputs of the mapping, and the half-edges $x_0, x_1$ represent its outputs. This graphical structure is perhaps the most popular visual representation of polar codes (see e.g. [1, Figure 12] and [10, Figure 5.2]) and is also known as the "butterflies" graph, because of the arrangement of the edges in Figure 4.

Figure 4: A length $N = 2^n$ symbols polar code with $\ell = 2$ as a layered factor graph (detailed version of Figure 3, with the recursion expanded).
Figure 5: Normal factor graph representation of the $g(\cdot)$ block from Figures 3 and 4 for Arikan's $(u+v,v)$ construction.

When using SC types of decoders, we sequentially decide on the outer-codes (first on outer-code 0 and then on outer-code 1). It therefore comes as no surprise that the code decomposition induced by the kernel, and its properties, play a significant role in the performance of the SC decoder of polar codes.

The notion of code decomposition and its relation to the construction of kernels for polar codes were previously explored by the authors [4]. We review these concepts here in order to further develop them in the next section.
Definition 2 (Code Decomposition)
Denote $\mathcal{T}_0^{(0)} = F^\ell$. We perform a sequential partitioning of this set. First, $\mathcal{T}_0^{(0)}$ is partitioned into $f_0 \triangleq |F|^{\eta_0}$ subsets, each one of size $|F|^\ell/f_0$, denoted by $\mathcal{T}_1^{(b_0)}$, where $b_0 \in [f_0]^-$. In the next step, each one of the sets $\mathcal{T}_1^{(b_0)}$ is decomposed into $f_1 \triangleq |F|^{\eta_1}$ subsets, each one of size $|F|^\ell/(f_0\cdot f_1)$. We denote these subsets by $\mathcal{T}_2^{([b_0 b_1])}$, where $b_1 \in [f_1]^-$. In general, in step $i \in [m]^-$, $\mathcal{T}_i^{(\mathbf{b}_0^{i-1})}$ is partitioned into $f_i \triangleq |F|^{\eta_i}$ equally sized subsets $\left\{\mathcal{T}_{i+1}^{(\mathbf{b}_0^{i-1}\bullet b_i)}\right\}_{b_i\in[f_i]^-}$, each of size $|F|^\ell/\prod_{j=0}^{i} f_j$. We denote the set of subsets (or sub-codes) of level number $i$ by $\mathbb{T}_i$, that is, $\mathbb{T}_i = \left\{\mathcal{T}_i^{(\mathbf{b}_0^{i-1})} \,\middle|\, b_j \in [f_j]^-,\ j \in [i]^-\right\}$.

The set $\{\mathbb{T}_1, \ldots, \mathbb{T}_m\}$ is called a code decomposition of $F^\ell$. The decomposition is commonly described by the following chain of code parameters

$$(\ell, k_0, d_0) - (\ell, k_1, d_1) - \cdots - (\ell, k_{m-1}, d_{m-1}),$$

if for each $\mathcal{T} \in \mathbb{T}_i$ we have that $\mathcal{T}$ is a code of length $\ell$, size $|F|^{k_i}$ and minimum distance at least $d_i$, for all $i \in [m]^-$.

If the sub-codes of the decomposition are cosets, we say that $\{\mathbb{T}_1, \ldots, \mathbb{T}_m\}$ is a decomposition into cosets. In this case, for each $\mathbb{T}_i$ the sub-code that contains the zero codeword is called the representative sub-code, and a minimal weight codeword of each coset is called a coset leader. If all the sub-codes in the decomposition are cosets of linear codes, we say that the decomposition is linear.

A transformation $g(\cdot)$ can be associated to a code decomposition in the following way.

Definition 3 (Kernel Definition from Code Decomposition)
Let $\{\mathbb{T}_1, \ldots, \mathbb{T}_m\}$ be a code decomposition of $F^\ell$ as described in Definition 2, such that $\forall \mathcal{T} \in \mathbb{T}_m$, $|\mathcal{T}| = 1$ (i.e. the final step of the decomposition is into singletons). The transformation induced by this code decomposition is defined as follows:

$$g(v_0, v_1, \ldots, v_{m-1}): \prod_{i=0}^{m-1} F^{\eta_i} \to F^\ell; \quad \sum_{i=0}^{m-1}\eta_i = \ell, \quad (2)$$

$$\forall \mathbf{v} \in \prod_{i=0}^{m-1} F^{\eta_i}, \quad g\left(\mathbf{v}_0^{m-1}\right) = \mathbf{x}_0^{\ell-1} \iff \mathbf{x}_0^{\ell-1} \in \mathcal{T}_m^{(\mathbf{v}_0^{m-1})}, \quad (3)$$

where in the notation of $\mathcal{T}_m^{(\mathbf{v}_0^{m-1})}$ we take the decimal representation of the components of $\mathbf{v}$, for consistency with Definition 2.

In some cases it is useful to denote the argument of $g(\cdot)$ as a vector $\mathbf{u} \in F^\ell$, i.e. to write $g(\mathbf{u})$ instead of $g(\mathbf{v})$, where $\mathbf{v} \in \prod_{i=0}^{m-1} F^{\eta_i}$. In this case there is an obvious correspondence between $\mathbf{v}$ and $\mathbf{u}$: $v_i = \mathbf{u}_s^f$, where $s = \sum_{j=0}^{i-1}\eta_j$, $f = \sum_{j=0}^{i}\eta_j - 1$ and $i \in [m]^-$. We say that $v_i$ represents $\eta_i$ symbols that are "glued" together. It is convenient to denote $v_i$ as $u^{(s,f)}$ if $v_i = \mathbf{u}_s^f$.

Example 3 (Decomposition that Defines a Kernel)
In our previous correspondence [4, Example 1] we considered the decomposition into cosets described by the chain $(4,4,1)-(4,3,2)-(4,1,4)$. Using Definition 3, we introduce a kernel function

$$g_0\left(u_0, u^{(1,2)}, u_3\right): \{0,1\} \times \{0,1\}^2 \times \{0,1\} \to \{0,1\}^4 \quad (4)$$

that is induced by this decomposition. The first bit $u_0$ chooses between the sub-codes $\mathcal{T}_1^{(0)}$ and $\mathcal{T}_1^{(1)}$. The second and third bits are glued together, forming a binary pair, or a quaternary symbol $u^{(1,2)}$, and they indicate the selected sub-code of $\mathcal{T}_1^{(u_0)}$. Finally, $u_3$ selects the codeword from the chosen sub-code. Note that a straightforward implementation of the encoding function is to multiply $\mathbf{u}$ by the appropriate generating matrix.

We would like to use this kernel to generate an $N = 4^n$ bits length code. The standard Arikan construction (based on the Kronecker power) does not suffice, because of the glued bits $u^{(1,2)}$, which need to be treated jointly as a quaternary symbol. To facilitate this, we suggest introducing a second, quaternary, kernel $g_1(\cdot)$. Because different coordinates of the input of $g_0(\cdot)$ are from alphabets of different sizes, and because in order to implement this polarization scheme we incorporate two mapping functions $g_0(\cdot)$ and $g_1(\cdot)$, we refer to the overall construction as a mixed-kernels construction. Details on how to combine the kernels $g_0(\cdot)$ and $g_1(\cdot)$ into a mixed-kernels construction are given in Section 3. The general construction is presented in Section 4.

3 An Example of a Mixed-Kernels Construction

In this section we introduce the concept of polar codes based on mixed-kernels. In order to have a more comprehensible presentation of the idea, we choose to first describe a specific member of the mixed-kernels ensemble. This specific example of mixed-kernels seems to be attractive because of its relative simplicity and good error-correction performance, as we further observe in Section 6. The general structure of mixed-kernels may be easily derived from this example and is further discussed in Section 4.
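As a concrete illustration of Example 3, the following sketch realizes a kernel consistent with the chain $(4,4,1)-(4,3,2)-(4,1,4)$ by multiplying $\mathbf{u}$ by a generating matrix whose successive rows are coset representatives. The matrix `G0` below is one valid choice for this chain and is our assumption; the kernel actually used in [4] may differ.

```python
import itertools

# One generating matrix consistent with the chain (4,4,1)-(4,3,2)-(4,1,4):
# row 0 selects a coset of the (4,3,2) even-weight code inside {0,1}^4,
# rows 1-2 (the glued pair u^(1,2)) select a coset of the (4,1,4)
# repetition code inside the even-weight code, and row 3 spans it.
# This is an illustrative choice; the matrix in [4] may differ.
G0 = [[1, 0, 0, 0],
      [1, 1, 0, 0],
      [1, 0, 1, 0],
      [1, 1, 1, 1]]

def g0(u):
    """Encode u = (u0, u1, u2, u3) over GF(2): x = u * G0."""
    return tuple(sum(u[i] * G0[i][j] for i in range(4)) % 2 for j in range(4))

def weight(x):
    return sum(x)

# Check the minimum-distance chain of the decomposition.
even = [c for c in itertools.product((0, 1), repeat=4)
        if weight(c) % 2 == 0]                        # the (4,3,2) code
rep = [(0, 0, 0, 0), (1, 1, 1, 1)]                    # the (4,1,4) code
assert min(weight(c) for c in even if any(c)) == 2
assert min(weight(c) for c in rep if any(c)) == 4
# g0 is a bijection of {0,1}^4, as required of a kernel.
assert len({g0(u) for u in itertools.product((0, 1), repeat=4)}) == 16
```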
Let $g_0(\cdot)$ be the mapping defined in (4). Let $g_1(\cdot): \left(\{0,1\}^2\right)^4 \to \left(\{0,1\}^2\right)^4$ be a polarizing kernel over the quaternary alphabet. For example, $g_1(\cdot)$ can be a kernel based on the extended Reed-Solomon code of length 4, $G_{RS}(4)$, which was proven by Mori and Tanaka [11, Example 20] to be a polarizing kernel (we refer to the code generated by $G_{RS}(4)$ as the RS4 code). Using the auxiliary kernel $g_1(\cdot)$, we can extend the mapping $g_0(\cdot)$ to a length $N = 4^n$ bits code. Both $g_0(\cdot)$ and $g_1(\cdot)$ are referred to as the constituent kernels of the construction. Note that $g_1(\cdot)$ is introduced in order to handle the glued bits $u^{(1,2)}$ of the input of $g_0(\cdot)$, and it is therefore also referred to as the auxiliary kernel of the construction.

Let us first review the channel splitting principle [1, Section I.B] using $g_0(\cdot)$. The output of $g_0(\cdot)$ is binary. We also assume that the channel over which the result of the transformation (i.e. the codeword) is sent is binary-input and memoryless. The meaning of taking two inputs and gluing them together is that these inputs are treated as a unified entity for decoding and decision making.

Let us denote by $\mathbf{u}$ and $\mathbf{x}$ two binary vectors that are, respectively, the input and the output of the mapping $g_0(\cdot)$:

$$g_0\left(u_0, u^{(1,2)}, u_3\right) = \mathbf{x}, \quad u_0, u_3 \in \{0,1\},\ u^{(1,2)} \in \{0,1\}^2,\ x_i \in \{0,1\},\ i \in [4]^-.$$

$\mathbf{x}$ is transmitted over 4 copies of the binary memoryless channel $W$, and the channel output vector $\mathbf{y}$ is received. The channel splitting principle dictates the following synthetic channels and their corresponding transition functions:

channel $W_4^{(0)}$: $\quad W_4^{(0)}(\mathbf{y}\,|\,u_0) \triangleq \sum_{u^{(1,2)}\in\{0,1\}^2,\ u_3\in\{0,1\}} \frac{1}{2^3}\cdot W_4\left(\mathbf{y}\,|\,u_0, u^{(1,2)}, u_3\right), \quad u_0 \in \{0,1\};$

channel $W_4^{(1,2)}$: $\quad W_4^{(1,2)}\left(\mathbf{y}, u_0\,|\,u^{(1,2)}\right) \triangleq \sum_{u_3\in\{0,1\}} \frac{1}{2^2}\cdot W_4\left(\mathbf{y}\,|\,u_0, u^{(1,2)}, u_3\right), \quad u^{(1,2)} \in \{0,1\}^2;$

channel $W_4^{(3)}$: $\quad W_4^{(3)}\left(\mathbf{y}, u_0, u^{(1,2)}\,|\,u_3\right) \triangleq \frac{1}{2^3}\cdot W_4\left(\mathbf{y}\,|\,u_0, u^{(1,2)}, u_3\right), \quad u_3 \in \{0,1\}.$

Here we use Arikan's notation, according to which $W_4(\mathbf{y}\,|\,\mathbf{u}) = \prod_{i=0}^{3} W(y_i\,|\,x_i)$, where $\mathbf{x} = g_0(\mathbf{u})$ and $W(y|x)$ is the transition function of the channel $W$.

Next, consider $g_1(\cdot)$, which is a quaternary input and output mapping. A binary vector $\mathbf{u} \in \{0,1\}^8$ is transformed into $\mathbf{x} \in \left(\{0,1\}^2\right)^4$ in the following fashion:

$$g_1\left(u^{(0,1)}, u^{(2,3)}, u^{(4,5)}, u^{(6,7)}\right) = \mathbf{x}, \quad u^{(2i,2i+1)}, x_i \in \{0,1\}^2,\ i \in [4]^-.$$

The codeword $\mathbf{x}$ is transmitted over 4 copies of a quaternary-input memoryless channel $\tilde{W}$, and the output vector $\mathbf{y}$ is received. By the channel splitting principle we derive the following channels:

channel $\tilde{W}_4^{(2i,2i+1)}$: $\quad \tilde{W}_4^{(2i,2i+1)}\left(\mathbf{y}, \mathbf{u}_0^{2i-1}\,\middle|\,u^{(2i,2i+1)}\right) \triangleq \sum_{\mathbf{u}_{2i+2}^{7}\in\{0,1\}^{6-2i}} \frac{1}{4^3}\cdot \tilde{W}_4\left(\mathbf{y}\,\middle|\,\mathbf{u}_0^{2i-1}, u^{(2i,2i+1)}, \mathbf{u}_{2i+2}^{7}\right), \quad u^{(2i,2i+1)} \in \{0,1\}^2,\ i \in [4]^-.$
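For a small channel, the synthetic channels above can be tabulated by brute force. The sketch below does so for $g_0$ over a BSC and numerically verifies the chain rule (8) of the next subsection; it reuses the hypothetical matrix `G0` from the previous snippet, and the helper functions are our own.

```python
import itertools
from math import log2

p = 0.1                                      # BSC crossover probability
W = {(y, x): (1 - p) if y == x else p        # W(y|x) for the base channel
     for y in (0, 1) for x in (0, 1)}

# Hypothetical g0 generating matrix (same assumption as in the previous sketch).
G0 = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]

def g0(u):
    return tuple(sum(u[i] * G0[i][j] for i in range(4)) % 2 for j in range(4))

def W4(y, u):
    """W_4(y|u) = prod_i W(y_i|x_i) with x = g0(u)."""
    prob = 1.0
    for yi, xi in zip(y, g0(u)):
        prob *= W[(yi, xi)]
    return prob

def mutual_info(channel, inputs, outputs):
    """I(X;Y) in bits for a uniform input X; channel[(y, x)] = W(y|x)."""
    px = 1.0 / len(inputs)
    info = 0.0
    for y in outputs:
        py = sum(px * channel[(y, x)] for x in inputs)
        for x in inputs:
            w = channel[(y, x)]
            if w > 0:
                info += px * w * log2(w / py)
    return info

PAIRS = list(itertools.product((0, 1), repeat=2))
YS = list(itertools.product((0, 1), repeat=4))

# The three synthetic channels of Subsection 3.1 (outputs carry past inputs).
W_0 = {(y, u0): sum(W4(y, (u0,) + v + (u3,)) for v in PAIRS for u3 in (0, 1)) / 8
       for y in YS for u0 in (0, 1)}
W_12 = {((y, u0), v): sum(W4(y, (u0,) + v + (u3,)) for u3 in (0, 1)) / 4
        for y in YS for u0 in (0, 1) for v in PAIRS}
W_3 = {((y, u0, v), u3): W4(y, (u0,) + v + (u3,)) / 8
       for y in YS for u0 in (0, 1) for v in PAIRS for u3 in (0, 1)}

I_W = mutual_info(W, (0, 1), (0, 1))
I_sum = (mutual_info(W_0, (0, 1), YS)
         + mutual_info(W_12, PAIRS, [(y, u0) for y in YS for u0 in (0, 1)])
         + mutual_info(W_3, (0, 1),
                       [(y, u0, v) for y in YS for u0 in (0, 1) for v in PAIRS]))
print(round(I_sum, 6), round(4 * I_W, 6))    # equal, by the chain rule (8)
```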
We denote $g^{(1)}(\cdot) \triangleq g_0(\cdot)$. Constructing a mapping function of dimension 16 (denoted by $g^{(2)}(\cdot)$) is done as follows. Let $\mathbf{u}$ be a binary vector of length 16. Define the three vectors $\mathbf{a} \triangleq g_0\left(u_0, u^{(1,2)}, u_3\right)$, $\mathbf{b} \triangleq g_1\left(u^{(4,5)}, u^{(6,7)}, u^{(8,9)}, u^{(10,11)}\right)$ and $\mathbf{c} \triangleq g_0\left(u_{12}, u^{(13,14)}, u_{15}\right)$. Using these definitions, we finally have

$$g^{(2)}(\mathbf{u}) = \left[\, g_0(a_0, b_0, c_0),\ g_0(a_1, b_1, c_1),\ g_0(a_2, b_2, c_2),\ g_0(a_3, b_3, c_3) \,\right]. \quad (5)$$

In this construction $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are three outer-codes of length four symbols (the symbols of $\mathbf{a}$ and $\mathbf{c}$ are bits, while those of $\mathbf{b}$ are quaternary symbols). The outer-codes are combined together using the inner mapping $g_0(\cdot)$. In order to extend this construction to a mapping $g^{(n)}\left(\mathbf{u}_0^{4^n-1}\right)$, $n > 2$, we use the following three outer-codes:

outer-code 0: $[\gamma_{0,j}]_{j=0}^{j=N/4-1} = g^{(n-1)}\left(\mathbf{u}_0^{N/4-1}\right)$, $\quad u_j, \gamma_{0,j} \in \{0,1\}$, $j \in [N/4]^-$;

outer-code 1: $[\gamma_{1,j}]_{j=0}^{j=N/4-1} = g_1^{(n-1)}\left(\left[u^{(N/4+2j,\,N/4+2j+1)}\right]_{j=0}^{j=N/4-1}\right)$, $\quad u^{(N/4+2j,\,N/4+2j+1)}, \gamma_{1,j} \in \{0,1\}^2$, $j \in [N/4]^-$;

outer-code 2: $[\gamma_{2,j}]_{j=0}^{j=N/4-1} = g^{(n-1)}\left(\mathbf{u}_{3N/4}^{N-1}\right)$, $\quad u_{3N/4+j}, \gamma_{2,j} \in \{0,1\}$, $j \in [N/4]^-$.

Note that outer-codes 0 and 2 are length $N/4$ bits mixed-kernels codes, while outer-code 1 is a homogeneous polar code of length $N/4$ quaternary symbols generated by $g_1(\cdot)$. The outer-codes are combined by the $g_0(\cdot)$ inner mapping (note the consistency of this definition with that of $g^{(2)}(\cdot)$ in (5)):

$$g^{(n)}(\mathbf{u}) = \left[\, g_0(\gamma_{0,0}, \gamma_{1,0}, \gamma_{2,0}),\ g_0(\gamma_{0,1}, \gamma_{1,1}, \gamma_{2,1}),\ \ldots,\ g_0\left(\gamma_{0,N/4-1}, \gamma_{1,N/4-1}, \gamma_{2,N/4-1}\right) \,\right]. \quad (6)$$

Figure 6 depicts the GCC construction of (6) (the rectangle of outer-code 1 is twice as high as those of outer-codes 0 and 2, since its symbols are quaternary).

The synthetic channels that the SC decoder encounters can be analyzed by a channel tree process, as we describe in Subsection 3.2. The foundation of this analysis is the simple observation that SC decoding of the inputs to the mapping $g^{(n)}(\cdot)$ is equivalent to decoding inputs to the transformations $g_0(\cdot)$ or $g_1(\cdot)$. These transformations use as their communication channel one of the synthetic channels generated by the transformation $g^{(n-1)}(\cdot)$ over the original channel $W$. In other words, when decoding one bit $u_i$ (two glued bits $u^{(i,i+1)}$) over the channel $W_{4^n}^{(i)}\left(\mathbf{y}, \mathbf{u}_0^{i-1}\,|\,u_i\right)$ (over the channel $W_{4^n}^{(i,i+1)}\left(\mathbf{y}, \mathbf{u}_0^{i-1}\,|\,u^{(i,i+1)}\right)$), this is manifested as decoding a bit (a glued pair of bits) which is an input to the transformation $g_0(\cdot)$ or $g_1(\cdot)$. These transformations "see" as their communication channel the appropriate synthetic channel ($W_{4^{n-1}}^{(j)}$ or $W_{4^{n-1}}^{(j,j+1)}$, depending on the value of $i$). This description of the synthetic channel evolution enables a recursive analysis of the behavior of the SC decoder.

Figure 6: A GCC representation of the length $N = 4^n$ bits mixed-kernels polar code $g^{(n)}(\cdot)$ described in Section 3.

We now turn to describe the channel tree process corresponding to our example of a mixed-kernels construction. A random sequence $\{W_n\}_{n\geq 0}$ is defined such that $W_n \in \left\{W_{4^n}^{(\tau_n(i))}\right\}_{i=0}^{\nu(n)-1}$, where $\nu(n)$ denotes the number of channels (glued-bits channels are counted as one channel) and $\tau_n(i)$ denotes the index of the $i$th channel ($\tau_n(i)$ is needed because some of the channels correspond to glued bits and therefore have their index given as a pair of integers; in case $\tau_n(i) = (j_1, j_2)$ we denote the corresponding channel, for brevity, as $W_{4^n}^{(j_1,j_2)}$ instead of $W_{4^n}^{((j_1,j_2))}$, as the notation would imply). For example, for the $W_{16}$ channels, constructed using the transformation in (5), we have $\nu(2) = 10$, where the values of $\tau_2(\cdot)$ are $[\tau_2(i)]_{i=0}^{i=9} = [0, (1,2), 3, (4,5), (6,7), (8,9), (10,11), 12, (13,14), 15]$. We denote by $\{N_n\}_{n\geq 0}$ the number of bits at the input of the channel; we therefore have $N_n = 1$ in case we consider a single-bit input channel, and $N_n = 2$ in case we deal with a channel with glued-bits input. We define the channel random sequence recursively:

$$W_{n+1} = W_n^{(B_n)} \ \text{for}\ n \geq 0; \quad W_0 = W,\ N_0 = 1, \quad (7)$$

where $B_n \in \{0, (1,2), 3, (0,1), (2,3), (4,5), (6,7)\}$ indicates the labels of the synthetic channels defined in Subsection 3.1 (the pairs of numbers in this set correspond to channels having inputs of two glued bits). Here $W_n^{(B_n)}$ denotes a synthetic channel $\tilde{\tilde{W}}_4^{(B_n)}$ where the basic channel $\tilde{\tilde{W}}$ is taken to be the previous element of the channel tree process, i.e. $\tilde{\tilde{W}} = W_n$. The channel realizations of $W_n$ are the set of all synthetic channels defined by traversing the inner layers of the GCC construction (and using the appropriate channel splitting formulae).
It is the object of this sequence definition that each of the synthetic channels induced by $g^{(n)}(\cdot)$ has probability $4^{-n}$ if its input is binary, and $2\cdot 4^{-n}$ if its input is quaternary. This sequence will enable us to utilize the probabilistic method to prove properties of the polar coding scheme.

The description of the probabilistic dynamics of the random sequences $\{B_n\}_{n\geq 0}$, $\{N_n\}_{n\geq 0}$ now follows. Let $\left\{B_n^{(1)}\right\}_{n\geq 0}$ be an i.i.d. random sequence over the values $[0, (1,2), 3]$ with corresponding probabilities $[0.25, 0.5, 0.25]$, and let $\left\{B_n^{(2)}\right\}_{n\geq 0}$ be an i.i.d. random sequence over the values $[(0,1), (2,3), (4,5), (6,7)]$ with uniform probabilities. Denote by $T$ the minimum non-negative $n$ such that $B_n^{(1)} = (1,2)$, and define

$$N_n = \begin{cases} 1, & n \leq T; \\ 2, & n > T, \end{cases}$$

and $B_n = B_n^{(N_n)}$. Note that $T$ is a geometric random variable with probability of success $p = 1/2$. Given $T$, the sequence of $B_n$ is composed of independent samples (although the distribution is not identical for all samples).

Suppose we have a certain channel $W$ and a binary i.i.d. input vector $\mathbf{U}$ that is transformed by $g_0(\cdot)$ to $\mathbf{X}$, transmitted over a B-MC channel, and received as $\mathbf{Y}$. The mutual information chain rule implies that

$$4\cdot I(W) = I(\mathbf{Y}; \mathbf{U}) = I(\mathbf{Y}; U_0) + I\left(\mathbf{Y}; U^{(1,2)}\,\middle|\,U_0\right) + I\left(\mathbf{Y}; U_3\,\middle|\,U_0, U^{(1,2)}\right) = I\left(W_4^{(0)}\right) + I\left(W_4^{(1,2)}\right) + I\left(W_4^{(3)}\right). \quad (8)$$

Next, define the information random sequence corresponding to the channels as $\{I_n\}_{n\geq 0}$:

$$I_n = \frac{I(W_n)}{N_n}, \quad n \geq 0. \quad (9)$$

For a channel $W$ with input $X \in \mathcal{X}$ and output $Y \in \mathcal{Y}$, we denote by $P_e(W)$ the average error probability of the maximum a posteriori estimator $\hat{x}(y) = \arg\max_{x\in\mathcal{X}} \Pr(X = x\,|\,Y = y)$. This means that

$$P_e(W) = 1 - \sum_{y\in\mathcal{Y}} \Pr(Y = y)\cdot \max_{x\in\mathcal{X}} \Pr(X = x\,|\,Y = y). \quad (10)$$

We define the random sequence $P_{e,n} = P_e(W_n)$. The Bhattacharyya parameter sequence is denoted by $Z_n = Z(W_n)$, where for a $q$-ary channel $W$ we have

$$Z(W) = \frac{1}{q\cdot(q-1)} \sum_{x,x'\in\mathcal{X},\ x\neq x'} Z_{x,x'}(W), \quad \text{and} \quad Z_{x,x'}(W) = \sum_{y\in\mathcal{Y}} \sqrt{W(y\,|\,x)\, W(y\,|\,x')}.$$

Note that $I_n, Z_n \in [0,1]$, that $Z_n \to 1 \iff I_n \to 0$, and that $Z_n \to 0 \iff I_n \to 1$.

Proposition 1
The process $\{I_n\}_{n\geq 0}$ is a bounded martingale which is uniformly integrable. As a result, it converges almost surely to $I_\infty$.

Proof
Employing the information sequence definition (9) results in

$$E[I_{n+1}\,|\,I_n, N_n = 1] = \frac{1}{4}\, I\left(W_n^{(0)}\right) + \frac{1}{2}\cdot\frac{I\left(W_n^{(1,2)}\right)}{2} + \frac{1}{4}\, I\left(W_n^{(3)}\right). \quad (11)$$

Using (8) we have

$$E[I_{n+1}\,|\,I_n, N_n = 1] = \frac{1}{4}\left( I\left(W_n^{(0)}\right) + I\left(W_n^{(1,2)}\right) + I\left(W_n^{(3)}\right) \right) = \left.\frac{I(W_n)}{N_n}\right|_{N_n = 1} = I_n. \quad (12)$$

On the other hand,

$$E[I_{n+1}\,|\,I_n, N_n = 2] = \frac{1}{4}\cdot\frac{I\left(W_n^{(0,1)}\right)}{2} + \frac{1}{4}\cdot\frac{I\left(W_n^{(2,3)}\right)}{2} + \frac{1}{4}\cdot\frac{I\left(W_n^{(4,5)}\right)}{2} + \frac{1}{4}\cdot\frac{I\left(W_n^{(6,7)}\right)}{2}, \quad (13)$$

so that

$$E[I_{n+1}\,|\,I_n, N_n = 2] = \frac{1}{2}\cdot\frac{1}{4}\left( I\left(W_n^{(0,1)}\right) + I\left(W_n^{(2,3)}\right) + I\left(W_n^{(4,5)}\right) + I\left(W_n^{(6,7)}\right) \right) = \left.\frac{I(W_n)}{N_n}\right|_{N_n = 2} = I_n. \quad (14)$$

Consequently, by taking (12) and (14) together, we have

$$E[I_{n+1}\,|\,I_n] = I_n, \quad (15)$$

which means that the sequence $\{I_n\}_{n\geq 0}$ is a martingale. Furthermore, it is uniformly integrable (see e.g. [13, Theorem 4.5.3]) and therefore it converges almost surely to $I_\infty$. ♦

Note that for any $S \subseteq \mathbb{R}$,

$$\Pr(I_n \in S) = \frac{1}{4^n} \sum_{i\in[\nu(n)]^-\ \text{s.t.}\ I\left(W_{4^n}^{(\tau_n(i))}\right)/|\tau_n(i)|\,\in\,S} |\tau_n(i)|, \quad (16)$$

where $|\tau_n(i)|$ counts the number of bits at the input of channel $\tau_n(i)$, which is 1 for a single-bit input channel and 2 for a glued two-bits input channel. Observe that (16) attributes to the two bits of a glued pair the same mutual information characterization (because they are regarded as a unified entity), and they are counted as such. Note further that $E[I_n] = E[I_\infty] = I(W)$. Thus, by showing that the mixed-kernels construction is polarizing, i.e. $I_\infty \in \{0,1\}$, we may infer using (16) that the proportion of clean channels (induced by the transformation and the SC decoding) is $I(W)$.

Let $\Gamma_n$ be the number of glued two-bit input channels of $g^{(n)}(\cdot)$. Using the above probabilistic method, we can deduce that

$$\Gamma_n = 4^n \cdot \frac{1}{2} \cdot \Pr(N_n = 2) = \frac{4^n}{2}\cdot\left(1 - \frac{1}{2^n}\right). \quad (17)$$

The proportion of the glued two-bit channels goes to 1 as $n$ grows, and so does the relative number of occurrences of the $g_1(\cdot)$ kernel. Because of this we refer to $g_1(\cdot)$ as the surviving kernel of the mixed-kernels construction. As a consequence, the properties of $g_1(\cdot)$ dominate the construction asymptotically. Specifically, we show in the sequel that if the kernel $g_1(\cdot)$ is polarizing, so is the mixed-kernels construction. Moreover, the polar coding exponent associated with the $g_1(\cdot)$ kernel also defines the rate of polarization of the mixed-kernels configuration.
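The closed form (17) can be cross-checked by counting glued inputs directly through the GCC recursion (outer-codes 0 and 2 are mixed codes, outer-code 1 is fully quaternary). The counting function below is our own illustration:

```python
# Count the glued two-bit input channels of g^(n) for the Section 3
# construction, and compare with (17): Gamma_n = (4^n / 2) * (1 - 2^-n).

def gamma(n):
    if n == 1:
        return 1                      # g0 alone has one glued pair, u^(1,2)
    # outer-codes 0 and 2 are mixed codes of length 4^(n-1) bits;
    # outer-code 1 is homogeneous quaternary: 4^(n-1) glued pairs.
    return 2 * gamma(n - 1) + 4 ** (n - 1)

for n in range(1, 11):
    assert gamma(n) == (4 ** n - 2 ** n) // 2    # (17), written over integers
print("Gamma_n agrees with (17) for n = 1..10")
```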
In this part we study the polarization property of our mixed-kernels example and its rate of polarization. We show that $g_1(\cdot)$'s characteristics determine the attributes of the mixed-kernels structure for asymptotically long codes.

Proposition 2 Assume that $g_1(\cdot)$ is a polarizing kernel, i.e. for a construction that is based only on $g_1(\cdot)$ we have that

$$\lim_{n\to\infty} \Pr\left( I\left(\tilde{W}_n\right)/2 \in (\delta, 1-\delta) \right) = 0, \quad \forall \delta \in (0, 0.5), \quad (18)$$

where $\{\tilde{W}_n\}_{n\geq 0}$ is the channel tree process associated with $g_1(\cdot)$. As a result, the mixed-kernels construction is also polarizing, i.e.

$$\lim_{n\to\infty} \Pr\left( I_n \in (\delta, 1-\delta) \right) = 0, \quad \forall \delta \in (0, 0.5). \quad (19)$$
Proof
We prove that for a given $\delta \in (0, 0.5)$ and each $\epsilon > 0$ there exists $n_0 = n_0(\delta, \epsilon)$ such that for all $n > n_0$, $\Pr(I_n \in (\delta, 1-\delta)) < \epsilon$. Let $n_1$ be chosen such that $\Pr(N_n = 2) \geq 1 - \epsilon/2$ for every $n \geq n_1$. Now, for $n = n_1$, consider all the channels $W_{4^{n_1}}^{(i,j)}$ having glued-bits input. By our assumption, when $n$ grows further, each one of them undergoes polarization. According to (18), this means that each one of the $\Gamma_{n_1}$ glued channels has an index $n_2(i,j)$ such that when $n \geq n_1 + n_2(i,j)$,

$$\Pr\left( I_n \in (\delta, 1-\delta) \,\Big|\, W_{n_1} = W_{4^{n_1}}^{(i,j)} \right) < \epsilon/2.$$

Denote by $n_2^*$ the maximum over these $n_2(i,j)$, and let $n_0 \triangleq n_1 + n_2^*$. We have that for $n \geq n_0$,

$$\Pr(I_n \in (\delta, 1-\delta)) = \underbrace{\Pr(I_n \in (\delta,1-\delta)\,|\,N_{n_1} = 1)}_{\leq 1}\,\underbrace{\Pr(N_{n_1} = 1)}_{<\epsilon/2} + \underbrace{\Pr(I_n \in (\delta,1-\delta)\,|\,N_{n_1} = 2)}_{<\epsilon/2}\,\underbrace{\Pr(N_{n_1} = 2)}_{\leq 1} < \epsilon. \quad (20)$$

♦

We now turn to discuss the rate of polarization. In order to do this, we need to consider the partial distances of the kernels. We use the notations of Mori and Tanaka [11]. For a given kernel $g(v_0, v_1, \ldots, v_{m-1})$ as defined in (2), we give the following definitions:

$$D_{x,x'}^{(i)}\left(\mathbf{v}_0^{i-1}\right) = \min_{\mathbf{w}_{i+1}^{m-1},\ \tilde{\mathbf{w}}_{i+1}^{m-1}} d_H\left( g\left(\mathbf{v}_0^{i-1}, x, \mathbf{w}_{i+1}^{m-1}\right),\ g\left(\mathbf{v}_0^{i-1}, x', \tilde{\mathbf{w}}_{i+1}^{m-1}\right) \right);$$

$$D_{x,x'}^{(i)} = \min_{\mathbf{v}_0^{i-1}} D_{x,x'}^{(i)}\left(\mathbf{v}_0^{i-1}\right), \quad x, x' \in F^{\eta_i};$$

$$D_{\max}^{(i)} = \max_{x,x'\in F^{\eta_i}} D_{x,x'}^{(i)}; \qquad D_{\min}^{(i)} = \min_{x,x'\in F^{\eta_i},\ x\neq x'} D_{x,x'}^{(i)}.$$

In order to distinguish between the partial distances of the two kernels $g_0(\cdot)$ and $g_1(\cdot)$, we add an additional subscript to these parameters for kernel indication. For example, $D_{0,\min}^{(i)}$ and $D_{1,\min}^{(i)}$ denote the $i$th item in the minimum partial distance sequences of kernel $g_0(\cdot)$ and kernel $g_1(\cdot)$, respectively. We note here that for linear kernels we have $D_{\max}^{(i)} = D_{\min}^{(i)}$.
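The partial distance sequence and the exponent of a small kernel can be computed by brute force. The sketch below does so for an RS4-like kernel over GF(4) whose rows are polynomial evaluations, so that rows $i,\ldots,3$ span a $[4, 4-i, i+1]$ MDS code. The specific matrix is our assumption; the exact $G_{RS}(4)$ of [11] may differ, but any such choice yields partial distances $(1,2,3,4)$ and exponent $\approx 0.573$.

```python
import itertools
from math import log

# GF(4) arithmetic: elements 0,1,2,3 encode 0, 1, a, a^2 with a^2 = a + 1.
ADD = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]

# An RS4-like kernel (our assumption): rows are evaluations of x^3, x^2, x, 1
# at the points (0, 1, a, a^2), so rows i..3 span a [4, 4-i, i+1] MDS code.
G = [[0, 1, 1, 1],
     [0, 1, 3, 2],
     [0, 1, 2, 3],
     [1, 1, 1, 1]]

def encode(v):
    x = [0, 0, 0, 0]
    for i, vi in enumerate(v):
        for j in range(4):
            x[j] = ADD[x[j]][MUL[vi][G[i][j]]]
    return x

def dH(x, y):
    return sum(a != b for a, b in zip(x, y))

def partial_distance(i):
    """D_min^(i): minimize over prefixes v_0^(i-1), x != x', and suffixes."""
    best = 4
    for pre in itertools.product(range(4), repeat=i):
        for x, xp in itertools.permutations(range(4), 2):
            for w in itertools.product(range(4), repeat=3 - i):
                for wp in itertools.product(range(4), repeat=3 - i):
                    best = min(best, dH(encode(pre + (x,) + w),
                                        encode(pre + (xp,) + wp)))
    return best

D = [partial_distance(i) for i in range(4)]
exponent = sum(log(d, 4) for d in D) / 4
print(D, round(exponent, 3))   # expected: [1, 2, 3, 4] 0.573
```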
Proposition 3 If $g_1(\cdot)$ is a linear polarizing kernel and $Z(W) \neq 0$, then it holds for any $\delta > 0$ that

$$\lim_{n\to\infty} \Pr\left( P_{e,n} \leq 2^{-4^{n(E_c(g_1) - \delta)}} \right) \geq I(W), \quad (21)$$

$$\lim_{n\to\infty} \Pr\left( P_{e,n} \leq 2^{-4^{n(E_c(g_1) + \delta)}} \right) = 0, \quad (22)$$

where $E_c(g_1) = \frac{1}{4}\sum_{i=0}^{3}\log_4\left(D_{1,\min}^{(i)}\right)$ is referred to as the exponent of the kernel $g_1(\cdot)$.

Proof Let $\epsilon > 0$.
Similarly to Proposition 2, we let $n_1$ be chosen such that $\Pr(N_n = 2) \geq 1 - \epsilon/2$ for each $n \geq n_1$. Now, for $n = n_1$, consider all channels with glued-bits inputs, $W_{4^{n_1}}^{(i,j)}$. According to Mori and Tanaka [14, Theorem 31], when $n$ grows further, each one of them undergoes polarization and has its error probability decaying according to the exponent of $g_1(\cdot)$. This means that each one of the $\Gamma_{n_1}$ glued channels has an index $n_2 = n_2(i,j,\delta,\epsilon)$ such that for $n \geq n_1 + n_2$,

$$\Pr\left( P_{e,n} < 2^{-4^{(n-n_1)(E_c(g_1)-\delta/2)}} \,\Big|\, W_{n_1} = W_{4^{n_1}}^{(i,j)} \right) \geq I\left(W_{4^{n_1}}^{(i,j)}\right)/2 - \epsilon/2, \quad (23)$$

$$\Pr\left( P_{e,n} < 2^{-4^{(n-n_1)(E_c(g_1)+\delta)}} \,\Big|\, W_{n_1} = W_{4^{n_1}}^{(i,j)} \right) \leq \epsilon/2. \quad (24)$$

Here $I\left(W_{4^{n_1}}^{(i,j)}\right)$ is divided by the number of bits at the input of the channel (which is 2, because we consider glued-bits channels), in accordance with [14]. For given $\delta$ and $\epsilon > 0$, denote by $n_2^*$ the maximum over the aforementioned $n_2(i,j,\delta,\epsilon)$. Also denote by $n_0$ the minimum natural number that is $\geq n_1 + n_2^*$ and also satisfies (25) for all $n \geq n_0$:

$$n\left(1 - \frac{n_1}{n}\right)\cdot\left(E_c(g_1) - \delta/2\right) \geq n\cdot\left(E_c(g_1) - \delta\right). \quad (25)$$

Therefore, for $n \geq n_0$ we have

$$\Pr\left( P_{e,n} < 2^{-4^{n(E_c(g_1)-\delta)}} \,\Big|\, W_{n_1} = W_{4^{n_1}}^{(i,j)} \right) \geq I\left(W_{4^{n_1}}^{(i,j)}\right)/2 - \epsilon/2, \quad (26)$$

$$\Pr\left( P_{e,n} < 2^{-4^{n(E_c(g_1)+\delta)}} \,\Big|\, W_{n_1} = W_{4^{n_1}}^{(i,j)} \right) \leq \epsilon/2. \quad (27)$$

Using the law of total probability,

$$\Pr\left( P_{e,n} < 2^{-4^{n(E_c(g_1)-\delta)}} \right) \geq \sum_{i\in[\nu(n_1)]^-\ \wedge\ |\tau_{n_1}(i)|=2} \Pr\left( P_{e,n} < 2^{-4^{n(E_c(g_1)-\delta)}} \,\Big|\, W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right)\cdot \Pr\left( W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right).$$

Here we sum over the glued channels of layer $n_1$; these channels are identified by a pair of indices (i.e. $|\tau_{n_1}(i)| = 2$). Using (26), we derive that

$$\Pr\left( P_{e,n} < 2^{-4^{n(E_c(g_1)-\delta)}} \right) \geq \sum_{|\tau_{n_1}(i)|=2} \Pr\left( W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right)\cdot\left( I\left(W_{4^{n_1}}^{\tau_{n_1}(i)}\right)/2 - \epsilon/2 \right)$$

$$= \sum_{|\tau_{n_1}(i)|=2} \Pr\left( W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right)\cdot I\left(W_{4^{n_1}}^{\tau_{n_1}(i)}\right)/2 \;-\; \Pr(N_{n_1}=2)\cdot\epsilon/2$$

$$= \sum_{i\in[\nu(n_1)]^-} \Pr\left( W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right)\cdot \frac{I\left(W_{4^{n_1}}^{\tau_{n_1}(i)}\right)}{|\tau_{n_1}(i)|} \;-\; \sum_{|\tau_{n_1}(i)|=1} \Pr\left( W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right)\cdot I\left(W_{4^{n_1}}^{\tau_{n_1}(i)}\right) \;-\; \Pr(N_{n_1}=2)\cdot\epsilon/2$$

$$\geq E(I_{n_1}) - (1 - \Pr(N_{n_1}=2)) - \Pr(N_{n_1}=2)\cdot\epsilon/2 \geq I(W) - \epsilon.$$

On the other hand,

$$\Pr\left( P_{e,n} < 2^{-4^{n(E_c(g_1)+\delta)}} \right) \leq \sum_{|\tau_{n_1}(i)|=2} \Pr\left( P_{e,n} < 2^{-4^{n(E_c(g_1)+\delta)}} \,\Big|\, W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right)\cdot\Pr\left( W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right) + \sum_{|\tau_{n_1}(i)|=1} \Pr\left( W_{n_1} = W_{4^{n_1}}^{\tau_{n_1}(i)} \right) \leq \epsilon/2\cdot\Pr(N_{n_1}=2) + \Pr(N_{n_1}=1) \leq \epsilon. \quad ♦$$

Note that Proposition 3 is encouraging, because typically the exponent of the auxiliary kernel can be larger than the exponent of the interface kernel. For example, the exponent of $g_0(\cdot)$ is 0.5, while the exponent of $G_{RS}(4)$ is approximately 0.573.

4 General Mixed-Kernels Constructions

Section 3 introduced a specific instance of the mixed-kernels family, formed by two constituent kernels of $\ell = 4$ dimensions and alphabet sizes of 2 and 4. In this section we broaden the ideas and techniques of Section 3 to general mixed-kernels schemes.

Consider a mixed-kernels code over alphabet $F$ of length $N = \ell^n$ $F$-symbols. Let us assume that we have a code decomposition of the $F^\ell$ space. An $\ell$-dimensional kernel $g(\cdot)$ over $F$ can be associated to this decomposition (see Subsection 2.2), acting as the inner-code of our GCC construction.
Decomposition steps that induce a partitioning of the remaining space (defined by the preceding decomposition steps) into $|F|$ sub-codes are represented by $F$-symbol inputs to $g(\cdot)$; outer-codes of the same mixed-kernels scheme, of length $N/\ell$ $F$-symbols, are associated with these inputs. On the other hand, decomposition steps that induce a partitioning of the remaining space into $|F|^{\eta}$ sub-codes ($\eta > 1$) are represented by $\eta$ $F$-symbols that are glued together. The interpretation of gluing $\eta$ $F$-symbols is that these symbols are decoded as a unified entity by the SC algorithm. In order to meet this decoding specification, we employ a length $N/\ell$ outer-code over the alphabet $F^{\eta}$. This outer-code's $F^{\eta}$-symbols are connected to the inputs of the $g(\cdot)$ instances that are associated with this decomposition step. Typically this outer-code is taken to be a homogeneous polar code of length $N/\ell$ $F^{\eta}$-symbols, constructed by an $\ell$-dimensional kernel over $F^{\eta}$.

We now turn to formalize this generalization. Let $g_0(v_0, v_1, \ldots, v_{m-1})$ be equal to $g(\cdot)$ in (2). Denote the set of indices corresponding to glued symbols at the input of $g_0(\cdot)$ by $\mathcal{B} = \{i \in [m]^- \,|\, \eta_i \geq 2\}$, and let $\theta_i \triangleq \sum_{k=0}^{i}\eta_k$ for $i \in [m]^-$ and $\theta_{-1} \triangleq 0$. For each $i \in \mathcal{B}$ we assign a kernel $g_{i+1}(\cdot): (F^{\eta_i})^\ell \to (F^{\eta_i})^\ell$ (if $\eta_i = \eta_j$ we usually employ the same kernel, i.e. $g_{i+1}(\cdot) \equiv g_{j+1}(\cdot)$). The kernels mentioned here are called the constituent kernels of the construction. The mapping $g_0(\cdot)$ is referred to as the interface kernel and the other kernels are dubbed auxiliary kernels. We note that in [15, Table 5] the author gives a list of code decompositions that can be used for the definition of a binary interface kernel $g_0(\cdot)$. Mori and Tanaka's non-binary kernels [16] may be found suitable for the auxiliary kernels $g_{i+1}(\cdot)$, $i \in \mathcal{B}$.

The construction of a high-dimensional transform of length $N = \ell^n$ $F$-symbols, $g^{(n)}\left(\mathbf{u}_0^{\ell^n-1}\right)$, can be exercised by a proper adjustment of the recursive GCC method we described in Section 3. For $n = 1$ we have $g^{(1)}(\cdot) \equiv g_0(\cdot)$. For $n > 1$, the construction uses the auxiliary kernels $g_{i+1}(\cdot)$, $i \in \mathcal{B}$, which support the glued-symbols inputs of the inner mapping $g_0(\cdot)$. Specifically, for a length $N = \ell^n$ $F$-symbols code, we have $m$ outer-codes of length $N/\ell$ symbols (these symbols may be produced by gluing together several $F$-symbols). We denote outer-code $i$ by the vector $[\gamma_{i,j}]_{j=0}^{N/\ell-1}$, where $i \in [m]^-$. Denote by $s_i$ the input offset for outer-code $i$, which means that the first index of the input vector $\mathbf{u}$ entering outer-code $i$ is $s_i$; consequently, $s_i = \theta_{i-1}\cdot N/\ell$. If $\eta_i = 1$, then the outer-code is an instance of the same mixed-kernels structure of length $N/\ell$ $F$-symbols:

$$[\gamma_{i,j}]_{j=0}^{j=N/\ell-1} = g^{(n-1)}\left( [u_{s_i+j}]_{j=0}^{j=N/\ell-1} \right), \quad u_{s_i+j}, \gamma_{i,j} \in F,\ j \in [N/\ell]^-. \quad (28)$$

If $\eta_i > 1$, then $v_i$ is a glued symbol of $\eta_i$ $F$-symbols, with corresponding kernel $g_{i+1}(\cdot)$. Therefore, outer-code $i$ is an instance of a polar code of length $N/\ell$ $F^{\eta_i}$-symbols, generated by using the homogeneous kernel $g_{i+1}(\cdot)$. Formally, we have

$$[\gamma_{i,j}]_{j=0}^{j=N/\ell-1} = g_{i+1}^{(n-1)}\left( \left[ u^{(s_i+\eta_i\cdot j,\ s_i+\eta_i\cdot(j+1)-1)} \right]_{j=0}^{j=N/\ell-1} \right), \quad u^{(s_i+\eta_i\cdot j,\ s_i+\eta_i\cdot(j+1)-1)}, \gamma_{i,j} \in F^{\eta_i},\ j \in [N/\ell]^-. \quad (29)$$

Note that in (29) the argument of $g_{i+1}^{(n-1)}(\cdot)$ is a vector of length $N/\ell$, each element of which, $u^{(s_i+\eta_i\cdot j,\ s_i+\eta_i\cdot(j+1)-1)}$, is constructed by gluing together the $\eta_i$ $F$-symbols $\mathbf{u}_{s_i+\eta_i\cdot j}^{s_i+\eta_i\cdot(j+1)-1}$. Finally, these $m$ outer-codes are combined together using the $g_0(\cdot)$ inner mapping:

$$g^{(n)}(\mathbf{u}) = \left[\, g_0\left(\gamma_{0,0}, \gamma_{1,0}, \ldots, \gamma_{m-1,0}\right),\ g_0\left(\gamma_{0,1}, \gamma_{1,1}, \ldots, \gamma_{m-1,1}\right),\ \ldots,\ g_0\left(\gamma_{0,N/\ell-1}, \gamma_{1,N/\ell-1}, \ldots, \gamma_{m-1,N/\ell-1}\right) \,\right].$$

Assume that $\mathbf{x}_0^{\ell-1} = g_0(v_0, v_1, \ldots, v_{m-1})$ is transmitted over $\ell$ copies of the memoryless channel $W$, and that we receive the output vector $\mathbf{y}$. The channel splitting principle dictates the generation of $m$ synthetic channels. If the input of channel $i$ is over $F$ (i.e. not glued), then we denote the channel by $W_\ell^{(\theta_{i-1})}$ and we have the following transition function:

$$W_\ell^{(\theta_{i-1})}\left(\mathbf{y}, \mathbf{u}_0^{\theta_{i-1}-1}\,\middle|\,u_{\theta_{i-1}}\right) = \frac{1}{|F|^{\ell-1}}\cdot \sum_{\mathbf{u}_{\theta_i}^{\ell-1}\in F^{\ell-\theta_i}} W_\ell\left(\mathbf{y}\,\middle|\,\mathbf{u}_0^{\theta_{i-1}-1}, u_{\theta_{i-1}}, \mathbf{u}_{\theta_i}^{\ell-1}\right), \quad u_{\theta_{i-1}} \in F. \quad (30)$$

Glued symbols are handled as a unified entity in SC decoding. If the input to channel $i$ consists of $\eta_i$ glued $F$-symbols, then we denote the channel by $W_\ell^{(\theta_{i-1},\theta_i-1)}$. Note that the superscript that identifies the channel, $(\theta_{i-1}, \theta_i-1)$, contains the first and last indices of the components of $\mathbf{u}$ that were glued together. We have

$$W_\ell^{(\theta_{i-1},\theta_i-1)}\left(\mathbf{y}, \mathbf{u}_0^{\theta_{i-1}-1}\,\middle|\,u^{(\theta_{i-1},\theta_i-1)}\right) = \frac{1}{|F|^{\ell-\eta_i}}\cdot \sum_{\mathbf{u}_{\theta_i}^{\ell-1}\in F^{\ell-\theta_i}} W_\ell\left(\mathbf{y}\,\middle|\,\mathbf{u}_0^{\theta_{i-1}-1}, u^{(\theta_{i-1},\theta_i-1)}, \mathbf{u}_{\theta_i}^{\ell-1}\right), \quad u^{(\theta_{i-1},\theta_i-1)} \in F^{\eta_i}. \quad (31)$$

The processing of the likelihoods related to the kernels $g_{i+1}(\cdot)$ for the glued symbols $i \in \mathcal{B}$ is done over a channel $\tilde{W}$ with input symbols from $F^{\eta_i}$. $\tilde{W}$ can be created as a result of one of the channel splittings induced by a glued input $v_i$ of $g_0(\cdot)$ (these channels are denoted by a pair of numbers in their superscript, i.e. $W_\ell^{(\theta_{i-1},\theta_i-1)}$). $\tilde{W}$ can also be produced by the homogeneous polar code that is connected to a glued input $v_i$. We denote the synthetic channels that are split from $\tilde{W}$ by $\left\{\tilde{W}_\ell^{(j\cdot\eta_i,\,(j+1)\cdot\eta_i-1)}\right\}_{j=0}^{\ell-1}$. Formally, we have

$$g_{i+1}\left( u^{(0,\eta_i-1)}, u^{(\eta_i,\,2\eta_i-1)}, \ldots, u^{((\ell-1)\cdot\eta_i,\ \ell\cdot\eta_i-1)} \right) = \mathbf{x}_0^{\ell-1}, \quad u^{(j\cdot\eta_i,\,(j+1)\cdot\eta_i-1)}, x_j \in F^{\eta_i},\ j \in [\ell]^-.$$

$\mathbf{x}_0^{\ell-1}$ is transmitted over $\ell$ copies of an $F^{\eta_i}$-input memoryless channel $\tilde{W}$, and the output vector $\mathbf{y}$ is received. By the channel splitting principle we derive the following synthetic channels for $j \in [\ell]^-$:

$$\tilde{W}_\ell^{(j\cdot\eta_i,\,(j+1)\cdot\eta_i-1)}\left(\mathbf{y}, \mathbf{u}_0^{j\cdot\eta_i-1}\,\middle|\,u^{(j\cdot\eta_i,\,(j+1)\cdot\eta_i-1)}\right) = \frac{1}{|F|^{\eta_i(\ell-1)}}\cdot \sum_{\mathbf{u}_{(j+1)\eta_i}^{\ell\cdot\eta_i-1}\in (F^{\eta_i})^{\ell-1-j}} \tilde{W}_\ell\left(\mathbf{y}\,\middle|\,\mathbf{u}_0^{j\cdot\eta_i-1}, u^{(j\cdot\eta_i,\,(j+1)\cdot\eta_i-1)}, \mathbf{u}_{(j+1)\cdot\eta_i}^{\ell\cdot\eta_i-1}\right).$$
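The recursion (28)-(29), together with the combining step, can be captured in a compact schematic encoder. The sketch below is ours and makes simplifying assumptions: kernels are given as callables on symbol tuples, $F$-symbols are plain integers, and glued symbols are tuples.

```python
# Schematic recursive encoder for a general mixed-kernels code, following
# (28)-(29) and the combining step. Assumptions (ours): the interface kernel
# g0 maps a list of m input symbols (symbol i drawn from F^(eta_i)) to a list
# of ell output F-symbols; each auxiliary kernel aux[i] maps a list of ell
# glued symbols to a list of ell glued symbols.

def mixed_encode(u, ell, etas, g0, aux):
    """u: flat list of F-symbols of length ell^n; etas[i] = eta_i with
    sum(etas) = ell; aux[i]: the auxiliary kernel for glued input i."""
    N = len(u)
    if N == ell:                       # base case: a single interface kernel
        v, s = [], 0
        for eta in etas:
            v.append(tuple(u[s:s + eta]) if eta > 1 else u[s])
            s += eta
        return g0(v)
    gammas, s = [], 0                  # one outer-code per input of g0
    for i, eta in enumerate(etas):
        block = u[s:s + eta * (N // ell)]
        s += eta * (N // ell)
        if eta == 1:                   # (28): a mixed-kernels outer-code
            gammas.append(mixed_encode(block, ell, etas, g0, aux))
        else:                          # (29): a homogeneous code over F^eta
            glued = [tuple(block[eta * j: eta * (j + 1)])
                     for j in range(N // ell)]
            gammas.append(homogeneous_encode(glued, ell, aux[i]))
    x = []                             # combine the columns with g0
    for j in range(N // ell):
        x.extend(g0([gammas[i][j] for i in range(len(etas))]))
    return x

def homogeneous_encode(v, ell, g):
    """Definition 1 for a homogeneous kernel g acting on glued symbols."""
    M = len(v)
    if M == 1:
        return list(v)
    gammas = [homogeneous_encode(v[i * (M // ell):(i + 1) * (M // ell)], ell, g)
              for i in range(ell)]
    x = []
    for j in range(M // ell):
        x.extend(g([gammas[i][j] for i in range(ell)]))
    return x

# For the Section 3 example, one would call (with suitable g0 and g1):
#   mixed_encode(u, 4, (1, 2, 1), g0, {1: g1})
```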
Example 4 Let $\ell = 8$ and define the following binary-output kernel $g_0(\cdot)$:

$$g_0\left(u_0, u^{(1,3)}, u^{(4,6)}, u_7\right) = \mathbf{u}\cdot G, \quad u_i \in \{0,1\},\ i \in [8]^-,$$

where $G$ is an $8\times 8$ matrix derived by swapping two rows of $\left(\begin{smallmatrix}1 & 0\\ 1 & 1\end{smallmatrix}\right)^{\otimes 3}$ (here $A^{\otimes k}$ denotes the $k$th Kronecker power of the matrix $A$). $g_0(\cdot)$ induces a code decomposition of $\{0,1\}^8$ having the chain of parameters $(8,8,1)-(8,7,2)-(8,4,4)-(8,1,8)$. We therefore have two glued octal input symbols $u^{(1,3)}, u^{(4,6)} \in \{0,1\}^3$, which require additional octal kernels of $\ell = 8$ dimensions. We denote these kernels by $g_2(\cdot)$ and $g_3(\cdot)$, respectively. Note that $g_2(\cdot)$ and $g_3(\cdot)$ are mappings $\left(\{0,1\}^3\right)^8 \to \left(\{0,1\}^3\right)^8$. We may choose to use the $G_{RS}(8)$ kernel [3] both for $g_2(\cdot)$ and for $g_3(\cdot)$. Figure 7 illustrates the GCC construction of a length $N = 8^n$ bits polar code using this mixed structure.

Figure 7: A GCC representation of the length $N = 8^n$ bits mixed-kernels polar code of Example 4 (defined by the $g^{(n)}(\cdot)$ mapping). The mapping $g^{(n-1)}(\cdot)$ is the same mixed-kernels construction of length $N/8$, while $g_2^{(n-1)}(\cdot)$ and $g_3^{(n-1)}(\cdot)$ are mappings of homogeneous polar codes of length $N/8$ octal symbols.

We associate to the mixed-kernels construction a channel tree process, $W_n \in \left\{W_{\ell^n}^{(\tau_n(i))}\right\}_{i=0}^{\nu(n)-1}$, where $\nu(n)$ denotes the number of synthetic channels induced by the length $\ell^n$ mapping (channels with glued-symbols input are counted as one channel). Moreover, similarly to the definitions in Subsection 3.2, $\tau_n(i)$ denotes the index of channel number $i$. As before, $\{N_n\}_{n\geq 0}$ denotes the number of symbols at the input of the channel, which in our case is $N_n = 1$ when we deal with a single-symbol channel, or $N_n = \eta_i$, $i \in \mathcal{B}$, when we consider a channel with glued-symbols input. The channel tree process statistics are defined as follows:

$$W_{n+1} = W_n^{(B_n)} \ \text{for}\ n \geq 0; \quad W_0 = W,\ N_0 = 1; \quad B_n = \begin{cases} B_n^{(0)}, & n \leq T; \\ B_n^{(i+1)}, & n > T \ \wedge\ B_T^{(0)} = (\theta_{i-1}, \theta_i-1),\ i \in \mathcal{B}. \end{cases} \quad (32)$$

Note that $W_n^{(B_n)}$ denotes a synthetic channel $\tilde{\tilde{W}}_\ell^{(B_n)}$ where the basic channel $\tilde{\tilde{W}}$ is taken to be the previous element of the channel tree process, i.e. $\tilde{\tilde{W}} = W_n$. The sequence $B_n$ indicates the branching of the tree process. Pairs of numbers in the sequence of $B_n$ indicate channels having glued-symbols input, while single numbers correspond to channels with $F$-symbol input. The sequence begins by taking the values of the sequence $B_n^{(0)}$, which correspond to the channels generated by the interface kernel $g_0(\cdot)$. Starting from $n > T$, $B_n$ takes the values of the sequence $B_n^{(i+1)}$ corresponding to the chosen auxiliary kernel.

The random sequence $\left\{B_n^{(0)}\right\}_{n\geq 0}$ is i.i.d. and takes values from the set $\{\theta_{i-1}\,|\,i\notin\mathcal{B}\} \cup \{(\theta_{i-1}, \theta_i-1)\,|\,i\in\mathcal{B}\}$. The left set in the union is the set of channel indices with non-glued input symbols; each one of these indices has probability $1/\ell$. The right set in the union is the set of indices of channels with glued-symbols inputs, such that $(\theta_{i-1}, \theta_i-1)$ has probability $\eta_i/\ell$.
Moreover, for each $i \in \mathcal{B}$, let us define $\left\{B_n^{(i+1)}\right\}_{n\geq 0}$ to be an i.i.d. random sequence over the values $\{(j\cdot\eta_i,\,(j+1)\cdot\eta_i-1)\,|\,j\in[\ell]^-\}$, with uniform probabilities ($=1/\ell$) associated to each one of them.

Denote by the random variable $T$ the minimum non-negative $n$ such that $B_n^{(0)} \in \{(\theta_{i-1}, \theta_i-1)\,|\,i\in\mathcal{B}\}$ (i.e. it refers to an index of a synthetic channel induced by $g_0(\cdot)$ with glued-symbols input). It is easy to see that the random variable $T$ is geometric with parameter $p = \left(\sum_{i\in\mathcal{B}}\eta_i\right)/\ell$. Furthermore, given the value of $T$, the sequence $B_n$ is composed of independent samples. Since we begin our tree process with channels corresponding to the interface kernel $g_0(\cdot)$, the random variable $T$ indicates the index of transition from channels associated with $g_0(\cdot)$ to channels corresponding to the auxiliary kernels. The specific kernel to which we transition is determined by the index chosen at the transition point. Moreover, let $N_n$ indicate the number of symbols at the input of the channel $W_n$. If the transition was to a kernel of $\eta_i$ glued symbols, we have $N_n = \eta_i$ for $n > T$. Formally,

$$N_n = \begin{cases} 1, & n \leq T; \\ \eta_i, & n > T \ \wedge\ B_T^{(0)} = (\theta_{i-1}, \theta_i-1),\ i \in \mathcal{B}. \end{cases}$$

Let $\phi(\eta_i)$ denote the number of inputs to $g_0(\cdot)$ having $\eta_i$ glued symbols. For instance, in Example 4 we have $\phi(1) = 2$ and $\phi(3) = 2$. Denote by $\Gamma_n(\eta_i)$ the number of inputs of $g^{(n)}(\cdot)$ having $\eta_i$ glued symbols. We have that for $\eta_i > 1$,

$$\Gamma_n(\eta_i) = \frac{\ell^n}{\eta_i}\cdot\Pr(N_n = \eta_i) = \frac{\ell^n}{\eta_i}\cdot\sum_{\tau=0}^{n-1}\Pr\left(N_n = \eta_i \wedge T = \tau\right) = \frac{\ell^n}{\eta_i}\cdot\sum_{\tau=0}^{n-1}\frac{\phi(\eta_i)\cdot\eta_i}{\ell}\cdot(1-p)^{\tau} = \ell^{n-1}\cdot\phi(\eta_i)\cdot\frac{1-(1-p)^n}{p}. \quad (33)$$

For $\eta_i = 1$ we have

$$\Gamma_n(1) = \ell^n\cdot\Pr(T \geq n) = \ell^n\cdot(1-p)^n. \quad (34)$$
The number of $g^{(n)}(\cdot)$'s $F$-symbol inputs that are part of an $\eta_i$-glued set is $\eta_i\cdot\Gamma_n(\eta_i)$. Observe that, asymptotically in $n$, the proportion of $F$-symbol inputs that are part of an $\eta_i$-glued input set is $\frac{\phi(\eta_i)\cdot\eta_i}{p\cdot\ell}$. On the other hand, the proportion of input symbols that are not part of any glued-symbols set vanishes as $n$ grows, and so does the relative number of occurrences of the interface kernel in the construction. Consequently, the auxiliary kernels are also called the surviving kernels of the construction.

Let us define the mutual information sequence as $I_n = \frac{I(W_n)}{N_n}$ (note that we take $|F|$ as the base of the logarithm in the mutual information definition). As we demonstrated in Section 3, here too the polarization and the rate of polarization properties are determined by the surviving kernels. The latter observation on the dominance of the auxiliary kernels will be evident in the generalizations of the propositions from Section 3.3, which are presented next.
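As a sanity check of (33)-(34), the following sketch counts the glued inputs of $g^{(n)}(\cdot)$ for Example 4's parameters ($\ell = 8$, $\eta$ profile $(1,3,3,1)$, so $\phi(1) = \phi(3) = 2$ and $p = 6/8$) and compares them with the closed forms (an illustration of ours):

```python
# Verify (33)-(34) for Example 4: ell = 8, eta profile (1, 3, 3, 1).
ELL, ETAS = 8, (1, 3, 3, 1)
p = sum(e for e in ETAS if e > 1) / ELL          # = 6/8

def counts(n):
    """Return dict eta -> number of inputs of g^(n) with eta glued symbols."""
    if n == 1:
        c = {}
        for e in ETAS:
            c[e] = c.get(e, 0) + 1
        return c
    prev = counts(n - 1)
    c = {}
    for e in ETAS:
        if e == 1:                               # mixed outer-code: recurse
            for k, v in prev.items():
                c[k] = c.get(k, 0) + v
        else:                                    # homogeneous outer-code:
            c[e] = c.get(e, 0) + ELL ** (n - 1)  # all its inputs stay e-glued
    return c

for n in range(1, 6):
    c = counts(n)
    gamma3 = ELL ** (n - 1) * 2 * (1 - (1 - p) ** n) / p    # (33), phi(3) = 2
    gamma1 = ELL ** n * (1 - p) ** n                        # (34)
    assert abs(c.get(3, 0) - gamma3) < 1e-6
    assert abs(c.get(1, 0) - gamma1) < 1e-6
print("Gamma_n(eta) matches (33)-(34) for n = 1..5")
```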
Proposition 4 The process $\{I_n\}_{n\geq 0}$ is a bounded martingale which is uniformly integrable. As a result, it converges almost surely to $I_\infty$.

Proof This proof is similar to the proof of Proposition 1. The only delicate step that we need to consider here is the channel splitting due to the kernel $g_0(\cdot)$; this is because the other kernels are homogeneous and polarizing, and therefore the fact that their information sequence is a martingale was already proven (see e.g. [3, Lemma 9]). We have

$$E[I_{n+1}\,|\,I_n, N_n = 1] = \sum_{i\notin\mathcal{B}}\frac{1}{\ell}\cdot I\left(W_n^{(\theta_{i-1})}\right) + \sum_{i\in\mathcal{B}}\frac{\eta_i}{\ell}\cdot\frac{I\left(W_n^{(\theta_{i-1},\theta_i-1)}\right)}{\eta_i} = \frac{1}{\ell}\left( \sum_{i\notin\mathcal{B}} I\left(W_n^{(\theta_{i-1})}\right) + \sum_{i\in\mathcal{B}} I\left(W_n^{(\theta_{i-1},\theta_i-1)}\right) \right) = I_n, \quad (35)$$

where the last transition is due to the mutual information chain rule. As a result of the law of total expectation, we have

$$E[I_{n+1}\,|\,I_n] = E_{N_n}\left[\,E[I_{n+1}\,|\,I_n, N_n]\,\right] = I_n, \quad (36)$$

which means that the sequence $\{I_n\}_{n\geq 0}$ is a martingale. Furthermore, it is uniformly integrable (see e.g. [13, Theorem 4.5.3]) and therefore it converges almost surely to $I_\infty$. ♦

Proposition 5
Assume that for all $i \in \mathcal{B}$, $g_{i+1}(\cdot)$ is a polarizing kernel, i.e. for a construction that is based only on $g_{i+1}(\cdot)$ we have that

$$\lim_{n\to\infty}\Pr\left( I\left(\tilde{W}_n\right)/\eta_i \in (\delta, 1-\delta) \right) = 0, \quad \forall\delta\in(0,0.5),$$

where $\{\tilde{W}_n\}_{n\geq 0}$ denotes the channel tree process for the homogeneous kernel $g_{i+1}(\cdot)$. As a result, the mixed-kernels construction is also polarizing, i.e.

$$\lim_{n\to\infty}\Pr\left( I_n \in (\delta, 1-\delta) \right) = 0, \quad \forall\delta\in(0,0.5).$$

Proof
The proof is similar to the proof of Proposition 2, only that here $n_1$ is chosen such that $\Pr(N_n > 1) \geq 1 - \epsilon/2$ for every $n \geq n_1$. ♦

Assume that all the construction's auxiliary kernels are linear. Let $E_{\min} = \min_{i\in\mathcal{B}} E_c(g_{i+1})$ and $E_{\max} = \max_{i\in\mathcal{B}} E_c(g_{i+1})$, where $E_c(g_{i+1})$ is the polar coding exponent of kernel $g_{i+1}(\cdot)$ (the base of the logarithm in the polar coding exponent is the kernel size, $\ell$).

Proposition 6
If for all $i \in \mathcal{B}$ we have that $g_{i+1}(\cdot)$ is a linear polarizing kernel, and $Z(W) \neq 0$, then it holds for all $\delta > 0$ that

$$\lim_{n\to\infty}\Pr\left( P_{e,n} \leq 2^{-\ell^{n(E_{\min}-\delta)}} \right) \geq I(W); \quad (37)$$

$$\lim_{n\to\infty}\Pr\left( P_{e,n} \leq 2^{-\ell^{n(E_{\max}+\delta)}} \right) = 0. \quad (38)$$

Proof
This proof is similar to the proof of Proposition 3, only that here we may have more than one auxiliary kernel. Consequently, as in the previous proof, we choose $n_1$ such that $\Pr(N_n > 1) \geq 1 - \epsilon/2$ for every $n \geq n_1$. Because $\forall i \in \mathcal{B}$, $E_{\min} \leq E_c(g_{i+1}) \leq E_{\max}$, and based on Mori and Tanaka [14, Theorem 31], each one of the glued channels in layer $n_1$ has an index $n_2 = n_2(i,j,\delta,\epsilon)$ such that for $n \geq n_1 + n_2$ we have (39) and (40) replacing (23) and (24), respectively:

$$\Pr\left( P_{e,n} < 2^{-\ell^{(n-n_1)(E_{\min}-\delta/2)}} \,\Big|\, W_{n_1} = W_{\ell^{n_1}}^{(i,j)} \right) \geq \frac{I\left(W_{\ell^{n_1}}^{(i,j)}\right)}{j-i+1} - \epsilon/2, \quad (39)$$

$$\Pr\left( P_{e,n} < 2^{-\ell^{(n-n_1)(E_{\max}+\delta)}} \,\Big|\, W_{n_1} = W_{\ell^{n_1}}^{(i,j)} \right) \leq \epsilon/2. \quad (40)$$

The rest of the proof of Proposition 3 may now be employed. ♦

5 Merits of Mixed-Kernels Constructions
In this section we discuss possible benefits of using mixed-kernels based structures. Subsection 5.1 considers the opportunity of utilizing a variety of code decompositions, which may be more suitable for SC decoding. Subsection 5.2 examines the SC algorithm complexity, and shows that mixed-kernels structures may have smaller SC decoding complexity compared to homogeneous polar codes based on their auxiliary kernels (although they have the same polar coding exponent). Furthermore, we suggest an approach for fairly comparing the error-correction performance of different coding schemes of the same length under SC List (SCL) decoding. This idea is used in Section 6 to demonstrate by simulations that mixed-kernels structures may outperform the currently known polar coding schemes.
5.1 Improved Code Decomposition

Our initial motivation for studying mixed-kernels structures was the opportunity of generating richer classes of code decompositions that induce new polar code types. These decompositions allow their codes to be partitioned into variable numbers of subsets at each step. By doing so, we may be able to ensure that each step increases the partial distance between the sub-codes. The outer-codes that are associated with these steps can be adjusted to support the different qualities of the resulting synthetic channels. As a consequence, the outer-codes' performance under SC decoding may be improved.

As an example of this advantage, let us compare the partial distances induced by the different steps of several decompositions of length $N = 2^n$ bits codes. Table 1 compares the partial distance sequences induced by codes based on Arikan's binary $(u+v,v)$ kernel, by the homogeneous polar code based on the $G_{RS}(4)$ kernel, and by the mixed-kernels construction of Section 3 with $G_{RS}(4)$ as the auxiliary mapping $g_1(\cdot)$ (denoted as Mixed-RS4). In the table we use the abbreviation $b^{(m)}$, for natural numbers $b, m$, to denote a run of $m$ copies of the number $b$.

In order to allow comparisons between structures of the same length, we had to introduce additional inner-code layers to some of the structures. Specifically, for $N \in \{8, 32\}$ we included an additional $(u+v,v)$ layer as an inner mapping of the Mixed-RS4 structure; i.e., for such $N$ bits we have two instances of length $N/2$ Mixed-RS4 codes concatenated by a $(u+v,v)$ inner-code. We refer to such structures as the $(u+v,v)$-Mixed-RS4 constructions. Similarly, for $N \in \{16, 64\}$ we included an additional quaternary $(u+v,v)$ layer as an inner mapping of the RS4 structure, referred to as the $(u+v,v)$-RS4 construction. For length $2^n$ bits codes where $n$ is even, the $N \in \{16, 64\}$ entries in Table 1 may be considered as inner-codes of the GCC construction; similarly, when $n$ is odd, the $N \in \{8, 32\}$ entries are the inner-codes of the constructions.

The table shows that the partial distance sequences of the $(u+v,v)$ construction contain more repetitions of values compared to the Mixed-RS4 and $(u+v,v)$-Mixed-RS4 schemes, while the quality of the channels is more diverse for the mixed-kernels constructions. This diversity may lead to a better adjustment of the outer-codes that "operate" over these synthetic channels.

The partial distance sequences of the mixed-kernels structure can also be interpreted as binary partial distance sequences. According to this interpretation, each quaternary decomposition step, denoted by a bold entry $b$, is transformed into two steps of binary entries $(b, b)$. When comparing the induced binary distance sequences of the mixed-kernels with the sequence of $(u+v,v)$, we can observe that for $N \in \{16, 64\}$ the mixed-kernels structure has a better sequence.
The meaning of the last statement is that for each pair of corresponding entries $\alpha_i$ and $\beta_i$ of the binary distance sequences of the $(u+v,v)$ and the mixed constructions, respectively, we have $\alpha_i \leq \beta_i$, and there exists $i_0$ such that $\alpha_{i_0} < \beta_{i_0}$. Moreover, for $N = 16$ bits the mixed structure has the same binary partial distance sequence as the one derived by Korada et al. [2, Example 28]. Korada et al. proved that this sequence is optimal for binary linear kernels of $\ell = 16$ dimensions, in the sense that it has the maximum polar coding exponent.

The partial distance sequences of the mixed-kernels structure are also better than the sequences of the RS4 and $(u+v,v)$-RS4 constructions in terms of repetitions: for example, for $N = 32$ the mixed structure contains 12 repeating values (out of 20 distance sequence entries), while the corresponding RS4 sequences contain more repetitions.

Table 1: Comparison of partial distance sequences induced by different codes based on the binary $(u+v,v)$ construction, the RS4 and $(u+v,v)$-RS4 constructions, and the Mixed-RS4 and $(u+v,v)$-Mixed-RS4 constructions with $G_{RS}(4)$ as $g_1(\cdot)$.

5.2 Decoding Complexity

An additional advantage of the mixed-kernels structures may be manifested in terms of the complexity of the SC decoders that operate on them. We begin by analyzing the SC decoding algorithm's time complexity for different polar coding schemes in Subsection 5.2.1. In Subsection 5.2.2 we explore the memory requirements of these schemes (a.k.a. their space complexity). The complexity of the SCL decoder implementation is elaborated in Subsection 5.2.3. This discussion justifies our methodology of fair comparison between different coding schemes, presented in Subsection 5.2.4. We finally apply this approach when analyzing error-correction performance simulations in Section 6.
5.2.1 Time Complexity of the SC Decoder

The SC decoding steps can be classified into three categories: (SC.a) likelihood calculations; (SC.b) decision making based on these likelihoods; (SC.c) partial encoding of the decided symbols. The time complexity of the (SC.a) operations dominates the time complexity of the entire SC algorithm. This is our justification for regarding the number of (SC.a) operations as a good measure of the SC decoder's time complexity.

For a homogeneous kernel of $\ell$ dimensions over a field $F$, the straightforward calculation of the likelihoods performed at the $i$th decoding step ($i \in [\ell]^-$) requires $|F|\cdot\left(|F|^{\ell-i-1}-1\right)$ additions and $|F|^{\ell-i}\cdot(\ell-1)$ multiplications (see (30) with $\theta_i = i+1$, $\forall i \in [\ell]^-$, for the specification of this naive method). For linear kernels it is possible to perform trellis decoding based on the parity-check matrix of the zero-coset. In this way, the number of additions is $\leq \ell\cdot|F|^{i+1}\cdot(|F|-1)$ and the number of multiplications is $\leq \ell\cdot|F|^{i+2}$.
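Taking the formulas above at face value, a quick calculator for the per-step operation counts (naive method versus the quoted trellis upper bounds) looks as follows (an illustrative sketch of ours):

```python
# Operation counts for the likelihood calculation at decoding step i of an
# ell-dimensional kernel over a field of size q, per the formulas above.

def naive_ops(q, ell, i):
    adds = q * (q ** (ell - i - 1) - 1)
    mults = q ** (ell - i) * (ell - 1)
    return adds, mults

def trellis_bounds(q, ell, i):
    adds = ell * q ** (i + 1) * (q - 1)
    mults = ell * q ** (i + 2)
    return adds, mults

for i in range(4):                     # e.g. a binary ell = 4 kernel
    print(i, naive_ops(2, 4, i), trellis_bounds(2, 4, i))
```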
For linear kernels it is possible to perform trellis decoding based on the zero-coset's parity-check matrix. In this way the number of additions is at most ℓ·|F|^{i+1}·(|F| − 1) and the number of multiplications is at most ℓ·|F|^{i+2}. These bounds do not take into account the fact that many paths in the trellis may be skipped, and that some of the nodes in the trellis have input degree < |F|. Therefore, the actual number of operations may be reduced significantly, and is always bounded from above by the complexity of the initial naive approach. Table 2 summarizes the number of operations for likelihood calculations per decoding step for different kernels using trellis decoding.

    Kernel                          Step 0      Step 1      Step 2     Step 3     Total
    (u+v,v) over GF(2)              (4, 2)      (2, 0)      -          -          (6, 2)
    (u+v,v)^{⊗2} mixed over GF(2)   (12, 6)     (20, 4)     (6, 0)     -          (38, 10)
    (u+v,v) over GF(4)              (16, 12)    (4, 0)      -          -          (20, 12)
    G_RS(4) over GF(4)              (48, 36)    (96, 60)    (48, 12)   (12, 0)    (204, 108)

Table 2: Kernel decoding complexity: the number of (additions, multiplications) per decoding step using trellis decoding. The mixed kernel is the g(·) kernel of Section 3.

Note that due to numerical stability it is preferable to use log-likelihoods instead of likelihoods in the decoding algorithm implementation. In this case the first number in each tuple in the table should be regarded as the number of log-likelihood additions. The second item in each tuple is interpreted as the number of max*(·,·) operations, where max*(α_1, α_2) ≜ max{α_1, α_2} + log(1 + exp{−|α_1 − α_2|}).

In order to calculate the total number of operations of the SC algorithm for a code of length N bits, we need to take into account the number of occurrences of each decoding step in the algorithm. This can be easily achieved by counting the number of kernels in the code structure of each polar code.
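The max* operation is the usual Jacobian-logarithm trick for adding two likelihoods in the log domain; a minimal sketch (ours, not part of the paper's decoder specification):

    import math

    def max_star(a1, a2):
        """Log-domain addition: returns log(exp(a1) + exp(a2)).

        Computed stably as max{a1, a2} + log(1 + exp(-|a1 - a2|)).
        """
        return max(a1, a2) + math.log1p(math.exp(-abs(a1 - a2)))

    # Sanity check against the direct (numerically fragile) form.
    assert abs(max_star(-1.0, -2.5) - math.log(math.exp(-1.0) + math.exp(-2.5))) < 1e-12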
Utilizing the GCC structure of polar codes enables us to count easily using recursion formulae. Specifically, let a_{(u+v,v)}(N) denote the number of occurrences of the (u+v,v) kernel in the GCC structure of the (u+v,v) code of length N bits. For N = 2^n bits and n > 1 we have a_{(u+v,v)}(N) = N/2 + 2·a_{(u+v,v)}(N/2) and a_{(u+v,v)}(2) = 1; therefore a_{(u+v,v)}(N) = (N/2)·log_2(N). For the RS4 polar code of length N = 2·4^n bits and n > 1 we have a_{RS}(N) = N/8 + 4·a_{RS}(N/4) and a_{RS}(8) = 1; as a result a_{RS}(N) = (N/8)·log_4(N/2).
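These counting recursions are easy to verify numerically; a short sketch of ours (the closed forms follow by unrolling the recursions):

    import math

    def a_uv(N):
        """Number of (u+v,v) kernels in the (u+v,v) polar code of length N = 2^n bits."""
        return 1 if N == 2 else N // 2 + 2 * a_uv(N // 2)

    def a_rs(N):
        """Number of G_RS(4) kernels in the RS4 polar code of length N = 2*4^n bits."""
        return 1 if N == 8 else N // 8 + 4 * a_rs(N // 4)

    for n in range(2, 12):
        N = 2 ** n
        assert a_uv(N) == (N // 2) * n                           # (N/2) * log2(N)
    for n in range(1, 7):
        N = 2 * 4 ** n
        assert a_rs(N) == (N // 8) * round(math.log(N // 2, 4))  # (N/8) * log4(N/2)
    print(a_uv(1024))  # 5120 kernels for the length-1024 (u+v,v) code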
Table 3 considers codes of lengths N = 1024, 2048 and 4096 bits: Arikan's (u+v,v) code, the RS4 code, the Mixed-RS4 code with G_RS(4) as the g(·) kernel, and two concatenated variants. The (u+v,v)−RS4 construction is an RS4 code of length N/2 with an additional single layer of the (u+v,v) over GF(4) inner-code; the (u+v,v)−Mixed−RS4 construction is a Mixed-RS4 code of length N/2 with an additional single layer of the binary (u+v,v) inner-code. The aforementioned constructions enable us to support different code lengths (this is because the RS4 scheme exists only for lengths of 2·4^n bits and the Mixed-RS4 scheme only for lengths of 4^n bits, for some n > 1). The table lists the total number of operations of category (SC.a) for each code. Note that here we do not distinguish between additions and multiplications.

Remark 1 (SC Decoder Shortcuts) Table 3 assumes that in SC we have to sequentially decode all the elements of the polar code encoder input vector u. However, given a code design (i.e. a set of input indices that are frozen), it is possible to reduce the number of decoding operations. The most obvious "shortcut" is to skip the likelihood calculations of frozen-symbol blocks (since their value is known a priori). Small outer-codes of low rates can be decoded as one unit, saving calculations (see e.g. [18, Section D]). Rate-1 outer-codes can also be decoded efficiently, as Alamdar-Yazdi and Kschischang suggested [19]. High-rate linear outer-codes can be efficiently decoded using a trellis decoder based on their dual code (see e.g. Miloslavskaya and Trifonov [20, Section V]). We note that the application of these shortcuts may affect specific polar code structures differently, based on their code design (see e.g. the comparison between Tables 6 and 7 in the sequel). Having said that, Table 3 may still provide the reader with the (crude) time-complexity cost differences associated with different polar-code structures.

The rightmost column of Table 3 contains the number of operations for a specific code divided by the number of operations for decoding the (u+v,v) code of the same length. By doing so we can quantify the effort in likelihood calculation (and as a result in SC decoding) for each code compared to Arikan's (u+v,v) scheme.

Table 3: The number of kernels of each type (binary (u+v,v), mixed (u+v,v)^{⊗2}, (u+v,v) over GF(4), and G_RS(4)) and the total number of category (SC.a) operations for the codes of lengths N = 1024, 2048 and 4096 bits. The rightmost column reports the total for each code divided by the number of operations for decoding the (u+v,v) code of the same length.
We may observe from the table that the number of operations for SC decoding of the Mixed-RS4 scheme is smaller than that of the RS4 scheme; this is because the g(·) kernel is much lighter in its SC decoding complexity than the G_RS(4) kernel. The second observation that is evident from the table is that Arikan's (u+v,v) structure has significantly lower decoding complexity than both the Mixed-RS4 and the RS4 based schemes. Consequently, in order to perform a fair comparison between (u+v,v) and the other structures we need to equalize the decoding efforts for these structures. We further discuss this idea in Subsection 5.2.4.

In this subsection we considered the number of operations of the decoding algorithm as a measure of its complexity. The time it takes to run the decoding algorithm depends on both the number of operations and their time duration. This decoding time can usually be reduced by introducing parallelism into the decoding algorithm, at the cost of duplicating the processing units and adding control logic. Although SC is a sequential decoding algorithm, most of its decoding steps can be parallelized, as indicated e.g. by Leroux et al. [21, 22] and by the authors [23, Section 5]. Therefore the rightmost column in Table 3 may also indicate a proportional increase in the allocated computation resources (e.g. the number of logic gates in a hardware implementation) for implementing each of the decoders while keeping the same decoding throughput. This increase also results in a corresponding growth of the decoder power consumption.

Next we explore the memory requirements (space complexity) of the decoding algorithm. Table 4 summarizes the main memory assets required for an efficient-time implementation of the SC decoder (as was discussed in the previous subsection). These assets are described for the inner-layer of the GCC structure (see Subsection 2.2 for the inner-layer definition). In order to derive the total memory size for a scheme of length N symbols we need to add the numbers in the table to the total memory size for the outer-codes of length N/ℓ, where ℓ is the kernel size in symbols. In SC, a single outer-code is decoded per GCC layer at each point in time. Therefore we only need to take into account the memory specified for decoding an individual outer-code (i.e. the values in Table 4 are not to be multiplied by the number of outer-codes per layer).

Example 5 (SC Space Complexity for RS4) Consider the RS4 scheme of length N = 2·4^n bits. The SC decoder memory size for this structure can be derived by a summation of the numbers in Table 4 (|F| = 4, ℓ = 4) and the memory size of the length 2·4^{n−1} bits RS4 scheme (as long as n > 1). Therefore we conclude that the overall memory size is N·(Σ_{i=0}^{n−1} ((λ+2)/2)·4^{−i}) = (2/3)·(λ+2)·(N − 2) bits, where λ is the number of bits assigned for representing a log-likelihood value.
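A quick numerical check of this geometric-sum closed form (a sketch of ours, not part of the paper):

    def rs4_memory(N, lam):
        """SC decoder memory (bits) of the RS4 scheme of length N = 2*4^n bits.

        Accumulates the per-layer term (lam + 2)/2 * N of Table 4, where the
        code length shrinks by a factor of 4 from one GCC layer to the next.
        """
        total = 0.0
        while N > 2:  # the recursion bottoms out at the length-8 inner code
            total += (lam + 2) * N / 2
            N //= 4
        return total

    lam = 6
    for n in range(1, 8):
        N = 2 * 4 ** n
        assert abs(rs4_memory(N, lam) - 2 / 3 * (lam + 2) * (N - 2)) < 1e-6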
    Asset              Size [bits]                 Description
    LogLikelihoodMem   N·|F|·λ / (log_2(|F|)·ℓ)    Holds the results of the (log-)likelihood calculations, serving as an input for the next outer-code decoder.
    PartialEncMem      N                           Holds the results of partial encoding based on the previous SC decisions. It contains one coset member of the currently decided sub-code.

Table 4: SC algorithm memory assets for the inner-layer of the GCC structure, for a code of length N bits over a field F and inner-code size (kernel size) of ℓ F-symbols. Each log-likelihood is represented by λ bits.

    Polar Code           Restrictions on N   Memory Size [bits]                   Normalized (N ≫ 1, λ = 6 bits)
    (u+v,v)              power of 2          2·(λ+1)·N                            1.00
    (u+v,v)−RS4          power of 4          ((4λ+5)/3)·(N − 4(λ+2)/(4λ+5))       0.69
    Mixed−RS4            power of 4          ((4λ+5)/3)·(N − 4(λ+2)/(4λ+5))       0.69
    RS4                  2·4^n               (2/3)·(λ+2)·(N − 2)                  0.38
    (u+v,v)−Mixed−RS4    2·4^n               ((10λ+11)/6)·(N − 8(λ+2)/(10λ+11))   0.85

Table 5: Memory consumption of the SC decoders of several polar coding schemes of length N bits. The log-likelihoods are represented by λ bits. For an easy comparison between the schemes, the rightmost column reports the memory size for each scheme divided by the memory size of the (u+v,v) scheme of the same length.
Remark 2 (Log-Likelihood vs. Log-Likelihood Ratio Memory) In this section we assume that the channel observations and internal calculations are done in terms of log-likelihoods (LLs), which require |F| LL values per F-symbol. In an SC decoder it is possible to save memory space by subtracting from all the LLs the LL corresponding to the zero element of F, and omitting the LL of 0. These normalized values are called Log-Likelihood Ratios (LLRs). Using LLRs decreases the space required to store likelihoods by a multiplicative factor of |F|/(|F|−1), which gives an advantage to the (u+v,v) scheme. However, in order to operate the SCL algorithm (discussed in the next subsection) we have to work with LLs. (Balatsoukas-Stimming et al. [24] showed how to use LLRs in SCL decoding; however, in order to do so each decoding path requires an additional path-metric (PM) to be calculated and stored along with the LLRs. Consequently, the number of LLRs and PMs required to be stored throughout the algorithm is the same as the number of LLs in the standard implementation.) Since SCL has better performance than SC, we decided to analyze the space complexity using LLs as a preparation for Subsection 5.2.3.
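As a small illustration of the LL-to-LLR normalization described in the remark (a sketch of ours; the convention that index 0 corresponds to the zero element of F is an assumption):

    def lls_to_llrs(lls):
        """Convert per-symbol log-likelihoods to log-likelihood ratios.

        lls[x] is the LL of field element x, with the zero element first.
        Subtracting lls[0] from every entry and dropping the (now zero)
        first entry keeps |F| - 1 values per symbol instead of |F|.
        """
        return [ll - lls[0] for ll in lls[1:]]

    # GF(4) example: 4 LLs per symbol become 3 LLRs, a 4/3 storage saving.
    print(lls_to_llrs([-0.2, -1.1, -2.4, -0.9]))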
For mixed-kernels codes, the outer-code structures may not be the same; therefore the LogLikelihoodMem should be taken as the maximum memory required for SC implementation among the outer-codes.
Example 6 (SC Space Complexity for Mixed-RS4) Consider the Mixed-RS4 polar code scheme of length N = 4^n bits. In this structure there are two types of outer-codes: the Mixed-RS4 and the RS4 schemes. Therefore, in each layer the memory size should be taken as the maximum memory required for supporting each one of them. Specifically, the mixed kernel requires LogLikelihoodMem of size (λ/2)·N bits for supporting the mixed outer-codes and λ·N bits for the RS4 outer-code. Therefore, for the inner-layer we need to allocate (λ + 1)·N bits. For the complete scheme we take the size of the memory allocated for the inner layer and add to it the larger of the sizes specified for the Mixed-RS4 outer-code (of length N/4 bits) and the RS4 outer-code (of length N/4 quaternary symbols). It can be proven by induction that the Mixed-RS4 code of length N/4 bits requires less memory than the RS4 code of length N/4 quaternary symbols. As a result, the overall scheme employs memory of size ((4λ+5)/3)·(N − 4(λ+2)/(4λ+5)) bits.

Table 5 contains the SC decoders' memory consumption of several polar coding schemes. The rightmost column of the table contains the quotient of the memory size of each scheme and that of the (u+v,v) scheme for the same (large) code length, with λ = 6 bits representation of the LLs.
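The normalized figures of Table 5 can be reproduced directly from the closed forms as reconstructed above (a numerical sketch of ours):

    lam, N = 6, 1 << 20  # large N, so the additive constants are negligible

    mem = {
        "(u+v,v)":           2 * (lam + 1) * N,
        "RS4":               2 / 3 * (lam + 2) * (N - 2),
        "Mixed-RS4":         (4 * lam + 5) / 3 * (N - 4 * (lam + 2) / (4 * lam + 5)),
        "(u+v,v)-Mixed-RS4": (10 * lam + 11) / 6 * (N - 8 * (lam + 2) / (10 * lam + 11)),
    }
    for name, m in mem.items():
        # Prints ~1.00, 0.38, 0.69 and 0.85, respectively.
        print(f"{name:>18}: {m / mem['(u+v,v)']:.2f}")

The (u+v,v)−RS4 scheme shares the Mixed-RS4 expression, hence the same ∼0.69 ratio.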
The table indicates that for coding schemes of length N = 2·4^n bits, the RS4 scheme requires ∼38% and ∼45% of the memory used by the (u+v,v) and (u+v,v)−Mixed−RS4 schemes, respectively. For N = 4^n bits, both the Mixed-RS4 and the (u+v,v)−RS4 schemes require ∼69% of the memory specified for the (u+v,v) polar code.

Tal and Vardy [25, 26] introduced the SCL decoder that enhances the error-correction performance of the SC algorithm. This performance is improved to a greater extent if a CRC is concatenated to the polar code. The SCL algorithm with list size L considers simultaneously at most L possible prefixes of the transmitted information word (these prefixes are dubbed decoding-paths). For each such decoding-path SCL performs the same calculations that are employed in a single SC decoder. In other words, the time and space resources consumed by each decoding step of SC, corresponding to the (SC.a) and (SC.c) categories, grow by at most a factor of L in SCL. When the algorithm has to decide on the (non-frozen) information symbol u_i, it calculates for each of the surviving decoding-paths û_0^{i−1} the likelihood of the prefix when u_i is concatenated to it (i.e. û_0^{i−1} • u_i, where u_i ∈ F). This step increases the number of paths by a multiplicative factor of |F|. The decoder then keeps the L paths with the highest likelihood scores. As a consequence, the complexity of SCL is bounded from above by the sum of two components: (i) L times the complexity of SC; (ii) the total complexity of finding the best L paths (for each of the non-frozen symbols). The complexity of (ii) is negligible compared to the complexity of (i) (assuming that L is fixed and N → ∞). Indeed, Tal and Vardy showed that for list size L and Arikan's (u+v,v) code of length N bits, the decoding time complexity of the SCL algorithm is O(L·N·log N) with space complexity of O(L·N). This idea can be further generalized to other homogeneous kernels and to the mixed structures (see e.g. [23]). Observations 1 and 2 formalize this discussion.
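The decision step just described — extend each surviving path by every u_i ∈ F, then keep the L most likely — can be sketched in a few lines (ours; extend_ll is a hypothetical stand-in for the per-path SC likelihood recursion):

    import heapq

    def scl_extend_and_prune(paths, q, L, extend_ll):
        """One SCL decision step on a non-frozen symbol over GF(q).

        paths     -- list of (log_likelihood, prefix) pairs, at most L of them.
        extend_ll -- extend_ll(prefix, v): log-likelihood of prefix + [v].
        Extends every path by all q symbol values (at most q*L candidates)
        and keeps the L best ones.
        """
        candidates = [(extend_ll(prefix, v), prefix + [v])
                      for _, prefix in paths for v in range(q)]
        return heapq.nlargest(L, candidates, key=lambda c: c[0])

Here heapq.nlargest costs O(qL·log L); the O(L) figure used for μ_T below corresponds to a linear-time order-statistics selection [27, Chapter 9], and either way the selection cost is negligible for large N.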
Observation 1 (SCL Time Complexity) Let T_SC denote the SC decoding time (measured in the number of operations defined in Table 3) for a homogeneous code of length N bits over a field F. Let T_SCL(L) denote the decoding time of SCL with list size L for the same code. We have

T_SCL(L) ≤ L·T_SC + (N·R / log_2(|F|))·μ_T(|F|·L, L) + O(N·L·log L),     (41)

where R is the code rate and μ_T(x, y) is the number of operations for finding the y maximal elements in a list of x numbers.
Proof The first addend on the right-hand side is due to the fact that at each point in time there are at most L decoding-paths, each one of them having the time complexity of a single SC decoder. The second addend is due to the selection of the best L decoding paths among at most |F|·L candidates. This operation occurs for non-frozen symbols, and hence it occurs N·R/log_2(|F|) times. The third addend in the bound accounts for the handling of counters and pointers that occurs in the SCL algorithm. Note that as N grows the second and the third addends in the bound become negligible compared to the first addend. It is known that T_SC = O(N·log N), assuming that |F| and ℓ (the kernel number of dimensions) are constant. The second addend corresponds to an order-statistics problem, and therefore μ_T(|F|·L, L) = O(L) (see e.g. [27, Chapter 9]). As a conclusion, we may claim that T_SCL ≤ L·T_SC·(1 + o(1)), where o(1) vanishes as N grows. ♦

In Observation 1 we used an upper bound to characterize the complexity of SCL in terms of L times the SC complexity. The exact complexity, however, depends on the specific code design. In order to understand this remark, note that during each of the first ⌈log_{|F|}(L)⌉ decision steps on the non-frozen symbols the list size increases by a multiplicative factor ≤ |F|, growing from list size 1 to L. Therefore, the number of operations up to this decoding point is strictly less than L times the number of operations employed up to the same point in the SC algorithm. Tables 6 and 7 exemplify this notion.

Table 6 summarizes the SCL decoder number of operations (as defined in Table 3) for different rate R = 1 coding schemes and different list sizes L. The number of operations for SC decoding of (u+v,v) (i.e. SCL with list size L = 1) serves as the normalization reference for the complexity of the other codes having the same length. The leftmost list column (L = 1) should be recognized as the rightmost column of Table 3. The normalized numbers of operations for the (u+v,v) scheme are:

    N [bits]   L=1    L=2    L=4    L=8    L=16   L=32
    1024       1.0    1.9    3.6    6.9    13.7   27.3
    2048       1.0    1.9    3.6    7.0    14.0   27.8
    4096       1.0    1.9    3.6    7.1    14.1   28.1

Table 6: Normalized number of operations of the SCL decoding algorithm with list size L for different rate R = 1 coding schemes of length N bits (the (u+v,v) rows are shown here; the table also covers the RS4-based and mixed schemes). For each length N the total number of operations for decoding is divided by the number of operations of (u+v,v) of the same length with L = 1 (i.e. SC).

It is evident from the table that for fixed L, as N increases, the number of operations of the SCL decoder tends to L times the number of operations of the SC algorithm. Note that this is always the case if the index of non-frozen symbol number ⌈log_{|F|}(L)⌉ is independent of N (in SCL the number of decoding-paths reaches the maximum list size when decoding this symbol).

Table 7 considers several length-N bits codes with the user information rate used for the simulations of Section 6; here even the leftmost list column (L = 1) is different from the corresponding column in Table 6. It can be seen that indeed the increase in the number of operations is less than L times that of the SC decoder. Furthermore, Tables 6 and 7 exemplify that the exact time complexity of the SCL algorithm is indeed code-design dependent. Notwithstanding, the multiplicative factor L may still serve as a reasonable rule-of-thumb when considering the complexity of SCL compared to SC. Simulations of the error-correction performance of the codes listed in Table 7 are discussed in Section 6.
Remark 3 (Table 6 Revisited) The Table 6 example was given here only to demonstrate the dependency of the SCL time complexity on the code design (in conjunction with Table 7). The actual results in the table are of little significance on their own, for the following reasons: (i) all of the table's length-N bits codebooks are identical and equal to {0,1}^N (the encoders are different, though); (ii) the ML decoder for rate-1 codes is much simpler than the SC decoder [19]. Consequently, there is no practical justification for SCL decoding of such codes.
Observation 2 (SCL Space Complexity) Let S_SC denote the SC decoder required memory size for a homogeneous code of length N bits over a field F. Let S_SCL(L) denote the required memory size for the SCL decoder with list size L for the same code. It can be shown that

S_SCL(L) ≤ L·S_SC + μ_S(|F|·L, L) + O(log N·L·log L),     (42)

where μ_S(x, y) is the size of the memory used for finding the y maximal elements in a list of x numbers.
Proof Similarly to the proof of Observation 1, the first addend on the right-hand side is due to having L decoding paths, and the second addend is required for finding the best L decoding paths among at most |F|·L candidates. As N increases, the second and the third addends in the bound become negligible compared to the first addend. Consequently, we may claim that S_SCL(L) ≤ L·S_SC·(1 + o(1)), where o(1) vanishes as N grows. ♦

    N [bits]   L=1    L=2    L=4    L=8    L=16   L=32
    1024       1.0    1.8    3.4    6.6    12.8   25.4
    2048       1.0    1.8    3.5    6.7    13.2   26.0
    4096       1.0    1.8    3.5    6.9    13.7   26.9

Table 7: Normalized number of operations of the SCL decoding algorithm with list size L for the coding schemes of length N bits, at the user information rate of the codes simulated in Section 6 (the (u+v,v) rows are shown here; the table also covers the RS4-based and mixed schemes). For each N the total number of operations for decoding is divided by the number of operations of (u+v,v) of the same length with L = 1 (i.e. SC).

Observation 2 also bounds from above the space complexity of SCL decoding as L times the complexity of SC. In this case, for fixed L, the memory size typically cannot be reduced significantly by taking advantage of the code design. Therefore, it is reasonable to assume that S_SCL(L) ≈ L·S_SC.

Subsections 5.2.1 and 5.2.2 demonstrated that different coding schemes have different time and space complexities under the SC decoding algorithm. Comparing the error-correction performance of SC on these schemes has to take into account the coding system's throughput requirement and its memory limitations. Tables 3 and 7 indicate that the (u+v,v) SC decoder has significantly lower time complexity than the other schemes. Consequently, given a time-complexity constraint, it is reasonable to utilize a more enhanced decoding scheme (having higher time complexity while still meeting the constraint) for the (u+v,v) code. One possibility for accomplishing this idea is to increase the length of the (u+v,v) code. Since the error-correction performance of decoding algorithms typically improves as longer codes are employed, this technique may be useful for surpassing the original decoder's performance. However, this approach is problematic because in many cases the code length is a requirement of the communication system and cannot be increased. (Two simple scenarios exemplify the dependency of the code length on other features of the communication system: the code length determines the number of bits required for transmitting a single information word over the channel, and accordingly it influences the transmission latency of the communication system; in storage applications, the code length defines the minimum size of information that needs to be retrieved from the device to reliably fulfill a user's read request, and consequently it affects the system read latency.) Therefore, in this correspondence we use the following comparison guidelines: (i) All the compared coding schemes have an equal code length. This length is understood to be the maximum value still complying with the communication system specifications.
(ii) In order to equalize the decoding complexities of different schemes we employ SCL with different list sizes.

The discussion in Subsection 5.2.3 has set the stage for performing fair comparisons by applying guideline (ii). Let C_1 and C_2 be two coding schemes of equal length, decoded by the SCL algorithm with list sizes L_1 and L_2, respectively. Consider the following two case studies:

• Case Study I (CS-I):
The decoder implementation is limited by the required throughput or by the number of gates of its hardware implementation (it is assumed that the throughput specification may be accomplished by introducing a sufficient level of decoding parallelism; see Subsection 5.2.1). In this scenario we should choose L_1 and L_2 such that the corresponding time complexities (and thereby their implications on the computation resources or logic-gate count) meet the requirements. For this task, an analysis such as the one depicted in Table 7 may be regarded as a useful reference.

• Case Study II (CS-II):
The implementation is limited by the memory size available to the algorithm. In this scenario we may use Table 5 and Observation 2 for choosing L_1 and L_2 such that the memory requirements are met.

The (CS-I) and (CS-II) scenarios illustrate two extremal limitations. Typically, a system designer may have a requirement on the throughput while experiencing limitations on both the computation resources/logic-gate count and the allowed memory size. Hence the challenge is to choose the solution that meets these specifications and demonstrates the best performance according to some criteria (typical optimization criteria comprise the ones considered in this section, i.e. maximum throughput, minimum logic-gate count and minimum memory size; additional criteria may also include e.g. maximum error-correction performance and minimum power consumption). In such scenarios, considering the possible solutions when only one of the specifications is taken into account (i.e. reducing the problem to (CS-I) or to (CS-II)), and then selecting only the configurations that also satisfy the other constraints, gives the set of valid designs from which the best option is to be picked.

In the sequel we present simulation results of the schemes from Table 7 and use the above comparison guidelines to demonstrate the advantages of the Mixed-RS4 scheme over the (u+v,v) codes in SCL decoding.

Proposition 6 implies that when considering the exponent as a measure of the polarization rate, the behavior of a mixed-kernels structure is the same as the behavior of the weakest kernel among its surviving kernels. However, the exponent is an asymptotic measure, and it may fail to capture the performance of a polar coding scheme at a finite block length N. Indeed, Section 5 suggests that employing mixed-kernels may lead to improved error-correction performance due to a better code decomposition, with moderate SCL decoding complexity. In this section we demonstrate this performance-improvement conjecture for codes of lengths N = 1024, 2048 and 4096 bits, with different list sizes (L) and different outer CRC codes. We tried CRC codes of both 8 bits and 16 bits, and we present in the figures the CRC that gave the best results. For each simulation point we collected at least 100 frame error events.

Figure 8a depicts the frame-error-rate (FER) results simulated for the N = 1024 bits codes. The (u+v,v) list sizes were 16 and 32. We consider the two case studies from Subsection 5.2.4. (CS-I): using Table 7 we may deduce that the (u+v,v) list sizes of 16 and 32 should be compared with the list sizes of 4 and 8, respectively, of the other schemes. It is evident that the Mixed-RS4 compares favorably with the (u+v,v)−RS4, and that the (u+v,v) achieves similar error-correction performance to the Mixed-RS4, with an advantage to the Mixed-RS4 over the (u+v,v) as the SNR increases. (CS-II): using Table 5 we can compare the schemes with the same list sizes. Here there is a clear advantage of the mixed schemes compared to their corresponding candidates from the other schemes.
We note that this is achieved with ≈40% less memory resources compared to the (u+v,v) polar code.

The trend that was illustrated in the last paragraph is enhanced in Figure 8c, which depicts the FER results for the N = 4096 bits codes. Indeed, there the Mixed-RS4 scheme is advantageous under both (CS-I) and (CS-II). Figure 8d contains simulation results of the Mixed-RS4 scheme together with a Mixed-RS8 scheme. Although the Mixed-RS8 scheme compares favorably with the other schemes, its time complexity is much higher. Consequently, considering (CS-I) scenarios, the figure suggests that the Mixed-RS4 is preferable.

Figure 8b depicts the FER results for the N = 2048 bits codes. The (u+v,v) curve of SCL with L = 32 was also simulated by Tal and Vardy [26, Figure 1]. Comparing the (u+v,v)−Mixed−RS4 scheme with list sizes of 4 and 8 to the (u+v,v) with list sizes of 16 and 32, respectively, indicates that both of them have similar FER, while the first implementation requires less complexity than the second one. The RS4 and the (u+v,v)−Mixed−RS4 schemes also exhibit similar FER; the (u+v,v)−Mixed−RS4 requires fewer operations than the RS4 for the same list size L (a reduction of > 29% in the number of operations, according to Table 7). On the other hand, the RS4 requires less memory than the (u+v,v)−Mixed−RS4 for the same list size L (a reduction of ∼55% of the memory size, according to Table 5).

Figure 8: FER simulation of the SCL decoding algorithms of various codes of length N bits from Table 3, over an AWGN channel with BPSK modulation. Each code was designed by running a GA simulation at an SNR point (E_b/N_0)_design. The list size is indicated by L, and the CRC length is indicated by the number following the CRC label (i.e. CRC8 and CRC16 for 8 and 16 bits CRCs, respectively). Panels: (a) N = 1024; (b) N = 2048; (c) N = 4096; (d) N = 4096, including Mixed-RS8 results (L = 1, 2, 4).
Mixed-kernels constructions of polar codes were introduced and analyzed in this paper. We began by providing conditions for polarization of the mixed-kernels structures based on their constituent kernels. Then we turned to calculating their polar coding exponent. Both the polarization property and the rate of polarization are asymptotic in the code length. Considering finite-length instances of these codes suggests possible advantages in error-correction performance and decoder complexity.

Throughout the paper we used an example based on the (u+v,v)^{⊗2} kernel and a quaternary kernel of size 4. Our preliminary intention in using this example was to simplify the introduction of mixed-kernels and their relevant notations and definitions. Simulations of the SCL decoding algorithm of this example (taking G_RS(4) as the quaternary kernel) indicate that this scheme is attractive both in terms of error-correction performance and in terms of decoder complexity. Indeed, in many cases this Mixed-RS4 scheme compares favorably with the known polar coding structures.

Acknowledgements
The authors would like to thank Dr. Nissim Halabi for helpful discussions and for his contribution to thedevelopment of the simulation software that produced Section 6 results.
References

[1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] S. B. Korada, E. Sasoglu, and R. Urbanke, “Polar codes: Characterization of exponent, bounds, and constructions,”
IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6253–6264, 2010. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5625639
[3] R. Mori and T. Tanaka, “Channel polarization on q-ary discrete memoryless channels by arbitrary kernels,” Jan. 2010. [Online]. Available: http://arxiv.org/abs/1001.2662
[4] N. Presman, O. Shapira, and S. Litsyn, “Binary polar code kernels from code decompositions,” Jan. 2011. [Online]. Available: http://arxiv.org/abs/1101.0764
[5] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: M.I.T. Press, 1966.
[6] E. Blokh and V. Zyablov, “Coding of generalized concatenated codes,”
Probl. Peredachi. Inform. , vol. 10,no. 3, pp. 45–50, 1974.[7] V. Zinoviev, “Generalized concatenated codes,”
Probl. Peredachi. Inform. , vol. 12, no. 1, pp. 5–15, 1976.[8] I. Dumer,
Handbook of Coding Theory. Elsevier, The Netherlands, 1998, ch. Concatenated Codes and Their Multilevel Generalizations.
[9] G. D. Forney, Jr., “Codes on graphs: normal realizations,”
IEEE Trans. Inf. Theory , vol. 47, no. 2, pp.520–548, 2001.[10] S. B. Korada, “Polar codes for channel and source coding,” Ph.D. dissertation, EPFL, 2009. [Online].Available: http://library.epfl.ch/en/theses/?nr=4461[11] R. Mori and T. Tanaka, “Performance and construction of polar codes on symmetric binary-input memo-ryless channels,” in
Proc. IEEE Int. Symp. Information Theory ISIT 2009 , 2009, pp. 1496–1500.[12] E. Sasoglu, E. Telatar, and E. Arikan, “Polarization for arbitrary discrete memoryless channels,” Aug.2009. [Online]. Available: http://arxiv.org/abs/0908.0302[13] K. L. Chung,
A Course in Probability Theory , 3rd ed. Academic Press, 2001.[14] R. Mori and T. Tanaka, “Source and channel polarization over finite fields and Reed–Solomonmatrices,”
IEEE Trans. Inf. Theory , vol. 60, no. 5, pp. 2720–2736, 2014. [Online]. Available:http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6774879[15] S. Litsyn,
Handbook of Coding Theory . Eds., Elsevier, The Netherlands, 1998, ch. An Updated Table ofthe Best Binary Codes Known.[16] R. Mori and T. Tanaka, “Non-binary polar codes using reed-solomon codes and algebraic geometry codes,”Jul. 2010. [Online]. Available: http://arxiv.org/abs/1007.3661[17] M.-K. Lee and K. Yang, “The exponent of a polarizing matrix constructed from the kroneckerproduct,”
Designs, Codes and Cryptography , vol. 70, no. 3, pp. 313–322, 2014. [Online]. Available:http://dx.doi.org/10.1007/s10623-012-9689-z[18] P. Trifonov, “Efficient design and decoding of polar codes,”
IEEE Trans. Commun. , vol. 60, no. 11, pp.3221–3227, 2012. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6279525[19] A. Alamdar-Yazdi and F. Kschischang, “A simplified successive-cancellation decoder for polarcodes,”
IEEE Commun. Lett. , vol. 15, no. 12, pp. 1378–1380, 2011. [Online]. Available:http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6065237[20] V. Miloslavskaya and P. Trifonov, “Design of binary polar codes with arbitrary kernel,” in
IEEE InformationTheory Workshop , 2012.[21] C. Leroux, I. Tal, A. Vardy, and W. Gross, “Hardware architectures for successive cancellation decoding ofpolar codes,” in
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on ,2011, pp. 1665–1668. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5946819[22] C. Leroux, A. Raymond, G. Sarkis, and W. Gross, “A semi-parallel successive-cancellation decoderfor polar codes,”
IEEE Trans. Signal Process. , vol. 61, no. 2, pp. 289–299, 2013. [Online]. Available:http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6327689[23] N. Presman and S. Litsyn, “Recursive descriptions of decoding algorithms and hardware architectures forpolar codes,” Sep. 2012. [Online]. Available: http://arxiv.org/abs/1209.4818[24] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “Llr-based successive cancellation list decoding ofpolar codes,” arXiv preprint arXiv:1401.3753 , 2013. [Online]. Available: http://arxiv.org/abs/1401.3753[25] I. Tal and A. Vardy, “List decoding of polar codes,” in
Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, 2011, pp. 1–5. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6033904
[26] ——, “List decoding of polar codes,” Jun. 2012. [Online]. Available: http://webee.technion.ac.il/people/idotal/papers/preprints/polarList.pdf
[27] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. Cambridge, MA: MIT Press.