On Optimal Finite-length Binary Codes of Four Codewords for Binary Symmetric Channels

Abstract

Finite-length binary codes of four codewords are studied for memoryless binary symmetric channels (BSCs) with the maximum likelihood decoding. For any block-length, best linear codes of four codewords have been explicitly characterized, but whether linear codes are better than nonlinear codes or not is unknown in general. In this paper, we show that for any block-length, there exists an optimal code of four codewords that is either linear or in a subset of nonlinear codes, called Class-I codes. Based on the analysis of Class-I codes, we derive sufficient conditions such that linear codes are optimal. For block-length less than or equal to 8, our analytical results show that linear codes are optimal. For block-length up to 300, numerical evaluations show that linear codes are optimal.


arXiv preprint [cs.IT]

Yanyan Dong and Shenghao Yang
The Chinese University of Hong Kong, Shenzhen

I. INTRODUCTION

A binary code of block length $n$ and codebook size $k$ is called an $(n,k)$ code, which is said to be linear if it is a subspace of $\{0,1\}^n$. Linear codes have been extensively studied in coding theory. For memoryless binary symmetric channels (BSCs), asymptotically capacity-achieving linear codes with low encoding/decoding complexity have been designed, for example polar codes [1]. For given $n$ and $k$, however, whether linear codes are optimal among all $(n,k)$ codes for BSCs under maximum likelihood (ML) decoding is a long-standing open problem that can be traced back to the early days of information and coding theory [2], [3].

For given $n$ and $k$, if perfect or quasi-perfect binary $(n,k)$ codes exist, they are optimal for the BSC [3]. For example, the optimal $(n,2)$ codes are either perfect or quasi-perfect and hence are known [2]. Readers can find more about optimal codes in [2], [3]. More recently, Chen, Lin and Moser [4] gave the first proof of the optimal binary codes of 3 codewords for any block length $n$.

In this paper, we study binary $(n,4)$ codes. The best linear $(n,4)$ codes have been explicitly characterized for each block length $n$ [4], [5], and are conjectured to be optimal among all $(n,4)$ codes under ML decoding [4]. We derive a general approach for comparing the ML decoding performance of two $(n,4)$ codes that differ only slightly. Based on this approach, we verify that linear $(n,4)$ codes are optimal for a range of $n$.

In particular, we show that for any block length $n$, there exists an optimal $(n,4)$ code that is either linear or in a subset of nonlinear codes, called Class-I codes. Based on the analysis of Class-I codes, we derive sufficient conditions under which linear codes are optimal. For $n \le 8$, our analytical results show that linear codes are optimal. For $n$ up to 300, numerical evaluations show that linear codes are optimal, where the evaluation complexity is polynomial in $n$.
Moreover, most ML decoding comparison results obtained in this paper about $(n,4)$ codes are universal in the sense that they do not depend on the crossover probability of the BSC.

In the remainder of this paper, we first formulate the problem and introduce our main results. In § III, we outline a general approach for comparing the ML decoding performance of two codes, of which two special cases are used in this paper: two codes with only one codeword different in one bit (see § IV) or in two bits (see § VI). § V is dedicated to the analysis of Class-I codes, based on the results in § IV.

II. PROBLEM FORMULATION AND MAIN RESULTS

A. Formulation of $(n,k)$ Codes

An $(n,k)$ binary code $\mathcal{C}$ is a subset of $\{0,1\}^n$ of size $k$, and is said to be linear if it is a subspace of $\{0,1\}^n$. Using the codewords of $\mathcal{C}$ as rows, we can form a $k \times n$ binary matrix $C$, which is used interchangeably with $\mathcal{C}$. For $i = 1, \ldots, k$, let $c_i$ be the $i$th row of $C$, i.e., a codeword of $\mathcal{C}$.

For $x, y \in \{0,1\}^n$, let $w(x)$ be the Hamming weight of $x$ and let $x \oplus y$ be the bit-wise exclusive OR of $x$ and $y$, so that $w(x \oplus y)$ is the Hamming distance between $x$ and $y$. Let

$$d_{\mathcal{C}}(y) = \min_{c \in \mathcal{C}} w(c \oplus y).$$

Consider communication over a memoryless binary symmetric channel (BSC) with crossover probability $\epsilon$ ($0 < \epsilon < 1/2$). For a channel input $x \in \{0,1\}^n$, the channel output is $y \in \{0,1\}^n$ with probability

$$p(y|x) = (1-\epsilon)^{n - w(x \oplus y)} \epsilon^{w(x \oplus y)}.$$

Suppose an $(n,k)$ code $\mathcal{C}$ is used for this BSC. The maximum-likelihood (ML) decoding rule decodes an output $y$ to a codeword $c$ if $w(c \oplus y) = d_{\mathcal{C}}(y)$, where a tie is resolved arbitrarily. Define

$$\alpha_d(\mathcal{C}) = |\{ y \in \{0,1\}^n : d_{\mathcal{C}}(y) = d \}|,$$

which is the number of outputs $y$ that are decoded to a codeword at distance $d$. Note that the value $\alpha_d(\mathcal{C})$ does not depend on $\epsilon$. The (average) correct decoding probability of $\mathcal{C}$ is

$$\lambda_{\mathcal{C}} = \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \alpha_d(\mathcal{C}) (1-\epsilon)^{n-d} \epsilon^{d}. \tag{1}$$

We say an $(n,k)$ code $\mathcal{C}$ is better or no worse than another $(n,k)$ code $\mathcal{C}'$ if $\lambda_{\mathcal{C}} \ge \lambda_{\mathcal{C}'}$. We say an $(n,k)$ code $\mathcal{C}$ is optimal if it is no worse than any other $(n,k)$ code. A property of a code is said to be universal if it holds for all $\epsilon$.

B. Main Results about $(n,4)$ Codes
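The quantities $d_{\mathcal{C}}(y)$, $\alpha_d(\mathcal{C})$ and $\lambda_{\mathcal{C}}$ from § II-A can be evaluated directly for short codes. The following is a minimal brute-force sketch, not part of the paper; the function names `alpha_counts` and `correct_prob` are illustrative assumptions, and the enumeration over all $2^n$ outputs is only practical for small $n$.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance w(x XOR y) between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))

def alpha_counts(code):
    """Return [alpha_0, ..., alpha_n]: number of outputs y with d_C(y) = d."""
    n = len(code[0])
    alpha = [0] * (n + 1)
    for y in product((0, 1), repeat=n):
        alpha[min(hamming(c, y) for c in code)] += 1
    return alpha

def correct_prob(code, eps):
    """Average correct ML decoding probability, following equation (1)."""
    n = len(code[0])
    alpha = alpha_counts(code)
    return sum(a * (1 - eps) ** (n - d) * eps ** d
               for d, a in enumerate(alpha)) / len(code)

# Hypothetical example: the linear (3,4) code of even-weight words.
example = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(alpha_counts(example), correct_prob(example, 0.1))
```

Because $\alpha_d(\mathcal{C})$ does not depend on $\epsilon$, the counts can be computed once and reused for every crossover probability.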

In this paper, we focus on $(n,4)$ codes, which have four codewords. The columns of an $(n,4)$ code $\mathcal{C}$ are vectors in $\{0,1\}^4$. We use $\langle i \rangle^k$ to denote the binary vector of length $k$ associated with an integer $i \ge 0$, with the most significant bit first. When the length of the vector is implied by the context, the superscript is omitted. For example, $\langle 1 \rangle = [0\ 0\ 0\ 1]^\top$ and $\langle 6 \rangle = [0\ 1\ 1\ 0]^\top$. We use $\{i\}_{\mathcal{C}}$ to denote the index set of the columns of $\mathcal{C}$ equal to $\langle i \rangle$, and let $|i|_{\mathcal{C}}$ be the size of $\{i\}_{\mathcal{C}}$. We may write $|i|_{\mathcal{C}}$ as $|i|$ when the code $\mathcal{C}$ is implied by the context. For example, the $(7,4)$ code

$$C = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}$$

has the $i$th column of type $\langle i \rangle$ and $|i| = 1$ for $i = 1, \ldots, 7$.

The analysis of the column types of $\mathcal{C}$ has been used in the literature [4], [5]. Chen, Lin and Moser [4] compared different codes by induction on $n$, i.e., adding one column at a time. Here, we compare two codes of the same length that differ in one or two positions of one codeword, and we find that it is also convenient to use the column representation in our analysis. The following facts about $(n,4)$ codes are straightforward [4], [5]. First, codes with all-zero columns are not optimal. Second, flipping all the bits in a column does not change the decoding performance. Third, row and column permutations of $C$ do not affect the decoding performance. Due to these facts, we only need to consider $\mathcal{C}$ with 7 types of columns, $\langle 1 \rangle, \langle 2 \rangle, \ldots, \langle 7 \rangle$, when searching for an optimal code.

Theorem 1.

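Consider an $(n,4)$ code $\mathcal{C}$ of codewords $c_1, \ldots, c_4$ with $w(c_s \oplus c_t)$ even for certain $1 \le s \ne t \le 4$, and with a column of type $\langle 2^{4-s} \rangle$. Let $\mathcal{C}'$ be the code obtained by replacing a column of type $\langle 2^{4-s} \rangle$ of $\mathcal{C}$ by $\langle 2^{4-s} + 2^{4-t} \rangle$. Then, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.

Proof. See § IV-B.

For example, suppose an $(n,4)$ code $\mathcal{C}$ has a column $\langle 1 \rangle$ and $w(c_3 \oplus c_4)$ even. The above theorem says that if we replace a column of type $\langle 1 \rangle$ of $\mathcal{C}$ by $\langle 3 \rangle$, the ML decoding performance is no worse.

Corollary 2.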

Consider an $(n,4)$ code $\mathcal{C}$ with $\sum_{i=1}^{6} |i|_{\mathcal{C}} = n$. There exists a code $\mathcal{C}'$ with $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ and $|3|_{\mathcal{C}'} + |5|_{\mathcal{C}'} + |6|_{\mathcal{C}'} + |7|_{\mathcal{C}'} = n$.

Proof. In this proof, we write $|i|_{\mathcal{C}}$ as $|i|$. Suppose at least two of $|1|, |2|, |4|$ are positive, since otherwise the proof is done. We argue the case that $|1|$ and $|2|$ are positive. Other cases can be converted to this case by interchanging rows. Write

$$w(c_2 \oplus c_3) = |2| + |3| + |4| + |5|,$$
$$w(c_2 \oplus c_4) = |1| + |3| + |4| + |6|,$$
$$w(c_3 \oplus c_4) = |1| + |2| + |5| + |6|.$$

We claim that one of the above three weights must be even. For example, assume $w(c_2 \oplus c_3)$ is odd. Then $|2| + |5|$ and $|3| + |4|$ are of different parity, so that one of $w(c_2 \oplus c_4)$ and $w(c_3 \oplus c_4)$ must be even.

Suppose $w(c_3 \oplus c_4)$ is even. As $|1|$ is positive, Theorem 1 implies a better code with $|1|$ smaller and $|3|$ bigger. Repeating the above argument, there exists a better code $\mathcal{C}'$ where at most one of $|1|_{\mathcal{C}'}, |2|_{\mathcal{C}'}, |4|_{\mathcal{C}'}$ is positive and $\sum_{i=1}^{6} |i|_{\mathcal{C}'} = n$. The corollary is proved by properly interchanging rows of $\mathcal{C}'$.

Corollary 3.

Consider a non-Class-I, nonlinear $(n,4)$ code $\mathcal{C}$ with $|3|_{\mathcal{C}} + |5|_{\mathcal{C}} + |6|_{\mathcal{C}} + |7|_{\mathcal{C}} = n$. There exists a code $\mathcal{C}'$ that is either linear or Class-I, with $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ and $|7|_{\mathcal{C}'} < |7|_{\mathcal{C}}$.

Proof. In this proof, we write $|i|_{\mathcal{C}}$ as $|i|$. Since $\mathcal{C}$ is nonlinear, $|7| > 0$. We claim that at least one of the following three weights is even:

$$w(c_1 \oplus c_2) = |5| + |6| + |7|, \tag{2}$$
$$w(c_1 \oplus c_3) = |3| + |6| + |7|, \tag{3}$$
$$w(c_1 \oplus c_4) = |3| + |5| + |7|. \tag{4}$$

When $|7|$ is odd, $|3|$, $|5|$ and $|6|$ are not of the same parity since $\mathcal{C}$ is not of Class-I, which implies at least one of (2), (3), (4) is even. By Theorem 1, there is a better code $\mathcal{C}_1$ with $|7|_{\mathcal{C}_1} = |7| - 1$ even and $|3|_{\mathcal{C}_1} + |5|_{\mathcal{C}_1} + |6|_{\mathcal{C}_1} + |7|_{\mathcal{C}_1} = n$.

When $|7|$ is even, if (2), (3), (4) are all odd, then $|5| + |6|$, $|3| + |6|$ and $|3| + |5|$ are all odd, which is not possible for any integers $|3|, |5|, |6|$. Hence at least one of (2), (3), (4) is even, and Theorem 1 implies a better code $\mathcal{C}_1$ with $|7|_{\mathcal{C}_1} = |7| - 1$ odd and $|3|_{\mathcal{C}_1} + |5|_{\mathcal{C}_1} + |6|_{\mathcal{C}_1} + |7|_{\mathcal{C}_1} = n$.

In both cases, a better code with $|7|$ strictly smaller always exists if $\mathcal{C}$ is non-Class-I and nonlinear. By repeating a similar argument on $\mathcal{C}_1$, we eventually obtain a better code $\mathcal{C}'$ which either has $|7|_{\mathcal{C}'} = 0$, i.e., is linear, or is of Class-I so that (2), (3), (4) are all odd.

Theorem 4.

Consider an $(n,4)$ code $\mathcal{C}$ with first two columns of the types $\langle 1 \rangle$ (resp. $\langle 2 \rangle$, $\langle 4 \rangle$) and $\langle 7 \rangle$. Let $\mathcal{C}'$ be the code obtained by replacing the first two columns of $\mathcal{C}$ with $\langle 3 \rangle$ and $\langle 5 \rangle$ (resp. $\langle 3 \rangle$ and $\langle 6 \rangle$, $\langle 5 \rangle$ and $\langle 6 \rangle$). Then $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.

Proof. See § VI.

Using the above two theorems, we can reduce the search range for an optimal $(n,4)$ code. Note that a linear $(n,4)$ code (subject to row interchanging) has $|3| + |5| + |6| = n$.

Definition 1. An $(n,4)$ code $\mathcal{C}$ is of Class-I if $|7|$ is odd, $|3|, |5|, |6|$ are of the same parity, and $|3| + |5| + |6| + |7| = n$.

Theorem 5.

An optimal $(n,4)$ code exists in the set formed by all the linear codes and Class-I codes.

Proof. Consider an arbitrary $(n,4)$ code $\mathcal{C}$. As column flipping does not change the ML decoding performance, we consider a code $\mathcal{C}_1$ with $\sum_{i=1}^{7} |i| = n$ obtained by column flipping of $\mathcal{C}$. We then discuss $\mathcal{C}_1$ in two cases.

If $0 < |7| \le |1| + |2| + |4|$ in $\mathcal{C}_1$, by Theorem 4, there exists a code $\mathcal{C}_2$ with $\lambda_{\mathcal{C}_2} \ge \lambda_{\mathcal{C}_1}$ and $\sum_{i=1}^{6} |i| = n$, obtained by replacing, one by one, pairs of columns of types $\langle 7 \rangle$ and $\langle s \rangle$ ($s = 1, 2, 4$). Following Corollary 2, there exists a code $\mathcal{C}_3$, no worse than $\mathcal{C}_2$, where $|3| + |5| + |6| + |7| = n$. Then by Corollary 3, there exists a code $\mathcal{C}_4$ that is either linear or Class-I such that $\lambda_{\mathcal{C}_4} \ge \lambda_{\mathcal{C}_3}$.

If $|1| + |2| + |4| < |7|$ in $\mathcal{C}_1$, by Theorem 4, there exists a better code $\mathcal{C}_2'$ with $|1| + |2| + |4| = 0$. By flipping columns, we can obtain a code $\mathcal{C}_3'$ with the same performance as $\mathcal{C}_2'$ that has $|7| > 0$ and $|3| + |5| + |6| + |7| = n$. Again, by Corollary 3, the proof is completed.

We have the following properties of Class-I codes.

Theorem 6.

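Let $\mathcal{C}$ be a Class-I $(n,4)$ code with $|7|_{\mathcal{C}} = 1$. Let $\mathcal{C}'$ be the code obtained by replacing the $\langle 7 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$, where $s = \arg\min_{i=3,5,6} |i|_{\mathcal{C}}$. Then $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.

Proof. See § V-B.

In the above theorem, code $\mathcal{C}'$ is linear.

Theorem 7.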

Let $\mathcal{C}$ be a Class-I $(n,4)$ code with $\min_{i=3,5,6} |i|_{\mathcal{C}} = 0$ or $1$. Let $\mathcal{C}'$ be the code obtained by replacing one $\langle 7 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$, where $s \in \{3, 5, 6\}$ has $|s|_{\mathcal{C}} = 0$ or $1$. Then $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.

Proof. See § V-C.

The above analysis enables us to obtain the following sufficient condition for the optimality of linear codes.

Theorem 8.

Fix a block length $n$. If for any Class-I $(n,4)$ code $\mathcal{C}$, there exists an $(n,4)$ code $\mathcal{C}'$ such that $|7|_{\mathcal{C}'} < |7|_{\mathcal{C}}$, $|3|_{\mathcal{C}'} + |5|_{\mathcal{C}'} + |6|_{\mathcal{C}'} + |7|_{\mathcal{C}'} = n$ and $\lambda_{\mathcal{C}} \le \lambda_{\mathcal{C}'}$, then linear $(n,4)$ codes are optimal.

Proof. Assume that all optimal $(n,4)$ codes are nonlinear. By Theorem 5, there must exist an optimal $(n,4)$ code $\mathcal{C}$ that is Class-I. By the condition of the theorem, there exists an optimal code $\mathcal{C}'$ such that $|7|_{\mathcal{C}'} < |7|_{\mathcal{C}}$ and $|3|_{\mathcal{C}'} + |5|_{\mathcal{C}'} + |6|_{\mathcal{C}'} + |7|_{\mathcal{C}'} = n$. If $|7|_{\mathcal{C}'} = 0$, then $\mathcal{C}'$ is linear and we get a contradiction to the assumption that all optimal $(n,4)$ codes are nonlinear. If $\mathcal{C}'$ is Class-I, we repeat the above argument. If $\mathcal{C}'$ is non-Class-I and nonlinear, then by Corollary 3, there exists an optimal code $\mathcal{C}''$ with $|7|_{\mathcal{C}''} < |7|_{\mathcal{C}'}$ that is either linear or Class-I. If $\mathcal{C}''$ is linear, we get a contradiction to the assumption. If $\mathcal{C}''$ is Class-I, we can repeat the above argument. As $|7|_{\mathcal{C}}$ is finite, the process will eventually stop with an optimal linear code, i.e., a contradiction to the assumption that all optimal $(n,4)$ codes are nonlinear.

Corollary 9.

For $n \le 8$, linear $(n,4)$ codes are optimal.

Proof. Fix $n \le 8$. For a Class-I $(n,4)$ code with $|7| = 1$, by Theorem 6, there exists a better linear code. For a Class-I $(n,4)$ code $\mathcal{C}$ with $|7| \ge 3$, we have $|3| + |5| + |6| \le 5$, which implies $\min\{|3|, |5|, |6|\} \le 1$. By Theorem 7, we have $\lambda_{\mathcal{C}} \le \lambda_{\mathcal{C}'}$ for the $(n,4)$ code $\mathcal{C}'$ obtained by replacing one $\langle 7 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$, where $s \in \{3, 5, 6\}$ with $|s|_{\mathcal{C}} = 0$ or $1$. By Theorem 8, linear $(n,4)$ codes are optimal for $n \le 8$.

For a general block length $n$, if we can verify the condition of Theorem 8, then there exists an optimal $(n,4)$ code that is linear. For each Class-I $(n,4)$ code $\mathcal{C}$, we can compare it with the code $\mathcal{C}'$ obtained by replacing one $\langle 7 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$ with $s = \arg\min_{i=3,5,6} |i|_{\mathcal{C}}$. An algorithm for verifying the optimality of linear $(n,4)$ codes is given in § V-D; its evaluation complexity is polynomial in $n$. We have verified that for $n \le 300$, linear codes are optimal.

III. AN APPROACH OF COMPARING TWO $(n,4)$ CODES

We further define some notation. Let $\mathcal{C}$ be an $(n,4)$ code with the $j$th codeword/row $c_j$, $j = 1, \ldots, 4$. Denote

$$d_j(y) = w(c_j \oplus y). \tag{5}$$

For a binary vector $y$, denote $(y)_i$ or $y_i$ as the $i$th entry of $y$. For example, the 3rd entry of $\langle 2 \rangle$ is $(\langle 2 \rangle)_3 = ([0\ 0\ 1\ 0]^\top)_3 = 1$. For $i = 0, 1, \ldots, 15$, define $w_i(y) = \sum_{j \in \{i\}_{\mathcal{C}}} y_j$ for $y \in \{0,1\}^n$. When $y$ is clear from the context, we write $w_i = w_i(y)$. For a vector $y \in \{0,1\}^n$, we can rewrite $d_j(y)$ defined in (5) as

$$d_j(y) = \sum_{i=0}^{15} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_j} \left( w_i - \frac{|i|}{2} \right) \right]. \tag{6}$$

We also write $d_j = d_j(y)$ when $y$ is clear from the context. For example, for $\mathcal{C}$ with columns of types $\langle 1 \rangle, \langle 2 \rangle, \ldots, \langle 7 \rangle$ only,

$$d_1(y) = w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7, \tag{7}$$
$$d_2(y) = w_1 + w_2 + w_3 + \bar{w}_4 + \bar{w}_5 + \bar{w}_6 + \bar{w}_7, \tag{8}$$
$$d_3(y) = w_1 + \bar{w}_2 + \bar{w}_3 + w_4 + w_5 + \bar{w}_6 + \bar{w}_7, \tag{9}$$
$$d_4(y) = \bar{w}_1 + w_2 + \bar{w}_3 + w_4 + \bar{w}_5 + w_6 + \bar{w}_7, \tag{10}$$

where $\bar{w}_i = |i|_{\mathcal{C}} - w_i$.

We compare $\mathcal{C}$ with another $(n,4)$ code $\mathcal{C}'$ obtained by modifying $\mathcal{C}$ as follows. Let $\mathcal{O}$ be a nonempty, proper subset of $\{1, 2, 3, 4\}$ and let $\mathcal{P}$ be its complement, which is also nonempty. Let $\mathcal{C}'$ be the code obtained by flipping the first $t$ bits of $c_i$ for each $i \in \mathcal{P}$. Denote by $c_i'$ the $i$th codeword/row of $\mathcal{C}'$, $i = 1, \ldots, 4$. For $y \in \{0,1\}^n$, let $f_t(y)$ be the vector obtained by flipping the first $t$ bits of $y$. We see that $c_i' = c_i$ for $i \in \mathcal{O}$ and $c_i' = f_t(c_i)$ for $i \in \mathcal{P}$.

Denote by $s_\tau$, $\tau = 1, 2, \ldots, t$, the $\tau$th column of $C$. For $y \in \{0,1\}^n$ and $i = 1, \ldots, 4$, let

$$d_i'(y) = d_i(y) + \sum_{\tau=1}^{t} (-1)^{(s_\tau)_i} (\bar{y}_\tau - y_\tau), \tag{11}$$

where $\bar{y}_\tau = y_\tau \oplus 1$, so that $d_i'(y) = w(f_t(c_i) \oplus y)$. For a nonempty subset $\mathcal{S} \subset \{1, \ldots, 4\}$, let $d_{\mathcal{S}}(y) = \min_{i \in \mathcal{S}} d_i(y)$ and $d_{\mathcal{S}}'(y) = \min_{i \in \mathcal{S}} d_i'(y)$. We have

$$d_{\mathcal{C}}(y) = \min\{ d_{\mathcal{O}}(y), d_{\mathcal{P}}(y) \}, \tag{12}$$
$$d_{\mathcal{C}'}(y) = \min\{ d_{\mathcal{O}}(y), d_{\mathcal{P}}'(y) \}, \tag{13}$$
$$d_{\mathcal{C}'}(f_t(y)) = \min\{ d_{\mathcal{O}}(f_t(y)), d_{\mathcal{P}}'(f_t(y)) \} = \min\{ d_{\mathcal{O}}'(y), d_{\mathcal{P}}(y) \}. \tag{14}$$

Our approach to comparing the ML decoding performance of $\mathcal{C}$ and $\mathcal{C}'$ is based on a pair of partitions $\{\mathcal{Y}_i, i = 1, \ldots, i_0\}$ and $\{\mathcal{Y}_i', i = 1, \ldots, i_0\}$ of $\{0,1\}^n$, where $i_0$ indicates the number of subsets in each partition. This pair of partitions satisfies the following properties: 1) for each $i$, $|\mathcal{Y}_i| = |\mathcal{Y}_i'|$, and 2) for each $i$, there exists a one-to-one and onto mapping $g_i : \mathcal{Y}_i \to \mathcal{Y}_i'$ such that one of the following conditions holds:
1) for all $y \in \mathcal{Y}_i$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(g_i(y))$;
2) for all $y \in \mathcal{Y}_i$, $d_{\mathcal{C}}(y) < d_{\mathcal{C}'}(g_i(y))$;
3) for all $y \in \mathcal{Y}_i$, $d_{\mathcal{C}}(y) > d_{\mathcal{C}'}(g_i(y))$.

Such a pair of partitions exists. For example, when $i_0 = 2^n$, $\mathcal{Y}_i = \mathcal{Y}_i' = \{\langle i \rangle^n\}$ for $i = 0, 1, \ldots, i_0 - 1$ form a pair of partitions satisfying the desired properties. But this example does not help to simplify the problem. For the two special cases used to prove Theorems 1 and 4, there exists such a pair of partitions with $i_0 = 5$.

In the following discussion, we write $\min\{a, b\}$ as $a \wedge b$. For a function $g : \{0,1\}^n \to \mathbb{R}$, we write $\{ y \in \{0,1\}^n : g(y) \ge 0 \}$ as $\{ g \ge 0 \}$ to simplify the notation.

IV. CHANGE OF ONE COLUMN

In this section, we study how the ML decoding performance is affected after changing one column of an $(n,4)$ code.

A. General Results

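Consider an $(n,4)$ code $\mathcal{C}$ with the first column $\langle s \rangle$, $0 \le s \le 15$. Let $\mathcal{C}'$ be the code formed by changing the first column of $\mathcal{C}$ to $\langle s' \rangle$. Following the notation in § III, $\mathcal{O}$ is the set of indices $j$ such that $(\langle s \rangle)_j = (\langle s' \rangle)_j$, and $\mathcal{P}$ is the set of indices $j$ such that $(\langle s \rangle)_j \ne (\langle s' \rangle)_j$. Assume $s' \ne s$ and $s' \ne 15 - s$, and hence both $\mathcal{O}$ and $\mathcal{P}$ are nonempty.

In this case, $d_i'$ defined in (11) becomes

$$d_i'(y) = d_i(y) + (-1)^{(\langle s \rangle)_i} (\bar{y}_1 - y_1). \tag{15}$$

Consider an example with $s = 1$ and $s' = 3$. Now $\mathcal{O} = \{1, 2, 4\}$ and $\mathcal{P} = \{3\}$. Substituting $\langle 1 \rangle$ into (15),

$$d_1'(y) = d_1(y) - y_1 + \bar{y}_1,$$
$$d_2'(y) = d_2(y) - y_1 + \bar{y}_1,$$
$$d_3'(y) = d_3(y) - y_1 + \bar{y}_1,$$
$$d_4'(y) = d_4(y) + y_1 - \bar{y}_1,$$

and hence

$$d_{\mathcal{O}}(y) = d_1 \wedge d_2 \wedge d_4, \tag{16}$$
$$d_{\mathcal{P}}(y) = d_3, \tag{17}$$
$$d_{\mathcal{O}}'(y) = [(d_1 \wedge d_2) - y_1 + \bar{y}_1] \wedge (d_4 + y_1 - \bar{y}_1), \tag{18}$$
$$d_{\mathcal{P}}'(y) = d_3 - y_1 + \bar{y}_1. \tag{19}$$

We are ready to form the partitions. Define the following subsets of $\{0,1\}^n$:

$$\mathcal{Y}_1 = \{ d_{\mathcal{O}} \le d_{\mathcal{P}} < d_{\mathcal{P}}' \} \cup \{ d_{\mathcal{O}} \le d_{\mathcal{P}}' \le d_{\mathcal{P}},\ d_{\mathcal{O}}' \le d_{\mathcal{P}}' \},$$
$$\mathcal{Y}_2 = \{ d_{\mathcal{P}} \le d_{\mathcal{P}}',\ d_{\mathcal{P}} < d_{\mathcal{O}} \} \cup \{ d_{\mathcal{P}}' < d_{\mathcal{P}} \le d_{\mathcal{O}},\ d_{\mathcal{P}} \le d_{\mathcal{O}}' \},$$
$$\mathcal{Y}_3 = \{ d_{\mathcal{P}}' = d_{\mathcal{O}}' < d_{\mathcal{P}} = d_{\mathcal{O}} \}, \tag{20}$$
$$\mathcal{Y}_4 = \{ d_{\mathcal{P}} = d_{\mathcal{P}}' = d_{\mathcal{O}} < d_{\mathcal{O}}' \},$$
$$\mathcal{Y}_5 = \{ d_{\mathcal{P}}' = d_{\mathcal{O}} < d_{\mathcal{O}}' = d_{\mathcal{P}} \}. \tag{21}$$

For $i = 2, 4, 5$, define

$$\mathcal{Y}_i' = \{ f_1(y) : y \in \mathcal{Y}_i \},$$

where the function $f_1$ (defined in § III) flips the first bit of a binary vector. The next lemma shows that both $\{\mathcal{Y}_i, i = 1, \ldots, 5\}$ and $\{\mathcal{Y}_1, \mathcal{Y}_2', \mathcal{Y}_3, \mathcal{Y}_4', \mathcal{Y}_5'\}$ are partitions of $\{0,1\}^n$ and satisfy the desired properties described in § III.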

Lemma 10.

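For the $(n,4)$ codes $\mathcal{C}$ and $\mathcal{C}'$ formulated above, both $\{\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3, \mathcal{Y}_4, \mathcal{Y}_5\}$ and $\{\mathcal{Y}_1, \mathcal{Y}_2', \mathcal{Y}_3, \mathcal{Y}_4', \mathcal{Y}_5'\}$ are partitions of $\{0,1\}^n$. Moreover,
1) for $y \in \mathcal{Y}_1$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$;
2) for $y \in \mathcal{Y}_2$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y') = d_{\mathcal{P}}$, where $y' \triangleq f_1(y) \in \mathcal{Y}_2'$;
3) for $y \in \mathcal{Y}_3$, $d_{\mathcal{C}}(y) = d_{\mathcal{P}} = d_{\mathcal{C}'}(y) + 1 = d_{\mathcal{P}}' + 1$;
4) for $y \in \mathcal{Y}_4$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}} = d_{\mathcal{C}'}(y') = d_{\mathcal{P}}$, where $y' \triangleq f_1(y) \in \mathcal{Y}_4'$;
5) for $y \in \mathcal{Y}_5$, $d_{\mathcal{C}}(y) + 1 = d_{\mathcal{O}} + 1 = d_{\mathcal{C}'}(y') = d_{\mathcal{P}}$, where $y' \triangleq f_1(y) \in \mathcal{Y}_5'$.

Proof. See Appendix A.

Now we move on to compare $\lambda_{\mathcal{C}}$ and $\lambda_{\mathcal{C}'}$ as defined in (1). Define for $i = 1, \ldots, 5$ and $d = 0, 1, \ldots, n$,

$$\alpha_d^i(\mathcal{C}) = |\{ y \in \mathcal{Y}_i : d_{\mathcal{C}}(y) = d \}|. \tag{22}$$

As $\{\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3, \mathcal{Y}_4, \mathcal{Y}_5\}$ is a partition of $\{0,1\}^n$, we have

$$\alpha_d(\mathcal{C}) = \sum_{i=1}^{5} \alpha_d^i(\mathcal{C}).$$

By the definition of $\mathcal{Y}_3$ and $\mathcal{Y}_5$, $\alpha_0^3(\mathcal{C}) = 0$ and $\alpha_n^5(\mathcal{C}) = 0$.

Theorem 11.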

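For two $(n,4)$ codes $\mathcal{C}$ and $\mathcal{C}'$ with only one column different, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ if and only if

$$\sum_{d=1}^{n} \left[ \alpha_d^3(\mathcal{C}) - \alpha_{d-1}^5(\mathcal{C}) \right] \left( \frac{\epsilon}{1-\epsilon} \right)^{d-1} \ge 0.$$

Proof.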

See Appendix A.
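The condition of Theorem 11 is a polynomial inequality in $\epsilon/(1-\epsilon)$ and is simple to evaluate once the counts $\alpha_d^3(\mathcal{C})$ and $\alpha_d^5(\mathcal{C})$ are available. The sketch below is illustrative only: the function name `theorem11_lhs` is an assumption, and the two lists are assumed to be supplied by the caller, indexed by $d = 0, \ldots, n$.

```python
def theorem11_lhs(alpha3, alpha5, eps):
    """Left-hand side of the Theorem 11 condition.

    alpha3[d] and alpha5[d] are the counts alpha^3_d(C) and alpha^5_d(C)
    for d = 0..n (both lists have length n + 1); eps is the crossover
    probability with 0 < eps < 1/2.
    """
    n = len(alpha3) - 1
    x = eps / (1 - eps)
    return sum((alpha3[d] - alpha5[d - 1]) * x ** (d - 1)
               for d in range(1, n + 1))

# Hypothetical counts for n = 2, chosen only to exercise the formula.
print(theorem11_lhs([0, 2, 0], [1, 0, 0], 0.25) >= 0)
```

A nonnegative return value corresponds to $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ at that particular $\epsilon$; it says nothing about other crossover probabilities.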

Corollary 12.

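For two $(n,4)$ codes $\mathcal{C}$ and $\mathcal{C}'$ with only one column different, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ if for $d = 1, \ldots, n$,

$$\sum_{i=1}^{d} \alpha_i^3(\mathcal{C}) \ge \sum_{i=0}^{d-1} \alpha_i^5(\mathcal{C}).$$

Proof.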

See Appendix A.

If we can compare $\mathcal{C}$ and $\mathcal{C}'$ based on Corollary 12, their relation is universal in the sense that it does not depend on $\epsilon$.

B. Proof of Theorem 1

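Now we give a proof of Theorem 1.

As interchanging rows/columns does not change the performance of $\mathcal{C}$, we only consider the following case when proving the theorem: $\mathcal{C}$ has the first column $\langle 1 \rangle$ and $w(c_3 \oplus c_4)$ is even. Let $\mathcal{C}'$ be the code obtained by replacing the first column of $\mathcal{C}$ by $\langle 3 \rangle$. Substituting $s = 1$ and $s' = 3$ into the discussion in § IV-A, we have $\mathcal{O} = \{1, 2, 4\}$ and $\mathcal{P} = \{3\}$, and hence

$$\mathcal{Y}_5 = \{ d_3' = d_{\{1,2,4\}} < d_3 = d_{\{1,2,4\}}' \}.$$

Assume $\mathcal{Y}_5$ is nonempty and fix $y \in \mathcal{Y}_5$. As $d_3'(y) = d_3(y) - y_1 + \bar{y}_1$, we have $y_1 = 1$. Further, due to

$$d_{\{1,2,4\}}(y) = d_1 \wedge d_2 \wedge d_4,$$
$$d_{\{1,2,4\}}'(y) = (d_1 - 1) \wedge (d_2 - 1) \wedge (d_4 + 1),$$

we have $d_{\{1,2,4\}} = d_4$ and hence $d_3 = d_4 + 1$. By (6),

$$d_3(y) + d_4(y) = \sum_{i} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_3} \left( w_i - \frac{|i|}{2} \right) \right] + \sum_{i} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_4} \left( w_i - \frac{|i|}{2} \right) \right]$$
$$= \sum_{i : (\langle i \rangle)_3 \ne (\langle i \rangle)_4} |i| + 2 \sum_{i : (\langle i \rangle)_3 = (\langle i \rangle)_4} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_3} \left( w_i - \frac{|i|}{2} \right) \right]$$
$$= w(c_3 \oplus c_4) + 2 \sum_{i : (\langle i \rangle)_3 = (\langle i \rangle)_4} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_3} \left( w_i - \frac{|i|}{2} \right) \right].$$

Each bracketed term in the last sum equals either $w_i$ or $|i| - w_i$ and is therefore an integer. As $w(c_3 \oplus c_4)$ is even, we see that $d_3 + d_4$ is even, which contradicts $d_3 = d_4 + 1$. Therefore, $\mathcal{Y}_5 = \emptyset$ and hence by Corollary 12, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.

V. ANALYSIS OF CLASS-I CODES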

Recall the definition of Class-I codes in Definition 1. In this section, we consider a Class-I $(n,4)$ code $\mathcal{C}$ with the first column $\langle 7 \rangle$. Let $\mathcal{C}'$ be the code obtained by replacing the first column of $\mathcal{C}$ with $\langle 3 \rangle$. The ML decoding performance of $\mathcal{C}$ and $\mathcal{C}'$ can be compared using the approach introduced in § IV-A.
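For very short block lengths, the comparison between a Class-I code and its modification can also be checked by exhaustive enumeration. The following sketch is not the paper's method: it builds codes from a list of column types (most significant bit in row 1, as in § II-B) and evaluates equation (1) directly. The helper names `column`, `build_code` and `correct_prob`, and the specific $n = 4$ example, are assumptions made for illustration.

```python
from itertools import product

def column(i):
    """Length-4 column <i>, most significant bit first."""
    return [(i >> (3 - j)) & 1 for j in range(4)]

def build_code(col_types):
    """Return the four codewords (rows) of the code with the given column types."""
    cols = [column(i) for i in col_types]
    return [tuple(col[j] for col in cols) for j in range(4)]

def correct_prob(code, eps):
    """Average correct ML decoding probability by brute force over all outputs."""
    n = len(code[0])
    total = 0.0
    for y in product((0, 1), repeat=n):
        d = min(sum(a != b for a, b in zip(c, y)) for c in code)
        total += (1 - eps) ** (n - d) * eps ** d
    return total / len(code)

# Class-I example with |7| = |3| = |5| = |6| = 1, and the code obtained
# by replacing its first column <7> with <3> (which is linear).
class_one = build_code([7, 3, 5, 6])
modified = build_code([3, 3, 5, 6])
print(correct_prob(class_one, 0.1), correct_prob(modified, 0.1))
```

In this small example the modified code is no worse at $\epsilon = 0.1$, which is consistent with Theorem 6.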

A. Characterizations of $\mathcal{Y}_3$ and $\mathcal{Y}_5$

Guided by Theorem 11 and Corollary 12, we first study $\mathcal{Y}_3$ and $\mathcal{Y}_5$ defined in (20) and (21).

Lemma 13.

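For a Class-I $(n,4)$ code $\mathcal{C}$ with the first column $\langle 7 \rangle$ and $\mathcal{C}'$ obtained by replacing the first column of $\mathcal{C}$ with $\langle 3 \rangle$,

$$\mathcal{Y}_3 = \{ \bar{y}_1 = 1,\ d_1 \ge d_3 \wedge d_4 = d_2 \},$$
$$\mathcal{Y}_5 = \{ \bar{y}_1 = 1,\ d_3 \wedge d_4 \ge d_1 + 2 = d_2 + 1 \}.$$

Proof.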

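For the codes $\mathcal{C}$ and $\mathcal{C}'$ defined above, we have $\mathcal{O} = \{1, 3, 4\}$ and $\mathcal{P} = \{2\}$, and the analogues of (16) – (19) hold with $d_{\mathcal{P}}' = d_2 - \bar{y}_1 + y_1$ and $d_{\mathcal{O}}' = (d_1 + \bar{y}_1 - y_1) \wedge [(d_3 \wedge d_4) - \bar{y}_1 + y_1]$. For $y \in \mathcal{Y}_3$, $d_2 - \bar{y}_1 + y_1 < d_2$ implies $\bar{y}_1 = 1$, and $d_2 = d_1 \wedge d_3 \wedge d_4$ and $d_2 - 1 = [(d_3 \wedge d_4) - 1] \wedge (d_1 + 1)$ together imply $d_3 \wedge d_4 \le d_1$ and $d_2 = d_3 \wedge d_4$. For $y \in \mathcal{Y}_5$, $d_2 - \bar{y}_1 + y_1 < d_2$ implies $\bar{y}_1 = 1$, and $d_2 - 1 = d_1 \wedge d_3 \wedge d_4$ and $d_2 = [(d_3 \wedge d_4) - 1] \wedge (d_1 + 1)$ together imply $d_1 + 1 \le (d_3 \wedge d_4) - 1$ and $d_2 - 1 = d_1$.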
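The characterization in Lemma 13 can be cross-checked against the set definitions (20) and (21) by exhaustive search on a short code. The sketch below is an illustration under stated assumptions: it reads the condition on the first output bit as $\bar{y}_1 = 1$, i.e., $y_1 = 0$, it takes $d_i'(y)$ to be the distance from $y$ to $c_i$ with its first bit flipped, and the helper names (`build_code`, `check_lemma13`) are hypothetical.

```python
from itertools import product

def build_code(col_types):
    """Rows of the (n,4) code whose columns are <i> (MSB in row 1)."""
    return [tuple((t >> (3 - j)) & 1 for t in col_types) for j in range(4)]

def dist(c, y):
    return sum(a != b for a, b in zip(c, y))

def check_lemma13(col_types):
    """col_types[0] must be 7. True if (20)/(21) agree with Lemma 13 for all y."""
    code = build_code(col_types)
    flipped = [(1 - c[0],) + c[1:] for c in code]  # f_1(c_i), used for d'_i
    O, P = (0, 2, 3), (1,)                         # zero-based {1,3,4} and {2}
    for y in product((0, 1), repeat=len(col_types)):
        d = [dist(c, y) for c in code]
        dp = [dist(c, y) for c in flipped]
        d_O, d_P = min(d[i] for i in O), min(d[i] for i in P)
        dp_O, dp_P = min(dp[i] for i in O), min(dp[i] for i in P)
        in_y3 = dp_P == dp_O < d_P == d_O          # definition (20)
        in_y5 = dp_P == d_O < dp_O == d_P          # definition (21)
        m34 = min(d[2], d[3])
        lem_y3 = y[0] == 0 and d[0] >= m34 == d[1]
        lem_y5 = y[0] == 0 and m34 >= d[0] + 2 == d[1] + 1
        if in_y3 != lem_y3 or in_y5 != lem_y5:
            return False
    return True

print(check_lemma13([7, 3, 5, 6]))
```

The check enumerates all $2^n$ outputs, so it is only meant for small $n$.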

1) Characterization of $\alpha_i^3$: For $y \in \mathcal{Y}_3$, by Lemma 10, $d_{\mathcal{C}}(y) = d_2$. By (7) – (10) and Lemma 13, we have the following necessary and sufficient condition for $y \in \mathcal{Y}_3$ with $d_{\mathcal{C}}(y) = i$: $\bar{y}_1 = 1$ and

$$\bar{w}_7 + w_3 = i - \bar{w}_5 - \bar{w}_6,$$
$$\bar{w}_7 - w_7 \le w_5 + w_6 - \bar{w}_5 - \bar{w}_6,$$
$$\bar{w}_3 - w_3 = \bar{w}_5 + \bar{w}_6 - (w_5 + \bar{w}_6) \wedge (\bar{w}_5 + w_6).$$

We discuss two cases according to whether $w_5 + \bar{w}_6 < \bar{w}_5 + w_6$ or not.

Define $\mathcal{Y}_A(i)$ as the collection of $y$ satisfying $\bar{y}_1 = 1$ and

$$w_5 + \bar{w}_6 < (|5| + |6|)/2, \tag{23}$$
$$\bar{w}_7 + \bar{w}_6 = i - (|3| + |5|)/2, \tag{24}$$
$$\bar{w}_5 + \bar{w}_6 - w_7 \le (|5| + |6| - |7|)/2, \tag{25}$$
$$w_3 + \bar{w}_5 = (|3| + |5|)/2. \tag{26}$$

We have

$$|\mathcal{Y}_A(i)| = \sum_{\bar{w}_7 \ge 1,\, w_3,\, w_5,\, w_6 :\ (23), (24), (25), (26)} \binom{|7| - 1}{\bar{w}_7 - 1} \binom{|3|}{w_3} \binom{|5|}{w_5} \binom{|6|}{w_6}.$$

Define $\mathcal{Y}_B(i)$ as the collection of $y$ satisfying $\bar{y}_1 = 1$ and

$$w_5 + \bar{w}_6 \ge (|5| + |6|)/2, \tag{27}$$
$$\bar{w}_7 + \bar{w}_5 = i - (|3| + |6|)/2, \tag{28}$$
$$\bar{w}_5 + \bar{w}_6 - w_7 \le (|5| + |6| - |7|)/2, \tag{29}$$
$$w_3 - w_6 = (|3| - |6|)/2. \tag{30}$$

We have

$$|\mathcal{Y}_B(i)| = \sum_{\bar{w}_7 \ge 1,\, w_3,\, w_5,\, w_6 :\ (27), (28), (29), (30)} \binom{|7| - 1}{\bar{w}_7 - 1} \binom{|3|}{w_3} \binom{|5|}{w_5} \binom{|6|}{w_6}. \tag{31}$$

We see that $\alpha_i^3 = |\mathcal{Y}_A(i)| + |\mathcal{Y}_B(i)|$.
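The counting idea behind these sums, namely that membership in $\mathcal{Y}_3$ depends on $y$ only through the per-type weights, can be sketched as follows. This is an illustrative variant rather than the paper's formulas: instead of the simplified constraints (23) – (30), it tests the Lemma 13 condition $d_1 \ge d_3 \wedge d_4 = d_2$ directly on the distances computed from $(w_7, w_3, w_5, w_6)$, and it assumes the first column is of type $\langle 7 \rangle$ with $y_1 = 0$. The function name `alpha3_by_weights` is hypothetical.

```python
from math import comb

def alpha3_by_weights(n7, n3, n5, n6):
    """Return [alpha^3_0, ..., alpha^3_n] for column counts |7|, |3|, |5|, |6|.

    w7 counts the ones of y among the type-<7> columns other than the first
    one (the first column is fixed to y_1 = 0), so 0 <= w7 <= n7 - 1.
    """
    n = n7 + n3 + n5 + n6
    alpha3 = [0] * (n + 1)
    for w7 in range(n7):
        for w3 in range(n3 + 1):
            for w5 in range(n5 + 1):
                for w6 in range(n6 + 1):
                    # Distances to c_1..c_4, following (7)-(10).
                    d1 = w3 + w5 + w6 + w7
                    d2 = w3 + (n5 - w5) + (n6 - w6) + (n7 - w7)
                    d3 = (n3 - w3) + w5 + (n6 - w6) + (n7 - w7)
                    d4 = (n3 - w3) + (n5 - w5) + w6 + (n7 - w7)
                    if d1 >= min(d3, d4) == d2:
                        # Number of outputs y with these per-type weights.
                        alpha3[d2] += (comb(n7 - 1, w7) * comb(n3, w3)
                                       * comb(n5, w5) * comb(n6, w6))
    return alpha3

print(alpha3_by_weights(1, 1, 1, 1))
```

The four nested loops visit at most $(n+1)^4$ weight tuples, in contrast to the $2^n$ outputs of a direct enumeration.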

2) Characterization of α i : For y ∈ Y , by Lemma 10, d C ( y ) = d − . By (7) – (10) and Lemma 13, we havethe following necessary and sufﬁcient condition for y ∈ Y with d C ( y ) = i : y = 1 and w + w = i + 1 − w − w ,w − w = w + w − w − w + 1 ,w − w ≥ w + w − ( w + w ) ∧ ( w + w ) + 1 , which can be further simpliﬁed as y = 1 and w = ( n + | | − / − i, (32) w + w − w = ( | | + | | − | | + 1) / , (33) w + w ≥ ( | | + | | ) / , (34) w − w ≥ ( | | − | | ) / . (35)Hence α i = X w ≥ ,w ,w ,w : (32) , (33) , (34) , (35) (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) . B. Class-I Codes with | | = 1 Following the discuss in the last subsection, we consider the special case with | | = 1 , and prove Theorem 6for the case | | = min {| | , | | , | |} . For other cases, we can perform row interchanging and column bit ﬂipping toconvert the problem to this case.When | | = 1 , w = 1 . Using the characterization in the last subsection, we have d − X i =0 α i = X W (cid:18) | | w (cid:19)(cid:18) | | | |−| | + w (cid:19)(cid:18) | | w (cid:19) (36)where W = ( (34) , (33) + (35) ,w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| | ) =  w + w ≥ | | + | | +1 ,w − w ≥ | |−| | +1 ,w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| |  . (37)Similarly, d X i =1 α i ≥ d X i =1 |Y B ( i ) | = X W (cid:18) | | w (cid:19)(cid:18) | | | |−| | + w (cid:19)(cid:18) | | w (cid:19) = X W ′ (cid:18) | | | |−| | + w ′ (cid:19)(cid:18) | | | |−| | + w ′ (cid:19)(cid:18) | | | |−| | + w ′ (cid:19) (38) where W = ( (27) + (30) , (29) + (30) ,w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| | ) =  w + w ≥ | | + | | w − w ≤ | |−| |− w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| |  , W ′ = ( w ′ + w ′ ≥ | | + | | , w ′ − w ′ ≥ | |−| | +12 w ′ ≥ n + | | +12 − d, | |−| | ≤ w ′ ≤ | | + | | , | |−| | ≤ w ′ ≤ | | + | | ) , and (38) is obtained by change of variables w ′ − | | = w − | | and w ′ − | | = w − | | .We show that W ⊂ W ′ . 
Due to | | ≤ | | , we have | |−| | ≤ and | | + | | ≥ | | . For ( w , w ) ∈ W , we have w + w ≥ | | + | | + 1 , w − w ≥ | |−| | + 1 and ≤ w ≤ | | , which implies | |−| | + 1 ≤ w ≤ | | + | | − .Thus | | − | | ≤ w ≤ | | + | | , | | − | | ≤ w ≤ | | + | | , showing ( w , w ) ∈ W ′ .By Lemma 14 in Appendix B, for ( w , w ) ∈ W , we have (cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ (cid:18) | | | |−| | + w (cid:19)(cid:18) | | | |−| | + w (cid:19) . Comparing (36) and (38), we obtain P di =1 α i ≥ P d − i =0 α i for any d = 1 , . . . , n . By Corollary 12, λ C ′ ≥ λ C ,proving Theorem 6. C. Class-I Codes with | | odd, | | = 0 , Here we give a proof of Theorem 7 for the case | | = min {| | , | | , | |} . Otherwise, we can perform rowinterchanging and column bit ﬂipping (which do not change the ML decoding performance) so that C satisﬁes thecondition. | | = 0 : When | | = 0 , we have w = 0 and n is odd. By (32), α i = 0 if i = n − . So when d < n +12 , P d − i =0 α i = 0 and hence P di =1 α i ≥ P d − i =0 α i ; when d ≥ n +12 , d − X i =0 α i = α n − = X w ≥ , (33) + (34) , (34) ,w = w − w + | | + | |−| | +12 ,w ≤ | | − (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ X w ≥ , (33) + (34) , (34) (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | +1 ,w − w ≤ | |−| |− (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) , (39)where (33) + (34) is w − w ≤ | |−| |− . 
Substituting w ′ = | | − w + 1 into (39), we obtain α n − ≤ X ≤ w ′ ≤| | ,w ≥ | | +1 ,w ′ ≤ | | + | |− − w (cid:18) | | − w ′ − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) (40) When d ≥ n +12 , we further have d X i =1 α i ≥ n +12 X i =1 α i ≥ n +12 X i =1 |Y B ( i ) | = X w ≥ , (27) + (30) , (29) + (30) ,w − w ≤ | |−| | +12 (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | ,w ≤ | |−| |− + w (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) , (41)where (27) + (30) is w ≥ | | and (29) + (30) is w − w ≤ | |−| | , which is equivalent to w − w ≤ | |−| |− as | | − | | is odd.As | |−| |− + w ≥ | | + | |− − w when w ≥ | | , comparing the RHS’ of (40) and (41), we have P di =1 α i ≥ P d − i =0 α i for d ≥ n +12 . By Corollary 12, λ C ′ ≥ λ C , proving the case when | | = 0 . | | = 1 : When | | = 1 , we have w = 0 or , | | and | | are odd, and n is even. In (32), w = 1 when i = n − , and hence α n − = X w ≥ ,w ≥ | | +12 , (32) + (35) ,w = w − w + | | + | |−| | +12 ,w ≤ | |− (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ X w ≥ ,w ≥ | | +12 , (32) + (35) (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | +12 ,w − w ≤ | |−| |− (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) , (42) = X | |≥ w ′ ≥ ,w ≥ | | +12 ,w ′ ≤ | | + | | − w (cid:18) | | − w ′ − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) . (43)where (32) + (35) is w − w ≤ | |−| |− , and (43) is obtained by substituting w ′ = | | − w + 1 into (42). 
In (32), w = 0 when i = n , and hence α n = X w ≥ ,w ≥ | | +32 , (33) + (35) ,w = w − w + | | + | |−| | +12 ,w ≤ | |− (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ X w ≥ ,w ≥ | | +32 , (33) + (35) (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | +32 ,w − w ≤ | |−| | − (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) (44) = X | |≥ w ′ ≥ ,w ≥ | | +32 ,w ′ ≤− w + | | + | |− (cid:18) | | − w ′ − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) . (45)where (33) + (35) means w − w ≤ | |−| | − , and (45) is obtained by substituting w ′ = | | − w + 1 into (44).Following (31), we have (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) = X w ≥ , (27) + (30) , (29) + (30) ,w − w ≤ n − | | −| | (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | |− ,w ≤ w + | |−| |− (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) (46)where (27) + (30) implies w ≥ | |− , (29) + (30) implies w − w ≤ | |−| |− , and (46) follows that | |−| |− ≤ n − | | − | | . Since w + | |−| |− ≥ − w + | | + | | when w ≥ | | +12 , comparing the RHS’ of (43) and (46),we get α n − ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) . (47)Following (31), we have (cid:12)(cid:12)(cid:12)(cid:12) { w = 0 } ∩ (cid:18) ∪ i ≤ n +1 Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) = X w ≥ , (27) + (30) , (29) + (30) ,w − w ≤ n +1 − | | −| | (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w : w ≥ | | +12 ,w ≤ w + | |−| | +12 (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) , (48) where (27) + (30) implies w ≥ | | +12 , (29) + (30) implies w − w ≤ | |−| | +12 , and (48) follows that | |−| | +12 ≤ n + 1 − | | − | | . 
Since w + | |−| | +12 ≥ − w + | | + | |− when w ≥ | | +12 , comparing the RHS’ of (45) and(48), we get α n ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 0 } ∩ (cid:18) ∪ i ≤ n +1 Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) . (49)When d < n , P d − i =0 α i = 0 ≤ P di =1 α i . When d = n , by (47), d − X i =0 α i = α n − ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n X i =1 α i . When d ≥ n + 1 , by (47) and (49), d − X i =0 α i = α n − + α n ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) { w = 0 } ∩ (cid:18) ∪ i ≤ n +1 Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ d X i =1 α i . Thus we have P di =1 α i ≥ P d − i =0 α i for ≤ d ≤ n . By Corollary 12, λ C ′ ≥ λ C , proving the case when | | = 1 . D. Algorithm for Verifying Optimal Codes

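Based on Theorem 8, we give an algorithm for verifying whether linear $(n,4)$ codes are optimal for a fixed $n$ (see the pseudocode in Algorithm 1). The algorithm checks each Class-I $(n,4)$ code $\mathcal{C}$, specified by its column counts $(|7|_{\mathcal{C}}, |3|_{\mathcal{C}}, |5|_{\mathcal{C}}, |6|_{\mathcal{C}})$, with the first column being $\langle 7 \rangle$ and $|3|_{\mathcal{C}} \le |5|_{\mathcal{C}} \le |6|_{\mathcal{C}}$, and compares $\mathcal{C}$ with $\mathcal{C}'$ obtained by replacing the first column of $\mathcal{C}$ with $\langle 3 \rangle$. Other $(n,4)$ Class-I codes can be converted to ones of the above type by flipping columns and interchanging rows, and hence do not need to be checked again. By Theorem 8, if for each code $\mathcal{C}$ checked by the algorithm we have $\sum_{i=1}^{d} \alpha_i^3(\mathcal{C}) \ge \sum_{i=0}^{d-1} \alpha_i^5(\mathcal{C})$ for $d = 1, \ldots, n$, which implies $\lambda_{\mathcal{C}} \le \lambda_{\mathcal{C}'}$ by Corollary 12, then linear $(n,4)$ codes are optimal.

Running Algorithm 1, we have verified that for $n \le 300$, linear codes are optimal. The total number of types of Class-I codes to evaluate is polynomial in $n$. For each type, there are fewer than $n$ values of each of $\alpha_i^3$ and $\alpha_i^5$ to evaluate, each of which has complexity polynomial in $n$. Therefore, the overall complexity of the algorithm is polynomial in $n$.

Algorithm 1: Check optimality of linear $(n,4)$ codes
  input : $n$
  output: $a$ (if $a = -1$, linear $(n,4)$ codes are optimal)
  Initialize $a = -1$;
  for $n_7 = 3, 5, 7, \ldots$ and $n_7 \le n$ do
    for $n_3 = 2, 3, \ldots, \lfloor (n - n_7)/3 \rfloor$ do
      for $n_5 = n_3, n_3 + 2, \ldots$ and $n_5 \le \lfloor (n - n_7 - n_3)/2 \rfloor$ do
        for $n_6 = n_5, n_5 + 2, \ldots$ and $n_6 \le n - n_7 - n_3 - n_5$ do
          Compute $\alpha_i^3$ and $\alpha_i^5$, $i = 0, \ldots, n$, for code $\mathcal{C}$ with $|7|_{\mathcal{C}} = n_7$, $|3|_{\mathcal{C}} = n_3$, $|5|_{\mathcal{C}} = n_5$, $|6|_{\mathcal{C}} = n_6$;
          if $\sum_{i=1}^{d} \alpha_i^3 < \sum_{i=0}^{d-1} \alpha_i^5$ for some $d \in \{1, \ldots, n\}$ then
            $a = 1$; break;
          end
        end
      end
    end
  end

VI. PROOF OF THEOREM 4

The proof of Theorem 4 follows the approach introduced in § III. Let $\mathcal{C}$ be an $(n,4)$ code with the first two columns $\langle 1 \rangle$ and $\langle 7 \rangle$. Let $\mathcal{C}'$ be the code obtained by flipping the first two bits of $c_3$, so that the first two columns of $\mathcal{C}'$ are $\langle 3 \rangle$ and $\langle 5 \rangle$. (Other cases of Theorem 4 can be converted to this case by interchanging rows.)

Following the notation in § III, $\mathcal{O} = \{1, 2, 4\}$, $\mathcal{P} = \{3\}$, and

$$d_i'(y) = d_i(y) + (-1)^{(\langle 1 \rangle)_i} (\bar{y}_1 - y_1) + (-1)^{(\langle 7 \rangle)_i} (\bar{y}_2 - y_2).$$

When $y_1 = y_2$, we have

$$d_{\mathcal{P}}'(y) = d_{\mathcal{P}}(y). \tag{50}$$

When $y_1 \ne y_2$, we have $d_1'(y) = d_1(y)$, $d_4'(y) = d_4(y)$, and

$$d_2'(y) - d_2(y) = d_3'(y) - d_3(y) = \pm 2, \tag{51}$$

and hence

$$(d_{\mathcal{O}}'(y) - d_{\mathcal{O}}(y))(d_{\mathcal{P}}'(y) - d_{\mathcal{P}}(y)) = (d_{\{1,2,4\}}'(y) - d_{\{1,2,4\}}(y))(d_3'(y) - d_3(y)) \ge 0. \tag{52}$$

Define the following subsets of $\{0,1\}^n$:

$$\mathcal{Y}_1 = \{ y_1 = y_2 \},$$
$$\mathcal{Y}_2 = \{ y_1 \ne y_2,\ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d_{\mathcal{P}}' \},$$
$$\mathcal{Y}_3 = \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d_{\mathcal{P}}',\ d_{\mathcal{P}} \le d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \},$$
$$\mathcal{Y}_4 = \{ y_1 \ne y_2,\ d_{\mathcal{P}}' < d_{\mathcal{O}} \wedge d_{\mathcal{O}}' < d_{\mathcal{P}} \},$$
$$\mathcal{Y}_5 = \{ y_1 \ne y_2,\ d_{\mathcal{O}}' \le d_{\mathcal{P}} \wedge d_{\mathcal{P}}' < d_{\mathcal{O}},\ d_{\mathcal{O}}' < d_{\mathcal{P}} \}.$$

Recall the function $f_2$ defined in § III that flips the first two bits of a binary vector. For $i = 3, 4$, let

$$\mathcal{Y}_i' = \{ f_2(y) : y \in \mathcal{Y}_i \}.$$

We justify that $\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3, \mathcal{Y}_4, \mathcal{Y}_5$ form a partition of $\{0,1\}^n$ and that $\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3', \mathcal{Y}_4', \mathcal{Y}_5$ form a partition of $\{0,1\}^n$.

First, we show that

$$\mathcal{Y}_4 \cup \mathcal{Y}_5 = \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d_{\mathcal{P}}',\ d_{\mathcal{P}} > d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \}, \tag{53}$$

and then we obtain $\bigcup_{i=1}^{5} \mathcal{Y}_i = \{0,1\}^n$. Moreover, $\mathcal{Y}_1, \ldots, \mathcal{Y}_5$ are all disjoint by checking the definition. Thus $\mathcal{Y}_1, \ldots, \mathcal{Y}_5$ form a partition of $\{0,1\}^n$.

To show (53), since $\mathcal{Y}_4 \subseteq \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d_{\mathcal{P}}' \}$, we have

$$\mathcal{Y}_4 = \{ y_1 \ne y_2,\ d_{\mathcal{P}}' < d_{\mathcal{O}} \wedge d_{\mathcal{O}}' < d_{\mathcal{P}} \} \cap \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d_{\mathcal{P}}' \}$$
$$= \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d_{\mathcal{P}}',\ d_{\mathcal{P}} > d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \} \cap \{ d_{\mathcal{O}} \wedge d_{\mathcal{O}}' > d_{\mathcal{P}}' \}. \tag{54}$$

Denote

$$\mathcal{A} = \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d_{\mathcal{P}}',\ d_{\mathcal{P}} > d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \} \cap \{ d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \le d_{\mathcal{P}}' \}. \tag{55}$$

For $y \in \mathcal{A}$, we have $d_{\mathcal{O}}(y) > d_{\mathcal{O}}'(y)$, which implies $d_{\mathcal{P}}(y) > d_{\mathcal{P}}'(y)$ by (51) and (52), and hence

$$d_{\mathcal{O}}'(y) \le d_{\mathcal{P}}(y) \wedge d_{\mathcal{P}}'(y) < d_{\mathcal{O}}(y), \quad d_{\mathcal{O}}'(y) < d_{\mathcal{P}}(y).$$

Thus we have $y \in \mathcal{Y}_5$ and then $\mathcal{A} \subseteq \mathcal{Y}_5$. For $y \in \mathcal{Y}_5$, we have $d_{\mathcal{O}}(y) > d_{\mathcal{O}}'(y)$ by the definition above, which implies $d_{\mathcal{P}}(y) > d_{\mathcal{P}}'(y)$ by (51) and (52). Then we obtain

$$d_{\mathcal{P}}(y) > d_{\mathcal{O}}(y) \wedge d_{\mathcal{O}}'(y) = d_{\mathcal{O}}'(y) \le d_{\mathcal{P}}'(y) < d_{\mathcal{O}}(y).$$

Thus $y \in \mathcal{A}$ and then $\mathcal{Y}_5 \subseteq \mathcal{A}$. Therefore, $\mathcal{Y}_5 = \mathcal{A}$. From (54) and (55), we obtain (53).

We further show that

$$\mathcal{Y}_3' \cup \mathcal{Y}_4' \subseteq \mathcal{Y}_3 \cup \mathcal{Y}_4. \tag{56}$$

Since $f_2$ is a one-to-one mapping, we get $\mathcal{Y}_3' \cup \mathcal{Y}_4' = \mathcal{Y}_3 \cup \mathcal{Y}_4$. Therefore, $\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3', \mathcal{Y}_4', \mathcal{Y}_5$ form a partition of $\{0,1\}^n$.

To show (56), we see

$$\mathcal{Y}_4' = \{ y_1 \ne y_2,\ d_{\mathcal{P}} < d_{\mathcal{O}} \wedge d_{\mathcal{O}}' < d_{\mathcal{P}}' \} \subseteq \mathcal{Y}_3, \tag{57}$$

and

$$\mathcal{Y}_3' \setminus \mathcal{Y}_4 = \{ y_1 \ne y_2,\ d_{\mathcal{O}}' > d_{\mathcal{P}} \wedge d_{\mathcal{P}}',\ d_{\mathcal{P}}' \le d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \} \cap ( \{ d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \ge d_{\mathcal{P}} \} \cup \{ d_{\mathcal{P}}' \ge d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \} )$$
$$= \{ y_1 \ne y_2,\ d_{\mathcal{O}}' > d_{\mathcal{P}} \wedge d_{\mathcal{P}}',\ d_{\mathcal{P}}' \le d_{\mathcal{O}} \wedge d_{\mathcal{O}}',\ d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \ge d_{\mathcal{P}} \} \cup \{ y_1 \ne y_2,\ d_{\mathcal{O}}' > d_{\mathcal{P}} \wedge d_{\mathcal{P}}',\ d_{\mathcal{P}}' \le d_{\mathcal{O}} \wedge d_{\mathcal{O}}',\ d_{\mathcal{P}}' \ge d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \}$$
$$= \{ y_1 \ne y_2,\ d_{\mathcal{P}} \wedge d_{\mathcal{P}}' < \max(d_{\mathcal{P}}, d_{\mathcal{P}}') \le d_{\mathcal{O}} \wedge d_{\mathcal{O}}' \} \cup \{ y_1 \ne y_2,\ d_{\mathcal{O}} \wedge d_{\mathcal{O}}' = d_{\mathcal{P}}',\ d_{\mathcal{P}} \wedge d_{\mathcal{P}}' < d_{\mathcal{O}}' \}, \tag{58}$$

where in the last equality $d_{\mathcal{P}} \wedge d_{\mathcal{P}}' < \max(d_{\mathcal{P}}, d_{\mathcal{P}}')$ follows from (51). By (52), when $y_1 \ne y_2$, if $d_{\mathcal{O}} < d_{\mathcal{O}}'$, then $d_{\mathcal{P}} < d_{\mathcal{P}}'$; and if $d_{\mathcal{P}} > d_{\mathcal{P}}'$, then $d_{\mathcal{O}} \ge d_{\mathcal{O}}'$. Hence, we can verify that both sets in the union in (58) are subsets of $\mathcal{Y}_3$. Therefore, $\mathcal{Y}_3' \setminus \mathcal{Y}_4 \subset \mathcal{Y}_3$, which together with (57) proves (56).

Moreover, we prove the following claims:
1) For $y \in \mathcal{Y}_1$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y)$;
2) For $y \in \mathcal{Y}_2$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$;
3) For $y \in \mathcal{Y}_3$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(f_2(y)) = d_{\mathcal{P}}$;
4) For $y \in \mathcal{Y}_4$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}} \wedge d_{\mathcal{P}} \ge d_{\mathcal{C}'}(f_2(y)) = d_{\mathcal{O}}'$;
5) For $y \in \mathcal{Y}_5$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}} \wedge d_{\mathcal{P}} \ge d_{\mathcal{C}'}(y) = d_{\mathcal{P}}'$.

Following a similar argument as in § IV-A, we can show that $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.

The above claims are justified as follows:
1) For $y \in \mathcal{Y}_1$, as $y_1 = y_2$, we have $d_{\mathcal{P}}' = d_{\mathcal{P}}$ by (50), and hence $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}'(y) = d_{\mathcal{C}'}(y)$.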

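2) For $y \in \mathcal{Y}_2$, by the definition of $\mathcal{Y}_2$, we have $d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}$, and hence $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$.

3) For $y \in \mathcal{Y}_3$, we have $d_{\mathcal{P}} \le d_{\mathcal{O}} \wedge d'_{\mathcal{O}}$ by the definition of $\mathcal{Y}_3$. We then have $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$ and $d_{\mathcal{C}'}(f(y)) = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$.

4) For $y \in \mathcal{Y}_4$, by the definition of $\mathcal{Y}_4$, the first two bits of $y$ satisfy the condition of (51) and $d'_{\mathcal{P}} < d_{\mathcal{O}} \wedge d'_{\mathcal{O}} < d_{\mathcal{P}}$. By (52), $d_{\mathcal{O}} \wedge d'_{\mathcal{O}} = d'_{\mathcal{O}}$, which implies $d_{\mathcal{C}'}(f(y)) = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d'_{\mathcal{O}}(y)$, and hence $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) \ge d'_{\mathcal{O}}(y) = d_{\mathcal{C}'}(f(y))$.

5) For $y \in \mathcal{Y}_5$, by the definition of $\mathcal{Y}_5$, the first two bits of $y$ satisfy the condition of (51) and $d'_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} < d_{\mathcal{O}}$. By (51) and (52), $d'_{\mathcal{P}} < d_{\mathcal{P}}$. Then we have $d_{\mathcal{C}}(y) = d_{\mathcal{O}} \wedge d_{\mathcal{P}} \ge d_{\mathcal{O}} \wedge d'_{\mathcal{P}} = d_{\mathcal{C}'}(y)$ and $d_{\mathcal{C}'}(y) = d'_{\mathcal{P}}(y)$.

VII. CONCLUDING REMARKS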

It would be of interest to prove in general whether linear codes of four codewords are optimal. One further research direction is to extend the technique for comparing the decoding performance of two codes to codes with more than four codewords.

APPENDIX A

PROOFS

Proof of Lemma 10.

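By checking the definitions, we see that $\mathcal{Y}_1, \ldots, \mathcal{Y}_5$ are pairwise disjoint. To show that they form a partition, we verify that
$$\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5 = \{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\}, \qquad \mathcal{Y}_2 \cup \mathcal{Y}_3 = \{d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\},$$
and hence $(\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5) \cup (\mathcal{Y}_2 \cup \mathcal{Y}_3) = \{0,1\}^n$.

We first prove that $\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5 = \{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\}$. Notice that the three sets can be rewritten as
$$\mathcal{Y}_1 = \{d_{\mathcal{O}} \le d_{\mathcal{P}} < d'_{\mathcal{P}}\} \cup \{d_{\mathcal{O}} \le d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} \le d'_{\mathcal{P}}\} = \big(\{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d_{\mathcal{P}} < d'_{\mathcal{P}}\}\big) \cup \big(\{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} \le d'_{\mathcal{P}}\}\big), \tag{59}$$
$$\mathcal{Y}_4 = \{d_{\mathcal{P}} = d'_{\mathcal{P}} = d_{\mathcal{O}} < d'_{\mathcal{O}}\} \overset{(a)}{=} \{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} = d'_{\mathcal{P}}\}, \tag{60}$$
$$\mathcal{Y}_5 = \{d'_{\mathcal{P}} = d_{\mathcal{O}} < d'_{\mathcal{O}} = d_{\mathcal{P}}\} \overset{(b)}{=} \{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} > d'_{\mathcal{P}}\}. \tag{61}$$
For all $y$, we have
$$|d_{\mathcal{S}}(y) - d'_{\mathcal{S}}(y)| \le 1, \tag{62}$$
which follows from the definition. Then if $d_{\mathcal{O}} \le d'_{\mathcal{P}} < d'_{\mathcal{O}}$, we have $d_{\mathcal{O}} = d'_{\mathcal{P}}$, and thus the equality (a) in (60) holds. Furthermore, if $d'_{\mathcal{O}} > d'_{\mathcal{P}}$, we have $d_{\mathcal{O}} \ge d'_{\mathcal{P}}$ by (62), and thus the equality (b) in (61) holds. By (60) and (61), we have
$$\mathcal{Y}_4 \cup \mathcal{Y}_5 = \{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} > d'_{\mathcal{P}}\}.$$
From (59), this further implies $\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5 = \{d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\}$.

We now prove $\mathcal{Y}_2 \cup \mathcal{Y}_3 = \{d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\}$. First we rewrite the two sets as
$$\mathcal{Y}_2 = \{d_{\mathcal{P}} \le d'_{\mathcal{P}},\ d_{\mathcal{P}} < d_{\mathcal{O}}\} \cup \{d'_{\mathcal{P}} < d_{\mathcal{P}} \le d_{\mathcal{O}},\ d_{\mathcal{P}} \le d'_{\mathcal{O}}\} = \big(\{d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d_{\mathcal{P}} \le d'_{\mathcal{P}}\}\big) \cup \big(\{d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d_{\mathcal{P}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} \le d_{\mathcal{O}},\ d_{\mathcal{P}} \le d'_{\mathcal{O}}\}\big) \overset{(a)}{=} \big(\{d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d_{\mathcal{P}} \le d'_{\mathcal{P}}\}\big) \cup \big(\{d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d_{\mathcal{P}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} \le d'_{\mathcal{O}}\}\big), \tag{63}$$
$$\mathcal{Y}_3 = \{d'_{\mathcal{P}} = d'_{\mathcal{O}} < d_{\mathcal{P}} = d_{\mathcal{O}}\} \overset{(b)}{=} \{d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}}\} \cap \{d_{\mathcal{P}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} > d'_{\mathcal{O}}\}. \tag{64}$$
By (62), we get $d_{\mathcal{P}} \le d_{\mathcal{O}}$ if $d_{\mathcal{O}} > d'_{\mathcal{P}}$ and $d_{\mathcal{P}} > d'_{\mathcal{P}}$. Then the equality (a) in (63) holds. Similarly, we can justify (b) in (64) by (62).

By definition,
$$\mathcal{Y}'_2 = \{d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{P}} < d'_{\mathcal{O}}\} \cup \{d_{\mathcal{P}} < d'_{\mathcal{P}} \le d'_{\mathcal{O}},\ d'_{\mathcal{P}} \le d_{\mathcal{O}}\}, \quad \mathcal{Y}'_4 = \{d_{\mathcal{P}} = d'_{\mathcal{P}} = d'_{\mathcal{O}} < d_{\mathcal{O}}\}, \quad \mathcal{Y}'_5 = \{d_{\mathcal{P}} = d'_{\mathcal{O}} < d_{\mathcal{O}} = d'_{\mathcal{P}}\}.$$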
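It can be verified that $(\mathcal{Y}'_2 \cup \mathcal{Y}'_4 \cup \mathcal{Y}'_5) \cap (\mathcal{Y}_1 \cup \mathcal{Y}_3) = \emptyset$. As $f$ is a one-to-one mapping, $\mathcal{Y}'_2 \cup \mathcal{Y}'_4 \cup \mathcal{Y}'_5 = \mathcal{Y}_2 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5$. Hence, we conclude that $\mathcal{Y}_1, \mathcal{Y}'_2, \mathcal{Y}_3, \mathcal{Y}'_4, \mathcal{Y}'_5$ form a partition of $\{0,1\}^n$.

We use the following facts in the proof of claims 1) to 5), where $y' = f(y)$:
$$d_{\mathcal{C}}(y) = \min\{d_{\mathcal{O}}(y), d_{\mathcal{P}}(y)\}, \qquad d_{\mathcal{C}'}(y) = \min\{d_{\mathcal{O}}(y), d'_{\mathcal{P}}(y)\}, \qquad d_{\mathcal{C}'}(f(y)) = \min\{d'_{\mathcal{O}}(y), d_{\mathcal{P}}(y)\}.$$

To prove claim 1), for $y \in \mathcal{Y}_1$, by the definition of $\mathcal{Y}_1$, $d_{\mathcal{O}} \le \min\{d_{\mathcal{P}}, d'_{\mathcal{P}}\}$. Hence $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$.

To prove claim 2), for $y \in \mathcal{Y}_2$, by the definition of $\mathcal{Y}_2$, $d_{\mathcal{P}} \le \min\{d_{\mathcal{O}}, d'_{\mathcal{O}}\}$, and hence $d_{\mathcal{C}}(y) = d_{\mathcal{P}}$. Further, $d_{\mathcal{C}'}(y') = d_{\mathcal{O}}(y') \wedge d'_{\mathcal{P}}(y') = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$.

To prove claim 3), for $y \in \mathcal{Y}_3$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$ by the definition of $\mathcal{Y}_3$. Moreover, $d_{\mathcal{C}'}(y) = d_{\mathcal{O}}(y) \wedge d'_{\mathcal{P}}(y) = d'_{\mathcal{P}}(y) < d_{\mathcal{C}}(y)$. By (62), we have $d_{\mathcal{P}} = d'_{\mathcal{P}} + 1$.

To prove claim 4), for $y \in \mathcal{Y}_4$, by the definition of $\mathcal{Y}_4$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$ and $d_{\mathcal{C}'}(y') = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$.

To prove claim 5), for $y \in \mathcal{Y}_5$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{O}}(y)$ by the definition of $\mathcal{Y}_5$. Moreover, $d_{\mathcal{C}'}(y') = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d'_{\mathcal{O}}(y) > d_{\mathcal{C}}(y)$. By (62), we have $d'_{\mathcal{O}} = d_{\mathcal{O}} + 1$.

Proof of Theorem 11. As $\{\mathcal{Y}_1, \mathcal{Y}'_2, \mathcal{Y}_3, \mathcal{Y}'_4, \mathcal{Y}'_5\}$ is a partition of $\{0,1\}^n$, we have $\alpha_d(\mathcal{C}') = \sum_{i=1}^{5} \alpha^i_d(\mathcal{C}')$, where
$$\alpha^1_d(\mathcal{C}') = |\{y \in \mathcal{Y}_1 : d_{\mathcal{C}'}(y) = d\}| = \alpha^1_d(\mathcal{C}),$$
$$\alpha^2_d(\mathcal{C}') = |\{y \in \mathcal{Y}'_2 : d_{\mathcal{C}'}(y) = d\}| = \alpha^2_d(\mathcal{C}),$$
$$\alpha^3_d(\mathcal{C}') = |\{y \in \mathcal{Y}_3 : d_{\mathcal{C}'}(y) = d\}| = \begin{cases} \alpha^3_{d+1}(\mathcal{C}) & d < n, \\ 0 & d = n, \end{cases}$$
$$\alpha^4_d(\mathcal{C}') = |\{y \in \mathcal{Y}'_4 : d_{\mathcal{C}'}(y) = d\}| = \alpha^4_d(\mathcal{C}),$$
$$\alpha^5_d(\mathcal{C}') = |\{y \in \mathcal{Y}'_5 : d_{\mathcal{C}'}(y) = d\}| = \begin{cases} \alpha^5_{d-1}(\mathcal{C}) & d \ge 1, \\ 0 & d = 0. \end{cases}$$
The second equality in each line follows from Lemma 10.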
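For small block lengths, the counts $\alpha_d(\mathcal{C}) = |\{y : d_{\mathcal{C}}(y) = d\}|$ that drive this bookkeeping can be tabulated by brute force. The following Python sketch is only an illustration and not the authors' implementation: the function names and the two example codes of block length 3 are hypothetical choices, and the sketch assumes that (22) has the form $\lambda_{\mathcal{C}} = \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \alpha_d(\mathcal{C}) (1-\epsilon)^{n-d} \epsilon^d$, which is the form suggested by the proof of Theorem 11.

```python
from itertools import product
from math import isclose


def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))


def distance_profile(code, n):
    """alpha[d] = number of y in {0,1}^n whose nearest codeword is at distance d."""
    alpha = [0] * (n + 1)
    for y in product((0, 1), repeat=n):
        alpha[min(hamming(y, c) for c in code)] += 1
    return alpha


def success_from_profile(code, n, eps):
    """Assumed form of (22): (1/|C|) * sum_d alpha_d * (1-eps)^(n-d) * eps^d."""
    alpha = distance_profile(code, n)
    return sum(a * (1 - eps) ** (n - d) * eps ** d
               for d, a in enumerate(alpha)) / len(code)


def success_direct(code, n, eps):
    """ML success probability computed directly: (1/|C|) * sum_y max_c P(y | c)."""
    total = 0.0
    for y in product((0, 1), repeat=n):
        total += max((1 - eps) ** (n - hamming(y, c)) * eps ** hamming(y, c)
                     for c in code)
    return total / len(code)


# Hypothetical examples for n = 3 (not taken from the paper).
linear = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # a subspace of {0,1}^3
other = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]   # not a subspace

print(distance_profile(linear, 3))  # [4, 4, 0, 0]
print(distance_profile(other, 3))   # [4, 3, 1, 0]
```

For $\epsilon < 1/2$ the two success-probability functions agree, because the most likely codeword is then a nearest one in Hamming distance. In this toy case the linear code has the larger success probability.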
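Together with (22), we write
$$\lambda_{\mathcal{C}'} - \lambda_{\mathcal{C}} = \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \big(\alpha_d(\mathcal{C}') - \alpha_d(\mathcal{C})\big)(1-\epsilon)^{n-d}\epsilon^{d} = \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \sum_{i=1}^{5} \big(\alpha^i_d(\mathcal{C}') - \alpha^i_d(\mathcal{C})\big)(1-\epsilon)^{n-d}\epsilon^{d} = \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \sum_{i=3,5} \big(\alpha^i_d(\mathcal{C}') - \alpha^i_d(\mathcal{C})\big)(1-\epsilon)^{n-d}\epsilon^{d}.$$
By substituting $\alpha^3_d(\mathcal{C}') = \alpha^3_{d+1}(\mathcal{C})$ and $\alpha^5_d(\mathcal{C}') = \alpha^5_{d-1}(\mathcal{C})$ (with the conventions $\alpha^3_{n+1}(\mathcal{C}) = 0$ and $\alpha^5_{-1}(\mathcal{C}) = 0$), we see that $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ if and only if
$$\sum_{d=0}^{n} \big[\alpha^3_{d+1}(\mathcal{C}) - \alpha^3_d(\mathcal{C}) + \alpha^5_{d-1}(\mathcal{C}) - \alpha^5_d(\mathcal{C})\big] \left(\frac{\epsilon}{1-\epsilon}\right)^{d} \ge 0,$$
where the LHS can be further simplified as
$$\sum_{d=1}^{n} \big[\alpha^3_d(\mathcal{C}) - \alpha^5_{d-1}(\mathcal{C})\big] \left(\frac{\epsilon}{1-\epsilon}\right)^{d-1} \left(1 - \frac{\epsilon}{1-\epsilon}\right).$$
The simplification uses $\alpha^3_0(\mathcal{C}) = 0$ and $\alpha^5_n(\mathcal{C}) = 0$, which follow from claims 3) and 5) of Lemma 10. The theorem is proved by checking that in the above argument, the relation $\ge$ can be replaced by $>$.

Proof of Corollary 12.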

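Let $\bar{\epsilon} = \frac{\epsilon}{1-\epsilon}$ and let $\Psi_d = \sum_{i=1}^{d} \big[\alpha^3_i(\mathcal{C}) - \alpha^5_{i-1}(\mathcal{C})\big]$ for $d = 1, \ldots, n$ and $\Psi_0 = 0$. Write
$$\sum_{d=1}^{n} \big[\alpha^3_d(\mathcal{C}) - \alpha^5_{d-1}(\mathcal{C})\big] \left(\frac{\epsilon}{1-\epsilon}\right)^{d-1} = \sum_{d=1}^{n} (\Psi_d - \Psi_{d-1})\, \bar{\epsilon}^{\,d-1} = \Psi_n \bar{\epsilon}^{\,n-1} + \sum_{d=1}^{n-1} \Psi_d \big(\bar{\epsilon}^{\,d-1} - \bar{\epsilon}^{\,d}\big).$$
Note that for $0 < \epsilon < 1/2$, $\bar{\epsilon}^{\,d} = \left(\frac{\epsilon}{1-\epsilon}\right)^{d}$ is a strictly decreasing function of $d$, so every coefficient of $\Psi_d$ on the right-hand side is positive. Hence the right-hand side is nonnegative whenever $\Psi_d \ge 0$ for all $d$. By Theorem 11, we can prove the sufficient conditions of the corollary.

APPENDIX B

A LEMMA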

A similar result has been proved in [4]. Here we provide a proof for completeness.

Lemma 14.

Let ˆ w i = w i − | i | , i = 3 , . The inequality to prove becomes (cid:18) | | | | + ˆ w (cid:19)(cid:18) | | | | + ˆ w (cid:19) ≤ (cid:18) | | | | + ˆ w (cid:19)(cid:18) | | | | + ˆ w (cid:19) . We have by the deﬁnition of W in (37), ˆ w + ˆ w ≥ , ˆ w − ˆ w ≥ , ˆ w ≥ n + 12 − d. We write (cid:0) | | | | + ˆ w (cid:1)(cid:0) | | | | + ˆ w (cid:1)(cid:0) | | | | + ˆ w (cid:1)(cid:0) | | | | + ˆ w (cid:1) = | |··· ( | | − ˆ w +1)( | | + ˆ w )! | |··· ( | | − ˆ w +1)( | | + ˆ w )! | |··· ( | | − ˆ w +1)( | | + ˆ w )! | |··· ( | | − ˆ w +1)( | | + ˆ w )! = ( | | − ˆ w ) · · · ( | | − ˆ w + 1)( | | + ˆ w ) · · · ( | | + ˆ w + 1) · ( | | + ˆ w ) · · · ( | | + ˆ w + 1)( | | − ˆ w ) · · · ( | | − ˆ w + 1)= ˆ w − ˆ w Y i =1 | | − ˆ w + i | | + ˆ w + i | | + ˆ w + i | | − ˆ w + i = ˆ w − ˆ w Y i =1 ( | | + i )( | | + i ) − ˆ w ˆ w + ( | | + i ) ˆ w − ˆ w ( | | + i )( | | + i )( | | + i ) − ˆ w ˆ w − ( | | + i ) ˆ w + ˆ w ( | | + i ) ≤ , where the last inequality is obtained by comparing the last two terms of the denominator and the nominator: (cid:16) | | + i (cid:17) ˆ w − ˆ w (cid:16) | | + i (cid:17) − (cid:16) − (cid:16) | | + i (cid:17) ˆ w + ˆ w (cid:16) | | + i (cid:17)(cid:17) = ( ˆ w + ˆ w ) (cid:18) | | − | | (cid:19) ≤ where the inequality follows from ˆ w + ˆ w ≥ and | | ≤ | | .R EFERENCES[1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,”

IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] D. Slepian, “A class of binary signaling alphabets,” Bell System Technical Journal, vol. 35, no. 1, pp. 203–234, 1956.
[3] W. W. Peterson and E. J. Weldon Jr., Error-Correcting Codes. MIT Press, 1972.
[4] P.-N. Chen, H.-Y. Lin, and S. M. Moser, “Optimal ultrasmall block-codes for binary discrete memoryless channels,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7346–7378, 2013.
[5] J. Cordaro and T. Wagner, “Optimum (n, 2) codes for small values of channel error probability (corresp.),”