On Optimal Finite-length Binary Codes of Four Codewords for Binary Symmetric Channels
Yanyan Dong and Shenghao Yang
The Chinese University of Hong Kong, Shenzhen
Abstract
Finite-length binary codes of four codewords are studied for memoryless binary symmetric channels (BSCs) with maximum likelihood decoding. For any block length, the best linear codes of four codewords have been explicitly characterized, but whether linear codes are better than nonlinear codes is unknown in general. In this paper, we show that for any block length, there exists an optimal code of four codewords that is either linear or in a subset of nonlinear codes, called Class-I codes. Based on the analysis of Class-I codes, we derive sufficient conditions under which linear codes are optimal. For block length less than or equal to 8, our analytical results show that linear codes are optimal. For block-length up to , numerical evaluations show that linear codes are optimal.
I. INTRODUCTION
A binary code of block length $n$ and codebook size $k$ is called an $(n, k)$ code, which is said to be linear if it is a subspace of $\{0,1\}^n$. Linear codes have been extensively studied in coding theory. For memoryless binary symmetric channels (BSCs), asymptotically capacity-achieving linear codes with low encoding/decoding complexity have been designed, for example polar codes [1]. For given $n$ and $k$, however, whether linear codes are optimal among all $(n, k)$ codes for BSCs under maximum likelihood (ML) decoding is a long-standing open problem, traced back to the early days of information and coding theory [2], [3].
For given $n$ and $k$, if perfect or quasi-perfect binary $(n, k)$ codes exist, they are optimal for the BSC [3]. For example, the optimal $(n, 2)$ codes are either perfect or quasi-perfect and hence are known [2]. Readers can find more about optimal codes in [2], [3]. More recently, Chen, Lin and Moser [4] gave the first proof of the optimal binary codes of 3 codewords for any block length $n$.
In this paper, we study binary $(n, 4)$ codes. The best linear $(n, 4)$ codes have been explicitly characterized for each block length $n$ [4], [5], and are conjectured to be optimal among all $(n, 4)$ codes under ML decoding [4]. We derive a general approach for comparing the ML decoding performance of two $(n, 4)$ codes that differ only slightly. Based on this approach, we verify that linear $(n, 4)$ codes are optimal for a range of $n$.
In particular, we show that for any block length $n$, there exists an optimal $(n, 4)$ code that is either linear or in a subset of nonlinear codes, called Class-I codes. Based on the analysis of Class-I codes, we derive sufficient conditions under which linear codes are optimal. For $n \le 8$, our analytical results show that linear codes are optimal. For $n$ up to , numerical evaluations show that linear codes are optimal, where the evaluation complexity is polynomial in $n$.
Moreover, most ML decoding comparison results obtained in this paper about $(n, 4)$ codes are universal in the sense that they do not depend on the crossover probability of the BSC.
In the remainder of this paper, we first formulate the problem and introduce our main results. In § III, we outline a general approach for comparing the ML decoding performance of two codes, of which two special cases are used in this paper: two codes that differ in one codeword in one bit (see § IV) or in two bits (see § VI). § V is dedicated to the analysis of Class-I codes, based on the results in § IV.
II. PROBLEM FORMULATION AND MAIN RESULTS
A. Formulation of $(n, k)$ Codes
An $(n, k)$ binary code $\mathcal{C}$ is a subset of $\{0,1\}^n$ of size $k$, and is said to be linear if it is a subspace of $\{0,1\}^n$. Using the codewords of $\mathcal{C}$ as rows, we can form a $k \times n$ binary matrix $C$, which is used interchangeably with $\mathcal{C}$. For $i = 1, \ldots, k$, let $c_i$ be the $i$th row of $C$, i.e., a codeword of $\mathcal{C}$.
For $x, y \in \{0,1\}^n$, let $w(x)$ be the Hamming weight of $x$ and let $x \oplus y$ be the bit-wise exclusive OR of $x$ and $y$, so that $w(x \oplus y)$ is the Hamming distance between $x$ and $y$. Let
$$d_{\mathcal{C}}(y) = \min_{c \in \mathcal{C}} w(c \oplus y).$$
Consider communication over a memoryless binary symmetric channel (BSC) with crossover probability $\epsilon$ ($0 < \epsilon < 1/2$). For a channel input $x \in \{0,1\}^n$, the channel output is $y \in \{0,1\}^n$ with probability
$$p(y \mid x) = (1-\epsilon)^{n - w(x \oplus y)} \epsilon^{w(x \oplus y)}.$$
Suppose an $(n, k)$ code $\mathcal{C}$ is used for this BSC. The maximum-likelihood (ML) decoding rule decodes an output $y$ to a codeword $c$ if $w(c \oplus y) = d_{\mathcal{C}}(y)$, where a tie is resolved arbitrarily. Define
$$\alpha_d(\mathcal{C}) = |\{ y \in \{0,1\}^n : d_{\mathcal{C}}(y) = d \}|,$$
which is the number of outputs $y$ that are decoded to a codeword at distance $d$. Note that the value $\alpha_d(\mathcal{C})$ does not depend on $\epsilon$. The (average) correct decoding probability of $\mathcal{C}$ is
$$\lambda_{\mathcal{C}} = \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \alpha_d(\mathcal{C}) (1-\epsilon)^{n-d} \epsilon^{d}. \tag{1}$$
We say an $(n, k)$ code $\mathcal{C}$ is better or no worse than another $(n, k)$ code $\mathcal{C}'$ if $\lambda_{\mathcal{C}} \ge \lambda_{\mathcal{C}'}$. We say an $(n, k)$ code $\mathcal{C}$ is optimal if it is no worse than any other $(n, k)$ code. If valid for all $\epsilon$, a property of a code is said to be universal.
B. Main Results about $(n, 4)$ Codes
In this paper, we focus on $(n, 4)$ codes, which have four codewords. The columns of an $(n, 4)$ code $\mathcal{C}$ are vectors in $\{0,1\}^4$. We use $\langle i \rangle_k$ to denote the binary vector of length $k$ associated with an integer $i \ge 0$. When the length of the vector is implied by the context, the subscript is omitted. For example, $\langle 2 \rangle = [0\ 0\ 1\ 0]^\top$. We use $\{i\}_{\mathcal{C}}$ to denote the index set of the columns of $\mathcal{C}$ equal to $\langle i \rangle$, and let $|i|_{\mathcal{C}}$ be the size of $\{i\}_{\mathcal{C}}$. We may write $|i|_{\mathcal{C}}$ as $|i|$ when the code $\mathcal{C}$ is implied by the context. For example, the $(7, 4)$ code
$$C = \begin{bmatrix} 0&0&0&0&0&0&0 \\ 0&0&0&1&1&1&1 \\ 0&1&1&0&0&1&1 \\ 1&0&1&0&1&0&1 \end{bmatrix}$$
has the $i$th column of type $\langle i \rangle$ and $|i| = 1$ for $i = 1, \ldots, 7$.
The analysis of the column types of $\mathcal{C}$ has been used in the literature [4], [5]. Chen, Lin and Moser [4] compared different codes by induction on $n$, i.e., adding one column at a time. Here, we compare two codes of the same length that differ in one or two positions of one codeword, and we find that it is also convenient to use the column representation in our analysis. The following facts about $(n, 4)$ codes are straightforward [4], [5]. First, codes with all-zero columns are not optimal. Second, flipping all the bits in a column does not change the decoding performance. Third, row and column permutations of $C$ do not affect the decoding performance. Due to these facts, we only need to consider $\mathcal{C}$ with the 7 column types $\langle 1 \rangle, \langle 2 \rangle, \ldots, \langle 7 \rangle$ when searching for an optimal code.
Theorem 1. Consider an $(n, 4)$ code $\mathcal{C}$ of codewords $c_1, \ldots, c_4$ with $w(c_s \oplus c_t)$ even for certain $1 \le s \ne t \le 4$, and with a column of type $\langle 2^{4-s} \rangle$. Let $\mathcal{C}'$ be the code obtained by replacing a column of type $\langle 2^{4-s} \rangle$ of $\mathcal{C}$ by $\langle 2^{4-s} + 2^{4-t} \rangle$. Then, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.
Proof. See § IV-B.
For example, suppose an $(n, 4)$ code $\mathcal{C}$ has a column $\langle 1 \rangle$ and $w(c_3 \oplus c_4)$ is even. The above theorem says that if we replace a column of type $\langle 1 \rangle$ of $\mathcal{C}$ by $\langle 3 \rangle$, the ML decoding performance is no worse.
Corollary 2.
Consider an $(n, 4)$ code $\mathcal{C}$ with $\sum_{i=1}^{6} |i|_{\mathcal{C}} = n$. There exists a code $\mathcal{C}'$ with $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ and $|1|_{\mathcal{C}'} + |3|_{\mathcal{C}'} + |5|_{\mathcal{C}'} + |6|_{\mathcal{C}'} = n$.
Proof. In this proof, we write $|i|_{\mathcal{C}}$ as $|i|$. Suppose at least two of $|1|, |2|, |4|$ are positive, since otherwise the proof is done. We argue the case that $|1|$ and $|2|$ are positive. Other cases can be converted to this case by interchanging rows. Write
$$w(c_2 \oplus c_3) = |2| + |3| + |4| + |5|,$$
$$w(c_2 \oplus c_4) = |1| + |3| + |4| + |6|,$$
$$w(c_3 \oplus c_4) = |1| + |2| + |5| + |6|.$$
We claim that one of the above three weights must be even. For example, assume $w(c_3 \oplus c_4)$ is odd. Then $|1| + |6|$ and $|2| + |5|$ are of different parity, so that one of $w(c_2 \oplus c_3)$ and $w(c_2 \oplus c_4)$ must be even.
Suppose $w(c_3 \oplus c_4)$ is even (the other cases are similar). As $|1|$ is positive, Theorem 1 implies a better code with $|1|$ smaller and $|3|$ bigger. Repeating the above argument, there exists a better code $\mathcal{C}'$ where at most one of $|1|_{\mathcal{C}'}, |2|_{\mathcal{C}'}, |4|_{\mathcal{C}'}$ is positive and $\sum_{i=1}^{6} |i|_{\mathcal{C}'} = n$. The corollary is proved by properly interchanging rows of $\mathcal{C}'$.
Corollary 3.
Consider a non-Class-I, nonlinear $(n, 4)$ code $\mathcal{C}$ with $|1|_{\mathcal{C}} + |3|_{\mathcal{C}} + |5|_{\mathcal{C}} + |6|_{\mathcal{C}} = n$. There exists a code $\mathcal{C}'$, either linear or Class-I, with $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ and $|1|_{\mathcal{C}'} < |1|_{\mathcal{C}}$.
Proof. In this proof, we write $|i|_{\mathcal{C}}$ as $|i|$. Since $\mathcal{C}$ is nonlinear, $|1| > 0$. We claim that at least one of the following three weights is even:
$$w(c_1 \oplus c_4) = |1| + |3| + |5|, \tag{2}$$
$$w(c_2 \oplus c_4) = |1| + |3| + |6|, \tag{3}$$
$$w(c_3 \oplus c_4) = |1| + |5| + |6|. \tag{4}$$
When $|1|$ is odd, $|3|$, $|5|$ and $|6|$ are not of the same parity since $\mathcal{C}$ is not of Class-I, which implies at least one of (2), (3), (4) is even. By Theorem 1, there is a better code $\mathcal{C}_1$ with $|1|_{\mathcal{C}_1} = |1| - 1$ even and $|1|_{\mathcal{C}_1} + |3|_{\mathcal{C}_1} + |5|_{\mathcal{C}_1} + |6|_{\mathcal{C}_1} = n$.
When $|1|$ is even, if (2), (3), (4) are all odd, then $|3| + |5|$, $|3| + |6|$ and $|5| + |6|$ are all odd, which is not possible for any integers $|3|, |5|, |6|$. Hence at least one of (2), (3), (4) is even, and Theorem 1 implies a better code $\mathcal{C}_1$ with $|1|_{\mathcal{C}_1} = |1| - 1$ odd and $|1|_{\mathcal{C}_1} + |3|_{\mathcal{C}_1} + |5|_{\mathcal{C}_1} + |6|_{\mathcal{C}_1} = n$.
In both cases, a better code with $|1|$ strictly smaller always exists if $\mathcal{C}$ is non-Class-I and nonlinear. By repeating a similar argument on $\mathcal{C}_1$, we eventually obtain a better code $\mathcal{C}'$ which either has $|1|_{\mathcal{C}'} = 0$, i.e., is linear, or is of Class-I so that (2), (3), (4) are all odd.
Theorem 4.
Consider an $(n, 4)$ code $\mathcal{C}$ with first two columns of the types $\langle 1 \rangle$ (resp. $\langle 2 \rangle$, $\langle 4 \rangle$) and $\langle 7 \rangle$. Let $\mathcal{C}'$ be the code obtained by replacing the first two columns of $\mathcal{C}$ with $\langle 3 \rangle$ and $\langle 5 \rangle$ (resp. $\langle 3 \rangle$ and $\langle 6 \rangle$, $\langle 5 \rangle$ and $\langle 6 \rangle$). Then $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.
Proof. See § VI.
Using the above two theorems, we can reduce the search range for an optimal $(n, 4)$ code. Note that a linear $(n, 4)$ code (subject to row interchanging) has $|3| + |5| + |6| = n$.
Definition 1. An $(n, 4)$ code $\mathcal{C}$ is of Class-I if $|1|$ is odd, $|3|, |5|, |6|$ are of the same parity, and $|1| + |3| + |5| + |6| = n$.
Theorem 5.
An optimal $(n, 4)$ code exists in the set formed by all the linear codes and Class-I codes.
Proof. Consider an arbitrary $(n, 4)$ code $\mathcal{C}$. As column flipping does not change the ML decoding performance, we consider a code $\mathcal{C}_0$ with $\sum_{i=1}^{7} |i| = n$ obtained by column flipping of $\mathcal{C}$. We then discuss $\mathcal{C}_0$ in two cases.
If $|7| \le |1| + |2| + |4|$ in $\mathcal{C}_0$, then by Theorem 4 there exists a code $\mathcal{C}_1$ with $\lambda_{\mathcal{C}_1} \ge \lambda_{\mathcal{C}_0}$ and $\sum_{i=1}^{6} |i| = n$, obtained by replacing, one by one, pairs of columns of types $\langle 7 \rangle$ and $\langle s \rangle$ ($s = 2^0, 2^1, 2^2$); when $|7| = 0$ no replacement is needed and $\mathcal{C}_1 = \mathcal{C}_0$. Following Corollary 2, there exists a code $\mathcal{C}_2$, no worse than $\mathcal{C}_1$, where $|1| + |3| + |5| + |6| = n$. Then by Corollary 3, there exists a code $\mathcal{C}_3$, either linear or Class-I, such that $\lambda_{\mathcal{C}_3} \ge \lambda_{\mathcal{C}_2}$.
If $|1| + |2| + |4| < |7|$ in $\mathcal{C}_0$, then by Theorem 4 there exists a better code $\mathcal{C}'$ with $|1| + |2| + |4| = 0$. By flipping columns and interchanging rows, we can obtain a code $\mathcal{C}''$ with the same performance as $\mathcal{C}'$ that has $|1| > 0$ and $|1| + |3| + |5| + |6| = n$. Again, by Corollary 3, the proof is completed.
We have the following properties of Class-I codes.
Theorem 6.
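Let $\mathcal{C}$ be a Class-I $(n, 4)$ code with $|1|_{\mathcal{C}} = 1$. Let $\mathcal{C}'$ be the code obtained by replacing the $\langle 1 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$, where $s = \arg\min_{i = 3, 5, 6} |i|_{\mathcal{C}}$. Then $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.
Proof. See § V-B.
In the above theorem, code $\mathcal{C}'$ is linear.
Theorem 7. Let $\mathcal{C}$ be a Class-I $(n, 4)$ code with $\min_{i = 3, 5, 6} |i|_{\mathcal{C}} = 0$ or $1$. Let $\mathcal{C}'$ be the code obtained by replacing one $\langle 1 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$, where $s \in \{3, 5, 6\}$ has $|s|_{\mathcal{C}} = 0$ or $1$. Then $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.
Proof. See § V-C.
The above analysis enables us to obtain the following sufficient condition for the optimality of linear codes.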
Theorem 8.
Fix a block length $n$. If for any Class-I $(n, 4)$ code $\mathcal{C}$, there exists an $(n, 4)$ code $\mathcal{C}'$ such that $|1|_{\mathcal{C}'} < |1|_{\mathcal{C}}$, $|1|_{\mathcal{C}'} + |3|_{\mathcal{C}'} + |5|_{\mathcal{C}'} + |6|_{\mathcal{C}'} = n$ and $\lambda_{\mathcal{C}} \le \lambda_{\mathcal{C}'}$, then linear $(n, 4)$ codes are optimal.
Proof. Assume that all optimal $(n, 4)$ codes are nonlinear. By Theorem 5, there must exist an optimal $(n, 4)$ code $\mathcal{C}$ that is Class-I. By the condition of the theorem, there exists an optimal code $\mathcal{C}'$ such that $|1|_{\mathcal{C}'} < |1|_{\mathcal{C}}$ and $|1|_{\mathcal{C}'} + |3|_{\mathcal{C}'} + |5|_{\mathcal{C}'} + |6|_{\mathcal{C}'} = n$. If $|1|_{\mathcal{C}'} = 0$, then $\mathcal{C}'$ is linear and we get a contradiction to the assumption that all optimal $(n, 4)$ codes are nonlinear. If $\mathcal{C}'$ is Class-I, we repeat the above argument. If $\mathcal{C}'$ is non-Class-I and nonlinear, then by Corollary 3, there exists an optimal code $\mathcal{C}''$ with $|1|_{\mathcal{C}''} < |1|_{\mathcal{C}'}$ that is either linear or Class-I. If $\mathcal{C}''$ is linear, we get a contradiction to the assumption. If $\mathcal{C}''$ is Class-I, we can repeat the above argument. As $|1|_{\mathcal{C}}$ is finite, the process will eventually stop with an optimal linear code, i.e., a contradiction to the assumption that all optimal $(n, 4)$ codes are nonlinear.
Corollary 9.
For $n \le 8$, linear $(n, 4)$ codes are optimal.
Proof. Fix $n \le 8$. For a Class-I $(n, 4)$ code with $|1| = 1$, by Theorem 6, there exists a better linear code. For a Class-I $(n, 4)$ code $\mathcal{C}$ with $|1| \ge 3$, we have $|3| + |5| + |6| \le 5$, which implies $\min\{|3|, |5|, |6|\} \le 1$. By Theorem 7, we have $\lambda_{\mathcal{C}} \le \lambda_{\mathcal{C}'}$ for the $(n, 4)$ code $\mathcal{C}'$ obtained by replacing one $\langle 1 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$, where $s \in \{3, 5, 6\}$ with $|s|_{\mathcal{C}} = 0$ or $1$. By Theorem 8, linear $(n, 4)$ codes are optimal for $n \le 8$.
For a general block length $n$, if we can verify the condition of Theorem 8, then there exists an optimal $(n, 4)$ code that is linear. For each Class-I $(n, 4)$ code $\mathcal{C}$, we can compare it with the code $\mathcal{C}'$ obtained by replacing one $\langle 1 \rangle$ column of $\mathcal{C}$ by $\langle s \rangle$ with $s = \arg\min_{i = 3, 5, 6} |i|_{\mathcal{C}}$. See § V-D for an algorithm that verifies the optimality of linear $(n, 4)$ codes, whose evaluation complexity is polynomial in $n$. We have verified that for n ≤ , linear codes are optimal.
III. AN APPROACH OF COMPARING TWO $(n, 4)$ CODES
We further define some notation. Let $\mathcal{C}$ be an $(n, 4)$ code with the $j$th codeword/row $c_j$, $j = 1, \ldots, 4$. Denote
$$d_j(y) = w(c_j \oplus y). \tag{5}$$
For a binary vector $y$, denote by $(y)_i$ or $y_i$ the $i$th entry of $y$. For example, the 3rd entry of $\langle 2 \rangle$ is $(\langle 2 \rangle)_3 = ([0\ 0\ 1\ 0]^\top)_3 = 1$. For $i = 0, 1, \ldots, 15$, define $w_i(y) = \sum_{j \in \{i\}_{\mathcal{C}}} y_j$ for $y \in \{0,1\}^n$. When $y$ is clear from the context, we write $w_i = w_i(y)$. For a vector $y \in \{0,1\}^n$, we can rewrite $d_j(y)$ defined in (5) as
$$d_j(y) = \sum_{i=0}^{15} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_j} \left( w_i - \frac{|i|}{2} \right) \right]. \tag{6}$$
We also write $d_j = d_j(y)$ when $y$ is clear from the context. For example, for $\mathcal{C}$ with columns of types only $\langle 1 \rangle, \langle 2 \rangle, \ldots, \langle 7 \rangle$,
$$d_1(y) = w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7, \tag{7}$$
$$d_2(y) = w_1 + w_2 + w_3 + \bar{w}_4 + \bar{w}_5 + \bar{w}_6 + \bar{w}_7, \tag{8}$$
$$d_3(y) = w_1 + \bar{w}_2 + \bar{w}_3 + w_4 + w_5 + \bar{w}_6 + \bar{w}_7, \tag{9}$$
$$d_4(y) = \bar{w}_1 + w_2 + \bar{w}_3 + w_4 + \bar{w}_5 + w_6 + \bar{w}_7, \tag{10}$$
where $\bar{w}_i = |i|_{\mathcal{C}} - w_i$.
We compare $\mathcal{C}$ with another $(n, 4)$ code $\mathcal{C}'$ obtained by modifying $\mathcal{C}$ as follows. Let $\mathcal{O}$ be a nonempty, proper subset of $\{1, 2, 3, 4\}$ and let $\mathcal{P}$ be its complement, which is also nonempty. Let $\mathcal{C}'$ be the code obtained by flipping the first $t$ bits of $c_i$ for each $i \in \mathcal{P}$. Denote by $c'_i$ the $i$th codeword/row of $\mathcal{C}'$, $i = 1, \ldots, 4$. For $y \in \{0,1\}^n$, let $f_t(y)$ be the vector obtained by flipping the first $t$ bits of $y$. We see that $c'_i = c_i$ for $i \in \mathcal{O}$ and $c'_i = f_t(c_i)$ for $i \in \mathcal{P}$.
Denote by $s_\tau$, $\tau = 1, 2, \ldots, t$, the $\tau$th column of $C$. For $y \in \{0,1\}^n$, let
$$d'_i(y) = d_i(y) + \sum_{\tau=1}^{t} (-1)^{(s_\tau)_i} (\bar{y}_\tau - y_\tau), \tag{11}$$
where $\bar{y}_\tau = y_\tau \oplus 1$; that is, $d'_i(y) = w(f_t(c_i) \oplus y)$. For a nonempty subset $\mathcal{S} \subset \{1, \ldots, 4\}$, let $d_{\mathcal{S}}(y) = \min_{i \in \mathcal{S}} d_i(y)$ and $d'_{\mathcal{S}}(y) = \min_{i \in \mathcal{S}} d'_i(y)$. We have
$$d_{\mathcal{C}}(y) = \min\{ d_{\mathcal{O}}(y), d_{\mathcal{P}}(y) \}, \tag{12}$$
$$d_{\mathcal{C}'}(y) = \min\{ d_{\mathcal{O}}(y), d'_{\mathcal{P}}(y) \}, \tag{13}$$
$$d_{\mathcal{C}'}(f_t(y)) = \min\{ d_{\mathcal{O}}(f_t(y)), d'_{\mathcal{P}}(f_t(y)) \} = \min\{ d'_{\mathcal{O}}(y), d_{\mathcal{P}}(y) \}. \tag{14}$$
Our approach to comparing the ML decoding performance of $\mathcal{C}$ and $\mathcal{C}'$ is based on a pair of partitions $\{\mathcal{Y}_i, i = 1, \ldots, i_0\}$ and $\{\mathcal{Y}'_i, i = 1, \ldots, i_0\}$ of $\{0,1\}^n$, where $i_0$ indicates the number of subsets in each partition. This pair of partitions satisfies the following properties: 1) for each $i$, $|\mathcal{Y}_i| = |\mathcal{Y}'_i|$, and 2) for each $i$, there exists a one-to-one and onto mapping $g_i : \mathcal{Y}_i \to \mathcal{Y}'_i$ such that one of the following conditions holds:
1) for all $y \in \mathcal{Y}_i$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(g_i(y))$;
2) for all $y \in \mathcal{Y}_i$, $d_{\mathcal{C}}(y) < d_{\mathcal{C}'}(g_i(y))$;
3) for all $y \in \mathcal{Y}_i$, $d_{\mathcal{C}}(y) > d_{\mathcal{C}'}(g_i(y))$.
Such a pair of partitions exists. For example, when $i_0 = 2^n$, $\mathcal{Y}_i = \mathcal{Y}'_i = \{\langle i \rangle_n\}$ for $i = 0, 1, \ldots, i_0 - 1$ form a pair of partitions satisfying the desired properties. But this example does not help to simplify the problem. For the two special cases used to prove Theorems 1 and 4, there exists such a pair of partitions with $i_0 = 5$.
In the following discussion, we write $\min\{a, b\}$ as $a \wedge b$. For a function $g : \{0,1\}^n \to \mathbb{R}$, we write $\{ y \in \{0,1\}^n : g(y) \ge 0 \}$ as $\{ g \ge 0 \}$ to simplify the notation.
IV. CHANGE OF ONE COLUMN
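In this section, we study how the ML decoding performance is affected after changing one column of an $(n, 4)$ code.
A. General Results
Consider an $(n, 4)$ code $\mathcal{C}$ with the first column $\langle s \rangle$, $0 \le s \le 15$. Let $\mathcal{C}'$ be the code formed by changing the first column of $\mathcal{C}$ to $\langle s' \rangle$. Following the notation in § III, $\mathcal{O}$ is the set of indices $j$ such that $(\langle s \rangle)_j = (\langle s' \rangle)_j$, and $\mathcal{P}$ is the set of indices $j$ such that $(\langle s \rangle)_j \ne (\langle s' \rangle)_j$. Assume $s' \ne s$ and $s' \ne 15 - s$, and hence both $\mathcal{O}$ and $\mathcal{P}$ are nonempty.
In this case, $d'_i$ defined in (11) becomes
$$d'_i(y) = d_i(y) + (-1)^{(\langle s \rangle)_i} (\bar{y}_1 - y_1). \tag{15}$$
Consider an example with $s = 1$ and $s' = 3$. Now $\mathcal{O} = \{1, 2, 4\}$ and $\mathcal{P} = \{3\}$. Substituting $\langle 1 \rangle$ into (15),
$$d'_1(y) = d_1(y) - y_1 + \bar{y}_1,$$
$$d'_2(y) = d_2(y) - y_1 + \bar{y}_1,$$
$$d'_3(y) = d_3(y) - y_1 + \bar{y}_1,$$
$$d'_4(y) = d_4(y) + y_1 - \bar{y}_1,$$
and hence
$$d_{\mathcal{O}}(y) = d_1 \wedge d_2 \wedge d_4, \tag{16}$$
$$d_{\mathcal{P}}(y) = d_3, \tag{17}$$
$$d'_{\mathcal{O}}(y) = [(d_1 \wedge d_2) - y_1 + \bar{y}_1] \wedge (d_4 + y_1 - \bar{y}_1), \tag{18}$$
$$d'_{\mathcal{P}}(y) = d_3 - y_1 + \bar{y}_1. \tag{19}$$
We are ready to form the partitions. Define the following subsets of $\{0,1\}^n$:
$$\mathcal{Y}_1 = \{ d_{\mathcal{O}} \le d_{\mathcal{P}} < d'_{\mathcal{P}} \} \cup \{ d_{\mathcal{O}} \le d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} \le d'_{\mathcal{P}} \},$$
$$\mathcal{Y}_2 = \{ d_{\mathcal{P}} \le d'_{\mathcal{P}},\ d_{\mathcal{P}} < d_{\mathcal{O}} \} \cup \{ d'_{\mathcal{P}} < d_{\mathcal{P}} \le d_{\mathcal{O}},\ d_{\mathcal{P}} \le d'_{\mathcal{O}} \},$$
$$\mathcal{Y}_3 = \{ d'_{\mathcal{P}} = d'_{\mathcal{O}} < d_{\mathcal{P}} = d_{\mathcal{O}} \}, \tag{20}$$
$$\mathcal{Y}_4 = \{ d_{\mathcal{P}} = d'_{\mathcal{P}} = d_{\mathcal{O}} < d'_{\mathcal{O}} \},$$
$$\mathcal{Y}_5 = \{ d'_{\mathcal{P}} = d_{\mathcal{O}} < d'_{\mathcal{O}} = d_{\mathcal{P}} \}. \tag{21}$$
For $i = 2, 4, 5$, define
$$\mathcal{Y}'_i = \{ f_1(y) : y \in \mathcal{Y}_i \},$$
where the function $f_1$ (defined in § III) flips the first bit of a binary vector. The next lemma shows that both $\{\mathcal{Y}_i, i = 1, \ldots, 5\}$ and $\{\mathcal{Y}_1, \mathcal{Y}'_2, \mathcal{Y}_3, \mathcal{Y}'_4, \mathcal{Y}'_5\}$ are partitions of $\{0,1\}^n$ and satisfy the desired properties described in § III.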
Lemma 10.
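For the $(n, 4)$ codes $\mathcal{C}$ and $\mathcal{C}'$ formulated above, both $\{\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3, \mathcal{Y}_4, \mathcal{Y}_5\}$ and $\{\mathcal{Y}_1, \mathcal{Y}'_2, \mathcal{Y}_3, \mathcal{Y}'_4, \mathcal{Y}'_5\}$ are partitions of $\{0,1\}^n$. Moreover,
1) for $y \in \mathcal{Y}_1$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$;
2) for $y \in \mathcal{Y}_2$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y') = d_{\mathcal{P}}$, where $y' \triangleq f_1(y) \in \mathcal{Y}'_2$;
3) for $y \in \mathcal{Y}_3$, $d_{\mathcal{C}}(y) = d_{\mathcal{P}} = d_{\mathcal{C}'}(y) + 1 = d'_{\mathcal{P}} + 1$;
4) for $y \in \mathcal{Y}_4$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}} = d_{\mathcal{C}'}(y') = d_{\mathcal{P}}$, where $y' \triangleq f_1(y) \in \mathcal{Y}'_4$;
5) for $y \in \mathcal{Y}_5$, $d_{\mathcal{C}}(y) + 1 = d_{\mathcal{O}} + 1 = d_{\mathcal{C}'}(y') = d_{\mathcal{P}}$, where $y' \triangleq f_1(y) \in \mathcal{Y}'_5$.
Proof. See Appendix A.
Now we move on to compare $\lambda_{\mathcal{C}}$ and $\lambda_{\mathcal{C}'}$ as defined in (1). Define for $i = 1, \ldots, 5$ and $d = 0, 1, \ldots, n$,
$$\alpha^i_d(\mathcal{C}) = |\{ y \in \mathcal{Y}_i : d_{\mathcal{C}}(y) = d \}|. \tag{22}$$
As $\{\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3, \mathcal{Y}_4, \mathcal{Y}_5\}$ is a partition of $\{0,1\}^n$, we have
$$\alpha_d(\mathcal{C}) = \sum_{i=1}^{5} \alpha^i_d(\mathcal{C}).$$
By the definition of $\mathcal{Y}_3$ and $\mathcal{Y}_5$, $\alpha^3_0(\mathcal{C}) = 0$ and $\alpha^5_n(\mathcal{C}) = 0$.
Theorem 11. For two $(n, 4)$ codes $\mathcal{C}$ and $\mathcal{C}'$ with only one column different, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ if and only if
$$\sum_{d=1}^{n} \left[ \alpha^3_d(\mathcal{C}) - \alpha^5_{d-1}(\mathcal{C}) \right] \left( \frac{\epsilon}{1-\epsilon} \right)^{d-1} \ge 0.$$
Proof. See Appendix A.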
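For small block lengths, comparisons such as the one in Theorem 11 can be sanity-checked by evaluating (1) directly. The following is a minimal brute-force sketch, not the authors' method: it enumerates all $2^n$ outputs, so it is only usable for small $n$. The helper names `code_from_column_types`, `alpha` and `correct_prob` are hypothetical, and the sketch assumes the convention, consistent with $\langle 2 \rangle = [0\ 0\ 1\ 0]^\top$, that the most significant bit of a column type corresponds to row 1.

```python
from itertools import product

def code_from_column_types(types, k=4):
    """Rows c_1..c_k of the code whose tau-th column is <types[tau]>.

    Assumed convention: the most significant bit of the type is row 1.
    """
    return [tuple((t >> (k - 1 - j)) & 1 for t in types) for j in range(k)]

def alpha(code):
    """alpha_d for d = 0..n: number of outputs y at distance d from the code."""
    n = len(code[0])
    counts = [0] * (n + 1)
    for y in product((0, 1), repeat=n):
        d = min(sum(a != b for a, b in zip(c, y)) for c in code)
        counts[d] += 1
    return counts

def correct_prob(code, eps):
    """Average correct decoding probability lambda_C from equation (1)."""
    n = len(code[0])
    a = alpha(code)
    return sum(a[d] * (1 - eps) ** (n - d) * eps ** d
               for d in range(n + 1)) / len(code)

# Small instance of Theorem 1: C has a column <1> and w(c_3 + c_4) = 2 is even;
# C' replaces that column by <3>.
C = code_from_column_types([1, 3, 5])
C_prime = code_from_column_types([3, 3, 5])
print(alpha(C), alpha(C_prime))  # both are [4, 4, 0, 0]
print(correct_prob(C_prime, 0.1) >= correct_prob(C, 0.1))
```

In this toy instance the two codes have the same distance profile, so the inequality of Theorem 1 holds with equality.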
Corollary 12.
For two $(n, 4)$ codes $\mathcal{C}$ and $\mathcal{C}'$ with only one column different, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ if for $d = 1, \ldots, n$,
$$\sum_{i=1}^{d} \alpha^3_i(\mathcal{C}) \ge \sum_{i=0}^{d-1} \alpha^5_i(\mathcal{C}).$$
Proof. See Appendix A.
If we can compare $\mathcal{C}$ and $\mathcal{C}'$ based on Corollary 12, their relation is universal in the sense that it does not depend on $\epsilon$.
B. Proof of Theorem 1
Now we give a proof of Theorem 1.
As interchanging rows/columns does not change the performance of $\mathcal{C}$, we only consider the following case when proving the theorem: $\mathcal{C}$ has the first column $\langle 1 \rangle$ and $w(c_3 \oplus c_4)$ is even. Let $\mathcal{C}'$ be the code obtained by replacing the first column of $\mathcal{C}$ by $\langle 3 \rangle$. Substituting $s = 1$ and $s' = 3$ into the discussion in § IV-A, we have $\mathcal{O} = \{1, 2, 4\}$ and $\mathcal{P} = \{3\}$, and hence
$$\mathcal{Y}_5 = \{ d'_3 = d_{\{1,2,4\}} < d_3 = d'_{\{1,2,4\}} \}.$$
Assume $\mathcal{Y}_5$ is nonempty and fix $y \in \mathcal{Y}_5$. As $d'_3(y) = d_3(y) - y_1 + \bar{y}_1 < d_3(y)$, we have $y_1 = 1$. Further, due to
$$d_{\{1,2,4\}}(y) = d_1 \wedge d_2 \wedge d_4,$$
$$d'_{\{1,2,4\}}(y) = (d_1 - 1) \wedge (d_2 - 1) \wedge (d_4 + 1),$$
we have $d_{\{1,2,4\}} = d_4$ and hence $d_3 = d_4 + 1$. By (6),
$$d_3(y) + d_4(y) = \sum_{i} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_3} \left( w_i - \frac{|i|}{2} \right) \right] + \sum_{i} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_4} \left( w_i - \frac{|i|}{2} \right) \right]$$
$$= \sum_{i : (\langle i \rangle)_3 \ne (\langle i \rangle)_4} |i| + 2 \sum_{i : (\langle i \rangle)_3 = (\langle i \rangle)_4} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_3} \left( w_i - \frac{|i|}{2} \right) \right]$$
$$= w(c_3 \oplus c_4) + 2 \sum_{i : (\langle i \rangle)_3 = (\langle i \rangle)_4} \left[ \frac{|i|}{2} + (-1)^{(\langle i \rangle)_3} \left( w_i - \frac{|i|}{2} \right) \right].$$
As $w(c_3 \oplus c_4)$ is even, we see that $d_3 + d_4$ is even, which is a contradiction to $d_3 = d_4 + 1$. Therefore, $\mathcal{Y}_5 = \emptyset$ and hence by Corollary 12, $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.
V. ANALYSIS OF CLASS-I CODES
Recall the definition of Class-I codes in Definition 1. In this section, we consider a Class-I $(n, 4)$ code $\mathcal{C}$ with the first column $\langle 1 \rangle$. Let $\mathcal{C}'$ be the code obtained by replacing the first column of $\mathcal{C}$ with $\langle 3 \rangle$. The ML decoding performance of $\mathcal{C}$ and $\mathcal{C}'$ can be compared using the approach introduced in § IV-A.
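Definition 1 and the linear normal form can be checked mechanically from the column-type counts. The sketch below is illustrative only: the function names are hypothetical, column types are assumed to be integers in $0, \ldots, 15$ with the most significant bit in row 1, and the column types of Definition 1 are read as $\langle 1 \rangle, \langle 3 \rangle, \langle 5 \rangle, \langle 6 \rangle$. It tests the normal forms only after column flipping ($i \mapsto 15 - i$), not after row interchanging.

```python
from collections import Counter

def normalized_counts(types):
    """Column-type counts |i| after flipping every column whose first bit is 1."""
    return Counter(t if t < 8 else 15 - t for t in types)

def classify(types):
    """Return 'linear', 'Class-I' or 'other' for the normal forms used here.

    'linear'  : |3| + |5| + |6| = n
    'Class-I' : |1| odd, |3|, |5|, |6| of the same parity,
                and |1| + |3| + |5| + |6| = n (Definition 1)
    """
    n = len(types)
    c = normalized_counts(types)
    if c[3] + c[5] + c[6] == n:
        return "linear"
    same_parity = c[3] % 2 == c[5] % 2 == c[6] % 2
    if c[1] + c[3] + c[5] + c[6] == n and c[1] % 2 == 1 and same_parity:
        return "Class-I"
    return "other"

print(classify([3, 5, 6]))     # linear
print(classify([1, 3, 5, 6]))  # Class-I
print(classify([1, 3, 3, 5]))  # other
```

A code that is reported as "other" may still be equivalent to a linear or Class-I code under row interchanging, which this sketch does not attempt.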
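A. Characterizations of $\mathcal{Y}_3$ and $\mathcal{Y}_5$
Guided by Theorem 11 and Corollary 12, we first study $\mathcal{Y}_3$ and $\mathcal{Y}_5$ defined in (20) and (21).
Lemma 13. For a Class-I $(n, 4)$ code $\mathcal{C}$ with the first column $\langle 1 \rangle$ and $\mathcal{C}'$ obtained by replacing the first column of $\mathcal{C}$ with $\langle 3 \rangle$,
$$\mathcal{Y}_3 = \{ y_1 = 1,\ d_4 \ge d_1 \wedge d_2 = d_3 \},$$
$$\mathcal{Y}_5 = \{ y_1 = 1,\ d_1 \wedge d_2 \ge d_4 + 2 = d_3 + 1 \}.$$
Proof. For the codes $\mathcal{C}$ and $\mathcal{C}'$ defined above, we have (16) – (19). For $y \in \mathcal{Y}_3$, $d_3 - y_1 + \bar{y}_1 < d_3$ implies $y_1 = 1$, and $d_3 = d_1 \wedge d_2 \wedge d_4$ and $d_3 - 1 = [(d_1 \wedge d_2) - 1] \wedge (d_4 + 1)$ together imply $d_1 \wedge d_2 \le d_4$ and $d_3 = d_1 \wedge d_2$.
For $y \in \mathcal{Y}_5$, $d_3 - y_1 + \bar{y}_1 < d_3$ implies $y_1 = 1$, and $d_3 - 1 = d_1 \wedge d_2 \wedge d_4$ and $d_3 = [(d_1 \wedge d_2) - 1] \wedge (d_4 + 1)$ together imply $d_4 + 1 \le (d_1 \wedge d_2) - 1$ and $d_3 - 1 = d_4$.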
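The characterization in Lemma 13 can be cross-checked numerically against definitions (20) and (21) for small codes. The following brute-force sketch is an illustration under stated assumptions rather than part of the proof: `rows_from_types` and `check_lemma13` are hypothetical helper names, the first column is assumed to be $\langle 1 \rangle$ with $\mathcal{O} = \{1, 2, 4\}$ and $\mathcal{P} = \{3\}$ as in § IV-A, and the primed distances are computed from (18) and (19).

```python
from itertools import product

def rows_from_types(types):
    """Rows c_1..c_4 of the code whose columns have the given types (MSB = row 1)."""
    return [tuple((t >> (3 - j)) & 1 for t in types) for j in range(4)]

def dist(c, y):
    """Hamming distance between two binary tuples."""
    return sum(a != b for a, b in zip(c, y))

def check_lemma13(types):
    """Compare (20)/(21) with the Lemma 13 description for every output y."""
    assert types[0] == 1, "the first column is assumed to be <1>"
    rows = rows_from_types(types)
    for y in product((0, 1), repeat=len(types)):
        d1, d2, d3, d4 = (dist(c, y) for c in rows)
        s = 1 - 2 * y[0]                     # \bar{y}_1 - y_1
        dO, dP = min(d1, d2, d4), d3         # (16), (17)
        dO_p = min(min(d1, d2) + s, d4 - s)  # (18)
        dP_p = d3 + s                        # (19)
        in_y3 = dP_p == dO_p < dP == dO      # (20)
        in_y5 = dP_p == dO < dO_p == dP      # (21)
        lem_y3 = y[0] == 1 and d4 >= min(d1, d2) == d3
        lem_y5 = y[0] == 1 and min(d1, d2) >= d4 + 2 == d3 + 1
        if in_y3 != lem_y3 or in_y5 != lem_y5:
            return False
    return True

print(check_lemma13([1, 3, 5, 6]))
```

The check only uses (16) – (19), so it can also be run on codes with first column $\langle 1 \rangle$ that are not of Class-I.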
1) Characterization of α i : For y ∈ Y , by Lemma 10, d C ( y ) = d . By (7) – (10) and Lemma 13, we have thefollowing necessary and sufficient condition for y ∈ Y with d C ( y ) = i : y = 1 and w + w = i − w − w ,w − w ≤ w + w − w − w ,w − w = w + w − ( w + w ) ∧ ( w + w ) . We discuss two cases according to w + w < w + w or not.Define Y A ( i ) as the collection of y satisfying y = 1 and w + w < ( | | + | | ) / , (23) w + w = i − ( | | + | | ) / , (24) w + w − w ≤ ( | | + | | − | | ) / , (25) w + w = ( | | + | | ) / . (26)We have |Y A ( i ) | = X w ≥ ,w ,w ,w : (23) , (24) , (25) , (26) (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) . Define Y B ( i ) as the collection of y satisfying y = 1 and w + w ≥ ( | | + | | ) / , (27) w + w = i − ( | | + | | ) / , (28) w + w − w ≤ ( | | + | | − | | ) / , (29) w − w = ( | | − | | ) / . (30)We have |Y B ( i ) | = X w ≥ ,w ,w ,w : (27) , (28) , (29) , (30) (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) . (31)We see that α i = |Y A ( i ) | + |Y B ( i ) | .
2) Characterization of α i : For y ∈ Y , by Lemma 10, d C ( y ) = d − . By (7) – (10) and Lemma 13, we havethe following necessary and sufficient condition for y ∈ Y with d C ( y ) = i : y = 1 and w + w = i + 1 − w − w ,w − w = w + w − w − w + 1 ,w − w ≥ w + w − ( w + w ) ∧ ( w + w ) + 1 , which can be further simplified as y = 1 and w = ( n + | | − / − i, (32) w + w − w = ( | | + | | − | | + 1) / , (33) w + w ≥ ( | | + | | ) / , (34) w − w ≥ ( | | − | | ) / . (35)Hence α i = X w ≥ ,w ,w ,w : (32) , (33) , (34) , (35) (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) . B. Class-I Codes with | | = 1 Following the discuss in the last subsection, we consider the special case with | | = 1 , and prove Theorem 6for the case | | = min {| | , | | , | |} . For other cases, we can perform row interchanging and column bit flipping toconvert the problem to this case.When | | = 1 , w = 1 . Using the characterization in the last subsection, we have d − X i =0 α i = X W (cid:18) | | w (cid:19)(cid:18) | | | |−| | + w (cid:19)(cid:18) | | w (cid:19) (36)where W = ( (34) , (33) + (35) ,w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| | ) = w + w ≥ | | + | | +1 ,w − w ≥ | |−| | +1 ,w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| | . (37)Similarly, d X i =1 α i ≥ d X i =1 |Y B ( i ) | = X W (cid:18) | | w (cid:19)(cid:18) | | | |−| | + w (cid:19)(cid:18) | | w (cid:19) = X W ′ (cid:18) | | | |−| | + w ′ (cid:19)(cid:18) | | | |−| | + w ′ (cid:19)(cid:18) | | | |−| | + w ′ (cid:19) (38) where W = ( (27) + (30) , (29) + (30) ,w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| | ) = w + w ≥ | | + | | w − w ≤ | |−| |− w ≥ n + | | +12 − d, ≤ w ≤| | , ≤ w ≤| | , W ′ = ( w ′ + w ′ ≥ | | + | | , w ′ − w ′ ≥ | |−| | +12 w ′ ≥ n + | | +12 − d, | |−| | ≤ w ′ ≤ | | + | | , | |−| | ≤ w ′ ≤ | | + | | ) , and (38) is obtained by change of variables w ′ − | | = w − | | and w ′ − | | = w − | | .We show that W ⊂ W ′ . Due to | | ≤ | | , we have | |−| | ≤ and | | + | | ≥ | | . 
For ( w , w ) ∈ W , we have w + w ≥ | | + | | + 1 , w − w ≥ | |−| | + 1 and ≤ w ≤ | | , which implies | |−| | + 1 ≤ w ≤ | | + | | − .Thus | | − | | ≤ w ≤ | | + | | , | | − | | ≤ w ≤ | | + | | , showing ( w , w ) ∈ W ′ .By Lemma 14 in Appendix B, for ( w , w ) ∈ W , we have (cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ (cid:18) | | | |−| | + w (cid:19)(cid:18) | | | |−| | + w (cid:19) . Comparing (36) and (38), we obtain P di =1 α i ≥ P d − i =0 α i for any d = 1 , . . . , n . By Corollary 12, λ C ′ ≥ λ C ,proving Theorem 6. C. Class-I Codes with | | odd, | | = 0 , Here we give a proof of Theorem 7 for the case | | = min {| | , | | , | |} . Otherwise, we can perform rowinterchanging and column bit flipping (which do not change the ML decoding performance) so that C satisfies thecondition. | | = 0 : When | | = 0 , we have w = 0 and n is odd. By (32), α i = 0 if i = n − . So when d < n +12 , P d − i =0 α i = 0 and hence P di =1 α i ≥ P d − i =0 α i ; when d ≥ n +12 , d − X i =0 α i = α n − = X w ≥ , (33) + (34) , (34) ,w = w − w + | | + | |−| | +12 ,w ≤ | | − (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ X w ≥ , (33) + (34) , (34) (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | +1 ,w − w ≤ | |−| |− (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) , (39)where (33) + (34) is w − w ≤ | |−| |− . 
Substituting w ′ = | | − w + 1 into (39), we obtain α n − ≤ X ≤ w ′ ≤| | ,w ≥ | | +1 ,w ′ ≤ | | + | |− − w (cid:18) | | − w ′ − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) (40) When d ≥ n +12 , we further have d X i =1 α i ≥ n +12 X i =1 α i ≥ n +12 X i =1 |Y B ( i ) | = X w ≥ , (27) + (30) , (29) + (30) ,w − w ≤ | |−| | +12 (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | ,w ≤ | |−| |− + w (cid:18) | | − w − (cid:19)(cid:18) | | | | (cid:19)(cid:18) | | w (cid:19) , (41)where (27) + (30) is w ≥ | | and (29) + (30) is w − w ≤ | |−| | , which is equivalent to w − w ≤ | |−| |− as | | − | | is odd.As | |−| |− + w ≥ | | + | |− − w when w ≥ | | , comparing the RHS’ of (40) and (41), we have P di =1 α i ≥ P d − i =0 α i for d ≥ n +12 . By Corollary 12, λ C ′ ≥ λ C , proving the case when | | = 0 . | | = 1 : When | | = 1 , we have w = 0 or , | | and | | are odd, and n is even. In (32), w = 1 when i = n − , and hence α n − = X w ≥ ,w ≥ | | +12 , (32) + (35) ,w = w − w + | | + | |−| | +12 ,w ≤ | |− (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ X w ≥ ,w ≥ | | +12 , (32) + (35) (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | +12 ,w − w ≤ | |−| |− (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) , (42) = X | |≥ w ′ ≥ ,w ≥ | | +12 ,w ′ ≤ | | + | | − w (cid:18) | | − w ′ − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) . (43)where (32) + (35) is w − w ≤ | |−| |− , and (43) is obtained by substituting w ′ = | | − w + 1 into (42). 
In (32), w = 0 when i = n , and hence α n = X w ≥ ,w ≥ | | +32 , (33) + (35) ,w = w − w + | | + | |−| | +12 ,w ≤ | |− (cid:18) | | − w − (cid:19)(cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ X w ≥ ,w ≥ | | +32 , (33) + (35) (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | | +32 ,w − w ≤ | |−| | − (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) (44) = X | |≥ w ′ ≥ ,w ≥ | | +32 ,w ′ ≤− w + | | + | |− (cid:18) | | − w ′ − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) . (45)where (33) + (35) means w − w ≤ | |−| | − , and (45) is obtained by substituting w ′ = | | − w + 1 into (44).Following (31), we have (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) = X w ≥ , (27) + (30) , (29) + (30) ,w − w ≤ n − | | −| | (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w ≥ | |− ,w ≤ w + | |−| |− (cid:18) | | − w − (cid:19)(cid:18) | | | | +12 (cid:19)(cid:18) | | w (cid:19) (46)where (27) + (30) implies w ≥ | |− , (29) + (30) implies w − w ≤ | |−| |− , and (46) follows that | |−| |− ≤ n − | | − | | . Since w + | |−| |− ≥ − w + | | + | | when w ≥ | | +12 , comparing the RHS’ of (43) and (46),we get α n − ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) . (47)Following (31), we have (cid:12)(cid:12)(cid:12)(cid:12) { w = 0 } ∩ (cid:18) ∪ i ≤ n +1 Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) = X w ≥ , (27) + (30) , (29) + (30) ,w − w ≤ n +1 − | | −| | (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) = X w ≥ ,w : w ≥ | | +12 ,w ≤ w + | |−| | +12 (cid:18) | | − w − (cid:19)(cid:18) | | | |− (cid:19)(cid:18) | | w (cid:19) , (48) where (27) + (30) implies w ≥ | | +12 , (29) + (30) implies w − w ≤ | |−| | +12 , and (48) follows that | |−| | +12 ≤ n + 1 − | | − | | . 
Since w + | |−| | +12 ≥ − w + | | + | |− when w ≥ | | +12 , comparing the RHS’ of (45) and(48), we get α n ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 0 } ∩ (cid:18) ∪ i ≤ n +1 Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) . (49)When d < n , P d − i =0 α i = 0 ≤ P di =1 α i . When d = n , by (47), d − X i =0 α i = α n − ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n X i =1 α i . When d ≥ n + 1 , by (47) and (49), d − X i =0 α i = α n − + α n ≤ (cid:12)(cid:12)(cid:12)(cid:12) { w = 1 } ∩ (cid:18) ∪ i ≤ n Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) { w = 0 } ∩ (cid:18) ∪ i ≤ n +1 Y B ( i ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ d X i =1 α i . Thus we have P di =1 α i ≥ P d − i =0 α i for ≤ d ≤ n . By Corollary 12, λ C ′ ≥ λ C , proving the case when | | = 1 . D. Algorithm for Verifying Optimal Codes
Based on Theorem 8, we give an algorithm for verifying whether linear ( n, codes are optimal for fixed n (seethe pseudocode in Algorithm 1). The algorithm checks each Class-I ( n, code C specified by ( w , w , w , w ) with the first column being h i and | | C ≤ | | C ≤ | | C , and compares C with C ′ obtained by replacing the firstcolumn of C to h i . Other ( n, Class-I codes can be converted to ones of the above type by flipping columns andinterchanging rows, and hence do not need to be checked again. By Theorem 8, if for each code C checked by thealgorithm we have P di =1 α i ( C ) ≥ P d − i =0 α i ( C ) for d = 1 , . . . , n , which implies λ C ≤ λ C ′ by Corollary 12, thenlinear ( n, codes are optimal.Evaluating Algorithm § V-D, we have verified that for n ≤ , linear codes are optimal. To total number oftypes of Class-I codes to evaluate is O ( n ) . For each type, there are less than n values α i /α i to evaluate, eachof which has complexity O ( n ) . Therefore, the complexity of the algorithm is O ( n ) .VI. P ROOF OF T HEOREM § III. Let C be an ( n, codewith the first two columns h i and h i . Let C ′ be the code obtained by flipping the first two bits of c , so that thefirst two columns of C ′ are h i and h i . (Other cases of Theorem 4 can be converted to this case by interchangingrows.)Following the notations in § III, O = { , , } , P = { } , and d ′ i ( y ) = d i ( y ) + ( − h i i ( y − y ) + ( − h i i ( y − y ) . Algorithm 1:
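Check optimality of linear (n, 4) codes
input : n
output: a (If a = − , linear (n, 4) codes are optimal.)
Initialize a = − ;
for n = 3 , , , . . . and n ≤ n do
    for n = 2 , , . . . , ⌊ n − n ⌋ do
        for n = n , n + 2 , . . . and n ≤ ⌊ n − n − n ⌋ do
            for n = n , n + 2 , . . . and n ≤ n − n − n − n do
                Compute $\alpha^3_i$ and $\alpha^5_i$, i = 0, . . . , n, for code C with | | C = n , | | C = n , | | C = n , | | C = n ;
                if $\sum_{i=1}^{d} \alpha^3_i < \sum_{i=0}^{d-1} \alpha^5_i$ for some d ∈ {1, ..., n} then
                    a = 1 ; break;
                end
            end
        end
    end
end

When $y_1 = y_2$, we have
$$d'_{\mathcal{P}}(y) = d_{\mathcal{P}}(y). \quad (50)$$
When $y_1 \ne y_2$, we have d ′ ( y ) = d ( y ) , d ′ ( y ) = d ( y ) , and
d ′ ( y ) − d ( y ) = d ′ ( y ) − d ( y ) = ± , (51)
and hence
$(d'_{\mathcal{O}}(y) - d_{\mathcal{O}}(y))(d'_{\mathcal{P}}(y) - d_{\mathcal{P}}(y))$ = ( d ′{ , , } ( y ) − d { , , } ( y ))( d ′ ( y ) − d ( y )) $\ge 0$. (52)

Define the following subsets of $\{0,1\}^n$:
$\mathcal{Y}_1 = \{ y_1 = y_2 \}$,
$\mathcal{Y}_2 = \{ y_1 \ne y_2,\ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \}$,
$\mathcal{Y}_3 = \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}},\ d_{\mathcal{P}} \le d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \}$,
$\mathcal{Y}_4 = \{ y_1 \ne y_2,\ d'_{\mathcal{P}} < d_{\mathcal{O}} \wedge d'_{\mathcal{O}} < d_{\mathcal{P}} \}$,
$\mathcal{Y}_5 = \{ y_1 \ne y_2,\ d'_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} < d_{\mathcal{O}},\ d'_{\mathcal{O}} < d_{\mathcal{P}} \}$.

Recall the function f defined in § III that flips the first two bits of a binary vector. For i = 3, 4, let $\mathcal{Y}'_i = \{ f(y) : y \in \mathcal{Y}_i \}$. We justify that $\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}_3, \mathcal{Y}_4, \mathcal{Y}_5$ form a partition of $\{0,1\}^n$ and that $\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}'_3, \mathcal{Y}'_4, \mathcal{Y}_5$ form a partition of $\{0,1\}^n$.

First, we show that
$$\mathcal{Y}_4 \cup \mathcal{Y}_5 = \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}},\ d_{\mathcal{P}} > d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \}, \quad (53)$$
and then we obtain $\bigcup_{i=1}^{5} \mathcal{Y}_i = \{0,1\}^n$. Moreover, $\mathcal{Y}_1, \ldots, \mathcal{Y}_5$ are all disjoint by checking the definition. Thus $\mathcal{Y}_1, \ldots, \mathcal{Y}_5$ form a partition of $\{0,1\}^n$.

To show (53), since $\mathcal{Y}_4 \subseteq \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \}$, we have
$$\mathcal{Y}_4 = \{ y_1 \ne y_2,\ d'_{\mathcal{P}} < d_{\mathcal{O}} \wedge d'_{\mathcal{O}} < d_{\mathcal{P}} \} \cap \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} = \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}},\ d_{\mathcal{P}} > d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \} \cap \{ d_{\mathcal{O}} \wedge d'_{\mathcal{O}} > d'_{\mathcal{P}} \}. \quad (54)$$
Denote
$$\mathcal{A} = \{ y_1 \ne y_2,\ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}},\ d_{\mathcal{P}} > d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \} \cap \{ d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \le d'_{\mathcal{P}} \}. \quad (55)$$
For $y \in \mathcal{A}$, we have $d_{\mathcal{O}}(y) > d'_{\mathcal{O}}(y)$, which implies $d_{\mathcal{P}}(y) > d'_{\mathcal{P}}(y)$ by (51) and (52), and hence
$$d'_{\mathcal{O}}(y) \le d_{\mathcal{P}}(y) \wedge d'_{\mathcal{P}}(y) < d_{\mathcal{O}}(y), \quad d'_{\mathcal{O}}(y) < d_{\mathcal{P}}(y).$$
Thus we have $y \in \mathcal{Y}_5$ and then $\mathcal{A} \subseteq \mathcal{Y}_5$. For $y \in \mathcal{Y}_5$, we have $d_{\mathcal{O}}(y) > d'_{\mathcal{O}}(y)$ by the definition above, which implies $d_{\mathcal{P}}(y) > d'_{\mathcal{P}}(y)$ by (51) and (52). Then we obtain
$$d_{\mathcal{P}}(y) > d_{\mathcal{O}}(y) \wedge d'_{\mathcal{O}}(y) = d'_{\mathcal{O}}(y) \le d'_{\mathcal{P}}(y) < d_{\mathcal{O}}(y).$$
Thus $y \in \mathcal{A}$ and then $\mathcal{Y}_5 \subseteq \mathcal{A}$. Therefore, $\mathcal{Y}_5 = \mathcal{A}$. From (54) and (55), we obtain (53).

We further show that
$$\mathcal{Y}'_3 \cup \mathcal{Y}'_4 \subseteq \mathcal{Y}_3 \cup \mathcal{Y}_4. \quad (56)$$
Since f is a one-to-one mapping, we get $\mathcal{Y}'_3 \cup \mathcal{Y}'_4 = \mathcal{Y}_3 \cup \mathcal{Y}_4$. Therefore, $\mathcal{Y}_1, \mathcal{Y}_2, \mathcal{Y}'_3, \mathcal{Y}'_4, \mathcal{Y}_5$ form a partition of $\{0,1\}^n$.

To show (56), we see
$$\mathcal{Y}'_4 = \{ y_1 \ne y_2,\ d_{\mathcal{P}} < d_{\mathcal{O}} \wedge d'_{\mathcal{O}} < d'_{\mathcal{P}} \} \subseteq \mathcal{Y}_3, \quad (57)$$
and
$$\begin{aligned} \mathcal{Y}'_3 \setminus \mathcal{Y}_4 &= \{ y_1 \ne y_2,\ d'_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}},\ d'_{\mathcal{P}} \le d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \} \cap \bigl( \{ d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \ge d_{\mathcal{P}} \} \cup \{ d'_{\mathcal{P}} \ge d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \} \bigr) \\ &= \{ y_1 \ne y_2,\ d'_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}},\ d'_{\mathcal{P}} \le d_{\mathcal{O}} \wedge d'_{\mathcal{O}},\ d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \ge d_{\mathcal{P}} \} \cup \{ y_1 \ne y_2,\ d'_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}},\ d'_{\mathcal{P}} \le d_{\mathcal{O}} \wedge d'_{\mathcal{O}},\ d'_{\mathcal{P}} \ge d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \} \\ &= \{ y_1 \ne y_2,\ d_{\mathcal{P}} \wedge d'_{\mathcal{P}} < \max(d_{\mathcal{P}}, d'_{\mathcal{P}}) \le d_{\mathcal{O}} \wedge d'_{\mathcal{O}} \} \cup \{ y_1 \ne y_2,\ d_{\mathcal{O}} \wedge d'_{\mathcal{O}} = d'_{\mathcal{P}},\ d_{\mathcal{P}} \wedge d'_{\mathcal{P}} < d'_{\mathcal{O}} \}, \end{aligned} \quad (58)$$
where in the last equality $d_{\mathcal{P}} \wedge d'_{\mathcal{P}} < \max(d_{\mathcal{P}}, d'_{\mathcal{P}})$ follows from (51). By (52), when $y_1 \ne y_2$, if $d_{\mathcal{O}} < d'_{\mathcal{O}}$, then $d_{\mathcal{P}} < d'_{\mathcal{P}}$; and if $d_{\mathcal{P}} > d'_{\mathcal{P}}$, then $d_{\mathcal{O}} \ge d'_{\mathcal{O}}$. Hence, we can verify that both sets in the union in (58) are subsets of $\mathcal{Y}_3$. Therefore, $\mathcal{Y}'_3 \setminus \mathcal{Y}_4 \subset \mathcal{Y}_3$, which, together with (57), proves (56).

Moreover, we prove the following claims:
1) For $y \in \mathcal{Y}_1$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y)$;
2) For $y \in \mathcal{Y}_2$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$;
3) For $y \in \mathcal{Y}_3$, $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(f(y)) = d_{\mathcal{P}}$;
4) For $y \in \mathcal{Y}_4$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}} \wedge d_{\mathcal{P}} \ge d_{\mathcal{C}'}(f(y)) = d'_{\mathcal{O}}$;
5) For $y \in \mathcal{Y}_5$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}} \wedge d_{\mathcal{P}} \ge d_{\mathcal{C}'}(y) = d'_{\mathcal{P}}$.
Following an argument similar to that in § IV-A, we can show that $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$.

The above claims are justified as follows:
1) For $y \in \mathcal{Y}_1$, as $y_1 = y_2$, we have $d'_{\mathcal{P}} = d_{\mathcal{P}}$ by (50), and hence $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{O}}(y) \wedge d'_{\mathcal{P}}(y) = d_{\mathcal{C}'}(y)$.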
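2) For $y \in \mathcal{Y}_2$, by the definition of $\mathcal{Y}_2$, we have $d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}}$, and hence $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$.
3) For $y \in \mathcal{Y}_3$, we have $d_{\mathcal{P}} \le d_{\mathcal{O}} \wedge d'_{\mathcal{O}}$ by the definition of $\mathcal{Y}_3$. We then have $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$ and $d_{\mathcal{C}'}(f(y)) = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$.
4) For $y \in \mathcal{Y}_4$, we have $y_1 \ne y_2$ and $d'_{\mathcal{P}} < d_{\mathcal{O}} \wedge d'_{\mathcal{O}} < d_{\mathcal{P}}$ by the definition of $\mathcal{Y}_4$. By (52), $d_{\mathcal{O}} \wedge d'_{\mathcal{O}} = d'_{\mathcal{O}}$, which implies $d_{\mathcal{C}'}(f(y)) = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d'_{\mathcal{O}}(y)$ and hence $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) \ge d'_{\mathcal{O}}(y) = d_{\mathcal{C}'}(f(y))$.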
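5) For $y \in \mathcal{Y}_5$, we have $y_1 \ne y_2$ and $d'_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} < d_{\mathcal{O}}$. By (51) and (52), $d'_{\mathcal{P}} < d_{\mathcal{P}}$. Then we have $d_{\mathcal{C}}(y) = d_{\mathcal{O}} \wedge d_{\mathcal{P}} \ge d_{\mathcal{O}} \wedge d'_{\mathcal{P}} = d_{\mathcal{C}'}(y)$ and $d_{\mathcal{C}'}(y) = d'_{\mathcal{P}}(y)$.

VII. CONCLUDING REMARKS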
It would be of interest to prove in general whether linear (n, 4) codes are optimal or not. One further research direction is to extend the technique for comparing the decoding performance of two codes to codes of more than four codewords.

APPENDIX A
PROOFS
Proof of Lemma 10.
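By checking the definition, we see that $\mathcal{Y}_1, \ldots, \mathcal{Y}_5$ are all disjoint. To show they form a partition, we verify that $\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5 = \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \}$ and $\mathcal{Y}_2 \cup \mathcal{Y}_3 = \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \}$, and hence $(\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5) \cup (\mathcal{Y}_2 \cup \mathcal{Y}_3) = \{0,1\}^n$.

We first prove that $\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5 = \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \}$. Notice that the three sets can be rewritten as
$$\mathcal{Y}_1 = \{ d_{\mathcal{O}} \le d_{\mathcal{P}} < d'_{\mathcal{P}} \} \cup \{ d_{\mathcal{O}} \le d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} \le d'_{\mathcal{P}} \} = \bigl( \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d_{\mathcal{P}} < d'_{\mathcal{P}} \} \bigr) \cup \bigl( \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} \le d'_{\mathcal{P}} \} \bigr), \quad (59)$$
$$\mathcal{Y}_4 = \{ d_{\mathcal{P}} = d'_{\mathcal{P}} = d_{\mathcal{O}} < d'_{\mathcal{O}} \} \overset{(a)}{=} \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} = d'_{\mathcal{P}} \}, \quad (60)$$
$$\mathcal{Y}_5 = \{ d'_{\mathcal{P}} = d_{\mathcal{O}} < d'_{\mathcal{O}} = d_{\mathcal{P}} \} \overset{(b)}{=} \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} > d'_{\mathcal{P}} \}. \quad (61)$$
For all y, we have
$$| d_{\mathcal{S}}(y) - d'_{\mathcal{S}}(y) | \le 1, \quad (62)$$
which can be obtained from the definition. Then if $d_{\mathcal{O}} \le d'_{\mathcal{P}} < d'_{\mathcal{O}}$, we have $d_{\mathcal{O}} = d'_{\mathcal{P}}$, and thus the equality (a) in (60) holds. Furthermore, if $d'_{\mathcal{O}} > d'_{\mathcal{P}}$, we have $d_{\mathcal{O}} \ge d'_{\mathcal{P}}$, and thus the equality (b) in (61) holds. By (60) and (61), we have
$$\mathcal{Y}_4 \cup \mathcal{Y}_5 = \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{O}} > d'_{\mathcal{P}} \}.$$
From (59), this further implies $\mathcal{Y}_1 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5 = \{ d_{\mathcal{O}} \le d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \}$.

We now prove $\mathcal{Y}_2 \cup \mathcal{Y}_3 = \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \}$. First we rewrite the two sets as
$$\begin{aligned} \mathcal{Y}_2 &= \{ d_{\mathcal{P}} \le d'_{\mathcal{P}},\ d_{\mathcal{P}} < d_{\mathcal{O}} \} \cup \{ d'_{\mathcal{P}} < d_{\mathcal{P}} \le d_{\mathcal{O}},\ d_{\mathcal{P}} \le d'_{\mathcal{O}} \} \\ &= \bigl( \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d_{\mathcal{P}} \le d'_{\mathcal{P}} \} \bigr) \cup \bigl( \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d_{\mathcal{P}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} \le d_{\mathcal{O}},\ d_{\mathcal{P}} \le d'_{\mathcal{O}} \} \bigr) \\ &\overset{(a)}{=} \bigl( \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d_{\mathcal{P}} \le d'_{\mathcal{P}} \} \bigr) \cup \bigl( \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d_{\mathcal{P}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} \le d'_{\mathcal{O}} \} \bigr), \end{aligned} \quad (63)$$
$$\mathcal{Y}_3 = \{ d'_{\mathcal{P}} = d'_{\mathcal{O}} < d_{\mathcal{P}} = d_{\mathcal{O}} \} \overset{(b)}{=} \{ d_{\mathcal{O}} > d_{\mathcal{P}} \wedge d'_{\mathcal{P}} \} \cap \{ d_{\mathcal{P}} > d'_{\mathcal{P}},\ d_{\mathcal{P}} > d'_{\mathcal{O}} \}. \quad (64)$$
By (62), we get $d_{\mathcal{P}} \le d_{\mathcal{O}}$ if $d_{\mathcal{O}} > d'_{\mathcal{P}}$ and $d_{\mathcal{P}} > d'_{\mathcal{P}}$. Then the equality (a) in (63) holds. Similarly, we can justify (b) in (64) by (62).

By definition,
$\mathcal{Y}'_2 = \{ d'_{\mathcal{P}} \le d_{\mathcal{P}},\ d'_{\mathcal{P}} < d'_{\mathcal{O}} \} \cup \{ d_{\mathcal{P}} < d'_{\mathcal{P}} \le d'_{\mathcal{O}},\ d'_{\mathcal{P}} \le d_{\mathcal{O}} \}$,
$\mathcal{Y}'_4 = \{ d_{\mathcal{P}} = d'_{\mathcal{P}} = d'_{\mathcal{O}} < d_{\mathcal{O}} \}$,
$\mathcal{Y}'_5 = \{ d_{\mathcal{P}} = d'_{\mathcal{O}} < d_{\mathcal{O}} = d'_{\mathcal{P}} \}$.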
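It can be verified that $(\mathcal{Y}'_2 \cup \mathcal{Y}'_4 \cup \mathcal{Y}'_5) \cap (\mathcal{Y}_1 \cup \mathcal{Y}_3) = \emptyset$. As f is a one-to-one mapping, $\mathcal{Y}'_2 \cup \mathcal{Y}'_4 \cup \mathcal{Y}'_5 = \mathcal{Y}_2 \cup \mathcal{Y}_4 \cup \mathcal{Y}_5$. Hence, we conclude that $\mathcal{Y}_1, \mathcal{Y}'_2, \mathcal{Y}_3, \mathcal{Y}'_4, \mathcal{Y}'_5$ form a partition of $\{0,1\}^n$.

We use the following facts in the proof of claims 1) – 5):
$$d_{\mathcal{C}}(y) = \min\{ d_{\mathcal{O}}(y), d_{\mathcal{P}}(y) \}, \quad d_{\mathcal{C}'}(y) = \min\{ d_{\mathcal{O}}(y), d'_{\mathcal{P}}(y) \}, \quad d_{\mathcal{C}'}(f(y)) = \min\{ d'_{\mathcal{O}}(y), d_{\mathcal{P}}(y) \}.$$
To prove claim 1), for $y \in \mathcal{Y}_1$, by the definition of $\mathcal{Y}_1$, $d_{\mathcal{O}} \le \min\{ d_{\mathcal{P}}, d'_{\mathcal{P}} \}$. Hence $d_{\mathcal{C}}(y) = d_{\mathcal{C}'}(y) = d_{\mathcal{O}}$.

To prove claim 2), for $y \in \mathcal{Y}_2$, by the definition of $\mathcal{Y}_2$, $d_{\mathcal{P}} \le \min\{ d_{\mathcal{O}}, d'_{\mathcal{O}} \}$, and hence $d_{\mathcal{C}}(y) = d_{\mathcal{P}}$. Further, with $y' = f(y)$,
$$d_{\mathcal{C}'}(y') = d_{\mathcal{O}}(y') \wedge d'_{\mathcal{P}}(y') = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y).$$
To prove claim 3), for $y \in \mathcal{Y}_3$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$ by the definition of $\mathcal{Y}_3$. Moreover, $d_{\mathcal{C}'}(y) = d_{\mathcal{O}}(y) \wedge d'_{\mathcal{P}}(y) = d'_{\mathcal{P}}(y) < d_{\mathcal{C}}(y)$. By (62), we have $d_{\mathcal{P}} = d'_{\mathcal{P}} + 1$.

To prove claim 4), for $y \in \mathcal{Y}_4$, by the definition of $\mathcal{Y}_4$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$ and $d_{\mathcal{C}'}(y') = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{P}}(y)$.

To prove claim 5), for $y \in \mathcal{Y}_5$, $d_{\mathcal{C}}(y) = d_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d_{\mathcal{O}}(y)$ by the definition of $\mathcal{Y}_5$. Moreover, $d_{\mathcal{C}'}(y') = d'_{\mathcal{O}}(y) \wedge d_{\mathcal{P}}(y) = d'_{\mathcal{O}}(y) > d_{\mathcal{C}}(y)$. By (62), we have $d'_{\mathcal{O}} = d_{\mathcal{O}} + 1$.

Proof of Theorem 11. As $\{ \mathcal{Y}_1, \mathcal{Y}'_2, \mathcal{Y}_3, \mathcal{Y}'_4, \mathcal{Y}'_5 \}$ is a partition of $\{0,1\}^n$, we have $\alpha_d(\mathcal{C}') = \sum_{i=1}^{5} \alpha^i_d(\mathcal{C}')$, where
$$\begin{aligned} \alpha^1_d(\mathcal{C}') &= |\{ y \in \mathcal{Y}_1 : d_{\mathcal{C}'}(y) = d \}| = \alpha^1_d(\mathcal{C}), \\ \alpha^2_d(\mathcal{C}') &= |\{ y \in \mathcal{Y}'_2 : d_{\mathcal{C}'}(y) = d \}| = \alpha^2_d(\mathcal{C}), \\ \alpha^3_d(\mathcal{C}') &= |\{ y \in \mathcal{Y}_3 : d_{\mathcal{C}'}(y) = d \}| = \begin{cases} \alpha^3_{d+1}(\mathcal{C}) & d < n \\ 0 & d = n, \end{cases} \\ \alpha^4_d(\mathcal{C}') &= |\{ y \in \mathcal{Y}'_4 : d_{\mathcal{C}'}(y) = d \}| = \alpha^4_d(\mathcal{C}), \\ \alpha^5_d(\mathcal{C}') &= |\{ y \in \mathcal{Y}'_5 : d_{\mathcal{C}'}(y) = d \}| = \begin{cases} \alpha^5_{d-1}(\mathcal{C}) & d \ge 1 \\ 0 & d = 0. \end{cases} \end{aligned}$$
The second equality in each line follows from Lemma 10.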
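For small block lengths, the counts $\alpha_d(\mathcal{C}) = |\{ y : d_{\mathcal{C}}(y) = d \}|$ used in this proof can be obtained by exhaustive enumeration, which gives a convenient sanity check. The following is a minimal sketch, not the evaluation procedure of the paper: the function names `distance_profile` and `ml_success_probability` are hypothetical, codewords are represented as tuples of bits, and the success probability is computed as $\frac{1}{|\mathcal{C}|} \sum_{d} \alpha_d(\mathcal{C}) (1-\epsilon)^{n-d} \epsilon^{d}$, the form that appears in the difference written next.

```python
from itertools import product


def distance_profile(code):
    """Return [alpha_0, ..., alpha_n], where alpha_d counts the vectors y
    in {0,1}^n whose minimum Hamming distance to the code equals d."""
    n = len(code[0])
    alpha = [0] * (n + 1)
    for y in product((0, 1), repeat=n):
        # d_C(y): distance from y to the nearest codeword
        d = min(sum(a != b for a, b in zip(c, y)) for c in code)
        alpha[d] += 1
    return alpha


def ml_success_probability(code, eps):
    """ML decoding success probability on a BSC with crossover eps < 1/2,
    assuming equiprobable codewords (brute force, small n only)."""
    n = len(code[0])
    alpha = distance_profile(code)
    total = sum(a * (1 - eps) ** (n - d) * eps ** d for d, a in enumerate(alpha))
    return total / len(code)
```

Comparing `ml_success_probability(C, eps)` and `ml_success_probability(C_prime, eps)` for two small codes gives a direct numerical check of $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$; the enumeration cost grows as $2^n$, so this is only practical for short block lengths.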
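Together with (22), we write
$$\begin{aligned} \lambda_{\mathcal{C}'} - \lambda_{\mathcal{C}} &= \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \bigl( \alpha_d(\mathcal{C}') - \alpha_d(\mathcal{C}) \bigr) (1-\epsilon)^{n-d} \epsilon^{d} \\ &= \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \sum_{i=1}^{5} \bigl( \alpha^i_d(\mathcal{C}') - \alpha^i_d(\mathcal{C}) \bigr) (1-\epsilon)^{n-d} \epsilon^{d} \\ &= \frac{1}{|\mathcal{C}|} \sum_{d=0}^{n} \sum_{i=3,5} \bigl( \alpha^i_d(\mathcal{C}') - \alpha^i_d(\mathcal{C}) \bigr) (1-\epsilon)^{n-d} \epsilon^{d}. \end{aligned}$$
By substituting $\alpha^3_d(\mathcal{C}') = \alpha^3_{d+1}(\mathcal{C})$ and $\alpha^5_d(\mathcal{C}') = \alpha^5_{d-1}(\mathcal{C})$, we see that $\lambda_{\mathcal{C}'} \ge \lambda_{\mathcal{C}}$ if and only if
$$\sum_{d=0}^{n} \bigl[ \alpha^3_{d+1}(\mathcal{C}) - \alpha^3_d(\mathcal{C}) + \alpha^5_{d-1}(\mathcal{C}) - \alpha^5_d(\mathcal{C}) \bigr] \left( \frac{\epsilon}{1-\epsilon} \right)^{d} \ge 0,$$
where the left-hand side can be further simplified as
$$\sum_{d=1}^{n} \bigl[ \alpha^3_d(\mathcal{C}) - \alpha^5_{d-1}(\mathcal{C}) \bigr] \left( \frac{\epsilon}{1-\epsilon} \right)^{d-1} \left( 1 - \frac{\epsilon}{1-\epsilon} \right).$$
The theorem is proved by checking that, in the above argument, the relation ≥ can be replaced by >.

Proof of Corollary 12.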
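Let $\bar{\epsilon} = \frac{\epsilon}{1-\epsilon}$ and let $\Psi_d = \sum_{i=1}^{d} \bigl[ \alpha^3_i(\mathcal{C}) - \alpha^5_{i-1}(\mathcal{C}) \bigr]$ for d = 1, . . . , n and $\Psi_0 = 0$. Write
$$\sum_{d=1}^{n} \bigl[ \alpha^3_d(\mathcal{C}) - \alpha^5_{d-1}(\mathcal{C}) \bigr] \left( \frac{\epsilon}{1-\epsilon} \right)^{d-1} = \sum_{d=1}^{n} (\Psi_d - \Psi_{d-1}) \bar{\epsilon}^{\,d-1} = \Psi_n \bar{\epsilon}^{\,n-1} + \sum_{d=1}^{n-1} \Psi_d \bigl( \bar{\epsilon}^{\,d-1} - \bar{\epsilon}^{\,d} \bigr).$$
Note that for $0 < \epsilon < 1/2$, $\bar{\epsilon}^{\,d} = \left( \frac{\epsilon}{1-\epsilon} \right)^{d}$ is a strictly decreasing function of d. Hence, if $\Psi_d \ge 0$ for all d = 1, . . . , n, every term on the right-hand side is nonnegative, and the sufficient conditions of the corollary follow from Theorem 11.

APPENDIX B
A LEMMA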
A similar result has been proved in [4]. Here we provide a proof for completeness.
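The binomial inequality stated in the lemma that follows can be spot-checked numerically before reading its proof. The sketch below rests on an assumed reading of the lemma: two sizes `a <= b` of the same parity and two weights `u`, `v` (all four names are hypothetical stand-ins for the lemma's symbols), restricted by the two conditions the proof actually uses, namely $\hat{u} + \hat{v} \ge 0$ and $\hat{u} - \hat{v} \ge 0$ with $\hat{u} = u - a/2$ and $\hat{v} = v - b/2$. Any further constraints in the definition of W in (37) are not modeled, so the check covers a superset of the cases.

```python
from math import comb


def binom(n, k):
    """Binomial coefficient, taken to be 0 outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def count_violations(max_size):
    """Count parameter choices (a, b, u, v) that violate
    C(a, u) * C(b, v) <= C(a, (a - b)/2 + v) * C(b, (b - a)/2 + u)
    under the assumed constraints described above."""
    violations = 0
    for a in range(max_size + 1):
        for b in range(a, max_size + 1, 2):  # b >= a with the same parity
            shift = (b - a) // 2
            for u in range(a + 1):
                for v in range(b + 1):
                    if 2 * (u + v) < a + b:  # requires u_hat + v_hat >= 0
                        continue
                    if 2 * (u - v) < a - b:  # requires u_hat - v_hat >= 0
                        continue
                    lhs = binom(a, u) * binom(b, v)
                    rhs = binom(a, v - shift) * binom(b, u + shift)
                    if lhs > rhs:
                        violations += 1
    return violations
```

Calling `count_violations(10)` scans all admissible parameter choices with sizes up to 10; a nonzero return value would indicate that the assumed reading of the constraints differs from the lemma.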
Lemma 14.
Suppose | | ≤ | | of the same parity. For ( w , w ) ∈ W (defined in (37) ), (cid:18) | | w (cid:19)(cid:18) | | w (cid:19) ≤ (cid:18) | | | |−| | + w (cid:19)(cid:18) | | | |−| | + w (cid:19) . Proof.
Let ˆ w i = w i − | i | , i = 3 , . The inequality to prove becomes
( | | | | + ˆ w )( | | | | + ˆ w ) ≤ ( | | | | + ˆ w )( | | | | + ˆ w ) ,
where each parenthesized pair denotes a binomial coefficient. We have by the definition of W in (37), ˆ w + ˆ w ≥ , ˆ w − ˆ w ≥ , ˆ w ≥ n + 12 − d. We write
( | | | | + ˆ w )( | | | | + ˆ w )( | | | | + ˆ w )( | | | | + ˆ w ) = | |··· ( | | − ˆ w +1)( | | + ˆ w )! | |··· ( | | − ˆ w +1)( | | + ˆ w )! | |··· ( | | − ˆ w +1)( | | + ˆ w )! | |··· ( | | − ˆ w +1)( | | + ˆ w )! = ( | | − ˆ w ) · · · ( | | − ˆ w + 1)( | | + ˆ w ) · · · ( | | + ˆ w + 1) · ( | | + ˆ w ) · · · ( | | + ˆ w + 1)( | | − ˆ w ) · · · ( | | − ˆ w + 1) = ˆ w − ˆ w Y i =1 | | − ˆ w + i | | + ˆ w + i | | + ˆ w + i | | − ˆ w + i = ˆ w − ˆ w Y i =1 ( | | + i )( | | + i ) − ˆ w ˆ w + ( | | + i ) ˆ w − ˆ w ( | | + i )( | | + i )( | | + i ) − ˆ w ˆ w − ( | | + i ) ˆ w + ˆ w ( | | + i ) ≤ 1 ,
where the last inequality is obtained by comparing the last two terms of the denominator and the numerator:
( | | + i ) ˆ w − ˆ w ( | | + i ) − ( − ( | | + i ) ˆ w + ˆ w ( | | + i ) ) = ( ˆ w + ˆ w ) ( | | − | | ) ≤ 0 ,
where the inequality follows from ˆ w + ˆ w ≥ 0 and | | ≤ | | .

REFERENCES
[1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] D. Slepian, “A class of binary signaling alphabets,” Bell System Technical Journal, vol. 35, no. 1, pp. 203–234, 1956.
[3] W. W. Peterson and E. J. Weldon Jr., Error-Correcting Codes. MIT Press, 1972.
[4] P.-N. Chen, H.-Y. Lin, and S. M. Moser, “Optimal ultrasmall block-codes for binary discrete memoryless channels,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7346–7378, 2013.
[5] J. Cordaro and T. Wagner, “Optimum (n, 2) codes for small values of channel error probability (corresp.),”