A Practical Coding Scheme for the BSC with Feedback
Ke Wu* and Aaron B. Wagner†
*Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213 USA. [email protected].
†School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14850 USA. [email protected].
Abstract—We provide a practical implementation of the rubber method of Ahlswede et al. for binary alphabets. The idea is to create the "skeleton" sequence therein via an arithmetic decoder designed for a particular k-th order Markov chain. For the stochastic binary symmetric channel, we show that the scheme is nearly optimal in a strong sense for certain parameters.

I. INTRODUCTION
We consider the binary symmetric channel with ideal feedback, both in its stochastic- and adversarial-noise forms. In the former, each bit is flipped independently with some probability p. In the latter, an omniscient adversary can flip up to a fraction f of the bits in order to disrupt the communication.

The information-theoretic limits for both forms of the channel, assuming perfect feedback, are well-known. In the adversarial case, the capacity as a function of f was determined by Zigangirov [1], building on earlier results of Berlekamp [2]. For the stochastic version, the capacity equals that of the non-feedback version (e.g., [3], [4]), and likewise the high-rate error exponent, normal approximation, and moderate deviations performance are all unimproved by feedback. In fact, the third-order coding rate is unimproved by feedback [5], as is the order of the optimal "pre-factor" in front of the error exponent at high rates. Thus, at least for the stochastic version of the channel, feedback offers very little improvement in coding performance.

In general, feedback is known to simplify the coding problem even if it does not provide for improved performance. The erasure (e.g., [6, Section 17.1]) and Gaussian channels [7], [8] provide striking examples of this phenomenon. For the BSC, see [9], [10] for classical and [11], [12] for recent work on devising implementable schemes using feedback.

For the adversarial symmetric channel with feedback (and arbitrary, finite alphabet size), Ahlswede et al. [13] proposed an explicit scheme called the rubber method. In the binary case, for a fixed ℓ > 1, the message is encoded as a "skeleton" string containing no substring of ℓ consecutive zeros. The encoder then transmits this string, sending ℓ consecutive zeros to indicate that an error has occurred. For each ℓ, this scheme achieves the capacity of the adversarial channel for a certain choice of f.
This scheme simplifies significantly the original achievability argument of Berlekamp [2]. For ternary and larger alphabets, the scheme is even simpler. The rubber method has since been generalized [14]–[16].

We only consider the binary case in this paper, and we make two contributions. The first is to propose the use of arithmetic coding applied to a particular Markov chain in order to efficiently encode the message sequence into the corresponding skeleton string. This results in a practically-implementable end-to-end scheme, with only a negligible rate penalty. The second contribution is showing that, for each ℓ, there is a special rate R*_ℓ and crossover probability p_ℓ such that the resulting scheme is optimal with respect to the second-order coding rate and moderate deviations performance for the channel with crossover probability p_ℓ, and error-exponent optimal at rate R*_ℓ for all channels with crossover probability less than p_ℓ. We also consider the third-order coding rate and the "pre-factor" of the error exponent of the scheme. These turn out to be nearly, but not exactly, optimal. See Section V.

In Section II we introduce our notation and provide various preliminaries. In Sections III and IV we describe our coding scheme. In Section V we present our main results.

II. NOTATION AND PRELIMINARIES
Capital letters such as X or Y denote random variables. We use x^n to denote the first n bits of the sequence x_1, . . . , x_n, and we use z ‖ z′ to denote the concatenation of two strings z and z′. In addition, ⌊x^N⌋_L denotes the truncation of x^N to the first L bits.

We use Bin(n, p) to denote the binomial distribution with size n and success probability p and N(µ, σ²) to denote the normal distribution with mean µ and variance σ². Moreover, B(p) denotes the Bernoulli distribution with success probability p. We use D(P ‖ Q) to denote the Kullback–Leibler divergence between distributions P and Q.

A. The Channel Model
Let BSC(p) denote a binary symmetric channel with crossover probability p ∈ (0, 1/2) without feedback. That is, BSC(p) has input alphabet X = {0, 1} and output alphabet Y = {0, 1}, and probability transition matrix

  p(y|x) = ( 1−p    p  )
           (  p    1−p ).

Suppose that an encoder wishes to send a message m in a message space M through BSC(p). It first encodes the message m using an encoding function f, and sends x^N = f(m) through the channel. The decoder, upon receiving y^N from the channel, runs a decoding function g on y^N to obtain m′. The pair (f, g) is called a code C_{N,R} with block length N and rate
R = log|M| / N. The (average) error probability of a code C_{N,R} is defined as

  P_e(C_{N,R}) := (1/|M|) Σ_{m ∈ M} Pr[m′ ≠ m].

Fig. 1: Capacity for the BSC with feedback and the adversarial BSC with feedback, as a function of p and the rate R.

The capacity of BSC(p) is well-known to be C(BSC(p)) = 1 − h(p), where h(p) = −p log p − (1−p) log(1−p) is the binary entropy function, and the log is base-2 throughout.

We will also consider the adversarial binary symmetric channel BSC_adv(f), in which at most an f fraction of the transmitted bits can be adversarially flipped.

Feedback allows the encoder to see exactly what the decoder receives after each transmission and update its next transmission accordingly. In the BSC with feedback, which we denote as BSC_fb(p), the encoding function f consists of a sequence of maps {f_i}_{i=1}^N. Each f_i takes as input m, y_1, . . . , y_{i−1}, and outputs x_i, the next bit to send. The decoder then runs g(y^N) to obtain m′.

It is well-known that feedback does not improve the channel capacity: C(BSC_fb(p)) = 1 − h(p). For the adversarial feedback BSC channel
BSC_fbadv(f), an upper bound on the capacity was first shown by Berlekamp [2]. He also gave a lower bound that coincides with the upper bound when f ≥ (3−√5)/4. A lower bound that coincides with the upper bound for f < (3−√5)/4 was given by Zigangirov [1], thus determining the capacity of BSC_fbadv(f):

  C(BSC_fbadv(f)) = { 1 − h(f),                  if 0 ≤ f ≤ (3−√5)/4,
                    { (1 − 3f) log((1+√5)/2),    if (3−√5)/4 < f ≤ 1/3.

We say that a code C for the BSC_fbadv(f) is admissible if C can correct any error pattern with error fraction at most f. We say that a sequence of codes {C_{N,R}}_N for BSC_fb(p) is admissible if the error probability P_e(C_{N,R}) tends to 0 as N goes to infinity.

Fig. 2: The decoder's stack with ℓ = 2.

B. Markov Chains
Definition 1. A discrete stochastic process {X_i} is said to be an (ℓ−1)-th order Markov chain if for any i,

  Pr[X_i = x_i | X_1 = x_1, . . . , X_{i−1} = x_{i−1}]
    = Pr[X_i = x_i | X_{i−ℓ+1} = x_{i−ℓ+1}, . . . , X_{i−1} = x_{i−1}],

for all x_1, . . . , x_i ∈ X.

C. Rubber Method
Here we briefly present the rubber method for BSC_fbadv(f) [13]. Let A_ℓ^{N′} denote the set of binary sequences of length N′ with no ℓ consecutive zeros. Such sequences are called skeleton sequences. The sender chooses a skeleton sequence x^{N′} ∈ A_ℓ^{N′}, and the decoder's goal is to recover that sequence correctly. The idea is that the encoder can use ℓ consecutive zeros to signal an error. Specifically, we have

• Decoding g_R: the decoder maintains a stack of received bits, which begins empty. Whenever the decoder receives a bit, it pushes the received bit onto the stack and checks whether there are ℓ consecutive zeros in the stack. If so, it removes these ℓ zeros as well as the bit before these ℓ consecutive zeros from the stack. Finally, it truncates the output to N′ bits.

• Encoding f_R: if the decoder's current stack is a prefix of x^{N′}, then send the next bit in x^{N′}. Otherwise send a 0. If x^{N′} has been sent in its entirety, then send 1 for all remaining time steps.

Proposition 2.
For skeleton sequence set A_ℓ^{N′} and block length N, a code constructed using the rubber method is admissible for BSC_fbadv(f) if

  N′ + (ℓ+1) f N ≤ N.   (1)

Proof.
See Section 2.2 of [17].
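To make the stack dynamics concrete, the following is a minimal Python sketch of the rubber method (our illustration, not the authors' implementation; the function name and interface are ours). The encoder observes the decoder's stack through the ideal feedback link, and `flips` is the set of channel uses at which the transmitted bit is corrupted:

```python
def rubber_transmit(x, N, l, flips=frozenset()):
    """Simulate the rubber method: send skeleton string x (a list of bits
    with no run of l zeros) over N uses of a binary channel with ideal
    feedback.  `flips` holds the time indices at which the channel flips
    the sent bit.  Returns the decoder's reconstruction of x."""
    stack = []
    for t in range(N):
        # Encoder f_R: if the decoder's stack is a prefix of x, send the
        # next skeleton bit (1s once x has been fully delivered);
        # otherwise send 0 to start signalling an erasure.
        k = len(stack)
        if stack == list(x[:k]):
            bit = x[k] if k < len(x) else 1
        else:
            bit = 0
        bit ^= t in flips  # channel flip (adversarial or stochastic)
        # Decoder g_R: push the bit; on seeing l consecutive zeros, erase
        # them together with the bit preceding the run (just the zeros if
        # the run sits at the very bottom of the stack).
        stack.append(bit)
        if stack[-l:] == [0] * l:
            if len(stack) > l:
                del stack[-(l + 1):]
            else:
                del stack[-l:]
    return stack[:len(x)]
```

With x = 011010 and ℓ = 2 as in Example 3 below, flipping one transmitted bit forces an erasure of three stack bits followed by a resend, consistent with the (ℓ+1)-bits-per-error budget in (1).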
Example 3.
Suppose the encoder chooses x = 011010 ∈ A_2^6 and the adversary can flip at most a fraction f of the bits. Suppose the first three bits the decoder receives are 010, which is not a prefix of x. The encoder then sends 0, and suppose the decoder sees 0. The decoder then erases the last three bits (the consecutive zeros and the one before them) and its stack becomes 0. This is now a prefix of x, and the encoder would thus resend the second bit in x, which is 1. See Figure 2.

D. Shannon–Fano–Elias Code and Arithmetic Coding

The Shannon–Fano–Elias code compresses a source sequence with known distribution to near-optimal length. It uses the cumulative distribution function F(x) to allot codewords. For a random variable X ∈ {1, 2, . . . , M} with distribution p, the codeword for outcome x is ⌊F̄(x)⌋_{l(x)}, where F̄(x) = Σ_{a<x} p(a) + p(x)/2 and l(x) = ⌈log(1/p(x))⌉ + 1. Arithmetic coding implements this construction sequentially, refining the coding interval using the conditional distribution of each successive source symbol.

Lemma 4 (Theorem 2.3.6, [19]). A sequence A(n) is an order-d constant-recursive sequence if for all n ≥ d + 1,

  A(n) = c_1 A(n−1) + c_2 A(n−2) + · · · + c_d A(n−d).

The n-th term A(n) of such a sequence must be of the form

  A(n) = k_1(n) λ_1^n + k_2(n) λ_2^n + · · · + k_{d′}(n) λ_{d′}^n,

where λ_i is a root with multiplicity d_i of the polynomial λ^d − c_1 λ^{d−1} − · · · − c_d, and k_i(n) is a polynomial with degree d_i − 1.

Definition 5 ((8.3.16), [20]). A matrix M is a positive (non-negative) matrix if every entry of M is positive (non-negative). A non-negative square matrix M is primitive if its k-th power is positive for some natural number k.

Lemma 6 (Perron–Frobenius Theorem, Page 674, [20]). If M is a primitive matrix, then M has a positive real eigenvalue λ* such that all other eigenvalues λ_i have absolute value |λ_i| < |λ*|. Moreover, λ* is a simple eigenvalue, and its corresponding column and row eigenvectors are positive.

See [20, Ch. 8] for further detail about the Perron–Frobenius theorem.

III.
A KEY MARKOV CHAIN

In this section we show that we can efficiently compute the distribution of a Markov chain that is uniformly distributed over A_ℓ^N. Recall that A_ℓ^N denotes the set of binary sequences of length N with no ℓ consecutive zeros.

Lemma 7. Let λ*_ℓ be the unique real solution that lies in (1, 2) of

  λ^ℓ = λ^{ℓ−1} + λ^{ℓ−2} + · · · + 1.   (2)

Then lim_{N→∞} |A_ℓ^N| / λ*_ℓ^N exists and is positive and finite.

Proof. We first compute the cardinality of A_ℓ^N. Let A_ℓ(N) denote |A_ℓ^N|. Consider all allowable sequences in A_ℓ^N. The number of sequences in A_ℓ^N that begin with 1 is A_ℓ(N−1). The number of sequences in A_ℓ^N that begin with 01 is A_ℓ(N−2), and so on. Continuing recursively we have that

  A_ℓ(N) = A_ℓ(N−1) + A_ℓ(N−2) + · · · + A_ℓ(N−ℓ).

Let λ_1, . . . , λ_{ℓ′} be the roots of equation (2), where λ_i has multiplicity d_i. Note that λ^ℓ − λ^{ℓ−1} − · · · − 1 is also the characteristic polynomial of the following ℓ × ℓ non-negative matrix:

  M_ℓ = ( 0 1 0 · · · 0 )
        ( 0 0 1 · · · 0 )
        (     . . .     )
        ( 0 0 0 · · · 1 )
        ( 1 1 1 · · · 1 ).

Therefore λ_1, . . . , λ_{ℓ′} are also the eigenvalues of M_ℓ. It is easy to see that equation (2) has exactly one positive real root, that it lies inside (1, 2), and that there is no real root in [2, +∞). Without loss of generality, we assume that λ_1 is this root. Moreover, M_ℓ is primitive since (M_ℓ)^ℓ is a positive matrix. According to the Perron–Frobenius theorem, λ_1 is a simple root (multiplicity 1) of equation (2) and |λ_i| < |λ_1| for i = 2, . . . , ℓ′. Therefore,

  A_ℓ(N) = k_1 λ_1^N + k_2(N) λ_2^N + · · · + k_{ℓ′}(N) λ_{ℓ′}^N,   (3)

where k_i(·) is a polynomial with degree d_i − 1.
Since λ_1 is a simple dominant root and its corresponding column and row eigenvectors are positive, according to Theorem 2.4.2 in [19] and Lemma 6,

  lim_{N→∞} |A_ℓ^N| / λ_1^N = k_1 > 0.

Note that Lemma 7 implies that lim_{N→∞} (1/N) log|A_ℓ^N| = log λ*_ℓ.

Lemma 8. The stochastic process that is uniformly distributed over A_ℓ^N is an (ℓ−1)-th order Markov chain.

Proof. Let z be any binary sequence. We abuse notation slightly by defining A_ℓ(z) to be the number of allowable sequences in A_ℓ^N that begin with z. Suppose {X_i}_{i=1}^N is a stochastic process that is uniformly distributed over A_ℓ^N. Then we have

• Pr[X_1 = 1] = (number of sequences beginning with 1) / |A_ℓ^N| = A_ℓ(N−1) / A_ℓ(N);

• Pr[X_1 = 0] = 1 − A_ℓ(N−1) / A_ℓ(N);

• for i ≥ 2 and any z ∈ {0,1}^{i−1},

  Pr[X_i = 1 | X_1, . . . , X_{i−1} = z] = A_ℓ(N−i) / A_ℓ(z),
  Pr[X_i = 0 | X_1, . . . , X_{i−1} = z] = 1 − A_ℓ(N−i) / A_ℓ(z).

To see that {X_i} is an (ℓ−1)-th order Markov chain, we only need to show that for any i ≥ ℓ and z ∈ {0,1}^{i−1},

  Pr[X_i = 1 | X_1, . . . , X_{i−1} = z] = Pr[X_i = 1 | X_{i−ℓ+1}, . . . , X_{i−1} = z[i−ℓ+1, i−1]].

Fix a z ∈ {0,1}^{i−1} for i ≥ ℓ. Suppose z ends with α zeros. Then 0 ≤ α ≤ ℓ−1, since the sequence is in A_ℓ^N. We have that

  A_ℓ(z) = A_ℓ(N−i+α+1) − Σ_{k=0}^{α−1} A_ℓ(N−i+k+1).   (4)

When α = 0, equation (4) becomes A_ℓ(z) = A_ℓ(N−i+1). Since the right-hand side of (4) depends on z only through α, and α is determined by the last ℓ−1 bits of z, it follows that for any z and z′ that have the same last ℓ−1 bits, A_ℓ(z) = A_ℓ(z′), and hence Pr[X_i = 1 | X_1, . . . , X_{i−1} = z] = Pr[X_i = 1 | X_1, . . . , X_{i−1} = z′].

Then for any x ∈ {0,1}^{ℓ−1} and any x′ ∈ {0,1}^{i−ℓ}, we have

  Pr[X_i = 1 | X_{i−ℓ+1},
  . . . , X_{i−1} = x]
    = Σ_{x′′ ∈ {0,1}^{i−ℓ}} Pr[X_i = 1 | X_1, . . . , X_{i−1} = x′′ ‖ x] · Pr[X^{i−ℓ} = x′′ | X_{i−ℓ+1}, . . . , X_{i−1} = x]
    = Σ_{x′′ ∈ {0,1}^{i−ℓ}} Pr[X_i = 1 | X_1, . . . , X_{i−1} = x′ ‖ x] · Pr[X^{i−ℓ} = x′′ | X_{i−ℓ+1}, . . . , X_{i−1} = x]
    = Pr[X_i = 1 | X_1, . . . , X_{i−1} = x′ ‖ x],

where the second equality comes from the fact that for any two conditioning sequences with the same last ℓ−1 bits,

  Pr[X_i = 1 | X_1, . . . , X_{i−1} = x′′ ‖ x] = Pr[X_i = 1 | X_1, . . . , X_{i−1} = x′ ‖ x].

The proof shows that to compute the probability of the next symbol in the string given the past, we only need to compute A_ℓ(N) for various values of N. This can be done using A_ℓ(N) = k_1 λ_1^N + k_2(N) λ_2^N + · · · + k_{ℓ′}(N) λ_{ℓ′}^N, where the λ_i are the roots of equation (2) and k_1, k_2(N), . . . , k_{ℓ′}(N) can be determined from the initial conditions A_ℓ(1) = 2, . . . , A_ℓ(ℓ−1) = 2^{ℓ−1}, A_ℓ(ℓ) = 2^ℓ − 1. Note that when N is large, A_ℓ(N) is well-approximated as A_ℓ(N) ≈ k_1 λ*_ℓ^N. Under this approximation the Markov chain becomes time-invariant.

Example 9. Consider the case ℓ = 2. That is, we forbid two consecutive zeros in the skeleton sequence. Then the characteristic polynomial is λ² − λ − 1. The two roots are λ_1 = (1+√5)/2 and λ_2 = (1−√5)/2, respectively. The initial conditions are A_ℓ(1) = 2, A_ℓ(2) = 3. Therefore A_ℓ(N) = k_1 λ_1^N + k_2 λ_2^N, where k_1 = (5+3√5)/10, k_2 = (5−3√5)/10. See also [3, Ex. 4.7].

IV. A PRACTICAL CODING SCHEME

In this section we combine arithmetic coding and the rubber method to give an efficient feedback code for BSC_fbadv(f) and BSC_fb(p).
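Since the arithmetic coder used by the scheme is driven entirely by the counts A_ℓ(N), they are worth sanity-checking numerically. The sketch below (our own illustration; function names are ours) tabulates the recursion from Lemma 7, evaluates the next-symbol probability via equation (4), and checks Example 9's closed form for ℓ = 2:

```python
import math

def count_skeletons(N, l):
    """A_l(N): number of binary strings of length N with no run of l zeros,
    via the recursion A_l(n) = A_l(n-1) + ... + A_l(n-l) from Lemma 7."""
    A = {0: 1}  # the empty string
    for n in range(1, N + 1):
        A[n] = 2 ** n if n < l else sum(A[n - j] for j in range(1, l + 1))
    return A[N]

def p_next_one(N, l, prefix):
    """Pr[X_i = 1 | X_1..X_{i-1} = prefix] for the process uniform on
    A_l^N, using A_l(z) from equation (4)."""
    i = len(prefix) + 1
    alpha = 0  # number of trailing zeros of the prefix
    for b in reversed(prefix):
        if b:
            break
        alpha += 1
    A_z = count_skeletons(N - i + alpha + 1, l) - sum(
        count_skeletons(N - i + k + 1, l) for k in range(alpha))
    return count_skeletons(N - i, l) / A_z

# Example 9 (l = 2): closed form k1*phi^N + k2*psi^N with phi the golden
# ratio; k1, k2 are pinned down by A_2(1) = 2, A_2(2) = 3.
phi, psi = (1 + math.sqrt(5)) / 2, (1 - math.sqrt(5)) / 2
k1, k2 = (5 + 3 * math.sqrt(5)) / 10, (5 - 3 * math.sqrt(5)) / 10

def a2_closed(N):
    return round(k1 * phi ** N + k2 * psi ** N)
```

For instance, `count_skeletons(6, 2)` counts the 21 strings of A_2^6, to which Example 3's x = 011010 belongs, and `p_next_one(2, 2, [0])` equals 1, reflecting that a 0 must be followed by a 1 when ℓ = 2.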
First we describe a modified version of arithmetic coding that will be used in our scheme. Consider the following pair of algorithms (Decom_ℓ, Com_ℓ):

Algorithm 10 (Decom_ℓ, Com_ℓ). Let L = ⌈log|A_ℓ^N|⌉. Let {X_i}_{i=1}^N be a stochastic process that is uniformly distributed over A_ℓ^N. Let (A_C, A_D), where A_C : A_ℓ^N → {0,1}^{L+1} and A_D : {0,1}^{L+1} → A_ℓ^N ∪ {⊥}, be the compression and decompression algorithms for arithmetic coding applied to {X_i}_{i=1}^N, where the decompressor outputs ⊥ if its input is not a valid codeword. Let L′ be any integer such that L′ ≤ L − 2.

Decom_ℓ(m) : {0,1}^{L′} → A_ℓ^N
1) Run the decompression algorithm A_D(m ‖ m′) for all possible m′ ∈ {0,1}^{L+1−L′}. Let the first non-⊥ output be A_D(m ‖ m′) = x^N. If there is no such x^N, set x^N to be a random sequence in A_ℓ^N.
2) Output x^N.

Com_ℓ(x^N) : A_ℓ^N → {0,1}^{L′}
1) Output ⌊A_C(x^N)⌋_{L′}.

Lemma 11. The pair of algorithms (Decom_ℓ, Com_ℓ) described in Algorithm 10 satisfies

  Com_ℓ(Decom_ℓ(m)) = m,  for all m ∈ {0,1}^{L′}.

Proof. Suppose that all sequences in A_ℓ^N are lexicographically sorted and x^N + 1 is the sequence following x^N. Note that for some binary sequences of length L+1, A_D might output ⊥, namely if the binary sequence is not a Shannon–Fano–Elias codeword for any x^N ∈ A_ℓ^N. As long as there exists an m′ such that A_D(m ‖ m′) ≠ ⊥, Com_ℓ(Decom_ℓ(m)) = m due to the correctness of arithmetic coding.
Therefore we only need to show that for any m ∈ {0,1}^{L′}, there exists m′ ∈ {0,1}^{L+1−L′} such that A_D(m ‖ m′) ≠ ⊥. We will prove that for any m ∈ {0,1}^{L′}, there must exist an x^N ∈ A_ℓ^N such that m is a prefix of A_C(x^N). To see this, let each sequence m in {0,1}^{L′} represent an interval of length 2^{−L′} in [0, 1) such that all of the real numbers inside the interval represented by m have prefix m. Note that A_C(x^N) falls between F(x^N) and F(x^N + 1), where F(·) is the cumulative distribution function of {X_i}_{i=1}^N. As X^N is uniformly distributed over A_ℓ^N, for any x^N,

  F(x^N + 1) − F(x^N) = 1/|A_ℓ^N| ≤ 2^{−(L′+1)},

since L′ ≤ L − 2 and |A_ℓ^N| ≥ 2^{L−1}. Therefore, for any m, the interval represented by m, which has length 2^{−L′}, must contain both F(x^N) and F(x^N + 1) for at least one x^N. This indicates that A_C(x^N) ∈ (F(x^N), F(x^N + 1)) must fall inside the interval represented by m. That is, m must be a prefix of A_C(x^N).

Now we describe the construction of our overall scheme:

Construction 12. The encoding and decoding of C_{ℓ,N,R} are as follows:

Encoding:
• Let m^{NR} be a message of length NR bits. Find the minimum natural number N′ such that ⌈log|A_ℓ^{N′}|⌉ ≥ NR + 3.
• Run Decom_ℓ(m) and denote the output as x^{N′}. Let x^{N′} be the skeleton sequence and send it through the feedback channel using the rubber method.

Decoding:
• Let y^N be the sequence received from the feedback channel. Run the decoding algorithm of the rubber method on y^N to get x̃^{N′}. If x̃^{N′} ∉ A_ℓ^{N′}, set x̃^{N′} to be a random skeleton sequence in A_ℓ^{N′}.
• Otherwise, output m′ = Com_ℓ(x̃^{N′}).

Proposition 13.
The code C_{ℓ,N,R} in Construction 12 is admissible for the BSC_fbadv(f) if N′ ≤ (1 − (ℓ+1)f)N.

Proof. Follows directly from Proposition 2 and Lemma 11.

Note that in the first step of encoding, we can find N′ simply by computing A_ℓ(Ñ) for Ñ = NR + 3, . . . , 2(NR + 3), since 2^{Ñ/2} ≤ |A_ℓ^Ñ| ≤ 2^Ñ. See Lemma 25 in the Appendix.

We further note that the above coding scheme also works for the stochastic feedback BSC channel BSC_fb(p):

Proposition 14. The sequence of codes {C_{ℓ,N,R}}_N, each of which is constructed as in Construction 12, is admissible for the BSC_fb(p) if R < R_ℓ(p) = (1 − (ℓ+1)p) log λ*_ℓ.

Proof. The fraction of errors that can be corrected by C_{ℓ,N,R} is

  f_N = (1/(ℓ+1)) (1 − N′/N).

When N tends to infinity,

  lim_{N→∞} f_N = f* = (1/(ℓ+1)) (1 − R/log λ*_ℓ),

according to Lemma 7. If the fraction of errors is less than f_N, then C_{ℓ,N,R} can decode correctly. Let E_i be the indicator of whether the i-th transmitted bit is flipped. The error probability of C_{ℓ,N,R} is thus

  P_e(C_{ℓ,N,R}) = Pr[(1/N) Σ_{i=1}^N E_i ≥ f_N].

The result then follows by the law of large numbers.

V. MAIN RESULTS

We now show that our codes achieve the capacity and the optimal error exponent, second-order rate, and moderate deviations constant for certain parameters.

TABLE I: Numerical results for log λ*_ℓ, the tangent points p_ℓ, and the tangent rates R*_ℓ.

Fig. 3: R_ℓ(p) for different ℓ, together with C(BSC_fb(p)).

A. Capacity

Theorem 15. For any integer ℓ ≥ 2, R_ℓ(p) is tangent to C(BSC_fb(p)).
For p_ℓ = 1/(1 + 2^{(ℓ+1) log λ*_ℓ}) = 1/(1 + λ*_ℓ^{ℓ+1}), R_ℓ(p_ℓ) = C(BSC_fb(p_ℓ)). That is, for any ε > 0, the sequence of codes {C_{ℓ,N,R}}_N as constructed in Construction 12 is admissible for BSC_fb(p_ℓ) with R = C(BSC_fb(p_ℓ)) − ε.

Proof. Note that according to Theorem 2 of [17], R_ℓ(p) is tangent to C(BSC_fb(p)). Moreover, according to Section 3.6 of [2], when p = p_ℓ, R_ℓ(p_ℓ) = C(BSC_fb(p_ℓ)). The result then follows from Proposition 14.

We call the p_ℓ the tangent points and the R*_ℓ = R_ℓ(p_ℓ) the tangent rates. The tangent points p_ℓ, tangent rates R*_ℓ, and log λ*_ℓ values for different ℓ are listed in Table I. The function R_ℓ(p) for different ℓ is plotted in Figure 3. That the rubber method achieves the capacity of the BSC_fb(p_ℓ) is implicit in [17]. We consider three more-refined performance measures.

B. Error Exponent

Lemma 16 (Sphere-packing bound with pre-factor [5], [21]). Let {C_{N,R}}_N be a sequence of codes for the BSC_fb(p), each with rate R < C(BSC_fb(p)). Let q ∈ (0, 1/2) be such that R = 1 − h(q). Let E_sp(R) = D(B(q) ‖ B(p)) and let E′_sp(R) be the slope of the error exponent at R. Then the error probability P_e(C_{N,R}) satisfies

  P_e(C_{N,R}) ≥ K N^{−(1/2)(1+|E′_sp(R)|)} e^{−N E_sp(R)},

where K is a positive constant depending on R.

Theorem 17. For any fixed ℓ ≥ 2, consider the sequence of codes {C_{ℓ,N,R*_ℓ}}_N at the tangent rate R*_ℓ. That is, R*_ℓ = R_ℓ(p_ℓ) = 1 − h(p_ℓ). Then for the BSC_fb(p) with p < p_ℓ, {C_{ℓ,N,R*_ℓ}}_N at rate R*_ℓ achieves the optimal error exponent:

  P_e(C_{ℓ,N,R*_ℓ}) ≤ O(1/√N) e^{−N·E_sp(R*_ℓ)}.
In particular,

  lim_{N→∞} −(1/N) log P_e(C_{ℓ,N,R*_ℓ}) = E_sp(R*_ℓ).

Remark 18. The "pre-factor" order achieved by our scheme is O(1/√N), which is worse than the optimal order of O(N^{−(1/2)(1+|E′_sp(R)|)}) in Lemma 16. Interestingly, for the binary erasure channel (BEC), both with and without feedback, the optimal pre-factor is O(1/√N) [5, Theorem 2]. Rubber coding attempts to emulate a BEC using the BSC, which might explain this connection. A similar gap from strict optimality occurs in the second-order coding rate results to follow. Making the connection between rubber coding and the BEC more precise is an interesting topic for future study.

Proof. Let R̄ = log λ*_ℓ. Let f_N = (1/(ℓ+1))(1 − N′/N) be the fraction of errors C_{ℓ,N,R*_ℓ} can correct. Since ⌈log|A_ℓ^{N′}|⌉ ≥ NR*_ℓ + 3 and ⌈log|A_ℓ^{N′−1}|⌉ < NR*_ℓ + 3, we have

  N′/N ≤ R*_ℓ / log λ*_ℓ + O(1/N).

This indicates that

  f_N = (1/(ℓ+1))(1 − N′/N) = p_ℓ + O(1/N).

If the number of errors is less than N f_N, C_{ℓ,N,R*_ℓ} can correctly decode the message. Define r_N = p(1−f_N) / (f_N(1−p)). Let E_i be the indicator random variable of whether the i-th bit is flipped. By Lemma 24, when N is large, the error probability satisfies

  P_e(C_{ℓ,N,R*_ℓ}) = Pr[Σ_{i=1}^N E_i ≥ N f_N]
    ≤ (e^{−N D(B(f_N) ‖ B(p))} / √(2π f_N (1−f_N) N)) · ( a_N + o((1 − r_N^{⌊(1−f_N)N⌋+1}) / (1 − r_N)) ),

where

  a_N = (1 − r_N^{⌊(1−f_N)N⌋+1} exp(−(⌊(1−f_N)N⌋+1) / (2 f_N (1−f_N) N))) / (1 − r_N exp(−1/(2 f_N (1−f_N) N))).

Since D(B(·) ‖ B(p)) is continuous,

  D(B(f_N) ‖ B(p)) = D(B(p_ℓ) ‖ B(p)) + O(1/N).
Therefore,

  P_e(C_{ℓ,N,R*_ℓ}) ≤ O(1/√N) e^{−N(E_sp(R*_ℓ) + O(1/N))} = O(1/√N) e^{−N E_sp(R*_ℓ)}.

C. Second-order Rate

Lemma 19 (Second-order coding rate: Theorem 15, [22]). Given a block length N and an ε such that 0 < ε < 1, the largest possible rate of a code for the BSC_fb(p) with error probability less than or equal to ε is

  C − (1/√N) √(p(1−p)) log((1−p)/p) Φ^{−1}(1−ε) + (log N)/(2N) + O(1/N),

where Φ denotes the standard Gaussian distribution.

Theorem 20. For any fixed ℓ ≥ 2, consider the BSC_fb(p) with crossover probability p = p_ℓ. Fix ε ∈ (0, 1), let R(N, ε) denote the largest possible rate R such that C_{ℓ,N,R(N,ε)} has error probability at most ε, and let C denote the capacity of the BSC_fb(p). Then for large N,

  R(N, ε) ≥ C − (1/√N) √(p(1−p)) log((1−p)/p) Φ^{−1}(1−ε) − O(1/N).

Remark 21. Note that the (log N)/(2N) term is "missing" from the expansion in Theorem 20. See Remark 18.

Proof. Let

  R_N = C − (1/√N) √(p(1−p)) log((1−p)/p) Φ^{−1}(1−ε) − c₀/N,

where c₀ is a positive constant which we will specify later. We now show that for sufficiently large N, the error probability of C_{ℓ,N,R_N} satisfies P_e(C_{ℓ,N,R_N}) < ε.

Let e*_N = (N/(ℓ+1))(1 − N′/N) denote the number of errors that C_{ℓ,N,R_N} is capable of correcting. According to our construction, ⌈log|A_ℓ^{N′}|⌉ ≥ NR_N + 3 and ⌈log|A_ℓ^{N′−1}|⌉ < NR_N + 3, so we have

  N′/N ≤ R_N / log λ*_ℓ + c₁/N + o(1/N),

where c₁ = −(log k_1)/log λ*_ℓ. Therefore,

  e*_N ≥ (N/(ℓ+1))(1 − R_N/log λ*_ℓ) − c₁/(ℓ+1) − o(1).

Let E_i be the random variable such that E_i = 1 if the i-th bit is flipped. Let Ψ_N be the c.d.f. of the binomial distribution Bin(N, p).
According to the Berry–Esseen theorem (Section 5, [23]), for any N and any x,

  |Ψ_N(xσ√N + Np) − Φ(x)| ≤ c₂/√N,

where Φ is the c.d.f. of the standard Gaussian, σ = √(p(1−p)), and c₂ is a positive constant depending only on p. Therefore,

  P_e(C_{ℓ,N,R_N}) ≤ 1 − Ψ_N(e*_N)
    ≤ 1 − Ψ_N( (N/(ℓ+1))(1 − R_N/log λ*_ℓ) − c₁/(ℓ+1) − o(1) )
    ≤ 1 − Φ( (1/(σ√N)) [ (N/(ℓ+1))(1 − R_N/log λ*_ℓ) − c₁/(ℓ+1) − o(1) − Np ] ) + c₂/√N.

Let R̄ = log λ*_ℓ. Note that

  (1/(σ√N)) [ (N/(ℓ+1))(1 − R_N/R̄) − c₁/(ℓ+1) − o(1) − Np ]
    = (1/(σ√N)) (N/(ℓ+1)) [ 1 − (ℓ+1)p − C/R̄ + (σ(ℓ+1)/√N) Φ^{−1}(1−ε) ] + (1/(σ√N)) ( c₀/((ℓ+1)R̄) − c₁/(ℓ+1) − o(1) )
    = Φ^{−1}(1−ε) + (1/(σ√N)) ( c₀/((ℓ+1)R̄) − c₁/(ℓ+1) − o(1) ),

where the first equality comes from the fact that when p = p_ℓ, log((1−p)/p) = (ℓ+1)R̄, and the second equality comes from the fact that C = (1 − (ℓ+1)p)R̄. Therefore,

  P_e(C_{ℓ,N,R_N}) ≤ 1 − Φ( Φ^{−1}(1−ε) + (1/(σ√N)) ( c₀/((ℓ+1)R̄) − c₁/(ℓ+1) − o(1) ) ) + c₂/√N
    = 1 − [ Φ(Φ^{−1}(1−ε)) + (Φ′(Φ^{−1}(1−ε))/(σ√N)) ( c₀/((ℓ+1)R̄) − c₁/(ℓ+1) ) ] + o(1/√N) + c₂/√N
    = ε − (Φ′(Φ^{−1}(1−ε))/(σ√N)) ( c₀/((ℓ+1)R̄) − c₁/(ℓ+1) ) + c₂/√N + o(1/√N).

For N large enough, o(1/√N) < 1/√N. By picking

  c₀ ≥ ( σ(c₂+1)/Φ′(Φ^{−1}(1−ε)) + c₁/(ℓ+1) ) (ℓ+1) R̄,

we have that (Φ′(Φ^{−1}(1−ε))/(σ√N)) ( c₀/((ℓ+1)R̄) − c₁/(ℓ+1) ) − c₂/√N − o(1/√N) is eventually positive, which implies P_e(C_{ℓ,N,R_N}) < ε.

D.
Moderate Deviations

Lemma 22 (Moderate deviations, Corollary 1, [24]). For any sequence of real numbers ε_N such that ε_N → 0 as N → ∞ and ε_N √N → ∞ as N → ∞, and any sequence of codes {C_{N,R_N}}_N for the BSC_fb(p) such that R_N ≥ C(BSC_fb(p)) − ε_N, we have

  lim inf_{N→∞} (1/(N ε_N²)) log P_e(C_{N,R_N}) ≥ − 1 / (2 p(1−p) log²((1−p)/p)).

Theorem 23. Fix any ℓ ≥ 2. Let C be the capacity of the BSC_fb(p_ℓ). For any sequence of real numbers ε_N such that ε_N → 0 as N → ∞ and ε_N √N → ∞ as N → ∞, consider the sequence of codes {C_{ℓ,N,R_N}}_N such that R_N = C − ε_N. Let P_e(C_{ℓ,N,R_N}) denote the average error probability of C_{ℓ,N,R_N} over the BSC_fb(p_ℓ). Then, with p = p_ℓ,

  lim_{N→∞} (1/(N ε_N²)) log P_e(C_{ℓ,N,R_N}) = − 1 / (2 p(1−p) log²((1−p)/p)).

Proof. Let R̄ = log λ*_ℓ. Note that C_{ℓ,N,R_N} has rate R_N = C − ε_N. The maximum fraction of errors it can correct is thus

  f_N = (1/(ℓ+1))(1 − N′/N) = (1/(ℓ+1))(1 − (C − ε_N)/R̄) + O(1/N) = p_ℓ + ε_N/((ℓ+1)R̄) + O(1/N).

Let E_i be the indicator random variable of whether the i-th bit is flipped. Then the error probability satisfies P_e(C_{ℓ,N,R_N}) ≤ Pr[Σ_{i=1}^N E_i ≥ N f_N] and P_e(C_{ℓ,N,R_N}) ≥ Pr[Σ_{i=1}^N E_i > N f_N]. Let ε′_N = (f_N − p_ℓ)(ℓ+1)R̄. Then

  lim_{N→∞} ε_N/ε′_N = lim_{N→∞} ((f_N − p_ℓ − O(1/N))(ℓ+1)R̄) / ((f_N − p_ℓ)(ℓ+1)R̄) = 1,

where the last step comes from the fact that ε_N = ω(1/√N) and f_N − p_ℓ = ω(1/√N). Define Z_N = (1/(N ε′_N)) Σ_{i=1}^N (E_i − p).
Then we have

  lim_{N→∞} (1/(N ε_N²)) log P_e(C_{ℓ,N,R_N})
    = lim_{N→∞} (1/(N ε_N²)) log Pr[Σ_{i=1}^N E_i ≥ N f_N]
    = lim_{N→∞} (1/(N ε_N²)) log Pr[Z_N ≥ (f_N − p_ℓ)/ε′_N]
    = lim_{N→∞} (1/(N ε_N²)) log Pr[Z_N ≥ 1/((ℓ+1)R̄)]
    = lim_{N→∞} (ε′_N/ε_N)² (1/(N ε′_N²)) log Pr[Z_N ≥ 1/((ℓ+1)R̄)]
    = − 1 / (2 p(1−p) log²((1−p)/p)),

where the last equality comes from Theorem 3.7.1 in [25] and the fact that when p = p_ℓ, (ℓ+1)R̄ = log((1−p)/p).

APPENDIX

Lemma 24. Let E_1, . . . , E_N be i.i.d. random variables with E_1 ∼ B(p). Let f_N be a sequence of real numbers converging to f* ∈ (0, 1) such that f* > p. Then for large N,

  Pr[Σ_{i=1}^N E_i ≥ N f_N] ≤ (e^{−N D(B(f_N) ‖ B(p))} / √(2π f_N (1−f_N) N)) · ( a_N + o((1 − r_N^{⌊(1−f_N)N⌋+1}) / (1 − r_N)) ),

where

  r_N = p(1−f_N) / (f_N(1−p)),
  a_N = (1 − r_N^{⌊(1−f_N)N⌋+1} exp(−(⌊(1−f_N)N⌋+1) / (2 f_N (1−f_N) N))) / (1 − r_N exp(−1/(2 f_N (1−f_N) N))).

Proof. We follow Theorem 2 in [26]. For any fixed N, let Y_1, . . . , Y_N be i.i.d. random variables with Y_1 ∼ B(f_N). For any integer S ∈ [0, N], we have that

  Pr[Σ_{i=1}^N E_i = S] = (N choose S) p^S (1−p)^{N−S},
  Pr[Σ_{i=1}^N Y_i = S] = (N choose S) f_N^S (1−f_N)^{N−S}.
Therefore for any integer $j \ge 0$, for large $N$,
\[
\begin{aligned}
\Pr\left[\sum_{i=1}^N E_i = \lceil N f_N\rceil + j\right]
&= \Pr\left[\sum_{i=1}^N Y_i = \lceil N f_N\rceil + j\right]\left(\frac{p}{f_N}\right)^{\lceil N f_N\rceil + j}\left(\frac{1-p}{1-f_N}\right)^{\lfloor N(1-f_N)\rfloor - j} \\
&\le \Pr\left[\sum_{i=1}^N Y_i = \lceil N f_N\rceil + j\right]\left(\frac{p}{f_N}\right)^{N f_N + j}\left(\frac{1-p}{1-f_N}\right)^{N(1-f_N) - j} \\
&= \Pr\left[\sum_{i=1}^N Y_i = \lceil N f_N\rceil + j\right] e^{-N D(B(f_N)\|B(p))} r_N^j,
\end{aligned}
\]
where the inequality comes from the fact that when $N$ is large, $f_N > p$. Then we have
\[
\Pr\left[\sum_{i=1}^N E_i \ge N f_N\right] = \sum_{j=0}^{\lfloor(1-f_N)N\rfloor}\Pr\left[\sum_{i=1}^N E_i = \lceil N f_N\rceil + j\right] \le e^{-N D(B(f_N)\|B(p))}\sum_{j=0}^{\lfloor(1-f_N)N\rfloor}\Pr\left[\sum_{i=1}^N Y_i = \lceil N f_N\rceil + j\right] r_N^j.
\]
According to the local central limit theorem (see Theorem 2 of [27]), for any $j = 0, 1, \dots, \lfloor(1-f_N)N\rfloor$,
\[
\Pr\left[\sum_{i=1}^N Y_i = \lceil N f_N\rceil + j\right] \le \frac{1}{\sqrt{2\pi f_N(1-f_N)N}}\exp\left(-\frac{j^2}{2 f_N(1-f_N)N}\right) + o\left(\frac{1}{\sqrt{N}}\right) \le \frac{1}{\sqrt{2\pi f_N(1-f_N)N}}\exp\left(-\frac{j}{2 f_N(1-f_N)N}\right) + o\left(\frac{1}{\sqrt{N}}\right).
\]
Plugging back we have
\[
\Pr\left[\sum_{i=1}^N E_i \ge N f_N\right] \le \frac{e^{-N D(B(f_N)\|B(p))}}{\sqrt{2\pi f_N(1-f_N)N}}\sum_{j=0}^{\lfloor(1-f_N)N\rfloor} r_N^j\left[\exp\left(-\frac{j}{2 f_N(1-f_N)N}\right) + o(1)\right] = \frac{e^{-N D(B(f_N)\|B(p))}}{\sqrt{2\pi f_N(1-f_N)N}}\left(a_N + o\left(\frac{1 - r_N^{\lfloor(1-f_N)N\rfloor+1}}{1 - r_N}\right)\right).
\]

Lemma 25. For any $N$, $2^{N/2} \le A_\ell(N) \le 2^N$.

Proof. It follows directly from the definition that $A_\ell(N) \le 2^N$.
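Both bounds are easy to check numerically. The following Python sketch (the helper names `skeleton_count` and `brute_force` are ours) computes $A_\ell(N)$, the number of length-$N$ binary strings with no $\ell$ consecutive zeros, from the recurrence $A_\ell(k) = A_\ell(k-1) + \cdots + A_\ell(k-\ell)$ with initial conditions $A_\ell(i) = 2^i$ for $i < \ell$ and $A_\ell(\ell) = 2^\ell - 1$, compares it against a direct count, and checks the bounds of the lemma:

```python
from itertools import product

def skeleton_count(l, n):
    """A_l(n) via the recurrence A_l(k) = A_l(k-1) + ... + A_l(k-l),
    with initial conditions A_l(i) = 2^i for i < l and A_l(l) = 2^l - 1."""
    a = [2**i for i in range(l)] + [2**l - 1]  # a[0..l]
    for k in range(l + 1, n + 1):
        a.append(sum(a[k - i] for i in range(1, l + 1)))
    return a[n]

def brute_force(l, n):
    """Direct count of length-n binary strings with no l consecutive zeros."""
    return sum(1 for s in product("01", repeat=n) if "0" * l not in "".join(s))

for l in (2, 3):
    for n in range(1, 12):
        c = skeleton_count(l, n)
        assert c == brute_force(l, n)       # recurrence matches direct count
        assert 2 ** (n / 2) <= c <= 2 ** n  # bounds of Lemma 25 (l >= 2)
print("2^(N/2) <= A_l(N) <= 2^N verified for small l, N")
```

For $\ell = 2$ the recurrence is Fibonacci-like ($2, 3, 5, 8, 13, \dots$), which makes the exponential growth rate $\lambda^*_\ell$ between $\sqrt{2}$ and $2$ plausible at a glance.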
To see that $2^{N/2} \le A_\ell(N)$, we use induction on $N$. Note that the initial conditions, $A_\ell(1) = 2, \dots, A_\ell(\ell-1) = 2^{\ell-1}$, $A_\ell(\ell) = 2^\ell - 1$, all satisfy the condition. Suppose that $2^{i/2} \le A_\ell(i)$ holds for all $i < k$. Then
\[
A_\ell(k) = A_\ell(k-1) + \cdots + A_\ell(k-\ell) \ge 2^{(k-1)/2} + \cdots + 2^{(k-\ell)/2} = \frac{\sqrt{2^{k}} - \sqrt{2^{k-\ell}}}{\sqrt{2} - 1} \ge 2^{k/2},
\]
where the last inequality comes from the fact that $2^{-\ell/2} \le 2 - \sqrt{2}$ for $\ell \ge 2$. Therefore $2^{N/2} \le A_\ell(N) \le 2^N$.

ACKNOWLEDGMENT

This research was supported by the US National Science Foundation under grant CCF-1956192 and the US Army Research Office under grant W911NF-18-1-0426.

REFERENCES

[1] K. Zigangirov, "On the number of correctable errors for transmission over a binary symmetrical channel with feedback," Problemy Peredachi Informatsii, vol. 12, no. 2, pp. 3–19, 1976.
[2] E. R. Berlekamp, "Block coding for the binary symmetric channel with noiseless, delayless feedback," in Error-Correcting Codes, H. B. Mann, Ed., the Mathematics Research Center, United States Army at the University of Wisconsin, Madison. Wiley New York, May 1968, pp. 61–68.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[4] C. Shannon, "The zero error capacity of a noisy channel," IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
[5] Y. Altuğ and A. B. Wagner, "On exact asymptotics of the error probability in channel coding: symmetric channels," IEEE Trans. Inf. Theory, vol. 67, no. 2, pp. 844–868, 2020.
[6] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[7] J. Schalkwijk and T. Kailath, "A coding scheme for additive noise channels with feedback–I: No bandwidth constraint," IEEE Trans. Inf. Theory, vol. 12, no. 2, pp. 172–182, 1966.
[8] J. Schalkwijk, "A coding scheme for additive noise channels with feedback–II: Band-limited signals," IEEE Trans. Inf. Theory, vol. 12, no. 2, pp.
183–189, 1966.
[9] M. Horstein, "Sequential transmission using noiseless feedback," IEEE Trans. Inf. Theory, vol. 9, no. 3, pp. 136–143, 1963.
[10] J. Schalkwijk, "A class of simple and optimal strategies for block coding on the binary symmetric channel with noiseless feedback," IEEE Trans. Inf. Theory, vol. 17, no. 3, pp. 283–287, 1971.
[11] H. Yang and R. D. Wesel, "Finite-blocklength performance of sequential transmission over BSC with noiseless feedback," in Proc. IEEE Intl. Symp. on Inf. Theory (ISIT), 2020, pp. 2161–2166.
[12] A. Antonini, H. Yang, and R. D. Wesel, "Low complexity algorithms for transmission of short blocks over the BSC with full feedback," in Proc. IEEE Intl. Symp. on Inf. Theory (ISIT), 2020, pp. 2173–2178.
[13] R. Ahlswede, C. Deppe, and V. Lebedev, "Non-binary error correcting codes with noiseless feedback, localized errors, or both," in Proc. IEEE Intl. Symp. on Inf. Theory (ISIT), 2006, pp. 2486–2487.
[14] V. S. Lebedev, "Coding with noiseless feedback," Problems of Information Transmission, vol. 52, no. 2, pp. 103–113, 2016.
[15] C. Deppe, V. Lebedev, G. Maringer, and N. Polyanskii, "Coding with noiseless feedback over the Z-channel," in International Computing and Combinatorics Conference. Springer, 2020, pp. 98–109.
[16] C. Deppe, V. Lebedev, and G. Maringer, "Bounds for the capacity error function for unidirectional channels with noiseless feedback," Theoretical Computer Science.
[18] K. Sayood, Introduction to Data Compression. Morgan Kaufmann, 2017.
[19] P. Cull, M. Flahive, and R. Robson, Difference Equations: From Rabbits to Chaos. Springer-Verlag New York Inc., 2005.
[20] C. D. Meyer, Matrix Analysis and Applied Linear Algebra. SIAM, 2000, vol. 71.
[21] P. Elias, "Coding for two noisy channels," in Information Theory: Third London Symposium, 1955, pp. 61–76.
[22] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Feedback in the non-asymptotic regime," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4903–4925, 2011.
[23] R. N. Bhattacharya and E. C. Waymire, A Basic Course in Probability Theory. Springer, 2007.
[24] Y.
Altuğ, H. V. Poor, and S. Verdú, "On fixed-length channel coding with feedback in the moderate deviations regime," in Proc. IEEE Intl. Symp. on Inf. Theory (ISIT), 2015, pp. 1816–1820.
[25] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Springer-Verlag New York Inc., 1998.
[26] R. Arratia and L. Gordon, "Tutorial on large deviations for the binomial distribution," Bulletin of Mathematical Biology, vol. 51, no. 1, pp. 125–131, 1989.
[27] V. V. Petrov, "On local limit theorems for sums of independent random variables,"