On Reliability Function of BSC with Noisy Feedback
Problems of Information Transmission, vol. 46, no. 2, pp. 3–23, 2010.
M. V. Burnashev, H. Yamamoto
A binary symmetric channel is used for information transmission. There is also another noisy binary symmetric channel (the feedback channel), and the transmitter observes without delay all outputs of the forward channel via that feedback channel. The transmission of an exponential number of messages (i.e., the transmission rate is positive) is considered. The achievable decoding error exponent for such a combination of channels is investigated. It is shown that if the crossover probability of the feedback channel is less than a certain positive value, then the achievable error exponent is better than the decoding error exponent of the channel without feedback.

§ 1. Introduction and main results
The binary symmetric channel BSC(p) with crossover probability 0 < p < 1/2 (and q = 1 − p) is considered. It is assumed that there is also a feedback channel BSC(p₁), and the transmitter observes (without delay) all outputs of the forward BSC(p) channel via that noisy feedback channel. No coding is used in the feedback channel (i.e., the receiver simply resends to the transmitter all received outputs). In other words, the feedback channel is "passive" (see Fig. 1).

Fig. 1. Channel model: the transmitter sends x over the forward BSC(p); the output y goes to the receiver and is resent over the feedback BSC(p₁), arriving at the transmitter as x′.

We consider the case when the overall transmission time n and M = e^{Rn} equiprobable messages {θ₁, ..., θ_M} are given. After the moment n the receiver makes a decision θ̂ on the message transmitted. We are interested in the best possible decoding error exponent (and whether it can exceed the similar exponent of the channel without feedback).

The research described in this publication was made possible in part by the Russian Fund for Fundamental Research (project numbers 06-01-00226 and 09-01-00536).

In [1] the case of a nonexponential (in n) number M of messages (i.e., R = 0) was investigated. In this paper we consider the case M = e^{Rn}, R > 0, strengthening the methods of [1]. The main difference is that, since now M is exponential in n, a much more accurate investigation of the decoding error probability is needed. Moreover, if M is nonexponential in n, then the best code for use during phase I is known: it is an "almost equidistant" code (i.e., all its codeword distances equal n/2 + o(n)). If R > 0, then such a best code is not known, and for that reason we choose the code randomly.

Some results for channels with noiseless feedback can be found in [2–12], and for the noisy feedback case in [13, 14] (see also the discussion in [1]).

We show that if the crossover probability p₁ of the feedback channel BSC(p₁) is less than a certain positive value p₁(p, R), then it is possible to improve the best error exponent E(R, p) of the BSC(p) without feedback.
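The channel model of Fig. 1 is easy to mimic in simulation. Below is a minimal sketch (function and variable names are mine, and the parameter values are purely illustrative): since the receiver resends its output unchanged, the transmitter's view x′ is the forward output corrupted once more by a BSC(p₁).

```python
import random

def bsc(bits, crossover, rng):
    """Pass a bit sequence through a binary symmetric channel."""
    return [b ^ (rng.random() < crossover) for b in bits]

rng = random.Random(0)
p, p1 = 0.1, 0.02                  # forward / feedback crossover probabilities (illustrative)
n = 10000
x = [rng.randrange(2) for _ in range(n)]
y = bsc(x, p, rng)                 # block seen by the receiver
x_prime = bsc(y, p1, rng)          # block seen by the transmitter (passive feedback)

frac_forward = sum(a != b for a, b in zip(x, y)) / n        # ~ p
frac_feedback = sum(a != b for a, b in zip(y, x_prime)) / n  # ~ p1
```

Because no coding is applied on the feedback link, the transmitter's uncertainty about y is exactly one extra layer of BSC(p₁) noise; this is the "passive feedback" assumption used throughout the paper.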
The transmission method with one "switching" moment, giving such an improvement, is described in § 4. It is similar to the method used in [1].

We will need some definitions and notation. For L = 1, 2, ... define the critical rates R_crit,1(p) > R_crit,2(p) > ... [6, 15, 16]:

R_crit,L(p) = ln 2 − h( p^{1/(L+1)} / ( p^{1/(L+1)} + q^{1/(L+1)} ) ),   (1)

where h(x) = −x ln x − (1 − x) ln(1 − x). For L = 1 we omit the index L and simply write R_crit(p) = R_crit,1(p), E(R, p) = E(R, p, 1), etc.

Define the new critical rate R₁ = R₁(p) as the unique root of the equation [17]

min_{0 ≤ τ ≤ α ≤ 1/2 : h(α) − h(τ) = ln 2 − R} [ √(α(1 − α)) − √(τ(1 − τ)) ] / [ 1 + 2√(τ(1 − τ)) ] = √(pq) / ( 1 + 2√(pq) ).

Then 0 < R₁(p) < R_crit(p) for 0 < p < 1/2.

Denote by C(p) = ln 2 − h(p) the capacity of the BSC(p), and by E_sp(R, p) the sphere-packing exponent

E_sp(R, p) = D( δ_GV(R) ‖ p ),   D(x ‖ y) = x ln(x/y) + (1 − x) ln( (1 − x)/(1 − y) ),

where δ_GV(R) ≤ 1/2 is defined by the relation ln 2 − R = h(δ_GV(R)). Denote by E(R, p) the best decoding error exponent (the reliability function) of the BSC(p) without feedback. For R₁(p) ≤ R ≤ C(p) and for R = 0 the function E(R, p) is known exactly [6, 17]:

E(R, p) = E_r(R, p) = { ln 2 − ln( 1 + 2√(pq) ) − R,   R₁(p) ≤ R ≤ R_crit(p);
                        E_sp(R, p),                    R_crit(p) ≤ R ≤ C(p) },
E(0, p) = E_ex(0, p) = (1/4) ln( 1/(4pq) ),   (2)

where E_r(R, p), E_ex(R, p) are the "random coding" bounds [6, 15, 16] (see § 6).

For 0 < R < R₁(p) only lower and upper bounds for the function E(R, p) are known. To describe the best known lower bound (the exponent E_ex(R, p) of random coding with "expurgation"), introduce the rate R_min(p) (see (43)); then 0 < R_min(p) < R₁(p). Denote by

E_low(R, p, L) = max{ E_r(R, p, L), E_ex(R, p, L) },   E_low(R, p) = E_low(R, p, 1),   (5)

the best known lower bound for E(R, p, L). For the exponent F(R, p, p₁) introduced below we have F(R, p, 0) = F(R, p) and F(R, p, 1/2) = E(R, p).

Denote by E₁(p) the best error exponent for two codewords over the BSC(p) (clearly, it remains the same for the channel with noiseless feedback as well):

E₁(p) = (1/2) ln( 1/(4pq) ).
(6)

Denote by F(R, p, p₁) the decoding error exponent of the transmission method described in § 4 (with one switching moment). The inequality F(R, p, p₁) > E(R, p) is possible only when R < R_crit(p).

To describe the function p₁(R, p) of the critical noise level in the feedback channel, introduce the function

t₁(R, p) = 3 [ E_low(R, p, 2) − E_low(R, p) ] / ln(q/p),   (7)

where E_low(R, p, 2) and E_low(R, p) = E_low(R, p, 1) are defined in (5). The function t₁(R, p) monotonically decreases in R. For a given R ≥ 0 it first increases in p and then decreases, and max_{R,p} t₁(R, p) = max_p t₁(0, p) is attained at R = 0.

Introduce the function p₁ = p₁(R, p) ≤ t₁(R, p) as the unique root of the equation

D( t₁(R, p) ‖ p₁ ) = 2R.   (8)

In particular, p₁(0, p) = t₁(0, p) = 3 [ E(0, p, 2) − E(0, p) ] / ln(q/p), with E(0, p, 2), E(0, p) given in (45). Define also t₂ = t₂(R, p₁) ≥ p₁ as the unique root of the equation

D( t₂ ‖ p₁ ) = 2R.   (9)

The main result of the paper is

Theorem 1. If R < R_crit(p) and p₁ < p₁(R, p), then

F(R, p, p₁) ≥ max_{0 ≤ γ ≤ 1} T(R, p, p₁, γ) > { E_ex(R, p),   0 ≤ R ≤ R₁(p);
                                                 E(R, p),     R₁(p) ≤ R < R_crit(p) },   (10)

where

T = min{ γ E_low(R/γ, p, 2) − ( γ t₂(R/γ, p₁)/3 ) ln(q/p),   γ E_low(R/γ, p) + (1 − γ) E₁(p) }.   (11)

In other words, for any R < R_crit(p) and p₁ < p₁(R, p) the function F(R, p, p₁) is bigger (i.e., better) than the best known lower bound for the decoding error exponent of the BSC(p) without feedback.

Moreover, there exists a positive function p₂(R, p) such that the following result holds.

Corollary 1. If R < R_crit(p) and p₁ < p₂(R, p), then

F(R, p, p₁) ≥ max_{0 ≤ γ ≤ 1} T(R, p, p₁, γ) > E(R, p).   (12)

This result follows from the proof of Theorem 2 (see § 3) and the fact that the function T(R, p, p₁, γ) is continuous in p₁.

Remark 1. We do not try to find the best function p₂(R, p), limiting ourselves to rather simple estimates for it.
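Since D(t ‖ p₁) is continuous and increasing in t on [p₁, 1), the root t₂ of (9) is easy to find by bisection. A sketch (my function names; it assumes 2R < ln(1/p₁), so that the root exists):

```python
import math

def D(x, y):
    """Binary divergence D(x || y) in nats, with 0 ln 0 = 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(x, y) + term(1.0 - x, 1.0 - y)

def t2(R, p1):
    """Root t2(R, p1) >= p1 of D(t || p1) = 2R, equation (9), by bisection."""
    lo, hi = p1, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if D(mid, p1) < 2.0 * R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At R = 0 the root collapses to t₂ = p₁, which is the noiseless-feedback limit used later in Remark 7.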
In Fig. 2 the plot of the function p₁(R, p) is given for a fixed value of p. Note that there p₁(R, p) > p for small R.

It is more convenient to investigate first the function F(R, p, p₁) for p₁ = 0, i.e., for the channel with noiseless feedback. Then the next result holds.

Theorem 2. If 0 < p < 1/2 and R < R_crit(p), then

F(R, p, 0) = F(R, p) ≥ γ₂ E_low(R/γ₂, p, 2) > E(R, p),   (13)

where γ₂ ∈ ( R/R_crit(p), 1 ) is the largest root of the equation (20).

Remark 2. If p₁ → 0, then the relations (10), (11) turn into the similar relation (13) for the channel with noiseless feedback (see also Remark 7 in § 4).

Remark 3. The transmission method described in § 4 reduces the problem to testing of the two most probable (at a fixed moment) messages. Such a strategy is not optimal even for one switching moment (at least if p₁ is very small). But it is relatively simple to investigate, and it already gives a reasonable improvement over the channel without feedback.

Remark 4. In the preliminary publication [19, Proposition] it was claimed that p₁(R, p) = 1/2 for some range of rates R. A miscalculation was found in the proof of that result.

Below, in § 2, an informal description of the transmission method is given. In § 3 the transmission method with one switching moment is described and analyzed for the channel with noiseless feedback, and Theorem 2 is proved. In § 4 that method (slightly modified) is investigated for the channel with noisy feedback, and Theorem 1 is proved. In § 5 it is clarified for which p₁ noisy feedback behaves approximately like noiseless feedback. Part of the formulas used and some auxiliary results are presented in § 6.

A preliminary (and simplified) variant of the paper (without detailed proofs) was published in [19].

§ 2. Informal description of the transmission method

We use a transmission method with one fixed switching moment, at which the coding function is changed. The method is based on one idea and one useful observation.

Idea.
It is based on the following inequality, which follows from (41):

E_ex(R, p) < E_low(R, p, 2),   R < R_crit(p).   (14)

Considering only R < R_crit(p), we choose some positive γ < 1 and partition the total transmission period [1, n] into two phases: [1, γn] (phase I) and (γn, n] (phase II) (at first we may think of γ as rather close to one).

In phase I (i.e., on [1, γn]) we use the "best" code of M codewords {x_i} of length γn (see below). In that phase the transmitter only observes (via the feedback channel) the outputs of the forward channel, but does not change the coding function. We set the value γ = γ(R, p) such that

E_ex(R, p) < γ E_low(R/γ, p, 2),   R < R_crit(p)   (15)

(it is always possible due to the continuity of the function γ E_low(R/γ, p, 2) in γ and the condition (14)). After phase I (at the moment γn) the receiver selects the two most probable messages θ_i, θ_j. By the condition (15), the exponent of the probability that the true message θ_true is not among the chosen messages θ_i, θ_j will be larger (i.e., better) than E_ex(R, p). Assume that by some means the transmitter is also able to recover those two most probable messages θ_i, θ_j (it is certainly so in the noiseless feedback case). Then, in phase II (i.e., on (γn, n]), the transmitter only helps the receiver to decide between those two most probable messages θ_i, θ_j, using two opposite codewords of length (1 − γ)n. The error exponent E₁(p) (see (6)) of that phase is better than all other exponents involved. As a result, the overall decoding error exponent is better than E_ex(R, p).

It remains to find a way for the transmitter to recover those two most probable messages θ_i, θ_j. It may seem that this is always possible if the value p₁ is sufficiently small. But this is not true.
With high probability (even close to one) the second most probable message θ_j and the third most probable message θ_k will be approximately equiprobable, and then, for any p₁ > 0, the transmitter will not be able to rank them correctly (due to the noise in the feedback channel).

Observation. Fortunately, in that case (with high probability) the most probable message θ_i will be much more probable than the second most probable message θ_j. In such a case the receiver makes a decision immediately after phase I (in favor of the most probable message θ_i), and it ignores all further signals from the transmitter.

The description given is rather intuitive, and it should be checked analytically (which is done below).

§ 3. Channel with noiseless feedback. Proof of Theorem 2

For simplicity, we start with the noiseless feedback case and describe formally the transmission method which (after some modification) will be used for noisy feedback as well. Moreover, in the noisy feedback case we will need some formulas from the noiseless feedback case.

Denote by F(R, p) = F(R, p, 0) the decoding error exponent of the transmission method described below (with one switching moment).

Proof of Theorem 2. We consider M = e^{Rn} messages θ₁, ..., θ_M. Using some γ ∈ [0, 1] (it will be chosen later), we partition the total transmission time [1, n] into two phases: [1, γn] (phase I) and (γn, n] (phase II). We proceed as follows.

1) In phase I (i.e., on [1, γn]) we use the "best" code of M codewords {x_i} of length γn (see below). In that phase the transmitter only observes (via the feedback channel) the outputs of the forward channel, but does not change the coding function.

2) Let x be the transmitted codeword (of length γn) and y be the block received by the receiver. After phase I, based on the block y, the transmitter selects the two messages θ_i, θ_j (codewords x_i, x_j) which are the most probable for the receiver, and ignores all the remaining messages {θ_k}.
If the true message θ_true is among the selected messages θ_i, θ_j, then in phase II (i.e., on (γn, n]) the transmitter only helps the receiver to decide between those two most probable messages θ_i, θ_j, using two opposite codewords of length (1 − γ)n. If the true message θ_true is not among those two selected messages, then the transmitter sends an arbitrary block. After the moment n the receiver makes a decision in favor of the more probable of those two remaining messages θ_i, θ_j (based on all signals received on [1, n]).

Clearly, a decoding error occurs in the following two cases:

1) after phase I the true message is not among the two most probable messages; we denote that probability by P₁;

2) after phase I the true message is among the two most probable, but after phase II it is not the most probable; we denote that probability by P₂.

Then for the total decoding error probability P_e we have

P_e ≤ P₁ + P₂.   (16)

In phase I (of length γn) we use a code having two small decoding error probabilities: the usual one and the one for decoding with list size L = 2. Then there exists a code such that for P₁ we have (see § 6)

(1/n) ln(1/P₁) ≥ γ E_low(R/γ, p, 2) + o(1),   n → ∞.   (17)

Now we evaluate the probability P₂. Denote by d(x, y) the Hamming distance between x and y, and let d_ij = d(x_i, x_j). In phase I (of length γn) the distances among the codewords are {d_ij}. In phase II (of length (1 − γ)n) the distance between the two remaining codewords equals (1 − γ)n. Therefore, the total distance between the true codeword and the competing one equals d_ij + (1 − γ)n. Then there exists a code such that (see the derivation in § 6)

(1/n) ln(1/P₂) ≥ γ E_low(R/γ, p) + (1 − γ) E₁(p) + o(1).   (18)

Moreover, there exists a code for which both relations (17) and (18) are fulfilled (see § 6). Then from (16)–(18) we have

(1/n) ln(1/P_e) ≥ (1/n) min{ ln(1/P₁), ln(1/P₂) } − (ln 2)/n ≥ min{ γ E_low(R/γ, p, 2),  γ E_low(R/γ, p) + (1 − γ) E₁(p) } + o(1),

where E₁(p) is defined in (6).
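The two-phase scheme just analyzed is easy to prototype. The Monte Carlo sketch below is a simplified illustration, not the construction used in the proof: the phase-I code is purely random, and the phase-II decision is made by a majority vote on the phase-II block alone instead of the full likelihood over [1, n]; all parameter values are arbitrary.

```python
import random

def bsc(bits, p, rng):
    return [b ^ (rng.random() < p) for b in bits]

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def two_phase_trial(rng, M=8, m1=30, m2=30, p=0.05):
    """One run of the two-phase scheme with noiseless feedback."""
    code = [[rng.randrange(2) for _ in range(m1)] for _ in range(M)]
    true_idx = rng.randrange(M)
    # Phase I: fixed random code; the transmitter only listens via feedback.
    y = bsc(code[true_idx], p, rng)
    ranked = sorted(range(M), key=lambda i: hamming(code[i], y))
    cand = ranked[:2]                     # the two most probable messages
    if true_idx not in cand:
        return False                      # true message lost after phase I (event of P1)
    # Phase II: two opposite codewords distinguish the two survivors.
    sent = [0] * m2 if true_idx == cand[0] else [1] * m2
    y2 = bsc(sent, p, rng)
    decided = cand[0] if 2 * sum(y2) < m2 else cand[1]
    return decided == true_idx            # wrong survivor wins: event of P2

rng = random.Random(1)
trials = 300
errors = sum(not two_phase_trial(rng) for _ in range(trials))
```

Even with these short, non-asymptotic block lengths the empirical error rate is small; the analysis above is about the exponent of this error probability as n → ∞.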
Therefore

F(R, p) ≥ max_{0 ≤ γ ≤ 1} min{ γ E_low(R/γ, p, 2),  γ E_low(R/γ, p) + (1 − γ) E₁(p) },   (19)

where E_low(R, p, 2) and E_low(R, p) are defined in (5) (see also § 6).

Note that the function γ E_low(R/γ, p, 2) in the right-hand side of (19) monotonically increases in γ. On the contrary, the function S(γ, R, p) = γ E_low(R/γ, p) + (1 − γ) E₁(p) monotonically decreases in γ. Indeed, denoting r = R/γ and omitting p, we have S′_γ(γ, R) = E_low(r) − r E′_low(r) − E₁ and S″_γr(γ, R) = −r E″_low(r) < 0. Therefore the maximum over R, γ of the value S′_γ(γ, R) is attained as r → 0. Since r E′_low(r) → 0 as r → 0, we get max_{R,γ} S′_γ(γ, R) = E_low(0) − E₁ < 0.

We consider only the case R < R_crit(p), i.e., when E_low(R, p, 2) > E_low(R, p). For such R it is best to set γ = γ₂ such that P₁ = P₂, i.e.,

γ₂ E_low(R/γ₂, p, 2) = γ₂ E_low(R/γ₂, p) + (1 − γ₂) E₁(p).   (20)

Both sides of (20) are continuous functions of γ₂. The left-hand side of (20) monotonically increases in γ₂, and the right-hand side monotonically decreases in γ₂. For γ₂ = 1 the left-hand side is greater than the right-hand side, which equals E_low(R, p). On the contrary, for γ₂ = R/R_crit the right-hand side is greater than the left-hand side. Then there exists a unique γ₂ ∈ (R/R_crit, 1) satisfying (20). Therefore we get

F(R, p) ≥ γ₂ E_low(R/γ₂, p) + (1 − γ₂) E₁(p) > E_low(R, p).   (21)

We show that, in fact, F(R, p) satisfies the stronger inequality (13), although we know exactly only part of the function E(R, p), 0 < R < R_crit(p) (see (2)). If we connect the points E(0, p) and E(R_crit, p) by a straight-line segment, then, due to the "straight-line bound" [20], for 0 ≤ R ≤ R_crit the function E(R, p) does not exceed that straight line. Therefore, if 0 < R < R_crit(p) and 0 < p < 1/2, then the following inequality holds:

E(R, p) < E(0, p) − [ E(0, p) − E(R_crit(p), p) ] R / R_crit(p).
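The balance equation (20) can be solved numerically. The sketch below is only a rough illustration: instead of E_low it uses the straight-line random coding branch (38) for L = 1, 2 (a valid lower bound, though loose above R_crit,L, and without the expurgation part), so the resulting γ is the balance point for these proxies rather than the paper's exact γ₂.

```python
import math

LN2 = math.log(2.0)

def Er_linear(R, p, L):
    """Straight-line branch (38): L(ln2 - R) - (1+L) ln(p^(1/(1+L)) + q^(1/(1+L)))."""
    q = 1.0 - p
    s = 1.0 / (L + 1)
    return L * (LN2 - R) - (L + 1) * math.log(p ** s + q ** s)

def E1(p):
    """Two-codeword exponent (6): E1(p) = (1/2) ln(1/(4 p q))."""
    return 0.5 * math.log(1.0 / (4.0 * p * (1.0 - p)))

def balance_gamma(R, p):
    """Bisection for the root of eq. (20) with Er_linear in place of E_low."""
    def excess(g):
        left = g * Er_linear(R / g, p, 2)
        right = g * Er_linear(R / g, p, 1) + (1.0 - g) * E1(p)
        return left - right
    lo, hi = R / LN2 + 1e-9, 1.0      # excess(lo) < 0 < excess(hi) in this setting
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

R, p = 0.05, 0.1
g = balance_gamma(R, p)
F_lower = g * Er_linear(R / g, p, 2)  # two-phase exponent at the balance point, cf. (19)-(21)
```

For R = 0.05, p = 0.1 this gives γ ≈ 0.93 and F_lower ≈ 0.19, visibly above the no-feedback random coding value E_r(R, p) ≈ 0.17, in line with (21).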
Now, to establish formula (13), it is sufficient to check that for such p, R the following strict inequality is valid:

γ₂ E_ex(R/γ₂, p, 2) > E(0, p) − [ E(0, p) − E(R_crit(p), p) ] R / R_crit(p).   (22)

For that purpose it is convenient to introduce the parameter u = R/γ₂, u ∈ (0, R_crit). Then we get the parametric representation for γ₂ = γ₂(u, p) and R = R(u, p):

γ₂ = E₁(p) / [ E₁(p) + E_ex(u, p, 2) − E_ex(u, p) ],   R = u γ₂.

Then, combining analytical and numerical methods, it is not difficult to check the validity of the inequality (22). This concludes the proof of Theorem 2. △

In Fig. 3 the plots of the functions F(R, p) and E_ex(R, p) are shown for a fixed p.

To compare the functions F(R, p) and E(R, p), consider

Example 1. Let p = (1 − ε)/2, ε → 0. Then

C(p) = ε²/2 + O(ε⁴),   R_crit(p) = C(p)[1 + o(1)]/4,   R_min,2(p) ≤ R_min(p) = O(C²).

Therefore, when p → 1/2 the expurgation bound, essentially, is not applicable, and we get the known results [15]

E(R, p)[1 + o(1)] = { C/2 − R,      0 ≤ R ≤ C/4;
                      (√C − √R)²,   C/4 ≤ R ≤ C },

and

E(R, p, 2)[1 + o(1)] ≥ E_r(R, p, 2) = { 2C/3 − 2R,    0 ≤ R ≤ C/9;
                                        (√C − √R)²,   C/9 ≤ R ≤ C }.

For the function t₁(R, p) from (7) we then get

ε t₁(R, p)[1 + o(1)] = { C/4 − 3R/2,                          0 ≤ R ≤ C/9;
                         (3/2)[ (√C − √R)² − (C/2 − R) ],     C/9 ≤ R ≤ C/4;
                         0,                                   C/4 ≤ R ≤ C.   (23)

Consider the equation (20). For R < R_crit(p) = C(p)[1 + o(1)]/4 two cases are possible: R/γ₂ ≤ C/9 and C/9 ≤ R/γ₂ < C/4.

1) Let R/γ₂ ≤ C/9. Then from (20) we get

γ₂ = 6(R + C)/(7C),   F(R, p) = (4C − 10R)/7,   R ≤ 2C/19,

and

F(R, p)/E(R, p) = (8C − 20R)/(7C − 14R),   R ≤ 2C/19.

The ratio F(R, p)/E(R, p) monotonically decreases from 8/7 (at R = 0) down to 16/15 (at R = 2C/19).

2) Let C/9 ≤ R/γ₂ < C/4. Then we get

√γ₂ = [ 2√R + √(6C − 8R) ] / (3√C),   2C/19 ≤ R < C/4,

and

F(R, p) = (1/9)[ 6C − 7R − 2√( 2R(3C − 4R) ) ].
The ratio F(R, p)/E(R, p) monotonically decreases from 16/15 (at R = 2C/19) down to 1 (at R = C/4).

It is natural to expect that similar results will also hold in the case of the noisy feedback channel BSC(p₁), if p₁ is sufficiently small.

§ 4. Channel with noisy feedback. Proof of Theorem 1

In the noisy feedback case we still use the transmission method with one switching moment. But if we try to use exactly the same method as in the noiseless feedback case, we face the following problem. After phase I the transmitter should find the two codewords x₁, x₂ most probable for the receiver. But with relatively high probability the second- and third-ranked codewords x₂ and x₃ will be approximately equiprobable, and therefore it will be difficult for the transmitter to rank them correctly (due to the noise in the feedback). Fortunately, in that case (with high probability) the most probable codeword x₁ will be much more probable than x₂, and then (again with high probability) x₁ is the true codeword. We use this observation as follows: if the posterior probabilities of the second- and third-ranked codewords x₂ and x₃ are not very different, the receiver makes a decision immediately after phase I (in favor of the most probable codeword x₁), and it ignores all further signals from the transmitter in phase II.

As a result, we use the following transmission and decoding method.

Transmission. We set a number 0 < γ < 1. In phase I, of length m = γn, we use a "good" code (it is explained below). Let x_true be the transmitted codeword of length m, y be the block received by the receiver, and x′ be the block received by the transmitter. The transmitter selects one more codeword x_i ≠ x_true, the one closest to x′. For example, the codeword x₁ ≠ x_true is chosen if

d(x₁, x′) = min_{x_i ≠ x_true} d(x_i, x′).
As a result, the transmitter builds a list of two messages: the true one θ_true and another message θ_i ≠ θ_true, which looks most probable among the remaining ones.

A "good" code of length m should have the following properties:

1) its decoding error probability P_e satisfies the inequality P_e ≤ e^{−E_low(R,p)m};

2) its decoding error probability P_e(2) with list size L = 2 satisfies the similar inequality P_e(2) ≤ e^{−E_low(R,p,2)m};

3) the relations (18) and (26) hold for it.

The existence of such a code is shown in § 6 by slightly modifying Gallager's standard arguments for the expurgation bound [15, 16].

In phase II (i.e., on (γn, n]) the transmitter uses two opposite codewords of length n − m = (1 − γ)n (for example, consisting of all zeros and of all ones) in order to help the receiver to decide between the true message θ_true and the other most probable message θ_i ≠ θ_true.

This transmission method is a slight modification of the method used in [1]. It gives the same decoding error probability exponent, but it is simpler to analyze. If the true message θ_true is not among the two messages most probable for the receiver, then there will always be a decoding error. The slight modification of the transmission method from [1] used here helps in the case when the true message θ_true is among the two messages most probable for the receiver, but is not such for the transmitter.

Decoding. We set a number t > 0. Arrange the Hamming distances {d(x_i, y), i = 1, ..., M} after phase I in increasing order, denoting

d(1) = min_i d(x_i, y) ≤ d(2) ≤ ... ≤ d(M) = max_i d(x_i, y)

(in the case of a tie we use any order). Let also x₁, ..., x_M be the corresponding ranking of the codewords after phase I, i.e., x₁ is the codeword closest to y, etc. Two cases are possible.

Case 1. If d(3) ≤ d(2) + tγn, then the receiver decodes immediately after phase I (in favor of the codeword x₁ closest to y).
Although the transmitter will still send some signals in phase II, the receiver has already made its decision.

Case 2. If d(3) > d(2) + tγn, then after phase I the receiver selects the two most probable messages θ_i, θ_j, and after the transmission in phase II (i.e., after the moment n) makes a decision between those two remaining messages θ_i, θ_j in favor of the more probable of them (based on all signals received on [1, n]).

In case 2 the transmitter and the receiver will act in coordination if the lists of two messages built by each of them coincide. Recall that the receiver's list always contains the true message. Of course, those lists may differ (and then there will be a decoding error), but the probability of such an event should be sufficiently small (which will be secured below).

Remark 5. a) In the case of noiseless feedback (i.e., when p₁ = 0) the strategy described reduces to the strategy from § 3 if we set t = 0.

b) The strategy described can be improved by introducing an additional parameter τ ≥ 0 such that if d(2) ≥ d(1) + τγn, then the receiver also decodes immediately after phase I (in favor of the codeword x₁ closest to y). But the introduction of such a parameter leads to too bulky formulas.

To evaluate the decoding error probability P_e, denote by P₁ and P₂ the decoding error probabilities in case 1 (i.e., after the moment γn) and in case 2 (i.e., after the moment n), respectively. Then for P_e we have

P_e ≤ P₁ + P₂.   (24)

We evaluate the probabilities P₁, P₂ in the right-hand side of (24). Denoting d_i = d(x_i, y), i = 1, ..., M, for P₁ we have

P₁ ≤ M^{−1} Σ_{k=1}^{M} P( d_k ≠ d(1); d_k ≥ d(3) − tγn | x_k ).   (25)

We show that there exists a code such that for P₁ we have (n → ∞)

(1/n) ln(1/P₁) ≥ γ E_low(R/γ, p, 2) − (tγ/3) ln(q/p) + o(1).
(26)

Indeed, using the inequality (Σ a_i)^{1/ρ} ≤ Σ a_i^{1/ρ}, ρ ≥ 1, we have

P^{1/ρ}( d_k ≠ d(1); d_k ≥ d(3) − tγn | x_k ) ≤ 2^{1/ρ} P^{1/ρ}( d_k = d(2) ≥ d(3) − tγn | x_k ) + 2^{1/ρ} P^{1/ρ}( d_k ≥ d(3) | x_k ) ≤ 2^{1/ρ} (q/p)^{tγn/(3ρ)} Σ_{m₁,m₂} Σ_y [ P(y | x_k) P(y | x_{m₁}) P(y | x_{m₂}) ]^{1/(3ρ)},

and then

[ E P^{1/ρ}( d_k ≠ d(1); d_k ≥ d(3) − tγn | x_k ) ]^{ρ/n} ≤ 2^{(1+ρ)/n} (q/p)^{tγ/3} e^{−γ E_ex(R/γ, p, 2)}.

A similar inequality holds with E_r(R/γ, p, 2) in place of E_ex(R/γ, p, 2). Therefore, using the definition of E_low(R/γ, p, 2) (see (5)), we get the formula (26).

For the value P₂ we have

P₂ ≤ P′₂ + P_n,   (27)

where P′₂ is the decoding error probability in case 2 for the channel with noiseless feedback, and P_n is the probability that the codeword most probable for the receiver (excluding the true codeword x_true) is not such for the transmitter (while the true codeword is among the two codewords most probable for the receiver).

For the value P′₂ the formula (18) remains valid.

It remains to evaluate P_n. For that purpose consider the ensemble of codes C in which each codeword is selected independently and equiprobably (with probability 2^{−m}) among all binary vectors of length m. We are interested in the value E_C P_n^{1/ρ}(C), ρ ≥ 1, where the expectation is taken over the randomly chosen codes C. Clearly,

P{ d(x_true, y) = d } = (m choose d) (p/q)^d q^m.

For given blocks x_true and y, all (M − 1) remaining codewords are independently and equiprobably distributed among all 2^m binary vectors of length m. The vector y is transmitted over the feedback channel BSC(p₁), and the transmitter receives the vector x′.

Without loss of generality we assume that x_true = x_M. For the received block y we arrange all remaining codewords x₁, ..., x_{M−1} in increasing order of their distances d(x_i, y); after that relabeling, d(x₁, y) is the minimal distance, etc.
In case 2 it is necessary to have d(x_i, y) − d(x₁, y) ≥ tm, i = 2, ..., M − 1 (otherwise case 1 occurs). Moreover, we may assume that the distance d(x₁, y) satisfies the condition (m → ∞)

d(x₁, y)/m ≤ δ_GV(R/γ) − t + o(1),   R > 0,   (28)

which is equivalent to the inequality h( d(x₁, y)/m + t ) ≤ ln 2 − R/γ, d(x₁, y)/m + t < 1/2.

Indeed, the blocks y, x₁, ..., x_{M−1} are distributed independently and equiprobably among all 2^m binary vectors of length m. For u ≥ 0 introduce the random event

A(u) = { d(x₁, y) > (u − t)m;  d(x_i, y) − d(x₁, y) ≥ tm, i = 2, ..., M − 1 }.

Then

P{A(u)} ≤ (M − 1) P{ d(x₁, y) > (u − t)m } Π_{i=2}^{M−1} P{ d(x_i, y) > um } = (M − 1) P{ w(x) > (u − t)m } P^{M−2}{ w(x) > um } ≤ (M − 1)[ 1 − P{ w(x) ≤ um } ]^{M−2} ≤ (M − 1) exp{ −(M − 2) P{ w(x) ≤ um } },

where w(x) denotes the weight of an equiprobable random binary vector x of length m, and the inequality (1 − a)^b ≤ e^{−ab}, b ≥ 0, was used. Note that

P{ w(x) ≤ um } ≥ 2^{−m} (m choose ⌊um⌋) ≥ (m + 1)^{−1} 2^{−m} e^{m h(u)},

since [21, formula (12.40)] for any 0 ≤ k ≤ n the inequalities hold

(n + 1)^{−1} e^{n h(k/n)} ≤ (n choose k) ≤ e^{n h(k/n)}.

Therefore

P{A(u)} ≤ exp{ Rm/γ − [(M − 2)/(M(m + 1))] e^{[R/γ + h(u) − ln 2] m} }.

We choose u such that [R/γ + h(u) − ln 2] m ≥ √m. Then for sufficiently large m we have P{A(u)} ≤ e^{−√m}, and we may neglect an event of such small probability. Therefore the inequality (28) holds.

Assuming that x_true = x_M, for given y, x′, x_M and randomly (equiprobably) chosen x₁, x₂ introduce the set

F(y, x′, x_M) = { x₁, x₂ :  d(x₁, y) ≤ δ_GV(R/γ)m − tm,  d(x₁, x′) ≥ d(x₂, x′) }.

We are interested in the values

P₀ = P{ F(y, x′, x_M) | y, x′, x_M }   and   E_{y,x′,x_M} P₀^s,   0 < s ≤ 1.

Remark 6. In the definition of the set F(y, x′, x_M) we might include the additional constraints d(x₂, y) ≥ δ_GV(R/γ)m and d(x₁, y) ≥ d(x_M, y).
But it seems that they do not improve the exponent of P_n.

Note that if d(y, x′) ≤ tm, then P_n = P₀ = 0, where P₀ = P{ F(y, x′, x_M) | y, x′, x_M }. Moreover, if p₁ < t, then

P_n ≤ P{ d(y, x′) ≥ tm } ≤ e^{−m D(t ‖ p₁)}.   (29)

If d(y, x′) > tm, then for any nonnegative α, φ

P₀ ≤ E_{x₁,x₂}{ e^{α[(δ − t)m − d(x₁,y)] + φ[d(x₁,x′) − d(x₂,x′)]} | y, x′, x_M } = e^{α(δ−t)m} E_{x₁,x₂}{ e^{−α d(x₁,y) + φ[d(x₁,x′) − d(x₂,x′)]} | y, x′, x_M },

where δ = δ_GV(R/γ). For any a, b and an equiprobable random x

E_x[ e^{a d(x,y) + b d(x,x′)} | y, x′ ] = 2^{−m} (1 + e^{a+b})^m [ (e^a + e^b)/(1 + e^{a+b}) ]^{d(y,x′)}.

Then, when d(y, x′) > tm, we have

P₀ ≤ 4^{−m} e^{α(δ−t)m} (1 + e^{φ−α})^m (1 + e^{−φ})^m [ (e^{−α} + e^{φ})/(1 + e^{φ−α}) ]^{d(y,x′)}.

Since E b^{d(y,x′)} = (q₁ + p₁ b)^m for any b ≥ 0, we have

{ E[ b^{d(y,x′)}; d(y,x′) > tm ] }^{1/m} ≤ min_{μ≥0} { b^{−μt} ( q₁ + p₁ b^{1+μ} ) }.

Note that for b ≥ 1 (with z₁ = q₁/p₁)

min_{μ≥0} { b^{−μt} ( z₁ + b^{1+μ} ) } = e^{f(b,t,p₁)},

f(b, t, p₁) = { h(t) + (1 − t) ln z₁ + t ln b,   ln( t z₁/(1 − t) ) ≥ ln b;
               ln( z₁ + b ),                    ln( t z₁/(1 − t) ) ≤ ln b,   (30)

where the minimum is attained at

μ = μ₀ = [ ln( t z₁/(1 − t) ) / ln b − 1 ]₊.

We also have E e^{a d(x_M, y)} = (q + p e^a)^m. Denote

z = q/p,   z₁ = q₁/p₁,   q₁ = 1 − p₁.   (31)

The quantity in square brackets above equals

b = (1 + e^{φ+α})/(e^α + e^φ),   and   b − 1 = (e^α − 1)(e^φ − 1)/(e^α + e^φ) ≥ 0,

while (1 + e^{φ−α})(1 + e^{−φ}) = e^{−(α+φ)} (e^α + e^φ)(e^φ + 1). Averaging the bound for P₀^s over y, x′ with the help of (30), we get for 0 < s ≤ 1

( E_{y,x′,x_M} P₀^s )^{1/m} ≤ 4^{−s} p₁ e^{−[α(1−δ+t) + φ] s + f(b^s, t, p₁)} (e^α + e^φ)^s (e^φ + 1)^s.
(32)

We apply the random coding with expurgation method, using the inequality (Σ a_i)^{1/ρ} ≤ Σ a_i^{1/ρ}, ρ ≥ 1. We have

E_C P_n^{1/ρ}(C) ≤ M² E_{y,x′,x_M} P^{1/ρ}{ F(y, x′, x_M) | y, x′, x_M } = M² E_{y,x′,x_M} P₀^{1/ρ},

and then from (32) we get (ρ = 1/s ≥ 1)

[ E_C P_n^{1/ρ}(C) ]^{ρ/m} ≤ 4^{−1} e^{2Rρ/γ} p₁^ρ e^{ρ f(b^{1/ρ}, t, p₁) − α(1−δ+t) − φ} (e^α + e^φ)(e^φ + 1).

To avoid bulky formulas, we choose the parameters such that the following inequality holds (see (30)):

ρ ln( t z₁/(1 − t) ) ≥ ln b.   (33)

Then

[ E_C P_n^{1/ρ}(C) ]^{ρ/m} ≤ 4^{−1} e^{Gρ + F₀},   b = (1 + cd)/(c + d),   c = e^φ,  d = e^α,

G = 2R/γ + h(t) + ln( p₁^t q₁^{1−t} ) = 2R/γ − D(t ‖ p₁),

F₀ = −(1 − δ + t) ln d − ln c + t ln(1 + cd) + ln(1 + c) + (1 − t) ln(d + c),

and we should minimize F₀ over c, d ≥ 1 (we write F₀ in order not to confuse this function with the exponent F(R, p, p₁)).

Note that F₀ does not depend on ρ. If G < 0, then it is best to let ρ → ∞. Since

[ E_C P_n^{1/ρ}(C) ]^{ρ/m} → 0,   ρ → ∞,

we may assume that P_n = 0. If G ≥ 0, then the best choice is ρ = 1 (and then it is better to use simply the random coding method). In both cases the condition (33) must be satisfied.

If ρ → ∞, then the inequality (33) is equivalent to the condition t z₁/(1 − t) > 1, which holds if p₁ < t. We set t > p₁ such that 2R/γ − D(t ‖ p₁) < 0. Then G < 0, P_n = 0, and from (26), (18) we get

F(R, p, p₁) ≥ max_{γ, t > p₁} min{ γ E_low(R/γ, p, 2) − (tγ/3) ln(q/p),   γ E_low(R/γ, p) + (1 − γ) E₁(p) }.   (34)

Using t = t₂(R/γ, p₁) ≥ p₁ (see (9)), we get from (34)

F(R, p, p₁) ≥ max_γ min{ γ E_low(R/γ, p, 2) − ( γ t₂(R/γ, p₁)/3 ) ln(q/p),   γ E_low(R/γ, p) + (1 − γ) E₁(p) },   (35)

from which the formulas (10), (11) and Theorem 1 follow. △

Remark 7. Note that if p₁ → 0, then t₂ → 0, and the relation (35) turns into the similar relation (19) for the channel with noiseless feedback.

To find the function p₁(R, p) of the critical noise level in the feedback channel, we let γ → 1. Then p₁ = p₁(R, p) is defined by the system of equations

E_low(R, p, 2) − (t/3) ln(q/p) = E_low(R, p),
D(t ‖ p₁) = 2R.
In other words, t₁(R, p) and p₁(R, p) ≤ t₁(R, p) are defined by the formulas (7) and (8), respectively.

§ 5. When does noisy feedback behave like noiseless feedback?

How small should p₁ be in order to have the error exponent F(R, p, p₁) close to the similar exponent F(R, p) for noiseless feedback? More exactly, for a given α ∈ (0, 1), when does the inequality

F(R, p, p₁) − E(R, p) ≥ (1 − α)[ F(R, p) − E(R, p) ]

hold? We give a simple estimate for such p₁, considering only the case R = 0. For the optimal γ = γ₀ from (10), (11) we have (using E₁(p) = 2E(0, p) and t₂(0, p₁) = p₁)

γ₀ = 2E(0, p) / [ E(0, p, 2) + E(0, p) − p₁ ln(q/p)/3 ],

and then

F(0, p, p₁) = 2E(0, p) [ E(0, p, 2) − p₁ ln(q/p)/3 ] / [ E(0, p, 2) + E(0, p) − p₁ ln(q/p)/3 ],

F(0, p, p₁) − E(0, p) = E(0, p) [ E(0, p, 2) − E(0, p) − p₁ ln(q/p)/3 ] / [ E(0, p, 2) + E(0, p) − p₁ ln(q/p)/3 ].

In order to have F(0, p, p₁) − E(0, p) ≥ (1 − α)[ F(0, p) − E(0, p) ], it is sufficient to have

p₁ ≤ 3α [ E(0, p, 2) − E(0, p) ][ E(0, p, 2) + E(0, p) ] / { [ αE(0, p, 2) + (2 − α)E(0, p) ] ln(q/p) }.

Since E(0, p, 2) ≥ E(0, p), and hence E(0, p, 2) + E(0, p) ≥ αE(0, p, 2) + (2 − α)E(0, p), without much loss we may replace the last inequality by the stronger one

p₁ ≤ 3α [ E(0, p, 2) − E(0, p) ] / ln(q/p) = p₂(p, α).

In Fig. 4 the plot of the function p₂(p, α) is given for a fixed α.

Example 2. Consider the case p = (1 − ε)/2, ε → 0. Then C(p) ≈ ε²/2 and E(0, p, 2) ≈ 2C/3, E(0, p) ≈ C/2. As a result, we get

p₂(p, α) = α(1 − 2p)[1 + o(1)]/8,   p → 1/2.

In other words, if the forward channel BSC(p) is very bad, then in order to improve its error exponent we need a very good feedback channel BSC(p₁).

§ 6. Auxiliary formulas and results

Lower bounds for the decoding error exponents. All formulas below are derived following Gallager's technique [15, 16].

1) Random coding bound:

E(R, p, L) ≥ E_r(R, p, L),   R ≥ 0.
(36)

Moreover ($R_{crit,L}(p)$ is defined in (1)),

$$E(R,p,L) = E_r(R,p,L) = E_{sp}(R,p), \qquad R_{crit,L}(p) \le R \le C(p), \qquad (37)$$

and for $R \le R_{crit,L}(p)$ we have

$$E(R,p,L) \ge E_r(R,p,L) = L(\ln 2 - R) - (1+L)\ln\bigl[p^{1/(1+L)} + q^{1/(1+L)}\bigr]. \qquad (38)$$

Since $R_{crit,L}(p) \to 0$ as $L \to \infty$, we have $E(R,p,L) \to E_{sp}(R,p)$, $L \to \infty$, for any $R \ge 0$.

2) Random coding with expurgation bound:

$$E(R,p,L) \ge E_{ex}(R,p,L) = \max_{\rho \ge 1}\{-\rho L R - \rho\ln f(p,L,\rho)\}, \qquad R \ge 0, \qquad (39)$$

where

$$f(p,L,\rho) = 2^{-(L+1)}\Bigl\{2 + \sum_{i=1}^{L}\binom{L+1}{i} a_i^{1/\rho}\Bigr\}, \qquad a_i = p\Bigl(\frac{q}{p}\Bigr)^{i/(L+1)} + q\Bigl(\frac{p}{q}\Bigr)^{i/(L+1)}.$$

The bound (39) improves the random coding bound (38) for $0 \le R < R_{min,L}(p)$ (see (42)), but it does not give $E_{sp}(R,p)$. Note also that

$$f(p,L,\rho) = \mathbf{E}\,\Bigl\{\sum_{y}\bigl[P(y|x_m)P(y|x_{m_1})\cdots P(y|x_{m_L})\bigr]^{1/(L+1)}\Bigr\}^{1/\rho}, \qquad (40)$$

where all components of each codeword $x_i$ are chosen independently and equiprobably from $\{0,1\}$.

In particular,

$$E_{ex}(R,p) = E_{ex}(R,p,1) = \max_{\rho \ge 1}\bigl\{\rho\ln 2 - \rho R - \rho\ln\bigl[1 + (2\sqrt{pq})^{1/\rho}\bigr]\bigr\},$$
$$E_{ex}(R,p,2) = \max_{\rho \ge 1}\bigl\{\rho\ln 4 - 2\rho R - \rho\ln\bigl[1 + 3\bigl(p^{1/3}q^{2/3} + p^{2/3}q^{1/3}\bigr)^{1/\rho}\bigr]\bigr\}.$$

The functions $E(R,p,L)$, $E_r(R,p,L)$ and $E_{ex}(R,p,L)$ do not decrease in $L$. In particular,

$$E_{ex}(R,p) < E_{ex}(R,p,2), \qquad R < R_{crit}(p). \qquad (41)$$

In order to get a more convenient representation for the functions $E_{ex}(R,p)$ and $E_{ex}(R,p,L)$, introduce the rates

$$R_{min,L}(p) = \ln 2 - \frac{L+1}{L}\ln\bigl[p^{1/(L+1)} + q^{1/(L+1)}\bigr] + \frac{\sum_{i=1}^{L}\binom{L+1}{i}\,a_i\ln a_i}{2L\,\bigl[p^{1/(L+1)} + q^{1/(L+1)}\bigr]^{L+1}}. \qquad (42)$$

The function $R_{min,L}(p)$ monotonically decreases in $L$, and $R_{min,L}(p) < R_{crit,L}(p)$ if $L \ge 1$ and $0 < p < 1/2$. In particular,

$$R_{min}(p) = R_{min,1}(p) = \ln 2 - h\Bigl(\frac{2\sqrt{pq}}{1 + 2\sqrt{pq}}\Bigr),$$
$$R_{min,2}(p) = \ln 2 - \frac{1}{2}\Bigl[\ln(1+3a) - \frac{3a\ln a}{1+3a}\Bigr], \qquad a = p^{1/3}q^{2/3} + p^{2/3}q^{1/3}.$$
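The maximization over $\rho$ in (39) can be checked by a simple grid search; at $R = 0$ the maximum is attained as $\rho \to \infty$ and approaches the zero-rate value in (45). The sketch below is an illustration under these reconstructed formulas, with hypothetical function names:

```python
import math

def f_expurg(p, L, rho):
    """f(p,L,rho) from (39): 2^{-(L+1)} (2 + sum_i C(L+1,i) a_i^{1/rho})."""
    q = 1.0 - p
    total = 2.0
    for i in range(1, L + 1):
        a_i = p * (q / p) ** (i / (L + 1)) + q * (p / q) ** (i / (L + 1))
        total += math.comb(L + 1, i) * a_i ** (1.0 / rho)
    return total / 2 ** (L + 1)

def E_ex(R, p, L, rho_max=200.0, steps=20000):
    """Grid-search max_{rho >= 1} { -rho L R - rho ln f(p,L,rho) }."""
    best = -float("inf")
    for k in range(steps + 1):
        rho = 1.0 + (rho_max - 1.0) * k / steps
        best = max(best, -rho * L * R - rho * math.log(f_expurg(p, L, rho)))
    return best

# At R = 0, L = 1 the bound approaches E(0,p) = (1/2) ln(1/(2 sqrt(pq))).
p = 0.05
zero_rate = 0.5 * math.log(1.0 / (2.0 * math.sqrt(p * (1.0 - p))))
assert abs(E_ex(0.0, p, 1) - zero_rate) < 1e-2
```

Note that for $L = 1$ the sum in $f(p,1,\rho)$ reduces to $\tfrac{1}{2}\bigl[1 + (2\sqrt{pq})^{1/\rho}\bigr]$, which recovers Gallager's expurgated exponent for the BSC.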
(43)

We also have $R_{min,2}(p) < R_{min,1}(p) < R_{crit}(p)$, $0 < p < 1/2$. Now

$$E_{ex}(R,p,L) < E_r(R,p,L) = E_{sp}(R,p), \qquad R > R_{crit,L}(p),$$
$$E_{ex}(R,p,L) = E_r(R,p,L), \qquad R_{min,L}(p) \le R \le R_{crit,L}(p),$$
$$E_{ex}(R,p,L) > E_r(R,p,L), \qquad 0 \le R < R_{min,L}(p).$$

If $L = 1$, then

$$E_{ex}(R,p) = \frac{\delta_{GV}(R)}{2}\ln\frac{1}{4pq}, \qquad 0 \le R \le R_{min}(p). \qquad (44)$$

Note also that $0 \le R \le R_{min}(p)$ corresponds to the case $\delta_{GV}(R) \ge 2\sqrt{pq}/(1+2\sqrt{pq})$.

If $L = 2$, then

$$E_{ex}(R,p,2) = -v\ln a, \qquad 0 \le R \le R_{min,2}(p),$$

where $a$ is defined in (43), and $v$ is the unique root of the equation $\ln 4 - h(v) - v\ln 3 = 2R$, $0 \le v \le 3/4$. In particular,

$$E_{ex}(0,p) = E(0,p) = \frac{1}{2}\ln\frac{1}{2\sqrt{pq}}, \qquad E_{ex}(0,p,2) = E(0,p,2) = -\frac{3}{4}\ln\bigl(p^{1/3}q^{2/3} + p^{2/3}q^{1/3}\bigr) \qquad (45)$$

(the second relation is established in [18]).

E x i s t e n c e  o f  a  c o d e  w i t h  g i v e n  p r o p e r t i e s. We are interested in a code $C$ such that each of its codewords has certain properties $A_1, A_2, \ldots$. For that purpose we use the following result, which is a natural modification of the elegant Lemma 5.7 from [16].

Assume that we choose randomly (in an arbitrary way) a code $C$ with $M'$ codewords $x_m$, and for each $x_m$, $m = 1,\ldots,M'$, we have

$$\mathbf{P}_{\text{over codes}}\{x_m \text{ does not have property } A\} \le 1/2. \qquad (46)$$

L e m m a. If the condition (46) is satisfied, then in the ensemble of codes with $M' = 2M - 1$ codewords there exists a code for which the property $A$ is fulfilled for at least $M$ of its codewords.

P r o o f remains the same as in [16, Lemma 5.7] (it is the changing of the summation order in the corresponding double sum). ▲

If there are, say, four properties $A_1, \ldots, A_4$, then assume that for each $x_m$, $m = 1,\ldots,M'$, we have

$$\mathbf{P}_{\text{over codes}}\{x_m \text{ does not have property } A_i\} \le 1/8, \qquad i = 1,\ldots,4. \qquad (47)$$

C o r o l l a r y 2. If the condition (47) is satisfied, then in the ensemble of codes with $M' = 2M - 1$ codewords there exists a code for which all four properties $A_i$, $i = 1,\ldots,4$
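The relation between $R_{min}(p)$ and the Gilbert-Varshamov distance in (44) can be verified numerically: at $R = R_{min}(p)$ the GV distance equals $2\sqrt{pq}/(1+2\sqrt{pq})$. A small self-contained check (illustrative; function names are not from the paper):

```python
import math

def h(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def R_min(p):
    """R_min(p) = ln 2 - h(2 sqrt(pq) / (1 + 2 sqrt(pq)))."""
    u = 2.0 * math.sqrt(p * (1.0 - p))
    return math.log(2.0) - h(u / (1.0 + u))

def delta_GV(R, tol=1e-12):
    """Gilbert-Varshamov distance: the root of ln 2 - h(d) = R on (0, 1/2)."""
    lo, hi = tol, 0.5 - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.log(2.0) - h(mid) > R:
            lo = mid  # ln 2 - h(d) decreases in d, so d must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = 0.05
R = R_min(p)
u = 2.0 * math.sqrt(p * (1.0 - p))
assert abs(delta_GV(R) - u / (1.0 + u)) < 1e-6
assert 0.0 < R < math.log(2.0)
```

Below $R_{min}(p)$ the expurgated bound (44) then follows the straight line $\delta_{GV}(R)\ln\bigl(1/(2\sqrt{pq})\bigr)$ in $\delta_{GV}$.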
, are fulfilled.

In our case the property $A_1$ means that the codeword $x_m$ has small decoding error probability; $A_2$ means that $x_m$ has small list-size $L = 2$ decoding error probability; $A_3$, $A_4$ mean that for the codeword $x_m$ the relations (18) and (26), respectively, hold.

P r o o f  o f  t h e  f o r m u l a  (18). Consider a code $C$ with $M$ codewords $x_1, \ldots, x_M$ of length $n + k$. Each codeword $x_i$ has the form $x_i = (x'_i, x''_i)$, where $x'_i$ has length $n$ and $x''_i$ has length $k$. We suppose that the parts $\{x''_i\}$ are given, while the parts $\{x'_i\}$ are chosen randomly (in some way). We also assume that

$$\min_{i \ne j} d\bigl(x''_i, x''_j\bigr) = \delta k. \qquad (48)$$

Using maximum likelihood decoding, denote by $P_{e,m}$ the conditional decoding error probability provided the codeword $x_m$ was transmitted. An output block $y$ has the form $y = (y', y'')$, where $y'$, $y''$ have lengths $n$ and $k$, respectively. Then $P(y|x_m) = P(y'|x'_m)P(y''|x''_m)$. Using the inequality $(\sum a_i)^s \le \sum a_i^s$, $0 \le s \le 1$, and the formula

$$\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m'})} = (4pq)^{d(x'_m,\,x'_{m'})/2},$$

we have

$$P_{e,m}^{s} \le \sum_{m' \ne m}\Bigl[\sum_{y}\sqrt{P(y|x_m)P(y|x_{m'})}\Bigr]^{s} = \sum_{m' \ne m}\Bigl[\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m'})}\Bigr]^{s}\Bigl[\sum_{y''}\sqrt{P(y''|x''_m)P(y''|x''_{m'})}\Bigr]^{s} \le$$
$$\le \sum_{m' \ne m}\Bigl[\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m'})}\Bigr]^{s}\Bigl[\max_{m' \ne m}(2\sqrt{pq})^{d(x''_m,\,x''_{m'})}\Bigr]^{s} =$$
$$= (2\sqrt{pq})^{\delta s k}\sum_{m' \ne m}\Bigl[\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m'})}\Bigr]^{s} = (2\sqrt{pq})^{\delta s k}\sum_{m' \ne m}(4pq)^{s\,d(x'_m,\,x'_{m'})/2}. \qquad (49)$$

Consider an ensemble of codes in which each codeword $x'_m$ is selected independently with probability $2^{-n}$ among all possible binary vectors of length $n$. Since

$$\mathbf{E}\,z^{d(x'_m,\,x'_{m'})} = \mathbf{E}\,z^{w(x'_m)} = \Bigl(\frac{1+z}{2}\Bigr)^{n},$$

we get

$$\bigl(\mathbf{E}\,P_{e,m}^{s}\bigr)^{1/s} \le (2\sqrt{pq})^{\delta k}\bigl\{e^{R-\ln 2}\,\bigl[1 + (2\sqrt{pq})^{s}\bigr]\bigr\}^{n/s}.$$

Further derivation follows Theorem 5.7.1 from [16].
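The Bhattacharyya identity used above, $\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m'})} = (4pq)^{d/2}$, is exact for the BSC and can be confirmed by direct enumeration over all outputs for short blocks. The sketch below is a self-contained check (the helper names are hypothetical):

```python
import math
from itertools import product

def bsc_likelihood(y, x, p):
    """P(y|x) for a BSC(p) acting independently on each coordinate."""
    d = sum(a != b for a, b in zip(y, x))
    return p ** d * (1.0 - p) ** (len(x) - d)

def bhattacharyya_sum(x1, x2, p):
    """sum over all binary outputs y of sqrt(P(y|x1) P(y|x2))."""
    n = len(x1)
    return sum(
        math.sqrt(bsc_likelihood(y, x1, p) * bsc_likelihood(y, x2, p))
        for y in product((0, 1), repeat=n)
    )

p, x1, x2 = 0.1, (0, 0, 1, 0, 1), (1, 0, 1, 1, 1)
d = sum(a != b for a, b in zip(x1, x2))  # Hamming distance = 2
assert d == 2
assert abs(bhattacharyya_sum(x1, x2, p) - (4.0 * p * (1.0 - p)) ** (d / 2)) < 1e-12
```

The identity factorizes over coordinates: agreeing positions contribute $p + q = 1$ and disagreeing ones contribute $2\sqrt{pq}$, which is exactly the step exploited in (49).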
As a result, defining $\rho = 1/s$, $\rho \ge 1$, we get that there exists a code with $M$ codewords such that for any $m = 1,\ldots,M$ we have

$$\frac{1}{n}\ln\frac{1}{P_{e,m}} \ge \frac{\delta k}{n}\ln\frac{1}{2\sqrt{pq}} + \max_{\rho \ge 1}\bigl\{\rho\ln 2 - \rho R - \rho\ln\bigl[1 + (2\sqrt{pq})^{1/\rho}\bigr]\bigr\}.$$

From that relation the formula (18) follows. ▲

The authors wish to thank the University of Tokyo for supporting this joint research.

REFERENCES

1. Burnashev M.V., Yamamoto H. On Zero-Rate Error Exponent for BSC with Noisy Feedback // Problems of Inform. Transm. 2008. V. 44, № 3. P. 33–49.
2. Shannon C.E. The Zero Error Capacity of a Noisy Channel // IRE Trans. Inform. Theory. 1956. V. 2, № 3. P. 8–19.
3. Dobrushin R.L. Asymptotic bounds on error probability for message transmission in a memoryless channel with feedback // Probl. Kibern. No. 8. M.: Fizmatgiz, 1962. P. 161–168.
4. Horstein M. Sequential Decoding Using Noiseless Feedback // IEEE Trans. Inform. Theory. 1963. V. 9, № 3. P. 136–143.
5. Berlekamp E.R. Block Coding with Noiseless Feedback. Ph.D. Thesis. MIT, Dept. of Electrical Engineering, 1964.
6. Elias P. Coding for Noisy Channels // IRE Conv. Rec. 1955. V. 4. P. 37–46. Reprinted in Key Papers in the Development of Information Theory. New York: IEEE Press, 1974. P. 102–111.
7. Burnashev M.V. Data transmission over a discrete channel with feedback: Random transmission time // Problems of Inform. Transm. 1976. V. 12, № 4. P. 10–30.
8. Burnashev M.V. On a Reliability Function of Binary Symmetric Channel with Feedback // Problems of Inform. Transm. 1988. V. 24, № 1. P. 3–10.
9. Pinsker M.S. The probability of error in block transmission in a memoryless Gaussian channel with feedback // Problems of Inform. Transm. 1968. V. 4, № 4. P. 3–19.
10. Schalkwijk J.P.M., Kailath T. A Coding Scheme for Additive Noise Channels with Feedback – I: No Bandwidth Constraint // IEEE Trans. Inform. Theory. 1966. V. 12, № 2. P. 172–182.
11. Tchamkerten A., Telatar E. Variable Length Coding over an Unknown Channel // IEEE Trans. Inform. Theory. 2006. V. 52, № 5. P. 2126–2145.
12. Yamamoto H., Itoh R.
Asymptotic Performance of a Modified Schalkwijk–Barron Scheme for Channels with Noiseless Feedback // IEEE Trans. Inform. Theory. 1979. V. 25, № 6. P. 729–733.
13. Draper S.C., Sahai A. Noisy Feedback Improves Communication Reliability // Proc. IEEE Int. Sympos. on Information Theory. Seattle, USA. July 9–14, 2006. P. 69–73.
14. Kim Y.-H., Lapidoth A., Weissman T. The Gaussian Channel with Noisy Feedback // Proc. IEEE Int. Sympos. on Information Theory. Nice, France. June 24–29, 2007. P. 1416–1420.
15. Gallager R.G. A Simple Derivation of the Coding Theorem and some Applications // IEEE Trans. Inform. Theory. 1965. V. 11. P. 3–18.
16. Gallager R.G. Information Theory and Reliable Communication. New York: Wiley, 1968.
17. Burnashev M.V. Code spectrum and reliability function: binary symmetric channel – II // Problems of Inform. Transm. (in press).
18. Blinovsky V.M. Error probability exponent of list decoding at low rates // Problems of Inform. Transm. 2001. V. 37, № 4. P. 277–287.
19. Burnashev M.V., Yamamoto H. Noisy Feedback Improves the BSC Reliability Function // Proc. IEEE Int. Sympos. on Information Theory. Seoul, Korea. June 28 – July 3, 2009. P. 1501–1505.
20. Shannon C.E., Gallager R.G., Berlekamp E.R. Lower bounds to error probability for codes on discrete memoryless channels // Information and Control. 1967. V. 10. Part I, P. 65–103; Part II, P. 522–552.
21. Cover T.M., Thomas J.A. Elements of Information Theory. New York: Wiley, 1991.

Burnashev Marat Valievich
Institute for Information Transmission Problems RAS
[email protected]

Yamamoto Hirosuke
The University of Tokyo, Japan
[email protected]

Fig. 2. The plot of the function $p_0(R, p)$ ($R_{crit} \approx \ldots$).

Fig. 3. The plots of the functions $F(R,p)$ and $E_{ex}(R,p)$ for $p = 0.\ldots$ ($R_{crit} \approx \ldots$).

Fig. 4. The plot of the function $p_0(p, 0.1)$.