A Geometric Property of Relative Entropy and the Universal Threshold Phenomenon for Binary-Input Channels with Noisy State Information at the Encoder
Shengtian Yang*† and Jun Chen†
* School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
† Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
Email: [email protected], [email protected]
Abstract: Tight lower and upper bounds on the ratio of relative entropies of two probability distributions with respect to a common third one are established, where the three distributions are collinear in the standard $(n-1)$-simplex. These bounds are leveraged to analyze the capacity of an arbitrary binary-input channel with noisy causal state information (provided by a side channel) at the encoder and perfect state information at the decoder, and in particular to determine the exact universal threshold on the noise measure of the side channel, above which the capacity is the same as that with no encoder side information.

I. INTRODUCTION
It is shown in [1, Lemma 1] that for any binary-input channel with noisy causal state information (provided by a side channel) at the encoder and perfect state information at the decoder, if the side channel is a generalized erasure channel and the erasure probability is greater than or equal to $1 - e^{-1}$, then the capacity is the same as that with no encoder side information. In other words, $1 - e^{-1}$ is a universal upper bound on the erasure probability threshold, which does not depend on the characteristics of the binary-input channel and the state distribution. However, as is noted in [1, Footnote 2], this bound is not tight, so determining the exact universal erasure probability threshold remains an interesting open problem. It is worth mentioning that, with the erasure probability replaced by a suitably defined noise measure, this universal threshold holds for all side channels (see [1, Theorem 3] and (10)).

We shall settle this open problem by characterizing a certain geometric property of relative entropy (also called the Kullback-Leibler divergence). Throughout this paper, all logarithms are base-$e$. The standard $(n-1)$-simplex is denoted by $\mathcal{P}_n$. The set of all maps from $A$ to $B$ is denoted by the power set $B^A$. The support set of a map $f$ is denoted by $\mathrm{supp}(f)$. The minimum and the maximum of $x$ and $y$ are denoted by $x \wedge y$ and $x \vee y$, respectively, and $(x)^+ := x \vee 0$.

The contributions of this work are summarized in the following theorems. Theorems 1.1 and 1.2 give tight lower and upper bounds (1) on the ratio of relative entropies of two probability distributions with respect to a common third one, where the three distributions are collinear in $\mathcal{P}_n$. Theorem 1.3 determines the exact universal erasure probability threshold and, more generally, the exact universal threshold (11) on the noise measure (10) of an arbitrary side channel.

Theorem 1.1:
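Given $\alpha, \beta \in \mathcal{P}_n$, we define $z(t) = t\alpha + (1-t)\beta$ for $t \in [0,1]$. For $0 \le a \le 1$ and $0 \le c < b \le 1$, we define $u = z(a)$, $v = z(b)$, and $w = z(c)$. Suppose $D(v\|u) = r D(w\|u)$, where $D(v\|u)$ and $D(w\|u)$ are both finite and positive (which implies $\alpha \ne \beta$, $a \ne b$, and $a \ne c$). Then
\[ \frac{1}{\rho(1-a, 1-c, 1-b)} < r < \rho(a, b, c), \tag{1} \]
where
\[ \rho(a,b,c) := \begin{cases} \dfrac{\xi_a(b)}{\xi_a(c)} & \text{if } 0 \le a < 1, \\[1ex] \dfrac{1-b}{1-c} & \text{otherwise}, \end{cases} \tag{2} \]
\[ \xi_s(t) := \zeta_t(t) - \zeta_t(s), \tag{3} \]
\[ \zeta_t(s) := s + (1-t)\ln(1-s). \tag{4} \]
Equivalently,

a) For fixed $r$, $b$, and $c$,
\[ a \in I_{1,\downarrow} \cup I_{1,\uparrow_1} \cup I_{1,\uparrow_2}, \tag{5} \]
where
\[ I_{1,\downarrow} = \left(1 - \rho^{-1}_{\downarrow,1-c,1-b}(1/r),\ \rho^{-1}_{\downarrow,b,c}(r)\right), \]
\[ I_{1,\uparrow_1} = \begin{cases} \left(\rho^{-1}_{\uparrow_1,b,c}(r),\ 1 - \rho^{-1}_{\uparrow_2,1-c,1-b}(1/r)\right) & \text{if } r \ge \dfrac{\zeta_b(b)}{\zeta_c(c)}, \\[1ex] \left[0,\ 1 - \rho^{-1}_{\uparrow_2,1-c,1-b}(1/r)\right) & \text{if } \dfrac{b}{c} < r < \dfrac{\zeta_b(b)}{\zeta_c(c)}, \\[1ex] \emptyset & \text{otherwise}, \end{cases} \]
\[ I_{1,\uparrow_2} = \begin{cases} \left(\rho^{-1}_{\uparrow_2,b,c}(r),\ 1 - \rho^{-1}_{\uparrow_1,1-c,1-b}(1/r)\right) & \text{if } 0 < r \le \dfrac{\zeta_{1-b}(1-b)}{\zeta_{1-c}(1-c)}, \\[1ex] \left(\rho^{-1}_{\uparrow_2,b,c}(r),\ 1\right] & \text{if } \dfrac{\zeta_{1-b}(1-b)}{\zeta_{1-c}(1-c)} < r < \dfrac{1-b}{1-c}, \\[1ex] \emptyset & \text{otherwise}, \end{cases} \]
\[ \rho_{\uparrow_1,b,c} = \rho_{\cdot,b,c}|_{[0,c)}, \qquad \rho_{\uparrow_2,b,c} = \rho_{\cdot,b,c}|_{[b,1]}, \tag{6a} \]
\[ \rho_{\downarrow,b,c} = \rho_{\cdot,b,c}|_{(c,b]}, \tag{6b} \]
where $\rho_{\cdot,b,c}$ denotes the function $\rho$ of the first argument (with other arguments fixed).

b) For fixed $r$, $a$, and $c$,
\[ b \in I_2(r,a,c) \cup I_3(r,a,c), \tag{7} \]
where
\[ I_2(r,a,c) = \begin{cases} \left(1 - \xi^{-1}_{1-a,\uparrow}(r\xi_{1-a}(1-c)),\ \xi^{-1}_{a,\downarrow}(r\xi_a(c))\right) & \text{if } c < a \text{ and } r < 1, \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ I_3(r,a,c) = \begin{cases} \left(\xi^{-1}_{a,\uparrow}(r\xi_a(c)) \vee c,\ 1 - \xi^{-1}_{1-a,\downarrow}(r\xi_{1-a}(1-c))\right) & \text{if } r \le \dfrac{1}{\rho(1-a,1-c,0)}, \\[1ex] \left(\xi^{-1}_{a,\uparrow}(r\xi_a(c)) \vee c,\ 1\right] & \text{if } \dfrac{1}{\rho(1-a,1-c,0)} < r < \rho(a,1,c), \\[1ex] \emptyset & \text{otherwise}, \end{cases} \]
\[ \xi_{a,\downarrow} = \xi_a|_{[0,a]}, \qquad \xi_{a,\uparrow} = \xi_a|_{[a,1]}. \tag{8} \]

c) For fixed $r$, $a$, and $b$,
\[ c \in \left(1 - I_2(1/r, 1-a, 1-b)\right) \cup \left(1 - I_3(1/r, 1-a, 1-b)\right), \tag{9} \]
where $1 - A := \{1 - x : x \in A\}$.

Theorem 1.2: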
Given $0 \le a \le 1$, $0 \le c < b \le 1$, and $a \ne b, c$, we define $u = z(a)$, $v = z(b)$, and $w = z(c)$, where $z(t) = t\alpha + (1-t)\beta$ and $\alpha, \beta \in \mathcal{P}_n$. Then
\[ \rho(a,b,c) = \sup_{(\alpha,\beta) \in Q} \frac{D(v\|u)}{D(w\|u)}, \]
where $Q$ is the set of all pairs $(\alpha, \beta)$ such that $D(v\|u)$ and $D(w\|u)$ are finite and positive.

In particular, if $\alpha = (1 - \delta f(\delta), \delta f(\delta), 0, \ldots, 0)$ and $\beta = (1 - \delta, \delta, 0, \ldots, 0)$, where $\delta \in (0,1)$, $0 \le f(\delta) < 1$, and $\lim_{\delta \to 0^+} f(\delta) = 0$, then
\[ \lim_{\delta \to 0} \frac{D(v\|u)}{D(w\|u)} = \rho(a,b,c). \]

Theorem 1.3:
Let $p_{Y|X,S}$ be a memoryless channel with input $X$, output $Y$, and state $S$ distributed according to $p_S$. The channel state $S$ is known at the decoder, and a noisy state observation $\tilde{S}$, generated by $S$ through side channel $p_{\tilde{S}|S}$, is causally available at the encoder. Here, $X$, $Y$, $S$, $\tilde{S}$ are over finite alphabets $\mathcal{X} = \{0,1\}$, $\mathcal{Y}$, $\mathcal{S}$, and $\tilde{\mathcal{S}}$, respectively.

a) If
\[ \gamma(p_{\tilde{S}|S}) := \sum_{\tilde{s} \in \tilde{\mathcal{S}}} \min_{s \in \mathcal{S}} p_{\tilde{S}|S}(\tilde{s}|s) \tag{10} \]
\[ \ge T := 1 - \xi^{-1}_{e^{-1},\uparrow}\left(\xi_{e^{-1}}(0)\right) \approx 0.3252, \tag{11} \]
then
\[ C(p_{Y|X,S}, p_S, p_{\tilde{S}|S}) = C(p_{Y|X,S}, p_S), \tag{12} \]
where $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ and $C(p_{Y|X,S}, p_S)$ denote the capacities of channel $p_{Y|X,S}$ with $\tilde{S}$ causally available and unavailable at the encoder, respectively.

b) Suppose $\mathcal{Y} = \mathcal{S} = \{0,1\}$ and $\tilde{\mathcal{S}} = \{0,1,2\}$. The channel $p_{Y|X,S}$ with state $S$ is given by
\[ p_{Y|X,S=0} = \begin{bmatrix} p_{Y|X,S}(0|0,0) & p_{Y|X,S}(1|0,0) \\ p_{Y|X,S}(0|1,0) & p_{Y|X,S}(1|1,0) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1-\delta & \delta \end{bmatrix}, \tag{13a} \]
\[ p_{Y|X,S=1} = \begin{bmatrix} 1-\delta & \delta \\ 1 & 0 \end{bmatrix}, \tag{13b} \]
\[ p_S = (p_S(0), p_S(1)) = (1-\delta, \delta), \tag{13c} \]
where $\delta \in (0, 0.5)$. For any $\iota \in (0, T)$, if
\[ p_{\tilde{S}|S} = \begin{bmatrix} p_{\tilde{S}|S}(0|0) & p_{\tilde{S}|S}(1|0) & p_{\tilde{S}|S}(2|0) \\ p_{\tilde{S}|S}(0|1) & p_{\tilde{S}|S}(1|1) & p_{\tilde{S}|S}(2|1) \end{bmatrix} = \begin{bmatrix} 1-\epsilon & 0 & \epsilon \\ 0 & 1-\epsilon & \epsilon \end{bmatrix} \]
with $\epsilon = T - \iota$ (so that $\gamma(p_{\tilde{S}|S}) = T - \iota$), then
\[ C(p_{Y|X,S}, p_S, p_{\tilde{S}|S}) > C(p_{Y|X,S}, p_S) \tag{14} \]
for sufficiently small $\delta$.

Remark 1.4:
The capacity $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ is given by
\[ C(p_{Y|X,S}, p_S, p_{\tilde{S}|S}) = \max_{p_U} I(U; Y, S) = \max_{p_U} I(U; Y | S) \]
([2], [3, eq. (3)]), where $U$ is a random variable over $\mathcal{U} = \mathcal{X}^{\tilde{\mathcal{S}}}$ and satisfies
\[ p_{U,X,Y,S,\tilde{S}}(u,x,y,s,\tilde{s}) = p_U(u)\, p_S(s)\, p_{\tilde{S}|S}(\tilde{s}|s)\, 1\{x = u(\tilde{s})\}\, p_{Y|X,S}(y|x,s) \tag{15} \]
and $|\mathrm{supp}(p_U)| \le \min\{(|\mathcal{X}|-1)|\tilde{\mathcal{S}}| + 1,\ |\mathcal{S}||\mathcal{Y}|\}$ (which is optional, see [4, Theorem 7.2]). The capacity $C(p_{Y|X,S}, p_S)$ is given by
\[ C(p_{Y|X,S}, p_S) = \max_{p_X} I(X; Y, S) = \max_{p_X} I(X; Y | S) \]
([4, eq. (7.2)]).

A plot of $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ against $\epsilon$ for $\epsilon \in [0,1]$ is given in Fig. 1, where the channel $p_{Y|X,S}$ with state $S$ is given by (13) with a fixed $\delta$. The erasure probability threshold in this example is very close to the universal threshold $T$ given by (11).

The rest of this paper is organized as follows. The proofs of Theorems 1.1 and 1.2 are presented in Section II. The proof of Theorem 1.3 is given in Section III. Section IV contains some concluding remarks.

II. PROOFS OF THEOREMS 1.1 AND 1.2

Proof of Theorem 1.1:
For any $n$-dimensional probability distribution $p$ such that $p \ll z(1/2)$, we define $f_p(t) = D(p\|z(t))$. With no loss of generality, we assume that all components of $z(1/2)$ are nonzero. Then
\[ f_p'(t) = -\sum_{i=1}^n p_i \frac{\alpha_i - \beta_i}{z_i(t)} \quad\text{and}\quad f_p''(t) = \sum_{i=1}^n p_i \frac{(\alpha_i - \beta_i)^2}{(z_i(t))^2}, \]
where $0 < t < 1$.

[Fig. 1. A plot of $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ against $\epsilon$ for $\epsilon \in [0,1]$, where $p_{Y|X,S}$ and $p_S$ are given by (13) with a fixed $\delta$. The marked point on the curve is $(0.32087, 0.0036828)$.]

The condition $D(v\|u) = r D(w\|u)$ can be rewritten as
\[ \int_b^a f_v'(t)\,dt = r \int_c^a f_w'(t)\,dt. \]
Since for $0 < t < 1$,
\begin{align*}
(t-b) f_\alpha'(t) + (1-t) f_v'(t)
&= -\sum_{i=1}^n \frac{\alpha_i - \beta_i}{z_i(t)} \left[(t-b)\alpha_i + (1-t)v_i\right] \\
&= -\sum_{i=1}^n \frac{\alpha_i - \beta_i}{z_i(t)} \left\{(t-b)\alpha_i + (1-t)\left[b\alpha_i + (1-b)\beta_i\right]\right\} \\
&= -\sum_{i=1}^n \frac{\alpha_i - \beta_i}{z_i(t)} \left[t\alpha_i + (1-t)\beta_i\right](1-b) \\
&= -(1-b)\sum_{i=1}^n (\alpha_i - \beta_i) = 0
\end{align*}
and similarly $(t-c) f_\alpha'(t) + (1-t) f_w'(t) = 0$, we have
\[ \int_b^a \frac{b-t}{1-t} f_\alpha'(t)\,dt = r \int_c^a \frac{c-t}{1-t} f_\alpha'(t)\,dt \]
or
\[ F_g(r,a,b,c) = \int_b^a \frac{b-t}{1-t} g(t)\,dt - r \int_c^a \frac{c-t}{1-t} g(t)\,dt = 0, \tag{16} \]
where $g(t) = -f_\alpha'(t)$. Since the functions $\frac{b-t}{1-t}$ and $\frac{c-t}{1-t}$ are not integrable on $(b,1)$ and $(c,1)$, respectively, we assume that $a < 1$, and the case of $a = 1$ will have to be considered separately. Since $g'(t) = -f_\alpha''(t)$ is negative on $(0,1)$ and $\lim_{t \to 1^-} g(t) \ge \sum_{i=1}^n (\alpha_i - \beta_i) = 0$, $g(t)$ is strictly decreasing and positive on $(0,1)$. It is also bounded if $g(0)$ is finite.
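As a numerical sanity check of the integral representation behind (16), the following minimal sketch compares $D(v\|u)$ computed directly with $\int_b^a \frac{t-b}{1-t} g(t)\,dt$, which is the form obtained from $f_v'(t) = \frac{t-b}{1-t} g(t)$. The distributions, the parameter values, and all function names (`kl`, `mix`, `g`, `simpson`, `kl_via_integral`) are illustrative assumptions, not part of the paper; the integral is approximated with a plain composite Simpson rule.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(t, alpha, beta):
    """z(t) = t * alpha + (1 - t) * beta."""
    return [t * x + (1 - t) * y for x, y in zip(alpha, beta)]

def g(t, alpha, beta):
    """g(t) = -f'_alpha(t) = sum_i alpha_i (alpha_i - beta_i) / z_i(t)."""
    return sum(x * (x - y) / (t * x + (1 - t) * y) for x, y in zip(alpha, beta))

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n must be even. Works for lo > hi too."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3

def kl_via_integral(a, b, alpha, beta):
    """D(z(b) || z(a)) written as the integral of (t - b)/(1 - t) * g(t) from b to a."""
    return simpson(lambda t: (t - b) / (1 - t) * g(t, alpha, beta), b, a)

# Hypothetical example values (any strictly positive alpha != beta and a < 1 would do).
alpha = [0.7, 0.2, 0.1]
beta = [0.1, 0.3, 0.6]
a, b, c = 0.3, 0.8, 0.1

d_v = kl(mix(b, alpha, beta), mix(a, alpha, beta))
d_w = kl(mix(c, alpha, beta), mix(a, alpha, beta))
print(d_v, kl_via_integral(a, b, alpha, beta))
print(d_w, kl_via_integral(a, c, alpha, beta))
```

With $r = D(v\|u)/D(w\|u)$, the two integrals combine to $F_g(r,a,b,c) = -D(v\|u) + r D(w\|u) = 0$, which is exactly (16).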
If however $\lim_{t \to 0} g(t) = +\infty$ (which implies that $D(\alpha\|\beta) = +\infty$ and $a \ne 0$), then we define
\[ \tilde{g}(t) = \begin{cases} g(t) & \text{if } t \ge \epsilon, \\ g(\epsilon) & \text{otherwise}, \end{cases} \]
where $\epsilon$ is a positive number less than all positive numbers in $\{a, b, c\}$. It follows from (16) that
\[ F_{\tilde{g}}(r,a,b,c) \le 0 \tag{17} \]
in all cases, including the case $a > c = 0$. Observing that $g$ or $\tilde{g}$ is positive, bounded, continuous, and strictly decreasing on $(0,1)$, we denote the set of all such functions by $\mathcal{M}$.

By (4), $\zeta_t(s) = \int_0^s \frac{t-t'}{1-t'}\,dt'$, so that (16) with $g(t) = 1$ gives
\[ \zeta_b(a) - \zeta_b(b) - r\zeta_c(a) + r\zeta_c(c) = 0. \]
It is clear that
\[ r^* = \frac{\zeta_b(b) - \zeta_b(a)}{\zeta_c(c) - \zeta_c(a)} \]
is the unique solution of this equation, and hence $r < r^*$ (Propositions A.1 and A.4).

In case $a = 1$, we have $c < b < a$, so that
\[ D(v\|u) = D\left(\frac{b-c}{a-c}u + \frac{a-b}{a-c}w \,\Big\|\, u\right) < \frac{1-b}{1-c} D(w\|u), \]
and therefore $r < (1-b)/(1-c)$.

The above arguments prove the second inequality of (1). The first inequality can be obtained by exchanging $\alpha$ and $\beta$, with $1-a$, $1-c$, $1-b$, and $1/r$ in place of $a$, $b$, $c$, and $r$, respectively.

We have established the main part of the theorem. The three equivalent parts are easy consequences of Propositions B.1, B.5, B.6, and B.7.

a) If we fix $r$, $b$, and $c$, then from (1) it follows that
\[ \rho(a,b,c) > r, \tag{18a} \]
\[ \rho(1-a, 1-c, 1-b) > \frac{1}{r}, \tag{18b} \]
which combined with Proposition B.5 yields
\[ a \in \rho^{-1}_{\uparrow_1,b,c}((r,+\infty)) \cup \rho^{-1}_{\downarrow,b,c}((r,+\infty)) \cup \rho^{-1}_{\uparrow_2,b,c}((r,+\infty)) \]
and
\[ a \in \left[1 - \rho^{-1}_{\uparrow_1,1-c,1-b}((1/r,+\infty))\right] \cup \left[1 - \rho^{-1}_{\downarrow,1-c,1-b}((1/r,+\infty))\right] \cup \left[1 - \rho^{-1}_{\uparrow_2,1-c,1-b}((1/r,+\infty))\right]. \]
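The two inequalities (18a) and (18b), that is, the bounds (1), can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: it implements $\zeta$, $\xi$, and $\rho$ from (2)-(4), draws strictly positive random distributions (so that all relative entropies are finite), keeps $a$, $b$, $c$ at least 0.02 apart to avoid ill-conditioned ratios, and records the triple (lower bound, $r$, upper bound). The helper names, the sampling scheme, and the seed are hypothetical choices.

```python
import math
import random

def _xlogx(x):
    """x * ln(x) with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def xi(s, t):
    """xi_s(t) = zeta_t(t) - zeta_t(s), zeta_t(s) = s + (1 - t) ln(1 - s); needs s < 1."""
    return t - s + _xlogx(1 - t) - (1 - t) * math.log(1 - s)

def rho(a, b, c):
    """rho(a, b, c) as in (2)."""
    if a < 1:
        return xi(a, b) / xi(a, c)
    return (1 - b) / (1 - c)

def kl(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(t, alpha, beta):
    """z(t) = t * alpha + (1 - t) * beta."""
    return [t * x + (1 - t) * y for x, y in zip(alpha, beta)]

def random_distribution(rng, n):
    """A strictly positive probability vector of length n."""
    weights = [rng.random() + 0.05 for _ in range(n)]
    total = sum(weights)
    return [x / total for x in weights]

def check_bounds(trials=200, seed=0, n=4):
    """Return (lower, r, upper) triples; (1) predicts lower < r < upper."""
    rng = random.Random(seed)
    results = []
    while len(results) < trials:
        c = rng.uniform(0.0, 0.9)
        b = rng.uniform(c, 1.0)
        a = rng.uniform(0.0, 0.99)
        if min(b - c, abs(a - b), abs(a - c)) < 0.02:
            continue  # skip nearly degenerate configurations
        alpha = random_distribution(rng, n)
        beta = random_distribution(rng, n)
        u, v, w = mix(a, alpha, beta), mix(b, alpha, beta), mix(c, alpha, beta)
        r = kl(v, u) / kl(w, u)
        results.append((1.0 / rho(1 - a, 1 - c, 1 - b), r, rho(a, b, c)))
    return results

if __name__ == "__main__":
    triples = check_bounds()
    print(all(lower < r < upper for lower, r, upper in triples))
```

Generic distributions sit well inside the bounds; by Theorem 1.2 the upper bound is only approached by nearly degenerate pairs $(\alpha, \beta)$.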
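Since
\[ \rho^{-1}_{\uparrow_1,b,c}((r,+\infty)) = \begin{cases} \left(\rho^{-1}_{\uparrow_1,b,c}(r),\ c\right) & \text{if } r \ge \dfrac{\zeta_b(b)}{\zeta_c(c)}, \\[1ex] [0, c) & \text{otherwise}, \end{cases} \]
\[ [0,c) \supseteq 1 - \rho^{-1}_{\uparrow_2,1-c,1-b}((1/r,+\infty)) = \begin{cases} \left[0,\ 1 - \rho^{-1}_{\uparrow_2,1-c,1-b}(1/r)\right) & \text{if } r > \dfrac{b}{c}, \\[1ex] \emptyset & \text{otherwise}, \end{cases} \]
and $\zeta_b(b)/\zeta_c(c) > b/c$ (Proposition B.1), we have
\[ \rho^{-1}_{\uparrow_1,b,c}((r,+\infty)) \cap \left[1 - \rho^{-1}_{\uparrow_2,1-c,1-b}((1/r,+\infty))\right] = I_{1,\uparrow_1}. \tag{19} \]
Similarly, since
\[ (b,1] \supseteq \rho^{-1}_{\uparrow_2,b,c}((r,+\infty)) = \begin{cases} \left(\rho^{-1}_{\uparrow_2,b,c}(r),\ 1\right] & \text{if } r < \dfrac{1-b}{1-c}, \\[1ex] \emptyset & \text{otherwise}, \end{cases} \]
\[ (b,1] \supseteq 1 - \rho^{-1}_{\uparrow_1,1-c,1-b}((1/r,+\infty)) = \begin{cases} \left(b,\ 1 - \rho^{-1}_{\uparrow_1,1-c,1-b}(1/r)\right) & \text{if } r \le \dfrac{\zeta_{1-b}(1-b)}{\zeta_{1-c}(1-c)}, \\[1ex] (b, 1] & \text{otherwise}, \end{cases} \]
and $\zeta_{1-b}(1-b)/\zeta_{1-c}(1-c) < (1-b)/(1-c)$ (Proposition B.1), we have
\[ \rho^{-1}_{\uparrow_2,b,c}((r,+\infty)) \cap \left[1 - \rho^{-1}_{\uparrow_1,1-c,1-b}((1/r,+\infty))\right] = I_{1,\uparrow_2}. \tag{20} \]
It is also clear that
\[ \rho^{-1}_{\downarrow,b,c}((r,+\infty)) \cap \left[1 - \rho^{-1}_{\downarrow,1-c,1-b}((1/r,+\infty))\right] = \left(c,\ \rho^{-1}_{\downarrow,b,c}(r)\right) \cap \left(1 - \rho^{-1}_{\downarrow,1-c,1-b}(1/r),\ b\right) = I_{1,\downarrow}. \tag{21} \]
The equations (19), (20), and (21) together yield (5).

b) If we fix $r$, $a$, and $c$, then (18) with Propositions B.6 and B.7 yields
\[ b \in \rho^{-1}_{a,\downarrow,c}((r,+\infty)) \cup \rho^{-1}_{a,\uparrow,c}((r,+\infty)) \]
and
\[ b \in \left[1 - \rho^{-1}_{1-a,1-c,\uparrow}((1/r,+\infty))\right] \cup \left[1 - \rho^{-1}_{1-a,1-c,\downarrow}((1/r,+\infty))\right], \]
where
\[ \rho_{a,\downarrow,c} = \rho_{a,\cdot,c}|_{(c,a)}, \quad \rho_{a,\uparrow,c} = \rho_{a,\cdot,c}|_{(a \vee c, 1]}, \quad \rho_{a,b,\uparrow} = \rho_{a,b,\cdot}|_{[0, a \wedge b)}, \quad \rho_{a,b,\downarrow} = \rho_{a,b,\cdot}|_{(a,b)}. \]
Since
\[ \rho^{-1}_{a,\downarrow,c}((r,+\infty)) = \begin{cases} \left(c,\ \xi^{-1}_{a,\downarrow}(r\xi_a(c))\right) & \text{if } c < a \text{ and } r < 1, \\ \emptyset & \text{otherwise}, \end{cases} \]
and
\[ (c,a) \supseteq 1 - \rho^{-1}_{1-a,1-c,\downarrow}((1/r,+\infty)) = \begin{cases} \left(1 - \xi^{-1}_{1-a,\uparrow}(r\xi_{1-a}(1-c)),\ a\right) & \text{if } c < a \text{ and } r < 1, \\ (c,a) & \text{otherwise}, \end{cases} \]
we have
\[ \rho^{-1}_{a,\downarrow,c}((r,+\infty)) \cap \left[1 - \rho^{-1}_{1-a,1-c,\downarrow}((1/r,+\infty))\right] = I_2(r,a,c). \tag{22} \]
Since
\[ (a \vee c, 1] \supseteq \rho^{-1}_{a,\uparrow,c}((r,+\infty)) = \begin{cases} \left(\xi^{-1}_{a,\uparrow}(r\xi_a(c)) \vee c,\ 1\right] & \text{if } r < \rho(a,1,c), \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ (a \vee c, 1] \supseteq 1 - \rho^{-1}_{1-a,1-c,\uparrow}((1/r,+\infty)) = \begin{cases} \left(a \vee c,\ 1 - \xi^{-1}_{1-a,\downarrow}(r\xi_{1-a}(1-c))\right) & \text{if } r \le \dfrac{1}{\rho(1-a,1-c,0)}, \\[1ex] (a \vee c, 1] & \text{otherwise}, \end{cases} \]
and $1/\rho(1-a,1-c,0) < \rho(a,1,c)$ implied by (1), we have
\[ \rho^{-1}_{a,\uparrow,c}((r,+\infty)) \cap \left[1 - \rho^{-1}_{1-a,1-c,\uparrow}((1/r,+\infty))\right] = I_3(r,a,c). \tag{23} \]
Equation (7) then follows from (22) and (23).

c) By symmetry, Part (c) is an easy consequence of Part (b) with $1/r$, $1-a$, and $1-b$ in place of $r$, $a$, and $c$, respectively.

Proof of Theorem 1.2: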
Thanks to Theorem 1.1, it suffices to prove the second part. We first assume that $a \ne 1$. By the proof of Theorem 1.1,
\[ g(t) = -f_\alpha'(t) = \frac{\delta(1-\epsilon)(1-\delta\epsilon)}{(1-\delta\epsilon)t + (1-\delta)(1-t)} - \frac{\delta^2\epsilon(1-\epsilon)}{\delta\epsilon t + \delta(1-t)}, \]
where $\epsilon = f(\delta)$. It is clear that $g(0) = \frac{\delta(1-\epsilon)^2}{1-\delta}$, so that
\[ |g(0) - \delta(1-\epsilon)| = \frac{\delta(1-\epsilon)|\delta - \epsilon|}{1-\delta} \le \frac{\delta(1-\epsilon)(\delta \vee \epsilon)}{1-\delta}. \]
Furthermore,
\[ \frac{g(1-\epsilon^{1/2})}{\delta(1-\epsilon)} \ge \frac{1-\delta\epsilon}{1-\delta\epsilon+\epsilon^{1/2}} - \frac{\delta\epsilon}{\delta\epsilon^{1/2}} = 1 - \epsilon^{1/2} - \frac{\epsilon^{1/2}}{1-\delta\epsilon+\epsilon^{1/2}} \ge 1 - 3\epsilon^{1/2} \]
for $\delta$ sufficiently small. Since $g(t)$ is positive and strictly decreasing,
\[ \int_0^1 |g(t) - \delta(1-\epsilon)|\,dt \le \int_0^{1-\epsilon^{1/2}} \delta(1-\epsilon)M\,dt + \int_{1-\epsilon^{1/2}}^1 \delta(1-\epsilon)\,dt < \delta(1-\epsilon)(M + \epsilon^{1/2}) \]
for sufficiently small $\delta$, where $M = [(\delta \vee \epsilon)/(1-\delta)] \vee (3\epsilon^{1/2})$. Then,
\[ \lim_{\delta \to 0} \frac{\|g - \delta(1-f(\delta))\|}{\delta(1-f(\delta))} = 0, \]
so that the solution of (16) (solved for $r$) converges to $\rho(a,b,c)$ as $\delta \to 0$ (Proposition A.4), and therefore $\lim_{\delta \to 0} D(v\|u)/D(w\|u) = \rho(a,b,c)$.

As for the case $a = 1$, we have $0 \le c < b < a = 1$. For any $\delta' > 0$, since $D(v\|u)$ and $D(w\|u)$ are positive and finite and $\lim_{a \to 1} \rho(a,b,c) = \rho(1,b,c)$ (Proposition B.5), we let $u' = z(a')$ where $a'$ is arbitrarily close to $1$, so that
\[ \left|\frac{D(v\|u)}{D(w\|u)} - \frac{D(v\|u')}{D(w\|u')}\right| \le \delta' \]
and $|\rho(a',b,c) - \rho(1,b,c)| \le \delta'$. Furthermore, for sufficiently small $\delta$,
\[ \left|\frac{D(v\|u')}{D(w\|u')} - \rho(a',b,c)\right| \le \delta'. \]
Then
\[ \left|\frac{D(v\|u)}{D(w\|u)} - \rho(1,b,c)\right| \le 3\delta' \]
for sufficiently small $\delta$. Since $\delta'$ is arbitrary, the proof is complete.

III. PROOF OF THEOREM 1.3
Proof: a) To prove (12), we need to show that a capacity-achieving input distribution $p_X$ of channel $p_{Y,S|X}$ is also optimal for channel $p_{Y,S|U}$ (see Remark 1.4).

Since $p_X$ is capacity-achieving for $p_{Y,S|X}$, it follows from [5, Theorem 4.5.1] that
\[ D(p_{Y,S|X=0}\|p_{Y,S}) = D(p_{Y,S|X=1}\|p_{Y,S}) = C, \tag{24} \]
where
\[ p_{Y,S}(y,s) = p_S(s) \sum_{x \in \mathcal{X}} p_X(x)\, p_{Y|X,S}(y|x,s) \tag{25} \]
and $C = C(p_{Y|X,S}, p_S)$. This also implies that
\[ p_X(0), p_X(1) \in (e^{-1}, 1 - e^{-1}) \quad \text{(Theorem 1.1 with } r = 1\text{)}. \tag{26} \]
Since $0$ and $1$ can be regarded as constant mappings from $\tilde{\mathcal{S}}$ to $\mathcal{X}$, $p_X$ is also a valid input strategy for $p_{Y,S|U}$. We will show that $D(p_{Y,S|U=u}\|p_{Y,S}) \le C$ for all non-constant mappings $u \in \mathcal{U}$, so that the natural (zero) extension of $p_X$ over $\mathcal{U}$ achieves the capacity of $p_{Y,S|U}$ ([5, Theorem 4.5.1]).

With (15) and (25), $D(p_{Y,S|U=u}\|p_{Y,S})$ can be expressed as
\begin{align*}
D(p_{Y,S|U=u}\|p_{Y,S})
&= \sum_{y \in \mathcal{Y}, s \in \mathcal{S}} p_{Y,S|U}(y,s|u) \ln \frac{p_{Y,S|U}(y,s|u)}{p_{Y,S}(y,s)} \\
&= \sum_{y \in \mathcal{Y}, s \in \mathcal{S}} p_S(s)\, p_{Y|U,S}(y|u,s) \ln \frac{p_S(s)\, p_{Y|U,S}(y|u,s)}{p_{Y,S}(y,s)} \\
&= \sum_{y \in \mathcal{Y}, s \in \mathcal{S}} p_S(s)\, p_{Y|U,S}(y|u,s) \ln \frac{p_{Y|U,S}(y|u,s)}{p_{Y|S}(y|s)},
\end{align*}
where
\begin{align*}
p_{Y|U,S}(y|u,s) &= \sum_{x \in \mathcal{X}} p_{Y|X,S}(y|x,s)\, p_{X|U,S}(x|u,s) \\
&= \sum_{x \in \mathcal{X}} p_{Y|X,S}(y|x,s) \sum_{\tilde{s} \in \tilde{\mathcal{S}}} p_{\tilde{S}|S}(\tilde{s}|s)\, 1\{x = u(\tilde{s})\}
\end{align*}
and
\[ p_{Y|S}(y|s) = \sum_{x \in \mathcal{X}} p_X(x)\, p_{Y|X,S}(y|x,s). \tag{27} \]
Then
\[ D(p_{Y,S|U=u}\|p_{Y,S}) = \sum_{s \in \mathcal{S}} p_S(s) \sum_{y \in \mathcal{Y}} \left(\sum_{x \in \mathcal{X}} p_{X|U,S}(x|u,s)\, p_{Y|X,S}(y|x,s)\right) \ln \frac{\sum_{x \in \mathcal{X}} p_{X|U,S}(x|u,s)\, p_{Y|X,S}(y|x,s)}{\sum_{x \in \mathcal{X}} p_X(x)\, p_{Y|X,S}(y|x,s)} \]
with $p_{X|U,S}(x|u,s) = \sum_{\tilde{s} \in \tilde{\mathcal{S}}} p_{\tilde{S}|S}(\tilde{s}|s)\, 1\{x = u(\tilde{s})\}$, so that $D(p_{Y,S|U=u}\|p_{Y,S})$ becomes a function of the channel $p_{X|U=u,S}$ from $\mathcal{S}$ to $\mathcal{X}$.
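For orientation, the two quantities compared in the hypothesis of Part (a), the noise measure $\gamma$ of (10) and the threshold $T$ of (11), are easy to evaluate numerically. The sketch below is a hedged illustration, not the authors' code: it assumes the reading $T = 1 - \xi^{-1}_{e^{-1},\uparrow}(\xi_{e^{-1}}(0))$, which is the one consistent with the identity $\rho(e^{-1}, 1-T, 0) = 1$ used in the proof of Part (b), and it solves $\xi_{e^{-1}}(t) = \xi_{e^{-1}}(0)$ on $(e^{-1}, 1)$ by bisection. The function names and the example side channel are hypothetical.

```python
import math

def _xlogx(x):
    """x * ln(x) with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def xi(s, t):
    """xi_s(t) = zeta_t(t) - zeta_t(s), zeta_t(s) = s + (1 - t) ln(1 - s); needs s < 1."""
    return t - s + _xlogx(1 - t) - (1 - t) * math.log(1 - s)

def threshold(tol=1e-12):
    """T = 1 - t*, where t* in (1/e, 1) solves xi_{1/e}(t*) = xi_{1/e}(0).

    xi_s is strictly increasing on (s, 1), so bisection applies.
    """
    s = math.exp(-1)
    target = xi(s, 0.0)
    lo, hi = s, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if xi(s, mid) < target:
            lo = mid
        else:
            hi = mid
    return 1 - (lo + hi) / 2

def gamma(side_channel):
    """Noise measure (10): sum over outputs of the minimum over states.

    side_channel[s][k] is the probability of side observation k given state s.
    """
    n_out = len(side_channel[0])
    return sum(min(row[k] for row in side_channel) for k in range(n_out))

if __name__ == "__main__":
    T = threshold()
    print("T is approximately", round(T, 4))
    eps = 0.3  # hypothetical erasure probability
    erasure = [[1 - eps, 0.0, eps], [0.0, 1 - eps, eps]]
    print("gamma =", gamma(erasure), "-> condition of Part (a) holds:", gamma(erasure) >= T)
```

For the generalized erasure channel of Theorem 1.3(b), `gamma` simply returns the erasure probability, so the comparison `gamma(...) >= T` is the erasure-probability threshold test.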
For convenience, we denote this function by $\mathcal{D}(\kappa)$ with $\kappa = p_{X|U=u,S}$. By condition, $\gamma(p_{\tilde{S}|S}) \ge T$, so that $\gamma(p_{X|U=u,S}) \ge T$ (Proposition C.2), and therefore $\mathcal{D}(p_{X|U=u,S}) \le C$ (Propositions C.3 and C.4 with (24) and (26)).

b) When state information is not available, we have the channel
\[ p_{Y|X} = \begin{bmatrix} 1-\delta^2 & \delta^2 \\ 1-\delta+\delta^2 & \delta-\delta^2 \end{bmatrix} = \begin{bmatrix} 1-\delta' f(\delta') & \delta' f(\delta') \\ 1-\delta' & \delta' \end{bmatrix}, \]
where $\delta' = g(\delta) = \delta(1-\delta)$, which is invertible on $(0, 1/2)$, and $f(\delta') = g^{-1}(\delta')/(1 - g^{-1}(\delta'))$. Then it follows from Theorem 1.2 with $\alpha = p_{Y|X=0}$, $\beta = p_{Y|X=1}$, $a \in (0,1)$, $b = 1$, and $c = 0$ that
\[ \lim_{\delta \to 0} \frac{D(\alpha\|a\alpha + (1-a)\beta)}{D(\beta\|a\alpha + (1-a)\beta)} = \rho(a,1,0) \]
and $\rho(1-e^{-1}, 1, 0) = 1$. Since $D(v\|u)/D(w\|u)$ is continuous with respect to $a$ and $\rho(t,1,0)$ is strictly decreasing on $(0,1)$ (Proposition B.5), the capacity-achieving input distribution $p_X$ of channel $p_{Y|X}$ must satisfy
\[ \lim_{\delta \to 0} \left\|p_X - (1-e^{-1}, e^{-1})\right\| = 0. \]
On the other hand, for sufficiently small $\delta$,
\[ D(p_{Y|X=1,S=1}\|p_{Y|S=1}) > D(p_{Y|X=0,S=1}\|p_{Y|S=1}) \]
with $p_{Y|S=1}$ defined by (27), so it is tempting to use signal $1$ when $S = 1$. We choose the input strategy
\[ u(\tilde{s}) = \begin{cases} 1 & \text{if } \tilde{s} = 1, \\ 0 & \text{otherwise}. \end{cases} \]
Because of the random erasure of $p_{\tilde{S}|S}$, the actual input distributions under the strategy $u$ are $(1, 0)$ and $(\epsilon, 1-\epsilon)$ for the states $0$ and $1$, respectively. By Theorem 1.2 with $\alpha = p_{Y|X=1,S=1}$, $\beta = p_{Y|X=0,S=1}$, $a = e^{-1}$, $b \in (e^{-1}, 1)$, and $c = 0$, we have
\[ \lim_{\delta \to 0} \frac{D(b\alpha + (1-b)\beta\|a\alpha + (1-a)\beta)}{D(\beta\|a\alpha + (1-a)\beta)} = \rho(e^{-1}, b, 0) \]
and $\rho(e^{-1}, 1-T, 0) = 1$. Since $D(v\|u)/D(w\|u)$ is continuous with respect to $(a,b)$ (with $0 < a < b < 1$) and $\rho(e^{-1}, t, 0)$ is strictly increasing on $(e^{-1}, 1)$ (Proposition B.6),
\[ D((1-\epsilon)\alpha + \epsilon\beta\|p_{Y|S=1}) > D(\beta\|p_{Y|S=1}) \]
for $\epsilon = T - \iota$ and sufficiently small $\delta$. Therefore,
\begin{align*}
D(p_{Y,S|U=u}\|p_{Y,S}) &= \mathcal{D}(p_{X|U=u,S}) \\
&= p_S(0)\, D(p_{Y|X=0,S=0}\|p_{Y|S=0}) + p_S(1)\, D(\epsilon p_{Y|X=0,S=1} + (1-\epsilon) p_{Y|X=1,S=1}\|p_{Y|S=1}) \\
&> p_S(0)\, D(p_{Y|X=0,S=0}\|p_{Y|S=0}) + p_S(1)\, D(p_{Y|X=0,S=1}\|p_{Y|S=1}) \\
&= \mathcal{D}(\mathrm{U}_0) = D(p_{Y,S|X=0}\|p_{Y,S}),
\end{align*}
which implies (14) ([5, Theorem 4.5.1] and Remark 1.4), where $\mathcal{D}(\cdot)$ is defined in the proof of Part (a) and $\mathrm{U}_0$ denotes the deterministic useless channel with constant output $0$.

IV. CONCLUSION
We have established tight lower and upper bounds on the ratio of relative entropies of two probability distributions with respect to a common third one, where the three distributions are collinear in $\mathcal{P}_n$ (Theorems 1.1 and 1.2). These bounds enable us to settle an open problem left from [1], namely, determining the exact universal threshold on the noise measure of the side channel (Theorem 1.3).

It is worth noting that [6, Theorem 2] is a special case of Theorem 1.1 with $r = 1$, $b = 1$, and $c = 0$. A natural direction for future work is to extend Theorem 1.1 to more than two probability distributions and to quantum relative entropy.

APPENDIX A
PROPERTIES OF $F_g(r,a,b,c)$

Proposition A.1:
Let
\[ F_g(r,a,b,c) := \int_b^a \frac{b-t}{1-t} g(t)\,dt - r\int_c^a \frac{c-t}{1-t} g(t)\,dt, \tag{28} \]
where $g \in \mathcal{M}'$, the set of all positive, bounded, continuous, nonincreasing functions on $(0,1)$. Then $F_g(r,a,b,c)$ is strictly increasing in $r$ for fixed $a$, $b$, and $c$ with $0 \le a, b, c \le 1$ and $a \ne c, 1$.

Proof:
Observe that
\[ \frac{\partial F_g(r,a,b,c)}{\partial r} = -\int_c^a \frac{c-t}{1-t} g(t)\,dt, \]
which is positive whenever $a \ne c$.

Lemma A.2:
Let $f$ and $g$ be bounded measurable functions on $(I, \mathcal{B}(I))$ and $\lambda$ the Lebesgue measure on $\mathbb{R}$, where $I = [c,d]$ with $c < d$. The function $g$ is nonincreasing on $I$. If for $s \in I$,
\[ \int_{[c,s]} f(t)\,\lambda(dt) \ge 0 \tag{29} \]
with equality iff $s = c$ or $d$, then
\[ \int_I f(t) g(t)\,\lambda(dt) \ge 0 \tag{30} \]
with equality iff $g$ is constant on $(c,d)$, and for any $\mu \in \mathbb{R}$,
\[ \int_I f(t) g(t)\,\lambda(dt) \le M \int_I |g(t) - \mu|\,\lambda(dt), \tag{31} \]
where $M = \sup_{t \in I} |f(t)|$.

Proof:
Note that, owing to the nonincreasing property of $g$, the two limits $g(c^+)$ and $g(d^-)$ always exist. Let $h(t,t') = f(t)\, 1\{0 \le t' \le g(t)\}$, which is clearly integrable on $I \times \mathbb{R}$. By Fubini's theorem,
\begin{align*}
\int_I f(t) g(t)\,\lambda(dt)
&= \int_I \lambda(dt) \int_{\mathbb{R}} f(t)\, 1\{0 \le t' \le g(t)\}\,\lambda(dt') \\
&= \int_{I \times \mathbb{R}} h\,d(\lambda \times \lambda) \\
&= \int_{\mathbb{R}} \lambda(dt') \int_I f(t)\, 1\{0 \le t' \le g(t)\}\,\lambda(dt) \\
&= \int_{\mathbb{R}} \lambda(dt') \int_{J(t')} f(t)\,\lambda(dt) \\
&= \int_{[g(d^-), g(c^+)]} \lambda(dt') \int_{J(t')} f(t)\,\lambda(dt), \tag{32}
\end{align*}
where $J(t') = \{t \in I : g(t) \ge t'\}$ is an interval $[c,t'')$ or $[c,t'']$ with $t'' \in I$, and (32) holds because when $t' \notin [g(d^-), g(c^+)]$, $J(t')$ is $\emptyset$, $\{c\}$, $[c,d)$, or $[c,d]$, so that $\int_{J(t')} f\,d\lambda = 0$ by condition (29).

If $g$ is constant on $(c,d)$, then $g(c^+) = g(d^-)$, so that $\int_I fg\,d\lambda = 0$. On the other hand, if $g$ is not constant on $(c,d)$, then for any $g(d^-) < t' < g(c^+)$, $J(t') = [c,t'')$ or $[c,t'']$ with $c < t'' < d$, hence $\int_{J(t')} f\,d\lambda > 0$ for $t' \in (g(d^-), g(c^+))$, and therefore $\int_I fg\,d\lambda > 0$. This proves (30), and (31) is an easy consequence of the following fact:
\[ \int_I f(t)(g(t) - \mu)\,\lambda(dt) = \int_I f(t) g(t)\,\lambda(dt) - \mu\int_I f(t)\,\lambda(dt) = \int_I f(t) g(t)\,\lambda(dt). \]

Proposition A.3:
Let $r^* = \rho(a,b,c)$, where $\rho(a,b,c)$ is defined by (2) with $0 \le a < 1$, $0 \le c < b \le 1$, and $a \ne b, c$. The functional $F_g(r^*,a,b,c)$ can be written as $\int_{[p_1,p_3]} f(t) g(t)\,\lambda(dt)$ such that $h(s) = \int_{[p_1,s]} f(t)\,\lambda(dt)$ is zero at $s = p_3$ and is strictly increasing on $(p_1,p_2)$ and strictly decreasing on $(p_2,p_3)$ for some $p_2 \in (p_1,p_3)$, where $p_1 = a \wedge c$, $p_3 = a \vee b$, and $\lambda$ is the Lebesgue measure on $\mathbb{R}$. More specifically, we have:

a) If $0 \le a < c < b \le 1$, then
\[ f(t) = r^* \frac{c-t}{1-t}\, 1\{a \le t \le c\} - \frac{b-t}{1-t}\, 1\{a \le t \le b\}. \tag{33} \]
It is strictly decreasing on $(a,c)$ and strictly increasing on $(c,b)$, and it is positive on $(a,d)$ and negative on $(d,b)$ for some $d \in (a,c)$.

b) If $0 \le c < a < b \le 1$, then
\[ f(t) = r^* \frac{t-c}{1-t}\, 1\{c \le t \le a\} - \frac{b-t}{1-t}\, 1\{a \le t \le b\}. \tag{34} \]
It is strictly increasing on $(c,a)$ and $(a,b)$, and it is positive on $(c,a)$ and negative on $(a,b)$.

c) If $0 \le c < b < a < 1$, then
\[ f(t) = r^* \frac{t-c}{1-t}\, 1\{c \le t \le a\} - \frac{t-b}{1-t}\, 1\{b \le t \le a\}. \tag{35} \]
It is strictly increasing on $(c,b)$ and strictly decreasing on $(b,a)$, and it is positive on $(c,d)$ and negative on $(d,a)$ for some $d \in (b,a)$.

Proof:
It has been shown in the proof of Theorem 1.1 that $h(p_3) = 0$. Other properties of $h$ are easy consequences of the remaining part.

Equations (33), (34), and (35) are obviously true in the almost-everywhere sense. It remains to prove the properties of $f$ in the three cases.

a) For $t \in (a,c)$, it follows from Propositions B.1 and B.5 that
\[ f'(t) = -r^* \frac{1-c}{(1-t)^2} + \frac{1-b}{(1-t)^2} < \frac{-b(1-c) + (1-b)c}{c(1-t)^2} = -\frac{b-c}{c(1-t)^2} < 0, \]
so $f(t)$ is strictly decreasing on $(a,c)$. For $t \in (c,b)$,
\[ f(t) = -\frac{b-t}{1-t} = \frac{1-b}{1-t} - 1, \]
which is clearly strictly increasing on $(c,b)$. By Proposition B.4,
\[ \lim_{t \to a^+} f(t) = f(a) = r^* \frac{c-a}{1-a} - \frac{b-a}{1-a} > 0. \]
It is also clear that $f(b) = 0$. Therefore, $f(t)$ is positive on $(a,d)$ and negative on $(d,b)$ for some $d \in (a,c)$.

b) When $t \in (c,a)$,
\[ f(t) = r^* \frac{t-c}{1-t} = r^* \frac{1-c}{1-t} - r^*, \]
which is strictly increasing. When $t \in (a,b)$,
\[ f(t) = \frac{1-b}{1-t} - 1, \]
which is also strictly increasing. Since $\lim_{t \to c^+} f(t) = f(c) = 0$ and $\lim_{t \to b^-} f(t) = f(b) = 0$, $f(t)$ is positive on $(c,a)$ and negative on $(a,b)$.

c) For $t \in (c,b)$,
\[ f(t) = r^* \frac{1-c}{1-t} - r^*, \]
which is strictly increasing. For $t \in (b,a)$, it follows from Proposition B.5 that
\[ f'(t) = r^* \frac{1-c}{(1-t)^2} - \frac{1-b}{(1-t)^2} < \frac{(1-b) - (1-b)}{(1-t)^2} = 0, \]
so $f(t)$ is strictly decreasing on $(b,a)$. By Proposition B.4,
\[ \lim_{t \to a^-} f(t) = f(a) = r^* \frac{a-c}{1-a} - \frac{a-b}{1-a} < 0. \]
It is also clear that $\lim_{t \to c^+} f(t) = f(c) = 0$ and $\lim_{t \to b^+} f(t) = f(b) > 0$. Therefore, $f(t)$ is positive on $(c,d)$ and negative on $(d,a)$ for some $d \in (b,a)$.

Proposition A.4:
The equation $F_g(r,a,b,c) = 0$ solved for $r$ has a unique positive solution $q = q(g)$ for $g \in \mathcal{M}'$ and fixed $a$, $b$, $c$ with $0 \le a < 1$, $0 \le c < b \le 1$, and $a \ne b, c$. Then $q(g) \le q(1) = \rho(a,b,c)$ for all $g \in \mathcal{M}$ with equality iff $g$ is constant on $(a \wedge c, a \vee b)$, where $\rho(a,b,c)$ is defined by (2). If for some positive real $\mu$,
\[ \|g - \mu\| < \frac{\mu\,\xi_a(c)(1-a)}{|a-c|}, \]
then
\[ q(g) \ge q(1) - \frac{M(1-a)\|g - \mu\|}{\mu\,\xi_a(c)(1-a) - |a-c|\,\|g - \mu\|}, \]
where $\|g\| = \int_0^1 |g(t)|\,dt$, $\xi_a$ is defined by (3), and $M = M(a,b,c)$ is a certain positive real number.

Proof:
The existence and uniqueness of $q(g)$ follow from Proposition A.1 with the facts $F_g(0,a,b,c) < 0$ and $\lim_{r \to +\infty} F_g(r,a,b,c) = +\infty$.

It is clear that $q(1)$, the solution of $F_1(r,a,b,c) = 0$, is $\rho(a,b,c)$. From Propositions A.2 and A.3 it follows that
\[ 0 \le F_g(q(1),a,b,c) \le M(a,b,c)\,\|g - \mu\|. \tag{36} \]
The first inequality of (36) implies that $q(g) \le q(1)$ with equality iff $g$ is constant on $(a \wedge c, a \vee b)$ (Propositions A.1 and A.2). On the other hand,
\begin{align*}
F_g(q(1),a,b,c) &= F_g(q(1),a,b,c) - F_g(q(g),a,b,c) \\
&= (q(1) - q(g)) \int_c^a \frac{t-c}{1-t} g(t)\,dt \\
&\ge (q(1) - q(g)) \left(\int_c^a \frac{t-c}{1-t}\mu\,dt - \int_c^a \frac{t-c}{1-t}|g(t) - \mu|\,dt\right) \\
&\ge (q(1) - q(g)) \left(\mu\int_c^a \frac{t-c}{1-t}\,dt - \frac{|a-c|}{1-a}\int_0^1 |g(t) - \mu|\,dt\right) \\
&= (q(1) - q(g)) \left(\mu\,\xi_a(c) - \frac{|a-c|}{1-a}\|g - \mu\|\right).
\end{align*}
This, combined with (36), completes the proof.

APPENDIX B
PROPERTIES OF $\zeta_t(s)$, $\xi_s(t)$, AND $\rho(a,b,c)$

Proposition B.1:
For the function $\zeta_t(s)$ defined by (4), $\zeta_t'(s) = \frac{t-s}{1-s}$, so that $\zeta_t$ is strictly increasing on $(0,t)$ and strictly decreasing on $(t,1)$. Furthermore, we have
\[ \frac{\zeta_b(b)}{\zeta_c(c)} > \frac{b}{c} \]
for $0 < c < b \le 1$.

Proof:
The first part is obvious, and the second part can be proved by letting $f(t) = \zeta_t(t)/t$ and noting that
\[ f'(t) = -\frac{t + \ln(1-t)}{t^2} > -\frac{t - t}{t^2} = 0 \]
for $0 < t < 1$. Also note that this inequality is equivalent to $\rho(0,b,c) > 1/\rho(1,1-c,1-b)$ implied by (1).

Proposition B.2:
For the function $\xi_s(t)$ defined by (3) with $0 \le s < 1$,
\[ \xi_s(0) = \ln\frac{1}{1-s} - s, \quad \xi_s(s) = 0, \quad\text{and}\quad \xi_s(1) = 1 - s. \]
$\xi_s$ is continuous on $[0,1]$, and it is strictly decreasing on $(0,s)$ and strictly increasing on $(s,1)$.

On the other hand, when $t$ is fixed, $\xi_s(t)$ is strictly decreasing in $s$ for $s \in (0,t)$ and strictly increasing in $s$ for $s \in (t,1)$.

When $s = 1 - e^{-1}$, $\xi_s(0) = \xi_s(1)$, so that for $s \le 1 - e^{-1}$, $\xi_s(t) = \xi_s(0)$ has a unique solution on $(s,1]$, and for $s \ge 1 - e^{-1}$, $\xi_s(t) = \xi_s(1)$ has a unique solution on $[0,s)$.

Proof:
Observe that
\[ \xi_s'(t) = \ln(1-s) - \ln(1-t), \]
which is negative on $(0,s)$ and positive on $(s,1)$. Also note that
\[ \frac{\partial \xi_s(t)}{\partial s} = \frac{s-t}{1-s}, \]
which, as a function of $s$, is negative on $(0,t)$ and positive on $(t,1)$. These two facts prove the first and the second parts, respectively. The last part can be easily proved by noting that $\xi_s(0)$ and $\xi_s(1)$ are strictly increasing and decreasing for $s \in [0,1)$, respectively.

Proposition B.3:
Let $\xi_{s,\downarrow} = \xi_s|_{[0,s]}$ and $\xi_{s,\uparrow} = \xi_s|_{[s,1]}$.

Let $f_{t,\uparrow}(s) = \xi^{-1}_{s,\uparrow}(\xi_s(t))$, where $s \in (t,d)$, $t \in [0,1)$, and $d$ is the unique solution of $\xi_d(t) = \xi_d(1)$ for $d \in (t,1)$. Then $f_{t,\uparrow}(s)$ is strictly increasing in $s$.

Let $f_{t,\downarrow}(s) = \xi^{-1}_{s,\downarrow}(\xi_s(t))$, where $s \in (d,t)$, $t \in (0,1]$, and $d$ is the unique solution of $\xi_d(t) = \xi_d(0)$ for $d \in (0,t)$. Then $f_{t,\downarrow}(s)$ is strictly increasing in $s$.

Proof:
This result is a consequence of Proposition B.2.

The condition $\xi_d(t) = \xi_d(1)$ ensures that
\[ \xi_s(s) < \xi_s(t) < \xi_d(t) = \xi_d(1) < \xi_s(1) \]
when $s \in (t,d)$, so that $f_{t,\uparrow}(s)$ is well defined. For $t < s < s' < d$, we have $\xi_{s'}(t) > \xi_s(t)$, so that
\[ \xi^{-1}_{s',\uparrow}(\xi_{s'}(t)) > \xi^{-1}_{s',\uparrow}(\xi_s(t)) > s'. \]
Similarly,
\[ \xi_s(\xi^{-1}_{s',\uparrow}(\xi_s(t))) > \xi_{s'}(\xi^{-1}_{s',\uparrow}(\xi_s(t))) = \xi_s(t), \]
so that $\xi^{-1}_{s',\uparrow}(\xi_s(t)) > \xi^{-1}_{s,\uparrow}(\xi_s(t))$, and therefore $f_{t,\uparrow}(s) < f_{t,\uparrow}(s')$.

The condition $\xi_d(t) = \xi_d(0)$ ensures that
\[ \xi_s(s) < \xi_s(t) < \xi_d(t) = \xi_d(0) < \xi_s(0) \]
when $s \in (d,t)$, so that $f_{t,\downarrow}(s)$ is well defined. For $d < s < s' < t$, we have $\xi_s(t) > \xi_{s'}(t)$, so that
\[ \xi^{-1}_{s,\downarrow}(\xi_s(t)) < \xi^{-1}_{s,\downarrow}(\xi_{s'}(t)) < s. \]
Similarly,
\[ \xi_{s'}(t) = \xi_s(\xi^{-1}_{s,\downarrow}(\xi_{s'}(t))) < \xi_{s'}(\xi^{-1}_{s,\downarrow}(\xi_{s'}(t))), \]
so that $\xi^{-1}_{s,\downarrow}(\xi_{s'}(t)) < \xi^{-1}_{s',\downarrow}(\xi_{s'}(t))$, and therefore $f_{t,\downarrow}(s) < f_{t,\downarrow}(s')$.

Proposition B.4:
Let $\rho(a,b,c)$ be the function defined by (2). If $0 \le a < c < b \le 1$, then
\[ \rho(a,b,c) > \frac{b-a}{c-a}. \]
If $0 \le c < b < a < 1$, then
\[ \rho(a,b,c) < \frac{a-b}{a-c}. \]

Proof: If $0 \le a < c < b \le 1$, then
\[ \rho(a,b,c) > \frac{\zeta_b(c) - \zeta_b(a)}{\zeta_c(c) - \zeta_c(a)} = \frac{\zeta_b'(t)}{\zeta_c'(t)} = \frac{b-t}{c-t} > \frac{b-a}{c-a}, \]
where $t \in (a,c)$. Similarly, if $0 \le c < b < a < 1$, then
\[ \rho(a,b,c) < \frac{\zeta_b(b) - \zeta_b(a)}{\zeta_c(b) - \zeta_c(a)} = \frac{\zeta_b'(t)}{\zeta_c'(t)} = \frac{t-b}{t-c} < \frac{a-b}{a-c}, \]
where $t \in (b,a)$.

Proposition B.5:
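Let $f(t) = \rho(t,b,c)$, where $0 \le c < b \le 1$. Then
\[ f(0) = \frac{\zeta_b(b)}{\zeta_c(c)}, \quad f(c) = +\infty, \quad f(b) = 0, \quad\text{and}\quad f(1) = \frac{1-b}{1-c}. \]
The function $f$ is continuous on $[0,c)$ and $(c,1]$, and it is strictly increasing on $(0,c)$ and $(b,1)$ and strictly decreasing on $(c,b)$. Let $f_{\uparrow_1} = f|_{[0,c)}$, $f_\downarrow = f|_{(c,b]}$, and $f_{\uparrow_2} = f|_{[b,1]}$. Then, for $s > 0$,
\[ f^{-1}_{\uparrow_1}((s,+\infty)) = \begin{cases} \left(f^{-1}_{\uparrow_1}(s),\ c\right) & \text{if } s \ge f(0), \\ [0,c) & \text{otherwise}, \end{cases} \]
\[ f^{-1}_\downarrow((s,+\infty)) = \left(c,\ f^{-1}_\downarrow(s)\right), \]
\[ f^{-1}_{\uparrow_2}((s,+\infty)) = \begin{cases} \left(f^{-1}_{\uparrow_2}(s),\ 1\right] & \text{if } s < f(1), \\ \emptyset & \text{otherwise}. \end{cases} \]

Proof: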
It is clear that $f$ is continuous on $[0, c)$ and $(c, 1)$. As for $t = 1$,
\begin{align*}
\lim_{t \to 1} f(t) &= \lim_{t \to 1} \frac{\zeta_b(b) - \zeta_b(t)}{\zeta_c(c) - \zeta_c(t)} = \lim_{t \to 1} \frac{\zeta_b'(t)}{\zeta_c'(t)} \\
&= \lim_{t \to 1} \frac{b - t}{c - t} \qquad \text{(Proposition B.1)} \\
&= f(1).
\end{align*}
For the remaining part, it suffices to show that $f'(t)$ is positive on $(0, c) \cup (b, 1)$ and negative on $(c, b)$. We have
\[
f'(t) = \frac{g(t)}{(\zeta_c(c) - \zeta_c(t))^2},
\]
where
\[
g(t) = \zeta_c'(t)(\zeta_b(b) - \zeta_b(t)) - \zeta_b'(t)(\zeta_c(c) - \zeta_c(t)).
\]
If $0 < t < c$, then from Proposition B.1, it follows that
\begin{align*}
g(t) &> \zeta_c'(t)(\zeta_b(c) - \zeta_b(t)) - \zeta_b'(t)(\zeta_c(c) - \zeta_c(t)) \\
&= \zeta_c'(t) \frac{\zeta_b'(t')(\zeta_c(c) - \zeta_c(t))}{\zeta_c'(t')} - \zeta_b'(t)(\zeta_c(c) - \zeta_c(t)) \tag{37} \\
&= (\zeta_c(c) - \zeta_c(t)) \left( \zeta_c'(t) \frac{b - t'}{c - t'} - \zeta_b'(t) \right) \\
&> (\zeta_c(c) - \zeta_c(t)) \left( \zeta_c'(t) \frac{b - t}{c - t} - \zeta_b'(t) \right) = 0,
\end{align*}
where (37) follows from Cauchy's mean value theorem for some $t' \in (t, c)$. If $c < t < b$, then it follows from Proposition B.1 that $\zeta_c'(t) < 0$ and $\zeta_b'(t) > 0$, so that $g(t) < 0$. If $b < t < 1$, then it follows from Proposition B.1 that
\begin{align*}
g(t) &> \zeta_c'(t)(\zeta_b(b) - \zeta_b(t)) - \zeta_b'(t)(\zeta_c(b) - \zeta_c(t)) \\
&= \zeta_c'(t) \frac{\zeta_b'(t')(\zeta_c(b) - \zeta_c(t))}{\zeta_c'(t')} - \zeta_b'(t)(\zeta_c(b) - \zeta_c(t)) \tag{38} \\
&= (\zeta_c(b) - \zeta_c(t)) \left( \zeta_c'(t) \frac{b - t'}{c - t'} - \zeta_b'(t) \right) \\
&> (\zeta_c(b) - \zeta_c(t)) \left( \zeta_c'(t) \frac{b - t}{c - t} - \zeta_b'(t) \right) = 0,
\end{align*}
where (38) follows from Cauchy's mean value theorem for some $t' \in (b, t)$.

Proposition B.6:
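Let $f(t) = \rho(a, t, c)$, where $0 \le a \le 1$, $0 \le c < 1$, and $a \ne c$. Then
\[
f(0) =
\begin{cases}
\dfrac{\xi_a(0)}{\xi_a(c)} & \text{if } 0 \le a < 1, \\
\dfrac{1}{1 - c} & \text{otherwise},
\end{cases}
\]
$f(c) = 1$, $f(a) = 0$, and $f(1) = \xi_a(1) / \xi_a(c)$. The function $f$ is continuous on $[0, 1]$, and it is strictly decreasing on $(0, a)$ and strictly increasing on $(a, 1)$. Let $f_{\downarrow} = f|_{(c, a)}$ and $f_{\uparrow} = f|_{(a \vee c, 1]}$. Then, for $s > 0$,
\[
f_{\downarrow}^{-1}((s, +\infty)) =
\begin{cases}
(c, \xi^{-1}_{a,\downarrow}(s \xi_a(c))) & \text{if } c < a \text{ and } s < 1, \\
\emptyset & \text{otherwise},
\end{cases}
\]
\[
f_{\uparrow}^{-1}((s, +\infty)) =
\begin{cases}
(\xi^{-1}_{a,\uparrow}(s \xi_a(c)) \vee c, 1] & \text{if } s < f(1), \\
\emptyset & \text{otherwise},
\end{cases}
\]
where $\xi_{a,\downarrow}$ and $\xi_{a,\uparrow}$ are defined in Proposition B.3.

Proof: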
Since $f(t) = (1 - t)/(1 - c)$ for $a = 1$, the proposition is clearly true. As for $a < 1$, note that $f(t) = \xi_a(t)/\xi_a(c)$ and use Proposition B.2.

Proposition B.7:
Let $f(t) = \rho(a, b, t)$, where $0 \le a \le 1$, $0 < b \le 1$, and $a \ne b$. Then
\[
f(0) =
\begin{cases}
\dfrac{\xi_a(b)}{\xi_a(0)} & \text{if } 0 \le a < 1, \\
1 - b & \text{otherwise},
\end{cases}
\]
$f(a) = +\infty$, $f(b) = 1$, and $f(1) = \xi_a(b) / \xi_a(1)$. The function $f$ is continuous on $[0, a)$ and $(a, 1]$, and it is strictly increasing on $(0, a)$ and strictly decreasing on $(a, 1)$. Let $f_{\uparrow} = f|_{[0, a \wedge b)}$ and $f_{\downarrow} = f|_{(a, b)}$. Then, for $s > 0$,
\[
f_{\uparrow}^{-1}((s, +\infty)) =
\begin{cases}
(\xi^{-1}_{a,\downarrow}(\xi_a(b)/s), a \wedge b) & \text{if } s \ge f(0), \\
[0, a \wedge b) & \text{otherwise},
\end{cases}
\]
\[
f_{\downarrow}^{-1}((s, +\infty)) =
\begin{cases}
(a, \xi^{-1}_{a,\uparrow}(\xi_a(b)/s)) & \text{if } b > a \text{ and } s > 1, \\
(a, b) & \text{otherwise},
\end{cases}
\]
where $\xi_{a,\downarrow}$ and $\xi_{a,\uparrow}$ are defined in Proposition B.3.

Proof:
Since $f(t) = (1 - b)/(1 - t)$ for $a = 1$, the proposition is clearly true. As for $a < 1$, note that $f(t) = \xi_a(b)/\xi_a(t)$ and use Proposition B.2.

APPENDIX C
THE PROPERTIES OF $\gamma(\kappa)$ AND $D(\kappa)$

Proposition C.1:
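Any channel $\kappa : \mathcal{S} \to \mathcal{X}$ can be decomposed into the following form:
\[
\kappa = \sum_{x \in \mathcal{X}} \lambda_x(\kappa) \left[ \gamma(\kappa) \mathrm{U}_x + (1 - \gamma(\kappa)) \kappa' \right],
\]
where $\mathrm{U}_x$ denotes the deterministic useless channel with constant output $x$, and
\[
\gamma(\kappa) := \sum_{x \in \mathcal{X}} \min_{s \in \mathcal{S}} \kappa(x \mid s) \in [0, 1],
\]
\[
\lambda_x(\kappa) :=
\begin{cases}
\dfrac{\min_{s \in \mathcal{S}} \kappa(x \mid s)}{\gamma(\kappa)} & \text{if } \gamma(\kappa) > 0, \\
\dfrac{1}{|\mathcal{X}|} & \text{otherwise},
\end{cases}
\]
\[
\kappa'(x \mid s) =
\begin{cases}
\dfrac{\kappa(x \mid s) - \min_{s' \in \mathcal{S}} \kappa(x \mid s')}{1 - \gamma(\kappa)} & \text{if } \gamma(\kappa) < 1, \\
\dfrac{1}{|\mathcal{X}|} & \text{otherwise}.
\end{cases}
\]

Sketch of proof: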
The proof is straightforward and only involves simple algebraic manipulations. One thing to note is that $\gamma(\kappa) \le \sum_{x \in \mathcal{X}} \kappa(x \mid s) = 1$, where $s$ is arbitrary.

Since a channel $\kappa : \mathcal{S} \to \mathcal{X}$ can be regarded as an $|\mathcal{S}| \times |\mathcal{X}|$ matrix, the next property of $\gamma(\kappa)$ is given in matrix form.

Proposition C.2:
Let $A$ be an $m \times \ell$ channel matrix and $B$ an $\ell \times n$ deterministic channel matrix. Let $g_j$ be the gap between the least number and the second least number of column $A_{*,j}$ and let $g = \min_{1 \le j \le \ell} g_j$. Then
\[
\gamma(AB) \ge \gamma(A) + (|M| - n)^+ g,
\]
where $M = I(\{1, \ldots, \ell\})$ and $I(j) = \arg\min_i A_{i,j}$. When $|M| \le n$, the lower bound can be attained by choosing $B$ such that $|I(B^{-1}(k))| \le 1$ for every $1 \le k \le n$, where $B$ is understood as a map. (There may be more than one row attaining the minimum value of $A_{*,j}$, in which case it does not matter which row index is assigned to $I(j)$ because $g = 0$.)

Proof:
Since $B$ is deterministic, $AB = C = (C_{*,1} \; C_{*,2} \; \cdots \; C_{*,n})$ with every column $C_{*,k} = \sum_{j \in B^{-1}(k)} A_{*,j}$. Let $I'(k) = \arg\min_i C_{i,k}$. Then the set $M' = I'(\{1, \ldots, n\})$ has at most $\min\{m, n\}$ elements, and hence misses at least $(|M| - \min\{m, n\})^+$ indices in $M$, so that at least $(|M| - \min\{m, n\})^+$ columns of $A$ do not contribute their minimum components to the minimum components of columns of $C$, and therefore
\begin{align*}
\gamma(C) = \sum_{k=1}^{n} C_{I'(k),k} &\ge \sum_{j=1}^{\ell} \min_{1 \le i \le m} A_{i,j} + (|M| - \min\{m, n\})^+ g \\
&= \gamma(A) + (|M| - n)^+ g.
\end{align*}
The remaining part is straightforward.
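As a numerical sanity check of Propositions C.1 and C.2, the following Python sketch computes $\gamma$, the decomposition of Proposition C.1, and the lower bound of Proposition C.2 for one small example, using exact rational arithmetic. The function names (`gamma`, `decompose`, `recompose`, `apply_map`, `lower_bound`), the example matrix `A`, and the encoding of the deterministic channel $B$ as a tuple of output indices are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction as F
from itertools import product


def gamma(K):
    """gamma(K): sum over columns x of the minimum over rows s of K[s][x]."""
    return sum(min(col) for col in zip(*K))


def decompose(K):
    """Return (gamma, lam, K_prime) following the formulas of Proposition C.1."""
    n_x = len(K[0])
    mins = [min(col) for col in zip(*K)]
    g = sum(mins)
    lam = [m / g for m in mins] if g > 0 else [F(1, n_x)] * n_x
    if g < 1:
        K_prime = [[(row[x] - mins[x]) / (1 - g) for x in range(n_x)] for row in K]
    else:
        K_prime = [[F(1, n_x)] * n_x for _ in K]
    return g, lam, K_prime


def recompose(g, lam, K_prime):
    """Evaluate sum_x lam_x [g U_x + (1 - g) K'] entrywise (the lam_x sum to 1)."""
    return [[g * lam[x] + (1 - g) * row[x] for x in range(len(lam))] for row in K_prime]


def apply_map(A, B, n):
    """Compute A B when B is the deterministic channel sending column j to B[j]."""
    C = [[F(0)] * n for _ in A]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            C[i][B[j]] += a
    return C


def lower_bound(A, n):
    """gamma(A) + (|M| - n)^+ g from Proposition C.2 (assumes at least two rows)."""
    cols = [sorted(col) for col in zip(*A)]
    g = min(c[1] - c[0] for c in cols)  # smallest gap between least and second least
    M = {min(range(len(A)), key=lambda i: A[i][j]) for j in range(len(A[0]))}
    return gamma(A) + max(len(M) - n, 0) * g


# Hypothetical 3 x 3 channel matrix: rows are states s, columns are outputs x.
A = [[F(1, 10), F(5, 10), F(4, 10)],
     [F(6, 10), F(1, 10), F(3, 10)],
     [F(5, 10), F(4, 10), F(1, 10)]]

# Proposition C.1: the decomposition reproduces A exactly.
assert recompose(*decompose(A)) == A

# Proposition C.2: brute force over all deterministic maps onto n = 2 outputs.
n = 2
bound = lower_bound(A, n)
values = [gamma(apply_map(A, B, n)) for B in product(range(n), repeat=len(A[0]))]
assert all(v >= bound for v in values)
print(gamma(A), bound, min(values))
```

For this particular matrix the script prints `3/10 1/2 1/2`, so the bound happens to be met with equality by one of the maps. Exact fractions are used instead of floats so that the comparison with the bound is not affected by rounding.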
Proposition C.3:
Let
\[
D(\kappa) := \sum_{s \in \mathcal{S}} p_S(s) \, \mathrm{D}\!\left( \kappa(\cdot \mid s) \otimes p_{Y|X,S=s} \,\middle\|\, p_{Y|S=s} \right),
\]
where $\kappa$ is a channel from $\mathcal{S}$ to $\mathcal{X}$,
\[
(\kappa(\cdot \mid s) \otimes p_{Y|X,S=s})(y) = \sum_{x \in \mathcal{X}} \kappa(x \mid s) \, p_{Y|X,S}(y \mid x, s),
\]
and $p_{Y|S}(y \mid s) = \sum_{x \in \mathcal{X}} p_X(x) \, p_{Y|X,S}(y \mid x, s)$. If $\mathcal{X} = \{0, 1\}$, $D(\mathrm{U}_0) = D(\mathrm{U}_1) = C$, and $\gamma(\kappa) \ge T(p_X(0)) \vee T(p_X(1))$, then $D(\kappa) \le C$, where
\begin{align*}
T(a) = \min \{ t \in [0, 1] : {} & \mathrm{D}(z(t) \,\|\, z(a)) \le \mathrm{D}(z(1) \,\|\, z(a)) \text{ for all } \alpha, \beta \in \mathcal{P}_{|\mathcal{Y}|}, \\
& \text{where } z(t) = t\alpha + (1 - t)\beta \} \tag{39}
\end{align*}
for $a \in (0, 1)$.

Proof:
We denote by $\mathcal{A}$ the set on which the minimum is taken in (39). We will show that it is closed, so that $T(a)$ is well defined. The set $\mathcal{A}$ can be rewritten as $\mathcal{A} = \bigcap_{\alpha, \beta \in \mathcal{P}_{|\mathcal{Y}|}} \mathcal{A}(\alpha, \beta)$ with
\[
\mathcal{A}(\alpha, \beta) = \{ t \in [0, 1] : \mathrm{D}(z(t) \,\|\, z(a)) \le \mathrm{D}(z(1) \,\|\, z(a)) \}.
\]
Since $\mathrm{D}(z(t) \,\|\, z(a))$, as a function of $t$, is continuous, $\mathcal{A}(\alpha, \beta)$ is closed for all $\alpha, \beta$, and hence the intersection $\mathcal{A}$ is also closed.

By the convexity of the Kullback-Leibler divergence (or the log-sum inequality), it is easy to see that $D(\kappa)$ is convex. Then, by Proposition C.1,
\[
D(\kappa) \le \sum_{x \in \mathcal{X}} \lambda_x D(\gamma \mathrm{U}_x + (1 - \gamma) \kappa'),
\]
where $\lambda_x = \lambda_x(\kappa)$ and $\gamma = \gamma(\kappa)$. For every $x \in \mathcal{X}$,
\begin{align*}
D(\gamma \mathrm{U}_x + (1 - \gamma) \kappa')
&= \sum_{s \in \mathcal{S}} p_S(s) \, \mathrm{D}\!\left( \gamma p_{Y|X=x,S=s} + (1 - \gamma)(\kappa'(\cdot \mid s) \otimes p_{Y|X,S=s}) \,\middle\|\, p_{Y|S=s} \right) \\
&= \sum_{s \in \mathcal{S}} p_S(s) \, \mathrm{D}\!\left( \sum_{x' \in \mathcal{X}} \lambda'_{x'} \, p_{Y|X=x',S=s} \,\middle\|\, p_{Y|S=s} \right),
\end{align*}
where $\lambda'_x \ge \gamma$. Since $\gamma \ge T(p_X(0)) \vee T(p_X(1))$, it follows from (39) that $D(\gamma \mathrm{U}_x + (1 - \gamma) \kappa') \le D(\mathrm{U}_x) = C$. Therefore, $D(\kappa) \le \sum_{x \in \mathcal{X}} \lambda_x C = C$.

Proposition C.4:
For $a > e^{-1}$, the function $T(a)$ defined by (39) can be computed by
\[
T(a) = 1 - \xi^{-1}_{1-a,\uparrow}(\xi_{1-a}(0)),
\]
which is strictly increasing in $(e^{-1}, 1)$.

Proof:
Since $a > e^{-1}$, $\xi_{1-a}(0) < \xi_{1-a}(1)$ (Proposition B.2). From Theorems 1.1 and 1.2 with $b = 1$, it follows that
\begin{align*}
T(a) &= \inf \{ c : 1/\rho(1 - a, 1 - c, 0) \ge 1 \} \\
&= \inf \{ c : \rho(1 - a, 1 - c, 0) \le 1 \} \\
&= \inf \left( 1 - [0, \xi^{-1}_{1-a,\uparrow}(\xi_{1-a}(0))] \right) \tag{40} \\
&= \inf \, [1 - \xi^{-1}_{1-a,\uparrow}(\xi_{1-a}(0)), 1] \\
&= 1 - \xi^{-1}_{1-a,\uparrow}(\xi_{1-a}(0)),
\end{align*}
where (40) follows from Proposition B.6. It is also clear that $T(a)$ is strictly increasing in $(e^{-1}, 1)$ (Proposition B.3).

ACKNOWLEDGMENT
This work was supported in part by the National Natural Science Foundation of China under Grant 61571398 and in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under a Discovery Grant.

REFERENCES

[1] R. Xu, J. Chen, T. Weissman, and J.-K. Zhang, “When is noisy state information at the encoder as useless as no information or as good as noise-free state?” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 960–974, Feb. 2017.
[2] C. E. Shannon, “Channels with side information at the transmitter,” IBM J. Res. Develop., vol. 2, no. 4, pp. 289–293, Oct. 1958.
[3] G. Caire and S. Shamai, “On the capacity of some channels with channel state information,” IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 2007–2019, Sep. 1999.
[4] A. A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, New York: Cambridge University Press, 2011.
[5] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[6] N. Shulman and M. Feder, “The Uniform Distribution as a Universal Prior,”