A Geometric Property of Relative Entropy and the Universal Threshold Phenomenon for Binary-Input Channels with Noisy State Information at the Encoder

Abstract

Tight lower and upper bounds on the ratio of relative entropies of two probability distributions with respect to a common third one are established, where the three distributions are collinear in the standard $(n-1)$-simplex. These bounds are leveraged to analyze the capacity of an arbitrary binary-input channel with noisy causal state information (provided by a side channel) at the encoder and perfect state information at the decoder, and in particular to determine the exact universal threshold on the noise measure of the side channel, above which the capacity is the same as that with no encoder side information.



Shengtian Yang∗† and Jun Chen†
∗ School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
† Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada

I. INTRODUCTION

It is shown in [1, Lemma 1] that for any binary-input channel with noisy causal state information (provided by a side channel) at the encoder and perfect state information at the decoder, if the side channel is a generalized erasure channel and the erasure probability is greater than or equal to $1 - e^{-1}$, then the capacity is the same as that with no encoder side information. In other words, $1 - e^{-1}$ is a universal upper bound on the erasure probability threshold, which does not depend on the characteristics of the binary-input channel and the state distribution. However, as is noted in [1, Footnote 2], this bound is not tight, so determining the exact universal erasure probability threshold remains an interesting open problem. It is worth mentioning that, with the erasure probability replaced by a suitably defined noise measure, this universal threshold holds for all side channels (see [1, Theorem 3] and (10)).

We shall settle this open problem by characterizing a certain geometric property of relative entropy (also called the Kullback-Leibler divergence). Throughout this paper, all logarithms are base-$e$. The standard $(n-1)$-simplex is denoted by $\mathcal{P}_n$. The set of all maps from $A$ to $B$ is denoted by the power set $B^A$. The support set of a map $f$ is denoted by $\operatorname{supp}(f)$. The minimum and the maximum of $x$ and $y$ are denoted by $x \wedge y$ and $x \vee y$, respectively, and $(x)^+ := x \vee 0$.

The contributions of this work are summarized in the following theorems. Theorems 1.1 and 1.2 give tight lower and upper bounds (1) on the ratio of relative entropies of two probability distributions with respect to a common third one, where the three distributions are collinear in $\mathcal{P}_n$. Theorem 1.3 determines the exact universal erasure probability threshold and, more generally, the exact universal threshold (11) on the noise measure (10) of an arbitrary side channel.

Theorem 1.1:

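Given $\alpha, \beta \in \mathcal{P}_n$, we define $z(t) = t\alpha + (1-t)\beta$ for $t \in [0,1]$. For $0 \le a \le 1$ and $0 \le c < b \le 1$, we define $u = z(a)$, $v = z(b)$, and $w = z(c)$. Suppose $D(v\|u) = r\,D(w\|u)$, where $D(v\|u)$ and $D(w\|u)$ are both finite and positive (which implies $\alpha \ne \beta$, $a \ne b$, and $a \ne c$). Then
\[ \frac{1}{\rho(1-a, 1-c, 1-b)} < r < \rho(a, b, c), \tag{1} \]
where
\[ \rho(a,b,c) := \begin{cases} \dfrac{\xi_a(b)}{\xi_a(c)} & \text{if } 0 \le a < 1, \\[1ex] \dfrac{1-b}{1-c} & \text{otherwise}, \end{cases} \tag{2} \]
\[ \xi_s(t) := \zeta_t(t) - \zeta_t(s), \tag{3} \]
\[ \zeta_t(s) := s + (1-t)\ln(1-s). \tag{4} \]
Equivalently,

a) For fixed $r$, $b$, and $c$,
\[ a \in I_{0,\downarrow} \cup I_{1,\uparrow} \cup I_{2,\uparrow}, \tag{5} \]
where
\[ I_{0,\downarrow} = \left(1 - \rho^{-1}_{\downarrow,1-c,1-b}(1/r),\ \rho^{-1}_{\downarrow,b,c}(r)\right), \]
\[ I_{1,\uparrow} = \begin{cases} \left(\rho^{-1}_{\uparrow 1,b,c}(r),\ 1 - \rho^{-1}_{\uparrow 2,1-c,1-b}(1/r)\right) & \text{if } r \ge \frac{\zeta_b(b)}{\zeta_c(c)}, \\ \left[0,\ 1 - \rho^{-1}_{\uparrow 2,1-c,1-b}(1/r)\right) & \text{if } \frac{b}{c} < r < \frac{\zeta_b(b)}{\zeta_c(c)}, \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ I_{2,\uparrow} = \begin{cases} \left(\rho^{-1}_{\uparrow 2,b,c}(r),\ 1 - \rho^{-1}_{\uparrow 1,1-c,1-b}(1/r)\right) & \text{if } 0 < r \le \frac{\zeta_{1-b}(1-b)}{\zeta_{1-c}(1-c)}, \\ \left(\rho^{-1}_{\uparrow 2,b,c}(r),\ 1\right] & \text{if } \frac{\zeta_{1-b}(1-b)}{\zeta_{1-c}(1-c)} < r < \frac{1-b}{1-c}, \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ \rho_{\uparrow 1,b,c} = \rho_{\cdot,b,c}|_{[0,c)}, \qquad \rho_{\uparrow 2,b,c} = \rho_{\cdot,b,c}|_{[b,1]}, \tag{6a} \]
\[ \rho_{\downarrow,b,c} = \rho_{\cdot,b,c}|_{(c,b]}, \tag{6b} \]
where $\rho_{\cdot,b,c}$ denotes the function $\rho$ of the first argument (with other arguments fixed).

b) For fixed $r$, $a$, and $c$,
\[ b \in I_1(r,a,c) \cup I_2(r,a,c), \tag{7} \]
where
\[ I_1(r,a,c) = \begin{cases} \left(1 - \xi^{-1}_{1-a,\uparrow}(r\xi_{1-a}(1-c)),\ \xi^{-1}_{a,\downarrow}(r\xi_a(c))\right) & \text{if } c < a \text{ and } r < 1, \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ I_2(r,a,c) = \begin{cases} \left(\xi^{-1}_{a,\uparrow}(r\xi_a(c)) \vee c,\ 1 - \xi^{-1}_{1-a,\downarrow}(r\xi_{1-a}(1-c))\right) & \text{if } r \le \frac{1}{\rho(1-a,1-c,0)}, \\ \left(\xi^{-1}_{a,\uparrow}(r\xi_a(c)) \vee c,\ 1\right] & \text{if } \frac{1}{\rho(1-a,1-c,0)} < r < \rho(a,1,c), \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ \xi_{a,\downarrow} = \xi_a|_{[0,a]}, \qquad \xi_{a,\uparrow} = \xi_a|_{[a,1]}. \tag{8} \]

c) For fixed $r$, $a$, and $b$,
\[ c \in \left(1 - I_1(1/r, 1-a, 1-b)\right) \cup \left(1 - I_2(1/r, 1-a, 1-b)\right), \tag{9} \]
where $1 - A := \{1 - x : x \in A\}$.

Theorem 1.2: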

Given $0 \le a \le 1$, $0 \le c < b \le 1$, and $a \ne b, c$, we define $u = z(a)$, $v = z(b)$, and $w = z(c)$, where $z(t) = t\alpha + (1-t)\beta$ and $\alpha, \beta \in \mathcal{P}_n$. Then
\[ \rho(a,b,c) = \sup_{(\alpha,\beta) \in Q} \frac{D(v\|u)}{D(w\|u)}, \]
where $Q$ is the set of all pairs $(\alpha, \beta)$ such that $D(v\|u)$ and $D(w\|u)$ are finite and positive.

In particular, if $\alpha = (1 - \delta f(\delta), \delta f(\delta), 0, \ldots, 0)$ and $\beta = (1 - \delta, \delta, 0, \ldots, 0)$, where $\delta \in (0,1)$, $0 \le f(\delta) < 1$, and $\lim_{\delta \to 0^+} f(\delta) = 0$, then
\[ \lim_{\delta \to 0} \frac{D(v\|u)}{D(w\|u)} = \rho(a,b,c). \]

Theorem 1.3:

Let $p_{Y|X,S}$ be a memoryless channel with input $X$, output $Y$, and state $S$ distributed according to $p_S$. The channel state $S$ is known at the decoder, and a noisy state observation $\tilde{S}$, generated by $S$ through side channel $p_{\tilde{S}|S}$, is causally available at the encoder. Here, $X$, $Y$, $S$, $\tilde{S}$ are over finite alphabets $\mathcal{X} = \{0,1\}$, $\mathcal{Y}$, $\mathcal{S}$, and $\tilde{\mathcal{S}}$, respectively.

a) If
\[ \gamma(p_{\tilde{S}|S}) := \sum_{\tilde{s} \in \tilde{\mathcal{S}}} \min_{s \in \mathcal{S}} p_{\tilde{S}|S}(\tilde{s}|s) \tag{10} \]
\[ \ge T := 1 - \xi^{-1}_{e^{-1},\uparrow}\left(\xi_{e^{-1}}(0)\right) \approx 0.3252, \tag{11} \]
then
\[ C(p_{Y|X,S}, p_S, p_{\tilde{S}|S}) = C(p_{Y|X,S}, p_S), \tag{12} \]
where $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ and $C(p_{Y|X,S}, p_S)$ denote the capacities of channel $p_{Y|X,S}$ with $\tilde{S}$ causally available and unavailable at the encoder, respectively.

b) Suppose $\mathcal{Y} = \mathcal{S} = \{0,1\}$ and $\tilde{\mathcal{S}} = \{0,1,2\}$. The channel $p_{Y|X,S}$ with state $S$ is given by
\[ p_{Y|X,S=0} = \begin{bmatrix} p_{Y|X,S}(0|0,0) & p_{Y|X,S}(1|0,0) \\ p_{Y|X,S}(0|1,0) & p_{Y|X,S}(1|1,0) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1-\delta & \delta \end{bmatrix}, \tag{13a} \]
\[ p_{Y|X,S=1} = \begin{bmatrix} 1-\delta & \delta \\ 1 & 0 \end{bmatrix}, \tag{13b} \]
\[ p_S = (p_S(0), p_S(1)) = (1-\delta, \delta), \tag{13c} \]
where $\delta \in (0, 0.5)$. For any $\iota \in (0, T)$, if
\[ p_{\tilde{S}|S} = \begin{bmatrix} p_{\tilde{S}|S}(0|0) & p_{\tilde{S}|S}(1|0) & p_{\tilde{S}|S}(2|0) \\ p_{\tilde{S}|S}(0|1) & p_{\tilde{S}|S}(1|1) & p_{\tilde{S}|S}(2|1) \end{bmatrix} = \begin{bmatrix} 1-\epsilon & 0 & \epsilon \\ 0 & 1-\epsilon & \epsilon \end{bmatrix} \]
with $\epsilon = T - \iota$ (so that $\gamma(p_{\tilde{S}|S}) = T - \iota$), then
\[ C(p_{Y|X,S}, p_S, p_{\tilde{S}|S}) > C(p_{Y|X,S}, p_S) \tag{14} \]
for sufficiently small $\delta$.

Remark 1.4:

The capacity $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ is given by
\[ C(p_{Y|X,S}, p_S, p_{\tilde{S}|S}) = \max_{p_U} I(U; Y, S) = \max_{p_U} I(U; Y | S) \]
([2], [3, eq. (3)]), where $U$ is a random variable over $\mathcal{U} = \mathcal{X}^{\tilde{\mathcal{S}}}$ and satisfies
\[ p_{U,X,Y,S,\tilde{S}}(u,x,y,s,\tilde{s}) = p_U(u)\, p_S(s)\, p_{\tilde{S}|S}(\tilde{s}|s)\, 1\{x = u(\tilde{s})\}\, p_{Y|X,S}(y|x,s) \tag{15} \]
and $|\operatorname{supp}(p_U)| \le \min\{(|\mathcal{X}| - 1)|\tilde{\mathcal{S}}| + 1,\ |\mathcal{S}||\mathcal{Y}|\}$ (which is optional, see [4, Theorem 7.2]). The capacity $C(p_{Y|X,S}, p_S)$ is given by
\[ C(p_{Y|X,S}, p_S) = \max_{p_X} I(X; Y, S) = \max_{p_X} I(X; Y | S) \]
([4, eq. (7.2)]).

A plot of $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ against $\epsilon$ for $\epsilon \in [0,1]$ is given in Fig. 1, where the channel $p_{Y|X,S}$ with state $S$ is given by (13) for a fixed small $\delta$. The erasure probability threshold in this example is very close to the universal threshold $T$ given by (11).

The rest of this paper is organized as follows. The proofs of Theorems 1.1 and 1.2 are presented in Section II. The proof of Theorem 1.3 is given in Section III. Section IV contains some concluding remarks.

II. PROOFS OF THEOREMS 1.1 AND 1.2

Proof of Theorem 1.1:

For any $n$-dimensional probability distribution $p$ such that $p \ll z(1/2)$, we define $f_p(t) = D(p\|z(t))$. With no loss of generality, we assume that all components of $z(1/2)$ are nonzero. Then
\[ f_p'(t) = -\sum_{i=1}^n p_i \frac{\alpha_i - \beta_i}{z_i(t)} \quad \text{and} \quad f_p''(t) = \sum_{i=1}^n p_i \frac{(\alpha_i - \beta_i)^2}{(z_i(t))^2}, \]
where $0 < t < 1$.

[Fig. 1. A plot of $C(p_{Y|X,S}, p_S, p_{\tilde{S}|S})$ against $\epsilon$ for $\epsilon \in [0,1]$, where $p_{Y|X,S}$ and $p_S$ are given by (13) with a fixed $\delta$. Horizontal axis: $\epsilon$. Marked point: $(0.32087, 0.0036828)$.]

The condition $D(v\|u) = r\,D(w\|u)$ can be rewritten as
\[ \int_b^a f_v'(t)\,dt = r \int_c^a f_w'(t)\,dt. \]
Since for $0 < t < 1$,
\begin{align*}
(t-b) f_\alpha'(t) + (1-t) f_v'(t)
&= -\sum_{i=1}^n \frac{\alpha_i - \beta_i}{z_i(t)} \left[(t-b)\alpha_i + (1-t)v_i\right] \\
&= -\sum_{i=1}^n \frac{\alpha_i - \beta_i}{z_i(t)} \left\{(t-b)\alpha_i + (1-t)\left[b\alpha_i + (1-b)\beta_i\right]\right\} \\
&= -\sum_{i=1}^n \frac{\alpha_i - \beta_i}{z_i(t)} \left[t\alpha_i + (1-t)\beta_i\right](1-b) \\
&= -(1-b) \sum_{i=1}^n (\alpha_i - \beta_i) = 0
\end{align*}
and similarly $(t-c) f_\alpha'(t) + (1-t) f_w'(t) = 0$, we have
\[ \int_b^a \frac{b-t}{1-t} f_\alpha'(t)\,dt = r \int_c^a \frac{c-t}{1-t} f_\alpha'(t)\,dt \]
or
\[ F_g(r,a,b,c) = \int_b^a \frac{b-t}{1-t}\, g(t)\,dt - r \int_c^a \frac{c-t}{1-t}\, g(t)\,dt = 0, \tag{16} \]
where $g(t) = -f_\alpha'(t)$. Since the functions $\frac{b-t}{1-t}$ and $\frac{c-t}{1-t}$ are not integrable on $(b,1)$ and $(c,1)$, respectively, we assume that $a < 1$, and the case of $a = 1$ will have to be considered separately. Since $g'(t) = -f_\alpha''(t)$ is negative on $(0,1)$ and $\lim_{t \to 1^-} g(t) \ge \sum_{i=1}^n (\alpha_i - \beta_i) = 0$, $g(t)$ is strictly decreasing and positive on $(0,1)$. It is also bounded if $g(0)$ is finite.
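As a numerical sanity check of the derivation so far, the following Python sketch evaluates identity (16) and the bound (1) on randomly drawn instances. It is only an illustration under stated assumptions: the helper names (`zeta`, `xi`, `rho`, `mix`, `kl`, `simpson`, `check_instance`, `random_instance`) are hypothetical, the integrals are approximated by a composite Simpson rule, and the sampler keeps $a$, $b$, $c$ and all components of $\alpha$, $\beta$ away from the boundary so that every quantity is finite.

```python
import math
import random


def zeta(t, s):
    """zeta_t(s) = s + (1 - t) ln(1 - s), as in (4); needs s < 1."""
    return s + (1.0 - t) * math.log(1.0 - s)


def xi(s, t):
    """xi_s(t) = zeta_t(t) - zeta_t(s), as in (3)."""
    return zeta(t, t) - zeta(t, s)


def rho(a, b, c):
    """rho(a, b, c) as in (2)."""
    if a < 1.0:
        return xi(a, b) / xi(a, c)
    return (1.0 - b) / (1.0 - c)


def mix(alpha, beta, t):
    """z(t) = t * alpha + (1 - t) * beta."""
    return [t * x + (1.0 - t) * y for x, y in zip(alpha, beta)]


def kl(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule for the signed integral from lo to hi (n even)."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(lo + k * h)
    return total * h / 3.0


def check_instance(alpha, beta, a, b, c):
    """Return (r, F_g(r, a, b, c), lower bound, upper bound) for one instance."""
    u, v, w = (mix(alpha, beta, t) for t in (a, b, c))
    r = kl(v, u) / kl(w, u)

    def g(t):
        # g(t) = -f'_alpha(t) = sum_i alpha_i (alpha_i - beta_i) / z_i(t)
        z = mix(alpha, beta, t)
        return sum(x * (x - y) / zi for x, y, zi in zip(alpha, beta, z))

    big_f = (simpson(lambda t: (b - t) / (1.0 - t) * g(t), b, a)
             - r * simpson(lambda t: (c - t) / (1.0 - t) * g(t), c, a))
    return r, big_f, 1.0 / rho(1.0 - a, 1.0 - c, 1.0 - b), rho(a, b, c)


def random_instance(rng, n=4):
    """Hypothetical sampler: interior points of the simplex and of [0, 1]."""
    def simplex_point():
        raw = [0.1 + rng.random() for _ in range(n)]
        return [x / sum(raw) for x in raw]

    while True:
        a, b, c = (rng.uniform(0.05, 0.9) for _ in range(3))
        if b - c > 0.05 and abs(a - b) > 0.05 and abs(a - c) > 0.05:
            return simplex_point(), simplex_point(), a, b, c
```

For example, `check_instance(*random_instance(random.Random(0)))` returns the ratio $r$, the quadrature value of $F_g$ (expected to be close to zero), and the two bounds of (1).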
If however $\lim_{t \to 0} g(t) = +\infty$ (which implies that $D(\alpha\|\beta) = +\infty$ and $a \ne 0$), then we define
\[ \tilde{g}(t) = \begin{cases} g(t) & \text{if } t \ge \epsilon, \\ g(\epsilon) & \text{otherwise}, \end{cases} \]
where $\epsilon$ is a positive number less than all positive numbers in $\{a, b, c\}$. It follows from (16) that
\[ F_{\tilde{g}}(r,a,b,c) \le 0 \tag{17} \]
in all cases, including the case $a > c = 0$. Observing that $g$ or $\tilde{g}$ is positive, bounded, continuous, and strictly decreasing on $(0,1)$, we denote the set of all such functions by $\mathcal{M}$.

By (4), $\zeta_t(s) = \int_0^s \frac{t - t'}{1 - t'}\,dt'$, so that (16) with $g(t) = 1$ gives
\[ \zeta_b(a) - \zeta_b(b) - r\zeta_c(a) + r\zeta_c(c) = 0. \]
It is clear that $r^* = \frac{\zeta_b(b) - \zeta_b(a)}{\zeta_c(c) - \zeta_c(a)}$ is the unique solution of this equation, and hence $r < r^*$ (Propositions A.1 and A.4).

In case $a = 1$, we have $c < b < a$, so that
\[ D(v\|u) = D\left(\frac{b-c}{a-c}\,u + \frac{a-b}{a-c}\,w \,\Big\|\, u\right) < \frac{1-b}{1-c}\, D(w\|u), \]
and therefore $r < (1-b)/(1-c)$.

The above arguments prove the second inequality of (1). The first inequality can be obtained by exchanging $\alpha$ and $\beta$ with $1-a$, $1-c$, $1-b$, and $1/r$ in place of $a$, $b$, $c$, and $r$, respectively.

We have established the main part of the theorem. The three equivalent parts are easy consequences of Propositions B.1, B.5, B.6, and B.7.

a) If we fix $r$, $b$, and $c$, then from (1) it follows that
\[ \rho(a,b,c) > r, \tag{18a} \]
\[ \rho(1-a, 1-c, 1-b) > \frac{1}{r}, \tag{18b} \]
which combined with Proposition B.5 yields
\[ a \in \rho^{-1}_{\uparrow 1,b,c}((r,+\infty)) \cup \rho^{-1}_{\downarrow,b,c}((r,+\infty)) \cup \rho^{-1}_{\uparrow 2,b,c}((r,+\infty)) \]
and
\[ a \in \left(1 - \rho^{-1}_{\uparrow 1,1-c,1-b}((1/r,+\infty))\right) \cup \left(1 - \rho^{-1}_{\downarrow,1-c,1-b}((1/r,+\infty))\right) \cup \left(1 - \rho^{-1}_{\uparrow 2,1-c,1-b}((1/r,+\infty))\right). \]
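Since
\[ \rho^{-1}_{\uparrow 1,b,c}((r,+\infty)) = \begin{cases} \left(\rho^{-1}_{\uparrow 1,b,c}(r),\ c\right) & \text{if } r \ge \frac{\zeta_b(b)}{\zeta_c(c)}, \\ [0, c) & \text{otherwise}, \end{cases} \]
\[ [0,c) \supseteq 1 - \rho^{-1}_{\uparrow 2,1-c,1-b}((1/r,+\infty)) = \begin{cases} \left[0,\ 1 - \rho^{-1}_{\uparrow 2,1-c,1-b}(1/r)\right) & \text{if } r > \frac{b}{c}, \\ \emptyset & \text{otherwise}, \end{cases} \]
and $\zeta_b(b)/\zeta_c(c) > b/c$ (Proposition B.1), we have
\[ \rho^{-1}_{\uparrow 1,b,c}((r,+\infty)) \cap \left(1 - \rho^{-1}_{\uparrow 2,1-c,1-b}((1/r,+\infty))\right) = I_{1,\uparrow}. \tag{19} \]
Similarly, since
\[ (b,1] \supseteq \rho^{-1}_{\uparrow 2,b,c}((r,+\infty)) = \begin{cases} \left(\rho^{-1}_{\uparrow 2,b,c}(r),\ 1\right] & \text{if } r < \frac{1-b}{1-c}, \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ (b,1] \supseteq 1 - \rho^{-1}_{\uparrow 1,1-c,1-b}((1/r,+\infty)) = \begin{cases} \left(b,\ 1 - \rho^{-1}_{\uparrow 1,1-c,1-b}(1/r)\right) & \text{if } r \le \frac{\zeta_{1-b}(1-b)}{\zeta_{1-c}(1-c)}, \\ (b, 1] & \text{otherwise}, \end{cases} \]
and $\zeta_{1-b}(1-b)/\zeta_{1-c}(1-c) < (1-b)/(1-c)$ (Proposition B.1), we have
\[ \rho^{-1}_{\uparrow 2,b,c}((r,+\infty)) \cap \left(1 - \rho^{-1}_{\uparrow 1,1-c,1-b}((1/r,+\infty))\right) = I_{2,\uparrow}. \tag{20} \]
It is also clear that
\[ \rho^{-1}_{\downarrow,b,c}((r,+\infty)) \cap \left(1 - \rho^{-1}_{\downarrow,1-c,1-b}((1/r,+\infty))\right) = \left(c,\ \rho^{-1}_{\downarrow,b,c}(r)\right) \cap \left(1 - \rho^{-1}_{\downarrow,1-c,1-b}(1/r),\ b\right) = I_{0,\downarrow}. \tag{21} \]
The equations (19), (20), and (21) together yield (5).

b) If we fix $r$, $a$, and $c$, then (18) with Propositions B.6 and B.7 yields
\[ b \in \rho^{-1}_{a,\downarrow,c}((r,+\infty)) \cup \rho^{-1}_{a,\uparrow,c}((r,+\infty)) \]
and
\[ b \in \left(1 - \rho^{-1}_{1-a,1-c,\uparrow}((1/r,+\infty))\right) \cup \left(1 - \rho^{-1}_{1-a,1-c,\downarrow}((1/r,+\infty))\right), \]
where
\[ \rho_{a,\downarrow,c} = \rho_{a,\cdot,c}|_{(c,a)}, \quad \rho_{a,\uparrow,c} = \rho_{a,\cdot,c}|_{(a \vee c,1]}, \quad \rho_{a,b,\uparrow} = \rho_{a,b,\cdot}|_{[0,a \wedge b)}, \quad \rho_{a,b,\downarrow} = \rho_{a,b,\cdot}|_{(a,b)}. \]
Since
\[ \rho^{-1}_{a,\downarrow,c}((r,+\infty)) = \begin{cases} \left(c,\ \xi^{-1}_{a,\downarrow}(r\xi_a(c))\right) & \text{if } c < a \text{ and } r < 1, \\ \emptyset & \text{otherwise}, \end{cases} \]
and
\[ (c,a) \supseteq 1 - \rho^{-1}_{1-a,1-c,\downarrow}((1/r,+\infty)) = \begin{cases} \left(1 - \xi^{-1}_{1-a,\uparrow}(r\xi_{1-a}(1-c)),\ a\right) & \text{if } c < a \text{ and } r < 1, \\ (c,a) & \text{otherwise}, \end{cases} \]
we have
\[ \rho^{-1}_{a,\downarrow,c}((r,+\infty)) \cap \left(1 - \rho^{-1}_{1-a,1-c,\downarrow}((1/r,+\infty))\right) = I_1(r,a,c). \tag{22} \]
Since
\[ (a \vee c, 1] \supseteq \rho^{-1}_{a,\uparrow,c}((r,+\infty)) = \begin{cases} \left(\xi^{-1}_{a,\uparrow}(r\xi_a(c)) \vee c,\ 1\right] & \text{if } r < \rho(a,1,c), \\ \emptyset & \text{otherwise}, \end{cases} \]
\[ (a \vee c, 1] \supseteq 1 - \rho^{-1}_{1-a,1-c,\uparrow}((1/r,+\infty)) = \begin{cases} \left(a \vee c,\ 1 - \xi^{-1}_{1-a,\downarrow}(r\xi_{1-a}(1-c))\right) & \text{if } r \le \frac{1}{\rho(1-a,1-c,0)}, \\ (a \vee c, 1] & \text{otherwise}, \end{cases} \]
and $1/\rho(1-a,1-c,0) < \rho(a,1,c)$ implied by (1), we have
\[ \rho^{-1}_{a,\uparrow,c}((r,+\infty)) \cap \left(1 - \rho^{-1}_{1-a,1-c,\uparrow}((1/r,+\infty))\right) = I_2(r,a,c). \tag{23} \]
Equation (7) then follows from (22) and (23).

c) By symmetry, Part (c) is an easy consequence of Part (b) with $1/r$, $1-a$, and $1-b$ in place of $r$, $a$, and $c$, respectively.

Proof of Theorem 1.2: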

Thanks to Theorem 1.1, it suffices to prove the second part. We first assume that $a \ne 1$. By the proof of Theorem 1.1,
\[ g(t) = -f_\alpha'(t) = \frac{\delta(1-\epsilon)(1-\delta\epsilon)}{(1-\delta\epsilon)t + (1-\delta)(1-t)} - \frac{\delta^2\epsilon(1-\epsilon)}{\delta\epsilon t + \delta(1-t)}, \]
where $\epsilon = f(\delta)$. It is clear that $g(0) = \frac{\delta(1-\epsilon)^2}{1-\delta}$, so that
\[ |g(0) - \delta(1-\epsilon)| = \frac{\delta(1-\epsilon)|\delta - \epsilon|}{1-\delta} \le \frac{\delta(1-\epsilon)(\delta \vee \epsilon)}{1-\delta}. \]
Furthermore,
\[ \frac{g(1 - \epsilon^{1/2})}{\delta(1-\epsilon)} \ge \frac{1-\delta\epsilon}{1 - \delta\epsilon + \epsilon^{1/2}} - \frac{\delta\epsilon}{\delta\epsilon^{1/2}} = 1 - \epsilon^{1/2} - \frac{\epsilon^{1/2}}{1 - \delta\epsilon + \epsilon^{1/2}} \ge 1 - 3\epsilon^{1/2} \]
for $\delta$ sufficiently small. Since $g(t)$ is positive and strictly decreasing,
\[ \int_0^1 |g(t) - \delta(1-\epsilon)|\,dt \le \int_0^{1-\epsilon^{1/2}} \delta(1-\epsilon) M\,dt + \int_{1-\epsilon^{1/2}}^1 \delta(1-\epsilon)\,dt < \delta(1-\epsilon)(M + \epsilon^{1/2}) \]
for sufficiently small $\delta$, where $M = [(\delta \vee \epsilon)/(1-\delta)] \vee (3\epsilon^{1/2})$. Then,
\[ \lim_{\delta \to 0} \frac{\|g - \delta(1 - f(\delta))\|_1}{\delta(1 - f(\delta))} = 0, \]
so that the solution of (16) (solved for $r$) converges to $\rho(a,b,c)$ as $\delta \to 0$ (Proposition A.4), and therefore $\lim_{\delta \to 0} D(v\|u)/D(w\|u) = \rho(a,b,c)$.

As for the case $a = 1$, we have $0 \le c < b < a = 1$. For any $\delta' > 0$, since $D(v\|u)$ and $D(w\|u)$ are positive and finite and $\lim_{a \to 1} \rho(a,b,c) = \rho(1,b,c)$ (Proposition B.5), we let $u' = z(a')$ where $a'$ is arbitrarily close to $1$, so that
\[ \left| \frac{D(v\|u)}{D(w\|u)} - \frac{D(v\|u')}{D(w\|u')} \right| \le \delta' \]
and $|\rho(a',b,c) - \rho(1,b,c)| \le \delta'$. Furthermore, for sufficiently small $\delta$,
\[ \left| \frac{D(v\|u')}{D(w\|u')} - \rho(a',b,c) \right| \le \delta'. \]
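As a numerical aside, the convergence asserted in Theorem 1.2 for $a \ne 1$ can be observed directly on the two-point family $\alpha = (1 - \delta f(\delta), \delta f(\delta))$, $\beta = (1 - \delta, \delta)$. The Python sketch below is only an illustration under assumptions: the function names are hypothetical, the choice $f(\delta) = \delta$ is one admissible example among many, and $a$, $b$, $c$ are taken in the open interval $(0,1)$ so that the closed form of $\xi_s(t)$ can be evaluated without special cases.

```python
import math


def xi(s, t):
    """xi_s(t) from (3) and (4), written out; needs s < 1 and t < 1."""
    return t - s + (1.0 - t) * (math.log(1.0 - t) - math.log(1.0 - s))


def rho(a, b, c):
    """rho(a, b, c) from (2) in the case a < 1."""
    return xi(a, b) / xi(a, c)


def kl(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def ratio(delta, a, b, c, f=lambda d: d):
    """D(v || u) / D(w || u) for the two-point family of Theorem 1.2.

    The default f(delta) = delta is an assumed example with f -> 0.
    """
    eps = f(delta)
    alpha = (1.0 - delta * eps, delta * eps)
    beta = (1.0 - delta, delta)

    def z(t):
        return tuple(t * x + (1.0 - t) * y for x, y in zip(alpha, beta))

    return kl(z(b), z(a)) / kl(z(c), z(a))
```

Evaluating `ratio(delta, 0.3, 0.8, 0.1)` for a decreasing sequence of `delta` and comparing it with `rho(0.3, 0.8, 0.1)` shows the ratio approaching the bound from below, in line with Theorems 1.1 and 1.2.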
Then, by the triangle inequality,
\[ \left| \frac{D(v\|u)}{D(w\|u)} - \rho(1,b,c) \right| \le 3\delta' \]
for sufficiently small $\delta$. Since $\delta'$ is arbitrary, the proof is complete.

III. PROOF OF THEOREM 1.3

Proof: a) To prove (12), we need to show that a capacity-achieving input distribution $p_X$ of channel $P_{Y,S|X}$ is also optimal for channel $P_{Y,S|U}$ (see Remark 1.4).

Since $p_X$ is capacity-achieving for $P_{Y,S|X}$, it follows from [5, Theorem 4.5.1] that
\[ D(p_{Y,S|X=0}\|p_{Y,S}) = D(p_{Y,S|X=1}\|p_{Y,S}) = C, \tag{24} \]
where
\[ p_{Y,S}(y,s) = p_S(s) \sum_{x \in \mathcal{X}} p_X(x)\, p_{Y|X,S}(y|x,s) \tag{25} \]
and $C = C(p_{Y|X,S}, p_S)$. This also implies that
\[ p_X(0),\ p_X(1) \in (e^{-1},\ 1 - e^{-1}) \quad \text{(Theorem 1.1 with } r = 1\text{)}. \tag{26} \]
Since $0$ and $1$ can be regarded as constant mappings from $\tilde{\mathcal{S}}$ to $\mathcal{X}$, $p_X$ is also a valid input strategy for $P_{Y,S|U}$. We will show that $D(p_{Y,S|U=u}\|p_{Y,S}) \le C$ for all non-constant mappings $u \in \mathcal{U}$, so that the natural (zero) extension of $p_X$ over $\mathcal{U}$ achieves the capacity of $P_{Y,S|U}$ ([5, Theorem 4.5.1]).

With (15) and (25), $D(p_{Y,S|U=u}\|p_{Y,S})$ can be expressed as
\begin{align*}
D(p_{Y,S|U=u}\|p_{Y,S})
&= \sum_{y \in \mathcal{Y}, s \in \mathcal{S}} p_{Y,S|U}(y,s|u) \ln \frac{p_{Y,S|U}(y,s|u)}{p_{Y,S}(y,s)} \\
&= \sum_{y \in \mathcal{Y}, s \in \mathcal{S}} p_S(s)\, p_{Y|U,S}(y|u,s) \ln \frac{p_S(s)\, p_{Y|U,S}(y|u,s)}{p_{Y,S}(y,s)} \\
&= \sum_{y \in \mathcal{Y}, s \in \mathcal{S}} p_S(s)\, p_{Y|U,S}(y|u,s) \ln \frac{p_{Y|U,S}(y|u,s)}{p_{Y|S}(y|s)},
\end{align*}
where
\begin{align*}
p_{Y|U,S}(y|u,s) &= \sum_{x \in \mathcal{X}} p_{Y|X,S}(y|x,s)\, p_{X|U,S}(x|u,s) \\
&= \sum_{x \in \mathcal{X}} p_{Y|X,S}(y|x,s) \sum_{\tilde{s} \in \tilde{\mathcal{S}}} p_{\tilde{S}|S}(\tilde{s}|s)\, 1\{x = u(\tilde{s})\}
\end{align*}
and
\[ p_{Y|S}(y|s) = \sum_{x \in \mathcal{X}} p_X(x)\, p_{Y|X,S}(y|x,s). \tag{27} \]
Then
\[ D(p_{Y,S|U=u}\|p_{Y,S}) = \sum_{s \in \mathcal{S}} p_S(s) \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} p_{X|U,S}(x|u,s)\, p_{Y|X,S}(y|x,s) \right) \ln \frac{\sum_{x \in \mathcal{X}} p_{X|U,S}(x|u,s)\, p_{Y|X,S}(y|x,s)}{\sum_{x \in \mathcal{X}} p_X(x)\, p_{Y|X,S}(y|x,s)} \]
with $p_{X|U,S}(x|u,s) = \sum_{\tilde{s} \in \tilde{\mathcal{S}}} p_{\tilde{S}|S}(\tilde{s}|s)\, 1\{x = u(\tilde{s})\}$, so that $D(p_{Y,S|U=u}\|p_{Y,S})$ becomes a function of the channel $p_{X|U=u,S}$ from $S$ to $X$.
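The expression above lends itself to direct computation. The following Python sketch evaluates $D(p_{Y,S|U=u}\|p_{Y,S})$ through the induced channel $p_{X|U=u,S}$ for the example (13) of Theorem 1.3 b), together with the noise measure (10) and the threshold (11). It is a minimal illustration under assumptions, not the authors' procedure: all function names are hypothetical, $T$ is found by bisection, and the input distribution is fixed at $(1 - e^{-1}, e^{-1})$, the small-$\delta$ limit of the capacity-achieving distribution discussed in Part b), rather than optimized exactly.

```python
import math


def kl(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def xi(s, t):
    """xi_s(t) from (3) and (4), written out; needs s < 1 and t < 1."""
    return t - s + (1.0 - t) * (math.log(1.0 - t) - math.log(1.0 - s))


def threshold():
    """T from (11): 1 - t*, where xi_s(t*) = xi_s(0), s = 1/e, t* > s."""
    s = math.exp(-1.0)
    target = xi(s, 0.0)
    lo, hi = s, 1.0 - 1e-12  # xi_s is increasing on (s, 1)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if xi(s, mid) < target:
            lo = mid
        else:
            hi = mid
    return 1.0 - 0.5 * (lo + hi)


def gamma(side):
    """Noise measure (10); side[s][s_tilde] = p(s_tilde | s)."""
    return sum(min(column) for column in zip(*side))


def divergence(p_s, chan, side, u, p_x):
    """D(p_{Y,S|U=u} || p_{Y,S}) computed via kappa = p_{X|U=u,S}."""
    total = 0.0
    for s, weight in enumerate(p_s):
        # kappa[x] = sum over s_tilde of p(s_tilde | s) * 1{x = u(s_tilde)}
        kappa = [sum(p for p, x in zip(side[s], u) if x == x0) for x0 in (0, 1)]
        outputs = range(len(chan[s][0]))
        num = [sum(kappa[x] * chan[s][x][y] for x in (0, 1)) for y in outputs]
        den = [sum(p_x[x] * chan[s][x][y] for x in (0, 1)) for y in outputs]
        total += weight * kl(num, den)
    return total


def example_gain(delta, eps):
    """Divergence of the strategy 'send 1 iff s_tilde = 1' minus that of constant 0."""
    p_s = (1.0 - delta, delta)
    chan = [[(1.0, 0.0), (1.0 - delta, delta)],   # state 0: rows x = 0, 1
            [(1.0 - delta, delta), (1.0, 0.0)]]   # state 1: rows x = 0, 1
    side = [(1.0 - eps, 0.0, eps), (0.0, 1.0 - eps, eps)]
    p_x = (1.0 - math.exp(-1.0), math.exp(-1.0))  # assumed limiting input law
    args = (p_s, chan, side)
    return divergence(*args, (0, 1, 0), p_x) - divergence(*args, (0, 0, 0), p_x)
```

For small `delta`, `example_gain(delta, eps)` is expected to be positive when `eps` lies somewhat below `threshold()` and negative when it lies somewhat above, which mirrors the dichotomy between Parts a) and b) of Theorem 1.3.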
For convenience, we denote this function by $\mathcal{D}(\kappa)$ with $\kappa = p_{X|U=u,S}$. By condition, $\gamma(p_{\tilde{S}|S}) \ge T$, so that $\gamma(p_{X|U=u,S}) \ge T$ (Proposition C.2), and therefore $\mathcal{D}(p_{X|U=u,S}) \le C$ (Propositions C.3 and C.4 with (24) and (26)).

b) When state information is not available, we have the channel
\[ p_{Y|X} = \begin{bmatrix} 1-\delta^2 & \delta^2 \\ 1-\delta+\delta^2 & \delta-\delta^2 \end{bmatrix} = \begin{bmatrix} 1-\delta' f(\delta') & \delta' f(\delta') \\ 1-\delta' & \delta' \end{bmatrix}, \]
where $\delta' = g(\delta) = \delta(1-\delta)$, which is invertible on $(0, 1/2)$, and $f(\delta') = g^{-1}(\delta')/(1 - g^{-1}(\delta'))$.

Then it follows from Theorem 1.2 with $\alpha = p_{Y|X=0}$, $\beta = p_{Y|X=1}$, $a \in (0,1)$, $b = 1$, and $c = 0$ that
\[ \lim_{\delta \to 0} \frac{D(\alpha\|a\alpha + (1-a)\beta)}{D(\beta\|a\alpha + (1-a)\beta)} = \rho(a,1,0) \]
and $\rho(1-e^{-1},1,0) = 1$. Since $D(v\|u)/D(w\|u)$ is continuous with respect to $a$ and $\rho(t,1,0)$ is strictly decreasing on $(0,1)$ (Proposition B.5), the capacity-achieving input distribution $p_X$ of channel $p_{Y|X}$ must satisfy
\[ \lim_{\delta \to 0} \left\| p_X - (1-e^{-1}, e^{-1}) \right\| = 0. \]
On the other hand, it is noticed that for sufficiently small $\delta$,
\[ D(p_{Y|X=1,S=1}\|p_{Y|S=1}) > D(p_{Y|X=0,S=1}\|p_{Y|S=1}) \]
with $p_{Y|S=1}$ defined by (27), so it is tempting to use signal $1$ when $S = 1$. We choose the input strategy
\[ u(\tilde{s}) = \begin{cases} 1 & \text{if } \tilde{s} = 1, \\ 0 & \text{otherwise}. \end{cases} \]
Because of the random erasure of $p_{\tilde{S}|S}$, the actual input distributions under the strategy $u$ are $(1,0)$ and $(\epsilon, 1-\epsilon)$ for the states $0$ and $1$, respectively. By Theorem 1.2 with $\alpha = p_{Y|X=1,S=1}$, $\beta = p_{Y|X=0,S=1}$, $a = e^{-1}$, $b \in (e^{-1},1)$, and $c = 0$, we have
\[ \lim_{\delta \to 0} \frac{D(b\alpha + (1-b)\beta\|a\alpha + (1-a)\beta)}{D(\beta\|a\alpha + (1-a)\beta)} = \rho(e^{-1},b,0) \]
and $\rho(e^{-1},1-T,0) = 1$. Since $D(v\|u)/D(w\|u)$ is continuous with respect to $(a,b)$ (with $0 < a < b < 1$) and $\rho(e^{-1},t,0)$ is strictly increasing on $(e^{-1},1)$ (Proposition B.6),
\[ D((1-\epsilon)\alpha + \epsilon\beta\|p_{Y|S=1}) > D(\beta\|p_{Y|S=1}) \]
for $\epsilon = T - \iota$ and sufficiently small $\delta$. Therefore,
\begin{align*}
D(p_{Y,S|U=u}\|p_{Y,S}) &= \mathcal{D}(p_{X|U=u,S}) \\
&= p_S(0)\, D(p_{Y|X=0,S=0}\|p_{Y|S=0}) + p_S(1)\, D(\epsilon p_{Y|X=0,S=1} + (1-\epsilon) p_{Y|X=1,S=1}\|p_{Y|S=1}) \\
&> p_S(0)\, D(p_{Y|X=0,S=0}\|p_{Y|S=0}) + p_S(1)\, D(p_{Y|X=0,S=1}\|p_{Y|S=1}) \\
&= \mathcal{D}(\mathrm{U}_0) = D(p_{Y,S|X=0}\|p_{Y,S}),
\end{align*}
which implies (14) ([5, Theorem 4.5.1] and Remark 1.4), where $\mathcal{D}(\cdot)$ is defined in the proof of Part (a) and $\mathrm{U}_0$ denotes the deterministic useless channel with constant output $0$.

IV. CONCLUSION

We have established tight lower and upper bounds on the ratio of relative entropies of two probability distributions with respect to a common third one, where the three distributions are collinear in $\mathcal{P}_n$ (Theorems 1.1 and 1.2). These bounds enable us to settle an open problem left from [1], namely, determining the exact universal threshold on the noise measure of the side channel (Theorem 1.3).

It is worth noting that [6, Theorem 2] is a special case of Theorem 1.1 with $r = 1$, $b = 1$, and $c = 0$. A natural direction for future work is to extend Theorem 1.1 to more than two probability distributions and to quantum relative entropy.

APPENDIX A
PROPERTIES OF $F_g(r,a,b,c)$

Proposition A.1:

Let
\[ F_g(r,a,b,c) := \int_b^a \frac{b-t}{1-t}\, g(t)\,dt - r \int_c^a \frac{c-t}{1-t}\, g(t)\,dt, \tag{28} \]
where $g \in \mathcal{M}'$, the set of all positive, bounded, continuous, nonincreasing functions on $(0,1)$. Then $F_g(r,a,b,c)$ is strictly increasing in $r$ for fixed $a$, $b$, and $c$ with $0 \le a, b, c \le 1$ and $a \ne c, 1$.

Proof:

Observe that
\[ \frac{\partial F_g(r,a,b,c)}{\partial r} = -\int_c^a \frac{c-t}{1-t}\, g(t)\,dt, \]
which is positive whenever $a \ne c$.

Lemma A.2:

Let $f$ and $g$ be bounded measurable functions on $(I, \mathcal{B}(I))$ and $\lambda$ the Lebesgue measure on $\mathbb{R}$, where $I = [c,d]$ with $c < d$. The function $g$ is nonincreasing on $I$. If for $s \in I$,
\[ \int_{[c,s]} f(t)\,\lambda(dt) \ge 0 \tag{29} \]
with equality iff $s = c$ or $d$, then
\[ \int_I f(t) g(t)\,\lambda(dt) \ge 0 \tag{30} \]
with equality iff $g$ is constant on $(c,d)$, and for any $\mu \in \mathbb{R}$,
\[ \int_I f(t) g(t)\,\lambda(dt) \le M \int_I |g(t) - \mu|\,\lambda(dt), \tag{31} \]
where $M = \sup_{t \in I} |f(t)|$.

Proof:

Note that, owing to the nonincreasing property of $g$, the two limits $g(c^+)$ and $g(d^-)$ always exist. Let $h(t,t') = f(t)\,1\{0 \le t' \le g(t)\}$, which is clearly integrable on $I \times \mathbb{R}$. By Fubini's theorem,
\begin{align*}
\int_I f(t) g(t)\,\lambda(dt)
&= \int_I \lambda(dt) \int_{\mathbb{R}} f(t)\,1\{0 \le t' \le g(t)\}\,\lambda(dt') \\
&= \int_{I \times \mathbb{R}} h\,d(\lambda \times \lambda) \\
&= \int_{\mathbb{R}} \lambda(dt') \int_I f(t)\,1\{0 \le t' \le g(t)\}\,\lambda(dt) \\
&= \int_{\mathbb{R}} \lambda(dt') \int_{J(t')} f(t)\,\lambda(dt) \\
&= \int_{[g(d^-), g(c^+)]} \lambda(dt') \int_{J(t')} f(t)\,\lambda(dt), \tag{32}
\end{align*}
where $J(t') = \{t \in I : g(t) \ge t'\}$ is an interval $[c,t'')$ or $[c,t'']$ with $t'' \in I$, and (32) holds because when $t' \notin [g(d^-), g(c^+)]$, $J(t')$ is $\emptyset$, $\{c\}$, $[c,d)$, or $[c,d]$, so that $\int_{J(t')} f\,d\lambda = 0$ by condition (29).

If $g$ is constant on $(c,d)$, then $g(c^+) = g(d^-)$, so that $\int_I fg\,d\lambda = 0$. On the other hand, if $g$ is not constant on $(c,d)$, then for any $g(d^-) < t' < g(c^+)$, $J(t') = [c,t'')$ or $[c,t'']$ with $c < t'' < d$, hence $\int_{J(t')} f\,d\lambda > 0$ for $t' \in (g(d^-), g(c^+))$, and therefore $\int_I fg\,d\lambda > 0$. This proves (30), and (31) is an easy consequence of the following fact:
\[ \int_I f(t)(g(t) - \mu)\,\lambda(dt) = \int_I f(t) g(t)\,\lambda(dt) - \mu \int_I f(t)\,\lambda(dt) = \int_I f(t) g(t)\,\lambda(dt). \]

Proposition A.3:

Let $r^* = \rho(a,b,c)$, where $\rho(a,b,c)$ is defined by (2) with $0 \le a < 1$, $0 \le c < b \le 1$, and $a \ne b, c$. The functional $F_g(r^*,a,b,c)$ can be written as $\int_{[p_1,p_3]} f(t) g(t)\,\lambda(dt)$ such that $h(s) = \int_{[p_1,s]} f(t)\,\lambda(dt)$ is zero at $s = p_3$ and is strictly increasing on $(p_1,p_2)$ and strictly decreasing on $(p_2,p_3)$ for some $p_2 \in (p_1,p_3)$, where $p_1 = a \wedge c$, $p_3 = a \vee b$, and $\lambda$ is the Lebesgue measure on $\mathbb{R}$. More specifically, we have:

a) If $0 \le a < c < b \le 1$, then
\[ f(t) = r^* \frac{c-t}{1-t}\,1\{a \le t \le c\} - \frac{b-t}{1-t}\,1\{a \le t \le b\}. \tag{33} \]
It is strictly decreasing on $(a,c)$ and strictly increasing on $(c,b)$, and it is positive on $(a,d)$ and negative on $(d,b)$ for some $d \in (a,c)$.

b) If $0 \le c < a < b \le 1$, then
\[ f(t) = r^* \frac{t-c}{1-t}\,1\{c \le t \le a\} - \frac{b-t}{1-t}\,1\{a \le t \le b\}. \tag{34} \]
It is strictly increasing on $(c,a)$ and $(a,b)$, and it is positive on $(c,a)$ and negative on $(a,b)$.

c) If $0 \le c < b < a < 1$, then
\[ f(t) = r^* \frac{t-c}{1-t}\,1\{c \le t \le a\} - \frac{t-b}{1-t}\,1\{b \le t \le a\}. \tag{35} \]
It is strictly increasing on $(c,b)$ and strictly decreasing on $(b,a)$, and it is positive on $(c,d)$ and negative on $(d,a)$ for some $d \in (b,a)$.

Proof:

It has been shown in the proof of Theorem 1.1 that $h(p_3) = 0$. Other properties of $h$ are easy consequences of the remaining part.

Equations (33), (34), and (35) are obviously true in the almost-everywhere sense. It remains to prove the properties of $f$ in the three cases.

a) For $t \in (a,c)$, it follows from Propositions B.1 and B.5 that
\[ f'(t) = -r^* \frac{1-c}{(1-t)^2} + \frac{1-b}{(1-t)^2} < \frac{-b(1-c) + (1-b)c}{c(1-t)^2} = -\frac{b-c}{c(1-t)^2} < 0, \]
so $f(t)$ is strictly decreasing on $(a,c)$. For $t \in (c,b)$,
\[ f(t) = -\frac{b-t}{1-t} = \frac{1-b}{1-t} - 1, \]
which is clearly strictly increasing on $(c,b)$. By Proposition B.4,
\[ \lim_{t \to a^+} f(t) = f(a) = r^* \frac{c-a}{1-a} - \frac{b-a}{1-a} > 0. \]
It is also clear that $f(b) = 0$. Therefore, $f(t)$ is positive on $(a,d)$ and negative on $(d,b)$ for some $d \in (a,c)$.

b) When $t \in (c,a)$,
\[ f(t) = r^* \frac{t-c}{1-t} = r^* \frac{1-c}{1-t} - r^*, \]
which is strictly increasing. When $t \in (a,b)$,
\[ f(t) = \frac{1-b}{1-t} - 1, \]
which is also strictly increasing. Since $\lim_{t \to c^+} f(t) = f(c) = 0$ and $\lim_{t \to b^-} f(t) = f(b) = 0$, $f(t)$ is positive on $(c,a)$ and negative on $(a,b)$.

c) For $t \in (c,b)$,
\[ f(t) = r^* \frac{1-c}{1-t} - r^*, \]
which is strictly increasing. For $t \in (b,a)$, it follows from Proposition B.5 that
\[ f'(t) = r^* \frac{1-c}{(1-t)^2} - \frac{1-b}{(1-t)^2} < \frac{(1-b) - (1-b)}{(1-t)^2} = 0, \]
so $f(t)$ is strictly decreasing on $(b,a)$. By Proposition B.4,
\[ \lim_{t \to a^-} f(t) = f(a) = r^* \frac{a-c}{1-a} - \frac{a-b}{1-a} < 0. \]
It is also clear that $\lim_{t \to c^+} f(t) = f(c) = 0$ and $\lim_{t \to b^+} f(t) = f(b) > 0$. Therefore, $f(t)$ is positive on $(c,d)$ and negative on $(d,a)$ for some $d \in (b,a)$.

Proposition A.4:

The equation $F_g(r,a,b,c) = 0$ solved for $r$ has a unique positive solution $q = q(g)$ for $g \in \mathcal{M}'$ and fixed $a$, $b$, $c$ with $0 \le a < 1$, $0 \le c < b \le 1$, and $a \ne b, c$. Then $q(g) \le q(1) = \rho(a,b,c)$ for all $g \in \mathcal{M}$ with equality iff $g$ is constant on $(a \wedge c, a \vee b)$, where $\rho(a,b,c)$ is defined by (2). If for some positive real $\mu$,
\[ \|g - \mu\|_1 < \frac{\mu\,\xi_a(c)(1-a)}{|a-c|}, \]
then
\[ q(g) \ge q(1) - \frac{M(1-a)\|g - \mu\|_1}{\mu\,\xi_a(c)(1-a) - |a-c|\,\|g - \mu\|_1}, \]
where $\|g\|_1 = \int_0^1 |g(t)|\,dt$, $\xi_a$ is defined by (3), and $M = M(a,b,c)$ is a certain positive real number.

Proof:

The existence and uniqueness of $q(g)$ follow from Proposition A.1 with the facts $F_g(0,a,b,c) < 0$ and $\lim_{r \to +\infty} F_g(r,a,b,c) = +\infty$.

It is clear that $q(1)$, the solution of $F_1(r,a,b,c) = 0$, is $\rho(a,b,c)$. From Lemma A.2 and Proposition A.3 it follows that
\[ 0 \le F_g(q(1),a,b,c) \le M(a,b,c)\,\|g - \mu\|_1. \tag{36} \]
The first inequality of (36) implies that $q(g) \le q(1)$ with equality iff $g$ is constant on $(a \wedge c, a \vee b)$ (Proposition A.1 and Lemma A.2). On the other hand,
\begin{align*}
F_g(q(1),a,b,c) &= F_g(q(1),a,b,c) - F_g(q(g),a,b,c) \\
&= (q(1) - q(g)) \int_c^a \frac{t-c}{1-t}\, g(t)\,dt \\
&\ge (q(1) - q(g)) \left( \int_c^a \frac{t-c}{1-t}\,\mu\,dt - \int_c^a \frac{t-c}{1-t}\,|g(t) - \mu|\,dt \right) \\
&\ge (q(1) - q(g)) \left( \mu \int_c^a \frac{t-c}{1-t}\,dt - \frac{|a-c|}{1-a} \int_0^1 |g(t) - \mu|\,dt \right) \\
&= (q(1) - q(g)) \left( \mu\,\xi_a(c) - \frac{|a-c|}{1-a}\,\|g - \mu\|_1 \right).
\end{align*}
This, combined with (36), completes the proof.

APPENDIX B
PROPERTIES OF $\zeta_t(s)$, $\xi_s(t)$, AND $\rho(a,b,c)$

Proposition B.1:

For the function $\zeta_t(s)$ defined by (4), $\zeta_t'(s) = \frac{t-s}{1-s}$, so that $\zeta_t$ is strictly increasing on $(0,t)$ and strictly decreasing on $(t,1)$. Furthermore, we have
\[ \frac{\zeta_b(b)}{\zeta_c(c)} > \frac{b}{c} \]
for $0 < c < b \le 1$.

Proof:

The first part is obvious, and the second part can be proved by letting $f(t) = \zeta_t(t)/t$ and noting that
\[ f'(t) = -\frac{t + \ln(1-t)}{t^2} > -\frac{t - t}{t^2} = 0 \]
for $0 < t < 1$. Also note that this inequality is equivalent to $\rho(0,b,c) > 1/\rho(1,1-c,1-b)$ implied by (1).

Proposition B.2:

For the function $\xi_s(t)$ defined by (3) with $0 \le s < 1$,
\[ \xi_s(0) = \ln\frac{1}{1-s} - s, \qquad \xi_s(s) = 0, \qquad \text{and} \qquad \xi_s(1) = 1 - s. \]
$\xi_s$ is continuous on $[0,1]$, and it is strictly decreasing on $(0,s)$ and strictly increasing on $(s,1)$.

On the other hand, when $t$ is fixed, $\xi_s(t)$ is strictly decreasing in $s$ for $s \in (0,t)$ and strictly increasing in $s$ for $s \in (t,1)$.

When $s = 1 - e^{-1}$, $\xi_s(0) = \xi_s(1)$, so that for $s \le 1 - e^{-1}$, $\xi_s(t) = \xi_s(0)$ has a unique solution on $(s,1]$, and for $s \ge 1 - e^{-1}$, $\xi_s(t) = \xi_s(1)$ has a unique solution on $[0,s)$.

Proof:

Observe that
\[ \xi_s'(t) = \ln(1-s) - \ln(1-t), \]
which is negative on $(0,s)$ and positive on $(s,1)$. Also note that
\[ \frac{\partial \xi_s(t)}{\partial s} = \frac{s-t}{1-s}, \]
which, as a function of $s$, is negative on $(0,t)$ and positive on $(t,1)$. These two facts prove the first and the second parts, respectively. The last part can be easily proved by noting that $\xi_s(0)$ and $\xi_s(1)$ are strictly increasing and decreasing for $s \in [0,1)$, respectively.

Proposition B.3:

Let $\xi_{s,\downarrow} = \xi_s|_{[0,s]}$ and $\xi_{s,\uparrow} = \xi_s|_{[s,1]}$.

Let $f_{t,\uparrow}(s) = \xi^{-1}_{s,\uparrow}(\xi_s(t))$, where $s \in (t,d)$, $t \in [0,1)$, and $d$ is the unique solution of $\xi_d(t) = \xi_d(1)$ for $d \in (t,1)$. Then $f_{t,\uparrow}(s)$ is strictly increasing in $s$.

Let $f_{t,\downarrow}(s) = \xi^{-1}_{s,\downarrow}(\xi_s(t))$, where $s \in (d,t)$, $t \in (0,1]$, and $d$ is the unique solution of $\xi_d(t) = \xi_d(0)$ for $d \in (0,t)$. Then $f_{t,\downarrow}(s)$ is strictly increasing in $s$.

Proof:

This result is a consequence of Proposition B.2.

The condition $\xi_d(t) = \xi_d(1)$ ensures that
\[ \xi_s(s) < \xi_s(t) < \xi_d(t) = \xi_d(1) < \xi_s(1) \]
when $s \in (t,d)$, so that $f_{t,\uparrow}(s)$ is well defined. For $t < s < s' < d$, $\xi_s(t) < \xi_{s'}(t)$, so that
\[ \xi^{-1}_{s',\uparrow}(\xi_{s'}(t)) > \xi^{-1}_{s',\uparrow}(\xi_s(t)) > s'. \]
Similarly,
\[ \xi_s\left(\xi^{-1}_{s',\uparrow}(\xi_s(t))\right) > \xi_{s'}\left(\xi^{-1}_{s',\uparrow}(\xi_s(t))\right) = \xi_s(t), \]
so that $\xi^{-1}_{s',\uparrow}(\xi_s(t)) > \xi^{-1}_{s,\uparrow}(\xi_s(t))$, and therefore $f_{t,\uparrow}(s) < f_{t,\uparrow}(s')$.

The condition $\xi_d(t) = \xi_d(0)$ ensures that
\[ \xi_s(s) < \xi_s(t) < \xi_d(t) = \xi_d(0) < \xi_s(0) \]
when $s \in (d,t)$, so that $f_{t,\downarrow}(s)$ is well defined. For $d < s < s' < t$, $\xi_s(t) > \xi_{s'}(t)$, so that
\[ \xi^{-1}_{s,\downarrow}(\xi_s(t)) < \xi^{-1}_{s,\downarrow}(\xi_{s'}(t)) < s. \]
Similarly,
\[ \xi_{s'}(t) = \xi_s\left(\xi^{-1}_{s,\downarrow}(\xi_{s'}(t))\right) < \xi_{s'}\left(\xi^{-1}_{s,\downarrow}(\xi_{s'}(t))\right), \]
so that $\xi^{-1}_{s,\downarrow}(\xi_{s'}(t)) < \xi^{-1}_{s',\downarrow}(\xi_{s'}(t))$, and therefore $f_{t,\downarrow}(s) < f_{t,\downarrow}(s')$.

Proposition B.4:

Let $\rho(a,b,c)$ be the function defined by (2). If $0 \le a < c < b \le 1$, then $\rho(a,b,c) > \frac{b-a}{c-a}$. If $0 \le c < b < a < 1$, then $\rho(a,b,c) < \frac{a-b}{a-c}$.

Proof: If $0 \le a < c < b \le 1$, then
$$\rho(a,b,c) > \frac{\zeta_b(c)-\zeta_b(a)}{\zeta_c(c)-\zeta_c(a)} = \frac{\zeta_b'(t)}{\zeta_c'(t)} = \frac{b-t}{c-t} > \frac{b-a}{c-a},$$
where $t \in (a,c)$. Similarly, if $0 \le c < b < a < 1$, then
$$\rho(a,b,c) < \frac{\zeta_b(b)-\zeta_b(a)}{\zeta_c(b)-\zeta_c(a)} = \frac{\zeta_b'(t)}{\zeta_c'(t)} = \frac{t-b}{t-c} < \frac{a-b}{a-c},$$
where $t \in (b,a)$. Proposition B.5:

Let $f(t) = \rho(t,b,c)$, where $0 \le c < b \le 1$. Then $f(0) = \frac{\zeta_b(b)}{\zeta_c(c)}$, $f(c) = +\infty$, $f(b) = 0$, and $f(1) = \frac{1-b}{1-c}$. The function $f$ is continuous on $[0,c)$ and $(c,1]$, and it is strictly increasing on $(0,c)$ and $(b,1)$ and strictly decreasing on $(c,b)$. Let $f_{\uparrow,1} = f|_{[0,c)}$, $f_{\downarrow} = f|_{(c,b]}$, and $f_{\uparrow,2} = f|_{[b,1]}$. Then, for $s > 0$,
$$f_{\uparrow,1}^{-1}((s,+\infty)) = \begin{cases} (f_{\uparrow,1}^{-1}(s),\, c) & \text{if } s \ge f(0), \\ [0,c) & \text{otherwise}, \end{cases}$$
$$f_{\downarrow}^{-1}((s,+\infty)) = (c,\, f_{\downarrow}^{-1}(s)),$$
$$f_{\uparrow,2}^{-1}((s,+\infty)) = \begin{cases} (f_{\uparrow,2}^{-1}(s),\, 1] & \text{if } s < f(1), \\ \emptyset & \text{otherwise}. \end{cases}$$
Proof:

It is clear that $f$ is continuous on $[0,c)$ and $(c,1)$. As for $t = 1$,
$$\lim_{t\to 1} f(t) = \lim_{t\to 1} \frac{\zeta_b(b)-\zeta_b(t)}{\zeta_c(c)-\zeta_c(t)} = \lim_{t\to 1} \frac{\zeta_b'(t)}{\zeta_c'(t)} = \lim_{t\to 1} \frac{b-t}{c-t} \ \text{(Proposition B.1)} = f(1).$$
For the remaining part, it suffices to show that $f'(t)$ is positive on $(0,c) \cup (b,1)$ and negative on $(c,b)$. We have
$$f'(t) = \frac{g(t)}{(\zeta_c(c)-\zeta_c(t))^2},$$
where
$$g(t) = \zeta_c'(t)(\zeta_b(b)-\zeta_b(t)) - \zeta_b'(t)(\zeta_c(c)-\zeta_c(t)).$$
If $0 < t < c$, then from Proposition B.1, it follows that
\begin{align}
g(t) &> \zeta_c'(t)(\zeta_b(c)-\zeta_b(t)) - \zeta_b'(t)(\zeta_c(c)-\zeta_c(t)) \notag \\
&= \zeta_c'(t)\,\frac{\zeta_b'(t')(\zeta_c(c)-\zeta_c(t))}{\zeta_c'(t')} - \zeta_b'(t)(\zeta_c(c)-\zeta_c(t)) \tag{37} \\
&= (\zeta_c(c)-\zeta_c(t)) \left( \zeta_c'(t)\,\frac{b-t'}{c-t'} - \zeta_b'(t) \right) \notag \\
&> (\zeta_c(c)-\zeta_c(t)) \left( \zeta_c'(t)\,\frac{b-t}{c-t} - \zeta_b'(t) \right) = 0, \notag
\end{align}
where (37) follows from Cauchy's mean value theorem for some $t' \in (t,c)$. If $c < t < b$, then it follows from Proposition B.1 that $\zeta_c'(t) < 0$ and $\zeta_b'(t) > 0$, so that $g(t) < 0$. If $b < t < 1$, then it follows from Proposition B.1 that
\begin{align}
g(t) &> \zeta_c'(t)(\zeta_b(b)-\zeta_b(t)) - \zeta_b'(t)(\zeta_c(b)-\zeta_c(t)) \notag \\
&= \zeta_c'(t)\,\frac{\zeta_b'(t')(\zeta_c(b)-\zeta_c(t))}{\zeta_c'(t')} - \zeta_b'(t)(\zeta_c(b)-\zeta_c(t)) \tag{38} \\
&= (\zeta_c(b)-\zeta_c(t)) \left( \zeta_c'(t)\,\frac{b-t'}{c-t'} - \zeta_b'(t) \right) \notag \\
&> (\zeta_c(b)-\zeta_c(t)) \left( \zeta_c'(t)\,\frac{b-t}{c-t} - \zeta_b'(t) \right) = 0, \notag
\end{align}
where (38) follows from Cauchy's mean value theorem for some $t' \in (b,t)$. Proposition B.6:

Let $f(t) = \rho(a,t,c)$, where $0 \le a \le 1$, $0 \le c < 1$, and $a \ne c$. Then
$$f(0) = \begin{cases} \frac{\xi_a(0)}{\xi_a(c)} & \text{if } 0 \le a < 1, \\ \frac{1}{1-c} & \text{otherwise}, \end{cases}$$
$f(c) = 1$, $f(a) = 0$, and $f(1) = \frac{\xi_a(1)}{\xi_a(c)}$. The function $f$ is continuous on $[0,1]$, and it is strictly decreasing on $(0,a)$ and strictly increasing on $(a,1)$. Let $f_{\downarrow} = f|_{(c,a)}$ and $f_{\uparrow} = f|_{(a \vee c,\,1]}$. Then, for $s > 0$,
$$f_{\downarrow}^{-1}((s,+\infty)) = \begin{cases} (c,\, \xi_{a,\downarrow}^{-1}(s\,\xi_a(c))) & \text{if } c < a \text{ and } s < 1, \\ \emptyset & \text{otherwise}, \end{cases}$$
$$f_{\uparrow}^{-1}((s,+\infty)) = \begin{cases} (\xi_{a,\uparrow}^{-1}(s\,\xi_a(c)) \vee c,\, 1] & \text{if } s < f(1), \\ \emptyset & \text{otherwise}, \end{cases}$$
where $\xi_{a,\downarrow}$ and $\xi_{a,\uparrow}$ are defined in Proposition B.3. Proof:

Since $f(t) = (1-t)/(1-c)$ for $a = 1$, the proposition is clearly true. As for $a < 1$, note that $f(t) = \xi_a(t)/\xi_a(c)$ and use Proposition B.2. Proposition B.7:

Let $f(t) = \rho(a,b,t)$, where $0 \le a \le 1$, $0 < b \le 1$, and $a \ne b$. Then
$$f(0) = \begin{cases} \frac{\xi_a(b)}{\xi_a(0)} & \text{if } 0 \le a < 1, \\ 1-b & \text{otherwise}, \end{cases}$$
$f(a) = +\infty$, $f(b) = 1$, and $f(1) = \frac{\xi_a(b)}{\xi_a(1)}$. The function $f$ is continuous on $[0,a)$ and $(a,1]$, and it is strictly increasing on $(0,a)$ and strictly decreasing on $(a,1)$. Let $f_{\uparrow} = f|_{[0,\,a \wedge b)}$ and $f_{\downarrow} = f|_{(a,b)}$. Then, for $s > 0$,
$$f_{\uparrow}^{-1}((s,+\infty)) = \begin{cases} (\xi_{a,\downarrow}^{-1}(\xi_a(b)/s),\, a \wedge b) & \text{if } s \ge f(0), \\ [0,\, a \wedge b) & \text{otherwise}, \end{cases}$$
$$f_{\downarrow}^{-1}((s,+\infty)) = \begin{cases} (a,\, \xi_{a,\uparrow}^{-1}(\xi_a(b)/s)) & \text{if } b > a \text{ and } s > 1, \\ (a,b) & \text{otherwise}, \end{cases}$$
where $\xi_{a,\downarrow}$ and $\xi_{a,\uparrow}$ are defined in Proposition B.3. Proof:

Since $f(t) = (1-b)/(1-t)$ for $a = 1$, the proposition is clearly true. As for $a < 1$, note that $f(t) = \xi_a(b)/\xi_a(t)$ and use Proposition B.2.

APPENDIX C
THE PROPERTIES OF $\gamma(\kappa)$ AND $D(\kappa)$

Proposition C.1:

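Any channel $\kappa : \mathcal{S} \to \mathcal{X}$ can be decomposed into the following form:
$$\kappa = \sum_{x \in \mathcal{X}} \lambda_x(\kappa) \left[ \gamma(\kappa)\,\mathrm{U}_x + (1-\gamma(\kappa))\,\kappa' \right],$$
where $\mathrm{U}_x$ denotes the deterministic useless channel with constant output $x$, and
$$\gamma(\kappa) := \sum_{x \in \mathcal{X}} \min_{s \in \mathcal{S}} \kappa(x \mid s) \in [0,1],$$
$$\lambda_x(\kappa) := \begin{cases} \frac{\min_{s \in \mathcal{S}} \kappa(x \mid s)}{\gamma(\kappa)} & \text{if } \gamma(\kappa) > 0, \\ \frac{1}{|\mathcal{X}|} & \text{otherwise}, \end{cases}$$
$$\kappa'(x \mid s) = \begin{cases} \frac{\kappa(x \mid s) - \min_{s' \in \mathcal{S}} \kappa(x \mid s')}{1-\gamma(\kappa)} & \text{if } \gamma(\kappa) < 1, \\ \frac{1}{|\mathcal{X}|} & \text{otherwise}. \end{cases}$$
Sketch of proof: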

The proof is straightforward and only involves simple algebraic manipulations. One thing to note is that $\gamma(\kappa) \le \sum_{x \in \mathcal{X}} \kappa(x \mid s) = 1$, where $s$ is arbitrary.

Since a channel $\kappa : \mathcal{S} \to \mathcal{X}$ can be regarded as an $|\mathcal{S}| \times |\mathcal{X}|$ matrix, the next property of $\gamma(\kappa)$ is given in matrix form. Proposition C.2:

Let $A$ be an $m \times \ell$ channel matrix and $B$ an $\ell \times n$ deterministic channel matrix. Let $g_j$ be the gap between the least number and the second least number of column $A_{*,j}$ and let $g = \min_{1 \le j \le \ell} g_j$. Then
$$\gamma(AB) \ge \gamma(A) + (|M| - n)^+ g,$$
where $M = I(\{1, \ldots, \ell\})$ and $I(j) = \arg\min_i A_{i,j}$. When $|M| \le n$, the lower bound can be attained by choosing $B$ such that $|I(B^{-1}(k))| \le 1$ for every $1 \le k \le n$, where $B$ is understood as a map. (There may be more than one row attaining the minimum value of $A_{*,j}$, in which case it does not matter which row index is assigned to $I(j)$, because $g = 0$.) Proof:

Since $B$ is deterministic, $AB = C = \begin{pmatrix} C_{*,1} & C_{*,2} & \cdots & C_{*,n} \end{pmatrix}$ with every column $C_{*,k} = \sum_{j \in B^{-1}(k)} A_{*,j}$. Let $I'(k) = \arg\min_i C_{i,k}$. Then the set $M' = I'(\{1, \ldots, n\})$ has at most $\min\{m,n\}$ elements, and hence misses at least $(|M| - \min\{m,n\})^+$ indices in $M$, so that at least $(|M| - \min\{m,n\})^+$ columns of $A$ do not contribute their minimum components to the minimum components of columns of $C$, and therefore
$$\gamma(C) = \sum_{k=1}^{n} C_{I'(k),k} \ge \sum_{j=1}^{\ell} \min_{1 \le i \le m} A_{i,j} + (|M| - \min\{m,n\})^+ g = \gamma(A) + (|M| - n)^+ g.$$
The remaining part is straightforward.
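As a numerical illustration of Propositions C.1 and C.2, the following Python sketch computes $\gamma(\kappa)$, builds the decomposition, and checks the lower bound on $\gamma(AB)$ by brute force over all deterministic maps $B$. It is only a sanity check under stated assumptions: a channel is represented as a list of rows (one row per state, each row a distribution over outputs), and the function names `gamma`, `decompose`, `recompose`, `merge_columns`, and `c2_lower_bound` are hypothetical helpers introduced here, not notation from the paper.

```python
from itertools import product

def gamma(kappa):
    """gamma(kappa): sum over outputs x of the minimum over states s of kappa(x|s)."""
    n_out = len(kappa[0])
    return sum(min(row[x] for row in kappa) for x in range(n_out))

def decompose(kappa):
    """Return (gamma, lam, kappa_prime) following the formulas of Proposition C.1."""
    n_out = len(kappa[0])
    col_min = [min(row[x] for row in kappa) for x in range(n_out)]
    g = sum(col_min)
    lam = [m / g for m in col_min] if g > 0 else [1 / n_out] * n_out
    if g < 1:
        kp = [[(row[x] - col_min[x]) / (1 - g) for x in range(n_out)]
              for row in kappa]
    else:
        kp = [[1 / n_out] * n_out for _ in kappa]
    return g, lam, kp

def recompose(g, lam, kp):
    """Rebuild sum_x lam[x] * (g * U_x + (1 - g) * kp) as a matrix.

    Since sum_x lam[x] = 1, entry (s, x') equals g * lam[x'] + (1 - g) * kp[s][x'].
    """
    n_out = len(lam)
    return [[g * lam[x] + (1 - g) * row[x] for x in range(n_out)] for row in kp]

def merge_columns(A, B, n):
    """Product A*B for a deterministic B, given as a tuple mapping column j to B[j]."""
    C = [[0.0] * n for _ in A]
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            C[i][B[j]] += v
    return C

def c2_lower_bound(A, n):
    """gamma(A) + (|M| - n)^+ * g from Proposition C.2 (assumes at least 2 rows)."""
    m, l = len(A), len(A[0])
    cols = [sorted(A[i][j] for i in range(m)) for j in range(l)]
    gap = min(c[1] - c[0] for c in cols)  # least vs. second least entry per column
    M = {min(range(m), key=lambda i: A[i][j]) for j in range(l)}
    return gamma(A) + max(len(M) - n, 0) * gap

# Small demonstration with an arbitrary 3x3 channel matrix (illustrative data).
A = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.1, 0.5]]
bound = c2_lower_bound(A, 2)
values = [gamma(merge_columns(A, B, 2)) for B in product(range(2), repeat=3)]
print("bound:", bound, "min over B of gamma(AB):", min(values))
```

The brute-force loop enumerates every map from the three columns of `A` to two outputs, so it is practical only for very small matrices; its purpose is to make the roles of $M$ and $g$ in the bound concrete.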

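The function $\xi_s(t)$ of Appendix B reappears in Proposition C.4, where the threshold is expressed as $T(a) = 1 - \xi_{1-a,\uparrow}^{-1}(\xi_{1-a}(0))$. The sketch below evaluates this numerically. It assumes the closed form $\xi_s(t) = (1-t)\ln\frac{1-t}{1-s} + t - s$, which is inferred from the derivative and boundary values stated in Proposition B.2 rather than copied from definition (3); the helper names `xi`, `xi_inv_up`, and `T` are hypothetical, and the inverse is obtained by plain bisection.

```python
import math

def xi(s, t):
    """Assumed closed form of xi_s(t) for 0 <= s < 1 and 0 <= t <= 1.

    It satisfies xi_s(s) = 0, xi_s(1) = 1 - s and d/dt xi_s(t) = ln(1-s) - ln(1-t),
    as stated in Proposition B.2.
    """
    if t == 1.0:
        return 1.0 - s  # (1 - t) * ln(1 - t) tends to 0 as t -> 1
    return (1 - t) * math.log((1 - t) / (1 - s)) + t - s

def xi_inv_up(s, v, tol=1e-13):
    """Invert xi(s, .) on its increasing branch [s, 1] by bisection.

    Assumes 0 <= v <= xi(s, 1), so that a solution exists on [s, 1].
    """
    lo, hi = s, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if xi(s, mid) < v:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def T(a):
    """T(a) = 1 - xi_{1-a,up}^{-1}(xi_{1-a}(0)), intended for 1/e < a < 1."""
    s = 1 - a
    return 1 - xi_inv_up(s, xi(s, 0.0))

# xi_s(0) and xi_s(1) coincide at s = 1 - 1/e, the boundary case of Proposition B.2.
s0 = 1 - math.exp(-1)
print(xi(s0, 0.0), xi(s0, 1.0))
# T should increase with a on (1/e, 1).
print([round(T(a), 4) for a in (0.4, 0.5, 0.7, 0.9)])
```

Bisection is used only because it needs nothing beyond the monotonicity of $\xi_s$ on $[s,1]$; any other root finder would serve equally well.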
Proposition C.3:

Let
$$D(\kappa) := \sum_{s \in \mathcal{S}} p_S(s)\, \mathrm{D}\bigl( \kappa(\cdot \mid s) \otimes p_{Y|X,S=s} \,\big\|\, p_{Y|S=s} \bigr),$$
where $\kappa$ is a channel from $\mathcal{S}$ to $\mathcal{X}$,
$$(\kappa(\cdot \mid s) \otimes p_{Y|X,S=s})(y) = \sum_{x \in \mathcal{X}} \kappa(x \mid s)\, p_{Y|X,S}(y \mid x, s),$$
and $p_{Y|S}(y \mid s) = \sum_{x \in \mathcal{X}} p_X(x)\, p_{Y|X,S}(y \mid x, s)$. If $\mathcal{X} = \{0,1\}$, $D(\mathrm{U}_0) = D(\mathrm{U}_1) = C$, and $\gamma(\kappa) \ge T(p_X(0)) \vee T(p_X(1))$, then $D(\kappa) \le C$, where
$$T(a) = \min\bigl\{ t \in [0,1] : \mathrm{D}(z(t) \,\|\, z(a)) \le \mathrm{D}(z(1) \,\|\, z(a)) \text{ for all } \alpha, \beta \in \mathcal{P}_{|\mathcal{Y}|}, \text{ where } z(t) = t\alpha + (1-t)\beta \bigr\} \tag{39}$$
for $a \in (0,1)$. Proof:

We denote by $A$ the set on which the minimum is taken in (39). We will show that it is closed, so that $T(a)$ is well defined. The set $A$ can be rewritten as $A = \bigcap_{\alpha,\beta \in \mathcal{P}_{|\mathcal{Y}|}} A(\alpha,\beta)$ with
$$A(\alpha,\beta) = \{ t \in [0,1] : \mathrm{D}(z(t) \,\|\, z(a)) \le \mathrm{D}(z(1) \,\|\, z(a)) \}.$$
Since $\mathrm{D}(z(t) \,\|\, z(a))$, as a function of $t$, is continuous, $A(\alpha,\beta)$ is closed for all $\alpha, \beta$, and hence the intersection $A$ is also closed.

By the convexity of Kullback-Leibler divergence (or the log-sum inequality), it is easy to see that $D(\kappa)$ is convex. Then,
$$D(\kappa) \le \sum_{x \in \mathcal{X}} \lambda_x D(\gamma \mathrm{U}_x + (1-\gamma)\kappa'), \quad \text{(Proposition C.1)}$$
where $\lambda_x = \lambda_x(\kappa)$ and $\gamma = \gamma(\kappa)$. For every $x \in \mathcal{X}$,
\begin{align*}
D(\gamma \mathrm{U}_x + (1-\gamma)\kappa')
&= \sum_{s \in \mathcal{S}} p_S(s)\, \mathrm{D}\bigl( \gamma\, p_{Y|X=x,S=s} + (1-\gamma)(\kappa'(\cdot \mid s) \otimes p_{Y|X,S=s}) \,\big\|\, p_{Y|S=s} \bigr) \\
&= \sum_{s \in \mathcal{S}} p_S(s)\, \mathrm{D}\Bigl( \sum_{x' \in \mathcal{X}} \lambda'_{x'}\, p_{Y|X=x',S=s} \,\Big\|\, p_{Y|S=s} \Bigr),
\end{align*}
where $\lambda'_x \ge \gamma$. Since $\gamma \ge T(p_X(0)) \vee T(p_X(1))$, it follows from (39) that $D(\gamma \mathrm{U}_x + (1-\gamma)\kappa') \le D(\mathrm{U}_x) = C$. Therefore, $D(\kappa) \le \sum_{x \in \mathcal{X}} \lambda_x C = C$. Proposition C.4:

For $a > e^{-1}$, the function $T(a)$ defined by (39) can be computed by $T(a) = 1 - \xi_{1-a,\uparrow}^{-1}(\xi_{1-a}(0))$, which is strictly increasing in $(e^{-1}, 1)$. Proof:

Since $a > e^{-1}$, $\xi_{1-a}(0) < \xi_{1-a}(1)$ (Proposition B.2). From Theorems 1.1 and 1.2 with $b = 1$, it follows that
\begin{align}
T(a) &= \inf\{ c : 1/\rho(1-a,\, 1-c,\, 0) \ge 1 \} \notag \\
&= \inf\{ c : \rho(1-a,\, 1-c,\, 0) \le 1 \} \notag \\
&= \inf\bigl( 1 - [0,\, \xi_{1-a,\uparrow}^{-1}(\xi_{1-a}(0))] \bigr) \tag{40} \\
&= \inf\,[1 - \xi_{1-a,\uparrow}^{-1}(\xi_{1-a}(0)),\, 1] \notag \\
&= 1 - \xi_{1-a,\uparrow}^{-1}(\xi_{1-a}(0)), \notag
\end{align}
where (40) follows from Proposition B.6. It is also clear that $T(a)$ is strictly increasing in $(e^{-1}, 1)$ (Proposition B.3).

ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of China under Grant 61571398 and in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under a Discovery Grant.

REFERENCES

[1] R. Xu, J. Chen, T. Weissman, and J.-K. Zhang, “When is noisy state information at the encoder as useless as no information or as good as noise-free state?” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 960–974, Feb. 2017.

[2] C. E. Shannon, “Channels with side information at the transmitter,” IBM J. Res. Develop., vol. 2, no. 4, pp. 289–293, Oct. 1958.

[3] G. Caire and S. Shamai, “On the capacity of some channels with channel state information,” IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 2007–2019, Sep. 1999.

[4] A. A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, New York: Cambridge University Press, 2011.

[5] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.

[6] N. Shulman and M. Feder, “The Uniform Distribution as a Universal Prior,”


Submitted on 30 May 2018 Updated
