Effective Secrecy: Reliability, Confusion and Stealth
Jie Hou and Gerhard Kramer
Institute for Communications Engineering
Technische Universität München, 80290 Munich, Germany
Email: {jie.hou, gerhard.kramer}@tum.de

Abstract—A security measure called effective secrecy is defined that includes strong secrecy and stealth communication. Effective secrecy ensures that a message cannot be deciphered and that the presence of meaningful communication is hidden. To measure stealth we use resolvability and relate this to binary hypothesis testing. Results are developed for wire-tap channels and broadcast channels with confidential messages.
I. INTRODUCTION
Wyner [1] derived the secrecy capacity for degraded wire-tap channels (see Fig. 1). Csiszár and Körner [2] extended the results to broadcast channels with confidential messages. In both [1] and [2], secrecy was measured by a normalized mutual information between the message M and the eavesdropper's output Z^n under a secrecy constraint

(1/n) I(M; Z^n) ≤ S  (1)

which is referred to as weak secrecy. Weak secrecy has the advantage that one can trade off S for rate. The drawback is that even S ≈ 0 is usually considered too weak because the eavesdropper can decipher nS bits of M, which grows with n. Therefore, [3] (see also [4]) advocated using strong secrecy, where secrecy is measured by the unnormalized mutual information I(M; Z^n) and one requires

I(M; Z^n) ≤ ξ  (2)

for any ξ > 0 and sufficiently large n.

In related work, Han and Verdú [5] studied resolvability based on variational distance, which addresses the number of bits needed to mimic a marginal distribution of a prescribed joint distribution. Bloch and Laneman [6] used the resolvability approach of [5] and extended the results in [2] to continuous random variables and channels with memory.

The main contribution of this work is to define a new and stronger security measure for wire-tap channels that includes not only reliability and (wiretapper) confusion but also stealth. The measure is satisfied by random codes and by using a recently developed simplified proof [7] of resolvability based on unnormalized informational divergence (see also [8, Lemma 11]). In particular, we measure secrecy by the informational divergence

D(P_{MZ^n} || P_M Q_Z^n)  (3)

where P_{MZ^n} is the joint distribution of M Z^n, P_M is the distribution of M, P_{Z^n} is the distribution of Z^n, and Q_Z^n is the distribution that the eavesdropper expects to observe when the source is not communicating useful messages. We call this security measure effective secrecy.
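As a sanity check on this measure, the divergence (3) splits algebraically into a mutual-information term and an output-divergence term (stated as (4) and derived in (7) below). The following sketch verifies that identity numerically on a hypothetical single-letter joint distribution; all names and numbers are illustrative, not from the paper:

```python
import math

def kl(p, q):
    """Informational divergence D(p||q) in bits; supp(p) must lie in supp(q)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy example: M in {0,1}, Z in {0,1,2}.
P_MZ = [[0.20, 0.15, 0.15],   # P(M=0, Z=z)
        [0.10, 0.25, 0.15]]   # P(M=1, Z=z)
Q_Z = [0.4, 0.3, 0.3]         # what the eavesdropper expects to see

P_M = [sum(row) for row in P_MZ]
P_Z = [sum(P_MZ[m][z] for m in range(2)) for z in range(3)]

# Left-hand side: D(P_MZ || P_M x Q_Z), flattened over the pair (m, z).
lhs = kl([p for row in P_MZ for p in row],
         [P_M[m] * Q_Z[z] for m in range(2) for z in range(3)])

# Right-hand side: I(M;Z) + D(P_Z || Q_Z).
mi = kl([p for row in P_MZ for p in row],
        [P_M[m] * P_Z[z] for m in range(2) for z in range(3)])
stealth = kl(P_Z, Q_Z)

assert abs(lhs - (mi + stealth)) < 1e-12
```

The check holds for any joint distribution, since the split is a purely algebraic rearrangement of the logarithm.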
One can easily check that (see (7) below)

D(P_{MZ^n} || P_M Q_Z^n) = I(M; Z^n) + D(P_{Z^n} || Q_Z^n)  (4)

where we interpret I(M; Z^n) as a measure of "non-confusion" and D(P_{Z^n} || Q_Z^n) as a measure of "non-stealth". We justify the former interpretation by using error probability in Sec. III and the latter by using binary hypothesis testing in Sec. IV. Thus, by making D(P_{MZ^n} || P_M Q_Z^n) → 0 we not only keep the message secret from the eavesdropper but also hide the presence of meaningful communication.

The paper is organized as follows. In Section II, we state the problem. In Section III we state and prove the main result. Section IV relates the result to hypothesis testing. Section V discusses related works.

II. PRELIMINARIES
A. Notation
Random variables are written with upper case letters and their realizations with the corresponding lower case letters. Superscripts denote finite-length sequences of variables/symbols, e.g., X^n = X_1, . . . , X_n. Subscripts denote the position of a variable/symbol in a sequence. For instance, X_i denotes the i-th variable in X^n. We use X_i^n to denote the sequence X_i, . . . , X_n, 1 ≤ i ≤ n. A random variable X has probability distribution P_X and the support of P_X is denoted as supp(P_X). We write probabilities with subscripts P_X(x) but we drop the subscripts if the arguments of the distribution are lower case versions of the random variables. For example, we write P(x) = P_X(x). If the X_i, i = 1, . . . , n, are independent and identically distributed (i.i.d.) according to P_X, then we have P(x^n) = ∏_{i=1}^n P_X(x_i) and we write P_{X^n} = P_X^n. We often also use Q_X^n to refer to sequences of i.i.d. random variables. Calligraphic letters denote sets. The size of a set S is denoted as |S| and the complement is denoted as S^c. For X with alphabet 𝒳, we denote P_X(S) = Σ_{x∈S} P_X(x) for any S ⊆ 𝒳. We use T_ε^n(P_X) to denote the set of letter-typical sequences of length n with respect to the probability distribution P_X and the non-negative number ε [9, Ch. 3], [10], i.e., we have

T_ε^n(P_X) = { x^n : |N(a|x^n)/n − P_X(a)| ≤ ε P_X(a) for all a ∈ 𝒳 }

where N(a|x^n) is the number of occurrences of a in x^n.
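The letter-typicality condition is easy to check in code. The following sketch (with a hypothetical binary P_X and ε, chosen only for illustration) tests membership in T_ε^n(P_X):

```python
from collections import Counter

def is_letter_typical(x, P_X, eps):
    """Check a sequence x against T_eps^n(P_X):
    |N(a|x)/n - P_X(a)| <= eps * P_X(a) for every letter a in the alphabet."""
    n = len(x)
    counts = Counter(x)
    return all(abs(counts.get(a, 0) / n - p) <= eps * p for a, p in P_X.items())

# Hypothetical example with P_X(0) = 0.75, P_X(1) = 0.25 and eps = 0.1.
P_X = {0: 0.75, 1: 0.25}
print(is_letter_typical([0, 0, 0, 1] * 5, P_X, 0.1))  # empirical freq. matches P_X -> True
print(is_letter_typical([0, 1] * 10, P_X, 0.1))       # half ones, far from 0.25 -> False
```

Note that the condition forces N(a|x^n) = 0 whenever P_X(a) = 0, matching the definition above.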
Fig. 1. A wire-tap channel.
B. Wire-Tap Channel
Consider the wire-tap channel depicted in Fig. 1. Joey has a message M which is destined for Chandler but should be kept secret from Ross. The message M is uniformly distributed over {1, . . . , L}, L = 2^{nR}, and an encoder f(·) maps M to the sequence

X^n = f(M, W)  (5)

with help of a randomizer variable W that is independent of M and uniformly distributed over {1, . . . , L_1}, L_1 = 2^{nR_1}. The purpose of W is to confuse Ross so that he learns little about M. X^n is transmitted through a memoryless channel Q_{YZ|X}^n. Chandler observes the channel output Y^n while Ross observes Z^n. The pair M Z^n has the joint distribution P_{MZ^n}. Chandler estimates M̂ from Y^n and the average error probability is

P_e^(n) = Pr[M̂ ≠ M].  (6)

Ross tries to learn M from Z^n and secrecy is measured by

D(P_{MZ^n} || P_M Q_Z^n)
= Σ_{(m,z^n) ∈ supp(P_{MZ^n})} P(m, z^n) log( P(m, z^n) / (P(m) · Q_Z^n(z^n)) · P(z^n)/P(z^n) )
= Σ_{(m,z^n) ∈ supp(P_{MZ^n})} P(m, z^n) [ log( P(z^n|m)/P(z^n) ) + log( P(z^n)/Q_Z^n(z^n) ) ]
= I(M; Z^n) + D(P_{Z^n} || Q_Z^n)  (7)

where the first term measures non-confusion and the second measures non-stealth, P_{Z^n} is the distribution Ross observes at his channel output, and Q_Z^n is the distribution Ross expects to observe if Joey is not sending useful information. For example, if Joey transmits X^n with probability Q_X^n(X^n) through the channel, then we have

Q_Z^n(z^n) = Σ_{x^n ∈ supp(Q_X^n)} Q_X^n(x^n) Q_{Z|X}^n(z^n | x^n).  (8)

When Joey sends useful messages, then P_{Z^n} and Q_Z^n are different. But a small D(P_{MZ^n} || P_M Q_Z^n) implies that both I(M; Z^n) and D(P_{Z^n} || Q_Z^n) are small, which in turn implies that Ross learns little about M and cannot recognize whether Joey is communicating anything meaningful. A rate R is achievable if for any ξ_1, ξ_2 > 0 there is a sufficiently large n and an encoder and a decoder such that

P_e^(n) ≤ ξ_1  (9)
D(P_{MZ^n} || P_M Q_Z^n) ≤ ξ_2.  (10)

The effective secrecy capacity C_S is the supremum of the set of achievable R. We wish to determine C_S.

III. MAIN RESULT AND PROOF
We prove the following result.
Theorem 1:
The effective secrecy capacity of the wire-tap channel is the same as the weak and strong secrecy capacity, namely

C_S = max_{Q_VX} [I(V; Y) − I(V; Z)]  (11)

where the maximization is over all joint distributions Q_VX satisfying the Markov chain

V − X − YZ.  (12)

One may restrict the cardinality of V to |𝒱| ≤ |𝒳|.

A. Achievability
We use random coding and the proof technique of [7].
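In outline, the construction draws every codeword i.i.d. from Q_X and encodes by drawing the randomizer W uniformly (the formal steps (13)-(15) follow). A toy sketch, with all parameters hypothetical and far too small for secrecy, chosen only to show the mechanics:

```python
import random

def generate_codebook(L, L1, n, Q_X, rng):
    """Draw L * L1 codewords x^n(m, w), each letter i.i.d. from Q_X."""
    symbols, probs = zip(*Q_X.items())
    return {(m, w): tuple(rng.choices(symbols, probs, k=n))
            for m in range(L) for w in range(L1)}

def encode(codebook, m, L1, rng):
    """Joey's encoder: pick the randomizer w uniformly and send x^n(m, w)."""
    w = rng.randrange(L1)
    return codebook[(m, w)]

rng = random.Random(0)
C = generate_codebook(L=4, L1=8, n=16, Q_X={0: 0.5, 1: 0.5}, rng=rng)
x = encode(C, m=2, L1=8, rng=rng)   # one transmitted sequence x^n(m, w)
```

Averaged over the random choice of w, each codeword is sent with probability 1/(L · L_1), which is the point of (15) below.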
Random Code:
Fix a distribution Q_X and generate L · L_1 codewords x^n(m, w), m = 1, . . . , L, w = 1, . . . , L_1, using ∏_{i=1}^n Q_X(x_i(m, w)). This defines the codebook

C = { x^n(m, w), m = 1, . . . , L, w = 1, . . . , L_1 }  (13)

and we denote the random codebook by

C̃ = { X^n(m, w) }_{(m,w)=(1,1)}^{(L,L_1)}.  (14)

Encoding:
To send a message m, Joey chooses w uniformly from {1, . . . , L_1} and transmits x^n(m, w). Hence, for a fixed codebook C every x^n(m, w) occurs with probability

P_{X^n}(x^n(m, w)) = 1/(L · L_1)  (15)

rather than Q_X^n(x^n(m, w)). Further, for every pair (m, z^n) we have (see (8))

P(z^n | m) = Σ_{w=1}^{L_1} (1/L_1) · Q_{Z|X}^n(z^n | x^n(m, w))  (16)
P(z^n) = Σ_{m=1}^{L} Σ_{w=1}^{L_1} (1/(L · L_1)) · Q_{Z|X}^n(z^n | x^n(m, w)).  (17)

Chandler:
Chandler puts out (m̂, ŵ) if there is a unique pair (m̂, ŵ) satisfying the typicality check

(x^n(m̂, ŵ), y^n) ∈ T_ε^n(Q_XY).  (18)

Otherwise he puts out (m̂, ŵ) = (1, 1).

Analysis:
Define the events

E_1 : { (M̂, Ŵ) ≠ (M, W) }
E_2 : D(P_{MZ^n} || P_M Q_Z^n) > ξ_2.  (19)

Let E = E_1 ∪ E_2 so that we have

Pr[E] ≤ Pr[E_1] + Pr[E_2]  (20)

where we have used the union bound. Pr[E_1] can be made small with large n as long as

R + R_1 < I(X; Y) − δ_ε(n)  (21)

where δ_ε(n) → 0 as n → ∞ (see [10]), which implies that P_e^(n) is small. Pr[E_2] can be made small with large n as long as [7, Theorem 1]

R_1 > I(X; Z) + δ'_ε(n)  (22)

where δ'_ε(n) → 0 as n → ∞. This is because the average divergence over M, W, C̃ and Z^n satisfies (see [7, Equ. (9)])

E[D(P_{MZ^n} || P_M Q_Z^n)]
(a) = E[ D(P_M || P_M) + D(P_{Z^n|M} || Q_Z^n | P_M) ]
(b) = E[ log( Σ_{j=1}^{L_1} Q_{Z|X}^n(Z^n | X^n(M, j)) / (L_1 · Q_Z^n(Z^n)) ) ]
= Σ_{m=1}^{L} Σ_{w=1}^{L_1} (1/(L · L_1)) E[ log( Σ_{j=1}^{L_1} Q_{Z|X}^n(Z^n | X^n(m, j)) / (L_1 · Q_Z^n(Z^n)) ) | M = m, W = w ]
(c) ≤ Σ_{m=1}^{L} Σ_{w=1}^{L_1} (1/(L · L_1)) E[ log( Q_{Z|X}^n(Z^n | X^n(m, w)) / (L_1 · Q_Z^n(Z^n)) + 1 ) | M = m, W = w ]
(d) = E[ log( Q_{Z|X}^n(Z^n | X^n) / (L_1 · Q_Z^n(Z^n)) + 1 ) ]  (23)

where
(a) follows from the chain rule for informational divergence;
(b) follows from (16) and by taking the expectation over M, W, X^n(1, 1), . . . , X^n(L, L_1), Z^n;
(c) follows by the concavity of the logarithm and Jensen's inequality applied to the expectation over the X^n(m, j), j ≠ w, for a fixed m;
(d) follows by choosing X^n Z^n ∼ Q_XZ^n.

Next we can show that the right hand side (RHS) of (23) is small if (22) is valid by splitting the expectation in (23) into sums of typical and atypical pairs (see [7, Equ. (11)-(16)]). But if the RHS of (23) approaches 0, then using (7) we have

E[ I(M; Z^n) + D(P_{Z^n} || Q_Z^n) ] → 0.  (24)

Combining (20), (21) and (22) we can make Pr[E] → 0 as n → ∞ as long as

R + R_1 < I(X; Y)  (25)
R_1 > I(X; Z).  (26)

We hence have the achievability of any R satisfying

0 ≤ R < max_{Q_X} [I(X; Y) − I(X; Z)].  (27)

Of course, if the RHS of (27) is non-positive, then we require R = 0. Now we prefix a channel Q_{X|V}^n to the original channel Q_{YZ|X}^n and obtain a new channel Q_{YZ|V}^n where

Q_{YZ|V}^n(y^n, z^n | v^n) = Σ_{x^n ∈ supp(Q_{X|V}^n(·|v^n))} Q_{X|V}^n(x^n | v^n) Q_{YZ|X}^n(y^n, z^n | x^n).  (28)

Using a similar analysis as above, we have the achievability of any R satisfying

0 ≤ R < max_{Q_VX} [I(V; Y) − I(V; Z)]  (29)

where the maximization is over all Q_VX satisfying (12). Again, if the RHS of (29) is non-positive, then we require R = 0. As usual, the purpose of adding the auxiliary variable V is to potentially increase R. Note that V = X recovers (27). Hence, the RHS of (27) is always smaller than or equal to the RHS of (29).

Remark 1:
The average divergence E[D(P_{MZ^n} || P_M Q_Z^n)] can be viewed as the sum of I(M C̃; Z^n) and D(P_{Z^n} || Q_Z^n) [11, Sec. III] (see also [7, Sec. III-B]). To see this, consider

E[D(P_{MZ^n} || P_M Q_Z^n)]
(a) = E[ log( Σ_{j=1}^{L_1} Q_{Z|X}^n(Z^n | X^n(M, j)) / (L_1 · Q_Z^n(Z^n)) ) ]
= Σ_{m=1}^{L} (1/L) Σ_{x^n(1,1)} · · · Σ_{x^n(L,L_1)} ∏_{k=(1,1)}^{(L,L_1)} Q_X^n(x^n(k)) Σ_{z^n} Σ_{w=1}^{L_1} (1/L_1) Q_{Z|X}^n(z^n | x^n(m, w)) log( Σ_{j=1}^{L_1} (1/L_1) Q_{Z|X}^n(z^n | x^n(m, j)) / Q_Z^n(z^n) )
= Σ_{m=1}^{L} P(m) Σ_C P(C | m) Σ_{z^n} P(z^n | m, C) log( P(z^n | m, C) / Q_Z^n(z^n) )
= Σ_{(m,C,z^n)} P(m, C, z^n) [ log( P(z^n | m, C) / P(z^n) ) + log( P(z^n) / Q_Z^n(z^n) ) ]
= I(M C̃; Z^n) + D(P_{Z^n} || Q_Z^n)  (30)

where (a) follows by (23)(b). Therefore, as E[D(P_{MZ^n} || P_M Q_Z^n)] → 0 we have I(M C̃; Z^n) → 0, which means that M C̃ and Z^n are (almost) independent. This makes sense, since for effective secrecy the adversary should learn little about M and little about the presence of meaningful transmission.

B. Converse
The converse follows as in [2, Theorem 1]. We provide an alternative proof using the telescoping identity [12, Sec. G]. Suppose that for some ξ_1, ξ_2 > 0 there exists a sufficiently large n, an encoder and a decoder such that (9) and (10) are satisfied. We have

log L = nR = H(M)
= I(M; Y^n) + H(M | Y^n)
(a) ≤ I(M; Y^n) + (1 + ξ_1 · nR)
(b) ≤ I(M; Y^n) − I(M; Z^n) + ξ_2 + (1 + ξ_1 · nR)  (31)

where (a) follows from Fano's inequality and (b) follows from (7) and (10). Using the telescoping identity [12, Equ. (9) and (11)] we have

(1/n) [I(M; Y^n) − I(M; Z^n)]
= (1/n) Σ_{i=1}^n [ I(M; Z_{i+1}^n Y^i) − I(M; Z_i^n Y^{i−1}) ]
= (1/n) Σ_{i=1}^n [ I(M; Y_i | Y^{i−1} Z_{i+1}^n) − I(M; Z_i | Y^{i−1} Z_{i+1}^n) ]
(a) = I(M; Y_T | Y^{T−1} Z_{T+1}^n T) − I(M; Z_T | Y^{T−1} Z_{T+1}^n T)
(b) = I(V; Y | U) − I(V; Z | U)
≤ max_{Q_UVX} [ I(V; Y | U) − I(V; Z | U) ]
≤ max_u max_{Q_{VX|U=u}} [ I(V; Y | U = u) − I(V; Z | U = u) ]  (32)
(c) = max_{Q_VX} [ I(V; Y) − I(V; Z) ]  (33)

where
(a) follows by letting T be independent of all other random variables and uniformly distributed over {1, . . . , n};
(b) follows by defining

U = Y^{T−1} Z_{T+1}^n T,  V = M U,  X = X_T,  Y = Y_T,  Z = Z_T;  (34)

(c) follows because if the maximum in (32) is achieved for U = u* and Q_{VX|U=u*}, then the same can be achieved in (33) by choosing Q_VX = Q_{VX|U=u*}.

Combining (31) and (33) we have

R ≤ max_{Q_VX} [I(V; Y) − I(V; Z)] / (1 − ξ_1) + (ξ_2 + 1) / ((1 − ξ_1) n).  (35)

Letting n → ∞, ξ_1 → 0, and ξ_2 → 0, we have

R ≤ max_{Q_VX} [I(V; Y) − I(V; Z)]  (36)

where the maximization is over all Q_VX satisfying the Markov chain (12). The cardinality bound in Theorem 1 was derived in [13, Theorem 22.1]. This completes the converse.
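To illustrate Theorem 1, the expression (11) can be evaluated numerically. The sketch below uses a hypothetical degraded pair of binary symmetric channels (crossover 0.1 to Chandler, 0.2 to Ross), for which V = X suffices and the maximum is known to equal h(0.2) − h(0.1) with h the binary entropy; the channel parameters are illustrative only:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi_bsc(px1, cross):
    """I(X;Y) for X ~ Bernoulli(px1) through a BSC with crossover `cross`."""
    py1 = px1 * (1 - cross) + (1 - px1) * cross
    return h2(py1) - h2(cross)

# Hypothetical degraded pair: main channel BSC(0.1), eavesdropper BSC(0.2).
p_main, p_eve = 0.1, 0.2

# Grid search over the input distribution, i.e., (11) with V = X.
best = max(mi_bsc(a / 1000, p_main) - mi_bsc(a / 1000, p_eve) for a in range(1001))

print(round(best, 4))                       # grid-search estimate of C_S
print(round(h2(p_eve) - h2(p_min := p_main), 4))  # closed form for this degraded case
```

The two printed values agree (about 0.253 bits per channel use for this pair), with the maximum attained by the uniform input.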
C. Broadcast Channels with Confidential Messages
Broadcast channels with confidential messages (BCC) [2] are wire-tap channels with common messages. For the BCC (Fig. 2), Joey has a common message M_0 destined for both Chandler and Ross which is independent of M and uniformly distributed over {1, . . . , L_0}, L_0 = 2^{nR_0}. An encoder maps M_0 and M to

X^n = f(M_0, M, W)  (37)

which is sent through the channel Q_{YZ|X}^n. Chandler estimates (M̂_0, M̂) from Y^n while Ross estimates M̃_0 from Z^n. The average error probability is

P_e^{*(n)} = Pr[ { (M̂_0, M̂) ≠ (M_0, M) } ∪ { M̃_0 ≠ M_0 } ]  (38)

and non-secrecy is measured by D(P_{MZ^n} || P_M Q_Z^n). A rate pair (R_0, R) is achievable if, for any ξ_1, ξ_2 > 0, there is a sufficiently large n, an encoder and two decoders such that

P_e^{*(n)} ≤ ξ_1  (39)
D(P_{MZ^n} || P_M Q_Z^n) ≤ ξ_2.  (40)

The effective secrecy capacity region C_BCC is the closure of the set of achievable (R_0, R). We have the following theorem.

Fig. 2. A broadcast channel with a confidential message.

Theorem 2: C_BCC is the same as the weak and strong secrecy capacity region

C_BCC = ∪ { (R_0, R) : 0 ≤ R_0 ≤ min{ I(U; Y), I(U; Z) }, 0 ≤ R ≤ I(V; Y | U) − I(V; Z | U) }  (41)

where the union is over all distributions Q_UVX satisfying the Markov chain

U − V − X − YZ.  (42)

One may restrict the alphabet sizes to

|𝒰| ≤ |𝒳| + 3,  |𝒱| ≤ |𝒳|² + 4|𝒳| + 3.  (43)

Proof:
The proof is omitted due to the similarity to theproof of Theorem 1.
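Although the proof is omitted, the region (41) is straightforward to evaluate for specific distributions. The sketch below computes one achievable (R_0, R) pair for a hypothetical binary example: U ~ Bern(1/2), X = U ⊕ Bern(t), main channel BSC(0.1), eavesdropper BSC(0.2). This particular Q_UX is illustrative, not claimed optimal:

```python
import math
from itertools import product

def bsc(p):
    """Transition probabilities of a BSC(p) as {(input, output): prob}."""
    return {(a, b): (1 - p if a == b else p) for a in (0, 1) for b in (0, 1)}

def marginal(joint, keep):
    """Sum a joint pmf {tuple: prob} down to the coordinates listed in `keep`."""
    out = {}
    for t, pr in joint.items():
        key = tuple(t[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

def H(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def region_point(t, p_main=0.1, p_eve=0.2):
    """One (R0, R) pair from (41) for U ~ Bern(1/2), X = U xor Bern(t)."""
    QXU, QY, QZ = bsc(t), bsc(p_main), bsc(p_eve)
    # Joint pmf over (u, x, y, z); only the marginals p(y|x), p(z|x) matter here.
    joint = {(u, x, y, z): 0.5 * QXU[(u, x)] * QY[(x, y)] * QZ[(x, z)]
             for u, x, y, z in product((0, 1), repeat=4)}
    IUY = H(marginal(joint, (0,))) + H(marginal(joint, (2,))) - H(marginal(joint, (0, 2)))
    IUZ = H(marginal(joint, (0,))) + H(marginal(joint, (3,))) - H(marginal(joint, (0, 3)))
    # I(X;Y|U) and I(X;Z|U) via joint entropies.
    IXY_U = (H(marginal(joint, (0, 1))) + H(marginal(joint, (0, 2)))
             - H(marginal(joint, (0,))) - H(marginal(joint, (0, 1, 2))))
    IXZ_U = (H(marginal(joint, (0, 1))) + H(marginal(joint, (0, 3)))
             - H(marginal(joint, (0,))) - H(marginal(joint, (0, 1, 3))))
    return min(IUY, IUZ), IXY_U - IXZ_U

print(region_point(0.5))   # U independent of X: R0 = 0, R = full secrecy rate
print(region_point(0.25))  # trades secrecy rate for a positive common rate
```

Sweeping t between 0 and 1/2 traces a family of achievable points for this (non-optimized) choice of auxiliaries.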
D. Choice of Security Measures
Effective secrecy includes both strong secrecy and stealth communication. One may argue that using only I(M; Z^n) or only D(P_{Z^n} || Q_Z^n) would suffice to measure secrecy. However, we consider two examples where secrecy is achieved but not stealth, and where stealth is achieved but not secrecy.

Example 1: I(M; Z^n) → 0, D(P_{Z^n} || Q_Z^n) = D > 0. Suppose that Joey inadvertently uses Q̃_X rather than Q_X for codebook generation, where (22) is still satisfied. The new Q̃_X could result in a different expected Q̃_Z^n ≠ Q_Z^n. Hence, as n grows large we have

D(P_{MZ^n} || P_M Q_Z^n) = I(M; Z^n) + D(Q̃_Z^n || Q_Z^n)  (44)

where I(M; Z^n) → 0 but we have

D(Q̃_Z^n || Q_Z^n) = D, for some D > 0.  (45)

Ross thus recognizes that Joey is transmitting useful information even though he cannot decode.

Fig. 3. Example of the decision regions A_F^n and (A_F^n)^c.

Example 2: I(M; Z^n) = I > 0, D(P_{Z^n} || Q_Z^n) → 0. Note that E[D(P_{Z^n} || Q_Z^n)] → 0 as n → ∞ as long as (see [7, Theorem 1])

R + R_1 > I(X; Z).  (46)

If Joey is not careful and chooses R_1 such that (22) is violated and (46) is satisfied, then D(P_{Z^n} || Q_Z^n) can be made small but we have

I(M; Z^n) = I for some I > 0.  (47)

Thus, although the communication makes D(P_{Z^n} || Q_Z^n) small, Ross can learn

I(M; Z^n) ≈ n [I(X; Z) − R_1]  (48)

bits about M if he is willing to pay a price and always tries to decode (see Sec. IV).

IV. HYPOTHESIS TESTING
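As a single-letter (n = 1) preview with hypothetical distributions, brute-forcing all acceptance regions shows that the best achievable α + β equals 1 − (1/2)Σ|P_Z − Q_Z|, i.e., one half of the unnormalized distance (60), which is never below the Pinsker-based lower bound of Lemma 1 below:

```python
import math
from itertools import combinations

def min_alpha_plus_beta(Q, P):
    """Brute-force the smallest alpha + beta over all acceptance regions A for H0:
    alpha = 1 - Q(A), beta = P(A). The optimum is 1 - (1/2) * sum|P - Q|."""
    best = 2.0
    for r in range(len(Q) + 1):
        for A in combinations(range(len(Q)), r):
            best = min(best, (1.0 - sum(Q[z] for z in A)) + sum(P[z] for z in A))
    return best

# Hypothetical example: Ross expects Q_Z but actually observes P_Z.
Q_Z = [0.4, 0.4, 0.2]
P_Z = [0.35, 0.45, 0.2]

tv = sum(abs(p - q) for p, q in zip(P_Z, Q_Z))                     # distance as in (60)
xi = sum(p * math.log2(p / q) for p, q in zip(P_Z, Q_Z) if p > 0)  # D(P_Z || Q_Z)
g = math.sqrt(xi * 2 * math.log(2))                                # Pinsker-based g(xi)

best = min_alpha_plus_beta(Q_Z, P_Z)
assert abs(best - (1 - tv / 2)) < 1e-12  # optimal test achieves 1 - TV/2
assert best >= 1 - g                     # consistent with the lower bound in (57)
```

As the divergence shrinks, the best α + β climbs toward 1, so no test does meaningfully better than guessing.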
The reader may wonder how D(P_{Z^n} || Q_Z^n) relates to stealth. We consider a hypothesis testing framework and show that as long as (46) is satisfied, the best Ross can do to detect Joey's action is to guess. For every channel output z^n, Ross considers two hypotheses

H_0 = Q_Z^n  (49)
H_1 = P_{Z^n}.  (50)

If H_0 is accepted, then Ross decides that Joey's transmission is not meaningful, whereas if H_1 is accepted, then Ross decides that Joey is sending useful messages. We define two kinds of error probabilities

α = Pr{ H_1 is accepted | H_0 is true }  (51)
β = Pr{ H_0 is accepted | H_1 is true }.  (52)

The value α is referred to as the level of significance [14] and corresponds to the probability of raising a false alarm, while β corresponds to the probability of mis-detection. In practice, raising a false alarm can be expensive. Therefore, Ross would like to minimize β for a given tolerance level of α. To this end, Ross performs for every z^n a ratio test

Q_Z^n(z^n) / P_{Z^n}(z^n) = r  (53)

and makes a decision depending on a threshold F, F ≥ 0, namely

H_0 is accepted if r > F
H_1 is accepted if r ≤ F.  (54)

Define the set of z^n for which H_0 is accepted as

A_F^n = { z^n : Q_Z^n(z^n) / P_{Z^n}(z^n) > F }  (55)

and (A_F^n)^c is the set of z^n for which H_1 is accepted (see Fig. 3). Ross chooses the threshold F and we have

α = Q_Z^n((A_F^n)^c) = 1 − Q_Z^n(A_F^n)
β = P_{Z^n}(A_F^n).  (56)

The ratio test in (53) is the Neyman-Pearson test, which is optimal [14, Theorem 3.2.1] in the sense that it minimizes β for a given α. We have the following lemma.

Lemma 1: If D(P_{Z^n} || Q_Z^n) ≤ ξ, ξ > 0, then with the Neyman-Pearson test we have

1 − g(ξ) ≤ α + β ≤ 1 + g(ξ)  (57)

where

g(ξ) = √(ξ · 2 ln 2)  (58)

which goes to 0 as ξ → 0.

Proof:
Since D(P_{Z^n} || Q_Z^n) ≤ ξ, we have (see (60))

||P_{Z^n} − Q_Z^n||_TV ≤ √(ξ · 2 ln 2) = g(ξ)  (59)

where

||P_X − Q_X||_TV = Σ_{x∈𝒳} |P(x) − Q(x)|  (60)

is the variational distance between P_X and Q_X and where the inequality follows by Pinsker's inequality [15, Theorem 11.6.1]. We further have

||P_{Z^n} − Q_Z^n||_TV = Σ_{z^n ∈ A_F^n} |P_{Z^n}(z^n) − Q_Z^n(z^n)| + Σ_{z^n ∈ (A_F^n)^c} |P_{Z^n}(z^n) − Q_Z^n(z^n)|
≥ Σ_{z^n ∈ A_F^n} |P_{Z^n}(z^n) − Q_Z^n(z^n)|
(a) ≥ | Σ_{z^n ∈ A_F^n} [P_{Z^n}(z^n) − Q_Z^n(z^n)] |
= |P_{Z^n}(A_F^n) − Q_Z^n(A_F^n)|
= |β − (1 − α)|  (61)
where (a) follows by the triangle inequality. Combining (59) and (61), we have the bounds (57).

Fig. 4. Optimal tradeoff between α and β.

Fig. 4 illustrates the optimal tradeoff between α and β for stealth communication, i.e., when (46) is satisfied. As n → ∞ and ξ → 0, we have

D(P_{Z^n} || Q_Z^n) → 0  (62)
(α + β) → 1.  (63)

If Ross allows no false alarm (α = 0), then he always ends up with mis-detection (β = 1). If Ross tolerates no mis-detection (β = 0), he pays a high price (α = 1). Further, for any given α, the optimal mis-detection probability is

β_opt = 1 − α.  (64)

But Ross does not need to see Z^n or perform an optimal test to achieve β_opt. He may randomly choose some A′ such that

Q_Z^n((A′)^c) = α  (65)

and achieves β′ = 1 − α. The best strategy is thus to guess. On the other hand, if

lim_{n→∞} D(P_{Z^n} || Q_Z^n) > 0  (66)

then Ross detects Joey's action and we can have

α + β ≈ 0.  (67)

We thus operate in one of two regimes in Fig. 4, either near (α, β) = (0, 0) or near the line α + β = 1.

V. DISCUSSION
Our resolvability proof differs from that in [6] in that we rely on unnormalized informational divergence [7] instead of variational distance [5]. Our proof is simpler and the result is stronger than that in [6] when restricting attention to product distributions and memoryless channels, because a small D(P_{MZ^n} || P_M Q_Z^n) implies small I(M; Z^n) and D(P_{Z^n} || Q_Z^n), while a small ||P_{Z^n} − Q_Z^n||_TV implies only a small I(M; Z^n) [4, Lemma 1].

Hayashi studied strong secrecy for wire-tap channels using resolvability based on unnormalized divergence and he derived bounds for nonasymptotic cases [11, Theorem 3]. We remark that Theorem 1 can be derived by extending [11, Lemma 2] to asymptotic cases. However, Hayashi did not consider stealth but focused on strong secrecy, although he too noticed a formal connection to (7) [11, p. 1568].

ACKNOWLEDGMENT
J. Hou and G. Kramer were supported by an Alexander von Humboldt Professorship endowed by the German Federal Ministry of Education and Research. G. Kramer was also supported by NSF Grant CCF-09-05235. J. Hou thanks Rafael Schaefer for useful discussions.

REFERENCES

[1] A. Wyner, "The wire-tap channel," Bell Syst. Tech. Journal, vol. 54, no. 8, pp. 1355–1387, Oct. 1975.
[2] I. Csiszár and J. Körner, "Broadcast channels with confidential messages," IEEE Trans. Inf. Theory, vol. 24, no. 3, pp. 339–348, May 1978.
[3] U. Maurer and S. Wolf, "Information-theoretic key agreement: From weak to strong secrecy for free," in Advances in Cryptology – Eurocrypt 2000, Lecture Notes in Computer Science. Springer-Verlag, 2000, pp. 351–368.
[4] I. Csiszár, "Almost independence and secrecy capacity," Prob. Inf. Trans., vol. 32, no. 1, pp. 40–47, Jan.–March 1996.
[5] T. Han and S. Verdú, "Approximation theory of output statistics," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 752–772, May 1993.
[6] M. Bloch and N. Laneman, "Strong secrecy from channel resolvability," IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8077–8098, Dec. 2013.
[7] J. Hou and G. Kramer, "Informational divergence approximations to product distributions," Toronto, Canada, June 2013, pp. 76–81.
[8] A. Winter, "Secret, public and quantum correlation cost of triples of random variables," in IEEE Int. Symp. Inf. Theory, Adelaide, Australia, Sept. 2005, pp. 2270–2274.
[9] J. L. Massey, Applied Digital Information Theory, ETH Zurich, Zurich, Switzerland, 1980–1998.
[10] A. Orlitsky and J. Roche, "Coding for computing," IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 903–917, March 2001.
[11] M. Hayashi, "General nonasymptotic and asymptotic formulas in channel resolvability and identification capacity and their application to the wiretap channel," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1562–1575, April 2006.
[12] G. Kramer, "Teaching IT: An identity for the Gelfand-Pinsker converse," IEEE Inf. Theory Society Newsletter, vol. 61, no. 4, pp. 4–6, Dec. 2011.
[13] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[14] E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses, 3rd ed. New York: Springer, 2005.
[15] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006.