Secrecy Results for Compound Wiretap Channels
Igor Bjelaković,* Holger Boche,** and Jochen Sommerfeld***
Lehrstuhl für Theoretische Informationstechnik, Technische Universität München, 80290 München, Germany
We derive a lower bound on the secrecy capacity of the compound wiretap channel with channel state information at the transmitter which matches the general upper bound on the secrecy capacity of general compound wiretap channels given by Liang et al., thus establishing a full coding theorem in this case. We achieve this with a stronger secrecy criterion and the maximum error probability criterion, and with a decoder that is robust against the effect of randomisation in the encoding. This relieves us from the need to decode the randomisation parameter, which is in general not possible within this model. Moreover, we prove a lower bound on the secrecy capacity of the compound wiretap channel without channel state information and derive a multi-letter expression for the capacity in this communication scenario.
1. INTRODUCTION

Compound wiretap channels are among the simplest non-trivial models incorporating the requirement of security against a potential eavesdropper while at the same time the legitimate users suffer from channel uncertainty. They may therefore be considered as a starting point for theoretical investigations tending towards applications, for example in wireless systems, a fact explaining the lively research activity in this area in recent years (cf. [1], [2] and references therein). In this article we give capacity results for different scenarios of channel state information under a strong secrecy criterion and the maximum error probability criterion. In a more recent work [3] the authors make use of these results to derive capacity results for arbitrarily varying wiretap channels, a more realistic communication model which, apart from eavesdropping, takes an active adversarial jamming situation into account.

In this paper we consider finite families of pairs of channels $\mathfrak{W} = \{(W_t, V_t) : t = 1, \ldots, T\}$ with common input alphabet and possibly different output alphabets. The legitimate users control $W_t$ and the eavesdropper observes the output of $V_t$. We will be dealing with two communication scenarios. In the first one the transmitter is informed about the index $t$ (channel state information (CSI) at the transmitter), while in the second the transmitter has no information about that index at all (no CSI). In both scenarios the eavesdropper knows, and the legitimate receiver does not know, the channel state. This setup is a generalisation of Wyner's wiretap channel [4]. Along the way we will comment on what our results look like when applied to the widely used class of models of the form $\mathfrak{W} = \{(W_t, V_s) : t = 1, \ldots, T,\ s = 1, \ldots, S\}$ with $T \neq S$, which are special cases of the model we are dealing with in this paper.

Our contributions are summarised as follows: In [1] a general upper bound on the capacity of the compound wiretap channel, namely the minimum of the secrecy capacities of the involved wiretap channels, was given. We prove in Section 3.2 that the models whose secrecy capacity matches this upper bound include all compound wiretap channels with CSI at the transmitter. At the same time we achieve this bound with a substantially stronger security criterion, employed already in [5], [6], [7], and [8]. Indeed, our security proof follows closely that developed in [8] for a single wiretap channel with classical input and quantum output. In order to achieve secrecy we follow the common approach according to which randomised encoding is a permissible operation. Usually, the legitimate decoder can decode the sent codeword, which represents both the message to be transmitted and the outcome of the random experiment. However, in the case of the compound wiretap channel with CSI at the transmitter this strategy does not work, as is illustrated by an example in Section 4.1. We resolve this difficulty by developing a decoding strategy which is independent of the particular channel realisation and is insensitive to randomisation, while decoding at the optimal secrecy rate for all channels $\{W_t : t = 1, \ldots, T\}$ simultaneously.

* Electronic address: [email protected]
** Electronic address: [email protected]
*** Electronic address: [email protected]
Moreover, a slight modification of our proofs allows us to determine the capacity of the compound wiretap channel without CSI by a (non-computable) multi-letter expression. This is the content of Section 3.3. We should mention, however, that the traditional proof strategy of sending the pair consisting of message and randomisation parameter to the legitimate receiver works as well in the case where the transmitter has no CSI. The lower bound on the secrecy capacity, which we prove under the strong secrecy criterion, has been used for parts of the secrecy results for arbitrarily varying wiretap channels in [3]. The lower bound on the secrecy capacity as well as the multi-letter expression were given earlier in [1] and [2], respectively, for weaker secrecy criteria, but without detailed proofs.

In Section 4.2 we give an example of a compound wiretap channel such that both the set of channels to the legitimate receiver and the set of channels to the eavesdropper are convex, but whose secrecy capacities with CSI and without CSI at the transmitter are different. Indeed, the former is positive while the latter is equal to zero. Section 3.4 is devoted to the practically important model $\mathfrak{W} = \{(W_t, V_s) : t = 1, \ldots, T,\ s = 1, \ldots, S\}$ with the assumption that the transmitter has CSI for the $T$-part but no CSI for the $S$-part of the channel. Here again we provide a multi-letter expression for the capacity. Additionally, we give a computable description of the secrecy capacity in the case where the channels to the eavesdropper are degraded versions of those to the legitimate receiver. Our results are easily extended to arbitrary sets (even uncountable) of wiretap channels via standard approximation techniques [9].

2. COMPOUND WIRETAP CHANNELS

Let
$A, B, C$ be finite sets and $\theta = \{1, \ldots, T\}$ an index set. We consider two families of channels $W_t : A \to P(B)$, $V_t : A \to P(C)$, $t \in \theta$, where $P(B)$ denotes the set of probability distributions on $B$. We collectively abbreviate these families by $\mathfrak{W}$ and call $\mathfrak{W}$ the compound wiretap channel generated by the given families of channels. Here the first family represents the communication link to the legitimate receiver, while the output of the latter is under control of the eavesdropper. In the rest of the paper expressions like $W_t^{\otimes n}$ or $V_t^{\otimes n}$ stand for the $n$-th memoryless extension of the stochastic matrices $W_t$, $V_t$.

An $(n, J_n)$ code for the compound wiretap channel $\mathfrak{W}$ consists of a stochastic encoder $E : \mathcal{J}_n \to P(A^n)$ (a stochastic matrix) with a message set $\mathcal{J}_n := \{1, \ldots, J_n\}$ and a collection of mutually disjoint decoding sets $\{D_j \subset B^n : j \in \mathcal{J}_n\}$. The maximum error probability of an $(n, J_n)$ code $\mathcal{C}_n$ is given by

$$e(\mathcal{C}_n) := \max_{t \in \theta} \max_{j \in \mathcal{J}_n} \sum_{x^n \in A^n} E(x^n|j)\, W_t^{\otimes n}(D_j^c \mid x^n), \qquad (1)$$

i.e. neither the sender nor the receiver has CSI. If channel state information is available at the transmitter, the notion of an $(n, J_n)$ code is modified in that the encoding may depend on the channel index, while the decoding sets remain universal, i.e. independent of the channel index $t$. The probability of error in (1) then changes to

$$e_{\mathrm{CSI}}(\mathcal{C}_n) := \max_{t \in \theta} \max_{j \in \mathcal{J}_n} \sum_{x^n \in A^n} E_t(x^n|j)\, W_t^{\otimes n}(D_j^c \mid x^n).$$

We assume throughout the paper that the eavesdropper always knows which channel is in use.
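To make the definitions concrete, here is a minimal numerical sketch (our own toy construction, not an example from the paper): a compound channel consisting of $T = 2$ binary symmetric channels, blocklength $n = 1$, a deterministic encoder for two messages, and the evaluation of the maximum error probability $e(\mathcal{C}_n)$ of Eq. (1).

```python
# Toy evaluation of e(C_n) from Eq. (1): a compound channel with T = 2
# binary symmetric channels, blocklength n = 1, a deterministic encoder for
# J_n = 2 messages, and mutually disjoint decoding sets D_0, D_1.

W = [
    [[0.9, 0.1], [0.1, 0.9]],   # W_1: BSC with crossover probability 0.1
    [[0.8, 0.2], [0.2, 0.8]],   # W_2: BSC with crossover probability 0.2
]
E = {0: {0: 1.0}, 1: {1: 1.0}}  # stochastic encoder E(x|j): message j -> input j
D = {0: {0}, 1: {1}}            # decoding sets, independent of the channel index

def max_error(W, E, D):
    """e(C_n) = max_t max_j sum_x E(x|j) * W_t(D_j^c | x)."""
    worst = 0.0
    for Wt in W:                              # maximum over channel states t
        for j, Ex in E.items():               # maximum over messages j
            err = sum(prob * sum(Wt[x][y] for y in range(2) if y not in D[j])
                      for x, prob in Ex.items())
            worst = max(worst, err)
    return worst

print(max_error(W, E, D))   # → 0.2 (the noisier channel W_2 dominates)
```

Note that the decoder is universal: the same sets $D_j$ are used for every channel state, so the worst (noisiest) channel determines the maximum error probability.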
Definition 2.1.
A non-negative number $R$ is an achievable secrecy rate for the compound wiretap channel $\mathfrak{W}$ with or without CSI, respectively, if there is a sequence $(\mathcal{C}_n)_{n \in \mathbb{N}}$ of $(n, J_n)$ codes such that

$$\lim_{n \to \infty} e(\mathcal{C}_n) = 0 \quad \text{resp.} \quad \lim_{n \to \infty} e_{\mathrm{CSI}}(\mathcal{C}_n) = 0, \qquad \liminf_{n \to \infty} \tfrac{1}{n} \log J_n \geq R,$$

and

$$\lim_{n \to \infty} \max_{t \in \theta} I(J; Z_t^n) = 0, \qquad (2)$$

where $J$ is a uniformly distributed random variable taking values in $\mathcal{J}_n$ and $Z_t^n$ are the resulting random variables at the output of the eavesdropper's channel $V_t^{\otimes n}$. The secrecy capacity in either scenario is given by the largest achievable secrecy rate and is denoted by $C_S(\mathfrak{W})$ and $C_{S,\mathrm{CSI}}(\mathfrak{W})$.

A weaker and widely used security criterion is obtained if we replace (2) by $\lim_{n \to \infty} \max_{t \in \theta} \tfrac{1}{n} I(J; Z_t^n) = 0$. We prefer to follow [5], [7], and [8] and require the validity of (2). A nice discussion of the interrelation of several secrecy criteria is contained in [2]. We confine ourselves to giving some hints on the operational meaning of the requirement (2). To this end we restrict our attention to the case where the transmitter has no CSI, in order to simplify our notation. The case of the compound wiretap channel with CSI at the transmitter can be treated accordingly.

Set $\varepsilon_n := \max_{t \in \theta} I(J; Z_t^n)$, so that $\lim_{n \to \infty} \varepsilon_n = 0$. Then Pinsker's inequality implies that

$$\| p_{J Z_t^n} - p_J \otimes p_{Z_t^n} \| \leq c \sqrt{\varepsilon_n} \quad \forall t \in \theta, \qquad (3)$$

with a positive universal constant $c$, where $\| \cdot \|$ is the variational distance. Suppose that the eavesdropper chooses for each $t \in \theta$ decoding sets $\{K_{j,t} \subset C^n : j \in \mathcal{J}_n\}$ with $C^n = \bigcup_{j \in \mathcal{J}_n} K_{j,t}$. We will lower bound the average error probability (and consequently the maximum error probability) for every choice of the decoding rule the eavesdropper might make. Set

$$e_{\mathrm{av}}(t) := \frac{1}{J_n} \sum_{j \in \mathcal{J}_n} \sum_{x^n \in A^n} E(x^n|j)\, V_t^{\otimes n}(K_{j,t}^c \mid x^n).$$
Then

$$e_{\mathrm{av}}(t) = \sum_{j \in \mathcal{J}_n} p_{J Z_t^n}(\{j\} \times K_{j,t}^c) = p_{J Z_t^n}\Big( \bigcup_{j \in \mathcal{J}_n} \{j\} \times K_{j,t}^c \Big)$$
$$\geq p_J \otimes p_{Z_t^n}\Big( \bigcup_{j \in \mathcal{J}_n} \{j\} \times K_{j,t}^c \Big) - c\sqrt{\varepsilon_n} = \sum_{j \in \mathcal{J}_n} p_J \otimes p_{Z_t^n}(\{j\} \times K_{j,t}^c) - c\sqrt{\varepsilon_n}$$
$$= \frac{1}{J_n} \sum_{j \in \mathcal{J}_n} p_{Z_t^n}(K_{j,t}^c) - c\sqrt{\varepsilon_n} = \frac{J_n - 1}{J_n} - c\sqrt{\varepsilon_n} = 1 - \frac{1}{J_n} - c\sqrt{\varepsilon_n}, \qquad (4)$$

where in the first and the third line we have used the fact that the sets $\{j\} \times K_{j,t}^c$, $j \in \mathcal{J}_n$, are mutually disjoint, the second line follows from (3), and in the fourth line we merely observed that for any non-negative numbers $a_1, \ldots, a_J$ with $\sum_{j=1}^J a_j = 1$ we have $\sum_{j=1}^J (1 - a_j) = J - 1$. Consequently, the average (and hence maximum) error probability of every decoding strategy the eavesdropper might select tends to one as soon as $J_n \to \infty$. It should be remarked, however, that although for the vast majority of messages the eavesdropper will be in error, there is still a possibility left that she/he can decode a small fraction of them correctly. As will follow from the proofs below, we will have $\varepsilon_n = 2^{-na}$, $a > 0$, and $J_n = 2^{nR}$, $R > 0$, if the secrecy capacity is positive, so that the speed of convergence in (4) will be exponential.

Notice that (3) means that the random variables $Z_t^n$ at the output of the channel to the eavesdropper are almost independent of the random variable $J$ embodying the messages to be transmitted to the legitimate receiver. Therefore it is heuristically convincing that our criterion (2) offers secrecy to some extent for communication tasks going beyond the transmission of messages. To demonstrate this by an example we introduce, based on [10], the notion of an identification attack as follows. Suppose that for each fixed $t \in \theta$ and any $j \in \mathcal{J}_n$ there is a subset $K_{j,t} \subset C^n$ of the eavesdropper's output alphabet, where now the sets $K_{j,t}$ need not necessarily be mutually disjoint.
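The estimate (3) is Pinsker's inequality applied to $I(J; Z_t^n) = D(p_{JZ_t^n} \| p_J \otimes p_{Z_t^n})$; with mutual information measured in bits one may take $c = \sqrt{2\ln 2}$. A quick numerical sanity check on a toy joint distribution of our own choosing (not from the paper):

```python
import math

# Toy check of the Pinsker-type bound (3): the variational distance between
# p_JZ and p_J (x) p_Z is at most sqrt(2 ln 2 * I(J;Z)), with I(J;Z) in bits.
p_JZ = [[0.4, 0.1],
        [0.1, 0.4]]                              # joint distribution of (J, Z)

p_J = [sum(row) for row in p_JZ]                 # marginal of J
p_Z = [sum(col) for col in zip(*p_JZ)]           # marginal of Z

# mutual information I(J;Z) = D(p_JZ || p_J (x) p_Z), in bits
mi = sum(p_JZ[j][z] * math.log2(p_JZ[j][z] / (p_J[j] * p_Z[z]))
         for j in range(2) for z in range(2) if p_JZ[j][z] > 0)

# variational distance || p_JZ - p_J (x) p_Z || = sum of absolute differences
dist = sum(abs(p_JZ[j][z] - p_J[j] * p_Z[z])
           for j in range(2) for z in range(2))

print(dist <= math.sqrt(2 * math.log(2) * mi))   # → True
```

For this example the distance is about 0.6 while the Pinsker bound evaluates to roughly 0.62, so the inequality is nearly tight here.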
With $E : \mathcal{J}_n \to P(A^n)$ being the stochastic encoder used to transmit messages to the legitimate receiver, we can write down the identification errors of the first and second kind (cf. [10] for further explanation of this code concept) for the eavesdropper's channel as

$$\sum_{x^n \in A^n} E(x^n|j)\, V_t^{\otimes n}(K_{j,t}^c \mid x^n), \qquad (5)$$

and

$$\sum_{x^n \in A^n} E(x^n|i)\, V_t^{\otimes n}(K_{j,t} \mid x^n) \qquad (6)$$

for $j, i \in \mathcal{J}_n$, $i \neq j$.

One possible interpretation of this attack, again based on [10], is that on the eavesdropper's side of the channel there are persons $F_1, \ldots, F_{J_n}$ observing the output of the channel. The sole interest of $F_j$ is whether or not the message $j$ has been sent to the legitimate receiver. Thus $F_j$ performs the hypothesis test represented by $K_{j,t}$, based on his/her knowledge of $t \in \theta$, and (5), (6) are just the errors of the first resp. second kind for that hypothesis test. Let us define for $j \in \mathcal{J}_n$

$$g(j,t) := \sum_{x^n \in A^n} \Big( E(x^n|j)\, V_t^{\otimes n}(K_{j,t}^c \mid x^n) + \frac{1}{J_n - 1} \sum_{\substack{i=1 \\ i \neq j}}^{J_n} E(x^n|i)\, V_t^{\otimes n}(K_{j,t} \mid x^n) \Big),$$

which is a number in $[0, 2]$. Notice that if $g(j,t) \geq 1 - \eta$ for some $\eta \in (0,1)$, then either

$$\sum_{x^n \in A^n} E(x^n|j)\, V_t^{\otimes n}(K_{j,t}^c \mid x^n) \geq \frac{1-\eta}{2},$$

or there is at least one $i \neq j$ with

$$\sum_{x^n \in A^n} E(x^n|i)\, V_t^{\otimes n}(K_{j,t} \mid x^n) \geq \frac{1-\eta}{2},$$

or both, so that no reliable identification of message $j$ can be guaranteed. We show now that under the assumption (2) we have

$$\frac{1}{J_n} \sum_{j=1}^{J_n} g(j,t) \geq 1 - \eta_n, \qquad \eta_n = o(1), \qquad (7)$$

so that at most a fraction $\frac{2}{3}(1+\eta_n)$ of the $j \in \mathcal{J}_n$ can satisfy the inequality $g(j,t) < \frac{1}{2}$. This last assertion is readily seen from (7) by applying Markov's inequality to the set $F := \{ j \in \mathcal{J}_n : 2 - g(j,t) > \frac{3}{2} \}$.
In order to prove (7), note that for any $t \in \theta$

$$\frac{1}{J_n}\sum_{j=1}^{J_n} g(j,t) = \sum_{j=1}^{J_n} \Big( p_{JZ_t^n}(\{j\} \times K_{j,t}^c) + \frac{1}{J_n-1}\, p_{JZ_t^n}(\{j\}^c \times K_{j,t}) \Big)$$
$$= p_{JZ_t^n}\Big( \bigcup_{j \in \mathcal{J}_n} \{j\} \times K_{j,t}^c \Big) + \frac{1}{J_n-1} \sum_{j=1}^{J_n} p_{JZ_t^n}(\{j\}^c \times K_{j,t})$$
$$\geq p_J \otimes p_{Z_t^n}\Big( \bigcup_{j \in \mathcal{J}_n} \{j\} \times K_{j,t}^c \Big) + \frac{1}{J_n-1} \sum_{j=1}^{J_n} p_J \otimes p_{Z_t^n}(\{j\}^c \times K_{j,t}) - c\sqrt{\varepsilon_n} - c\,\frac{J_n}{J_n-1}\sqrt{\varepsilon_n}$$
$$= \frac{1}{J_n} \sum_{j=1}^{J_n} \big( p_{Z_t^n}(K_{j,t}^c) + p_{Z_t^n}(K_{j,t}) \big) - c\sqrt{\varepsilon_n}\,\frac{2J_n-1}{J_n-1} = 1 - c\sqrt{\varepsilon_n}\,\frac{2J_n-1}{J_n-1},$$

where in the third line we have used (3) and in the fourth we inserted $p_J(\{j\}^c) = \frac{J_n-1}{J_n}$. Hence (7) holds with $\eta_n := c\sqrt{\varepsilon_n}\,\frac{2J_n-1}{J_n-1} \leq 3c\sqrt{\varepsilon_n}$ (for $J_n \geq 2$), which tends to zero.

Besides the attempts of the eavesdropper to decode or identify messages, we can introduce attacks corresponding to each communication task introduced in [11]. It would be interesting, not only from the mathematical point of view, to see against which of them and to what extent secrecy can be guaranteed by the condition (2).

3. CAPACITY RESULTS

In what follows we use the notation as well as some properties of typical and conditionally typical sequences from [12]. For $p \in P(A)$, $W : A \to P(B)$, $x^n \in A^n$, and $\delta > 0$ we denote by $T_{p,\delta}^n$ the set of typical sequences and by $T_{W,\delta}^n(x^n)$ the set of conditionally typical sequences given $x^n$, in the sense of [12]. The basic properties of these sets that are needed in the sequel are summarised in the following three lemmata.

Lemma 3.1.
Fixing $\delta > 0$, for every $p \in P(A)$ and $W : A \to P(B)$ we have

$$p^{\otimes n}(T_{p,\delta}^n) \geq 1 - (n+1)^{|A|}\, 2^{-nc\delta^2},$$
$$W^{\otimes n}(T_{W,\delta}^n(x^n) \mid x^n) \geq 1 - (n+1)^{|A||B|}\, 2^{-nc\delta^2}$$

for all $x^n \in A^n$, with $c = 1/(2 \ln 2)$. In particular, there is $n_0 \in \mathbb{N}$ such that for each $\delta > 0$, $p \in P(A)$, $W : A \to P(B)$, and $n > n_0$,

$$p^{\otimes n}(T_{p,\delta}^n) \geq 1 - 2^{-nc'\delta^2}, \qquad W^{\otimes n}(T_{W,\delta}^n(x^n) \mid x^n) \geq 1 - 2^{-nc'\delta^2}$$

holds with $c' = c/2$.

Proof. Standard Bernstein-Sanov trick using the properties of types from [12] and Pinsker's inequality. The details can be found in [13] and references therein, for example.

Recall that for $p \in P(A)$ and $W : A \to P(B)$, $pW \in P(B)$ denotes the output distribution generated by $p$ and $W$, and that $x^n \in T_{p,\delta}^n$ and $y^n \in T_{W,\delta}^n(x^n)$ imply that $y^n \in T_{pW,|A|\delta}^n$.

Lemma 3.2.
Let $x^n \in T_{p,\delta}^n$. Then for $V : A \to P(C)$

$$|T_{pV,|A|\delta}^n| \leq \alpha^{-1}, \qquad V^n(z^n \mid x^n) \leq \beta$$

for all $z^n \in T_{V,\delta}^n(x^n)$, where

$$\alpha = 2^{-n(H(pV) + f_1(\delta))}, \qquad (8)$$
$$\beta = 2^{-n(H(V|p) - f_2(\delta))}, \qquad (9)$$

with universal $f_1(\delta), f_2(\delta) > 0$ satisfying $\lim_{\delta \to 0} f_1(\delta) = 0 = \lim_{\delta \to 0} f_2(\delta)$.

Proof. Cf. [12].

In addition we need a further lemma which will be used to determine the rates at which reliable transmission to the legitimate receiver is possible.
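The quantities governing the exponents in (8) and (9) are the output entropy $H(pV)$ and the conditional entropy $H(V|p)$; their difference $I(p,V) = H(pV) - H(V|p)$ is the mutual information that enters all the rate expressions below. A minimal computational sketch with toy numbers of our own choosing:

```python
import math

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Input distribution p on A = {0, 1} and a channel V given as a stochastic
# matrix (rows indexed by inputs); all values are toy numbers of our choosing.
p = [0.5, 0.5]
V = [[0.9, 0.1],
     [0.2, 0.8]]

pV = [sum(p[a] * V[a][c] for a in range(2)) for c in range(2)]  # output dist
H_out  = entropy(pV)                                  # H(pV), exponent of alpha
H_cond = sum(p[a] * entropy(V[a]) for a in range(2))  # H(V|p), exponent of beta

I_pV = H_out - H_cond    # mutual information I(p, V) = H(pV) - H(V|p)
print(round(I_pV, 4))
```

Informally, $2^{nH(pV)}$ counts the typical output sequences and $2^{-nH(V|p)}$ is the size of a single conditional probability, which is why the ratio $2^{nI(p,V)}$ controls how many codewords can be packed or, on the eavesdropper's side, must be spent on randomisation.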
Lemma 3.3.
Let $p, \tilde{p} \in P(A)$ and two stochastic matrices $W, \widetilde{W} : A \to P(B)$ be given. Further let $q \in P(B)$ be the output distribution generated by $p$ and $W$. Fix $\delta \in (0, \frac{1}{2|A||B|})$. Then for every $n \in \mathbb{N}$

$$q^{\otimes n}(T_{\widetilde{W},\delta}^n(\tilde{x}^n)) \leq (n+1)^{|A||B|}\, 2^{-n(I(\tilde{p},\widetilde{W}) - f_3(\delta))}$$

holds for all $\tilde{x}^n \in T_{\tilde{p},\delta}^n$, with a universal $f_3(\delta) > 0$ and $\lim_{\delta \to 0} f_3(\delta) = 0$.

Proof. The proof can be found in [13] but is given here for the sake of completeness. Let $\tilde{x}^n \in T_{\tilde{p},\delta}^n$ and $y^n \in T_{\widetilde{W},\delta}^n(\tilde{x}^n)$. Then, with the empirical distribution $p_{y^n}(b) = \frac{N(b|y^n)}{n}$, $b \in B$, it follows from the standard identity for types in [12] that

$$q^n(y^n) = 2^{-n(D(p_{y^n} \| q) + H(p_{y^n}))} \leq 2^{-nH(p_{y^n})},$$

where the inequality holds since $D(p_{y^n} \| q) \geq 0$. Because $\tilde{x}^n \in T_{\tilde{p},\delta}^n$ and $y^n \in T_{\widetilde{W},\delta}^n(\tilde{x}^n)$, it follows as above that $y^n \in T_{\tilde{q},|A|\delta}^n$, where $\tilde{q}$ is the output distribution generated by $\tilde{p}$ and $\widetilde{W}$, and thus

$$\sum_{b \in B} | p_{y^n}(b) - \tilde{q}(b) | \leq |A||B|\delta.$$

By the continuity of the entropy function it follows that

$$|H(p_{y^n}) - H(\tilde{q})| \leq -|A||B|\delta \log\frac{|A||B|\delta}{|B|} =: \varphi(\delta)$$

with $\lim_{\delta \to 0} \varphi(\delta) = 0$. By the last two inequalities we obtain

$$q^n(T_{\widetilde{W},\delta}^n(\tilde{x}^n)) \leq |T_{\widetilde{W},\delta}^n(\tilde{x}^n)|\, 2^{-n(H(\tilde{q}) - \varphi(\delta))}. \qquad (10)$$

By the proof of Lemma 3.2 it follows that

$$|T_{\widetilde{W},\delta}^n(\tilde{x}^n)| \leq (n+1)^{|A||B|}\, 2^{n(H(\widetilde{W}|\tilde{p}) + \psi(\delta))}$$

with $\psi(\delta) > 0$ and $\lim_{\delta \to 0} \psi(\delta) = 0$. Then from (10), by defining $f_3(\delta) := \varphi(\delta) + \psi(\delta)$ and using $H(\tilde{q}) - H(\widetilde{W}|\tilde{p}) = I(\tilde{p},\widetilde{W})$, we end up with

$$q^n(T_{\widetilde{W},\delta}^n(\tilde{x}^n)) \leq (n+1)^{|A||B|}\, 2^{-n(I(\tilde{p},\widetilde{W}) - f_3(\delta))}.$$

The assertion still holds if we replace $\widetilde{W}$ by $W$ and $\tilde{p}$ by $p$ throughout the proof.

The last lemma is a standard result from large deviation theory.

Lemma 3.4 (Chernoff-Hoeffding bounds). Let $Z_1, \ldots, Z_L$ be i.i.d. random variables with values in $[0,1]$ and expectation $\mathbb{E} Z_i = \mu$, and let $0 < \epsilon < 1/2$.
Then it follows that

$$\Pr\Big\{ \frac{1}{L}\sum_{i=1}^{L} Z_i \notin [(1 \pm \epsilon)\mu] \Big\} \leq 2\exp\Big( -L \cdot \frac{\epsilon^2 \mu}{3} \Big),$$

where $[(1 \pm \epsilon)\mu]$ denotes the interval $[(1-\epsilon)\mu, (1+\epsilon)\mu]$.

Proof. The proof is given in [14] and in [15].

First we consider the case in which the transmitter has full knowledge of the channel state (CSI) while the legitimate receiver has no information about the channel state. The main result in this section is the following theorem.
Theorem 3.5.
The secrecy capacity of the compound wiretap channel $\mathfrak{W}$ with CSI at the transmitter is given by

$$C_{S,\mathrm{CSI}}(\mathfrak{W}) = \min_{t \in \theta} \max_{U_t \to X_t \to (YZ)_t} \big( I(U_t; Y_t) - I(U_t; Z_t) \big).$$

Here $X_t$ is a random variable with probability distribution in $P(A)$ and $U_t$ is an auxiliary random variable with range equal to $A$, such that $U_t, X_t, (YZ)_t$ form a Markov chain $U_t \to X_t \to (YZ)_t$ in this order. The maximum refers to all random variables satisfying the Markov chain condition such that $X_t$ is connected with $Y_t$ resp. $Z_t$ by the channels $W_t$ resp. $V_t$ for every $t \in \theta$.

Notice first that the inequality

$$C_{S,\mathrm{CSI}}(\mathfrak{W}) \leq \min_{t \in \theta} \max_{U_t \to X_t \to (YZ)_t} \big( I(U_t; Y_t) - I(U_t; Z_t) \big)$$

is trivially true, since we cannot exceed the secrecy capacity of the worst wiretap channel in the family $\mathfrak{W}$. This has already been pointed out in [1]. The rest of this section is devoted to the proof of achievability.

Proof.
It suffices to prove that $\min_{t \in \theta} ( I(X_t; Y_t) - I(X_t; Z_t) )$ for $(XYZ)_t$ as above is an achievable secrecy rate. Then we will have shown that $R = \min_{t \in \theta} ( I(U_t; Y_t) - I(U_t; Z_t) )$, with $U_t \to X_t \to (YZ)_t$ forming a Markov chain, is an achievable secrecy rate (cf. [12]). We choose $p_1, \ldots, p_T \in P(A)$ and define new probability distributions on $A^n$ by

$$p'_t(x^n) := \begin{cases} \dfrac{p_t^{\otimes n}(x^n)}{p_t^{\otimes n}(T_{p_t,\delta}^n)} & \text{if } x^n \in T_{p_t,\delta}^n, \\ 0 & \text{otherwise}. \end{cases} \qquad (11)$$

Then define for $z^n \in C^n$, $x^n \in A^n$

$$\tilde{Q}_{t,x^n}(z^n) = V_t^n(z^n \mid x^n) \cdot \mathbb{1}_{T_{V_t,\delta}^n(x^n)}(z^n)$$

on $C^n$. Additionally, we set for $z^n \in C^n$

$$\Theta'_t(z^n) = \sum_{x^n \in T_{p_t,\delta}^n} p'_t(x^n)\, \tilde{Q}_{t,x^n}(z^n). \qquad (12)$$

Now let $S := \{ z^n \in C^n : \Theta'_t(z^n) \geq \epsilon \alpha_t \}$, where $\epsilon = 2^{-nc'\delta^2}$ (cf. Lemma 3.1) and $\alpha_t$ is from (8) in Lemma 3.2, computed with respect to $p_t$ and $V_t$. By Lemma 3.2 the support of $\Theta'_t$ has cardinality at most $\alpha_t^{-1}$, since for each $x^n \in T_{p_t,\delta}^n$ it holds that $T_{V_t,\delta}^n(x^n) \subset T_{p_t V_t, |A|\delta}^n$; hence restricting to $S$ removes at most $\epsilon\alpha_t \cdot \alpha_t^{-1} = \epsilon$ of probability, which implies that $\sum_{z^n \in S} \Theta_t(z^n) \geq 1 - 2\epsilon$ if we set

$$\Theta_t(z^n) = \Theta'_t(z^n) \cdot \mathbb{1}_S(z^n) \quad \text{and} \quad Q_{t,x^n}(z^n) = \tilde{Q}_{t,x^n}(z^n) \cdot \mathbb{1}_S(z^n). \qquad (13)$$

Now for each $t \in \theta$ define $J_n \cdot L_{n,t}$ i.i.d. random variables $X_{jl}^{(t)}$ with $j \in [J_n] := \{1, \ldots, J_n\}$ and $l \in [L_{n,t}] := \{1, \ldots, L_{n,t}\}$, each of them distributed according to $p'_t$, with

$$J_n = \big\lfloor 2^{n[\min_{t \in \theta}(I(p_t,W_t) - I(p_t,V_t)) - \tau]} \big\rfloor \qquad (14)$$
$$L_{n,t} = \big\lfloor 2^{n[I(p_t,V_t) + \tau/4]} \big\rfloor \qquad (15)$$

for $\tau > 0$. Moreover, we suppose that the random matrices $\{X_{jl}^{(t)}\}_{j \in [J_n], l \in [L_{n,t}]}$ and $\{X_{jl}^{(t')}\}_{j \in [J_n], l \in [L_{n,t'}]}$ are independent for $t \neq t'$. Now it is obvious from (12) and the definition of the set $S$ that for any $z^n \in S$

$$\Theta_t(z^n) = \mathbb{E}\, Q_{t,X_{jl}^{(t)}}(z^n) \geq \epsilon \alpha_t,$$

where $\mathbb{E}$ is the expectation with respect to the distribution $p'_t$.
For the random variables $\beta_t^{-1} Q_{t,X_{jl}^{(t)}}(z^n)$ define the event

$$\iota_j(t) = \bigcap_{z^n \in C^n} \Big\{ \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} Q_{t,X_{jl}^{(t)}}(z^n) \in [(1 \pm \epsilon)\Theta_t(z^n)] \Big\}, \qquad (16)$$

and, keeping in mind that $\Theta_t(z^n) \geq \epsilon \alpha_t$ for all $z^n \in S$, it follows that for all $j \in [J_n]$ and all $t \in \theta$

$$\Pr\{(\iota_j(t))^c\} \leq 2|C|^n \exp\Big( -\frac{L_{n,t}}{3}\, 2^{-n[I(p_t,V_t) + g(\delta)]} \Big) \qquad (17)$$

by Lemma 3.4, Lemma 3.2, and our choice $\epsilon = 2^{-nc'\delta^2}$, with $g(\delta) := f_1(\delta) + f_2(\delta) + 3c'\delta^2$. Making $\delta > 0$ small enough that $g(\delta) \leq \tau/8$, we have for all sufficiently large $n \in \mathbb{N}$

$$L_{n,t}\, 2^{-n[I(p_t,V_t) + g(\delta)]} \geq 2^{n\tau/8}.$$

Thus, for this choice of $\delta$, the RHS of (17) is double exponentially small in $n$, uniformly in $t \in \theta$, and can be made smaller than $\epsilon J_n^{-1}$ for all $j \in [J_n]$ and all sufficiently large $n \in \mathbb{N}$, i.e.

$$\Pr\{(\iota_j(t))^c\} \leq \epsilon J_n^{-1} \quad \forall t \in \theta. \qquad (18)$$

Let us turn now to the coding part of the problem. Let $p'_t \in P(A^n)$ be given as in (11). We abbreviate $X := \{X^{(t)}\}_{t \in \theta}$ for the family of random matrices $X^{(t)} = \{X_{jl}^{(t)}\}_{j \in [J_n], l \in [L_{n,t}]}$ whose components are i.i.d. according to $p'_t$. We will show now how reliable transmission of the message $j \in [J_n]$ can be achieved when randomising over the index $l \in [L_{n,t}]$, without any attempt to decode the randomisation parameter at the legitimate receiver (see Section 4.1). To this end let us define for each $j \in [J_n]$ a random set

$$D'_j(X) := \bigcup_{s \in \theta} \bigcup_{k \in [L_{n,s}]} T_{W_s,\delta}^n(X_{jk}^{(s)}),$$

and the subordinate random decoder $\{D_j(X)\}_{j \in [J_n]} \subseteq B^n$ is given by

$$D_j(X) := D'_j(X) \cap \Big( \bigcup_{\substack{j' \in [J_n] \\ j' \neq j}} D'_{j'}(X) \Big)^c. \qquad (19)$$

Consequently we can define the random average probability of error for a specific channel $t \in \theta$ by

$$\lambda_n^{(t)}(X) := \frac{1}{J_n} \sum_{j \in [J_n]} \frac{1}{L_{n,t}} \sum_{l \in [L_{n,t}]} W_t^{\otimes n}\big( (D_j(X))^c \mid X_{jl}^{(t)} \big). \qquad (20)$$
Now (19) implies, for each $t \in \theta$ and $l \in [L_{n,t}]$,

$$W_t^{\otimes n}((D_j(X))^c \mid X_{jl}^{(t)}) \leq W_t^{\otimes n}\Big( \bigcap_{s \in \theta} \bigcap_{k \in [L_{n,s}]} (T_{W_s,\delta}^n(X_{jk}^{(s)}))^c \,\Big|\, X_{jl}^{(t)} \Big) + \sum_{\substack{j' \in [J_n] \\ j' \neq j}} \sum_{s \in \theta} \sum_{k \in [L_{n,s}]} W_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \mid X_{jl}^{(t)} \big)$$
$$\leq W_t^{\otimes n}\big( (T_{W_t,\delta}^n(X_{jl}^{(t)}))^c \mid X_{jl}^{(t)} \big) + \sum_{\substack{j' \in [J_n] \\ j' \neq j}} \sum_{s \in \theta} \sum_{k \in [L_{n,s}]} W_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \mid X_{jl}^{(t)} \big), \qquad (21)$$

where the second inequality follows by the monotonicity of the probability. By Lemma 3.1 and the independence of all involved random variables we obtain

$$\mathbb{E}_X\big( W_t^{\otimes n}((D_j(X))^c \mid X_{jl}^{(t)}) \big) \leq (n+1)^{|A||B|}\, 2^{-nc\delta^2} + \sum_{\substack{j' \in [J_n] \\ j' \neq j}} \sum_{s \in \theta} \sum_{k \in [L_{n,s}]} \mathbb{E}_{X_{j'k}^{(s)}} \mathbb{E}_{X_{jl}^{(t)}} W_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \mid X_{jl}^{(t)} \big). \qquad (22)$$

We shall now find, for $j' \neq j$, an upper bound on

$$\mathbb{E}_{X_{jl}^{(t)}} W_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \mid X_{jl}^{(t)} \big) = \sum_{x^n \in A^n} p'_t(x^n)\, W_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \mid x^n \big) \leq \sum_{x^n \in A^n} \frac{p_t^{\otimes n}(x^n)}{p_t^{\otimes n}(T_{p_t,\delta}^n)}\, W_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \mid x^n \big) = \frac{q_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \big)}{p_t^{\otimes n}(T_{p_t,\delta}^n)}. \qquad (23)$$

By Lemma 3.1 and Lemma 3.3, for any $t, s \in \theta$ we have

$$p_t^{\otimes n}(T_{p_t,\delta}^n) \geq 1 - (n+1)^{|A|}\, 2^{-nc\delta^2}, \qquad q_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \big) \leq (n+1)^{|A||B|}\, 2^{-n(I(p_s,W_s) - f_3(\delta))} \qquad (24)$$

with a universal $f_3(\delta) > 0$ satisfying $\lim_{\delta \to 0} f_3(\delta) = 0$, since $X_{j'k}^{(s)} \in T_{p_s,\delta}^n$ with probability 1. Thus, inserting this into (23), we obtain

$$\mathbb{E}_{X_{jl}^{(t)}} W_t^{\otimes n}\big( T_{W_s,\delta}^n(X_{j'k}^{(s)}) \mid X_{jl}^{(t)} \big) \leq \frac{(n+1)^{|A||B|}}{1 - (n+1)^{|A|}\, 2^{-nc\delta^2}}\, 2^{-n(I(p_s,W_s) - f_3(\delta))}$$

for all $s, t \in \theta$, all $j' \neq j$, and all $l \in [L_{n,t}]$, $k \in [L_{n,s}]$.
Now define $\nu_n(\delta) := (n+1)^{|A||B|}\, 2^{-nc\delta^2}$ and $\mu_n(\delta) := 1 - (n+1)^{|A|}\, 2^{-nc\delta^2}$. Then for each $t \in \theta$, $l \in [L_{n,t}]$, and $j \in [J_n]$, (22) and (23) lead to

$$\mathbb{E}_X\big( W_t^{\otimes n}((D_j(X))^c \mid X_{jl}^{(t)}) \big) \leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)}\, J_n \sum_{s \in \theta} L_{n,s}\, 2^{-n(I(p_s,W_s) - f_3(\delta))}$$
$$\leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)}\, J_n \sum_{s \in \theta} 2^{-n(I(p_s,W_s) - I(p_s,V_s) - f_3(\delta) - \tau/4)}$$
$$\leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)}\, T \cdot J_n \cdot 2^{-n(\min_{s \in \theta}(I(p_s,W_s) - I(p_s,V_s)) - f_3(\delta) - \tau/4)}$$
$$\leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)}\, T \cdot 2^{-n(\tau - f_3(\delta) - \tau/4)} \leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)}\, T \cdot 2^{-n\tau/2}, \qquad (25)$$

where we have used (15), (14), and chosen $\delta > 0$ small enough to ensure that $\tau - f_3(\delta) - \tau/4 \geq \tau/2$. Defining $a = a(\delta,\tau) := \frac{1}{2}\min\{ c\delta^2, \tau/2 \}$, we can find $n_0(\delta,\tau,|A|,|B|) \in \mathbb{N}$ such that for all $n \geq n_0(\delta,\tau,|A|,|B|)$

$$\mathbb{E}_X\big( W_t^{\otimes n}((D_j(X))^c \mid X_{jl}^{(t)}) \big) \leq 2T \cdot 2^{-na}$$

holds for all $t \in \theta$, $l \in [L_{n,t}]$, and $j \in [J_n]$. Consequently, for any $t \in \theta$ we obtain

$$\mathbb{E}_X( \lambda_n^{(t)}(X) ) \leq 2T \cdot 2^{-na}.$$

Additionally we define for any $t \in \theta$ the event

$$\iota_0(t) = \{ \lambda_n^{(t)}(X) \leq \sqrt{2T}\, 2^{-na/2} \}. \qquad (26)$$

Then, using Markov's inequality applied to $\lambda_n^{(t)}(X)$ along with (26), we obtain

$$\Pr\{(\iota_0(t))^c\} \leq \sqrt{2T}\, 2^{-na/2}. \qquad (27)$$

Set

$$\iota := \bigcap_{t \in \theta} \bigcap_{j=0}^{J_n} \iota_j(t). \qquad (28)$$

Then with (18), (27), and the union bound we obtain

$$\Pr\{\iota^c\} \leq \sum_{t \in \theta} \sum_{j=0}^{J_n} \Pr\{(\iota_j(t))^c\} \leq T \cdot \epsilon + T\sqrt{2T}\, 2^{-na/2} \leq 2^{-nc''}$$

for a suitable positive constant $c'' > 0$ and all sufficiently large $n \in \mathbb{N}$. Hence, we have shown that there exist realisations $\{ (x_{jl}^{(t)})_{j \in [J_n], l \in [L_{n,t}]} : t \in \theta \} \in \iota$ of $X$. Now, denoting by $\|\cdot\|$ the variational distance $\|p - q\| := \sum_{x \in A} |p(x) - q(x)|$, we show that the secrecy level is fulfilled uniformly in $t \in \theta$ for any particular $\{ (x_{jl}^{(t)})_{j \in [J_n], l \in [L_{n,t}]} : t \in \theta \} \in \iota$.
For all $j \in [J_n]$ and $t \in \theta$,

$$\Big\| \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} V_t^n(\cdot \mid x_{jl}^{(t)}) - \Theta_t(\cdot) \Big\| \leq \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} \big\| V_t^n(\cdot \mid x_{jl}^{(t)}) - \tilde{Q}_{t,x_{jl}^{(t)}}(\cdot) \big\| + \Big\| \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} \big( \tilde{Q}_{t,x_{jl}^{(t)}}(\cdot) - Q_{t,x_{jl}^{(t)}}(\cdot) \big) \Big\| + \Big\| \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} Q_{t,x_{jl}^{(t)}}(\cdot) - \Theta_t(\cdot) \Big\| \leq 5\epsilon. \qquad (29)$$

In the first term the functions $V_t^n(\cdot \mid x_{jl}^{(t)})$ and $\tilde{Q}_{t,x_{jl}^{(t)}}(\cdot)$ differ only for $z^n \notin T_{V_t,\delta}^n(x_{jl}^{(t)})$, so it makes a contribution of at most $\epsilon$ to the bound. In the second term $\tilde{Q}_t$ and $Q_t$ differ for $z^n \notin S$, and because $\iota_j(t)$ and $\sum_{z^n \in S} \Theta_t(z^n) \geq 1 - 2\epsilon$ imply that

$$\frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} \sum_{z^n \in S} Q_{t,x_{jl}^{(t)}}(z^n) \geq 1 - 3\epsilon,$$

the second term is bounded by $3\epsilon$. The third term is bounded by $\epsilon$, which follows directly from (16).

For any $\{(x_{jl}^{(t)})_{j \in [J_n], l \in [L_{n,t}]} : t \in \theta\} \in \iota$ with the corresponding decoding sets $\{D_j : j \in [J_n]\}$ it follows by construction that

$$\frac{1}{J_n} \sum_{j \in [J_n]} \frac{1}{L_{n,t}} \sum_{l \in [L_{n,t}]} W_t^{\otimes n}(D_j^c \mid x_{jl}^{(t)}) \leq \sqrt{2T} \cdot 2^{-na'} \qquad (30)$$

is fulfilled for all $t \in \theta$ with $a' := a/2 > 0$, which means that we have found an $(n, J_n)$ code with average error probability tending to zero, for sufficiently large $n \in \mathbb{N}$, for any channel realisation. Now by a standard expurgation scheme we show that this still holds for the maximum error probability.
We define the set

$$G_t := \Big\{ j \in \mathcal{J}_n : \frac{1}{L_{n,t}} \sum_{l \in [L_{n,t}]} W_t^{\otimes n}(D_j^c \mid x_{jl}^{(t)}) \leq \sqrt{\eta} \Big\} \qquad (31)$$

with $\eta := \sqrt{2T} \cdot 2^{-na'}$, and denote its complement by $B_t := G_t^c$ and the union of all complements by $B = \bigcup_{t \in \theta} B_t$. Then (30) and (31) imply that

$$\eta \geq \frac{1}{J_n} \sum_{j \in [J_n]} \frac{1}{L_{n,t}} \sum_{l \in [L_{n,t}]} W_t^{\otimes n}(D_j^c \mid x_{jl}^{(t)}) \geq \frac{|B_t|}{J_n} \sqrt{\eta} \quad \forall t \in \theta,$$

and by the union bound it follows that

$$|B| \leq \sum_{t \in \theta} |B_t| \leq T \cdot \sqrt{\eta} \cdot J_n.$$

After removing all $j \in B$ (which are at most a fraction $T\sqrt{\eta}$ of $J_n$) and relabeling, we obtain a new $(n, \tilde{J}_n)$ code $(E_j, D_j)_{j \in [\tilde{J}_n]}$ without changing the rate. The maximum error probability of the new code fulfills, for sufficiently large $n \in \mathbb{N}$,

$$\max_{t \in \theta} \max_{j \in [\tilde{J}_n]} \frac{1}{L_{n,t}} \sum_{l \in [L_{n,t}]} W_t^{\otimes n}(D_j^c \mid x_{jl}^{(t)}) \leq \sqrt{\eta} \leq 2T \cdot 2^{-na'/2}.$$

On the other hand, if we set

$$\hat{V}_t^n(z^n \mid (j,l)) := V_t^n(z^n \mid x_{jl}^{(t)}) \qquad (32)$$

and further define

$$\hat{V}_{t,j}^n(z^n) = \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} \hat{V}_t^n(z^n \mid (j,l)), \qquad (33)$$
$$\bar{V}_t^n(z^n) = \frac{1}{\tilde{J}_n} \sum_{j=1}^{\tilde{J}_n} \hat{V}_{t,j}^n(z^n), \qquad (34)$$

we obtain that

$$\| \hat{V}_{t,j}^n - \bar{V}_t^n \| \leq \| \hat{V}_{t,j}^n - \Theta_t \| + \| \Theta_t - \bar{V}_t^n \| \leq 10\epsilon$$

for all $j \in [\tilde{J}_n]$, $t \in \theta$, with $\epsilon = 2^{-nc'\delta^2}$, where we have used the convexity of the variational distance and (29), which still applies after our expurgation procedure. For a uniformly distributed random variable $J$ taking values in the set $\{1, \ldots, \tilde{J}_n\}$ we obtain, by the uniform continuity of the entropy function (cf. [12]),

$$I(J; Z_t^n) = \frac{1}{\tilde{J}_n} \sum_{j=1}^{\tilde{J}_n} \big( H(\bar{V}_t^n) - H(\hat{V}_{t,j}^n) \big) = H(Z_t^n) - H(Z_t^n \mid J) \leq -10\epsilon \log(10\epsilon) + 10 n \epsilon \log|C|$$

uniformly in $t \in \theta$ (for $10\epsilon \leq e^{-1}$). Hence the strong secrecy level of Definition 2.1 holds uniformly in $t \in \theta$. Using standard arguments (cf. [12]) we then have shown the achievability of the secrecy rate

$$R_S = \min_{t \in \theta} \max_{U_t \to X_t \to (YZ)_t} \big( I(U_t; Y_t) - I(U_t; Z_t) \big). \qquad (35)$$

Remark.
Note that in the case $\mathfrak{W} := \{ (W_t, V_s) : t = 1, \ldots, T,\ s = 1, \ldots, S \}$ with $S \neq T$ and the pair $(t,s)$ known to the transmitter prior to transmission, nothing new happens. A slight modification of the arguments presented above shows that

$$C_{S,\mathrm{CSI}}(\mathfrak{W}) = \min_{(t,s)} \max_{U \to X \to (Y_t Z_s)} \big( I(U; Y_t) - I(U; Z_s) \big).$$

In the previous section we have assumed that the channel state is known to the transmitter. We now consider the case where neither the transmitter nor the receiver has knowledge of the channel state. We will prove that
Theorem 3.6.
For the secrecy capacity $C_S(\mathfrak{W})$ of the compound wiretap channel $\mathfrak{W}$ without CSI it holds that

$$C_S(\mathfrak{W}) \geq \max_{p \in P(A)} \Big( \min_{t \in \theta} I(p, W_t) - \max_{t \in \theta} I(p, V_t) \Big).$$

Proof.
Owing to the lack of channel knowledge, we use a stochastic encoder independent of the channel realisation. For any $p \in P(A)$ let $p' \in P(A^n)$ be the distribution given by

$$p'(x^n) := \begin{cases} \dfrac{p^{\otimes n}(x^n)}{p^{\otimes n}(T_{p,\delta}^n)} & \text{if } x^n \in T_{p,\delta}^n, \\ 0 & \text{otherwise}. \end{cases}$$

Then, analogously to the case with CSI, we define $\tilde{Q}_{t,x^n}(z^n)$, $Q_{t,x^n}(z^n)$, and $\Theta'_t(z^n)$, $\Theta_t(z^n)$ for $z^n \in C^n$, but now with respect to the distribution $p'$. Consequently, $\Theta'_t(\cdot)$ has support only on $T_{pV_t,|A|\delta}^n$, and $Q_{t,x^n}(\cdot)$ and $\Theta_t(\cdot)$ only on the set $S$. Furthermore, $\Theta_t(z^n) \geq \epsilon\alpha_t$ for all $z^n \in S$. Now define $J_n \cdot L_n$ i.i.d. random variables $X_{jl}$ according to the distribution $p'$, independent of $t \in \theta$, with $j \in [J_n]$ and $l \in [L_n]$, where

$$J_n = \big\lfloor 2^{n[\min_t I(p,W_t) - \max_t I(p,V_t) - \tau]} \big\rfloor \qquad (36)$$
$$L_n = \big\lfloor 2^{n[\max_t I(p,V_t) + \tau/4]} \big\rfloor \qquad (37)$$

for $\tau > 0$. Now, because $\Theta_t(z^n) = \mathbb{E}\, Q_{t,X_{jl}}(z^n) \geq \epsilon\alpha_t$ for all $z^n \in S$, we define the event $\iota_j(t)$ as in (16) for the random variables $\beta_t^{-1} Q_{t,X_{jl}}$,

$$\iota_j(t) = \bigcap_{z^n \in C^n} \Big\{ \frac{1}{L_n} \sum_{l=1}^{L_n} Q_{t,X_{jl}}(z^n) \in [(1 \pm \epsilon)\Theta_t(z^n)] \Big\},$$

with the difference that the random variables $X_{jl}$ are now independent of the channel state. Then, analogously to (17), we obtain

$$\Pr\{(\iota_j(t))^c\} \leq 2|C|^n \exp\Big( -\frac{L_n}{3}\, 2^{-n(I(p,V_t) + g(\delta))} \Big)$$

by Lemma 3.4 and Lemma 3.2. Notice that, because the sender does not know which channel is in use, we need the maximum in the definition of $L_n$. Thus the right-hand side is again double exponentially small in $n$ and can be made smaller than $\epsilon J_n^{-1}$ for all $j$, all $t \in \theta$, and all sufficiently large $n$.

Now let $J_n$ and $L_n$ be defined as stated above, and let $X^n = \{X_{jl}\}_{j \in [J_n], l \in [L_n]}$ be the set of i.i.d. random variables, each of them distributed according to $p'$, independent of $t \in \theta$. As in the case with CSI we can show that reliable transmission of the message $j \in [J_n]$ can be achieved.
To this end define the random decoder $\{D_j(X^n)\}_{j \in [J_n]} \subseteq B^n$ as in (19), but with
\[ D'_j(X^n) := \bigcup_{s \in \theta} \bigcup_{k \in [L_n]} \mathcal T^n_{W_s,\delta}(X_{jk}), \]
and the random average probability of error $\lambda^{(t)}_n(X^n)$ for a specific channel as in (20). Notice that now both $X^n$ and $L_n$ do not depend on $t \in \theta$, and this holds throughout the entire proof. The bound in (21) now reads
\[ W^{\otimes n}_t((D_j(X^n))^c | X_{jl}) \leq W^{\otimes n}_t((\mathcal T^n_{W_t,\delta}(X_{jl}))^c | X_{jl}) + \sum_{\substack{j' \in [J_n] \\ j' \neq j}} \sum_{s \in \theta} \sum_{k \in [L_n]} W^{\otimes n}_t(\mathcal T^n_{W_s,\delta}(X_{j'k}) | X_{jl}). \]
We can bound the first term by $\nu_n(\delta) := (n+1)^{|A||B|} \cdot 2^{-nc\delta^2}$ (see (22)). Averaging over all codebooks yields
\[ \mathbb E_{X^n}\big( W^{\otimes n}_t((D_j(X^n))^c | X_{jl}) \big) \leq \nu_n(\delta) + \sum_{\substack{j' \in [J_n] \\ j' \neq j}} \sum_{s \in \theta} \sum_{k \in [L_n]} \mathbb E_{X_{j'k}} \mathbb E_{X_{jl}} W^{\otimes n}_t(\mathcal T^n_{W_s,\delta}(X_{j'k}) | X_{jl}), \]
where
\[ \mathbb E_{X_{jl}} W^{\otimes n}_t(\mathcal T^n_{W_s,\delta}(X_{j'k}) | X_{jl}) \leq \frac{q^{\otimes n}_t(\mathcal T^n_{W_s,\delta}(X_{j'k}))}{p^{\otimes n}(\mathcal T^n_{p,\delta})} \leq \frac{(n+1)^{|A||B|}}{1 - (n+1)^{|A|} \cdot 2^{-nc\delta^2}} \cdot 2^{-n(I(p,W_s) - f(\delta))} \]
for all $t \in \theta$, all $j' \neq j$ and all $k, l \in [L_n]$, with a universal $f(\delta) > 0$ satisfying $\lim_{\delta \to 0} f(\delta) = 0$. Here $q^{\otimes n}_t$ denotes the output distribution generated by the channel $W^{\otimes n}_t$ and the input distribution $p^{\otimes n}$. Additionally we define $\mu_n(\delta) := 1 - (n+1)^{|A|} \cdot 2^{-nc\delta^2}$. Then (25) changes to
\[ \mathbb E_{X^n}\big( W^{\otimes n}_t((D_j(X^n))^c | X_{jl}) \big) \leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)} \, T \cdot J_n L_n \cdot 2^{-n(\min_s I(p,W_s) - f(\delta))} \leq \nu_n(\delta) + T \cdot 2^{-n\tau/2} \]
by the definition of $J_n$ and $L_n$ in (36), (37) and by choosing $\delta > 0$ small enough that $\tau - \tau/4 - f(\delta) \geq \tau/2$. Now, defining $a := \min\{c\delta^2, \tau/2\}$ and using the definition of the error probability, the last inequality results in the upper bound
\[ \mathbb E_{X^n}(\lambda^{(t)}_n(X^n)) \leq T \cdot 2^{-na} \]
for any $t \in \theta$ and $n \in \mathbb N$ large enough. Now we define the event $\iota(t)$ for any $t \in \theta$ and the event $\iota$ as in (26) and (28), with the difference that the input is independent of the channel realisation.
So by the same reasoning we end up with
\[ \Pr\{\iota^c\} \leq T \cdot 2^{-nc''} \]
for a constant $c'' > 0$ and all sufficiently large $n \in \mathbb N$, which implies that there exist realisations $\{x_{jl}\}$ of $\{X_{jl}\}$ belonging to $\iota$ for all $j \in [J_n]$ and $l \in [L_n]$. Then, analogously to (29), we get for any channel $t \in \theta$ that
\[ \Big\| \frac{1}{L_n} \sum_{l=1}^{L_n} V^n_t(\cdot|x_{jl}) - \Theta_t(\cdot) \Big\| \leq \epsilon, \]
which differs from the former bound only by $L_n$ in place of $L_{n,t}$. Hence, following the same arguments subsequent to (30), we have shown that there is a sequence of $(n, \tilde J_n)$ codes for which
\[ \max_{t \in \theta} \max_{j \in [\tilde J_n]} \frac{1}{L_n} \sum_{l \in [L_n]} W^{\otimes n}_t(D_j^c | x_{jl}) \leq T \cdot 2^{-na'} \]
holds for sufficiently large $n \in \mathbb N$, and the strong secrecy level is fulfilled for every channel $t \in \theta$ by $\| \hat V^n_{t,j} - \bar V^n_t \| \leq \epsilon$ (with $\hat V^n_{t,j}$, $\bar V^n_t$ defined as in (33), (34)) and thus by
\[ I(J; Z^n_t) \leq -10\epsilon \log(10\epsilon) + 10 n \epsilon \log |C|, \]
which tends to zero for $n \to \infty$ uniformly in $t \in \theta$.

We turn now to the converse of Theorem 3.6. Actually, we give only a multi-letter formula for the upper bound on the secrecy rate. First we need the following lemma.

Lemma 3.7.
Let $\mathfrak W = \{(W_t, V_t) : t \in \theta\}$ be an arbitrary compound wiretap channel without CSI. Then
\[ \lim_{n \to \infty} \frac{1}{n} \max_{U \to X^n \to (Y^n_t, Z^n_t)} \Big( \inf_{t \in \theta} I(U; Y^n_t) - \sup_{t \in \theta} I(U; Z^n_t) \Big) \]
exists and we have
\[ \lim_{n \to \infty} \frac{1}{n} \max_{U \to X^n \to (Y^n_t, Z^n_t)} \Big( \inf_{t \in \theta} I(U; Y^n_t) - \sup_{t \in \theta} I(U; Z^n_t) \Big) = \sup_{n \in \mathbb N} \frac{1}{n} \max_{U \to X^n \to (Y^n_t, Z^n_t)} \Big( \inf_{t \in \theta} I(U; Y^n_t) - \sup_{t \in \theta} I(U; Z^n_t) \Big). \]
Proof.
The proof is based on Fekete's lemma [16]. If we apply that lemma to the sequence $(a_n)_{n \in \mathbb N}$ defined by
\[ a_n := \max_{U \to X^n \to (Y^n_t, Z^n_t)} \Big( \inf_{t \in \theta} I(U; Y^n_t) - \sup_{t \in \theta} I(U; Z^n_t) \Big), \]
it suffices to show that the superadditivity inequality $a_{n+m} \geq a_n + a_m$ holds for all $n, m \in \mathbb N$. This is done by considering two independent Markov chains $U_1 \to X^n \to (Y^n_t, Z^n_t)$ and $U_2 \to \hat X^m \to (\hat Y^m_t, \hat Z^m_t)$ and setting $U := (U_1, U_2)$, $X^{n+m} := (X^n, \hat X^m)$, and $(Y^{n+m}_t, Z^{n+m}_t) := ((Y^n_t, \hat Y^m_t), (Z^n_t, \hat Z^m_t))$. Then by the definition of $a_{n+m}$
\[ a_{n+m} \geq \inf_{t \in \theta} I(U; Y^{n+m}_t) - \sup_{t \in \theta} I(U; Z^{n+m}_t) \geq \inf_{t \in \theta} I(U_1; Y^n_t) + \inf_{t \in \theta} I(U_2; \hat Y^m_t) - \sup_{t \in \theta} I(U_1; Z^n_t) - \sup_{t \in \theta} I(U_2; \hat Z^m_t). \]
Since the two Markov chains mentioned above are independent and otherwise arbitrary, we conclude that $a_{n+m} \geq a_n + a_m$ holds for all $n, m \in \mathbb N$.

Proposition 3.8.
The secrecy capacity $C_S(\mathfrak W)$ of the compound wiretap channel in the case of no CSI is upper bounded by
\[ C_S(\mathfrak W) \leq \lim_{n \to \infty} \frac{1}{n} \max_{U \to X^n \to (Y^n_t, Z^n_t)} \Big( \inf_{t \in \theta} I(U; Y^n_t) - \sup_{t \in \theta} I(U; Z^n_t) \Big). \]
Proof.
Let $(C_n)_{n \in \mathbb N}$ be any sequence of $(n, J_n)$ codes such that, with
\[ \sup_{t \in \theta} \frac{1}{J_n} \sum_{j=1}^{J_n} \sum_{x^n \in A^n} E(x^n|j) \, W^{\otimes n}_t(D_j^c | x^n) =: \epsilon_{1,n} \quad (38) \]
and $\sup_{t \in \theta} I(J; Z^n_t) =: \epsilon_{2,n}$, it holds that $\lim_{n \to \infty} \epsilon_{1,n} = 0$ and $\lim_{n \to \infty} \epsilon_{2,n} = 0$, where $J$ denotes the random variable uniformly distributed on the message set $\{1, \dots, J_n\}$. Let us denote by $\hat J$ the random variable with values in $\{1, \dots, J_n\}$ determined by the Markov chain $J \to X^n \to Y^n_t \to \hat J$, where the first transition is governed by $E$, the second by $W^{\otimes n}_t$, and the last by the decoding rule. Then we have for any $t \in \theta$
\[ \log J_n = H(J) = I(J; \hat J) + H(J | \hat J) \leq I(J; Y^n_t) + H(J | \hat J), \quad (39) \]
where the inequality follows from the data processing inequality. Using Fano's inequality together with (38) we find that $H(J | \hat J) \leq 1 + \epsilon_{1,n} \log J_n$. Thus we can rewrite inequality (39) as
\[ (1 - \epsilon_{1,n}) \log J_n \leq I(J; Y^n_t) + 1 \]
for all $t \in \theta$. On the other hand, by the validity of the secrecy criterion stated above, we have for every $t \in \theta$
\[ I(J; Y^n_t) \leq I(J; Y^n_t) - \sup_{t \in \theta} I(J; Z^n_t) + \epsilon_{2,n}. \]
Then the last two inequalities imply that for any $t \in \theta$
\[ (1 - \epsilon_{1,n}) \log J_n \leq I(J; Y^n_t) - \sup_{t \in \theta} I(J; Z^n_t) + \epsilon_{2,n} + 1. \quad (40) \]
Since the LHS of (40) does not depend on $t$, we arrive at
\[ (1 - \epsilon_{1,n}) \log J_n \leq \max_{U \to X^n \to (Y^n_t, Z^n_t)} \Big( \inf_{t \in \theta} I(U; Y^n_t) - \sup_{t \in \theta} I(U; Z^n_t) \Big) + \epsilon_{2,n} + 1, \]
which concludes the proof after dividing by $n \in \mathbb N$, taking $\limsup$, and taking into account the assertion of Lemma 3.7.

Remark.
Following the same arguments subsequent to (35) concerning the use of the channels defined by $P_{Y_t|T} = W_t \cdot P_{X|T}$ and $P_{Z_t|T} = V_t \cdot P_{X|T}$ instead of $W_t$ and $V_t$, and applying the assertion of Theorem 3.6 to the $n$-fold products of the channels $W_t$ and $V_t$, we obtain the coding theorem for the multi-letter case: the capacity of the compound wiretap channel in the case of no CSI is
\[ C_S(\mathfrak W) = \lim_{n \to \infty} \frac{1}{n} \max_{U \to X^n \to (Y^n_t, Z^n_t)} \Big( \inf_{t \in \theta} I(U; Y^n_t) - \sup_{t \in \theta} I(U; Z^n_t) \Big). \]
Let us now consider the case $\mathfrak W := \{(W_t, V_s) : t = 1, \dots, T,\ s = 1, \dots, S\}$ with $S \neq T$ and the pair $(s,t)$ unknown to both the transmitter and the legitimate receiver. Additionally we assume that each $V_s$ is a degraded version of every $W_t$, which is characterised by
\[ V_s(z|x) = \sum_{y \in B} W_t(y|x) D_{(t,s)}(z|y) \quad (41) \]
for all $x \in A$, $z \in C$, where $D_{(t,s)} : B \to \mathcal P(C)$ is a stochastic matrix. Then we have the following

Lemma 3.9.
Let $p \in \mathcal P(A)$, $W : A \to \mathcal P(B)$, $V : A \to \mathcal P(C)$, and assume that $V$ is a degraded version of $W$. Then $I(X;Y|Z)$ is concave with respect to the input distribution $p_X = p$.

Proof. Let $X, Y, Z$ be random variables with values in $A, B, C$ respectively, distributed according to
\[ \Pr(X = x, Y = y, Z = z) := p_{XYZ}(x,y,z) = p(x) W(y|x) D(z|y) \quad (42) \]
for all $x \in A$, $y \in B$, $z \in C$. Because $I(X;Y|Z) = H(Y|Z) - H(Y|X,Z)$, the proof rests on two assertions:

1. $H(Y|Z)$ depends concavely on $p_X$, and
2. $H(Y|X,Z)$ is an affine function of $p_X$.

First, $H(Y|Z)$ is a concave function with respect to $p_{YZ}$ by the log-sum inequality (cf. [12], Lemma 3.1). Then, because $p_{XYZ}$ depends affinely on $p_X$ by (42), so does $p_{YZ}$, and the first assertion follows. For the second, note that (41) and (42) imply
\[ p_{Y|X,Z}(y|x,z) = \frac{W(y|x) D(z|y)}{V(z|x)} \]
for every input distribution $p_X$, any $y \in B$ and all $x \in A$, $z \in C$ with $p_{XZ}(x,z) > 0$. Then we have
\[ H(Y|X,Z) = \sum_{x \in A, z \in C} p_{XZ}(x,z) \, H\Big( \frac{W(\cdot|x) D(z|\cdot)}{V(z|x)} \Big), \]
showing that $H(Y|X,Z)$ is an affine function of $p_{XZ}$, which in turn depends affinely on $p_X$.

Now, because the random variables $X, Y_t, Z_s$ ($Y_t, Z_s$ being the channel outputs of $W_t$ and $V_s$, respectively) form a Markov chain for all $t \in \theta$ and $s \in \mathcal S$, we obtain
\[ I(X; Y_t | Z_s) = I(X; Y_t) - I(X; Z_s). \quad (43) \]
By virtue of Theorem 2 of [17], the secrecy rate satisfies
\[ R_S \leq \frac{1}{n} \sum_{i=1}^n I(X_i; Y_{i,t} | Z_{i,s}) + \epsilon' \]
for any channel pair $(t,s) \in \theta \times \mathcal S$ and $\epsilon' > 0$. The concavity of $I(X; Y_t | Z_s)$ with respect to the input distribution $p \in \mathcal P(A)$ together with (43) then implies the converse part of Theorem 3.6, namely that
\[ R_S \leq \max_{p \in \mathcal P(A)} \min_{(t,s)} \big( I(p, W_t) - I(p, V_s) \big). \]
Now we can state the following
Proposition 3.10. If $V_s$ is a degraded version of $W_t$ for all $s \in \mathcal S$ and $t \in \theta$, the capacity of the compound wiretap channel is given by
\[ C_S(\mathfrak W) = \max_{p \in \mathcal P(A)} \min_{(t,s)} \big( I(p, W_t) - I(p, V_s) \big) = \max_{p \in \mathcal P(A)} \Big( \min_t I(p, W_t) - \max_s I(p, V_s) \Big). \]

Remark.
This result was obtained in [1] with a weaker notion of secrecy.
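Proposition 3.10 reduces the secrecy capacity to a finite max-min expression that is easy to evaluate numerically. The sketch below is an illustration only: the binary symmetric channel parameters and the helper names (`bsc`, `secrecy_capacity_degraded`) are assumptions introduced here, not taken from the paper. It grid-searches the binary input distribution:

```python
import numpy as np

def entropy(v):
    """Shannon entropy in bits of a probability vector."""
    v = v[v > 0]
    return float(-(v * np.log2(v)).sum())

def mutual_information(p, W):
    # I(p, W) = H(output) - sum_x p(x) H(W(.|x)), rows of W indexed by inputs
    return entropy(p @ W) - sum(p[x] * entropy(W[x]) for x in range(len(p)))

def bsc(e):
    """Binary symmetric channel matrix with crossover probability e."""
    return np.array([[1 - e, e], [e, 1 - e]])

# hypothetical compound family: two legitimate channels, two (degraded) eavesdropper channels
Ws = [bsc(0.05), bsc(0.10)]
Vs = [bsc(0.25), bsc(0.30)]

def secrecy_capacity_degraded(Ws, Vs, grid=1001):
    # max over p of ( min_t I(p, W_t) - max_s I(p, V_s) )
    best = 0.0
    for a in np.linspace(0.0, 1.0, grid):
        p = np.array([a, 1.0 - a])
        val = min(mutual_information(p, W) for W in Ws) - max(mutual_information(p, V) for V in Vs)
        best = max(best, val)
    return best

C = secrecy_capacity_degraded(Ws, Vs)
# by symmetry and Lemma 3.9 the uniform input is optimal here,
# so C = h(0.25) - h(0.10), roughly 0.342 bits
```

Since each eavesdropper channel here is a degraded BSC, the objective is concave and symmetric in the input distribution, so the grid search peaks at the uniform input.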
CSI$_t$

We now consider the case in which the transmitter knows the channel state $t \in \theta$ to the legitimate receiver, while the channel state $s \in \mathcal S$ to the eavesdropper remains unknown. We will denote this kind of channel state information by CSI$_t$. Consequently, for each $t \in \theta$ we get the possible channel realisations $\mathfrak W_t := \{(W_t, V_s) : s = 1, \dots, S\}$. Then we can describe the compound channel as $\mathfrak W = \cup_{t \in \theta} \mathfrak W_t$.

Theorem 3.11.
For the secrecy capacity $C_{S,\mathrm{CSI}_t}(\mathfrak W)$ of the compound wiretap channel with CSI$_t$ it holds that
\[ C_{S,\mathrm{CSI}_t}(\mathfrak W) \geq \min_{t \in \theta} \max_{p \in \mathcal P(A)} \Big( I(p, W_t) - \max_{s \in \mathcal S} I(p, V_s) \Big). \]
Proof.
Adapted to the channel realisation $W_t$, define
\[ p'_t(x^n) := \begin{cases} p_t^{\otimes n}(x^n)/p_t^{\otimes n}(\mathcal T^n_{p_t,\delta}) & \text{if } x^n \in \mathcal T^n_{p_t,\delta}, \\ 0 & \text{otherwise,} \end{cases} \quad (44) \]
for arbitrary input distributions $p_1, \dots, p_T \in \mathcal P(A)$. Now define for $z^n \in C^n$ and $s \in \mathcal S$
\[ \tilde Q_{s,x^n}(z^n) = V^n_s(z^n|x^n) \cdot 1_{\mathcal T^n_{V_s,\delta}(x^n)}(z^n) \]
on $C^n$. Additionally, we set for $z^n \in C^n$
\[ \Theta'_s(z^n) = \sum_{x^n \in \mathcal T^n_{p_t,\delta}} p'_t(x^n) \tilde Q_{s,x^n}(z^n). \]
Now let $S := \{ z^n \in C^n : \Theta'_s(z^n) \geq \epsilon \alpha_{t,s} \}$, where $\epsilon = 2^{-nc'\delta^2}$ and $\alpha_{t,s}$ is taken from (8) as in the former cases, but computed with respect to $p_t$ and $V_s$. The support of $\Theta'_s$ has cardinality at most $\alpha_{t,s}^{-1}$, which implies that $\sum_{z^n \in S} \Theta'_s(z^n) \geq 1 - \epsilon$. Analogously to (13), define $\Theta_s(z^n)$ and $Q_{s,x^n}(z^n)$ with support on $S$, and further
\[ J_n = \lfloor 2^{n[\min_t ( I(p_t, W_t) - \max_s I(p_t, V_s) ) - \tau]} \rfloor \quad (45) \]
\[ L_{n,t} = \lfloor 2^{n[\max_s I(p_t, V_s) + \tau/4]} \rfloor. \quad (46) \]
As in the case of CSI, define random matrices $\{X^{(t)}_{jl}\}_{j \in [J_n], l \in [L_{n,t}]}$ such that the random variables $X^{(t)}_{jl}$ are i.i.d. according to $p'_t$. We suppose additionally that $\{X^{(t)}_{jl}\}_{j,l}$ and $\{X^{(t')}_{jl}\}_{j,l}$ are independent for $t \neq t'$. For any $z^n \in S$ it follows that $\Theta_s(z^n) = \mathbb E\, Q_{s,X^{(t)}_{jl}}(z^n) \geq \epsilon \alpha_{t,s}$, where $\mathbb E$ is the expectation with respect to the distribution $p'_t$. For the random variables $\beta_{t,s}^{-1} Q_{s,X^{(t)}_{jl}}(z^n)$ define the event
\[ \iota_j(s,t) = \bigcap_{z^n \in C^n} \Big\{ \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} Q_{s,X^{(t)}_{jl}}(z^n) \in [(1 \pm \epsilon)\Theta_s(z^n)] \Big\}. \]
Then it follows by Lemma 3.4 and Lemma 3.2 that for all $j \in [J_n]$, all $s \in \mathcal S$ and each $t \in \theta$
\[ \Pr\{ (\iota_j(s,t))^c \} \leq |C|^n \exp\big( -L_{n,t} \cdot 2^{-n[I(p_t,V_s) + g(\delta)]} \big). \]
Thus the RHS decays double exponentially in $n$, uniformly in $s \in \mathcal S$ and $t \in \theta$ (guaranteed by the maximum over $s$ in the definition of $L_{n,t}$), and can be made smaller than $\epsilon J_n^{-1}$ for all $j \in [J_n]$ and all sufficiently large $n$. Now the coding part of the problem is similar to the case with CSI. Let $p'_t \in \mathcal P(A^n)$ be given as in (44).
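The "double exponential" behaviour invoked above is pure exponent bookkeeping, which the following sketch makes explicit. All numerical values ($I$, $\tau$, $g(\delta)$, $\log_2|C|$) are illustrative assumptions, not quantities from the proof: as long as the exponent of $L_{n,t}$ exceeds $I(p_t,V_s)+g(\delta)$, the inner term $L_{n,t} \cdot 2^{-n[I+g(\delta)]}$ grows exponentially and the bound $|C|^n \exp(-\cdot)$ collapses.

```python
import math

# illustrative, assumed values: I = max_s I(p_t, V_s), g = g(delta) < tau/4
I, tau, g, log2_C = 0.2, 0.08, 0.01, 1.0

log_bounds = []
for n in [1000, 2000, 4000]:
    # L_{n,t} = 2^{n(I + tau/4)}, so the inner term L * 2^{-n(I+g)} equals 2^{n(tau/4 - g)}
    inner = 2.0 ** (n * (tau / 4 - g))
    # natural log of the bound |C|^n * exp(-inner)
    log_bound = n * log2_C * math.log(2) - inner
    log_bounds.append(log_bound)

# the log of the bound diverges to -infinity: double-exponential decay in n
```

The same bookkeeping shows why the maximum over $s$ in (46) is essential: if the exponent of $L_{n,t}$ fell below $I(p_t,V_s)$ for some $s$, the inner term would vanish instead of growing.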
We abbreviate $X := \{X^{(t)}\}_{t \in \theta}$ for the family of random matrices $X^{(t)} = \{X^{(t)}_{jl}\}_{j \in [J_n], l \in [L_{n,t}]}$ whose components are i.i.d. according to $p'_t$. We will show how reliable transmission of the message $j \in [J_n]$ can be achieved. To this end define the random decoder $\{D_j(X)\}_{j \in [J_n]} \subseteq B^n$ as in (19), with
\[ D'_j(X) := \bigcup_{r \in \theta} \bigcup_{k \in [L_{n,r}]} \mathcal T^n_{W_r,\delta}(X^{(r)}_{jk}), \]
and the random average probability of error for a specific channel, $\lambda^{(t)}_n(X)$, as in (20) by
\[ \lambda^{(t)}_n(X) := \frac{1}{J_n} \sum_{j \in [J_n]} \frac{1}{L_{n,t}} \sum_{l \in [L_{n,t}]} W^{\otimes n}_t((D_j(X))^c | X^{(t)}_{jl}). \]
As in (21) we get for each $t \in \theta$ and $l \in [L_{n,t}]$
\[ W^{\otimes n}_t((D_j(X))^c | X^{(t)}_{jl}) \leq W^{\otimes n}_t((\mathcal T^n_{W_t,\delta}(X^{(t)}_{jl}))^c | X^{(t)}_{jl}) + \sum_{\substack{j' \in [J_n] \\ j' \neq j}} \sum_{r \in \theta} \sum_{k \in [L_{n,r}]} W^{\otimes n}_t(\mathcal T^n_{W_r,\delta}(X^{(r)}_{j'k}) | X^{(t)}_{jl}). \]
By Lemma 3.1 we can bound the first term on the right-hand side, so that together with the independence of all involved random variables we end up with
\[ \mathbb E_X\big( W^{\otimes n}_t((D_j(X))^c | X^{(t)}_{jl}) \big) \leq (n+1)^{|A||B|} \cdot 2^{-nc\delta^2} + \sum_{\substack{j' \in [J_n] \\ j' \neq j}} \sum_{r \in \theta} \sum_{k \in [L_{n,r}]} \mathbb E_{X^{(r)}_{j'k}} \mathbb E_{X^{(t)}_{jl}} W^{\otimes n}_t(\mathcal T^n_{W_r,\delta}(X^{(r)}_{j'k}) | X^{(t)}_{jl}). \quad (47) \]
For $j' \neq j$ we now find, by the same reasoning as in (23) and (24), the upper bound
\[ \mathbb E_{X^{(t)}_{jl}} W^{\otimes n}_t(\mathcal T^n_{W_r,\delta}(X^{(r)}_{j'k}) | X^{(t)}_{jl}) \leq \frac{q^{\otimes n}_t(\mathcal T^n_{W_r,\delta}(X^{(r)}_{j'k}))}{p_t^{\otimes n}(\mathcal T^n_{p_t,\delta})} \leq \frac{(n+1)^{|A||B|}}{1 - (n+1)^{|A|} \cdot 2^{-nc\delta^2}} \cdot 2^{-n(I(p_r,W_r) - f(\delta))} \]
for all $r, t \in \theta$, all $j' \neq j$, and all $l \in [L_{n,t}]$, $k \in [L_{n,r}]$.
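The key estimate in this step, that an independently generated codeword looks jointly typical with probability about $2^{-n(I(p_r,W_r)-f(\delta))}$, can be checked exactly in a toy case. The sketch below uses assumed parameters (a BSC with uniform input, for which $I(p,W) = 1 - h(e)$) and counts the Hamming-distance classes that make an independent pair look typical:

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

e, delta, n = 0.1, 0.02, 400
I = 1.0 - h2(e)   # I(p, W) for a BSC(e) with uniform input

# X^n uniform and Y^n an independent uniform sequence: their Hamming distance is Bin(n, 1/2).
# "Jointly typical" here means distance/n lies within delta of the crossover probability e.
lo, hi = math.ceil(n * (e - delta)), math.floor(n * (e + delta))
p_typical = sum(math.comb(n, d) for d in range(lo, hi + 1)) / 2 ** n

exponent = -math.log2(p_typical) / n
# the empirical exponent matches I(p, W) up to a small f(delta)-type correction
```

The residual gap between `exponent` and `I` plays the role of the universal correction $f(\delta)$, and it shrinks as $\delta \to 0$ and $n \to \infty$.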
Now, defining $\nu_n(\delta) := (n+1)^{|A||B|} \cdot 2^{-nc\delta^2}$ and $\mu_n(\delta) := 1 - (n+1)^{|A|} \cdot 2^{-nc\delta^2}$, for each $t \in \theta$, $l \in [L_{n,t}]$, and $j \in [J_n]$, (47) and the last inequality lead to
\[ \mathbb E_X\big( W^{\otimes n}_t((D_j(X))^c | X^{(t)}_{jl}) \big) \leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)} J_n \sum_{r \in \theta} L_{n,r} \, 2^{-n(I(p_r,W_r) - f(\delta))} \leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)} T J_n \cdot 2^{-n(\min_r ( I(p_r,W_r) - \max_s I(p_r,V_s) ) - f(\delta) - \tau/4)} \leq \nu_n(\delta) + \frac{(n+1)^{|A||B|}}{\mu_n(\delta)} T \cdot 2^{-n\tau/2}, \]
where we have used the definitions of $J_n$ and $L_{n,r}$ in (45), (46) and chosen $\delta > 0$ small enough to ensure that $\tau - f(\delta) - \tau/4 \geq \tau/2$. Defining $a = a(\delta, \tau) := \min\{c\delta^2, \tau/2\}$, we can find $n(\delta, \tau, |A|, |B|) \in \mathbb N$ such that for all $n \geq n(\delta, \tau, |A|, |B|)$
\[ \mathbb E_X(\lambda^{(t)}_n(X)) \leq T \cdot 2^{-na} \]
for any $t \in \theta$. To give a bound on the average probability of error we define the event $\iota(t)$ for any $t \in \theta$ as in (26) and the event
\[ \iota := \bigcap_{t \in \theta} \bigcap_{s \in \mathcal S} \bigcap_{k=0}^{J_n} \iota_k(t,s), \]
which differs from (28) only by the intersection over the unknown channel states $s \in \mathcal S$. Thus we can conclude that
\[ \Pr\{\iota^c\} \leq S \cdot T \cdot \epsilon + S \cdot T \cdot 2^{-na} \leq S \cdot T \cdot 2^{-nc''} \]
holds for a suitable positive constant $c'' > 0$ and all sufficiently large $n \in \mathbb N$, and we have shown that for each $t \in \theta$ there exist realisations $\{(x^{(t)}_{jl})_{j \in [J_n], l \in [L_{n,t}]} : t \in \theta\} \in \iota$ of $X$. By the same reasoning as in (29) we get, for any channel realisation $t \in \theta$ to the legitimate receiver,
\[ \Big\| \frac{1}{L_{n,t}} \sum_{l=1}^{L_{n,t}} V^n_s(\cdot|x^{(t)}_{jl}) - \Theta_s(\cdot) \Big\| \leq \epsilon \]
for each of the unknown channels $s \in \mathcal S$ to the eavesdropper.
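This variational-distance bound is a channel-resolvability effect: averaging the eavesdropper's output over the $L_{n,t}$ randomisation codewords approximates the target output distribution $\Theta_s$. A seeded toy simulation (a hypothetical short BSC; the names below are introduced here for illustration) shows the distance shrinking as the number of randomisation codewords grows:

```python
import itertools
import numpy as np

def bsc_out_prob(x, z, e=0.1):
    # V^n(z^n | x^n) for a memoryless BSC(e)
    d = sum(a != b for a, b in zip(x, z))
    return (e ** d) * ((1 - e) ** (len(x) - d))

def tv_to_target(L, n=6, e=0.1, seed=7):
    # total variation between (1/L) sum_l V^n(.|x_l) and the true output distribution,
    # which is uniform on {0,1}^n when the codewords are drawn uniformly
    rng = np.random.default_rng(seed)
    codewords = rng.integers(0, 2, size=(L, n))
    tv = 0.0
    for z in itertools.product([0, 1], repeat=n):
        approx = sum(bsc_out_prob(tuple(x), z, e) for x in codewords) / L
        tv += abs(0.5 ** n - approx)
    return 0.5 * tv

# more randomisation codewords => better approximation of the output statistics
distance_few, distance_many = tv_to_target(8), tv_to_target(1024)
```

In the proof this averaging happens at rate $L_{n,t} \approx 2^{n \max_s I(p_t,V_s)}$, which is exactly what drives the distance below the exponentially small $\epsilon$.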
Now, because for each $t \in \theta$ we have a different codeword set $\{x^{(t)}_{jl}\}$, we slightly change the definition in (32) to
\[ \hat V^n_{(s,t)}(z^n | (j,l)) := V^n_s(z^n | x^{(t)}_{jl}) \]
and modify $\hat V^n_{(s,t),j}$ and $\bar V^n_{(s,t)}$ in (33), (34) accordingly, so that these distributions are defined separately for each codeword set $t \in \theta$. Thus we get
\[ \| \hat V^n_{(s,t),j} - \bar V^n_{(s,t)} \| \leq \epsilon, \quad s \in \mathcal S, \]
for each individual channel $t \in \theta$ to the legitimate receiver. Hence, using the same expurgation scheme as in the previous sections, we have shown that there is a sequence of $(n, \tilde J_n)$ codes for which
\[ \max_{t \in \theta} \max_{j \in [\tilde J_n]} \frac{1}{L_{n,t}} \sum_{l \in [L_{n,t}]} W^{\otimes n}_t(D_j^c | x^{(t)}_{jl}) \leq T \cdot 2^{-na'} \]
holds for sufficiently large $n \in \mathbb N$, and the strong secrecy level is fulfilled for every channel $t \in \theta$ by
\[ I(J; Z^n_s) \leq -10\epsilon \log(10\epsilon) + 10 n \epsilon \log |C|, \]
which tends to zero for $n \to \infty$ for all channels $s \in \mathcal S$ to the eavesdropper. Thus we have shown that
\[ R_S = \min_{t \in \theta} \max_{p \in \mathcal P(A)} \Big( I(p, W_t) - \max_{s \in \mathcal S} I(p, V_s) \Big) \]
is an achievable secrecy rate for the compound wiretap channel $\cup_{t \in \theta} \mathfrak W_t$ in the case where the channel state to the legitimate receiver is known at the transmitter.

Remark.
Concerning the converse of Theorem 3.11, we again have for each $t \in \theta$ the possible channel realisations $\mathfrak W_t := \{(W_t, V_s) : s = 1, \dots, S\}$ and the compound channel $\mathfrak W = \cup_{t \in \theta} \mathfrak W_t$. In accordance with the case of no CSI, we obtain for each $t \in \theta$ that
\[ C_S(\mathfrak W_t) = \lim_{n \to \infty} \frac{1}{n} \max_{U \to X^n \to (Y^n_t, Z^n_s)} \Big( I(U; Y^n_t) - \sup_{s \in \mathcal S} I(U; Z^n_s) \Big). \]

Proposition 3.12.
The secrecy capacity $C_{S,\mathrm{CSI}_t}(\mathfrak W)$ of the compound wiretap channel in the case where only the channel state to the legitimate receiver is known at the transmitter is given by
\[ C_{S,\mathrm{CSI}_t}(\mathfrak W) = \inf_{t \in \theta} C_S(\mathfrak W_t). \]

Now let us additionally assume that each $V_s$ is a degraded version of every $W_t$ for $s \in \mathcal S$ and $t \in \theta$. Then, as shown in Lemma 3.9, $I(X; Y_t | Z_s)$ is a concave function of the input distribution $p_X = p$. In particular this still holds for $\min_{s \in \mathcal S} I(X; Y_t | Z_s)$. Now, because the random variables $X, Y_t, Z_s$ form a Markov chain for all $t \in \theta$ and $s \in \mathcal S$, and
\[ \min_{s \in \mathcal S} I(X; Y_t | Z_s) = I(X; Y_t) - \max_{s \in \mathcal S} I(X; Z_s), \]
for any $t \in \theta$ we get the upper bound on the secrecy rate as the secrecy capacity of a single channel $W_t$ with $S$ channels to the eavesdropper. Then we can conclude

Proposition 3.13.
The secrecy capacity of the channel where only the channel states to the legitimate receiver are known and the channels to the eavesdropper are degraded versions of those to the legitimate receiver is given by
\[ C_{S,\mathrm{CSI}_t}(\mathfrak W) = \min_{t \in \theta} \max_{p \in \mathcal P(A)} \Big( I(p, W_t) - \max_{s \in \mathcal S} I(p, V_s) \Big). \]

$C_S = C_{S,\mathrm{CSI}}$
Let $\mathfrak W := \{(W_t, V_s) : t = 1, \dots, T,\ s = 1, \dots, S\}$ with $S \neq T$ and the pair $(t,s)$ unknown to both the transmitter and the legitimate receiver. In addition let us assume that
\[ \exists\, \hat t \in \theta\ \forall\, t \in \theta\ \exists\, U_t : W_{\hat t} = U_t W_t, \quad (48) \]
which means that $W_{\hat t}$ is a degraded version of every channel $W_t$ with $t \neq \hat t$. We further assume that
\[ \exists\, \hat s \in \mathcal S\ \forall\, s \in \mathcal S\ \exists\, \hat U_s : V_s = \hat U_s V_{\hat s}, \quad (49) \]
which means that all $V_s$ with $s \neq \hat s$ are degraded versions of $V_{\hat s}$. Then we can show that the capacity of this channel equals the capacity of the same channel with CSI at the transmitter, i.e. $C_S(\mathfrak W) = C_{S,\mathrm{CSI}}(\mathfrak W)$.

First, by Theorem 3.6 it holds that
\[ C_S(\mathfrak W) \geq \max_{M \to X \to (Y_t, Z_s)} \min_{(t,s)} \big( I(M; Y_t) - I(M; Z_s) \big), \quad (50) \]
where $M$ is an auxiliary random variable such that $M, X, (Y_t, Z_s)$ form a Markov chain $M \to X \to (Y_t, Z_s)$ in this order. Now let
\[ p^*_{MX} = \arg\max_{M \to X \to (Y_{\hat t}, Z_{\hat s})} \big( I(M; Y_{\hat t}) - I(M; Z_{\hat s}) \big) \]
be the joint distribution of $M$ and $X$ that achieves the capacity of the single wiretap channel $(W_{\hat t}, V_{\hat s})$. Because the capacity of the compound wiretap channel $\mathfrak W$ is less than or equal to the capacity of each single channel, we obtain
\[ C_{S,\mathrm{CSI}}(\mathfrak W) \leq I(p^*_M, W_{\hat t} \cdot P^*_{X|M}) - I(p^*_M, V_{\hat s} \cdot P^*_{X|M}) = C_S(W_{\hat t}, V_{\hat s}) = I(p^*_M, U_t(W_t \cdot P^*_{X|M})) - I(p^*_M, V_{\hat s} \cdot P^*_{X|M}) \leq I(p^*_M, W_t \cdot P^*_{X|M}) - I(p^*_M, \hat U_s(V_{\hat s} \cdot P^*_{X|M})) = I(p^*_M, W_t \cdot P^*_{X|M}) - I(p^*_M, V_s \cdot P^*_{X|M}) \quad (51) \]
for all $(s,t) \in \mathcal S \times \theta$, because of (48), (49) and the data processing inequality. The last inequality then implies
\[ I(p^*_M, W_{\hat t} \cdot P^*_{X|M}) - I(p^*_M, V_{\hat s} \cdot P^*_{X|M}) = \min_{(s,t)} \big( I(p^*_M, W_t \cdot P^*_{X|M}) - I(p^*_M, V_s \cdot P^*_{X|M}) \big) \leq \max_{M \to X \to (Y_t, Z_s)} \min_{(t,s)} \big( I(M; Y_t) - I(M; Z_s) \big). \]
Now, taking into account (50) and (51), we conclude that $C_{S,\mathrm{CSI}}(\mathfrak W) \leq C_S(\mathfrak W)$, and therefore for this channel the lower bound on the capacity without CSI matches the capacity of the compound wiretap channel with CSI.

4.
EXAMPLES

In this section we provide some examples which display striking features of compound wiretap channels as opposed to usual compound channels. Our first example shows that for compound wiretap channels with CSI at the transmitter the strategy of sending both the message and the randomisation parameter does not work. The second demonstrates that even when the set of channels to the legitimate receiver and the set of channels to the eavesdropper are both convex, we can have $C_{S,\mathrm{CSI}}(\mathfrak W) > 0$ and $C_S(\mathfrak W) = 0$, in contrast to the usual compound channel, where the minimax theorem applies. In the following we use some simple facts which we state here without proof.

Fact 1.
The binary entropy function $h(x) := -x \log x - (1-x) \log(1-x)$, $x \in [0,1]$, is strictly increasing on $[0, 1/2]$.

Fact 2.
Let $\eta \in [0,1]$ and set
\[ D_\eta := \begin{pmatrix} 1-\eta & \eta \\ \eta & 1-\eta \end{pmatrix}. \]
For $\tau, \tau' \in [0,1]$ it follows that $D_\tau D_{\tau'} = D_{\tau + \tau' - 2\tau\tau'}$. Moreover, if $\tau, \tau' \in (0, 1/2)$ then $\tau + \tau' - 2\tau\tau' \in (0, 1/2)$ and $\tau + \tau' - 2\tau\tau' > \tau, \tau'$.

Fact 3.
For $\tau, t \in [0,1]$,
\[ (1-t) D_0 + t D_\tau = D_{t\tau}. \]

Consider a compound wiretap channel $\mathfrak W = \{(W_t, V_t) : t = 0, 1\}$ in the case of CSI at the transmitter. First we define the channels to the legitimate receiver and to the eavesdropper for $t = 0$ by
\[ W_0 = D_\eta, \ \eta \in [0, 1/2), \ \eta \approx 0, \qquad V_0 := D_\tau W_0, \ \tau \in [0, 1/2), \ \tau \approx 0, \]
and for $t = 1$, $\hat\tau \in (0, 1/2)$,
\[ W_1 := D_{\hat\tau} V_0 = D_{\hat\tau} D_\tau W_0, \qquad V_1 := \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}. \]
Hence $V_0$ and $W_1$ are degraded versions of $W_0$, and $I(p, V_1) = 0$ for all $p \in \mathcal P(A)$ by the definition of $V_1$. Now for every $p \in \mathcal P(A)$ we can choose $\tau$ small enough that $I(p, W_0) - I(p, V_0) < I(p, W_1)$. Now, with $p_0 = (1/2, 1/2)$, $\nu > 0$, and because we have CSI at the transmitter, we have by the defining equations (14) and (15)
\[ J_n = 2^{n[(I(p_0, W_0) - I(p_0, V_0)) - \nu]}, \qquad L_{n,0} = 2^{n[I(p_0, V_0) + \nu/2]}, \]
such that we obtain $J_n L_{n,0} = 2^{n[I(p_0, W_0) - \nu/2]}$. But for $\hat\tau$ close to $1/2$ it then holds that
\[ I(p_0, W_0) - \nu/2 > I(p_0, W_1) = \max_{p \in \mathcal P(A)} I(p, W_1) = C_{\mathrm{CSI}}(\{W_0, W_1\}), \]
where $C_{\mathrm{CSI}}(\{W_0, W_1\})$ is the capacity of the compound channel $\{W_0, W_1\}$ with CSI at the transmitter. Hence we have shown that we can achieve reliable transmission of the message $j \in [J_n]$, but identifying both the message and the randomisation indices is not possible for all pairs $j \in [J_n]$ and $l \in [L_{n,t}]$. This is in contrast to the case where we have only one channel each to the legitimate receiver and the eavesdropper (cf. [8], [5]).

Now, for $\eta, \tau \in (0, 1/2)$ we set
\[ W_0 = D_\eta, \quad V_0 := D_\tau W_0 = D_{\eta + \tau - 2\eta\tau}, \quad W_1 := D_\tau V_0 = D_{2\tau - 2\tau^2} W_0, \quad V_1 := D_\tau W_1. \]
$V_0$ is a degraded version of $W_0$, $W_1$ of $V_0$, and $V_1$ of $W_1$. Next we define for $t \in [0,1]$
\[ W_t := (1-t) W_0 + t W_1 = \big[ (1-t) D_0 + t D_{2\tau - 2\tau^2} \big] W_0, \quad (52) \]
\[ V_t := (1-t) V_0 + t V_1 = D_\tau \big[ (1-t) D_0 + t D_{2\tau - 2\tau^2} \big] W_0 = D_\tau W_t. \quad (53) \]
By this definition, the set of channels to the legitimate receiver $\{W_t\}$ and the set of channels to the eavesdropper $\{V_t\}$ are both convex. Nevertheless we will show that for the compound wiretap channel $\mathfrak W := \{(W_t, V_t) : t \in [0,1]\}$ we have $C_{S,\mathrm{CSI}}(\mathfrak W) > 0$ and $C_S(\mathfrak W) = 0$. To this end, note that by (52), Fact 3, and Fact 2 we have
\[ W_t = D_{t(2\tau - 2\tau^2)} D_\eta = D_{f(t,\eta,\tau)} \quad \text{with} \quad f(t,\eta,\tau) := \eta + t(2\tau - 2\tau^2) - 2\eta t(2\tau - 2\tau^2) \in (0, 1/2). \quad (54) \]
Similarly from (53) and Fact 2 we obtain
\[ V_t = D_\tau D_{f(t,\eta,\tau)} = D_{\tau + f(t,\eta,\tau) - 2\tau f(t,\eta,\tau)}. \]
Additionally from (54) and Fact 2 we get
\[ \tau + f(t,\eta,\tau) - 2\tau f(t,\eta,\tau) \in (0, 1/2) \quad \text{and} \quad \tau + f(t,\eta,\tau) - 2\tau f(t,\eta,\tau) > f(t,\eta,\tau). \quad (55) \]
Taking $p = (1/2, 1/2)$ we obtain for every $t \in [0,1]$
\[ I(p, W_t) - I(p, V_t) = h(\tau + f(t,\eta,\tau) - 2\tau f(t,\eta,\tau)) - h(f(t,\eta,\tau)) > 0, \]
where the last inequality follows from Fact 1 and (55). Thus we have shown that $C_{S,\mathrm{CSI}}(\mathfrak W) > 0$ holds by Theorem 3.5.

In order to show that $C_S(\mathfrak W) = 0$ we have to employ our multi-letter converse in the case of no CSI, Proposition 3.8. First, simple algebra shows that for any $t \in [0,1]$
\[ V_t = \big( (1-t) D_0 + t D_{2\tau - 2\tau^2} \big) V_0 \]
by (53), and thus each $V_t$ is a degraded version of $V_0$. Let us now consider the Markov chain $U \to X^n \to (Y^n_t, Z^n_t)$, where the transition from the random variable $U$ to $Z^n_t$ is governed by $P_{Z^n_t|U} = V_t^{\otimes n} \cdot P_{X^n|U}$ for all $t \in [0,1]$. Then we obtain that each $P_{Z^n_t|U}$ is a degraded version of $P_{Z^n_0|U} = V_0^{\otimes n} \cdot P_{X^n|U}$, and the data processing inequality implies that for each $n \in \mathbb N$
\[ \max_{t \in [0,1]} I(U; Z^n_t) = I(U; Z^n_0) \quad (56) \]
for all distributions $P_{UX^n}$ that satisfy the Markov chain condition $U \to X^n \to (Y^n_t, Z^n_t)$. On the other hand, since $W_1 = D_\tau V_0$, we obtain for $P_{Y^n_1|U} = W_1^{\otimes n} \cdot P_{X^n|U}$, by the data processing inequality and (56), that for all $n \in \mathbb N$
\[ I(U; Y^n_1) - \max_{t \in [0,1]} I(U; Z^n_t) = I(U; Y^n_1) - I(U; Z^n_0) \leq 0 \]
for all $P_{UX^n}$. Then Proposition 3.8 implies that $C_S(\mathfrak W) = 0$, as desired.

ACKNOWLEDGMENT

Support by the Deutsche Forschungsgemeinschaft (DFG) via projects BO 1734/16-1, BO 1734/20-1, and by the Bundesministerium für Bildung und Forschung (BMBF) via grant 01BQ1050 is gratefully acknowledged.

REFERENCES
1. Y. Liang, G. Kramer, H. V. Poor, and S. Shamai, "Compound wiretap channels," EURASIP Journal on Wireless Communications and Networking (2008)
2. M. Bloch and J. N. Laneman, "On the secrecy capacity of arbitrary wiretap channel," Forty-Sixth Annual Allerton Conference, Allerton House, Illinois, USA (Sep. 2008)
3. I. Bjelaković, H. Boche, and J. Sommerfeld, "Capacity results for arbitrarily varying wiretap channels," (2012), accepted for publication in LNCS volume in memory of Rudolf Ahlswede
4. A. D. Wyner, "The wire-tap channel," The Bell System Tech. J., 1355-1387 (Oct. 1975)
5. I. Csiszár, "Almost independence and secrecy capacity," Problems of Information Transmission, 40-47 (1996)
6. U. M. Maurer and S. Wolf, "Information-theoretic key agreement: From weak to strong secrecy for free," in Advances in Cryptology, EUROCRYPT 2000, Lecture Notes in Computer Science, 351-368 (2000)
7. N. Cai, A. Winter, and R. W. Yeung, "Quantum privacy and quantum wiretap channel," Problems of Information Transmission, 318-336 (2004)
8. I. Devetak, "The private classical capacity and quantum capacity of a quantum channel," IEEE Transactions on Information Theory, 44-55 (January 2005)
9. D. Blackwell, L. Breiman, and A. J. Thomasian, "The capacity of a class of channels," Ann. Math. Stat., 1229-1241 (1959)
10. R. Ahlswede and G. Dueck, "Identification via channels," IEEE Transactions on Information Theory, 15-29 (1989)
11. R. Ahlswede, "General theory of information transfer: updated," General Theory of Information Transfer and Combinatorics, Special Issue of Discrete Applied Mathematics, 1348-1388 (2008)
12. I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed. (Cambridge University Press, 2011)
13. R. F. Wyrembelski, I. Bjelaković, T. J. Oechtering, and H. Boche, "Optimal coding strategies for bidirectional broadcast channels under channel uncertainty," IEEE Transactions on Communications, 2984-2994 (October 2010)
14. D. P. Dubhashi and A. Panconesi, Concentration of Measure for the Analysis of Randomized Algorithms (Cambridge University Press, 2009)
15. R. Ahlswede and A. Winter, "Strong converse for identification via quantum channels," IEEE Transactions on Information Theory, 569-579 (March 2002)
16. M. Fekete, "Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzzahligen Koeffizienten," Mathematische Zeitschrift, 228-249 (1923)
17. R. Ahlswede and I. Csiszár, "Common randomness in information theory and cryptography, Part I: Secret sharing," IEEE Transactions on Information Theory, 1121-1132 (July 1993)

Figure 1.
Compound wiretap channel $\mathfrak W := \{(W_t, V_t) : t \in [0,1]\}$.