Authentication Against a Myopic Adversary
Allison Beemer, Eric Graves, Jörg Kliewer, Oliver Kosut, and Paul Yu
Abstract
We consider keyless authentication for point-to-point communication in the presence of a myopic adversary. In particular, the adversary has access to a non-causal noisy version of the transmission and uses this knowledge to choose the state of an arbitrarily-varying channel between legitimate users. The receiver succeeds by either decoding accurately or correctly detecting adversarial interference. We show that a channel condition called U-overwritability, which allows the adversary to make its false message appear legitimate and untampered with, is a sufficient condition for zero authentication capacity. We present a useful way to compare adversarial channels, and show that once an AVC becomes U-overwritable, it remains U-overwritable for all "less myopic" adversaries. Finally, we show that stochastic encoders are necessary for positive authentication capacity in some cases, and examine in detail a binary adversarial channel that illustrates this necessity. Namely, for this channel, we show that when the adversarial channel is degraded with respect to the main channel, the no-adversary capacity of the underlying channel is achievable with a deterministic encoder. Otherwise, provided the channel to the adversary is not perfect, a stochastic encoder is necessary for positive authentication capacity; if such an encoder is allowed, the no-adversary capacity is again achievable.

Index Terms
Authentication, keyless authentication, arbitrarily-varying channel, myopic adversary, channel capacity
I. Introduction
When communicating over unsecured channels, verifying the trustworthiness of a received signal is critical. Thus, it may be useful for a receiver to declare adversarial tampering, even if it cannot decode the message perfectly. This allows for messages to be rejected unless they are confirmed to be trustworthy. This is known as authentication. We consider authentication over an arbitrarily-varying channel (AVC) with an adversary who has some noisy version of the transmitted sequence. An AVC is a channel that takes as inputs both the legitimate and an adversarial transmission [2], where the adversarial transmission (called a state) may be maliciously chosen with the goal of causing a decoding error at the receiver. A plethora of variations on the AVC appear in the literature, in which the adversary has varying degrees of power and knowledge of the legitimate transmission, and the transmitter and receiver may or may not have access to shared secret information. In the case of authentication, we say that the decoder is successful if one of the following scenarios occurs: (1) the adversary is not active, and the decoder recovers the correct message, or (2) the adversary is active and the decoder either recovers the correct message or declares adversarial interference. This is a relaxation of the classical AVC, where the decoder is successful if and only if it correctly recovers the transmitted message. Indeed, we find that authentication is "easier" in the sense that the AVC capacity can be zero, even though the authentication capacity for the same channel is positive (or even unchanged from the no-adversary setting).

The current work lies at the intersection of two broader areas of previous study: authentication over AVCs (with an adversary oblivious to the transmitted sequence), and traditional error correction over AVCs with myopic adversaries. Lying squarely in the former category are [3], [4].
In [3], a channel condition called overwritability is introduced, and it is shown that authentication can be achieved with high probability for non-overwritable AVCs at positive (but asymptotically vanishing) rates. The results are extended in [4], where overwritability is shown to completely classify authentication capacity over an (oblivious adversary) AVC, with all non-overwritable channels having capacity equal to the non-adversarial capacity of the AVC. Meanwhile, more knowledgeable adversaries over the AVC were studied in [5], [6], [7], [8]. The capacity of an AVC with a myopic adversary who does not have knowledge of the codebook realization (i.e., the legitimate users share common randomness) is characterized in [5]. The authors of [6] give the capacity of the AVC in the case where the legitimate users have a sufficiently large shared key, and the adversary can see the transmission sequence (i.e., the channel input) perfectly.

[Footnote: A. Beemer and J. Kliewer are with the Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07103; O. Kosut is with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287; and E. Graves and P. Yu are with the Computer and Information Sciences Division, U.S. Army Research Laboratory, Adelphi, MD 20783. This research was sponsored by the Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-17-2-0183. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This work was presented in part at the 2019 IEEE Conference on Communications and Network Security (CNS), and appeared in [1].]
In [7], the authors give bounds on the capacity of an AVC in the presence of a power-constrained myopic adversary, showing that once the channel to the adversary is bad enough, it is essentially oblivious. In the context of Gaussian channels, [8] examines the case where both the transmitter and myopic adversary are quadratically power-constrained.

In this paper, we focus on keyless authentication when the adversary has full knowledge of the encoding procedure, and some limited knowledge of the transmitted sequence. More specifically, the adversary chooses a state based on a non-causal noisy version of the legitimate transmission, as well as full knowledge of the codebook being used, but no direct knowledge of the message being transmitted. The adversary is not power-constrained, a freedom that is reasonable given we are considering authentication and not pure error correction, and the possible attacks by the adversary are determined by the AVC between legitimate users.

Prior work on authentication with a variety of more capable adversaries includes [9], [10], [11], [12], [13], [14]. Simmons [9] considered authentication in the case where the adversary knows the encoding procedure and may intercept the transmission perfectly, but the legitimate users are allowed some shared key with which they randomize their choice of codebook. In [10], Maurer considers authentication for secret key agreement in the setting where each of the legitimate users and the adversary have access to a random variable, the three of which are correlated in some way. In that model, the adversary may substitute its transmission for that of the actual transmitter (over an otherwise noiseless channel), with the goal of hoodwinking the legitimate receiver; the (authentication) capacity is characterized by whether the joint distribution is simulatable by the adversary.
An inner bound on keyless authentication capacity is given by Graves, Yu, and Spasojevic in [12] for the case in which the adversary knows both the encoding procedure and the message, but does not have any additional knowledge of the transmitted sequence, which is allowed to be the output of a stochastic encoder. Lai, El Gamal, and Poor [11] consider authentication in the presence of a myopic adversary who carries out impersonation or substitution attacks; the transmitter and receiver are allowed a shared secret key. The attacks considered in the current work reduce to substitution attacks when the underlying AVC allows for direct modification of a transmission. This model is extended in [13], where the authors additionally provide inner bounds for the keyed and keyless authentication capacities with impersonation/substitution attacks, and extend the simulatability condition of [10] to characterize nonzero capacity for this set of attacks. Finally, [14] improves on the inner bounds of [11], [13] for keyed authentication.

A myopic adversary model bridges the gap between oblivious and omniscient adversaries. As mentioned above, authentication capacity over the AVC in the presence of an oblivious adversary is characterized in [4] by the channel condition of overwritability. In this work, we show that an analogous condition for a myopic adversary, called U-overwritability and first introduced in [1], is a sufficient condition for zero authentication capacity for both deterministic and stochastic encoders. When we are restricted to deterministic encoders, a sufficiently capable adversary becomes essentially omniscient, giving more cases of zero authentication capacity; interestingly, when we allow for stochastic encoders, the authentication capacity has the potential to increase to the non-adversarial capacity of the underlying channel, which is illustrated by example.
These results naturally lead to questions about the relationships between the possible channels to the adversary: namely, how capable can an adversary be before the channel becomes U-overwritable? We examine this question by endowing the set of channels to the adversary with a partially ordered set structure, showing that the up-set of any U-overwritable channel gives a set of channels which also lead to U-overwritability. This relationship is perhaps most useful when we know or want to say something about U-overwritability for omniscient or oblivious adversaries.

The paper is organized as follows. In Section II we introduce the model and necessary notation. We give several cases for which the authentication capacity is zero in Section III, and demonstrate that there are cases where the deterministic authentication capacity is zero but a stochastic encoder allows for a positive authentication capacity. Section IV gives results derived by comparing different channels to the adversary. In Section V, we examine a specific binary model, proving that when the channel is not U-overwritable, the authentication capacity is equal to the underlying non-adversarial capacity, and that stochastic encoders must be used to achieve this in a portion of cases. Section VI concludes the paper.

II. Preliminaries

Notation: Throughout the paper, an (M, n) code is a code with M codewords and block length n. The notation H(·) will indicate the binary entropy function, ‖·‖ will denote the Hamming weight of a vector, E will denote expected value, ⊕ will be used to indicate (coordinate-wise) modulo-2 addition, and "X − Y − Z" will be used to indicate a Markov chain. A random variable will be denoted using a capital letter, with the corresponding alphabet and alphabet elements written with script and lowercase. For example, X is a random variable with alphabet X such that for all x ∈ X, P_X(x) (or P(x) when the random variable is understood) is the probability that X = x.
If P_X(x) is an integer multiple of 1/n for all x ∈ X, we write τ_X to indicate the type class corresponding to P_X (i.e., the set of vectors of length n whose empirical distributions match the distribution P_X). Finally, we define the typical set T_ε^(n)(X) to be the set of sequences x ∈ X^n such that the empirical probability of each x ∈ X in x is within ε·P_X(x) of P_X(x). We may write T_ε^(n) when the random variable(s) are clear from context.

We consider authentication when there is a legitimate transmitter and receiver as well as an active adversary who induces some channel state at each transmission. We assume that the adversary has full knowledge of the codebook of the legitimate parties, though not necessarily of the specific transmission being sent. More formally, let W_{Y|X,S} be a discrete memoryless adversarial channel with the sets X, S, and Y as the input, state, and output alphabets. In the event of multiple channel uses, we write W(y|x, s) for the product channel, where the sequence lengths are understood. Similarly, there is a discrete memoryless channel between the legitimate transmitter and the adversary given by U_{Z|X}, where Z is the output alphabet at the adversary, so that the adversary has a noisy version z of any transmitted sequence x with probability U(z|x). See Figure 1 for a depiction of this setup.

[Figure 1: the transmitter encodes i ∈ [M] as X_i ∈ X^n via F; the adversary observes z ∈ Z^n through U(z|x) and chooses a state s ∈ S^n via J(s|z); the receiver observes y ∈ Y^n through W(y|x, s) and outputs î ∈ [M] ∪ {0} via φ.]

Fig. 1. The transmitter sends a length-n sequence X_i := F(i), which is the (possibly stochastic) encoding of a message i ∈ [M]. The adversary views a noisy version z of this transmission and sends state s to the channel between transmitter and receiver.
The receiver then receives y, whose probability is conditioned on both the legitimate transmission and the adversarial state, and decodes to φ(y) ∈ [M] ∪ {0}, where "0" is a declaration of adversarial interference.

A myopic adversary will be formally defined by a distribution J_{S^n|Z^n}, where the adversary's choice of J may depend on U and on the codebook used by the legitimate users. Notice that the adversary has a noncausal noisy view of the transmission, so that they may view the entire block length before choosing a state. The case in which the adversary's choice is deterministic, as in [5], is included in the above definition.

Let s_0 ∈ S and the corresponding constant state sequence s_0 ∈ S^n represent the state in which there is no adversary: i.e., W_{Y|X,S=s_0} is a non-adversarial channel. An (M, n) authentication code for W is an encoder/decoder pair, where the encoder is possibly stochastic:

F : {1, 2, ..., M} → X^n
φ : Y^n → {0, 1, 2, ..., M},

where an output of "0" under φ indicates a declaration of adversarial interference. The decoder φ is successful if (1) the output is equal to the input message, or (2) s ≠ s_0 and the output is equal to 0. In other words, the decoder either successfully detects adversarial interference, or decodes correctly.

There are a variety of ways to formally define a stochastic encoder (also called a randomized encoder), all of which allow the transmitter to make use of some local randomness. We use the following (see, e.g., [15], pg. 550).
Definition II.1.
Given a message set [2^{nR}], a stochastic encoder generates a codeword in X^n for m ∈ [2^{nR}] according to some conditional distribution P_{X|M}(x|m).

Let φ^{-1}(A) ⊆ Y^n represent the set of channel outputs which decode to some i ∈ A under φ, and let φ^{-1}(A)^c be the complement in Y^n of this set. Let X_i := F(i) denote the length-n encoding of message i; notice that if the encoder is stochastic, F(i) can take on one of several values, and so X_i is a random variable. Given transmitted message i and choice of myopic adversary J, we define the probability of error for the authentication code (F, φ) as

e(i, J) = E_{X|i} [ Σ_z U(z|X) J(s_0|z) W(φ^{-1}(i)^c | X, s_0) + Σ_{s ≠ s_0} Σ_z U(z|X) J(s|z) W(φ^{-1}({i, 0})^c | X, s) ],   (1)

where the expectation is over the realizations of the stochastic encoder conditioned on message i being sent. If the adversary is absent, then J(s_0|z) = 1 for all z ∈ Z^n. We denote this distribution by J_{s_0}, and observe that with this choice of adversarial action, (1) reduces to the standard error probability given that message i is transmitted:

e(i, J_{s_0}) = E_{X|i} [ W(φ^{-1}(i)^c | X, s_0) ].

Similarly, if the adversary decides that J(s|z) = 1 for all z ∈ Z^n and some particular s ≠ s_0, denoted J_s,

e(i, J_s) = E_{X|i} [ W(φ^{-1}({i, 0})^c | X, s) ].

The above two cases show the reduction to the so-called oblivious case, in which the adversary has no knowledge about the transmission before choosing a state vector s, and so should simply choose s to optimize its chances of causing decoding failure.

We assume that each message in [M] := {1, 2, ..., M} is transmitted with equal probability. Then the average probability of error over all possible messages for a given adversarial choice of J is given by

e(J) = (1/M) Σ_{i=1}^M e(i, J).   (2)

The maximal probability of error is given by

e_max(J) = max_i e(i, J).   (3)

Remark II.2.
For fixed i and any choice of J, e(i, J) ≤ max_s e(i, J_s), implying that sup_J e(i, J) = max_s e(i, J_s). However, since the maximizing choice of s may differ by message i, and a myopic adversary has access to some information about i, this does not imply that e(J) ≤ max_s e(J_s) for all J. This marks a fundamental difference from the oblivious adversary model: in that case, an adversarial strategy is defined by a specific choice of s that is made independent of the transmitted message, and the probability of error for that choice is given by e(s) = (1/M) Σ_i e(i, s). Here, it is clearly true that e(s) ≤ max_{s'} e(s') for every choice of adversarial strategy s.

We say that a rate R is achievable if there exists a sequence of (2^{nR}, n) authentication codes such that sup_J e(J) → 0 as n → ∞, or sup_J e_max(J) → 0 as n → ∞. We will primarily consider the average probability of error in this paper, and will explicitly state when we are claiming a stronger achievability result (i.e., with reference to the maximal probability of error). Notice that sup_J e(J) is the highest error probability the adversary can hope for, achieved by designing J optimally. The authentication capacity C_auth is the supremum of all achievable rates. Let C denote the capacity in the no-adversary setting (i.e., J = J_{s_0}).

The work in [4] shows that a channel property called overwritability exactly determines when the authentication capacity is nonzero for oblivious adversaries.

Definition II.3.
An adversarial channel W_{Y|X,S} with no-adversary state s_0 is overwritable if there exists a distribution P_{S|X'} such that

Σ_s P_{S|X'}(s|x') W(y|x, s) = W(y|x', s_0)   for all x, x', y.   (4)

Less formally, over an overwritable channel, an adversary can seamlessly make their own false message appear legitimate to the receiver without being detected.
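Condition (4) is a finite system of linear constraints on P_{S|X'}, so for small alphabets it can be checked directly. The following sketch is our own illustration (the channel is not one defined in the paper): a toy "overwrite" channel in which a state s ∈ {0, 1} replaces the input entirely, which is overwritable via the attack "always play s = x'".

```python
# Numerical check of the overwritability condition (4) for a toy
# "overwrite" channel (hypothetical example): X = {0, 1},
# S = {s0, 0, 1}, W(y|x, s0) = 1{y = x} (clean channel), and a state
# s in {0, 1} replaces the input: W(y|x, s) = 1{y = s}.

S0 = "s0"
STATES = [S0, 0, 1]

def W(y, x, s):
    """Channel law W(y | x, s) of the toy overwrite channel."""
    if s == S0:
        return 1.0 if y == x else 0.0
    return 1.0 if y == s else 0.0

# Candidate attack distribution P_{S|X'}: always play s = x', i.e.
# overwrite the transmission with the false codeword symbol.
def P_S_given_Xp(s, xp):
    return 1.0 if s == xp else 0.0

# Verify (4): sum_s P(s|x') W(y|x, s) = W(y|x', s0) for all x, x', y.
for x in (0, 1):
    for xp in (0, 1):
        for y in (0, 1):
            lhs = sum(P_S_given_Xp(s, xp) * W(y, x, s) for s in STATES)
            assert abs(lhs - W(y, xp, S0)) < 1e-12
print("toy overwrite channel satisfies condition (4)")
```

For larger alphabets, the same feasibility question can be posed to a linear-programming solver, since (4) together with the simplex constraints on P_{S|X'} is a linear feasibility problem.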
Theorem II.4 ([4]). If a channel with an oblivious adversary is not overwritable, then C_auth = C; if it is overwritable, C_auth = 0.

Overwritability should be compared with symmetrizability for the standard AVC problem [16]: W_{Y|X,S} is symmetrizable if there exists P_{S|X'} such that Σ_s P_{S|X'}(s|x') W(y|x, s) = Σ_s P_{S|X'}(s|x) W(y|x', s) for all x, x', y. In [4] it is shown that overwritability implies symmetrizability, but that the converse does not hold. Since a classical AVC has zero capacity if and only if it is symmetrizable [16], this implies that the authentication capacity can be positive even though the AVC capacity is zero. There is an equivalence between overwritability and simulatability, as introduced in [10], when the adversary is oblivious to the transmission but has complete control over the communication channel (i.e., can substitute in its transmission for the legitimate transmission at will).

When the adversary is not oblivious, we propose a modification of overwritability, called U-overwritability, that will help us to characterize capacity in the myopic case.
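The gap between symmetrizability and overwritability can be seen on the binary additive AVC W(y|x, s) = 1{y = x ⊕ s} with S = {0, 1} and no-adversary state s_0 = 0. The sketch below is our own illustration, not an example taken from the paper: it checks that P_{S|X'}(s|x') = 1{s = x'} symmetrizes this channel, while condition (4) forces contradictory values of P_{S|X'} and hence cannot be satisfied.

```python
# The binary additive AVC: W(y|x, s) = 1{y = x XOR s}, S = {0, 1},
# no-adversary state s0 = 0. Our own illustration (not from the paper).

def W(y, x, s):
    return 1.0 if y == (x ^ s) else 0.0

# 1) Symmetrizability: with P_{S|X'}(s|x') = 1{s = x'}, the mixture
#    sum_s P(s|x') W(y|x, s) = 1{y = x XOR x'} is symmetric in (x, x').
def P_sym(s, xp):
    return 1.0 if s == xp else 0.0

for x in (0, 1):
    for xp in (0, 1):
        for y in (0, 1):
            lhs = sum(P_sym(s, xp) * W(y, x, s) for s in (0, 1))
            rhs = sum(P_sym(s, x) * W(y, xp, s) for s in (0, 1))
            assert lhs == rhs  # the channel is symmetrizable

# 2) Non-overwritability: condition (4) reads
#    P_{S|X'}(x XOR y | x') = 1{y = x'} for all x, x', y.
#    Taking x' = 0, y = 0 with both x = 0 and x = 1 forces
#    P(0|0) = 1 and P(1|0) = 1 simultaneously: infeasible.
required = {}
xp, y = 0, 0
for x in (0, 1):
    required[x ^ y] = 1.0  # value forced for P_{S|X'}(x XOR y | 0)
assert sum(required.values()) > 1.0  # total mass exceeds 1
print("symmetrizable but not overwritable")
```

By Theorem II.4 and the symmetrizability result of [16], this channel has zero classical AVC capacity but authentication capacity equal to C.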
Definition II.5.
We say that an adversarial channel W_{Y|X,S} with no-adversary state s_0 is U-overwritable, where U_{Z|X} is a conditional distribution, if there exists a distribution P_{S|X',Z} such that

Σ_{s,z} U(z|x) P_{S|X',Z}(s|x', z) W(y|x, s) = W(y|x', s_0)   for all x, x', y.   (5)

Let I_{Z|X} be the deterministic identity distribution: that is, I(z|x) = 1 if z = x, and zero otherwise. This corresponds to the case of a so-called omniscient adversary, and induces a special case of U-overwritability, as defined below.
An adversarial channel W_{Y|X,S} with no-adversary state s_0 is I-overwritable if there exists a distribution P_{S|X',X} such that

Σ_s P_{S|X',X}(s|x', x) W(y|x, s) = W(y|x', s_0)   for all x, x', y.   (6)

With the introduction of a noisy channel to the adversary, there are several relevant problems in terms of authentication. The first is characterizing the channel capacity given an AVC W_{Y|X,S} and a channel U_{Z|X} to the adversary. Another is determining the robustness of a channel W_{Y|X,S} against myopic adversaries: in other words, what is the "best" channel to the adversary that the legitimate users can withstand (i.e., still transmit reliably with a positive rate)? We consider these problems in the remainder of the paper.

III. Zero Authentication Capacity

In this section, we characterize the conditions under which the authentication capacity is zero. In Section III-A, we consider the case where we are restricted to deterministic encoders. Section III-B shows that these results do not necessarily hold when the encoder is allowed some local randomness. First, we give a result that will apply to both deterministic and stochastic encoders.
Theorem III.1.
If the channel to the adversary is given by U_{Z|X} and W_{Y|X,S} is U-overwritable, then C_auth = 0.

Proof. Suppose W_{Y|X,S} is U-overwritable, and let P_{S|X',Z} be the distribution guaranteed by Definition II.5. Consider a sequence of (2^{nR}, n) authentication codes with fixed R > 0, and let M := 2^{nR}. Let

J̃(s|z) = (1/M) Σ_{j=1}^M E_{X'|j} [ P_{S|X',Z}(s|X', z) ],

where the expectation is over the stochastic encoding of message j. Then,

e(J̃) = (1/M) Σ_{i=1}^M e(i, J̃)   (7)
     ≥ (1/M²) Σ_{i≠j} E_{X|i} [ Σ_{s,z} U(z|X) E_{X'|j}[P_{S|X',Z}(s|X', z)] W(φ^{-1}({i, 0})^c | X, s) ]   (8)
     = (1/M²) Σ_{i≠j} E_{X|i} [ E_{X'|j} [ W(φ^{-1}({i, 0})^c | X', s_0) ] ]   (9)
     ≥ (1/M²) Σ_j Σ_{i≠j} E_{X'|j} [ W(φ^{-1}(j) | X', s_0) ]   (10)
     ≥ (1/M²) Σ_j Σ_{i≠j} (1 − e(j, J_{s_0}))   (11)
     ≥ ((M − 1)/M) · (1/M) Σ_j (1 − e(j, J_{s_0}))   (12)
     ≥ ((M − 1)/M) (1 − e(J_{s_0})),   (13)

where (9) follows from U-overwritability and (10) follows because j ∈ {i, 0}^c as long as i ≠ j. The deterministic distribution J_{s_0} is as defined in Section II. Since sup_J e(J) ≥ e(J̃) and sup_J e(J) ≥ e(J_{s_0}),

sup_J e(J) ≥ ((M − 1)/M) (1 − e(J_{s_0})) ≥ ((M − 1)/M) (1 − sup_J e(J)).

We conclude that sup_J e(J) ≥ ((M − 1)/M) (1 − sup_J e(J)). This reduces to sup_J e(J) ≥ (M − 1)/(2M − 1), which approaches 1/2 as n → ∞. Thus, C_auth = 0.  □

A. Deterministic Encoders
The encoder f of a (2^{nR}, n) authentication code is deterministic if each message i ∈ [2^{nR}] has a single associated codeword f(i) := x_i ∈ X^n that will be transmitted across the channel W_{Y|X,S}; that is, if f is a function from [2^{nR}] to X^n. For deterministic encoders, knowing the message is synonymous with knowing the transmitted sequence. Thus, in this scenario, if the adversary can reliably decode the message, we can consider them as an essentially omniscient adversary. We show for the set of deterministic encoders that if the no-adversary channel between legitimate users is stochastically degraded with respect to the channel to the adversary, then I-overwritability guarantees C_auth = 0.

Definition III.2. We say that a channel P_{Y|X} is stochastically degraded with respect to P_{Z|X} if there exists P_{Y|Z} such that for all x ∈ X, y ∈ Y:

P_{Y|X}(y|x) = Σ_{z ∈ Z} P_{Y|Z}(y|z) P_{Z|X}(z|x).

Theorem III.3.
Suppose that W_{Y|X,S=s_0} is stochastically degraded with respect to U_{Z|X}. Then, if W_{Y|X,S} is I-overwritable, and we are restricted to a deterministic encoder, C_auth = 0.

Proof. Suppose that W_{Y|X,S} is I-overwritable, and let P_{S|X',X} be the distribution guaranteed by I-overwritability. Because W_{Y|X,S=s_0} is stochastically degraded with respect to U_{Z|X}, there exists P_{Y|Z} such that for all x, y, we have Σ_z U(z|x) P_{Y|Z}(y|z) = W(y|x, s_0). Consider a sequence of (2^{nR}, n) authentication codes with R > 0, and let M := 2^{nR}. With an abuse of notation, we will let φ(y) denote either a message or a codeword; they are in one-to-one correspondence due to the assumption of a deterministic encoder. Let

J̃(s|z) = (1/M) Σ_{j=1}^M Σ_y P_{S|X',X}(s|x_j, φ(y)) P_{Y|Z}(y|z),

where φ is the non-adversarial decoding rule of the legitimate receiver. Then,

e(J̃) ≥ (1/M²) Σ_{i,j,s,z,y} U(z|x_i) P_{S|X',X}(s|x_j, φ(y)) P_{Y|Z}(y|z) W(φ^{-1}({i, 0})^c | x_i, s)   (14)
     = (1/M²) Σ_{i,j,s,y} W(y|x_i, s_0) P_{S|X',X}(s|x_j, φ(y)) W(φ^{-1}({i, 0})^c | x_i, s)   (15)
     ≥ (1/M²) Σ_{i,j,s} Σ_{y : φ(y)=i} W(y|x_i, s_0) P_{S|X',X}(s|x_j, φ(y)) W(φ^{-1}({i, 0})^c | x_i, s)   (16)
     = (1/M²) Σ_{i,j,s} [1 − e(i, J_{s_0})] P_{S|X',X}(s|x_j, x_i) W(φ^{-1}({i, 0})^c | x_i, s)   (17)
     = (1/M²) Σ_{i,j} [1 − e(i, J_{s_0})] W(φ^{-1}({i, 0})^c | x_j, s_0)   (18)
     ≥ (1/M) Σ_i [1 − e(i, J_{s_0})] (1/M) Σ_{j≠i} W(φ^{-1}(j) | x_j, s_0)   (19)
     ≥ (1/M) Σ_i [1 − e(i, J_{s_0})] (1/M) Σ_{j≠i} [1 − e(j, J_{s_0})]   (20)
     = ( (1/M) Σ_i [1 − e(i, J_{s_0})] )² − (1/M²) Σ_i [1 − e(i, J_{s_0})]²   (21)
     = [1 − e(J_{s_0})]² − (1/M²) Σ_i [1 − e(i, J_{s_0})]²   (22)
     ≥ [1 − e(J_{s_0})]² − 1/M,   (23)

where (15) follows from stochastic degradation, and (18) follows from I-overwritability. Thus,

sup_J e(J) ≥ [1 − e(J_{s_0})]² − 1/M ≥ [1 − sup_J e(J)]² − 1/M,

and sup_J e(J) ≥ (3 − √(5 + 4/M))/2. We conclude that sup_J e(J) is bounded away from zero as n → ∞, and thus that C_auth = 0.  □

B. Stochastic Encoders
When a stochastic encoder is used, the adversary has knowledge of the distribution P_{X|M}, though not necessarily the specific transmitted message or sequence. In this case, the ability of a myopic adversary to determine the intended message does not imply that it has perfect knowledge of the transmission itself. As the latter may be necessary for successful malicious interference, this constitutes the distinction from the work in Section III-A.

For the model in which the adversary has perfect knowledge of the message and encoding procedure, but is completely oblivious to the transmission sequence itself, [12] gives an example of a channel for which stochastic encoding will increase the channel capacity. This example is expanded below to the model in which the adversary has a noisy version of the transmission (from which it may or may not deduce the message). Via this example, we demonstrate the necessity of the deterministic encoder condition in Theorem III.3. In the proof of Theorem III.3, the deterministic encoder makes an appearance in Equation (17), where knowledge of φ(y) = i is treated as equivalent to knowledge of x_i.

Recall that a binary symmetric channel (BSC) with crossover probability p, denoted BSC(p), is a binary-input, binary-output channel such that the probability the output differs from the input is equal to p. A binary erasure channel (BEC) with erasure probability p outputs an erasure symbol with probability p, and otherwise transmits binary inputs reliably.

Example III.4.
In this example, we exhibit a channel with the property that unless the adversary has perfect knowledge of the transmitted codeword, the receiver will be able to authenticate with high probability. Over this channel, a deterministic encoder results in C_auth = 0, but we will show that a stochastic encoder will allow for a positive authentication capacity.

Consider a binary-input channel with output alphabet {0, 1, ε}, where ε denotes an erasure symbol. The adversary either chooses not to act (denoted s_0), or chooses a state equal to 0 or 1. If the adversary does not act, the channel operates as a BSC(p). If the adversary inputs a 0 or 1, there are two possibilities: if the chosen state symbol matches the transmitted symbol, the channel operates as a BSC(1 − p). Otherwise, the transmitted symbol is erased. An example is shown in Figure 2. Notice that the adversary causes authentication failure if and only if it is able to simultaneously flip enough bits to convince the receiver that a different message was sent, and not trigger any erasures whatsoever.

[Figure 2: an example word x and state sequence s, with per-symbol channels BSC(p) where the state is s_0, BSC(1 − p) where the state matches the transmitted symbol, and BEC(1) where it does not.]

Fig. 2. We assume without loss of generality that p < 0.5. When the adversary's state matches the transmitted symbol, the channel statistics flip. The fourth and the sixth symbols in the output sequence y will be erasure symbols, since the transmitted symbol and state symbol do not match. Thus, in this example, the receiver can declare with absolute confidence that there has been adversarial interference.
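The detection property of this channel is easy to see in simulation. The following Monte Carlo sketch is our own illustration (the parameters and the attack model are hypothetical, not from the paper): an adversary attempts to overwrite the transmission with an imperfect guess of the transmitted sequence, and every position where the guess misses produces an erasure that exposes the attack.

```python
import random

# Monte Carlo sketch of the erasure-detection property of the channel
# in Example III.4 (our own illustration; parameters are hypothetical).

def channel(x, s, p, rng):
    """One use of the adversarial channel of Example III.4."""
    if s is None:                      # state s0: no adversary, BSC(p)
        return x ^ (rng.random() < p)
    if s == x:                         # state matches symbol: BSC(1 - p)
        return x ^ (rng.random() < 1 - p)
    return "e"                         # mismatch: erasure (BEC(1))

def attack_detected(n, p, guess_error, rng):
    """Transmit n uniform bits; the adversary overwrites with a noisy guess."""
    x = [rng.randint(0, 1) for _ in range(n)]
    s = [b ^ (rng.random() < guess_error) for b in x]  # imperfect guess of x
    y = [channel(xi, si, p, rng) for xi, si in zip(x, s)]
    return "e" in y                    # any erasure exposes the adversary

rng = random.Random(0)
n, p, guess_error = 200, 0.1, 0.05
trials = 1000
detected = sum(attack_detected(n, p, guess_error, rng) for _ in range(trials))
# With a 5% per-symbol guess error over 200 symbols, at least one erasure
# occurs with probability 1 - 0.95**200, i.e. all but about 3.5e-5.
print(f"attack detected in {detected}/{trials} trials")
```

Setting guess_error = 0 (perfect knowledge of x, as with a deterministic encoder and a decodable message) makes the attack invisible, which is exactly why the channel is I-overwritable.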
It is straightforward to see that the channel is I-overwritable. Suppose, then, that the channel to the adversary is a BSC(q), where q ≤ p. By Theorem III.3, a deterministic encoder yields C_auth = 0. Instead, fix γ > 0, and encode the message i using a deterministic binary code designed for the BSC concatenation of a BSC(γ) and a BSC(p); call this codeword t_i. Next, simulate an artificial BSC(γ) and send the output of this simulated channel, denoted x, through the AVC. The non-adversarial channel to the receiver is now the concatenated channel for which the code was designed, and so messages are transmitted reliably in the absence of an adversary. Meanwhile, the channel to the adversary is less noisy than the main channel, and so we assume that given its observation z, the adversary can determine the codeword t_i. However, it cannot exactly determine the actual transmitted sequence x. For an illustration of this, see Figure 3.

[Figure 3: the most likely sequences x given t_i, and the most likely sequences x given z.]

Fig. 3. The message i is encoded as t_i, and is passed through a BSC(γ), whose output is x. The adversary observes z after x is passed through a BSC(q). Given z, the adversary can decode reliably to t_i, but there remains ambiguity about the realization of the sequence x.

In order to avoid being detected, the adversary can either send s_0 or s = x, and in order to cause a decoding error, it must manipulate enough symbols that the receiver decodes to an incorrect codeword t_j, j ≠ i. Because the adversary cannot be certain of the exact realization of the transmitted sequence, we conclude that if it acts, it is detected with high probability. In other words, this channel has positive capacity. By making γ sufficiently small, the capacity of the concatenated channel approaches the capacity of the underlying BSC(p), allowing our coding scheme to have rates approaching the latter.

Example III.4 is high-stakes: if the adversary makes any mistake whatsoever, it is discovered. Section V examines a more forgiving scenario, where the adversary may flip the BSC channel statistics in whichever time slots it chooses, without needing to know the transmitted sequence. We show that a stochastic encoder also allows for positive (and, in fact, no-adversary) authentication capacity in this case.

IV. Relationships Between Myopic Adversaries
A myopic adversary spans the gap between oblivious adversaries (as in [4]) and omniscient adversaries, whohave perfect access to the transmitted sequence. Thus, when the legitimate transmitter and receiver have use of aparticular communication channel W Y | X , S , there may be some point at which the adversary becomes capable enoughthat the channel is U -overwritable, bringing the authentication capacity to zero. Since not every pair of potentialchannels to the adversary are directly comparable, we address this transition with the use of a partial ordering onchannels U Z | X given by stochastic degradation: formally, U Z | X ≤ U Z | X if and only if U Z | X is stochastically degradedwith respect to U Z | X . Examples IV.1 and IV.2 illustrate some features of this partial order. Example IV.1.
The channel to the adversary in the oblivious case (i.e., the case in which Z is independent of X) is stochastically degraded with respect to any other channel U_{Z|X}. On the other hand, every channel U_{Z|X} is stochastically degraded with respect to the omniscient channel (U_{Z|X} = I_{Z|X}, the identity channel). Thus, these extremes give the unique maximum (omniscient adversary) and minimum (oblivious adversary) elements of our partial order.

Example IV.2.
As an example of a totally ordered chain within the partial order, consider the set of binary symmetric channels. The BSC(p) is stochastically degraded with respect to the BSC(q) if and only if 0.5 ≥ p ≥ q ≥ 0. The maximum element of the chain is the BSC(0) (omniscient adversary), and the minimum is the BSC(0.5) (oblivious adversary). See Figure 4.

Using the partial order of stochastic degradation, we may compare channels and then use these comparisons to draw conclusions about U-overwritability.

Theorem IV.3.
If U_{Z′|X} is stochastically degraded with respect to U_{Z|X}, and W_{Y|X,S} is U_{Z′|X}-overwritable, then W_{Y|X,S} is also U_{Z|X}-overwritable.

Proof. Let P_{S|X′,Z′} be the distribution guaranteed by U_{Z′|X}-overwritability, and let

P_{S|X′,Z}(s|x′,z) = Σ_{z′} P_{S|X′,Z′}(s|x′,z′) P_{Z′|Z}(z′|z),

where P_{Z′|Z} is the distribution guaranteed by stochastic degradation. Then, writing U for U_{Z|X} and U′ for U_{Z′|X},

Σ_{s,z} U(z|x) P_{S|X′,Z}(s|x′,z) W(y|x,s)
= Σ_{s,z,z′} P_{Z′|Z}(z′|z) U(z|x) P_{S|X′,Z′}(s|x′,z′) W(y|x,s)
= Σ_{s,z′} U′(z′|x) P_{S|X′,Z′}(s|x′,z′) W(y|x,s)
= W(y|x′,0),

where 0 denotes the no-adversary state, as in Definition II.5. We conclude that W_{Y|X,S} is U_{Z|X}-overwritable. □

Theorem IV.3 implies that overwritability is a stronger condition than U-overwritability, formalized in the following corollary.

Corollary IV.4.
If a channel W_{Y|X,S} is overwritable, then it is U-overwritable for every channel U_{Z|X}.

Proof. The channel to an oblivious adversary is stochastically degraded with respect to every channel U_{Z|X}. The result follows by Theorem IV.3. □

Clearly, if the adversary can successfully generate a false message given a noisy version of a transmission, it is also successful in the noiseless case. This is formalized in the following corollary.
Corollary IV.5.
If a channel W_{Y|X,S} is U-overwritable for some U_{Z|X}, then it is also I-overwritable.

Proof. Every channel U_{Z|X} is stochastically degraded with respect to the identity channel I_{Z|X}. The result follows by Theorem IV.3. □

Example IV.6.
Extending Example IV.2, we see that if W_{Y|X,S} is U-overwritable for U_{Z|X} = BSC(p), then it is U-overwritable for U_{Z|X} = BSC(q) for all q ≤ p, and the authentication capacity is equal to zero whenever the channel to the adversary is a BSC(q) for q ≤ p. See Figure 4.

Example IV.7.
By the contrapositive of Corollary IV.5, if a channel is not I-overwritable, then it is not U-overwritable for any U_{Z|X}. As an example, consider the following AVC: X = F_2, Y = F_2 ∪ {ε}, S = {0, 1, s_0}, and W(y|x,s) is a binary erasure channel (BEC) operating on x with erasure probability s if s ∈ {0, 1}. If s = s_0, we define the erasure probability to be equal to p.

Fig. 4. The binary symmetric channels are totally ordered by stochastic degradation. If W_{Y|X,S} is U-overwritable for U_{Z|X} = BSC(p), then the authentication capacity C_auth is equal to zero for that U_{Z|X} and all binary symmetric channels with smaller crossover probability.

First, we observe that if the channel is I-overwritable, then p = 1: indeed, if there exists P_{S|X′,X} such that

Σ_s P_{S|X′,X}(s|x′,x) W(y|x,s) = W(y|x′,s_0)

for all x, x′, y, then, letting x′ = y = 1 and x = 0, we have 0 = 1 − p; we conclude p = 1. Thus, for all p < 1, the channel is not I-overwritable. It cannot then be U-overwritable for any channel U_{Z|X} to the adversary.

In fact, we can show that the authentication capacity is equal to C = 1 − p for any value of p and any channel to the adversary. The converse follows from the capacity of the underlying BEC(p). To see achievability, design a code of rate 1 − p − δ for a BEC(p) with vanishing error probability, and decode as follows:
(1) If there is a single codeword consistent with the observed sequence, decode to that codeword.
(2) If there is more than one codeword consistent with the observed sequence, declare adversarial interference.
The only error that may occur is when the adversary is not present, and we declare an error in step (2). However, this would constitute a regular decoding error in our code for the BEC(p). Thus, as n → ∞, the probability of this type of error goes to zero. In all, then, we have shown that rates arbitrarily close to 1 − p are achievable.

Remark IV.8.
In general, it is straightforward to see that the existence of a distribution P_{S|X′,Z} (resp. P_{S|X′,X}) satisfying the equality in Definition II.5 (resp. Definition II.6), and thus the U-overwritability (resp. I-overwritability) of a channel, may be determined via linear programming. Once U-overwritability has been determined for a particular choice of U_{Z|X}, Theorem IV.3 allows us to determine U-overwritability for less degraded channels.

V. A Myopic Binary Adversarial Channel

In this section, we examine in detail a binary model that we believe will provide insight into the more general case. In this model, the adversary views the transmitted codeword through a BSC(q), and decides on a binary state sequence s, which is added to the transmission x. The sequence x ⊕ s is then transmitted across a BSC(p). We call this the myopic binary adversarial channel with parameters p and q, and denote it by MBAC_{p,q}.

Remark V.1.
Consider an MBAC_{p,q} such that 0 ≤ p, q ≤ 0.5. We first claim that W_{Y|X,S} is not U-overwritable as long as q > 0 and p < 0.5.

Indeed, suppose W_{Y|X,S} is U-overwritable. Then there exists P_{S|X′,Z} such that for any choice of x, x′, and y,

Σ_{s,z} U(z|x) P_{S|X′,Z}(s|x′,z) W(y|x,s) = W(y|x′,0).

If x = x′ = y = 0, then the above reduces to:

Σ_{s,z} U(z|0) P_{S|X′,Z}(s|0,z) W(0|0,s) = W(0|0,0).

We then have

1 − p = Σ_{s,z} U(z|0) P_{S|X′,Z}(s|0,z) W(0|0,s)
= (1 − p)(1 − q) P_{S|X′,Z}(0|0,0) + q(1 − p) P_{S|X′,Z}(0|0,1) + p[(1 − q) P_{S|X′,Z}(1|0,0) + q P_{S|X′,Z}(1|0,1)].

So, provided p < 0.5,

1 = (1 − q) P_{S|X′,Z}(0|0,0) + q P_{S|X′,Z}(0|0,1).

If q > 0, this can only occur if P_{S|X′,Z}(0|0,0) = P_{S|X′,Z}(0|0,1) = 1, and thus, P_{S|X′,Z}(1|0,0) = P_{S|X′,Z}(1|0,1) = 0.

With this in mind, let x = 1 and x′ = y = 0. In this case, we may show that 1 − p = p, which is impossible for p < 0.5. Thus, if q > 0 and p < 0.5, the channel is not U-overwritable, where U_{Z|X} is a BSC(q).

Next, consider the boundary cases: if q > 0 and p = 0.5, the channel has non-adversarial capacity 1 − H(0.5) = 0. The final case is that in which q = 0. In this case, the channel is U-overwritable, or equivalently, I-overwritable: indeed, choose P_{S|X′,Z=X} to be deterministic such that s = x′ ⊕ x.

A. When the adversary is more myopic than the receiver

We begin with the case in which the channel to the adversary is stochastically degraded with respect to the non-adversarial channel W_{Y|X,S=0} between legitimate users. That is, 0 ≤ p < q ≤ 1/2. By our arguments in Remark V.1, the channel is never BSC(q)-overwritable in this case, and so we cannot make use of Theorem III.1. In fact, we show that the authentication capacity is not only nonzero when q > p, but that we can achieve the non-adversarial capacity with a deterministic encoder.

Theorem V.2. If 0 ≤ p < q ≤ 1/2, the authentication capacity C_auth is equal to the non-adversarial capacity C = 1 − H(p). Moreover, this rate can be achieved with a deterministic encoder.

Theorem V.2 converse proof. Since any authentication code must also be an error-correcting code for the underlying non-adversarial channel, we have C_auth ≤ C_{BSC(p)} = 1 − H(p). □

Per the MBAC_{p,q} model, the adversary can see a noisy version of the transmitted codeword. In our proof of achievability, we will strengthen the adversary in order to simplify some arguments. Proving achievability for a stronger adversary simultaneously proves the result for any weaker adversary. Specifically, we introduce an oracle who will reveal to the adversary the exact distance d of the transmitted codeword, x_i, from the received word z. Generally speaking, even given this information, there remain enough potentially transmitted words to make the adversary's task difficult. We will also allow the adversary to be aware of the exact error pattern, e, of the BSC(p) between the transmitter and receiver, so that it can design the state knowing the exact difference between the transmission and what the receiver will see. That is, the adversary can add the state s′ = s ⊕ e, so that the receiver will see x_i ⊕ s′ ⊕ e = x_i ⊕ s. To simplify our analysis, we will simply assume the adversary adds s and there is no additional channel noise from the BSC(p). However, the decoder will still assume there has been some channel noise and decode appropriately (i.e., the receiver does not have any increased knowledge).

We first present two lemmas that will allow us to choose a good codebook.
The first is a variation of Lemma 3 from [16]. In each, let M := 2^{nR}.

Lemma V.3 ([16]). Let ε > 0 and let x_1, . . . , x_M be drawn uniformly at random from the type class of type P_X. With high probability, this codebook satisfies the following. For any type class τ_{XX′S} and any sequence s,

|{i : ∃ j ≠ i s.t. (x_i, x_j, s) ∈ τ_{XX′S}}| ≤ 2^{n |R − I(X; X′S) + |R − I(X′; S)|⁺|⁺ + nε}.   (24)

The proof of the following lemma appears in Appendix A.

Lemma V.4.
Let codewords x_1, . . . , x_M be drawn independently and uniformly at random from some type class τ_X. For each type class τ_Z, each z ∈ τ_Z, and each distance d, the number of messages i such that ‖x_i ⊕ z‖ = d is, with high probability in n, bounded below by

⌊ 2^{n(R − I(X;Z))} / (n+1)² ⌋,

where we let p_{X|Z} be the conditional distribution of pairs of words in their respective type classes that are distance d apart, as follows:

p_{X|Z}(1|0) = d/(2n p_Z(0)) + (p_Z(0) − p_X(0))/(2 p_Z(0)),
p_{X|Z}(0|1) = d/(2n p_Z(1)) + (p_Z(1) − p_X(1))/(2 p_Z(1)).

Combining Lemma V.4 with the fact that |H(t) − H(t ± ε)| ≤ −ε log(ε/2) for ε < 1/2, we obtain the following corollary.

Corollary V.5. Given ε < 1/2 and X, Z ∼ Bernoulli(1/2), if z ∈ T^{(n)}_ε(Z), |d − nq| ≤ nε, and R = 1 − H(p) + ε log(ε/2), then the number of codewords distance d from z is, with high probability, bounded below by

⌊ 2^{n(R − 1 + H(d/n) + ε log(ε/2))} / (n+1)² ⌋ ≥ ⌊ 2^{n(H(q) − H(p) + 3ε log(ε/2))} / (n+1)² ⌋.

We now prove achievability with a deterministic encoder.
Theorem V.2 achievability proof.
Let δ̃_n be an arbitrary sequence such that n δ̃_n ≥ √n log n and lim_{n→∞} δ̃_n = 0. Let δ_n := −3 δ̃_n log(δ̃_n/2), and let 0 ≤ p < q ≤ 1/2. For n sufficiently large, and without loss of generality, let

R = 1 − H(p) − δ_n > 1 − H(q) + (2/n) log(n+1) + δ_n.

We construct a (M := 2^{nR}, n) code family with vanishing probability of error.

Encoding: By Lemmas V.3 and V.4, for n sufficiently large, there exist codewords x_1, . . . , x_M from the type class τ_X, where X ∼ Bernoulli(1/2), such that (1) for any z ∈ T^{(n)}_{δ̃_n}(Z), where Z ∼ Bernoulli(1/2), the number of codewords distance d from z, where |d − nq| ≤ n δ̃_n, is bounded below by 2^{n(H(q) − H(p) − δ_n − (2/n) log(n+1))}, and (2) for any type class τ_{XX′S} and any sequence s, (24) holds. Given a message i ∈ [M], transmit x_i.

Decoding: Let ε > 0 be sufficiently small. Given an output y ∈ {0,1}^n, decode to message i ∈ [M] if i is unique with the property that ‖x_i ⊕ y‖ < n(p + δ̃_n). Otherwise, declare adversarial interference by outputting “0”.

Probability of error analysis:
Define

S(z, d) := {i ∈ [M] : ‖z ⊕ x_i‖ = d},
E(s) := {i ∈ [M] : ∃ j ≠ i s.t. ‖x_i ⊕ s ⊕ x_j‖ < n(p + δ̃_n)}.

That is, S(z,d) is the set of messages in [M] whose corresponding codewords are distance d from z ∈ {0,1}^n, and E(s) is the set of messages i in [M] such that adding s to x_i results in the decoder potentially confusing the intended message with a false message j. Let J_{S|Z,D} be the adversary's choice of distribution given knowledge of the distance to the transmitted codeword, which is given by the random variable D. For fixed J, e(i, J) is the probability of decoding error given that message i was sent. Then,

e(J) = (1/M) Σ_{i=1}^{M} P(error | i)   (25)
= Σ_{i,s,z,d} P(error | i, s, z, d) P(i | s, z, d) J(s | z, d) P(z, d)   (26)
= Σ_{i,s,z,d} P(error | i, s, z, d) P(i | z, d) J(s | z, d) P(z, d).   (27)

Since every message whose codeword is distance d from z is equiprobable, P(i | z, d) = 1/|S(z,d)| if i ∈ S(z,d), and P(error | i, s, z, d) = 0 if, for all j ≠ i, ‖x_i ⊕ s ⊕ x_j‖ ≥ n(p + δ̃_n). In other words, P(error | i, z, s, d) = 0 if i ∉ E(s). Then, (27) is bounded as follows:

Σ_{s,z,d} J(s|z,d) P(z,d) Σ_{i=1}^{M} P(error | i, s, z, d) P(i | z, d)
≤ Σ_{s,z,d} J(s|z,d) P(z,d) · |S(z,d) ∩ E(s)| / |S(z,d)|   (28)
≤ Σ_{s; d: |d−nq| ≤ n δ̃_n; z ∈ T^{(n)}_{δ̃_n}} J(s|z,d) P(z,d) · |S(z,d) ∩ E(s)| / |S(z,d)|   (29)
+ Σ_{s, d; z ∉ T^{(n)}_{δ̃_n}} J(s|z,d) P(z,d)   (30)
+ Σ_{s, z; d: |d−nq| > n δ̃_n} J(s|z,d) P(z,d),   (31)

since |S(z,d) ∩ E(s)| / |S(z,d)| ≤ 1. Now, observe that we can choose δ̃_n such that (30) and (31) are typicality conditions, and more specifically

lim_{n→∞} ((30) + (31)) ≤ lim_{n→∞} ( Pr( Z ∉ T^{(n)}_{δ̃_n} ) + Pr( |D − nq| > n δ̃_n ) ) = 0.
On the other hand, to show that (29) (and thus (27)) also approaches zero as n increases, we make use of the following lemma, whose proof appears in Appendix B.

Lemma V.6.
Let the codewords x_1, . . . , x_M be constructed as above. Then, for any z ∈ T^{(n)}_{δ̃_n} and s ∈ F_2^n, n sufficiently large, and d such that |d − nq| ≤ δ̃_n n,

|S(z,d) ∩ E(s)| ≤ 2^{n(H(q) − H(p) − 1.5 δ_n)}.   (32)

Combining Corollary V.5 and Lemma V.6, we see that

|S(z,d) ∩ E(s)| / |S(z,d)| ≤ (n+1)² 2^{−0.5 n δ_n},   (33)

which converges to 0 since 0.5 n δ_n = −1.5 n δ̃_n log(δ̃_n/2) ≥ 1.5 √n log n. Since (33) also serves as an upper bound for (29), we are done. □
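To make the decoding rule of this proof concrete, the following sketch simulates the threshold decoder on a toy instance. The parameter values, codebook size, and helper names are ours, chosen only for illustration; the decoder outputs “0” to declare adversarial interference.

```python
import numpy as np

rng = np.random.default_rng(0)

def auth_decode(y, codebook, n, p, delta):
    """Threshold decoder from the proof of Theorem V.2: decode to message i
    iff i is the unique index with ||x_i XOR y|| < n(p + delta); otherwise
    output 0, declaring adversarial interference."""
    radius = n * (p + delta)
    hits = [i for i, x in enumerate(codebook, start=1)
            if np.count_nonzero(x ^ y) < radius]
    return hits[0] if len(hits) == 1 else 0

# Toy instance (illustrative values, not from the paper).
n, p, q, delta = 200, 0.05, 0.25, 0.10
codebook = [rng.integers(0, 2, n) for _ in range(2)]
x = codebook[0]                                  # transmit message 1

# The adversary sees x through a BSC(q) and tries to steer the receiver
# toward codeword 2 via s = z XOR x_2, but its view z is off in ~nq places.
z = x ^ (rng.random(n) < q).astype(np.int64)
s = z ^ codebook[1]
e = (rng.random(n) < p).astype(np.int64)         # BSC(p) noise to receiver

# With overwhelming probability the attack leaves y far from both codewords
# (interference declared), while the honest transmission decodes correctly.
print(auth_decode(x ^ s ^ e, codebook, n, p, delta))
print(auth_decode(x ^ e, codebook, n, p, delta))
```

The residual uncertainty of roughly nq positions is exactly what makes the adversary's forgery land outside the decoding radius n(p + δ̃_n).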
Remark V.7.
The authors of [7] examine capacity under the following model: the adversary views the transmitted codeword through a BSC(q), and decides on a state sequence s such that ‖s‖ ≤ tn for a fixed parameter t (to avoid confusion, we use “t” rather than “p” as used in [7]). This state sequence is added to x and sent noiselessly to the receiver. This differs from our model in two significant ways: (1) the adversary is power constrained and the no-adversary case is noiseless, and (2) error correction rather than authentication is considered.

Consider this model in the authentication setting. Because the channel is not overwritable for any power constraint t, the oblivious case of q = 1/2 yields an authentication capacity of C_auth = 1, the non-adversarial capacity, by Theorem II.4. In fact, Theorem V.2 shows that even if the adversary can flip any number of bits (i.e., t = 1), as long as q > 0, the authentication capacity is equal to 1. Interestingly, the power constraint t, which is instrumental in the general AVC case, is immaterial in the authentication case.

B. When the adversary is less myopic than the receiver

Now, suppose that the non-adversarial channel between legitimate users is the worse channel. In this case, the authentication capacity is dependent on whether the encoder must be deterministic or is allowed to be stochastic.
Theorem V.8. If 0 ≤ q ≤ p ≤ 1/2, and the encoder is deterministic, C_auth = 0.

Proof. It is straightforward to show that if 0 ≤ q ≤ p ≤ 1/2, then W_{Y|X,S=0} is stochastically degraded with respect to U_{Z|X}, and the channel is I-overwritable: to see I-overwritability, choose P_{S|X′,X} to be deterministic such that s = x′ ⊕ x. Then, by Theorem III.3, C_auth = 0. □

Remark V.9.
The authors of [7] show that for their similar model (which was detailed above in Remark V.7), if q < p, then the deterministic coding capacity is equal to the capacity of the channel with an omniscient adversary. This is also what we have shown for authentication (and our modified model) in Theorem V.8, since the authentication capacity of the MBAC_{p,q} with an omniscient adversary (q = 0) is equal to zero.

By the comments at the end of Remark V.1 and Theorem III.1, if q = 0 or p = 1/2, the authentication capacity of the MBAC_{p,q} is zero, regardless of the type of encoder. However, allowing a stochastic encoder, we can achieve positive capacity as long as q > 0 and p < 1/2. Similarly to the strategy in Example III.4, we will send our initial codeword, t_i, through an artificial channel before sending the result through the channels U_{Z|X} and W_{Y|X,S}. Here, however, there must be some asymmetry to our artificial channel; otherwise the error pattern of the artificial channel is independent of t_i, and the adversary may choose s = t_i ⊕ t_j to reliably deceive the decoder.

Theorem V.10. If 0 < q ≤ p < 1/2, and the encoder is allowed to be stochastic, C_auth = 1 − H(p), where this authentication capacity holds with error probability measured either as average or maximum over all messages.

Notice that we state that the capacity holds even if we consider maximum error probability over all messages: that is, considering the error to be sup_J e_max(J). Since the maximum error approaching zero with increasing block length implies the average error doing the same, this is a stronger statement. We are able to make such a statement here because we may assume the adversary knows exactly which message is being sent, and claim that we can authenticate even in this case.

Theorem V.10 converse proof.
As before, since any authentication code must also be an error-correcting code for the underlying non-adversarial channel, we have C_auth ≤ C_{BSC(p)} = 1 − H(p). □

For this case, we will again strengthen the adversary slightly in order to simplify some arguments. Specifically, we assume that the adversary can determine the message i with perfect accuracy, and we allow the adversary's choice of distribution, J_{S^n|Z^n}, to be a function of the exact joint type of t_i, the transmitted word x, and the observed sequence z.

We first present a result that will allow us to choose a good codebook; the proof of the following lemma may be found in Appendix C.

Lemma V.11.
Let P_{Y|T} be a discrete memoryless channel with capacity C > 0 and capacity-achieving input distribution P_T. Let R > 0 be such that R < C, let ε > 0 be sufficiently small, let n be sufficiently large, and let M := 2^{nR}. Then there exist codewords t_1, . . . , t_M ∈ τ_T such that
(a) for any type class τ_{TT′S}, i ∈ [M], and any sequence s,

|{j : (t_i, t_j, s) ∈ τ_{TT′S}}| ≤ 2^{n |R − I(T′; TS)|⁺ + nε},   (34)

(b) H(T_i | T_j) ≥ ε for all i ≠ j, where T_i and T_j are artificial random variables with joint distribution given by the empirical distribution of t_i and t_j, and
(c) with a typical set decoder, the maximum error probability for transmission over P_{Y|T} is bounded above by ε.

We now prove achievability.
Theorem V.10 achievability proof.
Let 0 < q ≤ p < 1/2. Let γ, δ > 0 be sufficiently small, and let R := C′ − δ, where C′ is the capacity of a binary asymmetric channel, BAC(p, γ + p − γp). The BAC(p, γ + p − γp) is equivalent to a Z-channel with crossover probability γ followed by a BSC(p). We construct a (M := 2^{nR}, n) code family with vanishing probability of error using Lemma V.11.

Encoding: Let P_T ∼ Bernoulli(α), with α chosen according to the capacity-achieving distribution of the BAC(p, γ + p − γp), and choose a codebook t_1, . . . , t_M as in Lemma V.11. In order to send message i ∈ [M], we pass t_i through V_{X|T}, a Z-channel with V(0|1) = γ and V(1|0) = 0, and transmit the result, x, through the AVC.

Decoding: Let ε > 0 be sufficiently small. Upon receiving vector y ∈ F_2^n, decode to message i if and only if i is unique such that (t_i, y) ∈ T^{(n)}_ε(T, Y), where (T, Y) ∼ P_T × P_{Y|T}, with P_T ∼ Bernoulli(α) and P_{Y|T} given by W_{Y|X,S=0} × V_{X|T}.

Probability of error analysis:
Let e_1(t_i, x, s) be the probability that i does not satisfy the decoding requirement, assuming message i is chosen, x is sent, and s is transmitted by the adversary, and let e_2(t_i, x, s) be the probability that some j ∈ [M], j ≠ i, satisfies the decoding requirement, assuming message i is chosen, x is sent, and s is transmitted by the adversary. In other words,

e_1(t_i, x, s) = W(φ⁻¹(i)ᶜ | x, s)  and  e_2(t_i, x, s) = W(φ⁻¹({0, i})ᶜ | x, s).

Allowing the adversary to know t_i, z, and the joint type P_{t_i, X, z}, we have

e(i, J) = E_{X|t_i} [ Σ_z U(z|X) ( J(0 | z, t_i, P_{t_i,X,z}) e_1(t_i, X, 0) + Σ_{s ≠ 0} J(s | z, t_i, P_{t_i,X,z}) e_2(t_i, X, s) ) ]   (35)
= Σ_{x,z} P(x | t_i) U(z|x) ( J(0 | z, t_i, P_{t_i,x,z}) e_1(t_i, x, 0) + Σ_{s ≠ 0} J(s | z, t_i, P_{t_i,x,z}) e_2(t_i, x, s) )   (36)
= Σ_{x,z} P(x, z | t_i) ( J(0 | z, t_i, P_{t_i,x,z}) e_1(t_i, x, 0) + Σ_{s ≠ 0} J(s | z, t_i, P_{t_i,x,z}) e_2(t_i, x, s) )   (37)
= Σ_{x,z : (t_i,x,z) ∈ T^{(n)}_ε(T,X,Z)} P(x, z | t_i) ( J(0 | z, t_i, P_{t_i,x,z}) e_1(t_i, x, 0) + Σ_{s ≠ 0} J(s | z, t_i, P_{t_i,x,z}) e_2(t_i, x, s) )   (38)
+ Σ_{x,z : (t_i,x,z) ∉ T^{(n)}_ε(T,X,Z)} P(x, z | t_i) ( J(0 | z, t_i, P_{t_i,x,z}) e_1(t_i, x, 0) + Σ_{s ≠ 0} J(s | z, t_i, P_{t_i,x,z}) e_2(t_i, x, s) ),   (39)

where (37) follows because T — X — Z is a Markov chain. Consider (39). With high probability in n, (T_i, X, Z) lies in the typical set. Thus, since

(39) ≤ Σ_{x,z : (t_i,x,z) ∉ T^{(n)}_ε(T,X,Z)} P(x, z | t_i) = P{ (t_i, X, Z) ∉ T^{(n)}_ε(T,X,Z) | T_i = t_i },

we have that (39) approaches 0 as n → ∞.
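As a numerical aside on the rate target used in this proof: R = C′ − δ, where C′ is the capacity of the BAC(p, γ + p − γp), and C′ → 1 − H(p) as γ → 0. The sketch below checks this limit; the helper names and the brute-force grid search over input distributions are ours, not the paper's method.

```python
import numpy as np

def h2(t):
    """Binary entropy in bits, with the 0 log 0 = 0 convention."""
    t = np.clip(t, 1e-12, 1 - 1e-12)
    return -t * np.log2(t) - (1 - t) * np.log2(1 - t)

def bac_capacity(a, b, grid=20001):
    """Capacity of the binary asymmetric channel with P(Y=1|T=0) = a and
    P(Y=0|T=1) = b, maximizing I(T;Y) over input laws Bernoulli(alpha)."""
    alpha = np.linspace(0.0, 1.0, grid)
    py1 = (1 - alpha) * a + alpha * (1 - b)          # P(Y = 1)
    mi = h2(py1) - ((1 - alpha) * h2(a) + alpha * h2(b))
    return float(mi.max())

p = 0.1
for gamma in (0.1, 0.01, 0.001):
    # Per the text, the stochastic encoder induces a BAC(p, gamma + p - gamma*p).
    print(gamma, bac_capacity(p, gamma + p - gamma * p))
print(1 - h2(p))  # the target rate 1 - H(p)
```

At γ = 0 the channel degenerates to a BSC(p), whose capacity 1 − H(p) the grid search recovers exactly; the printed capacities increase toward that value as γ shrinks.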
Notice that

(38) ≤ Σ_s Σ_{x,z : (t_i,x,z) ∈ T^{(n)}_ε(T,X,Z)} P(z | t_i) J(s | z, t_i, P_{t_i,x,z}) P(x | t_i, z) W(φ⁻¹(i)ᶜ | x, s).   (40)

If for every fixed choice of s and z we have Σ_x P(x | t_i, z) W(φ⁻¹(i)ᶜ | x, s) → 0 as n → ∞, where the sum is over x's such that (t_i, x, z) ∈ T^{(n)}_ε(T,X,Z), then (38) approaches zero asymptotically. Thus, it is sufficient to prove the following lemma, which states that any choice of s given knowledge of t_i and z causes decoding failure for only a vanishing fraction of the transmissions x that are consistent with t_i and z. A proof of the lemma appears in Appendix D.

Lemma V.12.
Let i, z, and s be fixed such that (t_i, z) ∈ T^{(n)}_ε, and suppose X and Y are drawn from the distribution P(x | t_i, z) W(y | x, s), where P(x | t_i, z) is the conditional distribution for X from the Markov chain T — X — Z. Then,

P{ ∃ j ≠ i : (t_j, Y) ∈ T^{(n)}_ε } → 0 as n → ∞.

With Lemma V.12 in hand, letting δ and γ approach zero gives us a code with rate arbitrarily close to lim_{δ,γ→0} (C′ − δ) = 1 − H(p) and vanishing error probability, proving achievability. □

VI. Conclusions
In this paper, we considered keyless authentication over an AVC where the adversary sees the transmitted sequence through a noisy channel U_{Z|X}. We introduced the channel condition U-overwritability as a generalization of the oblivious-adversary condition overwritability, and showed that U-overwritability is a sufficient condition for zero authentication capacity. We also showed that if users are restricted to deterministic encoders, there are additional cases in which the authentication capacity is zero: namely, when the adversary is able to reliably decode and the AVC is vulnerable to an omniscient adversary (I-overwritable). However, allowing for stochastic encoders can allow for positive authentication capacity in these cases.

Next, we compared adversaries to one another, and showed that once an adversary has a channel U_{Z|X} such that the AVC is U-overwritable, the AVC is also U-overwritable for any less degraded channel to the adversary. As a consequence, if an AVC is overwritable, it is also U-overwritable for every U_{Z|X}. This can allow us to quickly determine U-overwritability for a large group of channels to the adversary.

Finally, we examined a myopic binary adversarial channel in detail. Interestingly, for this case the authentication capacity is always equal to the non-adversarial capacity of the underlying channel as long as the channel to the adversary is not perfect and we allow stochastic encoders. Furthermore, in this case the maximum error authentication capacity is equal to the average error authentication capacity. If we restrict to deterministic encoders, the authentication capacity drops to zero for the cases in which the non-adversarial channel between users is stochastically degraded with respect to the channel to the adversary, as the adversary is essentially omniscient in this scenario.
An open question is whether U-overwritability is more generally a necessary condition for zero authentication capacity when stochastic encoders are allowed, and whether the authentication capacity is always equal to the non-adversarial capacity when it is positive.

References

[1] A. Beemer, O. Kosut, J. Kliewer, E. Graves, and P. Yu, “Authentication against a myopic adversary.” IEEE, 2019, pp. 1–5.
[2] D. Blackwell, L. Breiman, and A. Thomasian, “The capacities of certain channel classes under random coding,” The Annals of Mathematical Statistics, vol. 31, no. 3, pp. 558–567, 1960.
[3] O. Kosut and J. Kliewer, “Network equivalence for a joint compound-arbitrarily-varying network model,” Sept. 2016, pp. 141–145.
[4] ——, “Authentication capacity of adversarial channels,” 2018.
[5] A. D. Sarwate, “Coding against myopic adversaries,” Aug. 2010, pp. 1–5.
[6] N. Cai, T. Chan, and A. Grant, “The arbitrarily varying channel when the jammer knows the channel input,” June 2010, pp. 295–299.
[7] B. K. Dey, S. Jaggi, and M. Langberg, “Sufficiently myopic adversaries are blind,” IEEE Transactions on Information Theory, vol. 65, no. 9, pp. 5718–5736, Sep. 2019.
[8] Y. Zhang, S. Vatedka, S. Jaggi, and A. D. Sarwate, “Quadratically constrained myopic adversarial channels,” June 2018, pp. 611–615.
[9] G. J. Simmons, “Authentication theory / coding theory,” in Workshop on the Theory and Application of Cryptographic Techniques. Springer, 1984, pp. 411–431.
[10] U. Maurer, “Information-theoretically secure secret-key agreement by NOT authenticated public discussion,” in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1997, pp. 209–225.
[11] L. Lai, H. El Gamal, and H. V. Poor, “Authentication over noisy channels,” IEEE Transactions on Information Theory, vol. 55, no. 2, pp. 906–916, 2009.
[12] E. Graves, P. Yu, and P. Spasojevic, “Keyless authentication in the presence of a simultaneously transmitting adversary.” IEEE, 2016, pp. 201–205.
[13] O. Gungor and C. E. Koksal, “On the basic limits of RF-fingerprint-based authentication,” IEEE Transactions on Information Theory, vol. 62, no. 8, pp. 4523–4543, 2016.
[14] J. Perazzone, E. Graves, P. Yu, and R. Blum, “Inner bound for the capacity region of noisy channels with an authentication requirement,” in IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 126–130.
[15] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[16] I. Csiszár and P. Narayan, “The capacity of the arbitrarily varying channel revisited: positivity, constraints,” IEEE Transactions on Information Theory, vol. 34, no. 2, pp. 181–193, March 1988.
[17] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.

Appendix A
Lemma V.4

First, we derive two lemmas which will help considerably in establishing Lemma V.4 by simplifying the analysis that results from using Bernstein's trick when bounding sums of random variables.
Lemma A.1.
Let 0 < t, p < 1, let s ∈ {−1, 1}, and let B_1, . . . , B_n be independent Bernoulli(p) random variables. Then

min_{h>0} E[e^{hs Σ_{i=1}^n B_i}] e^{−hsnt} ≤ e^{−nD(t||p)},

where D(t||p) := t ln(t/p) + (1 − t) ln((1 − t)/(1 − p)).

Proof. First, note that since the B_i's are independent, for fixed h > 0,

E[e^{hs Σ_{i=1}^n B_i}] = Π_{i=1}^n E[e^{hsB_i}] = (1 − p + pe^{hs})^n.

Thus,

min_{h>0} E[e^{hs Σ_{i=1}^n B_i}] e^{−hsnt} = min_{h>0} (1 − p + pe^{hs})^n e^{−hsnt}.   (41)

Solving, we find that as long as t < 1,

min_{h>0} E[e^{hs Σ_{i=1}^n B_i}] e^{−hsnt} ≤ e^{−nD(t||p)},   (42)

since the only critical point of (41) occurs at h = s ln( t(1−p) / (p(1−t)) ) and the second derivative is always positive. □

In order to apply Lemma A.1 to obtain Lemma V.4, it will be helpful to lower bound MD(t||p).

Lemma A.2. Let t := ζp for some positive number ζ ∈ (0, 1). Then,

D(t||p) ≥ p(ζ ln ζ − ζ + 1).

Proof. Borrowing from [17, Lemma 17.9], observe that D(ζp||p) is a convex function of p, and thus a linear approximation at p = 0 provides a lower bound:

D(t||p) ≥ D(0||0) + (d/dp) D(ζp||p) |_{p=0} · p = (ζ ln ζ − ζ + 1) p.  □
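Lemma A.2's linear lower bound on the divergence can be spot-checked numerically. The grid and helper names below are ours, and this is a sanity check rather than a proof.

```python
import math

def kl(t, p):
    """Binary KL divergence D(t || p) in nats, as defined in Lemma A.1."""
    val = 0.0
    if t > 0:
        val += t * math.log(t / p)
    if t < 1:
        val += (1 - t) * math.log((1 - t) / (1 - p))
    return val

def lemma_a2_bound(zeta, p):
    """Right-hand side of Lemma A.2: p * (zeta ln zeta - zeta + 1)."""
    return p * (zeta * math.log(zeta) - zeta + 1)

# Check D(zeta*p || p) >= p*(zeta ln zeta - zeta + 1) on a coarse grid.
gaps = [kl(z * p, p) - lemma_a2_bound(z, p)
        for p in (0.01, 0.1, 0.25, 0.4, 0.49)
        for z in (0.05, 0.2, 0.5, 0.8, 0.99)]
print(min(gaps))  # nonnegative: the tangent at p = 0 lies below D
```

The gap is smallest for small p, consistent with the bound being a tangent-line approximation of a convex function at p = 0.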
Fix z ∈ τ_Z and suppose all codewords X_i are independently chosen uniformly at random from τ_X. Before continuing further, we will show that there exists a conditional type set τ_{X|Z}(z) such that the set of all messages Hamming distance d from z is equal to {i : x_i ∈ τ_{X|Z}(z)}, since τ_X, τ_Z, and d are fixed. Indeed, define τ_{X|Z}(z) by

p_{X|Z}(1|0) = d/(2n p_Z(0)) + (p_Z(0) − p_X(0))/(2 p_Z(0))   (43)
p_{X|Z}(0|1) = d/(2n p_Z(1)) + (p_Z(1) − p_X(1))/(2 p_Z(1)).   (44)

For any x ∈ τ_X that is distance d from z, observe that d can be written as d_1 + d_2, where d_1 is the number of indices where z is 0 and x is 1, and d_2 is the number of indices where z is 1 and x is 0. Furthermore, because d_1 and d_2 must satisfy the linear equations

n p_Z(0) + d_2 − d_1 = n p_X(0)   (45)
d_1 + d_2 = d,   (46)

we have

d_1 = (d + n[p_Z(0) − p_X(0)]) / 2   (47)
d_2 = (d + n[p_Z(1) − p_X(1)]) / 2,   (48)

and we see that the empirical distributions p_{X|Z}(1|0) and p_{X|Z}(0|1) are as in equations (43) and (44) above. We conclude that x ∈ τ_{X|Z}(z). Because x was chosen arbitrarily from the type class, we see that any codeword distance d from z must also be in τ_{X|Z}(z).

Now, let A_i = 1{X_i ∈ τ_{X|Z}(z)}, so that we may write |{i : ‖z ⊕ X_i‖ = d}| = Σ_{i=1}^M A_i. Recall that X_1, . . . , X_M are independent, and note that by [17, Lemma 2.5], 2^{n(H(X|Z)−ε_0)} ≤ |τ_{X|Z}(z)| and |τ_X| ≤ 2^{nH(X)}, where ε_0 = log(n+1)/n. We conclude that the A_i are independent Bernoulli(b) random variables, where

b ≥ |τ_{X|Z}(z)| / |τ_X| ≥ 2^{−n(I(X;Z)+ε_0)}.

Hence, the probability that there are fewer than ⌊2^{n(R−I(X;Z)−2ε_0)}⌋ messages distance d from z ∈ τ_Z is equal to Pr( Σ_{i=1}^M A_i < ⌊2^{n(R−I(X;Z)−2ε_0)}⌋ ). Notice that if R < I(X;Z) + 2ε_0, then ⌊2^{n(R−I(X;Z)−2ε_0)}⌋ = 0. Since the A_i's are nonnegative, we have

Pr( Σ_{i=1}^M A_i < 0 ) = 0.   (49)

On the other hand, if R > I(X;Z) + 2ε_0, then:

Pr( Σ_{i=1}^M A_i < ⌊2^{n(R−I(X;Z)−2ε_0)}⌋ ) ≤ Pr( Σ_{i=1}^M A_i < 2^{n(R−I(X;Z)−2ε_0)} )   (50)
= min_{h>0} Pr( e^{−h Σ_{i=1}^M A_i} > e^{−h 2^{n(R−I(X;Z)−2ε_0)}} )   (51)
≤ min_{h>0} E[e^{−h Σ_{i=1}^M A_i}] e^{h 2^{n(R−I(X;Z)−2ε_0)}}   (52)
≤ e^{−M D( 2^{−n(I(X;Z)+2ε_0)} || 2^{−n(I(X;Z)+ε_0)} )}   (53)
≤ e^{−2^{n(R−I(X;Z)−ε_0)} [1 − 2^{−nε_0}(1 + nε_0 ln 2)]},   (54)

where (51) is from Bernstein's trick; (52) is from Markov's inequality; (53) is from Lemma A.1 and the convexity of divergence (since 2^{−n(I(X;Z)+2ε_0)} < 2^{−n(I(X;Z)+ε_0)} ≤ b); and (54) is Lemma A.2. Combining the bounds (49) and (54), we obtain

Pr( |{i : ‖z ⊕ X_i‖ = d}| < ⌊2^{n(R−I(X;Z)−2ε_0)}⌋ ) ≤ e^{−2^{nε_0}[1 − 2^{−nε_0}(1 + nε_0 ln 2)]} = (n+1) e^{−n}.   (55)

From the union bound, then,

Pr( ∃ z such that |{i : ‖z ⊕ X_i‖ = d}| < ⌊2^{n(R−I(X;Z)−2ε_0)}⌋ ) ≤ 2^n (n+1) e^{−n}.   (56)

Since (56) goes to zero as n goes to infinity, we find that, with high probability, the number of messages that are distance d from any z ∈ τ_Z is at least

⌊2^{n(R−I(X;Z)−2ε_0)}⌋ = ⌊ 2^{n(R−I(X;Z))} / (n+1)² ⌋,

where p_{X,Z} is defined by the conditional distributions in (43) and (44). □

Appendix B
Lemma V.6
Proof of Lemma V.6.
We can upper bound the left-hand side of (32) by splitting into different type classes. Namely,

|S(z, d) ∩ E(s)| ≤ Σ_{X,X',S,Z} |{i : (x_i, x_j, s, z) ∈ τ_{XX'SZ} for some j ≠ i}|,

where the sum runs over binary random variables X, X', S, Z satisfying

E[X] = 1/2, E[X'] = 1/2, E[X ⊕ Z] = d/n, E[X ⊕ S ⊕ X'] ≤ p + δ̃_n.  (57)

Let ǫ_{1,n} > 0 with ǫ_{1,n} < 0.1 δ_n. Applying Lemma V.3 with (S, Z) in place of S, we have

|{i : (x_i, x_j, s, z) ∈ τ_{XX'SZ} for some j ≠ i}| ≤ 2^{n| R − I(X;X'SZ) + |R − I(X';SZ)|^+ |^+ + nǫ_{1,n}}.  (58)

We now want to bound this for random variables satisfying (57). In particular, consider two cases. First, if R ≤ I(X';SZ), then we have that | R − I(X;X'SZ) + |R − I(X';SZ)|^+ |^+ is equal to

= | R − I(X;X'SZ) |^+
≤ | R − I(X; S ⊕ X') |^+  (59)
≤ | R − (1 − H(p + δ̃_n)) |^+  (60)
≤ | R − 1 + H(p) + δ_n |^+  (61)
= 0,  (62)

where (59) follows from the data processing inequality, (60) from the last condition of (57), and (62) holds by the definition of R.
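The step from (59) to (60) rests on a standard identity: for a uniform binary input X and independent additive noise N ~ Bernoulli(p), the mutual information I(X; X ⊕ N) equals 1 − H(p); the bound in the proof is the relaxation of this extremal independent-noise case. The following sketch checks the identity numerically (the helper names are ours, not from the paper):

```python
import math

def h2(p):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as a dict {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return sum(q * math.log2(q / (px[x] * py[y]))
               for (x, y), q in joint.items() if q > 0)

p = 0.11  # crossover probability of the additive noise
# joint pmf of (X, X xor N) with X uniform and N ~ Bernoulli(p)
joint = {(x, x ^ n): 0.5 * (p if n else 1 - p)
         for x in (0, 1) for n in (0, 1)}
```

Running this, `mutual_information(joint)` agrees with `1 - h2(p)` to machine precision, matching the quantity 1 − H(p + δ̃_n) that appears in (60) as δ̃_n → 0.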
Second, if R > I(X';SZ), then we have that | R − I(X;X'SZ) + |R − I(X';SZ)|^+ |^+ is equal to

= | 2R − I(X;X'SZ) − I(X';SZ) |^+
= | 2R − I(X;SZ) − I(X;X'|SZ) − I(X';SZ) |^+  (63)
= | 2R − I(X;SZ) − I(X';XSZ) |^+  (64)
≤ | 2R − I(X;Z) − I(X'; X ⊕ S) |^+  (65)
≤ | 2R − (1 − H(d/n)) − (1 − H(p + δ̃_n)) |^+  (66)
≤ | 2R + H(d/n) + H(p) − 2 + δ_n |^+  (67)
≤ | H(d/n) − H(p) − δ_n |^+  (68)
≤ H(d/n) − H(p) − δ_n  (69)
≤ H(q) − H(p) − δ_n,  (70)

where (63) and (64) follow from the chain rule, (65) from the data processing inequality, (66) from the last two conditions of (57), and (69) holds for n sufficiently large. In either case, (58) is upper bounded by 2^{n(H(q) − H(p) − δ_n + ǫ_{1,n})}. Since ǫ_{1,n} < 0.1 δ_n and there are only a polynomial number of types, for sufficiently large n, we have shown (32). □

Appendix C
Lemma V.11

In order to prove Lemma V.11, we first give an extension of Lemma 3 of [16].
Lemma C.1.
Let P_{Y|T} be a discrete memoryless channel with capacity C > 0 and capacity-achieving input distribution P_T. Let ǫ'' > 0, let R', ǫ' > 0 be such that ǫ' < R' < C, let n be sufficiently large, and let M := 2^{nR'}. Then there exist t_1, ..., t_M of length n from type class τ_T that simultaneously satisfy the following for any type class τ_{TT'S}:

(1) for any i ∈ [M], and any sequence s,

|{j : (t_i, t_j, s) ∈ τ_{TT'S}}| ≤ 2^{n|R' − I(T';TS)|^+ + nǫ'},  (71)

(2) if I(T;T') − R' > ǫ', then

(1/M) |{i : (t_i, t_j) ∈ τ_{TT'} for some j ≠ i}| ≤ 2^{−(ǫ'/2)n}, and  (72)

(3) with a typical set decoder, the average error probability for transmission over P_{Y|T} is bounded above by ǫ''.

Proof. The existence of a codebook having properties (1) and (2) follows directly from Lemma 3 of [16]. In fact, not only does there exist such a codebook, but a codebook whose codewords are chosen uniformly at random from τ_T has both properties with high probability. Furthermore, choosing codewords uniformly at random from τ_T will result with high probability in a code whose average error probability over P_{Y|T} vanishes as block length goes to infinity. Thus, a randomly chosen codebook simultaneously possesses properties (1)-(3) with high probability. □

To construct a codebook satisfying (a)-(c) of Lemma V.11, we will take a codebook with properties (1)-(3) of Lemma C.1, and eliminate all codewords t_i such that the decoding error probability given t_i was sent is bounded away from zero, and also all t_i for which there exists j ≠ i with the property that H(T_i | T_j) < ǫ, where T_i and T_j are artificial random variables with joint distribution given by the empirical distribution of t_i and t_j.

Proof of Lemma V.11.
Let ǫ, δ > 0 be sufficiently small, with δ < ǫ < δ', where δ' := (C − R)/2 and Y is distributed according to P_{Y|T}. Let ǫ' > 0 satisfy ǫ' < min{R + δ, ǫ − δ, δ' − ǫ}. Letting R' = R + δ, Lemma C.1 states that there exists a codebook of size 2^{n(R+δ)} from the type class τ_T that satisfies properties (1)-(3), letting ǫ'' := ǫ/2. For any joint type class τ_{TT'} such that H(T|T') < ǫ,

I(T;T') − (R + δ) = H(T) − H(T|T') − (R + δ)  (73)
> H(T) − H(T|T') − (C − δ')  (74)
= H(T) − H(T|T') − I(T;Y) + δ'  (75)
= H(T|Y) − H(T|T') + δ'  (76)
> H(T|Y) + δ' − ǫ  (77)
> ǫ'.  (78)

Thus, by property (2) of Lemma C.1, for each such τ_{TT'},

(1/M) |{i : (t_i, t_j) ∈ τ_{TT'} for some j ≠ i}| ≤ 2^{−(ǫ'/2)n}.

Since there are a polynomial number of joint type classes τ_{TT'} with H(T|T') < ǫ,

(1/M) |{i : (t_i, t_j) ∈ τ_{TT'} for some T, T' s.t. H(T|T') < ǫ, and some j ≠ i}| ≤ n^{O(1)} 2^{−(ǫ'/2)n}.  (79)

Removing all codewords t_i such that i falls in the set on the left-hand side of (79), we have at least 2^{n(R+δ)}(1 − n^{O(1)} 2^{−(ǫ'/2)n}) codewords remaining. For n sufficiently large,

1 − n^{O(1)} 2^{−(ǫ'/2)n} ≥ 2^{−(δ/2)n}.

Thus, the number of remaining codewords is at least 2^{n(R + δ/2)}.

With our initial choice of codebook sequence, the average error probability of the code for transmission over P_{Y|T} with typical set decoding was bounded above by ǫ/2; this remains true for our now-smaller codebook of size 2^{n(R+δ/2)}. Denote the average error probability for this new codebook by P_e^n. For each codeword t_i remaining in the codebook, there is an associated decoding error probability P_e^n(t_i). Remove from the codebook half of the codewords: in particular, those that have the highest error probabilities. We claim that for each remaining codeword, we have P_e^n(t_i) ≤ 2P_e^n ≤ ǫ. Indeed, were some remaining codeword to have error probability exceeding ǫ, then each of the removed codewords would as well, so that

P_e^n = [ Σ_{i∈S} P_e^n(t_i) + Σ_{i∈[2^{n(R+δ/2)}]\S} P_e^n(t_i) ] / 2^{n(R+δ/2)} > ( 2^{n(R+δ/2)−1} · ǫ ) / 2^{n(R+δ/2)} = ǫ/2,

where S is the set of 2^{n(R+δ/2)−1} codewords with smallest error probability, contradicting P_e^n ≤ ǫ/2. Thus, we conclude that each remaining codeword in our codebook of size 2^{n(R+δ/2)−1} has error probability bounded above by ǫ. Since 2^{n(R+δ/2)−1} > 2^{nR} for n sufficiently large, we may select M := 2^{nR} codewords from those remaining, and these have maximum error probability bounded above by ǫ. Call these t_1, ..., t_M.

Finally, we show (a). Let τ_{TT'S} be some type class. If R + δ > I(T';TS), Lemma C.1 gives that for any i ∈ [M], and any sequence s,

|{j : (t_i, t_j, s) ∈ τ_{TT'S}}| ≤ 2^{n|(R+δ) − I(T';TS)|^+ + nǫ'}  (80)
= 2^{n(R − I(T';TS)) + n(δ + ǫ')}  (81)
< 2^{n|R − I(T';TS)|^+ + nǫ},  (82)

where (82) follows from the upper bound ǫ' < ǫ − δ. On the other hand, if R + δ ≤ I(T';TS), then R < I(T';TS), and

|{j : (t_i, t_j, s) ∈ τ_{TT'S}}| ≤ 2^{n|(R+δ) − I(T';TS)|^+ + nǫ'}  (83)
= 2^{n|R − I(T';TS)|^+ + nǫ'}  (84)
< 2^{n|R − I(T';TS)|^+ + nǫ}.  (85)  □

Appendix D
Lemma V.12
Proof of Lemma V.12.
We will use the following notation: if |A − B| < ǫ, we write A =_ǫ B. For distributions, "=_ǫ" indicates that this holds for each set of realizations of the random variables involved.

Let τ_{TZS} be the joint type class for the sequences t_i, z, and s. Note that here T, Z, S represent artificial random variables. We will use Q to denote the distribution of these artificial random variables (e.g., Q(t, z, s) is the joint type), and P to denote the actual distribution that variables are drawn from. So, for example, we can assume that t_i, z are jointly typical with respect to P(t, z), i.e., Q(t, z) =_ǫ P(t, z). Let X, Y be random vectors drawn from the distribution

P(x | t_i, z) W(y | x, s)  (86)

where P(x | t_i, z) is the conditional distribution for X from the Markov chain T −∘− X −∘− Z. We wish to upper bound

Pr{ ∃ j ≠ i : (t_j, Y) ∈ T_ǫ^{(n)} }.  (87)

We can split this probability based on the joint type class of (t_i, t_j, z, s, X, Y), restricted to those for which t_j is jointly typical with Y:

Σ_{τ_{TT'ZSXY} s.t. τ_{T'Y} ⊂ T_ǫ^{(n)}} Pr{ ∃ j ≠ i : (t_i, t_j, z, s, X, Y) ∈ τ_{TT'ZSXY} }.  (88)

Since there are only polynomially many types, we only need to show that each of the probabilities in (88) is exponentially vanishing. Note that, by the law of large numbers, for n sufficiently large, Q(x, y | t, z, s) =_ǫ P(x | t, z) W(y | x, s). Given any joint type, we may write

Pr{ ∃ j ≠ i : (t_i, t_j, z, s, X, Y) ∈ τ_{TT'ZSXY} }
≤ Σ_{j ≠ i s.t. (t_i, t_j, z, s) ∈ τ_{TT'ZS}} Pr{ (t_i, t_j, z, s, X, Y) ∈ τ_{TT'ZSXY} }
≤ |{j ≠ i : (t_i, t_j, z, s) ∈ τ_{TT'ZS}}| 2^{n(−I(XY;T'|TZS) + ǫ)}  (89)
≤ 2^{n(|R − I(T';STZ)|^+ − I(XY;T'|STZ) + 2ǫ)},  (90)

where (89) follows from the joint typicality lemma, and (90) follows from our choice of codebook satisfying Lemma V.11.

We now have two cases: the first is that R ≥ I(T';STZ).
If this holds, then

Pr{ ∃ j ≠ i : (t_i, t_j, z, s, X, Y) ∈ τ_{TT'ZSXY} }
≤ 2^{n(R − I(T';STZ) − I(XY;T'|STZ) + 2ǫ)}  (91)
= 2^{n(R − I(T';STZXY) + 2ǫ)}  (92)
≤ 2^{n(R − I(T';Y) + 2ǫ)}.  (93)

Recall that τ_{T'Y} ⊂ T_ǫ^{(n)}, and so I(T';Y) ≥ C' − ǫ' for some ǫ' > 0 with ǫ' → 0 as ǫ → 0. Thus, for ǫ sufficiently small, R < I(T';Y) and I(T';Y) − R > 2ǫ simultaneously, so (93) vanishes exponentially, and we are done.

The second case is that R < I(T';STZ). Here, (90) reduces to

Pr{ ∃ j ≠ i : (t_i, t_j, z, s, X, Y) ∈ τ_{TT'ZSXY} } ≤ 2^{n(−I(XY;T'|STZ) + 2ǫ)}.  (94)

If I(XY;T'|TZS) > 3ǫ, (94) will be exponentially vanishing, and we are again done. So, now consider just the type classes τ_{TT'ZSXY} such that I(XY;T'|TZS) ≤ 3ǫ. That is, the Markov chain T' → STZ → XY approximately holds for the empirical distribution, so that

Q(t, t', z, s, x, y) =_{ǫ_1} Q(t, t', z, s) Q(x, y | t, z, s),  (95)

where ǫ_1 goes to 0 with ǫ. By the assumption that Q(x, y | t, z, s) =_ǫ P(x | t, z) W(y | x, s), we have

Q(t, t', z, s, x, y) =_{ǫ_2} Q(t, t', z, s) P(x | t, z) W(y | x, s) = Q(t, z) P(x | t, z) Q(t', s | t, z) W(y | x, s),  (96)

where ǫ_2 = ǫ_1 + ǫ. We can also assume that Q(t, x, z) =_ǫ P(t, x) U(z | x), and Q(x | t, z) =_ǫ P(x | t, z), so we have

Q(t, t', z, s, x, y) =_{ǫ_3} P(t, x) U(z | x) Q(t', s | t, z) W(y | x, s),  (97)

where ǫ_3 = ǫ_2 + 2ǫ. In addition, we know that

Q(t', y) =_ǫ Σ_x P(t', x) W(y | x, s).  (98)

Therefore,

Σ_{t,x,z,s} P(t, x) U(z | x) Q(t', s | t, z) W(y | x, s) =_{ǫ_4} Σ_x P(t', x) W(y | x, s),  (99)

where ǫ_4 is a sum of the above approximation errors and again goes to 0 with ǫ. However, we will show that as a consequence of our code design, there is in fact no conditional distribution Q(t', s | t, z) that satisfies the above.
In particular, since we can make ǫ_4 as small as we like, it is enough to show that if Q(t', s | t, z) satisfies

Σ_{t,x,z,s} P(t, x) U(z | x) Q(t', s | t, z) W(y | x, s) = Σ_x P(t', x) W(y | x, s),  (100)

then Q(t' | t) = 1{t' = t}, which is a contradiction to our choice of codebook.

Define Ỹ = X ⊕ S, so the channel W_{Y|X,S} is broken down into a deterministic part from (X, S) to Ỹ, followed by a BSC(p) from Ỹ to Y. We may then rewrite (100) as

Σ_{t,x,z,s,ỹ} P(t, x) U(z | x) Q(t', s | t, z) 1{ỹ = x ⊕ s} W(y | ỹ) = Σ_x P(t', x) W(y | x, s).  (101)

Consider t' = 0. By our encoding procedure, it is then the case that

Σ_{t,x,z,s,ỹ} P(t, x) U(z | x) Q(t' = 0, s | t, z) 1{ỹ = x ⊕ s} W(y | ỹ) = { 1 − p if y = 0, p if y = 1 }.  (102)

Define a(ỹ) := Σ_{t,x,z,s} P(t, x) U(z | x) Q(t' = 0, s | t, z) 1{ỹ = x ⊕ s}. Since W(y | ỹ) is a BSC(p), and p ≠ 0, 1/2, the above equation simultaneously holding for y = 0 and y = 1 requires

a(0)(1 − p) + a(1) p = 1 − p and a(0) p + a(1)(1 − p) = p.

Solving this system yields a(1) = 0, or, expanded:

Σ_{t,x,z,s} P(t, x) U(z | x) Q(t' = 0, s | t, z) 1{1 = x ⊕ s} = 0.  (103)

Since each term in this sum is non-negative, they all must equal 0. In particular, suppose there exists a pair (z, s) where Q(t' = 0, s | t = 1, z) > 0. Then, noting that we only need to consider x = s ⊕ 1, we have

P(t = 1, x = s ⊕ 1) U(z | x = s ⊕ 1) Q(t' = 0, s | t = 1, z) = 0, so that P(t = 1, x = s ⊕ 1) U(z | x = s ⊕ 1) = 0.  (104)

But this is impossible, since P(t = 1, x) > 0 for all x, and U(z | x) > 0 for all z, x (here we use the assumption that 0 < q < 1/2). Thus Q(t' = 0, s | t = 1, z) = 0 for all s, z. Therefore, Q(t' = 0 | t = 1) = 0, so Q(t' = 1 | t = 1) = 1. Since T and T' must have the same marginal distribution, Q(t) = Q(t') = 1/2 for all t, t', we have Q(t = 1 | t' = 1) = 1, which in turn implies that Q(t = 0 | t' = 1) = 0, and Q(t' = 1 | t = 0) = 0. In other words, Q(t' | t) = 1{t' = t}. However, this is impossible based on the fact that our codebook satisfies Lemma V.11. In other words, there is no consistent type class τ_{TT'ZSXY} such that I(XY;T' | STZ) ≤ 3ǫ. This completes the proof of the lemma. □