Rate-Distortion Theory for Secrecy Systems
Curt Schieler and Paul Cuff
Abstract—Secrecy in communication systems is measured herein by the distortion that an adversary incurs. The transmitter and receiver share secret key, which they use to encrypt communication and ensure distortion at an adversary. A model is considered in which an adversary not only intercepts the communication from the transmitter to the receiver, but also potentially has side information. Specifically, the adversary may have causal or noncausal access to a signal that is correlated with the source sequence or the receiver's reconstruction sequence. The main contribution is the characterization of the optimal tradeoff among communication rate, secret key rate, distortion at the adversary, and distortion at the legitimate receiver. It is demonstrated that causal side information at the adversary plays a pivotal role in this tradeoff. It is also shown that measures of secrecy based on normalized equivocation are a special case of the framework.
Index Terms—Rate-distortion theory, information-theoretic secrecy, shared secret key, causal disclosure, soft covering lemma, equivocation.
I. INTRODUCTION
In "Communication Theory of Secrecy Systems" [6], Shannon regarded a communication system as perfectly secret if the source and the eavesdropped message are statistically independent. The secrecy system studied in [6] is referred to as the "Shannon cipher system" and is depicted in Fig. 1. A necessary and sufficient condition for perfect secrecy is that the number of secret key bits per source symbol exceeds the entropy of the source. When the amount of key is insufficient, one must relax the requirement of statistical independence and invite new measures of secrecy.

One common way of measuring sub-perfect secrecy is with equivocation, the conditional entropy H(X|M) of the source given the public message. The use of equivocation as a measure of secrecy was considered in the original work on the wiretap channel in [7] and [8], and it continues today. Although a distortion-based approach to secrecy might appear incomparable at first glance, it turns out that equivocation (when normalized by blocklength) becomes a special case of the framework developed here, under the proper choice of distortion measure.

In this work, we study an information-theoretic measure of secrecy that is directly inspired by rate-distortion theory. Whereas the objective in classical rate-distortion theory is to minimize a receiver's distortion for a given rate of communication, our goal is to maximize an eavesdropper's distortion for a given rate of secret key.

Fig. 1: The Shannon cipher system. Nodes A, B, and C are the transmitter, receiver, and eavesdropper, respectively.

If we relax the requirement of lossless communication in Shannon's cipher system, then our goal is to maximize an eavesdropper's distortion for a given secret key rate, communication rate, and distortion tolerance at the receiver. Although there are a variety of secrecy systems other than Shannon's cipher system (such as a wiretap channel [7] or distributed correlated sources [9], [10]), this paper is concerned exclusively with settings involving shared secret key, a single discrete memoryless source, and a noiseless channel. Moreover, we focus on block codes in the regime of blocklength tending to infinity.

When distortion is used as a measure of secrecy, we are implicitly viewing an eavesdropper in the same way that one views a receiver in a standard rate-distortion setting: as an active participant whose goal is to produce a sequence that is statistically correlated with the source sequence. Because he plays an active role, the eavesdropper is thought of as an adversarial entity. To ensure robustness, we will design the communication and encryption schemes against the worst-case adversarial strategy; that is, we wish to maximize the minimum distortion attainable by an adversary.

The study of information-theoretic secrecy via rate-distortion theory was initiated by Yamamoto in [11], in which the rate-distortion region was characterized for the special setting in which no secret key is available. Later, in "Rate-Distortion Theory for the Shannon Cipher System" [12], Yamamoto considered the exact problem we have heretofore described, but only obtained an inner and outer bound on the achievable rate-key-distortion region. In this paper, we characterize the region; however, it is not our main focus. The following example serves to illustrate the care that should be exercised in a distortion-based approach to secrecy and motivates our primary investigation, which is centered around a salient feature of our model referred to as causal disclosure.

Footnote: This work was supported in part by the National Science Foundation under Grants CCF-1116013 and CCF-1017431, and also by the Air Force Office of Scientific Research under Grant FA9550-12-1-0196. Portions of this paper were presented in [1], [2], [3], [4], [5]. The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (email: [email protected]; [email protected]).

Footnote: The inner bound provided in [12] is precisely the region expressed in (49) of this work. Corollary 4 shows that this performance is achievable even if additional information is available to the eavesdropper, but it is suboptimal for the problem at hand. The outer bound in [12] makes use of two auxiliary variables, but with the appropriate selection can be shown to be equivalent to the trivial bound in (48), which in fact we show to be achievable. To show that the outer bound in [12] is trivial, the variables U and V can be selected as follows. Let U be independent of X and Y and uniformly distributed on {1, ..., |X|}. Let V = U + X modulo |X|.

A. One-bit secrecy and causal disclosure
Consider an n-bit i.i.d. source sequence X^n ≜ (X_1, ..., X_n) with X_i ~ Bern(1/2). Suppose common randomness K ~ Bern(1/2) is available to the transmitter and receiver; that is, there is one bit of shared secret key. Now suppose the transmitter uses K to encrypt X^n by transmitting the n-bit message X̃^n, where X̃_i = X_i ⊕ K. In other words, he flips all of the bits of X^n if K = 1; otherwise he simply sends X^n. Upon intercepting the public message X̃^n, the adversary produces a reconstruction Z^n and incurs expected distortion E[(1/n) Σ_{i=1}^n d(X_i, Z_i)], where d(x, z) is a per-letter distortion measure. If d(x, z) = 1{x ≠ z}, then an optimal strategy for the adversary is to simply set Z^n = X̃^n, yielding an expected distortion of 1/2. Observe that 1/2 is also the maximum possible expected distortion that we could ever force on the adversary, regardless of the amount of secret key available! It appears as though we have maximized secrecy by only using one bit of secret key for an arbitrarily long n-bit source. However, this view is severely misleading because the adversary actually knows a great deal about X^n, namely that it is one of only two candidate sequences.

This example demonstrates the potential fragility of using distortion to measure secrecy without recognizing the ramifications. For, although maximum secrecy (in the distortion sense) is attained, it vanishes altogether if the adversary views just one true bit of the source sequence (the bit allows him to determine whether or not to flip the X̃^n sequence). In general, the consequences of this example apply to the setting that Yamamoto considered in [12].
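The fragility above is easy to confirm by brute force. The following sketch (illustrative code, exact enumeration over a short block rather than the asymptotic setting) checks that against the one-bit XOR scheme an adversary who sees only the public message incurs Hamming distortion 1/2, while disclosure of a single true source bit reveals K and drives the distortion to zero.

```python
import itertools

def expected_adversary_distortion(n, disclose_first_bit):
    """Average Hamming distortion of a simple adversary strategy, exactly
    averaged over all (source, key) realizations of the one-bit XOR scheme."""
    total = 0.0
    count = 0
    for bits in itertools.product([0, 1], repeat=n):
        for k in [0, 1]:
            msg = tuple(b ^ k for b in bits)  # one-time pad with one key bit
            if disclose_first_bit:
                # Causal disclosure of X_1 reveals K = msg_1 XOR X_1,
                # so the adversary decrypts perfectly.
                key_guess = msg[0] ^ bits[0]
                z = tuple(m ^ key_guess for m in msg)
            else:
                # Without disclosure, Z^n = msg is optimal under Hamming distortion.
                z = msg
            total += sum(b != zi for b, zi in zip(bits, z)) / n
            count += 1
    return total / count

print(expected_adversary_distortion(4, False))  # 0.5, the maximum distortion
print(expected_adversary_distortion(4, True))   # 0.0, secrecy vanishes
```

The first strategy attains the maximum distortion 1/2 with a single key bit; the second shows that one disclosed source symbol collapses it entirely.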
An arbitrarily small rate of secret key is enough to guarantee maximum distortion, but such secrecy is weak in the sense that even a small amount of additional knowledge (for example, observation of a few source symbols) is enough for the adversary to completely identify the source sequence.

The way that we strengthen a distortion-based approach to secrecy is through an assumption of causal disclosure, in which we design codes under the supposition that the adversary has noisy (or noiseless) access to the past behavior of the system. For example, in the one-bit secrecy example we might assume that the adversary produces the i-th reconstruction symbol Z_i based not only on the public message M, but also on the past source symbols X^{i-1}. Incidentally, such a modification to the standard rate-distortion setting does not change the theory, though it has a dramatic effect in this secrecy setting. Regardless of whether or not an adversary actually has access to such information, designing our encryption under the assumption that he does leads to a much more robust notion of secrecy. In particular, it is resistant to disruptions in secrecy like those exhibited in the example. Despite the "pessimistic" nature of the causal disclosure assumption, we find that the optimal tradeoff between secret key and distortion in this regime is reasonable and not degenerate.

The assumption of causal disclosure is relevant not only for the sake of robustness, but also for its natural interpretations. In [13], an alternative view of rate-distortion theory was introduced in which source and reconstruction sequences are regarded as sequences of actions in a distributed system. Communication is used to coordinate the receiver's actions with the transmitter's actions (which are given by nature). In this context, an adversary can be viewed as an active participant in the system who produces a sequence of actions.
With this interpretation, it is not unrealistic to assume that the adversary could have causal access to the system behavior. Depending on where the adversary is intercepting communication, he might be able to view the past actions of the transmitter or receiver (or both) and produce his current action accordingly.

We find that optimal communication in this setting is not only fundamentally different from that of other source coding problems (often requiring a stochastic decoder), but in fact lends itself to a simple interpretation of injecting artificial memoryless noise into the adversary's received signal.

B. Organization
The content of this paper is as follows. In Section II, we describe the problem setup. In Section III, we present a generalized version of the one-bit secrecy example in which there is no assumption of causal disclosure. In Section IV, we state our main result, Theorem 1, in which causal disclosure is a primary assumption. Theorem 1 describes the optimal relationship among the communication rate, secret key rate, and distortion at the legitimate receiver and adversary. Section IV also establishes a number of relevant corollaries to Theorem 1 and provides several concrete examples of the corresponding information-theoretic tradeoff regions. In Section V, we demonstrate how normalized equivocation arises as a special case of the causal disclosure framework. In Section VI, we give the achievability proof of Theorem 1. The proof uses a stochastic "likelihood encoder" that enables tractable analysis when combined with a "soft covering lemma". Afterward, we discuss several important properties and implications of the optimal communication scheme used in the proof. Section VII provides the converse proof of Theorem 1. In Section VIII, we consider some settings with noncausal disclosure that are not subsumed by Theorem 1, but that can be proved similarly. Lastly, Section IX gives results for settings involving causal disclosure with delay greater than one.

II. PRELIMINARIES
The communication system model used throughout is shown in Fig. 2. The transmitting node, Node A, observes an i.i.d. source sequence X^n ≜ (X_1, ..., X_n), where X_i is distributed according to P_X. Nodes A and B share a source of common randomness K ∈ [2^{nR_0}], referred to as secret key, that is uniformly distributed and independent of X^n. Based on the source block X^n and the secret key K, Node A transmits a message M ∈ [2^{nR}] that is received without loss by Nodes B and C. Once M is delivered, all three nodes sequentially produce actions: in the i-th step, Nodes A, B, and C produce X_i, Y_i, and Z_i, respectively. Note that Node A has no control over his actions; they are simply given by X^n. At the other end, Node B produces Y_i based on the pair (M, K),

Fig. 2: Nodes A and B use secret key K and public communication M to coordinate against an adversarial Node C. At each step i, Node C can view the past behavior of the system, (W_x^{i-1}, W_y^{i-1}), where W_x^n is the output of a memoryless channel ∏ P_{W_x|X} with input X^n, and W_y^n is the output of a memoryless channel ∏ P_{W_y|Y} with input Y^n.

and the adversarial Node C produces Z_i based on M and his observation of the past behavior of the system, (W_x^{i-1}, W_y^{i-1}). At each step, the joint actions of the players incur a value π(x, y, z), which represents symbol-wise payoff; the block-average payoff is given by

  (1/n) Σ_{i=1}^n π(X_i, Y_i, Z_i).    (1)

Nodes A and B want to cooperatively maximize payoff, while Node C wants to minimize payoff through his actions Z^n. This payoff function can take the role of distortion incurred by Node C, corresponding to the secrecy metric described in the introduction. Note that instead of evaluating secrecy and coordination separately, which could be done with two payoff functions π(x, y) and π(x, y, z), we have unified them in a single function π(x, y, z). Of course, the use of multiple payoff functions does have its own merits, and the results extend readily.

In Fig. 2, we depict noisy causal disclosure by (W_x^{i-1}, W_y^{i-1}), where W_x^n is the output of a memoryless channel ∏_{i=1}^n P_{W_x|X} with input X^n, and W_y^n is the output of a memoryless channel ∏_{i=1}^n P_{W_y|Y} with input Y^n. Modeling the side information in this way covers a variety of scenarios. For example, if P_{W_x|X} and P_{W_y|Y} are identity channels, resulting in (W_x, W_y) = (X, Y), then the adversary has full causal access to (X^{i-1}, Y^{i-1}). This is the strongest definition of secrecy in the causal disclosure framework and leads to the design of a thoroughly robust secrecy system.
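To make the interfaces concrete, here is a minimal sketch of the model in Fig. 2; the distributions, the toy lossless decoder, and the adversary strategy are all made up for illustration. A memoryless disclosure channel produces W_x^n from X^n, a causal adversary maps (M, W_x^{i-1}) to Z_i, and the block-average payoff (1) is evaluated.

```python
import random

random.seed(0)

def dmc(seq, channel):
    """Pass a sequence through a memoryless channel given by P(w|x)."""
    return [random.choices(list(channel[s]), weights=list(channel[s].values()))[0]
            for s in seq]

n = 6
x = [random.choice([0, 1]) for _ in range(n)]        # i.i.d. source actions
y = x[:]                                             # toy decoder: lossless copy
P_Wx_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # noisy disclosure of X
wx = dmc(x, P_Wx_X)

def adversary(m, wx_past):
    # A causal strategy z_i(m, w^{i-1}): guess the most recently disclosed symbol.
    return wx_past[-1] if wx_past else 0

payoff = lambda xi, yi, zi: int(xi != zi)            # Hamming payoff pi(x, y, z)
z = [adversary(None, wx[:i]) for i in range(n)]
print(sum(payoff(x[i], y[i], z[i]) for i in range(n)) / n)  # block-average payoff (1)
```

With a noisier disclosure channel the adversary's guesses degrade toward source-statistics guessing, which is exactly the effect the optimal scheme of Section VI induces deliberately.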
If (W_x, W_y) = (∅, ∅), then the adversary is completely blind to the past and only views the public message M; this is the setting of [12], which does not include causal disclosure. We remark that other strong security definitions involving side information leaks to the adversary can be found in [14], for example.

Throughout, we assume that the alphabets X, Y, and Z are finite. We denote the set {1, ..., m} by [m] and use Δ_A to denote the probability simplex of distributions with alphabet A. The notation X ⊥ Y indicates that the random variables X and Y are independent, and X − Y − Z indicates a Markov chain relationship.

Definition 1: An (n, R, R_0) code consists of an encoder f : X^n × [2^{nR_0}] → [2^{nR}] and a decoder g : [2^{nR}] × [2^{nR_0}] → Y^n. More generally, we allow a stochastic encoder P_{M|X^n,K} and a stochastic decoder P_{Y^n|M,K}. An (n, R, R_0) code is said to have blocklength n, communication rate R, and secret key rate R_0.

Permitting stochastic decoders that use local randomization is crucial (in contrast to Wyner's wiretap channel, in which a stochastic encoder is needed). On the other hand, it is likely that the optimal encoder can be a deterministic function of the message and key, but this has not been shown. The proof of our main result uses a stochastic encoder and stochastic decoder.

Nodes A and B use an (n, R, R_0) code to coordinate against Node C. To ensure robustness, we consider the payoff that can be assured against the worst-case adversary, i.e., the max-min payoff. There are several ways to define the payoff criterion for a block, and we consider three: expected payoff, assured payoff, and symbol-wise minimum payoff. To distinguish among the three criteria, we use the monikers AVG, WHP, and MIN, respectively.
Definition 2: Fix a source distribution P_X, a symbol-wise payoff function π : X × Y × Z → R, and causal disclosure channels P_{W_x|X} and P_{W_y|Y}. For simplicity, denote the pair (W_x^n, W_y^n) by W^n. The triple (R, R_0, Π) is achievable if there exists a sequence of (n, R, R_0) codes such that

• Under the AVG criterion (expected payoff):

  liminf_{n→∞} min_{{P_{Z_i|M,W^{i-1}}}_{i=1}^n} E[(1/n) Σ_{i=1}^n π(X_i, Y_i, Z_i)] ≥ Π.    (2)

• Under the WHP criterion (assured payoff):

  lim_{n→∞} min_{{P_{Z_i|M,W^{i-1}}}_{i=1}^n} P[(1/n) Σ_{i=1}^n π(X_i, Y_i, Z_i) ≥ Π] = 1.    (3)

• Under the MIN criterion (symbol-wise minimum payoff):

  liminf_{n→∞} min_{i∈[n]} min_{P_{Z|M,W^{i-1}}} E π(X_i, Y_i, Z) ≥ Π.    (4)

Under the WHP criterion, the range of π(x, y, z) is extended to include −∞ so that lossless communication settings can be recovered.

Several remarks concerning the preceding definitions are in order.

1) Although WHP and MIN are incomparable, they are both stronger than AVG. However, it will be shown that all three criteria give rise to the same optimal tradeoff region.

2) In each of the criteria, we allow the adversary to employ his best set of probabilistic strategies {P_{Z_i|M,W^{i-1}}}_{i=1}^n that minimize payoff. However, since the expectation is linear in P_{Z_i|M,W^{i-1}} for each i, it is minimized by extreme points of the probability simplex; thus, we can assume that Node C uses a set of deterministic strategies, {z_i(m, w^{i-1})}_{i=1}^n.
3) It is assumed (although not explicit in the notation) that the adversary has full knowledge of the source distribution and the code that Nodes A and B use.

4) The optimal payoff does not increase if Node B is given direct causal access to Nodes A and C (i.e., if the decoder is given by {P_{Y_i|M,K,X^{i-1},Z^{i-1}}}_{i=1}^n instead of simply P_{Y^n|M,K}). This is shown in Section VII in the converse proof of the main result.
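Remark 2 can be checked numerically. The sketch below (with a hypothetical posterior and Hamming payoff) verifies that, because the expected payoff is linear in the adversary's mixed strategy, no randomization gives the adversary a lower expected payoff than the best deterministic action.

```python
import random

random.seed(1)
posterior = [0.2, 0.5, 0.3]             # hypothetical P(X = x | M = m)
pi = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # Hamming payoff pi(x, z)

def expected_payoff(strategy):
    # strategy: a probability distribution over the adversary's actions z
    return sum(posterior[x] * strategy[z] * pi[x][z]
               for x in range(3) for z in range(3))

# Deterministic strategies are point masses on a single action z = k.
best_det = min(expected_payoff([float(z == k) for z in range(3)])
               for k in range(3))

# Random mixtures are convex combinations, so none beats the best vertex.
for _ in range(1000):
    w = [random.random() for _ in range(3)]
    mixed = [v / sum(w) for v in w]
    assert expected_payoff(mixed) >= best_det - 1e-12

print(round(best_det, 10))  # 0.5: a point mass on the posterior mode is optimal
```

This is exactly the extreme-point argument in remark 2: the minimum of a linear function over the simplex is attained at a vertex, i.e., at a deterministic strategy.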
Definition 3:
The rate-payoff region R_AVG is the closure of the set of achievable triples (R, R_0, Π) under payoff criterion AVG. Regions R_WHP and R_MIN are defined in the same way.

III. ONE-BIT SECRECY, GENERALIZED
In this section, we expand on the scenario in which lossless communication is required between Nodes A and B and there is no causal disclosure of the system behavior to Node C. This is Yamamoto's setting in [12]. Although the result of this section is a special case of the main result in Theorem 1, it is an illustrative starting point.

For lossless communication, an additional achievability criterion is required, as stated below. Since X^n must equal Y^n with high probability, the payoff function is of the form π(x, z). Thus, the achievability criteria for (R, R_0, Π) under the MIN payoff criterion (which is stronger than the AVG criterion) are that

  lim_{n→∞} P[X^n ≠ Y^n] = 0    (5)

and

  liminf_{n→∞} min_{i∈[n]} min_{z(m)} E π(X_i, z(M)) ≥ Π.    (6)

Proposition 1:
Fix P_X and π(x, z). If lossless communication is required and there is no causal disclosure, then R_MIN, the rate-payoff region under payoff criteria AVG and MIN, is equal to

  { (R, R_0, Π) : R ≥ H(X), R_0 ≥ 0, Π ≤ min_z E π(X, z) }.    (7)

(Note that R_0 = 0 is only included because we defined the region as the closure of achievable triples; moreover, R_0 = 0 refers to a vanishing rate of secret key and is not the same as the absence of key.)

Thus, any positive rate of secret key guarantees maximum secrecy (in the distortion sense), as Node C can achieve min_z E π(X, z) by only knowing the source statistics. In fact, we now prove that each point in (7) can be achieved with key alphabet K = [n] instead of K = [2^{nR_0}]. This shows that even if the number of secret key bits is sublinear in the blocklength (in this case, log n), one can still force the eavesdropper to incur the maximum distortion; in [3], we show that an arbitrarily slow rate of increase of the key alphabet is sufficient, even slower than log n, under the AVG criterion. As in the example of one-bit secrecy, such guarantees are shattered if even a small amount of source information is available to the adversary.

The following lemma is useful for the payoff analysis.
Lemma 1:
Let P_{XYZ} be a Markov chain X − Y − Z, and let f be an arbitrary function. Then

  min_{g(x,y)} E f(g(X, Y), Z) = min_{g(y)} E f(g(Y), Z).    (8)

Proof:
We have

  min_{g(x,y)} E f(g(X, Y), Z)
    = min_{g(x,y)} Σ_{x,y} P_{X,Y}(x, y) E[f(g(x, y), Z) | (X, Y) = (x, y)]    (9)
    = Σ_{x,y} P_{X,Y}(x, y) min_g E[f(g, Z) | (X, Y) = (x, y)]    (10)
    (a)= Σ_{x,y} P_{X,Y}(x, y) min_g E[f(g, Z) | Y = y]    (11)
    = min_{g(y)} E f(g(Y), Z),    (12)

where (a) follows from the Markov assumption.

Now we prove Proposition 1.

Proof of Proposition 1: Converse.
By the converse to the lossless source coding theorem, if (5) holds then we must have R ≥ H(X). To see that the payoff never exceeds min_z E π(X, z), observe that the adversary can always let Z^n equal (z*, ..., z*), where

  z* = argmin_z E π(X, z).    (13)

Note that this converse argument holds for all three payoff criteria.

Achievability.
Let ε > 0. Denote the empirical distribution (also referred to as the type) of a sequence x^n by P_{x^n}:

  P_{x^n}(x) = (1/n) Σ_{i=1}^n 1{x_i = x}.    (14)

The set of ε-typical sequences is defined as

  T_ε^n ≜ { x^n : |P_{x^n}(x) − P_X(x)| < ε P_X(x), ∀x ∈ X }.    (15)

To communicate, Nodes A and B use the set of ε-typical sequences as their codebook, just as in the standard proof of the lossless source coding theorem. If the source sequence X^n is typical, then the index of that codeword is the (pre-encrypted) message; if the source sequence is not typical, an arbitrary index is selected. Due to familiar properties of the size and probability of the typical set, the rate of communication is (1 + ε) H(X) and the probability of error is

  P[X^n ≠ Y^n] < ε    (16)

for large enough n.

The message will be encrypted using common randomness K ~ Unif[n]; this implies that the rate of secret key approaches zero as blocklength tends to infinity. In order to encrypt, we first partition T_ε^n into bins of size n (in a manner specified shortly), and use K to apply a one-time pad to the location of the source sequence X^n within the appropriate bin. More precisely, the encoder operates as follows: if X^n is ε-typical and is the L-th sequence in the J-th bin, then transmit the message M = (J, L ⊕ K), where ⊕ indicates addition modulo n. By encrypting in this manner, the adversary knows which bin X^n lies in (bin J), but does not know which of those n sequences it is, because L is independent of L ⊕ K. Using the secret key, Node B can recover both J and L and produce the corresponding sequence.

The partitioning of T_ε^n is done according to the following equivalence relation:

  x^n ∼ y^n if x^n is a cyclic permutation of y^n.    (17)

Although the resulting partition can contain bins of size less than n, the number of such bins is small enough that we can ignore them without affecting the communication rate or (16). Thus, we assume that partitioning T_ε^n yields only bins of size n. Due to (17), it can be readily shown that each bin of size n has the following property.

Property 1:
View the j-th bin (denoted by b_j) as an n × n matrix whose columns are formed from the sequences in the bin. Then every row and column of the matrix has the same empirical distribution (denoted by P_j), and hence every row has the same probability (denoted by α_j) under the source distribution ∏_{i=1}^n P_X(x_i).

This property is the crux of the proof; we offer the following intuition for why it implies that the eavesdropper suffers maximal distortion. The eavesdropper knows which bin X^n lies in, but does not know where it lies in the bin. Because of how we partitioned T_ε^n, the eavesdropper's uncertainty is spread uniformly over the bin. To estimate X_i, the eavesdropper consults the i-th row of the bin; however, Property 1 ensures that the empirical distribution of this row matches the type of the sequences in the bin, which in turn approximates the source distribution P_X (due to typicality). Therefore, the eavesdropper's estimate of X_i is based on no more than the original source statistics, which means that he suffers maximal distortion.

We now analyze the distortion precisely. For sufficiently large n, we have for all i ∈ [n] that

  min_{z(m)} E π(X_i, z(M))
    = min_{z(j,l)} E π(X_i, z(J, L ⊕ K))    (18)
    (a)= min_{z(j)} E π(X_i, z(J))    (19)
    = min_{z(j)} Σ_j Σ_{x^n ∈ b_j} p(x^n) π(x_i, z(j))    (20)
    (b)= Σ_j α_j min_z Σ_{x^n ∈ b_j} π(x_i, z)    (21)
    (c)= Σ_j α_j min_z Σ_{x ∈ X} n P_j(x) π(x, z)    (22)
    (d)≥ Σ_j n α_j min_z Σ_{x ∈ X} (1 − ε) P_X(x) π(x, z)    (23)
    = P[X^n ∈ T_ε^n] (1 − ε) min_z E π(X, z)    (24)
    ≥ (1 − ε)^2 min_z E π(X, z),    (25)

where (a) is due to (X_i, J) ⊥ (L ⊕ K) and Lemma 1, (b) and (c) are due to Property 1, and (d) follows from the definition of T_ε^n. Thus, we have (6).

Discussion
Suppose Nodes A and B use the binning scheme just described in the proof of Proposition 1 to achieve maximum secrecy. What if, instead of eavesdropping only the public message, the adversary is also able to view the past behavior of the system, namely X^{i-1}? Because of the structure of each bin (i.e., Property 1), knowledge of just the first symbol, X_1 = x_1, is enough for the adversary to narrow the list of candidate source sequences from n down to approximately n P_X(x_1). One can see that the adversary will be able to determine the true sequence quickly, well before the end of the block. In this manner, the adversary can take advantage of the causal disclosure to force the payoff to take on its minimum value instead of its maximum value. In general, causal disclosure benefits an adversary and gives rise to a nontrivial tradeoff between secret key and payoff. We remark that one of the key elements in the proof of the main result is that the benefits of causal disclosure can be voided if the right amount of secret key is available. In fact, it will become evident in Section VI that using secret key to sterilize the causal disclosure gives rise to the optimal tradeoff of secret key and payoff.

IV. MAIN RESULT
Our main result is the following.
Theorem 1:
Fix P_X, π(x, y, z), and causal disclosure channels P_{W_x|X} and P_{W_y|Y}. Then R_AVG, the closure of achievable (R, R_0, Π) under payoff criterion AVG, is equal to

  ∪_{W_x − X − (U, V) − Y − W_y} { (R, R_0, Π) :
      R ≥ I(X; U, V),
      R_0 ≥ I(W_x, W_y; V | U),
      Π ≤ min_{z(u)} E π(X, Y, z(U)) },    (26)

where |U| ≤ |X| + 2 and |V| ≤ |X||Y|(|X| + 2) + 1. Furthermore,

  R_AVG = R_WHP = R_MIN.    (27)

We remark that the convexity of R_AVG and R_WHP can be shown from Definitions 2 and 3 by using a standard time-sharing argument. By (27), R_MIN is also a convex set.

We now elaborate on several corollaries to Theorem 1 that are obtained through different choices of the causal disclosure channels P_{W_x|X} and P_{W_y|Y}. To begin, we consider scenarios in which lossless communication is required between Nodes A and B.

A. Lossless communication
In the following, we require X^n to equal Y^n with high probability. That is, we introduce into Definition 2 the additional constraint

  lim_{n→∞} P[X^n ≠ Y^n] = 0.    (28)

Conveniently, (28) can be ensured by considering payoff criterion WHP with a payoff function π(x, y, z) that evaluates to −∞ when x ≠ y.

Corollary 1:
Fix P_X, π(x, z), and causal disclosure channel P_{W_x|X}. If lossless communication is required (i.e., (28) is imposed), then the rate-payoff region R_WHP is equal to

  ∪_{U − X − W_x} { (R, R_0, Π) :
      R ≥ H(X),
      R_0 ≥ I(W_x; X | U),
      Π ≤ min_{z(u)} E π(X, z(U)) }.    (29)

Proof:
Define a payoff function

  π(x, y, z) ≜ { π(x, z) if x = y; −∞ if x ≠ y }.    (30)

When Π > −∞, it is easily verified that

  lim_{n→∞} P[(1/n) Σ_{i=1}^n π(X_i, Y_i, Z_i) ≥ Π − ε] = 1    (31)

if and only if both of the following hold:

  lim_{n→∞} P[X^n = Y^n] = 1,    (32)

  lim_{n→∞} P[(1/n) Σ_{i=1}^n π(X_i, Z_i) ≥ Π − ε] = 1.    (33)

Thus, R_WHP (the region we seek to characterize) is obtained by invoking Theorem 1 with W_y = ∅. However, we want to simplify the region further. Denoting the region in (29) by S, we now show that R_WHP = S.

Note that when Π > −∞, we have

  −∞ < Π ≤ min_{z(u)} E π(X, Y, z(U)),    (34)

which implies X = Y. When combined with the Markov chain X − (U, V) − Y, this gives H(X | U, V) = 0. Therefore, R_WHP ⊆ S follows from writing

  R ≥ I(X; U, V) = H(X),
  R_0 ≥ I(W_x; V | U) = I(W_x; X, V | U) ≥ I(W_x; X | U).

To see S ⊆ R_WHP, let V = Y = X.

Corollary 1, in turn, spawns two important results. By invoking Corollary 1 with W_x = ∅, we recover Proposition 1 under WHP.

Corollary 2:
Fix P_X and π(x, z). If lossless communication is required and there is no causal disclosure, then the rate-payoff region R_WHP is equal to

  { (R, R_0, Π) : R ≥ H(X), R_0 ≥ 0, Π ≤ min_z E π(X, z) }.    (35)

If we instead consider the disclosure channel W_x = X, we have the following.

Corollary 3:
Fix P_X and π(x, z). If lossless communication is required and X^{i-1} is disclosed, then the rate-payoff region R_WHP is equal to

  ∪_{P_{U|X}} { (R, R_0, Π) :
      R ≥ H(X),
      R_0 ≥ H(X | U),
      Π ≤ min_{z(u)} E π(X, z(U)) }.    (36)

B. Lossless communication example
In this section, we present a concrete example of the region in Corollary 3 (causal disclosure of Node A) and compare it to the region in Corollary 2 (no causal disclosure).

We first show that (36) can be written as a linear program. Since the constraint on R is fixed by the source distribution, we focus our attention on the boundary of the (R_0, Π) tradeoff, namely

  Π(R_0) ≜ max_{P_{U|X} : H(X|U) ≤ R_0} min_{z(u)} E π(X, z(U)).    (37)

Notice that this can be rewritten as

  Π(R_0) = max_{P_U, P_{X|U} : Σ_u P_U P_{X|U} = P_X, H(X|U) ≤ R_0} Σ_u P_U(u) min_z E[π(X, z) | U = u].    (38)

If we are able to restrict the set {P_{X|U=u}}_{u∈U} in the maximization to a finite set P ⊆ Δ_X, then Π(R_0) can be expressed as a linear program. Indeed, viewing the distribution P_U as a vector p ∈ R^{|P|}, (38) becomes

  maximize    d^T p
  subject to  p ≥ 0,
              1^T p = 1,
              T p = P_X,
              h^T p ≤ R_0,    (39)

where

• T ∈ R^{|X|×|P|} is the transition matrix whose columns are the elements of P.

• The vector d ∈ R^{|P|} has entries

  d_u = min_z E[π(X, z) | U = u], u ∈ U.    (40)

• The vector h ∈ R^{|P|} has entries

  h_u = H(X | U = u), u ∈ U.    (41)

To see why there is always a choice of finite P such that the rate-payoff boundary is unaffected, consider the function d : Δ_X → R defined by

  d(p) = min_z E[π(X, z)], where X ~ p.    (42)

Observe that d(·) is the boundary of a convex polytope because it is the minimum of |Z| linear functions (and Z is finite). Define the set

  P = { p ∈ Δ_X : d(p) is an extreme point of d }.    (43)

Given a set of distributions {P_{X|U=u}}_{u∈U} that optimize (38), we can write each element P_{X|U=u} as a convex combination of the distributions in P while maintaining the value of
Thus, P issufficient for the optimization.In the particular case that the payoff function is hammingdistance (i.e., π ( x, z ) = { x (cid:54) = z } ), the set P has a particularlyconvenient form: P = { p ∈ ∆ X : p = Unif ( A ) for some A ⊆ X } . (44)This allows us to give the following simple analytical expres-sion for Π( R ) . The proof is given in Appendix A. Theorem 2:
Fix P_X and let π(x, z) = 1{x ≠ z}. Define the function φ(·) as the linear interpolation of the points (log n, (n − 1)/n), n ∈ N. Also, define

  π_max = 1 − max_x P_X(x).    (45)

Then, the boundary of the rate-payoff region when lossless communication is required and X^{i-1} is disclosed can be written as

  Π(R_0) = min{ φ(R_0), π_max }.    (46)

In Fig. 3, we illustrate Theorem 2 for an arbitrary source distribution. Note that when there is no causal disclosure and π(x, z) is Hamming distance, the payoff is given by Corollary 2 as

  min_z E π(X, z) = 1 − max_x P_X(x) = π_max,    (47)

regardless of the rate of secret key. Comparing (47) with min{φ(R_0), π_max} demonstrates the effect of causal disclosure (see Fig. 3). In particular, we see that the assumption that the adversary does not view any of the true source bits can lead to a rather fragile guarantee of maximum secrecy. Indeed, at low rates of secret key, the gap that results from revealing the source causally is the difference between maximum secrecy and zero secrecy. This reduction in payoff is the price that is paid for increased robustness against an adversary (e.g., preventing pitfalls like those that we saw in the example of one-bit secrecy).

From Theorem 2, we also readily see that the payoff can saturate when R_0 < H(X), which shows that maximum payoff is not the same as Shannon's perfect secrecy. For example, if P_X = {1/2, 1/4, 1/4}, then the maximum payoff of 1/2 occurs at R_0 = 1, but H(X) = 1.5.

C. Lossy communication
In the previous section, the communication rate lay above H(X) and did not affect the (R_0, \Pi) tradeoff. However, when the requirement of lossless communication is relaxed, all three quantities interact. There are four natural special cases that are obtained by setting W_x equal to \emptyset or X and setting W_y equal to \emptyset or Y. We denote the corresponding rate-payoff regions as \mathcal{R}_\emptyset, \mathcal{R}_A, \mathcal{R}_B, and \mathcal{R}_{AB} to distinguish which nodes' actions are causally revealed.

Corollary 4:
Fix P_X and \pi(x, y, z). In each of the following, the region holds under all three payoff criteria. (In the points (\log n, \frac{n-1}{n}) of Theorem 2, n does not refer to the blocklength.)

[Figure 3 plots the payoff \Pi against the secret key rate R_0; the marked corner points of the interpolated curve are (\log 2, 1/2), (\log 3, 2/3), (\log 4, 3/4), and (\log 5, 4/5), and the horizontal level is \pi_{\max}.]

Fig. 3:
Illustration of Theorem 2 for a generic source P_X with 1 - \max_x P_X(x) = \pi_{\max}. The solid curve, \Pi(R_0) = \min\{\varphi(R_0), \pi_{\max}\}, is the tradeoff between rate of secret key and payoff under the assumption of causal disclosure (Corollary 3). The loosely dashed line is \pi_{\max}, which also corresponds to the payoff when there is no causal disclosure (Corollary 2). The densely dashed curve is \varphi(R_0).

If there is no causal disclosure, then the rate-payoff region, \mathcal{R}_\emptyset, is equal to

\bigcup_{P_{Y|X}} \{(R, R_0, \Pi) : R \ge I(X; Y),\ R_0 \ge 0,\ \Pi \le \min_z \mathbf{E}\,\pi(X, Y, z)\}.   (48)

If X^{i-1} is disclosed, then the rate-payoff region, \mathcal{R}_A, is equal to

\bigcup_{P_{Y,U|X}} \{(R, R_0, \Pi) : R \ge I(X; Y, U),\ R_0 \ge I(X; Y \mid U),\ \Pi \le \min_{z(u)} \mathbf{E}\,\pi(X, Y, z(u))\}.   (49)

If Y^{i-1} is disclosed, then \mathcal{R}_B is given by directly substituting W_x = \emptyset and W_y = Y in (26). Similarly, if (X^{i-1}, Y^{i-1}) is disclosed, then \mathcal{R}_{AB} is given by directly substituting W_x = X and W_y = Y in (26).

Proof:
Setting (W_x, W_y) = (\emptyset, \emptyset) in Theorem 1 gives \mathcal{R}_\emptyset. Denote the region in (48) by \mathcal{S}. If (R, R_0, \Pi) \in \mathcal{R}_\emptyset, then

R \ge I(X; U, V) = I(X; U, V, Y) \ge I(X; Y)   (50)
\Pi \le \min_{z(u)} \mathbf{E}\,\pi(X, Y, z(U)) \le \min_z \mathbf{E}\,\pi(X, Y, z),   (51)

which gives \mathcal{R}_\emptyset \subseteq \mathcal{S}. To see \mathcal{S} \subseteq \mathcal{R}_\emptyset, let U = \emptyset and V = Y.

Setting (W_x, W_y) = (X, \emptyset) in Theorem 1 gives \mathcal{R}_A. Denote the region in (49) by \mathcal{T}. If (R, R_0, \Pi) \in \mathcal{R}_A, then

R \ge I(X; U, V) = I(X; U, V, Y) \ge I(X; U, Y)   (52)
R_0 \ge I(X; V \mid U) = I(X; V, Y \mid U) \ge I(X; Y \mid U),   (53)

which gives \mathcal{R}_A \subseteq \mathcal{T}. To see \mathcal{T} \subseteq \mathcal{R}_A, let V = Y.

D. Lossy communication examples
In this section, we investigate concrete examples of Corollary 4 by considering the payoff function

\pi(x, y, z) = \mathbf{1}\{x = y, x \neq z\}.   (54)

For this choice, the block-average payoff is the fraction of symbols in a block that Nodes A and B are able to agree on and keep hidden from Node C.

We now present achievable regions for the cases of Corollary 4 when P_X \sim \mathrm{Bern}(1/2) and \pi(x, y, z) is given by (54). The region that we give for \mathcal{R}_\emptyset is optimal, and numerical computation suggests that the other regions are optimal as well. Setting P_{Y|X} = \mathrm{BSC}(\alpha), we have

\mathcal{R}_\emptyset = \bigcup_{\alpha \in [0, 1/2]} \{(R, R_0, \Pi) : R \ge 1 - h(\alpha),\ R_0 \ge 0,\ \Pi \le (1 - \alpha)/2\}.   (55)

If we let U = \emptyset and P_{Y|X} = \mathrm{BSC}(\alpha), then we have

\mathcal{R}_A \supseteq \bigcup_{\alpha \in [0, 1/2]} \{(R, R_0, \Pi) : R \ge 1 - h(\alpha),\ R_0 \ge 1 - h(\alpha),\ \Pi \le (1 - \alpha)/2\}.   (56)

Letting U = \emptyset, P_{Y|X} = \mathrm{BSC}(\alpha), and P_{V|Y} = \mathrm{BSC}(\beta) gives

\mathcal{R}_B \supseteq \bigcup_{\alpha, \beta \in [0, 1/2]} \{(R, R_0, \Pi) : R \ge 1 - h(\alpha),\ R_0 \ge 1 - h(\beta),\ \Pi \le (1 - \alpha \star \beta)/2\}   (57)

and also

\mathcal{R}_{AB} \supseteq \mathrm{conv} \bigcup_{\alpha, \beta \in [0, 1/2]} \{(R, R_0, \Pi) : R \ge 1 - h(\alpha),\ R_0 \ge 1 + h(\alpha \star \beta) - h(\alpha) - h(\beta),\ \Pi \le (1 - \alpha \star \beta)/2\},   (58)

where \alpha \star \beta = \alpha(1 - \beta) + \beta(1 - \alpha) and \mathrm{conv}(\cdot) denotes the convex hull operation. Regions (56) and (57) are convex as given.

Several observations concerning the regions in Fig. 4 are in order. First, the minimum payoff is 1/4, which occurs when there is no communication or secret key. This is achieved if Node B generates an i.i.d. sequence according to \mathrm{Bern}(1/2) and Node C produces an arbitrary sequence. The maximum payoff that can be guaranteed is 1/2, because Node C can correctly guess X with probability one-half without any information. Second, note the strict containment from top to bottom: causal access to Node A (Fig. 4b) is better for the adversary than access to Node B (Fig. 4c), and the combination (Fig. 4d) is strictly better for him than Node A alone. Finally, observe the effect of having a higher secret key rate than communication rate, and vice versa.
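The binary-entropy expressions in these regions are mutual informations of a binary-symmetric cascade X - V - Y. The following check verifies, by exact enumeration, the identities I(X;V) = 1 - h(\alpha), I(Y;V) = 1 - h(\beta), and I(X,Y;V) = 1 + h(\alpha \star \beta) - h(\alpha) - h(\beta). The cascade construction and the particular \alpha, \beta values are our own illustrative sketch, not the paper's exact choice of auxiliaries.

```python
from math import log2
from itertools import product

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def star(a, b):
    """Binary convolution a * b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marg(pmf, idx):
    """Marginal of a pmf with tuple-valued keys onto the coordinates idx."""
    out = {}
    for key, p in pmf.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out

def cascade_mis(alpha, beta):
    """X ~ Bern(1/2), V = X + A (A ~ Bern(alpha)), Y = V + B (B ~ Bern(beta)).
    Returns I(X;V), I(Y;V), I(X,Y;V) by exact enumeration of (x, v, y)."""
    pmf = {}
    for x, a, b in product((0, 1), repeat=3):
        v, y = x ^ a, (x ^ a) ^ b
        p = 0.5 * (alpha if a else 1 - alpha) * (beta if b else 1 - beta)
        pmf[(x, v, y)] = pmf.get((x, v, y), 0.0) + p
    H = entropy
    i_xv = H(marg(pmf, (0,))) + H(marg(pmf, (1,))) - H(marg(pmf, (0, 1)))
    i_yv = H(marg(pmf, (2,))) + H(marg(pmf, (1,))) - H(marg(pmf, (1, 2)))
    i_xyv = H(marg(pmf, (0, 2))) + H(marg(pmf, (1,))) - H(pmf)
    return i_xv, i_yv, i_xyv

alpha, beta = 0.1, 0.2
i_xv, i_yv, i_xyv = cascade_mis(alpha, beta)
assert abs(i_xv - (1 - h(alpha))) < 1e-9
assert abs(i_yv - (1 - h(beta))) < 1e-9
assert abs(i_xyv - (1 + h(star(alpha, beta)) - h(alpha) - h(beta))) < 1e-9
```

The last identity follows from H(X,Y) = 1 + h(\alpha \star \beta), H(V) = 1, and H(X,V,Y) = 1 + h(\alpha) + h(\beta) for the cascade.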
When Node A is causally revealed, the payoff is a function of \min(R, R_0) and there is no advantage in having an excess of either rate. However, when Node B is revealed, both R > R_0 and R_0 > R result in higher payoff than R = R_0. When both nodes are revealed, an excess of secret key rate increases payoff. This phenomenon is particularly surprising because it means that secret key is useful even beyond the application of a one-time pad to the communication.

Fig. 4: Achievable regions of Corollary 4 for P_X \sim \mathrm{Bern}(1/2) and \pi(x, y, z) = \mathbf{1}\{x = y, x \neq z\}: (a) no causal disclosure; (b) Node A causally disclosed; (c) Node B causally disclosed; (d) Nodes A and B causally disclosed. Numerical computation suggests that these regions are optimal.

V. EQUIVOCATION
In this section, we show that (normalized) equivocation-based measures of secrecy become a special case of the causal disclosure framework if we choose the payoff function to be a log-loss function. Relating distortion to conditional entropy via a log-loss function was done recently in the context of certain multiterminal source coding problems [15].

First, we remark that Theorem 1 can be readily extended to include multiple distortion functions. For example, if we wanted to separately evaluate coordination and secrecy, we could use two payoff functions \pi_1(x, y) and \pi_2(x, y, z). In this setting, it might be more natural to refer to distortion functions than payoff functions, with the goal of minimizing the distortion between Nodes A and B while maximizing the distortion between Nodes (A,B) and Node C. Then, the rate-distortion region becomes

\bigcup_{W_x - X - (U,V) - Y - W_y} \{(R, R_0, D_1, D_2) : R \ge I(X; U, V),\ R_0 \ge I(W_x, W_y; V \mid U),\ D_1 \ge \mathbf{E}\,d_1(X, Y),\ D_2 \le \min_{z(u)} \mathbf{E}\,d_2(X, Y, z(U))\}.   (59)

Now consider (W_x, W_y) = (X, \emptyset) and a distortion function d_2 : \mathcal{X} \times \mathcal{Y} \times \Delta_\mathcal{X} \to \mathbb{R} defined by

d_2(x, y, z) = \log \frac{1}{z(x)},   (60)

where z is a probability distribution on \mathcal{X}, and z(x) denotes the probability of x \in \mathcal{X} according to z \in \Delta_\mathcal{X}. With this choice, the distortion in criterion AVG can be written as

\min_{\{P_{Z_i | M, X^{i-1}}\}_{i=1}^n} \mathbf{E}\,\frac{1}{n} \sum_{i=1}^n d_2(X_i, Y_i, Z_i)
  = \frac{1}{n} \sum_{i=1}^n \min_{P_{Z | M, X^{i-1}}} \mathbf{E}\,d_2(X_i, Y_i, Z)   (61)
  = \frac{1}{n} \sum_{i=1}^n \min_{P_{Z | M, X^{i-1}}} \mathbf{E} \log \frac{1}{Z(X_i)}   (62)
  \overset{(a)}{=} \frac{1}{n} \sum_{i=1}^n H(X_i \mid M, X^{i-1})   (63)
  = \frac{1}{n} H(X^n \mid M),   (64)

where (a) is due to Lemma 2 (given below). Thus, for the log-loss distortion function in (60), expected adversarial distortion under an assumption of causal disclosure simply becomes normalized equivocation.
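Step (a) rests on the log-loss identity of Lemma 2 below: the adversary's optimal soft guess is the posterior, and the resulting expected log-loss is exactly the conditional entropy. A small numerical illustration (the joint pmf is an arbitrary choice):

```python
from math import log2

# Joint pmf of (X, Y) on {0,1} x {0,1}; the numbers are illustrative.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

PY = {y: sum(p for (x2, y2), p in P.items() if y2 == y) for y in (0, 1)}
PX_Y = {(x, y): P[(x, y)] / PY[y] for (x, y) in P}  # posterior P_{X|Y}

# Conditional entropy H(X|Y) in bits.
H_X_given_Y = -sum(p * log2(PX_Y[(x, y)]) for (x, y), p in P.items())

# Expected log-loss when the adversary plays the posterior z_y = P_{X|Y=y}:
loss_opt = sum(p * log2(1.0 / PX_Y[(x, y)]) for (x, y), p in P.items())

# Any other strategy does at least as badly, e.g. the uniform soft guess:
loss_unif = sum(p * log2(1.0 / 0.5) for (x, y), p in P.items())

assert abs(loss_opt - H_X_given_Y) < 1e-12   # optimum achieves H(X|Y)
assert loss_unif >= H_X_given_Y - 1e-12      # suboptimal play loses
```

The divergence term in (68) is exactly the penalty that a suboptimal z pays over the posterior.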
Lemma 2:
Fix a pair of random variables (X, Y) and let \mathcal{Z} = \Delta_\mathcal{X}. Then

H(X \mid Y) = \min_{Z : X - Y - Z} \mathbf{E} \log \frac{1}{Z(X)},   (65)

where z(x) is the probability of x according to z.

Proof: If X - Y - Z, then

\mathbf{E} \log \frac{1}{Z(X)}   (66)
  = \mathbf{E} \log \frac{1}{P_{X|Y}(X \mid Y)} + \mathbf{E} \log \frac{P_{X|Y}(X \mid Y)}{Z(X)}   (67)
  = H(X \mid Y) + \sum_{y, z} P_{YZ}(y, z)\, D(P_{X|Y=y} \,\|\, z)   (68)
  \ge H(X \mid Y),   (69)

with equality if z = P_{X|Y=y} for all (y, z) in the support of P_{YZ}.

So far, we have focused on the equivocation of X^n; however, one might be interested in \frac{1}{n} H(Y^n \mid M) or \frac{1}{n} H(X^n, Y^n \mid M) instead. In these cases, the rate-distortion-equivocation regions can again be recovered from Theorem 1 (via the form in (59)) by considering (W_x, W_y) = (\emptyset, Y), \mathcal{Z} = \Delta_\mathcal{Y}, and

d_2(x, y, z) = \log \frac{1}{z(y)}   (70)

or (W_x, W_y) = (X, Y), \mathcal{Z} = \Delta_{\mathcal{X} \times \mathcal{Y}}, and

d_2(x, y, z) = \log \frac{1}{z(x, y)}.   (71)

In all three cases, the regions can be simplified (in particular, the auxiliary random variable V can be eliminated). The results are given in the following theorem, part 1 of which was given by Yamamoto in [12].

Corollary 5:
Fix P_X and d(x, y). Let \mathcal{R} denote the closure of achievable tuples (R, R_0, D, E).

1) If the equivocation criterion is

\liminf_{n \to \infty} \frac{1}{n} H(X^n \mid M) \ge E,   (72)

then

\mathcal{R} = \bigcup_{P_{Y|X}} \{(R, R_0, D, E) : R \ge I(X; Y),\ D \ge \mathbf{E}\,d(X, Y),\ E \le H(X) - [I(X; Y) - R_0]^+\},   (73)

where [x]^+ = \max\{0, x\}.

2) If the equivocation criterion is

\liminf_{n \to \infty} \frac{1}{n} H(Y^n \mid M) \ge E,   (74)

then

\mathcal{R} = \bigcup_{X - U - Y} \{(R, R_0, D, E) : R \ge I(X; U),\ D \ge \mathbf{E}\,d(X, Y),\ E \le H(Y) - [I(Y; U) - R_0]^+\}.   (75)

3) If the equivocation criterion is

\liminf_{n \to \infty} \frac{1}{n} H(X^n, Y^n \mid M) \ge E,   (76)

then

\mathcal{R} = \bigcup_{X - U - Y} \{(R, R_0, D, E) : R \ge I(X; U),\ D \ge \mathbf{E}\,d(X, Y),\ E \le H(X, Y) - [I(X, Y; U) - R_0]^+\}.   (77)

Proof:
We only prove part 2, as parts 1 and 3 follow similar arguments. First, fix d_2(x, y, z) according to (70). Then, by Lemma 2,

\min_{z(u)} \mathbf{E}\,d_2(X, Y, z(U)) = H(Y \mid U).   (78)

From the discussion above, it is clear that \mathcal{R} is characterized by setting (W_x, W_y) = (\emptyset, Y) in (59), yielding

\mathcal{R} = \bigcup_{X - (U,V) - Y} \{(R, R_0, D, E) : R \ge I(X; U, V),\ R_0 \ge I(Y; V \mid U),\ D \ge \mathbf{E}\,d(X, Y),\ E \le H(Y \mid U)\}.   (79)

Denote the region in (75) by \mathcal{S}. To see \mathcal{R} \subseteq \mathcal{S}, first consider (R, R_0, D, E) \in \mathcal{R}. Defining U' \triangleq (U, V), we have

R \ge I(X; U, V) = I(X; U')   (80)
E \le H(Y \mid U) = H(Y \mid U, V) + I(Y; V \mid U)   (81)
  \le H(Y \mid U') + R_0   (82)
E \le H(Y),   (83)

which implies (R, R_0, D, E) \in \mathcal{S}. To see \mathcal{S} \subseteq \mathcal{R}, let (R, R_0, D, E) \in \mathcal{S}. Define V' \triangleq U and find a random variable U' such that U' - U - (X, Y) form a Markov chain and

H(Y \mid U') = H(Y) - [I(Y; U) - R_0]^+.   (84)

This is always possible because the right-hand side of (84) lies in the interval [H(Y \mid U), H(Y)]. Then, we can write

R \ge I(X; U) = I(X; U', V')   (85)
R_0 \ge H(Y \mid U') - H(Y \mid U) = I(Y; V' \mid U')   (86)
E \le H(Y \mid U'),   (87)

which implies (R, R_0, D, E) \in \mathcal{R}. Thus, \mathcal{R} = \mathcal{S}.

VI. ACHIEVABILITY PROOF
A. Soft covering lemma
The primary tool used in the achievability proof of Theorem 1 is a so-called "soft covering lemma," a known result concerning the approximation of the output distribution of a channel. Various forms of the lemma have appeared in [17] and [18], and related notions from the perspective of random binning can be found in [19]. Several generalizations of the lemma (including a one-shot version) can be found in [16]. (The name "soft covering lemma" was given in [16]; the same lemma has also been referred to as the "resolvability lemma" and the "cloud-mixing lemma".)

In brief, the most basic version of the soft covering lemma is as follows. Fix a joint distribution P_{X,U}. First, generate a random codebook of 2^{nR} independent codewords, each drawn according to \prod_{i=1}^n P_U(u_i). Select a codeword, uniformly at random, as the input to a memoryless channel \prod_{i=1}^n P_{X|U}(x_i \mid u_i). The lemma states that if R > I(X; U), then the distribution of the channel output X^n converges to \prod_{i=1}^n P_X(x_i) in expected total variation distance, where the expectation is with respect to the random codebook.

A generalization of the soft covering lemma, presented shortly, will prove essential to the payoff analysis. Once we define a code by pairing a random codebook with a particular stochastic encoder and decoder, the soft covering lemma can be used to approximate the joint statistics of the system (i.e., the joint distribution on (X^n, M, K, Y^n, W^n) that is induced by the code) by an "idealized" distribution that has desirable properties. Having a tractable approximation of the joint distribution of the system is important because an adversary's optimal strategy is dictated by a posterior distribution.
For example, if an adversary tries to estimate the i-th source symbol X_i based on his observations of the causal disclosure X^{i-1} and the public message M, his optimal strategy is entirely determined by the posterior distribution of X_i given (X^{i-1}, M). The approximating distribution that the soft covering lemma guarantees will provide a clear understanding of that posterior distribution and lead to a manageable payoff analysis.

Although the distribution approximation in the soft covering lemma holds for normalized and unnormalized divergence, we use the total variation version found in [16] and [18] because of the following properties that total variation enjoys. Given two probability measures P and Q with common alphabet \mathcal{X}, the total variation distance between P and Q is defined by

\|P - Q\| = \sup_{A \in \mathcal{F}} |P(A) - Q(A)|,   (88)

where \mathcal{F} is the sigma algebra of the common alphabet.

Property 2:
Total variation distance satisfies the following.

(a) If the support of P and Q is a countable set \mathcal{X}, then

\|P - Q\| = \frac{1}{2} \sum_{x \in \mathcal{X}} |P(\{x\}) - Q(\{x\})|.   (89)

(b) Let \varepsilon > 0 and let f(x) be a function with bounded range of width b > 0. Then

\|P - Q\| < \varepsilon \implies |\mathbf{E}_P f(X) - \mathbf{E}_Q f(X)| < \varepsilon b,   (90)

where \mathbf{E}_P indicates that the expectation is taken with respect to the distribution P.

(c) For any P, Q, and \Phi,

\|P - Q\| \le \|P - \Phi\| + \|\Phi - Q\|.   (91)

(d) Let P_X P_{Y|X} and Q_X P_{Y|X} be two joint distributions with common channel P_{Y|X}. Then

\|P_X P_{Y|X} - Q_X P_{Y|X}\| = \|P_X - Q_X\|.   (92)

(e) Let P_X and Q_X be marginal distributions of P_{XY} and Q_{XY}. Then

\|P_X - Q_X\| \le \|P_{XY} - Q_{XY}\|.   (93)

We require the following generalization of the soft covering lemma.

Definition 4:
Let \{P_{X^n, Y^n}\}_{n=1}^\infty be a sequence of joint distributions. The sup-information rate of this sequence is defined as

\bar{I}(X; Y) \triangleq \limsup_{n \to \infty} \frac{1}{n} i_{P_{X^n, Y^n}}(X^n; Y^n),   (94)

where the limit superior is in probability,

\limsup_{n \to \infty} W_n \triangleq \inf\{\tau : P[W_n > \tau] \to 0\},   (95)

and

i_{P_{X,Y}}(a; b) \triangleq \log \frac{P_{X,Y}(a, b)}{P_X(a) P_Y(b)}.   (96)

The function i_{P_{X,Y}}(a; b) is called the information density.

Lemma 3 ([16, Corollary VII.4], [18]):
Let \{P_{X^n, Y^n}\}_{n=1}^\infty be a sequence of joint distributions. Let \mathcal{C}^{(n)} be a random codebook of 2^{nR} sequences in \mathcal{X}^n, each drawn independently according to P_{X^n} and indexed by m \in [2^{nR}]. Let Q_{Y^n} denote the output distribution of the channel when the input is selected from \mathcal{C}^{(n)} uniformly at random; that is,

Q_{Y^n}(y^n) = 2^{-nR} \sum_{m \in [2^{nR}]} P_{Y^n | X^n}(y^n \mid X^n(m)).   (97)

If R > \bar{I}(X; Y), then

\lim_{n \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} \|Q_{Y^n} - P_{Y^n}\| = 0,   (98)

where \mathbf{E}_{\mathcal{C}^{(n)}} indicates that the expectation is with respect to the random codebook. Furthermore, the convergence in (98) occurs exponentially quickly with n if the distribution P_{X^n Y^n} is memoryless.

We now begin the achievability proof of Theorem 1 by specifying the random codebook, stochastic encoder, and stochastic decoder.

B. Design of codebook, encoder, and decoder
In the statement of Theorem 1, we are given disclosure channels P_{W_x|X} and P_{W_y|Y}. For simplicity, we treat the channels as a single channel defined by

P_{W|XY} \triangleq P_{W_x|X} P_{W_y|Y}.   (99)

Thus, we denote (W_x^n, W_y^n) by W^n and the causal disclosure by W^{i-1}. The memoryless channel from (X^n, Y^n) to W^n is denoted by

P_{W^n | X^n Y^n} \triangleq \prod_{i=1}^n P_{W|XY}.   (100)

(Because the codebook is random, the output distribution Q_{Y^n} is a random variable taking values in \Delta_{\mathcal{Y}^n}. One way to notate this is through the use of conditional distributions, i.e., to write Q_{Y^n | \mathcal{C}^{(n)}}, but we choose to suppress such notation in order to simplify the presentation. Also, the decomposition of the channel P_{W|XY} into two channels does not play a role in the achievability proof; the reason Theorem 1 does not feature a generic channel P_{W|XY} is that a matching converse proof has not been supplied.)

Given a source distribution P_X and a disclosure channel P_{W|XY}, fix a distribution

P_{XUVYW} = P_X P_{UV|X} P_{Y|UV} P_{W|XY}.   (101)

Note that this distribution satisfies the Markov chain X - (U, V) - Y. Fix a communication rate R > I(X; U, V) and a secret key rate R_0 > I(W; V \mid U).

Random codebook:
Generate a random superposition codebook in the following manner. First, generate a codebook \mathcal{C}_U^{(n)} of 2^{nR} codewords from \mathcal{U}^n, i.i.d. according to \prod_{i=1}^n P_U. These codewords are indexed by m \in [2^{nR}]. Then, for each codeword U^n(m) \in \mathcal{C}_U^{(n)}, generate a codebook \mathcal{C}_V^{(n)}(m) of 2^{nR_0} codewords from \mathcal{V}^n, i.i.d. according to \prod_{i=1}^n P_{V | U = U_i(m)}. These codewords are indexed by (m, k), k \in [2^{nR_0}]. Thus, we have

\mathcal{C}_U^{(n)} = (U^n(1), \ldots, U^n(m), \ldots, U^n(2^{nR}))   (102)

and

\mathcal{C}_V^{(n)}(m) = (V^n(m, 1), \ldots, V^n(m, k), \ldots, V^n(m, 2^{nR_0})).   (103)

We refer to the entire superposition codebook as \mathcal{C}^{(n)}.

Likelihood encoder:
For a fixed superposition codebook, the encoder is a stochastic likelihood encoder defined by

P_{M | X^n K}(m \mid x^n, k) \propto \prod_{i=1}^n P_{X|U,V}(x_i \mid u_i(m), v_i(m, k)),   (104)

where \propto indicates that an appropriate normalization factor is required to make P_{M | X^n K} a valid conditional probability distribution. Eqn. (104) says that the probability of (x^n, k) being mapped to the index m is proportional to the probability that x^n is the output of the memoryless "test channel" P_{X|UV} with input (u^n(m), v^n(m, k)). The reason for this choice of encoder will become clear shortly.

Decoder:
The decoder is stochastic and is defined by

P_{Y^n | MK}(y^n \mid m, k) \triangleq \prod_{i=1}^n P_{Y|UV}(y_i \mid u_i(m), v_i(m, k)).   (105)

The random codebook, likelihood encoder, and decoder comprise the code and induce a joint distribution on the system that is given by

P_{X^n M K Y^n W^n} = P_{X^n} P_K P_{M | X^n K} P_{Y^n | MK} P_{W^n | X^n Y^n},   (106)

where P_{X^n} is i.i.d. according to P_X, and P_K is uniform over [2^{nR_0}].

C. The approximating distribution Q and its property

We now use the soft covering lemma (Lemma 3) to yield an approximation to the system-induced distribution P_{X^n M K Y^n W^n}. The idealized distribution that we are concerned with is described by Fig. 5 and defined explicitly as

Q_{X^n M K Y^n W^n} \triangleq Q_{X^n MK} P_{Y^n | MK} P_{W^n | X^n Y^n},   (107)

where Q_{X^n MK} is given by

Q(x^n, m, k) \triangleq 2^{-n(R + R_0)} \prod_{i=1}^n P_{X|UV}(x_i \mid U_i(m), V_i(m, k)).   (108)

Fig. 5:
Process that defines Q_{X^n M K Y^n W^n}. The pair (M, K), with M \sim \mathrm{Unif}[2^{nR}] and K \sim \mathrm{Unif}[2^{nR_0}], indexes a pair of codewords (U^n(M), V^n(M, K)) in the superposition random codebook. The codeword pair is passed through a memoryless channel P_{XY|UV} = P_{X|UV} P_{Y|UV} to get (X^n, Y^n). Then (X^n, Y^n) is passed through a memoryless channel P_{W|XY} to get W^n.

Observe that the definitions of P_{Y^n|MK} and P_{W^n|X^nY^n}, combined with the factorization of P_{XUVYW} in (101), allow us to write Q as

Q(x^n, m, k, y^n, w^n) = 2^{-n(R + R_0)} \prod_{i=1}^n P_{XYW|UV}(x_i, y_i, w_i \mid U_i(m), V_i(m, k)),   (109)

which corresponds to the process depicted in Fig. 5.

The stochastic likelihood encoder was defined intentionally so that Q_{M | X^n K} = P_{M | X^n K}. In fact, the only difference between P and Q lies in the marginal distribution of (X^n, K). Indeed, notice that we can write

Q_{X^n M K Y^n W^n} \triangleq Q_{X^n MK} P_{Y^n|MK} P_{W^n|X^nY^n}   (110)
  = Q_{X^n K} P_{M|X^nK} P_{Y^n|MK} P_{W^n|X^nY^n}   (111)
  = Q_{X^n K} P_{M Y^n W^n | X^n K}.   (112)

Therefore, we can show that P_{X^n M K Y^n W^n} \approx Q_{X^n M K Y^n W^n} by demonstrating that P_{X^n K} \approx Q_{X^n K}. This is accomplished using the soft covering lemma.

Lemma 4: If R > I(X; U, V), then

\lim_{n \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} \|P_{X^n M K Y^n W^n} - Q_{X^n M K Y^n W^n}\| = 0.   (113)

Proof of Lemma 4:

\mathbf{E}_{\mathcal{C}^{(n)}} \|P_{X^n M K Y^n W^n} - Q_{X^n M K Y^n W^n}\|
  \overset{(a)}{=} \mathbf{E}_{\mathcal{C}^{(n)}} \|P_{X^n K} - Q_{X^n K}\|   (114)
  = \mathbf{E}_{\mathcal{C}^{(n)}} \|P_{X^n} P_K - Q_{X^n | K} P_K\|   (115)
  \overset{(b)}{=} 2^{-nR_0} \sum_{k=1}^{2^{nR_0}} \mathbf{E}_{\mathcal{C}^{(n)}} \|P_{X^n} - Q_{X^n | K = k}\|   (116)
  = \mathbf{E}_{\mathcal{C}^{(n)}} \|P_{X^n} - Q_{X^n | K = 1}\|   (117)
  \overset{(c)}{=} o(1).   (118)

The justification for the steps is as follows:
(a) Eqn. (112) and Property 2d of total variation.
(b) Property 2a of total variation.
(c) R > I(X; U, V) and the soft covering lemma (Lemma 3). Notice that P_{X^n} is i.i.d.
according to P_X, and Q_{X^n | K = 1} is the output distribution of the memoryless channel P_{X|UV} acting on a (sub)codebook of size 2^{nR}.

Approximating P by Q will allow us to analyze the payoff as if Q governs the joint statistics of the system. If the rate of secret key is large enough, the structure of Q will allow us to argue that the causal disclosure W^{i-1} is actually useless to the eavesdropper and that his best strategy for estimating (X_i, Y_i) is based solely on U^n(M). The crucial property of Q that enables this argument is given in the following lemma. The proof, which relies on the soft covering lemma, is relegated to Appendix B.

Lemma 5: If R_0 > I(W; V \mid U), there exists \alpha \in (0, 1) such that

\lim_{n \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} \|Q_{M W^n X_\mathcal{B} Y_\mathcal{B}} - \widehat{Q}_{M W^n X_\mathcal{B} Y_\mathcal{B}}\| = 0,   (119)

where

\widehat{Q}_{M W^n X_\mathcal{B} Y_\mathcal{B}} \triangleq Q_M \cdot \Big(\prod_{i=1}^n P_{W | U = U_i(M)}\Big) \cdot \Big(\prod_{i \in \mathcal{B}} P_{XY | W, U = U_i(M)}\Big)   (120)

and \mathcal{B} is any subset of [n] of size |\mathcal{B}| \le \lfloor \alpha n \rfloor.

To see the significance of \widehat{Q}, first consider \mathcal{B} = \emptyset and W = (X, Y), so that

\widehat{Q}_{M X^n Y^n}(m, x^n, y^n) = 2^{-nR} \prod_{i=1}^n P_{XY|U}(x_i, y_i \mid U_i(m)).   (121)

Recall that W = (X, Y) implies direct causal disclosure of Nodes A and B; that is, the adversary has access to (M, X^{i-1}, Y^{i-1}) at step i. From (121), we see that \widehat{Q}_{X^n Y^n | M} is a memoryless channel from the codeword U^n(M) to the pair (X^n, Y^n). In particular, this implies

(X_i, Y_i) - U_i(M) - (M, X^{i-1}, Y^{i-1}), \quad \forall i \in [n].
(122)

Therefore, the adversary's best estimate of (X_i, Y_i) only depends on U_i(M) and is not improved by the causal disclosure. We have essentially created an artificial noisy channel from the intercepted codeword U^n(M) to the pair (X^n, Y^n), a property which not only greatly simplifies the payoff analysis, but is interesting independent of the causal disclosure problem. We discuss this effect and some of its implications after completing the achievability proof.

For general W, consider \mathcal{B} = \{i\}. In this case, Lemma 5 demonstrates that Q approximately satisfies the Markov chain

(X_i, Y_i) - U_i(M) - (M, W^{i-1}),   (123)

and again we see that the adversary's estimate of (X_i, Y_i) only depends on U_i(M) and is not improved by the causal disclosure. However, it turns out that the property in (123) is not quite strong enough for the analysis of the WHP payoff criterion, which is why Lemma 5 is concerned with sub-blocks (X_\mathcal{B}, Y_\mathcal{B}) of size linearly increasing with n.

D. Analysis of the MIN payoff criterion
We first combine Lemmas 4 and 5 to demonstrate the existence of a codebook that ensures certain distribution approximations hold simultaneously for all i \in [n].

Lemma 6:
There exists a sequence of codebooks such that

\lim_{n \to \infty} \max_{i \in [n]} \|P_{M W^n X_i Y_i} - \widehat{Q}_{M W^n X_i Y_i}\| = 0   (124)

and

\lim_{n \to \infty} \max_{i \in [n]} \|\widehat{Q}_{U_i(M)} - P_U\| = 0.   (125)

Proof of Lemma 6:
First, for all i \in [n] we have

\mathbf{E}_{\mathcal{C}^{(n)}} \|P_{M W^n X_i Y_i} - \widehat{Q}_{M W^n X_i Y_i}\|
  \overset{(a)}{\le} \mathbf{E}_{\mathcal{C}^{(n)}} \|P_{M W^n X_i Y_i} - Q_{M W^n X_i Y_i}\| + \mathbf{E}_{\mathcal{C}^{(n)}} \|Q_{M W^n X_i Y_i} - \widehat{Q}_{M W^n X_i Y_i}\|   (126)
  \overset{(b)}{\le} \mathbf{E}_{\mathcal{C}^{(n)}} \|P_{X^n M K Y^n W^n} - Q_{X^n M K Y^n W^n}\| + \mathbf{E}_{\mathcal{C}^{(n)}} \|Q_{M W^n X_i Y_i} - \widehat{Q}_{M W^n X_i Y_i}\|   (127)
  \overset{(c)}{=} O(e^{-\gamma n})   (128)

for some \gamma > 0. Steps (a) and (b) use Properties 2c and 2e of total variation distance, respectively. Step (c) follows from Lemmas 4 and 5, and the fact that the convergence in the soft covering lemma occurs exponentially quickly with n.

Next, we invoke Lemma 3 to show that, for all i \in [n],

\mathbf{E}_{\mathcal{C}^{(n)}} \|\widehat{Q}_{U_i(M)} - P_U\| = O(e^{-\beta n})   (129)

for some \beta > 0. The soft covering lemma applies because:
• \widehat{Q}_{U_i(M)} is the output distribution of the identity channel acting on a "codebook" of 2^{nR} "codewords" generated i.i.d. according to P_U; the "codebook" consists of (U_i(1), \ldots, U_i(2^{nR})). Furthermore, P_U is the output distribution when the input distribution is P_U, because the channel is the identity channel.
• The rate requirement is trivially satisfied because
R > 0 and

\limsup_{n \to \infty} \frac{1}{n} i_{P_U}(U_i; U_i) \le \lim_{n \to \infty} \frac{1}{n} \log |\mathcal{U}| = 0.   (130)

Combining (128) and (129), we can write

\lim_{n \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} \Big[\sum_{i=1}^n \|P_{M W^n X_i Y_i} - \widehat{Q}_{M W^n X_i Y_i}\| + \sum_{i=1}^n \|\widehat{Q}_{U_i(M)} - P_U\|\Big] = 0.   (131)

It is straightforward to verify that this fact implies the statement of the lemma.

With Lemma 6 in hand, we proceed with the analysis of the MIN payoff criterion. Let \Pi \le \min_{z(u)} \mathbf{E}\,\pi(X, Y, z(U)). For all i \in [n], we have

\min_{z(m, w^{i-1})} \mathbf{E}_P\,\pi(X_i, Y_i, z(M, W^{i-1}))
  \overset{(a)}{=} \min_{z(m, w^{i-1})} \mathbf{E}_{\widehat{Q}}\,\pi(X_i, Y_i, z(M, W^{i-1})) - o(1)   (132)
  \overset{(b)}{=} \min_{z(u)} \mathbf{E}_{\widehat{Q}}\,\pi(X_i, Y_i, z(u_i(M))) - o(1)   (133)
  \overset{(c)}{=} \min_{z(u)} \mathbf{E}\,\pi(X, Y, z(U)) - o(1)   (134)
  \ge \Pi - o(1).   (135)

Step (a) uses the first part of Lemma 6 along with Property 2b of total variation. Step (b) follows from Lemma 1 because under \widehat{Q}_{M W^n X_i Y_i}, the following Markov chain holds:

(X_i, Y_i) - u_i(M) - (M, W^{i-1}).   (136)

Step (c) is due to the second part of Lemma 6 and Property 2b of total variation. This completes the analysis of the MIN payoff criterion.
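Steps (a) and (c) above convert total-variation bounds into payoff bounds via Property 2(b): for a payoff of bounded range width b, expectations under two distributions differ by at most b times their total variation distance. A quick numerical illustration of that property, with randomly generated distributions and payoff function (all choices arbitrary):

```python
import random

def tv(P, Q):
    """Property 2(a): total variation distance on a countable alphabet."""
    keys = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(k, 0.0) - Q.get(k, 0.0)) for k in keys)

def random_pmf(rng, alphabet):
    w = [rng.random() for _ in alphabet]
    s = sum(w)
    return {k: wk / s for k, wk in zip(alphabet, w)}

def check_property_2b(seed, size=6):
    """Verify |E_P f - E_Q f| <= ||P - Q|| * b for an f of range width b."""
    rng = random.Random(seed)
    alphabet = range(size)
    P, Q = random_pmf(rng, alphabet), random_pmf(rng, alphabet)
    f = {k: rng.uniform(-1.0, 2.0) for k in alphabet}  # range width <= b = 3
    gap = abs(sum(P[k] * f[k] for k in alphabet)
              - sum(Q[k] * f[k] for k in alphabet))
    return gap <= tv(P, Q) * 3.0 + 1e-12

assert all(check_property_2b(seed) for seed in range(100))
```

The inequality holds deterministically because \sum_x (P(x) - Q(x)) f(x) is unchanged by recentering f, and the recentered |f| is at most b/2.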
E. Analysis of the WHP payoff criterion
Without loss of generality, we restrict attention to those distributions P_{UVXYW} that satisfy

P_{XY}(x, y) > 0 \implies \pi(x, y, z) > -\infty, \quad \forall x, y, z.   (137)

Otherwise, \min_z \mathbf{E}\,\pi(X, Y, z) = -\infty and the region in Theorem 1 is trivial.

The analysis will take place over sub-blocks of length k = \lfloor \alpha n \rfloor rather than over the full block. For ease of presentation, we assume that \lfloor \alpha n \rfloor = \alpha n and that k divides n evenly; the analysis is readily adjusted when this is not the case. We first fix some notation for handling sub-blocks. Denote the indices of the j-th sub-block by the set \mathcal{B}(j):

\mathcal{B}(j) = \{jk, jk + 1, \ldots, (j+1)k - 1\}, \quad j \in [1/\alpha].   (138)

Furthermore, denote the first t indices of sub-block \mathcal{B}(j) by \mathcal{B}_t(j); for example, \mathcal{B}_1(j) = \{jk\} and \mathcal{B}_k(j) = \mathcal{B}(j). Some more notation: denote the adversary's optimal reconstruction sequence by \{Z_i^*\}_{i=1}^n and, for brevity, define

\rho \triangleq \min_{z(u)} \mathbf{E}\,\pi(X, Y, z(U)).   (139)

Let \Pi < \rho and \varepsilon = \rho - \Pi. To prove achievability under the WHP criterion, we claim that it is enough to show that, for all j \in [1/\alpha],

\lim_{k \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} P_{\widehat{Q}}\Big[\frac{1}{k} \sum_{i \in \mathcal{B}(j)} \pi(X_i, Y_i, Z_i^*) < \rho - \varepsilon\Big] = 0,   (140)

where \widehat{Q}_{M W^n X_{\mathcal{B}(j)} Y_{\mathcal{B}(j)}} is given in Lemma 5.
Indeed, if this is true, then we can write

\mathbf{E}_{\mathcal{C}^{(n)}} P\Big[\frac{1}{n} \sum_{i=1}^n \pi(X_i, Y_i, Z_i^*) \ge \Pi - \varepsilon\Big]
  \ge \mathbf{E}_{\mathcal{C}^{(n)}} P\Big[\frac{1}{n} \sum_{i=1}^n \pi(X_i, Y_i, Z_i^*) \ge \rho - \varepsilon\Big]   (141)
  \ge \mathbf{E}_{\mathcal{C}^{(n)}} P\Big[\bigcap_j \Big\{\frac{1}{k} \sum_{i \in \mathcal{B}(j)} \pi(X_i, Y_i, Z_i^*) \ge \rho - \varepsilon\Big\}\Big]   (142)
  = 1 - \mathbf{E}_{\mathcal{C}^{(n)}} P\Big[\bigcup_j \Big\{\frac{1}{k} \sum_{i \in \mathcal{B}(j)} \pi(X_i, Y_i, Z_i^*) < \rho - \varepsilon\Big\}\Big]   (143)
  \overset{(a)}{\ge} 1 - \sum_{j=1}^{1/\alpha} \mathbf{E}_{\mathcal{C}^{(n)}} P\Big[\frac{1}{k} \sum_{i \in \mathcal{B}(j)} \pi(X_i, Y_i, Z_i^*) < \rho - \varepsilon\Big]   (144)
  \overset{(b)}{=} 1 - \sum_{j=1}^{1/\alpha} \mathbf{E}_{\mathcal{C}^{(n)}} P_{\widehat{Q}}\Big[\frac{1}{k} \sum_{i \in \mathcal{B}(j)} \pi(X_i, Y_i, Z_i^*) < \rho - \varepsilon\Big] - o(1)   (145)
  \overset{(c)}{=} 1 - o(1).   (146)

Step (a) uses a union bound. Step (b) is due to Lemma 5 and the definition of total variation. Step (c) follows from the hypothesis in (140) and the fact that 1/\alpha is a constant that does not grow with n.

We now show that (140) holds for all j \in [1/\alpha]. Since our analysis is the same for all sub-blocks, we drop the subscript on \mathcal{B}(j) and simply consider an arbitrary sub-block \mathcal{B} of size k. We cannot use the standard law of large numbers to show (140) because the dependence of Z_i^* on (M, W^{i-1}) implies that the random variables \{\pi(X_i, Y_i, Z_i^*)\}_{i \in \mathcal{B}} are not mutually independent. Instead, we condition on U^n(M) and use a martingale argument.

For simplicity, denote U^n(M) by U^n. Let \{S_t\}_{t \in \mathcal{B}} be defined by

S_t \triangleq \sum_{i \in \mathcal{B}_t} (\pi(X_i, Y_i, Z_i^*) - \rho_i(U^n)),   (147)

where

\rho_i(U^n) \triangleq \min_{z(u^n)} \mathbf{E}_{\widehat{Q}}[\pi(X_i, Y_i, z) \mid U^n].   (148)

We claim that, conditioned on U^n, S_t is a submartingale, i.e.,

\mathbf{E}_{\widehat{Q}}[S_t \mid S^{t-1}, U^n] \ge S_{t-1}, \quad \forall t \in \mathcal{B}.   (149)

To verify the claim, first observe that the definition of S_t gives

\mathbf{E}_{\widehat{Q}}[S_t \mid S^{t-1}, U^n] = S_{t-1} + \mathbf{E}_{\widehat{Q}}[\pi(X_t, Y_t, Z_t^*) \mid S^{t-1}, U^n] - \rho_t(U^n).
(150)

Moreover, for each t \in \mathcal{B}, we have

\mathbf{E}_{\widehat{Q}}[\pi(X_t, Y_t, Z_t^*) \mid S^{t-1}, U^n]
  \ge \min_{z(m, w^{t-1}, u^n, s^{t-1})} \mathbf{E}_{\widehat{Q}}[\pi(X_t, Y_t, z(M, W^{t-1})) \mid S^{t-1}, U^n]   (151)
  \overset{(a)}{=} \min_{z(u^n, s^{t-1})} \mathbf{E}_{\widehat{Q}}[\pi(X_t, Y_t, z) \mid S^{t-1}, U^n]   (152)
  \overset{(b)}{=} \min_{z(u^n)} \mathbf{E}_{\widehat{Q}}[\pi(X_t, Y_t, z) \mid U^n]   (153)
  = \rho_t(U^n).   (154)

Step (a) follows by invoking Lemma 1 after noting that under \widehat{Q} we have the Markov chain

(X_t, Y_t) - (U^n, S^{t-1}) - (M, W^{t-1}).   (155)

Step (b) follows from the Markov chain

(X_t, Y_t) - U^n - S^{t-1}.   (156)

Thus, conditioned on U^n, we see that S_t is a submartingale. By Doob's decomposition theorem, we can write S_t = M_t + A_t, where M_t is a martingale (conditioned on U^n) and A_t is an increasing process with A_1 = 0. Therefore, conditioning on U^n, we have

P_{\widehat{Q}}\Big[\frac{1}{k} \sum_{i \in \mathcal{B}} \pi(X_i, Y_i, Z_i^*) < \frac{1}{k} \sum_{i \in \mathcal{B}} \rho_i(U^n) - \varepsilon \,\Big|\, U^n\Big]
  = P_{\widehat{Q}}[S_k < -k\varepsilon \mid U^n]   (157)
  \le P_{\widehat{Q}}[M_k < -k\varepsilon \mid U^n]   (158)
  = P_{\widehat{Q}}[M_k - \mathbf{E}_{\widehat{Q}}[M_k] < -k\varepsilon - \mathbf{E}_{\widehat{Q}}[M_k] \mid U^n]   (159)
  \overset{(a)}{\le} \frac{\mathrm{Var}_{\widehat{Q}}(M_k \mid U^n)}{(k\varepsilon + \mathbf{E}_{\widehat{Q}}[S_1])^2},   (160)

where (a) follows from Chebyshev's inequality. Now we recursively bound the variance of M_k (conditioned on U^n) by writing

\mathrm{Var}(M_k \mid U^n) \overset{(a)}{=} \mathrm{Var}(\mathbf{E}[M_k \mid M^{k-1}, U^n])   (161)
  \quad + \mathbf{E}[\mathrm{Var}(M_k \mid M^{k-1}, U^n)]   (162)
  \le \mathrm{Var}(\mathbf{E}[M_k \mid M^{k-1}, U^n]) + O(1)   (163)
  = \mathrm{Var}(M_{k-1} \mid U^n) + O(1).   (164)

Step (a) uses the law of total variance. The recursion implies \mathrm{Var}_{\widehat{Q}}(M_k \mid U^n) \in O(k), which, together with (160), shows

\lim_{k \to \infty} P_{\widehat{Q}}\Big[\frac{1}{k} \sum_{i \in \mathcal{B}} \pi(X_i, Y_i, Z_i^*) < \frac{1}{k} \sum_{i \in \mathcal{B}} \rho_i(U^n) - \varepsilon \,\Big|\, U^n\Big] = 0.
(165)

Since this convergence is uniform for all U^n, we can take the expectation over random codebooks to get

\lim_{k \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} P_{\widehat{Q}}\Big[\frac{1}{k} \sum_{i \in \mathcal{B}} \pi(X_i, Y_i, Z_i^*) < \frac{1}{k} \sum_{i \in \mathcal{B}} \rho_i(U^n) - \varepsilon\Big] = 0.   (166)

Continuing, notice that \rho_i(U^n) can be written as

\rho_i(U^n) = \min_z \mathbf{E}_{\widehat{Q}}[\pi(X_i, Y_i, z) \mid U^n]   (167)
  = \min_z \mathbf{E}_{\widehat{Q}}[\pi(X_i, Y_i, z) \mid U_i]   (168)
  \triangleq \rho(U_i)   (169)

because of the Markov chain (X_i, Y_i) - U_i - U^n that holds under \widehat{Q}. Furthermore, the expected value of \rho(U_i) is

\mathbf{E}_{\mathcal{C}^{(n)}} \rho(U_i) = \mathbf{E}_{\mathcal{C}^{(n)}} \min_z \mathbf{E}_{\widehat{Q}}[\pi(X_i, Y_i, z) \mid U_i]   (170)
  = \min_{z(u)} \mathbf{E}_{\mathcal{C}^{(n)}} \pi(X_i, Y_i, z(U_i))   (171)
  \overset{(a)}{=} \min_{z(u)} \mathbf{E}\,\pi(X, Y, z(U))   (172)
  = \rho,   (173)

where step (a) is due to the fact (readily verified) that \mathbf{E}_{\mathcal{C}^{(n)}} Q_{X_i Y_i U_i(M)} = P_{XYU}. Therefore, because U^n is i.i.d. according to P_U (in expectation over the random codebooks), we can invoke the law of large numbers to get

\lim_{k \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} P_{\widehat{Q}}\Big[\frac{1}{k} \sum_{i \in \mathcal{B}} \rho(U_i) > \rho - \varepsilon\Big] = 1.   (174)

This, together with (166), yields

\lim_{k \to \infty} \mathbf{E}_{\mathcal{C}^{(n)}} P_{\widehat{Q}}\Big[\frac{1}{k} \sum_{i \in \mathcal{B}} \pi(X_i, Y_i, Z_i^*) < \rho - \varepsilon\Big] = 0,   (175)

completing the proof of (140). Finally, we invoke Shannon's random coding argument to ensure the existence of a codebook that satisfies the payoff criterion. This concludes the achievability proof of the WHP payoff criterion.
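The soft covering phenomenon at the heart of this proof is easy to observe numerically: a random codebook of rate above I(X;U), pushed through a memoryless channel, produces an output distribution close to i.i.d., whereas a single codeword does not. The simulation below uses a BSC; the blocklength, crossover probability, and rate are illustrative choices.

```python
import random

def tv_to_uniform(codebook, n, p):
    """Exact total variation between the output of BSC(p) applied to a
    uniformly chosen length-n codeword (integers as bit strings) and the
    i.i.d. Bern(1/2) product distribution, which assigns 2^-n to each x."""
    M = len(codebook)
    total = 0.0
    for x in range(1 << n):
        q = 0.0
        for u in codebook:
            d = bin(u ^ x).count("1")          # Hamming distance
            q += p**d * (1 - p)**(n - d)       # memoryless channel likelihood
        total += abs(q / M - 0.5**n)
    return 0.5 * total

random.seed(1)
n, p, R = 10, 0.1, 0.9   # R exceeds I(X;U) = 1 - h(0.1) ~ 0.531 bits
big = [random.getrandbits(n) for _ in range(2**round(n * R))]
single = [random.getrandbits(n)]
tv_big = tv_to_uniform(big, n, p)
tv_single = tv_to_uniform(single, n, p)

assert 0.0 <= tv_big <= 1.0
assert tv_big < tv_single   # many codewords "cover" the output space
```

With the codebook equal to the entire space, the output distribution is exactly uniform; soft covering says a random codebook of rate above the mutual information already achieves this approximately.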
F. Discussion: Optimal encoding produces artificial noise
The optimal encoding and decoding scheme designed in this section produces an effect that is worth investigating outside of this particular context of rate-distortion theory for secrecy systems. In particular, consider the most pessimistic disclosure assumption, that $W = (X, Y)$. In this case, the communication system effectively corrupts the i.i.d. information signal $X^n$ with noise by synthesizing a memoryless broadcast channel, with the information source $X^n$ as input, the actions at the intended receiver $Y^n$ as one output, and a sequence $U^n$ as the other output, observed by the adversary. The synthesis is accurate in a particular sense relevant to secrecy. That is, the communication system, which uses the public message $M$ and secret key $K$ to facilitate coordination, synthesizes memoryless noise characterized by $P_{YU|X}$ by producing a distribution on $(X^n, Y^n, M)$ such that $P_{X^n Y^n | M}$ closely approximates $\prod_{i=1}^n P_{XY|U}(x_i, y_i \mid u_i(M))$ for a set of statistically typical $u^n(M)$ sequences. This behavior is revealed by $\widehat{Q}_{M X^n Y^n}$ in (121), which the proof shows to converge to the induced joint distribution of the system in the limit of large $n$.
Let us now consider why this might be an operationally meaningful criterion for synthesizing noise in a secrecy setting. Consider an adversary who actually does observe a noise-corrupted version of the information signal, such as one of the outputs of a broadcast channel. As in any probabilistic situation, rational behavior is based on the posterior distribution of the state of the universe given what is known to the individual. In this situation, that means $P_{X^n Y^n | U^n}$ will dictate the adversary's optimal behavior, regardless of the objective that the adversary is trying to accomplish. Therefore, a communication system that mimics $P_{X^n Y^n | U^n}$ will elicit the same behavior by an adversary, for the same observed $U^n$ sequence, as would occur if the noisy channel were genuine.
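The key structural fact behind this reasoning is that, under a genuine memoryless broadcast channel, the adversary's posterior about $(X_i, Y_i)$ given the whole observation $U^n$ depends only on $U_i$ and equals the single-letter posterior. This can be checked exactly on a toy example; the alphabets and numerical values of $P_X$ and $P_{YU|X}$ below are our own assumptions, not taken from the paper.

```python
import itertools

# Hypothetical per-symbol distributions (assumed toy values, not from the
# paper): source P_X and broadcast channel P_{YU|X} on binary alphabets.
P_X = {0: 0.7, 1: 0.3}
P_YU_X = {  # P_{YU|X}(y,u | x)
    0: {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1},
    1: {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.5},
}

n = 2
# Joint distribution of (X^n, Y^n, U^n) under the memoryless broadcast channel.
joint = {}
for xs in itertools.product([0, 1], repeat=n):
    for ys in itertools.product([0, 1], repeat=n):
        for us in itertools.product([0, 1], repeat=n):
            p = 1.0
            for x, y, u in zip(xs, ys, us):
                p *= P_X[x] * P_YU_X[x][(y, u)]
            joint[(xs, ys, us)] = p

def posterior(i, x, y, us):
    """P(X_i=x, Y_i=y | U^n=us), computed by brute force from the joint."""
    num = sum(p for (xs, ys, us2), p in joint.items()
              if us2 == us and xs[i] == x and ys[i] == y)
    den = sum(p for (xs, ys, us2), p in joint.items() if us2 == us)
    return num / den

def single_letter_posterior(x, y, u):
    """P(X=x, Y=y | U=u) from the per-symbol distributions alone."""
    num = P_X[x] * P_YU_X[x][(y, u)]
    den = sum(P_X[xx] * P_YU_X[xx][(yy, u)] for xx in [0, 1] for yy in [0, 1])
    return num / den

# The posterior of (X_i, Y_i) given the whole U^n depends only on U_i.
for us in itertools.product([0, 1], repeat=n):
    for i in range(n):
        for x in [0, 1]:
            for y in [0, 1]:
                assert abs(posterior(i, x, y, us)
                           - single_letter_posterior(x, y, us[i])) < 1e-12
print("posterior factorizes symbol-by-symbol")
```

So any adversary objective that is optimized against the posterior is equally well served by the synthesized channel and by the genuine one, which is the operational content of the paragraph above.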
Furthermore, if the observed $U^n$ sequence is statistically representative of true noisy observations, then the communication system performance in the presence of an adversary will be equivalent to that of the memoryless broadcast channel it mimics.
For comparison, consider the work of Winter in [20]. Although the communication setting and results in [20] are quite different from ours in that the setting does not have an information source provided by nature, our proof and methods for achievability bear resemblance. There, he considers a distribution on a triple of variables $(X, Y, U)$ and a communication system that generates correlated random variables $X^n$ and $Y^n$ at two different nodes using communication and secret key in the presence of an adversary. For the sake of comparison, imagine $Y^n$ as a noisy version of $X^n$. The secrecy criterion in that work is very strong, requiring that the public message reveal no more about the sequences $X^n$ and $Y^n$ than the correlated sequence $U^n$ would, in the sense that $M$ is stochastically degraded from $U^n$ with respect to $(X^n, Y^n)$. This is stronger than the secrecy criterion we gave in the previous paragraphs, requiring more communication resources as a consequence. However, the noise synthesis achieved by the communication system of this section, even with the weaker secrecy performance implied by (121), has the same compelling operational significance: an adversary can gain no more advantage from the eavesdropped message than they could by observing the correlated $U^n$ sequence.

VII. CONVERSE PROOF
It is enough to prove the converse to Theorem 1 for just the
AVG payoff criterion, since it is the weakest of the criteria. We further weaken the conditions by allowing Node B causal access to Nodes A and C (i.e., we permit decoders of the form $\{P_{Y_i \mid M K X^{i-1} Z^{i-1}}\}_{i=1}^n$). We will see that this allowance does not increase the payoff.
Fix a source distribution $P_X$, a payoff function $\pi(x, y, z)$, and causal disclosure channels $P_{W_x|X}$ and $P_{W_y|Y}$. For ease of presentation, denote the pair $(W_x^n, W_y^n)$ by $W^n$. Next, let $J$ be an auxiliary random variable drawn uniformly from $[n]$, independently of $(X^n, Y^n, W^n, M, K)$. Define the following random variables:
$$X = X_J \quad (176)$$
$$Y = Y_J \quad (177)$$
$$Z = Z_J \quad (178)$$
$$(W_x, W_y) = W_J \quad (179)$$
$$U = (M, W^{J-1}, J) \quad (180)$$
$$V = K. \quad (181)$$
With these choices, it can be verified that
$$W_x - X - (U, V) - Y - W_y \quad (182)$$
$$X \sim P_X \quad (183)$$
$$W_x | X \sim P_{W_x|X} \quad (184)$$
$$W_y | Y \sim P_{W_y|Y}. \quad (185)$$
(Exact characterization of this depends on the specific objectives of the communication system.)
The following properties of $P_{M K X^n Y^n W^n}$ can also be verified:
$$X^n \perp K \quad (186)$$
$$X_J - (M, K, X^{J-1}, J) - W^{J-1} \quad (187)$$
$$X_J \perp J. \quad (188)$$
Let $(R, R_0, \Pi)$ be an achievable triple. We first have
$$nR \geq H(M) \quad (189)$$
$$\geq H(M | K) \quad (190)$$
$$\geq I(X^n; M | K) \quad (191)$$
$$\stackrel{(a)}{=} I(X^n; M, K) \quad (192)$$
$$= \sum_{j=1}^n I(X_j; M, K \mid X^{j-1}) \quad (193)$$
$$= \sum_{j=1}^n I(X_j; M, K, X^{j-1}) \quad (194)$$
$$\stackrel{(b)}{=} \sum_{j=1}^n I(X_j; M, K, X^{j-1}, W^{j-1}) \quad (195)$$
$$\geq \sum_{j=1}^n I(X_j; M, K, W^{j-1}) \quad (196)$$
$$\stackrel{(c)}{=} n I(X_J; M, K, W^{J-1}, J) \quad (197)$$
$$= n I(X; U, V), \quad (198)$$
where (a), (b), and (c) follow from (186), (187), and (188). Next, we have
$$nR_0 \geq H(K) \quad (199)$$
$$\geq H(K | M) \quad (200)$$
$$\geq I(W^n; K | M) \quad (201)$$
$$\geq \sum_{j=1}^n I(W_j; K \mid M, W^{j-1}) \quad (202)$$
$$\stackrel{(a)}{=} n I(W_J; K \mid M, W^{J-1}, J) \quad (203)$$
$$= n I(W; V | U), \quad (204)$$
where (a) follows from (188).
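Step (a) in (191)-(192) uses the independence $X^n \perp K$ from (186): when $X \perp K$, $I(X; M \mid K) = I(X; M, K)$ because the term $I(X; K)$ in the chain rule vanishes. A quick exact check on our own toy example, a one-time pad $M = X \oplus K$ (which also exhibits $I(X; M) = 0$, Shannon's perfect secrecy):

```python
import math

# Toy check (ours): with X, K uniform and independent and M = X xor K,
# verify I(X; M | K) = I(X; M, K), the identity behind step (a), and
# I(X; M) = 0, the perfect secrecy of the one-time pad.
P = {}
for x in (0, 1):
    for k in (0, 1):
        P[(x, k, x ^ k)] = 0.25  # joint pmf of (X, K, M)

def H(idx):
    """Joint entropy (bits) of the coordinates listed in idx."""
    m = {}
    for key, v in P.items():
        sub = tuple(key[i] for i in idx)
        m[sub] = m.get(sub, 0.0) + v
    return -sum(v * math.log2(v) for v in m.values() if v > 0)

def I(a, b, c=()):
    """Conditional mutual information I(A;B|C) via entropies."""
    return H(a + c) + H(b + c) - H(a + b + c) - H(c)

X, K, M = (0,), (1,), (2,)
assert abs(I(X, M, K) - I(X, M + K)) < 1e-12  # equal because X is independent of K
assert abs(I(X, M)) < 1e-12                   # the message alone reveals nothing
print(I(X, M, K))  # 1.0 bit: with the key, M reveals X completely
```

The same computation also illustrates why the converse charges the key rate: all of the information $M$ carries about $X$ is unlocked only jointly with $K$.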
Finally, we have
$$\Pi \leq \min_{z(m, w^{j-1}, j)} \mathbb{E}\, \frac{1}{n} \sum_{j=1}^n \pi(X_j, Y_j, z(M, W^{j-1}, j)) \quad (205)$$
$$= \min_{z(m, w^{j-1}, j)} \mathbb{E}\big[\mathbb{E}[\pi(X_J, Y_J, z(M, W^{J-1}, J)) \mid J]\big] \quad (206)$$
$$= \min_{z(m, w^{j-1}, j)} \mathbb{E}\, \pi(X_J, Y_J, z(M, W^{J-1}, J)) \quad (207)$$
$$= \min_{z(u)} \mathbb{E}\, \pi(X, Y, z(U)). \quad (208)$$
It remains to bound the cardinalities of $\mathcal{U}$ and $\mathcal{V}$, which is straightforward from the standard support lemma (e.g., [21]). Note that the set of Markov distributions forms a compact, connected set. To bound $|\mathcal{U}|$, it suffices to have $|\mathcal{X}| - 1$ elements to preserve $P_X$ and 3 more elements to preserve $H(X | U, V)$, $I(W; V | U)$, and $\min_{z(u)} \mathbb{E}\, \pi(X, Y, z(U))$. To bound $|\mathcal{V}|$, it suffices to have $|\mathcal{X}||\mathcal{Y}||\mathcal{U}| - 1$ elements to preserve $P_{XYU}$ and 2 more elements to preserve $H(X | U, V)$ and $H(W | U, V)$.

VIII. OTHER FORMS OF DISCLOSURE
In this section, we consider several relevant scenarios that are not directly subsumed by Theorem 1, but that can be solved by modifying the proof slightly. Throughout, we denote $(W_x^n, W_y^n)$ by $W^n$. Whereas previously we assumed that the eavesdropper has access to the causal disclosure $W^{i-1}$, we now consider three other types of disclosure: $W_i$, $W^i$, and $W^n$. It turns out that the regions corresponding to $W^i$ and $W^n$ are the same.

Theorem 3:
Fix $P_X$, $\pi(x, y, z)$, and disclosure channels $P_{W_x|X}$ and $P_{W_y|Y}$. If $W_i$ is disclosed instead of $W^{i-1}$, then the rate-payoff region for all three payoff criteria is equal to
$$\bigcup_{P_{Y|X}} \left\{ (R, R_0, \Pi) : \begin{array}{l} R \geq I(X; Y) \\ R_0 \geq 0 \\ \Pi \leq \min_{z(w_x, w_y)} \mathbb{E}\, \pi(X, Y, z(W_x, W_y)) \end{array} \right\}. \quad (209)$$

Proof:
The proof of achievability is very similar to that of Section VI. Define the random codebook, encoder, decoder, and $Q_{X^n M K Y^n W^n}$ in the same way, but set $U = \emptyset$ and $V = Y$ throughout. Lemma 4 ensures that the system-induced distribution is approximated by $Q$ since $R > I(X; Y)$. Instead of the property in Lemma 5, the desired property of $Q$ is now
$$Q_{M X_{\mathcal{B}} Y_{\mathcal{B}} W_{\mathcal{B}}} \approx Q_M \cdot \Big(\prod_{i \in \mathcal{B}} P_{XYW}\Big). \quad (210)$$
The soft covering lemma can be invoked to show that this property holds if the rate of secret key satisfies
$$R_0 > \limsup_{n \to \infty} \frac{1}{n}\, i_Q(X_{\mathcal{B}} Y_{\mathcal{B}} W_{\mathcal{B}}; Y^n) = 0. \quad (211)$$
Thus, under $Q$, the message $M$ is approximately independent of $(X_i, Y_i, W_i)$, and the eavesdropper's best estimate of $(X_i, Y_i)$ depends only on his observation of the disclosure $W_i$. The payoff analysis of Section VI is straightforward to modify accordingly.
To prove the converse, it is first straightforward to bound $R$ and $R_0$. To bound $\Pi$, define $(W_x, W_y) = W_J$, where $J \sim \mathrm{Unif}([n])$, and write
$$\Pi \leq \min_{z(m, w_j, j)} \mathbb{E}\, \frac{1}{n} \sum_{j=1}^n \pi(X_j, Y_j, z(M, W_j, j)) \quad (212)$$
$$\leq \min_{z(w_j)} \mathbb{E}\, \frac{1}{n} \sum_{j=1}^n \pi(X_j, Y_j, z(W_j)) \quad (213)$$
$$= \min_{z(w)} \mathbb{E}\, \pi(X_J, Y_J, z(W_J)) \quad (214)$$
$$= \min_{z(w_x, w_y)} \mathbb{E}\, \pi(X, Y, z(W_x, W_y)). \quad (215)$$

Theorem 4:
Fix $P_X$, $\pi(x, y, z)$, and disclosure channels $P_{W_x|X}$ and $P_{W_y|Y}$. If $W^n$ or $W^i$ is disclosed instead of $W^{i-1}$, then the rate-payoff region for all three payoff criteria is equal to
$$\bigcup \left\{ (R, R_0, \Pi) : \begin{array}{l} R \geq I(X; U, V) \\ R_0 \geq I(W_x, W_y; V | U) \\ \Pi \leq \min_{z(u, w_x, w_y)} \mathbb{E}\, \pi(X, Y, z(U, W_x, W_y)) \end{array} \right\}, \quad (216)$$
where the union is taken over all Markov chains
$$W_x - X - (U, V) - Y - W_y. \quad (217)$$

Proof:
For the proof of achievability, suppose $W^n$ is disclosed. The proof is almost exactly the same as in Section VI. The code and the rates are identical, as is the definition of the approximating distribution $Q$. Notice that under $\widehat{Q}$ (defined in Lemma 5), the following Markov chain holds for all $i \in [n]$:
$$(X_i, Y_i) - (U_i(M), W_i) - (M, W^n). \quad (218)$$
Thus, the eavesdropper's best strategy depends only on $(U_i(M), W_i)$; the rest of the disclosure of $W^n$ is rendered useless. To adjust the analysis of the payoff criteria, simply use Markov relations similar to the one in (218).
To show the converse, suppose that only $W^i$ is disclosed. The proof follows arguments similar to those in Section VII, with exactly the same identification of random variables.

IX. CAUSAL DISCLOSURE WITH DELAY
In this section, we consider the effects of assuming that the adversary has delayed causal access to the system behavior. In other words, we replace the causal disclosure $W^{i-1}$ with $W^{i-d}$, $d > 1$. Surprisingly, this has a major effect on relaxing the amount of secret key required to maintain secrecy. We establish inner and outer bounds on the corresponding rate-payoff region and give an example in which the bounds meet. Using the bounds, we further show that if lossless communication is required, the minimum rate of secret key needed to ensure a given level of payoff is on the order of $1/d$.

A. Inner and outer bound

Theorem 5 (Inner bound, causal disclosure with delay $d$): Fix $P_X$, $\pi(x, y, z)$, and causal disclosure channels $P_{W_x|X}$ and $P_{W_y|Y}$. Let $\mathcal{R}_d$ denote the closure of achievable $(R, R_0, \Pi)$ when the causal disclosure has delay $d \geq 1$. Then
$$\mathcal{R}_d \supseteq \bigcup \left\{ (R, R_0, \Pi) : \begin{array}{l} R \geq \frac{1}{d} I(X^d; U, V) \\ R_0 \geq \frac{1}{d} I(W_x^d W_y^d; V | U) \\ \Pi \leq \min_{z(u)} \mathbb{E}\big[\frac{1}{d} \sum_{j=1}^d \pi(X_j, Y_j, z_j(U))\big] \end{array} \right\}, \quad (219)$$
where the union is taken over all Markov chains
$$W_x^d - X^d - (U, V) - Y^d - W_y^d \quad (220)$$
in which
$$P_{X^d W_x^d} = \prod_{j=1}^d P_X P_{W_x|X} \quad (221)$$
and
$$P_{W_y^d | Y^d} = \prod_{j=1}^d P_{W_y|Y}. \quad (222)$$
(Numerical investigation reveals that the bounds are not tight in general.)

Proof:
For simplicity, we present the proof for $d = 2$. Denote $(W_x^n, W_y^n)$ by $W^n$. The idea is to transform the problem into one involving delay $d = 1$ so that we can apply Theorem 1. To that end, we first treat the source $X^n$ as an i.i.d. sequence $\widetilde{X}^{n/2}$ of super-symbols of length 2 by defining
$$\widetilde{X}_i = (X_{2i-1}, X_{2i}), \quad i = 1, 2, \ldots, n/2. \quad (223)$$
Similarly, treat $Y^n$ and $W^n$ as sequences of super-symbols by appropriately defining $\widetilde{Y}^{n/2}$ and $\widetilde{W}^{n/2}$. Under this definition, observe that at steps $i = 2, 4, \ldots, n$ the adversary has access to whole super-symbols of past disclosure. Suppose that at steps $i = 1, 3, \ldots, n-1$ we disclose the additional information $W_{i-1}$ to the adversary. Now the causal disclosure to the adversary is exactly $\widetilde{W}^{i-1}$ at each super-symbol step $i \in [n/2]$. Note that supplying extra information to the adversary can only reduce the achievable region.
To complete the transformation, define a payoff function $\widetilde{\pi} : \mathcal{X}^2 \times \mathcal{Y}^2 \times \mathcal{Z}^2 \to \mathbb{R}$ by
$$\widetilde{\pi}(x^2, y^2, z^2) = \sum_{j=1}^2 \pi(x_j, y_j, z_j). \quad (224)$$
If $(\widetilde{R}, \widetilde{R}_0, \widetilde{\Pi})$ is an achievable triple for this transformed problem, then $(\widetilde{R}/2, \widetilde{R}_0/2, \widetilde{\Pi}/2)$ is an achievable triple for the delayed causal disclosure problem with $d = 2$. By applying Theorem 1, we obtain the region in (219) for $d = 2$.

Theorem 6 (Outer bound, causal disclosure with delay $d$): Fix $P_X$, $\pi(x, y, z)$, and causal disclosure channels $P_{W_x|X}$ and $P_{W_y|Y}$. Let $\mathcal{R}_d$ denote the closure of achievable $(R, R_0, \Pi)$ when the causal disclosure has delay $d \geq 1$. Then
$$\mathcal{R}_d \subseteq \bigcup \left\{ (R, R_0, \Pi) : \begin{array}{l} R \geq I(X; U, V) \\ R_0 \geq \frac{1}{d} I(W_x W_y; V | U) \\ \Pi \leq \min_{z(u)} \mathbb{E}\, \pi(X, Y, z(U)) \end{array} \right\}, \quad (225)$$
where the union is taken over all Markov chains
$$W_x - X - (U, V) - Y - W_y. \quad (226)$$

Proof:
The key to the proof is the following lemma.
Lemma 7:
For arbitrary random variables $(X^n, Y)$, it holds that
$$d \cdot I(X^n; Y) \geq \sum_{i=1}^n I(X_i; Y \mid X^{i-d}). \quad (227)$$

Proof of Lemma 7:
$$d \cdot I(X^n; Y) = \sum_{j=1}^d I(X^n; Y) \quad (228)$$
$$\geq \sum_{j=1}^d I(X^{\,n - ((n-j) \bmod d)}; Y) \quad (229)$$
$$\stackrel{(a)}{=} \sum_{j=1}^d \sum_{\substack{i \in [n],\ i \geq j \\ i \equiv j \bmod d}} I(X_{i-d+1}^i; Y \mid X^{i-d}) \quad (230)$$
$$= \sum_{i=1}^n I(X_{i-d+1}^i; Y \mid X^{i-d}) \quad (231)$$
$$\geq \sum_{i=1}^n I(X_i; Y \mid X^{i-d}). \quad (232)$$
Step (a) uses the chain rule for mutual information on each of the $d$ terms.
The converse steps of Section VII can now be modified by defining $U = (M, W^{J-d}, J)$. First, bound $R$ by writing
$$nR \geq H(M) \quad (233)$$
$$\vdots$$
$$\geq n I(X_J; M, K, W^{J-1}, J) \quad (234)$$
$$\geq n I(X_J; M, K, W^{J-d}, J) \quad (235)$$
$$= n I(X; U, V). \quad (236)$$
Next, bound $R_0$ by writing
$$d \cdot nR_0 \geq d \cdot H(K) \quad (237)$$
$$\vdots$$
$$\geq d \cdot I(W^n; K | M) \quad (238)$$
$$\stackrel{(a)}{\geq} \sum_{j=1}^n I(W_j; K \mid M, W^{j-d}) \quad (239)$$
$$= n I(W_J; K \mid M, W^{J-d}, J) \quad (240)$$
$$= n I(W; V | U), \quad (241)$$
where (a) uses Lemma 7. Finally, $\Pi$ can be bounded in the manner of Section VII.

B. Lossless communication
We now specialize the inner and outer bounds to the setting in which lossless communication is required and $X^{i-d}$ is disclosed. In this regime, we are able to show explicitly how delay affects the tradeoff between the rate of secret key and the payoff.

Theorem 7:
Fix $P_X$ and $\pi(x, z)$. Let $\mathcal{R}_d$ denote the closure of achievable $(R, R_0, \Pi)$ for the case of lossless communication and causal disclosure $X^{i-d}$, $d \geq 1$. Let $R_0^d(\Pi)$ denote the key-payoff boundary of $\mathcal{R}_d$. First, we have
$$\mathcal{R}_d \supseteq \bigcup_{P_{X^d U} :\ X^d \sim \prod_{j=1}^d P_X} \left\{ (R, R_0, \Pi) : \begin{array}{l} R \geq H(X) \\ R_0 \geq \frac{1}{d} H(X^d | U) \\ \Pi \leq \min_{z(u)} \mathbb{E}\big[\frac{1}{d} \sum_{j=1}^d \pi(X_j, z_j(U))\big] \end{array} \right\} \quad (242)$$
and
$$\mathcal{R}_d \subseteq \bigcup_{P_{XU} :\ X \sim P_X} \left\{ (R, R_0, \Pi) : \begin{array}{l} R \geq H(X) \\ R_0 \geq \frac{1}{d} H(X | U) \\ \Pi \leq \min_{z(u)} \mathbb{E}\, \pi(X, z(U)) \end{array} \right\}. \quad (243)$$
Furthermore, for all $\Pi$,
$$R_0^d(\Pi) = \Theta\Big(\frac{1}{d}\Big). \quad (244)$$

Proof:
To establish the inner bound on $\mathcal{R}_d$, first recall the characterization of $\mathcal{R}_1$ given in Corollary 3. Using the same arguments as the proof of Theorem 5, we can transform the problem with delay $d > 1$ into one involving delay $d = 1$ and invoke Corollary 3 on the new problem. Upon noting that $\frac{1}{d} H(X^d) = H(X)$ when $X^d \sim \prod P_X$, this technique gives the achievable region in (242).
To establish the outer bound, let $(R, R_0, \Pi)$ be an achievable triple. The bound $R \geq H(X)$ is due to the lossless source coding theorem. To bound $R_0$ and $\Pi$, let $J$ be uniformly distributed on $[n]$ and define $U = (M, X^{J-d}, J)$ and $X = X_J$. Then, we have
$$nR_0 \geq H(K) \quad (245)$$
$$\geq I(X^n; K | M) \quad (246)$$
$$= H(X^n | M) - H(X^n | K, M) \quad (247)$$
$$\stackrel{(a)}{=} H(X^n | M) - n \cdot o(1) \quad (248)$$
$$\stackrel{(b)}{\geq} \frac{1}{d} \sum_{j=1}^n H(X_j \mid X^{j-d}, M) - n \cdot o(1) \quad (249)$$
$$= \frac{n}{d} H(X_J \mid X^{J-d}, M, J) - n \cdot o(1) \quad (250)$$
$$= \frac{n}{d} H(X | U) - n \cdot o(1). \quad (251)$$
Step (a) uses Fano's inequality, and step (b) follows from Lemma 7 (by setting $Y = X^n$ and conditioning on $M$). It is straightforward to bound $\Pi$ in the manner of Section VII.
From the outer bound in (243), we see that $R_0^d(\Pi) \geq \frac{1}{d} R_0^1(\Pi)$. It remains to show that $R_0^d(\Pi) \leq c \cdot \frac{1}{d}$ for some constant $c$; we do this via (242). First, let $X^d \sim \prod_{j=1}^d P_X$. Let $K \sim \mathrm{Unif}(\mathcal{X})$ be independent of $X^d$ and define
$$U \triangleq (X_1 \oplus K, X_2 \oplus K, \ldots, X_d \oplus K), \quad (252)$$
where $\oplus$ indicates addition modulo $|\mathcal{X}|$. With this choice of $U$, we have
$$H(X_i \mid X_j, U) = 0, \quad \forall i, j \in [d] \quad (253)$$
and
$$X_j \perp U, \quad \forall j \in [d].$$
(254) Therefore, we can write
$$\frac{1}{d} H(X^d | U) = \frac{1}{d} \sum_{j=1}^d H(X_j \mid X^{j-1}, U) \quad (255)$$
$$\stackrel{(a)}{=} \frac{1}{d} H(X_1 | U) \quad (256)$$
$$\stackrel{(b)}{=} \frac{1}{d} H(X), \quad (257)$$
where (a) and (b) follow from (253) and (254), respectively. Moreover, we have
$$\min_{z(u)} \mathbb{E}\Big[\frac{1}{d} \sum_{j=1}^d \pi(X_j, z_j(U))\Big] \quad (258)$$
$$= \frac{1}{d} \sum_{j=1}^d \min_{z(u)} \mathbb{E}\, \pi(X_j, z(U)) \quad (259)$$
$$\stackrel{(a)}{=} \min_z \mathbb{E}\, \pi(X, z) \quad (260)$$
$$\triangleq \pi_{\max}, \quad (261)$$
where (a) follows from Lemma 1 and (254).
By selecting $U$ according to (252), we have shown that the inner bound in (242) contains the point $(R_0, \Pi) = (\frac{1}{d} H(X), \pi_{\max})$; therefore, $(\frac{1}{d} H(X), \pi_{\max}) \in \mathcal{R}_d$. Since $\pi_{\max}$ is the maximum possible payoff, this implies $R_0^d(\Pi) \leq \frac{1}{d} H(X)$, completing the proof of (244).

C. Example in which the bounds meet
In the preceding proof, we demonstrated that the point $(R_0, \Pi) = (\frac{1}{d} H(X), \pi_{\max})$ is in the region (242) and is therefore achievable. If we choose the source distribution to be $P_X \sim \mathrm{Bern}(1/2)$, then from Theorem 2 (which gives us $R_0^1(\Pi)$) and the convexity of the rate-payoff region, it is clear that $R_0^d(\Pi) \leq \frac{1}{d} R_0^1(\Pi)$. Conversely, the outer bound in (243) directly gives $R_0^d(\Pi) \geq \frac{1}{d} R_0^1(\Pi)$.

X. CONCLUSION
This work has established a theory of secure source coding which characterizes the optimal use of communication and secret key to allow good reconstruction of the source by the intended receiver (who has access to the key) and force a poor reconstruction on any eavesdropper (without the secret key). The central contribution, presented in Theorem 1, gives a general information-theoretic characterization of the achievable performance. The expression in the theorem makes use of two auxiliary variables, which can be interpreted as information that is kept secure and information that is released publicly. In the case of lossless compression in Corollary 3, the optimal communication system can explicitly follow these implied steps, constructing two separate messages and focusing all of the security resource (i.e., the key) on only one.
An important component of the main result is the causal disclosure assumption depicted in Fig. 2, which was absent from Yamamoto's formulation of the problem in [11] and [12]. The causal disclosure empowers the eavesdropper with additional information and forces the communication system to resort to a more robust design for secure encoding, which results in an innovative encoding and decoding scheme that sterilizes the causal disclosure.
The theorems in this work allow for an arbitrary but known disclosure channel to the eavesdropper. However, one could always take the most pessimistic approach and assume that the source $X$ and the reconstruction $Y$ are both fully disclosed (causally) to the eavesdropper. This leads to the strongest definition of secrecy in our model, and the optimal communication system for this setting has a simple and natural interpretation as producing synthetic noise, discussed in Section VI-F.
This work also identifies the rate-distortion tradeoffs without the causal disclosure assumption. The case of no disclosure (as in Yamamoto's model) is a special case of the main result and is addressed in Section III, along with a discussion of its fragility.
Non-causal disclosure is the topic of Section VIII, which turns out to be only as empowering to the eavesdropper as causal disclosure.
The causal disclosure framework boasts some important unique properties aside from its operational interpretation as real-time reconstruction by the eavesdropper. In Section V we show that the traditional approach of measuring secrecy by normalized equivocation (rather than distortion) is in fact a special case of this framework, obtained by applying a particular log-loss distortion function. This connection only exists because of the causal disclosure assumption. Another property that arises is the need for a stochastic decoder, which suggests a duality with Wyner's wiretap channel [7], where a stochastic encoder is needed. Furthermore, this framework induces a rich tradeoff between the rate of secret key used and the distortion the system imposes upon an eavesdropper, while such a tradeoff does not occur in the absence of causal disclosure. These features suggest that causal disclosure is an appropriate base assumption for understanding rate-distortion theory for secrecy systems.

APPENDIX A: PROOF OF THEOREM 2

A. Supporting lemma
For each $x \in \mathcal{X}$, define $\mathcal{F}_n(x) \subseteq \Delta_{\mathcal{X}}$ by
$$\mathcal{F}_n(x) \triangleq \Big\{ p \in \Delta_{\mathcal{X}} : p = \mathrm{Unif}(\mathcal{A}) \text{ for some } \mathcal{A} \subseteq \mathcal{X},\ |\mathcal{A}| = n, \text{ and } p(x) = \max_{x'} p(x') \Big\}, \quad (262)$$
and define $\mathcal{A}_n(x) \subseteq \Delta_{\mathcal{X}}$ by
$$\mathcal{A}_n(x) \triangleq \Big\{ p \in \Delta_{\mathcal{X}} : p(x) = \max_{x'} p(x') \text{ and } p(x) \in \Big[\frac{1}{n+1}, \frac{1}{n}\Big] \Big\}. \quad (263)$$
In words, $\mathcal{F}_n(x)$ is the set of probability mass functions on $\mathcal{X}$ that are uniformly distributed on a subset of size $n$ and whose largest mass occurs at $x$. Fig. 6 illustrates the definitions of $\mathcal{F}_n(x)$ and $\mathcal{A}_n(x)$ when $\mathcal{X} = \{1, 2, 3\}$.
The key to the proof of Theorem 2 is the following technical lemma.

Lemma 8:
For a random variable $X$ with distribution $P_X$, let $x$ and $N$ be such that $P_X \in \mathcal{A}_N(x)$.

[Fig. 6: The probability simplex $\Delta_{\mathcal{X}}$ for $\mathcal{X} = \{1, 2, 3\}$, with vertices $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$, and with the regions $\mathcal{A}_n(x)$ marked. The centroid is the distribution $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. Note that $\mathcal{F}_1(1) = \{(1,0,0)\}$, $\mathcal{F}_1(2) = \{(0,1,0)\}$, and $\mathcal{F}_1(3) = \{(0,0,1)\}$.]
1) There exists a random variable $V$, correlated with $X$, such that for all $v \in \mathcal{V}$,
$$P_{X|V=v} \in \mathcal{F}_N(x) \cup \mathcal{F}_{N+1}(x). \quad (264)$$
In other words, $P_X$ can be written as a convex combination of distributions in $\mathcal{F}_N(x) \cup \mathcal{F}_{N+1}(x)$.
2) Let $n \in [N]$. There exists a random variable $V$ such that for all $v \in \mathcal{V}$,
$$P_{X|V=v} \in \bigcup_{x \in \mathcal{X}} \mathcal{F}_n(x). \quad (265)$$
In other words, for any $n \in [N]$, $P_X$ can be written as a convex combination of distributions in $\cup_x \mathcal{F}_n(x)$.

Proof:
Fix $x \in \mathcal{X}$ and $n \in \mathbb{N}$, and define
$$\mathcal{F} \triangleq \mathcal{F}_n(x) \cup \mathcal{F}_{n+1}(x). \quad (266)$$
First, one can verify that $\mathcal{A}_n(x)$ is a convex set. Furthermore, it is well known that every compact convex set is the convex hull of its extreme points. Thus, to prove part 1, it is enough to show that the set of extreme points of $\mathcal{A}_n(x)$ is equal to $\mathcal{F}$. Then any $p \in \mathcal{A}_n(x)$ can be written as a convex combination of the elements of $\mathcal{F}$.
The set of extreme points of a convex set $\mathcal{C}$ is defined by
$$\mathrm{extr}(\mathcal{C}) \triangleq \{ p \in \mathcal{C} : \text{if } p = \theta q + (1-\theta) r,\ q, r \in \mathcal{C},\ \theta \in (0,1), \text{ then } p = q = r \}. \quad (267)$$
We first show that $\mathcal{F} \subseteq \mathrm{extr}(\mathcal{A}_n(x))$. Let $p \in \mathcal{F}$, and let $q, r \in \mathcal{A}_n(x)$, $\theta \in (0,1)$ be such that
$$p = \theta q + (1-\theta) r. \quad (268)$$
If $p \in \mathcal{F}_n(x)$, then $p = q = r$ is clear because $q(x') \in [0, \frac{1}{n}]$ and $r(x') \in [0, \frac{1}{n}]$ for all $x' \in \mathcal{X}$. On the other hand, suppose $p \in \mathcal{F}_{n+1}(x)$. Because $q, r \in \mathcal{A}_n(x)$ and $p(x) = \frac{1}{n+1}$, we have $q(x) = r(x) = \frac{1}{n+1}$. Thus, $q(x') \in [0, \frac{1}{n+1}]$ and $r(x') \in [0, \frac{1}{n+1}]$ for all $x' \in \mathcal{X}$, and again $p = q = r$.
To show $\mathrm{extr}(\mathcal{A}_n(x)) \subseteq \mathcal{F}$, we proceed by way of contradiction and suppose that $p \in \mathrm{extr}(\mathcal{A}_n(x))$ and $p \notin \mathcal{F}$. From $p \notin \mathcal{F}$, it holds that $p(x') \in (0, \frac{1}{n+1}) \cup (\frac{1}{n+1}, \frac{1}{n})$ for some $x' \in \mathcal{X}$. There are now three separate cases to consider depending on whether $p(x) = \frac{1}{n+1}$, $p(x) \in (\frac{1}{n+1}, \frac{1}{n})$, or $p(x) = \frac{1}{n}$. For ease of exposition, we only consider $p(x) = \frac{1}{n+1}$; the other two cases use a similar argument. Since $p(x') \leq p(x)$, we have $p(x') \in (0, \frac{1}{n+1})$.
It follows that there must exist $x'' \neq x'$ such that $p(x'') \in (0, \frac{1}{n+1})$; otherwise, we would have
$$\sum_{\tilde{x} \in \mathcal{X}} p(\tilde{x}) \leq \frac{n}{n+1} + p(x') < 1. \quad (269)$$
Now we can write $p = \frac{1}{2} q + \frac{1}{2} r$, where
$$q(\tilde{x}) = \begin{cases} p(\tilde{x}), & \tilde{x} \neq x',\ \tilde{x} \neq x'' \\ p(\tilde{x}) + \varepsilon, & \tilde{x} = x' \\ p(\tilde{x}) - \varepsilon, & \tilde{x} = x'' \end{cases} \quad (270)$$
$$r(\tilde{x}) = \begin{cases} p(\tilde{x}), & \tilde{x} \neq x',\ \tilde{x} \neq x'' \\ p(\tilde{x}) - \varepsilon, & \tilde{x} = x' \\ p(\tilde{x}) + \varepsilon, & \tilde{x} = x'' \end{cases} \quad (271)$$
and
$$\varepsilon = \min\Big\{ p(x'),\ p(x''),\ \frac{1}{n+1} - p(x'),\ \frac{1}{n+1} - p(x'') \Big\}. \quad (272)$$
Thus, $p \notin \mathrm{extr}(\mathcal{A}_n(x))$, giving the contradiction. We have shown $\mathcal{F} = \mathrm{extr}(\mathcal{A}_n(x))$ and part 1 of the lemma.
To prove part 2 of the lemma, first define
$$\mathcal{B}_n \triangleq \bigcup_{x \in \mathcal{X}} \mathcal{F}_n(x). \quad (273)$$
For any $n$, it holds that
$$\mathcal{B}_{n+1} \subseteq \mathrm{conv}(\mathcal{B}_n). \quad (274)$$
This follows from writing $p \in \mathcal{B}_{n+1}$ as
$$p = \sum_{q \in \mathcal{B}_n :\ \mathrm{supp}(q) \subseteq \mathrm{supp}(p)} \frac{1}{n+1}\, q. \quad (275)$$
One can establish part 2 by using part 1 and (274).

B. Proof of Theorem 2
With Lemma 8 in hand, we are equipped to prove Theorem 2. Fix $R$ and let $U^*$ be the maximizer in the definition of $\Pi(R)$. When the payoff function is $\pi(x, z) = \mathbf{1}\{x \neq z\}$, we can rewrite $\Pi(R)$ as
$$\Pi(R) = \min_{z(u)} \mathbb{E}\, \pi(X, z(U^*)) \quad (276)$$
$$= \min_{z(u)} \sum_u P_{U^*}(u) \sum_x P_{X|U^*}(x|u)\, \mathbf{1}\{x \neq z(u)\} \quad (277)$$
$$= \sum_u P_{U^*}(u) \min_z \sum_x P_{X|U^*}(x|u)\, \mathbf{1}\{x \neq z\} \quad (278)$$
$$= \sum_u P_{U^*}(u) \min_z \big(1 - P_{X|U^*}(z|u)\big) \quad (279)$$
$$= \sum_u P_{U^*}(u)\big(1 - \max_x P_{X|U^*}(x|u)\big). \quad (280)$$
We now show that the set $\{P_{X|U^*=u}\}_u$ in (38) can be restricted to the finite set $\mathcal{P}_{\mathrm{unif}}$, where
$$\mathcal{P}_{\mathrm{unif}} \triangleq \{ p \in \Delta_{\mathcal{X}} : p = \mathrm{Unif}(\mathcal{A}) \text{ for some } \mathcal{A} \subseteq \mathcal{X} \}. \quad (281)$$
By applying part 2 of Lemma 8 to each distribution in $\{P_{X|U^*=u}\}_u$, we have that there exists a random variable $V$ such that
$$\forall u, v, \quad P_{X|U^*=u, V=v} \in \mathcal{P}_{\mathrm{unif}} \quad (282)$$
$$\forall u, v, v', \quad \arg\max_x P_{X|U^*V}(x|u,v) = \arg\max_x P_{X|U^*V}(x|u,v'). \quad (283)$$
We now write
$$\Pi(R) \stackrel{(a)}{=} \sum_u P_{U^*}(u)\big(1 - \max_x P_{X|U^*}(x|u)\big) \quad (284)$$
$$= \sum_u P_{U^*}(u)\Big(1 - \max_x \sum_v P_{X|U^*V}(x|u,v)\, P_{V|U^*}(v|u)\Big) \quad (285)$$
$$\stackrel{(b)}{=} \sum_{u,v} P_{U^*V}(u,v)\big(1 - \max_x P_{X|U^*V}(x|u,v)\big) \quad (286)$$
$$= \min_{z(u,v)} \mathbb{E}\, \pi(X, z(U^*, V)), \quad (287)$$
where (a) is due to (280) and (b) follows from (283). By noting that $R \geq H(X|U^*) \geq H(X|U^*, V)$ and letting $U = (U^*, V)$, we have
$$\Pi(R) \leq \max_{\substack{U :\ P_{X|U=u} \in \mathcal{P}_{\mathrm{unif}} \\ R \geq H(X|U)}} \min_{z(u)} \mathbb{E}\, \pi(X, z(U)). \quad (288)$$
This shows that we can restrict attention to $\mathcal{P}_{\mathrm{unif}}$ without hurting the payoff. Now, observe that $p \in \mathcal{P}_{\mathrm{unif}}$ satisfies
$$\big(H(p),\ 1 - \max_x p(x)\big) = \Big(\log n,\ \frac{n-1}{n}\Big) \quad (289)$$
for some $n \in \mathbb{N}$. Referring to (280) and noting that $H(X|U) = \sum_u P_U(u) H(X|U=u)$, we see that $\Pi(R)$ cannot lie outside of the convex hull of the pairs $(\log n, \frac{n-1}{n})$, $n \in \mathbb{N}$. That is, $\Pi(R) \leq \varphi(R)$.
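The envelope just invoked is easy to compute. The sketch below is our own illustration under two stated assumptions: entropy is measured in bits (log base 2), and $\varphi$ denotes the upper boundary of the convex hull of the corner points $(\log n, \frac{n-1}{n})$, which, since those points trace a concave sequence, is the piecewise-linear interpolation between consecutive corners. Each corner is the (entropy, error probability) pair of a uniform distribution on $n$ symbols.

```python
import math

# Our numeric illustration of the envelope through the corner points
# (log2(n), (n-1)/n): piecewise-linear interpolation between consecutive
# corners, assumed to represent the upper boundary phi of their convex hull.
def phi(R, n_max=64):
    pts = [(math.log2(n), (n - 1) / n) for n in range(1, n_max + 1)]
    if R >= pts[-1][0]:
        return pts[-1][1]
    for (r0, v0), (r1, v1) in zip(pts, pts[1:]):
        if r0 <= R <= r1:
            return v0 + (v1 - v0) * (R - r0) / (r1 - r0)

# At a corner point the curve returns exactly (n-1)/n ...
assert abs(phi(math.log2(4)) - 3 / 4) < 1e-9
# ... and between corners it is linear, e.g. halfway between log 2 and log 3.
R = (math.log2(2) + math.log2(3)) / 2
expected = 0.5 + (2 / 3 - 0.5) * (R - 1) / (math.log2(3) - 1)
assert abs(phi(R) - expected) < 1e-9
print("phi(1) =", phi(1.0))  # 1/2 at R = log2(2): guessing one of two symbols
```

This matches the intuition in the proof: spending key rate $\log n$ per symbol lets the system make the source look uniform over $n$ candidates to the eavesdropper, whose best guess then errs with probability $(n-1)/n$.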
(290) To see $\Pi(R) \leq \pi_{\max}$, simply write
$$\Pi(R) = \sum_u P_{U^*}(u)\big(1 - \max_x P_{X|U^*}(x|u)\big) \quad (291)$$
$$\leq 1 - \max_x \sum_u P_{U^*}(u)\, P_{X|U^*}(x|u) \quad (292)$$
$$= \pi_{\max}. \quad (293)$$
It remains to show that $\min\{\varphi(R), \pi_{\max}\}$ can be achieved through the proper choice of $U$. To that end, let $x$ and $N$ be such that $P_X \in \mathcal{A}_N(x)$. By the convexity of the rate-payoff region, we will be done once we show that we can achieve not only the points $(\log n, \frac{n-1}{n})$, $n \in [N]$, but also the intersection of $\varphi$ with $\pi_{\max}$. To achieve the point $(\log n, \frac{n-1}{n})$, invoke part 2 of Lemma 8 to produce $U$. Denote the corresponding rate-payoff pair by $(R', \Pi')_n$. Since the $\{P_{X|U=u}\}_u$ all satisfy
$$\big(H(X|U=u),\ 1 - \max_x P_{X|U=u}(x|u)\big) = \Big(\log n,\ \frac{n-1}{n}\Big), \quad (294)$$
so must $(R', \Pi')_n$ as well. To achieve the intersection of $\varphi$ with $\pi_{\max}$, first invoke part 1 of Lemma 8 to produce $U$. Denote the corresponding rate-payoff pair by $(R'', \Pi'')$. The $\{P_{X|U=u}\}_u$ correspond to either $(\log N, \frac{N-1}{N})$ or $(\log(N+1), \frac{N}{N+1})$. Thus, $(R'', \Pi'')$ lies on $\varphi$ because it is a convex combination of those two points. We also have that $(R'', \Pi'')$ satisfies $\Pi'' = \pi_{\max}$ because
$$\arg\max_{x'} P_{X|U=u}(x'|u) = x, \quad \forall u \in \mathcal{U}. \quad (295)$$
This completes the proof of Theorem 2.

APPENDIX B: PROOF OF LEMMA 5

Assume $R_0 > I(W; V | U)$. Define the typical set
$$\mathcal{T}_\varepsilon^n \triangleq \{ u^n : |T_{u^n}(u) - P_U(u)| < \varepsilon P_U(u),\ \forall u \in \mathcal{U} \}, \quad (296)$$
where $T_{u^n}$ denotes the type of $u^n$. First, write
$$\big\| Q_{M W^n X_{\mathcal{B}} Y_{\mathcal{B}}} - \widehat{Q}_{M W^n X_{\mathcal{B}} Y_{\mathcal{B}}} \big\| = \sum_{m :\ U^n(m) \in \mathcal{T}_\varepsilon^n} Q_M(m) \big\| Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} - \widehat{Q}_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} \big\| + \sum_{m :\ U^n(m) \notin \mathcal{T}_\varepsilon^n} Q_M(m) \big\| Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} - \widehat{Q}_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} \big\|.$$
(297) The expected value of the second term in (297) can be bounded easily. For sufficiently large $n$, we have
$$\mathbb{E} \sum_{m :\ U^n(m) \notin \mathcal{T}_\varepsilon^n} Q_M(m) \big\| Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} - \widehat{Q}_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} \big\| \leq \mathbb{E} \sum_{m :\ U^n(m) \notin \mathcal{T}_\varepsilon^n} Q_M(m) \quad (298)$$
$$= \mathbb{P}[U^n(M) \notin \mathcal{T}_\varepsilon^n] \quad (299)$$
$$= \mathbb{P}[U^n(1) \notin \mathcal{T}_\varepsilon^n] \quad (300)$$
$$\stackrel{(a)}{\leq} \varepsilon, \quad (301)$$
where (a) is due to the law of large numbers.
The expected value of the first term in (297) can first be rewritten by moving the expectation with respect to the subcodebook $\mathcal{C}_V^{(n)}(m)$ inside the sum:
$$\mathbb{E} \sum_{m :\ U^n(m) \in \mathcal{T}_\varepsilon^n} Q_M(m) \big\| Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} - \widehat{Q}_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} \big\| = \mathbb{E}_{\mathcal{C}_U^{(n)}} \sum_{m :\ U^n(m) \in \mathcal{T}_\varepsilon^n} Q_M(m)\, \mathbb{E}_{\mathcal{C}_V^{(n)}(m)} \big\| Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} - \widehat{Q}_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} \big\|. \quad (302)$$
It remains to show that the inner expectation vanishes for each $m$. (Due to the symmetry of the codebook construction, the behavior of the inner expectation is uniform over $m$; thus, the rate of convergence does not play a role in claiming that (302) vanishes.) To do this, first observe that $Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m}$ is the output of the memoryless (but nonstationary) channel $\Phi \triangleq Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | K, M=m}$ acting on a codebook of size $2^{nR_0}$ that is generated i.i.d. according to $\Psi \triangleq \prod_i P_{V | U = u_i(m)}$. Furthermore, it can be verified that $\widehat{Q}_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m}$ is the output distribution of the channel $\Phi$ when the input distribution is $\Psi$. Thus, we can invoke the soft covering lemma (Lemma 3) as long as $R_0$ exceeds the sup-information rate of the process that results from $\Phi$ acting on $\Psi$. To be explicit, that process is given by
$$\Gamma(v^n, w^n, x_{\mathcal{B}}, y_{\mathcal{B}}) \triangleq \prod_{i=1}^n P_{VW|U}(v_i, w_i \mid u_i(m)) \prod_{i \in \mathcal{B}} P_{XY|VU}(x_i, y_i \mid v_i, u_i(m)).$$
(303) Since $\Gamma$ is a memoryless process and the second moments of $\{i_\Gamma(W_i, X_i, Y_i; V_i)\}$ are uniformly bounded, the law of large numbers gives
$$\limsup_{n \to \infty} \frac{1}{n}\, i_\Gamma(W^n, X_{\mathcal{B}}, Y_{\mathcal{B}}; V^n) \leq \mathbb{E}\, \frac{1}{n}\, i_\Gamma(W^n, X_{\mathcal{B}}, Y_{\mathcal{B}}; V^n). \quad (304)$$
Furthermore, we can upper bound the expected information density by writing
$$\mathbb{E}\, \frac{1}{n}\, i_\Gamma(W^n, X_{\mathcal{B}}, Y_{\mathcal{B}}; V^n) = \mathbb{E}\, \frac{1}{n}\, i_\Gamma(W^n; V^n) + \mathbb{E}\, \frac{1}{n}\, i_\Gamma(X_{\mathcal{B}}, Y_{\mathcal{B}}; V^n \mid W^n) \quad (305)$$
$$= \mathbb{E}\, \frac{1}{n}\, i_\Gamma(W^n; V^n) + \frac{1}{n}\, I_\Gamma(X_{\mathcal{B}}, Y_{\mathcal{B}}; V^n \mid W^n) \quad (306)$$
$$\leq \mathbb{E}\, \frac{1}{n}\, i_\Gamma(W^n; V^n) + \alpha \log |\mathcal{X}||\mathcal{Y}| \quad (307)$$
$$= \mathbb{E}\, \frac{1}{n} \sum_{i=1}^n i_{P_{WV|U=u_i(m)}}(W_i; V_i) + \alpha \log |\mathcal{X}||\mathcal{Y}| \quad (308)$$
$$= \frac{1}{n} \sum_{i=1}^n I(W; V \mid U = u_i(m)) + \alpha \log |\mathcal{X}||\mathcal{Y}| \quad (309)$$
$$= \sum_{u \in \mathcal{U}} T_{u^n(m)}(u)\, I(W; V \mid U = u) + \alpha \log |\mathcal{X}||\mathcal{Y}| \quad (310)$$
$$\stackrel{(a)}{\leq} \sum_{u \in \mathcal{U}} (1 + \varepsilon) P_U(u)\, I(W; V \mid U = u) + \alpha \log |\mathcal{X}||\mathcal{Y}| \quad (311)$$
$$= (1 + \varepsilon)\, I(W; V \mid U) + \alpha \log |\mathcal{X}||\mathcal{Y}|. \quad (312)$$
Step (a) follows from $u^n(m) \in \mathcal{T}_\varepsilon^n$.
The expression in (312) is strictly less than $R_0$ for the proper choice of $\varepsilon > 0$ and $\alpha > 0$. Thus, when $u^n(m) \in \mathcal{T}_\varepsilon^n$,
$$R_0 > \limsup_{n \to \infty} \frac{1}{n}\, i_\Gamma(W^n, X_{\mathcal{B}}, Y_{\mathcal{B}}; V^n). \quad (313)$$
Invoking Lemma 3, we have
$$\lim_{n \to \infty} \mathbb{E}_{\mathcal{C}_V^{(n)}(m)} \big\| Q_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} - \widehat{Q}_{W^n X_{\mathcal{B}} Y_{\mathcal{B}} | M=m} \big\| = 0. \quad (314)$$
This completes the proof of Lemma 5.

REFERENCES
[1] P. Cuff, "A framework for partial secrecy," in
Proc. Global Telecommun. Conf. (GLOBECOM), Dec. 2010.
[2] ——, "Using a secret key to foil an eavesdropper," in , Sept. 2010, pp. 1405–1411.
[3] C. Schieler and P. Cuff, "Secrecy is cheap if the adversary must reconstruct," in Proc. IEEE International Symposium on Information Theory (ISIT), Jul. 2012, pp. 66–70.
[4] P. Cuff, "Optimal equivocation in secrecy systems: a special case of distortion-based characterization," in Information Theory and Applications Workshop (ITA), Feb. 2013, pp. 1–3.
[5] C. Schieler and P. Cuff, "Rate-distortion theory for secrecy systems," in Proc. IEEE International Symposium on Information Theory (ISIT), Jul. 2013, pp. 2219–2223.
[6] C. Shannon, "Communication theory of secrecy systems," Bell Syst. Tech. J., vol. 28, no. 4, pp. 656–715, Oct. 1949.
[7] A. Wyner, "The wire-tap channel," Bell Syst. Tech. J., vol. 54, no. 8, pp. 1355–1387, 1975.
[8] I. Csiszár and J. Körner, "Broadcast channels with confidential messages," IEEE Trans. Inf. Theory, vol. 24, no. 3, pp. 339–348, May 1978.
[9] U. Maurer, "Secret key agreement by public discussion from common information," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 733–742, May 1993.
[10] R. Ahlswede and I. Csiszár, "Common randomness in information theory and cryptography. Part I: Secret sharing," IEEE Trans. Inf. Theory, vol. 39, no. 4, pp. 1121–1132, Jul. 1993.
[11] H. Yamamoto, "A rate-distortion problem for a communication system with a secondary decoder to be hindered," IEEE Trans. Inf. Theory, vol. 34, no. 4, pp. 835–842, Jul. 1988.
[12] ——, "Rate-distortion theory for the Shannon cipher system," IEEE Trans. Inf. Theory, vol. 43, no. 3, pp. 827–835, May 1997.
[13] P. Cuff, H. Permuter, and T. Cover, "Coordination capacity," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4181–4206, Sept. 2010.
[14] U. Maurer, A. Rüedlinger, and B. Tackmann, "Confidentiality and integrity: a constructive perspective," in Theory of Cryptography, ser. Lecture Notes in Computer Science, R. Cramer, Ed., 2012, vol. 7194, pp. 209–229.
[15] T. Courtade and T. Weissman, "Multiterminal source coding under logarithmic loss," IEEE Trans. Inf. Theory, 2013, to appear.
[16] P. Cuff, "Distributed channel synthesis," IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7071–7096, Nov. 2013.
[17] A. Wyner, "The common information of two dependent random variables,"
IEEE Trans. Inf. Theory , vol. 21, no. 2, pp. 163–179, Mar. 1975.[18] T. S. Han and S. Verdu, “Approximation theory of output statistics,”
IEEE Trans. Inf. Theory , vol. 39, no. 3, pp. 752–772, May 1993.[19] M. Yassaee, M. Aref, and A. Gohari, “Achievability proof via outputstatistics of random binning,”
IEEE Trans. Inf. Theory , vol. PP, no. 99,pp. 1–1, 2014.[20] A. Winter, “Secret, public and quantum correlation cost of triples of ran-dom variables,” in
Proc. IEEE International Symposium on InformationTheory (ISIT) , Sept. 2005, pp. 2270–2274.[21] I. Csisz´ar and J. K¨orner,