Discriminatory Lossy Source Coding: Side Information Privacy
Ravi Tandon, Member, IEEE, Lalitha Sankar, Member, IEEE, and H. Vincent Poor, Fellow, IEEE
Abstract
A lossy source coding problem is studied in which a source encoder communicates with two decoders, one with and one without correlated side information, under an additional constraint on the privacy of the side information at the uninformed decoder. Two cases of this problem arise depending on the availability of the side information at the encoder. The set of all feasible rate-distortion-equivocation tuples is characterized for both cases. The difference between the informed and uninformed cases and the advantages of encoder side information for enhancing privacy are highlighted for a binary symmetric source with erasure side information and Hamming distortion.
Index Terms

Lossy source coding, information privacy, side information, equivocation, discriminatory coding, informed and uninformed encoders, Heegard-Berger problem, Kaspi problem.
I. INTRODUCTION
Information sources often need to be made accessible to multiple legitimate users simultaneously, some of whom can have correlated side information obtained from other sources or from prior interactions. A natural question that arises in this context is the following: can the source publish (encode) its data in a discriminatory manner such that the uninformed user does not infer the side information, i.e., it is kept private, while providing utility (fidelity) to both users? Two possible cases arise in this context depending on whether the encoder is informed or uninformed, i.e., it has or does not have access to the correlated side information, respectively. This question is addressed from a strictly fidelity viewpoint by C. Heegard and T. Berger in [1], henceforth referred to as the Heegard-Berger problem, for the uninformed case and by A. Kaspi [2], henceforth referred to as the Kaspi problem, for the informed case, wherein they determined the rate-distortion function for a discrete and memoryless source pair. Using equivocation as the privacy metric, we address the question posed above using the source network models in [1] and [2] with an additional constraint on the side information privacy at the decoder without access to it, i.e., decoder 1 (see Fig. 1).

R. Tandon, L. Sankar, and H. V. Poor are with the Department of Electrical Engineering at Princeton University, NJ 08544, USA. Email: {rtandon, lalitha, [email protected]}. This research was supported in part by the National Science Foundation under Grants CNS-09-05086, CNS-09-05398, and CCF-10-16671, the Air Force Office of Scientific Research under Grant FA9550-09-1-0643, and by a fellowship from the Princeton University Council on Science and Technology.

July 10, 2018 DRAFT

We prove here that the encoding scheme for the Heegard-Berger problem achieves the minimal rate while guaranteeing the maximal equivocation for any feasible distortion pair at the two decoders when the encoder is uninformed. Informally speaking, the Heegard-Berger coding scheme involves a combination of a rate-distortion code and a conditional Wyner-Ziv code which is revealed to both decoders.
Our proof exploits the fact that conditioned on what is decodable by decoder 1, i.e., the rate-distortion code, the additional information intended for decoder 2, i.e., the conditional Wyner-Ziv bin index, is asymptotically independent of the side information Y (see Fig. 1). Observing that the generation of the conditional Wyner-Ziv bin index is analogous to the Slepian-Wolf binning scheme, we prove this independence property for both the Slepian-Wolf and the Wyner-Ziv encodings. Next, we prove a similar independence property for the Heegard-Berger coding scheme, which in turn allows us to demonstrate the optimality of this scheme for the problem studied in this paper.

On the other hand, for the informed encoder case, we present a modified coding scheme (vis-à-vis the Kaspi scheme) which achieves the set of all feasible rate-equivocation pairs for the desired fidelity requirements at the two decoders. The Kaspi coding scheme exploits the encoder side information Y (see Fig. 1) via a combination of a rate-distortion code, intended for decoder 1, and a conditional rate-distortion code, intended for decoder 2, which is then revealed to both decoders. However, conditioned on what is decodable by decoder 1, i.e., the rate-distortion code, the conditional rate-distortion code does not explicitly ensure the asymptotic independence of the resulting index from the side information Y, and therefore does not simplify the equivocation computation at decoder 1. To resolve this difficulty, we present a two-step encoding scheme in which the first step is the same as in the Kaspi problem, while in the second step we first choose the codeword intended for decoder 2 and then bin it. We prove that the resulting conditional bin index is asymptotically independent of the side information Y.

The last part of our paper focuses on a specific source model: a binary equiprobable source X with erased side information Y (with erasure probability p) and Hamming distortion constraints.
For this source pair, we focus on the rate-distortion-equivocation tradeoffs for both the uninformed and informed cases.

For the uninformed encoder case, we prove that the maximal equivocation is independent of the fidelity requirement $D_2$ at decoder 2, i.e., the only information leaked about the side information is a direct consequence of the distortion requirement at decoder 1. We also explicitly characterize the rate-distortion-equivocation tradeoff for this problem over the space of all achievable distortion pairs. Our results clearly demonstrate the optimality of the Heegard-Berger encoding scheme from both rate and equivocation standpoints.

In contrast, for the informed encoder case, we explicitly demonstrate the usefulness of encoder side information. We first prove that the set of distortion pairs for which perfect equivocation is achievable at decoder 1 is strictly larger than that for the uninformed case. We prove this by showing that the informed encoder uses the side information Y via a single description which satisfies the distortion constraints at both decoders while simultaneously achieving perfect privacy at decoder 1. Furthermore, we also demonstrate that access to side information leads to a tradeoff between rate and equivocation. To guarantee a desired equivocation, we show that the minimal rate required can be strictly larger than the rate-distortion function for the original Kaspi problem.

The problem of source coding with equivocation constraints has gained attention recently [3]–[13]. In contrast to these papers, where the focus is on an external eavesdropper, we address the problem of privacy leakage to a legitimate user, i.e., we seek to understand whether the encoding at the source can discriminate between legitimate users with and without access to correlated side information.
Furthermore, our results on the rate-distortion-equivocation tradeoff for a binary symmetric source with erased side information for both the informed and uninformed encoder cases allow a clear comparison with the results for the same models without an additional privacy constraint, as studied in [14] and [15].

Fig. 1. Source network model.

The paper is organized as follows. In Section II, we present the system model. In Section III, we first prove the asymptotic independence of the bin index and the decoder side information in the Slepian-Wolf and Wyner-Ziv source coding problems. Subsequently, we establish the rate-equivocation tradeoff regions for both the uninformed and informed cases. In Section IV, we characterize the achievable rate-distortion-equivocation tradeoff for a specific source pair $(X, Y)$ where X is binary and Y results from passing X through an erasure channel. We conclude in Section V.

II. SYSTEM MODEL
We consider a source network with a single encoder which observes and communicates all or a part ($X^n$) of a discrete, memoryless bivariate source $(X^n, Y^n)$ over a finite rate link to decoders 1 and 2 at distortions $D_1$ and $D_2$, respectively, in which decoder 2 has access to $Y^n$ and an equivocation E about $Y^n$ is required at decoder 1. The network is shown in Fig. 1, where the two cases with and without side information at the encoder correspond to the switch S being in the closed and open positions, respectively. Without the equivocation constraint at decoder 1, the problems with the switch in the open and closed positions are the Heegard-Berger and Kaspi problems, for which the sets of feasible $(R, D_1, D_2)$ tuples are characterized by Heegard and Berger [1] and Kaspi [2], respectively. We seek to characterize the set of all achievable $(R, D_1, D_2, E)$ tuples for both problems.

Formally, let $(\mathcal{X}, \mathcal{Y}, p(x, y))$ denote the bivariate source with random variables $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$. Furthermore, let $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$ denote the reconstruction alphabets at decoders 1 and 2, respectively, and let $d_1$ and $d_2$ such that

  $d_k : \mathcal{X} \times \hat{\mathcal{X}}_k \to [0, \infty)$, $k = 1, 2$,   (1)

be distortion measures associated with the reconstruction of X at decoders 1 and 2, respectively. Let S take the values 0 and 1 to denote the open and closed switch positions, respectively. An $(n, M, D_1, D_2, E)$ code for this network consists of an encoder

  $f : \mathcal{X}^n \times S \cdot \mathcal{Y}^n \to \mathcal{J} = \{1, \ldots, M\}$   (2)

and two decoders, $g_1 : \{1, \ldots, M\} \to \hat{\mathcal{X}}_1^n$ and $g_2 : \{1, \ldots, M\} \times \mathcal{Y}^n \to \hat{\mathcal{X}}_2^n$. The expected distortion $D_k$ at decoder k is given by

  $D_k = E\!\left[\frac{1}{n}\sum_{i=1}^{n} d_k\big(X_i, \hat{X}_{k,i}\big)\right]$, $k = 1, 2$,   (3)

where $\hat{X}_1^n = g_1(f(X^n))$, $\hat{X}_2^n = g_2(f(X^n), Y^n)$, and the equivocation rate E is given by

  $E = \frac{1}{n} H(Y^n \mid J)$, $J \in \mathcal{J}$.   (4)

Definition 1:
The rate-distortion-equivocation tuple $(R, D_1, D_2, E)$ is achievable for the above source network if there exists an $(n, M, D_1 + \epsilon, D_2 + \epsilon, E - \epsilon)$ code with $M \le 2^{n(R + \epsilon)}$ for n sufficiently large. Let $\mathcal{R}$ denote the set of all achievable $(R, D_1, D_2, E)$ tuples, let $R(D_1, D_2, E)$ denote the minimal achievable rate R, and let $\Gamma(D_1, D_2)$ denote the maximal achievable equivocation E, such that

  $R(D_1, D_2, E) \equiv \min_{(R, D_1, D_2, E) \in \mathcal{R}} R$, and   (5)

  $\Gamma(D_1, D_2) \equiv \max_{(R, D_1, D_2, E) \in \mathcal{R}} E$.   (6)

Remark 1:
$\Gamma(D_1, D_2)$ is the maximal privacy achievable about $Y^n$ at decoder 1, and $R(D_1, D_2, E)$ is the minimal rate required to guarantee a distortion pair $(D_1, D_2)$ and an equivocation E. $R(D_1, D_2, \Gamma(D_1, D_2))$ is the minimal rate achieving the maximal equivocation for a distortion pair $(D_1, D_2)$.
III. RELATED OBSERVATIONS
In the context of lossless communications, [16] studies the problem of losslessly communicating a bivariate source $(X, Y)$ to a single decoder via two encoders, one with access to the $X^n$ sequences and the other with access to the $Y^n$ sequences. A special case of this problem is one in which the decoder has perfect access to $Y^n$, for which a minimal rate of $R_X \ge H(X \mid Y)$ is needed [16], and it is this problem (which leads to a corner point of the Slepian-Wolf region) that we address below.

On the other hand, [17] studies the problem of lossily communicating a part X of a bivariate source $(X, Y)$ subject to a fidelity criterion to a single decoder which has access to Y, and proves that a minimal rate of $R(D) \ge \min \big(I(X; U) - I(Y; U)\big)$ is needed, where the minimization is over all distributions $p(u \mid x)$ and deterministic functions g such that $\hat{X} = g(U, Y)$ and $E[d(X, \hat{X})] \le D$.

In both of the abovementioned problems, the coding index communicated is chosen with knowledge of the decoder side information. In the lemmas that follow, we prove that in both cases the optimal encoding is such that the coding index is asymptotically independent of the side information $Y^n$ at the decoder.

A. Slepian-Wolf Coding: Independence of Bin Index and Side Information

Lemma 1:
For a bivariate source $(X, Y)$ where $X^n$ is encoded via the encoding function $f_{SW} : \mathcal{X}^n \to J \in \{1, \ldots, M_J\}$ while $Y^n$ is available only at the decoder, we have $\lim_{n \to \infty} H(Y^n \mid J)/n = H(Y)$, i.e., $\lim_{n \to \infty} I(Y^n; J)/n = 0$.

Proof:
Let $T_A(n, \epsilon)$ denote the set of strongly typical A-sequences of length n. We define a binary random variable $\mu$ as follows: $\mu(x^n, y^n) = 0$ if $(x^n, y^n) \notin T_{XY}(n, \epsilon)$, and $\mu(x^n, y^n) = 1$ otherwise. (7)

From the Slepian-Wolf encoding, since a typical sequence $x^n$ is assigned a bin (index) j at random, we have that

  $\Pr(J = j \mid X^n = x^n \in T_X(n, \epsilon)) = \frac{1}{M_J}$   (8)

and

  $\Pr(J = j \mid \mu = 1) = \sum_{x^n} \Pr(x^n, J = j \mid \mu = 1) \in \big((1 - \epsilon)/M_J,\ 1/M_J\big)$,   (9)

where we have used the fact that for a typical set $\Pr(T_{XY}(n, \epsilon)) \ge 1 - \epsilon$ [18, Chap. 2]; in what follows we write $1 - \epsilon = 2^{-n\epsilon'}$ for a given n. The conditional equivocation $H(Y^n \mid J)$ can be lower bounded as

  $H(Y^n \mid J) \ge H(Y^n \mid J, \mu)$   (10)
  $= \Pr(\mu = 0) H(Y^n \mid J, \mu = 0) + \Pr(\mu = 1) H(Y^n \mid J, \mu = 1)$
  $\ge \Pr(\mu = 1) H(Y^n \mid J, \mu = 1)$   (11)
  $= \Pr(\mu = 1) \sum_j \Pr(j \mid \mu = 1) H(Y^n \mid j, \mu = 1)$,   (12)

where (10) follows from the fact that conditioning does not increase entropy, and (11) from the fact that entropy is non-negative. The probability $\Pr(y^n \mid j, \mu = 1)$ can be written as

  $\Pr(y^n \mid j, \mu = 1)$
  $= \sum_{x^n} \Pr(y^n, x^n \mid j, \mu = 1)$   (13a)
  $= \sum_{x^n} \Pr(x^n \mid j, \mu = 1) \Pr(y^n \mid x^n, j, \mu = 1)$   (13b)
  $= \sum_{x^n} \frac{\Pr(x^n, j \mid \mu = 1)}{\Pr(j \mid \mu = 1)} \Pr(y^n \mid x^n, \mu = 1)$   (13c)
  $\le 2^{n\epsilon'} \sum_{x^n} \frac{\Pr(x^n \mid \mu = 1)/M_J}{1/M_J} \Pr(y^n \mid x^n, \mu = 1)$   (13d)
  $= 2^{n\epsilon'} \sum_{x^n} \Pr(x^n \mid \mu = 1) \Pr(y^n \mid x^n, \mu = 1)$   (13e)
  $= 2^{n\epsilon'} \Pr(y^n \mid \mu = 1)$   (13f)
  $\le 2^{-n(H(Y) - \epsilon'')}$,   (13g)

where (13b) follows from (8) and the fact that $Y^n - X^n - J$ forms a Markov chain (by construction), and (13d) follows from (9). Expanding $H(Y^n \mid j, \mu = 1)$, we have

  $H(Y^n \mid j, \mu = 1) = \sum_{y^n} p(y^n \mid j, \mu = 1) \log \frac{1}{\Pr(y^n \mid j, \mu = 1)}$   (14)
  $\ge \sum_{y^n} p(y^n \mid j, \mu = 1) \log 2^{n(H(Y) - \epsilon'')}$   (15)
  $= n(H(Y) - \epsilon'') \sum_{y^n} p(y^n \mid j, \mu = 1)$   (16)
  $\ge n(1 - \epsilon)(H(Y) - \epsilon'')$,   (17)

where (15) results from the upper bound on $\Pr(y^n \mid j, \mu = 1)$ in (13g) and (17) from the fact that for a typical set $\Pr(T_{XY}(n, \epsilon)) \ge 1 - \epsilon$ [18, Chap. 2]. Thus, the equivocation $H(Y^n \mid J)$ can be lower bounded as

  $H(Y^n \mid J) \ge \Pr(\mu = 1) \sum_j \Pr(j \mid \mu = 1)(1 - \epsilon)\, n(H(Y) - \epsilon'')$   (18)
  $\ge n(1 - \epsilon)^2 (H(Y) - \epsilon'')$,   (19)

where we have used (9) and the fact that for a typical set $\Pr(T_{XY}(n, \epsilon)) \ge 1 - \epsilon$ [18, Chap. 2]. The proof concludes by observing that $H(Y^n) \ge H(Y^n \mid J)$ and $\epsilon \to 0$, $\epsilon'' \to 0$ as $n \to \infty$.

Remark 2:
Lemma 1 captures the intuition that it suffices to encode only that part of $X^n$ that is independent of the decoder side information $Y^n$.

Remark 3: The proof of Lemma 1 does not depend on the precise bound on the total number, $M_J$, of encoding indices, i.e., it holds for all choices of $M_J$. In fact, the bound on $M_J$ is a consequence of the decoding requirements.

B. Wyner-Ziv Coding: Independence of Bin Index and Side Information

Lemma 2:
For a bivariate source $(X, Y)$ where $X^n$ is encoded via the encoding function $f_{WZ} : \mathcal{X}^n \to J \in \{1, \ldots, M_J\}$ while $Y^n$ is available only at the decoder, we have $\lim_{n \to \infty} H(Y^n \mid J)/n = H(Y)$, i.e., $\lim_{n \to \infty} I(Y^n; J)/n = 0$.

Proof:
Let $T_A(n, \epsilon)$ denote the set of strongly typical A-sequences of length n. We define a binary random variable $\mu$ as follows: $\mu(u^n, y^n) = 0$ if $(u^n, y^n) \notin T_{UY}(n, \epsilon)$, and $\mu(u^n, y^n) = 1$ otherwise. (20)

From the Wyner-Ziv encoding, for a given $x^n$, first a sequence $u^n$ that is jointly typical with $x^n$ is chosen, where the n symbols of $u^n$ are generated independently according to $p_U(\cdot)$ (computed from $p_{XU}(\cdot)$). The resulting sequence $u^n$ is assigned a bin (index) j at random, such that we have

  $\Pr(J = j \mid U^n = u^n \in T_U(n, \epsilon)) = \frac{1}{M_J}$   (21)

and

  $\Pr(J = j \mid \mu = 1) = \sum_{u^n} \Pr(u^n, J = j \mid \mu = 1) \in \big((1 - \epsilon)/M_J,\ 1/M_J\big)$,   (22)

where we have used the fact that for a typical set $\Pr(T_{UY}(n, \epsilon)) \ge 1 - \epsilon$ [18, Chap. 2], writing $(1 - \epsilon)/M_J = 2^{-n\epsilon'}/M_J$ for a given n. The conditional equivocation $H(Y^n \mid J)$ can be lower bounded as

  $H(Y^n \mid J) \ge H(Y^n \mid J, \mu)$   (23)
  $= \Pr(\mu = 0) H(Y^n \mid J, \mu = 0) + \Pr(\mu = 1) H(Y^n \mid J, \mu = 1)$
  $\ge \Pr(\mu = 1) H(Y^n \mid J, \mu = 1)$   (24)
  $= \Pr(\mu = 1) \sum_j \Pr(j \mid \mu = 1) H(Y^n \mid j, \mu = 1)$,   (25)

where (23) follows from the fact that conditioning reduces entropy, and (24) from the fact that entropy is non-negative. The probability $\Pr(y^n \mid j, \mu = 1)$ can be written as

  $\Pr(y^n \mid j, \mu = 1)$   (26a)
  $= \sum_{u^n} \Pr(y^n, u^n \mid j, \mu = 1)$   (26b)
  $= \sum_{u^n} \Pr(u^n \mid j, \mu = 1) \Pr(y^n \mid u^n, j, \mu = 1)$   (26c)
  $= \sum_{u^n} \Pr(u^n \mid j, \mu = 1) \Pr(y^n \mid u^n, \mu = 1)$   (26d)
  $= \sum_{u^n} \frac{\Pr(u^n \mid \mu = 1)}{\Pr(j \mid \mu = 1)} \frac{1}{M_J} \Pr(y^n \mid u^n, \mu = 1)$   (26e)
  $\le \sum_{u^n} \Pr(u^n \mid \mu = 1)\, 2^{n\epsilon'} \Pr(y^n \mid u^n, \mu = 1)$   (26f)
  $= \sum_{u^n} \Pr(y^n, u^n \mid \mu = 1)\, 2^{n\epsilon'}$   (26g)
  $= \Pr(y^n \mid \mu = 1)\, 2^{n\epsilon'}$   (26h)
  $\le 2^{-n(H(Y) - \epsilon'')}$,   (26i)

where (26d) follows from (21) and the fact that $Y^n - U^n - J$ forms a Markov chain (by construction), and (26f) follows from (22). Expanding $H(Y^n \mid j, \mu = 1)$, we have

  $H(Y^n \mid j, \mu = 1) = \sum_{y^n} p(y^n \mid j, \mu = 1) \log \frac{1}{\Pr(y^n \mid j, \mu = 1)}$   (27)
  $\ge \sum_{y^n} p(y^n \mid j, \mu = 1) \log 2^{n(H(Y) - \epsilon'')}$   (28)
  $= n(H(Y) - \epsilon'') \sum_{y^n} p(y^n \mid j, \mu = 1)$   (29)
  $\ge n(1 - \epsilon)(H(Y) - \epsilon'')$,   (30)

where (28) results from the upper bound on $\Pr(y^n \mid j, \mu = 1)$ in (26i) and (30) from the fact that for a typical set $\Pr(T_{UY}(n, \epsilon)) \ge 1 - \epsilon$ [18, Chap. 2]. Thus, the equivocation $H(Y^n \mid J)$ can be lower bounded as

  $H(Y^n \mid J) \ge \Pr(\mu = 1) \sum_j \Pr(j \mid \mu = 1)(1 - \epsilon)\, n(H(Y) - \epsilon'')$   (31)
  $\ge n(1 - \epsilon)^2 (H(Y) - \epsilon'')$,   (32)

where we have used the fact that for a typical set $\Pr(T_{UY}(n, \epsilon)) \ge 1 - \epsilon$ [18, Chap. 2]. The proof concludes by observing that $H(Y^n) \ge H(Y^n \mid J)$ and $\epsilon \to 0$, $\epsilon'' \to 0$ as $n \to \infty$.

We will now use Lemmas 1 and 2 to demonstrate the optimality of the Heegard-Berger and Kaspi encoding schemes for the uninformed and informed source models, respectively.

C. Uninformed Encoder with Side Information Privacy
We first consider the source network in which the encoder does not have side information and derive the set of all feasible rate-distortion-equivocation (RDE) tuples. The resulting problem may be viewed as the Heegard-Berger problem with an additional privacy constraint at decoder 1. Our result demonstrates that the optimal coding scheme is the same as for the Heegard-Berger problem without a privacy constraint. The proof makes use of the independence of the Wyner-Ziv binning index from the side information $Y^n$ in tightly bounding the achievable equivocation. We briefly sketch the proof here; the detailed proof can be found in the appendix.
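The independence property of Lemmas 1 and 2 can be illustrated numerically at a small block length, where $(1/n) I(Y^n; J)$ under random binning can be computed exactly by enumeration. The sketch below is purely illustrative; the block length n = 8, the BSC(0.1) correlation between X and Y, and the choice of roughly $2^{nH(X|Y)}$ bins are assumptions made only for this example. At such a small n the normalized mutual information is small but not yet negligible; it vanishes only as n grows.

```python
import math
import random

random.seed(7)

def entropy(dist):
    """Shannon entropy (bits) of a pmf given as {outcome: prob}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

# Illustrative parameters (not from the paper): block length n, correlation
# Y = X xor Ber(eps), and roughly 2^{n H(X|Y)} bins.
n, eps = 8, 0.1
h_eps = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
M = 2 ** math.ceil(n * h_eps)

# Random binning: every x^n sequence is assigned a uniformly chosen bin index.
bin_of = [random.randrange(M) for _ in range(2 ** n)]

# Exact joint pmf of (J, Y^n) with X^n uniform and a memoryless correlation.
pJY, pY, pJ = {}, {}, {}
for x in range(2 ** n):
    j = bin_of[x]
    for y in range(2 ** n):
        flips = bin(x ^ y).count("1")
        q = (0.5 ** n) * (eps ** flips) * ((1 - eps) ** (n - flips))
        pJY[(j, y)] = pJY.get((j, y), 0.0) + q
        pY[y] = pY.get(y, 0.0) + q
        pJ[j] = pJ.get(j, 0.0) + q

mi_rate = (entropy(pJ) + entropy(pY) - entropy(pJY)) / n  # (1/n) I(Y^n; J)
print(f"(1/n) I(Y^n; J) = {mi_rate:.4f} bits at n = {n}")
```

Note that the bin index J is a deterministic function of $X^n$ only, so the computation needs just the joint law of $(J, Y^n)$; no decoding step is involved, consistent with Remark 3.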
1) Rate-Distortion-Equivocation $(R, D_1, D_2, E)$ Tuples:

Definition 2: Let $\Gamma_U(D_1, D_2)$ and $R_U(D_1, D_2, E)$ be two functions defined as

  $\Gamma_U(D_1, D_2) \equiv \max_{\mathcal{P}_U(D_1, D_2, E)} H(Y \mid W_1)$   (33)
  $R_U(D_1, D_2, E) \equiv \min_{\mathcal{P}_U(D_1, D_2, E)} I(X; W_1) + I(X; W_2 \mid W_1 Y)$   (34)

such that

  $\mathcal{R}_U \equiv \{(R, D_1, D_2, E) : D_1 \ge 0,\ D_2 \ge 0,\ 0 \le E \le \Gamma_U(D_1, D_2),\ R \ge R_U(D_1, D_2, E)\}$,   (35)

where the subscript U denotes the uninformed case, $\mathcal{P}_U(D_1, D_2, E)$ is the set of all $p(x, y)\, p(w_1, w_2 \mid x)$ that satisfy (3) and (4), $Y - X - (W_1, W_2)$ is a Markov chain, and $|\mathcal{W}_1| = |\mathcal{X}| + 2$, $|\mathcal{W}_2| = (|\mathcal{X}| + 1)$.

Lemma 3: $\Gamma_U(D_1, D_2)$ is a non-decreasing, concave function of $(D_1, D_2)$ (i.e., for all $D_l \ge 0$, $l = 1, 2$). Lemma 3 follows from the concavity properties of the (conditional) entropy function as a function of the underlying distribution, and therefore, of the distortion.
Theorem 1:
For a bivariate source $(X, Y)$ where only $X^n$ is available at the source, and $Y^n$ is available at decoder 2 but not at decoder 1, we have

  $\mathcal{R} = \mathcal{R}_U$, $\Gamma(D_1, D_2) = \Gamma_U(D_1, D_2)$, and $R(D_1, D_2, E) = R_U(D_1, D_2, E)$.   (36)

Proof sketch: Converse: A lower bound on $R(D_1, D_2, E)$ is the same as that in [1] and involves the introduction of two auxiliary variables $W_{1,i} \equiv (J, Y^{i-1})$ and $W_{2,i} \equiv (X^{i-1}, Y^n_{i+1})$. Using this definition of $W_{1,i}$, one can expand the equivocation definition in (4) to show that $\Gamma(D_1, D_2) \le H(Y \mid W_1)$.

Achievable scheme: The achievable scheme begins with a rate-distortion code for decoder 1 obtained by mapping an observed $x^n$ sequence to one of a set of $2^{nI(X; W_1)}$ $w_1^n$ sequences, denoted $w_1^n(j_1)$, subject to typicality requirements. For this choice of $w_1^n(j_1)$, a second code for decoder 2 results from choosing a conditionally typical sequence out of a set of $2^{nI(X; W_2 \mid W_1)}$ $w_2^n$ sequences, denoted by $w_2^n(j_2 \mid j_1)$, and binning the resulting sequence into one of $2^{n(I(X; W_2 \mid W_1) - I(Y; W_2 \mid W_1))}$ bins, denoted by $b(j_2)$, chosen uniformly. The pair $(j_1, b(j_2))$ is revealed to the decoders. We show in the appendix that this scheme achieves an equivocation of $H(Y \mid W_1)$ asymptotically; the crux of our proof relies on the fact that the binning index $B(J_2)$ is conditionally independent of $(X, W_1)$ conditioned on $W_2$, i.e., the random variables are related via the Markov chain relationship $Y - (X, W_1) - W_2 - B(J_2)$.

Remark 4:
An intuitive way to interpret the equivocation arises from the following decomposition:

  $\frac{1}{n} H(Y^n \mid J_1, B(J_2)) = \frac{1}{n} H(Y^n \mid J_1) - \frac{1}{n} I(Y^n; B(J_2) \mid J_1)$   (37a)
  $= \frac{1}{n} H(Y^n \mid W_1^n(J_1)) - \frac{1}{n} I(Y^n; B(J_2) \mid W_1^n(J_1))$.   (37c)

The first term in (37c) is approximately equal to $H(Y \mid W_1)$, while the second term, which in the limit goes to 0, follows from a conditional version of Lemma 2 and the fact that $Y - X - (W_1, W_2) - B(J_2)$ forms a Markov chain.

D. Informed Encoder with Side Information Privacy
We now consider the source network in which the encoder has access to the side information $Y^n$ and derive the set of all feasible rate-distortion-equivocation tuples. The resulting problem may be viewed as the Kaspi problem with an additional privacy constraint about $Y^n$ at decoder 1. Our results below demonstrate that the Kaspi coding scheme achieves the set of all rate-distortion-equivocation tuples. However, for a given $(D_1, D_2, E)$ tuple, the minimal rate $R(D_1, D_2, E)$ will in general be different from the rate $R(D_1, D_2)$ for the original Kaspi problem.

Our proof includes a two-step achievable scheme involving binning for the conditional rate-distortion function, for which we show that the bin index is independent of the side information $Y^n$. Our converse is a minor modification of the converse in [2] and involves two auxiliary random variables. We briefly sketch the proof here; the details are relegated to the appendix.
1) Rate-Distortion-Equivocation $(R, D_1, D_2, E)$ Tuples:

Definition 3: Let $\Gamma_I(D_1, D_2)$ and $R_I(D_1, D_2, E)$ be two functions defined as

  $\Gamma_I(D_1, D_2) \equiv \max_{\mathcal{P}_I(D_1, D_2, E)} H(Y \mid W_1)$, and   (38)
  $R_I(D_1, D_2, E) \equiv \min_{\mathcal{P}_I(D_1, D_2, E)} I(XY; W_1) + I(X; W_2 \mid W_1 Y)$   (39)

such that

  $\mathcal{R}_I \equiv \{(R, D_1, D_2, E) : D_1 \ge 0,\ D_2 \ge 0,\ 0 \le E \le \Gamma_I(D_1, D_2),\ R \ge R_I(D_1, D_2, E)\}$,   (40)

where $\mathcal{P}_I(D_1, D_2, E)$ is the set of all $p(x, y)\, p(w_1, w_2 \mid x, y)$ that satisfy (3) and (4) and $|\mathcal{W}_1| = |\mathcal{X}| + 2$, $|\mathcal{W}_2| = (|\mathcal{X}| + 1)$.

Remark 5:
The cardinality bounds on $\mathcal{W}_1$ and $\mathcal{W}_2$ can be obtained analogously to the arguments in [1, p. 730].

Lemma 4: $R_I(D_1, D_2, E)$ is a convex function of $(D_1, D_2, E)$.

Theorem 2:
For a source pair $(X, Y)$ where $X^n$ is available at the encoder, and $Y^n$ is available at the encoder and at decoder 2 but not at decoder 1, we have

  $\mathcal{R} = \mathcal{R}_I$, $\Gamma(D_1, D_2) = \Gamma_I(D_1, D_2)$, and $R(D_1, D_2, E) = R_I(D_1, D_2, E)$.   (41)

Proof sketch: Converse: A lower bound on $R(D_1, D_2, E)$ can be obtained analogously to the bounds in [2] with the introduction of two auxiliary variables $W_{1,i} \equiv (J, Y^{i-1})$ and $W_{2,i} \equiv (X^{i-1}, Y^n_{i+1})$. Using this definition of $W_{1,i}$, one can expand the equivocation definition in (4) to obtain $\Gamma(D_1, D_2) \le H(Y \mid W_1)$.

Achievable scheme: The achievable scheme begins with a rate-distortion code for decoder 1 obtained by mapping an observed $(x^n, y^n)$ sequence to one of a set of $2^{nI(XY; W_1)}$ $w_1^n$ sequences, denoted by $w_1^n(j_1)$, subject to typicality requirements. A second rate-distortion code for decoder 2 results from mapping $(x^n, y^n, w_1^n)$ to one of a set of $2^{nI(XY W_1; W_2)}$ $w_2^n$ sequences, denoted by $w_2^n(j_2)$, and binning the resulting sequence into one of $2^{n(I(XY W_1; W_2) - I(Y W_1; W_2))}$ bins, denoted by $b(j_2)$, chosen uniformly. The pair $(j_1, b(j_2))$ is revealed to the decoders. In the appendix it is shown that this scheme achieves an equivocation of $H(Y \mid W_1)$; the crux of the proof relies on the fact that the binning index $B(J_2)$ is conditionally independent of $(X, Y, W_1)$ conditioned on $W_2$.

Remark 6: An intuitive way to interpret the equivocation arises from the same decomposition as in (37), where the first term in (37c) is approximately equal to $H(Y \mid W_1)$ while the second term, which in the limit goes to 0, follows from a conditional version of Lemma 2. Note that, in contrast to the uninformed case, the distribution here is such that $(X, Y) - (W_1, W_2) - B(J_2)$ forms a Markov chain.

IV. RESULTS FOR A BINARY SOURCE WITH ERASED SIDE INFORMATION
We consider the following pair of correlated sources: X is binary and uniform, and

  $Y = X$ w.p. $1 - p$, and $Y = E$ w.p. $p$,

and we consider the Hamming distortion metric, i.e., $d(x, \hat{x}) = x \oplus \hat{x}$, for both decoders and for both the informed and uninformed cases.

Fig. 2. Partition of the $(D_1, D_2)$ region: uninformed encoder case.

A. Uninformed Case
We are interested in the rate-distortion-equivocation tradeoff, given as

  $R \ge I(X; W_1) + I(X; W_2 \mid Y, W_1)$, and   (42)
  $E \le H(Y \mid W_1)$,   (43)

where the rate and equivocation computation is over all random variables $(W_1, W_2)$ that satisfy the Markov chain relationship $(W_1, W_2) - X - Y$ and for which there exist functions $f_1(\cdot)$ and $f_2(\cdot, \cdot, \cdot)$ satisfying

  $E[d(X, f_1(W_1))] \le D_1$, and   (44)
  $E[d(X, f_2(W_1, W_2, Y))] \le D_2$.   (45)

Let $h(a)$ denote the binary entropy function, defined for $a \in [0, 1/2]$. The $(D_1, D_2)$ region for this case is partitioned into four regimes, $L_1, \ldots, L_4$, as shown in Fig. 2.

Fig. 3. Illustration of the rate-equivocation tradeoff: $R(D_1, p/2)$, $R(D_1, p/8)$, and $\Gamma(D_1, D_2)$, with reference levels $H(Y) = h(p) + 1 - p$ and $H(Y \mid X) = h(p)$.

The rate-distortion-equivocation tradeoff is given as follows:

  $R(D_1, D_2) = 0$ if $(D_1, D_2) \in L_1$;
  $R(D_1, D_2) = p(1 - h(D_2/p))$ if $(D_1, D_2) \in L_2$;
  $R(D_1, D_2) = 1 - h(D_1)$ if $(D_1, D_2) \in L_3$;
  $R(D_1, D_2) = p(1 - h(D_2/p)) + (1 - p)(1 - h(D_1))$ if $(D_1, D_2) \in L_4$,

and

  $\Gamma(D_1, D_2) = h(p) + (1 - p)h(D_1)$ if $D_1 \le 1/2$;
  $\Gamma(D_1, D_2) = h(p) + (1 - p)$ otherwise.

In Figure 3, we have plotted $R(D_1, D_2)$ and $\Gamma(D_1, D_2)$ for the cases in which $D_2 = p/2$ and $D_2 = p/8$, and $D_1 \in [0, 1/2]$.

Remark 7:
This example shows that the equivocation does not depend on the distortion achieved by decoder 2, which has access to the side information Y, but rather depends only on the distortion achieved by the uninformed decoder 1.
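The closed-form tradeoff above is straightforward to evaluate programmatically. A minimal sketch follows; the regime boundaries ($D_1 = 1/2$, $D_2 = p/2$, and $D_2 = pD_1$) are read off from the partition in Fig. 2 and the coding schemes below, so treat them as this example's assumptions.

```python
import math

def h(a):
    """Binary entropy in bits; h(0) = h(1) = 0."""
    return 0.0 if a <= 0.0 or a >= 1.0 else -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def R_uninformed(D1, D2, p):
    """Minimal rate R(D1, D2), uninformed encoder, over the four regimes of Fig. 2."""
    if D1 >= 0.5:
        return 0.0 if D2 >= p / 2 else p * (1 - h(D2 / p))   # L1 / L2
    if D2 >= p * D1:
        return 1 - h(D1)                                      # L3
    return p * (1 - h(D2 / p)) + (1 - p) * (1 - h(D1))        # L4

def Gamma_uninformed(D1, D2, p):
    """Maximal equivocation; note it depends on D1 only (Remark 7)."""
    return h(p) + (1 - p) * h(min(D1, 0.5))
```

One quick consistency check: on the boundary $D_2 = pD_1$ the $L_3$ and $L_4$ expressions coincide, since $p(1 - h(D_1)) + (1 - p)(1 - h(D_1)) = 1 - h(D_1)$, so the rate function is continuous there.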
1) Upper bound on $\Gamma(D_1, D_2)$: For any $D_1 \ge 1/2$, we use the trivial upper bound

  $\Gamma(D_1, D_2) \le H(Y \mid W_1) \le H(Y)$   (46)
  $= h(p) + 1 - p$.   (47)
For any $D_1 \le 1/2$, we use the following:

  $\Gamma(D_1, D_2) \le H(Y \mid W_1)$   (48a)
  $= H(Y, X \mid W_1) - H(X \mid Y, W_1)$   (48b)
  $= H(X \mid W_1) + H(Y \mid X) - H(X \mid Y, W_1)$   (48c)
  $= H(X \mid W_1) + H(Y \mid X) - pH(X \mid W_1)$   (48d)
  $= H(Y \mid X) + (1 - p)H(X \mid W_1)$   (48e)
  $= H(Y \mid X) + (1 - p)H(X \mid W_1, \hat{X}_1)$   (48f)
  $\le H(Y \mid X) + (1 - p)H(X \mid \hat{X}_1)$   (48g)
  $\le H(Y \mid X) + (1 - p)H(X \oplus \hat{X}_1)$   (48h)
  $= H(Y \mid X) + (1 - p)h(\Pr(X \ne \hat{X}_1))$   (48i)
  $\le h(p) + (1 - p)h(D_1)$,   (48j)

where (48d) follows from a direct verification that $H(X \mid Y, W_1) = pH(X \mid W_1)$ if X is uniform, Y is an erased version of X, and $W_1 - X - Y$ forms a Markov chain.
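The identity behind step (48d) can be checked numerically on a single-letter distribution. The sketch below is only an illustration: the specific test channel $W_1 = X \oplus \mathrm{Ber}(D_1)$ and the parameter values $p = 0.3$, $D_1 = 0.2$ are assumptions made for the example, not part of the proof.

```python
import itertools
import math

def ent(dist):
    """Shannon entropy (bits) of a pmf {outcome: prob}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def marg(pmf, idxs):
    """Marginalize a joint pmf keyed by tuples onto the coordinates in idxs."""
    out = {}
    for k, q in pmf.items():
        kk = tuple(k[i] for i in idxs)
        out[kk] = out.get(kk, 0.0) + q
    return out

def H_cond(pmf, target, given):
    """Conditional entropy H(target | given) in bits."""
    return ent(marg(pmf, given + target)) - ent(marg(pmf, given))

# Illustrative test channel satisfying W1 - X - Y: X uniform,
# W1 = X xor Ber(D1), and Y = X w.p. 1-p, Y = 'E' w.p. p.
p, D1 = 0.3, 0.2
pmf = {}  # keys are (x, y, w1)
for x, n1 in itertools.product((0, 1), repeat=2):
    w1 = x ^ n1
    pn = D1 if n1 else 1 - D1
    for y, py in ((x, 1 - p), ("E", p)):
        key = (x, y, w1)
        pmf[key] = pmf.get(key, 0.0) + 0.5 * pn * py

lhs = H_cond(pmf, target=[0], given=[1, 2])    # H(X | Y, W1)
rhs = p * H_cond(pmf, target=[0], given=[2])   # p * H(X | W1)
```

For this channel both sides equal $p\,h(D_1)$, and $H(Y \mid W_1)$ evaluates to $h(p) + (1 - p)h(D_1)$, i.e., (48j) is met with equality, as the achievability part below confirms.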
2) Lower bound on $R(D_1, D_2)$:
• If $(D_1, D_2) \in L_1$, we use the trivial lower bound $R(D_1, D_2) \ge 0$.
• If $(D_1, D_2) \in L_2$, we use the lower bound $R(D_1, D_2) \ge R^{(Y)}_{WZ}(D_2)$ [19].
• If $(D_1, D_2) \in L_3$, we use the lower bound $R(D_1, D_2) \ge 1 - h(D_1)$.
• If $(D_1, D_2) \in L_4$, we show that

  $R(D_1, D_2) \ge p(1 - h(D_2/p)) + (1 - p)(1 - h(D_1))$.   (49)

Consider an arbitrary $(W_1, W_2)$ such that $(W_1, W_2) - X - Y$ is a Markov chain and there exist functions $f_1$ and $f_2$, with $\hat{X}_1 = f_1(W_1)$ and $\hat{X}_2 = f_2(W_1, W_2, Y)$, such that $\Pr(X \ne \hat{X}_j) \le D_j$, $j = 1, 2$.
Now consider the following sequence of equalities:

  $I(X; W_1) + I(X; W_2 \mid Y, W_1)$
  $= H(X) - H(X \mid W_1) + H(X \mid Y, W_1) - H(X \mid Y, W_1, W_2)$
  $= H(X) - I(X; Y \mid W_1) - H(X \mid Y, W_1, W_2)$
  $= H(X) - H(Y \mid W_1) + H(Y \mid X, W_1) - H(X \mid Y, W_1, W_2)$
  $= H(X) + H(Y \mid X) - H(Y \mid W_1) - H(X \mid Y, W_1, W_2)$.   (50a)

Consider the following term appearing in (50a):

  $H(Y \mid W_1) = H(Y, X \mid W_1) - H(X \mid Y, W_1)$   (51a)
  $= H(Y \mid X) + H(X \mid W_1) - H(X \mid Y, W_1)$   (51b)
  $= H(Y \mid X) + (1 - p)H(X \mid W_1)$   (51c)
  $= H(Y \mid X) + (1 - p)H(X \mid W_1, \hat{X}_1)$   (51d)
  $\le H(Y \mid X) + (1 - p)H(X \mid \hat{X}_1)$   (51e)
  $\le H(Y \mid X) + (1 - p)H(X \oplus \hat{X}_1)$   (51f)
  $\le H(Y \mid X) + (1 - p)h(D_1)$.   (51g)

We also have

  $D_2 \ge \Pr(X \ne \hat{X}_2)$   (52a)
  $= \Pr(Y = E)\Pr(X \ne \hat{X}_2 \mid Y = E) + \Pr(Y \ne E)\Pr(X \ne \hat{X}_2 \mid Y \ne E)$   (52b)
  $\ge \Pr(Y = E)\Pr(X \ne \hat{X}_2 \mid Y = E)$   (52c)
  $= p \Pr(X \ne \hat{X}_2 \mid Y = E)$,   (52d)

which implies that

  $\Pr(X \ne \hat{X}_2 \mid Y = E) \le \frac{D_2}{p} \le \frac{1}{2}$.   (53)

Now consider the following sequence of inequalities for the last term in (50a):

  $H(X \mid Y, W_1, W_2) = H(X \mid Y, W_1, W_2, \hat{X}_2)$   (54a)
  $\le H(X \mid Y, \hat{X}_2)$   (54b)
  $= pH(X \mid Y = E, \hat{X}_2)$   (54c)
  $\le pH(X \oplus \hat{X}_2 \mid Y = E)$   (54d)
  $= ph(\Pr(X \ne \hat{X}_2 \mid Y = E))$   (54e)
  $\le ph(D_2/p)$,   (54f)

where (54f) follows from (53). Using (51g) and (54f), we can lower bound (50a) to arrive at

  $R(D_1, D_2) \ge p(1 - h(D_2/p)) + (1 - p)(1 - h(D_1))$.
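The bound (49) is met with equality by the cascade test channel used in the coding scheme for $L_4$ below ($W_2 = X \oplus N_1$, $W_1 = W_2 \oplus N_2$). The sketch below evaluates the rate expression exactly on the single-letter joint distribution; the numerical values of $(p, D_1, D_2)$ are illustrative assumptions.

```python
import itertools
import math

def ent(dist):
    """Shannon entropy (bits) of a pmf {outcome: prob}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def marg(pmf, idxs):
    """Marginalize a joint pmf keyed by tuples onto the coordinates in idxs."""
    out = {}
    for k, q in pmf.items():
        kk = tuple(k[i] for i in idxs)
        out[kk] = out.get(kk, 0.0) + q
    return out

def h(a):
    return 0.0 if a <= 0.0 or a >= 1.0 else -a * math.log2(a) - (1 - a) * math.log2(1 - a)

# Illustrative point in regime L4 (D1 <= 1/2 and D2 <= p*D1).
p, D1, D2 = 0.4, 0.3, 0.1
d = D2 / p                          # crossover of N1
alpha = (D1 - d) / (1 - 2 * d)      # chosen so alpha*(1-d) + (1-alpha)*d = D1

# Joint pmf over (x, y, w1, w2): X uniform, Y erases X w.p. p,
# W2 = X xor N1 with N1 ~ Ber(d), and W1 = W2 xor N2 with N2 ~ Ber(alpha).
pmf = {}
for x, n1, n2 in itertools.product((0, 1), repeat=3):
    w2, w1 = x ^ n1, x ^ n1 ^ n2
    base = 0.5 * (d if n1 else 1 - d) * (alpha if n2 else 1 - alpha)
    for y, py in ((x, 1 - p), ("E", p)):
        key = (x, y, w1, w2)
        pmf[key] = pmf.get(key, 0.0) + base * py

# Rate I(X; W1) + I(X; W2 | Y, W1), via entropies of marginals; coordinates
# are indexed as 0 = x, 1 = y, 2 = w1, 3 = w2.
I_x_w1 = ent(marg(pmf, [0])) + ent(marg(pmf, [2])) - ent(marg(pmf, [0, 2]))
I_x_w2_cond = (ent(marg(pmf, [0, 1, 2])) + ent(marg(pmf, [1, 2, 3]))
               - ent(marg(pmf, [0, 1, 2, 3])) - ent(marg(pmf, [1, 2])))
rate = I_x_w1 + I_x_w2_cond
bound = p * (1 - h(D2 / p)) + (1 - p) * (1 - h(D1))   # right side of (49)
```

The computed `rate` matches `bound` to numerical precision, and the decoder estimates $\hat{X}_1 = W_1$ and $\hat{X}_2 = Y$ (or $W_2$ when $Y = E$) meet the distortions $D_1$ and $D_2$ exactly.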
3) Coding Scheme:
• If $(D_1, D_2) \in L_1$, the $(R, \Gamma)$ tradeoff is trivial.
• If $(D_1, D_2) \in L_2$, we use the following coding scheme. In this regime we have $D_1 \ge 1/2$; hence the encoder sets $W_1 = \phi$ and sends only one description $W_2 = X \oplus N$, where $N \sim \mathrm{Ber}(D_2/p)$ and N is independent of X. It can be verified that $I(X; W_2 \mid Y) = p(1 - h(D_2/p))$. Decoder 2 estimates X by $\hat{X}_2$ as follows: $\hat{X}_2 = Y$ if $Y \ne E$, and $\hat{X}_2 = W_2$ if $Y = E$. Therefore the achievable distortion at decoder 2 is $(1 - p) \cdot 0 + p(D_2/p) = D_2$.
• If $(D_1, D_2) \in L_3$, we use the following coding scheme. The encoder sets $W_2 = \phi$ and sends only one description $W_1 = X \oplus N$, where $N \sim \mathrm{Ber}(D_1)$ and N is independent of X. It can be verified that $I(X; W_1) = 1 - h(D_1)$. Decoder 1 estimates X as $\hat{X}_1 = W_1$, which leads to a distortion of $D_1$. Decoder 2 estimates X by $\hat{X}_2$ as follows: $\hat{X}_2 = Y$ if $Y \ne E$, and $\hat{X}_2 = W_1$ if $Y = E$. Therefore the achievable distortion at decoder 2 is $(1 - p) \cdot 0 + pD_1 = pD_1$. Hence, as long as $D_2 \ge pD_1$, the fidelity requirement of decoder 2 is satisfied.
• If $(D_1, D_2) \in L_4$, we use the following coding scheme. We select $W_2 = X \oplus N_1$ and $W_1 = W_2 \oplus N_2$, where $N_1 \sim \mathrm{Ber}(D_2/p)$ and $N_2 \sim \mathrm{Ber}(\alpha)$ with $\alpha = (D_1 - D_2/p)/(1 - 2D_2/p)$, and the random variables $N_1$ and $N_2$ are independent of each other and also independent of X. At the uninformed decoder, the estimate is formed as $\hat{X}_1 = W_1$, so that the desired distortion $D_1$ is achieved. At the decoder with side information Y, the estimate $\hat{X}_2$ is formed as follows: $\hat{X}_2 = Y$ if $Y \ne E$, and $\hat{X}_2 = W_2$ if $Y = E$. Therefore the achievable distortion at this decoder is $(1 - p) \cdot 0 + p(D_2/p) = D_2$. It is straightforward to check that the rate required by this scheme matches the stated lower bound on $R(D_1, D_2)$, and that $\Gamma(D_1, D_2) = H(Y \mid W_1) = h(p) + (1 - p)h(D_1)$. This completes the proof of the achievable part.

B. Informed Encoder
For this case, the rate-distortion-equivocation tradeoff is given as

  $R \ge I(X, Y; W_1) + I(X; W_2 \mid W_1, Y)$, and   (55)
  $E \le H(Y \mid W_1)$,   (56)

where the joint distribution of $(W_1, W_2)$ with $(X, Y)$ can be arbitrary. As in the previous section, we partition the space of admissible $(D_1, D_2)$ distortion pairs. For simplicity, we denote these partitions as follows:

  $G_1 = \{(D_1, D_2) : D_1 \ge 1/2,\ D_2 \ge p/2\}$,   (57)
  $G_2 = \{(D_1, D_2) : D_1 \ge 1/2,\ D_2 \le p/2\}$,   (58)
  $G_3 = \{(D_1, D_2) : D_1 \ge D_2 + (1 - p)/2,\ D_2 \le p/2\}$,   (59)
  $G_4 = \{(D_1, D_2) : D_1 \le 1/2,\ D_2 \ge pD_1\}$, and   (60)
  $G_5 = \{(D_1, D_2) : D_1 \le D_2 + (1 - p)/2,\ D_2 \le pD_1\}$.   (61)

These partitions are illustrated in Figure 4.

Fig. 4. Partition of the $(D_1, D_2)$ region: informed encoder case.

We provide a partial characterization of the optimal $(R, E)$ tradeoff as a function of $(D_1, D_2)$. In particular, we establish a tight characterization of the $(R, E)$ pairs for all values of $(D_1, D_2)$ with the exception of $(D_1, D_2) \in G_5$. This characterization reveals the benefit of the encoder side information: it shows that in the presence of encoder side information, there can be several additional $(R, E)$ operating points relative to the case in which the encoder does not have side information.

(a) $(D_1, D_2) \in G_1$: In this case the $(R, \Gamma)$ region is trivial, since both decoders can satisfy their distortion constraints without any communication, which also yields the maximum equivocation, i.e., we have

  $R(D_1, D_2) = 0$, and   (62)
  $\Gamma(D_1, D_2) = h(p) + 1 - p$.   (63)

(b) $(D_1, D_2) \in G_2$: In this case, we use the same proof as in the uninformed case for the partition $L_2$ to show that

  $R(D_1, D_2) = p(1 - h(D_2/p))$, and   (64)
  $\Gamma(D_1, D_2) = h(p) + 1 - p$.   (65)

(c) $(D_1, D_2) \in G_3$: The $(R, \Gamma)$ tradeoff for this case is given as follows:

  $R(D_1, D_2) = p(1 - h(D_2/p))$, and   (66)
  $\Gamma(D_1, D_2) = h(p) + 1 - p$.   (67)

Fig. 5. Illustration of $p(w_1 \mid x, y)$ when $D_1 \ge D_2 + (1 - p)/2$ and $D_2 \in [0, p/2]$.
This case differs from the uninformed encoder case in the sense that, for the same rate, we can achieve the maximum equivocation and a non-trivial distortion for decoder 1. Since $R \geq R_{X|Y}(D_2) = R^{\mathrm{WZ}}_{Y}(D_2)$ and $\Gamma \leq H(Y)$, the converse proof is straightforward. The interesting aspect of this regime is the coding scheme, which utilizes the side information at the encoder in a non-trivial manner. To achieve this tradeoff, we set $W_2 = \phi$, and send only one description $W_1$ to both decoders. The conditional distribution $p(w_1 \mid x, y)$ that is used to generate the $W_1^n$ codewords is illustrated in Figure 5. Hence the rate for this scheme is given by
$R \geq I(X,Y; W_1)$  (68)
$= H(W_1) - H(W_1 \mid X, Y)$  (69)
$= 1 - H(W_1 \mid X, Y)$  (70)
$= 1 - (1-p) - p\,h(D_2/p)$  (71)
$= p(1 - h(D_2/p))$,  (72)
and the equivocation is given as
$\Gamma = H(Y \mid W_1)$  (73)
$= H(Y) - I(Y; W_1)$  (74)
$= H(Y) - H(W_1) + H(W_1 \mid Y)$  (75)
$= H(Y) - 1 + H(W_1 \mid Y)$  (76)
$= H(Y) - 1 + (1-p)H(W_1 \mid Y = X) + pH(W_1 \mid Y = E)$  (77)
$= H(Y) - 1 + (1-p) + p$  (78)
$= H(Y)$.  (79)
Decoder 2 forms its estimate as follows: $\hat{X}_2 = Y$ if $Y \neq E$, and $\hat{X}_2 = W_1$ if $Y = E$, which yields a distortion of $D_2$ at decoder 2. Decoder 1 forms its estimate as $\hat{X}_1 = W_1$, which yields
$P(\hat{X}_1 \neq X) = D_2 + \frac{1-p}{2}$.
Therefore, as long as $D_1 \geq D_2 + (1-p)/2$, this scheme achieves the optimal $(R, \Gamma)$ tradeoff. We now informally describe the intuition behind this coding scheme: since the encoder has access to the side information $Y$, it uses the fact that whenever $Y = X$, no additional rate is required to satisfy the requirement of decoder 2, i.e., for a $(1-p)$-fraction of the time decoder 2 is guaranteed to exactly recover $X$. However, this yields a distortion of $(1-p)/2$ at decoder 1 (since decoder 1 does not have access to $Y$). In the remaining $p$-fraction of the time, the encoder describes $X$ with a distortion $D_2/p$, which contributes a distortion of $D_2$ at both decoders. To summarize, the net distortion at decoder 2 is $D_2$, whereas the distortion at decoder 1 is lowered from $1/2$ to $(1-p)/2 + D_2$. Furthermore, by construction, $W_1$ is independent of $Y$, i.e., $H(Y \mid W_1) = H(Y)$, which results in the maximal equivocation at decoder 1.
(d) $(D_1, D_2) \in G_4$: For this case, the $(R, E)$ tradeoff is given as the set of $(R, E)$ pairs
$R \geq 1 - (1-p)\,h\!\left(\frac{D_1 - p\alpha}{1-p}\right) - p\,h(\alpha)$, and  (80)
$E \leq h(p) + (1-p)\,h\!\left(\frac{D_1 - p\alpha}{1-p}\right)$,  (81)
where the parameter $\alpha$ belongs to the range $\alpha \in [0, D_1/p]$. We now describe the coding scheme that achieves this region: we set $W_2 = \phi$, and send one description $W_1$ at a rate $I(X,Y; W_1)$. The conditional distribution $p(w_1 \mid x, y)$ that is used to generate the $W_1^n$ codewords is illustrated in Figure 6.
The parameters $(\alpha, \beta)$ that describe this distribution are chosen such that
$D_1 \geq P(X \neq W_1)$  (82)
$= (1-p)\beta + p\alpha$,  (83)
so that $\beta \leq (D_1 - p\alpha)/(1-p)$. At decoder 2, the estimate $\hat{X}_2$ is created as $\hat{X}_2 = Y$ if $Y \neq E$, and $\hat{X}_2 = W_1$ if $Y = E$, which yields a distortion of $p\alpha$. Since $\alpha \in [0, D_1/p]$, the worst-case distortion for decoder 2 for a fixed $D_1$ is $p(D_1/p) = D_1$. Hence, as long as $D_2 \geq D_1$, we can satisfy the fidelity requirements at both decoders. By direct calculation, it can be shown that the resulting $(R, E)$ tradeoff is as stated above.

Fig. 6. Illustration of $p(w_1 \mid x, y)$, with $\beta = (D_1 - p\alpha)/(1-p)$, when $D_1 \leq 1/2$ and $D_2 \geq D_1$.

Compared to all the previous cases, the proof of optimality of the above coding scheme is non-trivial and is relegated to the appendix. We remark here that in this regime the tradeoff between rate and privacy can be observed in a precise manner. First, note that the choice $\alpha = D_1$ yields the $(R, E)$ operating point of the uninformed encoder case. Next, when $\alpha$ decreases from $D_1$ to $0$, the equivocation increases, albeit at the cost of a higher rate. This phenomenon does not occur in the case in which the encoder does not have side information. Finally, when $\alpha$ is in the range $(D_1, D_1/p]$, we obtain a lower equivocation by increasing the rate. This phenomenon appears counterintuitive and can be explained as follows: this range of $\alpha$ corresponds to a coding scheme in which we give more weight to the side information $Y$ when describing $X$ to decoder 1. Such a coding scheme can be regarded as the solution to the problem in which the encoder is interested in revealing $Y$ to decoder 1, while simultaneously satisfying the fidelity requirement for $X$ at decoder 1. While it is a feasible solution to the problem, it may not be a desirable coding scheme when the privacy of $Y$ at decoder 1 is of primary concern, and thus there exists a set of rate-equivocation operating points from which one can choose.
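The behavior just described can be traced out directly from (80)-(81). The following minimal sketch (ours; the values of $p$ and $D_1$ are assumed for illustration) evaluates $R(\alpha)$ and $E(\alpha)$ at a few points of the range $[0, D_1/p]$; at $\alpha = D_1$ it recovers the uninformed-encoder rate $1 - h(D_1)$, while a smaller $\alpha$ buys more equivocation at a higher rate, and a larger $\alpha$ spends rate to lower the equivocation:

```python
import math

def h(q):  # binary entropy in bits
    return 0.0 if q in (0.0, 1.0) else -q*math.log2(q) - (1-q)*math.log2(1-q)

p, D1 = 0.5, 0.25                      # assumed example point with D1 <= 1/2
points = []
for alpha in (0.0, D1/2, D1, D1/p):    # flip probability used on erased symbols
    beta = (D1 - p*alpha) / (1 - p)    # flip probability on unerased symbols, cf. (83)
    R = 1 - (1-p)*h(beta) - p*h(alpha)           # rate, (80)
    E = h(p) + (1-p)*h(beta)                     # equivocation, (81)
    points.append((alpha, R, E))
    print(f"alpha={alpha:.3f}  R={R:.4f}  E={E:.4f}")
```

The printed sweep shows the non-monotone role of $\alpha$: both endpoints of the range cost more rate than $\alpha = D_1$, but only the small-$\alpha$ end buys additional privacy.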
In Figure 7, we show the achievable $(R, E)$ tradeoff for fixed values of $p$ and $D_1$ with an informed encoder.
(e) $(D_1, D_2) \in G_5$: For this case, the following $(R, E)$ pairs are achievable:
$R \geq 1 - (1-p)\,h\!\left(\frac{D_1 - p\alpha}{1-p}\right) - p\,h(\alpha)$, and  (84)
$E \leq h(p) + (1-p)\,h\!\left(\frac{D_1 - p\alpha}{1-p}\right)$,  (85)
where $\alpha$ is such that $\alpha \in [0, D_2/p]$. The coding scheme that achieves this tradeoff is similar to the one used when $(D_1, D_2) \in G_4$, with the exception that the range of $\alpha$ is different. The question of the optimality of this tradeoff for this regime is still unresolved.

V. CONCLUDING REMARKS
We have determined the rate-distortion-equivocation region for a source coding problem with two decoders, in which only one of the decoders has correlated side information and it is desired to keep this side information private from the uninformed decoder. We have studied
two cases of this problem depending on the availability of side information at the encoder. We have proved that the Heegard-Berger and the Kaspi coding schemes are optimal even with an additional privacy constraint, for the uninformed and the informed encoder cases, respectively. We have illustrated our results for a binary symmetric source with erasure side information and Hamming distortion, which clearly highlight the difference between the informed and uninformed cases and the advantages of encoder side information for enhancing privacy. Future work includes generalization to multiple decoders as well as to continuously distributed sources.

Fig. 7. Illustration of the rate-equivocation tradeoff with an informed encoder.

APPENDIX
A. Proof of Theorem 1
Converse: The lower bound on $R(D_1, D_2, E)$ follows directly from the converse for the Heegard-Berger problem and is omitted here in the interest of space. We now upper bound the maximal achievable equivocation as
$\frac{1}{n} H(Y^n \mid J) = \sum_{i=1}^{n} \frac{1}{n} H(Y_i \mid Y^{i-1}, J)$  (86a)
$= \sum_{i=1}^{n} \frac{1}{n} H(Y_i \mid W_{1,i})$  (86b)
$\leq \Gamma_U(D_1, D_2)$,  (86c)
where (86b) follows from defining $W_{1,i} \equiv (J, Y^{i-1})$ (see [1, Sec. IV]) and (86c) follows from the definition of $\Gamma_U(D_1, D_2)$ in (33) and its concavity property from Lemma 3.
Achievability: We briefly summarize the Heegard-Berger coding scheme [1]. Fix $p(w_1, w_2 \mid x)$. First generate $M_1 = 2^{n(I(W_1;X)+\epsilon)}$ sequences $w_1^n(j_1)$, $j_1 = 1, 2, \ldots, M_1$, independently and identically distributed (i.i.d.) according to $p(w_1)$. For every $w_1^n(j_1)$ sequence, generate $M_2 = 2^{n(I(W_2;X \mid W_1)+\epsilon)}$ sequences $w_2^n(j_2 \mid j_1)$ i.i.d. according to $p(w_2 \mid w_1(j_1))$. Bin the resulting $w_2^n$ sequences into $S$ bins (analogously to the Wyner-Ziv binning), chosen at random, where $S = 2^{n(I(X;W_2 \mid W_1) - I(Y;W_2 \mid W_1)+\epsilon)}$, and index these bins as $b(j_2)$. Upon observing a source sequence $x^n$, the encoder searches for a $w_1^n(j_1)$ sequence such that $(x^n, w_1^n(j_1)) \in T_{XW_1}(n, \epsilon)$ (the choice of $M_1$ ensures that there exists at least one such $j_1$). Next, the encoder searches for a $w_2^n(j_2 \mid j_1)$ such that $(x^n, w_1^n(j_1), w_2^n(j_2 \mid j_1)) \in T_{XW_1W_2}(n, \epsilon)$ (the choice of $M_2$ ensures that there exists at least one such $j_2$). The encoder sends $(j_1, b(j_2))$, where $b(j_2)$ is the bin index of the $w_2^n(j_2 \mid j_1)$ sequence. Thus, we have that $(X, W_1) - W_2 - B$ forms a Markov chain and
$\Pr(B = b(j_2) \mid (x^n, w_1^n(j_1), w_2^n(j_2 \mid j_1)) \in T_{XW_1W_2}(n, \epsilon)) = \Pr(B = b(j_2) \mid w_2^n(j_2 \mid j_1) \in T_{W_2}(n, \epsilon)) = 1/S.$
(87)
With $\mu$ as defined in (7) for the typical set $T_{XYW_1W_2}$, and $J \equiv (J_1, B(J_2))$, the achievable equivocation can be lower bounded as
$\frac{1}{n} H(Y^n \mid J_1, B(J_2)) \geq \frac{1}{n} H(Y^n \mid J_1, B(J_2), \mu)$  (88a)
$= \frac{1}{n} H(Y^n \mid W_1^n(J_1), B(J_2), \mu)$  (88b)
$\geq \Pr(\mu = 1)\,\frac{1}{n} H(Y^n \mid W_1^n(J_1), B(J_2), \mu = 1)$.  (88c)
The probability $\Pr(y^n \mid w_1^n(j_1), b(j_2), \mu = 1)$, for all $j_1$, $j_2$, and $y^n$, can be written as
$\sum_{(x^n, j_2)} \Pr(y^n, j_2, x^n \mid w_1^n(j_1), b(j_2), \mu = 1)$
$= \sum_{(x^n, j_2)} \Pr(x^n, j_2 \mid w_1^n(j_1), b(j_2), \mu = 1) \Pr(y^n \mid x^n, \mu = 1)$  (89a)
$= \sum_{(x^n, j_2)} \frac{\Pr(x^n, j_2, w_1^n(j_1), b(j_2) \mid \mu = 1)}{\Pr(w_1^n(j_1), b(j_2) \mid \mu = 1)} \Pr(y^n \mid x^n, \mu = 1)$  (89b)
$= \sum_{(x^n, j_2)} \frac{\Pr(x^n, j_2, w_1^n(j_1) \mid \mu = 1)/S}{\Pr(w_1^n(j_1), b(j_2) \mid \mu = 1)} \Pr(y^n \mid x^n, \mu = 1)$  (89c)
$\leq 2^{n\epsilon'} \sum_{(x^n, j_2)} \Pr(x^n, j_2 \mid w_1^n(j_1), \mu = 1) \Pr(y^n \mid x^n, \mu = 1)$  (89d)
$= 2^{n\epsilon'} \sum_{(x^n, j_2)} \Pr(x^n, j_2, y^n \mid w_1^n(j_1), \mu = 1)$  (89e)
$= 2^{n\epsilon'} \Pr(y^n \mid w_1^n(j_1), \mu = 1)$,  (89f)
where (89a) follows from the fact that $Y - X - (W_1, W_2)$ forms a Markov chain and (89d) is obtained by expanding $\Pr(w_1^n(j_1), b(j_2) \mid \mu = 1)$ as follows:
$\Pr(w_1^n(j_1), b(j_2) \mid \mu = 1) = \Pr(w_1^n(j_1) \mid \mu = 1) \sum_{w_2^n} \Pr(b(j_2), w_2^n \mid w_1^n(j_1), \mu = 1)$  (90a)
$= \Pr(w_1^n(j_1) \mid \mu = 1) \sum_{w_2^n} \Pr(w_2^n \mid w_1^n(j_1), \mu = 1)\,\frac{1}{S}$  (90b)
$\geq \Pr(w_1^n(j_1) \mid \mu = 1)\,\frac{1 - \epsilon}{S}$  (90c)
$= \Pr(w_1^n(j_1) \mid \mu = 1)\,\frac{2^{-n\epsilon'}}{S}$,  (90d)
where (90b) follows from the fact that $W_1 - W_2 - B$ forms a Markov chain and (87), while (90c) follows from the fact that for a typical set $\Pr(T_{W_1W_2}(n, \epsilon)) \geq (1 - \epsilon)$ [18, Chap. 2]. Thus, from (89) we have that
$\Pr(y^n \mid w_1^n(j_1), b(j_2), \mu = 1) \leq 2^{n\epsilon'} \Pr(y^n \mid w_1^n(j_1), \mu = 1)$  (91)
$\leq 2^{-n(H(Y \mid W_1) - \epsilon'')}$.  (92)
From (88c) and (92), we then have
$H(Y^n \mid w_1^n(j_1), b(j_2), \mu = 1) \geq \sum_{y^n} \Pr(y^n \mid w_1^n(j_1), \mu = 1)\, n(H(Y \mid W_1) - \epsilon'')$  (93)
$\geq n(1 - \epsilon)(H(Y \mid W_1) - \epsilon'')$,  (94)
such that
$\frac{1}{n} H(Y^n \mid J) \geq \Pr(\mu = 1)\,\frac{1}{n} \sum_{w_1^n, b(j_2)} \Pr(w_1^n(j_1), b(j_2) \mid \mu = 1)\, H(Y^n \mid w_1^n(j_1), b(j_2), \mu = 1)$  (95)
$\geq (1 - \epsilon)^2 (H(Y \mid W_1) - \epsilon'')$,  (96)
where we have used the fact that for a typical set $\Pr(T_{YW_1W_2}(n, \epsilon)) \geq (1 - \epsilon)$ [18, Chap. 2]. The proof concludes by observing that $H(Y^n) \geq H(Y^n \mid J)$ and $\epsilon \to 0$, $\epsilon'' \to 0$ as $n \to \infty$.

B. Proof of Theorem 2
Converse: A lower bound on $R(D_1, D_2, E)$ can be obtained as follows:
$nR \geq H(J)$  (97a)
$\geq I(X^n Y^n; J)$  (97b)
$= I(X^n; J \mid Y^n) + I(Y^n; J)$  (97c)
$= \sum_{i=1}^{n} \left\{ I(X_i; J X^{i-1} Y^{i-1} Y_{i+1}^n \mid Y_i) - I(X_i; X^{i-1} Y^{i-1} Y_{i+1}^n \mid Y_i) + I(Y_i; J, Y^{i-1}) - I(Y_i; Y^{i-1}) \right\}$
$= \sum_{i=1}^{n} \left\{ I(X_i; J X^{i-1} Y^{i-1} Y_{i+1}^n \mid Y_i) + I(Y_i; J Y^{i-1}) \right\}$  (97d)
$= \sum_{i=1}^{n} \left\{ I(X_i; J Y^{i-1} \mid Y_i) + I(X_i; X^{i-1} Y_{i+1}^n \mid J Y^{i-1} Y_i) + I(Y_i; J Y^{i-1}) \right\}$,  (97e)
where (97d) follows from the independence of the pairs $(X_i, Y_i)$ for all $i = 1, 2, \ldots, n$. Let $W_{1,i} \equiv (J, Y^{i-1})$ and $W_{2,i} \equiv (X^{i-1}, Y_{i+1}^n)$. With these definitions, (97e) can be written as
$nR \geq \sum_{i=1}^{n} \left\{ I(X_i Y_i; W_{1,i}) + I(X_i; W_{2,i} \mid W_{1,i}, Y_i) \right\}$  (98)
$\geq \sum_{i=1}^{n} R_I(D_{1,i}, D_{2,i}, E_i)$  (99)
$\geq n R_I(D_1, D_2, E)$,  (100)
where (99) follows from Definition 3 with $D_{1,i}$, $D_{2,i}$, and $E_i$ defined as
$D_{1,i} \equiv \mathbb{E}\left[d\left(X_i, g'_{1,i}(W_{1,i})\right)\right]$,  (101a)
$D_{2,i} \equiv \mathbb{E}\left[d\left(X_i, g'_{2,i}(W_{1,i}, W_{2,i}, Y_i)\right)\right]$, and  (101b)
$E_i \equiv H(Y_i \mid W_{1,i})$,  (101c)
and (100) follows from the convexity of $R_I(D_1, D_2, E)$, the definitions of $D_k$, $k = 1, 2$, in (3), and the concavity of $H(Y \mid W_1)$, and hence of $E$.
We upper bound the maximal achievable equivocation as
$\frac{1}{n} H(Y^n \mid J) = \sum_{i=1}^{n} \frac{1}{n} H(Y_i \mid Y^{i-1}, J)$  (102a)
$= \sum_{i=1}^{n} \frac{1}{n} H(Y_i \mid W_{1,i})$  (102b)
$= \sum_{i=1}^{n} \frac{1}{n} E_i$  (102c)
$\leq \sum_{i=1}^{n} \frac{1}{n} \Gamma_I(D_{1,i}, D_{2,i})$  (102d)
$\leq \Gamma_I(D_1, D_2)$,  (102e)
where (102b) follows from the definition of $W_{1,i}$, and (102c) and (102d) follow from (38) in Definition 3 and from Lemma 3.
Achievability: Fix $p(w_1, w_2 \mid x, y)$. First generate $M_1 = 2^{n(I(W_1;XY)+\epsilon)}$ sequences $w_1^n(j_1)$, $j_1 = 1, 2, \ldots, M_1$, i.i.d. according to $p(w_1)$ (obtained from $p(w_1, w_2 \mid x, y)$). Generate $M_2 = 2^{n(I(W_2;XYW_1)+\epsilon)}$ sequences $w_2^n(j_2)$ i.i.d. according to $p(w_2)$ (obtained from $p(w_1, w_2 \mid x, y)$). Bin the resulting $w_2^n$ sequences into $S$ bins (analogously to the Wyner-Ziv binning), chosen at random, where $S = 2^{n(I(XYW_1;W_2) - I(W_1Y;W_2)+\epsilon)}$, and index these bins as $b(j_2)$. Upon observing a source sequence $(x^n, y^n)$, the encoder searches for a $w_1^n(j_1)$ sequence such that $(x^n, y^n, w_1^n(j_1)) \in T_{XYW_1}(n, \epsilon)$ (the choice of $M_1$ ensures that there exists at least one such $j_1$). Next, the encoder searches for a $w_2^n(j_2)$ such that $(x^n, y^n, w_1^n(j_1), w_2^n(j_2)) \in T_{XYW_1W_2}(n, \epsilon)$ (the choice of $M_2$ ensures that there exists at least one such $j_2$). The encoder sends $(j_1, b(j_2))$, where $b(j_2)$ is the bin index of the $w_2^n(j_2)$ sequence, at a rate $R = I(XY; W_1) + I(X; W_2 \mid W_1 Y) + \epsilon$. Thus, we have
$\Pr(B = b(j_2) \mid (x^n, y^n, w_1^n(j_1), w_2^n(j_2)) \in T_{XYW_1W_2}(n, \epsilon)) = \Pr(B = b(j_2) \mid w_2^n(j_2) \in T_{W_2}(n, \epsilon)) = 1/S,$  (103)
where (103) is the result of the code construction, which yields the Markov chain relationship $(X Y W_1) - W_2 - B$. With $\mu$ as defined in (7) for the typical set $T_{XYW_1W_2}$, and $J \equiv (J_1, B(J_2))$, the achievable equivocation can be lower bounded as
$\frac{1}{n} H(Y^n \mid J_1, B(J_2)) \geq \frac{1}{n} H(Y^n \mid J_1, B(J_2), \mu)$  (104a)
$= \frac{1}{n} H(Y^n \mid W_1^n(J_1), B(J_2), \mu)$  (104b)
$\geq \Pr(\mu = 1)\,\frac{1}{n} H(Y^n \mid W_1^n(J_1), B(J_2), \mu = 1)$.  (104c)
The probability $\Pr(y^n \mid w_1^n(j_1), b(j_2), \mu = 1)$, for all $j_1$, $j_2$, and $y^n$, can be written as
$\sum_{w_2^n} \Pr(y^n, w_2^n \mid w_1^n(j_1), b(j_2), \mu = 1) = \sum_{w_2^n} \Pr(w_2^n \mid w_1^n(j_1), b(j_2), \mu = 1) \Pr(y^n \mid w_1^n(j_1), w_2^n, \mu = 1)$,  (105a)
where (105a) follows from the fact that $(X Y W_1) - W_2 - B$ forms a Markov chain. The probability $\Pr(w_2^n \mid w_1^n(j_1), b(j_2), \mu = 1)$ can be rewritten as
$\frac{\Pr(w_2^n, w_1^n(j_1), b(j_2) \mid \mu = 1)}{\Pr(w_1^n(j_1), b(j_2) \mid \mu = 1)} = \frac{\Pr(w_2^n, w_1^n(j_1) \mid \mu = 1)/|S|}{\sum_{w_2^n} \Pr(w_2^n, w_1^n(j_1) \mid \mu = 1)/|S|}$  (106)
$= \Pr(w_2^n \mid w_1^n(j_1), \mu = 1)$.  (107)
Substituting (107) in (105a), $\Pr(y^n \mid w_1^n(j_1), b(j_2), \mu = 1)$ can be written as
$\sum_{w_2^n} \Pr(w_2^n \mid w_1^n(j_1), \mu = 1) \Pr(y^n \mid w_1^n(j_1), w_2^n, \mu = 1)$
$= \sum_{w_2^n} \Pr(y^n, w_2^n \mid w_1^n(j_1), \mu = 1)$  (108a)
$= \Pr(y^n \mid w_1^n(j_1), \mu = 1)$  (108b)
$\leq 2^{-n(H(Y \mid W_1) - \epsilon)}$,  (108c)
where we have used the fact that for a typical set
$\Pr(T_{YW_1W_2}(n, \epsilon)) \geq (1 - \epsilon)$ [18, Chap. 2]. From (104c) and (108c), we then have
$H(Y^n \mid w_1^n(j_1), b(j_2), \mu = 1) = \sum_{y^n} \Pr(y^n \mid w_1^n(j_1), \mu = 1) \log \frac{1}{\Pr(y^n \mid w_1^n(j_1), \mu = 1)}$  (109a)
$\geq \sum_{y^n} \Pr(y^n \mid w_1^n(j_1), \mu = 1)\, n(H(Y \mid W_1) - \epsilon)$  (109b)
$\geq n(1 - \epsilon)(H(Y \mid W_1) - \epsilon)$,  (109c)
where in (109b) we have used the fact that for a typical set $\Pr(T_{YW_1W_2}(n, \epsilon)) \geq (1 - \epsilon)$ [18, Chap. 2]. Thus, we have
$\frac{1}{n} H(Y^n \mid J) \geq \Pr(\mu = 1)\,\frac{1}{n} \sum_{w_1^n, b(j_2)} \Pr(w_1^n(j_1), b(j_2) \mid \mu = 1)\, H(Y^n \mid w_1^n(j_1), b(j_2), \mu = 1)$  (110)
$\geq (1 - \epsilon)^2 (H(Y \mid W_1) - \epsilon)$,  (111)
where we have again used the fact that for a typical set $\Pr(T_{YW_1W_2}(n, \epsilon)) \geq (1 - \epsilon)$ [18, Chap. 2]. The proof concludes by observing that $H(Y^n) \geq H(Y^n \mid J)$ and $\epsilon \to 0$ as $n \to \infty$.

C. Converse Proof for the Region $G_4$
We start with a simple lower bound on the rate,
$R \geq I(X,Y; W_1) + I(X; W_2 \mid W_1, Y) \geq I(X,Y; \hat{X}_1)$,  (112)
and an upper bound on $\Gamma$,
$\Gamma \leq H(Y \mid W_1) = H(Y \mid W_1, \hat{X}_1) \leq H(Y \mid \hat{X}_1) = H(Y) - I(Y; \hat{X}_1)$.  (113)
We will now use the distortion constraint of decoder 1 alone to simultaneously lower bound the rate and upper bound the equivocation. Consider an arbitrary $p^{(1)}(\hat{x}_1 \mid x, y)$ (and denote this distribution as $P_1$), given as
$p^{(1)}(0 \mid 0, 0) = a$,  $p^{(1)}(0 \mid 1, 1) = b$,  $p^{(1)}(0 \mid 0, E) = c$,  $p^{(1)}(0 \mid 1, E) = d$.
For this distribution, we have
$P(X \neq \hat{X}_1) = \frac{1}{2}\left[(1-p)(1 - a + b) + p(1 - c + d)\right]$,  (114)
$H(\hat{X}_1) = h\!\left(\frac{1}{2}\left[(1-p)(a + b) + p(c + d)\right]\right)$,  (115)
$H(\hat{X}_1 \mid X, Y) = \frac{1-p}{2}\left(h(a) + h(b)\right) + \frac{p}{2}\left(h(c) + h(d)\right)$,  (116)
$H(\hat{X}_1 \mid Y) = \frac{1-p}{2}\left(h(a) + h(b)\right) + p\,h\!\left(\frac{c + d}{2}\right)$.  (117)
These four quantities characterize the bounds in (112) and (113) exactly, and also the achievable distortion. Now consider a new distribution $P_2$, with conditional probabilities as follows:
$p^{(2)}(0 \mid 0, 0) = 1 - b$,  $p^{(2)}(0 \mid 1, 1) = 1 - a$,  $p^{(2)}(0 \mid 0, E) = 1 - d$,  $p^{(2)}(0 \mid 1, E) = 1 - c$.
It is straightforward to verify that the distortion, rate, and equivocation terms are the same for both $P_1$ and $P_2$. Next, define a new distribution $P_3$ as follows:
$p^{(3)}(\hat{x}_1 \mid x, y) = \frac{1}{2}\,p^{(1)}(\hat{x}_1 \mid x, y) + \frac{1}{2}\,p^{(2)}(\hat{x}_1 \mid x, y)$.
We now note that $I(X,Y; \hat{X}_1)$ is convex in $p(\hat{x}_1 \mid x, y)$ and that $H(Y \mid \hat{X}_1) = H(Y) - I(Y; \hat{X}_1)$ is concave in $p(\hat{x}_1 \mid y)$. By Jensen's inequality, this implies that the distribution $P_3$ defined above uses a rate that is at most as large, and leads to an equivocation that is at least as large, when compared to both the distributions $P_1$ and $P_2$. Hence, it suffices to consider input distributions of the form $p^{(3)}(\hat{x}_1 \mid x, y)$, which can be explicitly written as
$p^{(3)}(0 \mid 0, 0) = 1 - \beta$,  $p^{(3)}(0 \mid 1, 1) = \beta$,  $p^{(3)}(0 \mid 0, E) = 1 - \alpha$,  $p^{(3)}(0 \mid 1, E) = \alpha$.
To satisfy the distortion constraint, we also have
$D_1 \geq (1-p)\beta + p\alpha$,
which leads to $\beta = (D_1 - p\alpha)/(1-p)$. Now, also note that for a fixed $\alpha$, this scheme yields a distortion of $p\alpha$ at decoder 2. Furthermore, since the range of $\alpha$ is $[0, D_1/p]$, we note that the worst-case distortion for decoder 2 (for a fixed $D_1$) is $p(D_1/p) = D_1$. This implies that, as long as $D_2 \geq D_1$, this yields the stated tradeoff for the region $G_4$.

REFERENCES
[1] C. Heegard and T. Berger, "Rate distortion when side information may be absent,"
IEEE Trans. Inform. Theory, vol. 31, pp. 727-733, Nov. 1985.
[2] A. Kaspi, "Rate-distortion function when side-information may be present at the decoder," IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 2031-2034, Nov. 1994.
[3] D. Gündüz, E. Erkip, and H. V. Poor, "Lossless compression with security constraints," in Proc. IEEE Intl. Symp. Inform. Theory, Toronto, ON, Canada, 2008, pp. 111-115.
[4] L. Grokop, A. Sahai, and M. Gastpar, "Discriminatory source coding for a noiseless broadcast channel," in Proc. IEEE Intl. Symp. Inform. Theory, Adelaide, Australia, 2005, p. 77.
[5] J. Villard and P. Piantanida, "Secure lossy source coding with side information at the decoders," in Proc. 48th Annual Allerton Conf. Commun., Control and Computing, Monticello, IL, Sept. 2010, pp. 733-739.
[6] R. Tandon, S. Mohajer, and H. V. Poor, "Cascade source coding with erased side information," in Proc. IEEE Intl. Symp. Inform. Theory, St. Petersburg, Russia, Aug. 2011.
[7] R. Tandon, L. Sankar, and H. V. Poor, "Multi-user privacy: The Gray-Wyner system and generalized common information," in Proc. IEEE Intl. Symp. Inform. Theory, St. Petersburg, Russia, Aug. 2011.
[8] R. Tandon and S. Ulukus, "Secure source coding with a helper," Oct. 2009, submitted to the IEEE Trans. Inform. Theory.
[9] L. Sankar, S. R. Rajagopalan, and H. V. Poor, "A theory of privacy and utility in databases," Feb. 2011, submitted to the IEEE Trans. Inform. Theory.
[10] V. Prabhakaran and K. Ramchandran, "On secure distributed source coding," in Proc. IEEE Information Theory Workshop, Tahoe City, CA, Sep. 2007, pp. 442-447.
[11] P. Cuff, "Using a secret key to foil an eavesdropper," in Proc. 48th Annual Allerton Conf. Commun., Control, and Computing, Sep. 2010, pp. 1405-1411.
[12] H. Yamamoto, "A rate-distortion problem for a communication system with a secondary decoder to be hindered," IEEE Trans. Inform. Theory, vol. 34, no. 4, pp. 835-842, Jul. 1988.
[13] N. Merhav and E. Arikan, "The Shannon cipher system with a guessing wiretapper," IEEE Trans. Inform. Theory, vol. 45, no. 6, pp. 1860-1866, Sep. 1999.
[14] E. Perron, S. N. Diggavi, and I. E. Telatar, "On the role of encoder side-information in source coding for multiple decoders," in Proc. 2006 IEEE Intl. Symp. Inform. Theory, Seattle, WA, Jul. 2006, pp. 331-335.
[15] E. Perron, S. Diggavi, and E. Telatar, "The Kaspi rate-distortion problem with encoder side-information: Gaussian case," Nov. 2005, EPFL LICOS-REPORT-2006-004, Lausanne, Switzerland.
[16] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471-480, Jul. 1973.
[17] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. 22, no. 1, pp. 1-10, Jan. 1976.
[18] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[19] T. Weissman and S. Verdú, "The information lost in erasures," IEEE Trans. Inform. Theory, vol. 54, no. 11, pp. 5030-5058, Nov. 2008.