Multi-Terminal Source Coding With Action Dependent Side Information
Yeow-Khiang Chia, Himanshu Asnani and Tsachy Weissman
Department of Electrical Engineering, Stanford University
Email: [email protected], [email protected], [email protected]
Abstract
We consider multi-terminal source coding with a single encoder and multiple decoders, where either the encoder or the decoders can take cost-constrained actions which affect the quality of the side information present at the decoders. For the scenario where decoders take actions, we characterize the rate-cost trade-off region for lossless source coding, and give an achievability scheme for lossy source coding for two decoders which is optimum for a variety of special cases of interest. For the case where the encoder takes actions, we characterize the rate-cost trade-off for a class of lossless source coding scenarios with multiple decoders. Finally, we also consider extensions to other multi-terminal source coding settings with actions, and characterize the rate-distortion-cost tradeoff for a case of successive refinement with actions.
I. INTRODUCTION
The problem of source coding with decoder side information (S.I.) was introduced in [1]. S.I. acts as an important resource in rate distortion problems, where it can significantly reduce the compression rate required. In classical Shannon theory and in work building on [1], S.I. is assumed to be either always present or absent. In practical systems, however, acquisition of S.I. is costly: the encoder or decoder has to expend resources to acquire side information. With this motivation, the framework of source coding with action-dependent side information was introduced in [2], where the authors considered the cases where the encoder or decoder are allowed to take actions (with cost constraints) that affect the quality or availability of the side information present at the decoders and, in some settings, the encoder. As noted in [2], one motivation for this setup is the case where the side information is obtained via a sensor through a sequence of noisy measurements of the source sequence. The sensor may have limited resources, such as acquisition time or power, in obtaining the side information. This is modeled by the cost constraint on the action sequence taken at the decoder. Additional motivation for considering this framework is given in [2]. We also refer readers to recent work in [3], [4] for related Shannon theoretic scenarios invoking the action framework.

In this paper, we extend the source coding with actions framework to the case where there are multiple decoders, which can take actions that affect the quality or availability of S.I. at each decoder, or where the encoder takes actions that affect the quality or availability of S.I. at the decoders. As motivation for this framework, consider the following problem. An encoder observes an i.i.d. source sequence $X^n$ which it wishes to describe to two decoders via a common rate limited link of rate $R$. The decoders, in addition to observing the output of the common rate limited link, also have access to a common sensor which gives side information $Y$ that is correlated with $X$. However, because of contention or resource constraints, when decoder 1 observes the side information, decoder 2 cannot access the side information, and vice versa. This problem is depicted in Figure 1. Even in the absence of constraints on the cost of switching to position 1 or 2, this problem is interesting and non-trivial: how should the decoders share the side information, and what is the optimum sequence of actions to be conveyed to and then taken by the decoders?

By posing the above problem in the framework of source coding with action dependent side information, we solve it for the (near) lossless source coding case and for a special case of lossy source coding with switching dependent side information, and give interpretations of the standard random binning and coding arguments when specialized to this switching problem. As one example of the implications of our findings, when $Y = X$ we show that the optimum rate required for lossless source coding in the above problem is $H(X)/2$. This is clearly a lower bound on the required rate, but that it also suffices for perfect reconstruction of the source simultaneously at both decoders is, at first glance, surprising. We devote a significant portion of this paper to the setting where the side information at the decoders is obtained through a switch that determines which of the two decoders gets to observe the side information, and we obtain a complete characterization of the fundamental performance limits in various scenarios involving such switching. The achievability schemes in these scenarios are interesting in their own right, and also provide insight into more general cases.

Fig. 1: Lossless source coding with switching dependent side information. When the switch is at position 1, decoder 1 observes the side information. When the switch is at position 2, decoder 2 observes the side information.

The rest of the paper is organized as follows. In Section II, we provide formal definitions and problem formulations for the cases considered. In Section III, we consider the setting of lossless source coding with decoders taking actions under cost constraints, and give the optimum rate-cost trade-off region for this setting. In Section IV, we consider the setting of lossy source coding with decoders taking actions under cost constraints, give a general achievability scheme for this setup, and then specialize the scheme to obtain the optimum rate-distortion-cost trade-off region for a number of special cases. In Section V, we consider the setting where actions are taken by the encoder. The rate-cost-distortion tradeoff there is open even for the single decoder case; hence, we only consider a special case of lossless source coding for which we can characterize the rate-cost tradeoff. In Section VI, we extend our setup to two other multiple user settings, including the case of successive refinement with actions. The paper is concluded in Section VII.

II. PROBLEM DEFINITION
In this section, we give formal definitions for, and focus on, the case where there are two decoders. Generalization of the definitions to $K$ decoders is straightforward, and, as we indicate in subsequent sections, some of our results hold in the $K$ decoder setting. We follow the notation of [5]. We use $A$ to denote the action random variable. The distortion measure between sequences is defined in the usual way. Let $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$. Then, $d(x^n, \hat{x}^n) := \frac{1}{n}\sum_{i=1}^{n} d(x_i, \hat{x}_i)$. The cost constraint is also defined in the usual fashion: let $\Lambda(A^n) := \frac{1}{n}\sum_{i=1}^{n} \Lambda(A_i)$. Throughout this paper, sources $(X^n, Y^n)$ are specified by the joint distribution $p(x^n, y^n) = \prod_{i=1}^{n} p_{X,Y}(x_i, y_i)$ (i.i.d.). The decoders obtain side information through a discrete memoryless action channel $P_{Y_1,Y_2|X,A}$ specified by the conditional distribution $p(y_1^n, y_2^n | x^n, a^n) = \prod_{i=1}^{n} p_{Y_1,Y_2|X,A}(y_{1i}, y_{2i} | x_i, a_i)$, with decoder $j$ obtaining side information $Y_j^n$ for $j \in \{1, 2\}$. Extensions to more than two sources or more than two channel outputs for multiple decoders are straightforward.

A. Source coding with actions taken at the decoders
This setting for two decoders is shown in Figure 2. A $(n, 2^{nR})$ code for this setting consists of one encoder
$$f : \mathcal{X}^n \to M \in [1 : 2^{nR}],$$
one joint action encoder at the decoders
$$f_{A\text{-Dec}} : M \in [1 : 2^{nR}] \to \mathcal{A}^n,$$
and two decoders
$$g_1 : \mathcal{Y}_1^n \times [1 : 2^{nR}] \to \hat{\mathcal{X}}_1^n, \qquad g_2 : \mathcal{Y}_2^n \times [1 : 2^{nR}] \to \hat{\mathcal{X}}_2^n.$$

Fig. 2: Lossy source coding with actions at the decoders.

Fig. 3: Lossy source coding with actions at the encoder.

Given a distortion-cost tuple $(D_1, D_2, C)$, a rate $R$ is said to be achievable if, for any $\epsilon > 0$ and $n$ sufficiently large, there exists a $(n, 2^{nR})$ code such that
$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} d_j(X_i, \hat{X}_{j,i})\right] \le D_j + \epsilon, \quad j = 1, 2, \qquad \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \Lambda(A_i)\right] \le C + \epsilon.$$
The rate-distortion-cost region $R(D_1, D_2, C)$ is defined as the infimum of all achievable rates.

Causal reconstruction with action dependent side information: Some results in this paper involve the case of causal reconstruction. In the case of causal reconstruction, the decoder reconstructs $\hat{X}_i$ based only on the received message $M$ and the side information up to time $i$. That is, $g_{j,i} : \mathcal{Y}_j^i \times [1 : 2^{nR}] \to \hat{\mathcal{X}}_{j,i}$, for $j \in \{1, 2\}$ and $i \in [1:n]$.

Remark 2.1:
The case of the decoders taking separate actions $A_1$ and $A_2$ respectively is a special case of our setup, since we can write $A := (A_1, A_2)$.

Remark 2.2:
For the reconstruction mappings, we excluded the action sequence as an input, since $A^n$ is a function of the other input $M$. In our (information) rate expressions, we will nevertheless see the appearance of $A$. As we will see in the next subsection, an advantage of this definition is that it carries over to the case when the encoder takes actions rather than the decoders.

B. Source coding with actions taken at the encoder

This setting is shown in Figure 3. As the definitions and problem statement for this case are similar to the first setting, we mention only the differences between the two settings. The main difference is that the encoder takes actions rather than the decoders. Therefore, in the definition of a code, we replace the joint action encoder at the decoders with the encoder taking actions, given by the function $f_{A\text{-Enc}} : \mathcal{X}^n \to \mathcal{A}^n$. As in the setting of actions taken at the decoders, here too we assume that the side information observed by the decoders is not available at the encoder. In subsequent sections we also describe results pertaining to the case where side information is available at the encoder.
Remark 2.3 (Lossless source coding): Some of our results concern the case of lossless source coding. In this case, the definitions are similar, except that the distortion constraints $D_1, D_2$ are replaced by the block probability of error constraint $P(\{\hat{X}_1^n \neq X^n\} \cup \{\hat{X}_2^n \neq X^n\}) \le \epsilon$.

III. LOSSLESS SOURCE CODING WITH ACTIONS AT THE DECODERS
In this section and the next, we consider the case of source coding with actions taken at the decoders. We first present results for the lossless source coding setting. While the lossless case can be viewed as a special case of lossy source coding, we present the two separately, as we are able to obtain stronger results for more general scenarios in the lossless setting, and give several interesting examples that arise from this setup. The case of lossy source coding for two decoders is presented in Section IV. For the lossless case, we first state the result for the general case of $K$ decoders. Our result is stated in Theorem 1.

Theorem 1:
Let the action channel be given by the conditional distribution $P_{Y_1,Y_2,\ldots,Y_K|X,A}$, with decoder $j$ observing the side information $Y_j$. Then, the minimum rate required for lossless source coding with actions taken at the decoders and cost constraint $C$ is given by
$$R = \min \left[ \max_{j \in [1:K]} H(X|Y_j, A) + I(X;A) \right],$$
where the minimum is taken over distributions $p(x)p(a|x)p(y_1, y_2, \ldots, y_K|x,a)$ such that $\mathbb{E}\,\Lambda(A) \le C$.

Achievability:
As the achievability techniques used are fairly standard (cf. [5]), we give only a sketch of achievability.
Codebook Generation:
• Generate $2^{n(I(X;A)+\epsilon)}$ $A^n$ sequences according to $\prod_{i=1}^n p(a_i)$.
• Bin the set of all $X^n$ sequences into $2^{n(\max_{j\in[1:K]} H(X|Y_j,A) + \epsilon)}$ bins, $B(m_b)$, $m_b \in [1 : 2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}]$.

Encoding:
• Given a source sequence $x^n$, the encoder looks for an index $M_A \in [1 : 2^{n(I(X;A)+\epsilon)}]$ such that $(x^n, a^n(M_A)) \in \mathcal{T}_\epsilon^{(n)}$. If there is none, it outputs a uniformly random index from $[1 : 2^{n(I(X;A)+\epsilon)}]$. If there is more than one such index, it selects an index uniformly at random from the set of feasible indices. From the covering lemma [5, Chapter 3], the probability of error for this step goes to 0 as $n \to \infty$, since there are $2^{n(I(X;A)+\epsilon)}$ $A^n$ sequences.
• The encoder also looks for the index $m_b \in [1 : 2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}]$ such that $x^n \in B(m_b)$.
• It then sends the indices $m_b$ and $M_A$ to the decoders via the common link. This step requires a rate of $R = \max_{j\in[1:K]} H(X|Y_j,A) + I(X;A) + 2\epsilon$.

Decoding:
• The decoders take the joint action $a^n(M_A)$ and obtain their side information $Y_j^n$ for $j \in [1:K]$.
• Decoder $j$ then looks for the unique $X^n$ sequence in bin $B(m_b)$ such that $(X^n, Y_j^n, a^n(M_A)) \in \mathcal{T}_\epsilon^{(n)}$. An error is declared if there is none or more than one $x^n$ sequence satisfying the decoding condition. The probability of error for this step goes to 0 as $n \to \infty$ from the strong law of large numbers and the fact that the number of bins exceeds $2^{n \max_{j\in[1:K]} H(X|Y_j,A)}$.

Converse: Given a $(n, 2^{nR}, C)$ code, consider the rate constraint for decoder $j$. We have
$$\begin{aligned}
nR &\ge H(M) = I(M; X^n) \\
&\stackrel{(a)}{=} I(A^n; X^n) + I(M; X^n|A^n) \\
&\stackrel{(b)}{\ge} I(A^n; X^n) + H(M|A^n, Y_j^n) - H(M|A^n, X^n, Y_j^n) \\
&= H(X^n) - H(X^n|A^n) + I(M; X^n|A^n, Y_j^n) \\
&\stackrel{(c)}{\ge} H(X^n) - H(X^n|A^n) + H(X^n|A^n, Y_j^n) - n\epsilon_n \\
&= H(X^n) - H(X^n|A^n) + H(X^n|A^n) + H(Y_j^n|X^n, A^n) - H(Y_j^n|A^n) - n\epsilon_n \\
&\stackrel{(d)}{\ge} \sum_{i=1}^n \left( H(X_i) + H(Y_{ji}|X_i, A_i) - H(Y_{ji}|A_i) \right) - n\epsilon_n,
\end{aligned}$$
where $(a)$ follows from $A^n$ being a function of $M$; $(b)$ follows from the Markov chain $M \to (X^n, A^n) \to Y_j^n$; $(c)$ follows from the assumption of lossless source coding (Fano's inequality); and $(d)$ follows since conditioning reduces entropy and the action channel is a discrete memoryless channel (DMC). Define $Q$ as the standard time sharing random variable. Observe that $H(X_Q|Q) = H(X_Q) = H(X)$, $H(Y_{jQ}|A_Q, X_Q, Q) = H(Y_{jQ}|A_Q, X_Q) = H(Y_j|A, X)$ and $H(Y_{jQ}|A_Q, Q) \le H(Y_j|A)$. Hence, we can write the lower bound as
$$nR \ge n\left( H(X) + H(Y_j|X, A) - H(Y_j|A) - \epsilon_n \right) = n\left( I(X;A) + H(X|Y_j, A) - \epsilon_n \right).$$
Taking the intersection of the lower bounds over all $K$ decoders then gives the rate expression in the theorem. Finally, the cost constraint on the action follows from $C \ge \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n \Lambda(A_i)\right] = \mathbb{E}\,\Lambda(A)$.

We now specialize the result in Theorem 1 to the case of source coding with switching dependent side information mentioned in the introduction. We consider the more general setting involving $K$ decoders.

Corollary 1: Source coding with switching dependent side information and no cost constraints.
Let $(X, Y)$ be jointly distributed according to $p(x, y)$. Let $\mathcal{A} = [1:K]$ and let $P_{Y_1,\ldots,Y_K|X,A}$ be defined by $Y_j = Y$ when $A = j$, and $Y_j = e$ (erasure) otherwise, for $j \in [1:K]$. Let $\Lambda(a) := 0$ for all $a \in \mathcal{A}$. Then, the minimum rate is given by
$$R = H(X|Y) + \frac{K-1}{K}\, I(X;Y).$$

Proof:
Proof of Corollary 1 amounts to an explicit characterization of the optimizing distribution $p(a|x)$ in Theorem 1. For each $j \in [1:K]$, we have, from Theorem 1,
$$R \ge H(X|Y_j, A) + I(X;A) = H(X|Y) + I(X;Y) - I(X;Y_j|A). \qquad (1)$$
Consider now the sum
$$\sum_{j=1}^K I(X;Y_j|A) \stackrel{(a)}{=} \sum_{a \in \mathcal{A}} p(a)\, I(X;Y|A=a) = H(Y|A) - H(Y|X, A) \stackrel{(b)}{\le} H(Y) - H(Y|X) = I(X;Y). \qquad (2)$$
$(a)$ follows from the fact that $Y_j = e$ for $a \neq j$ and $Y_j = Y$ for $a = j$; $(b)$ follows from the Markov chain $A - X - Y$. Next, averaging the $K$ lower bounds in (1), we obtain
$$R \ge \frac{1}{K}\left( K H(X|Y) + K I(X;Y) - \sum_{j=1}^K I(X;Y_j|A) \right) \ge H(X|Y) + I(X;Y) - \frac{1}{K} I(X;Y) = H(X|Y) + \frac{K-1}{K} I(X;Y),$$
where we used inequality (2) in the second to last step. Finally, noting that this lower bound on the achievable rate can be attained from Theorem 1 by setting $A \perp X$ and $p(a = j) = 1/K$ completes the proof of Corollary 1. A numerical check of this characterization is given below.
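As a quick numerical sanity check, the following Python sketch of ours (not part of the formal development; the doubly symmetric binary source with crossover probability 0.1 and the grid resolution are illustrative assumptions) compares the closed form $H(X|Y) + \frac{K-1}{K} I(X;Y)$ for $K = 2$ against a brute-force minimization of the Theorem 1 expression over conditional action distributions $p(a|x)$.

```python
import numpy as np
from itertools import product

def H(p):
    """Entropy in bits of an array of (possibly unnormalized) probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

pxy = np.array([[0.45, 0.05], [0.05, 0.45]])   # DSBS, crossover 0.1 (assumption)
px = pxy.sum(axis=1)
HX_given_Y = H(pxy) - H(pxy.sum(axis=0))       # H(X|Y)
IXY = H(px) - HX_given_Y                       # I(X;Y)
closed_form = HX_given_Y + 0.5 * IXY           # Corollary 1 with K = 2

def theorem1_rate(q):
    """max_j H(X|Y_j, A) + I(X;A), where q[x] = P(A=1 | X=x)."""
    w = np.stack([q, 1.0 - q], axis=1)         # w[x, a] = P(A = a+1 | X = x)
    pxa = px[:, None] * w                      # joint p(x, a)
    pa = pxa.sum(axis=0)
    IXA = H(px) - sum(pa[a] * H(pxa[:, a] / pa[a]) for a in (0, 1))
    HXYA = []
    for j in (0, 1):                           # decoder j+1 sees Y only when A = j+1
        s = pxy * w[:, j][:, None]             # unnormalized p(x, y, A=j+1)
        seen = H(s) - H(s.sum(axis=0))         # equals P(A=j+1) * H(X|Y, A=j+1)
        unseen = pa[1 - j] * H(pxa[:, 1 - j] / pa[1 - j])
        HXYA.append(seen + unseen)
    return max(HXYA) + IXA

grid = np.linspace(0.02, 0.98, 49)
brute = min(theorem1_rate(np.array(q)) for q in product(grid, repeat=2))
print(f"closed form: {closed_form:.4f}, brute force: {brute:.4f}")  # the two agree
```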
Remark 3.1: The action can be set to a fixed sequence independent of the source sequence. This is perhaps not surprising, since there is no cost on the actions.
Remark 3.2:
For $K = 2$ and $X = Y$, which is the example given in the introduction, we have $R = H(X)/2$.

Remark 3.3:
For this class of channels, the achievability scheme in Theorem 1 has a simple and interesting "modulo-sum" interpretation. We present a sketch of an alternative scheme for this class of switching channels for $K = 2$; it is straightforward to extend the scheme given below to $K$ decoders.

Alternative achievability scheme:
Split the $X^n$ sequence into two equal parts, $X^{n/2}$ and $X_{n/2+1}^n$, and select the fixed action sequence that lets decoder 1 observe $Y^{n/2}$ and decoder 2 observe $Y_{n/2+1}^n$. Separately compress each part using standard random binning with side information to obtain $M_1 \in [1 : 2^{n(H(X|Y)/2 + \epsilon)}]$ and $M_2 \in [1 : 2^{n(H(X|Y)/2 + \epsilon)}]$, corresponding to the first and second half respectively. Within each bin, with high probability, there are only $2^{nI(X;Y)/2}$ typical half-sequences, and we represent each of them with an index $M'_j \in [1 : 2^{n(I(X;Y)/2 + \epsilon)}]$, where $j \in \{1, 2\}$. Send out the indices $M_1$ and $M_2$, which requires a rate of $H(X|Y) + 2\epsilon$. Next, send out the index $M'_1 \oplus M'_2$, which requires a rate of $I(X;Y)/2 + \epsilon$. From $M_1$ and the side information $Y^{n/2}$, decoder 1 can recover $X^{n/2}$ with high probability. Therefore, it can recover $M'_1$ with high probability. Hence, it can recover $M'_2$ from $M'_1 \oplus M'_2$ and, together with $M_2$, recover the $X_{n/2+1}^n$ sequence. The same analysis holds for decoder 2 with the indices interchanged. A toy implementation of this scheme for the extreme case $X = Y$ is given below.
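To make the scheme concrete in the extreme case $X = Y$ of Remark 3.2 (where $H(X|Y) = 0$, so the bin indices $M_1, M_2$ are degenerate and only the modulo-sum index is sent), here is a small self-contained Python toy of ours; the block length and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                 # even block length (illustrative)
x = rng.integers(0, 2, size=n)           # X^n i.i.d. Bern(1/2); here Y = X

# Fixed action sequence: decoder 1 observes Y over the first half of the block,
# decoder 2 over the second half.
y1 = x[: n // 2]
y2 = x[n // 2 :]

# With X = Y, the encoder sends only the modulo-2 sum of the two halves:
# n/2 bits, i.e. rate H(X)/2 for a Bern(1/2) source.
msg = x[: n // 2] ^ x[n // 2 :]

xhat1 = np.concatenate([y1, y1 ^ msg])   # decoder 1: knows first half, XORs out second
xhat2 = np.concatenate([y2 ^ msg, y2])   # decoder 2: symmetric

assert np.array_equal(xhat1, x) and np.array_equal(xhat2, x)
print("both decoders recovered X^n from", n // 2, "transmitted bits")
```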
Corollary 2 gives the characterization of the achievable rate for a general switching dependent side information setup with cost constraints on the actions for two decoders.

Corollary 2: General switching dependent side information for 2 decoders. Define the action channel as follows: $\mathcal{A} = \{0, 1, 2, 3\}$; $A = 0$: $Y_1 = e, Y_2 = e$; $A = 1$: $Y_1 = Y, Y_2 = e$; $A = 2$: $Y_1 = e, Y_2 = Y$; and $A = 3$: $Y_1 = Y, Y_2 = Y$. Let $\Lambda(A = j) = C_j$ for $j \in [0:3]$. Then, the optimum rate-cost trade-off for this class of channels is given by
$$\begin{aligned}
R &\ge I(X;A) + \max\{ H(X|Y_1, A),\ H(X|Y_2, A) \} \\
&= I(X;A) + p_0 H(X|A=0) + \sum_{j=1}^{3} p_j H(X|Y, A=j) + \max\{ p_2 I(X;Y|A=2),\ p_1 I(X;Y|A=1) \},
\end{aligned}$$
for some $p(a|x)$, where $P\{A=j\} = p_j$, satisfying $\sum_{j=0}^{3} p_j C_j \le C$.
Remark 3.4: This setup again has a "modulo-sum" interpretation for the term $\max\{p_2 I(X;Y|A=2), p_1 I(X;Y|A=1)\}$, and the rate can also be achieved by extending the achievability scheme described in Corollary 1. The scheme involves partitioning the $X^n$ sequence according to the value of $A_i$ for $i \in [1:n]$. Following the scheme in Corollary 1, we let $M_j \in [1 : 2^{n(p_j H(X|Y,A=j)+\epsilon)}]$ for $j \in [0:3]$. We first generate a set of $A^n$ codewords according to $\prod_{i=1}^n p(a_i)$. Next, for each $A^n$ codeword, define $A^n_j$ to be $\{A_i : A_i = j\}$. Similarly, let $X^n_j := \{X_i : A_i = j, i \in [1:n]\}$ be the set of possible $X$ sequences corresponding to $A^n_j$. We bin the set of all $X^n_j$ sequences into $2^{n(p_j H(X|Y,A=j)+\epsilon)}$ bins, $B_j(M_j)$. For $j \in \{1, 2\}$, we further bin the set of $x^n_j$ sequences into $2^{n(p_j I(X;Y|A=j)+\epsilon)}$ bins, $B'_j(M'_j)$, $M'_j \in [1 : 2^{n(p_j I(X;Y|A=j)+\epsilon)}]$.

For encoding, given an $x^n$ sequence, the encoder first finds an $A^n$ sequence that is jointly typical with $x^n$ and sends out the index corresponding to the $A^n$ sequence found. Next, it splits the $x^n$ sequence into four partial sequences, $x^n_j$ for $j \in [0:3]$, where $x^n_j$ is the set of $x_i$ corresponding to $A_i = j$. It then finds the bin indices such that $x^n_j \in B_j(M_j)$ for $j \in [0:3]$, and sends out the indices $M_0, M_1, M_2, M_3$ and $M'_1 \oplus M'_2$.

For decoding, we mention only the scheme employed by decoder 1, since the scheme for decoder 2 is the same. From the properties of jointly typical sequences and standard analysis of Slepian-Wolf lossless source coding [6], it is not difficult to see that decoder 1 can recover $x^n_0, x^n_1, x^n_3$ with high probability. Recovery of $x^n_1$ also allows decoder 1 to recover the index $M'_1$ and hence $M'_2$ from $M'_1 \oplus M'_2$. Noting that the rates of $M_2$ and $M'_2$ sum up to $p_2 H(X|A=2) + 2\epsilon$, it is then easy to see that decoder 1 can recover $x^n_2$ with high probability.

In Corollary 1, we showed that, for the case of switching dependent side information, the action sequence is independent of the source $X^n$ when the cost constraint on the actions is absent. A natural question is whether the action is still independent of $X^n$ when a cost constraint on the actions is present. The following example shows that the optimum action sequence is in general dependent on $X^n$.

Example 1: Action is dependent on source statistics when a cost constraint is present.
Let $K = 2$ and let $(X, Y)$ be distributed according to an S-channel, with $X \sim \text{Bern}(1/2)$, $P(Y=1|X=1) = 1$ and $P(Y=0|X=0) = 0.2$. Let $A \in \{1, 2\}$, with $Y_1 = Y$ if $A = 1$ and $Y_2 = Y$ if $A = 2$. Let $P(A=1) = p_1$, $P(X=0|A=1) = 1/2 + \delta_1$ and $P(X=0|A=2) = 1/2 - \delta_2$. Figure 4 shows the probability distributions between the random variables.

Fig. 4: Probability distributions for the random variables used in Example 1.

Since $X \sim \text{Bern}(1/2)$, $\delta_1$ and $\delta_2$ are related by $\delta_2 = p_1 \delta_1/(1 - p_1)$. We therefore set $\delta := \delta_1$ and $\kappa := p_1/(1 - p_1)$ for this example. Now, let $\Lambda(A=1) = 1$ and $\Lambda(A=2) = 0$, and impose a cost constraint $C < 1/2$ on the actions. The optimum rate-cost tradeoff in this case may be obtained from Corollary 2 by setting $C_0 = C_3 = \infty$, $C_1 = 1$ and $C_2 = 0$, giving us
$$R = I(X;A) + p_1 H(X|Y, A=1) + (1 - p_1) H(X|Y, A=2) + \max\{ p_1 I(X;Y|A=1),\ (1 - p_1) I(X;Y|A=2) \}$$
for some $p(a|x)$, where $P\{A=1\} = p_1$, satisfying $p_1 \le C$. The problem of finding the optimum action distribution then reduces (after some straightforward algebra) to the following optimization problem:
$$\min_{p_1, \delta}\ 1 - p_1 H_b(1/2 + \delta) - (1 - p_1) H_b(1/2 - \kappa\delta) + p_1 H(X|Y, A=1) + (1 - p_1) H(X|Y, A=2) + \max\{ p_1 I(X;Y|A=1),\ (1 - p_1) I(X;Y|A=2) \}$$
subject to $0 \le p_1 \le C$ and $-1/2 \le \delta \le 1/2$ (with $|\kappa\delta| \le 1/2$), where
$$H(X|Y, A=1) = \big( (1/2 + \delta)(0.8) + (1/2 - \delta) \big)\, H_b\!\left( \frac{1/2 - \delta}{(1/2 + \delta)(0.8) + (1/2 - \delta)} \right),$$
$$H(X|Y, A=2) = \big( (1/2 - \kappa\delta)(0.8) + (1/2 + \kappa\delta) \big)\, H_b\!\left( \frac{1/2 + \kappa\delta}{(1/2 - \kappa\delta)(0.8) + (1/2 + \kappa\delta)} \right),$$
and $H_b(\cdot)$ is the binary entropy function.

While an exact solution to this (non-convex) optimization problem involves searching over $p_1$ and $\delta$, it is easy to see that if $A$ is restricted to be independent of $X$, which corresponds to restricting $\delta$ to equal $0$, then the optimum choice is $p_1 = C$, i.e., the full cost budget is used. Choosing the same $p_1$ together with a suitable $\delta < 0$, however, yields a rate strictly smaller than the optimum under $\delta = 0$, which shows that the optimum action sequence is in general dependent on the source $X^n$ when cost constraints are present.

An explanation for this observation is as follows. The cost constraint forces decoder 1 to see less of the side information $Y$ than decoder 2. It may therefore make sense to bias the distribution of $X$ given $A = 1$ so that $Y$ conveys more information about the source sequence $X$, even at the expense of having to describe the action sequence to the decoders. Roughly speaking, the amount of information conveyed about $X$ by $Y$ may be measured by $I(X;Y)$, and choosing $\delta < 0$ strictly increases $I(X;Y|A=1)$ relative to $\delta = 0$. A plot of the optimum rate versus cost tradeoff, obtained by searching over a grid of $p_1$ and $\delta$, is shown in Figure 5; a sketch of such a grid search is given after the figure caption. The figure also shows the rate obtained if actions are forced to be independent of the source sequence.

Fig. 5: Rate versus cost constraint for Example 1. It is easy to show operationally that the optimum rate versus cost curve is convex in the cost constraint. When the cost constraint approaches zero, the rate approaches 1, since this case corresponds to decoder 1 not seeing any of the side information. When the cost constraint approaches 0.5, the rate approaches the minimum rate without cost constraint. The red dashed line shows the rate that would be obtained if actions were forced to be independent of the source. As can be seen on the graph, forcing actions to be independent of the source is in general not optimum when a cost constraint is present. The optimum rate versus cost plot appears to be linear over a range of cost constraints. It can be shown that if the cost constraint is below a threshold, then the optimum rate is a linear function of the cost constraint; however, the plot obtained via numerical simulation appears to be linear over a wider range than what we obtained by analysis. Performing a more refined analysis to obtain a cost threshold matching the one observed in simulation appears to be difficult, due to the nature of the optimization problem involved.
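The following Python sketch of ours carries out the grid search over $(p_1, \delta)$ described above; the cost budget $C = 0.3$ and the grid resolution are illustrative assumptions.

```python
import numpy as np

def Hb(q):
    """Binary entropy in bits, safe at the endpoints."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def rate(p1, delta, C=0.3):
    """Objective of Example 1 for the S-channel P(Y=1|X=1)=1, P(Y=1|X=0)=0.8."""
    kd = p1 * delta / (1 - p1)                     # kappa * delta
    if not (0 < p1 <= C) or abs(delta) > 0.5 or abs(kd) > 0.5:
        return np.inf
    I_XA = 1 - (p1 * Hb(0.5 + delta) + (1 - p1) * Hb(0.5 - kd))
    s1 = (0.5 + delta) * 0.8 + (0.5 - delta)       # P(Y=1 | A=1)
    s2 = (0.5 - kd) * 0.8 + (0.5 + kd)             # P(Y=1 | A=2)
    HXY1 = s1 * Hb((0.5 - delta) / s1)             # H(X | Y, A=1)
    HXY2 = s2 * Hb((0.5 + kd) / s2)                # H(X | Y, A=2)
    I1 = Hb(0.5 + delta) - HXY1                    # I(X;Y | A=1)
    I2 = Hb(0.5 - kd) - HXY2                       # I(X;Y | A=2)
    return I_XA + p1 * HXY1 + (1 - p1) * HXY2 + max(p1 * I1, (1 - p1) * I2)

best = min(rate(p, d) for p in np.linspace(0.01, 0.30, 30)
           for d in np.linspace(-0.45, 0.45, 91))
indep = min(rate(p, 0.0) for p in np.linspace(0.01, 0.30, 30))
print(f"optimum over (p1, delta): {best:.4f}; with A independent of X: {indep:.4f}")
```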
IV. LOSSY SOURCE CODING WITH ACTIONS AT THE DECODERS

In this section, we first consider the case when causal reconstruction is required, and give the general rate-distortion-cost region for $K$ decoders. Next, we consider the case of lossy noncausal reconstruction for two decoders and give a general achievability scheme for this case. We then show that our achievability scheme is optimum for several special cases. Finally, we discuss some connections between our setting and the complementary delivery setting introduced in [7].

A. Causal reconstruction for K decoders

Theorem 2: Causal lossy reconstruction for K decoders. When the decoders are restricted to causal reconstruction [8], $R(D_1, D_2, \ldots, D_K, C)$ is given by
$$R = \min I(U; X),$$
where the minimum is over $p(u|x)$, functions $f$ with $A = f(U)$, and reconstruction functions $\hat{x}_j$, $j \in [1:K]$, such that
$$\mathbb{E}\, d_j(X, \hat{x}_j(U, Y_j)) \le D_j \text{ for } j \in [1:K], \qquad \mathbb{E}\,\Lambda(A) \le C.$$
The cardinality of $\mathcal{U}$ is upper bounded by $|\mathcal{U}| \le |\mathcal{X}||\mathcal{A}| + K$.

Remark 4.1: Theorem 2 generalizes the corresponding result for one decoder in [2, Theorem 3].
Proof:
As the achievability scheme is a straightforward extension of the scheme in [2, Theorem 3], we omit the proof of achievability here. For the converse, given a code that satisfies the cost and distortion constraints, we have
$$\begin{aligned}
nR &\ge H(M) = I(X^n; M) \\
&\stackrel{(a)}{=} \sum_{i=1}^n \left( H(X_i) - H(X_i|M, X^{i-1}) \right) \\
&\stackrel{(b)}{=} \sum_{i=1}^n \left( H(X_i) - H(X_i|M, X^{i-1}, A^{i-1}) \right) \\
&\stackrel{(c)}{=} \sum_{i=1}^n \left( H(X_i) - H(X_i|M, X^{i-1}, A^{i-1}, Y_1^{i-1}, \ldots, Y_K^{i-1}) \right) \\
&\ge \sum_{i=1}^n \left( H(X_i) - H(X_i|U_i) \right),
\end{aligned}$$
where $(a)$ follows from the fact that $X^n$ is a memoryless source; $(b)$ follows from the fact that $A^{i-1}$ is a function of $M$; $(c)$ follows from the fact that the action channel $p(y_1, y_2, \ldots, y_K|x, a)$ is memoryless; and the last step follows from defining $U_i = (M, Y_1^{i-1}, \ldots, Y_K^{i-1})$. Finally, defining $Q$ to be a random variable uniform over $[1:n]$, independent of all other random variables, $U = (U_Q, Q)$, $X = X_Q$, $A = A_Q$ and $Y_j = Y_{jQ}$ for $j \in [1:K]$ gives the required lower bound on the minimum rate. Further, we have $A = f(U)$. It remains to verify that the cost and distortion constraints are satisfied. Verification of the cost constraint is straightforward. For the distortion constraints, we have for $j \in [1:K]$
$$\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n d_j\big(X_i, \hat{x}_{ji}(M, Y_j^i)\big) \right] \ge \mathbb{E}\, d_j\big(X, \hat{x}'_j(U, Y_j)\big),$$
where we define $\hat{x}'_j(U, Y_j) := \hat{x}_{jQ}(M, Y_j^Q)$. This shows that the defined auxiliary random variable $U$ satisfies the distortion constraints. Finally, the cardinality of $\mathcal{U}$ can be upper bounded by using the support lemma [9]: we require $|\mathcal{X}||\mathcal{A}| - 1$ letters to preserve $P_{X,A}$, which also preserves the cost constraint, and $K + 1$ additional letters to preserve the rate and the $K$ distortion constraints.

We now turn to the case of noncausal reconstruction. For this setting, we give results only for the case of two decoders.

B. Noncausal reconstruction for two decoders
We first give a general achievability scheme for this setting.
Theorem 3:
An achievable rate for lossy source coding with actions at the decoders is given by
$$R \ge I(X;A) + \max\{ I(X;U|A, Y_1),\ I(X;U|A, Y_2) \} + I(X;V_1|U, A, Y_1) + I(X;V_2|U, A, Y_2)$$
for some $p(x)p(a|x)p(u|a,x)p(v_1|u,a,x)p(v_2|u,a,x)p(y_1,y_2|x,a)$ and reconstruction functions $\hat{x}_1$ and $\hat{x}_2$ satisfying
$$\mathbb{E}\, d_j\big(X, \hat{x}_j(U, V_j, A, Y_j)\big) \le D_j \text{ for } j = 1, 2, \qquad \mathbb{E}\,\Lambda(A) \le C.$$
We provide a sketch of achievability in Appendix A, since the techniques used are fairly straightforward. As an overview, the encoder first tells the decoders the action sequence to take. It then sends a common description of $X^n$, $U^n$, to both decoders. Based on the action sequence $A^n$ and the common description $U^n$, the encoder sends $V_1^n$ and $V_2^n$ to decoders 1 and 2 respectively. We do not require decoder 1 to decode $V_2^n$, or decoder 2 to decode $V_1^n$.

Theorem 3 is optimum for the following special cases.

Proposition 1: Heegard-Berger-Kaspi [10], [11] extension.
Suppose the following Markov chain holds: $(X, A) - (A, Y_1) - (A, Y_2)$. Then, the rate-distortion-cost trade-off region is given by
$$R \ge I(X;A) + I(X;U|A, Y_2) + I(X;V|U, A, Y_1)$$
for some $p(x)p(a|x)p(u, v|x, a)p(y_1|x, a)p(y_2|y_1, a)$ satisfying
$$\mathbb{E}\, d_1\big(X, \hat{X}_1(U, V, A, Y_1)\big) \le D_1, \quad \mathbb{E}\, d_2\big(X, \hat{X}_2(U, A, Y_2)\big) \le D_2, \quad \mathbb{E}\,\Lambda(A) \le C.$$
The cardinality of the auxiliary random variables is upper bounded by $|\mathcal{U}| \le |\mathcal{X}||\mathcal{A}| + 2$ and $|\mathcal{V}| \le |\mathcal{U}|(|\mathcal{X}||\mathcal{A}| + 1)$.

The achievability for this proposition follows from Theorem 3 by setting $V_2 = \emptyset$ and noting that, since $(X, A) - (A, Y_1) - (A, Y_2)$, the terms in the $\max\{\cdot\}$ function simplify to $I(X;U|A, Y_2)$. We give a proof of the converse as follows.

Converse:
Given a code that satisfies the constraints,
$$\begin{aligned}
nR &\ge H(M) = H(M, A^n) = H(A^n) + H(M|A^n) \\
&\ge H(A^n) - H(A^n|X^n) + H(M|A^n, Y_2^n) - H(M|Y_2^n, A^n, X^n) \\
&= I(X^n; A^n) + I(X^n; M|A^n, Y_2^n) \\
&= I(X^n; A^n) + I(X^n; M, Y_1^n|A^n, Y_2^n) - I(X^n; Y_1^n|M, A^n, Y_2^n) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - H(X^n|M, Y_1^n, A^n, Y_2^n) - I(X^n; Y_1^n|M, A^n, Y_2^n) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^n \left( H(X_i|M, Y_1^n, A^n, Y_2^n, X^{i-1}) + I(X^n; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&\ge I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^n \left( H(X_i|M, Y_1^n, A^n, Y_2^n) + I(X^n; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&\stackrel{(a)}{=} I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^n \left( H(X_i|M, Y_1^n, A^n, Y_2^n) + I(X_i; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^n H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) + \sum_{i=1}^n \left( I(X_i; Y_{1,i}^n|M, A^n, Y_2^n, Y_1^{i-1}) - I(X_i; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^n H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) + \sum_{i=1}^n I(X_i; Y_{1,i+1}^n|M, A^n, Y_2^n, Y_1^i) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^n H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) + \sum_{i=1}^n I(X_i; Y_{1,i+1}^n|M, A^n, Y_2^{n\setminus i}, Y_{1i}, Y_1^{i-1}),
\end{aligned}$$
where $(a)$ follows from the Markov chain $X^{n\setminus i} - (M, A^n, Y_2^n, Y_1^{i-1}, X_i) - Y_{1i}$, and the last step follows from the Markov chain assumption $X - (A, Y_1) - (A, Y_2)$. Consider now
$$I(X^n; A^n) + H(X^n|A^n, Y_2^n) = I(X^n; A^n) + H(X^n, Y_2^n|A^n) - H(Y_2^n|A^n) = H(X^n) + H(Y_2^n|A^n, X^n) - H(Y_2^n|A^n) \ge \sum_{i=1}^n \left( H(X_i) + H(Y_{2i}|X_i, A_i) - H(Y_{2i}|A_i) \right).$$
Hence,
$$nR \ge \sum_{i=1}^n \left( H(X_i) + H(Y_{2i}|X_i, A_i) - H(Y_{2i}|A_i) \right) - \sum_{i=1}^n H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) + \sum_{i=1}^n I(X_i; Y_{1,i+1}^n|M, A^n, Y_2^{n\setminus i}, Y_{1i}, Y_1^{i-1}).$$
Define now $Q$ to be a random variable uniform over $[1:n]$, independent of all other random variables; $X = X_Q$, $Y_1 = Y_{1Q}$, $Y_2 = Y_{2Q}$, $A = A_Q$, $U_i = (M, Y_1^{i-1}, A^{n\setminus i}, Y_2^{n\setminus i})$, $V_i = Y_{1,i+1}^n$, $U = (U_Q, Q)$ and $V = V_Q$. Then, we have
$$\begin{aligned}
R &\ge H(X) + H(Y_2|X, A) - H(Y_2|A, Q) - H(X|A, Y_2, U) + I(X; V|A, Y_1, U) \\
&\ge H(X) + H(Y_2|X, A) - H(Y_2|A) - H(X|A, Y_2, U) + I(X; V|A, Y_1, U) \\
&= I(X; A) + I(X; U|A, Y_2) + I(X; V|A, Y_1, U).
\end{aligned}$$
It remains to verify that the definitions of $U$, $V$ and $A$ satisfy the distortion and cost constraints, which is straightforward. The proof of the cardinality bounds follows from standard techniques.

The next proposition extends our results for switching dependent side information to a class of lossy source coding problems with switching dependent side information.

Proposition 2: Special case of switching dependent side information.
Let $Y_1 = X, Y_2 = Y$ if $A = 1$ and $Y_1 = Y, Y_2 = X$ if $A = 2$, and assume that for all $x$ there exist $\hat{x}_1$ and $\hat{x}_2$ such that $d_1(x, \hat{x}_1) = 0$ and $d_2(x, \hat{x}_2) = 0$. Then, the rate-distortion-cost trade-off region is given by
$$R \ge I(X;A) + \max\{ P(A=2)\, I(X;U_1|A=2, Y),\ P(A=1)\, I(X;U_2|A=1, Y) \}$$
for some $p(x, y)p(a|x)p(u_1|x, a=2)p(u_2|x, a=1)$ satisfying
$$P(A=2)\,\mathbb{E}\big( d_1(X, \hat{X}_1(Y, U_1)) \,\big|\, A=2 \big) \le D_1, \quad P(A=1)\,\mathbb{E}\big( d_2(X, \hat{X}_2(Y, U_2)) \,\big|\, A=1 \big) \le D_2, \quad \mathbb{E}\,\Lambda(A) \le C.$$
The cardinality of the auxiliary random variables is upper bounded by $|\mathcal{U}_1| \le |\mathcal{X}| + 1$ and $|\mathcal{U}_2| \le |\mathcal{X}| + 1$.

Achievability follows from Theorem 3 by setting $V_1 = V_2 = \emptyset$ and $U = U_2$ if $A = 1$ and $U = U_1$ if $A = 2$. We give the proof of the converse as follows.

Converse:
Given a code that satisfies the cost and distortion constraints, consider the rate required for decoder 1. We have
$$\begin{aligned}
nR &\ge H(M) = H(M, A^n) = H(A^n) + H(M|A^n) \\
&\ge H(A^n) - H(A^n|X^n) + H(M|A^n, Y_1^n) - H(M|Y_1^n, A^n, X^n) \\
&= I(X^n; A^n) + I(X^n; M|A^n, Y_1^n) \\
&= I(X^n; A^n) + H(X^n, Y_1^n|A^n) - H(Y_1^n|A^n) - H(X^n|M, A^n, Y_1^n) \\
&= H(X^n) + H(Y_1^n|A^n, X^n) - H(Y_1^n|A^n) - H(X^n|M, A^n, Y_1^n) \\
&\ge \sum_{i=1}^n \left( H(X_i) + H(Y_{1i}|X_i, A_i) - H(Y_{1i}|A_i) \right) - \sum_{i=1}^n H(X_i|M, A^n, Y_1^n).
\end{aligned}$$
As before, we define $Q$ to be a uniform random variable over $[1:n]$, independent of all other random variables. We then have
$$\begin{aligned}
R &\ge H(X_Q|Q) + H(Y_{1Q}|X_Q, A_Q, Q) - H(Y_{1Q}|A_Q, Q) - H(X_Q|M, A^n, Y_1^n, Q) \\
&\stackrel{(a)}{\ge} H(X) + H(Y_1|X, A) - H(Y_1|A) - H(X|M, A^n, Y_1^n, Q) \\
&\stackrel{(b)}{=} I(X;A) + I(X; U_1|Y_1, A).
\end{aligned}$$
$(a)$ follows from the discrete memoryless nature of the action channel and the fact that conditioning reduces entropy; $(b)$ follows from defining $U_{1i} = (M, A^{n\setminus i}, Y_1^{n\setminus i})$ and $U_1 = (U_{1Q}, Q)$. Expanding the second term in terms of $A$, using $Y_1 = X$ when $A = 1$ and $Y_1 = Y$ when $A = 2$, we obtain
$$R \ge I(X;A) + P(A=2)\, I(X; U_1|Y, A=2).$$
For decoder 2, the same steps with side information $Y_2$ instead of $Y_1$, defining $U_{2i} = (M, A^{n\setminus i}, Y_2^{n\setminus i})$ and $U_2 = (U_{2Q}, Q)$, yield
$$R \ge I(X;A) + P(A=1)\, I(X; U_2|Y, A=1).$$
Taking the maximum of the two lower bounds yields
$$R \ge I(X;A) + \max\{ P(A=2)\, I(X;U_1|Y, A=2),\ P(A=1)\, I(X;U_2|Y, A=1) \}$$
for some $p(a|x)p(u_1, u_2|x, a)$. Verifying the cost constraint is straightforward. As for the distortion constraint, we have for decoder 1
$$\mathbb{E}\left[ \frac{1}{n}\, d_1\big(X^n, \hat{x}_1^n(M, A^n, Y_1^n)\big) \right] = \mathbb{E}\, d_1\big(X, \hat{x}_1(U_1, A, Y_1)\big) \ge P(A=2)\,\mathbb{E}\big( d_1(X, \hat{x}_1(U_1, Y)) \,\big|\, A=2 \big),$$
where the inequality uses the nonnegativity of the distortion measure. The same arguments hold for decoder 2. It remains to show that the probability distribution can be restricted to the form $p(a|x)p(u_1|a,x)p(u_2|a,x)$. Observe that $P(A=2)\,\mathbb{E}(d_1(X, \hat{x}_1(U_1, Y))|A=2)$ and $P(A=2)\,I(X;U_1|Y, A=2)$ depend on the joint distribution only through the marginal $p(a, u_1|x)$, while $P(A=1)\,\mathbb{E}(d_2(X, \hat{x}_2(U_2, Y))|A=1)$ and $P(A=1)\,I(X;U_2|Y, A=1)$ depend on the joint distribution only through the marginal $p(a, u_2|x)$. Hence, restricting the joint distribution to the form $p(a|x)p(u_1|a,x)p(u_2|a,x)$ does not affect the rate, cost or distortion constraints. It remains to bound the cardinality of the auxiliary random variables, which follows from standard techniques. This completes the proof of the converse.

Remark 4.2:
The condition on the distortion measures serves only to remove distortion offsets; it can be dropped in a fairly straightforward manner.
Remark 4.3:
As with lossless source coding with switching dependent side information, a modulo-sum interpretation for the terms in the max expression is possible. When $A = 1$, the encoder codes for decoder 2, resulting, after binning, in an index $I_2$ for the codeword $U_2^n$; and when $A = 2$, the encoder codes for decoder 1, resulting, after binning, in an index $I_1$ for the codeword $U_1^n$. The encoder sends out the modulo sum of the indices of the two codewords ($I_1 \oplus I_2$), along with the index of the action codeword. Decoder 1 observes the $X_i$ sequence on the positions where $A_i = 1$ and hence can recover the index $I_2$. Therefore, it can recover its desired index $I_1$ from $I_1 \oplus I_2$. A similar analysis holds for decoder 2.

Example 2: Binary source with Hamming distortion and no cost constraint.
Let $Y = \emptyset$ and $X \sim \text{Bern}(1/2)$. Assume no cost on the actions taken, $\Lambda(A=1) = \Lambda(A=2) = 0$, and let the distortion measure be Hamming. Then, the rate distortion trade-off evaluates to
$$R = \min_\alpha \max\left\{ \alpha\left(1 - H\!\left(\frac{D_1}{\alpha}\right)\right) 1\!\left(\frac{D_1}{\alpha} \le \frac{1}{2}\right),\ (1-\alpha)\left(1 - H\!\left(\frac{D_2}{1-\alpha}\right)\right) 1\!\left(\frac{D_2}{1-\alpha} \le \frac{1}{2}\right) \right\},$$
where $1(\cdot)$ denotes the indicator function. As a check, note that if $D_1, D_2 \to 0$, the rate obtained is $1/2$, which agrees with the rate obtained in Corollary 1 for the lossless case. The result follows from explicitly evaluating the result in Proposition 2. Let $P(A=2) = \alpha$. From Proposition 2, we have
$$\begin{aligned}
R &\ge I(X;A) + P(A=2)\, I(X; U_1|Y, A=2) \\
&= 1 - (1-\alpha)H(X|A=1) - \alpha H(X|A=2) + \alpha H(X|A=2) - \alpha H(X|U_1, A=2) \\
&\ge \alpha - \alpha H(X|U_1, A=2) \\
&\ge \alpha\left( 1 - H(X \oplus \hat{X}_1|U_1, A=2) \right) \\
&\ge \alpha\left( 1 - H\!\left(\frac{D_1}{\alpha}\right) \right) 1\!\left(\frac{D_1}{\alpha} \le \frac{1}{2}\right).
\end{aligned}$$
The last step follows from the observations that (i) if $D_1/\alpha > 1/2$, then we lower bound $R$ by $0$; and (ii) if $D_1/\alpha \le 1/2$, then from the distortion constraint $\alpha\,\mathbb{E}[d(X, \hat{X}_1)|A=2] \le D_1$, we have $H(X \oplus \hat{X}_1|A=2) \le H(D_1/\alpha)$. The other bound is derived in the same manner. The fact that this rate can be attained is straightforward, since we can choose $U_1 = \hat{X}_1$ when $A = 2$ and $U_2 = \hat{X}_2$ when $A = 1$. In this example, the action sequence is independent of the source but, unlike the case of lossless source coding, $P(A=1)$ is not in general equal to $P(A=2)$; it depends on the distortion constraints at the individual decoders. A surface plot of the rate versus the distortion constraints at the two decoders is shown in Figure 6; a numerical evaluation sketch is given below.

Fig. 6: Plot of rate versus distortions. The figure plots the rate distortion surface $R(D_1, D_2)$ for Example 2: there is no side information ($Y = \emptyset$), $X \sim \text{Bern}(1/2)$, there is no cost on the actions ($\Lambda(A=1) = \Lambda(A=2) = 0$), and the distortion measure is Hamming. Note that as $D_1, D_2 \to 0.5$ the rate approaches $0$, and at $D_1 = D_2 = 0$ the rate is $1/2$.
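A short Python sketch of ours evaluating the rate-distortion trade-off of Example 2 (the grid over $\alpha$ and the sample distortion pairs are illustrative assumptions):

```python
import numpy as np

def Hb(q):
    """Binary entropy in bits."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

def term(beta, D):
    """beta * (1 - Hb(D/beta)) * 1{D/beta <= 1/2}, following Example 2."""
    if beta <= 0 or D / beta > 0.5:
        return 0.0
    return beta * (1 - Hb(D / beta))

def R(D1, D2):
    """min over alpha = P(A=2) of max{ decoder-1 term, decoder-2 term }."""
    alphas = np.linspace(0.001, 0.999, 999)
    return min(max(term(a, D1), term(1 - a, D2)) for a in alphas)

print(R(1e-6, 1e-6))   # approximately 0.5 (the lossless corner)
print(R(0.1, 0.25))    # an interior point of the surface in Fig. 6
```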
C. Connections with Complementary Delivery

In the prequel, we considered several cases of switching dependent side information in which the achievability scheme has a simple "modulo-sum" interpretation for the terms in the max function. This interpretation is not unique to our setup, and in this subsection we consider the complementary delivery setting [7] in which it also arises. Formally, the complementary delivery problem is a special case of our setting, obtained by letting $A = \emptyset$, $X = (\tilde{X}, \tilde{Y})$, $P(Y_1, Y_2|X) = 1_{\{Y_1 = \tilde{X},\, Y_2 = \tilde{Y}\}}$, $\Lambda(A) = 0$, $d_1(X, \hat{X}_1) = d'_1(\tilde{Y}, \hat{X}_1)$ and $d_2(X, \hat{X}_2) = d'_2(\tilde{X}, \hat{X}_2)$. For this subsection, for notational convenience, we use $X$ in place of $\tilde{X}$, $Y$ in place of $\tilde{Y}$, $\hat{Y}$ in place of $\hat{X}_1$ and $\hat{X}$ in place of $\hat{X}_2$. This setting is shown in Figure 7. In [7], the following achievable rate was established:
$$R(D_1, D_2) \ge \max\{ I(U; Y|X),\ I(U; X|Y) \}, \qquad (3)$$
for some $p(u|x, y)$ satisfying $\mathbb{E}\, d_1(Y, \hat{Y}(U, X)) \le D_1$ and $\mathbb{E}\, d_2(X, \hat{X}(U, Y)) \le D_2$.

Fig. 7: Complementary Delivery setting.

Our achievability scheme in Theorem 3 generalizes this scheme when specialized to the complementary delivery setting, but we do not yet know if our achievable rate can be strictly smaller for the same distortions. However, by taking a modulo-sum interpretation of the terms in the $\max\{\cdot\}$ function in (3), as we have done for several examples in this paper, we are able to give simple proofs and explicit characterizations for two canonical cases: the Quadratic Gaussian and the doubly symmetric binary Hamming distortion complementary delivery problems. While characterizations for these two settings also appear independently in [12], our approach in characterizing these settings is different from that in [12], and we believe it will be of interest to readers. Furthermore, by taking the "modulo sum" interpretation, we establish the following, which may be a useful observation in practice: "For the Quadratic Gaussian complementary delivery problem, if one has a good code (in the sense of achieving the optimum rate distortion tradeoff) for the point to point Wyner-Ziv [1] Quadratic Gaussian setup, then a simple modification exists to turn the code into a good code for the Quadratic Gaussian complementary delivery problem." A similar observation holds for the doubly symmetric binary Hamming distortion case. We first consider the Quadratic Gaussian case.
Proposition 3: Quadratic Gaussian complementary delivery.
Let $Y = X + Z$, where $Z \sim \mathcal{N}(0, N)$ is independent of $X \sim \mathcal{N}(0, P)$, and let the distortion measures be mean square distortion. Let $P' = PN/(P+N)$. The rate distortion region for the non-trivial constraints $D_1 \le N$ and $D_2 \le P'$ is given by
$$R(D_1, D_2) = \max\left\{ \frac{1}{2}\log\!\left(\frac{N}{D_1}\right),\ \frac{1}{2}\log\!\left(\frac{P'}{D_2}\right) \right\}.$$

Proof:

Converse:
The converse follows from straightforward cutset bound arguments. The reader may notice that the expression given above is the maximum of the Quadratic Gaussian Wyner-Ziv [1] rate to decoder 1 and the Quadratic Gaussian Wyner-Ziv rate to decoder 2, or equivalently the maximum of the two cutset bounds. Clearly, this rate is the lowest possible for the given distortions.
Achievability
We now show that the rate is also achievable, using a modulo-sum interpretation of (3). Consider first encoding for decoder 1. From the Quadratic Gaussian Wyner-Ziv result, we know that side information at the encoder is redundant. Therefore, without loss of optimality, the encoder can code for decoder 1 using only $Y^n$, resulting in the codeword $U_Y^n$ and the corresponding index $I_Y$ after binning. Similarly, for decoder 2, the encoder can code using $X^n$ only, resulting in the codeword $U_X^n$ and index $I_X$ after binning. The encoder then sends out the index $I_X \oplus I_Y$. Since decoder 1 has the $X^n$ sequence as side information, it knows the index $I_X$ and can therefore recover $I_Y$ from $I_X \oplus I_Y$. The same decoding scheme works for decoder 2. Therefore, we have shown the achievability of the given rate expression. We note further that this scheme corresponds to setting $U = (U_X, U_Y)$ such that $U_X - X - Y - U_Y$ in the rate expression (3).

Remark 4.4: As shown in our proof of achievability, if we have a good practical code for the Wyner-Ziv Quadratic Gaussian problem, then we also have a good practical code for the complementary delivery setting. We first develop two point to point codes: one for the Wyner-Ziv Quadratic Gaussian case with $X$ as the source and $Y$ as the side information, and another for the case where $Y$ is the source and $X$ is the side information. A good code for the complementary delivery setting is then obtained by taking the modulo sum of the indices produced by these two point to point codes. A short numerical rendition of the resulting rate expression is given below.
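As a numerical companion (our illustrative sketch; the parameter values are arbitrary), the closed-form rate of Proposition 3 is immediate to evaluate:

```python
import numpy as np

def gaussian_cd_rate(P, N, D1, D2):
    """R(D1, D2) = max{0.5*log2(N/D1), 0.5*log2(P'/D2)} from Proposition 3,
    valid in the non-trivial regime D1 <= N, D2 <= P' = P*N/(P+N)."""
    Pp = P * N / (P + N)
    assert D1 <= N and D2 <= Pp, "outside the non-trivial distortion regime"
    return max(0.5 * np.log2(N / D1), 0.5 * np.log2(Pp / D2))

# Example: unit-power source, noise variance N = 0.5, so P' = 1/3.
print(gaussian_cd_rate(P=1.0, N=0.5, D1=0.1, D2=0.1))  # max of the two Wyner-Ziv rates
```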
We now turn to the doubly symmetric binary source with Hamming distortion. Here, the achievability scheme involves taking the modulo sum of the sources $X^n$ and $Y^n$.

Proposition 4: Doubly symmetric binary source with Hamming distortion. Let $X \sim \text{Bern}(1/2)$, $Y \sim \text{Bern}(1/2)$, $X \oplus Y \sim \text{Bern}(p)$, and let both distortion measures be Hamming distortion. Assume, without loss of generality, that $D_1, D_2 \le p$. Then,
$$R(D_1, D_2) = \max\{ H(p) - H(D_1),\ H(p) - H(D_2) \}.$$

Proof:
The converse again follows from straightforward cutset bounds, by considering decoders 1 and 2 individually. For the achievability scheme, let $Z = X \oplus Y$ and assume that $D_1 \le D_2$. Since $Z$ is i.i.d. $\text{Bern}(p)$, using a point to point code for $Z$ at distortion $D_1$, we obtain a rate of $H(p) - H(D_1)$. Denote the reconstruction of $Z$ at time $i$ by $\hat{Z}_i$. Decoder 1 reconstructs $Y_i$ by $\hat{Y}_i = X_i \oplus \hat{Z}_i$ for $i \in [1:n]$. Similarly, decoder 2 reconstructs $X_i$ by $\hat{X}_i = Y_i \oplus \hat{Z}_i$ for $i \in [1:n]$. To verify that the distortion constraint holds, note that $d(Y_i, X_i \oplus \hat{Z}_i) = Y_i \oplus X_i \oplus \hat{Z}_i = Z_i \oplus \hat{Z}_i$. Since $\hat{Z}^n$ is a code that achieves distortion $D_1$, $\hat{Y}^n$ satisfies the distortion constraint for decoder 1. The same analysis holds for decoder 2. A small numerical check of the key identity is given below.
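The key step above is that the per-symbol distortion of $\hat{Y} = X \oplus \hat{Z}$ equals that of $\hat{Z}$ itself. The following Python toy of ours verifies this identity empirically; the trivial all-zeros quantizer for $Z$ is an illustrative stand-in for a real rate-distortion code.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100_000, 0.2
x = rng.integers(0, 2, size=n)
z = (rng.random(n) < p).astype(int)   # Z = X xor Y ~ Bern(p)
y = x ^ z

# Stand-in "code" for Z: the all-zeros reconstruction, which attains distortion p.
# Any rate-distortion code for Z at distortion D would play the same role.
zhat = np.zeros(n, dtype=int)

yhat1 = x ^ zhat                      # decoder 1: reconstruct Y from side info X
xhat2 = y ^ zhat                      # decoder 2: reconstruct X from side info Y

dz = np.mean(z ^ zhat)                # distortion of the Z-code
dy = np.mean(y ^ yhat1)               # Hamming distortion at decoder 1
dx = np.mean(x ^ xhat2)               # Hamming distortion at decoder 2
print(dz, dy, dx)                     # all three coincide (approximately p here)
```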
Remark 4.5: In this case, we only need a good code for the standard point to point rate distortion problem for a binary source. A good rate distortion code for a binary source is also a good code for the doubly symmetric binary source, Hamming distortion, complementary delivery problem.
Remark 4.6:
In our scheme, the reconstruction symbols at time $i$ depend only on the received message and the side information at the decoder at time $i$. Therefore, for this case, the rate distortion region for causal reconstruction [8] is the same as the rate distortion region for noncausal reconstruction.

V. ACTIONS TAKEN AT THE ENCODER
We now turn to the case where the encoder takes actions (Figure 3) instead of the decoders. When the actions are taken at the encoder, the general rate-cost-distortion tradeoff region is open even for the case of a single decoder. Special cases which have been characterized include the lossless case [2]. In this section, we consider a special case of lossless source coding with $K$ decoders in which we can characterize the rate-cost tradeoff region.

Theorem 4: Special case of lossless source coding with actions taken at the encoder.
Let the action channel be given by the conditional distribution $P_{Y_1,Y_2,\ldots,Y_K|X,A}$. Assume further that $A = f_1(Y_1) = f_2(Y_2) = \cdots = f_K(Y_K)$. Then, the minimum rate required for lossless source coding with actions taken at the encoder and cost constraint $C$ is given by
$$R = \min\left[ \max_{j \in [1:K]} H(X|Y_j, A) - H(A|X) \right]^+,$$
where the minimization is over joint distributions $p(x)p(a|x)p(y_1, y_2, \ldots, y_K|x, a)$ such that $\mathbb{E}\,\Lambda(A) \le C$.

Proof:

Converse:
The proof of the converse is a straightforward extension of the single decoder case given in [2]. We give the proof here for completeness. Consider the rate required for decoder $j$:
$$\begin{aligned}
nR &\ge H(M) \ge H(M, X^n|Y_j^n) - H(X^n|M, Y_j^n) \\
&\ge H(M, X^n|Y_j^n) - n\epsilon_n \\
&\stackrel{(a)}{=} H(X^n|Y_j^n) - n\epsilon_n \\
&\stackrel{(b)}{=} H(X^n) + H(Y_j^n|X^n, A^n) - H(Y_j^n) - n\epsilon_n \\
&\ge \sum_{i=1}^n \left( H(X_i) + H(Y_{ji}|X_i, A_i) - H(Y_{ji}) \right) - n\epsilon_n,
\end{aligned}$$
where $(a)$ follows from the fact that $M$ is a function of $X^n$, and $(b)$ follows from $A^n$ being a function of $X^n$. The last step follows from $X^n$ being a discrete memoryless source, the action channel being memoryless, and the fact that conditioning reduces entropy. As before, we define $Q$ to be a uniform random variable over $[1:n]$, independent of all other random variables, to obtain
$$\begin{aligned}
R &\ge H(X) + H(Y_j|X, A) - H(Y_j) - \epsilon_n \\
&= H(X) + H(Y_j, X|A) - H(X|A) - H(Y_j) - \epsilon_n \\
&= H(X|A, Y_j) + I(X;A) - I(Y_j;A) - \epsilon_n \\
&= H(X|A, Y_j) - H(A|X) - \epsilon_n,
\end{aligned}$$
where the last step follows from the fact that $A = f_j(Y_j)$, so that $I(X;A) - I(Y_j;A) = H(A|Y_j) - H(A|X) = -H(A|X)$. Combining the lower bounds over the $K$ decoders then gives the rate stated in the theorem.

Achievability:
We give a sketch of achievability, since the techniques used are relatively straightforward. Assume first that $R > 0$. We first bin the set of $X^n$ sequences into $2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}$ bins, $B(M_X)$, $M_X \in [1 : 2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}]$. Given an $x^n$ sequence, we first find the bin index $m_x$ such that $x^n \in B(m_x)$. We then split $m_x$ into two sub-messages: $m_{xr} \in [1 : 2^{n(\max_{j\in[1:K]} H(X|Y_j,A) - H(A|X) + 2\epsilon)}]$ and $m_{xa} \in [1 : 2^{n(H(A|X) - \epsilon)}]$. $m_{xr}$ is transmitted over the noiseless link, giving us the rate stated in the theorem. As for $m_{xa}$, we send this message through the action channel by treating the action channel as a channel with i.i.d. state $X$ known noncausally at the transmitter ($A$). We can therefore use Gelfand-Pinsker coding [13] for this channel.

Each decoder first decodes $m_{xa}$ from its side information $Y_j$. From the condition that $A = f_j(Y_j)$ for all $j$, we have $H(A|X) - \epsilon = I(Y_j;A) - I(X;A) - \epsilon$. From the analysis of Gelfand-Pinsker coding, since $|\mathcal{M}_{xa}| = 2^{n(I(Y_j;A) - I(X;A) - \epsilon)}$, the probability of error in decoding $m_{xa}$ goes to zero as $n \to \infty$. The decoder then reconstructs $m_x$ from $m_{xr}$ and $m_{xa}$. It then finds the unique $\hat{x}^n \in B(m_x)$ that is jointly typical with $Y_j^n$ and $A^n$. Note that, due to the Gelfand-Pinsker coding, the true $x^n$ sequence is jointly typical with $Y_j^n$ and $A^n$ with high probability. Therefore, the probability of error in this decoding step goes to zero as $n \to \infty$, since we have $2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}$ bins.

For the case where $R = 0$, we send the entire message through the action channel.

Example 3:
Consider the case of $K = 2$ with switching dependent side information: $\mathcal{A} = \{1, 2\}$ and $(X, Y) \sim p(x, y)$ with $P_{Y_1,Y_2|X,A}$ specified by $Y_1 = Y, Y_2 = e$ when $A = 1$ and $Y_1 = e, Y_2 = Y$ when $A = 2$. Note that $A$ is a function of $Y_1$, and also of $Y_2$; it therefore satisfies the condition in Theorem 4. Let $P(A=1) = \alpha$, $\Lambda(A=1) = C_1$ and $\Lambda(A=2) = C_2$. The rate-cost tradeoff is characterized by
$$R = \max\{ \alpha H(X|Y, A=1) + (1-\alpha) H(X|A=2),\ (1-\alpha) H(X|Y, A=2) + \alpha H(X|A=1) \} + H(X) - H_b(\alpha) - \alpha H(X|A=1) - (1-\alpha) H(X|A=2)$$
for some $p(a|x)$ satisfying $\alpha C_1 + (1-\alpha) C_2 \le C$. A numerical sketch evaluating this tradeoff is given below.
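The following Python sketch of ours evaluates the rate-cost expression of Example 3 over a grid of conditional action distributions $p(a|x)$; the doubly symmetric binary source with crossover 0.1, the cost values $C_1 = 1, C_2 = 0$, and the budget $C = 0.5$ are illustrative assumptions.

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

pxy = np.array([[0.45, 0.05], [0.05, 0.45]])    # (X, Y): DSBS, crossover 0.1
px = pxy.sum(axis=1)

def rate_cost(q, C1=1.0, C2=0.0):
    """q[x] = P(A=1 | X=x); returns (rate, expected cost) for Example 3."""
    w = np.stack([q, 1.0 - q], axis=1)          # w[x, a] = P(A = a+1 | X = x)
    pxa = px[:, None] * w
    pa = pxa.sum(axis=0)
    alpha = pa[0]                               # P(A = 1)
    HX_A = [H(pxa[:, a] / pa[a]) for a in (0, 1)]
    HXY_A = []
    for a in (0, 1):                            # H(X | Y, A = a+1)
        s = pxy * w[:, a][:, None]              # unnormalized p(x, y, A=a+1)
        HXY_A.append((H(s) - H(s.sum(axis=0))) / pa[a])
    m = max(alpha * HXY_A[0] + (1 - alpha) * HX_A[1],
            (1 - alpha) * HXY_A[1] + alpha * HX_A[0])
    R = m + H(px) - H(pa) - alpha * HX_A[0] - (1 - alpha) * HX_A[1]
    return max(R, 0.0), alpha * C1 + (1 - alpha) * C2

grid = np.linspace(0.05, 0.95, 19)
pairs = [rate_cost(np.array([q0, q1])) for q0 in grid for q1 in grid]
print(min(r for r, c in pairs if c <= 0.5))     # best rate within cost budget C = 0.5
```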
VI. OTHER SETTINGS

In this section, we consider other settings involving multi-terminal source coding with action dependent side information. The first setting that we consider generalizes [2, Theorem 7] to the case where there is a rate-limited link from the source encoder to the action encoder. The second setting is a case of successive refinement with actions.
A. Single decoder with Markov form $X - A - Y$ and a rate limited link to the action encoder
In this subsection, we consider the setting illustrated in Figure 8. Here, we have a single decoder, with actions taken at an action encoder. The source encoder has access to the source $X^n$ and sends out two indices, $M \in [1 : 2^{nR}]$ and $M_A \in [1 : 2^{nR_A}]$. The action encoder is a function $f_A : M_A \to \mathcal{A}^n$. In addition, we have the Markov relation $X - A - Y$; that is, the side information $Y$ is dictated only by the action $A$ taken. The other definitions remain the same and we omit them here.

Proposition 5: $R(D, C)$ for the setting shown in Figure 8 is given by
$$R(D, C) = \min \max\{ I(X;\hat{X}) - R_A,\ I(X;\hat{X}) - I(A;Y) \},$$
where the minimization is over $p(x)p(a)p(y|a)p(\hat{x}|x)$ satisfying the cost and distortion constraints $\mathbb{E}\,d(X, \hat{X}) \le D$ and $\mathbb{E}\,\Lambda(A) \le C$.

Fig. 8: Lossy source coding with a rate limited link to the action encoder.
Remark 6.1:
If we set $R_A = \infty$ in Proposition 5, then we recover the result in [2, Theorem 7]. Essentially, the source encoder tries to send as much information as possible through the rate limited action link until the link saturates. An illustrative numerical evaluation is given below.
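The following Python sketch of ours evaluates Proposition 5 for a binary instance; the source, action cost $\Lambda(A) = A$, and noisy action channel $Y = A \oplus \text{Bern}(0.1)$ are illustrative assumptions. Note that, since both terms in the max increase in $I(X;\hat{X})$ and the second decreases in $I(A;Y)$, the minimization decouples: $I(X;\hat{X})$ is minimized subject to the distortion constraint and $I(A;Y)$ is maximized subject to the cost constraint.

```python
import numpy as np

def Hb(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

def prop5_rate(D, C, RA, flip=0.1):
    """R(D, C) for X ~ Bern(1/2) with Hamming distortion, A in {0, 1} with
    Lambda(A) = A, and Y = A xor Bern(flip). All modeling choices are
    illustrative assumptions. I(X;Xhat) is minimized by the binary
    rate-distortion choice 1 - Hb(D); I(A;Y) is maximized over p(a) with
    E[A] <= C."""
    IXXhat = max(1 - Hb(D), 0.0)
    best_IAY = 0.0
    for pa in np.linspace(0.0, min(C, 1.0), 101):
        py1 = pa * (1 - flip) + (1 - pa) * flip   # P(Y = 1)
        best_IAY = max(best_IAY, Hb(py1) - Hb(flip))
    return max(IXXhat - RA, IXXhat - best_IAY)

print(prop5_rate(D=0.05, C=0.5, RA=0.2))
```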
Proof:

Achievability: The achievability is straightforward. Using standard rate distortion coding, we cover $X^n$ with $2^{n(I(X;\hat{X})+\epsilon)}$ $\hat{X}^n$ codewords. Given a source sequence $x^n$, we find an $\hat{X}^n$ that is jointly typical with $x^n$. We then split the index $M_X$ corresponding to the chosen $\hat{X}^n$ codeword into two parts: $M_A \in [1 : 2^{n(\min\{R_A, I(A;Y)\} - \epsilon)}]$ and $M \in [1 : 2^{nR}]$. The action encoder takes the index $M_A$ and transmits it through the action channel. Since the rate of $M_A$ is less than $I(A;Y) - \epsilon$, the decoder can decode $M_A$ with high probability of success. It then combines $M_A$ with $M$ to obtain the index of the reconstruction codeword $\hat{X}^n$.

Converse:
Given a code that satisfies the distortion and cost constraints, we have
$$\begin{aligned}
nR &\ge H(M) = I(X^n; M) \\
&\ge I(X^n; M, Y^n) - I(X^n; Y^n) \\
&\ge I(X^n; \hat{X}^n) - I(X^n, M_A; Y^n) \\
&\stackrel{(a)}{\ge} \sum_{i=1}^n I(X_i; \hat{X}_i) - I(X^n, M_A, A^n; Y^n) \\
&\stackrel{(b)}{\ge} \sum_{i=1}^n I(X_i; \hat{X}_i) - I(M_A, A^n; Y^n),
\end{aligned}$$
where the first two inequalities use the facts that $M$ is a function of $X^n$, that $\hat{X}^n$ is a function of $(M, Y^n)$, and that $I(X^n; Y^n) \le I(X^n, M_A; Y^n)$; $(a)$ follows from the fact that $A^n$ is a function of $M_A$; $(b)$ follows from the Markov chain $X - A - Y$. Now, it is easy to see that $I(M_A, A^n; Y^n) \le \min\{nR_A, \sum_{i=1}^n I(A_i; Y_i)\}$. The bound on the rate is then single-letterized in the usual manner, giving us
$$R(D, C) \ge \max\{ I(X;\hat{X}) - R_A,\ I(X;\hat{X}) - I(A;Y) \}$$
for some $p(a, \hat{x}|x)$ satisfying the distortion and cost constraints. Finally, we note that $p(a, \hat{x}|x)$ can be restricted to the form $p(a)p(\hat{x}|x)$. To see this, note that none of the terms depends on the joint coupling between $A$ and $\hat{X}$ in $p(a, \hat{x}|x)$. Furthermore, due to the Markov condition $X - A - Y$, it suffices to consider $A$ independent of $X$, giving us the p.m.f. in the proposition.

B. Successive refinement with actions
The next setup that we consider is a case of successive refinement [14], [15] with actions taken at the "more capable" decoder. The setting is shown in Figure 9.
Fig. 9: Successive refinement with actions.
Proposition 6: Successive refinement with actions taken at the more capable decoder.
For the setting shown in Figure 9, the rate-distortion-cost tradeoff region is given by
$$R_1 \ge I(X; \hat{X}_1), \qquad R_1 + R_2 \ge I(X; \hat{X}_1, A) + I(X; U|\hat{X}_1, Y, A)$$
for some $p(\hat{x}_1, a, u|x)$ satisfying
$$\mathbb{E}\, d_1(X, \hat{X}_1) \le D_1, \quad \mathbb{E}\, d_2\big(X, \hat{X}_2(U, Y, A)\big) \le D_2, \quad \mathbb{E}\,\Lambda(A) \le C.$$
The cardinality of the auxiliary $U$ may be upper bounded by $|\mathcal{U}| \le |\mathcal{X}||\hat{\mathcal{X}}_1||\mathcal{A}| + 1$.

If we restrict $R_2 = 0$, then Proposition 6 gives the rate-distortion-cost tradeoff region for a special case of Proposition 1: the case when $Y_2 = \emptyset$ and actions are taken only at decoder 1 (in the labeling of Proposition 1). A simple numerical specialization is given below.
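As a simple numerical instance (our own sketch, which specializes Proposition 6 to a trivial action and no side information, i.e. plain successive refinement of a $\text{Bern}(1/2)$ source under Hamming distortion, which is known to be successively refinable), the two rate constraints reduce to point-to-point rate-distortion functions.

```python
import numpy as np

def Hb(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

def sr_rates(D1, D2):
    """Rate pair for Proposition 6 specialized to A constant and Y an erasure:
    R1 >= 1 - Hb(D1), R1 + R2 >= 1 - Hb(D2), for D2 <= D1 <= 1/2."""
    assert 0 < D2 <= D1 <= 0.5
    R1 = 1 - Hb(D1)
    return R1, (1 - Hb(D2)) - R1    # (R1, R2)

print(sr_rates(0.25, 0.05))         # coarse description, then refinement
```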
Proof:

Achievability: We give the case where $R_1 = I(X;\hat{X}_1) + \epsilon$ and $R_2 = I(X; A|\hat{X}_1) + I(X; U|\hat{X}_1, Y, A) + 3\epsilon$. The general region stated in the proposition can then be obtained by rate splitting of $R_2$.

Codebook generation:
• Generate $2^{nR_1}$ $\hat{X}_1^n(m_1)$ sequences according to $\prod_{i=1}^n p(\hat{x}_{1i})$, $m_1 \in [1:2^{nR_1}]$.
• For each $\hat{X}_1^n(m_1)$ sequence, generate $2^{n(I(X;A|\hat{X}_1)+\epsilon)}$ $A^n(m_1, m_2)$ sequences according to $\prod_{i=1}^n p(a_i|\hat{x}_{1i})$.
• For each $(\hat{X}_1^n(m_1), A^n(m_1, m_2))$ sequence pair, generate $2^{n(I(X;U|\hat{X}_1,A)+\epsilon)}$ $U^n(m_1, m_2, l)$ sequences according to $\prod_{i=1}^n p(u_i|\hat{x}_{1i}, a_i)$.
• Partition the set of $l$ indices into $2^{n(I(X;U|\hat{X}_1,Y,A)+2\epsilon)}$ bins, $B(m_1, m_2, m_3)$, $m_3 \in [1:2^{n(I(X;U|\hat{X}_1,Y,A)+2\epsilon)}]$.

Encoding:
• Given a sequence $x^n$, the encoder first looks for an $\hat{x}_1^n(m_1)$ sequence such that $(x^n, \hat{x}_1^n) \in \mathcal{T}_\epsilon^{(n)}$. This step succeeds with high probability since $R_1 = I(X;\hat{X}_1) + \epsilon$.
• Next, the encoder looks for an $A^n(m_1, m_2)$ sequence such that $(x^n, a^n, \hat{x}_1^n) \in \mathcal{T}_\epsilon^{(n)}$. This step succeeds with high probability since we have $2^{n(I(X;A|\hat{X}_1)+\epsilon)}$ $A^n$ sequences.
• The encoder then looks for a $U^n(m_1, m_2, l)$ sequence such that $(x^n, a^n, \hat{x}_1^n, u^n) \in \mathcal{T}_\epsilon^{(n)}$. This step succeeds with high probability since we have $2^{n(I(X;U|\hat{X}_1,A)+\epsilon)}$ $U^n$ sequences.
• It then finds the bin index $m_3$ such that $l \in B(m_1, m_2, m_3)$.
• The encoder sends the index $m_1$ over the link of rate $R_1$, and $m_2$ and $m_3$ over the link of rate $R_2$, giving us the stated rates.

Decoding and reconstruction:
• Since decoder 1 has index $m_1$, it reconstructs $x^n$ using $\hat{x}_1^n(m_1)$. Since $(x^n, \hat{x}_1^n)$ are jointly typical with high probability, the expected distortion satisfies the $D_1$ distortion constraint to within $\epsilon$.
• From $m_1$ and $m_2$, decoder 2 recovers the action sequence $a^n(m_1, m_2)$. It then takes the actions $a^n(m_1, m_2)$ to obtain its side information $Y^n$. With the side information, it recovers the $u^n$ sequence by looking for the unique $\hat{l} \in B(m_1, m_2, m_3)$ such that $(u^n(m_1, m_2, \hat{l}), \hat{x}_1^n, a^n, Y^n) \in \mathcal{T}_\epsilon^{(n)}$. Since there are only $2^{n(I(U;Y|\hat{X}_1,A)-\epsilon)}$ $U^n$ sequences in the bin, and $(u^n(m_1, m_2, l), \hat{x}_1^n, a^n, Y^n) \in \mathcal{T}_\epsilon^{(n)}$ with high probability from the fact that $Y_i$ is generated i.i.d. according to $p(y|a_i, x_i)$, the probability of error goes to zero as $n \to \infty$. Decoder 2 then reconstructs $x^n$ using $\hat{x}_2(a_i, u_i, y_i)$ for $i \in [1:n]$.

Converse:
Converse: We consider only the lower bound for $R_1 + R_2$; the lower bound for $R_1$ is straightforward. Given a code which satisfies the distortion and cost constraints, we have
\begin{align*}
n(R_1 + R_2) &\ge H(M_1, M_2)\\
&= H(M_1, M_2, A^n, \hat{X}_1^n)\\
&= H(A^n, \hat{X}_1^n) + H(M_1, M_2 \mid A^n, \hat{X}_1^n)\\
&\ge I(X^n; A^n, \hat{X}_1^n) + H(M_1, M_2 \mid A^n, \hat{X}_1^n, Y^n) - H(M_1, M_2 \mid Y^n, A^n, \hat{X}_1^n, X^n)\\
&= I(X^n; A^n, \hat{X}_1^n) + I(X^n; M_1, M_2 \mid A^n, \hat{X}_1^n, Y^n)\\
&= I(X^n; A^n, \hat{X}_1^n) + H(X^n \mid A^n, \hat{X}_1^n, Y^n) - H(X^n \mid A^n, \hat{X}_1^n, Y^n, M_1, M_2)\\
&= I(X^n; A^n, \hat{X}_1^n) + H(X^n, Y^n \mid A^n, \hat{X}_1^n) - H(Y^n \mid \hat{X}_1^n, A^n) - H(X^n \mid A^n, \hat{X}_1^n, Y^n, M_1, M_2)\\
&= H(X^n) - H(X^n \mid A^n, \hat{X}_1^n) + H(X^n, Y^n \mid A^n, \hat{X}_1^n) - H(Y^n \mid \hat{X}_1^n, A^n) - H(X^n \mid A^n, \hat{X}_1^n, Y^n, M_1, M_2)\\
&= H(X^n) + H(Y^n \mid X^n, A^n, \hat{X}_1^n) - H(Y^n \mid \hat{X}_1^n, A^n) - H(X^n \mid A^n, \hat{X}_1^n, Y^n, M_1, M_2)\\
&\ge \sum_{i=1}^{n} \big( H(X_i) + H(Y_i \mid X^n, A^n, \hat{X}_1^n, Y^{i-1}) - H(Y_i \mid \hat{X}_1^n, A^n, Y^{i-1}) - H(X_i \mid A^n, \hat{X}_1^n, Y^n, M_1, M_2) \big)\\
&\stackrel{(a)}{\ge} \sum_{i=1}^{n} \big( H(X_i) + H(Y_i \mid X_i, A_i, \hat{X}_{1,i}) - H(Y_i \mid \hat{X}_{1,i}, A_i) - H(X_i \mid A^n, \hat{X}_1^n, Y^n, M_1, M_2) \big)\\
&\ge \sum_{i=1}^{n} \big( H(X_i) + H(Y_i \mid X_i, A_i, \hat{X}_{1,i}) - H(Y_i \mid \hat{X}_{1,i}, A_i) - H(X_i \mid U_i, A_i, Y_i, \hat{X}_{1,i}) \big),
\end{align*}
where the second equality holds since $A^n$ and $\hat{X}_1^n$ are functions of $(M_1, M_2)$, $(a)$ follows from the Markov chain $(X^{n \setminus i}, A^{n \setminus i}, \hat{X}_1^{n \setminus i}, Y^{i-1}) - (\hat{X}_{1,i}, X_i, A_i) - Y_i$ together with the fact that conditioning reduces entropy, and the last step follows from defining $U_i = (M_1, M_2, Y^{n \setminus i}, A^{n \setminus i})$. The proof is then completed in the usual manner by defining the time-sharing uniform random variable $Q$ and $U = (U_Q, Q)$, giving us
\begin{align*}
R_1 + R_2 &\ge H(X) + H(Y \mid X, A, \hat{X}_1) - H(Y \mid \hat{X}_1, A) - H(X \mid U, A, Y, \hat{X}_1)\\
&= I(X; \hat{X}_1, A) + I(X; U \mid \hat{X}_1, Y, A),
\end{align*}
where the last equality uses $H(Y \mid X, A, \hat{X}_1) - H(Y \mid \hat{X}_1, A) = -I(X; Y \mid \hat{X}_1, A) = H(X \mid \hat{X}_1, A, Y) - H(X \mid \hat{X}_1, A)$. That $\hat{X}_2$ can be taken to be a function of $U$, $Y$ and $A$ is straightforward. Finally, the cardinality bound on $U$ may be obtained from standard techniques: we need $|\hat{\mathcal{X}}_1||\mathcal{X}||\mathcal{A}| - 1$ letters to preserve $p(\hat{x}_1, a, x)$ and two more to preserve the rate and distortion constraints.

Remark 6.2: An interesting question to explore is the characterization of the more general case where degraded side information is also available at decoder 1. That is, the side informations $Y_1$ at decoder 1 and $Y_2$ at decoder 2 are generated by a discrete memoryless channel $P_{Y_1, Y_2 \mid X, A}$ such that $(X, A) - (Y_2, A) - (Y_1, A)$ forms a Markov chain. This generalized setup would allow us to generalize Proposition 1 entirely, and it also leads to a generalization of successive refinement for the Wyner-Ziv problem in [16] to the action setting.
VII. CONCLUSION

In this paper, we considered an important class of multi-terminal source coding problems, in which the encoder sends a description of the source to the decoders, which then take cost-constrained actions that affect the quality or availability of side information. We computed the optimum rate region for lossless compression, while for the lossy case we provided a general achievability scheme that is shown to be optimal for a number of special cases, one of them being the generalization of the
Heegard-Berger-Kaspi setting (cf. [10], [11]). In all these cases, in addition to a standard achievability argument, we also provided a simple scheme which has a modulo-sum interpretation. The problem where the encoder takes actions rather than the decoders was also considered. Finally, we extended the scope to additional multi-terminal source coding problems, such as successive refinement with actions.
ACKNOWLEDGMENT
We thank Professor Haim Permuter and Professor Yossef Steinberg for helpful discussions. The authors' research was partially supported by NSF grant CCF-0729195 and the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370; the Scott A. and Geraldine D. Macomber Stanford Graduate Fellowship; and the Office of Technology Licensing Stanford Graduate Fellowship.
REFERENCES
[1] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1-10, 1976.
[2] H. Permuter and T. Weissman, "Source coding with a side information 'vending machine'," IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4530-4544, July 2011.
[3] H. Asnani, H. H. Permuter, and T. Weissman, "Probing capacity," CoRR, vol. abs/1010.1309, 2010.
[4] K. Kittichokechai, T. J. Oechtering, M. Skoglund, and R. Thobaben, "Source and channel coding with action-dependent partially known two-sided state information," in Proc. IEEE International Symposium on Information Theory, 2010, pp. 629-633.
[5] A. El Gamal and Y. H. Kim, "Lectures on network information theory," 2010, available online at ArXiv.
[6] T. M. Cover, "A proof of the data compression theorem of Slepian and Wolf for ergodic sources," IEEE Trans. Inf. Theory, vol. 21, no. 2, pp. 226-228, 1975.
[7] A. Kimura and T. Uyematsu, "Multiterminal source coding with complementary delivery," in Proc. International Symposium on Information Theory and its Applications (ISITA), 2006, pp. 189-194.
[8] T. Weissman and A. El Gamal, "Source coding with limited-look-ahead side information at the decoder," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5218-5239, 2006.
[9] H. G. Eggleston, Convexity. Cambridge: Cambridge University Press, 1958.
[10] C. Heegard and T. Berger, "Rate distortion when side information may be absent," IEEE Trans. Inf. Theory, vol. 31, no. 6, pp. 727-734, 1985.
[11] A. H. Kaspi, "Rate-distortion function when side-information may be present at the decoder," IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 2031-2034, 1994.
[12] R. Timo, A. Grant, and G. Kramer, "Rate-distortion functions for source coding with complementary side information," in Proc. IEEE International Symposium on Information Theory, St. Petersburg, Russia, 2011, pp. 2934-2938.
[13] S. I. Gel'fand and M. S. Pinsker, "Coding for channels with random parameters," Probl. Control and Information Theory, vol. 9, pp. 19-31, 1980.
[14] W. H. R. Equitz and T. M. Cover, "Successive refinement of information," IEEE Trans. Inf. Theory, vol. 37, no. 2, pp. 269-275, 1991.
[15] B. Rimoldi, "Successive refinement of information: Characterization of the achievable rates," IEEE Trans. Inf. Theory, vol. 40, no. 1, pp. 253-259, 1994.
[16] Y. Steinberg and N. Merhav, "On successive refinement for the Wyner-Ziv problem," IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1636-1654, 2004.

APPENDIX A
ACHIEVABILITY SKETCH FOR THEOREM

Codebook generation
• Generate $2^{n(I(X;A)+\epsilon)}$ $A^n(l_a)$ sequences, $l_a \in [1 : 2^{n(I(X;A)+\epsilon)}]$, according to $\prod_{i=1}^{n} p(a_i)$.
• For each $A^n$ sequence, generate $2^{n(I(U;X|A)+\epsilon)}$ $U^n(l_a, l_0)$ sequences, $l_0 \in [1 : 2^{n(I(U;X|A)+\epsilon)}]$, according to $\prod_{i=1}^{n} p_{U|A}(u_i \mid a_i)$.
• Partition the set of indices corresponding to the $U^n$ codewords uniformly into $2^{n(\max\{I(X;U|A,Y_1), I(X;U|A,Y_2)\}+2\epsilon)}$ bins $B_U(l_a, m_0)$, $m_0 \in [1 : 2^{n(\max\{I(X;U|A,Y_1), I(X;U|A,Y_2)\}+2\epsilon)}]$.
• For each pair of $A^n$ and $U^n$ sequences, generate $2^{n(I(V_1;X|A,U)+\epsilon)}$ $V_1^n(l_a, l_0, l_1)$ sequences, $l_1 \in [1 : 2^{n(I(V_1;X|A,U)+\epsilon)}]$, according to $\prod_{i=1}^{n} p_{V_1|A,U}(v_{1,i} \mid a_i, u_i)$.
• Partition the set of indices corresponding to the $V_1^n$ codewords uniformly into $2^{n(I(X;V_1|U,A,Y_1)+2\epsilon)}$ bins $B_{V_1}(l_a, l_0, m_1)$, $m_1 \in [1 : 2^{n(I(X;V_1|U,A,Y_1)+2\epsilon)}]$.
• For each pair of $A^n$ and $U^n$ sequences, generate $2^{n(I(V_2;X|A,U)+\epsilon)}$ $V_2^n(l_a, l_0, l_2)$ sequences, $l_2 \in [1 : 2^{n(I(V_2;X|A,U)+\epsilon)}]$, according to $\prod_{i=1}^{n} p_{V_2|A,U}(v_{2,i} \mid a_i, u_i)$.
• Partition the set of indices corresponding to the $V_2^n$ codewords uniformly into $2^{n(I(X;V_2|U,A,Y_2)+2\epsilon)}$ bins $B_{V_2}(l_a, l_0, m_2)$, $m_2 \in [1 : 2^{n(I(X;V_2|U,A,Y_2)+2\epsilon)}]$.

Encoding
• Given an $x^n$ sequence, the encoder first looks for an $a^n(l_a)$ sequence such that $(x^n, a^n) \in T_\epsilon^{(n)}$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_a$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(X;A)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.
• The encoder then looks for a $u^n(l_a, l_0)$ sequence that is jointly typical with $(a^n(l_a), x^n)$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_0$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(U;X|A)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.
• Next, the encoder looks for a $v_1^n(l_a, l_0, l_1)$ sequence that is jointly typical with $(a^n(l_a), u^n(l_a, l_0), x^n)$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_1$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(V_1;X|A,U)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.
• Similarly, the encoder looks for a $v_2^n(l_a, l_0, l_2)$ sequence that is jointly typical with $(a^n(l_a), u^n(l_a, l_0), x^n)$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_2$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(V_2;X|A,U)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.
• The encoder then sends out the indices $l_a$, $m_0$, $m_1$ and $m_2$ such that $l_0 \in B_U(l_a, m_0)$, $l_1 \in B_{V_1}(l_a, l_0, m_1)$ and $l_2 \in B_{V_2}(l_a, l_0, m_2)$.

Decoding and reconstruction
Decoder 1:
• Decoder 1 first takes the action sequence $a^n(l_a)$ to obtain the side information $Y_1^n$. We note that if $(a^n(l_a), x^n, u^n(l_a, l_0), v_1^n(l_a, l_0, l_1)) \in T_\epsilon^{(n)}$, then $P\{(a^n(l_a), x^n, u^n(l_a, l_0), v_1^n(l_a, l_0, l_1), Y_1^n) \in T_\epsilon^{(n)}\} \to 1$ as $n \to \infty$ by the conditional typicality lemma [5, Chapter 2] and the fact that $Y_1^n \sim \prod_{i=1}^{n} p(y_{1,i} \mid x_i, a_i)$.
• Decoder 1 then decodes $U^n$. It does this by finding the unique $\hat{l}_0$ such that $u^n(l_a, \hat{l}_0) \in B_U(l_a, m_0)$ and $(u^n(l_a, \hat{l}_0), a^n, Y_1^n) \in T_\epsilon^{(n)}$. If there is none or more than one such $\hat{l}_0$, an error is declared. Following standard analysis for the Wyner-Ziv setup (see, e.g., [5, Chapter 12]), the probability of error goes to zero as $n \to \infty$ since there are at most $2^{n(I(U;Y_1|A)-\epsilon)}$ $U^n$ sequences within each bin.
• Similarly, decoder 1 decodes $V_1^n$. It does this by finding the unique $\hat{l}_1$ such that $v_1^n(l_a, \hat{l}_0, \hat{l}_1) \in B_{V_1}(l_a, \hat{l}_0, m_1)$ and $(v_1^n(l_a, \hat{l}_0, \hat{l}_1), u^n, a^n, Y_1^n) \in T_\epsilon^{(n)}$. If there is none or more than one such $\hat{l}_1$, an error is declared. As with the previous step, the probability of error goes to zero as $n \to \infty$ since there are only $2^{n(I(V_1;Y_1|A,U)-\epsilon)}$ $V_1^n$ sequences within each bin.
• Decoder 1 then reconstructs $x^n$ as $\hat{x}_{1,i}(a_i(l_a), u_i(l_a, \hat{l}_0), v_{1,i}(l_a, \hat{l}_0, \hat{l}_1), y_{1,i})$ for $i \in [1 : n]$.

Decoder 2: As the decoding steps for decoder 2 are similar to those for decoder 1, we mention only the differences here: decoder 2 uses the side information $Y_2^n$ instead of $Y_1^n$ to perform the decoding operations, and instead of decoding $V_1^n$, decoder 2 decodes $V_2^n$.
• Decoder 2 decodes $V_2^n$ by finding the unique $\hat{l}_2$ such that $v_2^n(l_a, \hat{l}_0, \hat{l}_2) \in B_{V_2}(l_a, \hat{l}_0, m_2)$ and $(v_2^n(l_a, \hat{l}_0, \hat{l}_2), u^n, a^n, Y_2^n) \in T_\epsilon^{(n)}$. If there is none or more than one such $\hat{l}_2$, an error is declared. As before, the probability of error goes to zero as $n \to \infty$ since there are only $2^{n(I(V_2;Y_2|A,U)-\epsilon)}$ $V_2^n$ sequences within each bin.
• Decoder 2 then reconstructs $x^n$ as $\hat{x}_{2,i}(a_i(l_a), u_i(l_a, \hat{l}_0), v_{2,i}(l_a, \hat{l}_0, \hat{l}_2), y_{2,i})$ for $i \in [1 : n]$.

Distortion and cost constraints
• For the cost constraint, since the chosen $A^n$ sequence is typical with high probability, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{E}\,\Lambda(A_i) \le C + \epsilon$ by the typical average lemma [5, Chapter 2].
• For the distortion constraints, since the probability of "error" goes to zero as $n \to \infty$ and we are dealing only with finite-cardinality random variables, following the analysis in [5, Chapter 3] we have $\frac{1}{n} \mathrm{E}\,d_1(X^n, \hat{X}_1^n) \le D_1 + \epsilon$ and $\frac{1}{n} \mathrm{E}\,d_2(X^n, \hat{X}_2^n) \le D_2 + \epsilon$.
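Every encoding and decoding step in the sketch above reduces to membership tests in $T_\epsilon^{(n)}$. For concreteness, here is a direct (unoptimized) implementation of the robust typicality test of [5, Chapter 2] for a pair of sequences; the distribution and the driver at the bottom are our own toy example, not from the paper.

```python
import numpy as np
from itertools import product

def jointly_typical(xs, ys, pxy, eps):
    """Robust joint typicality: the empirical frequency of every symbol pair
    (x, y) must satisfy |pi(x,y) - p(x,y)| <= eps * p(x,y)."""
    xs, ys = np.asarray(xs), np.asarray(ys)
    n = len(xs)
    for x, y in product(range(pxy.shape[0]), range(pxy.shape[1])):
        freq = np.count_nonzero((xs == x) & (ys == y)) / n
        if abs(freq - pxy[x, y]) > eps * pxy[x, y]:
            return False
    return True

# Toy driver: draw (X^n, Y^n) i.i.d. from pxy and test typicality.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
rng = np.random.default_rng(0)
idx = rng.choice(4, size=5000, p=pxy.ravel())   # flat index k = 2x + y
xs, ys = idx // 2, idx % 2
print(jointly_typical(xs, ys, pxy, eps=0.1))    # True with high probability
```

The decoders in the appendix run exactly this kind of test against each candidate codeword in a bin, declaring an error unless exactly one candidate passes.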