Universal Quantization for Separate Encodings and Joint Decoding of Correlated Sources
Avraham Reani and Neri Merhav
Department of Electrical Engineering
Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
Emails: [avire@tx, merhav@ee].technion.ac.il

February 24, 2018
Abstract
We consider the multi-user lossy source-coding problem for continuous alphabet sources. In a previous work, Ziv proposed a single-user universal coding scheme which uses uniform quantization with dither, followed by a lossless source encoder (entropy coder). In this paper, we generalize Ziv's scheme to the multi-user setting. For this generalized universal scheme, upper bounds are derived on the redundancies, defined as the differences between the actual rates and the closest corresponding rates on the boundary of the rate region. It is shown that this scheme can achieve redundancies of no more than 0.754 bits per sample for each user. These bounds are obtained without knowledge of the multi-user rate region, which is an open problem in general. As a direct consequence of these results, inner and outer bounds on the rate-distortion achievable region are obtained.
Index Terms: Multi-terminal source coding, dithered quantization, universal source coding, scalar quantization, Slepian-Wolf coding.

∗ This paper was presented in part at the 2014 IEEE International Symposium on Information Theory (ISIT). This research was supported by the Israeli Science Foundation (ISF), grant no. 208/08.

1 Introduction
Consider the case where two correlated sources are observed separately by two non-cooperating encoders which communicate with one decoder. The decoder needs to reconstruct both sources, and the distortions between the reconstructions and the corresponding sources should not exceed some given values. The general version of this problem has remained open for several decades, even under the assumption of memoryless sources. However, many special cases have been solved. When no distortion is allowed, this is the problem considered by Slepian and Wolf [1]. Their well-known result states that two discrete sources $X_1$ and $X_2$ can be losslessly reproduced if and only if
$$R_1 \ge H(X_1 \mid X_2), \tag{1a}$$
$$R_2 \ge H(X_2 \mid X_1), \tag{1b}$$
$$R_1 + R_2 \ge H(X_1, X_2), \tag{1c}$$
where $R_1$ is the rate of the encoder observing $X_1$ and $R_2$ is the rate of the encoder observing $X_2$.

Returning to the lossy case, the setting in which one of the variables is known to the decoder is the original Wyner-Ziv problem [2]. This setting was generalized to continuous alphabet sources by Wyner [3]. Other examples include the source coding problem with side information of Ahlswede-Körner [4], where an arbitrary distortion is allowed for one of the sources and the other source should be reconstructed losslessly. Berger and Yeung [5] considered a setting where one of the sources is to be perfectly reconstructed and the other source should be reconstructed with a distortion constraint (their setting subsumes all previous examples). Zamir and Berger [6] characterized the rate-distortion region in the high-SNR limit. Wagner and Anantharam [7] presented an outer bound which improves on the previous outer bounds in the literature. Recent results for specific sources and distortion measures include the work of Wagner, Tavildar, and Viswanath [8], who determined the rate region for the quadratic Gaussian multiterminal source-coding problem by showing that the Berger-Tung [9] inner bound is tight. In addition, a characterization of the rate region under logarithmic loss was given by Courtade and Weissman [10]. Finally, a version of this problem, where both users and the decoder must operate with zero delay, was considered by Kaspi and Merhav [11], who characterized the rate region in this case.

In [12], Ziv presented a universal coding scheme for the single-user case. This scheme is composed of a uniform, one-dimensional quantizer with dither, followed by a noiseless variable-rate encoder (entropy encoder). He showed that this scheme yields a rate that is, for every positive integer $n$, no more than
0.754 bits per sample higher than the best possible rate associated with the optimal $n$-dimensional quantizer. This result was later revisited and further developed by Zamir and Feder [13], [14], who also gave a redundancy upper bound which depends on the source distribution. However, their derivation of the global upper bound relies on the known formula of the single-user rate-distortion function. In addition, a dithered scheme for the multi-user setting, similar to the scheme in this paper, was given in [6]. Redundancy upper bounds can be derived by bounding the difference between the dithered scheme's rate region and the outer bound on the multi-user rate region given in [6]. These bounds depend on the divergence between the source distribution and a Gaussian distribution. As a result, they are not uniformly bounded (over all source distributions), in contrast to the bound of Ziv and the bounds presented in this paper. In addition, only the redundancy of the sum of the rates can be upper bounded using the methods of [6].

In this paper, we investigate a generalized scheme for the multi-user setting. In this scheme, each user applies a dithered quantizer followed by a universal Slepian-Wolf encoder. We show that the rates achieved by this scheme are no more than 0.754 bits per sample away from the boundary of the achievable rate region, for each user. This is done regardless of the characterization of the achievable region, which is, as mentioned before, unknown in general. As a direct consequence of these results, inner and outer bounds on the achievable region are obtained. Finally, similarly to the results of [12], it is straightforward to show that using multi-dimensional lattice quantizers instead of scalar ones would decrease the redundancy to about 0.5 bits per sample for high lattice dimension.

The remainder of this paper is organized as follows. In Section 2, we present the problem formulation and give basic results regarding the performance of the dithered scheme. In Section 3, we revisit the redundancy upper bound of [12]. In Section 4, we enhance the results of Section 2 by adding an estimation stage to the dithered scheme. We conclude this work in Section 5.

2 Problem Formulation and Basic Results
Throughout the paper, random variables will be denoted by capital letters and their alphabets will be denoted by calligraphic letters. Random vectors (all of length $n$) will be denoted by capital letters in the bold face font.

In this section, we present the multi-user setting we deal with and describe the dithered coding scheme we use. Then, we give upper bounds on the performance of this scheme, compared to the boundary of the optimal rate region.

We begin with defining the multi-user rate region. Let $(X_1, X_2)$ be a continuous alphabet memoryless source, characterized by the joint probability density $P_{X_1 X_2}$. We assume that $P_{X_1 X_2}$ has bounded support, i.e., there exists $A \in \mathbb{R}^+$ such that $P_{X_1 X_2}(x_1, x_2) = 0$ if $(x_1, x_2) \notin [-A, A] \times [-A, A]$. The reason for this assumption will be explained later. A rate pair $(R_1^*, R_2^*)$ is said to be $(D_1, D_2)$-achievable under the mean-square error distortion measure with respect to $(X_1, X_2)$, if for every $\delta > 0$ and all sufficiently large $n$, there exists a code of block length $n$ consisting of two encoders $f_1, f_2$,
$$f_1: [-A, A]^n \to \mathcal{I}_{M_1}, \quad f_2: [-A, A]^n \to \mathcal{I}_{M_2} \tag{2}$$
and a decoder $g$,
$$g: \mathcal{I}_{M_1} \times \mathcal{I}_{M_2} \to [-A, A]^n \times [-A, A]^n \tag{3}$$
such that
$$\frac{1}{n} E\|\mathbf{X}_1 - \hat{\mathbf{X}}_1\|^2 \le D_1 + \delta, \quad \frac{1}{n} E\|\mathbf{X}_2 - \hat{\mathbf{X}}_2\|^2 \le D_2 + \delta \tag{4}$$
and
$$\frac{1}{n} \log M_1 \le R_1^* + \delta, \quad \frac{1}{n} \log M_2 \le R_2^* + \delta, \tag{5}$$
where $\mathcal{I}_{M_i} \triangleq \{1, 2, \ldots, M_i\}$, $i \in \{1, 2\}$. The set of $(D_1, D_2)$-achievable rate pairs is denoted by $\mathcal{R}^*(D_1, D_2)$.

Our scheme works as follows. We have two encoders $\tilde{f}_1, \tilde{f}_2$,
$$\tilde{f}_1: [-A, A]^n \times [-\sqrt{3D_1}, \sqrt{3D_1}] \to \mathcal{I}_{\tilde{M}_1}, \quad \tilde{f}_2: [-A, A]^n \times [-\sqrt{3D_2}, \sqrt{3D_2}] \to \mathcal{I}_{\tilde{M}_2} \tag{6}$$
and a decoder $\tilde{g}$,
$$\tilde{g}: \mathcal{I}_{\tilde{M}_1} \times \mathcal{I}_{\tilde{M}_2} \times [-\sqrt{3D_1}, \sqrt{3D_1}] \times [-\sqrt{3D_2}, \sqrt{3D_2}] \to [-A, A]^n \times [-A, A]^n. \tag{7}$$

Figure 1: The dithered coding scheme. User $i$ passes $\mathbf{X}_i$ through a uniform scalar quantizer with dither $Z_i$, followed by a Slepian-Wolf encoder of rate $R_i = \frac{1}{n}\log \tilde{M}_i$; the joint Slepian-Wolf decoder recovers $\mathbf{Y}_1, \mathbf{Y}_2$ and outputs $\hat{\mathbf{X}}_i = \mathbf{Y}_i - \mathbf{Z}_i$.
Each encoder $\tilde{f}_i$, $i \in \{1, 2\}$, uses a one-dimensional uniform quantizer $Q_i$, $Q_i: \mathbb{R} \to \{0, \pm\sqrt{12 D_i}, \pm 2\sqrt{12 D_i}, \ldots\}$, and a dither random variable (RV) $Z_i$, uniformly distributed over $[-\sqrt{3D_i}, \sqrt{3D_i}]$, to produce $Q_i(\mathbf{X}_i + \mathbf{Z}_i) \triangleq [Q_i(X_{i,1} + Z_i), Q_i(X_{i,2} + Z_i), \ldots, Q_i(X_{i,n} + Z_i)]$, where $\mathbf{Z}_i$ denotes a vector of dimension $n$ composed of $n$ repetitions of the same realization of $Z_i$. For convenience, the random variable $Q_i(X_i + Z_i)$ and the random vector $Q_i(\mathbf{X}_i + \mathbf{Z}_i)$ will be denoted by $Y_i$ and $\mathbf{Y}_i$, respectively. The dither RV's, $Z_1$ and $Z_2$, are available to the respective encoders and to the decoder, and are independent. As is shown in [12, Lemma 1],
$$\frac{1}{n} E\left[\|\mathbf{Y}_i - \mathbf{Z}_i - \mathbf{X}_i\|^2 \,\middle|\, \mathbf{X}_i\right] = D_i, \quad i \in \{1, 2\}, \tag{8}$$
where the expectation is taken over $Z_i$. Notice that the distortion is $D_i$ independently of $\mathbf{X}_i$, and therefore the total distortion is also $D_i$. After the quantization stage, the two encoders perform Slepian-Wolf encoding with a rate pair $(R_1, R_2)$, for lossless compression of $\mathbf{Y}_1$ and $\mathbf{Y}_2$. Complying with Eq. (1), the rate pair $(R_1 = \frac{1}{n}\log \tilde{M}_1, R_2 = \frac{1}{n}\log \tilde{M}_2)$ satisfies
$$R_1 \ge H(Y_1 \mid Y_2, Z_1, Z_2), \tag{9a}$$
$$R_2 \ge H(Y_2 \mid Y_1, Z_1, Z_2), \tag{9b}$$
$$R_1 + R_2 \ge H(Y_1, Y_2 \mid Z_1, Z_2), \tag{9c}$$
where we used the following, for every value of $n$:
$$\frac{1}{n} H(\mathbf{Y}_1 \mid \mathbf{Y}_2, Z_1, Z_2) = H(Y_1 \mid Y_2, Z_1, Z_2), \tag{10a}$$
$$\frac{1}{n} H(\mathbf{Y}_2 \mid \mathbf{Y}_1, Z_1, Z_2) = H(Y_2 \mid Y_1, Z_1, Z_2), \tag{10b}$$
$$\frac{1}{n} H(\mathbf{Y}_1, \mathbf{Y}_2 \mid Z_1, Z_2) = H(Y_1, Y_2 \mid Z_1, Z_2). \tag{10c}$$
To see why (10a) is true, consider the following chain:
$$\frac{1}{n} H(\mathbf{Y}_1 \mid \mathbf{Y}_2, Z_1, Z_2) = \frac{1}{n} \sum_{i=1}^{n} H(Y_{1,i} \mid Y_{1,1}, Y_{1,2}, \ldots, Y_{1,i-1}, \mathbf{Y}_2, Z_1, Z_2) = \frac{1}{n} \sum_{i=1}^{n} H(Y_{1,i} \mid Y_{2,i}, Z_1, Z_2) = H(Y_1 \mid Y_2, Z_1, Z_2), \tag{11}$$
where the second equality stems from the fact that $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are memoryless given $Z_1$ and $Z_2$, and the third equality stems from the stationarity of the source. The same can be done for $H(\mathbf{Y}_2 \mid \mathbf{Y}_1, Z_1, Z_2)$ and $H(\mathbf{Y}_1, \mathbf{Y}_2 \mid Z_1, Z_2)$.

The rate region of Eq. (9) is achievable for $n$ sufficiently large, and it is denoted by $\mathcal{R}(D_1, D_2)$. The interesting range of $R_1$ is $\mathcal{R}_1(D_1, D_2) \triangleq [H(Y_1 \mid Y_2, Z_1, Z_2), H(Y_1 \mid Z_1)]$, since a higher rate can always be reduced to this range. The same is true for $R_2$. The universal decoder first decodes $\mathbf{Y}_1$ and $\mathbf{Y}_2$ (correctly with high probability), and then subtracts the corresponding dithers to obtain the reconstruction vectors $\hat{\mathbf{X}}_1, \hat{\mathbf{X}}_2$:
$$\hat{\mathbf{X}}_i = \mathbf{Y}_i - \mathbf{Z}_i. \tag{12}$$
The universal Slepian-Wolf decoder is described in Appendix A. The dithered coding scheme is presented in Fig. 1.

Remark 1. The Slepian-Wolf mechanism can be applied, in general, to sources with countably-infinite alphabets. However, a universal Slepian-Wolf scheme for such sources is not known. Trying to preserve universality in the case of infinite alphabets would require the assignment of an infinite number of sequences into bins. Thus, even the codebook generation does not seem to be feasible in this case. This is not surprising, considering the fact that even in the single-user case, diminishing redundancy cannot be achieved for universal lossless coding of sources with infinite alphabets (see, e.g., [15]). Therefore, for the sake of universality, we assumed that the source alphabets have bounded supports, so the outputs of the quantizers have finite alphabets. From the above, this assumption is also needed for the original single-user scheme of Ziv [12].
The inner and outer bounds on the achievable rate-distortion region, which are obtained as a direct consequence of Theorems 1-4 below, are also valid, of course, for sources with unbounded support, since their derivation does not rely on universality.

We begin with a simple result.
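Before turning to the theorems, the following minimal Python sketch (our own illustration, not part of the original scheme's specification) demonstrates the property of Eq. (8): with quantization step $\Delta = \sqrt{12D}$ and subtractive dither $Z \sim U[-\Delta/2, \Delta/2]$, the conditional mean-squared error equals $D$ for every fixed input value.

```python
import numpy as np

rng = np.random.default_rng(0)

def dithered_quantize(x, delta, z):
    # Uniform scalar quantizer with step `delta`; the dither `z` is added
    # before quantization and subtracted at the decoder, as in Eq. (12).
    y = delta * np.round((x + z) / delta)  # Q(x + z)
    return y - z                           # reconstruction Y - Z

D = 0.04                    # target per-sample distortion
delta = np.sqrt(12 * D)     # quantizer step, so that delta**2 / 12 = D
for x in (-0.7, 0.0, 1.3):  # arbitrary fixed source values
    z = rng.uniform(-delta / 2, delta / 2, size=1_000_000)
    err = dithered_quantize(x, delta, z) - x
    # E[(Q(x+Z) - Z - x)^2 | x] should be close to D, independently of x.
    print(f"x = {x:+.1f}: empirical MSE = {np.mean(err ** 2):.5f} (D = {D})")
```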
Theorem 1.
For any rate pair $(R_1^*, R_2^*)$ on the boundary of $\mathcal{R}^*(D_1, D_2)$ and any rate pair $(R_1, R_2)$ on the boundary of $\mathcal{R}(D_1, D_2)$, with $R_1 \in \mathcal{R}_1(D_1, D_2)$, we have
$$R_1 + R_2 \le R_1^* + R_2^* + 2c, \tag{13}$$
where $c = 0.754$ bits/sample. Moreover, for any $R_1^* \in \mathcal{R}_1(D_1, D_2)$, there exists a rate pair $(R_1, R_2) \in \mathcal{R}(D_1, D_2)$ such that
$$R_1 = R_1^*, \quad R_2 \le R_2^* + 2c. \tag{14}$$
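The constant $c$ is derived in Section 3 as $c = \frac{1}{2}\log_2(\pi e/3)$; a quick numerical check (ours):

```python
import math

# c = (1/2) * log2(pi * e / 3) ~ 0.7546, quoted as 0.754 bits/sample.
c = 0.5 * math.log2(math.pi * math.e / 3)
print(f"c = {c:.4f} bits/sample")
```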
Proof of Theorem 1. We have
$$\begin{aligned}
\frac{1}{n} H(\mathbf{Y}_1, \mathbf{Y}_2 \mid Z_1, Z_2) &\le \frac{1}{n} H(\mathbf{Y}_1, \mathbf{Y}_2, T_1, T_2 \mid Z_1, Z_2) \\
&\le \frac{1}{n} H(T_1, T_2) + \frac{1}{n} H(\mathbf{Y}_1, \mathbf{Y}_2 \mid T_1, T_2, Z_1, Z_2) \\
&\le R_1^* + R_2^* + \frac{1}{n} H(\mathbf{Y}_1, \mathbf{Y}_2 \mid T_1, T_2, Z_1, Z_2) \\
&\le R_1^* + R_2^* + \frac{1}{n} H(\mathbf{Y}_1, \mathbf{Y}_2 \mid g(T_1, T_2), Z_1, Z_2) \\
&= R_1^* + R_2^* + \frac{1}{n} H\big(\mathbf{Y}_1, \mathbf{Y}_2 \,\big|\, \hat{\mathbf{X}}_1^{\mathrm{opt}}, \hat{\mathbf{X}}_2^{\mathrm{opt}}, Z_1, Z_2\big) \\
&\le R_1^* + R_2^* + \frac{1}{n} H\big(\mathbf{Y}_1 \,\big|\, \hat{\mathbf{X}}_1^{\mathrm{opt}}, Z_1\big) + \frac{1}{n} H\big(\mathbf{Y}_2 \,\big|\, \hat{\mathbf{X}}_2^{\mathrm{opt}}, Z_2\big) \\
&\le R_1^* + R_2^* + 2c,
\end{aligned} \tag{15}$$
where $T_1 \in \mathcal{I}_{M_1}$, $T_2 \in \mathcal{I}_{M_2}$ are the outputs of the optimal encoders $f_1, f_2$, respectively, $\big(\hat{\mathbf{X}}_1^{\mathrm{opt}}, \hat{\mathbf{X}}_2^{\mathrm{opt}}\big) \triangleq g(T_1, T_2)$ are the outputs of the optimal decoder $g$, and $(R_1^*, R_2^*) \in \mathcal{R}^*(D_1, D_2)$. The last inequality can be obtained in the same way as in [12]. The left-hand side is achievable for sufficiently large $n$. Therefore, for any rate pair $(R_1, R_2) \in \mathcal{R}(D_1, D_2)$ which lies on the straight line $R_1 + R_2 = H(Y_1, Y_2 \mid Z_1, Z_2)$, we have
$$R_1 + R_2 \le R_1^* + R_2^* + 2c. \tag{16}$$
Moreover, if $R_1^* \in \mathcal{R}_1(D_1, D_2)$, we can always take $R_1 = R_1^*$ and obtain
$$R_2 \le R_2^* + 2c. \tag{17}$$
The same can be done, of course, when the roles of the two users are interchanged. This completes the proof.

The following theorem suggests another result regarding the relation between the boundary of $\mathcal{R}(D_1, D_2)$ and that of $\mathcal{R}^*(D_1, D_2)$.

Theorem 2.
For any rate pair $(R_1, R_2)$ on the boundary of $\mathcal{R}(D_1, D_2)$, with $R_1 \in \mathcal{R}_1(D_1, D_2)$, there exists a rate pair $(R_1^*, R_2^*) \in \mathcal{R}^*(D_1, D_2)$ such that
$$R_1 \le R_1^* + c, \qquad R_2 \le R_2^* + c. \tag{18}$$

Notice that Theorems 1 and 2 also provide outer bounds on $\mathcal{R}^*(D_1, D_2)$. Theorem 1 asserts that the straight line $R_1 + R_2 = H(Y_1, Y_2 \mid Z_1, Z_2) - 2c$ defines an outer bound for $\mathcal{R}^*(D_1, D_2)$. In addition, Theorem 2 bounds the distance between the boundary of $\mathcal{R}(D_1, D_2)$ and that of $\mathcal{R}^*(D_1, D_2)$ in each coordinate. The boundary of $\mathcal{R}(D_1, D_2)$ is, of course, an inner bound on $\mathcal{R}^*(D_1, D_2)$.

Before proving Theorem 2, we first prove a simple auxiliary result regarding the source-coding problem where side information is available only to the encoders but not to the decoder. The setting is as follows. A rate pair $(R_1, R_2)$ is achievable for a memoryless source $(Y_1, Y_2) \sim P_{Y_1, Y_2}$ and some side information $\mathbf{S} \in \mathcal{S}^n$, which depends statistically on $(\mathbf{Y}_1, \mathbf{Y}_2)$ through the joint probability distribution $P_{Y_1, Y_2, S}$, if for every $\delta > 0$ and all sufficiently large $n$, there exists a block code of length $n$ consisting of two encoders $f_1, f_2$,
$$f_1: \mathcal{Y}_1^n \times \mathcal{S}^n \to \mathcal{I}_{M_1}, \quad f_2: \mathcal{Y}_2^n \times \mathcal{S}^n \to \mathcal{I}_{M_2} \tag{19}$$
and a decoder $g$,
$$g: \mathcal{I}_{M_1} \times \mathcal{I}_{M_2} \to \mathcal{Y}_1^n \times \mathcal{Y}_2^n \tag{20}$$
such that
$$\Pr\{g(f_1(\mathbf{Y}_1, \mathbf{S}), f_2(\mathbf{Y}_2, \mathbf{S})) \ne (\mathbf{Y}_1, \mathbf{Y}_2)\} \le \delta \tag{21}$$
and
$$\frac{1}{n} \log M_1 \le R_1 + \delta, \quad \frac{1}{n} \log M_2 \le R_2 + \delta. \tag{22}$$
The set of achievable rate pairs is denoted by $\tilde{\mathcal{R}}$. The regular Slepian-Wolf region (without side information) is denoted by $\mathcal{R}_{\mathrm{SW}}$. Obviously, $\mathcal{R}_{\mathrm{SW}} \subseteq \tilde{\mathcal{R}}$. We have the following lemma.

Lemma 1.
Any rate pair $(\tilde{R}_1, \tilde{R}_2) \in \tilde{\mathcal{R}}$ must satisfy the following constraint:
$$\tilde{R}_1 + \tilde{R}_2 \ge H(Y_1, Y_2). \tag{23}$$
Therefore, side information available only to the encoders cannot improve the performance if $\tilde{R}_1 \in [H(Y_1 \mid Y_2), H(Y_1)]$ or $\tilde{R}_2 \in [H(Y_2 \mid Y_1), H(Y_2)]$.

Proof of Lemma 1. The proof follows directly from the fact that even one encoder which has access to $(\mathbf{Y}_1, \mathbf{Y}_2, \mathbf{S})$ cannot do better than $H(Y_1, Y_2)$, when the side information $\mathbf{S}$ is not available to the decoder.

The generalization of Lemma 1 to our case, where, in addition, a dither is available to the encoders and decoder, is straightforward. We can now prove Theorem 2.

Proof of Theorem 2.
Assume that the optimal code $(f_1, f_2, g)$, which achieves the rate pair $(R_1^*, R_2^*)$, is known, and that the encoders of the dithered scheme, which transmit $\mathbf{Y}_1, \mathbf{Y}_2$ at rates $(R_1, R_2)$ to the decoder, have access to $f_1(\mathbf{X}_1), f_2(\mathbf{X}_2)$ as side information. According to Lemma 1, this side information does not change the fact that any rate pair $(R_1, R_2) \in \mathcal{R}(D_1, D_2)$ must satisfy $R_1 + R_2 \ge H(Y_1, Y_2 \mid Z_1, Z_2)$. Consider the following auxiliary coding scheme: User $i$ compresses $T_i = f_i(\mathbf{X}_i)$ using $nR_i^*$ bits, $i \in \{1, 2\}$. Then, the first user uses Slepian-Wolf coding to compress $\mathbf{Y}_1$ given $\{T_1, T_2, Z_1\}$ into $H(\mathbf{Y}_1 \mid T_1, T_2, Z_1)$ bits. The second user uses Slepian-Wolf coding to compress $\mathbf{Y}_2$ given $\{\mathbf{Y}_1, T_1, T_2, Z_1, Z_2\}$ into $H(\mathbf{Y}_2 \mid \mathbf{Y}_1, T_1, T_2, Z_1, Z_2)$ bits. The decoder, which has access to $\{T_1, T_2, Z_1, Z_2\}$, first decodes $\mathbf{Y}_1$, using $\{T_1, T_2, Z_1\}$. Then, it decodes $\mathbf{Y}_2$, using $\{\mathbf{Y}_1, T_1, T_2, Z_1, Z_2\}$. The rate pair of this scheme, $(R_1, R_2)$, satisfies
$$\begin{aligned}
R_1 &= R_1^* + \frac{1}{n} H(\mathbf{Y}_1 \mid T_1, T_2, Z_1) \\
&\le R_1^* + \frac{1}{n} H(\mathbf{Y}_1 \mid g(T_1, T_2), Z_1) \\
&= R_1^* + \frac{1}{n} H\big(\mathbf{Y}_1 \,\big|\, \hat{\mathbf{X}}_1^{\mathrm{opt}}, \hat{\mathbf{X}}_2^{\mathrm{opt}}, Z_1\big) \\
&\le R_1^* + \frac{1}{n} H\big(\mathbf{Y}_1 \,\big|\, \hat{\mathbf{X}}_1^{\mathrm{opt}}, Z_1\big) \tag{24}
\end{aligned}$$
and
$$\begin{aligned}
R_2 &= R_2^* + \frac{1}{n} H(\mathbf{Y}_2 \mid \mathbf{Y}_1, T_1, T_2, Z_1, Z_2) \\
&\le R_2^* + \frac{1}{n} H(\mathbf{Y}_2 \mid \mathbf{Y}_1, g(T_1, T_2), Z_1, Z_2) \\
&= R_2^* + \frac{1}{n} H\big(\mathbf{Y}_2 \,\big|\, \mathbf{Y}_1, \hat{\mathbf{X}}_1^{\mathrm{opt}}, \hat{\mathbf{X}}_2^{\mathrm{opt}}, Z_1, Z_2\big) \\
&\le R_2^* + \frac{1}{n} H\big(\mathbf{Y}_2 \,\big|\, \hat{\mathbf{X}}_2^{\mathrm{opt}}, Z_2\big). \tag{25}
\end{aligned}$$
The upper bounds on $H\big(\mathbf{Y}_i \,\big|\, \hat{\mathbf{X}}_i^{\mathrm{opt}}, Z_i\big)$ can be obtained in the same way as in [12]. Notice that the Slepian-Wolf coding part in the proof requires long blocks of $(T_1, T_2, \mathbf{Y}_1, \mathbf{Y}_2)$. Now, since $\mathcal{R}(D_1, D_2) \subseteq \mathcal{R}^*(D_1, D_2)$, we can always find $(R_1^*, R_2^*) \in \mathcal{R}^*(D_1, D_2)$ such that $R_1^* + c \in \mathcal{R}_1(D_1, D_2)$ (or higher, and thus can be reduced to this range). Using the auxiliary scheme above, the rate pair $(R_1, R_2) = (R_1^* + c, R_2^* + c)$ can be achieved. Therefore, it can also be achieved by the dithered scheme, since $R_1 \in \mathcal{R}_1(D_1, D_2)$ (or higher), and in this range the regions of the auxiliary scheme and the dithered scheme coincide. Notice that any rate pair in $\mathcal{R}(D_1, D_2)$ can be achieved in practice by time-sharing the two edge points of $\mathcal{R}(D_1, D_2)$.

3 The Upper Bound on $H\big(\mathbf{Y} \,\big|\, \hat{\mathbf{X}}^{\mathrm{opt}}, Z\big)$

In this section, we revisit the proof of [12] for the upper bound on $H\big(\mathbf{Y} \,\big|\, \hat{\mathbf{X}}^{\mathrm{opt}}, Z\big)$. This is done for completeness, and since we point at and modify some of the steps in the next section. The result of this section involves only one source, $\mathbf{X}$. The width of the quantization cell is denoted by $\Delta \triangleq \sqrt{12D}$, so that $D = \Delta^2/12$. First, notice that for every $X_k$, $k \in \{1, \ldots, n\}$,
$$E\big[X_k - \hat{X}_k^{\mathrm{opt}} + Z\big] = 0. \tag{26}$$
This follows from the following consideration:
$$E\big[X_k - \hat{X}_k^{\mathrm{opt}} + Z\big] = E\big[X_k - \hat{X}_k^{\mathrm{opt}}\big] + E[Z] = E\big[X_k - \hat{X}_k^{\mathrm{opt}}\big]. \tag{27}$$
The distortion associated with $X_k$ is given by
$$E\Big[\big(X_k - \hat{X}_k^{\mathrm{opt}}\big)^2\Big] = \mathrm{Var}\big\{X_k - \hat{X}_k^{\mathrm{opt}}\big\} + \Big(E\big[X_k - \hat{X}_k^{\mathrm{opt}}\big]\Big)^2 \ge \mathrm{Var}\big\{X_k - \hat{X}_k^{\mathrm{opt}}\big\}, \tag{28}$$
where the inequality must hold with equality for the optimal quantizer. Otherwise, we could add a constant to $\hat{X}_k^{\mathrm{opt}}$ to obtain $E\big[X_k - \hat{X}_k^{\mathrm{opt}}\big] = 0$ and thus a smaller total distortion, in contradiction to the optimality of the quantizer.

We now rederive the upper bound on $H\big(\mathbf{Y} \,\big|\, \hat{\mathbf{X}}^{\mathrm{opt}}, Z\big)$. Using a method similar to [13], we show the following for the conditional entropy of each coordinate:
$$H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}}, Z\big) = I\big(X_k; X_k + Z \,\big|\, \hat{X}_k^{\mathrm{opt}}\big) = h\big(X_k + Z \,\big|\, \hat{X}_k^{\mathrm{opt}}\big) - h(Z), \tag{29}$$
where the second equality follows since $\hat{X}_k^{\mathrm{opt}}$ and $Z$ are independent.
By definition,
$$H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q, Z\big) = \int_{-\Delta/2}^{\Delta/2} dz\, f_Z(z)\, H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q, Z = z\big) = \frac{1}{\Delta} \int_{-\Delta/2}^{\Delta/2} dz\, H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q, Z = z\big). \tag{30}$$
Given $\big(\hat{X}_k^{\mathrm{opt}} = q, Z = z\big)$, $Y_k$ is a discrete random variable taking values in $\{j\Delta\}_{j \in \mathbb{Z}}$. Thus,
$$H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q, Z = z\big) = -\sum_{j \in \mathbb{Z}} P_{Y_k \mid \hat{X}_k^{\mathrm{opt}}, Z}(j\Delta \mid q, z) \cdot \log P_{Y_k \mid \hat{X}_k^{\mathrm{opt}}, Z}(j\Delta \mid q, z), \tag{31}$$
where $P_{Y_k \mid \hat{X}_k^{\mathrm{opt}}, Z}(\cdot \mid q, z)$ is the probability mass function of $Y_k$ given $\hat{X}_k^{\mathrm{opt}}$ and $Z$. Calculating:
$$P_{Y_k \mid \hat{X}_k^{\mathrm{opt}}, Z}(j\Delta \mid q, z) = \Pr\{Y_k = j\Delta \mid Z = z, \hat{X}_k^{\mathrm{opt}} = q\} = \int_{(j - \frac{1}{2})\Delta - z}^{(j + \frac{1}{2})\Delta - z} dx\, f_{X \mid \hat{X}_k^{\mathrm{opt}}}(x \mid q) = \Delta \cdot f_{U_k \mid \hat{X}_k^{\mathrm{opt}}}(j\Delta - z \mid q), \tag{32}$$
where $f_{X \mid \hat{X}_k^{\mathrm{opt}}}(\cdot \mid q)$ is the probability density function of $X$ given $\hat{X}_k^{\mathrm{opt}}$, $f_{U_k \mid \hat{X}_k^{\mathrm{opt}}}(\cdot \mid q) = f_{X \mid \hat{X}_k^{\mathrm{opt}}}(\cdot \mid q) * f_Z(\cdot)$ is the probability density function of the continuous random variable $U_k \triangleq X_k + Z$ given $\hat{X}_k^{\mathrm{opt}}$, and '$*$' denotes the convolution operation. Substituting in Eq. (30), we have
$$\begin{aligned}
H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q, Z\big) &= -\frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} dz \sum_{j \in \mathbb{Z}} \Delta \cdot f_{U_k \mid \hat{X}_k^{\mathrm{opt}}}(j\Delta - z \mid q) \cdot \log\big(\Delta \cdot f_{U_k \mid \hat{X}_k^{\mathrm{opt}}}(j\Delta - z \mid q)\big) \\
&= -\int_{\mathbb{R}} du \cdot f_{U_k \mid \hat{X}_k^{\mathrm{opt}}}(u \mid q) \cdot \log\big(\Delta \cdot f_{U_k \mid \hat{X}_k^{\mathrm{opt}}}(u \mid q)\big) \\
&= h\big(U_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q\big) - \log \Delta \\
&= h\big(U_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q\big) - h(Z) \\
&= h\big(U_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q\big) - h\big(Z \,\big|\, X_k, \hat{X}_k^{\mathrm{opt}} = q\big) \\
&= h\big(U_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q\big) - h\big(U_k \,\big|\, X_k, \hat{X}_k^{\mathrm{opt}} = q\big) \\
&= I\big(X_k; X_k + Z \,\big|\, \hat{X}_k^{\mathrm{opt}} = q\big), \tag{33}
\end{aligned}$$
where in the fifth equality we used the independence of $X_k$ and $Z$, and in the sixth equality we used the fact that $U_k = X_k + Z$. We have
$$\begin{aligned}
H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}}, Z\big) &= \sum_{q \in \mathcal{Q}^{\mathrm{opt}}} P_{\hat{X}_k^{\mathrm{opt}}}(q)\, H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}} = q, Z\big) \\
&= \sum_{q \in \mathcal{Q}^{\mathrm{opt}}} P_{\hat{X}_k^{\mathrm{opt}}}(q)\, I\big(X_k; X_k + Z \,\big|\, \hat{X}_k^{\mathrm{opt}} = q\big) \\
&= I\big(X_k; X_k + Z \,\big|\, \hat{X}_k^{\mathrm{opt}}\big) \\
&= h\big(X_k + Z \,\big|\, \hat{X}_k^{\mathrm{opt}}\big) - h(Z) \\
&= h\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z \,\big|\, \hat{X}_k^{\mathrm{opt}}\big) - h(Z). \tag{34}
\end{aligned}$$
This completes the derivation of Eq. (29). Now, we can upper bound $h\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z \,\big|\, \hat{X}_k^{\mathrm{opt}}\big)$ in the following way:
$$\begin{aligned}
h\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z \,\big|\, \hat{X}_k^{\mathrm{opt}}\big) &= \sum_{q \in \mathcal{Q}^{\mathrm{opt}}} P_{\hat{X}_k^{\mathrm{opt}}}(q)\, h\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z \,\big|\, \hat{X}_k^{\mathrm{opt}} = q\big) \\
&\le \sum_{q \in \mathcal{Q}^{\mathrm{opt}}} P_{\hat{X}_k^{\mathrm{opt}}}(q) \cdot \frac{1}{2} \log\Big(2\pi e\, E\Big[\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z\big)^2 \,\Big|\, \hat{X}_k^{\mathrm{opt}} = q\Big]\Big) \\
&\le \frac{1}{2} \log\Big(2\pi e\, E\Big[\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z\big)^2\Big]\Big), \tag{35}
\end{aligned}$$
where in the first inequality we upper bounded the differential entropy by using the maximum-entropy property of the Gaussian random variable, and the second inequality is due to Jensen. Using these results, we can upper bound $H\big(\mathbf{Y} \,\big|\, \hat{\mathbf{X}}^{\mathrm{opt}}, Z\big)$:
$$\begin{aligned}
H\big(\mathbf{Y} \,\big|\, \hat{\mathbf{X}}^{\mathrm{opt}}, Z\big) &\le \sum_{k=1}^{n} H\big(Y_k \,\big|\, \hat{X}_k^{\mathrm{opt}}, Z\big) \\
&\le \sum_{k=1}^{n} \frac{1}{2} \log\Big(2\pi e\, E\Big[\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z\big)^2\Big]\Big) - n h(Z) \\
&\le \frac{n}{2} \log\Bigg(2\pi e\, \frac{1}{n} \sum_{k=1}^{n} E\Big[\big(X_k - \hat{X}_k^{\mathrm{opt}} + Z\big)^2\Big]\Bigg) - n \log \Delta \\
&= \frac{n}{2} \log\Big(2\pi e\, \frac{1}{n} E\big\|\mathbf{X} - \hat{\mathbf{X}}^{\mathrm{opt}} + \mathbf{Z}\big\|^2\Big) - n \log \Delta \\
&\le \frac{n}{2} \log(2\pi e \cdot 2D) - n \log \Delta \\
&= \frac{n}{2} \log(4\pi e D) - \frac{n}{2} \log \Delta^2 \\
&= \frac{n}{2} \log(4\pi e D) - \frac{n}{2} \log(12D) \\
&= \frac{n}{2} \log\Big(\frac{\pi e}{3}\Big), \tag{36}
\end{aligned}$$
where the third inequality is due to Jensen, and in the fourth we used the following:
$$\frac{1}{n} E\big\|\mathbf{X} - \hat{\mathbf{X}}^{\mathrm{opt}} + \mathbf{Z}\big\|^2 = \frac{1}{n} E\big\|\mathbf{X} - \hat{\mathbf{X}}^{\mathrm{opt}}\big\|^2 + \frac{1}{n} E\|\mathbf{Z}\|^2 \le 2D, \tag{37}$$
which stems from the independence of $\mathbf{X}$ and $\mathbf{Z}$. Note that $\frac{1}{2}\log_2(\pi e/3) \approx 0.754$ bits, which is the constant $c$ of Theorems 1 and 2. This completes the proof of the upper bound on $H\big(\mathbf{Y} \,\big|\, \hat{\mathbf{X}}^{\mathrm{opt}}, Z\big)$.

4 The Dithered Scheme with an Estimation Stage

The goal of this section is to enhance the results of Section 2 by improving the coding scheme described there. The idea is to decrease the distortion by adding an estimation stage at the decoder side. The new scheme works as follows. After producing $\mathbf{Y}_1, \mathbf{Y}_2$, and instead of just using them as outputs, the decoder uses them to estimate each one of the source vectors $(\mathbf{X}_1, \mathbf{X}_2)$. Since the sources and the quantization process (given $Z_1, Z_2$) are memoryless, the estimation can be done on a symbol-by-symbol basis.

We begin with the following lemma.

Lemma 2. For the multi-terminal setting described in Section 2, we have ($i \in \{1, 2\}$):
$$E[Y_i - Z_i] = E[X_i], \tag{38}$$
$$E[(Y_i - Z_i)^2] = E[X_i^2] + D_i, \tag{39}$$
$$E[X_i(Y_i - Z_i)] = E[X_i^2], \tag{40}$$
$$E[(Y_1 - Z_1)(Y_2 - Z_2)] = E[X_1 X_2], \tag{41}$$
$$E[X_1(Y_2 - Z_2)] = E[X_1 X_2], \tag{42}$$
$$E[X_2(Y_1 - Z_1)] = E[X_1 X_2]. \tag{43}$$
Notice that the results above are true for each coordinate $k \in \{1, \ldots, n\}$. The proof of Lemma 2 is given in Appendix B.

The improved decoder described below requires knowledge of the second-order statistics of the source. However, as Lemma 2 shows, these statistics can be estimated from $\{\mathbf{Y}_i\}_{i=1}^2$, so universality can still be maintained.

The decoder of the multi-terminal setting uses the optimal linear estimator, under the MMSE criterion, of $\{X_i\}_{i=1}^2$ given $\{Q_i(X_i + Z_i) - Z_i\}_{i=1}^2$. The estimation error is calculated by using the results of Lemma 2. From now on, without loss of generality, we assume that $E[X_1] = E[X_2] = 0$. The covariance matrix of $Y \triangleq [Q_1(X_1 + Z_1) - Z_1, Q_2(X_2 + Z_2) - Z_2]$ is
$$\Lambda = \begin{pmatrix} E[X_1^2] + D_1 & E[X_1 X_2] \\ E[X_1 X_2] & E[X_2^2] + D_2 \end{pmatrix} \tag{44}$$
and the inverse matrix is
$$\Lambda^{-1} = \frac{1}{|\Lambda|} \begin{pmatrix} E[X_2^2] + D_2 & -E[X_1 X_2] \\ -E[X_1 X_2] & E[X_1^2] + D_1 \end{pmatrix}. \tag{45}$$
The vector $E\big[X_1 \cdot Y^\dagger\big]$ is given by
$$E\big[X_1 \cdot Y^\dagger\big] = \begin{pmatrix} E[X_1^2] \\ E[X_1 X_2] \end{pmatrix}. \tag{46}$$
It can be shown by direct calculation that
$$\Lambda^{-1} E\big[X_1 \cdot Y^\dagger\big] = \frac{1}{|\Lambda|} \begin{pmatrix} |\Lambda| - D_1 (E[X_2^2] + D_2) \\ E[X_1 X_2] D_1 \end{pmatrix}.$$
Therefore, the optimal linear estimator of $X_1$ given the vector $Y$ is
$$\hat{X}_1 = Y \cdot \frac{1}{|\Lambda|} \begin{pmatrix} |\Lambda| - D_1 (E[X_2^2] + D_2) \\ E[X_1 X_2] D_1 \end{pmatrix}. \tag{47}$$
The error of the optimal linear estimator is given by
$$D_1^* = E[X_1^2] - E[\hat{X}_1^2]. \tag{48}$$
It is shown in Appendix C that the estimation error takes the following form:
$$D_1^* = D_1 \cdot \frac{E[X_1^2](E[X_2^2] + D_2) - E^2[X_1 X_2]}{(E[X_1^2] + D_1)(E[X_2^2] + D_2) - E^2[X_1 X_2]}.$$
Remember that $D_1^*$ is the distortion of $X_1$ in the multi-terminal setting, where we add the above estimation stage after decoding $(\mathbf{Y}_1, \mathbf{Y}_2)$. It can easily be seen that the fraction above is less than 1, and thus $D_1^* \le D_1$, as desired. The same can be done, of course, for $X_2$. Since the distortion of $X_i$ in the improved scheme is $D_i^*$, we should compare the rate pair $(R_1, R_2)$ of this scheme to the optimal rate pair $(R_1^*, R_2^*)$ which achieves $(D_1^*, D_2^*)$.
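The following sketch (ours, with arbitrary assumed second moments) evaluates the linear MMSE estimator of Eq. (47) and the resulting distortion $D_1^*$ of Eq. (48), confirming numerically that $D_1^* \le D_1$:

```python
import numpy as np

def improved_distortion(EX1sq, EX2sq, EX1X2, D1, D2):
    """Distortion D1* of the linear MMSE estimate of X1 from
    (Y1 - Z1, Y2 - Z2), following Eqs. (44)-(48)."""
    Lam = np.array([[EX1sq + D1, EX1X2],
                    [EX1X2, EX2sq + D2]])  # covariance of Y, Eq. (44)
    b = np.array([EX1sq, EX1X2])           # E[X1 * Y^dagger], Eq. (46)
    a = np.linalg.solve(Lam, b)            # optimal linear coefficients
    return EX1sq - b @ a                   # D1* = E[X1^2] - E[X1_hat^2]

# Assumed (hypothetical) second moments of a zero-mean source pair:
EX1sq, EX2sq, EX1X2 = 1.0, 1.0, 0.8
D1 = D2 = 0.1
D1_star = improved_distortion(EX1sq, EX2sq, EX1X2, D1, D2)
print(f"D1 = {D1}, D1* = {D1_star:.4f}")   # D1* < D1, as claimed
```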
This fact immediately improves on the results of Theorems 1 and 2. Revisiting the derivation of the upper bound for $H\big(\mathbf{Y} \,\big|\, \hat{\mathbf{X}}^{\mathrm{opt}}, Z\big)$ in Eq. (36), it can be shown that ($i \in \{1, 2\}$)
$$H\big(\mathbf{Y}_i \,\big|\, \hat{\mathbf{X}}_i^{\mathrm{opt}}, Z_i\big) \le \frac{n}{2} \log\bigg[\frac{\pi e}{6} \bigg(\frac{D_i^*}{D_i} + 1\bigg)\bigg], \tag{49}$$
by using the following:
$$\frac{1}{n} E\big\|\mathbf{X}_i - \hat{\mathbf{X}}_i^{\mathrm{opt}} + \mathbf{Z}_i\big\|^2 = \frac{1}{n} E\big\|\mathbf{X}_i - \hat{\mathbf{X}}_i^{\mathrm{opt}}\big\|^2 + \frac{1}{n} E\|\mathbf{Z}_i\|^2 \le D_i^* + D_i. \tag{50}$$
Notice that when $X_1$ and $X_2$ are independent, $E[X_1 X_2] = 0$ and we have
$$H\big(\mathbf{Y}_i \,\big|\, \hat{\mathbf{X}}_i^{\mathrm{opt}}, Z_i\big) \le \frac{n}{2} \log\bigg[\frac{\pi e}{6} \bigg(2 - \frac{D_i}{E[X_i^2] + D_i}\bigg)\bigg]. \tag{51}$$
The maximum interesting value of $D_i^*$ is, of course, $E[X_i^2]$. This value is obtained for $D_i \to \infty$. It is not hard to see that the per-sample range of the upper bound in (51) is $[0.255, 0.754]$ bits, depending on $D_i$. For the high-SNR limit, i.e., $D_i \to 0$, it is well known that the redundancy is $0.255$ bits/sample (cf. [16]). We define ($i \in \{1, 2\}$)
$$c_i(D_1, D_2) = \frac{1}{2} \log\bigg[\frac{\pi e}{6} \bigg(\frac{D_i^*}{D_i} + 1\bigg)\bigg]. \tag{52}$$
We can now state Theorems 3 and 4. These theorems are obtained by applying the generalized upper bound of Eq. (49), instead of Ziv's upper bound on $H\big(\mathbf{Y}_i \,\big|\, \hat{\mathbf{X}}_i^{\mathrm{opt}}, Z_i\big)$, in the proofs of Theorems 1 and 2.

Theorem 3. For any rate pair $(R_1^*, R_2^*)$ on the boundary of $\mathcal{R}^*(D_1^*, D_2^*)$ and any rate pair $(R_1, R_2)$ on the boundary of $\mathcal{R}(D_1, D_2)$, with $R_1 \in \mathcal{R}_1(D_1, D_2)$, we have
$$R_1 + R_2 \le R_1^* + R_2^* + c_1(D_1, D_2) + c_2(D_1, D_2). \tag{53}$$
Moreover, for any $R_1^* \in \mathcal{R}_1(D_1, D_2)$, there exists a rate pair $(R_1, R_2) \in \mathcal{R}(D_1, D_2)$ such that
$$R_1 = R_1^*, \quad R_2 \le R_2^* + c_1(D_1, D_2) + c_2(D_1, D_2). \tag{54}$$

Theorem 4.
For any rate pair $(R_1, R_2)$ on the boundary of $\mathcal{R}(D_1, D_2)$, with $R_1 \in \mathcal{R}_1(D_1, D_2)$, there exists a rate pair $(R_1^*, R_2^*) \in \mathcal{R}^*(D_1^*, D_2^*)$ such that
$$R_1 \le R_1^* + c_1(D_1, D_2), \qquad R_2 \le R_2^* + c_2(D_1, D_2). \tag{55}$$
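As a numerical illustration (ours) of Theorems 3 and 4, the redundancy terms $c_i$ of Eq. (52) as reconstructed above can be evaluated directly; $D_i^* = D_i$ recovers Ziv's constant $0.754$, while a smaller $D_i^*$ tightens the bound toward $0.255$:

```python
import math

def c_bound(D_star, D):
    # Per-sample redundancy bound of Eq. (52), in bits.
    return 0.5 * math.log2((math.pi * math.e / 6) * (D_star / D + 1))

D = 0.1
# D* = D, a correlated-source example (from the earlier sketch), high SNR:
for D_star in (D, 0.0807, 0.001):
    print(f"D* = {D_star}: c = {c_bound(D_star, D):.3f} bits/sample")
```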
Acknowledgment

The authors are grateful to Prof. Rami Zamir for useful discussions.

Appendix A - Universal Slepian-Wolf Coding
In this appendix, we describe the universal Slepian-Wolf decoder used in our coding scheme. The following results are similar to those of [17]. For convenience, we omit the notation of the conditioning on the dither variables $Z_1$ and $Z_2$. The results below can be applied for any realization of these continuous variables. Remember that our coding scheme, unlike the scheme presented in [6], requires only one realization of $Z_1$ and $Z_2$ in each round. We consider the Slepian-Wolf setting for two correlated memoryless sources $(Y_1, Y_2) \sim P_{Y_1, Y_2}$. We assume that $Y_1 \in \mathcal{Y}_1$ and $Y_2 \in \mathcal{Y}_2$, where $\mathcal{Y}_1$ and $\mathcal{Y}_2$ are finite alphabets. A $(2^{nR_1}, 2^{nR_2}, n)$ source code is a block code of length $n$ consisting of two encoders $f_1, f_2$,
$$f_1: \mathcal{Y}_1^n \to \mathcal{I}_{M_1}, \quad f_2: \mathcal{Y}_2^n \to \mathcal{I}_{M_2} \tag{A.1}$$
and a decoder $g$,
$$g: \mathcal{I}_{M_1} \times \mathcal{I}_{M_2} \to \mathcal{Y}_1^n \times \mathcal{Y}_2^n, \tag{A.2}$$
where $M_j = 2^{nR_j}$, $j = 1, 2$.
The probability of error of the code is defined as
$$P_e^{(n)} \triangleq \Pr\{g(f_1(\mathbf{Y}_1), f_2(\mathbf{Y}_2)) \ne (\mathbf{Y}_1, \mathbf{Y}_2)\}. \tag{A.3}$$
We will prove the following result:

Theorem 5.
Let $(R_1, R_2)$ be given. Then, there exists a sequence of $(2^{nR_1}, 2^{nR_2}, n)$ Slepian-Wolf source codes with probability of error $P_e^{(n)} \to 0$ as $n \to \infty$ for every memoryless source that satisfies Eq. (1).

Proof. Throughout the proof, the cardinality of a set $\mathcal{A}$ is denoted by $|\mathcal{A}|$. The empirical joint entropy $H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1, Y_2)$ and the empirical conditional entropy $H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2)$ induced by the sequences $\mathbf{y}_1 \in \mathcal{Y}_1^n$, $\mathbf{y}_2 \in \mathcal{Y}_2^n$ are defined as
$$H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1, Y_2) \triangleq -\sum_{y_1 \in \mathcal{Y}_1} \sum_{y_2 \in \mathcal{Y}_2} P_{\mathbf{y}_1, \mathbf{y}_2}(y_1, y_2) \log P_{\mathbf{y}_1, \mathbf{y}_2}(y_1, y_2), \tag{A.4}$$
$$H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2) \triangleq -\sum_{y_1 \in \mathcal{Y}_1} \sum_{y_2 \in \mathcal{Y}_2} P_{\mathbf{y}_1, \mathbf{y}_2}(y_1, y_2) \log P_{\mathbf{y}_1, \mathbf{y}_2}(y_1 \mid y_2), \tag{A.5}$$
where $P_{\mathbf{y}_1, \mathbf{y}_2}(y_1, y_2)$ and $P_{\mathbf{y}_1, \mathbf{y}_2}(y_1 \mid y_2)$ are the empirical joint and conditional distribution functions, respectively, induced by $\mathbf{y}_1$ and $\mathbf{y}_2$ (see [18, Chap. 11]). To prove the theorem, we use the following random-binning mechanism:

• Codebook generation:
Assign every $\mathbf{y}_1 \in \mathcal{Y}_1^n$ to one of $2^{nR_1}$ bins independently, according to a uniform distribution on $\{1, 2, \ldots, 2^{nR_1}\}$. Similarly, randomly assign every $\mathbf{y}_2 \in \mathcal{Y}_2^n$ to one of $2^{nR_2}$ bins. Reveal the assignments $f_1$ and $f_2$ to the encoders and the decoder.

• Encoding:
User $j$ sends the index of the bin to which $\mathbf{Y}_j$ belongs, $j = 1, 2$.

• Decoding:
Given the received index pair $(T_1 = f_1(\mathbf{Y}_1), T_2 = f_2(\mathbf{Y}_2))$, the decoder uses the Minimum Joint Entropy (MJE) decoder: choose the pair $(\mathbf{y}_1', \mathbf{y}_2')$ with $f_1(\mathbf{y}_1') = T_1$, $f_2(\mathbf{y}_2') = T_2$, which minimizes the empirical joint entropy induced by $(\mathbf{y}_1', \mathbf{y}_2')$, $H_{\mathbf{y}_1', \mathbf{y}_2'}(Y_1, Y_2)$.

Define the following events:
$$\begin{aligned}
E_0 &= \big\{(\mathbf{Y}_1, \mathbf{Y}_2) \notin A_\epsilon^{(n)}\big\} \\
E_1 &= \big\{(\mathbf{Y}_1, \mathbf{Y}_2) \in A_\epsilon^{(n)}\big\} \cap \big\{\exists\, \mathbf{y}_1' \ne \mathbf{Y}_1 : f_1(\mathbf{y}_1') = T_1 \text{ and } H_{\mathbf{y}_1', \mathbf{Y}_2}(Y_1, Y_2) \le H_{\mathbf{Y}_1, \mathbf{Y}_2}(Y_1, Y_2)\big\} \\
E_2 &= \big\{(\mathbf{Y}_1, \mathbf{Y}_2) \in A_\epsilon^{(n)}\big\} \cap \big\{\exists\, \mathbf{y}_2' \ne \mathbf{Y}_2 : f_2(\mathbf{y}_2') = T_2 \text{ and } H_{\mathbf{Y}_1, \mathbf{y}_2'}(Y_1, Y_2) \le H_{\mathbf{Y}_1, \mathbf{Y}_2}(Y_1, Y_2)\big\} \\
E_3 &= \big\{(\mathbf{Y}_1, \mathbf{Y}_2) \in A_\epsilon^{(n)}\big\} \cap \big\{\exists\, (\mathbf{y}_1', \mathbf{y}_2') : \mathbf{y}_1' \ne \mathbf{Y}_1,\, \mathbf{y}_2' \ne \mathbf{Y}_2,\, f_1(\mathbf{y}_1') = T_1,\, f_2(\mathbf{y}_2') = T_2 \\
&\qquad \text{and } H_{\mathbf{y}_1', \mathbf{y}_2'}(Y_1, Y_2) \le H_{\mathbf{Y}_1, \mathbf{Y}_2}(Y_1, Y_2)\big\}, \tag{A.6}
\end{aligned}$$
where $A_\epsilon^{(n)}$, $\epsilon > 0$,
is the strongly typical set with respect to the source $P_{Y_1, Y_2}$ (see [18, Eq. 10.107]). Remember that $\mathbf{Y}_1, \mathbf{Y}_2, f_1$ and $f_2$ are random. Obviously,
$$H_{\mathbf{y}_1', \mathbf{y}_2}(Y_1, Y_2) \le H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1, Y_2) \iff H_{\mathbf{y}_1', \mathbf{y}_2}(Y_1 \mid Y_2) \le H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2), \tag{A.7}$$
where $H_{\mathbf{y}_1', \mathbf{y}_2}(Y_1 \mid Y_2)$, $H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2)$ are the empirical conditional entropies induced by $(\mathbf{y}_1', \mathbf{y}_2)$ and $(\mathbf{y}_1, \mathbf{y}_2)$, respectively. We have an error if there is another pair of sequences in the same bins such that the empirical joint entropy induced by this pair is smaller than the empirical joint entropy induced by $(\mathbf{Y}_1, \mathbf{Y}_2)$. Hence,
$$\bar{P}_e^{(n)} \le \Pr\{E_0 \cup E_1 \cup E_2 \cup E_3\} \le \Pr\{E_0\} + \Pr\{E_1\} + \Pr\{E_2\} + \Pr\{E_3\}, \tag{A.8}$$
where $\bar{P}_e^{(n)} \triangleq E\big[P_e^{(n)}\big]$ is the expected probability of error, the expectation being taken with respect to the random choice of the code. The first inequality follows from the fact that we treat $E_0$ as an error event, and the second inequality is due to the union bound. We first consider $E_0$. By the asymptotic equipartition property (AEP), $\Pr\{E_0\} \to 0$, so for $n$ sufficiently large, $\Pr\{E_0\} < \epsilon$. To bound $\Pr(E_1)$, we have
$$\begin{aligned}
\Pr(E_1) &= \sum_{(\mathbf{y}_1, \mathbf{y}_2) \in A_\epsilon^{(n)}} P(\mathbf{y}_1, \mathbf{y}_2) \cdot \Pr\big\{\exists\, \mathbf{y}_1' \ne \mathbf{y}_1 : f_1(\mathbf{y}_1') = f_1(\mathbf{y}_1) \text{ and } H_{\mathbf{y}_1', \mathbf{y}_2}(Y_1 \mid Y_2) \le H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2)\big\} \\
&\le \sum_{(\mathbf{y}_1, \mathbf{y}_2) \in A_\epsilon^{(n)}} P(\mathbf{y}_1, \mathbf{y}_2) \sum_{\mathbf{y}_1' \in B(\mathbf{y}_1, \mathbf{y}_2)} \Pr\{f_1(\mathbf{y}_1') = f_1(\mathbf{y}_1)\} \\
&= \sum_{(\mathbf{y}_1, \mathbf{y}_2) \in A_\epsilon^{(n)}} P(\mathbf{y}_1, \mathbf{y}_2)\, 2^{-nR_1} |B(\mathbf{y}_1, \mathbf{y}_2)|, \tag{A.9}
\end{aligned}$$
where the set $B(\mathbf{y}_1, \mathbf{y}_2)$ is defined as
$$B(\mathbf{y}_1, \mathbf{y}_2) \triangleq \big\{\mathbf{y}_1' : H_{\mathbf{y}_1', \mathbf{y}_2}(Y_1 \mid Y_2) \le H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2)\big\} \tag{A.10}$$
and the last equality simply follows from the definition of the random-binning coding scheme. Using the method of types (see [18, Chap. 10-11]), we have
$$\begin{aligned}
|B(\mathbf{y}_1, \mathbf{y}_2)| &= \sum_{\mathbf{y}_1' \in B(\mathbf{y}_1, \mathbf{y}_2)} 1 = \sum_{V_{\mathbf{y}_1' \mid \mathbf{y}_2} \subseteq B(\mathbf{y}_1, \mathbf{y}_2)} \big|V_{\mathbf{y}_1' \mid \mathbf{y}_2}\big| \\
&\le \sum_{V_{\mathbf{y}_1' \mid \mathbf{y}_2} \subseteq B(\mathbf{y}_1, \mathbf{y}_2)} 2^{n\big(H_{\mathbf{y}_1', \mathbf{y}_2}(Y_1 \mid Y_2) + \epsilon\big)} \\
&\le \sum_{V_{\mathbf{y}_1' \mid \mathbf{y}_2} \subseteq B(\mathbf{y}_1, \mathbf{y}_2)} 2^{n\big(H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2) + \epsilon\big)} \\
&\le (n+1)^{|\mathcal{Y}_1| |\mathcal{Y}_2|}\, 2^{n\big(H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2) + \epsilon\big)} \\
&\le (n+1)^{|\mathcal{Y}_1| |\mathcal{Y}_2|}\, 2^{n\big(H(Y_1 \mid Y_2) + 2\epsilon\big)}, \tag{A.11}
\end{aligned}$$
where $V_{\mathbf{y}_1' \mid \mathbf{y}_2}$ is the conditional type of $\mathbf{y}_1'$ given $\mathbf{y}_2$ (see [18, Chap. 10]). The second equality follows from the fact that the event $\mathbf{y}_1' \in B(\mathbf{y}_1, \mathbf{y}_2)$ depends only on the type $V_{\mathbf{y}_1' \mid \mathbf{y}_2}$. In the first inequality, we used the known upper bound on the size of a conditional type. The second inequality stems from the definition of $B(\mathbf{y}_1, \mathbf{y}_2)$. In the third inequality, we used a known upper bound on the number of conditional types. The last inequality follows since (see [18, Chap. 10])
$$(\mathbf{y}_1, \mathbf{y}_2) \in A_\epsilon^{(n)} \Rightarrow H_{\mathbf{y}_1, \mathbf{y}_2}(Y_1 \mid Y_2) \le H(Y_1 \mid Y_2) + \epsilon. \tag{A.12}$$
Therefore, we have
$$\Pr(E_1) \le \sum_{(\mathbf{y}_1, \mathbf{y}_2) \in A_\epsilon^{(n)}} P(\mathbf{y}_1, \mathbf{y}_2)\, 2^{-nR_1} |B(\mathbf{y}_1, \mathbf{y}_2)| \le (n+1)^{|\mathcal{Y}_1| |\mathcal{Y}_2|}\, 2^{-nR_1}\, 2^{n(H(Y_1 \mid Y_2) + 2\epsilon)}, \tag{A.13}$$
where in the second inequality we used Eq. (A.11). Similarly, it can be shown that
$$\Pr(E_2) \le (n+1)^{|\mathcal{Y}_1| |\mathcal{Y}_2|}\, 2^{-nR_2} \cdot 2^{n(H(Y_2 \mid Y_1) + 2\epsilon)} \tag{A.14}$$
and
$$\Pr(E_3) \le (n+1)^{|\mathcal{Y}_1| |\mathcal{Y}_2|}\, 2^{-n(R_1 + R_2)} \cdot 2^{n(H(Y_1, Y_2) + 2\epsilon)}. \tag{A.15}$$
Hence, taking $R_1 > H(Y_1 \mid Y_2) + 2\epsilon$, $R_2 > H(Y_2 \mid Y_1) + 2\epsilon$ and $R_1 + R_2 > H(Y_1, Y_2) + 2\epsilon$, we have $\Pr(E_1) < \epsilon$, $\Pr(E_2) < \epsilon$ and $\Pr(E_3) < \epsilon$ for sufficiently large $n$. Since $\bar{P}_e^{(n)} \le 4\epsilon$, there exists at least one universal code $(f_1^*, f_2^*, g^*)$ with $P_e^{(n)} \le 4\epsilon$. Thus, we can construct a sequence of universal codes with
$P_e^{(n)} \to 0$, and the proof of achievability is complete.
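For concreteness, below is a deliberately tiny, self-contained simulation of the random-binning scheme with the MJE decoder (our sketch; at such a short block length occasional decoding errors are expected, whereas the theorem concerns the limit $n \to \infty$):

```python
import itertools
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)

def emp_joint_entropy(y1, y2):
    """Empirical joint entropy (bits) induced by a pair of sequences."""
    n = len(y1)
    counts = Counter(zip(y1, y2))
    return -sum(c / n * np.log2(c / n) for c in counts.values())

n, R1, R2 = 12, 0.8, 0.8                       # toy block length and rates
M1, M2 = 2 ** int(n * R1), 2 ** int(n * R2)    # number of bins per user
seqs = list(itertools.product((0, 1), repeat=n))
f1 = {y: int(rng.integers(M1)) for y in seqs}  # random bin assignments
f2 = {y: int(rng.integers(M2)) for y in seqs}

# Correlated binary source: Y2 equals Y1 with each bit flipped w.p. 0.1,
# so H(Y1|Y2) = H(Y2|Y1) ~ 0.47 < R1, R2 and H(Y1, Y2) ~ 1.47 < R1 + R2.
y1 = tuple(int(b) for b in rng.integers(0, 2, n))
y2 = tuple(b ^ int(rng.random() < 0.1) for b in y1)
t1, t2 = f1[y1], f2[y2]

# MJE decoding: among all pairs consistent with the received bin indices,
# pick the pair minimizing the empirical joint entropy.
cands = [(a, b) for a in seqs if f1[a] == t1 for b in seqs if f2[b] == t2]
decoded = min(cands, key=lambda ab: emp_joint_entropy(*ab))
print("decoded correctly:", decoded == (y1, y2))
```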
Remark 2. It can be shown that the universal decoder presented in the proof above also achieves the optimal error exponent.
Appendix B - Proof of Lemma 2
We now prove Lemma 2. We first show that the random vector $(\mathbf{Y}_1 - \mathbf{Z}_1, \mathbf{Y}_2 - \mathbf{Z}_2)$ is equivalent to the random vector $(\mathbf{X}_1 + \mathbf{N}_1, \mathbf{X}_2 + \mathbf{N}_2)$, where $N_1, N_2$ are independent of $X_1, X_2$ and of each other, and $N_i \sim U[-\sqrt{3D_i}, \sqrt{3D_i}]$, $i \in \{1, 2\}$. Therefore, the dithered quantization process can be viewed as passing $X_1, X_2$ through independent noisy memoryless channels, $\hat{X}_1 = X_1 + N_1$ and $\hat{X}_2 = X_2 + N_2$, respectively. We start with the following conditional probability distribution:
$$f_{N_1, N_2 \mid X_1, X_2}(n_1, n_2 \mid x_1, x_2) = f(n_1 \mid x_1) f(n_2 \mid x_2), \tag{B.1}$$
where we have defined $N_1 \triangleq Y_1 - Z_1 - X_1$, $N_2 \triangleq Y_2 - Z_2 - X_2$. The equality stems from the fact that $(Y_1 - Z_1 - X_1)$ is independent of $X_2$ given $X_1$, and $(Y_2 - Z_2 - X_2)$ is independent of $X_1$ given $X_2$, since $(Z_1, Z_2)$ are independent of $(X_1, X_2)$. In addition, it can easily be seen that for every value of $X_i$, $N_i$ is uniformly distributed over $[-\sqrt{3D_i}, \sqrt{3D_i}]$. Therefore, $N_i$ is independent of $X_i$ and we have
$$f_{N_1, N_2 \mid X_1, X_2}(n_1, n_2 \mid x_1, x_2) = f(n_1) f(n_2). \tag{B.2}$$
Lemma 2 follows directly from this result:
$$\begin{aligned}
E[Y_i - Z_i] &= E[X_i + N_i] = E[X_i] \\
E[(Y_i - Z_i)^2] &= E[(X_i + N_i)^2] = E[X_i^2] + D_i \\
E[X_i(Y_i - Z_i)] &= E[X_i(X_i + N_i)] = E[X_i^2] \\
E[(Y_1 - Z_1)(Y_2 - Z_2)] &= E[(X_1 + N_1)(X_2 + N_2)] = E[X_1 X_2] \\
E[X_1(Y_2 - Z_2)] &= E[X_1(X_2 + N_2)] = E[X_1 X_2] \\
E[X_2(Y_1 - Z_1)] &= E[X_2(X_1 + N_1)] = E[X_1 X_2].
\end{aligned}$$
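The additive-noise equivalence above is easy to verify numerically; a small sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 0.05
delta = np.sqrt(12 * D)  # quantizer step

x = rng.uniform(-1, 1, 500_000)                 # arbitrary bounded source
z = rng.uniform(-delta / 2, delta / 2, x.size)  # subtractive dither
n = delta * np.round((x + z) / delta) - z - x   # N = Q(X + Z) - Z - X

# N should be ~ U[-sqrt(3D), sqrt(3D)] and uncorrelated with X:
print("max |N|  :", np.abs(n).max(), "vs sqrt(3D) =", np.sqrt(3 * D))
print("E[N^2]   :", np.mean(n ** 2), "vs D =", D)
print("corr(X,N):", np.corrcoef(x, n)[0, 1])    # ~ 0
```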
Appendix C - Calculation of the Estimation Error

In this appendix, we calculate the estimation error defined in Eq. (48). The optimal linear estimator of $X_1$ given the vector $Y$ is
$$\hat{X}_1 = Y \cdot \frac{1}{|\Lambda|} \begin{pmatrix} |\Lambda| - D_1(E[X_2^2] + D_2) \\ E[X_1 X_2] D_1 \end{pmatrix}, \tag{C.1}$$
where
$$|\Lambda| = (E[X_1^2] + D_1)(E[X_2^2] + D_2) - E^2[X_1 X_2]. \tag{C.2}$$
The error of the optimal linear estimator is given by
$$D_1^* = E[X_1^2] - E[\hat{X}_1^2]. \tag{C.3}$$
Calculating the second term:
$$\begin{aligned}
|\Lambda|^2 E[\hat{X}_1^2] &= \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big)^2 E\big[(Y_1 - Z_1)^2\big] + E^2[X_1 X_2] D_1^2\, E\big[(Y_2 - Z_2)^2\big] \\
&\quad + 2\big(|\Lambda| - D_1(E[X_2^2] + D_2)\big) E[X_1 X_2] D_1\, E[(Y_1 - Z_1)(Y_2 - Z_2)] \\
&= \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big)^2 \big(E[X_1^2] + D_1\big) + E^2[X_1 X_2] D_1^2 \big(E[X_2^2] + D_2\big) \\
&\quad + 2\big(|\Lambda| - D_1(E[X_2^2] + D_2)\big) E^2[X_1 X_2] D_1 \\
&= \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big)^2 \big(E[X_1^2] + D_1\big) \\
&\quad + E^2[X_1 X_2] D_1 \Big(D_1\big(E[X_2^2] + D_2\big) + 2\big(|\Lambda| - D_1(E[X_2^2] + D_2)\big)\Big) \\
&= \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big)^2 \big(E[X_1^2] + D_1\big) + E^2[X_1 X_2] D_1 \big(|\Lambda| + |\Lambda| - D_1(E[X_2^2] + D_2)\big) \\
&= \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big) \Big(\big(|\Lambda| - D_1(E[X_2^2] + D_2)\big)\big(E[X_1^2] + D_1\big) + E^2[X_1 X_2] D_1\Big) + |\Lambda| E^2[X_1 X_2] D_1 \\
&= \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big) \big(|\Lambda| \big(E[X_1^2] + D_1\big) - D_1 |\Lambda|\big) + |\Lambda| E^2[X_1 X_2] D_1 \\
&= \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big) |\Lambda|\, E[X_1^2] + |\Lambda| E^2[X_1 X_2] D_1 \\
&= |\Lambda|^2 E[X_1^2] + |\Lambda| D_1 \big(E^2[X_1 X_2] - E[X_1^2](E[X_2^2] + D_2)\big) \\
&= |\Lambda|^2 E[X_1^2] - |\Lambda| D_1 \big(E[X_1^2](E[X_2^2] + D_2) - E^2[X_1 X_2]\big) \\
&= |\Lambda|^2 E[X_1^2] - |\Lambda| D_1 \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big), \tag{C.4}
\end{aligned}$$
where in the second equality we used the results of Lemma 2. Therefore, we have
$$\begin{aligned}
D_1^* &= \frac{|\Lambda| D_1 \big(|\Lambda| - D_1(E[X_2^2] + D_2)\big)}{|\Lambda|^2} = D_1 \bigg(1 - \frac{D_1 (E[X_2^2] + D_2)}{|\Lambda|}\bigg) \\
&= D_1 \bigg(1 - \frac{D_1 (E[X_2^2] + D_2)}{(E[X_1^2] + D_1)(E[X_2^2] + D_2) - E^2[X_1 X_2]}\bigg) \\
&= D_1 \cdot \frac{E[X_1^2](E[X_2^2] + D_2) - E^2[X_1 X_2]}{(E[X_1^2] + D_1)(E[X_2^2] + D_2) - E^2[X_1 X_2]}. \tag{C.5}
\end{aligned}$$

References

[1] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. 19, pp. 471-480, Jul. 1973.

[2] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. 22, pp. 1-10, Jan. 1976.
[3] A. D. Wyner, "The rate-distortion function for source coding with side information at the decoder-II: General sources," Inform. Contr., vol. 38, pp. 60-80, 1978.

[4] R. Ahlswede and J. Körner, "Source coding with side information and a converse for degraded broadcast channels," IEEE Trans. Inform. Theory, vol. 21, no. 6, pp. 629-637, Nov. 1975.

[5] T. Berger and R. W. Yeung, "Multiterminal source encoding with one distortion criterion," IEEE Trans. Inform. Theory, vol. 35, no. 2, pp. 228-236, Mar. 1989.

[6] R. Zamir and T. Berger, "Multiterminal source coding with high resolution," IEEE Trans. Inform. Theory, vol. 45, pp. 106-117, Jan. 1999.

[7] A. Wagner and V. Anantharam, "An improved outer bound for multiterminal source coding," IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1919-1937, May 2008.

[8] A. Wagner, S. Tavildar, and P. Viswanath, "Rate region of the quadratic Gaussian two-encoder source-coding problem," IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1938-1961, May 2008.

[9] T. Berger and S. Y. Tung, "Encoding of correlated analog sources," in Proc. IEEE-USSR Joint Workshop on Information Theory, pp. 7-10, 1975.

[10] T. A. Courtade and T. Weissman, "Multiterminal source coding under logarithmic loss," IEEE Trans. Inform. Theory, vol. 60, pp. 740-761, Jan. 2014.

[11] Y. Kaspi and N. Merhav, "Zero-delay and causal single-user and multi-user lossy source coding with decoder side information," IEEE Trans. Inform. Theory, vol. 60, no. 11, pp. 6931-6942, Nov. 2014.

[12] J. Ziv, "On universal quantization," IEEE Trans. Inform. Theory, vol. 31, pp. 344-347, May 1985.

[13] R. Zamir and M. Feder, "On universal quantization by randomized uniform/lattice quantizers," IEEE Trans. Inform. Theory, vol. 38, pp. 428-436, Mar. 1992.

[14] R. Zamir and M. Feder, "Information rates of pre/post filtered dithered quantizers," IEEE Trans. Inform. Theory, vol. 42, pp. 1340-1353, Sept. 1996.

[15] J. Kieffer, "A unified approach to weak universal source coding," IEEE Trans. Inform. Theory, vol. 24, pp. 674-682, Nov. 1978.

[16] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inform. Theory, vol. 14, pp. 676-683, Sept. 1968.

[17] I. Csiszár and J. Körner, "Towards a general theory of source networks," IEEE Trans. Inform. Theory, vol. 26, pp. 155-165, Mar. 1980.

[18] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006.