Distributed Arithmetic Coding for the Asymmetric Slepian-Wolf problem
Marco Grangetto,
Member, IEEE,
Enrico Magli,
Senior Member, IEEE,
Gabriella Olmo,
Senior Member, IEEE
EDICS: SEN-DCSC, SPC-CODC
Abstract
Distributed source coding schemes are typically based on the use of channel codes as source codes. In this paper we propose a new paradigm, named "distributed arithmetic coding", which extends arithmetic codes to the distributed case employing sequential decoding aided by the side information. In particular, we introduce a distributed binary arithmetic coder for the Slepian-Wolf coding problem, along with a joint decoder. The proposed scheme can be applied to two sources in both the asymmetric mode, wherein one source acts as side information, and the symmetric mode, wherein both sources are coded with ambiguity, at any combination of achievable rates. Distributed arithmetic coding provides several advantages over existing Slepian-Wolf coders, especially good performance at small block lengths, and the ability to incorporate arbitrary source models in the encoding process, e.g., context-based statistical models, in much the same way as a classical arithmetic coder. We have compared the performance of distributed arithmetic coding with turbo codes and low-density parity-check codes, and found that the proposed approach is very competitive.
Index Terms
Distributed source coding, arithmetic coding, Slepian-Wolf coding, Wyner-Ziv coding, compression, turbo codes, LDPC codes.
M. Grangetto is with Dip. di Informatica, Università degli Studi di Torino, Corso Svizzera 185 - 10149 Torino - ITALY - Ph.: +39-011-6706711 - FAX: +39-011-751603 - E-mail: [email protected]
E. Magli and G. Olmo are with Dip. di Elettronica, Politecnico di Torino, Corso Duca degli Abruzzi 24 - 10129 Torino - Italy - Ph.: +39-011-5644195 - FAX: +39-011-5644099 - E-mail: enrico.magli(gabriella.olmo)@polito.it. Corresponding author: Enrico Magli.
IEEE TRANSACTIONS ON SIGNAL PROCESSING (RESUBMITTED NOVEMBER 2008)
Distributed Arithmetic Coding for the Slepian-Wolf Problem
I. INTRODUCTION AND BACKGROUND
In recent years, distributed source coding (DSC) has received increasing attention from the signal processing community. DSC considers a situation in which two (or more) statistically dependent sources X and Y must be encoded by separate encoders that are not allowed to talk to each other. Performing separate lossless compression may seem less efficient than joint encoding. However, DSC theory proves that, under certain assumptions, separate encoding is optimal, provided that the sources are decoded jointly [1]. For example, with two sources it is possible to perform "standard" encoding of the first source (called side information) at a rate equal to its entropy, and "conditional" encoding of the second one at a rate lower than its entropy, with no information about the first source available at the second encoder; we refer to this as the "asymmetric" Slepian-Wolf (S-W) problem. Alternatively, both sources can be encoded at a rate smaller than their respective entropies, and decoded jointly, which we refer to as "symmetric" S-W coding.

DSC theory also encompasses lossy compression [2]; it has been shown that, under certain conditions, there is no performance loss in using DSC [2], [3], and that possible losses are bounded below 0.5 bit per sample (bps) for the quadratic distortion metric [4]. In practice, lossy DSC is typically implemented using a quantizer followed by lossless DSC, while the decoder consists of the joint decoder followed by a joint dequantizer. Lossless and lossy DSC have several potential applications, e.g., coding for non co-located sources such as sensor networks, distributed video coding [5], [6], [7], [8], layered video coding [9], [10], error-resilient video coding [11], and satellite image coding [12], [13], just to mention a few.
The interested reader is referred to [14] for an excellent tutorial.

Traditional entropy coding of an information source can be performed using one out of many available methods, the most popular being arithmetic coding (AC) and Huffman coding. "Conditional" (i.e., DSC) coders are typically implemented using channel codes, by representing the source using the syndrome or the parity bits of a suitable channel code of given rate. The syndrome identifies sets of codewords ("cosets") with maximum distance properties, so that decoding an ambiguous description of a source at a rate less than its entropy (given the side information) incurs minimum error probability. If the correlation between X and Y can be modeled as a "virtual" channel described as X = Y + W, with W an additive noise process, a good channel code for that transmission problem is also expected to be a good S-W source code [3].
Regarding asymmetric S-W coding, the first practical technique was described in [15], and employs trellis codes. Recently, more powerful channel codes such as turbo codes have been proposed in [6], [16], [17], and low-density parity-check (LDPC) codes [18] have been used in [19], [20], [21]. Turbo and LDPC codes can get extremely close to channel capacity, although they require the block size to be rather large. Note that the constituent codes of turbo codes are convolutional codes, hence the syndrome is difficult to compute. In [6] the cosets are formed by all messages that produce the same parity bits, even though this approach is somewhat suboptimal [17], since the geometrical properties of these cosets are not as good as those of syndrome-based coding. In [22] a syndrome former is used to deal with this problem. Multilevel codes have also been addressed; in [23] trellis codes are extended to multilevel sources, whereas in [24] a similar approach is proposed for LDPC codes.

Besides techniques based on channel coding, a few authors have also investigated the use of source coders for DSC. This is motivated by the fact that existing source coders obviously exhibit nice compression features that should be retained in a DSC coder, such as the ability to employ flexible and adaptive probability models, and low encoding complexity. In [25] the problem of designing a variable-length DSC coder is addressed; it is shown that the problem of designing such a zero-error coder is NP-hard. In [26] a similar approach is followed; the authors consider the problem of designing Huffman and arithmetic DSC coders for multilevel sources with zero or almost-zero error probability. The idea is that, if the joint density of the source and the side information satisfies certain conditions, the same codeword (or the same interval for the AC process) can be associated to multiple symbols.
This approach leads to an encoder with a complex modeling stage (NP-hard for the optimal code, though suboptimal polynomial-time algorithms are provided in [26]), while the decoding process resembles a classical arithmetic decoder.

As for symmetric S-W codes, a few techniques have been recently proposed. A symmetric code can be obtained from an asymmetric one through time sharing, whereby the two sources alternately take the role of the source and the side information; however, current DSC coders cannot easily accommodate this approach. Syndrome-based channel code partitioning has been introduced in [27], and extended in [28] to systematic codes. A similar technique is described in [29], encompassing non-systematic codes. Syndrome formers have also been proposed for symmetric S-W coding [30]. Moreover, techniques based on the use of parity bits can also be employed, as they can typically provide rate compatibility. A practical code has been proposed in [16] using two turbo codes that are decoded jointly, achieving the equal-rate point; in [31] an algorithm is introduced that employs turbo codes to achieve arbitrary rate splitting. Symmetric S-W codes based on LDPC codes have also been developed [32], [33].
Although several near-optimal DSC coders have been designed for simple ideal sources (e.g., binary and Gaussian sources), the application of practical DSC schemes to realistic signals typically incurs the following problems.

• Channel codes get very close to capacity only for very large data blocks (typically in excess of symbols). In many applications, however, the basic units to be encoded are of the order of a few hundred to a few thousand symbols. For such block lengths, channel codes have good but not optimal performance.

• The symbols contained in a block are expected to follow a stationary statistical distribution. However, typical real-world sources are not stationary. This calls for either the use of short blocks, which weakens the performance of the S-W coder, or the estimation of conditional probabilities over contexts, which cannot be easily accommodated by existing S-W coders.

• When the sources are strongly correlated (i.e., in the most favorable case), very high-rate channel codes are needed (e.g., rate- codes). However, capacity-achieving channel codes are often not very efficient at high rate.

• In those applications where DSC is used to limit the encoder complexity, it should be noted that the complexity of existing S-W coders is not negligible, and often higher than that of existing non-DSC coders. This seriously weakens the benefits of DSC.

• Upgrading an existing compression algorithm like JPEG 2000 or H.264/AVC to provide DSC functionalities requires at least redesigning the entropy coding stage, adopting one of the existing DSC schemes.

Among these issues, the block length is particularly important. While it has been shown that, on ideal sources with very large block length, the performance of some practical DSC coders can be as close as 0.09 bits to the theoretical limit [14], so far DSC of real-world data has fallen short of its expectations, one reason being the necessity to employ much smaller blocks.
For example, the PRISM video coder [5] encodes each macroblock independently, with a block length of 256 samples. For the coder in [6], the block length is equal to the number of 8x8 blocks in one picture (1584 for the CIF format). The performance of both coders is rather far from optimal, highlighting the need of DSC coders for realistic block lengths.

A solution to this problem has been introduced in [34], where an extension of AC, named distributed arithmetic coding (DAC), has been proposed for asymmetric S-W coding. Moreover, in [35] DAC has been extended to the case of symmetric S-W coding of two sources at the same rate (i.e., the mid-point of the S-W rate region). DAC and its decoding process do not currently have a rigorous mathematical theory that proves they can asymptotically achieve the S-W rate region; such a theory is
very difficult to develop because of the non-linearity of AC. However, DAC is a practical algorithm that was shown in [34] to outperform other existing distributed coders. In this paper, we build on the results presented in [34], providing several new contributions. For asymmetric coding, we focus on i.i.d. sources as these are often found in many DSC applications; for example, in transform-domain distributed video coding, DAC could be applied to the bit-planes of transform coefficients, which can be modeled as i.i.d. We optimize the DAC using an improved encoder termination procedure, and we investigate the rate allocation problem, i.e., how to optimally select the encoding parameters to achieve a desired target rate. We evaluate the performance of this new design comparing it with turbo and LDPC codes, including the case of extremely correlated sources with highly skewed probabilities. This is of interest in multimedia applications because the most significant bit-planes of the transform coefficients of an image or video sequence are almost always equal to zero, and are strongly correlated with the side information. For symmetric coding, we extend our previous work in [35] by introducing DAC encoding and rate allocation procedures that allow encoding an arbitrary number of sources with an arbitrary combination of rates. We develop and test the decoder for two sources.

Finally, it should be noted that an asymmetric DAC scheme has been independently and concurrently developed in [36] using quasi-arithmetic codes. Quasi-arithmetic codes are a low-complexity approximation of arithmetic codes, providing smaller encoding and decoding complexity [37]. These codes allow the interval endpoints to be only a finite set of points. While this yields suboptimal compression performance, it makes the arithmetic coder a finite state machine, simplifying the decoding process with side information.

This paper is organized as follows.
In Sect. II we describe the DAC encoding process for the asymmetric case, in Sect. III we describe the DAC decoder, and in Sect. IV we study the rate allocation and parameter selection problem. In Sect. V we describe the DAC encoder, decoder and rate allocator for the symmetric case. In Sect. VI and VII we report the DAC performance evaluation results in the asymmetric and symmetric case, respectively. Finally, in Sect. VIII we draw some conclusions.

II. DISTRIBUTED ARITHMETIC CODING: ASYMMETRIC ENCODER
Before describing the DAC encoder, it should be noted that the AC process typically consists of a modeling stage and a coding stage. The modeling stage has the purpose of computing the parameters of a suitable statistical model of the source, in terms of the probability that a given bit takes on value 0 or 1. This model can be arbitrarily sophisticated, e.g., by using contexts, adaptive probability estimation, and so forth. The coding stage takes the probabilities as input, and implements the actual AC procedure, which outputs a binary codeword describing the input sequence.
Let $X$ be a binary memoryless source that emits a semi-infinite sequence of random variables $X_i$, $i = 0, 1, \dots$, with probabilities $p_{X0} = P(X_i = 0)$ and $p_{X1} = P(X_i = 1)$. We are concerned with encoding the sequence $x = [x_0, \dots, x_{N-1}]$ consisting in the first $N$ occurrences of this source. The modeling and coding stages are shown in Fig. 1-a. The modeling stage takes as input the sequence $x$, and outputs an estimate of the probabilities $p_{X0}$ and $p_{X1}$. The coding stage takes as input $x$, $p_{X0}$ and $p_{X1}$, and generates a codeword $C_X$. The expected length of $C_X$ depends on $p_{X0}$ and $p_{X1}$, and is determined once these probabilities are given.

In order to use the DAC, we consider two sources $X$ and $Y$, where $Y$ is a binary memoryless source that emits random variables $Y_i$, $i = 0, 1, \dots$, with probabilities $p_{Y0} = P(Y_i = 0)$ and $p_{Y1} = P(Y_i = 1)$. The first $N$ occurrences of this source form the side information $y = [y_0, \dots, y_{N-1}]$. We assume that $X$ and $Y$ are i.i.d. sources, and that $X_i$ and $Y_i$ are statistically dependent for a given $i$. The entropy of $X$ is defined as $H(X) = -\sum_{j=0}^{1} p_{Xj} \log p_{Xj}$, and similarly for $Y$. The conditional entropy of $X$ given $Y$ is defined as $H(X|Y) = -\sum_{j=0}^{1} \sum_{k=0}^{1} P(X_i = j, Y_i = k) \log P(X_i = j | Y_i = k)$.

For DAC, three blocks can be identified, as in Fig. 1-b, namely the modeling, rate allocation, and coding stages. The modeling stage is exactly the same as in the classical AC. The coding stage will be described in Sect. II-B; it takes as inputs $x$, the probabilities $p_{X0}$ and $p_{X1}$, and the parameter $k_X$, and outputs a codeword $C'_X$. Unlike a classical AC, where the expected rate is a function of the source probabilities, and hence cannot be selected a priori, the DAC allows the selection of any desired rate not larger than the expected rate of a classical AC.
This is very important, since in a DSC setting the rate for $x$ should depend not only on how "compressible" the source is, but also on how correlated $X_i$ and $Y_i$ are. For this reason, in DAC we also have a rate allocation stage that takes as input the probabilities $p_{X0}$ and $p_{X1}$ and the conditional entropy $H(X|Y)$, and outputs a parameter $k_X$ that drives the DAC coding stage to achieve the desired target rate.

In this paper we deal with the coding and rate allocation stages, and assume that the input probabilities $p_{X0}$, $p_{X1}$ and the conditional entropy $H(X|Y)$ are known a priori. This allows us to focus on the distributed coding aspects of the proposed scheme, and, at the same time, keeps the scheme independent of the modeling stage.

A. Arithmetic coding
We first review the classical AC coding process, as this sets the stage for the description of the DAC encoder; an overview can be found in [38]. The binary AC process for $x$ is based on the probabilities $p_{X0}$ and $p_{X1}$, which are used to partition the $[0,1)$ interval into sub-intervals associated to possible occurrences of the input symbols. At initialization the "current" interval is set to $I_0 = [0,1)$. For each input symbol $x_i$, the current interval $I_i$ is partitioned into two adjacent sub-intervals of lengths
$p_{X0} |I_i|$ and $p_{X1} |I_i|$, where $|I_i|$ is the length of $I_i$. The sub-interval corresponding to the actual value of $x_i$ is selected as the next current interval $I_{i+1}$, and this procedure is repeated for the next symbol. After all $N$ symbols have been processed, the sequence is represented by the final interval $I_N$. The codeword $C_X$ can consist in the binary representation of any number inside $I_N$ (e.g., the number in $I_N$ with the shortest binary representation), and requires approximately $-\log |I_N|$ bits.

Fig. 1. Modeling, rate allocation and coding stage for (a) classical AC, and (b) DAC.

B. DAC encoder
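As a point of reference, the interval-subdivision loop of Sect. II-A can be sketched in a few lines; the DAC encoder described next modifies only how the two sub-intervals are sized. This is an illustrative sketch of our own (it uses exact rational arithmetic via Python's `Fraction` rather than the fixed-point implementation used in the paper's experiments):

```python
from fractions import Fraction

def ac_encode(bits, p0):
    # p0 = P(X_i = 0); the current interval [low, high) is split into
    # [low, split) for symbol 0 and [split, high) for symbol 1
    low, high = Fraction(0), Fraction(1)
    for b in bits:
        split = low + p0 * (high - low)
        if b == 0:
            high = split
        else:
            low = split
    return (low + high) / 2  # any number inside the final interval I_N

def ac_decode(code, n, p0):
    # mirrors the encoder: the codeword lies in exactly one of the two
    # (disjoint, adjacent) sub-intervals, which identifies the symbol
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        split = low + p0 * (high - low)
        if code < split:
            out.append(0)
            high = split
        else:
            out.append(1)
            low = split
    return out

bits = [0, 1, 1, 0, 0, 0, 1, 0]
code = ac_encode(bits, Fraction(2, 3))
assert ac_decode(code, len(bits), Fraction(2, 3)) == bits
```

The final interval has length $\prod_i p_{X x_i}$, so the codeword needs roughly $-\log_2 \prod_i p_{X x_i}$ bits, matching the $-\log |I_N|$ figure above.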
Similarly to other S-W coders, DAC is based on the principle of inserting some ambiguity in the source description during the encoding process. This is obtained using a modified interval subdivision strategy. In particular, the DAC employs a set of intervals whose lengths are proportional to the modified probabilities $\tilde p_{X0}$ and $\tilde p_{X1}$, such that $\tilde p_{X0} \ge p_{X0}$ and $\tilde p_{X1} \ge p_{X1}$. In order to fit the enlarged sub-intervals into the $[0,1)$ interval, they are allowed to partially overlap. This prevents the decoder from discriminating the correct interval, unless the side information is used.

The detailed DAC encoding procedure is described in the following. At initialization the "current" interval is set to $I'_0 = [0,1)$. For each input symbol $x_i$, the current interval $I'_i$ is subdivided into two partially overlapped sub-intervals whose lengths are $\tilde p_{X0} |I'_i|$ and $\tilde p_{X1} |I'_i|$. The interval representing symbol $x_i$ is selected as the next current interval $I'_{i+1}$. After all $N$ symbols have been processed, the sequence is represented by the final interval $I'_N$. The codeword $C'_X$ can consist in the binary representation of any number inside $I'_N$, and requires approximately $-\log |I'_N|$ bits. This procedure is sketched in Fig. 2. At the decoder side, whenever the codeword points to an overlapped region, the input symbol cannot be detected unambiguously, and additional information must be exploited
by the joint decoder to solve the ambiguity. It is worth noticing that the DAC encoding procedure is a generalization of AC. Letting $\tilde p_{X0} = p_{X0}$ and $\tilde p_{X1} = p_{X1}$ leads to the AC encoding process described in Sect. II-A, with $I'_N = I_N$ and $C'_X = C_X$.

It should also be noted that, for simplicity, the description of the AC and DAC provided above assumes infinite-precision arithmetic. The practical implementation used in Sect. VI and VII employs fixed-point arithmetic and interval renormalization.

Fig. 2. Distributed arithmetic encoding procedure for a block of three symbols.

III. DECODING FOR THE ASYMMETRIC CASE
The objective of the DAC decoder is joint decoding of the sequence $x$ given the correlated side information $y$. The arithmetic decoding machinery of the DAC decoder presents limited modifications with respect to standard arithmetic decoders; a fixed-point implementation has been employed, with the same interval scaling and overlapping rules used at the encoder. In the following the arithmetic decoder state at the $i$-th decoding step is denoted as $\sigma_i$, $i = 0, \dots, N-1$. The data stored in $\sigma_i$ represent the interval $I'_i$ and the codeword at iteration $i$.

The decoding process can be formulated as a symbol-driven sequential search along a proper decoding tree, where each node represents a state $\sigma_i$, and a path in the tree represents a possible decoded sequence. The following elementary decoding functions are required to explore the tree:

• $(\tilde x_i, \sigma_{i+1}) = $ Test-One-Symbol$(\sigma_i)$: it computes the sub-intervals at the $i$-th step, compares them with $C'_X$ and outputs either an unambiguous symbol $\tilde x_i = 0, 1$ (if $C'_X$ belongs to one of the non-overlapped regions), or an ambiguous symbol $\tilde x_i = A$. In case of unambiguous decoding, the new decoder state $\sigma_{i+1}$ is returned for the following iterations.

• $\sigma_{i+1} = $ Force-One-Symbol$(\sigma_i, \tilde x_i)$: it forces the decoder to select the sub-interval corresponding to the symbol $\tilde x_i$ regardless of the ambiguity; the updated decoder state is returned.

In Fig. 3 an example of a section of the decoding tree is shown. In this example the decoder is not able to make a decision on the $i$-th symbol, as Test-One-Symbol returns $\tilde x_i = A$. As a consequence, two alternative decoding attempts are pursued by calling Force-One-Symbol with $\tilde x_i = 0, 1$ respectively. In principle, by iterating this process, the tree $T$, representing all the possible decoded sequences, can be explored.
The best decoded sequence can finally be selected applying the Maximum A Posteriori (MAP) criterion $\tilde x = \arg\max_T P(X_0 = \tilde x_0, \dots, X_{N-1} = \tilde x_{N-1} \,|\, C'_X, Y)$.

In general, exhaustive search cannot be applied due to the exponential growth of $T$. A viable solution is obtained applying the breadth-first sequential search known as the $M$-algorithm [39], [40]; at each tree depth, only the $M$ nodes with the best partial metric are retained. This amounts to visiting only a subset of the most likely paths in $T$. The MAP metric for a given node can be evaluated as follows:

$$P(X_0 = \tilde x_0, \dots, X_i = \tilde x_i \,|\, C'_X, Y) = \prod_{j=0}^{i} P(X_j = \tilde x_j \,|\, C'_X, Y_j) \quad (1)$$

Metric (1) can be expressed in additive terms by setting:

$$\Lambda_{i+1} \triangleq \log P(X_0 = \tilde x_0, \dots, X_i = \tilde x_i \,|\, C'_X, Y) = \sum_{j=0}^{i} \lambda_j \quad (2)$$

$$\lambda_j \triangleq \log P(X_j = \tilde x_j \,|\, C'_X, Y_j)$$

where $\Lambda_0 = 0$ and $\lambda_i$ represents the additive metric associated to each branch of $T$.

The pseudocode for the DAC decoder is given in Algorithm 1, where $T_i$ represents the list of nodes in $T$ explored at depth $i$; each tree node stores its corresponding arithmetic decoder state $\sigma_i$ and the accumulated metric $\Lambda_i$.

It is worth pointing out that $M$ has to be selected as a trade-off between the memory/complexity requirements and the error probability, i.e., the probability that the path corresponding to the original sequence $x$ is accidentally dropped. As in the case of standard Viterbi decoding, the path metric turns out to be stable and reliable as long as a significant number of terms, i.e., number of decoded symbols $\tilde x_i$, are taken into account. In the pessimistic case when all symbol positions $i$ trigger a decoder branching, given $M$, one can guarantee that at least $\log_2(M)$ symbols are considered for metric comparisons and pruning. On the other hand, in practical cases, the interval overlap is only partial and branching does not occur at every symbol iteration. All the experimental results presented in Sect.
VI have been obtained using $M = 2048$, while the trade-off between performance and complexity is analyzed in Sect. VI-F.
Algorithm 1 DAC decoder (asymmetric case)

Initialize $T_0$ with root node $(\sigma_0, \Lambda_0 = 0)$
Set symbol counter $i \Leftarrow 0$
while $i < N$ do
  for all nodes $(\sigma_i, \Lambda_i)$ in $T_i$ do
    $(\tilde x_i, \sigma_{i+1}) = $ Test-One-Symbol$(\sigma_i)$
    if $\tilde x_i = A$ then
      for $k = 0, 1$ do
        $\sigma_{i+1} = $ Force-One-Symbol$(\sigma_i, \tilde x_i = k)$
        $\Lambda_{i+1} \Leftarrow \Lambda_i + \lambda_i$
        Insert $(\sigma_{i+1}, \Lambda_{i+1})$ in $T_{i+1}$
      end for
    else
      $\Lambda_{i+1} \Leftarrow \Lambda_i + \lambda_i$
      Insert $(\sigma_{i+1}, \Lambda_{i+1})$ in $T_{i+1}$
    end if
  end for
  Sort nodes in $T_{i+1}$ according to metric $\Lambda_{i+1}$
  Keep only the $M$ nodes with best metric in $T_{i+1}$
  $i \Leftarrow i + 1$
end while
Output $\tilde x$ (sequence corresponding to the first node stored in $T_N$)

Finally, metric reliability cannot be guaranteed for the very last symbols of a finite-length sequence $x$. For channel codes, e.g., convolutional codes, this issue is tackled by imposing a proper termination strategy, e.g., forcing the encoded sequence to end in the first state of the trellis. A similar approach is necessary when using DAC. Examples of AC termination strategies are encoding a known termination pattern or end-of-block symbol with a certain probability or, in the case of context-based AC, driving the AC encoder into a given context. For DAC, we employ a new termination policy that is tailored to its particular features. In particular, termination is obtained by encoding the last $T$ symbols of the sequence without interval overlap, i.e., using $\tilde p_{Xj} = p_{Xj}$ for all symbols $x_i$ with $i \ge N - T$. As a consequence, no nodes in the DAC decoding tree will cause branching in the last $T$ steps, making the final metrics more reliable for the selection of the most likely sequence. However, there is a rate penalty for the termination symbols.

Fig. 3. Distributed arithmetic decoding tree for asymmetric S-W coding.
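As a concrete illustration of the overlapped subdivision and the $M$-algorithm search, the following self-contained toy sketch combines a DAC encoder and decoder. It is a simplification of our own: exact rational arithmetic replaces the paper's fixed-point implementation, termination is omitted, and the branch metric assumes a simple binary correlation model $P(Y_i = X_i) = p_{eq}$ (all names are ours, not the paper's):

```python
from fractions import Fraction
from math import log

def dac_encode(bits, q0, q1):
    # Overlapped subdivision: sub-intervals of sizes q0*|I| and q1*|I|,
    # with q0 + q1 >= 1, anchored at the two ends of the current interval.
    low, high = Fraction(0), Fraction(1)
    for b in bits:
        length = high - low
        if b == 0:
            high = low + q0 * length
        else:
            low = high - q1 * length
    return (low + high) / 2  # any number in the final interval works

def dac_decode(code, n, q0, q1, y, p_eq, M=64):
    # Breadth-first M-algorithm: a node is (metric, low, high, decoded bits).
    nodes = [(0.0, Fraction(0), Fraction(1), [])]
    for i in range(n):
        successors = []
        for metric, low, high, bits in nodes:
            length = high - low
            for b, lo, hi in ((0, low, low + q0 * length),
                              (1, high - q1 * length, high)):
                if lo <= code < hi:  # branch only if the codeword is inside
                    lam = log(p_eq) if b == y[i] else log(1 - p_eq)
                    successors.append((metric + lam, lo, hi, bits + [b]))
        # keep the M paths with the best partial MAP metric
        nodes = sorted(successors, key=lambda t: t[0], reverse=True)[:M]
    return nodes[0][3]

x = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
q0, q1 = Fraction(7, 10), Fraction(2, 5)   # overlapped: q0 + q1 = 1.1 > 1
c = dac_encode(x, q0, q1)
# sanity check with perfectly matching side information (y = x)
assert dac_decode(c, len(x), q0, q1, x, p_eq=0.9) == x
```

When the codeword falls in the overlapped region, both branches survive and the side information breaks the tie through the metric, exactly as in the tree search described above.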
IV. RATE ALLOCATION AND CHOICE OF THE OVERLAP FACTOR
The length of the codeword $C'_X$ is determined by the length $|I'_N|$ of the final interval, which in turn depends on how much $\tilde p_{X0}$ and $\tilde p_{X1}$ are larger than $p_{X0}$ and $p_{X1}$. As a consequence, in order to select the desired rate, it is important to quantitatively determine the dependence of the expected rate on the overlap, because this will drive the selection of the desired amount of overlap. Moreover, we also need to understand how to split the overlap in order to achieve good decoding performance. In the following we derive the expected rate obtained by the DAC as a function of the set of input probabilities and the amount of overlap.

A. Calculation of the rate yielded by DAC
We are interested in finding the expected rate $\tilde R$ (in bps) of the codeword used by the DAC to encode the sequence $x$. This is given by the following formula:

$$\tilde R = -\sum_{j=0}^{1} p_{Xj} \log \tilde p_{Xj} \quad (3)$$

This can be derived straightforwardly from the property that the codeword generated by an AC has an expected length that depends on the size of the final interval, that is, on the product of the probabilities $\tilde p_{Xj}$, and hence on the amount of overlap. The expectation is computed using the true probabilities $p_{Xj}$.

We set $\tilde p_{Xj} = \alpha_{Xj} p_{Xj}$, where $\alpha_{Xj} \ge 1$, so that $\tilde p_{X0} + \tilde p_{X1} \ge 1$. This amounts to enlarging each interval by an amount proportional to the overlap factors $\alpha_{Xj}$. The expected rate achieved by the
DAC becomes

$$\tilde R = \sum_{j=0}^{1} p_{Xj} \left( r_{Xj} - \delta_{Xj} \right)$$

where $r_{Xj} = -\log p_{Xj}$, and $\delta_{Xj} = \log \alpha_{Xj}$. Note that $r_{Xj}$ represents the rate contribution of symbol $j$ yielded by standard AC, while $\delta_{Xj}$ represents the decrease of this contribution, i.e., the average number of bits saved in the binary representation of the $j$-th input symbol.

B. Design of the overlap factors
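Before turning to the design of the overlap factors, the rate expression of Sect. IV-A can be sanity-checked numerically; the probabilities and factors below are illustrative values of our own choosing:

```python
from math import log2, isclose

p = [0.8, 0.2]        # source probabilities p_X0, p_X1 (illustrative)
alpha = [1.1, 1.5]    # overlap factors alpha_Xj >= 1
q = [a * pj for a, pj in zip(alpha, p)]   # enlarged probabilities p~_Xj

# Eq. (3): expected rate with the enlarged probabilities ...
R = -sum(pj * log2(qj) for pj, qj in zip(p, q))
# ... equals the per-symbol AC rate r_Xj minus the saving delta_Xj
R_split = sum(pj * (-log2(pj) - log2(aj)) for pj, aj in zip(p, alpha))

assert isclose(R, R_split)
assert q[0] + q[1] >= 1                       # the sub-intervals overlap
assert R < -sum(pj * log2(pj) for pj in p)    # rate below classical AC
```

The last assertion reflects the whole point of the overlap: every $\alpha_{Xj} > 1$ shaves $\delta_{Xj} = \log \alpha_{Xj}$ bits off symbol $j$'s contribution.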
Once a target rate has been selected, the problem arises of selecting $\alpha_{Xj}$. As an example, a possible choice is to take equal overlap factors $\alpha_{X0} = \alpha_{X1} = \alpha_X$. This implies that each interval is enlarged by a factor $\alpha_X$ that does not depend on the source probability $p_{Xj}$. This leads to a target rate

$$R'_X = H(X) - \log \alpha_X. \quad (4)$$

It can be shown that this choice minimizes the rate $\tilde R$ for a given total amount of overlap $\alpha_{X0} p_{X0} + \alpha_{X1} p_{X1} - 1$; the computations are simple and are omitted for brevity. This choice is not necessarily optimal in terms of the decoder error probability. However, optimizing for the error probability is impractical because of the nonlinearity of the arithmetic coding process.

In practice, one also has to make sure that the enlarged intervals $[0, \alpha_{X0} p_{X0})$ and $[1 - \alpha_{X1} p_{X1}, 1)$ are both contained inside the $[0,1)$ interval. E.g., taking equal overlap factors as above does not guarantee this. We have devised the following rule that allows one to achieve any desired rate satisfying the constraint above. We apply the following constraint:

$$\frac{\delta_{Xj}}{r_{Xj}} = k_X \quad (5)$$

with $k_X$ a positive constant independent of $j$. This leads to

$$\alpha_{Xj} = (p_{Xj})^{-k_X} \quad (6)$$

This can be interpreted as an additional constraint that the rate reduction for symbols "0" and "1" depends on their probabilities, i.e., the least probable symbol undergoes a larger reduction. Using (6), it can be easily shown that the expected rate achieved by the DAC can be written as

$$\tilde R = \left( 1 - k_X \right) H(X). \quad (7)$$

Thus, the allocation problem for an i.i.d. source is very simple. We assume that the conditional entropy $H(X|Y)$ is available as in Fig. 1-b, modeling the correlation between $X$ and $Y$. In asymmetric DSC, $x$ should be ideally coded at a rate arbitrarily close to $H(X|Y)$. In practice, due to the suboptimality of any practical coder, some margin $\mu \ge 1$ should be taken.
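For instance, combining (6), (7) and the margin $\mu$, one can solve for $k_X$ in closed form; a numeric sketch with assumed values for $p_{X0}$, $H(X|Y)$ and $\mu$:

```python
from math import log2, isclose

def binary_entropy(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

p_X0 = 0.8                      # assumed source probability
H_X = binary_entropy(p_X0)      # H(X), about 0.722 bits
H_XgY = 0.35                    # assumed known correlation, as in Fig. 1-b
mu = 1.05                       # margin over the S-W bound

# solve (1 - k_X) H(X) = mu * H(X|Y) for k_X, cf. Eq. (7)
k_X = 1 - mu * H_XgY / H_X
assert 0 < k_X < 1

# overlap factors from Eq. (6), enlarged probabilities, achieved rate
alpha = [p ** (-k_X) for p in (p_X0, 1 - p_X0)]
q = [a * p for a, p in zip(alpha, (p_X0, 1 - p_X0))]
rate = -sum(p * log2(qj) for p, qj in zip((p_X0, 1 - p_X0), q))
assert isclose(rate, mu * H_XgY)   # target rate is met
assert q[0] + q[1] >= 1            # the sub-intervals do overlap
```

Note that with rule (6) the enlarged probabilities become $\tilde p_{Xj} = p_{Xj}^{1-k_X} \ge p_{Xj}$, so the validity constraint is automatically satisfied for $0 \le k_X \le 1$.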
Hence, we assume that the allocation problem can be written as $(1 - k_X) H(X) \le \mu H(X|Y)$. Since $\mu$ is a constant and $H(X|Y)$ and $H(X)$ are given, one can solve for $k_X$ and then perform the encoding process.

Finally, it should be noted that, while we have assumed that $X$ and $Y$ are i.i.d., the DAC concept can be easily extended to a nonstationary source. This simply requires considering all probabilities and overlap factors as depending on the index $i$; all computations, including the design of the overlap factors and the derivation of the target rate, can be extended straightforwardly. A possible application is represented by context-based coding or Markov modeling of correlated sources. There is one caveat though, in that, if the probabilities and context of each symbol are computed by the decoder from past symbols, decoding errors can generate significant error propagation.

V. DISTRIBUTED ARITHMETIC CODING: THE SYMMETRIC CASE
A. Symmetric DAC encoding and rate allocation
In many applications, it is preferable to encode the correlated sources at similar rather than unbalanced rates; in this case, symmetric S-W coding can be used. Considering a pair of sources, in symmetric S-W coding both $X$ and $Y$ are encoded using separate DACs. We denote as $C'_X$ and $C'_Y$ the codewords representing $X$ and $Y$, and $R'_X$ and $R'_Y$ the respective rates. With DAC, the rates of $X$ and $Y$ can be adjusted with a proper selection of the parameters $k_X$ and $k_Y$ for the two DAC encoders. However, it should be noted that, for the same total rate, not all possible choices of $k_X$ and $k_Y$ are equally good, because some of them could complicate the decoder design, or be suboptimal in terms of error probability. To highlight the potential problems of a straightforward extension of the asymmetric DAC, let us assume that $k_X$ and $k_Y$ can be chosen arbitrarily. This would require a decoder that performs a search in a symbol-synchronous tree where each node represents two sequential decoder states $(\sigma_{Xi}, \sigma_{Yi})$ for $X$ and $Y$ respectively. If the interval selection is ambiguous for both sequences, the four possible binary symbol pairs (00, 01, 10, 11) need to be included in the search space; this would accelerate the exponential growth of the tree, and quickly make the decoder search unfeasible. This example shows that some constraints need to be put on $k_X$ and $k_Y$ in order to limit the growth rate of the search space.

To overcome this problem, we propose an algorithm that applies the idea of time-sharing to the DAC. The concept of a time-shared DAC has been preliminarily presented in [35] for a pair of sources in the subcase $R'_X = R'_Y$, i.e., providing only the mid-point of the S-W rate region. In the following we extend this to an arbitrary combination of rates, and show how this can be generalized to an arbitrary number of sources. For two sources, the idea is to divide the set of input indexes
, N − in two disjoint sets such that, at each index i , ambiguity is introduced in at most one out of the twosources. In particular, for sequences x and y of length N , let A X and A Y be the subsets of even and EEE TRANSACTIONS ON SIGNAL PROCESSING (RESUBMITTED NOVEMBER 2008) 13 odd integer numbers in { , . . . , N − } respectively. We employ a DAC on x and y , but the choiceof parameters k X and k Y differs. In particular, we let the parameters depend on the symbol index i , i.e., k Xi and k Yi . The DAC of x employs parameter k Xi = k X ≥ for all i ∈ A X , and k Xi = 0 otherwise. Vice versa, y is encoded with parameter k Yi = k Y ≥ for all i ∈ A Y , and k Yi = 0 otherwise. As a consequence of these constraints, at each step of the decoding process, ambiguityappears in at most one out the two sequences. In this way, the growth rate of the decoding treeremains manageable, as no more than two new states are generated at each transition, exactly asin the asymmetric DAC decoder; this also makes the MAP metric simpler. The conceptual relationwith time-sharing is evident. Since, during the DAC encoding process, for each input symbol theambiguity is introduced in at most one out the two encoders, this corresponds to switching the roleof side information between either source on a symbol-by-symbol basis.By varying the parameters k X and k Y , all combinations of rates can be achieved. The achievedrates can be derived repeating the same computations described in Sect. IV, and can be expressed as R ′ X = (cid:16) − k X (cid:17) H ( X ) and R ′ Y = (cid:16) − k Y (cid:17) H ( Y ) . The rate allocation problem amounts to selectingsuitable rates R ′ X and R ′ Y such that R ′ X ≥ H ( X | Y ) , R ′ Y ≥ H ( Y | X ) , and R ′ X + R ′ Y ≥ H ( X, Y ) . Inpractice one will typically take some margin µ ≥ , such that R ′ X + R ′ Y = µH ( X, Y ) ; for safety, amargin should also be taken on R ′ X and R ′ Y with respect to the conditional entropy. 
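As a rough illustration of this allocation, the sketch below (hypothetical helper names; an even split of the total budget µH(X, Y) between the two sources is an assumption, not the paper's rule) picks target rates and recovers the overlap parameters from R′ = (1 − k)H(·), together with the even/odd time-sharing index sets:

```python
from math import log2

def hb(p):
    """Binary entropy H_b(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def allocate_symmetric(h_x, h_y, h_x_given_y, h_y_given_x, mu=1.0):
    """Split the total budget mu*H(X,Y) between the two sources and solve
    R' = (1 - k) H(.) for the per-source overlap parameters.
    Illustrative sketch only: the even split is an assumption."""
    total = mu * (h_x + h_y_given_x)            # mu * H(X,Y)
    # even split, clamped so that R'_X >= mu*H(X|Y) and R'_X <= H(X)
    r_x = max(mu * h_x_given_y, min(total / 2.0, h_x))
    r_y = total - r_x
    k_x = 1.0 - r_x / h_x                       # from R'_X = (1 - k_X) H(X)
    k_y = 1.0 - r_y / h_y                       # from R'_Y = (1 - k_Y) H(Y)
    return r_x, r_y, k_x, k_y

# time-sharing index sets: ambiguity in at most one source per index
N = 200
A_X = [i for i in range(N) if i % 2 == 0]       # x may be ambiguous here
A_Y = [i for i in range(N) if i % 2 == 1]       # y may be ambiguous here
```

For two balanced sources with H(X) = H(Y) = 1 and H(X|Y) = H(Y|X) = hb(q) for a BSC correlation q, this yields R′_X = R′_Y and k_X = k_Y, i.e., the mid-point of the S-W rate region.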
Since the prior probabilities of X and Y are given, one can solve for k_X and k_Y, and then perform the encoding process. Thus, the whole S-W rate region can be swept.

B. Decoding process for symmetric DAC
Similarly to the asymmetric case, the symmetric decoding process can be viewed as a search along a tree; however, specifically for the case of two correlated sources, each node in the tree represents the decoding states (σ^X_i, σ^Y_i) of two sequential arithmetic decoders for x and y respectively. At each iteration, sequential decoding is run from both states. The time-sharing approach guarantees that, for a given index i, ambiguity can be found in only one of the two decoders. Therefore, at most two branches must be considered, and the tree can be constructed using the same functions introduced in Sect. III for the asymmetric case. This would be the same also for P sources. In particular, for i ∈ A_X, Test-One-Symbol(σ^Y_i) yields an unambiguous symbol ỹ_i, whereas ambiguity can be found only while attempting decoding for x with Test-One-Symbol(σ^X_i). In conclusion, from the node (σ^X_i, σ^Y_i) the function Test-One-Symbol is used on both states. If ambiguity is found on x̃_i, Force-One-Symbol is then used to explore the two alternative paths for x̃_i, whereas ỹ_i is used as side information for branch metric evaluation. In the case that i ∈ A_Y, the roles of x and y are exchanged. Therefore, Algorithm 1 can be easily extended to the symmetric case by alternately probing either x or y for ambiguity, and possibly generating a branching. The joint probability distribution can be written as

P(X_0 = x̃_0, . . . , X_{N−1} = x̃_{N−1}, Y_0 = ỹ_0, . . . , Y_{N−1} = ỹ_{N−1} | C′_X, C′_Y) = ∏_{i ∈ A_X} P(X_i = x̃_i | Y_i, C′_X, C′_Y) ∏_{i ∈ A_Y} P(Y_i = ỹ_i | X_i, C′_X, C′_Y)   (8)

The symmetric encoder and decoder can be easily generalized to an arbitrary number P of sources. The idea is to identify P subsets of input indexes i = 0, 1, . . . , N − 1 such that, at each symbol index i, ambiguity is introduced in at most one of the P sources. In particular, for sequences x^(1), . . . , x^(P) of length N, let A_1, . . . , A_P be disjoint subsets of {0, 1, . . . , N − 1}. We denote the DAC parameters as k^(1)_i, . . . , k^(P)_i. The DAC of x^(j) employs parameter k^(j)_i = k^(j) > 0 for all i ∈ A_j, and k^(j)_i = 0 otherwise. As a consequence of these constraints, at each step of the decoding process, ambiguity appears in at most one of the P sequences. Note that this formulation also encompasses the case that one or more sources are independent of all the others; such sources can be coded with a classical AC, taking A_j = ∅ for each of them.

The selection of the sets A_j and the overlap factors k^(j), for j = 1, . . . , P, is still somewhat arbitrary, as the expected rate of source j depends on both the cardinality of A_j and the value of k^(j). In a realistic application it would be more practical to fix the sets A_j once and for all, and to modify the parameters k^(j) so as to obtain the desired rate. This is because, for time-varying correlations, one has to update the rate on-the-fly. In a distributed setting, varying one parameter k^(j) requires communicating the change only to source j, while varying the sets A_j requires communicating the change to all sources. Therefore, we define the A_j such that the P statistically dependent sources take in turns the role of the side information. Any additional independent sources are coded separately using A_j = ∅. In particular, we set A_j = {k | k % P = j}, where % denotes the remainder of the division between two integers, and j = 0, . . . , P − 1. The DAC encoder for the j-th source inserts ambiguity only at time instants i ∈ A_j. At each node, the decoder stores the states of the P arithmetic decoders, and possibly performs a branching if the codeword related to the only potentially ambiguous symbol at the current time i is actually ambiguous. Although this encoding and decoding structure is not necessarily optimal, it does lead to a viable decoding strategy.

VI. RESULTS: ASYMMETRIC CODING
In the following we provide results of a performance evaluation carried out on DAC. We implement a communication system that employs a DAC and a joint decoder, with no feedback channel; at the decoder, pruning is performed using the M-algorithm [39], with M = 2048. The side information is obtained by sending the source X through a binary symmetric channel with transition probability p, which measures the correlation between the two sources. We simulate a source with both balanced (p = 0.5) and skewed (p > 0.5) symbol probabilities. The first setting implies H(X) = H(Y) = 1 and H(X, Y) = 1 + H(X|Y), where H(X|Y) depends on p. The closer p is to 0.5, the less correlated the sources, and hence the higher H(X|Y). In the skewed case, given p, H(X) is fixed, whereas both H(Y) and H(X|Y) depend on p. Unless otherwise specified, each point of the figures/tables presented in the following has been generated by averaging the results obtained encoding samples.

A. Effect of termination
As a first experiment, the benefit of the termination policy is assessed. An i.i.d. stationary source X emits sequences x of N = 200 symbols, with p = 0.5 and H(X|Y) = 0., which are encoded with DAC at a fixed rate of 0.5 bps, i.e., . bps higher than the theoretical S-W bound. For Y we assume ideal lossless encoding at average rate H(Y) = 1 bps, so that the total average rate of X and Y is 1.5 bps. The bit error rate (BER) yielded by the decoder is measured for increasing values of the number of termination symbols T. The same simulation is performed with N = 1000. In all simulated cases, the DAC overlap has been selected to compensate for the rate penalty incurred by the termination, so as to achieve the 1.5 bps overall target rate. The overlap factors α_Xj are selected according to (6).

The results are shown in Fig. 4; it can be seen that the proposed termination is effective at reducing the BER. There is a trade-off in that, for a given rate, increasing T reduces the effect of errors in the last symbols, but requires the intervals to be overlapped more. It is also interesting to consider the position of the first decoding error as, without termination, errors tend to cluster at the end of the block. For N = 200, the mean position is 191, 178, 168, 161 and 95, with standard deviation 13, 18, 25, 36 and 49, respectively for T equal to 0, 5, 10, 15 and 20. For N = 1000, the mean position is 987, 954, 881, 637 and 536, with standard deviation 57, 124, 229, 308 and 299. The optimal values of T are around 15-20 symbols. Therefore, we have selected T = 15 and used this value for all the experiments reported in the following.

B. Effect of the overlap design rule
Fig. 4. BER as a function of T (number of termination symbols); p = 0.5, total rate = 1.5 bps, rate of x = 0.5 bps, H(X|Y) = 0..

Next, an experiment has been performed to validate the theoretical analysis of the effects of different overlap designs presented in Sect. IV-B. In Fig. 5 the performance obtained by using the designs of equations (4) and (6) respectively is shown. The experimental settings are N = 200, p = 0.5, fixed rate for x of 0.5 bps, and total average rate for X and Y equal to 1.5 bps, with ideal lossless encoding of Y at rate H(Y). The BER is reported as a function of the source correlation expressed in terms of H(X, Y). It is worth noticing that the performance yielded by the different overlap design rules is almost equivalent. Note that the rule in (6) consistently outperforms that in (4), confirming that the latter is only optimal for the rate. There is some difference when H(X, Y) is very high (i.e., for weakly correlated sources). However, this case is of marginal interest since the performance is poor (the BER is of the order of 0.1).

C. Performance evaluation at fixed rate
The performance of the proposed system is compared with that of a system where the DAC encoder and decoder are replaced by a punctured turbo code similar to that in [6]. We use turbo codes with generators (17,15) octal (8 states) and (31,27) octal (16 states), employ S-random interleavers, and run 15 decoder iterations. We consider the case of a balanced source (p = 0.5) and a skewed source (in particular p = 0. and p = 0.). For a skewed source, as an improvement with respect to [6], the turbo decoder has been modified by adding the a priori term to the decoder metric, as done in [16]. Block sizes N = 50, N = 200 and N = 1000 have been considered (with S-random interleaver spreads of 5, 11 and 25 respectively); this allows us to assess the DAC performance at small and medium block lengths. Besides turbo codes, we also considered the rate-compatible LDPC codes proposed in [21]. For these codes, a software implementation is publicly available on the web; among the available pre-designed codes, we used the matrix for N = 396, which is comparable with the block lengths considered for the DAC and the turbo code.

The results are worked out in a fixed-rate coding setting as in [14], i.e., the rate is the same for each sample realization of the source.

Fig. 5. Performance comparison between the use of different overlap rules (p = 0.5, total rate = 1.5 bps).

Fig. 6 reports the results for the balanced source case; the abscissa is H(X, Y), and is related to p. The performance is measured in terms of the residual BER after decoding, which is akin to the distortion in the Wyner-Ziv binary coding problem with Hamming metric. Both the DAC and the turbo code generate a description of x at a fixed rate of 0.5 bps; the total average rate of X and Y is 1.5 bps, with ideal lossless encoding of Y at rate H(Y). Since H(Y) = 1, we also have that H(X, Y) = 1 + H(X|Y). This makes it possible to compare these results with the case of skewed sources presented later in this section, so as to verify that the performance is uniformly good for all distributions. The Wyner-Ziv bound for a doubly symmetric binary source with Hamming metric is also reported for comparison.

As can be seen, the performance of DAC slightly improves as the block length increases. This is mostly due to the effect of the termination. As the number of bits used to terminate the encoder is chosen independently of the block length, the rate penalty for not overlapping the last bits weighs more when the block length is small, while the effect vanishes for large block lengths. In [34], where the termination effect is not considered, the performance is shown to be almost independent of the block size. It should also be noted that the value of M required for near-optimal performance grows exponentially with the block size. As a consequence, the memory which leads to near-optimal performance for N = 50 or N = 200 limits the performance for N = 1000.
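The M-algorithm pruning used by the decoder can be sketched generically as follows. Here `step_fn` is a hypothetical stand-in for the DAC tree expansion (one successor when decoding a symbol is unambiguous, two when the interval selection is ambiguous), and the cumulative metric plays the role of the MAP path metric:

```python
import heapq

def m_algorithm_decode(step_fn, init_state, n_steps, M=2048):
    """Generic M-algorithm (breadth-first tree search with pruning): at each
    symbol index only the M paths with the best cumulative metric survive.
    `step_fn(state, i)` must return a list of (branch_metric, next_state)
    pairs for the path being extended. Illustrative interface only."""
    paths = [(0.0, init_state)]                  # (cumulative metric, state)
    for i in range(n_steps):
        expanded = [(m + bm, s2)
                    for m, s in paths
                    for bm, s2 in step_fn(s, i)]
        paths = heapq.nlargest(M, expanded, key=lambda t: t[0])  # prune to M
    return max(paths, key=lambda t: t[0])        # best surviving path
```

With this structure, memory and time grow linearly in M, which is consistent with the trade-off between M and performance discussed above.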
We compared both 8-state and 16-state turbo codes. The 8-state code is often used in practical applications, as it exhibits a good trade-off between performance and complexity; the 16-state code is more powerful, but requires more computation. It can be seen that, for block lengths N = 50 and N = 200, the proposed system outperforms both the 8-state and 16-state turbo codes. For block length N = 1000, the DAC performs better than the 8-state turbo code, and is equivalent to the 16-state code. It should be noted that, in this experiment, only the “channel coding performance” of the DAC is tested, since for the balanced source no compression is possible as H(X) = 1. Consequently, it is remarkable that the DAC turns out to be generally more powerful than the turbo code at equal block length. Note that the performance of the 16-state code is limited by the error floor, and could be improved using an ad-hoc design of the code or the interleaver; the DAC has no error floor, but its waterfall is less steep. For H(X|Y) ≥ ., a result not reported in Fig. 6 shows that the DAC with N = 200 and N = 1000 also outperforms the 8-state turbo coder with N = 5000. In Fig. 6 and in the following, it can be seen that turbo codes do not show the typical cliff effect. This is due to the fact that, at the block lengths considered in this paper, the turbo code is still very far from capacity; its performance improves for larger block lengths, where the cliff effect can be seen. In terms of the rate penalty, setting a residual BER threshold of −, for N = 200 the DAC is almost 0.3 bps away from the S-W limit, while the best 16-state turbo code simulated in this paper is 0.35 bps away; for N = 1000 the DAC is 0.26 bps away, while the best 8-state turbo code is 0.30 bps away. The performance of the LDPC code for N = 396 is halfway between the turbo codes for N = 200 and N = 1000, and hence very similar to the DAC.

The results for a skewed source are reported in Fig. 7 for p = 0..
In this setting, we select various values of H(X, Y), and encode x at a fixed rate such that the total average rate for X and Y equals 1.5 bps, with ideal lossless encoding of Y at rate H(Y). For Fig. 7, from left to right, the rates of x are respectively 0.68, 0.67, 0.66, 0.64, 0.63, 0.61, 0.59, and 0.58 bps. Consistently with [30], all turbo codes considered in this work perform rather poorly on skewed sources. In [30] this behavior is explained by the fact that, when the source is skewed, the states of the turbo code are used with uneven probability, leading to a smaller equivalent number of states. On the other hand, the DAC performs well also for skewed sources, as it is designed to work with unbalanced distributions. The performance of the LDPC codes is similar to that of the best turbo codes, and slightly worse than the DAC.

Fig. 6. Performance comparison of data communication systems (p = 0.5, total rate = 1.5 bps, rate for x = 0.5 bps): DAC versus turbo coding, balanced source. DAC: distributed arithmetic coding; TC8S and TC16S: 8- and 16-state turbo code with S-random interleaver; LDPC-R and LDPC-I: regular and irregular LDPC codes from [21].

Similar remarks can be made in the case of p = 0., which is reported in Fig. 8. In this case, we have selected a total rate of 1 bps, since the source is more unbalanced and hence easier to compress. The rates for x are respectively 0.31, 0.34, 0.37, 0.39, 0.42, 0.44, and 0.47 bps. In this case the turbo code performance is better than in the previous case, although it is still poorer than that of DAC. This is due to the fact that the sources are more correlated, and hence the crossover probability on the virtual channel is lower. Therefore, the turbo code has to correct a smaller number of errors, whereas for p = 0. the correlation was weaker and hence the crossover probability was higher.

D. Performance evaluation for strongly correlated sources
We also considered the case of strongly correlated sources, for which high-rate channel codes are needed. Such sources are a good model for the most significant bit-planes of several multimedia signals. Due to the inefficiency of syndrome-based coders, practical schemes often assume that no DSC is carried out on those bit-planes, e.g., they are not transmitted, and at the decoder they are directly replaced by the side information [9].

The results are reported in Tab. I for the DAC and the 16-state turbo code, when a rate of 0.1 bps is used for x. The table also reports the crossover probability p, corresponding, for a balanced source, to the performance of an uncoded system that reconstructs x as the side information y. As can be seen, the DAC has performance similar to the turbo codes and LDPC codes, and becomes better when the source is extremely correlated, i.e., H(X|Y) = 0..

Fig. 7. Performance comparison of data communication systems (p = 0., total rate = 1.5 bps): DAC versus turbo coding, skewed source. DAC: distributed arithmetic coding; TC8S and TC16S: 8- and 16-state turbo code with S-random interleaver; LDPC-R and LDPC-I: regular and irregular LDPC codes from [21].

Fig. 8. Performance comparison of data communication systems (p = 0., total rate = 1 bps): DAC versus turbo coding, skewed source. DAC: distributed arithmetic coding; TC8S and TC16S: 8- and 16-state turbo code with S-random interleaver; LDPC-R and LDPC-I: regular and irregular LDPC codes from [21].

E. Performance evaluation at variable rate
Finally, the coding efficiency of DAC is measured in terms of the expected rate required to achieve error-free decoding. This amounts to re-encoding the sequence at increasing rates, and represents the optimal DAC performance if the encoder could exactly predict the decoder behavior. Since each realization of the source is encoded using a different number of bits, this case is referred to as variable-rate encoding. This scenario is representative of practical distributed compression settings,
e.g., [6], in which one seeks the shortest code that allows each realization of the source process to be reconstructed without errors.

For this simulation, the following setup is used. The source correlation H(X|Y) is kept constant and, for each sample realization of the source, the total rate is progressively increased beyond the S-W bound, in steps of 0.01 bps, until error-free decoding is obtained. This operation is repeated on 1000 different realizations of the source; the mean value and standard deviation of the rates yielding correct decoding are then computed.

TABLE I
RESIDUAL BER IN CASE OF STRONGLY CORRELATED SOURCES, WITH p = 0. AND RATE FOR x EQUAL TO 0.1 BPS. (For each of N = 200 and N = 1000, the table lists H(X|Y), p, and the residual BER of DAC and TC16S; for N = 396, the residual BER of LDPC-R and LDPC-I.)

The results have been worked out for block length N = 200, with probabilities p = 0.5 and p = 0.1. For p = 0.5, the conditional entropy H(X|Y) (i.e., the S-W bound) has been set to 0.5 bps. For p = 0.1, the joint entropy H(X, Y) has been set to 1 bps; this amounts to coding Y at the ideal rate of H(Y) ≃ . bps, with a S-W bound H(X|Y) ≃ . bps.

The results are reported in Tab. II. As can be seen, the DAC has a rate loss of about 0.06 bps with respect to the S-W bound for both the symmetric and the skewed source. The turbo code exhibits a loss of about 0.2 bps and 0.13 bps. The LDPC-R code has a relatively small loss, i.e., 0.12 bps in the symmetric case and 0.10 in the skewed one. The LDPC-I code has a slightly smaller loss, i.e., 0.09 bps in the symmetric case and 0.075 in the skewed one. However, the DAC still performs slightly better. It should be noted that, while for LDPC and turbo codes the encoding is done only once thanks to rate-compatibility, for the DAC multiple encodings are necessary, leading to higher complexity.
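The variable-rate procedure described above can be sketched as follows, with `encode` and `decode` as hypothetical stand-ins for the DAC encoder and the joint sequential decoder:

```python
from statistics import mean, stdev

def min_rate_for_lossless(x, y, encode, decode, r_start, r_step=0.01, r_max=8.0):
    """Re-encode one realization at rates increasing in r_step increments
    (0.01 bps in the experiments above) until the joint decoder, given the
    side information y, reproduces x exactly; return that rate."""
    rate = r_start
    while rate <= r_max:
        code = encode(x, rate)
        if decode(code, y) == x:
            return rate
        rate += r_step
    raise RuntimeError("no decodable rate found below r_max")

# over many realizations one would then report mean(rates) and stdev(rates)
```

Repeating this over 1000 realizations and taking `mean` and `stdev` of the returned rates reproduces the statistics reported in Tab. II.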
TABLE II
PERFORMANCE COMPARISON FOR VARIABLE-RATE CODING: MEAN AND STANDARD DEVIATION OF THE RATE NEEDED FOR LOSSLESS COMPRESSION. (Columns for p = 0.5 and p = 0.1; rows report H(X|Y), H(X, Y), and the rates for N = 200, N = 396, and N = 1000.)

F. Performance versus complexity
As has been said, the DAC performance is a function of the block size and especially of the decoder parameter M. Tab. III reports comparative decoding results of DAC, turbo and LDPC codes for various values of M and N. The simulations have been made under the same conditions as Fig. 6, i.e., p = 0.5, total average rate equal to 1.5 bps, and fixed rate of x equal to 0.5 bps, considering the case of H(X|Y) = 0.. Tab. III reports the residual BER, and the running time in milliseconds, obtained by running the different decoders on a workstation with a Pentium IV 3 GHz processor running Windows XP.

As can be seen, the DAC complexity grows exponentially with M. Increasing M typically improves performance, and the improvement is larger as N increases. Comparing DAC and turbo codes at approximately equal computation time, it can be seen that, for N = 50 and N = 200, the DAC performance is significantly better, while the turbo code outperforms DAC for N = 1000. For LDPC codes, the results for N = 396 can be compared with the DAC for N = 200. It can be seen that, with similar computation time, DAC and LDPC codes have similar performance. The BER yielded by the LDPC code is four times smaller than that of DAC, although it would increase going from N = 396 to N = 200.

VII. RESULTS: SYMMETRIC CODING
In the following we provide results for the symmetric DAC. We consider two sources with balanced (p = 0.5) and unbalanced (p = 0.1) distributions, with arbitrary rate splitting, and use M = 2048.
TABLE III
DECODER COMPLEXITY AND PERFORMANCE FOR DAC, TURBO CODES AND LDPC CODES. (For DAC, rows cover N = 50, N = 200 and N = 1000 with M = 64, 256, 512, 1024 and 2048; the turbo decoders use 15 iterations at each block length; the LDPC decoders use 100 iterations with N = 396. Columns: algorithm, parameter, BER, and time in ms.)

A. Performance evaluation at fixed rate
For fixed rate, we set the total rate of x and y equal to 1.5 bps. We consider two cases of rate splitting. In the first case the rate is equally split; we choose k_X = k_Y so as to achieve a rate of 0.75 bps for each source. In the second case we encode x at 0.6 bps and y at 0.9 bps.

The performance of the symmetric DAC is worked out for N = 200 and N = 1000. Since symmetric DSC coders typically reconstruct each sequence either without any errors or with a large number of errors [28], we report the frame error rate (FER) instead of the residual BER, i.e., the probability that a data block contains at least one error after joint decoding. For each point, we simulated at least bits.

Fig. 9 shows the results for the symmetric DAC. Comparisons with other algorithms can be made based on the following remarks. In [31], a symmetric S-W coder employing turbo codes is proposed, which can achieve any rate splitting. In the case that one source is encoded without ambiguity, this reduces to the asymmetric turbo-based S-W coder we have employed in Sect. VI. In [31] it is reported that this algorithm achieves its best performance in the asymmetric points of the S-W region, while it is slightly poorer in the intermediate points. Therefore, in Fig. 9 we report the FER corresponding to the best turbo code shown in Fig. 6 for N = 200 and N = 1000, as this lower-bounds the FER achieved by [31] over the entire S-W region. Moreover, we also report the FER achieved by irregular LDPC codes with block length N = 396 [21]. The asymmetric algorithm in [21] has been extended in [33] to arbitrary rate splitting, showing that the performance is uniformly good over the entire S-W region. Finally, we also report the FER curve of the asymmetric DAC for N = 1000.
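The FER figure of merit used above (a block counts as erroneous if it contains at least one symbol error after joint decoding) can be written as a one-line estimator:

```python
def frame_error_rate(decoded_blocks, true_blocks):
    """Fraction of blocks with at least one post-decoding symbol error."""
    assert len(decoded_blocks) == len(true_blocks)
    wrong = sum(1 for d, t in zip(decoded_blocks, true_blocks) if d != t)
    return wrong / len(true_blocks)
```

Note that, for a fixed BER, shorter blocks yield a lower FER, which is consistent with the comparison between N = 200 and N = 1000 discussed below.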
Fig. 9. Performance comparison of data communication systems (p = 0.5, total rate = 1.5 bps). Curves: symmetric DAC (N = 200, 1000) with R_X = R_Y = 0.75 bps and with R_X = 0.6 bps, R_Y = 0.9 bps; asymmetric TC16S (N = 200, 1000); asymmetric DAC (N = 1000); asymmetric LDPC-I (N = 396). DAC: distributed arithmetic coding; TC16S: 16-state turbo code with S-random interleaver; LDPC-I: irregular LDPC codes from [21].

In Fig. 9, the results for symmetric coding are very similar to what has been observed in the asymmetric case. The DAC achieves very similar BER for N = 200 and N = 1000; hence, the FER is smaller for N = 200. The results are almost independent of the rate splitting between x and y, as can be seen by comparing the two rate-splitting cases as well as the asymmetric DAC. The turbo codes for N = 200 and N = 1000, and the irregular LDPC code, exhibit poorer performance than DAC.

B. Performance evaluation at variable rate
For variable-rate coding, we consider the same two settings as in Sect. VI-E, i.e., block length N = 200, with probabilities p = 0.5 and p = 0.1; in the first case the conditional entropy has been set to 0.5 bps, while in the second case the joint entropy H(X, Y) has been set to 1 bps. The results are shown in Fig. 10. As can be seen, the performance of the symmetric DAC is uniformly good over the entire S-W region, and is significantly better than that of turbo codes and LDPC codes. In particular, the DAC suboptimality is between 0.03-0.06 bps, as opposed to 0.07-0.09 bps for the irregular LDPC code, and 0.14-0.21 bps for the turbo code. It should be noted, however, that variable-rate coding requires feedback, while the S-W bound is achievable with no feedback, with vanishing error probability as N → ∞. In our simulations we re-encode the sequence at increasing rates (in steps of 0.01 bps), which represents the optimal DAC performance if the encoder could exactly predict the decoder behavior.

Fig. 10. Performance comparison at variable rate. The curves in the top-right corner refer to the case of p = 0.5, and those in the bottom-left corner to p = 0.1. DAC: distributed arithmetic coding; TC16S: 16-state turbo code with S-random interleaver; LDPC-I: irregular LDPC codes from [21]. The solid curves represent the S-W bound.

VIII. DISCUSSION AND CONCLUSIONS
We have proposed DAC as an alternative to existing DSC coders based on channel codes. DAC can operate in the entire S-W region, providing both asymmetric and symmetric coding.

DAC achieves good compression performance, with uniformly good results over the S-W rate region; in particular, its performance is comparable with or better than that of turbo and LDPC codes at small and medium block lengths. This is very important in many applications, e.g., in the
multimedia field, where the encoder partitions the compressed file into small units (e.g., packets in JPEG 2000, slices and NALUs in H.264/AVC) that have to be coded independently.

As for encoding complexity, which is of great interest for DSC, DAC has linear encoding complexity, like a classical AC [41]. Turbo codes and the LDPC codes in [21] also have linear encoding complexity, whereas general LDPC codes typically have more than linear, and typically quadratic, complexity [42]. As a consequence, the complexity of DAC is suitable for DSC applications.

A major advantage of DAC lies in the fact that it can exploit statistical prior knowledge about the source very easily. This is a strong asset of AC, which is retained by DAC. Probabilities can be estimated on-the-fly based on past symbols; context-based models employing conditional probabilities can also be used, as well as other models providing the required probabilities. These models make it possible to account for the nonstationarity of typical real-world signals, which is a significant advantage over DSC coders based on channel codes. In fact, for channel codes, accounting for time-varying correlations requires adjusting the code rate, which can only be done for the next data block, incurring a significant adaptation delay. Moreover, with channel codes it is not easy to take advantage of prior information; for turbo codes this has been shown to be possible [43], employing a more sophisticated decoder.

Another advantage of the proposed DAC lies in the fact that the encoding process can be seen as a simple extension of the AC process. As a consequence, it is straightforward to extend an existing scheme employing AC as the final entropy coding stage in order to provide DSC functionalities.
REFERENCES

[1] D. Slepian and J.K. Wolf, “Noiseless coding of correlated information sources,” IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471–480, July 1973.
[2] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1–10, Jan. 1976.
[3] S.S. Pradhan, J. Chou, and K. Ramchandran, “Duality between source coding and channel coding and its extension to the side information case,” IEEE Transactions on Information Theory, vol. 49, no. 5, pp. 1181–1203, May 2003.
[4] R. Zamir, “The rate loss in the Wyner-Ziv problem,” IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 2073–2084, Nov. 1996.
[5] R. Puri and K. Ramchandran, “PRISM: a “reversed” multimedia coding paradigm,” in Proc. of IEEE International Conference on Image Processing, 2003, pp. 617–620.
[6] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed video coding,” Proceedings of the IEEE, vol. Special Issue on Advances in Video Coding and Delivery, no. 1, pp. 71–83, Jan. 2005.
[7] M. Grangetto, E. Magli, and G. Olmo, “Context-based distributed wavelet video coding,” in Proceedings of IEEE International Workshop on Multimedia Signal Processing, 2005.
[8] C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, and J. Ostermann, “Distributed monoview and multiview video coding,” IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 67–76, Sept. 2007.
[9] Q. Xu and Z. Xiong, “Layered Wyner-Ziv video coding,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3791–3803, Dec. 2006.
[10] H. Wang and A. Ortega, “Scalable predictive coding by nested quantization with layered side information,” in Proceedings of IEEE International Conference on Image Processing, 2004, pp. 1755–1758.
[11] A. Sehgal, A. Jagmohan, and N. Ahuja, “Wyner-Ziv coding of video: an error-resilient compression framework,” IEEE Transactions on Multimedia, vol. 6, no. 2, pp. 249–258, Apr. 2004.
[12] E. Magli, M. Barni, A. Abrardo, and M. Grangetto, “Distributed source coding techniques for lossless compression of hyperspectral images,” EURASIP Journal on Advances in Signal Processing, vol. 2007, 2007.
[13] N.-M. Cheung, C. Tang, A. Ortega, and C.S. Raghavendra, “Efficient wavelet-based predictive Slepian-Wolf coding for hyperspectral imagery,” Signal Processing, vol. 86, no. 11, pp. 3180–3195, Nov. 2006.
[14] Z. Xiong, A.D. Liveris, and S. Cheng, “Distributed source coding for sensor networks,” IEEE Signal Processing Magazine, vol. 21, no. 5, pp. 80–94, Sept. 2004.
[15] S.S. Pradhan and K. Ramchandran, “Distributed source coding using syndromes (DISCUS): Design and construction,” IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 626–643, Mar. 2003.
[16] J. Garcia-Frias and Y. Zhao, “Compression of correlated binary sources using turbo codes,” IEEE Communications Letters, vol. 5, no. 10, pp. 417–419, Oct. 2001.
[17] A.D. Liveris, Z. Xiong, and C.N. Georghiades, “Distributed compression of binary sources using conventional parallel and serial concatenated convolutional codes,” in
Proc. of IEEE Data Compression Conference , 2003, pp. 193–202.[18] R. Gallager,
Low Density Parity Check Codes , MIT Press, 1963.[19] A. Liveris, Z. Xiong, and C. Georghiades, “Compression of binary sources with side information at the decoder usingLDPC codes,”
IEEE Communications Letters , vol. 6, no. 10, pp. 440–442, Oct. 2002.[20] Y. Yang, S. Cheng, Z. Xiong, and W. Zhao, “Wyner-Ziv coding based on TCQ and LDPC codes,” in
Proceedings ofAsilomar Conference on Signals, Systems, and Computers , 2003, pp. 825–829.[21] D. Varodayan, A. Aaron, and B. Girod, “Rate adaptive codes for distributed source coding,”
Signal Processing , vol.86, no. 11, pp. 3123–3130, Nov. 2006.[22] Z. Tu, J. Li, and R.S. Blum, “Compression of a binary source with side information using parallely concatenatedconvolutional codes,” in
Proceedings of IEEE GLOBECOM , 2004.[23] A. Majumdar, J. Chou, and K. Ramchandran, “Robust distributed video compression based on multilevel coset codes,”in
Proceedings of Thirty-Seventh Asilomar Conference on Signals, Systems and Computers , 2003, pp. 845–849.[24] S. Cheng and Z. Xiong, “Successive refinement for the Wyner-Ziv problem and layered code design,”
IEEETransactions on Signal Processing , vol. 53, no. 8, pp. 3269–3281, Aug. 2005.[25] P. Koulgi, E. Tuncel, S.L. Regunathan, and K. Rose, “On zero-error source coding with decoder side information,”
IEEE Transactions on Information Theory , vol. 49, no. 1, pp. 99–111, Jan. 2003.[26] Q. Zhao and M. Effros, “Lossless and near-lossless source coding for multiple access networks,”
IEEE Transactionson Information Theory , vol. 49, no. 1, pp. 112–128, Jan. 2003.[27] S.S. Pradhan and K. Ramchandran, “Generalized coset codes for distributed binning,”
IEEE Transactions onInformation Theory , vol. 51, no. 10, pp. 3457–3474, Oct. 2005.[28] V. Stankovic, A.D. Liveris, Z. Xiong, and C.N. Gheorghiades, “On code design for the Slepian-Wolf problem andlossless multiterminal networks,”
IEEE Transactions on Information Theory , vol. 52, no. 4, pp. 1495–1507, Apr. 2006.[29] N. Gehrig and P.L. Dragotti, “Symmetric and a-symmetric Slepian-Wolf codes with systematic and non-systematiclinear codes,”
IEEE Communications Letters , vol. 9, no. 1, pp. 61–63, Jan. 2005.[30] P. Tan and J. Li, “A general and optimal framework to achieve the entire rate region for Slepian-Wolf coding,”
SignalProcessing , vol. 86, no. 11, pp. 3102–3114, Nov. 2006.[31] J. Garcia-Frias and F. Cabarcas, “Approaching the Slepian-Wolf boundary using practical channel codes,”
SignalProcessing , vol. 86, pp. 3096–3101, 2006.
EEE TRANSACTIONS ON SIGNAL PROCESSING (RESUBMITTED NOVEMBER 2008) 28 [32] M. Sartipi and F. Fekri, “Distributed source coding in wireless sensor networks using LDPC coding: The entireSlepian-Wolf rate region,” in
Proceedings of IEEE WCNC , 2005, pp. 1939–1944.[33] V. Toto-Zarasoa, A. Roumy, and C. Guillemot, “Rate-adaptive codes for the entire Slepian-Wolf region and arbitrarilycorrelated sources,” in
Proceedings of IEEE ICASSP , 2008, pp. 2965–2968.[34] M. Grangetto, E. Magli, and G. Olmo, “Distributed arithmetic coding,”
IEEE Communications Letters , vol. 11, no.11, pp. 883–885, Nov. 2007.[35] M. Grangetto, E. Magli, and G. Olmo, “Symmetric distributed arithmetic coding of correlated sources,” in
Proceedingsof IEEE MMSP , 2007, pp. 111–114.[36] X. Artigas, S. Malinowski, C. Guillemot, and L. Torres, “Overlapped quasi-arithmetic codes for distributed videocoding,” in
Proceedings of IEEE ICIP , 2007, pp. 9–12.[37] P. Howard and J. Vitter, “Practical implementations of arithmetic coding,” in
Image and Text Compression . Norwell,1992.[38] A. Moffatt, R.M. Neal, and I.H. Witten, “Arithmetic coding revisited,”
ACM Transactions on Information Systems ,vol. 16, pp. 256–294, 1995.[39] J.B. Anderson and S. Mohan,
Source and Channel Coding , Kluwer, 1991.[40] M. Grangetto, P. Cosman, and G. Olmo, “Joint source/channel coding and MAP decoding of arithmetic codes,”
IEEETransactions on Communications , vol. 53, no. 6, pp. 1007–1016, June 2005.[41] H. Helfgott and M. Cohn, “Linear-time construction of optimal context trees,” in
Proceedings of IEEE DataCompression Conference , 1998, pp. 369–377.[42] T.J. Richardson and L.R. Urbanke, “Efficient encoding of low-density parity-check codes,”
IEEE Transactions onInformation Theory , vol. 47, no. 2, pp. 638–656, Feb. 2001.[43] J. Garcia-Frias and J.D. Villasenor, “Joint turbo decoding and estimation of hidden Markov sources,”