Statistical Physics of Random Binning∗

Neri Merhav
Department of Electrical Engineering, Technion – Israel Institute of Technology, Technion City, Haifa 32000, Israel. E–mail: [email protected]
Abstract
We consider the model of random binning and finite–temperature decoding for Slepian–Wolf codes, from a statistical–mechanical perspective. While ordinary random channel coding is intimately related to the random energy model (REM) – a statistical–mechanical model of disordered magnetic materials – it turns out that random binning (for Slepian–Wolf coding) is analogous to another, related statistical–mechanical model of strong disorder, which we call the random dilution model (RDM). We use the latter analogy to characterize phase transitions pertaining to finite–temperature Slepian–Wolf decoding, which are somewhat similar, but not identical, to those of finite–temperature channel decoding. We then provide the exact random coding exponent of the bit error rate (BER) as a function of the coding rate and the decoding temperature, and discuss its properties. Finally, a few modifications and extensions of our results are outlined and discussed.
Index Terms
Slepian–Wolf codes, error exponent, bit–error probability, finite–temperature decoding, random energy model, phase transitions, phase diagram.

∗ This research was supported by the Israel Science Foundation (ISF), grant no. 412/12.

1 Introduction
The famous paper by Slepian and Wolf [16], on separate (almost) lossless compression and joint decompression of statistically dependent sources, has triggered an intensive research activity of information theorists during the last forty years. Among its various generalizations and modifications, several recent works have been dedicated to detailed performance analysis, first and foremost, to exponential error bounds for the Slepian–Wolf (SW) decoder. Specifically, Gallager [6] obtained a lower bound on the random coding error exponent associated with random binning, by employing a technique very similar to the one he used in his famous derivation of the random (channel) coding error exponent [5, Sections 5.5–5.6]. A few years later, this error exponent was shown by Csiszár, Körner and Marton [2], [4] to be achievable by a universal decoder, that is, a decoder that is independent of the channel. In [3], Csiszár and Körner studied universally achievable error exponents pertaining to linear codes, as well as ordinary (non–universal) expurgated exponents. Later, Csiszár [1] and Oohama and Han [13] developed error exponents for situations of coded side information. For high rates at one of the two encoders, Kelly and Wagner [7] improved upon these results, but not in the general case.

This paper continues the above described line of work on exponential error bounds associated with random binning of SW codes, but unlike the previous works mentioned above, this one is more oriented to the statistical–mechanical point of view. Specifically, in analogy to the notion of finite–temperature decoding, originally proposed by Ruján [15] in the context of channel coding (see also [14, Section 6.3.3]), here we examine a similar finite–temperature decoder for SW codes, and analyze it from various aspects. In a nutshell, finite–temperature decoding amounts to an optimal symbol–error–probability decoder that is associated with the likelihood function, raised to some power $\beta \ge 0$, a parameter referred to as the inverse temperature, a term borrowed from equilibrium statistical mechanics and the Boltzmann–Gibbs distribution. In channel coding, the motivation for this parametrization by $\beta$ could either stem from uncertainty concerning the channel SNR (which is analogous to the ``temperature'' of the real channel), or from the fact that it enables one to analyze both the optimal symbol–error–probability decoder and the optimal block–error–probability decoder on the same footing ($\beta = 1$ and $\beta \to \infty$, respectively [14, p. 118]). In SW coding, the motivations are similar.

Focusing mostly on the ``one–sided'' version of the Slepian–Wolf setting, where one source is compressed while the other one is available (e.g., after perfect reconstruction) as side information at the decoder, we derive several results in this paper. First, we present a statistical–mechanical model, henceforth referred to as the random dilution model (RDM), which is a natural analogue of the random binning mechanism, exactly in the same way that the random energy model (REM) of statistical mechanics is the physical analogue of the random coding mechanism (see also [8], [14, Chapters 5 and 6] and references therein). The RDM was already mentioned briefly in an earlier work [10], but was not developed in detail therein.

Secondly, in analogy to the phase diagrams of finite–temperature channel coding, provided in [9] and [14, Sect. 6.3.3], here we provide a statistical–mechanical characterization in the form of a phase diagram of SW codes with random binning, in the plane of $R$ vs. $T$, where $R$ is the coding rate and $T = 1/\beta$ is the decoding temperature. As in channel coding, there are three different phases, in which the kinds of behavior of the posterior distribution of the source, given the side information and the bin index, are completely different. We will elaborate on these phases in the sequel. Generally speaking, the phase diagram of a finite–temperature SW decoder appears similar to the ``mirror image'' of that of channel coding, where the axis of the rate $R$ is flipped over. On the one hand, this seems to make sense, in view of the fact that a SW code at rate $R$ is nearly equivalent to a channel code at rate $H - R$, where $H$ is the entropy of the compressed source. However, this equivalence is not perfect, as there are also some non–trivial differences between channel coding and SW coding. Accordingly, there are differences also in the phase diagram, beyond the aforementioned ``mirror reflection''.

Next, we derive the exact exponent of the symbol error probability of the finite–temperature decoder, as a function of $R$ and $\beta$, and we make a few observations concerning the properties of the resulting error exponent, denoted $E(R,\beta)$. It turns out that $E(R,\beta)$ also exhibits phase transitions, which are related to the above mentioned phases of the posterior, but are not identical to them.

Finally, we outline a few extensions and modifications of the above described results, in several directions, including: mismatched decoding, universal decoding, variable–rate coding, and the ``two–sided'' SW problem, where both sources are compressed separately and decompressed jointly.

The outline of the paper is as follows. In Section 2, we establish notation conventions, define the problem setting, and provide some physics background. In Section 3, we derive the phase diagram, and in Section 4, we derive the error exponent of the finite–temperature decoder.
2 Notation Conventions, Problem Setting and Physics Background

2.1 Notation Conventions

Throughout the paper, random variables will be denoted by capital letters, specific values they may take will be denoted by the corresponding lower case letters, and their alphabets will be denoted by calligraphic letters. Random vectors and their realizations will be denoted, respectively, by capital letters and the corresponding lower case letters, both in the bold face font. Their alphabets will be superscripted by their dimensions. For example, the random vector $\mathbf{X} = (X_1,\ldots,X_N)$ ($N$ a positive integer) may take a specific vector value $\mathbf{x} = (x_1,\ldots,x_N)$ in $\mathcal{X}^N$, the $N$–th order Cartesian power of $\mathcal{X}$, which is the alphabet of each component of this vector.

The expectation operator will be denoted by $\mathbf{E}\{\cdot\}$. Logarithms and exponents will be understood to be taken to the natural base unless specified otherwise. The indicator function will be denoted by $\mathcal{I}(\cdot)$. The function $[t]_+$ will be defined as $\max\{t,0\}$. For two positive sequences $\{a_N\}$ and $\{b_N\}$, the notation $a_N \doteq b_N$ will mean asymptotic equivalence in the exponential scale, that is, $\lim_{N\to\infty}\frac{1}{N}\log\left(\frac{a_N}{b_N}\right) = 0$. Similarly, $a_N \stackrel{\cdot}{\le} b_N$ will mean $\limsup_{N\to\infty}\frac{1}{N}\log\left(\frac{a_N}{b_N}\right) \le 0$, and so on.
2.2 Problem Setting

Let $\{(X_i,Y_i)\}_{i=1}^N$ be $N$ independent copies of a random vector $(X,Y)$, distributed according to a given probability mass function $P(x,y)$, where $x$ and $y$ take on values in finite alphabets, $\mathcal{X}$ and $\mathcal{Y}$, respectively. The source vector $\mathbf{x} = (x_1,\ldots,x_N)$, which is a generic realization of $\mathbf{X} = (X_1,\ldots,X_N)$, is compressed at the encoder by random binning, that is, each $N$–tuple $\mathbf{x}\in\mathcal{X}^N$ is randomly and independently assigned to one out of $M = e^{NR}$ bins, where $R$ is the coding rate in nats per symbol. Given a realization of the random partitioning into bins (revealed to both the encoder and the decoder), let $f:\ \mathcal{X}^N \to \{0,1,\ldots,M-1\}$ denote the encoding function, i.e., $u = f(\mathbf{x})$ is the encoder output. Accordingly, the inverse image of $u$, defined as $f^{-1}(u) = \{\mathbf{x}:\ f(\mathbf{x}) = u\}$, is the bin of all source vectors mapped by the encoder into $u$. The decoder has access to $u$ and to $\mathbf{y} = (y_1,\ldots,y_N)$, which is a realization of $\mathbf{Y} = (Y_1,\ldots,Y_N)$, namely, the side information at the decoder.

As is well known, the optimal decoder in the sense of minimum word error probability is the word–level maximum a-posteriori (MAP) decoder,
\[ \hat{\mathbf{x}} = \arg\max_{\mathbf{x}\in f^{-1}(u)} P(\mathbf{x}|\mathbf{y}) = \arg\max_{\mathbf{x}\in f^{-1}(u)} P(\mathbf{x},\mathbf{y}). \tag{1} \]
Similarly, the optimal decoder in the sense of minimum symbol error probability is given by the symbol–level MAP decoder,
\[ \hat{x}_i = \arg\max_{x\in\mathcal{X}} \sum_{\mathbf{x}\in f^{-1}(u):\ x_i = x} P(\mathbf{x},\mathbf{y}), \qquad i = 1,2,\ldots,N. \tag{2} \]
Following the notion of finite–temperature decoding in channel coding [15] (see also [14, Section 6.3.3]), we consider a parametric family of decoders, which generalizes both (1) and (2), and which is of the form
\[ \hat{x}_i = \arg\max_{x\in\mathcal{X}} \sum_{\mathbf{x}\in f^{-1}(u):\ x_i = x} P^\beta(\mathbf{x},\mathbf{y}), \qquad i = 1,2,\ldots,N, \tag{3} \]
where the parameter $\beta \ge 0$ is referred to as the inverse temperature, a term borrowed from equilibrium statistical physics (see the next subsection). The motivation for considering the finite–temperature decoder is two–fold (see also [9] for even more motivations): First, as said, it is a common generalization of both the symbol–level MAP decoder ($\beta = 1$) and the word–level MAP decoder ($\beta \to \infty$). Secondly, in some important cases, it refers to a situation of a certain mismatch that may stem from uncertainty concerning the joint distribution of $(X,Y)$, or even more specifically, the quality of the `channel' $P(y|x)$, connecting $X$ to $Y$. For example, if $(X,Y)$ is a double binary symmetric source (BSS), that is, $X$ is a BSS and $Y$ given $X$ is generated by a binary symmetric channel (BSC), then the choice of $\beta$ manifests the decoder's `belief' concerning the quality of this BSC: $\beta < 1$ corresponds to assuming a crossover probability larger than the true one, and $\beta > 1$ to assuming a smaller one.

The finite–temperature decoder (3) is induced by the posterior
\[ P_\beta(\mathbf{x}|\mathbf{y},u) = \begin{cases} \frac{P^\beta(\mathbf{x},\mathbf{y})}{\sum_{\mathbf{x}'\in f^{-1}(u)} P^\beta(\mathbf{x}',\mathbf{y})} & \mathbf{x}\in f^{-1}(u) \\ 0 & \mbox{elsewhere} \end{cases} \tag{4} \]
Accordingly, we first focus on studying the properties of this posterior in the random binning regime. In particular, similarly as in [9] and [14], we view it as an instance of the Boltzmann–Gibbs (B–G) distribution of statistical mechanics, by rewriting it in the form
\[ P_\beta(\mathbf{x}|\mathbf{y},u) = \begin{cases} \frac{\exp\{-\beta\mathcal{E}(\mathbf{x},\mathbf{y})\}}{\sum_{\mathbf{x}'\in f^{-1}(u)} \exp\{-\beta\mathcal{E}(\mathbf{x}',\mathbf{y})\}} & \mathbf{x}\in f^{-1}(u) \\ 0 & \mbox{elsewhere} \end{cases} \tag{5} \]
where $\mathcal{E}(\mathbf{x},\mathbf{y})$ is the energy function (Hamiltonian), defined as $\mathcal{E}(\mathbf{x},\mathbf{y}) \triangleq -\ln P(\mathbf{x},\mathbf{y})$. We then study the phase diagram of the corresponding statistical–mechanical model in the plane of $R$ vs. $\beta$, or more precisely, $R$ vs. $T$, where $T = 1/\beta$ is the decoding temperature. In analogy to the role of the random energy model (REM) as the statistical–mechanical counterpart of ordinary random coding (see [8, Chap. 6], [14, Chapters 5 and 6]), it turns out that random binning corresponds to a somewhat different (though related) physical model, which we call the random dilution model (RDM), and which was first mentioned in [10]. The RDM and its relevance to random binning will be presented in the next subsection.
Our second objective is to derive the exact error exponent of the symbol error probability, denoted $E(R,\beta)$, that is associated with the finite–temperature decoder (3), as a function of $R$ and $\beta$, in the random binning regime. The properties of $E(R,\beta)$, as well as its phase diagram, will be studied in some detail.

Finally, we briefly discuss several variations of these results, covering situations of mismatch, universal decoding, variable–rate compression, and the ``two–sided'' version of SW coding, namely, separate encodings and joint decoding of both sources (as opposed to the ``one–sided'' version described above, of encoding and decoding of one source while the other one serves as side information at the decoder).
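Before turning to the physics background, the following minimal Python sketch (not part of the original text; the source model, blocklength, rate and temperature are all illustrative assumptions) makes the setting concrete: it draws a random binning, reveals the bin index and the side information to the decoder, and runs the finite–temperature decoder (3) by exhaustive enumeration over the bin.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, R, beta, q = 10, 0.4, 1.0, 0.1   # blocklength, rate (nats/symbol), inverse temperature, BSC crossover

def P(xv, yv):
    # joint probability P(x, y): X ~ Bernoulli(1/2) i.i.d., Y = X passed through a BSC(q)
    d = np.count_nonzero(xv != yv)
    return 0.5 ** len(xv) * q ** d * (1 - q) ** (len(xv) - d)

M = int(np.ceil(np.exp(N * R)))                     # number of bins, M = e^{NR}
Xall = [np.array(v) for v in itertools.product((0, 1), repeat=N)]
f = {tuple(xv): rng.integers(M) for xv in Xall}     # random binning f: X^N -> {0,...,M-1}

x = rng.integers(0, 2, N)                           # source realization
y = np.where(rng.random(N) < q, 1 - x, x)           # side information at the decoder
u = f[tuple(x)]                                     # bin index available to the decoder

bin_u = [xv for xv in Xall if f[tuple(xv)] == u]    # the bin f^{-1}(u)
w = np.array([P(xv, y) ** beta for xv in bin_u])    # Boltzmann weights P^beta(x', y)
# finite-temperature decoder (3): per-symbol decision by comparing posterior masses
xhat = np.array([int(sum(wi for wi, xv in zip(w, bin_u) if xv[i] == 1) > 0.5 * w.sum())
                 for i in range(N)])
print("bit errors:", int(np.count_nonzero(xhat != x)), "/", N)
```

At $\beta = 1$ this reproduces the symbol–level MAP decoder (2), and letting $\beta$ grow large approaches the word–level MAP decoder (1).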
2.3 Physics Background: the REM and the RDM

In ordinary random coding, the analysis of bounds on the probability of error (especially in Gallager's method) is often associated with expressions of the form $\sum_{\mathbf{x}\in\mathcal{C}} P^\beta(\mathbf{y}|\mathbf{x})$, where $P(\mathbf{y}|\mathbf{x})$ is the conditional distribution pertaining to the channel, $\mathcal{C}$ is a randomly drawn codebook, and $\beta > 0$ is a parameter. Such an expression can be viewed as a partition function associated with $\mathbf{y}$:
\[ Z(\beta|\mathbf{y}) = \sum_{\mathbf{x}\in\mathcal{C}} e^{-\beta\mathcal{E}(\mathbf{x},\mathbf{y})}, \tag{6} \]
where $\beta$ is the inverse temperature and where here the energy function is $\mathcal{E}(\mathbf{x},\mathbf{y}) = -\ln P(\mathbf{y}|\mathbf{x})$. The same partition function is relevant for the Boltzmann–Gibbs form of the finite–temperature posterior associated with the channel decoder, in the spirit of eqs. (4) and (5):
\[ P_\beta(\mathbf{x}|\mathbf{y}) = \begin{cases} \frac{P^\beta(\mathbf{y}|\mathbf{x})}{Z(\beta|\mathbf{y})} & \mathbf{x}\in\mathcal{C} \\ 0 & \mbox{elsewhere} \end{cases} \tag{7} \]
\[ \hphantom{P_\beta(\mathbf{x}|\mathbf{y})} = \begin{cases} \frac{\exp\{-\beta\mathcal{E}(\mathbf{x},\mathbf{y})\}}{Z(\beta|\mathbf{y})} & \mathbf{x}\in\mathcal{C} \\ 0 & \mbox{elsewhere} \end{cases} \tag{8} \]
For a given $\mathbf{y}$, the energy values $\{\mathcal{E}(\mathbf{x},\mathbf{y}),\ \mathbf{x}\in\mathcal{C}\}$ are i.i.d. random variables. This is essentially the same as in the random energy model (REM), a well known model of disorder in statistical physics of spin glasses, which undergoes a phase transition: below a certain temperature ($\beta > \beta_c$), the system is frozen, in the sense that the partition function is dominated by a non–exponential number of microstates $\{\mathbf{x}\}$ at the ground–state energy (zero thermodynamical entropy). This is called the frozen phase or the glassy phase. The other phase, $\beta < \beta_c$, is called the paramagnetic phase (see more details in [14, Chap. 5]). Owing to this analogy between the REM and random coding, the corresponding exponential error bounds associated with random coding undergo a similar phase transition (see [8] and references therein).

In random binning, as opposed to ordinary random coding, the mechanism is somewhat different, and the analogous statistical–mechanical model, which we call the RDM, is defined as follows. Consider a partition function of a certain physical system, with a microstate $\mathbf{x}$ and Hamiltonian $\mathcal{E}(\mathbf{x})$, i.e.,
\[ Z(\beta) = \sum_{\mathbf{x}\in\mathcal{X}^N} e^{-\beta\mathcal{E}(\mathbf{x})}, \qquad \beta > 0, \tag{9} \]
where $\beta$ is, as said, the inverse temperature. The diluted version of $Z(\beta)$, according to the RDM (hence the name), is defined as
\[ Z_D(\beta) = \sum_{\mathbf{x}\in\mathcal{X}^N} \mathcal{I}(\mathbf{x})\cdot e^{-\beta\mathcal{E}(\mathbf{x})}, \tag{10} \]
where $\{\mathcal{I}(\mathbf{x}),\ \mathbf{x}\in\mathcal{X}^N\}$ are i.i.d. Bernoulli random variables with $p \triangleq \Pr\{\mathcal{I}(\mathbf{x}) = 1\} = 1 - \Pr\{\mathcal{I}(\mathbf{x}) = 0\} = e^{-NR}$ for all $\mathbf{x}\in\mathcal{X}^N$, and $R \ge 0$ is a given parameter. In other words, $Z_D(\beta)$ is a partial version of the full partition function $Z(\beta)$, with randomly chosen (surviving) microstates $\{\mathbf{x}\}$. Equivalently, $Z_D(\beta)$ can be thought of as being defined just like $Z(\beta)$, but with a Hamiltonian redefined as $\mathcal{E}_D(\mathbf{x}) = \mathcal{E}(\mathbf{x}) + \psi(\mathbf{x})$, where $\{\psi(\mathbf{x})\}$ are i.i.d. random variables, taking the value $\psi(\mathbf{x}) = 0$ with probability $e^{-NR}$ and the value $\psi(\mathbf{x}) = \infty$ with probability $1 - e^{-NR}$. From the physical point of view, $\psi(\mathbf{x})$ can be thought of as some disordered potential energy function that, due to long–range interactions, disables access to certain points in the configuration space (those that have not `survived' the dilution).

Let $s(\epsilon)$ denote the normalized (per–particle) entropy as a function of the normalized energy, associated with the full system, $Z(\beta)$. More precisely, denoting by $\Omega_N(E)$ the number of vectors $\{\mathbf{x}\}$ for which $\mathcal{E}(\mathbf{x}) = E$, then
\[ s(\epsilon) \triangleq \lim_{N\to\infty} \frac{\ln \Omega_N(N\epsilon)}{N}, \tag{11} \]
provided that the limit exists. Let $\Delta\epsilon$ be an arbitrarily small quantity (increment) of the normalized energy. Then,
\[ Z_D(\beta) \approx \sum_i \sum_{\mathbf{x}:\ Ni\Delta\epsilon \le \mathcal{E}(\mathbf{x}) < N(i+1)\Delta\epsilon} \mathcal{I}(\mathbf{x})\cdot e^{-\beta\mathcal{E}(\mathbf{x})}. \tag{12} \]
At each energy level $\epsilon = i\Delta\epsilon$, the number of surviving microstates concentrates, with very high probability, around $e^{N[s(\epsilon)-R]}$ whenever $s(\epsilon) > R$, whereas levels with $s(\epsilon) < R$ are typically empty after the dilution. Thus,
\[ Z_D(\beta) \doteq \sum_{\{i:\ s(i\Delta\epsilon) \ge R\}} e^{N[s(i\Delta\epsilon)-R]}\cdot e^{-\beta N i\Delta\epsilon} \tag{13} \]
\[ \hphantom{Z_D(\beta)} \doteq \exp\left\{N\cdot\max_{\{\epsilon:\ s(\epsilon) \ge R\}} [s(\epsilon) - R - \beta\epsilon]\right\}. \tag{14} \]
As long as the unconstrained maximizer of $[s(\epsilon) - \beta\epsilon]$, namely, the solution $\epsilon^*(\beta)$ of $s'(\epsilon) = \beta$, satisfies $s(\epsilon^*(\beta)) \ge R$, the system is in the paramagnetic phase. At lower temperatures, the maximum is attained at the smallest energy that survives the dilution, $\epsilon_0(R) = s^{-1}(R)$ (the smaller solution of $s(\epsilon) = R$), where the entropy of the surviving microstates vanishes: this is the glassy phase. The critical inverse temperature is therefore
\[ \beta_c(R) = s'[s^{-1}(R)]. \tag{15} \]

Example 1. Let $\mathcal{E}(\mathbf{x}) = \frac{\kappa}{2}\|\mathbf{x}\|^2$, with $\mathcal{X} = \{0,\pm a,\pm 2a,\ldots\}$, i.e., a harmonic potential applied to particles in a grid with spacing $a$. Then, $\Omega_N(E)$ is approximately the volume of the shell of a hyper–sphere of radius $\sqrt{2E/\kappa}$, divided by an elementary volume of the grid cube, $a^N$, which yields
\[ s(\epsilon) = \frac{1}{2}\ln\frac{4\pi e\epsilon}{\kappa a^2}, \tag{16} \]
and so,
\[ s'(\epsilon) = \frac{1}{2\epsilon}, \tag{17} \]
and
\[ s^{-1}(R) = \frac{\kappa a^2}{4\pi e}\cdot e^{2R}. \tag{18} \]
Thus,
\[ \beta_c(R) = \frac{2\pi e}{\kappa a^2}\cdot e^{-2R}, \tag{19} \]
meaning that the critical temperature grows exponentially with $R$. This concludes Example 1.
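As a quick sanity check on Example 1 (my own addition, not part of the original text), the chain (17)–(19) can be recovered symbolically from the entropy function (16) alone:

```python
# Symbolic verification of eqs. (17)-(19), assuming only the entropy function (16).
import sympy as sp

eps, R, kappa, a = sp.symbols('epsilon R kappa a', positive=True)
s = sp.Rational(1, 2) * sp.log(4 * sp.pi * sp.E * eps / (kappa * a**2))   # eq. (16)

s_prime = sp.simplify(sp.diff(s, eps))           # eq. (17): 1/(2*epsilon)
eps0 = sp.solve(sp.Eq(s, R), eps)[0]             # eq. (18): s^{-1}(R), grows like e^{2R}
beta_c = sp.simplify(s_prime.subs(eps, eps0))    # eq. (19): (2*pi*e/(kappa*a^2)) * e^{-2R}
print(s_prime, eps0, beta_c, sep='\n')
```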
Remark 1. A slightly more general version of the RDM replaces the fixed parameter $R$ by a function of $\epsilon$, that is, $p = e^{-NR(\epsilon)}$. The analysis is essentially the same as before, except that now, the range of maximization $\{\epsilon:\ s(\epsilon) \ge R\}$ is replaced by $\{\epsilon:\ s(\epsilon) \ge R(\epsilon)\}$, which, depending on the form of the function $R(\cdot)$, might be rather different.

Finally, to see the relevance of the RDM to random binning for SW coding, let us return to the problem setting described in Subsection 2.2, and consider the partition function
\[ Z(\beta|\mathbf{y},u) = \sum_{\mathbf{x}\in f^{-1}(u)} P^\beta(\mathbf{x},\mathbf{y}) = \sum_{\mathbf{x}\in\mathcal{X}^N} \exp\{-\beta\mathcal{E}(\mathbf{x},\mathbf{y})\}\cdot\mathcal{I}[f(\mathbf{x}) = u], \tag{20} \]
pertaining to the Boltzmann–Gibbs distribution (5). We can think of this as an instance of the RDM, with $\mathcal{I}(\mathbf{x}) = \mathcal{I}[f(\mathbf{x}) = u]$, i.e., the microstates $\{\mathbf{x}\}$ that `survive' the dilution are only those for which the randomly selected bin index happens to coincide with the given $u$, which is the case with probability $e^{-NR}$, exactly like in the above defined RDM.

3 The Phase Diagram

In this section, we characterize the phase diagram of the partition function (20), pertaining to the finite–temperature posterior (5), for a typical realization of the random binning scheme and a typical realization of $(\mathbf{X},\mathbf{Y})$. Generally speaking, this derivation is in the spirit of those in [8, Chap. 6] and [14, Section 6.3.3], but there are some important differences.

We begin by decomposing $Z(\beta|\mathbf{y},u)$ as
\[ Z(\beta|\mathbf{y},u) = Z_c(\beta|\mathbf{y},u) + Z_e(\beta|\mathbf{y},u), \tag{21} \]
where $Z_c(\beta|\mathbf{y},u) = e^{-\beta\mathcal{E}(\mathbf{x},\mathbf{y})}$ is the contribution of the correct $\mathbf{x}$ that was actually emitted by the source, whereas $Z_e(\beta|\mathbf{y},u)$ is the sum of contributions of all other source vectors. For a typical realization $(\mathbf{x},\mathbf{y})$ of $(\mathbf{X},\mathbf{Y})$, $\mathcal{E}(\mathbf{x},\mathbf{y})$ is about $NH(X,Y)$ (by the weak law of large numbers), and so, $Z_c(\beta|\mathbf{y},u)$ is about $e^{-\beta N H(X,Y)}$, where $H(X,Y)$ is the joint entropy of $(X,Y)$. What gives the real emitted $\mathbf{x}$ a special stature here is the fact that it surely survives the dilution, as $u$ is, by definition, the bin index of $\mathbf{x}$.

We next address the behavior of the second term, $Z_e(\beta|\mathbf{y},u)$. To this end, we need the entropy function $s(\epsilon)$ (see (11)) of the full (non–diluted) system. Using the method of types, it is easily seen that for a typical $\mathbf{y}$, this function is given by
\[ s(\epsilon) = \max_{\{Q(x|y):\ -\sum_{x,y} P(y)Q(x|y)\ln P(x,y) = \epsilon\}}\ \sum_y P(y)\sum_x Q(x|y)\ln\frac{1}{Q(x|y)}. \tag{22} \]
It would be instructive and useful to characterize the form of the optimal `channel' $\{Q(x|y)\}$, call it $\{Q^*(x|y)\}$, that achieves $s(\epsilon)$. Intuitively, $s(\epsilon)$ can be thought of as the overall per–particle entropy of a mixture of systems, indexed by $y$, each one with $NP(y)$ particles and Hamiltonian $\mathcal{E}(x,y) = -\ln P(x,y)$, where $x$ plays the role of a micro–state and $y$ is the index. In thermal equilibrium, all systems are at the same temperature, which we will denote by $\tau = 1/\alpha$, and the Boltzmann factor is proportional to $e^{-\alpha\mathcal{E}(x,y)} = P^\alpha(x,y)$, where $\alpha$ is chosen so as to meet the constraint. More precisely, let $\zeta(\alpha|y) = \sum_x P^\alpha(x,y)$, $\alpha\in\mathbb{R}$.
We argue that the conditional distribution $Q^*(x|y)$ that achieves $s(\epsilon)$ is always of the form
\[ Q^*(x|y) = Q_\alpha(x|y) \triangleq \frac{P^\alpha(x,y)}{\zeta(\alpha|y)}, \tag{23} \]
where $\alpha$ is chosen to satisfy the constraint
\[ \sum_{x,y} P(y)Q_\alpha(x|y)\mathcal{E}(x,y) = -\sum_{x,y} P(y)Q_\alpha(x|y)\ln P(x,y) = \epsilon. \tag{24} \]
Note that for $\alpha\to\infty$, $Q_\alpha(x|y)$ tends to put all its mass on the letter $x$ which maximizes $P(x,y)$, and the resulting energy is $\epsilon_{\min} = \sum_y P(y)\min_x \ln[1/P(x,y)]$, whereas for $\alpha\to-\infty$, $Q_\alpha(x|y)$ tends to put all its mass on the letter $x$ which minimizes $P(x,y)$, and the resulting energy is $\epsilon_{\max} = \sum_y P(y)\max_x \ln[1/P(x,y)]$. Thus, as $\alpha$ exhausts the real line, the entire energy range $(\epsilon_{\min},\epsilon_{\max})$ is covered. The optimality of $Q_\alpha(x|y)$ follows from the following consideration:
\begin{align*}
0 &\le \sum_{x,y} P(y)Q(x|y)\ln\frac{Q(x|y)}{Q_\alpha(x|y)} \\
&= \sum_{x,y} P(y)Q(x|y)\ln\frac{Q(x|y)\zeta(\alpha|y)}{P^\alpha(x,y)} \\
&= \sum_y P(y)\ln\zeta(\alpha|y) + \alpha\epsilon + \sum_y P(y)\sum_x Q(x|y)\ln Q(x|y), \tag{25}
\end{align*}
or
\[ \sum_y P(y)\sum_x Q(x|y)\ln\frac{1}{Q(x|y)} \le \sum_y P(y)\ln\zeta(\alpha|y) + \alpha\epsilon, \tag{26} \]
with equality for $Q(x|y) = Q_\alpha(x|y)$. It is also seen that
\[ s(\epsilon) = \sum_y P(y)\ln\zeta(\alpha|y) + \alpha\epsilon, \tag{27} \]
where it should be kept in mind that $\alpha$ is itself a function of $\epsilon$, defined by the constraint (24).

Now, as $Z_e(\beta|\mathbf{y},u)$ is associated with the RDM, it has two phases, the paramagnetic phase and the glassy phase, in the plane of $\beta$ vs. $R$, where the boundary is $\beta = \beta_c(R)$, with the entropy function $s(\cdot)$ as above. The contribution of $Z_c(\beta|\mathbf{y},u)$ introduces a third phase – the so called ordered phase or ferromagnetic phase. The ferromagnetic phase dominates the glassy phase when $\beta H(X,Y) \le \beta s^{-1}(R)$, namely, when $R \ge s[H(X,Y)]$. Now, we argue that $s[H(X,Y)] = H(X|Y)$. To prove this, first observe that obviously, $s[H(X,Y)] \ge H(X|Y)$ (by choosing $Q(x|y) = P(x|y)$). On the other hand, the reversed inequality is obtained by repeating eq. (25) with the choice $\alpha = 1$. Thus, the boundary between the ferromagnetic phase and the glassy phase is given by $R = H(X|Y)$ (the vertical line in the phase diagram of Fig. 1). Note also that $\beta_c[H(X|Y)] = 1$.

Now, the normalized log–partition function of the non–diluted system, $\phi(\beta)$, is obtained by
\begin{align*}
e^{N\phi(\beta)} &= \sum_{\mathbf{x}} P^\beta(\mathbf{x},\mathbf{y}) \tag{28}\\
&= \sum_{\mathbf{x}} \prod_{i=1}^N P^\beta(x_i,y_i) \tag{29}\\
&= \prod_{i=1}^N \left[\sum_x P^\beta(x,y_i)\right] \tag{30}\\
&= \prod_{y\in\mathcal{Y}} \left[\sum_x P^\beta(x,y)\right]^{NP(y)} \tag{31}\\
&= \exp\left\{N\sum_{y\in\mathcal{Y}} P(y)\ln\left[\sum_x P^\beta(x,y)\right]\right\}, \tag{32}
\end{align*}
i.e.,
\[ \phi(\beta) = \sum_{y\in\mathcal{Y}} P(y)\ln\left[\sum_x P^\beta(x,y)\right]. \tag{33} \]
It follows that for the ferromagnetic component to dominate also the paramagnetic component, we must have
\[ \beta H(X,Y) \le -\sum_{y\in\mathcal{Y}} P(y)\ln\left[\sum_x P^\beta(x,y)\right] + R, \tag{34} \]
or, equivalently,
\[ R \ge \beta H(X,Y) + \sum_{y\in\mathcal{Y}} P(y)\ln\left[\sum_x P^\beta(x,y)\right] \triangleq \Gamma(\beta), \tag{35} \]
and so the boundary is given by $R = \Gamma(\beta)$, or $T = 1/\Gamma^{-1}(R)$. The boundary between the glassy phase and the paramagnetic phase is, of course, $\beta = \beta_c(R)$, or $T = T_c(R) \triangleq 1/\beta_c(R)$, as mentioned already in the general discussion on the RDM. The phase diagram of finite–temperature random binning appears in Fig. 1, as a partition of the plane of $T = 1/\beta$ vs. $R$ into the three regions mentioned.

[Figure 1: Phase diagram of $Z(\beta|\mathbf{y},u)$ (for a typical $\mathbf{y}$) in the plane of the decoding temperature $T$ vs. the coding rate $R$: the ferromagnetic phase lies to the right of the vertical line $R = H(X|Y)$ and below the curve $T = 1/\Gamma^{-1}(R)$; the glassy and paramagnetic phases, separated by the curve $T = T_c(R)$, lie to the left; the $R$ axis extends up to $\ln|\mathcal{X}|$.]
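To make the boundary equations concrete, here is a small numerical sketch (my own illustration, not from the paper), assuming a double binary symmetric source with crossover probability $q$: it evaluates the ferromagnetic–glassy line $R = H(X|Y)$, the glassy–paramagnetic curve $T_c(R)$ via the tilted distribution (23)–(24) and eq. (27), and the ferromagnetic–paramagnetic curve via $\Gamma(\beta)$ of eq. (35).

```python
import numpy as np

q = 0.1                                       # BSC crossover (illustrative)
pj = np.array([0.5 * (1 - q), 0.5 * q])       # the two values of P(x,y) for a fixed y
H_XY = -2 * (pj * np.log(pj)).sum()           # H(X,Y) = ln 2 + h(q), in nats
H_XgY = H_XY - np.log(2)                      # H(X|Y) = h(q), since Y is uniform

def entropy_at(alpha):
    # s(eps(alpha)) via eqs. (23), (24), (27); by symmetry both values of y coincide
    w = pj ** alpha
    Q = w / w.sum()                           # tilted channel Q_alpha(x|y)
    eps = -(Q * np.log(pj)).sum()             # constraint (24)
    return np.log(w.sum()) + alpha * eps      # eq. (27)

def beta_c(R, lo=1.0, hi=200.0):
    # glassy-paramagnetic boundary: the alpha >= 1 solving s(eps(alpha)) = R;
    # on this branch s decreases in alpha, so bisection applies
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if entropy_at(mid) > R else (lo, mid)
    return 0.5 * (lo + hi)

def Gamma(beta):                              # eq. (35)
    return beta * H_XY + np.log((pj ** beta).sum())

print("ferro-glassy line: R = H(X|Y) =", round(H_XgY, 4))
for R in (0.10, 0.20, 0.30):                  # rates below H(X|Y) ~ 0.325
    print(f"R = {R}: T_c(R) = {1.0 / beta_c(R):.4f}")
for b in (0.3, 0.5, 0.8):
    print(f"beta = {b}: Gamma(beta) = {Gamma(b):.4f}  (ferromagnetic iff R >= Gamma(beta))")
```

Note that $\Gamma(0) = \ln 2$ and $\Gamma(1) = H(X|Y)$, so the ferromagnetic–paramagnetic curve indeed meets the vertical line $R = H(X|Y)$ at $\beta = 1$, consistently with $\beta_c[H(X|Y)] = 1$.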
As can be seen, qualitatively speaking, it looks quite like the mirror image of the phase diagram of random coding for channels [9], [14, p. 119, Fig. 6.5], since a rate–$R$ SW code essentially operates like a channel code at rate $H(X) - R$. However, the equations of the boundary curves $T = T_c(R)$ and $T = 1/\Gamma^{-1}(R)$ here and in channel coding are completely different, due to several reasons:

1. In SW coding, the typical size of a bin (which is analogous to the size of the corresponding channel codebook) is a random variable, which fluctuates around $|\mathcal{X}|^N\cdot e^{-NR}$. Only about $\exp\{N[H(X)-R]\}$ members of this bin are typical to the source, but when error exponents and large deviations effects are considered, the a–typical bin members may also play a non–trivial role.

2. Unlike in traditional channel coding, the prior of the input is not necessarily uniform across the bin, as it depends on the source (just like in joint source–channel coding).

3. The compositions (types) of the codewords are random.

All these differences make the analogy between SW coding and channel coding rather non–trivial in our context. A few comments are in order concerning possible extensions and modifications of the above phase diagram.

1. Variable–rate SW codes. Variable–rate SW codes may be related to the generalized version of the RDM that allows $R$ to be energy–dependent (see Remark 1). In the context of variable–rate SW coding, this requires a slight modification, as $R$ may be allowed to depend only on $\mathbf{x}$, but not on $\mathbf{y}$. A plausible approach is then to let $R$ depend on $\mathbf{x}$ only via the type class of $\mathbf{x}$ (see [18]).

2. Mismatch. The above analysis can easily be extended to a situation of a general mismatch. Suppose that the partition function is defined in terms of a mismatched model $\tilde{P}(\mathbf{x},\mathbf{y})$, where we assume, without loss of generality, that $\tilde{P}(y) = P(y)$, because as far as the finite–temperature decoder is concerned, a general $\tilde{P}(\mathbf{x},\mathbf{y})$ is equivalent to $P(\mathbf{y})\tilde{P}(\mathbf{x}|\mathbf{y})$, where $\tilde{P}(\mathbf{x}|\mathbf{y})$ is the conditional distribution induced by $\tilde{P}(\mathbf{x},\mathbf{y})$. Accordingly, in Fig. 1, the ferromagnetic–glassy boundary would be replaced by the vertical straight line
\[ R = -\mathbf{E}\ln\tilde{P}(X|Y) = -\sum_{x,y} P(x,y)\ln\tilde{P}(x|y), \]
the ferromagnetic–paramagnetic boundary would be modified to
\[ R = \tilde{\Gamma}(\beta) = -\beta\,\mathbf{E}\ln\tilde{P}(X,Y) + \sum_y P(y)\ln\left[\sum_x \tilde{P}^\beta(x,y)\right], \]
and the paramagnetic–glassy boundary would become
\[ \beta = \tilde{\beta}_c(R) = \tilde{s}'[\tilde{s}^{-1}(R)], \]
where $\tilde{s}(\epsilon)$ is defined similarly as $s(\epsilon)$, except that $P(x,y)$ is replaced by $\tilde{P}(x,y)$.

3. Universal decoding. It is interesting to analyze similarly the partition function pertaining to a finite–temperature version of the (universal) minimum conditional entropy decoder. The only difference is that here, the Hamiltonian is replaced by the empirical conditional entropy,
\[ \mathcal{E}(\mathbf{x},\mathbf{y}) = -N\sum_y \hat{P}(y)\sum_x \hat{Q}(x|y)\ln\hat{Q}(x|y), \]
where $\hat{Q}(x|y)$ is the conditional empirical distribution of $X$ given $Y$, and $\hat{P}(y)$ is the empirical distribution of $Y$, both induced from $(\mathbf{x},\mathbf{y})$. Obviously, in this case, $s(\epsilon) = \epsilon$, and so, for a typical $\mathbf{y}$:
\begin{align*}
\lim_{N\to\infty}\frac{\ln Z_e(\beta|\mathbf{y})}{N} &= \sup_{\{\epsilon:\ \epsilon \ge R\}} [\epsilon(1-\beta) - R] \tag{36}\\
&= \begin{cases} (1-\beta)\ln|\mathcal{X}| - R & \beta < 1 \\ -\beta R & \beta \ge 1 \end{cases} \tag{37}
\end{align*}
Thus, $T_c = 1$, independently of $R$, and the paramagnetic–glassy boundary becomes the horizontal straight line $T_c = 1$. The paramagnetic–ferromagnetic boundary is now
\[ R = (1-\beta)\ln|\mathcal{X}| + \beta H(X|Y), \]
or equivalently,
\[ T = \frac{\ln|\mathcal{X}| - H(X|Y)}{\ln|\mathcal{X}| - R}. \]
The phase diagram is depicted in Fig. 2.

[Figure 2: Phase diagram of the finite–temperature minimum conditional entropy decoder in the plane of the decoding temperature $T$ vs. the coding rate $R$: the paramagnetic–glassy boundary is the horizontal line $T_c = 1$, and the ferromagnetic phase lies to the right of $R = H(X|Y)$, below the curve $T = (\ln|\mathcal{X}| - H(X|Y))/(\ln|\mathcal{X}| - R)$; the $R$ axis extends up to $\ln|\mathcal{X}|$.]

As can be seen, the price of the universality is that the paramagnetic phase partly `invades' the previous area of the ferromagnetic phase, and, similarly, the glassy phase expands at the expense of the paramagnetic phase.
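Since the universal–decoder boundaries have closed forms, a short numerical check (again my own illustration, for the same illustrative double BSS as above) is immediate:

```python
import numpy as np

q = 0.1
h = lambda p: -p * np.log(p) - (1 - p) * np.log(1 - p)   # binary entropy (nats)
HXgY, lnX = h(q), np.log(2)
print("glassy-paramagnetic boundary: T_c = 1 (independent of R)")
for R in (0.40, 0.50, 0.60):                             # rates in (H(X|Y), ln 2)
    print(f"R = {R}: ferro-para boundary T = {(lnX - HXgY) / (lnX - R):.3f}")
```

Comparing these values with the matched curves computed earlier quantifies the `invasion' of the paramagnetic phase.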
4. Two–sided SW coding. When both $\mathbf{x}$ and $\mathbf{y}$ are encoded, at rates $R_X$ and $R_Y$, respectively, the partition function becomes
\[ Z(\beta|u,v) = \sum_{\mathbf{x}',\mathbf{y}'} P^\beta(\mathbf{x}',\mathbf{y}')\cdot\mathcal{I}[f_X(\mathbf{x}') = u]\cdot\mathcal{I}[f_Y(\mathbf{y}') = v]. \tag{38} \]
Here, we should distinguish between four terms:
\begin{align*}
Z_{cc}(\beta|u,v) &= P^\beta(\mathbf{x},\mathbf{y}) = e^{-\beta\ln[1/P(\mathbf{x},\mathbf{y})]} \\
Z_{ec}(\beta|\mathbf{y},u,v) &= \sum_{\mathbf{x}'\ne\mathbf{x}} P^\beta(\mathbf{x}',\mathbf{y})\cdot\mathcal{I}[f_X(\mathbf{x}') = u] \tag{39}\\
Z_{ce}(\beta|\mathbf{x},u,v) &= \sum_{\mathbf{y}'\ne\mathbf{y}} P^\beta(\mathbf{x},\mathbf{y}')\cdot\mathcal{I}[f_Y(\mathbf{y}') = v] \tag{40}\\
Z_{ee}(\beta|u,v) &= \sum_{\mathbf{x}'\ne\mathbf{x},\ \mathbf{y}'\ne\mathbf{y}} P^\beta(\mathbf{x}',\mathbf{y}')\cdot\mathcal{I}[f_X(\mathbf{x}') = u]\cdot\mathcal{I}[f_Y(\mathbf{y}') = v]. \tag{41}
\end{align*}
As before, $Z_{cc}(\beta|u,v)$ is typically about $e^{-N\beta H(X,Y)}$. $Z_{ec}(\beta|\mathbf{y},u,v)$ is exactly the same as the earlier $Z_e(\beta|\mathbf{y},u)$, and so is $Z_{ce}(\beta|\mathbf{x},u,v)$, with the roles of $\mathbf{x}$ and $\mathbf{y}$ interchanged. Thus, we define
\[ s_{X|Y}(\epsilon) = \max_{\{Q(x|y):\ -\sum_{x,y} P(y)Q(x|y)\ln P(x,y) = \epsilon\}} \sum_{x,y} P(y)Q(x|y)\ln\frac{1}{Q(x|y)} \tag{42} \]
\[ s_{Y|X}(\epsilon) = \max_{\{Q(y|x):\ -\sum_{x,y} P(x)Q(y|x)\ln P(x,y) = \epsilon\}} \sum_{x,y} P(x)Q(y|x)\ln\frac{1}{Q(y|x)} \tag{43} \]
and
\[ s_{XY}(\epsilon) = \max_{\{Q(x,y):\ -\sum_{x,y} Q(x,y)\ln P(x,y) = \epsilon\}} \sum_{x,y} Q(x,y)\ln\frac{1}{Q(x,y)}. \tag{44} \]
Therefore, we have
\begin{align*}
\lim_{N\to\infty}\frac{\ln Z_{ec}(\beta)}{N} &= \sup_{\{\epsilon:\ s_{X|Y}(\epsilon) \ge R_X\}} [s_{X|Y}(\epsilon) - R_X - \beta\epsilon] \tag{45}\\
&= \begin{cases} \phi_X(\beta) - R_X & \beta < \beta_X \\ -\beta\epsilon_X & \beta \ge \beta_X \end{cases} \tag{46}
\end{align*}
and
\[ \lim_{N\to\infty}\frac{\ln Z_{ce}(\beta)}{N} = \begin{cases} \phi_Y(\beta) - R_Y & \beta < \beta_Y \\ -\beta\epsilon_Y & \beta \ge \beta_Y \end{cases} \tag{47} \]
where
\[ \phi_X(\beta) = \sum_y P(y)\ln\left[\sum_x P^\beta(x,y)\right], \tag{48} \]
\[ \phi_Y(\beta) = \sum_x P(x)\ln\left[\sum_y P^\beta(x,y)\right], \tag{49} \]
$\epsilon_X$ is the solution to the equation $s_{X|Y}(\epsilon) = R_X$, $\beta_X = s'_{X|Y}(\epsilon_X)$, $\epsilon_Y$ is the solution to the equation $s_{Y|X}(\epsilon) = R_Y$, and $\beta_Y = s'_{Y|X}(\epsilon_Y)$. Similarly,
\begin{align*}
\lim_{N\to\infty}\frac{\ln Z_{ee}(\beta)}{N} &= \sup_{\{\epsilon:\ s_{XY}(\epsilon) \ge R_X + R_Y\}} [s_{XY}(\epsilon) - R_X - R_Y - \beta\epsilon] \tag{50}\\
&= \begin{cases} \phi_{XY}(\beta) - R_X - R_Y & \beta < \beta_{XY} \\ -\beta\epsilon_{XY} & \beta \ge \beta_{XY} \end{cases} \tag{51}
\end{align*}
where
\[ \phi_{XY}(\beta) = \ln\left[\sum_{x,y} P^\beta(x,y)\right], \tag{52} \]
$\epsilon_{XY}$ is the solution to the equation $s_{XY}(\epsilon) = R_X + R_Y$, and $\beta_{XY} = s'_{XY}(\epsilon_{XY})$. Here, the phase diagram, in the three–dimensional space $(T,R_X,R_Y)$, is much more involved, since each one of the terms $Z_{ee}(\beta)$, $Z_{ce}(\beta)$ and $Z_{ec}(\beta)$ could be in two different phases, and on top of that, one should check when $Z_{cc}(\beta)$ dominates. We will not delve into it any further here, but only note that, for $\beta$ below all three critical values $\beta_X$, $\beta_Y$ and $\beta_{XY}$, one of $Z_{ee}(\beta)$, $Z_{ec}(\beta)$, $Z_{ce}(\beta)$ and $Z_{cc}(\beta)$ dominates wherever $R_X + R_Y - \phi_{XY}(\beta)$, $R_X - \phi_X(\beta)$, $R_Y - \phi_Y(\beta)$, or $\beta H(X,Y)$, respectively, is the smallest among all four functions. In particular, we have the following conditions for reliable communication (where $Z_{cc}(\beta)$ dominates):
\begin{align*}
R_X &> \beta H(X,Y) + \phi_X(\beta) \tag{53}\\
R_Y &> \beta H(X,Y) + \phi_Y(\beta) \tag{54}\\
R_X + R_Y &> \beta H(X,Y) + \phi_{XY}(\beta) \tag{55}
\end{align*}
For $\beta = 1$, this boils down to the well–known achievability region of SW coding. Note that there are regions where either $Z_{ec}(\beta)$ or $Z_{ce}(\beta)$ dominates, which means that one of the sources is decoded reliably, while the other one is not. As expected, for $\beta = 1$, $\mathbf{y}$ alone is decoded reliably within $\{(R_X,R_Y):\ R_X < H(X|Y),\ R_Y > H(Y)\}$, and $\mathbf{x}$ alone is decoded reliably within $\{(R_X,R_Y):\ R_Y < H(Y|X),\ R_X > H(X)\}$.
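As a numerical check (my own addition, again for an illustrative double BSS), the following sketch evaluates the right–hand sides of (53)–(55) at $\beta = 1$ and confirms that they equal $H(X|Y)$, $H(Y|X)$ and $H(X,Y)$, i.e., the Slepian–Wolf region.

```python
import numpy as np

q = 0.1
P = np.array([[0.5 * (1 - q), 0.5 * q],
              [0.5 * q, 0.5 * (1 - q)]])                 # P(x,y), rows indexed by x
H_XY = -(P * np.log(P)).sum()

phi_X = lambda b: (P.sum(axis=0) * np.log((P ** b).sum(axis=0))).sum()   # eq. (48)
phi_Y = lambda b: (P.sum(axis=1) * np.log((P ** b).sum(axis=1))).sum()   # eq. (49)
phi_XY = lambda b: np.log((P ** b).sum())                                # eq. (52)

b = 1.0
print("R_X threshold :", b * H_XY + phi_X(b))    # = H(X|Y)
print("R_Y threshold :", b * H_XY + phi_Y(b))    # = H(Y|X)
print("sum threshold :", b * H_XY + phi_XY(b))   # = H(X,Y)
```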
4 The Error Exponent

In this section, we provide an exact analysis of the error exponent associated with the symbol error probability of the finite–temperature decoder (3). We then discuss some properties of the error exponent as a function of $R$ and $\beta$, and present a phase diagram. Finally, we discuss some modifications and extensions.

For the sake of simplicity of the exposition, and without any essential loss of generality, we will assume $\mathcal{X} = \{0,1\}$ and evaluate the expected bit–error rate (BER),
\[ P_b(R,\beta,N) = \Pr\{\hat{X}_1 \ne X_1\}, \]
where the probability is taken also w.r.t. the ensemble of random binning (by symmetry, $\Pr\{\hat{X}_i \ne X_i\}$ is the same for every $i = 1,2,\ldots,N$), that is,
\[ P_b(R,\beta,N) = \Pr\left\{\sum_{\{\mathbf{x}':\ x_1'\ne x_1\}} P^\beta(\mathbf{x}',\mathbf{y})\cdot\mathcal{I}[f(\mathbf{x}') = f(\mathbf{x})] \ge \sum_{\{\mathbf{x}':\ x_1' = x_1\}} P^\beta(\mathbf{x}',\mathbf{y})\cdot\mathcal{I}[f(\mathbf{x}') = f(\mathbf{x})]\right\}, \tag{56} \]
or, more precisely, the error exponent associated with $P_b(R,\beta,N)$:
\[ E(R,\beta) \triangleq \lim_{N\to\infty}\left[-\frac{\ln P_b(R,\beta,N)}{N}\right]. \tag{57} \]
For later use, we also define the following notation:
\[ \epsilon(Q_{XY}) \triangleq \frac{1}{N}\ln P(\mathbf{x},\mathbf{y}) = \sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} Q_{XY}(x,y)\ln P(x,y), \tag{58} \]
where $Q_{XY}$ is understood here to be the joint empirical distribution of $(\mathbf{x},\mathbf{y})\in\mathcal{X}^N\times\mathcal{Y}^N$. Similarly, for a generic $\mathbf{x}'$, the corresponding auxiliary random variable will be denoted by $X'$, so that the joint empirical distribution of $(\mathbf{x}',\mathbf{y})$ will be denoted by $Q_{X'Y}$. In general, depending on the context, $Q_{XY}$ and $Q_{X'Y}$ (or just $Q$) may also denote generic joint distributions on $\mathcal{X}\times\mathcal{Y}$, not necessarily empirical distributions pertaining to sequences of finite length. For a given $Q_{XY}$, let us define
\[ A(Q_{XY},R,\beta) \triangleq \min_{Q_{X'|Y}}\left\{[R - H_Q(X'|Y)]_+:\ \epsilon(Q_{X'Y}) + \frac{1}{\beta}[H_Q(X'|Y) - R]_+ \ge \epsilon(Q_{XY})\right\}, \tag{59} \]
where $H_Q(X'|Y)$ is the conditional entropy of $X'$ given $Y$ associated with $Q_{X'Y}$. Finally, define
\[ E^*(R,\beta) = \min_{Q_{XY}} [D(Q_{XY}\|P) + A(Q_{XY},R,\beta)], \tag{60} \]
where $D(Q_{XY}\|P)$ is the relative entropy (Kullback–Leibler divergence) between $\{Q_{XY}(x,y)\}$ and $\{P(x,y)\}$, i.e.,
\[ D(Q_{XY}\|P) = \sum_{x,y} Q_{XY}(x,y)\ln\frac{Q_{XY}(x,y)}{P(x,y)}. \tag{61} \]
The following theorem presents our main result in this section.

Theorem 1. For the ensemble of random binning pertaining to SW codes, as described in Subsection 2.2,
\[ E(R,\beta) = E^*(R,\beta). \tag{62} \]

It is easy to see that $E(R,\beta)$ is non–decreasing both in $\beta$ and $R$. In particular,
\[ E(R,\infty) = \min_{Q_{XY}}\left\{D(Q_{XY}\|P) + \min_{\{Q_{X'|Y}:\ \epsilon(Q_{X'Y}) \ge \epsilon(Q_{XY})\}} [R - H_Q(X'|Y)]_+\right\}, \tag{63} \]
which agrees with the error exponent of the word error probability, as expected. On the other hand, for $\beta = 1$, the finite–temperature decoder minimizes the BER and hence maximizes the exponent. Consequently, $E(R,\beta)$ must be a constant for all $\beta \ge 1$, which is equal to $E(R,\infty)$. It is also easy to verify that $E(R,\beta)$ vanishes for every $\beta \le \Gamma^{-1}(R)$, i.e., beyond the paramagnetic–ferromagnetic boundary curve. Thus, $E(R,\beta)$ has three phases in the plane of $\beta$ vs. $R$: (i) $\beta \le \Gamma^{-1}(R)$ or $R \le H(X|Y)$ (the union of the paramagnetic and glassy phases of the posterior), where $E(R,\beta) = 0$; (ii) $R > H(X|Y)$ and $\beta \ge 1$, where $E(R,\beta) = E(R,\infty)$ (first ferromagnetic sub–phase); and (iii) $R > H(X|Y)$ and $\Gamma^{-1}(R) \le \beta < 1$, where $0 < E(R,\beta) < E(R,\infty)$ is monotonically non–decreasing both in $\beta$ and $R$.
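To make the optimization in (59)–(60) concrete, here is a brute–force sketch (my own addition: coarse grids, binary alphabets, illustrative double–BSS parameters; a crude numerical illustration rather than a faithful evaluation of Theorem 1):

```python
import itertools
import numpy as np

q, R, beta = 0.1, 0.45, 1.0                       # illustrative: R > H(X|Y) ~ 0.325 nats
P = np.array([[0.5 * (1 - q), 0.5 * q],
              [0.5 * q, 0.5 * (1 - q)]])          # P(x,y)
logP = np.log(P)
grid = np.linspace(0.001, 0.999, 40)
h = lambda p: -p * np.log(p) - (1 - p) * np.log(1 - p)

def A(Q):
    # eq. (59): grid search over Q_{X'|Y}, parameterized by (Q(0|y=0), Q(0|y=1))
    Qy = Q.sum(axis=0)                            # y-marginal of Q_XY
    eps_Q = (Q * logP).sum()                      # eps(Q_XY), eq. (58)
    best = np.inf
    for a, b in itertools.product(grid, grid):
        H = Qy[0] * h(a) + Qy[1] * h(b)           # H_Q(X'|Y)
        eps = (Qy[0] * (a * logP[0, 0] + (1 - a) * logP[1, 0]) +
               Qy[1] * (b * logP[0, 1] + (1 - b) * logP[1, 1]))   # eps(Q_{X'Y})
        if eps + max(H - R, 0.0) / beta >= eps_Q:
            best = min(best, max(R - H, 0.0))
    return best

best = np.inf
for q00, q01, q10 in itertools.product(grid[::4], repeat=3):   # coarse outer grid over Q_XY
    q11 = 1.0 - q00 - q01 - q10
    if q11 <= 0:
        continue
    Q = np.array([[q00, q01], [q10, q11]])
    D = (Q * (np.log(Q) - logP)).sum()            # D(Q_XY || P), eq. (61)
    best = min(best, D + A(Q))
print("E(R, beta) ~", round(best, 4))             # coarse estimate of eq. (60)
```

The choice $Q_{X'|Y} = Q_{XY}$'s own conditional always satisfies the constraint in (59), so the inner minimum is never over an empty set.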
The phase diagram of $E(R,\beta)$ is depicted in Fig. 3. As can be seen, it is related to the phase diagram of the finite–temperature partition function, but somewhat different. Here the paramagnetic and the glassy phases are united (in both of them $E(R,\beta) = 0$), but the ferromagnetic phase is subdivided into two new phases, as described above.

[Figure 3: Phase diagram of $E(R,1/T)$ in the plane of the decoding temperature $T$ vs. the coding rate $R$: $E(R,\beta) = 0$ to the left of $R = H(X|Y)$ and above the curve $T = 1/\Gamma^{-1}(R)$; $E(R,\beta) = E(R,\infty)$ for $R > H(X|Y)$ and $T \le 1$; and $0 < E(R,\beta) < E(R,\infty)$ in between; the $R$ axis extends up to $\ln|\mathcal{X}|$.]

Remark 2. In order to analyze the performance of other decoding metrics that depend on $(\mathbf{x},\mathbf{y})$ only via their joint type, one should simply replace the definition of $\epsilon(Q_{XY})$ by the corresponding metric, for example, a mismatched metric $\epsilon(Q_{XY}) = \sum_{x,y} Q_{XY}(x,y)\ln\tilde{P}(x,y)$, or the minimum conditional entropy metric $\epsilon(Q_{XY}) = -H_Q(X|Y)$. Concerning the latter, the phase diagram of the error exponent will be based on Fig. 2, in the same way that the phase diagram of Fig. 3 is based on the phase diagram of Fig. 1. Here, however, there is no apparent subdivision of the ferromagnetic phase, since the relevant boundary coincides with the horizontal line $T = 1$. In other words, there are just two phases, one where $E(R,\beta) > 0$ and one where $E(R,\beta) = 0$.

Proof of Theorem 1. The proof is similar to the proof of Theorem 1 in [11]. For a given $(\mathbf{x},\mathbf{y})\in\mathcal{X}^N\times\mathcal{Y}^N$ and a given joint probability distribution $Q_{X'Y}$ on $\mathcal{X}\times\mathcal{Y}$, let $\Omega(Q_{X'Y})$ denote the number of $\{\mathbf{x}'\}$ within the bin of $\mathbf{x}$, such that $x_1' \ne x_1$ and such that the joint empirical distribution with $\mathbf{y}$ is given by $Q_{X'Y}$, that is,
\[ \Omega(Q_{X'Y}) = \sum_{\mathbf{x}':\ x_1'\ne x_1} \mathcal{I}\{(\mathbf{x}',\mathbf{y})\in\mathcal{T}(Q_{X'Y})\}\cdot\mathcal{I}[f(\mathbf{x}') = f(\mathbf{x})]. \tag{64} \]
For a given $(\mathbf{x},\mathbf{y})$, the BER is first calculated w.r.t. the randomness of the bins of codewords with $x_1' \ne x_1$, but for a given binning of those with $x_1' = x_1$. We henceforth denote $\mathcal{C}_0 = \{\mathbf{x}':\ x_1' = x_1,\ f(\mathbf{x}') = f(\mathbf{x})\}$ and $\mathcal{C}_1 = \{\mathbf{x}':\ x_1'\ne x_1,\ f(\mathbf{x}') = f(\mathbf{x})\}$.

For a given $\mathcal{C}_0$ and $(\mathbf{x},\mathbf{y})$, let
\[ r \triangleq \frac{1}{N}\ln\sum_{\mathbf{x}'\in\mathcal{C}_0} P^\beta(\mathbf{x}',\mathbf{y}), \tag{65} \]
and so, the BER becomes $\Pr\{\sum_{\mathbf{x}'\in\mathcal{C}_1} P^\beta(\mathbf{x}',\mathbf{y}) \ge e^{Nr}\}$, where it is kept in mind that $r$ is a function of $\mathcal{C}_0$ and $(\mathbf{x},\mathbf{y})$. Now,
\begin{align*}
\Pr\left\{\sum_{\mathbf{x}'\in\mathcal{C}_1} P^\beta(\mathbf{x}',\mathbf{y}) \ge e^{Nr}\right\} &= \Pr\left\{\sum_{Q_{X'|Y}} \Omega(Q_{X'Y})e^{N\beta\epsilon(Q_{X'Y})} \ge e^{Nr}\right\} \tag{66}\\
&\doteq \Pr\left\{\max_{Q_{X'|Y}} \Omega(Q_{X'Y})e^{N\beta\epsilon(Q_{X'Y})} \ge e^{Nr}\right\} \tag{67}\\
&= \Pr\bigcup_{Q_{X'|Y}}\left\{\Omega(Q_{X'Y})e^{N\beta\epsilon(Q_{X'Y})} \ge e^{Nr}\right\} \tag{68}\\
&\doteq \sum_{Q_{X'|Y}} \Pr\left\{\Omega(Q_{X'Y})e^{N\beta\epsilon(Q_{X'Y})} \ge e^{Nr}\right\} \tag{69}\\
&\doteq \max_{Q_{X'|Y}} \Pr\left\{\Omega(Q_{X'Y}) \ge e^{N[r-\beta\epsilon(Q_{X'Y})]}\right\}. \tag{70}
\end{align*}
Now, for a given $Q_{X'|Y}$, $\Omega(Q_{X'Y})$ is a binomial random variable with an exponential number, $e^{NH_Q(X'|Y)}$, of trials and probability of `success' $e^{-NR}$.
Thus, similarly as in [11] and [17], a standard large deviations analysis yields
\[ \Pr\left\{\Omega(Q_{X'Y}) \ge e^{N[r-\beta\epsilon(Q_{X'Y})]}\right\} \doteq e^{-NE(r,\beta,R,Q_{X'Y})}, \tag{71} \]
where
\begin{align*}
E(r,\beta,R,Q_{X'Y}) &= \begin{cases} [R - H_Q(X'|Y)]_+ & \beta\epsilon(Q_{X'Y}) \ge r \\ 0 & \beta\epsilon(Q_{X'Y}) < r,\ \beta\epsilon(Q_{X'Y}) \ge r - H_Q(X'|Y) + R \\ \infty & \beta\epsilon(Q_{X'Y}) < r,\ \beta\epsilon(Q_{X'Y}) < r - H_Q(X'|Y) + R \end{cases} \\
&= \begin{cases} [R - H_Q(X'|Y)]_+ & \beta\epsilon(Q_{X'Y}) \ge r - [H_Q(X'|Y) - R]_+ \\ \infty & \beta\epsilon(Q_{X'Y}) < r - [H_Q(X'|Y) - R]_+ \end{cases} \tag{72}
\end{align*}
Therefore, $\max_{Q_{X'|Y}} \Pr\{\Omega(Q_{X'Y}) \ge e^{N[r-\beta\epsilon(Q_{X'Y})]}\}$ decays according to
\[ E(r,\beta,R,Q_Y) = \min_{Q_{X'|Y}} E(r,\beta,R,Q_{X'Y}), \]
which is given by
\[ E(r,\beta,R,Q_Y) = \min\left\{[R - H_Q(X'|Y)]_+:\ \beta\epsilon(Q_{X'Y}) + [H_Q(X'|Y) - R]_+ \ge r\right\}, \tag{73} \]
with the understanding that the minimum over an empty set is defined as infinity.
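The binomial large–deviations behavior behind (71)–(72) can be seen in a quick Monte Carlo (my own addition, with illustrative exponents; this only probes the zero/infinity dichotomy around the typical exponent $H - R$, not the $[R - H]_+$ branch):

```python
import numpy as np

rng = np.random.default_rng(1)
N, H, R = 40, 0.5, 0.3                          # exponents in nats, with H > R
trials, p = int(np.exp(N * H)), np.exp(-N * R)  # Omega ~ Binomial(e^{NH}, e^{-NR})

for b in (0.15, 0.25):                          # typical exponent of Omega is H - R = 0.2
    thr = np.exp(N * b)
    freq = (rng.binomial(trials, p, size=2000) >= thr).mean()
    print(f"b = {b}: Pr(Omega >= e^(N b)) ~ {freq:.3f}  (~1 if b < H-R, ~0 if b > H-R)")
```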
(78)Let us now find what is the minimum value of t for which the value of this indicator function isunity. The condition is equivalent tomax Q X ′| Y min ≤ a ≤ { H Q ( X ′ | Y ) − a [ t − βǫ ( Q X ′ Y )] } ≤ R (79)22r: ∀ Q X ′ | Y ∃ ≤ a ≤ H Q ( X ′ | Y ) − a [ t − βǫ ( Q X ′ Y )] ≤ R, (80)which can also be written as ∀ Q X ′ | Y ∃ ≤ a ≤ t ≥ βǫ ( Q XY ) + H Q ( X ′ | Y ) − Ra (81)or equivalently, t ≥ max Q X ′| Y min ≤ a ≤ (cid:20) βǫ ( Q X ′ Y ) + H Q ( X ′ | Y ) − Ra (cid:21) (82)= max Q X ′| Y " βǫ ( Q X ′ Y ) + ( H Q ( X ′ | Y ) − R H Q ( X ′ | Y ) − R ≥ −∞ H Q ( X ′ | Y ) < R (83)= max { Q X ′| Y : R ≤ H Q ( X ′ | Y ) } [ βǫ ( Q X ′ Y ) + H Q ( X ′ | Y )] − R (84) ∆ = r ( Q Y ) . (85)It is easy to check that E ( t, β, R, Q Y ) vanishes for t ≤ r ( Q Y ). Thus, in summary, we havePr e nt ≤ X Q X ′| Y Ω ( Q X ′ Y ) e Nβǫ ( Q X ′ Y ) ≤ e n ( t + ǫ ) · = ( t < r ( Q Y ) − ǫe − nE ( t,β,R,Q Y ) t ≥ r ( Q Y ) (86)Therefore, for a given ( x , y ), the expected error probability w.r.t. the randomness of the binningat C yields P e ( x , y ) = E { e − N [ E ( r,β,R,Q Y ) | X = x , Y = y } (87) ≤ X i Pr e Niδ ≤ X Q X ′| Y Ω ( Q X ′ Y ) e Nβǫ ( Q X ′ Y ) ≤ e N ( i +1) δ ) × exp {− N E (max { iδ, βǫ ( Q XY ) } , β, R, Q Y ) } (88) · ≤ X i ≥ r ( Q Y ) /δ exp {− N E ( iδ, β, R, Q Y ) } × exp {− N E (max { iδ, βǫ ( Q XY ) } , β, R, Q Y ) } , (89)where the expression max { iδ, βǫ ( Q XY ) } in the argument of E ( · , Q Y ) is due to the fact that r = 1 N ln e Nβǫ ( Q XY ) + X Q X ′| Y Ω ( Q X ′ Y ) e Nβǫ ( Q X ′ Y ) (90) ≥ N ln h e Nβǫ ( Q XY ) + e Niδ i (91)23 = max { iδ, βǫ ( Q XY ) } . (92)By using the fact that δ is arbitrarily small, we obtain P e ( x , y ) · = exp {− N E (max { r ( Q Y ) , βǫ ( Q XY ) } , β, R, Q Y ) } = exp {− N max { E ( r ( Q Y ) , β, R, Q Y ) , E ( βǫ ( Q XY ) , β, R, Q Y ) } = exp {− N E ( βǫ ( Q XY ) , β, R, Q Y ) } (93)since the dominant contribution to the sum over i is due to the term i = r ( Q Y ) /δ (by the non–decreasing monotonicity of the function E ( · , Q Y )). After averaging w.r.t. ( X , Y ), we obtain E ( R, β ) = min Q XY { D ( Q XY k P ) + E ( βǫ ( Q XY ) , β, R, Q Y ) } (94)= min Q XY { D ( Q XY k P ) + A ( Q XY , R, β ) } , (95)completing the proof of Theorem 1. References [1] I. Csiszár, “Linear codes for sources and source networks: error exponents, universal coding,” IEEE Trans. Inform. Theory , vol. IT–28, no. 4, pp. 585–592, July 1982.[2] I. Csiszár and J. Körner, “Towards a general theory of source networks,” IEEE Trans. Inform.Theory , vol. IT–26, no. 2, pp. 155–165, March 1980.[3] I. Csiszár and J. Körner, “Graph decomposition: a new key to coding theorems,” IEEE Trans.Inform. Theory , vol. IT–27, no. 1, pp. 5–12, January 1981.[4] I. Csiszár, J. Körner, and K. Marton, “A new look at the error exponent of a discrete mem-oryless channel,” Proc. ISIT ‘77 , p. 107 (abstract), Cornell University, Itacha, New York,U.S.A., 1977.[5] R. G. Gallager, Information Theory and Reliable Communication IEEE Trans. Inform. Theory , vol. 57, no. 9, pp. 5615–5633, September 2011.248] N. Merhav, “Statistical physics and information theory,” Foundations and Trends in Com-munications and Information Theory , vol. 6, nos. 1–2, pp. 1–212, 2009.[9] N. Merhav, “Relations between random coding exponents and the statistical physics of ran-dom codes,” IEEE Trans. Inform. Theory , vol. 55, no. 1, pp. 83–92, January 2009.[10] N. Merhav, “Erasure/list exponents for Slepian–Wolf decoding,” IEEE Trans. Inform. The-ory , vol. 60, no. 8, pp. 4463–4471, August 2014.[11] N. 
Merhav, ``Exact random coding exponents of optimal bin index decoding,'' IEEE Trans. Inform. Theory, vol. 60, no. 10, pp. 6024–6031, October 2014.

[12] N. Merhav, ``Erasure/list exponents for Slepian–Wolf decoding,'' IEEE Trans. Inform. Theory, vol. 60, no. 8, pp. 4463–4471, August 2014.

[13] Y. Oohama and T. S. Han, ``Universal coding for the Slepian–Wolf data compression system and the strong converse theorem,'' IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 1908–1919, November 1994.

[14] M. Mézard and A. Montanari, Information, Physics and Computation, Oxford University Press, 2009.

[15] P. Ruján, ``Finite temperature error–correcting codes,'' Phys. Rev. Lett., vol. 70, no. 19, pp. 2968–2971, May 1993.

[16] D. Slepian and J. K. Wolf, ``Noiseless coding of correlated information sources,'' IEEE Trans. Inform. Theory, vol. IT–19, no. 4, pp. 471–480, July 1973.

[17] N. Weinberger and N. Merhav, ``Codeword or noise? Exact random coding exponents for joint detection and decoding,'' IEEE Trans. Inform. Theory, vol. 60, no. 9, pp. 5077–5094, September 2014.

[18] N. Weinberger and N. Merhav, ``Optimum trade–off between the error exponent and the excess–rate exponent of variable–rate Slepian–Wolf coding,'' submitted to IEEE Trans. Inform. Theory, January 2014. Available on–line at: http://arxiv.org/pdf/1401.0892.pdf