(ε, n) Fixed-Length Strong Coordination Capacity
Giulia Cervia, Tobias J. Oechtering, and Mikael Skoglund
Abstract
This paper investigates the problem of synthesizing joint distributions in the finite-length regime. For a fixed blocklength n and an upper bound ε on the distribution approximation, we prove a capacity result for fixed-length strong coordination. It is shown analytically that the rate conditions for the fixed-length regime are lower-bounded by the mutual information that appears in the asymptotical condition plus Q⁻¹(ε)√(V/n), where V is the channel dispersion and Q⁻¹ is the inverse of the complementary Gaussian cumulative distribution function.

I. INTRODUCTION
The problem of cooperation among autonomous devices in a decentralized network, initially raised in the context of game theory by [1], with applications, for instance, to power control [2], is concerned with communication networks beyond the traditional problem of reliable communications. The goal of coordination is in fact to go beyond the classical problem of reliably conveying information, and to characterize the set of target joint probability distributions that are implementable by a choice of strategy of the agents. Coordination is then intended as a way of enforcing a prescribed joint behavior of the devices through communication, by synthesizing joint distributions that approximate a target behavior [3, 4]. This topic, presented as "channel simulation", is related to earlier work on "Shannon's reverse coding theorem" [5] and to the compression of probability distribution sources and mixed quantum states [6-8].

The information-theoretic framework for coordination in networks considered in the present paper has been introduced in [3, 4, 9], where two metrics to measure the level of coordination have been defined: empirical coordination, which requires the empirical distribution of the distributed random states to approach a target distribution with high probability, and strong coordination, which requires the L¹ distance between the distribution of sequences of distributed random states and an i.i.d. target distribution to converge to zero. Strong and empirical coordination in the asymptotical regime have been studied in a number of works, namely [4, 9-23], but until [24] there was no attempt to tackle the problem of coordination in the finite-length regime. However, in many realistic systems of interest, non-asymptotic information-theoretic limits are of high practical interest.
Originally raised by [25], and following [26, 27] and more recently [28-30], an increasingly large number of papers have brought up finite-blocklength information theory limits (see for instance [31-37]), tackling the question of whether the asymptotical results are well-suited to estimate the finite-blocklength problems.

Specifically for the coordination problem, [24] drops the simplifying assumption that allows the blocklength to grow indefinitely, and focuses on the trade-offs achievable in the finite-blocklength regime. The notion of (ε, n) fixed-length strong coordination demands that, for a fixed codelength n and a given ε, the L¹ distance between the distribution of sequences of distributed random states and an i.i.d. target distribution is upper-bounded by ε [24]. In a first attempt to derive a capacity region, [24] presents an inner bound for a two-node network comprised of an information source and a noisy channel, in which both nodes have access to a common source of randomness. Even though the achievability scheme outlined in [24] is general enough to be applied to more sophisticated network topologies, deriving an outer bound for the region is a difficult problem even with the less stringent constraint of the asymptotical regime [23]. Hence, to prove an outer bound as well as an inner bound, in this paper we look at the simplest setting for which the strong coordination problem has been solved in the asymptotical regime [4]. Thus, we consider the point-to-point setting comprised of an information source, a rate-limited error-free link, and a uniform source of common randomness available at the encoder and the decoder, as depicted in Figure 1.

This work was supported in part by the Swedish Foundation for Strategic Research, the Swedish Research Council, and Digital Futures. G. Cervia is with IMT Lille Douai, Institut Mines-Télécom, Univ. Lille, Centre for Digital Systems, F-59000 Lille, France (email: [email protected]), and was with the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden. T. J. Oechtering and M. Skoglund are with the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden (email: {oech, skoglund}@kth.se).

Figure 1. Coordination of U^n and V^n for a two-node network with an error-free link of rate R.

We present both an inner and an outer bound for the (ε, n) fixed-length strong coordination capacity region. In particular, while the inner bound exploits the achievability approach of [24], we delineate an outer bound proof that profits from Neyman-Pearson theory and hypothesis testing techniques [30, 34, 38, 39], does not rely merely on the characteristics of the chosen network, and should therefore be well-suited for generalization to different scenarios. Interestingly, for fixed blocklength n and bound ε on the L¹ distance, we find rate constraints of the form

  rate ≥ mutual information (constraint of the asymptotical case) + Q⁻¹(ε + O(1/√n)) √(V/n) + O(log n / n),   (1)

where the last two terms form the approximation term, V, referred to as channel dispersion, is a characteristic of the "test channel" that connects the random variables of the problem (U and V in the setting of Fig. 1) with an auxiliary random variable representing the codebook, and Q⁻¹ is the inverse of the Gaussian tail function Q. The approximation term is the same recovered by [30, 33, 34] for channel coding and compression in the finite-length regime.
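To make the structure of (1) concrete, the following sketch computes the mutual information I(U;W), the dispersion V = Var(ı(U;W)), and the resulting finite-length rate for a hypothetical binary symmetric test channel. All numerical values are illustrative assumptions, not taken from the paper; Q⁻¹ is evaluated through the standard normal quantile function.

```python
from math import log2, sqrt
from statistics import NormalDist

# Hypothetical toy test channel: U ~ Bernoulli(1/2), W = U through a
# binary symmetric channel with crossover 0.1 (illustrative values only).
p_u = [0.5, 0.5]
p_w_given_u = [[0.9, 0.1], [0.1, 0.9]]

p_uw = [[p_u[u] * p_w_given_u[u][w] for w in (0, 1)] for u in (0, 1)]
p_w = [sum(p_uw[u][w] for u in (0, 1)) for w in (0, 1)]

# Information density i(u; w) = log P(u, w) / (P(u) P(w)), in bits.
i_dens = [[log2(p_uw[u][w] / (p_u[u] * p_w[w])) for w in (0, 1)] for u in (0, 1)]

mi = sum(p_uw[u][w] * i_dens[u][w] for u in (0, 1) for w in (0, 1))       # I(U;W)
disp = sum(p_uw[u][w] * (i_dens[u][w] - mi) ** 2
           for u in (0, 1) for w in (0, 1))                               # dispersion V

def q_inv(eps):
    # Q^{-1}(eps): inverse of the standard normal tail function.
    return NormalDist().inv_cdf(1 - eps)

eps = 0.01
for n in (100, 1000, 10000):
    print(n, mi + q_inv(eps) * sqrt(disp / n))   # finite-length rate bound
```

For this toy channel the second-order term dominates at short blocklengths and decays as 1/√n, which is exactly the trade-off that (1) quantifies.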
Since the approximation term vanishes as n increases, we also recover the capacity result of asymptotical strong coordination [4], therefore answering the standard question in information theory of how the asymptotic limits relate to their fixed-blocklength counterparts. We recall that the best known fixed-length capacity results for channel and source coding [30, 33, 34] are also of the form

  rate constraint of the asymptotical case + Q⁻¹(ε) √(V/n) + O(log n / n),

where ε represents, respectively, the maximal probability of error and the upper bound on distortion, meaning that the coordination problem has the same order of approximation as in [30, 33, 34]. Then, even though the fundamental trade-off of (1) was expected, through (1) we make the trade-off more explicit, and we prove that high coordination rate and common randomness are needed to shorten the blocklength and approximate the distribution with higher precision. Not only is a closer approximation of the target distribution more expensive in terms of rate, but the shape of these rate constraints shows the direct dependence of the rate on the "level of coordination" ε inside the approximation term. Similarly, [30, 33, 34] prove the same relation between the rate constraint and the probability of coding error and the level of distortion, respectively.

A. Contributions
The main contributions of this paper are the following.
Problem formulation:
We state the definition of (ε, n) fixed-length strong coordination as introduced in [24]: for a given blocklength n and bound ε on the L¹ distance, an i.i.d. distribution is achievable for strong coordination in the non-asymptotic regime if we can approximate it through the coding process up to a margin ε. Similarly to [4], we investigate the fixed-length strong coordination region R(ε, n) for the simplest point-to-point setting comprised of an i.i.d. source and a noiseless link, in which encoder and decoder share a source of common randomness. The characterization of the coordination region involves two stages, achievability and converse, detailed in the following paragraphs. As in [30, 33], the key step of each stage consists in identifying a sequence of independent random variables to which we apply the Berry-Esseen Central Limit Theorem.

Inner bound:
Theorem 2 presents sufficient conditions for achievability for non-asymptotic strong coordination. Following the approach designed by [24], we use the random binning approach inspired by [40, 41] to design a random binning and a random coding scheme which are close in L¹ distance. By combining the finite-length techniques of [30, 33] and the Berry-Esseen Central Limit Theorem with the properties of random binning [41], we derive an inner bound on the rate conditions that guarantees coordination at a given arbitrary blocklength n. Interestingly, the rate constraints, proved in Section IV, are consistent with the inner bound for the fixed-length strong coordination region of a two-node network with a noisy link [24, Theorem 1].

Outer bound:
In Theorem 3 we derive the outer bound for the same setting as Theorem 2. The proof involves a meta-converse, an approach which has proved to be optimal in several scenarios [42, 43]. The meta-converse exploits results from Neyman-Pearson hypothesis testing, similarly to [30, 34, 38, 39]. Starting from sequences which are by assumption generated with a distribution close to i.i.d., we consider a randomized test between this distribution and an i.i.d. one. Then, the probabilities of false positive and missed detection of the test can be bounded using the Berry-Esseen Central Limit Theorem and the assumption that strong coordination holds, leading to rate conditions that asymptotically match the achievability. Furthermore, by analyzing the two main results, we derive a closed result on the capacity region in Remark 1.
Comparison with lossy compression:
Since coordination is conceptually related to source coding, we compare our results with the non-asymptotic fundamental limits of lossy data compression [33, 34, 44]. However, coordination and lossy source coding involve different metrics, so we need to adapt the formulation of compression to derive analogies between the two problems.
Discussion of the result:
We discuss the new non-asymptotic results by looking at the trade-off between the rate required to achieve strong coordination and the threshold ε which measures the "level of coordination".
B. Organization of the paper
The remainder of the paper is organized as follows. Section II introduces the notation and the model, and recalls the asymptotical result for the strong coordination region derived in [4]. In Section III we present the information-theoretic modelling of fixed-length strong coordination together with the main results of this paper: an inner and an outer bound for fixed-length strong coordination. The inner bound is proved in Section IV, while the proof of the outer bound is given in Section V. Finally, the result is further analyzed in Section VI by comparing it to fixed-length lossy compression [33], and by studying the trade-off between the rate constraint and the level of coordination, measured by the bound on the L¹ distance.

II. SYSTEM MODEL AND BACKGROUND
We begin by reviewing the main concepts used in this paper.
A. Notation and preliminary results
We define the integer interval ⟦a, b⟧ as the set of integers from a to b. Given a random vector X^n := (X_1, ..., X_n), we denote by x ∈ X^n a realization of X^n, and x_i ∈ X is the i-th component of x. We use the notation ‖·‖₁ and D(·‖·) to denote the L¹ distance and the Kullback-Leibler (KL) divergence respectively. We write x ∼ y if x is proportional to y. Finally, for a set X, we denote by Q_X the uniform distribution over X. We now recall some useful definitions and results.

Definition 1:
Given A generated according to P_A and (A, B) generated according to P_AB:
• Information (or entropy density): h_{P_A}(a) := log 1/P_A(a);
• Conditional information: h_{P_{A|B}}(a|b) := log 1/P_{A|B}(a|b);
• Information density: ı_{P_AB}(a; b) := log [ P_AB(a, b) / (P_A(a) P_B(b)) ].
Whenever the underlying distribution is clear from the context, we drop the subscript from h(·) and ı(·; ·).

Lemma 1 (Properties of the L¹ distance and the KL divergence):
(i) ‖P_A − P̂_A‖₁ ≤ ‖P_AB − P̂_AB‖₁, see [3, Lemma 16];
(ii) ‖P_A − P̂_A‖₁ = ‖P_A P_{B|A} − P̂_A P_{B|A}‖₁, see [3, Lemma 17];
(iii) if ‖P_A P_{B|A} − P'_A P'_{B|A}‖₁ = ε, then there exists a ∈ A such that ‖P_{B|A=a} − P'_{B|A=a}‖₁ ≤ 2ε, see [45, Lemma 4].

Definition 2:
A coupling of two probability mass functions P_A and P_{A'} on A is any probability mass function P̂_{AA'} defined on A × A whose marginals are P_A and P_{A'}.

Proposition 1 (Coupling property [46, I.2.6]):
Given A generated according to P_A and A' generated according to P_{A'}, any coupling P̂_{AA'} of P_A and P_{A'} satisfies ‖P_A − P_{A'}‖₁ ≤ 2 P̂_{AA'}{A ≠ A'}.
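As a quick numerical illustration of Proposition 1 (on arbitrarily chosen toy pmfs), the following sketch builds the maximal coupling of two distributions, the coupling that puts as much mass as possible on the diagonal {A = A'}; for it, the bound ‖P_A − P_{A'}‖₁ ≤ 2 P̂{A ≠ A'} is met with equality.

```python
from itertools import product

# Arbitrary toy pmfs on a three-letter alphabet (illustrative values only).
P  = {0: 0.5, 1: 0.3, 2: 0.2}
Pp = {0: 0.2, 1: 0.5, 2: 0.3}

l1 = sum(abs(P[a] - Pp[a]) for a in P)          # L1 distance

# Maximal coupling: diagonal mass min(P, P'), residuals spread off-diagonal.
diag = {a: min(P[a], Pp[a]) for a in P}
slack = 1.0 - sum(diag.values())                 # = Pr{A != A'} under it
res_p  = {a: P[a]  - diag[a] for a in P}
res_pp = {a: Pp[a] - diag[a] for a in P}
coupling = {(a, a): diag[a] for a in P}
for a, b in product(P, P):
    if a != b and slack > 0:
        coupling[(a, b)] = coupling.get((a, b), 0) + res_p[a] * res_pp[b] / slack

# Its marginals are P and P', and the coupling bound holds with equality.
pr_diff = sum(m for (a, b), m in coupling.items() if a != b)
print(l1, 2 * pr_diff)   # both ≈ 0.6
```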
Theorem 1 (Berry-Esseen Central Limit Theorem [47, Thm. 2]):
Let n > 0 and let Z_i, i = 1, ..., n, be independent random variables. Then, for any real t,

  | P{ Σ_{i=1}^n Z_i > n(μ_n + t √(V_n/n)) } − Q(t) | ≤ B_n / √n,

where

  μ_n = (1/n) Σ_{i=1}^n E[Z_i],
  V_n = (1/n) Σ_{i=1}^n Var[Z_i],
  T_n = (1/n) Σ_{i=1}^n E[|Z_i − μ_i|³],
  B_n = 6 T_n / V_n^{3/2},

and Q(·) is the tail distribution function of the standard normal distribution.

B. Point-to-point setting
As in [4], we consider two nodes connected by a one-directional error-free link of rate R, sharing a common source of uniformly distributed randomness C defined on C of rate R₀ = (log |C|)/n. At each time i = 1, ..., n, the nodes perform actions U_i and V_i respectively. The sequence U^n is assigned by nature and behaves according to the fixed distribution P̄_{U^n}. The encoder generates a message M defined on M of rate R = (log |M|)/n as a function of U^n and the common randomness C, via the stochastic map f_n : U^n × C → M. The message is sent through the error-free link of rate R, and the sequence V^n is generated through the map g_n : M × C → V^n as a function of the message M and of the common randomness C.

C. Asymptotic case
We recall the definitions of achievability for strong coordination and of the strong coordination region in the asymptotic regime [4] for the setting of Figure 1. Let P_{U^n V^n} be the joint distribution induced by the code (f_n, g_n). A triplet (P̄_UV, R, R₀), composed of the target distribution, the error-free channel rate, and the rate of common randomness, is achievable for strong coordination if

  lim_{n→∞} ‖P_{U^n V^n} − P̄_{UV}^{⊗n}‖₁ = 0.

Then, the strong coordination region is the closure of the set of achievable triplets (P̄_UV, R, R₀) [4]. In more detail, the strong coordination region is characterized in [4, Theorem 10] as follows:

  R_Cuff := { (P̄_UV, R, R₀) :
    P̄_UV = P̄_U P̄_{V|U},
    ∃ W ∈ W generated according to P̄_{W|UV} s.t. P̄_UWV = P̄_U P̄_{W|U} P̄_{V|W},
    R ≥ I(U; W),
    R + R₀ ≥ I(UV; W),
    |W| ≤ |U × V| + 1 }.   (2)

To derive a closed result, an auxiliary random variable W is introduced, which represents a "description" of the source on which the encoder and the decoder have to agree in order to generate a distribution close in L¹ distance to an i.i.d. one.

III. NON-ASYMPTOTIC CASE: DEFINITION AND MAIN RESULTS
We recall the notion of (ε, n) fixed-length strong coordination as introduced in [24].

Definition 3 ((ε, n) Fixed-length strong coordination):
For fixed ε > 0 and n > 0, a triplet (P̄_UV, R, R₀) is (ε, n)-achievable for strong coordination if there exists a code (f_n, g_n) with common randomness rate R₀ such that

  ‖P_{U^n V^n} − P̄_{UV}^{⊗n}‖₁ ≤ ε,

where P_{U^n V^n} is the joint distribution induced by the code. Then, the (ε, n) fixed-length strong coordination region R(ε, n) is the closure of the set of achievable triplets (P̄_UV, R, R₀).

For the setting of Figure 1, the main result of this paper is the following pair of inner and outer bounds for the (ε, n) fixed-length strong coordination region R(ε, n).

Theorem 2 (Inner bound for R(ε, n) – Sufficient Conditions): Let P̄_U be the given source distribution. Then the triplets (P̄_UV, R, R₀) that satisfy the following conditions are achievable for (ε, n) fixed-length strong coordination:

  P̄_UV = P̄_U P̄_{V|U},
  ∃ W ∈ W generated according to P̄_{W|UV}, such that P̄_UWV = P̄_U P̄_{W|U} P̄_{V|W},
  R ≥ I(W; U) + Q⁻¹(ε + O(1/√n)) √(V_{P̄_{W|U}} / n) + O(log n / n),
  R + R₀ ≥ I(W; UV) + Q⁻¹(ε + O(1/√n)) √(V_{P̄_{W|UV}} / n) + O(log n / n),   (3)

where Q(t) = ∫_t^∞ (1/√(2π)) e^{−x²/2} dx is the tail distribution function of the standard normal distribution, and V_{P̄_{W|U}} and V_{P̄_{W|UV}} are the dispersions of the test channels P̄_{W|U} and P̄_{W|UV} respectively, as defined in [30, Thm. 49]:

  V_{P̄_{W|U}} := min_{P̄_{W|U}} Var(ı(W; U) | W) = min_{P̄_{W|U}} Var(ı(W; U)),
  V_{P̄_{W|UV}} := min_{P̄_{W|UV}} Var(ı(W; UV) | W) = min_{P̄_{W|UV}} Var(ı(W; UV)),

where P̄_{W|U} and P̄_{W|UV} are the test channels that connect the random variables U and V of the given setting with the auxiliary random variable W representing the codebook, and the last identification follows from the fact that the channels P̄_{W|U} and P̄_{W|UV} have no cost constraint [48, Section 22.3].

Theorem 3 (Outer bound for R(ε, n) – Necessary Conditions): Let P̄_U be the given source distribution. Then the triplets (P̄_UV, R, R₀) that are achievable for (ε, n) fixed-length strong coordination have to satisfy the following conditions:

  P̄_UV = P̄_U P̄_{V|U},
  ∃ W ∈ W generated according to P̄_{W|UV} such that P̄_UWV = P̄_U P̄_{W|U} P̄_{V|W},
  R ≥ I(W; U) + Q⁻¹(ε + O(1/√n)) √(V_{P̄_{W|U}} / n) + O(1/n),
  R + R₀ ≥ I(W; UV) + Q⁻¹(ε + O(1/√n)) √(V_{P̄_{W|UV}} / n) + O(1/n) + O(ε log(1/ε)),
  |W| ≤ |U × V| + 1.   (4)

Remark 1 (Closed result – Sufficient conditions are also necessary):
Taking a closer look at the rate conditions, we note that combining them with the conditions of the inner bound (3) yields a closed result. In fact, for a given (ε, n) as well as a target distribution P̄_UV, any (R, R₀) that satisfies the conditions in (4) also satisfies the conditions in (3), since

  f(n) = O(log n / n) if ∃ k₁ > 0, ∃ n₁, ∀ n ≥ n₁: |f(n)| ≤ k₁ log n / n,
  g(n) = O(1/n) if ∃ k₂ > 0, ∃ n₂, ∀ n ≥ n₂: |g(n)| ≤ k₂ / n,

and

  |g(n)| ≤ k₂ / n ≤ k₂ log n / n for all n ≥ 3 (so that log n ≥ 1)  ⇒  g(n) = O(log n / n).

Remark 2 (Comparison with the asymptotic case):
We observe the following analogies between the asymptotic and the fixed-length case:
• The decomposition of the target joint distribution is the same (see (2) and (3), (4)).
• Even though necessary and sufficient conditions for (ε, n) strong coordination lead to different rate constraints, we observe that the constant terms are the same in (3) and in (4), whereas the difference is only in the growth rate of two functions of n that go to zero as n increases, although with different speeds. More precisely, both O(log n / n) and O(1/n) vanish when n → ∞; hence the following terms in the inner bound of (3) and in the outer bound of (4),

  O(log n / n) + Q⁻¹(ε + O(1/√n)) √(V_P̄ / n),
  O(1/n) + Q⁻¹(ε + O(1/√n)) √(V_P̄ / n),   (5)

coincide asymptotically with

  Q⁻¹(ε) √(V_P̄ / n).   (6)

Thus the rate conditions of both the inner bound and the outer bound reduce to

  R ≥ I(W; U) + Q⁻¹(ε) √(V_{P̄_{W|U}} / n),
  R + R₀ ≥ I(W; UV) + Q⁻¹(ε) √(V_{P̄_{W|UV}} / n).   (7)

• Perhaps more interestingly, as in [30, 33, 34] for channel coding and compression in the finite-length regime, we observe that by letting n tend to infinity we end up with the same rate conditions as in the asymptotic case (2), therefore reproving the results of [4] with different techniques. In fact, in the asymptotic regime ε vanishes when n → ∞ and, using Q⁻¹(δ) ∼ √(2 log(1/δ)) for small δ,

  Q⁻¹(ε + O(1/√n)) √(V_P̄ / n) ∼ √(2 log(1/O(ε))) √(V_P̄ / n).   (8)

Then if, for example, ε ∼ 1/√n, (8) becomes

  √(V_P̄) Q⁻¹(ε + O(1/√n)) / √n ∼ √(V_P̄) Q⁻¹(O(1/√n)) / √n ∼ √(V_P̄) √(2 log √n) / √n → 0.

Hence, we can recover the asymptotic region of (2) from the fixed-length necessary and sufficient conditions of (3) and (4). Moreover, with this choice the bound ε_Tot on the L¹ distance between the two distributions goes to zero as 1/√n.

IV. INNER BOUND
Outline of the proof of Theorem 2:
The achievability proof is based on the non-asymptotic output statistics of random binning [41] and is decomposed into the following steps:
A. preliminary definitions and results on random binning are recalled;
B. two schemes are defined for a fixed n, a random binning and a random coding scheme; using the properties of random binning, it is possible to derive an upper bound on the L¹ distance between the i.i.d. random binning distribution P^RB and the random coding distribution P^RC, providing a first bound on ‖P^RB − P^RC‖₁; then, a second bound ε_Tot is recovered by reducing the rate of common randomness, to obtain the conditions in (3);
C. the term ε_Tot is analyzed;
D. the rate conditions are summarized.
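Steps B and C repeatedly invoke the Berry-Esseen Central Limit Theorem (Theorem 1). The following self-contained sketch checks the bound numerically in an assumed toy case, i.i.d. Bernoulli summands, where the tail of the sum is exactly binomial.

```python
from math import comb, erfc, sqrt

def Q(t):
    # Tail distribution function of the standard normal distribution.
    return 0.5 * erfc(t / sqrt(2))

p, n, t = 0.3, 400, 1.0                        # illustrative parameters
mu, var = p, p * (1 - p)
third = p * (1 - p) ** 3 + (1 - p) * p ** 3    # E|Z_i - mu|^3 for Bernoulli(p)
B = 6 * third / var ** 1.5                     # Berry-Esseen constant B_n

threshold = n * (mu + t * sqrt(var / n))
exact_tail = sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
                 for k in range(n + 1) if k > threshold)

gap = abs(exact_tail - Q(t))
print(gap, "<=", B / sqrt(n))                  # the CLT gap stays within B_n / sqrt(n)
```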
Remark 3:
Observe that, as we will see in Section IV.B.4, the final bound ε_Tot on the L¹ distance between P^RB and P^RC is worse than the one found in Section IV.B.3. However, by worsening the L¹ distance, we can reduce the rate of common randomness.

A. Preliminaries on random binning
Let A, taking values in A, be partitioned into 2^{nR} bins at random. We denote by φ : A → ⟦1, 2^{nR}⟧, a ↦ k, the realization of such a partition (or binning), and we call K := φ(A) a random binning of A. With a slight abuse of notation, throughout this text we may refer to the map φ as a uniform random binning if φ(A) is a binning of A and the partition into bins is performed uniformly at random through φ.

Now, let the pair (A, B), generated according to P_AB, be a discrete source, and let φ introduced above be a uniform map. We denote the distribution induced by the binning as

  P^RB(a, b, k) := P_AB(a, b) 𝟙{φ(a) = k}.   (9)

The first objective consists in ensuring that the binning is almost uniform and almost independent of the source, so that the random binning scheme and the random coding scheme generate joint distributions that have the same statistics.

Theorem 4 ([41, Thm. 1]):
Given P_AB, for every distribution T_B on B and any γ ∈ ℝ⁺, the marginal of P^RB in (9) satisfies

  E ‖P^RB(b, k) − Q_K(k) P_B(b)‖₁ ≤ ε_App,
  ε_App := P_AB(S_γ(P_AB ‖ T_B)^c) + 2^{−(γ+1)/2},   (10)

where, for a set X, we denote by Q_X the uniform distribution over X, and

  S_γ(P_AB ‖ T_B) := { (a, b) : h_{P_AB}(a, b) − h_{T_B}(b) − nR > γ }.   (11)

With the previous result we measure, in terms of L¹ distance, how well we can approximate a distribution for which the binning of A is uniform and generated independently of B, and we characterize the upper bound ε_App. Intuitively, the approximation error ε_App is small if the number of bins into which we partition A is small enough: the rate nR has to be lower than the conditional information of A given B, with the real number γ allowing some degree of freedom.

Before stating the second property, we introduce the decoder that we will use, sometimes called the mismatched stochastic likelihood coder (SLC) [49, 50].

Definition 4:
Let T_AB be an arbitrary probability mass function, and let φ : A → ⟦1, 2^{nR}⟧, a ↦ k, be a uniform random binning of A. A mismatched SLC is defined by the following induced conditional distribution:

  T̂_{Â|BK}(â | b, k) := T_{A|B}(â | b) 𝟙{φ(â) = k} / Σ_{ā∈A} T_{A|B}(ā | b) 𝟙{φ(ā) = k}.   (12)

Then, the following result is used to bound the error probability of decoding A when the decoder has access to the side information B as well as to the binning φ(A) = K.

Theorem 5 ([41, Thm. 2]):
Given P_AB and any distribution T_AB, the following bound on the error probability of the decoder defined in (12) holds:

  E[P[E]] ≤ P_AB(S_γ(T_AB)^c) + 2^{−γ} =: ε_Dec,   (13)

where γ is an arbitrary positive number and

  S_γ(T_AB) := { (a, b) : nR − h_{T_{A|B}}(a | b) > γ }.   (14)

While we use Theorem 4 at the encoder to ensure that the random binning probability distribution approximates well a random coding process, Theorem 5 is used at the decoder's side to minimize the probability of generating the wrong sequence. In this context, the bound on the error probability ε_Dec is small if the number of bins is large enough: the rate nR has to exceed the conditional information of A given B. Since at the encoder's side we had the opposite requirement, that the rate (or equivalently the number of bins) be small enough, we have to find a compromise between two seemingly competing goals. This issue will be resolved by carefully choosing different side information at the encoder and at the decoder, and by playing with different values of γ.

B. Fixed-length coordination scheme
The encoder and the decoder share a source of uniform randomness C ∈ ⟦1, 2^{nR₀}⟧. Moreover, suppose that the encoder and the decoder have access not only to the common randomness C but also to extra randomness F, where C is generated uniformly at random in ⟦1, 2^{nR₀}⟧ with distribution Q_C, and F is generated uniformly at random in ⟦1, 2^{nR̃}⟧ with distribution Q_F, independently of C. The encoder observes the source U^n, generated according to P̄_{U^n}, and selects a message M of rate R, which is then transmitted through an error-free link to the decoder. Then, the decoder exploits the message and the common randomness to select V^n.

In the rest of this section, we introduce an auxiliary random variable W^n, which is not part of the setting, such that the Markov chain U − W − V holds. This random variable represents the "description" of the source on which the encoder and the decoder have to agree to produce the right distributions. In order to do so, we consider the i.i.d. target distribution, and we define three binnings of W^n, thus inducing a joint distribution on the random variables (U^n, W^n, V^n) and on the binnings, which we call the random binning distribution. Then, we define a random coding distribution, and we use the binning properties of Theorem 4 and Theorem 5 to estimate the L¹ distance between this random coding distribution and the random binning distribution. Since the marginal of the random binning distribution coincides with the target distribution, we can estimate the upper bound ε_Tot on the L¹ distance between the target distribution and the distribution induced by the code. This will be done in two steps: first, we derive an upper bound on the L¹ distance by coordinating the sequences (U^n, W^n, V^n); finally, we see how to reuse Theorem 5 to reduce the amount of common randomness and coordinate U^n and V^n only.
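Before going through the formal scheme, the following sketch illustrates empirically the role played by the binning rate (in the spirit of Theorem 4), under assumed toy parameters: when W^n is binned at a rate below the conditional entropy H(W|U), the bin index is close in L¹ distance to being uniform and independent of U^n; above it, it is not. The blocklength, channel, and rates below are illustrative assumptions.

```python
import random
from itertools import product
from math import prod

random.seed(0)

n = 8                                   # toy blocklength
seqs = list(product((0, 1), repeat=n))  # all binary sequences of length n

# Assumed toy model: U^n uniform, W_i = U_i through a BSC(0.2) test
# channel, so H(W|U) = h2(0.2) ~ 0.72 bits per symbol.
def p_w_given_u(w, u):
    return prod(0.8 if wi == ui else 0.2 for wi, ui in zip(w, u))

def binning_distance(num_bins):
    # One realization of a uniform random binning phi: W^n -> bins, and
    # the L1 distance between P_{U^n K} and P_{U^n} x Uniform(K).
    phi = {w: random.randrange(num_bins) for w in seqs}
    d = 0.0
    for u in seqs:
        pk = [0.0] * num_bins
        for w in seqs:
            pk[phi[w]] += p_w_given_u(w, u)
        d += 2.0 ** -n * sum(abs(x - 1.0 / num_bins) for x in pk)
    return d

d_low = binning_distance(4)    # rate 2/8 bits < H(W|U): nearly uniform
d_high = binning_distance(64)  # rate 6/8 bits > H(W|U): far from uniform
print(d_low, d_high)           # d_low is noticeably smaller than d_high
```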
1) Random binning scheme:
Let P̄_U P̄_{V|U} be the target distribution. We introduce an auxiliary random variable W such that U^n, W^n, and V^n are jointly i.i.d. with distribution

  P̄ := P̄_{U^n} P̄_{W^n|U^n} P̄_{V^n|W^n}.

We consider three uniform random binnings of W^n:
i) binning C = φ_C(W^n), where φ_C : W^n → ⟦1, 2^{nR₀}⟧;
ii) binning M = φ_M(W^n), where φ_M : W^n → ⟦1, 2^{nR}⟧;
iii) binning F = φ_F(W^n), where φ_F : W^n → ⟦1, 2^{nR̃}⟧;
and, inspired by [40, 41, 49], we consider a decoder defined according to (12) that reconstructs Ŵ^n:

  T̂_{Ŵ^n|FCM}(ŵ | f, c, m) := T_{W^n}(ŵ) 𝟙{φ(ŵ) = (f, c, m)} / Σ_{w̄∈W^n} T_{W^n}(w̄) 𝟙{φ(w̄) = (f, c, m)},   (15)

where φ = (φ_C, φ_F, φ_M). This induces the joint distribution

  P^RB := P̄_{U^n} P̄_{W^n|U^n} P̄_{F|W^n} P̄_{C|W^n} P̄_{M|W^n} P̄_{V^n|W^n} T̂_{Ŵ^n|FCM}.   (16)

In particular, P^RB_{W^n|FCU^n} is well defined.
2) Random coding scheme:
The encoder generates W^n according to P^RB_{W^n|FCU^n} defined above. At the decoder, Ŵ^n is generated via the conditional distribution T̂_{Ŵ^n|FCM}. The decoder then generates V^n according to the distribution P^RC_{V^n|Ŵ^n}(v̂ | ŵ) := P̄_{V^n|W^n}(v̂ | ŵ), where ŵ is the output of a decoder defined as in (12). This induces the joint distribution

  P^RC := Q_F Q_C P̄_{U^n} P^RB_{W^n|FCU^n} P̄_{M|W^n} T̂_{Ŵ^n|FCM} P^RC_{V^n|Ŵ^n}.   (17)

Observe that the marginal of the distribution P^RB is by construction trivially close in L¹ distance to the target distribution P̄. We use the properties of random binning to show that the random binning distribution P^RB and the random coding distribution P^RC are ε-close in L¹ distance, and therefore so are the marginals of P^RC and P̄.
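A minimal sketch of how a mismatched SLC as in (12) operates and how its error probability can be estimated by simulation; the alphabet, the scoring distribution T, the binning phi, and all numerical values below are hypothetical toy assumptions.

```python
import random

rng = random.Random(1)

def slc_decode(t_a_given_b, phi, b, k, rng):
    # Draw a_hat proportionally to T(a|b) restricted to the bin {a: phi(a)=k}.
    weights = {a: p for a, p in t_a_given_b[b].items() if phi[a] == k}
    total = sum(weights.values())
    r, acc = rng.random() * total, 0.0
    for a, p in weights.items():
        acc += p
        if r <= acc:
            return a
    return a                      # guard against floating-point round-off

# Toy scoring distribution T(a|b) over A = {0,1,2,3}, and one binning
# of A into two bins (illustrative values only).
T = {0: {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1},
     1: {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}}
phi = {0: 0, 1: 1, 2: 0, 3: 1}

# Nature draws a ~ T(.|b); the decoder only sees (b, phi(a)).
trials, errors = 10000, 0
for _ in range(trials):
    b = rng.randrange(2)
    a = rng.choices(list(T[b]), weights=list(T[b].values()))[0]
    errors += slc_decode(T, phi, b, phi[a], rng) != a
print("empirical error probability:", errors / trials)
```

With a finer binning (more bins, i.e., a higher rate) each bin contains fewer candidate sequences and the decoding error shrinks, which is the trade-off discussed after Theorem 5.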
3) Strong coordination of (U^n, V^n, W^n) — Initial bound: By applying Theorem 4 and Theorem 5 to P^RB and P^RC, we have

  ‖P^RB_{U^n W^n CF} − P^RC_{U^n W^n CF}‖₁ (a)= ‖P̄_{U^n} P̄_{W^n|U^n} P̄_{C|W^n} P̄_{F|W^n} − Q_C Q_F P̄_{U^n} P^RB_{W^n|CFU^n}‖₁ ≤ ε_App,1,
  E[P[E]] ≤ ε_Dec,

where (a) comes from Lemma 1 (ii), and

  ε_App,1 := P̄_{U^n CF}(S_{γ₁}^c) + 2^{−(γ₁+1)/2},   (18a)
  ε_Dec := P̄_{W^n}(S_{γ₂}^c) + 2^{−γ₂},   (18b)

with γ₁ and γ₂ arbitrary positive numbers, and

  S_{γ₁} := S_{γ₁}(P̄_{U^n CF} ‖ P̄_{U^n}) = { (u, w) : h_P̄(u, w) − h_P̄(u) − n(R̃ + R₀) > γ₁ },   (19a)
  S_{γ₂} := S_{γ₂}(P̄_{W^n}) = { w : n(R + R₀ + R̃) − h_P̄(w) > γ₂ } (b)= { w : n(R + R₀ + R̃) − Σ_{i=1}^n h_P̄(w_i) > γ₂ },   (19b)

where (b) comes from the choice of the decoder (15). Then, we have

  ‖P^RB_{U^n W^n CF} P̄_{M|W^n} T̂_{Ŵ^n|CFM} − P^RC_{U^n W^n CF} P̄_{M|W^n} T̂_{Ŵ^n|CFM}‖₁ = ‖P^RB_{U^n W^n CFM Ŵ^n} − P^RC_{U^n W^n CFM Ŵ^n}‖₁ ≤ ε_App,1 + ε_Dec.

To conclude, observe that in the random binning scheme V^n is generated according to P̄_{V^n|W^n} with W^n generated according to P̄_{W^n|U^n}, while in the random coding scheme V^n is generated according to P^RC_{V^n|Ŵ^n} with Ŵ^n generated according to T̂_{Ŵ^n|CFM}. Then, by applying the coupling result of Proposition 1, we have

  ‖P^RB − P^RC‖₁ ≤ ε_App,1 + 5 ε_Dec.
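Step (a) above rests on Lemma 1 (ii), and the marginal bounds throughout this section on Lemma 1 (i); the following is a quick numerical check of both properties on arbitrarily chosen toy pmfs.

```python
P_AB = [[0.30, 0.20], [0.10, 0.40]]   # arbitrary joint pmfs on {0,1}^2
Q_AB = [[0.25, 0.15], [0.25, 0.35]]
chan = [[0.7, 0.3], [0.2, 0.8]]       # a channel P_{B|A} appended to both

def l1(p, q):
    return sum(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))

pa = [sum(r) for r in P_AB]
qa = [sum(r) for r in Q_AB]
l1_marg = sum(abs(x - y) for x, y in zip(pa, qa))

# Lemma 1 (i): marginalization can only decrease the L1 distance.
print(l1_marg, "<=", l1(P_AB, Q_AB))

# Lemma 1 (ii): passing both marginals through the SAME channel
# leaves the L1 distance unchanged.
ext_p = [[pa[a] * chan[a][b] for b in range(2)] for a in range(2)]
ext_q = [[qa[a] * chan[a][b] for b in range(2)] for a in range(2)]
print(l1(ext_p, ext_q), "==", l1_marg)
```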
4) Reducing the rate of common randomness — Final bound:
Although in a first instance we have exploited the extra randomness F to coordinate the whole triplet (U^n, V^n, W^n) with maximal L¹ distance ε_App,1 + 5 ε_Dec, we now show that we do not need it in order to coordinate (U^n, V^n) only. As in [45], we can reduce the required amount of common randomness by having the two nodes agree on a suitable realization of the extra randomness F. By fixing F = f, we will reduce the rate requirements at the expense of the upper bound on the L¹ distance, by introducing a third approximation term and ending up with a new upper bound ε_Tot. We now detail under which circumstances such a suitable realization of F exists, and we derive the final upper bound ε_Tot on the L¹ distance. To do so, first we apply Theorem 4 to A = W^n, B = (U^n, V^n), P_B = P^RB_{U^n V^n}, P_AB = P^RB_{U^n V^n W^n} and K = F. Then, we have

  ‖P^RB_{U^n V^n F} − Q_F P^RB_{U^n V^n}‖₁ ≤ ε_App,2,   (20)

where

  ε_App,2 := P^RB(S_{γ₃}^c) + 2^{−(γ₃+1)/2},   (21a)
  S_{γ₃} := S_{γ₃}(P^RB_{U^n V^n W^n} ‖ P^RB_{U^n V^n}) = { (u, v, w) : h_{P^RB}(u, v, w) − h_{P^RB}(u, v) − nR̃ > γ₃ }.   (21b)

Now, we recall that by Lemma 1 (i) we have

  ‖P^RB_{U^n V^n F} − P^RC_{U^n V^n F}‖₁ ≤ ‖P^RB − P^RC‖₁ ≤ ε_App,1 + 5 ε_Dec,   (22)

and combining (20) and (22) with the triangle inequality, we have

  ‖Q_F P^RB_{U^n V^n} − Q_F P^RC_{U^n V^n}‖₁ ≤ ‖P^RB_{U^n V^n F} − Q_F P^RB_{U^n V^n}‖₁ + ‖P^RB_{U^n V^n F} − P^RC_{U^n V^n F}‖₁ ≤ ε_App,2 + ε_App,1 + 5 ε_Dec.

Finally, by Lemma 1 (iii), there exists an instance F = f such that

  ‖P^RB_{U^n V^n | F=f} − P^RC_{U^n V^n | F=f}‖₁ ≤ ε_Tot,   (23a)
  ε_Tot := 2 (ε_App,2 + ε_App,1 + 5 ε_Dec).   (23b)

C. Analysis of the L¹ distance ε_Tot
Here, we take a closer look at the overall $L^1$ distance between the i.i.d. distribution and the random coding one, denoted by $\varepsilon_{Tot}$. First, substituting the explicit expressions for $(\varepsilon_{App}, \varepsilon_{Dec}, \varepsilon_{App,2})$ of (18a), (18b), and (21a) into (23b), the bound in (23a) becomes
$$\varepsilon_{Tot} = 2\,\bar P_{U^n CF}\big(S^c_{\gamma_1}\big) + 10\,\bar P_{W^n}\big(S^c_{\gamma_2}\big) + 2\,P^{RB}\big(S^c_{\gamma_3}\big) + 2\big[2^{-\gamma_1} + 5\cdot 2^{-\gamma_2} + 2^{-\gamma_3}\big]. \qquad (24)$$
By the union bound and De Morgan's law ($S^c_{\gamma_1}\cup S^c_{\gamma_2}\cup S^c_{\gamma_3} = (S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c$), we have
$$\varepsilon_{Tot} \le 10\,\bar P\big((S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c\big) + 2\big[2^{-\gamma_1} + 5\cdot 2^{-\gamma_2} + 2^{-\gamma_3}\big]. \qquad (25)$$
In the next paragraph, we investigate $(S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c$ to understand which rate conditions are dominant to minimize the measure of the set as a function of $\gamma_i$, $i = 1, 2, 3$. In a second instance, we choose the parameters $(\gamma_1, \gamma_2, \gamma_3)$ such that $\varepsilon_{Tot}$ defined above is small.
1) Analysis of $(S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c$: First, we write the set explicitly:
$$(S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c = \big\{(\mathbf u,\mathbf v,\mathbf w):\ h(\mathbf u,\mathbf v,\mathbf w) - h(\mathbf u,\mathbf v) - n\tilde R > \gamma_3,\ h(\mathbf u,\mathbf w) - h(\mathbf u) - n(R_0+\tilde R) > \gamma_1,\ n(R_0+R+\tilde R) - h(\mathbf w) > \gamma_2\big\}^c$$
$$= \big\{(\mathbf u,\mathbf v,\mathbf w):\ h(\mathbf w|\mathbf u\mathbf v) - n\tilde R > \gamma_3,\ h(\mathbf w|\mathbf u) - n(R_0+\tilde R) > \gamma_1,\ n(R_0+R+\tilde R) - h(\mathbf w) > \gamma_2\big\}^c$$
$$= \big\{(\mathbf u,\mathbf v,\mathbf w):\ n\tilde R < h(\mathbf w|\mathbf u\mathbf v) - \gamma_3,\ n(R_0+R) > \imath(\mathbf w;\mathbf u\mathbf v) + \gamma_2 + \gamma_3,\ nR > \imath(\mathbf w;\mathbf u) + \gamma_1 + \gamma_2\big\}^c$$
$$= \big\{n\tilde R < h(\mathbf w|\mathbf u\mathbf v) - \gamma_3\big\}^c \cup \big\{n(R_0+R) > \imath(\mathbf w;\mathbf u\mathbf v) + \gamma_2 + \gamma_3\big\}^c \cup \big\{nR > \imath(\mathbf w;\mathbf u) + \gamma_1 + \gamma_2\big\}^c$$
$$= \big\{(\mathbf u,\mathbf v,\mathbf w):\ n\tilde R \ge h(\mathbf w|\mathbf u\mathbf v) - \gamma_3\big\} \cup \big\{(\mathbf u,\mathbf v,\mathbf w):\ n(R_0+R) \le \imath(\mathbf w;\mathbf u\mathbf v) + \gamma_2 + \gamma_3\big\} \cup \big\{(\mathbf u,\mathbf v,\mathbf w):\ nR \le \imath(\mathbf w;\mathbf u) + \gamma_1 + \gamma_2\big\}. \qquad (26)$$
Now, recall that in Section IV.B.4 the extra common randomness $F$ of rate $\tilde R$ has been fixed to an instance $F = f$. Hence, to minimize the measure of the set $(S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c$, we only need to minimize the second and third terms of (26). We define the sets
$$S_{\imath(\mathbf w;\mathbf u\mathbf v)} := \big\{(\mathbf u,\mathbf v,\mathbf w):\ n(R_0+R) \le \imath(\mathbf w;\mathbf u\mathbf v) + \gamma_2 + \gamma_3\big\}, \qquad (27)$$
$$S_{\imath(\mathbf w;\mathbf u)} := \big\{(\mathbf u,\mathbf v,\mathbf w):\ nR \le \imath(\mathbf w;\mathbf u) + \gamma_1 + \gamma_2\big\}, \qquad (28)$$
and we treat them separately in the following.

a) Analysis of $S_{\imath(\mathbf w;\mathbf u\mathbf v)}$: We observe that, since the distribution $\bar P$ is i.i.d., the terms $Z_i = \imath_{\bar P}(w_i; u_i v_i)$ are mutually independent for $i = 1, \ldots, n$. Then, we consider the following inequality:
$$n(R_0+R) > \underbrace{\sum_{i=1}^n \mathbb E_{\bar P_{WUV}}[\imath_{\bar P}(w_i; u_i v_i)]}_{n\mu_n} + Q^{-1}(\varepsilon_1)\underbrace{\sqrt{\sum_{i=1}^n \mathrm{Var}_{\bar P_{WUV}}(\imath_{\bar P}(w_i; u_i v_i))}}_{n\sqrt{V_n/n}} + \gamma_2 + \gamma_3, \qquad (29)$$
where $\mu_n = \frac1n\sum_{i=1}^n \mathbb E[Z_i]$, $V_n = \frac1n\sum_{i=1}^n \mathrm{Var}[Z_i]$, and $Q(\cdot)$ is the tail distribution function of the standard normal distribution.
We prove that, assuming that (29) holds, we can successfully bound $S_{\imath(\mathbf w;\mathbf u\mathbf v)}$. In fact, the chain of inequalities
$$\sum_{i=1}^n \imath_{\bar P}(w_i; u_i v_i) > n(R_0+R) - (\gamma_2+\gamma_3) > n\Big(\mu_n + t\sqrt{\tfrac{V_n}{n}}\Big)$$
implies that, if (29) holds, $S_{\imath(\mathbf w;\mathbf u\mathbf v)}$ is contained in
$$\Big\{(\mathbf u,\mathbf v,\mathbf w):\ \sum_{i=1}^n \imath_{\bar P}(w_i; u_i, v_i) > n\mu_n + nQ^{-1}(\varepsilon_1)\sqrt{\tfrac{V_n}{n}}\Big\}. \qquad (30)$$
Therefore, if we find an upper bound on (30), we have an upper bound on $S_{\imath(\mathbf w;\mathbf u\mathbf v)}$ as well. To obtain that, we apply Theorem 1 (Berry-Esseen CLT) to the right-hand side of (30), and we choose
$$Q(t) = \varepsilon_1, \qquad \varepsilon_1^* = \varepsilon_1 + \frac{B_n}{\sqrt n}, \qquad (31)$$
where, as in the statement of Theorem 1 (Berry-Esseen CLT), $B_n = 6T_n/V_n^{3/2}$ and $T_n = \frac1n\sum_{i=1}^n \mathbb E[|Z_i - \mu_i|^3]$. Then, we have
$$\Big|\,\mathbb P\Big\{\sum_{i=1}^n \imath_{\bar P}(w_i, u_i, v_i) > n\mu_n + nQ^{-1}(\varepsilon_1)\sqrt{\tfrac{V_n}{n}}\Big\} - \varepsilon_1\Big| \le \frac{B_n}{\sqrt n} \implies \mathbb P\Big\{\sum_{i=1}^n \imath_{\bar P}(w_i, u_i, v_i) > n\mu_n + nQ^{-1}(\varepsilon_1)\sqrt{\tfrac{V_n}{n}}\Big\} \le \varepsilon_1^*. \qquad (32)$$
Finally, (32) combined with (30) implies $\bar P\big(S_{\imath(\mathbf w;\mathbf u\mathbf v)}\big) \le \varepsilon_1^*$. Moreover, we can simplify (29) with the following identifications.

Remark 4 (Mutual Information and Channel Dispersion):
Similarly to [51, Section IV.A], observe that, since the $(u_i, w_i, v_i)$ are generated i.i.d. according to the same distribution $\bar P_{WUV}$, we have
$$\mu_n := \frac1n\sum_{i=1}^n \mathbb E_{\bar P_{WUV}}[\imath_{\bar P}(w_i; u_i, v_i)] = \mathbb E_{\bar P_{WUV}}[\imath_{\bar P}(w; u, v)] = I_{\bar P}(W; UV), \qquad (33a)$$
$$V_n := \frac1n\sum_{i=1}^n \mathrm{Var}_{\bar P_{WUV}}(\imath_{\bar P}(w_i; u_i v_i)) = \mathrm{Var}_{\bar P_{WUV}}(\imath_{\bar P}(W; UV)), \qquad (33b)$$
and $V_{\bar P_{W|UV}} = \min_{\bar P_{W|UV}}\big[\mathrm{Var}_{\bar P_{WUV}}(\imath_{\bar P}(W; UV))\big] = \min_{\bar P_{W|UV}}\big[\mathrm{Var}_{\bar P_{WUV}}(\imath_{\bar P}(W; UV)\,|\,W)\big]$ is the dispersion of the channel $\bar P_{W|UV}$ as defined in [30, Thm. 49]. Then, (29) can be rewritten as
$$n(R_0+R) > nI_{\bar P}(W; UV) + nQ^{-1}(\varepsilon_1)\sqrt{\frac{V_{\bar P_{W|UV}}}{n}} + (\gamma_2+\gamma_3). \qquad (34)$$

b) Analysis of $S_{\imath(\mathbf w;\mathbf u)}$: Similarly, we use Theorem 1 (Berry-Esseen CLT) to estimate $\bar P\big(S_{\imath(\mathbf w;\mathbf u)}\big)$. If we apply the same reasoning to $Z'_i = \imath_{\bar P}(w_i; u_i)$ for $i = 1, \ldots, n$, with $\mu'_n = \frac1n\sum_{i=1}^n \mathbb E[Z'_i] = I(W;U)$, $V'_n = \frac1n\sum_{i=1}^n \mathrm{Var}[Z'_i] = V_{\bar P_{W|U}}$, $T'_n = \frac1n\sum_{i=1}^n \mathbb E[|Z'_i - \mu'_i|^3]$, and $B'_n = 6T'_n/{V'_n}^{3/2}$, we find that $\bar P\big(S_{\imath(\mathbf w;\mathbf u)}\big) \le \varepsilon_2^* = \varepsilon_2 + \frac{B'_n}{\sqrt n}$ if
$$nR > nI_{\bar P}(W; U) + nQ^{-1}(\varepsilon_2)\sqrt{\frac{V_{\bar P_{W|U}}}{n}} + (\gamma_1+\gamma_2). \qquad (35)$$
A more detailed proof can be found in Appendix A.
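To make the identifications of Remark 4 concrete, the sketch below computes the mutual information (mean of the information density) and the dispersion (its variance) for a made-up joint pmf, with $W$ binary and the pair $(U,V)$ collapsed into a single symbol $s$; none of the numbers come from the paper.

```python
# Toy illustration of (33a)-(33b): for i.i.d. pairs, the mean of the
# information density is the mutual information and its variance is the
# dispersion. The joint pmf P(w, s) below is made up for the example.
from math import log2

P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # P(w, s)
Pw = {w: sum(p for (w2, s), p in P.items() if w2 == w) for w in (0, 1)}
Ps = {s: sum(p for (w2, s2), p in P.items() if s2 == s) for s in (0, 1)}

# information density i(w; s) = log2( P(w,s) / (P(w) P(s)) )
dens = {ws: log2(p / (Pw[ws[0]] * Ps[ws[1]])) for ws, p in P.items()}

I = sum(p * dens[ws] for ws, p in P.items())             # I(W;UV) in bits
V = sum(p * (dens[ws] - I) ** 2 for ws, p in P.items())  # dispersion
print(round(I, 4), round(V, 4))   # prints: 0.2781 0.64
```

The two numbers computed here play exactly the roles of $\mu_n$ and $V_n$ in (34).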
2) Choice of $(\gamma_1, \gamma_2, \gamma_3)$ and rate conditions: If we choose $(\gamma_1, \gamma_2, \gamma_3) = (2\log n, \log n, 2\log n)$, then the bound (24) on the $L^1$ distance becomes
$$\|P^{RB}_{U^n V^n} - P^{RC}_{U^n V^n}\| \le \varepsilon_{Tot},$$
$$\varepsilon_{Tot} = 10\,\bar P\big((S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c\big) + 2\big(2^{-\gamma_1} + 5\cdot 2^{-\gamma_2} + 2^{-\gamma_3}\big) = 10\,\bar P\big((S_{\gamma_1}\cap S_{\gamma_2}\cap S_{\gamma_3})^c\big) + \frac{10}{n} + \frac{4}{n^2}$$
$$\le 10\,\bar P\big(S_{\imath(\mathbf w;\mathbf u\mathbf v)}\big) + 10\,\bar P\big(S_{\imath(\mathbf w;\mathbf u)}\big) + \frac{10}{n} + \frac{4}{n^2} \le 10\,(\varepsilon_1^* + \varepsilon_2^*) + \frac{10}{n} + \frac{4}{n^2} = 10\,(\varepsilon_1 + \varepsilon_2) + \frac{10(B_n + B'_n)}{\sqrt n} + \frac{10}{n} + \frac{4}{n^2} = 10\,(\varepsilon_1 + \varepsilon_2) + O\Big(\frac{1}{\sqrt n}\Big). \qquad (36)$$
With this choice for the $\gamma_i$, the rate conditions become
$$R_0 + R > I_{\bar P}(W; UV) + Q^{-1}(\varepsilon_1)\sqrt{\frac{V_{\bar P_{W|UV}}}{n}} + \frac{3\log n}{n}, \qquad R > I_{\bar P}(W; U) + Q^{-1}(\varepsilon_2)\sqrt{\frac{V_{\bar P_{W|U}}}{n}} + \frac{3\log n}{n}. \qquad (37)$$
From now on, we drop the subscript $\bar P$ from $I(\cdot\,;\cdot)$ to simplify the notation.

V. OUTER BOUND
Consider a code $(f_n, g_n)$ that induces a distribution $P_{U^n V^n}$ that is $\varepsilon$-close in $L^1$ distance to the i.i.d. distribution $\bar P^{\otimes n}_{UV}$:
$$\varepsilon_0 := \|P_{U^n W^n V^n} - \bar P^{\otimes n}_U \bar P^{\otimes n}_{W|U} \bar P^{\otimes n}_{V|W}\| \le \varepsilon_{Tot}, \qquad (38a)$$
$$\varepsilon_{Tot} = 10\,(\varepsilon_1 + \varepsilon_2) + O\Big(\frac{1}{\sqrt n}\Big), \qquad (38b)$$
$$0 \le \varepsilon_1, \varepsilon_2 \le 1, \quad \varepsilon_1, \varepsilon_2 \in \mathbb R, \qquad (38c)$$
where the parameters $\varepsilon_{Tot}$, $\varepsilon_1$, and $\varepsilon_2$ in (38b) and (38c) are given in the achievability in Section IV.C, and (38a) is later referred to as the $(\varepsilon, n)$-strong coordination assumption. Furthermore, we identify the auxiliary random variables $W_t$ as $(C, M)$ for each $t \in [\![1, n]\!]$ and $W$ as $(W_T, T) = (C, M, T)$. Note that by definition of the encoder and decoder the following Markov chains hold:
$$U_t - (C, M) - V_t \iff U_t - W_t - V_t, \qquad U_T - (C, M, T) - V_T \iff U - W - V.$$
In the next section, we present a proof of the rate constraints based on a meta-converse, following the approach of [30]. More precisely, we show the bound on the rate $R$, while the proofs for the bound on $R_0 + R$ and the cardinality bound are developed in Appendix D and Appendix E, respectively.

A. First bound – $R$

Similarly to [30], for an observation $\mathbf w = (m, c)$, we define the hypotheses:
$$H_0: \mathbf w \text{ generated according to } P_{W^n}(\mathbf w) = \sum_{\mathbf u} P_{U^n}(\mathbf u) P_{W^n|U^n}(\mathbf w|\mathbf u) = \sum_{\mathbf u} P_{U^n W^n}(\mathbf u, \mathbf w),$$
$$H_1: \mathbf w \text{ generated according to } \bar P^{\otimes n}_W(\mathbf w) = \sum_{\mathbf u} \bar P^{\otimes n}_U(\mathbf u)\,\bar P^{\otimes n}_W(\mathbf w),$$
where $P_{U^n W^n}$ is the coding distribution, $\varepsilon_0$-close to i.i.d. by assumption, and $\bar P^{\otimes n}_{UW}$ is the i.i.d. target distribution. We now carry out a thought experiment: we consider a test with a (possibly) stochastic decision rule between the distributions $P_{U^n W^n}$ and $\bar P^{\otimes n}_U \bar P^{\otimes n}_W$: a test is defined by a random transformation $P_{Z|U^n W^n}: \mathcal U^n \times \mathcal W^n \to \{H_0, H_1\}$, where $H_0$ indicates that the test chooses $P_{U^n W^n}$, and $H_1$ indicates that the test chooses $\bar P^{\otimes n}_U \bar P^{\otimes n}_W$.
The purpose of this randomized test is to bound the error probability of the test declaring that a sequence $\mathbf w$ is generated according to the product distribution $\bar P^{\otimes n}_{UW}$, while $\mathbf w$ is by assumption generated according to $P_{U^n W^n}$. Later on, the bound on this error probability will be reduced to the rate conditions of the outer bound (4) by applying Theorem 1 (Berry-Esseen CLT) and the $(\varepsilon, n)$-strong coordination assumption (38a). Hence, we start by defining the probability of type-I error (probability of choosing $H_0$ when the true hypothesis is $H_1$) and type-II error (probability of choosing $H_1$ when the true hypothesis is $H_0$) as
$$P^I_e(P_{Z|U^nW^n}) := \mathbb P\{\hat H = H_0 \,|\, H_1\} = \sum_{\mathbf u, \mathbf w} \bar P^{\otimes n}_U(\mathbf u)\bar P^{\otimes n}_W(\mathbf w)\, P_{Z|U^nW^n}(H_0|\mathbf u, \mathbf w), \qquad (39a)$$
$$P^{II}_e(P_{Z|U^nW^n}) := \mathbb P\{\hat H = H_1 \,|\, H_0\} = \sum_{\mathbf u, \mathbf w} P_{U^nW^n}(\mathbf u, \mathbf w)\, P_{Z|U^nW^n}(H_1|\mathbf u, \mathbf w). \qquad (39b)$$
Similarly to [30], we denote by $\beta_\alpha$ the minimum type-I error for a maximum type-II error $1 - \alpha$:
$$\beta_\alpha := \min_{P_{Z|U^nW^n}:\ P^{II}_e(P_{Z|U^nW^n}) \le 1-\alpha} P^I_e(P_{Z|U^nW^n}) = \min_{P_{Z|U^nW^n}:\ \sum_{\mathbf u,\mathbf w} P_{U^nW^n}(\mathbf u,\mathbf w) P_{Z|U^nW^n}(H_1|\mathbf u,\mathbf w) \le 1-\alpha}\ \sum_{\mathbf u, \mathbf w} \bar P^{\otimes n}_U(\mathbf u)\bar P^{\otimes n}_W(\mathbf w)\, P_{Z|U^nW^n}(H_0|\mathbf u, \mathbf w), \qquad (40)$$
where the error probability $\alpha$ will be defined later. For the error probabilities $\alpha$ and $\beta_\alpha$, [48, Section 12.4] proves the following relations:
• upper bound on $\min P^I_e(P_{Z|U^nW^n})$:
$$\beta_\alpha \le \frac{1}{\gamma}, \quad \text{with } \gamma \text{ s.t. } \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{P_{U^nW^n}}{\bar P^{\otimes n}_U \bar P^{\otimes n}_W} > \log\gamma\Big\} \ge \alpha, \qquad (41a)$$
• lower bound on $\min P^I_e(P_{Z|U^nW^n})$:
$$\alpha \le \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{P_{U^nW^n}}{\bar P^{\otimes n}_U \bar P^{\otimes n}_W} > \log\gamma\Big\} + \gamma\,\beta_\alpha \quad \forall \gamma > 0. \qquad (41b)$$
Then, our goal is to use the inequalities (41a) and (41b) to prove the rate constraint
$$nR \ge nI(W; U) + Q^{-1}(\varepsilon_2)\sqrt{nV_{\bar P_{W|U}}} - x, \qquad (42)$$
where $\varepsilon_2$ is an approximation term defined in Section IV.C.1.b in the achievability, and so is the term $Q^{-1}(\varepsilon_2)\sqrt{nV_{\bar P_{W|U}}}$, which comes from (35) in the achievability, and $x \in \mathbb R$ is a parameter which will be defined later. First, we split (42) into an upper and a lower bound on the logarithm of the error probability $\beta_\alpha$:
upper bound on $\log\beta_\alpha$:
$$\log\frac{1}{\beta_\alpha} \ge nI(W; U) + Q^{-1}(\varepsilon_2)\sqrt{nV_{\bar P_{W|U}}}, \qquad (43a)$$
lower bound on $\log\beta_\alpha$:
$$nR + x \ge \log\frac{1}{\beta_\alpha}. \qquad (43b)$$
Then, the proof of (42) is divided into the following steps, detailed in the next sections:
(i) Proof of the upper bound on $\log\beta_\alpha$: we use the upper bound on $\min P^I_e(P_{Z|U^nW^n})$ of (41a), combined with the $(\varepsilon, n)$-strong coordination assumption (38a) and Theorem 1 (Berry-Esseen CLT), to derive the upper bound (43a) on the logarithm of $\beta_\alpha$ by choosing the parameter $\gamma$;
(ii) Proof of the lower bound on $\log\beta_\alpha$: we use the lower bound on $\min P^I_e(P_{Z|U^nW^n})$ of (41b), combined with the $(\varepsilon, n)$-strong coordination assumption (38a) and classical information-theoretic properties, to derive the lower bound (43b) on $\beta_\alpha$ by choosing the parameter $\gamma$;
(iii) Proof of the rate constraint: we combine (43a) and (43b) proved in the previous steps and derive (42).
Before proceeding, observe that by the $(\varepsilon, n)$-strong coordination assumption (38a) and Lemma 1 (i), we have
$$\|P_{U^nW^n} - \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}\| \le \|P_{U^nW^nV^n} - \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}\bar P^{\otimes n}_{V|W}\| = \varepsilon_0 \le \varepsilon_{Tot},$$
which we can distinguish into two cases:
• Case 1: $P_{U^nW^n} = \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} + \varepsilon_0$;
• Case 2: $P_{U^nW^n} = \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} - \varepsilon_0$.
Thus, we have to prove the steps (i)–(iii) separately for both Case 1 and Case 2.
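The quantity $\beta_\alpha$ in (40) is a standard Neyman-Pearson quantity and can be computed exactly on small alphabets. The sketch below uses made-up distributions standing in for $P_{U^nW^n}$ and the i.i.d. product, and also checks a bound of the form (41a):

```python
# Exact computation of beta_alpha from (40) for a toy pair of distributions,
# plus a check of a (41a)-style bound. P and Q are made-up stand-ins for
# P_{U^n W^n} (hypothesis H0) and the i.i.d. product (hypothesis H1).
def beta(alpha, P, Q):
    """Minimum Q-probability of deciding H0, over tests that decide H0 with
    probability at least alpha under P (Neyman-Pearson with randomization)."""
    order = sorted(P, key=lambda x: P[x] / Q[x], reverse=True)
    need, b = alpha, 0.0
    for x in order:
        if need <= 1e-12:
            break
        frac = min(1.0, need / P[x])   # randomize on the boundary outcome
        b += frac * Q[x]
        need -= P[x]
    return b

P = {0: 0.6, 1: 0.3, 2: 0.1}
Q = {0: 0.2, 1: 0.3, 2: 0.5}
alpha, gamma = 0.6, 2.0

# (41a)-style bound: if P[ P/Q > gamma ] >= alpha, then beta_alpha <= 1/gamma.
assert sum(p for x, p in P.items() if P[x] / Q[x] > gamma) >= alpha
assert beta(alpha, P, Q) <= 1 / gamma
```

Here the optimal test accepts $H_0$ exactly on the outcome with the largest likelihood ratio, and the relaxation (41a) is what lets the proof replace this combinatorial optimum by a single information-density tail probability.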
1) Proof of the upper bound on $\log\beta_\alpha$ (43a) – Case 1 ($P_{U^nW^n} = \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} + \varepsilon_0$): By the $(\varepsilon, n)$-strong coordination assumption (38a), we have
$$\log(P_{U^nW^n}(\mathbf u, \mathbf w)) = \log\big(\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}(\mathbf u, \mathbf w) + \varepsilon_0\big) = \log\big(\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}(\mathbf u, \mathbf w)\big) + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}(\mathbf u, \mathbf w)}\Big). \qquad (44)$$
Thus, we rewrite the left-hand side of the upper bound on $\min P^I_e(P_{Z|U^nW^n})$ in (41a) as
$$\mathbb P\big\{\log P_{U^nW^n} \ge \log\bar P^{\otimes n}_U\bar P^{\otimes n}_W + \log\gamma\big\} = \mathbb P\Big\{\log\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big) \ge \log\bar P^{\otimes n}_U\bar P^{\otimes n}_W + \log\gamma\Big\} = \mathbb P\Big\{\log\frac{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} \ge \log\gamma - \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big)\Big\}. \qquad (45)$$
Given the parameter $\varepsilon_2$ defined in Section IV.C, $0 \le \varepsilon_2 \le 1$, we choose the parameter $\gamma$ as
$$\log\gamma := n\mu_n + Q^{-1}(\varepsilon_2)\sqrt{nV_n} + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big), \qquad (46)$$
where, as in the statement of Theorem 1 (Berry-Esseen CLT), $\sum_{i=1}^n \log(\bar P_U\bar P_{W|U}/\bar P_U\bar P_W) =: \sum_{i=1}^n Z_i$ is the sum of $n$ i.i.d. random variables, $\mu_n = \frac1n\sum_{i=1}^n \mathbb E[Z_i]$, $V_n = \frac1n\sum_{i=1}^n \mathrm{Var}[Z_i]$, $T_n = \frac1n\sum_{i=1}^n \mathbb E[|Z_i - \mu_i|^3]$, $B_n = 6T_n/V_n^{3/2}$, and $Q(\cdot)$ is the tail distribution function of the standard normal distribution. Moreover, we can rewrite these terms by using the identifications of the following remark:

Remark 5 (Mutual Information and Channel Dispersion):

Similarly to [30], we observe that for the discrete i.i.d. distributions $\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}$ and $\bar P^{\otimes n}_U\bar P^{\otimes n}_W$, we have:
$$\mu_n = D(\bar P_U\bar P_{W|U}\|\bar P_U\bar P_W) = I(W; U),$$
$$V_n = \sum_{u,w}\bar P_U(u)\bar P_{W|U}(w|u)\Big[\log\frac{\bar P_U(u)\bar P_{W|U}(w|u)}{\bar P_U(u)\bar P_W(w)}\Big]^2 - D(\bar P_U\bar P_{W|U}\|\bar P_U\bar P_W)^2 = V_{\bar P_{W|U}},$$
$$T_n = \sum_{u,w}\bar P_U(u)\bar P_{W|U}(w|u)\Big|\log\frac{\bar P_U(u)\bar P_{W|U}(w|u)}{\bar P_U(u)\bar P_W(w)} - D(\bar P_U\bar P_{W|U}\|\bar P_U\bar P_W)\Big|^3, \qquad B_n = 6T_n/V_n^{3/2}. \qquad (47)$$
Now, observe that we have chosen the parameter $\gamma$ in (46) appropriately such that, with the identifications of Remark 5, the probability of error (45) becomes
$$\mathbb P\Big\{\sum_{i=1}^n\log\frac{\bar P_U\bar P_{W|U}}{\bar P_U\bar P_W} \ge n\mu_n + Q^{-1}(\varepsilon_2)\sqrt{nV_n}\Big\} = \mathbb P\Big\{\sum_{i=1}^n\log\frac{\bar P_U\bar P_{W|U}}{\bar P_U\bar P_W} \ge nI(U; W) + Q^{-1}(\varepsilon_2)\sqrt{nV_{\bar P_{W|U}}}\Big\}. \qquad (48)$$
Since $\sum_{i=1}^n\log(\bar P_U\bar P_{W|U}/\bar P_U\bar P_W) = \sum_{i=1}^n Z_i$ is the sum of $n$ i.i.d. random variables, the next step is to bound the probability in (48) using Theorem 1 (Berry-Esseen CLT):
$$\mathbb P\Big\{\log\prod_{i=1}^n\frac{\bar P_U\bar P_{W|U}}{\bar P_U\bar P_W} \ge nI(U; W) + Q^{-1}(\varepsilon_2)\sqrt{nV_{\bar P_{W|U}}}\Big\} - \varepsilon_2 \ge -\frac{B_n}{\sqrt n} \iff \mathbb P\Big\{\log\prod_{i=1}^n\frac{\bar P_U\bar P_{W|U}}{\bar P_U\bar P_W} \ge nI(U; W) + Q^{-1}(\varepsilon_2)\sqrt{nV_{\bar P_{W|U}}}\Big\} \ge \varepsilon_2 - \frac{B_n}{\sqrt n}. \qquad (49)$$
Then, we identify
$$\alpha := \varepsilon_2 - \frac{B_n}{\sqrt n}, \qquad (50)$$
and by combining the upper bound on $\min P^I_e(P_{Z|U^nW^n})$ of (41a) with the parameter $\gamma$ as chosen in (46), we obtain:
$$\log\frac{1}{\beta_\alpha} \ge \log\gamma = n\mu_n + Q^{-1}(\varepsilon_2)\sqrt{nV_n} + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big). \qquad (51)$$
2) Proof of the lower bound on $\log\beta_\alpha$ (43b) – Case 1 ($P_{U^nW^n} = \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} + \varepsilon_0$): First, we observe that
$$nR \ge H(M) \overset{(a)}{\ge} nI(U; W) \overset{(b)}{\ge} nI(U; W) + Q^{-1}(y)\sqrt{nV_n}, \quad \tfrac12 < y < 1, \qquad (52)$$
where $(a)$ is proved in Appendix B and $(b)$ comes from the fact that $Q^{-1}(y)\sqrt{nV_n} \le 0$ for every $\tfrac12 < y < 1$. Now, we recall that by the lower bound on $\min P^I_e(P_{Z|U^nW^n})$ of (41b), for every $\gamma > 0$ we have
$$\beta_\alpha \ge \frac{1}{\gamma}\Big[\alpha - \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{P_{U^nW^n}}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} > \log\gamma\Big\}\Big] \overset{(c)}{=} \frac{1}{\gamma}\Big[\alpha - \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} + \varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} > \log\gamma\Big\}\Big] = \frac{1}{\gamma}\Big[\alpha - \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} > \log\gamma - \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big)\Big\}\Big],$$
$$\iff \log\beta_\alpha \ge \log\frac{1}{\gamma} + \log\Big[\alpha - \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} > \log\gamma - \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big)\Big\}\Big], \qquad (53)$$
where in $(c)$ we have used the $(\varepsilon, n)$-strong coordination assumption (38a). Then, similarly to Section V.A.1, we choose the parameter $\gamma$ appropriately:
$$\log\gamma = H(M) + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big). \qquad (54)$$
With this choice of $\gamma$, we plug (52) into (53), and we have
$$\log\beta_\alpha \ge \log\frac{1}{\gamma} + \log\Big[\alpha - \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} > \log\gamma - \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big)\Big\}\Big]$$
$$= -\Big[H(M) + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big)\Big] + \log\Big[\alpha - \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} > H(M)\Big\}\Big]$$
$$\ge -\Big[H(M) + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big)\Big] + \log\Big[\alpha - \mathbb P_{P_{U^nW^n}}\Big\{\log\frac{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}{\bar P^{\otimes n}_U\bar P^{\otimes n}_W} > nI(U; W) + Q^{-1}(y)\sqrt{nV_n}\Big\}\Big]$$
$$\ge -H(M) - \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big) + \log\Big(\alpha - y - \frac{B_n}{\sqrt n}\Big), \qquad (55)$$
which is equivalent to the lower bound (43b) on $\log\beta_\alpha$ if we identify
$$x = \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big) - \log\Big(\alpha - y - \frac{B_n}{\sqrt n}\Big), \qquad (56)$$
since
$$nR + x = nR + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big) - \log\Big(\alpha - y - \frac{B_n}{\sqrt n}\Big) \ge H(M) + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big) - \log\Big(\alpha - y - \frac{B_n}{\sqrt n}\Big) \ge \log\frac{1}{\beta_\alpha}. \qquad (57)$$
3) Proof of the rate constraint – Case 1 ($P_{U^nW^n} = \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} + \varepsilon_0$): Now, we can conclude the proof of this part of the outer bound by combining (51) and (57). In fact, for $\tfrac12 < y < 1$ we have
$$\underbrace{H(M)}_{\le\, nR} + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big) - \log\Big(\alpha - y - \frac{B_n}{\sqrt n}\Big) \ge n\mu_n + Q^{-1}(\varepsilon_2)\sqrt{nV_n} + \log\Big(1 + \frac{\varepsilon_0}{\bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U}}\Big),$$
which is equivalent to
$$R \ge \mu_n + Q^{-1}(\varepsilon_2)\sqrt{\frac{V_n}{n}} + \frac{\log\big(\alpha - y - \frac{B_n}{\sqrt n}\big)}{n}. \qquad (58)$$
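To get a feel for the relative size of the terms in (58), the following sketch plugs in toy numbers; $I$, $V$, $\varepsilon_2$, $y$, and the stand-in for the Berry-Esseen term are all illustrative values, not quantities from the paper.

```python
# Toy evaluation of the outer bound (58): the second-order term decays like
# 1/sqrt(n) and the log correction like 1/n. All numbers are illustrative.
from math import log2, sqrt
from statistics import NormalDist

def Qinv(e):            # inverse of the Gaussian tail function Q
    return NormalDist().inv_cdf(1 - e)

I, V = 0.5, 0.25        # mutual information (bits) and dispersion, toy values
eps2, y = 0.9, 0.6      # tolerance and free parameter with 1/2 < y < 1
bounds = []
for n in (100, 1000, 10000):
    B = 1.0 / sqrt(n)                    # stand-in for B_n / sqrt(n)
    alpha = eps2 - B                     # as in (50)
    second = Qinv(eps2) * sqrt(V / n)    # O(1/sqrt(n)) term, negative here
    third = log2(alpha - y - B) / n      # O(1/n) correction, also negative
    bounds.append(I + second + third)
print([round(b, 4) for b in bounds])     # increases towards I as n grows
```

As expected from Remark 6 below, the $1/n$ correction is quickly dominated by the $1/\sqrt n$ dispersion term.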
4) Proof of the rate constraint – Case 2 ($P_{U^nW^n} = \bar P^{\otimes n}_U\bar P^{\otimes n}_{W|U} - \varepsilon_0$): The proofs of the upper and lower bounds on $\log\beta_\alpha$ of (43a) and (43b) are similar to the ones of Section V.A.1 and Section V.A.2, and are therefore deferred to Appendix C.

Remark 6 (Speed of convergence):

Note that for both Case 1 and Case 2 we retrieve the same rate condition as in (58), and the term
$$\frac{\log\big(\alpha - y - \frac{B_n}{\sqrt n}\big)}{n} \le \frac{\alpha - y - \frac{B_n}{\sqrt n}}{n} = \frac{\alpha - y}{n} - \frac{B_n}{n\sqrt n} = O\Big(\frac1n\Big)$$
goes to zero faster than the term $\log n/n$ in the achievability.

B. Second bound – $R_0 + R$

The proof is similar to the one of Section V.A, and it is deferred to Appendix D.

VI. DISCUSSION ON THE RESULT
A. Comparison with fixed-length lossy compression
In [4, 20] the authors show that empirical coordination in the asymptotic regime yields the rate-distortion result of Shannon [52].
Empirical coordination is the weaker form of coordination, which requires the joint histogram of the devices' distributed random states to approach a target distribution in $L^1$ distance with high probability [4], thus capturing an "average behavior" of the agents. This metric of choice can be specialized to the probability of distortion, therefore connecting empirical coordination with source coding [4, 20]. In this paper, however, we have considered the strong coordination metric, which requires the joint distribution of sequences of distributed random states to converge to an i.i.d. target distribution in $L^1$ distance instead [4], hence dealing with a different and more stringent constraint which demands a positive rate of common randomness. Nonetheless, by looking at the known results for fixed-length rate-distortion [33], we can derive similarities with our case of study.
First, we recall the setting and notation of fixed-length lossy compression [33]. The output of a source $S$ generated according to $\bar P_S$ with alphabet $\mathcal M$ is mapped to one of the $M$ codewords from $\hat{\mathcal M}$, and a lossy code consists of a pair of mappings $f: \mathcal M \mapsto \{1, \ldots, M\}$ and $c: \{1, \ldots, M\} \mapsto \hat{\mathcal M}$. Then, a distortion measure $d: \mathcal M \times \hat{\mathcal M} \mapsto [0, \infty)$ is used to quantify the performance of a lossy code. Given the decoder $c$, the best encoder maps the source output to the codeword which minimizes the distortion. Then, $\varepsilon$ is the excess-distortion probability if
$$\mathbb P\{d(S, c(f(S))) > d\} \le \varepsilon. \qquad (59)$$
The minimum achievable code size at excess-distortion probability $\varepsilon$ and distortion $d$ is defined by
$$M^*(n, d, \varepsilon) = \min\{M : \exists\, (M, d, \varepsilon) \text{ code}\}, \qquad R(n, d, \varepsilon) = \frac1n \log M^*(n, d, \varepsilon).$$
Then, the following achievability results are presented in [33, 44].
Theorem 6 (Achievability for fixed-length lossy compression [44, Thm. 2.21]):
There exists an $(M, d, \varepsilon)$ code with
$$\varepsilon \le \inf_{P_{Z|S}}\Big\{\mathbb P\{d(S, Z) > d\} + \inf_{\gamma > 0}\Big\{\sup_{z \in \hat{\mathcal M}}\mathbb P\{\imath_{S;Z}(S; z) \ge \log M - \gamma\} + \exp(-\gamma)\Big\}\Big\}. \qquad (60)$$
When the source is memoryless,
$$R(n, d, \varepsilon) = R(d) + \sqrt{\frac{V}{n}}\,Q^{-1}(\varepsilon) + \Theta\Big(\frac{\log n}{n}\Big) \ge \min I(S; Z) + \sqrt{\frac{V}{n}}\,Q^{-1}(\varepsilon) + \Theta\Big(\frac{\log n}{n}\Big), \qquad (61)$$
where $V$ is the dispersion term, and $f(n) = \Theta\big(\frac{\log n}{n}\big)$ indicates that $f$ is bounded both above and below by $\frac{\log n}{n}$ asymptotically: $\exists\, k_1 > 0$, $\exists\, k_2 > 0$, $\exists\, n_0$ such that $\forall n > n_0$, $k_1\frac{\log n}{n} \le f(n) \le k_2\frac{\log n}{n}$.
Now, if we choose as a distance $d(\cdot, \cdot)$ the $L^1$ distance, the strong coordination condition
$$\|P_{U^nV^n} - \bar P^{\otimes n}_{UV}\| \le d \qquad (62)$$
implies the rate-distortion condition (59). Then, we can interpret the strong coordination problem outlined in Section II as two connected "stronger" rate-distortion problems, depicted in Figure 2 and Figure 3 respectively:
• Rate-distortion Problem 1: first, at the encoder we have to generate a pair $(U, V)$ which is close in $L^1$ distance to the one generated according to the fixed i.i.d. distribution $\bar P_{UV}$;
• Rate-distortion Problem 2: in a second instance, the decoder has to reconstruct the source $U$ to produce $V$ via the conditional distribution $\bar P_{V|U}$.

Figure 2. Rate-distortion Problem 1: Compression of the source $(U^n, V^n)$ with a link of rate $R_0 + R$.
Figure 3. Rate-distortion Problem 2: Compression of the source $U^n$ with a link of rate $R$ and reconstruction of $V^n$.
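As a sanity check on the shape of Theorem 7, the sketch below evaluates the Gaussian approximation $R(d) + \sqrt{V/n}\,Q^{-1}(\varepsilon)$ for a binary memoryless source under Hamming distortion, using the known closed forms $R(d) = h(p) - h(d)$ and dispersion $V = p(1-p)\log_2^2\frac{1-p}{p}$; the $\Theta(\log n/n)$ term is omitted, and the numeric choices of $p$, $d$, $\varepsilon$ are illustrative:

```python
# Gaussian approximation of Theorem 7 for a binary memoryless source with bias
# p under Hamming distortion. p, d, eps are illustrative choices.
from math import log2, sqrt
from statistics import NormalDist

def h(p):  # binary entropy in bits
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, d, eps = 0.11, 0.05, 0.1
Rd = h(p) - h(d)                           # rate-distortion function R(d)
V = p * (1 - p) * log2((1 - p) / p) ** 2   # dispersion of the source
qinv = NormalDist().inv_cdf(1 - eps)       # Q^{-1}(eps)

approx = [Rd + qinv * sqrt(V / n) for n in (100, 1000, 10000)]
print([round(r, 4) for r in approx])       # approaches R(d) from above
```

The finite-length penalty $\sqrt{V/n}\,Q^{-1}(\varepsilon)$ mirrors exactly the second-order term in the strong coordination conditions (63) and (65) below.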
(64)Moreover, as in Figure 3, if the decoder is able to reconstruct U n reliably, then it can generate V n by using thei.i.d. distribution, and strong coordination would be achieved. By allowing W n to be the reliable reconstruction of U n at the decoder, we can also reformulate this in terms of “stronger” rate-distortion, and we find R ≥ min I ( U ; W ) + (cid:114) Vn Q − ( (cid:15) ) + O (cid:18) log nn (cid:19) , (65)which implies the rate-distortion condition of Theorem 7: R ≥ min I ( U ; W ) + (cid:114) Vn Q − ( (cid:15) ) + Θ (cid:18) log nn (cid:19) . (66)Then, the constraints (64) and (66) which ensure lossy compression are similar to the rate conditions (63) and (65)in Theorem 2 for achievability in fixed-length strong coordination, with the only difference being the order ofapproximation. However, as we will see in Section IV, we prove the achievability with rate constraintsrate ≥ mutual information + (cid:114) Vn Q − ( (cid:15) ) + constant · log nn (cid:124) (cid:123)(cid:122) (cid:125) Θ ( log nn ) which is consistent with (64) and (66). We keep the term O (cid:16) log nn (cid:17) in Theorem 2 because, being less restrictive,it allows us to derive the closed result for the fixed-length coordination region of Corollary 1.Notice that there is no rate of common randomness in this second constraint. This is because the two problemshave to be solved together, and once that all the possible distributions ¯ P UV are generated at the “encoder side” inRate-distortion Problem 1, the decoder’s role merely relies on generating the correct random variable. This conceptis better explained in the following remarks. Remark 7 (Stochasticity at the decoder does not help):
Note that we can always represent discrete stochastic decoders as discrete deterministic decoders with auxiliary randomness $S$ that takes values in $[\![1, 2^{nR^*}]\!]$. Then, instead of the stochastic decoder function $\mathrm{dec}$, we can consider the deterministic decoder $\mathrm{dec}'$ that exploits the external randomness $S$. Then, when focusing on the probability of error $p_e$, we have
$$p_e = \mathbb P\{d(U^n, W^n) > d\} = \mathbb E_S[\mathbb P\{d(U^n, W^n) > d \mid S\}] = \mathbb E_S[p_e(S)]. \qquad (67)$$
Since each realization $s$ of $S$ gives a deterministic decoder, and the average over all $s$ is equal to $p_e$ by (67), there exists at least one choice $s^\star$ for which $p_e(s^\star) \le p_e$. Because of this, and because the choice of the deterministic decoder only concerns reliable reconstruction and not approximating the target distribution, we can assume that the decoder is deterministic without loss of generality.

Remark 8 (The encoder has to be stochastic):
While we can suppose that the decoder is deterministic, the encoding function should be stochastic in order to achieve the whole coordination region. This is because we not only want to characterize the rates such that the rate conditions hold, but also the target distributions $\bar P_{UV}$. When restricting to deterministic functions, we would restrict the choice of distributions $\bar P_{UV}$ that can be coordinated. In more detail, $W^n$ generated according to $\bar P^{\otimes n}_{W|UV}$ comes from
$$[\![1, 2^{nR_0}]\!] \times [\![1, 2^{nR}]\!] \times \mathcal U^n \times \mathcal V^n \xrightarrow{\ \mathrm{enc}\ } \mathcal W^n, \qquad (c, m, \mathbf u, \mathbf v) \longmapsto \mathbf w,$$
and if the encoder is a deterministic function, $\mathrm{enc}(c, m, \mathbf u, \mathbf v) = \mathbf w$ with probability 1, whereas if the encoder is stochastic, $\mathrm{enc}(c, m, \mathbf u, \mathbf v) = \mathbf w$ with probability $\bar P^{\otimes n}_{W|UV}(\mathbf w|\mathbf u, \mathbf v)$. Thus, the "deterministic encoder choice" restricts the possibilities for $\bar P_{U^n W^n V^n}$, and therefore for the target distributions $\bar P_{U^n V^n}$, since the realization $(\mathbf u, \mathbf w, \mathbf v)$ would be generated with probability $2^{-nR_0}\,2^{-nR}\,\bar P^{\otimes n}_U(\mathbf u)\,\bar P^{\otimes n}_{V|U}(\mathbf v|\mathbf u)$ instead of $2^{-nR_0}\,2^{-nR}\,\bar P^{\otimes n}_U(\mathbf u)\,\bar P^{\otimes n}_{V|U}(\mathbf v|\mathbf u)\,\bar P^{\otimes n}_{W|UV}(\mathbf w|\mathbf u, \mathbf v)$.

B. Trade-off between $\varepsilon_{Tot}$ and rate
Observe that in order to minimize (cid:15)
Tot , in the achievability proof we can choose (cid:15) ∗ and (cid:15) ∗ equal to zero. On theother hand, this would require more common randomness since Q − ( · ) increases as its argument approaches zero.Note that one can minimize (cid:15) Tot (for example, we can have (cid:15)
$\epsilon_{\text{Tot}} = \text{constant} \cdot 2^{-n}$) simply by choosing a different $(\gamma_1, \gamma_2, \gamma_3)$ in Section IV.C.2 of the achievability proof, but this increases the rate conditions (37). If, for example, we choose $(\gamma_1, \gamma_2, \gamma_3) = (2cn, cn, cn)$ for any constant $c > 0$, the rate conditions become
$$R + R_0 > I(W;UV) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|UV}}}{n}} + 3c, \qquad R > I(W;U) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|U}}}{n}} + 3c, \tag{68}$$
and therefore match the rate constraints of the outer bound (4) up to the constant $3c$, which can be made arbitrarily small. With this choice, the bound (24) on the $L^1$ distance decreases exponentially: $\|P^{RB}_{U^n V^n} - P^{RC}_{U^n V^n}\| \le \epsilon_{\text{Tot}}$,
$$\epsilon_{\text{Tot}} = 10\,\bar{P}\big((\mathcal{S}_{\gamma_1} \cap \mathcal{S}_{\gamma_2} \cap \mathcal{S}_{\gamma_3})^c\big) + 2\big(2^{-\gamma_1} + 5 \cdot 2^{-\gamma_2} + 2^{-\gamma_3}\big) \le 10(\epsilon_1 + \epsilon_2) + 2(1 + 5 + 1)\,2^{-cn}. \tag{69}$$
Suppose instead that we want to recover exactly the conditions of the outer bound (4). Then we can choose $(\gamma_1, \gamma_2, \gamma_3) = (2, 1, 1)$. With this choice for $\gamma_i$, the rate conditions become
$$R + R_0 > I(W;UV) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|UV}}}{n}} + \frac{3}{n}, \qquad R > I(W;U) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|U}}}{n}} + \frac{3}{n}, \tag{70}$$
and therefore match the rate constraints of the outer bound (4). With this choice, however, the bound (24) on the $L^1$ distance becomes
$$\epsilon_{\text{Tot}} \le 10(\epsilon_1 + \epsilon_2) + 2(1 + 5 + 1)\,2^{-1}, \tag{71}$$
which no longer vanishes with $n$. More to this point, in the achievability we can choose $(\gamma_1, \gamma_2, \gamma_3) = (2c/n^{k}, c/n^{k}, c/n^{k})$ for any constant $c > 0$ and any $k \ge 0$. With this choice for $\gamma_i$, the rate conditions become
$$R + R_0 > I(W;UV) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|UV}}}{n}} + \frac{3c}{n^{k+1}} = I(W;UV) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|UV}}}{n}} + O\!\left(\frac{1}{n}\right),$$
$$R > I(W;U) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|U}}}{n}} + \frac{3c}{n^{k+1}} = I(W;U) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|U}}}{n}} + O\!\left(\frac{1}{n}\right), \tag{72}$$
because $3c/n^{k+1} = O(1/n)$, since for every $k \ge 0$ we have $c/n^{k+1} \le c/n$, and the bound on the $L^1$ distance becomes
$$\epsilon_{\text{Tot}} \le 10(\epsilon_1 + \epsilon_2) + 2(1 + 5 + 1)\,2^{-c/n^{k}}. \tag{73}$$

Figure 4. Trade-off between $(\gamma_1, \gamma_2, \gamma_3)$ and $\epsilon_{\text{Tot}}$ for $\epsilon_1 = \epsilon_2 = 0$.

We can generalize this by fixing $(\gamma_1, \gamma_2, \gamma_3) = (2x, x, x)$: the rate conditions and the bound on the $L^1$ distance become
$$R + R_0 > I(W;UV) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|UV}}}{n}} + \frac{3x}{n}, \qquad R > I(W;U) + Q^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|U}}}{n}} + \frac{3x}{n},$$
$$\epsilon_{\text{Tot}} \le 10(\epsilon_1 + \epsilon_2) + 2(1 + 5 + 1)\,2^{-x}, \tag{74}$$
and the trade-off between $(\gamma_1, \gamma_2, \gamma_3)$, and therefore between the rate conditions and $\epsilon_{\text{Tot}}$, is depicted in Figure 4.
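As a numerical illustration of this trade-off, the following Python sketch evaluates a rate penalty that scales like $x/n$ against a deviation term that decays like $2^{-x}$. All numerical values below (blocklength, mutual information, dispersion, $\epsilon$) are assumed purely for the sketch and do not come from the paper; $Q^{-1}$ is implemented by bisection on $Q(t) = \tfrac{1}{2}\,\mathrm{erfc}(t/\sqrt{2})$.

```python
import math

def Q(t):
    """Tail distribution function of the standard normal."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def Q_inv(eps, lo=-10.0, hi=10.0):
    """Inverse of Q by bisection (Q is strictly decreasing)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if Q(mid) > eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Assumed illustrative values (hypothetical, not from the paper):
# blocklength, approximation level, mutual information and dispersion.
n, eps, I_wuv, V = 1000, 0.05, 0.40, 0.25

second_order = Q_inv(eps) * math.sqrt(V / n)
for x in (1, 10, 100, n):
    rate_bound = I_wuv + second_order + 3 * x / n   # rate penalty grows with x
    deviation_term = 2.0 ** (-x)                    # deviation decays with x
    print(f"x={x:5d}  rate > {rate_bound:.4f}  deviation term ~ {deviation_term:.3e}")
```

Increasing $x$ drives the deviation term to zero exponentially fast while the rate conditions grow linearly in $x/n$, which is the shape of the trade-off depicted in Figure 4.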
APPENDIX A
DETAILED ANALYSIS OF $\mathcal{S}_{\imath(w;u)}$

We observe that, since the distribution $\bar{P}$ is i.i.d., the terms $Z'_i = \imath_{\bar{P}}(w_i; u_i)$ are mutually independent for $i = 1, \ldots, n$. Then, we consider the following inequality:
$$nR > \underbrace{\sum_{i=1}^n \mathbb{E}_{\bar{P}_{WU}}[\imath_{\bar{P}}(w_i; u_i)]}_{n\mu'_n} + Q^{-1}(\epsilon)\underbrace{\sqrt{\sum_{i=1}^n \mathrm{Var}_{\bar{P}_{WU}}\big(\imath_{\bar{P}}(w_i; u_i)\big)}}_{n\sqrt{V'_n/n}} + \gamma_1 + \gamma_2, \tag{75}$$
where $\mu'_n = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[Z'_i]$, $V'_n = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}[Z'_i]$, and $Q(\cdot)$ is the tail distribution function of the standard normal distribution. We prove that, assuming that (75) holds, we can successfully bound $\mathcal{S}_{\imath(w;u)}$. In fact, the chain of inequalities
$$\sum_{i=1}^n \imath_{\bar{P}}(W_i; U_i) > nR - (\gamma_1 + \gamma_2) > n\left(\mu'_n + t\sqrt{\frac{V'_n}{n}}\right)$$
implies that, if (75) holds, $\mathcal{S}_{\imath(w;u)}$ is contained in
$$\left\{(\mathbf{u}, \mathbf{w}) : \sum_{i=1}^n \imath_{\bar{P}}(w_i; u_i) > n\mu'_n + nQ^{-1}(\epsilon)\sqrt{\frac{V'_n}{n}}\right\}. \tag{76}$$
Therefore, if we find an upper bound on the probability of (76), we have an upper bound on $\bar{P}(\mathcal{S}_{\imath(w;u)})$ as well. To obtain it, we apply Theorem 1 (Berry-Esseen CLT) to the right-hand side of (76), and we choose
$$Q(t) = \epsilon, \qquad \epsilon^* = \epsilon + \frac{B'_n}{\sqrt{n}}, \tag{77}$$
where, as in the statement of Theorem 1 (Berry-Esseen CLT), $B'_n = 6T'_n/V'^{3/2}_n$ and $T'_n = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[|Z'_i - \mu_i|^3]$. Then, we have
$$\left|\mathbb{P}\left\{\sum_{i=1}^n \imath_{\bar{P}}(w_i; u_i) > n\mu'_n + nQ^{-1}(\epsilon)\sqrt{\frac{V'_n}{n}}\right\} - \epsilon\right| \le \frac{B'_n}{\sqrt{n}} \;\Longrightarrow\; \mathbb{P}\left\{\sum_{i=1}^n \imath_{\bar{P}}(w_i; u_i) > n\mu'_n + nQ^{-1}(\epsilon)\sqrt{\frac{V'_n}{n}}\right\} \le \epsilon^*. \tag{78}$$
Finally, (78) combined with (76) implies $\bar{P}\big(\mathcal{S}_{\imath(w;u)}\big) \le \epsilon^*$. Moreover, we can simplify (75) with the following identifications: similarly to [41], observe that
$$\mu'_n := \frac{1}{n}\sum_{i=1}^n \mathbb{E}_{\bar{P}_{WU}}[\imath_{\bar{P}}(w_i; u_i)] = \mathbb{E}_{\bar{P}_{WU}}[\imath_{\bar{P}}(w; u)] = I(W;U), \tag{79a}$$
$$V'_n := \frac{1}{n}\sum_{i=1}^n \mathrm{Var}_{\bar{P}_{WU}}\big(\imath_{\bar{P}}(W;U)\big) = \mathrm{Var}_{\bar{P}_{WU}}\big(\imath_{\bar{P}}(W;U)\big), \tag{79b}$$
and $V_{\bar{P}_{W|U}} = \min_{\bar{P}_{W|U}} \mathrm{Var}_{\bar{P}_{WU}}\big(\imath_{\bar{P}}(W;U)\big) = \min_{\bar{P}_{W|U}} \mathrm{Var}_{\bar{P}_{WU}}\big(\imath_{\bar{P}}(W;U) \mid W\big)$ is the dispersion of the channel $\bar{P}_{W|U}$ as defined in [30, Thm. 49]. Hence, (75) can be rewritten as
$$nR > nI(W;U) + nQ^{-1}(\epsilon)\sqrt{\frac{V_{\bar{P}_{W|U}}}{n}} + (\gamma_1 + \gamma_2). \tag{80}$$

APPENDIX B
PROOF OF $H(M) \ge nI(U;W)$

We have
$$\begin{aligned} H(M) &\ge H(M|C) \ge I(U^n; M|C) = \sum_{t=1}^n I(U_t; M \mid U^{t-1}, C) \\ &= \sum_{t=1}^n I(U_t; M, C \mid U^{t-1}) - \sum_{t=1}^n I(U_t; C \mid U^{t-1}) \ge \sum_{t=1}^n I(U_t; M, C \mid U^{t-1}) - \sum_{t=1}^n I(U_t; C) \\ &\overset{(a)}{=} \sum_{t=1}^n I(U_t; M, C \mid U^{t-1}) = \sum_{t=1}^n I(U_t; M, C, U^{t-1}) - \sum_{t=1}^n I(U_t; U^{t-1}) \\ &\overset{(b)}{=} \sum_{t=1}^n I(U_t; M, C, U^{t-1}) \ge \sum_{t=1}^n I(U_t; M, C) \overset{(c)}{=} \sum_{t=1}^n I(U_t; W_t) \overset{(d)}{=} nI(U;W), \end{aligned} \tag{81}$$
where $(a)$ and $(b)$ follow from the independence of the common randomness $C$ from $U^n$ and from the i.i.d. nature of the source, and $(c)$ and $(d)$ from the identifications $W_t = (C, M)$ for each $t \in [\![1, n]\!]$ and $W = (W_T, T) = (C, M, T)$.

APPENDIX C
PROOF OF THE BOUND ON $R$ – CASE 2
A. Proof of the upper bound on $\log \beta_\alpha$ (43a) – Case 2 ($P_{U^n W^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon$)

We have
$$\begin{aligned} \log P_{U^n W^n}(\mathbf{u}, \mathbf{w}) &= \log\big(\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}(\mathbf{u}, \mathbf{w}) - \epsilon\big) = \log\big(\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}(\mathbf{u}, \mathbf{w})\big) + \log\left(1 - \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}(\mathbf{u}, \mathbf{w})}\right) \\ &= \log\big(\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}(\mathbf{u}, \mathbf{w})\big) - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}(\mathbf{u}, \mathbf{w})}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}(\mathbf{u}, \mathbf{w}) - \epsilon}\right). \end{aligned} \tag{82}$$
Thus, the following holds:
$$\begin{aligned} &\mathbb{P}\big\{\log P_{U^n W^n} \ge \log \bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n} + \log \gamma\big\} \\ &= \mathbb{P}\left\{\log \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right) \ge \log \bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n} + \log \gamma\right\} \\ &= \mathbb{P}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n}} \ge \log \gamma + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right)\right\} \\ &= \mathbb{P}\left\{\sum_{i=1}^n \log \frac{\bar{P}_U \bar{P}_{W|U}}{\bar{P}_U \bar{P}_W} \ge \log \gamma + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right)\right\}. \end{aligned} \tag{83}$$
Since $\sum_{i=1}^n \log(\bar{P}_U \bar{P}_{W|U}/\bar{P}_U \bar{P}_W) =: \sum_{i=1}^n Z_i$ is the sum of $n$ i.i.d. random variables, the next step is to evaluate the probability in (45) using Theorem 1 (Berry-Esseen CLT). To accomplish that, we choose the parameter $\gamma$ appropriately:
$$\log \gamma := n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right), \tag{84}$$
where, as in Theorem 1, $\mu_n = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[Z_i]$, $V_n = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}[Z_i]$, $T_n = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[|Z_i - \mu_i|^3]$, $B_n = 6T_n/V_n^{3/2}$, $Q(\cdot)$ is the tail distribution function of the standard normal distribution, and $\epsilon$ is defined above in Section IV.C. Observe that by (84) and Remark 5, (83) becomes
$$\mathbb{P}\left\{\sum_{i=1}^n \log \frac{\bar{P}_U \bar{P}_{W|U}}{\bar{P}_U \bar{P}_W} \ge n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n}\right\} = \mathbb{P}\left\{\sum_{i=1}^n \log \frac{\bar{P}_U \bar{P}_{W|U}}{\bar{P}_U \bar{P}_W} \ge nI(U;W) + Q^{-1}(\epsilon)\sqrt{nV_{\bar{P}_{W|U}}}\right\} \ge \epsilon - \frac{B_n}{\sqrt{n}} =: \alpha. \tag{85}$$
Then, we have
$$\log \frac{1}{\beta_\alpha} \ge \log \gamma = n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right). \tag{86}$$

B. Proof of the lower bound on $\log \beta_\alpha$ (43b) – Case 2 ($P_{U^n W^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon$)

By (41b), for every $\gamma > 0$ we have
$$\begin{aligned} \beta_\alpha &\ge \frac{1}{\gamma}\left[\alpha - \mathbb{P}_{P_{U^n W^n}}\left\{\log \frac{P_{U^n W^n}}{\bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma\right\}\right] = \frac{1}{\gamma}\left[\alpha - \mathbb{P}_{P_{U^n W^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}{\bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma\right\}\right] \\ &= \frac{1}{\gamma}\left[\alpha - \mathbb{P}_{P_{U^n W^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right)\right\}\right]. \end{aligned}$$
Then, we set
$$\log \gamma = H(M) - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right)$$
and, as proved in (52),
$$nR = H(M) \ge nI(U;W) \ge nI(U;W) + Q^{-1}(y)\sqrt{nV_n}, \qquad 1/2 < y < 1,$$
which implies
$$\begin{aligned} \log \beta_\alpha &\ge \log \frac{1}{\gamma} + \log\left[\alpha - \mathbb{P}_{P_{U^n W^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right)\right\}\right] \\ &\ge -H(M) + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right) + \log\left[\alpha - \mathbb{P}_{P_{U^n W^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_W^{\otimes n}} > nI(U;W) + Q^{-1}(y)\sqrt{nV_n}\right\}\right] \\ &= -H(M) + \underbrace{\left[\log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right) + \log\left(\alpha - y - \frac{B_n}{\sqrt{n}}\right)\right]}_{-x}, \end{aligned} \tag{87}$$
and similarly to (57), (43b) holds for this case as well.

C. Proof of the rate constraint – Case 2 ($P_{U^n W^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon$)

Now, we conclude the proof by combining (86) and (87). For $1/2 < y < 1$, we have
$$\underbrace{H(M)}_{nR} - \left[\log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right) + \log\left(\alpha - y - \frac{B_n}{\sqrt{n}}\right)\right] \ge n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n} - \epsilon}\right),$$
which is equivalent to
$$R \ge \mu_n + Q^{-1}(\epsilon)\sqrt{\frac{V_n}{n}} + \frac{\log\left(\alpha - y - \frac{B_n}{\sqrt{n}}\right)}{n}. \tag{88}$$

APPENDIX D
PROOF OF THE BOUND ON $R + R_0$

As in Section V.A, for an observation $\mathbf{w} = (m, c)$, we define the hypotheses:
$$H_0: \mathbf{w} \text{ generated according to } P_{W^n}(\mathbf{w}) = \sum_{\mathbf{u}, \mathbf{v}} P_{U^n W^n V^n}(\mathbf{u}, \mathbf{w}, \mathbf{v}),$$
$$H_1: \mathbf{w} \text{ generated according to } \bar{P}_W^{\otimes n}(\mathbf{w}) = \sum_{\mathbf{u}, \mathbf{v}} \bar{P}_{UV}^{\otimes n}(\mathbf{u}, \mathbf{v})\,\bar{P}_W^{\otimes n}(\mathbf{w}),$$
where $\bar{P}$ is the i.i.d. target distribution.
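For small alphabets, a test between a correlated target distribution and the corresponding product distribution can be simulated exhaustively. The following Python sketch uses toy binary distributions and an arbitrary threshold (none of these values come from the paper) and computes, by enumeration over all sequences of a small blocklength, the two error probabilities of a deterministic log-likelihood-ratio threshold test:

```python
import itertools
import math

# Toy single-letter distributions (assumed for illustration; not from the paper)
P_U = {0: 0.5, 1: 0.5}
P_W_given_U = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
P_V_given_W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

def p_joint(u, w, v):
    """Correlated target: P_U(u) * P_{W|U}(w|u) * P_{V|W}(v|w)."""
    return P_U[u] * P_W_given_U[u][w] * P_V_given_W[w][v]

# Marginals P_{UV} and P_W of the target distribution
P_UV, P_W = {}, {}
for u, w, v in itertools.product((0, 1), repeat=3):
    P_UV[u, v] = P_UV.get((u, v), 0.0) + p_joint(u, w, v)
    P_W[w] = P_W.get(w, 0.0) + p_joint(u, w, v)

def q_joint(u, w, v):
    """Alternative hypothesis: P_{UV}(u,v) * P_W(w), with W independent."""
    return P_UV[u, v] * P_W[w]

n = 4                # small blocklength so full enumeration is feasible
log_gamma = 0.5      # arbitrary threshold on the log-likelihood ratio

err_H0 = 0.0         # correlated hypothesis true, test decides "independent"
err_H1 = 0.0         # independent hypothesis true, test decides "correlated"
triples = list(itertools.product((0, 1), repeat=3))
for seq in itertools.product(triples, repeat=n):
    p = math.prod(p_joint(*s) for s in seq)
    q = math.prod(q_joint(*s) for s in seq)
    if math.log(p / q) > log_gamma:   # threshold test decides "correlated"
        err_H1 += q
    else:
        err_H0 += p

print(f"P(decide independent | correlated) = {err_H0:.4f}, "
      f"P(decide correlated | independent) = {err_H1:.4f}")
```

Raising the threshold trades one error probability against the other, which is the mechanism exploited by the bounds (91a)-(91b) below.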
Then, we consider a randomized test between the distributions $P_{U^n W^n V^n}$ and $\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}$:
$$P_{Z|U^n W^n V^n}: \mathcal{U}^n \times \mathcal{W}^n \times \mathcal{V}^n \to \{H_0, H_1\},$$
where $H_0$ indicates that the test chooses $P_{U^n W^n V^n}$, and $H_1$ indicates that the test chooses $\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}$. The probability of type-I error (probability of choosing $H_0$ when the true hypothesis is $H_1$) and of type-II error (probability of choosing $H_1$ when the true hypothesis is $H_0$) are
$$P_e^{I}(P_{Z|U^n W^n V^n}) := \mathbb{P}\{\hat{H}_0 \mid H_1\} = \sum_{\mathbf{u}, \mathbf{w}, \mathbf{v}} \bar{P}_{UV}^{\otimes n}(\mathbf{u}, \mathbf{v})\,\bar{P}_W^{\otimes n}(\mathbf{w})\, P_{Z|U^n W^n V^n}(H_0 \mid \mathbf{u}, \mathbf{w}, \mathbf{v}), \tag{89a}$$
$$P_e^{II}(P_{Z|U^n W^n V^n}) := \mathbb{P}\{\hat{H}_1 \mid H_0\} = \sum_{\mathbf{u}, \mathbf{w}, \mathbf{v}} P_{U^n W^n V^n}(\mathbf{u}, \mathbf{w}, \mathbf{v})\, P_{Z|U^n W^n V^n}(H_1 \mid \mathbf{u}, \mathbf{w}, \mathbf{v}). \tag{89b}$$
Similar to (40), we denote by $\beta'_{\alpha'}$ the minimum type-I error for a maximum type-II error $1 - \alpha'$:
$$\beta'_{\alpha'} := \min_{P_{Z|U^n W^n V^n}:\; P_e^{II}(P_{Z|U^n W^n V^n}) \le 1 - \alpha'} P_e^{I}(P_{Z|U^n W^n V^n}), \tag{90}$$
where the error probability $\alpha'$ will be defined later. The following relations between $\alpha'$ and $\beta'_{\alpha'}$, proved in [48, Section 12.4], hold:
$$\beta'_{\alpha'} \le \frac{1}{\gamma}, \quad \text{if } \gamma \text{ is such that } \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{P_{U^n W^n V^n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma\right\} \ge \alpha', \tag{91a}$$
$$\alpha' \le \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{P_{U^n W^n V^n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma\right\} + \gamma\,\beta'_{\alpha'} \quad \forall \gamma > 0. \tag{91b}$$
Similarly to Section V.A, we prove the rate constraint by separately deriving an upper and a lower bound on $\log \beta'_{\alpha'}$:
$$\text{upper bound on } \log \beta'_{\alpha'}: \quad \log \frac{1}{\beta'_{\alpha'}} \ge nI(W;UV) + Q^{-1}(\epsilon)\sqrt{nV_{\bar{P}_{W|UV}}}, \tag{92a}$$
$$\text{lower bound on } \log \beta'_{\alpha'}: \quad n(R + R_0) + x \ge \log \frac{1}{\beta'_{\alpha'}}, \tag{92b}$$
for a certain $x \in \mathbb{R}$ which will be defined later. Then, the proof of the rate constraint is divided into the following steps, detailed in the next sections:
(i) Proof of the upper bound on $\log \beta'_{\alpha'}$: we use the upper bound on $\min P_e^{I}(P_{Z|U^n W^n V^n})$ of (91a), combined with the $(\epsilon, n)$-strong coordination assumption (38a) and Theorem 1 (Berry-Esseen CLT), to derive the upper bound (92a) by choosing the parameter $\gamma$;
(ii) Proof of the lower bound on $\log \beta'_{\alpha'}$: we use the lower bound on $\min P_e^{I}(P_{Z|U^n W^n V^n})$ of (91b), combined with the $(\epsilon, n)$-strong coordination assumption (38a) and classical information-theoretic properties, to derive the lower bound (92b) by choosing the parameter $\gamma$;
(iii) we combine (92a) and (92b) proved in the previous steps and derive the rate constraint.
Moreover, observe that by the $(\epsilon, n)$-strong coordination assumption (38a) we have $\|P_{U^n W^n V^n} - \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}\| = \epsilon \le \epsilon_{\text{Tot}}$; hence we have to distinguish two cases:
• Case 1: $P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} + \epsilon$;
• Case 2: $P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon$.

A. Proof of the upper bound on $\log \beta'_{\alpha'}$ (92a) – Case 1 ($P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} + \epsilon$)

By the $(\epsilon, n)$-strong coordination assumption (38a), we have
$$\log P_{U^n W^n V^n}(\mathbf{u}, \mathbf{w}, \mathbf{v}) = \log\big(\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}(\mathbf{u}, \mathbf{w}, \mathbf{v})\big) + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}(\mathbf{u}, \mathbf{w}, \mathbf{v})}\right). \tag{93}$$
Then, similarly to (48), the following holds:
$$\begin{aligned} &\mathbb{P}\big\{\log P_{U^n W^n V^n} \ge \log \bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n} + \log \gamma\big\} \\ &= \mathbb{P}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} \ge \log \gamma - \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right)\right\} \\ &= \mathbb{P}\left\{\sum_{i=1}^n \log \frac{\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W}}{\bar{P}_{UV}\bar{P}_W} \ge nI(UV;W) + Q^{-1}(\epsilon)\sqrt{nV_{\bar{P}_{W|UV}}}\right\}, \end{aligned} \tag{94}$$
where, for a given $0 \le \epsilon \le 1$ as defined in Section IV.C, the last equality follows from choosing the parameter $\gamma$ as
$$\log \gamma := n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right) \tag{95}$$
and, as in the statement of Theorem 1 and Remark 5, from the identifications:
$$\sum_{i=1}^n Z_i = \sum_{i=1}^n \log \frac{\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W}}{\bar{P}_{UV}\bar{P}_W},$$
$$\mu_n = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[Z_i] = D\big(\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W} \,\big\|\, \bar{P}_{UV}\bar{P}_W\big) = I(W;UV),$$
$$V_n = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}[Z_i] = V_{\bar{P}_{W|UV}} = \sum_{u,w,v} \bar{P}_U(u)\bar{P}_{W|U}(w|u)\bar{P}_{V|W}(v|w)\left[\log \frac{\bar{P}_U(u)\bar{P}_{W|U}(w|u)\bar{P}_{V|W}(v|w)}{\bar{P}_{UV}(u,v)\bar{P}_W(w)}\right]^2 - D\big(\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W} \,\big\|\, \bar{P}_{UV}\bar{P}_W\big)^2,$$
$$T_n = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[|Z_i - \mu_i|^3] = \sum_{u,w,v} \bar{P}_U(u)\bar{P}_{W|U}(w|u)\bar{P}_{V|W}(v|w)\left|\log \frac{\bar{P}_U(u)\bar{P}_{W|U}(w|u)\bar{P}_{V|W}(v|w)}{\bar{P}_{UV}(u,v)\bar{P}_W(w)} - D\big(\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W} \,\big\|\, \bar{P}_{UV}\bar{P}_W\big)\right|^3,$$
$B_n = 6T_n/V_n^{3/2}$, and $Q(\cdot)$ is the tail distribution function of the standard normal distribution. Now, since $\sum_{i=1}^n \log(\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W}/\bar{P}_{UV}\bar{P}_W) = \sum_{i=1}^n Z_i$ is the sum of $n$ i.i.d. random variables, we bound (94) using Theorem 1 (Berry-Esseen CLT):
$$\mathbb{P}\left\{\log \prod_{i=1}^n \frac{\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W}}{\bar{P}_{UV}\bar{P}_W} \ge nI(UV;W) + Q^{-1}(\epsilon)\sqrt{nV_{\bar{P}_{W|UV}}}\right\} \ge \epsilon - \frac{B_n}{\sqrt{n}}, \tag{96}$$
and we identify
$$\alpha' := \epsilon - \frac{B_n}{\sqrt{n}}; \tag{97}$$
by combining (91a) with (95), we obtain
$$\log \frac{1}{\beta'_{\alpha'}} \ge \log \gamma = n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right). \tag{98}$$

B. Proof of the lower bound on $\log \beta'_{\alpha'}$ (92b) – Case 1 ($P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} + \epsilon$)

First, we prove that
$$n(R + R_0) = H(M, C) \overset{(a)}{\ge} nI(UV;W) - 2n\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right) \overset{(b)}{\ge} nI(UV;W) - 2n\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right) + Q^{-1}(y)\sqrt{nV_n}, \quad 1/2 < y < 1. \tag{99}$$
To prove $(a)$, observe that
$$\begin{aligned} H(M, C) &\ge I(U^n V^n; M, C) = \sum_{t=1}^n I(U_t V_t; M, C \mid U^{t-1} V^{t-1}) \\ &= \sum_{t=1}^n I(U_t V_t; M, C, U^{t-1} V^{t-1}) - \sum_{t=1}^n I(U_t V_t; U^{t-1} V^{t-1}) \\ &\overset{(c)}{\ge} \sum_{t=1}^n I(U_t V_t; M, C, U^{t-1} V^{t-1}) - ng(\epsilon) \ge \sum_{t=1}^n I(U_t V_t; M, C) - ng(\epsilon) \\ &\overset{(d)}{=} \sum_{t=1}^n I(U_t V_t; W_t) - ng(\epsilon) \overset{(e)}{\ge} nI(UV;W) - ng(\epsilon) = nI(UV;W) - 2n\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right), \end{aligned} \tag{100}$$
where, as in [9], the term $g(\epsilon)$ in $(c)$ and $(e)$ is defined as
$$g(\epsilon) := 2\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right), \tag{101}$$
and the inequalities $(c)$ and $(e)$ are proved in [9, Lemma VI.3]. Moreover, $(d)$ and $(e)$ use the identifications $W_t = (C, M)$ for each $t \in [\![1, n]\!]$ and $W = (W_T, T) = (C, M, T)$. Finally, (99) is proved since $(b)$ comes from the fact that $Q^{-1}(y)\sqrt{nV_n} \le 0$ for every $1/2 < y < 1$.
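The single-letter quantities identified above, the mean $\mu_n = I(UV;W)$, the dispersion $V_n = V_{\bar{P}_{W|UV}}$, the third absolute moment $T_n$, and the Berry-Esseen constant $B_n = 6T_n/V_n^{3/2}$, depend only on the target distribution, so they can be evaluated directly. A Python sketch (the binary distributions below are toy values assumed purely for illustration, not from the paper):

```python
import itertools
import math

# Toy target distributions (assumed for illustration; not from the paper)
P_U = {0: 0.5, 1: 0.5}
P_W_given_U = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
P_V_given_W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

# Joint pmf of (U, W, V) under the Markov chain U - W - V
joint = {}
for u, w, v in itertools.product((0, 1), repeat=3):
    joint[u, w, v] = P_U[u] * P_W_given_U[u][w] * P_V_given_W[w][v]

# Marginals needed by the information density
P_UV, P_W = {}, {}
for (u, w, v), p in joint.items():
    P_UV[u, v] = P_UV.get((u, v), 0.0) + p
    P_W[w] = P_W.get(w, 0.0) + p

def z(u, w, v):
    """Information density log( P_U P_{W|U} P_{V|W} / (P_{UV} P_W) ), in nats."""
    return math.log(joint[u, w, v] / (P_UV[u, v] * P_W[w]))

mu = sum(p * z(*k) for k, p in joint.items())                 # mean = I(UV; W)
V = sum(p * (z(*k) - mu) ** 2 for k, p in joint.items())      # dispersion
T = sum(p * abs(z(*k) - mu) ** 3 for k, p in joint.items())   # third abs. moment
B = 6 * T / V ** 1.5                                          # Berry-Esseen constant

print(f"mu = {mu:.4f} nats, V = {V:.4f}, B = {B:.4f}")
```

With these quantities, the Berry-Esseen approximation error of the Gaussian tail bound is at most $B/\sqrt{n}$, the correction term that appears throughout the proofs.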
Now, we recall that by (91b), for every $\gamma > 0$, we have
$$\begin{aligned} \beta'_{\alpha'} &\ge \frac{1}{\gamma}\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{P_{U^n W^n V^n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma\right\}\right] \\ &= \frac{1}{\gamma}\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma - \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right)\right\}\right]. \end{aligned} \tag{102}$$
Then, we set
$$\log \gamma = H(M, C) + 2ng(\epsilon) + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right) \tag{103}$$
and (99) implies
$$\begin{aligned} \log \beta'_{\alpha'} &\ge \log \frac{1}{\gamma} + \log\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma - \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right)\right\}\right] \\ &= -\left[H(M, C) + 2ng(\epsilon) + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right)\right] + \log\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > H(M, C) + 2ng(\epsilon)\right\}\right] \\ &\ge -\left[H(M, C) + 2ng(\epsilon) + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right)\right] + \log\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > nI(UV;W) + Q^{-1}(y)\sqrt{nV_n}\right\}\right] \\ &= -H(M, C) - 2ng(\epsilon) - \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right) + \log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right). \end{aligned} \tag{104}$$
Similarly to (57), we identify
$$x = 2ng(\epsilon) + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right) - \log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right)$$
and we observe that (104) is equivalent to
$$n(R + R_0) + x = H(M, C) + 2ng(\epsilon) + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right) - \log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right) \ge \log \frac{1}{\beta'_{\alpha'}}. \tag{105}$$

C. Proof of the rate constraint – Case 1 ($P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} + \epsilon$)

Now, we conclude the proof of this part by combining (98) and (105). For $1/2 < y < 1$ we have
$$\underbrace{H(M, C)}_{n(R + R_0)} + 2ng(\epsilon) + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right) - \log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right) \ge n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} + \log\left(1 + \frac{\epsilon}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}\right),$$
which is equivalent to
$$R + R_0 \ge \mu_n + Q^{-1}(\epsilon)\sqrt{\frac{V_n}{n}} + \frac{\log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right)}{n} - 2g(\epsilon) = \mu_n + Q^{-1}(\epsilon)\sqrt{\frac{V_n}{n}} + \frac{\log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right)}{n} - 4\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right). \tag{106}$$

D. Proof of the upper bound on $\log \beta'_{\alpha'}$ (92a) – Case 2 ($P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon$)

We have
$$\log P_{U^n W^n V^n}(\mathbf{u}, \mathbf{w}, \mathbf{v}) = \log\big(\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}(\mathbf{u}, \mathbf{w}, \mathbf{v})\big) - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}(\mathbf{u}, \mathbf{w}, \mathbf{v})}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}(\mathbf{u}, \mathbf{w}, \mathbf{v}) - \epsilon}\right). \tag{107}$$
Thus, the following holds:
$$\begin{aligned} &\mathbb{P}\big\{\log P_{U^n W^n V^n} \ge \log \bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n} + \log \gamma\big\} \\ &= \mathbb{P}\left\{\sum_{i=1}^n \log \frac{\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W}}{\bar{P}_{UV}\bar{P}_W} \ge \log \gamma + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right)\right\} \\ &= \mathbb{P}\left\{\sum_{i=1}^n \log \frac{\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W}}{\bar{P}_{UV}\bar{P}_W} \ge nI(UV;W) + Q^{-1}(\epsilon)\sqrt{nV_{\bar{P}_{W|UV}}}\right\} \ge \epsilon - \frac{B_n}{\sqrt{n}} =: \alpha' \end{aligned} \tag{108}$$
by Theorem 1 (Berry-Esseen CLT) and the choice
$$\log \gamma := n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right). \tag{109}$$
Then, we have
$$\log \frac{1}{\beta'_{\alpha'}} \ge \log \gamma = n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right). \tag{110}$$

E. Proof of the lower bound on $\log \beta'_{\alpha'}$ (92b) – Case 2 ($P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon$)

By (91b), for every $\gamma > 0$ we have
$$\begin{aligned} \beta'_{\alpha'} &\ge \frac{1}{\gamma}\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{P_{U^n W^n V^n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma\right\}\right] \\ &= \frac{1}{\gamma}\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right)\right\}\right]. \end{aligned}$$
Then, we set
$$\log \gamma = H(M, C) + 2ng(\epsilon) - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right)$$
and we recall the following chain of inequalities, proved in (99):
$$n(R + R_0) = H(M, C) \ge nI(UV;W) - 2n\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right) + Q^{-1}(y)\sqrt{nV_n}, \qquad 1/2 < y < 1, \tag{111}$$
which implies
$$\begin{aligned} \log \beta'_{\alpha'} &\ge \log \frac{1}{\gamma} + \log\left[\alpha' - \mathbb{P}_{P_{U^n W^n V^n}}\left\{\log \frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_{UV}^{\otimes n}\bar{P}_W^{\otimes n}} > \log \gamma + \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right)\right\}\right] \\ &\ge -H(M, C) - \underbrace{\left[2ng(\epsilon) - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right) - \log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right)\right]}_{x}. \end{aligned} \tag{112}$$

F. Proof of the rate constraint – Case 2 ($P_{U^n W^n V^n} = \bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon$)

Now, by combining (112) with (110), for $1/2 < y < 1$ we have
$$\underbrace{H(M, C)}_{n(R + R_0)} + \underbrace{4n\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right)}_{2ng(\epsilon)} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right) - \log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right) \ge n\mu_n + Q^{-1}(\epsilon)\sqrt{nV_n} - \log\left(\frac{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n}}{\bar{P}_U^{\otimes n}\bar{P}_{W|U}^{\otimes n}\bar{P}_{V|W}^{\otimes n} - \epsilon}\right),$$
which is equivalent to
$$R + R_0 \ge \mu_n + Q^{-1}(\epsilon)\sqrt{\frac{V_n}{n}} + \frac{\log\left(\alpha' - y - \frac{B_n}{\sqrt{n}}\right)}{n} - 4\epsilon\left(\log|\mathcal{U} \times \mathcal{V}| + \log \frac{1}{\epsilon}\right). \tag{113}$$

APPENDIX E
PROOF OF THE CARDINALITY BOUND
Here we prove the cardinality bound for the outer bound in Theorem 3. First, we state the Support Lemma [53, Appendix C].

Lemma 2: Let $\mathcal{A}$ be a finite set and $\mathcal{W}$ be an arbitrary set. Let $\mathcal{P}$ be a connected compact subset of probability mass functions on $\mathcal{A}$ and $P_{A|W}$ be a collection of conditional probability mass functions on $\mathcal{A}$. Suppose that $h_i(\pi)$, $i = 1, \ldots, d$, are real-valued continuous functions of $\pi \in \mathcal{P}$. Then for every $W$ defined on $\mathcal{W}$ there exists a random variable $W'$ with $|\mathcal{W}'| \le d$ and a collection of conditional probability mass functions $P_{A|W'} \in \mathcal{P}$ such that
$$\sum_{w \in \mathcal{W}} P_W(w)\, h_i\big(P_{A|W}(a|w)\big) = \sum_{w \in \mathcal{W}'} P_{W'}(w)\, h_i\big(P_{A|W'}(a|w)\big), \qquad i = 1, \ldots, d.$$

Now, we consider the probability distribution $\bar{P}_U \bar{P}_{W|U} \bar{P}_{V|W}$ that is $\epsilon$-close in $L^1$ distance to the i.i.d. distribution. We identify $\mathcal{A}$ with $\{1, \ldots, |\mathcal{A}|\}$ and we consider $\mathcal{P}$ a connected compact subset of probability mass functions on $\mathcal{A} = \mathcal{U} \times \mathcal{V}$. Similarly to [20], suppose that $h_i(\pi)$, $i = 1, \ldots, |\mathcal{A}| + 1$, are real-valued continuous functions of $\pi \in \mathcal{P}$ such that
$$h_i(\pi) = \begin{cases} \pi(i) & \text{for } i = 1, \ldots, |\mathcal{A}| - 1, \\ H(U) & \text{for } i = |\mathcal{A}|, \\ H(V|U) & \text{for } i = |\mathcal{A}| + 1. \end{cases}$$
Then by Lemma 2 there exists an auxiliary random variable $W'$ taking at most $|\mathcal{U} \times \mathcal{V}| + 1$ values such that
$$H(U|W) = \sum_{w \in \mathcal{W}} P_W(w) H(U \mid W = w) = \sum_{w \in \mathcal{W}'} P_{W'}(w) H(U \mid W' = w) = H(U|W'),$$
$$H(V|UW) = \sum_{w \in \mathcal{W}} P_W(w) H(V \mid U, W = w) = \sum_{w \in \mathcal{W}'} P_{W'}(w) H(V \mid U, W' = w) = H(V|UW').$$
The constraints on the conditional distributions, the rate constraints, and the Markov chain $U - W - V$ are therefore still verified, since we can write
$$I(U;W) = H(U) - H(U|W),$$
$$I(UV;W) = H(UV) - H(UV|W) = H(U) + H(V|U) - H(U|W) - H(V|UW),$$
$$I(U;V|W) = H(U|W) - H(U|VW) = 0.$$
Note that we are not forgetting any constraints: once the distribution $\bar{P}_{UV}$ and the Markov chain $U - W - V$ are preserved, the dispersion terms of the channels $\bar{P}_{W|U}$ and $\bar{P}_{W|UV}$ are fixed as well.

REFERENCES

[1] O. Gossner, P. Hernandez, and A. Neyman, "Optimal use of communication resources," Econometrica, pp. 1603–1636, 2006.
[2] B. Larrousse, S. Lasaulce, and M. Bloch, "Coordination in distributed networks via coded actions with application to power control," IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3633–3654, May 2018.
[3] P. Cuff, "Communication in networks for coordinating behavior," Ph.D. dissertation, Stanford University, 2009.
[4] P. Cuff, H. H. Permuter, and T. M. Cover, "Coordination capacity," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4181–4206, 2010.
[5] C. H. Bennett, P. W. Shor, J. A. Smolin, and A. V. Thapliyal, "Entanglement-assisted capacity of a quantum channel and the reverse Shannon theorem," IEEE Transactions on Information Theory, vol. 48, no. 10, pp. 2637–2655, 2002.
[6] E. Soljanin, "Compressing quantum mixed-state sources by sending classical information," IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2263–2275, 2002.
[7] G. Kramer and S. A. Savari, "Communicating probability distributions," IEEE Transactions on Information Theory, vol. 53, no. 2, pp. 518–525, 2007.
[8] A. Winter, "Compression of sources of probability distributions and density operators," 2002. [Online]. Available: http://arxiv.org/abs/quant-ph/0208131
[9] P. Cuff, "Distributed channel synthesis," IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7071–7096, Nov. 2013.
[10] F. Haddadpour, M. H. Yassaee, A. Gohari, and M. R. Aref, "Coordination via a relay," in Proc. IEEE International Symposium on Information Theory (ISIT), 2012, pp. 3048–3052.
[11] M. R. Bloch and J. Kliewer, "Strong coordination over a three-terminal relay network," in Proc. IEEE Information Theory Workshop (ITW), 2014, pp. 646–650.
[12] ——, "Strong coordination over a line network," in Proc. IEEE International Symposium on Information Theory (ISIT), 2013, pp. 2319–2323.
[13] B. N. Vellambi, J. Kliewer, and M. R. Bloch, "Strong coordination over multi-hop line networks," in Proc. IEEE Information Theory Workshop – Fall (ITW), 2015, pp. 192–196.
[14] ——, "Strong coordination over a line when actions are Markovian," in Proc. Annual Conference on Information Science and Systems (CISS), 2016, pp. 412–417.
[15] P. Cuff and C. Schieler, "Hybrid codes needed for coordination over the point-to-point channel," in Proc. Allerton Conference on Communication, Control and Computing, 2011, pp. 235–239.
[16] M. Le Treust, "Correlation between channel state and information source with empirical coordination constraint," in Proc. IEEE Information Theory Workshop (ITW), 2014, pp. 272–276.
[17] ——, "Empirical coordination with two-sided state information and correlated source and state," in Proc. IEEE International Symposium on Information Theory (ISIT), 2015, pp. 466–470.
[18] ——, "Empirical coordination with channel feedback and strictly causal or causal encoding," in Proc. IEEE International Symposium on Information Theory (ISIT), 2015, pp. 471–475.
[19] B. Larrousse, S. Lasaulce, and M. Wigger, "Coordinating partially-informed agents over state-dependent networks," in Proc. IEEE Information Theory Workshop (ITW), 2015, pp. 1–5.
[20] M. Le Treust, "Joint empirical coordination of source and channel," IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 5087–5114, 2017.
[21] F. Haddadpour, M. H. Yassaee, S. Beigi, A. Gohari, and M. R. Aref, "Simulation of a channel with another channel," IEEE Transactions on Information Theory, vol. 63, no. 5, pp. 2659–2677, 2017.
[22] G. Cervia, L. Luzzi, M. Le Treust, and M. R. Bloch, "Strong coordination of signals and actions over noisy channels," in Proc. IEEE International Symposium on Information Theory (ISIT), Jun. 2017, pp. 2835–2839.
[23] ——, "Strong coordination of signals and actions over noisy channels with two-sided state information," IEEE Transactions on Information Theory, vol. 66, no. 8, pp. 4681–4708, 2020.
[24] G. Cervia, T. J. Oechtering, and M. Skoglund, "Fixed-length strong coordination," in Proc. IEEE Information Theory Workshop (ITW), 2019.
[25] V. Strassen, "Asymptotische Abschätzungen in Shannons Informationstheorie," in Transactions of the Third Prague Conference on Information Theory, Czechoslovak Academy of Sciences, Prague, 1962, pp. 689–723.
[26] I. Kontoyiannis, "Second-order noiseless source coding theorems," IEEE Transactions on Information Theory, vol. 43, no. 4, pp. 1339–1341, 1997.
[27] D. Baron, M. A. Khojastepour, and R. G. Baraniuk, "How quickly can we approach channel capacity?" in Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, vol. 1, 2004, pp. 1096–1100.
[28] M. Hayashi, "Second-order asymptotics in fixed-length source coding and intrinsic randomness," IEEE Transactions on Information Theory, vol. 54, no. 10, pp. 4619–4637, 2008.
[29] ——, "Information spectrum approach to second-order coding rate in channel coding," IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 4947–4966, 2009.
[30] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
[31] S. Verdú, "Non-asymptotic achievability bounds in multiuser information theory," in Proc. Allerton Conference on Communication, Control and Computing, 2012, pp. 1–8.
[32] E. MolavianJazi and J. N. Laneman, "Simpler achievable rate regions for multiaccess with finite blocklength," in Proc. IEEE International Symposium on Information Theory (ISIT), 2012, pp. 36–40.
[33] V. Kostina and S. Verdú, "Fixed-length lossy compression in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3309–3338, 2012.
[34] ——, "Lossy joint source-channel coding in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 59, no. 5, pp. 2545–2575, 2013.
[35] V. Y. F. Tan and O. Kosut, "On the dispersions of three network information theory problems," IEEE Transactions on Information Theory, vol. 60, no. 2, pp. 881–903, 2014.
[36] S. Watanabe, S. Kuzuoka, and V. Y. F. Tan, "Nonasymptotic and second-order achievability bounds for coding with side-information," IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1574–1605, 2015.
[37] R. Nomura and T. S. Han, "Second-order Slepian-Wolf coding theorems for non-mixed and mixed sources," IEEE Transactions on Information Theory, vol. 60, no. 9, pp. 5553–5572, 2014.
[38] R. Blahut, "Hypothesis testing and information theory," IEEE Transactions on Information Theory, vol. 20, no. 4, pp. 405–417, 1974.
[39] A. Tauste Campo, G. Vazquez-Vilar, A. Guillén i Fàbregas, and A. Martinez, "Converse bounds for finite-length joint source-channel coding," in Proc. Allerton Conference on Communication, Control and Computing, 2012, pp. 302–307.
[40] M. H. Yassaee, M. R. Aref, and A. Gohari, "A technique for deriving one-shot achievability results in network information theory," in Proc. IEEE International Symposium on Information Theory (ISIT), 2013, pp. 1287–1291.
[41] ——, "Non-asymptotic output statistics of random binning and its applications," in Proc. IEEE International Symposium on Information Theory (ISIT), 2013, pp. 1849–1853.
[42] G. Vazquez-Vilar, A. Tauste Campo, A. Guillén i Fàbregas, and A. Martinez, "The meta-converse bound is tight," in Proc. IEEE International Symposium on Information Theory (ISIT), 2013, pp. 1730–1733.
[43] ——, "Bayesian M-ary hypothesis testing: The meta-converse and Verdú-Han bounds are tight," IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2324–2333, 2016.
[44] V. Kostina, "Lossy data compression: Non-asymptotic fundamental limits," Ph.D. dissertation, Princeton University, 2013.
[45] M. H. Yassaee, M. R. Aref, and A. Gohari, "Achievability proof via output statistics of random binning," IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6760–6786, Nov. 2014.
[46] T. Lindvall, Lectures on the Coupling Method. John Wiley & Sons, 1992; Dover reprint, 2002.
[47] V. Erokhin, "ε-entropy of a discrete random variable," Theory of Probability & Its Applications, vol. 3, no. 1, pp. 97–100, 1958.
[48] Y. Polyanskiy and Y. Wu, "Lecture notes on information theory," Lecture Notes for ECE563 (UIUC) and 6.441 (MIT), 2012–2016.
[49] J. Scarlett, A. Martinez, and A. Guillén i Fàbregas, "Mismatched decoding: Finite-length bounds, error exponents and approximations," arXiv preprint arXiv:1303.6166, 2013.
[50] A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. M. J. Willems, "Bit-interleaved coded modulation revisited: A mismatched decoding perspective," IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2756–2765, 2009.
[51] M. H. Yassaee, M. R. Aref, and A. Gohari, "Non-asymptotic output statistics of random binning and its applications," arXiv preprint arXiv:1303.0695, 2013.
[52] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE National Convention Record, part 4, pp. 142–163, 1959.
[53] A. El Gamal and Y. H. Kim,