Every Bit Counts: Second-Order Analysis of Cooperation in the Multiple-Access Channel
Oliver Kosut, Michelle Effros, Michael Langberg
Abstract—The work at hand presents a finite-blocklength analysis of the multiple access channel (MAC) sum-rate under the cooperation facilitator (CF) model. The CF model, in which independent encoders coordinate through an intermediary node, is known to show significant rate benefits, even when the rate of cooperation is limited. We continue this line of study for cooperation rates which are sub-linear in the blocklength $n$. Roughly speaking, our results show that if the facilitator transmits $\log K$ bits, there is a sum-rate benefit of order $\sqrt{\log K/n}$. This result extends across a wide range of $K$: even a single bit of cooperation is shown to provide a sum-rate benefit of order $1/\sqrt{n}$.

I. INTRODUCTION
The multiple access channel (MAC) model lies at an interesting conceptual intersection between the notions of cooperation and interference in wireless communications. When viewed from the perspective of any single transmitter, codewords transmitted by other transmitters can only inhibit the first transmitter's individual communication rate; thus each transmitter sees the others as a source of interference. When viewed from the perspective of the receiver, however, maximizing the total rate delivered to the receiver often requires all transmitters to communicate simultaneously; from the receiver's perspective, then, the transmitters must cooperate through their simultaneous transmissions to maximize the sum-rate delivered to the receiver.

Simultaneous transmission is, perhaps, the weakest form of cooperation imaginable in a wireless communication model. Nonetheless, the fact that even simultaneous transmission of independent codewords from interfering transmitters can increase the sum-rate deliverable to the MAC receiver begs the question of how much more could be achieved through more significant forms of MAC transmitter cooperation.

The information theory literature devotes considerable effort to studying the impact of encoder cooperation in the MAC. A variety of cooperation models are considered. Examples include the "conferencing" cooperation model [1], in which encoders share information directly in order to coordinate their channel inputs, the "cribbing" cooperation model [2], in which transmitters cooperate by sharing their codeword information (at times causally), and the "cooperation facilitator" (CF)
O. Kosut is with the School of Electrical, Computer and Energy Engineering at Arizona State University. Email: [email protected]
M. Effros is with the Department of Electrical Engineering at the California Institute of Technology. Email: [email protected]
M. Langberg is with the Department of Electrical Engineering at the University at Buffalo (State University of New York). Email: [email protected]
This work is supported in part by NSF grants CCF-1817241, CCF-1908725, and CCF-1909451.

cooperation model [3], in which users coordinate their channel inputs with the help of an intermediary called the CF. The CF distinguishes the amount of information that must be understood to facilitate cooperation (i.e., the rate $R_{\mathrm{IN}}$ to the CF) from the amount of information employed in the coordination (i.e., the rate $R_{\mathrm{OUT}}$ from the CF). Key results using the CF model show that for many MACs, no matter what the (non-zero) fixed rate $R_{\mathrm{IN}}$, the curve describing the maximal sum-rate as a function of $R_{\mathrm{OUT}}$ has infinite slope at $R_{\mathrm{OUT}} = 0$ [4]. That is, very little coordination through a CF can change the MAC capacity considerably.
This phenomenon holds for both average and maximum error sum-rates; it is most extreme in the latter case, where even a finite number of bits (independent of the blocklength), that is, $R_{\mathrm{OUT}} = 0$, can suffice to change the MAC capacity region [5]–[7].

We study the CF model for 2-user MACs under the average error criterion. In this setting, the maximal sum-rate is a continuous function of $R_{\mathrm{OUT}}$ at $R_{\mathrm{OUT}} = 0$ [6], [7], implying a first-order upper bound on the benefit of cooperation for rates that are sub-linear. However, sub-linear CF cooperation may still increase the sum-rate, albeit through second-order terms. In this work, we seek to understand the impact of the CF over a wide range of cooperation rates. Specifically, we consider a CF that, after viewing both messages, can transmit one of $K$ signals to both transmitters. We prove achievable bounds that express the benefit of this cooperation as a function of $K$. These bounds extend all the way from constant $K$ to exponential $K$. Interestingly, we find that even for $K = 2$ (i.e., one bit of cooperation), there is a benefit in the second-order (i.e., dispersion) term, corresponding to an improvement of $O(\sqrt{n})$ message bits. We prove two main achievable bounds, each of which is optimal for a different range of $K$ values. The proof of the first bound is based on refined asymptotic analysis similar to typical second-order bounds. The proof of the second bound is based on the method of types. For a wide range of $K$ values, we find that the benefit is $O(\sqrt{n \log K})$ message bits.

II. PROBLEM SETUP

An $(M_1, M_2, K)$ facilitated multiple access code for multiple access channel (MAC) $(\mathcal{X}_1 \times \mathcal{X}_2,\ p_{Y|X_1,X_2}(y|x_1,x_2),\ \mathcal{Y})$ is defined by a facilitator code $e: [M_1] \times [M_2] \to [K]$, a pair of encoders $f_1: [M_1] \times [K] \to \mathcal{X}_1$ and $f_2: [M_2] \times [K] \to \mathcal{X}_2$, and a decoder $g: \mathcal{Y} \to [M_1] \times [M_2]$.
The encoders' outputs are sometimes described using the abbreviated notation
$$X_1(m_1, m_2) = f_1(m_1, e(m_1, m_2)), \qquad X_2(m_1, m_2) = f_2(m_2, e(m_1, m_2)).$$
The average error probability for the given code is
$$P_e = \frac{1}{M_1 M_2} \sum_{m_1=1}^{M_1} \sum_{m_2=1}^{M_2} \Pr\big(g(Y) \ne (m_1, m_2) \,\big|\, (X_1, X_2) = (X_1(m_1, m_2), X_2(m_1, m_2))\big).$$
We also consider codes for the $n$-length product channel, where $\mathcal{X}_1, \mathcal{X}_2, \mathcal{Y}$ are replaced by $\mathcal{X}_1^n, \mathcal{X}_2^n, \mathcal{Y}^n$ respectively, and where
$$p_{Y^n|X_1^n,X_2^n}(y^n|x_1^n,x_2^n) = \prod_{i=1}^{n} p_{Y|X_1,X_2}(y_i|x_{1i},x_{2i}).$$
An $(M_1, M_2, K)$ code for the $n$-length channel achieving average probability of error at most $\epsilon$ is called an $(n, M_1, M_2, K, \epsilon)$ code. We assume that all alphabets are finite.

The following notation will be useful. Given a MAC $(\mathcal{X}_1 \times \mathcal{X}_2,\ p_{Y|X_1,X_2}(y|x_1,x_2),\ \mathcal{Y})$, the sum-capacity without cooperation is given by
$$C_{\mathrm{sum}} = \max_{p_{X_1} p_{X_2}} I(X_1, X_2; Y). \quad (1)$$
Let $\mathcal{P}^\star$ be the set of product distributions $p_{X_1} p_{X_2}$ achieving the maximum in (1). For any $p_{X_1} p_{X_2} \in \mathcal{P}^\star$, let $p_Y$ be the resulting marginal on the channel output, giving
$$p_Y(y) = \sum_{(x_1,x_2) \in \mathcal{X}_1 \times \mathcal{X}_2} p_{X_1}(x_1)\, p_{X_2}(x_2)\, p_{Y|X_1,X_2}(y|x_1,x_2)$$
for all $y \in \mathcal{Y}$. We use $i(x_1,x_2;y)$, $i(x_1;y|x_2)$ and $i(x_2;y|x_1)$ to represent the joint and conditional information densities
$$i(x_1,x_2;y) = \log\left(\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_Y(y)}\right),$$
$$i(x_1;y|x_2) = \log\left(\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_{Y|X_2}(y|x_2)}\right),$$
$$i(x_2;y|x_1) = \log\left(\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_{Y|X_1}(y|x_1)}\right),$$
where $p_{Y|X_1}$ and $p_{Y|X_2}$ are conditional marginals on $Y$ under the joint distribution $p_{X_1,X_2,Y} = p_{X_1} p_{X_2} p_{Y|X_1,X_2}$. We denote the 3-vector of all three quantities as
$$\mathbf{i}(x_1,x_2;y) = \begin{bmatrix} i(x_1,x_2;y) \\ i(x_1;y|x_2) \\ i(x_2;y|x_1) \end{bmatrix}.$$
It will be convenient to define
$$\bar{i}(x_1,x_2) = E[\,i(x_1,x_2;Y) \mid (X_1,X_2) = (x_1,x_2)\,] = D(p_{Y|X_1=x_1,X_2=x_2} \,\|\, p_Y).$$
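As a concrete illustration of these definitions, the following sketch computes $\bar{i}(x_1,x_2)$ and recovers $I(X_1,X_2;Y)$ for a toy noiseless binary adder MAC ($Y = X_1 + X_2$) with uniform product inputs. Both the channel and the input distribution are assumptions made purely for illustration; they do not come from the paper.

```python
import math

# Toy MAC for illustration only: noiseless binary adder Y = X1 + X2
# with uniform product inputs (an assumed example).
X = [0, 1]
p_x1 = {0: 0.5, 1: 0.5}
p_x2 = {0: 0.5, 1: 0.5}
Ys = [0, 1, 2]

def p_y_given_x(y, x1, x2):
    # deterministic adder channel law p(y | x1, x2)
    return 1.0 if y == x1 + x2 else 0.0

# output marginal p_Y(y) = sum_{x1,x2} p(x1) p(x2) p(y | x1, x2)
p_y = {y: sum(p_x1[a] * p_x2[b] * p_y_given_x(y, a, b)
              for a in X for b in X) for y in Ys}

def i_joint(x1, x2, y):
    # joint information density i(x1,x2;y) = log2 p(y|x1,x2)/p_Y(y), in bits
    return math.log2(p_y_given_x(y, x1, x2) / p_y[y])

def i_bar(x1, x2):
    # \bar i(x1,x2) = E[i(x1,x2;Y) | X1=x1, X2=x2] = D(p_{Y|x1,x2} || p_Y)
    return sum(p_y_given_x(y, x1, x2) * i_joint(x1, x2, y)
               for y in Ys if p_y_given_x(y, x1, x2) > 0)

# Averaging \bar i over the inputs recovers I(X1,X2;Y); for this channel
# that is H(Y) = 1.5 bits.  V1 is the codeword information-variance
# Var(\bar i(X1,X2)) from (2).
I_sum = sum(p_x1[a] * p_x2[b] * i_bar(a, b) for a in X for b in X)
V1 = sum(p_x1[a] * p_x2[b] * (i_bar(a, b) - I_sum) ** 2
         for a in X for b in X)
print(I_sum, V1)  # 1.5 0.25
```

Note that because this toy channel is deterministic, the channel-noise variance $V_2$ of (3) is zero here; a noisy channel would make both variances positive.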
Let
$$V_1 = \mathrm{Var}(\bar{i}(X_1,X_2)), \quad (2)$$
$$V_2 = E[\,\mathrm{Var}(i(X_1,X_2;Y) \mid X_1,X_2)\,]. \quad (3)$$
Roughly speaking, $V_1$ represents the information-variance of the codewords, whereas $V_2$ represents the information-variance of the channel noise. Given two distributions $p_X, q_X$, let the divergence-variance be
$$V(p_X \| q_X) = \mathrm{Var}_{p_X}\left(\log \frac{p_X(X)}{q_X(X)}\right).$$
Note that
$$V_2 = \sum_{x_1,x_2} p_{X_1}(x_1)\, p_{X_2}(x_2)\, V(p_{Y|X_1=x_1,X_2=x_2} \| p_Y).$$

III. MAIN RESULTS
Define the fundamental sum-rate limit for the facilitated MAC as
$$R_{\mathrm{sum}}(n,\epsilon,K) = \sup\left\{\frac{\log(M_1 M_2)}{n} : \exists\, (n, M_1, M_2, K, \epsilon) \text{ code}\right\}.$$
In the literature on second-order rates, there are typically two types of results: (i) finite blocklength results, with no asymptotic terms, that are typically written in terms of abstract alphabets, and (ii) asymptotic results that derive from these finite blocklength results, which are typically easier to understand. The following is an achievable result which has some flavor of both: the channel noise is dealt with via an asymptotic analysis, but the dependence on the randomness in the codewords is written as in a finite blocklength result. We provide this "intermediate" result because, depending on the CF parameter $K$, the relevant aspect of the codeword distribution may be in the central limit, moderate deviations, or large deviations regime. Thus, in this form one may plug in any concentration bound to derive an achievable bound. Subsequently, Theorem 2 gives specific achievable results based on two different concentration bounds. We also prove another achievable bound, Theorem 3, which does not rely on Theorem 1, but instead uses an approach based on the method of types that applies at larger values of $K$.

Theorem 1.
Assume $\log K = o(n)$. For any distributions $p_{X_1}, p_{X_2}$, let $X_j^n(k)$ be an i.i.d. sequence from $p_{X_j}$ for each $k \in [K]$, with all sequences mutually independent. There exists an $(n, M_1, M_2, K, \epsilon)$ code if
$$\epsilon \ge \Pr\left(\max_{k \in [K]} \sum_{i=1}^{n} \bar{i}(X_{1i}(k), X_{2i}(k)) + \sqrt{nV_2}\, Z < \log(M_1 M_2 K) + \frac{1}{2}\log n\right) + O\left(\sqrt{\frac{\log n}{n}}\right) + O\left(\sqrt{\frac{\log K}{n}}\right), \quad (4)$$
$$\log M_1 \le n I(X_1; Y | X_2) - c\sqrt{n \log K + n \log n}, \quad (5)$$
$$\log M_2 \le n I(X_2; Y | X_1) - c\sqrt{n \log K + n \log n}, \quad (6)$$
where $Z$ is a standard Gaussian, and where $c$ is a constant.

For fixed $K$, let $Z_0, Z_1, \ldots, Z_K$ be drawn i.i.d. from $\mathcal{N}(0,1)$. Let
$$S_K = \sqrt{V_2}\, Z_0 + \sqrt{V_1} \max_{k \in [1:K]} Z_k,$$
and define the CDF of $S_K$ as $F_{S_K}(s) = \Pr(S_K \le s)$. Also let $F^{-1}_{S_K}$ be the inverse of the CDF; that is, $F^{-1}_{S_K}(p) = \sup\{s : F_{S_K}(s) \le p\}$ for $p \in [0,1)$. In what follows we use Theorem 1 and the function $F^{-1}_{S_K}$ to explicitly bound from below the benefit in sum-rate when cooperating with varying measures of $K$. A numerical computation of $F^{-1}_{S_K}(\epsilon)$ as a function of $K$ is shown in Fig. 1. The following is a technical estimate of $F^{-1}_{S_K}$.

Fig. 1. The inverse CDF $F^{-1}_{S_K}(\epsilon)$ for fixed $\epsilon$, with $V_1 = V_2 = 1$, across a range of $K$. Note that the horizontal axis is $\log K$, i.e., the number of bits transmitted from the CF.

Lemma 1.
For $K$ and $\epsilon$ that satisfy $K > e\sqrt{\pi}\,\ln(4/\epsilon)$, $F^{-1}_{S_K}(\epsilon)$ is at least
$$\sqrt{V_1\big(2\ln K - 2\ln\ln(4/\epsilon) - \ln\ln K - \ln(4\pi)\big)} - \sqrt{2 V_2 \ln(2/\epsilon)}.$$
Moreover, for all $K$ and $\epsilon$,
$$F^{-1}_{S_K}(1-\epsilon) \le \sqrt{2 V_1 \ln K} + \sqrt{2 V_1 \ln(4/\epsilon)} + \sqrt{2 V_2 \ln(2/\epsilon)}.$$

Proof:
Let $Z_{(K)} = \max_{k \in [K]} Z_k$. From [8], it holds that $\Pr(Z_{(K)} \le \sqrt{\kappa - \ln \kappa}) \le \epsilon/4$ for $\kappa = 2\ln\big(K/(\sqrt{\pi}\ln(4/\epsilon))\big) \ge 2$. Moreover, $\Pr\big(\sqrt{V_2}\, Z_0 \le -\sqrt{2 V_2 \ln(2/\epsilon)}\big) \le \epsilon/2$. Combining these bounds gives the desired lower bound.

For the upper bound, [9], [10] imply that for any $K$, $\Pr\big(\sqrt{V_1}\, Z_{(K)} \ge \sqrt{2 V_1 \ln K} + \sqrt{2 V_1 \ln(4/\epsilon)}\big) \le \epsilon/2$. Moreover, $\Pr\big(\sqrt{V_2}\, Z_0 \ge \sqrt{2 V_2 \ln(2/\epsilon)}\big) \le \epsilon/2$. Thus, $F^{-1}_{S_K}(1-\epsilon) \le \sqrt{2 V_1 \ln K} + \sqrt{2 V_1 \ln(4/\epsilon)} + \sqrt{2 V_2 \ln(2/\epsilon)}$.
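The quantile $F^{-1}_{S_K}$ has no closed form, but it can be estimated by Monte Carlo directly from the definition of $S_K$. The sketch below, with the illustrative choices $V_1 = V_2 = 1$ and $\epsilon = 0.1$ (assumptions, not values from the paper), checks the qualitative content of Lemma 1: the $\epsilon$-quantile is strictly increasing in $K$.

```python
import math, random

random.seed(0)
V1, V2 = 1.0, 1.0   # illustrative variances, as in Fig. 1
eps = 0.1           # illustrative target error probability

def sample_SK(K):
    # S_K = sqrt(V2)*Z_0 + sqrt(V1)*max_{k in [K]} Z_k,
    # with Z_0, Z_1, ..., Z_K i.i.d. standard Gaussian
    z0 = random.gauss(0.0, 1.0)
    zmax = max(random.gauss(0.0, 1.0) for _ in range(K))
    return math.sqrt(V2) * z0 + math.sqrt(V1) * zmax

def F_inv(K, p, trials=20000):
    # empirical p-quantile as a stand-in for F^{-1}_{S_K}(p)
    xs = sorted(sample_SK(K) for _ in range(trials))
    return xs[int(p * trials)]

# The eps-quantile grows with K: even one bit of cooperation (K = 2)
# strictly beats no cooperation (K = 1), as claimed in the text.
q = {K: F_inv(K, eps) for K in (1, 2, 16, 256)}
print(q)
assert q[1] < q[2] < q[16] < q[256]
```

For $K = 1$ the exact value $F^{-1}_{S_1}(\epsilon) = \sqrt{V_1+V_2}\,\Phi^{-1}(\epsilon) \approx -1.81$ can be used to sanity-check the Monte Carlo estimate; the Lemma 1 upper bound can be compared against the empirical $(1-\epsilon)$-quantile in the same way.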
For any $p_{X_1} p_{X_2} \in \mathcal{P}^\star$ and the associated constants $V_1$ and $V_2$, if $\log K = o(n^{1/3})$, then
$$R_{\mathrm{sum}}(n,\epsilon,K) \ge C_{\mathrm{sum}} + \frac{1}{\sqrt{n}} F^{-1}_{S_K}(\epsilon) - \theta_n$$
where
$$\theta_n = O\left(\frac{\log n}{n}\right), \quad \text{if } K \le \log n, \quad (7)$$
$$\theta_n = O\left(\frac{K}{n}\right), \quad \text{if } \log n \le K \le \log^{3/2} n, \quad (8)$$
$$\theta_n = O\left(\frac{\log^{3/2} n}{n}\right), \quad \text{if } \log^{3/2} n \le K \le n, \quad (9)$$
$$\theta_n = O\left(\frac{\log^{3/2} K}{n}\right), \quad \text{if } K \ge n. \quad (10)$$

For larger $K$, our achievability bound employs the function
$$\Delta(a) = \max_{p_{X_1,X_2}:\ I(X_1;X_2) \le a} I(X_1,X_2;Y) - C_{\mathrm{sum}}.$$
Note that $\Delta(0) = 0$. Lemma 2 captures the behavior of $\Delta(a)$ for small $a$. (See Appendix A for the proof.)

Lemma 2.
In the limit as $a \to 0$,
$$\Delta(a) = \sqrt{2 a V^\star \ln 2} + o(\sqrt{a})$$
where
$$V^\star = \max_{p_{X_1} p_{X_2} \in \mathcal{P}^\star} \mathrm{Var}(\bar{i}(X_1,X_2)). \quad (11)$$

Theorem 3.
For any $K$ such that $\log K = \omega(\log n)$,
$$R_{\mathrm{sum}}(n,\epsilon,K) \ge C_{\mathrm{sum}} + \Delta\left(\frac{\log K}{n} - O\left(\frac{\log n}{n}\right)\right) - O\left(\frac{1}{\sqrt{n}}\right).$$

Remark 1.
While Theorems 2 and 3 appear quite different, Lemmas 1 and 2 imply that for mid-range $K$ values, they give similar results. In particular, if $\log n \ll \log K \ll n^{1/3}$, then applying Theorem 2 and choosing the distribution $p_{X_1} p_{X_2} \in \mathcal{P}^\star$ that achieves the maximum in (11) gives
$$R_{\mathrm{sum}}(n,\epsilon,K) - C_{\mathrm{sum}} \ge \frac{1}{\sqrt{n}} F^{-1}_{S_K}(\epsilon) - \theta_n \approx \sqrt{\frac{2 V^\star \ln K}{n}}.$$
For the same range of $K$, Theorem 3 gives
$$R_{\mathrm{sum}}(n,\epsilon,K) - C_{\mathrm{sum}} \ge \Delta\left(\frac{\log K}{n} - O\left(\frac{\log n}{n}\right)\right) - O\left(\frac{1}{\sqrt{n}}\right) \approx \sqrt{\frac{2 V^\star \ln 2 \cdot \log K}{n}} = \sqrt{\frac{2 V^\star \ln K}{n}}.$$
A. Comparison to prior work
In [4], an analog to Theorem 3 is proven for the asymptotic blocklength regime. Namely, in our notation, [4] proves that for any $\epsilon > 0$ and $\delta > 0$, if we set $K = 2^{\Omega(n)}$ then for sufficiently large $n$,
$$R_{\mathrm{sum}}(n,\epsilon,K) - C_{\mathrm{sum}} > \Delta\left(\frac{\log K}{n}\right) - \delta.$$
Similarly, in [4], [7], an analog to Lemma 2 is shown for asymptotic blocklength. Specifically, it is shown that the existence of distributions $p_{X_1} p_{X_2} \in \mathcal{P}^\star$ and $p_{\tilde X_1 \tilde X_2}$ over $\mathcal{X}_1 \times \mathcal{X}_2$ such that (a) the support of $p_{\tilde X_1 \tilde X_2}$ is included in that of $p_{X_1} p_{X_2}$, and (b) $I(\tilde X_1, \tilde X_2; \tilde Y) + D(p_{\tilde X_1 \tilde X_2} \| p_{X_1} p_{X_2}) > I(X_1, X_2; Y)$ for
$$p_{X_1,X_2,\tilde X_1,\tilde X_2,Y,\tilde Y}(x_1,x_2,\tilde x_1,\tilde x_2,y,\tilde y) = p_{X_1}(x_1)\, p_{X_2}(x_2)\, p_{\tilde X_1,\tilde X_2}(\tilde x_1,\tilde x_2)\, p_{Y|X_1,X_2}(y|x_1,x_2)\, p_{Y|X_1,X_2}(\tilde y|\tilde x_1,\tilde x_2),$$
implies that there exists a constant $\sigma > 0$ such that $\Delta(a) \ge \sigma\sqrt{a}$ for all sufficiently small $a$.

Although Theorem 3 and Lemma 2 (and their proof techniques) are similar in nature to those of [4], [7], the analysis presented here is refined in that it captures higher-order behavior in the blocklength $n$ and is further optimized to address the challenges in studying values of $K$ that are sub-exponential in the blocklength $n$.

We may also compare our results against prior achievable bounds without cooperation. Note that the standard MAC, with no cooperation, corresponds to $K = 1$. In fact, in this case Theorem 2 gives the same second-order term as the best-known achievable bound for the MAC sum-rate [11]–[15]. This can be seen by noting that $S_1 \sim \mathcal{N}(0, V_1 + V_2)$, and so $F^{-1}_{S_1}(\epsilon) = \sqrt{V_1 + V_2}\,\Phi^{-1}(\epsilon)$. Thus Theorem 2 gives
$$R_{\mathrm{sum}}(n,\epsilon,1) \ge C_{\mathrm{sum}} + \sqrt{\frac{V_1+V_2}{n}}\, \Phi^{-1}(\epsilon) - O\left(\frac{\log n}{n}\right).$$
Moreover, $V_1 + V_2 = \mathrm{Var}(i(X_1,X_2;Y))$, which, for the optimal input distribution, is precisely the best-known achievable dispersion. The proof of Theorem 2 uses i.i.d. codebooks, which, as shown in [14], can be outperformed in terms of second-order rate by constant-composition codebooks.
However, as pointed out in [15, Sec. III-B], the two approaches give the same bounds on the sum-rate itself.

Another interesting conclusion comes from comparing the no-cooperation case ($K = 1$) with a single bit of cooperation ($K = 2$). As long as $V^\star > 0$, it is easy to see that $F^{-1}_{S_2}(\epsilon) > F^{-1}_{S_1}(\epsilon)$ for any $\epsilon \in (0,1)$ (Fig. 1 shows an example). Thus, the second-order coefficient in Theorem 2 for $K = 2$ is strictly improved compared to $K = 1$. Therefore, even a single bit of cooperation allows for $O(\sqrt{n})$ additional message bits.

IV. PROOF OF THEOREM 1

We use a random code design, drawing codewords
$$f_1(1,1), f_1(1,2), \ldots, f_1(M_1,K) \sim \text{i.i.d. } p_{X_1},$$
$$f_2(1,1), f_2(1,2), \ldots, f_2(M_2,K) \sim \text{i.i.d. } p_{X_2}.$$
The facilitator code $e(m_1,m_2)$ is then designed in an attempt to maximize the likelihood $p_{Y|X_1,X_2}$ under a received channel output $Y$.

We begin by defining the threshold decoder $g(y)$ employed in our analysis. Maximum likelihood decoding is expected to give the best performance, but we instead employ a threshold decoder for simplicity. For notational efficiency, let
$$(X_1, X_2)(m_1, m_2) = (X_1(m_1,m_2),\, X_2(m_1,m_2)) = (f_1(m_1, e(m_1,m_2)),\, f_2(m_2, e(m_1,m_2))),$$
where $e(m_1,m_2)$ is the (fixed) facilitator function to be defined below. Given a constant vector $\mathbf{c}^\star = [c_1^\star, c_2^\star, c_3^\star]^T$, we define the decoder $g(y)$ to choose the unique message pair $(m_1,m_2)$ such that
$$\mathbf{i}((X_1,X_2)(m_1,m_2); y) \ge \mathbf{c}^\star,$$
where the vector inequality means that all three inequalities must hold simultaneously. Conversely, we use the notation $\not\ge$ between vectors to mean that any one of the three inequalities fails. If the number of message pairs that meet this constraint is not one, we declare an error.
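The threshold rule just described can be made concrete: a message pair survives only if all three components of its information-density vector clear the corresponding thresholds, and decoding succeeds only when exactly one pair survives. In the sketch below, the table of density values and the threshold vector are invented purely for illustration; in the proof these numbers come from the channel law.

```python
# Schematic threshold decoder: accept a message pair only if all three
# information densities clear their thresholds (vector inequality).

def meets_threshold(i_vec, c_star):
    # vector inequality i(...) >= c*: every component must hold
    return all(i >= c for i, c in zip(i_vec, c_star))

def threshold_decode(density_table, c_star):
    # density_table: {(m1, m2): [i(x1,x2;y), i(x1;y|x2), i(x2;y|x1)]}
    hits = [pair for pair, vec in density_table.items()
            if meets_threshold(vec, c_star)]
    # a unique survivor is required; otherwise declare an error (None)
    return hits[0] if len(hits) == 1 else None

c_star = [3.0, 1.5, 1.5]          # illustrative threshold vector
table = {
    (1, 1): [4.2, 2.0, 1.9],      # true pair: clears all three thresholds
    (1, 2): [3.5, 0.4, 1.8],      # fails the i(x1; y | x2) test
    (2, 1): [2.1, 1.7, 1.6],      # fails the joint test
}
print(threshold_decode(table, c_star))  # (1, 1)
```

Raising any threshold so that no pair (or more than one pair) survives makes the decoder declare an error, matching the rule in the text.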
In an attempt to ensure that $\mathbf{i}((X_1,X_2)(m_1,m_2); Y)$ is, in some sense, large for random channel outputs $Y$ that may result from that codeword pair's transmission, for each $(m_1,m_2) \in [M_1] \times [M_2]$, we define
$$e(m_1,m_2) = \arg\max_{k \in [K]} s(f_1(m_1,k), f_2(m_2,k)),$$
where $s(x_1,x_2)$ is a score function to be chosen below.

Under this code design, the expected error probability satisfies
$$E[P_e] = E[\Pr(g(Y) \ne (1,1) \mid (X_1,X_2) = (X_1,X_2)(1,1))]$$
$$\le \Pr\Big(\mathbf{i}((X_1,X_2)(1,1); Y) \not\ge \mathbf{c}^\star \ \text{or}\ \mathbf{i}(f_1(\hat m_1, \hat k), f_2(\hat m_2, \hat k); Y) \ge \mathbf{c}^\star \ \text{for some } (\hat m_1, \hat m_2) \ne (1,1),\ \hat k \in [K] \,\Big|\, (X_1,X_2) = (X_1,X_2)(1,1)\Big).$$
To further upper bound the error probability, we define the following random variables. Let $p_{\tilde X_1, \tilde X_2}$ be the joint distribution of $(X_1,X_2)(1,1)$ that results from our choice of CF. This distribution would be the same for any message pair. Let variables $X_1, X_2, \tilde X_1, \tilde X_2, Y, \tilde Y$ have joint distribution
$$p_{X_1,X_2,\tilde X_1,\tilde X_2,Y,\tilde Y}(x_1,x_2,\tilde x_1,\tilde x_2,y,\tilde y) = p_{X_1}(x_1)\, p_{X_2}(x_2)\, p_{\tilde X_1,\tilde X_2}(\tilde x_1,\tilde x_2)\, p_{Y|X_1,X_2}(y|x_1,x_2)\, p_{Y|X_1,X_2}(\tilde y|\tilde x_1,\tilde x_2).$$
Under transmission of message pair $(1,1)$, $(X_1, X_2, Y)$ capture the relationship between channel inputs and output in a standard MAC, whereas $(\tilde X_1, \tilde X_2, \tilde Y)$ capture the corresponding relationship with the CF. Moreover, $(\tilde X_1, X_2, \tilde Y)$, $(X_1, \tilde X_2, \tilde Y)$, and $(X_1, X_2, \tilde Y)$ capture the relationship between the channel output and one or more untransmitted codewords from our random code. Assume without loss of generality that $e(1,1) = 1$; i.e., $(X_1,X_2)(1,1) = (f_1(1,1), f_2(1,1))$. We now analyze the error probability by considering the following cases.

$$\begin{array}{c|c|c|c|c}
\hat m_1 & \hat m_2 & \hat k & \text{Number of values} & \text{Distribution} \\
\hline
\ne 1 & 1 & 1 & M_1 - 1 & p_{X_1}\, p_{\tilde X_2, \tilde Y} \\
\ne 1 & 1 & \ne 1 & (M_1 - 1)(K - 1) & p_{X_1}\, p_{X_2}\, p_{\tilde Y} \\
1 & \ne 1 & 1 & M_2 - 1 & p_{\tilde X_1, \tilde Y}\, p_{X_2} \\
1 & \ne 1 & \ne 1 & (M_2 - 1)(K - 1) & p_{X_1}\, p_{X_2}\, p_{\tilde Y} \\
\ne 1 & \ne 1 & \text{any} & (M_1 - 1)(M_2 - 1) K & p_{X_1}\, p_{X_2}\, p_{\tilde Y}
\end{array}$$

Note that we have excluded cases where $\hat m_1 = \hat m_2 = 1$, since those are not errors (even if $\hat k \ne 1$). Moreover, the number of cases wherein $(f_1(\hat m_1, \hat k), f_2(\hat m_2, \hat k), Y)$ has joint distribution $p_{X_1} p_{X_2} p_{\tilde Y}$ is less than $M_1 M_2 K$. We can upper bound the expected error probability as
$$E[P_e] \le \Pr\big(\mathbf{i}(\tilde X_1, \tilde X_2; \tilde Y) \not\ge \mathbf{c}^\star\big) + M_1 M_2 K \Pr(\mathbf{i}(X_1, X_2; \tilde Y) \ge \mathbf{c}^\star) + M_1 \Pr(\mathbf{i}(X_1, \tilde X_2; \tilde Y) \ge \mathbf{c}^\star) + M_2 \Pr(\mathbf{i}(\tilde X_1, X_2; \tilde Y) \ge \mathbf{c}^\star)$$
$$\le \Pr\big(\mathbf{i}(\tilde X_1, \tilde X_2; \tilde Y) \not\ge \mathbf{c}^\star\big) + M_1 M_2 K \Pr(i(X_1, X_2; \tilde Y) \ge c_1^\star) + M_1 \Pr(i(X_1; \tilde Y | \tilde X_2) \ge c_2^\star) + M_2 \Pr(i(X_2; \tilde Y | \tilde X_1) \ge c_3^\star).$$
Note that
$$\Pr(i(X_1; \tilde Y | \tilde X_2) \ge c_2^\star) = \sum_{x_1,x_2,y} p_{X_1}(x_1)\, p_{\tilde X_2,\tilde Y}(x_2,y)\, 1(i(x_1;y|x_2) \ge c_2^\star)$$
$$= \sum_{x_1,x_2,y} p_{X_1|X_2}(x_1|x_2)\, p_{\tilde X_2,\tilde Y}(x_2,y)\, 1(i(x_1;y|x_2) \ge c_2^\star)$$
$$= \sum_{x_2,y} p_{\tilde X_2,\tilde Y}(x_2,y) \sum_{x_1} p_{X_1|X_2,Y}(x_1|x_2,y)\, \frac{p_{X_1|X_2}(x_1|x_2)}{p_{X_1|X_2,Y}(x_1|x_2,y)}\, 1(i(x_1;y|x_2) \ge c_2^\star)$$
$$= \sum_{x_2,y} p_{\tilde X_2,\tilde Y}(x_2,y) \sum_{x_1} p_{X_1|X_2,Y}(x_1|x_2,y)\, \frac{p_{Y|X_2}(y|x_2)}{p_{Y|X_1,X_2}(y|x_1,x_2)}\, 1\left(\log\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_{Y|X_2}(y|x_2)} \ge c_2^\star\right)$$
$$\le \sum_{x_2,y} p_{\tilde X_2,\tilde Y}(x_2,y) \sum_{x_1} p_{X_1|X_2,Y}(x_1|x_2,y)\, \exp(-c_2^\star) = \exp(-c_2^\star).$$
Applying similar arguments to the other terms, we find
$$E[P_e] \le \Pr\big(\mathbf{i}(\tilde X_1, \tilde X_2; \tilde Y) \not\ge \mathbf{c}^\star\big) + M_1 M_2 K \exp(-c_1^\star) + M_1 \exp(-c_2^\star) + M_2 \exp(-c_3^\star). \quad (12)$$
Note that (12) may be viewed as a finite blocklength achievable result. While our primary goal is asymptotic second-order analysis, we proceed by analyzing this bound on the $n$-length product channel. Specifically, we now focus on the case where $(\mathcal{X}_1 \times \mathcal{X}_2, p_{Y|X_1,X_2}, \mathcal{Y})$ captures $n$ uses of a discrete, memoryless channel. We designate this special case by
$$(\mathcal{X}_1^n \times \mathcal{X}_2^n,\ (p_{Y|X_1,X_2})^n,\ \mathcal{Y}^n)$$
and add superscript $n$ to all coding functions as a reminder of the scenario in operation. Assume that each codeword entry is drawn i.i.d. from $p_{X_1}$ or $p_{X_2}$. Define the CF's score function as
$$s(x_1^n, x_2^n) = \sum_{i=1}^{n} \bar{i}(x_{1i}, x_{2i}).$$
If we choose
$$c_1^\star = \log(M_1 M_2 K) + \frac{1}{2}\log n, \qquad c_2^\star = \log M_1 + \frac{1}{2}\log n, \qquad c_3^\star = \log M_2 + \frac{1}{2}\log n,$$
then
$$E[P_e] \le \Pr\big(\mathbf{i}(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) \not\ge \mathbf{c}^\star\big) + \frac{3}{\sqrt{n}}$$
$$\le \Pr\left(i(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) < \log(M_1 M_2 K) + \frac{1}{2}\log n\right) + \Pr\left(i(\tilde X_1^n; \tilde Y^n | \tilde X_2^n) < \log M_1 + \frac{1}{2}\log n\right) + \Pr\left(i(\tilde X_2^n; \tilde Y^n | \tilde X_1^n) < \log M_2 + \frac{1}{2}\log n\right) + \frac{3}{\sqrt{n}}. \quad (13)$$
We begin by bounding the second and third terms of (13) before returning to bound the first. For the second term in (13), recall that $\tilde X_1^n, \tilde X_2^n$ are drawn from the distribution of $(X_1^n, X_2^n)(1,1)$ induced by the cooperation facilitator, so
$$\Pr\left(i(\tilde X_1^n; \tilde Y^n | \tilde X_2^n) < \log M_1 + \frac{1}{2}\log n\right) \quad (14)$$
$$\le K \Pr\left(i(X_1^n; Y^n | X_2^n) < \log M_1 + \frac{1}{2}\log n\right) \le K \exp\left\{-\frac{a}{n}\left(n I(X_1;Y|X_2) - \log M_1 - \frac{1}{2}\log n\right)^2\right\},$$
where the last inequality follows from Hoeffding's inequality and the assumption that $i(X_1;Y|X_2)$ is bounded, and where $a$ is a constant employed in this bound. By the assumption on $\log M_1$ in the statement of the theorem, this quantity is at most $1/\sqrt{n}$ for a suitable constant $c$. A similar bound can be applied to the third term in (13).

Now we consider the first term in (13). For fixed $x_1^n, x_2^n$,
$$E[i(x_1^n, x_2^n; Y^n)] = \sum_{i=1}^{n} \bar{i}(x_{1i}, x_{2i}) = s(x_1^n, x_2^n).$$
Thus we can apply the Berry–Esseen theorem to write
$$\Pr\left(i(x_1^n, x_2^n; Y^n) < c_1^\star \,\middle|\, (X_1^n, X_2^n) = (x_1^n, x_2^n)\right) \le \Pr\left(s(x_1^n, x_2^n) + \sqrt{\textstyle\sum_{i=1}^{n} V(p_{Y|X_1=x_{1i},X_2=x_{2i}} \| p_Y)}\ Z < c_1^\star\right) + \frac{B}{\sqrt{n}},$$
where, as in the statement of the theorem, $Z$ is a standard Gaussian random variable.

Assume $V(p_{Y|X_1=x_1,X_2=x_2} \| p_Y) \le V_{\max}$ for all $x_1, x_2$. Let
$$\gamma = V_{\max} \sqrt{\frac{2\ln K + \ln n}{n}}.$$
Note that
$$\gamma = O\left(\sqrt{\frac{\log K}{n}}\right) + O\left(\sqrt{\frac{\log n}{n}}\right).$$
By the assumption that $\log K = o(n)$, $\gamma = o(1)$. By Hoeffding's inequality, we may write
$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} V(p_{Y|X_1=X_{1i},X_2=X_{2i}} \| p_Y) - V_2\right| > \gamma\right) \le 2\exp\left\{-\frac{n\gamma^2}{2V_{\max}^2}\right\} = \frac{2}{K\sqrt{n}}.$$
Thus, by the union bound,
$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} V(p_{Y|X_1=f_{1i}(1,k),X_2=f_{2i}(1,k)} \| p_Y) - V_2\right| > \gamma \ \text{for any } k \in [K]\right) \le \frac{2}{\sqrt{n}}.$$
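The facilitator's rule $e(m_1,m_2) = \arg\max_{k} s(f_1(m_1,k), f_2(m_2,k))$ with the score $s(x_1^n, x_2^n) = \sum_i \bar{i}(x_{1i}, x_{2i})$ can be sketched directly. The $\bar{i}$ values below correspond to a noiseless binary adder MAC with uniform inputs, an assumed toy example; the codebooks are i.i.d. uniform, as in the random-coding argument.

```python
import random

random.seed(1)

# Illustrative \bar i table (noiseless binary adder MAC, uniform inputs;
# an assumption for this sketch), in bits.
n, K = 8, 4
i_bar = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def rand_word():
    return tuple(random.randint(0, 1) for _ in range(n))

# K candidate codewords for one fixed message at each encoder
f1 = [rand_word() for _ in range(K)]
f2 = [rand_word() for _ in range(K)]

def score(x1, x2):
    # s(x1^n, x2^n) = sum_i \bar i(x1_i, x2_i)
    return sum(i_bar[(a, b)] for a, b in zip(x1, x2))

def e(f1_row, f2_row):
    # the CF sees both messages, evaluates all K candidate pairs, and
    # signals the index with the largest codeword score
    return max(range(K), key=lambda k: score(f1_row[k], f2_row[k]))

k_star = e(f1, f2)
best = score(f1[k_star], f2[k_star])
assert all(score(f1[k], f2[k]) <= best for k in range(K))
print(k_star, best)
```

The key point, mirrored in the proof, is that the selected pair's score is the maximum of $K$ i.i.d. sums, which is what produces the $\max_{k \in [K]}$ term in (4).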
Thus
$$E[P_e] \le \Pr\left(s(\tilde X_1^n, \tilde X_2^n) + \sqrt{\textstyle\sum_{i=1}^{n} V(p_{Y|X_1=\tilde X_{1i},X_2=\tilde X_{2i}} \| p_Y)}\ Z < c_1^\star\right) + O\left(\frac{1}{\sqrt{n}}\right)$$
$$\le E\left[\max_{V' \in [V_2-\gamma,\, V_2+\gamma]} \Pr\left(s(\tilde X_1^n, \tilde X_2^n) + \sqrt{nV'}\, Z < c_1^\star \,\middle|\, \tilde X_1^n, \tilde X_2^n\right)\right] + O\left(\frac{1}{\sqrt{n}}\right)$$
$$= E\left[\max_{V' \in [V_2-\gamma,\, V_2+\gamma]} \Phi\left(\frac{c_1^\star - s(\tilde X_1^n, \tilde X_2^n)}{\sqrt{nV'}}\right)\right] + O\left(\frac{1}{\sqrt{n}}\right),$$
where $\Phi(\cdot)$ is the standard Gaussian CDF. Similarly, let $\phi(\cdot)$ be the standard Gaussian PDF. Given $x_1^n, x_2^n$, let
$$z = \frac{c_1^\star - s(x_1^n, x_2^n)}{\sqrt{n}}.$$
If $z \ge 0$, then we may bound
$$\max_{V' \in [V_2-\gamma,\, V_2+\gamma]} \Phi\left(\frac{z}{\sqrt{V'}}\right) = \Phi\left(\frac{z}{\sqrt{V_2-\gamma}}\right) = \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \int_{z/\sqrt{V_2}}^{z/\sqrt{V_2-\gamma}} \phi(y)\, dy$$
$$\le \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \left(\frac{1}{\sqrt{V_2-\gamma}} - \frac{1}{\sqrt{V_2}}\right) z\, \phi\left(\frac{z}{\sqrt{V_2}}\right) \le \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \left(\frac{1}{\sqrt{V_2-\gamma}} - \frac{1}{\sqrt{V_2}}\right)\sqrt{\frac{V_2}{2\pi e}}$$
$$= \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \left(\sqrt{\frac{V_2}{V_2-\gamma}} - 1\right)\frac{1}{\sqrt{2\pi e}}.$$
If $z < 0$, then we may bound
$$\max_{V' \in [V_2-\gamma,\, V_2+\gamma]} \Phi\left(\frac{z}{\sqrt{V'}}\right) = \Phi\left(\frac{z}{\sqrt{V_2+\gamma}}\right) = \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \int_{z/\sqrt{V_2}}^{z/\sqrt{V_2+\gamma}} \phi(y)\, dy$$
$$\le \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \left(\frac{1}{\sqrt{V_2}} - \frac{1}{\sqrt{V_2+\gamma}}\right) |z|\, \phi\left(\frac{z}{\sqrt{V_2+\gamma}}\right) \le \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \left(\frac{1}{\sqrt{V_2}} - \frac{1}{\sqrt{V_2+\gamma}}\right)\sqrt{\frac{V_2+\gamma}{2\pi e}}$$
$$= \Phi\left(\frac{z}{\sqrt{V_2}}\right) + \left(\sqrt{\frac{V_2+\gamma}{V_2}} - 1\right)\frac{1}{\sqrt{2\pi e}}.$$
Since $\gamma = o(1)$, combining the above bounds gives
$$\max_{V' \in [V_2-\gamma,\, V_2+\gamma]} \Phi\left(\frac{z}{\sqrt{V'}}\right) \le \Phi\left(\frac{z}{\sqrt{V_2}}\right) + O(\gamma).$$
Thus,
$$E[P_e] \le E\left[\Phi\left(\frac{c_1^\star - s(\tilde X_1^n, \tilde X_2^n)}{\sqrt{nV_2}}\right)\right] + O(\gamma) + O\left(\frac{1}{\sqrt{n}}\right)$$
$$= \Pr\left(s(\tilde X_1^n, \tilde X_2^n) + \sqrt{nV_2}\, Z < c_1^\star\right) + O(\gamma) + O\left(\frac{1}{\sqrt{n}}\right)$$
$$= \Pr\left(\max_{k \in [K]} \sum_{i=1}^{n} \bar{i}(X_{1i}(k), X_{2i}(k)) + \sqrt{nV_2}\, Z < c_1^\star\right) + O\left(\sqrt{\frac{\log K}{n}}\right) + O\left(\sqrt{\frac{\log n}{n}}\right).$$

V. PROOF OF THEOREM 2

Given $\epsilon$, our goal is to choose $M_1, M_2$ to satisfy the conditions of Theorem 1, while
$$\log(M_1 M_2) = n C_{\mathrm{sum}} + \sqrt{n}\, F^{-1}_{S_K}(\epsilon) - n\theta_n, \quad (15)$$
where $\theta_n$ satisfies one of (7)–(10) depending on $K$. Given $p_{X_1} p_{X_2} \in \mathcal{P}^\star$, let $r_1, r_2$ be rates where
$$r_1 + r_2 = I(X_1,X_2;Y) = C_{\mathrm{sum}}, \quad (16)$$
$$r_1 < I(X_1;Y|X_2), \quad (17)$$
$$r_2 < I(X_2;Y|X_1). \quad (18)$$
Let
$$\log M_j = n r_j + \frac{1}{2}\left[\sqrt{n}\, F^{-1}_{S_K}(\epsilon) - n\theta_n\right].$$
This choice clearly satisfies (15). By Lemma 1, $F^{-1}_{S_K}(\epsilon) = \sqrt{2 V_1 \ln K} + O(1)$, so for sufficiently large $n$, (5)–(6) are easily satisfied. It remains to prove (4). Let $p_e$ be the probability in (4). We divide the remainder of the proof into two cases.

Case 1: $K \le \log^{3/2} n$. We adopt the notation from the proof of Theorem 1, specifically
$$s(X_1^n(k), X_2^n(k)) = \sum_{i=1}^{n} \bar{i}(X_{1i}(k), X_{2i}(k)), \qquad c_1^\star = \log(M_1 M_2 K) + \frac{1}{2}\log n.$$
Thus
$$p_e \le \int_{-\infty}^{\infty} \phi(z) \Pr\left(\max_{k \in [K]} s(X_1^n(k), X_2^n(k)) < c_1^\star - \sqrt{nV_2}\, z\right) dz = \int_{-\infty}^{\infty} \phi(z) \Pr\left(s(X_1^n, X_2^n) < c_1^\star - \sqrt{nV_2}\, z\right)^K dz.$$
Note that $s(X_1^n, X_2^n)$ is an i.i.d. sum where each term has expectation $E[\bar{i}(X_1,X_2)] = I(X_1,X_2;Y) = C_{\mathrm{sum}}$ and variance $V_1$. Thus, by the Berry–Esseen theorem,
$$p_e \le \int_{-\infty}^{\infty} \phi(z) \left[\Pr\left(n C_{\mathrm{sum}} + \sqrt{n V_1}\, Z < c_1^\star - \sqrt{nV_2}\, z\right) + \frac{B}{\sqrt{n}}\right]^K dz,$$
where $Z \sim \mathcal{N}(0,1)$. For any $p \in [0,1]$ and any $0 \le q \le 1/K$, we can bound
$$(p+q)^K = \sum_{\ell=0}^{K} \binom{K}{\ell} p^{K-\ell} q^{\ell} \le p^K + \sum_{\ell=1}^{K} \binom{K}{\ell} q^{\ell} = p^K + (1+q)^K - 1 \le p^K + e^{qK} - 1 \le p^K + 2qK.$$
By the assumption that $K \le \log^{3/2} n$, for sufficiently large $n$, $\frac{B}{\sqrt{n}} \le \frac{1}{K}$.
Thus
$$p_e \le \int_{-\infty}^{\infty} \phi(z) \Pr\left(n C_{\mathrm{sum}} + \sqrt{n V_1}\, Z < c_1^\star - \sqrt{nV_2}\, z\right)^K dz + \frac{2BK}{\sqrt{n}}$$
$$= \Pr\left(n C_{\mathrm{sum}} + \sqrt{n}\, S_K < c_1^\star\right) + O\left(\frac{K}{\sqrt{n}}\right) = F_{S_K}\left(\frac{\log(M_1 M_2 K) + \frac{1}{2}\log n - n C_{\mathrm{sum}}}{\sqrt{n}}\right) + O\left(\frac{K}{\sqrt{n}}\right).$$
Recalling Theorem 1, we can achieve probability of error $\epsilon$ if
$$p_e + O\left(\sqrt{\frac{\log n}{n}}\right) + O\left(\sqrt{\frac{\log K}{n}}\right) \le \epsilon.$$
This condition is satisfied if
$$\log(M_1 M_2 K) + \frac{1}{2}\log n = n C_{\mathrm{sum}} + \sqrt{n}\, F^{-1}_{S_K}\left(\epsilon - \frac{c_1 K}{\sqrt{n}} - c_2\sqrt{\frac{\log n}{n}} - c_3\sqrt{\frac{\log K}{n}}\right)$$
for suitable constants $c_1, c_2, c_3$ and sufficiently large $n$. To simplify the second term, we need the following lemma, which is proved in Appendix B.

Lemma 3.
Fix $\epsilon \in (0,1)$ and $V_1, V_2 > 0$. Then
$$\sup_{K \ge 1} \frac{d}{dp} F^{-1}_{S_K}(p) \bigg|_{p=\epsilon} < \infty.$$
Applying Lemma 3, there exists a sequence of codes if
$$\frac{\log(M_1 M_2)}{n} \ge C_{\mathrm{sum}} + \frac{1}{\sqrt{n}} F^{-1}_{S_K}(\epsilon) - O\left(\frac{K}{n}\right) - O\left(\frac{\log n}{n}\right).$$
This achieves the ranges of $K$ given by (7)–(8).

Case 2: $K \ge \log^{3/2} n$ and $\log K = o(n^{1/3})$. For convenience, define
$$A = \frac{c_1^\star - \max_{k \in [K]} s(X_1^n(k), X_2^n(k))}{\sqrt{nV_2}}.$$
Thus,
$$p_e = \Pr(Z < A) \le \Pr(Z < A,\, |Z| < \sqrt{\ln n}) + \Pr(|Z| \ge \sqrt{\ln n}) \le \Pr(Z < A,\, |Z| < \sqrt{\ln n}) + O\left(\frac{1}{\sqrt{n}}\right)$$
$$= \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}} \phi(z) \Pr(z < A)\, dz + O\left(\frac{1}{\sqrt{n}}\right)$$
$$= \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}} \phi(z) \Pr\left(\max_{k \in [K]} s(X_1^n(k), X_2^n(k)) < c_1^\star - \sqrt{nV_2}\, z\right) dz + O\left(\frac{1}{\sqrt{n}}\right)$$
$$= \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}} \phi(z) \Pr\left(s(X_1^n, X_2^n) < c_1^\star - \sqrt{nV_2}\, z\right)^K dz + O\left(\frac{1}{\sqrt{n}}\right).$$
To continue, we need the moderate deviations bound given by the following lemma.
Lemma 4 (Moderate deviations [16]). Let $X_1, X_2, \ldots$ be i.i.d. random variables with zero mean and unit variance, and let $W = \sum_{i=1}^{n} X_i / \sqrt{n}$, where $c = E[e^{t|X_1|}] < \infty$ for some $t > 0$. There exist constants $a_0$ and $b$ depending only on $t$ and $c$ such that, for any $0 \le w \le a_0 n^{1/6}$,
$$\left|\frac{\Pr(W \ge w)}{Q(w)} - 1\right| \le \frac{b(1+w^3)}{\sqrt{n}},$$
where $Q(w) = 1 - \Phi(w)$ is the complementary CDF of the standard Gaussian distribution.

To apply the moderate deviations bound, we can write
$$\Pr\left(s(X_1^n, X_2^n) < c_1^\star - \sqrt{nV_2}\, z\right) = \Pr\left(\frac{s(X_1^n, X_2^n) - n C_{\mathrm{sum}}}{\sqrt{n V_1}} < w_z\right)$$
where
$$w_z = \frac{c_1^\star - \sqrt{nV_2}\, z - n C_{\mathrm{sum}}}{\sqrt{n V_1}}.$$
Since in our integral $|z| \le \sqrt{\ln n}$, in order to apply the moderate deviations bound, we need to prove that $|w_z| \le a_0 n^{1/6}$ as long as $|z| \le \sqrt{\ln n}$. We have
$$|w_z| \le \frac{|c_1^\star - n C_{\mathrm{sum}}|}{\sqrt{n V_1}} + \sqrt{\frac{V_2 \ln n}{V_1}}.$$
From the target for $M_1 M_2$ in (15),
$$c_1^\star = \log(M_1 M_2 K) + \frac{1}{2}\log n = n C_{\mathrm{sum}} + \sqrt{n}\, F^{-1}_{S_K}(\epsilon) - n\theta_n + \log K + \frac{1}{2}\log n$$
$$\le n C_{\mathrm{sum}} + \sqrt{2 V_1 n \ln K} + \log K + O(\log n) = n C_{\mathrm{sum}} + o(n^{2/3}).$$
By the assumption that $\log K = o(n^{1/3})$, $\sqrt{n \ln K} \gg \log K$, so
$$|w_z| = O(\sqrt{\log K}) + O(\sqrt{\log n}).$$
Thus $|w_z| = o(n^{1/6})$, so indeed we may apply the moderate deviations bound. Let
$$\lambda_n = \max_{|z| \le \sqrt{\ln n}} \frac{b}{\sqrt{n}}(1 + |w_z|^3) = O\left(\frac{\log^{3/2} K}{\sqrt{n}}\right) + O\left(\frac{\log^{3/2} n}{\sqrt{n}}\right).$$
Letting $Z \sim \mathcal{N}(0,1)$, we now have
$$p_e \le \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}} \phi(z)\left(1 - \Pr(Z > w_z)(1-\lambda_n)\right)^K dz + O\left(\frac{1}{\sqrt{n}}\right) \le \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}} \phi(z)\left(1 - Q(w_z)(1-\lambda_n)\right)^K dz + O\left(\frac{1}{\sqrt{n}}\right).$$
We now claim that for any $w \ge 0$ and any $0 \le \lambda \le 1/4$,
$$Q(w)(1-\lambda) \ge Q(w + 2\lambda).$$
Indeed, it is easy to see that
$$\frac{Q(w+2\lambda)}{Q(w)} \le \frac{Q(2\lambda)}{Q(0)} = 2Q(2\lambda) \le 1 - \lambda,$$
where the last inequality holds if $\lambda \le 1/4$. Note that $\lambda_n = o(1)$, so this inequality holds for sufficiently large $n$.
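The claim $Q(w)(1-\lambda) \ge Q(w+2\lambda)$ for $w \ge 0$ and $0 \le \lambda \le 1/4$ is easy to spot-check numerically. The sketch below is a sanity check on a grid, not a proof; the grid points are arbitrary choices.

```python
import math

def Q(w):
    # Gaussian complementary CDF Q(w) = 1 - Phi(w)
    return 0.5 * math.erfc(w / math.sqrt(2.0))

# Check the key step 2*Q(2*lambda) <= 1 - lambda for lambda <= 1/4,
# then the claim Q(w)(1 - lambda) >= Q(w + 2*lambda) itself.
for lam in [0.0, 0.05, 0.1, 0.2, 0.25]:
    assert 2 * Q(2 * lam) <= 1 - lam + 1e-12
    for w in [0.0, 0.5, 1.0, 2.0, 4.0]:
        assert Q(w) * (1 - lam) >= Q(w + 2 * lam) - 1e-12
print("claim verified on grid")
```

The monotone likelihood-ratio property of the Gaussian tail (so $Q(w+c)/Q(w)$ is maximized at $w = 0$) is what reduces the claim to the single inequality $2Q(2\lambda) \le 1-\lambda$, exactly as in the text.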
Thus,
\begin{align*}
p_e &\le E\left[ \left( 1 - Q(w_Z)(1 - \lambda_n) \right)^K \cdot 1\{ w_Z \ge 0 \} \right] + \Pr(w_Z < 0) + O\left( \frac{1}{\sqrt{n}} \right) \\
&\le E\left[ \left( 1 - Q(w_Z + 2\lambda_n) \right)^K \right] + Q\left( \frac{c^\star - nC_{\mathrm{sum}}}{\sqrt{nV_2}} \right) + O\left( \frac{1}{\sqrt{n}} \right).
\end{align*}
Note that
\begin{align*}
E\left[ \left( 1 - Q(w_Z + 2\lambda_n) \right)^K \right]
&= E\left[ \Pr\left( Z_1 < \frac{c^\star - \sqrt{nV_2}\, Z - nC_{\mathrm{sum}}}{\sqrt{nV_1}} + 2\lambda_n \,\middle|\, Z \right)^K \right] \\
&= \Pr\left( \max_{k \in [K]} Z_k < \frac{c^\star - \sqrt{nV_2}\, Z - nC_{\mathrm{sum}}}{\sqrt{nV_1}} + 2\lambda_n \right) \\
&= F_{S_K}\left( \frac{c^\star - nC_{\mathrm{sum}}}{\sqrt{n}} + 2\sqrt{V_1}\, \lambda_n \right).
\end{align*}
At this point, we make the choice of $M_1 M_2$ slightly more precise; in particular, let
\[ \log(M_1 M_2) = nC_{\mathrm{sum}} + \sqrt{n}\, F_{S_K}^{-1}\left( \epsilon - c_1 \sqrt{\frac{\log K}{n}} - c_2 \sqrt{\frac{\log n}{n}} - K^{-V_1/(2V_2)} \right) - 2\sqrt{nV_1}\, \lambda_n - \frac{1}{2}\log n - \log K \]
for suitable constants $c_1$ and $c_2$. From Lemma 1,
\[ \frac{c^\star - nC_{\mathrm{sum}}}{\sqrt{n}} \ge \sqrt{V_1 \ln K} - o(1). \]
Thus
\[ Q\left( \frac{c^\star - nC_{\mathrm{sum}}}{\sqrt{nV_2}} \right) \le \exp\left\{ -\frac{1}{2} \left( \sqrt{\frac{V_1 \ln K}{V_2}} - o(1) \right)^2 \right\} \le K^{-V_1/(2V_2)}, \]
where the last inequality holds for sufficiently large $n$. From Theorem 1, there exists a code with probability of error at most
\begin{align*}
p_e + O\left( \sqrt{\frac{\log K}{n}} \right) + O\left( \sqrt{\frac{\log n}{n}} \right)
&\le \epsilon - c_1 \sqrt{\frac{\log K}{n}} - c_2 \sqrt{\frac{\log n}{n}} + O\left( \frac{1}{\sqrt{n}} \right) + O\left( \sqrt{\frac{\log K}{n}} \right) + O\left( \sqrt{\frac{\log n}{n}} \right) \\
&\le \epsilon
\end{align*}
assuming $c_1, c_2$ are chosen properly. This proves that we can achieve the sum-rate
\begin{align*}
\frac{\log(M_1 M_2)}{n} &\ge C_{\mathrm{sum}} + \frac{1}{\sqrt{n}} F_{S_K}^{-1}\left( \epsilon - c_1 \sqrt{\frac{\log K}{n}} - c_2 \sqrt{\frac{\log n}{n}} - K^{-V_1/(2V_2)} \right) - \frac{2\sqrt{V_1}\, \lambda_n}{\sqrt{n}} - \frac{\log n}{2n} - \frac{\log K}{n} \\
&\ge C_{\mathrm{sum}} + \frac{1}{\sqrt{n}} F_{S_K}^{-1}(\epsilon) - O\left( \frac{\log^{3/2} K}{n} \right) - O\left( \frac{\log^{3/2} n}{n} \right),
\end{align*}
where in the last inequality we have used Lemma 3 as well as the bound on $\lambda_n$. This achieves the ranges of $K$ given by (9)–(10).

VI. PROOF OF THEOREM 3

A distribution $p_X$ is an $n$-length type on alphabet $\mathcal{X}$ if $p_X(x)$ is a multiple of $1/n$ for each $x \in \mathcal{X}$. For an $n$-length type $p_X$, the type class is denoted $T(p_X)$. Let $p_{X_1,X_2}$ be an $n$-length joint type on alphabet $\mathcal{X}_1 \times \mathcal{X}_2$. Note that the marginal distributions $p_{X_1}$ and $p_{X_2}$ are also $n$-length types. We employ the following random code construction. Draw codewords uniformly from the type classes $T(p_{X_1})$ and $T(p_{X_2})$. Given message pair $(m_1, m_2)$, the cooperation facilitator chooses uniformly from the set of $k \in [K]$ where $(f_1(m_1, k), f_2(m_2, k)) \in T(p_{X_1,X_2})$. If there is no such $k$, the CF chooses $k$ uniformly at random. These random choices at the CF are taken to be part of the random code design. For the purposes of this proof, the three information densities employ the joint distribution $p_{X_1,X_2}$. The quantity $V$ is also defined as in (3) using the information density for this joint distribution.
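In the construction just described, the CF finds no usable index only if all $K$ draws miss the joint type class. If a single draw lands in $T(p_{X_1,X_2})$ with probability $q$, the miss probability is $(1-q)^K \le e^{-Kq}$, which is the exponential bound applied later in the proof. A minimal numeric sketch (the values of $q$ and $K$ below are illustrative only):

```python
import math

# If one draw hits the joint type class with probability q, the chance that
# all K independent attempts miss it is (1 - q)^K, and (1 - q)^K <= exp(-K q).
for q in (1e-6, 1e-3, 0.1, 0.5, 0.9):
    for K in (1, 10, 1000):
        miss = (1.0 - q) ** K
        assert miss <= math.exp(-K * q) + 1e-15
print("ok")
```

The bound follows from $1 - q \le e^{-q}$ applied termwise, so it holds for every $q \in [0,1]$ and every $K$.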
The decoder is as follows. Given $y^n$, choose the unique message pair $(m_1, m_2)$ such that
1) $i((X_1^n, X_2^n)(m_1, m_2); y^n) \ge c^\star$,
2) $(X_1^n, X_2^n)(m_1, m_2) \in T(p_{X_1,X_2})$,
for a constant vector $c^\star = [c_{12}^\star, c_1^\star, c_2^\star]^T$ to be determined. If there is no message pair, or more than one, satisfying these conditions, declare an error.

Note that, given $(X_1^n, X_2^n)(m_1, m_2) \in T(p_{X_1,X_2})$, $(X_1^n, X_2^n)(m_1, m_2)$ is uniformly distributed on $T(p_{X_1,X_2})$. Let $q(x_1^n, x_2^n)$ be the uniform distribution on the type class $T(p_{X_1,X_2})$, with corresponding conditional distributions $q(x_1^n | x_2^n)$ and $q(x_2^n | x_1^n)$. Define random variables $\tilde{X}_1^n, \tilde{X}_2^n, \tilde{Y}^n$ to have distribution
\[ p_{\tilde{X}_1^n, \tilde{X}_2^n, \tilde{Y}^n}(x_1^n, x_2^n, y^n) = q(x_1^n, x_2^n)\, p_{Y^n | X_1^n, X_2^n}(y^n | x_1^n, x_2^n). \]
Furthermore, define $\bar{Y}_1^n, \bar{Y}_2^n, \bar{Y}_{12}^n$ where
\[ p_{\bar{Y}_1^n, \bar{Y}_2^n, \bar{Y}_{12}^n | \tilde{X}_1^n, \tilde{X}_2^n, \tilde{Y}^n}(y_1^n, y_2^n, y_{12}^n | x_1^n, x_2^n, y^n) = p_{Y^n | X_1^n}(y_1^n | x_1^n)\, p_{Y^n | X_2^n}(y_2^n | x_2^n)\, p_{Y^n}(y_{12}^n). \]
Now we may bound the expected error probability by
\begin{align}
E[P_e] &\le \Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right) + \Pr\left( i((X_1^n, X_2^n)(1,1); Y^n) \not\ge c^\star \right) \nonumber \\
&\quad + \sum_{(\hat m_1, \hat m_2) \ne (1,1)} \Pr\left( (X_1^n, X_2^n)(\hat m_1, \hat m_2) \in T(p_{X_1,X_2}),\ i((X_1^n, X_2^n)(\hat m_1, \hat m_2); Y^n) \ge c^\star \,\middle|\, (X_1^n, X_2^n)(1,1) \in T(p_{X_1,X_2}) \right) \tag{19} \\
&\le \Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right) + \Pr\left( i((X_1^n, X_2^n)(1,1); Y^n) \not\ge c^\star \right) \nonumber \\
&\quad + \sum_{(\hat m_1, \hat m_2) \ne (1,1)} \Pr\left( i((X_1^n, X_2^n)(\hat m_1, \hat m_2); Y^n) \ge c^\star \,\middle|\, (X_1^n, X_2^n)(1,1) \in T(p_{X_1,X_2}),\ (X_1^n, X_2^n)(\hat m_1, \hat m_2) \in T(p_{X_1,X_2}) \right). \tag{20}
\end{align}
In the summation in (20), consider a term where $\hat m_1 \ne 1$ and $\hat m_2 \ne 1$.
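The type-class bounds used shortly can be sanity-checked on a small joint type. This sketch (a $2 \times 2$ alphabet with $n = 12$ and equal counts, values chosen purely for illustration) verifies $|T(p_{X_1,X_2})| \ge (n+1)^{-|\mathcal{X}_1| \cdot |\mathcal{X}_2|}\, 2^{nH(X_1,X_2)}$, which is equivalent to the bound on $q(x_1^n, x_2^n)$ used in the proof:

```python
import math

n = 12
counts = [3, 3, 3, 3]  # empirical counts of the four (x1, x2) symbol pairs

# Exact type-class size: the multinomial coefficient n! / (3! 3! 3! 3!).
size = math.factorial(n)
for c in counts:
    size //= math.factorial(c)

# Entropy H(X1, X2) in bits of the empirical joint distribution.
H = -sum((c / n) * math.log2(c / n) for c in counts)

# Bound used in the proof: |T(p)| >= (n+1)^(-|X1||X2|) * 2^(n H).
lower = (n + 1) ** (-4) * 2.0 ** (n * H)
print(size, size >= lower)
```

Here the exact count is far above the polynomial-corrected entropy bound, as the method-of-types estimate predicts.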
In this case, $(X_1^n, X_2^n)(\hat m_1, \hat m_2)$ is independent of $Y^n$, so we may write
\[ \left( (X_1^n, X_2^n)(\hat m_1, \hat m_2),\ Y^n \right) \stackrel{d}{=} \left( \tilde X_1^n, \tilde X_2^n, \bar Y_{12}^n \right), \]
where $\bar Y_{12}^n$ has the same distribution as $\tilde Y^n$ but is independent of $\tilde X_1^n, \tilde X_2^n$; i.e., $p_{\bar Y_{12}^n | \tilde X_1^n, \tilde X_2^n}(y^n | x_1^n, x_2^n) = p_{Y^n}(y^n)$.

Now consider a term in (20) where $\hat m_1 = 1$ but $\hat m_2 \ne 1$. In this case, whether the transmitted signal from user 1 with message pair $(1, \hat m_2)$ is the same as that with message pair $(1, 1)$ depends on whether $e(1, \hat m_2) = e(1, 1)$. Thus, the term in (20) is no more than
\begin{align*}
&\Pr\left( i((X_1^n, X_2^n)(1, \hat m_2); Y^n) \ge c^\star \,\middle|\, (X_1^n, X_2^n)(1,1) \in T(p_{X_1,X_2}),\ (X_1^n, X_2^n)(1, \hat m_2) \in T(p_{X_1,X_2}),\ e(1, \hat m_2) = e(1,1) \right) \\
&+ \Pr\left( i((X_1^n, X_2^n)(1, \hat m_2); Y^n) \ge c^\star \,\middle|\, (X_1^n, X_2^n)(1,1) \in T(p_{X_1,X_2}),\ (X_1^n, X_2^n)(1, \hat m_2) \in T(p_{X_1,X_2}),\ e(1, \hat m_2) \ne e(1,1) \right).
\end{align*}
In the first term, $Y^n$ is the channel output where $X_1^n(1, \hat m_2)$ is one of the channel inputs, but the channel input for user 2 is unrelated. However, by the condition that $(X_1^n, X_2^n)(1, \hat m_2) \in T(p_{X_1,X_2})$, these two codewords are distributed according to $q(x_1^n, x_2^n)$. Thus we may write
\[ \left( (X_1^n, X_2^n)(1, \hat m_2),\ Y^n \right) \stackrel{d}{=} \left( \tilde X_1^n, \tilde X_2^n, \bar Y_1^n \right), \]
where $p_{\bar Y_1^n | \tilde X_1^n, \tilde X_2^n}(y^n | x_1^n, x_2^n) = p_{Y^n | X_1^n}(y^n | x_1^n)$. In the second term, the transmitted signals are unrelated, and so the three sequences once again have the same distribution as $(\tilde X_1^n, \tilde X_2^n, \bar Y_{12}^n)$. We may apply a similar analysis for the case where $\hat m_1 \ne 1$ and $\hat m_2 = 1$, defining $\bar Y_2^n$ by $p_{\bar Y_2^n | \tilde X_1^n, \tilde X_2^n}(y^n | x_1^n, x_2^n) = p_{Y^n | X_2^n}(y^n | x_2^n)$.
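The change-of-measure step applied next, namely that when the output is drawn from $p_{Y^n}$ independently of the inputs, $\Pr(i \ge c)$ decays exponentially in $c$ (with base $2$ when the density is in bits), can be checked single-letter on a toy channel. The channel and thresholds below are illustrative, not from the paper:

```python
import math

# Toy MAC: X1, X2 uniform bits, Y = X1 XOR X2 flipped with probability 0.1.
p_x = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

def p_y_given_x(y, a, b):
    return 0.9 if y == a ^ b else 0.1

p_y = {y: sum(p_x[x] * p_y_given_x(y, *x) for x in p_x) for y in (0, 1)}

def i_dens(a, b, y):
    # Information density i(x1, x2; y) in bits.
    return math.log2(p_y_given_x(y, a, b) / p_y[y])

# With Y drawn from p_Y independently of (X1, X2): Pr(i >= c) <= 2^(-c).
for c in (-1.0, 0.0, 0.3, 0.5, 1.0):
    pr = sum(p_x[x] * p_y[y] for x in p_x for y in (0, 1)
             if i_dens(*x, y) >= c)
    assert pr <= 2.0 ** (-c) + 1e-12
print("ok")
```

The inequality is the usual one: on the event $\{i \ge c\}$ one has $p_Y(y) \le 2^{-c} p_{Y|X_1,X_2}(y|x_1,x_2)$, and summing the right side over all tuples gives $2^{-c}$.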
Therefore
\begin{align*}
E[P_e] &\le \Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right) + \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) \not\ge c^\star \right) \\
&\quad + M_1 M_2 \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \bar Y_{12}^n) \ge c_{12}^\star \right) + M_1 \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \bar Y_2^n) \ge c_1^\star \right) + M_2 \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \bar Y_1^n) \ge c_2^\star \right) \\
&\le \Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right) + \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) \not\ge c^\star \right) \\
&\quad + M_1 M_2 \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \bar Y_{12}^n) \ge c_{12}^\star \right) + M_1 \Pr\left( i(\tilde X_1^n; \bar Y_2^n | \tilde X_2^n) \ge c_1^\star \right) + M_2 \Pr\left( i(\tilde X_2^n; \bar Y_1^n | \tilde X_1^n) \ge c_2^\star \right).
\end{align*}
For any $(x_1^n, x_2^n) \in T(p_{X_1,X_2})$,
\begin{align*}
q(x_1^n, x_2^n) &= \frac{1}{|T(p_{X_1,X_2})|} \le (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|}\, 2^{-nH(X_1,X_2)} \\
&= (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \prod_{x_1, x_2} p_{X_1,X_2}(x_1, x_2)^{n p_{X_1,X_2}(x_1, x_2)} = (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \prod_{i=1}^n p_{X_1,X_2}(x_{1i}, x_{2i}).
\end{align*}
Thus, for any $x_1^n, x_2^n$, including those not in $T(p_{X_1,X_2})$,
\[ q(x_1^n, x_2^n) \le (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \prod_{i=1}^n p_{X_1,X_2}(x_{1i}, x_{2i}). \]
By similar calculations,
\[ q(x_1^n | x_2^n) \le (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \prod_{i=1}^n p_{X_1|X_2}(x_{1i} | x_{2i}), \qquad q(x_2^n | x_1^n) \le (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \prod_{i=1}^n p_{X_2|X_1}(x_{2i} | x_{1i}). \]
We bound
$\Pr(i(\tilde X_1^n, \tilde X_2^n; \bar Y_{12}^n) \ge c_{12}^\star)$ as
\begin{align*}
\Pr\left( i(\tilde X_1^n, \tilde X_2^n; \bar Y_{12}^n) \ge c_{12}^\star \right)
&= \sum_{x_1^n, x_2^n, y^n} q(x_1^n, x_2^n)\, p_{Y^n}(y^n)\, 1\left( i(x_1^n, x_2^n; y^n) \ge c_{12}^\star \right) \\
&\le (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \sum_{x_1^n, x_2^n, y^n} \prod_{i=1}^n p_{X_1,X_2}(x_{1i}, x_{2i})\, p_{Y^n}(y^n)\, 1\left( \sum_{i=1}^n i(x_{1i}, x_{2i}; y_i) \ge c_{12}^\star \right) \\
&\le (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \exp\{ -c_{12}^\star \} \sum_{x_1^n, x_2^n, y^n} \prod_{i=1}^n p_{X_1,X_2|Y}(x_{1i}, x_{2i} | y_i)\, p_{Y^n}(y^n) \\
&= (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \exp\{ -c_{12}^\star \}.
\end{align*}
Using similar bounds on the other terms, we have
\[ E[P_e] \le \Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right) + \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) \not\ge c^\star \right) + (n+1)^{|\mathcal{X}_1| \cdot |\mathcal{X}_2|} \left( M_1 M_2 e^{-c_{12}^\star} + M_1 e^{-c_1^\star} + M_2 e^{-c_2^\star} \right). \]
Next, choose
\begin{align*}
c_{12}^\star &= \log(M_1 M_2) + \tfrac{1}{2}\log n + |\mathcal{X}_1| \cdot |\mathcal{X}_2| \log(n+1), \\
c_1^\star &= \log M_1 + \tfrac{1}{2}\log n + |\mathcal{X}_1| \cdot |\mathcal{X}_2| \log(n+1), \\
c_2^\star &= \log M_2 + \tfrac{1}{2}\log n + |\mathcal{X}_1| \cdot |\mathcal{X}_2| \log(n+1).
\end{align*}
Then
\begin{align}
E[P_e] &\le \Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right) + \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) \not\ge c^\star \right) + \frac{3}{\sqrt{n}} \nonumber \\
&\le \Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right) + \Pr\left( i(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) < c_{12}^\star \right) \nonumber \\
&\quad + \Pr\left( i(\tilde X_1^n; \tilde Y^n | \tilde X_2^n) < c_1^\star \right) + \Pr\left( i(\tilde X_2^n; \tilde Y^n | \tilde X_1^n) < c_2^\star \right) + \frac{3}{\sqrt{n}}. \tag{21}
\end{align}
As in the proof of Thm. 2, let $(r_1, r_2)$ be a pair of rates satisfying (16)–(18). We now choose
\[ \log M_j = n r_j - \frac{1}{2}\left( \sqrt{nV}\, Q^{-1}(\epsilon) + n\theta_n \right), \qquad j = 1, 2, \]
where $\theta_n$ is an error term, chosen below to satisfy $\theta_n \le O(\frac{\log n}{n})$. Thus
\[ \log(M_1 M_2) = nI(X_1, X_2; Y) - \sqrt{nV}\, Q^{-1}(\epsilon) - n\theta_n. \]
Consider the first term in (21). Note that $(X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2})$ only if $(f_1(1, k), f_2(1, k)) \notin T(p_{X_1,X_2})$ for all $k \in [K]$. This occurs with probability bounded as
\begin{align}
\Pr\left( (X_1^n, X_2^n)(1,1) \notin T(p_{X_1,X_2}) \right)
&= \left( 1 - \frac{|T(p_{X_1,X_2})|}{|T(p_{X_1})| \cdot |T(p_{X_2})|} \right)^K \nonumber \\
&\le \left( 1 - (n+1)^{-|\mathcal{X}_1| \cdot |\mathcal{X}_2|}\, 2^{-nI(X_1;X_2)} \right)^K \nonumber \\
&\le \exp\left\{ -K (n+1)^{-|\mathcal{X}_1| \cdot |\mathcal{X}_2|}\, 2^{-nI(X_1;X_2)} \right\} \le \frac{1}{\sqrt{n}}, \nonumber
\end{align}
where the last inequality holds if
\[ I(X_1; X_2) \le \frac{1}{n} \left( \log K - |\mathcal{X}_1| \cdot |\mathcal{X}_2| \log(n+1) - \log\left( \frac{1}{2} \ln n \right) \right). \tag{22} \]
Now consider the second term in (21). For any $(x_1^n, x_2^n) \in T(p_{X_1,X_2})$,
\[ \sum_{i=1}^n E[i(x_{1i}, x_{2i}; Y_i)] = nI(X_1, X_2; Y), \]
\[ \sum_{i=1}^n \mathrm{Var}[i(x_{1i}, x_{2i}; Y_i)] = n \sum_{x_1, x_2} p_{X_1,X_2}(x_1, x_2)\, V\left( p_{Y|X_1 = x_1, X_2 = x_2} \,\|\, p_Y \right) = nV. \]
By the Berry–Esseen inequality,
\begin{align*}
\Pr\left( i(\tilde X_1^n, \tilde X_2^n; \tilde Y^n) < c_{12}^\star \right)
&\le \max_{(x_1^n, x_2^n) \in T(p_{X_1,X_2})} \Pr\left( i(x_1^n, x_2^n; Y^n) < c_{12}^\star \,\middle|\, x_1^n, x_2^n \right) \\
&\le Q\left( \frac{nI(X_1, X_2; Y) - c_{12}^\star}{\sqrt{nV}} \right) + O\left( \frac{1}{\sqrt{n}} \right).
\end{align*}
As in the proof of Thm. 2 (near (14)), we use Hoeffding's inequality to bound the third and fourth terms in (21) from above by $1/\sqrt{n}$. Putting together all the above bounds, for any $p_{X_1,X_2}$ satisfying (22), we find
\begin{align*}
E[P_e] &\le Q\left( \frac{nI(X_1, X_2; Y) - c_{12}^\star}{\sqrt{nV}} \right) + O\left( \frac{1}{\sqrt{n}} \right) \\
&= Q\left( \frac{nI(X_1, X_2; Y) - \log(M_1 M_2) - O(\log n)}{\sqrt{nV}} \right) + O\left( \frac{1}{\sqrt{n}} \right) \\
&= Q\left( Q^{-1}(\epsilon) + \sqrt{\frac{n}{V}}\, \theta_n - O\left( \frac{\log n}{\sqrt{n}} \right) \right) + O\left( \frac{1}{\sqrt{n}} \right).
\end{align*}
There exists a choice for $\theta_n = O(\frac{\log n}{n})$ where this bound is no greater than $\epsilon$. This proves that we can achieve the sum-rate
\[ \frac{\log(M_1 M_2)}{n} \ge I(X_1, X_2; Y) - \sqrt{\frac{V}{n}}\, Q^{-1}(\epsilon) - O\left( \frac{\log n}{n} \right) \]
for any $p_{X_1,X_2}$ satisfying (22).

APPENDIX A
PROOF OF LEMMA 2

Throughout this appendix, $x \approx y$ means that $x - y \to 0$ as $a \to 0$. For small $a$, $I(X_1; X_2) \le a$ implies that $p_{X_1,X_2} \approx p_{X_1} p_{X_2}$. Thus, the second-order Taylor approximation for the mutual information gives
\[ I(X_1; X_2) \approx \frac{1}{2 \ln 2} \sum_{x_1, x_2} \frac{\left( p_{X_1,X_2}(x_1, x_2) - p_{X_1}(x_1) p_{X_2}(x_2) \right)^2}{p_{X_1}(x_1) p_{X_2}(x_2)}. \]
Moreover, the first-order Taylor approximation of the mutual information $I(X_1, X_2; Y)$ is
\[ \sum_{x_1, x_2, y} p_{X_1,X_2}(x_1, x_2)\, p_{Y|X_1,X_2}(y|x_1, x_2) \log \frac{p_{Y|X_1,X_2}(y|x_1, x_2)}{p_Y(y)}, \]
where
\[ p_Y(y) = \sum_{x_1, x_2} p_{X_1}(x_1) p_{X_2}(x_2)\, p_{Y|X_1,X_2}(y|x_1, x_2). \]
As usual, let
\[ i(x_1, x_2; y) = \log \frac{p_{Y|X_1,X_2}(y|x_1, x_2)}{p_Y(y)}, \qquad \bar{i}(x_1, x_2) = \sum_y p_{Y|X_1,X_2}(y|x_1, x_2)\, i(x_1, x_2; y). \]
Also let
\[ \bar{I}(X_1, X_2; Y) = \sum_{x_1, x_2} p_{X_1}(x_1) p_{X_2}(x_2)\, \bar{i}(x_1, x_2) \]
be the mutual information where $X_1$ and $X_2$ are independent. We can now rewrite the optimization problem for $\Delta(a)$ in terms of the marginal distributions $p_{X_1}$, $p_{X_2}$, and
\[ r(x_1, x_2) = p_{X_1,X_2}(x_1, x_2) - p_{X_1}(x_1) p_{X_2}(x_2). \]
Note that
\[ I(X_1, X_2; Y) - C_{\mathrm{sum}} \approx \sum_{x_1, x_2} r(x_1, x_2)\, \bar{i}(x_1, x_2) + \bar{I}(X_1, X_2; Y) - C_{\mathrm{sum}}. \]
In particular, if we consider maximizing over only $r$, the optimization problem is
\begin{align}
\text{maximize} \quad & \sum_{x_1, x_2} r(x_1, x_2)\, \bar{i}(x_1, x_2) \nonumber \\
\text{subject to} \quad & \sum_{x_1, x_2} \frac{r(x_1, x_2)^2}{p_{X_1}(x_1) p_{X_2}(x_2)} \le a, \nonumber \\
& \sum_{x_1} r(x_1, x_2) = 0 \ \text{for all } x_2 \in \mathcal{X}_2, \nonumber \\
& \sum_{x_2} r(x_1, x_2) = 0 \ \text{for all } x_1 \in \mathcal{X}_1. \tag{23}
\end{align}
The Lagrangian for this problem is
\[ \sum_{x_1, x_2} r(x_1, x_2)\, \bar{i}(x_1, x_2) - \lambda \left( \sum_{x_1, x_2} \frac{r(x_1, x_2)^2}{p_{X_1}(x_1) p_{X_2}(x_2)} - a \right) + \sum_{x_2} \nu_2(x_2) \sum_{x_1} r(x_1, x_2) + \sum_{x_1} \nu_1(x_1) \sum_{x_2} r(x_1, x_2). \]
Differentiating with respect to $r(x_1, x_2)$ and setting to zero, we find that the optimal $r(x_1, x_2)$ is of the form
\[ r(x_1, x_2) = \frac{p_{X_1}(x_1) p_{X_2}(x_2)}{2\lambda} \left( \bar{i}(x_1, x_2) + \nu_1(x_1) + \nu_2(x_2) \right). \]
We first find the values of the dual variables $\nu_1$ and $\nu_2$. For any $x_1$, we need
\[ 0 = \sum_{x_2} r(x_1, x_2) = \frac{p_{X_1}(x_1)}{2\lambda} \left( E[\bar{i}(x_1, X_2)] + \nu_1(x_1) + E[\nu_2(X_2)] \right), \]
where the expectations are with respect to $(X_1, X_2) \sim p_{X_1} p_{X_2}$. Combining this constraint with the equivalent one for $\nu_2$, we must have
\[ \nu_1(x_1) = -E[\bar{i}(x_1, X_2)] - E[\nu_2(X_2)], \qquad \nu_2(x_2) = -E[\bar{i}(X_1, x_2)] - E[\nu_1(X_1)]. \]
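These dual-variable equations lead, in the next step, to the doubly centered density $j(x_1, x_2) = \bar i(x_1, x_2) - E[\bar i(x_1, X_2)] - E[\bar i(X_1, x_2)] + E[\bar i(X_1, X_2)]$. By construction its row and column means vanish under the product distribution $p_{X_1} p_{X_2}$, which is exactly the marginal constraint in (23). A quick check with arbitrary, purely illustrative numbers:

```python
import random

random.seed(0)
n1, n2 = 3, 4
# Arbitrary marginals and an arbitrary "density" table ibar(x1, x2).
p1 = [random.random() for _ in range(n1)]; s = sum(p1); p1 = [v / s for v in p1]
p2 = [random.random() for _ in range(n2)]; s = sum(p2); p2 = [v / s for v in p2]
ibar = [[random.random() for _ in range(n2)] for _ in range(n1)]

# Doubly centered version: j = ibar - row mean - column mean + grand mean,
# with all means taken under the product distribution p1 x p2.
row = [sum(p2[b] * ibar[a][b] for b in range(n2)) for a in range(n1)]
col = [sum(p1[a] * ibar[a][b] for a in range(n1)) for b in range(n2)]
grand = sum(p1[a] * row[a] for a in range(n1))
j = [[ibar[a][b] - row[a] - col[b] + grand for b in range(n2)] for a in range(n1)]

# r(x1, x2) proportional to p1(x1) p2(x2) j(x1, x2) then has zero marginals.
for a in range(n1):
    assert abs(sum(p2[b] * j[a][b] for b in range(n2))) < 1e-12
for b in range(n2):
    assert abs(sum(p1[a] * j[a][b] for a in range(n1))) < 1e-12
print("ok")
```

This is why the closed-form solution below automatically satisfies both families of equality constraints, leaving only the quadratic constraint to fix $\lambda$.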
Taking the expectation of either constraint gives
\[ E[\nu_1(X_1)] + E[\nu_2(X_2)] = -E[\bar{i}(X_1, X_2)]. \]
Thus
\[ \nu_1(x_1) + \nu_2(x_2) = -E[\bar{i}(x_1, X_2)] - E[\bar{i}(X_1, x_2)] + E[\bar{i}(X_1, X_2)], \]
and so
\[ r(x_1, x_2) = \frac{1}{2\lambda}\, p_{X_1}(x_1) p_{X_2}(x_2)\, j(x_1, x_2), \]
where
\[ j(x_1, x_2) = \bar{i}(x_1, x_2) - E[\bar{i}(x_1, X_2)] - E[\bar{i}(X_1, x_2)] + E[\bar{i}(X_1, X_2)]. \]
To find $\lambda$, we use the constraint
\[ a = \sum_{x_1, x_2} \frac{r(x_1, x_2)^2}{p_{X_1}(x_1) p_{X_2}(x_2)} = \frac{1}{(2\lambda)^2} E[j(X_1, X_2)^2], \]
so
\[ \lambda = \frac{1}{2} \sqrt{\frac{E[j(X_1, X_2)^2]}{a}}. \]
We may now derive the optimal objective value for the optimization problem in (23), which is
\[ \sum_{x_1, x_2} r(x_1, x_2)\, \bar{i}(x_1, x_2) = \sqrt{\frac{a}{E[j(X_1, X_2)^2]}}\, E[j(X_1, X_2)\, \bar{i}(X_1, X_2)]. \]
Now considering the optimization over $p_{X_1}, p_{X_2}$, we may write
\[ \Delta(a) \approx \max_{p_{X_1} p_{X_2}} \sqrt{\frac{a}{E[j(X_1, X_2)^2]}}\, E[j(X_1, X_2)\, \bar{i}(X_1, X_2)] + \bar{I}(X_1, X_2; Y) - C_{\mathrm{sum}}. \]
Note that for small $a$, the RHS will be negative unless $p_{X_1}, p_{X_2}$ are such that $\bar{I}(X_1, X_2; Y) = C_{\mathrm{sum}}$ (i.e., they are sum-capacity achieving). By the optimality conditions for the maximization defining the sum-capacity, this implies that
\[ E[\bar{i}(x_1, X_2)] = C_{\mathrm{sum}} \ \text{for all } x_1 \text{ where } p_{X_1}(x_1) > 0, \qquad E[\bar{i}(X_1, x_2)] = C_{\mathrm{sum}} \ \text{for all } x_2 \text{ where } p_{X_2}(x_2) > 0. \]
Thus, for $x_1, x_2$ where $p_{X_1}(x_1) p_{X_2}(x_2) > 0$, we have
\[ j(x_1, x_2) = \bar{i}(x_1, x_2) - C_{\mathrm{sum}}. \]
Thus $E[j(X_1, X_2)^2] = \mathrm{Var}(\bar{i}(X_1, X_2))$, and
\[ E[j(X_1, X_2)\, \bar{i}(X_1, X_2)] = E[(\bar{i}(X_1, X_2) - C_{\mathrm{sum}})\, \bar{i}(X_1, X_2)] = E[\bar{i}(X_1, X_2)^2] - C_{\mathrm{sum}}^2 = \mathrm{Var}(\bar{i}(X_1, X_2)). \]
Therefore
\[ \Delta(a) \approx \max_{p_{X_1} p_{X_2}:\ I(X_1, X_2; Y) = C_{\mathrm{sum}}} \sqrt{a\, \mathrm{Var}(\bar{i}(X_1, X_2))} = \sigma \sqrt{a}. \]

APPENDIX B
PROOF OF LEMMA 3

Lemma 5. Fix $\epsilon \in (0, 1)$. Let $Y$ and $Z$ be independent random variables where
\[ f_Y(y) \ge c_1 \ \text{for all } y \in \left[ F_Y^{-1}(\epsilon/4),\ F_Y^{-1}\left( \tfrac{3+\epsilon}{4} \right) \right], \]
\[ d \ge f_Z(z) \ge c_2 \ \text{for all } z \in \left[ F_Z^{-1}(\epsilon/4),\ F_Z^{-1}\left( \tfrac{3+\epsilon}{4} \right) \right]. \]
Then for $X = Y + Z$,
\[ f_X\left( F_X^{-1}(\epsilon) \right) \ge \min\left\{ \frac{c_1}{2},\ \frac{c_2}{2},\ \frac{c_1 c_2 \min\{\epsilon, 1-\epsilon\}}{4d} \right\}. \]
Proof:
Let $x_0 = F_X^{-1}(\epsilon)$. Note that
\[ F_X(y + z) = \Pr(Y + Z \le y + z) \le \Pr(Y \le y \text{ or } Z \le z) \le F_Y(y) + F_Z(z). \]
In particular,
\[ F_X\left( F_Y^{-1}(\epsilon/2) + F_Z^{-1}(\epsilon/2) \right) \le \epsilon, \]
so $x_0 \ge F_Y^{-1}(\epsilon/2) + F_Z^{-1}(\epsilon/2)$. By similar reasoning (using $F_X(y+z) \ge \Pr(Y \le y, Z \le z) \ge F_Y(y) + F_Z(z) - 1$),
\[ x_0 \le F_Y^{-1}\left( \frac{1+\epsilon}{2} \right) + F_Z^{-1}\left( \frac{1+\epsilon}{2} \right). \]
Define
\[ y_1 = F_Y^{-1}(\epsilon/4), \quad y_2 = F_Y^{-1}\left( \frac{3+\epsilon}{4} \right), \quad z_1 = F_Z^{-1}(\epsilon/4), \quad z_2 = F_Z^{-1}\left( \frac{3+\epsilon}{4} \right). \]
Consider several cases. First, suppose
\[ y_2 + z_1 \le x_0 \le y_1 + z_2. \tag{24} \]
Then
\[ f_X(x_0) = \int_{-\infty}^{\infty} f_Y(x_0 - z) f_Z(z)\, dz \ge \int_{z_1}^{z_2} c_2\, f_Y(x_0 - z)\, dz = c_2 \Pr(x_0 - z_2 < Y < x_0 - z_1) \ge c_2 \Pr(y_1 < Y < y_2) = c_2 \left( \frac{3+\epsilon}{4} - \frac{\epsilon}{4} \right) \ge \frac{c_2}{2}. \]
Similarly, if
\[ y_1 + z_2 \le x_0 \le y_2 + z_1, \tag{25} \]
then $f_X(x_0) \ge c_1/2$. Now consider the case that neither (24) nor (25) holds. We have
\[ f_X(x_0) \ge \int_{\max\{y_1,\, x_0 - z_2\}}^{\min\{y_2,\, x_0 - z_1\}} c_1 c_2\, dy = c_1 c_2 \left[ \min\{y_2,\, x_0 - z_1\} - \max\{y_1,\, x_0 - z_2\} \right]. \]
By the assumption that (24) and (25) do not hold, we have
\[ f_X(x_0) \ge c_1 c_2 \min\{ y_2 + z_2 - x_0,\ x_0 - y_1 - z_1 \}. \]
Note that
\[ y_2 + z_2 - x_0 \ge y_2 + z_2 - F_Y^{-1}\left( \frac{1+\epsilon}{2} \right) - F_Z^{-1}\left( \frac{1+\epsilon}{2} \right) \ge F_Z^{-1}\left( \frac{3+\epsilon}{4} \right) - F_Z^{-1}\left( \frac{1+\epsilon}{2} \right) \ge \frac{1-\epsilon}{4d}. \]
Moreover,
\[ x_0 - y_1 - z_1 \ge F_Y^{-1}(\epsilon/2) + F_Z^{-1}(\epsilon/2) - y_1 - z_1 \ge F_Z^{-1}(\epsilon/2) - F_Z^{-1}(\epsilon/4) \ge \frac{\epsilon}{4d}. \]
Thus in this case,
\[ f_X(x_0) \ge \frac{c_1 c_2 \min\{\epsilon,\, 1-\epsilon\}}{4d}. \]

We now complete the proof of Lemma 3. Recall that $S_K = \sqrt{V_1}\, Z_{(K)} + \sqrt{V_2}\, Z$, where $Z_{(K)} = \max_{k \in [K]} Z_k$. Let $x_0 = F_{S_K}^{-1}(\epsilon)$. Note that $\frac{d}{ds} F_{S_K}(s) = f_{S_K}(s)$, and so
\[ \left. \frac{d}{dp} F_{S_K}^{-1}(p) \right|_{p = \epsilon} = \frac{1}{f_{S_K}(x_0)}. \]
Thus it is sufficient to show that $f_{S_K}(x_0)$ is bounded away from zero for all $K$. Since $Z \sim \mathcal{N}(0, 1)$,
\[ F_Z^{-1}(\epsilon/4) \ge -\sqrt{2 \ln(4/\epsilon)}, \qquad F_Z^{-1}\left( \frac{3+\epsilon}{4} \right) \le \sqrt{2 \ln(4/(1-\epsilon))}. \]
Thus, for $z \in [F_Z^{-1}(\epsilon/4),\ F_Z^{-1}(\frac{3+\epsilon}{4})]$,
\[ f_Z(z) = \phi(z) \ge \min\left\{ \phi\left( \sqrt{2 \ln(4/\epsilon)} \right),\ \phi\left( \sqrt{2 \ln(4/(1-\epsilon))} \right) \right\} = \frac{\min\{\epsilon,\, 1-\epsilon\}}{4\sqrt{2\pi}}. \]
Moreover, for all $z$, $f_Z(z) \le \frac{1}{\sqrt{2\pi}}$.
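The density of the maximum $Z_{(K)}$ used in the next step is the derivative of the CDF $\Phi(y)^K$, namely $f_{Z_{(K)}}(y) = K\,\Phi(y)^{K-1}\phi(y)$. A finite-difference check (the value of $K$ and the grid are illustrative):

```python
import math

def Phi(y):
    # Standard normal CDF.
    return 0.5 * math.erfc(-y / math.sqrt(2.0))

def phi(y):
    # Standard normal density.
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

K = 8
# f(y) = K * Phi(y)^(K-1) * phi(y) should match the numerical derivative
# of the CDF of the maximum, Phi(y)^K.
for y in (-1.0, 0.0, 0.5, 1.5, 3.0):
    h = 1e-6
    fd = (Phi(y + h) ** K - Phi(y - h) ** K) / (2 * h)  # central difference
    f = K * Phi(y) ** (K - 1) * phi(y)
    assert abs(fd - f) < 1e-6
print("ok")
```

The same identity underlies the substitution $\Phi(y) = p^{1/K}$ in the lower bound below.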
Now we prove a lower bound on $f_{Z_{(K)}}(y)$. Specifically, let $y = F_{Z_{(K)}}^{-1}(p)$ for $p \in [\epsilon/4,\ \frac{3+\epsilon}{4}]$. Note that $p = F_{Z_{(K)}}(y) = \Phi(y)^K$, so $\Phi(y) = p^{1/K}$. We have
\[ f_{Z_{(K)}}(y) = K\, \Phi(y)^{K-1} \phi(y) = K p^{1 - 1/K} \phi(y). \]
Suppose $p^{1/K} < 1/2$. Thus $y < 0$. Also, since $K \ge 1$, $p^{1/K} \ge p \ge \epsilon/4$, so we have
\[ \frac{\epsilon}{4} \le p^{1/K} = \Phi(y) = Q(-y) \le e^{-y^2/2}. \]
Thus $0 > y \ge -\sqrt{2 \ln(4/\epsilon)}$, and so
\[ f_{Z_{(K)}}(y) \ge K p^{1-1/K} \cdot \frac{\epsilon}{4\sqrt{2\pi}} \ge K p \cdot \frac{\epsilon}{4\sqrt{2\pi}} \ge \frac{\epsilon^2}{16\sqrt{2\pi}}. \]
Now suppose $p^{1/K} \ge 1/2$, so $y \ge 0$. We have
\[ p^{1/K} = \Phi(y) = 1 - Q(y) \ge 1 - e^{-y^2/2}, \]
and so $y \le \sqrt{2 \ln\left( \frac{1}{1 - p^{1/K}} \right)}$. Thus
\[ f_{Z_{(K)}}(y) \ge K p^{1-1/K}\, \phi\left( \sqrt{2 \ln\left( \frac{1}{1 - p^{1/K}} \right)} \right) = \frac{K p^{1-1/K} (1 - p^{1/K})}{\sqrt{2\pi}} = \frac{K p\, (p^{-1/K} - 1)}{\sqrt{2\pi}}. \]
Since
\[ p^{-1/K} - 1 = \exp\left\{ \frac{1}{K} \ln \frac{1}{p} \right\} - 1 \ge \frac{1}{K} \ln \frac{1}{p}, \]
we conclude that
\[ f_{Z_{(K)}}(y) \ge \frac{p \ln(1/p)}{\sqrt{2\pi}}. \]
This proves that there exists a $c > 0$ such that $f_{Z_{(K)}}(y) \ge c$ for all $y$ in the range of interest. Since $f_Z$ is upper- and lower-bounded as shown above, we may apply Lemma 5 to complete the proof.

REFERENCES

[1] F. Willems, "The discrete memoryless multiple access channel with partially cooperating encoders,"
IEEE Transactions on Information Theory, vol. 29, no. 3, pp. 441–445, 1983.
[2] F. Willems and E. Van der Meulen, "The discrete memoryless multiple-access channel with cribbing encoders," IEEE Transactions on Information Theory, vol. 31, no. 3, pp. 313–327, 1985.
[3] P. Noorzad, M. Effros, M. Langberg, and T. Ho, "On the power of cooperation: Can a little help a lot?" in IEEE International Symposium on Information Theory, 2014, pp. 3132–3136.
[4] P. Noorzad, M. Effros, and M. Langberg, "The unbounded benefit of encoder cooperation for the k-user MAC," IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3655–3678, 2018.
[5] M. Langberg and M. Effros, "On the capacity advantage of a single bit," in . IEEE, 2016, pp. 1–6.
[6] P. Noorzad, M. Effros, and M. Langberg, "Can negligible cooperation increase capacity? The average-error case," in Proceedings of IEEE International Symposium on Information Theory (ISIT), 2018, pp. 1256–1260.
[7] P. Noorzad, M. Langberg, and M. Effros, "Negligible cooperation: Contrasting the maximal- and average-error cases," Manuscript. Available at https://arxiv.org/pdf/1911.10449.pdf, 2019.
[8] J. Hartigan et al., "Bounding the maximum of dependent random variables," Electronic Journal of Statistics, vol. 8, no. 2, pp. 3126–3140, 2014.
[9] C. Borell, "The Brunn–Minkowski inequality in Gauss space," Inventiones Mathematicae, vol. 30, no. 2, pp. 207–216, 1975.
[10] B. Tsirelson, I. Ibragimov, and V. Sudakov, "Norms of Gaussian sample functions," Proceedings of the Third Japan–USSR Symposium on Probability Theory, vol. 550, pp. 20–41, 1976.
[11] Y.-W. Huang and P. Moulin, "Finite blocklength coding for multiple access channels," in . IEEE, 2012, pp. 831–835.
[12] E. M. Jazi and J. N. Laneman, "Simpler achievable rate regions for multiaccess with finite blocklength," in . IEEE, 2012, pp. 36–40.
[13] V. Y. Tan and O. Kosut, "On the dispersions of three network information theory problems," IEEE Transactions on Information Theory, vol. 60, no. 2, pp. 881–903, 2014.
[14] J. Scarlett, A. Martinez, and A. G. i Fàbregas, "Second-order rate region of constant-composition codes for the multiple-access channel," IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 157–172, 2015.
[15] R. C. Yavas, V. Kostina, and M. Effros, "Random access channel coding in the finite blocklength regime," IEEE Transactions on Information Theory, 2020.
[16] L. H. Y. Chen, X. Fang, and Q.-M. Shao, "From Stein identities to moderate deviations,"