A Joint Typicality Approach to Algebraic Network Information Theory
Sung Hoon Lim, Chen Feng, Adriano Pastore, Bobak Nazer, Michael Gastpar
Abstract
This paper presents a joint typicality framework for encoding and decoding nested linear codes for multi-user networks. This framework provides a new perspective on compute–forward within the context of discrete memoryless networks. In particular, it establishes an achievable rate region for computing the weighted sum of nested linear codewords over a discrete memoryless multiple-access channel (MAC). When specialized to the Gaussian MAC, this rate region recovers and improves upon the lattice-based compute–forward rate region of Nazer and Gastpar, thus providing a unified approach for discrete memoryless and Gaussian networks. Furthermore, this framework can be used to shed light on the joint decoding rate region for compute–forward, which is considered an open problem. Specifically, this work establishes an achievable rate region for simultaneously decoding two linear combinations of nested linear codewords from K senders.

Index Terms
Linear codes, joint decoding, compute–forward, multiple-access channel, relay networks
This paper was presented in part at the 2014 IEEE Information Theory Workshop, Hobart, Australia and the 2015 Allerton Conference on Communication, Control, and Computing, Monticello, IL.
Sung Hoon Lim, Adriano Pastore, and Michael Gastpar are with the School of Computer and Communication Sciences, École Polytechnique Fédérale, 1015 Lausanne, Switzerland (e-mail: sung.lim@epfl.ch, adriano.pastore@epfl.ch, michael.gastpar@epfl.ch).
Chen Feng is with the School of Engineering, The University of British Columbia, Kelowna, BC, Canada (e-mail: [email protected]).
Bobak Nazer is with the Department of Electrical and Computer Engineering, Boston University, Boston, MA (e-mail: [email protected]).
July 1, 2016 DRAFT
I. INTRODUCTION
In network information theory, random i.i.d. ensembles serve as the foundation for the vast majority of coding theorems and analytical tools. As elegantly demonstrated by the textbook of El Gamal and Kim [1], the core results of this theory can be unified via a few powerful packing and covering lemmas. However, starting from the many–help–one source coding example of Körner and Marton [2], it has been well known that there are coding theorems that seem to require random linear ensembles, as opposed to random i.i.d. ensembles. Recent efforts have demonstrated that linear and lattice codes can yield new achievable rates for relay networks [3]–[9], interference channels [10]–[16], distributed source coding [17]–[21], dirty-paper multiple-access channels [22]–[25], and physical-layer secrecy [26]–[28]. See [29] for a survey of lattice-based techniques for Gaussian networks.

Although there is now a wealth of examples that showcase the potential gains of random linear ensembles, it remains unclear if these examples can be captured as part of a general framework, i.e., an algebraic network information theory, that is on par with the well-established framework for random i.i.d. ensembles. The recent work of Padakandla and Pradhan [16], [25], [30] has taken important steps towards such a theory, by developing joint typicality encoding and decoding techniques for nested linear code ensembles. In this paper, we take further steps in this direction by developing coding techniques and error bounds for nested linear code ensembles. For instance, we provide a packing lemma for analyzing the performance of linear codes under simultaneous joint typicality decoding (in Sections VI and VIII) and a Markov lemma for linear codes (in Appendix F).

We will use the compute–forward problem as a case study for our approach. As originally stated in [5], the objective in this problem is to reliably decode one or more linear combinations of the messages over a Gaussian multiple-access channel (MAC). Within the context of a relay network, compute–forward allows relays to recover linear combinations of interfering codewords and send them towards a destination, which can then solve the resulting linear equations for the desired messages. Recent work has also shown that compute–forward is useful in the context of interference alignment. For instance, Ordentlich et al. [13] approximated the sum capacity of the symmetric Gaussian interference channel via compute–forward. The achievable scheme from [5] relies on nested lattice encoding combined with “single-user” lattice decoding, i.e., each desired
linear combination is recovered independently of the others. Subsequent efforts [13], [31], [32] developed a variation of successive cancellation for decoding multiple linear combinations.

In this paper, we generalize compute–forward beyond the Gaussian setting and develop single-letter achievable rate regions using joint typicality decoding. Within our framework, each encoder maps its message into a vector space over a field and the decoder attempts to recover a linear combination of these vectors. In particular, Theorem 1 establishes a rate region for recovering a finite-field linear combination over a MAC. This includes, as special cases, the problem of recovering a finite-field linear combination over a discrete memoryless (DM) MAC and a Gaussian MAC. In Theorem 2, we develop a rate region for recovering an integer-linear combination of bounded, integer-valued vectors. Finally, in Theorem 3, we use a quantization argument to obtain a rate region for recovering an integer-linear combination of real-valued vectors.

As mentioned above, the best-known rate regions for lattice-based compute–forward rely on successive cancellation decoding. One might expect that simultaneous decoding yields a larger rate region for recovering two or more linear combinations. However, for a random lattice codebook, a direct analysis of simultaneous decoding is challenging, due to the statistical dependencies induced by the shared linear structure [33]. We are able to surmount this difficulty by carefully partitioning error events directly over the finite field from which the codebook is drawn. Overall, we obtain a rate region for simultaneously recovering two linear combinations in Theorem 4.

Our results recover and improve upon the rate regions of [5], [32], [34], thus providing a unified approach to compute–forward over both DM and Gaussian networks.
Additionally, the single-letter rate region implicitly captures recent work [35, Example 3] that has shown that Gaussian input distributions are not necessarily optimal for Gaussian networks. One appealing feature of our approach is that the first-order performance analysis uses steps that closely resemble those used for random i.i.d. ensembles. However, there are several technical subtleties that arise due to linearity, which require careful treatment in our error probability bounds.

For a random linear codebook, each codeword is i.i.d. uniformly distributed over the underlying finite field. This poses a challenge for generating non-uniform channel input distributions, and it is well known that a direct application of a linear codebook cannot attain the point-to-point capacity in general [36]. See Figure 1 for an illustration. To get around this issue, we will use the
Fig. 1. An illustration of the typicality of random i.i.d. (red) and random linear (blue) codewords. Due to the weak law of large numbers, most random i.i.d. codewords are typical for large n. In contrast, since random linear codewords are uniformly distributed, exponentially many codewords will be atypical with respect to non-uniform distributions. We resolve this issue via multicoding, i.e., we generate exponentially more linear codewords than needed and use an auxiliary index to select the typical ones.

nested linear coding architecture which first appeared in [37], [38]. This encoding architecture consists of the following components:
1) an auxiliary linear code (shared by all encoders),
2) a joint typicality encoder for multicoding,
3) a symbol-by-symbol function of the auxiliary linear codeword.
Roughly speaking, the auxiliary linear code is designed at a higher rate than the target achievable rate, the joint typicality encoding is used to select codewords of the desired type, and the function is used to map the codeword symbols from the finite field to the channel input alphabet. The idea of using a joint typicality encoder for channel coding appears in the celebrated coding scheme by Gelfand and Pinsker [39] for channels with state, Marton's coding scheme for the broadcast channel [40], and the hybrid coding scheme [41] for joint source–channel coding. In contrast to these applications, our joint typicality encoding step is used to find an auxiliary codeword that is itself typical with respect to a desired distribution, instead of with respect to a state or source sequence. The use of a symbol-by-symbol function is reminiscent of the Shannon strategy [42] for channels with states.

The shared linear codebook creates subtle issues for the analysis of joint typicality encoding and decoding. Specifically, the users' choices of typical codewords depend upon the codebook, and thus the codewords are not independent across users.
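The phenomenon in Figure 1, and the multicoding remedy, can be illustrated with a small simulation. This is a toy sketch with parameters of our own choosing: random linear codewords over F_2 are uniformly distributed, so only an exponentially small fraction is typical for a biased input pmf, yet a modest auxiliary rate margin still guarantees that typical codewords exist.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 40          # blocklength
kappa = 16      # auxiliary message length; auxiliary rate kappa/n = 0.4
p = 0.25        # target Bernoulli(p) input distribution
eps = 0.3       # typicality tolerance

G = rng.integers(0, 2, size=(kappa, n))                         # random generator matrix
msgs = (np.arange(2**kappa)[:, None] >> np.arange(kappa)) & 1   # all of F_2^kappa
codebook = (msgs @ G) % 2                                       # 2^kappa linear codewords

ones_frac = codebook.sum(axis=1) / n          # empirical type pi(1 | u^n)
# For binary alphabets with p <= 1/2, the condition for symbol 1 implies the
# condition for symbol 0, so checking symbol 1 suffices.
typical = np.abs(ones_frac - p) <= eps * p

frac = typical.mean()
print(f"fraction of typical codewords: {frac:.2e}")
print("multicoding finds a typical codeword:", bool(typical.any()))
```

With these numbers the typical fraction is on the order of 2 percent, so a purely linear codebook at rate kappa/n would mostly produce atypical inputs, but among the 2^16 candidates many typical codewords survive for the auxiliary index to select.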
For this scenario, the standard Markov lemma (see, for instance, [1, Lemma 12.1]) does not directly apply. To overcome this issue, prior work by Padakandla and Pradhan proposed a Markov lemma for nested linear codes that required both a lower and an upper bound on the auxiliary rates [25]. In Appendix F, we follow a different proof strategy, which enables us to remove the upper bound.

Furthermore, for a random linear codebook, the codewords are only pairwise independent. While this suffices to apply a standard packing lemma [1, Section 3.2] for decoding a single codeword, it creates obstacles for decoding multiple codewords. In particular, one has to contend with the fact that competing codewords may be linearly dependent on the true codewords. To cope with these linear dependencies, we develop a packing lemma for nested linear codes, which serves as a foundation for the achievable rate regions described above.

We closely follow the notation in [1]. Let X denote the alphabet and x^n a length-n sequence whose elements belong to X (which can be either discrete or a subset of R). We use uppercase letters to denote random variables. For instance, X is a random variable that takes values in X. We follow standard notation for probability measures. Specifically, we denote the probability of an event A by P{A} and use P_X(x), p_X(x), f_X(x), and F_X(x) to denote a probability distribution (i.e., measure), probability mass function (pmf), probability density function (pdf), and cumulative distribution function (cdf), respectively.

For finite and discrete X, the type of x^n is defined to be π(x | x^n) := |{i : x_i = x}|/n for x ∈ X. Let X be a discrete random variable over X with probability mass function p_X(x). For any parameter ε ∈ (0, 1), we define the set of ε-typical n-sequences x^n (or the typical set in short) [43] as

T_ε^{(n)}(X) = { x^n : |π(x | x^n) − p_X(x)| ≤ ε p_X(x) for all x ∈ X }.
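The type and ε-typicality notions above translate directly into code; a minimal sketch (our own helper names):

```python
from collections import Counter

def type_pmf(xn):
    """Empirical type pi(x | x^n) of a sequence."""
    n = len(xn)
    counts = Counter(xn)
    return {x: counts[x] / n for x in counts}

def is_typical(xn, p, eps):
    """Check x^n in T_eps^(n)(X): |pi(x|x^n) - p(x)| <= eps * p(x) for all x."""
    pi = type_pmf(xn)
    support = set(p) | set(pi)
    return all(abs(pi.get(x, 0.0) - p.get(x, 0.0)) <= eps * p.get(x, 0.0)
               for x in support)

p = {0: 0.75, 1: 0.25}
print(is_typical([0] * 75 + [1] * 25, p, eps=0.1))   # exact type: True
print(is_typical([0] * 50 + [1] * 50, p, eps=0.1))   # off-type: False
```

Note that the check ranges over the union of supports: a symbol with p(x) = 0 that appears in x^n violates typicality, matching the "for all x ∈ X" quantifier in the definition.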
We use δ(ε) > 0 to denote a generic function of ε > 0 that tends to zero as ε → 0. One notable departure is that we define sets of message indices starting at zero rather than one, [n] := {0, . . . , n − 1}.

We use the notation F, R, and F_q to denote a field, the real numbers, and the finite field of order q, respectively. We denote deterministic row vectors with lowercase, boldface font (e.g., a ∈ F_q^K). Note that a deterministic row vector can also be written as a sequence (e.g., u^n ∈ F_q^n). We will denote random sequences using uppercase font (e.g., U^n ∈ F_q^n) and will not require explicit notation for random vectors. Random matrices will be denoted with uppercase, boldface font (e.g., G ∈ F_q^{n×κ}) and we will use uppercase, sans-serif font to denote realizations of random matrices (e.g., G ∈ F_q^{n×κ}) or deterministic matrices.

II. PROBLEM STATEMENT
We now give a formal problem statement for compute–forward. Although the primary results of this paper focus on recovering one or two linear combinations, we state the general case of recovering K linear combinations so that we can clearly state open questions.

Consider the K-user memoryless multiple-access channel (MAC)

(X_1 × · · · × X_K, P_{Y|X_1,...,X_K}(y | x_1, . . . , x_K), Y),

which consists of K sender alphabets X_k, k ∈ [1 : K], one receiver alphabet Y, and a collection of conditional probability distributions P_{Y|X_1,...,X_K}(y | x_1, . . . , x_K). Since the channel is memoryless, we have that

P_{Y^n|X_1^n,...,X_K^n}(y^n | x_1^n, . . . , x_K^n) = ∏_{i=1}^n P_{Y|X_1,...,X_K}(y_i | x_{1i}, . . . , x_{Ki}).

In our considerations, the input alphabets X_k and receiver alphabet Y are either finite or the real line. Note that discrete memoryless (DM) MACs and Gaussian MACs are special cases of this class of channels.

Fig. 2. Block diagram of the compute–forward problem. Each transmitter has a message M_k drawn independently and uniformly from [2^{nR_k}] that is bijectively mapped to a representative sequence U_k^n(M_k) over a vector space F^n, and then into a channel input X_k^n(M_k) ∈ X_k^n. The K channel inputs pass through a memoryless MAC described by conditional probability distribution P_{Y|X_1,...,X_K}, resulting in channel output Y^n. Finally, the decoder computes estimates Ŵ_{a_1}^n, . . . , Ŵ_{a_K}^n of the linear combinations W_{a_ℓ}^n(M_1, . . . , M_K) = Σ_k a_{ℓ,k} U_k^n(M_k).

Consider a field F (not necessarily finite) and let A ⊂ F be a discrete subset of F. Let a_1, . . . , a_K ∈ A^K denote the coefficient vectors, and let

A = [a_1; . . . ; a_K] ∈ A^{K×K}    (1)

denote the coefficient matrix. A (2^{nR_1}, . . .
, 2^{nR_K}, n, (F, A), A) code for compute–forward consists of

• K message sets [2^{nR_k}], k ∈ [1 : K],
• K encoders, where encoder k maps each message m_k ∈ [2^{nR_k}] to a pair of sequences (u_k^n, x_k^n)(m_k) ∈ F^n × X_k^n such that u_k^n(m_k) is bijective,
• K linear combinations for each message tuple (m_1, . . . , m_K),

[w_{a_1}^n(m_1, . . . , m_K); . . . ; w_{a_K}^n(m_1, . . . , m_K)] = A [u_1^n(m_1); . . . ; u_K^n(m_K)],

where the linear combinations are defined over the vector space F^n, and
• a decoder that assigns estimates (ŵ_{a_1}^n, . . . , ŵ_{a_K}^n) ∈ F^n × · · · × F^n to each received sequence y^n ∈ Y^n.

Each message M_k is independently and uniformly drawn from [2^{nR_k}]. The average probability of error is defined as

P_e^{(n)} = P{ (Ŵ_{a_1}^n, . . . , Ŵ_{a_K}^n) ≠ (W_{a_1}^n, . . . , W_{a_K}^n) }.

We say that a rate tuple (R_1, . . . , R_K) is achievable for recovering the linear combinations with coefficient matrix A if there exists a sequence of (2^{nR_1}, . . . , 2^{nR_K}, n, (F, A), A) codes such that lim_{n→∞} P_e^{(n)} = 0.

The role of the mappings u_k^n(m_k) is to embed the messages into the vector space F^n, so that it is possible to take linear combinations. The restriction to bijective mappings ensures that it is possible to solve the linear combinations and recover the original messages (subject to appropriate rank conditions).

The goal is for the receiver to recover the linear combinations

w_{a_ℓ}^n(m_1, . . . , m_K) = Σ_{k=1}^K a_{ℓ,k} u_k^n(m_k), ℓ ∈ [1 : K],    (2)

where a_{ℓ,k} is the (ℓ, k)th entry of A and the multiplication and summation operations are over F. The matrix A can be of any rank; for example, setting a_2 = · · · = a_K = 0 and a_1 = a corresponds to the case where the receiver only wants a single linear combination w_a^n(m_1, . . . , m_K).
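As a minimal illustration of the linear combinations in (2) and of the rank condition, consider toy parameters of our own choosing (K = 2, q = 5, n = 4): when A is full rank over F_q, the receiver can invert A and recover the message representatives from the combinations.

```python
import numpy as np

q = 5                                     # field size (prime, so inverses exist)
u1 = np.array([1, 4, 0, 2])               # message representatives u_k^n in F_q^n
u2 = np.array([3, 3, 1, 0])

A = np.array([[1, 2],                     # coefficient matrix with rows a_1, a_2
              [0, 1]])

# Linear combinations w_{a_l}^n = sum_k a_{l,k} u_k^n, computed over F_q.
w1 = (A[0, 0] * u1 + A[0, 1] * u2) % q
w2 = (A[1, 0] * u1 + A[1, 1] * u2) % q

# If A is full rank over F_q, solving the combinations recovers u1 and u2.
det = (A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]) % q
det_inv = pow(int(det), -1, q)            # inverse of det in F_q (Python 3.8+)
Ainv = (det_inv * np.array([[A[1, 1], -A[0, 1]],
                            [-A[1, 0], A[0, 0]]])) % q

u1_hat = (Ainv[0, 0] * w1 + Ainv[0, 1] * w2) % q
u2_hat = (Ainv[1, 0] * w1 + Ainv[1, 1] * w2) % q
print(np.array_equal(u1_hat, u1), np.array_equal(u2_hat, u2))  # True True
```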
One natural example is to take the field to be the reals, F = R, and the set of possible coefficients to be the integers, A = Z. This corresponds to the Gaussian compute–forward problem statement from [5], where the receiver's goal is to recover integer-linear combinations of the real-valued codewords. Another example is to set A = F = F_q, i.e., linear combinations are taken over the finite field of order q. This will be the starting point for our coding schemes.

Remark 1.
We could also attempt to define compute–forward formally for any choice of deterministic functions of the messages; see [6] for an example. However, all known compute–forward schemes have focused on the special case of linear functions. Moreover, certain applications, such as interference alignment, take explicit advantage of the connection to linear algebra. Therefore, we find it more intuitive to directly frame the problem in terms of linear combinations.

III. MAIN RESULTS
We now state our achievability theorems and work out several examples. For the sake of clarity and simplicity, we begin with the special case of K = 2 transmitters and a receiver that only wants a single linear combination. Theorem 1 describes an achievable rate region for finite-field linear combinations, Theorem 2 provides a rate region for recovering integer-linear combinations of integer-valued random variables, and Theorem 3 establishes a rate region for recovering integer-linear combinations of real-valued random variables. Afterwards, in Theorem 4, we provide a rate region for recovering two finite-field linear combinations of K codewords, and Theorem 5 argues that, if K = 2, this corresponds to a multiple-access strategy.

A. Computing One Linear Combination Over a Two-User MAC
In this subsection, we consider the special case of a receiver that wants a single linear combination of K = 2 transmitters' codewords. Specifically, we set a_2 = 0 and, for notational simplicity, denote a_1 by a = [a_1, a_2].

In order to state our main result, we need to define two rate regions. See Figure 3 for an illustration. The first region can be interpreted as the rates available for directly recovering the linear combination w_a^n(m_1, m_2) from the received sequence Y^n via “single-user” decoding,

R_CF(a) := { (R_1, R_2) : R_1 < I_CF,1(a), R_2 < I_CF,2(a) },    (3)

where I_CF,1(a) and I_CF,2(a) will be specified in the following theorems.
Fig. 3. Illustration of the rate regions from Theorems 1, 2, and 3 for the special case when the coefficient vector a is chosen to (simultaneously) maximize I_CF,1(a) and I_CF,2(a) and we assume that I_CF,1(a) + I_CF,2(a) ≥ I(X_1, X_2; Y). In the top left, we have the rate region R_CF(a) for directly recovering a linear combination via “single-user” decoding. In the bottom left, we have the rate region R_LMAC for multiple-access with a shared linear codebook. The rate region from Theorems 1, 2, and 3 is the union R_CF(a) ∪ R_LMAC of these two regions and is shown on the right.
The second rate region can be interpreted as the rates available for recovering both messages individually via multiple-access with a shared nested linear codebook:

R_LMAC := R_LMAC,1 ∪ R_LMAC,2,    (4a)

R_LMAC,1 := { (R_1, R_2) :
  R_1 < max_{b ∈ A^2 \ {0}} min{ I_CF,1(b), I(X_1, X_2; Y) − I_CF,2(b) },
  R_2 < I(X_2; Y | X_1),
  R_1 + R_2 < I(X_1, X_2; Y) },    (4b)

R_LMAC,2 := { (R_1, R_2) :
  R_1 < I(X_1; Y | X_2),
  R_2 < max_{b ∈ A^2 \ {0}} min{ I_CF,2(b), I(X_1, X_2; Y) − I_CF,1(b) },
  R_1 + R_2 < I(X_1, X_2; Y) }.    (4c)

Notice that R_LMAC does not correspond, in general, to the classical multiple-access rate region. We are ready to state our main theorems. Note that all of our theorems apply to both discrete and continuous input and output alphabets X_k and Y, and are distinguished from one another by the alphabet of the auxiliary random variables U_k.

The theorem below gives an achievable rate region for recovering a single linear combination over F_q.

Theorem 1 (Finite-Field Compute–Forward). Set (F, A) = (F_q, F_q) and let a ∈ F_q^2 be the desired coefficient vector. A rate pair (R_1, R_2) is achievable if it is included in R_CF(a) ∪ R_LMAC for some input pmf p_{U_1}(u_1) p_{U_2}(u_2) and symbol mappings x_1(u_1) and x_2(u_2), where U_k ⊆ F_q,

I_CF,1(a) = H(U_1) − H(W_a | Y),    (5a)
I_CF,2(a) = H(U_2) − H(W_a | Y),    (5b)

and

W_a = a_1 U_1 ⊕ a_2 U_2,    (6)

where the addition and multiplication operations in (6) are over F_q.

Remark 2.
We have omitted the use of time-sharing random variables for the sake of simplicity. We note that the achievability results in this paper can be extended to include a time-sharing random variable following the standard coded time-sharing method [1, Sec. 4.5.3].
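Theorem 1's quantities can be evaluated in closed form for simple channels. As a toy instance (this particular channel reappears as Example 1 below): take q = 2, U_k ~ Bern(p), x_k(u_k) = u_k, and Y = X_1 · X_2, so that W_a = U_1 ⊕ U_2 for a = [1 1]. Since Y = 1 forces W_a = 0, while P{W_a = 1 | Y = 0} = 2p(1 − p)/(1 − p²) = 2p/(1 + p), the symmetric computation rate I_CF,k(a) = H(U_k) − H(W_a | Y) reduces to a one-dimensional optimization:

```python
import numpy as np

def hb(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def i_cf(p):
    """I_CF,k([1 1]) = H(U_k) - H(W|Y) for U_k ~ Bern(p), Y = U1*U2, W = U1 xor U2.
    H(W|Y) = (1 - p^2) * hb(2p / (1 + p)), since Y = 1 determines W = 0."""
    return hb(p) - (1 - p**2) * hb(2 * p / (1 + p))

ps = np.linspace(0.01, 0.99, 981)
best = max((i_cf(p), p) for p in ps)
print(f"rate at p = 0.5: {i_cf(0.5):.4f} bits")
print(f"best rate {best[0]:.4f} at p = {best[1]:.3f}")
```

Note that the maximizer is biased away from p = 1/2, which is exactly the situation of Figure 1: a plain linear codebook produces uniform codewords, and multicoding is needed to realize the optimizing non-uniform input pmf.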
Remark 3.
Prior work by Padakandla and Pradhan proposed a finite-field compute–forward scheme for communicating the sum of codewords over a two-user MAC [38], resulting in the achievable rate region

R_PP = { (R_1, R_2) : R_k ≤ min(H(U_1), H(U_2)) − H(U_1 ⊕ U_2 | Y), k = 1, 2 }.

Note that this region is included in R_CF([1 1]) from Theorem 1, and corresponds to the special case where the rates are set to be equal, R_1 = R_2.
We prove Theorem 1 in two steps in Section VI. First, we develop an achievable scheme for a DM-MAC, which will serve as a foundation for the remainder of our achievability arguments. Afterwards, we use a quantization argument to extend this scheme to real-valued receiver alphabets.
Example 1.
Consider the binary multiplying MAC with channel output Y = X_1 · X_2 and binary sender and receiver alphabets, X_1 = X_2 = Y = {0, 1}. The receiver would like to recover the sum W = U_1 ⊕ U_2 over the binary field (q = 2), where U_k ~ Bern(p_k) and x_k(u_k) = u_k, k = 1, 2. The highest symmetric rate R_1 = R_2 = R_sym achievable via Theorem 1 is R_sym ≈ 0.55, which is attained with p_1 = p_2 ≈ 0.75. Note that, if we send both U_1 and U_2 to the receiver via classical multiple-access, the highest symmetric rate possible is R_sym = 0.5.

In many settings, it will be useful to recover a real-valued sum of the codewords, rather than the finite-field sum. Below, we provide two theorems for recovering integer-linear combinations of codewords over the real field. The first restricts the U_k random variables to (bounded) integer values, which in turn allows us to express the rate region in terms of discrete entropies. The second allows the U_k to be continuous-valued random variables (subject to mild technical constraints), and the rate region is written in terms of differential entropies.

Theorem 2.
Set (F, A) = (R, Z) and let a ∈ Z^2 be the desired coefficient vector. Assume that U_k ⊂ Z and |U_k| < ∞. A rate pair (R_1, R_2) is achievable if it is included in R_CF(a) ∪ R_LMAC for some input pmf p_{U_1}(u_1) p_{U_2}(u_2) and symbol mappings x_1(u_1) and x_2(u_2), where

I_CF,1(a) = H(U_1) − H(W_a | Y),
I_CF,2(a) = H(U_2) − H(W_a | Y),

and

W_a = a_1 U_1 + a_2 U_2,    (8)

where the addition and multiplication in (8) are over R.

The proof of Theorem 2 is given in Section VI. Notice that, while the U_k are restricted to integer values, the x_k(u_k) are free to map to any real values.
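As a toy evaluation of Theorem 2 with parameters of our own choosing: let U_k be uniform on {−1, 1}, x_k(u_k) = √P u_k, Y = X_1 + X_2 + Z with Z ~ N(0, 1), and a = [1 1]. Then W_a = U_1 + U_2 takes values in {−2, 0, 2} with priors (1/4, 1/2, 1/4), and I_CF,k(a) = H(U_k) − H(W_a | Y) = 1 − H(W_a | Y) can be computed by numerically integrating the posterior entropy over the Gaussian mixture output:

```python
import numpy as np

def rate_cf(P, half=20.0, step=0.01):
    """I_CF = 1 - H(W|Y) in bits for U_k ~ Unif{-1, 1}, x_k(u) = sqrt(P)*u,
    Y = X1 + X2 + Z with Z ~ N(0, 1), and W = U1 + U2 in {-2, 0, 2}."""
    w_vals = np.array([-2.0, 0.0, 2.0])
    priors = np.array([0.25, 0.5, 0.25])
    y = np.arange(-half, half, step)
    # f(y | W = w) is Gaussian with mean w * sqrt(P) and unit variance.
    lik = np.exp(-0.5 * (y[None, :] - np.sqrt(P) * w_vals[:, None]) ** 2) / np.sqrt(2 * np.pi)
    joint = priors[:, None] * lik           # f(y, w)
    fy = joint.sum(axis=0)                  # output density f(y)
    post = joint / fy                       # P(W = w | Y = y)
    plogp = np.where(post > 0, post * np.log2(np.where(post > 0, post, 1.0)), 0.0)
    # H(W|Y) = integral of f(y) * H(W | Y = y) dy, via a Riemann sum.
    H_W_given_Y = -np.sum(fy * plogp.sum(axis=0)) * step
    return 1.0 - H_W_given_Y

for P in (1.0, 10.0):
    print(f"P = {P}: I_CF = {rate_cf(P):.3f} bits")
```

At high P the three mixture components separate, H(W_a | Y) vanishes, and the rate approaches H(U_k) = 1 bit.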
Definition 1 (Weak continuity of random variables). Consider a family of cdfs {F_t} that are parametrized by t ∈ R^K and denote random variables X_t ~ F_t. The family {F_t} is said to be weakly continuous at t_0 if X_t converges in distribution to X_{t_0} as t → t_0.

Theorem 3 (Continuous Compute–Forward). Set (F, A) = (R, Z) and let a ∈ Z^2 be the desired coefficient vector. Let U_1 and U_2 be two independent real-valued random variables with absolutely continuous distributions described by pdfs f_{U_1} and f_{U_2}, respectively. Also, assume that the family of cdfs {F_{Y|U_1,U_2}(· | u_1, u_2)} is weakly continuous in (u_1, u_2) almost everywhere. Finally, assume that the following finiteness conditions on entropies and differential entropies hold:

h(U_1) < ∞ and h(U_2) < ∞,
H(⌈U_1⌋) < ∞ and H(⌈U_2⌋) < ∞,

where ⌈u⌋ rounds u to the nearest integer. A rate pair (R_1, R_2) is achievable if it is included in R_CF(a) ∪ R_LMAC for some input pdf f_{U_1}(u_1) f_{U_2}(u_2) and symbol mappings x_1(u_1), x_2(u_2), where

I_CF,1(a) := h(U_1) − h(W_a | Y) + log gcd(a),    (9a)
I_CF,2(a) := h(U_2) − h(W_a | Y) + log gcd(a),    (9b)

and

W_a = a_1 U_1 + a_2 U_2,    (10)

where the addition and multiplication in (10) are over R and gcd(a) denotes the greatest common divisor of |a_1| and |a_2|.

The proof of this theorem is deferred to Section VII.
Remark 4.
The log gcd(a) term neutralizes the penalty for choosing a coefficient vector a with gcd(a) > 1. For example, set a = [1 1] and ã = [2 2] and note that gcd(a) = 1 and gcd(ã) = 2. Since h(W_ã | Y) = h(W_a | Y) + log 2, we find that the log gcd(ã) term compensates exactly for the penalty in the conditional entropy. Previous work on compute–forward either ignored the possibility of a penalty [5] or compensated by taking an explicit union over all integer coefficient matrices with the same row span [32].

Consider the Gaussian MAC

Y = h_1 X_1 + h_2 X_2 + Z,    (11)

with channel gains h_k ∈ R, average power constraints Σ_{i=1}^n x_{k,i}^2(m_k) ≤ nP_k, k = 1, 2, and zero-mean additive Gaussian noise Z with unit variance. Specializing Theorem 3 by setting f_{U_k} to be N(0, P_k/β_k^2) and x_k(u_k) = β_k u_k for some β_k ∈ R, we establish the following corollary, which includes the Gaussian compute–forward rate regions in [5], [32], [44].

Corollary 1 (Gaussian Compute–Forward). Consider a Gaussian MAC, set (F, A) = (R, Z), and let a ∈ Z^2 be the desired coefficient vector. A rate pair (R_1, R_2) is achievable if it is included in R_CF(a) ∪ R_LMAC for some β_k ∈ R, k = 1, 2, where

I_CF,1(a, β) := (1/2) log( β_2^2 P_1 (1 + h_1^2 P_1 + h_2^2 P_2) / ( (a_1 β_2 h_2 − a_2 β_1 h_1)^2 P_1 P_2 + (a_1 β_2)^2 P_1 + (a_2 β_1)^2 P_2 ) ) + log gcd(a),

I_CF,2(a, β) := (1/2) log( β_1^2 P_2 (1 + h_1^2 P_1 + h_2^2 P_2) / ( (a_1 β_2 h_2 − a_2 β_1 h_1)^2 P_1 P_2 + (a_1 β_2)^2 P_1 + (a_2 β_1)^2 P_2 ) ) + log gcd(a),

I(X_1, X_2; Y) = C(h_1^2 P_1 + h_2^2 P_2),
I(X_1; Y | X_2) = C(h_1^2 P_1),
I(X_2; Y | X_1) = C(h_2^2 P_2),

and C(x) := (1/2) log(1 + x).
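The Gaussian quantities in Theorem 3 can be checked directly with jointly Gaussian covariance algebra. The sketch below is our own verification, using the specialization U_k ~ N(0, P_k/β_k²) and x_k(u_k) = β_k u_k: it computes I_CF,1 = h(U_1) − h(W_a | Y) from the MMSE conditional variance (omitting the log gcd(a) term) and, in the symmetric case, recovers the familiar (1/2) log(1/2 + P) computation rate of [5].

```python
import numpy as np

def i_cf1(a, h, P, beta):
    """h(U1) - h(W_a|Y) in bits for U_k ~ N(0, P_k/beta_k^2), x_k = beta_k*u_k,
    Y = h1*x1 + h2*x2 + Z, W_a = a1*U1 + a2*U2 (log gcd(a) term omitted)."""
    s = np.array(P, float) / np.array(beta, float) ** 2   # Var(U_k)
    var_y = h[0] ** 2 * P[0] + h[1] ** 2 * P[1] + 1.0
    var_w = a[0] ** 2 * s[0] + a[1] ** 2 * s[1]
    cov_wy = a[0] * h[0] * beta[0] * s[0] + a[1] * h[1] * beta[1] * s[1]
    var_w_given_y = var_w - cov_wy ** 2 / var_y           # MMSE conditional variance
    # h(U1) - h(W|Y) = (1/2) log2( Var(U1) / Var(W|Y) ) for Gaussians
    return 0.5 * np.log2(s[0] / var_w_given_y)

P = 4.0
r = i_cf1(a=[1, 1], h=[1.0, 1.0], P=[P, P], beta=[1.0, 1.0])
print(r, 0.5 * np.log2(0.5 + P))   # the two values agree
```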
We now apply each of the theorems above to the problem of sending the sum of two codewords over a symmetric Gaussian MAC with channel output Y = X_1 + X_2 + Z, where Z ~ N(0, 1) is independent, additive Gaussian noise and we have the usual power constraints Σ_{i=1}^n x_{k,i}^2(m_k) ≤ nP, k = 1, 2. Specifically, we would like to send the linear combination with coefficient vector a = [1 1] at the highest possible sum rate R_sum = R_1 + R_2. In Figure 4, we have plotted the sum rate for several strategies with respect to SNR = 10 log_10(P).

The upper bound R_sum ≤ log(1 + P) follows from a simple cut-set bound. Corollary 1 with β_1 = β_2 = 1 yields the sum rate R_sum = max(log(1/2 + P), (1/2) log(1 + 2P)). Note that this is the best-known performance for the Gaussian two-way relay channel [3]–[5]. The best-known performance for i.i.d. Gaussian codebooks is R_sum = (1/2) log(1 + 2P). The performance can be slightly improved if the transmitters remain silent part of the time, and increase their power during the remainder of the time. Specifically, this approach would achieve R_1 = R_2 = max(sup_{α ∈ [0,1]} (α/2) log(1/2 + P/α), (1/4) log(1 + 2P)). Note that this requires the use of a time-sharing auxiliary random variable.

Fig. 4. Performance comparison for sending the sum of codewords over a symmetric Gaussian MAC Y = X_1 + X_2 + Z (curves: upper bound, Corollary 1, Theorem 2, Theorem 1 with q = 4, i.i.d. Gaussian, Theorem 1 with q = 2).
Fig. 5. Example showing that the compute–forward scheme in Theorem 2 with |U_1| = |U_2| = 3 can outperform both compute–forward with Gaussian inputs and i.i.d. Gaussian coding (curves: Corollary 1, Theorem 2, i.i.d. Gaussian).
We have also plotted two examples of Theorem 1 with q = 2 and q = 4. For the binary field q = 2, we take U_k ~ Unif(F_2), x_k(0) = −√P, and x_k(1) = √P, k = 1, 2. For q = 4, we take U_k ~ Unif(F_4), x_k(0) = −3√(P/5), x_k(1) = −√(P/5), x_k(2) = √(P/5), and x_k(3) = 3√(P/5), k = 1, 2. Finally, we have plotted an example of Theorem 2 with U_k = {−3, −1, 1, 3}, U_k ~ Unif(U_k), and x_k(u_k) = √(P/5) u_k, k = 1, 2. Note that this outperforms the q = 4 strategy in Theorem 1, which effectively uses the same input distributions. If we were to set U_k = {−1, 1}, U_k ~ Unif(U_k), and x_k(u_k) = √P u_k, k = 1, 2, we would match the achievable rate of Theorem 1 with q = 2 exactly (not shown on the plot).

Example 3.
Consider the Gaussian MAC in Example 2. In Figure 5, we have plotted an example of Theorem 2 with U_k = {−1, 0, 1}, pmfs p_{U_k} = {(1 − p_k)/2, p_k, (1 − p_k)/2}, and X_k = U_k √(P/(1 − p_k)), which we optimize over p_k ∈ [0, 1]. For SNRs in the plotted range, we can see that the strategy in Theorem 2 strictly outperforms both the Gaussian-input compute–forward (and thus the lattice-based compute–forward in [5]) and i.i.d. Gaussian coding. The suboptimality of Gaussian inputs for compute–forward was first observed by Zhu and Gastpar [35].

B. Computing Two Linear Combinations Over a K-User MAC

In this subsection, we extend the results of the previous section to compute two linear combinations over a K-user MAC. The problem of recovering multiple linear combinations at a single receiver was previously studied in [13], [31], [32], [35], [45], [46]. Applications include lattice interference alignment [13], multiple-access [13], [31], [32], [35], and low-complexity MIMO receiver architectures [31], [46]. Prior to this paper, the largest available rate region relied on successive cancellation decoding [31], [32] and was limited to the Gaussian setting. Here, we derive an achievable rate region for the discrete memoryless setting using simultaneous joint typicality decoding.

There are K transmitters and a single receiver that wants to recover two linear combinations with coefficient vectors a_1, a_2 ∈ A^K. Without loss of generality, we assume that a_1 and a_2 are linearly independent. (Otherwise, we can use the results for recovering a single linear combination described above.)

Theorem 4 (Two Linear Combinations). Let (F, A) = (F_q, F_q) and let a_1, a_2 ∈ F_q^K be the desired coefficient vectors. Assume that a_1 and a_2 are linearly independent and define K_ℓ = {k ∈ [1 : K] : a_{ℓk} ≠ 0}, ℓ = 1, 2, as well as

W_{a_1} = Σ_{k=1}^K a_{1k} U_k,    (12)
W_{a_2} = Σ_{k=1}^K a_{2k} U_k,    (13)
V_b = b_1 W_{a_1} + b_2 W_{a_2},    (14)

where b ∈ F_q^2 \ {0} and the multiplications and summations are over F_q. A rate tuple (R_1, . . . , R_K) is achievable if

R_k < max_{b ∈ F_q^2 \ {0}} min{ H(U_k) − H(V_b | Y), H(U_k) − H(W_{a_1}, W_{a_2} | Y, V_b) }, k ∈ K_1,
R_j < I(W_{a_2}; Y, W_{a_1}) − H(W_{a_2}) + H(U_j), j ∈ K_2,
R_k + R_j < I(W_{a_1}, W_{a_2}; Y) − H(W_{a_1}, W_{a_2}) + H(U_k) + H(U_j), k ∈ K_1, j ∈ K_2,

or

R_k < I(W_{a_1}; Y, W_{a_2}) − H(W_{a_1}) + H(U_k), k ∈ K_1,
R_j < max_{b ∈ F_q^2 \ {0}} min{ H(U_j) − H(V_b | Y), H(U_j) − H(W_{a_1}, W_{a_2} | Y, V_b) }, j ∈ K_2,
R_k + R_j < I(W_{a_1}, W_{a_2}; Y) − H(W_{a_1}, W_{a_2}) + H(U_k) + H(U_j), k ∈ K_1, j ∈ K_2,

for some input pmf ∏_{k=1}^K p_{U_k}(u_k) and symbol mappings x_k(u_k), k ∈ [1 : K], where U_k ⊆ F_q.
Theorem 4 can be easily extended to the case (F, A) = (R, Z) with U_k ⊂ Z, |U_k| < ∞ (similar to Theorem 2). For this case, we would replace (F, A) = (F_q, F_q) with (F, A) = (R, Z), set U_k ⊂ Z, |U_k| < ∞, and take the summations in (12) to (14) to be over R. We defer to Section VIII-A for a detailed description of the decoder, the proof of Theorem 4, and the proof of Remark 5.

Remark 6.
The rate regions from Theorems 1 and 2 demonstrate that, even if we are interested in recovering a single linear combination, a joint typicality decoder will sometimes implicitly recover both messages. (This occurs for rates that fall in $\mathcal{R}_{\mathrm{LMAC}}$.) It seems likely that, for recovering two linear combinations with coefficient vectors $a_1$ and $a_2$, a complete analysis of a joint typicality decoder should also include the rate regions for decoding linear combinations with all coefficient matrices $A$ of rank $2$ or greater whose rowspan includes $a_1$ and $a_2$. This is not the case for Theorem 4, due to the fact that our error analysis can only handle pairs of indices. The analysis of the simultaneous joint typicality decoder for more than two indices is left as an open problem.

We now consider the special case of $K = 2$ users and a coefficient matrix $A$ with rank $2$, which, by the bijective mapping assumption on $U_k^n(M_k)$, is equivalent to recovering both messages $(M_1, M_2)$.

Theorem 5 (Multiple-Access via Compute–Forward). Consider the sequences of code pairs that achieve the rate region in Theorems 1, 2, and 3 for some input distribution $p_{U_1}(u_1)\, p_{U_2}(u_2)$ and symbol mappings $x_1(u_1)$ and $x_2(u_2)$. Then, the rate pair $(R_1, R_2)$ is also achievable for recovering the individual messages with the same sequence of codes, if it is included in $\mathcal{R}_{\mathrm{LMAC}}$.

The proof is deferred to Section VIII-B. The following corollary is a Gaussian specialization of Theorem 5.
Corollary 2 (Gaussian Multiple-Access via Compute–Forward). Consider the sequences of code pairs that achieve the rate region in Corollary 1 for some Gaussian MAC. Then, the rate pair $(R_1, R_2)$ is also achievable for recovering the messages with the same sequence of codes if it is included in $\mathcal{R}_{\mathrm{LMAC}}$ for some $\beta_k \in \mathbb{R}$.

The following example considers a compound MAC where one receiver only wants the sum of the codewords. It demonstrates that simultaneous joint typicality decoding can outperform successive cancellation decoding for compute–forward, even after time-sharing. It also shows that our strategy outperforms the best known random i.i.d. coding scheme.
Example 4.
Consider the two-sender, two-receiver Gaussian network depicted in Figure 6. The channel outputs are given by
\begin{align*}
Y_1 &= X_1 + h X_2 + Z_1, \\
Y_2 &= X_1 + X_2 + Z_2,
\end{align*}
where $Z_1$ and $Z_2$ are independent Gaussian noise components with zero mean and unit variance, $h$ is a fixed channel gain, and $P_1 = 25$ and $P_2 = 18$, where $P_1$ and $P_2$ are the power constraints on $X_1$ and $X_2$, respectively.

Fig. 6. A two-sender two-receiver network. Decoder 1 wishes to recover both messages and Decoder 2 wishes to compute the sum of the channel inputs, $W_a^n = X_1^n(M_1) + X_2^n(M_2)$.

Here, we assume that Receiver 1 wishes to recover both messages separately while Receiver 2 wishes to recover the sum of the codewords,
\begin{equation}
W_a^n = X_1^n(M_1) + X_2^n(M_2), \tag{15}
\end{equation}
where $a = [1\ 1]$.

To explicitly compute the linear combination of the transmitted codewords (15), we fix $x_1(u_1)$ and $x_2(u_2)$ to be identity mappings in Corollary 2. By Corollary 2, decoding is possible at Receiver 1 if the rates are included in $\mathcal{R}_{\mathrm{LMAC}}$ (with $\beta_1 = \beta_2 = 1$) for the induced MAC. By Corollary 1, decoding is possible at Receiver 2 if the rates are included in $\mathcal{R}_{\mathrm{CF}}([1\ 1]) \cup \mathcal{R}_{\mathrm{LMAC}}$ (with $\beta_1 = \beta_2 = 1$) for the induced MAC. In Figure 7, we have plotted these rate constraints, followed by their intersection, and the convexification of this region allowed by time-sharing. We have also plotted the performance available to nested lattice codes combined with successive cancellation decoding as derived in [32, Theorem 7]. Finally, we have plotted the performance of random i.i.d. codes coupled with simultaneous joint typicality decoding, which corresponds to the rates available for a compound Gaussian MAC. While our strategy strictly outperforms the other two strategies in this scenario, it is not known to be optimal in general.

In the following two sections, we introduce the nested linear coding architecture which will form the foundation of our achievability strategies.

IV.
POINT-TO-POINT CHANNELS REVISITED
To better explain the intuition and structure of our coding strategies, we will first revisit and explain the nested linear code architecture for point-to-point communication. Consider the
point-to-point communication system depicted in Figure 8, where a sender wishes to reliably communicate a message $M$ at a rate $R$ bits per transmission to a receiver over the discrete memoryless channel (DMC) $p(y|x)$.

Fig. 7. Step-by-step illustration for determining the achievable rate regions for Example 4. On the left, we have the rate constraints imposed by Receivers 1 and 2, respectively. On the top right, we have the intersection of these rate constraints. Time sharing yields the achievable rate regions on the bottom right. The thick black line represents the rate region available to i.i.d. Gaussian codebooks combined with joint typicality decoding. The blue line represents the rate region available to nested linear codebooks combined with joint typicality decoding (along with a discretization argument for the Gaussian case). The thin red line represents the rate region available to nested lattice codebooks combined with successive cancellation decoding.

The celebrated channel coding theorem of Shannon [47] states that the capacity $C$ of the
Fig. 8. A point-to-point communication system.
discrete memoryless channel $p(y|x)$ is given by the capacity formula
\begin{equation}
C = \max_{p(x)} I(X; Y). \tag{16}
\end{equation}
The classic achievability proof for the channel coding theorem relies on a random coding argument. Specifically, the codeword symbols are randomly and independently generated from the capacity-achieving distribution $p(x)$ and the receiver employs joint typicality decoding.

Fig. 9. A joint typicality encoding architecture for point-to-point communication based on nested linear codes.

As an alternative strategy, consider the linear coding architecture in Figure 9. This architecture is based on three components: an auxiliary linear code, a joint typicality encoder for multicoding, and a symbol-by-symbol mapping function $x(u)$. Multicoding is often used in the context of Gelfand–Pinsker (i.e., dirty-paper) coding [39] to find codewords that are jointly typical with respect to the observed state sequence. In contrast, the proposed architecture uses multicoding to select linear codewords that are typical with respect to the desired input distribution (as opposed to the uniform distribution). This linear coding architecture was studied by Miyake [37] in the context of sparse codes for point-to-point channels and by Padakandla and Pradhan for three-user broadcast channels [30], recovering the sum of discrete memoryless sources over a discrete memoryless MAC with distributed state information [38], and three-user interference channels [16]. Below, we provide an overview of the codebook construction, encoding and decoding operations, and error analysis for this linear coding architecture in the context of a memoryless point-to-point channel. This will help build useful intuition for our main theorems.
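Before turning to the construction, the capacity formula (16) can be sanity-checked numerically. The following sketch is our own illustration, not part of the paper's development: it evaluates $I(X;Y)$ for a binary symmetric channel, where the uniform input is known to be capacity-achieving, and compares against the closed form $C = 1 - H_2(p)$.

```python
import math

def mutual_information(px, channel):
    """I(X; Y) in bits for an input pmf px and channel transition rows p(y|x)."""
    py = [sum(px[x] * channel[x][y] for x in range(len(px)))
          for y in range(len(channel[0]))]
    total = 0.0
    for x in range(len(px)):
        for y in range(len(py)):
            if px[x] * channel[x][y] > 0:
                total += px[x] * channel[x][y] * math.log2(channel[x][y] / py[y])
    return total

p = 0.11                                  # BSC crossover probability (toy value)
bsc = [[1 - p, p], [p, 1 - p]]
C = mutual_information([0.5, 0.5], bsc)   # uniform input achieves BSC capacity
H2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
assert abs(C - (1 - H2)) < 1e-12          # matches C = 1 - H_2(p)
```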
Codebook generation.
Fix a finite field $\mathbb{F}_q$ and a parameter $\epsilon' \in (0, 1)$. In addition to the messages $m \in [2^{nR}]$, we use auxiliary indices $l \in [2^{n\hat{R}}]$, with rates $R$ and $\hat{R}$, respectively. Randomly generate a $\kappa \times n$ matrix $G \in \mathbb{F}_q^{\kappa \times n}$ and a vector $d^n \in \mathbb{F}_q^n$, where each element of $G$ and $d^n$ is independently and randomly generated according to $\mathrm{Unif}(\mathbb{F}_q)$, and $\kappa = \lceil nR/\log(q) \rceil + \lceil n\hat{R}/\log(q) \rceil$. Generate a linear code $\mathcal{C}$ with parameters $(R, \hat{R}, n, q)$ by
\begin{equation}
u^n(m, l) = [\nu(m), \nu(l)]\, G \oplus d^n, \tag{17}
\end{equation}
for $m \in [2^{nR}]$, $l \in [2^{n\hat{R}}]$, where $\nu(m)$ is the $q$-ary expansion of the index $m \in [2^{nR}]$ with length $\tilde{\kappa} = \lceil nR/\log(q) \rceil$, $\nu(l)$ is the $q$-ary expansion of the index $l \in [2^{n\hat{R}}]$ with length $\lceil n\hat{R}/\log(q) \rceil$, and
\begin{equation*}
G = \begin{bmatrix}
g_{1,1} & g_{1,2} & \cdots & g_{1,n} \\
\vdots & \vdots & \ddots & \vdots \\
g_{\tilde{\kappa},1} & g_{\tilde{\kappa},2} & \cdots & g_{\tilde{\kappa},n} \\
g_{\tilde{\kappa}+1,1} & g_{\tilde{\kappa}+1,2} & \cdots & g_{\tilde{\kappa}+1,n} \\
\vdots & \vdots & \ddots & \vdots \\
g_{\kappa,1} & g_{\kappa,2} & \cdots & g_{\kappa,n}
\end{bmatrix}.
\end{equation*}
Note that from this construction, the codewords are pairwise independent,
\begin{equation}
\mathrm{P}\{U^n(m, l) = u^n,\; U^n(\tilde{m}, \tilde{l}) = \tilde{u}^n\} = \prod_{i=1}^n p_q(u_i)\, p_q(\tilde{u}_i), \quad (m, l) \neq (\tilde{m}, \tilde{l}), \tag{18}
\end{equation}
where $p_q = \mathrm{Unif}(\mathbb{F}_q)$. The general joint distribution of the codewords resulting from this construction can be found in [48, Theorem 1].

Encoding.
Fix a pmf $p(u)$ and a function $x: \mathbb{F}_q \to \mathcal{X}$. For each $m \in [2^{nR}]$, find an index $l \in [2^{n\hat{R}}]$ such that $u^n(m, l) \in \mathcal{T}_{\epsilon'}^{(n)}(U)$. If there is more than one, choose one randomly from such indices. If there is none, randomly choose an index from $[2^{n\hat{R}}]$.

To send message $m \in [2^{nR}]$, transmit $x_i(u_i(m, l))$ for $i = 1, \ldots, n$, where $l$ is the chosen index from the above encoding step. From Lemma 9 in Appendix B, the probability of encoding error tends to zero as $n \to \infty$ if
\begin{equation}
\hat{R} > D(p_U \| p_q) + \delta(\epsilon'). \tag{19}
\end{equation}
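The codebook construction (17) and the multicoding selection rule above can be sketched as follows. This is an illustrative toy implementation, not the paper's code: $q = 2$, the blocklength, the digit lengths, and the absolute $0.1$ tolerance standing in for $\epsilon$-typicality are all assumptions made for this example.

```python
import random

q, n = 2, 20
km, kl = 2, 12                 # digit lengths of nu(m) and nu(l); kappa = km + kl
rng = random.Random(0)
G = [[rng.randrange(q) for _ in range(n)] for _ in range(km + kl)]
d = [rng.randrange(q) for _ in range(n)]

def nu(idx, length):
    """q-ary expansion of an index into a fixed-length digit vector."""
    digits = []
    for _ in range(length):
        digits.append(idx % q)
        idx //= q
    return digits

def codeword(m, l):
    """u^n(m, l) = [nu(m), nu(l)] G + d over F_q, as in (17)."""
    row = nu(m, km) + nu(l, kl)
    return [(sum(row[i] * G[i][j] for i in range(km + kl)) + d[j]) % q
            for j in range(n)]

target = [0.8, 0.2]            # desired (non-uniform) pmf p_U over F_2

def close_to_type(u, pmf, tol=0.1):
    """Crude proxy for epsilon-typicality: empirical frequencies near pmf."""
    return all(abs(u.count(s) / len(u) - p) <= tol for s, p in enumerate(pmf))

m = 1
typical_l = [l for l in range(q ** kl) if close_to_type(codeword(m, l), target)]
# With hat-R large enough (condition (19)), such an index exists w.h.p.
assert len(typical_l) > 0
```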
Decoding.
Select a parameter $\epsilon > \epsilon'$. Upon observing $y^n$, the receiver searches for a unique message $m \in [2^{nR}]$ such that
\begin{equation*}
(u^n(m, l), y^n) \in \mathcal{T}_\epsilon^{(n)},
\end{equation*}
for some $l \in [2^{n\hat{R}}]$. If there is none or more than one such message, it declares an error. From Lemma 10 in Appendix B, the probability of decoding error tends to zero as $n \to \infty$ if
\begin{equation}
R + \hat{R} < I(U; Y) + D(p_U \| p_q) - \delta(\epsilon). \tag{20}
\end{equation}
By eliminating $\hat{R}$ from (19) and (20), and sending $\epsilon \to 0$, any rate $R$ that satisfies
\begin{equation*}
R < \max_{p(u), x(u)} I(U; Y)
\end{equation*}
is achievable. Finally, for $q \geq |\mathcal{X}|$, we can simply select an injective function $x: \mathbb{F}_q \to \mathcal{X}$ and a pmf $p(u)$ so that $X = x(U)$ has the capacity-achieving input distribution. Thus, we can achieve the point-to-point capacity (16) using nested linear codes.

As mentioned earlier, the above argument can be viewed as a special case of [37, Theorem 5.1] or [38, Theorem 1]. In the following sections, we generalize this technique and use it to develop a discrete memoryless version of compute–forward.

V. COMPUTE–FORWARD WITH MULTICODING
Consider a relay in a Gaussian network that observes a noisy linear combination of several codewords. Classical relaying strategies for this scenario can be viewed as variations on three fundamental strategies: decode–forward [49, Th. 1], compress–forward [49, Th. 6], and amplify–forward [50]. Recent work [5] has introduced a novel strategy, compute–forward, which enables a relay to decode a linear combination of $q$-ary expansions of the messages. Recall that, in our problem formulation from Section II, the messages $m_k$ are mapped to representative sequences $u_k^n(m_k) \in \mathbb{F}^n$, and the goal of the decoder is to recover linear combinations of the $u_k^n(m_k)$. Below, we provide intuition for why this generalization is useful to move beyond the Gaussian, equal power setting. Afterwards, we provide a formal description of our codebook generation and encoding procedure.

Fig. 10. An illustration of a linear combination of the $q$-ary expansions of message and auxiliary indices. On the right-hand side, we have used solid colors for message symbols and dashed lines for auxiliary symbols. Transmitter 1's symbols are shown in blue and occupy the entire vector. Transmitter 2's symbols are shown in red and only occupy part of the vector. We have assumed that both $a_{\ell,1}$ and $a_{\ell,2}$ are non-zero so the linear combination occupies the entire vector. (If $a_{\ell,2} = 0$, then the last two entries will be zero.)

A. High-Level Overview
To begin, consider a scenario with $K$ transmitters and a single receiver that operate with blocklength $n$. The $k$th transmitter has a message $m_k \in [2^{nR_k}]$ where $R_k \geq 0$ denotes its rate. An appealing approach is to view the messages as vectors in a vector space over the finite field $\mathbb{F}_q$. Specifically, let $\nu(m_k)$ denote the $q$-ary expansion of $m_k$ into a vector of length $nR_k/\log(q)$. (For the remainder of the paper, we will assume that $nR_k/\log(q)$ is integer-valued in order to simplify our notation.) For the special case of symmetric rates, $R_1 = \cdots = R_K$, we can define the class of desired linear functions as those of the form
\begin{equation*}
\bigoplus_{k=1}^K a_k\, \nu(m_k)
\end{equation*}
for some $a_k \in \mathbb{F}_q$. This is the approach taken in [5] for transmitters with equal power constraints.

Unfortunately, it seems that this framework is not rich enough to handle the setting where each transmitter has a different input distribution. Specifically, this is due to the use of multicoding to select linear codewords with the desired types. A similar issue arises in the Gaussian setting with unequal powers across transmitters [32]. Our solution is to broaden the notion of recovering a linear combination.

As part of our coding scheme, the $k$th transmitter will have an auxiliary index $l_k \in [2^{n\hat{R}_k}]$ for some auxiliary rate $\hat{R}_k$ that represents its selection during the multicoding step. Define $\tilde{R}_k = R_k + \hat{R}_k$ and $\tilde{R}_{\max} = \max_k \tilde{R}_k$. We will map each transmitter's message and auxiliary indices into $\mathbb{F}_q^\kappa$ where $\kappa = n\tilde{R}_{\max}/\log(q)$.
This is accomplished by concatenating the $q$-ary expansions, followed by zero-padding (if necessary), resulting in
\begin{equation*}
\eta(m_k, l_k) := [\,\nu(m_k)\ \ \nu(l_k)\ \ \mathbf{0}\,],
\end{equation*}
which is then mapped to the linear codeword
\begin{equation*}
u_k^n(m_k, l_k) = \eta(m_k, l_k)\, G \oplus d_k^n,
\end{equation*}
where $G \in \mathbb{F}_q^{\kappa \times n}$ is the generator matrix and $d_k^n \in \mathbb{F}_q^n$ is the dither vector.

The goal of the receiver is to recover up to $K$ linear combinations, each of which can be expressed as a linear codeword,
\begin{align*}
w_{a_\ell}^n(m_1, \ldots, m_K) &= \bigoplus_{k=1}^K a_{\ell,k}\, u_k^n(m_k, l_k) \\
&= \bigoplus_{k=1}^K a_{\ell,k} \big( \eta(m_k, l_k)\, G \oplus d_k^n \big) \\
&= \bigg( \bigoplus_{k=1}^K a_{\ell,k}\, \eta(m_k, l_k) \bigg) G \oplus \bigoplus_{k=1}^K a_{\ell,k}\, d_k^n.
\end{align*}
It will be convenient to associate each linear combination with a unique index. First, notice that the effective rate for a linear combination is determined by the maximum rate of all participating messages,
\begin{equation}
\tilde{R}(a_\ell) := \max\{\tilde{R}_k : a_{\ell,k} \neq 0,\; k \in [1:K]\}. \tag{21}
\end{equation}
Let $s_{a_\ell} \in [2^{n\tilde{R}(a_\ell)}]$ be the unique index whose $q$-ary expansion satisfies
\begin{equation}
[\,\nu(s_{a_\ell})\ \ \mathbf{0}\,] = \bigoplus_{k=1}^K a_{\ell,k}\, \eta(m_k, l_k). \tag{22}
\end{equation}
Now, with a slight abuse of notation, we can refer to each possible linear combination as follows
\begin{equation}
w_{a_\ell}^n(s_{a_\ell}) = \nu(s_{a_\ell})\, G \oplus \bigoplus_{k=1}^K a_{\ell,k}\, d_k^n. \tag{23}
\end{equation}

Remark 7.
From an algebraic perspective, the set $\{\eta(m_k, l_k) : l_k \in [2^{n\hat{R}_k}]\}$ corresponds to a coset for the message $m_k$. Similarly, we can view the linear combinations from (23) as linear combinations of cosets.
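The key algebraic fact behind (23), namely that a linear combination of nested linear codewords is again a codeword of the same linear code with index $s_{a_\ell}$ as in (22), can be checked directly. The parameters below ($q = 3$, the blocklength, and the stand-in vectors for $\eta(m_k, l_k)$) are toy choices for illustration, not values from the paper.

```python
import random

q, n, kappa = 3, 8, 3
rng = random.Random(5)
G = [[rng.randrange(q) for _ in range(n)] for _ in range(kappa)]
d = {k: [rng.randrange(q) for _ in range(n)] for k in (1, 2)}

def encode(row, dither):
    """row * G + dither over F_q (componentwise mod-q arithmetic)."""
    return [(sum(row[i] * G[i][j] for i in range(kappa)) + dither[j]) % q
            for j in range(n)]

eta = {1: [2, 1, 0], 2: [1, 0, 2]}          # stand-ins for eta(m_k, l_k)
u = {k: encode(eta[k], d[k]) for k in (1, 2)}

a = (2, 1)                                  # coefficient vector over F_3
w = [(a[0] * u[1][j] + a[1] * u[2][j]) % q for j in range(n)]

# nu(s_a) = a_1 eta_1 + a_2 eta_2 over F_q, as in (22), and the combined
# dither is a_1 d_1 + a_2 d_2; relation (23) says w is the codeword they index.
nu_sa = [(a[0] * x + a[1] * y) % q for x, y in zip(eta[1], eta[2])]
dith = [(a[0] * d[1][j] + a[1] * d[2][j]) % q for j in range(n)]
assert w == encode(nu_sa, dith)             # relation (23) holds
```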
B. Nested Linear Code Architecture
We now specify the nested linear codes that will be used as our encoding functions throughout the paper. In addition to the messages $m_k \in [2^{nR_k}]$, $k = 1, \ldots, K$, we use auxiliary indices $l_k \in [2^{n\hat{R}_k}]$, $k = 1, \ldots, K$, with rates $R_k$ and $\hat{R}_k$, respectively. We define $\tilde{R}_k := R_k + \hat{R}_k$, $R_{\max} := \max\{R_1, R_2, \ldots, R_K\}$, and $\tilde{R}_{\max} := \max\{\tilde{R}_1, \tilde{R}_2, \ldots, \tilde{R}_K\}$. Let $\nu(m_k)$ denote the length-$\lceil nR_k/\log(q) \rceil$ $q$-ary expansion of $m_k \in [2^{nR_k}]$. Similarly, let $\nu(l_k)$ denote the length-$\lceil n\hat{R}_k/\log(q) \rceil$ $q$-ary expansion of $l_k \in [2^{n\hat{R}_k}]$. For simplicity, we assume that $nR_k/\log(q)$ and $n\hat{R}_k/\log(q)$ are integers for all rates in the sequel. Further define
\begin{equation*}
\eta(m_k, l_k) = [\,\nu(m_k),\ \nu(l_k),\ \mathbf{0}\,], \quad k \in [1:K],
\end{equation*}
where $\eta(m_k, l_k) \in \mathbb{F}_q^\kappa$, $\kappa = n\tilde{R}_{\max}/\log(q)$, and $\mathbf{0}$ is a vector of zeros with length $n(\tilde{R}_{\max} - \tilde{R}_k)/\log(q)$. Note that all $\eta(m_k, l_k)$ have the same length due to zero padding.

We define a $(2^{nR_1}, \ldots, 2^{nR_K}, 2^{n\hat{R}_1}, \ldots, 2^{n\hat{R}_K}, \mathbb{F}_q, n)$ nested linear code as the collection of $K$ codebooks generated by the following procedure. Fix a pmf $\prod_{k=1}^K p(u_k)$ and functions $x_k(u_k)$, $k \in [1:K]$.

Codebook generation.
Fix a finite field $\mathbb{F}_q$ and a parameter $\epsilon' \in (0, 1)$. Randomly generate a $\kappa \times n$ matrix $G \in \mathbb{F}_q^{\kappa \times n}$ and sequences $d_k^n \in \mathbb{F}_q^n$, $k = 1, \ldots, K$, where each element of $G$ and $d_k^n$ is independently and randomly generated according to $\mathrm{Unif}(\mathbb{F}_q)$, and $\kappa = n\tilde{R}_{\max}/\log(q)$. For each $k \in [1:K]$, generate a linear code $\mathcal{C}_k$ with parameters $(R_k, \hat{R}_k, n, q)$ by
\begin{equation}
u_k^n(m_k, l_k) = \eta(m_k, l_k)\, G \oplus d_k^n, \tag{24}
\end{equation}
for $m_k \in [2^{nR_k}]$, $l_k \in [2^{n\hat{R}_k}]$. Note that from this construction, the codewords are pairwise independent with i.i.d. uniform entries, i.e.,
\begin{equation}
\mathrm{P}\{U_k^n(m_k, l_k) = u_k^n,\; U_k^n(\tilde{m}_k, \tilde{l}_k) = \tilde{u}_k^n\} = \prod_{i=1}^n p_q(u_i)\, p_q(\tilde{u}_i), \quad (m_k, l_k) \neq (\tilde{m}_k, \tilde{l}_k), \tag{25}
\end{equation}
where $p_q = \mathrm{Unif}(\mathbb{F}_q)$. The general joint distribution of the codewords resulting from this construction can be found in [48, Theorem 1].

Encoding.
For $k \in [1:K]$, given $m_k \in [2^{nR_k}]$, find an index $l_k \in [2^{n\hat{R}_k}]$ such that $u_k^n(m_k, l_k) \in \mathcal{T}_{\epsilon'}^{(n)}(U_k)$. If there is more than one, select one randomly and uniformly. If there is none, randomly choose an index from $[2^{n\hat{R}_k}]$. Node $k$ transmits $x_{ki}(u_{ki})$, $i = 1, \ldots, n$.

In the following section, we propose a decoding strategy that establishes Theorem 1.

Fig. 11. Nested linear coding architecture for computing a linear combination with coefficient vector $a \in \mathbb{F}_q^2$ over a two-user DM-MAC. Each user selects, via multicoding, a linear codeword $U_k^n$ of the desired type, maps it into the channel input alphabet via the function $x_k(u_k)$, and transmits it as $X_k^n$. The receiver observes $Y^n$ over the DM-MAC specified by $p(y | x_1, x_2)$ and outputs an estimate $\hat{S}_a$. Decoding is successful if $\hat{S}_a = S_a$ where $S_a$ is the index whose $q$-ary expansion corresponds to the linear combination with coefficient vector $a$ in the sense of (23).

VI. P
ROOF OF THEOREMS 1 AND 2

A. Proof of Theorem 1
In the following, we provide achievable rate regions for the important special case of two transmitters and a receiver that wants a single linear combination over a finite field $\mathbb{F}_q$. As we will demonstrate, the rate region can be viewed as a union of the rates available to a "single-user" decoder that attempts to directly recover the desired linear combination and the rates available to a "multiple-access" decoder that recovers the messages individually and then takes the linear combination. Moreover, the achievability argument follows naturally via simultaneous joint typicality decoding, rather than a deliberate combination of two specialized decoders.

We will break up the proof into two steps. First, we will establish Theorem 1 for the special case when the channel is a discrete memoryless MAC. Afterwards, we will use a standard quantization argument to extend this result to the case $\mathcal{Y} = \mathbb{R}$.

Step 1: Discrete memoryless MAC
Fix $\mathbb{F}_q$, pmf $p(u_1)\, p(u_2)$, and functions $x_1(u_1)$, $x_2(u_2)$. The codebook construction and encoding steps follow the nested linear coding architecture in Section V-B. Without loss of generality, we assume that $a_1 \neq 0$ and $a_2 \neq 0$. (If one coefficient is equal to zero, the problem degenerates to the point-to-point communication case.)

Decoding.
Let $\epsilon' < \epsilon$. Upon receiving $y^n$, the decoder searches for a unique index $s_a \in [2^{n\tilde{R}_{\max}}]$ such that
\begin{equation}
(u_1^n(m_1, l_1), u_2^n(m_2, l_2), y^n) \in \mathcal{T}_\epsilon^{(n)}(U_1, U_2, Y), \tag{26}
\end{equation}
for some $(m_1, l_1, m_2, l_2) \in [2^{nR_1}] \times [2^{n\hat{R}_1}] \times [2^{nR_2}] \times [2^{n\hat{R}_2}]$ such that $\nu(s_a) = a_1 \eta(m_1, l_1) \oplus a_2 \eta(m_2, l_2)$. If there is no such index, or more than one, the decoder declares an error.
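The decoding rule (26) can be sketched for $K = 2$ and $a = (1, 1)$ over $\mathbb{F}_2$. To keep the example deterministic, this hypothetical sketch replaces the noisy DM-MAC with a noiseless binary adder, so that joint typicality degenerates to exact consistency, and uses a systematic generator matrix so that the encoding map is injective; both simplifications are ours, not the paper's.

```python
import itertools
import random

q, n, kappa = 2, 10, 2
rng = random.Random(9)
# Systematic G (identity in the first columns) makes eta -> eta G injective.
G = [[1 if j == i else 0 for j in range(kappa)] +
     [rng.randrange(2) for _ in range(n - kappa)] for i in range(kappa)]
d = {k: [rng.randrange(2) for _ in range(n)] for k in (1, 2)}

def enc(k, m, l):
    eta = [m, l]                          # eta(m_k, l_k) with one digit each
    return [(sum(eta[i] * G[i][j] for i in range(kappa)) + d[k][j]) % 2
            for j in range(n)]

# Noiseless adder MAC: y^n = u_1^n XOR u_2^n for true indices (1,0) and (1,1).
y = [(x1 + x2) % 2 for x1, x2 in zip(enc(1, 1, 0), enc(2, 1, 1))]

# Scan all index tuples; every tuple consistent with y^n must agree on the
# linear-combination index, whose q-ary expansion is eta_1 XOR eta_2 as in (22).
candidates = set()
for m1, l1, m2, l2 in itertools.product(range(2), repeat=4):
    if [(a + b) % 2 for a, b in zip(enc(1, m1, l1), enc(2, m2, l2))] == y:
        candidates.add((m1 ^ m2, l1 ^ l2))
assert candidates == {(0, 1)}             # unique index: the decoder succeeds
```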
Analysis of the probability of error.
Let $M_1, M_2$ be the messages, $L_1, L_2$ be the indices chosen by the encoders, and $S_a$ be the (unique) index of the linear combination $W_a^n(S_a)$ such that
\begin{equation}
\nu(S_a) = a_1 \eta(M_1, L_1) \oplus a_2 \eta(M_2, L_2). \tag{27}
\end{equation}
Then, the decoder makes an error only if one or more of the following events occur,
\begin{align*}
\mathcal{E}_1 &= \{U_k^n(m_k, l_k) \notin \mathcal{T}_{\epsilon'}^{(n)} \text{ for all } l_k, \text{ for some } m_k,\; k = 1, 2\}, \\
\mathcal{E}_2 &= \{(U_1^n(M_1, L_1), U_2^n(M_2, L_2), Y^n) \notin \mathcal{T}_\epsilon^{(n)}\}, \\
\mathcal{E}_3 &= \{(U_1^n(m_1, l_1), U_2^n(m_2, l_2), Y^n) \in \mathcal{T}_\epsilon^{(n)} \text{ for some } (m_1, l_1, m_2, l_2) \\
&\qquad \text{such that } \nu(S_a) \neq a_1 \eta(m_1, l_1) \oplus a_2 \eta(m_2, l_2)\}.
\end{align*}
Then, by the union of events bound,
\begin{equation}
\mathrm{P}(\mathcal{E}) \leq \mathrm{P}(\mathcal{E}_1) + \mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c) + \mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c). \tag{28}
\end{equation}
By Lemma 9 in Appendix B, the probability $\mathrm{P}(\mathcal{E}_1)$ tends to zero as $n \to \infty$ if
\begin{equation}
\hat{R}_k > D(p_{U_k} \| p_q) + \delta(\epsilon'), \quad k = 1, \ldots, K. \tag{29}
\end{equation}
Define $\mathcal{M} := \{M_1 = 0, M_2 = 0, L_1 = 0, L_2 = 0\}$ as the event where both messages are zero and the chosen auxiliary indices are zero as well. By symmetry of the codebook construction and encoding steps, we have that $\mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c) = \mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c | \mathcal{M})$ and $\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c) = \mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M})$.

Remark 8. To bound the second probability term, we need a non-trivial proof to establish that the pair of selected codewords are jointly typical with the channel output. If each encoder employed an independent random codebook, this could be shown via a standard application of the Markov lemma [1, Lemma 12.1]. However, due to the shared generator matrix, the codebooks are dependent across the users. Prior work by Padakandla and Pradhan [25] established that the channel inputs and output are jointly typical for $K = 2$ users under the additional constraint that $\hat{R}_k < D(p_{U_k} \| p_q) + 3\delta(\epsilon')$. In Appendix F, we provide an alternative proof that removes this constraint and generalizes to $K > 2$ users.

By Lemma 12 in Appendix F, the second term $\mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c | \mathcal{M})$ tends to zero as $n \to \infty$ if (29) is satisfied.

We bound the probability $\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M})$ in two ways. The first bounds the event that an incorrect linear combination is jointly typical with the channel output. The second bounds the event that incorrect codewords are jointly typical with the channel output, regardless of the resulting linear combination. Note that the event $\mathcal{M}$ implies that $S_a = 0$. Let $\mathcal{S} = \{(m_1, l_1, m_2, l_2) : a_1 \eta(m_1, l_1) \oplus a_2 \eta(m_2, l_2) = \mathbf{0}\}$ denote the set of indices that yield the correct linear combination. For the first bound,
\begin{align}
\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M})
&= \mathrm{P}\{(U_1^n(m_1, l_1), U_2^n(m_2, l_2), Y^n) \in \mathcal{T}_\epsilon^{(n)},\; \mathcal{E}_2^c, \text{ for some } (m_1, l_1, m_2, l_2) \notin \mathcal{S} \,|\, \mathcal{M}\} \nonumber \\
&\stackrel{(a)}{=} \mathrm{P}\{(W_a^n(s_a), U_1^n(m_1, l_1), U_2^n(m_2, l_2), Y^n) \in \mathcal{T}_\epsilon^{(n)},\; \mathcal{E}_2^c, \text{ for some } (m_1, l_1, m_2, l_2) \notin \mathcal{S} \,|\, \mathcal{M}\} \nonumber \\
&\stackrel{(b)}{\leq} \mathrm{P}\{(W_a^n(s_a), Y^n) \in \mathcal{T}_\epsilon^{(n)},\; \mathcal{E}_2^c, \text{ for some } s_a \neq 0 \,|\, \mathcal{M}\}, \tag{30}
\end{align}
where $W_a^n(s_a) = a_1 U_1^n(m_1, l_1) \oplus a_2 U_2^n(m_2, l_2)$, step $(a)$ follows from the fact that $W_a^n(s_a)$ is a deterministic function of $(U_1^n(m_1, l_1), U_2^n(m_2, l_2))$, and step $(b)$ follows from the fact that $(W_a^n(s_a), U_1^n(m_1, l_1), U_2^n(m_2, l_2), Y^n) \in \mathcal{T}_\epsilon^{(n)}$ implies $(W_a^n(s_a), Y^n) \in \mathcal{T}_\epsilon^{(n)}$. Define
\begin{equation*}
\tilde{\mathcal{E}}(s_a) = \{(W_a^n(s_a), Y^n) \in \mathcal{T}_\epsilon^{(n)},\; U_1^n(0, 0) \in \mathcal{T}_{\epsilon'}^{(n)},\; U_2^n(0, 0) \in \mathcal{T}_{\epsilon'}^{(n)}\}.
\end{equation*}
Then, by the union of events bound,
\begin{equation}
\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M}) \leq \sum_{s_a \neq 0} \mathrm{P}(\tilde{\mathcal{E}}(s_a) | \mathcal{M}). \tag{31}
\end{equation}
Lemma 1.
Let $\tilde{D}_U = D(p_{U_1} \| p_q) + D(p_{U_2} \| p_q)$. Then,
\begin{equation*}
\mathrm{P}(\tilde{\mathcal{E}}(s_a) | \mathcal{M}) \leq 2^{n(\hat{R}_1 + \hat{R}_2)}\, 2^{-n(I(W_a; Y) + D(p_{W_a} \| p_q) - \delta(\epsilon))}\, 2^{-n(\tilde{D}_U - \delta(\epsilon))}.
\end{equation*}

Proof:
\begin{align*}
\mathrm{P}(\tilde{\mathcal{E}}(s_a) | \mathcal{M})
&\leq \mathrm{P}\{(W_a^n(s_a), Y^n) \in \mathcal{T}_\epsilon^{(n)},\; U_1^n(0,0) \in \mathcal{T}_\epsilon^{(n)},\; U_2^n(0,0) \in \mathcal{T}_\epsilon^{(n)} \,|\, \mathcal{M}\} \\
&= \sum_{u_1^n \in \mathcal{T}_\epsilon^{(n)},\, u_2^n \in \mathcal{T}_\epsilon^{(n)}}\; \sum_{(w^n, y^n) \in \mathcal{T}_\epsilon^{(n)}} \mathrm{P}\{W_a^n(s_a) = w^n, Y^n = y^n, U_1^n(0,0) = u_1^n, U_2^n(0,0) = u_2^n \,|\, \mathcal{M}\} \\
&\stackrel{(a)}{=} \sum_{u_1^n, u_2^n}\; \sum_{y^n \in \mathcal{T}_\epsilon^{(n)}}\; \sum_{w^n \in \mathcal{T}_\epsilon^{(n)}(W_a | y^n)} \mathrm{P}\{Y^n = y^n \,|\, U_1^n(0,0) = u_1^n, U_2^n(0,0) = u_2^n, \mathcal{M}\} \\
&\qquad\qquad \times \mathrm{P}\{W_a^n(s_a) = w^n, U_1^n(0,0) = u_1^n, U_2^n(0,0) = u_2^n \,|\, \mathcal{M}\} \\
&\stackrel{(b)}{\leq} 2^{n(\hat{R}_1 + \hat{R}_2)} \sum_{u_1^n, u_2^n}\; \sum_{y^n \in \mathcal{T}_\epsilon^{(n)}} p(y^n | u_1^n, u_2^n) \sum_{w^n \in \mathcal{T}_\epsilon^{(n)}(W_a | y^n)} \mathrm{P}\{W_a^n(s_a) = w^n, U_1^n(0,0) = u_1^n, U_2^n(0,0) = u_2^n\} \\
&\stackrel{(c)}{=} 2^{n(\hat{R}_1 + \hat{R}_2)} \sum_{u_1^n, u_2^n}\; \sum_{y^n} p(y^n | u_1^n, u_2^n) \sum_{w^n \in \mathcal{T}_\epsilon^{(n)}(W_a | y^n)} \mathrm{P}\{W_a^n(s_a) = w^n\}\, \mathrm{P}\{U_1^n(0,0) = u_1^n\}\, \mathrm{P}\{U_2^n(0,0) = u_2^n\} \\
&\stackrel{(d)}{=} 2^{n(\hat{R}_1 + \hat{R}_2)} \sum_{u_1^n, u_2^n}\; \sum_{y^n} p(y^n | u_1^n, u_2^n) \sum_{w^n \in \mathcal{T}_\epsilon^{(n)}(W_a | y^n)} 2^{-n(H(W_a) + D(p_{W_a} \| p_q))}\, 2^{-n(H(U_1) + D(p_{U_1} \| p_q))}\, 2^{-n(H(U_2) + D(p_{U_2} \| p_q))} \\
&\leq 2^{n(\hat{R}_1 + \hat{R}_2)} \sum_{u_1^n, u_2^n}\; \sum_{y^n} p(y^n | u_1^n, u_2^n)\, 2^{-n(I(W_a; Y) + D(p_{W_a} \| p_q) - \delta(\epsilon))}\, 2^{-n(H(U_1) + D(p_{U_1} \| p_q))}\, 2^{-n(H(U_2) + D(p_{U_2} \| p_q))} \\
&\leq 2^{n(\hat{R}_1 + \hat{R}_2)} \sum_{u_1^n, u_2^n} 2^{-n(I(W_a; Y) + D(p_{W_a} \| p_q) - \delta(\epsilon))}\, 2^{-n(H(U_1) + D(p_{U_1} \| p_q))}\, 2^{-n(H(U_2) + D(p_{U_2} \| p_q))} \\
&\leq 2^{n(\hat{R}_1 + \hat{R}_2)}\, 2^{-n(I(W_a; Y) + D(p_{W_a} \| p_q) - \delta(\epsilon))}\, 2^{-n(D(p_{U_1} \| p_q) - \delta(\epsilon))}\, 2^{-n(D(p_{U_2} \| p_q) - \delta(\epsilon))},
\end{align*}
where step $(a)$ follows from the fact that conditioned on $\mathcal{M}$, we have the Markov relation $Y^n \to (U_1^n(0,0), U_2^n(0,0)) \to W_a^n(s_a)$, step $(b)$ follows from Lemma 11 in Appendix C, step $(c)$ follows from the fact that $W_a^n(s_a)$, $U_1^n(0,0)$, and $U_2^n(0,0)$ are independent due to the dithers and that $s_a \neq 0$, and step $(d)$ uses the fact that $W_a^n(s_a)$, $U_1^n(0,0)$, and $U_2^n(0,0)$ are each uniformly distributed over $\mathbb{F}_q^n$ and that, for any pmf $p_V(v)$, we can use the relation
\begin{equation}
\log q = H(V) + D(p_V \| p_q), \tag{32}
\end{equation}
where $p_q = \mathrm{Unif}(\mathbb{F}_q)$, to write $q^{-n} = 2^{-n(H(V) + D(p_V \| p_q))}$.

Plugging the bound from Lemma 1 back into (31), we find that
\begin{equation*}
\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M}) \leq 2^{n(\tilde{R}_{\max} + \hat{R}_1 + \hat{R}_2)}\, 2^{-n(I(W_a; Y) + D(p_{W_a} \| p_q) - \delta(\epsilon))}\, 2^{-n(\tilde{D}_U - \delta(\epsilon))}.
\end{equation*}
Thus, the probability $\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M})$ tends to zero as $n \to \infty$ if
\begin{align*}
R_1 + 2\hat{R}_1 + \hat{R}_2 &< I(W_a; Y) + D(p_{W_a} \| p_q) + \tilde{D}_U - \delta(\epsilon), \\
R_2 + \hat{R}_1 + 2\hat{R}_2 &< I(W_a; Y) + D(p_{W_a} \| p_q) + \tilde{D}_U - \delta(\epsilon).
\end{align*}
By eliminating $\hat{R}_1$ and $\hat{R}_2$, setting $\hat{R}_1 = D(p_{U_1} \| p_q) + 2\delta(\epsilon')$ and $\hat{R}_2 = D(p_{U_2} \| p_q) + 2\delta(\epsilon')$ in order to satisfy (29), and sending $\epsilon \to 0$, we have shown that a rate pair $(R_1, R_2)$ is achievable if
\begin{align*}
R_1 &< H(U_1) - H(W_a | Y), \\
R_2 &< H(U_2) - H(W_a | Y),
\end{align*}
where we have used the relation (32) to simplify the expression.

Next, we show the second bound on $\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M})$ by the following steps:
\begin{align}
\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M})
&= \mathrm{P}\{(U_1^n(m_1, l_1), U_2^n(m_2, l_2), Y^n) \in \mathcal{T}_\epsilon^{(n)},\; \mathcal{E}_2^c, \text{ for some } (m_1, l_1, m_2, l_2) \notin \mathcal{S} \,|\, \mathcal{M}\} \nonumber \\
&\leq \mathrm{P}\{(U_1^n(m_1, l_1), U_2^n(m_2, l_2), Y^n) \in \mathcal{T}_\epsilon^{(n)},\; \mathcal{E}_2^c, \text{ for some } (m_1, l_1, m_2, l_2) \neq (0, 0, 0, 0) \,|\, \mathcal{M}\}. \tag{33}
\end{align}
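The relation (32), which is used repeatedly in the proof above, can be verified numerically for any pmf on $\mathbb{F}_q$; the pmf below is a toy choice for illustration.

```python
import math

# Check of relation (32): log q = H(V) + D(p_V || p_q), where p_q is the
# uniform pmf on F_q. The pmf pV below is an arbitrary toy example with q = 4.
q = 4
pV = [0.5, 0.25, 0.125, 0.125]

H = -sum(p * math.log2(p) for p in pV if p > 0)            # entropy H(V)
D = sum(p * math.log2(p / (1 / q)) for p in pV if p > 0)   # D(p_V || p_q)
assert abs((H + D) - math.log2(q)) < 1e-12                 # relation (32)
```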
Define
\begin{align*}
\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) = \{&(U_1^n(m_1, l_1), U_2^n(m_2, l_2), Y^n) \in \mathcal{T}_\epsilon^{(n)}, \\
&U_1^n(0, 0) \in \mathcal{T}_{\epsilon'}^{(n)},\; U_2^n(0, 0) \in \mathcal{T}_{\epsilon'}^{(n)}\},
\end{align*}
and subsets of $[2^{nR_1}] \times [2^{n\hat{R}_1}] \times [2^{nR_2}] \times [2^{n\hat{R}_2}]$ as
\begin{align*}
\mathcal{A} &= \{(m_1, l_1, m_2, l_2) : (m_1, l_1, m_2, l_2) \neq (0, 0, 0, 0)\}, \\
\mathcal{A}_1 &= \{(m_1, l_1, m_2, l_2) : (m_1, l_1) \neq (0, 0),\; (m_2, l_2) = (0, 0)\}, \\
\mathcal{A}_2 &= \{(m_1, l_1, m_2, l_2) : (m_1, l_1) = (0, 0),\; (m_2, l_2) \neq (0, 0)\}, \\
\mathcal{A}_3 &= \{(m_1, l_1, m_2, l_2) : (m_1, l_1) \neq (0, 0),\; (m_2, l_2) \neq (0, 0)\}, \\
\mathcal{L} &= \{(m_1, l_1, m_2, l_2) \in \mathcal{A}_3 : \eta(m_1, l_1),\, \eta(m_2, l_2) \text{ are linearly dependent}\}, \\
\mathcal{L}^c &= \{(m_1, l_1, m_2, l_2) \in \mathcal{A}_3 : \eta(m_1, l_1),\, \eta(m_2, l_2) \text{ are linearly independent}\}.
\end{align*}
Further, for some $b = [b_1, b_2] \in \mathbb{F}_q^2$ such that $b \neq \mathbf{0}$, define
\begin{align*}
\mathcal{L}_1(b) &= \{(m_1, l_1, m_2, l_2) \in \mathcal{L} : b_1 \eta(m_1, l_1) \oplus b_2 \eta(m_2, l_2) = \mathbf{0}\}, \\
\mathcal{L}_2(b) &= \{(m_1, l_1, m_2, l_2) \in \mathcal{L} : b_1 \eta(m_1, l_1) \oplus b_2 \eta(m_2, l_2) \neq \mathbf{0}\}.
\end{align*}
Note that, for any $b \in \mathbb{F}_q^2$ that is not the all-zero vector, we have $\mathcal{A} \subseteq (\mathcal{A}_1 \cup \mathcal{A}_2 \cup \mathcal{A}_3)$, $\mathcal{A}_3 = \mathcal{L} \cup \mathcal{L}^c$, $\mathcal{L} = \mathcal{L}_1(b) \cup \mathcal{L}_2(b)$, and thus, $\mathcal{A} \subseteq (\mathcal{A}_1 \cup \mathcal{A}_2 \cup \mathcal{L}^c \cup \mathcal{L}_1(b) \cup \mathcal{L}_2(b))$. Furthermore, the cardinality of these sets can be upper bounded by
\begin{align}
|\mathcal{A}_1| &\leq 2^{n(R_1 + \hat{R}_1)}, \nonumber \\
|\mathcal{A}_2| &\leq 2^{n(R_2 + \hat{R}_2)}, \nonumber \\
|\mathcal{A}_3| &\leq 2^{n(R_1 + \hat{R}_1 + R_2 + \hat{R}_2)}, \nonumber \\
|\mathcal{L}| &\leq 2^{n \min\{R_1 + \hat{R}_1,\; R_2 + \hat{R}_2\}}\, (q - 1). \tag{34}
\end{align}
Then,
\begin{align}
\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_2^c | \mathcal{M}) &= \mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \text{ for some } (m_1, l_1, m_2, l_2) \in \mathcal{A} \,|\, \mathcal{M}) \nonumber \\
&\leq \sum_{(m_1, l_1, m_2, l_2) \in \mathcal{A}} \mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \,|\, \mathcal{M}) \nonumber \\
&\leq \sum_{(m_1, l_1, m_2, l_2) \in \mathcal{A}_1} \mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \,|\, \mathcal{M}) + \sum_{(m_1, l_1, m_2, l_2) \in \mathcal{A}_2} \mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \,|\, \mathcal{M}) \nonumber \\
&\quad + \sum_{(m_1, l_1, m_2, l_2) \in \mathcal{L}^c} \mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \,|\, \mathcal{M}) + \sum_{(m_1, l_1, m_2, l_2) \in \mathcal{L}_1(b)} \mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \,|\, \mathcal{M}) \nonumber \\
&\quad + \sum_{(m_1, l_1, m_2, l_2) \in \mathcal{L}_2(b)} \mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \,|\, \mathcal{M}). \tag{35}
\end{align}
We establish upper bounds on $\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) | \mathcal{M})$ in the following lemma.

Lemma 2.
Let $\tilde{D}_U = D(p_{U_1} \| p_q) + D(p_{U_2} \| p_q)$. The probability $\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) | \mathcal{M})$ can be upper bounded by considering the following cases. For $(m_1, l_1, m_2, l_2) \in \mathcal{A}_1$,
\begin{equation*}
\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) | \mathcal{M}) \leq 2^{n(\hat{R}_1 + \hat{R}_2)} \times 2^{-n(I(U_1; Y | U_2) + D(p_{U_1} \| p_q) + \tilde{D}_U - \delta(\epsilon))}.
\end{equation*}
For $(m_1, l_1, m_2, l_2) \in \mathcal{A}_2$,
\begin{equation*}
\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) | \mathcal{M}) \leq 2^{n(\hat{R}_1 + \hat{R}_2)} \times 2^{-n(I(U_2; Y | U_1) + D(p_{U_2} \| p_q) + \tilde{D}_U - \delta(\epsilon))}.
\end{equation*}
For $(m_1, l_1, m_2, l_2) \in \mathcal{L}^c$,
\begin{equation*}
\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) | \mathcal{M}) \leq 2^{n(\hat{R}_1 + \hat{R}_2)} \times 2^{-n(I(U_1, U_2; Y) + 2\tilde{D}_U - \delta(\epsilon))}.
\end{equation*}
For $(m_1, l_1, m_2, l_2) \in \mathcal{L}_1(b)$,
\begin{equation*}
\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) | \mathcal{M}) \leq 2^{n(\hat{R}_1 + \hat{R}_2)} \times 2^{-n(I(W_b; Y) + D(p_{W_b} \| p_q) + \tilde{D}_U - \delta(\epsilon))},
\end{equation*}
where $W_b = b_1 U_1 \oplus b_2 U_2$. For $(m_1, l_1, m_2, l_2) \in \mathcal{L}_2(b)$,
\begin{equation*}
\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) | \mathcal{M}) \leq 2^{n(\hat{R}_1 + \hat{R}_2)} \times 2^{-n(I(W_c; Y, W_b) + D(p_{W_c} \| p_q) + \tilde{D}_U - \delta(\epsilon))},
\end{equation*}
for some non-zero vector $c = [c_1, c_2] \in \mathbb{F}_q^2$ that is linearly independent of $b$, where $W_c = c_1 U_1 \oplus c_2 U_2$.

The proof is given in Appendix D. From the cardinality bounds given in (34) and Lemma 2, the probability terms in (35) tend to zero as $n \to \infty$ if
\begin{align}
R_1 + 2\hat{R}_1 + \hat{R}_2 &< I(U_1; Y | U_2) + D(p_{U_1} \| p_q) + \tilde{D}_U - \delta(\epsilon), \tag{36} \\
R_2 + \hat{R}_1 + 2\hat{R}_2 &< I(U_2; Y | U_1) + D(p_{U_2} \| p_q) + \tilde{D}_U - \delta(\epsilon), \tag{37} \\
R_1 + R_2 + 2\hat{R}_1 + 2\hat{R}_2 &< I(U_1, U_2; Y) + 2\tilde{D}_U - \delta(\epsilon), \tag{38} \\
\min\{R_1 + \hat{R}_1,\; R_2 + \hat{R}_2\} + \hat{R}_1 + \hat{R}_2 &< I(W_b; Y) + D(p_{W_b} \| p_q) + \tilde{D}_U - \delta(\epsilon), \tag{39} \\
\min\{R_1 + \hat{R}_1,\; R_2 + \hat{R}_2\} + \hat{R}_1 + \hat{R}_2 &< I(W_c; Y, W_b) + D(p_{W_c} \| p_q) + \tilde{D}_U - \delta(\epsilon). \tag{40}
\end{align}
By choosing the auxiliary rates $\hat{R}_k = D(p_{U_k} \| p_q) + 2\delta(\epsilon')$, $k = 1, 2$, in order to satisfy (29), using the relation (32), and taking $\epsilon \to 0$, we can conclude that any rate pair $(R_1, R_2)$ satisfying
\begin{align}
R_1 &< I(U_1; Y | U_2), \nonumber \\
R_2 &< I(U_2; Y | U_1), \nonumber \\
R_1 + R_2 &< I(U_1, U_2; Y), \nonumber \\
\min\big(R_1 - H(U_1),\; R_2 - H(U_2)\big) &< I(W_b; Y) - H(W_b), \nonumber \\
\min\big(R_1 - H(U_1),\; R_2 - H(U_2)\big) &< I(W_c; Y, W_b) - H(W_c), \tag{41}
\end{align}
for some pmf $p(u_1)\, p(u_2)$, functions $x_1(u_1)$, $x_2(u_2)$, and non-zero linearly independent vectors $b, c \in \mathbb{F}_q^2$ is achievable.

Finally, in Appendix E, we show that the above rate region is equivalent to the rate region $\mathcal{R}_{\mathrm{LMAC}}$, which concludes the proof for the DM-MAC. We now generalize this result to the case where the channel output is real-valued, $\mathcal{Y} = \mathbb{R}$.

Step 2: Real-valued channel outputs
Assume that $\mathcal{Y} = \mathbb{R}$. Let $[y]_j$ denote the output of a uniform quantizer that maps $y \in \mathbb{R}$ to the closest point in $\{-j\Delta, -(j-1)\Delta, \ldots, -\Delta, 0, \Delta, \ldots, (j-1)\Delta, j\Delta\}$, where the step size is $\Delta = 1/\sqrt{j}$. From the proof in Step 1 above, the rate region in Theorem 1 is achievable with $Y$ replaced by $[Y]_j$. Since the real line $\mathbb{R}$ is a standard space according to the nomenclature of [51, Section 1.4], and since as $\Delta \to 0$, the quantization partitions generated by $\Delta\mathbb{Z}$ asymptotically recover the Borel field of the real line, by [51, Lem. 7.18] we have the limits
\begin{align*}
\lim_{j \to \infty} H\big(W_a \,\big|\, [Y]_j\big) &= H(W_a | Y), \\
\lim_{j \to \infty} I\big(X_1; [Y]_j \,\big|\, X_2\big) &= I(X_1; Y | X_2), \\
\lim_{j \to \infty} I\big(X_2; [Y]_j \,\big|\, X_1\big) &= I(X_2; Y | X_1), \\
\lim_{j \to \infty} I\big(X_1, X_2; [Y]_j\big) &= I(X_1, X_2; Y).
\end{align*}
This completes the proof of Theorem 1.
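The quantizer $[y]_j$ used in Step 2 can be sketched as follows. Note that as $j$ grows, the step size $\Delta = 1/\sqrt{j}$ shrinks while the range $j\Delta = \sqrt{j}$ grows, so $[y]_j \to y$ for every fixed $y$; the values below are toy inputs.

```python
import math

def quantize(y, j):
    """Map y to the closest point of {-j*Delta, ..., 0, ..., j*Delta},
    with Delta = 1/sqrt(j), as in Step 2 of the proof."""
    delta = 1.0 / math.sqrt(j)
    idx = round(y / delta)
    idx = max(-j, min(j, idx))      # clamp to the finite range
    return idx * delta

y = 1.7
errs = [abs(quantize(y, j) - y) for j in (4, 64, 1024)]
assert errs[0] >= errs[2]           # finer quantizers approximate y better
```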
B. Proof of Theorem 2
Our approach is to show that integer-linear combinations of bounded integers can be viewed as linear combinations over a sufficiently large, prime-sized finite field. This will enable us to apply Theorem 1.

Let $q$ be a prime number. Consider the finite field $\mathbb{F}_q = \mathbb{Z}/q\mathbb{Z}$, represented by the centered residues
$$\mathbb{F}_q = \left\{ -\frac{q-1}{2}, \ldots, -1, 0, 1, \ldots, \frac{q-1}{2} \right\},$$
where, for $a, b \in \mathbb{F}_q$, the addition and multiplication operations are defined as
$$a \oplus b = [a + b] \bmod q, \qquad ab = [a \cdot b] \bmod q,$$
respectively, with the modulo operation taken over this residue system. That is, $[a] \bmod q = r$, where $r$ is the unique element of the residue system satisfying $a = iq + r$ (over the reals) for some integer $i$.

The next lemma will allow us to translate our integer-linear combinations over $\mathbb{R}$ into linear combinations over $\mathbb{F}_q$.

Lemma 3 (Translation Lemma). Select $a_{\ell,k} \in \mathbb{Z}$, $\ell, k \in [1:K]$, and assume that the $U_k$ take values on a bounded subset of $\mathbb{Z}$. Then, for prime $q$ large enough and $\ell = 1, \ldots, K$, we have that
$$\sum_{k=1}^{K} a_{\ell,k} U_k = \bigoplus_{k=1}^{K} \tilde{a}_{\ell,k} U_k,$$
where $\tilde{a}_{\ell,k} = [a_{\ell,k}] \bmod q$ and the multiplication and summation operations are taken over $\mathbb{R}$ on the left-hand side and over $\mathbb{F}_q$ on the right-hand side.
Proof:
Since the $U_k$'s are bounded, there exists a $\Gamma > 0$ such that $|U_k| \le \Gamma$, $k = 1, \ldots, K$. Select a prime $q$ large enough to satisfy
$$\max\left\{ \max_{\ell,k} |a_{\ell,k}|,\ \Gamma \right\} \le \left\lfloor \sqrt{\frac{q-1}{2K}} \right\rfloor.$$
It follows that $|a_{\ell,k} U_k| \le (q-1)/(2K)$ over $\mathbb{R}$ and that $\tilde{a}_{\ell,k} = a_{\ell,k}$. Therefore, $\big|\sum_{k=1}^{K} a_{\ell,k} U_k\big| \le (q-1)/2$ over $\mathbb{R}$, and the $\bmod\ q$ operation never takes effect in any of the addition or multiplication operations over $\mathbb{F}_q$, i.e.,
$$\bigoplus_{k=1}^{K} a_{\ell,k} U_k = \left[\sum_{k=1}^{K} a_{\ell,k} U_k\right] \bmod q = \sum_{k=1}^{K} a_{\ell,k} U_k.$$

Now, using the Translation Lemma, select $q$ large enough so that
$$a_1 U_1 + a_2 U_2 = \tilde{a}_1 U_1 \oplus \tilde{a}_2 U_2,$$
where the operations on the left-hand side are over $\mathbb{R}$ while those on the right-hand side are over $\mathbb{F}_q$ and $\tilde{a}_k = [a_k] \bmod q$. Invoking Theorem 1 with finite field $\mathbb{F}_q$, input pmf $p_{U_1}(u_1)p_{U_2}(u_2)$, and symbol mappings $x_1(u_1)$ and $x_2(u_2)$, we obtain the desired achievable rate region.

VII. PROOF OF THEOREM 3

Assume that $U_1$ and $U_2$ are compactly supported (an assumption we will relax at the end of the proof by means of a truncation argument). For a given resolution $\Delta > 0$, define
$$\lceil u \rfloor_\Delta = \arg\min_{\tilde{u} \in \Delta\mathbb{Z}} |u - \tilde{u}|$$
to be the quantization of $u$ to the closest point in $\Delta\mathbb{Z}$, ties being broken in an arbitrary way. Now, define the variables
$$W \triangleq a_1 U_1 + a_2 U_2, \qquad W' \triangleq a_1' U_1 + a_2' U_2,$$
$$W_\Delta \triangleq a_1 \lceil U_1 \rfloor_\Delta + a_2 \lceil U_2 \rfloor_\Delta, \qquad W'_\Delta \triangleq a_1' \lceil U_1 \rfloor_\Delta + a_2' \lceil U_2 \rfloor_\Delta,$$
where $a_1' \triangleq a_1/\gcd(\mathbf{a})$ and $a_2' \triangleq a_2/\gcd(\mathbf{a})$ denote the gcd-reduced coefficients. Let $Y_\Delta$ denote the channel output variable induced by the quantized input variables $\lceil U_1 \rfloor_\Delta$ and $\lceil U_2 \rfloor_\Delta$. That is, conditional on $(\lceil U_1 \rfloor_\Delta, \lceil U_2 \rfloor_\Delta) = (u_1, u_2)$, the variable $Y_\Delta$ is distributed with cdf $Y_\Delta \sim F_{Y|U_1,U_2}(\cdot\,|u_1,u_2)$. Note that in Theorem 2, the assumption $\mathcal{U}_k \subset \mathbb{Z}$ can be equivalently replaced by $\mathcal{U}_k \subset \Delta\mathbb{Z}$ for some positive scaling factor $\Delta > 0$ without affecting the achievable rate region (which is invariant under this scaling).
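The quantization $\lceil u \rfloor_\Delta$, the gcd-reduced coefficients, and the scale relation $W_\Delta = \gcd(\mathbf{a})\, W'_\Delta$ used later in the entropy computation can be sketched as follows (function names and the example values are ours):

```python
import math

def quant(u, delta):
    """Round u to the closest point of delta*Z (ties broken by round())."""
    return delta * round(u / delta)

def gcd_reduce(a):
    """Return the gcd-reduced coefficient pair a' = a / gcd(a) and gcd(a)."""
    g = math.gcd(a[0], a[1])
    return (a[0] // g, a[1] // g), g

a = (4, 6)
(a1p, a2p), g = gcd_reduce(a)                 # a' = (2, 3), gcd = 2
u1, u2, delta = 0.73, -1.29, 0.25
W_delta  = a[0] * quant(u1, delta) + a[1] * quant(u2, delta)
Wp_delta = a1p  * quant(u1, delta) + a2p  * quant(u2, delta)
# The unreduced combination is gcd(a) times the reduced one:
assert abs(W_delta - g * Wp_delta) < 1e-12
```

Since quantization and the linear combination commute only up to the lattice spacing, it is the reduced combination $W'_\Delta$, supported on $\Delta\mathbb{Z}$, that behaves well in the entropy limits below.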
Owing to the compact support assumption on $U_1$ and $U_2$, the quantized auxiliaries $\lceil U_1 \rfloor_\Delta$ and $\lceil U_2 \rfloor_\Delta$ are finitely supported for any $\Delta > 0$. Hence the following compute–forward rate region is achievable by Theorem 2:
$$R_1 < H\big(\lceil U_1 \rfloor_\Delta\big) - H\big(W_\Delta \mid Y_\Delta\big),$$
$$R_2 < H\big(\lceil U_2 \rfloor_\Delta\big) - H\big(W_\Delta \mid Y_\Delta\big).$$
We will calculate the limit of this achievable rate region as we take the quantization step $\Delta$ to zero. It suffices to prove the following three statements in order to conclude the proof of Theorem 3:
$$\lim_{\Delta\to 0} \big\{ H(\lceil U_1 \rfloor_\Delta) + \log(\Delta) \big\} = h(U_1), \quad (42a)$$
$$\lim_{\Delta\to 0} \big\{ H(\lceil U_2 \rfloor_\Delta) + \log(\Delta) \big\} = h(U_2), \quad (42b)$$
$$\limsup_{\Delta\to 0} \big\{ H(W_\Delta \mid Y_\Delta) + \log(\Delta) \big\} \le h(W \mid Y) - \log \gcd(\mathbf{a}). \quad (42c)$$
Let us first state a classical result by Rényi.

Lemma 4 ([52, Theorem 1]). Let $X$ be an $\mathbb{R}^K$-valued random vector with an absolutely continuous distribution such that $H(\lceil X \rfloor_1)$ and $h(X)$ are finite. Then
$$\lim_{\Delta\to 0} \big\{ H(\lceil X \rfloor_\Delta) + K \log(\Delta) \big\} = h(X).$$
Note that (42a) and (42b) follow directly from Lemma 4. Next, we will need a recent result of Makkuva and Wu [53].
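Before proceeding, Lemma 4 is easy to check in a simple case. For $X \sim \mathrm{Unif}[0,1)$ (so $h(X) = 0$), round-to-nearest-$\Delta$ quantization gives two boundary cells of probability $\Delta/2$ and interior cells of probability $\Delta$, and a short calculation shows the gap $H(\lceil X \rfloor_\Delta) + \log_2 \Delta$ equals exactly $\Delta$ (entropies in bits). The sketch below (helper name ours; we assume $1/\Delta$ is an integer) computes this gap:

```python
import math

def quantized_entropy_uniform(delta):
    """Exact entropy (in bits) of round-to-nearest-multiple-of-delta
    quantization of X ~ Unif[0, 1); assumes 1/delta is an integer."""
    m = round(1 / delta)
    # cells at 0 and at 1 carry mass delta/2; the m-1 interior cells carry delta
    probs = [delta / 2] + [delta] * (m - 1) + [delta / 2]
    return -sum(p * math.log2(p) for p in probs)

# The gap H + log2(delta) shrinks linearly in delta, approaching h(X) = 0:
for delta in (0.1, 0.01, 0.001):
    print(delta, quantized_entropy_uniform(delta) + math.log2(delta))
```

The linear-in-$\Delta$ gap here is specific to the uniform density; Lemma 4 guarantees only that the gap vanishes for general absolutely continuous $X$.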
Lemma 5 ([53, Lemma 1]). Let $X_1, \ldots, X_K$ be mutually independent, continuous random variables with compact support such that $H(\lceil X_i \rfloor_1)$ and $h(X_i)$ are finite for all $i = 1, \ldots, K$. Then, for relatively prime integer coefficients $(a_1, \ldots, a_K) \in \mathbb{Z}^K$,
$$\lim_{\Delta\to 0} \left\{ H\!\left( \left\lceil \sum_{i=1}^K a_i X_i \right\rfloor_{\!\Delta} \right) - H\!\left( \sum_{i=1}^K a_i \lceil X_i \rfloor_\Delta \right) \right\} = 0.$$

To prove the remaining statement (42c), note that
$$\limsup_{\Delta\to 0} \big\{ H(W_\Delta \mid Y_\Delta) + \log(\Delta) \big\} = \limsup_{\Delta\to 0} \big\{ H(W_\Delta) - I(W_\Delta; Y_\Delta) + \log(\Delta) \big\}$$
$$\le \limsup_{\Delta\to 0} \big\{ H(W_\Delta) + \log(\Delta) \big\} - \liminf_{\Delta\to 0} I(W_\Delta; Y_\Delta). \quad (43)$$
For the first limit, we have
$$\lim_{\Delta\to 0} \big\{ H(W_\Delta) + \log(\Delta) \big\} \overset{(a)}{=} \lim_{\Delta\to 0} \big\{ H(W'_\Delta) + \log(\Delta) \big\} \overset{(b)}{=} \lim_{\Delta\to 0} \big\{ H(\lceil W' \rfloor_\Delta) + \log(\Delta) \big\} \overset{(c)}{=} h(W') = h(W) - \log \gcd(\mathbf{a}), \quad (44)$$
where step (a) follows from the scale invariance of discrete entropy, step (b) is due to Lemma 5, and step (c) is due to Lemma 4.

For the second limit, we will prove that $(W_\Delta, Y_\Delta)$ converges in distribution to $(W, Y)$ as $\Delta \to 0$. This convergence will imply, by the lower semi-continuity of relative entropy [54, Thm. 1], [55, Thm. 19], that
$$\liminf_{\Delta\to 0} I(W_\Delta; Y_\Delta) \ge I(W; Y), \quad (45)$$
which, combined with (43) and (44), will conclude the proof of (42c). To prove this weak convergence property, first observe that the pair of quantized variables $\lceil \mathbf{U} \rfloor_\Delta = (\lceil U_1 \rfloor_\Delta, \lceil U_2 \rfloor_\Delta)$ converges in probability (and hence in distribution) to the unquantized pair $\mathbf{U} = (U_1, U_2)$. Since, by assumption, for almost all $\mathbf{u}$ belonging to the support of $\mathbf{U}$ the family of cdfs $F_{Y|\mathbf{U}}(\cdot\,|\mathbf{u})$ is continuous in $\mathbf{u}$ (in the sense of weak convergence of random variables), it follows by the Portmanteau Theorem [56, Theorem 2.8.1] that for any continuous and bounded $\varphi : \mathbb{R}^3 \to \mathbb{R}$, the associated function
$$\tilde{\varphi}(\mathbf{u}) \triangleq \mathrm{E}\big[\varphi(\mathbf{u}, Y) \,\big|\, \mathbf{U} = \mathbf{u}\big] = \int \varphi(\mathbf{u}, y)\, F_{Y|\mathbf{U}}(\mathrm{d}y \,|\, \mathbf{u})$$
is continuous almost everywhere and bounded.
It further follows that the pair $(\lceil \mathbf{U} \rfloor_\Delta, Y_\Delta)$ converges in distribution to $(\mathbf{U}, Y)$ as $\Delta \to 0$, because for any continuous bounded function $\varphi : \mathbb{R}^3 \to \mathbb{R}$, we have
$$\lim_{\Delta\to 0} \mathrm{E}[\varphi(\lceil \mathbf{U} \rfloor_\Delta, Y_\Delta)] = \lim_{\Delta\to 0} \int \varphi(\mathbf{u}, y)\, F_{\lceil \mathbf{U} \rfloor_\Delta, Y_\Delta}(\mathrm{d}\mathbf{u}, \mathrm{d}y)$$
$$\overset{(a)}{=} \lim_{\Delta\to 0} \int \left( \int \varphi(\mathbf{u}, y)\, F_{Y|\mathbf{U}}(\mathrm{d}y\,|\,\mathbf{u}) \right) F_{\lceil \mathbf{U} \rfloor_\Delta}(\mathrm{d}\mathbf{u}) = \lim_{\Delta\to 0} \int \tilde{\varphi}(\mathbf{u})\, F_{\lceil \mathbf{U} \rfloor_\Delta}(\mathrm{d}\mathbf{u}) \overset{(b)}{=} \mathrm{E}[\varphi(\mathbf{U}, Y)].$$
Here, equality (a) holds by Fubini's Theorem, which is applicable since $\varphi$ is bounded and the integrals are taken with respect to probability measures; equality (b) holds because $\tilde{\varphi}$ is continuous almost everywhere and bounded (as argued above), and $\lceil \mathbf{U} \rfloor_\Delta$ converges in distribution to $\mathbf{U}$, which by assumption is absolutely continuous. In particular, if we set $\varphi$ to be any function of the form $\varphi(\mathbf{u}, y) = \psi(a_1' u_1 + a_2' u_2, y)$ with an arbitrary continuous bounded function $\psi$, it will hold that
$$\lim_{\Delta\to 0} \mathrm{E}[\psi(W'_\Delta, Y_\Delta)] = \lim_{\Delta\to 0} \mathrm{E}[\psi(a_1' \lceil U_1 \rfloor_\Delta + a_2' \lceil U_2 \rfloor_\Delta, Y_\Delta)] = \lim_{\Delta\to 0} \mathrm{E}[\varphi(\lceil \mathbf{U} \rfloor_\Delta, Y_\Delta)] = \mathrm{E}[\varphi(\mathbf{U}, Y)] = \mathrm{E}[\psi(W', Y)].$$
Hence, $(W'_\Delta, Y_\Delta)$ tends in distribution to $(W', Y)$; since $W_\Delta = \gcd(\mathbf{a})\, W'_\Delta$ and $W = \gcd(\mathbf{a})\, W'$, it follows that $(W_\Delta, Y_\Delta)$ tends in distribution to $(W, Y)$, which concludes the proof of (42c).

Thus far, we have proven Theorem 3 for the case where $U_1$ and $U_2$ are compactly supported. To relax this assumption, it suffices to show that for arbitrarily supported $(U_1, U_2)$, the differential entropies $h(U_1)$, $h(U_2)$, and $h(W|Y)$ can be represented as the limiting differential entropies of sequences of compactly supported variables. For this purpose, consider arbitrarily supported variables $U_1 \in \mathbb{R}$ and $U_2 \in \mathbb{R}$ complying with the assumptions set forth by Theorem 3, and their respective truncated versions $\langle U_1 \rangle_\tau$ and $\langle U_2 \rangle_\tau$ with pdfs defined as follows:
$$f_{\langle U_1 \rangle_\tau}(u_1) \triangleq \frac{f_{U_1}(u_1)\, \mathbb{1}\{|u_1| < \tau\}}{\mathrm{P}\{|U_1| < \tau\}}, \qquad f_{\langle U_2 \rangle_\tau}(u_2) \triangleq \frac{f_{U_2}(u_2)\, \mathbb{1}\{|u_2| < \tau\}}{\mathrm{P}\{|U_2| < \tau\}},$$
where $\mathbb{1}\{\cdot\}$ represents the indicator function.
Let us further define $W'_\tau \triangleq a_1' \langle U_1 \rangle_\tau + a_2' \langle U_2 \rangle_\tau$ and let $Y_\tau$ denote the output variable induced by the truncated auxiliaries $\langle U_1 \rangle_\tau$ and $\langle U_2 \rangle_\tau$. That is, conditional on $(\langle U_1 \rangle_\tau, \langle U_2 \rangle_\tau) = (u_1, u_2)$, the variable $Y_\tau$ is distributed as $Y_\tau \sim P_{Y|U_1,U_2}(\cdot\,|u_1,u_2)$. Then the following holds:

Lemma 6 (Truncation). In the limit as $\tau \to \infty$, the following holds:
$$h(U_1) = \lim_{\tau\to\infty} h(\langle U_1 \rangle_\tau), \quad (46a)$$
$$h(U_2) = \lim_{\tau\to\infty} h(\langle U_2 \rangle_\tau), \quad (46b)$$
$$h(W' \mid Y) \ge \limsup_{\tau\to\infty} h(W'_\tau \mid Y_\tau). \quad (46c)$$

Proof:
The first two equalities can be proven by standard arguments; in fact, they follow directly from [53, Lem. 2]. As for the inequality (46c), the joint cdf of $W'_\tau$ and $Y_\tau$ is expressible as
$$F_{W'_\tau, Y_\tau}(w, y) = \mathrm{P}\{W'_\tau \le w, Y_\tau \le y\}$$
$$= \iint_{[-\tau,\tau]^2} \mathrm{P}\{W'_\tau \le w, Y_\tau \le y \mid \langle U_1 \rangle_\tau = u_1, \langle U_2 \rangle_\tau = u_2\}\, f_{\langle U_1 \rangle_\tau}(u_1)\, f_{\langle U_2 \rangle_\tau}(u_2)\, \mathrm{d}u_1 \mathrm{d}u_2$$
$$= \iint_{[-\tau,\tau]^2} \mathbb{1}\{a_1' u_1 + a_2' u_2 \le w\}\, F_{Y|U_1,U_2}(y\,|u_1,u_2)\, \frac{f_{U_1}(u_1)}{\mathrm{P}\{|U_1| < \tau\}}\, \frac{f_{U_2}(u_2)}{\mathrm{P}\{|U_2| < \tau\}}\, \mathrm{d}u_1 \mathrm{d}u_2.$$
(We commit a slight abuse of notation here, since $W'_\Delta$ and $Y_\Delta$, defined earlier, have a different meaning than $W'_\tau$ and $Y_\tau$.) Hence, the joint cdf converges pointwise on the continuity set, because
$$\lim_{\tau\to\infty} F_{W'_\tau, Y_\tau}(w, y) = \iint_{\mathbb{R}^2} \mathbb{1}\{a_1' u_1 + a_2' u_2 \le w\}\, F_{Y|U_1,U_2}(y\,|u_1,u_2)\, f_{U_1}(u_1)\, f_{U_2}(u_2)\, \mathrm{d}u_1 \mathrm{d}u_2 = F_{W',Y}(w, y)$$
for each point $(w, y)$ at which $F_{W',Y}$ is continuous. It follows in particular that the marginals converge weakly, i.e.,
$$\lim_{\tau\to\infty} F_{W'_\tau}(w) = F_{W'}(w), \qquad \lim_{\tau\to\infty} F_{Y_\tau}(y) = F_Y(y)$$
for all $w$ and $y$ that are continuity points of $F_{W'}$ and $F_Y$, respectively. Consequently, the joint distribution and the product distribution of the marginals converge as
$$P_{W'_\tau, Y_\tau} \xrightarrow{\ \tau\to\infty\ } P_{W',Y}, \qquad P_{W'_\tau} \times P_{Y_\tau} \xrightarrow{\ \tau\to\infty\ } P_{W'} \times P_Y$$
in the sense of weak convergence. Since relative entropy is lower semi-continuous in the weak topology [54, Theorem 1], [55, Theorem 19], it follows that
$$\liminf_{\tau\to\infty} I(W'_\tau; Y_\tau) \ge I(W'; Y) = I(W; Y). \quad (47)$$
By [53, Lemma 2], we further know that $\lim_{\tau\to\infty} h(W'_\tau) = h(W')$. It thus follows that
$$\liminf_{\tau\to\infty} \big\{ h(\langle U_1 \rangle_\tau) - h(W'_\tau \mid Y_\tau) \big\} = \liminf_{\tau\to\infty} \big\{ h(\langle U_1 \rangle_\tau) - h(W'_\tau) + I(W'_\tau; Y_\tau) \big\} \ge h(U_1) - h(W' \mid Y),$$
which concludes the proof of (46c) and hence the proof of Lemma 6.

It follows from Lemma 6 that the compactness assumption on the support sets of $U_1$ and $U_2$ can be removed, which establishes Theorem 3.

VIII. PROOFS OF THEOREMS 4 AND 5

A. Proof of Theorem 4
Fix $\mathbb{F}_q$, a pmf $\prod_{k=1}^K p(u_k)$, and functions $x_k(u_k)$, $k \in [1:K]$. The codebook construction and encoding steps follow the nested linear coding architecture in Section V-B.

Decoder.
Let $\epsilon' < \epsilon$. Upon receiving $y^n$, the decoder finds a unique index pair $(s_{\mathbf{a}_1}, s_{\mathbf{a}_2})$ such that
$$(w^n_{\mathbf{a}_1}(s_{\mathbf{a}_1}), w^n_{\mathbf{a}_2}(s_{\mathbf{a}_2}), y^n) \in \mathcal{T}^{(n)}_\epsilon$$
for some $s_{\mathbf{a}_1} \in [2^{n\tilde{R}(\mathbf{a}_1)}]$ and $s_{\mathbf{a}_2} \in [2^{n\tilde{R}(\mathbf{a}_2)}]$, where $w^n_{\mathbf{a}_1}(s_{\mathbf{a}_1})$ and $w^n_{\mathbf{a}_2}(s_{\mathbf{a}_2})$ are defined in (23) and $\tilde{R}(\mathbf{a}_1)$ and $\tilde{R}(\mathbf{a}_2)$ are defined in (21). If there is no such index pair, or more than one, the decoder declares an error.

Analysis of the probability of error.
In the following analysis, we will omit some steps that are simple extensions of the proof steps in the previous section. Let $M_1, \ldots, M_K$ be the chosen messages, $L_1, \ldots, L_K$ be the indices chosen by the encoders, and $S_{\mathbf{a}_1}, S_{\mathbf{a}_2}$ be the indices of the desired linear combinations $W^n_{\mathbf{a}_1}(S_{\mathbf{a}_1}), W^n_{\mathbf{a}_2}(S_{\mathbf{a}_2})$. Then, the decoder makes an error only if one or more of the following events occur:
$$\mathcal{E}_1 = \{ U^n_k(m_k, l_k) \notin \mathcal{T}^{(n)}_{\epsilon'} \text{ for all } l_k, \text{ for some } m_k,\ k \in [1:K] \},$$
$$\mathcal{E}_2 = \{ (W^n_{\mathbf{a}_1}(S_{\mathbf{a}_1}), W^n_{\mathbf{a}_2}(S_{\mathbf{a}_2}), Y^n) \notin \mathcal{T}^{(n)}_\epsilon \},$$
$$\mathcal{E}_3 = \{ (W^n_{\mathbf{a}_1}(s_{\mathbf{a}_1}), W^n_{\mathbf{a}_2}(s_{\mathbf{a}_2}), Y^n) \in \mathcal{T}^{(n)}_\epsilon \text{ for some } (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \neq (S_{\mathbf{a}_1}, S_{\mathbf{a}_2}) \}.$$
Then, by the union of events bound,
$$\mathrm{P}(\mathcal{E}) \le \mathrm{P}(\mathcal{E}_1) + \mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c) + \mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_1^c). \quad (48)$$
By Lemma 9 in Appendix B, the probability $\mathrm{P}(\mathcal{E}_1)$ tends to zero as $n \to \infty$ if
$$\hat{R}_k > D(p_{U_k} \| p_q) + \delta(\epsilon'), \quad k = 1, \ldots, K. \quad (49)$$
Define $\mathcal{M} = \{ M_1 = \cdots = M_K = 0,\ L_1 = \cdots = L_K = 0 \}$ as the event that all messages are zero and the chosen auxiliary indices are zero as well. Note that, conditioned on the event $\mathcal{M}$, the correct indices are zero, $S_{\mathbf{a}_1} = S_{\mathbf{a}_2} = 0$. By the symmetry of the codebook construction and encoding steps, we have that $\mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c) = \mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c \mid \mathcal{M})$ and $\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_1^c) = \mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_1^c \mid \mathcal{M})$. By Lemma 12 in Appendix F and the conditional typicality lemma of [1], $\mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c \mid \mathcal{M})$ tends to zero as $n \to \infty$ if (49) is satisfied. Define
$$\tilde{\mathcal{E}}(s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) = \{ (W^n_{\mathbf{a}_1}(s_{\mathbf{a}_1}), W^n_{\mathbf{a}_2}(s_{\mathbf{a}_2}), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_j(0,0) \in \mathcal{T}^{(n)}_{\epsilon'},\ j \in [1:K] \},$$
and partition the index pairs by
$$\mathcal{A}_0 = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) : (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) = (0, 0) \},$$
$$\mathcal{A}_1 = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) : s_{\mathbf{a}_1} \neq 0,\ s_{\mathbf{a}_2} = 0 \},$$
$$\mathcal{A}_2 = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) : s_{\mathbf{a}_1} = 0,\ s_{\mathbf{a}_2} \neq 0 \},$$
$$\mathcal{A}_3 = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) : s_{\mathbf{a}_1} \neq 0,\ s_{\mathbf{a}_2} \neq 0 \},$$
$$\mathcal{L} = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \in \mathcal{A}_3 : \eta(s_{\mathbf{a}_1}), \eta(s_{\mathbf{a}_2}) \text{ are linearly dependent} \},$$
$$\mathcal{L}^c = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \in \mathcal{A}_3 : \eta(s_{\mathbf{a}_1}), \eta(s_{\mathbf{a}_2}) \text{ are linearly independent} \}.$$
Furthermore, for $\mathbf{b} \in \mathbb{F}_q^2$, $\mathbf{b} \neq \mathbf{0}$, define the sets
$$\mathcal{L}_1(\mathbf{b}) = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \in \mathcal{L} : b_1 \eta(s_{\mathbf{a}_1}) \oplus b_2 \eta(s_{\mathbf{a}_2}) \neq \mathbf{0} \}, \quad (50)$$
$$\mathcal{L}_2(\mathbf{b}) = \{ (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \in \mathcal{L} : b_1 \eta(s_{\mathbf{a}_1}) \oplus b_2 \eta(s_{\mathbf{a}_2}) = \mathbf{0} \}. \quad (51)$$
Note that, for any $\mathbf{b} \in \mathbb{F}_q^2$ that is not the all-zero vector, we have $\mathcal{A}_3 = \mathcal{L} \cup \mathcal{L}^c$ and $\mathcal{L} = \mathcal{L}_1(\mathbf{b}) \cup \mathcal{L}_2(\mathbf{b})$, and thus the set of index pairs outside $\mathcal{A}_0$ is $\mathcal{A}_1 \cup \mathcal{A}_2 \cup \mathcal{L}^c \cup \mathcal{L}_1(\mathbf{b}) \cup \mathcal{L}_2(\mathbf{b})$. Furthermore, the cardinalities of these sets can be upper bounded by
$$|\mathcal{A}_1| \le 2^{n\tilde{R}(\mathbf{a}_1)}, \quad |\mathcal{A}_2| \le 2^{n\tilde{R}(\mathbf{a}_2)}, \quad |\mathcal{A}_3| \le 2^{n(\tilde{R}(\mathbf{a}_1) + \tilde{R}(\mathbf{a}_2))}, \quad |\mathcal{L}| \le q\, 2^{n \min(\tilde{R}(\mathbf{a}_1), \tilde{R}(\mathbf{a}_2))}. \quad (52)$$
Then,
$$\mathrm{P}(\mathcal{E}_3 \cap \mathcal{E}_1^c \mid \mathcal{M}) = \mathrm{P}\{\tilde{\mathcal{E}}(s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \text{ for some } (s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \neq (0,0) \mid \mathcal{M}\}$$
$$\le \sum_{(s_{\mathbf{a}_1},s_{\mathbf{a}_2}) \in \mathcal{A}_1} \mathrm{P}\{\tilde{\mathcal{E}}(s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \mid \mathcal{M}\} + \sum_{(s_{\mathbf{a}_1},s_{\mathbf{a}_2}) \in \mathcal{A}_2} \mathrm{P}\{\tilde{\mathcal{E}}(s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \mid \mathcal{M}\} + \sum_{(s_{\mathbf{a}_1},s_{\mathbf{a}_2}) \in \mathcal{L}^c} \mathrm{P}\{\tilde{\mathcal{E}}(s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \mid \mathcal{M}\}$$
$$\quad + \sum_{(s_{\mathbf{a}_1},s_{\mathbf{a}_2}) \in \mathcal{L}_1(\mathbf{b})} \mathrm{P}\{\tilde{\mathcal{E}}(s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \mid \mathcal{M}\} + \sum_{(s_{\mathbf{a}_1},s_{\mathbf{a}_2}) \in \mathcal{L}_2(\mathbf{b})} \mathrm{P}\{\tilde{\mathcal{E}}(s_{\mathbf{a}_1}, s_{\mathbf{a}_2}) \mid \mathcal{M}\}. \quad (53)$$
Let $\tilde{D}_U = D(p_{U_1}\|p_q) + \cdots + D(p_{U_K}\|p_q)$ and define
$$V_{\mathbf{b}} = b_1 W_{\mathbf{a}_1} \oplus b_2 W_{\mathbf{a}_2}, \qquad V_{\mathbf{c}} = c_1 W_{\mathbf{a}_1} \oplus c_2 W_{\mathbf{a}_2},$$
where $\mathbf{c} = [c_1, c_2] \in \mathbb{F}_q^2$ is a non-zero vector that is linearly independent of $\mathbf{b}$. By the cardinality bounds in (52) and by closely following the steps in Lemma 2 (replacing $U_k$ with $W_{\mathbf{a}_k}$, $k = 1, 2$, replacing $W_{\mathbf{b}}$ with $V_{\mathbf{b}}$, and replacing $W_{\mathbf{c}}$ with $V_{\mathbf{c}}$), the probability terms in (53) tend to zero as $n \to \infty$ if
$$\tilde{R}(\mathbf{a}_1) + \hat{R}_\Sigma < I(W_{\mathbf{a}_1}; Y, W_{\mathbf{a}_2}) + D(p_{W_{\mathbf{a}_1}}\|p_q) + \tilde{D}_U - \delta(\epsilon), \quad (54)$$
$$\tilde{R}(\mathbf{a}_2) + \hat{R}_\Sigma < I(W_{\mathbf{a}_2}; Y, W_{\mathbf{a}_1}) + D(p_{W_{\mathbf{a}_2}}\|p_q) + \tilde{D}_U - \delta(\epsilon), \quad (55)$$
$$\tilde{R}(\mathbf{a}_1) + \tilde{R}(\mathbf{a}_2) + \hat{R}_\Sigma < I(W_{\mathbf{a}_1}, W_{\mathbf{a}_2}; Y) + I(W_{\mathbf{a}_1}; W_{\mathbf{a}_2}) \quad (56)$$
$$\qquad\qquad + D(p_{W_{\mathbf{a}_1}}\|p_q) + D(p_{W_{\mathbf{a}_2}}\|p_q) + \tilde{D}_U - \delta(\epsilon), \quad (57)$$
$$\min\big(\tilde{R}(\mathbf{a}_1), \tilde{R}(\mathbf{a}_2)\big) + \hat{R}_\Sigma < I(V_{\mathbf{b}}; Y) + D(p_{V_{\mathbf{b}}}\|p_q) + \tilde{D}_U - \delta(\epsilon), \quad (58)$$
$$\min\big(\tilde{R}(\mathbf{a}_1), \tilde{R}(\mathbf{a}_2)\big) + \hat{R}_\Sigma < I(V_{\mathbf{c}}; Y, V_{\mathbf{b}}) + D(p_{V_{\mathbf{c}}}\|p_q) + \tilde{D}_U - \delta(\epsilon), \quad (59)$$
where $\hat{R}_\Sigma = \hat{R}_1 + \cdots + \hat{R}_K$. Finally, the rate region in Theorem 4 is established by eliminating the auxiliary rates by choosing $\hat{R}_k = D(p_{U_k}\|p_q) + 2\delta(\epsilon')$, $k \in [1:K]$, to satisfy (49), using the relation (32), following the steps in Appendix E to simplify the rate region expression into a form without $V_{\mathbf{c}}$, and taking $\epsilon \to 0$. This concludes the proof for $(\mathbb{F}, \mathcal{A}) = (\mathbb{F}_q, \mathbb{F}_q)$. Finally, by Lemma 3, we can find a large enough $q$ such that the linear combinations in (12), (13), and (14) can be translated to linear combinations in $(\mathbb{R}, \mathbb{Z})$, which concludes the proof of Remark 5.

B. Proof of Theorem 5
First, note that by the achievability proofs of Theorems 1 and 2, there exists a sequence of nested linear coding architectures with rates $(R_1, R_2)$ that are achievable for computing $(\mathbb{F}_q, \mathbb{F}_q)$ and $(\mathbb{R}, \mathbb{Z}_q)$ linear combinations. Moreover, note that the rate region in Theorem 4 simplifies to $\mathcal{R}_{\mathrm{LMAC}}$ when specialized to the case $K = 2$ with $\mathbf{A}$ the identity matrix. Thus, by the achievability proof of Theorem 4, the same nested linear coding architecture recovers the message pair if the sequence of codes has rate pairs $(R_1, R_2) \in \mathcal{R}_{\mathrm{LMAC}}$. To prove the theorem for $(\mathbb{R}, \mathbb{Z})$ computation codes (Theorem 3), the same quantization method of Section VII applies to the rate region $\mathcal{R}_{\mathrm{LMAC}}$.
IX. CONCLUDING REMARKS
Looking ahead, the framework of joint typicality is a promising approach for exploring the performance of random structured codes. Here, we have generalized prior work on Gaussian compute–forward and developed a compute–forward framework for memoryless MACs where the goal is either to recover a linear combination over $\mathbb{F}_q$ or an integer-linear combination of real-valued codewords. Furthermore, we have analyzed the performance of simultaneous joint typicality decoding for recovering two linear combinations. As discussed in Remark 6, an open problem is to extend our analysis of simultaneous joint typicality decoding from recovering pairs of messages to recovering more than two messages.

ACKNOWLEDGMENTS
The authors would like to thank Aditya Gangrade, Young-Han Kim, Olivier Lévêque, and Or Ordentlich for helpful discussions.

APPENDIX A
JOINT TYPICALITY LEMMA FOR MISMATCHED DISTRIBUTIONS
Lemma 7. Let $X \sim p_X(x)$ and let $\tilde{p}_X(x)$ be another distribution on $\mathcal{X}$ such that $D_X = D(p_X \| \tilde{p}_X) < \infty$. Then, for $x^n \in \mathcal{T}^{(n)}_\epsilon(X)$,
$$2^{-n(D_X + H(X) + \delta(\epsilon))} \le \prod_{i=1}^n \tilde{p}_X(x_i) \le 2^{-n(D_X + H(X) - \delta(\epsilon))}. \quad (60)$$

Proof:
To prove the first statement, observe that $\prod_{i=1}^n \tilde{p}_X(x_i) = \prod_{x\in\mathcal{X}} \tilde{p}_X(x)^{n\pi(x|x^n)}$, where we recall that $\pi(x|x^n)$ is the empirical pmf of $x^n$. Then,
$$\log \tilde{p}_X(x^n) = \sum_{x\in\mathcal{X}} n\pi(x|x^n) \log \tilde{p}_X(x)$$
$$= \sum_{x\in\mathcal{X}} n\big(\pi(x|x^n) - p_X(x) + p_X(x)\big) \log \tilde{p}_X(x)$$
$$= n \sum_{x\in\mathcal{X}} p_X(x) \log \tilde{p}_X(x) - n \sum_{x\in\mathcal{X}} \big(\pi(x|x^n) - p_X(x)\big) \big(-\log \tilde{p}_X(x)\big)$$
$$= -n\big(D(p_X\|\tilde{p}_X) + H(X)\big) - n \sum_{x\in\mathcal{X}} \big(\pi(x|x^n) - p_X(x)\big) \big(-\log \tilde{p}_X(x)\big).$$
Since $x^n \in \mathcal{T}^{(n)}_\epsilon(X)$,
$$\left| \sum_{x\in\mathcal{X}} \big(\pi(x|x^n) - p_X(x)\big) \big(-\log \tilde{p}_X(x)\big) \right| \le \sum_{x\in\mathcal{X}} \big|\pi(x|x^n) - p_X(x)\big| \big(-\log \tilde{p}_X(x)\big) \le -\epsilon \sum_{x\in\mathcal{X}} p_X(x) \log \tilde{p}_X(x) = \epsilon \big(D(p_X\|\tilde{p}_X) + H(X)\big).$$
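Lemma 7 can be sanity-checked numerically. For a sequence whose empirical pmf equals $p_X$ exactly (the $\epsilon = 0$ extreme of typicality), the chain above holds with equality, i.e., $\log_2 \prod_i \tilde{p}_X(x_i) = -n\big(D(p_X\|\tilde{p}_X) + H(X)\big)$. The sketch below (all names ours) verifies this on a toy alphabet:

```python
import math

def entropy(p):
    """Entropy in bits of a pmf given as a dict."""
    return -sum(px * math.log2(px) for px in p.values())

def kl(p, p_tilde):
    """KL divergence D(p || p_tilde) in bits."""
    return sum(p[x] * math.log2(p[x] / p_tilde[x]) for x in p)

p       = {'a': 0.5,  'b': 0.25, 'c': 0.25}   # true pmf
p_tilde = {'a': 0.25, 'b': 0.5,  'c': 0.25}   # mismatched pmf
x_seq = ['a'] * 8 + ['b'] * 4 + ['c'] * 4     # n = 16, empirical pmf exactly p
n = len(x_seq)

log_prob = sum(math.log2(p_tilde[x]) for x in x_seq)
# equality version of (60) for an exactly typical sequence:
assert abs(log_prob + n * (kl(p, p_tilde) + entropy(p))) < 1e-9
```

For $\epsilon > 0$, the empirical pmf may deviate from $p_X$ by a multiplicative $\epsilon$, which is precisely the $\delta(\epsilon)$ slack in (60).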
Lemma 8. Let $(X, Y) \sim p_{X,Y}(x, y)$ and let $\tilde{p}_X(x)$ be another distribution on $\mathcal{X}$ such that $D(p_X\|\tilde{p}_X) < \infty$. Let $\epsilon' < \epsilon$. Then, there exists $\delta(\epsilon) > 0$ that tends to zero as $\epsilon \to 0$ such that the following statements hold:

1) If $\tilde{y}^n$ is an arbitrary sequence and $\tilde{X}^n \sim \prod_{i=1}^n \tilde{p}_X(\tilde{x}_i)$, then
$$\mathrm{P}\{ (\tilde{X}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_\epsilon(X, Y) \} \le 2^{-n(I(X;Y) + D(p_X\|\tilde{p}_X) - \delta(\epsilon))}.$$

2) If $\tilde{y}^n \in \mathcal{T}^{(n)}_{\epsilon'}(Y)$ and $\tilde{X}^n \sim \prod_{i=1}^n \tilde{p}_X(\tilde{x}_i)$, then, for $n$ sufficiently large,
$$\mathrm{P}\{ (\tilde{X}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_\epsilon(X, Y) \} \ge 2^{-n(I(X;Y) + D(p_X\|\tilde{p}_X) + \delta(\epsilon))}.$$

The proof follows from Lemma 7 and standard cardinality bounds on the conditional typical set $\mathcal{T}^{(n)}_\epsilon(X|y^n)$.

APPENDIX B
PACKING AND COVERING LEMMAS FOR MISMATCHED DISTRIBUTIONS
Lemma 9 (Mismatched Covering Lemma). Let $(X, \hat{X}) \sim p_{X,\hat{X}}(x, \hat{x})$ and let $\tilde{p}_{\hat{X}}(\hat{x})$ be a distribution on $\hat{\mathcal{X}}$ such that $D(p_{\hat{X}}\|\tilde{p}_{\hat{X}}) < \infty$. Let $X^n$ be a random sequence with $\lim_{n\to\infty} \mathrm{P}\{X^n \in \mathcal{T}^{(n)}_\epsilon(X)\} = 1$, and let $\tilde{X}^n(m)$, $m \in \mathcal{C}$, where $|\mathcal{C}| \ge 2^{nR}$, be pairwise independent and independent of $X^n$, each distributed according to $\prod_{i=1}^n \tilde{p}_{\hat{X}}(\tilde{x}_i)$. Then, there exists a $\delta(\epsilon)$ that tends to zero as $\epsilon \to 0$ such that
$$\lim_{n\to\infty} \mathrm{P}\{ (X^n, \tilde{X}^n(m)) \notin \mathcal{T}^{(n)}_\epsilon(X, \hat{X}) \text{ for all } m \in \mathcal{C} \} = 0,$$
if $R > I(X; \hat{X}) + D(p_{\hat{X}}\|\tilde{p}_{\hat{X}}) + \delta(\epsilon)$.

Proof: Let $\mathcal{A} = \{ m \in [1:2^{nR}] : (X^n, \tilde{X}^n(m)) \in \mathcal{T}^{(n)}_\epsilon(X, \hat{X}) \}$. Then, by the Chebyshev lemma,
$$\mathrm{P}\{|\mathcal{A}| = 0\} \le \frac{\mathrm{Var}(|\mathcal{A}|)}{(\mathrm{E}|\mathcal{A}|)^2}.$$
For $m \in [1:2^{nR}]$, define the indicator random variables
$$E(m) = \begin{cases} 1 & \text{if } (X^n, \tilde{X}^n(m)) \in \mathcal{T}^{(n)}_\epsilon(X, \hat{X}), \\ 0 & \text{otherwise,} \end{cases}$$
and let $p_1 := \mathrm{P}\{E(1) = 1\}$ and $p_2 := \mathrm{P}\{E(1) = 1, E(2) = 1\} = p_1^2$. Then,
$$\mathrm{E}(|\mathcal{A}|) = \sum_m \mathrm{P}\{(X^n, \tilde{X}^n(m)) \in \mathcal{T}^{(n)}_\epsilon(X, \hat{X})\} = 2^{nR} p_1,$$
$$\mathrm{E}(|\mathcal{A}|^2) = \sum_m \mathrm{P}\{(X^n, \tilde{X}^n(m)) \in \mathcal{T}^{(n)}_\epsilon(X, \hat{X})\} + \sum_m \sum_{m' \neq m} \mathrm{P}\{(X^n, \tilde{X}^n(m)) \in \mathcal{T}^{(n)}_\epsilon(X, \hat{X}),\ (X^n, \tilde{X}^n(m')) \in \mathcal{T}^{(n)}_\epsilon(X, \hat{X})\} \le 2^{nR} p_1 + 2^{2nR} p_1^2.$$
Thus, $\mathrm{Var}(|\mathcal{A}|) \le 2^{nR} p_1$. From Lemma 8, for sufficiently large $n$, we have
$$p_1 \le 2^{-n(I(X;\hat{X}) + D(p_{\hat{X}}\|\tilde{p}_{\hat{X}}) - \delta(\epsilon))}, \qquad p_1 \ge 2^{-n(I(X;\hat{X}) + D(p_{\hat{X}}\|\tilde{p}_{\hat{X}}) + \delta(\epsilon))},$$
and hence,
$$\frac{\mathrm{Var}(|\mathcal{A}|)}{(\mathrm{E}|\mathcal{A}|)^2} \le 2^{-n(R - I(X;\hat{X}) - D(p_{\hat{X}}\|\tilde{p}_{\hat{X}}) - \delta(\epsilon))},$$
which tends to zero as $n \to \infty$ if $R > I(X;\hat{X}) + D(p_{\hat{X}}\|\tilde{p}_{\hat{X}}) + \delta(\epsilon)$.
Lemma 10 (Mismatched Packing Lemma). Let $(X, Y) \sim p_{X,Y}(x, y)$ and let $\tilde{p}_X(x)$ be a distribution on $\mathcal{X}$ such that $D(p_X\|\tilde{p}_X) < \infty$. Let $\tilde{Y}^n$ be an arbitrarily distributed random sequence, and let $\tilde{X}^n(m)$, $m \in \mathcal{C}$, where $|\mathcal{C}| \le 2^{nR}$, be sequences each distributed according to $\prod_{i=1}^n \tilde{p}_X(\tilde{x}_i)$. Further assume that each $\tilde{X}^n(m)$, $m \in \mathcal{C}$, is pairwise independent of $\tilde{Y}^n$, but may be arbitrarily dependent on the other $\tilde{X}^n$ sequences. Then, there exists $\delta(\epsilon)$ that tends to zero as $\epsilon \to 0$ such that
$$\lim_{n\to\infty} \mathrm{P}\{ (\tilde{X}^n(m), \tilde{Y}^n) \in \mathcal{T}^{(n)}_\epsilon(X, Y) \text{ for some } m \in \mathcal{C} \} = 0,$$
if $R < I(X;Y) + D(p_X\|\tilde{p}_X) - \delta(\epsilon)$.

The proof of this lemma follows directly from the union of events bound and Lemma 8.

APPENDIX C
LEMMA 11

Lemma 11.
Let $\mathcal{M} = \{ M_k = 0, L_k = 0,\ k \in [1:K] \}$ and let $\mathcal{A}$ be an arbitrary event that is independent of the event $\{M_1 = 0, \ldots, M_K = 0\}$. Then,
$$\mathrm{P}(\mathcal{A} \mid \mathcal{M}) \le 2^{n(\hat{R}_1 + \cdots + \hat{R}_K)}\, \mathrm{P}(\mathcal{A}).$$

Proof:
From the relation
$$\mathrm{P}(\mathcal{A} \mid \mathcal{M}) = \frac{\mathrm{P}(\mathcal{M} \mid \mathcal{A})}{\mathrm{P}(\mathcal{M})}\, \mathrm{P}(\mathcal{A}),$$
and
$$\mathrm{P}(\mathcal{M} \mid \mathcal{A}) \le \mathrm{P}(M_1 = 0, \ldots, M_K = 0 \mid \mathcal{A}) = \mathrm{P}(M_1 = 0, \ldots, M_K = 0),$$
it is sufficient to show that
$$\mathrm{P}(\mathcal{M}) = \mathrm{P}(L_1 = 0, \ldots, L_K = 0 \mid M_1 = 0, \ldots, M_K = 0)\, \mathrm{P}(M_1 = 0, \ldots, M_K = 0) = \frac{1}{2^{n(\hat{R}_1 + \cdots + \hat{R}_K)}}\, \mathrm{P}(M_1 = 0, \ldots, M_K = 0) = \frac{1}{2^{n(R_1 + \cdots + R_K + \hat{R}_1 + \cdots + \hat{R}_K)}},$$
i.e., that the tuple of messages and indices is uniformly distributed, which follows from the symmetry of the codebook construction. To be precise, in the following we will show that
$$\mathrm{P}(L_1 = 0, \ldots, L_K = 0 \mid M_1 = 0, \ldots, M_K = 0) = \mathrm{P}(L_1 = l_1, \ldots, L_K = l_K \mid M_1 = 0, \ldots, M_K = 0)$$
for every $(l_1, \ldots, l_K) \in [2^{n\hat{R}_1}] \times \cdots \times [2^{n\hat{R}_K}]$.

Let $\tilde{\mathcal{M}} = \{M_1 = 0, \ldots, M_K = 0\}$. Then, we have
$$\mathrm{P}(L_k = l_k, k \in [1:K] \mid \tilde{\mathcal{M}}) = \sum_{u^n_1, \ldots, u^n_K} \sum_{G} \mathrm{P}(\mathsf{G} = G,\ U^n_k(0, l_k) = u^n_k,\ L_k = l_k,\ k \in [1:K] \mid \tilde{\mathcal{M}}), \quad (61)$$
and
$$\mathrm{P}(\mathsf{G} = G,\ U^n_k(0, l_k) = u^n_k,\ L_k = l_k,\ k \in [1:K] \mid \tilde{\mathcal{M}})$$
$$= \mathrm{P}(\mathsf{G} = G,\ \eta(0, l_k) G \oplus \mathsf{D}^n_k = u^n_k,\ L_k = l_k,\ k \in [1:K] \mid \tilde{\mathcal{M}})$$
$$= \mathrm{P}(\mathsf{G} = G,\ \mathsf{D}^n_k = u^n_k \ominus \eta(0, l_k) G,\ L_k = l_k,\ k \in [1:K] \mid \tilde{\mathcal{M}})$$
$$\overset{(a)}{=} \mathrm{P}\big([U^n_k(0, l'_k) = \eta(0, l'_k) G \oplus u^n_k \ominus \eta(0, l_k) G : l'_k \neq l_k],\ \mathsf{G} = G,\ \mathsf{D}^n_k = u^n_k \ominus \eta(0, l_k) G,\ L_k = l_k,\ k \in [1:K] \mid \tilde{\mathcal{M}}\big)$$
$$= \mathrm{P}\big([U^n_k(0, l'_k) = (\eta(0, l'_k) \ominus \eta(0, l_k)) G \oplus u^n_k : l'_k \neq l_k],\ \mathsf{G} = G,\ \mathsf{D}^n_k = u^n_k \ominus \eta(0, l_k) G,\ L_k = l_k,\ k \in [1:K] \mid \tilde{\mathcal{M}}\big)$$
$$= \mathrm{P}\big([\hat{U}^n_k(0, \hat{l}_k) = \eta(0, \hat{l}_k) G \oplus u^n_k : \hat{l}_k \neq 0],\ \mathsf{G} = G,\ \mathsf{D}^n_k = u^n_k,\ L_k = 0,\ k \in [1:K] \mid \tilde{\mathcal{M}}\big)$$
$$= \mathrm{P}(\mathsf{G} = G,\ \mathsf{D}^n_k = u^n_k,\ L_k = 0,\ k \in [1:K] \mid \tilde{\mathcal{M}}), \quad (62)$$
where $\hat{U}^n_k(0, \hat{l}_k)$, $\hat{l}_k \in [2^{n\hat{R}_k}]$, is a permuted codebook of $U^n_k(0, l'_k)$, $l'_k \in [2^{n\hat{R}_k}]$, with respect to $l_k$, such that $\eta(0, \hat{l}_k) = \eta(0, l'_k) \ominus \eta(0, l_k)$, and step (a) follows from the fact that $\mathsf{G}$ and $\mathsf{D}^n_k = u^n_k \ominus \eta(0, l_k) G$ determine the rest of the codewords.
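The symmetry used here rests on a basic fact: a uniform random dither makes every codeword $\nu \mathsf{G} \oplus \mathsf{D}^n$ uniformly distributed over $\mathbb{F}_q^n$, regardless of the index. This can be verified exhaustively for tiny parameters (the parameter choices below are ours, for illustration only):

```python
import itertools
from collections import Counter

q, kappa, n = 2, 2, 2  # tiny field and code dimensions, so we can enumerate

def codeword(nu, G, d):
    """u^n = nu G (+) d^n over F_q."""
    return tuple((sum(nu[i] * G[i][j] for i in range(kappa)) + d[j]) % q
                 for j in range(n))

# Count the codeword of a fixed index nu over all (G, d) drawn uniformly.
counts = Counter()
for G_flat in itertools.product(range(q), repeat=kappa * n):
    G = [G_flat[i * n:(i + 1) * n] for i in range(kappa)]
    for d in itertools.product(range(q), repeat=n):
        counts[codeword((1, 0), G, d)] += 1
```

Every sequence in $\mathbb{F}_q^n$ occurs equally often, exactly as the uniformity argument in the proof requires; the same enumeration with a different fixed index $\nu$ yields the identical histogram.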
Finally, plugging (62) into (61) completes the proof.

APPENDIX D
PROOF OF LEMMA 2

We bound $\mathrm{P}(\tilde{\mathcal{E}}(m_1, l_1, m_2, l_2) \mid \mathcal{M})$ for the following cases.
A. Case $(m_1, l_1, m_2, l_2) \in \mathcal{A}_1$:
$$\mathrm{P}\{(U^n_1(m_1,l_1), U^n_2(0,0), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_1(0,0) \in \mathcal{T}^{(n)}_{\epsilon'} \mid \mathcal{M}\}$$
$$\le \sum_{(u^n_1, u^n_2):\, u^n_1 \in \mathcal{T}^{(n)}_\epsilon,\, u^n_2 \in \mathcal{T}^{(n)}_\epsilon} \mathrm{P}\{(U^n_1(m_1,l_1), u^n_2, Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_1(0,0) = u^n_1,\ U^n_2(0,0) = u^n_2 \mid \mathcal{M}\}$$
$$= \sum_{(u^n_1, u^n_2)} \ \sum_{(\tilde{u}^n_1, y^n):\, (\tilde{u}^n_1, u^n_2, y^n) \in \mathcal{T}^{(n)}_\epsilon} \mathrm{P}\{U^n_1(m_1,l_1) = \tilde{u}^n_1,\ Y^n = y^n,\ U^n_1(0,0) = u^n_1,\ U^n_2(0,0) = u^n_2 \mid \mathcal{M}\}$$
$$\overset{(a)}{=} \sum_{(u^n_1, u^n_2)} \ \sum_{(\tilde{u}^n_1, y^n)} \mathrm{P}\{Y^n = y^n \mid U^n_1(0,0) = u^n_1, U^n_2(0,0) = u^n_2, \mathcal{M}\}\, \mathrm{P}\{U^n_1(m_1,l_1) = \tilde{u}^n_1,\ U^n_1(0,0) = u^n_1,\ U^n_2(0,0) = u^n_2 \mid \mathcal{M}\}$$
$$\overset{(b)}{\le} \sum_{(u^n_1, u^n_2)} \ \sum_{(\tilde{u}^n_1, y^n)} \mathrm{P}\{Y^n = y^n \mid U^n_1(0,0) = u^n_1, U^n_2(0,0) = u^n_2, \mathcal{M}\}\, 2^{n(\hat{R}_1+\hat{R}_2)}\, \mathrm{P}\{U^n_1(m_1,l_1) = \tilde{u}^n_1,\ U^n_1(0,0) = u^n_1,\ U^n_2(0,0) = u^n_2\}$$
$$\overset{(c)}{=} 2^{n(\hat{R}_1+\hat{R}_2)} \sum_{(u^n_1, u^n_2)} \ \sum_{(\tilde{u}^n_1, y^n)} p(y^n | u^n_1, u^n_2)\, p_q(\tilde{u}^n_1)\, p_q(u^n_1)\, p_q(u^n_2)$$
$$\le 2^{n(\hat{R}_1+\hat{R}_2)} \sum_{(u^n_1,u^n_2)} \ \sum_{y^n:\,(u^n_2,y^n)\in\mathcal{T}^{(n)}_\epsilon} p(y^n|u^n_1,u^n_2) \sum_{\tilde{u}^n_1:\,(\tilde{u}^n_1,u^n_2,y^n)\in\mathcal{T}^{(n)}_\epsilon} p_q(\tilde{u}^n_1)\, p_q(u^n_1)\, p_q(u^n_2)$$
$$\le 2^{n(\hat{R}_1+\hat{R}_2)} \sum_{(u^n_1,u^n_2)} \ \sum_{y^n} p(y^n|u^n_1,u^n_2)\, 2^{n(H(U_1|Y,U_2)+\delta(\epsilon))}\, 2^{-n(2H(U_1)+2D(p_{U_1}\|p_q))}\, 2^{-n(H(U_2)+D(p_{U_2}\|p_q))}$$
$$\le 2^{n(\hat{R}_1+\hat{R}_2)} \sum_{(u^n_1,u^n_2)} 2^{n(H(U_1|Y,U_2)+\delta(\epsilon))}\, 2^{-n(2H(U_1)+2D(p_{U_1}\|p_q))}\, 2^{-n(H(U_2)+D(p_{U_2}\|p_q))}$$
$$\le 2^{n(\hat{R}_1+\hat{R}_2)}\, 2^{-n(I(U_1;Y,U_2)+D(p_{U_1}\|p_q)-\delta(\epsilon))}\, 2^{-n(\tilde{D}_U-\delta(\epsilon))}$$
$$\overset{(d)}{=} 2^{n(\hat{R}_1+\hat{R}_2)}\, 2^{-n(I(U_1;Y|U_2)+D(p_{U_1}\|p_q)-\delta(\epsilon))}\, 2^{-n(\tilde{D}_U-\delta(\epsilon))},$$
where step (a) follows from the fact that, conditioned on $\mathcal{M}$, we have the Markov relation $Y^n \to (U^n_1(0,0), U^n_2(0,0)) \to U^n_1(m_1,l_1)$; step (b) follows from Lemma 11; step (c) follows from the independent construction of the dithers $d^n_k$, $k = 1, 2$, and [48, Theorem 1]; and step (d) follows from the independence of $U_1$ and $U_2$.

B. Case $(m_1, l_1, m_2, l_2) \in \mathcal{A}_2$: By symmetry with the case above,
$$\mathrm{P}\{(U^n_1(0,0), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_2(0,0) \in \mathcal{T}^{(n)}_{\epsilon'} \mid \mathcal{M}\} \le 2^{n(\hat{R}_1+\hat{R}_2)}\, 2^{-n(I(U_2;Y|U_1)+D(p_{U_2}\|p_q)-\delta(\epsilon))}\, 2^{-n(\tilde{D}_U-\delta(\epsilon))}.$$

C. Case $(m_1, l_1, m_2, l_2) \in \mathcal{L}^c$:
$$\mathrm{P}\{(U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_{\epsilon'},\ k=1,2 \mid \mathcal{M}\}$$
$$\le \sum_{(u^n_1,u^n_2):\,u^n_1\in\mathcal{T}^{(n)}_\epsilon,\, u^n_2\in\mathcal{T}^{(n)}_\epsilon} \mathrm{P}\{(U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) = u^n_k,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(a)}{=} \sum_{(u^n_1,u^n_2)} \ \sum_{(\tilde{u}^n_1,\tilde{u}^n_2,y^n)\in\mathcal{T}^{(n)}_\epsilon} \mathrm{P}\{Y^n=y^n \mid U^n_k(0,0)=u^n_k, k=1,2, \mathcal{M}\}\, \mathrm{P}\{U^n_k(m_k,l_k)=\tilde{u}^n_k,\ U^n_k(0,0)=u^n_k,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(b)}{\le} \sum_{(u^n_1,u^n_2)} \ \sum_{(\tilde{u}^n_1,\tilde{u}^n_2,y^n)\in\mathcal{T}^{(n)}_\epsilon} \mathrm{P}\{Y^n=y^n \mid U^n_k(0,0)=u^n_k, k=1,2, \mathcal{M}\}\, 2^{n(\hat{R}_1+\hat{R}_2)}\, \mathrm{P}\{U^n_k(m_k,l_k)=\tilde{u}^n_k,\ U^n_k(0,0)=u^n_k,\ k=1,2\}$$
$$\overset{(c)}{=} 2^{n(\hat{R}_1+\hat{R}_2)} \sum_{(u^n_1,u^n_2)} \ \sum_{(\tilde{u}^n_1,\tilde{u}^n_2,y^n)\in\mathcal{T}^{(n)}_\epsilon} p(y^n|u^n_1,u^n_2) \prod_{k=1}^2 p_q(\tilde{u}^n_k)\, p_q(u^n_k)$$
$$\le 2^{n(\hat{R}_1+\hat{R}_2)} \sum_{(u^n_1,u^n_2)} \ \sum_{y^n\in\mathcal{T}^{(n)}_\epsilon} p(y^n|u^n_1,u^n_2) \sum_{(\tilde{u}^n_1,\tilde{u}^n_2):\,(\tilde{u}^n_1,\tilde{u}^n_2,y^n)\in\mathcal{T}^{(n)}_\epsilon} \prod_{k=1}^2 p_q(\tilde{u}^n_k)\, p_q(u^n_k)$$
$$\le 2^{n(\hat{R}_1+\hat{R}_2)} \sum_{(u^n_1,u^n_2)} 2^{n(H(U_1,U_2|Y)+\delta(\epsilon))}\, 2^{-2n(H(U_1)+D(p_{U_1}\|p_q))}\, 2^{-2n(H(U_2)+D(p_{U_2}\|p_q))}$$
$$\le 2^{n(\hat{R}_1+\hat{R}_2)}\, 2^{-n(I(U_1,U_2;Y)-\delta(\epsilon))}\, 2^{-n(2\tilde{D}_U-\delta(\epsilon))},$$
where step (a) follows from the fact that, conditioned on $\mathcal{M}$, we have the Markov relation $Y^n \to (U^n_1(0,0), U^n_2(0,0)) \to (U^n_1(m_1,l_1), U^n_2(m_2,l_2))$, step (b) follows from Lemma 11, and step (c) follows from the independent construction of the dithers $d^n_k$, $k=1,2$, and the statistical independence of linearly independent codewords [48, Theorem 1].

D. Case $(m_1, l_1, m_2, l_2) \in \mathcal{L}_1(\mathbf{b})$: Let $W_{\mathbf{b}} = b_1 U_1 \oplus b_2 U_2$ and let $s_{\mathbf{b}} \in [2^{n\tilde{R}(\mathbf{b})}]$ be the index whose $q$-ary expansion satisfies
$$[\nu(s_{\mathbf{b}})\ \mathbf{0}] = b_1 \eta(m_1,l_1) \oplus b_2 \eta(m_2,l_2). \quad (63)$$
We can also uniquely associate each index $s_{\mathbf{b}}$ with a linear combination of the codewords $W^n_{\mathbf{b}}(s_{\mathbf{b}}) := b_1 U^n_1(m_1,l_1) \oplus b_2 U^n_2(m_2,l_2)$. Then,
$$\mathrm{P}\{(U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_{\epsilon'},\ k=1,2 \mid \mathcal{M}\}$$
$$\le \mathrm{P}\{(U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_\epsilon,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(a)}{=} \mathrm{P}\{(W^n_{\mathbf{b}}(s_{\mathbf{b}}), U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_\epsilon,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(b)}{\le} \mathrm{P}\{(W^n_{\mathbf{b}}(s_{\mathbf{b}}), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_\epsilon,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(c)}{\le} 2^{n(\hat{R}_1+\hat{R}_2)}\, 2^{-n(I(W_{\mathbf{b}};Y)+D(p_{W_{\mathbf{b}}}\|p_q)-\delta(\epsilon))} \prod_{k=1}^2 2^{-n(D(p_{U_k}\|p_q)-\delta(\epsilon))},$$
where step (a) follows from the fact that $W^n_{\mathbf{b}}(s_{\mathbf{b}})$ is a function of $(U^n_1(m_1,l_1), U^n_2(m_2,l_2))$, step (b) follows from the fact that the event $(W^n_{\mathbf{b}}(s_{\mathbf{b}}), U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon$ implies $(W^n_{\mathbf{b}}(s_{\mathbf{b}}), Y^n) \in \mathcal{T}^{(n)}_\epsilon$, and step (c) follows from Lemma 1.

E. Case $(m_1, l_1, m_2, l_2) \in \mathcal{L}_2(\mathbf{b})$: Consider some non-zero vector $\mathbf{c} = [c_1, c_2] \in \mathbb{F}_q^2$ that is linearly independent of $\mathbf{b}$. Define $s_{\mathbf{c}} \in [2^{n\tilde{R}(\mathbf{c})}]$ as the index whose $q$-ary expansion satisfies
$$[\nu(s_{\mathbf{c}})\ \mathbf{0}] = c_1 \eta(m_1,l_1) \oplus c_2 \eta(m_2,l_2), \quad (64)$$
and let
$$W^n_{\mathbf{c}}(s_{\mathbf{c}}) := c_1 U^n_1(m_1,l_1) \oplus c_2 U^n_2(m_2,l_2), \qquad W_{\mathbf{c}} := c_1 U_1 \oplus c_2 U_2.$$
Note that, by definition, for $(m_1,l_1,m_2,l_2) \in \mathcal{L}_2(\mathbf{b})$, $W^n_{\mathbf{b}}(0) = b_1 U^n_1(m_1,l_1) \oplus b_2 U^n_2(m_2,l_2)$. Then,
$$\mathrm{P}\{(U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_{\epsilon'},\ k=1,2 \mid \mathcal{M}\}$$
$$\le \mathrm{P}\{(U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_\epsilon,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(a)}{=} \mathrm{P}\{(W^n_{\mathbf{b}}(0), W^n_{\mathbf{c}}(s_{\mathbf{c}}), U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_\epsilon,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(b)}{\le} \mathrm{P}\{(W^n_{\mathbf{b}}(0), W^n_{\mathbf{c}}(s_{\mathbf{c}}), Y^n) \in \mathcal{T}^{(n)}_\epsilon,\ U^n_k(0,0) \in \mathcal{T}^{(n)}_\epsilon,\ k=1,2 \mid \mathcal{M}\}$$
$$\overset{(c)}{\le} 2^{n(\hat{R}_1+\hat{R}_2)}\, 2^{-n(I(W_{\mathbf{c}};Y,W_{\mathbf{b}})+D(p_{W_{\mathbf{c}}}\|p_q)-\delta(\epsilon))} \prod_{k=1}^2 2^{-n(D(p_{U_k}\|p_q)-\delta(\epsilon))},$$
where step (a) follows from the fact that $W^n_{\mathbf{b}}(0)$ and $W^n_{\mathbf{c}}(s_{\mathbf{c}})$ are deterministic functions of the linear codewords $U^n_1(m_1,l_1), U^n_2(m_2,l_2)$ for $(m_1,l_1,m_2,l_2) \in \mathcal{L}_2(\mathbf{b})$, step (b) follows from the fact that the event $(W^n_{\mathbf{b}}(0), W^n_{\mathbf{c}}(s_{\mathbf{c}}), U^n_1(m_1,l_1), U^n_2(m_2,l_2), Y^n) \in \mathcal{T}^{(n)}_\epsilon$ implies $(W^n_{\mathbf{b}}(0), W^n_{\mathbf{c}}(s_{\mathbf{c}}), Y^n) \in \mathcal{T}^{(n)}_\epsilon$, and step (c) follows from Lemma 1 with $Y^n$ replaced by $(Y^n, W^n_{\mathbf{b}}(0))$.

APPENDIX E
PROOF OF THE EQUIVALENCE OF $\mathcal{R}_{\mathrm{LMAC}}$
AND RATE REGION (41)

Define the rate regions
$$\mathcal{R}_1 = \{ (R_1, R_2) : R_1 < I(X_1;Y|X_2), \quad (65)$$
$$R_2 < I(X_2;Y|X_1), \quad (66)$$
$$R_1 + R_2 < I(X_1,X_2;Y) \}, \quad (67)$$
$$\hat{\mathcal{R}}_2 = \{ (R_1, R_2) : \min(R_1 - H(U_1),\, R_2 - H(U_2)) < I(W_{\mathbf{b}};Y) - H(W_{\mathbf{b}}),$$
$$\min(R_1 - H(U_1),\, R_2 - H(U_2)) < I(W_{\mathbf{c}};Y,W_{\mathbf{b}}) - H(W_{\mathbf{c}}) \},$$
and
$$\hat{\mathcal{R}}_3 = \{ (R_1, R_2) : R_1 < \min\{ I_{\mathrm{CF},1}(\mathbf{b}),\ I(X_1,X_2;Y) - I_{\mathrm{CF},2}(\mathbf{b}) \} \},$$
$$\hat{\mathcal{R}}_4 = \{ (R_1, R_2) : R_2 < \min\{ I_{\mathrm{CF},2}(\mathbf{b}),\ I(X_1,X_2;Y) - I_{\mathrm{CF},1}(\mathbf{b}) \} \},$$
where $I_{\mathrm{CF},k}(\mathbf{b})$, $k = 1, 2$, is defined in (5a) and (5b).

First, note that due to the following inequality between (65) and $\hat{\mathcal{R}}_3$ (and likewise between (66) and $\hat{\mathcal{R}}_4$),
$$H(U_1) - H(U_1|Y,U_2) = H(U_1) - H(W_{\mathbf{b}}|Y,U_2) \ge H(U_1) - H(W_{\mathbf{b}}|Y),$$
we have $\mathcal{R}_{\mathrm{LMAC}} = \mathcal{R}_1 \cap (\hat{\mathcal{R}}_3 \cup \hat{\mathcal{R}}_4)$. Next, note that
$$I(U_1;Y|U_2) = I(U_1, X_1; Y \mid U_2, X_2) = H(Y|U_2,X_2) - H(Y|U_1,X_1,U_2,X_2) = H(Y|X_2) - H(Y|X_1,X_2) = I(X_1;Y|X_2),$$
where we have used the Markov relations $U_2 \to X_2 \to Y$ and $(U_1,U_2) \to (X_1,X_2) \to Y$. Similarly, we have
$$I(U_2;Y|U_1) = I(X_2;Y|X_1), \qquad I(U_1,U_2;Y) = I(X_1,X_2;Y),$$
and thus the rate region in (41) is $\mathcal{R}_1 \cap \hat{\mathcal{R}}_2$. It is therefore sufficient to show that $\hat{\mathcal{R}}_2 = \hat{\mathcal{R}}_3 \cup \hat{\mathcal{R}}_4$.

To this end, first consider $(R_1,R_2) \in \hat{\mathcal{R}}_2$ such that $R_1 - H(U_1) \le R_2 - H(U_2)$. Then, we have $(R_1,R_2) \in \hat{\mathcal{R}}_3$ since
$$R_1 < H(U_1) + I(W_{\mathbf{b}};Y) - H(W_{\mathbf{b}}) = H(U_1) - H(W_{\mathbf{b}}|Y) = I_{\mathrm{CF},1}(\mathbf{b}),$$
and
$$R_1 < H(U_1) + I(W_{\mathbf{c}};Y,W_{\mathbf{b}}) - H(W_{\mathbf{c}}) = H(U_1) + H(U_2) - H(U_2) - H(W_{\mathbf{c}}|Y,W_{\mathbf{b}})$$
$$= H(U_1,U_2) - H(U_2) - H(W_{\mathbf{b}},W_{\mathbf{c}}|Y) + H(W_{\mathbf{b}}|Y)$$
$$= H(U_1,U_2) - H(U_1,U_2|Y) - H(U_2) + H(W_{\mathbf{b}}|Y)$$
$$= I(U_1,U_2;Y) - H(U_2) + H(W_{\mathbf{b}}|Y)$$
$$= I(X_1,X_2;Y) - I_{\mathrm{CF},2}(\mathbf{b}). \quad (68)$$
(68)

Similarly, for $(R_1, R_2) \in \hat{\mathcal{R}}$ such that $R_2 - H(U_2) \le R_1 - H(U_1)$, we have $(R_1, R_2) \in \hat{\mathcal{R}}_2$. Clearly, $\hat{\mathcal{R}} \subseteq (\hat{\mathcal{R}}_1 \cup \hat{\mathcal{R}}_2)$.

To show the inclusion in the other direction, it is sufficient to show the following:
1) for rate tuples $(R_1, R_2) \in \hat{\mathcal{R}}_2$ such that $R_1 - H(U_1) \le R_2 - H(U_2)$,
$$(R_1, R_2) \in \hat{\mathcal{R}}_1, \tag{69}$$
2) and for rate tuples $(R_1, R_2) \in \hat{\mathcal{R}}_1$ such that $R_1 - H(U_1) \ge R_2 - H(U_2)$,
$$(R_1, R_2) \in \hat{\mathcal{R}}_2. \tag{70}$$

We begin by considering the first case and assume that a rate pair $(R_1, R_2)$ satisfies $R_1 - H(U_1) \le R_2 - H(U_2)$ and that $(R_1, R_2) \in \hat{\mathcal{R}}_2$. Since
$$\begin{aligned}
R_1 &\le R_2 - H(U_2) + H(U_1) \\
&\stackrel{(a)}{<} \min\{I(W_b; Y) - H(W_b),\ I(W_c; Y, W_b) - H(W_c)\} + H(U_1) \\
&\stackrel{(b)}{=} \min\{I_{\mathrm{CF},1}(b),\ I(X_1, X_2; Y) - I_{\mathrm{CF},2}(b)\},
\end{aligned}$$
$(R_1, R_2)$ is also included in $\hat{\mathcal{R}}_1$, where step (a) follows from the fact that $(R_1, R_2) \in \hat{\mathcal{R}}_2$ and step (b) uses (68). The second case (70) can be shown in the same manner.

APPENDIX F
MARKOV LEMMA FOR NESTED LINEAR CODES
Without loss of generality, we assume that the message indices are set to zero and focus on the effect of the auxiliary indices. With a slight abuse of notation, we let $\eta(l_k) = [\nu(l_k)\ \mathbf{0}]$ denote the $q$-ary expansion of the index $l_k$ followed by zero padding to length $\kappa = \max_k n\hat{R}_k$.
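As a concrete illustration, the $q$-ary expansion $\nu(\cdot)$ and its zero-padded version $\eta(\cdot)$ can be sketched in a few lines. This is only a minimal sketch: the helper names `nu` and `eta` are ours, not the paper's, and digits are taken least-significant first as one possible convention.

```python
def nu(index, q, length):
    """q-ary expansion of a nonnegative integer into `length` digits
    (least significant digit first), mirroring nu(l_k)."""
    digits = []
    for _ in range(length):
        digits.append(index % q)
        index //= q
    return digits

def eta(index, q, length, kappa):
    """nu(l_k) followed by zero padding to total length kappa,
    mirroring eta(l_k) = [nu(l_k) 0]."""
    return nu(index, q, length) + [0] * (kappa - length)

# Example: the index 11 over F_3, expanded to 3 digits, padded to kappa = 5.
# 11 = 2 + 0*3 + 1*9, so the digits are [2, 0, 1].
print(eta(11, 3, 3, 5))  # -> [2, 0, 1, 0, 0]
```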
Consider a nested linear code
$$\mathcal{C}_k = \{u_k^n(l_k) = \eta(l_k) G \oplus D_k^n : l_k \in [2^{n\hat{R}_k}]\}, \quad k \in [1:K],$$
where $G \in \mathbb{F}_q^{\kappa \times n}$ is the random generator matrix and $D_k^n \in \mathbb{F}_q^n$ are the random dithers. Each entry of $G$ and $D_k^n$ is drawn uniformly and independently from $\mathbb{F}_q$. We denote the realizations of $G$ and $D_k^n$ by $\mathrm{G}$ and $d_k^n$, respectively.

Let $(X, U_1, \ldots, U_K) \sim p(x) \prod_{k=1}^{K} p(u_k | x)$ and consider the following encoding procedure.

Encoding:
For each $x^n \in T_{\epsilon'}^{(n)}$, find an index $l_k \in [2^{n\hat{R}_k}]$ such that
$$(x^n, U_k^n(l_k)) \in T_{\epsilon'}^{(n)}.$$
If there is more than one such index, choose one at random from the available options. If there is none, choose one at random from $[2^{n\hat{R}_k}]$. Define the random variable $L_k$ as the chosen index.

Lemma 12 (Markov Lemma for Nested Linear Codes). For sufficiently small $\epsilon' < \epsilon$ and any $x^n \in T_{\epsilon'}^{(n)}(X)$,
$$\lim_{n \to \infty} P\{(x^n, U_1^n(L_1), \ldots, U_K^n(L_K)) \in T_\epsilon^{(n)}(X, U_1, \ldots, U_K)\} = 1,$$
if $\hat{R}_k > I(U_k; X) + D(p_{U_k} \| p_q) + \delta(\epsilon')$, $k \in [1:K]$.

As noted earlier, the codebooks share a generator matrix, which means that the auxiliary indices $L_1, \ldots, L_K$ are not conditionally independent given $X^n$, even though the target distribution for $U_1, \ldots, U_K$ is conditionally independent given $X$. This precludes a standard application of the Markov lemma [1, Lemma 12.1]. Below, we develop a proof from first principles, beginning with some linear algebra definitions.

To simplify our notation, we define $n_k := n\hat{R}_k / \log(q)$, which allows us to write $l_k \in [q^{n_k}]$ rather than $l_k \in [2^{n\hat{R}_k}]$. Furthermore, let
$$\tilde{G} = \begin{bmatrix} D_1^n \\ \vdots \\ D_K^n \\ G \end{bmatrix},$$
and for some $(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) \in [q^{n_1}] \times \cdots \times [q^{n_K}] \times [q^{n_{j_1}}] \times \cdots \times [q^{n_{j_t}}]$ and $1 \le j_1 < \cdots < j_t \le K$, define
$$H(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) = \begin{bmatrix} e_1 & \eta(l_1) \\ \vdots & \vdots \\ e_K & \eta(l_K) \\ e_{j_1} & \eta(\tilde{l}_{j_1}) \\ \vdots & \vdots \\ e_{j_t} & \eta(\tilde{l}_{j_t}) \end{bmatrix}, \tag{71}$$
where $e_k$ is the $k$th standard basis vector in $\mathbb{F}_q^K$, i.e., its $k$th entry is $1$ while the rest are $0$. We will use the notation $\mathrm{rank}(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t})$ to denote the rank of $H(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t})$. Note that, with this notation at hand, the codeword tuple
$$(U_1^n(l_1), \ldots, U_K^n(l_K), U_{j_1}^n(\tilde{l}_{j_1}), \ldots, U_{j_t}^n(\tilde{l}_{j_t}))$$
can be represented by $H(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) \cdot \tilde{G}$.

We can now state two basic statistical properties of nested linear codes.

Lemma 13 (Uniformity). For any choice of indices $(l_1, \ldots, l_K) \in [q^{n_1}] \times \cdots \times [q^{n_K}]$ and $(u_1^n, \ldots, u_K^n) \in \mathbb{F}_q^n \times \cdots \times \mathbb{F}_q^n$,
$$P\{U_1^n(l_1) = u_1^n, \ldots, U_K^n(l_K) = u_K^n\} = \frac{1}{q^{nK}}.$$

Lemma 13 is a direct consequence of the independent random dithers.
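Lemma 13 can be verified exhaustively for toy parameters. The sketch below (hypothetical helper names; $q = 2$, $n = \kappa = K = 2$, one $q$-ary digit per auxiliary index) enumerates every realization of the shared generator matrix $G$ and the independent dithers, and checks that each pair of codewords occurs with probability exactly $1/q^{nK}$:

```python
import itertools
from collections import Counter

q, n, kappa, K = 2, 2, 2, 2   # toy parameters: one q-ary digit per index

def eta(l):
    # one-digit q-ary expansion of l, zero-padded to length kappa
    return [l % q] + [0] * (kappa - 1)

def codeword(l, G, D):
    # u_k^n(l_k) = eta(l_k) G (+) D_k^n over F_q
    row = eta(l)
    return tuple((sum(row[i] * G[i][j] for i in range(kappa)) + D[j]) % q
                 for j in range(n))

l1, l2 = 1, 0                 # any fixed pair of auxiliary indices
vecs = list(itertools.product(range(q), repeat=n))
counts = Counter()
for G in itertools.product(vecs, repeat=kappa):   # all generator matrices
    for D1 in vecs:                               # all dithers for user 1
        for D2 in vecs:                           # all dithers for user 2
            counts[(codeword(l1, G, D1), codeword(l2, G, D2))] += 1

# Lemma 13: every codeword pair is equally likely, i.e. probability 1/q^{nK}.
total = sum(counts.values())
assert len(counts) == q ** (n * K)
assert all(c == total // q ** (n * K) for c in counts.values())
print(len(counts), "equiprobable codeword pairs")   # -> 16 equiprobable codeword pairs
```

Note that even though both users reuse the same rows of $G$ (the nested structure), the independent dithers alone already make every pair of codewords uniform, which is exactly the content of Lemma 13.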
Lemma 14 (Linear Independence $\Rightarrow$ Statistical Independence). For indices satisfying $\mathrm{rank}(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) = K + t$, the random linear codewords $(U_1^n(l_1), \ldots, U_K^n(l_K), U_{j_1}^n(\tilde{l}_{j_1}), \ldots, U_{j_t}^n(\tilde{l}_{j_t}))$ are statistically independent.
Proof:
For $(u_1^n, \ldots, u_K^n, \tilde{u}_{j_1}^n, \ldots, \tilde{u}_{j_t}^n) \in \mathbb{F}_q^n \times \cdots \times \mathbb{F}_q^n$,
$$\begin{aligned}
&P\{U_1^n(l_1) = u_1^n, \ldots, U_K^n(l_K) = u_K^n,\ U_{j_1}^n(\tilde{l}_{j_1}) = \tilde{u}_{j_1}^n, \ldots, U_{j_t}^n(\tilde{l}_{j_t}) = \tilde{u}_{j_t}^n\} \\
&\stackrel{(a)}{=} \frac{1}{q^{nK}}\, P\{U_{j_1}^n(\tilde{l}_{j_1}) = \tilde{u}_{j_1}^n, \ldots, U_{j_t}^n(\tilde{l}_{j_t}) = \tilde{u}_{j_t}^n \mid U_1^n(l_1) = u_1^n, \ldots, U_K^n(l_K) = u_K^n\} \\
&= \frac{1}{q^{nK}}\, P\{U_{j_1}^n(\tilde{l}_{j_1}) = \tilde{u}_{j_1}^n, \ldots, U_{j_t}^n(\tilde{l}_{j_t}) = \tilde{u}_{j_t}^n \mid D_1^n = u_1^n \ominus \eta(l_1) G, \ldots, D_K^n = u_K^n \ominus \eta(l_K) G\} \\
&= \frac{1}{q^{nK}}\, P\{(\eta(\tilde{l}_{j_1}) \ominus \eta(l_{j_1})) G = \tilde{u}_{j_1}^n \ominus u_{j_1}^n, \ldots, (\eta(\tilde{l}_{j_t}) \ominus \eta(l_{j_t})) G = \tilde{u}_{j_t}^n \ominus u_{j_t}^n \mid D_1^n = u_1^n \ominus \eta(l_1) G, \ldots, D_K^n = u_K^n \ominus \eta(l_K) G\} \\
&\stackrel{(b)}{=} \frac{1}{q^{nK}}\, P\{(\eta(\tilde{l}_{j_1}) \ominus \eta(l_{j_1})) G = \tilde{u}_{j_1}^n \ominus u_{j_1}^n, \ldots, (\eta(\tilde{l}_{j_t}) \ominus \eta(l_{j_t})) G = \tilde{u}_{j_t}^n \ominus u_{j_t}^n\} \\
&\stackrel{(c)}{=} \frac{1}{q^{n(K+t)}},
\end{aligned}$$
where step (a) follows from Lemma 13, step (b) follows from the fact that $G$ and the dithers are independent, and step (c) follows from the fact that $(\eta(\tilde{l}_{j_1}) \ominus \eta(l_{j_1})), \ldots, (\eta(\tilde{l}_{j_t}) \ominus \eta(l_{j_t}))$ are linearly independent due to the assumption that $\mathrm{rank}(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) = K + t$ and [48, Theorem 1].

It will be useful to classify codewords according to the rank of their auxiliary indices. Define the index set of rank $r$ as
$$\mathcal{I}_r := \left\{(l_1, \ldots, l_K, \tilde{l}_1, \ldots, \tilde{l}_K) : \mathrm{rank}(l_1, \ldots, l_K, \tilde{l}_1, \ldots, \tilde{l}_K) = r\right\}.$$
Note that, by definition, $|\mathcal{I}_0| = \cdots = |\mathcal{I}_{K-1}| = 0$ and $|\mathcal{I}_K| = q^{n_1 + \cdots + n_K}$.

Lemma 15.
For $t \in [1:K]$, the size of the index set $\mathcal{I}_{K+t}$ of rank $K + t$ is upper bounded as follows:
$$|\mathcal{I}_{K+t}| \le q^{K^2}\, q^{n_1 + \cdots + n_K} \sum_{1 \le j_1 < \cdots < j_t \le K} q^{n_{j_1} + \cdots + n_{j_t}}.$$

Proof: For $1 \le j_1 < \cdots < j_t \le K$, define the subset
$$\mathcal{I}_{K+t}^{\{j_1, \ldots, j_t\}} := \left\{(l_1, \ldots, l_K, \tilde{l}_1, \ldots, \tilde{l}_K) \in \mathcal{I}_{K+t} : \mathrm{rank}(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) = K + t\right\}. \tag{72}$$
Since every element of $\mathcal{I}_{K+t}$ belongs to at least one such subset,
$$|\mathcal{I}_{K+t}| \le \sum_{1 \le j_1 < \cdots < j_t \le K} |\mathcal{I}_{K+t}^{\{j_1, \ldots, j_t\}}|. \tag{73}$$
The following construction can be used to generate all possible index tuples in $\mathcal{I}_{K+t}^{\{j_1, \ldots, j_t\}}$:
1) Choose $K$ arbitrary indices $(l_1, \ldots, l_K) \in [q^{n_1}] \times \cdots \times [q^{n_K}]$.
2) Choose $t$ indices $(\tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) \in [q^{n_{j_1}}] \times \cdots \times [q^{n_{j_t}}]$ such that $\mathrm{rank}(l_1, \ldots, l_K, \tilde{l}_{j_1}, \ldots, \tilde{l}_{j_t}) = K + t$.
3) For each $\ell \in \{1, \ldots, K\} \setminus \{j_1, \ldots, j_t\}$, choose an index $\tilde{l}_\ell \in [q^{n_\ell}]$ such that the row vector $[e_\ell\ \eta(\tilde{l}_\ell)]$ is a linear combination of the $K + t$ row vectors in (71).
We now upper bound the number of choices in each step of the construction above. First, the number of choices in Step 1) is $q^{n_1 + \cdots + n_K}$. Second, the number of choices in Step 2) is upper bounded by $q^{n_{j_1} + \cdots + n_{j_t}}$. Third, for any $\ell \in \{1, \ldots, K\} \setminus \{j_1, \ldots, j_t\}$, the number of choices for $\tilde{l}_\ell$ is upper bounded by $q^{K+t}$, because $[e_\ell\ \eta(\tilde{l}_\ell)]$ is linearly dependent on the $K + t$ row vectors. As such, the total number of choices in Step 3) is at most $q^{(K+t)(K-t)}$, which is in turn bounded by $q^{K^2}$. The total number of choices leads to the following upper bound,
$$|\mathcal{I}_{K+t}^{\{j_1, \ldots, j_t\}}| \le q^{n_1 + \cdots + n_K}\, q^{n_{j_1} + \cdots + n_{j_t}}\, q^{K^2}.$$
Plugging this into (73) gives us the desired upper bound.

We now bound the probability that the random linear codewords land in certain subsets. It will be useful to define
$$Z_{\mathcal{S}} := \sum_{(l_1, \ldots, l_K)} \mathbb{1}\{(U_1^n(l_1), \ldots, U_K^n(l_K)) \in \mathcal{S}\} \tag{74}$$
to represent the number of codeword tuples that fall in $\mathcal{S}$. Since the codewords are uniformly distributed, the mean of $Z_{\mathcal{S}}$ is $\mu_{\mathcal{S}} = |\mathcal{S}|\, q^{-(Kn - (n_1 + \cdots + n_K))}$.

Lemma 16.
For $k \in [1:K]$, let $\mathcal{S}_k$ be a subset of $\mathbb{F}_q^n$ and let $\mathcal{S}$ be a subset of $\mathcal{S}_1 \times \cdots \times \mathcal{S}_K$. For any $\gamma > 0$, the probability that $Z_{\mathcal{S}}$ deviates from its mean is bounded as follows:
$$P\left\{|Z_{\mathcal{S}} - \mu_{\mathcal{S}}| \ge \gamma\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))}\right\} \tag{75}$$
$$\le \frac{1}{\gamma^2}\left(\frac{q^{Kn - (n_1 + \cdots + n_K)}}{|\mathcal{S}_1| \cdots |\mathcal{S}_K|} + q^{K^2} \sum_{t=1}^{K-1} \sum_{1 \le j_1 < \cdots < j_t \le K}\ \prod_{k \notin \{j_1, \ldots, j_t\}} \frac{q^{n - n_k}}{|\mathcal{S}_k|}\right). \tag{76}$$

Proof: We begin by calculating the variance of $Z_{\mathcal{S}}$,
$$\begin{aligned}
\sigma_{\mathcal{S}}^2 := \mathrm{Var}(Z_{\mathcal{S}}) &= \mathrm{E}(Z_{\mathcal{S}}^2) - \mu_{\mathcal{S}}^2 \\
&= \sum_{l_1, \ldots, l_K, \tilde{l}_1, \ldots, \tilde{l}_K} P\left\{(U_1^n(l_1), \ldots, U_K^n(l_K)) \in \mathcal{S},\ (U_1^n(\tilde{l}_1), \ldots, U_K^n(\tilde{l}_K)) \in \mathcal{S}\right\} - \mu_{\mathcal{S}}^2 \\
&= \sum_{r=K}^{2K} \varphi(\mathcal{I}_r) - \mu_{\mathcal{S}}^2, \tag{77}
\end{aligned}$$
where
$$\varphi(\mathcal{I}) := \sum_{(l_1, \ldots, l_K, \tilde{l}_1, \ldots, \tilde{l}_K) \in \mathcal{I}} P\left\{(U_1^n(l_1), \ldots, U_K^n(l_K)) \in \mathcal{S},\ (U_1^n(\tilde{l}_1), \ldots, U_K^n(\tilde{l}_K)) \in \mathcal{S}\right\}.$$
Note that $(l_1, \ldots, l_K, \tilde{l}_1, \ldots, \tilde{l}_K) \in \mathcal{I}_K$ if and only if $l_k = \tilde{l}_k$ for all $k \in [1:K]$. Therefore,
$$\varphi(\mathcal{I}_K) = \sum_{l_1, \ldots, l_K} P\{(U_1^n(l_1), \ldots, U_K^n(l_K)) \in \mathcal{S}\} = |\mathcal{S}|\, q^{-(Kn - (n_1 + \cdots + n_K))} = \mu_{\mathcal{S}}. \tag{78}$$
Next, by Lemma 14, we observe that, for $(l_1, \ldots, l_K, \tilde{l}_1, \ldots, \tilde{l}_K) \in \mathcal{I}_{2K}$, the resulting random codewords are independent. Therefore,
$$\begin{aligned}
\varphi(\mathcal{I}_{2K}) &= \sum_{(l_1, \ldots, \tilde{l}_K) \in \mathcal{I}_{2K}} P\{(U_1^n(l_1), \ldots, U_K^n(l_K)) \in \mathcal{S}\}\, P\{(U_1^n(\tilde{l}_1), \ldots, U_K^n(\tilde{l}_K)) \in \mathcal{S}\} \\
&= \sum_{(l_1, \ldots, \tilde{l}_K) \in \mathcal{I}_{2K}} \frac{|\mathcal{S}|^2}{q^{2Kn}} = |\mathcal{I}_{2K}|\, \frac{|\mathcal{S}|^2}{q^{2Kn}} \le q^{2(n_1 + \cdots + n_K)}\, \frac{|\mathcal{S}|^2}{q^{2Kn}} = \mu_{\mathcal{S}}^2, \tag{79}
\end{aligned}$$
where the inequality follows from Lemma 15.

For the remaining terms, we use the subsets defined in (72) to obtain a union bound: for $t \in [1:K-1]$,
$$\begin{aligned}
\varphi(\mathcal{I}_{K+t}) &\le \sum_{1 \le j_1 < \cdots < j_t \le K}\ \sum_{(l_1, \ldots, \tilde{l}_K) \in \mathcal{I}_{K+t}^{\{j_1, \ldots, j_t\}}} P\left\{(U_1^n(l_1), \ldots, U_K^n(l_K)) \in \mathcal{S},\ U_{j_s}^n(\tilde{l}_{j_s}) \in \mathcal{S}_{j_s},\ s \in [1:t]\right\} \\
&\le \sum_{1 \le j_1 < \cdots < j_t \le K} |\mathcal{I}_{K+t}^{\{j_1, \ldots, j_t\}}|\, \frac{|\mathcal{S}|}{q^{nK}} \prod_{s=1}^{t} \frac{|\mathcal{S}_{j_s}|}{q^n} \\
&\le q^{K^2}\, \mu_{\mathcal{S}} \sum_{1 \le j_1 < \cdots < j_t \le K}\ \prod_{s=1}^{t} \frac{|\mathcal{S}_{j_s}|\, q^{n_{j_s}}}{q^n}.
\end{aligned}$$
Finally, we obtain the desired upper bound via Chebyshev's inequality,
$$\begin{aligned}
P\left\{|Z_{\mathcal{S}} - \mu_{\mathcal{S}}| \ge \gamma\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))}\right\} &\le \frac{1}{\gamma^2}\left(\frac{q^{Kn - (n_1 + \cdots + n_K)}}{|\mathcal{S}_1| \cdots |\mathcal{S}_K|}\right)^2 \sigma_{\mathcal{S}}^2 \\
&\stackrel{(a)}{\le} \frac{1}{\gamma^2}\left(\frac{q^{Kn - (n_1 + \cdots + n_K)}}{|\mathcal{S}_1| \cdots |\mathcal{S}_K|} + q^{K^2} \sum_{t=1}^{K-1} \sum_{1 \le j_1 < \cdots < j_t \le K}\ \prod_{k \notin \{j_1, \ldots, j_t\}} \frac{q^{n - n_k}}{|\mathcal{S}_k|}\right),
\end{aligned}$$
where step (a) uses $\sigma_{\mathcal{S}}^2 \le \mu_{\mathcal{S}} + \sum_{t=1}^{K-1} \varphi(\mathcal{I}_{K+t})$ together with $\mu_{\mathcal{S}} \le |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))}$.

Lemma 17. Let $V_1, \ldots, V_K$ be random variables that are conditionally independent given the random variable $X$.
Then, for sufficiently small $\epsilon' < \epsilon$ and $x^n \in T_{\epsilon'}^{(n)}(X)$,
$$\lim_{n \to \infty} \frac{\left| T_{\epsilon'}^{(n)}(V_1 | x^n) \times \cdots \times T_{\epsilon'}^{(n)}(V_K | x^n) \cap \left(T_\epsilon^{(n)}(V_1, \ldots, V_K | x^n)\right)^c \right|}{\left| T_{\epsilon'}^{(n)}(V_1 | x^n) \times \cdots \times T_{\epsilon'}^{(n)}(V_K | x^n) \right|} = 0.$$

(This conditional independence assumption does not hold for nested linear codes, which precludes a direct application of the Markov lemma in our achievability proof.)

Proof: Lemma 17 is a simple consequence of Lemma 12.1 in [1] once we have the following relation. For some $x^n \in T_{\epsilon'}^{(n)}(X)$, let $V_1^n, \ldots, V_K^n$ be independent random sequences uniformly distributed over $T_{\epsilon'}^{(n)}(V_1 | x^n), \ldots, T_{\epsilon'}^{(n)}(V_K | x^n)$, respectively. Then,
$$\begin{aligned}
P\{(x^n, V_1^n, \ldots, V_K^n) \in T_\epsilon^{(n)}\} &= \sum_{v_1^n \in T_{\epsilon'}^{(n)}, \ldots, v_K^n \in T_{\epsilon'}^{(n)}} P\{(x^n, v_1^n, \ldots, v_K^n) \in T_\epsilon^{(n)},\ V_1^n = v_1^n, \ldots, V_K^n = v_K^n\} \\
&= \sum_{\substack{v_1^n \in T_{\epsilon'}^{(n)}, \ldots, v_K^n \in T_{\epsilon'}^{(n)} : \\ (x^n, v_1^n, \ldots, v_K^n) \in T_\epsilon^{(n)}}} P\{V_1^n = v_1^n, \ldots, V_K^n = v_K^n\} \\
&= \frac{\left| \left(T_{\epsilon'}^{(n)}(V_1 | x^n) \times \cdots \times T_{\epsilon'}^{(n)}(V_K | x^n)\right) \cap T_\epsilon^{(n)}(V_1, \ldots, V_K | x^n) \right|}{\left| T_{\epsilon'}^{(n)}(V_1 | x^n) \times \cdots \times T_{\epsilon'}^{(n)}(V_K | x^n) \right|}.
\end{aligned}$$
It remains to show that the left-hand side of the relation above tends to $1$. For some $\epsilon_1 < \epsilon_2 < \cdots < \epsilon_K$ where $\epsilon_1 = \epsilon'$ and $\epsilon_K = \epsilon$, we have that
$$\begin{aligned}
P\{(x^n, V_1^n, \ldots, V_K^n) \in T_\epsilon^{(n)}\} &\ge P\{(x^n, V_1^n) \in T_{\epsilon_1}^{(n)},\ (x^n, V_1^n, V_2^n) \in T_{\epsilon_2}^{(n)}, \ldots, (x^n, V_1^n, \ldots, V_K^n) \in T_{\epsilon_K}^{(n)}\} \\
&= \prod_{k=1}^{K} P\{(x^n, V_1^n, \ldots, V_k^n) \in T_{\epsilon_k}^{(n)} \mid (x^n, V_1^n, \ldots, V_{k-1}^n) \in T_{\epsilon_{k-1}}^{(n)}, \ldots, (x^n, V_1^n) \in T_{\epsilon_1}^{(n)}\} \\
&\stackrel{(a)}{\ge} (1 - \delta_n)^{K-1},
\end{aligned}$$
where step (a) follows from $K - 1$ applications of [1, Lemma 12.1] and $\delta_n \to 0$ as $n \to \infty$.

We are now ready to assemble a proof of the Markov Lemma for Nested Linear Codes.
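Before turning to the proof, the mean $\mu_{\mathcal{S}}$ around which Lemma 16 concentrates $Z_{\mathcal{S}}$ can be checked exhaustively for toy parameters. In this sketch (hypothetical names; $q = 2$, $n = 2$, $K = 2$, $n_k = 1$, and arbitrary small stand-ins for the sets $\mathcal{S}_k$), $Z_{\mathcal{S}}$ is computed for every realization of the shared generator matrix and the dithers, and its average matches $\mu_{\mathcal{S}} = |\mathcal{S}|\, q^{-(Kn - (n_1 + \cdots + n_K))}$ exactly:

```python
import itertools

q, n, kappa, K = 2, 2, 2, 2           # toy parameters; n_k = 1, so l_k in {0, 1}

def codeword(l, G, D):
    # u_k^n(l_k) = eta(l_k) G (+) D_k^n with eta(l) = [l, 0],
    # so only row 0 of G contributes
    return tuple((l * G[0][j] + D[j]) % q for j in range(n))

vecs = list(itertools.product(range(q), repeat=n))
S1 = {(0, 0), (0, 1)}                 # stand-ins for the marginally typical sets S_k
S2 = {(0, 0), (1, 0)}
S = set(itertools.product(S1, S2))    # here simply S = S_1 x S_2, so |S| = 4

zs = []
for G in itertools.product(vecs, repeat=kappa):   # all generator matrices
    for D1 in vecs:                               # all dither pairs
        for D2 in vecs:
            # Z_S: number of codeword tuples that fall in S, as in (74)
            z = sum((codeword(l1, G, D1), codeword(l2, G, D2)) in S
                    for l1 in range(q) for l2 in range(q))
            zs.append(z)

mu = len(S) / q ** (K * n - K)        # mu_S with n_1 + n_2 = K digits in total
assert sum(zs) / len(zs) == mu        # the exhaustive average equals mu_S exactly
print("E[Z_S] =", sum(zs) / len(zs))  # -> E[Z_S] = 1.0
```

The exact match of the mean is just Lemma 13 plus linearity of expectation; the substance of Lemma 16 is that, for large $n$, individual realizations of $Z_{\mathcal{S}}$ also concentrate around this mean despite the dependence induced by the shared generator matrix.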
Proof of Lemma 12: Select $\epsilon'$ such that $0 < \epsilon' < \epsilon$. Define
$$\mathcal{S}_k = T_{\epsilon'}^{(n)}(U_k | x^n),$$
$$\mathcal{S} = \left(\mathcal{S}_1 \times \cdots \times \mathcal{S}_K\right) \cap \left(T_\epsilon^{(n)}(U_1, \ldots, U_K | x^n)\right)^c.$$
Also, define the intersection of the codebooks with the marginally typical sets,
$$\mathcal{A} = \left(\mathcal{C}_1 \times \cdots \times \mathcal{C}_K\right) \cap \left(\mathcal{S}_1 \times \cdots \times \mathcal{S}_K\right),$$
as well as the subset that is not jointly typical,
$$\mathcal{B} = \left(\mathcal{C}_1 \times \cdots \times \mathcal{C}_K\right) \cap \mathcal{S} = \mathcal{A} \cap \left(T_\epsilon^{(n)}(U_1, \ldots, U_K | x^n)\right)^c.$$
We need to show that, with high probability, there are many choices of marginally typical codewords (i.e., $|\mathcal{A}|$ is large), but relatively few of them are not jointly typical (i.e., $|\mathcal{B}| / |\mathcal{A}|$ is small).

Define $\mathbf{U}^n = (U_1^n(L_1), \ldots, U_K^n(L_K))$. We have that
$$\begin{aligned}
P\{\mathbf{U}^n \in T_\epsilon^{(n)}(U_1, \ldots, U_K | x^n)\} &\ge P\{\mathbf{U}^n \in (\mathcal{S}_1 \times \cdots \times \mathcal{S}_K) \cap T_\epsilon^{(n)}(U_1, \ldots, U_K | x^n)\} \\
&= P\{\mathbf{U}^n \in \mathcal{S}_1 \times \cdots \times \mathcal{S}_K\} - P\{\mathbf{U}^n \in \mathcal{B}\}.
\end{aligned}$$
The first term is lower bounded as follows:
$$P\{\mathbf{U}^n \in \mathcal{S}_1 \times \cdots \times \mathcal{S}_K\} \ge 1 - \sum_{k=1}^{K} P\{U_k^n(L_k) \notin T_{\epsilon'}^{(n)}(U_k | x^n)\}.$$
By Lemma 9 in Appendix B, each term in the summation tends to zero as $n \to \infty$ since, by assumption, $\hat{R}_k > I(U_k; X) + D(p_{U_k} \| p_q) + \delta(\epsilon')$.

It remains to show that $P\{\mathbf{U}^n \in \mathcal{B}\}$ tends to zero. To this end, for some $\gamma > 0$ to be specified later, define
$$a_n := (1 - \gamma)\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))},$$
$$b_n := \left(|\mathcal{S}| + \gamma\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\right) q^{-(Kn - (n_1 + \cdots + n_K))}.$$
We have that
$$\begin{aligned}
P\{\mathbf{U}^n \in \mathcal{B}\} &\le P\{\mathbf{U}^n \in \mathcal{B} \mid |\mathcal{A}| > a_n, |\mathcal{B}| < b_n\} + P\{\{|\mathcal{A}| > a_n, |\mathcal{B}| < b_n\}^c\} \\
&\le P\{\mathbf{U}^n \in \mathcal{B} \mid |\mathcal{A}| > a_n, |\mathcal{B}| < b_n\} + P\{|\mathcal{A}| \le a_n\} + P\{|\mathcal{B}| \ge b_n\} \\
&< \frac{b_n}{a_n} + P\{|\mathcal{A}| \le a_n\} + P\{|\mathcal{B}| \ge b_n\},
\end{aligned}$$
where the last step is due to the fact that $\mathbf{U}^n$ is uniformly distributed over $\mathcal{A}$ conditioned on $|\mathcal{A}| \ge 1$, combined with the fact that $\mathcal{B} \subset \mathcal{A}$.
The first term can be written as
$$\frac{b_n}{a_n} = \frac{1}{1 - \gamma}\left(\gamma + \frac{|\mathcal{S}|}{|\mathcal{S}_1| \cdots |\mathcal{S}_K|}\right),$$
and we know, from Lemma 17, that $\lim_{n \to \infty} |\mathcal{S}| / (|\mathcal{S}_1| \cdots |\mathcal{S}_K|) = 0$.

For the second and third terms, note that
$$\frac{q^{n - n_k}}{|\mathcal{S}_k|} \le 2^{-n(\hat{R}_k - (I(U_k; X) + D(p_{U_k} \| p_q) + \delta(\epsilon')))},$$
which tends to $0$ as $n \to \infty$. For the remainder of the proof, we will assume $n$ is large enough such that the upper bound (76) from Lemma 16 is at most $\gamma$. Recall that, from (74), $Z_{\mathcal{A}} = |\mathcal{A}|$ and $Z_{\mathcal{B}} = |\mathcal{B}|$. It follows that
$$\begin{aligned}
P\{Z_{\mathcal{A}} \le a_n\} &= P\left\{Z_{\mathcal{A}} - \mu_{\mathcal{A}} \le -\gamma\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))}\right\} \\
&\le P\left\{|Z_{\mathcal{A}} - \mu_{\mathcal{A}}| \ge \gamma\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))}\right\} \le \gamma,
\end{aligned}$$
where the last step follows from Lemma 16. Similarly, we have that
$$\begin{aligned}
P\{Z_{\mathcal{B}} \ge b_n\} &= P\left\{Z_{\mathcal{B}} - \mu_{\mathcal{B}} \ge \gamma\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))}\right\} \\
&\le P\left\{|Z_{\mathcal{B}} - \mu_{\mathcal{B}}| \ge \gamma\, |\mathcal{S}_1| \cdots |\mathcal{S}_K|\, q^{-(Kn - (n_1 + \cdots + n_K))}\right\} \le \gamma.
\end{aligned}$$
Finally, by letting $\gamma$ tend to zero as $n \to \infty$, we obtain the desired result.

REFERENCES

[1] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, UK: Cambridge University Press, 2011.
[2] J. Körner and K. Marton, "How to encode the modulo-two sum of binary sources," IEEE Trans. Inf. Theory, vol. 25, no. 2, pp. 219–221, 1979.
[3] M. P. Wilson, K. Narayanan, H. D. Pfister, and A. Sprintson, "Joint physical layer coding and network coding for bidirectional relaying," IEEE Trans. Inf. Theory, vol. 56, no. 11, pp. 5641–5654, Nov. 2010.
[4] W. Nam, S.-Y. Chung, and Y. H. Lee, "Capacity of the Gaussian two-way relay channel to within 1/2 bit," IEEE Trans. Inf. Theory, vol. 56, no. 11, pp. 5488–5494, Nov. 2010.
[5] B. Nazer and M. Gastpar, "Compute-and-forward: Harnessing interference through structured codes," IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6463–6486, Oct. 2011.
[6] U. Niesen and P. Whiting, "The degrees-of-freedom of compute-and-forward," IEEE Trans. Inf. Theory, vol. 58, no. 8, pp. 5214–5232, Aug. 2012.
[7] Y. Song and N.
Devroye, "Lattice codes for the Gaussian relay channel: Decode-and-forward and compress-and-forward," IEEE Trans. Inf. Theory, vol. 59, no. 8, pp. 4927–4948, Sep. 2013.
[8] S. N. Hong and G. Caire, "Compute-and-forward strategies for cooperative distributed antenna systems," IEEE Trans. Inf. Theory, vol. 59, no. 9, pp. 5227–5243, Sep. 2013.
[9] Z. Ren, J. Goseling, J. H. Weber, and M. Gastpar, "Maximum throughput gain of compute-and-forward for multiple unicast," IEEE Communications Letters, vol. 18, no. 7, pp. 1111–1113, Jul. 2014.
[10] G. Bresler, A. Parekh, and D. N. C. Tse, "The approximate capacity of the many-to-one and one-to-many Gaussian interference channel," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4566–4592, Sep. 2010.
[11] A. S. Motahari, S. Oveis-Gharan, M.-A. Maddah-Ali, and A. K. Khandani, "Real interference alignment: Exploiting the potential of single antenna systems," IEEE Trans. Inf. Theory, vol. 60, no. 8, pp. 4799–4810, Aug. 2014.
[12] U. Niesen and M. A. Maddah-Ali, "Interference alignment: From degrees-of-freedom to constant-gap capacity approximations," IEEE Trans. Inf. Theory, vol. 59, no. 8, pp. 4855–4888, Aug. 2013.
[13] O. Ordentlich, U. Erez, and B. Nazer, "The approximate sum capacity of the symmetric Gaussian K-user interference channel," IEEE Trans. Inf. Theory, vol. 60, no. 6, pp. 3450–3482, Jun. 2014.
[14] I. Shomorony and S. Avestimehr, "Degrees of freedom of two-hop wireless networks: Everyone gets the entire cake," IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2417–2431, May 2014.
[15] V. Ntranos, V. R. Cadambe, B. Nazer, and G. Caire, "Integer-forcing interference alignment," in Proc. IEEE Int. Symp. Inf. Theory, Istanbul, Turkey, Jul. 2013.
[16] A. Padakandla, A. G. Sahebi, and S. S. Pradhan, "An achievable rate region for the three-user interference channel based on coset codes," IEEE Trans. Inf. Theory, vol. 62, no. 3, pp. 1250–1279, Mar. 2016.
[17] D. Krithivasan and S. S.
Pradhan, "Lattices for distributed source coding: Jointly Gaussian sources and reconstruction of a linear function," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5628–5651, Dec. 2009.
[18] ——, "Distributed source coding using Abelian group codes," IEEE Trans. Inf. Theory, vol. 57, no. 3, pp. 1495–1519, Mar. 2011.
[19] A. B. Wagner, "On distributed compression of linear functions," IEEE Trans. Inf. Theory, vol. 57, no. 1, pp. 79–94, Jan. 2011.
[20] D. N. C. Tse and M. A. Maddah-Ali, "Interference neutralization in distributed lossy source coding," in Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, Jun. 2010.
[21] Y. Yang and Z. Xiong, "Distributed compression of linear functions: Partial sum-rate tightness and gap to optimal sum-rate," IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2835–2855, May 2014.
[22] T. Philosof and R. Zamir, "On the loss of single-letter characterization: The dirty multiple access channel," IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2442–2454, Jun. 2009.
[23] T. Philosof, R. Zamir, U. Erez, and A. J. Khisti, "Lattice strategies for the dirty multiple access channel," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5006–5035, Aug. 2011.
[24] I.-H. Wang, "Approximate capacity of the dirty multiple-access channel with partial state information at the encoders," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 2781–2787, May 2012.
[25] A. Padakandla and S. S. Pradhan, "Achievable rate region based on coset codes for multiple access channel with states," 2013, preprint available at http://arxiv.org/abs/1301.5655.
[26] X. He and A. Yener, "Providing secrecy with structured codes: Tools and applications to two-user Gaussian channels," IEEE Trans. Inf. Theory, vol. 60, no. 4, pp. 2121–2138, Apr. 2014.
[27] S. Vatedka, N. Kashyap, and A. Thangaraj, "Secure compute-and-forward in a bidirectional relay," IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2531–2556, May 2015.
[28] J. Xie and S. Ulukus, "Secure degrees of freedom of one-hop wireless networks," IEEE Trans.
Inf. Theory, vol. 60, no. 6, pp. 3359–3378, Jun. 2014.
[29] B. Nazer and R. Zamir, "Gaussian networks," 2014, appears as Ch. 12 in [58].
[30] A. Padakandla and S. S. Pradhan, "Achievable rate region for three user discrete broadcast channel based on coset codes," 2012, preprint available at http://arxiv.org/abs/1207.3146.
[31] O. Ordentlich, U. Erez, and B. Nazer, "Successive integer-forcing and its sum-rate optimality," in Proc. 51st Ann. Allerton Conf. Comm. Control Comput., Monticello, IL, Oct. 2013, pp. 282–292.
[32] B. Nazer, V. Cadambe, V. Ntranos, and G. Caire, "Expanding the compute-and-forward framework: Unequal powers, signal levels, and multiple linear combinations," IEEE Trans. Inf. Theory, to appear 2016, preprint available at http://arxiv.org/abs/1504.01690.
[33] O. Ordentlich and U. Erez, "On the robustness of lattice interference alignment," IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2735–2759, May 2013.
[34] J. Zhu and M. Gastpar, "Asymmetric compute-and-forward with CSIT," in International Zurich Seminar on Communications, 2014.
[35] ——, "Compute-and-forward using nested linear codes for the Gaussian MAC," in Proc. IEEE Inf. Theory Workshop, Apr. 2015, pp. 1–5.
[36] R. Ahlswede, "Group codes do not achieve Shannon's channel capacity for general discrete channels," The Annals of Mathematical Statistics, pp. 224–240, 1971.
[37] S. Miyake, "Coding theorems for point-to-point communication systems using sparse matrix codes," Ph.D. Thesis, University of Tokyo, Tokyo, Japan, 2010.
[38] A. Padakandla and S. S. Pradhan, "Computing the sum of sources over an arbitrary multiple access channel," in Proc. IEEE Int. Symp. Inf. Theory, Istanbul, Turkey, 2013.
[39] S. I. Gelfand and M. S. Pinsker, "Coding for channel with random parameters," Probl. Control Inf. Theory, vol. 9, no. 1, pp. 19–31, 1980.
[40] K. Marton, "A coding theorem for the discrete memoryless broadcast channel," IEEE Trans. Inf. Theory, vol. 25, no. 3, pp. 306–311, 1979.
[41] P.
Minero, S. H. Lim, and Y.-H. Kim, "A unified approach to hybrid coding," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1509–1523, Apr. 2015.
[42] C. E. Shannon, "Channels with side information at the transmitter," IBM J. Res. Develop., vol. 2, no. 4, pp. 289–293, 1958.
[43] A. Orlitsky and J. R. Roche, "Coding for computing," IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 903–917, 2001.
[44] J. Zhu and M. Gastpar, "Multiple access via compute-and-forward," 2014, preprint available at http://arxiv.org/abs/1407.8463.
[45] C. Feng, D. Silva, and F. Kschischang, "An algebraic approach to physical-layer network coding," IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7576–7596, Nov. 2013.
[46] J. Zhan, B. Nazer, U. Erez, and M. Gastpar, "Integer-forcing linear receivers," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7661–7685, Dec. 2014.
[47] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, 1948.
[48] Y. Domb, R. Zamir, and F. Meir, "The random coding bound is tight for average linear code or lattice," 2013, preprint available at http://arxiv.org/abs/1307.5524v2.
[49] T. M. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Inf. Theory, vol. 25, no. 5, pp. 572–584, Sep. 1979.
[50] B. Schein and R. G. Gallager, "The Gaussian parallel relay channel," in Proc. IEEE Int. Symp. Inf. Theory, Sorrento, Italy, Jun. 2000, p. 22.
[51] R. M. Gray, Entropy and Information Theory, 2nd ed. Boston, MA: Springer US, 2011.
[52] A. Rényi, "On the dimension and entropy of probability distributions," Acta Mathematica Academiae Scientiarum Hungarica, vol. 10, no. 1, pp. 193–215, Mar. 1959.
[53] A. V. Makkuva and Y. Wu, "On additive-combinatorial affine inequalities for Shannon entropy and differential entropy," 2016, preprint available at http://arxiv.org/abs/1601.07498.
[54] E. Posner, "Random coding strategies for minimum entropy," IEEE Trans. Inf. Theory, vol. 21, no. 4, pp.
388–391, Jul. 1975.
[55] T. v. Erven and P. Harremoës, "Rényi divergence and Kullback-Leibler divergence," IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 3797–3820, Jul. 2014.
[56] R. B. Ash and C. A. Doléans-Dade, Probability and Measure Theory, 2nd ed. Elsevier/Academic Press, 2000.
[57] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge, UK: Cambridge University Press, 2011.
[58] R. Zamir, Lattice Coding for Signals and Networks. Cambridge University Press, 2014.