Achievable Information Rates for Probabilistic Amplitude Shaping: An Alternative Approach via Random Sign-Coding Arguments
Yunus Can Gültekin, Alex Alvarado, and Frans M. J. Willems
Information and Communication Theory Laboratory (ICT Lab)
Signal Processing Systems (SPS) Group, Department of Electrical Engineering
Eindhoven University of Technology, The Netherlands
Emails: {y.c.g.gultekin, a.alvarado, f.m.j.willems}@tue.nl

Abstract
Probabilistic amplitude shaping (PAS) is a coded modulation strategy in which constellation shaping and channel coding are combined. PAS has attracted considerable attention in both wireless and optical communications. Achievable information rates (AIRs) of PAS have been investigated in the literature using Gallager's error exponent approach. In particular, it has been shown that PAS achieves the capacity of the additive white Gaussian noise channel (Böcherer, 2018). In this work, we revisit the capacity-achieving property of PAS and derive AIRs using weak typicality. Our objective is to provide alternative proofs based on random sign-coding arguments that are as constructive as possible. Accordingly, in our proofs, only some signs of the channel inputs are drawn from a random code, while the remaining signs and amplitudes are produced constructively. We consider both symbol-metric and bit-metric decoding.
Index Terms
Probabilistic amplitude shaping, achievable information rate, random coding, symbol-metric decoding, bit-metric decoding.
I. INTRODUCTION
Coded modulation (CM) refers to the design of forward error correction (FEC) codes and high-order modulation formats, which are combined to reliably transmit more than one bit per channel use. Examples of CM strategies include multilevel coding (MLC) [1], [2], in which each address bit of the signal point is protected by an individual binary FEC code, and trellis CM [3], which combines the functions of a trellis-based channel code and a modulator. Among many CM strategies, bit-interleaved CM (BICM) [4], [5], which combines a high-order modulation format with a binary FEC code using a binary labeling strategy and uses bit-metric decoding (BMD) at the receiver, is the de-facto standard for CM. BICM is included in multiple wireless communication standards such as IEEE 802.11 [6] and DVB-S2 [7]. BICM is also currently the de-facto CM alternative for fiber optical communications.

Proposed in [8], probabilistic amplitude shaping (PAS) integrates constellation shaping into existing BICM systems. The shaping gap that exists for the additive white Gaussian noise (AWGN) channel [9, Ch. 9] can be closed with PAS. To this end, an amplitude shaping block converts binary information strings into shaped amplitude sequences in an invertible manner. Then, a systematic FEC code produces parity bits encoding the binary labels of these amplitudes. These parity bits are used to select the signs, and the combination of the amplitudes and the signs, i.e., probabilistically shaped channel inputs, is transmitted over the channel. PAS has attracted considerable attention in fiber optical communications due to its ability to provide rate adaptivity [10], [11].

Achievable information rates (AIRs) of PAS have been investigated in the literature [12], [13], [14]. It has been shown that the capacity of the AWGN channel can be achieved with PAS, e.g., in [13, Example 10.4]. The achievability proofs in the literature are based on Gallager's error exponent approach [15, Ch. 5] or on strong typicality [16, Ch. 1].

In this work, we provide a random sign-coding framework based on weak typicality that contains the achievability proofs relevant for the PAS architecture. We also revisit the capacity-achieving property of PAS for the AWGN channel. As explained in Section II-E, the first main contribution of this paper is to provide a framework that combines the constructive approach to amplitude shaping with randomly-chosen error-correcting codes, where the randomness is concentrated only in the choice of the signs. The second contribution is to provide a unifying framework of achievability proofs to bring together PAS results that are somewhat scattered in the literature, using a single proof technique, which we call the random sign-coding arguments.

This work is organized as follows. In Section II, we briefly summarize the related literature on CM, AIRs, and PAS and state our contribution. In Section III, we provide some background information on typical sequences and define a modified (weakly) typical set. In Section IV, we explain the random sign-coding setup. Finally, in Section V, we provide random sign-coding arguments to derive AIRs for PAS and, consequently, show that it achieves the capacity of a discrete-input memoryless channel with a symmetric capacity-achieving distribution. Conclusions are drawn in Section VI.

Acknowledgements: The work of Y. C. Gültekin and A. Alvarado has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 757791).

II. RELATED WORK AND OUR CONTRIBUTION
A. Notation
Capital letters X are used to denote random variables, while lower case letters x are used to denote their realizations. Underlined capital and lower case letters X and x are used to denote random vectors and their realizations, respectively. Boldface capital and lower case letters X and x are used to denote collections of random variables and their realizations, respectively. Underlined boldface capital and lower case letters X and x are used to denote collections of random vectors and their realizations, respectively. Element-wise multiplication of x and y is denoted by x ⊗ y. Calligraphic letters X represent sets, while XY = {xy : x ∈ X, y ∈ Y}. We denote by X^n the n-fold Cartesian product of X with itself, while X × Y is the Cartesian product of X and Y. Probability density and mass functions over X are denoted by p(x). We use [·] to indicate the indicator function, which is one when its argument is true and zero otherwise. The entropy of X is denoted by H(X) (in bits), the expected value of X by E[X].

B. Achievable Information Rates
For a memoryless channel that is characterized by an input alphabet X, input distribution p(x), and channel law p(y|x), the maximum AIR is the mutual information (MI) I(X;Y) of the channel input X and output Y. Consequently, the capacity of this channel is defined as I(X;Y) maximized over all possible input distributions p(x), typically under an average power constraint, e.g., in [9, Sec. 9.1]. The MI can be achieved, e.g., with MLC and multi-stage decoding [1], [2].

In BICM systems, channel inputs are uniquely labeled with log₂|X| = (m+1)-bit binary strings. Here, we assume that |X| is an integer power of two. At the transmitter, the output of a binary FEC code is mapped to channel inputs using this labeling strategy. At the receiver, BMD is employed, i.e., binary labels C = (C₁, C₂, ..., C_{m+1}) are assumed to be independent, and consequently, the symbol-wise decoding metric is written as the product of bit-metrics:

q(x, y) = ∏_{i=1}^{m+1} q_i(c_i, y).   (1)

Since the metric in (1) is in general not proportional to p(y|x), i.e., there is a mismatch between the actual channel law and the one assumed at the receiver, this setup is called mismatched decoding.

Different AIRs have been derived for this so-called mismatched decoding setup. One of these is the generalized MI (GMI) [17], [18]:

GMI(p(x)) = max_{s≥0} E[ log₂( [q(X,Y)]^s / Σ_{x∈X} p(x)[q(x,Y)]^s ) ],   (2)

which reduces to [19, Thm. 4.11, Coroll. 4.12] and [20]:

GMI(p(c₁)p(c₂)···p(c_{m+1})) = Σ_{i=1}^{m+1} I(C_i; Y)   (3)

when the bit levels are independent at the transmitter, i.e., p(x) = p(c) = p(c₁)p(c₂)···p(c_{m+1}) where c = (c₁, c₂, ..., c_{m+1}), and:

q_i(c_i, y) = p(y|c_i).   (4)

The rate (3) is achievable for both uniform and shaped bit levels [5], [21]. The problem of computing the bit level distributions that maximize the GMI in (3) was shown to be nonconvex in [22]. The parameter that maximizes (2) to obtain (3) is s = 1.

Another AIR for mismatched decoding is the LM (lower bound on the mismatch capacity) rate [18], [23]:

LM(p(x)) = max_{s≥0, r(·)} E[ log₂( [q(X,Y)]^s r(X) / Σ_{x∈X} p(x)[q(x,Y)]^s r(x) ) ],   (5)

where r(·) is a real-valued cost function defined on X. The expectations in (2) and (5) are taken with respect to p(x, y). When there is dependence among bit levels, i.e., p(x) = p(c) ≠ p(c₁)p(c₂)···p(c_{m+1}), the rate [24], [25]:

R_BMD(p(x)) = H(C) − Σ_{i=1}^{m+1} H(C_i | Y)   (6)

has been shown to be achievable by BMD for any joint input distribution p(c) = p(c₁, c₂, ..., c_{m+1}). In [24], [25], the achievability of (6) was derived using random coding arguments based on strong typicality [16, Ch. 1]. Later in [26, Lemma 1], it was shown that (6) is an instance of the so-called LM rate (5) for s = 1, the symbol decoding metric (1), bit decoding metrics (4), and the cost function:

r(c₁, c₂, ..., c_{m+1}) = ∏_{i=1}^{m+1} p(c_i) / p(c₁, c₂, ..., c_{m+1}).   (7)

We note here that R_BMD in (6) can be negative, as discussed in [26, Sec. II-B]. In such cases, R_BMD cannot be considered as an achievable rate. To avoid this, R_BMD is defined as the maximum of (6) and zero in [26, Eq. (1)].

[Fig. 1. Probabilistic amplitude shaping with transmission rate R = k/n + γ bit/1D.]
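The quantities in (6) are straightforward to evaluate numerically. The following sketch computes R_BMD = H(C) − Σᵢ H(Cᵢ|Y) for a toy discrete channel and checks it against the MI I(X;Y); the 4-ary input distribution, 2-bit labels, and transition matrix are illustrative choices, not from the paper:

```python
import numpy as np

# Toy evaluation of (6): R_BMD = H(C) - sum_i H(C_i | Y).
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # 2-bit labels c = (c1, c2)
p_x = np.array([0.4, 0.3, 0.2, 0.1])                 # input distribution with dependent bit levels
P = np.array([[0.91, 0.03, 0.03, 0.03],
              [0.03, 0.91, 0.03, 0.03],
              [0.03, 0.03, 0.91, 0.03],
              [0.03, 0.03, 0.03, 0.91]])             # p(y|x), rows sum to 1

def H(p):
    """Entropy in bits of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

p_xy = p_x[:, None] * P                        # joint distribution p(x, y)
p_y = p_xy.sum(axis=0)
I_XY = H(p_x) + H(p_y) - H(p_xy.ravel())       # I(X;Y) = H(X) + H(Y) - H(X,Y)

R_BMD = H(p_x)                                 # H(C) = H(X): the labeling is one-to-one
for i in range(labels.shape[1]):
    # p(c_i, y): marginalize x over the two level sets of bit i
    p_ciy = np.array([p_xy[labels[:, i] == b].sum(axis=0) for b in (0, 1)])
    R_BMD -= H(p_ciy.ravel()) - H(p_y)         # H(C_i | Y) = H(C_i, Y) - H(Y)

print(R_BMD, I_XY)
```

Since Σᵢ H(Cᵢ|Y) ≥ H(C|Y), the computed R_BMD never exceeds I(X;Y), consistent with (6) being a lower bound tied to BMD.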
C. Probabilistic Amplitude Shaping: Model
PAS [8] is a capacity-achieving CM strategy in which constellation shaping and FEC coding are combined as shown in Figure 1. In PAS, first an amplitude shaping block maps k-bit information strings to n-amplitude shaped sequences a = (a₁, a₂, ..., a_n) in an invertible manner. These amplitudes are drawn from a 2^m-ary alphabet A. The amplitude shaping block can be realized using constant composition distribution matching [27], multiset-partition distribution matching [28], shell mapping [29], enumerative sphere shaping [30], etc.

After n amplitudes are generated, the binary labels c₁c₂···c_m of the amplitudes a and an additional γn-bit information string s_i = (s₁, s₂, ..., s_{γn}) are fed to a rate (m+γ)/(m+1) systematic FEC encoder. The encoder produces (1−γ)n parity bits s_p = (s_{γn+1}, s_{γn+2}, ..., s_n). The additional data bits s_i and the parity bits s_p are used as the signs s = (s₁, s₂, ..., s_n) for the amplitudes a. Finally, the probabilistically shaped channel inputs x = s ⊗ a are transmitted through the channel. Here, γ is the rate of the additional information in bits per symbol (bit/1D) or, equivalently, the fraction of signs that are selected directly by data bits. The transmission rate of PAS is R = k/n + γ in bit/1D.

D. Probabilistic Amplitude Shaping: Achievable Rates
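This dataflow can be sketched in a few lines. In the toy code below, all parameters are illustrative: a random parity matrix stands in for a real systematic FEC code, and the shaped amplitude sequence is simply sampled rather than produced by a distribution matcher:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy PAS dataflow for 4-ASK (m = 1 amplitude bit per symbol).
n, gamma = 8, 0.25
n_i = int(gamma * n)                    # number of signs chosen directly by data bits
amps = rng.choice([1, 3], size=n, p=[0.75, 0.25])  # shaped amplitudes, A = {1, 3}
b = (amps == 3).astype(int)             # binary amplitude labels

s_i = rng.integers(0, 2, size=n_i)      # gamma*n additional information bits
info = np.concatenate([b, s_i])         # systematic input of the toy "FEC code"
G_p = rng.integers(0, 2, size=(info.size, n - n_i))  # random parity part (stand-in)
s_p = info @ G_p % 2                    # (1 - gamma)*n parity bits

sign_bits = np.concatenate([s_i, s_p])  # s = (s_i, s_p)
x = (1 - 2 * sign_bits) * amps          # channel input x = s (x) a
print(x)
```

The rate bookkeeping matches the text: k/n bit/1D enter through the amplitudes and γ bit/1D through the directly chosen signs, giving R = k/n + γ.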
Based on Gallager's error exponent approach [15, Ch. 5], AIRs of PAS were investigated in [12], [13], [14]. In [12], a random code ensemble was considered from which the channel inputs x were drawn. Then, the AIR in [12, Eqs. (32)–(34)] was derived for a general memoryless decoding metric q(x, y). It was shown that by properly selecting q(x, y), I(X;Y) and the rate (6) can be recovered from the derived AIR, and consequently, they can be achieved with PAS.

Computing error exponents for PAS was also the main concern of the work presented in [13, Ch. 10]. The difference from [12] was in the random coding setup. In [13, Ch. 10], a random code ensemble was considered from which only the signs s of the channel inputs were drawn at random. We call this the random sign-coding setup. The error exponent [13, Eq. (10.42)] was then derived again for a general memoryless decoding metric. Error exponents of PAS have also been examined based on the joint source-channel coding (JSCC) setup in [14], [31]. Random sign-coding was considered in [14], [31], but only with symbol-metric decoding (SMD) and only for the specific case where γ = 0.

E. Our Contribution
In this work, we derive AIRs of PAS in a random sign-coding framework based on weak typicality [9, Secs. 3.1, 7.6, and 15.2]. We first consider basic sign-coding in which amplitudes of the channel inputs are generated constructively while the signs are drawn from a randomly generated code. Basic sign-coding corresponds to PAS with γ = 0. Then, we consider modified sign-coding in which only some of the signs are drawn from the random code while the remaining are chosen directly by information bits. Modified sign-coding corresponds to PAS with 0 < γ < 1. We compute AIRs for both SMD and BMD.

Our first objective is to provide alternative proofs of achievability in which the codes are generated as constructively as possible. In our random sign-coding experiment, both the amplitude sequences (a) and the sign sequence parts (s_i) that are information bits are constructively produced, and only the remaining signs (s_p) are randomly generated, as illustrated in Figure 2. In most proofs of Shannon's channel coding theorem, channel input sequences (x) are drawn at random, and the existence of a good code is demonstrated. Therefore, these proofs are not constructive and cannot be used to identify good codes as discussed, e.g., in [32, Sec. I] and the references therein. On the other hand, in our proofs using random sign-coding arguments, it is self-evident how at least a part of the code should be constructed. Our second objective is to provide a unified framework in which all possible PAS scenarios are considered, i.e., SMD or BMD at the receiver with 0 ≤ γ < 1, and corresponding AIRs are determined using a single technique, i.e., the random sign-coding argument.

Note that our approach differs from the random sign-coding setup considered in [13], [14], where all signs (s_i and s_p) were generated randomly, which was called partially systematic encoding in [13, Ch. 10]. We will show later that only s_p needs to be chosen randomly. Furthermore, we define a special type of typicality (B-typicality; see Definition 1 below) that allows us to avoid the mismatched JSCC approach of [14].

[Fig. 2. The scope of the random coding experiments considered in this work and in [12], [13], [14].]

III. PRELIMINARIES
A. Memoryless Channels
We consider communication over a memoryless channel with discrete input X ∈ X and discrete output Y ∈ Y. The channel law is given by:

p(y|x) = ∏_{i=1}^{n} p(y_i|x_i).   (8)

Later in Example 1, we will also discuss the AWGN channel Y = X + Z where Z is zero-mean Gaussian with variance σ². In this case, we assume that the channel output Y is a quantized version of the continuous channel output X + Z. Furthermore, we assume that this quantization has a resolution high enough that the discrete-output channel is an accurate model for the underlying continuous-output channel. Therefore, the achievability results we will obtain for discrete memoryless channels carry over to the discrete-input AWGN channel.

B. Typical Sequences
We will provide achievability proofs based on weak typicality. In this section, which is based on [9, Secs. 3.1, 7.6, and 15.2], we formally define weak typicality and list its properties that will be used in this paper.

Let ε > 0 and n be a positive integer. Consider the random variable X with probability distribution p(x). Then, the (weak) typical set A_ε^n(X) of length-n sequences with respect to p(x) is defined as:

A_ε^n(X) ≜ { x ∈ X^n : | −(1/n) log₂ p(x) − H(X) | ≤ ε },   (9)

where:

p(x) ≜ ∏_{i=1}^{n} p(x_i).   (10)

The cardinality of the typical set A_ε^n(X) satisfies [9, Thm. 3.1.2]:

(1 − ε) 2^{n(H(X)−ε)} ≤ |A_ε^n(X)| ≤ 2^{n(H(X)+ε)},   (11)

where the first inequality (a) holds for n sufficiently large and the second inequality (b) holds for all n. For x ∈ A_ε^n(X), the probability of occurrence can be bounded as [9, Eq. (3.6)]:

2^{−n(H(X)+ε)} ≤ p(x) ≤ 2^{−n(H(X)−ε)}.   (12)
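The concentration behind (9)–(12) is easy to observe empirically. The following sketch (alphabet and parameters are illustrative) estimates how often an i.i.d. sequence lands in A_ε^n(X):

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical check of (9): -(1/n) log2 p(x) concentrates around H(X).
p = np.array([0.5, 0.25, 0.25])
H_X = float(-np.sum(p * np.log2(p)))      # H(X) = 1.5 bits
n, eps, trials = 2000, 0.05, 500

hits = 0
for _ in range(trials):
    x = rng.choice(len(p), size=n, p=p)   # i.i.d. sequence drawn from p(x)
    neg_log = -np.sum(np.log2(p[x])) / n  # -(1/n) log2 p(x)
    hits += abs(neg_log - H_X) <= eps     # membership test for A_eps^n(X)
frac = hits / trials
print(frac)   # close to 1 for this n and eps, as the AEP predicts
```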
The idea of typical sets can be generalized for pairs of n-sequences. Now, consider the pair of random variables (X, Y) with probability distribution p(x, y). Then, the typical set A_ε^n(XY) of pairs of length-n sequences with respect to p(x, y) is defined as:

A_ε^n(XY) ≜ { (x, y) ∈ X^n × Y^n : |−(1/n) log₂ p(x) − H(X)| ≤ ε, |−(1/n) log₂ p(y) − H(Y)| ≤ ε, |−(1/n) log₂ p(x, y) − H(X, Y)| ≤ ε },   (13)

where:

p(x, y) ≜ ∏_{i=1}^{n} p(x_i, y_i),   (14)

and where p(x) and p(y) are the marginal distributions that correspond to p(x, y). The cardinality of the typical set A_ε^n(XY) satisfies [9, Thm. 7.6.1]:

|A_ε^n(XY)| ≤ 2^{n(H(X,Y)+ε)}   (15)

for all n. For (x, y) ∈ A_ε^n(XY), the probability of occurrence can be bounded in a similar manner to (12) as:

2^{−n(H(X,Y)+ε)} ≤ p(x, y) ≤ 2^{−n(H(X,Y)−ε)}.   (16)

Along the same lines, joint typicality can be extended for collections of n-sequences (X₁, X₂, ..., X_m), and the corresponding typical set A_ε^n(X₁X₂···X_m) can be defined similar to how (9) was extended to (13). Then, for (x₁, x₂, ..., x_m) ∈ A_ε^n(X₁X₂···X_m), the probability of occurrence can be bounded in a similar manner to (16) as:

2^{−n(H(𝐗)+ε)} ≤ p(x₁, x₂, ..., x_m) ≤ 2^{−n(H(𝐗)−ε)},   (17)

where 𝐗 = (X₁, X₂, ..., X_m).

Finally, we fix x. The conditional (weak) typical set A_ε^n(Y|x) of length-n sequences is defined as:

A_ε^n(Y|x) = { y : (x, y) ∈ A_ε^n(XY) }.   (18)

In other words, A_ε^n(Y|x) is the set of all y sequences that are jointly typical with x. For x ∈ A_ε^n(X) and for sufficiently large n, the cardinality of the conditional typical set A_ε^n(Y|x) satisfies [9, Thm. 15.2.2]:

|A_ε^n(Y|x)| ≤ 2^{n(H(Y|X)+2ε)}.   (19)

Definition 1 (B-typicality). Let the input probability distribution p(u) together with the transition probability distribution p(v|u) determine the joint probability distribution p(u, v) = p(u)p(v|u). Now, we define:

B_{V,ε}^n(U) ≜ { u : u ∈ A_ε^n(U) and Pr{(u, V) ∈ A_ε^n(UV) | U = u} ≥ 1 − ε },   (20)

where V is the output sequence of a "channel" p(v|u) when sequence u is input.

The set B_{V,ε}^n(U) in (20) guarantees that a sequence u in this B-typical set will with high probability lead to a sequence v that is jointly typical with u. We note that U and/or V can be composite. The set B_{V,ε}^n(U) has three properties, as stated in Lemma 1, the proof of which is given in Appendix A.

Lemma 1 (B-typicality properties). The set B_{V,ε}^n(U) in Definition 1 has the following properties:

P1: For u ∈ B_{V,ε}^n(U),
2^{−n(H(U)+ε)} ≤ p(u) ≤ 2^{−n(H(U)−ε)}.   (21)

P2: For n large enough,
Σ_{u ∉ B_{V,ε}^n(U)} p(u) ≤ ε.

P3: |B_{V,ε}^n(U)| ≤ 2^{n(H(U)+ε)} holds for all n, while |B_{V,ε}^n(U)| ≥ (1 − ε) 2^{n(H(U)−ε)} holds for n large enough.

IV. RANDOM SIGN-CODING EXPERIMENT
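The conditional probability in Definition 1 can be probed by Monte Carlo. In the sketch below (all parameters illustrative), the "channel" is binary symmetric with crossover δ; a uniform u then satisfies the two marginal conditions of (13) with equality, so only the joint condition needs to be estimated:

```python
import numpy as np

rng = np.random.default_rng(7)

# Estimate Pr{(u, V) in A_eps^n(UV) | U = u} from Definition 1 for a
# binary symmetric "channel" V = U xor Z with crossover delta.
n, eps, delta, trials = 1000, 0.1, 0.11, 400
Hb = float(-(delta * np.log2(delta) + (1 - delta) * np.log2(1 - delta)))
H_UV = 1.0 + Hb                      # U uniform: H(U, V) = H(U) + H(V|U)

u = rng.integers(0, 2, size=n)       # -(1/n) log2 p(u) = 1 = H(U) exactly
joint_typical = 0
for _ in range(trials):
    v = u ^ (rng.random(n) < delta)
    flips = int(np.count_nonzero(v != u))
    # -(1/n) log2 p(u, v); p(v) is also uniform here, so both marginal
    # conditions in (13) hold automatically
    neg_log = 1.0 - (flips * np.log2(delta) + (n - flips) * np.log2(1 - delta)) / n
    joint_typical += abs(neg_log - H_UV) <= eps
prob = joint_typical / trials
print(prob)   # an estimate >= 1 - eps certifies this u as B-typical
```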
We consider 2^{m+1}-ary amplitude shift keying (M-ASK) alphabets X = {−M+1, −M+3, ..., M−1} where M = 2^{m+1}. We note that X is symmetric around the origin and can be factorized as X = SA. Here, S = {−1, +1} and A = {+1, +3, ..., M−1} are the sign and amplitude alphabets, respectively. Accordingly, any channel input x ∈ X can be written as the multiplication of a sign and an amplitude, i.e., x = s ⊗ a.
[Fig. 3. Sign-coding structure: sign-coding (coder) is combined with amplitude shaping (shaper). SMD, symbol-metric decoding; BMD, bit-metric decoding.]
[Fig. 4. Shaping layer of the random sign-coding setup with SMD.]
A. Random Sign-Coding Setup
We cast the PAS structure shown in Figure 1 as a sign-coding structure as in Figure 3. The sign-coding setup consists of two layers: a shaping layer and a coding layer.
Definition 2 (Sign-coding). For every message index pair (m_a, m_s), with uniform m_a ∈ {1, 2, ..., M_a} and uniform m_s ∈ {1, 2, ..., M_s}, a sign-coding structure as shown in Figure 3 consists of the following.

• A shaping layer that produces for every message index m_a a length-n shaped amplitude sequence a(m_a), where the mapping is one-to-one. The set of amplitude sequences is assumed to be shaped, but uncoded.

• An additional n₁-bit (uniform) information string in the form of a sign sequence part s′(m_s) = (s₁(m_s), s₂(m_s), ..., s_{n₁}(m_s)) for every message index m_s.

• A coding layer that extends the sign sequence part s′(m_s) by adding a second (uniform) sign sequence part s′′(m_a, m_s) = (s_{n₁+1}(m_a, m_s), s_{n₁+2}(m_a, m_s), ..., s_n(m_a, m_s)) of length n₂ for all m_a and m_s. This is obtained by using an encoder that produces redundant signs in the set S from a(m_a) and s′(m_s). Here, n₁ + n₂ = n and γ = n₁/n.

Finally, the transmitted sequence is x(m_a, m_s) = a(m_a) ⊗ s(m_a, m_s), where s(m_a, m_s) = (s′(m_s), s′′(m_a, m_s)). The sign-coding setup with n₁ = 0 (γ = 0) is called basic sign-coding, while the setup with n₁ > 0 (γ > 0) is called modified sign-coding.
When SMD is employed at the receiver, the shaping layer is as shown in Figure 4. Here, let A be distributed with p(a) over a ∈ A. Then, the shaper produces for every message index m_a a length-n amplitude sequence a(m_a) ∈ B_{SY,ε}^n(A). We note that for this sign-coding setup, the rate is:

R = (1/n) log₂(M_a M_s) = γ + (1/n) log₂ |B_{SY,ε}^n(A)| ≥ H(A) + γ − ε,   (22)

where the inequality in (22) follows for n large enough from P3.

On the other hand, when BMD is used at the receiver, the shaping layer is as shown in Figure 5. Here, let B = (B₁, B₂, ..., B_m) be distributed with p(b) = p(b₁, b₂, ..., b_m) over (b₁, b₂, ..., b_m) ∈ {0, 1}^m. The shaper produces for every message index m_a an n-sequence of m-tuples b(m_a) = (b₁(m_a), b₂(m_a), ..., b_m(m_a)) ∈ B_{SY,ε}^n(B₁B₂···B_m). Then, each m-tuple is mapped to an amplitude sequence a(m_a) by a symbol-wise mapping function f(·). We note that for this sign-coding setup, the rate is:

R = (1/n) log₂(M_a M_s) = γ + (1/n) log₂ |B_{SY,ε}^n(B)| ≥ H(B) + γ − ε,   (23)

where the inequality in (23) follows for n large enough from P3.

To realize f(·), we label the channel inputs with (m+1)-bit strings. The amplitude is addressed by m amplitude bits (B₁, B₂, ..., B_m), while the sign is addressed by a sign bit S. The symbol-wise mapping function f(·) in Figure 5 uses the addressing (B₁, B₂, ..., B_m) ⇔ A. We emphasize that unlike the case in Section II-B, we use (S, B₁, B₂, ..., B_m) to denote a channel input instead of (C₁, C₂, ..., C_{m+1}). Amplitudes and signs of x ∈ X are tabulated for 8-ASK in Table I along with an example of the mapping function f(b₁, b₂), namely the binary reflected Gray code [19, Defn. 2.10].

[Fig. 5. Shaping layer of the random sign-coding setup with BMD for M-ASK.]

TABLE I
INPUT ALPHABET AND MAPPING FUNCTION FOR 8-ASK

S  | −1 −1 −1 −1 +1 +1 +1 +1
X  | −7 −5 −3 −1 +1 +3 +5 +7
B₁ |  1  1  0  0  0  0  1  1
B₂ |  0  1  1  0  0  1  1  0
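The labeling in Table I can be generated programmatically. In the sketch below, the helper names `gray_to_index` and `f` are our own, and the bit-to-amplitude assignment is one consistent binary reflected Gray choice:

```python
# Symbol-wise M-ASK labeling with a binary reflected Gray code (BRGC).
def gray_to_index(bits):
    """Decode a BRGC bit tuple to its integer position."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | (b ^ (idx & 1))  # next binary bit = gray bit xor previous binary bit
    return idx

def f(sign_bit, amp_bits):
    """Map (s, b1, ..., bm) to an M-ASK point with M = 2**(len(amp_bits) + 1)."""
    a = 2 * gray_to_index(amp_bits) + 1     # amplitude in {1, 3, ..., M - 1}
    return (1 - 2 * sign_bit) * a           # sign bit 0 -> +1, 1 -> -1

# 8-ASK: one sign bit and m = 2 amplitude bits
points = [f(s, (b1, b2)) for s in (0, 1) for b1 in (0, 1) for b2 in (0, 1)]
print(sorted(points))   # [-7, -5, -3, -1, 1, 3, 5, 7]
```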
At the receiver, SMD finds the unique message index pair (m̂_a, m̂_s) such that the corresponding amplitude-sign sequence is jointly typical with the received output sequence y, i.e., (a(m̂_a), s(m̂_a, m̂_s), y) ∈ A_ε^n(ASY).

On the other hand, BMD finds the unique message index pair (m̂_a, m̂_s) such that the corresponding bit and sign sequences are (individually) jointly typical with the received output sequence y, i.e., (s(m̂_a, m̂_s), y) ∈ A_ε^n(SY) and (b_j(m̂_a), y) ∈ A_ε^n(B_jY) for j = 1, 2, ..., m. We note that the decoder can use bit metrics p(b_{ji} = 1|y_i) = 1 − p(b_{ji} = 0|y_i) for j = 1, 2, ..., m and i = 1, 2, ..., n to find p(b_j|y). Here, b_{ji} is the j-th bit of the i-th symbol. Together with p(y) and p(b_j), the decoder can check whether (b_j, y) ∈ A_ε^n(B_jY). We note that B_j is in general not uniform. A similar statement holds for the uniform sign S.

V. ACHIEVABLE INFORMATION RATES OF SIGN-CODING
Here, we investigate AIRs of the sign-coding architecture in Figure 3. We consider both SMD and BMD at the receiver. In what follows, four AIRs are presented. The proofs are based on B-typicality, a variation of weak typicality, and random sign-coding arguments, and are given in Appendix B. As indicated in Definition 2, signs S are assumed to be uniform in the proofs. We have not applied weak typicality for continuous random variables, discussed in [9, Sec. 8.2] and [33, Sec. 10.4], since our channels are discrete-input. However, it is also possible to develop a hybrid version of weak typicality that matches with discrete-input continuous-output channels.

In the following, the concept of AIR is formally defined in the sign-coding context.

Definition 3 (Achievable information rate). A rate R is said to be achievable if for every δ > 0 and n large enough, there exists a sign-coding encoder and a decoder such that (1/n) log₂(M_a M_s) ≥ R − δ and error probability P_e ≤ δ.

A. Sign-Coding with Symbol-Metric Decoding
Theorem 1 (Basic sign-coding with SMD). For a memoryless channel {X, p(y|x), Y} with amplitude shaping and basic sign-coding, the rate:

R_SMD^{γ=0} = max_{p(a): H(A) ≤ I(SA;Y)} H(A)   (24)

is achievable using SMD.

Theorem 1 implies that for a memoryless channel, the rate R = H(A) is achievable with basic sign-coding, as long as H(A) ≤ I(SA;Y) = I(X;Y) is satisfied. For the AWGN channel, this means that a range of rate-SNR pairs are achievable. Here, SNR denotes the signal-to-noise ratio. One of these points, H(A) = I(SA;Y), is on the capacity-SNR curve. Note that here, "capacity" indicates the largest achievable rate using X as the channel input alphabet under the average power constraint. It can be observed from Figure 6, discussed in Example 1, that there indeed exists an amplitude distribution p(a) for which H(A) = I(SA;Y).

[Fig. 6. Sign-coding with SMD for 4-ASK. All C ≥ 0.562 bit/1D can be achieved with sign-coding. AIR, achievable information rate.]

Theorem 2 (Modified sign-coding with SMD). For a memoryless channel {X, p(y|x), Y} with amplitude shaping and modified sign-coding, the rate:

R_SMD^{γ>0} = max_{p(a),γ: H(A)+γ ≤ I(SA;Y)} H(A) + γ   (25)

is achievable using SMD for 0 < γ < 1.

Theorem 2 implies that for a memoryless channel, the rate H(A) + γ is achievable with modified sign-coding, as long as R = H(A) + γ ≤ I(SA;Y) = I(X;Y) is satisfied. For the AWGN channel, this means that all points on the capacity-SNR curve for which H(X|Y) ≤ 1 − γ are achievable. This follows from:

H(A) + γ ≤ I(SA;Y) = H(SA) − H(SA|Y) = H(A) + 1 − H(X|Y),   (26)

i.e., the constraint in the maximization in (25). Here, H(SA) = H(A) + 1 since the sign S is uniform and independent of the amplitude A.

Example 1.
We consider the AWGN channel with average power constraint E[X²] ≤ P. Figure 6 shows the capacity of 4-ASK:

C = max_{p(x): X = {−3, −1, +1, +3}, E[X²] ≤ P} I(X;Y),   (27)

together with the amplitude entropy H(A) of the distribution that achieves this capacity. Here, SNR = E[X²]/σ², and σ² is the noise variance. Basic sign-coding achieves capacity only for SNR = 0. dB, i.e., at the point where H(A) = I(X;Y), which is C = 0.562 bit/1D. We see from Figure 6 that the shaping gap is negligible around this point, i.e., the capacity C of 4-ASK and the MI I(X;Y) for uniform p(x) are virtually the same. On the other hand, this gap is significant for larger rates, e.g., it is around 0.42 dB at 1.6 bit/1D. To achieve rates larger than 0.562 bit/1D on the capacity-SNR curve, modified sign-coding (γ > 0) is required. At a given SNR, C can be written as C = H(A) + γ, i.e., when the H(A) curve is shifted up by γ, the crossing point is again at C for that SNR. We also plot the additional rate γ = C − H(A) in Figure 6. As an example, at SNR = 9. dB, C_4-ASK = H(A) + γ = 1. can be achieved with modified sign-coding where H(A) = 0. and γ = 0. . We observe that sign-coding achieves the capacity of 4-ASK for SNR ≥ . dB.
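The uniform-input MI curve in Figure 6 can be reproduced numerically. A sketch using grid-based integration (the grid width and resolution are illustrative choices):

```python
import numpy as np

def ask4_mi(snr_db):
    """I(X;Y) in bit/1D for uniform 4-ASK over the AWGN channel."""
    X = np.array([-3.0, -1.0, 1.0, 3.0])
    X = X / np.sqrt(np.mean(X**2))             # normalize to unit average power
    sigma2 = 10.0 ** (-snr_db / 10.0)          # SNR = E[X^2] / sigma^2
    y = np.linspace(-8.0, 8.0, 4001)
    dy = y[1] - y[0]
    pyx = np.exp(-(y[None, :] - X[:, None])**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    py = pyx.mean(axis=0)                      # p(y) under uniform p(x) = 1/4
    ratio = np.ones_like(pyx)
    np.divide(pyx, py[None, :], out=ratio, where=pyx > 0)   # guard log(0) at grid edges
    # I(X;Y) = sum_x p(x) int p(y|x) log2(p(y|x)/p(y)) dy
    return float(np.sum((pyx * np.log2(ratio)).mean(axis=0)) * dy)

print(ask4_mi(3.0), ask4_mi(15.0))   # increases with SNR, approaching 2 bit/1D
```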
B. Sign-Coding with Bit-Metric Decoding
The following theorems give AIRs for sign-coding with BMD.
Theorem 3 (Basic sign-coding with BMD). For a memoryless channel {X, p(y|x), Y} with amplitude shaping using M-ASK and basic sign-coding, the rate:

R_BMD^{γ=0} = max_{p(b): H(B) ≤ R_BMD(p(x))} H(B)   (28)

is achievable using BMD. Here, B = (B₁, B₂, ..., B_m), p(b) = p(b₁, b₂, ..., b_m), and p(x) = p(s, b₁, b₂, ..., b_m), and R_BMD(p(x)) is as defined in (6).

Theorem 4 (Modified sign-coding with BMD). For a memoryless channel {X, p(y|x), Y} with amplitude shaping using M-ASK and modified sign-coding, the rate:

R_BMD^{γ>0} = max_{p(b),γ: H(B)+γ ≤ R_BMD(p(x))} H(B) + γ   (29)

is achievable using BMD for 0 < γ < 1.

Theorems 3 and 4 imply that for a memoryless channel, the rate R = H(B) + γ = H(A) + γ is achievable with sign-coding and BMD, as long as R ≤ R_BMD is satisfied.
Remark 1 (Random sign-coding with binary linear codes). An amplitude can be represented by m bits. We can uniformly generate a code matrix with mn rows of length n. This matrix can be used to produce the sign sequences. This results in the pairwise independence of any two different sign sequences, as is explained in the proof of [15, Thm. 6.2.1]. Inspection of the proof of our Theorem 1 shows that only the pairwise independence of sign sequences is needed. Therefore, achievability can also be obtained with a binary linear code. Note that our linear code can also be seen as a systematic code that generates parity. The code rate of the corresponding systematic code is m/(m+1). For BMD, a similar reasoning shows that linear codes lead to achievability, and also for modified sign-coding, achievability follows for binary linear codes. The rate of the systematic code that corresponds to the modified setting is (m+γ)/(m+1).

VI. CONCLUSIONS
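The pairwise-independence property in Remark 1 can be observed empirically. A sketch with toy sizes (only the first sign of each sequence is tallied to keep the histogram small):

```python
import numpy as np

rng = np.random.default_rng(3)

# Remark 1 sketch: a uniformly random binary matrix G (mn rows, n columns)
# maps amplitude-label sequences to sign sequences, s = b G over GF(2).
# For two fixed, distinct label sequences, the resulting sign pairs look
# independent and uniform across random choices of G. Sizes are illustrative.
m, n, trials = 2, 10, 2000
b1 = np.zeros(m * n, dtype=int); b1[0] = 1   # two fixed, distinct label sequences
b2 = np.zeros(m * n, dtype=int); b2[1] = 1

counts = {}
for _ in range(trials):
    G = rng.integers(0, 2, size=(m * n, n))  # fresh random linear sign code
    s1, s2 = b1 @ G % 2, b2 @ G % 2
    key = (int(s1[0]), int(s2[0]))           # tally the first sign of each sequence
    counts[key] = counts.get(key, 0) + 1
print(counts)   # all four sign pairs occur roughly trials/4 times each
```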
In this paper, we studied achievable information rates (AIRs) of probabilistic amplitude shaping (PAS) for discrete-input memoryless channels. In contrast to the existing literature, in which Gallager's error exponent approach was followed, we used a weak typicality framework. Random sign-coding arguments based on weak typicality were introduced to upper-bound the probability of error of a so-called sign-coding structure. The achievability of the mutual information was demonstrated for uniform signs that are independent of the amplitudes. Sign-coding combined with amplitude shaping corresponds to PAS, and consequently, PAS achieves the capacity of a discrete-input memoryless channel with a symmetric capacity-achieving distribution.

Our approach differs from the random coding arguments considered in the literature in that our motivation was to provide achievability proofs that are as constructive as possible. To this end, in our random sign-coding setup, both the amplitudes and the signs of the channel inputs that are directly selected by information bits are produced constructively. Only the remaining signs are drawn at random. A study of the achievability of capacity for channels with asymmetric capacity-achieving distributions via a variant of sign-coding is left for possible future research.

APPENDIX A
PROOF OF LEMMA 1

A. Proof of P1

We see from [9, Eq. (3.6)] that for u ∈ A_ε^n(U),

2^{-n(H(U)+ε)} ≤ p(u) ≤ 2^{-n(H(U)-ε)}.   (30)

Due to Definition 1, each u ∈ B_{V,ε}^n(U) is also in A_ε^n(U); more specifically, B_{V,ε}^n(U) ⊆ A_ε^n(U). Consequently, (30) also holds for u ∈ B_{V,ε}^n(U), which completes the proof of P1.
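The typicality bounds used in this appendix can be checked numerically for a toy source. The sketch below (our illustration; the Bernoulli parameter, blocklength, and ε are arbitrary choices, not from the paper) enumerates the weakly typical set of an i.i.d. Bernoulli(0.3) source by Hamming weight and verifies that it carries almost all probability while its cardinality stays below 2^{n(H(U)+ε)}.

```python
# Numerical sanity check of weak typicality for an i.i.d. Bernoulli(0.3)
# source (illustrative parameters only, not from the paper).
import math

p, n, eps = 0.3, 200, 0.1
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # source entropy, ~0.881 bits

prob_typ, size_typ = 0.0, 0
for k in range(n + 1):                        # k = Hamming weight of the sequence
    pu = (p ** k) * ((1 - p) ** (n - k))      # probability of one weight-k sequence
    if abs(-math.log2(pu) / n - H) <= eps:    # weak-typicality condition, cf. (30)
        size_typ += math.comb(n, k)
        prob_typ += math.comb(n, k) * pu

print(prob_typ)                               # close to 1 for large n
assert prob_typ > 0.95                        # probability of the typical set -> 1
assert math.log2(size_typ) <= n * (H + eps)   # cardinality bound, cf. (39)
```

Grouping sequences by Hamming weight keeps the enumeration exact even for n = 200, since all sequences of the same weight have the same probability.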
B. Proof of P2

Let (U, V) be a pair of sequences drawn i.i.d. according to p(u, v). Then:

Pr{(U, V) ∈ A_ε^n(UV)} = Σ_u p(u) Σ_{v:(u,v)∈A_ε^n(UV)} p(v|u)   (31)
= Σ_{u∈B_{V,ε}^n(U)} p(u) Σ_{v:(u,v)∈A_ε^n(UV)} p(v|u) + Σ_{u∉B_{V,ε}^n(U)} p(u) Σ_{v:(u,v)∈A_ε^n(UV)} p(v|u)   (32)
≤ Σ_{u∈B_{V,ε}^n(U)} p(u) + Σ_{u∉B_{V,ε}^n(U)} p(u)(1 − ε)   (33)
= 1 − ε + ε Σ_{u∈B_{V,ε}^n(U)} p(u)   (34)
= 1 − ε + ε Pr{U ∈ B_{V,ε}^n(U)}.   (35)

Here, (33) follows from Definition 1, which states that Pr{(u, V) ∈ A_ε^n(UV) | U = u} < 1 − ε for u ∈ A_ε^n(U), if u ∉ B_{V,ε}^n(U). Then, from (35), we obtain:

Pr{U ∈ B_{V,ε}^n(U)} ≥ (Pr{(U, V) ∈ A_ε^n(UV)} − (1 − ε)) / ε   (36)
= 1 − Pr{(U, V) ∉ A_ε^n(UV)} / ε   (37)
≥ 1 − ε,   (38)

for large enough n. Here, (38) follows from [9, Thm. 7.6.1], which states that Pr{(U, V) ∈ A_ε^n(UV)} → 1 as n → ∞. This implies that Pr{(U, V) ∉ A_ε^n(UV)} ≤ ε² for positive ε and large enough n, which completes the proof.

C. Proof of P3

We see from [9, Thm. 3.1.2] that:

|A_ε^n(U)| ≤ 2^{n(H(U)+ε)}.   (39)

Since B_{V,ε}^n(U) ⊆ A_ε^n(U), again by Definition 1, (39) also holds for |B_{V,ε}^n(U)|. This proves the upper bound in P3. To prove the lower bound, we obtain from (38) for n sufficiently large that:

1 − ε ≤ Pr{U ∈ B_{V,ε}^n(U)}   (40)
≤ Σ_{u∈B_{V,ε}^n(U)} 2^{-n(H(U)-ε)}   (41)
= |B_{V,ε}^n(U)| 2^{-n(H(U)-ε)},   (42)

where (41) follows from (30).

APPENDIX B
PROOFS OF THEOREMS 1, 2, 3, AND 4

In the following proofs, we upper-bound the average error probability P_e over a random choice of sign-codebooks. This way, we will demonstrate the existence of at least one good sign-code. Again as in [9, Sec. 7.7] and as explained in Section IV-C, we decode by joint typicality: the decoder looks for a unique message index pair (m̂_a, m̂_s) for which the corresponding amplitude-sign sequence (a, s) is jointly typical with the received sequence y.

By the properties of weak typicality and B-typicality, the transmitted amplitude-sign sequence and the received sequence are jointly typical with high probability for n large enough. We call the event for which the transmitted amplitude-sign sequence is not jointly typical with the received sequence the first error event, with average probability P_e(1). Furthermore, the probability that any other (not transmitted) amplitude-sign sequence is jointly typical with the received sequence vanishes for asymptotically large n. We call the event that there is another amplitude-sign sequence that is jointly typical with the received sequence the second error event, with average probability P_e(2). Observing that these events are not disjoint, we can write [9, Eq. (7.75)]:

P_e ≤ P_e(1) + P_e(2).   (43)
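The decoding rule can be made concrete with a toy experiment. The sketch below (our illustration, with arbitrary parameters) builds a random codebook for a binary symmetric channel BSC(p), declares a codeword jointly typical with y when the observed flip fraction lies within p ± δ, and decodes only when exactly one codeword is typical, so both error events above are counted.

```python
# Toy joint-typicality decoder over a BSC(p): a codeword x is declared jointly
# typical with y when the fraction of flipped positions lies within p +/- delta.
# Illustrative parameters of our own choosing; rate k/n = 0.1 < 1 - h(0.11) ~ 0.5.
import numpy as np

rng = np.random.default_rng(1)
n, k, p, delta, trials = 100, 10, 0.11, 0.08, 200
M = 2 ** k                                      # 1024 random codewords
codebook = rng.integers(0, 2, size=(M, n))

errors = 0
for _ in range(trials):
    m = rng.integers(M)                         # transmitted message index
    y = codebook[m] ^ (rng.random(n) < p)       # pass codeword through BSC(p)
    flip_frac = (codebook ^ y).mean(axis=1)     # flip fraction for every codeword
    typical = np.flatnonzero(np.abs(flip_frac - p) <= delta)
    if typical.size != 1 or typical[0] != m:    # decoder requires a UNIQUE typical codeword
        errors += 1                             # first- or second-kind error

print(errors / trials)
```

With the rate well below capacity, a wrong codeword is typical only with vanishing probability (its flip fraction concentrates around 1/2), so almost all residual errors come from the first error event.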
D. Proof of Theorem 1
For the error of the first kind, we can write:

P_e(1) = Σ_{m_a=1}^{M_a} (1/M_a) Σ_{s∈S^n} p(s) Σ_{y∈Y^n} p(y|a(m_a), s) 1[(a(m_a), s, y) ∉ A_ε^n(ASY)]   (44)
= Σ_{m_a} (1/M_a) Σ_s Σ_y p(s, y|a(m_a)) 1[(a(m_a), s, y) ∉ A_ε^n]   (45)
= Σ_{m_a} (1/M_a) Pr{(a(m_a), S, Y) ∉ A_ε^n | A = a(m_a)}   (46)
≤ Σ_{m_a} ε/M_a   (47)
= ε,   (48)

where we simplified the notation by replacing m_a = 1, 2, ..., M_a by m_a, s ∈ S^n by s, and y ∈ Y^n by y in (45). Furthermore, we dropped the index of the typical set A_ε^n(ASY) and used A_ε^n instead. We will follow these notations for summations and for the typical sets for the rest of the paper, assuming for the latter that the index of the typical set will be clear from the context. To obtain (45), we used p(s) p(y|a(m_a), s) = p(s, y|a(m_a)). Then, (47) is a direct consequence of Definition 1 since a(m_a) ∈ B_{SY,ε}^n(A) for m_a = 1, 2, ..., M_a.

For the error of the second kind, we can write:

P_e(2) ≤ Σ_{m_a} (1/M_a) Σ_s p(s) Σ_y p(y|a(m_a), s) Σ_{k_a=1, k_a≠m_a}^{M_a} Σ_{s̃∈S^n} p(s̃) 1[(a(k_a), s̃, y) ∈ A_ε^n]   (49)
= M_a Σ_{m_a} Σ_s p(s) (1/M_a) Σ_y p(y|a(m_a), s) Σ_{k_a≠m_a} Σ_{s̃} p(s̃) (1/M_a) 1[(a(k_a), s̃, y) ∈ A_ε^n]   (50)
≤ M_a 2^{6nε} Σ_{m_a} Σ_s p(a(m_a)) p(s) Σ_y p(y|a(m_a), s) · Σ_{k_a≠m_a} Σ_{s̃} p(a(k_a)) p(s̃) 1[(a(k_a), s̃, y) ∈ A_ε^n]   (51)
≤ M_a 2^{6nε} Σ_{a∈A^n} Σ_s p(a) p(s) Σ_y p(y|a, s) Σ_{ã∈A^n} Σ_{s̃} p(ã) p(s̃) 1[(ã, s̃, y) ∈ A_ε^n]   (52)
= M_a 2^{6nε} Σ_{(x̃,y)∈A_ε^n} p(x̃) p(y)   (53)
≤ 2^{n(H(A)+ε)} 2^{6nε} |A_ε^n(XY)| 2^{-n(H(X)-ε)} 2^{-n(H(Y)-ε)}   (54)
≤ 2^{n(H(A)+7ε)} 2^{n(H(X,Y)+ε)} 2^{-n(H(X)-ε)} 2^{-n(H(Y)-ε)}   (55)
= 2^{n(H(A) − I(X;Y) + 10ε)},   (56)

where we simplified the notation by replacing k_a = 1, 2, ..., M_a : k_a ≠ m_a by k_a ≠ m_a, and s̃ ∈ S^n by s̃ in (50). We will follow these notations for the rest of the paper. Then:

(51) follows for n sufficiently large and for a ∈ B_{SY,ε}^n(A) from:

1/M_a = 1/|B_{SY,ε}^n(A)| ≤ 2^{-n(H(A)-ε)} / (1 − ε)   (57)
= (2^{2nε} / (1 − ε)) 2^{-n(H(A)+ε)}   (58)
≤ (2^{2nε} / (1 − ε)) p(a)   (59)
≤ 2^{3nε} p(a),   (60)

where (57) follows from the B-typicality property P3, (59) follows from the B-typicality property P1, and (60) holds for all large enough n.
(52) follows from summing over a ∈ A^n instead of over a(m_a) ∈ B_ε^n and over ã ∈ A^n instead of a(k_a) ∈ B_ε^n for k_a ≠ m_a.
(53) is obtained by working out the summations over a and s and by replacing ã s̃ with x̃.
(54) follows from M_a = |B_ε^n(A)| ≤ 2^{n(H(A)+ε)}, i.e., the B-typicality property P3, and from (12).
(55) follows from (15).

The conclusion from (56) is that for H(A) < I(X;Y) − 10ε, the error probability of the second kind satisfies:

P_e(2) ≤ ε   (61)

for n large enough. Using (48) and (61) in (43), we find that the total error probability averaged over all possible sign-codes satisfies P_e ≤ 2ε for n large enough. This implies the existence of a basic sign-code with total error probability P_e = Pr{M̂_a ≠ M_a} ≤ 2ε. This holds for all ε > 0, and therefore, the rate:

R = H(A) ≤ I(X;Y)   (62)

is achievable with basic sign-coding, which concludes the proof of Theorem 1.

E. Proof of Theorem 2
For the error of the first kind, we can write:

P_e(1) = Σ_{m_a} (1/M_a) Σ_{m_s=1}^{M_s} 2^{-n_1} Σ_{s''∈S^{n_2}} p(s'') Σ_y p(y|a(m_a), s'(m_s)s'') 1[(a(m_a), s'(m_s)s'', y) ∉ A_ε^n]   (63)
= Σ_{m_a} (1/M_a) Σ_{m_s} Σ_{s''} 2^{-n} Σ_y p(y|a(m_a), s'(m_s)s'') 1[(a(m_a), s'(m_s)s'', y) ∉ A_ε^n]   (64)
= Σ_{m_a} (1/M_a) Σ_{m_s} Σ_{s''} Σ_y p(s'(m_s)s'', y|a(m_a)) 1[(a(m_a), s'(m_s)s'', y) ∉ A_ε^n]   (65)
= Σ_{m_a} (1/M_a) Pr{(a(m_a), S, Y) ∉ A_ε^n | A = a(m_a)}   (66)
≤ Σ_{m_a} ε/M_a   (67)
= ε,   (68)

where we simplified the notation by replacing s'' ∈ S^{n_2} by s'' and m_s = 1, 2, ..., M_s by m_s in (64). We will follow these notations for the rest of the paper. To obtain (64), we used the fact that S'' is uniform; more precisely, p(s'') = 2^{-n_2}. To obtain (65), we used the fact that S' is also uniform, and then, 2^{-n} p(y|a(m_a), s'(m_s)s'') = p(s'(m_s)s'', y|a(m_a)). Then, (67) is a direct consequence of Definition 1 since a(m_a) ∈ B_{SY,ε}^n(A) for m_a = 1, 2, ..., M_a.

For the error of the second kind, we obtain:

P_e(2) ≤ Σ_{m_a} (1/M_a) Σ_{m_s} 2^{-n_1} Σ_{s''} p(s'') Σ_y p(y|a(m_a), s'(m_s)s'') · Σ_{(k_a,k_s)≠(m_a,m_s)} Σ_{s̃''} p(s̃'') 1[(a(k_a), s'(k_s)s̃'', y) ∈ A_ε^n]
= M_a 2^{n_1} Σ_{(m_a,m_s,s'')} (2^{-n}/M_a) Σ_y p(y|a(m_a), s'(m_s)s'') · Σ_{(k_a,k_s)≠(m_a,m_s)} Σ_{s̃''} (2^{-n}/M_a) 1[(a(k_a), s'(k_s)s̃'', y) ∈ A_ε^n]   (69)
= M_a 2^{n_1} Σ_{(m_a,m_s,s'')} (2^{-n}/M_a) Σ_y p(y|a(m_a), s'(m_s)s'') Σ_{k_a≠m_a, k_s, s̃''} (2^{-n}/M_a) 1[(a(k_a), s'(k_s)s̃'', y) ∈ A_ε^n]
+ 2^{n_1} Σ_{(m_a,m_s,s'')} (2^{-n}/M_a) Σ_y p(y|a(m_a), s'(m_s)s'') Σ_{k_s≠m_s, s̃''} 2^{-n} 1[(a(m_a), s'(k_s)s̃'', y) ∈ A_ε^n].   (70)

Here, we replaced nested summations over m_a, m_s, and s'' by a single summation over (m_a, m_s, s'') for the sake of better readability. We will use this notation for the rest of the paper. Then:

(69) follows from n = n_1 + n_2 and from the fact that S'' is uniform; more precisely, p(s'') = 2^{-n_2}.
(70) is obtained by splitting (k_a, k_s) ≠ (m_a, m_s) into k_a ≠ m_a, k_s and k_a = m_a, k_s ≠ m_s.

From (70), we obtain:

P_e(2) ≤ M_a 2^{n_1} 2^{6nε} Σ_{(m_a,m_s,s'')} p(a(m_a)) p(s'(m_s)s'') Σ_y p(y|a(m_a), s'(m_s)s'') · Σ_{k_a≠m_a, k_s, s̃''} p(a(k_a)) p(s'(k_s)s̃'') 1[(a(k_a), s'(k_s)s̃'', y) ∈ A_ε^n]
+ 2^{n_1} 2^{3nε} Σ_{(m_a,m_s,s'')} p(a(m_a)) p(s'(m_s)s'') Σ_y p(y|a(m_a), s'(m_s)s'') · Σ_{k_s≠m_s, s̃''} p(s'(k_s)s̃'') 1[(a(m_a), s'(k_s)s̃'', y) ∈ A_ε^n]   (71)
≤ M_a 2^{n_1} 2^{6nε} Σ_{a, s's''} p(a) p(s's'') Σ_y p(y|a, s's'') Σ_{ã, s̃'s̃''} p(ã) p(s̃'s̃'') 1[(ã, s̃'s̃'', y) ∈ A_ε^n]
+ 2^{n_1} 2^{3nε} Σ_{a, s's''} p(a) p(s's'') Σ_y p(y|a, s's'') Σ_{s̃'s̃''} p(s̃'s̃'') 1[(a, s̃'s̃'', y) ∈ A_ε^n]   (72)
= M_a 2^{n_1} 2^{6nε} Σ_{a, s} p(a) p(s) Σ_y p(y|a, s) Σ_{ã, s̃} p(ã) p(s̃) 1[(ã, s̃, y) ∈ A_ε^n]
+ 2^{n_1} 2^{3nε} Σ_{a, s} p(a) p(s) Σ_y p(y|a, s) Σ_{s̃} p(s̃) 1[(a, s̃, y) ∈ A_ε^n],   (73)

where:

(71) follows for n sufficiently large and for a ∈ B_{SY,ε}^n(A) from:

1/M_a ≤ 2^{3nε} p(a),   (74)

see (60), and from p(s's'') = 2^{-n}.
(72) follows from summing over a ∈ A^n instead of over a(m_a) ∈ B_ε^n and over ã ∈ A^n instead of a(k_a) ∈ B_ε^n for k_a ≠ m_a. Moreover, it follows from summing over s' ∈ S^{n_1} instead of s'(k_s) for k_s = 1, 2, ..., M_s and k_s ≠ m_s.
(73) follows from substituting s for s's'' and s̃ for s̃'s̃''.

Finally, from (73), we obtain:

P_e(2) = M_a 2^{n_1} 2^{6nε} Σ_y p(y) Σ_{x̃} p(x̃) 1[(x̃, y) ∈ A_ε^n] + 2^{n_1} 2^{3nε} Σ_{a, y} p(a, y) Σ_{s̃} p(s̃) 1[(a, s̃, y) ∈ A_ε^n]   (75)
≤ 2^{n(H(A)+ε)} 2^{nγ} 2^{6nε} |A_ε^n(XY)| 2^{-n(H(X)-ε)} 2^{-n(H(Y)-ε)} + 2^{nγ} 2^{3nε} |A_ε^n(SAY)| 2^{-n(H(A,Y)-ε)} 2^{-n(H(S)-ε)}   (76)
≤ 2^{n(H(A)+7ε)} 2^{nγ} 2^{n(H(X,Y)+ε)} 2^{-n(H(X)-ε)} 2^{-n(H(Y)-ε)} + 2^{nγ} 2^{3nε} 2^{n(H(S,A,Y)+ε)} 2^{-n(H(A,Y)-ε)} 2^{-n(H(S)-ε)}   (77)
= 2^{n(H(A) + γ + 10ε − I(X;Y))} + 2^{n(γ + 6ε − I(S;A,Y))}.   (78)

Here, we substituted n_1 = nγ in (76). Then:

(75) is obtained by working out the summations over a, s in the first part and s in the second part. Moreover, we replaced ã s̃ with x̃.
(76) is obtained using for the first part that M_a = |B_ε^n(A)| ≤ 2^{n(H(A)+ε)}, i.e., the B-typicality property P3, and (12). For the second part, we used (12) for p(s̃) and (16) for p(a, y).
(77) follows from (15), and its extension to jointly typical triplets; more precisely, |A_ε^n(SAY)| ≤ 2^{n(H(S,A,Y)+ε)}.

The conclusion from (78) is that for H(A) + γ < I(X;Y) − 10ε and γ < I(S;A,Y) − 6ε, the error probability of the second kind satisfies:

P_e(2) ≤ ε   (79)

for n large enough. The first constraint, i.e., H(A) + γ < I(X;Y) − 10ε, already implies the second constraint, since:

γ < I(X;Y) − H(A) − 10ε   (80)
= I(A;Y) + I(S;Y|A) − H(A) − 10ε   (81)
≤ I(S;Y|A) − 10ε   (82)
= I(S;A,Y) − 10ε,   (83)

where (81) follows from the chain rule of mutual information with X = (A, S), (82) follows from I(A;Y) ≤ H(A), and (83) follows from the independence of S and A, i.e., I(S;A,Y) = I(S;A) + I(S;Y|A) = I(S;Y|A). Using (68) and (79) in (43), we find that the total error probability averaged over all possible modified sign-codes satisfies P_e ≤ 2ε for n large enough. This implies the existence of a modified sign-code with total error probability P_e = Pr{(M̂_a, M̂_s) ≠ (M_a, M_s)} ≤ 2ε. This holds for all ε > 0, and thus, the rate:

R = H(A) + γ ≤ I(X;Y)   (84)

is achievable with modified sign-coding, which concludes the proof of Theorem 2.

F. Proof of Theorem 3
For the error of the first kind, we can write:

P_e(1) = Σ_{m_a} (1/M_a) Σ_s p(s) Σ_y p(y|b(m_a), s) · 1[((b_1(m_a), y) ∉ A_ε^n) ∪ ((b_2(m_a), y) ∉ A_ε^n) ∪ ... ∪ ((b_m(m_a), y) ∉ A_ε^n) ∪ ((s, y) ∉ A_ε^n)]   (85)
≤ Σ_{m_a} (1/M_a) Σ_s Σ_y p(s, y|b(m_a)) 1[(b(m_a), s, y) ∉ A_ε^n]   (86)
= Σ_{m_a} (1/M_a) Pr{(b(m_a), S, Y) ∉ A_ε^n | B = b(m_a)}   (87)
≤ Σ_{m_a} ε/M_a   (88)
= ε,   (89)

where we used b(m_a) to denote (b_1(m_a), b_2(m_a), ..., b_m(m_a)) in (85) and B to denote (B_1, B_2, ..., B_m) in (87). Then, we used p(s) p(y|b(m_a), s) = p(s, y|b(m_a)) in (86). Here, (86) follows from the fact that if at least one of b_1(m_a), b_2(m_a), ..., b_m(m_a) or s is not jointly typical with y, then (b(m_a), s, y) is not jointly typical. Then, (88) is a direct consequence of Definition 1 since b(m_a) ∈ B_{SY,ε}^n(B_1 B_2 ··· B_m) for m_a = 1, 2, ..., M_a.

For the error of the second kind, we can write:

P_e(2) ≤ Σ_{m_a} (1/M_a) Σ_s p(s) Σ_y p(y|b(m_a), s) · Σ_{k_a≠m_a} Σ_{s̃} p(s̃) 1[(b_1(k_a), y) ∈ A_ε^n, (b_2(k_a), y) ∈ A_ε^n, ..., (b_m(k_a), y) ∈ A_ε^n, (s̃, y) ∈ A_ε^n]
= M_a Σ_{m_a} Σ_s p(s) (1/M_a) Σ_y p(y|b(m_a), s) · Σ_{k_a≠m_a} Σ_{s̃} p(s̃) (1/M_a) 1[(b_1(k_a), y) ∈ A_ε^n, ..., (b_m(k_a), y) ∈ A_ε^n, (s̃, y) ∈ A_ε^n]
≤ M_a 2^{6nε} Σ_{m_a} Σ_s p(b(m_a)) p(s) Σ_y p(y|b(m_a), s) · Σ_{k_a≠m_a} Σ_{s̃} p(s̃) p(b(k_a)) 1[(b_1(k_a), y) ∈ A_ε^n, ..., (b_m(k_a), y) ∈ A_ε^n, (s̃, y) ∈ A_ε^n]   (90)
≤ M_a 2^{6nε} Σ_{b∈{0,1}^{mn}} Σ_s p(b) p(s) Σ_y p(y|b, s) · Σ_{b̃∈{0,1}^{mn}} Σ_{s̃} p(s̃) p(b̃) 1[(b̃_1, y) ∈ A_ε^n, ..., (b̃_m, y) ∈ A_ε^n, (s̃, y) ∈ A_ε^n]   (91)
= M_a 2^{6nε} Σ_y p(y) Σ_{b̃, s̃} p(b̃, s̃) 1[(b̃_1, y) ∈ A_ε^n, ..., (b̃_m, y) ∈ A_ε^n, (s̃, y) ∈ A_ε^n]   (92)
≤ 2^{n(H(B)+7ε)} |A_ε^n(Y)| 2^{-n(H(Y)-ε)} · |A_ε^n(B_1|y)| |A_ε^n(B_2|y)| ··· |A_ε^n(B_m|y)| |A_ε^n(S|y)| 2^{-n(H(B,S)-ε)}   (93)
≤ 2^{n(H(B)+7ε)} 2^{n(H(Y)+ε)} 2^{-n(H(Y)-ε)} · 2^{n(H(B_1|Y) + H(B_2|Y) + ... + H(B_m|Y) + H(S|Y) + 2(m+1)ε)} 2^{-n(H(B,S)-ε)}   (94)
= 2^{n(H(B) − H(B,S) + H(B_1|Y) + H(B_2|Y) + ... + H(B_m|Y) + H(S|Y) + (12+2m)ε)},   (95)

where we used b to denote (b_1, b_2, ..., b_m) and b̃ to denote (b̃_1, b̃_2, ..., b̃_m) in (91). We also used B to denote (B_1, B_2, ..., B_m) in (93). Finally, we simplified the notation by replacing b̃ ∈ {0,1}^{mn} by b̃ in (92). Then:

(90) follows for n sufficiently large and for b ∈ B_{SY,ε}^n(B) from 1/M_a ≤ 2^{3nε} p(b), which can be shown in a similar way as (60) was derived.
(91) follows from summing over b ∈ {0,1}^{mn} instead of over b(m_a) ∈ B_ε^n and over b̃ ∈ {0,1}^{mn} instead of over b(k_a) ∈ B_ε^n for k_a ≠ m_a.
(92) is obtained by working out the summations over b_1, b_2, ..., b_m, and s.
(93) follows from M_a = |B_ε^n(B)| ≤ 2^{n(H(B)+ε)}, i.e., the B-typicality property P3, from (12), and from (17).
(94) follows from (11) and (19).

The conclusion from (95) is that for:

H(B) < H(B,S) − H(S|Y) − (Σ_{i=1}^{m} H(B_i|Y)) − (12+2m)ε = R_BMD(p(b,s)) − (12+2m)ε,

the error probability of the second kind satisfies:

P_e(2) ≤ ε   (96)

for n large enough. Using (89) and (96) in (43), we find that the total error probability averaged over all possible sign-codes satisfies P_e ≤ 2ε for n large enough. This implies the existence of a sign-code with total error probability P_e = Pr{M̂_a ≠ M_a} ≤ 2ε. This holds for all ε > 0, and thus, the rate:

R = H(B) ≤ R_BMD   (97)

is achievable with sign-coding and BMD, which concludes the proof of Theorem 3.
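The BMD rate of Theorem 3 can be evaluated in closed form for a small example. The sketch below (our illustration; the shaped amplitude distribution and the quantized-Gaussian channel are assumptions of ours, not from the paper) computes R_BMD = H(B_1, ..., B_m, S) − Σ_i H(B_i|Y) − H(S|Y) for m = 1 amplitude bit and compares it with the symbol-metric rate I(X;Y).

```python
# Evaluates R_BMD = H(B1..Bm, S) - sum_i H(Bi|Y) - H(S|Y) for a toy 4-point
# PAS input (m = 1 amplitude bit) over an assumed discrete channel obtained by
# quantizing a Gaussian likelihood; all parameters are illustrative.
import numpy as np

def entropy(p):                       # entropy in bits; 0 log 0 := 0
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

amps = np.array([1.0, 3.0])           # amplitude, labeled by the single bit B1
p_amp = np.array([0.7, 0.3])          # shaped amplitude distribution (assumed)
signs = np.array([-1.0, 1.0])         # uniform sign S, independent of A
p_as = np.outer(p_amp, [0.5, 0.5])    # joint p(a, s), shape (2, 2)
x = np.outer(amps, signs)             # channel input x = a * s

ys = np.array([-3.0, -1.0, 1.0, 3.0]) # quantized output alphabet (assumed)
lik = np.exp(-(ys[None, None, :] - x[:, :, None]) ** 2 / 2.0)
lik /= lik.sum(axis=2, keepdims=True) # p(y | a, s): a valid DMC by construction

p_asy = p_as[:, :, None] * lik        # joint p(a, s, y)
p_y = p_asy.sum(axis=(0, 1))

H_b_given_y = sum(p_y[j] * entropy(p_asy[:, :, j].sum(axis=1) / p_y[j]) for j in range(ys.size))
H_s_given_y = sum(p_y[j] * entropy(p_asy[:, :, j].sum(axis=0) / p_y[j]) for j in range(ys.size))
H_x_given_y = sum(p_y[j] * entropy(p_asy[:, :, j].ravel() / p_y[j]) for j in range(ys.size))

R_bmd = entropy(p_as.ravel()) - H_b_given_y - H_s_given_y   # Theorem 3 rate
I_xy = entropy(p_as.ravel()) - H_x_given_y                  # symbol-metric rate
print(R_bmd, I_xy)
assert 0 < R_bmd <= I_xy + 1e-12      # BMD rate never exceeds I(X;Y)
```

Since H(X|Y) ≤ Σ_i H(B_i|Y) + H(S|Y) by the subadditivity of conditional entropy, R_BMD never exceeds I(X;Y); the final assertion checks exactly this on the example.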
G. Proof of Theorem 4
For the error of the first kind, we can write:

P_e(1) = Σ_{m_a} (1/M_a) Σ_{m_s} 2^{-n_1} Σ_{s''∈S^{n_2}} p(s'') Σ_y p(y|b(m_a), s'(m_s)s'') · 1[⋃_{i=1}^{m} ((b_i(m_a), y) ∉ A_ε^n) ∪ ((s'(m_s)s'', y) ∉ A_ε^n)]
= Σ_{m_a} (1/M_a) Σ_{m_s} Σ_{s''} 2^{-n} Σ_y p(y|b(m_a), s'(m_s)s'') · 1[⋃_{i=1}^{m} ((b_i(m_a), y) ∉ A_ε^n) ∪ ((s'(m_s)s'', y) ∉ A_ε^n)]   (98)
≤ Σ_{m_a} (1/M_a) Σ_{m_s} Σ_{s''} Σ_y p(s'(m_s)s'', y|b(m_a)) 1[(b(m_a), s'(m_s)s'', y) ∉ A_ε^n]   (99)
= Σ_{m_a} (1/M_a) Pr{(b(m_a), S, Y) ∉ A_ε^n | B = b(m_a)}
≤ Σ_{m_a} ε/M_a   (100)
= ε.   (101)

Here, to obtain (98), we used the fact that S'' is uniform; more precisely, p(s'') = 2^{-n_2}. Then, we used 2^{-n} p(y|b(m_a), s'(m_s)s'') = p(s'(m_s)s'', y|b(m_a)) in (99). Furthermore, (99) also follows from the fact that if at least one of b_1(m_a), b_2(m_a), ..., b_m(m_a) or s'(m_s)s'' is not jointly typical with y, then (b(m_a), s'(m_s)s'', y) is not jointly typical. Then, (100) is a direct consequence of Definition 1 since b(m_a) ∈ B_{SY,ε}^n(B_1 B_2 ··· B_m) for m_a = 1, 2, ..., M_a.

For the error of the second kind, we can write:

P_e(2) ≤ Σ_{m_a} (1/M_a) Σ_{m_s} 2^{-n_1} Σ_{s''} p(s'') Σ_y p(y|b(m_a), s'(m_s)s'') · Σ_{(k_a,k_s)≠(m_a,m_s)} Σ_{s̃''} p(s̃'') 1[⋂_{i=1}^{m} ((b_i(k_a), y) ∈ A_ε^n) ∩ ((s'(k_s)s̃'', y) ∈ A_ε^n)]
= M_a 2^{n_1} Σ_{(m_a,m_s,s'')} (2^{-n}/M_a) Σ_y p(y|b(m_a), s'(m_s)s'') · Σ_{(k_a,k_s)≠(m_a,m_s)} Σ_{s̃''} (2^{-n}/M_a) 1[⋂_{i=1}^{m} ((b_i(k_a), y) ∈ A_ε^n) ∩ ((s'(k_s)s̃'', y) ∈ A_ε^n)]   (102)
= M_a 2^{n_1} Σ_{(m_a,m_s,s'')} (2^{-n}/M_a) Σ_y p(y|b(m_a), s'(m_s)s'') Σ_{k_a≠m_a, k_s, s̃''} (2^{-n}/M_a) 1[⋂_{i=1}^{m} ((b_i(k_a), y) ∈ A_ε^n) ∩ ((s'(k_s)s̃'', y) ∈ A_ε^n)]
+ 2^{n_1} Σ_{(m_a,m_s,s'')} (2^{-n}/M_a) Σ_y p(y|b(m_a), s'(m_s)s'') Σ_{k_s≠m_s, s̃''} 2^{-n} 1[⋂_{i=1}^{m} ((b_i(m_a), y) ∈ A_ε^n) ∩ ((s'(k_s)s̃'', y) ∈ A_ε^n)],   (103)

where (102) follows from n = n_1 + n_2 and from the fact that S'' is uniform; more precisely, p(s'') = 2^{-n_2}. Then, (103) is obtained by splitting (k_a, k_s) ≠ (m_a, m_s) into k_a ≠ m_a, k_s and k_a = m_a, k_s ≠ m_s.

From (103), we obtain:

P_e(2) ≤ M_a 2^{n_1} 2^{6nε} Σ_{(m_a,m_s,s'')} p(b(m_a)) p(s'(m_s)s'') Σ_y p(y|b(m_a), s'(m_s)s'') · Σ_{k_a≠m_a, k_s, s̃''} p(b(k_a)) p(s'(k_s)s̃'') 1[⋂_{i=1}^{m} ((b_i(k_a), y) ∈ A_ε^n) ∩ ((s'(k_s)s̃'', y) ∈ A_ε^n)]
+ 2^{n_1} 2^{3nε} Σ_{(m_a,m_s,s'')} p(b(m_a)) p(s'(m_s)s'') Σ_y p(y|b(m_a), s'(m_s)s'') · Σ_{k_s≠m_s, s̃''} p(s'(k_s)s̃'') 1[⋂_{i=1}^{m} ((b_i(m_a), y) ∈ A_ε^n) ∩ ((s'(k_s)s̃'', y) ∈ A_ε^n)]   (104)
≤ M_a 2^{n_1} 2^{6nε} Σ_{b, s's''} p(b) p(s's'') Σ_y p(y|b, s's'') Σ_{b̃, s̃'s̃''} p(b̃) p(s̃'s̃'') · 1[⋂_{i=1}^{m} ((b̃_i, y) ∈ A_ε^n) ∩ ((s̃'s̃'', y) ∈ A_ε^n)]
+ 2^{n_1} 2^{3nε} Σ_{b, s's''} p(b) p(s's'') Σ_y p(y|b, s's'') Σ_{s̃'s̃''} p(s̃'s̃'') · 1[⋂_{i=1}^{m} ((b_i, y) ∈ A_ε^n) ∩ ((s̃'s̃'', y) ∈ A_ε^n)]   (105)
= M_a 2^{n_1} 2^{6nε} Σ_{b, s} p(b) p(s) Σ_y p(y|b, s) Σ_{b̃, s̃} p(b̃) p(s̃) 1[⋂_{i=1}^{m} ((b̃_i, y) ∈ A_ε^n) ∩ ((s̃, y) ∈ A_ε^n)]
+ 2^{n_1} 2^{3nε} Σ_{b, s} p(b) p(s) Σ_y p(y|b, s) Σ_{s̃} p(s̃) 1[⋂_{i=1}^{m} ((b_i, y) ∈ A_ε^n) ∩ ((s̃, y) ∈ A_ε^n)],   (106)

where:

(104) follows for n sufficiently large and for b ∈ B_{SY,ε}^n(B) from 1/M_a ≤ 2^{3nε} p(b) and from p(s's'') = 2^{-n}.
(105) follows from summing over b ∈ {0,1}^{mn} instead of over b(m_a) ∈ B_ε^n and over b̃ ∈ {0,1}^{mn} instead of b(k_a) ∈ B_ε^n for k_a ≠ m_a. Moreover, it follows from summing over s' ∈ S^{n_1} instead of s'(k_s) for k_s = 1, 2, ..., M_s and k_s ≠ m_s.
(106) follows from substituting s for s's'' and s̃ for s̃'s̃''.

Finally, from (106), we obtain:

P_e(2) = M_a 2^{n_1} 2^{6nε} Σ_y p(y) Σ_{b̃, s̃} p(b̃, s̃) 1[⋂_{i=1}^{m} ((b̃_i, y) ∈ A_ε^n) ∩ ((s̃, y) ∈ A_ε^n)]
+ 2^{n_1} 2^{3nε} Σ_{b, y} p(b, y) Σ_{s̃} p(s̃) 1[⋂_{i=1}^{m} ((b_i, y) ∈ A_ε^n) ∩ ((s̃, y) ∈ A_ε^n)]   (107)
≤ 2^{n(H(B)+ε)} 2^{nγ} 2^{6nε} |A_ε^n(Y)| 2^{-n(H(Y)-ε)} (∏_{i=1}^{m} |A_ε^n(B_i|y)|) |A_ε^n(S|y)| 2^{-n(H(B_1 B_2 ··· B_m S)-ε)}
+ 2^{nγ} 2^{3nε} |A_ε^n(Y)| 2^{-n(H(B,Y)-ε)} 2^{-n(H(S)-ε)} (∏_{i=1}^{m} |A_ε^n(B_i|y)|) |A_ε^n(S|y)|   (108)
≤ 2^{n(H(B)+ε)} 2^{nγ} 2^{6nε} 2^{n(H(Y)+ε)} 2^{-n(H(Y)-ε)} (∏_{i=1}^{m} 2^{n(H(B_i|Y)+2ε)}) 2^{n(H(S|Y)+2ε)} 2^{-n(H(B,S)-ε)}
+ 2^{nγ} 2^{3nε} 2^{n(H(Y)+ε)} 2^{-n(H(B,Y)-ε)} 2^{-n(H(S)-ε)} (∏_{i=1}^{m} 2^{n(H(B_i|Y)+2ε)}) 2^{n(H(S|Y)+2ε)}   (109)
= 2^{n(H(B) + γ + (Σ_{i=1}^{m} H(B_i|Y)) + H(S|Y) − H(B,S) + (12+2m)ε)}
+ 2^{n(γ + H(Y) − H(B,Y) − H(S) + (Σ_{i=1}^{m} H(B_i|Y)) + H(S|Y) + (8+2m)ε)}.   (110)

Here, we substituted n_1 = nγ in (108). Then:

(107) is obtained by working out the summations over b_1, b_2, ..., b_m, s in the first part and s in the second part.
(108) is obtained using for the first part that M_a = |B_ε^n(B)| ≤ 2^{n(H(B)+ε)}, i.e., the B-typicality property P3, (12) for p(y), and (17) for p(b̃, s̃). For the second part, we used (12) for p(s̃) and (17) for p(b, y).
(109) follows from (11) and (19).

The conclusion from (110) is that for:

H(B) + γ ≤ R_BMD − (12+2m)ε,   (111)

and for:

γ ≤ H(B,Y) + H(S) − H(Y) − (Σ_{i=1}^{m} H(B_i|Y)) − H(S|Y) − (8+2m)ε,   (112)

the error probability of the second kind satisfies:

P_e(2) ≤ ε   (113)

for n large enough. The second constraint (112) is already implied by the first constraint (111) since:

γ ≤ H(B,Y) + H(S) − H(Y) − (Σ_{i=1}^{m} H(B_i|Y)) − H(S|Y) − (8+2m)ε   (114)
= H(B,Y) + H(S) − H(Y) − (Σ_{i=1}^{m} H(B_i|Y)) − H(S|Y) + H(B,S) − H(B,S) − (8+2m)ε   (115)
= H(B,Y) + H(S) − H(Y) + R_BMD − H(B) − H(S) − (8+2m)ε   (116)
= H(B|Y) + R_BMD − H(B) − (8+2m)ε.   (117)

Using (101) and (113) in (43), we find that the total error probability averaged over all possible modified sign-codes satisfies P_e ≤ 2ε for n large enough. This implies the existence of a modified sign-code with total error probability P_e = Pr{(M̂_a, M̂_s) ≠ (M_a, M_s)} ≤ 2ε. This holds for all ε > 0, and thus, the rate:

R = H(B) + γ ≤ R_BMD   (118)

is achievable with modified sign-coding, which concludes the proof of Theorem 4.

REFERENCES

[1] H. Imai and S. Hirakawa, "A new multilevel coding method using error-correcting codes,"
IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 371–377, May 1977.
[2] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, "Multilevel codes: Theoretical concepts and practical design rules," IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1361–1391, July 1999.
[3] G. Ungerböck, "Channel coding with multilevel/phase signals," IEEE Trans. Inf. Theory, vol. 28, no. 1, pp. 55–67, Jan. 1982.
[4] E. Zehavi, "8-PSK trellis codes for a Rayleigh channel," IEEE Trans. Commun., vol. 40, no. 5, pp. 873–884, May 1992.
[5] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation," IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946, May 1998.
[6] "IEEE standard 802.11-2016," IEEE Standard for Inform. Technol.-Telecommun. and Inform. Exchange Between Syst. Local and Metropolitan Area Networks-Specific Requirements-Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, 2016.
[7] "Digital video broadcasting (DVB); 2nd generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broadband satellite applications (DVB-S2)," European Telecommun. Standards Inst. (ETSI) Standard EN 302 307, Rev. 1.2.1, 2009.
[8] G. Böcherer, F. Steiner, and P. Schulte, "Bandwidth efficient and rate-matched low-density parity-check coded modulation," IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
[9] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: John Wiley & Sons, 2006.
[10] F. Buchali, F. Steiner, G. Böcherer, L. Schmalen, P. Schulte, and W. Idler, "Rate adaptation and reach increase by probabilistically shaped 64-QAM: An experimental demonstration," J. Lightw. Technol., vol. 34, no. 7, pp. 1599–1609, Apr. 2016.
[11] W. Idler, F. Buchali, L. Schmalen, E. Lach, R. Braun, G. Böcherer, P. Schulte, and F. Steiner, "Field trial of a 1 Tb/s super-channel network using probabilistically shaped constellations," J. Lightw. Technol., vol. 35, no. 8, pp. 1399–1406, Apr. 2017.
[12] G. Böcherer, "Achievable rates for probabilistic shaping," arXiv e-prints, May 2018. [Online]. Available: http://arxiv.org/abs/1707.01134v5
[13] ——, "Principles of coded modulation," habilitation thesis, Dept. of Electr. and Comput. Eng., Tech. Univ. of Munich, 2018.
[14] R. A. Amjad, "Information rates and error exponents for probabilistic amplitude shaping," in Proc. IEEE Inf. Theory Workshop, 2018.
[15] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: John Wiley & Sons, 1968.
[16] G. Kramer, "Topics in multi-user information theory," Found. Trends Commun. Inf. Theory, vol. 4, no. 4–5, pp. 265–444, June 2008.
[17] G. Kaplan and S. Shamai (Shitz), "Information rates and error exponents of compound channels with application to antipodal signaling in a fading environment," AEÜ Archiv für Elektronik und Übertragungstechnik, vol. 47, no. 4, pp. 228–239, 1993.
[18] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), "On information rates for mismatched decoders," IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 1953–1967, Nov. 1994.
[19] L. Szczecinski and A. Alvarado, Bit-Interleaved Coded Modulation: Fundamentals, Analysis, and Design. Chichester, UK: John Wiley & Sons, 2015.
[20] A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. M. J. Willems, "Bit-interleaved coded modulation revisited: A mismatched decoding perspective," IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2756–2765, June 2009.
[21] A. Guillén i Fàbregas and A. Martinez, "Bit-interleaved coded modulation with shaping," in Proc. IEEE Inf. Theory Workshop, 2010.
[22] A. Alvarado, F. Brännström, and E. Agrell, "High SNR bounds for the BICM capacity," in Proc. IEEE Inf. Theory Workshop, 2011.
[23] L. Peng, "Fundamentals of bit-interleaved coded modulation and reliable source transmission," Ph.D. dissertation, University of Cambridge, Cambridge, UK, Dec. 2012.
[24] G. Böcherer, "Probabilistic signal shaping for bit-metric decoding," in Proc. IEEE Int. Symp. Inf. Theory, 2014.
[25] ——, "Probabilistic signal shaping for bit-metric decoding," arXiv e-prints, Apr. 2014. [Online]. Available: http://arxiv.org/abs/1401.6190
[26] G. Böcherer, "Achievable rates for shaped bit-metric decoding," arXiv e-prints, May 2016. [Online]. Available: http://arxiv.org/abs/1410.8075v6
[27] P. Schulte and G. Böcherer, "Constant composition distribution matching," IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
[28] T. Fehenberger, D. S. Millar, T. Koike-Akino, K. Kojima, and K. Parsons, "Multiset-partition distribution matching," IEEE Trans. Commun., vol. 67, no. 3, pp. 1885–1893, Mar. 2019.
[29] P. Schulte and F. Steiner, "Divergence-optimal fixed-to-fixed length distribution matching with shell mapping," IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 620–623, Apr. 2019.
[30] Y. C. Gültekin, W. J. van Houtum, A. Koppelaar, and F. M. J. Willems, "Enumerative sphere shaping for wireless communications with short packets," IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 1098–1112, Feb. 2020.
[31] R. A. Amjad, "Information rates and error exponents for probabilistic amplitude shaping," arXiv e-prints, June 2018. [Online]. Available: https://arxiv.org/abs/1802.05973
[32] N. Shulman and M. Feder, "Random coding techniques for nonrandom codes," IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 2101–2104, Sep. 1999.
[33] R. Yeung,