On the Smooth Rényi Entropy and Variable-Length Source Coding Allowing Errors
Shigeaki Kuzuoka
Member, IEEE
Abstract
In this paper, we consider the problem of variable-length source coding allowing errors. The exponential moment of the codeword length is analyzed in both the non-asymptotic regime and the asymptotic regime. Our results show that the smooth Rényi entropy characterizes the optimal exponential moment of the codeword length.
Index Terms—ε-source coding, exponential moment, smooth Rényi entropy, variable-length source coding

I. INTRODUCTION
Renato Renner and Stefan Wolf [1], [2] introduced a new information measure called the smooth Rényi entropy, which is a generalization of the Rényi entropy [3]. They showed that two special cases of the smooth Rényi entropy have clear operational meanings in the fixed-length source coding problem and the intrinsic randomness problem: (i) the smooth max Rényi entropy H^ε_0 characterizes the minimum number of bits needed for fixed-length coding with decoding error probability at most ε, and (ii) the smooth min Rényi entropy H^ε_∞ characterizes the amount of uniform randomness that can be extracted from a random variable. As the notation indicates, the smooth max/min Rényi entropies H^ε_0 and H^ε_∞ are defined as limits of the smooth Rényi entropy H^ε_α of order α; see Section II for details. Hence it is natural to ask: does the smooth Rényi entropy H^ε_α of order α itself have an operational meaning? In this study, we answer this question by demonstrating that the smooth Rényi entropy characterizes the optimal exponential moment of the codeword length of variable-length source codes allowing errors. Our contributions in this paper are summarized as follows.
The work of S. Kuzuoka was supported in part by JSPS KAKENHI Grant Number 26820145. S. Kuzuoka is with the Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama, 640-8510 Japan, e-mail: [email protected].
October 23, 2018 DRAFT
A. Contributions
We consider the ε-variable-length source coding problem, that is, a variable-length source coding problem where a decoding error is allowed as long as its probability is smaller than or equal to a given value ε ≥ 0. Usually, in this setting, the average codeword length E[ℓ(X)] is investigated; see, e.g., [4]. In this study, however, we adopt the criterion of minimizing the exponential moment of the codeword length ℓ(X), i.e., E[exp{λℓ(X)}] for a given parameter λ > 0.

Our first contribution is to give non-asymptotic upper and lower bounds on the exponential moment E[exp{λℓ(X)}] of the codeword length of ε-source codes. Our one-shot coding theorems (Theorems 1 and 2) demonstrate that the optimal exponential moment of the codeword length is characterized by the smooth Rényi entropy.

Our second contribution is a general formula (in the sense of Verdú-Han [5], [6]) for the asymptotic exponential rate of the exponential moment of the codeword length (Theorem 3). Moreover, to apply our general formula to mixtures of i.i.d. sources, we analyze the asymptotic behavior of the smooth Rényi entropy of a mixture of i.i.d. sources (Theorem 4).

B. Related Work
The smooth Rényi entropy was first introduced by Renner and Wolf [1], [2]. In our analysis, we use the result of Koga [7], where the smooth Rényi entropy is investigated by using majorization theory. As mentioned above, the smooth max and min Rényi entropies have clear operational meanings in the fixed-length source coding problem [1], [2], [8] and the intrinsic randomness problem [1], [2], [9], respectively. Recently, it was shown that the smooth max Rényi entropy also has an application in variable-length lossless source coding [10], where it is shown that the smooth max Rényi entropy characterizes the threshold on the codeword length under the condition that the overflow probability is at most ε. Similarly, the smooth Rényi divergence also finds applications in several coding problems; see, e.g., [11]–[13].

On the other hand, the conventional Rényi entropy [3] also plays an important role in analyses of variable-length source coding [14], [15] and fixed-length coding [16]. In particular, Campbell [14] proposed the exponential moment of the codeword length as an alternative to the average codeword length as a criterion for variable-length lossless source coding, and gave upper and lower bounds on the exponential moment in terms of the conventional Rényi entropy. Our one-shot coding theorems (Theorems 1 and 2) can be considered as a generalization of Campbell's result to the case where decoding errors are allowed. It should be mentioned here that a general problem of optimizing the exponential moment of a given cost function was investigated by Merhav [17], [18].

Although we consider variable-length codes subject to prefix constraints in this paper, studies on variable-length codes without prefix constraints are also important [19], [20]. In particular, Courtade and Verdú [20] gave non-asymptotic upper and lower bounds on the distribution of codeword lengths by bounding the cumulant generating function of the optimum codeword lengths.
It should be noted that in [19] and [20], codes are required to be injective so that the decoder can losslessly recover the source output from the codeword. The problem of variable-length source coding allowing errors was investigated under the criterion of the average codeword length by Koga and Yamamoto [4] and Kostina et al. [21], [22].
C. Paper Organization
The rest of the paper is organized as follows. First, we review the definition of the smooth Rényi entropy in Section II. Then, in Section III, non-asymptotic coding theorems for ε-variable-length source coding are given. The general formula for the optimal exponential moment of the codeword length achievable by ε-variable-length source codes is given in Section IV. Section V concludes the paper. To keep the main ideas clear in the main text, we relegate all proofs to the appendices.

II. SMOOTH RÉNYI ENTROPY
Renner and Wolf [2] defined the smooth Rényi entropy as follows. Fix ε ∈ [0, 1). Given a distribution P on a finite or countably infinite set X, let B^ε(P) be the set of non-negative functions Q with domain X such that Q(x) ≤ P(x) for all x ∈ X and Σ_{x∈X} Q(x) ≥ 1 − ε. Then, for α ∈ (0, 1) ∪ (1, ∞), the ε-smooth Rényi entropy of order α is defined as

    H^ε_α(P) ≜ (1/(1 − α)) log r^ε_α(P)    (1)

where

    r^ε_α(P) ≜ inf_{Q∈B^ε(P)} Σ_{x∈X} [Q(x)]^α.    (2)

For basic properties of H^ε_α(P), see [2] and [7].

Remark 1.
The definition of H^ε_α(P) above is slightly different from the original definition given in [1]. However, in [2], it is pointed out that this version is more appropriate for generalization to the conditional smooth Rényi entropy. Our result in this paper demonstrates that this version is also appropriate for describing the variable-length source coding theorem allowing errors.

Remark 2.
The max and min smooth Rényi entropies are defined respectively as

    H^ε_0(P) ≜ lim_{α↓0} H^ε_α(P),    (3)
    H^ε_∞(P) ≜ lim_{α→∞} H^ε_α(P).    (4)

As shown in [1], H^ε_α(P) for α ∈ (0, 1) is, up to an additive constant, equal to H^ε_0(P). This fact may be one of the reasons that H^ε_α(P) has received less attention. However, as shown in Theorems 1 and 2 below, H^ε_α(P) itself plays an important role in the evaluation of the exponential moment of the length function. Throughout this paper, log denotes the natural logarithm.
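As a concrete illustration (not part of the original paper), H^ε_α(P) can be computed for a finite distribution by using the truncation construction of Koga [7, Theorem 1(A)]: for α ∈ (0, 1), the infimum in (2) is attained by keeping the largest probabilities until their total mass reaches 1 − ε. A minimal Python sketch under that assumption (the function name is ours):

```python
import math

def smooth_renyi_entropy(p, alpha, eps):
    """epsilon-smooth Renyi entropy H_alpha^eps(P) in nats, for
    alpha in (0, 1).  Uses the truncation construction (Theorem 1(A)
    of Koga [7]): the infimum in (2) is attained by keeping the
    largest probabilities until their total mass reaches 1 - eps."""
    assert 0 < alpha < 1 and 0 <= eps < 1
    q, mass = [], 0.0
    for pi in sorted(p, reverse=True):
        if mass >= 1 - eps:
            break
        q.append(min(pi, 1 - eps - mass))  # truncated Q*(x)
        mass += q[-1]
    r = sum(qi ** alpha for qi in q)       # r_alpha^eps(P), Eq. (2)
    return math.log(r) / (1 - alpha)       # Eq. (1), natural logarithm
```

For example, for the uniform distribution on four symbols, ε = 0 gives log 4 ≈ 1.386 nats, while ε = 0.25 drops one symbol's mass and yields 2 log 1.5 ≈ 0.811 nats; smoothing can only decrease the entropy.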
III. ONE-SHOT CODING THEOREM
Let X be a finite or countably infinite set and let X be a random variable on X with distribution P. Without loss of generality, we assume P(x) > 0 for all x ∈ X.

A variable-length source code Φ = (ϕ, ψ, C) is determined by a triplet of a set C ⊂ {0, 1}* of finite-length binary strings, an encoder mapping ϕ: X → C, and a decoder mapping ψ: C → X. Without loss of generality, we assume that C = {ϕ(x) : x ∈ X}. Further, we assume that C satisfies the prefix condition. The error probability of the code Φ is defined as

    P_e(Φ) ≜ Pr{X ≠ ψ(ϕ(X))}.    (5)

The length of the codeword ϕ(x) of x (in bits) is denoted by ‖ϕ(x)‖. Let ℓ be the length function (in nats):

    ℓ(x) ≜ ‖ϕ(x)‖ log 2.    (6)

In this study, we focus on the exponential moment of the length function. For a given λ > 0, let us consider the problem of minimizing

    E_P[exp{λℓ(X)}]    (7)

subject to P_e(Φ) ≤ ε, where E_P denotes the expectation with respect to the distribution P.

Remark 3.
In Theorems 1 and 2 below, we allow the encoder mapping ϕ to be stochastic. Let W_ϕ(c|x) be the probability that x ∈ X is encoded to c ∈ C. Then, P_e(Φ) and E_P[exp{λℓ(X)}] are precisely written as

    P_e(Φ) = Σ_{x∈X} P(x) Σ_{c : x ≠ ψ(c)} W_ϕ(c|x)    (8)

and

    E_P[exp{λℓ(X)}] = Σ_{x∈X} P(x) Σ_{c∈C} W_ϕ(c|x) exp{λ‖c‖ log 2}    (9)

where ‖c‖ is the length (in bits) of c ∈ C. Note that, without loss of optimality, we can assume that the decoder mapping ψ is deterministic. Indeed, for a given W_ϕ, we can choose ψ so that

    ψ(c) = arg max_x W_ϕ(c|x) P(x).    (10)

The following theorems demonstrate that the exponential moment E_P[exp{λℓ(X)}] is characterized by the smooth Rényi entropy H^ε_{1/(1+λ)}(P).

Theorem 1.
For any λ > 0 and ε ∈ [0, 1), there exists a code Φ (with a stochastic encoder) such that P_e(Φ) ≤ ε and

    E_P[exp{λℓ(X)}] ≤ 2^{2λ} exp{λ H^ε_{1/(1+λ)}(P)} + ε 2^λ.    (11)
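The achievability bound above, and the matching lower bound of Theorem 2 below, can be checked numerically. The following sketch (our illustration, not the paper's construction verbatim) instantiates the stochastic code used in the proof of Theorem 1: a Shannon-type prefix code on the tilted distribution Q̃^{(λ)} built from the truncated Q* of Koga [7], prefixed by a one-bit flag, with the residual mass sent to the one-bit word "1". The function names are ours:

```python
import math

def truncated_q(p, eps):
    # Q* of Eq. (13): keep the largest probabilities until mass 1 - eps
    # (Theorem 1(A) of Koga [7]).
    q, mass = [], 0.0
    for pi in sorted(p, reverse=True):
        if mass >= 1 - eps:
            break
        q.append(min(pi, 1 - eps - mass))
        mass += q[-1]
    return q

def theorem1_code_moment(p, lam, eps):
    """Exponential moment achieved by the stochastic code of the proof of
    Theorem 1: x -> '0' + Shannon codeword for tilde-Q^(lambda) with
    probability Q*(x)/P(x), and the one-bit word '1' otherwise.
    Returns (Theorem 2 lower bound, achieved moment, Theorem 1 upper bound)."""
    q = truncated_q(p, eps)
    beta = 1 / (1 + lam)
    s = sum(qx ** beta for qx in q)          # sum_x [Q*(x)]^{1/(1+lam)}
    moment = 0.0
    for qx in q:
        tq = qx ** beta / s                   # tilde-Q^(lambda)(x)
        bits = 1 + math.ceil(-math.log2(tq))  # flag bit + Shannon length
        moment += qx * math.exp(lam * bits * math.log(2))
    moment += eps * 2 ** lam                  # mass encoded to the word '1'
    lower = s ** (1 + lam)                    # exp{lam * H}, via Eq. (14)
    upper = 2 ** (2 * lam) * lower + eps * 2 ** lam
    return lower, moment, upper
```

For P = (0.4, 0.3, 0.2, 0.1), λ = 1, and ε = 0.1, the achieved moment lands strictly between the two bounds, as the theorems predict.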
Theorem 2.
Fix λ > 0 and ε ∈ [0, 1). Then, for any code Φ such that P_e(Φ) ≤ ε, we have

    E_P[exp{λℓ(X)}] ≥ exp{λ H^ε_{1/(1+λ)}(P)}.    (12)

Theorem 1 and Theorem 2 will be proved in Appendix A and Appendix B, respectively.

In Theorem 1, we allow the encoder mapping ϕ to be stochastic. However, it is not hard to modify the theorem for the case where only deterministic encoder mappings are allowed. To see this, let X = {1, 2, 3, ...} and assume that P(1) ≥ P(2) ≥ ···. Then, let k* = k*(ε) be the minimum integer such that Σ_{i=1}^{k*} P(i) ≥ 1 − ε and let

    Q*(i) ≜ P(i) for i = 1, 2, ..., k* − 1;  Q*(k*) ≜ 1 − ε − Σ_{j=1}^{k*−1} P(j);  Q*(i) ≜ 0 for i > k*.    (13)

Since 0 < 1/(1+λ) < 1 for all λ > 0, we can use (A) of Theorem 1 of [7] and obtain

    λ H^ε_{1/(1+λ)}(P) = (1 + λ) log( Σ_{i∈X} [Q*(i)]^{1/(1+λ)} ).    (14)

Based on this fact, we can modify the proof of Theorem 1 and obtain the following result (see Appendix C for details).

Proposition 1.
For any λ > 0 and ε ∈ [0, 1), there exists a code Φ with a deterministic encoder mapping ϕ such that P_e(Φ) ≤ ε + γ_ε and

    E_P[exp{λℓ(X)}] ≤ 2^{2λ} exp{λ H^{ε+γ_ε}_{1/(1+λ)}(P)} + (ε + γ_ε) 2^λ    (15)

where γ_ε ≜ 1 − ε − Σ_{i=1}^{k*(ε)−1} P(i).

IV. GENERAL FORMULA
In this section, we consider the coding problem for general sources. A general source

    X = {X^n = (X^{(n)}_1, X^{(n)}_2, ..., X^{(n)}_n)}_{n=1}^∞    (16)

is defined as a sequence of random variables X^n on the n-th Cartesian product X^n of X [6]. The distribution of X^n is denoted by P_{X^n}, which is not required to satisfy the consistency condition.

We consider a sequence of coding problems indexed by the blocklength n. A code of blocklength n is denoted by Φ_n = (ϕ_n, ψ_n, C_n). The length function of Φ_n is denoted by ℓ_n, i.e., ℓ_n(x^n) ≜ ‖ϕ_n(x^n)‖ log 2 for all x^n ∈ X^n. We are interested in the asymptotic behavior of (1/n) log E_{P_{X^n}}[exp{λℓ_n(X^n)}].

A value E is said to be ε-achievable if there exists a sequence {Φ_n}_{n=1}^∞ of codes satisfying

    lim sup_{n→∞} P_e(Φ_n) ≤ ε    (17)

and

    lim sup_{n→∞} (1/n) log E_{P_{X^n}}[exp{λℓ_n(X^n)}] ≤ E.    (18)

The infimum of ε-achievable values is denoted by E^ε_λ(X). To characterize E^ε_λ(X), we introduce the following notation:

    H^ε_α(X) ≜ lim_{δ↓0} lim sup_{n→∞} (1/n) H^{ε+δ}_α(P_{X^n}).    (19)

It is worth noting that H^ε_α(X) is non-negative for all α ∈ (0, 1) and ε ∈ [0, 1). Indeed, we can prove the stronger fact that

    lim inf_{n→∞} (1/n) H^ε_α(P_{X^n}) ≥ 0,  α ∈ (0, 1), ε ∈ [0, 1).    (20)

We will prove (20) in Appendix D. Now, we state our general formula, which will be proved in Appendix E.

Theorem 3.
For any λ > 0 and ε ∈ [0, 1),

    E^ε_λ(X) = λ H^ε_{1/(1+λ)}(X).    (21)

In the following, we consider a mixture of i.i.d. sources. Let us consider m distributions P_{X_i} (i = 1, 2, ..., m) on X. A general source X is said to be a mixture of P_{X_1}, P_{X_2}, ..., P_{X_m} if there exists (α_1, α_2, ..., α_m) satisfying Σ_i α_i = 1, α_i > 0 (i = 1, ..., m), and for all n = 1, 2, ... and all x^n = (x_1, x_2, ..., x_n) ∈ X^n,

    P_{X^n}(x^n) = Σ_{i=1}^m α_i P_{X^n_i}(x^n)    (22)
    = Σ_{i=1}^m α_i Π_{t=1}^n P_{X_i}(x_t).    (23)

For later use, let A_i ≜ Σ_{j=1}^{i−1} α_j (i = 1, 2, ..., m) and A_{m+1} ≜ 1. Further, to simplify the analysis, we assume that

    H(X_1) > H(X_2) > ··· > H(X_m)    (24)

where H(X_i) is the entropy determined by P_{X_i}:

    H(X_i) ≜ Σ_{x∈X} P_{X_i}(x) log(1/P_{X_i}(x)).    (25)

Then, H^ε_α(X) of the mixture X is characterized as in the following theorem.

Theorem 4.
Let X be a mixture of i.i.d. sources satisfying (24). Fix α ∈ (0, 1), i, and ε ∈ [A_i, A_{i+1}). Then, we have

    H^ε_α(X) = H(X_i).    (26)

Theorem 4 will be proved in Appendix F.
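As a numerical illustration of Theorem 4 (a sketch with made-up component distributions; the function name and example values are ours, and the components are assumed to have pairwise distinct entropies as in (24)), the limiting rate is found by sorting the components by entropy and locating ε among the thresholds A_i:

```python
import math

def mixture_rate(weights, components, eps):
    """Asymptotic rate H_alpha^eps(X) of a mixture of i.i.d. sources
    per Theorem 4; the value does not depend on alpha in (0, 1)."""
    def shannon(p):  # component entropy H(X_i) in nats, Eq. (25)
        return -sum(px * math.log(px) for px in p if px > 0)
    ents = [shannon(p) for p in components]
    # Order components by decreasing entropy, matching H(X_1) > H(X_2) > ...
    order = sorted(range(len(weights)), key=lambda i: -ents[i])
    acc = 0.0                         # A_i = alpha_1 + ... + alpha_{i-1}
    for i in order:
        if eps < acc + weights[i]:    # eps lies in [A_i, A_{i+1})
            return ents[i]
        acc += weights[i]
    return ents[order[-1]]
```

For a three-component mixture with weights (0.3, 0.5, 0.2) whose components are uniform on four symbols, P = (0.7, 0.3), and uniform on two symbols, ε = 0.1 gives log 4, while ε = 0.4 already excuses the encoder from covering the highest-entropy component and gives log 2.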
Remark 4.
Letting m = 1 and ε ↓ 0, Theorem 4 recovers Lemma I.2 of [1].

Remark 5.
Although Theorem 4 assumes that the components are i.i.d., this assumption is not crucial. Indeed, the only property of i.i.d. sources used in the proof of the theorem is that the AEP [23] holds, i.e.,

    lim_{n→∞} Pr{ |(1/n) log(1/P_{X^n_i}(X^n_i)) − H(X_i)| > γ } = 0    (27)

for all i = 1, 2, ..., m and any γ > 0. Hence, it is straightforward to extend the theorem so that it can be applied to mixtures of stationary and ergodic sources. Moreover, since we use only the AEP, it can be seen that the assumption (24) is also not crucial. Assume that there exist components j_1 ≠ j_2 such that H(X_{j_1}) = H(X_{j_2}). Then, let us consider the modified mixture in which the j_2-th component is substituted by the j_1-th component, i.e.,

    P_{X^n}(x^n) = Σ_{i≠j_2} α'_i P_{X^n_i}(x^n)    (28)

where α'_i = α_i for i ≠ j_1 and α'_{j_1} = α_{j_1} + α_{j_2}. Then H^ε_α(X) of the modified mixture is identical with that of the original one.

Combining Theorems 3 and 4, we have the coding theorem for the mixture of i.i.d. sources.

Corollary 1.
Let X be a mixture of i.i.d. sources satisfying (24). Then, for any λ > 0 and ε ∈ [0, 1),

    E^ε_λ(X) = λ H(X_i)    (29)

where i is determined so that ε ∈ [A_i, A_{i+1}).

V. CONCLUDING REMARKS
In this paper, we investigated the exponential moment of the codeword length of variable-length source coding allowing decoding errors. Roughly speaking, our results demonstrate that the logarithm log E_P[exp{λℓ(X)}] of the optimal exponential moment E_P[exp{λℓ(X)}] is characterized by the smooth Rényi entropy H^ε_{1/(1+λ)}.

Now, let us consider taking λ → ∞. When λ is sufficiently large, the value log E_P[exp{λℓ(X)}] is dominated by the longest codeword length max_{x∈X} ℓ(x). In other words, to minimize log E_P[exp{λℓ(X)}], we need to minimize the longest codeword length max_{x∈X} ℓ(x). Therefore, roughly speaking, the difference between variable-length coding and fixed-length coding becomes smaller as λ is increased. On the other hand, we know that H^ε_0 = lim_{λ→∞} H^ε_{1/(1+λ)} characterizes the optimal coding rate of fixed-length codes [1], [8]. The above argument implies that we can unify our result and the results of [1], [8] in the limit of λ → ∞, or equivalently α → 0.

On the other hand, since λ > 0 and thus 0 < 1/(1+λ) < 1, only the smooth Rényi entropy H^ε_α of order α ∈ (0, 1) plays an important role in our coding theorems. It remains as future work to investigate the operational meaning of the smooth Rényi entropy H^ε_α of order α > 1.

APPENDIX A
PROOF OF THEOREM 1

Fix δ > 0 arbitrarily and choose Q ∈ B^ε(P) so that

    log Σ_{x∈X} [Q(x)]^{1/(1+λ)} ≤ (λ/(1+λ)) H^ε_{1/(1+λ)}(P) + δ.    (30)

Let A ≜ {x ∈ X : Q(x) > 0} and

    Q̃^{(λ)}(x) = [Q(x)]^{1/(1+λ)} / Σ_{x'∈A} [Q(x')]^{1/(1+λ)}.    (31)

Since

    Σ_{x∈A} 2^{−⌈−log₂ Q̃^{(λ)}(x)⌉} ≤ Σ_{x∈A} Q̃^{(λ)}(x) ≤ 1    (32)

holds, we can construct (ϕ̂, ψ̂, Ĉ) such that (i) Ĉ ≜ {ϕ̂(x) : x ∈ A} is prefix free, (ii) ϕ̂: A → Ĉ satisfies

    ‖ϕ̂(x)‖ = ⌈−log₂ Q̃^{(λ)}(x)⌉,    (33)

and (iii) ϕ̂ and ψ̂: Ĉ → A satisfy x = ψ̂(ϕ̂(x)) for all x ∈ A.

For each x ∈ X, let γ(x) = Q(x)/P(x). Note that 0 ≤ γ(x) ≤ 1 and γ(x) = 0 for all x ∉ A.
Since Q ∈ B^ε(P), we have

    Σ_{x∈X} P(x) γ(x) ≥ 1 − ε.    (34)

Now, we construct a stochastic encoder as follows:

    ϕ(x) = 0 ∘ ϕ̂(x) with probability γ(x);  ϕ(x) = 1 with probability 1 − γ(x),    (35)

where ∘ denotes concatenation. That is, x is encoded to "0" followed by ϕ̂(x) with probability γ(x), and to "1" with probability 1 − γ(x). We can construct the corresponding decoder ψ so that x = ψ(ϕ(x)) whenever x is encoded to 0 ∘ ϕ̂(x); hence P_e(Φ) ≤ 1 − Σ_{x∈X} P(x)γ(x) ≤ ε by (34). The length function ℓ(x) = ‖ϕ(x)‖ log 2 satisfies, if x is encoded to 0 ∘ ϕ̂(x),

    ℓ(x) ≤ −log Q̃^{(λ)}(x) + 2 log 2    (36)

and otherwise ℓ(x) = log 2. Hence, we have

    E_P[exp{λℓ(X)}] ≤ Σ_{x∈X} P(x)γ(x) exp{λ[−log Q̃^{(λ)}(x) + 2 log 2]} + Σ_{x∈X} P(x)(1 − γ(x)) exp{λ log 2}    (37)
    ≤(a) 2^{2λ} Σ_{x∈A} Q(x) exp{−λ log Q̃^{(λ)}(x)} + ε 2^λ    (38)
    = 2^{2λ} ( Σ_{x∈A} [Q(x)]^{1/(1+λ)} )^{1+λ} + ε 2^λ    (39)
    ≤(b) 2^{2λ} exp{λ H^ε_{1/(1+λ)}(P) + (1+λ)δ} + ε 2^λ    (40)

where inequality (a) follows from (34) and (b) follows from (30). Since we can choose δ > 0 arbitrarily small, we have (11).

APPENDIX B
PROOF OF THEOREM 2
Fix any code Φ = (ϕ, ψ, C) such that P_e(Φ) ≤ ε. Recall that we allow ϕ to be stochastic. Let W_ϕ(c|x) be the probability that x ∈ X is mapped to c ∈ C. Let

    Γ(x) ≜ {c ∈ C : W_ϕ(c|x) > 0, x = ψ(c)}    (41)

and

    γ(x) ≜ Σ_{c∈Γ(x)} W_ϕ(c|x).    (42)

Note that, since P_e(Φ) ≤ ε, we have

    Σ_{x∈X} P(x) γ(x) ≥ 1 − ε.    (43)

Further, we have

    E_P[exp{λℓ(X)}] = Σ_{x∈X} P(x) Σ_{c∈C} W_ϕ(c|x) exp{λ‖c‖ log 2}    (44)
    ≥ Σ_{x∈X} P(x) Σ_{c∈Γ(x)} W_ϕ(c|x) exp{λ‖c‖ log 2}.    (45)
From Jensen's inequality, it is not hard to see that

    Σ_{c∈Γ(x)} W_ϕ(c|x) exp{λ‖c‖ log 2} ≥ γ(x) exp{ λ Σ_{c∈Γ(x)} (W_ϕ(c|x)/γ(x)) ‖c‖ log 2 }    (46)
    ≥ γ(x) exp{ λ Σ_{c∈Γ(x)} (W_ϕ(c|x)/γ(x)) ℓ̄(x) }    (47)
    = γ(x) exp{λ ℓ̄(x)}    (48)

where

    ℓ̄(x) ≜ min_{c∈Γ(x)} ‖c‖ log 2.    (49)

Substituting (48) into (45), we have

    E_P[exp{λℓ(X)}] ≥ Σ_{x∈X} P(x) γ(x) exp{λ ℓ̄(x)}.    (50)

Let Q(x) = P(x)γ(x). Then, from (43), we have Q ∈ B^ε(P). Let A = {x : Q(x) > 0}. Then, (50) can be written as

    E_P[exp{λℓ(X)}] ≥ Σ_{x∈A} Q(x) exp{λ ℓ̄(x)}.    (51)

On the other hand, from the definition of the set Γ(x), we can see that Γ(x) ∩ Γ(x') = ∅ for all x, x' ∈ A such that x ≠ x', and thus we have

    Σ_{x∈A} exp{−ℓ̄(x)} ≤ 1.    (52)

Now, let us consider the problem of minimizing Σ_{x∈A} Q(x) exp{λ ℓ̄(x)} subject to (52). As shown in Example 1 in Section 3 of [18], the minimum is achieved by

    ℓ̄(x) = −log( [Q(x)]^{1/(1+λ)} / Σ_{x'∈A} [Q(x')]^{1/(1+λ)} ),  x ∈ A.    (53)

In other words, (51) can be rewritten as

    E_P[exp{λℓ(X)}] ≥ Σ_{x∈A} Q(x) exp{ −λ log( [Q(x)]^{1/(1+λ)} / Σ_{x'∈A} [Q(x')]^{1/(1+λ)} ) }    (54)
    = ( Σ_{x∈A} [Q(x)]^{1/(1+λ)} )^{1+λ}    (55)
    ≥ ( r^ε_{1/(1+λ)}(P) )^{1+λ}    (56)

where the last inequality follows from the fact that Q ∈ B^ε(P). By the definition of the smooth Rényi entropy, we have (12).

APPENDIX C
PROOF OF PROPOSITION 1

Let

    Q̂*(i) ≜ P(i) for i = 1, 2, ..., k*(ε) − 1;  Q̂*(i) ≜ 0 for i ≥ k*(ε).    (57)

Then, from Theorem 1 (A) of [7], we have

    λ H^{ε+γ_ε}_{1/(1+λ)}(P) = (1 + λ) log( Σ_{i∈X} [Q̂*(i)]^{1/(1+λ)} ).    (58)

Now, let us substitute ε (resp. Q) in the proof of Theorem 1 with ε + γ_ε (resp. Q̂*). Note that γ(x) = Q̂*(x)/P(x) satisfies

    γ(i) = 1 for i = 1, 2, ..., k*(ε) − 1;  γ(i) = 0 for i ≥ k*(ε).    (59)

Thus, the encoder constructed in the proof of Theorem 1 becomes deterministic. Hence, we can obtain the proposition.

APPENDIX D
PROOF OF (20)

Fix α ∈ (0, 1) and ε ∈ [0, 1), and then choose ε' > 0 so that ε + ε' < 1. From Lemma 2 of [2], we have

    (1/n) H^ε_α(P_{X^n}) ≥ (1/n) H^{ε+ε'}_0(P_{X^n}) − log(1/ε') / (n(1 − α)).    (60)

On the other hand, it is known that H^{ε+ε'}_0(P_{X^n}) can be written as

    H^{ε+ε'}_0(P_{X^n}) = min_{A⊆X^n : P(A) ≥ 1−ε−ε'} log |A|,    (61)

where |A| is the cardinality of A, and thus H^{ε+ε'}_0(P_{X^n}) ≥ 0. So, taking the inferior limit of both sides of (60), we have (20).

APPENDIX E
PROOF OF THEOREM 3

Direct Part:
At first, we consider the case where

    H^ε_{1/(1+λ)}(X) > 0.    (62)

In this case, for all sufficiently small δ > 0 and sufficiently large n, we have

    2^{2λ} exp{λ H^{ε+δ}_{1/(1+λ)}(P_{X^n})} > ε 2^λ.    (63)
Hence, from Theorem 1, there exists {Φ_n}_{n=1}^∞ such that

    P_e(Φ_n) ≤ ε + δ,  n = 1, 2, ...,    (64)

and, for sufficiently large n,

    E_P[exp{λℓ_n(X^n)}] ≤ 2 · 2^{2λ} exp{λ H^{ε+δ}_{1/(1+λ)}(P_{X^n})}.    (65)

Eq. (65) gives

    lim sup_{n→∞} (1/n) log E_P[exp{λℓ_n(X^n)}] ≤ λ lim sup_{n→∞} (1/n) H^{ε+δ}_{1/(1+λ)}(P_{X^n}).    (66)

By using the diagonal line argument (see [6]), we can conclude that λ H^ε_{1/(1+λ)}(X) is ε-achievable. If H^ε_{1/(1+λ)}(X) = 0, then (65) is replaced with

    E_P[exp{λℓ_n(X^n)}] ≤ max{ 2 · 2^{2λ} exp{λ H^{ε+δ}_{1/(1+λ)}(P_{X^n})}, 2 · ε 2^λ }.    (67)

In this case, we can also prove that 0 is ε-achievable in the same way as in the case H^ε_{1/(1+λ)}(X) > 0.

Converse Part:
Suppose that E is ε-achievable and fix δ > 0 arbitrarily. Then there exists {Φ_n}_{n=1}^∞ such that, for sufficiently large n,

    P_e(Φ_n) ≤ ε + δ    (68)

and

    lim sup_{n→∞} (1/n) log E_P[exp{λℓ_n(X^n)}] ≤ E.    (69)

On the other hand, from Theorem 2, for sufficiently large n such that (68) holds,

    E_P[exp{λℓ_n(X^n)}] ≥ exp{λ H^{ε+δ}_{1/(1+λ)}(P_{X^n})}.    (70)

Combining (69) with (70), we have

    E ≥ λ lim sup_{n→∞} (1/n) H^{ε+δ}_{1/(1+λ)}(P_{X^n}).    (71)

Since δ > 0 is arbitrary, letting δ ↓ 0, we have E ≥ λ H^ε_{1/(1+λ)}(X).

APPENDIX F
PROOF OF THEOREM 4

A. Lemmas
Before proving the theorem, we introduce some lemmas.
Lemma 1.
Fix γ > 0 arbitrarily. Then, there exists an integer n_0 such that for all n ≥ n_0 and all i = 1, 2, ..., m,

    Pr{ (1/n) log(1/P_{X^n}(X^n)) ≥ H(X_i) − γ } ≥ A_{i+1} − γ.    (72)

Proof: For each k = 1, 2, ..., m, let

    S^n_k ≜ { x^n : (1/n) log(1/P_{X^n_k}(x^n)) ≥ H(X_k) − γ/2 }.    (73)

Since i.i.d. sources satisfy the AEP [23], we can choose n_1 such that

    Σ_{x^n∈S^n_k} P_{X^n_k}(x^n) ≥ 1 − γ/2,  ∀n ≥ n_1, ∀k = 1, 2, ..., m.    (74)

Moreover, we can choose n_0 ≥ n_1 so that

    −(1/n) log(γ/2) ≤ γ/2,  ∀n ≥ n_0.    (75)

Then, for all n ≥ n_0, any i = 1, 2, ..., m, and any k = 1, 2, ..., i,

    S̃^n_i ≜ { x^n : (1/n) log(1/P_{X^n}(x^n)) ≥ H(X_i) − γ }    (76)

and

    T^n_k ≜ { x^n : P_{X^n_k}(x^n) ≤ (γ/2) P_{X^n}(x^n) }    (77)

satisfy

    S̃^n_i ∪ T^n_k ⊇ { x^n : (1/n) log((γ/2)/P_{X^n_k}(x^n)) ≥ H(X_i) − γ }    (78)
    = { x^n : (1/n) log(1/P_{X^n_k}(x^n)) ≥ H(X_i) − γ − (1/n) log(γ/2) }    (79)
    ⊇ { x^n : (1/n) log(1/P_{X^n_k}(x^n)) ≥ H(X_i) − γ/2 }    (80)
    ⊇ S^n_k.    (81)

Thus, we have

    Σ_{x^n∈S̃^n_i} P_{X^n}(x^n) ≥ Σ_{k=1}^i α_k Σ_{x^n∈S̃^n_i} P_{X^n_k}(x^n)    (82)
    ≥ Σ_{k=1}^i α_k Σ_{x^n∈S^n_k} P_{X^n_k}(x^n) − Σ_{k=1}^i α_k Σ_{x^n∈T^n_k} P_{X^n_k}(x^n)    (83)
    ≥ A_{i+1}(1 − γ/2) − Σ_{k=1}^i α_k Σ_{x^n∈T^n_k} (γ/2) P_{X^n}(x^n)    (84)
    ≥ A_{i+1}(1 − γ/2) − γ/2    (85)
    ≥ A_{i+1} − γ.    (86)
Lemma 2.
Fix γ > 0 arbitrarily. Then, there exists an integer n_0 such that for all n ≥ n_0 and all i = 1, 2, ..., m,

    Pr{ (1/n) log(1/P_{X^n}(X^n)) ≤ H(X_i) + γ } ≥ 1 − A_i − γ.    (87)

Proof: For each k = 1, 2, ..., m, let

    S^n_k ≜ { x^n : (1/n) log(1/P_{X^n_k}(x^n)) ≤ H(X_k) + γ/2 }.    (88)

Since i.i.d. sources satisfy the AEP [23], we can choose n_1 such that

    Σ_{x^n∈S^n_k} P_{X^n_k}(x^n) ≥ 1 − γ,  ∀n ≥ n_1, ∀k = 1, 2, ..., m.    (89)

Moreover, we can choose n_0 ≥ n_1 so that

    −(1/n) log α_k ≤ γ/2,  ∀n ≥ n_0, ∀k = 1, 2, ..., m.    (90)

Hence, for all n ≥ n_0 and any i,

    S̃^n_i ≜ { x^n : (1/n) log(1/P_{X^n}(x^n)) ≤ H(X_i) + γ }    (91)

satisfies, for any k = i, i+1, ..., m,

    S̃^n_i ⊇ { x^n : (1/n) log(1/(α_k P_{X^n_k}(x^n))) ≤ H(X_i) + γ }    (92)
    = { x^n : (1/n) log(1/P_{X^n_k}(x^n)) ≤ H(X_i) + γ + (1/n) log α_k }    (93)
    ⊇ S^n_k.    (94)

Thus, we have

    Σ_{x^n∈S̃^n_i} P_{X^n}(x^n) ≥ Σ_{k=i}^m α_k Σ_{x^n∈S̃^n_i} P_{X^n_k}(x^n)    (95)
    ≥ Σ_{k=i}^m α_k Σ_{x^n∈S^n_k} P_{X^n_k}(x^n)    (96)
    ≥ (1 − A_i)(1 − γ)    (97)
    ≥ 1 − A_i − γ.    (98)

Lemma 3.
Fix γ > 0 so that H(X_j) − γ > H(X_{j+1}) + γ for all j = 1, 2, ..., m − 1. Then, for sufficiently large n and any i = 1, 2, ..., m,

    α_i − γ ≤ Pr{ |(1/n) log(1/P_{X^n}(X^n)) − H(X_i)| ≤ γ } ≤ α_i + 2γ.    (99)

Proof: From Lemmas 1 and 2 (applied with γ/2 in place of γ for the lower bound), we have

    Pr{ |(1/n) log(1/P_{X^n}(X^n)) − H(X_i)| ≤ γ }
    = Pr{ (1/n) log(1/P_{X^n}(X^n)) ≤ H(X_i) + γ } − Pr{ (1/n) log(1/P_{X^n}(X^n)) < H(X_i) − γ }    (100)
    ≥ {1 − A_i − γ/2} − {1 − (A_{i+1} − γ/2)}    (101)
    = α_i − γ    (102)

and

    Pr{ |(1/n) log(1/P_{X^n}(X^n)) − H(X_i)| ≤ γ }
    = Pr{ (1/n) log(1/P_{X^n}(X^n)) ≤ H(X_i) + γ } − Pr{ (1/n) log(1/P_{X^n}(X^n)) < H(X_i) − γ }    (103)
    ≤ Pr{ (1/n) log(1/P_{X^n}(X^n)) < H(X_{i−1}) − γ } − Pr{ (1/n) log(1/P_{X^n}(X^n)) ≤ H(X_{i+1}) + γ }    (104)
    ≤ {1 − (A_i − γ)} − {1 − A_{i+1} − γ}    (105)
    = α_i + 2γ    (106)

(for i = 1 and i = m, the first and second terms in (104) are interpreted as 1 and 0, respectively, and the bound follows in the same way).

B. Proof of Theorem 4
To prove the theorem, it is sufficient to show that, for ε satisfying A_i < ε < A_{i+1},

    lim sup_{n→∞} (1/n) H^ε_α(P_{X^n}) ≤ H(X_i)    (107)

and

    lim inf_{n→∞} (1/n) H^ε_α(P_{X^n}) ≥ H(X_i).    (108)

Proof of (107): Fix γ > 0 sufficiently small so that H(X_j) − γ > H(X_{j+1}) + γ for all j = 1, 2, ..., m − 1 and that A_i + 2mγ < ε. For j = 1, 2, ..., m, let

    T_n(j) ≜ { x^n : |(1/n) log(1/P_{X^n}(x^n)) − H(X_j)| ≤ γ }.    (109)

Note that T_n(j) ∩ T_n(ĵ) = ∅ (j ≠ ĵ). Further, from Lemma 3, we have

    Pr{ X^n ∈ ∪_{j=i}^m T_n(j) } = Σ_{j=i}^m Pr{X^n ∈ T_n(j)}    (110)
    ≥ Σ_{j=i}^m (α_j − γ)    (111)
    ≥ 1 − A_i − mγ    (112)
    ≥ 1 − ε.    (113)
From (113), we can see that

    Q_n(x^n) ≜ P_{X^n}(x^n) if x^n ∈ ∪_{j=i}^m T_n(j);  Q_n(x^n) ≜ 0 otherwise    (114)

satisfies Q_n ∈ B^ε(P_{X^n}). Thus, from the definition of r^ε_α(P_{X^n}),

    r^ε_α(P_{X^n}) ≤ Σ_{x^n∈X^n} [Q_n(x^n)]^α    (115)
    = Σ_{j=i}^m Σ_{x^n∈T_n(j)} [P_{X^n}(x^n)]^α    (116)
    ≤ Σ_{j=i}^m |T_n(j)| exp{−αn(H(X_j) − γ)}    (117)
    ≤ Σ_{j=i}^m exp{n(H(X_j) + γ)} exp{−αn(H(X_j) − γ)}    (118)
    = Σ_{j=i}^m exp{n[(1 − α)H(X_j) + (1 + α)γ]}    (119)
    ≤ m exp{n[(1 − α)H(X_i) + (1 + α)γ]}.    (120)

Hence, we have

    (1/n) H^ε_α(P_{X^n}) ≤ H(X_i) + ((1 + α)/(1 − α))γ + log m/(n(1 − α))    (121)

and thus

    lim sup_{n→∞} (1/n) H^ε_α(P_{X^n}) ≤ H(X_i) + ((1 + α)/(1 − α))γ.    (122)

Since we can choose γ > 0 arbitrarily small, we have (107).

Proof of (108): If H(X_i) = 0, then (108) is apparent, since (20) holds. So, we assume H(X_i) > 0. Fix γ > 0 sufficiently small so that H(X_j) − γ > H(X_{j+1}) + γ for all j = 1, 2, ..., m − 1 and that A_i + 6mγ < ε < A_{i+1} − 5mγ. We assume that n is sufficiently large so that exp{−n[H(X_i) − γ]} ≤ mγ. Let us define T_n(j) as in (109). Note that

    P_{X^n}(x^n) < P_{X^n}(x̂^n),  x^n ∈ T_n(j), x̂^n ∈ T_n(ĵ), j < ĵ.    (123)

Let S_n ≜ ∪_{j=1}^m T_n(j) and S^c_n ≜ X^n \ S_n. Then, from Lemma 3, we have

    P_{X^n}(S^c_n) ≤ mγ.    (124)

Let us sort the sequences in X^n so that

    P_{X^n}(x^n_1) ≥ P_{X^n}(x^n_2) ≥ P_{X^n}(x^n_3) ≥ ···.    (125)

Then, let A_n ≜ {x^n_1, x^n_2, ..., x^n_{k*−1}} and A^+_n ≜ A_n ∪ {x^n_{k*}}, where k* is the integer satisfying

    Σ_{k=1}^{k*} P_{X^n}(x^n_k) ≥ 1 − ε    (126)

and

    Σ_{k=1}^{k*−1} P_{X^n}(x^n_k) < 1 − ε.    (127)

We first show that

    x^n_{k*} ∈ S^c_n  or  x^n_{k*} ∈ ∪_{j=1}^i T_n(j).    (128)

From Lemma 3, we have

    Pr{ X^n ∈ ∪_{j=i+1}^m T_n(j) } ≤ Σ_{j=i+1}^m (α_j + 2γ)    (129)
    ≤ 1 − A_{i+1} + 2mγ    (130)
    ≤ 1 − ε − 3mγ.    (131)

Since P(A^+_n) ≥ 1 − ε holds, from (124) and (131), we have

    A^+_n ∩ ∪_{j=1}^i T_n(j) ≠ ∅.    (132)

From (123) and (132), we can obtain (128). We next notice that, from (123) and the assumption that n is sufficiently large, we have P_{X^n}(x^n) ≤ exp{−n[H(X_i) − γ]} ≤ mγ for all x^n ∈ ∪_{j=1}^i T_n(j). Combining this fact with (124) and (128), we can see that

    P_{X^n}(A_n ∩ S_n) ≥ 1 − ε − mγ − P_{X^n}(S^c_n)    (133)
    ≥ 1 − ε − 2mγ.    (134)

Thus, from (131) and (134), we have

    Pr{ X^n ∈ A_n ∩ ∪_{j=1}^i T_n(j) } ≥ mγ.    (135)

Moreover, since (123) holds, (135) implies that

    P_{X^n}(A_n ∩ T_n(i)) ≥ β ≜ min{mγ, α_i − γ}    (136)

and thus

    |A_n ∩ T_n(i)| ≥ β exp{n[H(X_i) − γ]}.    (137)

Hence, we have

    Σ_{x^n∈A_n∩T_n(i)} [P_{X^n}(x^n)]^α ≥ β exp{n[(1 − α)H(X_i) − (1 + α)γ]}.    (138)
Now we use the result of Koga [7]. Theorem 1 (A) of [7] tells us that

    H^ε_α(P_{X^n}) ≥ (1/(1 − α)) log( Σ_{k=1}^{k*−1} [P_{X^n}(x^n_k)]^α )    (139)
    = (1/(1 − α)) log( Σ_{x^n∈A_n} [P_{X^n}(x^n)]^α ).    (140)

By combining this with (138), we have

    (1/n) H^ε_α(P_{X^n}) ≥ (1/(n(1 − α))) log( Σ_{x^n∈A_n∩T_n(i)} [P_{X^n}(x^n)]^α )    (141)
    ≥ (1/(n(1 − α))) log( β exp{n[(1 − α)H(X_i) − (1 + α)γ]} )    (142)
    = H(X_i) − ((1 + α)/(1 − α))γ + log β/(n(1 − α)).    (143)

Thus, we have

    lim inf_{n→∞} (1/n) H^ε_α(P_{X^n}) ≥ H(X_i) − ((1 + α)/(1 − α))γ.    (144)

Since we can choose γ > 0 arbitrarily small, we have (108).

ACKNOWLEDGMENT
The author would like to thank Prof. Mitsugu Iwamoto for informing him about [14].

REFERENCES

[1] R. Renner and S. Wolf, "Smooth Rényi entropy and applications," in Proc. IEEE ISIT 2004, 2004, p. 232.
[2] ——, "Simple and tight bounds for information reconciliation and privacy amplification," in Advances in Cryptology—ASIACRYPT 2005. Springer, 2005, pp. 199–216.
[3] A. Rényi, "On measures of entropy and information," in Proc. 4th Berkeley Symp. on Math. Stat. and Prob., 1961, pp. 547–561.
[4] H. Koga and H. Yamamoto, "Asymptotic properties on codeword lengths of an optimal FV code for general sources," IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1546–1555, Apr. 2005.
[5] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1147–1157, Jul. 1994.
[6] T. S. Han, Information-Spectrum Methods in Information Theory. New York: Springer-Verlag, 2002.
[7] H. Koga, "Characterization of the smooth Rényi entropy using majorization," in Proc. 2013 IEEE Information Theory Workshop (ITW2013), 2013.
[8] T. Uyematsu, "A new unified method for fixed-length source coding problems of general sources," IEICE Trans. Fundamentals, vol. E93-A, no. 11, pp. 1868–1877, 2010.
[9] T. Uyematsu and S. Kunimatsu, "A new unified method for intrinsic randomness problems of general sources," in Proc. 2013 IEEE Information Theory Workshop (ITW2013), Sept. 2013, pp. 624–628.
[10] S. Saito and T. Matsushima, "On the achievable overflow threshold of variable-length coding using smooth max-entropy for general sources," in Proc. of the 38th Symposium on Information Theory and Its Applications (SITA2015), Okayama, Japan, Nov. 2015, pp. 142–146, in Japanese.
[11] N. Datta and R. Renner, "Smooth entropies and the quantum information spectrum," IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2807–2815, 2009.
[12] L. Wang, R. Colbeck, and R. Renner, "Simple channel coding bounds," in Proc. 2009 IEEE International Symposium on Information Theory (ISIT2009), 2009, pp. 1804–1808.
[13] N. A. Warsi, "One-shot bounds for various information theoretic problems using smooth min and max Rényi divergence," in Proc. 2013 IEEE Information Theory Workshop (ITW2013), 2013.
[14] L. Campbell, "A coding theorem and Rényi's entropy," Information and Control, vol. 8, no. 4, pp. 423–429, 1965.
[15] F. Jelinek, "Buffer overflow in variable length coding of fixed rate sources," IEEE Trans. Inf. Theory, vol. 14, no. 3, pp. 490–501, 1968.
[16] H. Shimokawa, "Rényi's entropy and error exponent of source coding with countably infinite alphabet," 2006, pp. 1831–1835.
[17] N. Merhav, "On optimum strategies for minimizing exponential moments of a given cost function," Communications in Information and Systems, vol. 11, no. 4, pp. 343–368, 2011.
[18] ——, "On optimum strategies for minimizing the exponential moments of a given cost function." [Online]. Available: http://arxiv.org/abs/1103.2882
[19] I. Kontoyiannis and S. Verdú, "Optimal lossless data compression: Non-asymptotics and asymptotics," IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 777–795, 2014.
[20] T. Courtade and S. Verdú, "Cumulant generating function of codeword lengths in optimal lossless compression," in Proc. 2014 IEEE International Symposium on Information Theory (ISIT2014), June 2014, pp. 2494–2498.
[21] V. Kostina, Y. Polyanskiy, and S. Verdú, "Variable-length compression allowing errors," in Proc. 2014 IEEE International Symposium on Information Theory (ISIT2014), June 2014, pp. 2679–2683.
[22] ——, "Variable-length compression allowing errors (extended)," arXiv:1402.0608.
[23] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. John Wiley & Sons, Inc., 2006.