Codes for Tasks and Rényi Entropy Rate
Christoph Bunte and Amos Lapidoth
ETH Zurich
Email: {bunte,lapidoth}@isi.ee.ethz.ch

Abstract—A task is randomly drawn from a finite set of tasks and is described using a fixed number of bits. All the tasks that share its description must be performed. Upper and lower bounds on the minimum $\rho$-th moment of the number of performed tasks are derived. The key is an analog of the Kraft Inequality for partitions of finite sets. When a sequence of tasks is produced by a source of a given Rényi entropy rate of order $1/(1+\rho)$ and $n$ tasks are jointly described using $nR$ bits, it is shown that for $R$ larger than the Rényi entropy rate, the $\rho$-th moment of the ratio of performed tasks to $n$ can be driven to one as $n$ tends to infinity, and that for $R$ less than the Rényi entropy rate it tends to infinity. This generalizes a recent result for IID sources by the same authors. A mismatched version of the direct part is also considered, where the code is designed according to the wrong law. The penalty incurred by the mismatch can be expressed in terms of a divergence measure that was shown by Sundaresan to play a similar role in the Massey-Arikan guessing problem.

I. INTRODUCTION
You are asked to complete a task $X$ drawn according to a PMF $P$ from a finite set of tasks $\mathcal{X}$. You do not get to see $X$ but only its description $f(X)$, where

  $f\colon \mathcal{X} \to \{1, \ldots, M\}$.   (1)

In other words, $X$ is described to you using $\log M$ bits. You know the mapping $f$ and you promise to complete $X$ based on $f(X)$, which leaves you no choice but to complete every task in the set

  $f^{-1}(f(X)) = \{x \in \mathcal{X} : f(x) = f(X)\}$.   (2)

In the interesting case where $M < |\mathcal{X}|$, you will sometimes have to perform multiple tasks, of which all but one are superfluous. (We use $|\cdot|$ to denote the cardinality of sets.) Given $M$, the goal is to design $f$ so as to minimize the $\rho$-th moment of the number of tasks you perform

  $\mathsf{E}\bigl[|f^{-1}(f(X))|^{\rho}\bigr] = \sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho}$,   (3)

where $\rho$ is some given positive number. This minimum is at least one because $X$ is in $f^{-1}(f(X))$; it decreases as $M$ increases; and it is equal to one when $M \ge |\mathcal{X}|$.

Our first result is a pair of upper and lower bounds on this minimum as a function of $M$. The bounds are expressed in terms of the Rényi entropy of $X$ of order $1/(1+\rho)$:

  $H_{1/(1+\rho)}(X) = \frac{1+\rho}{\rho} \log \sum_{x\in\mathcal{X}} P(x)^{1/(1+\rho)}$.   (4)

Throughout, $\log(\cdot)$ stands for $\log_2(\cdot)$, the logarithm to base 2. For typographic reasons we henceforth use the notation

  $\tilde\rho = \frac{1}{1+\rho}, \quad \rho > 0$.   (5)
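As a concrete numerical illustration of (4) and (5), the following sketch computes the Rényi entropy of order $\tilde\rho = 1/(1+\rho)$ for a small example PMF (the PMF and the function name are ours, chosen for illustration; they are not from the paper):

```python
import math

def renyi_entropy(pmf, alpha):
    """Rényi entropy of order alpha (in bits): (1/(1-alpha)) * log2(sum of p^alpha)."""
    assert alpha > 0 and alpha != 1
    return math.log2(sum(p ** alpha for p in pmf if p > 0)) / (1 - alpha)

rho = 1.0
alpha = 1 / (1 + rho)            # the order 1/(1+rho) from (5)
pmf = [0.5, 0.25, 0.125, 0.125]

# Since 1 - alpha = rho/(1+rho), this equals
# ((1+rho)/rho) * log2(sum_x P(x)^(1/(1+rho))), matching (4).
h = renyi_entropy(pmf, alpha)    # about 1.87 bits; the Shannon entropy is 1.75 bits
```

For a uniform PMF on $n$ points the Rényi entropy of every order is $\log n$, which is a quick sanity check on the implementation.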
Theorem I.1. Let $\rho > 0$.

1) For all positive integers $M$ and every $f\colon \mathcal{X} \to \{1,\ldots,M\}$,

  $\mathsf{E}\bigl[|f^{-1}(f(X))|^{\rho}\bigr] \ge 2^{\rho(H_{\tilde\rho}(X) - \log M)}$.   (6)

2) For every integer $M > \log|\mathcal{X}| + 2$ there exists $f\colon \mathcal{X} \to \{1,\ldots,M\}$ such that

  $\mathsf{E}\bigl[|f^{-1}(f(X))|^{\rho}\bigr] < 1 + 2^{\rho(H_{\tilde\rho}(X) - \log \tilde{M})}$,   (7)

where $\tilde{M} = (M - \log|\mathcal{X}| - 2)/4$.

A proof is provided in Section II. The lower bound is essentially [1, Lemma III.1].

Theorem I.1 is particularly useful when applied to the case where a sequence of tasks is produced by a source $\{X_i\}_{i=1}^{\infty}$ with alphabet $\mathcal{X}$ and the first $n$ tasks $X^n = (X_1, \ldots, X_n)$ are jointly described using $nR$ bits:

  $f\colon \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}$.   (8)

We assume that the order in which the tasks are performed matters and that every $n$-tuple of tasks in the set $f^{-1}(f(X^n))$ must be performed. The total number of performed tasks is therefore $n\,|f^{-1}(f(X^n))|$, and the ratio of the number of performed tasks to the number of assigned tasks is $|f^{-1}(f(X^n))|$.
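The lower bound (6) can be checked by brute force on a toy instance: for a small alphabet one can enumerate every encoder $f\colon \mathcal{X} \to \{1,\ldots,M\}$, evaluate the $\rho$-th moment (3), and compare the minimum against $2^{\rho(H_{\tilde\rho}(X)-\log M)}$. A sketch (the example PMF and all names are ours):

```python
import itertools
import math

def moment(pmf, f, rho):
    """E[|f^{-1}(f(X))|^rho] for an encoder f given as a tuple of labels."""
    cell_size = {m: f.count(m) for m in set(f)}
    return sum(p * cell_size[f[i]] ** rho for i, p in enumerate(pmf))

pmf = [0.5, 0.25, 0.125, 0.125]
rho, M = 1.0, 2
alpha = 1 / (1 + rho)
H = math.log2(sum(p ** alpha for p in pmf)) / (1 - alpha)

# Exhaustive search over all M^|X| encoders (feasible only for toy sizes).
best = min(moment(pmf, f, rho)
           for f in itertools.product(range(M), repeat=len(pmf)))
lower = 2 ** (rho * (H - math.log2(M)))  # about 1.83 here, while best = 2.0
assert lower <= best + 1e-9
```

Here every partition of the four tasks into two cells yields a moment of exactly 2.0, so the bound (6) is not tight for this instance but is respected.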
Theorem I.2. Let $\{X_i\}_{i=1}^{\infty}$ be any source with finite alphabet $\mathcal{X}$.

1) If $R > \limsup_{n\to\infty} H_{\tilde\rho}(X^n)/n$, then there exist encoders $f_n\colon \mathcal{X}^n \to \{1,\ldots,2^{nR}\}$ such that

  $\lim_{n\to\infty} \mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] = 1$.   (9)

2) If $R < \liminf_{n\to\infty} H_{\tilde\rho}(X^n)/n$, then for any choice of encoders $f_n\colon \mathcal{X}^n \to \{1,\ldots,2^{nR}\}$,

  $\lim_{n\to\infty} \mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] = \infty$.   (10)

Proof:
On account of Theorem I.1, for all $n$ large enough so that $nR > n\log|\mathcal{X}| + 2$,

  $2^{n\rho\bigl(\frac{H_{\tilde\rho}(X^n)}{n} - R\bigr)} \le \min_{f_n\colon \mathcal{X}^n\to\{1,\ldots,2^{nR}\}} \mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] < 1 + 2^{n\rho\bigl(\frac{H_{\tilde\rho}(X^n)}{n} - R + \delta_n\bigr)}$,   (11)

where $\delta_n \to 0$ as $n \to \infty$. (Throughout, $2^{nR}$ stands for $\lfloor 2^{nR}\rfloor$.)

When it exists, the limit

  $\lim_{n\to\infty} \frac{H_\alpha(X^n)}{n}$   (12)

is called the Rényi entropy rate of order $\alpha$. It exists for a large class of sources, including time-invariant Markov sources [2]–[4]. Theorem I.2 generalizes [1, Theorem IV.1] from IID sources to sources with memory and furnishes an operational characterization of the Rényi entropy rate for all orders in $(0,1)$. Note that for IID sources the Rényi entropy rate reduces to the Rényi entropy because in this case $H_{\tilde\rho}(X^n) = nH_{\tilde\rho}(X_1)$.

The proof of the lower bound in Theorem I.1 hinges on the following simple observation.

Proposition I.3. If $\mathcal{L}_1, \ldots, \mathcal{L}_M$ is a partition of a finite set $\mathcal{X}$ into $M$ nonempty subsets (i.e., $\bigcup_{m=1}^{M} \mathcal{L}_m = \mathcal{X}$ and $\mathcal{L}_m \cap \mathcal{L}_{m'} = \emptyset$ if, and only if, $m' \neq m$), and $L(x)$ is the cardinality of the subset containing $x$, then

  $\sum_{x\in\mathcal{X}} \frac{1}{L(x)} = M$.   (13)

Proof:

  $\sum_{x\in\mathcal{X}} \frac{1}{L(x)} = \sum_{m=1}^{M} \sum_{x\in\mathcal{L}_m} \frac{1}{L(x)}$   (14)

  $= \sum_{m=1}^{M} \sum_{x\in\mathcal{L}_m} \frac{1}{|\mathcal{L}_m|}$   (15)

  $= M$.   (16)

Note that the reverse of Proposition I.3 is not true in the sense that if $\lambda\colon \mathcal{X} \to \mathbb{N} = \{1, 2, \ldots\}$ satisfies

  $\sum_{x\in\mathcal{X}} \frac{1}{\lambda(x)} = \mu$,   (17)

then there need not exist a partition of $\mathcal{X}$ into $\lceil\mu\rceil$ subsets such that the cardinality of the subset containing $x$ is at most $\lambda(x)$. A counterexample is $\mathcal{X} = \{a, b, c, d\}$ with $\lambda(a) = 1$, $\lambda(b) = 2$, and $\lambda(c) = \lambda(d) = 4$. In this example, $\mu = 2$, but we need 3 subsets to satisfy the cardinality constraints. However, as our next result (Proposition I.4) shows, allowing a slightly larger number of subsets suffices.
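Both Proposition I.3 and the counterexample admit a mechanical check. The sketch below (all names are ours) verifies that $\sum_x 1/L(x)$ equals the number of blocks, and that no 2-block partition of $\{a,b,c,d\}$ respects the budgets $\lambda$ while a 3-block partition does:

```python
from fractions import Fraction
from itertools import product

def inverse_size_sum(partition):
    """Sum of 1/L(x) over all x, where L(x) is the size of x's block."""
    return sum(Fraction(1, len(block)) for block in partition for _ in block)

# Proposition I.3: the sum equals the number of blocks.
assert inverse_size_sum([{'a'}, {'b', 'c'}, {'d', 'e', 'f'}]) == 3

# Counterexample to the reverse: mu = 1/1 + 1/2 + 1/4 + 1/4 = 2,
# yet 3 subsets are needed to keep each block no larger than lambda(x).
lam = {'a': 1, 'b': 2, 'c': 4, 'd': 4}
assert sum(Fraction(1, v) for v in lam.values()) == 2

def respects_budgets(labels, budgets):
    """labels[i] is the block label assigned to the i-th element of budgets."""
    keys = list(budgets)
    size = {m: labels.count(m) for m in labels}
    return all(size[labels[i]] <= budgets[k] for i, k in enumerate(keys))

# No labeling with 2 blocks works; some labeling with 3 blocks does.
assert not any(respects_budgets(f, lam) for f in product(range(2), repeat=4))
assert any(respects_budgets(f, lam) for f in product(range(3), repeat=4))
```

The exhaustive check over labelings is of course only viable for toy alphabets; it is meant to make the counterexample concrete.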
Proposition I.4. If $\mathcal{X}$ is a finite set, $\lambda\colon \mathcal{X} \to \mathbb{N} \cup \{+\infty\}$, and

  $\sum_{x\in\mathcal{X}} \frac{1}{\lambda(x)} = \mu$   (18)

(with the convention $1/\infty = 0$), then there exists a partition of $\mathcal{X}$ into at most

  $\min_{\alpha > 1} \bigl\lfloor \alpha\mu + \log_\alpha|\mathcal{X}| + 2 \bigr\rfloor$   (19)

subsets such that

  $L(x) \le \min\{\lambda(x), |\mathcal{X}|\}$, for all $x \in \mathcal{X}$,   (20)

where $L(x)$ is the cardinality of the subset containing $x$.

Proposition I.4 is the key to the upper bound in Theorem I.1. Combined with Proposition I.3, it can be considered an analog of the Kraft Inequality [5, Theorem 5.5.1] for partitions of finite sets. A proof is given in Section III.

The construction of the encoder in the derivation of the upper bound in Theorem I.1 requires knowledge of the distribution $P$ of $X$ (see Section II-B). In Section IV we consider a mismatched version of this direct part where the construction is carried out based on the law $Q$ instead of $P$. We show that the penalty incurred by the mismatch between $P$ and $Q$ can be expressed in terms of the divergence measures

  $\Delta_\alpha(P\|Q) \triangleq \log\Biggl[\Bigl(\sum_{x\in\mathcal{X}} Q(x)^{\alpha}\Bigr) \Bigl(\sum_{x\in\mathcal{X}} P(x)^{\alpha}\Bigr)^{-\frac{1}{1-\alpha}} \Bigl(\sum_{x\in\mathcal{X}} P(x)\,Q(x)^{\alpha-1}\Bigr)^{\frac{\alpha}{1-\alpha}}\Biggr]$,   (21)

where $\alpha$ can be any positive number not equal to one. (We use the conventions $1/0 = \infty$ and $a/\infty = 0$ for $a > 0$.) This family of divergence measures was proposed by Sundaresan [6], who showed that it plays a similar role in the Massey-Arikan guessing problem [7], [8].

II. PROOF OF THEOREM I.1
A. The Lower Bound (Converse)
The proof of the lower bound is inspired by the proof of [8, Theorem 1]. Fix an encoder $f\colon \mathcal{X} \to \{1,\ldots,M\}$, and note that it gives rise to a partition of $\mathcal{X}$ into the $M$ subsets

  $\{x \in \mathcal{X} : f(x) = m\}$, $m \in \{1,\ldots,M\}$.   (22)

Let $N$ denote the number of nonempty subsets in this partition. Also note that for this partition the cardinality of the subset containing $x$ is

  $L(x) = |f^{-1}(f(x))|$, for all $x \in \mathcal{X}$.   (23)

Recall Hölder's Inequality: If $a, b\colon \mathcal{X} \to [0,\infty)$, $p, q > 1$, and $1/p + 1/q = 1$, then

  $\sum_{x\in\mathcal{X}} a(x)\,b(x) \le \Bigl(\sum_{x\in\mathcal{X}} a(x)^{p}\Bigr)^{1/p} \Bigl(\sum_{x\in\mathcal{X}} b(x)^{q}\Bigr)^{1/q}$.   (24)

Rearranging (24) gives

  $\sum_{x\in\mathcal{X}} a(x)^{p} \ge \Bigl(\sum_{x\in\mathcal{X}} b(x)^{q}\Bigr)^{-p/q} \Bigl(\sum_{x\in\mathcal{X}} a(x)\,b(x)\Bigr)^{p}$.   (25)

Substituting $p = 1+\rho$, $q = (1+\rho)/\rho$, $a(x) = P(x)^{\frac{1}{1+\rho}}\, |f^{-1}(f(x))|^{\frac{\rho}{1+\rho}}$, and $b(x) = |f^{-1}(f(x))|^{-\frac{\rho}{1+\rho}}$ in (25), we obtain

  $\sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho}$   (26)

  $\ge \Bigl(\sum_{x\in\mathcal{X}} \frac{1}{|f^{-1}(f(x))|}\Bigr)^{-\rho} \Bigl(\sum_{x\in\mathcal{X}} P(x)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho}$   (27)

  $= 2^{\rho(H_{\tilde\rho}(X) - \log N)}$   (28)

  $\ge 2^{\rho(H_{\tilde\rho}(X) - \log M)}$,   (29)

where (28) follows from (4), (23), and Proposition I.3; and where (29) follows because $N \le M$.

B. The Upper Bound (Direct Part)

Since Hölder's Inequality (24) holds with equality if, and only if, (iff) $a(x)^{p}$ is proportional to $b(x)^{q}$, it follows that the lower bound in Theorem I.1 holds with equality iff $|f^{-1}(f(x))|$ is proportional to $P(x)^{-1/(1+\rho)}$. We derive the upper bound in Theorem I.1 by constructing a partition that approximately satisfies this relationship. To this end, we use Proposition I.4 with $\alpha = 2$ in (19) and

  $\lambda(x) = \begin{cases} \bigl\lceil \beta\, P(x)^{-\tilde\rho} \bigr\rceil & \text{if } P(x) > 0, \\ +\infty & \text{if } P(x) = 0, \end{cases}$   (30)

where we choose $\beta$ just large enough to guarantee the existence of a partition of $\mathcal{X}$ into at most $M$ subsets satisfying (20). This is accomplished by the choice

  $\beta = \frac{2\sum_{x\in\mathcal{X}} P(x)^{\tilde\rho}}{M - \log|\mathcal{X}| - 2}$.   (31)

(This is where we need $M > \log|\mathcal{X}| + 2$.)
Indeed,

  $\mu = \sum_{x\in\mathcal{X}} \frac{1}{\lambda(x)}$   (32)

  $\le \sum_{x\in\mathcal{X}} \frac{P(x)^{\tilde\rho}}{\beta}$   (33)

  $= \frac{M - \log|\mathcal{X}| - 2}{2}$,   (34)

and hence

  $2\mu + \log|\mathcal{X}| + 2 \le M$.   (35)

Let then the partition $\mathcal{L}_1, \ldots, \mathcal{L}_N$ with $N \le M$ be as promised by Proposition I.4, and construct $f\colon \mathcal{X} \to \{1,\ldots,M\}$ by setting $f(x) = m$ if $x \in \mathcal{L}_m$. For this encoder,

  $\sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho} = \sum_{x: P(x)>0} P(x)\,L(x)^{\rho}$   (36)

  $\le \sum_{x: P(x)>0} P(x)\,\lambda(x)^{\rho}$   (37)

  $< 1 + 2^{\rho(H_{\tilde\rho}(X) - \log \tilde{M})}$,   (38)

where the strict inequality follows from (30), from the observation that the terms with $\lambda(x) = 1$ contribute at most one in total, and from the inequality

  $\lceil\xi\rceil^{\rho} < 2^{\rho}\,\xi^{\rho}$, for all $\xi \ge 1$,   (39)

which is easily checked by considering separately the cases $1 \le \xi \le 2$ and $\xi > 2$.

III. PROOF OF PROPOSITION I.4

We describe a procedure for constructing a partition of $\mathcal{X}$ with the desired properties. Since the labels do not matter, we may assume for convenience of notation that $\mathcal{X} = \{1, \ldots, |\mathcal{X}|\}$ and

  $\lambda(1) \le \lambda(2) \le \cdots \le \lambda(|\mathcal{X}|)$.   (40)

The first subset in the partition we construct is

  $\mathcal{L}_0 = \{x \in \mathcal{X} : \lambda(x) \ge |\mathcal{X}|\}$.   (41)

If $\mathcal{X} = \mathcal{L}_0$, then the construction is complete and (19) and (20) are clearly satisfied. Otherwise we follow the steps below to construct additional subsets $\mathcal{L}_1, \ldots, \mathcal{L}_M$.

Step 1: If

  $|\mathcal{X} \setminus \mathcal{L}_0| \le \lambda(1)$,   (42)

then we complete the construction by setting $\mathcal{L}_1 = \mathcal{X} \setminus \mathcal{L}_0$ and $M = 1$. Otherwise we set

  $\mathcal{L}_1 = \{1, \ldots, \lambda(1)\}$   (43)

and go to Step 2.

Step $m \ge 2$: If

  $\Bigl|\mathcal{X} \setminus \bigcup_{i=0}^{m-1} \mathcal{L}_i\Bigr| \le \lambda(|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1)$,   (44)

then we complete the construction by setting $\mathcal{L}_m = \mathcal{X} \setminus \bigcup_{i=0}^{m-1} \mathcal{L}_i$ and $M = m$. Otherwise we let $\mathcal{L}_m$ contain the $\lambda(|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1)$ smallest elements of $\mathcal{X} \setminus \bigcup_{i=0}^{m-1} \mathcal{L}_i$, i.e., we set

  $\mathcal{L}_m = \bigl\{|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1, \;\ldots,\; |\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + \lambda(|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1)\bigr\}$   (45)

and go to Step $m+1$.

We next verify that (20) is satisfied and that the total number of subsets $M+1$ does not exceed (19). Clearly, $L(x) \le |\mathcal{X}|$ for every $x \in \mathcal{X}$, so to prove (20) we check that $L(x) \le \lambda(x)$ for every $x \in \mathcal{X}$. It is clear that $L(x) \le \lambda(x)$ for all $x \in \mathcal{L}_0$. Let $k(x)$ denote the smallest element in the subset containing $x$. Then $L(x) \le \lambda(k(x))$ for all $x \in \bigcup_{m=1}^{M} \mathcal{L}_m$ by construction, and since $k(x) \le x$, we have $\lambda(k(x)) \le \lambda(x)$ by the assumption (40), and hence $L(x) \le \lambda(x)$ for all $x \in \mathcal{X}$.

It remains to check that $M+1$ does not exceed (19). This is clearly true when $M = 1$, so we assume that $M \ge 2$. Since $L(x) = \lambda(k(x))$ for all $x \in \bigcup_{m=1}^{M-1} \mathcal{L}_m$, we have on account of Proposition I.3

  $M = \sum_{x \in \bigcup_{m=1}^{M} \mathcal{L}_m} \frac{1}{L(x)}$   (46)

  $= 1 + \sum_{x \in \bigcup_{m=1}^{M-1} \mathcal{L}_m} \frac{1}{L(x)}$   (47)

  $= 1 + \sum_{x \in \bigcup_{m=1}^{M-1} \mathcal{L}_m} \frac{1}{\lambda(k(x))}$.   (48)

Fix an arbitrary $\alpha > 1$ and let $\mathcal{M}$ be the set of indices $m \in \{1,\ldots,M-1\}$ such that there is an $x \in \mathcal{L}_m$ with $\lambda(x) > \alpha\lambda(k(x))$. We next argue that $|\mathcal{M}| < \log_\alpha|\mathcal{X}|$. To this end, enumerate the indices in $\mathcal{M}$ as $m_1 < m_2 < \cdots < m_{|\mathcal{M}|}$. For each $i \in \{1,\ldots,|\mathcal{M}|\}$ select $x_i \in \mathcal{L}_{m_i}$ such that $\lambda(x_i) > \alpha\lambda(k(x_i))$. Then

  $\lambda(x_1) > \alpha\lambda(k(x_1))$   (49)

  $\ge \alpha$.   (50)

Note that if $m < m'$ and $x \in \mathcal{L}_m$ and $x' \in \mathcal{L}_{m'}$, then $x < x'$. Thus, $x_1 < k(x_2)$ because $x_1 \in \mathcal{L}_{m_1}$ and $k(x_2) \in \mathcal{L}_{m_2}$, and $m_1 < m_2$. Consequently,

  $\lambda(x_2) > \alpha\lambda(k(x_2))$   (51)

  $\ge \alpha\lambda(x_1)$   (52)

  $> \alpha^2$.   (53)

Iterating this argument shows that

  $\lambda(x_{|\mathcal{M}|}) > \alpha^{|\mathcal{M}|}$.   (54)

And since $\lambda(x) \le |\mathcal{X}|$ for $x \in \bigcup_{m=1}^{M} \mathcal{L}_m$ by (41), it follows that $|\mathcal{M}| < \log_\alpha|\mathcal{X}|$. Continuing from (48) with $\mathcal{M}^c \triangleq \{1,\ldots,M-1\} \setminus \mathcal{M}$,

  $M = 1 + |\mathcal{M}| + \sum_{x \in \bigcup_{m\in\mathcal{M}^c} \mathcal{L}_m} \frac{1}{\lambda(k(x))}$   (55)

  $< 1 + \log_\alpha|\mathcal{X}| + \alpha \sum_{x \in \bigcup_{m\in\mathcal{M}^c} \mathcal{L}_m} \frac{1}{\lambda(x)}$   (56)

  $\le 1 + \log_\alpha|\mathcal{X}| + \alpha\mu$,   (57)

where the first inequality follows because $|\mathcal{M}| < \log_\alpha|\mathcal{X}|$ and because $\lambda(x) \le \alpha\lambda(k(x))$ for $x \in \bigcup_{m\in\mathcal{M}^c} \mathcal{L}_m$, and where the second inequality follows from the hypothesis of the proposition. Since $M+1$ is an integer and $\alpha > 1$ is arbitrary, it follows from (57) that $M+1$ is upper-bounded by (19).
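The procedure above translates directly into code. The following sketch (our own naming; a simplified rendering of the construction, with budgets sorted ascending as in (40)) builds the blocks greedily and checks the guarantees (20) and (19) (the latter with $\alpha = 2$) on the counterexample of Section I:

```python
import math

def partition_by_budget(lam):
    """Greedy partition from the proof of Proposition I.4 (a sketch).

    lam maps each element x to its budget lambda(x) (math.inf allowed).
    Returns blocks such that the block containing x has at most
    min(lam[x], len(lam)) elements."""
    n = len(lam)
    elems = sorted(lam, key=lambda x: lam[x])   # assumption (40)
    big = [x for x in elems if lam[x] >= n]     # the subset L_0 of (41)
    blocks = [big] if big else []
    rest = [x for x in elems if lam[x] < n]
    while rest:
        k = int(lam[rest[0]])   # budget of the smallest remaining element
        if len(rest) <= k:      # terminal condition, as in (42)/(44)
            blocks.append(rest)
            rest = []
        else:                   # take the k smallest, as in (43)/(45)
            blocks.append(rest[:k])
            rest = rest[k:]
    return blocks

lam = {'a': 1, 'b': 2, 'c': 4, 'd': 4}
blocks = partition_by_budget(lam)
size_of = {x: len(b) for b in blocks for x in b}
assert all(size_of[x] <= min(lam[x], len(lam)) for x in lam)          # (20)
mu = sum(1 / lam[x] for x in lam)
assert len(blocks) <= math.floor(2 * mu + math.log2(len(lam)) + 2)    # (19)
assert len(blocks) == 3  # matches the 3 subsets needed in Section I
```

On this input the procedure produces the blocks $\{c,d\}$, $\{a\}$, $\{b\}$, i.e., exactly the 3 subsets that the counterexample showed to be necessary.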
IV. MISMATCH

The key to the upper bound in Theorem I.1 was to use Proposition I.4 with $\lambda$ as in (30) and (31) to obtain a partition of $\mathcal{X}$ for which the cardinality of the subset containing $x$ is approximately proportional to $P(x)^{-1/(1+\rho)}$. Evidently, this construction requires knowledge of the distribution $P$ of $X$. In this section, we derive the penalty when $P$ is replaced with $Q$ in (30) and (31). Since it is then still true that

  $\mu \le \frac{M - \log|\mathcal{X}| - 2}{2}$,   (58)

Proposition I.4 guarantees the existence of a partition of $\mathcal{X}$ into at most $M$ subsets satisfying (20). Constructing $f$ from this partition as in Section II-B and proceeding similarly as in (36) to (38), we obtain

  $\sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho} < 1 + 2^{\rho(H_{\tilde\rho}(X) + \Delta_{\tilde\rho}(P\|Q) - \log \tilde{M})}$,   (59)

where $\Delta_{\tilde\rho}(P\|Q)$ is as in (21) and $\tilde{M}$ is as in Theorem I.1. (Note that $\Delta_{\tilde\rho}(P\|Q) < \infty$ only if the support of $P$ is contained in the support of $Q$.) The penalty in the exponent when compared to the upper bound in Theorem I.1 is thus given by $\Delta_{\tilde\rho}(P\|Q)$. To reinforce this, further note that

  $\Delta_\alpha(P^n\|Q^n) = n\,\Delta_\alpha(P\|Q)$,   (60)

where $P^n$ and $Q^n$ are the $n$-fold products of $P$ and $Q$. Consequently, if the source $\{X_i\}_{i=1}^{\infty}$ is IID $P$ and we construct $f_n\colon \mathcal{X}^n \to \{1,\ldots,2^{nR}\}$ similarly as above based on $Q^n$ instead of $P^n$, we obtain the bound

  $\mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] < 1 + 2^{n\rho(H_{\tilde\rho}(X_1) + \Delta_{\tilde\rho}(P\|Q) - R + \delta_n)}$,   (61)

where $\delta_n \to 0$ as $n \to \infty$. The RHS of (61) tends to one provided that $R > H_{\tilde\rho}(X_1) + \Delta_{\tilde\rho}(P\|Q)$. Thus, in the IID case $\Delta_{\tilde\rho}(P\|Q)$ is the rate penalty incurred by the mismatch between $P$ and $Q$.

We conclude this section with some properties of $\Delta_\alpha(P\|Q)$. Properties 1–3 (see below) were given in [6]; we repeat them here for completeness. Note that Rényi's divergence (see, e.g., [9]),

  $D_\alpha(P\|Q) = \frac{1}{\alpha-1} \log \sum_{x\in\mathcal{X}} P(x)^{\alpha}\, Q(x)^{1-\alpha}$,   (62)

satisfies Properties 1 and 3 but none of the others in general.
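Numerically, $\Delta_\alpha(P\|Q)$ as in (21) and the additivity property (60) are easy to check. A sketch (our own names and example PMFs; it assumes $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$ and $0 < \alpha \neq 1$):

```python
import math

def sundaresan_div(p, q, alpha):
    """Delta_alpha(P||Q) in bits, following (21); assumes supp(P) in supp(Q)."""
    s_q = sum(qx ** alpha for qx in q)
    s_p = sum(px ** alpha for px in p)
    s_pq = sum(px * qx ** (alpha - 1) for px, qx in zip(p, q) if px > 0)
    return (math.log2(s_q)
            + alpha / (1 - alpha) * math.log2(s_pq)
            - 1 / (1 - alpha) * math.log2(s_p))

def product_pmf(p, n):
    """The n-fold product of a PMF, as a flat list."""
    out = [1.0]
    for _ in range(n):
        out = [a * b for a in out for b in p]
    return out

p, q = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]

# Property 1: strictly positive since p != q.
d1 = sundaresan_div(p, q, 0.5)
assert d1 > 0

# Additivity (60): Delta(P^3 || Q^3) = 3 * Delta(P || Q).
d3 = sundaresan_div(product_pmf(p, 3), product_pmf(q, 3), 0.5)
assert abs(d3 - 3 * d1) < 1e-9

# Property 3: as alpha -> 1 the divergence approaches D(P||Q).
kl = sum(px * math.log2(px / qx) for px, qx in zip(p, q))
assert abs(sundaresan_div(p, q, 0.9999) - kl) < 1e-2
```

Additivity holds because each of the three sums in (21) factorizes over product distributions, so the logarithms add.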
Proposition IV.1. Let $\mathrm{supp}(P)$ and $\mathrm{supp}(Q)$ denote the support sets of $P$ and $Q$. The functional $\Delta_\alpha(P\|Q)$ has the following properties.

1) $\Delta_\alpha(P\|Q) \ge 0$ with equality iff $P = Q$.

2) $\Delta_\alpha(P\|Q) = \infty$ iff ($0 < \alpha < 1$ and $\mathrm{supp}(P) \not\subseteq \mathrm{supp}(Q)$) or ($\alpha > 1$ and $\mathrm{supp}(P) \cap \mathrm{supp}(Q) = \emptyset$).

3) $\lim_{\alpha\to 1} \Delta_\alpha(P\|Q) = D(P\|Q)$.

4) $\lim_{\alpha\to 0} \Delta_\alpha(P\|Q) = \log\frac{|\mathrm{supp}(Q)|}{|\mathrm{supp}(P)|}$ if $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$.

5) $\lim_{\alpha\to\infty} \Delta_\alpha(P\|Q) = \log\frac{\bigl(\max_{x\in\mathcal{X}} P(x)\bigr)\,|\mathcal{Q}^\star|}{\sum_{x'\in\mathcal{Q}^\star} P(x')}$, where $\mathcal{Q}^\star = \bigl\{x \in \mathcal{X} : Q(x) = \max_{x'\in\mathcal{X}} Q(x')\bigr\}$.

Proof:
Property 2 follows by inspection of (21). Properties 3–5 follow by simple calculus. As to Property 1, consider first the case where $0 < \alpha < 1$. In view of Property 2, we may assume that $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$. Hölder's Inequality (24) with $p = 1/\alpha$ and $q = 1/(1-\alpha)$ gives

  $\sum_{x\in\mathcal{X}} P(x)^{\alpha} = \sum_{x\in\mathrm{supp}(P)} P(x)^{\alpha}\, Q(x)^{\alpha(\alpha-1)}\, Q(x)^{\alpha(1-\alpha)}$   (63)

  $\le \Bigl(\sum_{x\in\mathrm{supp}(P)} P(x)\,Q(x)^{\alpha-1}\Bigr)^{\alpha} \Bigl(\sum_{x\in\mathrm{supp}(P)} Q(x)^{\alpha}\Bigr)^{1-\alpha}$

  $\le \Bigl(\sum_{x\in\mathcal{X}} P(x)\,Q(x)^{\alpha-1}\Bigr)^{\alpha} \Bigl(\sum_{x\in\mathcal{X}} Q(x)^{\alpha}\Bigr)^{1-\alpha}$.   (64)

The conditions for equality in Hölder's Inequality imply that equality holds iff $P = Q$. Consider next the case where $\alpha > 1$. By Hölder's Inequality with $p = \alpha$ and $q = \alpha/(\alpha-1)$,

  $\sum_{x\in\mathcal{X}} P(x)\,Q(x)^{\alpha-1}$   (65)

  $\le \Bigl(\sum_{x\in\mathcal{X}} P(x)^{\alpha}\Bigr)^{1/\alpha} \Bigl(\sum_{x\in\mathcal{X}} Q(x)^{\alpha}\Bigr)^{\frac{\alpha-1}{\alpha}}$,   (66)

with equality iff $P = Q$.

REFERENCES

[1] C. Bunte and A. Lapidoth, "Source coding, lists, and Rényi entropy," in Information Theory Workshop (ITW), 2013 IEEE, 2013, pp. 350–354.
[2] Z. Rached, F. Alajaji, and L. Campbell, "Rényi's divergence and entropy rates for finite alphabet Markov sources," IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1553–1561, 2001.
[3] C.-E. Pfister and W. Sullivan, "Rényi entropy, guesswork moments, and large deviations," IEEE Trans. Inf. Theory, vol. 50, no. 11, pp. 2794–2800, 2004.
[4] D. Malone and W. G. Sullivan, "Guesswork and entropy," IEEE Trans. Inf. Theory, vol. 50, no. 3, pp. 525–526, 2004.
[5] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: John Wiley & Sons, 2006.
[6] R. Sundaresan, "Guessing under source uncertainty," IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 269–287, 2007.
[7] J. Massey, "Guessing and entropy," in Information Theory Proceedings (ISIT), 1994 IEEE International Symposium on, 1994, p. 204.
[8] E. Arikan, "An inequality on guessing and its application to sequential decoding," IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 99–105, 1996.
[9] I. Csiszár, "Generalized cutoff rates and Rényi's information measures," IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 26–34, 1995.