Codes for Tasks and Rényi Entropy Rate
Christoph Bunte and Amos Lapidoth
ETH Zurich
Email: {bunte,lapidoth}@isi.ee.ethz.ch

Abstract—A task is randomly drawn from a finite set of tasks and is described using a fixed number of bits. All the tasks that share its description must be performed. Upper and lower bounds on the minimum $\rho$-th moment of the number of performed tasks are derived. The key is an analog of the Kraft Inequality for partitions of finite sets. When a sequence of tasks is produced by a source of a given Rényi entropy rate of order $1/(1+\rho)$ and $n$ tasks are jointly described using $nR$ bits, it is shown that for $R$ larger than the Rényi entropy rate, the $\rho$-th moment of the ratio of performed tasks to $n$ can be driven to one as $n$ tends to infinity, and that for $R$ less than the Rényi entropy rate it tends to infinity. This generalizes a recent result for IID sources by the same authors. A mismatched version of the direct part is also considered, where the code is designed according to the wrong law. The penalty incurred by the mismatch can be expressed in terms of a divergence measure that was shown by Sundaresan to play a similar role in the Massey-Arikan guessing problem.

I. INTRODUCTION
You are asked to complete a task $X$ drawn according to a PMF $P$ from a finite set of tasks $\mathcal{X}$. You do not get to see $X$ but only its description $f(X)$, where

  $f\colon \mathcal{X} \to \{1, \ldots, M\}$.   (1)

In other words, $X$ is described to you using $\log M$ bits. You know the mapping $f$ and you promise to complete $X$ based on $f(X)$, which leaves you no choice but to complete every task in the set

  $f^{-1}(f(X)) = \{x \in \mathcal{X} : f(x) = f(X)\}$.   (2)

In the interesting case where $M < |\mathcal{X}|$, you will sometimes have to perform multiple tasks, of which all but one are superfluous. (We use $|\cdot|$ to denote the cardinality of sets.) Given $M$, the goal is to design $f$ so as to minimize the $\rho$-th moment of the number of tasks you perform

  $\mathsf{E}\bigl[|f^{-1}(f(X))|^{\rho}\bigr] = \sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho}$,   (3)

where $\rho$ is some given positive number. This minimum is at least one because $X$ is in $f^{-1}(f(X))$; it decreases as $M$ increases; and it is equal to one when $M \ge |\mathcal{X}|$.

Our first result is a pair of upper and lower bounds on this minimum as a function of $M$. The bounds are expressed in terms of the Rényi entropy of $X$ of order $1/(1+\rho)$:

  $H_{1/(1+\rho)}(X) = \frac{1+\rho}{\rho} \log \sum_{x\in\mathcal{X}} P(x)^{1/(1+\rho)}$.   (4)

Throughout, $\log(\cdot)$ stands for $\log_2(\cdot)$, the logarithm to base 2. For typographic reasons we henceforth use the notation

  $\tilde\rho = \frac{1}{1+\rho}, \quad \rho > 0$.   (5)
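As a concrete numerical illustration of (4) and (5), the following sketch computes the Rényi entropy of order $\tilde\rho = 1/(1+\rho)$ for a small example PMF (the PMF and the function name are ours, chosen for illustration; they are not from the paper):

```python
import math

def renyi_entropy(pmf, alpha):
    """Rényi entropy of order alpha (in bits): (1/(1-alpha)) * log2(sum of p^alpha)."""
    assert alpha > 0 and alpha != 1
    return math.log2(sum(p ** alpha for p in pmf if p > 0)) / (1 - alpha)

rho = 1.0
alpha = 1 / (1 + rho)            # the order 1/(1+rho) from (5)
pmf = [0.5, 0.25, 0.125, 0.125]

# Since 1 - alpha = rho/(1+rho), this equals
# ((1+rho)/rho) * log2(sum_x P(x)^(1/(1+rho))), matching (4).
h = renyi_entropy(pmf, alpha)    # about 1.87 bits; the Shannon entropy is 1.75 bits
```

For a uniform PMF on $n$ points the Rényi entropy of every order is $\log n$, which is a quick sanity check on the implementation.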
Theorem I.1. Let $\rho > 0$.

1) For all positive integers $M$ and every $f\colon \mathcal{X} \to \{1,\ldots,M\}$,

  $\mathsf{E}\bigl[|f^{-1}(f(X))|^{\rho}\bigr] \ge 2^{\rho(H_{\tilde\rho}(X) - \log M)}$.   (6)

2) For every integer $M > \log|\mathcal{X}| + 2$ there exists $f\colon \mathcal{X} \to \{1,\ldots,M\}$ such that

  $\mathsf{E}\bigl[|f^{-1}(f(X))|^{\rho}\bigr] < 1 + 2^{\rho(H_{\tilde\rho}(X) - \log \tilde{M})}$,   (7)

where $\tilde{M} = (M - \log|\mathcal{X}| - 2)/4$.

A proof is provided in Section II. The lower bound is essentially [1, Lemma III.1].

Theorem I.1 is particularly useful when applied to the case where a sequence of tasks is produced by a source $\{X_i\}_{i=1}^{\infty}$ with alphabet $\mathcal{X}$ and the first $n$ tasks $X^n = (X_1, \ldots, X_n)$ are jointly described using $nR$ bits:

  $f\colon \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}$.   (8)

We assume that the order in which the tasks are performed matters and that every $n$-tuple of tasks in the set $f^{-1}(f(X^n))$ must be performed. The total number of performed tasks is therefore $n\,|f^{-1}(f(X^n))|$, and the ratio of the number of performed tasks to the number of assigned tasks is $|f^{-1}(f(X^n))|$.
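The lower bound (6) can be checked by brute force on a toy instance: for a small alphabet one can enumerate every encoder $f\colon \mathcal{X} \to \{1,\ldots,M\}$, evaluate the $\rho$-th moment (3), and compare the minimum against $2^{\rho(H_{\tilde\rho}(X)-\log M)}$. A sketch (the example PMF and all names are ours):

```python
import itertools
import math

def moment(pmf, f, rho):
    """E[|f^{-1}(f(X))|^rho] for an encoder f given as a tuple of labels."""
    cell_size = {m: f.count(m) for m in set(f)}
    return sum(p * cell_size[f[i]] ** rho for i, p in enumerate(pmf))

pmf = [0.5, 0.25, 0.125, 0.125]
rho, M = 1.0, 2
alpha = 1 / (1 + rho)
H = math.log2(sum(p ** alpha for p in pmf)) / (1 - alpha)

# Exhaustive search over all M^|X| encoders (feasible only for toy sizes).
best = min(moment(pmf, f, rho)
           for f in itertools.product(range(M), repeat=len(pmf)))
lower = 2 ** (rho * (H - math.log2(M)))  # about 1.83 here, while best = 2.0
assert lower <= best + 1e-9
```

Here every partition of the four tasks into two cells yields a moment of exactly 2.0, so the bound (6) is not tight for this instance but is respected.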
Theorem I.2. Let $\{X_i\}_{i=1}^{\infty}$ be any source with finite alphabet $\mathcal{X}$.

1) If $R > \limsup_{n\to\infty} H_{\tilde\rho}(X^n)/n$, then there exist encoders $f_n\colon \mathcal{X}^n \to \{1,\ldots,2^{nR}\}$ such that

  $\lim_{n\to\infty} \mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] = 1$.   (9)

2) If $R < \liminf_{n\to\infty} H_{\tilde\rho}(X^n)/n$, then for any choice of encoders $f_n\colon \mathcal{X}^n \to \{1,\ldots,2^{nR}\}$,

  $\lim_{n\to\infty} \mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] = \infty$.   (10)

Proof:
On account of Theorem I.1, for all $n$ large enough so that $nR > n\log|\mathcal{X}| + 2$,

  $2^{n\rho\bigl(\frac{H_{\tilde\rho}(X^n)}{n} - R\bigr)} \le \min_{f_n\colon \mathcal{X}^n\to\{1,\ldots,2^{nR}\}} \mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] < 1 + 2^{n\rho\bigl(\frac{H_{\tilde\rho}(X^n)}{n} - R + \delta_n\bigr)}$,   (11)

where $\delta_n \to 0$ as $n \to \infty$. (Throughout, $2^{nR}$ stands for $\lfloor 2^{nR}\rfloor$.)

When it exists, the limit

  $\lim_{n\to\infty} \frac{H_\alpha(X^n)}{n}$   (12)

is called the Rényi entropy rate of order $\alpha$. It exists for a large class of sources, including time-invariant Markov sources [2]–[4]. Theorem I.2 generalizes [1, Theorem IV.1] from IID sources to sources with memory and furnishes an operational characterization of the Rényi entropy rate for all orders in $(0,1)$. Note that for IID sources the Rényi entropy rate reduces to the Rényi entropy because in this case $H_{\tilde\rho}(X^n) = nH_{\tilde\rho}(X_1)$.

The proof of the lower bound in Theorem I.1 hinges on the following simple observation.

Proposition I.3. If $\mathcal{L}_1, \ldots, \mathcal{L}_M$ is a partition of a finite set $\mathcal{X}$ into $M$ nonempty subsets (i.e., $\bigcup_{m=1}^{M} \mathcal{L}_m = \mathcal{X}$ and $\mathcal{L}_m \cap \mathcal{L}_{m'} = \emptyset$ if, and only if, $m' \neq m$), and $L(x)$ is the cardinality of the subset containing $x$, then

  $\sum_{x\in\mathcal{X}} \frac{1}{L(x)} = M$.   (13)

Proof:

  $\sum_{x\in\mathcal{X}} \frac{1}{L(x)} = \sum_{m=1}^{M} \sum_{x\in\mathcal{L}_m} \frac{1}{L(x)}$   (14)

  $= \sum_{m=1}^{M} \sum_{x\in\mathcal{L}_m} \frac{1}{|\mathcal{L}_m|}$   (15)

  $= M$.   (16)

Note that the reverse of Proposition I.3 is not true in the sense that if $\lambda\colon \mathcal{X} \to \mathbb{N} = \{1, 2, \ldots\}$ satisfies

  $\sum_{x\in\mathcal{X}} \frac{1}{\lambda(x)} = \mu$,   (17)

then there need not exist a partition of $\mathcal{X}$ into $\lceil\mu\rceil$ subsets such that the cardinality of the subset containing $x$ is at most $\lambda(x)$. A counterexample is $\mathcal{X} = \{a, b, c, d\}$ with $\lambda(a) = 1$, $\lambda(b) = 2$, and $\lambda(c) = \lambda(d) = 4$. In this example, $\mu = 2$, but we need 3 subsets to satisfy the cardinality constraints. However, as our next result (Proposition I.4) shows, allowing a slightly larger number of subsets suffices.
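Both Proposition I.3 and the counterexample admit a mechanical check. The sketch below (all names are ours) verifies that $\sum_x 1/L(x)$ equals the number of blocks, and that no 2-block partition of $\{a,b,c,d\}$ respects the budgets $\lambda$ while a 3-block partition does:

```python
from fractions import Fraction
from itertools import product

def inverse_size_sum(partition):
    """Sum of 1/L(x) over all x, where L(x) is the size of x's block."""
    return sum(Fraction(1, len(block)) for block in partition for _ in block)

# Proposition I.3: the sum equals the number of blocks.
assert inverse_size_sum([{'a'}, {'b', 'c'}, {'d', 'e', 'f'}]) == 3

# Counterexample to the reverse: mu = 1/1 + 1/2 + 1/4 + 1/4 = 2,
# yet 3 subsets are needed to keep each block no larger than lambda(x).
lam = {'a': 1, 'b': 2, 'c': 4, 'd': 4}
assert sum(Fraction(1, v) for v in lam.values()) == 2

def respects_budgets(labels, budgets):
    """labels[i] is the block label assigned to the i-th element of budgets."""
    keys = list(budgets)
    size = {m: labels.count(m) for m in labels}
    return all(size[labels[i]] <= budgets[k] for i, k in enumerate(keys))

# No labeling with 2 blocks works; some labeling with 3 blocks does.
assert not any(respects_budgets(f, lam) for f in product(range(2), repeat=4))
assert any(respects_budgets(f, lam) for f in product(range(3), repeat=4))
```

The exhaustive check over labelings is of course only viable for toy alphabets; it is meant to make the counterexample concrete.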
Proposition I.4. If $\mathcal{X}$ is a finite set, $\lambda\colon \mathcal{X} \to \mathbb{N} \cup \{+\infty\}$, and

  $\sum_{x\in\mathcal{X}} \frac{1}{\lambda(x)} = \mu$   (18)

(with the convention $1/\infty = 0$), then there exists a partition of $\mathcal{X}$ into at most

  $\min_{\alpha > 1} \bigl\lfloor \alpha\mu + \log_\alpha|\mathcal{X}| + 2 \bigr\rfloor$   (19)

subsets such that

  $L(x) \le \min\{\lambda(x), |\mathcal{X}|\}$, for all $x \in \mathcal{X}$,   (20)

where $L(x)$ is the cardinality of the subset containing $x$.

Proposition I.4 is the key to the upper bound in Theorem I.1. Combined with Proposition I.3, it can be considered an analog of the Kraft Inequality [5, Theorem 5.5.1] for partitions of finite sets. A proof is given in Section III.

The construction of the encoder in the derivation of the upper bound in Theorem I.1 requires knowledge of the distribution $P$ of $X$ (see Section II-B). In Section IV we consider a mismatched version of this direct part where the construction is carried out based on the law $Q$ instead of $P$. We show that the penalty incurred by the mismatch between $P$ and $Q$ can be expressed in terms of the divergence measures

  $\Delta_\alpha(P\|Q) \triangleq \log\Biggl[\Bigl(\sum_{x\in\mathcal{X}} Q(x)^{\alpha}\Bigr) \Bigl(\sum_{x\in\mathcal{X}} P(x)^{\alpha}\Bigr)^{-\frac{1}{1-\alpha}} \Bigl(\sum_{x\in\mathcal{X}} P(x)\,Q(x)^{\alpha-1}\Bigr)^{\frac{\alpha}{1-\alpha}}\Biggr]$,   (21)

where $\alpha$ can be any positive number not equal to one. (We use the conventions $1/0 = \infty$ and $a/\infty = 0$ for $a > 0$.) This family of divergence measures was proposed by Sundaresan [6], who showed that it plays a similar role in the Massey-Arikan guessing problem [7], [8].

II. PROOF OF THEOREM I.1
A. The Lower Bound (Converse)
The proof of the lower bound is inspired by the proof of [8, Theorem 1]. Fix an encoder $f\colon \mathcal{X} \to \{1,\ldots,M\}$, and note that it gives rise to a partition of $\mathcal{X}$ into the $M$ subsets

  $\{x \in \mathcal{X} : f(x) = m\}$, $m \in \{1,\ldots,M\}$.   (22)

Let $N$ denote the number of nonempty subsets in this partition. Also note that for this partition the cardinality of the subset containing $x$ is

  $L(x) = |f^{-1}(f(x))|$, for all $x \in \mathcal{X}$.   (23)

Recall Hölder's Inequality: If $a, b\colon \mathcal{X} \to [0,\infty)$, $p, q > 1$, and $1/p + 1/q = 1$, then

  $\sum_{x\in\mathcal{X}} a(x)\,b(x) \le \Bigl(\sum_{x\in\mathcal{X}} a(x)^{p}\Bigr)^{1/p} \Bigl(\sum_{x\in\mathcal{X}} b(x)^{q}\Bigr)^{1/q}$.   (24)

Rearranging (24) gives

  $\sum_{x\in\mathcal{X}} a(x)^{p} \ge \Bigl(\sum_{x\in\mathcal{X}} b(x)^{q}\Bigr)^{-p/q} \Bigl(\sum_{x\in\mathcal{X}} a(x)\,b(x)\Bigr)^{p}$.   (25)

Substituting $p = 1+\rho$, $q = (1+\rho)/\rho$, $a(x) = P(x)^{\frac{1}{1+\rho}}\, |f^{-1}(f(x))|^{\frac{\rho}{1+\rho}}$, and $b(x) = |f^{-1}(f(x))|^{-\frac{\rho}{1+\rho}}$ in (25), we obtain

  $\sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho}$   (26)

  $\ge \Bigl(\sum_{x\in\mathcal{X}} \frac{1}{|f^{-1}(f(x))|}\Bigr)^{-\rho} \Bigl(\sum_{x\in\mathcal{X}} P(x)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho}$   (27)

  $= 2^{\rho(H_{\tilde\rho}(X) - \log N)}$   (28)

  $\ge 2^{\rho(H_{\tilde\rho}(X) - \log M)}$,   (29)

where (28) follows from (4), (23), and Proposition I.3; and where (29) follows because $N \le M$.

B. The Upper Bound (Direct Part)

Since Hölder's Inequality (24) holds with equality if, and only if, (iff) $a(x)^{p}$ is proportional to $b(x)^{q}$, it follows that the lower bound in Theorem I.1 holds with equality iff $|f^{-1}(f(x))|$ is proportional to $P(x)^{-1/(1+\rho)}$. We derive the upper bound in Theorem I.1 by constructing a partition that approximately satisfies this relationship. To this end, we use Proposition I.4 with $\alpha = 2$ in (19) and

  $\lambda(x) = \begin{cases} \bigl\lceil \beta\, P(x)^{-\tilde\rho} \bigr\rceil & \text{if } P(x) > 0, \\ +\infty & \text{if } P(x) = 0, \end{cases}$   (30)

where we choose $\beta$ just large enough to guarantee the existence of a partition of $\mathcal{X}$ into at most $M$ subsets satisfying (20). This is accomplished by the choice

  $\beta = \frac{2\sum_{x\in\mathcal{X}} P(x)^{\tilde\rho}}{M - \log|\mathcal{X}| - 2}$.   (31)

(This is where we need $M > \log|\mathcal{X}| + 2$.)
Indeed,

  $\mu = \sum_{x\in\mathcal{X}} \frac{1}{\lambda(x)}$   (32)

  $\le \sum_{x\in\mathcal{X}} \frac{P(x)^{\tilde\rho}}{\beta}$   (33)

  $= \frac{M - \log|\mathcal{X}| - 2}{2}$,   (34)

and hence

  $2\mu + \log|\mathcal{X}| + 2 \le M$.   (35)

Let then the partition $\mathcal{L}_1, \ldots, \mathcal{L}_N$ with $N \le M$ be as promised by Proposition I.4, and construct $f\colon \mathcal{X} \to \{1,\ldots,M\}$ by setting $f(x) = m$ if $x \in \mathcal{L}_m$. For this encoder,

  $\sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho} = \sum_{x: P(x)>0} P(x)\,L(x)^{\rho}$   (36)

  $\le \sum_{x: P(x)>0} P(x)\,\lambda(x)^{\rho}$   (37)

  $< 1 + 2^{\rho(H_{\tilde\rho}(X) - \log \tilde{M})}$,   (38)

where the strict inequality follows from (30), from the observation that the terms with $\lambda(x) = 1$ contribute at most one in total, and from the inequality

  $\lceil\xi\rceil^{\rho} < 2^{\rho}\,\xi^{\rho}$, for all $\xi \ge 1$,   (39)

which is easily checked by considering separately the cases $1 \le \xi \le 2$ and $\xi > 2$.

III. PROOF OF PROPOSITION I.4

We describe a procedure for constructing a partition of $\mathcal{X}$ with the desired properties. Since the labels do not matter, we may assume for convenience of notation that $\mathcal{X} = \{1, \ldots, |\mathcal{X}|\}$ and

  $\lambda(1) \le \lambda(2) \le \cdots \le \lambda(|\mathcal{X}|)$.   (40)

The first subset in the partition we construct is

  $\mathcal{L}_0 = \{x \in \mathcal{X} : \lambda(x) \ge |\mathcal{X}|\}$.   (41)

If $\mathcal{X} = \mathcal{L}_0$, then the construction is complete and (19) and (20) are clearly satisfied. Otherwise we follow the steps below to construct additional subsets $\mathcal{L}_1, \ldots, \mathcal{L}_M$.

Step 1: If

  $|\mathcal{X} \setminus \mathcal{L}_0| \le \lambda(1)$,   (42)

then we complete the construction by setting $\mathcal{L}_1 = \mathcal{X} \setminus \mathcal{L}_0$ and $M = 1$. Otherwise we set

  $\mathcal{L}_1 = \{1, \ldots, \lambda(1)\}$   (43)

and go to Step 2.

Step $m \ge 2$: If

  $\Bigl|\mathcal{X} \setminus \bigcup_{i=0}^{m-1} \mathcal{L}_i\Bigr| \le \lambda(|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1)$,   (44)

then we complete the construction by setting $\mathcal{L}_m = \mathcal{X} \setminus \bigcup_{i=0}^{m-1} \mathcal{L}_i$ and $M = m$. Otherwise we let $\mathcal{L}_m$ contain the $\lambda(|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1)$ smallest elements of $\mathcal{X} \setminus \bigcup_{i=0}^{m-1} \mathcal{L}_i$, i.e., we set

  $\mathcal{L}_m = \bigl\{|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1, \;\ldots,\; |\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + \lambda(|\mathcal{L}_1| + \cdots + |\mathcal{L}_{m-1}| + 1)\bigr\}$   (45)

and go to Step $m+1$.

We next verify that (20) is satisfied and that the total number of subsets $M+1$ does not exceed (19). Clearly, $L(x) \le |\mathcal{X}|$ for every $x \in \mathcal{X}$, so to prove (20) we check that $L(x) \le \lambda(x)$ for every $x \in \mathcal{X}$. It is clear that $L(x) \le \lambda(x)$ for all $x \in \mathcal{L}_0$. Let $k(x)$ denote the smallest element in the subset containing $x$. Then $L(x) \le \lambda(k(x))$ for all $x \in \bigcup_{m=1}^{M} \mathcal{L}_m$ by construction, and since $k(x) \le x$, we have $\lambda(k(x)) \le \lambda(x)$ by the assumption (40), and hence $L(x) \le \lambda(x)$ for all $x \in \mathcal{X}$.

It remains to check that $M+1$ does not exceed (19). This is clearly true when $M = 1$, so we assume that $M \ge 2$. Since $L(x) = \lambda(k(x))$ for all $x \in \bigcup_{m=1}^{M-1} \mathcal{L}_m$, we have on account of Proposition I.3

  $M = \sum_{x \in \bigcup_{m=1}^{M} \mathcal{L}_m} \frac{1}{L(x)}$   (46)

  $= 1 + \sum_{x \in \bigcup_{m=1}^{M-1} \mathcal{L}_m} \frac{1}{L(x)}$   (47)

  $= 1 + \sum_{x \in \bigcup_{m=1}^{M-1} \mathcal{L}_m} \frac{1}{\lambda(k(x))}$.   (48)

Fix an arbitrary $\alpha > 1$ and let $\mathcal{M}$ be the set of indices $m \in \{1,\ldots,M-1\}$ such that there is an $x \in \mathcal{L}_m$ with $\lambda(x) > \alpha\lambda(k(x))$. We next argue that $|\mathcal{M}| < \log_\alpha|\mathcal{X}|$. To this end, enumerate the indices in $\mathcal{M}$ as $m_1 < m_2 < \cdots < m_{|\mathcal{M}|}$. For each $i \in \{1,\ldots,|\mathcal{M}|\}$ select $x_i \in \mathcal{L}_{m_i}$ such that $\lambda(x_i) > \alpha\lambda(k(x_i))$. Then

  $\lambda(x_1) > \alpha\lambda(k(x_1))$   (49)

  $\ge \alpha$.   (50)

Note that if $m < m'$ and $x \in \mathcal{L}_m$ and $x' \in \mathcal{L}_{m'}$, then $x < x'$. Thus, $x_1 < k(x_2)$ because $x_1 \in \mathcal{L}_{m_1}$ and $k(x_2) \in \mathcal{L}_{m_2}$, and $m_1 < m_2$. Consequently,

  $\lambda(x_2) > \alpha\lambda(k(x_2))$   (51)

  $\ge \alpha\lambda(x_1)$   (52)

  $> \alpha^2$.   (53)

Iterating this argument shows that

  $\lambda(x_{|\mathcal{M}|}) > \alpha^{|\mathcal{M}|}$.   (54)

And since $\lambda(x) \le |\mathcal{X}|$ for $x \in \bigcup_{m=1}^{M} \mathcal{L}_m$ by (41), it follows that $|\mathcal{M}| < \log_\alpha|\mathcal{X}|$. Continuing from (48) with $\mathcal{M}^c \triangleq \{1,\ldots,M-1\} \setminus \mathcal{M}$,

  $M = 1 + |\mathcal{M}| + \sum_{x \in \bigcup_{m\in\mathcal{M}^c} \mathcal{L}_m} \frac{1}{\lambda(k(x))}$   (55)

  $< 1 + \log_\alpha|\mathcal{X}| + \alpha \sum_{x \in \bigcup_{m\in\mathcal{M}^c} \mathcal{L}_m} \frac{1}{\lambda(x)}$   (56)

  $\le 1 + \log_\alpha|\mathcal{X}| + \alpha\mu$,   (57)

where the first inequality follows because $|\mathcal{M}| < \log_\alpha|\mathcal{X}|$ and because $\lambda(x) \le \alpha\lambda(k(x))$ for $x \in \bigcup_{m\in\mathcal{M}^c} \mathcal{L}_m$, and where the second inequality follows from the hypothesis of the proposition. Since $M+1$ is an integer and $\alpha > 1$ is arbitrary, it follows from (57) that $M+1$ is upper-bounded by (19).
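The procedure above translates directly into code. The following sketch (our own naming; a simplified rendering of the construction, with budgets sorted ascending as in (40)) builds the blocks greedily and checks the guarantees (20) and (19) (the latter with $\alpha = 2$) on the counterexample of Section I:

```python
import math

def partition_by_budget(lam):
    """Greedy partition from the proof of Proposition I.4 (a sketch).

    lam maps each element x to its budget lambda(x) (math.inf allowed).
    Returns blocks such that the block containing x has at most
    min(lam[x], len(lam)) elements."""
    n = len(lam)
    elems = sorted(lam, key=lambda x: lam[x])   # assumption (40)
    big = [x for x in elems if lam[x] >= n]     # the subset L_0 of (41)
    blocks = [big] if big else []
    rest = [x for x in elems if lam[x] < n]
    while rest:
        k = int(lam[rest[0]])   # budget of the smallest remaining element
        if len(rest) <= k:      # terminal condition, as in (42)/(44)
            blocks.append(rest)
            rest = []
        else:                   # take the k smallest, as in (43)/(45)
            blocks.append(rest[:k])
            rest = rest[k:]
    return blocks

lam = {'a': 1, 'b': 2, 'c': 4, 'd': 4}
blocks = partition_by_budget(lam)
size_of = {x: len(b) for b in blocks for x in b}
assert all(size_of[x] <= min(lam[x], len(lam)) for x in lam)          # (20)
mu = sum(1 / lam[x] for x in lam)
assert len(blocks) <= math.floor(2 * mu + math.log2(len(lam)) + 2)    # (19)
assert len(blocks) == 3  # matches the 3 subsets needed in Section I
```

On this input the procedure produces the blocks $\{c,d\}$, $\{a\}$, $\{b\}$, i.e., exactly the 3 subsets that the counterexample showed to be necessary.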
IV. MISMATCH

The key to the upper bound in Theorem I.1 was to use Proposition I.4 with $\lambda$ as in (30) and (31) to obtain a partition of $\mathcal{X}$ for which the cardinality of the subset containing $x$ is approximately proportional to $P(x)^{-1/(1+\rho)}$. Evidently, this construction requires knowledge of the distribution $P$ of $X$. In this section, we derive the penalty when $P$ is replaced with $Q$ in (30) and (31). Since it is then still true that

  $\mu \le \frac{M - \log|\mathcal{X}| - 2}{2}$,   (58)

Proposition I.4 guarantees the existence of a partition of $\mathcal{X}$ into at most $M$ subsets satisfying (20). Constructing $f$ from this partition as in Section II-B and proceeding similarly as in (36) to (38), we obtain

  $\sum_{x\in\mathcal{X}} P(x)\,|f^{-1}(f(x))|^{\rho} < 1 + 2^{\rho(H_{\tilde\rho}(X) + \Delta_{\tilde\rho}(P\|Q) - \log \tilde{M})}$,   (59)

where $\Delta_{\tilde\rho}(P\|Q)$ is as in (21) and $\tilde{M}$ is as in Theorem I.1. (Note that $\Delta_{\tilde\rho}(P\|Q) < \infty$ only if the support of $P$ is contained in the support of $Q$.) The penalty in the exponent when compared to the upper bound in Theorem I.1 is thus given by $\Delta_{\tilde\rho}(P\|Q)$. To reinforce this, further note that

  $\Delta_\alpha(P^n\|Q^n) = n\,\Delta_\alpha(P\|Q)$,   (60)

where $P^n$ and $Q^n$ are the $n$-fold products of $P$ and $Q$. Consequently, if the source $\{X_i\}_{i=1}^{\infty}$ is IID $P$ and we construct $f_n\colon \mathcal{X}^n \to \{1,\ldots,2^{nR}\}$ similarly as above based on $Q^n$ instead of $P^n$, we obtain the bound

  $\mathsf{E}\bigl[|f_n^{-1}(f_n(X^n))|^{\rho}\bigr] < 1 + 2^{n\rho(H_{\tilde\rho}(X_1) + \Delta_{\tilde\rho}(P\|Q) - R + \delta_n)}$,   (61)

where $\delta_n \to 0$ as $n \to \infty$. The RHS of (61) tends to one provided that $R > H_{\tilde\rho}(X_1) + \Delta_{\tilde\rho}(P\|Q)$. Thus, in the IID case $\Delta_{\tilde\rho}(P\|Q)$ is the rate penalty incurred by the mismatch between $P$ and $Q$.

We conclude this section with some properties of $\Delta_\alpha(P\|Q)$. Properties 1–3 (see below) were given in [6]; we repeat them here for completeness. Note that Rényi's divergence (see, e.g., [9]),

  $D_\alpha(P\|Q) = \frac{1}{\alpha-1} \log \sum_{x\in\mathcal{X}} P(x)^{\alpha}\, Q(x)^{1-\alpha}$,   (62)

satisfies Properties 1 and 3 but none of the others in general.
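Numerically, $\Delta_\alpha(P\|Q)$ as in (21) and the additivity property (60) are easy to check. A sketch (our own names and example PMFs; it assumes $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$ and $0 < \alpha \neq 1$):

```python
import math

def sundaresan_div(p, q, alpha):
    """Delta_alpha(P||Q) in bits, following (21); assumes supp(P) in supp(Q)."""
    s_q = sum(qx ** alpha for qx in q)
    s_p = sum(px ** alpha for px in p)
    s_pq = sum(px * qx ** (alpha - 1) for px, qx in zip(p, q) if px > 0)
    return (math.log2(s_q)
            + alpha / (1 - alpha) * math.log2(s_pq)
            - 1 / (1 - alpha) * math.log2(s_p))

def product_pmf(p, n):
    """The n-fold product of a PMF, as a flat list."""
    out = [1.0]
    for _ in range(n):
        out = [a * b for a in out for b in p]
    return out

p, q = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]

# Property 1: strictly positive since p != q.
d1 = sundaresan_div(p, q, 0.5)
assert d1 > 0

# Additivity (60): Delta(P^3 || Q^3) = 3 * Delta(P || Q).
d3 = sundaresan_div(product_pmf(p, 3), product_pmf(q, 3), 0.5)
assert abs(d3 - 3 * d1) < 1e-9

# Property 3: as alpha -> 1 the divergence approaches D(P||Q).
kl = sum(px * math.log2(px / qx) for px, qx in zip(p, q))
assert abs(sundaresan_div(p, q, 0.9999) - kl) < 1e-2
```

Additivity holds because each of the three sums in (21) factorizes over product distributions, so the logarithms add.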
Proposition IV.1. Let $\mathrm{supp}(P)$ and $\mathrm{supp}(Q)$ denote the support sets of $P$ and $Q$. The functional $\Delta_\alpha(P\|Q)$ has the following properties.

1) $\Delta_\alpha(P\|Q) \ge 0$ with equality iff $P = Q$.

2) $\Delta_\alpha(P\|Q) = \infty$ iff ($0 < \alpha < 1$ and $\mathrm{supp}(P) \not\subseteq \mathrm{supp}(Q)$) or ($\alpha > 1$ and $\mathrm{supp}(P) \cap \mathrm{supp}(Q) = \emptyset$).

3) $\lim_{\alpha\to 1} \Delta_\alpha(P\|Q) = D(P\|Q)$.

4) $\lim_{\alpha\to 0} \Delta_\alpha(P\|Q) = \log\frac{|\mathrm{supp}(Q)|}{|\mathrm{supp}(P)|}$ if $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$.

5) $\lim_{\alpha\to\infty} \Delta_\alpha(P\|Q) = \log\frac{\bigl(\max_{x\in\mathcal{X}} P(x)\bigr)\,|\mathcal{Q}^\star|}{\sum_{x'\in\mathcal{Q}^\star} P(x')}$, where $\mathcal{Q}^\star = \bigl\{x \in \mathcal{X} : Q(x) = \max_{x'\in\mathcal{X}} Q(x')\bigr\}$.

Proof:
Property 2 follows by inspection of (21). Properties 3–5 follow by simple calculus. As to Property 1, consider first the case where $0 < \alpha < 1$. In view of Property 2, we may assume that $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$. Hölder's Inequality (24) with $p = 1/\alpha$ and $q = 1/(1-\alpha)$ gives

  $\sum_{x\in\mathcal{X}} P(x)^{\alpha} = \sum_{x\in\mathrm{supp}(P)} P(x)^{\alpha}\, Q(x)^{\alpha(\alpha-1)}\, Q(x)^{\alpha(1-\alpha)}$   (63)

  $\le \Bigl(\sum_{x\in\mathrm{supp}(P)} P(x)\,Q(x)^{\alpha-1}\Bigr)^{\alpha} \Bigl(\sum_{x\in\mathrm{supp}(P)} Q(x)^{\alpha}\Bigr)^{1-\alpha}$

  $\le \Bigl(\sum_{x\in\mathcal{X}} P(x)\,Q(x)^{\alpha-1}\Bigr)^{\alpha} \Bigl(\sum_{x\in\mathcal{X}} Q(x)^{\alpha}\Bigr)^{1-\alpha}$.   (64)

The conditions for equality in Hölder's Inequality imply that equality holds iff $P = Q$. Consider next the case where $\alpha > 1$. By Hölder's Inequality with $p = \alpha$ and $q = \alpha/(\alpha-1)$,

  $\sum_{x\in\mathcal{X}} P(x)\,Q(x)^{\alpha-1}$   (65)

  $\le \Bigl(\sum_{x\in\mathcal{X}} P(x)^{\alpha}\Bigr)^{1/\alpha} \Bigl(\sum_{x\in\mathcal{X}} Q(x)^{\alpha}\Bigr)^{\frac{\alpha-1}{\alpha}}$,   (66)

with equality iff $P = Q$.

REFERENCES

[1] C. Bunte and A. Lapidoth, "Source coding, lists, and Rényi entropy," in Information Theory Workshop (ITW), 2013 IEEE, 2013, pp. 350–354.
[2] Z. Rached, F. Alajaji, and L. Campbell, "Rényi's divergence and entropy rates for finite alphabet Markov sources," IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1553–1561, 2001.
[3] C.-E. Pfister and W. Sullivan, "Rényi entropy, guesswork moments, and large deviations," IEEE Trans. Inf. Theory, vol. 50, no. 11, pp. 2794–2800, 2004.
[4] D. Malone and W. G. Sullivan, "Guesswork and entropy," IEEE Trans. Inf. Theory, vol. 50, no. 3, pp. 525–526, 2004.
[5] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: John Wiley & Sons, 2006.
[6] R. Sundaresan, "Guessing under source uncertainty," IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 269–287, 2007.
[7] J. Massey, "Guessing and entropy," in Information Theory Proceedings (ISIT), 1994 IEEE International Symposium on, 1994, p. 204.
[8] E. Arikan, "An inequality on guessing and its application to sequential decoding," IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 99–105, 1996.
[9] I. Csiszár, "Generalized cutoff rates and Rényi's information measures," IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 26–34, 1995.