A Lower Bound on the Complexity of Approximating the Entropy of a Markov Source
Travis Gagie
Department of Computer Science, University of Chile
[email protected]
The Asymptotic Equipartition Property (see, e.g., [3]) implies that, if we choose the characters of a string $s$ of length $n$ independently and according to the same probability distribution $P$ over the alphabet then, for large values of $n$, the 0th-order empirical entropy $H_0(s)$ of $s$ (see, e.g., [4]) will almost certainly be close to the entropy $H(P)$ of $P$. Batu, Dasgupta, Kumar and Rubinfeld [1] showed that, if $H(P) = \Omega(\gamma/\epsilon)$, then we can almost certainly approximate $H(P)$ to within a factor of $\gamma$ after seeing $O\left(\sigma^{(1+\epsilon)/\gamma^2} \log \sigma\right)$ characters of $s$, where $\sigma$ is the alphabet size and $\epsilon$ is any positive constant; they proved a lower bound of $\Omega\left(\sigma^{1/(2\gamma^2)}\right)$, which was later improved by Raskhodnikova, Ron, Shpilka and Smith [5] and Valiant [6].

Similarly, the Shannon-McMillan-Breiman Theorem (see, e.g., [3] again) implies that, if we generate $s$ from a stationary ergodic $k$th-order Markov source $X$ then, for large values of $n$, the $k$th-order empirical entropy $H_k(s)$ of $s$ (see, e.g., [4] again) will almost certainly be close to the entropy $H(X)$ of $X$. Although many papers have been written about approximating the entropy of a Markov source based on a sample (see, e.g., [2] and references therein), we know of no upper or lower bounds similar to Batu et al.'s results. We now give a simple proof that, even if we know $X$ has entropy either 0 or at least $\log(\sigma - k)$, there is still no algorithm that, with probability bounded away from $1/2$, guesses its entropy correctly after seeing at most $(\sigma - k)^{k/2 - \epsilon}$ characters.
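To make these quantities concrete, here is a minimal Python sketch (our own illustration; the function names and the normalization of $H_k$ are our choices, not taken from [4]) computing the 0th-order and $k$th-order empirical entropies of a string:

```python
import math
from collections import Counter, defaultdict

def h0(s):
    """0th-order empirical entropy: entropy of the character frequencies in s."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def hk(s, k):
    """kth-order empirical entropy: weighted average, over contexts w of
    length k, of the 0th-order entropy of the characters that follow w in s
    (normalized here by the number of positions that have a full context)."""
    followers = defaultdict(list)
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / (len(s) - k)

print(h0("abracadabra"))     # about 2.04 bits per character
print(hk("abracadabra", 1))  # much lower: most 1-character contexts are nearly deterministic
```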
Lemma 1. For any $k \ge 1$, $\epsilon > 0$ and sufficiently large $\sigma$, there is a $k$th-order Markov source over the alphabet $\{0, \dots, \sigma - 1\}$ that has entropy at least $\log(\sigma - k)$ but, with high probability, does not emit duplicate $k$-tuples among its first $(\sigma - k)^{k/2 - \epsilon}$ characters.

Proof. Consider the $k$th-order Markov source that, whenever it has emitted a $k$-tuple $\alpha = a_1, \dots, a_k$, emits a character drawn uniformly at random from $\{0, \dots, \sigma - 1\} - \{a_1, \dots, a_k\}$. Notice this source has entropy at least $\log(\sigma - k)$. Also, a $k$-tuple $\alpha$ cannot occur in position $i$ if it occurs in any of the positions $i - k + 1, \dots, i - 1, i + 1, \dots, i + k - 1$, and vice versa: any two characters within distance $k$ of each other are distinct, so two overlapping occurrences of $\alpha$ would force a repeated character within distance $k$. Finally, the probability $\alpha$ occurs in position $i$ is independent of whether it occurs in position $j$ for $j \le i - k$ or $j \ge i + k$.
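A minimal simulation sketch of this source (our own illustration; before $k$ characters have been emitted, we exclude all characters emitted so far, which is one possible initialization):

```python
import random

def sample(sigma, k, n, seed=None):
    """Emit n characters from the kth-order Markov source that draws each
    character uniformly from {0, ..., sigma-1} minus the k most recently
    emitted characters."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        recent = set(out[-k:])  # the (at most k) most recent characters
        out.append(rng.choice([a for a in range(sigma) if a not in recent]))
    return out

s = sample(sigma=64, k=2, n=30, seed=1)
# Any two characters within distance k of each other are distinct:
assert all(s[i] != s[j]
           for i in range(len(s))
           for j in range(i + 1, min(i + k + 1, len(s))))
```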
For $i - k + 1 \le j \le i + k - 1$, let the indicator variable $B_j$ be 1 if $\alpha$ occurs in position $j$, and 0 otherwise. By Bayes' Rule, the probability $\alpha$ occurs in position $i$, given that it does not occur in any of the positions $i - k + 1, \dots, i - 1, i + 1, \dots, i + k - 1$, is
\[
\begin{aligned}
\Pr\bigl[B_i = 1 \bigm| B_{i-k+1} = \cdots = B_{i-1} = B_{i+1} = \cdots = B_{i+k-1} = 0\bigr]
  &= \frac{\Pr\bigl[B_i = 1 \text{ and } B_{i-k+1} = \cdots = B_{i-1} = B_{i+1} = \cdots = B_{i+k-1} = 0\bigr]}
          {\Pr\bigl[B_{i-k+1} = \cdots = B_{i-1} = B_{i+1} = \cdots = B_{i+k-1} = 0\bigr]} \\
  &\le \frac{\Pr[B_i = 1]}
          {1 - \Pr\bigl[B_{i-k+1} = 1 \text{ or } \cdots \text{ or } B_{i-1} = 1 \text{ or } B_{i+1} = 1 \text{ or } \cdots \text{ or } B_{i+k-1} = 1\bigr]} \\
  &\le \frac{1/(\sigma - k)^k}{1 - (2k - 2)/(\sigma - k)^k} \\
  &= \frac{1}{(\sigma - k)^k - 2k + 2} \, .
\end{aligned}
\]
It follows that the probability $\alpha$ occurs at least twice among the first $(\sigma - k)^{k/2 - \epsilon}$ emitted characters is at most the probability that, while drawing $(\sigma - k)^{k/2 - \epsilon}$ elements uniformly at random and with replacement from a set of size $(\sigma - k)^k - 2k + 2$, we draw a specified element at least twice. Therefore, the probability any $k$-tuple occurs at least twice among the first $(\sigma - k)^{k/2 - \epsilon}$ emitted characters is at most the probability that we draw any element at least twice. For $k \ge 1$, $\epsilon > 0$ and sufficiently large $\sigma$, both probabilities are negligible. ⊓⊔
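The birthday-paradox bound at the end of the proof can be checked empirically. A sketch (our own, with arbitrarily chosen parameters) that reuses `sample` from above and counts how often a duplicate $k$-tuple appears among the first $(\sigma - k)^{k/2 - \epsilon}$ characters:

```python
def has_duplicate_ktuple(s, k):
    """Report whether some k-tuple occurs in two positions of s."""
    seen = set()
    for i in range(len(s) - k + 1):
        t = tuple(s[i:i + k])
        if t in seen:
            return True
        seen.add(t)
    return False

sigma, k, eps, trials = 256, 2, 0.25, 200
n = int((sigma - k) ** (k / 2 - eps))  # length (sigma - k)^(k/2 - eps)
hits = sum(has_duplicate_ktuple(sample(sigma, k, n), k) for _ in range(trials))
print(f"duplicates in {hits}/{trials} runs of length {n}")  # rare for these parameters
```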
Theorem 1. Suppose that, for any $k \ge 1$, $\epsilon > 0$ and sufficiently large $\sigma$, we are given a black box that allows us to sample characters from a $k$th-order Markov source over the alphabet $\{0, \dots, \sigma - 1\}$. Even if we know the source has entropy either 0 or at least $\log(\sigma - k)$, there is still no algorithm that, with probability bounded away from $1/2$, guesses the entropy correctly after sampling at most $(\sigma - k)^{k/2 - \epsilon}$ characters.

Proof. Consider any algorithm $A$ for guessing the source's entropy. Suppose there is a string $s$ of length $(\sigma - k)^{k/2 - \epsilon}$ containing no duplicate $k$-tuples and such that, with probability at least $1/2$, $A$ stops and guesses "at least $\log(\sigma - k)$" after sampling a prefix of $s$. Then on any source with entropy 0 that starts by emitting $s$ with probability 1, the algorithm errs with probability at least $1/2$. Given $s$, it is straightforward to build such a source.

Now suppose there is no such string $s$. Then whenever the first $(\sigma - k)^{k/2 - \epsilon}$ sampled characters contain no duplicate $k$-tuples, $A$ either samples more characters or stops and guesses "0", with probability at least $1/2$. Therefore, on any source with entropy at least $\log(\sigma - k)$ that, with high probability, does not emit duplicate $k$-tuples among its first $(\sigma - k)^{k/2 - \epsilon}$ characters (such as the one described in the lemma above), $A$ either samples more characters or errs, with probability nearly $1/2$. ⊓⊔
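For concreteness, a sketch (our own) of the entropy-0 black box invoked in the first case. Given $s$ with no duplicate $k$-tuples, each $k$-tuple context appears at most once in $s$, so emitting $s$ deterministically is consistent with a $k$th-order Markov source of entropy 0; how the source continues after $s$ is immaterial to the argument, so we pick one arbitrary deterministic continuation:

```python
def entropy_zero_box(s):
    """A sampler that starts by emitting s with probability 1. Since s has
    no duplicate k-tuples, every k-tuple context determines the next
    character, so this realizes a kth-order Markov source with entropy 0.
    After s is exhausted, repeat the last character forever (an arbitrary
    deterministic continuation; our choice, not specified in the proof)."""
    i = 0
    while True:
        yield s[i] if i < len(s) else s[-1]
        i += 1
```

Any algorithm that samples at most $(\sigma - k)^{k/2 - \epsilon}$ characters from this box sees exactly a prefix of $s$, so if it guesses "at least $\log(\sigma - k)$" with probability at least $1/2$ after sampling a prefix of $s$, it errs here with probability at least $1/2$.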
References

1. T. Batu, S. Dasgupta, R. Kumar, and R. Rubinfeld. The complexity of approximating the entropy. SIAM Journal on Computing, 35(1):132–150, 2005.
2. H. Cai, S. R. Kulkarni, and S. Verdú. Universal entropy estimation via block sorting. IEEE Transactions on Information Theory, 50(7):1551–1561, 2004.
3. T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
4. G. Manzini. An analysis of the Burrows-Wheeler transform. Journal of the ACM, 48(3):407–430, 2001.
5. S. Raskhodnikova, D. Ron, A. Shpilka, and A. Smith. Strong lower bounds for approximating distribution support size and the distinct elements problem. In Proceedings of the 48th Symposium on Foundations of Computer Science, pages 559–569, 2007.
6. P. Valiant. Testing symmetric properties of distributions. In Proceedings of the 40th Symposium on Theory of Computing, pages 383–392, 2008.