Abstract

If $S$ is an infinite sequence over a finite alphabet $\Sigma$ and $\beta$ is a probability measure on $\Sigma$, then the dimension of $S$ with respect to $\beta$, written $\dim^\beta(S)$, is a constructive version of Billingsley dimension that coincides with the (constructive Hausdorff) dimension $\dim(S)$ when $\beta$ is the uniform probability measure. This paper shows that $\dim^\beta(S)$ and its dual $\mathrm{Dim}^\beta(S)$, the strong dimension of $S$ with respect to $\beta$, can be used in conjunction with randomness to measure the similarity of two probability measures $\alpha$ and $\beta$ on $\Sigma$. Specifically, we prove that the divergence formula $$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)}$$ holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $\mathcal{D}(\alpha \| \beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$.

T. Neary, D. Woods, A.K. Seda and N. Murphy (Eds.): The Complexity of Simple Programs 2008. EPTCS 1, 2009, pp. 149–152, doi:10.4204/EPTCS.1.14
© Jack H. Lutz

A Divergence Formula for Randomness and Dimension (Short Version)

Jack H. Lutz ∗
Department of Computer Science, Iowa State University, Ames, IA 50011 USA.

∗ This research was supported in part by National Science Foundation Grants 9988483, 0344187, 0652569, and 0728806 and by the Spanish Ministry of Education and Science (MEC) and the European Regional Development Fund (ERDF) under project TIN2005-08832-C03-02.

The constructive dimension $\dim(S)$ and the constructive strong dimension $\mathrm{Dim}(S)$ of an infinite sequence $S$ over a finite alphabet $\Sigma$ are constructive versions of the two most important classical fractal dimensions, namely, Hausdorff dimension [7] and packing dimension [20, 19], respectively. These two constructive dimensions, which were introduced in [11, 1], have been shown to have the useful characterizations

$$\dim(S) = \liminf_{w \to S} \frac{K(w)}{|w| \log |\Sigma|} \tag{1.1}$$

and

$$\mathrm{Dim}(S) = \limsup_{w \to S} \frac{K(w)}{|w| \log |\Sigma|}, \tag{1.2}$$

where the logarithm is base-2 [15, 1]. In these equations, $K(w)$ is the Kolmogorov complexity of the prefix $w$ of $S$, i.e., the length in bits of the shortest program that prints the string $w$. (See [9] for details.) The numerators in these equations are thus the algorithmic information content of $w$, while the denominators are the "naive" information content of $w$, also in bits. We thus understand (1.1) and (1.2) to say that $\dim(S)$ and $\mathrm{Dim}(S)$ are the lower and upper information densities of the sequence $S$. These constructive dimensions and their analogs at other levels of effectivity have been investigated extensively in recent years [8].
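The information-density reading of (1.1) and (1.2) can be illustrated numerically, with an important caveat: $K(w)$ is not computable, so the sketch below substitutes the output length of a general-purpose compressor (`zlib`), which is only a crude upper-bound stand-in and is not the method used in this paper. The function name `compressed_density` and the choice of the byte alphabet ($|\Sigma| = 256$) are assumptions made for the illustration.

```python
import math
import random
import zlib


def compressed_density(w: bytes, alphabet_size: int = 256) -> float:
    """Ratio of compressed bits to naive bits |w| * log2(|Sigma|).

    Assumption: the zlib output length is used in place of K(w).
    K(w) itself is uncomputable, so this is only a rough proxy.
    """
    compressed_bits = 8 * len(zlib.compress(w, 9))
    naive_bits = len(w) * math.log2(alphabet_size)
    return compressed_bits / naive_bits


n = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# A highly regular prefix: its proxy density is close to 0.
repetitive = b"ab" * (n // 2)

# A pseudorandom prefix over the byte alphabet: its proxy density is close to 1.
pseudorandom = bytes(rng.getrandbits(8) for _ in range(n))

print(compressed_density(repetitive))
print(compressed_density(pseudorandom))
```

The two printed ratios differ sharply, which is the qualitative behavior that (1.1) and (1.2) describe; the values themselves depend on the compressor and say nothing rigorous about $\dim(S)$ or $\mathrm{Dim}(S)$.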

The constructive dimensions $\dim(S)$ and $\mathrm{Dim}(S)$ have recently been generalized to incorporate a probability measure $\nu$ on the sequence space $\Sigma^\infty$ as a parameter [13]. Specifically, for each such $\nu$ and each sequence $S \in \Sigma^\infty$, we now have the constructive dimension $\dim^\nu(S)$ and the constructive strong dimension $\mathrm{Dim}^\nu(S)$ of $S$ with respect to $\nu$. (The first of these is a constructive version of Billingsley dimension [2].) When $\nu$ is the uniform probability measure on $\Sigma^\infty$, we have $\dim^\nu(S) = \dim(S)$ and $\mathrm{Dim}^\nu(S) = \mathrm{Dim}(S)$. A more interesting example occurs when $\nu$ is the product measure generated by a nonuniform probability measure $\beta$ on the alphabet $\Sigma$. In this case, $\dim^\nu(S)$ and $\mathrm{Dim}^\nu(S)$, which we write as $\dim^\beta(S)$ and $\mathrm{Dim}^\beta(S)$, are again the lower and upper information densities of $S$, but these densities are now measured with respect to unequal letter costs. Specifically, it was shown in [13] that

$$\dim^\beta(S) = \liminf_{w \to S} \frac{K(w)}{I_\beta(w)} \tag{1.3}$$

and

$$\mathrm{Dim}^\beta(S) = \limsup_{w \to S} \frac{K(w)}{I_\beta(w)}, \tag{1.4}$$

where

$$I_\beta(w) = \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])}$$

is the Shannon self-information of $w$ with respect to $\beta$. These unequal letter costs $\log(1/\beta(a))$ for $a \in \Sigma$ can in fact be useful. For example, the complete analysis of the dimensions of individual points in self-similar fractals given by [13] requires these constructive dimensions with a particular choice of the probability measure $\beta$ on $\Sigma$.

In this paper we show how to use the constructive dimensions $\dim^\beta(S)$ and $\mathrm{Dim}^\beta(S)$ in conjunction with randomness to measure the degree to which two probability measures on $\Sigma$ are similar. To see why this might be possible, we note that the inequalities

$$0 \le \dim^\beta(S) \le \mathrm{Dim}^\beta(S) \le 1$$

hold for all $\beta$ and $S$ and that the maximum values

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = 1 \tag{1.5}$$

are achieved when $R$ is random with respect to $\beta$.
It is thus reasonable to hope that, if $R$ is random with respect to some other probability measure $\alpha$ on $\Sigma$, then $\dim^\beta(R)$ and $\mathrm{Dim}^\beta(R)$ will take on values whose closeness to 1 reflects the degree to which $\alpha$ is similar to $\beta$.

This is indeed the case. Our first main theorem says that the divergence formula

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)} \tag{1.6}$$

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $\mathcal{D}(\alpha \| \beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$. When $\alpha = \beta$, the Kullback-Leibler divergence $\mathcal{D}(\alpha \| \beta)$ is 0, so (1.6) coincides with (1.5). When $\alpha$ and $\beta$ are dissimilar, the Kullback-Leibler divergence $\mathcal{D}(\alpha \| \beta)$ is large, so the right-hand side of (1.6) is small. Hence the divergence formula tells us that, when $R$ is $\alpha$-random, $\dim^\beta(R) = \mathrm{Dim}^\beta(R)$ is a quantity in $[0, 1]$ whose closeness to 1 is an indicator of the similarity between $\alpha$ and $\beta$.

The proof of (1.6) serves as an outline of our other, more challenging task, which is to prove that the divergence formula (1.6) also holds for the much more effective finite-state $\beta$-dimension $\dim^\beta_{\mathrm{FS}}(R)$ and finite-state strong $\beta$-dimension $\mathrm{Dim}^\beta_{\mathrm{FS}}(R)$. (These dimensions are generalizations of finite-state dimension and finite-state strong dimension, which were introduced in [5, 1], respectively.)

With this objective in mind, our second main theorem characterizes the finite-state $\beta$-dimensions in terms of finite-state data compression. Specifically, this theorem says that, in analogy with (1.3) and (1.4), the identities

$$\dim^\beta_{\mathrm{FS}}(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{I_\beta(w)} \tag{1.7}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{I_\beta(w)} \tag{1.8}$$

hold for all infinite sequences $S$ over $\Sigma$.
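As a numerical illustration of the divergence formula (1.6), the following Python sketch evaluates its right-hand side for two example measures. The measures `alpha` and `beta`, the function names, and the sample length are all assumptions chosen for the illustration, not values from the paper. The final lines also compare $I_\beta(w)/|w|$ on a pseudorandom sample with $\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)$, the denominator of (1.6); a seeded pseudorandom sample is only a stand-in for an $\alpha$-random sequence.

```python
import math
import random


def entropy(alpha):
    """Shannon entropy H(alpha) in bits; alpha must be a positive measure."""
    return sum(p * math.log2(1.0 / p) for p in alpha.values())


def kl_divergence(alpha, beta):
    """Kullback-Leibler divergence D(alpha || beta) in bits."""
    return sum(p * math.log2(p / beta[a]) for a, p in alpha.items())


def divergence_formula(alpha, beta):
    """Right-hand side of (1.6): H(alpha) / (H(alpha) + D(alpha || beta)).

    Requires H(alpha) > 0, which holds for any positive measure on an
    alphabet with at least two letters.
    """
    h = entropy(alpha)
    return h / (h + kl_divergence(alpha, beta))


def self_information(w, beta):
    """Shannon self-information I_beta(w): sum of log2(1 / beta(w[i]))."""
    return sum(math.log2(1.0 / beta[a]) for a in w)


# Hypothetical example measures on the alphabet {"0", "1"}.
alpha = {"0": 0.5, "1": 0.5}
beta = {"0": 0.25, "1": 0.75}

print(divergence_formula(alpha, alpha))  # alpha = beta: D is 0, as in (1.5)
print(divergence_formula(alpha, beta))   # alpha != beta: strictly below 1

# Draw a long sample with letter probabilities given by alpha and compare the
# average letter cost under beta with H(alpha) + D(alpha || beta).
rng = random.Random(0)
w = rng.choices(list(alpha), weights=list(alpha.values()), k=100_000)
average_cost = self_information(w, beta) / len(w)
print(average_cost, entropy(alpha) + kl_divergence(alpha, beta))
```

The two numbers on the last line agree closely because the expected letter cost $\sum_a \alpha(a) \log(1/\beta(a))$ equals $\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)$, which is one way to see where the denominator of (1.6) comes from.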
The infima in (1.7) and (1.8) are taken over all information-lossless finite-state compressors $C$ (a model introduced by Shannon [18] and investigated extensively ever since) with output alphabet $\{0, 1\}$, and $|C(w)|$ denotes the number of bits that $C$ outputs when processing the prefix $w$ of $S$. The special cases of (1.7) and (1.8) in which $\beta$ is the uniform probability measure on $\Sigma$, and hence $I_\beta(w) = |w| \log |\Sigma|$, were proven in [5, 1]. In fact, our proof uses these special cases as "black boxes" from which we derive the more general (1.7) and (1.8).

With (1.7) and (1.8) in hand, we prove our third main theorem. This involves the finite-state version of randomness, which was introduced by Borel [3] long before finite-state automata were defined. If $\alpha$ is a probability measure on $\Sigma$, then a sequence $S \in \Sigma^\infty$ is $\alpha$-normal in the sense of Borel if every finite string $w \in \Sigma^*$ appears with asymptotic frequency $\alpha(w)$ in $S$, where we write

$$\alpha(w) = \prod_{i=0}^{|w|-1} \alpha(w[i]).$$

Our third main theorem says that the divergence formula

$$\dim^\beta_{\mathrm{FS}}(R) = \mathrm{Dim}^\beta_{\mathrm{FS}}(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)} \tag{1.9}$$

holds whenever $\alpha$ and $\beta$ are positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is $\alpha$-normal.

Acknowledgments.

I thank Xiaoyang Gu and Elvira Mayordomo for useful discussions.

References

[1] K. B. Athreya, J. M. Hitchcock, J. H. Lutz, and E. Mayordomo. Effective strong dimension, algorithmic information, and computational complexity. SIAM Journal on Computing, 37:671–705, 2007.
[2] P. Billingsley. Hausdorff dimension in probability theory. Illinois Journal of Mathematics, 4:187–209, 1960.
[3] E. Borel. Sur les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo, 27:247–271, 1909.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., second edition, 2006.
[5] J. J. Dai, J. I. Lathrop, J. H. Lutz, and E. Mayordomo. Finite-state dimension. Theoretical Computer Science, 310:1–33, 2004.
[6] H. Eggleston. The fractional dimension of a set defined by decimal properties. Quarterly Journal of Mathematics, Oxford Series 20:31–36, 1949.
[7] F. Hausdorff. Dimension und äusseres Mass. Mathematische Annalen ∼ jhitchco/bib/dim.shtml (current October, 2008).
[9] M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, Berlin, 1997. Second Edition.
[10] J. H. Lutz. Dimension in complexity classes. SIAM Journal on Computing, 32:1236–1259, 2003.
[11] J. H. Lutz. The dimensions of individual strings and sequences.