A Divergence Formula for Randomness and Dimension (Short Version)
T. Neary, D. Woods, A.K. Seda and N. Murphy (Eds.): The Complexity of Simple Programs 2008. EPTCS 1, 2009, pp. 149–152, doi:10.4204/EPTCS.1.14. © Jack H. Lutz
Jack H. Lutz∗
Department of Computer Science, Iowa State University, Ames, IA 50011 USA. [email protected].

If $S$ is an infinite sequence over a finite alphabet $\Sigma$ and $\beta$ is a probability measure on $\Sigma$, then the dimension of $S$ with respect to $\beta$, written $\dim^\beta(S)$, is a constructive version of Billingsley dimension that coincides with the (constructive Hausdorff) dimension $\dim(S)$ when $\beta$ is the uniform probability measure. This paper shows that $\dim^\beta(S)$ and its dual $\operatorname{Dim}^\beta(S)$, the strong dimension of $S$ with respect to $\beta$, can be used in conjunction with randomness to measure the similarity of two probability measures $\alpha$ and $\beta$ on $\Sigma$. Specifically, we prove that the divergence formula

$$\dim^\beta(R) = \operatorname{Dim}^\beta(R) = \frac{H(\alpha)}{H(\alpha) + D(\alpha \,\|\, \beta)}$$

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $H(\alpha)$ is the Shannon entropy of $\alpha$, and $D(\alpha \,\|\, \beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$.

The constructive dimension $\dim(S)$ and the constructive strong dimension $\operatorname{Dim}(S)$ of an infinite sequence $S$ over a finite alphabet $\Sigma$ are constructive versions of the two most important classical fractal dimensions, namely, Hausdorff dimension [7] and packing dimension [20, 19], respectively. These two constructive dimensions, which were introduced in [11, 1], have been shown to have the useful characterizations

$$\dim(S) = \liminf_{w \to S} \frac{K(w)}{|w| \log |\Sigma|} \tag{1.1}$$

and

$$\operatorname{Dim}(S) = \limsup_{w \to S} \frac{K(w)}{|w| \log |\Sigma|}, \tag{1.2}$$

where the logarithm is base-2 [15, 1]. In these equations, $K(w)$ is the Kolmogorov complexity of the prefix $w$ of $S$, i.e., the length in bits of the shortest program that prints the string $w$. (See [9] for details.) The numerators in these equations are thus the algorithmic information content of $w$, while the denominators are the "naive" information content of $w$, also in bits. We thus understand (1.1) and (1.2) to say that $\dim(S)$ and $\operatorname{Dim}(S)$ are the lower and upper information densities of the sequence $S$. These constructive dimensions and their analogs at other levels of effectivity have been investigated extensively in recent years [8].

∗ This research was supported in part by National Science Foundation Grants 9988483, 0344187, 0652569, and 0728806 and by the Spanish Ministry of Education and Science (MEC) and the European Regional Development Fund (ERDF) under project TIN2005-08832-C03-02.
The constructive dimensions $\dim(S)$ and $\operatorname{Dim}(S)$ have recently been generalized to incorporate a probability measure $\nu$ on the sequence space $\Sigma^\infty$ as a parameter [13]. Specifically, for each such $\nu$ and each sequence $S \in \Sigma^\infty$, we now have the constructive dimension $\dim^\nu(S)$ and the constructive strong dimension $\operatorname{Dim}^\nu(S)$ of $S$ with respect to $\nu$. (The first of these is a constructive version of Billingsley dimension [2].) When $\nu$ is the uniform probability measure on $\Sigma^\infty$, we have $\dim^\nu(S) = \dim(S)$ and $\operatorname{Dim}^\nu(S) = \operatorname{Dim}(S)$. A more interesting example occurs when $\nu$ is the product measure generated by a nonuniform probability measure $\beta$ on the alphabet $\Sigma$. In this case, $\dim^\nu(S)$ and $\operatorname{Dim}^\nu(S)$, which we write as $\dim^\beta(S)$ and $\operatorname{Dim}^\beta(S)$, are again the lower and upper information densities of $S$, but these densities are now measured with respect to unequal letter costs. Specifically, it was shown in [13] that

$$\dim^\beta(S) = \liminf_{w \to S} \frac{K(w)}{I_\beta(w)} \tag{1.3}$$

and

$$\operatorname{Dim}^\beta(S) = \limsup_{w \to S} \frac{K(w)}{I_\beta(w)}, \tag{1.4}$$

where

$$I_\beta(w) = \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])}$$

is the Shannon self-information of $w$ with respect to $\beta$. These unequal letter costs $\log(1/\beta(a))$ for $a \in \Sigma$ can in fact be useful. For example, the complete analysis of the dimensions of individual points in self-similar fractals given by [13] requires these constructive dimensions with a particular choice of the probability measure $\beta$ on $\Sigma$.

In this paper we show how to use the constructive dimensions $\dim^\beta(S)$ and $\operatorname{Dim}^\beta(S)$ in conjunction with randomness to measure the degree to which two probability measures on $\Sigma$ are similar. To see why this might be possible, we note that the inequalities $0 \le \dim^\beta(S) \le \operatorname{Dim}^\beta(S) \le 1$ hold for all $\beta$ and $S$ and that the maximum values

$$\dim^\beta(R) = \operatorname{Dim}^\beta(R) = 1 \tag{1.5}$$

are achieved if the sequence $R$ is random with respect to $\beta$. It is thus reasonable to hope that, if $R$ is random with respect to some other probability measure $\alpha$ on $\Sigma$, then $\dim^\beta(R)$ and $\operatorname{Dim}^\beta(R)$ will take on values whose closeness to 1 reflects the degree to which $\alpha$ is similar to $\beta$.

This is indeed the case. Our first main theorem says that the divergence formula

$$\dim^\beta(R) = \operatorname{Dim}^\beta(R) = \frac{H(\alpha)}{H(\alpha) + D(\alpha \,\|\, \beta)} \tag{1.6}$$

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $H(\alpha)$ is the Shannon entropy of $\alpha$, and $D(\alpha \,\|\, \beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$. When $\alpha = \beta$, the Kullback-Leibler divergence $D(\alpha \,\|\, \beta)$ is 0, so (1.6) coincides with (1.5). When $\alpha$ and $\beta$ are dissimilar, the Kullback-Leibler divergence $D(\alpha \,\|\, \beta)$ is large, so the right-hand side of (1.6) is small. Hence the divergence formula tells us that, when $R$ is $\alpha$-random, $\dim^\beta(R) = \operatorname{Dim}^\beta(R)$ is a quantity in $[0, 1]$ whose closeness to 1 is an indicator of the similarity between $\alpha$ and $\beta$.

The proof of (1.6) serves as an outline of our other, more challenging task, which is to prove that the divergence formula (1.6) also holds for the much more effective finite-state $\beta$-dimension $\dim_{\mathrm{FS}}^\beta(R)$ and finite-state strong $\beta$-dimension $\operatorname{Dim}_{\mathrm{FS}}^\beta(R)$. (These dimensions are generalizations of finite-state dimension and finite-state strong dimension, which were introduced in [5, 1], respectively.)

With this objective in mind, our second main theorem characterizes the finite-state $\beta$-dimensions in terms of finite-state data compression. Specifically, this theorem says that, in analogy with (1.3) and (1.4), the identities

$$\dim_{\mathrm{FS}}^\beta(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{I_\beta(w)} \tag{1.7}$$

and

$$\operatorname{Dim}_{\mathrm{FS}}^\beta(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{I_\beta(w)} \tag{1.8}$$

hold for all infinite sequences $S$ over $\Sigma$.
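The right-hand side of the divergence formula (1.6) is easy to evaluate for concrete measures. The following is a minimal sketch, not taken from the paper: it assumes measures are represented as Python dictionaries from symbols to probabilities, and the function names `shannon_entropy`, `kl_divergence`, and `divergence_formula` are hypothetical helpers introduced only for illustration. Logarithms are base-2, as in (1.1) and (1.2).

```python
import math


def shannon_entropy(alpha):
    """H(alpha) in bits, for a dict mapping symbols to probabilities."""
    return sum(p * math.log2(1.0 / p) for p in alpha.values() if p > 0)


def kl_divergence(alpha, beta):
    """D(alpha || beta) in bits; assumes beta is positive on every symbol."""
    return sum(p * math.log2(p / beta[a]) for a, p in alpha.items() if p > 0)


def divergence_formula(alpha, beta):
    """Right-hand side of (1.6): H(alpha) / (H(alpha) + D(alpha || beta))."""
    h = shannon_entropy(alpha)
    return h / (h + kl_divergence(alpha, beta))


# Illustrative measures on the binary alphabet (assumed, not from the paper).
uniform = {"0": 0.5, "1": 0.5}
biased = {"0": 0.25, "1": 0.75}

same = divergence_formula(uniform, uniform)      # alpha = beta, so D = 0
vs_uniform = divergence_formula(biased, uniform)  # beta uniform
vs_biased = divergence_formula(uniform, biased)   # beta nonuniform
```

When $\beta$ is uniform on a binary alphabet, $H(\alpha) + D(\alpha \,\|\, \beta) = \log 2 = 1$, so the value reduces to $H(\alpha)$; when $\alpha = \beta$ the value is 1, in agreement with (1.5).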
The infima here are taken over all information-lossless finite-state compressors (a model introduced by Shannon [18] and investigated extensively ever since) $C$ with output alphabet $\{0, 1\}$, and $|C(w)|$ denotes the number of bits that $C$ outputs when processing the prefix $w$ of $S$. The special cases of (1.7) and (1.8) in which $\beta$ is the uniform probability measure on $\Sigma$, and hence $I_\beta(w) = |w| \log |\Sigma|$, were proven in [5, 1]. In fact, our proof uses these special cases as "black boxes" from which we derive the more general (1.7) and (1.8).

With (1.7) and (1.8) in hand, we prove our third main theorem. This involves the finite-state version of randomness, which was introduced by Borel [3] long before finite-state automata were defined. If $\alpha$ is a probability measure on $\Sigma$, then a sequence $S \in \Sigma^\infty$ is $\alpha$-normal in the sense of Borel if every finite string $w \in \Sigma^*$ appears with asymptotic frequency $\alpha(w)$ in $S$, where we write

$$\alpha(w) = \prod_{i=0}^{|w|-1} \alpha(w[i]).$$

Our third main theorem says that the divergence formula

$$\dim_{\mathrm{FS}}^\beta(R) = \operatorname{Dim}_{\mathrm{FS}}^\beta(R) = \frac{H(\alpha)}{H(\alpha) + D(\alpha \,\|\, \beta)} \tag{1.9}$$

holds whenever $\alpha$ and $\beta$ are positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is $\alpha$-normal.

Acknowledgments. I thank Xiaoyang Gu and Elvira Mayordomo for useful discussions.
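As a numerical illustration of the self-information $I_\beta(w)$ in (1.3) and (1.4) and of the value predicted by the divergence formulas (1.6) and (1.9), the sketch below samples a long prefix from the product measure generated by $\alpha$ and compares $I_\alpha(w)/I_\beta(w)$ with $H(\alpha)/(H(\alpha) + D(\alpha \,\|\, \beta))$. Several things here are assumptions and not part of the paper: a pseudo-random sample is only a stand-in for an $\alpha$-random or $\alpha$-normal sequence, $I_\alpha(w)$ is used as a heuristic substitute for the uncomputable $K(w)$, and the measures and the helper names `self_information` and `sample_prefix` are hypothetical.

```python
import math
import random


def self_information(w, beta):
    """I_beta(w) = sum over i of log2(1 / beta(w[i])), in bits."""
    return sum(math.log2(1.0 / beta[c]) for c in w)


def sample_prefix(alpha, n, seed=0):
    """Draw n symbols i.i.d. from alpha (a pseudo-random stand-in for an
    alpha-random sequence; this is an assumption of the sketch)."""
    rng = random.Random(seed)
    symbols = list(alpha)
    weights = [alpha[s] for s in symbols]
    return rng.choices(symbols, weights=weights, k=n)


# Illustrative positive measures on a three-letter alphabet (assumed).
alpha = {"a": 0.5, "b": 0.25, "c": 0.25}
beta = {"a": 0.2, "b": 0.3, "c": 0.5}

w = sample_prefix(alpha, 200_000)

# Heuristic proxy for K(w) / I_beta(w): replace K(w) by I_alpha(w).
ratio = self_information(w, alpha) / self_information(w, beta)

# Value predicted by the divergence formula.
h = sum(p * math.log2(1.0 / p) for p in alpha.values())
d = sum(p * math.log2(p / beta[a]) for a, p in alpha.items())
predicted = h / (h + d)

print(f"empirical ratio: {ratio:.4f}, predicted: {predicted:.4f}")
```

The two numbers agree closely because the expected letter cost under $\alpha$ satisfies $\sum_{a} \alpha(a) \log(1/\beta(a)) = H(\alpha) + D(\alpha \,\|\, \beta)$, so by the law of large numbers $I_\beta(w)/|w|$ approaches $H(\alpha) + D(\alpha \,\|\, \beta)$ while $I_\alpha(w)/|w|$ approaches $H(\alpha)$.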
References

[1] K. B. Athreya, J. M. Hitchcock, J. H. Lutz, and E. Mayordomo. Effective strong dimension, algorithmic information, and computational complexity. SIAM Journal on Computing, 37:671–705, 2007.
[2] P. Billingsley. Hausdorff dimension in probability theory. Illinois Journal of Mathematics, 4:187–209, 1960.
[3] E. Borel. Sur les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo, 27:247–271, 1909.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., second edition, 2006.
[5] J. J. Dai, J. I. Lathrop, J. H. Lutz, and E. Mayordomo. Finite-state dimension. Theoretical Computer Science, 310:1–33, 2004.
[6] H. Eggleston. The fractional dimension of a set defined by decimal properties. Quarterly Journal of Mathematics, Oxford Series 20:31–36, 1949.
[7] F. Hausdorff. Dimension und äusseres Mass. Mathematische Annalen …
[8] … ∼jhitchco/bib/dim.shtml (current October, 2008).
[9] M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, Berlin, 1997. Second Edition.
[10] J. H. Lutz. Dimension in complexity classes. SIAM Journal on Computing, 32:1236–1259, 2003.
[11] J. H. Lutz. The dimensions of individual strings and sequences. Information and Computation, 187:49–79, 2003.
[12] J. H. Lutz. A divergence formula for randomness and dimension. Technical Report cs.CC/0811.1825, Computing Research Repository, 2008.
[13] J. H. Lutz and E. Mayordomo. Dimensions of points in self-similar fractals. SIAM Journal on Computing, 38:1080–1112, 2008.
[14] P. Martin-Löf. The definition of random sequences. Information and Control, 9:602–619, 1966.
[15] E. Mayordomo. A Kolmogorov complexity characterization of constructive Hausdorff dimension. Information Processing Letters, 84(1):1–3, 2002.
[16] C. P. Schnorr. A unified approach to the definition of random sequences. Mathematical Systems Theory, 5:246–258, 1971.
[17] C. P. Schnorr. A survey of the theory of random sequences. In R. E. Butts and J. Hintikka, editors, Basic Problems in Methodology and Linguistics, pages 193–210. D. Reidel, 1977.
[18] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.
[19] D. Sullivan. Entropy, Hausdorff measures old and new, and limit sets of geometrically finite Kleinian groups. Acta Mathematica, 153:259–277, 1984.
[20] C. Tricot. Two definitions of fractional dimension.