Mutual Dimension and Random Sequences
Adam Case and Jack H. Lutz∗
Department of Computer Science, Iowa State University, Ames, IA 50011 USA
Abstract

If S and T are infinite sequences over a finite alphabet, then the lower and upper mutual dimensions mdim(S : T) and Mdim(S : T) are the lower and upper densities of the algorithmic information that is shared by S and T. In this paper we investigate the relationships between mutual dimension and coupled randomness, which is the algorithmic randomness of two sequences R₁ and R₂ with respect to probability measures that may be dependent on one another. For a restricted but interesting class of coupled probability measures we prove an explicit formula for the mutual dimensions mdim(R₁ : R₂) and Mdim(R₁ : R₂), and we show that the condition Mdim(R₁ : R₂) = 0 is necessary but not sufficient for R₁ and R₂ to be independently random. We also identify conditions under which Billingsley generalizations of the mutual dimensions mdim(S : T) and Mdim(S : T) can be meaningfully defined; we show that under these conditions these generalized mutual dimensions have the "correct" relationships with the Billingsley generalizations of dim(S), Dim(S), dim(T), and Dim(T) that were developed and applied by Lutz and Mayordomo; and we prove a divergence formula for the values of these generalized mutual dimensions.

∗This research was supported in part by National Science Foundation Grants 0652519, 1143830, 124705, and 1545028. Part of the second author's work was done during a sabbatical at Caltech and the Isaac Newton Institute for Mathematical Sciences at the University of Cambridge. A preliminary version of part of this work was presented at the Fortieth International Symposium on Mathematical Foundations of Computer Science, August 24-28, 2015, in Milano, Italy.

Algorithmic information theory combines tools from the theory of computing and classical Shannon information theory to create new methods for quantifying information in an expanding variety of contexts. Two notable and related strengths of this approach that were evident from the beginning [11] are its abilities to quantify the information in and to assess the randomness of individual data objects.

Some useful mathematical objects, such as real numbers and execution traces of nonterminating processes, are intrinsically infinitary. The randomness of such objects was successfully defined very early [18], but it was only at the turn of the present century [15, 14] that ideas of Hausdorff were reshaped in order to define effective fractal dimensions, which quantify the densities of algorithmic information in such infinitary objects. Effective fractal dimensions, of which there are now many, and their relations with randomness are now a significant part of algorithmic information theory [6].

A third strength of algorithmic information theory is its ability to quantify the information shared by two objects. The mutual information I(X; Y) of classical Shannon information theory does something along these lines, but for two probability spaces of objects rather than for two individual objects [5]. The algorithmic mutual information I(x : y), defined in terms of Kolmogorov complexity [13], quantifies the information shared by two individual finite objects x and y. The present authors recently developed the mutual dimensions mdim(x : y) and Mdim(x : y) in order to quantify the density of algorithmic information shared by two infinitary objects x and y [4].
The objects x and y of interest in [4] are points in Euclidean spaces R^n and their images under computable functions, so the fine-scale geometry of R^n plays a major role there.

In this paper we investigate mutual dimensions further, with objectives that are more conventional in algorithmic information theory. Specifically, we focus on the lower and upper mutual dimensions mdim(S : T) and Mdim(S : T) between two sequences S, T ∈ Σ^∞, where Σ is a finite alphabet. (If Σ = {0, 1}, then we write C for the Cantor space Σ^∞.) The definitions of these mutual dimensions, which are somewhat simpler in Σ^∞ than in R^n, are implicit in [4] and explicit in section 2 below. Our main objective here is to investigate the relationships between mutual dimension and coupled randomness, which is the algorithmic randomness of two sequences R₁ and R₂ with respect to probability measures that may be dependent on one another. In section 3 below we formulate coupled randomness precisely, and we prove our main theorem, Theorem 3.8, which gives an explicit formula for mdim(R₁ : R₂) and Mdim(R₁ : R₂) in a restricted but interesting class of coupled probability measures. This theorem can be regarded as a "mutual version" of Theorem 7.7 of [14], which in turn is an algorithmic extension of a classical theorem of Eggleston [7, 2]. We also show in section 3 that Mdim(R₁ : R₂) = 0 is a necessary, but not sufficient, condition for two random sequences R₁ and R₂ to be independently random.

In 1960 Billingsley investigated generalizations of Hausdorff dimension in which the dimension itself is defined "through the lens of" a given probability measure [1, 3]. Lutz and Mayordomo developed the effective Billingsley dimensions dim^ν(S) and Dim^ν(S), where ν is a probability measure on Σ^∞, and these have been useful in the algorithmic information theory of self-similar fractals [17, 8]. In section 4 we investigate "Billingsley generalizations" mdim^ν(S : T) and Mdim^ν(S : T) of mdim(S : T) and Mdim(S : T), where ν is a probability measure on Σ^∞ × Σ^∞. These turn out to make sense only when S and T are mutually normalizable, which means that the normalizations implicit in the fact that these dimensions are densities of shared information are the same for S as for T. We prove that, when mutual normalizability is satisfied, the Billingsley mutual dimensions mdim^ν(S : T) and Mdim^ν(S : T) are well behaved. We also identify a sufficient condition for mutual normalizability, make some preliminary observations on when it holds, and prove a divergence formula, analogous to a theorem of [16], for computing the values of the Billingsley mutual dimensions in many cases.

In [4] the authors defined and investigated the mutual dimension between points in Euclidean space. The purpose of this section is to develop a similar framework for the mutual dimension between sequences.

Let Σ = {0, 1, . . . , k − 1} be our alphabet and Σ^∞ denote the set of all k-ary sequences over Σ. For S, T ∈ Σ^∞, the notation (S, T) represents the sequence in (Σ × Σ)^∞ obtained after pairing each symbol in S with the symbol in T located at the same position. For S ∈ Σ^∞, let

α_S = Σ_{i=0}^{∞} S[i] k^{−(i+1)} ∈ [0, 1].  (2.1)

Informally, we say that α_S is the real representation of S. Note that, in this section, we often use the notation S↾r to mean the first r ∈ N symbols of a sequence S.

We begin by reviewing some definitions and theorems of algorithmic information theory. All Turing machines are assumed to be self-delimiting.
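The rational approximation α_{S↾r} determined by a finite prefix is easy to compute exactly. The following is a minimal illustrative sketch (ours, not from the paper; the function name is an invention for illustration), using exact rational arithmetic:

```python
# A minimal illustrative sketch (ours): the rational approximation alpha_{S|r}
# of the real representation (2.1), computed exactly for a finite prefix of S.
from fractions import Fraction

def real_representation_prefix(prefix, k):
    """Return sum_{i < r} S[i] * k^{-(i+1)} for the length-r prefix of S."""
    return sum(Fraction(digit, k ** (i + 1)) for i, digit in enumerate(prefix))

# The ternary prefix 1 0 2 maps to 1/3 + 0/9 + 2/27 = 11/27.
print(real_representation_prefix([1, 0, 2], 3))   # 11/27
```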
Definition. The conditional Kolmogorov complexity of u ∈ Σ∗ given w ∈ Σ∗ with respect to a Turing machine M is

K_M(u | w) = min{ |π| : π ∈ {0, 1}∗ and M(π, w) = u }.

We define the Kolmogorov complexity of u ∈ Σ∗ with respect to a Turing machine M by K_M(u) = K_M(u | λ), where λ is the empty string.
Definition. A Turing machine M′ is optimal if, for all Turing machines M, there exists a constant c ∈ N such that

K_{M′}(u) ≤ K_M(u) + c

for all u ∈ {0, 1}∗.

The following theorem is an important observation in algorithmic information theory.

Theorem 2.1 (Optimality). Every universal Turing machine is optimal.
For the duration of this paper, we let U be some fixed universal Turing machine.

Definition. The conditional Kolmogorov complexity of u ∈ Σ∗ given w ∈ Σ∗ is K(u | w) = K_U(u | w). The Kolmogorov complexity of a string u ∈ Σ∗ is K(u) = K(u | λ).

For a detailed overview of Kolmogorov complexity and its properties, see [13]. The following definition of the Kolmogorov complexity of sets of strings is also useful.

Definition (Shen and Vereshchagin [24]). The Kolmogorov complexity of a set S ⊆ Σ∗ is

K(S) = min{ K(u) | u ∈ S }.
Definition. The lower and upper dimensions of S ∈ Σ^∞ are

dim(S) = liminf_{u→S} K(u) / (|u| log |Σ|)

and

Dim(S) = limsup_{u→S} K(u) / (|u| log |Σ|),

respectively.
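Since K is uncomputable, no program can evaluate dim(S) from this definition. Purely as an illustration, the following sketch (ours) substitutes a general-purpose compressor for K — a common heuristic stand-in that only upper-bounds complexity and is in no way the definition above — to measure the information density of a single prefix:

```python
# An illustration only: K(u) is uncomputable, so as a crude heuristic stand-in we
# measure the compressed length of a prefix with zlib.
import math
import zlib

def density_estimate(prefix_bytes, alphabet_size=2):
    """Compressed bits per symbol, normalized by log |Sigma|; the liminf and
    limsup of such densities over prefixes play the roles of dim and Dim."""
    compressed_bits = 8 * len(zlib.compress(prefix_bytes, 9))   # proxy for K(u)
    return compressed_bits / (len(prefix_bytes) * math.log2(alphabet_size))

prefix = bytes(i % 2 for i in range(4096))   # one symbol per byte, highly regular
print(density_estimate(prefix))              # small: the prefix is very compressible
```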
We now proceed to prove several lemmas which describe how the dimensions of sequences and the dimensions of points in Euclidean space correspond to one another.

Keeping in mind that tuples of rationals in Q^n can be easily encoded as strings in Σ∗, we use the following definition of the Kolmogorov complexity of points in Euclidean space.

Definition. The Kolmogorov complexity of x ∈ R^n at precision r ∈ N is

K_r(x) = K(B_{2^{−r}}(x) ∩ Q^n).

We recall a useful corollary from [4] that is used in the proof of Lemma 2.3.
Corollary 2.2. For all x ∈ R^n and r, s ∈ N,

K_{r+s}(x) ≤ K_r(x) + o(r).
Lemma 2.3. For all S, T ∈ Σ^∞ and r ∈ N,

K((S, T)↾r) = K_r(α_S, α_T) + o(r).
Proof. First we show that K_r(α_S, α_T) ≤ K((S, T)↾r) + o(r). Observe that

|(α_S, α_T) − (α_{S↾r}, α_{T↾r})|
 = |(Σ_{i=r}^{∞} S[i] k^{−(i+1)}, Σ_{i=r}^{∞} T[i] k^{−(i+1)})|
 ≤ |(Σ_{i=r}^{∞} (k−1) k^{−(i+1)}, Σ_{i=r}^{∞} (k−1) k^{−(i+1)})|
 = |(k^{−r}, k^{−r})|
 ≤ |(2^{−r}, 2^{−r})|
 ≤ 2^{−(r−1)},

which implies the inequality

K_{r−1}(α_S, α_T) ≤ K(α_{S↾r}, α_{T↾r}).  (2.2)

Let M be a Turing machine such that, if U(π) = (u₀, w₀)(u₁, w₁) · · · (u_{n−1}, w_{n−1}) ∈ (Σ × Σ)∗, then

M(π) = (Σ_{i=0}^{n−1} u_i · k^{−(i+1)}, Σ_{i=0}^{n−1} w_i · k^{−(i+1)}).  (2.3)

Let c_M be an optimality constant for M and let π ∈ {0, 1}∗ be a minimum-length program for (S, T)↾r. By optimality and (2.3),

K(α_{S↾r}, α_{T↾r}) ≤ K_M(α_{S↾r}, α_{T↾r}) + c_M ≤ |π| + c_M = K((S, T)↾r) + c_M.  (2.4)

Therefore, by Corollary 2.2, (2.2), and (2.4),

K_r(α_S, α_T) ≤ K_{r−1}(α_S, α_T) + o(r) ≤ K(α_{S↾r}, α_{T↾r}) + o(r) ≤ K((S, T)↾r) + o(r).

Next we prove that K((S, T)↾r) ≤ K_r(α_S, α_T) + O(1). We consider the case where S = x(k−1)^∞ and T ≠ y(k−1)^∞, where x, y ∈ Σ∗ are either empty or end with a symbol other than k−1; i.e., S has a tail that is an infinite sequence of the largest symbol in Σ and T does not. Let M′ be a Turing machine such that, if U(π) = ⟨q, p⟩ for two rationals q, p ∈ [0, 1], then

M′(π) = (u₀, w₀)(u₁, w₁) · · · (u_{r−1}, w_{r−1}) ∈ (Σ × Σ)∗,  (2.5)

where M′ operates by running π on U to obtain (q, p) and searching for strings u = u₀u₁ · · · u_{r−1} and w = w₀w₁ · · · w_{r−1} such that

q = Σ_{i=0}^{|x|−1} u_i k^{−(i+1)} + (k−1)k^{−(|x|+1)} + (k−1)k^{−(|x|+2)} + · · ·, u_{|x|−1} < k−1, and u_i = k−1 for i ≥ |x|,  (2.6)

and

w_i · k^{−(i+1)} ≤ p − (w₀ · k^{−1} + w₁ · k^{−2} + · · · + w_{i−1} · k^{−i}) < (w_i + 1) · k^{−(i+1)}  (2.7)

for 0 ≤ i < r.

Let c_{M′} be an optimality constant for M′ and let m, t ∈ N be such that m, t ≤ k^r − 1 and

(α_S, α_T) ∈ [m · k^{−r}, (m+1) · k^{−r}) × [t · k^{−r}, (t+1) · k^{−r}).  (2.8)

Let (q, p) be a rational point of minimal Kolmogorov complexity satisfying

(q, p) ∈ B_{k^{−r}}(α_S, α_T) ∩ ([m · k^{−r}, (m+1) · k^{−r}) × [t · k^{−r}, (t+1) · k^{−r})) ∩ Q²,  (2.9)

and let π be a minimum-length program for (q, p). First we show that u_i = S[i] for all 0 ≤ i < r. We do not need to consider the case where i ≥ |x| because (2.6) assures us that u_i = S[i]. Thus we will always assume that i < |x|. If u₀ ≠ S[0], then, by (2.6),

q ∉ [S[0] · k^{−1}, (S[0] + 1) · k^{−1}).

By (2.8), this implies that

q ∉ [m · k^{−r}, (m+1) · k^{−r}),

which contradicts (2.9). Now assume that u_n = S[n] for all n ≤ i, where i < r − 1. If u_{i+1} ≠ S[i+1], then, by (2.6),

q ∉ [Σ_{n=0}^{i} S[n] · k^{−(n+1)} + S[i+1] · k^{−(i+2)}, Σ_{n=0}^{i} S[n] · k^{−(n+1)} + (S[i+1] + 1) · k^{−(i+2)}).

By (2.8), this implies that

q ∉ [m · k^{−r}, (m+1) · k^{−r}),

which contradicts (2.9). Therefore, u_i = S[i] for all 0 ≤ i < r. A similar argument shows that w_i = T[i], so we conclude that M′(π) = (S, T)↾r.

By optimality, (2.5), and (2.9),

K((S, T)↾r) ≤ K_{M′}((S, T)↾r) + c_{M′} ≤ |π| + c_{M′} = K(q, p) + c_{M′} = K(B_{k^{−r}}(α_S, α_T) ∩ [0, 1)² ∩ Q²) + c_{M′} ≤ K_r(α_S, α_T) + O(1),

where the last inequality holds simply because we can design a Turing machine to transform any point from outside the unit square to its edge. All other cases for S and T can be proved in a similar manner.
Lemma 2.4. For all S ∈ Σ^∞ and r ∈ N,

K(S↾r) = K_r(α_S) + o(r).

Proof. Let 0^∞ represent the sequence containing all 0's. It is clear that there exist constants c₁, c₂ ∈ N such that

K(S↾r) = K((S, 0^∞)↾r) + c₁ and K_r(α_S, 0) = K_r(α_S) + c₂.

Therefore, by the above equalities and Lemma 2.3,

K(S↾r) = K((S, 0^∞)↾r) + c₁
 = K_r(α_S, 0) + o(r) + c₁
 = K_r(α_S) + o(r) + c₁ + c₂
 = K_r(α_S) + o(r).
Definition. For any point x ∈ R^n, the lower and upper dimensions of x are

dim(x) = liminf_{r→∞} K_r(x)/r and Dim(x) = limsup_{r→∞} K_r(x)/r,

respectively.

The next two corollaries describe principles that relate the dimensions of sequences to the dimensions of the sequences' real representations. The first follows from Lemma 2.3 and the second follows from Lemma 2.4.
Corollary 2.5. For all S, T ∈ Σ^∞,

dim(S, T) = dim(α_S, α_T) and Dim(S, T) = Dim(α_S, α_T).

Corollary 2.6. For any sequence S ∈ Σ^∞,

dim(S) = dim(α_S) and Dim(S) = Dim(α_S).
Lemma 2.7. There is a constant c ∈ N such that, for all x, y ∈ {0, 1}∗,

K(y | x) ≤ K(y | ⟨x, K(x)⟩) + K(K(x)) + c.

Proof. Let M be a Turing machine such that, if U(π₁) = K(x) and U(π₂, ⟨x, K(x)⟩) = y, then

M(π₁π₂, x) = y.

Let c_M ∈ N be an optimality constant of M, let π₁ be a minimum-length program for K(x), and let π₂ be a minimum-length program for y given ⟨x, K(x)⟩. By optimality,

K(y | x) ≤ K_M(y | x) + c_M ≤ |π₁π₂| + c_M = K(y | ⟨x, K(x)⟩) + K(K(x)) + c,

where c = c_M.
Lemma 2.8. For all x ∈ {0, 1}∗, K(K(x)) = o(|x|) as |x| → ∞.

Proof. There exist constants c₁, c₂ ∈ N such that

K(K(x)) ≤ log K(x) + c₁ ≤ log(|x| + c₂) + c₁ = o(|x|)

as |x| → ∞.

The following lemma is well-known and can be found in [13].
Lemma 2.9. There is a constant c ∈ N such that, for all x, y ∈ {0, 1}∗,

K(x, y) = K(x) + K(y | ⟨x, K(x)⟩) + c.

Corollary 2.10. There is a constant c ∈ N such that, for all x, y ∈ {0, 1}∗,

K(x, y) ≤ K(x) + K(y | x) + c.
Lemma 2.11. For all x, y ∈ {0, 1}∗,

K(y | x) + K(x) ≤ K(x, y) + o(|x|)

as |x| → ∞.

Proof. By Lemma 2.7, there is a constant c₁ ∈ N such that

K(y | x) ≤ K(y | ⟨x, K(x)⟩) + K(K(x)) + c₁.

This implies that

K(y | x) + K(x) ≤ K(y | ⟨x, K(x)⟩) + K(K(x)) + K(x) + c₁.

By Lemma 2.9, there is a constant c₂ ∈ N such that

K(y | x) + K(x) ≤ K(x, y) + K(K(x)) + c₁ + c₂.

Therefore, by Lemma 2.8,

K(y | x) + K(x) ≤ K(x, y) + o(|x|)

as |x| → ∞.

The rest of this section is about mutual information and mutual dimension. We now provide the definitions of the mutual information between strings as defined in [13] and the mutual dimension between sequences.
Definition. The (algorithmic) mutual information between u ∈ Σ∗ and w ∈ Σ∗ is

I(u : w) = K(w) − K(w | u).
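Again purely as an illustration: Lemma 2.12 below shows that I(x : y) agrees with K(x) + K(y) − K(x, y) up to o(|x|), so a compressor can stand in for K to give a crude, non-rigorous estimate of shared information. A sketch (ours), with concatenation standing in for the pair (x, y):

```python
# An illustration only (ours): by Lemma 2.12 below,
# I(x : y) = K(x) + K(y) - K(x, y) + o(|x|), so a compressor gives a rough,
# heuristic estimate of shared information.
import zlib

def c(b):                         # compressed length in bits, a crude proxy for K
    return 8 * len(zlib.compress(b, 9))

def mutual_info_estimate(x, y):
    return c(x) + c(y) - c(x + y)

x = b"shared-content " * 30
print(mutual_info_estimate(x, x))           # large: x shares everything with itself
print(mutual_info_estimate(x, bytes(450)))  # near zero: unrelated strings
```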
Definition. The lower and upper mutual dimensions between S ∈ Σ^∞ and T ∈ Σ^∞ are

mdim(S : T) = liminf_{(u,w)→(S,T)} I(u : w) / (|u| log |Σ|)

and

Mdim(S : T) = limsup_{(u,w)→(S,T)} I(u : w) / (|u| log |Σ|),

respectively. (We insist that |u| = |w| in the above limits.) The mutual dimension between two sequences is regarded as the density of algorithmic mutual information between them.

Lemma 2.12. For all strings x, y ∈ {0, 1}∗,

I(x : y) = K(x) + K(y) − K(x, y) + o(|x|).
Proof. By the definition of mutual information and Lemma 2.11,

I(x : y) = K(y) − K(y | x) ≥ K(x) + K(y) − K(x, y) + o(|x|)

as |x| → ∞. Also, by Corollary 2.10, there is a constant c ∈ N such that

I(x : y) = K(y) − K(y | x) ≤ K(x) + K(y) − K(x, y) + c = K(x) + K(y) − K(x, y) + o(|x|)

as |x| → ∞.

The next two definitions were proposed and thoroughly investigated in [4].
Definition. The mutual information between x ∈ R^n and y ∈ R^m at precision r ∈ N is

I_r(x : y) = min{ I(q : p) | q ∈ B_{2^{−r}}(x) ∩ Q^n and p ∈ B_{2^{−r}}(y) ∩ Q^m }.

Definition. The lower and upper mutual dimensions between x ∈ R^n and y ∈ R^m are

mdim(x : y) = liminf_{r→∞} I_r(x : y)/r and Mdim(x : y) = limsup_{r→∞} I_r(x : y)/r,

respectively.
Lemma 2.13. For all S, T ∈ Σ^∞ and r ∈ N,

I(S↾r : T↾r) = I_r(α_S : α_T) + o(r).
Proof. By Lemmas 2.4, 2.3, and 2.12,

I(S↾r : T↾r) = K(S↾r) + K(T↾r) − K((S, T)↾r) + o(r)
 = K_r(α_S) + K_r(α_T) − K_r(α_S, α_T) + o(r)
 = I_r(α_S : α_T) + o(r)

as r → ∞.

The following corollary follows immediately from Lemma 2.13 and relates the mutual dimension between sequences to the mutual dimension between the sequences' real representations.
Corollary 2.14. For all S, T ∈ Σ^∞,

mdim(S : T) = mdim(α_S : α_T) and Mdim(S : T) = Mdim(α_S : α_T).
Theorem 2.15. For all S, T ∈ Σ^∞,

1. dim(S) + dim(T) − Dim(S, T) ≤ mdim(S : T) ≤ Dim(S) + Dim(T) − Dim(S, T).
2. dim(S) + dim(T) − dim(S, T) ≤ Mdim(S : T) ≤ Dim(S) + Dim(T) − dim(S, T).
3. mdim(S : T) ≤ min{dim(S), dim(T)}; Mdim(S : T) ≤ min{Dim(S), Dim(T)}.
4. 0 ≤ mdim(S : T) ≤ Mdim(S : T) ≤ 1.
5. mdim(S : T) = mdim(T : S); Mdim(S : T) = Mdim(T : S).

Proof. The theorem follows directly from the properties of mutual dimension between points in Euclidean space found in [4] and the correspondences described in Corollaries 2.5, 2.6, and 2.14.
In this section we investigate the mutual dimensions between coupled random sequences. Because coupled randomness is new to algorithmic information theory, we first review the technical framework for it.

Let Σ be a finite alphabet. A (Borel) probability measure on the Cantor space Σ^∞ of all infinite sequences over Σ is (conveniently represented by) a function ν : Σ∗ → [0, 1] with the following two properties.

1. ν(λ) = 1, where λ is the empty string.
2. For every w ∈ Σ∗, ν(w) = Σ_{a∈Σ} ν(wa).

Intuitively, here, ν(w) is the probability that w ⊑ S (w is a prefix of S) when S ∈ Σ^∞ is "chosen according to" the probability measure ν.

Most of this paper concerns a very special class of probability measures on Σ^∞. For each n ∈ N, let α^(n) be a probability measure on Σ, i.e., α^(n) : Σ → [0, 1] with

Σ_{a∈Σ} α^(n)(a) = 1,

and let ~α = (α^(0), α^(1), . . .) be the sequence of these probability measures on Σ. Then the product of ~α (or, emphatically distinguishing it from the products ν₁ × ν₂ below, the longitudinal product of ~α) is the probability measure µ[~α] on Σ^∞ defined by

µ[~α](w) = Π_{n=0}^{|w|−1} α^(n)(w[n])

for all w ∈ Σ∗, where w[n] is the nth symbol in w. Intuitively, a sequence S ∈ Σ^∞ is "chosen according to" µ[~α] by performing the successive experiments α^(0), α^(1), . . . independently.

To extend probability to pairs of sequences, we regard Σ × Σ as an alphabet and rely on the natural identification between Σ^∞ × Σ^∞ and (Σ × Σ)^∞. A probability measure on Σ^∞ × Σ^∞ is thus a function ν : (Σ × Σ)∗ → [0, 1]. It is convenient to write elements of (Σ × Σ)∗ as ordered pairs (u, v), where u, v ∈ Σ∗ have the same length. With this notation, condition 2 above says that, for every (u, v) ∈ (Σ × Σ)∗,

ν(u, v) = Σ_{a,b∈Σ} ν(ua, vb).

If ν is a probability measure on Σ^∞ × Σ^∞, then the first and second marginal probability measures of ν (briefly, the first and second marginals of ν) are the functions ν₁, ν₂ : Σ∗ → [0, 1] defined by

ν₁(u) = Σ_{v∈Σ^{|u|}} ν(u, v),  ν₂(v) = Σ_{u∈Σ^{|v|}} ν(u, v).

It is easy to verify that ν₁ and ν₂ are probability measures on Σ^∞. The probability measure ν here is often called a joint probability measure on Σ^∞ × Σ^∞, or a coupling of the probability measures ν₁ and ν₂.

If ν₁ and ν₂ are probability measures on Σ^∞, then the product probability measure ν₁ × ν₂ on Σ^∞ × Σ^∞ is defined by

(ν₁ × ν₂)(u, v) = ν₁(u)ν₂(v)

for all u, v ∈ Σ∗ with |u| = |v|. It is well known and easy to see that ν₁ × ν₂ is, indeed, a probability measure on Σ^∞ × Σ^∞ and that the marginals of ν₁ × ν₂ are ν₁ and ν₂. Intuitively, ν₁ × ν₂ is the coupling of ν₁ and ν₂ in which ν₁ and ν₂ are independent, or uncoupled.

We are most concerned here with coupled longitudinal product probability measures on Σ^∞ × Σ^∞. For each n ∈ N, let α^(n) be a probability measure on Σ × Σ, i.e., α^(n) : Σ × Σ → [0, 1] with

Σ_{a,b∈Σ} α^(n)(a, b) = 1,

and let ~α = (α^(0), α^(1), . . .) be the sequence of these probability measures. Then the longitudinal product µ[~α] is defined as above, but now treating Σ × Σ as the alphabet. It is easy to see that the marginals of µ[~α] are µ[~α]₁ = µ[~α₁] and µ[~α]₂ = µ[~α₂], where each α^(n)_i is the marginal on Σ given by

α^(n)₁(a) = Σ_{b∈Σ} α^(n)(a, b),  α^(n)₂(b) = Σ_{a∈Σ} α^(n)(a, b).
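The following sketch (ours, not from the paper) makes these definitions concrete for Σ = {0, 1}: it computes µ[~α](w) for a coupled measure and recovers the marginals α^(n)₁ and α^(n)₂.

```python
# A minimal sketch (ours) of a longitudinal product measure mu[~alpha] on
# (Sigma x Sigma)^infinity with Sigma = {0, 1}, and of taking marginals.

def mu(alphas, w):
    """mu[~alpha](w) = product over n of alpha^(n)(w[n]), for w a list of symbol pairs."""
    p = 1.0
    for n, pair in enumerate(w):
        p *= alphas[n][pair]
    return p

def marginal(alpha, which):
    """First (which=1) or second (which=2) marginal of a measure on Sigma x Sigma."""
    m = {}
    for (a, b), p in alpha.items():
        key = a if which == 1 else b
        m[key] = m.get(key, 0.0) + p
    return m

alpha = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
alphas = [alpha] * 4                                  # the i.i.d. case
print(mu(alphas, [(0, 0), (1, 1), (0, 1), (1, 0)]))   # 0.4 * 0.4 * 0.1 * 0.1 = 0.0016
print(marginal(alpha, 1), marginal(alpha, 2))         # both uniform on {0, 1}
```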
The following class of examples is useful [20] and instructive.

Example 3.1. Let Σ = {0, 1}. For each n ∈ N, fix a real number ρ_n ∈ [−1, 1], and define the probability measure α^(n) on Σ × Σ by

α^(n)(0, 0) = α^(n)(1, 1) = (1 + ρ_n)/4 and α^(n)(0, 1) = α^(n)(1, 0) = (1 − ρ_n)/4.

Then, writing α^~ρ for ~α, the longitudinal product µ[α^~ρ] is a probability measure on C × C. It is routine to check that the marginals of µ[α^~ρ] are

µ[α^~ρ]₁ = µ[α^~ρ]₂ = µ,

where µ(w) = 2^{−|w|} is the uniform probability measure on C.

It is convenient here to use Schnorr's martingale characterization [22, 21, 23, 13, 19, 6] of the algorithmic randomness notion introduced by Martin-Löf [18]. If ν is a probability measure on Σ^∞, then a ν-martingale is a function d : Σ∗ → [0, ∞) satisfying d(w)ν(w) = Σ_{a∈Σ} d(wa)ν(wa) for all w ∈ Σ∗. A ν-martingale d succeeds on a sequence S ∈ Σ^∞ if limsup_{w→S} d(w) = ∞. A ν-martingale d is constructive, or lower semicomputable, if there is a computable function d̂ : Σ∗ × N → Q ∩ [0, ∞) such that d̂(w, t) ≤ d̂(w, t+1) holds for all w ∈ Σ∗ and t ∈ N, and lim_{t→∞} d̂(w, t) = d(w) holds for all w ∈ Σ∗. A sequence R ∈ Σ^∞ is random with respect to a probability measure ν on Σ^∞ if no lower semicomputable ν-martingale succeeds on R.

If we once again treat Σ × Σ as an alphabet, then the above notions all extend naturally to Σ^∞ × Σ^∞. Hence, when we speak of a coupled pair (R₁, R₂) of random sequences, we are referring to a pair (R₁, R₂) ∈ Σ^∞ × Σ^∞ that is random with respect to some probability measure ν on Σ^∞ × Σ^∞ that is explicit or implicit in the discussion. An extensively studied special case here is that R₁, R₂ ∈ Σ^∞ are defined to be independently random with respect to probability measures ν₁, ν₂, respectively, on Σ^∞ if (R₁, R₂) is random with respect to the product probability measure ν₁ × ν₂ on Σ^∞ × Σ^∞.

When there is no possibility of confusion, we use such convenient abbreviations as "random with respect to ~α" for "random with respect to µ[~α]."

A trivial transformation of Martin-Löf tests establishes the following well known fact.

Observation 3.2. If ν is a computable probability measure on Σ^∞ × Σ^∞ and (R₁, R₂) ∈ Σ^∞ × Σ^∞ is random with respect to ν, then R₁ and R₂ are random with respect to the marginals ν₁ and ν₂.

Example 3.3. If ~ρ is a computable sequence of reals ρ_n ∈ [−1, 1], α^~ρ is as in Example 3.1, and (R₁, R₂) ∈ C × C is random with respect to α^~ρ, then Observation 3.2 tells us that R₁ and R₂ are random with respect to the uniform probability measure on C.
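The coupling of Example 3.1 is easy to simulate. The following sketch (ours; function names are inventions for illustration) samples a coupled pair symbol by symbol and checks empirically that each marginal behaves like the uniform measure while the two sequences remain correlated:

```python
# A sketch of Example 3.1: sample a coupled pair of binary sequences whose n-th
# symbol pair has P(0,0) = P(1,1) = (1 + rho_n)/4 and P(0,1) = P(1,0) = (1 - rho_n)/4.
import random

def sample_coupled(rhos, seed=0):
    rng = random.Random(seed)
    pairs = []
    for rho in rhos:
        a = rng.randint(0, 1)                      # uniform first marginal
        # Copy a with probability (1 + rho)/2, flip it otherwise; this realizes
        # the joint measure above, and the second marginal is also uniform.
        b = a if rng.random() < (1 + rho) / 2 else 1 - a
        pairs.append((a, b))
    return pairs

pairs = sample_coupled([0.5] * 100_000)
print(sum(a for a, _ in pairs) / len(pairs))       # ~0.5: uniform marginal
print(sum(a == b for a, b in pairs) / len(pairs))  # ~(1 + 0.5)/2 = 0.75: correlated
```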
We recall basic definitions from Shannon information theory.

Definition. Let α be a probability measure on Σ. The Shannon entropy of α is

H(α) = Σ_{a∈Σ} α(a) log(1/α(a)).
Definition. Let α be a probability measure on Σ × Σ with marginals α₁ and α₂. The Shannon mutual information between α₁ and α₂ is

I(α₁ : α₂) = Σ_{(a,b)∈Σ×Σ} α(a, b) log [α(a, b) / (α₁(a)α₂(b))].

Theorem 3.4 ([15]). If ~α is a computable sequence of probability measures α^(n) on Σ that converge to a probability measure α on Σ, then for every R ∈ Σ^∞ that is random with respect to ~α,

dim(R) = H(α) / log |Σ|.
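A quick numerical check (ours) of these definitions, specialized to the coupled measure of Example 3.1, also confirms the closed form I(α₁ : α₂) = 1 − H((1 + ρ)/2) used in Example 3.9 below:

```python
# A numerical check (ours) of H and I for the measure of Example 3.1.
from math import log2

def H(p):            # Shannon entropy of a distribution {outcome: probability}
    return sum(q * log2(1 / q) for q in p.values() if q > 0)

def I(joint):        # Shannon mutual information between the marginals of joint
    a1, a2 = {}, {}
    for (a, b), q in joint.items():
        a1[a] = a1.get(a, 0.0) + q
        a2[b] = a2.get(b, 0.0) + q
    return sum(q * log2(q / (a1[a] * a2[b])) for (a, b), q in joint.items() if q > 0)

rho = 0.6
joint = {(0, 0): (1 + rho) / 4, (1, 1): (1 + rho) / 4,
         (0, 1): (1 - rho) / 4, (1, 0): (1 - rho) / 4}
h = (1 + rho) / 2
print(I(joint), 1 - H({0: h, 1: 1 - h}))   # the two values agree
```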
The following is a corollary to Theorem 3.4.

Corollary 3.5. If ~α is a computable sequence of probability measures α^(n) on Σ that converge to a probability measure α on Σ, then for every R ∈ Σ^∞ that is random with respect to ~α and every w ⊑ R,

K(w) = |w| H(α) + o(|w|).

Lemma 3.6. If ~α is a computable sequence of probability measures α^(n) on Σ × Σ that converge to a probability measure α on Σ × Σ, then for every coupled pair (R₁, R₂) ∈ Σ^∞ × Σ^∞ that is random with respect to ~α and every (u, w) ⊑ (R₁, R₂),

I(u : w) = |u| I(α₁ : α₂) + o(|u|).
Proof. By Lemma 2.12,

I(u : w) = K(u) + K(w) − K(u, w) + o(|u|).

We then apply Observation 3.2 and Corollary 3.5 to obtain

I(u : w) = |u| (H(α₁) + H(α₂) − H(α)) + o(|u|) = |u| I(α₁ : α₂) + o(|u|).

The following is a corollary to Lemma 3.6.
Corollary 3.7. If α is a computable, positive probability measure on Σ × Σ, then, for every sequence (R₁, R₂) ∈ Σ^∞ × Σ^∞ that is random with respect to α and every (u, w) ⊑ (R₁, R₂),

I(u : w) = |u| I(α₁ : α₂) + o(|u|).

In applications one often encounters longitudinal product measures µ[~α] in which the probability measures α^(n) are all the same (the i.i.d. case) or else converge to some limiting probability measure. The following theorem says that, in such cases, the mutual dimensions of coupled pairs of random sequences are easy to compute.

Theorem 3.8. If ~α is a computable sequence of probability measures α^(n) on Σ × Σ that converge to a probability measure α on Σ × Σ, then for every coupled pair (R₁, R₂) ∈ Σ^∞ × Σ^∞ that is random with respect to ~α,

mdim(R₁ : R₂) = Mdim(R₁ : R₂) = I(α₁ : α₂) / log |Σ|.
Proof. By Lemma 3.6, we have

mdim(R₁ : R₂) = liminf_{(u,w)→(R₁,R₂)} I(u : w) / (|u| log |Σ|)
 = liminf_{(u,w)→(R₁,R₂)} (|u| I(α₁ : α₂) + o(|u|)) / (|u| log |Σ|)
 = I(α₁ : α₂) / log |Σ|.

A similar proof shows that Mdim(R₁ : R₂) = I(α₁ : α₂) / log |Σ|.
Example 3.9. Let Σ = {0, 1}, and let ~ρ be a computable sequence of reals ρ_n ∈ [−1, 1] that converge to a limit ρ. Define the probability measure α on Σ × Σ by

α(0, 0) = α(1, 1) = (1 + ρ)/4 and α(0, 1) = α(1, 0) = (1 − ρ)/4,

and let α₁ and α₂ be the marginals of α. If α^~ρ is as in Example 3.1, then for every pair (R₁, R₂) ∈ Σ^∞ × Σ^∞ that is random with respect to α^~ρ, Theorem 3.8 tells us that

mdim(R₁ : R₂) = Mdim(R₁ : R₂) = I(α₁ : α₂) = 1 − H((1 + ρ)/2),

writing H(p) for the Shannon entropy of a probability measure on {0, 1} that assigns probability p to 0, and noting that log |Σ| = 1. In particular, if the limit ρ is 0, then

mdim(R₁ : R₂) = Mdim(R₁ : R₂) = 0.

Theorem 3.8 has the following easy consequence, which generalizes the last sentence of Example 3.9.
Corollary 3.10. If ~α is a computable sequence of probability measures α^(n) on Σ × Σ that converge to a product probability measure α₁ × α₂ on Σ × Σ, then for every coupled pair (R₁, R₂) ∈ Σ^∞ × Σ^∞ that is random with respect to ~α,

mdim(R₁ : R₂) = Mdim(R₁ : R₂) = 0.

Applying Corollary 3.10 to a constant sequence ~α in which each α^(n) is a product probability measure α₁ × α₂ on Σ × Σ gives the following.
Corollary 3.11. If α₁ and α₂ are computable probability measures on Σ, and if R₁, R₂ ∈ Σ^∞ are independently random with respect to α₁, α₂, respectively, then

mdim(R₁ : R₂) = Mdim(R₁ : R₂) = 0.

We conclude this section by showing that the converse of Corollary 3.11 does not hold. This can be done via a direct construction, but it is more instructive to use a beautiful theorem of Kakutani, van Lambalgen, and Vovk.
The Hellinger distance between two probability measures α₁ and α₂ on Σ is

H(α₁, α₂) = sqrt( Σ_{a∈Σ} (√α₁(a) − √α₂(a))² ).

(See [12], for example.) A sequence ~α = (α^(0), α^(1), . . .) of probability measures on Σ is strongly positive if there is a real number δ > 0 such that, for all n ∈ N and a ∈ Σ, α^(n)(a) ≥ δ. Kakutani [10] proved the classical, measure-theoretic version of the following theorem, and van Lambalgen [25, 26] and Vovk [27] extended it to algorithmic randomness.

Theorem 3.12. Let ~α and ~β be computable, strongly positive sequences of probability measures on Σ.

1. If Σ_{n=0}^{∞} H²(α^(n), β^(n)) < ∞, then a sequence R ∈ Σ^∞ is random with respect to ~α if and only if it is random with respect to ~β.

2. If Σ_{n=0}^{∞} H²(α^(n), β^(n)) = ∞, then no sequence is random with respect to both ~α and ~β.
Observation 3.13. Let Σ = {0, 1}. If ρ ∈ [−1, 1] and the probability measure α on Σ × Σ is defined from ρ as in Example 3.9, then

H²(α₁ × α₂, α) = 2 − √(1 + ρ) − √(1 − ρ).

Proof. Assume the hypothesis. Then

H²(α₁ × α₂, α) = Σ_{a,b∈{0,1}} (√(α₁(a)α₂(b)) − √(α(a, b)))²
 = Σ_{a,b∈{0,1}} (1/2 − √(α(a, b)))²
 = 2(1/2 − √((1 + ρ)/4))² + 2(1/2 − √((1 − ρ)/4))²
 = 2 − √(1 + ρ) − √(1 − ρ).
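A numerical sanity check (ours) of Observation 3.13 for one value of ρ:

```python
# A sketch verifying Observation 3.13 numerically.
from math import sqrt

def hellinger_sq(p, q):
    """Squared Hellinger distance between two distributions on a common support."""
    support = set(p) | set(q)
    return sum((sqrt(p.get(s, 0.0)) - sqrt(q.get(s, 0.0))) ** 2 for s in support)

rho = 0.3
alpha = {(0, 0): (1 + rho) / 4, (1, 1): (1 + rho) / 4,
         (0, 1): (1 - rho) / 4, (1, 0): (1 - rho) / 4}
product = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}   # alpha_1 x alpha_2
print(hellinger_sq(product, alpha))            # both print 2 - sqrt(1.3) - sqrt(0.7)
print(2 - sqrt(1 + rho) - sqrt(1 - rho))
```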
Corollary 3.14. Let Σ = {0, 1} and δ ∈ (0, 1). Let ~ρ be a computable sequence of real numbers ρ_n ∈ [δ − 1, 1 − δ], and let α^~ρ be as in Example 3.1. If

Σ_{n=0}^{∞} ρ_n² = ∞,

and if (R₁, R₂) ∈ Σ^∞ × Σ^∞ is random with respect to α^~ρ, then R₁ and R₂ are not independently random with respect to the uniform probability measure on C.

Proof. This follows immediately from Theorem 3.12, Observation 3.13, and the fact that

√(1 + x) + √(1 − x) = 2 − x²/4 + o(x²)

as x → 0.
Corollary 3.15. There exist sequences R₁, R₂ ∈ C that are random with respect to the uniform probability measure on C and satisfy Mdim(R₁ : R₂) = 0, but are not independently random.

Proof. For each n ∈ N, let

ρ_n = 1/√(n + 2).

Let ~ρ = (ρ₀, ρ₁, . . .), let α^~ρ be as in Example 3.1, and let (R₁, R₂) ∈ Σ^∞ × Σ^∞ be random with respect to α^~ρ. Observation 3.2 tells us that R₁ and R₂ are random with respect to the marginals of α^~ρ, both of which are the uniform probability measure on C. Since ρ_n → 0 as n → ∞, the last sentence in Example 3.9 tells us (via Theorem 3.8) that Mdim(R₁ : R₂) = 0. Since

Σ_{n=0}^{∞} ρ_n² = Σ_{n=0}^{∞} 1/(n + 2) = ∞,

Corollary 3.14 tells us that R₁ and R₂ are not independently random.

We begin this section by reviewing the Billingsley generalization of constructive dimension, i.e., dimension with respect to strongly positive probability measures. A probability measure β on Σ^∞ is strongly positive if there exists δ > 0 such that, for all w ∈ Σ∗ and a ∈ Σ, β(wa) > δβ(w).
Definition. The Shannon self-information of w ∈ Σ∗ is

ℓ^β(w) = Σ_{i=0}^{|w|−1} log(1/β(w[i])).

In [17], Lutz and Mayordomo defined (and usefully applied) constructive Billingsley dimension in terms of gales and proved that it can be characterized using Kolmogorov complexity. Since Kolmogorov complexity is more relevant in this discussion, we treat the following theorem as a definition.
Definition (Lutz and Mayordomo [17]). The dimension of S ∈ Σ^∞ with respect to a strongly positive probability measure β on Σ^∞ is

dim^β(S) = liminf_{w→S} K(w) / ℓ^β(w).

In the above definition the denominator ℓ^β(w) normalizes the dimension to be a real number in [0, 1]. One might be tempted to define the Billingsley generalization of mutual dimension by normalizing the mutual information between u and w by log [β(u, w) / (β₁(u)β₂(w))] (i.e., the self-mutual information or pointwise mutual information between u and w [9]) as (u, w) → (S, T). However, this results in bad behavior. For example, the mutual dimension between any two sequences with respect to the uniform probability measure on Σ × Σ is always undefined. Other thoughtful modifications to this natural definition result in sequences having negative or infinitely large mutual dimension. The main problem here is that, given a particular probability measure, one can construct certain sequences whose prefixes have extremely large positive or negative self-mutual information. In order to avoid undesirable behavior, we restrict the definition of Billingsley mutual dimension to sequences that are mutually normalizable.
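A small sketch (ours) of the normalizer ℓ^β(w), compared against the linear growth rate (H(α) + D(α‖β))|w| predicted by the Frequency Divergence Lemma (Lemma 4.1 below) for strings with letter frequencies α:

```python
# A sketch (ours) of ell^beta(w) for a measure beta on Sigma = {0, 1}.
from math import log2

def ell(beta, w):
    return sum(log2(1 / beta[symbol]) for symbol in w)

beta = {0: 0.25, 1: 0.75}
w = [0] * 600 + [1] * 400                       # letter frequencies alpha = (0.6, 0.4)
alpha = {0: 0.6, 1: 0.4}
H = sum(p * log2(1 / p) for p in alpha.values())
D = sum(p * log2(p / beta[a]) for a, p in alpha.items())
print(ell(beta, w), (H + D) * len(w))           # identical here: w realizes alpha exactly
```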
Definition. Let β be a probability measure on Σ^∞ × Σ^∞ with marginals β₁ and β₂. Two sequences S, T ∈ Σ^∞ are mutually β-normalizable (in this order) if

lim_{(u,w)→(S,T)} ℓ^{β₁}(u) / ℓ^{β₂}(w) = 1.

Definition. Let S, T ∈ Σ^∞ be mutually β-normalizable. The lower and upper mutual dimensions between S and T with respect to β are

mdim^β(S : T) = liminf_{(u,w)→(S,T)} I(u : w) / ℓ^{β₁}(u) = liminf_{(u,w)→(S,T)} I(u : w) / ℓ^{β₂}(w)

and

Mdim^β(S : T) = limsup_{(u,w)→(S,T)} I(u : w) / ℓ^{β₁}(u) = limsup_{(u,w)→(S,T)} I(u : w) / ℓ^{β₂}(w),

respectively.

The above definition has nice properties because mutually β-normalizable sequences have prefixes with asymptotically equivalent self-information. Given the basic properties of mutual information and Shannon self-information, we can see that

0 ≤ mdim^β(S : T) ≤ min{dim^{β₁}(S), dim^{β₂}(T)} ≤ 1.

Clearly, Mdim^β has a similar property.
Definition. Let α and β be probability measures on Σ. The Kullback-Leibler divergence between α and β is

D(α || β) = Σ_{a∈Σ} α(a) log(α(a)/β(a)).

The following lemma is useful when proving Lemma 4.3 and Theorem 4.2.

Lemma 4.1 (Frequency Divergence Lemma [16]). If α and β are positive probability measures on Σ, then, for all S ∈ FREQ^α,

ℓ^β(w) = (H(α) + D(α || β))|w| + o(|w|)

as w → S.

The rest of this paper is primarily concerned with probability measures on alphabets. Our first result of this section is a mutual divergence formula for random, mutually β-normalizable sequences. This can be thought of as a "mutual" version of a divergence formula in [16].

Theorem 4.2 (Mutual Divergence Formula). If α and β are computable, positive probability measures on Σ × Σ, then, for every (R₁, R₂) ∈ Σ^∞ × Σ^∞ that is random with respect to α such that R₁ and R₂ are mutually β-normalizable,

mdim^β(R₁ : R₂) = Mdim^β(R₁ : R₂) = I(α₁ : α₂) / (H(α₁) + D(α₁ || β₁)) = I(α₁ : α₂) / (H(α₂) + D(α₂ || β₂)).

Proof. By Corollary 3.7 and the Frequency Divergence Lemma, we have

mdim^β(R₁ : R₂) = liminf_{(u,w)→(R₁,R₂)} I(u : w) / ℓ^{β₁}(u)
 = liminf_{(u,w)→(R₁,R₂)} (|u| I(α₁ : α₂) + o(|u| log |Σ|)) / ((H(α₁) + D(α₁ || β₁))|u| + o(|u|))
 = liminf_{(u,w)→(R₁,R₂)} (|u| (I(α₁ : α₂) + o(log |Σ|))) / (|u| ((H(α₁) + D(α₁ || β₁)) + o(1)))
 = I(α₁ : α₂) / (H(α₁) + D(α₁ || β₁)).

Similar arguments show that

mdim^β(R₁ : R₂) = I(α₁ : α₂) / (H(α₂) + D(α₂ || β₂))

and

Mdim^β(R₁ : R₂) = I(α₁ : α₂) / (H(α₁) + D(α₁ || β₁)) = I(α₁ : α₂) / (H(α₂) + D(α₂ || β₂)).
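A sketch (ours) evaluating the first expression of the Mutual Divergence Formula for concrete measures; note that the theorem itself applies only when R₁ and R₂ are mutually β-normalizable, which this calculation does not check:

```python
# A sketch (ours) of I(alpha_1 : alpha_2) / (H(alpha_1) + D(alpha_1 || beta_1)).
from math import log2

def H(p):
    return sum(q * log2(1 / q) for q in p.values() if q > 0)

def D(p, b):
    return sum(q * log2(q / b[a]) for a, q in p.items() if q > 0)

def mdim_beta(joint, beta1):
    a1, a2 = {}, {}
    for (a, b), q in joint.items():
        a1[a] = a1.get(a, 0.0) + q
        a2[b] = a2.get(b, 0.0) + q
    mi = sum(q * log2(q / (a1[a] * a2[b])) for (a, b), q in joint.items() if q > 0)
    return mi / (H(a1) + D(a1, beta1))

rho = 0.5
joint = {(0, 0): (1 + rho) / 4, (1, 1): (1 + rho) / 4,
         (0, 1): (1 - rho) / 4, (1, 0): (1 - rho) / 4}
print(mdim_beta(joint, {0: 0.25, 1: 0.75}))
```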
We conclude this section by making some initial observations regarding when mutual normalizability can be achieved.

Definition. Let α₁, α₂, β₁, β₂ be probability measures on Σ. We say that α₁ is (β₁, β₂)-equivalent to α₂ if

Σ_{a∈Σ} α₁(a) log(1/β₁(a)) = Σ_{a∈Σ} α₂(a) log(1/β₂(a)).

For a probability measure α on Σ, let FREQ^α be the set of sequences S ∈ Σ^∞ satisfying lim_{n→∞} n^{−1}|{i < n : S[i] = a}| = α(a) for all a ∈ Σ.

Lemma 4.3. Let α₁, α₂, β₁, β₂ be probability measures on Σ. If α₁ is (β₁, β₂)-equivalent to α₂, then, for all pairs (S, T) ∈ FREQ^{α₁} × FREQ^{α₂}, S and T are mutually β-normalizable.

Proof. By the Frequency Divergence Lemma,

lim_{(u,w)→(S,T)} ℓ^{β₁}(u) / ℓ^{β₂}(w) = lim_{n→∞} ((H(α₁) + D(α₁ || β₁)) · n + o(n)) / ((H(α₂) + D(α₂ || β₂)) · n + o(n))
 = (H(α₁) + D(α₁ || β₁)) / (H(α₂) + D(α₂ || β₂))
 = Σ_{a∈Σ} α₁(a) log(1/β₁(a)) / Σ_{a∈Σ} α₂(a) log(1/β₂(a))
 = 1,

where the last equality is due to α₁ being (β₁, β₂)-equivalent to α₂.

Given probability measures β₁ and β₂ on Σ, we would like to know which sequences are mutually β-normalizable. The following results help to answer this question for probability measures on and sequences over {0, 1}.
Lemma 4.4. Let β₁ and β₂ be probability measures on {0, 1} such that exactly one of the following conditions holds.

1. 0 < β₂(0) < β₁(1) < β₁(0) < β₂(1) < 1
2. 0 < β₂(1) < β₁(0) < β₁(1) < β₂(0) < 1
3. 0 < β₂(0) < β₁(0) < β₁(1) < β₂(1) < 1
4. 0 < β₂(1) < β₁(1) < β₁(0) < β₂(0) < 1
5. β₁ = µ and β₂ ≠ µ.

If f is defined by

f(x) = (x · log(β₁(1)/β₁(0)) + log(β₂(1)/β₁(1))) / log(β₂(1)/β₂(0)),

then 0 < f(x) < 1 for all x ∈ [0, 1].

Proof. First, observe that f is linear and has a negative slope under conditions 1 and 2, a positive slope under conditions 3 and 4, and zero slope under condition 5. We verify that, for all x ∈ [0, 1], f(x) ∈ (0, 1) under each condition.

Under condition 1, we assume β₂(0) < β₁(1) < β₂(1), which implies that

log(β₂(0)/β₂(1)) < log(β₁(1)/β₂(1)) < 0.

From the above inequality, we obtain

0 < log(β₂(1)/β₁(1)) / log(β₂(1)/β₂(0)) < 1.

Therefore, by the definition of f,

0 < f(0) < 1.  (4.1)

Under the same condition, we have β₁(0) < β₂(1), which implies that

0 < log(β₂(1)/β₁(0)) = log(β₁(1)/β₁(0)) + log(β₂(1)/β₁(1)),

whence

0 < (log(β₁(1)/β₁(0)) + log(β₂(1)/β₁(1))) / log(β₂(1)/β₂(0)).

Therefore, by the definition of f,

0 < f(1).  (4.2)

By (4.1), (4.2), and the negativity of the slope of f,

0 < f(1) < f(0) < 1.

A similar argument shows that, if condition 2 holds, then 0 < f(1) < f(0) < 1.

Under condition 3, if β₂(0) < β₁(0) < β₁(1) < β₂(1), then

0 < f(0) < 1,  (4.3)

using the argument given above. Under the same condition, we have β₂(0) < β₁(0), which implies that

log β₁(1) − log β₁(0) + log β₂(1) − log β₁(1) < log β₂(1) − log β₂(0).

From this inequality, we derive

(log(β₁(1)/β₁(0)) + log(β₂(1)/β₁(1))) / log(β₂(1)/β₂(0)) < 1.

Therefore, by the definition of f,

f(1) < 1.  (4.4)

By (4.3), (4.4), and the positivity of the slope of f,

0 < f(0) < f(1) < 1.

A similar argument shows that, if condition 4 holds, then 0 < f(0) < f(1) < 1.

Finally, under condition 5, we have β₁ = µ and (without loss of generality) β₂(0) < 1/2 < β₂(1), which implies

0 < log(2β₂(1)) < log(β₂(1)/β₂(0)).

It follows that

0 < log(β₂(1)/(1/2)) / log(β₂(1)/β₂(0)) < 1,

whence, by the definition of f, 0 < f(x) < 1 for all x ∈ [0, 1].
Theorem 4.5. Let β₁ and β₂ be probability measures on {0, 1} that satisfy exactly one of the conditions from Lemma 4.4, and let α₁ be an arbitrary probability measure on {0, 1}. Then α₁ is (β₁, β₂)-equivalent to exactly one probability measure α₂, which is defined by

α₂(0) = (α₁(0) · log(β₁(1)/β₁(0)) + log(β₂(1)/β₁(1))) / log(β₂(1)/β₂(0)) and α₂(1) = 1 − α₂(0).
Proof. By Lemma 4.4, α₂ is a valid probability measure. Observe that

α₂(0) = (α₁(0) · log(β₁(1)/β₁(0)) + log(β₂(1)/β₁(1))) / log(β₂(1)/β₂(0))

if and only if

α₂(0)(log(1/β₂(0)) − log(1/β₂(1))) + log(1/β₂(1)) = α₁(0)(log(1/β₁(0)) − log(1/β₁(1))) + log(1/β₁(1)).

The above equality holds if and only if

α₂(0) log(1/β₂(0)) + α₂(1) log(1/β₂(1)) = α₁(0) log(1/β₁(0)) + α₁(1) log(1/β₁(1)),

which implies that α₁ is (β₁, β₂)-equivalent to α₂.
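A numerical check (ours) of Theorem 4.5: compute α₂ from α₁, β₁, β₂ and verify the (β₁, β₂)-equivalence identity; the chosen β₁, β₂ satisfy condition 1 of Lemma 4.4.

```python
# A numerical check (ours) of Theorem 4.5 over the binary alphabet.
from math import log2

def equivalent_measure(alpha1_0, beta1, beta2):
    """alpha_2(0) as given by Theorem 4.5; alpha_2(1) = 1 - alpha_2(0)."""
    num = alpha1_0 * log2(beta1[1] / beta1[0]) + log2(beta2[1] / beta1[1])
    return num / log2(beta2[1] / beta2[0])

beta1 = {0: 0.6, 1: 0.4}     # these satisfy condition 1 of Lemma 4.4:
beta2 = {0: 0.3, 1: 0.7}     # 0 < beta2(0) < beta1(1) < beta1(0) < beta2(1) < 1
alpha1 = {0: 0.5, 1: 0.5}
a20 = equivalent_measure(alpha1[0], beta1, beta2)
alpha2 = {0: a20, 1: 1 - a20}
lhs = sum(alpha1[a] * log2(1 / beta1[a]) for a in (0, 1))
rhs = sum(alpha2[a] * log2(1 / beta2[a]) for a in (0, 1))
print(alpha2, lhs, rhs)      # lhs == rhs up to floating-point error
```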
The following corollary follows from Theorem 4.5 and Lemma 4.3.

Corollary 4.6. Let β₁, β₂, α₁, and α₂ be as defined in Theorem 4.5. For all (S, T) ∈ FREQ^{α₁} × FREQ^{α₂}, S and T are mutually β-normalizable.
Acknowledgments. We thank an anonymous reviewer of [4] for posing the question answered by Corollary 3.15. We also thank anonymous reviewers of this paper for useful comments, especially including Observation 3.2.
References

[1] P. Billingsley. Hausdorff dimension in probability theory. Illinois Journal of Mathematics, 4:187–209, 1960.

[2] P. Billingsley. Ergodic Theory and Information. R. E. Krieger Pub. Co., 1978.

[3] H. Cajar. Billingsley Dimension in Probability Spaces, volume 892 of Lecture Notes in Mathematics. Springer, 1981.

[4] Adam Case and Jack H. Lutz. Mutual dimension. ACM Transactions on Computation Theory, 7, July 2015, article no. 12.

[5] Thomas R. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., second edition, 2006.

[6] Rodney G. Downey and Denis R. Hirschfeldt. Algorithmic Randomness and Complexity. Springer, 2010 edition, 2010.

[7] H. G. Eggleston. The fractional dimension of a set defined by decimal properties. The Quarterly Journal of Mathematics, 20:31–36, 1949.

[8] Xiaoyang Gu, Jack H. Lutz, R. Elvira Mayordomo, and Philippe Moser. Dimension spectra of random subfractals of self-similar fractals. Annals of Pure and Applied Logic, 165:1707–1726, 2014.

[9] Te Sun Han and Kingo Kobayashi. Mathematics of Information and Coding. Translations of Mathematical Monographs (Book 203). American Mathematical Society, 2007.

[10] Shizuo Kakutani. On equivalence of infinite product measures. Annals of Mathematics, 49(1):214–224, 1948.

[11] A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1):1–7, 1965.

[12] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, first edition, 2009.

[13] Ming Li and Paul Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, third edition, 2008.

[14] Jack H. Lutz. Dimension in complexity classes. SIAM Journal on Computing, 32(5):1235–1259, 2003.

[15] Jack H. Lutz. The dimensions of individual strings and sequences. Information and Computation, 187(1):49–79, 2003.

[16] Jack H. Lutz. A divergence formula for randomness and dimension. Theoretical Computer Science, 412:166–177, 2011.

[17] Jack H. Lutz and Elvira Mayordomo. Dimensions of points in self-similar fractals. SIAM Journal on Computing, 38(3):1080–1112, 2008.

[18] Per Martin-Löf. The definition of random sequences. Information and Control, 9:602–619, 1966.

[19] André Nies. Computability and Randomness. Oxford University Press, reprint edition, 2012.

[20] Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, first edition, 2014.

[21] Claus-Peter Schnorr. A unified approach to the definition of random sequences. Mathematical Systems Theory, 5(3):246–258, 1971.

[22] Claus-Peter Schnorr. Zufälligkeit und Wahrscheinlichkeit: Eine algorithmische Begründung der Wahrscheinlichkeitstheorie. Springer-Verlag, 1971.

[23] Claus-Peter Schnorr. A survey of the theory of random sequences. In Proceedings of the Fifth International Congress of Logic, Methodology and Philosophy of Science, pages 193–211. Springer, 1977.

[24] Alexander Shen and Nikolai K. Vereshchagin. Logical operations and Kolmogorov complexity. Theoretical Computer Science, 271(1-2):125–129, 2002.

[25] M. van Lambalgen. Random Sequences. PhD thesis, University of Amsterdam, 1987.

[26] M. van Lambalgen. Von Mises' definition of random sequences reconsidered. Journal of Symbolic Logic, 52(3):725–755, 1987.

[27] V. G. Vovk. On a criterion for randomness.