Approximations of Kolmogorov Complexity
Samuel Epstein
[email protected]
January 30, 2020
Abstract
In this paper we show that approximating the Kolmogorov complexity of a set of numbers is equivalent to having common information with the halting sequence. The more precise the approximations are, and the greater the number of approximations, the more information is shared with the halting sequence. An encoding of 2^N unique numbers and their Kolmogorov complexities contains ≳ N mutual information with the halting sequence. We also provide a generalization of the "Sets Have Simple Members" theorem to conditional complexity.

The Kolmogorov complexity of a string x, K(x), is the size of the smallest program that outputs x with respect to a universal prefix-free program. It is a well known fact that Kolmogorov complexity K is uncomputable (see [Kol65] and [Sol64]). In fact, any computable function f : N → N that is not greater than K is bounded by a constant. This is because for each n ∈ N, one can find an x_n such that f(x_n) > n. Thus x_n can be identified from f and n, so K(x_n) < O(log n). However, since f ≤ K, we have n < K(x_n), causing a contradiction for large enough n.

The authors in [BFNV05], using expander graphs, introduced an algorithm that, when given a non-random string, outputs a small list of strings of the same length containing a string of higher complexity. In [Zim16], an algorithm was presented that, when given a non-random string, outputs a large list of strings of the same length where 99% of the outputted strings have higher complexity. Given a universal machine U, a c-short program for x is a string p such that U(p) = x and the length of p is bounded by c + K(x). The authors in [BMVZ13] showed that there exists a computable function that maps every x to a list of size O(‖x‖²) containing an O(1)-short program for x.

In this paper, we show that approximate knowledge of the Kolmogorov complexity of a finite number of strings is equivalent to sharing a certain amount of information with the halting sequence.
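The diagonalization in the uncomputability argument above can be viewed as a search procedure. The following is a minimal sketch, assuming a hypothetical computable function f claimed to lower-bound K; `toy_f` below is only an illustrative stand-in, since no nontrivial computable lower bound on K exists.

```python
from itertools import count, product

def strings():
    # Enumerate binary strings in length-lexicographic order: "", "0", "1", "00", ...
    yield ""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def first_exceeding(f, n):
    # Return the first string x (in enumeration order) with f(x) > n.
    # If f were a computable lower bound on K, this x would be describable
    # from n alone in O(log n) bits, contradicting f(x) > n for large n.
    for x in strings():
        if f(x) > n:
            return x

# Toy stand-in for f: string length (NOT an actual lower bound on K).
toy_f = len
```

The contradiction arises because `first_exceeding` itself is a short description of its output, while f forces that output to have high complexity.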
The more strings in the collection and the better their approximation to K, the more information this collection has with the halting sequence. The mutual information between an encoding of 2^N unique numbers alongside their Kolmogorov complexities and the halting sequence is ≳ N. Due to information non-growth laws, there is no (randomized) algorithmic means to produce information with the halting sequence.

We also provide a generalization of the "Sets Have Simple Members" theorem, first seen in [EL11], to conditional Kolmogorov complexity and conditional algorithmic probability. The theorem states that the minimum conditional complexity over pairs of a binary relation is less than the negative log of the combined conditional algorithmic probability of all pairs in the enumeration.

We use N, Q, R, Σ = {0, 1}, Σ*, and Σ^∞ to denote natural numbers, rational numbers, real numbers, bits, finite strings, and infinite sequences. We use X_{>0} and X_{≥0} to denote the positive and non-negative elements of a set X. The ith bit of a string x is x[i]. For a string x ∈ Σ*, (x0)⁻ = (x1)⁻ = x; that is, x⁻ is x with its last bit removed. The length of a string x is ‖x‖. The size of a set D ⊆ Σ* is |D|. For x ∈ Σ* and y ∈ Σ* ∪ Σ^∞, we say x ⊑ y iff x = y or there is some string z ∈ Σ* ∪ Σ^∞ where xz = y. We say x ⊏ y iff x ≠ y and x ⊑ y. The self-delimiting code of a string x ∈ Σ* is ⟨x⟩ = 1^{‖x‖}0x. The encoding of a (possibly ordered) set {x_1, …, x_m} ⊂ Σ* is ⟨m⟩⟨x_1⟩…⟨x_m⟩.

A (discrete) measure Q is a function Q : Σ* → R_{≥0}. Measure Q is a semi-measure iff Σ_x Q(x) ≤ 1. Measure Q is a probability measure iff Σ_x Q(x) = 1. The support of a measure Q is supp(Q) = {x : Q(x) > 0}. A probability measure Q is elementary if |supp(Q)| < ∞ and Range(Q) ⊂ Q_{≥0}. Elementary probability measures Q with {x_1, …, x_m} = supp(Q) are encoded by finite strings of the form ⟨Q⟩ = ⟨{x_1, Q(x_1), …, x_m, Q(x_m)}⟩. For a semi-measure Q, we say a function d : Σ* → R_{≥0} is a Q-test iff Σ_x Q(x)2^{d(x)} ≤ 1. For a real function f, we use ≤⁺ f, ≥⁺ f, and =⁺ f to denote < f + O(1), > f − O(1), and = f ± O(1). We also use ≤^log f and ≥^log f to denote < f + O(log(f + 1)) and > f − O(log(f + 1)).

We use algorithms T_α(x) on input programs x ∈ Σ* and auxiliary inputs α ∈ Σ* ∪ Σ^∞. T is a prefix-free algorithm if for all α ∈ Σ* ∪ Σ^∞ and x, s ∈ Σ* with s ≠ ∅, either T_α(x) does not halt or T_α(xs) does not halt. There exists a universal prefix algorithm U where for every prefix algorithm T there exists a t ∈ Σ* such that for all x ∈ Σ* and α ∈ Σ* ∪ Σ^∞, U_α(tx) = T_α(x). As is standard, we define Kolmogorov complexity with respect to U: for x, y ∈ Σ*, K(x|y) = min{‖p‖ : U_y(p) = x}. The universal probability m is defined as m(x|y) = Σ{2^{−‖p‖} : U_y(p) = x}. By the coding theorem, K(x|y) =⁺ −log m(x|y). By the chain rule, K(x, y) =⁺ K(x) + K(y|x, K(x)).

Let F : D → N be a function with a finite domain D ⊂ Σ*, |D| < ∞. Then ⟨F⟩, where D = {x_1, …, x_m}, is ⟨{x_1, F(x_1), …, x_m, F(x_m)}⟩, and K(F) = K(⟨F⟩). The complexity of a general partial computable function is defined as the length of the shortest U-program that computes it. The halting sequence H ∈ Σ^∞ is the unique infinite sequence where H[i] is equal to 1 iff U(i) halts. The information that H has about x ∈ Σ* is I(x : H) = K(x) − K(x|H).

This paper uses notions of total strings and left-total machines.
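Before turning to total strings, the coding conventions above can be made concrete. Below is a minimal sketch of the self-delimiting code ⟨x⟩ = 1^{‖x‖}0x and the list encoding ⟨m⟩⟨x_1⟩…⟨x_m⟩; the function names are illustrative, not from the paper.

```python
def encode(x):
    # Self-delimiting code <x> = 1^||x|| 0 x: a unary length header, then x.
    return "1" * len(x) + "0" + x

def decode(stream):
    # Read one codeword off the front of a bit stream; the unary header
    # makes the code prefix-free, so concatenations parse unambiguously.
    n = stream.index("0")              # number of leading 1s = ||x||
    return stream[n + 1 : 2 * n + 1], stream[2 * n + 1 :]

def encode_list(xs):
    # <m><x_1>...<x_m>, encoding the count m via its binary expansion.
    return encode(bin(len(xs))[2:]) + "".join(encode(x) for x in xs)
```

For example, encode("101") gives "1110101", and a concatenation of codewords decodes back in order, one codeword at a time.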
A string x is total if all sufficiently long extensions of x cause the universal Turing machine U to halt. More formally, x is total if and only if there exists a finite prefix-free set of strings G ⊂ Σ* such that Σ{2^{−‖y‖} : y ∈ G} = 1 and for all y ∈ G, U(xy) halts. Along with totality, we introduce the notion of leftness. We say x ∈ Σ* is to the left of y ∈ Σ*, x ⊳ y, iff there exists a string z ∈ Σ* such that z0 ⊑ x and z1 ⊑ y. We say the universal Turing machine U is left-total if for all strings x, y ∈ Σ* with x ⊳ y, if U(y) halts then x is total. An example of the domain of a left-total machine can be seen in Figure 1. This example also illustrates the reason for using "left" in the definition.

Without loss of generality, we can assume that the universal Turing machine U is left-total. We refer the reader to [Eps19b] for the explicit construction of a left-total universal Turing machine. The border sequence B ∈ Σ^∞ is the unique sequence where if x ⊏ B then x has both total and non-total extensions. The sequence is called "border" because if x ⊳ B then x is total, and if B ⊳ x then U will never halt when given x as the starting input.

For a total string b, we define the function bbtime(b) = max{t : U(p) runs in time t, p ⊳ b or p ⊒ b} as the longest running time of a program that is to the left of b or extends b. If b and b⁻ are total, then bbtime(b) ≤ bbtime(b⁻). For total string b ∈ Σ* and x, y ∈ Σ*, let m^b(x|y) be the algorithmic weight of x from programs conditioned on y running in time bbtime(b). More formally,

m^b(x|y) = Σ{2^{−‖p‖} : U_y(p) = x in time bbtime(b)}.
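The leftness relation used above admits a direct check: x ⊳ y exactly when, at the first position where x and y differ, x carries a 0 and y carries a 1. A minimal sketch (the function name is illustrative):

```python
def left_of(x, y):
    # x is to the left of y iff there is z with z0 a prefix of x and
    # z1 a prefix of y: the strings share a prefix z, then x branches 0
    # while y branches 1.
    for a, b in zip(x, y):
        if a != b:
            return a == "0" and b == "1"
    return False  # one string is a prefix of the other: no such z exists
```

Under this relation, the halting programs of a left-total machine occupy a contiguous region on the left of its domain, as depicted in Figure 1.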
Figure 1: The diagram represents the domain of a left-total machine, with the 0 bits branching to the left and the 1 bits branching to the right, and y = 110. For each i, x_i ⊳ x_{i+1} and x_i ⊳ y. Assuming U(y) halts, each x_i is total. This also implies each x_i⁻ is total as well.

The term m^b(x|y) is 0 if b is not total. If b and b⁻ are total, then m^b(x|y) ≤ m^{b⁻}(x|y).

This paper uses the notion of stochasticity, which is a part of algorithmic statistics. For a comprehensive survey of algorithmic statistics, see [VS17]. A string is stochastic if it is typical of a simple probability measure. Typicality is measured by the deficiency of randomness d. The deficiency of randomness of a string x ∈ Σ*, with respect to a probability measure Q, conditioned on an auxiliary string v ∈ Σ*, is

d(x|Q, v) = ⌊−log Q(x)⌋ − K(x|v).

The deficiency of randomness measures the difference between the length of the Q-Shannon-Fano code for x and the shortest description of x (given v). If x is typical, then its d measure will be small. We say a string x is (j, k) stochastic conditional to y, for j, k ∈ N and y ∈ Σ*, if there exists a program v ∈ Σ* of length j where U_y(v) = ⟨Q⟩, Q is an elementary probability measure, and d(x|Q, ⟨v, y⟩) ≤ k. The stochasticity measure of a string x ∈ Σ*, conditional on auxiliary information y ∈ Σ*, is

Λ(x|y) = min{j + 3 log k : x is (j, k) stochastic conditional to y}.

The following lemma is from [EL11]. It states that strings that have high stochasticity measures are exotic, in that they have high mutual information with the halting sequence. Another version of the lemma can be found in [Eps19b].
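The ⌊−log Q(x)⌋ term of the deficiency of randomness is directly computable for an elementary Q; the K(x|v) term is not. The sketch below substitutes a general-purpose compressor as a crude upper bound on K, so it reports only a lower bound on the true deficiency — purely an illustration, not the quantity used in the results.

```python
import math
import zlib

def deficiency_lower_bound(x, Q):
    # d(x|Q) = floor(-log2 Q(x)) - K(x).  K is uncomputable, so we bound
    # it from above by 8 * (zlib-compressed length in bytes), which makes
    # the returned value a lower bound on the true deficiency.
    k_upper = 8 * len(zlib.compress(x.encode()))
    return math.floor(-math.log2(Q[x])) - k_upper
```

A highly compressible string drawn from a uniform two-point measure already shows the gap between its Shannon-Fano code length and its description length.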
Lemma 1
For x, y ∈ Σ*, Λ(x|y) ≤^log I(x : H|y).

The following lemma is from [Eps19a]. A variant of the same idea can be found in Proposition 5 of [VS17]. It states that no total computable function can increase the stochasticity measure of a string by more than an additive term dependent on the complexity of the function.
Lemma 2
Given a total recursive function g : Σ* → Σ*, Λ(g(a)) < Λ(a) + K(g) + O(log K(g)).

If b is total and b⁻ is not total, then b⁻ ⊏ B. This is because the border sequence B is defined as the unique sequence whose prefixes have total and non-total extensions. Since b is total and b⁻ is not total, b⁻ has total and non-total extensions.

The following lemma states that if a prefix of border is simple relative to a string x and its own length, then it will be a part of the common information of x and H.

Lemma 3 If b ∈ Σ* is total and b⁻ is not, and x ∈ Σ*, then K(b) + I(x : H|b) ≤^log I(x : H) + K(b|⟨x, ‖b‖⟩).

The following theorem is from [Eps19b]. It states that given two (not necessarily probabilistic) measures W and η satisfying certain summation requirements, if the combined η-score of the elements of a set D is large, then there exists an element of D that can be identified by a short W-code.

Theorem 1
Relativized to computable W : N → R_{≥0} and η : N → R_{≥0} with Σ_{a∈N} W(a)η(a) ≤ 1, if for some finite set D ⊂ N, log Σ_{a∈D} η(a) ≥ s for some s ∈ N, then there exists a ∈ D with K(a) < −log W(a) − s + Λ(D) + O(K(s)).

Corollary 1 shows that an encoding of any 2^n unique pairs ⟨b, K(b)⟩ has more than ∼ n bits of mutual information with the halting sequence H. So all such large sets are exotic.

Theorem 2
For any finite set D of natural numbers and L : D → N, where s = ⌊log |D|⌋, we have

s < max_{a∈D} |L(a) − K(a)| + I(L : H) + O(K(s) + log I(L : H)).

Proof.
Let j = max_{a∈D} |L(a) + ⌈log m(a)⌉|. Note that by the coding theorem, K(a) =⁺ −log m(a). Let b be the shortest total string with max_{a∈D} |L(a) + ⌈log m^b(a)⌉| ≤ j + 1. We have

K(b|⟨‖b‖, L⟩) ≤⁺ K(j),   (1)

as there is a program that, when given ‖b‖, L, and j, enumerates all total strings of length ‖b‖ and returns the first x where max_{a∈D} |L(a) + ⌈log m^x(a)⌉| ≤ j + 1, which we call satisfying property A. This string is equal to b; otherwise there is a b′ ⊳ b, ‖b′‖ = ‖b‖, that satisfies property A. This implies that b′⁻ is total and satisfies property A, contradicting b being the shortest total string satisfying property A. The same argument can be used if b ⊳ b′. This also implies b⁻ is not total. A graphical depiction of this argument can be seen in Figure 2.

So for all a ∈ D, −log m^b(a) − K(a) ≤⁺ j. Let η(a) = 1 and W(a) = m^b(a). K(⟨W, η⟩|b) = O(1). Theorem 1, relativized to b, gives a ∈ D where K(a|b) < −log m^b(a) − s + Λ(D|b) + O(K(s)). So

s < −log m^b(a) − K(a|b) + Λ(D|b) + O(K(s))
  < −log m^b(a) − K(a) + K(b) + Λ(D|b) + O(K(s))
  < log(m(a)/m^b(a)) + K(b) + Λ(D|b) + O(K(s))
  < j + K(b) + Λ(D|b) + O(K(s)).

Let f be a total computable function that, when given an encoding of a function G : R → N for finite R ⊂ N, outputs R. Thus D = f(L). Due to Lemma 2, conditioned on b, s < j + K(b) + Λ(L|b) + O(K(s)). Due to Lemma 1,

s < j + K(b) + I(L : H|b) + O(log I(L : H|b) + K(s)).

Figure 2: A graphical argument for why the total string b in the proof of Theorem 2 is unique. Each path represents a string, with 0s branching to the left and 1s branching to the right.
If another string b′ exists with the desired m^{b′} property, and it is to the left of b, then its prefix b′⁻ will also be total and have the desired m^{b′⁻} property, causing a contradiction.

Let h_x = I(L : H|x). Due to Lemma 3 and Equation 1, K(b) + h_b ≤^log h_∅ + K(b|⟨L, ‖b‖⟩) ≤^log h_∅ + K(j). This implies

s ≤ j + h_∅ + O(K(s) + K(j) + log h_∅).

If 2j ≥ s, then the theorem is trivially solved. So, assuming s > j, we have K(s − j) < O(log(s − j)) < O(log(K(s) + K(j) + h_∅)). So K(j) ≤⁺ K(s) + K(s − j) < O(K(s) + log(K(j) + h_∅)). Therefore K(j) < O(K(s) + log h_∅). So s < j + h_∅ + O(K(s) + log h_∅). □

Corollary 1
Any set X ⊂ Σ* of 2^n unique pairs ⟨b, K(b)⟩ has n ≤^log I(X : H).

For U-programs p ∈ Σ* that enumerate a (potentially infinite) binary relation and total string b, we use p[b] ⊂ N × N to denote the finite binary relation enumerated by p within bbtime(b) steps. We use p[∞] ⊆ N × N to denote the entire binary relation enumerated by p.

Theorem 3
For a U-program p ∈ Σ* that enumerates a binary relation, with i = max{⌈−log Σ_{(x,y)∈p[∞]} m(x|y)⌉, 1} and h = I(p : H),

min_{(x,y)∈p[∞]} K(x|y) < i + h + O(K(i) + log h).

Proof.
Let b ∈ Σ* be the shortest total string where ⌈−log Σ_{(x,y)∈p[b]} m^b(x|y)⌉ ≤ i + 1. We have the inequality K(b|p, ‖b‖) ≤⁺ K(i), because there is a program that, when given ‖b‖, p, and i, can enumerate all total strings c of length ‖b‖ and all pairs (x, y) ∈ p[c], and return the first total string c where ⌈−log Σ_{(x,y)∈p[c]} m^c(x|y)⌉ ≤ i + 1, which we call satisfying property A. This string is unique; otherwise there exists a string b′ ≠ b, ‖b′‖ = ‖b‖, which satisfies property A. If b′ ⊳ b, then b′⁻ is total and satisfies property A, contradicting b being the shortest total string satisfying property A. Similar reasoning can be used when b ⊳ b′. Therefore b′ = b, and b is unique. Figure 2 illustrates this point.

Let v′ ∈ Σ* and Q′ be the program and elementary probability measure that minimize the stochasticity of p conditional on ⟨b, i⟩, Λ(p|⟨b, i⟩), where U_{⟨b,i⟩}(v′) = ⟨Q′⟩ and

‖v′‖ + 3 log max{d(p|Q′, ⟨v′, b, i⟩), 1} = Λ(p|⟨b, i⟩).

Let Q be the elementary probability measure equal to Q′ conditioned on the set of programs q that enumerate binary relations where ⌈−log Σ_{(x,y)∈q[b]} m^b(x|y)⌉ ≤ i + 1, which we call satisfying property B. Thus Q(q) = [q ∈ T] Q′(q)/Q′(T), where T = {q : q ∈ Supp(Q′), q satisfies property B}. Let v ∈ Σ*, U_{⟨b,i⟩}(v) = ⟨Q⟩, with v = v₀v′, where v₀ ∈ Σ* is helper code of size O(1). Thus K(v|v′, b, i) = O(1), which implies −K(p|v, b, i) ≤⁺ −K(p|v′, b, i). Let d = max{d(p|Q, ⟨v, b, i⟩), 1}. So ‖v‖ ≤⁺ ‖v′‖, and

‖v‖ + 3 log d ≤⁺ ‖v′‖ + 3 log d
  =⁺ ‖v′‖ + 3 log(max{−log Q(p) − K(p|v, b, i), 1})
  ≤⁺ ‖v′‖ + 3 log(max{−log Q′(p) − K(p|v, b, i), 1})
  ≤⁺ ‖v′‖ + 3 log(max{−log Q′(p) − K(p|v′, b, i), 1})
  ≤⁺ Λ(p|⟨b, i⟩).
(2)

Let S = ∪{y : (x, y) ∈ q[b], q ∈ supp(Q)}, which is finite. Let δ_y be a set of random vectors, indexed by y ∈ S, each of size (c + d)2^{i+1}. The number c ∈ N is a constant solely dependent on U, to be determined later. Each element of the vector δ_y is chosen with probability m^b(·|y), and ∅ is chosen with probability 1 − Σ_{x∈Σ*} m^b(x|y). Let t_{H_y} : Σ* → R_{≥0} be a nonnegative function over strings, parameterized by sets of strings H_y, each of size (c + d)2^{i+1}, each indexed by a string y ∈ S. For an enumerative program q, t_{H_y}(q) = 0 if there exists (x, y) ∈ q[b] where x ∈ H_y. Otherwise t_{H_y}(q) = e^{c+d−1}. So, using the fact that (1 − m) ≤ e^{−m} for m ∈ [0, 1],

E_{δ_y}[Q(t_{δ_y})] = Σ_q Q(q) Π_{y∈S} (1 − Σ_{x:(x,y)∈q[b]} m^b(x|y))^{(c+d)2^{i+1}} e^{c+d−1}
  ≤ Σ_q Q(q) Π_{y∈S} e^{−Σ_{x:(x,y)∈q[b]} m^b(x|y)(c+d)2^{i+1}} e^{c+d−1}
  ≤ Σ_q Q(q) e^{−(Σ_{(x,y)∈q[b]} m^b(x|y))(c+d)2^{i+1}} e^{c+d−1}
  ≤ Σ_q Q(q) e^{−2^{−i−1}(c+d)2^{i+1}} e^{c+d−1}
  = e^{−1} < 1.

Thus there exists a collection of sets G_y, indexed by y ∈ S, where Q(t_{G_y}) ≤ 1. This collection can be found by brute force search given v, d, c, and ⟨b, i⟩, with K(G_y|v, d, c, b, i) = O(1).

There exists y ∈ S and (x, y) ∈ p[b] where x ∈ G_y. Otherwise t_{G_y}(p) = e^{c+d−1}, and for a proper choice of c, solely dependent on U, we have

d > −log Q(p) − K(p|v, b, i) − O(1)
  > −log Q(p) − (−log t_{G_y}(p)Q(p) + K(t_{G_y}(·)Q(·)|v, b, i)) − O(1)
  > −log Q(p) − (−log t_{G_y}(p)Q(p) + K(G_y, Q|v, b, i)) − O(1)
  > (log e)(c + d) − K(d, c) − O(1)
  > d,

causing a contradiction. We absorb c into the additive constants for the rest of the theorem. So t_{G_y}(p) = 0, and there exists an (x, y) ∈ p[b] where x ∈ G_y. So

K(x|y, b, i) ≤⁺ log|G_y| + K(G_y|v, d, b, i) + K(v, d|b, i)
  ≤⁺ i + 3 log d + ‖v‖   (3)
  ≤⁺ i + Λ(p|⟨i, b⟩)   (4)
  < i + I(p : H|⟨i, b⟩) + O(log I(p : H|⟨i, b⟩))   (5)

K(x|y) < i + K(b) + I(p : H|b) + O(K(i) + log I(p : H|b))
K(x|y) < i + K(b) + I(p : H|b) + O(K(i) + log(I(p : H|b) + K(b)))
K(x|y) < i + I(p : H) + K(b|p, ‖b‖) + O(K(i) + log(I(p : H) + K(b|p, ‖b‖)))   (6)
K(x|y) < i + I(p : H) + O(K(i) + log I(p : H)).   (7)

Equation 3 is due to the fact that v is a U-program (conditioned on ⟨b, i⟩), so its conditional complexity is not more than its length. Equation 4 is due to Equation 2. Equation 5 is due to Lemma 1. Equation 6 is due to Lemma 3. Equation 7 is due to the inequality K(b|p, ‖b‖) ≤⁺ K(i). □

Corollary 2
For finite binary relation B ⊂ N × N, with i = max{⌈−log Σ_{(x,y)∈B} m(x|y)⌉, 1},

min_{(x,y)∈B} K(x|y) < i + I(B : H) + O(K(i) + log I(B : H)).

Corollary 3
For partial computable function f, with i = max{⌈−log Σ_{x∈Dom(f)} m(f(x)|x)⌉, 1},

min_x K(f(x)|x) < i + I(f : H) + O(K(i) + log I(f : H)).

References

[BFNV05] H. Buhrman, L. Fortnow, I. Newman, and N. Vereshchagin. Increasing Kolmogorov complexity. In
STACS 2005, pages 412–421, 2005.

[BMVZ13] B. Bauwens, A. Makhlin, N. Vereshchagin, and M. Zimand. Short lists with short programs in short time. In IEEE Conference on Computational Complexity, pages 98–108, 2013.

[EL11] Samuel Epstein and Leonid Levin. On sets of high complexity strings.
CoRR, abs/1107.1458, 2011.

[Eps19a] S. Epstein. On the complexity of completing binary predicates. arXiv e-prints, arXiv:1907.04776, 2019.

[Eps19b] Samuel Epstein. All sampling methods produce outliers.
CoRR, abs/1304.3872, 2019.

[Kol65] A. N. Kolmogorov. Three approaches to the quantitative definition of information.
Problems in Information Transmission, 1:1–7, 1965.

[Sol64] R. J. Solomonoff. A formal theory of inductive inference, part I.
Information and Control, 7:1–22, 1964.

[VS17] Nikolay K. Vereshchagin and Alexander Shen. Algorithmic statistics: forty years later. In
Computability and Complexity, pages 669–737, 2017.

[Zim16] Marius Zimand. List approximation for increasing Kolmogorov complexity.