Chaitin's Omega and an Algorithmic Phase Transition
Christof Schmidhuber∗, Zurich University of Applied Sciences, School of Engineering, Technikumstrasse 9, 8401 Winterthur, CH-Switzerland
September 1, 2020
Abstract
We consider the statistical mechanical ensemble of bit string histories that are computed by a universal Turing machine. The role of the energy is played by the program size. We show that this ensemble has a first-order phase transition at a critical temperature, at which the partition function equals Chaitin’s halting probability Ω. This phase transition is almost zeroth-order in the sense that the free energy is continuous near the critical temperature, but almost jumps: it converges more slowly to its finite critical value than any computable function. We define a non-universal Turing machine that approximates this behavior of the partition function in a computable way by a super-logarithmic singularity, and discuss some of its statistical mechanical properties. For universal Turing machines, we conjecture that the ensemble of bit string histories at the critical temperature has a continuum formulation in terms of string theory.
Keywords:
Chaitin’s Omega, Complexity, Turing Machine, Algorithmic Thermodynamics, Phase Transition, String Theory

∗ [email protected]

Introduction
In 1975, G. Chaitin [1] introduced a constant associated with a given universal Turing machine U [2] that is often called the "halting probability" Ω. It is computed as a weighted sum over all prefix-free input programs p for U that halt:

$$\Omega_U = \sum_{p\ \text{halts on}\ U} 2^{-l(p)} = \sum_{l=1}^{\infty} N(l)\, 2^{-l} \tag{1}$$

where p is a "program" (a bit string made up of 0’s and 1’s), l(p) is its length (the number of bits), and N(l) is the number of prefix-free programs of length l for which U halts. Turing machines and prefix-free bit strings are briefly reviewed in section 2 and in the appendix. For a general introduction to information theory, see [3].

Most of the discussion around Ω has focused on its first few digits, which are determined by the function N(l) for small program length l. As every mathematical hypothesis can be translated into a halting problem (the question whether a given program halts for a given Turing machine), many long-standing mathematical problems could be solved if only one could compute Ω digit by digit. Unfortunately, Ω is not computable by any halting program, precisely because knowing Ω would imply that one could decide mathematical problems that are known to be undecidable in the sense of Gödel’s incompleteness theorem [4]. Moreover, even when the first few digits of Ω are computable for a given universal Turing machine U, they are not universal: they depend on the choice of U.

In this note, we will therefore not be concerned with the contribution of short programs to Ω, nor will we dwell much on the issues of incompleteness and undecidability. Instead, we will focus on the contribution of very long programs to Ω, i.e., on the behavior of $N(l)\cdot 2^{-l}$ as $l \to \infty$.
More precisely, following [5, 6, 7], in a generalization of (1), we consider the statistical mechanical ensemble of bit string histories with partition function

$$Z_U(\beta) = \sum_{p\ \text{halts on}\ U} e^{-\beta\, l(p)} \quad\text{with}\quad \beta = \frac{1}{kT}, \tag{2}$$

where k is the Boltzmann constant and T the temperature, as usual in statistical mechanics (see, e.g., [8] for a review of statistical mechanics and field theory). We will study Z as a function of $\beta = \beta_c + \epsilon$ in the vicinity of the "Chaitin point" $\beta_c = \ln 2$ with $\epsilon \ll 1$ ($\beta_c$ is the log of the alphabet size).

We find that the "Chaitin point" $\beta = \beta_c = \ln 2$ corresponds to a critical temperature, at which a first-order phase transition occurs. This phase transition has very curious properties. In particular, the free energy is almost discontinuous: it converges more slowly to its finite critical value than any computable function. We illustrate this type of transition in a toy model, namely a non-universal Turing machine (the "counting machine") that approximates this behavior of the free energy by a super-logarithmic singularity.

In the outlook, we discuss the fascinating question whether there might be a continuum description of our bit string ensemble at the Chaitin point in terms of a (super)string theory [9], in which the two-dimensional string world-sheet is spanned by the bit string and the computation time.

The generalization (2) of Chaitin’s halting probability was previously studied by Tadaki [5], who investigated the degree of randomness of the real number $Z_U(\beta)$, written in binary form. The relation with statistical mechanics was pointed out by Calude and Stay [6], who also discussed variants of (2) in which the sum runs over general (as opposed to only prefix-free) programs (the partition function then diverges at β = ln 2, instead of converging to Chaitin’s Ω).
The statistical mechanical approach was formulated more mathematically by Tadaki in [10].

Baez and Stay [7] defined the corresponding "algorithmic" versions of the specific heat and other thermodynamic quantities. Moreover, they formally extended the Gibbs factor (2) by including two other terms, corresponding to the logarithm E(p) of the computation time of the Turing machine, and to the value N(p) of the output bit string (interpreted as a natural number in binary form):

$$\exp\{-\beta_1\, l(p) - \beta_2\, E(p) - \beta_3\, N(p)\}. \tag{3}$$

Algorithmic versions of the Carnot cycle were also discussed in [7], and it was pointed out that the partition function has a singularity at $\beta_1 = \ln 2$, $\beta_2 = \beta_3 = 0$. Tadaki [11, 12] discussed computational aspects of this "algorithmic phase transition".

Our paper complements this previous work by studying the nature of this algorithmic phase transition from a more physical point of view. In particular, a key question about phase transitions is whether they are first-order or second-order. As mentioned, we resolve this in sections 7 and 8 by showing that this one is an exotic first-order transition.

In a separate line of work (see [13, 14, 15] and references therein), Manin considers a partition function similar to (2) and relates it to error-correcting codes, to Zipf’s law, and to renormalization in field theory (I thank D. Murfet for pointing this out to me). While (2) sums over all prefix-free input bit strings, the partition function of [15] is defined in terms of a sum over all output bit strings B:

$$\tilde Z_U(\beta) = \sum_B e^{-\beta K(B)}. \tag{4}$$

Here, K(B) is the Kolmogorov complexity of B, i.e., the length of the shortest input program that makes U compute B. K(B) is not computable for general bit strings B.
To relate (4) to (2), consider a generalization $K_\beta(B)$ of K(B), defined in terms of a sum over all programs p(B) that halt and whose output bit string is B:

$$K_\beta(B) = -\frac{1}{\beta} \ln \sum_{p(B)} e^{-\beta\, l(p)}. \tag{5}$$

In the limit β → ∞, where the shortest program dominates the sum, $K_\beta(B)$ converges to the Kolmogorov complexity K(B). Summing $e^{-\beta K_\beta(B)}$ over all B recovers the sum over all halting programs, so (2) can be regarded as a variant of (4) in which K(B) is replaced by $K_\beta(B)$.

It will be interesting to try to extend the Hamiltonian described in [15] to the critical point $\beta = \beta_c$, replacing the "energy" K(B) in (4) by $K_{\beta_c}(B)$ (see also appendix A6, which discusses the evolution of bit strings in a different "time", called "world-sheet time"). One of the many issues to be addressed in this context is that it might turn out to be an undecidable problem whether or not the energy spectrum has a vanishing mass gap, corresponding to a second-order phase transition [16, 17].

Turing Machines and Prefix-free Programs
We follow Chaitin’s definition of a Turing machine, which is reviewed in appendix A. A bit string y is called a prefix of a bit string x if x can be written as a concatenation x = yz with a third bit string z. A set of bit strings is called prefix-free if no bit string is a prefix of another. For the current argument, it is sufficient to think of a Turing machine T as a map ("computation") from a set P of prefix-free input bit strings p ∈ P ("programs"), for which the computation halts, to the set O of arbitrary output bit strings of any length:

$$T:\ p \in P \ \to\ T(p) \in O$$

The output bit strings are written on a "work tape" that extends infinitely in both directions. The computation manipulates them until it halts. The prefix-free input bit strings are written on a finite read-only "program tape" (see appendix A for details). The prefix-free input programs p are what we sum over in (2), and their lengths l play the role of the energy in the Boltzmann factor.

Figure 1: Tree representation of prefix-free bit strings

One may represent a set X of prefix-free bit strings by a tree (fig. 1). The vertices in the (l+1)-th line (or l-th generation) of the graph represent the $2^l$ binary numbers $b_l$ with l digits. Branching to the left appends a 0, branching to the right appends a 1 at the end of $b_l$ to yield the next generation of $b_{l+1}$. At each vertex, the corresponding number $b_l$ is either added to the set $X_l \subset X$ of prefix-free programs $p_l$ of size l (red dots) or not (black dots). Black dots are prefixes (parents, grand-parents, ...) of red dots and give birth to two children; we assume that each black dot is the prefix of at least one red dot. Red dots have no children. In the figure, white dots represent bit strings that are never born.

Let $n_l$ be the number of red dots (prefix-free programs) of length l. Let $m_l$ be the number of black dots (prefixes) of length l. Let $w_l = 2^l - n_l - m_l$ be the number of white dots of length l.
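The bookkeeping of red, black, and white dots can be checked with a short sketch. As a concrete example it assumes the Fibonacci-type prefix-free set discussed with (6) below (a string becomes a program as soon as it ends in a first run of N consecutive 1’s), and it tracks black dots only by their number of trailing 1’s. The function name `tree_counts` is hypothetical. The sketch verifies $n_l + m_l + w_l = 2^l$, the relation $P_l = Q_{l+1} - Q_l$ of (6), and the exponential decay of $P_l$.

```python
from fractions import Fraction


def tree_counts(N, L):
    """Count red (n), black (m) and white (w) dots in generations 0..L of the
    prefix-free set whose programs end at the first run of N consecutive 1's."""
    black = {0: 1}           # number of trailing 1's -> number of black dots
    n, m, w = [0], [1], [0]  # generation 0: the empty string is a black dot
    for _ in range(1, L + 1):
        new_black, red = {}, 0
        for t, count in black.items():
            new_black[0] = new_black.get(0, 0) + count       # append a 0
            if t + 1 == N:                                   # appending a 1 completes the run
                red += count
            else:
                new_black[t + 1] = new_black.get(t + 1, 0) + count
        w.append(2 * (w[-1] + n[-1]))  # children of white and red dots are white
        n.append(red)
        m.append(sum(new_black.values()))
        black = new_black
    return n, m, w


n, m, w = tree_counts(N=2, L=40)
P = [Fraction(n[l], 2**l) for l in range(41)]  # fraction of red dots, P_l
Q = [Fraction(w[l], 2**l) for l in range(41)]  # fraction of white dots, Q_l

print(n[2:10])               # Fibonacci numbers: [1, 1, 2, 3, 5, 8, 13, 21]
print(float(sum(P)))         # approaches 1 (Kraft's inequality with equality)
print(float(P[40] / P[39]))  # tends to (1 + sqrt(5)) / 4, i.e. exponential decay
```

For N = 2 the red-dot counts are Fibonacci numbers, which is where the name of the code comes from.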
We define the percentages $Q_l = w_l \cdot 2^{-l}$ of white dots and $P_l = n_l \cdot 2^{-l}$ of red dots in the l-th generation. Since every child of a white or red dot is white, $w_{l+1} = 2(w_l + n_l)$, and we get

$$P_l = Q_{l+1} - Q_l \quad\text{with}\quad \lim_{l\to\infty} Q_l = \sum_{l=1}^{\infty} P_l = \sum_{p_l \in X} 2^{-l} = 1, \tag{6}$$

where the last equation states that "Kraft’s inequality is satisfied with equality" (see [3]). As an example of a set of prefix-free programs, consider "Fibonacci coding": a child is a member of X if its last 2 digits are "1" or, in a slight generalization, if its last N digits are "1". In this case, one easily verifies that $P_l$ falls off exponentially as l → ∞.

For a given Turing machine T, there are two kinds of red dots: $\tilde n_l$ halting programs and $n_l - \tilde n_l$ non-halting programs. We denote by $h_l = \tilde n_l / n_l$ the fraction of programs in the l-th generation that halt. Then the partition function (2) can be written as

$$Z_U(\beta) = \sum_{l=1}^{\infty} P_l\, h_l\, e^{-\epsilon l} \quad\text{with}\quad \epsilon = \beta - \beta_c,\quad \beta_c = \ln 2. \tag{7}$$

At the critical point $\beta = \beta_c = \ln 2$, our partition function (7) is Chaitin’s Ω of (1):

$$Z_U(\beta_c) = \sum_{l=1}^{\infty} P_l\, h_l = \Omega < 1.$$

For $\beta < \beta_c$, the partition function diverges as long as $P_l\, h_l$ falls off more slowly than exponentially as l → ∞, which is the case for any universal Turing machine U (see below). How exactly does $Z_U$ approach Ω as β approaches $\beta_c$ from above? Let us first discuss to what extent this singularity near $\beta = \beta_c$ is universal, i.e., independent of U.

A universal Turing machine ("UTM") U is one that can simulate any other Turing machine $T_i$ in the following sense: there is a finite bit string ("translator program") $c_i$ such that for each program p, $U(c_i p) = T_i(p)$. I.e., if p makes $T_i$ compute an output bit string, the concatenation $c_i p$ makes U compute the same output bit string. Let $C_i$ be the finite length of the program $c_i$. Then the partition function $Z_U(\beta)$ of the UTM contains the partition function $Z_i(\beta)$ of $T_i$ as a subset:

$$Z_U(\beta) \ge e^{-\beta C_i} \cdot Z_i(\beta)$$

This applies to all Turing machines $T_i$.
Thus, as $\epsilon = \beta - \beta_c \to 0$, the Turing machine $T_i$ with the strongest singularity (i.e., with the largest derivative $Z_i'(\epsilon)$ at $\epsilon \sim 0$) dominates the singularity of the partition function $Z_U(\beta)$ at β = ln 2. As this applies to all U, we conclude that this singularity is universal, i.e., independent of the choice of the UTM, up to an overall pre-factor $2^{-C_i}$.

Our ensemble (2) includes only programs that halt. This makes it intractable, as it is generally an undecidable question whether a given Turing machine halts for a given program. Thus, the factor $h_l$ in (7), Chaitin’s Ω, and the partition function $Z_U(\beta)$ are actually not computable by any halting program. These issues around undecidability and non-computability, fascinating as they may be, will not play a major role here. It is clear from (7) that the strongest singularity in ε corresponds to the product $P_l \cdot h_l$ that decays most slowly as l → ∞. Thus, the non-computable factor $h_l \le 1$ can only make this singularity weaker. We will therefore first discuss non-universal Turing machines $T_i$ for which all programs halt (i.e., $h_l = 1$), and then return to universal Turing machines in the last section.

As an example of a function $P_l$ that converges more slowly than that from Fibonacci coding, let the N of Fibonacci coding grow with the program length: N(l) = int(1 + lg l), where lg ≡ log₂. In this case, it is not difficult to see that $P_l$ decays like a power of l:

$$P_l \propto l^{-\alpha}\ \text{as}\ l \to \infty\ \text{with}\ \alpha > 1 \quad\Rightarrow\quad 1 - Z(\epsilon) \propto \epsilon^{\alpha - 1}\ \text{as}\ \epsilon = \beta - \beta_c \to 0$$

We now define a Turing machine T that we call the "counting machine", corresponding to a particular set of prefix-free programs, that always halts. We will then show that its partition function (7) has a computable, super-logarithmic singularity that, for our purposes, serves as a good model of the singularity of UTMs.

Let us first describe the output of the machine T. Given any infinite input bit string p on the program tape, T writes a number N of 1’s in a row on its otherwise blank work tape and then halts.
We call $p_N$ the prefix of p consisting only of those bits of p that have been read by the time the machine halts, i.e., the machine halts on the last bit of $p_N$. This defines a set P of prefix-free input programs $p_N \in P$. We will construct T such that any number N ∈ ℕ of 1’s appears as the output bit string of exactly one such $p_N$, namely:

$$\begin{aligned}
&N < 3: && p_0 = 00,\ p_1 = 01,\ p_2 = 10 && \text{with length } l_N = 2\\
&N = 3: && p_3 = 110 && \text{with length } l_N = 3\\
&N > 3: && p_N = 11\, n_2 \cdots n_k\, N\, 0 && \text{with length } l_N = 6 + n_2 + \dots + n_k
\end{aligned} \tag{8}$$

where all numbers are written in binary form, $n_k$ is the binary length of N, $n_{k-1}$ is the binary length of $n_k$, and so on, until a length $n_1 = 3 = 11_2$ is reached. For N > 3, $p_N$ begins with "11" and ends with "0". The number of iterations k can be recursively expressed as follows:

$$k(N) = \begin{cases} 0 & \text{if } N < 4 \\ 1 + k\big(\mathrm{int}(1 + \lg N)\big) & \text{if } N \ge 4 \end{cases} \tag{9}$$

E.g., k(4) = k(7) = 1, k(8) = k(127) = 2, k(128) = 3, and so on.

Next, we describe how T reconstructs N from $p_N$. Given an infinite string p on the program tape, T proceeds as follows:

1. T reads the first two digits $n_1 = p_1 p_2$ of p. If $n_1 = 00$, T leaves the work tape blank and halts; if $n_1 = 01$, T writes 1 on the work tape and halts; if $n_1 = 10$, T writes 11 on the work tape and halts. If $n_1 = 11$, T reads the next digit $p_3$ and defines the new integer $m_1 = 3$.
2. If $p_{m_1} = p_3 = 0$, T writes $1^{n_1} = 111$ on the work tape, then halts. If $p_{m_1} = p_3 = 1$, T reads the next $n_1 = 3$ digits $p_{m_1+1}, \dots, p_{m_1+n_1}$ of p, i.e., $p_4, p_5, p_6$. T defines $m_2 = m_1 + n_1 = 6$ and $n_2 = p_3 p_4 p_5$ (the concatenation with $p_3$ but without $p_6$).
...
i. In the i-th step, if $p_{m_{i-1}} = 0$, T writes $1^{n_{i-1}}$ on the work tape and halts. If $p_{m_{i-1}} = 1$, T reads in the next $n_{i-1}$ digits. It defines $m_i = m_{i-1} + n_{i-1}$ and the concatenation $n_i = p_{m_{i-1}} \dots p_{m_i - 1}$, and moves on to step (i+1), until T halts.

If T halts in the i-th step (i ≥ 2), then i is related to k of (9) by k = i − 2, and $N = n_{i-1}$.

E.g., in the third step, $p_3 = 1$ and $m_2 = 6$. Suppose $n_2 = p_3 p_4 p_5 = 101_2 = 5$. If the sixth digit $p_6$ of p is 0, T writes a sequence of $n_2 = 5$ 1’s on the tape and halts. In this case, k = 1 and N = 5. However, if $p_6 = 1$, T reads in the next five digits $p_7 \dots p_{11}$, defines $m_3 = m_2 + n_2 = 11$ and $n_3 = p_6 \dots p_{10}$, and moves on to step 4.

As an example, consider the input bit string 11100110100. Then $n_1 = 11_2 = 3$, so in step 2, T defines $m_2 = 6$, $n_2 = 100_2 = 4$. In step 3, since $p_6 = 1$, T sets $m_3 = 10$ and reads in the 4-digit number $n_3 = 1101_2 = 13$. In step 4, since the next digit $p_{10}$ is a 0, T writes N = 13 digits 1 in a row and halts. Only the first 10 digits 1110011010 of the input bit string constitute an element of P. More generally, if the counting machine halts in step i, the first $m_{i-1}$ digits of the input string constitute an element of P. The first elements are:

P = {00, 01, 10, 110, 111000, 111010, 111100, 111110, 1110010000, ...}

One may verify that any number N of 1’s in a row appears as the output bit string of exactly one program $p_N \in P$, as claimed above. It is also clear that P is complete in the sense that it cannot be enlarged by any additional bit string without spoiling its property of being prefix-free. As a result, (6) implies that $Z(\beta_c) = 1$.

Although the counting machine T only produces bit strings that are trivial in the sense that they contain only 1’s, variants of the counting machine can be used to make them less trivial in subsequent steps. E.g., in a second step, one variant of T may generate all integers k in binary form, such as $k = 20 = 10100_2$, and then overwrite the 1’s by repeating k until the bit string ends: "1010010100...". In a third step, another variant of T may generate all integers m, and then create "kinks" on the bit strings resulting from step 2, by flipping all bits after the m-th digit.
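Returning to the basic counting machine, its decoding steps, the encoding (8), and the iteration count (9) can be sketched in Python. This is an illustrative sketch, not the appendix A4 implementation; the function names are hypothetical, and bits are indexed from 1 in the text but from 0 in Python. The checks reproduce the worked example, the first elements of P, the prefix-free property, and the contribution 7/8 + 1/16 + 1/32 = 31/32 of all programs with k ≤ 2 (i.e., N < 128) to the Kraft sum (6).

```python
from fractions import Fraction


def run_counting_machine(bits):
    """Simulate the counting machine T on a finite input bit string.

    Returns (N, length of p_N), or None if T would read past the end of `bits`.
    """
    if len(bits) < 2:
        return None
    first = bits[:2]                 # step 1: n_1 = p_1 p_2
    if first != "11":
        return int(first, 2), 2      # 00, 01, 10 -> write 0, 1, 2 ones
    n, m = 3, 3                      # n_1 = 11_2 = 3; next control bit is p_3
    while True:
        if len(bits) < m:
            return None
        if bits[m - 1] == "0":       # control bit p_m = 0: write 1^n and halt
            return n, m
        if len(bits) < m + n:
            return None
        # n_i = p_{m_{i-1}} ... p_{m_i - 1}, then m_i = m_{i-1} + n_{i-1}
        n, m = int(bits[m - 1 : m - 1 + n], 2), m + n


def encode(N):
    """Return the program p_N of (8)."""
    if N < 3:
        return format(N, "02b")      # 00, 01, 10
    if N == 3:
        return "110"
    chain = [N]
    while chain[-1] > 7:             # prepend binary lengths until one of 4..7 remains
        chain.append(chain[-1].bit_length())
    return "11" + "".join(format(x, "b") for x in reversed(chain)) + "0"


def iterations(N):
    """k(N) of (9); N.bit_length() equals int(1 + lg N) for N >= 1."""
    return 0 if N < 4 else 1 + iterations(N.bit_length())


print(run_counting_machine("11100110100"))             # (13, 10)
print(sorted((encode(N) for N in range(9)), key=lambda p: (len(p), p)))
print(sum(Fraction(1, 2 ** len(encode(N))) for N in range(128)))   # 31/32
```

Appending extra bits to $p_N$ before decoding is a convenient way to confirm that T halts exactly on the last bit of $p_N$, as required for prefix-freeness.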
Used in this way, the counting machine can be a tool for systematically and efficiently generating nontrivial output bit strings of increasing complexity.

More generally, the counting machine T can be used whenever one needs a highly compact specification of large numbers N by prefix-free programs. Of course, other sets of prefix-free programs may give a shorter description of individual large numbers, such as $2^{\dots}$, at the expense of the average large number.

Appendix A4 presents a concrete implementation of the counting machine T.

Super-logarithmic Singularity
In this section, we compute how the partition function (7) approaches its critical value as β approaches $\beta_c$ from above in the case of the counting machine. The counting machine halts for every input program ($h_l = 1$) and therefore has a computable partition function

$$\hat Z(\beta) = \sum_{\text{all } p} e^{-\beta\, l(p)} = \sum_{k=0}^{\infty} \hat Z_k(\beta) \quad\text{with}\quad \hat Z(\beta_c) = 1, \tag{10}$$

where $\hat Z_k(\beta)$ is the contribution from programs p that halt after k iterations, k being defined in (9). Using (8), we expand:

$$\begin{aligned}
\hat Z_0(\beta) &= 3e^{-2\beta} + e^{-3\beta}, \qquad \hat Z_1(\beta) = 4e^{-6\beta},\\
\hat Z_2(\beta) &= 8e^{-10\beta} + 16e^{-11\beta} + 32e^{-12\beta} + 64e^{-13\beta},\\
\hat Z_k(\beta) &= \sum_{n_2,\dots,n_k,N} e^{-\beta(6 + n_2 + \dots + n_k)} \;\sim\; \frac12 \sum_{n_2,\dots,n_k} 2^{-(6 + n_2 + \dots + n_{k-1})}\, e^{-\epsilon n_k} \quad\text{with}\quad \epsilon = \beta - \beta_c,
\end{aligned} \tag{11}$$

where $n_2$ runs from 4 to 7, $n_{i+1}$ runs from $2^{n_i - 1}$ to $2^{n_i} - 1$, and N runs from $2^{n_k - 1}$ to $2^{n_k} - 1$. In the last line, we have expanded near $\beta = \beta_c = \ln 2$ and kept only the leading term in ε, noting that $n_k \gg n_{k-1}$. For a given k, let $\Lambda_k$ be the largest possible value of $n_k$:

$$\Lambda_1 = 3,\quad \Lambda_2 = 7,\quad \Lambda_3 = 127,\quad \Lambda_{k+1} = 2^{\Lambda_k} - 1. \tag{12}$$

If $\Lambda_{k-1} \gg 1/\epsilon$, we can approximate $\hat Z_k$ in (11) by 0, since the minimum value of $n_k$ is $\Lambda_{k-1} + 1$. On the other hand, if $\Lambda_k \ll 1/\epsilon$, we can approximate ε by 0 in $\hat Z_k$. This yields $\hat Z_0 = 7/8$, $\hat Z_1 = 1/16$. Noting that there are always $2^{n_i}/2$ values of $n_{i+1}$, in the case $\Lambda_k \ll 1/\epsilon$ we can iteratively perform the sum over $n_2, \dots, n_k$ for k > 1:

$$\hat Z_k = \frac12 \sum_{n_2,\dots,n_k} 2^{-6-n_2-\dots-n_{k-1}} = \frac14 \sum_{n_2,\dots,n_{k-1}} 2^{-6-n_2-\dots-n_{k-2}} = \dots = \frac{1}{2^{k+3}}$$

We now perform the sum (10) over k and first consider the (rare) case where $1/\epsilon = \Lambda_K$ for some K. In appendix A5, it is shown that, in this case, $\hat Z_K = 2^{-K-3}$, $\hat Z_{K+1} = 0$ to high accuracy already for K ≥ 4. Thus,

$$\hat Z(\epsilon) = \frac78 + \frac1{16} + \sum_{k=2}^{K} \hat Z_k = 1 - 2^{-K-3} \quad\text{with}\quad \frac1\epsilon = \Lambda_K \tag{13}$$

The singularity in ε comes from the dependence of K on ε. To continue (13) to general ε, we use the "super-logarithm" $\mathrm{slog}_2(x)$ with basis 2 in the so-called "linear approximation":

$$\mathrm{slog}_2(x) = \begin{cases} x - 1 & \text{if } 0 < x \le 1 \\ \mathrm{slog}_2(\lg x) + 1 & \text{if } x > 1 \end{cases} \tag{14}$$

E.g., $\mathrm{slog}_2(1) = 0$, $\mathrm{slog}_2(2) = 1$, $\mathrm{slog}_2(4) = 2$, $\mathrm{slog}_2(2^x) = \mathrm{slog}_2(x) + 1$. Real values of slog are interpolated from its integer part $\lg^-(x) = \mathrm{int}(\mathrm{slog}_2(x))$ by

$$\mathrm{slog}_2(x) = \lg^-(x) + \underbrace{\lg \cdots \lg}_{1 + \lg^-(x)\ \text{iterations}}\, x$$

We can now express K(ε) in terms of the super-logarithm by noting from (12) that $\mathrm{slog}_2(\Lambda_{k+1}) \to \mathrm{slog}_2(\Lambda_k) + 1$ to very high accuracy already for k > 2:

$$\mathrm{slog}_2(\Lambda_1) = 1 + \lg\lg 3 \sim 1.66,\quad \mathrm{slog}_2(\Lambda_2) = 2 + \lg\lg\lg 7 \sim 2.57,\quad \mathrm{slog}_2(\Lambda_k) = k + 0.57\ldots$$

$$\Rightarrow\quad K(\epsilon) \sim \mathrm{slog}_2(1/\epsilon) - \varphi \quad\text{with}\quad \varphi = 0.57\ldots$$

$$\hat Z(\epsilon) \sim 1 - \lambda \cdot 2^{-\mathrm{slog}_2(1/\epsilon)} = 1 - \lambda \cdot 2^{-\lg^-(1/\epsilon)} \cdot \{\lg \cdots \lg (1/\epsilon)\}^{-1}, \tag{15}$$

where $\lambda = 2^{\varphi - 3} \sim 0.19$, and there are $\lg^-(1/\epsilon)$ iterations of the logarithm in the last line. This continues (13) to any ε. Although the continuation (14) of the super-logarithm to real values, and thus the continuation (15) of $\hat Z(\epsilon)$, is not unique, different continuations differ only by sub-leading orders in ε. Thus, (15) is the leading singularity of the partition function $\hat Z(\epsilon)$ at the critical point. This partition function is plotted in fig. 2. It converges extremely slowly to 1 as ε → 0, and is continuous but "almost" discontinuous.

Figure 2: $\hat Z(\epsilon)$ as a function of 1/ε (left) and β (right)

Critical Behavior
Armed with the results of section 5, we would now like to examine the phase transition for the counting machine near the critical point $\beta = \beta_c + \epsilon$ with $\beta_c = \ln 2$, $\epsilon \ll 1$. The free energy F and average program length ⟨l⟩ are:

$$\hat Z(\beta) = e^{-\beta F} = \sum_p e^{-\beta\, l(p)} \quad\Rightarrow\quad F(\beta) = -\frac1\beta \ln \hat Z(\beta), \qquad \langle l \rangle = -\partial_\beta \ln \hat Z(\beta) \tag{16}$$

The program length is the energy in our case. The heat capacity is (using $T\partial_T = -\beta\partial_\beta$, in units where the Boltzmann constant is 1):

$$C(T) = -T\,\frac{\partial^2 F}{\partial T^2} = -\beta^2\, \partial_\beta \langle l \rangle$$

Generally, in a zeroth-order phase transition the free energy F(T) is discontinuous at a critical point $T = T_c$. In a first-order transition, F(T) is continuous but $\partial_T F(T)$ is discontinuous, the gap being the latent heat. In a second-order transition, $\partial_T F(T)$ is also continuous, but some higher-order derivative of F(T) is discontinuous [8]. In our case,

$$\hat Z(\epsilon) = 1 - \lambda \cdot 2^{-\mathrm{slog}_2(1/\epsilon)} = 1 - \lambda \cdot 2^{-\lg^-(1/\epsilon)} \cdot \{\lg \cdots \lg (1/\epsilon)\}^{-1},$$

where λ ∼ 0.19, $\lg^-$ is the integer part of the super-logarithm, and we have $\lg^-(1/\epsilon)$ iterations of the logarithm. Thus, in the limit ε → 0, we have

$$F(\epsilon) \simeq \frac{\lambda}{\beta_c} \cdot 2^{-\lg^-(1/\epsilon)} \cdot \{\lg \cdots \lg (1/\epsilon)\}^{-1} \tag{17}$$

$$\langle l \rangle \propto \Big[\epsilon \cdot \lg\frac1\epsilon \cdot \lg\lg\frac1\epsilon \cdots \big(\lg \cdots \lg \tfrac1\epsilon\big)\Big]^{-1} \tag{18}$$

F(ε) is finite at the critical point. It is continuous, but almost discontinuous. Thus, the phase transition is first-order, but almost zeroth-order. We also see that the latent heat is infinite, and that the average program size ⟨l⟩ diverges at the critical point. The average size N of the output strings also diverges, as l is of the order lg N.

To put things into perspective, the diameter of the observable universe, measured in Planck lengths, is about $D = 2^{205}$. For $\epsilon < 1/D$, one needs to consider contributions to (16) from input bit strings with length l > D to continuously interpolate between $F(\beta_c)$ and $F(\beta_c + \epsilon)$. The super-logarithm of D is about 4.6, so for ε of order $2^{-205}$, Z is still about 0.76% away from 1. To get a super-logarithm of 5, we need a universe of diameter $2^{65536}$ Planck lengths. Even then, Z is still 0.58% away from 1. In this sense, the super-logarithmic singularity is indistinguishable from a discontinuity of the partition function, at least for all bit string ensembles that can be hosted by our universe.
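The numbers quoted above follow from (14) and (15) and can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `slog2` implements the linear-approximation super-logarithm (14), φ is taken as $\mathrm{slog}_2(\Lambda_3) - 3$ with $\Lambda_3 = 127$, $\lambda = 2^{\varphi-3}$, and $D \approx 2^{205}$ Planck lengths; all helper names are hypothetical. Because λ is not rounded here, the first deviation comes out as about 0.75% rather than the quoted 0.76%, which uses λ ≈ 0.19.

```python
import math


def slog2(x):
    """Base-2 super-logarithm in the 'linear approximation' of (14)."""
    if x <= 0:
        raise ValueError("slog2 is only defined here for x > 0")
    return x - 1 if x <= 1 else slog2(math.log2(x)) + 1


def slog2_pow2(e):
    """slog2(2**e) = slog2(e) + 1, avoiding the overflow of forming 2**e."""
    return slog2(e) + 1


PHI = slog2(127) - 3      # slog2(Lambda_k) - k, evaluated at Lambda_3 = 127
LAMBDA = 2 ** (PHI - 3)   # prefactor lambda = 2**(phi - 3) of (15)


def one_minus_Z(s):
    """Leading deviation 1 - Z(eps) = lambda * 2**(-s), with s = slog2(1/eps), cf. (15)."""
    return LAMBDA * 2 ** (-s)


print([round(slog2(x), 2) for x in (3, 7, 127)])   # [1.66, 2.57, 3.57]
print(round(PHI, 3), round(LAMBDA, 3))             # 0.573 0.186
s_universe = slog2_pow2(205)                       # assumed D = 2**205 Planck lengths
print(round(s_universe, 2), f"{one_minus_Z(s_universe):.2%}")   # 4.64 0.75%
print(slog2_pow2(65536), f"{one_minus_Z(5):.2%}")               # 5.0 0.58%
```

Working with $\mathrm{slog}_2(2^e) = \mathrm{slog}_2(e) + 1$ keeps the computation in ordinary floating point even for $D = 2^{65536}$.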
Singularity for Universal Turing Machines
In the previous section, we have discussed the singularity of the partition function (7) near ε = 0 for the non-universal counting machine. How does it compare with the singularity for a universal Turing machine?

Since it is generally an undecidable question whether a given Turing machine halts for a given program, for a UTM the function $h_l$ in (7) and Chaitin’s Ω are not computable by any halting program. Neither is the singularity of Z(ε) at the critical point computable. In fact, Z(ε) converges towards Ω more slowly than any computable function.

To see this, let us slightly modify the last step of the counting machine of section 4: if, in the i-th step, $p_{m_{i-1}} = 0$, the modified machine $\tilde T$ switches into a new mode: instead of writing $1^{n_{i-1}}$ on the work tape, it reads the next $\Sigma(n_{i-1})$ digits of the program p from the program tape, where Σ(n) is the busy-beaver function. $\tilde T$ writes those digits on the work tape and then halts. Formula (11) thus gets replaced by

$$\tilde Z_k(\beta) = \sum_{n_2,\dots,n_k}\ \sum_{N=0}^{2^{\Sigma(n_k)}-1} e^{-\beta(6 + n_2 + \dots + n_k + \Sigma(n_k))} \;\sim\; \sum_{n_2,\dots,n_k} 2^{-(6 + n_2 + \dots + n_k)}\, e^{-\epsilon\,\Sigma(n_k)}$$

Σ(n) is known to grow faster than any computable function as n → ∞. This implies that $\tilde Z(\epsilon)$ converges more slowly than any computable function to its critical value 1 for the modified machine $\tilde T$. Now, any UTM U simulates the modified machine $\tilde T$ if it is fed with all possible input programs. This implies that, for any UTM, $Z_U(\epsilon)$ converges more slowly than any computable function to its critical value Ω.

The conclusion for UTMs is thus similar to that for the counting machine: at the critical point, the phase transition is first-order but almost zeroth-order, and the average program size diverges. The average size of the output bit strings also diverges, as U simulates $\tilde T$, among other machines.
Strictly speaking, $Z_U(\epsilon)$ for a UTM is continuous at ε = 0, but in practice, the behavior is indistinguishable from a discontinuity. As we have seen, at least for all bit string ensembles within our universe, the super-logarithmic singularity of the counting machine already has such an effective discontinuity. In this respect, the counting machine provides a simplified toy model for our ensemble.

Outlook
We have shown that our ensemble of bit string histories has a first-order phase transition at the Chaitin point. This phase transition is almost zeroth-order in the sense that the free energy is continuous but "almost" discontinuous: it converges to its critical value (at which the partition function equals Chaitin’s Ω) more slowly than any computable function. At this critical point, the average size of the input programs and the average size of the output bit strings both diverge.

It is somewhat disappointing that the transition is first- and not second-order. Second-order phase transitions, such as the transition between water and steam at a temperature of 374 °C and a pressure of 218 atm, are particularly interesting, because there the statistical mechanical system typically has a continuum limit, in the sense that it is described in terms of some quantum field theory. The field represents the order parameter. In our case, a second-order transition would have been a clear indication that our ensemble of bit string histories (2) has a continuum limit at the Chaitin point, where it would be described by what could be called a "logical quantum field theory".

However, although the phase transition is first-order in β, there might still be a continuum limit once β = ln 2 is fixed. In particular, there might be second-order transitions in other parameters $\beta_i$ that multiply the infinitely many other operators by which the Gibbs factor (3) can be generalized. A few examples of such operators are the number of times the Turing machine flips a bit, changes its state, or switches the direction in which the head moves.
Other such operators may describe properties of the output bit string, such as its length, its number of "kinks" (adjacent bits that are not equal), etc.

As precedents of statistical mechanical systems that exhibit zeroth- or first-order phase transitions as a function of one parameter, and second-order phase transitions as a function of other parameters, consider the Ising model on a planar random surface [18], or the Kosterlitz-Thouless transition in the sine-Gordon model on a planar random surface [19]. In these systems, the free energy is discontinuous as a function of the two-dimensional cosmological constant μ. Still, once μ is fixed to its critical value, there is a continuum limit that is described by renormalizable two-dimensional field theories on random surfaces. These field theories are also known as "non-critical string theories".

It is tempting to take the analogy with string theory further: as the bit string on the work tape of the Turing machine evolves in computation time, its history can be recorded in a two-dimensional graph. An example is the graph in fig. 3 (center and left) of appendix A2, where a new line is added each time the head of the machine changes direction. In analogy with the world sheet that is swept out by superstrings [9], let us refer to such a two-dimensional graph as a "bit string world sheet". In appendix A6, it is shown how the time evolution of the bit string on such a discretized world sheet can be described by a computable Hamiltonian, and how the sum over input bit strings p in (1) turns this bit string on the work tape into a quantum mechanical object.

Suppose that our bit string ensemble (1) at the critical point β = ln 2 indeed has a continuum limit, where it is described by a theory of dynamical continuous strings.
The only known consistent (i.e., renormalizable, modular-invariant, tachyon-free) such theories are the various superstring theories [9], which are all related to each other by dualities [20]. This leads us to conjecture that the Chaitin point is described by superstring theory in the limit of very large bit strings, which would also suggest a curious answer to the question of what superstrings are made of: they might be purely mathematical objects, made of bits. Further work to support this argument is underway.

Acknowledgements
I would like to thank my brother Juergen Schmidhuber for arousing my interest in information theory. The current work was inspired by his idea of the "Great Programmer" [21]. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Appendix A: Review of Turing Machines
The appendix is organized as follows. A1 presents a Turing machine that contains only a work tape and no program tape. An example is given in A2. In A3, Chaitin’s definition of a Turing machine is recalled. Using the example of A2 as a building block, we realize the counting machine of section 4 in A4. A5 contains a supplementary argument to section 5. A6 constructs a computable transfer matrix/Hamiltonian for our ensemble (2).
A1. A simple Turing Machine
Our first example of a Turing machine contains a "work tape" that extends infinitely in both directions. It consists of cells that are blank, except for a finite, contiguous bit string of 0’s and 1’s (the "input string"). A blank cannot be written between 0’s or 1’s, so it is not equivalent to a third letter in addition to 0 and 1. Rather, blank areas mark the beginning and end of the string on the work tape. On the first cell of the input string sits a head, which can read, write, and move in both directions. The head can be in one of several states, labelled by 1, 2, 3, ..., H. At each step, the machine operates as follows:
1. it reads the bit on the work tape on which the head sits (0, 1, or a blank)
2. depending on that bit and on its internal state, it writes a 0, 1, or a blank in that cell on the work tape. It may only write a blank if the cell has a blank neighbour, to ensure that the binary string remains contiguous
3. it moves the head either one cell to the left or one cell to the right
4. it may or may not change its internal state
5. if and when it reaches the state "H", it halts
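The five-step cycle above can be made concrete with a small simulator. This is a sketch: the transition table is a hypothetical binary incrementer chosen for illustration, not the machine of Table 1, and the symbol "_" for a blank cell is our own convention. The simulator enforces the rule that a blank may only be written next to a blank, so the string on the work tape stays contiguous.

```python
BLANK = "_"

# Hypothetical transition table (not Table 1): a binary incrementer.
# (state, symbol read) -> (symbol written, head move: -1 left / +1 right, new state)
INCREMENT = {
    (1, "0"): ("0", +1, 1),
    (1, "1"): ("1", +1, 1),
    (1, BLANK): (BLANK, -1, 2),   # passed the right end of the string: turn around
    (2, "1"): ("0", -1, 2),       # propagate the carry to the left
    (2, "0"): ("1", -1, "H"),
    (2, BLANK): ("1", -1, "H"),   # extend the string by a leading 1
}


def run(table, input_string, max_steps=10_000):
    """Run a work-tape-only Turing machine as in A1; return (final string, steps)."""
    tape = dict(enumerate(input_string))   # cells missing from the dict are blank
    head, state, steps = 0, 1, 0
    while state != "H":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = tape.get(head, BLANK)                       # 1. read
        write, move, new_state = table[(state, symbol)]
        if write == BLANK:                                   # 2. write
            if tape.get(head - 1, BLANK) != BLANK and tape.get(head + 1, BLANK) != BLANK:
                raise ValueError("a blank may only be written next to a blank")
            tape.pop(head, None)
        else:
            tape[head] = write
        head += move                                         # 3. move
        state = new_state                                    # 4./5. new state, maybe H
        steps += 1
    cells = range(min(tape), max(tape) + 1) if tape else range(0)
    return "".join(tape[i] for i in cells), steps


print(run(INCREMENT, "1011"))   # ('1100', 8)
print(run(INCREMENT, "111"))    # ('1000', 8)
```

Storing only non-blank cells in a dictionary mirrors the picture of a finite bit string on an otherwise blank, two-sided infinite tape.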
A2. An example
As an example, consider a Turing machine with 5 states 1, 2, 3, 4, H. The first six columns of table 1 define how this particular machine behaves. Depending on the input bit it reads (left column) and the state it is in (top row), the machine writes 0, 1 or a blank, moves left (−1) or right (+1), and then switches to a new state.

First, let the input string be "01". Fig. 3 (left) shows a two-dimensional graph of the evolution of the bit string, with a new row appended for each time step. The machine starts in state 1 on the first bit of the work tape, which is 0. It then:
• writes a 0, remains in state 1, and moves right to the next bit, whose value is 1;
• writes a 1, remains in state 1, and moves right to the next bit, which is blank;
• switches to state 2, and moves back left to the previous bit, whose value is 1;
• overwrites it with 0, switches to state 3, and moves left, and so on.

Table 1
Operation           | Work bit | Current state | Current state & program bit
                    |          | 1  2  3  4    | 5 … 8
Write on work tape  | 0        | 0  1  0  0    | -  -  2  1  1  1  2  1  2
                    | 1        | 1  0  1  1    | 0  1  2  1  1  1  2  1  2
                    | 2        | 2  2  2  0    | 2  2  2  1  1  1  2  1  2
Set the new state   | 0        | 1  2  3  4    | -  -  8  7  H  1  H  H  6
                    | 1        | 1  3  4  4    | 5  5  8  7  H  1  H  H  6
                    | 2        | 2  H  …       | …
(In the work bit column, 2 denotes a blank.)

[…] b in binary code, then the output string of this particular Turing machine always consists of b […] b = 1001 = 9. By "condensed", we mean that each time step now corresponds to a new square, rather than a new row, so that the computation time is the area of the graph. The head of the machine moves along the rows of the graph, and each time it changes direction, a new row is appended. For completeness, fig. 3 (right) also shows the state of the machine at each point in the computation. The machine moves right along the light grey rows (state 1) and left along the other (blue) rows (states 2, 3, 4).

Figure 3: A simple Turing machine

Consider our brain as an example of a universal Turing machine (UTM), that is, a machine that can simulate all other Turing machines. Given the above table for any Turing machine, we can read the table and use it to simulate the machine. If you have followed the exercise, you have just done this. Essentially, the table becomes part of the input, rather than being hard-coded into the Turing machine. For a more specific example of a universal Turing machine, see, e.g., [22]. A UTM is arbitrarily flexible. It can quickly compute strings that take a long time to compute with another Turing machine, or that another machine cannot compute at all.

There are many alternative but equivalent definitions of Turing machines. E.g., one can introduce other symbols in addition to 0 and 1, or more states, or one can work with several parallel work tapes instead of just one.

A3. Chaitin's Machine
In Chaitin's definition, there is a read-only "program tape" of finite length, in addition to the work tape. The program tape begins with a blank cell followed by a finite bit string of 0's and 1's, the "program". Another head, the "program head", sits on the program tape. Initially, it sits on the blank cell. At each step, the machine performs the following operations in addition to steps 1-5 of subsection A1:
• initial step: it reads the bit on the program tape on which the program head sits;
• last step: it either moves the program head one cell to the right or leaves it where it is.
The machine either halts or runs forever without reading any more program bits. As a result, the set of input programs is prefix-free, where each program consists of the bits from the first to the last one read by the machine.

A4. The Counting Machine
As an example within Chaitin's framework, we present an implementation of the counting machine of section 4. We begin with the Turing machine of appendix A2 and add a finite read-only program tape, on which the programs of section 4 are written. We start with a work tape that is initially blank.

We first add three additional states 6, 7, 8. Their role is to read the first two bits on the program tape and get the machine started (steps 1 and 2 of section 4). The operations in states 6, 7, 8 depend only on the program bit on which the program head sits, and not on the work bit on which the work head sits. They are defined in table 1. The machine is initially in state 8 (in states 6 and 7, the program bit is then never 2).

Next, we slightly modify state 2 in table 1. Suppose the machine is in state 2 and the head on the work tape sits on a blank. The machine then switches to state H only if the head of the program tape sits on a 0. If the program bit is 1, it moves to a new state 5 instead (i.e., the bold-faced "H" in table 1 is replaced by 5). The operations of the new state 5 are also defined in table 1. Its role is to write a new portion from the program tape onto the work tape, thereby overwriting the contiguous sequence of 1's. Its operations depend both on the current work bit and on the current program bit. The machine is never in state 5 when the program head is on a blank or when the work head is on a 0. It is straightforward to verify that this machine indeed represents the counting machine of section 4.

A5. A Supplementary Argument
In section 5, we want to evaluate the K-th part of the partition function

$$\hat Z_K(\epsilon) = \sum_{n_2,\dots,n_K} 2^{-(6+n_2+\dots+n_{K-1})}\, e^{-\epsilon n_K} \qquad \text{in the case} \qquad \frac{1}{\epsilon} = \Lambda_K, \qquad (19)$$

where $K \geq$ […], $n_2$ runs from 4 to 7, $n_{i+1}$ runs from $2^{n_i-1}$ to $2^{n_i}-1$, and $\Lambda_K$ is the largest possible value of $n_K$. Specifically, $\Lambda_3 = 127$ and $\Lambda_4 = 2^{127}-1$. Therefore, $\Lambda_{K-1} \approx \lg(\Lambda_K) = \lg(1/\epsilon)$ to high accuracy for $K \geq 4$. Defining $M = 2^{n_{K-1}}$, the sum over $n_K$ yields

$$\hat Z_K(\epsilon) = \sum_{n_2,n_3,\dots,n_{K-1}} 2^{-(6+n_2+\dots+n_{K-2})}\, A(n_{K-1},\epsilon), \qquad (20)$$

$$A(n_{K-1},\epsilon) = \frac{1}{2M}\sum_{n_K=M/2}^{M-1} e^{-\epsilon n_K} \approx \frac{1}{2M\epsilon}\left(e^{-M\epsilon/2} - e^{-M\epsilon}\right). \qquad (21)$$

$A(x,\epsilon)$ is plotted in fig. 4. It is a monotonically decaying function of $x$ with

$$A(x,\epsilon) \to \begin{cases} 1/4 & \text{for } x \ll \lg(1/\epsilon) \sim \Lambda_{K-1},\\ 0 & \text{for } x \gg \lg(1/\epsilon) \sim \Lambda_{K-1}. \end{cases} \qquad (22)$$

$n_{K-1}$ runs from $2^{n_{K-2}-1}$ to $2^{n_{K-2}}-1$. A differs significantly from 1/4 only for a few values of $n_{K-1}$ near $\Lambda_{K-1}$, and such values occur only for the maximal value of $n_{K-2}$. Even in this case, the contribution of these differences is
• small (of order 1%) for K = 4: for the highest value $n_2 = 7$, $n_3$ runs from 64 to 127. Only the last few of these $n_3$ contribute significantly to the difference;
• practically zero for K ≥ 5: e.g., for K = 5 and the highest value $n_3 = 127$, $n_4$ runs from $2^{126}$ to $2^{127}-1$. Only a tiny portion of these $n_4$ contribute to the difference.

Figure 4: The function A(x)

As long as K ≥ 4, we can thus approximate A by 1/4 for $x \leq \Lambda_{K-1}$. This yields the result $\hat Z_K \propto 2^{-K}$ claimed in section 5. An analogous argument, not repeated here, shows that we can approximate A by 0 for $x > \Lambda_{K-1}$ to obtain $\hat Z_{K+1} = 0$, as long as $K \geq$ […].
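The approximations in (21) and (22) can be checked numerically. One can evaluate A(x, ε) both as the exact finite sum and via the integral approximation in (21). The exact sum can be computed in closed form as a geometric series. One can then compare (20) for K = 4 with the version in which A is replaced by 1/4. The Python sketch below does this, assuming the index ranges stated above: n_2 from 4 to 7, n_{i+1} from 2^{n_i−1} to 2^{n_i} − 1, and 1/ε = Λ_4 = 2^{127} − 1. The function names are illustrative only. Only the block n_2 = 7 contributes to the difference. The printed relative error should come out at the sub-percent level, consistent with the "of order 1%" estimate above.

```python
import math


def A_exact(x: int, eps: float) -> float:
    """(1/(2M)) * sum_{n=M/2}^{M-1} exp(-eps*n) with M = 2**x, summed as a geometric series."""
    M = 2 ** x
    lo, count = M // 2, M // 2                 # n runs over the M/2 values M/2, ..., M-1
    # r**lo * (1 - r**count) / (1 - r) with r = exp(-eps); expm1 keeps tiny eps accurate
    s = math.exp(-eps * lo) * (-math.expm1(-eps * count)) / (-math.expm1(-eps))
    return s / (2 * M)


def A_approx(x: int, eps: float) -> float:
    """Right-hand side of eq. (21): the sum replaced by an integral."""
    M = 2 ** x
    return (math.exp(-M * eps / 2) - math.exp(-M * eps)) / (2 * M * eps)


def z_hat_4(eps: float, A) -> float:
    """Eq. (20) for K = 4: sum over n2 in 4..7 and n3 in [2**(n2-1), 2**n2 - 1]."""
    return sum(2.0 ** -(6 + n2) * A(n3, eps)
               for n2 in range(4, 8)
               for n3 in range(2 ** (n2 - 1), 2 ** n2))


eps = 1 / (2 ** 127 - 1)                       # 1/eps = Lambda_4 = 2**127 - 1
exact = z_hat_4(eps, A_exact)
quarter = z_hat_4(eps, lambda x, e: 0.25)      # the approximation A -> 1/4
rel_err = (quarter - exact) / quarter
print(f"relative error of A -> 1/4 for K = 4: {rel_err:.2%}")
```

The exact geometric sum and the integral form in (21) agree to within a relative factor of about 1 + ε/2. The latter is therefore an excellent approximation at these values of ε.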
A6. Hamiltonian Formulation
The evolution of the bit string in computation time can be represented by a two-dimensional graph, as in fig. 3 (center and left) of appendix A2. In section 9, this is called the "bit string world sheet". Here we show how the "time" evolution on such discretized world sheets can be described by a computable Hamiltonian.

To this end, let us assume that the head of the Turing machine changes direction at the computation time steps $0 < t_1 < t_2 < \dots$. At computation time $t_T$, the "bit string state" $|S_T\rangle$ can be described by a 3-tuple $|S_T\rangle = |b_T, s_T, k_T\rangle$ with "world-sheet time" $T \in \{1, 2, 3, \dots\}$. Here $b_T$ is the bit string at time $t_T$, $s_T$ is the state of the Turing machine at time $t_T$, and $k_T$ is the position of the head at time $t_T$. ($k_T = 1$ means that the head sits on the first non-blank bit of the bit string.) Given the state $|S_T\rangle$, the next state $|S_{T+1}\rangle$ is uniquely determined, as long as no input bits are read between $t_T$ and $t_{T+1}$. If $s_T$ = "Halt", we define $|S_{T+1}\rangle = |S_T\rangle$. Note that $k_{T+1} = \pm\infty$ if the head keeps moving in the same direction without halting. However, for a given Turing machine and any finite bit string $b_T$, it is always decidable whether this will happen, so $k_{T+1}$ is still computable.

Some states will prompt for an input bit to be read from the input programs p in (1). Each input bit represents a random variable with value 0 or 1. It is read at a particular cell of the bit string world sheet. At that point, it changes the evolution of the bit string in one of two possible ways, both of which are computable. This turns the bit string state into a quantum mechanical superposition of states. E.g., if the counting machine is fed with all possible input programs, its output is the state

$$|S_\infty\rangle = \sum_{N=0}^{\infty} \psi_N\, |N, \text{Halt}, \dots\rangle \qquad \text{with} \qquad \psi_N = 2^{-l_N/2} \;\Rightarrow\; \sum_N |\psi_N|^2 = 1,$$

where $l_N$ is defined in (8). For any Turing machine, the evolution of the superposition $|S_T\rangle$ from world-sheet time T to T + 1 is computable. Thus, it can be described by a computable transfer matrix or, equivalently, by a computable Hamiltonian acting on the Hilbert space spanned by all possible bit string states.
Of course, for general Turing machines, the halting problem re-appears. Not all components of the superposition $|S_T\rangle$ will have $s_T$ = "Halt" as $T \to \infty$, i.e., there may not exist a limit state $|S_\infty\rangle$.
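The structure of the transfer matrix can be illustrated with a toy model. The Python sketch below rests on several assumptions that are not from the paper:
• basis states |b, s, k⟩ are encoded as tuples, and amplitudes are real numbers;
• each world-sheet time step reads at most one input bit, whereas in the text a step runs until the head changes direction;
• the toy machine is hypothetical, not the counting machine.
Each value of an input bit carries amplitude 1/√2, so the squared amplitude of a halted branch reproduces the weight 2^{-l} of its program in (1). Because transfer is linear in the amplitudes, it acts on the superposition as a (computable) matrix.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical encoding of a basis state |b, s, k>: (bit string, machine state, head position)
State = Tuple[str, str, int]
Superposition = Dict[State, float]   # basis state -> real amplitude


def transfer(psi: Superposition,
             step: Callable[[State, str], State],
             needs_input: Callable[[State], bool]) -> Superposition:
    """One world-sheet time step T -> T+1; linear in psi, i.e. a transfer matrix."""
    out: Superposition = {}
    for s, amp in psi.items():
        if s[1] == "Halt":
            branches = [(s, amp)]                      # |S_{T+1}> = |S_T> once halted
        elif needs_input(s):
            # an input bit is read: each value 0/1 carries amplitude 1/sqrt(2)
            branches = [(step(s, bit), amp / math.sqrt(2)) for bit in "01"]
        else:
            branches = [(step(s, ""), amp)]            # deterministic evolution
        for t, a in branches:
            out[t] = out.get(t, 0.0) + a
    return out


# Toy machine (hypothetical): in state "q" it reads a program bit; 1 appends a 1
# to the work string, 0 halts. Program 1^N 0 has length l = N + 1.
def toy_step(s: State, bit: str) -> State:
    b, q, k = s
    return (b + "1", q, k) if bit == "1" else (b, "Halt", k)


def toy_needs_input(s: State) -> bool:
    return s[1] == "q"


psi: Superposition = {("", "q", 1): 1.0}
for T in range(10):
    psi = transfer(psi, toy_step, toy_needs_input)

norm = sum(a * a for a in psi.values())
halted = sum(a * a for s, a in psi.items() if s[1] == "Halt")
print(norm, halted)
```

After 10 steps, the total norm is still 1. Each halted component |1^N, Halt, 1⟩ carries amplitude 2^{-(N+1)/2}, i.e., 2^{-l/2}. The halted weight, however, is only 1 − 2^{-10}: at any finite T, a non-halted component remains. This is a small-scale version of the issue just described.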