Chaitin's Omega and an Algorithmic Phase Transition

Abstract

We consider the statistical mechanical ensemble of bit string histories that are computed by a universal Turing machine. The role of the energy is played by the program size. We show that this ensemble has a first-order phase transition at a critical temperature, at which the partition function equals Chaitin's halting probability Ω . This phase transition has curious properties: the free energy is continuous near the critical temperature, but almost jumps: it converges more slowly to its finite critical value than any computable function. At the critical temperature, the average size of the bit strings diverges. We define a non-universal Turing machine that approximates this behavior of the partition function in a computable way by a super-logarithmic singularity, and discuss its thermodynamic properties. We also discuss analogies and differences between Chaitin's Omega and the partition functions of a quantum mechanical particle, a spin model, random surfaces, and quantum Turing machines. For universal Turing machines, we conjecture that the ensemble of bit string histories at the critical temperature has a continuum formulation in terms of a string theory.


Christof Schmidhuber
Zurich University of Applied Sciences, School of Engineering
Technikumstrasse 9, 8401 Winterthur, Switzerland

September 1, 2020

Abstract

We consider the statistical mechanical ensemble of bit string histories that are computed by a universal Turing machine. The role of the energy is played by the program size. We show that this ensemble has a first-order phase transition at a critical temperature, at which the partition function equals Chaitin's halting probability Ω. This phase transition is almost zeroth-order in the sense that the free energy is continuous near the critical temperature, but almost jumps: it converges more slowly to its finite critical value than any computable function. We define a non-universal Turing machine that approximates this behavior of the partition function in a computable way by a super-logarithmic singularity, and discuss some of its statistical mechanical properties. For universal Turing machines, we conjecture that the ensemble of bit string histories at the critical temperature has a continuum formulation in terms of string theory.

Keywords:

Chaitin's Omega, Complexity, Turing Machine, Algorithmic Thermodynamics, Phase Transition, String Theory

Introduction

In 1975, G. Chaitin [1] introduced a constant associated with a given universal Turing machine $U$ [2] that is often called the "halting probability" $\Omega$. It is computed as a weighted sum over all prefix-free input programs $p$ for $U$ that halt:

$$\Omega_U = \sum_{\text{halting } p(U)} 2^{-l(p)} = \sum_{l=1}^{\infty} N(l)\, 2^{-l} \tag{1}$$

where $p$ is a "program" (a bit string made up of 0's and 1's), $l(p)$ is its length (the number of bits), and $N(l)$ is the number of prefix-free programs of length $l$ for which $U$ halts. Turing machines and prefix-free bit strings are briefly reviewed in section 2 and in the appendix. For a general introduction to information theory, see [3].

Most of the discussion around $\Omega$ has focused on its first few digits, which are determined by the function $N(l)$ for small program length $l$. As every mathematical hypothesis can be translated into a halting problem (the question whether a given program halts for a given Turing machine), many long-standing mathematical problems could be solved if only one could compute $\Omega$ digit by digit. Unfortunately, $\Omega$ is not computable by any halting program, precisely because knowing $\Omega$ would imply that one could decide mathematical problems that are known to be undecidable in the sense of Gödel's incompleteness theorem [4]. Moreover, even when the first few digits of $\Omega$ are computable for a given universal Turing machine $U$, they are not universal: they depend on the choice of $U$.

In this note, we will therefore not be concerned with the contribution of short programs to $\Omega$, nor will we dwell much on the issues of incompleteness and undecidability. Instead, we will focus on the contribution of very long programs to $\Omega$, i.e., on the behaviour of $N(l)\cdot 2^{-l}$ as $l \to \infty$.
More precisely, following [5, 6, 7], in a generalization of (1), we consider the statistical mechanical ensemble of bit string histories with partition function

$$Z_U(\beta) = \sum_{\text{halting } p(U)} \exp\{-\beta \cdot l(p)\} \quad \text{with} \quad \beta = \frac{1}{kT}, \tag{2}$$

where $k$ is the Boltzmann constant and $T$ the temperature as usual in statistical mechanics (see, e.g., [8] for a review of statistical mechanics and field theory). We will study $Z$ as a function of $\beta = \beta_c + \epsilon$ in the vicinity of the "Chaitin point" $\beta_c = \ln 2$ with $\epsilon \ll 1$ ($\beta_c$ is the log of the alphabet size).

We find that the "Chaitin point" $\beta = \beta_c = \ln 2$ corresponds to a critical temperature, at which a first-order phase transition occurs. This phase transition has very curious properties. In particular, the free energy is almost discontinuous: it converges more slowly to its finite critical value than any computable function. We illustrate this type of transition in a toy model, namely a non-universal Turing machine (the "counting machine") that approximates this behavior of the free energy by a super-logarithmic singularity.

In the outlook, we discuss the fascinating question whether there might be a continuum description of our bit string ensemble at the Chaitin point in terms of a (super-) string theory [9], in which the two-dimensional string world-sheet is spanned by the bit string and the computation time.

The generalization (2) of Chaitin's halting probability was previously studied by Tadaki [5], who investigated the degree of randomness of the real number $Z_U(\beta)$, written in binary form. The relation with statistical mechanics was pointed out by Calude and Stay [6], who also discussed variants of (2), in which the sum runs over general (as opposed to only prefix-free) programs (the partition function then diverges at $\beta = \ln 2$, instead of converging to Chaitin's $\Omega$).
The statistical mechanical approach was formulated more mathematically by Tadaki in [10].

Baez and Stay [7] defined the corresponding "algorithmic" versions of the specific heat and other thermodynamic quantities. Moreover, they formally extended the Gibbs factor (2) by including two other terms, corresponding to the logarithm $E(p)$ of the computation time of the Turing machine, and to the expectation value $N(p)$ of the output bit string (interpreted as a natural number in binary form):

$$\exp\{-\beta \cdot l(p) - \beta_1 \cdot E(p) - \beta_2 \cdot N(p)\}. \tag{3}$$

Algorithmic versions of the Carnot cycle were also discussed in [7], and it was pointed out that the partition function has a singularity at $\beta = \ln 2$, $\beta_1 = \beta_2 = 0$. Tadaki [11, 12] discussed computational aspects of this "algorithmic phase transition".

Our paper complements this previous work by studying the nature of this algorithmic phase transition from a more physical point of view. In particular, a key question about phase transitions is whether they are first-order or second-order. As mentioned, we resolve this in sections 7 and 8 by showing that this one is an exotic first-order transition.

In a separate line of work (see [13, 14, 15] and references therein), Manin considers a partition function similar to (2) and relates it to error-correcting codes, to Zipf's law, and to renormalization in field theory (I thank D. Murfet for pointing this out to me). While (2) sums over all prefix-free input bit strings, the partition function of [15] is defined in terms of a sum over all output bit strings $B$:

$$\tilde{Z}_U(\beta) = \sum_{B} \exp\{-\beta \cdot K(B)\}, \tag{4}$$

Here, $K(B)$ is the Kolmogorov complexity of $B$, i.e., the length of the shortest input program that makes $U$ compute $B$. $K(B)$ is not computable for general bit strings $B$. To relate (4) to (2), consider a generalization $K_\beta(B)$ of $K(B)$, defined in terms of a sum over all programs $p(B)$ that halt and whose output bit string is $B$:

$$K_\beta(B) = -\frac{1}{\beta} \ln \sum_{p(B)} \exp\{-\beta \cdot l(p)\}. \tag{5}$$

In the limit $\beta \to \infty$, where the shortest program dominates the sum, $K_\beta(B)$ converges to the Kolmogorov complexity $K(B)$. Summing over all $B$, we see that (2) can be regarded as a variant of (4), in which $K(B)$ is replaced by $K_\beta(B)$.

It will be interesting to try to extend the Hamiltonian described in [15] to the critical point $\beta = \beta_c$, replacing the "energy" $K(B)$ in (4) by $K_{\beta_c}(B)$ (see also appendix A6, which discusses the evolution of bit strings in a different "time", called "world-sheet time"). One of the many issues to be addressed in this context is that it might turn out to be an undecidable problem whether or not the energy spectrum has a vanishing mass gap, corresponding to a second-order phase transition [16, 17].

Turing Machines and Prefix-free Programs

We follow Chaitin's definition of a Turing machine, which is reviewed in appendix A. A bit string $y$ is called a prefix of a bit string $x$ if $x$ can be written as a concatenation $x = yz$ with a third bit string $z$. A set of bit strings is called prefix-free if no bit string is a prefix of another. For the current argument, it is sufficient to think of a Turing machine $T$ as a map ("computation") from a set $P$ of prefix-free input bit strings $p \in P$ ("programs"), for which the computation halts, to the set $O$ of arbitrary output bit strings of any length:

$$T: \; p \in P \to T(p) \in O$$

The output bit strings are written on a "work tape" that extends infinitely in both directions. The computation manipulates them until it halts. The prefix-free input bit strings are written on a finite read-only "program tape" (see appendix A for details). The prefix-free input programs $p$ are what we sum over in (2), and their lengths $l$ play the role of the energy in the Boltzmann factor.

Figure 1: Tree representation of prefix-free bit strings

One may represent a set $X$ of prefix-free bit strings by a tree (fig. 1). The vertices in the $(l+1)$-th line (or $l$-th generation) of the graph represent the $2^l$ binary numbers $b_l$ with $l$ digits. Branching to the left appends a 0, branching to the right appends a 1 at the end of $b_l$ to yield the next generation of $b_{l+1}$. At each vertex, the corresponding number $b_l$ is either added to the set $X_l \subset X$ of prefix-free programs $p_l$ of size $l$ (red dots) or not (black dots). Black dots are prefixes (parents, grand-parents, ...) of red dots and give birth to two children; we assume that each black dot is the prefix of at least one red dot. Red dots have no children. In the figure, white dots represent bit strings that are never born.

Let $n_l$ be the number of red dots (prefix-free programs) of length $l$. Let $m_l$ be the number of black dots (prefixes) of length $l$. Let $w_l = 2^l - n_l - m_l$ be the number of white dots of length $l$.
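The red, black and white dots of the tree representation described above can be counted directly for a small example. The following is a minimal sketch, not taken from the paper: the function names `is_prefix_free` and `dot_counts` and the example set `X` are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_prefix_free(programs):
    """Return True if no string in `programs` is a prefix of another one."""
    ordered = sorted(set(programs))
    # In lexicographic order, an extension of a string s sorts directly
    # after s, so it is enough to compare neighbouring entries.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def dot_counts(programs, length):
    """Return (n_l, m_l, w_l) for generation l = `length` of the binary tree.

    n_l: red dots   (members of the prefix-free set)
    m_l: black dots (proper prefixes of some member)
    w_l: white dots (everything else, i.e. 2**l - n_l - m_l)
    """
    programs = set(programs)
    red = black = white = 0
    for bits in product("01", repeat=length):
        s = "".join(bits)
        if s in programs:
            red += 1
        elif any(p.startswith(s) for p in programs):
            black += 1
        else:
            white += 1
    return red, black, white


if __name__ == "__main__":
    X = {"0", "10", "110", "111"}  # hypothetical example set
    print(is_prefix_free(X))
    for l in range(1, 4):
        print(l, dot_counts(X, l))
```

For each generation the three counts add up to $2^l$, in line with the definition of $w_l$.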
We define the percentages $Q_l = w_l \cdot 2^{-l}$ of white dots and $P_l = n_l \cdot 2^{-l}$ of red dots in the $l$-th generation and get

$$P_l = Q_{l+1} - Q_l \quad \text{with} \quad \lim_{l \to \infty} Q_l = \sum_{l=1}^{\infty} P_l = \sum_{p_l \in X} 2^{-l} = 1, \tag{6}$$

where the last equation states that "Kraft's inequality is satisfied with equality" (see [3]). As an example of a set of prefix-free programs, consider "Fibonacci coding": a child is a member of $X$ if its last 2 digits are "1" or, in a slight generalization, if its last $N$ digits are "1". In this case, one easily verifies that $P_l$ falls off exponentially as $l \to \infty$.

For a given Turing machine $T$, there are two kinds of red dots: $\tilde{n}_l$ halting programs and $n_l - \tilde{n}_l$ non-halting programs. We denote by $h_l = \tilde{n}_l / n_l$ the fraction of programs in the $l$-th generation that halt. Then the partition function (2) can be written as

$$Z_U(\beta) = \sum_{l=1}^{\infty} P_l \cdot h_l \, e^{-\epsilon \cdot l} \quad \text{with} \quad \epsilon = \beta - \beta_c, \; \beta_c = \ln 2. \tag{7}$$

At the critical point $\beta = \beta_c = \ln 2$, our partition function (7) is Chaitin's $\Omega$ (1):

$$Z_U(\beta_c) = \sum_{l=1}^{\infty} P_l \cdot h_l = \Omega < 1.$$

For $\beta < \beta_c$, the partition function diverges, as long as $P_l \cdot h_l$ falls off more slowly than exponentially as $l \to \infty$, which is the case for any universal Turing machine $U$ (see below). How exactly does $Z_U$ approach $\Omega$ as $\beta$ approaches $\beta_c$ from above? Let us first discuss to what extent this singularity near $\beta = \beta_c$ is universal, i.e., independent of $U$.

A universal Turing Machine ("UTM") $U$ is one that can simulate any other Turing machine $T_i$ in the following sense: there is a finite bit string ("translator program") $c_i$ such that for each program $p$, $U(c_i p) = T_i(p)$. I.e., if $p$ makes $T_i$ compute an output bit string, the concatenation $c_i p$ makes $U$ compute the same output bit string. Let $C_i$ be the finite length of the program $c_i$. Then the partition function $Z_U(\beta)$ of the UTM contains the partition function $Z_i(\beta)$ of $T_i$ as a subset:

$$Z_U(\beta) \ge e^{-\beta C_i} \cdot Z_i(\beta)$$

This applies to all Turing machines $T_i$.
Thus, as $\epsilon = \beta - \beta_c \to 0$, the Turing machine $T_i$ with the strongest singularity (i.e., with the largest derivative $Z_i'(\epsilon)$ at $\epsilon \sim 0$) dominates the singularity of the partition function $Z_U(\beta)$ at $\beta = \ln 2$. As this applies to all $U$, we conclude that this singularity is universal, i.e., independent of the choice of the UTM, up to an overall pre-factor $2^{-C_i}$.

Our ensemble (2) includes only programs that halt. This makes it intractable, as it is generally an undecidable question whether a given Turing machine halts for a given program. Thus, the factor $h_l$ in (7), Chaitin's $\Omega$, and the partition function $Z_U(\beta)$ are actually not computable by any halting program. These issues around undecidability and non-computability, fascinating as they may be, will not play a major role here. It is clear from (7) that the strongest singularity in $\epsilon$ corresponds to the product $P_l \cdot h_l$ that decays most slowly as $l \to \infty$. Thus, the non-computable factor $h_l$ can only make this singularity weaker. We will therefore first discuss non-universal Turing machines $T_i$ for which all programs halt (i.e. $h_l = 1$), and then return to universal Turing machines in the last section.

As an example of a function $P_l$ that converges more slowly than that from Fibonacci coding, let the $N$ of Fibonacci coding grow with the program length: $N(l) = \mathrm{int}(1 + \lg l)$, where $\lg \equiv \log_2$. In this case, it is not difficult to see that $P_l$ decays like a power of $l$:

$$P_l \propto l^{-\alpha} \text{ as } l \to \infty \text{ with } \alpha > 1 \quad \Rightarrow \quad 1 - Z(\epsilon) \propto \epsilon^{\alpha - 1} \text{ as } \epsilon = \beta - \beta_c \to 0.$$

We now define a Turing machine $T$ that we call the "counting machine", corresponding to a particular set of prefix-free programs, that always halts. We will then show that its partition function (7) has a computable, super-logarithmic singularity that, for our purposes, serves as a good model of the singularity of UTM's.

Let us first describe the output of the machine $T$. Given any infinite input bit string $p$ on the program tape, $T$ writes a number $N$ of 1's in a row on its otherwise blank work tape and then halts.
We call $p_N$ the prefix of $p$ consisting only of those bits of $p$ that have been read by the time the machine halts, i.e., the machine halts on the last bit of $p_N$. This defines a set $P$ of prefix-free input programs $p_N \in P$. We will construct $T$ such that any number $N \in \mathbb{N}$ of 1's appears as the output bit string of exactly one such $p_N$, namely:

$$\begin{aligned}
&\text{for } N < 3: && p_0 = 00, \; p_1 = 01, \; p_2 = 10 && \text{with length } l_N = 2 \\
&\text{for } N = 3: && p_3 = 110 && \text{with length } l_N = 3 \\
&\text{for } N > 3: && p_N = 11 \, n_2 \dots n_k \, N \, 0 && \text{with length } l_N = 6 + n_2 + \dots + n_k
\end{aligned} \tag{8}$$

where $n_k$ is the binary length of $N$, $n_{k-1}$ is the binary length of $n_k$, and so on, until a length $n_1 = 3 = 11_2$ is reached. Inside $p_N$, the numbers $n_2, \dots, n_k$ and $N$ are written in binary form. For $N > 2$, $p_N$ begins with "11" and ends with "0". The number of iterations $k$ can be recursively expressed as follows:

$$k(N) = \begin{cases} 0 & \text{if } N < 4 \\ 1 + k(\mathrm{int}(1 + \lg N)) & \text{if } N \ge 4 \end{cases} \tag{9}$$

so that $k(4) = k(7) = 1$, $k(8) = k(127) = 2$, $k(128) = 3$, and so on.

Next, we describe how $T$ reconstructs $N$ from $p_N$. Given an infinite string $p$ on the program tape, $T$ proceeds as follows:

1. $T$ reads the first two digits $n_1 = p_1 p_2$ of $p$. If $n_1 = 00_2$, $T$ leaves the work tape blank and halts; if $n_1 = 01_2$, $T$ writes 1 on the work tape and halts; if $n_1 = 10_2$, $T$ writes 11 on the work tape and halts. If $n_1 = 11_2$, $T$ reads the next digit $p_3$ and defines the new integer $m_1 = 3$.

2. If $p_{m_1} = p_3 = 0$, $T$ writes $1^{n_1} = 111$ on the work tape, then halts. If $p_{m_1} = p_3 = 1$, $T$ reads the next $n_1 = 3$ digits $p_{m_1+1}, \dots, p_{m_1+n_1}$ of $p$, i.e., $p_4, p_5, p_6$. $T$ defines $m_2 = m_1 + n_1 = 6$ and $n_2 = p_3 p_4 p_5$ (the concatenation with $p_3$ but without $p_6$).

...

i. In the $i$-th step, if $p_{m_{i-1}} = 0$, $T$ writes $1^{n_{i-1}}$ on the work tape and halts. If $p_{m_{i-1}} = 1$, $T$ reads in the next $n_{i-1}$ digits. It defines $m_i = m_{i-1} + n_{i-1}$ and the concatenation $n_i = p_{m_{i-1}} \dots p_{m_i - 1}$ and moves on to step $(i+1)$, until $T$ halts. If $T$ halts in the $i$-th step, then $i$ is related to $k$ of (9) by $k = i - 2$, and $N = n_{i-1}$.

E.g., in the third step, $p_3 = 1$ and $m_2 = 6$. Suppose $n_2 = p_3 p_4 p_5 = 101_2 = 5$. If the sixth digit $p_6$ of $p$ is 0, $T$ writes a sequence of $n_2 = 5$ 1's on the tape and halts. In this case, $k = 1$ and $N = 5$. However, if $p_6 = 1$, $T$ reads in the next five digits $p_7 \dots p_{11}$, defines $m_3 = m_2 + n_2 = 11$ and $n_3 = p_6 \dots p_{10}$, and moves on to step 4.

As an example, consider the input bit string 11100110100. Then $n_1 = 11_2$, so in step 2, $T$ defines $m_2 = 6$, $n_2 = 100_2 = 4$. In step 3, since $p_6 = 1$, $T$ sets $m_3 = 10$ and reads in the 4-digit number $n_3 = 1101_2 = 13$. In step 4, since the next digit $p_{10}$ is a 0, $T$ writes $N = 13$ digits 1 in a row and halts. Only the first 10 digits 1110011010 of the input bit string constitute an element of $P$. More generally, if the counting machine halts in step $i$, the first $m_{i-1}$ digits of the input string constitute an element of $P$. The first elements are:

$$P = \{00, \; 01, \; 10, \; 110, \; 111000, \; 111010, \; 111100, \; 111110, \; 1110010000, \; \dots\}$$

One may verify that any number $N$ of 1's in a row appears as the output bit string of exactly one program $p_N \in P$, as claimed above. It is also clear that $P$ is complete in the sense that it cannot be enlarged by any additional bit string without spoiling its property of being prefix-free. As a result, (6) implies that $Z(\beta_c) = 1$.

Although the counting machine $T$ only produces bit strings that are trivial in the sense that they contain only 1's, variants of the counting machine can be used to make them less trivial in subsequent steps. E.g., in a second step, one variant of $T$ may generate all integers $k$ in binary form, such as $k = 20 = 10100_2$, and then overwrite the 1's by repeating $k$ until the bit string ends: "1010010100...". In a third step, another variant of $T$ may generate all integers $m$, and then create "kinks" on the bit strings resulting from step 2, by flipping all bits after the $m$-th digit.
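Returning to the counting machine $T$ itself, its decoding steps and the program format (8) can be sketched in Python as follows. This is an illustrative reading of the rules above rather than the implementation of appendix A4: the function names `encode` and `run_counting_machine` are hypothetical, and a finite Python string stands in for the infinite program tape, so the decoder raises an error if the input ends before the machine halts.

```python
def encode(N):
    """Return the prefix-free program p_N of (8) for the number N >= 0."""
    if N < 3:
        return format(N, "02b")          # 00, 01, 10
    chunks = []
    while N > 3:
        chunks.append(format(N, "b"))    # N (or n_k, n_{k-1}, ...) in binary
        N = len(chunks[-1])              # next number is the binary length
    # "11" encodes n_1 = 3; the chunks follow from shortest to longest;
    # the final "0" is the stop flag.
    return "11" + "".join(reversed(chunks)) + "0"


def run_counting_machine(bits):
    """Decode `bits` and return (N, p_N), where p_N is the part actually read."""
    head = bits[:2]
    if len(head) < 2:
        raise ValueError("input ends before the machine halts")
    if head != "11":
        return int(head, 2), head        # N = 0, 1 or 2
    n, m = 3, 3                          # n_1 = 3; flag bit sits at position m_1 = 3
    while True:
        if m - 1 + n > len(bits) and bits[m - 1:m] != "0":
            raise ValueError("input ends before the machine halts")
        if bits[m - 1] == "0":           # flag 0: write n ones and halt
            return n, bits[:m]
        # Flag 1 doubles as the leading bit of the next n-digit number.
        n, m = int(bits[m - 1:m - 1 + n], 2), m + n


if __name__ == "__main__":
    print(run_counting_machine("11100110100"))
    print([encode(N) for N in range(9)])
```

Applied to the input 11100110100 from the worked example, the decoder follows the same sequence $n_2 = 4$, $n_3 = 13$ and stops on the tenth digit. Encoding the numbers 0 to 8 gives a quick cross-check of the first elements of $P$.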
Used with such variants, the counting machine can be a tool for systematically and efficiently generating nontrivial output bit strings of increasing complexity.

More generally, the counting machine $T$ can be used whenever one needs a highly compact specification of large numbers $N$ by prefix-free programs. Of course, other sets of prefix-free programs may give a shorter description of individual large numbers, such as a large power of 2, at the expense of the average large number.

Appendix A4 presents a concrete implementation of the counting machine $T$.

Super-logarithmic Singularity

In this section, we compute how the partition function (7) approaches its critical value as $\beta$ approaches $\beta_c$ from above in the case of the counting machine. The counting machine halts for every input program ($h_l = 1$) and therefore has a computable partition function

$$\hat{Z}(\beta) = \sum_{\text{all } p} \exp\{-\beta \cdot l(p)\} = \sum_{k=0}^{\infty} \hat{Z}_k(\beta) \quad \text{with} \quad \hat{Z}(\beta_c) = 1, \tag{10}$$

where $\hat{Z}_k(\beta)$ is the contribution from programs $p$ that halt after $k$ iterations, $k$ being defined in (9). Using (8), we expand:

$$\begin{aligned}
\hat{Z}_0(\beta) &= 3e^{-2\beta} + e^{-3\beta}, \qquad \hat{Z}_1(\beta) = 4e^{-6\beta} \\
\hat{Z}_2(\beta) &= 8e^{-10\beta} + 16e^{-11\beta} + 32e^{-12\beta} + 64e^{-13\beta} \\
\hat{Z}_k(\beta) &= \sum_{n_2, \dots, n_k, N} e^{-\beta \cdot (6 + n_2 + \dots + n_k)} \sim \frac{1}{2} \sum_{n_2, \dots, n_k} 2^{-(6 + n_2 + \dots + n_{k-1})} \cdot e^{-\epsilon n_k} \quad \text{with} \quad \epsilon = \beta - \beta_c,
\end{aligned} \tag{11}$$

where $n_2$ runs from 4 to 7, $n_{i+1}$ runs from $2^{n_i - 1}$ to $2^{n_i} - 1$, and $N$ runs from $2^{n_k - 1}$ to $2^{n_k} - 1$. In the last line, we have expanded near $\beta = \beta_c = \ln 2$, and kept only the leading term in $\epsilon$, noting that $n_k \gg n_{k-1}$. For a given $k$, let $\Lambda_k$ be the largest possible value of $n_k$:

$$\Lambda_1 = 3, \quad \Lambda_2 = 7, \quad \Lambda_3 = 127, \quad \Lambda_{k+1} = 2^{\Lambda_k} - 1. \tag{12}$$

If $\Lambda_{k-1} \gg 1/\epsilon$, we can approximate $\hat{Z}_k$ in (11) by 0, since the minimum value of $n_k$ is $\Lambda_{k-1} + 1$. On the other hand, if $\Lambda_k \ll 1/\epsilon$, we can approximate $\epsilon$ by 0 in $\hat{Z}_k$. This yields $\hat{Z}_0 = 7/8$, $\hat{Z}_1 = 1/16$. Noting that there are always $2^{n_i}/2$ values of $n_{i+1}$, in the case $\Lambda_k \ll 1/\epsilon$ we can iteratively perform the sum over $n_2, \dots, n_k$ for $k > 1$:

$$\hat{Z}_k = \frac{1}{2} \sum_{n_2, \dots, n_k} 2^{-6 - n_2 - \dots - n_{k-1}} = \frac{1}{4} \sum_{n_2, \dots, n_{k-1}} 2^{-6 - n_2 - \dots - n_{k-2}} = \dots = \frac{1}{2^{k+3}}$$

We now perform the sum (10) over $k$ and first consider the (rare) case where $1/\epsilon = \Lambda_K$ for some $K$. In appendix A5, it is shown that, in this case, $\hat{Z}_K = 2^{-K-3}$, $\hat{Z}_{K+1} = 0$ to high accuracy already for $K \ge 4$. Thus,

$$\hat{Z}(\epsilon) = \frac{7}{8} + \frac{1}{16} + \sum_{k=2}^{K} \hat{Z}_k = 1 - 2^{-K-3} \quad \text{with} \quad \frac{1}{\epsilon} = \Lambda_K \tag{13}$$

The singularity in $\epsilon$ comes from the dependence of $K$ on $\epsilon$. To continue (13) to general $\epsilon$, we use the "super-logarithm" $\mathrm{slog}_2(x)$ with basis 2 in the so-called "linear approximation":

$$\mathrm{slog}_2(x) = \begin{cases} x - 1 & \text{if } 0 < x \le 1 \\ \mathrm{slog}_2(\lg(x)) + 1 & \text{if } x > 1 \end{cases} \tag{14}$$

so that $\mathrm{slog}_2(1) = 0$, $\mathrm{slog}_2(2) = 1$, $\mathrm{slog}_2(4) = 2$, $\mathrm{slog}_2(2^x) = \mathrm{slog}_2(x) + 1$. Real values of $\mathrm{slog}_2$ are interpolated from its integer part $\lg^{-}(x) = \mathrm{int}(\mathrm{slog}_2(x))$ by

$$\mathrm{slog}_2(x) = \lg^{-}(x) + \lg \dots \lg x \quad \text{with } 1 + \lg^{-}(x) \text{ iterations}$$

We can now express $K(\epsilon)$ in terms of the super-logarithm by noting from (12) that $\mathrm{slog}_2(\Lambda_{k+1}) \to \mathrm{slog}_2(\Lambda_k) + 1$ to very high accuracy already for $k > 2$: $\mathrm{slog}_2(\Lambda_1) = 1 + \lg \lg 3 \sim 1.66$, $\mathrm{slog}_2(\Lambda_2) = 2 + \lg \lg \lg 7 \sim 2.57$, $\mathrm{slog}_2(\Lambda_k) = k + 0.57...$

$$\Rightarrow \quad K(\epsilon) \sim \mathrm{slog}_2(1/\epsilon) - \phi \quad \text{with} \quad \phi = 0.57...$$

$$\hat{Z}(\epsilon) \sim 1 - \lambda \cdot 2^{-\mathrm{slog}_2(1/\epsilon)} = 1 - \lambda \cdot 2^{-\lg^{-}(1/\epsilon)} \cdot \{\lg \dots \lg(1/\epsilon)\}^{-1}, \tag{15}$$

where $\lambda = 2^{\phi - 3} \sim 0.19$ and there are $\lg^{-}(1/\epsilon)$ iterations of the logarithm in the last line. This continues (13) to any $\epsilon$. Although the continuation (14) of the super-logarithm to real values, and thus the continuation (15) of $\hat{Z}(\epsilon)$, is not unique, different continuations differ only by sub-leading orders in $\epsilon$. Thus, (15) is the leading singularity of the partition function $\hat{Z}(\epsilon)$ at the critical point. This partition function is plotted in fig. 2. It converges extremely slowly to 1 as $\epsilon \to 0$, and is continuous but "almost" discontinuous.

Figure 2: $\hat{Z}(\epsilon)$ as a function of $1/\epsilon$ (left) and $\beta$ (right)

Critical Behavior

Armed with the results of section 5, we would now like to examine the phase transition for the counting machine near the critical point $\beta = \beta_c + \epsilon$ with $\beta_c = \ln 2$, $\epsilon \ll 1$. The free energy $F$ and average program length $\langle l \rangle$ are:

$$\hat{Z}(\beta) = e^{-\beta F} = \sum_{p} e^{-\beta l(p)} \quad \Rightarrow \quad F(\beta) = -\frac{1}{\beta} \ln \hat{Z}(\beta), \quad \langle l \rangle = -\partial_\beta \ln \hat{Z}(\beta) \tag{16}$$

The program length is the energy in our case. The heat capacity is (using $T \partial_T = -\beta \partial_\beta$):

$$C(T) = -T \frac{\partial^2 F}{\partial T^2} \sim -\partial_\beta \ln \langle l \rangle + \text{higher orders in } \epsilon$$

Generally, in a zeroth-order phase transition the free energy $F(T)$ is discontinuous at a critical point $T = T_c$. In a first-order transition, $F(T)$ is continuous but $\partial_T F(T)$ is discontinuous, the gap being the latent heat. In a second-order transition, $\partial_T F(T)$ is also continuous, but some higher-order derivative of $F(T)$ is discontinuous [8]. In our case,

$$\hat{Z}(\epsilon) = 1 - \lambda \cdot 2^{-\mathrm{slog}_2(1/\epsilon)} = 1 - \lambda \cdot 2^{-\lg^{-}(1/\epsilon)} \cdot \{\lg \dots \lg(1/\epsilon)\}^{-1}$$

where $\lambda \sim 0.19$, $\lg^{-}$ is the integer part of the super-logarithm and we have $\lg^{-}(1/\epsilon)$ iterations of the logarithm. Thus, in the limit $\epsilon \to 0$, we have

$$F(\epsilon) \propto \lambda \cdot 2^{-\lg^{-}(1/\epsilon)} \cdot \{\lg \dots \lg(1/\epsilon)\}^{-1} \tag{17}$$

$$\langle l \rangle \propto \left[\epsilon \cdot \lg \frac{1}{\epsilon} \cdot \lg \lg \frac{1}{\epsilon} \cdot \ldots \cdot \left(\lg \dots \lg \frac{1}{\epsilon}\right)^2\right]^{-1} \tag{18}$$

$F(\epsilon)$ is finite at the critical point. It is continuous, but almost discontinuous. Thus, the phase transition is first-order, but almost zeroth-order. We also see that the latent heat is infinite, and that the average program size $\langle l \rangle$ diverges at the critical point. The average size $N$ of the output strings also diverges, as $l$ is of the order $\lg N$.

To put things into perspective, the diameter of the observable universe, measured in Planck lengths, is about $D = 2^{205}$. For $\epsilon < 1/D$, one needs to consider contributions to (16) from input bit strings with length $l > D$ to continuously interpolate between $F(\beta_c)$ and $F(\beta_c + \epsilon)$. The super-logarithm of $D$ is about 4.6, so for $\epsilon$ of order $2^{-205}$, $\hat{Z}$ is still about 0.76% away from 1. To get a super-logarithm of 5, we need a universe of diameter $2^{65536}$ Planck lengths. Even then, $\hat{Z}$ is still 0.58% away from 1. In this sense, the super-logarithmic singularity is indistinguishable from a discontinuity of the partition function at least for all bit string ensembles that can be hosted by our universe.
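The orders of magnitude quoted above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical; the offset $\phi$ is read off as the fractional part of $\mathrm{slog}_2(127)$; the prefactor is taken as $\lambda = 2^{\phi - 3}$; and the exponent 205 for the diameter $D$ of the observable universe in Planck lengths is an assumed round value, not a figure confirmed by the text.

```python
import math


def slog2(x):
    """Super-logarithm to base 2 in the linear approximation (14)."""
    if x <= 0:
        raise ValueError("slog2 is only defined here for x > 0")
    iterations = 0
    while x > 1:
        x = math.log2(x)
        iterations += 1
    return iterations + x - 1


# Assumption: phi is the fractional part of slog2 at 127 (one of the Lambda values).
PHI = slog2(127) % 1.0
# Assumption: lambda = 2**(phi - 3), chosen to match the percentages in the text.
LAMBDA = 2 ** (PHI - 3)


def gap_from_slog(s):
    """Distance 1 - Z of the partition function from 1 for a given slog2(1/eps)."""
    return LAMBDA * 2 ** (-s)


def gap(inv_eps):
    """Distance 1 - Z for a given value of 1/eps."""
    return gap_from_slog(slog2(inv_eps))


if __name__ == "__main__":
    D = 2.0 ** 205  # assumed diameter of the observable universe in Planck lengths
    print("slog2(D)       =", slog2(D))
    print("1 - Z at 1/D   =", gap(D))
    # 2**65536 overflows a float, so the case slog2 = 5 is evaluated directly.
    print("1 - Z, slog 5  =", gap_from_slog(5))
```

The printed gaps can be compared with the 0.76% and 0.58% quoted in the text; small differences in the last digit depend on the exact value assumed for $D$.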

Singularity for Universal Turing Machines

In the previous section, we have discussed the singularity of the partition function (7) near $\epsilon = 0$ for the non-universal counting machine. How does it compare with the singularity for a universal Turing machine?

Since it is generally an undecidable question whether a given Turing machine halts for a given program, for a UTM the function $h_l$ in (7) and Chaitin's $\Omega$ are not computable by any halting program. Neither is the singularity of $Z(\epsilon)$ at the critical point computable. In fact, $Z(\epsilon)$ converges towards $\Omega$ more slowly than any computable function.

To see this, let us slightly modify the last step of the counting machine of section 4: if, in the $i$-th step, $p_{m_{i-1}} = 0$, the modified machine $\tilde{T}$ switches into a new mode: instead of writing $1^{n_{i-1}}$ on the work tape, it reads the next $\Sigma(n_{i-1})$ digits of the program $p$ from the program tape, where $\Sigma(n)$ is the busy-beaver function. The modified machine $\tilde{T}$ writes those digits on the work tape and then halts. Formula (11) thus gets replaced by

$$\tilde{Z}_k(\beta) = \sum_{n_2, \dots, n_k} \; \sum_{N=0}^{2^{\Sigma(n_k)} - 1} e^{-\beta \cdot (6 + n_2 + \dots + n_k + \Sigma(n_k))} \sim \sum_{n_2, \dots, n_k} 2^{-(6 + n_2 + \dots + n_k)} \cdot e^{-\epsilon \cdot \Sigma(n_k)}$$

$\Sigma(n)$ is known to diverge faster than any computable function as $n \to \infty$. This implies that $\tilde{Z}(\epsilon)$ converges more slowly than any computable function to its critical value 1 for the modified machine $\tilde{T}$. Now, any UTM $U$ simulates the modified machine $\tilde{T}$ if it is fed with all possible input programs. This implies that, for any UTM, $Z_U(\epsilon)$ converges more slowly than any computable function to its critical value $\Omega$.

The conclusion for UTM's is thus similar to that for the counting machine: at the critical point, the phase transition is first-order but almost zeroth-order, and the average program size diverges. The average size of the output bit strings also diverges, as $U$ simulates $\tilde{T}$, among other machines.
Strictly speaking, $Z_U(\epsilon)$ for a UTM is continuous at $\epsilon = 0$, but in practice, the behavior is indistinguishable from a discontinuity. As we have seen, at least for all bit string ensembles within our universe, the super-logarithmic singularity of the counting machine already has such an effective discontinuity. In this respect, the counting machine provides a simplified toy model for our ensemble.

Outlook

We have shown that our ensemble of bit string histories has a first-order phase transition at the Chaitin point. This phase transition is almost zeroth-order in the sense that the free energy is continuous but "almost" discontinuous: it converges to its critical value, namely Chaitin's $\Omega$, more slowly than any computable function. At this critical point, the average size of the input programs and the average size of the output bit strings both diverge.

It is somewhat disappointing that the transition is first- and not second-order. Second-order phase transitions, such as the transition between water and steam at a temperature of 374 °C and a pressure of 218 atm, are particularly interesting, because there the statistical mechanical system typically has a continuum limit, in the sense that it is described in terms of some quantum field theory. The field represents the order parameter. In our case, a second-order transition would have been a clear indication that our ensemble of bit string histories (2) has a continuum limit at the Chaitin point, where it would be described by what could be called a "logical quantum field theory".

However, although the phase transition is first-order in $\beta$, there might still be a continuum limit, once $\beta = \ln 2$ is fixed. In particular, there might be second-order transitions in other parameters $\beta_i$ that multiply the infinitely many other operators, by which the Gibbs factor (3) can be generalized. A few examples of such operators are the number of times the Turing machine flips a bit, or changes its state, or switches the direction in which the head moves.
Other such operators may describe properties of the output bit string, such as its length, its number of "kinks" (adjacent bits that are not equal), etc.

As precedents of statistical mechanical systems that exhibit zeroth- or first-order phase transitions as a function of one parameter, and second-order phase transitions as a function of other parameters, consider the Ising model on a planar random surface [18], or the Kosterlitz-Thouless transition in the sine-Gordon model on a planar random surface [19]. In these systems, the free energy is discontinuous as a function of the two-dimensional cosmological constant $\mu$. Still, once $\mu$ is fixed to its critical value, there is a continuum limit that is described by renormalizable two-dimensional field theories on random surfaces. These field theories are also known as "non-critical string theories".

It is tempting to take the analogy with string theory further: as the bit string on the work tape of the Turing machine evolves in computation time, its history can be recorded in a two-dimensional graph. An example is the graph in fig. 3 (center and left) of appendix A2, where a new line is added each time the head of the machine changes direction. In analogy with the world sheet that is swept out by superstrings [9], let us refer to such a two-dimensional graph as a "bit string world sheet". In appendix A6, it is shown how the time evolution of the bit string on such a discretized world sheet can be described by a computable Hamiltonian, and how the sum over input bit strings $p$ in (1) turns this bit string on the work tape into a quantum mechanical object.

Suppose that our bit string ensemble (1) at the critical point $\beta = \ln 2$ indeed has a continuum limit, where it is described by a theory of dynamical continuous strings.
The only known consistent (i.e., renormalizable, modular-invariant, tachyon-free) such theories are the various superstring theories [9], which are all related to each other by dualities [20]. This leads us to conjecture that the Chaitin point is described by superstring theory in the limit of very large bit strings, which would also suggest a curious answer to the question what superstrings are made of: they might be purely mathematical objects, made of bits. Further work to support this argument is underway.

Acknowledgements

I would like to thank my brother Juergen Schmidhuber for arousing my interest in information theory. The current work was inspired by his idea of the "Great Programmer" [21]. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Appendix A: Review of Turing Machines

The appendix is organized as follows. A1 presents a Turing machine that contains only awork tape and no program tape. An example is given in A2. In A3, Chaitin’s deﬁnition ofa Turing machine is recalled. Using the example of A2 as a building block, we realize thecounting machine of section 4 in A4. A5 contains a supplementary argument to section 5.A6 constructs a computable transfer matrix/Hamiltonian for our ensemble (2).

A1. A simple Turing Machine

Our first example of a Turing machine contains a "work tape" that extends infinitely in both directions. It consists of cells that are blank, except for a finite, contiguous bit string of 0's and 1's (the "input string"). A blank cannot be written between 0's or 1's, so it is not equivalent to a third letter in addition to 0 and 1. Rather, blank areas mark the beginning and end of the string on the work tape. On the first cell of the input string sits a head, which can read, write, and move in both directions. The head can be in one of several states, labelled by 1, 2, 3, ..., H. At each step, the machine operates as follows:

1. It reads the bit on the work tape on which the head sits (0, 1 or a blank).
2. Depending on that bit and on its internal state, it writes a 0, 1 or a blank in that cell on the work tape. It may only write a blank if the cell has a blank neighbour, to ensure that the binary string remains contiguous.
3. It moves the head either one cell to the left or one cell to the right.
4. It may or may not change its internal state.
5. If and when it reaches the state "H", it halts.
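The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the machine of table 1: the transition table FLIP is a hypothetical two-state machine that flips every bit and halts at the first blank, and the names run, FLIP and BLANK, as well as the step limit, are assumptions made for the sketch.

```python
BLANK = "_"

def run(table, input_string, start_state=1, max_steps=10_000):
    """Run a work-tape-only machine; return (output string, number of steps)."""
    tape = {i: bit for i, bit in enumerate(input_string)}  # sparse tape, blank elsewhere
    pos, state, steps = 0, start_state, 0
    while state != "H":                                    # step 5: halt in state H
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may not halt")
        symbol = tape.get(pos, BLANK)                      # step 1: read
        write, move, state = table[(state, symbol)]        # look up steps 2-4
        if write == BLANK:
            # step 2: a blank may only be written next to a blank,
            # so that the bit string stays contiguous
            if BLANK not in (tape.get(pos - 1, BLANK), tape.get(pos + 1, BLANK)):
                raise ValueError("blank written inside the bit string")
            tape.pop(pos, None)
        else:
            tape[pos] = write
        pos += move                                        # step 3: move by -1 or +1
        steps += 1
    return "".join(tape[i] for i in sorted(tape)), steps

# Hypothetical example machine (an assumption, not taken from table 1):
# flip each bit while moving right, halt at the first blank.
FLIP = {
    (1, "0"): ("1", +1, 1),
    (1, "1"): ("0", +1, 1),
    (1, BLANK): (BLANK, -1, "H"),
}

print(run(FLIP, "0110"))
```

Keeping the table as a dictionary keyed by (state, symbol) mirrors the layout of a transition table with one column per state and one row per work bit.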

A2. An example

As an example, consider a Turing machine with 5 states 1, 2, 3, 4, H. The first six columns of table 1 define how this particular machine writes a 0, 1 or blank, moves left (−1) or right (+1), and then switches to a new state, depending on the input bit it reads (left column) and the state it is in (top row).

First, let the input string be "01". Fig. 3 (left) shows a two-dimensional graph of the evolution of the bit string, with a new row appended for each time step. The machine starts in state 1 on the first bit of the work tape, which is 0. It then

• writes a 0, remains in state 1 and moves right to the next bit, whose value is 1.
• writes a 1, remains in state 1, and moves right to the next bit, which is blank.
• switches to state 2, and moves back left to the previous bit, whose value is 1.
• overwrites it with 0, switches to state 3, and moves left, and so on.

Table 1

Operation            Work bit   Current state    Current state & program bit
                                1  2  3  4       5
Write on work tape   0          0  1  0  0       -  -  2  1  1  1  2  1  2
                     1          1  0  1  1       0  1  2  1  1  1  2  1  2
                     2          2  2  2  0       2  2  2  1  1  1  2  1  2
Set the new state    0          1  2  3  4       -  -  8  7  H  1  H  H  6
                     1          1  3  4  4       5  5  8  7  H  1  H  H  6
                     2          2  H  [...]

[...] b in binary code, then the output string of this particular Turing machine always consists of b [...] b = 1001 = 9. By "condensed", we mean that each time step now corresponds to a new square, rather than a new row, such that the computation time is the area of the graph. The head of the machine moves along the rows of the graph, and each time it changes direction, a new row is appended. For completeness, fig. 3 (right) also shows the state of the machine at each point in the computation. The machine moves right along the light grey rows (state 1) and left along the other (blue) rows (states 2, 3, 4).

Figure 3: A simple Turing machine

As an example of a universal Turing Machine (UTM) that can simulate all other Turing machines, consider our brain: given the above table for any Turing machine, we can read it and use it to simulate the machine as you have just done if you have followed the exercise. Essentially, the table becomes part of the input, rather than being hard-coded into the Turing machine. For a more specific example of a universal Turing machine, see, e.g., [22]. A UTM is arbitrarily flexible and can quickly compute strings with one Turing machine that take a long time or are impossible to compute with another machine.

There are many alternative, but equivalent definitions of Turing machines. E.g., one can introduce other symbols in addition to 0 and 1, or more states, or one can work with several parallel work tapes instead of just one.

A3. Chaitin's Machine

In Chaitin's definition, there is a read-only "program tape" of finite length, in addition to the work tape. The program tape begins with a blank cell followed by a finite bit string of 0's and 1's, the "program". On the program tape sits another head, the "program head". Initially, it sits on the blank cell. At each step, the machine performs the following operations in addition to steps 1-5 of subsection A1:

• initial step: it reads the bit on the program tape on which the head sits
• last step: it moves the program head either one cell to the left or leaves it where it is

The machine either halts or runs forever without reading any more program bits. As a result, the set of input programs, from the first to the last bit that has been read by the machine, is prefix-free.

A4. The Counting Machine

As an example within Chaitin's framework, we present an implementation of the counting machine of section 4. We begin with the Turing machine of appendix (A1), and add a finite read-only program tape, on which the programs of section 4 are written. We start with a work tape that is initially blank.

We first add three additional states 6, 7, 8, whose role is to read the first two bits on the program tape and get the machine started (steps 1 and 2 of section 4). The operations in states 6, 7, 8 depend only on the program bit on which the program head sits, and not on the work bit on which the work head sits. They are defined in table 1. The machine is initially in state 8 (in states 6 and 7, the program bit is then never 2).

Next, we slightly modify state 2 in table 1 as follows: if the machine is in state 2, and the head on the work tape sits on a blank, then it switches to state H only if the head of the program tape sits on a 0. Otherwise, it moves to a new state 5 (i.e., the bold-faced "H" in table 1 is replaced by 5, if the program bit is 1). The operations of the new state 5 are also defined in table 1. Its role is to write a new portion from the program tape onto the work tape, thereby over-writing the contiguous sequence of 1's. Its operations depend both on the current work bit and on the current program bit (the machine is never in state 5 when the program head is on a blank or when the work head is on a 0). It is straightforward to verify that this machine indeed represents the counting machine of section 4.

A5. A Supplementary Argument

In section 5, we want to evaluate the K-th part of the partition function

$$\hat Z_K(\epsilon) = \sum_{n_2,\ldots,n_K} 2^{-(6+n_2+\ldots+n_{K-1})} \cdot e^{-\epsilon n_K} \quad \text{in the case} \quad \frac{1}{\epsilon} = \Lambda_K \qquad (19)$$

where K ≥ [...], $n_2$ runs from 4 to 7, $n_{i+1}$ runs from $2^{n_i-1}$ to $2^{n_i}-1$, and $\Lambda_K$ is the largest possible value of $n_K$. Specifically, $\Lambda_3 = 127$, $\Lambda_4 = 2^{127}-1$, and therefore $\Lambda_{K-1} \sim \lg(\Lambda_K) = \lg(1/\epsilon)$ to high accuracy for K ≥ 4. Defining $M = 2^{n_{K-1}}$, so that $2^{-n_{K-1}} = 1/M$, the sum over $n_K$ yields

$$\hat Z_K(\epsilon) = \sum_{n_2,n_3,\ldots,n_{K-1}} 2^{-(5+n_2+\ldots+n_{K-2})} \cdot A(n_{K-1},\epsilon) \qquad (20)$$

$$A(n_{K-1},\epsilon) = \frac{1}{2M}\sum_{n_K=M/2}^{M-1} e^{-\epsilon \cdot n_K} \approx \frac{1}{2M\epsilon}\left(e^{-M\epsilon/2} - e^{-M\epsilon}\right) \qquad (21)$$

$A(x,\epsilon)$ is plotted in fig. 4. It is a monotonically decreasing function with

$$A(x,\epsilon) \rightarrow \begin{cases} 1/4 & \text{for } x \ll \lg(1/\epsilon) \sim \Lambda_{K-1} \\ 0 & \text{for } x \gg \lg(1/\epsilon) \sim \Lambda_{K-1} \end{cases} \qquad (22)$$

$n_{K-1}$ runs from $2^{n_{K-2}-1}$ to $2^{n_{K-2}}-1$. Only for the maximal value of $n_{K-2}$ are there a few values of $n_{K-1}$ near $\Lambda_{K-1}$, for which A differs significantly from 1/4. Even in this case, the contribution of these differences is

• small (of order 1%) for K = 4: for the highest value $n_2 = 7$, $n_3$ runs from 64 to 127. Only the last few of these $n_3$ contribute significantly to the difference
• practically zero for K ≥ 5: e.g., for K = 5 and the highest value $n_3 = 127$, $n_4$ runs from $2^{126}$ to $2^{127}-1$. Only a tiny portion of these $n_4$ contribute to the difference

Figure 4: The function A(x)

As long as K ≥ 4, we can thus approximate A by 1/4 for $x \le \Lambda_{K-1}$ to obtain $\hat Z_K = 2^{-K-2}$ as claimed in section 5. An analogous argument, not repeated here, shows that we can approximate A by 0 for $x > \Lambda_{K-1}$ to obtain $\hat Z_{K+1} = 0$, as long as K ≥ [...].
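The limits in (22) and the size of the correction for K = 4 can be checked numerically. The following Python sketch is an illustration under stated assumptions, not part of the original derivation. It takes A(x, ε) = (1/2M) Σ e^{-εn}, summed over n from M/2 to M-1 with M = 2^x, and evaluates the geometric sum in closed form. For K = 4 it weights each n_3 by 2^{-n_2} (n_2 from 4 to 7, n_3 from 2^{n_2-1} to 2^{n_2}-1), so that any constant prefactor drops out of the ratio. The function names are hypothetical.

```python
import math

def A(x, eps):
    """(1/2M) * sum_{n=M/2}^{M-1} exp(-eps*n) with M = 2**x, in closed form."""
    M = 2.0 ** x
    a = 0.5 * M * eps
    # Geometric sum (e^{-a} - e^{-2a}) / (1 - e^{-eps}), written with expm1
    # so that it stays accurate when a and eps are tiny.
    geometric_sum = math.exp(-2.0 * a) * math.expm1(a) / (-math.expm1(-eps))
    return geometric_sum / (2.0 * M)

def relative_deficit_K4():
    """Relative error made by replacing A with 1/4 in the K = 4 sum."""
    eps = 1.0 / (2.0 ** 127 - 1.0)          # 1/eps = Lambda_4 (assumed value)
    exact = approx = 0.0
    for n2 in range(4, 8):
        for n3 in range(2 ** (n2 - 1), 2 ** n2):
            weight = 2.0 ** (-n2)           # n2-dependent part of the weight
            exact += weight * A(n3, eps)
            approx += weight * 0.25
    return 1.0 - exact / approx

eps = 1.0 / (2.0 ** 127 - 1.0)
print(A(10, eps), A(127, eps), relative_deficit_K4())
```

Under these assumptions, A equals 1/4 for small x, drops to about 0.12 at x = 127, and the relative correction to the K = 4 sum comes out at roughly half a percent, which is compatible with the "order 1%" estimate.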

A6. Hamiltonian Formulation

The evolution of the bit string in computation time can be represented by a two-dimensional graph, such as in fig. 3 (center and left) of appendix A2. In section 9, this is called the "bit string world sheet". Here we show how the "time" evolution on such discretized world sheets can be described by a computable Hamiltonian.

To this end, let us assume that the head of the Turing machine changes direction at the computation time steps $0 < t_1 < t_2 < \ldots$. At computation time $t_T$, the "bit string state" $|S_T\rangle$ can be described by a 3-tuple

$$|S_T\rangle = |b_T, s_T, k_T\rangle \quad \text{with "world-sheet time"} \quad T \in \{[\ldots]\},$$

where $b_T$ is the bit string at time $t_T$, $s_T$ is the state of the Turing machine at time $t_T$, and $k_T$ is the position of the head at time $t_T$ ($k_T = 1$ means that the head sits on the first non-blank bit of the bit string). Given the state $|S_T\rangle$, the next state $|S_{T+1}\rangle$ can be uniquely determined, as long as no input bits are read in between $t_T$ and $t_{T+1}$. If $s_T$ = "Halt", we define $|S_{T+1}\rangle = |S_T\rangle$. Note that $k_{T+1} = \pm\infty$ if the head keeps moving in the same direction without halting. However, for a given Turing machine and any finite bit string $b_T$, it is always decidable whether this will happen, so $k_{T+1}$ is still computable.

Some states will prompt for an input bit to be read in from the input programs p in (1). Each input bit represents a random variable with value 0 or 1. At the time it is read (corresponding to a particular cell of the bit string world sheet), it changes the evolution of the bit string in two possible ways, both of which are computable. This turns the bit string state into a quantum mechanical superposition of states. E.g., the output of the counting machine, if fed with all possible input programs, is the state

$$|S_\infty\rangle = \sum_{N=0}^{\infty} \psi_N \, |N, \text{"Halt"}, [\ldots]\rangle \quad \text{with} \quad \psi_N = 2^{-l_N/2} \;\rightarrow\; \sum_N |\psi_N|^2 = 1,$$

where $l_N$ is defined in (8). For any Turing machine, the evolution of the superposition $|S_T\rangle$ from "world-sheet time" T to T + 1 is computable. Thus, it can be described by a computable transfer matrix, or, equivalently, by a computable Hamiltonian acting on the Hilbert space spanned by all possible bit string states. Of course, for general Turing machines, the halting problem re-appears: not all components of the superposition $|S_T\rangle$ will have $s_T$ = "Halt" as $T \to \infty$, i.e., there may not exist a limit state $|S_\infty\rangle$.
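For a machine without a program tape, the deterministic map from one bit string state to the next can be sketched in Python as follows. This is only an illustration under assumptions that are not taken from the text: the transition table SWEEP is a hypothetical machine, a new world-sheet row is taken to start just before the first step that moves the head in the opposite direction, and a step limit stands in for the decidability argument about a head that never turns.

```python
BLANK = "_"

def world_sheet_step(table, bits, state, k, max_steps=10_000):
    """Map (b_T, s_T, k_T) to (b_{T+1}, s_{T+1}, k_{T+1}).

    bits is the contiguous bit string, k the head position counted from 1
    at its first bit. One call executes a maximal run of same-direction moves.
    """
    if state == "H":
        return bits, state, k                  # halted states are fixed points
    tape = {i + 1: b for i, b in enumerate(bits)}
    direction = None
    for _ in range(max_steps):
        write, move, new_state = table[(state, tape.get(k, BLANK))]
        if direction is not None and move != direction:
            break                              # head is about to turn: row complete
        direction = move
        if write == BLANK:
            tape.pop(k, None)
        else:
            tape[k] = write
        k, state = k + move, new_state
        if state == "H":
            break
    else:
        raise RuntimeError("no turn within max_steps (k may be +-infinity)")
    first = min(tape, default=k)               # re-anchor positions at the first bit
    last = max(tape, default=k - 1)
    new_bits = "".join(tape[i] for i in range(first, last + 1))
    return new_bits, state, k - first + 1

# Hypothetical machine: flip all bits moving right, walk back left, then halt.
SWEEP = {
    (1, "0"): ("1", +1, 1), (1, "1"): ("0", +1, 1), (1, BLANK): (BLANK, -1, 2),
    (2, "0"): ("0", -1, 2), (2, "1"): ("1", -1, 2), (2, BLANK): (BLANK, +1, "H"),
}

s = ("01", 1, 1)
for T in range(4):
    s = world_sheet_step(SWEEP, *s)
    print(T + 1, s)
```

Reading an input bit would replace this single-valued map by a branching into two successor states, one per bit value, which is where the superposition described above enters.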