Theoretical Robopsychology: Samu Has Learned Turing Machines
Norbert B´[email protected] of Information TechnologyUniversity of DebrecenOctober 25, 2018
Abstract
From the point of view of a programmer, robopsychology is a synonym for the activity done by developers to implement their machine learning applications. This robopsychological approach raises some fundamental theoretical questions of machine learning. Our discussion of these questions is constrained to Turing machines. Alan Turing gave an algorithm (aka the Turing machine) to describe algorithms. If this description algorithm is applied to describe itself, we arrive at Turing's notion of the universal machine. In the present paper, we investigate algorithms to write algorithms. From a pedagogical point of view, this way of writing programs can be considered a combination of learning by listening and learning by doing, because it is based on applying agent technology and machine learning. As the main result we introduce the problem of learning and then show that it cannot easily be handled in reality; it is therefore reasonable to use machine learning algorithms for learning Turing machines.
Samu is a disembodied developmental robotic experiment to develop a family chatterbot agent who will be able to talk in a natural language like humans do [Bát15a]. At this moment it is only a utopian idea of the project Samu. The practical purpose of the Samu projects is to develop computational mental organs that can support software agents in acquiring higher-order knowledge from their input [BB16]. The activities that have been conducted during the development of such mental organs may be considered first efforts to create on demand the Asimovian profession called robopsychology (https://en.wikipedia.org/wiki/Robopsychology) [Bát16a].

The roots of this paper lie in the two new software experiments Samu Turing [Bát16c] and Samu C. Turing [Bát16b]. These are very simplified versions of the former habituation-sensitization [CS14] based learning projects of Samu (like, for example, SamuBrain [Bát16d] or SamuKnows [Bát16e]). Their common feature is that they use the same COP-based Q-learning engine that the chatbot Samu does. To be more precise, the mental organs use the same code as the chatbot does (to see this, compare https://github.com/nbatfai/SamuLife/blob/master/SamuQl.h with https://github.com/nbatfai/nahshon/blob/master/ql.hpp). The term "COP-based" (Consciousness Oriented Programming [Bát11]) means that the engine predicts its future input. The engine itself is based on Q-learning and receives positive reinforcement if the chatbot (or a mental organ) successfully forecasts the next input of the Q-learning in the actual step, that is, if the previous output (the previous prediction) is the same as the actual input; for precise details see [Bát15a] and [BB16]. In the two new experiments in question the transition rules of Turing machines (TMs) have been learned, as illustrated in Fig. 1.
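To make the prediction-reward idea concrete, the following toy Python sketch mimics a COP-style loop: the learner receives positive reinforcement exactly when its previous output (its prediction) equals its current input. The class name, the tabular epsilon-greedy scheme and all parameters are illustrative assumptions of ours; the actual engine is the C++ Q-learning code linked above.

```python
import random
from collections import defaultdict

class PredictiveAgent:
    """Toy COP-style learner (illustrative, not the projects' engine):
    it is reinforced when its previous prediction matches its current input."""

    def __init__(self, epsilon=0.1, alpha=0.5):
        self.q = defaultdict(float)         # Q[(observation, prediction)] -> value
        self.successors = defaultdict(set)  # symbols ever seen after an observation
        self.epsilon = epsilon              # exploration rate
        self.alpha = alpha                  # learning rate
        self.prev = None                    # (observation, prediction) of the last step

    def step(self, observation):
        if self.prev is not None:
            prev_obs, prediction = self.prev
            # Positive reinforcement iff the previous output forecast this input.
            reward = 1.0 if prediction == observation else -1.0
            key = (prev_obs, prediction)
            self.q[key] += self.alpha * (reward - self.q[key])
            self.successors[prev_obs].add(observation)
        # Epsilon-greedy choice of the next prediction.
        candidates = sorted(self.successors[observation]) or [observation]
        if random.random() < self.epsilon:
            prediction = random.choice(candidates)
        else:
            prediction = max(candidates, key=lambda p: self.q[(observation, p)])
        self.prev = (observation, prediction)
        return prediction
```

On a periodic input stream such as "abab…" the predictions of such an agent become correct after a couple of steps, which is exactly the kind of success signal used in the experiments.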
It should be noticed that neither these experiments nor this paper focus on habituation-based learning, because the learning agent knows the model (TM) that generates the reality. Our motivation to write this paper stems from the last paragraph of the work of Neumann on the general theory of automata [vN51], where Neumann suggested that there is a complexity level above which machines can reproduce themselves and even more complicated ones. Neumann investigated self-reproducing automata [vN51] roughly a decade after Alan Turing had published his work on the universal simulation theorem [Tur36]. The Turing machine is a precise form of the informal notion of the algorithm, that is, an algorithm to describe algorithms. If this description algorithm is applied to describe itself, we arrive at Turing's notion of the universal machine. In an intuitive sense we can say that Neumann replaced Turing's notion of simulation with the notion of reproduction. In this work we would like to replace reproduction with learning. To be more precise, we investigate algorithms to write algorithms. For simplicity, the scope of this paper is constrained to Turing machines. It should be noticed that we could have used other universal computing models such as cellular automata; for example, the first mental organs learned Conway's Game of Life [BB16] (see the YouTube video at https://youtu.be/_W0Ep2HpJSQ). But in spite of this, we chose Turing machines because they are closer to the programmers' intuition.

Figure 1: This is a screenshot from the project Samu Turing. The reality shown on the left side is generated by the operation of a given Turing machine. The right side shows the predicted configurations of the investigated Turing machine.

The structure of this paper is as follows: the next section introduces the basic notations. Then, in Sect.
3 we present the results of two Samu-based developmental robotic software experiments to learn how Turing machines operate. Here we investigate some specific TMs. It should be noticed that some of them, such as the machines of Schult and Uhing or Marxen and Buntrock's BB5 champion machine, are famous in the field of Tibor Radó's Busy Beaver problem [LV08]. It is worth noting that although this problem is a very interesting theoretical computer science problem, we do not address it in this paper. We introduce the learning problem and give the basic notions of this subject. Finally we present a new complexity measure called self-reproduction complexity, and we show in Subsect. 3.2.3 that it is reasonable to use machine learning algorithms for learning Turing machines. The paper is closed by a short conclusion in which some possible directions for further work are pointed out.

Throughout both this article and our software experiments we use the definition of the Turing machine (TM) that was introduced in [Bát09] and also used in [Bát15b], where the Turing machine was defined by a quadruple T = (Q, 0, {0, 1}, f), where f : Q × {0, 1} → Q × {0, 1} × {←, ↑, →} is a partial transition function and 0 ∈ Q ⊂ N is the starting state. As usual, a configuration determines the actual state of the head, the position of the head and the contents of the tape. With the notation of [Bát09] a configuration can be written in the form w_before [q> w_after, where w_before, w_after ∈ {0, 1}* and q ∈ Q.

In some proofs, for simplicity's sake, we use multitape Turing machines or the blank symbol ⊔ on the tape (that is, the tape alphabet is extended by the symbol ⊔). In addition, without limiting the generality, we may assume that halting Turing machines (with a given input) do not contain unused transition rules. The notation T(x) < ∞ denotes that the machine T with the input x halts.

Definition 2.0.1 (configN). The word b_N … b_1 [q> a_1 a_2 …
a_N over the alphabet {0, 1, [, >} ∪ Q, where a_i, b_j ∈ {0, 1}, is referred to as a configN configuration if there is a configuration w_before [q> w_after such that w_before [q> w_after = w_1,before b_N … b_1 [q> a_1 a_2 … a_N w_1,after.

Remark 2.0.1 (config∞). In some cases, see for example Remark 3.2.1, we extend the definition of the configuration as follows: 0^∞ w_before [q> w_after 0^∞. In this sense a usual configuration corresponds to a config∞ configuration where w_1,before = w_1,after = λ, the empty word. We may note that the release of the project Samu C. Turing used in Fig. 2 uses config4 configurations.
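As an illustration of the definition above, a configN word can be obtained from a usual configuration by truncating or 0-padding both sides of the head. The helper below is a minimal sketch (the function name config_n is ours, not the projects'):

```python
def config_n(before, q, after, n):
    """Map a configuration w_before [q> w_after (strings over {0,1}) to its
    configN word b_n ... b_1 [q> a_1 ... a_n, padding with the blank '0'."""
    left = before.rjust(n, "0")[-n:]   # the n symbols immediately left of the head
    right = after.ljust(n, "0")[:n]    # the head cell and the following n-1 symbols
    return f"{left}[{q}>{right}"
```

For example, config_n("111", 0, "0110", 2) gives "11[0>01", the config2 word of the configuration 111[0>0110.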
In the aforementioned projects Samu Turing and Samu C. Turing we programmed the Samu agent to work in a similar way as, for example, Professor James Harland did in his work [Har16], where he observed and studied the configurations of Marxen and Buntrock's Busy Beaver champion machines [MB90]. In our experiments the agent Samu observes (listening) the consecutive subconfigurations of a given investigated Turing machine and tries to predict (doing) the next rule of the machine that will be applied. From this viewpoint the whole learning process can be seen as a way of learning by listening and doing, where the listening part is the sensation of the agent and the doing part is the prediction of the agent. But the question may naturally be raised: why should we use agent technology and machine learning algorithms to learn Turing machines? Our explicit answer is based on the following intuitive results and can be found in Sect. 3.2.3.
Figure 2: This figure shows the usual running time (time complexity) of some given machines and the learning time of these investigated machines. The blue curve shows the usual time complexities and the red one the running times of the learning. The x-axis is labeled with the number of ones printed by the Turing machines: "26", "14", "21", "32", "160", Schult ("501"), Uhing ("1915"), Uhing ("1471") and Marxen–Buntrock ("4097"). For more precise details see https://github.com/nbatfai/SamuCTuring/releases/tag/vPaperTheorRobopsy and Table 1.
Fig. 2 summarizes and compares some running results produced by the project Samu C. Turing. The two kinds of running times (usual time complexity and "learning complexity", see the caption of the figure for details) are not directly comparable because they use different scales on the y-axis. One of the two curves is computed from the number of steps of a Turing machine and the other from the number of sensory-action pairs of the reinforcement learning agent Samu C. Turing. The exact values can be found in Table 1. One of the notions of cognitive complexity defined in Subsect. 3.2.3 will be based on this intuitive "learning complexity". In Fig. 2, it seems that the growth rate of the learning time is related to the running time. It is worth comparing this with Fig. 6, where the growth rate of another (the "self-reproducing") complexity has already been separated from the running time.
From the observations of the two experiments above, we can build the abstract model of learning that is referred to as the learning problem. The learning problem of learning TMs is divided into two parts. The first is a simulation of the TM to be learned. The second is the actual learning problem itself. Fig. 3 shows the schematic of the learning problem, where the UTM R takes the description of the machine T and an input x of T. Then R collects the configurations of T whilst it is simulating T with x. After the simulation, S takes the collected configurations and must try to figure out which TM was actually simulated.

Figure 3: This figure shows the schematic of the learning problem. The universal machine R takes two input parameters: the description of a TM T and the input x of the machine T. The machine R computes the sequence ⟨c_i⟩^x_T of configurations occurring during the execution of the machine T with its input x. Then the learning machine S takes this sequence and finally S has to figure out from this input sequence what was actually simulated by the machine R.

It is obvious that the running problem trivially contains the halting problem. Therefore we may notice that similar undecidable statements can be made for this case as well, but in this paper we only focus on halting machines.
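The role of the running machine R can be sketched in a few lines of Python: simulate T on x and collect the sequence ⟨c_i⟩. The encoding below (a transition table f[(q, r)] = (q2, w, d) with moves "<", "^", ">" and blank 0) is an illustrative assumption of ours, not the projects' code.

```python
def render(tape, pos, q):
    """Write the configuration in the paper's w_before [q> w_after form."""
    cells = sorted(tape) or [pos]
    lo, hi = min(cells[0], pos), max(cells[-1], pos)
    s = "".join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
    return f"{s[:pos - lo]}[{q}>{s[pos - lo:]}"

def run_and_collect(f, x, start=0, max_steps=10_000):
    """Simulate the TM given by the partial transition function f on input x
    and collect the sequence of configurations <c_i> (blank symbol '0')."""
    tape = dict(enumerate(int(s) for s in x))
    pos, q = 0, start
    configs = [render(tape, pos, q)]
    for _ in range(max_steps):
        r = tape.get(pos, 0)
        if (q, r) not in f:          # no applicable rule: T halts
            return configs
        q, w, d = f[(q, r)]
        tape[pos] = w
        pos += {"<": -1, "^": 0, ">": 1}[d]
        configs.append(render(tape, pos, q))
    raise RuntimeError("step limit reached")
```

For a toy two-rule machine such as f = {(0, 0): (1, 1, ">"), (1, 0): (2, 1, "^")} started on the empty tape, the collected sequence is the input the learning machine S has to work from.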
Lemma 3.2.1.
Apart from the trivial case of the empty tapes, the transition rule between two consecutive configurations c_i and c_{i+1} is uniquely determined by the configurations c_i and c_{i+1}.

Proof. Suppose that there are two transition rules (q, r) → (q_1, w_1, d_1) and (q, r) → (q_2, w_2, d_2), where q, q_1, q_2 ∈ Q, r, w_1, w_2 ∈ {0, 1, ⊔}, d_1, d_2 ∈ {←, ↑, →}; we show that q_1 = q_2, w_1 = w_2 and d_1 = d_2. Let c_i = Ll [q> rR, where l ∈ {0, 1, ⊔} and L, R ∈ {0, 1, ⊔}^∞. Then the following cases are possible:

c_{i+1} = Ll [q_1> w_1 R   (d_1 = ↑)
    = Ll [q_2> w_2 R   (d_2 = ↑) ⇔ (q_1 = q_2, w_1 = w_2)
    = L [q_2> l w_2 R   (d_2 = ←) ⇔ (q_1 = q_2, Ll = L, w_1 R = l w_2 R; that is, iff l, w_1, w_2 = 0 and R, L = 0^∞)
    = L l w_2 [q_2> R   (d_2 = →) ⇔ (q_1 = q_2, Ll = L l w_2, w_1 R = R; that is, iff l, w_1, w_2 = 0 and R, L = 0^∞)

c_{i+1} = L [q_1> l w_1 R   (d_1 = ←)
    = L [q_2> l w_2 R   (d_2 = ←) ⇔ (q_1 = q_2, w_1 = w_2)
    = L l w_2 [q_2> R   (d_2 = →) ⇔ (q_1 = q_2, L = L l w_2, l w_1 R = R; that is, iff l, w_1, w_2 = 0 and R, L = 0^∞)

c_{i+1} = L l w_1 [q_1> R   (d_1 = →)
    = L l w_2 [q_2> R   (d_2 = →) ⇔ (q_1 = q_2, w_1 = w_2)

In each case where d_1 ≠ d_2, the conditions can hold only in the excluded trivial case of the empty tapes, hence q_1 = q_2, w_1 = w_2 and d_1 = d_2. □
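The proof idea is constructive: away from the all-blank window, the shift of the head position and the changed cell reveal the applied rule. A sketch in Python over configuration strings in the w_before[q>w_after notation with blank "0" (the function names are ours, not the projects'):

```python
def parse(c):
    """Split a configuration string 'w_before[q>w_after' into its parts."""
    before, rest = c.split("[")
    q, after = rest.split(">")
    return before, int(q), after

def recover_rule(c1, c2):
    """Deduce the applied rule ((q, r), (q2, w, d)) from c1 -> c2.
    Assumes the non-trivial case of Lemma 3.2.1 (not an all-blank window)."""
    b1, q1, a1 = parse(c1)
    b2, q2, a2 = parse(c2)
    r = a1[0] if a1 else "0"               # symbol under the head in c1
    if len(b2) == len(b1):                 # head did not move (d = '^')
        w, d = (a2[0] if a2 else "0"), "^"
    elif len(b2) < len(b1):                # head moved left
        w, d = (a2[1] if len(a2) > 1 else "0"), "<"
    else:                                  # head moved right
        w, d = (b2[-1] if b2 else "0"), ">"
    return (q1, int(r)), (q2, int(w), d)
```

For instance, from the consecutive configurations 1[1>0 and 1[2>1 the function recovers the rule (1, 0) → (2, 1, ↑), in line with the lemma.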
It is noted that we may give an even simpler lemma and proof using the usual {0, 1, ⊔}* instead of {0, 1, ⊔}^∞. We use the latter because it is closer to the programmers' intuition.

Theorem 3.2.2 (Universal Learning). There exist a universal running machine R and a learning machine S such that, for all halting Turing machines T, it holds that S(R(T, x)) = T.

Proof. The proof is divided into two parts: in the first one, we modify the usual proof of Turing's universal simulation theorem (see for example the textbook [ISR00]) so that the universal machine R produces the sequence of configurations of T; in the other part we focus on the learning done by S, using the previous lemma. We provide only an outline of the first part. We use a multitape TM for the implementation of R. Fig. 4 shows the preparation of the tapes before starting the simulation of T. Fig. 5 shows the tapes after the simulation of the i-th step of T. Then the theorem follows from Lemma 3.2.1. □

Figure 4: This figure shows the preparation of the tapes of R. On the second last tape R denotes the used cells with the symbols ▷ and ◁. From the point of view of T these symbols are interpreted as the blank symbol on the tape. But from the point of view of R they may be "interpreted" as ∞ from the left and from the right.

Figure 5: This figure shows that the denoted configuration ▷c_i◁ is copied (and collected) to the output tape after the simulation of the i-th step of T.

The previous theorem shows that there is no problem with learning if we use config∞ (or the usual) configurations. But otherwise, as shown in the following two simple examples of config2 configurations (Example 3.2.1 and 3.2.2), the applied transition rule between two consecutive configN configurations may be not uniquely determined by the configN configurations. If we use configN configurations instead of the usual or config∞ configurations, then Lemma 3.2.1 does not hold. In the next subsection a notion of complexity will be based exactly on this property.

Example 3.2.1.
Let c_i = 0^∞ 111 [q> 111 0^∞ be a config∞ configuration and c_{2,i} the corresponding config2 configuration. Then the rules (q, 1) → (q, 1, ←), (q, 1) → (q, 1, →) and (q, 1) → (q, 1, ↑) yield the same config2 configuration c_{2,i+1} = 11[q>11.

Example 3.2.2.
Let c_i = 0^∞ 101 [q> 110 0^∞ be a config∞ configuration and c_{2,i} the corresponding config2 configuration. Then the rules (q, 1) → (q, 0, ←) and (q, 1) → (q, 0, →) yield the same config2 configuration c_{2,i+1} = 10[q>10.

As has already been mentioned in Sect. 3.1, we intuitively use the running time of the learning machines as a complexity measure, which may be formulated as follows:

cc(T, x) = min { t_S(⟨c_i⟩^x_T) | T(x) < ∞, S(R(T, x)) = T },

but it does not seem very helpful because it is probably correlated with the usual time complexity of T, as suggested by Fig. 2. The next type of complexity tells us the first finite N for which Lemma 3.2.1 holds using configN configurations. To be more precise, it is defined as

cc*(T, x) = min { N | T(x) < ∞, S(R(T, x)) = T and Lemma 3.2.1 holds for configN },

which shows different behavior than the previous one, as can be seen in Fig. 6: the growth rate of the investigated cc* values is related not to the running time but rather to the number of ones (see "14", "21" and "1471").

The results shown in Fig. 6 also suggest that it is hopeless to handle the learning problem with the universal learning machine S of Theorem 3.2.2. This justifies the use of agent technology (an agent observes the operation of the investigated TMs) and machine learning algorithms (such as Q-learning) to learn Turing machines instead of searching for suitable configNs for any universal learning machine S.

Figure 6: This figure shows the cc* values of the machines of Fig. 2. The values are computed by the version of the project Samu C. Turing tagged "self-reproducing complexity" (see https://github.com/nbatfai/SamuCTuring), where a manual binary search was also used to determine the last three cc* values. The x-axis is exactly the same as in Fig. 2.

T (rule-index notation) | steps of T(λ) | 1s of T(λ) | cc(T, λ) | cc*(T, λ)
9, 0, 9, 1, 11, 2, 17, 3, 21, 4, 19, 5, 29, 6, 5, 7, 6, 8 | 8264 | 26 | 95048 | 24
9, 0, 9, 1, 11, 2, 5, 3, 20, 4, 17, 5, 24, 7, 29, 8, 15, 9 | 1314 | 14 | 60872 | 7
9, 0, 9, 1, 11, 2, 15, 3, 20, 4, 21, 5, 27, 6, 4, 7, 2, 8 | 12515 | 21 | 463558 | 12
9, 0, 21, 1, 9, 2, 24, 3, 6, 4, 3, 5, 20, 6, 17, 7, 0, 9 | 15583 | 32 | 535050 | 41
9, 0, 9, 1, 12, 2, 15, 3, 21, 4, 29, 5, 1, 7, 24, 8, 2, 9 | 2720928 | 160 | 512623 | 160
9, 0, 11, 1, 12, 2, 17, 3, 23, 4, 3, 5, 8, 6, 26, 8, 15, 9, 5 (Schult's machine) | 134467 | 501 | 1685939 | 664
9, 0, 11, 1, 15, 2, 0, 3, 18, 4, 3, 6, 9, 7, 29, 8, 20, 9, 8 (Uhing's machine) | 2133492 | 1915 | 4365184 | 3816
9, 0, 11, 2, 15, 3, 17, 4, 26, 5, 18, 6, 15, 7, 6, 8, 23, 9, 5 (Uhing's machine) | 2358064 | 1471 | 8368208 | 1961
9, 0, 11, 1, 15, 2, 17, 3, 11, 4, 23, 5, 24, 6, 3, 7, 21, 9, 0 (Marxen and Buntrock's BB5 champion machine) | 47176870 | 4097 | 9833455 | 12287

Table 1: This table numerically shows the cc and cc* values of the investigated machines. The first column shows the given TM in the form of the rule-index notation [Bát15b].

In this paper, we started with two developmental robotic software experiments, Samu Turing [Bát16c] and Samu C. Turing [Bát16b], to learn how Turing machines operate. This subject of the experiments itself enabled us to investigate the theoretical properties of learning. First, we eliminated the developmental robotic processes (for example the habituation-sensitization parts) from our software experiments, and then we introduced the problem of learning and some complexity measures based on it. For some given TMs we also determined these complexities. The cc* of machines of greater sophistication cannot easily be computed by the universal learning machine S of Theorem 3.2.2. This justifies the usage of agent technology and machine learning for learning Turing machines. We have provided only an outline of the proof of Theorem 3.2.2; completing it may be further theoretical computer science work. Further work of a practical robopsychological nature is also needed. For example, we are going to investigate using Samu's neural architecture [Bát15a], Samu mental organs (like MPUs) [BB16] and deep learning to learn how TMs operate.

To return to Neumann's train of thought mentioned in the introduction, it seems interesting to study the case when the learning algorithm is applied to write itself. Let us start from a machine T that halts with x. It follows from Theorem 3.2.2 that R(T, x) = ⟨c_i⟩^x_T and S(R(T, x)) = T. But then we can also learn this learning of T, that is, R(S, ⟨c_i⟩^x_T) = ⟨c_i⟩^{⟨c_i⟩^x_T}_S and S(R(S, ⟨c_i⟩^x_T)) = S. And then we can learn again the learning of the learning of T, that is, to be more precise, R(S, ⟨c_i⟩^{⟨c_i⟩^x_T}_S) = ⟨c_i⟩^{⟨c_i⟩^{⟨c_i⟩^x_T}_S}_S, and so on. If we introduce the notation y_j = ⟨c_i⟩^{…^{⟨c_i⟩^x_T}}_S for the j-fold nested sequence, then we can easily write that cc(S, y_j) < cc(S, y_{j+1}) because t_S(y_j) < t_S(y_{j+1}), but the similar relation between cc*(S, y_j) and cc*(S, y_{j+1}) is an open question at this moment.

It is clear, of course, that further work of a theoretical robopsychological nature is required as well. For example, we are going to find possible relations among the time, space, Kolmogorov and cognitive complexities. We believe that this is a necessary step towards achieving the situation that has been defined as "Programs hacking programs" by Neo in the movie "The Matrix Reloaded".
In the framework of Turing machines and the Busy Beaver problem this quotation has a special meaning, namely: can we program a computer program not only to discover a BB machine but to build one from scratch?

Acknowledgment
The author would like to thank his students in the "High Level Programming Languages" course in the spring semester of 2015/2016 at the University of Debrecen for testing the Samu projects. He would also like to thank the members of some AI-specific communities on Facebook, Google+ and LinkedIn, and especially his group called DevRob2Psy, for their interest.
References

[Bát09] Norbert Bátfai. On the Running Time of the Shortest Programs. CoRR, abs/0908.1159, 2009. http://arxiv.org/abs/0908.1159.

[Bát11] Norbert Bátfai. Conscious Machines and Consciousness Oriented Programming. CoRR, abs/1108.2865, 2011. http://arxiv.org/abs/1108.2865.

[Bát15a] Norbert Bátfai. A disembodied developmental robotic agent called Samu Bátfai. CoRR, abs/1511.02889, 2015. http://arxiv.org/abs/1511.02889.

[Bát15b] Norbert Bátfai. Are there intelligent Turing machines? CoRR, abs/1503.03787, 2015. http://arxiv.org/abs/1503.03787.

[Bát16a] Norbert Bátfai. How to Become a Robopsychologist. GitHub Project, https://github.com/nbatfai/Robopsychology/files/169195/robopsychology.pdf, 2016.

[Bát16b] Norbert Bátfai. Samu C. Turing. GitHub Project, https://github.com/nbatfai/SamuCTuring (visited: 2016-06-04), 2016.

[Bát16c] Norbert Bátfai. Samu Turing. GitHub Project, https://github.com/nbatfai/SamuTuring (visited: 2016-06-04), 2016.

[Bát16d] Norbert Bátfai. SamuBrain. GitHub Project, https://github.com/nbatfai/SamuBrain (visited: 2016-06-04), 2016.

[Bát16e] Norbert Bátfai. SamuKnows. GitHub Project, https://github.com/nbatfai/SamuKnows (visited: 2016-06-04), 2016.

[BB16] Norbert Bátfai and Renátó Besenczi. Robopsychology Manifesto: Samu in His Prenatal Development. Submitted manuscript, 2016.

[CS14] Angelo Cangelosi and Matthew Schlesinger. Developmental Robotics: From Babies to Robots. The MIT Press, 2014.

[Har16] James Harland. Busy Beaver Machines and the Observant Otter Heuristic (or How to Tame Dreadful Dragons). CoRR, abs/1602.03228, 2016. http://arxiv.org/abs/1602.03228.

[ISR00] Gábor Ivanyos, Réka Szabó, and Lajos Rónyai. Algoritmusok. Typotex, 2000.

[LV08] Ming Li and Paul M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 3rd edition, 2008.

[MB90] Heiner Marxen and Jürgen Buntrock. Attacking the busy beaver 5. Bulletin of the EATCS, 40:247–251, 1990.

[Tur36] Alan Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42):230–265, 1936.

[vN51] John von Neumann. The general and logical theory of automata. In L. A. Jeffress, editor, Cerebral Mechanisms in Behavior: The Hixon Symposium, pages 1–41. Wiley, 1951.