Self-Organizing Intelligent Matter: A blueprint for an AI generating algorithm
Karol Gregor, Frederic Besse
DeepMind, UK
{karolg}@google.com

ABSTRACT
We propose an artificial life framework aimed at facilitating the emergence of intelligent organisms. In this framework there is no explicit notion of an agent: instead there is an environment made of atomic elements. These elements contain neural operations and interact through exchanges of information and through physics-like rules contained in the environment. We discuss how an evolutionary process can lead to the emergence of different organisms made of many such atomic elements which can coexist and thrive in the environment. We discuss how this forms the basis of a general AI generating algorithm. We provide a simplified implementation of such a system and discuss what advances need to be made to scale it up further.
1 INTRODUCTION
An AI generating algorithm (Clune, 2019) is a computational system that runs by itself without outside intervention and after a certain amount of time generates intelligence (though the general idea is much older than this reference). Evolution on earth is the only known successful such system thus far. In this paper we propose a computational framework and argue why it might constitute such a general algorithm, while being computationally tractable on current or near-future hardware.

Building such a system successfully will take many iterations and require a number of advances. What we hope to provide, however, is a general procedure, where better and better systems arise as a result of improving the elements of the system and of experimentation, rather than of a fundamentally new algorithm. As an example, we have had such a procedure for supervised learning since the 1980s: neural networks trained by back-propagation and stochastic gradient descent. Reaching the current impressive performance required a number of clever improvements, such as rectified non-linearities, convolutions, batch normalization, attention, residual connections and better optimizers, but the overall algorithm hasn't changed.

1.1 EVOLUTION
Evolution is the primary process by which our algorithm operates. We describe it here.

In machine learning, the word evolution is typically used to describe variations of the following process (Back, 1996). We have a number of individuals and an objective to optimize. We evaluate the individuals, select the ones with good values of the objective, and mutate them to produce the next generation. Over time, individuals that are better at optimizing the objective appear. The use of the word evolution in this context is perhaps unfortunate, as this process is quite unlike the evolution observed in nature (Stanley et al., 2017). The clearest difference we can see is in the outcome. The former results in a small variation of final individuals that are the best at the objective. The latter results in a coexistence of a huge variation of individuals with different behaviors; it is open-ended.

Let us therefore discuss the basic operation of natural evolution. We have an environment built out of elements (atoms) that are organized into bigger units such as individual bacteria or animals, or groups of these. Those classes of units that propagate (Joyce, 1994) into the future (e.g. replicate; we will discuss this in a moment) keep existing, while those that don't propagate cease to exist. There is no objective based on which units are selected for propagation. Different collections of units find different means of propagating.

An important mechanism for the coexistence of a large number of different solutions is niche construction (NicheConstruction, 2020). Different collections of individuals modify or form the environment for one another. For example, a bacterium consumes food and excretes waste products, which modifies the local environment, being either food or a toxic substance for other bacteria. Another example is a prey forming a food source for a predator.
These systems are self-balancing: for example, too many predators means they can't find food and die off, and vice versa. This means that the coexistence of multiple solutions is present (and evolving). Collections of individuals in such systems have different means of propagating, rather than being selected by a global objective.

The lack of an objective, we believe, as argued in (Stanley & Lehman, 2015), is critical and fundamental to the open-ended creation and coexistence of diversity, and runs counter to most developments in machine learning. Conversely, the presence of an objective in a system likely leads to a collapse of diversity. To see that attempting to create an objective is problematic, let us try to suggest one and see what the problems would be. One of the clearest objectives one might propose is to reward an individual for reproduction. However, a predator might then simply kill its offspring, which would increase its chance of making another one. We could try to tweak or find other objectives, but this might lead to unwanted and unforeseen behaviours similar to the one described above. There are other issues. We don't actually make copies of ourselves, but instead mix genes from individuals (sexual reproduction), which we need to select. What do we actually want to reward? In addition, most of the time, propagation of a species depends on individuals working together in a group, often sacrificing some members for the good of the group. What is then the real reproducing unit? This is the reason why the word "propagate" (Joyce, 1994) is more appropriate than "reproduce". What really happens is that those classes of groups of elements that are set "a certain way" propagate, and those that are not, don't. Note that this does not contradict the evolution of an intrinsic reward, which is an evolved means of finding a good policy during the lifetime of a given individual.

Another example of an evolutionary process is our society.
People don't have children in proportion to some objective the society has set, and there isn't just one job or hobby that we all converge on as the "best". People engage in a large number of jobs and hobbies. At the same time, memes, values, work practices, company structures and many other emergent concepts propagate. All these things coexist, both cooperating and competing.

1.2 PRINCIPLES
The field of artificial life aims at producing life and an evolutionary process inside a computer (Langton, 1997; Ray, 1991; Lenski et al., 2003; Sims, 1994; Yaeger et al., 2011; Gras et al., 2009; Soros & Stanley, 2014); see (Aguilar et al., 2014) for a review. In seminal work on Tierra (Ray, 1991), Thomas Ray created an artificial life system in the substrate of assembly instructions in computer memory. The set of instructions is executed by a number of heads, and one organism corresponds to one head. The system was initialized with a handcrafted sequence of instructions that, when executed, copies itself to another part of the memory. The executions undergo mutations. Some of these result in organisms that are unable to replicate, while some others get better at it. Some organisms find ways to use other organisms' copying mechanisms to copy, forming parasites. Then resistance to parasites evolves, and hyper-parasites and phenomena of sociality and cheating are observed.

This process eventually peters out, and the quest ever since has been to create a system that is truly open-ended, where complexity keeps increasing without bounds (Standish, 2003). A number of principles that characterize an open-ended process have been proposed (Soros & Stanley, 2014; Taylor et al., 2016). Here we select two that we find the most important (points 2 and 3) and introduce two new ones (points 1 and 4).

• There should be no built-in notion of an individual and no built-in operation for reproduction of an individual. Instead, these should be emergent properties of collections of units, composing new collections of units or themselves.
• The evolution of new (here emergent) individuals should create novel opportunities for the survival of others (Soros & Stanley, 2014).
• The potential size and complexity of the individuals' phenotypes should be (in principle) unbounded (Soros & Stanley, 2014).
• To tractably obtain intelligent agents, fundamental neural operations should be basic building blocks of the environment.

The first property is absent in the majority of works on artificial life, except a few that we review below. However, this property is quite close to being true in Tierra and Avida (Lenski et al., 2003). A given individual (organism), consisting of a sequence of instructions, actually has to construct a new one (create the new sequence of instructions). Then, however, a new agent is declared, a new head is allocated for it, and there is a distinction between operations within and between individuals. Properties two and three should really be consequences of property one, which we will see once we describe the system in section 2. Property four is introduced for tractability and again will become clearer below.

There are other approaches that have studied diversity in evolution. To prevent the collapse of diversity in objective-based evolution, ideas such as MAP-Elites (Mouret & Clune, 2015) or quality diversity (Pugh et al., 2016) have been proposed, which explicitly try to keep a diverse set of solutions. These assume a small handcrafted space of variables that define which individuals are different. However, given such a space, they show that searching for a space of diverse solutions leads to a better result on the final objective than optimizing the objective directly, as the search process is able to step through solutions that are not obviously on the direct path to the objective.
To remove an objective, (Soros & Stanley, 2014; Brant & Stanley, 2017) propose to give every organism a chance to reproduce, but only if it satisfies a minimal criterion, such as a sufficient amount of energy. In the latter, they consider a set of mazes and their solvers. A maze is propagated to the next generation if there are solvers solving it, and a solver is propagated if there are mazes it can solve. This results in a coexistence of many solutions in the population. The system, however, behaves somewhat like a random walk through solvable mazes, and it would be good to find a system with a stronger selection pressure.

Another approach to creating an open-ended system is to co-evolve one agent that designs an environment (sets the layout of things and such) and another one that tries to solve it (Wang et al., 2020; Racaniere et al., 2019). The former maintains a collection of environment-agent pairs, and a new such pair is allocated if it is sufficiently far from the pairs in the collection according to a pre-specified objective that does not depend on the details of the environment (but on the ability of agents to solve the environment).

In the system that we propose there is no special agent designing the environment; there is actually no concept of an agent at all, and instead there is only an environment where agent-like organisms can emerge and reshape the environment from within. Nor is there any explicit minimal criterion for an organism, since there is no explicit copy-of-individual operation, but such a criterion will appear for any emergent individual.
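The maze/solver co-propagation just described can be illustrated with a toy loop. This is a sketch only: the random `solves` table, the refill-by-copying step, and the restart-on-extinction rule are illustrative assumptions, not the model of Brant & Stanley (2017).

```python
import random

random.seed(0)

# Toy minimal-criterion coevolution: a maze propagates if at least one
# solver solves it, and a solver propagates if it solves at least one maze.
N = 8
solves = {(s, m): random.random() < 0.7 for s in range(N) for m in range(N)}

mazes, solvers = list(range(N)), list(range(N))
for _ in range(20):
    # keep only entities meeting the minimal criterion (or restart if extinct)
    mazes = [m for m in mazes if any(solves[s, m] for s in solvers)] or list(range(N))
    solvers = [s for s in solvers if any(solves[s, m] for m in mazes)] or list(range(N))
    # survivors are copied (here verbatim) to refill the populations
    mazes += [random.choice(mazes) for _ in range(N - len(mazes))]
    solvers += [random.choice(solvers) for _ in range(N - len(solvers))]

assert len(mazes) == N and len(solvers) == N
```

Note that nothing in the loop ranks survivors against each other; any maze/solver pair meeting the criterion persists, which is why such systems retain diversity but exert weak selection pressure.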
2 PROPOSED SYSTEM
The real world is built out of elementary particles that interact and compose bigger entities. Our proposed environment (an aspiring AI generating algorithm) is as follows. The environment is built out of elements, but at a much coarser scale. Each element contains a neural operation. This can be, for example, a matrix multiplication, an outer product, or more likely a sequence of such operators comprising a mini neural network. The elements interact with each other through some form of underlying rules, a type of physics, and through direct communication of neural states.

There can be various implementations of this system. In section 3 we provide a grid-world implementation with basic elements lying on a grid, communicating via propagating signals or an attention mechanism, and with underlying physics implementing energy and chemical-like exchanges. Another example could be elements forming rigid pieces in a three-dimensional space that can be attached using joints, that contain the neural operations, that interact through exchange of signals with nearby attached pieces, and that set the torques on the joints. There could be several types of elements in the system, and not all need to have neural networks inside.

In section 3 we provide a grid world implementation that highlights a number of important properties. Along the way we discuss what advances need to be made to make this system powerful. However, the potential of the proposed system is unbounded, and here are some of the features that it supports. Larger units composed of several elements can be formed either by physical attachment (like a robot) or simply as a set of units that decide to communicate and form a whole. There is no limit to the potential size of these units. These units can propagate in a number of ways: they can grow by taking over other elements in the environment (a colony); they can replicate by assembling new copies, moving appropriately collected elements into place (e.g. a robot assembling a copy of itself from pieces) or self-assembling; or they can produce wholly different units that either implement specialized functions (a useful machine) or are even better than their predecessors. The latter likely requires intelligence.

2.1 CAPACITY FOR INTELLIGENCE

In this subsection we discuss why the computational system just proposed has the capacity to represent general intelligence. We provide two arguments. First, any neural algorithm in machine learning that we have created, and likely will create in the future, can be written as a sequence of operations, such as additions, matrix multiplications, outer products and non-linearities, operating on states which are tensors. An example is the sequence resulting from the forward, backward, and optimizer operations of a neural network. AutoML-Zero (Real et al., 2020) realized this, directly searched for the sequence of such operators along with the connectivity to the states on which they operate, and was able to learn basic neural algorithms. Since these operators are fundamental building elements of our environment, and these elements can be made to communicate with arbitrary connectivity, all neural algorithms can be implemented in our system as well.

Second, the human brain is capable of general intelligence, and its computation closely resembles an online recurrent network (not trained by back-propagation through time). That is, it can be approximated by a function F that updates neural and synaptic values online: h_t, W_t = F(h_{t-1}, W_{t-1}, x_t, v), where h_i is the cell state of neuron i, W_ij is the state of the synapse connecting neurons i and j, x is an input, and v represents hyper-parameters such as connectivity, coefficients, and nonlinearities. To represent actual neural computations well enough (note that we are not interested in a faithful representation but in something with a similar algorithm and computational power) we likely need to represent h_i and W_ij by small vectors, as suggested in (Gregor, 2020; Bertens & Lee, 2019). These computations are local, and therefore to search for them, we need to search for relatively simple functions (with few hyperparameters).
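As a deliberately simple concrete instance of such an update, here is a sketch of an online recurrent rule of the form h_t, W_t = F(h_{t-1}, W_{t-1}, x_t, v). The tanh recurrence and the Hebbian-style outer-product weight update are illustrative assumptions, not the specific update being searched for.

```python
import numpy as np

def online_update(h, W, x, v):
    """One tick of an online update h_t, W_t = F(h_{t-1}, W_{t-1}, x_t, v).
    v packs the hyper-parameters; here just a plasticity coefficient 'eta'."""
    h_new = np.tanh(W @ h + x)                 # new activations from recurrent weights and input
    W_new = W + v["eta"] * np.outer(h_new, h)  # local (post x pre) Hebbian-style plasticity
    return h_new, W_new

rng = np.random.default_rng(0)
n = 8
h, W = rng.standard_normal(n), 0.1 * rng.standard_normal((n, n))
for _ in range(5):
    h, W = online_update(h, W, rng.standard_normal(n), {"eta": 0.01})
assert h.shape == (n,) and W.shape == (n, n)
assert np.all(np.abs(h) <= 1.0)  # tanh keeps activations bounded
```

The key property is locality: both the activation and the weight change at synapse (i, j) depend only on quantities available at that synapse, which is what makes the space of candidate functions F small enough to search.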
We can search for them automatically, as in AutoML-Zero or other neural architecture search works (Zoph & Le, 2016; Pham et al., 2018; Liu et al., 2018; Elsken et al., 2018). The important point to remember from this is that there likely exists an online neural update, of power at least as large as that of the human brain, on a system of a similar size in terms of computational capacity. We could implement the computation of a given element of the environment by a layer of such neurons. During the run of the environment, other elements can set the hyper-parameters v as well as h and W of a given element. Those groups that have elements with the right settings of these parameters, for example v's that implement a good learning algorithm (the update of W), should enable a better propagation of this group compared to others.

2.2 AGENT HYPOTHESIS

In our system there is no separation between an agent and an environment (AgentHypothesis, 2020); there is only an environment. The elements themselves may or may not form units of evolution (Smith, 1986; Szathmáry et al., 2005): entities that multiply, show heredity, and where the heredity is not exact. In the former case, they could move autonomously, collect energy and replicate, and form bigger aggregates or replicating organisms because that provides an advantage. However, we propose to aim for the latter, where a certain minimal number of simpler cooperating units is needed for them to propagate themselves.

2.2.1 RELATION TO OTHER WORKS
There is a large body of work related in one way or another to self-organization, and we can only review the works most relevant to our paper. The study of the origins of life aims to understand how life evolved from non-living things. At some point on the early earth, there were only simple combinations of elements: atoms combined into simple molecules; at a later time, there was the last universal common ancestor (Theobald, 2010) of all current life, which was already a rather complex organism containing the core structures of bacteria. There are theories of how life emerged in the period in between, such as an auto-catalytic system enclosed in a membrane (Maturana & Varela, 1991; Gánti, 2003) or a self-replicating molecule (Joyce, 2002).

Such self-replication was studied in models, for example in (Penrose & Penrose, 1957; Virgo et al., 2012). In the latter, they consider a physical system of shapes and succeed at finding length-two chains that replicate. However, they find that the shapes have to have very specific features, arguing that such shapes are unlikely to emerge spontaneously from chemistry, which perhaps makes this process less likely to be the origin of life.

Simple substrates to study artificial life are cellular automata and related particle systems (Gardner, 1970; Chan, 2020; Schmickl et al., 2016; Sayama, 2009; Ventrella, 2020; Nichele et al., 2017). In the former, elements are cells on a grid; in the latter, they have real-valued coordinates and move. Their behavior is governed by simple rules that update their state based on their previous state and that of near neighbours. A goal is to find a set of rules that give rise to life, exhibiting replication, variation and heredity. This has proven quite difficult, and while simple self-replicating structures have been found, they don't satisfy these criteria of life. It is then especially difficult to imagine how to obtain intelligent life from such simple rules in a tractable fashion.
The proposal we put forward in this paper is to use the general neural networks of section 2.1 as the key parts of the elements, which can together form large neural networks, but with comparatively far fewer elements than basic cellular automata/particle systems. This is the reason for the word "intelligent" in the title of the paper. In addition, obtaining complex behavior should be much easier than tweaking cellular automata rules, since we are starting with good learners; knowing the current capabilities of reinforcement learning agents, we can use them to jump-start the system.

Cellular automata can be implemented using convolutional neural networks (Wulff & Hertz, 1993; Gilpin, 2019). In the study of morphogenesis (self-assembly of a given object/pattern), (Mordvintsev et al., 2020) trained a convolutional neural network that respects the structure of cellular automata to produce a given pattern starting from a single cell or a number of other patterns. A similar approach was pursued using compositional pattern-producing networks (Nichele et al., 2017). These works, however, are not building artificial life systems.

Finally, there is a body of work on swarm systems and self-assembling robots. The former studies how swarms of robots can coordinate their behavior to accomplish tasks such as disaster relief, or studies emergent swarm behavior. The latter, which is in some ways the closest to our proposal, studies how robots can self-assemble from smaller pieces. In (Weel et al., 2014), they consider a single type of (simulated) robot piece from which varying bodies are assembled through a process of evolution. There are some differences from what we propose: there is an explicit notion of an individual that is constructed in a central birthing facility according to an evolved template, an individual has a central brain (rather than a composed one), and individuals replicate if they meet after a certain minimal distance from the birthing facility.
While they do indeed aim for objective-free evolution, the individuals do tend to be selected based on how quickly they can get a certain distance from the birthing area. Finally, in (Mathews et al., 2017; Pathak et al., 2019), they build robotic systems (real and simulated, respectively) out of pieces that self-assemble, at the same time form a bigger computational system, and are trained to accomplish a task.
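The earlier observation that cellular automata can be expressed as convolutional networks is easy to make concrete: Conway's Game of Life is a 3x3 neighbour sum (a convolution with a ring-shaped kernel) followed by a pointwise rule. In this sketch, np.roll on a torus stands in for the convolution.

```python
import numpy as np

def life_step(grid):
    """One Game of Life step: a 3x3 neighbour sum (the 'convolution')
    followed by a pointwise birth/survival rule; np.roll gives periodic
    boundary conditions."""
    nbrs = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(int)

# A glider on a 6x6 torus translates by (1, 1) every four steps.
g = np.zeros((6, 6), int)
g[[0, 1, 2, 2, 2], [1, 2, 0, 1, 2]] = 1
g4 = g
for _ in range(4):
    g4 = life_step(g4)
assert np.array_equal(g4, np.roll(np.roll(g, 1, 0), 1, 1))
```

The same shape (a linear neighbourhood aggregation followed by a pointwise nonlinearity) is exactly one convolutional layer, which is why neural cellular automata such as (Mordvintsev et al., 2020) are a natural generalization.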
3 GRID VERSION OF SIM AND GENERAL PROPERTIES

This section introduces an instance of self-organizing intelligent matter (SIM). It also serves as a discussion of various points relating to SIM and gives more reasons why we believe it can form an AI generating algorithm.

As discussed in subsection 2.2, there is no built-in notion of an agent, and there really is just an environment. Given that, it feels unnatural to implement the system on two different platforms, as is commonly the case: one for the physical part, such as a physics simulator, and one for the neural part, a neural network framework such as TensorFlow, PyTorch or Jax. Instead, we propose to build such a system on a single platform. Because we would like to produce intelligent behavior, we need to run neural networks efficiently, and therefore the system needs to be implemented on one of the latter platforms. Because of its flexibility, we choose Jax.

Jax operates on tensors, which we use to store the elements. The elements need to interact and have the ability to form flexible aggregates of arbitrary size (property 3, subsection 1.2). We first focus on neural operations and describe the ideal system: the one we aspire to. We then talk about the steps we took towards implementing that system.

3.1 THE IDEAL SYSTEM
In our system, we place a neural network at every point of an m × m grid (m up to 400 in our experiments), but in general, a different and flexible connectivity can be used. The ideal system would use the networks described in subsection 2.1 since, as we argued, they are likely able to compose general learners. At every point in time such a network updates activations, weights and possibly hyperparameters: h_t, W_t, v_t = F(h_{t-1}, W_{t-1}, v_{t-1}, x_t). Each network outputs a number of "actions" that affect other elements, the cells of the grid. One fundamental action a network can take is to set the values of h, W, v of neighbouring cells. This action can be executed if certain conditions are satisfied, such as the cell having enough energy; the energy concept and its updates in our system are described later (this is a minimal criterion for an element, not an emergent individual). What does such an action allow? We give a few examples.

• The first example is to copy v with noise, set the W's and h's to some random values, and keep v constant through the lifetime of any cell. This corresponds to creating a new mini-brain (consisting of one cell) that has a very similar structure to the source and is untrained; in other words, a replication operation, but without replicating the learned content, similar to the way children are created.
• The second example is to copy all the variables, with some noise.
This creates a copy with the same knowledge as the original cell.
• The third example is somewhere in between, transferring some knowledge and not other knowledge.
• The fourth example is to have a joint set of weights for an aggregate of cells, and to select those which are used in a given copy, analogous to cell differentiation in animal bodies or bee specialization in a colony of bees.
• The fifth example is setting the parameters (or some of them) to something quite different, in effect programming them, so that new aggregates of cells perform useful functions for the original, such as collecting energy; this creates useful machines for the original aggregate.
• In the final example, again, the new cells are programmed, but this time with the goal of creating new aggregates that are better than the original. The aggregates can be thought of as both organisms and machines; there is no distinction between the two in our environment. Here, machines create better machines by designing both better brains and better bodies. Such a process requires intelligence, which is exactly what we are aiming for. We believe there will be a point in the evolution of the environment when this happens, creating a self-reinforcing loop of improving intelligence. Having neural operations as the fundamental elements of the environment is the reason we believe that such a state can be achieved in a computationally tractable fashion in this type of system.

3.2 OUR PROPOSED STEPS TOWARDS THAT SYSTEM
Let us come back to describing the system we built. Since more powerful algorithms are not yet available, we use standard recurrent neural networks. We cannot use reinforcement learning to train these networks in a direct way: RL is an algorithm for optimizing an objective, which we fundamentally don't have here. We could try evolving such an objective, and such an approach might be tractable. Instead we resort to what is usually done in these situations: simply evolving the weights. We use a standard tanh recurrent network, and the operation of setting the weight matrices and biases of a new cell is simply a copy with mutation. A common alternative representation is compositional pattern-producing networks (Stanley, 2007).

The neurons, and all the other variables mentioned below (such as energy and chemicals), are placed on an m × m grid, with m up to 400 (Figure 1), comprising up to 160k networks. The second action that a network can output is to move to a neighbouring cell, which will swap the content of this cell with the new one (we experiment with allowing and disallowing this action).

Figure 1: Diagram of the grid version of the system.

It is difficult to implement rigid bodies on a grid in Jax. To allow coordinated movements of larger units, we let the neurons communicate. For example, they can communicate to all the cells belonging to a given group to move. One way to communicate is via a local attention mechanism, where a given cell can read the h values of a cell in its neighborhood. This would also allow an aggregate of cells to implement a deep neural network, with different units representing different layers of a deep network. A single pass through such a network takes a number of world updates. In our case we opt for a simpler communication mechanism, because the locally connected computation which local attention would use is not available in Jax. We create four signal layers that move in four directions (right, up, left, down).
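These directional signal layers can be sketched in a few lines; np.roll on a toroidal grid stands in for the per-tick translation, and the scalar per-cell signal is a simplification of writing the full h vector.

```python
import numpy as np

def step_signals(signals):
    """Translate the four signal layers one cell per tick in their
    respective directions (right, up, left, down)."""
    right, up, left, down = signals
    return (np.roll(right, 1, axis=1),   # moves right (column index +1)
            np.roll(up, -1, axis=0),     # moves up (row index -1)
            np.roll(left, -1, axis=1),   # moves left
            np.roll(down, 1, axis=0))    # moves down

m = 5
src = np.zeros((m, m))
src[2, 2] = 1.0                          # one cell writes into all four layers
sigs = step_signals((src.copy(), src.copy(), src.copy(), src.copy()))
# after one tick, the signal has arrived at the four neighbouring cells
assert sigs[0][2, 3] == 1.0 and sigs[3][3, 2] == 1.0
```

Because each layer only translates, information propagates one cell per world update; a message crossing an aggregate of width k takes k updates, which is the cost of this simpler mechanism relative to local attention.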
Each cell writes h into all the layers, which then move, and each cell can then read the content of all four signal layers.

Next we introduce a type of physics into the system, other than motion. We consider a field of energy and fields of n_c types of chemicals C_1, ..., C_{n_c} at every location of the grid. A given cell can pull or push all these quantities in each of the four directions, which costs it energy proportional to the push or pull. Neighbouring cells can "fight" for these quantities, and if the energy of a cell falls below a threshold, the cell is declared dead and its weights are set to zero. The cell also produces enzymes Z_1, ..., Z_{n_c} that control the reactions in the respective order (e.g. Z_1 controlling C_1 → C_2), that cost energy to make, and that sum to at most one. A given reaction releases energy equal to C_i Z_i, except the last reaction, which releases zero energy. As an example, to maximize energy, the cell should have one enzyme, say Z_1, pull chemical C_1 from outside (assuming some other cell is producing it), convert it to C_2 and excrete it to another cell. The cell also has a maximum lifetime, and therefore it has to do something non-trivial (at least produce energy and copy) to propagate its information. If the population of live elements falls below 10%, we reawaken new ones with random weights. We also note that in our implementation the elements themselves form autonomous units capable of self-replication; however, as discussed in section 2.2, in an ideal system they wouldn't.

There are two other variations that we experimented with. First, we wanted to know how simple one can make the system and still observe interesting behavior. We created a pure energy system, where instead of having chemicals, energy increases at every location up to some threshold. In the second variation we experimented with a different implementation.
Instead of having a network at every point of the grid, we consider n (1600) elements on an m × m grid that can move around. This time we used an attention mechanism for communication between elements within some distance. We again only had a background energy, and the elements don't die or need anything to be "alive". They can just "sit around" (as atoms can). However, those that are active can acquire energy by moving (energy gets automatically absorbed) and copy their weights onto others that have a lower energy. Those that do so are the ones that propagate and take over the system.

4 EXPERIMENTS
We run the system explained in the previous section. A more detailed explanation and the settings of the hyper-parameters are in the Appendix. The code will be released in the near future. What we observe is an exciting diversity among a series of runs, with snapshots shown in Figure 2. This is best viewed in the accompanying video, which also shows the run of the system from the start, showing competition among various classes of elements.

Figure 2: Results of runs. Top row: we selected three random weights and plotted them in RGB channels; this way, elements with similar weights will have a similar colour and those with different weights will likely have different colours. Bottom row: the first three chemicals plotted in RGB channels. Discussion: we see a lot of diversity in the runs (best viewed in the accompanying video); we often see coexistence of two species, seen as two different colours in the same region of space; we also see that the elements moved the chemicals around, creating regions of high and low densities; in the top left square, the region of low density is occupied by a different species than the region of high density. First graph: fraction of elements alive as a function of time from the start of the run (random new weights with no evolution in this run only); we see oscillatory behavior resulting from the dynamics of two species. Second graph: energy content of elements in the pure energy system of moving elements that need no energy to survive; however, those elements that collect energy can copy weights onto those with less energy, and those that do so propagate.

Looking closely at Figure 2 top, we see in several regions a stable coexistence of two species of elements, represented by dots of different colour in the same region of space. These persist for long amounts of time, as can be seen from the videos, meaning they found a way to coexist, creating niches for one another.

Furthermore, looking at Figure 2 top left, we see a light yellow region containing two species, and an orange region containing a different, third species. We see that the chemicals have been moved to sub-regions of space, and that the species of the orange region live in a place with lower chemical density. Thus the elements created niches by modifying the density of chemicals in the environment. The challenge is not simply to be able to reproduce alone, but in the presence of others.

There are other types of behaviour we observe. In a smaller system, where we diffused the chemicals everywhere quickly and turned off evolution for stability (weights are reset if the density of live elements is less than 10%), we observed oscillations in the populations of two species, Figure 2 middle (and in the video), reminiscent of the Lotka-Volterra system (Lotka, 2002). The behavior in the pure energy grid system usually results in one species, but there are instances of two. This is similar for the version with neural elements not living on a grid. In Figure 2 right we see that in this system, despite not having any survival notion or an explicit selection for energy, the elements learn to collect energy, as this allows them to copy their weights onto others, which causes them to propagate.

Videos: best viewed by downloading (not viewing on the site): https://bit.ly/3nb6MCI (3.2Gb). Compressed version (more blurry): https://youtu.be/ifVjzhWL9ro

5 DISCUSSION
In this paper we proposed a framework for achieving intelligence by an evolutionary process in an environment built out of interacting elements implementing computationally efficient and general learning. Only time will tell whether this framework is capable of such a feat. There are some critical advances that need to be made, most notably creating general neural updates that can be evolved, or otherwise designed. Other directions for development are the way these elements interact, their embedding in a space, and the underlying physics.

We have created a version of this system that we believe has all the core necessary components, or would have if we had more powerful learning updates. It could form a starting point for the developments outlined in the previous paragraph. We summarize the core properties of this system that are general to all instances of SIM. There are elements containing neural networks. These elements can interact and communicate, forming larger units, in effect implementing larger networks. There is no objective based on which anything is selected. Instead, those compositions of units that find ways to propagate, will. There is no distinction between organisms and machines. The elements can write into other elements: they can do this by copying, they can write other information to build machines that are useful for their creators, or they can create whole new autonomous machines. The machines can create both new "bodies" (functional compositions utilizing physics) and new brains creating better algorithms than themselves.

A very intriguing question is what physics is necessary for an explosion of diversity and a rise of intelligence. In the real world, the elements are elementary particles such as quarks and electrons. The former combine into protons and neutrons (there are a few other particles such as photons), and these combine into atoms.
A few core atoms, primarily carbon, hydrogen, oxygen, nitrogen and phosphorus, form a wide diversity of molecules in the form of proteins and other types of molecules, eventually building cells as the basic units of life. What are the core rules that could give rise to complex life in the system we propose, where the fundamental units themselves can already exhibit complex interaction and behavior? Does one need to introduce a complex chemistry or classical physics? An intriguing possibility that we would like to explore in the future is whether any built-in complexity is even necessary. What if we have the brain elements and only one basic rule, that of energy? Will natural selection, with such general learners, progressively construct more and more complex structures that outsmart one another? These are some of the exciting questions we plan to study in the future.

REFERENCES
AgentHypothesis. http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/agenthypothesis.html, 2020.
Wendy Aguilar, Guillermo Santamaría-Bonfil, Tom Froese, and Carlos Gershenson. The past, present, and future of artificial life. Frontiers in Robotics and AI, 1:8, 2014.
Thomas Back. Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, 1996.
Paul Bertens and Seong-Whan Lee. Network of evolvable neural units: Evolving to learn at a synaptic level. arXiv preprint arXiv:1912.07589, 2019.
Jonathan C Brant and Kenneth O Stanley. Minimal criterion coevolution: a new approach to open-ended search. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 67–74, 2017.
Bert Wang-Chak Chan. Lenia and expanded universe. arXiv preprint arXiv:2005.03742, 2020.
Jeff Clune. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence. arXiv preprint arXiv:1905.10985, 2019.
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural architecture search: A survey. arXiv preprint arXiv:1808.05377, 2018.
Tibor Gánti. The principles of life. Oxford University Press, 2003.
Martin Gardner. Mathematical games: The fantastic combinations of John Conway's new solitaire game "Life". Scientific American, 223(4):120–123, 1970.
William Gilpin. Cellular automata as convolutional neural networks. Physical Review E, 100(3):032402, 2019.
Robin Gras, Didier Devaurs, Adrianna Wozniak, and Adam Aspinall. An individual-based evolving predator-prey ecosystem simulation using a fuzzy cognitive map as the behavior model. Artificial Life, 15(4):423–463, 2009.
Karol Gregor. Finding online neural update rules by learning to remember. arXiv preprint arXiv:2003.03124, 2020.
Gerald F Joyce. The antiquity of RNA-based evolution. Nature, 418(6894):214–221, 2002.
J Joyce. Foreword in: Origins of life: The central concepts. W. Deamer and GR Fleischaker, eds, 1994.
Christopher G Langton. Artificial life: An overview. MIT Press, 1997.
Richard E Lenski, Charles Ofria, Robert T Pennock, and Christoph Adami. The evolutionary origin of complex features. Nature, 423(6936):139–144, 2003.
Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.
Alfred J Lotka. Contribution to the theory of periodic reactions. The Journal of Physical Chemistry, 14(3):271–274, 2002.
Nithin Mathews, Anders Lyhne Christensen, Rehan O'Grady, Francesco Mondada, and Marco Dorigo. Mergeable nervous systems for robots. Nature Communications, 8(1):1–7, 2017.
Humberto R Maturana and Francisco J Varela. Autopoiesis and cognition: The realization of the living, volume 42. Springer Science & Business Media, 1991.
Alexander Mordvintsev, Ettore Randazzo, Eyvind Niklasson, and Michael Levin. Growing neural cellular automata. Distill, 5(2):e23, 2020.
Jean-Baptiste Mouret and Jeff Clune. Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909, 2015.
NicheConstruction. https://nicheconstruction.com/, 2020.
Stefano Nichele, Mathias Berild Ose, Sebastian Risi, and Gunnar Tufte. CA-NEAT: evolved compositional pattern producing networks for cellular automata morphogenesis and replication. IEEE Transactions on Cognitive and Developmental Systems, 10(3):687–700, 2017.
Deepak Pathak, Christopher Lu, Trevor Darrell, Phillip Isola, and Alexei A Efros. Learning to control self-assembling morphologies: a study of generalization via modularity. In Advances in Neural Information Processing Systems, pp. 2295–2305, 2019.
Lionel S Penrose and Roger Penrose. A self-reproducing analogue. Nature, 179(4571):1183–1183, 1957.
Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018.
Justin K Pugh, Lisa B Soros, and Kenneth O Stanley. Quality diversity: A new frontier for evolutionary computation. Frontiers in Robotics and AI, 3:40, 2016.
Sébastien Racaniere, Andrew K Lampinen, Adam Santoro, David P Reichert, Vlad Firoiu, and Timothy P Lillicrap. Automated curricula through setter-solver interactions. arXiv preprint arXiv:1909.12892, 2019.
Thomas S Ray. Evolution and optimization of digital organisms. Scientific Excellence in Supercomputing: The IBM 1990 Contest Prize Papers, 1991.
Esteban Real, Chen Liang, David R So, and Quoc V Le. AutoML-Zero: Evolving machine learning algorithms from scratch. arXiv preprint arXiv:2003.03384, 2020.
Hiroki Sayama. Swarm chemistry. Artificial Life, 15(1):105–114, 2009.
Thomas Schmickl, Martin Stefanec, and Karl Crailsheim. How a life-like system emerges from a simple particle motion law. Scientific Reports, 6:37969, 2016.
Karl Sims. Evolving 3D morphology and behavior by competition. Artificial Life, 1(4):353–372, 1994.
John Maynard Smith. The problems of biology. 1986.
L Soros and Kenneth Stanley. Identifying necessary conditions for open-ended evolution through the artificial life world of Chromaria. In Artificial Life Conference Proceedings 14, pp. 793–800. MIT Press, 2014.
Russell K Standish. Open-ended artificial evolution. International Journal of Computational Intelligence and Applications, 3(02):167–175, 2003.
Kenneth O Stanley. Compositional pattern producing networks: A novel abstraction of development. Genetic Programming and Evolvable Machines, 8(2):131–162, 2007.
Kenneth O Stanley and Joel Lehman. Why greatness cannot be planned: The myth of the objective. Springer, 2015.
Kenneth O Stanley, Joel Lehman, and Lisa Soros. Open-endedness: The last grand challenge you've never heard of. While open-endedness could be a force for discovering intelligence, it could also be a component of AI itself, 2017.
Eörs Szathmáry, Mauro Santos, and Chrisantha Fernando. Evolutionary potential and requirements for minimal protocells. In Prebiotic Chemistry, pp. 167–211. Springer, 2005.
Tim Taylor, Mark Bedau, Alastair Channon, David Ackley, Wolfgang Banzhaf, Guillaume Beslon, Emily Dolson, Tom Froese, Simon Hickinbotham, Takashi Ikegami, et al. Open-ended evolution: Perspectives from the OEE workshop in York. Artificial Life, 22(3):408–423, 2016.
Douglas L Theobald. A formal test of the theory of universal common ancestry. Nature, 465(7295):219–222, 2010.
Ventrella. http://ventrella.com/Clusters/, 2020.
Nathaniel Virgo, Chrisantha Fernando, Bill Bigge, and Phil Husbands. Evolvable physical self-replicators. Artificial Life, 18(2):129–142, 2012.
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, and Kenneth O Stanley. Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. arXiv preprint arXiv:2003.08536, 2020.
Berend Weel, Emanuele Crosato, Jacqueline Heinerman, Evert Haasdijk, and AE Eiben. A robotic ecosystem with evolvable minds and bodies. In , pp. 165–172. IEEE, 2014.
NH Wulff and J A Hertz. Learning cellular automaton dynamics with neural networks. In Advances in Neural Information Processing Systems, pp. 631–638, 1993.
Larry Yaeger, Virgil Griffith, and Olaf Sporns. Passive and driven trends in the evolution of complexity. arXiv preprint arXiv:1112.4906, 2011.
Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.

APPENDIX
A.1 DETAILED DESCRIPTION OF THE SYSTEM

We describe the grid implementation of the system in detail here. We consider a grid of sizes varying from × to × . At every location of the grid we have the following variables: energy (a real scalar), chemicals (n_c real scalars, typically 4, but up to 9), and enzymes (n_c real scalars). We also have signals carrying information about various variables, as we describe below. Finally, we have recurrent neural network variables (we describe the network next): hidden-layer activations of size n_h (typically 16), input-to-hidden weights, hidden-to-hidden weights, hidden biases, hidden-to-action weights and action biases.

Next, we describe the neural network operations. At every point in time, except the special times described below, we apply a classic recurrent neural network update to the activations while keeping the weights fixed: h_t = tanh(W_x x_t + W_h h_{t-1} + b_h); â_t = W_a h_t + b_a, where the W's are weights, h are activations and â are the variables from which actions are computed (as described below). In the above formula we dropped the time index from the W's as they are not updated at this time. The W's were updated at the following times: if one cell (grid element) took a specific action (copy + mutate) aimed at a neighbouring cell, if it had sufficient energy (a threshold), and if the neighbouring cell was empty (meaning it had zero energy), the weights and biases were copied, with added noise, to the other cell.

As mentioned in the text, we use this update because we have not yet discovered good forms of a general update. In Sections 2.1 and 3.1 we propose that such an update would take the form h_t, W_t, u_t = F(h_{t-1}, W_{t-1}, u_{t-1}), where u_t are the hyperparameters of the update rules. The simplest analogue to the previous paragraph would be that at every point in time, except the same special times, h and the W's are updated (so the network can learn, i.e. update its weights) while the u's are fixed.
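The per-step recurrent update and the copy-with-mutation event described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the layer sizes, initialization scale, and mutation noise are illustrative placeholders, not the paper's values, and the names (`make_cell`, `step`, `copy_mutate`) are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_cell(n_in, n_h, n_a):
    """One grid cell's network: weights, biases, and recurrent activations."""
    return {
        "Wx": rng.normal(0, 0.1, (n_h, n_in)),  # input-to-hidden weights
        "Wh": rng.normal(0, 0.1, (n_h, n_h)),   # hidden-to-hidden weights
        "bh": np.zeros(n_h),                    # hidden biases
        "Wa": rng.normal(0, 0.1, (n_a, n_h)),   # hidden-to-action weights
        "ba": np.zeros(n_a),                    # action biases
        "h": np.zeros(n_h),                     # recurrent activations
    }

def step(cell, x):
    """One update with weights fixed: h_t = tanh(Wx x + Wh h + bh); a_hat = Wa h_t + ba."""
    cell["h"] = np.tanh(cell["Wx"] @ x + cell["Wh"] @ cell["h"] + cell["bh"])
    return cell["Wa"] @ cell["h"] + cell["ba"]

def copy_mutate(src, dst, noise=0.01):
    """Copy weights and biases (not activations) into an empty cell, with added noise."""
    for k in ("Wx", "Wh", "bh", "Wa", "ba"):
        dst[k] = src[k] + rng.normal(0, noise, src[k].shape)
    dst["h"] = np.zeros_like(dst["h"])  # the new cell starts with blank activations
```

The energy and emptiness checks that gate `copy_mutate` are left to the surrounding simulation loop.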
Finally, during the special times, the u's are not merely copied: the network can generate new u's in the neighbouring cell, as well as new h and W's, allowing the originator cell to program the new one.

Next let us describe the energy-chemical dynamics within the cell. Energy is a scalar bounded between a minimum and some maximum value. The chemicals transform in the directions Z_i → Z_{(i+1) mod n_c}, i = 1, ..., n_c, in proportion to the amounts of the enzymes C_i respectively, and all the reactions release a fixed amount of energy except for i = n_c, which does not release any. The released energy is added to the energy of the cell. If energy falls below a threshold (by mechanisms described next), the weights and activations are erased (set to zero). We also use a maximum lifetime of a cell, after which the weights and activations are erased. The amount of enzyme, as well as the flows of quantities between the cells, is controlled by the cell's actions, which we describe next.

The cell has the following actions. 1. Copy action: from â we extract five components and take a softmax to decide where and whether to copy (0: do not copy; 1-4: copy to one of the four directions). If a copy is successful, a fixed amount of energy (tending to be large) is removed from the cell. 2. Move action: this is optional. If enabled, it has the same logic, but this time the full contents of the two cells (originator and target) are swapped. 3. Energy flow: four values determine the proportion of energy the cell wants to pull from each neighbour or push to that neighbour (positive vs. negative). The cost of such an action is proportional to the push. The resulting flow between two cells is obtained by subtracting the action values of the two cells, allowing two cells to fight or cooperate in moving the energy. 4. Chemical flow: the same logic, but for each chemical; there are n_c values, one for each chemical and direction. 5. Enzyme production: n_c actions that create each enzyme. The total amount of enzyme is normalized to be at most one.
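The cyclic chemistry Z_i → Z_{(i+1) mod n_c} described above can be sketched as one update step. This is a minimal sketch in NumPy; the rate constant, per-reaction energy release, and energy cap are hypothetical values, not the ones used in the experiments.

```python
import numpy as np

def react(chem, enzyme, energy, rate=0.1, energy_per_reaction=1.0, max_energy=10.0):
    """One step of the cyclic chemistry Z_i -> Z_{(i+1) mod n_c}.

    Each reaction proceeds in proportion to its enzyme level C_i and releases a
    fixed amount of energy, except the last reaction (closing the cycle), which
    releases none. Chemical mass is conserved; only energy is produced.
    """
    flow = rate * enzyme * chem                 # amount of each Z_i converted this step
    new_chem = chem - flow + np.roll(flow, 1)   # products feed the next chemical (cyclically)
    released = energy_per_reaction * flow[:-1].sum()  # last reaction releases no energy
    new_energy = min(energy + released, max_energy)   # energy is bounded above
    return new_chem, new_energy
```

Note the design consequence: because only the last reaction releases nothing, a cell profits by accumulating chemicals early in the cycle and producing the matching enzymes.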
There are two extra features. The energy is normally protected from falling below a threshold (protecting the cell from accidentally killing itself). The exception is that if the energy pull from a neighbour is sufficiently large, it will pull all the energy and kill the other cell (erasing its weights and activations).

Finally, there are signals. The cells need to know what is happening at the other cells, and to communicate in order to form larger aggregates and networks. We implemented the following communication. There are four information grids, one moving in each of the four directions. At every point in time, each grid moves in its direction. In addition, part of the activations h, as well as the energy, chemicals and enzymes, is written to the information grids for those cells whose energy is above a threshold. The neural network input is formed by concatenating the four information grids at a given location.

Finally, each run of the system was run on a single GPU, and the largest system size was × at n_h = 16 (160000
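The four directional information grids described above can be sketched as follows. This is a minimal NumPy sketch; the channel count is illustrative, and the wrap-around (toroidal) boundary implied by np.roll is our assumption, since the boundary handling is not specified.

```python
import numpy as np

def propagate_signals(grids):
    """Shift each of the four information grids one step in its direction.

    grids: dict mapping direction -> array of shape (H, W, k), where the k
    channels carry part of h plus energy, chemical and enzyme levels.
    np.roll wraps around the edges (toroidal boundary, an assumption here).
    """
    return {
        "up":    np.roll(grids["up"],    -1, axis=0),
        "down":  np.roll(grids["down"],   1, axis=0),
        "left":  np.roll(grids["left"],  -1, axis=1),
        "right": np.roll(grids["right"],  1, axis=1),
    }

def cell_input(grids, i, j):
    """Network input at (i, j): concatenation of the four grids at that location."""
    return np.concatenate([grids[d][i, j] for d in ("up", "down", "left", "right")])
```

Writing into the grids (for cells above the energy threshold) would then overwrite the corresponding (i, j) entries before the next propagation step.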