Evolving Structures in Complex Systems
Hugo Cisneros
CIIRC, CTU in Prague; ENS
[email protected]

Josef Sivic
Inria, DI-ENS, PSL; CIIRC, CTU in Prague
[email protected]

Tomas Mikolov
Facebook
[email protected]
Abstract — In this paper we propose an approach for measuring the growth of complexity of emerging patterns in complex systems such as cellular automata. We discuss several ways in which a metric for measuring the complexity growth can be defined. These include approaches based on compression algorithms and artificial neural networks. We believe such a metric can be useful for designing systems that could exhibit open-ended evolution, which itself might be a prerequisite for the development of general artificial intelligence. We conduct experiments on 1D and 2D grid worlds and demonstrate that, using the proposed metric, we can automatically construct computational models with emerging properties similar to those found in Conway's Game of Life, as well as many other emergent phenomena. Interestingly, some of the patterns we observe resemble forms of artificial life. Our metric of structural complexity growth can be applied to a wide range of complex systems, as it is not limited to cellular automata.
I. INTRODUCTION
Recent advances in machine learning and deep learning have had success at reproducing some very complex feats traditionally thought to be achievable only by living beings. However, making these systems adaptable and capable of developing and evolving on their own remains a challenge that might be crucial for eventually developing AI with general learning capabilities (as is further discussed, for example, in [1]). Building systems that mimic some key aspects of the behavior of existing intelligent organisms (such as the ability to evolve, improve, adapt, etc.) might represent a promising path. Intelligent organisms — e.g., human beings, but also most living organisms if we consider a broad definition of intelligence — are a form of spontaneously occurring, ever-evolving complex systems that exhibit these kinds of properties [2]. The ability to sustain open-ended evolution appears to be a requirement for enabling the emergence of arbitrarily complex adaptive systems.

Although a rigorous attempt at defining intelligence or life is beyond the scope of this paper, we assume that a system we might identify as evolving, with the potential of developing intelligence, should have the property of self-preservation and the ability to grow in complexity over time. These properties can be observed in living organisms [2] and should also be a part of computational models that aim to mimic them.

CIIRC - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague.
WILLOW project, Département d'Informatique de l'École Normale Supérieure, ENS/INRIA/CNRS UMR 8548, PSL Research University.
To recognize self-preservation and growth in complexity, one should be able to detect emerging macro-structures composed of smaller elementary components. For the purpose of obtaining computational models that grow in complexity over time, one should also be able to determine the amount of complexity these systems contain. We propose and discuss in this paper several ways of estimating the complexity and detecting the presence of emerging and stable patterns in complex systems such as cellular automata. We show that such metrics are useful when searching the space of cellular automata with the objective of finding those that seem to evolve in time.

II. RELATED WORK
A. Artificial life and open-ended evolution
Several works have attempted to artificially create open-ended evolution. A non-exhaustive list of some well-known systems includes Tierra [3], Sims [4], Avida [5], Polyworld [6], Geb [7], Division Blocks [8] and Chromaria [9]. Designs focusing on an objective and making use of reinforcement learning methods to drive evolution are also being studied, e.g. in [10]. Most of these simulated "worlds" have had some success in reproducing key aspects of evolving artificial life, enabling the emergence of complex behavior from simple organisms. However, they still operate within constrained simulated environments and usually consider organisms composed of predefined elementary building blocks, and they do not work outside of this usually very constrained framework. A divergent and creative evolutionary process could be happening at a much lower conceptual level, with fewer assumptions. For this reason, we consider cellular automata in the rest of the paper, because they rely on very few assumptions while offering very large expressive power and a potentially wide range of discoverable behaviors. However, the metrics defined in this paper have the potential to be applied to other types of complex systems, as discussed in Section VII.
B. Cellular automata
Cellular automata are very simple systems, usually defined in one or two dimensions, composed of cells that can each be in one of a set of states. The cells are updated in discrete time steps using a transition table that defines the next state of a cell given the states of its neighbors. They were originally proposed by Stanislaw Ulam and studied by Von Neumann [11], who was interested in designing a computational system that can reproduce itself in a non-trivial way. The trivial self-reproducing patterns were taken to be those that do not have the potential to evolve, for example the growth of crystals.

Stephen Wolfram later took a more bottom-up approach, beginning with the study of simple 1D binary cellular automata (CA), and identified four qualitative classes of cellular automaton behavior [12]:

Class 1 evolves to a homogeneous state.
Class 2 evolves to simple periodic patterns.
Class 3 yields aperiodic disordered patterns.
Class 4 yields complex aperiodic and localized structures, including propagating structures.

Wolfram and his colleagues also studied 2D CA using tools from information theory and dynamical systems theory, describing the global properties of these systems in terms of entropies and Lyapunov exponents [13].

Christopher Langton and colleagues also studied CA dynamics [14] — e.g. using the λ parameter [15] — and designed a self-replicating pattern much simpler than Von Neumann's [16], now known as Langton's loops. The main issue with his system and Von Neumann's universal replicator is the fact that they are very fragile and based on a large amount of human design.
As a consequence, although they do self-replicate, they cannot increase in complexity and are not robust to perturbations or unexpected interactions with the environment. A genetic algorithm-based search for spontaneously occurring self-replicating patterns in 2D cellular automata with several states was undertaken in [17], using entropy measures of the frequency distribution of … × … patterns.

C. Compression and complexity
Compression has often been used as a measure of complexity. Lempel and Ziv introduced in [18] the now widespread Lempel-Ziv (LZ) algorithm as a method for measuring the complexity of a sequence. By constructing back-references to previous parts of a string, the LZ algorithm is capable of taking advantage of duplicate patterns in the input to reduce its size. The DEFLATE algorithm that we use in the following section combines LZ with Huffman coding for an efficient representation of the symbols obtained after the first step. It is one of the most widespread compression algorithms and is for instance used in the gzip and PNG file compression standards.

The PAQ compression algorithm series [19] is an ensemble of compression algorithms initially developed by Matt Mahoney, with state-of-the-art compression ratios on several compression benchmarks. Better compression of an input means a better approximation of the minimum description length and an implicit understanding of more of the underlying patterns in the input data. The usefulness of a better compressor is that it can detect much more complex and intricate patterns that are not simple repetitions of previous patterns.

In [20], H. Zenil investigates the effects of a compression-based metric to classify cellular automata and observes that it results in a partitioning of the space of 1D CA into several clusters that match Wolfram's classes of automata. He also used this approach on the output of simple Turing machines and some 1D automata with more than two states and larger neighborhoods. Extensions of this work include asymptotic sensitivity analysis of the compressed length for input configurations of growing complexity, as introduced in [21], [22].

Additionally, image decompression time as an approximation of Bennett's logical depth [23], [24], and the output distribution of simple Turing machines combined with block decomposition of CA to approximate their algorithmic complexity, have also been investigated [25], [26].
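As a minimal illustration of the compression-as-complexity idea (a sketch, not code from the paper), DEFLATE, via Python's zlib module, assigns a far shorter compressed length to a regular string than to a disordered one:

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of the DEFLATE-compressed input."""
    return len(zlib.compress(data, 9))

random.seed(0)
regular = b"01" * 512                                           # highly regular
disordered = bytes(random.getrandbits(8) for _ in range(1024))  # disordered

print(compressed_length(regular), compressed_length(disordered))
```

The regular input compresses to a handful of bytes, while the disordered input of the same size remains essentially incompressible.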
However, the possible extent to which such measures of complexity could be applied to more complex automata and other complex systems has not yet been extensively studied. For a review of several measures of complexity and their applications, see [27].

III. COMPRESSION-BASED METRIC
A cellular automaton of size n in 1D can be represented at time t by its grid-state S^(t) = {c_1^(t), ..., c_n^(t)}, where each c_i (also called a cell) can take one of k possible values (representing the possible states), and a transition rule φ. The transition rule is defined with respect to a neighborhood radius r by the mapping φ(c_{i-r}^(t), ..., c_i^(t), ..., c_{i+r}^(t)) = c_i^(t+1), which maps {1, ..., k}^(2r+1) to {1, ..., k}. The quantity 2r + 1 corresponds to the number of cells taken into account for computing the next state of a cell, namely that cell itself and r neighboring cells in each direction. Simulating a CA amounts to the recursive application of this mapping φ to an initial state S^(0) = {c_1^(0), ..., c_n^(0)}.

In the rest of the paper, we consider cyclic boundary conditions for the automata, meaning that the indices i - r, ..., i + r above are taken modulo n, the size of the automaton in 1D. Boundary conditions can have some effect on a CA's evolution, but cyclic boundaries have been empirically observed to have limited effect on the complexity of automata in 1D [28].

The definition given in the equation above can be extended to higher-dimensional automata by modifying the neighborhood and the definition of φ. A 2D neighborhood of radius 1 can be defined as the 3 by 3 square around the center cell — also called the Moore neighborhood — or by only considering the four immediate horizontal and vertical neighbors of the center cell — the Von Neumann neighborhood.

Elementary cellular automata (ECA) are 1D CA with k = 2 and r = 1. There are 2^3 = 8 elements in {1, ..., k}^(2r+1) and therefore 2^8 = 256 possible transition rules, which are often compactly represented as binary strings with 8 bits. The relatively low number of rules of this type makes it possible to appreciate the performance of a metric and compare it with others.

We define the compressed length C of a 1D cellular automaton at time T as

C(S^(T)) = length(comp(c_1 || c_2 || ... || c_n))    (1)

where || denotes the string concatenation operator and the cells c_i are implicitly converted into string characters (with one symbol per unique state); comp is a compression algorithm that takes a string as input and outputs a compressed string, and length is the operator that returns the length of an input string.

Fig. 1. Compression-based metric on 1D ECA. (a) The 6 highest scoring automata; only the first 30 timesteps are shown for readability. Each line is the state of the automaton at a given timestep, starting from a single cell set to 1; cells in state 1 are shown in black, cells in state 0 in white, and time increases downwards. (b) The compressed lengths (in bytes) of all 256 ECA rules, with markers and colors corresponding to the obtained KMeans clusters.

Similarly to [29], [20], we use zlib's C implementation of DEFLATE to compress the final state of the automaton. If we apply the above metric to the 256 ECA run for 512 timesteps and initialized with one activated cell in the middle, we obtain the plot of Figure 1b. This example is re-used throughout the paper as a way to easily visualize and check that the defined complexity measures are coherent with one another. The colors in Figure 1b were obtained with a KMeans clustering algorithm applied to the compressed lengths of the automata states.

As visible in Figure 1b, rules are clearly separated into several clusters that turn out to match Wolfram's classification of ECA. Class 3 behavior can be found at the top of the plot (highest compressed length, orange and blue clusters), Classes 1 and 2 are clearly separated in the bottom part (not detailed here), and Class 4 rules (colored in green) lie in between the other types of behavior. The 6 highest scoring rules are shown in Figure 1a and correspond to Class 3 behavior in Wolfram's classification.
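The compressed-length metric of eq. (1) is straightforward to reproduce. The following sketch simulates an ECA with cyclic boundaries under the standard Wolfram rule-number convention and measures the compressed length of its final state with zlib's DEFLATE; the grid size, rule, and number of steps below are illustrative choices, not the paper's exact settings:

```python
import zlib

def eca_step(state, rule):
    """One update of a 1D elementary CA (k = 2, r = 1) with cyclic
    boundaries; `rule` is the Wolfram rule number (0-255), whose bit
    (4*left + 2*center + right) gives the next state of a cell."""
    n = len(state)
    return [(rule >> (4 * state[(i - 1) % n]
                      + 2 * state[i]
                      + state[(i + 1) % n])) & 1
            for i in range(n)]

def compressed_length(state):
    """C(S) of eq. (1), with zlib's DEFLATE as the compressor `comp`."""
    return len(zlib.compress(bytes(state), 9))

# Rule 110 (Class 4) from a single activated cell, on an illustrative
# 101-cell grid run for 512 timesteps.
state = [0] * 101
state[50] = 1
for _ in range(512):
    state = eca_step(state, 110)
print(compressed_length(state))
```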
Among the classes of behavior, some sub-clusters can be found that correspond to similarly behaving rules.

Ultimately, the theoretical goal of using compression algorithms is to approach the minimum description length of the input [30]. For very regular inputs, this length should be relatively small, and conversely for random inputs. However, gzip and PAQ are crude approximations of the minimum description length and may only approach it in a given context. As an example, compressing text data (a task often performed with gzip in practice) is much more efficient with a language model that can assign a very low probability to sentences that are not grammatically correct. The Kolmogorov complexity [31] of a cellular automaton is upper bounded by a value that is independent of the chosen rule, as it is entirely determined by the transition table, the grid size, the initial configuration and the number of steps.

IV. PREDICTOR-BASED METRIC
One obvious limit of using compressed length as a proxy for complexity is the fact that interesting systems mostly have intermediate compressed lengths. Compressed length increases with the amount of disorder in the string being compressed. Therefore, extreme lengths correspond either to systems that do not increase in complexity, on the lower end of the spectrum, or to systems that produce a maximal amount of disorder, on the higher end. Neither of them has the potential to create interesting behavior and increase in complexity. Intermediate values of compressed length are also hard to interpret, since average lengths might correspond either to interesting rules or to slowly growing disordered systems.

To cope with this limitation, one should also take into account the dynamics of complexity, that is, how the system builds on its complexity at a given time as it keeps evolving, while retaining some of the structures it had acquired earlier. Compression leverages the amount of repetition in its input to compress further, and this may also be used as an estimate of structure overlap, as explained in the following section.
A. Joint compression
As a way to measure both the complexity and the amount of overlap between two automaton states some time apart, we define a joint compressed length metric for a delay τ as

C'(S^(T+τ), S^(T)) = C(S^(T) || S^(T+τ))    (2)

where || represents the concatenation operator. This quantity is simply the compressed length of a pair of global states — defined at the beginning of Section III and represented by the letter S — at two timesteps separated by a delay τ. In 1D, concatenation means chaining the two string representations before compressing, and in 2D we can chain two flattened representations of the 2D grid. This introduces several issues, which we discuss in Section IV-B.

To quantify the amount of overlap between the two global states, we can compute the ratio of the sum of the two compressed lengths C(S_t) and C(S_{t-τ}) to this joint compressed length, thereby forming the joint compression score

μ = (C(S_t) + C(S_{t-τ})) / C'(S_t, S_{t-τ})    (3)

defined for an automaton S, time t and delay τ.

This metric is based on the intuition that if patterns occur at step T - τ of the automaton's evolution and are still present at step T, the joint compressed length will be lower than the sum of the two compressed lengths. The idea is illustrated in Figure 2, where it is pointed out that a stable moving structure (sometimes called a glider or spaceship in the Game of Life) will yield lower joint compressed lengths. This also applies to structures that self-replicate, grow from a stable seed, or maintain the presence of some sub-structures.
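The joint compression score of eqs. (2)-(3) can be sketched as follows, again with zlib standing in for the compressor; the test states (an identical pair versus an unrelated disordered pair) are illustrative:

```python
import random
import zlib

def C(cells):
    """Compressed length of eq. (1), using zlib's DEFLATE."""
    return len(zlib.compress(bytes(cells), 9))

def joint_compression_score(s_t, s_t_minus_tau):
    """mu = (C(S_t) + C(S_{t-tau})) / C'(S_t, S_{t-tau}), eqs. (2)-(3);
    C' compresses the concatenation of the two (flattened) states."""
    return (C(s_t) + C(s_t_minus_tau)) / C(s_t_minus_tau + s_t)

random.seed(0)
a = [random.getrandbits(1) for _ in range(600)]  # a disordered state
b = [random.getrandbits(1) for _ in range(600)]  # an unrelated state
# A fully persisting structure (here, an identical state) scores markedly
# higher than two unrelated disordered states, since the LZ back-references
# cover the repeated half almost for free.
print(joint_compression_score(a, a), joint_compression_score(b, a))
```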
Bigger structures yield a higher compression gain.

Fig. 2. Joint compression method illustration. If a structure persists through time, this decreases the joint compressed length compared to the sum of the compressed lengths. A persistent structure is circled in red.

Joint compression alone is not sufficient, since it only selects rules that either behave like the identity or shift their input, because these maximize the conservation of structures through time — as illustrated in Figure 3a. However, one may combine the joint compression score with another complexity measure to only select rules that also exhibit some disorder, or growth in complexity — as Figure 3b shows (the condition here was a threshold on the difference in compressed length between initial and final states).

Fig. 3. Comparison of the raw joint compression score ((a) highest joint compression score among the 256 ECA) and the addition of a complexity increase condition (b). The high overlap in structures is not enough to get interesting rules, as shown in 3a, but the addition of a complexity threshold allows retrieving rules with complex but still structured behavior, as shown in 3b. Figures show the same slice of 60 cells over 30 timesteps taken from larger automata with random initial states. The top row corresponds to t = 0 and time increases downwards.

B. Count-based predictor
A major issue with the joint compression metric is the fact that it is designed to compress a linear stream of data. This is not ideal when considering higher-dimensional automata. Larger sets of transformations would have to be considered, such as translations, rotations, flips, etc. In theory this should not be a problem for a good enough linear compression algorithm, but hardware and software limitations make it impractical to work with existing algorithms on higher-dimensional structures — e.g. DEFLATE's upper limit on dictionary size.

These higher-dimensional automata might be better at generating complex dynamics, and the large size of their rule spaces makes them a challenge to explore. There has been at least one attempt to deal with these higher-dimensional systems [25], but it lacks the scalability to work with large inputs.

An alternative to the linear compression-based method presented above would be to use compressors optimized for n-dimensional data (e.g. PNG compression for 2D automata) to take advantage of spatial correlation when compressing. However, such compressors are rare for higher-dimensional data and are usually optimized for one type of input — e.g. images in the case of PNG.

Another way to tackle the problem is to use a prediction-based approach to compression. Similarly to the methods described in [32] and to one of the first steps of the PAQ compression algorithm [19], we learn a statistical model of the input data to predict the content of a cell given its immediate predecessors. For compression, this is often followed by an encoding step — using Huffman or arithmetic coding — that encodes the data containing the least information (the least "surprising" data) with the most compact representation. This approach can also be related to the texture synthesis method described in [33], where the authors learn a non-parametric model to predict the next pixel of a texture given a previously synthesized neighborhood.
Additionally, because we do not need the operation to be reversible as in regular compression, it is not necessary to limit the prediction model to making predictions from predecessors only.

For a global state S = (c_1, ..., c_i, ..., c_n), the neighborhood of cell i with radius r, denoted n_{r,i}, is defined as the tuple n_{r,i} = (c_{i-r}, ..., c_{i-1}, c_{i+1}, ..., c_{i+r}) — without the middle cell. The goal of this method is to estimate the conditional probability distribution p(s | n_r) of the middle state at timestep T given a neighborhood of radius r. Assuming cell states given their neighborhoods can be modeled by mutually independent random variables, the log-probability of the global state S^(T) is written

log p(S^(T)) = log ∏_{i=1}^{N} p(c_i | n_{r,i}) = ∑_{i=1}^{N} log p(c_i | n_{r,i})    (4)

If the automaton has a very ordered behavior, a model will predict with high confidence the state of the middle cell given a particular neighborhood. On the other hand, in the presence of maximal disorder, the middle cell will have an equal probability of being in every state no matter the neighborhood. In the latter case, a predictive model minimizing -log p(S^(T)) would yield a high negative log-probability.

A simple possible predictor for this purpose is a large lookup table that maps all visited neighborhoods to a probability distribution over the states that the middle cell can be in. State distributions for each neighborhood are obtained by measuring the frequency of cell states given the observed neighborhoods. We denote by Λ this lookup table, defined for a window of radius r, which maps each possible neighborhood of size 2r (ignoring the middle cell) to a set of probabilities p over the possible states {s_1, ..., s_n}, where p can be written [p_{s_1}, p_{s_2}, ..., p_{s_n}].
Λ is defined by

Λ : {s_1, ..., s_n}^(2r) → Δ_n
    n_{r,i} ↦ p    (5)

where Δ_n denotes the probability simplex in dimension n. To measure the uncertainty of this predictor, we can compute the cross-entropy loss between the data distribution it was trained on and its output. We compute the negative log-probability of the observed data given the model, i.e. the quantity

L = -(1/N) ∑_{i=1}^{N} ∑_{k=1}^{n} 1_{{s_k}}(c_i) log Λ(n_{r,i})_{s_k}    (6)

where 1_{{s_k}} denotes the indicator function of the singleton set {s_k}. An illustration of the counting process is given in Figure 4. The quantity L is minimal when the Λ(n_{r,i})_{s_k} always equal one, which means the state of every cell is entirely determined by its neighborhood.

Fig. 4. Count-based predictor method for a radius r = 1. A frequency lookup table is computed from the global state at time T by considering all neighborhoods with radius r = 1 (3 consecutive cells, ignoring the middle cell). Cross-entropy with the automaton at time T quantifies the overall complexity. This can be compared to the cross-entropy at time T + t for the amount of overlap.

We apply this metric to all 256 ECA, with a window radius of size 3 (the 6 closest neighbors are used for prediction) and the same settings as for Figure 1b. The cross-entropy loss of the lookup table gives the results of Figure 5a. Colors are the same as in Figure 1b for comparison purposes.

Fig. 5. Average cross-entropy loss for the two predictor-based methods on all 256 ECA. Rules are separated into several clusters. The count-based predictor (left plot) and the neural network-based predictor (right plot) were applied with a neighborhood radius r = 1 and …

We note the similarity between this plot and the one from Figure 1b, with a roughly equivalent resulting classification of ECA rules, with the exception of rules with low scores.
Rules that produce highly disordered patterns are at the top of the plot, whereas very simply behaving rules are at the bottom. This indicates coherence between the two metrics.
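The lookup-table predictor of eqs. (5)-(6) can be sketched as follows; the frequency table Λ and the cross-entropy L are computed as defined above, while the example states are illustrative:

```python
import math
from collections import Counter, defaultdict

def lookup_table(state, r):
    """Frequency lookup table Lambda of eq. (5): maps each observed
    neighborhood (2r cells, middle cell excluded) to a frequency
    distribution over the middle-cell states. Cyclic boundaries."""
    n = len(state)
    counts = defaultdict(Counter)
    for i in range(n):
        nb = tuple(state[(i + d) % n] for d in range(-r, r + 1) if d != 0)
        counts[nb][state[i]] += 1
    return {nb: {s: c / sum(ctr.values()) for s, c in ctr.items()}
            for nb, ctr in counts.items()}

def cross_entropy(state, table, r):
    """L of eq. (6): average negative log-probability of each middle
    cell under the lookup table."""
    n = len(state)
    total = 0.0
    for i in range(n):
        nb = tuple(state[(i + d) % n] for d in range(-r, r + 1) if d != 0)
        total -= math.log(table[nb][state[i]])
    return total / n

# A perfectly ordered state is fully determined by its neighborhoods,
# so the cross-entropy is zero.
ordered = [0, 1] * 32
tab = lookup_table(ordered, 1)
print(cross_entropy(ordered, tab, 1))  # 0.0
```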
C. Neural network-based predictor

Fig. 6. Neural network architecture for predicting a central cell given its neighbors. Output probabilities are defined for all possible states of the central cell.
The frequency-based predictor described above still has limitations:

• It does not take into account any redundancy in the input, which may lead to suboptimal predictions (in a CA, very similar positions might have similar center-cell state distributions; e.g. a glider in the Game of Life should be recognized by the model no matter the rest of the neighborhood).
• For the same reason, when considering large window sizes, the number of possible neighborhood configurations gets much larger than the number of observed ones, leading to an input sparsity problem.

More sophisticated models can cope with the above limitations by dealing with high-dimensional inputs without sparsity problems, and by taking into account the redundancy of inputs and potential interactions between states for prediction.

We measure the cross-entropy loss of this simple model on the training set after a standard learning procedure that is the same for all rules. The procedure is applied to a neural network with one hidden layer of fixed size. We use a ReLU non-linearity for the hidden layer and a softmax to obtain the output probabilities. For n possible states s_1, ..., s_n, a cell in state s_k is represented as a vector of 0s of size n with a 1 in position k. The input to the network is the concatenation of these cell vectors for all cells in the neighborhood. The output of the network is a vector of size n with the output probability for each state. Gradient updates are computed during training to minimize the cross-entropy loss between outputs and target examples. For a timestep T, we use the training procedure to minimize with respect to θ the quantity

L_θ^(T) = -(1/N) ∑_{i=1}^{N} ∑_{k=1}^{n} 1_{{s_k}}(c_i^(T)) log [f_θ(n_{r,i}^(T))_{s_k}]    (7)

where the neural network depending on parameters θ is denoted f_θ, and n_{r,i}^(T), the neighborhood of cell i with radius r at time T, is defined in the same way as in eq. (6).
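A compact sketch of this predictor in NumPy: one-hot inputs, a single ReLU hidden layer and a softmax output trained on the loss of eq. (7). Full-batch gradient descent replaces the paper's minibatch procedure to keep the sketch short, and all hyperparameters here are illustrative:

```python
import numpy as np

def make_dataset(state, r, n_states):
    """One-hot encoded neighborhoods (middle cell excluded) as inputs,
    middle-cell states as targets, for a cyclic 1D global state."""
    n, eye = len(state), np.eye(n_states)
    X = np.stack([np.concatenate([eye[state[(i + d) % n]]
                                  for d in range(-r, r + 1) if d != 0])
                  for i in range(n)])
    return X, np.array(state)

def train_predictor(X, y, n_states, hidden=10, epochs=100, lr=0.5, seed=0):
    """One-hidden-layer ReLU network with a softmax output, trained by
    full-batch gradient descent on the cross-entropy loss of eq. (7).
    Returns a loss function that can also be evaluated on a later state
    to form the score mu_tau = L(T) / L(T + tau) of eq. (8)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_states)); b2 = np.zeros(n_states)
    Y = np.eye(n_states)[y]

    def forward(Xe):
        H = np.maximum(Xe @ W1 + b1, 0.0)            # ReLU hidden layer
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        return H, P / P.sum(axis=1, keepdims=True)   # softmax probabilities

    for _ in range(epochs):
        H, P = forward(X)
        G = (P - Y) / len(X)                         # gradient w.r.t. logits
        GH = (G @ W2.T) * (H > 0)                    # backprop through ReLU
        W2 -= lr * H.T @ G; b2 -= lr * G.sum(axis=0)
        W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(axis=0)

    def loss(Xe, ye):
        _, P = forward(Xe)
        return float(-np.mean(np.log(P[np.arange(len(ye)), ye] + 1e-12)))
    return loss

# A perfectly ordered state is learned easily, giving a low training loss.
X, y = make_dataset([0, 1] * 32, r=1, n_states=2)
loss = train_predictor(X, y, n_states=2, epochs=500)
```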
The loss is computed with respect to the testing set at time T + τ by computing the same quantity at this subsequent timestep. The training procedure is selected to achieve reasonable convergence of the loss for the tested examples. It must be well defined and stay the same to allow for comparison of the results across several rules. The score at timestep T for a delay τ is computed with the following formula

μ_τ = L^(T) / L^(T+τ)    (8)

where L^(T+τ) is the negative log-probability of the automaton state at timestep T + τ (defined in eq. (7)) according to a model with parameters learned during training at timestep T, and L^(T) is the same as in eq. (7). The value μ_τ will be lower for easily "learnable" global states that do not translate well to future steps — they create more complexity or disorder — thereby discarding slowly growing, very disordered structures. Higher values of μ_τ correspond to automata that have a disordered global state at time T that can be transposed to future timesteps relatively well. Those rules will tend to have interesting spatial properties — not trivially simple but not completely disordered, because the model transposes well — as well as a large amount of overlap between a given step and future ones, indicating persistence of the spatial properties from one state to another. We also selected this metric among other quantities computed from L^(T) and L^(T+τ) because it yielded the best score on our experimental dataset.

TABLE I
EXPERIMENTAL RESULTS - AP SCORES

Neural network, r = 1:
    300 steps: 0.358 / 0.454 / 0.488
Lookup table, r = 1:
    25 steps: 0.092 / 0.070
    50 steps: 0.102 / 0.070
    300 steps: 0.093 / 0.069

This table shows the average precision (AP) scores obtained on the dataset of Section V with the neural network-based and lookup table-based methods. Results are shown for delays τ = 5, …, and several radii values r.

V. EXPERIMENTS
We carried out several experiments on a dataset of 500 randomly generated 3-state (n = 3) rules with radius r = 1. These rules were manually annotated for interestingness, defined as the presence of visually detectable non-trivial structures. The dataset contains 46 rules labeled as interesting and 454 uninteresting rules. Ranking these rules with the metrics introduced above allows us to study the influence of parameters and the adequacy between interestingness as we understand it and what the metric measures.

The task of finding interesting rules can be framed either as a classification problem or as a ranking problem with respect to the score we compute on the dataset. The performance of our metric can be measured with the usual evaluation metrics for these problems, notably the average precision (AP) of the resulting classifier.

Average precision scores for the neural network and count-based methods for time windows of 5, 50 and 300 timesteps are given in Table I. Scores were computed on automata of … × … cells, run for 1000 timesteps (T + τ = 1000). Scores were computed for radii ranging from 1 cell (8 nearest neighbors) to 5 cells (120 neighbors), with a one-hidden-layer neural network containing 10 hidden units trained for 30 epochs with batches of 8 examples. The best AP for each time window is shown in bold. Results for the frequency lookup table predictor are only shown for r = 1, because sparsity issues with the lookup table from r = 2 upwards make it impractical to use the table — … possible entries for the lookup table with r = 2 against only … observed states.

From these experiments, bigger radii appear to perform slightly better, although not in a radical way.
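Average precision over such a ranking can be computed directly from scores and binary labels; this is the standard ranking-based definition of AP, not code from the paper:

```python
def average_precision(scores, labels):
    """Average precision of a ranking: the mean, over positive examples,
    of the precision at each positive's rank when examples are sorted
    by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            ap += hits / rank
    return ap / sum(labels)

# A perfect ranking (all positives ranked first) gives AP = 1.
print(average_precision([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))  # 1.0
```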
Since the number of neighbors scales with the square of the radius, reasonably small radii might be a good trade-off between performance and computational cost of the metric.

We also study the performance of our metrics — lookup table and neural network-based — as inputs of a binary classifier, against two simple baselines, on a random 70/30 split of our dataset. The first baseline classifies all examples as negative. The second baseline is based on the compressed length as defined in [20], computed by choosing a pair of thresholds that minimize the mean squared error when classifying the examples in between as positive — this is based on the observation made in Section III that interesting rules have intermediate compressed lengths. Results are in Table II, where only the best radius is shown. The lookup table performs better than the baselines, but the neural network gives the best score.

TABLE II
EXPERIMENTAL RESULTS - ACCURACY

Metric:   Baseline | Compressed length [20] | Lookup table | Neural network
Accuracy: …        | …                      | …            | …

Accuracy of each metric of complexity when used to classify which automata evolve interestingly, compared against the trivial all-negative baseline and the compressed length metric [20].

The above experiments demonstrate the capability of our proposed metric to match the subjective notion of interestingness of our labeling. For instance, the top 5 and top 10 scoring rules of the best performing configuration (r = 3, τ = 5) are all labeled as interesting, and the top 100 scores contain 41 of the 46 rules labeled as interesting.

VI. DISCUSSION
In this section, we discuss the results obtained using the metric of equation (8) and the way they can be interpreted.

a) One-dimensional cellular automata:
By applying the metric to the same example as before, we again obtain a plot with a rule classification that matches a visual appreciation of the complexity of 1D CA. Results are shown in Figure 5b. Similarly to the previous cases, rules we might label as interesting are unlikely to be either at the top or the bottom of the plot.

b) Two-dimensional cellular automata:
Simulations conducted with 2D CA used grids of size 256 × 256 (T = 1000). Rules are defined with a table of transitions from all possible neighborhood configurations with radius r = 1 (3 × 3 neighborhoods). We sample proportions of transitions towards each state and sample the transitions according to these proportions. This parametrization can be related to Langton's lambda parameter [15], which takes into account the proportion of transitions towards a transient (inactive) state versus all the other states. We obtain approximately 10% interesting rules with this sampling, as the proportions of our experimental dataset show.

Using the neural network-based complexity metric, we were able to find rules with interesting behavior among a very large set through random sampling. Some of these rules are shown in the paper. Figure 8 displays three 2D rules that were selected manually upon visual inspection among the 20 highest scoring for the metric µ (defined in eq. (8)) of a sample of 1700 randomly generated 2-state 3 by 3 neighborhood rules. For comparison, Conway's Game of Life rule (GoL) ranks in the top 1% of the 2500 rules mentioned above for runs that do not end in a static global state. We observe that spontaneous glider formation events appear to be captured by our metric.

Fig. 7. Rules with 3 states that have spontaneously occurring glider structures. The gliders are the small structures outside of the disordered center zone. Some of them move along the diagonals while others follow horizontal or vertical paths. Note that some repeating patterns also occur in the more disordered center zone.

Fig. 8. (a) Timestep T; (b) Timestep T + 50. Spontaneous glider formation and evolution is observed for some high-scoring 2-state rules. Each row corresponds to a rule, with a 50-timestep difference between the two columns. Gliders are marked with a gray square. Runs were initialized with a small 20 by 20 disordered square (uniformly sampled among possible configurations) in the center, simulated for up to 400 steps.
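The search protocol described above (sample a transition table with output-state proportions drawn on the simplex, run it from a disordered 20 by 20 center square, and rank rules by a complexity score) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: all helper names are ours, the grid (64 × 64, 50 steps) is far smaller than the 256 × 256, T = 1000 runs used in the experiments, boundaries are assumed periodic, and a simple count-based lookup-table predictor stands in for the neural network metric of eq. (8).

```python
import math
import random
from collections import Counter, defaultdict

N_STATES = 3
TABLE_SIZE = N_STATES ** 9  # one entry per 3x3 neighborhood configuration


def sample_rule(rng):
    """Transition table whose output-state proportions are drawn uniformly
    on the simplex (cf. Langton's lambda parametrization)."""
    cuts = sorted(rng.random() for _ in range(N_STATES - 1))
    props = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    return rng.choices(range(N_STATES), weights=props, k=TABLE_SIZE)


def init_grid(size, rng, square=20):
    """Quiescent grid with a disordered `square` x `square` center region."""
    g = [[0] * size for _ in range(size)]
    off = (size - square) // 2
    for i in range(square):
        for j in range(square):
            g[off + i][off + j] = rng.randrange(N_STATES)
    return g


def step(g, rule):
    """Synchronous radius-1 update, periodic boundaries; the 3x3 neighborhood
    is encoded as a base-N_STATES integer (row-major) indexing the table."""
    n = len(g)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            idx = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    idx = idx * N_STATES + g[(i + di) % n][(j + dj) % n]
            out[i][j] = rule[idx]
    return out


def lookup_complexity(g):
    """Count-based stand-in for the prediction metric: average code length
    (bits per cell) of each state under P(state | left and top neighbors).
    Low values mean highly predictable (structured) configurations."""
    n = len(g)
    counts = defaultdict(Counter)
    for i in range(n):
        for j in range(n):  # negative indices wrap, giving periodic context
            counts[(g[i - 1][j], g[i][j - 1])][g[i][j]] += 1
    bits = 0.0
    for i in range(n):
        for j in range(n):
            ctx = (g[i - 1][j], g[i][j - 1])
            bits -= math.log2(counts[ctx][g[i][j]] / sum(counts[ctx].values()))
    return bits / n ** 2


rng = random.Random(0)
rule = sample_rule(rng)
g = init_grid(64, rng)
for _ in range(50):  # much shorter than the T = 1000 runs of the paper
    g = step(g, rule)
score = lookup_complexity(g)
```

In a full search loop, many rules would be sampled this way, scored, and the highest-ranking runs inspected visually, as described in the text.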
Although gliders in cellular automata are a simple phenomenon that can be created manually, detecting their spontaneous emergence in a random search setting is a first step towards finding more complex macro-structures that can emerge out of simple components. Rules with low scores are overwhelmingly of the disordered kind.

Figures 7, 9 and 10 show some 3-state rules that were selected through random sampling on the simplex with the neural network-based metric. They were selected among the 30 highest scoring rules out of 2500 randomly sampled 3-state rules. Their behaviors all involve the growth and interaction of small structures made of elementary cells.

All automata were initialized with a random disordered square of 20 by 20 cells in the center. In the figures mentioned above, colors were normalized with the most common state set to blue. Figure 7 shows rules that spontaneously emit gliders, which travel through space in one direction until they interact with some other active part of the automaton. Figure 9 shows rules that generate small structures of between four and thirty cells that are relatively stable and interact with each other. These elementary components could be a basis for the spontaneous construction of bigger and more complex components. Figure 10 shows some other rules from this set of high-ranking automata. They highlight the wide range of behaviors that can be obtained with these systems. Interesting rules from this paper can be found, along with other examples, in the form of animated GIFs.

For some of these rules, interesting patterns appear less frequently in smaller grids, indicating that the size of the space might impact the ability to generate complex macro-structures. Increasing the size of the state space to very large grids might therefore make it easier to generate very complex patterns.

Fig. 9. Rules with 3 states that generate cell-like interacting structures. These patterns are either static or moving and can interact with one another to generate copies of themselves and other patterns. Note the very similar micro-structures repeated at several places in the space.

Fig. 10. Rules with surprising behaviors that are highly structured but complex. These rules were selected among high-ranking rules for the neural network-based complexity metric. They all exhibit structurally non-trivial behavior.

VII. CONCLUSION
In this paper, we have proposed compression-inspired metrics for measuring a form of complexity occurring in complex systems. We demonstrated their usefulness for selecting CA rules that generate interesting emergent structures from very large sets of possible rules. Our metric is also useful in higher dimensions, where linear compression (as in gzip) is not sufficient to find complex patterns.

We study 2- and 3-state automata in the paper, and we plan to investigate the effects of additional states or larger neighborhoods on the ability to evolve more structures and obtain more interesting behaviors. In the future, we will publish the dataset and code to enable reproducibility and improvement on the results reported here.

The metrics we introduce in this paper could be used to design organized systems of artificial developing organisms that grow in complexity through an evolutionary mechanism. A possible path toward such systems could start by creating an environment where computational resource allocation favors the fraction of subsystems that perform the best according to our measure of complexity.

The proposed metric is theoretically applicable to any complex system where a notion of state of an elementary component and a notion of locality can be defined. With these requirements fulfilled, we can build a similar prediction model that uses information about local neighbors to predict the state of a component and thereby assess the structural complexity of an input.

We believe that the capability of creating evolving systems out of such elementary components, with few assumptions, could be a step towards AGI. By devising ways to guide this evolution in a direction we find useful, we would be able to find efficient solutions to hard problems while retaining the adaptability of the system. It might be suitable to avoid the over-specialization that can happen in systems designed to solve a particular task (e.g. reinforcement learning algorithms that play games, or supervised learning) by staying away from any sort of objective function to optimize and by leaving room for open-ended evolution.

Animated GIFs of interesting rules: https://bit.ly/interesting automata

Acknowledgments.
This work was partially supported by the ERC grant LEAP No. 336845, the CIFAR Learning in Machines & Brains program, and the EU Structural and Investment Funds, Operational Programme Research, Development and Education, under the project IMPACT (reg. no. CZ . . . / . / . / 15 003 /).

Code and dataset: https://github.com/hugcis/evolving-structures-in-complex-systems

REFERENCES

[1] T. Mikolov, A. Joulin, and M. Baroni, "A roadmap towards machine intelligence," in International Conference on Intelligent Text Processing and Computational Linguistics. Springer, 2016, pp. 29-61.
[2] L. Booker, "Perspectives on Adaptation in Natural and Artificial Systems," in Proceedings Volume in the Santa Fe Institute Studies in the Sciences of Complexity, New York, NY, USA, 2004.
[3] T. S. Ray, "An approach to the synthesis of life," in Artificial Life II, 1st ed., Jan. 1991, pp. 371-408.
[4] K. Sims, "Evolving virtual creatures," in SIGGRAPH, 1994.
[5] C. Ofria and C. O. Wilke, "Avida: a software platform for research in computational evolutionary biology," Artificial Life, vol. 10, no. 2, pp. 191-229, 2004.
[6] L. Yaeger, "Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or PolyWorld: Life in a new context," in Santa Fe Institute Studies in the Sciences of Complexity, vol. 17, 1994, pp. 263-263.
[7] A. Channon, "Improving and still passing the ALife test: Component-normalised activity statistics classify evolution in Geb as unbounded," Proceedings of Artificial Life VIII, Sydney, pp. 173-181, 2003.
[8] L. Spector, J. Klein, and M. Feinstein, "Division Blocks and the Open-ended Evolution of Development, Form, and Behavior," in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, ser. GECCO '07, 2007, pp. 316-323.
[9] L. B. Soros and K. Stanley, "Identifying Necessary Conditions for Open-Ended Evolution through the Artificial Life World of Chromaria," in Artificial Life 14: Proceedings of the Fourteenth International Conference on the Synthesis and Simulation of Living Systems, 2014.
[10] D. Pathak, C. Lu, T. Darrell, P. Isola, and A. A. Efros, "Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity," in NeurIPS, 2019.
[11] J. von Neumann and A. W. Burks, "Theory of self-reproducing automata," IEEE Transactions on Neural Networks, vol. 5, pp. 3-14, 1966.
[12] S. Wolfram, "Universality and complexity in cellular automata," Physica D: Nonlinear Phenomena, vol. 10, no. 1, pp. 1-35, Jan. 1984.
[13] N. H. Packard and S. Wolfram, "Two-dimensional cellular automata," Journal of Statistical Physics, vol. 38, no. 5/6, 1985.
[14] W. Li, N. H. Packard, and C. G. Langton, "Transition phenomena in cellular automata rule space," Physica D: Nonlinear Phenomena, vol. 45, pp. 77-94, 1990.
[15] C. G. Langton, "Computation at the edge of chaos: Phase transitions and emergent computation," Physica D: Nonlinear Phenomena, vol. 42, no. 1-3, pp. 12-37, Jun. 1990.
[16] ——, "Self-reproduction in cellular automata," Physica D: Nonlinear Phenomena, vol. 10, no. 1, pp. 135-144, Jan. 1984.
[17] E. Bilotta, P. Pantano, and S. Vena, "Artificial micro-worlds part I: A new approach for studying life-like phenomena," International Journal of Bifurcation and Chaos, vol. 21, no. 02, pp. 373-398, Feb. 2011.
[18] A. Lempel and J. Ziv, "On the Complexity of Finite Sequences," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 75-81, Jan. 1976.
[19] M. V. Mahoney, "Fast Text Compression with Neural Networks," in FLAIRS, 2000.
[20] H. Zenil, "Compression-Based Investigation of the Dynamical Properties of Cellular Automata and Other Systems," Complex Systems, vol. 19, no. 1, 2010.
[21] H. Zenil and E. Villarreal-Zapata, "Asymptotic behavior and ratios of complexity in cellular automata," International Journal of Bifurcation and Chaos, vol. 23, no. 09, 2013.
[22] H. Zenil, "What Is Nature-Like Computation? A Behavioural Approach and a Notion of Programmability," Philosophy & Technology, vol. 27, no. 3, pp. 399-421, Sep. 2014.
[23] C. H. Bennett, "Logical Depth and Physical Complexity," in The Universal Turing Machine: A Half-Century Survey, R. Herken, Ed., Vienna, 1995, vol. 2, pp. 207-235.
[24] H. Zenil, J.-P. Delahaye, and C. Gaucherel, "Image characterization and classification by physical complexity," Complexity, vol. 17, no. 3, pp. 26-42, 2012.
[25] H. Zenil, F. Soler-Toscano, J.-P. Delahaye, and N. Gauvrit, "Two-dimensional Kolmogorov complexity and an empirical validation of the Coding theorem method by compressibility," PeerJ Computer Science, vol. 1, Sep. 2015.
[26] F. Soler-Toscano, H. Zenil, J.-P. Delahaye, and N. Gauvrit, "Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines," PLOS ONE, vol. 9, no. 5, May 2014.
[27] P. Grassberger, "Randomness, Information, and Complexity," in Proceedings of the 5th Mexican School on Statistical Physics, 1989.
[28] B. J. LuValle, "The Effects of Boundary Conditions on Cellular Automata," Complex Systems, vol. 28, no. 1, pp. 97-124, Mar. 2019.
[29] T. Kowaliw, "Measures of complexity for artificial embryogeny," in Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation - GECCO '08. ACM Press, 2008.
[30] P. D. Grünwald, The Minimum Description Length Principle, ser. Adaptive Computation and Machine Learning, Cambridge, Mass., 2007.
[31] A. N. Kolmogorov, "Three approaches to the quantitative definition of information," International Journal of Computer Mathematics, vol. 2, no. 1-4, pp. 157-168, Jan. 1968.
[32] J. Schmidhuber and S. Heil, "Sequential neural text compression," IEEE Transactions on Neural Networks, vol. 7, no. 1, pp. 142-146, Jan. 1996.
[33] A. Efros and T. Leung, "Texture synthesis by non-parametric sampling," in