Open-Ended Automatic Programming Through Combinatorial Evolution

Sebastian Fix, Thomas Probst, Oliver Ruggli, Thomas Hanne, and Patrik Christen*

Institute for Information Systems, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Riggenbachstrasse 16, 4600 Olten, Switzerland

21 February 2021
Abstract
It has already been shown that combinatorial evolution – the creation of new things through the combination of existing things – can be a powerful way to evolve rather than design technical objects such as electronic circuits in a computer simulation. Intriguingly, only a few iterations seem to be required to already achieve complex objects. In the present paper we want to employ combinatorial evolution in software development. Our research question is whether it is possible to generate computer programs of increasing complexity using automatic programming through combinatorial evolution. Specifically, we ask what kind of basic code blocks are needed at the beginning, how these code blocks are implemented to allow them to combine, and how code complexity can be measured. We implemented a computer program simulating combinatorial evolution of code blocks stored in a database to make them available for combinations. Automatic programming is achieved by evaluating regular expressions. We found that reserved key words of a programming language are suitable for defining the basic code blocks at the beginning of the simulation. We also found that placeholders can be used to combine code blocks and that code complexity can be described in terms of the importance to the programming language. As in the previous combinatorial evolution simulation of electronic circuits, complexity increased from simple keywords and special characters to more complex variable declarations, to class definitions, to methods, and to classes containing methods and variable declarations. Combinatorial evolution, therefore, seems to be a promising approach for automatic programming.

* Corresponding author: [email protected]

Introduction
Genetic algorithms and evolutionary computation in general are widely used for solving optimisation problems [1]. Such algorithms follow the paradigm of biological evolution. They consist of a collection of virtual organisms, where every organism represents a possible solution to a given problem. Some fitness measure is calculated for each organism in an iterative process, and the algorithm tries to find improved solutions by performing random mutations and crossovers on the organisms.

In contrast to such evolutionary computation, combinatorial evolution as proposed by W. Brian Arthur [2, 3] makes no modifications to the organisms themselves. New solutions are formed through the combination of existing components, which then form new solutions in later iterations with the goal of satisfying certain needs. The more useful a combination is, the higher is its need rating. Combining existing components to construct new components can be observed in the evolution of technology [2, 3]. For instance, the invention of radar was only possible through combining simpler electronic parts fulfilling functions like amplification and wave generation [4]. In order to investigate combinatorial evolution, Arthur and Polak [4] created a simple computer simulation in which electronic circuits were evolved in a combinatorial manner. Their simulation started by randomly combining primitive elementary logic gates and then used these simpler combinations for more complicated combinations in later iterations. Over time, a small number of simple building blocks was transformed into many complicated ones, some of which might be useful for future applications. Surprisingly, only a few iterations were required until more complicated building blocks were generated in the simulation [4].
It was concluded that combinatorial evolution allows building a kind of library of building blocks for the creation of future, more complicated building blocks.

Here we want to explore whether combinatorial evolution could also be applied to software development, more specifically to automatic programming. An early idea of automatic programming was to implement high-level programming languages that are more human-readable, resulting in compilers, which produce low-level programs – down to machine code – from human-readable syntax [5]. However, human input in some form was still needed, and the programming task was simply transferred to a higher level. Furthermore, the software solution is limited by the programmer's capabilities and creativity. Language therefore remains a barrier between programmers and computers. A way around this barrier would be to let the computer do the programming, which might even lead to better programs. Koza [6] addressed this issue through genetic programming, where populations of computer programs are generated by a computer using genetic algorithms. The problem space consists of programs that try to solve (or approximately solve) problems. It has been demonstrated that random mutations and crossovers in source code can effectively contribute to creating new, sophisticated programs [7].

It therefore seems possible to define a programming task and let a computer do the programming. Looking at the process of software development, programming seems more comparable to technological than to biological evolution. Existing libraries or algorithms are often integrated into new software without the necessity of modifying them. Therefore, an automatic programming approach that creates new computer programs by means of combinatorial evolution might be an interesting alternative to genetic programming.
In the present study we investigate ways to define a programming task for automatic programming through combinatorial evolution, including the evaluation of the generated code with a need rating. Our research question is whether it is possible to generate computer programs of increasing complexity using automatic programming through combinatorial evolution. Specifically, we ask: What kind of basic code blocks are needed at the beginning? How are these code blocks implemented to allow them to combine? How can code complexity be measured?

Automatic Programming
Since the development of computers, it has been a challenge to optimise and adapt program code to access the potential performance of a computer. While the computational power of computers has been steadily increasing in recent years, program code is still limited by the ability of programmers to create efficient and functioning code. Programming languages have also evolved over the past decades. The development of programming languages has sought to provide programmers with abstractions at higher levels. However, this also led to limitations, especially regarding performance and creativity. It is thus intriguing to shift the programming to the computer itself. Most programming is currently done by human programmers, which often leads to a time-intensive and error-prone process of software development. The idea that computers automatically create software programs has been a long-standing goal [8], with the potential to streamline and improve software development.

Automatic programming was first considered in the 1940s, describing the automation of a manual process in general, with the goal of maximising efficiency [9]. Today, automatic programming is considered a type of computer programming in which code is generated using tools that allow developers to write code at a higher level of abstraction [9]. There are two main types of automatic programming: application generators and generative programming. Cleaveland [10] describes the development of application generators as the use of higher-level programming models or templates to translate certain components into lower-level source code. Generative programming, on the other hand, assists developers in writing programs. This can be achieved, e.g., by providing standard libraries as a form of reusable code [11].
In generative programming it is crucial to have a domain model, which consists of three main parts: a problem space, a solution space, and a configuration knowledge mapping that connects them. The problem space includes the features and concepts used by application engineers to express their needs. These can be textual or graphical programming languages, interactive wizards, or graphical user interfaces. The solution space consists of elementary components with a maximum of combinability and a minimum of redundancy. The configuration knowledge mapping presents a form of generator that translates the objects from the problem space to build components in the solution space [11].

While these kinds of automatic programming heavily depend on human interaction and thus on the capabilities and creativity of programmers, genetic programming can be regarded as an attempt to reduce this dependency and shift the focus to automation done by the computer itself. Koza [6] describes genetic programming as a type of programming in which programs are regarded as genes that can be evolved using genetic algorithms [12, 13]. It aims to improve the performance of a program to perform a predefined task. According to Becker and Gottschlich [8], a genetic algorithm takes, as an input, a set of instructions or actions that are regarded as genes. A random set of these instructions is then selected to form an initial sequence of DNA. The whole genome is then executed as a program, and the results are scored in terms of how well the program solves a predefined task. Afterwards, the top scorers are used to create offspring, which are rated again until the desired program is produced. To find new solutions, evolutionary techniques such as crossover, mutation, and replication are used [14]. Crossover children are created by picking two parents and switching certain components. Another technique is mutation, which uses only one individual parent and randomly modifies its parts to create a new child.
Sometimes parents with great fitness will be transferred to the next iteration without any mutation or crossover because they might do well in later steps as well.
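The genetic operators described above can be sketched for instruction sequences as follows. This is an illustrative sketch only: the instruction set and the names `crossover` and `mutate` are our own, not taken from the cited implementations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of crossover and mutation acting on "genomes" that are simple
// sequences of instruction strings (illustrative, not the cited authors' code).
public class GeneticOperatorsSketch {

    // Single-point crossover: the child takes a prefix of one parent and a
    // suffix of the other.
    static List<String> crossover(List<String> a, List<String> b, int point) {
        List<String> child = new ArrayList<>(a.subList(0, point));
        child.addAll(b.subList(point, b.size()));
        return child;
    }

    // Mutation: replace one randomly chosen instruction in a single parent.
    static List<String> mutate(List<String> parent, List<String> instructionSet, Random rng) {
        List<String> child = new ArrayList<>(parent);
        child.set(rng.nextInt(child.size()),
                  instructionSet.get(rng.nextInt(instructionSet.size())));
        return child;
    }

    public static void main(String[] args) {
        List<String> p1 = List.of("LOAD", "INC", "STORE");
        List<String> p2 = List.of("LOAD", "DEC", "PRINT");
        System.out.println(crossover(p1, p2, 2)); // [LOAD, INC, PRINT]
    }
}
```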
With combinatorial evolution, new solutions build on existing combinations of previously discovered solutions. Every evolution starts with some primitive, existing building blocks and uses them to build combinations. Those combinations are then stored in an active repertoire. If the output satisfies a need better than an earlier solution, it replaces the old one and will be used as a building block within later iterations. Building blocks are thus not modified; they are combined together as a whole, creating a new building block. The result is a library of functionalities that may be beneficial for a solution in the future [2, 3].

As Ogburn [15] suggested, the more equipment there is within a material culture, the greater the number of inventions is. This is known as Ogburn's Claim. It can therefore be inferred that the number and diversity of developed components as well as their technological developments matter, because next-generation components build upon the technological level of the previous, existing components. To investigate this, Arthur and Polak [4] created a simple computer simulation to 'discover' new electronic circuits. In their simulation, they used a predefined list of truth tables of basic logic functions such as full adders or n-bit adders. Every randomly created combination represented a potential satisfaction of a need, which was then tested against this list. If the truth table of a newly created circuit matched one from the predefined list, it was added to the active repertoire as it fulfilled the pre-specified functionality. Sometimes, it also replaced one that was found earlier, if it used fewer parts and therefore would cost less. New technologies in the real world are not usually found by randomly combining existing ones, nor do they exist in a pre-specified list to be compared against.
Nevertheless, their needs are generally clearly visible in economics and current technologies [4].

Combinatorial evolution is in general an important element of evolutionary systems. Stefan Thurner and his colleagues developed a general model of evolutionary dynamics in which the combination of existing entities to create new entities plays a central role [16, 17, 18]. They were able to validate this model using world trade data [19], thereby underlining the importance of evolutionary dynamics in economic modelling in general and of combinatorial interactions in particular.

Genetic algorithms have been used for automatic programming already; however, a large number of iterations is required to significantly increase code complexity in order to solve more complex problems [20]. It therefore seems beneficial to use combinatorial evolution, in which complexity seems to increase in fewer steps and thus less time.

Code complexity has been measured in this context with different approaches. The cyclomatic complexity of a code is the number of linearly independent paths within it [21]. For instance, if the source code contains no control flow elements (conditionals or decision points), the complexity would be 1, since there would be only a single path through the code. If the code has one single-condition IF statement, there would be two paths through the code – one where the IF statement evaluates to TRUE and another one where it evaluates to FALSE – so the complexity would be 2. Two nested single-condition IFs, or one IF with two conditions, would produce a complexity of 3 [22]. According to Garg [23], cyclomatic complexity is one of the most used and renowned software metrics, together with other proposed and researched metrics such as the number of lines of code and the Halstead measure. Although cyclomatic complexity is very popular, it is difficult to calculate for object-oriented code [24].
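The counting rule above can be sketched in a few lines of Java. Note that counting decision keywords with a regular expression is a simplification of our own for illustration; real tools derive the metric from a control-flow graph.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of McCabe's cyclomatic complexity as described above:
// 1 plus the number of decision points found in the source text.
public class CyclomaticSketch {
    private static final Pattern DECISION =
        Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    static int complexity(String source) {
        Matcher m = DECISION.matcher(source);
        int decisions = 0;
        while (m.find()) decisions++;
        return 1 + decisions;
    }

    public static void main(String[] args) {
        System.out.println(complexity("int x = 0;"));                   // 1: single path
        System.out.println(complexity("if (a) { x = 1; }"));            // 2: one decision
        System.out.println(complexity("if (a) { if (b) { x = 1; } }")); // 3: two nested IFs
    }
}
```

The three calls reproduce the worked examples from the text: no control flow gives 1, one single-condition IF gives 2, and two nested IFs (or one IF with two conditions) give 3.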
We used the programming language Java, though other programming languages would have been feasible as well. The development environment was installed on VirtualBox, an open-source virtualisation environment from Oracle. Oracle Java SE Development Kit 11 was used with Apache Maven as build automation tool. To map the existing code to a database, Hibernate ORM was used. It allows mapping object-oriented Java code to a relational database. Furthermore, code versioning with GitHub was used.

The simulation is initialised by adding some basic code building blocks into a repository. The first simulation iteration then starts by randomly selecting code blocks from this repository. Selected blocks are then combined into a new code block, which subsequently gets analysed for its usefulness and complexity. Based on this analysis, the code block is assigned a value. Nonsense code, which is the most common result when randomly combining key words of a programming language, is assigned a value of 0 and not used any further. Only code blocks with a value greater than 0 are added to the repository and consequently have a chance of being selected in a later iteration.
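The loop just described can be sketched as follows. This is a minimal in-memory sketch under our own naming assumptions: the stand-in `classify` method only recognises one declaration shape, whereas the actual simulation scores code structures with a set of regular expressions and persists blocks to a database via Hibernate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of one simulation run: seed a repository, repeatedly combine
// randomly selected blocks, classify the result, and keep only blocks with a
// value greater than 0.
public class SimulationSketch {

    // Stand-in classifier: only a well-formed declaration like "int NAME ;" scores.
    static int classify(String code) {
        return code.matches("(boolean|byte|char|double|float|int|long|short) NAME ;") ? 1 : 0;
    }

    public static void main(String[] args) {
        List<String> repository = new ArrayList<>(List.of(
            "int", "boolean", "class", "NAME", ";", "{", "}", "PLACEHOLDER"));
        Random rng = new Random(7);

        for (long iteration = 0; iteration < 1_000_000; iteration++) {
            // Combine between two and eight randomly selected blocks.
            int n = 2 + rng.nextInt(7);
            StringBuilder candidate = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) candidate.append(' ');
                candidate.append(repository.get(rng.nextInt(repository.size())));
            }
            // Nonsense code scores 0 and is discarded; useful blocks are kept
            // and become selectable in later iterations.
            if (classify(candidate.toString()) > 0) {
                repository.add(candidate.toString());
            }
        }
        System.out.println("repository size: " + repository.size());
    }
}
```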
Defining the initial code building blocks of the repository is a challenging task since they should not contain too much predefined logic, yet on the other hand must have a minimal complexity in order to allow creative combinations without limiting or predefining too much. One way would be to define code snippets with placeholders, e.g. a code snippet of a method definition where the body is a placeholder. Placeholders are important for the combination of code blocks – they define where a certain code block can be linked to another code block in the repository. However, some preliminary experiments showed that this approach would limit the creativity and complexity of the automatic programming solution to the predefined snippets – the simulation would only create program logic that is already given by the basic set of code blocks.

To overcome this limitation, we defined the basic code building blocks according to the basic key words and special characters of the Java programming language, e.g. the key words int, for, class, and String as well as the special characters &, =, ;, and {. In addition to the key words and special characters, we defined three more extra code blocks that are used in the simulation: PLACEHOLDER is used to define where blocks allow other code blocks to be combined and integrated. This is particularly important for nesting certain code elements, such as a method block that must be nested within a class construct to be valid Java code.
NAME is used to name something, e.g. classes, methods, and variables require names in Java. The special key word main is used in the main method definition of every Java program and is therefore required to be within the initial basic code blocks as well.
During the selection process, new source code is generated based on combinations of existing code blocks from the repository. The chance that a particular code block is selected depends on its classification value, which takes the importance and complexity of the code block into account (see the following subsection for more details). In a first step, a helper function defines a random value of how many code blocks are taken into consideration in the current iteration. There is a minimum of two code blocks required to generate a new code block. The maximum number can be predefined in the program. Arthur and Polak [4] combined up to 12 building blocks. To reduce the number of iterations needed to obtain valid Java code, a maximum of eight blocks turned out to be a good limit in preliminary experiments. After randomly defining the number of code blocks to be combined, the weighted random selection of code blocks based on their classification value follows. Instead of simply chaining all selected code blocks together, there is also the possibility to nest them into a PLACEHOLDER, if available. A random function decides whether a code block is nested into the PLACEHOLDER or simply appended to the whole code block. This procedure is important because program code usually exhibits such nested structures.
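The selection and combination step can be sketched as follows. The method names `weightedPick` and `combine` are our own illustration of the described behaviour, not the simulation's actual API.

```java
import java.util.Random;
import java.util.regex.Matcher;

// Sketch of the selection and combination step described above: blocks are
// drawn with probability proportional to their classification value, and a
// selected block is either nested into a PLACEHOLDER or appended.
public class SelectionSketch {

    // values[i] is the classification value (weight) of block i.
    static int weightedPick(int[] values, Random rng) {
        int total = 0;
        for (int v : values) total += v;
        int r = rng.nextInt(total); // uniform in [0, total)
        for (int i = 0; i < values.length; i++) {
            r -= values[i];
            if (r < 0) return i;
        }
        return values.length - 1; // not reached when all weights are positive
    }

    // Nest `block` into the first PLACEHOLDER of `target` if requested and
    // possible; otherwise simply append it.
    static String combine(String target, String block, boolean nest) {
        if (nest && target.contains("PLACEHOLDER")) {
            return target.replaceFirst("PLACEHOLDER", Matcher.quoteReplacement(block));
        }
        return target + " " + block;
    }

    public static void main(String[] args) {
        String cls = "public class NAME { PLACEHOLDER }";
        System.out.println(combine(cls, "int NAME ;", true));
        // → public class NAME { int NAME ; }
    }
}
```

Nesting via placeholder substitution is what lets flat keyword chains grow into the nested structures typical of program code.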
After the selection and combination process, the newly generated source code is passed into the classification function where it gets analysed. The classification process is required to weight the different code blocks according to their relevance in the Java programming language and to see whether the code evolved with respect to complexity. To achieve that, regular expression patterns were defined. They allow identifying relevant Java code structures such as classes and methods, which can be used to determine the weight of the newly generated code block. After code analysis by regular expressions, predefined classification values for these code structures are assigned. Basic structures such as variable declarations have a value of 1. More elaborate structures such as classes have a value of 2, and even more complicated structures such as methods have a value of 3. If a structure contains several of these substructures, their classification values are added. An important structure in many programming languages is the declaration of a variable. With the following regular expression, any declaration of the value types boolean, byte, char, double, float, int, long, and short is detected:

(PLACEHOLDER(?!PLACEHOLDER))?(boolean|byte|char|double|float|int|long|short) NAME;(PLACEHOLDER(?!PLACEHOLDER))?

Other important elements are brackets. For example, they are used in methods and classes to specify the body. The syntax is given by the programming language. Placeholders inside brackets are important; they allow new code to be injected into existing code blocks in future combinations. We therefore created the following regular expression:

^(\{PLACEHOLDER\}|\(PLACEHOLDER\))$

As already shown in the simple simulation with electronic circuits [4], one needs a minimal complexity of the initial building blocks to be able to generate useful and more complex future combinations. Classes and methods are essential to build anything complex in Java.
Therefore, regular expressions were implemented to identify valid classes and methods. Valid means the element is closed and it successfully compiles. Variable declarations and methods are allowed to be nested in the class structure. The following regular expression to detect classes was developed:

(protected|private|public) class NAME \{((boolean|byte|char|double|float|int|long|short) NAME;|(protected|private|public) void NAME \(((boolean|byte|char|double|float|int|long|short) NAME)?\) \{((boolean|byte|char|double|float|int|long|short) NAME;|PLACEHOLDER(?!PLACEHOLDER))*\}|PLACEHOLDER(?!PLACEHOLDER))*\}$

A valid method needs to be correctly closed and can contain either a placeholder or a variable declaration. We wanted to have as little influence on the generated code as possible. Therefore, many attempts were needed to formulate regular expression patterns and to weight the detected code structures accordingly. Our presented set of regular expression patterns is obviously not complete. Many additional rules for structures could be defined. The following regular expression to detect methods was developed:

(PLACEHOLDER(?!PLACEHOLDER))?(protected|private|public) void NAME \(((boolean|byte|char|double|float|int|long|short) NAME)?\) \{((boolean|byte|char|double|float|int|long|short) NAME;|PLACEHOLDER(?!PLACEHOLDER))*\}(PLACEHOLDER(?!PLACEHOLDER))?

In some preliminary experiments, we automatically compiled the source code of newly combined code blocks to check whether they are valid. However, this process is too time-consuming to allow large numbers of iterations. An iteration required one to three seconds of compilation time. As combinatorial evolution relies on rather large numbers of iterations, we instead used regular expressions to check whether newly combined code blocks compile and are thus valid.
This turned out to be a much faster alternative to the actual compilation of source code.
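The variable-declaration pattern quoted above can be expressed directly as a Java `Pattern`. The class and method names here are our own; only the regular expression itself and the classification value of 1 come from the text.

```java
import java.util.regex.Pattern;

// The variable-declaration pattern described above, written as a Java Pattern.
// NAME and PLACEHOLDER are the simulation's literal marker tokens; the negative
// lookahead rejects directly repeated placeholders.
public class DeclarationClassifier {

    static final Pattern DECLARATION = Pattern.compile(
        "(PLACEHOLDER(?!PLACEHOLDER))?"
        + "(boolean|byte|char|double|float|int|long|short) NAME;"
        + "(PLACEHOLDER(?!PLACEHOLDER))?");

    // Returns the classification value 1 for a basic declaration, else 0.
    static int classify(String block) {
        return DECLARATION.matcher(block).matches() ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(classify("int NAME;"));    // 1: a basic structure
        System.out.println(classify("class NAME {")); // 0: not a declaration
    }
}
```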
Using Java key words for initial basic code blocks, we found the first useful combinations of code blocks within 100'000 iterations. Such code blocks mainly consisted of combinations of three basic building blocks. Table 1 shows some examples that were found in a simulation lasting for 1.6 billion iterations, which took approximately 5 hours on a desktop computer. Such combinations are typically assigned a small classification value due to their simplicity, keeping in mind that only code blocks that are assigned values greater than 0 are added to the code block repository for later combinations.

It did not take long for the combinatorial evolution simulation to find the first combinations that consisted of previously found code blocks, as illustrated in Table 2. For example, code block 45 – which consists of block 42 and block 44 – was found only 308 iterations later. Though it took some time to find a Java method in code block 168, only a small number of iterations later, many subsequent code blocks followed with higher classification values. Code blocks 169 and 170 characterise Java classes that contain methods and declarations of variables.

Shortly after code blocks 45 and 61 were found, around 100 combinations followed that represented classes containing variable declarations. This can be observed by looking at the dots between 100k and 10m in Fig. 1. The larger the repository grew, the smaller became the probability of a Java method being found. The dot with value 3 at around 1 billion iterations is the method code block 168 in Table 2. All the following code blocks built on this one. Combinations of the method with a variable declaration were assigned a classification value of 4, combinations with a class were assigned a classification value of 5, and combining all three resulted in the assignment of a classification value of 6.
[Tables 1 and 2: examples of generated code blocks, listing iteration, block number, new code block, and classification value. The recoverable entries range from simple combinations such as short NAME ; (iteration 4'647, block 25), public void NAME, boolean NAME ;, and protected class NAME { PLACEHOLDER } to more complex blocks such as public class NAME { PLACEHOLDER }, public void NAME ( ) { PLACEHOLDER }, public class NAME { public void NAME ( ) { PLACEHOLDER } short NAME ; }, and protected class NAME { boolean NAME ; public void NAME ( ) { PLACEHOLDER } }.]
In the present paper we investigated whether it is possible to generate computer programs of increasing complexity using automatic programming through combinatorial evolution. Specifically, we wanted to know what kind of basic code blocks are needed at the beginning of a simulation, how these code blocks are implemented to allow them to combine, and how code complexity can be measured. To start the first iteration of the combinatorial evolution simulation, we needed to define code blocks that exist in the programming framework Java. As basic initial code blocks we defined reserved key words of the Java programming language that are used to define classes and methods, initialise variables, and so on, together with some special characters used in the programming language. Placeholders within code blocks are used to allow combining code blocks and thus source code. Newly generated code blocks are assigned a classification value according to their structure, which represents code complexity. The combinatorial evolution simulation generated code blocks including classes, methods, variables, and combinations thereof. It therefore generated code of increasing complexity.

Regarding the measurement of complexity, different approaches, e.g. determining the number of lines of code and McCabe's [22] cyclomatic complexity, were taken into consideration, but the code blocks obtained after nearly 2 billion iterations were, in our opinion, still too short to apply such a complexity measure. Two factors were decisive in not using McCabe's [22] cyclomatic complexity. First, the simulation did not generate the required public static void main(String[] args) {} method within a reasonable number of iterations, so there was no starting point. Second, we decided not to assign the decision code blocks a value greater than 0 among the initial code blocks.
Without any of these code blocks, the complexity would always be evaluated as 1.

We conclude that the combinatorial evolution simulation clearly shows how Java code can be automatically created using combinatorial evolution. Simple key words and special characters were successfully combined into more complex and different structures like variable declarations or methods, and in later iterations they even got combined into more sophisticated results such as classes consisting of methods and variable declarations.

The reached limitations of complexity show that further research is required. Similar observations for genetic programming [20] suggest that more advanced evolutionary operators should be useful. However, starting with further elaborated code blocks, or reaching them during previous combinatorial evolution, might bring the goal of automatic programming much closer. Therefore, forthcoming research may also study the concept with much increased computational power and distributed computing.

References

[1] Thomas Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.
[2] W. Brian Arthur. How We Became Modern. In Shuzhen Sim and Benjamin Seet, editors, Sydney Brenner's 10-on-10: The Chronicles of Evolution. Wildtype Books, 2018.
[3] W. Brian Arthur. The Nature of Technology: What It Is and How It Evolves. Free Press, New York, 2009.
[4] W. Brian Arthur and Wolfgang Polak. The evolution of technology within a simple computer model. Complexity, 11(5):23–31, 2006.
[5] Wendy Hui Kyong Chun. On Software, or the Persistence of Visual Knowledge. Grey Room, 18:26–51, 2005.
[6] John R. Koza. Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4(2):87–112, 1994.
[7] Riccardo Poli, William B. Langdon, Nicholas F. McPhee, and John R. Koza. Genetic programming: An introductory tutorial and a survey of techniques and applications. University of Essex, UK, Tech. Rep. CES-475, pages 927–1028, 2007.
[8] Kory Becker and Justin Gottschlich. AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms. arXiv, 2017.
[9] David Lorge Parnas. Software aspects of strategic defense systems. ACM SIGSOFT Software Engineering Notes, 10(5):15–23, 1985.
[10] J. C. Cleaveland. Building application generators. IEEE Software, 5(4):25–33, 1988.
[11] Krzysztof Czarnecki and Ulrich Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press/Addison-Wesley Publishing Co., New York, 2000.
[12] John H. Holland. Signals and Boundaries: Building Blocks for Complex Adaptive Systems. The MIT Press, Cambridge, 2012.
[13] John H. Holland. Genetic Algorithms. Scientific American, 267(1):66–72, 1992.
[14] Nelishia Pillay and Caryl K. A. Chalmers. A hybrid approach to automatic programming for the object-oriented programming paradigm. In Proceedings of the 2007 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries, SAICSIT '07, pages 116–124, New York, NY, USA, 2007. Association for Computing Machinery.
[15] William Fielding Ogburn. Social Change: With Respect to Culture and Original Nature. B. W. Huebsch, New York, 1922.
[16] Stefan Thurner. The Creative Destruction of Evolution. In Shuzhen Sim and Benjamin Seet, editors, Sydney Brenner's 10-on-10: The Chronicles of Evolution. Wildtype Books, 2018.
[17] Stefan Thurner, Rudolf Hanel, and Peter Klimek. Introduction to the Theory of Complex Systems. Oxford University Press, New York, 2018.
[18] Stefan Thurner. A Simple General Model of Evolutionary Dynamics. In Hildegard Meyer-Ortmanns and Stefan Thurner, editors, Principles of Evolution: From the Planck Epoch to Complex Multicellular Life, The Frontiers Collection, pages 119–144. Springer, Berlin Heidelberg, 2011.
[19] Peter Klimek, Ricardo Hausmann, and Stefan Thurner. Empirical Confirmation of Creative Destruction from World Trade Data. PLoS ONE, 7(6):e38924, 2012.
[20] Adam Tyler Harter. Advanced techniques for improving canonical genetic programming. Missouri University of Science and Technology, 2019.
[21] Christof Ebert, James Cain, Giuliano Antoniol, Steve Counsell, and Phillip Laplante. Cyclomatic complexity. IEEE Software, 33(6):27–29, 2016.
[22] T. J. McCabe. A complexity measure. IEEE Transactions on Software Engineering, SE-2(4):308–320, 1976.
[23] Ankita Garg. An approach for improving the concept of Cyclomatic Complexity for Object-Oriented Programming. arXiv, 2014.
[24] Mir Muhammad Suleman Sarwar, Sara Shahzad, and Ibrar Ahmad. Cyclomatic complexity: The nesting problem. In Eighth International Conference on Digital Information Management (ICDIM 2013), pages 274–279. IEEE, 2013.