Code Building Genetic Programming
Edward Pantridge
Swoop, Cambridge, Massachusetts, [email protected]
Lee Spector
Amherst College, Hampshire College, and UMass Amherst, Amherst, Massachusetts, [email protected]
ABSTRACT
In recent years the field of genetic programming has made significant advances towards automatic programming. Research and development of contemporary program synthesis methods, such as PushGP and Grammar Guided Genetic Programming, can produce programs that solve problems typically assigned in introductory academic settings. These problems focus on a narrow, predetermined set of simple data structures, basic control flow patterns, and primitive, non-overlapping data types (without, for example, inheritance or composite types). Few, if any, genetic programming methods for program synthesis have convincingly demonstrated the capability of synthesizing programs that use arbitrary data types, data structures, and specifications that are drawn from existing codebases. In this paper, we introduce Code Building Genetic Programming (CBGP) as a framework within which this can be done, by leveraging programming language features such as reflection and first-class specifications. CBGP produces a computational graph that can be executed or translated into source code of a host language. To demonstrate the novel capabilities of CBGP, we present results on new benchmarks that use non-primitive, polymorphic data types as well as some standard program synthesis benchmarks.
CCS CONCEPTS
• Software and its engineering → Genetic programming;

KEYWORDS
automatic programming, genetic programming, inductive program synthesis
ACM Reference Format:
Edward Pantridge and Lee Spector. 2020. Code Building Genetic Programming. In Genetic and Evolutionary Computation Conference (GECCO '20), July 8–12, 2020, Cancún, Mexico.
ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3377930.3390239

"Inductive program synthesis" is the term used to describe the process of constructing an executable program from a set of input-output examples. A form of inductive program synthesis which
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
GECCO '20, July 8–12, 2020, Cancún, Mexico
© 2020 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery.
ACM ISBN 978-1-4503-7128-5/20/07...$15.00
https://doi.org/10.1145/3377930.3390239

has been gaining attention in recent years is "general program synthesis," which specifically aims to produce programs that use similar constructs (data types, control flow, data structures) as human programmers. The goal of this research area is to eventually discover a process of automatic programming that is comparable to human skill [12].

Real world applications of a sufficiently sophisticated automatic programming system would have a dramatic impact on software development. Organizations could potentially deploy automated systems that attempt to fix bugs or reduce the resource utilization of their software. In addition, there may be a class of problems for which the solution is difficult for humans to produce, but easy for program synthesis methods to discover.

An ideal program synthesis framework must be capable of producing programs that utilize arbitrary data types, including types with arbitrary relationships, such as inheritance. The computations performed by this system must support generic manipulation of polymorphic types.

Humans rely on decomposition, abstraction, and re-use to create complex software applications. An ideal program synthesis framework must be able to utilize preexisting abstractions (human written and/or previously synthesized) in the programs it produces. Furthermore, the framework should not require additional configuration or modification to use problem-specific abstractions.

The output of an ideal program synthesis framework would be the same as the output of a human programmer. Typically this is source code that can be executed in any environment that supports the programming language.

No known program synthesis methodology is capable of achieving all of the goals stated above.
This paper presents Code Building Genetic Programming (CBGP), a general program synthesis system with comparable problem solving performance to PushGP and G3P. CBGP also provides additional capabilities that are not present in other program synthesis systems. These include: use of generic functions, support for polymorphic types, automatic integration with existing codebases, and transcription of programs into human-readable source code of the host language.

The rest of this paper is structured as follows. First, we discuss related work that informed our research. Second, we describe CBGP's program representation and the architecture of the search process. We then provide details on the benchmark problems used to evaluate CBGP and present early comparisons against other methods. We conclude with discussion of the implications of this work and suggestions for future research. An open source implementation of our prototype can be found at https://github.com/erp12/CodeBuildingGeneticProgramming-ProtoType
Inductive program synthesis is a research topic that spans multiple fields of study. Evolutionary computation has produced multiple genetic programming frameworks designed to synthesize general programs. Some of the state-of-the-art program synthesis methods include PushGP and Grammar Guided Genetic Programming (G3P).

PushGP evolves programs in a Turing complete language, called Push. The Push language uses a stack-based execution model [17]. Push programs are nested sequences of instructions and literals. Literals are values that get directly pushed onto a particular stack based on their data type. Instructions are functions that pop values off the stacks, transform them, and push the result back onto the appropriate stack. Programs are run through an interpreter and the final state of the stacks is considered the output of the program. Push code can be pushed onto a dedicated stack that can be used by instructions to implement control flow patterns like iteration and conditionals.

It is not possible for PushGP to evolve programs that use arbitrary data types without the definition of additional instructions. This means a custom PushGP system would need to be built to take advantage of preexisting problem-specific abstractions. Furthermore, it is not clear how to build a PushGP system capable of using overlapping types. If C_sub is a concrete subclass of a concrete super-class, C_super, which stack should an instruction that requires an input of type C_super pop from to find its arguments? To our knowledge, no PushGP implementation has ever developed a successful strategy for dealing with this situation or any other form of polymorphism.

PushGP programs are nested sequences of tokens that can be processed by a Push interpreter. Thus, the programs are only as portable as the PushGP implementations that produce them.
Re-implementation of Push programs as source code of a host language is difficult because the stack-based execution does not map cleanly onto common programming paradigms.

Grammar Guided Genetic Programming (G3P) is a more recently introduced method of inductive program synthesis that features a set of context-free Backus-Naur Form grammars [2]. G3P uses one grammar per supported data type to achieve coverage over general programs. Using these grammars, a grammatical evolution (GE) framework can map variable length sequences of integers into programs [16]. The G3P methodology has the advantage of producing type-safe source code in the host language. The code produced can contain variable assignments, control flow, and iteration. It has been previously discussed that extension of G3P systems to support additional data types is laborious, as it requires the manual creation and integration of additional grammars [2]. It has also been shown that the performance of G3P is sensitive to the exact implementation of the grammars [3]. Similar to PushGP, we do not know of any prior work that attempts to extend G3P to support polymorphism.

Other program synthesis methods that are not variants of genetic programming exist, such as: Excel Flash Fill [5], MagicHaskeller [11], TerpreT [4], and DeepCoder [1]. Although all of these systems are successful in their target problem domains, they are incapable of attempting the standard program synthesis benchmark problems due to a lack of support for crucial data types and control structures [13].

Figure 1: Some example program DAGs. "DAG 1" shows a program that will return the first 3 elements of a list of strings. The list of strings is provided to the DAG as an input value named MyList. Definitions for the expressions in "DAG 1" can be found in Figure 2. "DAG 2" is an example of the use of higher order functions described in Section 3.2.

Code Building Genetic Programming aims to be a program synthesis methodology with comparable search performance to PushGP and G3P, while adding some additional capabilities that bring the field significantly closer to feasible real-world applications.
Programs produced by Code Building Genetic Programming are computational graphs. Each inner node of a graph is a function and each leaf node is either a constant value or a reference to one of the program's inputs. We use the general term "expression" as the name for a node in a computational graph. The computational graphs are directed and acyclic (DAGs). A program produced by CBGP is thus a DAG of expressions, called a "program DAG."
Expressions are containers that encapsulate a particular computation or value. In addition, expressions hold an associated specification of the computation they encapsulate. For this work, the specifications are annotations denoting relevant data types associated with the expression.

The prototype implementation of CBGP discussed in this paper uses a finite set of expression types: Constants, Inputs, Functions, Methods, Constructors, and Higher Order Functions.

"DAG 1" in Figure 1 shows an example program DAG that represents a program that returns the first 3 elements of a list, or the entire list if it contains 3 or fewer elements. Descriptions of each expression in the DAG are found in Figure 2. Note that the MyList expression is specified to return a list of strings, which is a sub-type of list. The Length expression is specified to take a generic list as an argument, but the MyList expression is a valid child expression because all values of MyList are guaranteed to be instances of the expected argument type. Program DAGs are type-safe because the return type of all child expressions is a sub-type of (or the same type as) the corresponding argument type of their parent expression.

Name   | Expr. Type | Arguments            | Return    | Behavior
MyList | Input      | -                    | List[str] | An input that will be supplied each time the DAG is executed.
3      | Constant   | -                    | Int       | A constant integer value of 3.
Length | Method     | L: List              | Int       | Returns the number of elements in L.
Min    | Function   | a: Number, b: Number | Number    | Returns the minimum of a and b.
Take   | Method     | L: List, N: Int      | List      | Returns a list containing the first N elements of L.

Figure 2: The definitions of the expressions found in "DAG 1" of Figure 1. The "Arguments" and "Return" attributes of an expression make up its specification. Method expressions treat the class instance object as an implicit argument.
Constants and Inputs are leaf nodes that don't require any arguments. Constants always return the same value and Inputs return the value of a specific input to the program DAG. Both expression types have a specification that is the data type of the value of the node. If the data type is a collection type, union type, or some other polymorphic type, the specification will be a decomposable representation of the type and its parts. For example, a key-value map type annotation can be decomposed into the data types of the keys and values respectively. In Figure 1 the "MyList" expression is an Input and the "3" expression is a Constant.

Some expression nodes of a program DAG represent a function-like computation that accepts arguments and returns a value. These expressions wrap an underlying function in the host language. Method and Constructor expressions wrap parts of a pre-existing class definition. All function-like expressions have the same structure in their specification, consisting of a mapping from argument names to types and an additional type corresponding to the return value. In Figure 1 "Length," "Min," "Take," and "Add" are all function-like expressions.

It is possible for Function and Constructor expressions to be leaf nodes of a program DAG if they have an arity of zero. Method expressions cannot be leaf nodes because they are treated as functions which have an implicit argument that is the class instance object.
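To make the representation concrete, the following minimal sketch builds and evaluates "DAG 1" from Figure 1. The class names and evaluation interface here are our own illustration, not the CBGP prototype's API:

```python
from typing import Any, Callable, Dict

# Minimal illustrative sketch of a program DAG; the class names are
# ours, not the CBGP prototype's.

class Expr:
    def eval(self, inputs: Dict[str, Any]) -> Any:
        raise NotImplementedError

class Constant(Expr):
    def __init__(self, value: Any):
        self.value = value
    def eval(self, inputs):
        return self.value

class Input(Expr):
    def __init__(self, name: str):
        self.name = name
    def eval(self, inputs):
        return inputs[self.name]

class Call(Expr):
    """A function-like expression: wraps a host-language callable."""
    def __init__(self, fn: Callable, *children: Expr):
        self.fn = fn
        self.children = children
    def eval(self, inputs):
        return self.fn(*(c.eval(inputs) for c in self.children))

# "DAG 1" from Figure 1: take the first 3 elements of MyList,
# or the whole list if it has 3 or fewer elements.
my_list = Input("MyList")
dag1 = Call(lambda lst, n: lst[:n],          # Take
            my_list,
            Call(min,                        # Min
                 Call(len, my_list),         # Length
                 Constant(3)))

print(dag1.eval({"MyList": ["a", "b", "c", "d", "e"]}))  # ['a', 'b', 'c']
print(dag1.eval({"MyList": ["x"]}))                      # ['x']
```

Note that the same Input node appears as a child of both the Take and Length expressions, which is what makes the structure a DAG rather than a tree.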
Contemporary program synthesis frameworks, such as PushGP and G3P, have demonstrated the capability to produce programs that utilize iteration and control flow. It is challenging to express generic forms of iteration, such as a while-loop, in an acyclic computational graph. In contrast, higher order functions are seeing increasingly widespread use in the computational graphs used by big-data processing frameworks like Spark [18]. CBGP draws inspiration from these modern DAG representations and uses higher order functions in program DAGs to manipulate data structures.

Figure 1 depicts a program DAG that utilizes the map higher order function. The first argument to the map expression must be an expression that returns a collection. The second argument is an expression that represents an anonymous function to apply to each element of the collection.

As shown in DAG 2 of Figure 1, one of the expressions contained in the anonymous function sub-DAG is a special kind of Input expression, called a "local input." These expressions can only exist in a program DAG as part of an anonymous function. Each call to the anonymous function within an execution of the DAG will use the collection element as the value for the local input. It is possible for multiple local input expressions to appear in a single anonymous function sub-DAG, and anonymous function DAGs may contain higher order function expressions with their own nested anonymous function DAGs.

When the map Higher Order Function expression is evaluated as part of a program DAG, the following process is followed:
(1) An empty sequential buffer is initialized.
(2) The child expression corresponding to the sequential collection is evaluated.
(3) For each element of the resulting collection:
    (a) The anonymous function child expression is called with the element passed as the value for the local input. Input values of the overall program DAG are also passed to the anonymous function.
    (b) The resulting value is appended to the buffer.
(4) The buffer is returned as the output of the map expression.

The current implementation of CBGP also supports a filter Higher Order Function expression which follows a similar process. The anonymous function for the filter expression must return a Boolean value. Additional Higher Order Function expressions, such as reducing and partitioning, are possible but have not yet been implemented in any CBGP system.
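The four-step map evaluation above can be sketched as follows. Expressions are modeled here as plain callables over an input dictionary, and the "%local" key used to carry the collection element is our own illustrative assumption:

```python
from typing import Any, Callable, Dict

# Hedged sketch of the four-step map evaluation described above;
# the encoding of expressions and local inputs is illustrative only.

LOCAL = "%local"  # hypothetical key carrying the collection element

def eval_map(collection_expr: Callable[[Dict[str, Any]], list],
             body_expr: Callable[[Dict[str, Any]], Any],
             inputs: Dict[str, Any]) -> list:
    buffer: list = []                           # (1) empty sequential buffer
    collection = collection_expr(inputs)        # (2) evaluate collection child
    for element in collection:                  # (3) for each element...
        local_env = {**inputs, LOCAL: element}  # (3a) program inputs + local input
        buffer.append(body_expr(local_env))     # (3b) append the result
    return buffer                               # (4) buffer is the map's output

# DAG 2-style usage: add 1 to every element of the input list.
result = eval_map(lambda env: env["MyList"],
                  lambda env: env[LOCAL] + 1,
                  {"MyList": [1, 2, 3]})
print(result)  # [2, 3, 4]
```

The filter expression would follow the same shape, appending the element itself whenever the anonymous function returns True.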
It is becoming increasingly popular for programming languages to be accompanied by rich, first-class, data-oriented specification tooling. These tools enable static and run-time validations of code behavior. One example of this tooling is the typing module introduced into the standard library of the Python programming language as of version 3.5. This library adds type hint annotations to function definitions, which complement the primarily object oriented programming language [19]. The Clojure Spec library is an example of an exceptionally expressive specification framework that goes beyond type checking [10]. It has seen rapid adoption since its emergence into Clojure's functional programming ecosystem. These new technologies allow systems to reason about programs, and their behavior, as first-class objects, which is profoundly useful for program synthesis systems.

The prototype implementation of Code Building Genetic Programming uses Python's type annotations as specifications of the values and functions encapsulated in expressions. Importantly, for polymorphic types the annotations can go beyond the generic data type. For example, a list of strings might be annotated as list or as List[str]. The List[str] annotation can be decomposed into a collection type (List) and an element type (str). The typing system also supports the comparison of types at run-time. For example, list is a sub-type of Sequence by inheritance and int is a sub-type of Union[int, float] by composition.

Furthermore, language features such as reflection can be used to automatically discover the available functions, types, and values defined in the environment, as well as their associated specifications. This allows CBGP to be injected into an arbitrary run-time environment, with existing code and external dependencies, and create a set of expressions from the discoverable classes, functions, and variables. The only assumption made by CBGP is that there is an available specification for everything it finds. The implication of this is that CBGP can synthesize programs which utilize human-written code, previously synthesized code, and external libraries without any additional modification or configuration.

As a consequence, in real world applications the performance of CBGP can be improved by a human developer creating additional abstractions for the CBGP system to use. This kind of cooperation between a human programmer and a program synthesis framework is infeasible under other known methods because incorporation of new human-written abstractions requires modification or laborious configuration of the program synthesis method.
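The decomposition and run-time comparisons described above come directly from the Python standard library; nothing in this example is CBGP-specific:

```python
import typing
from collections.abc import Sequence
from typing import List, Union, get_args, get_origin

# Decompose List[str] into a collection type and an element type.
ann = List[str]
print(get_origin(ann))  # <class 'list'>
print(get_args(ann))    # (<class 'str'>,)

# Run-time comparisons: list is a sub-type of Sequence by inheritance,
# and int is a component of Union[int, float] by composition.
print(issubclass(list, Sequence))          # True
print(int in get_args(Union[int, float]))  # True

# Reflection: recover a function's specification from its annotations,
# as CBGP could when discovering functions in an environment.
def take(L: List[str], N: int) -> List[str]:
    return L[:N]

hints = typing.get_type_hints(take)
print(hints["N"], hints["return"])  # <class 'int'> typing.List[str]
```

Combined with the inspect module for enumerating the members of a module or class, this is enough machinery to build an expression set from an arbitrary annotated codebase.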
Once a set of expressions has been defined within the context of a Code Building Genetic Programming application, program DAGs can be produced via the composition of expressions. An expression becomes a program DAG once it is assigned child nodes that satisfy all of the arguments required by its specification. Each child node must also be a valid program DAG. Constant, Input, and zero-arity Function expressions are always valid program DAGs.
Although a Function expression has a specification for its arguments and return value, it is often possible to produce a more exact specification of the expression once the precise specifications of one or more of its child expressions are known. For example, the Take expression described in Figure 2 is initially specified to return a List. However, it is known at development time that the Take method will return a list with the same element type as the list the method was called on. If a MyList expression from Figure 2 is the child of a Take expression, the correct specification of the return type for the Take expression is List[str].

We name this process of updating the specification of an expression instance "expression reification" because it concretizes attributes of the specification that become implicit given the additional information provided by the specifications of the child nodes. Figure 3 shows an example of different ways an arithmetic addition expression might be reified in a program DAG.

Reification is implemented in CBGP as a set of zero or more "reification rules" that are assigned to each expression based on its logical behavior. Figure 4 describes the reification rule types that are used in the CBGP implementation created for this research. Most function-like expressions either do not require reification, or
require the exact reification described in one of the common rules. In other words, reification rules can be heavily reused throughout an application.

Figure 3: Box 1 diagrams an "Add" expression with its default specification and no arguments. Boxes 2 and 3 show two different possible reifications for the "Add" expression depending on the exact types of the arguments.
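Two of the reification rules from Figure 4 can be sketched as functions over a hypothetical dict-based specification format. The rule names match the figure, but the specification format and function signatures are our own assumption:

```python
from typing import Any, Dict, List, get_args

def pass_through(spec: Dict[str, Any], arg: str) -> Dict[str, Any]:
    # "Pass Through": expression returns the same type as a particular
    # argument (e.g. abs, reverse, filter; also Take in Figure 2).
    return {**spec, "return": spec["args"][arg]}

def return_element(spec: Dict[str, Any], arg: str) -> Dict[str, Any]:
    # "Return Element": expression returns the element type of a
    # collection argument (e.g. first, last, max, min).
    return {**spec, "return": get_args(spec["args"][arg])[0]}

# Take's specification once its L child is known to be a List[str]:
# pass-through reification concretizes the return type to List[str].
take_spec = {"args": {"L": List[str], "N": int}, "return": List}
print(pass_through(take_spec, "L")["return"])     # typing.List[str]

# A hypothetical first : List -> Any, reified for a List[str] argument.
first_spec = {"args": {"L": List[str]}, "return": Any}
print(return_element(first_spec, "L")["return"])  # <class 'str'>
```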
Code Building Genetic Programming evolves a collection of expressions that can be "compiled" into a program DAG using a stack-based process inspired by the PushGP execution model [17]. PushGP uses this stack-based model to run the evolved programs, while CBGP uses this process to construct the program DAGs.

This compilation process translates a nested, sequential collection of expressions into a type-safe, reified program DAG that satisfies the overall specification of the program being evolved. This ensures that the program produced can be evaluated and is unlikely to produce run-time errors.

The input to the DAG compilation process is a nested, sequential structure of expressions and a specification of the return value for the desired program. Much like PushGP, the nested sequence of expressions is loaded onto the exec stack of a Push interpreter to be processed one element at a time. Unlike PushGP, the Push interpreter used by CBGP only contains 2 additional stacks: one for DAGs and one for anonymous function definitions.

While the exec stack is not empty, the top element is popped and processed depending on its type. Constant and Input expressions are pushed to the DAG stack. Function, Method, and Constructor expressions undergo the following process:
(1) For each argument in the expression's specification:
    (a) The DAG stack is traversed.
    (b) If the next DAG returns a sub-type of the expected argument's type, the DAG is a viable child to the expression and is removed from the DAG stack.
    (c) The expression's specification is updated with its reification rules (if any).
    (d) Step 1 is repeated until all arguments are satisfied by a child expression, or an argument is found that can't be satisfied by any DAG on the stack.
(2) If the set of child DAGs is incomplete:
    (a) The expression is discarded.
    (b) All stacks are reverted to their original states.
    (c) Step 3 is skipped.
(3) If all arguments are satisfied by a child node:
    (a) The children are assigned to the expression, creating a valid DAG.
    (b) The new DAG is pushed to the DAG stack.

Reification Rule | Description                                                              | Examples
Pass Through     | Expression returns the same type as a particular argument.               | abs, reverse, filter
Return Element   | Expression returns the element type of a collection argument.            | first, last, nth, max, min
Args To Element  | Expression argument must be the element type of the collection argument. | index_of, find, is_in
Args To Same     | Expression arguments must have the same type.                            | concat, <, >
List Of          | Expression returns a list with an element type of another argument.      | list
Max Type         | Expression returns the argument type that is highest in a hierarchy.     | +, -, *

Figure 4: The set of reification rules used in the prototype implementation of CBGP presented in this paper.

If the top element of the exec stack is a list, it is pushed onto the anonymous function stack. It cannot be compiled into a program DAG because the required argument and return specifications are not known until the anonymous function is used by a specific higher order function.

When a
Higher Order Function expression is processed, it undergoes the following process:
(1) The DAG stack is traversed to find a child expression that returns a collection type.
(2) If no collection type expression is found, the higher order function expression is discarded and the stacks are unchanged. Steps 3 through 5 are skipped.
(3) The anonymous function stack is traversed. For each list on the stack:
    (a) A new Push compilation process is made to compile the list into a DAG. This compilation process uses local input expressions as references to an element of the collection.
    (b) If the nested compilation process produces a program DAG that returns the correct type, the DAG is assigned as the anonymous function body.
(4) If no viable anonymous functions are found on the anonymous function stack, the higher order function expression is discarded and any changes to the stacks are reverted.
(5) If the higher order function expression has both of the required child nodes, it becomes a valid DAG and is pushed to the DAG stack.

If a local input expression is processed outside of the compilation of an anonymous function, it is ignored and has no effect on the stacks.

Once the exec stack is empty, the DAG stack will hold zero or more program DAGs. To find the single DAG that will be considered the program DAG, elements are popped from the DAG stack until a DAG is found whose return type is the same type as (or a sub-type of) the return type of the desired program.

It is possible that this compilation does not produce a program DAG that satisfies the specification of the program being evolved. For example, filter expressions must find a predicate anonymous function that returns a Boolean value. This situation must be handled in the error functions that are used to guide evolution toward a solution.
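A greatly simplified sketch of the compilation loop follows, covering only leaf and function-like expressions, with no reification, methods, or higher order functions. The tuple encodings and the use of plain Python classes as types are our own assumptions:

```python
from typing import Any, List, Tuple

# A DAG is (return_type, node); a node is a nested tuple a host could eval.

def compile_push(exec_stack: list, want: type):
    dag_stack: List[Tuple[type, Any]] = []
    for item in exec_stack:
        kind = item[0]
        if kind in ("const", "input"):       # leaves go straight to the DAG stack
            _, rtype, node = item
            dag_stack.append((rtype, node))
        elif kind == "fn":
            _, name, arg_types, rtype = item
            saved = list(dag_stack)          # remember state in case we revert
            children, ok = [], True
            for at in arg_types:             # (1) satisfy each argument...
                for i in range(len(dag_stack) - 1, -1, -1):
                    if issubclass(dag_stack[i][0], at):  # (1b) sub-type check
                        children.append(dag_stack.pop(i)[1])
                        break
                else:
                    ok = False               # no viable child for this argument
                    break
            if ok:                           # (3) assemble and push the new DAG
                dag_stack.append((rtype, (name, *children)))
            else:                            # (2) discard and revert the stacks
                dag_stack = saved
    for rtype, node in reversed(dag_stack):  # final: topmost DAG matching `want`
        if issubclass(rtype, want):
            return node
    return None                              # compilation failed

# Compile "Length(MyList)": push the input, then the Length expression.
prog = compile_push(
    [("input", list, "MyList"),
     ("fn", "len", (list,), int)],
    want=int)
print(prog)  # ('len', 'MyList')
```

A failed compilation simply returns None here, which corresponds to the case the error functions must penalize.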
The novel capabilities of CBGP come from its expressive program DAG representation and unique process for safely building the DAGs as described in Section 5.2. The implementation of CBGP used for this research borrows all other aspects of the evolutionary computation from existing contemporary methods. The evolutionary population holds individuals defined by a linear genome. Error functions are used to assign errors to individuals. A parent selection strategy is used to pick individuals for variation. In summary, CBGP implements a typical generational evolutionary algorithm [15].

After evolution has found a solution individual, commonly used genome simplification methods are used to simplify individuals without sacrificing performance. Often the simplified genomes produce program DAGs with improved generalization performance [6].
Individuals in CBGP use a derivative of the Plushy genome representation found in multiple PushGP systems [14]. The genome is a flat sequence of expressions and structure tokens. The two kinds of structure tokens are OPEN tokens and CLOSE tokens. The Plushy genomes can be translated into a Push representation that can be compiled into a program DAG via the process outlined in Section 5.2.

This linear structure allows for the use of a wide range of variation operators, and has been shown to yield better search results [7, 9]. Also, the layers of indirection when translating Plushy genomes into Push code and compiling Push code into program DAGs ensure that type-safe computational graphs can be evolved without suffering from the bloat that accompanies using graph (or tree) structures directly.
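One simple interpretation of the Plushy-to-Push translation is sketched below. The exact OPEN/CLOSE semantics vary between Plushy implementations, so this is a hedged approximation rather than the prototype's algorithm:

```python
OPEN, CLOSE = "OPEN", "CLOSE"

def plushy_to_push(genome):
    stack = [[]]                      # stack of nesting levels
    for gene in genome:
        if gene == OPEN:
            stack.append([])          # start a new nested code block
        elif gene == CLOSE:
            if len(stack) > 1:        # ignore unmatched CLOSE tokens
                block = stack.pop()
                stack[-1].append(block)
        else:
            stack[-1].append(gene)    # ordinary expression token
    while len(stack) > 1:             # close any dangling OPENs
        block = stack.pop()
        stack[-1].append(block)
    return stack[0]

# Expression tokens here are opaque strings standing in for expressions.
genome = ["MyList", OPEN, "%local", "1", "add", CLOSE, "map"]
print(plushy_to_push(genome))
# ['MyList', ['%local', '1', 'add'], 'map']
```

Because unmatched structure tokens are repaired rather than rejected, any flat genome translates to some valid nested Push program, which keeps variation operators simple.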
Once an individual's genome has been translated into Push code and compiled into a program DAG, it can be evaluated by an error function. The general structure of a typical error function is to evaluate the program on a set of training cases containing input-output pairs. If the Push compilation process did not produce a program DAG that returns the required type for the given problem, the individual is assigned penalty errors for every training case.

It is also possible for a program DAG to raise exceptions during run-time because not all expressions are defined for every value of their argument types. For example, a function with an integer argument might only return when given a natural number and raise an exception when given a negative integer. If this happens during program DAG evaluation, the exception is caught and a penalty error is assigned for the training case.

One advantage CBGP has over PushGP is that a program DAG only needs to be compiled once, after which the individual can be evaluated on all training cases. PushGP requires a separate execution of the stack-based interpreter for every training case. The computational graph representation of the programs produced by CBGP is much faster to compute than the stack-based execution in PushGP. Given that evaluation is the most expensive step of the evolutionary cycle, CBGP has the potential to dramatically reduce the cost of evolution for program synthesis tasks.
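The error-function behavior described above can be sketched as follows. The penalty constant and the per-case error metric are illustrative assumptions:

```python
import math

PENALTY = 1_000_000  # illustrative penalty constant

def error_vector(program, training_cases):
    """program: a callable compiled from a DAG, or None when compilation
    failed to produce a DAG of the required return type."""
    if program is None:
        return [PENALTY] * len(training_cases)    # penalize every case
    errors = []
    for inputs, expected in training_cases:
        try:
            output = program(inputs)
            errors.append(abs(output - expected))  # problem-specific error
        except Exception:
            errors.append(PENALTY)                 # run-time exception caught
    return errors

cases = [({"x": 4}, 2.0), ({"x": -1}, 0.0)]
prog = lambda inp: math.sqrt(inp["x"])             # raises on negative input
print(error_vector(prog, cases))  # [0.0, 1000000]
```

The resulting error vector is exactly what per-case selection schemes such as lexicase selection consume.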
One appealing aspect of G3P systems is their ability to produce source code that is potentially readable by humans and is portable between codebases. This capability implies that a sufficiently capable GP system could eventually be used in cooperation with human developers to contribute to the same codebase.

The program DAGs synthesized by Code Building Genetic Programming are roughly analogous to abstract syntax trees. Using knowledge of the host language's syntax, it is possible to produce source code from a program DAG. This process makes the output of CBGP human-interpretable and portable, similar to G3P.

As mentioned previously, program DAGs evolved by CBGP contain expressions that wrap existing functions and methods. Thus, the source code representation of a program DAG includes calls to these functions and methods. This further motivates the potential cooperation between human programmers and a program synthesis framework as mentioned in Section 4.
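Treating a program DAG as a small abstract syntax tree, the transcription can be sketched as below. The tuple encoding of DAG nodes is our own, not the prototype's:

```python
def to_source(node):
    """Recursively transcribe a DAG node into a Python expression string."""
    if isinstance(node, tuple):            # function-like expression
        fn, *children = node
        args = ", ".join(to_source(c) for c in children)
        return f"{fn}({args})"
    return str(node)                       # constant or input reference

# "DAG 1" from Figure 1: Take(MyList, Min(Length(MyList), 3)).
dag1 = ("take", "MyList", ("min", ("len", "MyList"), 3))
print(f"def solution(MyList):\n    return {to_source(dag1)}")
# def solution(MyList):
#     return take(MyList, min(len(MyList), 3))
```

Because the DAG only ever calls named host-language functions, the emitted code needs no interpreter and runs anywhere those functions are importable.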
For every benchmark problem, the CBGP system was configured to use the following evolutionary settings.

Parameter             | Setting
Runs                  | 31
Generations           | 300
Population Size       | 1000
Selection             | Lexicase
Variation             | UMAD
Random Training Cases | 100
Test Cases            | 1000

With the exception of the relatively low number of runs per problem, these settings are comparable to those used by other GP frameworks in previous publications. We leave the study and calibration of these settings for CBGP as future work.

We tested a prototype implementation of CBGP on two small sets of benchmark problems. The first is a novel set of benchmarks designed to demonstrate CBGP's ability to handle polymorphic types and integrate with existing codebases. The second is a sample of problems taken from the "General Program Synthesis Benchmark Suite," which has become a standard benchmark set for program synthesis GP systems since its introduction in 2015 [8].

The novel benchmarks presented in this section are similar to typical utility functions that might be implemented in real world applications as part of a larger system. The "General Program Synthesis Benchmark Suite" problems are derivatives of introductory academic assignments [8]. In the following subsections we describe the novel benchmarks created for CBGP.

DateTime          | TimeDelta        | Path
year() -> int     | days() -> int    | to_str() -> str
month() -> int    | seconds() -> int | abspath() -> Path
day() -> int      |                  | split() -> List[str]
hour() -> int     |                  | basename() -> str
minute() -> int   |                  | dirname() -> Path
second() -> int   |                  | isabs() -> bool
                  |                  | join(other: Path) -> Path

Figure 5: APIs of the classes defined for the benchmark problems. These classes are annotated versions of existing classes provided by the Python standard library. This collection of classes is meant to demonstrate CBGP's ability to interface with pre-existing codebases.
The "Days Between" problem is designed to demonstrate CBGP's ability to work with pre-defined types and classes. The problem prompt is as follows:

Given two DateTime objects, return the absolute number of days between them.

The API of the DateTime class and the related TimeDelta class can be found in Figure 5. It should also be noted that DateTime and TimeDelta objects can be compared with comparison functions (i.e., <, ≤, >, ≥, ==) and shifted with arithmetic functions (i.e., add, sub). To our knowledge, no program synthesis GP methods have shown a capability to work with date and time data types without explicit configuration and extension.

The "Filter Bounds" problem shows CBGP's ability to produce functions that work with different instances of a polymorphic type. The "Filter Bounds" problem also requires the use of higher order functions. The problem prompt is as follows:

Given a list of elements that are all of the same comparable type, T, and two instances of type T representing a lower and upper bound, filter the list to the elements that fall between the two bounds (inclusively). For example, if given the list [6,5,4,3,2,1], the lower bound 3, and the upper bound 5, the result should be [5,4,3]. Also, given the list ["a","b","c"], the lower bound "x", and the upper bound "zzz", the result should be an empty list.

The datasets of training and test cases use a variety of comparable types for T. The evolved solution is expected to work generically for all lists of comparable elements.

The "Prefix Paths" problem is designed with similar goals to the "Days Between" problem, except with the added complexity of requiring the use of a higher order function. The problem prompt is as follows:

Given a
Path object representing a root directory anda list of file names (as strings), return a list of
Path objects that join the root path and each filename. Theresulting list of
Path objects should be in the sameorder as the given filenames. For example, given aroot of
Path("/tmp") and a list of files ["log.txt", "data.csv"] , the result should be [Path("/tmp/log.txt"), Path("/tmp/data.csv")] .The API for the ‘Path‘ class used in this benchmark can be foundin Figure 5. The solution program should work with absolute andrelative root paths.
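For concreteness, hand-written reference solutions to the three prompts might look like the following sketch. It uses Python's standard datetime and pathlib types rather than CBGP's annotated DateTime, TimeDelta, and Path wrappers, and the function names are ours, not evolved output:

```python
from datetime import datetime
from pathlib import Path
from typing import List, TypeVar

T = TypeVar("T")  # any comparable element type

def days_between(dt1: datetime, dt2: datetime) -> int:
    # abs() of a timedelta is well-defined, so argument order does not matter.
    return abs(dt1 - dt2).days

def filter_bounds(lst: List[T], lower: T, upper: T) -> List[T]:
    # Keep elements in [lower, upper] inclusively, preserving input order.
    return [x for x in lst if lower <= x <= upper]

def prefix_paths(root: Path, filenames: List[str]) -> List[Path]:
    # Join the root onto each filename, preserving the given order.
    return [root / name for name in filenames]
```

Because filter_bounds relies only on the comparison operators, it works unchanged for numbers, strings, or any other totally ordered type, which is the genericity the benchmark asks of evolved solutions.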
9 RESULTS
We present solution rates of a Code Building Genetic Programming system on the three novel benchmarks in the following table.

Problem         Solutions   Rate
Days Between    31/31       100%
Filter Bounds   31/31       100%
Prefix Paths    30/31       96.8%

These benchmark problems are not complex, and these high solution rates do not necessarily indicate strong overall search performance of CBGP. Instead, these problems are meant to demonstrate the particular novel capabilities of CBGP described in Section 8. To highlight the successful demonstration of these capabilities, Figure 6 contains generated Python code which was produced using a solution DAG from one evolutionary run and simple string formatting rules that transcribe a program DAG into valid source code. We see usages of functions, methods, constants, constructors, and higher-order functions in the generated source code.

We also tested CBGP on seven problems from the "General Program Synthesis Benchmark Suite" in order to compare performance between CBGP and other contemporary program synthesis methods. Figure 7 shows the solution rates of CBGP, PushGP, and G3P. The solution rates for CBGP are significantly higher (with a P-value of 0.05) than PushGP for Negative To Zero, Median, Smallest, and Vector Average. CBGP performs significantly worse than PushGP on Compare String Lengths and Replace Space With Newline. Code Building GP is not significantly worse than G3P on any of the problems we experimented with, and finds significantly more solutions for Negative To Zero, Median, Number IO, Smallest, and Vector Average.

It should be noted that differences in supported functions/instructions make comparisons between CBGP, PushGP, and G3P problematic. The CBGP implementation created for this research uses a subset of Python's built-in functions and types to create expressions, in line with the design goals of the system. (A listing of the Python functions used in our CBGP prototype can be found in the source code files at https://github.com/erp12/CodeBuildingGeneticProgramming-ProtoType/tree/master/push4/library.) PushGP and G3P use manually curated instruction sets and grammars that are designed not to "cheat" at solving the benchmarks with instructions that are too close to a solution. This practice is useful when comparing methods, but is unrepresentative of real-world applications, for which any augmentation or restriction of the instruction set would be acceptable if it produces results that meet a current need.

In most cases, the improved performance of CBGP can easily be explained by the differences in supported functions. For example, Python has a built-in sum function that will sum a list of numbers. This function is made available to CBGP, but neither PushGP nor G3P chooses to support an equivalent operation. This gives CBGP a significant advantage on the "Vector Average" problem.

Another crucial difference between PushGP, G3P, and CBGP is the configuration of which operations to use for a particular problem. Both PushGP and G3P select a subset of supported operations and data types to include before an evolutionary run. This configuration varies for each benchmark problem. The CBGP experiments presented in this paper use the entire set of core functions and classes that were defined and annotated to be identical to the built-in capabilities of the Python language. This puts CBGP at a disadvantage because its search space has not been narrowed to the relevant operations.

We suggest that future research should test CBGP more rigorously, including some experimentation with a set of expressions that matches what is available in PushGP and G3P for each problem.
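The effect of the sum primitive on "Vector Average" can be made concrete. With sum in the function set, a solution is a trivial composition of primitives; a system without an equivalent instruction must also evolve the accumulation loop. Both snippets below are our own illustrations, not evolved programs:

```python
def vector_average(v):
    # With Python's built-in sum available, as in CBGP's function set,
    # the whole task is one composition of two primitives.
    return sum(v) / len(v)

def vector_average_no_sum(v):
    # Without a sum-like instruction, evolution must also discover
    # this explicit accumulation loop.
    total = 0.0
    for x in v:
        total += x
    return total / len(v)
```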
10 DISCUSSION AND FUTURE WORK
The theoretical capabilities of Code Building Genetic Programming promise a wider range of potential applications than any other program synthesis GP system. The initial indications from our empirical results show that CBGP can realize these applications, at least for simple problems. To our knowledge, CBGP is the first inductive programming method, genetic programming or otherwise, to demonstrate the ability to synthesize programs that utilize arbitrary preexisting data types, and an ability to handle polymorphism.

Furthermore, the ability of CBGP to gather its own set of supported expressions by leveraging technologies like reflection greatly reduces the requirements on external configuration. This makes CBGP a much simpler program synthesis framework, potentially suitable for real-world applications by non-expert practitioners.

With regard to problem-solving capabilities, our initial crude results indicate that CBGP is comparable or superior on some problems, while severely lacking on others. More rigorous experiments on a wider set of benchmarks are required to understand the general advantages and disadvantages of CBGP relative to its contemporaries. This research is ongoing.

The following subsections discuss areas of future research which would improve the overall quality of a Code Building Genetic Programming system.
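The reflection-driven configuration described above can be sketched with Python's standard introspection facilities: every fully annotated function in a module can be harvested as a typed primitive, with no hand-written instruction list. The function below is our own illustrative sketch under that assumption, not the CBGP prototype's actual API:

```python
import inspect
import typing

def harvest_primitives(module):
    """Collect (name, argument types, return type) for every fully
    annotated function in a module, via reflection alone."""
    primitives = []
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        hints = typing.get_type_hints(fn)
        params = list(inspect.signature(fn).parameters)
        # Keep only functions whose arguments and return are all annotated,
        # since the type annotations serve as the synthesis specification.
        if "return" in hints and all(p in hints for p in params):
            arg_types = tuple(hints[p] for p in params)
            primitives.append((name, arg_types, hints["return"]))
    return primitives
```

A synthesis system seeded this way inherits new primitives automatically whenever the host codebase adds an annotated function, which is the configuration-reduction benefit claimed above.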
Computational graphs, like the ones constructed by CBGP, are common representations of executable procedures. One advantage of using computational graphs is the ability to canonicalize and optimize the graphs for improved performance.

Big data frameworks, like Spark, utilize lazy evaluation of execution plans consisting of map-reduce operations to allow for query optimization [20]. The optimizer mutates the DAG into a more efficient computation that has the same behavior. This optimization can also be used to canonicalize an execution plan so that caching
strategies can detect whether an equivalent computation has already been performed.

def days_between(dt1, dt2):
    return abs(sub(dt1, dt2).days())

def prefix_files(root, filenames):
    return map(lambda _0: root.join(Path(_0)), filenames)

def filter_bounds(lst, lower, upper):
    return filter(lambda _0: lt(lt(_0, lower), le(_0, upper)), lst)

def replace_space_with_newline(input1):
    return sub(len(input1), print_tap(input1.replace(" ", "\n", -87)).count("\n"))

def negative_to_zero(input1):
    return map(lambda _0: max(bool2int(not(float2bool(0.5724738469524758))), _0), input1)

Figure 6: A small sample of solution code snippets produced by CBGP on the benchmark problems described in Section 8. The functions that are not built-in Python functions (i.e. bool2int, print_tap) are simple wrapper functions that add the necessary annotations. Notice that CBGP generated the name "_0" for the arguments to lambda functions.

Figure 7: Solution rates of CBGP, PushGP, and G3P on a subset of the General Program Synthesis Benchmark Suite.

Code Building GP could eventually utilize similar optimization algorithms to produce programs that are more resource efficient and easier to convert into source code. The code in Figure 6 has multiple instances of unnecessary function calls. The presented solution to the "Negative to Zero" problem has a 4-node sub-DAG that will always return a zero. Genome simplification was unable to address these issues, but it is likely that a more sophisticated canonicalization process would.

The prototype CBGP implementation uses data-type-based specifications. Types are a relatively weak form of specification because some functions are not defined across the entire argument space. As specification tooling matures, it will be beneficial to implement CBGP such that it utilizes more information than data types. This will allow evolution to avoid most, if not all, run-time errors in its program DAGs, which will reduce the use of penalty errors and smooth out the search space.
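A canonicalization pass of the kind proposed above can be illustrated with a minimal bottom-up constant-folding rewrite over an expression DAG. Applied to the "Negative to Zero" solution in Figure 6, such a pass would collapse the always-zero sub-DAG to a literal 0. The node representation below is our own simplification, not CBGP's internal one:

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass(frozen=True)
class Const:
    value: object            # a literal embedded in the DAG

@dataclass(frozen=True)
class Var:
    name: str                # a program input; never foldable

@dataclass(frozen=True)
class Call:
    fn: Callable             # the operation this node applies
    name: str                # human-readable label
    args: tuple              # child nodes

Node = Union[Const, Var, Call]

def fold_constants(node: Node) -> Node:
    """Bottom-up rewrite: any Call whose arguments all fold to
    constants is evaluated once and replaced by a Const node."""
    if isinstance(node, (Const, Var)):
        return node
    args = tuple(fold_constants(a) for a in node.args)
    if all(isinstance(a, Const) for a in args):
        return Const(node.fn(*(a.value for a in args)))
    return Call(node.fn, node.name, args)

# The always-zero sub-DAG from the "Negative to Zero" solution:
# bool2int(not(float2bool(0.5724...))) evaluates to 0 regardless of input.
float2bool = lambda x: x != 0.0
bool2int = lambda b: int(b)
always_zero = Call(bool2int, "bool2int",
                   (Call(lambda b: not b, "not",
                         (Call(float2bool, "float2bool",
                               (Const(0.5724738469524758),)),)),))
```

Folding always_zero yields Const(0), while a node such as Call(max, "max", (always_zero, Var("_0"))) folds only its constant branch and keeps the input-dependent part intact; a production pass would add further rules (common-subexpression elimination, identity laws) in the same style.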
11 CONCLUSION
In this paper, we present Code Building Genetic Programming and show early demonstrations of its unique capabilities. The benchmarks clearly demonstrate the use of preexisting functions and classes in evolved program DAGs, as well as the transcription of DAGs into valid, type-safe Python code. This would not be possible without the introduction of expression reification and the stack-based compilation process.

Although our comparison to other program synthesis frameworks is flawed, the problem-solving ability of CBGP is promising. Regardless, we recognize the value of first-class specifications and tools like reflection in making program synthesis methods more adaptive to a given use case. We hope these lessons will help propel the field beyond academic benchmarks towards real-world applications.

Finally, we direct future research towards rigorous evaluation of CBGP, DAG canonicalization, richer specification, and attempted collaborations between a CBGP system and a human programmer on a complex, real-world application.
ACKNOWLEDGMENTS
We thank Bill Tozier, Nicholas McPhee, and the members of the Hampshire College Institute for Computational Intelligence for discussions that advanced this work. This material is based upon work supported by the National Science Foundation under Grant No. 1617087. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.
REFERENCES
[1] Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2016. DeepCoder: Learning to Write Programs. CoRR abs/1611.01989 (2016). arXiv:1611.01989 http://arxiv.org/abs/1611.01989
[2] Stefan Forstenlechner, David Fagan, Miguel Nicolau, and Michael O'Neill. 2017. A Grammar Design Pattern for Arbitrary Program Synthesis Problems in Genetic Programming. In Genetic Programming. Springer International Publishing, Cham, 262–277.
[3] Stefan Forstenlechner, David Fagan, Miguel Nicolau, and Michael O'Neill. 2018. Extending Program Synthesis Grammars for Grammar-Guided Genetic Programming. In Parallel Problem Solving from Nature – PPSN XV. Springer International Publishing, Cham, 197–208.
[4] Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. 2016. TerpreT: A Probabilistic Programming Language for Program Induction. CoRR abs/1608.04428 (2016). arXiv:1608.04428 http://arxiv.org/abs/1608.04428
[5] Sumit Gulwani. 2011. Automating String Processing in Spreadsheets using Input-Output Examples. In PoPL'11, January 26-28, 2011, Austin, Texas, USA.
[6] In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17). Association for Computing Machinery, New York, NY, USA, 937–944. https://doi.org/10.1145/3071178.3071330
[7] Thomas Helmuth, Nicholas Freitag McPhee, and Lee Spector. 2018. Program Synthesis Using Uniform Mutation by Addition and Deletion. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '18). ACM, New York, NY, USA, 1127–1134. https://doi.org/10.1145/3205455.3205603
[8] Thomas Helmuth and Lee Spector. 2015. General Program Synthesis Benchmark Suite. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO '15). Association for Computing Machinery, New York, NY, USA, 1039–1046. https://doi.org/10.1145/2739480.2754769
[9] Thomas Helmuth, Lee Spector, Nicholas Freitag McPhee, and Saul Shanabrook. 2017. Linear Genomes for Structured Programs. In Genetic Programming Theory and Practice XIV. Springer.
[10] Rich Hickey. 2016. clojure.spec - Rationale and Overview. (May 2016). https://clojure.org/about/spec
[11] Susumu Katayama. 2008. Efficient Exhaustive Generation of Functional Programs Using Monte-Carlo Search with Iterative Deepening. In PRICAI 2008: Trends in Artificial Intelligence. Springer Berlin Heidelberg, Berlin, Heidelberg, 199–210.
[12] Michael O'Neill and Lee Spector. 2019. Automatic programming: The open issue? Genetic Programming and Evolvable Machines (2019). https://doi.org/10.1007/s10710-019-09364-2
[13] Edward Pantridge, Thomas Helmuth, Nicholas Freitag McPhee, and Lee Spector. 2017. On the Difficulty of Benchmarking Inductive Program Synthesis Methods. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17). Association for Computing Machinery, New York, NY, USA, 1589–1596. https://doi.org/10.1145/3067695.3082533
[14] Edward R. Pantridge and Lee Spector. 2018. Plushi: an embeddable, language agnostic, push interpreter. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO 2018, Kyoto, Japan, July 15-19, 2018. ACM, 1379–1385. https://doi.org/10.1145/3205651.3208296
[15] Riccardo Poli, William B. Langdon, and Nicholas Freitag McPhee. 2008. A field guide to genetic programming. Published via http://lulu.com and freely available at . (With contributions by J. R. Koza).
[16] Conor Ryan, JJ Collins, and Michael O'Neill. 1998. Grammatical evolution: Evolving programs for an arbitrary language. In Genetic Programming.
Presented as part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12).