Learning to Represent Programs with Property Signatures
Published as a conference paper at ICLR 2020
Augustus Odena, Charles Sutton
Google Research
{augustusodena,charlessutton}@google.com

Abstract
We introduce the notion of property signatures, a representation for programs and program specifications meant for consumption by machine learning algorithms. Given a function with input type τ_in and output type τ_out, a property is a function of type (τ_in, τ_out) → Bool that (informally) describes some simple property of the function under consideration. For instance, if τ_in and τ_out are both lists of the same type, one property might ask 'is the input list the same length as the output list?'. If we have a list of such properties, we can evaluate them all for our function to get a list of outputs that we will call the property signature. Crucially, we can 'guess' the property signature for a function given only a set of input/output pairs meant to specify that function. We discuss several potential applications of property signatures and show experimentally that they can be used to improve over a baseline synthesizer so that it emits twice as many programs in less than one-tenth of the time.

1 Introduction
Program synthesis is a longstanding goal of computer science research (Manna & Waldinger, 1971; Waldinger et al., 1969; Summers, 1977; Shaw; Pnueli & Rosner, 1989; Manna & Waldinger, 1975), arguably dating to the 1940s and 50s (Copeland, 2012; Backus et al., 1957). Deep learning methods have shown promise at automatically generating programs from a small set of input-output examples (Balog et al., 2016; Devlin et al., 2017; Ellis et al., 2018b; 2019b). In order to deliver on this promise, we believe it is important to represent programs and specifications in a way that supports learning. Just as computer vision methods benefit from the inductive bias inherent to convolutional neural networks (LeCun et al., 1989), and likewise with LSTMs for natural language and other sequence data (Hochreiter & Schmidhuber, 1997), it stands to reason that ML techniques for computer programs will benefit from architectures with a suitable inductive bias.

We introduce a new representation for programs and their specifications, based on the principle that to represent a program, we can use a set of simpler programs. This leads us to introduce the concept of a property, which is a program that computes a boolean function of the input and output of another program. For example, consider the problem of synthesizing a program from a small set of input-output examples. Perhaps the synthesizer is given a few pairs of lists of integers, and the user hopes that the synthesizer will produce a sorting function. Then useful properties might include functions that check if the input and output lists have the same length, if the input list is a subset of the output, if each element of the output list is less than the element that follows it, and so on.

The outputs of a set of properties can be concatenated into a vector, yielding a representation that we call a property signature. Property signatures can then be used for consumption by machine learning algorithms, essentially serving as the first layer of a neural network.
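To make the sorting example concrete, here is a minimal Python sketch (our own illustration, not code from the paper; the paper's systems use Searcho and Haskell-style syntax, and all names below are ours):

```python
def same_length(inp, out):
    # Property: the input and output lists have the same length.
    return len(inp) == len(out)

def input_subset_of_output(inp, out):
    # Property: every input element appears in the output.
    return all(x in out for x in inp)

def adjacent_elements_ordered(inp, out):
    # Property: each output element is <= the element that follows it.
    return all(a <= b for a, b in zip(out, out[1:]))

PROPERTIES = [same_length, input_subset_of_output, adjacent_elements_ordered]

def property_vector(f, example_inputs):
    """Evaluate every property on every (input, f(input)) pair."""
    return [all(p(x, f(x)) for x in example_inputs) for p in PROPERTIES]
```

For a correct sorting function all three properties hold on any example inputs, while for, say, list reversal the third property generally fails, so the vector already distinguishes the two behaviors.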
In this paper, we demonstrate the utility of property signatures for program synthesis, using them to perform a type of premise selection as in Balog et al. (2016). More broadly, however, we envision that property signatures could be useful across a broad range of problems, including algorithm induction (Devlin et al., 2017), improving code readability (Allamanis et al., 2014), and program analysis (Heo et al., 2019). More specifically, our contributions are:

• We introduce the notion of property signatures, which are a general purpose way of featurizing both programs and program specifications (Section 3).
• We demonstrate how to use property signatures within a machine-learning based synthesizer for a general-purpose programming language. This allows us to automatically learn a useful set of property signatures, rather than choosing them manually (Sections 3.2 and 4).
• We show that a machine learning model can predict the signatures of individual functions given the signature of their composition, and describe several ways this could be used to improve existing synthesizers (Section 5).
• We perform experiments on a new test set of 185 functional programs of varying difficulty, designed to be the sort of algorithmic problems that one would ask on an undergraduate computer science examination. We find that the use of property signatures leads to a dramatic improvement in the performance of the synthesizer, allowing it to synthesize over twice as many programs in less than one-tenth of the time (Section 4). An example of a complex program that was synthesized only by the property signatures method is shown in Listing 1.

For our experiments, we created a specialized programming language, called Searcho (Section 2), based on strongly-typed functional languages such as Standard ML and Haskell.
Searcho is designed so that many similar programs can be executed rapidly, as is needed during a large-scale distributed search during synthesis. We release the programming language, runtime environment, distributed search infrastructure, machine learning models, and training data from our experiments so that they can be used for future research.

Listing 1: a complex synthesized program, fun unique_justseen (body truncated).
2 Programming by Example and the Searcho Language
In Inductive Program Synthesis, we are given a specification of a program and our goal is to synthesize a program meeting that specification. Inductive Synthesis is generally divided into Programming by Example (PBE) and Programming by Demonstration (PBD). This work is focused on PBE. In PBE, we are given a set of input/output pairs such that for each pair, the target program takes the input to the corresponding output. Existing PBE systems include Winston (1970), Menon et al. (2013), and Gulwani (2011). A PBE specification might look like Listing 2:

    io_pairs = [(1, 1), (2, 4), (6, 36), (10, 100)]

Listing 2: An example PBE specification.

A satisfying solution would be the function squaring its input. Arbitrarily many functions satisfy this specification. It is interesting but out of scope to think about ways to ensure that the synthesis procedure recovers the 'best' or 'simplest' program satisfying the specification. (Note, though, that in this work and in prior work, the search procedure used will tend to emit 'shorter' programs first, so there is an Occam's-Razor-type argument (Spade & Panaccio, 2019) to be made that you should get this for free.)

Much (though not all) work on program synthesis is focused on domain specific languages that are less than maximally expressive (Gulwani, 2011; Balog et al., 2016; Wang et al., 2017; Alur et al., 2015). Searcho is heavily based on code written by Niklas Een, which is available at https://github.com/tensorflow/deepmath/tree/master/deepmath/zz/CodeBreeder; our system is available at https://github.com/brain-research/searcho. Searcho code is compiled to bytecode and run on the Searcho Virtual Machine. Code is incrementally compiled, which means that the standard library and specification can be compiled once and then many programs can be pushed on and popped off from the stack in order to check them against the specification.
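Checking a candidate program against a PBE specification like Listing 2 amounts to running it on every example input; a minimal Python sketch (our own, not the paper's Searcho machinery):

```python
# A PBE specification: each pair maps an input to its required output.
io_pairs = [(1, 1), (2, 4), (6, 36), (10, 100)]

def satisfies(program, io_pairs):
    """True iff `program` maps every specified input to its required output."""
    return all(program(x) == y for x, y in io_pairs)
```

Here `satisfies(lambda x: x * x, io_pairs)` holds, but so would infinitely many other programs, which is exactly the ambiguity discussed above.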
Searcho is strongly typed with algebraic datatypes (Pierce & Benjamin, 2002). Searcho includes a library of 86 functions, all of which are supported by our synthesizer. This is a significantly larger language and library than have been used in previous work on neural program synthesis. We have also implemented a baseline enumerative synthesizer. The main experiments in this paper will involve plugging the outputs of a machine learning model into the configuration for our baseline synthesizer to improve its performance on a set of human-constructed PBE tasks.
3 Property Signatures
Consider the PBE specification in Listing 3:

    io_pairs = [
      ([1, 2345, 34567], [1, 2345, 34567, 34567, 2345, 1]),
      ([True, False], [True, False, False, True]),
      (["Batman"], ["Batman", "Batman"]),
      ([[1,2,3], [4,5,6]], [[1,2,3], [4,5,6], [4,5,6], [1,2,3]])
    ]

Listing 3: An example PBE specification.

We can see that the function concatenating the input list to its reverse will satisfy the specification, but how can we teach this to a computer? Following Balog et al. (2016) we take the approach of training a machine learning model to do premise selection for a symbolic search procedure. But how do we get a representation of the specification to feed to the model? In Balog et al. (2016), the model acts only on integers and lists of integers, constrains all integers to lie in a small fixed interval, has special-case handling of lists, and does not deal with polymorphic functions. It would be hard to apply this technique to the above specification, since the first example contains unbounded integers, the second example contains a different type than the first (so any function satisfying the spec will be parametrically polymorphic), and the third and fourth examples contain recursive data structures (lists of characters and lists of integers respectively).

Thankfully, we can instead learn a representation that is composed of the outputs of multiple other programs running on each input/output pair. We will call these other programs properties. Consider the three properties in Listing 4:

    all_inputs_in_outputs ins outs = all (`elem` outs) ins
    outputs_has_dups ins outs = has_duplicates outs
    input_same_len_as_output ins outs = length ins == length outs

Listing 4: Three properties that can act on the specification from Listing 3.

Each of these three programs can be run on all 4 of the input/output pairs to yield a Boolean. The first always returns True for our spec, as does the second. The third always returns False on the given examples, although note that it would return True if the examples had contained the implicit base case of the empty list. Thus, we can write that our spec has the 'property signature' [True, True, False]. (In this paper, we will present illustrative programs in Haskell syntax to make them more broadly readable. Searcho programs will be presented in Searcho syntax, which is similar. Types have been shown to substantially speed up synthesis; see e.g. Figure 6 of Feser et al. (2015).)
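The signature computation just described can be transliterated into Python (our own rendering of Listings 3 and 4; here each property is collapsed to a single Boolean meaning 'held on every example pair', with the three-valued refinement introduced in Section 3.1):

```python
spec = [
    ([1, 2345, 34567], [1, 2345, 34567, 34567, 2345, 1]),
    ([True, False], [True, False, False, True]),
    (["Batman"], ["Batman", "Batman"]),
    ([[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]]),
]

def all_inputs_in_outputs(ins, outs):
    return all(x in outs for x in ins)

def outputs_has_dups(ins, outs):
    # Count-based duplicate check, since elements may be unhashable lists.
    return any(outs.count(x) > 1 for x in outs)

def input_same_len_as_output(ins, outs):
    return len(ins) == len(outs)

props = [all_inputs_in_outputs, outputs_has_dups, input_same_len_as_output]
signature = [all(p(i, o) for i, o in spec) for p in props]
# signature == [True, True, False]
```

Note that nothing here depends on the element type: the same three properties run unchanged on integers, booleans, strings, and nested lists, which is the flexibility the paragraph above calls for.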
3.1 Abstracting Properties into Signatures
Now we describe our representation for a program f :: τ_in → τ_out. Each property is a program p :: (τ_in, τ_out) → Bool that represents a single "feature" of the program's inputs and outputs which might be useful for its representation. In this section, we assume that we have determined a sequence P = [p_1, ..., p_n] of properties that are useful for describing f, and we wish to combine them into a single representation of f. Later, we will describe a learning principle for choosing relevant properties.

We want the property signature to summarize the output of all the properties in P over all valid inputs to f. To do this, we first extend the notion of property to a set of inputs in the natural way. If S is a set of values of type τ_in and p ∈ P, we define p(S) = { p(x, f(x)) | x ∈ S }. Because p(S) is a set of booleans, it can have only three possible values: either p(S) = {True}, or p(S) = {False}, or p(S) = {True, False}, corresponding respectively to the cases where p is always true, always false, or neither. To simplify notation slightly, we define the function Π as Π({True}) = AllTrue, Π({False}) = AllFalse, and Π({True, False}) = Mixed. Finally, we can define the property signature sig(P, f) for a program f and a property sequence P as

    sig(P, f)[i] = Π(p_i(V(τ_in))),

where V(τ_in) is the possibly infinite set of all values of type τ_in.

Computing the property signature for f could be intractable or undecidable, as it might require proving difficult facts about the program. Instead, in practice, we will compute an estimated property signature for a small set of input-output pairs S_io. The estimated property signature summarizes the actions of P on S_io rather than on the full set of inputs V(τ_in). Formally, the estimated property signature is

    ŝig(P, S_io)[i] := Π({ p_i(x_in, x_out) | (x_in, x_out) ∈ S_io }).    (1)

This estimate gives us an under-approximation of the true signature of f in the following sense: if ŝig(P, S)[i] = Mixed, we must also have sig(P, f)[i] = Mixed. If ŝig(P, S)[i] = AllTrue, then either sig(P, f)[i] = AllTrue or sig(P, f)[i] = Mixed, and similarly with AllFalse. Estimated property signatures are particularly useful for synthesis using PBE, because we can compute them from the input-output pairs that specify the synthesis task, without having the definition of f. Thus we can use estimated property signatures to 'featurize' PBE specifications for use in synthesis.

3.2 Learning Useful Properties
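A direct Python rendering of Π and of the estimated signature in Equation (1) (a sketch of ours; the function and value names mirror the text):

```python
def pi(bools):
    """The summarization function Π over a non-empty collection of booleans."""
    seen = set(bools)
    if seen == {True}:
        return "AllTrue"
    if seen == {False}:
        return "AllFalse"
    return "Mixed"

def estimated_signature(properties, io_pairs):
    """Equation (1): apply each property to every example pair, then summarize."""
    return [pi([p(x_in, x_out) for x_in, x_out in io_pairs])
            for p in properties]
```

Because the example pairs are only a sample of all inputs, a Mixed entry here is definitive, while an AllTrue or AllFalse entry may still be Mixed for the full input set, mirroring the under-approximation statement above.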
How do we choose a set of properties that will be useful for synthesis? Given a training set of random programs with random input/output examples, we generate many random properties. We then prune the random properties based on whether they distinguish between any of the programs. Then, given a test suite of programs, we do an additional pruning step: among all properties that give the same value for every element of the test suite, we keep the shortest property, because of Occam's razor considerations. Given these 'useful' properties, we can train a premise selector (Balog et al., 2016) to predict library function usage given properties. Specifically, from the remaining properties we compute estimated signatures for the training specifications, and train a model to predict from those signatures which pool elements the target program uses (the model is described in Section 4.1). (Although we write f as a function, that is, as returning an output, it is easy to handle procedures that do not return a value by defining τ_out to be a special void type.)

3.3 Why are Property Signatures Useful?

Experiments in the next section will establish that property signatures let our baseline synthesizer emit programs it previously could not, but we think that they can have broader utility:

• They allow us to represent more types of functions. Property signatures can automatically deal with unbounded data types, recursive data types, and polymorphic functions.
• They reduce dependency on the distribution from which examples are drawn. If the user of a synthesizer gives example inputs distributed differently than the training data, the 'estimated' properties might not change much. (This argument does rely on properties being somehow simple. For instance, if the property does not compute whether a list contains the value 777, it cannot fail to generalize with respect to the presence or absence of 777. Since we search for properties in a shortest-first fashion, the properties we find should be biased toward simplicity, though certainly this hypothesis merits more experimental validation.)
• They can be used wherever we want to search for functions by semantics. Imagine a search engine where users give a specification, the system guesses a property signature, and this signature guess is used to find all the pre-computed functions with similar semantics.
• Synthesized programs can themselves become new properties. For example, once I learn a program for primality checking, I can use primality checking in my library of properties.
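The two pruning steps described in Section 3.2 might be sketched in Python as follows (our own illustrative reconstruction, not the paper's code; `source_len` stands in for however property length is measured):

```python
def prune_properties(props, program_specs):
    """props: list of (source_len, property_fn) pairs.
    program_specs: one list of (input, output) example pairs per program."""
    def outcome_vector(p):
        # One summarized outcome per program: did p hold on all its examples?
        return tuple(all(p(i, o) for i, o in spec) for spec in program_specs)

    # Step 1: keep only properties that distinguish between some programs.
    discriminating = [(n, p) for n, p in props
                      if len(set(outcome_vector(p))) > 1]

    # Step 2: among properties with identical outcomes across the suite,
    # keep the shortest one (the Occam's razor tie-break).
    best = {}
    for n, p in discriminating:
        key = outcome_vector(p)
        if key not in best or n < best[key][0]:
            best[key] = (n, p)
    return [p for _, p in best.values()]
```

Constant properties fall out in the first step, and redundant restatements of the same semantic check fall out in the second.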
4 Program Synthesis with Property Signatures
We design an experiment to answer the following question: can property signatures help us synthesize programs that we otherwise could not have synthesized? As we will show, the answer is yes!
4.1 Experimental Setup
How Does the Baseline Synthesizer Work?
Our baseline synthesizer is very similar to that in Feser et al. (2015) and works by filling in typed holes. That is, we infer a program type τ_in → τ_out from the specification, and the synthesizer starts with an empty 'hole' of type τ_in → τ_out and then fills it in all possible ways allowed by the type system. Many of these ways of filling-in will yield new holes, which can in turn be filled by the same technique. When a program has no holes, we check if it satisfies the spec. We order the programs to expand by their cost, where the cost is essentially a sum of the costs of the individual operations used in the program.

At the beginning of the procedure, the synthesizer is given a configuration, which is essentially a weighted set of pool elements that it is allowed to use to fill in the holes. A pool element is a rewrite rule that replaces a hole with a type-correct Searcho program, which may itself contain its own, new holes. In our synthesizer, there is one possible pool element for each of the 86 library functions in Searcho, which calls the library function, with correctly-typed holes for each of its arguments. The configuration will specify a small subset of these pool elements to use during search. It is through the configuration that we will use machine learning to inform the search procedure, as we describe later. See Appendix A.1 for further details on this baseline system.
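The cost-ordered, hole-filling search can be illustrated with a tiny Python enumerator (a drastically simplified, untyped stand-in for the real synthesizer; the pool elements, costs, and arithmetic expression language below are all invented for illustration):

```python
import heapq
import itertools

# Each pool element: (name, cost, number of fresh holes it introduces).
POOL = [("x", 1, 0), ("one", 1, 0), ("add", 2, 2), ("mul", 2, 2)]
HOLE = ("?", [])

def evaluate(tree, x):
    op, kids = tree
    if op == "x":
        return x
    if op == "one":
        return 1
    a, b = (evaluate(k, x) for k in kids)
    return a + b if op == "add" else a * b

def has_hole(tree):
    op, kids = tree
    return op == "?" or any(has_hole(k) for k in kids)

def fill_first_hole(tree, rewrite):
    """Return a copy of `tree` with its leftmost hole replaced, else None."""
    op, kids = tree
    if op == "?":
        return rewrite
    for i, k in enumerate(kids):
        new = fill_first_hole(k, rewrite)
        if new is not None:
            return (op, kids[:i] + [new] + kids[i + 1:])
    return None

def synthesize(io_pairs, max_cost=10):
    tie = itertools.count()  # tie-breaker so trees are never compared directly
    frontier = [(0, next(tie), HOLE)]
    while frontier:
        cost, _, tree = heapq.heappop(frontier)
        if not has_hole(tree):
            # Complete program: check it against the specification.
            if all(evaluate(tree, x) == y for x, y in io_pairs):
                return tree
            continue
        # Expand the cheapest partial program with every allowed pool element.
        for name, c, n_holes in POOL:
            if cost + c <= max_cost:
                rewrite = (name, [HOLE] * n_holes)
                heapq.heappush(frontier,
                               (cost + c, next(tie), fill_first_hole(tree, rewrite)))
    return None
```

Run on the squaring specification from Listing 2, this search pops cheap candidates like `x` first, rejects them against the examples, and eventually reaches the satisfying program `mul(x, x)`.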
How is the Training Data Generated? Our test corpus contains programs with 14 different types. For each of those 14 types, we randomly sample configurations and then randomly generate training programs for each configuration, pruning for observational equivalence. We generate up to 10,000 training programs per configuration (an example type is Bool → Bool). We also generate and prune random properties as described in Section 3.2. See Listing 5 for examples of useful properties that were generated.

Listing 5: examples of useful generated properties (listing truncated).
How was the Test Set Constructed?
We've constructed a test set of 185 human-generated programs ranging in complexity from one single line to many nested function calls with recursion. Programs in the test set include computing the GCD of two integers, computing the n-th Fibonacci number, computing the intersection of two sets, and computing the sum of all pairs in two lists. We ensure that none of the test functions appear in the training set. See the open source code for more details on this.

What is the Architecture of the Model?
As mentioned above, we train a neural network to predict the number of times each pool element will appear in the output. This neural network is fully connected, with learned embeddings for each of the values AllTrue, AllFalse and Mixed.

How does the Model Output Inform the Search Procedure?
Since we have a large number of pool elements (86), we can't run the synthesizer with all pool elements if we want to find programs of reasonable length: we would both run out of memory and take too long. Thus, we randomly sample configurations with fewer pool elements. We then send multiple such configurations to a distributed synthesis server that tries them in parallel. When we use the model predictions, we sample pool elements in proportion to the model's predicted number of times that pool element appears. The baseline samples pool elements in proportion to their rate of appearance in the training set.
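The configuration-sampling step can be sketched as weighted sampling without replacement (our own sketch; the paper does not pin down the exact sampling scheme, and the pool-element names below are illustrative):

```python
import random

def sample_configuration(pool_elements, predicted_counts, k, rng):
    """Draw k distinct pool elements, each chosen with probability
    proportional to its (positive) predicted usage count."""
    remaining = dict(zip(pool_elements, predicted_counts))
    chosen = []
    for _ in range(min(k, len(remaining))):
        total = sum(remaining.values())
        r = rng.uniform(0, total)
        acc = 0.0
        for elem, weight in remaining.items():
            acc += weight
            if r <= acc:
                chosen.append(elem)
                del remaining[elem]  # without replacement
                break
    return chosen
```

Swapping the model's predicted counts for the training-set appearance rates recovers the baseline's sampling behavior.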
4.2 Using Property Signatures Lets Us Synthesize New Functions
We ran 3 different runs of our distributed synthesizer for 100,000 seconds with and without the aid of property signatures. The baseline synthesizer solved 28 test programs on average. With property signatures, the synthesizer solved an average of 73 test programs. See Figure 1 for more discussion. Indeed, it can be seen from the figure that not only did the synthesizer solve many more test programs using property signatures, but it did so much faster, synthesizing over twice as many programs in one-tenth of the time as the baseline.
4.3 Comparison with DeepCoder
We have conducted an experiment to compare premise selection using property signatures to the premise selection algorithm from Balog et al. (2016). This required considerable modifications to the experimental procedure.

First, since the premise-selection part of DeepCoder can only handle integers and lists of integers, we restricted the types of our training and test functions. In particular, we read through Balog et al. (2016) and found four function types in use:

    f :: [Int] -> [Int]
    g :: [Int] -> Int
    h :: ([Int], [Int]) -> Int
    k :: ([Int], Int) -> Int

Listing 6: The four function types used in DeepCoder.

Figure 1: Comparison of synthesis with and without property signatures. The x-axis denotes time elapsed in seconds. Roughly speaking, we let the distributed synthesizer run for 1 day. The y-axis represents the cumulative number of programs synthesized. On average, the baseline solved 28 of the test programs, while the baseline enhanced with property signatures solved 73 test programs (around 2.6 times as many programs). Both the baseline and the run with property signatures were run with three different random seeds. Altogether, this experiment provides strong evidence that property signatures can be useful.

The types of f and g in Listing 6 are taken directly from Balog et al. (2016). The types of h and k are inferred from examples given in the appendix of that paper. Their DSL does not technically have tuples, but we have wrapped the inputs of their 'two-input functions' in tuples for convenience.

Second, since DeepCoder can only handle integers in a bounded range, we first re-generated all of our random inputs (used for 'hashing' of generated training data) to lie in that range. We then generated random training functions of the above four types.
We then made a data set of training functions associated with 5 input-output pairs, throwing out pairs where any of the outputs were outside the aforementioned range, and throwing out functions where all outputs contained some number outside that range.

Third, of the examples in our test set with the right types, we modified their input-output pairs in a similar way. We filtered out functions that could not be so modified. After doing so, we were left with a remaining test suite of 32 functions.

Finally, we trained a model to predict functions-to-use from learned embeddings of the input-output pairs, as in DeepCoder. We didn't see a description of how functions with multiple inputs had their inputs embedded, so we elected to separate them with a special character, distinct from the null characters that are used to pad lists.

Compared with the property signatures method, this technique results in far fewer synthesized test-set programs. We did 3 random restarts for each of DeepCoder, property signatures, and the random baseline (recall that the random baseline itself is already a relatively sophisticated synthesis algorithm; it is just the configurations that are random). On average, the DeepCoder runs synthesized substantially fewer test programs than the property signature runs (trained on the same modified training data and tested on the same modified test data), as did the random baseline.

A priori, this seems like a surprisingly large gap, but it actually fits with what we know from the existing literature. Shin et al. (2018) observe something similar: DeepCoder-esque techniques tend to generalize poorly to a test set where the input-output pairs come from a different distribution than they do in training. This is the case in our experiment, and it will be the case in any realistic setting, since the test set will be provided by users.
Property signatures are (according to our experiments) much less sensitive to such shift. This makes intuitive sense: whether an input list is half the length of an output list (for instance) is invariant to the particular distribution of members of the list. Note that even if property signatures did not outperform DeepCoder on this subset of our test set, they would still constitute an improvement, since they allow us to operate on arbitrary programs and input types.

5 Predicting Property Signatures of Function Compositions
Most programs involve composing functions with other functions. Suppose that we are trying to solve a synthesis problem from a set of input/output examples, and during the search we create a partial program of the form f(g(x)) for some unknown g. Since we know f, we know its property signature. Since we have the program specification, we also have the estimated property signature for f ∘ g := f(g(x)). If we could somehow guess the signature for g, we could look it up in a cache of previously computed functions keyed by signature. If we found a function matching the desired signature, we would be done. If no matching function exists in the cache, we could start a smaller search with only the signature of g as the target, then use that result in our original search. We could attempt to encode the relationship between f and g into a set of formal constraints and pass that to a solver of some kind (De Moura & Bjørner, 2008), and while that is potentially an effective approach, it may be difficult to scale to a language like Searcho. Instead, we can simply train a machine learning model to predict the signature of g from the signature of f and the signature of f ∘ g.

Here we present an experiment to establish a proof of concept of this idea. First, we generated a data set of 10,000 random functions taking lists of integers to lists of integers. Then we randomly chose 50,000 pairs of functions from this list, arbitrarily designating one as f and one as g. We then computed the signatures of f, g and f ∘ g for each pair, divided the data into a training set of 45,000 elements and a test set of 5,000 elements, and trained a small fully connected neural network to predict the signature of g from the other two signatures. On the test set, this model had 87.5% accuracy, which is substantially better than chance.
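Generating one training example for such a model can be sketched as follows (a Python illustration with our own toy properties; the actual experiment uses random Searcho functions and the learned property set):

```python
def summarize(bools):
    # Collapse a set of booleans to AllTrue / AllFalse / Mixed.
    s = set(bools)
    return "AllTrue" if s == {True} else "AllFalse" if s == {False} else "Mixed"

def estimated_signature(props, io_pairs):
    return [summarize([p(i, o) for i, o in io_pairs]) for p in props]

def composition_example(f, g, props, inputs):
    """One training example: features (sig(f), sig(f.g)), target sig(g)."""
    sig_f  = estimated_signature(props, [(x, f(x)) for x in inputs])
    sig_g  = estimated_signature(props, [(x, g(x)) for x in inputs])
    sig_fg = estimated_signature(props, [(x, f(g(x))) for x in inputs])
    return (sig_f, sig_fg), sig_g
```

Stacking many such examples gives exactly the supervised data set described above: the network sees the signatures of f and f ∘ g and must recover the signature of the hidden g.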
We inspected the predictions made on the test set and found interesting examples like the one in Listing 7, where the model has learned to do something you might (cautiously) refer to as logical deduction on properties. This result is suggestive of the expressive power of property signatures. It also points toward exciting future directions for research into neurally guided program synthesis.

Listing 7: an example prediction from the test set (listing truncated).

6 Related Work
There is substantial prior work on program synthesis in general. We can hardly do it justice here, but see some of Gottschlich et al. (2018); Solar-Lezama (2018); Gulwani et al. (2017); Allamanis et al. (2018) for more detailed surveys.
Property Based Testing:
Function properties are similar to the properties from Property-Based Testing, a software testing methodology popularized by the QuickCheck library (Claessen & Hughes, 2011) that has now spread to many contexts (Gallant, 2018; Holser, 2018; Hypothesis, 2018; Luu, 2015; Elhage, 2017; MacIver, 2017). QuickCheck properties are human-specified and operate on functions, while our properties operate on input/output pairs.
Automated Theorem Proving:
Synthesizing programs using machine learning is related to the idea of proving theorems using machine learning (Irving et al., 2016). Synthesis and theorem proving are formally related as well (Howard, 1980).
Program Synthesis from a Programming Languages Perspective:
Most existing work on synthesis is from the perspective of programming language design. Our baseline synthesizer borrows many ideas from Feser et al. (2015). Polikarpova et al. (2016) use refinement types (Freeman, 1994) (roughly, a decidable version of dependent types; see Pierce & Benjamin (2002)) to give program specifications, allowing the type-checker to discard many candidate programs. Property signatures can be thought of as a compromise between refinement types and dependent types: we can write down specifications with them that would be impossible to express in refinement types, but we can only check those specifications empirically.
ML-Guided Program Synthesis:
More recently, researchers have used machine learning to synthesize and understand programs. We have mentioned Balog et al. (2016), but see all of: Nye et al. (2019); Ellis et al. (2018a); Zohar & Wolf (2018); Kalyan et al. (2018); Ellis et al. (2019a); Liang et al. (2010); Alon et al. (2019) as well. Menon et al. (2013) introduces the idea of features, a predecessor to the idea of properties. Features differ from properties in that they are hand-crafted rather than learned, and in that they were applied only on a limited string-processing domain.
Deepcoder:
The relationship between this work and Balog et al. (2016) merits special discussion. Aside from the inclusion of property signatures, they differ in the following ways:

• We use a more expressive DSL. Their DSL only allows linear control flow with a small set of functions, whereas our language is Turing complete (it has looping, recursion, etc.). We also have a larger set of allowed component functions: 86 vs. 34.
• Their machine learning method does not work straightforwardly for arbitrary programs. Their training and test programs only deal with integers and lists of integers, while we have 14 different function types. It would thus not be feasible to compare the techniques on anything but a tiny subset of our existing test set.
• The test cases in Balog et al. (2016) are generated from their enumerative synthesizer. It is therefore guaranteed that the synthesizer will be able to emit them in a reasonable amount of time during testing, so their demonstrated improvements are 'merely' speed-ups. Our test cases are human generated, and over half of the programs synthesized using property signatures were not synthesized at all given over a day of time.

7 Conclusion and Future Work
In this work, we have introduced the idea of properties and property signatures. We have shown that property signatures allow us to synthesize programs that a baseline otherwise was not able to synthesize, and have sketched out other potential applications as well. Finally, we have open sourced all of our code, which we hope will accelerate future research into ML-guided program synthesis.
Acknowledgments
We would like to thank Kensen Shi, David Bieber, and the rest of the Program Synthesis Team for helpful discussions. We would like to thank Colin Raffel for reading a draft of the paper. Most of all, we owe a substantial debt to Niklas Een, on whose Evo programming language (https://github.com/tensorflow/deepmath/tree/master/deepmath/zz/CodeBreeder) the Searcho language is heavily based. (Of course, barring bugs in the synthesizer, the test programs not solved within a day would be synthesized eventually.)

References
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. Learning natural coding conventions. In Symposium on the Foundations of Software Engineering (FSE), 2014.
Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR), 51(4):81, 2018.
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages, 3(POPL):40, 2019.
Rajeev Alur, Pavol Černý, and Arjun Radhakrishna. Synthesis through unification. In Daniel Kroening and Corina S. Păsăreanu (eds.), Computer Aided Verification, pp. 163–179, Cham, 2015. Springer International Publishing. ISBN 978-3-319-21668-3. URL http://ecee.colorado.edu/pavol/publications/cav15a/cav15a.pdf.
J. W. Backus, R. J. Beeber, S. Best, R. Goldberg, L. M. Haibt, H. L. Herrick, R. A. Nelson, D. Sayre, P. B. Sheridan, H. Stern, I. Ziller, R. A. Hughes, and R. Nutt. The FORTRAN automatic coding system. In Papers Presented at the February 26-28, 1957, Western Joint Computer Conference: Techniques for Reliability, IRE-AIEE-ACM '57 (Western), pp. 188–198, New York, NY, USA, 1957. ACM. doi: 10.1145/1455567.1455599. URL http://doi.acm.org/10.1145/1455567.1455599.
Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to write programs. arXiv preprint arXiv:1611.01989, 2016.
Koen Claessen and John Hughes. QuickCheck: A lightweight tool for random testing of Haskell programs. ACM SIGPLAN Notices, 46(4):53–64, 2011.
B. J. Copeland. Alan Turing's Electronic Brain: The Struggle to Build the ACE, the World's Fastest Computer. OUP Oxford, 2012. ISBN 9780199609154. URL https://books.google.com/books?id=YhQZnczOS7kC.
Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340. Springer, 2008.
Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdelrahman Mohamed, and Pushmeet Kohli. RobustFill: Neural program learning under noisy I/O. In International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, pp. 990–998, 2017. URL http://proceedings.mlr.press/v70/devlin17a.html.
Nelson Elhage. Property-based testing is fuzzing, 2017. URL https://blog.nelhage.com/post/property-testing-is-fuzzing/.
Kevin Ellis, Lucas Morales, Mathias Sablé-Meyer, Armando Solar-Lezama, and Josh Tenenbaum. Learning libraries of subroutines for neurally-guided Bayesian program induction. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 7805–7815. Curran Associates, Inc., 2018a.
Kevin Ellis, Lucas Morales, Mathias Sablé-Meyer, Armando Solar-Lezama, and Joshua B. Tenenbaum. Library learning for neurally-guided Bayesian program induction. In NeurIPS, 2018b.
Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a REPL. arXiv preprint arXiv:1906.04604, 2019a.
Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a REPL. In NeurIPS, 2019b.
John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data structure transformations from input-output examples. In ACM SIGPLAN Notices, volume 50, pp. 229–239. ACM, 2015.
Tim Freeman. Refinement types for ML. Technical report, Carnegie Mellon University, Pittsburgh, PA, Dept. of Computer Science, 1994.
Andrew Gallant. QuickCheck for Rust, 2018. URL https://github.com/BurntSushi/quickcheck.
Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Carbin, Martin Rinard, Regina Barzilay, Saman Amarasinghe, Joshua B. Tenenbaum, and Tim Mattson. The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pp. 69–80. ACM, 2018.
Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '11, pp. 317–330, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0490-0. doi: 10.1145/1926385.1926423. URL http://doi.acm.org/10.1145/1926385.1926423.
Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, et al. Program synthesis. Foundations and Trends® in Programming Languages, 4(1-2):1–119, 2017.
Kihong Heo, Mukund Raghothaman, Xujie Si, and Mayur Naik. Continuously reasoning about programs using differential Bayesian inference. In Programming Language Design and Implementation (PLDI), 2019.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
Paul Holser. junit-quickcheck, 2018. URL https://github.com/pholser/junit-quickcheck/.
William A. Howard. The formulae-as-types notion of construction. To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, 44:479–490, 1980.
Hypothesis. Hypothesis, 2018. URL https://github.com/HypothesisWorks/hypothesis.
Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas Eén, François Chollet, and Josef Urban. DeepMath - deep sequence models for premise selection. In Advances in Neural Information Processing Systems, pp. 2235–2243, 2016.
Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, and Sumit Gulwani. Neural-guided deductive search for real-time program synthesis from examples. arXiv preprint arXiv:1804.01186, 2018.
Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
Percy Liang, Michael I. Jordan, and Dan Klein. Learning programs: A hierarchical Bayesian approach. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 639–646, 2010.
Dan Luu. AFL + QuickCheck = ?, 2015. URL https://danluu.com/testing/.
David R. MacIver. What is property based testing, 2017. URL https://hypothesis.works/articles/what-is-property-based-testing/.
Zohar Manna and Richard Waldinger. Knowledge and reasoning in program synthesis. Artificial Intelligence, 6(2):175–208, 1975.
Zohar Manna and Richard J. Waldinger. Toward automatic program synthesis. Communications of the ACM, 14(3):151–165, 1971.
Aditya Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, and Adam Kalai. A machine learning framework for programming by example. In International Conference on Machine Learning, pp. 187–195, 2013.
Maxwell I. Nye, Luke B. Hewitt, Joshua B. Tenenbaum, and Armando Solar-Lezama. Learning to infer program sketches. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pp. 4861–4870. PMLR, 2019. URL http://proceedings.mlr.press/v97/nye19a.html.
Benjamin C. Pierce. Types and Programming Languages. MIT Press, 2002.
Amir Pnueli and Roni Rosner. On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 179–190. ACM, 1989.
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. Program synthesis from polymorphic refinement types. In ACM SIGPLAN Notices, volume 51, pp. 522–538. ACM, 2016.
D. Shaw. Inferring lisp programs from examples.
Richard Shin, Neel Kant, Kavi Gupta, Chris Bender, Brandon Trabucco, Rishabh Singh, and Dawn Song. Synthetic datasets for neural program synthesis. 2018.
Armando Solar-Lezama. Introduction to program synthesis. https://people.csail.mit.edu/asolar/SynthesisCourse/TOC.htma, 2018. Accessed: 2018-09-17.
Paul Vincent Spade and Claude Panaccio. William of Ockham. In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Spring 2019 edition, 2019.
Phillip D. Summers. A methodology for LISP program construction from examples. Journal of the ACM (JACM), 24(1):161–175, 1977.
R. J. Waldinger, R. C. T. Lee, and SRI International. PROW: A Step Toward Automatic Program Writing. SRI International, 1969. URL https://books.google.com/books?id=3BITSQAACAAJ.
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 452–466, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4988-8. doi: 10.1145/3062341.3062365. URL http://doi.acm.org/10.1145/3062341.3062365.
Patrick H. Winston. Learning structural descriptions from examples. Technical report, Cambridge, MA, USA, 1970.
Amit Zohar and Lior Wolf. Automatic program synthesis of long programs with a learned garbage collector. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 2094–2103. Curran Associates, Inc., 2018.
Data: A PBE spec and a synthesizer configuration
Result: A program satisfying the specification (hopefully!)

Queue.push(hole :: τ_in → τ_out);
while Queue is not empty do
    partial_program ← GetLowestCostPartial(Queue);
    if HasHoles(partial_program) then
        ExpandOneHole(partial_program);
    else
        TestAgainstSpec(partial_program);
    end
end

Figure 2: The top-down synthesizer that we use as a baseline in this work. In a loop, until a satisfying program is found or we run out of time, we pop the lowest-cost partial program from the queue of all partial programs, then fill in its holes in all ways allowed by the type system, pushing each new partial program back onto the queue. If there are no holes to fill, the program is complete, and we check it against the spec. The cost of a partial program is the sum of the costs of its pool elements, plus a lower bound on the cost of filling each of its typed holes, plus the sum of the costs of a few special operations such as tuple construction and lambda abstraction.
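The loop in Figure 2 can be illustrated with a minimal Python sketch. This is not the paper's Searcho implementation: it substitutes a toy single-typed grammar of unary integer operations for type-directed hole expansion, uses a uniform cost of 1 per node (with 1 as the lower bound per hole), and all names here (`expand_one_hole`, `synthesize`, the `OPS` pool) are illustrative, not from the released system.

```python
import heapq
import itertools

# Toy expression grammar over a single int input `x`:
#   E ::= x | inc(E) | double(E) | neg(E)
# A partial program is a nested tuple; HOLE marks an unfilled (int-typed) hole.
HOLE = "<hole>"
OPS = {"inc": lambda v: v + 1, "double": lambda v: v * 2, "neg": lambda v: -v}

def expand_one_hole(prog):
    """Fill the leftmost hole in every way the (single-typed) grammar allows."""
    if prog == HOLE:
        yield "x"                      # terminal: the input variable
        for op in OPS:
            yield (op, HOLE)           # op applied to a fresh hole
        return
    if isinstance(prog, tuple):
        op, child = prog
        for new_child in expand_one_hole(child):
            yield (op, new_child)

def has_holes(prog):
    return prog == HOLE or (isinstance(prog, tuple) and has_holes(prog[1]))

def cost(prog):
    # Each node costs 1; a hole contributes a lower bound of 1 (cheapest filler: x).
    return 1 if not isinstance(prog, tuple) else 1 + cost(prog[1])

def evaluate(prog, x):
    if prog == "x":
        return x
    op, child = prog
    return OPS[op](evaluate(child, x))

def synthesize(examples, max_steps=10000):
    """Best-first top-down search: pop the cheapest partial program, then
    either expand one of its holes or test it against the I/O examples."""
    counter = itertools.count()  # tie-breaker so heapq never compares trees
    queue = [(cost(HOLE), next(counter), HOLE)]
    for _ in range(max_steps):
        if not queue:
            return None
        _, _, prog = heapq.heappop(queue)
        if has_holes(prog):
            for filled in expand_one_hole(prog):
                heapq.heappush(queue, (cost(filled), next(counter), filled))
        elif all(evaluate(prog, xi) == yi for xi, yi in examples):
            return prog
    return None
```

For example, the spec `[(0, 1), (1, 3), (2, 5)]` (i.e. f(x) = 2x + 1) is solved by the cost-3 program `("inc", ("double", "x"))`; because the search is best-first over cost, no cheaper satisfying program is skipped.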
A APPENDIX

A.1 FURTHER DETAILS ON THE BASELINE SYNTHESIZER