Learning to Represent Programs with Property Signatures
Published as a conference paper at ICLR 2020
Augustus Odena, Charles Sutton
Google Research
{augustusodena,charlessutton}@google.com

Abstract
We introduce the notion of property signatures, a representation for programs and program specifications meant for consumption by machine learning algorithms. Given a function with input type τ_in and output type τ_out, a property is a function of type (τ_in, τ_out) → Bool that (informally) describes some simple property of the function under consideration. For instance, if τ_in and τ_out are both lists of the same type, one property might ask 'is the input list the same length as the output list?'. If we have a list of such properties, we can evaluate them all for our function to get a list of outputs that we will call the property signature. Crucially, we can 'guess' the property signature for a function given only a set of input/output pairs meant to specify that function. We discuss several potential applications of property signatures and show experimentally that they can be used to improve over a baseline synthesizer so that it emits twice as many programs in less than one-tenth of the time.

1 Introduction
Program synthesis is a longstanding goal of computer science research (Manna & Waldinger, 1971; Waldinger et al., 1969; Summers, 1977; Shaw; Pnueli & Rosner, 1989; Manna & Waldinger, 1975), arguably dating to the 1940s and 50s (Copeland, 2012; Backus et al., 1957). Deep learning methods have shown promise at automatically generating programs from a small set of input-output examples (Balog et al., 2016; Devlin et al., 2017; Ellis et al., 2018b; 2019b). In order to deliver on this promise, we believe it is important to represent programs and specifications in a way that supports learning. Just as computer vision methods benefit from the inductive bias inherent to convolutional neural networks (LeCun et al., 1989), and likewise with LSTMs for natural language and other sequence data (Hochreiter & Schmidhuber, 1997), it stands to reason that ML techniques for computer programs will benefit from architectures with a suitable inductive bias.

We introduce a new representation for programs and their specifications, based on the principle that to represent a program, we can use a set of simpler programs. This leads us to introduce the concept of a property, which is a program that computes a boolean function of the input and output of another program. For example, consider the problem of synthesizing a program from a small set of input-output examples. Perhaps the synthesizer is given a few pairs of lists of integers, and the user hopes that the synthesizer will produce a sorting function. Then useful properties might include functions that check if the input and output lists have the same length, if the input list is a subset of the output, if each element of the output list is less than the element that follows it, and so on.

The outputs of a set of properties can be concatenated into a vector, yielding a representation that we call a property signature. Property signatures can then be used for consumption by machine learning algorithms, essentially serving as the first layer of a neural network.
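To make the sorting example concrete, here is a minimal Python sketch (our own illustration, not code from the paper; the paper's systems use Searcho and Haskell-style syntax, and all names below are ours):

```python
def same_length(inp, out):
    # Property: the input and output lists have the same length.
    return len(inp) == len(out)

def input_subset_of_output(inp, out):
    # Property: every input element appears in the output.
    return all(x in out for x in inp)

def adjacent_elements_ordered(inp, out):
    # Property: each output element is <= the element that follows it.
    return all(a <= b for a, b in zip(out, out[1:]))

PROPERTIES = [same_length, input_subset_of_output, adjacent_elements_ordered]

def property_vector(f, example_inputs):
    """Evaluate every property on every (input, f(input)) pair."""
    return [all(p(x, f(x)) for x in example_inputs) for p in PROPERTIES]
```

For a correct sorting function all three properties hold on any example inputs, while for, say, list reversal the third property generally fails, so the vector already distinguishes the two behaviors.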
In this paper, we demonstrate the utility of property signatures for program synthesis, using them to perform a type of premise selection as in Balog et al. (2016). More broadly, however, we envision that property signatures could be useful across a broad range of problems, including algorithm induction (Devlin et al., 2017), improving code readability (Allamanis et al., 2014), and program analysis (Heo et al., 2019). More specifically, our contributions are:

• We introduce the notion of property signatures, which are a general purpose way of featurizing both programs and program specifications (Section 3).
• We demonstrate how to use property signatures within a machine-learning based synthesizer for a general-purpose programming language. This allows us to automatically learn a useful set of property signatures, rather than choosing them manually (Sections 3.2 and 4).
• We show that a machine learning model can predict the signatures of individual functions given the signature of their composition, and describe several ways this could be used to improve existing synthesizers (Section 5).
• We perform experiments on a new test set of 185 functional programs of varying difficulty, designed to be the sort of algorithmic problems that one would ask on an undergraduate computer science examination. We find that the use of property signatures leads to a dramatic improvement in the performance of the synthesizer, allowing it to synthesize over twice as many programs in less than one-tenth of the time (Section 4). An example of a complex program that was synthesized only by the property signatures method is shown in Listing 1.

For our experiments, we created a specialized programming language, called Searcho (Section 2), based on strongly-typed functional languages such as Standard ML and Haskell.
Searcho is designed so that many similar programs can be executed rapidly, as is needed during a large-scale distributed search during synthesis. We release the programming language, runtime environment, distributed search infrastructure, machine learning models, and training data from our experiments so that they can be used for future research.

Listing 1: a complex synthesized program, fun unique_justseen (body truncated).
2 Programming by Example and the Searcho Language
In Inductive Program Synthesis, we are given a specification of a program and our goal is to synthesize a program meeting that specification. Inductive Synthesis is generally divided into Programming by Example (PBE) and Programming by Demonstration (PBD). This work is focused on PBE. In PBE, we are given a set of input/output pairs such that for each pair, the target program takes the input to the corresponding output. Existing PBE systems include Winston (1970), Menon et al. (2013), and Gulwani (2011). A PBE specification might look like Listing 2:

    io_pairs = [(1, 1), (2, 4), (6, 36), (10, 100)]

Listing 2: An example PBE specification.

A satisfying solution would be the function squaring its input. Arbitrarily many functions satisfy this specification. It is interesting but out of scope to think about ways to ensure that the synthesis procedure recovers the 'best' or 'simplest' program satisfying the specification. (Note, though, that in this work and in prior work, the search procedure used will tend to emit 'shorter' programs first, so there is an Occam's-Razor-type argument (Spade & Panaccio, 2019) to be made that you should get this for free.)

Much (though not all) work on program synthesis is focused on domain specific languages that are less than maximally expressive (Gulwani, 2011; Balog et al., 2016; Wang et al., 2017; Alur et al., 2015). Searcho is heavily based on code written by Niklas Een, which is available at https://github.com/tensorflow/deepmath/tree/master/deepmath/zz/CodeBreeder; our system is available at https://github.com/brain-research/searcho. Searcho code is compiled to bytecode and run on the Searcho Virtual Machine. Code is incrementally compiled, which means that the standard library and specification can be compiled once and then many programs can be pushed on and popped off from the stack in order to check them against the specification.
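Checking a candidate program against a PBE specification like Listing 2 amounts to running it on every example input; a minimal Python sketch (our own, not the paper's Searcho machinery):

```python
# A PBE specification: each pair maps an input to its required output.
io_pairs = [(1, 1), (2, 4), (6, 36), (10, 100)]

def satisfies(program, io_pairs):
    """True iff `program` maps every specified input to its required output."""
    return all(program(x) == y for x, y in io_pairs)
```

Here `satisfies(lambda x: x * x, io_pairs)` holds, but so would infinitely many other programs, which is exactly the ambiguity discussed above.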
Searcho is strongly typed with algebraic datatypes (Pierce & Benjamin, 2002). Searcho includes a library of 86 functions, all of which are supported by our synthesizer. This is a significantly larger language and library than have been used in previous work on neural program synthesis. We have also implemented a baseline enumerative synthesizer. The main experiments in this paper will involve plugging the outputs of a machine learning model into the configuration for our baseline synthesizer to improve its performance on a set of human-constructed PBE tasks.
3 Property Signatures
Consider the PBE specification in Listing 3:

    io_pairs = [
      ([1, 2345, 34567], [1, 2345, 34567, 34567, 2345, 1]),
      ([True, False], [True, False, False, True]),
      (["Batman"], ["Batman", "Batman"]),
      ([[1,2,3], [4,5,6]], [[1,2,3], [4,5,6], [4,5,6], [1,2,3]])
    ]

Listing 3: An example PBE specification.

We can see that the function concatenating the input list to its reverse will satisfy the specification, but how can we teach this to a computer? Following Balog et al. (2016) we take the approach of training a machine learning model to do premise selection for a symbolic search procedure. But how do we get a representation of the specification to feed to the model? In Balog et al. (2016), the model acts only on integers and lists of integers, constrains all integers to lie in a small fixed interval, has special-case handling of lists, and does not deal with polymorphic functions. It would be hard to apply this technique to the above specification, since the first example contains unbounded integers, the second example contains a different type than the first (so any function satisfying the spec will be parametrically polymorphic), and the third and fourth examples contain recursive data structures (lists of characters and lists of integers respectively).

Thankfully, we can instead learn a representation that is composed of the outputs of multiple other programs running on each input/output pair. We will call these other programs properties. Consider the three properties in Listing 4:

    all_inputs_in_outputs ins outs = all (`elem` outs) ins
    outputs_has_dups ins outs = has_duplicates outs
    input_same_len_as_output ins outs = length ins == length outs

Listing 4: Three properties that can act on the specification from Listing 3.

Each of these three programs can be run on all 4 of the input/output pairs to yield a Boolean. The first always returns True for our spec, as does the second. The third always returns False on the given examples, although note that it would return True if the examples had contained the implicit base case of the empty list. Thus, we can write that our spec has the 'property signature' [True, True, False]. (In this paper, we will present illustrative programs in Haskell syntax to make them more broadly readable. Searcho programs will be presented in Searcho syntax, which is similar. Types have been shown to substantially speed up synthesis; see e.g. Figure 6 of Feser et al. (2015).)
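The signature computation just described can be transliterated into Python (our own rendering of Listings 3 and 4; here each property is collapsed to a single Boolean meaning 'held on every example pair', with the three-valued refinement introduced in Section 3.1):

```python
spec = [
    ([1, 2345, 34567], [1, 2345, 34567, 34567, 2345, 1]),
    ([True, False], [True, False, False, True]),
    (["Batman"], ["Batman", "Batman"]),
    ([[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]]),
]

def all_inputs_in_outputs(ins, outs):
    return all(x in outs for x in ins)

def outputs_has_dups(ins, outs):
    # Count-based duplicate check, since elements may be unhashable lists.
    return any(outs.count(x) > 1 for x in outs)

def input_same_len_as_output(ins, outs):
    return len(ins) == len(outs)

props = [all_inputs_in_outputs, outputs_has_dups, input_same_len_as_output]
signature = [all(p(i, o) for i, o in spec) for p in props]
# signature == [True, True, False]
```

Note that nothing here depends on the element type: the same three properties run unchanged on integers, booleans, strings, and nested lists, which is the flexibility the paragraph above calls for.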
3.1 Abstracting Properties into Signatures
Now we describe our representation for a program f :: τ_in → τ_out. Each property is a program p :: (τ_in, τ_out) → Bool that represents a single "feature" of the program's inputs and outputs which might be useful for its representation. In this section, we assume that we have determined a sequence P = [p_1, ..., p_n] of properties that are useful for describing f, and we wish to combine them into a single representation of f. Later, we will describe a learning principle for choosing relevant properties.

We want the property signature to summarize the output of all the properties in P over all valid inputs to f. To do this, we first extend the notion of property to a set of inputs in the natural way. If S is a set of values of type τ_in and p ∈ P, we define p(S) = { p(x, f(x)) | x ∈ S }. Because p(S) is a set of booleans, it can have only three possible values: either p(S) = {True}, or p(S) = {False}, or p(S) = {True, False}, corresponding respectively to the cases where p is always true, always false, or neither. To simplify notation slightly, we define the function Π as Π({True}) = AllTrue, Π({False}) = AllFalse, and Π({True, False}) = Mixed. Finally, we can define the property signature sig(P, f) for a program f and a property sequence P as

    sig(P, f)[i] = Π(p_i(V(τ_in))),

where V(τ_in) is the possibly infinite set of all values of type τ_in.

Computing the property signature for f could be intractable or undecidable, as it might require proving difficult facts about the program. Instead, in practice, we will compute an estimated property signature for a small set of input-output pairs S_io. The estimated property signature summarizes the actions of P on S_io rather than on the full set of inputs V(τ_in). Formally, the estimated property signature is

    ŝig(P, S_io)[i] := Π({ p_i(x_in, x_out) | (x_in, x_out) ∈ S_io }).    (1)

This estimate gives us an under-approximation of the true signature of f in the following sense: if ŝig(P, S)[i] = Mixed, we must also have sig(P, f)[i] = Mixed. If ŝig(P, S)[i] = AllTrue, then either sig(P, f)[i] = AllTrue or sig(P, f)[i] = Mixed, and similarly with AllFalse. Estimated property signatures are particularly useful for synthesis using PBE, because we can compute them from the input-output pairs that specify the synthesis task, without having the definition of f. Thus we can use estimated property signatures to 'featurize' PBE specifications for use in synthesis.

3.2 Learning Useful Properties
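A direct Python rendering of Π and of the estimated signature in Equation (1) (a sketch of ours; the function and value names mirror the text):

```python
def pi(bools):
    """The summarization function Π over a non-empty collection of booleans."""
    seen = set(bools)
    if seen == {True}:
        return "AllTrue"
    if seen == {False}:
        return "AllFalse"
    return "Mixed"

def estimated_signature(properties, io_pairs):
    """Equation (1): apply each property to every example pair, then summarize."""
    return [pi([p(x_in, x_out) for x_in, x_out in io_pairs])
            for p in properties]
```

Because the example pairs are only a sample of all inputs, a Mixed entry here is definitive, while an AllTrue or AllFalse entry may still be Mixed for the full input set, mirroring the under-approximation statement above.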
How do we choose a set of properties that will be useful for synthesis? Given a training set of random programs with random input/output examples, we generate many random properties. We then prune the random properties based on whether they distinguish between any of the programs. Then, given a test suite of programs, we do an additional pruning step: among all properties that give the same value for every element of the test suite, we keep the shortest property, because of Occam's razor considerations. Given these 'useful' properties, we can train a premise selector (Balog et al., 2016) to predict library function usage given properties. Specifically, from the remaining properties we compute estimated signatures for the training specifications, and train a model to predict from those signatures which pool elements the target program uses (the model is described in Section 4.1). (Although we write f as a function, that is, as returning an output, it is easy to handle procedures that do not return a value by defining τ_out to be a special void type.)

3.3 Why are Property Signatures Useful?

Experiments in the next section will establish that property signatures let our baseline synthesizer emit programs it previously could not, but we think that they can have broader utility:

• They allow us to represent more types of functions. Property signatures can automatically deal with unbounded data types, recursive data types, and polymorphic functions.
• They reduce dependency on the distribution from which examples are drawn. If the user of a synthesizer gives example inputs distributed differently than the training data, the 'estimated' properties might not change much. (This argument does rely on properties being somehow simple. For instance, if the property does not compute whether a list contains the value 777, it cannot fail to generalize with respect to the presence or absence of 777. Since we search for properties in a shortest-first fashion, the properties we find should be biased toward simplicity, though certainly this hypothesis merits more experimental validation.)
• They can be used wherever we want to search for functions by semantics. Imagine a search engine where users give a specification, the system guesses a property signature, and this signature guess is used to find all the pre-computed functions with similar semantics.
• Synthesized programs can themselves become new properties. For example, once I learn a program for primality checking, I can use primality checking in my library of properties.
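The two pruning steps described in Section 3.2 might be sketched in Python as follows (our own illustrative reconstruction, not the paper's code; `source_len` stands in for however property length is measured):

```python
def prune_properties(props, program_specs):
    """props: list of (source_len, property_fn) pairs.
    program_specs: one list of (input, output) example pairs per program."""
    def outcome_vector(p):
        # One summarized outcome per program: did p hold on all its examples?
        return tuple(all(p(i, o) for i, o in spec) for spec in program_specs)

    # Step 1: keep only properties that distinguish between some programs.
    discriminating = [(n, p) for n, p in props
                      if len(set(outcome_vector(p))) > 1]

    # Step 2: among properties with identical outcomes across the suite,
    # keep the shortest one (the Occam's razor tie-break).
    best = {}
    for n, p in discriminating:
        key = outcome_vector(p)
        if key not in best or n < best[key][0]:
            best[key] = (n, p)
    return [p for _, p in best.values()]
```

Constant properties fall out in the first step, and redundant restatements of the same semantic check fall out in the second.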
4 Program Synthesis with Property Signatures
We design an experiment to answer the following question: can property signatures help us synthesize programs that we otherwise could not have synthesized? As we will show, the answer is yes!
4.1 Experimental Setup
How Does the Baseline Synthesizer Work?
Our baseline synthesizer is very similar to that in Feser et al. (2015) and works by filling in typed holes. That is, we infer a program type τ_in → τ_out from the specification, and the synthesizer starts with an empty 'hole' of type τ_in → τ_out and then fills it in all possible ways allowed by the type system. Many of these ways of filling-in will yield new holes, which can in turn be filled by the same technique. When a program has no holes, we check if it satisfies the spec. We order the programs to expand by their cost, where the cost is essentially a sum of the costs of the individual operations used in the program.

At the beginning of the procedure, the synthesizer is given a configuration, which is essentially a weighted set of pool elements that it is allowed to use to fill in the holes. A pool element is a rewrite rule that replaces a hole with a type-correct Searcho program, which may itself contain its own, new holes. In our synthesizer, there is one possible pool element for each of the 86 library functions in Searcho, which calls the library function, with correctly-typed holes for each of its arguments. The configuration will specify a small subset of these pool elements to use during search. It is through the configuration that we will use machine learning to inform the search procedure, as we describe later. See Appendix A.1 for further details on this baseline system.
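The cost-ordered, hole-filling search can be illustrated with a tiny Python enumerator (a drastically simplified, untyped stand-in for the real synthesizer; the pool elements, costs, and arithmetic expression language below are all invented for illustration):

```python
import heapq
import itertools

# Each pool element: (name, cost, number of fresh holes it introduces).
POOL = [("x", 1, 0), ("one", 1, 0), ("add", 2, 2), ("mul", 2, 2)]
HOLE = ("?", [])

def evaluate(tree, x):
    op, kids = tree
    if op == "x":
        return x
    if op == "one":
        return 1
    a, b = (evaluate(k, x) for k in kids)
    return a + b if op == "add" else a * b

def has_hole(tree):
    op, kids = tree
    return op == "?" or any(has_hole(k) for k in kids)

def fill_first_hole(tree, rewrite):
    """Return a copy of `tree` with its leftmost hole replaced, else None."""
    op, kids = tree
    if op == "?":
        return rewrite
    for i, k in enumerate(kids):
        new = fill_first_hole(k, rewrite)
        if new is not None:
            return (op, kids[:i] + [new] + kids[i + 1:])
    return None

def synthesize(io_pairs, max_cost=10):
    tie = itertools.count()  # tie-breaker so trees are never compared directly
    frontier = [(0, next(tie), HOLE)]
    while frontier:
        cost, _, tree = heapq.heappop(frontier)
        if not has_hole(tree):
            # Complete program: check it against the specification.
            if all(evaluate(tree, x) == y for x, y in io_pairs):
                return tree
            continue
        # Expand the cheapest partial program with every allowed pool element.
        for name, c, n_holes in POOL:
            if cost + c <= max_cost:
                rewrite = (name, [HOLE] * n_holes)
                heapq.heappush(frontier,
                               (cost + c, next(tie), fill_first_hole(tree, rewrite)))
    return None
```

Run on the squaring specification from Listing 2, this search pops cheap candidates like `x` first, rejects them against the examples, and eventually reaches the satisfying program `mul(x, x)`.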
How is the Training Data Generated? Our test corpus contains programs with 14 different types. For each of those 14 types, we randomly sample configurations and then randomly generate training programs for each configuration, pruning for observational equivalence. We generate up to 10,000 training programs per configuration (an example type is Bool → Bool). We also generate and prune random properties as described in Section 3.2. See Listing 5 for examples of useful properties that were generated.

Listing 5: examples of useful generated properties (listing truncated).
How was the Test Set Constructed?
We've constructed a test set of 185 human-generated programs ranging in complexity from one single line to many nested function calls with recursion. Programs in the test set include computing the GCD of two integers, computing the n-th Fibonacci number, computing the intersection of two sets, and computing the sum of all pairs in two lists. We ensure that none of the test functions appear in the training set. See the open source code for more details on this.

What is the Architecture of the Model?
As mentioned above, we train a neural network to predict the number of times each pool element will appear in the output. This neural network is fully connected, with learned embeddings for each of the values AllTrue, AllFalse and Mixed.

How does the Model Output Inform the Search Procedure?
Since we have a large number of pool elements (86), we can't run the synthesizer with all pool elements if we want to find programs of reasonable length: we would both run out of memory and take too long. Thus, we randomly sample configurations with fewer pool elements. We then send multiple such configurations to a distributed synthesis server that tries them in parallel. When we use the model predictions, we sample pool elements in proportion to the model's predicted number of times that pool element appears. The baseline samples pool elements in proportion to their rate of appearance in the training set.
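The configuration-sampling step can be sketched as weighted sampling without replacement (our own sketch; the paper does not pin down the exact sampling scheme, and the pool-element names below are illustrative):

```python
import random

def sample_configuration(pool_elements, predicted_counts, k, rng):
    """Draw k distinct pool elements, each chosen with probability
    proportional to its (positive) predicted usage count."""
    remaining = dict(zip(pool_elements, predicted_counts))
    chosen = []
    for _ in range(min(k, len(remaining))):
        total = sum(remaining.values())
        r = rng.uniform(0, total)
        acc = 0.0
        for elem, weight in remaining.items():
            acc += weight
            if r <= acc:
                chosen.append(elem)
                del remaining[elem]  # without replacement
                break
    return chosen
```

Swapping the model's predicted counts for the training-set appearance rates recovers the baseline's sampling behavior.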
4.2 Using Property Signatures Lets Us Synthesize New Functions
We ran 3 different runs of our distributed synthesizer for 100,000 seconds with and without the aid of property signatures. The baseline synthesizer solved 28 test programs on average. With property signatures, the synthesizer solved an average of 73 test programs. See Figure 1 for more discussion. Indeed, it can be seen from the figure that not only did the synthesizer solve many more test programs using property signatures, but it did so much faster, synthesizing over twice as many programs in one-tenth of the time as the baseline.
4.3 Comparison with DeepCoder
We have conducted an experiment to compare premise selection using property signatures to the premise selection algorithm from Balog et al. (2016). This required considerable modifications to the experimental procedure.

First, since the premise-selection part of DeepCoder can only handle integers and lists of integers, we restricted the types of our training and test functions. In particular, we read through Balog et al. (2016) and found four function types in use:

    f :: [Int] -> [Int]
    g :: [Int] -> Int
    h :: ([Int], [Int]) -> Int
    k :: ([Int], Int) -> Int

Listing 6: The four function types used in DeepCoder.

Figure 1: Comparison of synthesis with and without property signatures. The x-axis denotes time elapsed in seconds. Roughly speaking, we let the distributed synthesizer run for 1 day. The y-axis represents the cumulative number of programs synthesized. On average, the baseline solved 28 of the test programs, while the baseline enhanced with property signatures solved 73 test programs (around 2.6 times as many programs). Both the baseline and the run with property signatures were run with three different random seeds. Altogether, this experiment provides strong evidence that property signatures can be useful.

The types of f and g in Listing 6 are taken directly from Balog et al. (2016). The types of h and k are inferred from examples given in the appendix of that paper. Their DSL does not technically have tuples, but we have wrapped the inputs of their 'two-input functions' in tuples for convenience.

Second, since DeepCoder can only handle integers in a bounded range, we first re-generated all of our random inputs (used for 'hashing' of generated training data) to lie in that range. We then generated random training functions of the above four types.
We then made a data set of training functions associated with 5 input-output pairs, throwing out pairs where any of the outputs were outside the aforementioned range, and throwing out functions where all outputs contained some number outside that range.

Third, of the examples in our test set with the right types, we modified their input-output pairs in a similar way. We filtered out functions that could not be so modified. After doing so, we were left with a remaining test suite of 32 functions.

Finally, we trained a model to predict functions-to-use from learned embeddings of the input-output pairs, as in DeepCoder. We didn't see a description of how functions with multiple inputs had their inputs embedded, so we elected to separate them with a special character, distinct from the null characters that are used to pad lists.

Compared with the property signatures method, this technique results in far fewer synthesized test-set programs. We did 3 random restarts for each of DeepCoder, property signatures, and the random baseline (recall that the random baseline itself is already a relatively sophisticated synthesis algorithm; it is just the configurations that are random). On average, the DeepCoder runs synthesized substantially fewer test programs than the property signature runs (trained on the same modified training data and tested on the same modified test data), as did the random baseline.

A priori, this seems like a surprisingly large gap, but it actually fits with what we know from the existing literature. Shin et al. (2018) observe something similar: DeepCoder-esque techniques tend to generalize poorly to a test set where the input-output pairs come from a different distribution than they do in training. This is the case in our experiment, and it will be the case in any realistic setting, since the test set will be provided by users.
Property signatures are (according to our experiments) much less sensitive to such shift. This makes intuitive sense: whether an input list is half the length of an output list (for instance) is invariant to the particular distribution of members of the list. Note that even if property signatures did not outperform DeepCoder on this subset of our test set, they would still constitute an improvement, since they allow us to operate on arbitrary programs and input types.

5 Predicting Property Signatures of Function Compositions
Most programs involve composing functions with other functions. Suppose that we are trying to solve a synthesis problem from a set of input/output examples, and during the search we create a partial program of the form f(g(x)) for some unknown g. Since we know f, we know its property signature. Since we have the program specification, we also have the estimated property signature for f ∘ g := f(g(x)). If we could somehow guess the signature for g, we could look it up in a cache of previously computed functions keyed by signature. If we found a function matching the desired signature, we would be done. If no matching function exists in the cache, we could start a smaller search with only the signature of g as the target, then use that result in our original search. We could attempt to encode the relationship between f and g into a set of formal constraints and pass that to a solver of some kind (De Moura & Bjørner, 2008), and while that is potentially an effective approach, it may be difficult to scale to a language like Searcho. Instead, we can simply train a machine learning model to predict the signature of g from the signature of f and the signature of f ∘ g.

Here we present an experiment to establish a proof of concept of this idea. First, we generated a data set of 10,000 random functions taking lists of integers to lists of integers. Then we randomly chose 50,000 pairs of functions from this list, arbitrarily designating one as f and one as g. We then computed the signatures of f, g and f ∘ g for each pair, divided the data into a training set of 45,000 elements and a test set of 5,000 elements, and trained a small fully connected neural network to predict the signature of g from the other two signatures. On the test set, this model had 87.5% accuracy, which is substantially better than chance.
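Generating one training example for such a model can be sketched as follows (a Python illustration with our own toy properties; the actual experiment uses random Searcho functions and the learned property set):

```python
def summarize(bools):
    # Collapse a set of booleans to AllTrue / AllFalse / Mixed.
    s = set(bools)
    return "AllTrue" if s == {True} else "AllFalse" if s == {False} else "Mixed"

def estimated_signature(props, io_pairs):
    return [summarize([p(i, o) for i, o in io_pairs]) for p in props]

def composition_example(f, g, props, inputs):
    """One training example: features (sig(f), sig(f.g)), target sig(g)."""
    sig_f  = estimated_signature(props, [(x, f(x)) for x in inputs])
    sig_g  = estimated_signature(props, [(x, g(x)) for x in inputs])
    sig_fg = estimated_signature(props, [(x, f(g(x))) for x in inputs])
    return (sig_f, sig_fg), sig_g
```

Stacking many such examples gives exactly the supervised data set described above: the network sees the signatures of f and f ∘ g and must recover the signature of the hidden g.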
We inspected the predictions made on the test set and found interesting examples like the one in Listing 7, where the model has learned to do something you might (cautiously) refer to as logical deduction on properties. This result is suggestive of the expressive power of property signatures. It also points toward exciting future directions for research into neurally guided program synthesis.

Listing 7: an example prediction from the test set (listing truncated).

6 Related Work
There is substantial prior work on program synthesis in general. We can hardly do it justice here, but see some of Gottschlich et al. (2018); Solar-Lezama (2018); Gulwani et al. (2017); Allamanis et al. (2018) for more detailed surveys.
Property Based Testing:
Function properties are similar to the properties from Property-Based Testing, a software testing methodology popularized by the QuickCheck library (Claessen & Hughes, 2011) that has now spread to many contexts (Gallant, 2018; Holser, 2018; Hypothesis, 2018; Luu, 2015; Elhage, 2017; MacIver, 2017). QuickCheck properties are human-specified and operate on functions, while our properties operate on input/output pairs.
Automated Theorem Proving:
Synthesizing programs using machine learning is related to the idea of proving theorems using machine learning (Irving et al., 2016). Synthesis and theorem proving are formally related as well (Howard, 1980).
Program Synthesis from a Programming Languages Perspective:
Most existing work on synthesis is from the perspective of programming language design. Our baseline synthesizer borrows many ideas from Feser et al. (2015). Polikarpova et al. (2016) use refinement types (Freeman, 1994) (roughly, a decidable version of dependent types; see Pierce & Benjamin (2002)) to give program specifications, allowing the type-checker to discard many candidate programs. Property signatures can be thought of as a compromise between refinement types and dependent types: we can write down specifications with them that would be impossible to express in refinement types, but we can only check those specifications empirically.
ML-Guided Program Synthesis:
More recently, researchers have used machine learning to synthesize and understand programs. We have mentioned Balog et al. (2016), but see all of: Nye et al. (2019); Ellis et al. (2018a); Zohar & Wolf (2018); Kalyan et al. (2018); Ellis et al. (2019a); Liang et al. (2010); Alon et al. (2019) as well. Menon et al. (2013) introduces the idea of features, a predecessor to the idea of properties. Features differ from properties in that they are hand-crafted rather than learned, and in that they were applied only on a limited string-processing domain.
Deepcoder:
The relationship between this work and Balog et al. (2016) merits special discussion. Aside from the inclusion of property signatures, they differ in the following ways:

• We use a more expressive DSL. Their DSL only allows linear control flow with a small set of functions, whereas our language is Turing complete (it has looping, recursion, etc.). We also have a larger set of allowed component functions: 86 vs. 34.
• Their machine learning method does not work straightforwardly for arbitrary programs. Their training and test programs only deal with integers and lists of integers, while we have 14 different function types. It would thus not be feasible to compare the techniques on anything but a tiny subset of our existing test set.
• The test cases in Balog et al. (2016) are generated from their enumerative synthesizer. It is therefore guaranteed that the synthesizer will be able to emit them in a reasonable amount of time during testing, so their demonstrated improvements are 'merely' speed-ups. Our test cases are human generated, and over half of the programs synthesized using property signatures were not synthesized at all given over a day of time.

7 Conclusion and Future Work
In this work, we have introduced the idea of properties and property signatures. We have shown that property signatures allow us to synthesize programs that a baseline otherwise was not able to synthesize, and have sketched out other potential applications as well. Finally, we have open sourced all of our code, which we hope will accelerate future research into ML-guided program synthesis.
Acknowledgments
We would like to thank Kensen Shi, David Bieber, and the rest of the Program Synthesis Team for helpful discussions. We would like to thank Colin Raffel for reading a draft of the paper. Most of all, we owe a substantial debt to Niklas Een, on whose Evo programming language (https://github.com/tensorflow/deepmath/tree/master/deepmath/zz/CodeBreeder) the Searcho language is heavily based. (Of course, barring bugs in the synthesizer, the test programs not solved within a day would be synthesized eventually.)

References
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. Learning natural coding conventions. In Symposium on the Foundations of Software Engineering (FSE), 2014.
Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR), 51(4):81, 2018.
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages, 3(POPL):40, 2019.
Rajeev Alur, Pavol Černý, and Arjun Radhakrishna. Synthesis through unification. In Daniel Kroening and Corina S. Păsăreanu (eds.), Computer Aided Verification, pp. 163–179, Cham, 2015. Springer International Publishing. ISBN 978-3-319-21668-3. URL http://ecee.colorado.edu/pavol/publications/cav15a/cav15a.pdf.
J. W. Backus, R. J. Beeber, S. Best, R. Goldberg, L. M. Haibt, H. L. Herrick, R. A. Nelson, D. Sayre, P. B. Sheridan, H. Stern, I. Ziller, R. A. Hughes, and R. Nutt. The FORTRAN automatic coding system. In Papers Presented at the February 26-28, 1957, Western Joint Computer Conference: Techniques for Reliability, IRE-AIEE-ACM '57 (Western), pp. 188–198, New York, NY, USA, 1957. ACM. doi: 10.1145/1455567.1455599. URL http://doi.acm.org/10.1145/1455567.1455599.
Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to write programs. arXiv preprint arXiv:1611.01989, 2016.
Koen Claessen and John Hughes. QuickCheck: A lightweight tool for random testing of Haskell programs. ACM SIGPLAN Notices, 46(4):53–64, 2011.
B. J. Copeland. Alan Turing's Electronic Brain: The Struggle to Build the ACE, the World's Fastest Computer. OUP Oxford, 2012. ISBN 9780199609154. URL https://books.google.com/books?id=YhQZnczOS7kC.
Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340. Springer, 2008.
Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdelrahman Mohamed, and Pushmeet Kohli. RobustFill: Neural program learning under noisy I/O. In International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, pp. 990–998, 2017. URL http://proceedings.mlr.press/v70/devlin17a.html.
Nelson Elhage. Property-based testing is fuzzing, 2017. URL https://blog.nelhage.com/post/property-testing-is-fuzzing/.
Kevin Ellis, Lucas Morales, Mathias Sablé-Meyer, Armando Solar-Lezama, and Josh Tenenbaum. Learning libraries of subroutines for neurally-guided Bayesian program induction. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 7805–7815. Curran Associates, Inc., 2018a.
Kevin Ellis, Lucas Morales, Mathias Sablé-Meyer, Armando Solar-Lezama, and Joshua B. Tenenbaum. Library learning for neurally-guided Bayesian program induction. In NeurIPS, 2018b.
Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a REPL. arXiv preprint arXiv:1906.04604, 2019a.
Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a REPL. In NeurIPS, 2019b.
John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data structure transformations from input-output examples. In ACM SIGPLAN Notices, volume 50, pp. 229–239. ACM, 2015.
Tim Freeman. Refinement types for ML. Technical report, Carnegie Mellon University, Pittsburgh, PA, Dept. of Computer Science, 1994.
Andrew Gallant. QuickCheck for Rust, 2018. URL https://github.com/BurntSushi/quickcheck.
Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Carbin, Martin Rinard, Regina Barzilay, Saman Amarasinghe, Joshua B. Tenenbaum, and Tim Mattson. The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pp. 69–80. ACM, 2018.
Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '11, pp. 317–330, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0490-0. doi: 10.1145/1926385.1926423. URL http://doi.acm.org/10.1145/1926385.1926423.
Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, et al. Program synthesis. Foundations and Trends® in Programming Languages, 4(1-2):1–119, 2017.
Kihong Heo, Mukund Raghothaman, Xujie Si, and Mayur Naik. Continuously reasoning about programs using differential Bayesian inference. In Programming Language Design and Implementation (PLDI), 2019.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
Paul Holser. junit-quickcheck, 2018. URL https://github.com/pholser/junit-quickcheck/.
William A. Howard. The formulae-as-types notion of construction. To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, 44:479–490, 1980.
Hypothesis. Hypothesis, 2018. URL https://github.com/HypothesisWorks/hypothesis.
Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas Eén, François Chollet, and Josef Urban. DeepMath - deep sequence models for premise selection. In Advances in Neural Information Processing Systems, pp. 2235–2243, 2016.
Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, and Sumit Gulwani. Neural-guided deductive search for real-time program synthesis from examples. arXiv preprint arXiv:1804.01186, 2018.
Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
Percy Liang, Michael I. Jordan, and Dan Klein. Learning programs: A hierarchical Bayesian approach. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 639–646, 2010.
Dan Luu. AFL + QuickCheck = ?, 2015. URL https://danluu.com/testing/.
David R. MacIver. What is property based testing, 2017. URL https://hypothesis.works/articles/what-is-property-based-testing/.
Zohar Manna and Richard Waldinger. Knowledge and reasoning in program synthesis. Artificial Intelligence, 6(2):175–208, 1975.
Zohar Manna and Richard J. Waldinger. Toward automatic program synthesis. Communications of the ACM, 14(3):151–165, 1971.
Aditya Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, and Adam Kalai. A machine learning framework for programming by example. In International Conference on Machine Learning, pp. 187–195, 2013.
Maxwell I. Nye, Luke B. Hewitt, Joshua B. Tenenbaum, and Armando Solar-Lezama. Learning to infer program sketches. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pp. 4861–4870. PMLR, 2019. URL http://proceedings.mlr.press/v97/nye19a.html.
Benjamin C. Pierce. Types and Programming Languages. MIT Press, 2002.
Amir Pnueli and Roni Rosner. On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 179–190. ACM, 1989.
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. Program synthesis from polymorphic refinement types. In ACM SIGPLAN Notices, volume 51, pp. 522–538. ACM, 2016.
D. Shaw. Inferring lisp programs from examples.
Richard Shin, Neel Kant, Kavi Gupta, Chris Bender, Brandon Trabucco, Rishabh Singh, and Dawn Song. Synthetic datasets for neural program synthesis. 2018.
Armando Solar-Lezama. Introduction to program synthesis. https://people.csail.mit.edu/asolar/SynthesisCourse/TOC.htma, 2018. Accessed: 2018-09-17.
Paul Vincent Spade and Claude Panaccio. William of Ockham. In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Spring 2019 edition, 2019.
Phillip D. Summers. A methodology for LISP program construction from examples. Journal of the ACM (JACM), 24(1):161–175, 1977.
R. J. Waldinger, R. C. T. Lee, and SRI International. PROW: A Step Toward Automatic Program Writing. SRI International, 1969. URL https://books.google.com/books?id=3BITSQAACAAJ.
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 452–466, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4988-8. doi: 10.1145/3062341.3062365. URL http://doi.acm.org/10.1145/3062341.3062365.
Patrick H. Winston. Learning structural descriptions from examples. Technical report, Cambridge, MA, USA, 1970.
Amit Zohar and Lior Wolf. Automatic program synthesis of long programs with a learned garbage collector. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 2094–2103. Curran Associates, Inc., 2018.
Data: A PBE spec and a synthesizer configuration
Result: A program satisfying the specification (hopefully!)

Queue.push(hole :: τ_in → τ_out);
while Queue is not empty do
    partial_program ← GetLowestCostPartial(Queue);
    if HasHoles(partial_program) then
        ExpandOneHole(partial_program);
    else
        TestAgainstSpec(partial_program);
    end
end

Figure 2: The top-down synthesizer that we use as a baseline in this work. In a loop, until a satisfying program is found or we run out of time, we pop the lowest-cost partial program from the queue of all partial programs, then fill in its holes in all ways allowed by the type system, pushing each new partial program back onto the queue. If there are no holes to fill, the program is complete, and we check it against the spec. The cost of a partial program is the sum of the costs of its pool elements, plus a lower bound on the cost of filling each of its typed holes, plus the sum of the costs of a few special operations such as tuple construction and lambda abstraction.
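The loop in Figure 2 can be illustrated with a minimal Python sketch. This is not the paper's Searcho implementation: it substitutes a toy single-typed grammar of unary integer operations for type-directed hole expansion, uses a uniform cost of 1 per node (with 1 as the lower bound per hole), and all names here (`expand_one_hole`, `synthesize`, the `OPS` pool) are illustrative, not from the released system.

```python
import heapq
import itertools

# Toy expression grammar over a single int input `x`:
#   E ::= x | inc(E) | double(E) | neg(E)
# A partial program is a nested tuple; HOLE marks an unfilled (int-typed) hole.
HOLE = "<hole>"
OPS = {"inc": lambda v: v + 1, "double": lambda v: v * 2, "neg": lambda v: -v}

def expand_one_hole(prog):
    """Fill the leftmost hole in every way the (single-typed) grammar allows."""
    if prog == HOLE:
        yield "x"                      # terminal: the input variable
        for op in OPS:
            yield (op, HOLE)           # op applied to a fresh hole
        return
    if isinstance(prog, tuple):
        op, child = prog
        for new_child in expand_one_hole(child):
            yield (op, new_child)

def has_holes(prog):
    return prog == HOLE or (isinstance(prog, tuple) and has_holes(prog[1]))

def cost(prog):
    # Each node costs 1; a hole contributes a lower bound of 1 (cheapest filler: x).
    return 1 if not isinstance(prog, tuple) else 1 + cost(prog[1])

def evaluate(prog, x):
    if prog == "x":
        return x
    op, child = prog
    return OPS[op](evaluate(child, x))

def synthesize(examples, max_steps=10000):
    """Best-first top-down search: pop the cheapest partial program, then
    either expand one of its holes or test it against the I/O examples."""
    counter = itertools.count()  # tie-breaker so heapq never compares trees
    queue = [(cost(HOLE), next(counter), HOLE)]
    for _ in range(max_steps):
        if not queue:
            return None
        _, _, prog = heapq.heappop(queue)
        if has_holes(prog):
            for filled in expand_one_hole(prog):
                heapq.heappush(queue, (cost(filled), next(counter), filled))
        elif all(evaluate(prog, xi) == yi for xi, yi in examples):
            return prog
    return None
```

For example, the spec `[(0, 1), (1, 3), (2, 5)]` (i.e. f(x) = 2x + 1) is solved by the cost-3 program `("inc", ("double", "x"))`; because the search is best-first over cost, no cheaper satisfying program is skipped.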
A APPENDIX

A.1 FURTHER DETAILS ON THE BASELINE SYNTHESIZER