Grammar Filtering For Syntax-Guided Synthesis
Kairo Morton, William Hallahan, Elven Shum, Ruzica Piskac, Mark Santolucito
George School, Yale University, Deerfield
[email protected], [email protected], eshum20@deerfield.edu, [email protected], [email protected]
Abstract
Programming-by-example (PBE) is a synthesis paradigm that allows users to generate functions by simply providing input-output examples. While a promising interaction paradigm, synthesis is still too slow for realtime interaction and more widespread adoption. Existing approaches to PBE synthesis have used automated reasoning tools, such as SMT solvers, as well as works applying machine learning techniques. At its core, the automated reasoning approach relies on highly domain-specific knowledge of programming languages. On the other hand, the machine learning approaches utilize the fact that when working with program code, it is possible to generate arbitrarily large training datasets. In this work, we propose a system for using machine learning in tandem with automated reasoning techniques to solve Syntax-Guided Synthesis (SyGuS) style PBE problems. By preprocessing SyGuS PBE problems with a neural network, we can use a data-driven approach to reduce the size of the search space, then allow automated reasoning-based solvers to more quickly find a solution analytically. Our system is able to run atop existing SyGuS PBE synthesis tools, decreasing the runtime of the winner of the 2019 SyGuS Competition for the PBE Strings track by 47.65% to outperform all of the competing tools.
Introduction
The term “program synthesis” refers to automatically generating code to satisfy some specification. That specification describes what the code should do, without going into details about how it should be done. The specification could be given as a set of constraints (Manna and Waldinger 1979; Kuncak et al. 2010), it can be deduced from the program and its environment (Gvero et al. 2013; Feng et al. 2017), or it can be inferred from a large corpus (Balog et al. 2017; Santolucito et al. 2017).

One paradigm of program synthesis is called programming by example (Cypher et al. 1993) (PBE). In the PBE approach, a user only provides a set of pairs of input-output examples that illustrate the desired behavior of the code. From these examples, the PBE engine should then generate code that generalizes from the examples to create a program which covers the unspecified examples as well.
The idea of automated code synthesis is an area of research with a long history (cf. the Church synthesis problem (Church 1963)). However, due to the problem's undecidability and high computational complexity for decidable fragments, for almost 50 years the research in program synthesis was mainly focused on addressing theoretical questions, and the size of synthesized programs was relatively small. However, the state of affairs has drastically changed in the last decade. By leveraging advances in automated reasoning and formal methods, there has been a renewed interest in software synthesis. The research in program synthesis has recently focused on developing efficient algorithms and tools, and synthesis has even been used in industrial software (Gulwani 2011). Today, machine learning plays a vital role in modern software synthesis and there are numerous tools and startups that rely on machine learning and big data to automatically generate code (cod 2019; Balog et al. 2017).

With numerous synthesis tools and formats being developed, it was difficult to empirically evaluate and compare existing synthesis tools. The Syntax-Guided Synthesis (SyGuS) format language (Alur et al. 2013; Raghothaman and Udupa 2019) was introduced in an effort to standardize the specification format of program synthesis, including PBE synthesis problems. The SyGuS language specifies synthesis problems through two components: a set of constraints (e.g., input-output examples), and a grammar (a set of functions). The goal of a SyGuS synthesis problem is to construct a program from functions within the given grammar that satisfies the given constraints. With this standardized synthesis format and an ever-expanding set of benchmarks, there is now a yearly competition of synthesis tools (Alur et al.
2019), which pushes the frontier of scalable synthesis further.

The SyGuS Competition splits synthesis problems into tracks, for example PBE Strings or PBE BitVectors, assigning a different grammar for each track, and sometimes even varying the grammar within a single track. As the grammar defines the search space in SyGuS, this allows benchmark designers to ensure problems are relatively in-scope of current tools. However, when synthesis is deployed in real-world applications, we must allow for larger grammars that account for the wide range of use-cases users require (Santolucito, Hallahan, and Piskac 2019). While larger grammars allow for more expressive power in the synthesis engine, they also slow down the whole synthesis process.

In our own experimentation, we found that by manually removing some parts of the grammar from the SyGuS Competition benchmarks, we can significantly improve synthesis times. Accordingly, we sought to automate this process. Removing parts of a grammar is potentially dangerous though, as we may remove the possibility of finding a solution altogether. In fact, understanding the grammar's impact on synthesis algorithms is a complex problem, connected to the concept of overfitting (Padhi et al. 2019).

In this paper, we utilize machine learning to automate an analysis of a SyGuS grammar and a set of synthesis constraints. We generate a large number of SyGuS problems, and use this data to train a neural network. Given a new SyGuS problem, the neural network predicts how likely it is for a given grammar element to be critical to synthesizing a solution to that problem. Our key insight is that, in addition to criticality, we predict how much time we expect to save by removing this grammar element. We combine these predictions to efficiently filter grammars to fit a specific synthesis problem, in order to speed up synthesis times.
Even with these reduced grammars, we are still able to find solutions to the problems.

We implemented our approach in a modular tool, GRT, that can be attached to any existing SyGuS synthesis engine as a blackbox. We evaluated GRT by running it on the SyGuS Competition benchmarks from 2019 in the PBE Strings track. We found GRT outperformed CVC4, the winner of the SyGuS Competition from 2019, reducing the overall synthesis time by 47.65%. Additionally, GRT was able to solve a benchmark for which CVC4 timed out.

In summary, the core contributions of our work are as follows:
1. A methodology to generate models that can reduce the time needed to synthesize PBE SyGuS problems. In particular, our technique reduced the grammar by identifying which functions to try to eliminate to increase the efficiency of a SyGuS solver. It also learns a model to predict which functions are critical for a particular PBE problem.
2. A demonstration of the effectiveness of our methodology. We show experiments on the existing SyGuS PBE Strings track that demonstrate the speedup resulting from using our filtering as a preprocessor for an existing SyGuS solver. Over the set of benchmarks, our technique decreases the total time taken by synthesis by 47.65%.

Related Work
One approach to SyGuS is to directly train a neural network to satisfy the input/output examples (Andrychowicz and Kurach 2016; Devlin et al. 2017b; Graves, Wayne, and Danihelka 2014; Joulin and Mikolov 2015; Kaiser and Sutskever 2015; Chen, Liu, and Song 2017). However, such approaches struggle to generalize, especially when the number of examples is small (Devlin et al. 2017a). Some existing work (Wang et al. 2018; Bunel et al. 2018) aims to represent aspects of the syntax and semantics of a language in a neural network. In contrast to these existing approaches, which aim to outright solve SyGuS problems, our work acts as a preprocessor for a separate SyGuS solver. However, one could also explore using our work as a preprocessor for one of these existing neural network directed synthesis approaches. Other works have explored combining logic-directed and machine learning guided synthesis approaches (Nye et al. 2019). This work sought to split synthesis tasks between generating high-level sketches with neural networks, and filling in the holes of the sketch with an enumerative solver. Our work could be complementary to this, by assisting in pruning the search space needed to fill in the holes.

Like our work, DeepCoder (Balog et al. 2017) and Neural-Guided Deductive Search (NGDS) (Kalyan et al. 2018) identify pieces of a grammar that should be removed from the grammar. However, in our parlance, these works only consider criticality, which measures how important a part of the grammar is to completing synthesis. Unlike our work, they do not consider the time savings from removing or keeping a part of the grammar. NGDS (Kalyan et al. 2018) does note that different models could be trained for different pieces of a grammar; however, it provides no means of automating this process. Rather, the user would have to manually elect to train individual neural networks for different grammatical elements. Work by Si et al. (Si et al.
2018) aims to learn an efficient solver for SyGuS problems from scratch, rather than, as in our work, acting as a preprocessor for a separate solver.
Background
A SyGuS synthesis problem is a tuple (C, G) of constraints, C, and a context-free grammar, G. In our case we restrict the set of constraints to the domain of PBE, so that all constraints are in the form of pairs (i, o) of input-output examples. We write G \ g to denote the grammar G, but without the terminal symbol g. The terminal symbols are the component functions that can be used in constructing a program (e.g. +, -, str.length). We also use the notation π(G) to denote the projection of G into its set representation, which is the set of the terminal symbols in the grammar.

The problem statement of syntax-guided synthesis (SyGuS) is: given a grammar, G, and a set of constraints, C, find a program, P ∈ G, such that the program satisfies all the constraints – ∀c ∈ C. P ⊢ c. For brevity, we equivalently write P ⊢ C. If our synthesis engine is able to find such a program in t seconds or less, we write that (G, C) ⇝_t P. We use the notation T_G^C to indicate the time to run (G, C) ⇝_t P. If the SyGuS solver is not able to find a solution within the timeout (T_G^C > t), we denote this as (G, C) ̸⇝_t P. We typically set a timeout on all synthesis problems of 3600 seconds, the same value of the timeout used in the SyGuS competition. We write (G, C) ⇝ P and (G, C) ̸⇝ P as shorthand for (G, C) ⇝_3600 P and (G, C) ̸⇝_3600 P, respectively.

Figure 1: GRT uses the grammar G and constraints C to predict how critical each function is, and the amount of time that would be saved by eliminating it from the grammar. Then, it outputs a new grammar G⋆, which it expects will speed up synthesis over the original grammar (that is, it expects that T_{G⋆}^C < T_G^C).

We define G as the grammar constructed from the maximal set of terminal symbols we consider for synthesis. We call a terminal, g, within a grammar, critical for a set of constraints, C, if (G \ g, C) ̸⇝ P.
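This criticality check can be phrased against any solver that either returns a program or reports failure. The sketch below is our own illustration, not GRT's implementation: `toy_synthesize` is a hypothetical stand-in for a real SyGuS engine that here only tries single-function programs.

```python
def is_critical(g, grammar, constraints, synthesize, timeout=3600):
    # g is critical for constraints C iff synthesis fails once g is
    # removed from the grammar, i.e. (G \ g, C) does not yield a program.
    reduced = [t for t in grammar if t is not g]
    return synthesize(reduced, constraints, timeout) is None

def toy_synthesize(grammar, constraints, timeout=None):
    # Toy stand-in solver: try each terminal as a one-function program.
    for name, fn in grammar:
        if all(fn(i) == o for i, o in constraints):
            return name
    return None

# Terminals are (name, function) pairs in this toy setting.
G = [("upper", str.upper), ("lower", str.lower), ("rev", lambda s: s[::-1])]
C = [("ab", "AB"), ("cd", "CD")]  # only str.upper satisfies these examples

print(is_critical(G[0], G, C, toy_synthesize))  # True: removing upper breaks synthesis
print(is_critical(G[2], G, C, toy_synthesize))  # False: rev is noncritical
```

Note that, as in the paper's definition, the check is relative to a particular solver and timeout: a terminal may look critical simply because the solver cannot find an alternative solution in time.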
For any given set of constraints, if a solution exists with G, there is also a grammar, G_crit, that contains exactly the critical terminal symbols required to find a solution. More formally, G_crit is constructed such that (G_crit, C) ⇝ P ∧ ∀g ∈ G_crit. (G \ g, C) ̸⇝ P. Note that G_crit is not unique.

The goal of our work is to find a grammar, G⋆, where π(G_crit) ⊆ π(G⋆) ⊆ π(G). This will yield a grammar that removes some noncritical terminal symbols so that the search space is smaller, but still sufficient to construct a correct program.

Overview
Our system, GRT, works as a preprocessing step for a SyGuS solver. The goal of GRT is to remove elements from the grammar and thus, by having a smaller search space, save time during synthesis. To do this we combine two metrics, as shown in Figure 1: our predicted confidence that a grammar element is not needed, and our prediction of how much time will be saved by removing that element. We focus on removing only elements where we are both confident that the grammar element is noncritical, and that removing the grammar element significantly impacts synthesis times. By giving the constraints and the grammar definition to GRT, we predict which elements of the grammar can be safely removed. By analyzing running times we predict which of these elements are beneficial to remove. We describe GRT in three sections, addressing dataset generation, the training stage, and our evaluation.
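The combination of the two metrics can be illustrated in a few lines. This is a simplified sketch under our own naming assumptions (`filter_grammar`, `vote_vectors`, `avg_savings` are illustrative, not GRT's actual code): sum per-constraint criticality votes, take the candidates with the largest positive average time saving, and remove those with the fewest "critical" votes.

```python
def filter_grammar(terminals, vote_vectors, avg_savings, n_remove=2, n_candidates=3):
    # vote_vectors: one binary vector per constraint; a 1 at position i is a
    # vote that terminal i is critical. avg_savings[i] is the average time
    # saved (seconds) when terminal i is dropped from the grammar.
    votes = [sum(col) for col in zip(*vote_vectors)]
    # Candidates: terminals with the greatest positive average time saving.
    candidates = sorted(
        (i for i, s in enumerate(avg_savings) if s > 0),
        key=lambda i: avg_savings[i], reverse=True)[:n_candidates]
    # Remove the candidates with the fewest criticality votes.
    to_remove = set(sorted(candidates, key=lambda i: votes[i])[:n_remove])
    return [t for i, t in enumerate(terminals) if i not in to_remove]

terminals = ["str.++", "str.replace", "int.to.str", "str.to.int", "str.len"]
vote_vectors = [[1, 1, 0, 0, 1],   # votes from constraint 1
                [1, 0, 0, 0, 1],   # votes from constraint 2
                [1, 1, 1, 0, 1]]   # votes from constraint 3
avg_savings = [-0.5, 2.0, 5.0, 4.0, 0.1]  # hypothetical A_g values

print(filter_grammar(terminals, vote_vectors, avg_savings))
# -> ['str.++', 'str.replace', 'str.len']
```

Here `int.to.str` and `str.to.int` are dropped: both promise large time savings, and neither collected many criticality votes.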
Data Generation
In order to learn a model for GRT, we need to generate a labelled dataset that maps constraints to grammar components in G_crit. This will allow us to predict, given a new set of constraints C′, which grammar elements are noncritical for synthesis, and accordingly prune our grammar. The generation of data for application to machine learning for program synthesis is a nontrivial problem, requiring careful construction of the dataset (Shin et al. 2019). We break the generation of this dataset into two stages: first, we generate a set of programs, P, from G. Then, for each program in P, we generate constraints for that program. We additionally need a dataset of synthesis times, in order to predict how long synthesis takes for a given set of constraints.

Criticality Data
To generate a set of programs P that can be generated from a grammar G, we construct a synthesis query with no constraints. We then run CVC4 with the command --sygus-stream, which instructs CVC4 to output as many solutions as it can find. With no constraints, all functions satisfy the specification, and CVC4 will generate all permutations of (well-formed and well-typed) functions in the grammar, until the process is terminated (we terminate after generating n programs). Because CVC4 generates solutions of increasing size, we collect all generated programs, then shuffle the order to prevent data bias with respect to the order (size) in which CVC4 generated programs.

After generating programs, we generate corresponding constraints (in the form of input-output examples for PBE) for these functions. To do this, for each program, P, we randomly generate a set of inputs I, and compute the input-output pairs C = {(i, P(i)) | i ∈ I}. We then form a SyGuS problem (G, C), where we know that the program P satisfies the constraints, and is part of the grammar: P ⊢ C and P ∈ G. This amounts to programs that could be synthesized from the constraints (i.e. (G, C) ⇝_∞ P). It is important that our dataset represent programs that could be synthesized, as opposed to what can be synthesized (i.e. (G, C) ⇝ P). This is important because we will use this data set to try to learn the “semantics” of constraints, and we do not want to use this data set to additionally, inadvertently learn the limitations of the synthesis engine.

At this point, we have now constructed a dataset of triples of grammars (fixed for all benchmarks), constraints, and programs, D = {(G, C_1, P_1), ..., (G, C_n, P_n)}. In order to use D to help us predict G_crit, we break up each triple by splitting each constraint set C into its individual constraints. For a triple (G, C, P), where C = {c_1, ..., c_m}, we generate a new set of triples {(G, c_1, P), ..., (G, c_m, P)}.
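The constraint-generation step described above can be sketched as follows. This is a simplified illustration under our own assumptions about the input distribution (the real pipeline draws programs from CVC4's --sygus-stream output rather than Python functions, and `make_constraints` is our name, not the paper's).

```python
import random

def make_constraints(program, n_examples=5, seed=0):
    # Build C = {(i, P(i)) | i in I} for a known program P, so that by
    # construction P |- C and the SyGuS problem (G, C) has a solution.
    rng = random.Random(seed)
    inputs = ["".join(rng.choice("abcdefghij") for _ in range(rng.randint(1, 7)))
              for _ in range(n_examples)]
    return [(i, program(i)) for i in inputs]

C = make_constraints(str.upper)
print(C[0])  # one (input, P(input)) pair for the known program str.upper
```

Because the program is known, every generated pair is guaranteed to be consistent, which is what lets the dataset label which terminals were actually needed.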
The union of all these triples of individual constraints forms our training set, TR_crit, that will be used to predict critical functions in the grammar for a given set of constraints.
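The splitting of each constraint set into per-constraint training triples is mechanical; a minimal sketch (our own illustrative code):

```python
def split_constraints(dataset):
    # Expand each (G, C, P) into one triple (G, c, P) per individual
    # constraint c in C, forming the training set TR_crit.
    return [(G, c, P) for (G, C, P) in dataset for c in C]

D = [("G", [("a", "A"), ("b", "B")], "str.upper")]
print(split_constraints(D))
# -> [('G', ('a', 'A'), 'str.upper'), ('G', ('b', 'B'), 'str.upper')]
```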
Timing Data
In addition to a training set for predicting G_crit, we also need a separate training set for predicting the time that can be saved by removing a terminal from the grammar. This dataset maps grammar elements g ∈ G to the effect on synthesis times (a value in ℝ) when g is dropped from the grammar. To do this we require synthesis problems that more closely model the types of constraints that humans typically write. We collect this set of benchmarks from users of the live coding interface for SyGuS (Santolucito, Hallahan, and Piskac 2019). Because we had a limited number of human-generated constraint examples, we augmented this with constraints generated from TR_crit.

We run synthesis for each problem with the full grammar, as well as with all grammars constructed by removing one element, g. For every synthesis problem benchmark, 1 ≤ i ≤ m, we record the difference in synthesis times between running with the full grammar, and removing g:

T_{G}^{C_i} − T_{G \ g}^{C_i}    (1)

Thus, we create a training set, TR_time, relating each terminal g ∈ π(G) and a set of constraints, to the time it takes to synthesize a solution without that terminal.

Training
Predicting criticality
Our goal is to predict, given a set of constraints C, if a terminal g belongs to the set of terminals π(G_crit) for C. To do this, we use a Feedforward Neural Network (Multi-Layer Perceptron), with an extra embedding layer to encode the string-valued input-output examples into feature vectors. We train the neural network to predict the membership of each terminal g ∈ π(G) in the critical set π(G_crit), based on a single constraint c ∈ C. This prediction produces a 1D binary vector of length |π(G)|, where a 1 at position i in the binary vector indicates the terminal in position i is predicted to belong to the critical set.

When a SyGuS problem has multiple (|C| ≥ 2) constraints, we run our prediction on each constraint individually. We then use a voting mechanism to come to consensus on the construction of G⋆. After computing |C| binary vectors across all constraints, the vectors are summed to produce a final voting vector. The magnitude of each element in this final voting vector represents the number of votes “from each constraint” that the terminal represented by that element is in the critical set. We then use this final voting vector in combination with our time predictions.

Predicting time savings
It is only worthwhile to remove a terminal symbol g from a grammar G if T_{G \ g}^C is less than T_G^C. If a g stands to only give us a small gain in synthesis times, it may not be worth the risk that we incorrectly predicted its criticality.

To predict the amount of time saved by removing a terminal g, we examine the distribution of times in our training set TR_time. For each terminal g, we calculate A_g, the average time increase that results from removing g from the grammar. Denoting the time to run (G, C) ⇝ P as T_G^C, we can write A_g as:

A_g = (1/n) Σ_{i=1}^{n} (T_{G}^{C_i} − T_{G \ g}^{C_i})

If a terminal g has a negative A_g, then removing it from the grammar actually slows down synthesis, on average. As such, dropping the terminal from the grammar is not generally helpful. Thus, we only consider those terminals with a positive A_g in our second step.

Combining predictions
With our predictions of the criticality of a terminal g and of the time saved by removing g, we must make a final decision on whether or not we should remove g. To do this, we take the top three terminals with the greatest average positive impact on synthesis time over the training set, as computed with A_g. These tended to be terminals that mapped between types, which saved more time due to the internal mechanisms and heuristics of the CVC4 solver. We then use the final voting vector from our criticality prediction to choose only two out of the three to remove from G to form G⋆. We chose to remove only two terminals from G in order to minimize the likelihood of generating a G⋆ such that π(G_crit) ⊄ π(G⋆). We conjecture that the number of terminals removed is a grammar-dependent parameter that must be selected on a per-grammar basis, just as the number of terminals with A_g > 0 is grammar specific.

Falling back to the full grammar
There is some danger that G⋆ will, in fact, not be sufficient to synthesize a program. Thus, we propose a strategy that:
• first, tries to synthesize a program with the grammar G⋆;
• second, if synthesis with G⋆ is unsuccessful, falls back to attempting synthesis with the full grammar G.

We determine how long to wait before switching from G⋆ to G by finding an x that minimizes:

Σ_{i=1}^{n} { T_{G⋆}^{C_i}               if T_{G⋆}^{C_i} < x
            { min(x + T_{G}^{C_i}, t)    if T_{G⋆}^{C_i} ≥ x    (2)

where C_1 ... C_n are the constraints from the training set, and t is the timeout for synthesis.

Ideally, as captured in the first line of the sum, (G⋆, C_i) ⇝_x P will finish before T_{G⋆}^{C_i} = x. However, if a benchmark does not finish in that time, it will fall back on the full grammar. Then, either (G, C_i) ⇝_{t−x} P will succeed, and synthesize the expression in total time x + T_{G}^{C_i}, or synthesis will timeout, in total time (t − x) + x = t.

Experiments
The SyGuS competition (Alur et al. 2017) provides public competition benchmarks and results from previous years. In particular, the PBE Strings dataset provides a collection of PBE problems over a grammar that includes string, integer, and Boolean manipulating functions. First, we describe our approach to generating a training set of PBE problems over strings. Then, we present our results running GRT against the 2019 competition's winner in the PBE Strings track, CVC4 (Nötzli et al. 2019; Barrett et al. 2011; Alur et al. 2019). We are able to reduce synthesis time by 47.65% and synthesize a new solution to a benchmark that was left unsolved by CVC4.

Figure 2: The top 20 problems with the longest synthesis time for CVC4 (excepting timeouts), and the corresponding synthesis times for GRT+CVC4.
Technical details
The data triples generated during our initial data generation process of TR_crit are triples of strings. However, the neural network cannot process input-output pairs of type string as input. Thus, this data must be encoded numerically before it can be utilized to train the neural network. Each character in the input-output pairs is converted to its ASCII-equivalent integer value. The size of each pair is then standardized by adding a padding of zeros to the end of each newly encoded input and output vector respectively. This creates two vectors: the encoded input and the encoded output, both of which have a length of 20. These two vectors are then concatenated to give us a single vector for training. By the end of this process the triples created in our first data generation step are now one vector of natural numbers representing the input-output pair and a correct label P that will be predicted.

To generate the training set for predicting synthesis times, TR_time, we combine human-generated and automatically generated SyGuS problems. Specifically, we use 10 human-generated SyGuS problems, and 20 randomly selected problems from TR_crit.

The overall architecture of our model can be categorized as a multi-layer perceptron (MLP) neural network. More specifically, our model is made up of five fully connected layers: the input layer, three hidden layers, and the output layer. By using the Keras Framework, we include an embedding layer along with our input layer, which enables us to create unique vector embeddings of length 100 for any given input-output pair in the dataset. This embedding layer learns the optimal weights used to create these unique vectors through the training process. Thus, we create an encoding of the input-output pairs for training, while simultaneously standardizing the scale of the vector before it reaches the first hidden layer. The hidden layers of the model are all fully connected, and all use the sigmoid activation function. In addition, we implement dropout during training to ensure that overfitting does not occur. The size of the hidden layers was calculated using a geometric series to ensure that there was a consistent decrease in layer size as the layers get closer to the output layer. Specifically, the size of each hidden layer was calculated by:

HL_size(n) = input_size · (output_size / input_size)^(n / (L_num + 1))    (3)

where L_num represents the total number of layers in the network. Our model used the Adam optimization method and the binary cross-entropy loss function, as it is well suited for multi-label classification. Overall, our model was trained on 124,928 data points for 15 epochs with a batch size of 200, producing a training time of 228 seconds.

Results
After generating our data sets and training our model, we wrote a wrapper script to run GRT as a preprocessor for CVC4's SyGuS engine. We compared the synthesis results of GRT+CVC4 with the synthesis results of running CVC4 alone. All experiments were run on a MacBook Pro with a 2.9 GHz Intel i5 processor with 8GB of RAM. CVC4 uses a default random seed, and is deterministic over the choice of that seed, so the results of synthesis from CVC4 on a given grammar and set of constraints are deterministic. We note that our training data in no way used any of the SyGuS benchmarks.

GRT+CVC4 outperformed directly calling CVC4 on 32 out of 64 benchmarks (50%), with a reduction in total synthesis time over all benchmarks from 1304.87 seconds with CVC4 to 683.09 seconds with GRT+CVC4. On one benchmark, CVC4 timed out and was not able to find a solution (even when the timeout was increased to 5000 seconds), while GRT+CVC4 found a solution within the timeout specified by the SyGuS Competition rules (3600 seconds). On one benchmark, both CVC4 and GRT+CVC4 timeout (TO) and are not able to find a solution. On the other 31 benchmarks, CVC4 performed the same (within a fraction of a second) with and without the preprocessor. All the benchmarks for which CVC4 performed the same as GRT+CVC4 finish in under 2 seconds, and 28 of the 31 finish in under a second. In these cases there was little room for improvement even with GRT+CVC4.

Figure 4 shows the exact running times with both the full and reduced grammars for the benchmarks with the 30 largest running times with the full grammar. These are the benchmarks for which the synthesis times and size of the solution diverge most meaningfully; however, all other data is available in the supplementary material for this paper. Figure 4 also shows |P| and |P⋆|, the sizes of the programs found by CVC4 and GRT+CVC4, respectively. We define the size of a program as the number of nodes in the abstract syntax tree of the program. In terms of the grammar G, this is the number of terminals (including duplicates) that were composed to create the program.

In Figure 2, we present a visual comparison of the results for the 20 functions that took CVC4 the longest, while still finishing in the 3,600 second time limit. We note that we have the largest gains on the problems for which CVC4 is the slowest. Problems that CVC4 already handles quickly stand to benefit less from our approach.

In order to get a better baseline to understand the impact of GRT on running times, we ran a version of GRT with only the criticality prediction, which we call GRTC. In this case, GRTC+CVC4 actually performed worse than CVC4 by itself, increasing the running time on 53 out of the 62 benchmarks that did not timeout on CVC4.

On all but 5 benchmarks, CVC4 synthesized the same program when running with G and G⋆.

Figure 3: When GRT+CVC4 found a different solution than CVC4, it was on average shorter than the solution found with the full grammar.
The sizes of the programs (in terms of the number of terminal symbols used) for the benchmarks on which CVC4 synthesized different programs are shown in Figure 3. While on some benchmarks GRT+CVC4 produced a larger solution than CVC4, as a whole the sum of the sizes of all solutions for CVC4 was 806, while for GRT+CVC4 it was 789. Thus, overall, we were able to outperform CVC4 on size of synthesis as well.

id  file                     T_G^C    T_{G⋆}^C  |P|  |P⋆|
34  lastname-small.sl          1.80      1.84     4     4
35  bikes-long.sl              1.97      1.76     3     3
36  bikes-long-repeat.sl       2.08      1.71     3     3
37  lastname.sl                2.31      1.83     4     4
38  phone-6-short.sl           3.23      1.22    11    11
39  phone-7-short.sl           3.26      1.26    11    11
40  initials-long-repeat.sl    3.33      2.54     7     7
41  phone-5-short.sl           3.72      1.51     9     9
42  phone-7.sl                 4.57      2.03    11    11
43  phone-8.sl                 4.72      2.17    11    11
44  phone-6.sl                 4.85      1.97    11    11
45  phone-5.sl                 4.88      2.20    11    11
46  phone-9-short.sl           4.88      4.73    52    52
47  phone-10-short.sl          8.81      8.28    49    49
48  phone-9.sl                12.08      4.86    56    52
49  phone-10.sl               31.23      8.49    97    49
50  lastname-long.sl          32.40     25.49     4     4
51  lastname-long-repeat.sl   32.49     24.92     4     4
52  phone-6-long-repeat.sl    83.59     25.31    11    11
53  phone-5-long-repeat.sl    84.77     33.68    11    11
54  phone-7-long.sl           87.83     26.15    11    11
55  phone-7-long-repeat.sl    89.13     26.23    11    11
56  phone-5-long.sl           90.81     30.01    11    11
57  phone-8-long-repeat.sl    91.04     35.64    11    11
58  phone-9-long-repeat.sl    91.19     77.02    47    50
59  phone-6-long.sl           98.15     24.75    11    11
60  phone-8-long.sl          108.06     29.94    11    11
61  phone-10-long-repeat.sl  149.53    129.43    49    65
62  phone-10-long.sl         153.32    133.22    49    65
63  initials-long.sl             TO        TO     -     -
64  phone-9-long.sl              TO   3516.21     -    49

Figure 4: Synthesis results over the 30 longest running benchmarks from the SyGuS Competition's PBE Strings track.

The SyGuS competition scores each tool using the formula: N + 3F + S, where N is the number of benchmarks solved (non-timeouts), F is based on a “pseudo-logarithmic scale” (Alur et al. 2017) indicating speed of synthesis, and S is based on a “pseudo-logarithmic scale” indicating size of the synthesized solution. On all three of these measurements, GRT+CVC4 performed better than CVC4. There are a number of other synthesis tracks available in the SyGuS competition, which do not involve PBE constraints. We note that our approach can selectively be applied as a preprocessing step for input in the PBE track without incurring an overhead on other synthesis tasks.

Although we implemented a strategy to manage a switch from the reduced grammar back to the full grammar, we found in practice that the optimal strategy for our system was to exclusively use the reduced grammar. Because we had conservatively pruned the grammar, we had no need to switch back to the full grammar.

Conclusions

In a way, by training on a dataset we generate from the output of the interpreter of the language, we are encoding an approximation of the semantics into our neural network. While the semantic approximation is too coarse to drive synthesis itself, we can use it to prune the search space of potential programs. By predicting terminals' impact on synthesis time, we more conservatively remove only terminals likely to have a positive impact. In conjunction with analytically driven tools, we can then significantly improve synthesis times with very little overhead.

While we have presented GRT, which demonstrates a significant gain in performance over all existing SyGuS solvers, we still have many opportunities for further improvement.
In our prediction of the potential time saved by removing a terminal from the grammar, we have simply used the average expected value over all samples in the dataset. By using a neural network here, we may be able to leverage some property of the SyGuS problem constraints to make more accurate potential time savings predictions. This would allow us, possibly in combination with a more advanced prediction combination strategy, to more aggressively prune the grammar. The drawback to this approach is that we may then potentially remove too much from the grammar. One of the key features of GRT is that it introduces no new timeouts; that is, it does not remove any critical parts of the grammar.

Additionally, our prediction of the criticality of a terminal uses a voting mechanism to combine the predictions based on each constraint. While this worked well in practice, this strategy ignores the potential for interaction between constraints. In our preliminary exploration, we were not able to construct a model that captures this inter-constraint interaction in a useful way. This may be a path for future work. In a similar vein, there exist a number of other works that define a criticality measure for each terminal in the SyGuS grammar (Balog et al. 2017; Kalyan et al. 2018). It may be possible to leverage these in place of our criticality measure, and in combination with our time savings prediction, to achieve better results.

So far we have only explored the PBE Strings track of the SyGuS Competition. The competition also features a PBE BitVectors track where our technique may have significant gains as well. This would require a new encoding scheme, but the overall approach would remain similar. In general, extending this work to allow for other PBE types, as well as more general constraints, would broaden the potential real-world application of SyGuS.
Acknowledgments
This work was supported in part by NSF grants CCF-1302327, CCF-1715387, and CCF-1553168.
References

[Alur et al. 2013] Alur, R.; Bodik, R.; Juniwal, G.; Martin, M. M.; Raghothaman, M.; Seshia, S. A.; Singh, R.; Solar-Lezama, A.; Torlak, E.; and Udupa, A. 2013. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design (FMCAD), 1–8. IEEE.

[Alur et al. 2017] Alur, R.; Fisman, D.; Singh, R.; and Solar-Lezama, A. 2017. SyGuS-Comp 2017: Results and analysis. arXiv preprint arXiv:1711.11438.

[Alur et al. 2019] Alur, R.; Fisman, D.; Padhi, S.; Reynolds, A.; Singh, R.; and Udupa, A. 2019. The 6th competition on syntax-guided synthesis. https://sygus.org/comp/2019/results-slides.pdf. Accessed: 2019-11-20.

[Andrychowicz and Kurach 2016] Andrychowicz, M., and Kurach, K. 2016. Learning efficient algorithms with hierarchical attentive memory. arXiv preprint arXiv:1602.03218.

[Balog et al. 2017] Balog, M.; Gaunt, A. L.; Brockschmidt, M.; Nowozin, S.; and Tarlow, D. 2017. DeepCoder: Learning to write programs. In International Conference on Learning Representations.

[Barrett et al. 2011] Barrett, C.; Conway, C. L.; Deters, M.; Hadarean, L.; Jovanović, D.; King, T.; Reynolds, A.; and Tinelli, C. 2011. CVC4. In International Conference on Computer Aided Verification, 171–177. Springer.

[Bunel et al. 2018] Bunel, R.; Hausknecht, M.; Devlin, J.; Singh, R.; and Kohli, P. 2018. Leveraging grammar and reinforcement learning for neural program synthesis. arXiv preprint arXiv:1805.04276.

[Chen, Liu, and Song 2017] Chen, X.; Liu, C.; and Song, D. 2017. Learning neural programs to parse programs. CoRR abs/1706.01284.

[Church 1963] Church, A. 1963. Application of recursive arithmetic to the problem of circuit synthesis. Journal of Symbolic Logic.

[Cypher et al. 1993] Cypher, A., et al., eds. 1993. Watch What I Do: Programming by Demonstration. Cambridge, MA, USA: MIT Press.

[Devlin et al. 2017a] Devlin, J.; Bunel, R. R.; Singh, R.; Hausknecht, M.; and Kohli, P. 2017a. Neural program meta-induction. In Advances in Neural Information Processing Systems, 2080–2088.

[Devlin et al. 2017b] Devlin, J.; Uesato, J.; Bhupatiraju, S.; Singh, R.; Mohamed, A.-r.; and Kohli, P. 2017b. RobustFill: Neural program learning under noisy I/O. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 990–998. JMLR.org.

[Feng et al. 2017] Feng, Y.; Martins, R.; Wang, Y.; Dillig, I.; and Reps, T. W. 2017. Component-based synthesis for complex APIs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, 599–612.

[Graves, Wayne, and Danihelka 2014] Graves, A.; Wayne, G.; and Danihelka, I. 2014. Neural Turing machines. arXiv preprint arXiv:1410.5401.

[Gulwani 2011] Gulwani, S. 2011. Automating string processing in spreadsheets using input-output examples. In POPL '11, January 26-28, 2011, Austin, Texas, USA.

[Gvero et al. 2013] Gvero, T.; Kuncak, V.; Kuraj, I.; and Piskac, R. 2013. Complete completion using types and weights. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, Seattle, WA, USA, June 16-19, 2013, 27–38.

[Joulin and Mikolov 2015] Joulin, A., and Mikolov, T. 2015. Inferring algorithmic patterns with stack-augmented recurrent nets. In Advances in Neural Information Processing Systems, 190–198.

[Kaiser and Sutskever 2015] Kaiser, Ł., and Sutskever, I. 2015. Neural GPUs learn algorithms. arXiv preprint arXiv:1511.08228.

[Kalyan et al. 2018] Kalyan, A.; Mohta, A.; Polozov, O.; Batra, D.; Jain, P.; and Gulwani, S. 2018. Neural-guided deductive search for real-time program synthesis from examples. arXiv preprint arXiv:1804.01186.

[Kuncak et al. 2010] Kuncak, V.; Mayer, M.; Piskac, R.; and Suter, P. 2010. Complete functional synthesis. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2010, Toronto, Ontario, Canada, June 5-10, 2010, 316–329.

[Manna and Waldinger 1979] Manna, Z., and Waldinger, R. 1979. A deductive approach to program synthesis. In Proceedings of the Sixth International Joint Conference on Artificial Intelligence, IJCAI 79, Tokyo, Japan, August 20-23, 1979, 2 Volumes, 542–551.

[Nötzli et al. 2019] Nötzli, A.; Reynolds, A.; Barbosa, H.; Niemetz, A.; Preiner, M.; Barrett, C.; and Tinelli, C. 2019. Syntax-guided rewrite rule enumeration for SMT solvers. In SAT.

[Nye et al. 2019] Nye, M.; Hewitt, L.; Tenenbaum, J.; and Solar-Lezama, A. 2019. Learning to infer program sketches. arXiv preprint arXiv:1902.06349.

[Padhi et al. 2019] Padhi, S.; Millstein, T. D.; Nori, A. V.; and Sharma, R. 2019. Overfitting in synthesis: Theory and practice. In Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I, 315–334.

[Raghothaman and Udupa 2019] Raghothaman, M., and Udupa, A. 2019. Language to specify syntax-guided synthesis problems. https://sygus.org/assets/pdf/SyGuS-IF.pdf. Accessed: 2019-11-20.

[Santolucito et al. 2017] Santolucito, M.; Zhai, E.; Dhodapkar, R.; Shim, A.; and Piskac, R. 2017. Synthesizing configuration file specifications with association rule learning. PACMPL.

In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019, Glasgow, Scotland, UK, May 04-09, 2019.

[Shin et al. 2019] Shin, R.; Kant, N.; Gupta, K.; Bender, C.; Trabucco, B.; Singh, R.; and Song, D. 2019. Synthetic datasets for neural program synthesis. In International Conference on Learning Representations.

[Si et al. 2018] Si, X.; Yang, Y.; Dai, H.; Naik, M.; and Song, L. 2018. Learning a meta-solver for syntax-guided program synthesis.

[Wang et al. 2018] Wang, C.; Huang, P.-S.; Polozov, A.; Brockschmidt, M.; and Singh, R. 2018. Execution-guided neural program decoding. arXiv preprint arXiv:1807.03100.