RbSyn: Type- and Effect-Guided Program Synthesis

Abstract

In recent years, researchers have explored component-based synthesis, which aims to automatically construct programs that operate by composing calls to existing APIs. However, prior work has not considered efficient synthesis of methods with side effects, e.g., web app methods that update a database. In this paper, we introduce RbSyn, a novel type- and effect-guided synthesis tool for Ruby. An RbSyn synthesis goal is specified as the type for the target method and a series of test cases it must pass. RbSyn works by recursively generating well-typed candidate method bodies whose write effects match the read effects of the test case assertions. After finding a set of candidates that separately satisfy each test, RbSyn synthesizes a solution that branches to execute the correct candidate code under the appropriate conditions. We formalize RbSyn on a core, object-oriented language $\lambda_{syn}$ and describe how the key ideas of the model are scaled up in our implementation for Ruby. We evaluated RbSyn on 19 benchmarks, 12 of which come from popular, open-source Ruby apps. We found that RbSyn synthesizes correct solutions for all benchmarks, with 15 benchmarks synthesizing in under 9 seconds, while the slowest benchmark takes 83 seconds. Using observed reads to guide synthesis is effective: using type-guidance alone times out on 10 of 12 app benchmarks. We also found that using less precise effect annotations leads to worse synthesis performance. In summary, we believe type- and effect-guided synthesis is an important step forward in synthesis of effectful methods from test cases.



Sankha Narayan Guria

University of Maryland, College Park, Maryland, USA

Jeffrey S. Foster

Tufts University, Medford, Massachusetts, USA

David Van Horn

University of Maryland, College Park, Maryland, USA

A key task in modern software development is writing code that composes calls to existing APIs, such as from a library or framework.

Component-based synthesis aims to carry out this task automatically, and researchers have shown how to perform component-based synthesis using SMT solvers [22]; how to synthesize branch conditions [27]; and how to perform synthesis given a very large number of components [10]. This prior work guides the synthesis process using types or special properties of the synthesis domain, which is critical to achieving good performance. However, prior work does not explicitly consider side effects, which are pervasive in many domains. For example, consider synthesizing a method that updates a database. Without reasoning about effects (in this case, that the method body needs to change the database), synthesis of such a method reduces to brute-force search, limiting its performance.

Conference'17, July 2017, Washington, DC, USA

In this paper, we address this issue by introducing RbSyn, a new tool for synthesizing Ruby methods. In RbSyn, the user specifies the desired method by its type signature and a series of test cases it must pass. RbSyn then searches for a solution by enumerating candidates and checking them against the tests. The key novelty of RbSyn is that the search is both type- and effect-guided. Specifically, the search begins with a typed hole tagged with the method's return type. Each step either replaces a typed hole with an expression of that type, possibly introducing more typed holes; inserts an effect hole, annotated with a write effect that may be needed to satisfy a test assertion; or replaces an effect hole with an expression with the given write effect, possibly inserting another effect hole. Once this process finds a set of method bodies that cumulatively pass all tests, RbSyn uses a novel merging strategy to construct a complete solution: it creates a method whose body branches among the conditions, executing the corresponding (passing) code, thus yielding a single method that passes all tests. (§ 2 gives a complete example of RbSyn's synthesis process.)

We formalize RbSyn for $\lambda_{syn}$, a core object-oriented language. The synthesis algorithm consists of three parts. The first part, type-guided synthesis, is similar to prior work [14, 26, 28], but is geared towards imperative, object-oriented programs. The second part is effect-guided synthesis, which tries to fill an effect hole $\Diamond : \epsilon$ with an expression with effect $\epsilon$. In $\lambda_{syn}$, an effect accesses a region $A.r$, where $A$ is a class and $r$ is an uninterpreted identifier. For example, Post.author might indicate reading instance field author of class Post. This notion of effects balances precision and tractability: effects are precise enough to guide synthesis effectively, yet coarse enough that reasoning about them is simple. The last part of the synthesis algorithm synthesizes branch conditions to create a merged program that combines solutions for individual tests into an overall solution for the complete problem. (§ 3 discusses our formalism.)

Our implementation of RbSyn is built on top of RDL, a Ruby type system [13]. Our implementation extends RDL to include effect annotations, including a self region to give more precise effect information in the presence of inheritance. Our implementation also makes use of RDL's type-level computations [23] to provide precise typing during synthesis. Finally, when searching for solutions, our implementation heuristically prioritizes further exploration of candidates that are small and have passed more assertions. (§ 4 describes our implementation.)

    define :update_post,
      "(Str, Str, {author: ?Str, title: ?Str, slug: ?Str}) → Post", [User, Post] do

      spec "author can only change titles" do
        setup {
          seed_db
          @post = Post.create(author: 'author', slug: 'hello-world', title: 'Hello World')
          update_post('author', 'hello-world', author: 'dummy', title: 'Foo Bar', slug: 'foobar')
        }
        postcond { |updated|
          assert { updated.id == @post.id }
          assert { updated.author == "author" }
          assert { updated.title == "Foo Bar" }
          assert { updated.slug == 'hello-world' }
        }
      end

      spec "other users cannot change anything" do
        setup { ...
          update_post('dummy', ...)
        }
        postcond { |updated| ...
          assert { updated.title == "Hello World" }
        }
      end
    end

Figure 1. Specification for update_post method

We evaluated RbSyn on a suite of 19 benchmarks, including seven benchmarks we wrote and 12 benchmarks extracted from three widely used, open-source Ruby apps: Discourse, Gitlab, and Diaspora. For the former, we wrote our own specifications. For the latter, we used unit tests that came with the benchmarks. We found that RbSyn synthesizes correct solutions for all benchmarks and does so quickly, taking less than 9 seconds each for 15 of the benchmarks, and 83 seconds for the slowest benchmark. Moreover, type- and effect-guidance is critical. Without it, a majority of the benchmarks time out after five minutes. Finally, we examine the tradeoff of effect precision versus performance. We found that restricting effects to class names only causes 3 benchmarks to time out, and restricting effects to only purity/impurity causes 10 benchmarks to time out. (§ 5 discusses the evaluation in detail.)

We believe that RbSyn is an important step forward in synthesis of effectful methods from test cases.

In this section, we illustrate RbSyn by using it to synthesize a method from a hypothetical web blogging app. This app makes heavy use of ActiveRecord, a popular database access library for Ruby on Rails. It is the ActiveRecord methods whose side effects RbSyn uses to guide synthesis.

Figure 1 shows the synthesis problem. This particular app includes database tables for users and posts. In ActiveRecord, rows of these tables are represented as instances of classes User and Post, respectively. For reference, the table schemas are shown in lines 1 and 2. Each user has a name and username. Each post has the author's username, the post's title, and a slug, used to compute a permalink.

The goal of this particular synthesis problem, given by the call to define, is to create a method update_post that allows users to change the information about a post. Lines 4 and 5 specify the method's type signature in the format of RDL [13], a Ruby type system that RbSyn uses for types and type checking. Here, the first two arguments are strings, and the last is a finite hash type that describes an instance of Hash with optional (indicated by ?) keys author, title, and slug (all symbols, which are just interned strings) that map to strings. The method itself returns a Post.

In addition to the type signature, the synthesis problem also includes a list of constants that can be used in the target method. In this case, those constants are the classes User and Post, as given by the last argument to define on line 5. These classes can then be used to invoke singleton (class) methods in the synthesized method.

Finally, the synthesis problem includes a number of specs, which are just test cases. Each spec has a title, for human convenience; a setup block to establish any necessary preconditions and call the synthesized method; and a postcond block with assertions that must hold after the synthesized method runs. As we will see below, separating the pre- and postconditions allows RbSyn to more easily use effects to guide synthesis. In this example, both specs add a few users and a post created by each of them to the database (call to seed_db, details not shown) and then create a post titled "Hello World" by the user author. The first spec asserts that update_post allows author to update a post's title. The second spec asserts that a dummy user cannot update the post. The check for id ensures that only existing posts are updated (any new posts will have a new unique id).

The final, synthesized solution is shown on the right of Figure 2. Notice the synthesized code calls several ActiveRecord methods (exists?, where, and first) as well as the hash access method []. Applying solver-aided synthesis to this problem would require developing accurate models of these methods, which is a difficult challenge [25]. To address this limitation, RbSyn instead enumerates candidates, which can then be run to check them against the specs. As the search space is vast, RbSyn uses update_post's type signature and the effects from the specs' postconds to guide the search. Finally, RbSyn uses a novel merging algorithm to synthesize the necessary branch condition to yield a solution that satisfies both specs.

The left portion of Figure 2 shows the search process RbSyn uses to solve this synthesis problem. To begin, RbSyn observes that the return type of update_post is Post.
Thus, the search begins (upper left) by creating a candidate method body □:Post, which is a typed hole that must be filled by an expression of type Post. RbSyn then iteratively expands holes in candidates, running the specs whenever it produces fully concretized candidates with no holes.

In general, RbSyn can fill a typed hole with a local variable, a constant, or a method call. As there are no local variables (which so far are just parameters) or constants of the appropriate type, RbSyn chooses a method call. To do so, it searches through the available method type annotations to find those that could return Post. In this case, RbSyn takes advantage of RDL's type annotations for ActiveRecord [23] to synthesize candidates C1 and C2, among others (not shown). It is straightforward for the user to add type annotations for any other library methods that might be needed by the synthesized method. For illustration purposes, we also show a candidate C3 that returns the wrong type. Such candidates are discarded by RbSyn, vastly reducing the search space.

Next, RbSyn tries to fill holes in candidate expressions, starting with smaller candidates. In this case, it first considers C1, which has a hole of type Class<Post>, which is the singleton type for the constant Post. Thus, there is only one choice for the hole, yielding candidate C4. Since C4 has no holes, RbSyn runs it against the specs. More specifically, it runs it against the first spec; as we will discuss shortly, RbSyn synthesizes solutions for each spec independently, and then combines them. In this case, C4 fails the spec (because the first post in the database is not the one to be updated, due to the initial database seeding) and hence is rejected.

Continuing with C5, RbSyn fills in the (finite hash-typed) hole, yielding choices that include C6 and C7. RbSyn rejects C6 since there is no way to construct an expression of type Int. However, for C7, there are two local variables of type Str from the method arguments. Substituting these yields C8 and C9. C8 uses arg0, the username, to query the Post table's slug, so it fails. C9 queries the Post table with the correct slug value arg1. This passes the first two assertions (line 15 onwards) but fails the third, which expects the post title to be updated from "Hello World" to "Foo Bar."

RbSyn extends RDL's type annotations to include read and write effects. When the expression inside an assert evaluates to false, RbSyn infers the assert's read and write effects based on those of the methods it calls. For example, we can give the Post#title method, used by the third assertion, the following signature:

    type Post, :title, '() → Str', read: ['Post.title']

(Here A#m indicates instance method m of class A.) Thus, RbSyn sees that the failing assertion reads Post.title, an abstract effect label. To make the assertion succeed, RbSyn inserts an effect hole ◇:Post.title in the candidate program (C10). It also saves the value of the previous candidate expression in a temporary variable, and inserts a hole with the candidate's type at the end. RbSyn then continues the search, trying to fill the effect hole with a call to a method whose write effect matches the hole; such a call could potentially satisfy the failed assertion. Here, RbSyn replaces the effect hole (C11) with a call to Post#title=, which is such a method. (We should note that all previous candidates that failed a spec due to a side effect will also have effect holes added in a similar fashion. We omit these candidates from the discussion as they do not lead to a solution.)

RbSyn continues by using type-guided synthesis for the typed holes of C11, yielding C12 (rejected due to assertion failures) and then C13. After several steps (not shown), RbSyn arrives at C14, which fails the spec, and C15, which fully satisfies the first spec. Indeed, we see this exact expression in lines 4–6 of the solution in Figure 2.
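
The type-guided expansion step described above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not RbSyn's implementation: the `Sig` struct, the toy `CLASS_TABLE`, and `fill_typed_hole` are hypothetical names, types are plain strings, and matching is by exact type equality rather than subtyping.

```ruby
# Hypothetical, simplified method signature: receiver type, name,
# argument type (nil when there is none), and return type.
Sig = Struct.new(:recv, :name, :arg, :ret)

# A toy class table loosely modeled on the calls used in the example.
CLASS_TABLE = [
  Sig.new("Class<Post>", :first,   nil,    "Post"),
  Sig.new("Class<Post>", :exists?, "Hash", "Bool"),
  Sig.new("Post",        :title,   nil,    "Str")
].freeze

# Enumerate one-step expansions of a typed hole: local variables of the
# requested type first, then method calls whose return type matches.
# Receivers and arguments become new typed holes, written "(□:Type)".
def fill_typed_hole(type, locals, table = CLASS_TABLE)
  vars = locals.select { |_name, t| t == type }.keys.map(&:to_s)
  calls = table.select { |sig| sig.ret == type }.map do |sig|
    args = sig.arg ? "(□:#{sig.arg})" : ""
    "(□:#{sig.recv}).#{sig.name}#{args}"
  end
  vars + calls
end

# The parameters of update_post are the only local variables at the start.
LOCALS = { arg0: "Str", arg1: "Str", arg2: "Hash" }.freeze
POST_CANDIDATES = fill_typed_hole("Post", LOCALS)
STR_CANDIDATES  = fill_typed_hole("Str", LOCALS)
```

In this sketch a hole of type Post can only be filled by a call, mirroring how the search above has to start with a method call, while a hole of type Str can also be filled by arg0 or arg1. A call such as exists?, which returns a boolean, is never proposed for a Post hole, which is the filtering that discards C3.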

RbSyn next uses the same technique to synthesize an expression that satisfies the second spec, yielding the expression shown on line 8. Now RbSyn needs to merge these individual solutions into a single solution that passes all specs. At a high level, it does so by constructing a program if $b_1$ then $e_1$ elsif $b_2$ then $e_2$ end, where the $e_i$ are the solutions for the specs and the $b_i$ are branch conditions capturing the conditions under which those expressions pass the specs. To create the $b_i$, RbSyn uses the same technique again, this time synthesizing a boolean-valued expression that evaluates to true under the setup of spec $i$. In this case, this process results in the same branch condition true for both specs. However, since this trivially holds for both specs, this branch condition does not work: we need to find a branch condition that distinguishes the two cases.

Next RbSyn tries to synthesize a branch condition $b_1'$ that evaluates to true for the setup of the first spec and false for the setup of the second. This yields the more precise branch condition $b_1'$ = Post.exists?(author: arg0, slug: arg1). This is a sufficient condition, as the update_post method is supposed to update a post only if a post with slug arg1 is authored by arg0. It solves an analogous synthesis problem for the second spec, yielding $b_2'$ = !Post.exists?(author: arg0, slug: arg1). As these are the negation of each other, RbSyn then merges these two together as if-then-else (rather than an if-then-elsif-then-else), yielding the final synthesized program in Figure 2.

Left (the search starts from □:Post; ✗ marks a rejected candidate, ✓ the accepted one):

    C1   (□:Class<Post>).first
    C2   (□:Class<Post>).where(□:{...}).first
    C3   (□:Class<Post>).exists?(□:{...})                        ✗ Type Error
    C4   Post.first                                              ✗ Test Failure
    C5   Post.where(□:{id: Int, slug: Str, ...}).first
    C6   Post.where({ id: (□:Int)}).first                        ✗ No Terms
    C7   Post.where({ slug: (□:Str)}).first
    C8   Post.where({ slug: arg0}).first                         ✗ Test Failure
    C9   Post.where({ slug: arg1}).first                         ✗ Test Failure (Effect: Post.title)
    C10  t0 = Post.where({ slug: arg1}).first; (◇:Post.title); □:Post
    C11  t0 = Post.where({ slug: arg1}).first; (□:Post).title = (□:Str); □:Post
    C12  t0 = Post.where({ slug: arg1}).first; t0.title = arg0; t0                 ✗ Test Failure
    C13  t0 = Post.where({ slug: arg1}).first; t0.title = (□:{ author: Str, title: Str, ...})[□: author or title or ...]; t0
    C14  t0 = Post.where({ slug: arg1}).first; t0.title = arg2[:author]; t0        ✗ Test Failure
    C15  t0 = Post.where({ slug: arg1}).first; t0.title = arg2[:title]; t0         ✓

Right:

    def update_post(arg0, arg1, arg2)
      if Post.exists?(author: arg0,
                      slug: arg1)
        t0 = Post.where(slug: arg1).first
        t0.title = arg2[:title]
        t0
      else
        Post.where(slug: arg1).first
      end
    end

Figure 2. Left: Steps in the synthesis of solution to the first specification. Some choices available to the synthesis algorithm have been omitted for simplicity. Right: Synthesized update_post method.
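
To see that the merged program behaves as the two specs require, it can be run against a small in-memory stand-in for the Post model. The stand-in below is an assumption made so that the sketch runs without Rails: it imitates only create, where, exists?, and the attribute accessors, and where returns a plain Array rather than an ActiveRecord relation. The update_post body is the one from Figure 2.

```ruby
# In-memory stand-in for the ActiveRecord Post model (hypothetical; only the
# calls used by the synthesized solution are imitated).
class Post
  attr_accessor :id, :author, :title, :slug

  @rows = []

  class << self
    attr_reader :rows

    def create(author:, title:, slug:)
      post = new
      post.id = @rows.size + 1
      post.author = author
      post.title = title
      post.slug = slug
      @rows << post
      post
    end

    # Returns the matching rows as an Array, so .first works on the result.
    def where(conds)
      @rows.select { |row| conds.all? { |key, value| row.public_send(key) == value } }
    end

    def exists?(conds)
      !where(conds).empty?
    end
  end
end

# The synthesized method from Figure 2.
def update_post(arg0, arg1, arg2)
  if Post.exists?(author: arg0, slug: arg1)
    t0 = Post.where(slug: arg1).first
    t0.title = arg2[:title]
    t0
  else
    Post.where(slug: arg1).first
  end
end

ORIGINAL = Post.create(author: "author", title: "Hello World", slug: "hello-world")

# Second spec: a different user cannot change anything.
UNCHANGED_TITLE = update_post("dummy", "hello-world", { title: "Foo Bar" }).title

# First spec: the author can change the title, and only the title.
UPDATED = update_post("author", "hello-world",
                      { author: "dummy", title: "Foo Bar", slug: "foobar" })
```

The second spec is exercised first here so that the title is still "Hello World" when the dummy user calls the method; the real specs each start from a fresh setup instead.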

$$
\begin{array}{llcl}
\text{Values} & v & ::= & \mathsf{nil} \mid \mathsf{true} \mid \mathsf{false} \mid [A] \\
\text{Expressions} & e & ::= & v \mid x \mid e; e \mid e.m(e) \mid \mathsf{if}\ b\ \mathsf{then}\ e\ \mathsf{else}\ e \\
 & & \mid & \mathsf{let}\ x = e\ \mathsf{in}\ e \mid \Box : \tau \mid \Diamond : \epsilon \\
\text{Conditionals} & b & ::= & e \mid\ !b \mid b \lor b \\
\text{Types} & \tau & ::= & A \mid \tau \cup \tau \\
\text{Programs} & P & ::= & \mathsf{def}\ m(x) = e \\
\text{Specs} & s & ::= & \langle S, Q \rangle \\
\text{Setup} & S & ::= & e;\ x_r = P(e) \\
\text{Postconditions} & Q & ::= & \mathsf{assert}\ e \mid Q; Q \\
\text{Spec Set} & \Psi & := & \{ s_i \} \\
\text{Synthesis Goal} & G & ::= & \langle \tau_1 \to \tau_2, \Psi \rangle \\
\text{Class Table} & CT & ::= & \emptyset \mid A.m : \sigma, CT \\
\text{Method Types} & \sigma & ::= & \tau \xrightarrow{\langle \epsilon_r, \epsilon_w \rangle} \tau \\
\text{Type Env.} & \Gamma & ::= & \emptyset \mid x : \tau, \Gamma \\
\text{Dynamic Env.} & E & ::= & x \to v \\
\text{Constants} & \Sigma & ::= & \emptyset \mid v : \tau, \Sigma \\
\text{Effect} & \epsilon & ::= & \bullet \mid * \mid A.* \mid A.r \mid \epsilon \cup \epsilon
\end{array}
$$

$x \in$ variables, $m \in$ methods, $A \in$ classes, $r \in$ effect regions

$$
\bullet \subseteq \epsilon \qquad \epsilon \subseteq * \qquad A_1.* \subseteq A_2.* \ \text{and}\ A_1.r \subseteq A_2.r \ \text{and}\ A_1.r \subseteq A_2.* \ \text{if}\ A_1 \le A_2
$$

$$
\epsilon_1 \subseteq \epsilon_1 \cup \epsilon_2 \qquad \epsilon_2 \subseteq \epsilon_1 \cup \epsilon_2 \qquad \langle \epsilon_r^1, \epsilon_w^1 \rangle \cup \langle \epsilon_r^2, \epsilon_w^2 \rangle = \langle \epsilon_r^1 \cup \epsilon_r^2,\ \epsilon_w^1 \cup \epsilon_w^2 \rangle
$$

$$
\mathsf{Nil} \le \tau \qquad \tau \le \mathsf{Obj} \qquad \tau_1 \le \tau_1 \cup \tau_2 \qquad \tau_2 \le \tau_1 \cup \tau_2
$$

Figure 3. Syntax and Relations of $\lambda_{syn}$.

In this section, we formalize RbSyn on $\lambda_{syn}$, a core object-oriented calculus shown in Figure 3. Values $v$ include nil, true, false, and objects $[A]$ of class $A$. Note that we omit fields to keep the presentation simpler. Expressions $e$ include values, variables $x$, sequences $e; e$, method calls $e.m(e)$, conditionals if $b$ then $e$ else $e$, and variable bindings let $x = e$ in $e$. A conditional guard $b$ can be an expression $e$, a negation $!b$, or a disjunction $b \lor b$. The grammar for guards is limited to match what RbSyn can actually synthesize.

Expressions also include typed holes $\Box : \tau$ and effect holes $\Diamond : \epsilon$, which are placeholders that are eventually filled with an expression of the given type, or expression with the given write effect, respectively. We note our synthesis algorithm only inserts effect holes at positions that can have any type. Types are either classes or unions of types, and we assume classes form a lattice with Nil (the class of nil) as the bottom element and Obj as the top element. We write $A \le B$ when class $A$ is a subclass of $B$ according to the lattice. We defer the definition of effects for the moment. Finally, a synthesized program $P$ is a single method definition def $m(x) = e$. We restrict the method to one argument for convenience.

A spec $s$ in $\lambda_{syn}$ is a pair of setup code $S$ and a postcondition $Q$. A setup $e_1;\ x_r = P(e_2)$ includes some initialization $e_1$ followed by a special form indicating calling the synthesized method in $P$ with argument $e_2$ and binding the result to $x_r$. The postcondition is a sequence of assertions that can test $x_r$ and inspect the global state using library methods. We write $\Psi$ for a set of specs, and a synthesis goal $G$ is a pair $\langle \tau_1 \to \tau_2, \Psi \rangle$, where $\tau_1$ and $\tau_2$ are the method's domain and range types, respectively, and $\Psi$ are the specs the synthesized method should satisfy.

The next part of Figure 3 defines additional notation used in the formalism. Synthesized methods can use classes and methods from a class table $CT$, which maps class and method names to the methods' types. For example, the class table has type information for other methods of a target app and library methods such as those from ActiveRecord. A method type $\sigma$ has the form $\tau \xrightarrow{\langle \epsilon_r, \epsilon_w \rangle} \tau'$, where $\tau$ and $\tau'$ are the domain and range types, respectively, and $\langle \epsilon_r, \epsilon_w \rangle$ specifies the method's read effect $\epsilon_r$ and write effect $\epsilon_w$ (discussed shortly). During type-guided synthesis, RbSyn maintains a type environment $\Gamma$ mapping variables to their types. When executing a synthesized program, the operational semantics (omitted) uses a dynamic environment $E$ mapping variables to their values. During synthesis, $\Sigma$ is a list of user-supplied constants that can fill holes.

Effects. The last part of Figure 3 defines effects $\epsilon$. In RbSyn, effects are hierarchical names that abstractly label the program state. The empty effect $\bullet$ denotes no side effect, used for pure computations. The effect $*$ is the top effect, indicating a computation that might touch any state in the program. Lastly, effect $A.*$ denotes code that touches any state within class $A$, and $A.r$ denotes code that touches the region labeled $r$ in $A$, where region names are completely abstract. Effects can also be unioned together.

We define subsumption $\epsilon_1 \subseteq \epsilon_2$ on effects to hold when $\epsilon_2$ may include $\epsilon_1$. Effects $\bullet$ and $*$ are the bottom and top, respectively, of the $\subseteq$ relation, and if $A_1 \le A_2$ then $A_1.r \subseteq A_2.r$ and $A_1.r \subseteq A_2.*$ and $A_1.* \subseteq A_2.*$. We also have standard rules for subsumption with effect unions.

In RbSyn, all effects arise from calling methods from the class table $CT$, which have effect annotations of the form $\langle \epsilon_r, \epsilon_w \rangle$, where $\epsilon_r$ and $\epsilon_w$ are the method's read and write effects, respectively. We extend subsumption to such paired effects in the natural way. During synthesis, if RbSyn observes the failure of an assertion with some read effect $\epsilon_r$, it tries to fix the failure by inserting a call to some method with write effect $\epsilon_w$ such that $\epsilon_r \subseteq \epsilon_w$, i.e., it tries writing to the state that is read. For example, in Section 2, this technique generated a call to Post#title=.

Our effect language is inspired by the region path lists approach of Bocchino Jr et al. [4], but is much simpler. We opted for coarse-grained, abstract effects to make it easier to write annotations for library methods. Although class names are included in the effect language, such names are for human convenience only: nothing precludes a method in class $A$ being annotated with an effect to $B.r$ for some other class $B$. We found that this approach works well for our problem setting of synthesizing code for Ruby apps, where trying to precisely model heap and database state would be difficult.
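However, we believe the core of this approach (pairing effects, in our case reads and writes, and then creating candidates using the opposing element of such a pair) can be generalized to more complex effect systems.

Synthesis Problem. We can now formally specify the synthesis problem. Given a synthesis goal $\langle \tau_1 \to \tau_2, \{\langle S_i, Q_i \rangle\} \rangle$, RbSyn searches for a program $P$ such that, for all $i$, assuming that $S_i$ calls $P$ with an argument of type $\tau_1$, evaluating to $x_r$ of type $\tau_2$, it is the case that $P \vdash S_i; Q_i \Downarrow v$. In other words, evaluating the setup followed by the postcondition yields any value rather than aborting with a failed assertion. We omit the evaluation rules as they are standard.

$$\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e_2 : \tau$$

$$\dfrac{\Gamma(x) = \tau}{\Sigma, \Gamma \vdash_{CT} x \rightsquigarrow x : \tau}\ \text{T-Var}$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e_1' : \tau_1 \qquad \Sigma, \Gamma[x \mapsto \tau_1] \vdash_{CT} e_2 \rightsquigarrow e_2' : \tau_2}{\Sigma, \Gamma \vdash_{CT} \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \rightsquigarrow \mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2' : \tau_2}\ \text{T-Let}$$

$$\dfrac{}{\Sigma, \Gamma \vdash_{CT} \Box : \tau \rightsquigarrow (\Box : \tau) : \tau}\ \text{T-Hole}$$

$$\dfrac{v : \tau_1 \in \Sigma \qquad \tau_1 \le \tau_2}{\Sigma, \Gamma \vdash_{CT} \Box : \tau_2 \rightsquigarrow v : \tau_1}\ \text{S-Const}$$

$$\dfrac{\Gamma(x) = \tau_1 \qquad \tau_1 \le \tau_2}{\Sigma, \Gamma \vdash_{CT} \Box : \tau_2 \rightsquigarrow x : \tau_1}\ \text{S-Var}$$

$$\dfrac{m : \tau_1 \to \tau_2 \in CT(A) \qquad \tau_2 \le \tau_3}{\Sigma, \Gamma \vdash_{CT} \Box : \tau_3 \rightsquigarrow (\Box : A).m(\Box : \tau_1) : \tau_2}\ \text{S-App}$$

Figure 4. Type-guided synthesis rules (selected).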
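
The effect subsumption relation of Figure 3, and its use to pick methods whose write effect covers a failing assertion's read effect, can be sketched as follows. The encoding is an assumption of this sketch rather than RbSyn's representation: `:pure` stands for the empty effect, `:any` for the top effect, strings such as "Post.title" or "Post.*" for regions, and an Array for a union. The class hierarchy and the effect annotations in `WRITE_EFFECTS` are hypothetical.

```ruby
# Hypothetical class hierarchy for the sketch: child => parent.
PARENTS = { "Post" => "Obj", "DraftPost" => "Post" }.freeze

# a <= b: class a is b or (transitively) inherits from it.
def subclass?(a, b)
  until a.nil?
    return true if a == b
    a = PARENTS[a]
  end
  false
end

# e1 ⊆ e2 for effects encoded as :pure, :any, "A.r", "A.*", or an Array (union).
def subsumed?(e1, e2)
  return e1.all? { |e| subsumed?(e, e2) } if e1.is_a?(Array)
  return true if e1 == :pure || e2 == :any
  return e2.any? { |e| subsumed?(e1, e) } if e2.is_a?(Array)
  return false if e1 == :any || e2 == :pure

  cls1, reg1 = e1.split(".", 2)
  cls2, reg2 = e2.split(".", 2)
  subclass?(cls1, cls2) && (reg2 == "*" || reg1 == reg2)
end

# Toy write-effect annotations, assumed for illustration only.
WRITE_EFFECTS = {
  "Post#title=" => "Post.title",
  "Post#slug="  => "Post.slug",
  "Post#title"  => :pure,
  "Post#update" => "Post.*"
}.freeze

# Methods whose write effect subsumes the read effect of a failed assertion.
def writers_for(read_effect, table = WRITE_EFFECTS)
  table.select { |_name, write| subsumed?(read_effect, write) }.keys
end
```

With these annotations, a failed assertion that reads Post.title would suggest Post#title= and the coarser Post#update as candidate writers, but not Post#slug= or the pure reader Post#title.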

The first component of RbSyn is type-guided synthesis, which creates candidate expressions of a given type by trying to fill a hole $\Box : \tau_2$ where $\tau_2$ is the method return type. Figure 4 shows a subset of the type-guided synthesis rules; the full set can be found in Appendix A.2. These rules have the form $\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e_2 : \tau$, meaning with constants $\Sigma$, in type environment $\Gamma$, under class table $CT$, the holes in $e_1$ can be rewritten to yield $e_2$, which has type $\tau$.

The rules in Figure 4 have two forms. The T- rules apply to expressions whose outermost form is not rewritten. Thus these rules perform standard type checking. For example, T-Var type checks a variable $x$ by checking its type against the type environment $\Gamma$, leaving the term unchanged. T-Let type-checks and recursively rewrites (or not) the subexpressions and then rewrites those new expressions into a let-binding, ensuring the resulting term is type-correct. Finally, T-Hole applies to a typed hole that is not being rewritten, in which case it remains the same and has the given type.

The S- rules rewrite typed holes. S-Const replaces a hole by a constant of the correct type from $\Sigma$. S-Var is similar, replacing a hole by a variable from $\Gamma$. Finally, S-App replaces a hole with a call to a method with the right return type, inserting typed holes for the method receiver and argument.

Type Narrowing. Notice that in these three rules, the term replacing the hole may actually have a subtype of the original hole's type. Thus, type-guided synthesis could narrow types in a synthesized program, potentially also narrowing the search space. For example, consider an expression $(\Box : \mathsf{Str}).\mathsf{append}(\Box : \mathsf{Str})$ that joins two strings, and assume the set of constants $\Sigma$ includes nil. Notice that nil is a valid substitution for $\Box$, which will then cause the type of the receiver to narrow to Nil. But then the typing derivation fails because the Nil type has no append method, stopping further exploration along this path. In contrast, if we had typed the replacement term at Str, then RbSyn would have fruitlessly continued the search, trying various replacements for $\Box$ only to reject them due to a runtime failure for invoking a method on nil.

$$\Sigma, \Gamma, \epsilon_r \vdash_{CT} e_1 \twoheadrightarrow e_2$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} e \rightsquigarrow e : \tau}{\Sigma, \Gamma, \epsilon_r \vdash_{CT} e \twoheadrightarrow \mathsf{let}\ x = e\ \mathsf{in}\ (\Diamond : \epsilon_r;\ \Box : \tau)}\ \text{S-Eff}$$

$$\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e_2 : \tau$$

$$\dfrac{}{\Sigma, \Gamma \vdash_{CT} \Diamond : \epsilon \rightsquigarrow (\Diamond : \epsilon) : \mathsf{Obj}}\ \text{T-EffObj}$$

$$\dfrac{\epsilon_r \subseteq \epsilon_w' \qquad m : \tau_1 \xrightarrow{\langle \epsilon_r', \epsilon_w' \rangle} \tau_2 \in CT(A)}{\Sigma, \Gamma \vdash_{CT} \Diamond : \epsilon_r \rightsquigarrow \Diamond : \epsilon_r';\ (\Box : A).m(\Box : \tau_1) : \tau_2}\ \text{S-EffApp}$$

$$\dfrac{}{\Sigma, \Gamma \vdash_{CT} \Diamond : \epsilon \rightsquigarrow \mathsf{nil} : \mathsf{Nil}}\ \text{S-EffNil}$$

Figure 5. Effect-guided synthesis rules.

The second component of RbSyn is effect-guided synthesis, used when type-guided synthesis creates a candidate that does not satisfy the postcondition of the tests. If this happens, RbSyn computes the effect $\langle \epsilon_r, \epsilon_w \rangle$ of the failed assertion in the postcondition. (We defer the formal rules for computing this effect to Appendix A.1, as they simply union the effects of method calls in the assertion.) Then, we hypothesize that the assertion may have failed because the region denoted by $\epsilon_r$ is in the wrong state.

To potentially fix the state, RbSyn applies a new rule S-Eff, shown in Figure 5. The hypothesis computes the type $\tau$ of $e$, the candidate expression that failed the postcondition. In the conclusion, $e$ is rewritten to let $x = e$ in $(\Diamond : \epsilon_r;\ \Box : \tau)$, i.e., $e$ is computed, bound to $x$, and two holes are sequenced. The first must be filled with an expression of the desired effect $\epsilon_r$. The second must have $e$'s type $\tau$, to preserve type-correctness. For example, it could be filled by $x$, as happened in Figure 2 when t0 is returned.

The rules for working with effect holes are shown in the bottom of Figure 5, which extends Figure 4. T-EffObj gives an effect hole that is not rewritten type Obj. Since this is the top of the type hierarchy, this ensures an effect hole can safely be replaced by a term with any type. In other words, effect holes are filled for their effects, not their types. S-EffApp does the heavy lifting, filling an effect hole with a call to a method $m$ with a write effect $\epsilon_w'$ that subsumes the desired effect $\epsilon_r$. Of course, this call may itself read state $\epsilon_r'$, so the rule precedes the method call with a hole with that effect, in case said state needs to change. Finally, S-EffNil replaces an effect hole with nil, which removes it from the program. This is used in case some extra effect holes are added that are not actually needed.

$$
\begin{aligned}
\langle e_1, b_1, \Psi_1 \rangle \oplus \langle e_2, b_2, \Psi_2 \rangle &= \langle e_1, b_1, \Psi_1 \cup \Psi_2 \rangle && \text{if } e_1 \equiv e_2 \text{ and } b_1 \implies b_2 && (1) \\
\langle e_1, b_1, \Psi_1 \rangle \oplus \langle e_2, b_2, \Psi_2 \rangle &= \langle e_1, b_1 \lor b_2, \Psi_1 \cup \Psi_2 \rangle && \text{if } e_1 \equiv e_2 \text{ and } b_1 \not\!\!\implies b_2 && (2) \\
\langle e_1, b_1, \Psi_1 \rangle \oplus \langle e_2, b_2, \Psi_2 \rangle &= \langle e_1, b_1^{syn}, \Psi_1 \rangle \oplus \langle e_2, b_2^{syn}, \Psi_2 \rangle && \text{if } e_1 \not\equiv e_2 \text{ and } b_1 \implies b_2 && (3)
\end{aligned}
$$

$$
\begin{aligned}
\text{where}\quad & \forall \langle S_i, Q_i \rangle \in \Psi_1.\ \mathsf{def}\ m(x) = b_1^{syn} \vdash S_i;\ \mathsf{assert}\ x_r \Downarrow v \ \land\ \forall \langle S_j, Q_j \rangle \in \Psi_2.\ \mathsf{def}\ m(x) = b_1^{syn} \vdash S_j;\ \mathsf{assert}\ !x_r \Downarrow v \\
\text{and}\quad & \forall \langle S_i, Q_i \rangle \in \Psi_1.\ \mathsf{def}\ m(x) = b_2^{syn} \vdash S_i;\ \mathsf{assert}\ !x_r \Downarrow v \ \land\ \forall \langle S_j, Q_j \rangle \in \Psi_2.\ \mathsf{def}\ m(x) = b_2^{syn} \vdash S_j;\ \mathsf{assert}\ x_r \Downarrow v
\end{aligned}
$$

Figure 6. Rewriting rules.

The last component of RbSyn combines expressions that pass individual specs into a final program that passes all specs. More specifically, given a synthesis goal $\langle \tau_1 \to \tau_2, \{ s_i \} \rangle$, RbSyn first uses type- and effect-guided synthesis to create expressions $e_i$ such that $e_i$ is the solution for spec $s_i$. Then, RbSyn combines the $e_i$ into a branching program roughly of the form if $b_1$ then $e_1$ elsif $b_2$ then $e_2$ . . . for some $b_i$.

For each $i$, RbSyn uses the type-guided synthesis rules in § 3.1 to synthesize a $b_i$ such that under the setup $S_i$ of spec $s_i$, conditional $b_i$ evaluates to true, i.e., def $m(x) = b_i \vdash S_i;\ \mathsf{assert}\ x_r \Downarrow v$. Note effect-guided synthesis is not used here as the asserted expression $x_r$ is pure.

Notice that while each initial $b_i$ evaluates to true under the precondition, there is no guarantee it is a sufficient condition for $s_i$ to satisfy the postcondition, especially because RbSyn aims to synthesize small expressions, as discussed further in § 4. Moreover, there may be multiple $e_i$ that are actually the same expression, and therefore could be combined to yield a smaller solution.

Thus, RbSyn next performs a merging step to create the final solution. This process operates on tuples of the form $\langle e, b, \Psi \rangle$, which is a hypothesis that the program fragment if $b$ then $e$ satisfies the specs $\Psi$.
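
The search for branch conditions can be illustrated with the overview example. The sketch below is hedged: it replaces type-guided synthesis of conditions with a fixed, hand-written candidate list (`CONDITIONS`), and it reduces each spec's setup to a small hash (`SETUP1`, `SETUP2`); all of these names and encodings are assumptions made for illustration. It shows why the initial conditions coincide and how a condition that is true for one spec and false for the other is selected.

```ruby
# Setup states for the two specs of the overview example, reduced to the data
# the conditions inspect (an encoding assumed for this sketch).
POSTS  = [{ author: "author", slug: "hello-world" }].freeze
SETUP1 = { posts: POSTS, arg0: "author", arg1: "hello-world" }.freeze
SETUP2 = { posts: POSTS, arg0: "dummy",  arg1: "hello-world" }.freeze

EXISTS = lambda do |s|
  s[:posts].any? { |post| post[:author] == s[:arg0] && post[:slug] == s[:arg1] }
end

# Candidate conditions, smallest first: [source text, predicate over a setup].
CONDITIONS = [
  ["true", ->(_s) { true }],
  ["Post.exists?(author: arg0, slug: arg1)", EXISTS],
  ["!Post.exists?(author: arg0, slug: arg1)", ->(s) { !EXISTS.call(s) }]
].freeze

# Initial b_i: the smallest condition that is true under the spec's setup.
def initial_condition(setup)
  CONDITIONS.find { |_src, pred| pred.call(setup) }.first
end

# Strengthened b_i: true under every setup in `mine`, false under every
# setup in `theirs`. Returns nil when no candidate separates the two sets.
def distinguishing_condition(mine, theirs)
  found = CONDITIONS.find do |_src, pred|
    mine.all? { |s| pred.call(s) } && theirs.none? { |s| pred.call(s) }
  end
  found && found.first
end

# A tuple <e, b, specs>: the hypothesis that "if b then e" satisfies specs.
Tuple = Struct.new(:expr, :cond, :specs)
T1 = Tuple.new("e1", distinguishing_condition([SETUP1], [SETUP2]), [SETUP1])
T2 = Tuple.new("e2", distinguishing_condition([SETUP2], [SETUP1]), [SETUP2])
```

Both initial conditions come out as the trivial `true`, which cannot select between the two solutions; the strengthened conditions in `T1` and `T2` are the ones that appear, after simplification, in the merged program.
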
RbSyn repeatedly merges such tuples using an operation $\langle e_1, b_1, \Psi_1 \rangle \oplus \langle e_2, b_2, \Psi_2 \rangle$ to represent that $\mathtt{if}\ b_1\ \mathtt{then}\ e_1\ \mathtt{else\ if}\ b_2\ \mathtt{then}\ e_2$ satisfies the specs $\Psi_1 \cup \Psi_2$. We define $\mathit{Specs}(\langle e_1, b_1, \Psi_1 \rangle \oplus \ldots) = \bigcup_i \Psi_i$, i.e., the specs from merged tuples, and $\mathit{Prog}(\langle e_1, b_1, \Psi_1 \rangle \oplus \ldots) = \mathtt{def}\ m(x) = \mathtt{if}\ b_1\ \mathtt{then}\ e_1\ \mathtt{else} \ldots$, a definition with the expression represented by the merged tuples.

Figure 6 defines rewriting rules that are applied to create the final solution. Rule 1 simplifies the case where $e_1$ and $e_2$ are the same and $b_1$ implies $b_2$, yielding a single expression and branch that satisfy $\Psi_1 \cup \Psi_2$. Note we omit the symmetric case for all rules due to space limitations. Rule 2 applies when $b_1$ does not imply $b_2$ but $e_1$ and $e_2$ are the same. In this case, $e_1$ satisfies the union of the specs under the disjunction of the branch conditions. (Note this rule could also be applied if $b_1 \Rightarrow b_2$, but the resulting solution would be longer than the one Rule 1 generates.) Finally, Rule 3 applies when $e_1$ and $e_2$ differ but $b_1$ implies $b_2$. In this scenario, it must be that $b_1$, $b_2$, or both are insufficient to select among the $e_i$. Thus, RbSyn synthesizes stronger conditionals $b_i^{syn}$ that hold for all specs in $\Psi_i$ and do not hold for the specs of the other tuple.

RbSyn also includes a number of other merging rules, deferred to Appendix A.4, for further simplifying expressions. For example, $\mathtt{if}\ b_1\ \mathtt{then}\ e_1\ \mathtt{else\ if}\ {!b_1}\ \mathtt{then}\ e_2\ \mathtt{else\ nil}$ can be rewritten as $\mathtt{if}\ b_1\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2$, which was used to generate the solution in Figure 2.

Checking Implication.

Checking the implications in Figure 6 is challenging since branch conditions may include method calls whose semantics is hard to reason about. To solve this problem, RbSyn checks implications using a heuristic approach that is effective in practice. Each unique branch condition $b$ is mapped to a fresh boolean variable $z$. Similarly, $!b$ is encoded as $\neg z$, and $b_1 \lor b_2$ is encoded as $z_1 \lor z_2$. Then to check an implication $b_1 \Rightarrow b_2$, RbSyn uses a SAT solver to check the implication of the encoding. While this check could err in either direction (due to not modeling the semantics of the $b_i$ precisely), we found it works surprisingly well in practice. In case the implication check fails due to lack of precision, we fall back on the original $\oplus$ form, which represents the complete program $\mathtt{if}\ b_1\ \mathtt{then}\ e_1\ \mathtt{else\ if} \ldots$ without loss of precision. Should the implication check incorrectly succeed, it will be caught by running the merged program against the assertions.

Constructing the Final Program.

Finally, notice that the merge operation $\oplus$ is not associative, and it may yield different results depending on the order in which it is applied. Thus, to get the best solution, RbSyn uses Algorithm 1. It builds the set of all possible merged fragments (line 2). Then it simplifies each candidate solution using the rewrite rules and only considers a candidate valid if it passes all tests. It returns any such program as the solution. This branch merging strategy tries all combinations, so it is less sensitive to spec order than other component-based synthesis approaches [27]. In practice, we found that reordering the specs does not have much effect.

Algorithm 1

Merge programs
procedure MergeProgram(candidates = $\{\langle e_i, b_i, \Psi_i \rangle\}$)
    merged ← $\{\bigoplus \langle e_i, b_i, \Psi_i \rangle\}$
    final ← {}
    for all $m$ ∈ merged do
        $m$ ← apply (1)-(3) to $m$ until no rewrites possible
        final ← final ∪ $\{m\}$ if $\forall \langle S_i, Q_i \rangle \in \mathit{Specs}(m).\ \bigwedge_i \mathit{Prog}(m) \vdash S_i; Q_i \Downarrow v$
    end for
    return $\mathit{Prog}(m)$ s.t. $m$ ∈ final
end procedure

3.4 Discussion

Before discussing our implementation in the next section, we briefly discuss some design choices in our algorithm.

Our effect system uses pairs of read and write effects in regions. As mentioned, this core idea could be extended to any effects in a test assertion that can be paired with an effect in the synthesized method body. For example, throwing and catching exceptions, I/O to disk or network, or enabling/disabling features in a UI could all be expressed this way. We leave exploring such effect pairs to future work.

One convenient feature of our algorithm is that correctness is determined by passing specs, which are directly executed. Thus, the synthesizer can generate as many candidates as it likes (i.e., be as over-approximate as it likes) as long as its set of candidates includes the solution. This feature enables RbSyn to use a fairly simple effect annotation system compared to effect analysis tools [4].

Finally, we distinguish typed holes from effect holes, rather than have a single type-and-effect hole, to control where to use type-guidance and where to use effect-guidance. When initially trying to synthesize a method body, we omit effects because it is unclear which effects are needed. For example, in Figure 1, the second spec has read effects on all fields of the post, and yet the target method does not write any fields, as the spec is checking the case when the post is not modified. Thus, we cannot simply use all effects in all assertions for effect guidance. Moreover, type-guided synthesis often will synthesize effectful expressions, e.g., the call to Post.where in Figure 2. Conversely, our algorithm only places effect holes in positions where the type does not matter, so type information for such a hole would not add anything. Nonetheless, type-and-effect holes would be a simple extension of our approach, and we leave exploration of them to future work in other synthesis domains.
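As an illustration of the merging step described in this section, the following Ruby sketch models the propositional implication check and the two same-expression merge rules. It is a minimal sketch under stated assumptions, not RbSyn's implementation: the names `Tuple`, `atoms`, `holds?`, `implies?`, and `merge` are hypothetical, conditions are encoded as strings and nested arrays, and brute-force truth-table enumeration stands in for the SAT solver. Keeping the first tuple's condition when the implication holds is our reading of Rule 1.

```ruby
require 'set'

# A merge tuple <e, b, Psi>: the hypothesis that `if b then e` satisfies specs Psi.
Tuple = Struct.new(:expr, :cond, :specs)

# Conditions: a String is an opaque branch condition (one boolean variable),
# [:not, c] is negation, and [:or, c1, c2] is disjunction.
def atoms(cond)
  return Set[cond] if cond.is_a?(String)
  cond.drop(1).map { |c| atoms(c) }.reduce(Set.new, :|)
end

# Evaluate a condition under a truth assignment for its variables.
def holds?(cond, env)
  return env.fetch(cond) if cond.is_a?(String)
  op, left, right = cond
  case op
  when :not then !holds?(left, env)
  when :or  then holds?(left, env) || holds?(right, env)
  else raise ArgumentError, "unknown connective #{op.inspect}"
  end
end

# b1 => b2 is valid iff no assignment makes b1 true and b2 false.
# (Truth-table enumeration is a stand-in for the SAT solver.)
def implies?(b1, b2)
  vars = (atoms(b1) | atoms(b2)).to_a
  [true, false].repeated_permutation(vars.size).all? do |values|
    env = vars.zip(values).to_h
    !holds?(b1, env) || holds?(b2, env)
  end
end

# Rules (1) and (2): merge two tuples that carry the same expression.
# Returns nil when the expressions differ (Rule 3 needs fresh synthesis).
def merge(t1, t2)
  return nil unless t1.expr == t2.expr
  cond = implies?(t1.cond, t2.cond) ? t1.cond : [:or, t1.cond, t2.cond]
  Tuple.new(t1.expr, cond, t1.specs | t2.specs)
end

a = Tuple.new("t0", "arg0.nil?", Set[:spec1])
b = Tuple.new("t0", "Post.exists?", Set[:spec2])
merged = merge(a, b)
puts merged.cond.inspect
```

Because the two conditions are unrelated variables, the implication fails and the sketch falls through to the disjunction of Rule 2. As in the text, a wrong answer from the check is harmless as long as the merged program is re-run against the specs.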

RbSyn is implemented in approximately 3,600 lines of Ruby, excluding its dependencies.

Synthesis specifications, as discussed in § 2, are written in a custom domain-specific language. Each has the form:

    define :name, "method-sig", [consts,...] do
      spec "spec1" do
        setup { ... }
        postcond { ... }
      end
      ...
    end

where :name names the method to be synthesized; method-sig is its type signature; and consts lists constants that can be used in the synthesized method. Each spec is a test case the method must pass: setup describes the test case setup, and postcond makes assertions about the results.

In Ruby, do...end and {...} are equivalent syntax for creating code blocks, i.e., closures. Having the setup and postcondition in separate code blocks allows RbSyn to run the setup code and check the postcondition independently.

RbSyn also has optional hooks for resetting the global state before any setup block is run. This ensures candidate programs are tested from a clean slate without being affected by side effects from previous runs. In our experiments, RbSyn resets the global state by clearing the database.

Program Exploration Order.

While our synthesis rules are non-deterministic, our implementation is completely deterministic. This makes it sensitive to the order in which expressions are explored. RbSyn uses two metrics to prioritize search. First, programs are explored in order of their size; smaller programs are preferred over larger ones. Program size is calculated as the number of AST nodes in the program. Second, RbSyn prefers trying effect-guided synthesis for expressions that have passed more assertions rather than fewer. (Appendix A.1 formally describes counting passed assertions.) Untested candidates are assumed to have passed zero assertions. In general, expressions are explored in decreasing order of number of passed assertions, then in increasing order of program size. We leave experimenting with other search strategies to future work.
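The ordering described above can be sketched as a sort key. This is an illustrative sketch only: `Candidate`, `ast_size`, and `exploration_order` are hypothetical names, ASTs are modeled as nested arrays of the form `[type, *children]`, and the final index component is an assumption added here so that ties are broken deterministically.

```ruby
# Hypothetical candidate: a label, an AST as nested arrays [type, *children],
# and the number of assertions passed so far (nil when untested).
Candidate = Struct.new(:label, :ast, :passed)

# Program size = number of AST nodes.
def ast_size(node)
  _type, *children = node
  1 + children.sum { |child| ast_size(child) }
end

# More passed assertions first, then smaller programs; the original index
# breaks remaining ties so the order is fully deterministic.
def exploration_order(candidates)
  candidates.each_with_index
            .sort_by { |c, i| [-(c.passed || 0), ast_size(c.ast), i] }
            .map(&:first)
end

work = [
  Candidate.new("where-first", [:call, [:call, [:const], [:lvar]]], 1),
  Candidate.new("nil",         [:nil], nil),
  Candidate.new("first",       [:call, [:const]], 1),
  Candidate.new("arg",         [:lvar], 0),
]
puts exploration_order(work).map(&:label).inspect
```

The untested candidate is treated as having passed zero assertions, so it sorts with the other zero-assertion candidates by size.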

Effect Annotations. We extended RDL to support effect annotations along with type annotations for library methods. Programmers specify read and write effects following the grammar in § 3. For example, a method annotated with a write effect Post.author writes to some region author in some object of class Post. Here author is an uninterpreted string, selected by the programmer. Similarly, the labels “.” and “∗” stand for pure and any region (or simply “impure”), respectively. A region Post.* is written as Post for convenience.

One important extension is a self effect region, which indicates a read or write to the class of the receiver. This is essential for supporting ActiveRecord, whose query methods are inherited by the actual Rails model classes. For example, we use the self effect on the exists? query method of ActiveRecord::Base. Then at a call Post.exists?, where Post inherits from ActiveRecord::Base, we know the query reads the Post table and not any other table.

Effect annotations are similar to frame conditions [5, 12, 24] used in verification literature. More precise effect annotations help RbSyn find a solution faster because fewer methods then have effects that subsume the desired effect, shrinking the search space. But effect precision does not affect the correctness of the synthesized program, since correctness is ensured by the specs. For example, if the effect annotation for the method Post shown in § 2.1 had just Post as its write annotation, synthesis would still work, but would try more candidate programs. In some cases, coarse effects are required, e.g., the Post.where method queries records from the Post table. It has the coarser Post annotation because which columns such a query will access cannot be statically specified: it depends on the arguments. We evaluate some of the tradeoffs in effect precision in § 5.4.
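To make the link between annotation precision and search-space size concrete, here is a small Ruby sketch of effect subsumption over string labels. It is an assumption-laden illustration: the method names and their annotations in `precise` are hypothetical, the helpers `parse_effect`, `subsumes?`, `coarsen`, and `writers_for` are not part of RDL or RbSyn, and the self region and class inheritance are ignored.

```ruby
PURE = "."  # label for a pure method
ANY  = "*"  # label for "any region" (impure)

# Split "Post.title" into ["Post", "title"]; "Post" and "Post.*" both
# denote the whole class, represented here by a nil region.
def parse_effect(effect)
  cls, region = effect.split(".", 2)
  region = nil if region == "*"
  [cls, region]
end

# Does an annotated effect `have` cover the wanted effect `want`?
def subsumes?(have, want)
  return true  if want == PURE || have == ANY
  return false if have == PURE || want == ANY
  have_cls, have_region = parse_effect(have)
  want_cls, want_region = parse_effect(want)
  have_cls == want_cls && (have_region.nil? || have_region == want_region)
end

# Drop the region label, keeping only the class name.
def coarsen(effect)
  [PURE, ANY].include?(effect) ? effect : parse_effect(effect).first
end

# Methods whose write effect subsumes the read effect of a failing assertion.
def writers_for(read_effect, write_effects)
  write_effects.select { |_method, w| subsumes?(w, read_effect) }.keys
end

# Hypothetical write-effect annotations for a few library methods.
precise = {
  "Post#title="  => "Post.title",
  "Post#author=" => "Post.author",
  "Post#update!" => "Post",
  "User#name="   => "User.name",
  "Post#slug"    => PURE,
}
coarse = precise.transform_values { |w| coarsen(w) }

puts writers_for("Post.title", precise).inspect
puts writers_for("Post.title", coarse).inspect
```

With the precise annotations only two methods are candidates for an assertion that reads Post.title; after coarsening to class labels a third method becomes a candidate, which mirrors the larger search space described above.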

Type Level Computations. RbSyn uses RDL [13, 30] to reason about types, e.g., checking if one type is a subtype of another, and using the type environment and class table to find terms that can fill holes. RDL includes type-level computations [23], or comp types, in which certain methods' types include computations that run during type checking. For example, a comp type for the ActiveRecord joins method can compute that A.joins(B) returns a model that includes all columns of tables A and B combined. Using a comp type for joins encodes a quadratic number of type signatures, for different combinations of receivers and arguments, into a single type, and more for joins of more than two tables [23].

RbSyn uses RDL's comp types, but with new type signatures designed for synthesis. In particular, the previous version of RDL's comp types gave precise types when the receiver and arguments were known, e.g., in A.joins(B), RDL knows exactly which two classes are being joined. But this may not hold during synthesis, e.g., if B is replaced by a hole in the example, then the exact return type of the joins call cannot be computed.

To address this issue, we modified RDL's existing comp type signatures for ActiveRecord methods like joins so that they compute all possible types. For example, if a hole is an argument to joins, then the type finds all models B1, B2, ... that could be joined (i.e., those with associations); gives the hole type B1 ∪ B2 ∪ ...; and sets the return type of joins to a table containing the columns of A, B1, B2, .... This over-approximation is narrowed as the argument terms are synthesized, leading to cascading narrowing of types throughout the program as discussed in § 3.1.

Optimizations.

Synthesis of terms that pass a spec is an expensive procedure. In practice, we found solutions to a single spec often satisfy others. Thus, when confronted with a new spec, RbSyn first tries existing solutions and conditionals to see if they hold for the spec, before falling back on synthesis from scratch if needed. This makes the bottleneck for synthesis not the number of tests, but the number of unique paths through the program. Moreover, this reduces the number of tuples for merging, as a single expression and conditional tuple can represent multiple specs $\Psi$.

Finally, we found that in practice, the condition in one spec often turns out to be the negation of the condition in another. Thus during synthesis of conditionals, RbSyn tries the negation of already synthesized conditionals before falling back on synthesis from scratch.
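The reuse optimization can be sketched in a few lines of Ruby. This is a toy model under explicit assumptions: specs are modeled as callables that return true when a candidate passes, candidates are plain lambdas, and `synthesize` is a hypothetical stand-in that searches a fixed pool rather than performing real type- and effect-guided synthesis.

```ruby
# Try already-found solutions on each new spec before synthesizing afresh.
# Returns a map from each solution to the names of the specs it satisfies.
def solve_all(specs, synthesize)
  solutions = {}
  specs.each do |name, spec|
    candidate = solutions.keys.find { |c| spec.call(c) } || synthesize.call(spec)
    (solutions[candidate] ||= []) << name
  end
  solutions
end

# Toy candidate pool standing in for the synthesizer's search space.
double = ->(x) { x * 2 }
inc    = ->(x) { x + 1 }
pool   = [inc, double]

# Stand-in for the expensive search; counts how often it is invoked.
searches = 0
synthesize = lambda do |spec|
  searches += 1
  pool.find { |c| spec.call(c) }
end

specs = {
  s1: ->(f) { f.call(2) == 4 },
  s2: ->(f) { f.call(3) == 6 },
  s3: ->(f) { f.call(0) == 1 },
}
solutions = solve_all(specs, synthesize)
puts "searches: #{searches}, distinct solutions: #{solutions.size}"
```

In this toy run the second spec is satisfied by the solution found for the first, so the expensive search runs once per distinct program path rather than once per spec.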

We evaluated RbSyn by using it to synthesize a range of benchmarks extracted from widely used open source applications that use a variety of libraries. We pose the following questions in our evaluation:
• How does RbSyn perform using code based on existing unit tests in widely deployed applications? (§ 5.2)
• How much does type-and-effect guidance improve on alternatives such as only type-guidance or only effect-guidance? (§ 5.3)
• How does the precision of effect annotations affect synthesis performance? (§ 5.4)

To answer the questions above, we collected a benchmark suite comprised of programs from the following sources:
• Synthetic benchmarks is a set of minimal examples that demonstrate features of RbSyn.
• Discourse [20] is a Rails-based discussion platform used by over 1,500 companies and online communities.
• Gitlab [16] is a web-based Git repository manager with wiki, issue tracking, and CI/CD tools built on Rails.
• Diaspora [7] is a distributed social network, with groups of independent nodes (called Pods), also built on Rails.

We selected these apps because they are popular, well-maintained, widely used, and representative of programs that are written with supporting unit tests. We selected a subset of each app's methods for synthesis, choosing ones that fall into the Ruby grammar we can synthesize, have side effects due either to database accesses or to reading and writing globals, and do not use meta-programming in the synthesized code.

Table 1 lists the benchmarks. The first column group lists the app name (or Synthetic for the synthetic benchmarks); the benchmark id; the benchmark name; and the number of specs. The synthetic benchmarks exercise features of RbSyn by synthesizing pure methods, methods with side effects, methods in which multiple branches are folded into a single line program, etc. The Discourse benchmarks include a number of effectful methods in the User model, such as methods to activate a user account, unstage a placeholder account created for email integration, etc. The Gitlab benchmarks include methods that disable two factor authentication for a user, methods to close and reopen issues, etc. Finally, the Diaspora benchmarks include methods to confirm a user's email, accept a user invitation, etc.

We derived the specs for the non-synthetic benchmarks directly from the unit tests included in the app. We split each test into setup and postcondition blocks in the obvious way, and we added an appropriate type annotation to the synthesis goal. Across all benchmarks, we started with a base set of constants ($\Sigma$ in § 3) to be true, false, 0, 1 and the empty string. Then we added nil and singleton classes (for calling class methods) on a per benchmark basis as needed. (As with many enumerative search based methods, we rely on the user to provide the right set of constants.)

A few apps have several different unit tests with exactly the same setup but different assertions in the postcondition. We merged any such group of tests into a single spec with that setup and the union of the assertions as the postcondition, to ensure that every spec setup can be distinguished with a unique branch condition, if necessary. We indicate this in the specs column of Table 1 by listing the final number of specs followed by the original number of tests in parentheses if they differ.

Annotations for Benchmarks.

Finally, the library-methods column of Table 1 lists the number of library methods available during synthesis. These are methods for which we provided type-and-effect annotations. In total, 164 such methods are shared across all benchmarks, including, e.g., ActiveRecord and core Ruby libraries. Since our benchmarks are sourced from full apps, they often also depend on some other methods in the app. We wrote type-and-effect annotations for such methods and included those annotations only when synthesizing that app. Since RbSyn needs to run the synthesized code, when running specs we include the code for both general-purpose methods, such as those from ActiveRecord, and required app-specific methods. We slightly modify the set of library methods for A9, as discussed further below.

When writing effect annotations, we aim to make them as precise as possible. For example, we annotate a method that accesses field name from an object of class User with effect User.name, while we annotate a method that might write to any field of the User class with effect User (equivalent to User.*). We found that annotating methods with effects was easier than writing their type annotations, as effect annotations include at most a class name and a human-readable word (the region). We used RDL's support for typing metaprogramming-generated methods [30] to also generate effect annotations for such methods. For example, when RDL creates the type signature for Post, it now also creates a read effect annotation Post.title for it.
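As a rough illustration of generating effect annotations alongside generated accessors, the sketch below derives read and write effects from a list of column names. Everything in it is hypothetical: `accessor_effects` is not an RDL function, the `Model#method` naming is only a labeling convention for this example, and treating a setter as having a pure read effect is an assumption.

```ruby
PURE_EFFECT = "."

# For each column, a getter reads the region Model.column and a setter
# writes it; the other half of each effect pair is assumed to be pure.
def accessor_effects(model, columns)
  columns.each_with_object({}) do |column, table|
    region = "#{model}.#{column}"
    table["#{model}##{column}"]  = { read: region, write: PURE_EFFECT }
    table["#{model}##{column}="] = { read: PURE_EFFECT, write: region }
  end
end

effects = accessor_effects("Post", %w[title author])
effects.each { |name, eff| puts "#{name}: read #{eff[:read]}, write #{eff[:write]}" }
```

Each annotation consists of at most a class name and a region word, which is the property the text credits for making effect annotations cheaper to write than type annotations.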

RbSyn successfully synthesized methods that pass the specs for every benchmark. We manually examined the output and found that the synthesized code is equivalent to the original, human-written code, modulo minor differences that do not change the code's behavior in practice. For example, one such difference occurs with original code that updates multiple database columns with a single ActiveRecord call, and then has a sequence of asserts to check that each updated column is correct. Because RbSyn considers the effects of assertions in the postcondition one by one, it instead synthesizes a sequence of database updates, one per column. Another difference occurs in Gitlab, which uses the state_machine gem (an external package) to maintain an issue's state (closed, reopened, etc). RbSyn synthesizes correct implementations that work without the gem.

[Table 1: table body not recoverable from the extracted text. Surviving column headers: Time (median ± SIQR), Types, Effects, Neither, Size, Cond.]
Table 1. Synthesis benchmarks and results. The specs column is the number of specs used to synthesize the method; the library-methods column is the number of library methods used for every benchmark; Time shows the median and semi-interquartile range over 11 runs, followed by the median time for synthesis using only types, only effects and naive term enumeration (Neither). Method Size is the number of AST nodes in the synthesized method; Cond shows the number of conditionals in the synthesized method.

The middle group of columns in Table 1 summarizes RbSyn's running time. We set a timeout of 300 seconds on all experiments. The first column reports performance numbers for the full system as the median and semi-interquartile range (SIQR) of 11 runs on a 2016 Macbook Pro with a 2.7 GHz Intel Core i7 processor and 16 GB RAM. The next three columns show the median performance when RbSyn uses only type-guidance, only effect-guidance, and naive enumeration, respectively. The SIQRs (omitted due to space constraints) for these runs are very small compared to the median runtime, similar to the performance numbers with all features enabled. We discuss the runs with certain guidance disabled in detail in § 5.3. The right-most group of columns shows the synthesized method size (in terms of number of AST nodes) and the number of conditional branches in the method (0 for straight-line code).

Overall, RbSyn runs quickly, with around 80% of benchmarks solving in less than 9s. Benchmarks like A3 take longer because they require synthesis of nil terms. Recall that nil is the bottom element of our type lattice, causing RbSyn to synthesize nil at every typed hole for method arguments.
Consequently, this requires testing all completed candidates, even though they eventually fail, consuming significant time.

For one benchmark, A9, we changed the set of default library methods slightly due to some pathological behavior. This benchmark includes an assertion that invokes ActiveRecord's reload method, which touches all fields of that record. But then when RbSyn tries to find matching write effects, it explores a combinatorial explosion of writes to different subsets of the fields. This effort is almost entirely wasted, because the remainder of the assertion looks at only one particular field, but that one read is subsumed by the effect of the reload, making it invisible to RbSyn's search. As a result, synthesis for A9 slows down by two orders of magnitude. We addressed this by removing four ActiveRecord methods that manipulate specific fields and adding ActiveRecord's update! method as the only way to write a field back to the database. An alternative approach would have been to move the reload call to be outside the assertion.

[Figure 7: plot not recoverable from the extracted text. Panels: Synthetic, Apps. Series: TE Enabled, E Only, T Only, TE Disabled. y-axis: number of benchmarks.]
Figure 7. Number of benchmarks synthesized using type-and-effect (TE Enabled) guided synthesis relative to using only type (T Only) or effect (E Only) guidance separately and naive enumeration (TE Disabled). Lower is better.

As this example shows, and as is common with many synthesis problems, performance is very hard to predict. Indeed, we can see from Table 1 that performance is generally not well correlated with either the size of the output program or with the number of branches. We do observe that RbSyn's branch merging strategy is effective, often producing fewer conditionals than there are specs, e.g., in A12 there are seven specs but only three conditionals.
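For readers unfamiliar with the statistic reported in the timing columns, the following sketch computes a median and a semi-interquartile range (half the distance between the first and third quartiles) for 11 runs. The timings are invented for illustration and are not taken from the paper, and the quartile convention (medians of the lower and upper halves, excluding the middle element when the sample count is odd) is one common choice that the paper does not specify.

```ruby
# Median of an already sorted array.
def median(sorted)
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Semi-interquartile range: (Q3 - Q1) / 2, with quartiles taken as the
# medians of the lower and upper halves (an assumed convention).
def siqr(samples)
  sorted = samples.sort
  half = sorted.size / 2
  q1 = median(sorted.first(half))
  q3 = median(sorted.last(half))
  (q3 - q1) / 2.0
end

# Hypothetical timings for 11 runs, in milliseconds.
runs_ms = [8100, 8400, 8200, 8900, 8300, 8600, 8000, 8500, 8700, 8800, 9000]
puts "median: #{median(runs_ms.sort)} ms, SIQR: #{siqr(runs_ms)} ms"
```

Both statistics are robust to a single slow outlier run, which is a common reason to prefer them over mean and standard deviation for timing data.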

Next, we explore the performance benefits of type- and effect-guidance. Figure 7 plots the running times from Table 1 when all features of RbSyn are enabled (TE Enabled), with only type-guidance (T Only), with only effect-guidance (E Only) and with neither (TE Disabled). The plot shows the number of benchmarks that complete (y-axis) in a given amount of time (x-axis), based on the median running times.

We can clearly see that type- and effect-guided synthesis performs best, successfully synthesizing all benchmarks; the slowest takes 83s. In contrast, with both strategies disabled, all but three small benchmarks time out. Performance with only type- or only effect-guidance lies in between. With only type-guidance, synthesis completes on eight benchmarks, of which the majority are pure methods from the synthetic benchmarks. From apps, it only synthesizes A7 and A10. In these benchmarks, the needed effectful expressions are small and hence can be found with essentially brute-force search. With only effect-guidance, synthesis performs significantly worse, completing only five benchmarks, of which only three are from apps. These benchmarks succeeded because effect-guided synthesis quickly generates the template for the effectful method calls and then correctly fills them since they are small and can be found quickly by naive enumeration.

Finally, we explore the tradeoff between effect annotation precision and synthesis performance. Recall that we found the burden of writing effect annotations to be low, especially compared to writing type annotations. However, it can be further minimized by writing less precise annotations. This will not affect correctness, since RbSyn only accepts synthesis candidates that pass all specs, but it does affect performance.

[Figure 8: plot not recoverable from the extracted text. Series: Precise Effects, Class Effects, Purity Effects. y-axis: Time (s).]
Figure 8. Performance of RbSyn with varying effect annotation precision: full, class effects only, and purity annotations on library methods. Lower is better. Full height indicates timeout.

Figure 8 plots the median of synthesis times for benchmarks over 11 runs under three conditions:

Precise Effects, which are the effects used above; Class Effects, in which annotations include only class names and eliminate region labels (e.g., Post.title becomes Post); and Purity Effects, in which the only effect annotations are pure or impure (the • and ∗ effects, respectively, in our formalism). The benchmarks (x-axis) are ordered in increasing order of time for Purity Effects, then Class Effects, and finally Precise Effects.

From these experiments, we see that synthesis time increases as effect annotation precision decreases, often leading to a timeout. Class labels were sufficient to synthesize 16 of 19 benchmarks. Overall, class labels take time similar to precise labels, except for the three cases (A8, A11, and A3) where side-effecting method calls require precise labels to quickly find the candidate. As all precise effects are reduced to class effects, RbSyn must try many candidates with class effects before finding the correct one, leading to timeouts.

We note that A1 and A4 are slightly faster when using class effects. The reason is an implementation detail. The effect holes in these benchmarks can only be correctly filled by methods whose regular annotations are class annotations (more precise annotations are not possible). However, when trying to fill holes, RbSyn first tries all methods with precise annotations, only afterward trying methods with class annotations. Since the precise annotations never match, this yields worse performance under the precise effect condition than under the class effect condition, when the search could by chance find the matching methods sooner.

Purity labels only enabled synthesis of 9 benchmarks, including just 3 of 12 app benchmarks. The purity annotations are slow in general and only effective in the cases where the number of impure library methods is small.

Component-Based Synthesis.

Several researchers have proposed component-based synthesis, which creates code by composing calls to existing APIs, as RbSyn does. For example, Jha et al. [22] propose synthesis of loop-free programs for bit-vector manipulation. Their approach uses formal specifications for synthesis, in contrast to RbSyn, which uses unit tests. CodeHint [15] synthesizes Java programs, using a probabilistic model to guide the search towards expressions more often used in practice. SyPet [10] also synthesizes programs that use Java APIs, by modeling them as a Petri net and using SAT-based techniques to find a solution. These approaches do not support synthesis of programs with branches, which are common in the domain of web apps. Unlike RbSyn, none of these systems use side effect information, slowing synthesis for effectful methods.

Type-Guided Synthesis. There has been much success using type information to guide synthesis of recursive functional programs. Myth [14, 26] uses bidirectional type checking to synthesize programs, using input/output examples as the specification. However, Myth expects examples to be trace complete, meaning the user has to provide input/output examples for any recursive calls on the function arguments. RbSyn does not synthesize recursive functions, as they are rarely needed in our target domain of Ruby web apps. Synquid [28] uses polymorphic refinement types as the specification for synthesis. In contrast, RbSyn uses tests as specifications, which can check effects caused by a method. Similarly, Hoogle+ [21] uses Haskell tests and types only to synthesize potential solutions, primarily geared towards API discovery. However, unlike RbSyn, Hoogle+ forbids the use of control structures (e.g., conditionals) in programs.

Programming by Example. Escher [1] and spreadsheet manipulation tools [17–19] all accept input/output examples as a partial specification for synthesis. These tools primarily target users who cannot program, whereas RbSyn is targeted towards programmers. In addition, RbSyn's specs are full unit tests, so they can check both return values and side effects. $\lambda^2$ [11] synthesizes data structure transformations using higher-order functions, a feature not handled by RbSyn because of our target domain of Rails web apps, which rarely use such functions. STUN [2] uses a program merging strategy that is similar to ours, but it depends on defining domain-specific unification operators to safely combine programs under branches. In contrast, our approach may be more domain-independent, using preconditions and tests to find correct branch conditions. There have been multiple approaches to synthesizing database programs [6, 9]. Perhaps the closest in purpose to RbSyn is Scythe [34], which synthesizes SQL queries based on input/output examples. Scythe uses a two-phased synthesis process to synthesize an abstract query, after which enumeration is used to concretize the abstract query. In contrast, the use of comp types [23] allows RbSyn to quickly construct a template for a database query. With precise types for the method argument holes, this essentially builds abstract queries for free, whose holes are then filled later during synthesis.

Solver-Aided Synthesis.

In solver-aided synthesis, synthesis specifications are transformed to a set of constraints for a SAT or SMT solver. Sketch [31] allows users to write partial programs, called sketches, where the omitted parts are then synthesized by the tool. Migrator [35] uses conflict-driven learning [8] to synthesize raw SQL queries, for use in database programs for schema refactoring. In contrast, programs synthesized by RbSyn use ActiveRecord to access the database. Rosette [32, 33] is a solver-aided language that provides access to verification and synthesis. It relies on symbolic execution, and thus requires significant modeling of external libraries for synthesizing programs that use such libraries. Alur et al. [3] synthesize programs with branches, where the solver generates counter-examples from the failed candidate programs. This availability of extra examples helps in the construction of program branches like a decision tree using an information gain heuristic. However, it has the additional cost of requiring formal specifications of library method semantics, which is a significant amount of work compared to RbSyn's type and effect annotations. SuSLik [29] synthesizes heap-manipulating programs using separation logic to precisely model the heap. RbSyn, in contrast, uses very coarse effects to track accesses that can go beyond the heap, such as database reads and writes.

We presented RbSyn, a system for type- and effect-guided program synthesis for Ruby. In RbSyn, the synthesis goal is described by the target method type and a series of specs comprising preconditions followed by postconditions that use assertions. The user also supplies the set of constants the synthesized method can use, and type-and-effect annotations for any library methods it can call. RbSyn then searches for a solution starting from a hole $\square : \tau$ typed with the method's return type, inserting (write) effect holes $\diamond : \epsilon$ derived from the read effects of failing assertions. Finally, RbSyn merges together solutions for individual specs by synthesizing branch conditions to select among the different solutions as needed. We evaluated RbSyn by running it on a suite of 19 benchmarks, 12 of which are representative programs from popular open-source Ruby on Rails apps. RbSyn synthesized correct solutions to all benchmarks, completing synthesis of 15 of the 19 benchmarks in under 9s, with the slowest benchmark solving in 83s. We believe RbSyn demonstrates a promising new approach to synthesizing effectful programs.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. nnnnnnn and Grant No. mmmmmmm. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

References

[1] Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive program synthesis. In International Conference on Computer Aided Verification. Springer, 934–950.
[2] Rajeev Alur, Pavol Černý, and Arjun Radhakrishna. 2015. Synthesis through unification. In International Conference on Computer Aided Verification. Springer, 163–179.
[3] Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. 2017. Scaling Enumerative Program Synthesis via Divide and Conquer. In Tools and Algorithms for the Construction and Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 10205. 319–336.
[4] Robert L Bocchino Jr, Vikram S Adve, Danny Dig, Sarita V Adve, Stephen Heumann, Rakesh Komuravelli, Jeffrey Overbey, Patrick Simmons, Hyojin Sung, and Mohsen Vakilian. 2009. A type and effect system for deterministic parallel Java. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications. 97–116.
[5] Alexander Borgida, John Mylopoulos, and Raymond Reiter. 1995. On the frame problem in procedure specifications. IEEE Transactions on Software Engineering 21, 10 (1995), 785–798.
[6] Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2013. Optimizing database-backed applications with query synthesis. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, Vol. 48. ACM New York, NY, USA, 3–14.
[7] Diaspora Inc. 2020. Diaspora: A privacy-aware, distributed, open source social network. https://github.com/diaspora/diaspora
[8] Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program synthesis using conflict-driven learning. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018, Vol. 53. ACM New York, NY, USA, 420–435.
[9] Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, Vol. 52. ACM New York, NY, USA, 422–436.
[10] Yu Feng, Ruben Martins, Yuepeng Wang, Isil Dillig, and Thomas W Reps. 2017. Component-based synthesis for complex APIs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. 599–612.
[11] John K Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation 50, 6 (2015), 229–239.
[12] Jean-Christophe Filliâtre and Andrei Paskevich. 2013. Why3—where programs meet provers. In European symposium on programming. Springer, 125–128.
[13] Jeffrey Foster, Brianna Ren, Stephen Strickland, Alexander Yu, Milod Kazerounian, and Sankha Narayan Guria. 2020. RDL: Types, type checking, and contracts for Ruby. https://github.com/tupl-tufts/rdl
[14] Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. 2016. Example-directed synthesis: a type-theoretic interpretation. Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 51, 1 (2016), 802–815.
[15] Joel Galenson, Philip Reames, Rastislav Bodik, Björn Hartmann, and Koushik Sen. 2014. Codehint: Dynamic and interactive synthesis of code snippets. In Proceedings of the 36th International Conference on Software Engineering. 653–663.
[16] GitLab B.V. 2020. GitLab is an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. https://gitlab.com/gitlab-org/gitlab
[17] Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 46, 1 (2011), 317–330.
[18] Sumit Gulwani, William R Harris, and Rishabh Singh. 2012. Spreadsheet data manipulation using examples. Commun. ACM 55, 8 (2012), 97–105.
[19] William R Harris and Sumit Gulwani. 2011. Spreadsheet table transformations from examples. Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation.
[20] https://github.com/discourse/discourse
[21] Michael B James, Zheng Guo, Ziteng Wang, Shivani Doshi, Hila Peleg, Ranjit Jhala, and Nadia Polikarpova. 2020. Digging for fold: synthesis-aided API discovery for Haskell. Proceedings of the ACM on Programming Languages 4, OOPSLA (2020), 1–27.
[22] Susmit Jha, Sumit Gulwani, Sanjit A Seshia, and Ashish Tiwari. 2010. Oracle-guided component-based program synthesis. Vol. 1. IEEE, 215–224.
[23] Milod Kazerounian, Sankha Narayan Guria, Niki Vazou, Jeffrey S Foster, and David Van Horn. 2019. Type-level computations for Ruby libraries. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 966–979.
[24] Bertrand Meyer. 2015. Framing the Frame Problem. In Dependable Software Systems Engineering. Vol. 40. IOS Press, 193–203.
[25] Jaideep Nijjar and Tevfik Bultan. 2011. Bounded verification of Ruby on Rails data models. In Proceedings of the 2011 International Symposium on Software Testing and Analysis. 67–77.
[26] Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-example-directed program synthesis. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation.
[27] Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation.
[28] Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation 51, 6, 522–538.
[29] Nadia Polikarpova and Ilya Sergey. 2019. Structuring the synthesis of heap-manipulating programs. Proceedings of the ACM on Programming Languages 3, POPL (2019), 1–30.
[30] Brianna M Ren and Jeffrey S Foster. 2016. Just-in-time static type checking for dynamic languages. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. 462–476.
[31] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial sketching for finite programs. In Proceedings of the 12th international conference on Architectural support for programming languages and operating systems. 404–415.
[32] Emina Torlak and Rastislav Bodik. 2013. Growing solver-aided languages with rosette. In Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on programming & software. 135–152.
[33] Emina Torlak and Rastislav Bodik. 2014. A lightweight symbolic virtual machine for solver-aided host languages. Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation 49, 6 (2014), 530–541.
[34] Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. 452–466.
[35] Yuepeng Wang, James Dong, Rushi Shah, and Isil Dillig. 2019. Synthesizing database programs for schema refactoring. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 286–300.

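A Appendix

Errors $E ::= \text{err}(\epsilon_r, \epsilon_w)$
Results $R ::= v \mid E$

Figure 9. Extended $\lambda_{syn}$.

Algorithm 2 Synthesis of programs that pass a spec $s$

procedure Generate($\tau_1 \to \tau_2$, $CT$, $\Sigma$, $s$, maxSize)
    $\Gamma \leftarrow [x \mapsto \tau_1]$
    $e_0 \leftarrow \square : \tau_2$
    workList $\leftarrow [(0, e_0)]$
    while workList is not empty do
        $(c, e_b) \leftarrow$ pop(workList)
        $\omega_\tau \leftarrow \{(c, e_t) \mid \Sigma, \Gamma \vdash e_b \rightsquigarrow e_t : \tau\}$
        $\omega_{eval} \leftarrow \{(c, e_t) \mid (c, e_t) \in \omega_\tau \land \text{evaluable}(e_t)\}$
        $\omega_r \leftarrow \omega_\tau - \omega_{eval}$
        for all $(c, e_t) \in \omega_{eval}$ do
            $c_r, v_r \leftarrow$ EvalProgram($e_t$, $s$)
            if $v_r = \text{err}(\epsilon_r, \epsilon_w)$ then
                $\omega_r \leftarrow \omega_r \cup \{(c, e_f) \mid \Sigma, \Gamma, \epsilon_r \vdash e_t \twoheadrightarrow e_f\}$
            else
                return $e_t$
            end if
        end for
        $\omega_r \leftarrow \{(c, e_b) \mid (c, e_b) \in \omega_r \land \text{size}(e_b) \le \text{maxSize}\}$
        workList $\leftarrow$ reorder(workList + $\omega_r$)
    end while
    return Error: No solution found
end procedure

procedure EvalProgram($e$, $\langle S, Q \rangle$)
    $P \leftarrow \texttt{def}\ m(x) = e$, $E \leftarrow []$
    $(c, R) \leftarrow [\![E, 0, \langle\bullet, \bullet\rangle, P; S; Q]\!] \hookrightarrow^{*}_{CT} [\![E', c, \langle\epsilon_r, \epsilon_w\rangle, R]\!]$
    return $(c, R)$
end procedure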
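As a rough illustration of the work-list search in Algorithm 2, the following Ruby sketch runs an effect-guided search on a toy problem. It is a simplification under stated assumptions: typed holes are omitted, a candidate is just a list of library calls, and each failing assertion contributes a single read region. The names `LIBRARY`, `CONSTANTS`, `SPEC`, `eval_program`, `extend_with_effect` and `generate` are hypothetical and do not come from RbSyn.

```ruby
# Toy effect-guided work-list search, loosely following Algorithm 2.
# All names and data formats here are hypothetical, for illustration only.

# Library methods annotated with the region they write (assumed format).
LIBRARY = {
  set_title: { writes: :title, run: ->(post, v) { post[:title] = v } },
  set_slug:  { writes: :slug,  run: ->(post, v) { post[:slug] = v } }
}.freeze

CONSTANTS = ["Hello", "hello"].freeze

# A spec is a list of assertions, each annotated with the region it reads.
SPEC = [
  { reads: :title, check: ->(post) { post[:title] == "Hello" } },
  { reads: :slug,  check: ->(post) { post[:slug] == "hello" } }
].freeze

# Run a candidate (a list of [method, constant] calls) against the spec.
# Returns [passed_count, nil] on success, or [passed_count, read_effect]
# for the first failing assertion (the analogue of an err result).
def eval_program(program, spec)
  post = {}
  program.each { |meth, arg| LIBRARY[meth][:run].call(post, arg) }
  passed = 0
  spec.each do |assertion|
    return [passed, assertion[:reads]] unless assertion[:check].call(post)
    passed += 1
  end
  [passed, nil]
end

# Extend a failing candidate with every call whose write effect matches
# the read effect of the failing assertion (the role of effect holes).
def extend_with_effect(program, read_effect)
  LIBRARY.select { |_, info| info[:writes] == read_effect }.keys
         .product(CONSTANTS)
         .map { |call| program + [call] }
end

def generate(spec, max_size)
  work_list = [[0, []]] # pairs of (passed assertions, program)
  until work_list.empty?
    _, program = work_list.shift
    passed, read_effect = eval_program(program, spec)
    return program if read_effect.nil? # every assertion passed
    fresh = extend_with_effect(program, read_effect)
            .select { |p| p.size <= max_size }
            .map { |p| [passed, p] }
    # Most passed assertions first, then smallest program first.
    work_list = (work_list + fresh).sort_by { |c, p| [-c, p.size] }
  end
  nil # no solution within max_size
end

p generate(SPEC, 3)
```

With a size bound of 3 the search returns the two-call program that sets the title and then the slug; with a bound of 1 it exhausts the work list and returns `nil`, mirroring the "No solution found" case.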

A.1 Evaluation Rules

We extend $\lambda_{syn}$ to include errors $E$. Errors originate from the evaluation of an assertion $\texttt{assert}\ e$ and encapsulate the read effect $\epsilon_r$ and write effect $\epsilon_w$ inferred from $e$. The result $R$ of an evaluation is either a value or an error. Collecting effects while evaluating the postcondition of a test requires special evaluation rules. Figure 10 shows selected rules of the small-step operational semantics for postconditions only (the remaining rules are standard and omitted). The rules prove judgments of the form $[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, Q]\!] \hookrightarrow_{CT} [\![E', c', \langle\epsilon'_r, \epsilon'_w\rangle, Q']\!]$, which reduce configurations that contain a dynamic environment $E$, a counter of passed assertions $c$, the pair of read and write effects $\langle\epsilon_r, \epsilon_w\rangle$ collected during evaluation, and the postcondition $Q$ under evaluation. Rule E-AssertPass applies when an assertion evaluates to a truthy value; it also increments the counter of passed assertions. If $e$ evaluates to a falsy value, the assertion results in an error that carries the collected effects (E-AssertFail). Evaluating a library method call takes the union of its effects with the effects collected so far (E-MethCall). During the evaluation of a sequence $Q_1; Q_2$, if the first assertion evaluates to a value, evaluation continues with $Q_2$ and discards all collected effects (E-SeqVal). If the evaluation of an assertion yields an error, evaluation of the postcondition terminates with that error as the final result (E-SeqErr). We define the big-step semantics as follows: $Q \Downarrow R$ if $\exists E', c, \langle\epsilon_r, \epsilon_w\rangle.\ [\![[x_r \mapsto v], 0, \langle\bullet, \bullet\rangle, Q]\!] \hookrightarrow^{*}_{CT} [\![E', c, \langle\epsilon_r, \epsilon_w\rangle, R]\!]$. In other words, evaluating the postcondition in an environment that binds $x_r$ to the return value of the synthesis goal yields a result.

A.2 Type-Guided Synthesis

Figure 11 shows all the type checking and type-directed synthesis rules; the rules repeated from § 3 are unchanged. T-Nil type checks the value $\texttt{nil}$ and assigns it the type $\textit{Nil}$. The rules T-True, T-False, T-Obj and T-Err do the same for the values $\texttt{true}$, $\texttt{false}$, $[A]$ and $\text{err}(\epsilon_r, \epsilon_w)$ respectively. Similarly, T-NegB and T-OrB type check conditionals that contain negation or disjunction. T-EffHole type checks an effect hole at the type $\textit{Obj}$, which can be narrowed to a more precise type when synthesis rules are applied. T-Seq type checks and synthesizes sequences, and T-App type checks or synthesizes the terms in the receiver and argument positions of a method call. T-If type checks if-else expressions.
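To make the synthesis rules of Figure 11 more concrete, the following Ruby sketch enumerates the one-step fillings of a typed hole under S-Const, S-Var and S-App. The names `Hole`, `Call`, `SIGMA`, `GAMMA`, `CT`, `subtype?` and `fill`, the example types and methods, and the simplified subtyping relation are all assumptions made for illustration, not RbSyn's implementation.

```ruby
# Hypothetical toy model of the S-Const, S-Var and S-App rules.
Hole = Struct.new(:type)
Call = Struct.new(:recv, :meth, :arg)

# Assumed subtyping: reflexive, and every type is a subtype of :Obj.
def subtype?(t1, t2)
  t1 == t2 || t2 == :Obj
end

SIGMA = { "draft" => :String, true => :Bool }.freeze # constants with types
GAMMA = { title: :String }.freeze                    # variables with types
# Class table: class => { method => [argument type, return type] }
CT = {
  Post:   { find_by_title: [:String, :Post] },
  String: { concat: [:String, :String] }
}.freeze

# All one-step fillings of a typed hole.
def fill(hole)
  # S-Const: a constant whose type is a subtype of the hole's type.
  consts = SIGMA.select { |_, t| subtype?(t, hole.type) }.keys
  # S-Var: a variable whose type is a subtype of the hole's type.
  vars = GAMMA.select { |_, t| subtype?(t, hole.type) }.keys
  # S-App: a method whose return type fits, with fresh holes for the
  # receiver (typed by the class) and the argument.
  calls = CT.flat_map do |cls, meths|
    meths.select { |_, (_, ret)| subtype?(ret, hole.type) }
         .map { |m, (arg, _)| Call.new(Hole.new(cls), m, Hole.new(arg)) }
  end
  consts + vars + calls
end

p fill(Hole.new(:String))
```

Each `Call` result still contains holes, so a search would apply `fill` again to those holes until no hole remains.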

A.3 Algorithm

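Algorithm 2 describes the synthesis of candidates that pass a single spec. The algorithm uses a work list, which initially contains a single tuple made of a passed-assertion count of 0 and the initial hole of type $\tau_2$. In each iteration, the first tuple is popped off the work list and the type- or effect-guided synthesis rules (the $\rightsquigarrow$ relation) are applied to its base expression $e_b$ to build the set of new expressions $\omega_\tau$. Next, expressions with no holes are filtered into the set of evaluable expressions $\omega_{eval}$. Then, EvalProgram is called to build a program $\texttt{def}\ m(x) = e_t$ and evaluate spec $s$ in an environment $E$, starting from a passed-assertion count of 0. If the evaluation of the postcondition results in an error $\text{err}(\epsilon_r, \epsilon_w)$, the algorithm introduces an effect hole, using the relation $\twoheadrightarrow$ to build the set $\omega_\epsilon$ (which Algorithm 2 adds to $\omega_r$). If a program passes all the assertions, a correct solution has been found, so Generate returns it. Finally, the algorithm collects all the remaining expressions $\omega_r$ with holes and filters out the programs that exceed the maximum permissible size maxSize, which bounds the search space. It then takes the filtered programs and the programs from the remainder of the work list and reorders them: programs are sorted by the number of passed assertions $c$ in decreasing order and then by program size in increasing order. This assumes that a program that is more likely correct and smaller will be selected earlier for processing in the work list. Lastly, if Generate does not find a correct program in that search space, it returns an error. Figure 12 shows the formal definitions of the evaluable and size functions.

$$[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, Q]\!] \hookrightarrow_{CT} [\![E', c', \langle\epsilon'_r, \epsilon'_w\rangle, Q']\!]$$

$$\dfrac{v \in \{\texttt{true}, [A]\}}{[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, \texttt{assert}\ v]\!] \hookrightarrow_{CT} [\![E, c + 1, \langle\epsilon_r, \epsilon_w\rangle, v]\!]}\ \text{E-AssertPass}$$

$$\dfrac{v \in \{\texttt{false}, \texttt{nil}\}}{[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, \texttt{assert}\ v]\!] \hookrightarrow_{CT} [\![E, c, \langle\epsilon_r, \epsilon_w\rangle, \text{err}(\epsilon_r, \epsilon_w)]\!]}\ \text{E-AssertFail}$$

$$\dfrac{[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, e]\!] \hookrightarrow_{CT} [\![E', c, \langle\epsilon'_r, \epsilon'_w\rangle, e']\!]}{[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, \texttt{assert}\ e]\!] \hookrightarrow_{CT} [\![E', c, \langle\epsilon'_r, \epsilon'_w\rangle, \texttt{assert}\ e']\!]}\ \text{E-AssertStep}$$

$$\dfrac{\text{type\_of}(v_r, v_a) = (A_r, A_a) \quad m : \tau_a \xrightarrow{\langle\epsilon_r, \epsilon_w\rangle} \tau \in CT(A) \quad A_r \le A \quad A_a \le \tau_a \quad \text{call}(A.m, v_r, v_a) = v}{[\![E, c, \langle\epsilon'_r, \epsilon'_w\rangle, v_r.m(v_a)]\!] \hookrightarrow_{CT} [\![E, c, \langle\epsilon'_r, \epsilon'_w\rangle \cup \langle\epsilon_r, \epsilon_w\rangle, v]\!]}\ \text{E-MethCall}$$

$$\dfrac{[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, Q_1]\!] \hookrightarrow_{CT} [\![E, c, \langle\epsilon'_r, \epsilon'_w\rangle, Q'_1]\!]}{[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, Q_1; Q_2]\!] \hookrightarrow_{CT} [\![E, c, \langle\epsilon'_r, \epsilon'_w\rangle, Q'_1; Q_2]\!]}\ \text{E-SeqStep}$$

$$[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, v; Q]\!] \hookrightarrow_{CT} [\![E, c, \langle\bullet, \bullet\rangle, Q]\!]\quad \text{E-SeqVal}$$

$$[\![E, c, \langle\epsilon_r, \epsilon_w\rangle, \text{err}(\epsilon_r, \epsilon_w); Q]\!] \hookrightarrow_{CT} [\![E, c, \langle\epsilon_r, \epsilon_w\rangle, \text{err}(\epsilon_r, \epsilon_w)]\!]\quad \text{E-SeqErr}$$

Figure 10. Selected rules for operational semantics of the postcondition $Q$.

$$\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e_2 : \tau$$

$$\Sigma, \Gamma \vdash_{CT} \texttt{nil} \rightsquigarrow \texttt{nil} : \textit{Nil}\quad \text{T-Nil}$$

$$\Sigma, \Gamma \vdash_{CT} \texttt{true} \rightsquigarrow \texttt{true} : \textit{Bool}\quad \text{T-True}$$

$$\Sigma, \Gamma \vdash_{CT} \texttt{false} \rightsquigarrow \texttt{false} : \textit{Bool}\quad \text{T-False}$$

$$\Sigma, \Gamma \vdash_{CT} [A] \rightsquigarrow [A] : A\quad \text{T-Obj}$$

$$\Sigma, \Gamma \vdash_{CT} \text{err}(\epsilon_r, \epsilon_w) \rightsquigarrow \text{err}(\epsilon_r, \epsilon_w) : \textit{Err}\quad \text{T-Err}$$

$$\dfrac{\Gamma(x) = \tau}{\Sigma, \Gamma \vdash_{CT} x \rightsquigarrow x : \tau}\ \text{T-Var}$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} b \rightsquigarrow b' : \textit{Bool}}{\Sigma, \Gamma \vdash_{CT}\ !b \rightsquigarrow\ !b' : \textit{Bool}}\ \text{T-NegB}$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} b_1 \rightsquigarrow b'_1 : \textit{Bool} \quad \Sigma, \Gamma \vdash_{CT} b_2 \rightsquigarrow b'_2 : \textit{Bool}}{\Sigma, \Gamma \vdash_{CT} b_1 \lor b_2 \rightsquigarrow b'_1 \lor b'_2 : \textit{Bool}}\ \text{T-OrB}$$

$$\Sigma, \Gamma \vdash_{CT} \square : \tau \rightsquigarrow \square : \tau\quad \text{T-Hole}$$

$$\Sigma, \Gamma \vdash_{CT} \diamond : \epsilon \rightsquigarrow (\diamond : \epsilon) : \textit{Obj}\quad \text{T-EffHole}$$

$$\dfrac{v : \tau_1 \in \Sigma \quad \tau_1 \le \tau_2}{\Sigma, \Gamma \vdash_{CT} \square : \tau_2 \rightsquigarrow v : \tau_1}\ \text{S-Const}$$

$$\dfrac{\Gamma(x) = \tau_1 \quad \tau_1 \le \tau_2}{\Sigma, \Gamma \vdash_{CT} \square : \tau_2 \rightsquigarrow x : \tau_1}\ \text{S-Var}$$

$$\dfrac{m : \tau_1 \to \tau_2 \in CT(A) \quad \tau_2 \le \tau_3}{\Sigma, \Gamma \vdash_{CT} \square : \tau_3 \rightsquigarrow (\square : A).m(\square : \tau_1) : \tau_2}\ \text{S-App}$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e'_1 : \tau_1 \quad \Sigma, \Gamma \vdash_{CT} e_2 \rightsquigarrow e'_2 : \tau_2}{\Sigma, \Gamma \vdash_{CT} e_1; e_2 \rightsquigarrow e'_1; e'_2 : \tau_2}\ \text{T-Seq}$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e'_1 : \tau_1 \quad \Sigma, \Gamma \vdash_{CT} e_2 \rightsquigarrow e'_2 : \tau_2 \quad m : \tau_3 \to \tau_4 \in CT(A) \quad \tau_1 \le A \quad \tau_2 \le \tau_3}{\Sigma, \Gamma \vdash_{CT} e_1.m(e_2) \rightsquigarrow e'_1.m(e'_2) : \tau_4}\ \text{T-App}$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e'_1 : \tau_1 \quad \Sigma, \Gamma[x \mapsto \tau_1] \vdash_{CT} e_2 \rightsquigarrow e'_2 : \tau_2}{\Sigma, \Gamma \vdash_{CT} \texttt{let}\ x = e_1\ \texttt{in}\ e_2 \rightsquigarrow \texttt{let}\ x = e'_1\ \texttt{in}\ e'_2 : \tau_2}\ \text{T-Let}$$

$$\dfrac{\Sigma, \Gamma \vdash_{CT} b \rightsquigarrow b' : \textit{Bool} \quad \Sigma, \Gamma \vdash_{CT} e_1 \rightsquigarrow e'_1 : \tau_1 \quad \Sigma, \Gamma \vdash_{CT} e_2 \rightsquigarrow e'_2 : \tau_2}{\Sigma, \Gamma \vdash_{CT} \texttt{if}\ b\ \texttt{then}\ e_1\ \texttt{else}\ e_2 \rightsquigarrow \texttt{if}\ b'\ \texttt{then}\ e'_1\ \texttt{else}\ e'_2 : \tau_1 \cup \tau_2}\ \text{T-If}$$

Figure 11. Type checking and type-directed synthesis rules.

$$\begin{aligned}
\text{evaluable}\ (e_1; e_2) &= \text{evaluable}\ e_1 \land \text{evaluable}\ e_2\\
\text{evaluable}\ (e_1.m(e_2)) &= \text{evaluable}\ e_1 \land \text{evaluable}\ e_2\\
\text{evaluable}\ (\square : \tau) &= \texttt{false}\\
\text{evaluable}\ (\texttt{let}\ x = e_1\ \texttt{in}\ e_2) &= \text{evaluable}\ e_1 \land \text{evaluable}\ e_2\\
\text{evaluable}\ (\texttt{if}\ b\ \texttt{then}\ e_1\ \texttt{else}\ e_2) &= \text{evaluable}\ b \land \text{evaluable}\ e_1 \land \text{evaluable}\ e_2\\
\text{evaluable}\ (!b) &= \text{evaluable}\ b\\
\text{evaluable}\ (b_1 \lor b_2) &= \text{evaluable}\ b_1 \land \text{evaluable}\ b_2\\
\text{evaluable}\ \_ &= \texttt{true}\\
\text{size}\ (e_1; e_2) &= \text{size}\ e_1 + \text{size}\ e_2\\
\text{size}\ (e_1.m(e_2)) &= \text{size}\ e_1 + \text{size}\ e_2 + [\ldots]\\
\text{size}\ (\texttt{let}\ x = e_1\ \texttt{in}\ e_2) &= \text{size}\ e_1 + \text{size}\ e_2\\
\text{size}\ (\texttt{if}\ b\ \texttt{then}\ e_1\ \texttt{else}\ e_2) &= \text{size}\ b + \text{size}\ e_1 + \text{size}\ e_2\\
\text{size}\ (!b) &= \text{size}\ b\\
\text{size}\ (b_1 \lor b_2) &= \text{size}\ b_1 + \text{size}\ b_2\\
\text{size}\ \_ &= [\ldots]
\end{aligned}$$

Figure 12. Helper functions used by RbSyn.

$$\begin{aligned}
\langle e_1, b_1, \Psi_1\rangle \oplus \langle e_2, b_2, \Psi_2\rangle &= \langle b_1, b_1 \lor b_2, \Psi_1 \cup \Psi_2\rangle && \text{if } e_1 \equiv \texttt{true},\ e_2 \equiv \texttt{false} \text{ and } b_1 \equiv\ !b_2 && (4)\\
\langle e_1, b_1, \Psi_1\rangle \oplus \langle e_2, b_2, \Psi_2\rangle &= \langle b_2, b_1 \lor b_2, \Psi_1 \cup \Psi_2\rangle && \text{if } e_1 \equiv \texttt{false},\ e_2 \equiv \texttt{true} \text{ and } b_1 \equiv\ !b_2 && (5)\\
\langle e_1, b_1, \Psi_1\rangle \oplus \langle e_2, b_2, \Psi_2\rangle &= \langle e_1, b_1, \Psi_1\rangle \oplus \langle e_2, b_g, \Psi_2\rangle && \text{if } b_g \equiv\ !b_1 \text{ and } \forall \langle S_i, Q_i\rangle \in \Psi_2.\ \texttt{def}\ m(x) = b_g \vdash S_i; \texttt{assert}\ x_r \Downarrow v && (6)\\
\langle e_1, b_1, \Psi_1\rangle \oplus \langle e_2, b_2, \Psi_2\rangle &= \langle e_1, b_g, \Psi_1\rangle \oplus \langle e_2, b_2, \Psi_2\rangle && \text{if } b_g \equiv\ !b_2 \text{ and } \forall \langle S_i, Q_i\rangle \in \Psi_1.\ \texttt{def}\ m(x) = b_g \vdash S_i; \texttt{assert}\ !x_r \Downarrow v && (7)
\end{aligned}$$

Figure 13. Branch pruning rules.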
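The helper functions of Figure 12 and the work-list ordering described in A.3 can be sketched in Ruby as follows. This is an illustrative sketch only: expressions are encoded as tagged arrays of our own choosing, and the constants `CALL_COST` and `LEAF_SIZE` are assumed values, since the source does not state the corresponding constants legibly.

```ruby
# Expressions as tagged arrays (a hypothetical encoding):
#   [:seq, e1, e2], [:call, recv, meth, arg], [:hole, type],
#   [:let, x, e1, e2], [:if, b, e1, e2], [:not, b], [:or, b1, b2].
# Anything that is not a tagged array is a leaf (constant or variable).

def evaluable?(e)
  return true unless e.is_a?(Array)
  case e.first
  when :hole     then false
  when :seq, :or then evaluable?(e[1]) && evaluable?(e[2])
  when :call     then evaluable?(e[1]) && evaluable?(e[3])
  when :let      then evaluable?(e[2]) && evaluable?(e[3])
  when :if       then e[1..3].all? { |s| evaluable?(s) }
  when :not      then evaluable?(e[1])
  else true
  end
end

CALL_COST = 1 # assumption: extra cost charged per method call
LEAF_SIZE = 1 # assumption: size of a leaf or of a hole

def size(e)
  return LEAF_SIZE unless e.is_a?(Array)
  case e.first
  when :seq, :or then size(e[1]) + size(e[2])
  when :call     then size(e[1]) + size(e[3]) + CALL_COST
  when :let      then size(e[2]) + size(e[3])
  when :if       then e[1..3].sum { |s| size(s) }
  when :not      then size(e[1])
  else LEAF_SIZE
  end
end

# Work-list order: more passed assertions first, then smaller programs.
def reorder(work_list)
  work_list.sort_by { |c, e| [-c, size(e)] }
end
```

A candidate that still contains `[:hole, type]` anywhere is not evaluable and stays on the work list for further expansion; `reorder` then decides which candidate is expanded next.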

A.4 Branch pruning rules

Figure 13 formally describes the rules that allow RbSyn to do term rewriting in $\lambda_{syn}$ for branch pruning. These are particularly useful for reducing boolean programs. Rules 4 and 5 allow us to rewrite expressions into their branch condition if the expression body is true or false, reducing expressions like $\texttt{if}\ b\ \texttt{then true else false}$ to $b$. Rules 6 and 7 guess a conditional that is the negation of the other, if the negation holds for the tests. Any $\oplus$ term where two branch conditions are negations of each other reflects a shorter program $\texttt{if}\ b\ \texttt{then}\ e_1\ \texttt{else}\ e_2$ in $\lambda_{syn}$. If it is a boolean program, then it might even enable application of rules 4 and 5, producing a single-line program.
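Rules 4 and 5 can be illustrated with a small Ruby sketch. The `Branch` struct, the `[:not, b]` and `[:or, b1, b2]` encodings of conditions, and the function names are hypothetical; the check for negation is purely syntactic, and rules 6 and 7, which require running the tests, are not modeled.

```ruby
# A branch pairs a candidate body with its condition and the specs it passes.
Branch = Struct.new(:body, :cond, :specs)

# Syntactic check that b1 is the negation of b2, written as [:not, b2].
def negation_of?(b1, b2)
  b1 == [:not, b2]
end

# Apply rule 4 or rule 5 when possible; return nil when neither applies.
def prune(left, right)
  return nil unless negation_of?(left.cond, right.cond)
  cond  = [:or, left.cond, right.cond]
  specs = left.specs | right.specs
  if left.body == true && right.body == false
    Branch.new(left.cond, cond, specs)   # rule 4: the body becomes b1
  elsif left.body == false && right.body == true
    Branch.new(right.cond, cond, specs)  # rule 5: the body becomes b2
  end
end

merged = prune(Branch.new(true, [:not, :admin], [:spec1]),
               Branch.new(false, :admin, [:spec2]))
p merged.body
```

In this example the two branches collapse into a single boolean expression, the first branch's condition, instead of an if-else that returns `true` or `false`.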