Refinement Type Directed Search for Meta-Interpretive Learning of Higher-Order Logic Programs
University of Oxford
MSc in Computer Science 2017-18
Rolf Morel
St. John's College
Supervised by Prof. Luke Ong and Dr. Andrew Cropper
A thesis submitted for the degree of
Master of Science in Computer Science

Abstract
The program synthesis problem within the Inductive Logic Programming (ILP) community has typically been seen as untyped. We consider the benefits of user-provided types on background knowledge. Building on the Meta-Interpretive Learning (MIL) framework, we show that type checking is able to prune large parts of the hypothesis space of programs. The introduction of polymorphic type checking to the MIL approach to logic program synthesis is validated by strong theoretical and experimental results, showing a cubic reduction in the size of the search space and synthesis time, in terms of the number of typed background predicates. Additionally we are able to infer polymorphic types of synthesized clauses and of entire programs. The other advancement is in developing an approach to leveraging refinement types in ILP. Here we show that further pruning of the search space can be achieved, though the SMT solving used for refinement type checking comes at a significant cost timewise.

Contents

3.3 Metagol AI: Untyped Abstracted MIL
3.3.1 Representation of Background Knowledge
3.3.2 The Algorithm
5.2 Metagol PT: Type Checking through Unification
5.2.1 Derivation and General Types
5.2.2 The Algorithm
5.2.3 Forward Propagating Derivation Types
5.2.4 Inferring the Most General Type
5.3 Theoretical Results
5.3.1 Soundness
5.3.2 Completeness
5.3.3 Proportion of Relevant Predicates
5.4 Experimental Results
5.4.1 Derivation Steps
5.4.2 Experiment 1: Search Space Reduction
5.4.3 Experiment 2: Ratio Influence
5.4.4 Experiment 3: Simply Typed Droplasts
6.2.3 Tree-Shaped Grand Refinement
6.3 Grand Refinement Checking
6.3.1 SMT Solvers
6.3.2 Language of Refinements
6.4 Expressiveness of Refinements
6.4.1 Z3 Sequence Theory
6.4.2 DTLIA: Quantifiers, Datatypes and Arithmetic
6.5 Theoretical Results
6.6 Experimental Results
6.6.1 Experiment 4: Droplasts with Sensible Predicates
6.6.2 Experiment 5: Droplasts with Additional Predicates

Chapter 1
Introduction

In the last decade there has been a great surge in the effective application of Artificial Intelligence techniques to many practical problems, e.g. image classification [Deng et al., 2009], code completion [Raychev et al., 2014], as well as autonomous navigation [Bojarski et al., 2016]. Though these advances in AI technology have made previous instances of the above problems tractable, many of the techniques leveraged are not able to give a useful explanation of why decisions are being made (a prime example being artificial neural networks), and struggle to generalize over their entire input space [Gulwani et al., 2015]. For some problems, such as autonomous driving, the ability to understand why certain decisions will be or were made could be of utmost importance with regard to safety and liability issues.
Comprehensibility
The problem with the methods alluded to above is that the generated programs do not necessarily have a description that allows humans to understand and effectively reason about them (e.g. the edge weights in neural networks do not give a ready interpretation of program behaviour). The focus of these approaches is on improving according to Michie [1988]'s weak criterion for machine learning, i.e. improving predictive accuracy. In contrast, programs described by programming languages do have semantics that are comprehensible to both machines and humans, which corresponds to Michie's criterion for strong machine learning. From this it follows that a fruitful approach to explainable AI might be to generate computer programs expressed in programming languages, i.e. code programs. Programming languages also strongly emphasize the composability of programs, motivating the usage of the term synthesis for the generation of code programs. Logic and functional languages in particular are good candidates for synthesis, as their high level of abstraction is associated with smaller programs. Recent work on ultra-strong machine learning [Muggleton et al., 2018] has shown that learning relational logic programs can extend humans' capacity for understanding training data to a level beyond unassisted study.
Inductive synthesis
Code program synthesis is not only useful in terms of explainable AI; it is also of interest for the sake of automating programming. Code synthesis holds promise as a tool for developers to become more efficient, but has also proven effective in empowering non-programmers to obtain programs that fit their needs, e.g. learning spreadsheet transformations [Singh and Gulwani, 2012b]. For these users of program synthesis the synthesis problem is posed as finding a program that accounts for the examples that the user gives (in terms of inputs and outputs) and which is most likely to appropriately generalize from the examples [Gulwani, 2010]. The paradigm that combines both automatic generation of code programs and learning from examples is known as
Inductive Programming [Kitzelmann, 2010]. Compared to general machine learning, inductive programming typically relies on very few examples to learn from; e.g. the single example f([[1,2,3],[4,5]], [[1,2],[4]]) is enough to induce a program that drops the last element from each input list. For programmers this paradigm is useful for problems ranging from which constants to fill in for edge cases (see [Solar-Lezama, 2009]), up to the potential to discover novel efficient algorithms [Cropper and Muggleton, 2018].

Logic programs
The above considerations motivate the study of Inductive Logic Programming (ILP), "a form of machine learning that uses logic programming to represent examples, background knowledge, and learned programs" [Cropper, 2017], an active field of study for almost thirty years [Muggleton, 1991]. The specification of an inductive logic problem is in terms of the examples the program needs to satisfy, along with background information in the form of asserted facts and available program fragments, and possibly rules about the structure of the program to be synthesized. A particularly powerful approach developed recently is that of
Meta-Interpretive Learning (MIL) [Muggleton et al., 2014a]. One of the major benefits of this approach (versus virtually all other existing synthesis systems) is that it is able to do predicate invention, that is, it is able to construct helper predicates/functions. The MIL framework essentially takes the user's problem description and sees it as a specification of a search space of possible programs to consider. The time required for synthesis is most influenced by the size of the search space that is traversed. The implementation of the MIL framework is written in Prolog, a logic programming language that does not consider the types of its terms. The MIL algorithm in turn does not make use of the types of predicates during its search.
Contribution
The thesis of this document is that types are a significant feature for the MIL approach to code program synthesis. Types provide an effective way of pruning nonsensical programs from the search space, i.e. programs already rejected by type checking. Different gradations of types are considered, namely polymorphic types, i.e. types with type parameters, and refinement types (with polymorphism), i.e. types with a proposition restricting their inhabitants. The second system has strictly more accurate types than the first, but its type checking comes at a considerable cost. For polymorphic types we show significant benefits both theoretically and experimentally, demonstrating a cubic reduction in the size of the search space and synthesis time, in terms of the number of typed background predicates. Additionally we are able to infer polymorphic types of synthesized clauses and of entire programs. For refinement typing the foremost contribution is the introduction of refinement types to ILP, while the experimental results indicate more work is needed for refinement type checking to be truly effective.
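The pruning effect of types can be illustrated with a small Python sketch (entirely illustrative: the predicate names and types below are hypothetical, and this is a counting exercise, not the thesis's system). It counts instantiations of the Chain metarule P(A,B) :- Q(A,C), R(C,B) over a set of background predicates, with and without requiring that the output type of Q match the input type of R:

```python
from itertools import product

# Hypothetical typed background predicates: name -> (input type, output type).
background = {
    "tail":    ("list", "list"),
    "reverse": ("list", "list"),
    "head":    ("list", "elem"),
    "succ":    ("int",  "int"),
    "double":  ("int",  "int"),
}

def chain_instantiations(sig, typed):
    """Count P(A,B) :- Q(A,C), R(C,B) instantiations of the Chain metarule.
    If typed, require Q's output type to equal R's input type."""
    count = 0
    for q, r in product(sig, repeat=2):
        if not typed or sig[q][1] == sig[r][0]:
            count += 1
    return count

print(chain_instantiations(background, typed=False))  # 25 untyped (Q, R) pairs
print(chain_instantiations(background, typed=True))   # 10 well-typed pairs
```

Even in this tiny signature, type compatibility rejects more than half of the candidate clause bodies before any example is ever tested.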
Document organization
This dissertation consists of seven chapters. The first chapter presents a brief introduction to the project. The other chapters are described as follows: chapter 2 contains a literature review of code synthesis approaches with a focus on types and ILP. Chapter 3 discusses in detail the existing MIL systems. Chapters 4 through 7 represent the novel contributions of this project. Chapter 4 discusses where MIL can benefit from the introduction of types. The subsequent chapter discusses how polymorphic type checking is introduced to the existing system. Chapter 6 deals with refinement types and with how Satisfiability Modulo Theories (SMT) solving is leveraged for type checking. Both chapters 5 and 6 have sections presenting theoretical and experimental results. The conclusion, chapter 7, considers the implications of the experimental results and relates this outcome to future work.

Chapter 2
Review of Program Synthesis
Following on from the arguments made in the introduction regarding the significance of program synthesis, this chapter opens with a discussion of the strong relation between program synthesis and the notion of proof search. Subsequently the presentation switches to a literature review. Different aspects of the synthesis problem are highlighted, such as the differences in problem definitions, various application areas, and multiple paradigms. We do not aim for comprehensiveness; instead we review a selection of disparate approaches in the literature based on contrasting features. The focus is on code program synthesis approaches for synthesis of functional programs, usually based around typing information, and synthesis of logic programs, where we mainly look at the Inductive Logic Programming (ILP) approach. Methods outside of these areas are also touched upon.
There is a strong correspondence between program synthesis and automated theorem proving. In program synthesis there is a specification for which we try to find a satisfying program. The user might provide helper functions to be used in the synthesized program. In automated theorem proving we state a proposition and want to find a proof for the validity of the formula. The user might provide lemmas that may be used in the proof. As we will see later, many specifications can be expressed as a type that the synthesized program needs to satisfy. The connection between program synthesis and proof search is most eloquently expressed by the Curry-Howard correspondence [Sørensen and Urzyczyn, 2006]. This correspondence states that there is a direct mapping between programs (expressed in type theory) and mathematical proofs. For every type theory, a formal system for describing (functional) programs with their types, there is a mapping between the types of the programs and logical formulae. From this correspondence it follows that type checking is the same as checking that a proof follows the rules of a formal logical system (e.g. natural deduction). This connection means that when we are interested in program synthesis we should be aware of the work that exists "on the other side" of the correspondence, namely in (semi-)automated theorem proving. On the side of theorem proving there is an interesting distinction made between systems that fully automate the construction of proofs [Bibel, 2013] and so-called proof assistants (e.g. Coq [2018] and Isabelle [Nipkow et al., 2002]), where software assists users in writing down proofs in a formal system. One form of assistance provided is the ability to do proof search. Whilst the ideal of theorem proving is full automation, for many non-trivial logical systems this is simply not tractable (in general, it is even undecidable to determine the truth of all propositions).
Proof assistants are instead able to do proof search based on a (partial) proof and lemmas already constructed by the user. The idea of having the user guide proof automation is recognizable in program synthesis as providing background knowledge, in the form of helper functions. How a user of a proof assistant asks the system to find a proof, and upon failure tries to provide additional guidance by supplying additional lemmas (or working further on the proof), is also in clear relation to how users work with synthesis systems. For this project we work in the setting of logic programs. Logic programming is at an interesting intersection of programming and theorem proving, as execution is performed based on SLD-resolution, the algorithm behind first-order theorem provers. For this document we will hence often rely on the terminology of proofs and proof search.
In this section we review several important features of program synthesis. Based on a survey of the literature (focused on type-based synthesis and approaches to ILP), we look at the domains and application areas where synthesis has been successfully applied. Next we look at the different ways that the synthesis problem can be specified by the user. Following on, we look at the types of programs that can be learned by different approaches. We finish with a discussion of the recognisable paradigms within the literature. We follow [Gulwani, 2010] in identifying the key aspects of an approach to the synthesis problem as capturing the user's intent in a specification, the space of programs that is considered, and the approach to searching this space.
In this section we give an idea of the diversity of problems addressable by program synthesis, based on a selection of successful approaches in the literature. The review paper by [Gulwani, 2010] is a great resource for entering upon the field of program synthesis, and unless otherwise annotated the reference for the discussion below is this paper. Bitvector algorithms "typically describe some plausible yet unusual operation on integers or bit strings that could easily be programmed using either a longish fixed sequence of machine instructions or a loop, but the same thing can be done much more cleverly using just four or three or two carefully chosen instructions whose interactions are not at all obvious until explained or fathomed" [Warren, 2002]. In synthesis of bitvector algorithms, initial approaches used straightforward brute-force search [Massalin, 1987], while newer approaches leverage SMT solver reasoning [Jha et al., 2010]. Other systems are able to fill in holes left in programs. The Sketch system [Solar-Lezama, 2009] deals with finding the appropriate values for single-value holes in imperative programs. The holes are typically boundary conditions, e.g. for loops. In template-based synthesis [Srivastava et al., 2013] the conditions on holes are less restricted, in that holes can be filled by arbitrary expressions. Templates are a way for the user to provide their insight to the synthesis system, by writing code or invariants with holes. Constraints are generated for these holes and SMT solvers are used to find solutions. Sometimes we are able to give a very precise specification for algorithms. For example, we may be able to express the condition for properly implementing critical sections and shared variables for mutual exclusion algorithms.
A synthesis system may then be able to synthesize new concurrent algorithms by properly inserting lock acquiring and releasing statements [Bar-David and Taubenfeld, 2003]. More generally we might be able to very precisely describe how functions behave. For functional programs a good place to assert such a specification is in the type of the program [Frankle et al., 2016]. In [Polikarpova et al., 2016] non-trivial algorithms over binary-search trees are synthesized from specifications expressed as refinement types. The synthesis technique derives from the type checking rules for the programming language that is considered. There are a couple of domains that are only well suited for synthesis from examples. We highlight string and matrix transformations. A string transformation is a mapping on strings. For example, "[email protected]" might be mapped to "Hoare, T., UK". One of the applications of string transformation is to learn spreadsheet operations [Singh and Gulwani, 2012a]. String transformations are also an often-considered problem in ILP, e.g. [Cropper et al., 2016]. Similarly one can try to learn matrix transformations by giving input-output examples. These examples are typically larger than string transformations, making them a good benchmark test for synthesis systems [Wang et al., 2017]. Another avenue explored is inventing strategies for robots performing tasks. In [Cropper and Muggleton, 2016a] a very high-level strategy is learned for serving either tea or coffee to a table of cups. The system is provided with examples of good behaviour which are used to generate a higher-order logic program describing the actions that the robot should perform.
From the user's perspective one of the most important features of a synthesis system is which problems it is able to solve. In this subsection we present different ways of specifying the synthesis problem. We distinguish two main aspects to specification, namely how to specify the goal and how to provide background knowledge.

Goals
The goal of a synthesis problem is a program that satisfies the specification. There are multiple ways to describe what is expected of the goal. There is the programming-by-example specification, where the conditions on the goal are stated as input-output examples [Cropper and Muggleton, 2016b]. Examples are split into positive examples, ones that the synthesized program needs to satisfy, and negative examples (also called counter-examples), which should not be entailed by the program. A related specification of the goal is programming-by-demonstration [Lau et al., 2000]: in addition to input-output examples, a (partial) trace is provided of the transformations that turned the input into the output. As discussed in the previous section, there is the Sketch/templates approach to specification by writing programs with holes. In essence the program goal is already partially given by the user and only small fragments need to be filled in. A very general approach is to say that any (set of) logical proposition(s) might be used as a specification. The previous section's mutual exclusion conditions would fall in this category. A more rigid framework follows from only allowing specifications over a function's arguments. By the Curry-Howard correspondence these propositions may also be stated as function typing [Frankle et al., 2016]. A recent development is that even examples may be encoded in types [Osera and Zdancewic, 2015], yielding types as a very powerful specification vehicle.
Background knowledge
There are systems that approach synthesis with only a specification. Such systems are forced to perform searches over very large program spaces; the exhaustive search of small lambda-terms satisfying a type, by [Katayama, 2005], is such an example. More common is to accept guidance from the user, in which case we call the provided hints the background knowledge. In inductive logic programming the user may provide background knowledge in the form of facts and defined Horn clauses (which may be higher-order) [Raedt, 2010]. In meta-interpretive learning [Muggleton et al., 2014b] metarules are supplied in addition. Metarules define the structure of clauses which are invented by the synthesis system. In chapter 3 we will look at these metarules in more detail. In the setting where we are trying to synthesize functional programs from specifications embedded in types, background knowledge takes on the following forms: type declarations informing the system of the data structures over which to operate, and type declarations for helper functions (with or without the actual definitions of the functions). For encoding more precise properties refinement types are leveraged. As a final example of background knowledge we highlight that the Sketch and template approaches to synthesis blur the line between providing goal specifications and background knowledge. Partial programs are more typically background knowledge, but are here used as the primary means of specification.
In this subsection we look at some of the paradigms for program synthesis. For the separate approaches we highlight the types of programs that have been synthesized by the method. The most straightforward approach to synthesis is to enumerate the entire space of programs that your method is willing to consider; e.g. [Katayama, 2005] searches over all lambda-terms. We look at several more sophisticated methods. We will look at Inductive Logic Programming here as it will feature prominently in the remainder of the document.
Maintaining consistent programs
One idea is to maintain the space of programs that are consistent with the examples. With no examples the entire space of programs works. By iteratively adding examples one can start reducing this space of consistent programs. Version Space Algebra [Lau et al., 2000] is a powerful approach whereby the space of programs (the version space) is maintained by a partial ordering of programs (usually by generality), which can be fully represented by the maxima and minima of the ordering. An update function is able to shrink these boundaries for each additional example considered. Another approach that is able to use a succinct representation of the space of solutions is the work by Gvero et al. [2013], where types are used to represent classes of expressions that are candidates for code completion queries in IDEs. In [Singh and Gulwani, 2012a], an exponentially sized space of consistent string transformations is maintained in polynomial space by clever sharing of common sub-expressions.
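The core idea of shrinking a space of consistent programs can be sketched extensionally in Python (a deliberately naive illustration with a hypothetical candidate set, not the boundary-set representation that makes Version Space Algebra efficient):

```python
# Candidate "programs": simple string transformations (names illustrative).
candidates = {
    "upper": str.upper,
    "lower": str.lower,
    "strip": str.strip,
    "title": str.title,
}

def shrink(space, example):
    """Keep only the programs consistent with one input/output example."""
    inp, out = example
    return {name: fn for name, fn in space.items() if fn(inp) == out}

space = dict(candidates)
space = shrink(space, ("abc", "abc"))   # 'lower' and 'strip' both survive
space = shrink(space, ("  x  ", "x"))   # only 'strip' survives
print(sorted(space))                    # ['strip']
```

Each example monotonically shrinks the space; Version Space Algebra achieves the same effect without enumerating the space explicitly, by updating only the boundaries of the generality ordering.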
Constraint solving
A very general approach to synthesis is to convert the problem to (logical) constraints. These constraints are written in the language of an off-the-shelf solver (in particular SMT solvers). The results of the solver are then used to construct program solutions. In the Sketch/template approach to synthesis the holes in programs are surrounded by code that imposes conditions on the possibilities for such holes [Solar-Lezama, 2009] [Srivastava et al., 2013]. In constraint-based synthesis of Datalog programs [B et al., 2017] the derivation algorithm for (first-order) logic programming is encoded as logical constraints, with the addition that the predicate symbols are allowed to vary, again subject to constraints. To make the problem tractable the constraints only encode derivations of a bounded length. Any solution found satisfying the constraints for partial derivations is then checked for being an actual solution by using the found program to build a derivation in Datalog itself. If the program does not work, an additional constraint is generated excluding the program from being considered again.
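The generate/check/exclude loop described above can be sketched in Python. This is a toy stand-in: a brute-force enumerator plays the role of the SMT solver, the "bounded" solve only checks the first example (mimicking a partial derivation), and the full check plays the role of running the found program; the linear-function domain is invented for illustration:

```python
from itertools import product

# Toy goal: find coefficients (a, b) with f(x) = a*x + b matching all examples.
examples = [(0, 3), (2, 7)]  # f(0) = 3 and f(2) = 7, so a = 2, b = 3

def bounded_solve(excluded):
    """Stand-in for the constraint solver: return some candidate not yet
    excluded that satisfies only the FIRST example (a 'partial derivation')."""
    for a, b in product(range(-5, 6), repeat=2):
        if (a, b) not in excluded and a * examples[0][0] + b == examples[0][1]:
            return (a, b)
    return None

def full_check(cand):
    """Stand-in for checking the candidate program against all examples."""
    a, b = cand
    return all(a * x + b == y for x, y in examples)

excluded = set()
while True:
    cand = bounded_solve(excluded)
    if cand is None or full_check(cand):
        break
    excluded.add(cand)  # exclusion constraint: never propose this candidate again

print(cand)  # (2, 3)
```

The exclusion set mirrors the blocking constraints added to the solver, guaranteeing progress toward a candidate that survives the full check.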
Inductive logic programming
In ILP the norm is to learn untyped logic programs. Logic programs consist of Horn clauses, implications with a single atom in the consequent [Raedt, 2010]. These programs are commonly interpreted as either Prolog or Datalog code. Prolog represents the SLD-resolution approach to program execution and comes with features that make the language Turing-complete, whilst Datalog uses grounding of first-order logic to propositional logic to determine entailment (which is decidable). While ILP systems are primarily used to synthesize first-order programs, with an important feature being recursive programs, recent work makes invention of higher-order programs possible [Muggleton et al., 2015]. This document will further discuss the Meta-Interpretive Learning (MIL) framework as a particularly strong approach to ILP. A simplistic approach to typing in MIL was already considered in [Farquhar et al., 2015] in order to learn proof strategies. That system only supports simple non-polymorphic types and hardcodes types in metarules.
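The Datalog-style, bottom-up notion of entailment mentioned above can be sketched with a naive fixpoint computation in Python (an illustrative sketch for one hard-coded transitive-closure program, not a general Datalog engine):

```python
def naive_datalog(facts):
    """Bottom-up evaluation of the fixed toy program:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- edge(X, Y), path(Y, Z).
       Derived facts are added until a fixpoint is reached, which must
       terminate because the set of ground atoms is finite."""
    edges = {f for f in facts if f[0] == "edge"}
    derived = {("path", x, y) for (_, x, y) in edges}
    while True:
        new = {("path", x, z)
               for (_, x, y) in edges
               for (p, y2, z) in derived
               if p == "path" and y2 == y}
        if new <= derived:          # nothing new derivable: fixpoint reached
            return derived
        derived |= new

facts = {("edge", "a", "b"), ("edge", "b", "c"), ("edge", "c", "d")}
paths = naive_datalog(facts)
print(("path", "a", "d") in paths)  # True
```

Entailment of a ground atom is then just membership in the finite fixpoint, which is why it is decidable, in contrast to Prolog's Turing-complete top-down search.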
Typing
As already stated, the specification of functions is well addressed by types. There are a number of approaches in the literature on using types to synthesize functional programs. The norm is to synthesize programs according to an (existing) type system, with the guarantee that the program type checks. Important features include parametric polymorphism, algebraic data types, and structural recursion over these types. Modern systems are able to utilize refinement types as specification, and also to direct the search [Polikarpova et al., 2016]. Type-directed synthesis usually takes an approach that is very similar to theorem provers. As the program will need to conform to a type according to the rules of a type system, these rules are actually used to direct the search [Osera and Zdancewic, 2015]. The type specification is decomposed according to the rules, and non-deterministic choices are made when the premises of the rules need information that is not captured in the conclusion, i.e. the current program fragment to be proven. It is interesting to note that type-directed synthesis is quite flexible. In [Frankle et al., 2016] it is noted that examples can be encoded in types, meaning that types are expressive enough to capture logical requirements on programs as well as inductive synthesis from examples. In [Gvero et al., 2013] types are used to direct the generation of code completions in IDEs. Here types are viewed as sets of consistent expressions, all being candidates to be enumerated.
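A minimal flavour of type-directed search can be given in Python (an illustrative sketch with hypothetical background functions: only two-step compositions are enumerated, types prune candidates before evaluation, and an example does the final check):

```python
from itertools import product

# Hypothetical typed background functions: name -> (input type, output type, impl).
bk = {
    "tail":    ("list", "list", lambda xs: xs[1:]),
    "reverse": ("list", "list", lambda xs: list(reversed(xs))),
    "head":    ("list", "elem", lambda xs: xs[0]),
}

def synthesize(goal_in, goal_out, example):
    """Enumerate compositions f . g whose types chain goal_in -> goal_out,
    returning the first one consistent with the input/output example."""
    inp, out = example
    for g, f in product(bk, repeat=2):
        g_in, g_out, g_fn = bk[g]
        f_in, f_out, f_fn = bk[f]
        # Type-directed pruning: the composition must be well-typed.
        if g_in == goal_in and g_out == f_in and f_out == goal_out:
            if f_fn(g_fn(inp)) == out:
                return f"{f} . {g}"
    return None

# 'last element of a list' is found as head . reverse:
print(synthesize("list", "elem", ([1, 2, 3], 3)))  # head . reverse
```

Ill-typed compositions (e.g. anything applying a function after head in this signature) are discarded without ever being run, which is the essence of letting typing rules direct the search.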
Arbitrary DSLs
A more ambitious approach is to synthesize programs not restricted to a particular programming language. In work by [Wang et al., 2017] a domain expert provides a domain-specific language (DSL) and the end-user provides examples. Along with the DSL, concrete semantics (an evaluation function) and abstract semantics (a mapping to abstract values, e.g. value ranges) are provided. The domain expert chooses the appropriate abstractions made available to the system. The approach uses finite tree automata (FTA) to encode abstract syntax trees (ASTs) of the DSL, with predicates on the nodes encoding their abstract semantics. The FTA is used to maintain a set of programs whose abstract semantics are consistent with the examples. A "best" program is selected among the accepted programs and is checked against the examples. If unsuccessful, the tree automaton is modified such that more nodes with abstract semantics become available, with the guarantee that the previously selected programs are no longer successful. This is iterated until a successful program is found. It is shown in the paper that this system is able to outperform multiple other purpose-built systems. However, a severe limitation of this approach is that it does not handle DSLs with binders, e.g. lambda-terms and let bindings.

Chapter 3
Untyped Meta-Interpretive Learning
This chapter contains an overview of the existing meta-interpretive learning (MIL) approach to one variant of the inductive logic programming problem. First, we state prerequisites before presenting a formal definition of the problem addressed by MIL. Subsequently the MIL framework is explained, along with the central role of metarules. Finally, the high-level algorithm is presented in the form of Metagol AI, an implementation of MIL which incorporates a powerful extension to allow for higher-order abstractions. We work within the framework of logic programming. A reader unfamiliar with this topic is referred to [Nienhuys-Cheng and De Wolf, 1997] for a comprehensive treatment. The primary logic programming features we will assume familiarity with are logical variables, unification on logical variables, and SLD-resolution (along with seeing an SLD-tree as a derivation/proof of a goal atom). We use the language of logic to introduce the requisite concepts. The definitions in this chapter mainly follow those of Cropper [2017]'s PhD thesis.
First-order
A variable is a character string whose initial letter is uppercase. Function and predicate symbols are character strings whose initial letter is lowercase. The arity n of a predicate/function symbol p is the number of arguments it takes and is denoted as p/n. The predicate signature P is the set of predicate symbols with arity greater than 0. A constant is a function symbol with arity zero. The constant signature C is the set of constant symbols. A variable which can be substituted by a constant or function symbol is called first-order. The set of first-order variables is denoted V_1. A term is a variable, a constant symbol, or a function symbol of arity n immediately followed by a bracketed n-tuple of terms. A term which contains no variables is called ground. A formula p(t_1, ..., t_n), where p is a predicate symbol of arity n and each t_i is a term, is called an atom. An atom is ground if all of its terms are ground. The ¬ symbol is used for negation. A literal is an atom A or its negation ¬A.

Clauses
A clause is a finite disjunction of literals. Each variable in a clause is implicitly universally quantified. A clause that contains no variables is ground. Clauses with at most one positive literal are called Horn clauses. A Horn clause with exactly one positive literal is called a definite clause:

Definition 3.1.1.
A (first-order) definite clause is of the form A_0 ← A_1, ..., A_m, where m ≥ 0 and each A_i is an atom of the form p(t_1, ..., t_n), such that p ∈ P and t_i ∈ C ∪ V_1. The atom A_0 is the head and the conjunction A_1, ..., A_m is the body. A definite clause with no body literals is called a fact. A Horn clause with no head, i.e. no positive literal, is called a goal.

Higher-order
For higher-order logic, the quantification of first-order logic is extended to allow for quantifiers to range over predicate and function symbols. A variable which can be substituted by a predicate symbol is higher-order. The set of higher-order variables is denoted V_2. A higher-order term is a higher-order variable or a predicate symbol. An atom which has at least one higher-order term is higher-order. A definite clause with at least one higher-order atom is higher-order:

Definition 3.1.2. A higher-order definite clause is of the form A_0 ← A_1, ..., A_m, where m ≥ 0 and each A_i is an atom of the form p(t_1, ..., t_n), such that p ∈ P ∪ V_2 and t_i ∈ C ∪ P ∪ V_1 ∪ V_2.

Substitution
Given a formula with variables v_1, ..., v_n, simultaneously replacing the variables with terms t_1, ..., t_n is called a substitution. Such a substitution is denoted by θ = {v_1/t_1, ..., v_n/t_n}. A substitution θ unifies atoms A and B in the case Aθ = Bθ.

3.2 Meta-Interpretive Learning

This section starts by formally introducing the problem addressed by MIL. To do so we first need the key MIL concept of metarules. The second part of this section gives an overview of meta-interpreting and how metarules lift this notion to meta-interpretive learning.
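As a concrete illustration of the substitution and unification machinery that meta-interpreting relies on, here is a minimal Python sketch (variables are strings starting with an uppercase letter, per the convention above; it handles constants and variables only, with no function symbols and no occurs check):

```python
def is_var(t):
    # Per the convention above, a variable starts with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def apply_subst(theta, atom):
    """Apply substitution theta to an atom (pred, arg1, ..., argn)."""
    pred, *args = atom
    return (pred, *[theta.get(a, a) for a in args])

def unify(atom_a, atom_b):
    """Return a substitution theta with atom_a . theta == atom_b . theta,
    or None if the atoms do not unify."""
    if atom_a[0] != atom_b[0] or len(atom_a) != len(atom_b):
        return None
    theta = {}
    for s, t in zip(atom_a[1:], atom_b[1:]):
        s, t = theta.get(s, s), theta.get(t, t)  # look through earlier bindings
        if s == t:
            continue
        if is_var(s):
            theta[s] = t
        elif is_var(t):
            theta[t] = s
        else:
            return None  # two distinct constants cannot unify
    return theta

theta = unify(("parent", "A", "bob"), ("parent", "alice", "B"))
assert theta == {"A": "alice", "B": "bob"}
assert apply_subst(theta, ("parent", "A", "B")) == ("parent", "alice", "bob")
```

A real Prolog engine performs the same operation on full first-order terms; the simplification here is enough to follow the resolution examples in this chapter.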
The user supplies a set of examples E and background knowledge B. All examples e ∈ E are ground atoms over the same predicate name. The examples E = (E+, E−) are divided into positive and negative examples. The background knowledge B = D ∪ M consists of definite clauses D, representing program fragments, and metarules M. The definite clauses also encode the facts (clauses without a body) that are asserted.

Definition 3.2.1.
A higher-order formula of the form ∃π ∀µ. A_0 ← A_1, . . . , A_m, where m ≥ 0 and π and µ are disjoint sets of higher-order variables, is called a metarule. Each A_i is an atom of the form p(t_1, . . . , t_n) such that p ∈ P ∪ π ∪ µ and each t_i ∈ C ∪ P ∪ π ∪ µ.

Metarules differ from higher-order definite clauses in that they allow existential quantification. Table 3.1 lists common metarules, a selection of which will be used throughout this document. Note that we elide the quantifiers, e.g. the full definition of the Identity metarule is ∃P ∃Q ∀A ∀B. P(A, B) ← Q(A, B). When quantifiers are omitted, we always label universally quantified first-order variables with names from the start of the alphabet (A, B, C, . . .) and existentially quantified higher-order variables with names from P onwards in the alphabet.

Definition 3.2.2.
Given (B, E) = (B, (E+, E−)) as background knowledge and examples, respectively, a program H is a consistent hypothesis, denoted H ∪ B ⊨ E, if all positive examples are entailed by the program and none of the negative examples are.

Definition 3.2.3. A MIL learner takes an input (B, E) and either outputs a definite program H that is a consistent hypothesis for the input, or terminates stating failure to find a program.

    Name      Metarule
    Identity  P(A, B) ← Q(A, B)
    Precon    P(A, B) ← Q(A), R(A, B)
    Curry     P(A, B) ← Q(A, B, R)
    Chain     P(A, B) ← Q(A, C), R(C, B)
    Tailrec   P(A, B) ← Q(A, C), P(C, B)

Table 3.1: Common metarules, where variables A, B, and C are universally quantified and P, Q, and R are existentially quantified.

We consider MIL learners that have the additional guarantee that they return programs that are optimal in a textual complexity sense. The optimizing criterion used is the number of clauses in the program: the guarantee is that there is no other consistent hypothesis with fewer clauses.

A Prolog meta-interpreter evaluates a Prolog-like language by unifying a goal with the head of one of the first-order clauses that it has available. The atoms in the body of the unified clause become new goals subject to the same procedure. For example, consider SLD-resolution on the goal ancestor(alice, charlie), given the definite clauses

    parent(alice, bob)
    parent(bob, charlie)
    ancestor(A, B) ← parent(A, B)
    ancestor(A, B) ← parent(C, B), ancestor(A, C)

Resolution with the last clause (unifying the goal with the head of this clause) yields the new goals ancestor(alice, C) and parent(C, charlie). To prove the first of these two goals the first clause with ancestor as head is chosen (non-deterministically out of the two options). The two goals are now parent(alice, C) and parent(C, charlie). Unifying the first goal with the first fact fixes C to bob, allowing both goals to be proven by the asserted facts.

The Meta-Interpretive Learning framework is a meta-interpreter that additionally tries to unify a goal with the head of a metarule, which is a higher-order clause.
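The ancestor example above can be run directly as a Prolog program:

```prolog
% parent facts and the ancestor definition from the example
parent(alice, bob).
parent(bob, charlie).

ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(C, B), ancestor(A, C).

% ?- ancestor(alice, charlie).   succeeds: resolution binds C = bob
```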
Upon selecting a metarule to prove the goal, the unification is saved in the form of a meta-substitution.

Definition 3.2.4. Let M be a metarule with the name x, C be a Horn clause, θ be a unifying substitution of M and C, and Σ ⊆ θ be the substitutions whose variables are all existentially quantified in M, such that Σ = {v_1/t_1, . . . , v_n/t_n}. Then a meta-substitution for M and C is an atom of the form sub(x, [v_1, . . . , v_n]{v_1/t_1, . . . , v_n/t_n}), where the second argument is a list of logical variables with the appropriate substitution applied.

Saved meta-substitutions are reused for proving goals that are encountered later on, becoming available as unification targets like the definite clauses in the background knowledge. Upon completing the proof of all goals the meta-substitutions contain a description of the program. To obtain the program the saved substitutions are applied to the named metarules. Each such instantiated metarule corresponds to an invented clause of the program, with the predicate symbols being grounded through the substitution.

Definition 3.2.5.
Let (B, E) be a MIL input and H a hypothesis. Then a predicate p is an invention if it is in the predicate signature of H and not in the predicate signature of B ∪ E.

Metarules form the heart of the MIL approach to learning. Metarules allow for introducing a strong bias to the hypothesis space, as the clauses of programs in the hypothesis space have to conform to the structure of the supplied metarules. Metarules hence give great control over the size of the search space, as well as over how the search space is traversed. They also allow the user to specify which features they consider likely to be needed by the program, for example (tail-)recursion and higher-order clauses. In recent work by Cropper and Muggleton [2016a] the MIL framework is expanded so as to allow background predicates and inventions with higher-order arguments.

3.3 Metagol AI: Untyped Abstracted MIL

The implementation of MIL that we consider as a basis in this document is the Metagol AI system introduced by Cropper and Muggleton [2016a]. This section shows how to specify a problem instance in Prolog code and goes over a high-level version of the algorithm.

The improvement of Metagol AI over the original Metagol implementation of MIL [Muggleton et al., 2014a] is in allowing higher-order predicates to be part of the program, in particular predicates that are termed abstractions. As shown in [Cropper and Muggleton, 2016a], learning higher-order inventions has benefits in terms of finding smaller programs, which leads to reduced learning times, and can achieve high accuracy with a reduced number of examples.

Definition 3.3.1. A higher-order definition is a set of higher-order definite clauses with matching head predicates.

The following definition of map, operating over lists, is an example of a higher-order definition:

    map([], [], F) ←
    map([A|As], [B|Bs], F) ← F(A, B), map(As, Bs, F)

Any clause which takes an argument that is a predicate is termed an abstraction:

Definition 3.3.2.
A higher-order definite clause of the form

    ∀τ. p(s_1, . . . , s_m) ← q(u_1, . . . , u_n, v_1, . . . , v_o)

where o > 0, τ ⊆ V ∪ V_2, p, q, v_1, . . . , v_o ∈ P, and s_1, . . . , s_m, u_1, . . . , u_n ∈ V, is called an abstraction.

The following clause, which increases each element of a list by one, is an example of an abstraction:

    increment_all(A, B) ← map(A, B, succ).

For Prolog code we will use the typewriter font. Definite clauses in Prolog are very similar to the logic syntax, except that the arrow (←) is replaced by an ASCII version (:-) and every clause is terminated by a dot. For clauses with empty bodies the ersatz arrow is omitted.

The background knowledge is separated into three parts: primitive clauses, interpreted clauses and metarules. Primitive clauses are standard Prolog definite clauses, and as such they do not involve atoms whose predicate symbol is a variable. Such clause definitions are added to the background knowledge by asserting that they are available as a primitive, e.g. id(X,X). is a definite clause, and by asserting prim(id/2) the predicate is added to the background knowledge. (A list in Prolog is either [], the empty list, or a cons of a head H and a tail T, denoted [H|T].) Interpreted clauses are asserted as interpreted, e.g. interpreted(map/3).

Asserting metarule(Name,Subs,(Head:-Body)) adds a metarule with the name Name to the background knowledge.
Subs is a list of the existential (higher-order) variables occurring in the metarule, Head is the head atom of the rule and Body is a list of the body atoms of the rule. Take the Chain rule as an example:

    metarule(chain,[P,Q,R],(P(A,B):-[Q(A,C),R(C,B)])).
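For instance, applying the meta-substitution sub(chain, [f, tail, head]) to the Chain metarule (the predicate names f, tail and head are illustrative) yields the first-order clause:

```prolog
f(A, B) :- tail(A, C), head(C, B).
```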
Figure 3.1 contains the Prolog code for the abstracted MIL algorithm, which has support for higher-order abstraction.
Invocation
The first clause, learn, is the invocation point of the algorithm. After the user has asserted their background knowledge, they run the algorithm by calling learn with a list of their positive and a list of their negative examples whereby, upon success, Prog gets instantiated to a program. The algorithm starts out with an empty program, the first argument to prove in the body of learn. If a program is found entailing the positive examples it is checked against the negative examples. Were one of the negative examples to be entailed, backtracking occurs and the search continues for another program entailing the positive examples. When all the positive examples are entailed by the found program, and none of the negative examples are, the search successfully terminates.
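A typical invocation might look as follows, assuming a suitably asserted background knowledge; the droplast predicate and its examples are illustrative, not part of the system:

```prolog
% assumed: prim/1 assertions for list primitives and metarule/3 assertions
?- learn([droplast([1,2,3],[1,2]), droplast([a,b],[a])],
         [droplast([1,2],[2])],
         Prog).
```

On success, Prog is bound to a list of meta-substitutions which, applied to the named metarules, yields the synthesized clauses.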
Search
The prove procedure is mainly for choosing the first goal in a list of goals to hand off to the prove_aux procedure. The mutual recursion of prove and prove_aux implements a left-most depth-first search over the goals that arise during synthesis. When a goal has been proven successfully by prove_aux it may have changed the program under construction, hence the new program is passed along for proving the remaining goals in Atoms. As the last disjunctive clause of prove_aux always succeeds by introducing a new invented clause, an additional mechanism, not shown in the code, imposes a limit on the number of inventions. The limit makes sure that the depth-first search does not run off and keep creating invented clauses for goals that are difficult (or impossible) to prove. By re-running the algorithm with an increasing invention limit all possible programs are considered.

    learn(Pos,Neg,Prog):-
        prove(Pos,[],Prog),
        not(prove(Neg,Prog,Prog)).

    prove([],Prog,Prog).
    prove([Atom|Atoms],Prog1,Prog2):-
        prove_aux(Atom,Prog1,Prog3),
        prove(Atoms,Prog3,Prog2).

    prove_aux(Atom,Prog,Prog):-
        prim(Atom),
        call(Atom).
    prove_aux(Atom,Prog1,Prog2):-
        interpreted((Atom:-Body)),
        prove(Body,Prog1,Prog2).
    prove_aux(Atom,Prog1,Prog2):-
        member(sub(Name,Subs),Prog1),
        metarule(Name,Subs,(Atom:-Body)),
        prove(Body,Prog1,Prog2).
    prove_aux(Atom,Prog1,Prog2):-
        metarule(Name,Subs,(Atom:-Body)),
        prove(Body,[sub(Name,Subs)|Prog1],Prog2).

Figure 3.1: The Metagol AI algorithm.

Additional proving rules
For the most important part of the algorithm, we consider each of the disjunctive bodies of prove_aux separately:

• The first disjunct tries to prove an atom (whose predicate symbol might be a variable) by unifying the atom's predicate with a predicate asserted as background knowledge, considering all options based on the predicate's arity. When this unification succeeds the atom is unified with the head of the predicate, whereupon it is known that Atom represents a first-order atom which Prolog is able to evaluate. Evaluation is invoked by call(Atom). If successful, the atom's predicate symbol will remain fixed for any goals subsequently generated. If the call to call fails, or all possible programs which included this particular choice for the atom have been tried, causing Prolog to backtrack to this decision point, all remaining predicates in the background knowledge will be tried in the same manner. If none leads to a successful program this disjunct of prove_aux fails, causing the next disjunct to be tried.

• The second disjunct tries to prove the goal atom by unifying the atom with the head of one of the interpreted background predicates. Again, if the predicate symbol is a variable it will become fixed upon successful unification. By unifying with the head of an interpreted clause the corresponding variables in the Body change accordingly, thereby making the body of the interpreted clause the goals that need to be proven in order to conclude the unified head.

• Upon failure to find a successful program by proving the atom with an interpreted predicate, the algorithm checks whether it may make use of any of the invented clauses that it keeps track of in the form of meta-substitutions. For each meta-substitution already in the program the algorithm tries to unify the atom with a fresh head of the meta-substitution's metarule, under the restrictions of the variables already fixed in the substitution list Subs. If it succeeds, the body of the reused invented clause represents the goals that need to be proven.

• Finally, the last disjunct applies when all other disjuncts have failed to lead to a successful program. Now a metarule is instantiated, with no conditions on the substitutions Subs for the existentially quantified predicate symbols. This means that Atom is used to create a new invention. The body atoms of the metarule become the goals to be proven and the invention is remembered by prepending a new meta-substitution to the existing program.

In overview: a depth-first search is conducted over partial programs, starting with the examples as goals. Each goal is first attempted to be proven by one of the primitive and interpreted predicates. In the first case Prolog is able to evaluate whether the atom holds, and in the second case the body of the higher-order predicate remains as goals to be proven by meta-interpretation. Otherwise the saved invented clauses are tried, again leading to additional goals to be proven. As a last resort an atom can be proven by creating a metarule-guided invention, where the body of the invention will likely contain goal atoms with predicate variables. If there are no more goals to prove a successful program has been found.

Chapter 4

Typed Meta-Interpretive Learning
Starting with this chapter, this document will focus on presenting novel contributions. This chapter highlights the potential benefits of adding typing to Meta-Interpretive Learning (MIL) before the subsequent chapters focus on tackling the search space reduction issue. After some preliminaries regarding types, three areas are discussed where types bring great potential. The first is the ability to prune away parts of the search space. Second, we consider the idea that types can provide guidance in choosing how to traverse the search space. The final highlighted feature is that it might be possible to inform the user that the background knowledge they provided is insufficient for solving the synthesis problem.
As is usual in presenting logic, the definitions of the previous chapter did not include any type decorations. For our purposes a type is a set of values; a value that belongs to a type is called an inhabitant. For a predicate or function it is normally known for which values it makes sense, and we restrict the domain of predicates and functions to just these values. We use such domain restrictions, along with co-domain restrictions, in defining the formal types of predicates and functions. As a general reference for refinement types with polymorphism see [Freeman and Pfenning, 1991].
Polymorphic types
The types we consider are constructed from a collection of base types B, among others the integers, int, and the Booleans, denoted bool. We also allow holes in our types, for which we use type variables. The Cartesian product (×) is used to construct types whose values are tuples of values from the supplied types. The arrow (→) is used to construct function types from other types. The grammar for our types is

    T ::= b_i | X_j | T × T | T → T,

where b_i ∈ B and X_j ranges over type variables. (This is in essence the type grammar of System F [Girard, 1971].) A type is higher-order if a function type appears in an argument position.

Definition 4.1.1.
A predicate p, with arity n, is a function whose result type is Boolean. The predicate type of p is fixed by the types of its n arguments. This is denoted as p : T_1 × . . . × T_n → bool, or p(a_1, . . . , a_n) : T_1 × . . . × T_n → bool, where a_i : T_i are the formal arguments of p.

We will abbreviate the predicate typing p : T_1 × . . . × T_n → bool to the more compact p : [T_1, . . . , T_n]. In what follows we will be lax in distinguishing between the type of an atom (always bool) and the type of the atom's predicate. For function typing a similar definition holds, except that the co-domain is not fixed to be Boolean. For functions we do not use shorthand notation.

A type is polymorphic when it is parametrized by one or more types, i.e. it contains type variables. We always consider type variables to be universally quantified. For example, the polymorphic type of lists is list(X). Replacing the type parameter X with a type resolves the polymorphism, e.g. taking X = int, list(int) is the type of lists of integers, which is no longer polymorphic. A predicate is polymorphic when one of its argument types is.

Refinement types
Given the above description of types we define the notion of refinement type, and refer to unrefined types as simple (polymorphic) types. A refinement type {x : T | ϕ} is the subset of the type T consisting of the values x that satisfy the formula ϕ. For this work we only allow refinements to occur on predicates.

In order to discuss the meaning of refinements on predicates, recall the interpretation of predicates: a predicate implements a relation, which identifies tuples, by mapping arguments included in the relation to ⊤ (true) and otherwise returning ⊥ (false).

Let p(a_1, . . . , a_n) : T_1 × . . . × T_n → bool be a predicate with its type. A refinement ϕ for this type is a proposition that can mention any of the variables a_i. The refined type is denoted as T_1 × . . . × T_n → bool⟨ϕ⟩, with shorthand [T_1, . . . , T_n]⟨ϕ⟩. The semantics of this type is that ϕ denotes a necessary condition on any ⊤ inhabitant of the Boolean result type. In conventional refinement type notation this corresponds to

    bool⟨ϕ⟩ = { b : bool | b = ⊥ ∨ (b = ⊤ ⇒ ϕ) }
            = { b : bool | b = ⊥ ∨ ϕ }
            = { b : bool | ¬ϕ ⇒ b = ⊥ }.

Given a valuation for the arguments of p, the proposition ϕ has a truth value. When a valuation makes ϕ false, the values of the arguments to the predicate cannot be identified by the relation that the predicate represents, as the predicate must return false for these arguments. Hence refinements are a way of restricting types to make them more accurate.

As an example consider the higher-order map predicate, as it is a good candidate for a refinement type. The map relation guarantees that its list arguments are of the same length.
This is encoded in a refinement type as follows:

    map(A, B, F) : [list(T), list(S), [T, S]]⟨length(A) = length(B)⟩

It is important to note that simple polymorphic types are precisely our refinement types with all refinements being just the true proposition (not imposing any restrictions). In this text we will elide types (and their refinements) when they are not of interest to the discussion at hand.
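Such an annotation could be supplied alongside the simple type in the same assertion style used for primitives; the concrete syntax below is hypothetical (the where operator is not part of the system as described) and is intended only to show the shape of a refinement-annotated assertion:

```prolog
% hypothetical assertion syntax: a simple type plus a refinement formula
% over the predicate's formal arguments A and B
prim(map:[list(T), list(S), [T, S]] where length(A) =:= length(B)).
```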
The MIL approach is surprisingly effective given its simplicity. The algorithm is quite naive in regard to which candidate programs it is willing to consider. One major shortcoming, inherited from its logic programming origins, is its disregard of types. Many logic programming languages do not concern themselves with types; in particular, Prolog does not have predicate types.

Consider having to prove a goal atom P([true, false, true], [1]), with P a predicate variable. We can immediately see that its type is [list(bool), list(int)]. Let us suppose that the given background predicates include succ : [int, int] (the successor relation on integers), tail : [list(X), list(X)] (the list tail relation), and map : [list(X), list(Y), [X, Y]] (the relation that maps another relation over elements). The metarules we consider for this example are Chain and Curry (see table 3.1). Clearly any predicate in the background knowledge over non-list types need not be tried as a direct substitution for P. Also many inventions and interpreted predicates need not be tried, such as inv(A, B) ← tail(A, C), map(C, B, succ), purely due to the typing of this clause being incompatible with the type of the example goal.
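Since the types are ordinary Prolog terms, such a compatibility check amounts to unifying the goal's type with a fresh copy of the clause's type; a minimal sketch (the predicate name compatible/2 is ours, not the system's):

```prolog
% type compatibility as unification: rename the clause type apart, then unify
compatible(GoalType, ClauseType) :-
    copy_term(ClauseType, FreshType),
    GoalType = FreshType.

% ?- compatible([list(bool), list(int)], [list(X), list(X)]).  % tail/2's type: fails
% ?- compatible([list(bool), list(int)], [list(X), list(Y)]).  % succeeds
```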
The work involved in the above example is considerable given how easy it is to see, based on typing, that it cannot succeed. First P is unified with inv, whereupon unification of the atoms assigns A = [true, false, true] and B = [1]. The body atoms tail([true, false, true], C) and map(C, [1], succ) become the new goals, being passed off to be proven by a recursive prove call. The atom tail([true, false, true], C) is proven in the first disjunct of prove_aux, assigning C = [false, true]. Now the second goal map([false, true], [1], succ) is considered. None of the primitive predicates unify with the predicate name, hence the interpreted clause of prove_aux is evaluated. Here the head of map is first checked for whether it happens to unify with the base case over empty lists, which it does not. Subsequently the head of the inductive disjunct of map is unified, whereupon succ(false, 1) and map([true], [], succ) become the new goals. Again prove is called recursively, whereupon succ(false, 1) fails. Only now is Metagol AI able to decide that the invented clause inv(A, B) ← tail(A, C), map(C, B, succ) could not be used, leading the meta-interpreter to backtrack to the decision point of having to pick a way of proving the original atom P([true, false, true], [1]).

Type Checked
In contrast, when we consider types, a unification attempt is enough to determine that the above work is unnecessary. The type annotations for the atoms are

    P([true, false, true], [1]) : [list(bool), list(int)]

and

    inv(A, B) : [list(int), list(int)] ← tail(A, C), map(C, B, succ).

Type checking corresponds to checking whether the type of inv can have zero or more of its variables unified such that it corresponds to P's type. Clearly the first arguments' types cause unification to fail, thereby rejecting the attempt to prove the goal atom with the invented clause.

This is just one example of where simple type checking helps considerably. Every time type checking can determine that one of the options considered by Metagol AI cannot be successful, exploration of a part of the search space can be skipped over, which is called pruning. When such parts of the search space contain multiple decision points considered by the algorithm, the benefit of type checking becomes considerably more significant.

Polymorphic types are already quite powerful in pruning parts of the search space. But suppose we make a simple adjustment to the above example. Instead of our goal's arguments including Booleans we change these values to integers, making the example goal P([1, 0, 1], [1]). A type check determines that the atom's type [list(int), list(int)] is unifiable with [list(int), list(int)], hence giving the go-ahead to attempt to prove the atom with inv.

Unnecessary work revisited
Because polymorphic type checking is unable to rule out the invented clause, the algorithm is forced to do the same work as detailed above. When it reaches the atom succ(0, 1), Prolog will instead determine that this atom does hold. Now even more work is performed: the remaining goal, map([1], [], succ), is passed off to prove_aux, which again checks that this is not a primitive before checking the interpreted disjunctive clauses of map. Thanks to the first argument the base case does not apply, and because the second argument does not allow itself to be split into a head and tail the inductive case of map also does not apply. So in this case, there was not only more work in trying to prove additional goals before finally finding out that the invented clause cannot be used, but we also incurred a small cost for the unification attempt of the type checking.

Intuitively, when we look at the example goal and at the invented clause we are still able to immediately see that this clause cannot work. This is due to being able to reason about the lengths of the lists involved. Clearly all the tuples entailed by the invented clause have the property that the first argument has exactly one additional element versus the second argument.

Refinement reasoning
The above noted relation on the arguments of inv is a property well captured by a refinement type and can be stated as:

    inv(A, B) : [list(int), list(int)]⟨length(A) = length(B) + 1⟩

This type states that the predicate can only entail the arguments if the first list has a length exactly one longer than the second list. If we now return to the type checking of P([1, 0, 1], [1]) : [list(int), list(int)] against the type of inv, we not only try to unify the polymorphic types, but we also check that the refinement is still satisfiable, i.e. does not preclude the arguments from being entailed. What happens is that the heads are speculatively unified, leading to the refinement becoming instantiated to length([1, 0, 1]) = length([1]) + 1. For this example checking the refinement can simply be done by expanding the definition of length, which directly derives the contradiction 3 = 2. The refinement type hence declares that these values are never part of the relation encoded by the inv clause. At this point the algorithm will give up on inv and will try the next option for proving the example atom. Refinement type checking is hence able to prune parts of the search space that simple polymorphic type checking is not able to.

There is, of course, the issue that reasoning over (instantiated) refinements is not entirely trivial. For this example the work needed to arrive at a contradiction was very simple. The main challenge for the usage of refinement types is in identifying refinements that suitably abstract the predicates, and in finding algorithms that are able to reason very rapidly on the constraints specified by the refinements.

Composing refinements
The above identified refinement for inv is not something a user would be able to supply to the system, as the clause in question is an invention. The body atoms, on the other hand, are background predicates which the user can annotate with appropriate refinements. The refinement for tail(A, C) could well be length(A) = length(C) + 1, while map(C, B, F)'s refinement can be taken to be length(C) = length(B). The structure of the clause now indicates how these refinements compose, namely length(A) = length(C) + 1 ∧ length(C) = length(B). Clearly, this approach to refinements is quite powerful in that it is able to capture sensible refinements even for invented clauses.

The Metagol AI algorithm uses a basic depth-first search procedure for determining the order in which goals are proven. A novel observation is that this search might be better steered by the information available in the goals that remain to be proven. The idea is that the ordering of the goals in the search is guided by a heuristic function. Correspondingly the search algorithm needs to be adjusted so as to implement the best-first search procedure. Best-first search keeps track of all the nodes left to explore, in our case the current goals left to prove, and selects among them the best node to explore/prove next (according to the heuristic). As it is non-obvious what the main features of interest are and how much weight they should have relative to each other, work is needed in determining good heuristic ordering functions.

The goals are the basic units whose proof order needs to be decided on, which, in the untyped case, have quite limited information available. Obvious considerations are to prioritize those goals that are already entirely ground, including the predicate symbol(s). Amongst them the atoms with primitive predicate symbols would need to be sorted first, as an inconsistency on such a goal would lead to immediate backtracking.
Next one would sort on the number of variables in the goals, preferring atoms with fewer variables, as these are less likely to incur non-determinism in their proof. Issues in ordering start to crop up when deciding on the ordering of atoms with different arities, and on whether the proportion of ground arguments might not be a more useful feature than just their sheer number.

Type Guidance
The introduction of types is interesting for heuristic search because it adds information to use for determining the ordering. As we saw in the section on pruning the search space, the types of predicates are often already known while the arguments of such predicates are still variables. During synthesis, e.g. of a new invention, there are also variables in the types, meaning that some types are more "complete" than others.

A heuristic in the typed environment could consider ordering goals with complete types (i.e. no variables in them) first, as these types are able to rule out large parts of the background predicates. Subsequently it would be useful to consider how to order the predicates with incomplete types. To define a useful heuristic it would then be necessary to weight the type features relative to the non-type features.
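Such an ordering could be prototyped by scoring each goal, for instance by its number of distinct unbound variables. The predicates below (goal_cost/2, select_best/3) are a hypothetical sketch of this idea, not the system's actual heuristic:

```prolog
% score a goal by how many distinct unbound variables it still contains
goal_cost(Goal, Cost) :-
    term_variables(Goal, Vars),
    length(Vars, Cost).

% pick the cheapest goal to prove next, returning the remaining goals
select_best(Goals, Best, Rest) :-
    map_cost(Goals, Costed),
    keysort(Costed, [_-Best|_]),
    select(Best, Goals, Rest).

map_cost([], []).
map_cost([G|Gs], [C-G|Cs]) :-
    goal_cost(G, C),
    map_cost(Gs, Cs).
```

A fuller heuristic would additionally weight type completeness, as discussed above.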
Implementation
The accompanying synthesis system implemented for this project has best-first search available as an option. There is a proof-of-concept heuristic that looks at type information and the number of ground arguments. While the implementation is able to show some benefit in exploring less of the search space, the heuristic needs more thought before the value of heuristic search in MIL can be thoroughly evaluated. For now we defer work on heuristics.

4.4 Showing Insufficiency of the Background Knowledge
A third area where type-annotated predicates might be useful is in determining that the predicates the user supplied will never be able to prove the goals. Clearly it would be very beneficial for the user to be made aware that the background knowledge they are providing is insufficient for solving the problem. Ideally the synthesis algorithm is able to detect, no matter the size of the programs it will consider, whether there are any sensible programs at all in the hypothesis space.

Deciding whether the background predicates can be composed is something that type checking is already able to reason about. Hence the idea is to leverage the user-provided type annotations to check whether the background predicates compose at all. One simple approach is to take the type of the examples and, for each metarule, try types for the body atoms, namely the types of the background predicates. Simple type or refinement type checking might indicate that not a single sensible composition exists. Such a result would only prove that no single-clause programs can be constructed for the given examples. If we relax the condition that the clause must match the type of the examples, we could show that no invention exists that is composed from just background predicates.

The above reasoning is not strong enough to make claims about the non-existence of programs that are only slightly more complex. The main issue is that the type of an invention does not have to be the same as the type of any of the background predicates. Determining what types can be generated for inventions would be one approach to trying to show insufficiency of the background knowledge. In this thesis we limit ourselves to presenting the above argument for why types could be useful for showing insufficiency of the background knowledge, and leave a solution to the problem for future work.

Chapter 5

Synthesis with Polymorphic Types
This chapter introduces polymorphic types to Meta-Interpretive Learning (MIL). The main motivation for the introduction of types is that type checking is able to prune parts of the search space. For a motivating example please refer back to section 4.2.

We restate the problem that our system is able to address, noting the type annotations we expect the user to provide. We introduce the Metagol PT algorithm, which extends the Metagol AI algorithm with polymorphic type checking. We look at how the type-annotated background knowledge, along with unification, can propagate types through to newly derived goals. In the section that follows we look at inferring polymorphic types, i.e. generating the most general type of each clause, and of the program itself. Towards the end of the chapter we argue for the algorithm's correctness in terms of being sound and complete (relative to Metagol AI). We close out the chapter with a theoretical result regarding the influence of types on the size of the search space, and perform experiments validating the introduction of polymorphic types.

We assume familiarity with the synthesis problem statement in section 3.2.1, as well as with the way users provide background knowledge to the Metagol AI algorithm, as discussed in section 3.3.1. Instead of reiterating the complete definitions we note the needed adjustments.

In addition to supplying examples, the user now supplies a type for the examples, that is, a single type that is consistent with all examples. For the background knowledge we stipulate that each atom that can become a goal within the MIL algorithm is annotated with a (polymorphic) type. For the primitive clauses only the head of the clause needs to be assigned a type. As an example, the (primitive predicate) head, which returns the first element of a list, was added to the untyped background knowledge as prim(head/2) (with the 2 signifying the predicate's arity).
In the setting of typed synthesis, predicates need to be asserted with their type; that is,

    prim(head:[list(X),X])

adds the head relation to the (typed) background knowledge.

The interpreted predicates and the metarules need types for their head atoms as well as for their body atoms. The variable names used for the types are shared within the definition of a clause, which is the main means of propagating type information.

For adding interpreted predicates to the untyped background knowledge, the definition of the interpreted map predicate was stated as a normal Prolog definition, plus an interpreted assertion:

    map([],[],F).
    map([A|S],[B|T],F):-F(A,B),map(S,T,F).
    interpreted(map/3).

For the typed assertions of interpreted predicates, we need to annotate the atoms of the body of an interpreted clause in addition to the head atom. Following the notation used for metarules, the definition of the clause is moved into the interpreted assertion:

    interpreted(map([],[],F):[list(X),list(Y),[X,Y]] :- []).
    interpreted(map([A|S],[B|T],F):[list(X),list(Y),[X,Y]] :-
        [F(A,B):[X,Y],map(S,T,F):[list(X),list(Y),[X,Y]]]).

Metarule definitions now include types on their atoms. For the untyped Chain metarule the assertion

    metarule(chain,[P,Q,R],(P(A,B):-[Q(A,C),R(C,B)]))

sufficed. For adding the Chain metarule to the typed background knowledge the assertion becomes:

    metarule(chain,[P,Q,R],(P(A,B):[X,Y] :- [Q(A,C):[X,Z],R(C,B):[Z,Y]])).
Definition 3.2.2, on consistent hypotheses, only needs adjustment in that the generated clauses of the program have simple types on their atoms. (Note that we make heavy use of Prolog's square bracket list notation here: to operate over values, as the data structure containing the multiple body atoms, and as a convenient syntax for predicate types.)

Definition 5.1.1. A Typed MIL learner takes examples and background knowledge with polymorphic types and outputs a polymorphically typed definite program H that is a consistent hypothesis for the input.

5.2 Metagol PT: Type Checking through Unification

One of the major strengths of Prolog (and logic programming in general) is support for logical variables and unification on these variables. Unification plays an important role in most type checking and type inference algorithms (see, for example, [Kanellakis and Mitchell, 1989]). For our purposes we can reduce all type checking to unification.

Unification on types serves two purposes during synthesis. For type checking we want to show that a general (polymorphic) type can be instantiated to a more specific type, e.g. as in the case of head:[list(X),X] and proving an atom
P([1],1):[list(int),int].

At the same time we often have type variables in atoms' types due to the exploratory nature of synthesis (e.g. the Z variable in the Chain metarule definition of the previous section will always be just a variable right after the metarule is used for inventing a clause). These type variables represent freedom in how to interpret the atom; in particular, they represent that the type of the arguments is as of yet undetermined. Non-ground arguments (i.e. arguments with variables in them) will often initially have a variable for their type. For example, suppose we are trying to prove the atom P([1],B):[list(int),X]. Unification with the head primitive on the predicate name and type fixes this atom's type to [list(int),int].

In either case, whether we are type checking or making types more specific, when unification succeeds we know we can proceed in our proof attempt.
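The reduction of type checking to unification can be made concrete with a small sketch. The following is our own illustrative Python model of unification over type terms; the actual system obtains this behaviour for free from Prolog's built-in unification, so all names and encodings here are hypothetical. Capitalized strings play the role of type variables, and tuples play the role of compound types such as list(X).

```python
def is_var(t):
    # Type variables are modeled as capitalized strings, as in Prolog.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings to the current representative term.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    """Return an extended substitution unifying t1 and t2, or None on failure."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash of distinct type constructors, e.g. int vs list(X)

# head : [list(X),X] can be instantiated to [list(int),int] via X=int ...
assert unify((("list", "X"), "X"), (("list", "int"), "int"), {}) == {"X": "int"}
# ... but tail : [list(X),list(X)] cannot take an int in its second position.
assert unify((("list", "X"), ("list", "X")), (("list", "int"), "int"), {}) is None
```

The two assertions mirror the head and tail cases discussed in the text: type checking succeeds exactly when the polymorphic type has an instance matching the goal's type.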
5.2.1 Derivation and General Types

The major change we introduce to the Metagol AI algorithm is that each atom has both a Derivation Type (DT) and a General Type (GT), that is, Atom:DT:GT. The derivation type is always an instance of the general type of the atom, which can be seen as instantiating any type parameters in the general type to correspond to the types of the values that the arguments have taken on in the derivation.

The derivation type is for keeping track of the type of the atom as it is used in the proof of the entailment for the user-provided examples. The derivation type hence is as accurate as possible, taking into account the values the atom's arguments have taken on. GT, on the other hand, is not concerned with being accurate with regard to the values that the atom has taken on in the current derivation. Instead the general type sees the atom as the head atom of a definite clause and maintains the type that the atom's arguments may be instantiated with, i.e. its polymorphic type. The general type is hence determined by the constraints imposed on the arguments' types based on the types of the atoms that become goals to prove this atom. The general types can hence be seen as deriving from the leaves in the derivation, up through the subtree under this particular atom. In particular, the general type is not influenced by the example goals (and type) given to the algorithm. Upon a successful derivation being found it is, in general, not the case that the general type contains no variables; instead the point of the general types (of the head atoms of inventions) is that they may be polymorphic.

An example
As an example, suppose we have to prove P([1,2,3],2):DT:GT. Its derivation type will already be fixed as DT=[list(int),int], due to the algorithm being able to maintain fully accurate derivation types for arguments instantiated with values (given that the type of the example goals was provided and the background predicates had accurate types). As we have not yet proven the atom, the general type will usually be GT=[X,Y]. Suppose the invented clause the algorithm comes up with to prove the atom is inv(A,B):-tail(A,C),head(C,B). We have that the (general) type of the body atoms is tail:[list(X),list(X)] and head:[list(Y),Y]. The general type of inv derives from these general types as being [list(X),X]. As part of proving the goal atom by this invention we unify the head of the invention with the goal to obtain inv([1,2,3],2):[list(int),int]:[list(X),X]. Now the body atoms of the invention need to be proven, which are the atoms tail([1,2,3],C):[list(int),list(int)]:[list(X),list(X)] and head(C,2):[list(int),int]:[list(Y),Y]. The derivation types have the correct type for C imposed by the typing of the background predicates tail and head, forcing unification on C's type based on the type of the other arguments. Clearly the proof succeeds, with C=[2,3] forced by proving the tail atom.

(In the case that the argument variables are shared with other atoms in a clause it might be that there are already constraints imposed on the general type, which would necessarily also be present in the derivation types. Such constraints make the type more specific than just two variables.)

This example makes the utility of having a general type for inv apparent. If we would need to prove another goal such as Q([[1],[2]],[2]):[list(list(int)),list(int)]:QGT, we are not restricted to just being able to retrieve the derivation type that was used in the proof of the previous goal. Instead we can check that this derivation type is also an instantiation of the general type of inv, thereby allowing reuse of the invented predicate.
5.2.2 The Algorithm

The polymorphic type checking contribution to MIL of this thesis has resulted in the Metagol PT algorithm in figure 5.1. The code in this figure is the Metagol AI algorithm, except for the bold code, which achieves type checking. The main invariant of the algorithm is that every goal provided as the first argument to prove_aux has its derivation type DT instantiated to the most specific type that is consistent with how the atom was derived from the examples (using the program built up to that point), and that GT is the most general type of the atom as constrained by the typing of the background predicates and the invented clauses in the program. An atom's derivation type is always an instance of its general type, as can easily be checked.

As part of specifying what problem the user would like to solve, the user supplies a type,
Type, for the examples. In the learn clause of Metagol PT, the Pos and Neg examples are of the form p(a,...,z), just as in the untyped case. The first two lines of the new learn body make sure that the example atoms satisfy the above invariant, namely each example is mapped from p(a,...,z) to p(a,...,z):Type:GT, where Type is the user supplied type, and GT is a new entirely unconstrained type variable.

The prove clauses of Metagol PT are taken unchanged from the untyped algorithm. Their main purpose is to implement (leftmost) depth-first search.

5.2.3 Forward Propagating Derivation Types

For this subsection we will focus on how the derivation type DT is used to prune parts of the search space and how the correct derivation type is assigned to new goals.

    learn(Pos,Neg,Type,Prog):-
        map(decorate_types(Type),Pos,PosTyped),
        map(decorate_types(Type),Neg,NegTyped),
        prove(PosTyped,[],Prog),
        not(prove(NegTyped,Prog,Prog)).

    prove([],Prog,Prog).
    prove([Atom|Atoms],Prog1,Prog2):-
        prove_aux(Atom,Prog1,Prog3),
        prove(Atoms,Prog3,Prog2).

    prove_aux(Atom:DT:GT,Prog,Prog):-
        prim(Atom:DT),!,
        prim(Atom:GT),
        call(Atom).
    prove_aux(Atom:DT:GT,Prog1,Prog2):-
        interpreted(Atom:DT:-BodyDT),
        interpreted(Atom:GT:-BodyGT),
        combine_types(BodyDT,BodyGT,Body),
        prove(Body,Prog1,Prog2).
    prove_aux(Atom:DT:GT,Prog1,Prog2):-
        member(sub(Name,GTinv,Subs),Prog1),
        instance_of(DT,GTinv),GT=GTinv,
        metarule(Name,Subs,(Atom:DT:-BodyDT)),
        metarule(Name,Subs,(Atom:GT:-BodyGT)),
        combine_types(BodyDT,BodyGT,Body),
        prove(Body,Prog1,Prog2).
    prove_aux(Atom:DT:GT,Prog1,Prog2):-
        metarule(Name,Subs,(Atom:DT:-BodyDT)),
        metarule(Name,Subs,(Atom:GT:-BodyGT)),
        combine_types(BodyDT,BodyGT,Body),
        prove(Body,[sub(Name,GT,Subs)|Prog1],Prog2).
Figure 5.1: The Metagol PT algorithm.

For now it suffices to know that the difference between the body atoms BodyDT, which carry only derivation types, and the body atoms Body in the prove_aux clauses of figure 5.1 is that Body's atoms additionally have their general type set.

In discussing the prove_aux clause disjuncts we maintain the invariant that the goal atoms, i.e. the first argument to the clause and the Body atoms, have the derivation type DT instantiated to the most accurate known type.

Disjuncts
In the first disjunct of prove_aux we check whether the atom can be proved by one of the primitive background predicates. An atom's predicate will now be matched against primitives not only on arity, but also on the DT type. A primitive can only be chosen when the derivation type is an instance of the primitive's type. For example, take the goal P([1,2],B) with derivation type [list(int),int]. When considering the tail:[list(X),list(X)] predicate, unification of the types fails due to int and list(X) not being unifiable. A predicate such as head:[list(X),X] does pass the unification test, with X=int equalizing the types. Hence accurate derivation type checking occurs at the prim(Atom:DT) line, and failure to unify the types means that Prolog's evaluation will not be invoked for the atom.

The second disjunct matches against the type-annotated interpreted background clauses. The goal interpreted(Atom:DT:-BodyDT) says to find a background clause definition such that the head of the clause successfully unifies both the Atom, containing a predicate name and values, as well as the derivation type DT with the clause's head and type. When the head unifies successfully, the definition of the interpreted clause makes sure that the types of the body atoms are appropriately unified as well. Upon successful selection of a background predicate, the general type of each atom is appended to its derivation type.

The third disjunct tries to reuse an invented clause of the program. Here we use that the polymorphic type of (the head of) the clause is saved along with the meta-substitution and the name of the metarule used for the invention. The instance_of(DT,GTinv) goal asserts that the derivation type is an instance of GTinv. Note that DT is not unified with GTinv itself, which would make the type too specific; instead DT is unified with a copy of GTinv. We create a new instance of the invented clause, making sure that the same meta-substitution constraints are imposed, and generate body atoms with the appropriate derivation types.

The final disjunct deals with the option of inventing a new clause. The only major change to note: in unifying the head of the metarule with the atom we need to prove, the derivation type (and the general polymorphic type) are unified as well.
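The instance-of check can be sketched as a one-way match: DT is an instance of GTinv precisely when some substitution for GTinv's type variables makes it syntactically equal to DT, leaving GTinv itself unchanged. The following is our own hypothetical Python model; the real check is done in Prolog by unifying DT against a copy of GTinv, and for simplicity this sketch ignores the case where DT itself still contains type variables.

```python
def match(gt, dt, subst):
    """Bind type variables occurring in gt (modeled as capitalized strings)
    so that gt becomes syntactically equal to dt; return None on failure."""
    if isinstance(gt, str) and gt[:1].isupper():  # a type variable in gt
        if gt in subst:
            return subst if subst[gt] == dt else None
        return {**subst, gt: dt}
    if isinstance(gt, tuple) and isinstance(dt, tuple) and len(gt) == len(dt):
        for g, d in zip(gt, dt):
            subst = match(g, d, subst)
            if subst is None:
                return None
        return subst
    return subst if gt == dt else None

def instance_of(dt, gt):
    # dt is an instance of gt iff gt can be specialized to dt.
    return match(gt, dt, {}) is not None

# [list(int),int] is an instance of an invention typed [list(X),X] ...
assert instance_of((("list", "int"), "int"), (("list", "X"), "X"))
# ... but [list(int),bool] is not: X cannot be both int and bool.
assert not instance_of((("list", "int"), "bool"), (("list", "X"), "X"))
```

Because matching binds only the general type's variables, the invention's polymorphic type is never specialized by the check, which is exactly why DT is unified with a copy of GTinv in the algorithm.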
5.2.4 Inferring the Most General Type

Having looked at how goals get the correct derivation type assigned, we now look at how the general type can be determined for atoms. Note that determining the general type serves two purposes. First, we need the general type of an invention if we are to reuse an invention wherever we can; e.g. an invention inv(A,B):-tail(A,C),head(C,B) derived to prove P([1,2],2) has derivation type [list(int),int] when invented for these values. Instead we want to know the more general polymorphic type, [list(X),X], such that the invention can be reused, e.g. to prove P([[],[1]],[1]):[list(list(int)),list(int)]. Second, and following directly from knowing the general type of inventions, we can give a general type to the entire program, i.e. to the head of the clause used for the examples. This means we can also generalize from the example type that we have been given.

All of the background knowledge (primitive and interpreted predicates, and metarules) is already annotated with general types (which become accurate derivation types for goals by being made more specific through unification). This is the property exploited in the algorithm, and why we see unification on the background knowledge twice: once for the derivation types and once for the general types.
Disjuncts revisited
For the first disjunct it should be clear that unifying GT with the general type of the already selected primitive keeps track of the general type of the atom. For the interpreted clauses of the second disjunct this holds true as well, except that body atoms of the interpreted clause become goals which can restrict the general type during their proof. Background knowledge is only annotated with a single type, which is why we unify twice: once we get body atoms with derivation types and once we get body atoms with general types. The combine_types predicate combines these lists of singly typed atoms, of the form BodyAtom:DTy and BodyAtom:GTy, into atoms with both a derivation and a general type, i.e. BodyAtom:DTy:GTy.

For the third disjunct we already explained how the general type of an invention is used to type check. To keep track of the general type, note that the general type of the (head of the) invention must be the general type of the atom, which is asserted by the equality GT=GTinv. This equality also makes sure that the general type among all usages of the invention remains consistent. In the final disjunct, the metarule is likewise instantiated with the general type GT.

5.3 Theoretical Results

To argue the correctness of the Metagol PT algorithm we establish soundness, i.e. the programs found by the algorithm are correct for the examples, and relative completeness, that is, every program found by the Metagol AI algorithm will also be found by Metagol PT. Related to the completeness result, we also briefly look at how sound pruning impacts predictive accuracy. As a separate result we characterize how types on predicates can bound the size of the search space.

5.3.1 Soundness

Definition 5.3.1.
An inductive synthesis algorithm is sound if every program returned by the algorithm is a consistent hypothesis (definition 3.2.2).

To establish soundness of Metagol PT, we make the following assumption: the Metagol AI algorithm is sound. Though we assume it here, it is not too hard to be convinced that this assumption is true. In essence the meta-interpreter extends the proof procedure of SLD-resolution with additional higher-order rules, and at the same time maintains the proof steps in the derivation of the entailment of the examples that an implementation of SLD-resolution does not itself maintain.

Proposition 5.3.1. Given that the Metagol AI algorithm is sound, the Metagol PT algorithm is sound.

Proof. Assume the precondition. Note that in logic/constraint programming adding more constraints to clauses can only reduce the number of solutions found. For the Metagol PT algorithm to succeed, the same derivation that the Metagol AI algorithm establishes must be found, i.e. the derivation on the atoms without their types must be a Metagol AI derivation. This is due to the Metagol PT algorithm only adding constraints to the Metagol AI algorithm. Clearly the type checking of Metagol PT only imposes additional conditions on the proof of atoms, hence the derivations found by Metagol PT must be a subset of the derivations found by Metagol AI. Conclude that any program returned by Metagol PT must as well be a consistent hypothesis.

The only thing to remark with regard to the additional constraints imposed by Metagol PT is that they are not as simple as just additional atoms in bodies. The head atoms gain some freedom in that they have additional variables in their types, and some of the body atoms in the clauses are modified instead of just added. These changes do not matter, as in the end the algorithm can only succeed when the atoms (without their types) form a proper Metagol AI proof.

(An alternative simple approach would be to modify the definition of meta-substitutions to include types on the predicate names, forgoing the need to store the type separately.)

5.3.2 Completeness

For our completeness result we assume that the background knowledge is typable by the polymorphic types introduced in section 4.1. We also assume that the provided background knowledge has correct types, i.e. the types might be too general, but they are not inaccurate.
For completeness relative to the Metagol AI algorithm we show that when the Metagol AI algorithm is able to find a program, then Metagol PT must find the same program (syntactically identical). Remember, from section 3.3.2, that the search procedure of the Metagol AI (and hence of the Metagol PT) algorithm is leftmost depth-first.

Definition 5.3.2.
A procedure that discards parts of a search space performs inconsistency pruning when the parts of the search space pruned away never contain any successful nodes.
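The notion can be illustrated with a toy depth-first search (our own Python sketch, not part of the thesis system): pruning only subtrees that contain no goals leaves the first solution found unchanged.

```python
def dfs(tree, is_goal, prune=lambda node: False):
    """tree maps a node to its children; returns the first goal node in
    leftmost depth-first order, skipping pruned subtrees, else None."""
    def visit(node):
        if prune(node):
            return None
        if is_goal(node):
            return node
        for child in tree.get(node, []):
            found = visit(child)
            if found is not None:
                return found
        return None
    return visit("root")

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
is_goal = lambda n: n in {"a2", "b1"}
# Sound (inconsistency) pruning discards only the goal-free subtree "a1",
# so the first successful node is the same with and without pruning:
assert dfs(tree, is_goal) == dfs(tree, is_goal, prune=lambda n: n == "a1") == "a2"
```

This is precisely the situation of the proposition that follows: the pruned traversal visits the surviving nodes in the same order, so the first success is preserved.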
Proposition 5.3.2. Let A be a depth-first search algorithm. If an algorithm A′ is algorithm A except that it does additional inconsistency pruning, then when algorithm A finds its first successful node s, algorithm A′ will also find node s as its first successful node.

Proof. Assume the stated relationship between A and A′ and that algorithm A has found its first successful node s. Due to the pruning being sound, A′ cannot have pruned away the part of the space containing s, and must come across it eventually. Hence A′ must find some first successful node t. Because the traversal order of the nodes is the same (modulo unsuccessful nodes being left out) it follows that t = s.

First note that a program with a type check error cannot be a consistent hypothesis (given at least one positive example). That Metagol PT adds inconsistency pruning relative to Metagol AI follows from the facts that a type checking failure leads to pruning and that any program with a type check error cannot be made to type check by adding additional clauses to the program (i.e. continuing the search).

That Metagol PT's type checking through unification correctly implements type checking can be established by induction on the size of the derivation tree constructed during the algorithm. The induction hypothesis is that the values are always inhabitants of the derivation types. When the algorithm tries to prove a goal with a primitive/interpreted/invented predicate and the typing of the predicate cannot be instantiated to the derivation type of the goal, that predicate will be rejected due to a type error. This rejection is sound due to the predicate's type indicating that the predicate cannot successfully process values of this type.
Otherwise, the predicate may be used to continue the search, at which point it is easy to show that the values in the new goals are again inhabitants of their type.

Using the above result we can immediately conclude the proposition below, by instantiating the algorithms A and A′ with Metagol AI and Metagol PT, respectively.

Proposition 5.3.3.
The (simply typable) programs found by the Metagol AI algorithm and the Metagol PT algorithm are exactly the same, syntactically.

We will use this result in justifying why it is not worthwhile to experimentally investigate the difference in accuracy of Metagol PT and Metagol AI. The above result only holds when the algorithms are allowed to run arbitrarily long, long enough to find the (first) successful program. In practice a timeout is used. As we will see in the experimental work, the untyped system is considerably slower than the typed system. A consequence is that the typed system can find programs for which the untyped system will use more time than is allowed by a timeout. In such cases the predictive accuracy of the typed system will vastly outperform the untyped system. In experiments where both systems are able to find the program the predictive accuracy is not impacted by type checking.

(Note that we plan to make the presentation of the type system formal such that this proof, amongst others, can be carried out formally.)

5.3.3 Proportion of Relevant Predicates

For the remainder of the theoretical examination we focus on a particular language class of logic programs. The metarules in table 3.1, e.g. containing the Chain rule P(A,B) ← Q(A,C), R(C,B), restrict the structure of the class of programs to ones where each clause has a head atom and two body atoms, and each atom's predicate has exactly arity two. This class is known as H^2_2.

The number of programs in this class is given as O(|M|^n (p^3)^n), where |M| is the number of metarules, n is the number of clauses and p is the number of predicate symbols [Lin et al., 2014]. The p^3 term is due to all three of the predicates in a clause being chosen independently from each other. If higher-order abstractions are allowed, each with k predicate arguments, the bound becomes O(|M|^n p^((2+k)n)) [Cropper and Muggleton, 2016a].

We will suppose that the predicates in the background knowledge are annotated with polymorphic types.
The improvement from taking types into consideration is due to predicate typings having to match up, meaning that the predicates in a clause usually cannot be chosen independently from each other: only a portion of the background predicates' types will line up. Given p predicates with types, fixing one of the three predicates in an H^2_2 clause restricts the choice of the two other predicates of the clause. It hence makes sense to consider the maximum number of predicates that remain as possible choices for any predicate of an H^2_2 clause, after either (or both) of the other predicates have been selected. This is determined per instance of the background knowledge.

Let p′ ≤ p be the worst case number of predicates that remain as choices for any of the predicates in an H^2_2 clause. This value is determined by an exhaustive search over the three predicate places in a clause, filling in any one predicate and checking how many of the predicates can still be substituted for the remaining predicate variables.

Definition 5.3.3.
Given p typed predicates, with p′ ≤ p an upper bound on the number of typed predicates that can be filled in for any of the predicate variables of an H^2_2 clause, given that another predicate of the clause has already been selected, t = p′/p is the worst case proportional constant.

The ratio t will always be between 0 and 1, with lower values indicating a greater reduction in the search space. The proportional constant is a convenient value to work with due to it abstracting away the number of predicates in the program. The following result characterizes the reduction in the hypothesis space of programs given that predicates are properly annotated with types.

Proposition 5.3.4. Given p typed predicates, and the worst case proportional constant t for the predicates, the H^2_2 hypothesis space is reduced by a factor of t^(3n), where n is the number of clauses, versus the untyped hypothesis space.

Proof. The p^3 term in O(|M|^n (p^3)^n) is the (maximum) number of predicate combinations that can be filled in for the predicate variables of an untyped (unabstracted) clause. For the typed case we know that this maximum is p′^3, and hence we substitute p′ for p in the size bound:

    |M|^n (p′^3)^n = |M|^n ((tp)^3)^n = |M|^n (t^(3n))(p^(3n)) = (t^(3n))(|M|^n (p^3)^n)

In the case of the abstracted search space the reduction factor is t^((2+k)n), by the exact same reasoning. These results imply that using types to prune the search space leads to a considerable reduction in effort.

5.4 Experimental Results

There are two main concerns when evaluating synthesis systems: the speed with which they are able to find programs and, in an inductive setting, the predictive accuracy of the found programs for the relation that is being learned. As shown in the previous section, when the search algorithm is (leftmost) depth-first search, the first encountered program that correctly entails the examples is always the same.
It follows that the accuracy of the found programs (barring timeouts, which we will not consider in this section) is not affected by polymorphic-type-checking-based pruning, justifying the decision to only perform experiments that check the impact on the size of the search space and on the time needed for synthesis.

We perform three experiments to evaluate the benefit of polymorphic types to MIL. First, the
Search Space Reduction experiment checks that the inclusion of irrelevant background predicates has negative effects for the untyped framework, while the typed system is able to ignore them. Second, the
Ratio Influence experiment checks that the implementation is able to come close to the theoretical result regarding the better-than-linear influence of the ratio on the search space. Finally, we do a statistical experiment,
Simply Typed Droplasts, on the synthesis of the droplasts program, checking for a time speedup and a reduction in the number of derivation steps.

For simplicity's sake we will compare Metagol PT to the untyped system by just disabling Metagol PT's type checking. (Time constraints caused us to decide that proper experiments generating both typed and untyped variants will have to be deferred to future work.) Note that this means that there is some additional overhead versus Metagol AI, which does not keep track of types at all. The number of proof steps is not impacted by this overhead, only the time needed is (though the impact should be quite small). The Prolog implementation used for running the experiments is SWI-Prolog.

5.4.1 Derivation steps

In determining how efficient program synthesis is, we propose to keep track of the number of decisions made while traversing the program search space. That is, a decision is a choice made to construct part of a potential program. A decision in the MIL framework corresponds to

• trying to prove an atom with a primitive predicate,
• trying to prove an interpreted higher-order predicate, by expanding the body,
• trying an existing invented predicate/clause,
• or choosing to apply a metarule, creating an invention.

Figure 5.2 indicates with bold lines where in the Metagol PT algorithm the (global) decision counter is increased. Note that, for our purposes, a decision is made after type checking. For the untyped algorithm the checks occur at the same place (modulo the additional type checking lines).

Due to the correspondence of synthesis with constructing proofs, in particular derivations for the positive examples, the decisions will also be called proof steps or derivation steps.

5.4.2 Experiment 1: Search Space Reduction

In this experiment we verify, via deterministic tests, that 1) typed predicates that do not compose w.r.t. the input type are ignored by the typed system, and 2) the difference in time and proof steps corresponds to the theoretical results in proposition 5.3.4.
    prove_aux(Atom:DT:GT,Prog,Prog):-
        prim(Atom:DT),!,
        prim(Atom:GT),
        increase_counter,
        call(Atom).
    prove_aux(Atom:DT:GT,Prog1,Prog2):-
        interpreted(Atom:DT:-BodyDT),
        interpreted(Atom:GT:-BodyGT),
        combine_types(BodyDT,BodyGT,Body),
        increase_counter,
        prove(Body,Prog1,Prog2).
    prove_aux(Atom:DT:GT,Prog1,Prog2):-
        member(sub(Name,GTinv,Subs),Prog1),
        instance_of(DT,GTinv),GT=GTinv,
        metarule(Name,Subs,(Atom:DT:-BodyDT)),
        metarule(Name,Subs,(Atom:GT:-BodyGT)),
        combine_types(BodyDT,BodyGT,Body),
        increase_counter,
        prove(Body,Prog1,Prog2).
    prove_aux(Atom:DT:GT,Prog1,Prog2):-
        metarule(Name,Subs,(Atom:DT:-BodyDT)),
        metarule(Name,Subs,(Atom:GT:-BodyGT)),
        combine_types(BodyDT,BodyGT,Body),
        increase_counter,
        prove(Body,[sub(Name,GT,Subs)|Prog1],Prog2).

Figure 5.2: Metagol PT's prove_aux annotated with decision counter.

Null Hypothesis 5.4.1. Metagol PT with type checking enabled is not able to prune the search space relative to Metagol AI (i.e. Metagol PT with type checking disabled).

Null Hypothesis 5.4.2.
Metagol PT with type checking enabled is not able to learn faster than Metagol AI.

Setup
We ask the system to prove a single positive example p(0,1), of type [int,int]. In order to strictly control the search space we only provide the Chain metarule. As we want to traverse the entire search space we will only introduce predicates that cannot contribute to a successful program. The predicate(s) in the background knowledge need to be general enough to always succeed in unifying with a goal atom (thereby making sure that the largest possible search space is traversed). The following predicate was chosen due to it being well-behaved when the search space is traversed depth-first:

    to_zero(X,0).

We add one instance of this predicate with type [int,int], thereby allowing it to be used by the typed system. To test how the system handles completely irrelevant background predicates we iteratively add additional instances of this predicate, but now with type [bottom,bottom], where bottom is a dummy type. Predicates with both bottom and int types cannot occur together in the body of the Chain metarule.
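Why the typed system's search stays constant under this setup can be illustrated with a small calculation. The sketch below is our own Python toy, not the thesis implementation: it counts, for a monomorphic example type and plain string types (no type variables), how many (Q, R) predicate pairs fit the body of the Chain metarule.

```python
def chain_choices(example_type, preds):
    """Number of (Q, R) predicate pairs fitting the Chain metarule
    P(A,B) :- Q(A,C), R(C,B) when the head has the given example type.
    preds holds (in_type, out_type) signatures; types are plain strings."""
    x, y = example_type
    return sum(
        1
        for q in preds
        for r in preds
        if q[0] == x      # Q's input must match the head's input type
        and r[1] == y     # R's output must match the head's output type
        and q[1] == r[0]  # Q's output feeds R's input
    )

# One int->int predicate plus ten irrelevant bottom->bottom predicates:
preds = [("int", "int")] + [("bottom", "bottom")] * 10
# The typed system sees a single viable body, however many bottoms we add,
assert chain_choices(("int", "int"), preds) == 1
# while the untyped system considers all p^2 = 121 body pairs per clause.
assert len(preds) ** 2 == 121
```

Adding further [bottom,bottom] predicates grows the untyped count quadratically per clause while leaving the typed count at one, which is the behaviour the experiment is designed to expose.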
Result
The graph on top in figure 5.3 shows the number of proof steps that were needed to traverse the search space, when the number of clauses in the largest program considered is restricted to three. The graph on the bottom in this figure shows the time spent on traversing the search space in both the typed and the untyped system.

Figure 5.3: Number of proof steps and time for an increasing number of mismatched predicates. Max clauses is three. Typed proof steps are constant at 83 and the time required for typed synthesis is (approximately) constant at 0.014 seconds.

The number of steps required by the typed system remains constant at 83. Correspondingly, the time for the typed program is also nearly constant, never needing more than 0.014 seconds. The untyped program on the other hand traverses bigger and bigger search spaces with an increasing number of (irrelevant) background predicates. These graphs are strong evidence for rejecting hypothesis 5.4.1, regarding not being able to further prune the search space, and for rejecting hypothesis 5.4.2, regarding performance not being affected relative to the untyped system.
Additional Analysis
It is interesting to note that the graph for the proofsteps (and time as well), for a maximum of three clauses, is best explained by a polynomial of degree four. This deviates from the known theory resultregarding the size of the search space being O ( | M | n ( p ) n ), where | M | = 1, n = 3, and p is varied in the experiment. Based on the theory one wouldexpect a polynomials of degree 9 to best explain the experimental data. Theexperiment was repeated to try to better understand this observation: thenumber of clauses was limited to four, which resulted in a polynomial ofdegree five best fitting the data, while a degree 12 polynomial would beexpected according to the theory.Actually it is not hard to see that the theory result does not take intoaccount certain conditions on the predicates imposed by the MIL framework.It only represents the size of the search space traversed by a completely naivebrute-force algorithm. One clear example is that the predicates in the headare restricted to the predicate of the examples and to invented symbols. Itfollows that at least one of the three predicates in a clause is restricted to n ,instead of p . This observation allows us to replace the p in the size expressionwith p (given p ≫ n ). This restricted size bound still does not fully explainour lower degree polynomials, meaning that we need to refine the theory toaccurately capture the hypothesis space that is actually considered by theMIL framework. For this experiment we consider the worst case proportional constant t = p/p as a varying ratio, where p is the total number of background knowledgepredicates, of which at most p of which can be used for any predicate variable.Via a deterministic test we determine the influence of the ratio of matchingtypes predicates versus all background predicates on the search space that By running linear regression and determining that terms of higher degree always aregiven trivially sized coefficients typed system, Metagol PT , explores. 
Again, we are interested in the size of the search space, hence we do not try to find a successful program.

Null Hypothesis 5.4.3 (Ratio unimportant) The ratio t does not influence the size of the search space explored.

Setup
We take as basis the previous experiment. Again, we take the to_zero(X, 0) predicate, though this time we insert 25 copies of it in the background knowledge. We vary the types of the predicates such that different ratios are achieved, e.g. 1/25, 2/25, etc. The predicates that will be allowed by type checking have type [int, int] and the other predicates have type [bottom, bottom]. We restrict the number of clauses to three.
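The intended background construction can be sketched as follows (the to_zero predicate name and the tuple representation are illustrative stand-ins, not the system's actual encoding):

```python
# Build 25 copies of the to_zero predicate, k of which get the matching
# type [int, int]; the rest get the never-matching type [bottom, bottom].
# The ratio t is then k/25.
def background_with_ratio(k, total=25):
    matching = [("to_zero", ["int", "int"])] * k
    rest = [("to_zero", ["bottom", "bottom"])] * (total - k)
    return matching + rest

bg = background_with_ratio(2)
matching_count = sum(1 for _, ty in bg if ty == ["int", "int"])
print(matching_count, len(bg))  # 2 25, i.e. t = 2/25
```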
Result
Figure 5.4 has a plot showing how the time necessary for traversing the search space is influenced by the ratio of type matching versus all predicates. The second plot shows the same behaviour, but for the number of proof steps taken by the untyped and simply typed systems.

There is enough evidence to reject the hypothesis that the ratio of correctly typed background predicates does not matter. Clearly the ratio has significant influence on the time taken and the number of decisions made by the algorithm. Note that the untyped system does not care about the ratio of matching typed predicates, as it does not consider this feature at all, hence its constant behaviour.

The plots are again best explained by degree four polynomials, again demonstrating the need for a better fitting theory. The time plot is interesting in particular because it reveals that Metagol_PT with type checking turned off (which is how this test was conducted for the untyped results) apparently has overhead from the types that are carried around, even more so than the typed system, as clearly demonstrated by the case where the ratio is 1.0.

Figure 5.4: Number of proof steps and time for increasing ratio of type matching predicates out of all predicates.

5.4.4 Experiment 3: Simply Typed Droplasts
As a final experiment we take a popular exercise from the literature [Kitzelmann, 2007], namely the droplasts program. This polymorphic program takes a list of lists and drops the last element from each of the inner lists. The program learned is:

    droplasts(A,B):-map(A,B,droplasts_1).
    droplasts_1(A,B):-reverse(A,C),droplasts_2(C,B).
    droplasts_2(A,B):-tail(A,C),reverse(C,B).
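A reference implementation of the target relation, together with a version mirroring the learned program's reverse/tail/reverse decomposition, can be sketched in Python (the helper names are ours, not part of the synthesis system):

```python
def droplasts(xss):
    """Drop the last element of each inner list."""
    return [xs[:-1] for xs in xss]

def tail(xs):
    # tail removes the first element, as in the Prolog background predicate
    return xs[1:]

# The learned program computes the same function compositionally:
# each inner list is reversed, has its head dropped, and is reversed back.
def droplasts_learned(xss):
    def droplasts_1(xs):                       # reverse, then droplasts_2
        return list(reversed(tail(list(reversed(xs)))))
    return [droplasts_1(xs) for xs in xss]     # map(A,B,droplasts_1)

example = [[1, 2, 3], [4, 5]]
print(droplasts(example))                      # [[1, 2], [4]]
assert droplasts(example) == droplasts_learned(example)
```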
Null Hypothesis 5.4.4
Metagol_PT is not able to improve on the untyped system, in the time required and the number of proof steps needed, in a (semi-)realistic setting.

Setup
This experiment is conducted stochastically. We generate small random input examples (outer and inner lists (of integers) of length between 2 and 5) and run them through a reference implementation to obtain correct positive examples. (The rather limited lengths are chosen so that we can reuse most of this experiment when we consider the refinement experiment in the following chapter.) As we are interested in the effect of type checking on the search space based on the background predicates, we fix the number of (positive) examples generated to three.

We provide the synthesis system with predicates for a list concat relation (appending elements at the back), the tail relation, the reverse relation and the two-place identity relation, all with appropriate polymorphic types. The type of the examples is set to [list(list(int)), list(list(int))]. We give it the chain metarule, along with metarules for abstraction to the following predicates. The higher-order predicates made available are the map, the reduceback and the filter relations.

For the experiment we add additional typed predicates, which for the sake of execution time we take to be simple, forgoing the excessive costs associated with non-determinism and resolution. (The typed system would for the most part not incur these costs, hence adding an additional cost for resolution and non-determinism would mostly be a tool to manipulate the results in one's favor.) The additional predicates only match on particular values for their arguments, where the values are randomly chosen from a small distribution. The types of the predicates are correct for the values of the arguments and are either: bool, nat, int, list(int), list(list(int)), or list(list(X)).

Figure 5.5: Average number of proof steps and time for increasing number of typed background predicates.
Standard deviation is depicted by bars.

For each number of additional background predicates we run ten trials, for each trial generating random positive examples and predicates. We average over the trials and calculate the standard deviation of the sample.
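The example-generation step described above can be sketched as follows (droplasts is a reference implementation; random_example is a hypothetical helper, with the length bounds from the setup):

```python
import random

def droplasts(xss):
    """Reference implementation: drop the last element of each inner list."""
    return [xs[:-1] for xs in xss]

def random_example(rng):
    # outer and inner lists (of integers) of length between 2 and 5
    outer_len = rng.randint(2, 5)
    xss = [[rng.randint(0, 9) for _ in range(rng.randint(2, 5))]
           for _ in range(outer_len)]
    return xss, droplasts(xss)   # positive example: (input, output)

rng = random.Random(0)
examples = [random_example(rng) for _ in range(3)]  # three positive examples
for xss, yss in examples:
    assert all(len(ys) == len(xs) - 1 for xs, ys in zip(xss, yss))
```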
Result
The plots in figure 5.5 depict the average time and number of proof steps required by the untyped and typed systems. The standard deviations are included as bars.

These graphs are reasonable evidence for concluding that the typed system is able to outperform the untyped system in a non-toy synthesis example, thereby rejecting hypothesis 5.4.4.
Analysis
The advantage of type checking observed in the previous, deterministic experiments is less pronounced in this experiment. The main reason for this is that while in the previous tests the untyped program needed to explore the entire space of possible programs, i.e. with each background predicate at each possible position in all clauses, now the untyped system is able to discard parts of its search space based on the values in the background predicates not unifying.

Chapter 6

Synthesis with Refinement Types
Based on the success of polymorphic type checking in making synthesis more efficient, it is natural to consider whether more expressive types can be leveraged for further improvement. The extension to polymorphic types we consider in this chapter is refinement types, i.e. polymorphic types with an additional proposition restricting the inhabitants of the type (see section 4.1 for a definition). We will often use the term simple types to refer to polymorphic types without refinements.

First we discuss how the user specifies refinement types for the background knowledge. As the MIL algorithm conceptually only needs minor adjustments, we next introduce the adapted algorithm Metagol_RT. We show how a refinement proposition representing the entire program can be obtained from the proof structure maintained by the algorithm.

Refinement checking is accomplished by proving (un)satisfiability with an SMT solver. We discuss the levels of refinement expressivity available, and how there is a tradeoff to be made. The chapter closes with theoretical and experimental results.

The reader is referred to section 4.1 for the definition and some of the syntax used for refinement types. We assume familiarity with the problem statement for program synthesis using polymorphic typed predicates, as well as with the way users provide their background knowledge to the Metagol_PT algorithm, as can be found in section 5.1.

The user supplied positive and negative examples are unchanged versus the simply typed case: the user provides a single consistent (non-refinement) type for all the examples. As in the case of polymorphic type checking, we need the user to supply refinement types for the background knowledge. The refinement types on the predicates should be consistent with the predicates, i.e. for any predicate p(A1, ..., Ak) : [T1, ..., Tk]⟨ϕ⟩ which gets its arguments instantiated to a1, ..., ak, it is never the case that p(a1, ..., ak) holds while ϕ[a1/A1, ..., ak/Ak] is false. Note that the user is not otherwise restricted: they can choose to use the most general refinement (true) or to make the refinement as precise as they like (even as far as the refinement completely characterizing the predicate).

For specifying refinements in Prolog we require a way to refer to the arguments. Instead of just annotating the predicate's name with a type, we also write out its formal arguments when asserting that a predicate belongs to the background knowledge. The primitive predicate assertions gain an additional argument for the refinement on the simple type. The simple type itself is appended to the atom representing the predicate's name and formal arguments. For example,

    prim(tail(A,B):[list(X),list(X)], ⟨length(A) = length(B)+1⟩)

is the predicate representing the tail relation, with the refinement stating the length property that always holds of its arguments. Note that the refinement bracket syntax (⟨...⟩) made its way into our Prolog notation. For now we hide the actual syntax used in writing down refinements behind this notation and come back to the refinement specification language in section 6.4.

The interpreted predicates gain an additional assertion for the specification of the refinement type. We will need to keep track of the structure of the derived program, and hence keep track of the existentially quantified predicate names in the interpreted clause. As an example, the assertion of the map predicate being an interpreted background predicate becomes:

    interpreted([F],map([],[],F):[list(X),list(Y),[X,Y]] :- []).
    interpreted([F],map([A|S],[B|T],F):[list(X),list(Y),[X,Y]] :-
        [F(A,B):[X,Y],map(S,T,F):[list(X),list(Y),[X,Y]]]).
    interpreted_ref(map(C,D,F), ⟨length(C) = length(D)⟩)

The metarules will not need to keep track of refinements (directly) and hence remain unchanged versus the simple type problem statement, i.e. all atoms are decorated with simple types.
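The consistency requirement on user-supplied refinements stated earlier (a predicate must never hold while its refinement is false) can be spot-checked by enumeration; a minimal Python sketch for the tail predicate and its length refinement:

```python
# tail(A, B) holds iff B is A minus its first element.
def tail_holds(a, b):
    return len(a) >= 1 and a[1:] == b

# The refinement attached to tail: <length(A) = length(B) + 1>.
def tail_refinement(a, b):
    return len(a) == len(b) + 1

# Whenever the predicate holds, the refinement must hold as well; the
# converse need not be true, since refinements may over-approximate.
lists = [[], [1], [1, 2], [2, 1], [1, 2, 3]]
for a in lists:
    for b in lists:
        if tail_holds(a, b):
            assert tail_refinement(a, b)
```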
The definition of consistent hypotheses, and that of a MIL learner, are essentially unchanged: the programs learned are still simply typed, and the only adjustment needed is that the supplied background predicates are refinement typed.

6.2 From a Program Derivation to a Refinement
This section starts out by observing how refinements on predicates give rise to a refinement over a program. Next we present how refinement type checking can be integrated, at a high level, into the MIL algorithm. This high-level algorithm delegates the type checking to a subroutine, hiding the complexity. We subsequently look at how this type checking subroutine is able to construct a single proposition that needs to be checked for satisfiability.
We now look at how the validity of substituting the head of a definite clause with its body gives rise to inferring refinements. Suppose we are in a position of needing to prove that an atom Q(X,Y) is an inhabitant of its type [T_X, T_Y]⟨ϕ_Q⟩, where ϕ_Q might be an unknown refinement and X and Y might both be variables (and not yet concrete terms) of as yet undetermined type.

Substitution of refinements
Suppose that after applying the chain metarule Q is assigned the body Q(X,Y) ← R(X,A), S(A,Y) with the following typing (again involving unknowns):

    R(X,A) : [T_X, T_A]⟨ϕ_R⟩
    S(A,Y) : [T_A, T_Y]⟨ϕ_S⟩

If the refinement of Q(X,Y) was unknown, we now know that ϕ_R ∧ ϕ_S is an accurate substitution for ϕ_Q, as the just-invented definite clause would be the only way of deriving Q(X,Y). To see that this is an accurate substitution, observe that any occurrence of Q(X,Y) in the program may be replaced by the newly constructed body. If ϕ_Q is not unknown, there is already another body assigned to Q(X,Y), which means we are adding a disjunctive clause. The refinement ϕ_Q then needs to be updated to ϕ′_Q = ϕ_Q ∨ (ϕ_R ∧ ϕ_S).

Backward action
From the above description it follows that the types in the body of an invention have a kind of backward action with regard to the type of the predicate that is being invented. This backward action leads to additional named (argument) variables occurring in refinement types, names not present in a predicate head's arguments. Therefore a context needs to be maintained for any such existentially quantified variables. The significance of this backward action is that it leads to a grand refinement for the entire program, i.e. a single proposition whose satisfiability determines whether the program is still consistent. For example, it could be that R(X,A) has to be invented, leading to an adjustment of its refinement type, which would in turn change Q(X,Y)'s type. In the same way Q(X,Y) could occur in the body of the clause invented for the predicate of the examples. Hence we have that the entire program's refinement type is influenced by any adjustment to a type due to filling in a definition for a predicate.

Note that the same property holds for interpreted predicates: the refinement for the head of the interpreted clause can be made more accurate by conjoining it with the refinements of the predicates that occur in the clause's body.

Metagol_RT

Figure 6.1 contains the code for Metagol_RT, the refinement type checking Meta-Interpretive Learning algorithm. The algorithm is, in essence, the Metagol_PT algorithm from the previous chapter with additional pruning; the calls to check_refinement mark the changes made to implement refinement checking. The same four disjunctive clauses remain and fulfill the same tasks.

Each disjunct maintains in its Prog arguments what the currently derived program is. For the first disjunct this is done by unifying the selected predicate name with one of the variables already in Prog, hence the change in the program Prog needs no further work. The other disjuncts explicitly keep track of the structure of the proof constructed by the algorithm. For invented clauses a meta-substitution is already kept track of. Uses of higher-order background predicates and of invented clauses are now also saved.

The check_refinement predicate achieves pruning by deriving the proposition representing the constraints imposed on the entire program by the refinements. The stored proof of the program gives rise to the grand refinement, the proposition whose satisfiability determines whether the program being considered is still a viable option, or else is already inconsistent. When the grand refinement is satisfiable, the call to check_refinement holds for the supplied program and the search continues. If the refinement of the supplied program is proven unsatisfiable, the proof search procedure of Prolog starts backtracking, discarding (at least) the last choice made for the program.

When considering just filling in the existential variables from a clause body, we have that an inconsistent refinement will not become consistent upon filling in more variables in the clause. This monotonicity property carries over to the grand refinement of the entire program, i.e. once proven unsatisfiable, subsequent additions to the program cannot make the grand refinement satisfiable. This observation is enough to guarantee sound pruning of the search space.

    learn(Pos,Neg,Type,Prog):-
        map(decorate_types(Type),Pos,PosTyped),
        map(decorate_types(Type),Neg,NegTyped),
        prove(PosTyped,[],Prog),
        not(prove(NegTyped,Prog,Prog)).

    prove([],Prog,Prog).
    prove([Atom|Atoms],Prog1,Prog2):-
        prove_aux(Atom,Prog1,Prog3),
        prove(Atoms,Prog3,Prog2).

    prove_aux(Atom:DT:GT,Prog,Prog):-
        prim(Atom:DT),!,
        prim(Atom:GT),
        check_refinement(Prog),
        call(Atom).
    prove_aux(Atom:DT:GT,Prog1,Prog3):-
        interpreted(Subs,(Atom:DT:-BodyDT)),
        interpreted(Subs,(Atom:GT:-BodyGT)),
        combine_types(BodyDT,BodyGT,Body),
        Prog2=[inter(Atom,DT,GT,Subs)|Prog1],
        check_refinement(Prog2),
        prove(Body,Prog2,Prog3).
    prove_aux(Atom:DT:GT,Prog1,Prog3):-
        member(sub(Name,GTinv,Subs),Prog1),
        check_unifies_with(GTinv,DT),
        GT=GTinv,
        metarule(Name,Subs,(Atom:DT:-BodyDT)),
        metarule(Name,Subs,(Atom:GT:-BodyGT)),
        combine_types(BodyDT,BodyGT,Body),
        Prog2=[inv(Name,Atom,DT,GT,Subs)|Prog1],
        check_refinement(Prog2),
        prove(Body,Prog2,Prog3).
    prove_aux(Atom:DT:GT,Prog1,Prog3):-
        metarule(Name,Subs,(Atom:DT:-BodyDT)),
        metarule(Name,Subs,(Atom:GT:-BodyGT)),
        combine_types(BodyDT,BodyGT,Body),
        Prog2=[sub(Name,Atom,DT,GT,Subs)|Prog1],
        prove(Body,Prog2,Prog3).

Figure 6.1: Metagol_RT: refinement type checking MIL.

As noted in the previous section, the algorithm maintains the structure of the constructed program in the form of how the invented and interpreted clauses are used. This structural information encodes where predicates occur in the program and over which variables they operate.
Tree-shape derivation

The Meta-Interpretive Learning approach to synthesis extends Prolog's backward-chaining algorithm with additional methods for proving atoms. The backward-chaining algorithm comes down to the idea that a goal atom is proven by unifying the goal with the head of a definite clause, upon which the body atoms of this clause become the goals. Hence a goal atom has child goal atoms, which in turn have child goal atoms, etc., until a goal atom is an asserted fact (i.e. has no body to prove). This means that the proof of a goal atom forms a tree of goals.

The derivation maintained in the Metagol_RT algorithm preserves this structure in Prog, but just for the proof steps that involve interpreted and invented clauses. For usages of interpreted predicates we store inter(Atom,DT,GT,Subs), where Atom's predicate identifies the interpreted clause used. The inter(...) atom encodes the information needed to reconstruct the goal body atoms, instantiating known predicate variables with the substitution Subs. For invented clauses we have that sub(Name,...) and inv(Name,...) use Name to identify the metarule used, thereby giving access to the body goals. The leaves of this proof tree are the atoms that are either proved by a primitive, meaning that the chosen primitive will be stored in the Subs substitution of the parent goal, or are atoms yet to be proven (whose predicate symbol might be a variable). Hence the information stored in Prog is sufficient to recreate the proof tree with known predicate variables resolved, leaving unknown predicates as variables.
Constructing the Grand Refinement
As follows from section 6.2.1, the grand refinement can be directly derived from this proof structure. We explain a traversal of the derivation tree whereby the grand refinement is constructed and, at the same time, a variable context is maintained. The variable context is the set of typed variables that occur in the derivation, along with the values that they are assigned (in case an argument variable has already been assigned a value).

The details of the algorithm for converting a derivation to its grand refinement are in figure 6.2. The basics are that two mutually recursive clauses build up the grand refinement bottom up. Leaves, i.e. those atoms whose predicate is a variable or a primitive, are directly convertible; see the first two disjuncts of atom_to_refinement. In the case of interpreted predicates the body refinement is first derived, and this refinement is conjoined with the refinement that was supplied for the interpreted clause. For every atom the arguments are added to the context. In body_to_refinement a list of atoms is a clause body, which represents a conjunction of atoms, hence the main task of this clause is to collect contexts and conjoin refinements.

The grand refinement is derived by supplying atom_to_refinement with the example goal that the algorithm is currently trying to prove, along with the current program derivation. As a result we obtain a variable context, containing variable names with typing (and possibly values), and a single large proposition: the grand refinement.
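The bottom-up construction can be illustrated by a fold over a small derivation-tree datatype; the node shapes below are illustrative Python stand-ins, not the exact Prolog terms:

```python
# Each node carries its own refinement; the grand refinement is the
# conjunction of a node's refinement with those of its children
# (disjunction would enter when a predicate gets a second clause).
def to_refinement(node, context):
    kind, refinement, args, children = node
    context.extend(args)                     # collect typed variables
    refs = [refinement] + [to_refinement(c, context) for c in children]
    refs = [r for r in refs if r != "true"]  # drop trivial conjuncts
    if not refs:
        return "true"
    return "(and " + " ".join(refs) + ")" if len(refs) > 1 else refs[0]

# A droplasts-like derivation: map over an invented clause built from
# tail and an unconstrained primitive (refinements as in the examples).
tree = ("inter", "(= (length A) (length B))", ["A", "B"],
        [("inv", "true", [],
          [("prim", "(= (length C) (+ (length D) 1))", ["C", "D"], []),
           ("prim", "true", [], [])])])

ctx = []
print(to_refinement(tree, ctx))
# -> (and (= (length A) (length B)) (= (length C) (+ (length D) 1)))
```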
The only issue not yet dealt with is that of establishing whether the grand refinement is satisfiable. Satisfiability of the refinement is defined as there being an assignment of the variables in the context such that the grand refinement is true.

This section explores the possibilities that Satisfiability Modulo Theories (SMT) provide as a framework for specifying refinements. In this framework the grand refinement corresponds to an SMT logic formula. SMT solvers will be used to try to prove unsatisfiability (also called inconsistency) of this formula.
The main motivation for choosing to explore the effectiveness of refinement types in pruning the search space is that SMT solvers have proven to be very efficient in solving satisfiability problems.

Satisfiability Modulo Theories (SMT) are theories for stating logic problems regarding a set of formulas having a satisfying model. The formulas are expressed in a suitably restricted logic. The satisfiability problem is encoded in a language supported by SMT solvers. A common standard, with widespread support, is the SMTLIB (2.0) language [Barrett et al., 2010].

    atom_to_refinement(Pred(Args):Ty,Prog,[],⟨true⟩):-
        var(Pred).
    atom_to_refinement(Pred(Args):Type,Prog,Context,Refinement):-
        prim(Atom:Type,Refinement),
        typed_args_to_context(Args,Type,Context).
    atom_to_refinement(Pred(Args):Type,Prog,Context,Refinement):-
        member(inter(Pred(Args),Type,GT,Subs),Prog),
        interpreted(Subs,(Pred(Args):Type:-Body)),
        interpreted_ref(Pred(_),InterRef),
        body_to_refinement(Body,Prog,BodyContext,BodyRef),
        typed_args_to_context(Args,Type,InterCtx),
        append(BodyContext,InterCtx,Context),
        Refinement=and(BodyRef,InterRef).
    atom_to_refinement(Pred(Args):Type,Prog,Context,Refinement):-
        (member(app(Name,Pred(Args),Type,GT,Subs),Prog)
        ;member(sub(Name,Pred(Args),Type,GT,Subs),Prog)),
        metarule(Name,Subs,(Pred(Args):Type:-Body)),
        body_to_refinement(Body,Prog,BodyContext,Refinement),
        typed_args_to_context(Args,Type,InterCtx),
        append(BodyContext,InterCtx,Context).

    body_to_refinement([],Prog,Context,⟨true⟩).
    body_to_refinement([Atom|Atoms],Prog,Context,Refinement):-
        atom_to_refinement(Atom,Prog,Ctx1,Ref1),
        body_to_refinement(Atoms,Prog,Ctx2,Ref2),
        append(Ctx1,Ctx2,Context),
        Refinement=and(Ref1,Ref2).

Figure 6.2: Prolog code for converting a derivation to a Grand Refinement.

SMT solvers are special purpose programs utilizing the best available algorithms to prove satisfiability, often relying on heuristics.
Forerunners in per-formance and support of new logics are the Z3 solver [De Moura and Bjørner,2008], and the CVC4 solver [Barrett et al., 2011].
Up till this point we have relied on mathematical syntax for specifying refinements. The only requirement we have had on refinements is that they are propositions that may mention the arguments of a predicate; the availability of particular functions and notations was left out of scope. With our choice for solving satisfiability of the grand refinement fixed, we can use this decision to guide our specification of refinements. As any language we choose needs to be translated to a format that the SMT solvers are able to work with, the simplest solution is to adopt the syntax of SMTLIB.
Variables in refinements
The main feature on top of SMTLIB that we need is the ability to correctly keep track of the named variables in the refinements. The chosen solution is to move from a refinement being a single string to a list of strings and terms. An example of such a refinement, for the map predicate:

    interpreted_ref(map(A,B,F):[list(X),list(Y),[X,Y]],
        [’(= (length ’,A,’) (length ’,B,’))’]).
The first thing to notice is the Lisp-like syntax of the refinement, and that the SMTLIB language has support for functions, e.g. length. The significance of the interspersing of variable names is that the occurrences of a variable appearing multiple times in the program derivation (and hence also in the grand refinement) are identified with one another. This also makes it easy to translate such a single derivation variable to just one SMT variable. The SMT translation of the grand refinement then has this single variable name in place of all the occurrences of the variable.
Encoded problem
The translation of the grand refinement to an SMT problem now proceeds as follows. First the context variables are considered, which are the same logical variables that occur in the refinement. For each variable there is a declaration of a new SMT variable with a fresh name, typed as it occurs in the derivation. If the variable has a known value in the derivation, an SMT equality assertion is generated for the variable with this value. Note that higher-order variables may occur in predicate arguments, but these will never occur as variables in refinements and hence are filtered out.

For the grand refinement we have that each single refinement occurring in it can now be "folded down" from a list of strings and Prolog variables to a single string. The names chosen for the SMT variables are used as substitutions for the variables that occur in the refinements, whereupon string concatenation becomes available. The structure of the grand refinement itself, involving disjunctions and conjunctions over refinements, can now be made into a single string by replacing nodes such as and(ref1,ref2) by RefStr from the following code:

    ref_to_SMT(ref1,smt1),
    ref_to_SMT(ref2,smt2),
    str_concat(["(and ",smt1," ",smt2,")"],RefStr).
The set of formulas given to the SMT solver thus comprises: the variable declarations, assertions for the values of variables (as far as they are known), and the translation of the grand refinement.
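That end-to-end translation can be sketched in Python; the Var class and the list-of-strings refinement representation are stand-ins for the Prolog terms described above:

```python
# A refinement is a list of literal strings and variable objects; the
# same object appearing twice denotes the same logical variable.
class Var:
    def __init__(self, sort):
        self.sort = sort

def encode(context, refinement):
    names, decls = {}, []
    # declare one SMT variable per derivation variable, with a fresh name
    for var in context:
        if var not in names:
            names[var] = f"v{len(names)}"
            decls.append(f"(declare-const {names[var]} {var.sort})")
    # fold the refinement down to a single assertion string
    body = "".join(p if isinstance(p, str) else names[p] for p in refinement)
    return decls + [f"(assert {body})"]

A, B = Var("(Seq Int)"), Var("(Seq Int)")
ref = ["(= (seq.len ", A, ") (seq.len ", B, "))"]
for line in encode([A, B], ref):
    print(line)
# (declare-const v0 (Seq Int))
# (declare-const v1 (Seq Int))
# (assert (= (seq.len v0) (seq.len v1)))
```

Known values for variables would additionally be emitted as equality assertions, as described above.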
Having chosen the syntax of the type refinements, this section looks at the available choices regarding the logic theories that SMT solvers support. As has been the theme of this document, we identified higher-order predicates as promising, on the grounds that many standard list functions/predicates have simple refinements. We therefore focus on SMT logic theories that are able to reason over lists.
The first candidate is the Z3 Sequence theory [N. Bjørner and Veanes, 2012]. This theory is able to reason over lists of bounded length and includes such function symbols as seq.concat, seq.len, seq.indexof, seq.extract, etc. The theory is already undecidable, but it is the smallest theory we identified that includes list reasoning.

In this theory it is very easy to write refinements using just the set of available function symbols. As our main example, the map predicate's refinement assertion looks very familiar:

    interpreted_ref(map(A,B,F),
        [’(= (seq.len ’,A,’) (seq.len ’,B,’))’]).

Lists are the prime example of algebraic datatypes (ADTs). In the ADT formulation of lists there is only the empty list constructor and the cons list constructor for an element and a second list. The only operations available are to pattern match on these two constructors. Embracing algebraic datatypes leads to more possibilities beyond just lists, e.g. option types and record types. The ADT formulation of lists comes with no functions predefined, that is, notions such as length have to be user-defined. Many important functions on lists are recursive, e.g. the length function. When reasoning over lengths it is often useful to also compare lengths. This means we also need some basic arithmetic notions.

Recently there has been work on SMT solvers to support algebraic datatypes, the logic fragment of which is called DT [Reynolds and Blanchette, 2015]. This theory on its own is decidable, though with quantifiers it is not. In other recent work progress has been made on supporting recursive functions on these datatypes [Reynolds et al., 2016]. SMT solvers translate recursive function definitions (on ADTs) to quantified formulas, therefore we have to relinquish decidability. The need for simple arithmetic is met by the Linear Integer Arithmetic (LIA) theory. The combination of the theories DT (with quantifiers over Uninterpreted Functions) and LIA is the theory (UF)DT+LIA. Only the CVC4 solver implements this theory.

The complexity of the recursive function definitions can be hidden from the user for standard definitions, e.g. dt_length in the following:

    interpreted_ref(map(A,B,F),
        [’(= ’,dt_length(A),’ ’,dt_length(B),’)’]).
Behind the scenes the dt_length predicate is able to insert the appropriate SMTLIB function definition into the set of formulas handed to the SMT solver. The code below shows such a definition. Note that SMTLIB does not support the polymorphism of this example (the example uses Y as a type parameter); the generated code must actually emit a separate function definition based on the types at each instance of the function occurring in the grand refinement. (The work is so recent that I discovered an unsoundness issue in the implementation. The issue was promptly fixed: https://github.com/CVC4/CVC4/issues/2133)

    (define-fun-rec dt_length ((x (List Y))) Int
        (ite (= x (as nil (List Y)))
            0
            (+ 1 (dt_length (tail x)))))

For user-defined recursive SMT functions the effort required to integrate with the system becomes significant. The implementation of this part of the system is quite non-trivial while the details are rather uninteresting; the document will not look further into the matter.

The (UF)DT+LIA theory is our preferred theory in terms of expressiveness. Recursive functions give enough power to express subset inclusions on lists, hence it becomes possible to express the property of the sort predicate that the elements are permuted. It is interesting to note that the recursive functions are so powerful that they essentially allow us to fully encode (first-order) Prolog predicates in the SMT logic. Hence users get the option to choose, on a sliding scale, between very accurate refinements and very coarse refinements. The understanding is that these more complicated refinements might take more time to reason over, though could also be useful in detecting inconsistency earlier.
The results from the previous chapter can be directly lifted to the setting of refinement types. The reason the propositions can be directly restated, and their proofs only slightly modified, is that refinement types have the same soundness properties for the synthesis algorithm as simple types; in particular, the main feature is the inconsistency-based pruning of the search space.
Proposition 6.5.1
The programs found by the Metagol_AI algorithm (which can be assigned polymorphic types with refinements) and the Metagol_RT algorithm are exactly the same.

The proof again follows from the fact that a depth-first search algorithm is used to traverse the program hypothesis space and that programs are still encountered in the same order, just that some inconsistent programs have been skipped over by the Metagol_RT algorithm due to type checking. For a more precise argument see section 5.3.

The reduction of the size of the hypothesis space is also directly applicable to the refinement type system. The refinements are for the most part an improvement with regard to the granularity of the types, i.e. they will further restrict the possibilities for choices of predicates in a clause. This means that the result regarding the worst case proportional constant applies, but that refinements in the background knowledge will shrink this worst case ratio further.

We present experiments which check whether there is sufficient evidence to claim that the current implementation of refinement type checking has significant advantages, i.e. whether we are justified in rejecting the following:
Null Hypothesis 6.6.1
Refinement type checking cannot reduce the numberof proof steps compared to non-refinement polymorphic type checking.
As in the previous chapter we focus on the droplasts program. This program takes a list of lists and drops the last element from each of the inner lists. We perform statistical experiments to evaluate whether refinement type checking has any benefit versus simple type checking alone (and by extension versus the untyped system).

In the first experiment we add predicates with polymorphic types that compose well, but whose refinement types make it possible to decide that some combinations of predicates in a clause will not work. This test is primarily to show a difference between simple type checking and refinement type checking. The second experiment tries to be a bit more general and includes predicates that are less sensible, though these allow for a bigger difference between the untyped case and the typed cases.
The program we are synthesizing is droplasts. We perform a statistical experiment in which we have random background knowledge containing predicates with refinement types and iteratively add additional refinement typed predicates, to check how the refinement type checking system behaves when the composition of refinements rules out certain programs.
Setup
We follow the setup of the simply typed droplasts experiment of section 5.4.4. We generate small random input examples, outer and inner lists (of integers) of length between 2 and 5, and run them through a reference implementation to obtain correct positive examples. We choose this length restriction for the examples because the SMT solver needs more time for larger lists. Adding additional examples would make the found programs more accurate, but would come at the cost of even longer synthesis times (due to the SMT solver being invoked more often). As we are interested in the effect of type checking on the search space, we fix the number of (positive) examples generated to three.

We provide the synthesis system with the following predicates:

• concat:[A:list(T),B:T,C:list(T)] ⟨length(A) + 1 = length(C)⟩
The relation that appends an element at the back, where length is the length function over ADT lists, definable in SMT-LIB.

• tail:[A:list(X),B:list(X)] ⟨A = cons(_, B)⟩
The relation taking off the head of a list, with an exact refinement representation.

• reverse:[A:list(X),B:list(X)] ⟨rev(A, B)⟩
The reverse relation on lists. The refinement is a quadratic-time SMT-LIB predicate checking whether the arguments are each other's reversal.

• map:[A:list(X),B:list(Y),F:[X,Y]] ⟨length(A) = length(B)⟩
The higher-order map relation, with a refinement relating the lengths.

• reduceback:[list(T),list(T),[list(T),T,list(T)]] ⟨true⟩
The reduceback relation, which essentially replaces cons in a list with the function parameter. The current refinement system cannot assign it a more useful refinement.

• filter:[A:list(T),B:list(T),F:[T]] ⟨length(B) ≤ length(A)⟩
The filter relation, whereby an element of the first list can only be retained if the F predicate holds for it.

As part of the experiment we add random background predicates. These predicates drop an element at a fixed index from a list.
The refinements for these predicates encode that lists shorter than the index are unaffected, while lists of length equal to the index, or higher, have their length reduced by one.

The SMT solver's timeout has been set to 30 milliseconds, which still gives the solver enough time to prove some problem instances unsat. We use the CVC4 SMT solver, for which we specify the logic DTLIA, with options for finite model finding and inductive reasoning enabled.

For each number of additional background predicates we run ten trials, for each trial generating positive examples and random predicates. We average over the trials and calculate the standard deviation of the sample.
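For illustration, here is a hand-written SMT-LIB 2 query of the general kind such a checker could pose (a hypothetical encoding; the system's actual encoding may differ): can tail's refinement and map's length refinement hold of the same pair of lists? If the solver reports unsat, any clause combining the two predicates this way can be pruned without being proved.

```smtlib
; Hypothetical check: can tail's refinement (A = cons(_, B)) and map's
; refinement (length(A) = length(B)) both hold of the same pair A, B?
(set-logic DTLIA)
(declare-datatype Lst ((nil) (cons (hd Int) (tl Lst))))
(define-fun-rec len ((l Lst)) Int
  (ite ((_ is nil) l) 0 (+ 1 (len (tl l)))))
(declare-const A Lst)
(declare-const B Lst)
(assert ((_ is cons) A))
(assert (= B (tl A)))          ; tail refinement: A = cons(_, B)
(assert (= (len A) (len B)))   ; map refinement: length(A) = length(B)
(check-sat)                    ; unsat here licenses pruning the clause
```

Recursive function definitions such as len are why finite model finding and inductive reasoning are enabled in the solver.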
Result

The plots in figure 6.3 depict the average time and number of proof steps required by the untyped, simply typed and refinement typed systems. The standard deviations are included as bars.

The amount of time necessary for refinement checking is an issue made obvious by the first graph. The proportion of time spent in Prolog running the Metagol code is trivial compared to the time spent waiting for the SMT solver. We are aware that this means the current implementation does not afford much practical benefit; instead the work on refinement types should be approached as a proof of concept.

The plot on the right shows that the refinement system is able to cut down the search space explored relative to the untyped and simply typed systems, though with considerable variance. Note that, because the types for the most part compose, the difference between the simply typed and untyped systems is small, again showing that the worst-case proportional constant matters in improving on the untyped system. Given these graphs we are inclined to reject the hypothesis that refinement type checking has no advantages, though we do so knowing that the amount of data generated is rather limited and that there is a sudden spike in variance. Another caveat is that the improvement in search space reduction comes at a very severe cost in execution time.

As a sanity check of the implementations we use the soundness results from this and the previous chapter. By soundness we know that when a system prunes part of the search space, it does so only when that part of the search space cannot yield a successful program. Hence the number of proof steps for the Metagol PT system should always be at most the number of proof steps of the Metagol AI system, and the Metagol RT system should always need at most the number of proof steps of the Metagol PT system. We can confirm this for each separate trial that was run, giving some confidence in the correctness of the implementation.
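The per-trial sanity check amounts to a simple ordering assertion over the three systems' proof-step counts. A sketch with made-up numbers (illustrative only, not the experiment's data):

```python
# Soundness implies pruning only removes provably-failing branches, so per
# trial: steps(RT) <= steps(PT) <= steps(AI). The counts below are invented
# purely to illustrate the check.
trials = [
    # (untyped AI, simply typed PT, refinement typed RT)
    (5230, 4110, 3050),
    (6017, 4402, 4402),
]

for ai, pt, rt in trials:
    assert rt <= pt <= ai, "soundness violated: a typed system did more work"
```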
The sudden increase in variance is a sign that this experiment is not particularly well designed. In future work other long-running experiments will be used to evaluate the work.

Figure 6.3: Average number of proof steps and time (in secs) for an increasing number of typed background predicates, for the untyped, simply typed and refinement typed systems. Standard error is depicted by bars.

6.6.2 Experiment 5: Droplasts with Additional Predicates
In order to distinguish further between the untyped and the simply typed cases of the previous experiment, an (even) longer-running experiment was conducted.
Setup
We take a setup identical to the previous experiment, though now remove the identity predicate from the base background knowledge and add the following refinement-typed predicates:

dumb0([0],[]):[list(nat),list(X)] ⟨true⟩.
dumb1(W,0):[list(nat),int] ⟨false⟩ :- findall(K,(between(3,4,K)),W).
dumb2(W,0):[list(int),int] ⟨false⟩ :- findall(K,(between(2,7,K)),W).

The first and second predicates should usually be ruled out by the polymorphic types, while the second and third carry refinements stating that they can never be used in a program.
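The two pruning mechanisms at work here can be sketched as follows (an illustrative encoding, not the system's internal representation): a refinement that is literally false is rejected without consulting the SMT solver at all, and a monomorphic argument type that does not match the context is rejected by type checking.

```python
# Illustrative encodings of the "dumb" predicates' signatures.
preds = {
    "dumb0": {"type": ["list(nat)", "list(X)"], "refinement": "true"},
    "dumb1": {"type": ["list(nat)", "int"],     "refinement": "false"},
    "dumb2": {"type": ["list(int)", "int"],     "refinement": "false"},
}

def usable(pred, expected_types):
    # 1. A refinement of `false` can never be satisfied, in any context.
    if pred["refinement"] == "false":
        return False
    # 2. A crude type test: positions without type variables must match
    #    the expected type exactly ("_" means no expectation).
    return all(e == "_" or "X" in t or e == t
               for t, e in zip(pred["type"], expected_types))

# In a droplasts clause over list(int), dumb0's list(nat) argument fails
# the type test; dumb1 and dumb2 fail on their refinements alone.
assert not usable(preds["dumb0"], ["list(int)", "_"])
assert not usable(preds["dumb1"], ["_", "_"])
assert not usable(preds["dumb2"], ["_", "_"])
```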
Result
The plots in figure 6.4 depict the average time and number of proof steps required by the untyped, simply typed and refinement typed systems. The standard deviations are included as bars.

Unsurprisingly, the time results show that our refinement checking is very slow, and has very high variance as well. The plot of the number of proof steps on the right shows that, while there is evidence for concluding that the refinement type checking system can improve on the untyped system, the simply typed and refinement typed systems are quite close in the size of the search space traversed. This experiment is not able to show significant improvements for refinement type checking. A better-considered experiment is needed, which we have to defer to future work.

The data does show that the refinement type checking system makes do with strictly fewer proof steps than the (simple) polymorphic typed system and the untyped system. We can cautiously use this as evidence that the refinement typed system improves on the untyped (and simply typed) system, though we have to keep in mind that this is only a minor reduction in the size of the search space and that it comes at a very large expense in terms of runtime.

Figure 6.4: Average number of proof steps and time (in secs) for an increasing number of typed background predicates, for the untyped, simply typed and refinement typed systems. Standard error is depicted by bars.

Chapter 7

Conclusion
Logic program synthesis by Meta-Interpretive Learning (MIL) benefits significantly, in terms of search space explored and time needed, from type checking. The difference in performance between the untyped Metagol AI algorithm and the same algorithm with simple type checking, Metagol PT, is impressive. On the other hand, the value of the refinement type checking introduced in this document is foremost theoretical: there is only limited experimental improvement, but the work does represent an advancement in bringing refinement typing to Inductive Logic Programming (ILP).

Summary and evaluation
In building on the MIL framework, the basic synthesis algorithm has shown why it is such a convincing approach to ILP: it is a simple, succinct, and highly adaptable extension of the algorithm used for proving atoms in logic programming. The Metagol AI variant of the framework was essential in allowing the system to express higher-order programs. It is this work on higher-order programs that made it possible to consider interesting refinement type properties. The major deficiency we identified with the system was its naivety in not taking typing into account.

The addition of polymorphic types to the MIL algorithm is a very good fit. The syntax for types has been chosen such that it is easy to leverage the powerful unification of Prolog, which is entirely responsible for type checking. In experimental work we have shown that the introduction of simple type checking is very effective, yielding significant improvements both in shrinking the search space explored and in the time required for synthesis. There is (up to) a cubic reduction in the size of the search space and synthesis time, in terms of the number of typed background predicates. As annotating background knowledge with polymorphic types is only a small burden for users, this result provides a strong argument for taking polymorphic type checking into consideration for future ILP systems.

We presented the theory that introduces refinement types to ILP. Refinements on types allow for more accurate type checking. The cost of checking the refinements is, however, not insignificant. We leverage SMT solving, which in the current approach incurs such overhead that, timewise, the refinement checking has a severe detrimental effect. The overhead is attributable to the logics considered not being as performant as hoped, and to the sheer number of invocations of the solver. The experimental work shows that some additional pruning of the search space is achieved, though more work is needed to make refinement type checking in ILP worthwhile.
Future work

Theory
The experimental work on polymorphic types (section 5.4) showed (and partially explained) a discrepancy between the theory of the size of hypothesis spaces and the search space over programs actually considered by MIL algorithms. A better characterization of the search spaces considered by MIL would be useful in predicting the impact of algorithmic changes.

The work on polymorphic type checking itself can be made more convincing by presenting formal proofs of soundness and completeness. In the future a formal type system should be introduced for this purpose.
Justifying refinements
While the polymorphic type checking approach is quite satisfactory, the refinement types work can be improved upon in a number of areas. Following on from the experimental work, better experiments are needed to justify refinement type checking as a worthwhile approach to further pruning the search space.

In addition, as the main aim of refinement type checking is to improve performance, the most direct way of addressing this issue is by making use of more performant theories for the SMT solvers. The work is mainly in identifying logics that are expressive enough to state useful properties while requiring much less time to prove (un)satisfiability. Analysis of the number of invocations of the SMT solvers in the experimental data of section 6.4 could be used to quantify the solver performance needed for the current approach, which involves many SMT invocations, to be a sensible way forward.
Reduction to SMT
The current approach asks the SMT solver to solve an entirely new problem every time it is invoked, but when a previous refinement check was satisfiable it is usually the case that only additional assertions need to be added. This structural property could be leveraged to make solvers more efficient by allowing them to reuse work performed for previous checks. A more ambitious approach (briefly considered for this project) is to completely encode the ILP synthesis problem as a set of constraints for an SMT solver. The prospect of integrating the checking of refinements into such a single SMT problem is especially enticing.
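A sketch of what such reuse could look like at the solver interface, using SMT-LIB's standard incremental commands (illustrative only; the current implementation does not do this):

```smtlib
; Run with the solver in incremental mode: shared assertions stay loaded
; across checks, and push/pop scopes the per-candidate additions.
(set-logic QF_LIA)
(declare-const n Int)
(assert (>= n 0))      ; context carried over from an earlier, satisfiable check
(push 1)
(assert (< n 0))       ; extra constraint for the next candidate clause
(check-sat)            ; unsat: this candidate is pruned
(pop 1)
(push 1)
(assert (<= n 5))      ; a different candidate's constraint
(check-sat)            ; sat: this candidate survives
(pop 1)
```

Between checks the solver can retain lemmas learned from the shared assertions, rather than re-deriving them from scratch on every invocation.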
Directing the search
In section 4.3 we already explored the possibility of types guiding the traversal of the search space, which, given a well-considered heuristic, could further prune the search space. Such further pruning might also have implications for the viability of the current refinement type checking approach, as it could significantly reduce the number of invocations of the SMT solver.
Functional metarules
The MIL approach to synthesis is especially powerful in that it is able to invent program clauses. The main reason for this capability is the metarules. It appears that introducing rules for structural invention might be applicable outside of ILP. Introducing metarules to the setting of functional programs would be a major new avenue to explore. Existing type-system-based approaches could be extended, thereby bringing the (almost) unique feature of inventing helper clauses/functions to the field of functional program synthesis.

Bibliography
Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. Constraint-based synthesis of Datalog programs. 10416:689–706, 2017. doi: 10.1007/978-3-319-66158-2. URL http://link.springer.com/10.1007/978-3-319-66158-2.

Yoah Bar-David and Gadi Taubenfeld. Automatic discovery of mutual exclusion algorithms. In International Symposium on Distributed Computing, pages 136–150. Springer, 2003.

Clark Barrett, Aaron Stump, Cesare Tinelli, et al. The SMT-LIB standard: Version 2.0. In Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (Edinburgh, England), volume 13, page 14, 2010.

Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In International Conference on Computer Aided Verification, pages 171–177. Springer, 2011.

Wolfgang Bibel. Automated theorem proving. Springer Science & Business Media, 2013.

Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.

Coq. The Coq proof assistant, version 8.8.0, April 2018. URL https://doi.org/10.5281/zenodo.1219885.

Andrew Cropper. Efficiently learning efficient programs. PhD thesis, Imperial College London, 2017.

Andrew Cropper and Stephen H. Muggleton. Learning higher-order logic programs through abstraction and invention. IJCAI International Joint Conference on Artificial Intelligence, 2016-January:1418–1424, 2016a.

Andrew Cropper and Stephen H. Muggleton. Metagol system. https://github.com/metagol/metagol, 2016b.

Andrew Cropper and Stephen H. Muggleton. Learning efficient logic programs. Machine Learning, April 2018. doi: 10.1007/s10994-018-5712-6. URL https://doi.org/10.1007/s10994-018-5712-6.

Andrew Cropper, Alireza Tamaddoni-Nezhad, and Stephen H. Muggleton. Meta-interpretive learning of data transformation programs. In Katsumi Inoue, Hayato Ohwada, and Akihiro Yamamoto, editors, Inductive Logic Programming, pages 46–59, Cham, 2016. Springer International Publishing.

Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pages 248–255. IEEE, 2009.

Colin Farquhar, Gudmund Grov, Andrew Cropper, Stephen Muggleton, and Alan Bundy. Typed meta-interpretive learning for proof strategies. CEUR Workshop Proceedings, 1636:17–32, 2015.

Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. Example-directed synthesis: a type-theoretic interpretation. ACM SIGPLAN Notices, 51(1):802–815, 2016.

Tim Freeman and Frank Pfenning. Refinement types for ML, volume 26. ACM, 1991.

J.-Y. Girard. Une extension de l'interprétation de Gödel à l'analyse et son application à l'élimination des coupures dans l'analyse et la théorie des types. In Proc. 2nd Scandinavian Logic Symposium, pages 63–92. North-Holland, 1971.

Sumit Gulwani. Dimensions in program synthesis. In Proceedings of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming (PPDP '10), pages 13–24, 2010. doi: 10.1145/1836089.1836091.

Sumit Gulwani, José Hernández-Orallo, Emanuel Kitzelmann, Stephen H. Muggleton, Ute Schmid, and Benjamin Zorn. Inductive programming meets the real world. Communications of the ACM, 58(11):90–99, 2015.

Tihomir Gvero, Viktor Kuncak, Ivan Kuraj, and Ruzica Piskac. Complete completion using types and weights. PLDI, 48(6):27–38, 2013. doi: 10.1145/2491956.2462192.

Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, and Ashish Tiwari. Oracle-guided component-based program synthesis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE '10, pages 215–224, New York, NY, USA, 2010. ACM. doi: 10.1145/1806799.1806833.

Paris C. Kanellakis and John C. Mitchell. Polymorphic unification and ML typing. In Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 105–115. ACM, 1989.

Susumu Katayama. Systematic search for lambda expressions. Trends in Functional Programming, 81(985):195–205, 2005.

Emanuel Kitzelmann. Data-driven induction of recursive functions from input/output-examples. In Proceedings of the ECML/PKDD 2007 Workshop on Approaches and Applications of Inductive Programming (AAIP'07), 2007.

Emanuel Kitzelmann. Inductive programming: A survey of program synthesis techniques. In Ute Schmid, Emanuel Kitzelmann, and Rinus Plasmeijer, editors, Approaches and Applications of Inductive Programming, pages 50–73, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.

Tessa A. Lau, Pedro Domingos, and Daniel S. Weld. Version space algebra and its application to programming by demonstration. In ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning, pages 527–534, 2000. URL http://dl.acm.org/citation.cfm?id=657973.

Dianhuan Lin, Eyal Dechter, Kevin Ellis, Joshua B. Tenenbaum, and Stephen H. Muggleton. Bias reformulation for one-shot function induction. 2014.

Henry Massalin. Superoptimizer: A look at the smallest program. SIGARCH Computer Architecture News, 15(5):122–126, October 1987. doi: 10.1145/36177.36194.

D. Michie. Machine learning in the next five years. In Proceedings of the Third European Working Session on Learning, pages 107–122, 1988.

Stephen H. Muggleton. Inductive logic programming. New Generation Computing, 8(4):295–318, 1991. doi: 10.1007/BFb0027303.

Stephen H. Muggleton, Dianhuan Lin, Niels Pahlavi, and Alireza Tamaddoni-Nezhad. Meta-interpretive learning: Application to grammatical inference. Machine Learning, 94(1):25–49, 2014a. doi: 10.1007/s10994-013-5358-3.

Stephen H. Muggleton, Dianhuan Lin, Niels Pahlavi, and Alireza Tamaddoni-Nezhad. Meta-interpretive learning: application to grammatical inference. Machine Learning, 94(1):25–49, 2014b.

Stephen H. Muggleton, Dianhuan Lin, and Alireza Tamaddoni-Nezhad. Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited. Machine Learning, 100(1):49–73, 2015. doi: 10.1007/s10994-014-5471-y.

Stephen H. Muggleton, Ute Schmid, Christina Zeller, Alireza Tamaddoni-Nezhad, and Tarek Besold. Ultra-strong machine learning: comprehensibility of programs learned with ILP. Machine Learning, 107(7):1119–1140, July 2018. doi: 10.1007/s10994-018-5707-3.

N. Bjørner, V. Ganesh, R. Michel, and M. Veanes. An SMT-LIB format for sequences and regular expressions. In SMT Workshop 2012, 2012.

Shan-Hwei Nienhuys-Cheng and Ronald De Wolf. Foundations of inductive logic programming, volume 1228. Springer Science & Business Media, 1997.

Tobias Nipkow, Markus Wenzel, and Lawrence C. Paulson. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Springer-Verlag, Berlin, Heidelberg, 2002.

Peter-Michael Osera and Steve Zdancewic. Type-and-example-directed program synthesis. In ACM SIGPLAN Notices, volume 50, pages 619–630. ACM, 2015.

Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. Program synthesis from polymorphic refinement types. ACM SIGPLAN Notices, 51(6):522–538, 2016.

Luc De Raedt. Inductive Logic Programming, pages 529–537. Springer US, Boston, MA, 2010. doi: 10.1007/978-0-387-30164-8_396.

Veselin Raychev, Martin Vechev, and Eran Yahav. Code completion with statistical language models. SIGPLAN Notices, 49(6):419–428, June 2014. doi: 10.1145/2666356.2594321.

Andrew Reynolds and Jasmin Christian Blanchette. A decision procedure for (co)datatypes in SMT solvers. In International Conference on Automated Deduction, pages 197–213. Springer, 2015.

Andrew Reynolds, Jasmin Christian Blanchette, Simon Cruanes, and Cesare Tinelli. Model finding for recursive functions in SMT. In International Joint Conference on Automated Reasoning, pages 133–151. Springer, 2016.

Rishabh Singh and Sumit Gulwani. Learning semantic string transformations from examples. Proceedings of the VLDB Endowment, 5(8):740–751, 2012a. doi: 10.14778/2212351.2212356.

Rishabh Singh and Sumit Gulwani. Synthesizing number transformations from input-output examples. CAV, pages 634–651, 2012b.

Armando Solar-Lezama. The sketching approach to program synthesis. Lecture Notes in Computer Science, 5904:4–13, 2009. doi: 10.1007/978-3-642-10672-9_3.

Morten Heine Sørensen and Pawel Urzyczyn. Lectures on the Curry-Howard isomorphism, volume 149. Elsevier, 2006.

Saurabh Srivastava, Sumit Gulwani, and Jeffrey S. Foster. Template-based program verification and program synthesis. International Journal on Software Tools for Technology Transfer, 15(5-6):497–518, 2013. doi: 10.1007/s10009-012-0223-4.

Xinyu Wang, Isil Dillig, and Rishabh Singh. Program synthesis using abstraction refinement. 1(January), 2017. doi: 10.1145/3158151. URL http://arxiv.org/abs/1710.07740.

Henry S. Warren.