Inductive logic programming at 30
Andrew Cropper, Sebastijan Dumančić, Richard Evans, Stephen H. Muggleton
Abstract
Inductive logic programming (ILP) is a form of logic-based machine learning. The goal of ILP is to induce a hypothesis (a logic program) that generalises given training examples and background knowledge. As ILP turns 30, we survey recent work in the field. In this survey, we focus on (i) new meta-level search methods, (ii) techniques for learning recursive programs that generalise from few examples, (iii) new approaches for predicate invention, and (iv) the use of different technologies, notably answer set programming and neural networks. We conclude by discussing some of the current limitations of ILP and directions for future research.
A. Cropper, University of Oxford. E-mail: [email protected]
S. Dumančić, KU Leuven. E-mail: [email protected]
R. Evans, Imperial College London. E-mail: [email protected]
S. H. Muggleton, Imperial College London. E-mail: [email protected]

1 Introduction

Inductive logic programming (ILP) [75, 78] is a form of machine learning (ML). As with other forms of ML, the goal of ILP is to induce a hypothesis that generalises training examples. However, whereas most forms of ML use vectors/tensors to represent data (examples and hypotheses), ILP uses logic programs (sets of logical rules). Moreover, whereas most forms of ML learn functions, ILP learns relations. (We do not introduce ILP in detail and refer the reader to the introductory paper of Cropper and Dumančić [ ] or the textbooks of Nienhuys-Cheng and Wolf [ ] and De Raedt [ ].)

To illustrate ILP, suppose you want to learn a string transformation program from the following examples:

  Input        Output
  inductive    e
  logic        c
  programming  g

Most forms of ML would represent these examples as a table, where each row would be an example and each column would be a feature, such as a one-hot-encoding representation of the string. By contrast, in ILP, we would represent these examples as logical atoms, such as f([i,n,d,u,c,t,i,v,e], e), where f is the target predicate that we want to learn (the relation to generalise). We would also provide auxiliary information (features) in the form of background knowledge (BK), also represented as a logical theory (a logic program). For instance, for the string transformation problem, we could provide BK that contains logical definitions for string operations, such as empty(A), which holds when the list A is empty; head(A,B), which holds when B is the head of the list A; and tail(A,B), which holds when B is the tail of the list A. Given the aforementioned examples and BK, an ILP system could induce the hypothesis (a logic program):

  f(A,B):- tail(A,C),empty(C),head(A,B).
  f(A,B):- tail(A,C),f(C,B).

Each line of the program is a rule.
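Read operationally, this hypothesis computes the last element of a list. As a rough illustration (our own sketch, not the output of any ILP system), a Python analogue of the two rules is:

```python
def f(a):
    """Return the last element of a non-empty list, mirroring the induced program:
    f(A,B):- tail(A,C),empty(C),head(A,B).   (base case)
    f(A,B):- tail(A,C),f(C,B).               (recursive case)
    """
    head, tail = a[0], a[1:]
    if not tail:        # empty(C): the tail is empty, so B is the head of A
        return head
    return f(tail)      # otherwise the relation holds for the tail of A
```

For example, `f(list("inductive"))` returns `"e"`, matching the first training example.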
The first rule says that the relation f(A,B) holds when the three literals tail(A,C), empty(C), and head(A,B) hold. In other words, the first rule says that B is the last element of A when the tail of A is empty and B is the head of A. The second rule is recursive and says that the relation f(A,B) holds when the two literals tail(A,C) and f(C,B) hold. In other words, the second rule says that f(A,B) holds when the same relation holds for the tail of A.

1.1 Why ILP?

Compared to most ML approaches, ILP has several attractive features [25, 17]:

Data efficiency. Many forms of ML are notorious for their inability to generalise from small numbers of training examples, notably deep learning [70, 13]. As Evans and Grefenstette [ ] point out, if we train a neural system to add numbers with 10 digits, it might generalise to numbers with 20 digits, but when tested on numbers with 100 digits, the predictive accuracy drastically decreases [91, 53]. By contrast, ILP can induce hypotheses from small numbers of examples, often from a single example [69, 82].

Background knowledge. ILP learns using BK represented as a logic program. Using logic programs to represent data allows ILP to learn with complex relational information, such as constraints about causal networks [ ], the axioms of the event calculus when learning to recognise events [55, 56], and a theory of light to understand images [ ]. Moreover, because hypotheses are symbolic, hypotheses can be added to the BK, and thus ILP systems naturally support lifelong and transfer learning [69, 15, 16].

Expressivity. Because of the expressivity of logic programs, ILP can learn complex relational theories, such as cellular automata [51, 40], event calculus theories [55, 56], Petri nets [ ], and general algorithms [ ]. Because of the symbolic nature of logic programs, ILP can reason about hypotheses, which allows it to learn optimal programs, such as minimal time-complexity programs [ ] and secure access control policies [ ].

Explainability.
Because of logic's similarity to natural language, logic programs can be easily read by humans, which is crucial for explainable AI. For instance, Muggleton et al [ ] provide the first demonstration of ultra-strong ML [ ], where a learned hypothesis is expected to not only be accurate but to also demonstrably improve the performance of a human when provided with the learned hypothesis.

1.2 Recent advances

Some of the aforementioned advantages come from recent developments, which we survey in this paper. To aid the reader, we coarsely compare old and new ILP systems, where new represents systems from the past decade. We use FOIL [ ], Progol [ ], TILDE [ ], and HYPER [ ] as representative old systems and ILASP [ ], Metagol [ ], ∂ILP [ ], and Popper [ ] as representative new systems. This comparison, shown in Table 1, is, of course, vastly oversimplified, and there are many exceptions. In the rest of this paper, we survey these developments (each row in the table) in turn. After discussing these new ideas, we discuss recent application areas (Section 5.2) before concluding by proposing directions for future research.

Table 1: A simplified comparison of old and new ILP systems.

                       Old ILP                  New ILP
  Search method        Top-down and bottom-up   Meta-level
  Recursion            Limited                  Yes
  Predicate invention  No                       Limited
  Hypotheses           First-order              Higher-order, ASP
  Optimality           No                       Yes
  Technology           Prolog                   Prolog, ASP, NNs
The fundamental ILP problem is to efficiently search a large hypothesis space. Most older ILP approaches search in either a top-down or bottom-up fashion. These methods rely on notions of generality (typically using theta-subsumption [ ]), where one program is more general or more specific than another. A third new search approach has recently emerged called meta-level ILP [50, 84, 49, 66, 19]. We discuss these approaches in turn.

(This paper extends the paper of Cropper et al [ ].)

1.3 Top-down and bottom-up

Top-down approaches [89, 9, 12] start with a general hypothesis and then specialise it. HYPER, for instance, searches a tree in which the nodes correspond to hypotheses and each child of a hypothesis in the tree is more specific than or equal to its predecessor in terms of theta-subsumption. An advantage of top-down approaches is that they can often learn recursive programs (although not all do). A disadvantage is that they can be prohibitively inefficient because they can generate many hypotheses that do not cover the examples.

Bottom-up approaches, by contrast, start with the examples and generalise them [74, 77, 79, 51]. For instance, Golem [ ] generalises pairs of examples based on relative least-general generalisation [ ]. Bottom-up approaches can be seen as being data- or example-driven. An advantage of these approaches is that they are typically fast. As Bratko [ ] points out, disadvantages include (i) they typically use unnecessarily long hypotheses with many clauses, (ii) it is difficult for them to learn recursive hypotheses and multiple predicates simultaneously, and (iii) they do not easily support predicate invention.

Progol [ ], which inspired many other ILP approaches [ ], combines both top-down and bottom-up approaches. Starting with an empty program, Progol picks an uncovered positive example to generalise. To generalise an example, Progol uses mode declarations to build the bottom clause [ ], the logically most-specific clause that explains the example. The bottom clause bounds the search from below (the bottom clause) and above (the empty set). Progol then uses an A* algorithm to generalise the bottom clause in a top-down (general-to-specific) manner and uses the other examples to guide the search.

Top-down and bottom-up approaches refine and revise a single hypothesis. A third approach has recently emerged called meta-level
ILP [50, 84, 49, 66, 19]. There is no standard definition for meta-level ILP. Most approaches encode the ILP problem as a meta-level logic program, i.e. a program that reasons about programs. Meta-level approaches then often delegate the search for a hypothesis to an off-the-shelf solver [14, 21, 62, 54, 100, 40, 19], after which the meta-level solution is translated back to a standard solution for the ILP task. In other words, instead of writing a procedure to search in a top-down or bottom-up manner, most meta-level approaches formulate the learning problem as a declarative search problem. For instance, ASPAL [ ] translates an ILP task into a meta-level ASP program which describes every example and every possible rule in the hypothesis space. ASPAL then delegates the search to an ASP system to find a subset of the rules that covers all the positive but none of the negative examples.

The main advantage of meta-level approaches is that they can more easily learn recursive programs and optimal programs [14, 62, 21, 54, 40, 19], which we discuss in Sections 2 and 4 respectively. Moreover, whereas classical ILP systems were almost entirely based on Prolog, meta-level approaches use diverse techniques and technologies, such as ASP solvers [14, 62, 54, 19, 40], which we expand on in Section 5. The development of meta-level ILP approaches has, therefore, diversified ILP from the standard clause refinement approach of earlier ILP systems.

Most meta-level approaches encode the ILP learning task as a single static meta-level program [14, 62, 54, 40]. A major issue with this approach is that the meta-level program can be very large, so these approaches can struggle to scale to problems with non-trivial domains and to programs with large clauses. Two related approaches try to overcome this limitation by continually revising the meta-level program.

ILASP3 [ ] employs a counter-example-driven select-and-constrain loop. ILASP3 first pre-computes every clause in the hypothesis space defined by a set of given mode declarations [ ]. ILASP3 then starts its select-and-constrain loop. With each iteration, ILASP3 uses an ASP solver to find the best hypothesis (a subset of the rules) it can. If the hypothesis does not cover one of the examples, ILASP3 finds a reason why and then generates constraints (boolean formulas over the rules) which it adds to the meta-level program to guide subsequent search. Another way of viewing ILASP3 is that it uses a counter-example-guided approach and translates an uncovered example e into a constraint that is satisfied if and only if e is covered.

Popper [ ] adopts a similar approach but differs in that it (i) does not precompute every possible rule in the hypothesis space, and (ii) translates a hypothesis into a set of constraints, rather than an uncovered example. Popper works in three repeating stages: generate, test, and constrain. Popper first constructs a meta-level logic program whose models correspond to hypotheses. In the generate stage, Popper asks an ASP solver to find a model (a hypothesis). In the test stage, Popper tests the hypothesis against the examples. A hypothesis fails when it is incomplete (does not entail all the positive examples) or inconsistent (entails a negative example). If a hypothesis fails, Popper learns constraints from the failure, which it then uses to restrict subsequent generate stages.
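This generate-test-constrain loop can be caricatured in Python (a toy sketch under our own assumptions: a hypothesis is a set of candidate rules whose coverage is the union of per-rule coverage, and a brute-force enumerator stands in for the ASP solver; all names are invented for illustration):

```python
from itertools import combinations

def popper_loop(candidate_rules, covers, pos, neg):
    """Toy generate-test-constrain loop. covers(hyp, ex) plays the
    role of entailment; coverage is assumed monotone in the rules."""
    pruned = []  # generalisation constraints: prune supersets of these
    # Generate stage: enumerate hypotheses smallest-first.
    for size in range(1, len(candidate_rules) + 1):
        for hyp in combinations(candidate_rules, size):
            if any(p <= set(hyp) for p in pruned):
                continue  # eliminated by a learned constraint
            # Test stage: check the hypothesis against the examples.
            if any(covers(hyp, e) for e in neg):
                # Inconsistent: with monotone coverage, every
                # generalisation (superset) also entails the negative
                # example, so constrain all of them away.
                pruned.append(frozenset(hyp))
                continue
            if all(covers(hyp, e) for e in pos):
                return hyp  # complete and consistent
    return None

# Toy instance: each rule covers a fixed set of examples.
coverage = {"r1": {"p1"}, "r2": {"p2", "n1"}, "r3": {"p2"}}
covers = lambda hyp, e: any(e in coverage[r] for r in hyp)
best = popper_loop(["r1", "r2", "r3"], covers, {"p1", "p2"}, {"n1"})
```

In this toy run, the rule r2 entails the negative example n1, so every hypothesis containing r2 is pruned, and the loop returns the hypothesis {r1, r3}.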
For instance, if a hypothesis is inconsistent, then Popper generates a generalisation constraint to prune all generalisations of the hypothesis and adds the constraint to the meta-level program, which eliminates models and thus prunes the hypothesis space. This process repeats until Popper finds a complete and consistent program.

For more information about meta-level learning, we suggest the work of Inoue [ ] and Law et al [ ].

Learning recursive programs has long been considered a difficult problem for ILP [ ]. The power of recursion is that an infinite number of computations can be described by a finite recursive program [ ]. To illustrate the importance of recursion, reconsider the string transformation problem from the introduction. Without recursion, an ILP system would need to learn a separate clause to find the last element for each list of length n, such as this program for when n = 3:

  f(A,B):- tail(A,C),empty(C),head(A,B).
  f(A,B):- tail(A,C),tail(C,D),empty(D),head(C,B).
  f(A,B):- tail(A,C),tail(C,D),tail(D,E),empty(E),head(D,B).

This program does not generalise to lists of arbitrary lengths. Moreover, most ILP systems would need examples of lists of each length to learn such a program. By contrast, an ILP system that supports recursion can learn the compact program:

  f(A,B):- tail(A,C),empty(C),head(A,B).
  f(A,B):- tail(A,C),f(C,B).
Because of its symbolic representation and recursive nature, this program generalises to lists of arbitrary length that contain arbitrary elements (e.g. integers and characters). In general, without recursion, it can be difficult for an ILP system to generalise from small numbers of examples [ ].

Older ILP systems struggle to learn recursive programs, especially from small numbers of training examples. A common limitation with existing approaches is that they rely on bottom clause construction [ ]. In this approach, for each example, an ILP system creates the most specific clause that entails the example, and then tries to generalise the clause to entail other examples. However, this sequential covering approach requires examples of both the base and inductive cases.

Interest in recursion has resurged with the introduction of meta-interpretive learning (MIL) [83, 84, 27] and the MIL system Metagol [ ]. The key idea of MIL is to use metarules [ ], or program templates, to restrict the form of inducible programs, and thus the hypothesis space. A metarule is a higher-order clause. For instance, the chain metarule is P(A,B) ← Q(A,C), R(C,B), where the letters P, Q, and R denote higher-order variables and A, B, and C denote first-order variables. The goal of a MIL system, such as Metagol, is to find substitutions for the higher-order variables. For instance, the chain metarule allows Metagol to induce programs such as f(A,B):- tail(A,C),head(C,B). Metagol induces recursive programs using recursive metarules, such as the tailrec metarule P(A,B) ← Q(A,C), P(C,B).

Following MIL, many meta-level ILP systems can learn recursive programs [62, 39, 54, 19]. With recursion, ILP systems can now generalise from small numbers of examples, often a single example [ ]. Moreover, the ability to learn recursive programs has opened up ILP to new application areas, including learning string transformation programs [ ], answer set grammars [ ], and general algorithms [ ].

A key characteristic of ILP is the use of BK. BK is similar to features used in most forms of ML. However, whereas features are tables, BK contains facts and rules (extensional and intensional definitions) in the form of a logic program. For instance, when learning string transformation programs, we may provide helper background relations, such as head/2 and tail/2. For other domains, we may supply more complex BK, such as a theory of light to understand images [ ] or higher-order operations, such as map/3, filter/3, and fold/4, to solve programming puzzles [ ].

As with choosing appropriate features, choosing appropriate BK is crucial for good learning performance. ILP has traditionally relied on hand-crafted BK, often designed by domain experts. This approach is limited because obtaining suitable BK can be difficult and expensive. Indeed, the over-reliance on hand-crafted BK is a common criticism of ILP [ ].

Rather than expecting a user to provide all the necessary BK, the goal of predicate invention (PI) [77, 104] is for an ILP system to automatically invent new auxiliary predicate symbols. This idea is similar to how humans create new functions when manually writing programs, so as to reduce code duplication or to improve readability. Whilst PI has attracted interest since the beginnings of ILP [ ], and has subsequently been repeatedly stated as a major challenge [
58, 81, 60], most ILP systems do not support it.

A key challenge faced by early ILP systems was deciding when and how to invent a new symbol. As Kramer [ ] points out, PI is difficult because it is unclear how many arguments an invented predicate should have, how the arguments should be ordered, etc. Several PI approaches try to address this challenge, which we discuss in turn.

(The idea of using metarules to restrict the hypothesis space has been widely adopted by many approaches [ ]. However, despite their now widespread use, there is little work determining which metarules to use for a given learning task ([ ] is an exception), which future work must address. Metagol can induce longer clauses through predicate invention, which is described in Section 3.)

3.1 Placeholders

Some approaches predefine invented symbols, which [ ] call placeholders. However, this placeholder approach is limited because it requires that a user manually specify the arity and argument types of a symbol [ ], which rather defeats the point, or requires generating all possible invented predicates [39, 40], which is computationally expensive.

3.2 Metarules

Interest in automatic PI (where a user does not need to predefine an invented symbol) has resurged with the introduction of MIL. MIL avoids the issues of older ILP systems by using metarules to define the hypothesis space and in turn reduce the complexity of inventing a new predicate symbol. For instance, the chain metarule (P(A,B) ← Q(A,C), R(C,B)) allows Metagol to induce programs such as f(A,B):- tail(A,C),tail(C,B), which would drop the first two elements from a list. To induce longer clauses, such as to drop the first three elements from a list, Metagol uses the same metarule but invents a predicate symbol to chain their application, such as to induce the program:

  f(A,B):- tail(A,C),inv1(C,B).
  inv1(A,B):- tail(A,C),tail(C,B).

To learn this program, Metagol invents the predicate symbol inv1 and induces a definition for it using the chain metarule. Metagol uses this new predicate symbol in the definition for the target predicate f.

A side-effect of this metarule-driven approach is that problems are forced to be decomposed into reusable solutions. For instance, to learn a program that drops the first four elements of a list, Metagol learns the following program, where the invented predicate symbol inv1 is used twice:

  f(A,B):- inv1(A,C),inv1(C,B).
  inv1(A,B):- tail(A,C),tail(C,B).

PI has been shown to help reduce the size of target programs, which in turn reduces sample complexity and improves predictive accuracy [ ]. Several new ILP systems support PI using a metarule-guided approach [39, 54, 47].

3.3 Pre/post-processing

Metarule-driven PI approaches perform PI during the learning task. A recent trend is to perform PI as a pre- or post-processing step to improve knowledge representation [
36, 37, 15, 47].

CURLED [ ] performs PI by clustering constants and relations in the provided BK, turning each identified cluster into a new BK predicate. The key insight of CURLED is not to use a single similarity measure, but rather a set of various similarities. This choice is motivated by the fact that different similarities are useful for different tasks, but in the unsupervised setting the task itself is not known in advance. CURLED performs PI by producing different clusterings according to the features of the objects, community structure, and so on.
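As a rough illustration of clustering-based PI (our own minimal sketch, not CURLED itself; the BK facts and the similarity measure are invented for illustration), one similarity might group constants that occur in exactly the same relations, with each non-trivial cluster becoming the extension of a new predicate:

```python
from collections import defaultdict

# Hypothetical BK facts as (relation, constant) pairs (toy data).
bk = [("parent", "ann"), ("parent", "bob"),
      ("employed", "ann"), ("employed", "bob"),
      ("toy", "ball"), ("toy", "car")]

def invent_predicates(facts):
    """Cluster constants by the set of relations they appear in;
    each cluster with at least two members becomes the extension
    of a new, invented predicate inv0, inv1, ..."""
    profile = defaultdict(set)
    for relation, constant in facts:
        profile[constant].add(relation)
    clusters = defaultdict(list)
    for constant, relations in profile.items():
        clusters[frozenset(relations)].append(constant)
    groups = sorted(sorted(members) for members in clusters.values())
    return {f"inv{i}": g
            for i, g in enumerate(g for g in groups if len(g) > 1)}

invented = invent_predicates(bk)
```

Here ann and bob share the same relational profile, as do ball and car, so two predicates are invented, one for each cluster.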
ALPs [ ] perform PI using an auto-encoding principle: they learn an encoding logic program that maps the provided data to a new, compressive latent representation (defined in terms of the invented predicates), and a decoding logic program that can reconstruct the provided data from its latent representation. This approach shows improved performance on supervised tasks, even though the PI step is task-agnostic.

Knorf [ ] pushes the idea of ALPs even further. Knorf compresses a program by removing redundancies in it. If the learnt program contains invented predicates, Knorf revises them and introduces new ones that would lead to a smaller program. The refactored program is smaller in size and contains less redundancy in clauses, both of which lead to improved performance. The authors experimentally demonstrate that refactoring improves learning performance in lifelong learning and that Knorf substantially reduces the size of the BK program, reducing the number of literals in a program by 50% or more.

3.4 Lifelong learning

An approach to acquiring BK is to learn it in a lifelong learning setting. The general idea is to reuse knowledge gained from solving one problem to help solve a different problem. Metagol_DF [ ] is an ILP system which, given a set of tasks, uses Metagol to try to learn a solution for each task using at most one clause. If Metagol finds a solution for a task, it adds the solution to the BK and removes the task from the set. Metagol_DF then asks Metagol to find solutions for the rest of the tasks, but can now (i) use an additional clause, and (ii) reuse solutions from previously solved tasks. This process repeats until Metagol_DF solves all the tasks or reaches a maximum program size. In this approach, Metagol_DF automatically identifies easier problems, learns programs for them, and then reuses the solutions to help learn programs for more difficult problems. The authors experimentally show that their multi-task approach performs substantially better than a single-task approach, because learned programs are frequently reused, and leads to a hierarchy of induced programs.

Metagol_DF saves all learned programs (including invented predicates) to the BK, which can be problematic because too much irrelevant BK is detrimental to learning performance [ ]. To address this problem, Forgetgol [ ] introduces the idea of forgetting. In this approach, Forgetgol continually grows and shrinks its hypothesis space by adding and removing learned programs to and from its BK. The authors show that forgetting can reduce both the size of the hypothesis space and the sample complexity of an ILP learner when learning from many tasks.

3.5 Limitations

The aforementioned techniques have improved the ability of ILP to invent high-level concepts. However, PI is still difficult and there are many challenges to overcome. The challenges are that (i) many systems struggle to perform PI at all, and (ii) those that do support PI mostly need much user-guidance, such as metarules to restrict the space of invented symbols or a user-specified arity and argument types for invented symbols.

By developing better approaches for PI, we can make progress on existing challenging problems. For instance, in inductive general game playing (IGGP) [ ], the task is to learn the symbolic rules of games from observations of gameplay, such as learning the rules of connect four. The target solutions, which come from the general game playing competition [ ], often contain auxiliary predicates. For instance, the rules for connect four are defined in terms of definitions for lines, which are themselves defined in terms of columns, rows, and diagonals. Although these auxiliary predicates are not strictly necessary to learn the target solution, inventing such predicates significantly reduces the size of the solution, which in turn makes them easier to learn.
Although new methods for PI can invent high-level concepts, they are not yet sufficiently powerful to perform well on the IGGP dataset. Making progress in this area would constitute a major advancement in ILP.

ILP systems have traditionally induced definite and normal logic programs, typically represented as Prolog programs. A recent development has been to use different hypothesis representations.

3.6 Datalog

Datalog is a syntactic subset of Prolog which disallows complex terms as arguments of predicates and imposes restrictions on the use of negation. Datalog is a truly declarative language, whereas in Prolog reordering clauses can change the program. Moreover, a Datalog query is guaranteed to terminate, though this guarantee comes at the expense of Datalog not being a Turing-complete language, which Prolog is. Several works [3, 39, 54] induce Datalog programs. The general motivation for reducing the expressivity of the representation language from Prolog to Datalog is to allow the problem to be encoded as a satisfiability problem, particularly to leverage recent developments in SAT and SMT. We discuss the advantages of this approach more in Section 5.1.

3.7 Answer set programming

ASP [ ] is a logic programming paradigm based on the stable model semantics of normal logic programs that can be implemented using the latest advances in SAT solving technology. Law et al [ ] discuss some of the advantages of learning ASP programs, rather than Prolog programs, which we reiterate. When learning Prolog programs, the procedural aspect of SLD-resolution must be taken into account. For instance, when learning Prolog programs with negation, programs must be stratified; otherwise, a program may loop under certain conditions. By contrast, as ASP is a truly declarative language, no such considerations need be taken into account when learning ASP programs. Compared to Datalog and Prolog, ASP supports additional language constructs, such as disjunction in the head of a clause, choice rules, and hard and weak constraints. A key difference between ASP and Prolog is semantics. A definite logic program has only one model (the least Herbrand model). By contrast, an ASP program can have one, many, or even no stable models (answer sets). Due to its non-monotonicity, ASP is particularly useful for expressing common-sense reasoning [ ].

To illustrate the benefits of learning ASP programs, we reuse an example from Law et al [ ]. Given sufficient examples of Hamiltonian graphs, ILASP [ ] can learn a program to define them:

  0 {in(V0,V1)} 1 :- edge(V0,V1).
  reach(V0) :- in(1,V0).
  reach(V1) :- reach(V0), in(V0,V1).
  :- V1 != V2, in(V0,V1), in(V0,V2).
  :- not reach(V0), node(V0).

This program illustrates useful language features of ASP.
The first rule is a choice rule and the last two rules are hard constraints.

Approaches to learning ASP programs can mostly be divided into two categories: brave learners, which aim to learn a program such that at least one answer set covers the examples, and cautious learners, which aim to find a program which covers the examples in all answer sets. ILASP is notable because it supports both brave and cautious learning, which are both needed to learn some ASP programs [ ]. Moreover, ILASP differs from most Prolog-based ILP systems because it learns unstratified ASP programs, including programs with normal rules, choice rules, and both hard and weak constraints, which classical ILP systems cannot. Learning ASP programs allows ILP to be used for new problems, such as inducing answer set grammars [ ].

3.8 Higher-order programs

Imagine learning a droplasts program, which removes the last element of each sublist in a list, e.g. [alice,bob,carol] ↦ [alic,bo,caro]. Given suitable input data, Metagol can learn this first-order recursive program:

  f(A,B):- empty(A),empty(B).
  f(A,B):- head(A,C),tail(A,D),head(B,E),tail(B,F),f1(C,E),f(D,F).
  f1(A,B):- reverse(A,C),tail(C,D),reverse(D,B).

Although semantically correct, the program is verbose. To learn smaller programs, Metagol_ho [ ] extends Metagol to support learning higher-order programs, where predicate symbols can be used as terms. For instance, for the same droplasts problem, Metagol_ho learns the higher-order program:

  f(A,B):- map(A,B,f1).
  f1(A,B):- reverse(A,C),tail(C,D),reverse(D,B).

To learn this program, Metagol_ho invents the predicate symbol f1, which is used twice in the program: as a term in the map(A,B,f1) literal and as a predicate symbol in the f1(A,B) literal.
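To see what the two clauses of the higher-order hypothesis compute, here is a rough Python analogue (our own illustration, not Metagol output):

```python
def f1(a):
    """f1(A,B):- reverse(A,C),tail(C,D),reverse(D,B)."""
    c = a[::-1]     # reverse(A,C)
    d = c[1:]       # tail(C,D): drop the head of the reversed list
    return d[::-1]  # reverse(D,B): restore the original order

def droplasts(xss):
    """The higher-order clause f(A,B):- map(A,B,f1)."""
    return [f1(xs) for xs in xss]
```

For example, `droplasts(["alice", "bob", "carol"])` returns `["alic", "bo", "caro"]`; the explicit recursion over the outer list is hidden inside map (here, a list comprehension).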
Compared to the first-order program, this higher-order program is smaller because it uses map/3 (predefined in the BK) to abstract away the manipulation of the list and to avoid the need to learn an explicitly recursive program (recursion is implicit in map/3). Metagol_ho has been shown to reduce sample complexity and learning times and to improve predictive accuracies [ ].

3.9 Probabilistic logic programs

A major limitation of logical representations, such as Prolog and its derivatives, is the implicit assumption that the BK is perfect. That is, most ILP systems assume that atoms are true or false, leaving no room for uncertainty. This assumption is problematic if data is noisy, which is often the case.

Integrating probabilistic reasoning into logical representations is a principled way to handle such uncertainty in data. This integration is the focus of statistical relational artificial intelligence (StarAI) [29, 32]. In essence, StarAI hypothesis representations extend BK with probabilities or weights indicating the degree of confidence in the correctness of parts of the BK. Generally, StarAI techniques can be divided into two groups: distribution semantics approaches and maximum entropy approaches.

Distribution semantics approaches [ ], including ProbLog [ ] and PRISM [ ], explicitly annotate uncertainties in BK. To allow such annotation, they extend Prolog with two primitives for stochastic execution: probabilistic facts and annotated disjunctions. Probabilistic facts are the most basic stochastic primitive and they take the form of logical facts labelled with a probability p. Each probabilistic fact represents a Boolean random variable that is true with probability p and false with probability 1 − p. For instance, the following probabilistic fact states that there is a 1% chance of an earthquake in Naples:

  0.01::earthquake(naples).

An alternative interpretation of this statement is that 1% of executions of the probabilistic program would observe an earthquake. The second type of stochastic primitive is an annotated disjunction. Whereas probabilistic facts introduce non-deterministic behaviour on the level of facts, annotated disjunctions introduce non-determinism on the level of clauses. Annotated disjunctions allow for multiple literals in the head, where only one of the head literals can be true at a time. For instance, the following annotated disjunction states that a ball can be either green, red, or blue, but not a combination of colours:

  1/3::colour(B,green); 1/3::colour(B,red); 1/3::colour(B,blue) :- ball(B).

By contrast, maximum entropy approaches annotate uncertainties only at the level of a logical theory. That is, they assume that the predicates in the BK are labelled as either true or false, but the label may be incorrect. These approaches are not based on logic programming, but rather on first-order logic. Consequently, the underlying semantics are different: rather than considering proofs, these approaches consider models or groundings of a theory. This difference primarily changes what the uncertainties represent. For instance, Markov logic networks (MLNs) [ ] represent programs as sets of weighted clauses. The weights in an MLN do not correspond to probabilities of a formula being true but, intuitively, to a log odds between a possible world (an interpretation) where the clause is true and a world where the clause is false. For instance, a clause that is true in 80% of the worlds would have a weight of 1.386 (log 4).

The techniques for learning such probabilistic programs are typically direct extensions of ILP techniques. For instance, ProbFOIL [ ] extends FOIL [ ] with probabilistic clauses. Similarly, SLIPCOVER [ ] is a bottom-up approach, similar to Aleph [ ] and Progol [ ]. Huynh and Mooney [ ] use Aleph to find interesting clauses and then learn the corresponding weights. Kok and Domingos [ ] use relational pathfinding over the BK to identify useful clauses. That is, they interpret the BK as a hypergraph in which constants form vertices and atoms form hyperedges, and perform random walks. Frequently occurring walks, or their subparts, are then turned into clauses. Such random walks can be seen as an approximate way to construct bottom clauses.

There are often multiple (sometimes infinitely many) hypotheses that explain the data. Deciding which hypothesis to choose has long been a difficult problem. Older ILP systems were not guaranteed to induce optimal programs, where optimal typically means with respect to the size of the induced program or the coverage of examples. A key reason for this limitation was that most search techniques learned a single clause at a time, leading to the construction of sub-programs which were sub-optimal in terms of program size and coverage.
For instance, programs induced by Aleph offer no guarantee of optimality with respect to program size and coverage. Newer ILP systems try to address this limitation. As with the ability to learn recursive programs, the main development is to take a global view of the induction task by using meta-level search techniques. In other words, rather than induce a single clause at a time from a single example, the idea is to induce multiple clauses from multiple examples. For instance, ILASP uses ASP's optimisation abilities to provably learn the program with the fewest literals.

The ability to learn optimal programs opens up ILP to new problems. For instance, learning efficient logic programs has long been considered a difficult problem in ILP [
78, 81], mainly because there is no declarative difference between an efficient program, such as merge sort, and an inefficient program, such as bubble sort. To address this issue, Metaopt [ ] extends Metagol to support learning efficient programs. Metaopt maintains a cost during the hypothesis search and uses this cost to prune the hypothesis space. To learn minimal time complexity logic programs, Metaopt minimises the number of resolution steps. For instance, imagine trying to learn a find-duplicate program, which finds any duplicate element in a list, e.g. [p,r,o,g,r,a,m] ↦ r and [i,n,d,u,c,t,i,o,n] ↦ i. Given suitable input data, Metagol can induce the program:

f(A,B):- head(A,B),tail(A,C),element(C,B).
f(A,B):- tail(A,C),f(C,B).

This program goes through the elements of the list, checking whether the same element exists in the rest of the list. Given the same input, Metaopt induces the program:

f(A,B):- mergesort(A,C),f1(C,B).
f1(A,B):- head(A,B),tail(A,C),head(C,B).
f1(A,B):- tail(A,C),f1(C,B).
This program first sorts the input list and then goes through it checking for duplicate adjacent elements. Although larger, both in terms of clauses and literals, the program learned by Metaopt is more efficient, O(n log n), than the program learned by Metagol, O(n²). Metaopt has been shown to learn efficient robot strategies, efficient time complexity logic programs, and even efficient string transformation programs. FastLAS [ ] is an ASP-based ILP system that takes as input a custom scoring function and computes an optimal solution with respect to the given scoring function. The authors show that this approach allows a user to optimise domain-specific performance metrics on real-world datasets, such as access control policies.

Older ILP systems mostly use Prolog for reasoning. Recent work considers using different technologies [ ]. To leverage these advances, much recent work in ILP uses related techniques, notably ASP [
14, 83, 62, 55, 56, 100, 54, 40, 19]. The main motivations for using ASP are to leverage (i) the language benefits of ASP (Section 3.7), and (ii) the efficiency and optimisation techniques of modern ASP solvers, such as CLASP [ ], which supports conflict propagation and learning. With similar motivations, other approaches encode the ILP problem as SAT [ ] or SMT [ ] problems. These approaches have been shown to reduce learning times compared to standard Prolog-based approaches. However, some unresolved issues remain. A key issue is that most approaches encode an ILP problem as a single (often very large) satisfiability problem. These approaches therefore often struggle to scale to very large problems [ ], although preliminary work attempts to tackle this issue [ ].

5.2 Neural networks

With the rise of deep learning, several approaches have explored using gradient-based methods to learn logic programs. These approaches all replace discrete logical reasoning with a relaxed version that yields continuous values reflecting the confidence of the conclusion. The various neural approaches can be characterised along four orthogonal dimensions. The first dimension is whether the neural network implements forward or backward inference. While some [ ] use backward (goal-directed) chaining with a neural implementation of unification, most approaches [
39, 108, 33] use forward chaining. The second dimension is whether the network is designed for big data problems [ ] or for data-efficient learning from a handful of data items [ ]. Few neural systems to date are capable of handling both big data and small data, with the notable exception of [ ]. The third dimension is whether the neural system jointly learns embeddings (mapping symbolic constants to continuous vectors) along with the logical rules [ ]. The advantage of jointly learning embeddings is that it enables fuzzy unification between constants that are similar but not identical. The challenge for these approaches is how to generalise appropriately to constants that have not been seen at training time. The fourth dimension is whether or not the neural system is designed to allow explicit human-readable logical rules to be extracted from the weights of the network. While most neural ILP systems [ ] do produce explicit logic programs, some [ ] do not. It is perhaps moot whether implicit systems that do not produce explicit programs count as ILP systems at all, but note that even in the implicit neural systems, the weight sharing of the neural network is designed to achieve strong generalisation by performing the same computation on all tuples of objects. Currently, most neural approaches to ILP require the use of metarules or templates to make the search space tractable. This severely limits their applicability, as the user cannot always be expected to provide suitable metarules for a new problem. The only approach that avoids the use of metarules or templates is Neural Logic Machines [ ].

We now survey recent application areas for ILP.

Scientific discovery.
Perhaps the most prominent application of ILP is in scientific discovery. ILP has, for instance, been used to identify and predict ligands (substructures responsible for medicinal activity) [ ] and infer missing pathways in protein signalling networks [ ]. There has been much recent work on applying ILP in ecology [
10, 105, 11]. For instance, Bohan et al [ ] use ILP to generate plausible and testable hypotheses for trophic relations ('who eats whom') from ecological data.

Program analysis.
Due to the expressivity of logic programs as a representation language, ILP systems have found successful applications in software design. ILP systems have proven effective in learning SQL queries [
3, 101], programming language semantics [ ], and code search [ ].

Robotics.
Robotics applications often require incorporating domain knowledge or imposing certain requirements on the learnt programs. For instance, The Robot Engineer [ ] uses ILP to design tools for robots, and even complete robots, which are tested in simulations and real-world environments. Metagol_O [ ] learns robot strategies that take account of resource efficiency, and Antanas et al [ ] recognise graspable points on objects through relational representations of objects.

Games.
Inducing game rules has a long history in ILP, where chess has often been the focus [ ]. Legras et al [ ] show that Aleph and TILDE can outperform an SVM learner in the game of Bridge. Law et al [ ] use ILASP to induce the rules of Sudoku and show that this more expressive formalism allows game rules to be expressed more compactly. Cropper et al [ ] introduce the ILP problem of inductive general game playing: the problem of inducing game rules from observations of games such as Checkers, Sokoban, and
Connect Four.

Data curation and transformation.
Another successful application of ILP is in data curation and transformation, which is again largely because ILP can learn executable programs. The most prominent examples of such tasks are string transformations, such as the example given in the introduction. There is much interest in this topic, largely due to success in synthesising programs for end-user problems, such as string transformations in Microsoft Excel [ ]. String transformations have become a standard benchmark for recent ILP papers [
69, 27, 15, 18]. Other transformation tasks include extracting values from semi-structured data (e.g. XML files or medical records), extracting relations from ecological papers, and spreadsheet manipulation [ ].

Learning from trajectories.
Learning from interpretation transitions (LFIT) [ ] automatically constructs a model of the dynamics of a system from observations of its state transitions. Given time-series data of discrete gene expression, it can learn gene interactions, making it possible to explain and predict state changes over time [ ]. LFIT has been applied to learn biological models, like Boolean networks, under several semantics: memory-less deterministic systems [
51, 92], and their multi-valued extensions [
93, 71]. Martínez et al [ ] combine LFIT with a reinforcement learning algorithm to learn probabilistic models with exogenous effects (effects not related to any action) from scratch. The learner was notably integrated in a robot to perform the task of clearing the tableware on a table. In this task, external agents interacted: people brought new tableware continuously, and the manipulator robot had to cooperate with mobile robots to take the tableware to the kitchen. The learner was able to learn a usable model in just five episodes of 30 action executions. Evans et al [ ] apply the Apperception Engine to explain sequential data, such as cellular automata traces, rhythms and simple nursery tunes, image occlusion tasks, game dynamics, and sequence induction intelligence tests. Surprisingly, they show that their system can achieve human-level performance on the sequence induction intelligence tests in the zero-shot setting (without having been trained on lots of other examples of such tests, and without hand-engineered knowledge of the particular setting). At a high level, these systems take the unique selling point of ILP systems (the ability to strongly generalise from a handful of data) and apply it to the self-supervised setting, producing an explicit human-readable theory that explains the observed state transitions.

In a survey paper from a decade ago, Muggleton et al [ ] proposed directions for future research. In the decade since, there have been major advances on many of the topics, notably in predicate invention (Section 3), using higher-order logic as a representation language (Section 3.2) and to represent hypotheses (Section 3.8), and applications in learning actions and strategies (Section 5.2). Despite the advances, there are still many limitations in ILP that future work should address.

5.3 Limitations and future research

Better systems.
Muggleton et al [ ] argue that a problem with ILP is the lack of well-engineered tools. They state that whilst over 100 ILP systems have been built, fewer than a handful can be meaningfully used by ILP researchers. In the decade since the authors highlighted this problem, little progress has been made: most ILP systems are not easy to use. In other words, ILP systems are still notoriously difficult to use, and you often need a PhD in ILP to use any of the tools. Even then, it is often only the developers of a system who know how to properly use it. By contrast, driven by industry, other forms of ML now have reliable and well-maintained implementations, such as PyTorch and TensorFlow, which have helped drive research. A frustrating issue with ILP systems is that they use many different language biases, or even different syntax for the same biases. For instance, the way of specifying a learning task in Progol, Aleph, TILDE, and ILASP varies considerably, despite them all using mode declarations. If it is difficult for ILP researchers to use ILP tools, then what hope do non-ILP researchers have? For ILP to be more widely adopted both inside and outside of academia, we must develop more standardised, user-friendly, and better-engineered tools.

Language biases.
As Cropper et al [ ] state, one major issue with ILP is choosing an appropriate language bias. For instance, Metagol uses metarules (Section 3.2) to restrict the syntax of hypotheses and thus the hypothesis space. If a user can provide suitable metarules, then Metagol is extremely efficient. However, if a user cannot provide suitable metarules (which is often the case), then Metagol is almost useless. The same brittleness applies to ILP systems that employ mode declarations [ ]. In theory, a user can provide very general mode declarations, such as only using a single type and allowing unlimited recall. In practice, however, weak mode declarations often lead to very poor performance. For good performance, users of mode-based systems often need to manually analyse a given learning task to tweak the mode declarations, often through a process of trial and error. Moreover, if a user makes a small mistake with a mode declaration, such as giving the wrong argument type, then the ILP system is unlikely to find a good solution. Even for ILP experts, determining a suitable language bias is often a frustrating and time-consuming process. We think the need for an almost perfect language bias is severely holding back ILP from being widely adopted. We think that an important direction for future work in ILP is to develop techniques for automatically identifying suitable language biases. Although there is some work on mode learning [
72, 41, 87] and on identifying suitable metarules [ ], this area is largely under-explored.

Better datasets.
Interesting problems, alongside usable systems, drive research and attract interest in a research field. This relationship is most evident in the deep learning community, which has, over a decade, grown into the largest AI community. This growth has been supported by the constant introduction of new problems, datasets, and well-engineered tools. Challenging problems that push the state of the art to its limits are essential to sustain progress in the field; otherwise, the field risks stagnation through only small incremental progress. ILP has, unfortunately, failed to deliver on this front: most research is still evaluated on 20-year-old datasets. Most new datasets come from toy domains and are designed to test specific properties of the introduced technique. To an outsider, this sends the message that ILP is not applicable to real-world problems. We think that the ILP community should learn from the experiences of other AI communities and put significant effort into developing datasets that identify limitations of existing methods as well as showcase potential applications of ILP.
Relevance.
New methods for predicate invention (Section 3) have improved the abilities of ILP systems to learn large programs. Moreover, these techniques raise the potential for ILP to be used in lifelong learning settings. However, inventing and acquiring new BK could lead to a problem of too much BK, which can overwhelm an ILP system [ ]. On this issue, a key under-explored topic is that of relevancy. Given a new induction problem with large amounts of BK, how does an ILP system decide which BK is relevant? One emerging technique is to train a neural network to score how relevant programs in the BK are, and then use only the BK with the highest scores to learn programs [
6, 38]. However, the empirical efficacy of this approach has yet to be demonstrated. Moreover, these approaches have only been demonstrated on small amounts of BK, and it is unclear how they scale to BK with thousands of relations. Without efficient methods of relevance identification, it is unclear how efficient lifelong learning can be achieved.

Handling mislabelled and ambiguous data.
A major open question in ILP is how best to handle noisy and ambiguous data. Neural ILP systems [
96, 39] are designed from the start to robustly handle mislabelled data. Although there has been work in recent years on designing ILP systems that can handle noisy, mislabelled data, there is much less work on the even harder and more fundamental problem of designing ILP systems that can handle raw, ambiguous data. ILP systems typically assume that the input has already been preprocessed into symbolic declarative form (typically, a set of ground atoms representing positive and negative examples). But real-world input does not arrive in symbolic form. Consider, e.g., a robot with a video camera, where the raw input is a sequence of pixel images. Converting each pixel image into a set of ground atoms is a challenging, non-trivial achievement that should not be taken for granted. For ILP systems to be widely applicable in the real world, they need to be redesigned so they can handle raw, ambiguous input from the outset [
39, 34].

Probabilistic ILP.
Real-world data is often noisy and uncertain. Extending ILP to deal with such uncertainty would substantially broaden its applicability. While StarAI is receiving growing attention, learning probabilistic programs from data is still largely under-investigated due to the complexity of joint probabilistic and logical inference. When working with probabilistic programs, we are interested in the probability that a program covers an example, not only in whether the program covers the example. Consequently, probabilistic programs need to compute all possible derivations of an example, not just a single one. Despite this added complexity, probabilistic ILP opens many new challenges. Most of the existing work on probabilistic ILP considers the minimal extension of ILP to the probabilistic setting, by assuming that either (i) BK facts are uncertain, or (ii) learned clauses need to model uncertainty. These assumptions make it possible to separate structure from uncertainty and simply reuse existing ILP techniques. Following this minimal extension, the existing work focuses on discriminative learning, in which the goal is to learn a program for a single target relation. However, a grand challenge in probabilistic programming is generative learning: that is, learning a program describing the generative process behind the data, not a single target relation. Learning generative programs is a significantly more challenging problem, which has received very little attention in probabilistic ILP.
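The point that probabilistic coverage requires all derivations of an example, not just one, can be illustrated with a toy sketch (a hypothetical two-clause program, not any specific system's API). Suppose q :- a. and q :- b. with independent probabilistic facts a and b; following a single derivation understates the probability that q is covered:

```python
# Independent probabilistic facts a and b, each giving a proof of q.
p_a, p_b = 0.4, 0.5  # illustrative probabilities (assumed, not from the text)

# Probability of q if we follow only the derivation through a:
p_one_derivation = p_a

# Probability that q is covered: at least one proof succeeds,
# computed by inclusion-exclusion over both derivations, P(a or b).
p_covered = p_a + p_b - p_a * p_b

print(round(p_one_derivation, 3), round(p_covered, 3))  # 0.4 0.7
```

With many clauses and shared facts, this inclusion-exclusion computation is what makes joint probabilistic and logical inference expensive.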
Explainability.
Explainability is one of the claimed advantages of a symbolic representation. Recent work [
85, 2] evaluates the comprehensibility of ILP hypotheses using Michie's [ ] framework of ultra-strong machine learning, where a learned hypothesis is expected not only to be accurate but also to demonstrably improve the performance of a human being provided with the learned hypothesis. [ ] empirically demonstrate improved human understanding directly through learned hypotheses. However, more work is required to better understand the conditions under which this can be achieved, especially given the rise of predicate invention.

5.4 Summary

As ILP approaches 30, we think that the recent advances surveyed in this paper have opened up new areas of research for ILP to explore. Moreover, we hope that the next decade sees developments on the numerous limitations we have discussed, so that ILP can have a significant impact on AI.

References
1. Ahlgren J, Yuen SY (2013) Efficient program synthesis using constraint satisfaction in inductive logic programming. J Machine Learning Res 14(1):3649–3682
2. Ai L, Muggleton S, Hocquette C, Gromowski M, Schmid U (2020) Beneficial and harmful explanatory machine learning. Machine Learning, in press, available at http://arxiv.org/abs/
4. Antanas L, Moreno P, De Raedt L (2015) Relational kernel-based grasping with numerical features. In: Inductive Logic Programming - 25th International Conference, ILP 2015, Springer, Lecture Notes in Computer Science, vol 9575, pp 1–14
5. Bain M, Srinivasan A (2018) Identification of biological transition systems using meta-interpreted logic programs. Machine Learning 107(7):1171–1206
6. Balog M, Gaunt AL, Brockschmidt M, Nowozin S, Tarlow D (2017) Deepcoder: Learning to write programs. In: 5th International Conference on Learning Representations, ICLR 2017, OpenReview.net
7. Bartha S, Cheney J (2019) Towards meta-interpretive learning of programming language semantics. In: Inductive Logic Programming - 29th International Conference, ILP 2019, Springer, Lecture Notes in Computer Science, vol 11770, pp 16–25
8. Bellodi E, Riguzzi F (2015) Structure learning of probabilistic logic programs by searching the clause space. Theory Pract Log Program 15(2):169–212
9. Blockeel H, De Raedt L (1998) Top-down induction of first-order logical decision trees. Artif Intell 101(1-2):285–297
10. Bohan DA, Caron-Lormier G, Muggleton S, Raybould A, Tamaddoni-Nezhad A (2011) Automated discovery of food webs from ecological data using logic-based machine learning. PLoS One 6(12):e29,028
11. Bohan DA, Vacher C, Tamaddoni-Nezhad A, Raybould A, Dumbrell AJ, Woodward G (2017) Next-generation global biomonitoring: large-scale, automated reconstruction of ecological networks. Trends in Ecology & Evolution 32(7):477–487
12. Bratko I (1999) Refining complete hypotheses in ILP. In: Inductive Logic Programming, 9th International Workshop, ILP-99, Springer, Lecture Notes in Computer Science, vol 1634, pp 44–55
13. Chollet F (2019) On the measure of intelligence. CoRR, URL https://arxiv.org/abs/2008.07912
18. Cropper A, Dumančić S (2020) Learning large logic programs by going beyond entailment. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, ijcai.org, pp 2073–2079
19. Cropper A, Morel R (2021) Learning programs by learning from failures. Machine Learning
20. Cropper A, Muggleton SH (2015) Learning efficient logical robot strategies involving composable objects. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, AAAI Press, pp 3423–3429
21. Cropper A, Muggleton SH (2016) Metagol system. URL https://github.com/metagol/metagol
22. Cropper A, Muggleton SH (2019) Learning efficient logic programs. Machine Learning 108(7):1063–1083
23. Cropper A, Tourret S (2020) Logical reduction of metarules. Machine Learning 109(7):1323–1369
24. Cropper A, Tamaddoni-Nezhad A, Muggleton SH (2015) Meta-interpretive learning of data transformation programs. In: Inductive Logic Programming - 25th International Conference, ILP 2015, Springer, Lecture Notes in Computer Science, vol 9575, pp 46–59
25. Cropper A, Dumančić S, Muggleton SH (2020) Turning 30: New ideas in inductive logic programming. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, ijcai.org, pp 4833–4839
26. Cropper A, Evans R, Law M (2020) Inductive general game playing. Machine Learning 109(7):1393–1434
27. Cropper A, Morel R, Muggleton S (2020) Learning higher-order logic programs. Machine Learning 109(7):1289–1322
28. De Raedt L (2008) Logical and relational learning. Cognitive Technologies, Springer
29. De Raedt L, Kersting K (2008) Probabilistic Inductive Logic Programming, Springer-Verlag, Berlin, Heidelberg, pp 1–27
30. De Raedt L, Kimmig A, Toivonen H (2007) Problog: A probabilistic prolog and its application in link discovery. In: IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007, pp 2462–2467
31. De Raedt L, Dries A, Thon I, den Broeck GV, Verbeke M (2015) Inducing probabilistic relational rules from probabilistic examples. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, AAAI Press, pp 1835–1843
32. De Raedt L, Kersting K, Natarajan S, Poole D (2016) Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers
33. Dong H, Mao J, Lin T, Wang C, Li L, Zhou D (2019) Neural logic machines. In: ICLR
34. Dong H, Mao J, Lin T, Wang C, Li L, Zhou D (2019) Neural logic machines. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net, URL https://openreview.net/forum?id=B1xY-hRctX
35. Dumancic S, Guns T, Cropper A (2020) Knowledge refactoring for inductive program synthesis. AAAI
36. Dumančić S, Blockeel H (2017) Clustering-based relational unsupervised representation learning with an explicit distributed representation. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, ijcai.org, pp 1631–1637
37. Dumančić S, Guns T, Meert W, Blockeel H (2019) Learning relational representations with auto-encoding logic programs. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, ijcai.org, pp 6081–6087
38. Ellis K, Morales L, Sablé-Meyer M, Solar-Lezama A, Tenenbaum J (2018) Learning libraries of subroutines for neurally-guided bayesian program induction. In: NeurIPS 2018, pp 7816–7826
39. Evans R, Grefenstette E (2018) Learning explanatory rules from noisy data. J Artif Intell Res 61:1–64
40. Evans R, Hernández-Orallo J, Welbl J, Kohli P, Sergot M (2021) Making sense of sensory input. Artificial Intelligence p 103438
41. Ferilli S, Esposito F, Basile TMA, Mauro ND (2004) Automatic induction of first-order logic descriptors type domains from observations. In: Inductive Logic Programming, 14th International Conference, ILP 2004, Springer, Lecture Notes in Computer Science, vol 3194, pp 116–131
42. Gebser M, Kaminski R, Kaufmann B, Schaub T (2012) Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers
43. Gebser M, Kaufmann B, Schaub T (2012) Conflict-driven answer set solving: From theory to practice. Artif Intell 187:52–89
44. Genesereth MR, Björnsson Y (2013) The international general game playing competition. AI Magazine 34(2):107–111
45. Gulwani S (2011) Automating string processing in spreadsheets using input-output examples. In: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, ACM, pp 317–330
46. Heule MJH, Kullmann O, Marek VW (2016) Solving and verifying the boolean pythagorean triples problem via cube-and-conquer. In: Creignou N, Berre DL (eds) Theory and Applications of Satisfiability Testing - SAT 2016 - 19th International Conference, Bordeaux, France, July 5-8, 2016, Proceedings, Springer, Lecture Notes in Computer Science, vol 9710, pp 228–245, DOI 10.1007/978-3-319-40970-2_15, URL https://doi.org/10.1007/978-3-319-40970-2_15
47. Hocquette C, Muggleton SH (2020) Complete bottom-up predicate invention in meta-interpretive learning. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, ijcai.org, pp 2312–2318
48. Huynh TN, Mooney RJ (2008) Discriminative structure and parameter learning for markov logic networks. In: Proceedings of the 25th International Conference on Machine Learning, Association for Computing Machinery, New York, NY, USA, pp 416–423, DOI 10.1145/
75. Muggleton S (1991) Inductive logic programming. New Generation Computing 8(4):295–318
76. Muggleton S (1995) Inverse entailment and progol. New Generation Comput 13(3&4):245–286
77. Muggleton S, Buntine WL (1988) Machine invention of first order predicates by inverting resolution. In: Machine Learning, Proceedings of the Fifth International Conference on Machine Learning, Morgan Kaufmann, pp 339–352
78. Muggleton S, De Raedt L (1994) Inductive logic programming: Theory and methods. J Log Program 19/20:629–679
94. Ribeiro T, Folschette M, Magnin M, Inoue K (2020) Learning any semantics for dynamical systems represented by logic programs, working paper or preprint
95. Richardson M, Domingos PM (2006) Markov logic networks. Machine Learning 62(1-2):107–136, DOI 10.1007/s10994-006-5833-1, URL https://doi.org/10.1007/s10994-006-5833-1
96. Rocktäschel T, Riedel S (2017) End-to-end differentiable proving. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, pp 3788–3800
97. Sammut C, Sheh R, Haber A, Wicaksono H (2015) The robot engineer. In: Late Breaking Papers of the 25th International Conference on Inductive Logic Programming, CEUR-WS.org, CEUR Workshop Proceedings, vol 1636, pp 101–106
98. Sato T (1995) A statistical learning method for logic programs with distribution semantics. In: Sterling L (ed) Logic Programming, Proceedings of the Twelfth International Conference on Logic Programming, Tokyo, Japan, June 13-16, 1995, MIT Press, pp 715–729
99. Sato T, Kameya Y (2001) Parameter learning of logic programs for symbolic-statistical modeling. J Artif Intell Res 15:391–454, DOI 10.1613/jair.912, URL https://doi.org/10.1613/jair.912