Modular Design Patterns for Hybrid Learning and Reasoning Systems: a taxonomy, patterns and use cases
Michael van Bekkum [1], Maaike de Boer [1], Frank van Harmelen [2], André Meyer-Vitali [1], Annette ten Teije [2]

[1] TNO, The Netherlands   [2] Vrije Universiteit Amsterdam, The Netherlands

February 25, 2021

Abstract
The unification of statistical (data-driven) and symbolic (knowledge-driven) methods is widely recognised as one of the key challenges of modern AI. Recent years have seen a large number of publications on such hybrid neuro-symbolic AI systems. That rapidly growing literature is highly diverse and mostly empirical, and lacks a unifying view of the large variety of these hybrid systems. In this paper we analyse a large body of recent literature and we propose a set of modular design patterns for such hybrid, neuro-symbolic systems. We are able to describe the architecture of a very large number of hybrid systems by composing only a small set of elementary patterns as building blocks. The main contributions of this paper are: 1) a taxonomically organised vocabulary to describe both processes and data structures used in hybrid systems; 2) a set of 15+ design patterns for hybrid AI systems, organised in a set of elementary patterns and a set of compositional patterns; 3) an application of these design patterns in two realistic use-cases for hybrid AI systems. Our patterns reveal similarities between systems that were not recognised until now. Finally, our design patterns extend and refine Kautz' earlier attempt at categorising neuro-symbolic architectures.

Keywords: neuro-symbolic systems · design patterns

1 Introduction

It is widely acknowledged in recent AI literature that the data-driven and knowledge-driven approaches to AI have complementary strengths and weaknesses [17]. This has led to an explosion of publications that propose different architectures to combine both symbolic and statistical techniques. Surveys exist on narrow families of such systems [66, 67, 3], but to date no conceptual framework is available in which such hybrid symbolic-statistical systems can be discussed, compared, configured and combined. In this paper, we propose a set of modular design patterns for Hybrid AI systems that combine learning and reasoning (using combinations of data-driven and knowledge-driven AI components). With this set of design patterns, we aim to achieve the following goals. First, we provide high-level descriptions of the architectures of hybrid AI systems. Such abstract descriptions should enable us to better understand the commonalities and differences between different systems, while abstracting from their specific technical details. We will show for a number of our design patterns that they describe systems which have not been recognised in the literature as essentially doing the same task. Secondly, our set of design patterns is intended to bridge the gap between the different communities that are currently studying hybrid approaches to AI systems. AI communities such as machine learning and knowledge-based systems, as well as other communities (such as cognitive science), often use very different terminologies, which hampers the communication about the systems under study. Finally, and perhaps most importantly, our design patterns are modular, and are intended as a tool for engineering hybrid AI systems out of reusable components. In this respect, our design patterns for hybrid AI systems have the same goals as the design patterns which are well known from Software Engineering [28].

Darwiche [18] draws attention to the distinction between two types of components in AI systems, for which he uses the terms function-based and model-based.
Similarly, Pearl [52] uses the term "model-free" for the representations typically used in many learning systems. Other names used for inferences at this layer are "model-blind", "black-box", or "data-centric" [52], to emphasise that the main task performed by machine learning systems is function fitting: fitting data by a complex function defined by a neural network architecture. Such "function-based" or "model-free" representations are in contrast to the "model-based" representations typically used in reasoning systems. We will use a similar distinction in our design patterns. Both Darwiche and Pearl argue for combining components of these two types: "the question is not whether it is functions or models but how to profoundly integrate and fuse functions with models" [18] and "Our general conclusion is that human-level AI cannot emerge solely from model-blind learning machines; it requires the symbiotic collaboration of data and models" [52]. However, neither of the cited works discusses how such combinations must be made. This is exactly what we set out to do in this paper by proposing a set of modular design patterns for such combinations.

Lamb & Garcez [19] ask for "The trained network and the logic [to] become communicating modules of a hybrid system [..]. This distinction between having neural and symbolic modules that communicate in various ways and having translations from one representation to the other in a more integrative approach to reasoning and learning should be at the centre of the debate in the next decade."
Our proposal is a concrete step in precisely the direction that they call for, providing a set of composition patterns that can be used as modular building blocks in hybrid systems.

The main contributions of this paper are as follows: (i) a taxonomically organised vocabulary to describe both the processes that constitute hybrid AI systems as well as the data structures that such processes produce and consume; (ii) a set of modular design patterns for hybrid AI systems, organised in a set of elementary patterns, plus more sophisticated patterns that can be constructed by composing such elementary patterns; we will show how a number of systems from the recent literature can be described as such compositional patterns; these patterns are a further elaboration of our first proposal for such design patterns in [65]; (iii) two realistic use-cases for hybrid AI systems (one for skill matching, one for robot action selection), showing how the architecture for each of these use-cases can be described in terms of our compositional design patterns.

The paper is structured as follows: first we describe our taxonomical vocabulary in Section 2; in Section 3 we describe a set of elementary patterns, followed by a set of compositional patterns in Section 4. Section 5 describes two realistic use-cases in terms of the patterns from Sections 3 and 4. Section 6 discusses related work, and Section 7 concludes and discusses directions for future work.
2 Taxonomy

In order to describe design patterns, a terminology is required that defines a taxonomy of both processes and their inputs and outputs on various levels of abstraction. On the highest level of abstraction, we define instances, models, processes and actors. In the pattern diagrams, instances are represented as rectangular boxes, models as hexagonal boxes, processes as ovals and actors as triangles. More specific concepts will be used, when necessary and useful, using a colon-separated notation. For example, model:stat:NN refers to a neural network model. The level of abstraction depends on the use of the pattern and the stage of design and implementation: the closer to implementation, the more specific the concepts will be. The design patterns should abstract from implementation details, but be specific enough to document applicable design choices. As abbreviated notation, we will not name the highest abstraction level, because that is implied by the type of box. In the models, we always indicate whether a specific model is statistical (stat) or semantic (sem). The full taxonomy with definitions can be found in the Appendix.
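To make the colon-separated notation concrete, the following is a minimal sketch (our own illustration, not part of the paper's formalism) of how such taxonomy paths could be represented and checked programmatically; the concept names simply mirror the taxonomy above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """A taxonomy concept written as a colon-separated path, e.g. 'model:stat:NN'."""
    path: str

    def levels(self):
        return self.path.split(":")

    def is_a(self, other):
        """True if this concept specialises `other` (shares its prefix of levels)."""
        return self.levels()[: len(other.levels())] == other.levels()

nn = Concept("model:stat:NN")          # a neural network model
print(nn.is_a(Concept("model:stat")))  # True: a NN is a statistical model
print(nn.is_a(Concept("model:sem")))   # False: it is not a semantic model
```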
2.1 Instances

Instances are the basic building blocks of "things", examples or single occurrences of something. The two main classes of instances are data and symbols. The precise distinction between symbols and "non-symbols" remains a contentious issue in modern philosophy. We follow [Berkeley, 2008] by imposing the following requirements before any token can be deemed a symbol: (1) a symbol must designate an object, a class or a relation in the world, and such a designated object, class or relation is then called the interpretation of the symbol; (2) symbols can be either atomic or complex, in which case they are composed of other symbols according to a formal set of compositional rules; and (3) there must be a system of operations that, when applied to a symbol, generates new symbols, which again
must have a designation. Thus, the tokens p1 and p2 may designate particular persons, with the symbol r designating some relation between these persons; then r(p1,p2) is a complex symbol made out of these atomic symbols, and the operation r(p1,p2) ⊨_T r(p2,p1) defines an operation constructing one complex symbol out of another. In logic, such "operations" correspond to logical inference ⊨, and this logical view is most relevant in this paper, but in another context such operations may be transitions between the symbols that denote the states of a finite state machine. All this makes the symbol p1 different from a data item, say a picture of a person, where the collection of pixels may be an accurate image of a person, but does not designate the person in a way that allows the construction of more complex designations and operations that transform these into other designations. Simply put: a symbol p designates a person, whereas a picture is just itself. Such tokens which are not symbols are what we will call "data" in this paper.

The types of data that appear in Hybrid AI systems include:
• numbers: numerical measurements;
• texts: sequences of words, sentences;
• tensors: multi-dimensional spaces, including bitmaps;
• streams: (real-time) sequences of data, including video and other sensor inputs.

The types of symbols, on the other hand, include:
• labels: short descriptions;
• relations: connections between data items, such as triples and other n-ary relations;
• traces: (historical) records of data and events, such as proof traces for explanations.

2.2 Models

Models are descriptions of entities and their relationships. They are useful for inferring data and knowledge. Models in Hybrid AI systems can be either (1) statistical or (2) semantic models. Statistical models represent dependencies between statistical variables. Examples are (Deep) Neural Networks, Bayesian Networks and Markov Models. Semantic models represent the implicit meaning of symbols by specifying their concepts, attributes and relationships. Examples are Taxonomies, Ontologies, Knowledge Graphs, Rulebases and Differential Equations. We summarise these semantic models under the umbrella term "knowledge base" (KB).
2.3 Processes

In order to perform operations on instances and models, processes define the steps that lead from inputs to results. Three main types of processes are: (i) the generation of instances and models, (ii) their transformation, and (iii) inferencing thereupon. Generation of models is performed either via training or "manually" by knowledge engineering with experts. Many forms of transformations exist, such as transforming a knowledge graph to a vector space. Inferences are made using induction or deduction. Induction is constructing a generalisation out of specific instances. Such a generalisation (a "model") can take many different forms, ranging from the trained weights in a neural network to clauses in a learned Logic Program, but in every case, such models are created by induction on instances.

Using such models, we can apply deductive inferencing in order to reach conclusions about specific instances of data. Commonly, deduction is associated with logical inference, but the standard definition, namely as inference in which the conclusion is of no greater generality than the premises, equally applies to the forward pass of a neural network, where a general model (the trained network) is applied to an instance in order to arrive at a conclusion. Both inductive and deductive processes can be either symbolic or statistical, but in every case induction reasons from the specific (the instances) to the general (a model), and deduction does the converse. Thus, the distinction between induction and deduction corresponds precisely to the distinction between learning and reasoning. Both classification ("the systematic arrangement in groups or categories according to established criteria", see our appendix) and prediction ("to calculate some future event or condition as a result of analysis of available data", again see our appendix) are deductive processes, since both use a generic model (obtained through an inductive process of learning or engineering) to derive information about specific instances (the objects to be classified or the events to be predicted).
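As an illustration of this reading (our own example, not from the paper), the familiar fit/predict split of a machine-learning library maps directly onto the generate:train (induction) and infer:deduce (deduction) processes of the taxonomy; scikit-learn is used here purely as an example.

```python
# A minimal sketch: induction (generate:train) builds a model from instances,
# deduction (infer:deduce) applies that general model to a specific instance.
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]   # instance:data
y_train = [0, 1, 1, 0]                        # instance:symbol (labels)

model = DecisionTreeClassifier().fit(X_train, y_train)  # generate:train -> model:stat
prediction = model.predict([[1, 0]])                     # infer:deduce -> symbol:label
print(prediction)  # [1]
```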
2.4 Actors

Processes in an AI system are initiated by autonomous actors, based on their intentions and goals. They interact with each other using many protocols and behaviours, such as collaboration, negotiation or competition. These interactions
lead to collective intelligence and emergent social behaviour, such as in swarms, multi-agent systems or human-agent teams. Autonomy is a gradual property, ranging from remotely controlled to selfish behaviour, with all forms of cooperation in between. Actors can be humans, (software) agents or robots (physically embodied agents). Examples are proactive software components, apps, services, mobile agents, drones or (parts of) autonomous vehicles. Actors are not yet used explicitly in the current collection of patterns, but will be included in the future, when we also dive into distributed AI and human-agent interaction.
3 Elementary patterns

The taxonomy of instances, models and processes from the previous section gives rise to a number of elementary building blocks for hybrid systems that combine learning and reasoning. The "train" process consumes either data or symbols to produce a model (all these terms taken from the taxonomy):

(1a): data → generate:train → model
(1b): symbol → generate:train → model
Additionally, an actor (e.g., a domain expert or knowledge engineer) can create a model, such as an ontology or rule base:

(1c): actor → generate:engineer → model
In the processing, often a transformation step is needed to create the right type of data, either from symbols or from data:

(1d): symbol/data → transform → data
Of course, models are only trained in order to be subsequently used in a downstream task (predicting labels, deriving conclusions, predicting links, etc.). This process is captured in patterns (2a)-(2c), depending on the symbolic or statistical nature of the data. Following our taxonomy, an infer step uses such models in combination with either data or symbols to infer conclusions:

(2a): data + model → infer:deduce → symbol
(2b): symbol + model:semantic → infer:deduce → symbol
(2c): symbol/data + model → infer:deduce → model
Finally, sometimes an operation on a semantic model is neither a logical induction nor a deduction, but a transformation into another data structure. This is captured by the final elementary pattern (2d):

(2d): symbol + model:semantic → transform:embed → data
As is encoded in these diagrams, the types of models involved (symbolic or statistical) and the types of results derived are constrained by the types of the inputs of these elementary processes. These elementary patterns allow us to give a more precise definition of the concept of "hybrid systems", which is often used rather nebulously in the literature:
Definition:
Machine Learning systems are systems that combine pattern (1a) with (2a), yielding pattern (3a), see below;
Knowledge Representation systems are systems that follow pattern (2b);
Hybrid systems are systems that form any other combination of the elementary patterns (1a)-(2d).

Already these elementary patterns (1a-1d, 2a-2d), even in their almost trivial simplicity, can be used to group together a large number of very different approaches from the literature: even though the algorithms and representations of Inductive Logic Programming (ILP) [38], Markov Logic Networks [55] and Probabilistic Soft Logic [32, 5] are completely different, the architectural pattern (1b) applies to all of them, showing that they are all aimed at the same goal: learning over symbolic structures. Similarly, learning a symbolic rule-set that captures rules for knowledge graph completion [48] is captured by this pattern. Constructing knowledge graph embeddings into a high-dimensional vector space [51, 67, 49] is also captured by pattern (1b). So ILP and KG embedding would each be captured more specifically by adding type annotations to the constructed model: model:sem for ILP, and model:stat for KG embedding. Many "classical" learning algorithms such as decision tree learning and rule mining, as well as deep learning systems, are covered by architectural pattern (1a).

The learning patterns (1a)-(1b) must be combined with the prediction patterns (2a)-(2c) to give a model for the full learning and prediction task (3a):

(3a) train & deduce: data → generate:train → model; data + model → infer:deduce → symbol
This model is precisely the composition of the elementary processes for train and infer given above. Even learning with a regular neural network is captured by this diagram (although it is not typically recognised that the feed-forward phase, when a trained neural network is applied to new data, is actually a deductive task, namely reasoning from given premises (input data plus the learned network weights) to a conclusion).

Analogously to learning from data, it is also possible to learn from symbols, using elementary patterns (1b) and (2b) instead of (1a) and (2a):
(3b) train & deduce: symbol → generate:train → model; symbol + model → infer:deduce → symbol
As mentioned above, this pattern then describes the learning (and subsequent inference) in Inductive Logic Programming, Knowledge Graph embeddings, Probabilistic Soft Logic and Markov Logic Networks.

Using our hierarchical taxonomy, the two patterns (3a) and (3b) can be abstracted into a single pattern, replacing all the boxes labelled with "data" or "symbol" by the generic term "instance". The specific diagrams (3a) and (3b) can then be recovered by adding type annotations "instance:sym" or "instance:data". Such type-specialisations maintain the insight that many of these very different approaches (ILP, MLN, PSL, Knowledge Graph embeddings) actually follow the same schema.
4 Compositional patterns

In this section, we describe compositional patterns based on the elementary patterns described in the previous section. We combined papers from several fields into one pattern. From the elementary patterns, we create compositions in two ways: (1) we can create a more complex pattern by connecting or 'stitching' elementary patterns; (2) we can go more specific or more abstract (only showing the boxes of (1a)-(1b), for example); in a specific pattern we can specify the type of, for example, a symbol block in terms of symbol:relations.

(4): data:text + model:stat → infer → symbol:relations (pattern 2a); symbol:relations + model:semantic → infer → symbol:rel:ontology (pattern 2b)

In ontology learning, a symbolic ontology is learned from data in the form of text [4, 3, 39, 10, 70, 12, 71]. The text is first translated into (subject, verb, object) relations using a statistical model such as the Stanford Parser [13, 20]. These relations are an intermediate representation. A semantic model, for example rules for Hearst patterns, can then infer the relations that form a full ontology including relation hierarchies and axioms. This pattern combines patterns (2a) and (2b). Whereas in other cases an ontology can play the role of a model on the basis of which properties of instances are deduced, in this case we represent the ontology as a set of relations, because it is the output of a process, and not a model which is input to a process.

A related but different instantiation of this pattern is the use of text mining not to learn full-blown ontologies, but to learn just the class/instance distinction (which is always problematic in ontology modelling), as done in [50]. As concerns the architectural patterns, this work only differs in the actual content of the symbolic output: a full-blown ontology, or only a class/instance label.

In contrast, other ontology-learning systems [11, 40] start from a given set of relations (the "A-box" of description logic) and then infer an ontological hierarchy. These systems only apply the second half of the above pipeline, pattern (2b).

An entirely different application domain is found in [2], where symbolic first-order logic representations are generated to describe the content of images.
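As a toy illustration of the second half of this pipeline (our own sketch, not taken from any of the cited systems), a handful of Hearst-style rules can turn sentences or extracted relations into is-a triples; the single pattern used here is a hypothetical minimal example.

```python
import re

# A minimal sketch of rule-based relation extraction in the spirit of Hearst patterns:
# "X such as Y" is read as the taxonomic relation (Y, is-a, X).
HEARST_SUCH_AS = re.compile(r"(\w+)\s+such as\s+(\w+)")

def extract_isa(sentences):
    triples = []
    for s in sentences:
        for hypernym, hyponym in HEARST_SUCH_AS.findall(s):
            triples.append((hyponym, "is-a", hypernym))
    return triples

print(extract_isa(["We study mammals such as dolphins and whales."]))
# [('dolphins', 'is-a', 'mammals')]
```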
(5a): [pattern 3a, train & deduce] data → generate:train → model; data + model → infer:deduce → symbol; [pattern 2b, explain] symbol + model:semantic → infer:deduce → symbol:trace
(5b): [pattern 3b, train & deduce] symbol → generate:train → model; symbol + model → infer:deduce → symbol; [pattern 2b, explain] symbol + model:semantic → infer:deduce → symbol:trace
Hybrid symbolic-statistical systems are seen as one possible way to remedy the "black-box" problem of many modern machine learning systems [68]. Pattern (5a) shows one of the hybrid architectures that have been proposed in the literature for this purpose. A standard machine learning system is trained (generate:train) to construct a model which is then applied to input data in order to produce (infer:deduce) a prediction (for example, a label for a given input image). The result of this process (in the form of the pairs of image + label) is then passed on to a symbolic reasoning system which uses background knowledge (model:semantic) to produce a "rational reconstruction" of a reason to justify the input/output pair of the learning system. An example of this is the work by [64], who use large knowledge graphs to reconstruct the justification of temporal patterns learned from Google Trends data. It is important to emphasise that the justification found by the symbolic system is unrelated to the inner workings of the black box of the machine learning system. The symbolic system produces a post-hoc justification that does not necessarily reflect the statistical computation. This architecture is also used in [58], where a description logic reasoner is used to come up with a logical justification of classifications produced by a deep learning system. Notice that pattern (5a) is a straightforward combination of elementary patterns (1a), (2a) and (2b).

Pattern (5a) captures so-called "instance-level explanations", where a separate explanation is generated for every specific result of applying the learned model. In contrast, it is also possible to generate "model-level explanations", where a generic explanation is constructed that captures the structure of the entire learned model. An example of this is [14], which trains a second neural network that uses the input/output behaviour of a classifier network to generate first-order logic formulas that can then be used to explain the behaviour of the classifier. This results in a modification of the above pattern in which the subsystem labelled with (2b) is replaced by a learning system that takes the learned model from (1a) as input and produces a symbolic explanation of that model. In other words: the explanation is generated based on the trained model, and not just on the derived individual result of applying that model to a given piece of data.
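A minimal sketch of the instance-level variant (pattern 5a), our own illustration rather than the mechanism of the cited systems: the classifier's prediction is matched against a small background knowledge base to produce a post-hoc justification trace.

```python
# Sketch of pattern (5a): an opaque classifier produces (input, label) pairs,
# and a separate symbolic step justifies the label from background knowledge.
BACKGROUND_KB = {  # hypothetical toy knowledge base
    ("has_whiskers", "has_fur"): "cat",
    ("has_feathers", "has_beak"): "bird",
}

def justify(observed_features, predicted_label):
    """infer:deduce over model:semantic -> symbol:trace (a post-hoc justification)."""
    for premises, label in BACKGROUND_KB.items():
        if label == predicted_label and all(p in observed_features for p in premises):
            return f"{predicted_label} because {' and '.join(premises)}"
    return f"no symbolic justification found for {predicted_label}"

print(justify({"has_whiskers", "has_fur", "likes_naps"}, "cat"))
```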
(6a): [pattern 3a] data → generate:train → model; data + model → infer:deduce → symbol; [pattern 3b] symbol → generate:train → model; symbol + model → infer:deduce → symbol

Intermediate abstraction for learning.
A well-known machine learning benchmark is to recognise sets of handwritten digits [42]. This digit-recognition task could be extended to perform addition on such handwritten digits through end-to-end training on the bitmap representations of the handwritten digits. This would correspond to our basic pattern (3a). However, many authors have observed that it is more efficient to learn an intermediate symbolic representation (mapping the pixel representation of the handwritten digit into a symbolic representation), and then train a model to solve the symbolically represented addition task. This pattern is represented above, where two standard train+deduce patterns (patterns 3a, 3b) are chained together through a symbolic intermediate representation which serves as output for the first pattern and as input for the second. This pattern is exploited in DeepProbLog [45], where a system is trained to add handwritten digits by first recognising the digit and then doing the addition. This turns out to be a much more robust approach than simple end-to-end training going from the digit bitmaps to the summed result in one step. This same pattern also captures the DeepMind experiment [30] where a reinforcement learning agent is trained to not just navigate on a bitmap representation of its world, but to first learn an intermediate symbolic representation of the world and then use that for navigation.

Besides learning a spatial abstraction (as in [30]), the work in [36] uses the same architectural pattern for deriving a temporal abstraction of a sequence of subtasks, which are then input to reinforcement learning agents. One of the advantages of such an intermediate representation is the much higher rate of transfer learning that can be obtained after making trivial changes to the input distribution, be they handwritten digits or bitmaps of floor spaces.
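For concreteness, the chaining in pattern (6a) can be sketched as follows; this is a schematic illustration only, not DeepProbLog's actual mechanism (which trains both stages jointly), and `digit_classifier` is a placeholder for any trained image classifier.

```python
# Sketch of pattern (6a) for the handwritten-digit addition task:
# stage 1 (3a): bitmap -> digit symbol; stage 2: symbolic addition on those digits.

def recognise_digit(bitmap, digit_classifier):
    """infer:deduce with a trained statistical model -> symbol (a digit 0-9)."""
    return int(digit_classifier.predict([bitmap])[0])

def add_handwritten(bitmap_a, bitmap_b, digit_classifier):
    """Chain the two stages through the intermediate symbolic representation."""
    a = recognise_digit(bitmap_a, digit_classifier)
    b = recognise_digit(bitmap_b, digit_classifier)
    return a + b  # the symbolic task is trivial once the abstraction exists
```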
Intermediate abstraction for reasoning. Whereas pattern (6a) consists of a composition of two patterns for learning (first deriving an intermediate abstraction on the basis of a trained model, and then using this intermediate abstraction as the basis for a further derivation on the basis of a second trained model), it is also possible to use the derived abstraction as the input for a deductive reasoning task, composing pattern (3a) with pattern (2b), creating pattern (6b). A classic example of this pattern is the AlphaGo system [61], where machine learning is used to train an evaluation function that gives rise to a symbolic search tree which is then traversed using deductive techniques (Monte Carlo tree search) in order to derive a (symbolically represented) next move on the Go board. Notice that pattern (5a) is a specialisation of this pattern (6b).

(6b): [pattern 3a] data → generate:train → model; data + model → infer:deduce → symbol; [pattern 2b] symbol + model → infer:deduce → symbol
Pattern (4) above (ontology learning from text) can now be seen to also be a variation on the general theme of "learning an intermediate abstraction", where the set of relations extracted by linguistic analysis is the intermediate abstraction that is input for the set of rules that construct the final ontology out of these relations. In pattern (4) the models (model:semantic) are assumed to be given (e.g. by using the pretrained Stanford parser) and hence the training phases are omitted.
(7): [pattern 2b] symbol + model:semantic → infer:deduce → symbol; [pattern 1a, extended] data + deduced symbols → generate:train → model; [pattern 3a] data + model → infer:deduce → symbol
In [66] a large collection of over 100 different systems is discussed, all of which are captured by pattern (7). In this pattern, the training phase of a learning system is guided by information that is obtained from a symbolic inference system (pattern 2b). For this purpose, the training step from elementary pattern (1a) is extended with a further input to allow for this guidance by inferred symbolic information. A particular example is where domain knowledge (such as captured in modern large knowledge graphs) is used as a symbolic prior to constrain the search space of the training phase [8]. In general, this pattern also captures all systems with a so-called semantic loss function [72], where (part of) the loss function is formulated in terms of the degree to which the symbolic background knowledge is violated. Such a semantic loss function is also used in [43], where the semantic loss is calculated by weighted model counting. In [22] and [46] the semantic loss function is realised through approximate constraint satisfaction. Another example is [21], where logical rules are used as background knowledge for a gradient descent learning task in a high-dimensional real-valued vector space. In the same spirit, [57] exploits a type hierarchy to inform an embedding in hyperbolic space. Logic Tensor Networks [23] also fall in this category, since they jointly minimise the loss function of a neural network and maximise the degree to which a first-order logic theory is satisfied. The fact that LTNs are captured by the same design pattern as semantic loss functions suggests an analogy between the two (namely that the maximisation of first-order satisfiability in LTNs can be regarded as a semantic loss function). This analogy between these two systems was not mentioned in their original papers, but only comes to light through our analysis in terms of high-level design patterns.

An entirely different category of systems that is captured by the same pattern are constrained reinforcement learners (e.g. [31]), where the exploration behaviour of a reinforcement learning agent is constrained through symbolic constraints that enforce safety conditions. Similarly, [37] uses high-level symbolic plans to guide a reinforcement learner towards efficiently learning a policy. [62] shows how adding domain knowledge in the form of symbolic constraints greatly improves the sample efficiency of a neural network trained to solve a combinatorial problem. The LYRICS system [47] proposes a generic interface layer that allows one to define arbitrary first-order logic background knowledge, allowing a learning system to learn its weights under the constraints imposed by the prior knowledge.

The full design pattern (7) requires that the symbolic prior is derived by a symbolic reasoning system, but it is of course also possible that this symbolic prior (or "inductive bias", using the terminology from [7]) is simply in the form of an explicit knowledge base for which no further derivation is possible. This would lead to a simplified version of pattern (7) where the "infer" step would be omitted. An example of this is [41], where input data is first abstracted with the help of a symbolic ontology, and is then fed into a classifier, which performs better on the abstracted symbolic data than on the original raw data. A similar example is given in [6], where knowledge graphs are successfully used as priors in a scene description task.
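A minimal sketch of the semantic-loss idea follows; this is our own illustration rather than the formulation of [72], and the constraint used here is a toy stand-in for a penalty derived from real background knowledge.

```python
import torch
import torch.nn.functional as F

def semantic_loss_example(logits, targets, weight=0.1):
    """Task loss + a penalty for violating the (toy) constraint
    'exactly one class should be predicted with high confidence'."""
    task_loss = F.cross_entropy(logits, targets)
    probs = torch.softmax(logits, dim=-1)
    # Toy constraint penalty: probability mass should not be spread out
    # (a stand-in for a semantic loss computed from symbolic background knowledge).
    violation = 1.0 - probs.max(dim=-1).values.mean()
    return task_loss + weight * violation

logits = torch.randn(4, 3, requires_grad=True)
targets = torch.tensor([0, 2, 1, 0])
loss = semantic_loss_example(logits, targets)
loss.backward()  # the symbolic prior now shapes the gradient, as in pattern (7)
```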
An interesting variation is presented in [73]. This work exploits the fact that pattern (7) uses prior symbolic knowledge as input for learning, while pattern (4) produces symbolic results as the output of learning. The system described in [73] then iterates between producing symbolic knowledge, using this symbolic knowledge as input for informed machine learning, and then again using the learned model to produce better symbolic knowledge, hence iterating between patterns (7) and (4). The IterefinE system [1] is another example of this pattern.
(8): [pattern 1d] symbol:kg → transform:embed → data:tensor; [pattern 2a, variant] data:tensor → infer:deduce → symbol:relations
Link prediction (or: graph completion) in knowledge graphs [51, 67, 49] has been a very active area on the boundary of symbolic and statistical representations, and is an example of what is captured in pattern (8). Almost all graph completion algorithms perform this task by first translating the knowledge graph to a representation in a high-dimensional vector space (a process called "embedding", captured in pattern (1d)), and then using this representation to predict additional edges which are deemed to be true based on geometric regularities in the vector space, even though they are missing from the original graph. This can be expressed in a variant of pattern (2a).
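To illustrate the geometric intuition, here is a sketch of a TransE-style scoring function (our own toy example, not tied to any specific system cited above): each entity and relation is an embedding vector, and a candidate triple is scored by how well head + relation ≈ tail. With the random, untrained embeddings below the ranking is arbitrary; in a real system the embeddings are learned from the known triples.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50
entities = {e: rng.normal(size=DIM) for e in ["amsterdam", "netherlands", "paris"]}
relations = {"capital_of": rng.normal(size=DIM)}  # toy, untrained embeddings

def score(head, rel, tail):
    """TransE-style plausibility: smaller distance = more plausible triple."""
    return -np.linalg.norm(entities[head] + relations[rel] - entities[tail])

# Link prediction: rank candidate tails for (amsterdam, capital_of, ?)
candidates = ["netherlands", "paris"]
print(max(candidates, key=lambda t: score("amsterdam", "capital_of", t)))
```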
Integrating knowledge representations into a machine learning system is a long-standing challenge in Hybrid AI, since it allows logic calculus to be carried out by a neural network in an efficient and robust manner. Encoding prior knowledge also allows for better training results on less data. This pattern describes the integration of logic into a machine learning model through tensorisation of the logic involved [29], by applying prior (semantic) knowledge representations as constraints for machine learning. The pattern transforms a semantic model into vector/tensor representations and uses these to train a neural network. The machine learner can then make inferences based on its embedded logic, for example in Logic Tensor Networks [59, 74], where relational data is embedded in a (convolutional) neural network. Graph Neural Networks (GNNs, [9]) embed a semantic graph model by transforming a neighbourhood structure of its nodes and edges into vectors and using these to train a neural network. In [15] a reified knowledge base is provided as input to train neural modules to perform multi-hop inferencing tasks. The pattern in itself is an extension of pattern (3a), where the training input data is a (transformed) representation of relational data.

(9): [pattern 2d] symbol + model:semantic → transform:embed → data; [pattern 3a, train & deduce] data → generate:train → model; data + model → infer:deduce → symbol

(10): [pattern 2b] symbol + model:semantic → infer:deduce → symbol; [pattern 1d] symbol → transform → data; data → generate:train → model:stat; [pattern 3a, variant] symbol:relation + model:stat → infer:deduce → symbol:relation
Whereas pattern (9) provides representation learning by embedding vector/tensor representations of logical structures in a neural network, there are also attempts to learn the reasoning process itself in neural networks. This is motivated by the ability of neural networks to provide higher scalability and better robustness when dealing with noise in the input data (incomplete, contradictory, erroneous). The focus of pattern (10) is on reasoning with first-order logic on knowledge graphs. This pattern learns specific reasoning tasks based on symbolic input tuples and the inferencing results from the symbolic reasoner. Pattern (10) is a combination of our basic patterns for symbolic reasoning (2b) and training to produce a statistical model (1a).

This pattern for training a neural network to do logical reasoning captures a wide variety of approaches, such as reasoning over RDF knowledge bases [24], Description Logic reasoning [33] and logic programming [56]. Relational Tensor Networks (RTNs) [34] use a recurrent neural network to learn two kinds of predictions, namely the membership of individuals in classes and the existence of relations. In a somewhat different application, [25] takes a set of normalised triples and a normalised query to learn classification of the entailment of the query from the statements in the current knowledge graph. Whereas current efforts focus on deductive reasoning in knowledge bases of FOL and DL (or fragments thereof), the pattern can in theory be applied to other inference tasks and mechanisms.
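A sketch of how training data for pattern (10) could be produced (our own illustration, not the setup of the cited systems): a symbolic reasoner labels (knowledge base, query) pairs with their entailment status, and those pairs become supervised examples for a neural classifier. Here `symbolic_entails` and `encode` are placeholders for a real reasoner and encoder.

```python
# Sketch of pattern (10): use a symbolic reasoner to label training examples,
# then train a statistical model to approximate the reasoner.

def make_training_set(knowledge_bases, queries, symbolic_entails, encode):
    X, y = [], []
    for kb in knowledge_bases:
        for q in queries:
            X.append(encode(kb, q))                 # transform: symbols -> data
            y.append(int(symbolic_entails(kb, q)))  # infer:deduce -> entailment label
    return X, y

# X, y can then be fed to any classifier (generate:train), which afterwards
# answers entailment queries directly (infer:deduce with a model:stat).
```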
(11): [pattern 2b] symbol:relation + model:semantic → infer:deduce → symbol; [pattern 1a, modified] data + deduced symbols → generate:train → model

There is a long-standing tradition in both AI [16] and in the field of cognitive architectures (e.g. [53]) to investigate so-called meta-reasoning systems, where one system reasons about (or learns from) the behaviour of another system. It is widely recognised that the configuration of machine learning systems is highly non-trivial, ranging from choosing an appropriate neural network architecture to setting a multitude of hyper-parameters for that architecture. The field of AutoML [35] aims to automate that configuration task. This is often done by applying machine learning techniques to this task (i.e. the system is learning the right hyper-parameter settings for the target learning system), but the configuration of the target system can also be done by capturing the knowledge of machine learning engineers. This is done in a system such as Alpine Meadow [60] and is captured in pattern (11): a knowledge base of ML configuration knowledge is used to deduce appropriate hyper-parameter settings for a learning system (sub-pattern (2b)), these parameters are then used to train a model (requiring a slightly modified version of sub-pattern (1a)), and
the resulting performance of this model is inspected by the knowledge base, which may give rise to adaptations of the hyper-parameters in the next iteration.

Another class of systems which at first sight may seem very different, but which are an instantiation of the same pattern, are so-called "curriculum-guided learning" systems. The curriculum learning problem can be defined as the problem of finding the most efficient sequence of learning situations across various tasks so as to maximise learning speed [27, 26]. In terms of pattern (11), the task of the (2b) subsystem is to feed the learner in subsystem (1a) its training instances in the optimal sequence.

Notice that this pattern closely resembles pattern (7) (informed learning with prior knowledge). In that pattern, the symbolic component deduces prior knowledge as input for the training component once, but the resulting trained model is not subsequently inspected to possibly adjust this input. Whereas in pattern (11) a symbolic system is used to guide the learning behaviour of a subsymbolic system, the converse is also possible. In [44], a system is presented where a subsymbolic system learns the search strategies needed to guide a symbolic theorem prover. This line of work has a long history, dating back to the 1990s [63].
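A schematic sketch of the meta-reasoning loop in pattern (11) follows; this is our own illustration, not Alpine Meadow's actual mechanism, and `train_and_score` is a placeholder for any training-plus-evaluation routine.

```python
# Sketch of pattern (11): a symbolic configuration step (2b) guides and inspects
# a statistical training step (1a) in a loop.
def deduce_hyperparameters(history):
    """Toy rule base: shrink the learning rate whenever performance degrades."""
    if history and history[-1]["score"] < max(h["score"] for h in history):
        return {"lr": history[-1]["params"]["lr"] * 0.5}
    return {"lr": 0.1}

def meta_loop(train_and_score, rounds=5):
    history = []
    for _ in range(rounds):
        params = deduce_hyperparameters(history)   # infer:deduce (2b)
        score = train_and_score(params)            # generate:train + evaluate (1a)
        history.append({"params": params, "score": score})
    return max(history, key=lambda h: h["score"])
```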
5 Use cases

In this section, we describe the use of our boxology patterns in two real-world use cases. We have identified the need for a pattern-based system-of-systems approach towards the design and evaluation of hybrid AI systems. Using our approach can increase the level of trustworthiness of such systems. Trust in AI systems emerges from a number of factors, such as transparency, reproducibility, predictability and explainability. A Hybrid AI system is not to be seen as a monolithic component, but as communicating modules of a hybrid system [19]. Insight into the individual modules and components and their relationships and dependencies is essential, in particular in a decentralised system. The specification and verification of each component and their interactions enable a system-wide validation of expected behaviour. The definition and use of best practices and design patterns supports the creation of trustworthy AI systems, either when building new systems or when understanding existing systems by dissection and reverse-engineering.

Our method allows for step-wise refinement of a system design by starting at a high level of abstraction and drilling down towards implementation details and reusable components by specifying more and more concrete choices, such as which models to use. Starting from generic patterns, an implementation can be derived and deployed, based on experience and best practices in Hybrid AI.
5.1 Use case 1: skill matching

(12) skill-matching use case: [1c] actor → generate:engineer → model:semantic:skills; [1d] model:semantic:skills → transform:embed → data:tensor:ontology; [1d] data:vacancies → transform:embed → data:tensor:vacancies; [1a] data:tensor:ontology + data:tensor:vacancies → generate:train → model:stat:match; [1d] data:vacancies → transform:embed → data:tensor:vacancies; [2a] data:tensor:vacancies + model:stat:match → infer:deduce → symbol:skills; [2b] symbol:skills + model:semantic:skills → infer:deduce → symbol

In the first use case, the goal is to create a piece of software that is able to match open vacancies or job descriptions with the CVs of job seekers. In this specific use case, skills, defined as the ability to perform a task, are used to do the matchmaking. In the first part of the project, a large architecture picture was created with a lot of boxes, arrows and terms. The distinction between processes and data was not very clear, and the type of model was also not explicitly defined.

With the use of the boxology, the architecture picture becomes more readable (see figure (12)). This figure made it possible on the one hand to talk about the bigger boxes (elementary patterns), and on the other hand to go into more detail about the specific implementation of, for example, a model box and the specific input and output types needed. The figure also makes it possible to think about the future of the project in terms of patterns that have to be added, substituted or removed. To go into more detail for this specific use case: in the training phase, vacancies are transformed to the data type tensor using a (word2vec) embedding. A skills ontology, which is a semantic model engineered by a human (pattern 1c), is also transformed to a tensor using an embedding (variant of pattern 1d). Both the vacancy and the skill ontology are used to train a neural network (pattern 1a), a statistical model. This model learns which sentences or parts of sentences of a vacancy text contain a skill, and to which specific skill in the ontology they match. When this model is applied to a new vacancy, transformed using an embedding to a tensor (pattern 1d), it predicts the most probable skill through deduction (pattern 2a). An additional step is to use the ontology to deduce (pattern 2b) a more understandable skill from the predicted skill label, for example a skill with a description.
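A highly simplified sketch of the deployment phase of this pipeline (our own illustration; the actual system uses a trained neural matcher): vacancy sentences and ontology skill labels are embedded into the same vector space and matched by similarity, after which the ontology supplies a human-readable description. The `embed` stub and the ontology fragment below are hypothetical.

```python
import numpy as np

# Sketch of the skill-matching use case: embed vacancy text and ontology skill
# labels into one vector space, pick the closest skill, then enrich it from the
# ontology. `embed` stands in for the word2vec-style embedding used in the paper.
def embed(text, dim=64):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # toy deterministic stub
    return rng.normal(size=dim)

skills_ontology = {  # hypothetical fragment of the skills ontology
    "python programming": "Can design, write and test Python software.",
    "project management": "Can plan, execute and monitor projects.",
}

def match_skill(vacancy_sentence):
    v = embed(vacancy_sentence)
    best = max(skills_ontology, key=lambda s: float(np.dot(v, embed(s))))
    return best, skills_ontology[best]   # pattern (2a) then (2b): label + description

print(match_skill("We are looking for someone to manage our software projects."))
```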
5.2 Use case 2: robot action selection

(13) robot action-selection use case: [3a] data:video → generate:train → model:stat:tracker; data:video + model:stat:tracker → infer:deduce → data:tracks; [3a] data:image → generate:train → model:stat:object; data:image + model:stat:object → infer:deduce → symbol:detectedObject; [1c] actor → generate:engineer → model:semantic; [2c] data:tracks + model:semantic → infer → model:semantic:rules; [2b] symbol:detectedObject + model:semantic:rules → infer:deduce → symbol:bestAction

In the second use case, the goal is to get a robot to choose the best action to perform. The robot has access to a camera. In a training phase, an object detector and a tracker are trained as statistical models, whereas an ontology is handcrafted by an expert actor. The object detector is a neural network trained using images of the environment (pattern 1a). The tracker uses the video stream of the camera and a different type of neural network (pattern 1a). A semantic model in the form of an ontology about the world is engineered by a human (pattern 1c). At each timestep, the robot obtains new information from the camera. The image is used to predict which objects are visible in the environment (pattern 2a). The tracker is used to track objects through time (pattern 2a). The tracks and the world model are used to induce semantic rules (pattern 2c), such as reasoning about what will happen next. Then these rules and the detected object(s) are used to predict what the best action is (pattern 2b).

In this use case, the boxology helped to see the similarity between the object detector and the tracker and to determine how the detailed process flow should be.
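A schematic sketch of the per-timestep decision flow (our own illustration; `detector`, `tracker` and `rules` are placeholders for the trained and engineered components described above):

```python
# Sketch of the robot use case at inference time: detections and tracks feed a
# rule-based step that selects the best action (patterns 2a, 2c, 2b).
def choose_action(frame, video_window, detector, tracker, rules):
    objects = detector(frame)            # pattern (2a): image -> detected objects
    tracks = tracker(video_window)       # pattern (2a): video -> object tracks
    predictions = rules.predict_next(tracks)          # pattern (2c)-style step
    return rules.best_action(objects, predictions)    # pattern (2b): -> bestAction
```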
6 Related work

In [66] a large collection of over 100 different systems for "informed machine learning" is presented. The survey paper provides a broad overview of how many different learning algorithms can be enriched with prior knowledge. Their figure 2 provides a taxonomy of "informed machine learning" across three dimensions: which source of knowledge is integrated (e.g. expert knowledge or common-sense world knowledge), how that knowledge is represented (e.g. as knowledge graphs, logic rules, algebraic equations and five other types of representation), and where this knowledge is integrated in the machine learning pipeline (in the training data, the hypothesis set, the learning algorithm or the final hypothesis). The second and third of these dimensions are used to categorise well over 100 different published systems into an 8x4 grid. All of these systems are captured in one of our design patterns (pattern 7), so while our proposal covers a much broader set of hybrid systems, the result of [66] is a very detailed analysis of one of our patterns.

In his invited address to the AAAI 2020 conference, Henry Kautz introduced a taxonomy for neural-symbolic systems. The proposed taxonomy consists of a flat set of 6 types. We will briefly summarise (our understanding of) these informally characterised types (partly based on the explanation in [19]), and we will show how Kautz' types relate to the design patterns that we propose in this paper. Type 1 systems (informally written by Kautz as "symbolic Neuro symbolic") are learning systems that have symbols as input and output. This directly corresponds to our elementary pattern (3b). Type 2 systems (informal notation: "Symbolic[Neuro]") are symbolic reasoning systems that are guided by a learned search strategy. These directly correspond to a variation of our pattern (11). Type 3 systems (informal notation "Neuro;Symbolic") consist of a sequence of a neural learning system that performs abstraction from data to symbols, followed by a symbolic system. This corresponds to our patterns (6a) and (6b), showing that in this case we make a more fine-grained distinction. Type 4 systems (informal notation "Neuro:Symbolic → Neuro") use symbolic I/O training pairs to teach a neural system. These correspond partly to our elementary pattern (3b) (for example: inductive logic programming), partly to our pattern (8) (e.g. link prediction) and partly to pattern (10) (e.g. learning to reason), again showing that we propose a much more fine-grained distinction. Type 5 systems (informal notation "Neuro_{Symbolic}") use symbolic rules that inform neural learning. These correspond to our pattern (7). Finally, Type 6 systems (informally "Neuro[Symbolic]") remain somewhat unclear (as also acknowledged in [19]), and we refrain from interpreting this type.

Kautz types:   Type 1   Type 2   Type 3   Type 4      Type 5   Type 6
Our patterns:  3b       11       6a, 6b   3b, 8, 10   7        -

The above table shows that there are substantial differences between our proposed design patterns and the system types from Kautz.
Kautz' taxonomy has similar goals to ours, namely to identify different interaction patterns between neural and symbolic components in a modular hybrid architecture, but our proposal goes beyond Kautz' proposal because (a) Kautz proposes a taxonomy of systems without describing the internal architectures of the types of systems in his taxonomy, and (b) we make more fine-grained distinctions than Kautz, refining his 6 categories into distinctive subtypes, each with their own internal modular architecture (= design pattern).

In [54] the authors survey hybrid ("neural-symbolic") systems along eight different dimensions. We briefly describe each of these and discuss their relationship to the distinctions made in our own work.
• Directed vs. undirected graphical models. This is a finer-grained distinction about representations than we make. In our patterns, these are both captured by the same "semantic model" component.
• Model-based vs. proof-based inference (better known as model-theoretic vs. proof-theoretic inference). Note that this use of "model-based" uses "model" as the term is used by logicians, which is unlike how Darwiche uses the term model-based (with "model" as used in machine learning), illustrating the highly ambiguous use of the term "model" in Computer Science in general and in AI in particular. Again, this is a finer-grained distinction than we make, and again both of these forms of inference are captured in our single KR component.
• Logic vs. Neural. This corresponds to our distinction between ML and KR components.
• Boolean vs. probabilistic semantics. Similar to the above, both of these would be captured by the KR component without making the distinction.
• Structure vs. parameter learning. This is captured in our notation by ML components that have either a statistical or a semantic model as their result.
• Symbols vs. Sub-symbols. This corresponds to our distinction between symbols and data. Unfortunately, and similarly to our own work, De Raedt et al. do not give a precise distinction between the two categories.
• Type of Logic. This is another finer distinction than we make; these different types of logic are all captured in terms of our KR component.

Summarising: on the one hand, De Raedt et al. make a number of finer distinctions than our boxology, mostly in terms of the details inside our components (different variations of KR components, different variations of models), while on the other hand De Raedt et al. do not discuss how these components should be configured into larger systems in order to achieve a particular functionality, which is the goal of our boxology. Whereas our boxology is a refinement of the 6 types proposed by Kautz (both aiming to describe modular architectures of interacting components), the work by De Raedt et al. is a refinement of some of our components, and could be combined with our work in a future version. The same is true for [66].
7 Conclusions and future work

In our paper, we have presented a visual language (boxology) to represent learning and reasoning systems. The taxonomical vocabulary and a collection of patterns expressed in this language aim to foster a better understanding of Hybrid AI systems and to support communication between AI communities. A practical application in two use cases demonstrates that we can use the boxology to create a communicable blueprint of rather complex Hybrid AI systems that integrate various AI technologies.

The work presented here provides ample opportunities for additional features and uses. We expect to apply the taxonomy and visual language in many more use cases, and it is likely to evolve further as a result. New examples of AI systems will contribute to extending and improving the taxonomy, which in turn allows us to cover more use cases. Using this approach, an increasingly mature visual language will evolve.

As a first extension to the current boxology, the concept of actors can be defined, along with the corresponding interaction processes and models. Actors are necessary for modelling interactions among autonomous entities, such as software agents or robots, whether they are physically or logically distributed. They also allow for specifying systems with humans in the loop and human-machine interaction in general. Use cases for actors include federated learning and reasoning, multi-robot coordination or hybrid human-agent teams. In [69] the authors propose an extension of our boxology [65] with two abstract patterns for human-in-the-loop systems, namely where the human agent performs the role of either a feedback provider or a feedback consumer.

Future work also includes developing the boxology from a means of representing system functionality towards an architectural tool-set of reusable components for the design, implementation and deployment of hybrid AI systems. A more coherent methodology for complex AI systems based on the boxology allows these systems to be easier to understand in terms of functionality. This in turn provides a basis for more explainable and trustworthy AI system design. An interesting topic to pursue in this respect is the creation and development of a generative grammar and logic calculus for composing and verifying patterns. This would facilitate the above-mentioned goals and allow for formal verification at the component, pattern and system levels.

When using a coherent methodology for complex Hybrid AI system design, it is expected that such a design becomes easier to understand and maintain. In addition, hybrid AI systems will become better explainable, responsible, reliable and predictable. It is our aim to develop such systems to be trustworthy by design. This could provide a framework for system quality control by evaluation and certification.

Finally, the methodology needs to be further documented using guidelines for specifying increasingly concrete implementations of the concepts.
Acknowledgement
This research was partially funded by the Hybrid Intelligence Center, a 10-year programme funded by the DutchMinistry of Education, Culture and Science through the Netherlands Organisation for Scientific Research NWO, https://hybrid-intelligence-centre.nl . References [1] S. Arora, S. Bedathur, M. Ramanath, and D. Sharma. Iterefine: Iterative KG refinement embeddings usingsymbolic knowledge.
CoRR, abs/2006.04509, 2020.
[2] M. Asai. Unsupervised grounding of plannable first-order logic representation from images. In J. Benton, N. Lipovetzky, E. Onaindia, D. E. Smith, and S. Srivastava, editors, Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling, ICAPS 2019, Berkeley, CA, USA, July 11-15, 2019, pages 583–591. AAAI Press, 2019.
[3] M. N. Asim, M. Wasim, M. U. G. Khan, W. Mahmood, and H. M. Abbasi. A survey of ontology learning techniques and applications. Database, 2018, 2018.
[4] M. N. Asim, M. Wasim, M. U. G. Khan, W. Mahmood, and H. M. Abbasi. A survey of ontology learning techniques and applications. Database, 2018:bay101, 2018.
[5] S. H. Bach, M. Broecheler, B. Huang, and L. Getoor. Hinge-loss Markov random fields and probabilistic soft logic. Journal of Machine Learning Research, 18:109:1–109:67, 2017.
[6] S. Baier, Y. Ma, and V. Tresp. Improving visual relationship detection using semantic modeling of scene descriptions. In C. d'Amato, M. Fernández, V. A. M. Tamma, F. Lécué, P. Cudré-Mauroux, J. F. Sequeda, C. Lange, and J. Heflin, editors, The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I, volume 10587 of Lecture Notes in Computer Science, pages 53–68. Springer, 2017.
[7] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. F. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, Ç. Gülçehre, F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.
[8] J. Bian, B. Gao, and T. Liu. Knowledge-powered deep learning for word embedding. In T. Calders, F. Esposito, E. Hüllermeier, and R. Meo, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014, Proceedings, Part I, volume 8724 of Lecture Notes in Computer Science, pages 132–148. Springer, 2014.
[9] K. Borgwardt, E. Ghisu, F. Llinares-López, L. O'Bray, and B. Rieck. Graph kernels: State-of-the-art and future challenges, 2020.
[10] Z. Bouraoui, S. Jameel, and S. Schockaert. Inductive reasoning about ontologies using conceptual spaces. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[11] Z. Bouraoui, S. Jameel, and S. Schockaert. Inductive reasoning about ontologies using conceptual spaces. In S. P. Singh and S. Markovitch, editors, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 4364–4370. AAAI Press, 2017.
[12] C. A. Brewster. Mind the gap: bridging from text to ontological knowledge. PhD thesis, University of Sheffield, UK, 2008.
[13] P. Cimiano, A. Mädche, S. Staab, and J. Völker. Ontology learning. In Handbook on Ontologies, pages 245–267. Springer, 2009.
[14] G. Ciravegna, F. Giannini, M. Gori, M. Maggini, and S. Melacci. Human-driven FOL explanations of deep learning. In C. Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 2234–2240. ijcai.org, 2020.
[15] W. W. Cohen, H. Sun, R. A. Hofer, and M. Siegler. Scalable neural methods for reasoning with a symbolic knowledge base. arXiv preprint arXiv:2002.06115, 2020.
[16] S. Costantini. Meta-reasoning: A survey. In A. C. Kakas and F. Sadri, editors, Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski, Part II, volume 2408 of Lecture Notes in Computer Science, pages 253–288. Springer, 2002.
[17] A. Darwiche. Human-level intelligence or animal-like abilities? Communications of the ACM, 61(10):56–67, 2018.
[18] A. Darwiche. Human-level intelligence or animal-like abilities? Communications of the ACM, 61(10):56–67, September 2018.
[19] A. d'Avila Garcez and L. C. Lamb. Neurosymbolic AI: The 3rd wave, 2020.
[20] M. H. de Boer and J. P. Verhoosel. Creating and evaluating data-driven ontologies. International Journal on Advances in Software, 12(3 and 4):300–309, 2019.
[21] T. Demeester, T. Rocktäschel, and S. Riedel. Lifted rule injection for relation embeddings. In J. Su, X. Carreras, and K. Duh, editors, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1389–1399. The Association for Computational Linguistics, 2016.
[22] F. Detassis, M. Lombardi, and M. Milano. Teaching the old dog new tricks: Supervised learning with constraints. CoRR, abs/2002.10766, 2020.
[23] I. Donadello, L. Serafini, and A. S. d'Avila Garcez. Logic tensor networks for semantic image interpretation. In C. Sierra, editor, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 1596–1602. ijcai.org, 2017.
[24] M. Ebrahimi, M. K. Sarker, F. Bianchi, N. Xie, D. Doran, and P. Hitzler. Reasoning over RDF knowledge bases using deep learning. CoRR, abs/1811.04132, 2018.
[25] M. Ebrahimi, M. K. Sarker, F. Bianchi, N. Xie, D. Doran, and P. Hitzler. Reasoning over RDF knowledge bases using deep learning. arXiv preprint arXiv:1811.04132, 2018.
[26] M. Fang, T. Zhou, Y. Du, L. Han, and Z. Zhang. Curriculum-guided hindsight experience replay. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 12602–12613, 2019.
[27] P. Fournier, O. Sigaud, M. Chetouani, and P. Oudeyer. Accuracy-based curriculum learning in deep reinforcement learning. CoRR, abs/1806.09614, 2018.
[28] E. Gamma, R. Helm, R. Johnson, and J. M. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1st edition, 1994.
[29] A. d. Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. arXiv preprint arXiv:1905.06088, 2019.
[30] M. Garnelo, K. Arulkumaran, and M. Shanahan. Towards deep symbolic reinforcement learning. CoRR, abs/1609.05518, 2016.
[31] P. Geibel. Reinforcement learning for MDPs with constraints. In J. Fürnkranz, T. Scheffer, and M. Spiliopoulou, editors, Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Berlin, Germany, September 18-22, 2006, Proceedings, volume 4212 of Lecture Notes in Computer Science, pages 646–653. Springer, 2006.
[32] L. Getoor. Probabilistic soft logic: A scalable approach for Markov random fields over continuous-valued variables (abstract of keynote talk). In L. Morgenstern, P. S. Stefaneas, F. Lévy, A. Z. Wyner, and A. Paschke, editors, Theory, Practice, and Applications of Rules on the Web - 7th International Symposium, RuleML 2013, Seattle, WA, USA, July 11-13, 2013, Proceedings, volume 8035 of Lecture Notes in Computer Science, page 1. Springer, 2013.
[33] P. Hohenecker and T. Lukasiewicz. Deep learning for ontology reasoning. CoRR, abs/1705.10342, 2017.
[34] P. Hohenecker and T. Lukasiewicz. Deep learning for ontology reasoning. arXiv preprint arXiv:1705.10342, 2017.
[35] F. Hutter, L. Kotthoff, and J. Vanschoren, editors. Automatic Machine Learning: Methods, Systems, Challenges. Springer, 2019.
[36] R. T. Icarte, T. Q. Klassen, R. A. Valenzano, and S. A. McIlraith. Teaching multiple tasks to an RL agent using LTL. In E. André, S. Koenig, M. Dastani, and G. Sukthankar, editors, Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018, pages 452–461. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, USA / ACM, 2018.
[37] L. Illanes, X. Yan, R. T. Icarte, and S. A. McIlraith. Symbolic plans as high-level instructions for reinforcement learning. In J. C. Beck, O. Buffet, J. Hoffmann, E. Karpas, and S. Sohrabi, editors, Proceedings of the Thirtieth International Conference on Automated Planning and Scheduling, Nancy, France, October 26-30, 2020, pages 540–550. AAAI Press, 2020.
[38] K. Inoue, H. Ohwada, and A. Yamamoto. Special issue on inductive logic programming, 2017.
[39] S. Konstantopoulos and A. Charalambidis. Formulating description logic learning as an inductive logic programming task. In International Conference on Fuzzy Systems, pages 1–7. IEEE, 2010.
[40] S. Konstantopoulos and A. Charalambidis. Formulating description logic learning as an inductive logic programming task. In FUZZ-IEEE 2010, IEEE International Conference on Fuzzy Systems, Barcelona, Spain, 18-23 July 2010, Proceedings, pages 1–7. IEEE, 2010.
[41] R. Kop et al. Predictive modeling of colorectal cancer using a dedicated pre-processing pipeline on routine electronic medical records. Computers in Biology and Medicine, 76:30–38, 2016.
[42] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[43] L. D. Liello, P. Ardino, J. Gobbi, P. Morettin, S. Teso, and A. Passerini. Efficient generation of structured objects with constrained adversarial networks. CoRR, abs/2007.13197, 2020.
[44] S. M. Loos, G. Irving, C. Szegedy, and C. Kaliszyk. Deep network guided proof search. In T. Eiter and D. Sands, editors, LPAR-21, 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, Maun, Botswana, May 7-12, 2017, volume 46 of EPiC Series in Computing, pages 85–105. EasyChair, 2017.
[45] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. D. Raedt. DeepProbLog: Neural probabilistic logic programming. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 3753–3763, 2018.
[46] G. Marra, M. Diligenti, F. Giannini, M. Gori, and M. Maggini. Relational neural machines. In G. D. Giacomo, A. Catalá, B. Dilkina, M. Milano, S. Barro, A. Bugarín, and J. Lang, editors, ECAI 2020 - 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, August 29 - September 8, 2020 - Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020), volume 325 of Frontiers in Artificial Intelligence and Applications, pages 1340–1347. IOS Press, 2020.
[47] G. Marra, F. Giannini, M. Diligenti, and M. Gori. LYRICS: A general interface layer to integrate logic inference and deep learning. In U. Brefeld, É. Fromont, A. Hotho, A. J. Knobbe, M. H. Maathuis, and C. Robardet, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Würzburg, Germany, September 16-20, 2019, Proceedings, Part II, volume 11907 of Lecture Notes in Computer Science, pages 283–298. Springer, 2019.
[48] C. Meilicke, M. W. Chekol, D. Ruffinelli, and H. Stuckenschmidt. Anytime bottom-up rule learning for knowledge graph completion. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 3137–3143. International Joint Conferences on Artificial Intelligence Organization, July 2019.
[49] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2016.
[50] A. Padia, D. Martin, and P. F. Patel-Schneider. Automating class/instance representational choices in knowledge bases. In C. Faron-Zucker, C. Ghidini, A. Napoli, and Y. Toussaint, editors, Knowledge Engineering and Knowledge Management - 21st International Conference, EKAW 2018, Nancy, France, November 12-16, 2018, Proceedings, volume 11313 of Lecture Notes in Computer Science, pages 273–288. Springer, 2018.
[51] H. Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3):489–508, 2017.
[52] J. Pearl. Theoretical impediments to machine learning with seven sparks from the causal revolution. In Y. Chang, C. Zhai, Y. Liu, and Y. Maarek, editors, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5-9, 2018, page 3. ACM, 2018.
[53] D. Perlis, J. Brody, S. Kraus, and M. J. Miller. The internal reasoning of robots. In A. S. Gordon, R. Miller, and G. Turán, editors, Proceedings of the Thirteenth International Symposium on Commonsense Reasoning, COMMONSENSE 2017, London, UK, November 6-8, 2017, volume 2052 of CEUR Workshop Proceedings. CEUR-WS.org, 2017.
[54] L. D. Raedt, S. Dumancic, R. Manhaeve, and G. Marra. From statistical relational to neuro-symbolic artificial intelligence. In C. Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 4943–4950. ijcai.org, 2020.
[55] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1-2):107–136, 2006.
[56] T. Rocktäschel and S. Riedel. End-to-end differentiable proving. In I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 3791–3803, 2017.
[57] F. Sala, C. D. Sa, A. Gu, and C. Ré. Representation tradeoffs for hyperbolic embeddings. In J. G. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 4457–4466. PMLR, 2018.
[58] M. K. Sarker, N. Xie, D. Doran, M. Raymer, and P. Hitzler. Explaining trained neural networks with semantic web technologies: First steps. In T. R. Besold, A. S. d'Avila Garcez, and I. Noble, editors, Proceedings of the Twelfth International Workshop on Neural-Symbolic Learning and Reasoning, NeSy 2017, London, UK, July 17-18, 2017, volume 2003 of CEUR Workshop Proceedings. CEUR-WS.org, 2017.
[59] L. Serafini and A. d. Garcez. Logic tensor networks: Deep learning and logical reasoning from data and knowledge. arXiv preprint arXiv:1606.04422, 2016.
[60] Z. Shang, E. Zgraggen, and T. Kraska. Alpine Meadow: A system for interactive AutoML. In NeurIPS, 2019.
[61] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.
[62] M. Silvestri, M. Lombardi, and M. Milano. Injecting domain knowledge in neural networks: a controlled experiment on a constrained problem. CoRR, abs/2002.10742, 2020.
[63] C. B. Suttner and W. Ertel. Automatic acquisition of search guiding heuristics. In M. E. Stickel, editor, , volume 449 of Lecture Notes in Computer Science, pages 470–484. Springer, 1990.
[64] I. Tiddi, M. d'Aquin, and E. Motta. Data patterns explained with linked data. In A. Bifet, M. May, B. Zadrozny, R. Gavaldà, D. Pedreschi, F. Bonchi, J. S. Cardoso, and M. Spiliopoulou, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part III, volume 9286 of Lecture Notes in Computer Science, pages 271–275. Springer, 2015.
[65] F. van Harmelen and A. ten Teije. A boxology of design patterns for hybrid learning and reasoning systems. Journal of Web Engineering, 18(1-3):97–124, 2019.
[66] L. Von Rueden, S. Mayer, J. Garcke, C. Bauckhage, and J. Schuecker. Informed machine learning - towards a taxonomy of explicit integration of knowledge into machine learning. Learning, 18:19–20, 2019.
[67] Q. Wang, Z. Mao, B. Wang, and L. Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, December 2017.
[68] D. S. Weld and G. Bansal. Intelligible artificial intelligence. CoRR, abs/1803.04263, 2018.
[69] H. F. Witschel, C. Pande, A. Martin, E. Laurenzi, and K. Hinkelmann. Visualization of Patterns for Hybrid Learning and Reasoning with Human Involvement, pages 193–204. Springer International Publishing, Cham, 2021.
[70] W. Wong, W. Liu, and M. Bennamoun. Ontology learning from text: A look back and into the future. ACM Computing Surveys (CSUR), 44(4):1–36, 2012.
[71] W. Wong, W. Liu, and M. Bennamoun. Ontology learning from text: A look back and into the future. ACM Computing Surveys, 44(4):20:1–20:36, 2012.
[72] J. Xu, Z. Zhang, T. Friedman, Y. Liang, and G. V. den Broeck. A semantic loss function for deep learning with symbolic knowledge. In J. G. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of JMLR Workshop and Conference Proceedings, pages 5498–5507. JMLR.org, 2018.
[73] W. Zhang, B. Paudel, L. Wang, J. Chen, H. Zhu, W. Zhang, A. Bernstein, and H. Chen. Iteratively learning embeddings and rules for knowledge graph reasoning. In L. Liu, R. W. White, A. Mantrach, F. Silvestri, J. J. McAuley, R. Baeza-Yates, and L. Zia, editors, The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, pages 2366–2377. ACM, 2019.
[74] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun. Graph neural networks: A review of methods and applications. arXiv preprint arXiv:1812.08434, 2018.
Appendix: Taxonomy
Concept: Definition

Instance: An example or single occurrence of something. (Collins)

Data: Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation. (MW)

Number: A numerical quantity that is assigned or is determined by calculation or measurement. (MW)

Text: Words and form of a written or printed work. Something (such as a story or movie) considered as an object to be examined, explicated, or deconstructed. (MW)

Tensor: A set of components, functions of the coordinates of any point in space, that transform linearly between coordinate systems. For three-dimensional space there are 3^r components, where r is the rank. A tensor of zero rank is a scalar; of rank one, a vector. (Collins) A picture (bitmap) can be represented as a tensor of rank 2 (see the code sketch below).

Stream: A stream of things is a large number of them occurring one after another. (Collins) Digital data (such as audio or video material) that is continuously delivered one packet at a time and is usually intended for immediate processing or playback. (MW)
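To make the rank terminology in the Tensor entry above concrete, here is a minimal illustrative sketch in Python with NumPy (our own example, not part of the original taxonomy): a scalar, a vector, and a rank-2 tensor standing in for a greyscale bitmap.

    import numpy as np

    scalar = np.array(3.0)               # rank 0: a single number
    vector = np.array([1.0, 2.0, 3.0])   # rank 1: three components
    bitmap = np.zeros((28, 28))          # rank 2: a 28x28 greyscale picture

    # ndim reports the rank of each array
    print(scalar.ndim, vector.ndim, bitmap.ndim)   # prints: 0 1 2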
Symbol: Something that stands for or suggests something else by reason of relationship, association, convention, or accidental resemblance. An arbitrary or conventional sign used in writing or printing relating to a particular field to represent operations, quantities, elements, relations, or qualities. (MW)

Label: A descriptive or identifying word or phrase. (MW)

Relation: An aspect or quality (such as resemblance) that connects two or more things or parts as being or belonging or working together or as being of the same kind. (MW)

Trace: A sign or evidence of some past thing. (MW)

Model: A system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs. (MW)

Statistical Model: A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables. (MW)

Semantic Model: A conceptual model represents 'concepts' (entities) and relationships between them. (MW) Semantic technologies formally represent the meaning involved in information. For example, an ontology can describe concepts, relationships between things, and categories of things. (MW) The two kinds of model are contrasted in the sketch below.
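As an illustration of the contrast between the Statistical Model and Semantic Model entries above (a minimal sketch with our own toy data and vocabulary, using NumPy; it is not taken from the use cases in this paper): the statistical model is a fitted numerical relationship between variables, while the semantic model is an explicit set of subject-predicate-object statements about concepts.

    import numpy as np

    # Statistical model: a linear relationship y ~ a*x + b estimated from observations
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([0.1, 1.9, 4.2, 5.8])
    a, b = np.polyfit(x, y, deg=1)        # least-squares estimate of the parameters

    # Semantic model: explicit concepts and relationships, here as simple triples
    ontology = [
        ("Patient", "is_a", "Person"),
        ("Diagnosis", "refers_to", "Disease"),
        ("Patient", "has", "Diagnosis"),
    ]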
Processing: A series of actions or operations conducing to an end. (MW)

Generation: A process of coming or bringing into being. (MW) Defining or originating (something, such as a mathematical or linguistic set or structure) by the application of one or more rules or operations. (MW)

Training: The process of learning the skills that you need for a particular job or activity. (Collins) In particular: model building, the process of preparing a machine learning model to be useful by feeding it data from which it can learn, i.e. detect statistical patterns. (See the sketch below.)

Engineering: The application of science and mathematics by which the properties of matter and the sources of energy in nature are made useful to people. The design and manufacture of complex products. (MW) In particular, software engineering: the methodical design, implementation, and maintenance of models and components of software architectures.

Transformation: The operation of changing (as by rotation or mapping) one configuration or expression into another in accordance with a mathematical rule. (MW)
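As a minimal sketch of the Training entry above (assuming scikit-learn as the library; the toy data and model choice are our own illustration, not prescribed by any of the patterns): a model is fed labelled instances, detects statistical patterns in them, and can then be used for inference on new instances.

    from sklearn.linear_model import LogisticRegression

    # Toy labelled data: two numeric features per instance, one binary label each
    X = [[0.1, 1.2], [0.9, 0.3], [0.2, 1.1], [1.0, 0.2]]
    y = [0, 1, 0, 1]

    model = LogisticRegression()
    model.fit(X, y)                       # training: the model learns patterns from the data
    print(model.predict([[0.8, 0.4]]))    # inference on a previously unseen instance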
Inference: A conclusion or opinion that is formed because of known facts or evidence. (A) The act of passing from one proposition, statement, or judgment considered as true to another whose truth is believed to follow from that of the former. (B) The act of passing from statistical sample data to generalisations, usually with calculated degrees of certainty. (MW)

Induction: Inference of a generalised conclusion from particular instances. (MW)

Deduction: The deriving of a conclusion by reasoning. Inference in which the conclusion about particulars follows necessarily from general or universal premises. (MW) (Induction and deduction are contrasted in the sketch below.)

Classification: Systematic arrangement in groups or categories according to established criteria. (MW)

Prediction: To calculate (some future event or condition) usually as a result of study and analysis of available pertinent data. (MW)
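To contrast the Induction and Deduction entries above, here is a small illustrative sketch in Python (our own example, with hypothetical facts): deduction applies a general rule to a particular case, while induction generalises a defeasible rule from particular observed instances.

    # Deduction: from the general premise "every bird flies" and the particular
    # fact "Tweety is a bird", conclude that Tweety flies.
    birds = {"tweety", "polly"}
    def flies(x):
        return x in birds                 # rule: bird(x) -> flies(x)
    print(flies("tweety"))                # True: follows necessarily from the premises

    # Induction: from particular observed instances, generalise a rule.
    observations = [("tweety", "bird", True), ("polly", "bird", True)]
    if all(can_fly for (_, kind, can_fly) in observations if kind == "bird"):
        print("learned rule: birds fly")  # generalised, and may be defeated by new data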
Actor: Autonomous entity acting proactively to initiate processes, based on intentions and goals. Interaction among actors leads to emergent system behaviour.

Human: Human beings interacting with AI system components and other actors.

Agent: Active software components that are proactive, rather than merely reactive. Agents can be based on many internal reasoning mechanisms.