Symbolic Learning and Reasoning with Noisy Data for Probabilistic Anchoring
Pedro Zuidberg Dos Martires, Nitesh Kumar, Andreas Persson, Amy Loutfi, Luc De Raedt
Declaratieve Talen en Artificiele Intelligentie (DTAI), Department of Computer Science, KU Leuven, 3001 Heverlee, Belgium
Center for Applied Autonomous Sensor Systems (AASS), Dept. of Science and Technology, Örebro University, 701 82 Örebro, Sweden

Correspondence: Pedro Zuidberg Dos Martires, [email protected], [email protected]
ABSTRACT
Robotic agents should be able to learn from sub-symbolic sensor data and, at the same time, be able to reason about objects and communicate with humans on a symbolic level. This raises the question of how to overcome the gap between symbolic and sub-symbolic artificial intelligence. We propose a semantic world modeling approach based on bottom-up object anchoring using an object-centered representation of the world. Perceptual anchoring processes continuous perceptual sensor data and maintains a correspondence to a symbolic representation. We extend the definitions of anchoring to handle multi-modal probability distributions and we couple the resulting symbol anchoring system to a probabilistic logic reasoner for performing inference. Furthermore, we use statistical relational learning to enable the anchoring framework to learn symbolic knowledge in the form of a set of probabilistic logic rules of the world from noisy and sub-symbolic sensor input. The resulting framework, which combines perceptual anchoring and statistical relational learning, is able to maintain a semantic world model of all the objects that have been perceived over time, while still exploiting the expressiveness of logical rules to reason about the state of objects which are not directly observed through sensory input data. To validate our approach we demonstrate, on the one hand, the ability of our system to perform probabilistic reasoning over multi-modal probability distributions, and on the other hand, the learning of probabilistic logical rules from anchored objects produced by perceptual observations. The learned logical rules are, subsequently, used to assess our proposed probabilistic anchoring procedure. We demonstrate our system in a setting involving object interactions where object occlusions arise and where probabilistic inference is needed to correctly anchor objects.
Keywords: Semantic World Modeling, Perceptual Anchoring, Probabilistic Anchoring, Statistical Relational Learning, Probabilistic Logic Programming, Object Tracking, Relational Particle Filtering, Probabilistic Rule Learning.
Statistical Relational Learning (SRL) (Getoor and Taskar, 2007; De Raedt et al., 2016) tightly integrates predicate logic with graphical models in order to extend the expressive power of graphical models towards relational logic and to obtain probabilistic logics that can deal with uncertainty. After two decades of research, a plethora of expressive probabilistic logic reasoning languages and systems exists (e.g., Sato and Kameya (2001); Richardson and Domingos (2006); Getoor (2013); Fierens et al. (2015)). One obstacle that still lies ahead in the field of SRL (but see Gardner et al. (2014) and Beltagy et al. (2016)) is to combine symbolic reasoning and learning, on the one hand, with sub-symbolic data and perception, on the other hand. The question is how to create a symbolic representation of the world from sensor data in order to reason and ultimately plan in an environment riddled with uncertainty and noise. In this paper, we will take a probabilistic logic approach to study this problem in the context of perceptual anchoring.

An alternative to using SRL or probabilistic logics would be to resort to deep learning, which is based on end-to-end learning (e.g., Silver et al. (2016)). Although exhibiting impressive results, deep neural networks do suffer from certain drawbacks. As opposed to probabilistic rules, it is, for example, not straightforward to include prior (symbolic) knowledge in a neural system. Moreover, it is often difficult to give guarantees for the behavior of neural systems, cf. the debate around safety and explainability in AI (Huang et al., 2017; Gilpin et al., 2018). This latter shortcoming is less of a concern for symbolic systems, which implies that bridging the symbolic/sub-symbolic gap is paramount.
A notion that aims to bridge the symbolic/sub-symbolic gap is perceptual anchoring, as introduced by Coradeschi and Saffiotti (2000, 2001). Perceptual anchoring tackles the problem of creating and maintaining, in time and space, the correspondence between symbols and sensor data that refer to the same physical object in the external world (a detailed overview of perceptual anchoring is given in Section 2.1). In this paper, we particularly emphasize sensor-driven bottom-up anchoring (Loutfi et al., 2005), whereby the anchoring process is triggered by the sensory input data.

A further complication in robotics, and perceptual anchoring more specifically, is the inherent dependency on time. This means that a probabilistic reasoning system should incorporate the concept of time natively. One such system, rooted in the SRL community, is the probabilistic logic programming language
Dynamic Distributional Clauses (DDC) (Nitti et al., 2016b), which can perform probabilistic inference over logic symbols and over time. In our previous work, we coupled DDC to a perceptual anchoring system (Persson et al., 2019), which endowed the perceptual anchoring system with probabilistic reasoning capabilities. A major challenge in combining perceptual anchoring with a high-level probabilistic reasoner, and one which is still an open research question, is the administration of multi-modal probability distributions in anchoring. In this paper, we extend the anchoring notation in order to additionally handle multi-modal probability distributions. A second point that we did not address in Persson et al. (2019) is the learning of the probabilistic rules that are used to perform probabilistic logic reasoning. We show that, instead of hand-coding these probabilistic rules, we can adapt existing methods from the SRL literature to learn them from raw sensor data. In other words, instead of being provided with a model of the world, a robotic agent learns this model in the form of probabilistic logical rules. These rules are then used by the robotic agent to reason about the world around it, i.e., to perform inference. In Persson et al. (2019), we showed that enabling a perceptual anchoring system to reason further allows for correctly anchoring objects under object occlusions. We borrowed the idea of encoding a theory of occlusion as a probabilistic logic theory from Nitti et al. (2014) (discussed in more detail in Subsection 2.3). While Nitti et al. operated in a strongly simplified setting, by identifying objects with AR tags, we instead used a perceptual anchoring system, identifying objects from raw RGB-D sensor data.
In contrast to the approach presented here, the theory of occlusion in these previous works was not learned but hand-coded, and it did not take into account the possibility of multi-modal probability distributions.¹ We evaluate the extensions of perceptual anchoring proposed in this paper on two showcase examples, which exhibit exactly this behavior: 1) we perform probabilistic perceptual anchoring when object occlusion induces a multi-modal probability distribution, and 2) we perform probabilistic perceptual anchoring with a learned theory of occlusion.

¹ A multi-modal probability distribution is a continuous probability distribution with strictly more than one local maximum. The key difference to a uni-modal probability distribution, such as a simple normal distribution, is that summary statistics do not adequately mirror the actual distribution. In perceptual anchoring these multi-modal distributions do occur, especially in the presence of object occlusions, and handling them appropriately is critical for correctly anchoring objects. This kind of phenomenon is well known when doing filtering and is the reason why particle filters can be preferred over Kalman filters.

We structure the remainder of the paper as follows. In Section 2, we introduce the preliminaries of this work by presenting the background and motivation of the techniques used. Subsequently, in Section 3, we discuss our first contribution by first giving a more detailed overview of our prior work (Persson et al., 2019), followed by introducing a probabilistic perceptual anchoring approach that enables anchoring in a multi-modal probabilistic state-space. We continue, in Section 4, by explaining how probabilistic logical rules are learned. In Section 5, we evaluate both our contributions on representative scenarios before closing this paper with conclusions, presented in Section 6.
Perceptual anchoring, originally introduced by Coradeschi and Saffiotti (2000, 2001), addresses a subset of the symbol grounding problem in robotics and intelligent systems. The notion of perceptual anchoring has been extended and refined since its first definition. Some notable refinements include the integration of conceptual spaces (Chella et al., 2003, 2004), the addition of bottom-up anchoring (Loutfi et al., 2005), extensions for multi-agent systems (LeBlanc and Saffiotti, 2008), considerations for non-traditional sensing modalities and knowledge-based anchoring given full-scale knowledge representation and reasoning systems (Loutfi, 2006; Loutfi and Coradeschi, 2006; Loutfi et al., 2008), and perception and probabilistic anchoring (Blodow et al., 2010). All these approaches to perceptual anchoring share, however, a number of common ingredients from Coradeschi and Saffiotti (2000, 2001), including:

• A symbolic system, including a set X = {x1, x2, ...} of individual symbols and a set P = {p1, p2, ...} of predicate symbols.
• A perceptual system, including a set Π = {π1, π2, ...} of percepts and a set Φ = {φ1, φ2, ...} of attributes with values in the domain D(φi).
• Predicate grounding relations g ⊆ P × Φ × D(Φ) that encode the correspondence between unary predicates and values of measurable attributes (i.e., the relation g maps a certain predicate to compatible attribute values).

While the traditional definition of Coradeschi and Saffiotti (2000, 2001) assumed unary encoded perceptual-symbol correspondences, this does not support the maintenance of anchors with different attribute values at different times. To address this problem, Persson et al. (2017) distinguish two different types of attributes:

• Static attributes φ, which are unary within the anchor according to the traditional definition.
• Volatile attributes φt, which are individually indexed by time t and maintained in a set of attribute instances ϕ, such that φt ∈ ϕ.

Without loss of generality, we assume from here on that all attributes stored in an anchor are volatile, i.e., that they are indexed by a time step t. Static attributes are trivially converted to volatile attributes by giving them the same attribute value in each time step.
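To make this convention concrete, the following Python sketch (ours, not code from the anchoring system; all names are illustrative) lifts a static attribute to a volatile one by replaying its value at every time step:

```python
def lift_static(value, time_steps):
    """Convert a static attribute into a volatile one by repeating
    the same attribute value at every time step."""
    return {t: value for t in time_steps}

def latest(volatile, t):
    """Return the most recent attribute instance at or before time t."""
    usable = [step for step in volatile if step <= t]
    return volatile[max(usable)] if usable else None

# A static color attribute, replayed over time steps 0..4.
color = lift_static("red", range(5))
```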
Figure 1. A conceptual illustration of the internal data structure that constitutes a single anchor, and which is first initiated by a percept π from a raw image. The volatile and static attributes are derived from this percept, while predicates such as red are derived from static attributes (which are not indexed by time), e.g., the static color histogram attribute.

Given the components above, an anchor is an internal data structure α_t^x, indexed by time t and identified by a unique individual symbol x (e.g., mug-1 and apple-4), which encapsulates and maintains the correspondences between percepts and symbols that refer to the same physical object, as depicted in Figure 1. Following the definition presented by Loutfi et al. (2005), the principal functionalities to create and maintain anchors in a bottom-up fashion, i.e., functionalities triggered by a perceptual event, are:

• Acquire – initiates a new anchor whenever a candidate object is received that does not match any existing anchor α_t^x. This functionality defines a structure α_t^x, indexed by time t and identified by a unique identifier x, which encapsulates and stores all perceptual and symbolic data of the candidate object.
• Re-acquire – extends the definition of a matching anchor α_t^x from time t−k to time t. This functionality ensures that the percepts pointed to by the anchor are the most recent and adequate perceptual representation of the object.

Based on the functionalities above, it is evident that an anchoring matching function is essential to decide whether a candidate object matches an existing anchor or not. Different approaches in perceptual anchoring vary, in particular, in how the matching function is specified.
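The acquire and re-acquire functionalities can be sketched in Python as follows (a minimal illustration under our own naming conventions, not the actual implementation of Persson et al. (2019)):

```python
import itertools

class Anchor:
    """Minimal stand-in for the anchor data structure: a unique
    individual symbol x plus time-indexed (volatile) attributes."""
    _ids = itertools.count(1)

    def __init__(self, category, t, attributes):
        self.symbol = f"{category}-{next(Anchor._ids)}"  # e.g. mug-1
        self.attributes = {t: attributes}                # volatile store

    def update(self, t, attributes):
        self.attributes[t] = attributes

def acquire_or_reacquire(anchors, candidate, t, matches):
    """Bottom-up maintenance: re-acquire the first matching anchor,
    otherwise acquire a new one. `matches` stands in for the
    anchoring matching function."""
    for anchor in anchors:
        if matches(anchor, candidate):
            anchor.update(t, candidate["attributes"])    # re-acquire
            return anchor
    anchor = Anchor(candidate["category"], t, candidate["attributes"])
    anchors.append(anchor)                               # acquire
    return anchor

# Usage with a crude position-threshold matching function.
def close_enough(anchor, candidate):
    last_t = max(anchor.attributes)
    return abs(anchor.attributes[last_t]["pos"] - candidate["attributes"]["pos"]) < 1.0

anchors = []
m1 = acquire_or_reacquire(anchors, {"category": "mug", "attributes": {"pos": 0.0}}, 0, close_enough)
m2 = acquire_or_reacquire(anchors, {"category": "mug", "attributes": {"pos": 0.2}}, 1, close_enough)
m3 = acquire_or_reacquire(anchors, {"category": "mug", "attributes": {"pos": 5.0}}, 1, close_enough)
```

The matching function is deliberately left as a parameter: it is precisely this function that varies between anchoring approaches.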
For example, in Persson et al. (2019), we have shown that the anchoring matching function can be approximated by a learned model trained with manually labeled samples collected through an annotation interface (through which the human user can interfere with the anchoring process and provide feedback about which objects in the scene match previously existing anchors).

In another recently published anchoring approach, Ruiz-Sarmiento et al. (2017) focus on spatial features and distinguish unary object features, e.g., the position of an object, from pairwise object features, e.g., the distance between two objects, in order to build a graph-based world model that can be exploited by a probabilistic graphical model (Koller et al., 2009) to leverage contextual relations between objects to support 3-D object recognition. In parallel with our previous work on anchoring, Günther et al. (2018) have further exploited this graph-based model of spatial features and propose, in addition, to learn the matching function through the use of a Support Vector Machine (trained on samples of object pairs manually labeled as "same or different object") in order to approximate the similarity between two objects. The assignment of candidate objects to existing anchors is, subsequently, calculated using prior similarity values and a
Hungarian method (Kuhn, 1955). However, in contrast to Günther et al. (2018), the matching function introduced in Persson et al. (2019) does not only rely upon spatial features (or attributes), but can also take into consideration visual features (such as color features), as well as semantic object categories, in order to approximate the anchoring matching problem.
Dynamic Distributional Clauses (DDC) (Nitti et al., 2016b) provide a framework for probabilistic programming that extends the logic programming language
Prolog (Sterling and Shapiro, 1994) to the probabilistic domain. A comprehensive treatise on the field of probabilistic logic programming can be found in De Raedt and Kimmig (2015) and Riguzzi (2018). DDC is capable of representing discrete and continuous random variables and of performing probabilistic inference. Moreover, DDC explicitly models time, which makes it predestined to model dynamic systems. The underpinning concepts of DDC are related to ideas presented in Milch et al. (2007) but embedded in logic programming. Related ideas of combining discrete time steps, Bayesian learning, and logic programming are also presented in Angelopoulos and Cussens (2008, 2017).

An atom p(t1, ..., tn) consists of a predicate p/n of arity n and terms t1, ..., tn. A term is either a constant (written in lowercase), a variable (in uppercase), or a function symbol. A literal is an atom or its negation. Atoms which are negated are called negative atoms and atoms which are not negated are called positive atoms.

A distributional clause is of the form h ∼ D ← b1, ..., bn, where ∼ is a predicate in infix notation and the bi's are literals, i.e., atoms or their negation. h is a term representing a random variable and D tells us how the random variable is distributed. The meaning of such a clause is that each grounded instance of a clause (h ∼ D ← b1, ..., bn)θ defines a random variable hθ that is distributed according to Dθ whenever all literals biθ are true. A grounding substitution θ = {V1/t1, ..., Vn/tn} is a transformation that simultaneously substitutes all logical variables Vi in a distributional clause with non-variable terms ti. DDC can be viewed as a language that defines conditional probabilities for discrete and continuous random variables: p(hθ | b1θ, ..., bnθ) = Dθ.

Example 1:
Consider the following DDC program:

    n ~ poisson(6).
    pos(P):0 ~ uniform(0,100) ← n ~= N, between(1,N,P).
    pos(P):t+1 ~ gaussian(X+3, Σ) ← pos(P):t ~= X.
    left(O1,O2):t ~ finite([0.99:true, 0.01:false]) ←
        pos(O1):t ~= P1, pos(O2):t ~= P2, P1 < P2, P1 > 0.
The last clause states that, with probability 0.99, object O1 is to the left of object O2 whenever O1 has a positive coordinate position that is smaller than the position of O2. A DDC program P is a set of distributional and/or definite clauses (as in Prolog). A DDC program defines a probability distribution p(x) over possible worlds x.

Example 2:
One of the uncountably many possible worlds encoded by the program in Example 1. The sampled number n determines that two objects exist, for which the ensuing distributional clauses then generate a position and the left/2 relationship:

    n ~= 2.
    pos(1):t ~= 30.5.      pos(2):t ~= 63.2.
    pos(1):t+1 ~= 32.4.    pos(2):t+1 ~= 58.8.
    left(1,2):t ~= true.   left(2,1):t ~= false.

When performing inference within a specific time step, DDC deploys importance sampling combined with backward reasoning (SLD-resolution), likelihood weighting, and Rao-Blackwellization (Nitti et al., 2016a). Inferring probabilities in the next time step given the previous time step is achieved through particle filtering (Nitti et al., 2013). If a DDC program does not contain any predicates labelled with a time index, the program represents a Distributional Clauses (DC) (Gutmann et al., 2011) program, where filtering over time steps is not necessary.
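To illustrate the filtering step, the following Python sketch (our illustration, not DDC's actual inference engine, which additionally uses likelihood weighting and Rao-Blackwellization; the motion noise, observation model, and particle count are assumptions) runs a bootstrap particle filter over the one-dimensional dynamics of Example 1, pos(P):t+1 ~ gaussian(X+3, Σ):

```python
import math
import random

def pf_step(particles, observation, sigma_motion=1.0, sigma_obs=1.0):
    """One bootstrap-particle-filter step for the dynamics of Example 1:
    propagate with pos:t+1 ~ gaussian(pos:t + 3, sigma_motion), weight by
    a Gaussian observation likelihood, and resample."""
    # 1. propagate each particle through the motion model
    proposed = [random.gauss(p + 3.0, sigma_motion) for p in particles]
    # 2. weight each proposed particle by the observation likelihood
    weights = [math.exp(-0.5 * ((observation - p) / sigma_obs) ** 2)
               for p in proposed]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3. resample with replacement, proportionally to the weights
    return random.choices(proposed, weights=weights, k=len(proposed))

random.seed(0)
particles = [random.uniform(0, 100) for _ in range(500)]  # pos(P):0 ~ uniform(0,100)
for observation in [33.0, 36.0, 39.0]:                    # an object drifting by +3 per step
    particles = pf_step(particles, observation)
estimate = sum(particles) / len(particles)
```

After three observations consistent with the +3 drift, the particle cloud has collapsed from the uniform prior onto the neighborhood of the last observation.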
Object occlusion is a challenging problem in visual tracking and a plethora of different approaches exist that tackle different kinds of occlusions; a thorough review of the field is given in Meshgi and Ishii (2015). The authors use three different attributes of an occlusion to categorize it: the extent (partial or full occlusion), the duration (short or long), and the complexity (simple or complex).² Another classification separates occlusions into dynamic occlusions, where objects in the foreground occlude each other, and scene occlusions, where objects in the background model are located closer to the camera and occlude target objects by being moved between the camera and the target objects.

² An occlusion of an object is deemed complex if, during the occlusion, the occluded object considerably changes one of its key characteristics (e.g., position, color, size). An occlusion is simple if it is not complex. Further categories exist; we refer the reader to Vezzani et al. (2011); Meshgi and Ishii (2015).

Meshgi and Ishii report that the majority of research on occlusions in visual tracking has been done on partial, temporal and simple occlusions. Furthermore, they report that none of the approaches examined in the comparative studies of Smeulders et al. (2013) and Wu et al. (2013) handles either partial complex occlusions or full long complex occlusions. To the best of our knowledge, our previous paper on combining bottom-up anchoring and probabilistic reasoning constitutes the first tracker that is capable of handling occlusions that are full, long and complex (Persson et al., 2019). This was achieved by declaring a theory of occlusion (ToO) expressed as dynamic distributional clauses.

Example 3: An excerpt from the set of clauses that constitute the ToO. The example clause describes the conditions under which an object is considered a potential Occluder of another object Occluded.

    occluder(Occluded,Occluder):t+1 ~ finite(1.0:true) ←
        observed(Occluded):t,
        \+observed(Occluded):t+1,    % not observed in next time step
        position(Occluded):t ~= (X,Y,Z),
        position(Occluder):t+1 ~= (XH,YH,ZH),
        D is sqrt((X-XH)^2+(Y-YH)^2), Z ...

Out of all the potential Occluders, the actual occluding object is then sampled uniformly:

    occluded_by(Occluded,Occluder):t+1 ←
        sample_occluder(Occluded):t+1 ~= Occluder.
    sample_occluder(Occluded):t+1 ~ uniform(ListOfOccluders) ←
        findall(O, occluder(Occluded, O):t+1, ListOfOccluders).

Declaring a theory of occlusion and coupling it to the anchoring system allows the anchoring system to perform occlusion reasoning and to track objects not by directly observing them but by reasoning about relationships that occluded objects have entered with visible (anchored) objects. The idea of declaring a theory of occlusion first appeared in Nitti et al. (2013), where, however, the data association problem was assumed to be solved by using AR tags.

As the anchoring system was not able to handle probabilistic states in our previous work, the theory of occlusion had to describe unimodal probability distributions. In this paper we repair this deficiency (cf. Section 3.2). Moreover, the theory of occlusion had to be hand-coded (which was also the case for Nitti et al. (2013)). We replace the hand-coded theory of occlusion by a learned one (cf. Section 4).

Considering our previous work from the anchoring perspective, our approach is most related to the techniques proposed in Elfring et al.
(2013), who introduced the idea of probabilistic multiple hypothesis anchoring in order to match and maintain probabilistic tracks of anchored objects, and thus maintain an adaptable semantic world model. From the perspective of how occlusions are handled, Elfring et al.'s work and ours differ, however, substantially. Elfring et al. handle occlusions that are due to scene occlusion. Moreover, the occlusions are handled by means of a multiple hypothesis tracker, which is suited for short occlusions rather than long occlusions. The limitations of using multiple hypothesis tracking for world modeling, and consequently also for handling object occlusions in anchoring scenarios (as in Elfring et al. (2013)), have likewise been pointed out in a publication by Wong et al. (2015). Wong et al. instead reported the use of a clustering-based data association approach (as opposed to a tracking-based approach) in order to aggregate a consistent semantic world model from multiple viewpoints, and hence compensate for partial occlusions from a single viewpoint perspective of the scene.

In this section, we present a probabilistic anchoring framework based on our previous work on conjoining probabilistic reasoning and object anchoring (Persson et al., 2019). An overview of our proposed framework, which is implemented utilizing the libraries and communication protocols available in the Robot Operating System (ROS), can be seen in Figure 2.³ However, our prior anchoring system, seen in Figure 2–(2), was unable to handle probabilistic states of objects. While the probabilistic reasoning module, seen in Figure 2–(3), was able to model the position of an object as a probability distribution over possible positions, the anchoring system only kept track of a single deterministic position: the expected position of an object.

³ The code can be found online at: https://bitbucket.org/reground/anchoring
Therefore, we extend the anchoring notation towards a probabilistic anchoring approach, in order to enable the anchoring system to handle multi-modal probability distributions.

Figure 2. The overall framework is divided into three basic sub-systems (or modules): (1) an initial perceptual processing pipeline for detecting, segmenting and processing perceived objects, (2) an anchoring system for creating and maintaining updated and consistent representations (anchors) of perceived objects, and (3) an inference system for aiding the anchoring system and logically tracking objects in complex dynamic scenes.

Before presenting our proposed probabilistic anchoring approach, we first introduce the necessary requirements and assumptions (which partly originate in our previous work, Persson et al. (2019)):

1. We assume that unknown anchor representations, α^y, are supplied by a black-box perceptual processing pipeline, as exemplified in Figure 2–(1). They consist of extracted attribute measurements and corresponding grounded predicate symbols. We further assume that for each perceptual representation of an object, we have the following attribute measurements: a color attribute (φ^color_y), a position attribute (φ^pos_y), and a size attribute (φ^size_y).

Example 4: In this paper we use the combined Depth Seeding Network (DSN) and Region Refinement Network (RRN), as presented by Xie et al. (2019), for the purpose of segmenting arbitrary object instances in tabletop scenarios.
This two-stage approach leverages both RGB and depth data (given by a Kinect V2 RGB-D sensor), in order to first segment rough initial object masks (based on depth data), followed by a second refinement stage of these object masks (based on RGB data). The resulting output for each segmented object is then both a 3-D spatial percept (φ^spatial_y) and a 2-D visual percept (φ^visual_y). For each segmented spatial percept, and with the use of the Point Cloud Library (PCL), both a position attribute, measured as the 3-D geometrical center, and a size attribute, measured as the 3-D geometrical bounding box, are determined. Similarly, using the Open Computer Vision Library (OpenCV), a color attribute is measured as the discretized color histogram (in HSV color-space) for each segmented visual percept, as depicted in Figure 3.

2. In order to semantically categorize objects, we assume a Convolutional Neural Network (CNN), such as the GoogLeNet model (Szegedy et al., 2015), is available, cf. Persson et al. (2017). In the context of anchoring, the inputs for this model are segmented visual percepts (π^visual_y), while the resulting object categories, denoted by the predicate p^category_y ∈ P, are given together with the predicted probabilities φ^category_y (cf. Section 2.1). An example of segmented objects, together with the object categories given by an integrated GoogLeNet model, is illustrated in Figure 4. In addition, this integrated model is also used to enhance the traditional acquire functionality such that a unique identifier x is generated based on the object category symbol p^category. For example, if the anchoring system detects an object it has not seen before and classifies it as a cup, a corresponding unique identifier x = cup-4 could be generated (where the 4 means that this is the fourth distinct instance of a cup object perceived by the system).

Figure 3.
Examples of the measured color attribute (measured as the discretized color histogram over each segmented object). Figure 4. Examples of semantically categorized objects (depicted with the predicted object category for each segmented object).

3. We require the presence of a probabilistic inference system coupled to the anchoring system, as illustrated in Figure 2–(3). The anchoring system is responsible for maintaining objects perceived by the sensory input data and for maintaining the observable part of the world model. Maintained anchored object representations are then treated as observations in the inference system, which uses relational object tracking to infer the state of occluded objects through their relations with perceived objects in the world. This inferred belief of the world is then sent back to the anchoring system, where the state of occluded objects is updated. The feedback loop between the anchoring system and the probabilistic reasoner results in an additional anchoring functionality (Persson et al., 2019):

• Track – extends the definition of an anchor α^x from time t−1 to time t. This functionality responds directly to the state of the probabilistic object tracker, which ensures that the percepts pointed to by the anchor are the adequate perceptual representation of the object, even though the object is currently not perceived.

Even though the mapping between measured attribute values and corresponding predicate symbols is an essential facet of anchoring, we will not cover the predicate grounding in further detail in this paper. However, for completeness, we refer to Figure 3 and exemplify that the predicate grounding relation of a color attribute can, intuitively, be expressed as the encoded correspondence between a specific peak in the color histogram and a certain predicate symbol (e.g., the symbol black for the mug object).
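As a toy illustration of such a grounding relation g (entirely our own sketch; the actual system grounds predicates from HSV histograms, and the six-bin layout below is an assumption made for this example), a peak in a discretized hue histogram can be mapped to a color predicate symbol:

```python
def ground_color(hue_histogram):
    """Toy predicate grounding relation: map the peak bin of a
    discretized hue histogram to a color predicate symbol.
    The six-bin layout is an assumption made for this sketch."""
    symbols = ["red", "yellow", "green", "cyan", "blue", "magenta"]
    peak = max(range(len(hue_histogram)), key=hue_histogram.__getitem__)
    return symbols[peak]

# A histogram whose mass concentrates in the first (red) hue bin.
symbol = ground_color([0.70, 0.10, 0.05, 0.05, 0.05, 0.05])
```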
Likewise, a future greater ambition of this work is to establish a practical framework through which the spatial relationships between objects are encoded and expressed using symbolic values, e.g., object A is underneath object B.

The entry point for the anchoring system, seen in Figure 2–(2), is a learned matching function. This function assumes a bottom-up approach to perceptual anchoring, described in Loutfi et al. (2005), where the system constantly receives candidate anchors and invokes a number of attribute-specific matching similarity formulas (i.e., one matching formula for each measured attribute). More specifically, a set of attributes Φ_y of an unknown candidate anchor α^y_t (given at current time t) is compared against the set of attributes Φ_x of an existing anchor α^x_{t−k} (defined at time t−k) through attribute-specific similarity formulas. For instance, the similarity between the position attribute φ^pos_y of an unknown candidate anchor and the last updated position φ^pos_{t−k,x} of an existing anchor is calculated according to the L2-norm (in 3-D space), which is further mapped to a normalized similarity value (Blodow et al., 2010):

    d_pos(φ^pos_{t−k,x}, φ^pos_{t,y}) = e^(−L2(φ^pos_{t−k,x}, φ^pos_{t,y}))    (1)

Hence, the similarity between two position attributes is given in the interval [0, 1], where a value of 1 is equivalent to perfect correspondence. Likewise, the similarity between two color attributes is calculated by the color correlation, while the similarity between size attributes is calculated according to the generalized Jaccard similarity (for further details regarding similarity formulas, we refer to our previous work (Persson et al., 2019)).
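Equation 1 translates directly into code; the following Python sketch (ours) computes the normalized position similarity for 3-D position attributes:

```python
import math

def d_pos(p, q):
    """Equation 1: normalized position similarity exp(-||p - q||_2)
    for two 3-D position attribute values."""
    l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return math.exp(-l2)

s_same = d_pos((0.1, 0.2, 0.3), (0.1, 0.2, 0.3))  # identical positions
s_near = d_pos((0.0, 0.0, 0.0), (0.1, 0.0, 0.0))
s_far  = d_pos((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

Since the L2 distance is non-negative, the similarity indeed lies in (0, 1], and identical positions map to exactly 1.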
The similarities between the attributes of a known anchor and an unknown candidate anchor are then fed to the learned matching function, which determines whether the unknown anchor is to be acquired as a new anchor or re-acquired as an existing anchor.

In our prior work on anchoring, the attribute values have always been assumed to be deterministic within a single time step. This assumption keeps the anchoring system de facto deterministic, even though it is coupled to a probabilistic reasoning module. We extend the anchoring notation with two distinct specifications of (volatile) attributes:

1. An attribute φt ∈ ϕ is deterministic at time t if it takes a single value from the domain D(φt).
2. An attribute φt ∈ ϕ is probabilistic at time t if it is distributed according to a probability distribution Pr(φt) over the domain D(φt) at time step t.

Having a probabilistic attribute value φt (e.g., φ^pos_{t−k,x} in Equation 1) means that the similarity calculated with the probabilistic attribute values (e.g., the similarity value d_pos) will also be probabilistic. Next, in order to use an anchor matching function together with probabilistic similarity values, two extensions are possible: 1) extend the anchor matching function to accept random variables (i.e., probabilistic similarity values), or 2) retrieve a point estimate of the random variable.

We chose the second option as this allows us to reuse the anchor matching function learned in Persson et al. (2019) without the additional expense of collecting data and re-training the anchor matching function. The algorithm to produce the set of matching similarity values that are fed to the anchor matching function is given in Algorithm 1, where lines 4-5 are the extension proposed in this work.

The point_estimate function in Algorithm 1 (line 5) is attribute-specific (indicated by the subscript φ_{t−k,x}), i.e.,
we can choose a different point estimation function for color attributes than for position attributes.

Algorithm 1 Attribute Compare
Input: Φ_x, Φ_y – sets of anchor attribute values
Output: D_{x,y} – set of matching similarity values
1: function AttributeCompare(Φ_x, Φ_y)
2:     D_{x,y} ← empty set
3:     for each φ_{t−k,x} ∈ Φ_x and φ_{t,y} ∈ Φ_y do
4:         if φ_{t−k,x} is probabilistic then
5:             D_{x,y} +← point_estimate_{φ_{t−k,x}}(d(φ_{t−k,x}, φ_{t,y}))
6:         else    ▷ deterministic case
7:             D_{x,y} +← d(φ_{t−k,x}, φ_{t,y})
8:     return D_{x,y}

An obvious attribute upon which reasoning can be done is the position attribute, for example, in the case of possible occlusions. In other words, we would like to perform probabilistic anchoring while taking into account the probability distribution of an anchor's position. A reasonable goal is then to match an unknown candidate anchor with the most likely anchor, i.e., with the anchor whose position attribute value is located at the highest mode of the probability distribution of the position attribute values. This is achieved by replacing line 5 in Algorithm 1 with:

    F^pos_x ← { φ^pos_{t−k,x} | ∂Pr(φ^pos_{t−k,x}) / ∂φ^pos_{t−k,x} = 0 }    (2)
    D_{x,y} +← max_{φ^pos ∈ F^pos_x} d^pos(φ^pos, φ^pos_{t,y})    (3)

F^pos_x is the set of positions situated at the modes of the probability distribution Pr(φ^pos_{t−k,x}). In Equation 3 we take the max, as the co-domain of the position similarity value d^pos is [0, 1], where 1 reflects perfect correspondence (cf. Equation 1).

In Persson et al. (2019), we approximated the probabilistic state of the world in the inference system (cf. Figure 2) by N particles, which are updated by means of particle filtering. The precise information that is passed from the inference system to the anchoring system is a list of N particles that approximate a (possibly) multi-modal belief of the world.
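The mode-based point estimate of Equations 2-3 can be sketched as follows; here the mode set F^pos_x is assumed to be given (in practice it would be extracted from the anchor's position distribution), and the function name `mode_based_similarity` is ours:

```python
import math

def d_pos(phi_x, phi_y):
    # Equation 1: similarity decays exponentially with Euclidean distance.
    return math.exp(-math.dist(phi_x, phi_y))

def mode_based_similarity(modes_x, phi_pos_y):
    # Equations 2-3: F^pos_x holds the positions at the modes of the
    # anchor's position distribution; the point estimate fed to the
    # matching function is the best similarity over those modes.
    return max(d_pos(m, phi_pos_y) for m in modes_x)
```

With a bimodal belief whose modes are, say, (0,0,0) and (5,0,0), a candidate anchor observed at either mode scores a similarity close to 1, whereas a candidate between the modes scores poorly, exactly the behavior needed to resolve occlusions.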
More specifically, an anchor α_x^t is updated according to the N particles of possible states of a corresponding object, maintained in the inference system, such that N possible 3-D positions are added to the volatile position attributes ϕ^pos_x. In practice we assume that samples are only drawn around the modes of the probability distribution, which means that we can replace line 5 of Algorithm 1 with:

    D_{x,y} +← max_i d^pos(φ^pos_{t−k,x,i}, φ^pos_{t,y}) = max_i e^{−L2(φ^pos_{t−k,x,i}, φ^pos_{t,y})}    (4)

where φ^pos_{t−k,x,i} is a sampled position and i ranges from 1 to the number of samples N.

Performing probabilistic inference in the coordinate space is a choice made in the design of the probabilistic anchoring system. Instead, the probabilistic tracking could also be done in the HSV color space, for instance. In this case, the similarity measure used in Algorithm 1 would have to be adapted accordingly. It is also conceivable to combine the tracking in coordinate space and color space. This introduces, however, the complication of finding a similarity measure that works on the coordinate space and the color space at the same time. A solution to this would be to, yet again, learn this similarity function from data (Persson et al., 2019).

While several approaches exist in the SRL literature that learn probabilistic relational models, most of them focus on parameter estimation (Sato, 1995; Friedman et al., 1999; Taskar et al., 2002; Neville and Jensen, 2007), and structure learning has been restricted to discrete data. Notable exceptions include the recently proposed hybrid relational formalism by Ravkic et al. (2015), which learns relational models in a discrete-continuous domain but has not been applied to dynamics or robotics, and the related approach of Nitti et al. (2016b), where a relational tree learner, DDC-TL, learns both the structure and the parameters of distributional clauses.
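The particle-based variant of Equation 4 can be illustrated with a small example; the data and helper name below are ours, but the computation is the stated max over sampled positions:

```python
import math

def eq4_similarity(particles, candidate):
    # D_{x,y} += max_i e^{-L2(phi^pos_{t-k,x,i}, phi^pos_{t,y})}  (Equation 4)
    return max(math.exp(-math.dist(p, candidate)) for p in particles)

# A bimodal belief: the hidden ball may sit under either of two containers.
particles = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0),
             (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]

# A detection near either mode matches the anchor strongly, whereas a
# single-point summary such as the particle mean would match neither.
near_mode = eq4_similarity(particles, (5.0, 0.0, 0.0))
near_mean = eq4_similarity(particles, (2.5, 0.0, 0.0))
```

This is why the multi-modal representation matters: taking the maximum over particles preserves every hypothesis about where the occluded object might reappear.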
DDC-TL has been evaluated on learning action models (pre- and post-conditions) in a robotics setting from before- and after-states of executing the actions. However, the approach had several limitations: it simplified perception by resorting to AR tags for identifying the objects, it did not consider occlusion, and it could not deal with uncertainty or noise in the observations.

A more general approach to learning distributional clauses, extended with statistical models, was proposed in Kumar et al. (2020). Such a statistical model relates continuous variables in the body of a distributional clause to parameters of the distribution in the head of the clause. The approach simultaneously learns the structure and parameters of (non-dynamic) distributional clauses, and estimates the parameters of the statistical model in the clauses. A DC program consisting of multiple distributional clauses is capable of expressing intricate probability distributions over discrete and continuous random variables. A further shortcoming of DDC-TL (also tackled by Kumar et al.) is its inability to learn in the presence of background knowledge, that is, additional (symbolic) probabilistic information about objects in the world and relations (such as spatial relations) among the objects that the learning algorithm should take into consideration.

However, until now, the approach presented in Kumar et al. (2020) has only been applied to the problem of autocompletion of relational databases by learning a (non-dynamic) DC program. We now demonstrate with an example how this general approach can also be applied to learning dynamic distributional clauses in a robotics setting. A key novelty in the context of perceptual anchoring is that we learn a DDC program that allows us to reason about occlusions.

Example 5: Consider again a scenario where objects might get fully occluded by other objects.
We would now like to learn the ToO that describes whether an object is occluded or not, given multiple observations of the before and after state. In DDC we represent observations through facts as follows:

    pos(o1_exp1):t ~= 2.3.
    pos(o1_exp1):t+1 ~= 9.3.
    pos(o2):t ~= 2.2.
    pos(o2):t+1 ~= 9.2.
    occluded_by(o1_exp1,o2_exp1):t+1.
    pos(o3_exp1):t ~= 8.3.
    ...

For the sake of clarity, we have considered only one-dimensional positions in this example.

Given the data in the form of dynamic distributional clauses, we are now interested in learning the ToO instead of relying on a hand-coded one, as in Example 3. An excerpt from the set of clauses that constitute a learned ToO is given below. As in Example 3, the clauses describe the circumstances under which an object (Occluded) is potentially occluded by another object (Occluder).

    occluder(Occluded,Occluder):t+1 ~ finite(1.0:false) ←
        occluded_by(Occluded,Occluder):t,
        observed(Occluded):t+1.
    occluder(Occluded,Occluder):t+1 ~ finite(0.92:true,0.08:false) ←
        occluded_by(Occluded,Occluder):t,
        \+observed(Occluded):t+1.
    occluder(Occluded,Occluder):t+1 ~ finite(P1:true,P2:false) ←
        \+occluded_by(Occluded,Occluder):t,
        \+observed(Occluded):t+1,
        distance(Occluded,Occluder):t ~= Distance,
        logistic([Distance],[-16.9,0.8],P1),
        P2 is 1-P1.

Note that, in the second-to-last line of the last clause above, the arbitrary threshold on the Distance is superseded by a learned statistical model, in this case a logistic regression, which maps the input parameter Distance to the probability P1:

    P1 = 1 / (1 + e^{16.9 × Distance − 0.8})    (5)

Replacing the hand-coded occluder rule with the learned one in the theory of occlusion allows us to track occluded objects with a partially learned model of the world.

In order to learn dynamic distributional clauses, we first map the predicates with subscripts that refer to the current time step t and the next time step t+1 to standard predicates, which gives us an input DC program. For instance, we map pos(o1_exp1):t to pos_t(o1_exp1), and occluder(o1_exp1,o2_exp2):t+1 to occluder_t1(o1_exp1,o2_exp2). The method introduced in Kumar et al. (2020), with an implementation available at https://github.com/niteshroyal/DreaML, can now be applied for learning distributional clauses for the target predicate occluder_t1(o1_exp1,o2_exp2) from the input DC program.

Clauses for the target predicate are learned by inducing a distributional logic tree; an example of such a tree is shown in Figure 5. The key idea is that the set of clauses for the same target predicate is represented by a distributional logic tree, which satisfies the mutual exclusiveness property of distributional clauses. This property states that if there are two distributional clauses defining the same random variable, their bodies must be mutually exclusive. Internal nodes of the tree correspond to atoms in the body of learned clauses. A leaf node corresponds to a distribution in the head and to a statistical model in the body of a learned clause. A path beginning at the root node and proceeding to a leaf node in the tree corresponds to a clause. Parameters of the distribution and the statistical model of the clause are estimated by maximizing the expectation of the log-likelihood of the target in partial possible worlds. The worlds are obtained by proving all possible groundings of the clause in the input DC program. The structure of the induced tree defines the structure of the learned clauses. The approach requires a declarative bias to restrict the search space while inducing the tree.

Figure 5.
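As a sanity check, the learned statistical model of Equation 5 can be evaluated directly; the helper name `occluder_prob` is ours, while the weights are the ones printed in the learned clause:

```python
import math

def occluder_prob(distance, w=-16.9, b=0.8):
    # Equation 5 with the weights from the learned clause:
    # P1 = 1 / (1 + e^{16.9 * Distance - 0.8}).
    return 1.0 / (1.0 + math.exp(-(w * distance + b)))
```

At zero distance the model assigns roughly a 0.69 probability of being the occluder, and the probability drops off steeply as the distance grows, which is the soft, data-driven replacement for a hard distance threshold.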
A distributional logic tree that represents learned clauses for the target occluder(Occluded,Occluder):t+1. The leftmost path corresponds to the first clause, the rightmost path corresponds to the last clause for occluder(Occluded,Occluder):t+1 in Example 5. Internal nodes such as occluder(Occluded,Occluder):t and observed(Occluded):t+1 are discrete features, whereas internal nodes such as distance(Occluded,Occluder):t+1 ~= Distance are continuous features.

In summary, the input to the learning algorithm is a DC program consisting of:
• background knowledge, in the form of DC clauses;
• observations, in the form of DC clauses (these constitute the training data);
• the declarative bias, which is necessary to specify the hypothesis space of the DC program (Adé et al., 1995);
• the target predicates for which clauses should be learned.

The output is:
• a set of DC clauses, represented as a tree, for each target predicate specified in the input.

Once the clauses are learned, predicates are mapped back to predicates with subscripts to obtain dynamic distributional clauses. For instance, occluder_t1(Occluded,Occluder) in the learned clauses is mapped back to occluder(Occluded,Occluder):t+1.

The data used for learning the theory of occlusion consists of training points of before-after states of two kinds. The first kind are pairs describing the transition of an object from being observed to being occluded. The second kind of data pairs describe an object being occluded in the current state as well as in the next state. Examples of two raw data points of the first kind can be seen in Figure 6. The processed data that was fed to the distributional clauses learner is available online (https://bitbucket.org/reground/anchoring/downloads/), as are models with the learned theory of occlusion (https://bitbucket.org/reground/anchoring).

Figure 6.
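The correspondence between tree paths and mutually exclusive clause bodies can be sketched with a toy tree; the tree below is a hypothetical miniature (its labels are illustrative only, not the learned model), and the function name `tree_paths` is ours:

```python
def tree_paths(tree, path=()):
    # Leaves hold the clause head (a distribution, possibly parameterized
    # by a statistical model); internal nodes hold body atoms.  Each
    # root-to-leaf path yields one clause, and because every test occurs
    # positively on one branch and negated ("\+") on the other, the
    # resulting clause bodies are mutually exclusive by construction.
    if not isinstance(tree, tuple):
        return [(path, tree)]
    test, yes, no = tree
    return (tree_paths(yes, path + (test,)) +
            tree_paths(no, path + ("\\+" + test,)))

# Hypothetical miniature of the tree in Figure 5.
tree = ("occluded_by(O,B):t",
        ("observed(O):t+1",
         "finite(1.0:false)",
         "finite(0.92:true,0.08:false)"),
        "logistic(Distance)")
clauses = tree_paths(tree)
```

Each (body, head) pair in `clauses` corresponds to one distributional clause, and no ground instance can satisfy two bodies at once, which is exactly the mutual exclusiveness property described above.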
Depicted are two training points in the data set that were used to learn the transition rule of an object to another object. The panels on the left show a ball that is being occluded by a box, and on the right, the same ball that is being grabbed by a hand (or a skin object, as we have only trained our used GoogLeNet model to recognize general human skin objects instead of particular human body parts, cf. Section 3.1). The plotted dots on top of the occluding object represent samples drawn from the probability distribution of the occluded object, in other words, the object that is labeled in the data set to transition into the occluding counterpart.

A probabilistic anchoring system that is coupled to an inference system (cf. Section 3.2) is comprised of several interacting components. This turns the evaluation of such a combined framework, with many integrated systems, into a challenging task. We, therefore, evaluate the integrated framework as a whole on representative scenarios (videos of which are online) that demonstrate our proposed extensions to perceptual anchoring. In Section 5.1, we demonstrate how the extended anchoring system can handle probabilistic multi-modal states (described in Section 3). In Sections 5.2 and 5.3, we show that semantic relational object tracking can be performed with learned probabilistic logic rules (in the form of a DDC program) instead of handcrafted ones.

We present the evaluation in the form of screenshots captured during the execution of a scenario where we obscure the stream of sensor data. We start out with three larger objects (two mug objects and one box object), and one smaller ball object. During the occlusion phase, seen in Figure 7, the RGB-D sensor is covered by a human hand and the smaller ball is hidden underneath one of the larger objects. In Figure 7, it should also be noted that the anchoring system preserves the latest update of the objects, which is here illustrated by the outlined contour of each object.
At the time that the sensory input stream is uncovered, and there is no longer any visual perceptual input of the ball object, the system can only speculate about the whereabouts of the missing object. Hence, the belief of the ball's position becomes a multi-modal probability distribution, from which we draw samples, as seen in Figure 7. At this point, we are, however, able to track the smaller ball through its probabilistic relationships with the other, larger objects. During all the movements of the larger objects, the probabilistic inference system manages to track the modes of the probability distribution of the position of the smaller ball. The probability distribution for the position of the smaller ball (approximated by N samples) is continuously fed back to the anchoring system. Consequently, once the hidden ball is revealed and reappears in the scene, as seen in Figure 7, the ball is correctly re-acquired as the initial ball-1 object. This would not have been possible with a non-probabilistic anchoring approach (videos of the scenarios are available at https://vimeo.com/manage/folders/1365568).

Figure 7. Screenshots captured during the execution of a scenario where the stream of sensor data is obscured. Visually perceived anchored objects are symbolized by unique anchor identifiers (e.g., mug-1), while occluded hidden objects are depicted by plotted particles that represent possible positions of the occluded object in the inference system. The screenshots illustrate a scenario where the RGB-D sensor is covered and a ball is hidden under one of three larger objects. These larger objects are subsequently shuffled around before the whereabouts of the hidden ball is revealed.

The conceptually easiest ToO is one that describes the occlusion of one object by another object. Using the method described in Section 4, we learned such a ToO, which we demonstrate in Figure 8. Shown are two scenarios.
In the scenario in the upper row, a can gets occluded by a box, shown in the second screenshot. The can is subsequently tracked through its relation with the observed box and successfully re-anchored as can-1 once it is revealed. Note that in the second screenshot, the mug is also briefly believed to be hidden under the box, shown through the black dots, as the mug is temporarily obscured behind the box and not observed by the vision system. However, once the mug is observed again, the black dots disappear. In the second scenario, we occlude one of two ball objects with a box and track the ball, again through its relation with the box. Note that some of the probability mass accounts for the possibility of the occluded ball being occluded by the mug. This is due to the fact that the learned rule is probabilistic. In both scenarios, we included background knowledge that specifies that a ball cannot be the occluder of an object (it does not afford to be an occluder). This is also why we see probability mass of the occluded ball at the mug's location and not at the observed ball's location in the second scenario.

Figure 8. The two scenarios show how a learned ToO is used to perform semantic relational object tracking. In both scenarios, an object is occluded by a box and successfully tracked before the occluded object is revealed and again re-acquired as the same initial object.

Learning (probabilistic) rules, instead of a black-box function, has the advantage that a set of rules can easily be extended with further knowledge. For example, if we would like the ToO to be recursive, i.e., objects can be occluded by objects that are themselves occluded, we simply have to add the following rule to the DDC program describing the theory of occlusion:

    occluded_by(Occluded,Occluder):t+1 ←
        occluded_by(Occluded,Occluder):t,
        \+observed(Occluded):t+1,
        \+observed(Occluder):t+1,
        occluded_by(Occluder,_):t+1.
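The effect of the recursive rule above can be sketched as a transitive closure over occlusion links; the function name `transitively_occluded` and the dictionary encoding are ours, and the sketch assumes acyclic occlusion chains:

```python
def transitively_occluded(occluded_by, obj):
    # Follow occluded_by links (assumed acyclic here) to collect every
    # object hiding `obj`: an occluder that is itself occluded passes
    # the occlusion on, mirroring the recursive rule.
    chain, current = [], obj
    while current in occluded_by:
        current = occluded_by[current]
        chain.append(current)
    return chain

# The ball is hidden under the mug, and the mug under the box.
links = {"ball-1": "mug-1", "mug-1": "box-1"}
```

Calling `transitively_occluded(links, "ball-1")` yields both the mug and the box, so moving the box must drag along the belief about both hidden objects.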
Extending the ToO from Section 5.2 with the above rule enables the anchoring system to handle recursive occlusions. We demonstrate such a scenario in Figure 9. Initially, we start this scenario with a ball, a mug, and a box object (which in the beginning is misclassified as a block object, cf. Figure 4). In the first case of occlusion, seen in Figure 9, we have the same type of uni-modal occlusion as described in the previous Section 5.2, where the mug occludes the ball and, subsequently, triggers the learned relational transition (where plotted yellow dots represent samples drawn from the probability distribution of the occluded ball object). In the second, recursive case of occlusion, seen in Figure 9, we proceed by also occluding the mug with the box. The above rule governs this transitive occlusion, triggered when the ball is still hidden underneath the mug and the mug is occluded by the box. This is illustrated here by both yellow and black plotted dots that represent samples drawn from the probability distributions of the occluded mug and the transitively occluded ball object, respectively. Consequently, once the box is moved, both the mug and the ball are tracked through the transitive relation with the occluding box. Conversely, it can be seen in Figure 9 that once the mug object is revealed, the object is correctly re-acquired as the same mug-1 object, while the relation between the mug and the occluded ball object is still preserved. Finally, as the ball object is revealed, it can also be seen that the object is, likewise, correctly re-acquired as the same ball-1 object.

Figure 9. A scenario that demonstrates transitive occlusions based on learned rules for handling the theory of occlusion.
First the ball is occluded by the mug (indicated by the yellow dots) and subsequently the mug is in turn occluded by the box (indicated by the black dots). Once the mug is observed again, the ball is still believed to be occluded by the mug.

We have presented a two-fold extension to our previous work on semantic world modelling (Persson et al., 2019), where we proposed an approach to couple an anchoring system to an inference system. Firstly, we extended the notions of perceptual anchoring towards the probabilistic setting by means of probabilistic logic programming. This allowed us to maintain a multi-modal probability distribution of the positions of objects in the anchoring system and to use it for matching and maintaining objects at the perceptual level; thus, we introduce probabilistic anchoring of objects either directly perceived through the sensory input data or logically inferred through probabilistic reasoning. We illustrated the benefit of this approach with the scenario in Section 5.1, which the anchoring system was able to resolve correctly only due to its ability to maintain a multi-modal probability distribution. This also extends an earlier approach to relational object tracking (Nitti et al., 2014), where the symbol-grounding problem was solved by the use of AR tags. Secondly, we have deployed methods from statistical relational learning in the field of anchoring. This approach allowed us to learn, instead of handcraft, the rules needed in the reasoning system. A distinguishing feature of the applied rule learner (Kumar et al., 2020) is its ability to handle both continuous and discrete data. We then demonstrated that combining perceptual anchoring and SRL is also feasible in practice by performing relational anchoring with a learned rule (demonstrated in Section 5.2). This scenario also exhibited a further strength of using SRL in anchoring domains, namely that the resulting system is highly modular.
In our evaluation, for instance, we were able to integrate an extra rule into the ToO, which enabled us to resolve recursive occlusions (described in Section 5.3).

A possible future direction would be to exploit how anchored objects and their spatial relationships, tracked over time, facilitate the learning of both the function of objects as well as object affordances (Kjellström et al., 2011; Moldovan et al., 2012; Koppula et al., 2013; Koppula and Saxena, 2014). Through the introduction of a probabilistic anchoring approach, together with the learning of the rules that express the relations between objects, we have presented a potential framework for future studies of spatial relationships between objects, e.g., the spatial-temporal relationships between objects and human hand actions to learn the function of objects (cf. Kjellström et al. (2011); Moldovan et al. (2012)). Such a future direction would tackle a question similar to one currently discussed in the neural-symbolic community (Garcez et al., 2019), namely how to propagate symbolic information back to sub-symbolic representations of the world. A recent piece of work that combines SRL and neural methods is, for instance, Manhaeve et al. (2018).

Another aspect of our work that deserves future investigation is probabilistic anchoring itself. With the approach presented in this paper we are merely able to perform MAP inference. In order to perform full probabilistic anchoring, one would need to render the anchor matching function itself fully probabilistic, i.e., the anchor matching function would need to take random variables as arguments and output probability distributions instead of point estimates; ideas borrowed from multi-hypothesis anchoring (Elfring et al., 2013) might, therefore, be worthwhile to consider in future work.
AUTHOR CONTRIBUTIONS

PZ and AP outlined the extension of the framework to include probabilistic properties and multi-modal states. PZ and NK integrated SRL with perceptual anchoring. PZ, AP, and NK performed the experimental evaluation. AL and LD developed the notions and the ideas in the paper together with the other authors. PZ, NK, AP, AL, and LD have all contributed to the text.

CONFLICT OF INTEREST STATEMENT

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

REFERENCES

Adé, H., De Raedt, L., and Bruynooghe, M. (1995). Declarative bias for specific-to-general ILP systems. Machine Learning 20, 119–154
Angelopoulos, N. and Cussens, J. (2008). Bayesian learning of Bayesian networks with informative priors. Annals of Mathematics and Artificial Intelligence 54, 53–98
Angelopoulos, N. and Cussens, J. (2017). Distributional logic programming for Bayesian knowledge representation. International Journal of Approximate Reasoning 80, 52–66
Beltagy, I., Roller, S., Cheng, P., Erk, K., and Mooney, R. J. (2016). Representing meaning with a combination of logical and distributional models. Computational Linguistics 42, 763–808
Blodow, N., Jain, D., Marton, Z. C., and Beetz, M. (2010). Perception and probabilistic anchoring for dynamic world state logging. In Proc. of the 10th IEEE-RAS International Conference on Humanoid Robots. 160–166
Chella, A., Coradeschi, S., Frixione, M., and Saffiotti, A. (2004). Perceptual anchoring via conceptual spaces. In Proceedings of the AAAI-04 Workshop on Anchoring Symbols to Sensor Data. 40–45
Chella, A., Frixione, M., and Gaglio, S. (2003). Anchoring symbols to conceptual spaces: the case of dynamic scenarios. Robotics and Autonomous Systems 43, 175–188
Coradeschi, S. and Saffiotti, A. (2000). Anchoring symbols to sensor data: preliminary report. In Proc. of the 17th AAAI Conf. 129–135
Coradeschi, S. and Saffiotti, A. (2001). Perceptual anchoring of symbols for action. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI). 407–416
De Raedt, L., Kersting, K., Natarajan, S., and Poole, D. (2016). Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning 10, 1–189
De Raedt, L. and Kimmig, A. (2015). Probabilistic (logic) programming concepts. Machine Learning 100, 5–47
Elfring, J., van den Dries, S., van de Molengraft, M. J. G., and Steinbuch, M. (2013). Semantic world modeling using probabilistic multiple hypothesis anchoring. Robotics and Autonomous Systems 61, 95–105
Fierens, D., Van den Broeck, G., Renkens, J., Shterionov, D., Gutmann, B., Thon, I., et al. (2015). Inference and learning in probabilistic logic programs using weighted Boolean formulas. Theory and Practice of Logic Programming 15, 358–401
Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. (1999). Learning probabilistic relational models. In IJCAI. vol. 99, 1300–1309
Garcez, A. d., Gori, M., Lamb, L. C., Serafini, L., Spranger, M., and Tran, S. N. (2019). Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. arXiv preprint arXiv:1905.06088
Gardner, M., Talukdar, P., Krishnamurthy, J., and Mitchell, T. (2014). Incorporating vector space similarity in random walk inference over knowledge bases. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 397–406
Getoor, L. (2013). Probabilistic soft logic: A scalable approach for Markov random fields over continuous-valued variables. In Proceedings of the 7th International Conference on Theory, Practice, and Applications of Rules on the Web (Berlin, Heidelberg: Springer-Verlag), RuleML'13, 1–1. doi:10.1007/978-3-642-39617-5_1
Getoor, L. and Taskar, B. (2007). Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning) (The MIT Press)
Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L. (2018). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) (IEEE), 80–89
Gutmann, B., Thon, I., Kimmig, A., Bruynooghe, M., and De Raedt, L. (2011). The magic of logical inference in probabilistic programming. Theory and Practice of Logic Programming 11, 663–680
Günther, M., Ruiz-Sarmiento, J., Galindo, C., González-Jiménez, J., and Hertzberg, J. (2018). Context-aware 3D object anchoring for mobile robots. Robotics and Autonomous Systems
Huang, X., Kwiatkowska, M., Wang, S., and Wu, M. (2017). Safety verification of deep neural networks. In International Conference on Computer Aided Verification (Springer), 3–29
Kjellström, H., Romero, J., and Kragić, D. (2011). Visual object-action recognition: Inferring object affordances from human demonstration. Computer Vision and Image Understanding 115, 81–90
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques (MIT Press)
Koppula, H. S., Gupta, R., and Saxena, A. (2013). Learning human activities and object affordances from RGB-D videos. The International Journal of Robotics Research 32, 951–970
Koppula, H. S. and Saxena, A. (2014). Physically grounded spatio-temporal object affordances. In European Conference on Computer Vision (Springer), 831–847
Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2, 83–97. doi:10.1002/nav.3800020109
Kumar, N., Kuzelka, O., and De Raedt, L. (2020). Learning distributional programs for relational autocompletion. arXiv preprint arXiv:2001.08603
LeBlanc, K. and Saffiotti, A. (2008). Cooperative anchoring in heterogeneous multi-robot systems. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA) (Pasadena, CA), 3308–3314
Loutfi, A. (2006). Odour Recognition using Electronic Noses in Robotic and Intelligent Systems. Ph.D. thesis, Örebro University, Örebro, Sweden
Loutfi, A. and Coradeschi, S. (2006). Smell, think and act: A cognitive robot discriminating odours. Autonomous Robots 20, 239–249
Loutfi, A., Coradeschi, S., Daoutis, M., and Melchert, J. (2008). Using knowledge representation for perceptual anchoring in a robotic system. Int. Journal on Artificial Intelligence Tools 17, 925–944
Loutfi, A., Coradeschi, S., and Saffiotti, A. (2005). Maintaining coherent perceptual information using anchoring. In Proc. of the 19th IJCAI Conf. (Edinburgh, UK), 1477–1482
Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., and De Raedt, L. (2018). DeepProbLog: Neural probabilistic logic programming. In Advances in Neural Information Processing Systems. 3749–3759
Meshgi, K. and Ishii, S. (2015). The state-of-the-art in handling occlusions for visual object tracking. IEICE Transactions on Information and Systems 98, 1260–1274
Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2007). BLOG: Probabilistic models with unknown objects. Statistical Relational Learning, 373
Moldovan, B., Moreno, P., Van Otterlo, M., Santos-Victor, J., and De Raedt, L. (2012). Learning relational affordance models for robots in multi-object manipulation tasks. In 2012 IEEE International Conference on Robotics and Automation (ICRA) (IEEE), 4373–4378
Neville, J. and Jensen, D. (2007). Relational dependency networks. Journal of Machine Learning Research 8, 653–692
Nitti, D., De Laet, T., and De Raedt, L. (2013). A particle filter for hybrid relational domains. In Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on (IEEE), 2764–2771
Nitti, D., De Laet, T., and De Raedt, L. (2014). Relational object tracking and learning. In Robotics and Automation (ICRA), 2014 IEEE International Conference on (IEEE), 935–942
Nitti, D., De Laet, T., and De Raedt, L. (2016a). Probabilistic logic programming for hybrid relational domains. Machine Learning 103, 407–449
Nitti, D., Ravkic, I., Davis, J., and De Raedt, L. (2016b). Learning the structure of dynamic hybrid relational models. In Proceedings of the Twenty-second European Conference on Artificial Intelligence (IOS Press), 1283–1290
Persson, A., Längkvist, M., and Loutfi, A. (2017). Learning actions to improve the perceptual anchoring of objects. Frontiers in Robotics and AI 3, 76. doi:10.3389/frobt.2016.00076
Persson, A., Zuidberg Dos Martires, P., Loutfi, A., and De Raedt, L. (2019). Semantic relational object tracking. arXiv preprint arXiv:1902.09937
Ravkic, I., Ramon, J., and Davis, J. (2015). Learning relational dependency networks in hybrid domains. Machine Learning 100, 217–254
Richardson, M. and Domingos, P. (2006). Markov logic networks. Machine Learning 62, 107–136
Riguzzi, F. (2018). Foundations of Probabilistic Logic Programming (River Publishers)
Ruiz-Sarmiento, J. R., Günther, M., Galindo, C., González-Jiménez, J., and Hertzberg, J. (2017). Online context-based object recognition for mobile robots. In 2017 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC). 247–252. doi:10.1109/ICARSC.2017.7964083
Sato, T. (1995). A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th International Conference on Logic Programming. 715–729
Sato, T. and Kameya, Y. (2001). Parameter learning of logic programs for symbolic-statistical modeling. Journal of Artificial Intelligence Research 15, 391–454
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489
Sterling, L. and Shapiro, E. (1994). The Art of Prolog: Advanced Programming Techniques (MIT Press)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9
Taskar, B., Abbeel, P., and Koller, D. (2002). Discriminative probabilistic models for relational data. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (Morgan Kaufmann Publishers Inc.), 485–492
Vezzani, R., Grana, C., and Cucchiara, R. (2011). Probabilistic people tracking with appearance models and occlusion classification: The ad-hoc system. Pattern Recognition Letters 32, 867–877
Wong, L. L., Kaelbling, L. P., and Lozano-Pérez, T. (2015). Data association for semantic world modeling from partial views. The International Journal of Robotics Research 34, 1064–1082
Wu, Y., Lim, J., and Yang, M.-H. (2013). Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2411–2418
Xie, C., Xiang, Y., Mousavian, A., and Fox, D. (2019). The best of both modes: Separately leveraging RGB and depth for unseen object instance segmentation