Distributed Consistent Multi-Robot Semantic Localization and Mapping
Vladimir Tchuiev and Vadim Indelman∗

Abstract
We present an approach for multi-robot consistent distributed localization and semantic mapping in an unknown environment, considering scenarios with classification ambiguity, where objects' visual appearance generally varies with viewpoint. Our approach addresses such a setting by maintaining a distributed posterior hybrid belief over continuous localization and discrete classification variables. In particular, we utilize a viewpoint-dependent classifier model to leverage the coupling between semantics and geometry. Moreover, our approach yields a consistent estimation of both continuous and discrete variables, with the latter being addressed for the first time, to the best of our knowledge. We evaluate the performance of our approach in a multi-robot semantic SLAM simulation and in a real-world experiment, demonstrating an increase in both classification and localization accuracy compared to maintaining a hybrid belief using local information only.
1 Introduction

Deployment of multi-robot systems allows for fast information gathering, and can be used in a wide variety of applications, for example search and rescue, autonomous driving, and agriculture. A significant part of ongoing research is multi-robot Simultaneous Localization and Mapping (SLAM), where a group of robots localize themselves and cooperatively map the environment. Multi-robot SLAM is utilized in a variety of navigation tasks such as cooperative search and rescue, underwater navigation, or warehouse management. SLAM itself is a widely researched problem (see e.g. [1]) in the robotics community. In particular, semantic SLAM reasons about objects within the environment with richer information, such as an object's class, compared to geometric SLAM. Yet, when observed from certain viewpoints, inferring the correct class of an object can often be challenging, i.e. an object may visually appear similar to representative objects from different classes. This induces a viewpoint dependency for classifier outputs and requires information from different viewpoints for maintaining a belief over classification variables.

In this paper we present the first distributed multi-robot approach for semantic localization and mapping in the above setting. Our approach maintains a hybrid belief over continuous variables (object and camera poses) and discrete variables (object classes), while considering the coupling between classification and localization, and enforcing consistent, double-counting-free estimation. In contrast, existing approaches for multi-robot semantic SLAM utilize most-likely class measurements to solve data association. Moreover, these approaches do not maintain a belief over classification variables, nor do they model the coupling between semantic and geometric information.

As each robot uses information from other robots, it must not use measurements more than once, as doing so leads to erroneous and overconfident estimates, i.e. it double counts information. To address this key problem, multiple approaches have been proposed, all considering continuous variables: from complex book-keeping (e.g. [2]) to information removal techniques (e.g. [3]). In this work we address consistent inference of a hybrid belief that consists of continuous and discrete variables. To the best of our knowledge, the latter has not been addressed thus far.

To summarize, our main contributions are as follows: (i) we contribute a multi-robot approach that maintains a hybrid belief over robot and object poses and object classes in a distributed setting, while addressing the coupling between semantic and geometric information via a viewpoint-dependent classifier model; (ii) we address estimation consistency aspects considering both continuous and discrete random variables; and (iii) we demonstrate the strength of this approach in a simulation and a real-world experiment, comparing to a single robot and to a distributed multi-robot setting with double counting. This paper is accompanied by supplementary material which provides further details and results.

∗ The authors are with the Department of Aerospace Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel. {vovatch, vadim.indelman}@technion.ac.il. This work was partially supported by the Israel Ministry of Science & Technology (MOST).
x_k            Robot pose at time k
x^o_n, c_n     The n'th object pose and class
X^o_k          Poses of objects observed up to time k
X^o_{new,k}    Poses of objects newly observed at time k
X_k            Robot and object poses up to time k
C_k            Class realization of objects seen up to time k
C_{new,k}      Classes of objects newly observed at time k
Z_k            Measurements at time k, including geometric and semantic
M_k            Motion model from x_{k-1} to x_k
L_k            Measurement likelihood of Z_k
H_k            History of measurements and actions up to time k
b_k            Conditional continuous belief at time k
w_k            Discrete weight at time k
ξ_k            Continuous object marginal belief at time k
φ_k            Discrete marginal belief at time k
N_k(·)         Number of objects observed by a robot or a group up to time k

Superscripts:
(·)^r          States of robot r
(·)^R          States of robots communicating with r, directly and indirectly, including itself

Table 1: Main notations used in the paper.
2 Related Work

Various works have utilized sequential classification with a classifier model for a single robot. Omidshafiei et al. [4] presented a sequential classification approach that used a Dirichlet-distributed classifier model; the classifier model was not viewpoint-dependent. Kopitkov and Indelman [5] presented an approach to train a viewpoint-dependent classifier model. Feldman and Indelman [6] proposed a sequential object classification approach that utilizes a viewpoint-dependent classifier with relative poses known a-priori. Tchuiev et al. [7] maintained a hybrid belief with a viewpoint-dependent classifier to disambiguate between data association realizations. These works [4-7] address only sequential classification and do not consider the coupled problem with SLAM. To our knowledge, our work is the first to address the coupled problem in a distributed setting.

There are different approaches for distributed multi-robot SLAM. Walls et al. [8] proposed a distributed geometric SLAM approach that communicates factors between robots. Other approaches for geometric SLAM include Extended Kalman Filter (such as [9]) or Particle Filter based methods (such as [10]). Choudhary et al. [11] presented an approach for distributed semantic SLAM which communicates relative poses between robots and uses object class information for data association. The geometric approaches do not reason about object classes, while the semantic approaches consider only most-likely classification, i.e. they do not maintain a belief over class variables. Our semantic approach maintains a belief over object classes and considers the coupling between the continuous and discrete variables.

Consistent estimation is a key issue in a distributed setup, and multiple approaches have been proposed to address it. Bahr et al. [2] proposed a distributed algorithm for underwater vehicles, with an approach for using all measurements without information loss. Indelman et al. [12] proposed a graph-based method that calculates cross-covariance terms representing the correlation between measurements from different robots, utilizing them for consistent estimation. Cunningham et al. [13] presented the DDF-SAM distributed SLAM algorithm, which avoids double counting by creating two maps for each robot, local and global, where the global map is updated with condensed local maps. A later work by Cunningham et al. [3] introduced DDF-SAM2, where each robot maintains only the global map; to avoid double counting, old information is filtered out via down-dating by each robot during communication. These approaches consider continuous random variables. In contrast, we reason about discrete variables as well.
3 Notations and Problem Formulation

Consider a group of robots operating in an unknown environment represented by object landmarks. All of the robots aim to localize themselves and to map the environment geometrically and semantically within a distributed multi-robot framework. In this work we consider a closed-set setting, where each of the objects belongs to one of M possible classes. The number of objects in the environment is unknown prior to the scenario.

We denote states inferred by robot r with a superscript (·)^r. The set R is the set of all robots communicating with robot r (including itself), either directly or relayed through other robots. Note that R can grow with time.

Let x_k denote the robot pose at time k, and let x^o_n and c_n denote the n'th object pose and class, respectively. Let X^o ≜ {x^o_n}_n and C ≜ {c_n}_n denote the poses and classes of the objects, and let X_k ≜ {x_{0:k}, X^o_k} denote all poses up to time k. The subscript new,k denotes objects newly observed at time k.

Let Z^r_k be the set of measurements robot r receives at time k from its own sensors. Z^r_k is composed of geometric and semantic measurements, Z^{geo,r}_k and Z^{sem,r}_k respectively. We assume independence between geometric and semantic measurements, as well as between different time steps.

We assume Gaussian and known identical motion M_k ≜ P(x_k | x_{k-1}, a_{k-1}) and geometric P(z^{geo,r}_k | x^r_k, x^{o,r}) models for all robots. At each time step, the subset of object poses involved in the geometric and classifier models is determined by data association (DA). Unlike our previous work [7], herein DA is assumed to be externally determined.

Additionally, we use a viewpoint-dependent classifier model that "predicts" classification scores (a vector of class probabilities). This model couples the classifier scores with the relative viewpoint between object and camera; this coupling can be used to improve pose inference performance [7]. The viewpoint dependency is modeled as a Gaussian whose parameters depend on the relative viewpoint from the camera to the object, x^{o,r} ⊖ x^r_k, and on the object's class c:

P(z^{sem,r}_k \mid x^r_k, x^o, c) = \mathcal{N}\big( h_c(x^r_k, x^{o,r}),\, \Sigma_c(x^r_k, x^{o,r}) \big),   (1)

where h_c(·) and Σ_c(·) can be learned offline via a Gaussian Process (GP) [6] or a deep neural network [5]. Note that for M candidate classes, M viewpoint-dependent models have to be learned.

Let L^r_k ≜ P(Z^r_k | X^r_k, C^r_k) be the local measurement likelihood of r, which consists of the geometric and classifier models:

L^r_k \triangleq \prod_{x^{o,r},\, c^r} P(Z^{geo,r}_k \mid x^r_k, x^{o,r})\, P(Z^{sem,r}_k \mid x^r_k, x^{o,r}, c^r),   (2)

where x^{o,r} ∈ X^{o,r}_{β_k} and c^r ∈ C^r_{β_k}; the term β_k represents the local DA of robot r at time k, i.e. the correspondences between observations and object IDs. We denote by X^{o,r}_{β_k} the set of poses of all objects observed by r at time k, and similarly by C^r_{β_k} their classes. For the reader's convenience, Table 1 presents the important notations used in the paper; some will be defined in the next section.
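To make the viewpoint-dependent classifier model of Eq. (1) concrete, below is a minimal sketch of evaluating it for a single semantic measurement. The callables h and Sigma stand in for the learned per-class models; all names and signatures are illustrative assumptions, not our actual implementation.

```python
import numpy as np

def classifier_likelihood(z_sem, rel_pose, c, h, Sigma):
    """Evaluate the viewpoint-dependent classifier model of Eq. (1):
    a Gaussian over the measured class-probability vector z_sem, whose
    mean and covariance depend on the relative pose and the class."""
    mu = h(c, rel_pose)        # predicted classification scores for class c
    cov = Sigma(c, rel_pose)   # viewpoint-dependent covariance for class c
    d = z_sem - mu
    m = d.shape[0]
    log_norm = -0.5 * (m * np.log(2.0 * np.pi) + np.linalg.slogdet(cov)[1])
    return np.exp(log_norm - 0.5 * d @ np.linalg.solve(cov, d))
```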
Problem formulation. For each robot r we aim to maintain the following hybrid belief:

P(X^R_k, C^R \mid H^R_k),   (3)

where H^R_k ≜ {Z^r_{1:k}, a^r_{0:k-1}}_{r∈R} is the history of measurements of robot r itself and of information transmitted to r, as well as the actions of all robots in R. The belief in Eq. (3) is a hybrid belief over both continuous (camera and object poses) and discrete (object classes) random variables. We aim to update this hybrid belief per each robot in a recursive manner, using both local measurements and information sent from other robots in the neighborhood, as well as transmitting information in turn. We aim to keep estimation consistency by avoiding double counting, i.e. using every measurement only once.

4 Approach

We present a framework for distributed classification, localization, and mapping. As with many distributed multi-robot frameworks, over-confident estimation due to double counting is a key issue. We propose a framework with simplified book-keeping that allows relaying of information (e.g. robot 1 sends information to robot 2, then robot 2 sends to robot 3 information that also includes what was received from robot 1). This framework requires the maintenance of a local belief P(X^r_k, C^r | H^r_k) per each robot, which can be sent and relayed to other robots. From multiple local beliefs a distributed belief can be constructed. The local beliefs are stored by each robot and updated accordingly when new information arrives, and the receiving robot filters out old information, thus avoiding double counting. In the next sections we derive a recursive formulation for the maintenance of the local belief, the distributed hybrid belief, and the information stack each robot holds and transmits.

4.1 Local Hybrid Belief Maintenance

Our formulation for maintaining local hybrid beliefs builds upon our previous work [7], with the main differences being that here we assume the DA is solved, and that the number of objects is unknown a-priori. In this section we present an overview of this approach.

We maintain the hybrid belief of robot r from local information only. This belief can be split into continuous and discrete parts:

P(X^r_k, C^r_k \mid H^r_k) = \underbrace{P(X^r_k \mid C^r_k, H^r_k)}_{b^r_k}\, \underbrace{P(C^r_k \mid H^r_k)}_{w^r_k}.   (4)

To maintain this hybrid belief, we must maintain a set of continuous beliefs, each conditioned on a class realization of all objects observed in the scene by robot r thus far.

The continuous part is updated as follows:

b^r_k \propto b^r_{k-1} \cdot L^r_k \cdot M^r_k \cdot P(X^{o,r}_{new,k}),   (5)

where P(X^{o,r}_{new,k}) = P(X^{o,r}_k)/P(X^{o,r}_{k-1}) is the prior over object poses newly observed at time k. As opposed to [7], this formulation also supports an increasing number of objects known at each time step, with both X^{o,r}_k and C^r_k increasing in dimension. Note that in general b^r_k is different for each class realization, as the models (1) are different for each class.

The discrete part is the weight associated with its corresponding continuous belief. As our measurement models depend on continuous variables, we use Bayes rule on P(C^r_k | H^r_k) and marginalize the measurement likelihood as follows:

w^r_k \propto w^r_{k-1} \cdot P(C^r_{new,k}) \int_{X^r_k} L^r_k \cdot b^r_{k-1} \cdot M^r_k\, dX^r_k,   (6)

where P(C^r_{new,k}) = P(C^r_k)/P(C^r_{k-1}) is the prior over classes of new objects locally observed by r at time k. We compute the integral in Eq. (6) by sampling the continuous variables that participate in P(Z^r_k | X^r_k, C^r_k), i.e. the last robot pose x^r_k and the poses X^{o,r}_{β_k} of the objects observed at time k.
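To illustrate the bookkeeping implied by Eq. (4), the following is a minimal sketch of a hybrid belief container maintaining one weight and one conditional continuous belief per class realization. The class and attribute names are illustrative, and the continuous beliefs are kept abstract (e.g. factor graphs in a GTSAM-based implementation).

```python
class HybridBelief:
    """Sketch of the hybrid belief factorization in Eq. (4): one
    conditional continuous belief b_k per class realization, with an
    associated discrete weight w_k."""

    def __init__(self, num_classes):
        self.M = num_classes
        self.weights = {(): 1.0}      # class realization (tuple) -> w_k
        self.continuous = {(): None}  # class realization -> b_k (abstract)

    def add_object(self, class_prior):
        """Grow every realization when a new object is observed; this is
        the role of P(C_new,k) in Eq. (6) and P(X^o_new,k) in Eq. (5)."""
        new_w, new_b = {}, {}
        for realization, w in self.weights.items():
            for c in range(self.M):
                new_w[realization + (c,)] = w * class_prior[c]
                new_b[realization + (c,)] = self.continuous[realization]
        self.weights, self.continuous = new_w, new_b

    def normalize(self):
        total = sum(self.weights.values())
        for realization in self.weights:
            self.weights[realization] /= total
```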
These variables are sampled from the propagated belief b^r_{k-1} · M^r_k. Variables that do not participate in L^r_k can be marginalized analytically.

4.2 Distributed Hybrid Belief Maintenance

In this section we extend the formulation presented in Sec. 4.1 to include updates from other robots, considering a distributed multi-robot setting. As will be seen, our formulation uses each measurement only once, thus keeping estimation consistency and avoiding double counting. Similarly to (4), we factorize the distributed hybrid belief (3):

P(X^R_k, C^R_k \mid H^R_k) = \underbrace{P(X^R_k \mid C^R_k, H^R_k)}_{b^R_k}\, \underbrace{P(C^R_k \mid H^R_k)}_{w^R_k}.   (7)

As in the single-robot case, maintaining this belief requires managing multiple hypotheses of class realizations. Compared to the single-robot case, the number of objects observed will be equal or greater for the distributed belief, therefore the number of possible realizations increases as well. Importantly, information transmitted by other robots impacts both b^R_k and w^R_k. Furthermore, the viewpoint-dependent classifier model induces coupling between the localization uncertainty and the classification of different robots.

We present a recursive formulation for maintaining each of the parts in (7). The distributed measurement history H^R_k can be split into a prior part and a new part, defined as ΔH^R_k, that consists of measurements and actions from time k, s.t. H^R_k = H^R_{k-1} ∪ ΔH^R_k. Similarly, let H^r_k ≜ H^r_{k-1} ∪ {Z^r_k, a^r_{k-1}} for the single-robot case. Note that information in ΔH^R_k transmitted by other robots can potentially be from earlier time instances (as during communication each robot transmits to robot r its own stack of local beliefs of other robots, see Sec. 4.3). Crucially, each measurement must be used once to avoid double counting. We also denote the history without the local measurements and action at time k as

H^{R-}_k \triangleq H^R_k \setminus \{Z^r_k, a^r_{k-1}\}, \qquad \Delta H^{R-}_k \triangleq \Delta H^R_k \setminus \{Z^r_k, a^r_{k-1}\}.   (8)

Using the above notations, one can observe that H^{R-}_k = H^R_{k-1} ∪ ΔH^{R-}_k. Next, we detail our approach for maintaining both the conditional continuous part b^R_k and the discrete part w^R_k recursively, for a realization of object classes C^R_k.

Maintaining b^R_k: Using Bayes rule, we rewrite b^R_k as

b^R_k = \eta \cdot L^r_k \cdot b^{R-}_k,   (9)

where η ≜ P(Z^r_k | C^r_k, H^R_k \ Z^r_k)^{-1} is a normalization constant that does not participate in the inference of the continuous belief. The local measurement likelihood L^r_k is defined in Eq. (2).

The term b^{R-}_k ≜ P(X^R_k | C^R_k, H^R_k \ Z^r_k) is the distributed propagated belief, conditioned on the information transmitted by other robots at time k and on the latest action of robot r, but not on its local measurements. During the update, b^{R-}_k is saved to be used in the maintenance of w^R_k, as seen in the next subsection. Using the chain rule, we can extract the motion model of the latest action as well:

b^{R-}_k = M^r_k \cdot P(X^R_k \setminus x^r_k \mid C^R_k, H^{R-}_k).   (10)

We can express P(X^R_k \ x^r_k | C^R_k, H^{R-}_k) in terms of the distributed continuous prior b^R_{k-1} ≜ P(X^R_{k-1} | C^R_{k-1}, H^R_{k-1}) and the new information received from other robots (see supplementary material Sec. 2):

P(X^R_k \setminus x^r_k \mid C^R_k, H^{R-}_k) = b^R_{k-1} \cdot \frac{P(X^{o,R}_k \mid C^{o,R}_k, \Delta H^{R-}_k)}{P(X^{o,R}_{k-1})}.   (11)
Finally, substituting Eq. (11) into Eq. (10), and in turn into Eq. (9), we get the following recursive formulation:

b^R_k \propto b^R_{k-1} \cdot L^r_k \cdot M^r_k \cdot P(X^{o,R}_{new,k}) \cdot \frac{P(X^{o,R}_k \mid C^{o,R}_k, \Delta H^{R-}_k)}{P(X^{o,R}_k)},   (12)

where the measurement likelihood L^r_k accounts for the new local measurement, M^r_k accounts for the latest action of robot r, and the external term P(X^{o,R}_k | C^{o,R}_k, ΔH^{R-}_k) accounts for the new information sent to r by other robots in R at time k. This pdf is only over the object poses X^{o,R}_k, while the other robots' poses are marginalized out. Thus, the robots communicate the environment states, which are implicitly affected by the robots' pose estimation. Computation of this external term is further discussed in Sec. 4.3; compared to the local belief update (5), it is the main difference. The expression P(X^{o,R}_{new,k}) represents the pose prior of objects newly known to r at time k.

The distributed belief has at worst M^{N_k(R)} continuous beliefs with corresponding weights, where the number of objects N_k(R) known to r can increase with time. Naturally, a multi-robot system will observe more objects than a single robot, therefore the computational burden of the distributed belief is larger than that of the local belief, and the significance of pruning beliefs with small weights grows. We prune hypotheses by setting a threshold on the ratio between a weight and the largest weight in the distributed hybrid belief.
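A minimal sketch of this pruning rule, assuming the weights and conditional beliefs are stored per class realization as in the earlier sketch; the threshold value is illustrative.

```python
def prune_realizations(weights, continuous, ratio_threshold=1e-3):
    """Drop class realizations whose weight is negligible relative to the
    largest weight, then renormalize (pruning discussed after Eq. (12))."""
    w_max = max(weights.values())
    kept = {r: w for r, w in weights.items() if w / w_max >= ratio_threshold}
    total = sum(kept.values())
    weights = {r: w / total for r, w in kept.items()}
    continuous = {r: continuous[r] for r in weights}
    return weights, continuous
```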
Maintaining w^R_k: To maintain w^R_k, we use a derivation similar to the weight update via local information only, presented in Sec. 4.1. We use Bayes rule to extract the last local measurement likelihood:

w^R_k = \eta \cdot w^{R-}_k \cdot P(Z^r_k \mid C^R_k, H^R_k \setminus Z^r_k),   (13)

where w^{R-}_k ≜ P(C^R_k | H^R_k \ Z^r_k) is the posterior distributed weight without accounting for the latest local measurements, and η ≜ P(Z^r_k | H^R_k \ Z^r_k)^{-1} is a normalization constant that is identical for all realizations of C^R_k, and thus does not participate in weight inference. As we use a viewpoint-dependent classifier model that utilizes the coupling between relative viewpoint and object class, we need to marginalize P(Z^r_k | C^R_k, H^R_k \ Z^r_k) over the poses involved in this likelihood: the last robot pose x^r_k and the poses of the objects observed at time k. We denote the latter by X^{o,r}_{β_k}, and to shorten notations we denote X^r_{inv,k} ≜ {x^r_k, X^{o,r}_{β_k}} and the remaining, uninvolved variables by ¬X^r_{inv,k}. Thus, P(Z^r_k | C^R_k, H^R_k \ Z^r_k) is marginalized as

P(Z^r_k \mid C^R_k, H^R_k \setminus Z^r_k) = \int_{X^r_{inv,k}} L^r_k \cdot P(X^r_{inv,k} \mid C^r_k, H^R_k \setminus Z^r_k)\, dX^r_{inv,k},   (14)

where P(X^r_{inv,k} | C^r_k, H^R_k \ Z^r_k) is computed by marginalizing b^{R-}_k over the uninvolved variables ¬X^r_{inv,k}, with X^R_k = X^r_{inv,k} ∪ ¬X^r_{inv,k}:

P(X^r_{inv,k} \mid C^r_k, H^R_k \setminus Z^r_k) = \int_{\neg X^r_{inv,k}} b^{R-}_k\, d\neg X^r_{inv,k}.   (15)

The propagated distributed belief b^{R-}_k is available from the continuous update via Eq. (10) and already includes the external information. In practice, we sample the involved variables X^r_{inv,k} and evaluate the current measurement likelihood at the samples. As b^R_k and L^r_k are Gaussian, η does not play a role in the sampling process. Although the classifier outputs are modeled as Gaussian, we must still integrate over the poses: in the general case, the expectation and covariance of the classifier model are functions of the relative viewpoint, so we need to sample X^r_{inv,k} as presented in Sec. 4.1 for Eq. (6).

The other term we address in Eq. (13) is w^{R-}_k, which we express in terms of w^R_{k-1}:

w^{R-}_k \propto w^R_{k-1} \cdot P(C^R_{k-1})^{-1} \cdot P(C^R_k \mid \Delta H^R_k \setminus Z^r_k).   (16)

Finally, we substitute Eqs. (14) and (16) into Eq. (13) to reach our final recursive form of the discrete belief update:

w^R_k \propto w^R_{k-1} \cdot P(C^R_{new,k}) \cdot \frac{P(C^R_k \mid \Delta H^R_k \setminus Z^r_k)}{P(C^R_k)} \int_{X^r_{inv,k}} L^r_k \cdot P(X^r_{inv,k} \mid C^r_k, H^R_k \setminus Z^r_k)\, dX^r_{inv,k},   (17)

with P(X^r_{inv,k} | C^r_k, H^R_k \ Z^r_k) computed via Eq. (15). This is a recursive formulation that includes the discrete prior w^R_{k-1}, the external update of the class probabilities from other robots, and the external update of the continuous belief contained within the integral.

Remark: One might be tempted to infer the class of each object separately, but this is not accurate due to the coupling between relative viewpoint and object class, as each object class possibly depends, implicitly, on all robot and object poses (see supplementary material Sec. 3).
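The integral in Eq. (17) is approximated by sampling, as described above. The following is a hedged sketch of this step; the function names, inputs, and the log-sum-exp trick are illustrative, not prescribed by the paper.

```python
import numpy as np

def weight_update_factor(samples, log_lik, prior_new_classes, external_ratio):
    """Monte Carlo approximation of the update of Eq. (17) for one class
    realization (a sketch under the assumptions stated above).

    samples           : draws of the involved variables X_inv from b^{R-}_k,
                        after marginalizing the uninvolved poses (Eq. (15))
    log_lik(sample)   : log of L^r_k evaluated at a sample for this realization
    prior_new_classes : P(C_new,k) for this realization
    external_ratio    : the product of phi-ratios from Eq. (20)
    """
    logs = np.array([log_lik(s) for s in samples])
    m = logs.max()
    integral = np.exp(m) * np.mean(np.exp(logs - m))  # stable sample average
    return prior_new_classes * external_ratio * integral
```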
4.3 Communication and Information Stack Management

In Sec. 4.2 we presented a framework for maintaining a hybrid belief of r given information obtained from other robots in R. That information is represented by the external continuous term in Eq. (12), and, implicitly and explicitly, by the integral and the external discrete term in Eq. (17). In this section we present our approach for computing these terms, thus describing the management of this information and what each robot sends when communicating. We aim to achieve two goals:

1. Simple double-counting prevention when maintaining the distributed belief, without complex book-keeping.

2. Distributed belief inference also via data not directly transmitted (e.g. robot r_1 sends data to r_2, r_2 to r_3, and r_3 uses data from r_1).

As will be shown next, the external terms in Eqs. (12) and (17) can be expressed via local information transmitted by the different robots in R to robot r. To that end, each robot r maintains and broadcasts a stack of local hybrid beliefs of the other robots it is aware of. In contrast to (4), these local beliefs are marginal beliefs over object poses and classes, i.e. robot poses are marginalized out. Each slot for robot r_i in the stack of robot r contains M^{N_k(r_i)} continuous and discrete marginal beliefs (defined below as ξ^{r,r_i}_k and φ^{r,r_i}_k), one pair per class realization, following a factorization similar to (4). Additionally, each slot includes a time-stamp that indicates on what data the local hybrid belief is conditioned. All in all, every stack contains \sum_{i=1}^{|R|} M^{N_k(r_i)} continuous and discrete beliefs. Eq. (18) presents the stack of robot r as a set of slots, where each slot contains a hybrid belief of a particular robot r_i ∈ R over object poses and classes, normalized by their priors:

S^r_k \triangleq \left\{ \left( \frac{P(X^{o,r_i}_{k_i} \mid C^{r_i}_{k_i}, H^{r_i}_{k_i})\, P(C^{r_i}_{k_i} \mid H^{r_i}_{k_i})}{P(X^{o,r_i}_{k_i})\, P(C^{r_i}_{k_i})},\; k_i \right) \right\}_{r_i \in R},   (18)

where k_i is the time-stamp of the information robot r holds about r_i. In general, time k_i is not synchronized with k. The marginal continuous and discrete beliefs that robot r holds about robot r_i ∈ R are denoted ξ^{r,r_i}_k ≜ P(X^{o,r_i}_{k_i} | C^{r_i}_{k_i}, H^{r_i}_{k_i}) / P(X^{o,r_i}_{k_i}) for the continuous part, and φ^{r,r_i}_k ≜ P(C^{r_i}_{k_i} | H^{r_i}_{k_i}) / P(C^{r_i}_{k_i}) for the discrete part.

With these definitions of ξ^{r,r_i}_k and φ^{r,r_i}_k, it is possible to show that the external continuous term in Eq. (12) can be expressed as (see full derivation in the supplementary material)

\frac{P(X^{o,R}_k \mid C^R_k, \Delta H^{R-}_k)}{P(X^{o,R}_k)} = \prod_{r_i \in R} \frac{\xi^{r,r_i}_k}{\xi^{r,r_i}_{k-1}}.   (19)

Similarly, the external discrete term in Eq. (17) can be expressed as (see full derivation in the supplementary material)

\frac{P(C^R_k \mid \Delta H^R_k \setminus Z^r_k)}{P(C^R_k)} = \prod_{r_i \in R} \frac{\varphi^{r,r_i}_k}{\varphi^{r,r_i}_{k-1}}.   (20)

Eqs. (19) and (20) present the external update as a product of local beliefs, where only the updates received since time k−1 are present. This formulation avoids double counting by removing the old information, ξ^{r,r_i}_{k-1} and φ^{r,r_i}_{k-1}, at each communication, and uses measurements only once. Specifically for ξ^{r,r_i}_{k-1}, we use the approach presented in [3]. Maintaining stacks of individual information in this manner does not require complex book-keeping, only a time-stamp per slot; thus we fulfill the first goal. Robots can also relay information transmitted to them, thus the distributed belief can be updated with information from robots that did not transmit directly to the inferring robot, fulfilling the second goal.

Robot r_i sends its entire stack during an information broadcast.
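A minimal sketch of the stack and of the external discrete update of Eq. (20), assuming each slot stores, per class realization, the marginal ratios ξ and φ together with a time-stamp; all names are illustrative, and the continuous side of Eq. (19) is handled analogously.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    xi: dict     # class realization -> continuous marginal belief ratio
    phi: dict    # class realization -> discrete marginal belief ratio
    stamp: int   # time-stamp k_i of the data this slot is conditioned on

def external_phi_ratio(stack_now, stack_prev, realization):
    """Product over robots of phi_k / phi_{k-1}, Eq. (20); dividing by the
    previous slot removes the old information and prevents double counting."""
    ratio = 1.0
    for rid, slot in stack_now.items():
        prev = stack_prev.get(rid)
        phi_prev = prev.phi.get(realization, 1.0) if prev is not None else 1.0
        ratio *= slot.phi.get(realization, 1.0) / phi_prev
    return ratio
```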
When robot r receives information, it integrates the broadcast as follows. Recall that r_i's stack is divided into slots, with a time-stamp per slot. Robot r compares its time-stamps with those of the received information per slot, and replaces the information within a slot if the received time-stamp is newer. If r receives information from more than one robot at the same time, it selects the newest information per slot. This procedure depends only on the relations between time-stamps, thus it is not necessary to synchronize time between the robots.

In the following section we discuss double-counting aspects of discrete random variables, corresponding to Eq. (20).

4.4 Double Counting of Discrete Random Variables

Double counting leads to over-confident estimation, and if an erroneous measurement is counted multiple times, it may in turn lead to a large error in the state estimate. While the implications of double counting on continuous random variables (e.g. camera and object poses) have been investigated, this is not so for discrete random variables. Both cases have a common thread: measurements counted multiple times 'push' the posterior estimate in a certain direction while yielding lower uncertainty than when double counting is appropriately avoided (i.e. each measurement is used at most once). In the continuous Gaussian case, this manifests in a covariance matrix with smaller eigenvalues. Correspondingly, in the discrete case the highest-probability category has its probability increased while the probability of all other categories decreases.

To illustrate the above, consider an example with a categorical random variable c; we receive two sets of data, Z_a = {z_1, z_2} and Z_b = {z_2, z_3}, with a common measurement z_2. Considering a measurement likelihood P(z|c), the posterior over c is (see e.g. Bailey et al. [14]):

P(c \mid Z_a, Z_b) \propto P(c)\, P(Z_a, Z_b \mid c) = P(c)\, \frac{P(z_1 \mid c)\, P(z_2 \mid c) \cdot P(z_2 \mid c)\, P(z_3 \mid c)}{P(z_2 \mid c)}.   (21)

If the common data (measurement z_2) is not removed via the denominator in Eq. (21), it is double counted. Compared to Eq. (20), the above numerator and denominator correspond, respectively, to the terms φ^{r,r_i}_k and φ^{r,r_i}_{k-1}.

Denote P(z_2 | c = i) ≜ a_i and, to shorten the notations, P(c = i) P(z_1 | c = i) P(z_3 | c = i) ≜ L_i. The normalized posterior can be written as

P(c = i \mid Z_a, Z_b) = \frac{a_i L_i}{\sum_{j=1}^{m} a_j L_j},   (22)

where m is the number of candidate categories. Double counting, i.e. omitting the denominator in Eq. (21), gives after normalization a_i^2 L_i / \sum_{j=1}^{m} a_j^2 L_j.

The largest a_i is denoted a_max, with i_max being the category corresponding to a_max, and the product of the remaining terms for i_max is denoted L_max. Double counting of P(z_2 | c) increases the probability of i_max:

P(c = i_{max} \mid Z_a, Z_b) = \frac{a_{max} L_{max}}{\sum_{j=1}^{m} a_j L_j} \le \frac{a_{max}^2 L_{max}}{\sum_{j=1}^{m} a_j^2 L_j},   (23)

since a_j^2 L_j \le a_{max}\, a_j L_j for every j. Similarly, it can be shown that a higher power (i.e. counting the data more times) increases the posterior probability even further; the reverse can be shown for the lowest-probability category. This increase in influence can be disastrous if the highest-likelihood category is not the correct one, possibly leading to pruning of the correct class hypothesis when maintaining the hybrid belief (3).

A visualization is shown in Fig. 1, where there are 4 categories with an uninformative prior and a single measurement likelihood; in Figs. 1a, 1b, and 1c the likelihood is counted once, twice, and thrice, respectively.
Evidently, the strongest category's probability (cat. 3) increases when counted more times, while all others have their probabilities diminish.

5 Results

We evaluated our approach in a multi-robot SLAM simulation and with real-world data, where we consider an environment comprising several scattered objects observed by multiple mobile cameras from different viewpoints. Fig. 2a and Fig. 5a present the ground truth for the simulation and the experiment, respectively. Our implementation uses the GTSAM library [15] with a Python wrapper. The hardware used is an Intel i7-7700 processor running at 2.8GHz with 16GB RAM, and a GeForce GTX 1050Ti with 4GB RAM.
5.1 Simulation

Consider three robots, denoted r_1, r_2, and r_3, moving in a 2D environment represented by N = 15 scattered objects. We consider a closed-set setting and assume, for simplicity, M = 2 classes, where each object can be one of the two. In this scenario the maximum number of possible class realizations is M^N = 32768.
Figure 1: Conceptual demonstration of the effects of double counting on discrete random variables. Consider 4 possible categories with an uninformative prior over them. (a) shows the measurement likelihood for the categories; given the uninformative prior, it is the posterior distribution as well. (b) and (c) count the same likelihood twice and thrice, respectively.
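The effect illustrated in Fig. 1 can be reproduced numerically. Below is a toy sketch of Eqs. (21)-(23); the likelihood values are made up for illustration only.

```python
import numpy as np

# Fusing Z_a = {z1, z2} and Z_b = {z2, z3} over m = 4 categories,
# with z2 the common measurement.
prior = np.full(4, 0.25)
p_z1 = np.array([0.2, 0.3, 0.4, 0.1])   # P(z1 | c = i)
p_z2 = np.array([0.1, 0.2, 0.6, 0.1])   # P(z2 | c = i), the common measurement
p_z3 = np.array([0.3, 0.2, 0.4, 0.1])   # P(z3 | c = i)

def normalize(p):
    return p / p.sum()

consistent = normalize(prior * p_z1 * p_z2 * p_z3)         # z2 used once
double_counted = normalize(prior * p_z1 * p_z2**2 * p_z3)  # z2 used twice

print(consistent)       # category 3 already dominates ...
print(double_counted)   # ... and its probability grows further when z2 is
                        # counted twice, as in Eq. (23)
```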
Our approach is evaluated for both classification and pose inference accuracy, as we maintain a hybrid belief. We consider an ambiguous scenario where the classifier model cannot distinguish between the two classes from a certain viewpoint, thus requiring additional viewpoints to correctly disambiguate between the two classes. The robots communicate among themselves, increasing performance for both discrete and continuous variables, i.e. classification and SLAM. Additionally, the distributed setting extends the sensing horizon, allowing robots to reason about objects that are not directly observed, while keeping estimation consistency.

Each robot only communicates with robots within a 10 meter communication range, relaying the local information stored in its stack. In particular, initially r_2 and r_3 share information with each other, and later r_1 and r_2, relaying information from r_3 through r_2. For a complete table of communication in the considered scenario, see the supplementary material. Further, we assume the robots share a common reference frame (this assumption can be relaxed as in [16]). We simulate relative pose odometry and geometric measurements, and we crafted a classifier model that simulates perceptual aliasing.

In the evaluation we compare three approaches: local estimation only, our approach, and our approach with double counting, i.e. with ξ^{r,r_i}_{k-1} = 1 and φ^{r,r_i}_{k-1} = 1 in Eqs. (19) and (20), respectively. In all benchmarks we average the results over the robots. The parameters are presented in the supplementary material Sec. 6.
Figure 2: Simulation figures; (a) presents the ground truth of the scenario: red points represent the initial positions of the robots, differently colored lines represent the different robots, and green points represent the object poses. (b) and (c) present the average x̃^w_avg for robot and object positions, respectively, as a function of time. (d) and (e) present the corresponding square root of the average position covariance for robots and objects, respectively.

As explained in Sec. 4.4, when double counting occurs, the posterior class probability converges to extreme values more quickly, and may result in either completely right or completely wrong classifications. Therefore, reasoning about a single run is insufficient, and a statistical study is required. To quantify classification accuracy, we sample 100 different sets of geometric and semantic measurements and perform a statistical study over the results. For that, we use the mean square detection error (MSDE), averaged over all objects, robots, and runs (also used by Teacy et al. [17] and Feldman & Indelman [6]). We define MSDE per robot and object as follows:
MSDE \triangleq \frac{1}{m} \sum_{i=1}^{m} \big( P_{gt}(c = i) - P(c = i \mid H^R_k) \big)^2,   (24)

where P_gt(c = i) represents the classification ground truth, being 1 for the correct class and 0 for all other classes. Therefore MSDE = 1 for a completely incorrect classification, allowing us to perform a statistical study of the effects of double counting on discrete random variables. To quantify localization accuracy, we use the estimation error x̃^w_avg, the weighted average of the Euclidean distance between the estimated and ground-truth poses.
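A minimal sketch of Eq. (24) for a single object; the function name is illustrative.

```python
import numpy as np

def msde(p_posterior, true_class):
    """Mean square detection error of Eq. (24).

    p_posterior : posterior class-probability vector (length m)
    true_class  : index of the ground-truth class
    """
    p_gt = np.zeros(len(p_posterior))
    p_gt[true_class] = 1.0
    return np.mean((p_gt - p_posterior) ** 2)

# A confident, correct posterior gives a small MSDE; a completely
# incorrect one over m = 2 classes gives MSDE = 1, as stated above.
print(msde(np.array([0.9, 0.1]), 0))  # 0.01
print(msde(np.array([0.0, 1.0]), 0))  # 1.0
```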
Figure 3: (a) presents the average MSDE for the robots over 100 runs with different measurements. The remaining panels correspond to time k = 60 of r_1. (b) and (d) present multiple SLAM hypotheses for the local and distributed settings, respectively; black dots with gray ellipses represent object pose estimates, red & blue markers with red ellipses represent robot pose estimates, and green and red points represent the ground truth of object and robot positions, respectively. (c) and (e) present the class probabilities of c = 1 for the objects observed thus far, for the local and distributed settings respectively; the X marks represent the ground truth (1 for class c = 1, 0 for class c = 2).

Fig. 2 presents results for the continuous variables, i.e. robot and object poses. Figs. 2b and 2c show a clear advantage for our approach, where the localization error is the smallest for robots and objects, respectively, after the first 10 time steps. Figs. 2d and 2e present the estimation covariance, where the double-counting approach has the smallest values, as expected. Fig. 2e shows 'spikes' in the average object position covariance; these correspond to new object detections, where the localization uncertainty is still high.

Fig. 3 visualizes classification and estimation at time k = 60 for the local-only and distributed beliefs of robot r_1. At that time, robot r_1 has communicated earlier with r_2, and communicates for the first time with r_3. When comparing Fig. 3b (local) to Fig. 3d (distributed), the number of possible class realizations is reduced. In addition, the estimate of r_1's pose, as well as of the objects, is more certain and accurate. When comparing Figs. 3c and 3e, the latter presents a larger map, i.e. more objects observed, and the class estimates are closer to the ground truth. Fig. 3a presents the average MSDE over 100 runs, where as a whole our approach shows lower MSDE values, i.e. statistically stronger classification results. In the supplementary material we present additional classification and SLAM results.

5.2 Experiment

In our scenario, 3 robots move within an environment containing multiple objects. We scattered 6 chairs within the environment and photographed them using a camera on a stand, keeping a constant height. Fig. 4a shows an image from the scenario with the corresponding bounding box. The chairs were detected with a YOLOv3 DarkNet detector [18], which provided bounding boxes, and each bounding box was then classified using a ResNet50 convolutional neural network [19]. We considered 3 candidate classes out of the 1000: 'barber chair', 'punching bag', and 'traffic light', as c = 1, 2, 3 respectively, with c = 1 being the ground-truth class. We trained three viewpoint-dependent classifier models using three sets of relative pose and class probability vector pairs, with the spatial parameters being the yaw and pitch angles from camera to object; the models are presented in the supplementary material Sec. 9. For the ground-truth class we photographed an object from multiple viewpoints, and then classified the images using ResNet50. For the other two classifier models, we sampled class probability vectors with a larger probability for the corresponding class of the model, and used the same relative poses as for the first model. Figs. 4b and 4c present the expectation of c = 1 for two of the classifier models as a function of the spatial parameters.

In the experiment (deployment phase), we utilized both geometric and semantic measurements, using the corresponding (learned) measurement likelihood models.
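A sketch of the semantic measurement pipeline just described (detection with YOLOv3, classification with ResNet50). The wrapper objects and the ImageNet class indices below are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Hypothetical ImageNet-1k indices for the three candidate classes:
# barber chair, punching bag, traffic light.
CANDIDATES = [423, 747, 920]

def semantic_measurements(crops, classifier):
    """Turn detector crops into semantic measurements z^{sem}.

    crops      : image patches extracted from the YOLOv3 bounding boxes
    classifier : ResNet50-style wrapper returning a 1000-way softmax
    """
    z = []
    for patch in crops:
        scores = np.asarray(classifier.predict(patch))
        probs = scores[CANDIDATES]
        z.append(probs / probs.sum())  # renormalize over the M = 3 classes
    return z
```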
Relative pose geometric measurements, for odometry and between camera and objects, were generated by corrupting the ground truth with Gaussian noise, while the semantic measurements were provided by YOLOv3 and ResNet50 from real images. For parameter details, see the supplementary material Sec. 9. The same metrics as in the simulation are used here.

Fig. 5 presents SLAM results for the same benchmarks as in Fig. 2. Figs. 5b and 5c present the average x̃^w_avg over all robots for robot and object positions, respectively. In general, the advantage of our approach is evident, with lower errors.

Figure 4: (a) an image used in the experiment, with the corresponding bounding box. (b) and (c) present the class probability expectation of c = 1 for the classifier models of c = 1 and c = 2, respectively.
Figure 5: Experiment figures; (a) presents the ground truth of the scenario: red points represent the initial positions of the robots, differently colored lines represent the different robots, and green points represent the object poses. (b) and (c) present the average x̃^w_avg for robot and object positions, respectively, as a function of time. (d) and (e) present the corresponding square root of the average position covariance for robots and objects, respectively.

In addition, Figs. 5d and 5e present a similar pattern to Figs. 2d and 2e, respectively, where the covariance of our approach is smaller than in the single-robot case but larger than in the over-confident double-counting case.

For classification results, Fig. 6a shows the average MSDE per robot as a function of the time step; eventually our approach outperforms both the single-robot and the double-counting cases, with a higher probability for the correct class realization. Fig. 6 also presents SLAM and classification results for robot r_2 at time step k = 35, showing trends similar to Fig. 3. Comparing Fig. 6b and Fig. 6d, the latter shows more accurate SLAM, with fewer class realizations. In addition, compared to Fig. 6c, Fig. 6e shows more accurate classification, with an additional object classified.

For additional results at different time steps, refer to the supplementary material Secs. 10-11 and the multimedia submission.
Figure 6: (a) presents the average MSDE for the robots over 100 runs with different measurements. The remaining panels correspond to time k = 35 of r_2. (b) and (d) present multiple SLAM hypotheses for the local and distributed settings, respectively; black dots with gray ellipses represent object pose estimates, red & blue markers with red ellipses represent robot pose estimates, and green and red points represent the ground truth of object and robot poses, respectively. (c) and (e) present the class probabilities of c = 1 and c = 2 for the objects observed thus far, for the local and distributed settings respectively, with blue and orange for classes 1 and 2. In this case, the ground-truth class of all objects is c = 1.

6 Conclusions
We presented an approach for multi-robot semantic SLAM in an unknown environment. In this approach, a distributed hybrid belief is maintained per robot, using local information transmitted between robots as a 'stack' designed to keep estimation consistency without complex book-keeping, for both continuous and discrete states. We utilized a viewpoint-dependent classifier model to account for the coupling between the relative pose of robot and object and the object's class. In simulation and a real-world experiment we showed that our approach improves classification and localization performance while avoiding double counting. Future work will incorporate data association disambiguation.
References

[1] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. D. Reid, and J. J. Leonard, "Simultaneous localization and mapping: Present, future, and the robust-perception age," IEEE Trans. Robotics, vol. 32, no. 6, pp. 1309-1332, 2016.

[2] A. Bahr, M. Walter, and J. Leonard, "Consistent cooperative localization," in IEEE Intl. Conf. on Robotics and Automation (ICRA), May 2009, pp. 3415-3422.

[3] A. Cunningham, V. Indelman, and F. Dellaert, "DDF-SAM 2.0: Consistent distributed smoothing and mapping," in IEEE Intl. Conf. on Robotics and Automation (ICRA), Karlsruhe, Germany, May 2013.

[4] S. Omidshafiei, B. T. Lopez, J. P. How, and J. Vian, "Hierarchical Bayesian noise inference for robust real-time probabilistic object classification," arXiv preprint arXiv:1605.01042, 2016.

[5] D. Kopitkov and V. Indelman, "Robot localization through information recovered from CNN classificators," in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), October 2018.

[6] Y. Feldman and V. Indelman, "Bayesian viewpoint-dependent robust classification under model and localization uncertainty," in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2018.

[7] V. Tchuiev, Y. Feldman, and V. Indelman, "Data association aware semantic mapping and localization via a viewpoint-dependent classifier model," in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2019.

[8] J. M. Walls, A. G. Cunningham, and R. M. Eustice, "Cooperative localization by factor composition over a faulty low-bandwidth communication channel," in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2015.

[9] S. Roumeliotis and G. Bekey, "Distributed multi-robot localization," IEEE Trans. Robot. Automat., August 2002.

[10] A. Howard, "Multi-robot simultaneous localization and mapping using particle filters," Intl. J. of Robotics Research, vol. 25, no. 12, pp. 1243-1256, 2006.

[11] S. Choudhary, L. Carlone, C. Nieto, J. Rogers, H. I. Christensen, and F. Dellaert, "Distributed mapping with privacy and communication constraints: Lightweight algorithms and object-based models," Intl. J. of Robotics Research, vol. 36, no. 12, pp. 1286-1311, 2017.

[12] V. Indelman, P. Gurfil, E. Rivlin, and H. Rotstein, "Graph-based distributed cooperative navigation for a general multi-robot measurement model," Intl. J. of Robotics Research, vol. 31, no. 9, August 2012.

[13] A. Cunningham, M. Paluri, and F. Dellaert, "DDF-SAM: Fully distributed SLAM using constrained factor graphs," in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2010.

[14] T. Bailey, S. Julier, and G. Agamennoni, "On conservative fusion of information with unknown non-Gaussian dependence," in Intl. Conf. on Information Fusion (FUSION), 2012, pp. 1876-1883.

[15] F. Dellaert, "Factor graphs and GTSAM: A hands-on introduction," Georgia Institute of Technology, Tech. Rep. GT-RIM-CP&R-2012-002, September 2012.

[16] V. Indelman, E. Nelson, J. Dong, N. Michael, and F. Dellaert, "Incremental distributed inference from arbitrary poses and unknown data association: Using collaborating robots to establish a common reference," IEEE Control Systems Magazine (CSM), Special Issue on Distributed Control and Estimation for Robotic Vehicle Networks, vol. 36, no. 2, pp. 41-74, 2016.

[17] W. Teacy, S. J. Julier, R. De Nardi, A. Rogers, and N. R. Jennings, "Observation modelling for vision-based target search by unmanned aerial vehicles," in Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), 2015, pp. 1607-1614.

[18] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.

[19] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.

Supplementary Material
Technical Report ANPL-2020-01
Vladimir Tchuiev and Vadim Indelman
1 A Useful Relation

We present a relation that is used in the equation derivations. Let A be a random variable conditioned on a set {B_i} of random variables B_i that are independent of each other. Using Bayes law, we can split the conditional probability P(A | {B_i}) into a product of conditional probabilities:

P(A \mid \{B_i\}) = \frac{P(\{B_i\} \mid A)\, P(A)}{P(\{B_i\})} = \frac{\prod_i P(B_i \mid A)}{\prod_i P(B_i)}\, P(A).   (1)

Using Bayes law again on each element of the product, we reach the following expression:

P(A \mid \{B_i\}) = \prod_i \left( \frac{P(A \mid B_i)}{P(A)} \right) P(A).   (2)

This allows expressing the posterior over a random variable as a product of conditionals, which will be useful for separating local and external measurements.

2 Derivation of P(X^R_k \setminus x^r_k | C^R_k, H^{R-}_k)

Recall the splitting of H^{R-}_k into the prior history H^R_{k-1} and the non-local measurements and actions:

H^{R-}_k = H^R_{k-1} \cup \Delta H^{R-}_k.   (3)

We then use the above definition and relation (2) to split P(X^R_k \ x^r_k | C^R_k, H^{R-}_k) into a product of two beliefs, one that depends on the prior history and one that depends on the new external measurements:

P(X^R_k \setminus x^r_k \mid C^R_k, H^{R-}_k) = P(X^R_k \setminus x^r_k \mid C^R_k) \cdot \frac{P(X^R_k \setminus x^r_k \mid C^R_k, \Delta H^{R-}_k)}{P(X^R_k \setminus x^r_k \mid C^R_k)} \cdot \frac{P(X^R_k \setminus x^r_k \mid C^R_k, H^R_{k-1})}{P(X^R_k \setminus x^r_k \mid C^R_k)}.   (4)

This formulation allows us to isolate the new information sent by other robots at time k from information already used for inference at previous times. Next, we must address the fact that not all known objects are present in the transmitted local beliefs. Because the priors are assumed independent between poses and classes, P(X^R_k \ x^r_k | C^R_k) = P(X^R_k \ x^r_k). In P(X^R_k \ x^r_k | C^R_k, H^R_{k-1}) we can split X^R_k \ x^r_k into poses of objects that are involved in H^R_{k-1} and ones that are not:

P(X^R_k \setminus x^r_k \mid C^R_k, H^R_{k-1}) = P(X^{o,R}_{new,k} \mid C^R_{new,k}, H^R_{k-1}) \cdot P(X^R_{k-1} \mid C^R_k, H^R_{k-1}).   (5)

Poses of objects that r was not aware of at time k−1 do not participate in H^R_{k-1}, and, without measurements, X^{o,R}_{new,k} is independent of C^R_{new,k} as well. In addition, X^R_{k-1} is independent of the classes of objects that are observed only at time k, thus we can write

P(X^R_k \setminus x^r_k \mid C^R_k, H^R_{k-1}) = P(X^{o,R}_{new,k}) \cdot P(X^R_{k-1} \mid C^R_{k-1}, H^R_{k-1}),   (6)

which is the prior over poses of objects newly known at time step k, multiplied by the conditional continuous belief over objects already known. Similarly to Eq. (6), the prior over X^R_k \ x^r_k separates into previously known and new objects:

P(X^R_k \setminus x^r_k) = P(X^{o,R}_{new,k}) \cdot P(X^R_{k-1}),   (7)

therefore we can write

\frac{P(X^R_k \setminus x^r_k \mid C^R_k, H^R_{k-1})}{P(X^R_k \setminus x^r_k \mid C^R_k)} = \frac{b^R_{k-1}}{P(X^R_{k-1})},   (8)

and substitute it into the rightmost fraction in Eq. (4). We then cancel out P(X^R_k \ x^r_k | C^R_k) and proceed to remove r's poses from the prior via

\frac{P(X^R_k \setminus x^r_k \mid C^R_k, \Delta H^{R-}_k)}{P(X^R_{k-1})} = \frac{P(X^{o,R}_k \mid C^R_k, \Delta H^{R-}_k)}{P(X^{o,R}_{k-1})},   (9)

as robot r's poses up to time k−1 do not participate in ΔH^{R-}_k. Combining P(X^R_k \ x^r_k | C^R_k) and Eqs. (8) and (9) with Eq. (4), we reach the expression used in the paper:

P(X^R_k \setminus x^r_k \mid C^R_k, H^{R-}_k) = b^R_{k-1} \cdot \frac{P(X^{o,R}_k \mid C^{o,R}_k, \Delta H^{R-}_k)}{P(X^{o,R}_{k-1})}.   (10)

3 Dependence Between Object Classes

In this section we show that the classes of two objects are not independent. We present a simple example where c_1 and c_2 are the underlying classes of objects 1 and 2, respectively.
Let H^R be the total measurement history, including semantic measurements z^{sem}_1 and z^{sem}_2 for objects 1 and 2, respectively. Recall that measurements are assumed independent of each other. Using Bayes law:

P(c_1, c_2 \mid H^R) \propto P(c_1, c_2 \mid H^R \setminus \{z^{sem}_1, z^{sem}_2\})\, P(z^{sem}_1 \mid c_1)\, P(z^{sem}_2 \mid c_2).   (11)

We use a viewpoint-dependent classifier model, so we must marginalize P(z^{sem}_1 | c_1) P(z^{sem}_2 | c_2) over the corresponding relative viewpoints, denoted x^{rel}_1 and x^{rel}_2 respectively:

P(z^{sem}_1 \mid c_1)\, P(z^{sem}_2 \mid c_2) = \int_{x^{rel}_1, x^{rel}_2} P(z^{sem}_1 \mid c_1, x^{rel}_1)\, P(z^{sem}_2 \mid c_2, x^{rel}_2)\, P(x^{rel}_1, x^{rel}_2 \mid H^R)\, dx^{rel}_1\, dx^{rel}_2.   (12)

From the above equation, the condition for c_1 and c_2 to be independent is that x^{rel}_1 and x^{rel}_2 be independent, which is not true in the general case; thus c_1 and c_2 are dependent.

4 Derivation of P(X^{o,R}_k | C^R_k, ΔH^{R-}_k)/P(X^{o,R}_k) (Continuous Belief Update)

Using relation (2), we can split the external continuous term by separating the new measurements and actions per robot, excluding r itself as it is not present in ΔH^{R-}_k:

P(X^{o,R}_k \mid C^R_k, \Delta H^{R-}_k) = \left( \prod_{r_i \in R \setminus r} \frac{P(X^{o,R}_k \mid C^{r_i}_{k_i}, \Delta H^{r_i}_{k_i})}{P(X^{o,R}_k)} \right) P(X^{o,R}_k).   (13)

We now address each element of the product. Poses of objects that r_i does not observe locally can be canceled out as follows, leaving only the poses of objects that r_i observed:

\frac{P(X^{o,R}_k \mid C^{r_i}_{k_i}, \Delta H^{r_i}_{k_i})}{P(X^{o,R}_k)} = \frac{P(X^{o,r_i}_{k_i} \mid C^{r_i}_{k_i}, \Delta H^{r_i}_{k_i})}{P(X^{o,r_i}_{k_i})}.   (14)

Then, using relation (2) again, we can expand P(X^{o,r_i}_{k_i} | C^{r_i}_{k_i}, H^{r_i}_{k_i}) to separate between the known prior and the new measurements:

P(X^{o,r_i}_{k_i} \mid C^{r_i}_{k_i}, H^{r_i}_{k_i}) = P(X^{o,r_i}_{k_i}) \cdot \frac{P(X^{o,r_i}_{k_i} \mid C^{r_i}_{k_i}, H^{r_i}_{k_i - l})}{P(X^{o,r_i}_{k_i})} \cdot \frac{P(X^{o,r_i}_{k_i} \mid C^{r_i}_{k_i}, \Delta H^{r_i}_{k_i})}{P(X^{o,r_i}_{k_i})},   (15)

with l being the time difference between subsequent slots of r_i (which can be 0 if the slot is not updated). We then take out all the poses that do not depend on the prior information, and reach the definition of ξ^{r,r_i}_{k-1}, i.e. the marginal belief over object poses at the previous time:

\frac{P(X^{o,r_i}_{k_i} \mid C^{r_i}_{k_i}, H^{r_i}_{k_i - l})}{P(X^{o,r_i}_{k_i})} = \frac{P(X^{o,r_i}_{k_i - l} \mid C^{r_i}_{k_i - l}, H^{r_i}_{k_i - l})}{P(X^{o,r_i}_{k_i - l})} \triangleq \xi^{r,r_i}_{k-1}.   (16)

With the definition of ξ^{r,r_i}_k, and by substituting Eq. (16) into Eq. (15), we reach the expression for a single element of the product in Eq. (13):

\frac{P(X^{o,r_i}_{k_i} \mid C^{r_i}_{k_i}, \Delta H^{r_i}_{k_i})}{P(X^{o,r_i}_{k_i})} = \frac{\xi^{r,r_i}_k}{\xi^{r,r_i}_{k-1}}.   (17)

Finally, substituting Eq. (17) into Eq. (13), we reach the expression for the external continuous update:

\frac{P(X^{o,R}_k \mid C^R_k, \Delta H^{R-}_k)}{P(X^{o,R}_k)} = \prod_{r_i \in R} \frac{\xi^{r,r_i}_k}{\xi^{r,r_i}_{k-1}}.   (18)

5 Derivation of P(C^R_k | ΔH^{R-}_k)/P(C^R_k) (Discrete Belief Update)

The discrete belief update is similar to the continuous one in its derivation, with probabilities over class realizations rather than object poses. Again, using relation (2), we split the external discrete term by separating the new measurements and actions per robot, excluding r itself as it is not present in ΔH^{R-}_k:

P(C^R_k \mid \Delta H^{R-}_k) = \left( \prod_{r_i \in R \setminus r} \frac{P(C^R_k \mid \Delta H^{r_i}_{k_i})}{P(C^R_k)} \right) P(C^R_k).   (19)

We address each element of the product. Classes of objects that r_i does not observe locally can be canceled out as follows, leaving only the classes of objects that r_i observed:

\frac{P(C^R_k \mid \Delta H^{r_i}_{k_i})}{P(C^R_k)} = \frac{P(C^{r_i}_{k_i} \mid \Delta H^{r_i}_{k_i})}{P(C^{r_i}_{k_i})}.   (20)
Then, using relation (2) again, we can expand P(C^{r_i}_{k_i} | H^{r_i}_{k_i}) to separate between the known prior and the new measurements:

P(C^{r_i}_{k_i} \mid H^{r_i}_{k_i}) = P(C^{r_i}_{k_i}) \cdot \frac{P(C^{r_i}_{k_i} \mid H^{r_i}_{k_i - l})}{P(C^{r_i}_{k_i})} \cdot \frac{P(C^{r_i}_{k_i} \mid \Delta H^{r_i}_{k_i})}{P(C^{r_i}_{k_i})},   (21)

with l being the time difference between subsequent slots of r_i (which can be 0 if the slot is not updated). We then take out all the classes of objects that do not appear in the prior information, and reach the definition of φ^{r,r_i}_{k-1}, i.e. the marginal belief over object classes at the previous time:

\frac{P(C^{r_i}_{k_i} \mid H^{r_i}_{k_i - l})}{P(C^{r_i}_{k_i})} = \frac{P(C^{r_i}_{k_i - l} \mid H^{r_i}_{k_i - l})}{P(C^{r_i}_{k_i - l})} \triangleq \varphi^{r,r_i}_{k-1}.   (22)

With the definition of φ^{r,r_i}_k, and by substituting Eq. (22) into Eq. (21), we reach the expression for a single element of the product in Eq. (19):

\frac{P(C^{r_i}_{k_i} \mid \Delta H^{r_i}_{k_i})}{P(C^{r_i}_{k_i})} = \frac{\varphi^{r,r_i}_k}{\varphi^{r,r_i}_{k-1}}.   (23)

Finally, substituting Eq. (23) into Eq. (19), we reach the expression for the external discrete update:

\frac{P(C^R_k \mid \Delta H^{R-}_k)}{P(C^R_k)} = \prod_{r_i \in R} \frac{\varphi^{r,r_i}_k}{\varphi^{r,r_i}_{k-1}}.   (24)

6 Simulation: Parameters
We consider a Gaussian motion model with diagonal noise covariance Σ_w, and a Gaussian geometric measurement model with diagonal noise covariance Σ^{geo}_v, identical for all robots. The crafted classifier models are mirrored between the two classes and affine in sin(ψ): the first component of h_c(c = 1, ψ) has the form α·sin(ψ) + β and its second component the form γ·(1 − sin(ψ)), with the two components swapped for h_c(c = 2, ψ). Here h_c(c = i, ψ) ∈ R^M is the predicted probability vector given that the object class is i. Recall that our semantic measurements z^{sem,r}_k are probability vectors as well. ψ is the relative orientation between robot and object, computed from the relative pose x^{rel}_k ≜ x^o ⊖ x_k. The measurement covariance is defined via the square-root information matrix R, such that Σ_c ≜ (R^T R)^{-1}. Both the geometric and semantic measurements are limited to 10 meters from the robot's pose. The class measurements are most ambiguous at the viewpoint where h_c = [0.5, 0.5]^T for both classes.
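A minimal sketch of such a crafted, viewpoint-dependent classifier model. The coefficients α = γ = 0.25 and β = 0.75 below are an assumption: they are the unique choice for which the two components always sum to one and both classes predict [0.5, 0.5] at ψ = −90° (the fully ambiguous viewpoint is itself an assumption here).

```python
import numpy as np

def h_c(c, psi):
    """Predicted class-probability vector for class c at relative yaw psi
    (M = 2 classes; the two class models mirror each other)."""
    p = 0.25 * np.sin(psi) + 0.75
    q = 0.25 * (1.0 - np.sin(psi))
    return np.array([p, q]) if c == 1 else np.array([q, p])

print(h_c(1, -np.pi / 2))  # [0.5, 0.5] -- fully ambiguous viewpoint
print(h_c(1,  np.pi / 2))  # [1.0, 0.0] -- fully informative viewpoint
```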
In this section we present a table of stack time stamps that indicates direct and indirect communication between robots in our scenario. Recall that the maximal communication radius is 10 meters; thus robots r2 and r3 communicate from time k = 6, while robot r1 starts communicating with the others from time k = 13. Each stack entry below is listed as (t(r1), t(r2), t(r3)).

Time step | Stack of r1 | Stack of r2 | Stack of r3
k = 1 | (1, 0, 0) | (0, 0, 0) | (0, 0, 1)
k = 2 | (2, 0, 0) | (0, 0, 0) | (0, 0, 2)
k = 3 | (3, 0, 0) | (0, 3, 0) | (0, 0, 3)
k = 4 | (4, 0, 0) | (0, 4, 0) | (0, 0, 4)
k = 5 | (5, 0, 0) | (0, 5, 0) | (0, 0, 5)
k = 6 | (6, 0, 0) | (0, 6, 5) | (0, 5, 6)
k = 7 | (7, 0, 0) | (0, 7, 6) | (0, 6, 7)
k = 8 | (8, 0, 0) | (0, 8, 7) | (0, 7, 8)
k = 9 | (9, 0, 0) | (0, 9, 8) | (0, 8, 9)
k = 10 | (10, 0, 0) | (0, 10, 9) | (0, 9, 10)
k = 11 | (11, 0, 0) | (0, 11, 10) | (0, 10, 11)
k = 12 | (12, 0, 0) | (0, 12, 11) | (0, 11, 12)
k = 13 | (13, 12, 12) | (12, 13, 12) | (12, 12, 13)
k = 14, …, 60 | (k, k−1, k−1) | (k−1, k, k−1) | (k−1, k−1, k)
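To make the bookkeeping above concrete, the following is a minimal sketch of how such a stack of time stamps can be maintained. The names (StampStack, step, receive) are our own illustration rather than the paper's implementation, and the sketch assumes every robot starts collecting data at k = 1; merging stamps by element-wise maximum is what produces the indirect updates visible in the table.

```python
import copy
from typing import Dict, Iterable


class StampStack:
    """Per-robot record of the latest time step whose data has been
    received from each robot, directly or through an intermediary."""

    def __init__(self, robot_ids: Iterable[str], own_id: str):
        self.own_id = own_id
        self.stamps: Dict[str, int] = {r: 0 for r in robot_ids}

    def step(self, k: int) -> None:
        # A robot always holds its own data up to the current step.
        self.stamps[self.own_id] = k

    def receive(self, sender: "StampStack") -> None:
        # Element-wise maximum: stamps of robots the sender has heard
        # from propagate onward, producing the indirect updates.
        for r, t in sender.stamps.items():
            self.stamps[r] = max(self.stamps[r], t)


# Replay the start of the simulation scenario: r2 and r3 enter the
# 10 m communication radius at k = 6 and exchange their stacks.
ids = ["r1", "r2", "r3"]
robots = {r: StampStack(ids, r) for r in ids}
for k in range(1, 7):
    if k >= 6:  # exchange the stacks held after step k - 1
        snap = {r: copy.deepcopy(s) for r, s in robots.items()}
        robots["r2"].receive(snap["r3"])
        robots["r3"].receive(snap["r2"])
    for s in robots.values():
        s.step(k)
print(robots["r2"].stamps)  # {'r1': 0, 'r2': 6, 'r3': 5}, the k = 6 row
```

Exchanging the stacks held after the previous step reproduces the one-step lag of the other robots' stamps seen in the table rows above.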
Simulation: Additional Results

In this section we present additional results for the simulation. In Figs. 1, 2, 3, and 4 we show the beliefs at various stages of the path.

[Figure 1 panels: (a) local SLAM, X axis [m] vs. Y axis [m], objects O1–O15; (b) local classification, probability of weight for objects 4 and 2; (c) local SLAM, as in (a); (d) local classification, probability of weight for objects 3, 4, 2, and 12.]
Figure 1: Local beliefs of two of the robots, at times k = 15 and k = 20 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

[Figure 2 panels: (a) distributed SLAM, X axis [m] vs. Y axis [m], objects O1–O15; (b) distributed classification, probability of weight for objects 4, 3, 2, 7, 8, and 1; (c) distributed SLAM, as in (a); (d) distributed classification, probability of weight for objects 3, 4, 2, 7, 8, 1, 13, 15, and 12.]
Figure 2: Distributed beliefs of two of the robots, at times k = 15 and k = 20 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

The results in all the graphs support the results reported in the paper: both classification and SLAM are, in general, more accurate under the distributed belief. In addition, robots inferring the distributed belief take into account objects that they did not observe directly.
In Fig. 5 we show the computation time of each inference step for the distributed case, with and without double counting. In general, computation time is driven by the number of class realizations that are not pruned, and is higher when the robots communicate with each other. For each newly observed object the algorithm must consider all class realizations of that object, hence the computation time spikes at the first step in which the new object is observed.
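Since per-step computation scales with the number of class realizations that survive pruning, the following is a minimal sketch of the kind of weight-based pruning described above; the function name, threshold value, and renormalization scheme are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def prune_realizations(weights: np.ndarray, threshold: float = 1e-3):
    """Discard class realizations whose hybrid-belief weight falls
    below the threshold and renormalize the survivors."""
    keep = weights >= threshold
    survivors = weights[keep] / weights[keep].sum()
    return keep, survivors


# A newly observed object with 3 candidate classes multiplies the
# number of realizations by 3, which is the source of the computation
# time spikes in Fig. 5; pruning keeps this growth in check.
w = np.array([0.55, 0.30, 0.10, 0.04, 0.0099, 0.0001])
keep, w_pruned = prune_realizations(w)
print(keep)      # [ True  True  True  True  True False]
print(w_pruned)  # renormalized weights of the surviving realizations
```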
[Figure 3 panels: (a) local SLAM, X axis [m] vs. Y axis [m], objects O1–O15; (b) local classification, probability of weight for objects 4, 2, 3, 15, and 11; (c) local SLAM, as in (a); (d) local classification, probability of weight for objects 7, 8, 1, 13, 5, 14, 10, and 11.]
Figure 3: Local beliefs of two of the robots, at times k = 25 and k = 50 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

[Figure 4 panels: (a) distributed SLAM, X axis [m] vs. Y axis [m], objects O1–O15; (b) distributed classification, probability of weight for objects 4, 3, 2, 7, 8, 1, 15, 13, 12, 11, and 9; (c) distributed SLAM, as in (a); (d) distributed classification, probability of weight for objects 7, 8, 1, 4, 2, 3, 13, 15, 12, 11, 9, 5, 14, 6, and 10.]
Figure 4: Distributed beliefs of two of the robots, at times k = 25 and k = 50 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

[Figure 5: axes Time step vs. Comp. time per step [s]; curves: Distributed, Double Counting.]
Figure 5: Calculation time per inference step, in seconds, as a function of the time step.

Experiment: Parameters
We consider a motion model with noise covariance Σ_w = diag(0.·, 0.·, 0.·) and a geometric measurement model with noise covariance Σ_{v,geo} = diag(0.·, 0.·, 0.·). Class c = 1 is 'Barber Chair' and is considered our ground truth; class c = 2 is 'Punching Bag' and class c = 3 is 'Traffic Light'. We trained the classifiers using pairs of relative poses and probability vectors: for c = 1 we used images of the chair used in the experiment, while for c = 2 and c = 3 we sampled measurements from a Dirichlet distribution with parameters α = [5, ·, 3] and α = [5, ·, ·], respectively. The classifier models are functions of the relative yaw angle ψ and the relative pitch angle θ of the camera, as viewed from the object's frame of reference.
Fig. 6 presents four of the images used in the experiment, with bounding boxes for the chairs. Figs. 7, 8, and 9 present the trained expected probability values for each relative ψ and θ, i.e. P(z_sem | c = i, ψ, θ), with a different i for each figure; subfigures (a) to (c) present the measurement probability of classes c = 1 to c = 3, respectively.
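To make the measurement generation concrete, the following is a minimal sketch of sampling a semantic measurement z_sem given the ground-truth class and the relative viewpoint (ψ, θ). The function name, the cosine surface standing in for the model trained on chair images, and the α vectors are illustrative placeholders, not the trained values.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_semantic_measurement(c: int, psi: float, theta: float) -> np.ndarray:
    """Sample a probability vector z_sem given the ground-truth class c
    and the relative yaw psi / pitch theta of the camera, expressed in
    the object's frame of reference."""
    if c == 1:
        # Stand-in for the model trained on real chair images: the
        # probability of the correct class varies smoothly with viewpoint.
        p1 = 0.5 + 0.4 * np.cos(psi) * np.cos(theta)
        mean = np.array([p1, (1 - p1) / 2, (1 - p1) / 2])
        return rng.dirichlet(50.0 * mean)  # concentrated around the mean
    # Classes 2 and 3 are sampled from a Dirichlet, as in the text; the
    # alpha vectors below are hypothetical placeholders.
    alpha = {2: np.array([5.0, 1.0, 3.0]), 3: np.array([5.0, 1.0, 1.0])}
    return rng.dirichlet(alpha[c])


z = sample_semantic_measurement(1, psi=0.3, theta=-0.1)
print(z, z.sum())  # a probability vector over the three classes
```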
Figure 6: Four of the experiment images, shown with the corresponding bounding boxes.

[Figure 7 panels: (a) c = 1 model, c = 1 prob.; (b) c = 1 model, c = 2 prob.; (c) c = 1 model, c = 3 prob.; axes: ψ [rad], θ [rad], predicted probability.]
Figure 7: Classifier model for c = 1, trained on real images: probabilities of classes 1 to 3 as functions of the relative yaw and pitch angles, presented in (a) to (c) respectively. Higher surfaces have a bluer color.

[Figure 8 panels: (a) c = 2 model, c = 1 prob.; (b) c = 2 model, c = 2 prob.; (c) c = 2 model, c = 3 prob.; axes: ψ [rad], θ [rad], predicted probability.]
Figure 8: Classifier model for c = 2, trained on real images: probabilities of classes 1 to 3 as functions of the relative yaw and pitch angles, presented in (a) to (c) respectively. Higher surfaces have a bluer color.

[Figure 9 panels: (a) c = 3 model, c = 1 prob.; (b) c = 3 model, c = 2 prob.; (c) c = 3 model, c = 3 prob.; axes: ψ [rad], θ [rad], predicted probability.]
Figure 9: Classifier model for c = 3, trained on real images: probabilities of classes 1 to 3 as functions of the relative yaw and pitch angles, presented in (a) to (c) respectively. Higher surfaces have a bluer color.

In this section we present a table of stack time stamps that indicates direct and indirect communication between robots in our scenario. Recall that the maximal communication radius is 3 meters; thus all robots start communicating with each other at step k = 6. The irregular entries around k = 21 to k = 29 reflect temporary communication dropouts. Each stack entry below is listed as (t(r1), t(r2), t(r3)).

Time step | Stack of r1 | Stack of r2 | Stack of r3
k = 1 | (1, 0, 0) | (0, 0, 0) | (0, 0, 1)
k = 2 | (2, 0, 0) | (0, 0, 0) | (0, 0, 2)
k = 3 | (3, 0, 0) | (0, 3, 0) | (0, 0, 3)
k = 4 | (4, 0, 0) | (0, 4, 0) | (0, 0, 4)
k = 5 | (5, 0, 0) | (0, 5, 0) | (0, 0, 5)
k = 6 | (6, 5, 5) | (5, 6, 5) | (5, 5, 6)
k = 7, …, 20 | (k, k−1, k−1) | (k−1, k, k−1) | (k−1, k−1, k)
k = 21 | (21, 19, 19) | (19, 21, 20) | (19, 20, 21)
k = 22 | (22, 19, 19) | (19, 22, 21) | (19, 21, 22)
k = 23 | (23, 19, 22) | (21, 23, 21) | (22, 21, 23)
k = 24 | (24, 19, 23) | (21, 24, 21) | (23, 21, 24)
k = 25 | (25, 23, 24) | (21, 25, 24) | (24, 24, 25)
k = 26 | (26, 24, 25) | (24, 26, 25) | (25, 25, 26)
k = 27 | (27, 25, 26) | (25, 27, 26) | (26, 26, 27)
k = 28 | (28, 26, 27) | (26, 28, 27) | (27, 27, 28)
k = 29 | (29, 27, 28) | (27, 29, 28) | (28, 28, 29)
k = 30 | (30, 29, 29) | (29, 30, 29) | (29, 29, 30)
k = 31, …, 40 | (k, k−1, k−1) | (k−1, k, k−1) | (k−1, k−1, k)
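Using the hypothetical StampStack sketch from the simulation section, the dropout rows above fall out naturally: when a robot hears from nobody, receive() is never called, so the foreign stamps freeze while its own stamp keeps advancing.

```python
# Continuing the hypothetical StampStack sketch from above: a dropout
# means no receive() call, so foreign stamps freeze while the robot's
# own stamp keeps advancing.
s = StampStack(["r1", "r2", "r3"], "r1")
s.stamps = {"r1": 20, "r2": 19, "r3": 19}  # r1's stack after k = 20
for k in (21, 22):  # r1 is out of range of both other robots
    s.step(k)
print(s.stamps)  # {'r1': 22, 'r2': 19, 'r3': 19}, the k = 22 row for r1
```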
Experiment: Additional Results

In this section we present additional results for the experiment, showing the beliefs at various stages of the path.

[Figure 10 panels: (a) local SLAM, X axis [m] vs. Y axis [m], objects O1–O6; (b) local classification, probability of weight for object 1; (c) local SLAM, as in (a); (d) local classification, probability of weight for objects 5, 1, 4, and 6.]
Figure 10: Local beliefs of two of the robots, at times k = 15 and k = 20 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

[Figure 11 panels: (a) distributed SLAM, X axis [m] vs. Y axis [m], objects O1–O6; (b) distributed classification, probability of weight for objects 1, 5, 3, and 4; (c) distributed SLAM, as in (a); (d) distributed classification, probability of weight for objects 5, 3, 1, 4, and 6.]
Figure 11: Distributed beliefs of two of the robots, at times k = 15 and k = 20 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

As in the simulation, the results in all the graphs support the results reported in the paper: both classification and SLAM are, in general, more accurate under the distributed belief. In addition, robots inferring the distributed belief take into account objects that they did not observe directly.
In Fig. 14 we show the computation time of each inference step for the distributed case, with and without double counting. In general, computation time is driven by the number of class realizations that are not pruned, and is higher when the robots communicate with each other. For each newly observed object the algorithm must consider all class realizations of that object, hence the computation time spikes at the first step in which the new object is observed. Because the classifier model in the experiment uses deep neural networks, computation is slower than in the simulation, where hand-crafted models were used.
[Figure 12 panels: (a) local SLAM, X axis [m] vs. Y axis [m], objects O1–O6; (b) local classification, probability of weight for objects 1, 3, 5, and 6; (c) local SLAM, as in (a); (d) local classification, probability of weight for objects 5, 3, 1, 4, and 2.]
Figure 12: Local beliefs of two of the robots, at times k = 35 and k = 40 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

[Figure 13 panels: (a) distributed SLAM, X axis [m] vs. Y axis [m], objects O1–O6; (b) distributed classification, probability of weight for objects 1, 5, 3, 4, 6, and 2; (c) distributed SLAM, as in (a); (d) distributed classification, probability of weight for objects 5, 3, 1, 4, 6, and 2.]
Figure 13: Distributed beliefs of two of the robots, at times k = 35 and k = 40 respectively; (a) and (b) show results for the first robot, (c) and (d) for the second. (a) and (c) present SLAM results, (b) and (d) present classification results.

[Figure 14: axes Time step vs. Comp. time per step [s]; curves: Distributed, Double Counting.]
Figure 14: Calculation time per inference step, in seconds, as a function of the time step.