Multi-Robot Target Search using Probabilistic Consensus on Discrete Markov Chains
Aniket Shirsat, Karthik Elamvazhuthi, and Spring Berman
Abstract — In this paper, we propose a probabilistic consensus-based multi-robot search strategy that is robust to communication link failures, and thus is suitable for disaster-affected areas. The robots, capable of only local communication, explore a bounded environment according to a random walk modeled by a discrete-time discrete-state (DTDS) Markov chain and exchange information with neighboring robots, resulting in a time-varying communication network topology. The proposed strategy is proved to achieve consensus, here defined as agreement on the presence of a static target, with no assumptions on the connectivity of the communication network. Using numerical simulations, we investigate the effect of the robot population size, domain size, and information uncertainty on the consensus time statistics under this scheme. We also validate our theoretical results with 3D physics-based simulations in Gazebo. The simulations demonstrate that all robots achieve consensus in finite time with the proposed search strategy over a range of robot densities in the environment.
I. INTRODUCTION

Disaster areas, such as regions affected by earthquakes and floods, experience great disruption to communication and power infrastructure. This presents challenges in coordinating searches for survivors and dispersing relief teams to those locations. Teams of mobile robots have proved to be useful for exploring and mapping environments in disaster response scenarios [1], [2], [3]. However, such robots are subject to constraints on the payloads that they can carry, including power sources, sensors, embedded processors, actuators, and communication devices for transmitting information to other agents and/or to a command center. In addition, many multi-robot control strategies rely on a communication network for coordination. Centralized exploration strategies like [4] rely on constant communication between agents and a central node. However, these strategies do not scale well with the number of agents, since the communication bandwidth becomes a bottleneck with increasing agent population size. Moreover, such strategies suffer from a single point of failure, i.e., a disruption to the central node causes loss of communication for all the agents.

These drawbacks can be overcome by employing decentralized exploration strategies that involve only local communication between agents. However, communication can become unreliable as the number of agents increases [6],
This work was supported by ONR Young Investigator Award N00014-16-1-2605 and by the Arizona State University Global Security Initiative. Aniket Shirsat and Spring Berman are with the School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ 85287, USA {ashirsat, Spring.Berman}@asu.edu. Karthik Elamvazhuthi is with the Department of Mathematics, University of California, Los Angeles, CA 90095, USA [email protected].

Fig. 1: Overhead view of the problem scenario, simulated in Gazebo 9 [5]. Multiple aerial robots, flying at different heights, search for a target represented by the magenta box using a Markov chain motion model.
[17]; however, it requires a connected communication network. Although random mobility models are commonly used in multi-robot exploration, e.g. [18], [19], [20], few works consider consensus problems for agents that perform probabilistic search strategies, and thus have randomly time-varying communication networks.

To address this problem, we present and analyze a probabilistic multi-agent search strategy that is based on a distributed consensus protocol. The proposed strategy is decentralized and asynchronous and relies on only limited communication among agents. Thus, it can be employed in applications, such as disaster response scenarios, where it is infeasible to maintain a connected communication network, rendezvous, or communicate with a central node. The agents move according to a discrete-time discrete-state (DTDS) Markov chain model on a finite spatial grid, as illustrated in Figure 1. We consider only static features here, which represent persistent characteristics of the target(s) that the agents are searching for in the environment. The main contributions of this paper are the following:

1) We prove that agents with a DTDS Markov motion model and local communication will achieve consensus, in an almost sure sense, on the presence of a static feature of interest in a bounded environment.

2) Our proof does not require the assumption that the agent communication network remain connected over a non-zero finite time interval, as assumed in [21] for a similar consensus problem over a time-varying network.

We validate our theoretical results with Monte Carlo simulations in MATLAB and with 3D physics simulations performed in Gazebo 9 [5] using the Robot Operating System (ROS).
From the simulation results, we empirically characterize the dependence of the expected time until consensus on the number of agents, the grid size, and the agent density, which can be used to guide the selection of the number of agents to search a given environment.

The remainder of the paper is organized as follows. Section II presents the problem formulation, and Section III describes the probabilistic motion model of the agents. Section IV proves that all agents will reach consensus on the presence of the feature under our stochastic search strategy. Section V presents example implementations of our strategy in numerical and 3D physics simulations and discusses the results. Section VI concludes and suggests future work.

II. PROBLEM STATEMENT

We consider an unknown, bounded environment that contains a finite, non-zero number of static features of interest, indexed by the set
I ⊂ Z+, where Z+ is the set of positive integers. A set of N agents, indexed by the set N = {1, 2, ..., N}, explore the environment using a random walk strategy. We assume that each agent can localize itself in the environment and can detect a feature within its sensing range. When an agent a ∈ N detects a feature at discrete time k, it associates a scalar information state ξ_a[k] ∈ R≥0 with its current position. The vector of information states for all agents at time k is denoted by ξ[k]. Defining U(0,1) as the uniform probability distribution on the interval [0,1], the initial information state of each agent a is specified a priori as ξ_a[0] ∼ U(0,1). (This assumption implies the existence of a uniform upper bound on the interval between successive meeting times of any two agents, which is not guaranteed for agents that evolve stochastically on a finite connected state space.) The agent can communicate its information state ξ_a[k] at time k to all agents within a disc of radius r_comm ∈ (0, δ], where δ is the maximum communication radius. We define these agents as the set of neighbors of agent a at time k, denoted by N_a^k. In addition, we assume that the agents can avoid obstacles during their exploration. Since the agents are constantly moving, the set of agents with which they can communicate changes over time. The time evolution of this communication network is determined by the random walks of the agents throughout the bounded environment. This approach uses low communication bandwidth, since each agent only transmits a scalar value associated with each feature that it detects.

We discretize the environment, as shown in Figure 2, into a square grid of nodes spaced at a distance d apart. The set of nodes is denoted by S ⊂ Z+. We define S = |S|.
Let G_s = (V_s, E_s) be an undirected graph associated with this finite spatial grid, where V_s is the set of nodes and E_s is the set of edges (i, j) that signify pairs of nodes i, j ∈ V_s between which agents can travel. We refer to these pairs of nodes as neighboring nodes. Each agent performs a random walk on this grid, moving from its current node i to a neighboring node j at the next time step with transition probability p_ij. Let Z_k^a ∈ S be a random variable that represents the index of the node that an agent a ∈ N occupies at the discrete time k. For each agent a, the probability mass function π_k ∈ R^{1×S} of Z_k^a evolves according to a DTDS Markov chain:

π_{k+1} = π_k P,   (1)

where the state transition matrix P ∈ R^{S×S} has elements p_ij ∈ [0, 1] at row i ∈ S and column j ∈ S.

We assume that no prior information about possible search locations is available. To cover the search area uniformly, each agent is deployed from a random node on the spatial grid. These initial agent positions are chosen independently of one another and are identically distributed according to the probability mass function π_0, defined as a discrete uniform distribution over the set of nodes. We define ξ_r ∈ R≥0 as a scalar reference information state that is associated with the set of nodes Z_r ⊂ S from which an agent can detect a feature. In this work, we consider environments with a single feature of interest.

We now define another graph that models the time-varying communication topology of the agents as they move along the spatial grid. Let G_c[k] = (V_c, E_c[k]) be an undirected graph in which V_c = N, the set of agents, and E_c[k] is the set of all pairs of agents (a, b) ∈ N × N that can communicate with each other at time k. Let M[k] ∈ R^{N×N} be the adjacency matrix with elements m_ab[k] = 1 if (a, b) ∈ E_c[k] and m_ab[k] = 0 otherwise.
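As an illustrative aside (a sketch of our own, not the authors' implementation), the transition matrix P and the update π_{k+1} = π_k P can be realized numerically for a small grid. The snippet assumes the uniform transition rule p_ij = 1/(d_i + 1) that Section III gives in Equation (7), with a self-edge at every node; the function names and the grid size are ours.

```python
# Sketch of the DTDS Markov chain (1): build P for a c x c grid and
# evolve a probability mass function pi_{k+1} = pi_k P.
# Assumes the uniform rule p_ij = 1/(d_i + 1) of Eq. (7), with a
# self-edge at every node.

def neighbors(i, c):
    """Grid neighbors (up/down/left/right) of node i, row-major order."""
    r, col = divmod(i, c)
    out = []
    if r > 0:
        out.append(i - c)
    if r < c - 1:
        out.append(i + c)
    if col > 0:
        out.append(i - 1)
    if col < c - 1:
        out.append(i + 1)
    return out

def transition_matrix(c):
    """Row-stochastic S x S transition matrix for an S = c*c node grid."""
    S = c * c
    P = [[0.0] * S for _ in range(S)]
    for i in range(S):
        nbrs = neighbors(i, c)        # d_i = 2 (corner), 3 (edge), 4 (interior)
        p = 1.0 / (len(nbrs) + 1)     # uniform over neighbors plus staying put
        P[i][i] = p
        for j in nbrs:
            P[i][j] = p
    return P

def step(pi, P):
    """One update pi_{k+1} = pi_k P (row vector times matrix)."""
    S = len(P)
    return [sum(pi[i] * P[i][j] for i in range(S)) for j in range(S)]

if __name__ == "__main__":
    c = 4
    P = transition_matrix(c)
    pi = [1.0 / (c * c)] * (c * c)    # uniform initial deployment pi_0
    for _ in range(50):
        pi = step(pi, P)
    print(round(sum(pi), 6))          # total probability mass is preserved
```

Each row of P sums to one, so P is row-stochastic, and the mass of π is preserved under the update; π_k therefore remains a probability distribution at every step.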
We define L[k] ∈ R^{N×N} as the graph Laplacian, whose elements are l_ab[k] = Σ_{b=1}^N m_ab[k] = deg(v_a) if a = b, and l_ab[k] = −m_ab[k] if a ≠ b.

Given the agent dynamics (1) on the spatial grid, each agent a updates its information state at each time k according to a consensus protocol similar to one developed in [22]. This update is based on the agent's current information; the information from all its neighboring agents, of which there are at most d_max = N − 1; and the reference information state:

ξ_a[k+1] = ξ_a[k] − α Σ_{b ∈ N_a^k} l_ab[k](ξ_a[k] − ξ_b[k]) − g_a(ξ_a[k] − ξ_r),   (2)

where a, b ∈ N; α is a constant, chosen such that α ∈ (0, 1/d_max) [12]; and g_a is defined as:

g_a = 1 if Z_k^a ∈ Z_r, and g_a = 0 otherwise.   (3)

Algorithm 1: Control strategy for agent a ∈ N
  Input: α, g_a, ε, ξ_r; ξ_a[0] ∼ U(0,1); Z_0^a ← i ∈ S
  Output: k, ξ_a[k] for which |ξ_a[k] − ξ_r| ≤ ε
  k ← 0
  while |ξ_a[k] − ξ_r| > ε do
      sum1 ← 0; sum2 ← 0
      forall b ∈ N_a^k do   /* agents a, b communicate */
          sum1 ← sum1 − α l_ab[k](ξ_a[k] − ξ_b[k])
      end
      if i ∈ Z_r then   /* agent a detects feature */
          sum2 ← −g_a(ξ_a[k] − ξ_r)
      end
      ξ_a[k+1] ← ξ_a[k] + sum1 + sum2
      Z_{k+1}^a ← j, (i, j) ∈ E_s, with probability p_ij
      i ← j
      k ← k + 1
  end

In the next two sections, we will show that when agents move on the spatial grid according to (1) and exchange information with their neighbors according to (2), they achieve average consensus on their information states, defined as follows:

Definition II.1.
We say that the vector ξ[k] converges almost surely to average consensus if

ξ[k] → ξ_r 1 almost surely,   (4)

where 1 ∈ R^{N×1} is a vector of ones.

This implies that the agents' individual information states will eventually converge to a common information state that indicates the presence of the object being searched. We define T_c as the time k at which every agent's information state ξ_a[k] reaches ξ_r within a small tolerance ε, where 0 ≤ ε ≪ 1; i.e., |ξ_a[T_c] − ξ_r| < ε for all agents a ∈ N. We consider T_c to be the time at which the agents reach consensus.

The implementation of this probabilistic search strategy on each agent is described in the pseudocode shown in Algorithm 1. We illustrate the strategy for a scenario with two quadrotors in Figure 2. The quadrotors start at the spatial grid nodes indexed by i and j and move on the grid according to the DTDS Markov chain dynamics in (1). The figure shows sample paths of the quadrotors. The orange quadrotor detects the feature, indicated by a magenta star, when it moves to a node in the set Z_r (at these nodes, the feature is within the quadrotor's sensing range). The quadrotors meet at grid node m after k = 9 time steps and exchange information according to (2). They stop the search if their information states are within ε of ξ_r; otherwise, they continue to random-walk on the grid.

Fig. 2: Illustration of our multi-agent search strategy, showing sample paths for two quadrotors (orange and red) on a square grid. The quadrotors search the environment for a static target (the magenta star) as they perform a random walk on the grid.

III. ANALYSIS OF THE MARKOV CHAIN MODEL OF AGENT MOBILITY
Consider the DTDS Markov chain that governs the probability mass function of the state Z_k^a, defined as the location of agent a at time k on the spatial grid that represents the environment. Then, the time evolution of agent a's movement in this finite state space can be expressed by using the Markov property as follows:

Pr(Z_{k+1}^a = j | Z_k^a = i, Z_{k−1}^a = m, . . . , Z_0^a = l) = Pr(Z_{k+1}^a = j | Z_k^a = i),   (5)

where the second expression is the probability with which an agent at node i transitions to node j at time k + 1, and m, l ∈ Z+.

A. State Transition Matrix
The Markov chain (1) is expressed in terms of the state transition matrix P. The time-invariant matrix P is defined by the state space of the spatial grid representing the discretized environment. Hence, the Markov chain is time-homogeneous, which implies that Pr(Z_{k+1}^a = j | Z_k^a = i) is the same for all agents at all times k. The entries of P, which are the state transition probabilities, can therefore be defined as

p_ij = Pr(Z_{k+1}^a = j | Z_k^a = i), ∀i, j ∈ S, k ∈ Z+, ∀a ∈ N.   (6)

Since each agent chooses its next node from a uniform distribution, these entries can be computed as

p_ij = 1/(d_i + 1) if (i, j) ∈ E_s, and p_ij = 0 otherwise,   (7)

where d_i is the degree of the node i ∈ S, defined as d_i = 2 if i is a corner of the spatial grid, d_i = 3 if it is on an edge between two corners, and d_i = 4 otherwise. Since each entry p_ij ≥ 0, we use the notation P ≥ 0. We see that P^m ≥ 0 for m ≥ 1. Hence, P is a non-negative matrix. Using Theorem 5 in [23], we can conclude that the state transition matrix P is a stochastic matrix.

B. Stationary Distribution
A stationary distribution of a Markov chain is defined as follows.
Definition III.1. (Page 227 in [23]) The vector π ∈ R^{1×S} is called a stationary distribution of a Markov chain if π has entries such that π_j ≥ 0 for all j ∈ S, Σ_{j=1}^S π_j = 1, and π P = π.

Thus, if π is a stationary distribution, we can say that

π P^k = π, ∀k ∈ Z+.   (8)

From the construction of the Markov chain (1), each agent has a positive probability of moving from any node i ∈ S to any other node j ∈ S of the spatial grid in a finite number of time steps. As a result, the Markov chain Z_k^a is an irreducible Markov chain, and therefore P is an irreducible matrix. From Lemma 8.4.4 (Perron-Frobenius) in [24], we know that there exists a real unique positive left eigenvector of P. Moreover, since P is a stochastic matrix, its spectral radius ρ(P) is equal to 1. Therefore, we can conclude that this left eigenvector is the stationary distribution of the corresponding DTDS Markov chain. We will next apply the following theorem.

Theorem III.1. (Theorem 21.12 in [25]) An irreducible Markov chain with transition matrix P is positive recurrent if and only if there exists a probability distribution π such that π P = π.

Since we have shown that the Markov chain is irreducible and has a stationary distribution π, which satisfies π P = π, we can conclude from Theorem III.1 that the Markov chain is positive recurrent. Thus, all states in the Markov chain are positive recurrent, which implies that each agent will keep visiting every state on the finite spatial grid infinitely often.

IV. ANALYSIS OF CONSENSUS ON AGENTS' INFORMATION STATES
The dynamics of all agents' movements on the spatial grid can be modeled by a composite Markov chain with states defined as Z_k = (Z_k^1, Z_k^2, ..., Z_k^N) ∈ M, where M = S^N. Note that S = |S| and |M| = S^N. We define an undirected graph Ĝ = (V̂, Ê) that is associated with the composite Markov chain. The vertex set V̂ is the set of all possible realizations î ∈ M of Z_k. The notation î(a) represents the a-th entry of î, which is the spatial node i ∈ S occupied by agent a. We define the edge set Ê of the graph Ĝ as follows: (î, ĵ) ∈ Ê if and only if (î(a), ĵ(a)) ∈ E_s for all agents a ∈ N. Let Q ∈ R^{|M|×|M|} be the state transition matrix associated with the composite Markov chain. The elements of Q, denoted by q_îĵ, are computed from the transition probabilities defined by Equation (7) as follows:

q_îĵ = Π_{a=1}^N p_{î(a)ĵ(a)}, ∀î, ĵ ∈ M.   (9)

In the above expression, q_îĵ is the probability that in the next time step, each agent a will move from spatial node î(a) to node ĵ(a).

Fig. 3: A graph G_s = (V_s, E_s) defined on the set of spatial nodes V_s = {i, j, l}. The arrows signify directed edges between pairs of distinct nodes or self-edges. The edge set of the graph is E_s = {(i, i), (j, j), (l, l), (i, j), (j, l)}.

Fig. 4: A subset of the composite graph Ĝ = (V̂, Ê) for 2 agents that move on the graph G_s shown in Figure 3.

For example, consider a set of two agents, N = {1, 2}, that move on the graph G_s as shown in Figure 3. The agents can stay at their current node in the next time step or travel between nodes i and j and between nodes j and l, but they cannot travel between nodes i and l. Figure 4 shows a subset of the resulting composite graph Ĝ.
The set of nodes in the graph Ĝ is V̂ = {(i,i), (i,j), (i,l), (j,i), (j,j), (j,l), (l,i), (l,j), (l,l)}. Each node in V̂ is labeled by a single index î, e.g., î = (i, j), with î(1) = i and î(2) = j. Due to the connectivity of the spatial grid defined by E_s, we can for example identify ((i,j), (i,l)) as an edge in Ê, but not ((i,j), (l,l)). Since N = 2 and S = 3, we have that |M| = 3^2 = 9. For each î, ĵ ∈ V̂, we can compute the transition probabilities in Q ∈ R^{9×9} from Equation (9) as follows:

q_îĵ = Pr(Z_{k+1} = ĵ | Z_k = î) = p_{î(1)ĵ(1)} p_{î(2)ĵ(2)},  k ∈ Z+.   (10)

We now define ξ̂[k] = [ξ_1[k] ξ_2[k] . . . ξ_N[k] ξ_r]^T ∈ R^{N+1} as an augmented information state vector. The dynamics of information exchange among the agents modeled by Equation (2) can then be represented in matrix form as follows:

ξ̂[k + 1] = H[k] ξ̂[k],   (11)

where H[k] ∈ R^{(N+1)×(N+1)} is defined as

H[k] = [ I − α L[k] − diag(d)   d ;  0   1 ],   (12)

in which d = [g_1 g_2 . . . g_N]^T, 0 ∈ R^{1×N} is a vector of zeros, and I ∈ R^{N×N} is the identity matrix.

We associate Equation (11) with a graph G_r[k], an expansion of the graph G_c[k] that includes information flow from the feature nodes Z_r to agents that occupy these nodes. Here we consider the feature as an additional agent a_f = N + 1, which remains fixed. Let G_r[k] = (V_r, E_r[k]) be a directed graph in which V_r = N ∪ {a_f}, the set of agents and the feature, and E_r[k] = E_c[k] ∪ E_f[k], where E_f[k] is the set of agent-feature pairs (a, a_f) for which Z_k^a ∈ Z_r at time k. In this graph, information flows in one direction from the feature nodes to all agents that occupy a feature node on the finite spatial grid at time k.
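To make the construction concrete, the following sketch (our own illustration, not the authors' code) builds the 3-node chain of Fig. 3, forms the 9 × 9 composite matrix Q via the product rule (9)-(10), and runs the information update (2) in its standard Laplacian form for the two agents. The parameter values here are assumptions for illustration only: the feature sits at node l, ξ_r = 1, α = 0.5 (within (0, 1/d_max) for d_max = N − 1 = 1), tolerance ε = 10⁻³, and agents communicate when they occupy the same node.

```python
import random

# Two agents on the 3-node path graph of Fig. 3 (nodes 0=i, 1=j, 2=l),
# with self-edges everywhere and no edge between nodes 0 and 2; rows
# follow the rule p_ij = 1/(d_i + 1) of Eq. (7).
P = [[0.5, 0.5, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 0.5, 0.5]]

# Composite chain of Eqs. (9)-(10): q = product of per-agent probabilities.
Q = [[P[a1][b1] * P[a2][b2]
      for b1 in range(3) for b2 in range(3)]
     for a1 in range(3) for a2 in range(3)]

XI_R, FEATURE, ALPHA, EPS = 1.0, 2, 0.5, 1e-3  # assumed values (see lead-in)

def simulate(seed=1, max_steps=50_000):
    """Run the random walk + consensus update; return the consensus step."""
    rng = random.Random(seed)
    pos = [0, 0]                        # both agents start at node i
    xi = [rng.random(), rng.random()]   # xi_a[0] ~ U(0, 1)
    for k in range(max_steps):
        if all(abs(x - XI_R) <= EPS for x in xi):
            return k                    # consensus time T_c
        new_xi = []
        for a in range(2):
            s = xi[a]
            for b in range(2):          # neighbors: agents sharing a node
                if b != a and pos[b] == pos[a]:
                    s -= ALPHA * (xi[a] - xi[b])
            if pos[a] == FEATURE:       # g_a = 1 at a feature node, Eq. (3)
                s -= xi[a] - XI_R
            new_xi.append(s)
        xi = new_xi
        # Each agent steps independently according to its row of P.
        pos = [rng.choices(range(3), weights=P[p])[0] for p in pos]
    return None

T_c = simulate()
```

Every row of Q sums to one, consistent with the product form (9), and in our runs the loop terminates with both information states within ε of ξ_r, illustrating on this small example the almost-sure consensus established by Theorem IV.1.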
In addition, information flows bidirectionally between agents that are neighbors at time k.

We now prove the main result of this paper in the following theorem, which shows that all agents will track the reference feature in the environment almost surely and in a distributed fashion.

Theorem IV.1.
Consider a group of N agents whose information states evolve according to Equation (11). The information states of all agents will converge to the reference information state ξ_r almost surely.

Proof. Suppose that at an initial time k_0, the locations of the N agents on the spatial grid are represented by the node î ∈ V̂. Consider another set of agent locations at a future time k_0 + k_1, represented by the node ĵ ∈ V̂. The transition of the agents from configuration î to configuration ĵ in k_1 time steps corresponds to a random walk of length k_1 on the composite Markov chain Z_k from node î to node ĵ. It also corresponds to a random walk by each agent a on the spatial grid from node î(a) to node ĵ(a) in k_1 time steps. By construction, the graph G_s is strongly connected and each of its nodes has a self-edge. Thus, there exists a discrete time n > 0 such that, for each agent a, there exists a random walk on the spatial grid from node î(a) to node ĵ(a) in n time steps. Consequently, there always exists a random walk of length n on the composite Markov chain Z_k from node î to node ĵ. Therefore, Z_k is an irreducible Markov chain. All states of an irreducible Markov chain belong to a single communication class. In this case, all states are positive recurrent. As a result, each state of Z_k is visited infinitely often by the group of agents. Moreover, because the composite Markov chain is irreducible, we can conclude that ∪_{k∈Z+} G_c[k] = G, where G is the complete graph on the set of agents N, and therefore that ∪_{k∈Z+} G_r[k] contains a directed spanning tree with ξ_r as the fixed root. Since this union of graphs has a spanning tree, we can apply Theorem 3.1 in [26] to conclude that the information state of each agent will converge to ξ_r almost surely. The notation θ(k) and F_θ(k) in [26] corresponds to our definitions of Z_k and H[k], respectively.

V. SIMULATION RESULTS
We validate the result on average information consensus in Theorem IV.1 with numerical simulations in MATLAB and 3D physics-based software-in-the-loop (SITL) simulations developed in ROS Melodic and Gazebo 9 [5]. In the simulations, multiple agents perform random walks on a finite spatial grid according to the dynamics in Equation (1). Each grid is defined as a square lattice with c = √S nodes on each side, where the distance between neighboring nodes is d = 1 m. The state transition probabilities p_ij of the corresponding graph G_s are defined according to Equation (7). Since our largest simulated agent population is N = 14, and the parameter α must be less than 1/d_max = 1/(N − 1) [12], we set α to a value below 1/13 ≈ 0.077. The tolerance ε defining the time until consensus was set to 0.01. All simulations were run on a desktop computer with 16 GB of RAM and an Intel Xeon 3.0 GHz 16-core processor with an NVIDIA Quadro M4000 graphics processor.

A. Numerical Simulations
We performed large ensembles of Monte Carlo simulations to investigate the effect of the number of agents N, the spatial grid dimension c, and the resulting agent density N/c² on the expected time until the agents reach consensus, i.e., agree that the feature of interest is present. Quantifying the effect of these factors is necessary in order to determine the number of agents that should search a given area. This would help first responders to optimally distribute resources for searching a disaster-affected environment.

Each agent is modeled as a point mass that can move between adjacent nodes on the graph G_s, as illustrated in Figure 2. We assume that the agents can localize on G_s. The set of neighbors N_a^k of an agent a at time k consists of all agents that occupy the same spatial node as agent a at that time. The feature can be detected by an agent located at a fixed set of nodes Z_r of the spatial grid, and the reference information state of the feature is defined as ξ_r = 1.

To investigate the dependence of the expected time to reach consensus, E[T_c], on the number of agents N and the spatial grid dimension c, we simulated scenarios with different combinations of N (up to our largest population, N = 14) and grid dimensions c. For each scenario, we ran 1000 simulations with random initial agent positions and computed the mean time µ at which the agents reached consensus.

Fig. 5: Mean time (s) until consensus is reached, µ, versus number of agents N and spatial grid dimension c. Each value of µ is averaged over 1000 Monte Carlo simulations of scenarios with the corresponding values of N and c.

Fig. 6: Mean time (s) until consensus is reached, µ, versus agent density N/c² for the simulation data plotted in Figure 5.

Figure 5 plots the values of µ versus N and c for each simulated scenario, and Figure 6 plots µ versus the corresponding agent density, N/c². We observe from these figures that a decrease in the agent density results in an increase in µ.
This can be attributed to low agent encounter rates with other agents and with feature nodes at low agent densities. Using the curve fitting toolbox in MATLAB and the data from Figure 6, we see that there is an exponential relation between E[T_c] and the agent density, of the form E[T_c] = a e^{−b N/c²} for fitted constants a and b. Figure 6 shows that the expected time until consensus does not decrease appreciably for agent densities above a threshold value; thus, for a given grid size c, deploying agents beyond this threshold density yields diminishing returns in search time.

For selected combinations of N and c, we also computed the standard deviation σ of the time to reach consensus over the corresponding 1000 simulations.

TABLE I: Time until consensus is reached (µ ± σ), computed from 1000 Monte Carlo simulations of scenarios with N = 5, c = 5, and different values of the reference information state: the exact value ξ_r = 1 and a noisy value drawn from a Gaussian distribution with mean 1.

Figure 7a plots µ ± σ versus N for a fixed grid dimension c = 5, and Figure 7b plots µ ± σ versus c for a fixed number of agents N = 5. Figure 7a shows that for a relatively small grid size (c = 5), both µ and σ do not vary substantially with N. Thus, a small number of agents would be sufficient to search such an environment, since increasing the agent density would not significantly speed up the search or reduce the variability in the time until consensus. Figure 7b indicates that for a fixed group size of N = 5 agents, both µ and σ increase monotonically with the size of the grid. This trend suggests that more agents should be deployed if the predicted time until consensus and/or the variability in this time is too high for a given environment.

We illustrate the agents' consensus dynamics with two cases of the simulation runs. Figure 8 plots the time evolution of the agent information states for each case.
In the first case, N = 2 agents traverse a spatial grid with dimension c = 3. From Figure 8a, we see that the time until consensus, i.e. the time at which both agents' information states converge within ε of the reference state ξ_r = 1, is approximately 160 s. We also simulate N = 5 agents that traverse a spatial grid with dimension c = 10. Figure 8b shows that the time until consensus has increased to about 570 s in this case, which is within one standard deviation σ of the mean consensus time µ computed from our Monte Carlo analysis, as shown in Figure 7b for c = 10.

We also studied the effect on E[T_c] of uncertainty in the agents' identification of the feature nodes (i.e., ξ_r is a random variable), which may arise in practice due to factors such as sensor noise, occlusion of features, and inter-agent communication failures. We ran 1000 Monte Carlo simulation runs for each of two scenarios, all with N = 5 agents moving on a spatial grid with dimension c = 5 m. For each scenario, Table I shows the mean µ and standard deviation σ of the time until the agents reach consensus. To investigate the effect of uncertainty in feature identification, we specified that agents either perfectly identify the feature, in which case ξ_r = 1, or obtain noisy measurements of the feature, in which case ξ_r is normally distributed with mean 1. From Table I, we observe that the addition of noise to the agents' measurements of the feature results in an increase in both µ and σ. However, despite information uncertainty, the agents successfully achieve consensus.

B. 3D Physics Simulations
We also tested our search strategy in physics-based simulations. A snapshot of the Gazebo simulation environment is shown in Figure 1. The agents are modeled as quadrotors with a plus frame configuration. We assume that the agents can accurately localize in the environment using onboard inertial and GPS sensors. The analysis of our probabilistic consensus strategy under localization uncertainty is beyond the scope of this paper. We also assume that the feature of interest is known to be present in the environment, but its location is unknown.

Each quadrotor is equipped with a downward-facing RGB camera. The feature of interest is modeled as a magenta box, which the agents detect from their camera images using a color-based classifier. We added zero-mean Gaussian noise to the photometric intensity in the camera sensor model. We also used a standard plumb bob distortion model to account for camera lens distortion. The quadrotors are spaced 0.5 m apart in altitude in order to prevent collisions. The altitude difference causes slight disparities in the quadrotors' field of view (FOV), but this does not significantly affect the performance of the search strategy.

Fig. 7: Time until consensus is reached, averaged over 1000 Monte Carlo simulations of scenarios with (a) varying numbers of agents N and grid dimension c = 5; (b) varying c and N = 5. The circles mark mean times µ, and the error bars show standard deviations σ.

Fig. 8: Time evolution of the agent information states ξ_a[k] in simulations of (a) N = 2 agents moving on a 3 × 3 grid and (b) N = 5 agents moving on a 10 × 10 grid.

We simulated two scenarios: N = 2 robots flying at distinct altitudes and traversing a 3 × 3 grid, and N = 5 robots flying at distinct altitudes and traversing a 5 × 5 grid. The video attachment (also online at https://youtu.be/j74jeWQ0HM0) shows a simulation run of the second scenario. Figure 9a and Figure 9b plot the time evolution of the agent information states over a single simulation run of each scenario. The information states sometimes display steep drops in value, as in the plots of two of the agents' states in Figure 9b from 50 s to 70 s. These drops can be attributed to the following factors: (1) an agent updates its information state with states communicated by its neighbors, according to the consensus protocol; (2) an agent that is at the feature node stops detecting the feature below when another agent at a lower altitude enters its field of view, occluding the feature; (3) spurious measurements like false positives may have been introduced by an agent's sensors. Despite the unmodeled effects of the second and third factors on the information states, the agents still successfully reach consensus during the Gazebo simulations. We see that the time until consensus is reached in Figure 9a and Figure 9b is about 210 s and 250 s, respectively. The delays in these times compared to the times in the Monte Carlo simulations in Figure 7 can be attributed to the second and third factors described above and to the inertia of the quadrotors, which affect the Gazebo simulations but not the Monte Carlo simulations.

VI. CONCLUSION AND FUTURE WORK
In this paper, we have presented a probabilistic searchstrategy for multiple agents with local sensing and com-
Fig. 9: Time evolution of the robot information states ξa[k] in Gazebo simulation runs of (a) N = 2 robots moving on a 3 × 3 grid and (b) N = 5 robots moving on a 5 × 5 grid.

REFERENCES

[1] Nathan Michael, Shaojie Shen, Kartik Mohta, Vijay Kumar, Keiji Nagatani, Yoshito Okada, Seiga Kiribayashi, Kazuki Otake, Kazuya Yoshida, Kazunori Ohno, et al. Collaborative mapping of an earthquake damaged building via ground and aerial robots. In Field and Service Robotics, pages 33–47. Springer, 2014.
[2] Wolfram Burgard, Mark Moors, Cyrill Stachniss, and Frank E Schneider. Coordinated multi-robot exploration. IEEE Transactions on Robotics, 21(3):376–386, 2005.
[3] Keiji Nagatani, Seiga Kiribayashi, Yoshito Okada, Kazuki Otake, Kazuya Yoshida, Satoshi Tadokoro, Takeshi Nishimura, Tomoaki Yoshida, Eiji Koyanagi, Mineo Fukushima, et al. Emergency response to the nuclear accident at the Fukushima Daiichi Nuclear Power Plants using mobile rescue robots. Journal of Field Robotics, 30(1):44–63, 2013.
[4] Reid Simmons, David Apfelbaum, Wolfram Burgard, Dieter Fox, Mark Moors, Sebastian Thrun, and Håkan Younes. Coordination for multi-robot exploration and mapping. In AAAI/IAAI, pages 852–858, 2000.
[5] Nathan Koenig and Andrew Howard. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In , volume 3, pages 2149–2154. IEEE, 2004.
[6] Andrew Howard, Lynne E Parker, and Gaurav S Sukhatme. Experiments with a large heterogeneous mobile robot team: Exploration, mapping, deployment and detection. The International Journal of Robotics Research, 25(5-6):431–447, 2006.
[7] Ammar Husain, Heather Jones, Balajee Kannan, Uland Wong, Tiago Pimentel, Sarah Tang, Shreyansh Daftry, Steven Huber, and William L Whittaker. Mapping planetary caves with an autonomous, heterogeneous robot team. In , pages 1–13. IEEE, 2013.
[8] Demetri P Spanos, Reza Olfati-Saber, and Richard M Murray. Dynamic consensus on mobile networks. In IFAC World Congress, pages 1–6. Citeseer, 2005.
[9] Wei Ren, Randal W Beard, and Ella M Atkins. Information consensus in multivehicle cooperative control. IEEE Control Systems, 27(2):71–82, 2007.
[10] Wei Ren and Randal W Beard. Consensus of information under dynamically changing interaction topologies. In , volume 6, pages 4939–4944. IEEE, 2004.
[11] M Mesbahi and M Egerstedt. Graph theoretic methods in multiagent networks. Princeton University Press, Princeton, NJ, 2010.
[12] Reza Olfati-Saber and Richard M Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004.
[13] Ramviyas Parasuraman, Jonghoek Kim, Shaocheng Luo, and Byung-Cheol Min. Multipoint rendezvous in multirobot systems. IEEE Transactions on Cybernetics, 50(1):310–323, 2018.
[14] Xi Yu and M Ani Hsieh. Synthesis of a time-varying communication network by robot teams with information propagation guarantees. IEEE Robotics and Automation Letters, 5(2):1413–1420, 2020.
[15] Dieter Fox, Jonathan Ko, Kurt Konolige, Benson Limketkai, Dirk Schulz, and Benjamin Stewart. Distributed multirobot exploration and mapping. Proceedings of the IEEE, 94(7):1325–1339, 2006.
[16] Regis Vincent, Dieter Fox, Jonathan Ko, Kurt Konolige, Benson Limketkai, Benoit Morisset, Charles Ortiz, Dirk Schulz, and Benjamin Stewart. Distributed multirobot exploration, mapping, and task allocation. Annals of Mathematics and Artificial Intelligence, 52(2-4):229–255, 2008.
[17] Jinwen Hu, Lihua Xie, Kai-Yew Lum, and Jun Xu. Multiagent information fusion and cooperative control in target search. IEEE Transactions on Control Systems Technology, 21(4):1223–1235, 2012.
[18] Fredy Martinez, Edwar Jacinto, and Diego Acero. Brownian motion as exploration strategy for autonomous swarm robots. In , pages 2375–2380. IEEE, 2012.
[19] Israel A Wagner, Michael Lindenbaum, and Alfred M Bruckstein. Robotic exploration, Brownian motion and electrical resistance. In International Workshop on Randomization and Approximation Techniques in Computer Science, pages 116–130. Springer, 1998.
[20] Alan FT Winfield. Distributed sensing and data collection via broken ad hoc wireless connected networks of mobile robots. In Distributed Autonomous Robotic Systems 4, pages 273–282. Springer, 2000.
[21] Ragesh K. Ramachandran, Zahi Kakish, and Spring Berman. Information correlated Lévy walk exploration and distributed mapping using a swarm of robots. IEEE Transactions on Robotics, 2020.
[22] Wei Ren and Randal W Beard. Consensus tracking with a reference state. Distributed Consensus in Multi-vehicle Cooperative Control: Theory and Applications, pages 55–73, 2008.
[23] Geoffrey Grimmett and David Stirzaker. Probability and random processes. Oxford University Press, 2001.
[24] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 1990.
[25] David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Society, 2017.
[26] Ion Matei, Nuno C Martins, and John S Baras. Consensus problems with directed Markovian communication patterns. In , pages 1298–1303. IEEE, 2009.
[27] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF).