Multi-Robot Target Search using Probabilistic Consensus on Discrete Markov Chains
Aniket Shirsat, Karthik Elamvazhuthi, and Spring Berman
Abstract — In this paper, we propose a probabilistic consensus-based multi-robot search strategy that is robust to communication link failures, and thus is suitable for disaster-affected areas. The robots, capable of only local communication, explore a bounded environment according to a random walk modeled by a discrete-time discrete-state (DTDS) Markov chain and exchange information with neighboring robots, resulting in a time-varying communication network topology. The proposed strategy is proved to achieve consensus, here defined as agreement on the presence of a static target, with no assumptions on the connectivity of the communication network. Using numerical simulations, we investigate the effect of the robot population size, domain size, and information uncertainty on the consensus time statistics under this scheme. We also validate our theoretical results with 3D physics-based simulations in Gazebo. The simulations demonstrate that all robots achieve consensus in finite time with the proposed search strategy over a range of robot densities in the environment.
I. INTRODUCTION

Disaster areas, such as regions affected by earthquakes and floods, experience great disruption to communication and power infrastructure. This presents challenges in coordinating searches for survivors and dispersing relief teams to those locations. Teams of mobile robots have proved to be useful for exploring and mapping environments in disaster response scenarios [1], [2], [3]. However, such robots are subject to constraints on the payloads that they can carry, including power sources, sensors, embedded processors, actuators, and communication devices for transmitting information to other agents and/or to a command center. In addition, many multi-robot control strategies rely on a communication network for coordination. Centralized exploration strategies like [4] rely on constant communication between agents and a central node. However, these strategies do not scale well with the number of agents, since the communication bandwidth becomes a bottleneck with increasing agent population size. Moreover, such strategies suffer from a single point of failure, i.e., a disruption to the central node causes loss of communication for all the agents.

These drawbacks can be overcome by employing decentralized exploration strategies that involve only local communication between agents. However, communication can become unreliable as the number of agents increases [6],
This work was supported by ONR Young Investigator Award N00014-16-1-2605 and by the Arizona State University Global Security Initiative. Aniket Shirsat and Spring Berman are with the School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ 85287, USA {ashirsat, Spring.Berman}@asu.edu. Karthik Elamvazhuthi is with the Department of Mathematics, University of California, Los Angeles, CA 90095, USA [email protected].

Fig. 1: Overhead view of the problem scenario, simulated in Gazebo 9 [5]. Multiple aerial robots, flying at different heights, search for a target represented by the magenta box using a Markov chain motion model.
[17]; however, it requires a connected communication network. Although random mobility models are commonly used in multi-robot exploration, e.g. [18], [19], [20], few works consider consensus problems for agents that perform probabilistic search strategies, and thus have randomly time-varying communication networks.

To address this problem, we present and analyze a probabilistic multi-agent search strategy that is based on a distributed consensus protocol. The proposed strategy is decentralized and asynchronous and relies on only limited communication among agents. Thus, it can be employed in applications, such as disaster response scenarios, where it is infeasible to maintain a connected communication network, rendezvous, or communicate with a central node. The agents move according to a discrete-time discrete-state (DTDS) Markov chain model on a finite spatial grid, as illustrated in Figure 1. We consider only static features here, which represent persistent characteristics of the target(s) that the agents are searching for in the environment. The main contributions of this paper are the following:

1) We prove that agents with a DTDS Markov motion model and local communication will achieve consensus, in an almost sure sense, on the presence of a static feature of interest in a bounded environment.

2) Our proof does not require the assumption that the agent communication network remain connected over a non-zero finite time interval, as assumed in [21] for a similar consensus problem over a time-varying network.

We validate our theoretical results with Monte Carlo simulations in MATLAB and with 3D physics simulations performed in Gazebo 9 [5] using the Robot Operating System (ROS).
From the simulation results, we empirically characterize the dependence of the expected time until consensus on the number of agents, the grid size, and the agent density, which can be used to guide the selection of the number of agents to search a given environment.

The remainder of the paper is organized as follows. Section II presents the problem formulation, and Section III describes the probabilistic motion model of the agents. Section IV proves that all agents will reach consensus on the presence of the feature under our stochastic search strategy. Section V presents example implementations of our strategy in numerical and 3D physics simulations and discusses the results. Section VI concludes and suggests future work.

II. PROBLEM STATEMENT

We consider an unknown, bounded environment that contains a finite, non-zero number of static features of interest, indexed by the set
I ⊂ Z+, where Z+ is the set of positive integers. A set of N agents, indexed by the set N = {1, 2, ..., N}, explore the environment using a random walk strategy. We assume that each agent can localize itself in the environment and can detect a feature within its sensing range. When an agent a ∈ N detects a feature at discrete time k, it associates a scalar information state ξ_a[k] ∈ R≥0 with its current position. The vector of information states for all agents at time k is denoted by ξ[k]. Defining U(0,1) as the uniform probability distribution on the interval [0,1], the initial information state of each agent a is specified a priori as ξ_a[0] ∼ U(0,1). (This assumption implies the existence of a uniform upper bound on the interval between successive meeting times of any two agents, which is not guaranteed for agents that evolve stochastically on a finite connected state space.) The agent can communicate its information state ξ_a[k] at time k to all agents within a disc of radius r_comm ∈ (0, δ], where δ is the maximum communication radius. We define these agents as the set of neighbors of agent a at time k, denoted by N_a^k. In addition, we assume that the agents can avoid obstacles during their exploration. Since the agents are constantly moving, the set of agents with which they can communicate changes over time. The time evolution of this communication network is determined by the random walks of the agents throughout the bounded environment. This approach uses low communication bandwidth, since each agent only transmits a scalar value associated with each feature that it detects.

We discretize the environment, as shown in Figure 2, into a square grid of nodes spaced at a distance d apart. The set of nodes is denoted by S ⊂ Z+. We define S = |S|.
Let G_s = (V_s, E_s) be an undirected graph associated with this finite spatial grid, where V_s is the set of nodes and E_s is the set of edges (i, j) that signify pairs of nodes i, j ∈ V_s between which agents can travel. We refer to these pairs of nodes as neighboring nodes. Each agent performs a random walk on this grid, moving from its current node i to a neighboring node j at the next time step with transition probability p_ij. Let Z_k^a ∈ S be a random variable that represents the index of the node that an agent a ∈ N occupies at the discrete time k. For each agent a, the probability mass function π_k ∈ R^{1×S} of Z_k^a evolves according to a DTDS Markov chain:

π_{k+1} = π_k P,   (1)

where the state transition matrix P ∈ R^{S×S} has elements p_ij ∈ [0, 1] at row i ∈ S and column j ∈ S.

We assume that no prior information about possible search locations is available. To cover the search area uniformly, each agent is deployed from a random node on the spatial grid. These initial agent positions are chosen independently of one another and are identically distributed according to the probability mass function π_0, defined as a discrete uniform distribution over the set of nodes. We define ξ_r ∈ R≥0 as a scalar reference information state that is associated with the set of nodes Z_r ⊂ S from which an agent can detect a feature. In this work, we consider environments with a single feature of interest.

We now define another graph that models the time-varying communication topology of the agents as they move along the spatial grid. Let G_c[k] = (V_c, E_c[k]) be an undirected graph in which V_c = N, the set of agents, and E_c[k] is the set of all pairs of agents (a, b) ∈ N × N that can communicate with each other at time k. Let M[k] ∈ R^{N×N} be the adjacency matrix with elements m_ab[k] = 1 if (a, b) ∈ E_c[k] and m_ab[k] = 0 otherwise.
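As an illustrative aside (a sketch of our own, not the authors' implementation), the transition matrix P and the update π_{k+1} = π_k P can be realized numerically for a small grid. The snippet assumes the uniform transition rule p_ij = 1/(d_i + 1) that Section III gives in Equation (7), with a self-edge at every node; the function names and the grid size are ours.

```python
# Sketch of the DTDS Markov chain (1): build P for a c x c grid and
# evolve a probability mass function pi_{k+1} = pi_k P.
# Assumes the uniform rule p_ij = 1/(d_i + 1) of Eq. (7), with a
# self-edge at every node.

def neighbors(i, c):
    """Grid neighbors (up/down/left/right) of node i, row-major order."""
    r, col = divmod(i, c)
    out = []
    if r > 0:
        out.append(i - c)
    if r < c - 1:
        out.append(i + c)
    if col > 0:
        out.append(i - 1)
    if col < c - 1:
        out.append(i + 1)
    return out

def transition_matrix(c):
    """Row-stochastic S x S transition matrix for an S = c*c node grid."""
    S = c * c
    P = [[0.0] * S for _ in range(S)]
    for i in range(S):
        nbrs = neighbors(i, c)        # d_i = 2 (corner), 3 (edge), 4 (interior)
        p = 1.0 / (len(nbrs) + 1)     # uniform over neighbors plus staying put
        P[i][i] = p
        for j in nbrs:
            P[i][j] = p
    return P

def step(pi, P):
    """One update pi_{k+1} = pi_k P (row vector times matrix)."""
    S = len(P)
    return [sum(pi[i] * P[i][j] for i in range(S)) for j in range(S)]

if __name__ == "__main__":
    c = 4
    P = transition_matrix(c)
    pi = [1.0 / (c * c)] * (c * c)    # uniform initial deployment pi_0
    for _ in range(50):
        pi = step(pi, P)
    print(round(sum(pi), 6))          # total probability mass is preserved
```

Each row of P sums to one, so P is row-stochastic, and the mass of π is preserved under the update; π_k therefore remains a probability distribution at every step.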
We define L[k] ∈ R^{N×N} as the graph Laplacian, whose elements are l_ab[k] = Σ_{b=1}^N m_ab[k] = deg(v_a) if a = b, and l_ab[k] = −m_ab[k] if a ≠ b.

Given the agent dynamics (1) on the spatial grid, each agent a updates its information state at each time k according to a consensus protocol similar to one developed in [22]. This update is based on the agent's current information; the information from all its neighboring agents, of which there are at most d_max = N − 1; and the reference information state:

ξ_a[k+1] = ξ_a[k] − α Σ_{b ∈ N_a^k} l_ab[k](ξ_a[k] − ξ_b[k]) − g_a(ξ_a[k] − ξ_r),   (2)

where a, b ∈ N; α is a constant, chosen such that α ∈ (0, 1/d_max) [12]; and g_a is defined as:

g_a = 1 if Z_k^a ∈ Z_r, and g_a = 0 otherwise.   (3)

Algorithm 1: Control strategy for agent a ∈ N
  Input: α, g_a, ε, ξ_r; ξ_a[0] ∼ U(0,1); Z_0^a ← i ∈ S
  Output: k, ξ_a[k] for which |ξ_a[k] − ξ_r| ≤ ε
  k ← 0
  while |ξ_a[k] − ξ_r| > ε do
      sum1 ← 0; sum2 ← 0
      forall b ∈ N_a^k do   /* agents a, b communicate */
          sum1 ← sum1 − α l_ab[k](ξ_a[k] − ξ_b[k])
      end
      if i ∈ Z_r then   /* agent a detects feature */
          sum2 ← −g_a(ξ_a[k] − ξ_r)
      end
      ξ_a[k+1] ← ξ_a[k] + sum1 + sum2
      Z_{k+1}^a ← j, (i, j) ∈ E_s, with probability p_ij
      i ← j
      k ← k + 1
  end

In the next two sections, we will show that when agents move on the spatial grid according to (1) and exchange information with their neighbors according to (2), they achieve average consensus on their information states, defined as follows:

Definition II.1.
We say that the vector ξ[k] converges almost surely to average consensus if

ξ[k] → ξ_r 1 almost surely,   (4)

where 1 ∈ R^{N×1} is a vector of ones.

This implies that the agents' individual information states will eventually converge to a common information state that indicates the presence of the object being searched. We define T_c as the time k at which every agent's information state ξ_a[k] reaches ξ_r within a small tolerance ε, where 0 ≤ ε ≪ 1; i.e., |ξ_a[T_c] − ξ_r| < ε for all agents a ∈ N. We consider T_c to be the time at which the agents reach consensus.

The implementation of this probabilistic search strategy on each agent is described in the pseudocode shown in Algorithm 1. We illustrate the strategy for a scenario with two quadrotors in Figure 2. The quadrotors start at the spatial grid nodes indexed by i and j and move on the grid according to the DTDS Markov chain dynamics in (1). The figure shows sample paths of the quadrotors. The orange quadrotor detects the feature, indicated by a magenta star, when it moves to a node in the set Z_r (at these nodes, the feature is within the quadrotor's sensing range). The quadrotors meet at grid node m after k = 9 time steps and exchange information according to (2). They stop the search if their information states are within ε of ξ_r; otherwise, they continue to random-walk on the grid.

Fig. 2: Illustration of our multi-agent search strategy, showing sample paths for two quadrotors (orange and red) on a square grid. The quadrotors search the environment for a static target (the magenta star) as they perform a random walk on the grid.

III. ANALYSIS OF THE MARKOV CHAIN MODEL OF AGENT MOBILITY
Consider the DTDS Markov chain that governs the probability mass function of the state Z_k^a, defined as the location of agent a at time k on the spatial grid that represents the environment. Then, the time evolution of agent a's movement in this finite state space can be expressed by using the Markov property as follows:

Pr(Z_{k+1}^a = j | Z_k^a = i, Z_{k−1}^a = m, . . . , Z_0^a = l) = Pr(Z_{k+1}^a = j | Z_k^a = i),   (5)

where the second expression is the probability with which an agent at node i transitions to node j at time k + 1, and m, l ∈ Z+.

A. State Transition Matrix
The Markov chain (1) is expressed in terms of the state transition matrix P. The time-invariant matrix P is defined by the state space of the spatial grid representing the discretized environment. Hence, the Markov chain is time-homogeneous, which implies that Pr(Z_{k+1}^a = j | Z_k^a = i) is the same for all agents at all times k. The entries of P, which are the state transition probabilities, can therefore be defined as

p_ij = Pr(Z_{k+1}^a = j | Z_k^a = i), ∀i, j ∈ S, k ∈ Z+, ∀a ∈ N.   (6)

Since each agent chooses its next node from a uniform distribution, these entries can be computed as

p_ij = 1/(d_i + 1) if (i, j) ∈ E_s, and p_ij = 0 otherwise,   (7)

where d_i is the degree of the node i ∈ S, defined as d_i = 2 if i is a corner of the spatial grid, d_i = 3 if it is on an edge between two corners, and d_i = 4 otherwise. Since each entry p_ij ≥ 0, we use the notation P ≥ 0. We see that P^m ≥ 0 for m ≥ 1. Hence, P is a non-negative matrix. Using Theorem 5 in [23], we can conclude that the state transition matrix P is a stochastic matrix.

B. Stationary Distribution
A stationary distribution of a Markov chain is defined as follows.
Definition III.1. (Page 227 in [23]) The vector π ∈ R^{1×S} is called a stationary distribution of a Markov chain if π has entries such that π_j ≥ 0 for all j ∈ S, Σ_{j=1}^S π_j = 1, and π P = π.

Thus, if π is a stationary distribution, we can say that

π P^k = π, ∀k ∈ Z+.   (8)

From the construction of the Markov chain (1), each agent has a positive probability of moving from any node i ∈ S to any other node j ∈ S of the spatial grid in a finite number of time steps. As a result, the Markov chain Z_k^a is an irreducible Markov chain, and therefore P is an irreducible matrix. From Lemma 8.4.4 (Perron-Frobenius) in [24], we know that there exists a real unique positive left eigenvector of P. Moreover, since P is a stochastic matrix, its spectral radius ρ(P) is equal to 1. Therefore, we can conclude that this left eigenvector is the stationary distribution of the corresponding DTDS Markov chain. We will next apply the following theorem.

Theorem III.1. (Theorem 21.12 in [25]) An irreducible Markov chain with transition matrix P is positive recurrent if and only if there exists a probability distribution π such that π P = π.

Since we have shown that the Markov chain is irreducible and has a stationary distribution π, which satisfies π P = π, we can conclude from Theorem III.1 that the Markov chain is positive recurrent. Thus, all states in the Markov chain are positive recurrent, which implies that each agent will keep visiting every state on the finite spatial grid infinitely often.

IV. ANALYSIS OF CONSENSUS ON AGENTS' INFORMATION STATES
The dynamics of all agents' movements on the spatial grid can be modeled by a composite Markov chain with states defined as Z_k = (Z_k^1, Z_k^2, ..., Z_k^N) ∈ M, where M = S^N. Note that S = |S| and |M| = S^N. We define an undirected graph Ĝ = (V̂, Ê) that is associated with the composite Markov chain. The vertex set V̂ is the set of all possible realizations î ∈ M of Z_k. The notation î(a) represents the a-th entry of î, which is the spatial node i ∈ S occupied by agent a. We define the edge set Ê of the graph Ĝ as follows: (î, ĵ) ∈ Ê if and only if (î(a), ĵ(a)) ∈ E_s for all agents a ∈ N. Let Q ∈ R^{|M|×|M|} be the state transition matrix associated with the composite Markov chain. The elements of Q, denoted by q_îĵ, are computed from the transition probabilities defined by Equation (7) as follows:

q_îĵ = Π_{a=1}^N p_{î(a)ĵ(a)}, ∀î, ĵ ∈ M.   (9)

In the above expression, q_îĵ is the probability that in the next time step, each agent a will move from spatial node î(a) to node ĵ(a).

Fig. 3: A graph G_s = (V_s, E_s) defined on the set of spatial nodes V_s = {i, j, l}. The arrows signify directed edges between pairs of distinct nodes or self-edges. The edge set of the graph is E_s = {(i, i), (j, j), (l, l), (i, j), (j, l)}.

Fig. 4: A subset of the composite graph Ĝ = (V̂, Ê) for 2 agents that move on the graph G_s shown in Figure 3.

For example, consider a set of two agents, N = {1, 2}, that move on the graph G_s as shown in Figure 3. The agents can stay at their current node in the next time step or travel between nodes i and j and between nodes j and l, but they cannot travel between nodes i and l. Figure 4 shows a subset of the resulting composite graph Ĝ.
The set of nodes in the graph Ĝ is V̂ = {(i,i), (i,j), (i,l), (j,i), (j,j), (j,l), (l,i), (l,j), (l,l)}. Each node in V̂ is labeled by a single index î, e.g., î = (i, j), with î(1) = i and î(2) = j. Due to the connectivity of the spatial grid defined by E_s, we can for example identify ((i,j), (i,l)) as an edge in Ê, but not ((i,j), (l,l)). Since N = 2 and S = 3, we have that |M| = 3^2 = 9. For each î, ĵ ∈ V̂, we can compute the transition probabilities in Q ∈ R^{9×9} from Equation (9) as follows:

q_îĵ = Pr(Z_{k+1} = ĵ | Z_k = î) = p_{î(1)ĵ(1)} p_{î(2)ĵ(2)},  k ∈ Z+.   (10)

We now define ξ̂[k] = [ξ_1[k] ξ_2[k] . . . ξ_N[k] ξ_r]^T ∈ R^{N+1} as an augmented information state vector. The dynamics of information exchange among the agents modeled by Equation (2) can then be represented in matrix form as follows:

ξ̂[k + 1] = H[k] ξ̂[k],   (11)

where H[k] ∈ R^{(N+1)×(N+1)} is defined as

H[k] = [ I − α L[k] − diag(d)   d ;  0   1 ],   (12)

in which d = [g_1 g_2 . . . g_N]^T, 0 ∈ R^{1×N} is a vector of zeros, and I ∈ R^{N×N} is the identity matrix.

We associate Equation (11) with a graph G_r[k], an expansion of the graph G_c[k] that includes information flow from the feature nodes Z_r to agents that occupy these nodes. Here we consider the feature as an additional agent a_f = N + 1, which remains fixed. Let G_r[k] = (V_r, E_r[k]) be a directed graph in which V_r = N ∪ {a_f}, the set of agents and the feature, and E_r[k] = E_c[k] ∪ E_f[k], where E_f[k] is the set of agent-feature pairs (a, a_f) for which Z_k^a ∈ Z_r at time k. In this graph, information flows in one direction from the feature nodes to all agents that occupy a feature node on the finite spatial grid at time k.
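To make the construction concrete, the following sketch (our own illustration, not the authors' code) builds the 3-node chain of Fig. 3, forms the 9 × 9 composite matrix Q via the product rule (9)-(10), and runs the information update (2) in its standard Laplacian form for the two agents. The parameter values here are assumptions for illustration only: the feature sits at node l, ξ_r = 1, α = 0.5 (within (0, 1/d_max) for d_max = N − 1 = 1), tolerance ε = 10⁻³, and agents communicate when they occupy the same node.

```python
import random

# Two agents on the 3-node path graph of Fig. 3 (nodes 0=i, 1=j, 2=l),
# with self-edges everywhere and no edge between nodes 0 and 2; rows
# follow the rule p_ij = 1/(d_i + 1) of Eq. (7).
P = [[0.5, 0.5, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 0.5, 0.5]]

# Composite chain of Eqs. (9)-(10): q = product of per-agent probabilities.
Q = [[P[a1][b1] * P[a2][b2]
      for b1 in range(3) for b2 in range(3)]
     for a1 in range(3) for a2 in range(3)]

XI_R, FEATURE, ALPHA, EPS = 1.0, 2, 0.5, 1e-3  # assumed values (see lead-in)

def simulate(seed=1, max_steps=50_000):
    """Run the random walk + consensus update; return the consensus step."""
    rng = random.Random(seed)
    pos = [0, 0]                        # both agents start at node i
    xi = [rng.random(), rng.random()]   # xi_a[0] ~ U(0, 1)
    for k in range(max_steps):
        if all(abs(x - XI_R) <= EPS for x in xi):
            return k                    # consensus time T_c
        new_xi = []
        for a in range(2):
            s = xi[a]
            for b in range(2):          # neighbors: agents sharing a node
                if b != a and pos[b] == pos[a]:
                    s -= ALPHA * (xi[a] - xi[b])
            if pos[a] == FEATURE:       # g_a = 1 at a feature node, Eq. (3)
                s -= xi[a] - XI_R
            new_xi.append(s)
        xi = new_xi
        # Each agent steps independently according to its row of P.
        pos = [rng.choices(range(3), weights=P[p])[0] for p in pos]
    return None

T_c = simulate()
```

Every row of Q sums to one, consistent with the product form (9), and in our runs the loop terminates with both information states within ε of ξ_r, illustrating on this small example the almost-sure consensus established by Theorem IV.1.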
In addition, information flows bidirectionally between agents that are neighbors at time k.

We now prove the main result of this paper in the following theorem, which shows that all agents will track the reference feature in the environment almost surely and in a distributed fashion.

Theorem IV.1.
Consider a group of N agents whose information states evolve according to Equation (11). The information states of all agents will converge to the reference information state ξ_r almost surely.

Proof. Suppose that at an initial time k_0, the locations of the N agents on the spatial grid are represented by the node î ∈ V̂. Consider another set of agent locations at a future time k_0 + k_1, represented by the node ĵ ∈ V̂. The transition of the agents from configuration î to configuration ĵ in k_1 time steps corresponds to a random walk of length k_1 on the composite Markov chain Z_k from node î to node ĵ. It also corresponds to a random walk by each agent a on the spatial grid from node î(a) to node ĵ(a) in k_1 time steps. By construction, the graph G_s is strongly connected and each of its nodes has a self-edge. Thus, there exists a discrete time n > 0 such that, for each agent a, there exists a random walk on the spatial grid from node î(a) to node ĵ(a) in n time steps. Consequently, there always exists a random walk of length n on the composite Markov chain Z_k from node î to node ĵ. Therefore, Z_k is an irreducible Markov chain. All states of an irreducible Markov chain belong to a single communication class. In this case, all states are positive recurrent. As a result, each state of Z_k is visited infinitely often by the group of agents. Moreover, because the composite Markov chain is irreducible, we can conclude that ∪_{k∈Z+} G_c[k] = G, where G is the complete graph on the set of agents N, and therefore that ∪_{k∈Z+} G_r[k] contains a directed spanning tree with ξ_r as the fixed root. Since this union of graphs has a spanning tree, we can apply Theorem 3.1 in [26] to conclude that the information state of each agent will converge to ξ_r almost surely. The notation θ(k) and F_θ(k) in [26] corresponds to our definitions of Z_k and H[k], respectively.

V. SIMULATION RESULTS
We validate the result on average information consensus in Theorem IV.1 with numerical simulations in MATLAB and 3D physics-based software-in-the-loop (SITL) simulations developed in ROS Melodic and Gazebo 9 [5]. In the simulations, multiple agents perform random walks on a finite spatial grid according to the dynamics in Equation (1). Each grid is defined as a square lattice with c = √S nodes on each side, where the distance between neighboring nodes is d = 1 m. The state transition probabilities p_ij of the corresponding graph G_s are defined according to Equation (7). Since our largest simulated agent population is N = 14, and the parameter α must be less than 1/d_max = 1/(N − 1) [12], we set α to a value below 1/13 ≈ 0.077. The tolerance ε defining the time until consensus was set to 0.01. All simulations were run on a desktop computer with 16 GB of RAM and an Intel Xeon 3.0 GHz 16-core processor with an NVIDIA Quadro M4000 graphics processor.

A. Numerical Simulations
We performed large ensembles of Monte Carlo simulations to investigate the effect of the number of agents N, the spatial grid dimension c, and the resulting agent density N/c² on the expected time until the agents reach consensus, i.e., agree that the feature of interest is present. Quantifying the effect of these factors is necessary in order to determine the number of agents that should search a given area. This would help first responders to optimally distribute resources for searching a disaster-affected environment.

Each agent is modeled as a point mass that can move between adjacent nodes on the graph G_s, as illustrated in Figure 2. We assume that the agents can localize on G_s. The set of neighbors N_a^k of an agent a at time k consists of all agents that occupy the same spatial node as agent a at that time. The feature can be detected by an agent located at a fixed set of nodes Z_r of the spatial grid, and the reference information state of the feature is defined as ξ_r = 1.

To investigate the dependence of the expected time to reach consensus, E[T_c], on the number of agents N and the spatial grid dimension c, we simulated scenarios with different combinations of N (up to our largest population, N = 14) and grid dimensions c. For each scenario, we ran 1000 simulations with random initial agent positions and computed the mean time µ at which the agents reached consensus.

Fig. 5: Mean time (s) until consensus is reached, µ, versus number of agents N and spatial grid dimension c. Each value of µ is averaged over 1000 Monte Carlo simulations of scenarios with the corresponding values of N and c.

Fig. 6: Mean time (s) until consensus is reached, µ, versus agent density N/c² for the simulation data plotted in Figure 5.

Figure 5 plots the values of µ versus N and c for each simulated scenario, and Figure 6 plots µ versus the corresponding agent density, N/c². We observe from these figures that a decrease in the agent density results in an increase in µ.
This can be attributed to low agent encounter rates with other agents and with feature nodes at low agent densities. Using the curve fitting toolbox in MATLAB and the data from Figure 6, we see that there is an exponential relation between E[T_c] and the agent density, of the form E[T_c] = a e^{−b N/c²} for fitted constants a and b. Figure 6 shows that the expected time until consensus does not decrease appreciably for agent densities above a threshold value; thus, for a given grid size c, deploying agents beyond this threshold density yields diminishing returns in search time.

For selected combinations of N and c, we also computed the standard deviation σ of the time to reach consensus over the corresponding 1000 simulations.

TABLE I: Time until consensus is reached (µ ± σ), computed from 1000 Monte Carlo simulations of scenarios with N = 5, c = 5, and different values of the reference information state: the exact value ξ_r = 1 and a noisy value drawn from a Gaussian distribution with mean 1.

Figure 7a plots µ ± σ versus N for a fixed grid dimension c = 5, and Figure 7b plots µ ± σ versus c for a fixed number of agents N = 5. Figure 7a shows that for a relatively small grid size (c = 5), both µ and σ do not vary substantially with N. Thus, a small number of agents would be sufficient to search such an environment, since increasing the agent density would not significantly speed up the search or reduce the variability in the time until consensus. Figure 7b indicates that for a fixed group size of N = 5 agents, both µ and σ increase monotonically with the size of the grid. This trend suggests that more agents should be deployed if the predicted time until consensus and/or the variability in this time is too high for a given environment.

We illustrate the agents' consensus dynamics with two cases of the simulation runs. Figure 8 plots the time evolution of the agent information states for each case.
In the first case, N = 2 agents traverse a spatial grid with dimension c = 3. From Figure 8a, we see that the time until consensus, i.e. the time at which both agents' information states converge within ε of the reference state ξ_r = 1, is approximately 160 s. We also simulate N = 5 agents that traverse a spatial grid with dimension c = 10. Figure 8b shows that the time until consensus has increased to about 570 s in this case, which is within one standard deviation σ of the mean consensus time µ computed from our Monte Carlo analysis, as shown in Figure 7b for c = 10.

We also studied the effect on E[T_c] of uncertainty in the agents' identification of the feature nodes (i.e., ξ_r is a random variable), which may arise in practice due to factors such as sensor noise, occlusion of features, and inter-agent communication failures. We ran 1000 Monte Carlo simulation runs for each of two scenarios, all with N = 5 agents moving on a spatial grid with dimension c = 5 m. For each scenario, Table I shows the mean µ and standard deviation σ of the time until the agents reach consensus. To investigate the effect of uncertainty in feature identification, we specified that agents either perfectly identify the feature, in which case ξ_r = 1, or obtain noisy measurements of the feature, in which case ξ_r is normally distributed with mean 1. From Table I, we observe that the addition of noise to the agents' measurements of the feature results in an increase in both µ and σ. However, despite information uncertainty, the agents successfully achieve consensus.

B. 3D Physics Simulations
We also tested our search strategy in physics-based simulations. A snapshot of the Gazebo simulation environment is shown in Figure 1. The agents are modeled as quadrotors with a plus frame configuration. We assume that the agents can accurately localize in the environment using onboard inertial and GPS sensors. The analysis of our probabilistic consensus strategy under localization uncertainty is beyond the scope of this paper. We also assume that the feature of interest is known to be present in the environment, but its location is unknown.

Each quadrotor is equipped with a downward-facing RGB camera. The feature of interest is modeled as a magenta box, which the agents detect from their camera images using a color-based classifier. We added zero-mean Gaussian noise to the photometric intensity in the camera sensor model. We also used a standard plumb bob distortion model to account for camera lens distortion. The quadrotors are spaced 0.5 m apart in altitude in order to prevent collisions. The altitude difference causes slight disparities in the quadrotors' field of view (FOV), but this does not significantly affect the performance of the search strategy.

Fig. 7: Time until consensus is reached, averaged over 1000 Monte Carlo simulations of scenarios with (a) varying numbers of agents N and grid dimension c = 5; (b) varying c and N = 5. The circles mark mean times µ, and the error bars show standard deviations σ.

Fig. 8: Time evolution of the agent information states ξ_a[k] in simulations of (a) N = 2 agents moving on a 3 × 3 grid and (b) N = 5 agents moving on a 10 × 10 grid.

We simulated two scenarios: N = 2 robots flying at distinct altitudes and traversing a 3 × 3 grid, and N = 5 robots flying at distinct altitudes and traversing a 5 × 5 grid. The video attachment (also online at https://youtu.be/j74jeWQ0HM0) shows a simulation run of the second scenario. Figure 9a and Figure 9b plot the time evolution of the agent information states over a single simulation run of each scenario. The information states sometimes display steep drops in value, as in the plots of two of the agents' states in Figure 9b from 50 s to 70 s. These drops can be attributed to the following factors: (1) an agent updates its information state with states communicated by its neighbors, according to the consensus protocol; (2) an agent that is at the feature node stops detecting the feature below when another agent at a lower altitude enters its field of view, occluding the feature; (3) spurious measurements like false positives may have been introduced by an agent's sensors. Despite the unmodeled effects of the second and third factors on the information states, the agents still successfully reach consensus during the Gazebo simulations. We see that the time until consensus is reached in Figure 9a and Figure 9b is about 210 s and 250 s, respectively. The delays in these times compared to the times in the Monte Carlo simulations in Figure 7 can be attributed to the second and third factors described above and to the inertia of the quadrotors, which affect the Gazebo simulations but not the Monte Carlo simulations.

VI. CONCLUSION AND FUTURE WORK
In this paper, we have presented a probabilistic searchstrategy for multiple agents with local sensing and com-
Fig. 9: Time evolution of the robot information states ξa[k] in Gazebo simulation runs of (a) N = 2 robots moving on a 3 × 3 grid and (b) N = 5 robots moving on a 5 × 5 grid.

REFERENCES

[1] Nathan Michael, Shaojie Shen, Kartik Mohta, Vijay Kumar, Keiji Nagatani, Yoshito Okada, Seiga Kiribayashi, Kazuki Otake, Kazuya Yoshida, Kazunori Ohno, et al. Collaborative mapping of an earthquake damaged building via ground and aerial robots. In Field and Service Robotics, pages 33–47. Springer, 2014.
[2] Wolfram Burgard, Mark Moors, Cyrill Stachniss, and Frank E Schneider. Coordinated multi-robot exploration. IEEE Transactions on Robotics, 21(3):376–386, 2005.
[3] Keiji Nagatani, Seiga Kiribayashi, Yoshito Okada, Kazuki Otake, Kazuya Yoshida, Satoshi Tadokoro, Takeshi Nishimura, Tomoaki Yoshida, Eiji Koyanagi, Mineo Fukushima, et al. Emergency response to the nuclear accident at the Fukushima Daiichi Nuclear Power Plants using mobile rescue robots. Journal of Field Robotics, 30(1):44–63, 2013.
[4] Reid Simmons, David Apfelbaum, Wolfram Burgard, Dieter Fox, Mark Moors, Sebastian Thrun, and Håkan Younes. Coordination for multi-robot exploration and mapping. In AAAI/IAAI, pages 852–858, 2000.
[5] Nathan Koenig and Andrew Howard. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In , volume 3, pages 2149–2154. IEEE, 2004.
[6] Andrew Howard, Lynne E Parker, and Gaurav S Sukhatme. Experiments with a large heterogeneous mobile robot team: Exploration, mapping, deployment and detection. The International Journal of Robotics Research, 25(5-6):431–447, 2006.
[7] Ammar Husain, Heather Jones, Balajee Kannan, Uland Wong, Tiago Pimentel, Sarah Tang, Shreyansh Daftry, Steven Huber, and William L Whittaker. Mapping planetary caves with an autonomous, heterogeneous robot team. In , pages 1–13. IEEE, 2013.
[8] Demetri P Spanos, Reza Olfati-Saber, and Richard M Murray. Dynamic consensus on mobile networks. In IFAC World Congress, pages 1–6. Citeseer, 2005.
[9] Wei Ren, Randal W Beard, and Ella M Atkins. Information consensus in multivehicle cooperative control. IEEE Control Systems, 27(2):71–82, 2007.
[10] Wei Ren and Randal W Beard. Consensus of information under dynamically changing interaction topologies. In , volume 6, pages 4939–4944. IEEE, 2004.
[11] M Mesbahi and M Egerstedt. Graph theoretic methods in multiagent networks. Princeton University Press, Princeton, NJ, 2010.
[12] Reza Olfati-Saber and Richard M Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004.
[13] Ramviyas Parasuraman, Jonghoek Kim, Shaocheng Luo, and Byung-Cheol Min. Multipoint rendezvous in multirobot systems. IEEE Transactions on Cybernetics, 50(1):310–323, 2018.
[14] Xi Yu and M Ani Hsieh. Synthesis of a time-varying communication network by robot teams with information propagation guarantees. IEEE Robotics and Automation Letters, 5(2):1413–1420, 2020.
[15] Dieter Fox, Jonathan Ko, Kurt Konolige, Benson Limketkai, Dirk Schulz, and Benjamin Stewart. Distributed multirobot exploration and mapping. Proceedings of the IEEE, 94(7):1325–1339, 2006.
[16] Regis Vincent, Dieter Fox, Jonathan Ko, Kurt Konolige, Benson Limketkai, Benoit Morisset, Charles Ortiz, Dirk Schulz, and Benjamin Stewart. Distributed multirobot exploration, mapping, and task allocation. Annals of Mathematics and Artificial Intelligence, 52(2-4):229–255, 2008.
[17] Jinwen Hu, Lihua Xie, Kai-Yew Lum, and Jun Xu. Multiagent information fusion and cooperative control in target search. IEEE Transactions on Control Systems Technology, 21(4):1223–1235, 2012.
[18] Fredy Martinez, Edwar Jacinto, and Diego Acero. Brownian motion as exploration strategy for autonomous swarm robots. In , pages 2375–2380. IEEE, 2012.
[19] Israel A Wagner, Michael Lindenbaum, and Alfred M Bruckstein. Robotic exploration, Brownian motion and electrical resistance. In International Workshop on Randomization and Approximation Techniques in Computer Science, pages 116–130. Springer, 1998.
[20] Alan FT Winfield. Distributed sensing and data collection via broken ad hoc wireless connected networks of mobile robots. In Distributed Autonomous Robotic Systems 4, pages 273–282. Springer, 2000.
[21] Ragesh K. Ramachandran, Zahi Kakish, and Spring Berman. Information correlated Lévy walk exploration and distributed mapping using a swarm of robots. IEEE Transactions on Robotics, 2020.
[22] Wei Ren and Randal W Beard. Consensus tracking with a reference state. Distributed Consensus in Multi-vehicle Cooperative Control: Theory and Applications, pages 55–73, 2008.
[23] Geoffrey Grimmett and David Stirzaker. Probability and random processes. Oxford University Press, 2001.
[24] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 1990.
[25] David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Society, 2017.
[26] Ion Matei, Nuno C Martins, and John S Baras. Consensus problems with directed Markovian communication patterns. In , pages 1298–1303. IEEE, 2009.
[27] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF).