Sensor Planning for Large Numbers of Robots
Micah Corah
September 21, 2020
CMU-RI-TR-20-53
The Robotics Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Thesis Committee:
Nathan Michael, Chair
Anupam Gupta
Katia Sycara
Mac Schwager, Stanford
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Robotics.
Copyright © 2020 Micah Corah

For my parents, David and Barbara, who supported and encouraged my interest in robotics at every step of this journey.

Abstract

In the wake of a natural disaster, locating and extracting victims quickly is critical because mortality rises rapidly after the first forty-eight hours. In order to assist search and rescue teams and improve response times, teams of aerial robots equipped with sensors and cameras can engage in sensing tasks such as mapping buildings, assessing structural integrity, and locating victims. We seek to enable large numbers of robots to cooperate to complete such sensing tasks more quickly and thereby to improve response times for first responders. When formalized, such sensing tasks encapsulate numerous computationally difficult (at least NP-Hard) problems related to routing, sensor coverage, and decision processes. One way to simplify planning for these tasks is to focus on maximizing sensing performance over a short time horizon. Specifically, consider the problem of how to select motions for a team of robots to maximize a notion of sensing quality (the sensing objective) over the near future, say by maximizing the amount of unknown space in a map that robots will observe over the next several seconds. By repeating this process regularly (about once a second), the robots can react quickly to new observations as they work to complete the sensing task. In technical terms, this planning and control process forms an example of receding-horizon control. Fortunately, common sensing objectives for these problems benefit from well-known monotonicity properties (e.g. submodularity), and greedy algorithms can exploit these monotonicity properties to solve the receding-horizon optimization problems that we study near-optimally.

However, greedy algorithms typically force robots to make decisions sequentially. Thus, planning time grows with the number of robots and eventually exceeds time constraints to replan in real time. This is particularly important in distributed settings as the accumulation of communication latencies between robots in sequence can be significant on its own. Further, recent works that have begun to investigate sequential greedy planning have demonstrated that reducing the number of sequential steps while retaining suboptimality guarantees can be hard or impossible.

This thesis demonstrates that halting such growth in planning time is possible for many sensing problems. To do so, we introduce new greedy methods, Randomized Sequential Partitions (RSP) and Range-limited RSP. These methods enable planning with a fixed number of sequential steps that does not grow with the number of robots. Additionally, we prove that our algorithms approach the initial near-optimality guarantees for sequential planning for many sensing problems. In doing so, we develop new methods for quantifying redundancy between potential future observations and highlight the importance of a relatively unknown monotonicity property of some sensing objectives.

We apply our algorithms to autonomous mapping (known as exploration) and target tracking problems which serve as proxies for the variety of tasks and combinations of tasks that may arise in search and rescue scenarios. Simulation results demonstrate that our greedy planners often approach the performance of sequential planning (in terms of target position uncertainty) given only a few planning steps (2-4), even for very large numbers of robots (96).
This amounts to a 24× reduction in the number of sequential steps and an equivalent or greater reduction in the duration of distributed planning.

With exploration, we apply our methods in the context of a complex system where robots equipped with depth cameras map unknown, three-dimensional environments such as an office space or a cave. Moreover, the analysis we present when applying our planning methods to mapping objectives also provides valuable insights into the design of such exploration systems. While consistently improving completion times via greedy methods proves challenging, we demonstrate that sequential planning and RSP increase coverage rates early in simulation trials and reliably improve solution quality. Finally, we present a distributed implementation of RSP and simulation results for exploration of an office environment in real time, along with a 5% improvement in task completion time in this setting.

Acknowledgments

I would like to begin by thanking my family for their ongoing support. In particular, my parents have encouraged my interests in robotics for quite a long time and put up with quite a nerdy youngster. They deserve more than just my thanks. I should also thank my partner, Srujana. She kept me alive, happy, and well fed during the final and most harrowing months of my doctorate.

I also have many to thank in the Resilient Intelligent Systems Lab. My officemates, Ellen and Derek, were ever-present companions. Ellen, I believe our fish tank set new standards for office decor as well as undergraduate achievement. Derek, aside from being my office compatriot, you were also an early mentor when I was still an intern. Thank you for that time. And, Vibhav, you were my replacement, but I preferred being students together. Kshitij and Cormac, we collaborated on some of the best work that is not in this thesis. Thank you Curt for keeping my robots alive before I learned to avoid them. And, thanks Wennie for doing the hard work to develop systems for exploration and mapping. Vishnu, thank you for demonstrating the first feasible solution to the graduation problem. Your leadership was invaluable. Shaurya, thank you for being someone who I can talk to about research and everything else. Also, John, the pandemic has taken our pizza, and I want it back. Thanks to everyone else in the lab who has made my time here brighter, including Alex, Xuning, Aditya, Arjav, Mosam, Moses, Tim, Matt, Mike, Erik, and anyone else who I may have forgotten. Also, Karen Widmaier, you put in a good deal of hard work for our lab, and you make our hearts warmer.

Many thanks to my friends in the Robotics Institute. Achal and Radhika, Dhruv and Cara, and now I am compelled to mention Xuning again, thank you for the adventures that brought so much joy to this time. Shushman, you are a caring friend. I did not introduce you to my family; I introduced my family to you. Thank you Reuben, you speak softly, but your words carry great weight. Nick, you have done great things for the RI, but you also do great things for frogs.

Thanks to everyone else who contributed to my time here. Rachel Burcin, I am only one of a vast number of people who have been significantly impacted by your tireless enthusiasm administrating RISS. I could not keep up with you. Reid, you welcomed me to CMU twice, and I hope to someday become more like you. Thank you Michael Erdmann, you allowed me to babble about submodular functions in front of 16-811. Thanks Sankalp, your words in front of the same class introduced me to the same topic.
My thanks also to Roie for validating my interest in 3-increasing functions and for our too infrequent exchanges on the topic.

Thank you Nate for guiding my journey in robotics at CMU. You accepted my stubbornness and pushed me to excellence. Thanks also to the rest of my committee, Anupam, Katia, and Mac for direction and advice. I hope these first few pages are to your liking; for the next hundred, well I am not Asimov.

Contents
A Additional Technical Discussion 139
A.1 Expected coverage and mutual information for exploration with independent cells are not necessarily adaptive submodular 139
A.2 Analysis for scaling target tracking to large numbers of robots 142
A.2.1 Scaling and sensor models 143
A.3 An argument in favor of coverage over entropy reduction for evaluating exploration performance 143
A.4 Evaluating the performance of an approximate mutual information objective 144
A.4.1 Upper and lower bounds on mutual information 146
A.5 Comparison of suboptimality for RSP and DSGA 146
A.6 Description of auction methods for distributed submodular maximization 147
B Assorted Proofs 151
B.1 Proof for representations of derivatives of set functions 151
B.2 Proof of Lemma 1, chain rule for derivatives of set functions 152
B.3 Proof of Theorem 3, post-hoc suboptimality of DSGA 152
B.4 Proof of Theorem 4, worst case suboptimality of DSGA 153
B.5 Proof of Lemma 11, suboptimality of general assignments 154
B.6 Proof of Theorem 10, suboptimality of distributed planning for target tracking 154
B.7 Proof of Theorem 12, noiseless mutual information with independent cells is 3-increasing 156
Bibliography 157

List of Figures
ψ values for varying numbers of robots and distributed planning rounds 59
4.3 Computational performance in terms of total computation time 62
4.4 Entropy reduction performance with different numbers of robots and planner configurations 63
4.5 Simulated exploration results with dynamic quadrotors 65
4.6 Cumulative numbers of fallback trajectories selected by multi-robot planner in simulation results with dynamic quadrotors 65
4.7 Results for hardware experiments 66
5.1 Illustration of area coverage and inter-agent redundancy 70
5.2 Example of a submodular objective that is not 3-increasing 72
5.3 Maximum area coverage problem and sequential solution 81
5.4 Results for area coverage problem 82
5.5 Example of a probabilistic sensing scenario and a sequential solution 83
5.6 Results for a probabilistic sensing problem 84
6.1 Target tracking illustration 87
6.2 Visualization of target tracking simulation 95
6.3 Target tracking results 97
7.1 Visualization of sampled informative views and view distance 107
7.2 Visualization of exploration of the Skylight environment 108
7.3 Illustration of online and oblivious solution bounds 112
7.4 Environment coverage results 114
7.5 Coverage rates per robot across environments 115
7.6 Lower bounds on suboptimality for receding-horizon planning for exploration 116
7.7 Coverage rates per robot up to the early progress threshold 118
8.1 Objective values and messaging statistics for various distributed submodular maximization algorithms 130
8.2 Visualization of exploration of an office environment with distributed planning in real time 131
8.3 Results for exploration with the distributed RSP implementation 131
A.1 Adaptive submodularity counter-example 140
A.2 Comparison of environment coverage and entropy reduction 144
A.3 Comparison of exploration with CSQMI and optimistic coverage 145
A.4 Comparison of lower bounds on suboptimality for planning for exploration with RSP and DSGA 147

List of Tables
(r_a) and sensor (r_s) radii as a function of the number of agents (n_a = 50 in this chapter) 80
7.1 Planner parameters for receding-horizon exploration 109
7.2 List of exploration environments 111
7.3 Lower bounds on suboptimality for receding-horizon planning 117
7.4 Completion times and times for reaching the early progress threshold for each environment and planner 119
List of Algorithms

Notation

General mathematical notation
R, R>0 — The set of real numbers and the set of real numbers greater than zero, respectively
Z — The set of integers
SO(3) — The special orthogonal group in three dimensions (rotations)
SE(3) — The special Euclidean group for three dimensions (rigid-body motions)
p, v — Examples of vectors
A_n, A_X — Examples of indexing "sets." Elements of sets will generally be implicitly associated with time or agent indices which will be accessed using ranges and sets of indices
X*, X^g, X^d — Superscripts designating (elements of) an optimal solution to an optimization problem and examples of output from greedy and distributed algorithms, as will be evident from context
Information theory and probability
P(X = x | Y = y) — Probability that a random variable X equals x conditional on Y = y
E[X] — The expected value of X
H(X | Y) — Entropy of X conditional on Y
I(X; Y | Z) — Mutual information between X and Y conditional on Z

Combinatorial optimization
U — The ground set
I — An independence system
B — A block of a partition (matroid)
g(A, B), g(x) — Evaluation of a set function g at A ∪ B where A, B ⊆ U, and implicit evaluation at {x} where x ∈ U
g(A | B), g(A; B | C) — The former shows evaluation at A "conditional" on B (g(A, B) − g(A)) using similar notation as for mutual information. More generally, these two are first and second discrete derivatives of g where g(A; B | C) = g(B; A | C) = g(A, B, C) − g(A, C) − g(B, C) + g(C) (with A, B, C disjoint)

Graphs
G — A (directed or undirected) graph
E — The set of edges of a graph
Ẽ — Edges removed from a complete graph,
R × R \ E if robots are vertices
W — Edge weights

As hinted above, we will forgo use of a symbol for vertices in favor of the symbol for the set of objects—generally robots or agents—in question.
Robotics and control
R — A set of (mobile) robots
A — A set of (immobile) agents
U — A set of controls (or sometimes actions)
f(x, u) — System dynamics given state x and control u (generally for an individual robot)
h(x) — Sensing or observation function (also for individual robots)

Algorithm acronyms
SGA — Sequential greedy assignment for submodular maximization on a partition matroid (acronym used in Chapter 4)
DSGA, DSGA_nd — Distributed sequential greedy assignment in general and in n_d rounds, an early version of the methods this thesis proposes
RSP, Rl-RSP — Randomized sequential partitions and the range-limited variant, the main contributions of this thesis

Chapter 1
Introduction

Robots are often characterized as being able to sense their environment and act upon it [25]. While the actions that a robot can perform upon an environment—say folding laundry [130] or preparing a meal—are important [86], sensing is becoming increasingly prominent. This shift has been driven by changes in the markets for both robot platforms and the sensors they carry:
• Depth sensors and cameras have become increasingly prevalent, cheaper, and lighter, and
• Aerial robots have become both popular and pervasive [65, p. 41–51].
Moreover, aerial robots, when equipped with appropriate sensors, are able to navigate complex [61] and unknown [120] environments to accomplish sensing tasks such as videography [22], monitoring crops [17], and mapping [36].

Sensing problems arising from urban disaster and emergency response and in defense settings are time-sensitive and involve operation in unknown and cluttered environments. Sometimes, the time-sensitive nature of a task is immediate. One such case is a fire wherein robots may assist firefighters to outpace the effects of a conflagration. Here, a team of robots might engage in sensing tasks immediately preceding or in parallel with firefighters entering a building. Disaster response has similar features, but the time-sensitivity is driven by scale. In the case of a widespread disaster, teams may have ample time to inspect individual buildings, but when viewed as a whole, the number of locations that must be inspected and the paucity of responders motivate rapid action. In particular, mortality typically increases rapidly forty-eight hours after the start of a disaster while search and rescue teams often operate on a first-come-first-serve basis [138].

Figure 1.1 illustrates an example of how a team of robots may contribute to sensing tasks that arise while searching a building in a disaster response scenario. For the purpose of this thesis, we will focus on applications and challenges related to disaster and emergency response scenarios such as this, although many of the contributions are relevant to other sensing tasks and optimization problems.

While human-robot interaction is not a subject of this thesis, humans can be treated as additional members of a team that may provide additional sensor data, dictate sensing objectives, and must not be collided with.

Figure 1.1: This figure illustrates a typical disaster response scenario. Here, first responders wish to inspect a disaster site, such as after an earthquake, in order to ascertain structural integrity and to locate any survivors. To do so, they deploy a team of robots from a mobile base station that they transport along with the rest of their equipment, and the operator uses a tablet interface to instruct the robots to inspect the disaster site. Because communication infrastructure may have been destroyed in the disaster, the team has access to little prior information on the environment, and robots take advantage of distributed, onboard computation to mitigate effects of unreliable communication.
When seeking to improve response times, a designer may work to improve the sensing platform itself along with planning and control algorithms for individual robots [76, 77]. However, platform constraints will ultimately limit the performance of individual robots. Additionally, teams of robots can cooperate in sensing tasks to cover more space at once or to complete tasks more rapidly. We focus on this latter dimension of the problem of improving response times and ask: how can a designer compose multiple engineered sensing platforms to improve completion time in sensing tasks?
Addressing this question for large teams of robots will involve solving increasingly large planning problems, addressing communication constraints, and accounting for complications such as inter-robot collision constraints.

As sensing tasks progress, the robots' individual and collective understanding of the world (commonly referred to as belief) can also change rapidly. The process of updating control actions in response to new sensor data is commonly referred to as adaptation [6, 79, 91]. Adaptation is particularly important for robots operating in cluttered and urban environments. In these scenarios, as robots obtain new sensor data, they can become aware of both new regions of interest and new unoccupied and traversable space. Often, as for a robot that is moving toward unobserved space [76, 77], this data is immediately pertinent to the selection of control actions. By extension, teams of robots that are able to complete sensing tasks rapidly while operating in unknown environments must also adapt frequently.

1.1 Challenges for distributed sensing and time-sensitive sensing domains
The challenges that arise in this thesis can be divided into two categories: technical challenges related to the design of sensing systems and domain challenges for urban search and rescue. While the domain challenges will motivate and shape the methods we propose, the following technical challenges apply more generally to aerial sensing systems:

• Size, Weight, and Power (SWaP) constraints:
Robots, especially aerial robots, have limitations in size and weight based on the application domain. These constraints are pervasive and can lead to further constraints on thrust, flight time, onboard computation, and sensor payloads

• Safety:
Robots may operate in close proximity, replan frequently, and have non-trivial dynamics. Maintaining safety involves avoiding collisions between robots and other objects despite uncertainties in the states of the robots and the environment. Moreover, systems should be robust to planners that may sometimes fail to provide results, and safety may also include more complex conditions such as the ability to return to an initial state or avoiding hazards

• Sensing:
The sensors that aerial robots carry are not trivial. Camera views are sensitive to orientation, and observations are complicated by the geometry of occlusions. Robots may also carry a variety of sensors such as thermal cameras, range sensors, or gas detectors, and autonomous sensing systems must account for sensor models, fuse data, and reason about the contributions of sensing actions

• Communication:
Communication bandwidth between robots and operators may also be limited, and communication links may be unreliable. Communication constraints also affect dissemination of sensor data, operation of distributed algorithms, and interaction with human operators

• Computation:
Algorithms related to planning and control may also scale poorly (e.g. being NP-Hard) and have to be solved suboptimally or approximately to be tractable. Computational units situated on the robots themselves or elsewhere must also have access to relevant information such as the robots' locations and sensor data

Then, considering the application of methods for time-sensitive sensing for disaster and emergency response in urban environments narrows these challenges and produces more concrete details:

• Size:
Systems—referring to the entire multi-robot system and related equipment—that are usable by teams of first responders are likely limited in size to what one or two people can carry or what is portable with a small vehicle [138]

• Hazardous environments:
Environments may contain water, dust, gasses, and other materials that are hazardous to people and robots [138]. Similarly, the structural integrity of a building may be compromised, and rubble may be unstable. Not only should robots be resilient to these hazards, but these hazards may themselves be the focus of sensing tasks, such as to provide situational awareness to responders who wish to avoid hazards as well

• Complex geometry:
Rubble creates complex formations and small and irregular voids. Because casualties are often buried deep within the rubble [138], sensing robots should be able to navigate such voids or else be able to recognize voids and mark them for further inspection

• Limited communication:
Disasters frequently destroy communication infrastructure, and wireless communication within a disaster site may be unreliable such as due to rubble [138]. Teams of robots may also employ ad-hoc networks [73], and the communication graphs linking robots may be incomplete or disconnected

• Lack of information:
Emergency and disaster response activities typically occur within hours of an event [138], and teams often work independently and with limited access to information [137]. By extension and considering other constraints on communication, robot teams will also have limited prior information on the environments that they operate in

Disaster and emergency response scenarios do not always exhibit all of these challenges. However, first responders and search and rescue teams are individually responsible for a wide variety of disasters and emergencies. For this reason, any aerial sensing systems that are useful in practice will likely be those that are readily accessible to responders and which those responders have been trained to use. As such, in order to be useful, robots may have to address many challenges related to different emergencies and disaster scenarios [136, 138].
This thesis focuses on the process of jointly planning sensing actions over short time horizons (e.g. seconds) and across large teams of robots, with a focus on sensing tasks arising from urban search and rescue. The proposed approach draws from sequential greedy algorithms for maximization of submodular functions [69, 140] (discussed in detail in Chapter 3) which are becoming increasingly popular for multi-robot sensor planning problems [7, 89, 102, 151, 161, 169, 203].

Lack of reliable communication between robots and damage to infrastructure also motivate our focus on distributed planning. Specifically, robots may not be able to communicate reliably with centralized or remote computational resources (e.g. a mobile base station or remote server) so robots should be able to plan for themselves. However, we note that some of the techniques we develop in this thesis could be adapted to produce efficient centralized and parallel implementations.

While some works that apply submodular maximization to related sensor planning problems allude to distributed implementation [7, 151], planning time for greedy techniques still scales at least linearly with the number of robots. Challenges related to distributed variations of these algorithms have also only begun to be addressed [75, 82, 83], and yet existing approaches still scale poorly in planning time or suboptimality as the number of robots increases. Conversely, this thesis proposes techniques for planning via submodular maximization that scale to any number of robots, and we prove that the techniques we propose maintain constant-factor suboptimality for a variety of sensing tasks.

Figure 1.2: Diagram of a typical sensing system (blocks: Mission Planning (Goal Assignment), Receding-Horizon Planning, Distributed Perception; signals: goals, actions, local model, global model (comms), sensor data; the receding-horizon planning block is marked as the thesis contribution). This thesis focuses on planning sensing actions for teams of robots over a receding horizon (e.g. several seconds). The inputs to the receding-horizon planner are long-term goals for the sensing process and models of the environment that are locally consistent in space and time. The outputs of the receding-horizon planner are control actions, typically a smooth trajectory, which the robot will execute to collect sensor data.

Specifically, we develop a receding-horizon approach to sensor planning—illustrated in Fig. 1.2—whereby robots collectively optimize sensing actions over a short time horizon, typically several seconds. In general, such planners take high-level goals and locally consistent models of the environment as inputs and output control actions in the form of smooth trajectories. This thesis focuses primarily on the distributed aspect of these receding-horizon optimization problems. While we address some challenges related to receding-horizon planning for individual robots, that topic can also be addressed separately [76, 77]. We will also provide little detail on methods for high-level mission planning (or goal assignment) and distributed perception which are each necessary components of complete sensing systems and, themselves, important topics of study. Maintaining distributed representations of the environment can also incur significant communication costs. While there are ways to mitigate such communication costs either by intelligently communicating or compressing sensor data [57, 72] or by distributing robots in ways that reduce the need for communication [123], the methods in this thesis will not address this challenge. Likewise, problems related to estimating robot position and states, particularly simultaneous localization and mapping (SLAM), are out of scope both for passive estimation and as an active control problem [93]. Additionally, since our analysis will apply primarily to receding-horizon sub-problems, translating improvements in solution quality on these sub-problems into improvements in task performance—such as for robots to map an environment more quickly—will itself pose an important challenge in this thesis.

Further, this thesis will apply the methods for receding-horizon planning which we develop to two application areas: exploration (autonomous mapping) and target tracking. Here, exploration serves as a simple proxy for more complex tasks related to search and rescue. In particular, planning for exploration addresses the challenge of selecting sensing actions while operating in an environment with unknown geometry. On the other hand, the target tracking problems which we study are not immediately relevant to the search and rescue domain. Instead, these target tracking problems provide an opportunity to study a special class of factored objectives; for target tracking this arises in an objective which can be written as a sum over terms associated with each of the targets. More generally, this class of objectives can capture the multi-faceted sensing tasks that arise in search and rescue. In this sense, our analysis for such factored objectives could enable designers to extend the systems we develop for exploration to simultaneously address a variety of nuanced sensing tasks such as for locating survivors, assessing structural integrity, or identifying hazards.
The formulations of the exploration and target tracking tasks will also make a few simplifying assumptions. These assumptions are not necessarily realistic in practice but can often be relaxed with minor modifications to the approach. The following assumptions are important to various parts of this thesis:

• Homogeneity:
We assume teams of robots are homogeneous to simplify exposition, but our methods could be applied to heterogeneous teams with only minor modifications

• Spatial locality:
A stricter assumption is that robots' incremental motions and sensor ranges are bounded. However, we make this assumption primarily for the purpose of analysis of large teams of robots, and this assumption need only apply loosely to actual teams of robots. Additionally, spatial locality will arise in different forms for different tasks and will be treated on a case-by-case basis

• Static environments:
For exploration, we assume that robots map static environments. This assumption that environments are largely static while being searched is likely realistic—searchers would generally avoid disturbing objects and rubble to avoid causing collapses and endangering themselves or any survivors. However, robots may also have to avoid collisions with responders who are moving about the environment or may have to recognize minor changes in the map such as after responders clear some rubble or open a door. Thus, while the environment may remain predominantly static for the purpose of the sensing task, addressing extraneous dynamics in and around the environment will be left to future work
This thesis provides two kinds of contributions: development of scalable, distributed algorithms with performance guarantees for multi-robot sensing and development of sensing applications for exploration (autonomous mapping) and target tracking. Moreover, the central contribution of this thesis is a planning algorithm that guarantees constant factor sensing performance on average for a variety of sensing problems and requires only a fixed number of sequential planning steps for any number of robots. The rest of the thesis will involve implementing such algorithms, analysis of relevant sensing objectives, and developing methods for operating in and exploring unknown environments. These contributions are outlined below:

• Planning for single- and multi-robot exploration:
Chapter 4 proposes a first attempt toward a distributed sensing algorithm for sensor planning via submodular maximization—by modifying existing greedy methods [69]. The approach provides post-hoc guarantees and benefits from parallel computation when single-robot planning is expensive. We develop this approach, along with methods for single-robot planning based on Monte-Carlo tree search, to obtain a receding-horizon planner for exploration with teams of aerial robots, and the rest of this thesis applies similar approaches to single-robot planning throughout. Additionally, we guarantee that robots do not collide with each other or the surrounding environment and describe some weak conditions that are sufficient to guarantee liveness. Notably, this is also the only chapter to include experiments with physical robots

• Scalable multi-agent coverage:
Chapter 5 presents a greedy algorithm (Randomized Sequential Partitions or RSP) for coverage and any other submodular maximization problems which exhibit a certain higher-order monotonicity condition. Our algorithm guarantees constant factor suboptimality in the sensing objective (e.g. coverage) in expectation with fixed numbers of planning and communication rounds (versus one per robot for sequential greedy methods). This guarantee extends to any number of robots given certain conditions on spatial locality.
To our knowledge, ours is the first algorithm to provide such guarantees for a non-trivial class of submodular functions. As such, the remainder of the contributions continue to develop applications and implementations of RSP planning (a brief sketch of the RSP planning pattern appears after this list)

• Target tracking and quantifying inter-robot redundancy with factored objectives:
Chapter 6 introduces target tracking problems and applies planning methods similar to those of the preceding chapters. Although we found in Chapter 5 that some sensing objectives satisfy a higher-order monotonicity condition that enables development of scalable planners with suboptimality guarantees, these results do not apply in general to some popular objectives (e.g. mutual information). However, we prove that some special cases of mutual information (those that can be factored as sums) satisfy similar suboptimality guarantees. We also provide detailed analysis for applying this result to target tracking problems. Further, this sum decomposition is relevant to the kinds of multi-objective problems which may arise when robots engage in more complex sensing tasks. This chapter also provides new analysis for approximate receding-horizon planning in real time that is relevant to sensing problems throughout this thesis. Last, simulation studies establish performance improvements for RSP planning in terms of target uncertainty and demonstrate that the suboptimality guarantees remain well-behaved for large numbers of robots (with results for up to 96 robots, providing a 24× reduction in the number of planning steps compared to planning sequentially)

• Time-sensitive sensing in unknown environments:
Chapter 7 revisits the problem of multi-robot exploration. We provide more in-depth analysis of objective functions and present one case of a mutual information objective that satisfies the monotonicity conditions we study in this thesis. This chapter also incorporates some changes in planner design that significantly improve on what Chapter 4 presents. Regarding simulation results, we find that improving completion times via sequential or RSP planning (versus having no explicit coordination in planning) can be challenging. Still, we demonstrate that planning with RSP improves coverage rates early in simulation trials and reliably improves solution quality (suboptimality)

• Design and implementation of a distributed sensor planner:
While the previous chapters described distributed algorithms, Chapter 8 is the first to provide a distributed implementation of an RSP planner. Toward this end, we provide detailed discussion of design decisions and results that investigate communication costs. Simulation results demonstrate this distributed, synchronous, anytime planner in exploration of an office-like environment, verify that the system behaves as expected, and identify a 5% improvement in task completion time for coordination via RSP, albeit for a small set of experiments
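As referenced in the contributions above, the core RSP pattern is brief enough to sketch here. The following is a minimal, illustrative Julia sketch of that pattern, not the released implementations linked just below: the generic set-function objective g, the per-robot candidate action sets, and names such as rsp_plan and marginal_gain are assumptions made for illustration.

```julia
using Random

# Marginal gain of adding action `a` to the set of prior selections, g(a | S).
marginal_gain(g, selected, a) = g(vcat(selected, [a])) - g(selected)

# `actions[i]` holds candidate actions for robot i; `g` maps a collection of
# selected actions to a (submodular) sensing objective value.
function rsp_plan(g, actions::Vector{Vector{T}}, num_rounds::Int;
                  rng=Random.GLOBAL_RNG) where {T}
    num_robots = length(actions)
    # Each robot independently draws one of a fixed number of planning rounds.
    assignment = rand(rng, 1:num_rounds, num_robots)
    selected = T[]                     # decisions accumulated over prior rounds
    plan = Vector{T}(undef, num_robots)
    for r in 1:num_rounds
        robots = findall(==(r), assignment)
        # Robots in the same round plan "in parallel": each maximizes its
        # marginal gain with respect to decisions from earlier rounds only.
        choices = map(robots) do i
            best_action, best_gain = actions[i][1], -Inf
            for a in actions[i]
                gain = marginal_gain(g, selected, a)
                if gain > best_gain
                    best_action, best_gain = a, gain
                end
            end
            best_action
        end
        plan[robots] = choices
        append!(selected, choices)     # later rounds condition on these choices
    end
    return plan
end
```

The number of rounds is fixed in advance, so the count of sequential planning (and communication) steps does not grow with the size of the team, which is the property analyzed in the chapters above.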
Some of the source code for these contributions is available via the BSD license. The numerical experiments for coverage in Chapter 5, the target tracking simulations in Chapter 6, and the communication study in Chapter 8 rely on a common code-base, written in Julia. We have also released the distributed implementation of RSP that Chapter 8 describes which has been implemented in C++ via ROS. This release encompasses the distributed scheduling and communication aspects of RSP and excludes methods for exploration. We hope that the design of this package can enable incorporation of RSP methods alongside various single-robot receding-horizon sensor planners with minimal effort.

https://github.com/mcorah/MultiAgentSensing
https://github.com/mcorah/distributed_randomized_sequential_partitions

Chapter 2
Sensing Problems

Before developing the contributions of this thesis, let us pause to discuss the nature of the problems that we study. So far, the introduction (Chapter 1) has discussed sensing problems at a high level and from the perspective of applications, particularly search and rescue. This chapter will focus more closely on problem form and evaluation for time-sensitive sensing and for target tracking. Later, Chapter 3 will provide technical background methods and detailed discussion of related works.

We provide one caveat: the discussion in this chapter is primarily for the purpose of empirical evaluation of the systems we propose and to broadly characterize how features of sensing problems relate to the requirements of potential solution methods. Where some sensing processes admit algorithms with formal guarantees [79], features such as robot motion confound existing methods. Likewise, the receding-horizon techniques and other methods that we develop involve approximations and heuristics, and we would not expect the resulting systems to satisfy rigorous suboptimality guarantees.
The time-sensitive sensing moniker intends to provide some formalism and structure to the problems we study in this thesis. Chapter 1 provided motivation for why completing sensing tasks quickly can be important. However, designers might consider other criteria. For example, even early methods for exploration based on frontiers [198] could be described as complete in the sense that the system will eventually explore the entire environment under reasonable assumptions.

In defining the time-sensitive sensing problem, we take a cue from Golovin and Krause [79] and describe the controller using notation for policies. However, in line with the rest of this thesis and to provide a direct description of the system, we also draw on notation from control theory.

Consider an unknown static environment E ∼ 𝓔 drawn from some known distribution over possible environments 𝓔. One or more robots (this problem definition can be interpreted as describing arbitrary systems, consisting of any number of robots) navigate the environment and obtain observations according to the known dynamics and observation model
\[
x_t = f(x_{t-1}, u_{t-1}), \qquad y_t = h(x_t, E) \tag{2.1}
\]
where x_t, y_t, and u_t are the state, observation, and control input at the (discrete) time t. Like the environment, the initial state x_0 is random and drawn from a known distribution. Given that the robot(s) learn about the environment through the sequence of observations, the controller is a policy:
\[
u_t = \pi(X_{t-1}, Y_{t-1}) \tag{2.2}
\]
where the capital letters represent vectors of states and observations. This policy states that the robot makes decisions at each time t given the available information, the states, and observations. Further, as the robot navigates the environment, it must also avoid collisions or otherwise unsafe states. In general, the robot must remain in some set of safe states X_safe(E) at all times. We seek to minimize the expected completion time (given the distribution over environments and starting positions) for a given sensing task—say mapping a building. Progress in the task is measured by the sensing quality function J(X_t, Y_t) which depends on states and observations while completion is modeled by a quota B(E) which depends on the environment. For example, this sensing quality may represent how much of a building that a robot has observed or mapped by a given time, and the quota may require the robot to map a certain fraction of the building. Putting this all together, the time-sensitive sensing problem is as follows:
\[
\begin{aligned}
\min_{\pi}\ \ & \mathbb{E}_{E, x_0}[T] \\
\text{s.t.}\ \ & J(X_T, Y_T) \ge B(E), \\
& x_t \in \mathcal{X}_{\mathrm{safe}}(E), \\
& u_t = \pi(X_{t-1}, Y_{t-1}), \\
& x_t = f(x_{t-1}, u_{t-1}), \\
& y_t = h(x_t, E), \\
& \text{and the above for all } t \text{ in } \{1, \ldots, T\}
\end{aligned} \tag{2.3}
\]
Solving this problem, even approximately, is challenging. This problem is imbued with challenges from several different fields such as control theory, artificial intelligence, and combinatorial optimization as we hint by mixing notation from these fields. The following paragraphs discuss some of the challenges and properties of time-sensitive sensing problems.

For now, we write the policy and system model as if deterministic. However, the reasoning we present also applies to stochastic systems and policies, as we study elsewhere in this thesis.

Similar theoretic works [41, 79, 169] typically use a more abstract monotonic or modular cost which could represent quantities like time or energy. Although planning with energy costs is an important topic of study [173, 179], such general cost models do not arise in this thesis.
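To make the pieces of (2.1)-(2.3) concrete, the following is a minimal Julia sketch of rolling out a policy on one sampled environment and recording the completion time T; the expectation in (2.3) would then be estimated by averaging this value over draws of the environment and initial state. Every argument is a placeholder for a problem-specific model rather than anything defined in this thesis.

```julia
# Sketch of the model behind (2.1)-(2.3): roll out a policy on one sampled
# environment and record the completion time T at which the sensing quality
# meets the quota. All callbacks (f, h, policy, J, B, is_safe) are assumed
# problem-specific models.
function completion_time(f, h, policy, J, B, is_safe; E, x0, max_steps=10_000)
    xs = [x0]                      # X_t: states visited so far
    ys = [h(x0, E)]                # Y_t: observations received so far
    for t in 1:max_steps
        u = policy(xs, ys)         # u_t = π(X_{t-1}, Y_{t-1})
        x = f(xs[end], u)          # x_t = f(x_{t-1}, u_{t-1})
        is_safe(x, E) || error("policy left the safe set")
        push!(xs, x)
        push!(ys, h(x, E))         # y_t = h(x_t, E)
        J(xs, ys) >= B(E) && return t   # quota met: completion time T = t
    end
    return Inf                     # did not complete within the step limit
end
```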
2.1.1a Inference and uncertainty

The robot gains information about the environment by collecting observations from a variety of states. In particular, the set of possible environments that are consistent with a collection of states X_t and Y_t is
\[
\{ E : Y_t = h(X_t, E) \text{ for all possible environments } E \}. \tag{2.4}
\]
Referring back to (2.3), several features of the problem formulation depend on the environment: the observations h(x, E), the safe set X_safe(E), and the quota B(E). Effects of uncertainty in observations will naturally arise frequently as this thesis focuses on sensing problems. On the other hand, uncertainty in the quota could be interpreted as complicating certain solution strategies, but the quota will only appear again in experimental evaluation where its value will be known.

Although we will discuss the safe set less frequently, the uncertainty in which actions are safe has strong implications on system performance such as by bounding maximum speeds [76]. Likewise, this uncertainty implies that we cannot plan complete paths (such as to map a building) a priori because those paths may not be safe. Thus, methods for path planning cannot solve (2.3) on their own. We will continue this discussion later in this section in terms of safety and feasibility.

The sensing quality J(X_t, Y_t) is immediately in terms of known quantities: states and observations. However, the sensing quality is also a function of the states and environment
\[
J(X_t, Y_t) = J(X_t, h(X_t, E)), \tag{2.5}
\]
wherein we abuse notation by applying the observation function to a vector of states. Given this latter form, we can make inferences about future observations. In this sense, future sensing progress is uncertain to the extent that the environment remains uncertain but can also be optimized by selecting robot motions.

Numerous recent works seek to optimize common measures of future sensing progress. However, common sensing functions (such as those we discuss in this thesis) have sometimes surprising differences in important functional properties. For this reason, we do not yet specify the functional form of the sensing quality J. For example, Golovin and Krause [79] demonstrate that certain sensor coverage functions have a monotonicity property called adaptive submodularity even though important measures of information gain do not [45, 80]. We will frequently revisit similar functions for coverage and information gain. As these often have similar properties, distinctions such as this are important. Here, adaptive submodularity is one factor that determines whether greedy methods can solve (2.3) to

We could write this more generally for noisy observations using Bayes' rule. However, some relevant solution methods [96] only apply to deterministic models. We emphasize the deterministic case to avoid unduly restricting applicable solutions and because the sensing noise for common depth sensors in mapping [87] is frequently small compared to range (meters) and the environment discretization (typically 10 cm).
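As a small illustration of the consistent set in (2.4), and assuming a finite list of candidate environments together with the deterministic observation model above, the filter can be written directly. The helper name and the candidate list are hypothetical, not part of the thesis code.

```julia
# Environments consistent with the states visited and observations received,
# as in (2.4). `h` is a deterministic observation model; `candidates` is an
# assumed finite collection of possible environments.
consistent_envs(candidates, h, xs, ys) =
    [E for E in candidates if all(h(x, E) == y for (x, y) in zip(xs, ys))]
```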
The principal effect of the dynamics model is that it changes which information gathering actions are available at a given time. Here, the information available at a given state consists of state, observation pairs (x, y). Robots gain information about the environment by visiting states and inspecting such pairs as y = h(x, E). The information gathering actions that the robot has available at a state x consist of the set of states the robot can visit next
\[
\{ x' : x' = f(x, u) \text{ for all control inputs } u \} \tag{2.6}
\]
As such, the set of available observations at any given time depends on the prior selections. This excludes some methods for adaptive sensing [47, 79] which require the set of information gathering actions not to change over time.

Another important property of the system dynamics f is whether it produces a directed or undirected relationship between states. The system dynamics induce an undirected relationship on the system states if, for any possible state x_1 and control u_1 pair with x_2 = f(x_1, u_1),
\[
\text{there exists some } u_2 \text{ such that } x_1 = f(x_2, u_2). \tag{2.7}
\]
Alternatively, the dynamics produce a directed relationship if the above does not hold. Which of these properties holds can determine which kinds of algorithms and analysis [41, 43] apply to the planning problems we discuss in this thesis.

Many of the systems in this thesis will involve trivialized kinematic models with undirected dynamics. However, Chapters 4 and 8 each provide results for systems with more general, directed dynamics models. Likewise, the methods for distributed coordination which we develop are independent of the robots' dynamics.

Let us consider how dynamics affect safety. Often, we will consider problems where robots can always stop immediately. In terms of control theory, this means that for all possible states x_t there exists some control input u_t that stops the robot so that
\[
x_{t+1} = x_t = f(x_t, u_t) \quad \text{(for some } u_t\text{)}, \tag{2.8}
\]
ensuring that the robot remains in the same state. Individual states where (2.8) holds are called invariant states [150].

Similar properties also apply for invariant sets where each state in the set has some control action which can keep the robot in the invariant set.

Not all states are invariant states in practice—consider
a robot with non-zero velocity and non-trivial dynamics. Now, consider a robot flying rapidly toward a brick wall; that robot will eventually reach a point of no return where it cannot stop without hitting the wall—such is the heightening intensity of two children playing "chicken" who must decide when to act to avoid collision. States past such points of no return, where no sequence of control inputs can prevent the robot from leaving the safe set X_safe(E), are inevitable collision states. Controllers that ensure safety by preventing robots from entering inevitable collision states are a frequent topic of study in contemporary control theory in the form of control barrier functions [2, 142, 147] and also for applications to mobile robots [77, 95, 120, 121, 192]. This will arise in Chapter 4 where we will maintain safe stopping actions for all robots at all times. Further, some works have begun to investigate the relationship between safety and inference in mapping [95]. For example, a robot can take a turn widely while accelerating around a corner to obtain advance warning of what lies beyond. Now, building on these questions about safety, we can also ask whether a robot will safely complete a sensing task—whether (2.3) is feasible.

Our definition of time-sensitive sensing (2.3) sets a high bar for producing even just an approximate solution. A feasible policy must:
• Complete the sensing task (or else the completion time is infinite),
• Guarantee that the robot will remain in the safe set,
• And must do so for all of the environments that remain consistent with prior observations (2.4).
Given an instance of (2.3), we may ask several relevant questions about feasibility:
• Does a feasible solution exist?
• Given a policy, does that policy constitute a feasible solution? (A policy that is feasible and thereby always eventually finishes the sensing task is said to be complete)
• Does any of the above hold for an individual environment and starting position pairing or a restricted collection of such pairs?
• Does any of the above hold for a given partial history (X_t, Y_t)?
• Is the converse true for any of the above? (e.g. Is there no feasible solution?)
We will answer a small subset of this class of questions. Questions about safety already answer parts of the converse forms (e.g. Is failure imminent? Can a given policy violate the safety constraint?). Classical frontier-based exploration [198] approaches are also complete given reasonable assumptions: The robot will navigate to and observe unknown space until there is nothing left to observe. We will then attempt to replicate aspects of such classical approaches when considering more nuanced problems involving dynamics, unstructured environments, and cameras with limited fields of view.

2.1.1f Decision processes and adaptivity
Time-sensitive sensing problems form decision processes by nature of control via policies π(X_t, Y_t), functions of prior decisions and observations, rather than fixed sequences of control actions. However, does (2.3) readily fit the form of a classical Partially Observable Markov Decision Process (POMDP) [104]? At this point, it is important to realize that the reward (or, more accurately, cost) in (2.3) is the completion time. Although time is easy to compute, completion depends on the sensing quality J(X_t, Y_t) which is a function of the sequence of states. Adapting these problems to the form of a traditional POMDP then requires an exponential number of states for the same reasons as described by Golovin and Krause [79] regarding the adaptive submodular maximization problems which they define. Alternatively, certain POMDP variants allow for more complex rewards which we discuss in Sec. 3.3.3.

Because solving full time-sensitive sensing problems is challenging, this thesis takes the approach of solving a simpler sub-problem. Given prior states and actions X_t and Y_t, we seek to maximize the sensing quality over the next L steps (known as the receding horizon) for a fixed sequence of future control actions u_{1:L} and adopt the policy:
\[
\begin{aligned}
\pi(X_t, Y_t) = \operatorname*{arg\,max}_{u_{1:L}}\ \ & \mathbb{E}_{E}\bigl[ J(X_{t+L}, Y_{t+L}) \bigr] \\
\text{s.t.}\ \ & x_{t+l} \in \mathcal{X}_{\mathrm{safe}}(E), \\
& x_{t+l} = f(x_{t+l-1}, u_l), \\
& y_{t+l} = h(x_{t+l}, E), \\
& \text{for all } l \text{ in } \{1, \ldots, L\}
\end{aligned} \tag{2.9}
\]
Receding-horizon methods such as this are common in robotics and control [150] and for sensing problems like those we discuss in this thesis [20, 36, 116, 160]. We adopt this approach as a heuristic and would not expect to obtain approximation guarantees. However, we will be able to guarantee certain properties of the resulting system such as safe operation. Furthermore, this sub-problem does not involve adaptation and therefore no longer requires specialized analysis for adaptation [79]. This feature can then enable application of informative path planning techniques for non-adaptive problems.

The solution to (2.9) corresponds to a path (as x_{t:t+L} is a function of u_{1:L}) instead of a policy. This path may be directed or undirected as discussed earlier. Assuming that E_E[J(X_{t+L}, Y_{t+L})] satisfies certain properties, existing methods for informative path planning can solve (2.9) with various guarantees [41, 43, 90, 169, 200]. Much of this thesis will focus on how to apply those and similar methods to large instances of (2.9) where the states and dynamics represent teams of robots.
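A minimal Julia sketch of a policy in the spirit of (2.9) follows: enumerate candidate control sequences of length L, estimate the expected sensing quality by rolling the model forward on environments sampled consistently with the data so far, and keep the best candidate that stays in the safe set. The candidate generator, environment sampler, and model callbacks are all assumptions standing in for problem-specific components.

```julia
# Sketch of a receding-horizon policy as in (2.9). `candidates(xs, ys)` yields
# control sequences u_{1:L}; `sample_env(xs, ys)` samples an environment
# consistent with the history; f, h, J, and in_safe_set are assumed models.
function receding_horizon_policy(xs, ys; f, h, J, in_safe_set,
                                 candidates, sample_env, n_samples=10)
    best_u, best_value = nothing, -Inf
    envs = [sample_env(xs, ys) for _ in 1:n_samples]
    for u_seq in candidates(xs, ys)
        value, safe = 0.0, true
        for E in envs
            x, sim_xs, sim_ys = xs[end], copy(xs), copy(ys)
            for u in u_seq                       # roll the model forward L steps
                x = f(x, u)
                safe &= in_safe_set(x, E)
                push!(sim_xs, x)
                push!(sim_ys, h(x, E))
            end
            value += J(sim_xs, sim_ys) / n_samples
        end
        if safe && value > best_value
            best_u, best_value = u_seq, value
        end
    end
    return best_u        # executed for a short time, after which the robot replans
end
```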
Similarly, the occupancyfunction F occ ( x ) ⊆ { , . . . , n m } determines which cells that the robot occupies. A robot isin the safe set if all the cells it occupies are free X safe ( E ) = { x : c i = 0 for all i ∈ F occ ( x ) } .We will reward the robot for the number of cells that it observes. So, the sensing qualityin exploration for given states X t and observations Y t is J ( X t , Y t ) = (cid:12)(cid:12)(cid:12)(cid:91) ti =0 F cam ( x i , E ) (cid:12)(cid:12)(cid:12) . (2.10)When written as a function of a set (2.10) is a kind of coverage objective and has usefulmonotonicity properties (Chapter 3 discusses both coverage and monotonicity in detail).Additionally, the expectation, which appears in the receding-horizon problem (2.9), retainssimilar properties. Yet, unlike some coverage objectives [79], (2.10) is not necessarilyadaptive submodular (see Appendix A.1). Chapter 6 deviates from the rest of this thesis by focusing on sensor planning for observationof dynamic systems . Unlike exploration where mobile (dynamic) robots seek to completelymap a static environment, (target) tracking problems are ongoing and can be evaluatedbased on average performance. In this case, we can consider tracking a target whose state x t evolves according to a Markov process so that x t t ∼ P ( x t t | x t t − ), (here, the super-script “t”refers to target states) although more general processes are also relevant to this discussion.In this case, the sensing quality J ( X t , Y t ) might correspond to the number of timesthat the robot has observed a target or a measure of the uncertainty in the target’s entire An appropriate sanity condition for F cam would be to require the set of cells which the robot observes A = F cam ( x , E ) to depend only on the state and the values of the cells that the robot observes. Considerthe occupancy values of those cells E A . We can require for all environments E (cid:48) that whenever the valuesof the observed cells are the same E (cid:48) A = E A , then set of cells that the robot observes A = F cam ( x , E (cid:48) )must also be the same. This avoids inadvertently providing information about cells outside of the field ofview. X t0: t . Adapting (2.3) as appropriate, the time-average performance ismax π lim t →∞ E [ J ( X t , Y t )] t , (2.11)given some observation model and subject to robot dynamics. Then, given the focus ontime-average performance, the tracking problems we study (unlike exploration) are nottime-sensitive sensing problems as described in Sec. 2.1.We will now discuss these problems briefly because they are of secondary importanceto this thesis and because many of the same properties and challenges apply. The target tracking problem (2.11), as described, is a kind of long duration Markov Deci-sion Process (MDP) [4], and if the target states X t are not known exactly, such trackingproblems are also partially observable as for time-sensitive sensing problems. Here, thekey feature of the problem is that (2.11) focuses on average performance for all time anddoes not favor near-term gains. Although some solution methods exist for these problems,the problems that we study also incorporate features that make these problems difficult tosolve such as very large state and action spaces. One important property in such problems is ergodicity . For an ergodic system, the dis-tribution of states for a single trajectory, taken over a very long duration, is the same asthe distribution for many trajectories. 
In short, ergodicity states that no transitions in a system have irrevocable effects. Irrevocable effects could include termination conditions or a system that falls off a ledge and cannot get back up again. Long-duration problems are typically studied under various sorts of ergodicity conditions [4]. In this sense, ergodicity conditions determine whether (2.11) describes rewards for an individual trajectory or whether rewards could vary greatly across even very long trials. Unsurprisingly, ergodicity has also been used in heuristics to solve similar problems [8, 127, 129].
We briefly note that concepts from information theory and information dynamics are highly relevant to target tracking problems, and we will go into more detail on related concepts later in this text [23, 58]. Information theory can describe how quickly uncertainty in target states can grow [58] and can provide further connections to ergodicity or characterize interactions between system components [23].
Stability properties are also relevant to tracking problems. In this case stability can describe whether robots may travel far from the targets that they are tracking or whether uncertainty can grow indefinitely [156]. Such conditions will generally produce worst-case rewards in (2.11) (such as zero or infinitely negative reward). Although we will not seek to characterize stability in the systems we study, this property characterizes the kind of behavior that could arise in results.
In an abstract sense, the discussion so far applies to both individual robots and multi-robot teams when viewed as a collective. Yet, this chapter has not explicitly addressed the ramifications of multi-robot sensing. To begin, the following chapters will describe various methods to decompose and plan for multi-robot sensing problems. These methods are all much more efficient than applying planners for individual robots to the joint state space of the entire team.

Additional challenges and constraints also arise when considering multi-robot teams. For example, safety constraints (Sec. 2.1.1d) can be amended to account for inter-robot collisions. We also encounter new challenges when considering distributed planning (via processors onboard each robot) and accounting for communication between robots. Regarding distributed planning, robots should be able to obtain solutions in a timely manner, preferably while taking advantage of parallel computation. Likewise, a distributed planner may require systems for sharing sensor data or maintaining consistent beliefs via communication between robots. This, in turn, leads to challenges with respect to communication as robots may have to maintain connectivity across the team or respect bandwidth constraints. We will begin to explore these challenges next with the background and related work (Chapter 3), and later Chapter 8 will address challenges related to communication more directly.

Chapter 3 Background and Related Work
This chapter will establish the theoretic and practical foundations for this thesis and for sensing and information gathering in unknown environments. We will begin with a high-level discussion of sensor planning, exploration, and navigation in unknown environments (Sec. 3.1). This section builds somewhat on the challenges for sensing problems (Chapter 2) and introduces related works in this area. The foundations for questions about information and uncertainty in robotics come from information theory (Sec. 3.2). More directly, there is a wide body of literature related to robotics that applies information theory to planning and control for sensing and information acquisition (Sec. 3.3). And for the purpose of this thesis, we are particularly interested in applying techniques for sensing and information gathering to multi-robot settings (Sec. 3.4). Many sensing objectives, especially those based on information theory, share useful monotonicity properties. One of the best known monotonicity properties is submodularity, and many single- and multi-robot planning problems can be solved efficiently and near-optimally using techniques for submodular maximization (Sec. 3.5). In particular, we are interested in applying methods for submodular maximization to distributed planning for multi-robot teams. There are already a number of works on parallel and distributed submodular maximization (Sec. 3.6). However, not all of these works are readily applicable to multi-robot settings. Further, no existing works are able to scale to large numbers of robots due to increasing planning time. Therefore, in addition to developing methods for aerial autonomy and exploration, this thesis also presents new methods for submodular maximization that can scale to arbitrary numbers of robots.
While sensing goals and objectives can vary significantly, we expect urban emergency and disaster response tasks to feature cluttered and at least partially unknown environments, and we refer to scenarios where mapping is the primary sensing task as exploration problems. We may then cast navigation and mapping in unknown environments as a sensing problem [37, 103] and plan to maximize some notion of information gain. However, operation in unknown environments is distinguished by several unique challenges:

• Robots must avoid collision with objects and each other;
• The navigable subset of the environment is revealed incrementally while the robot observes free and occupied space;
• Robot states and positions are uncertain; and
• Environments may be large and informative observations distant.

Further, the non-trivial dynamics of aerial robots exacerbate these challenges. The rest of this section will address information-based formulations and each of these challenges in turn.
Safety reduces to planning collision-free paths for slow-moving ground robots with large fields of view [116, 172, 198], and authors frequently overlook challenges related to safety constraints. In contrast, the aerial robots that we study can move quickly and cannot stop instantaneously. These same robots can also move side-to-side, outside of the camera field of view, and may not be able to observe what they are moving toward. Constraints on reachability that ensure set-invariance [150] can ensure that robots always avoid collisions with the environment and each other or can ensure that robots can return to a starting position [71]. Safety requirements also constrain how quickly robots move toward [76, 77] and plan through [95] unmapped space. As such, learning to avoid collisions [152] can also improve speeds to the extent that the learner can generalize about the environment.
Thus far, we have emphasized observations of the state of the environment rather than the states of the robots themselves. As studied in robotics, uncertainty in the states of robots leads to simultaneous localization and mapping (SLAM) problems and some additional challenges: estimates of the robot state can drift dangerously over time, and uncertainty in the state history contributes to uncertainty in the map. Direct observation of position by GPS or motion capture can mitigate or eliminate these challenges. However, GPS access can be limited when operating in urban environments [137], which motivates development of reliable SLAM systems for the problems we study.

Robots can also plan to minimize uncertainty via information-theoretic methods for sensor planning [7, 93, 172], forming methods for what is known as active SLAM. While classical SLAM algorithms that apply particle filters to two-dimensional SLAM readily admit joint optimization to minimize uncertainty in the map and robot state [172], methods for sensing with modern smoothing-based frameworks [7, 93] are somewhat more abstract and often represent uncertainty in environment geometry indirectly. Additionally, some works in this area employ the same greedy algorithms for submodular maximization [7, 93] which we study in this thesis. However, submodularity generally only applies to robots' histories of states and observations but not to future states, due to dependence on the robots' own control actions [7].

As such, even though this thesis does not address state uncertainty explicitly, methods similar to those we propose could play a role in systems for active SLAM. Alternatively, designers may also wish to address uncertainty in future states through constraints on uncertainty as a matter of safety, and such constraints could be incorporated into the problem formulations we propose.
The mobility of aerial robots exploring an environment, contrasted against limitations on the field of view, motivates methods that can predict information gain at different viewpoints, such as with mutual information [37, 103, 201] and coverage-like [29, 62] objectives. Authors have also applied a variety of such objectives along with numerous techniques for planning paths and actions [20, 36, 116]. We will apply methods based on motion primitives [139, 179] and Monte-Carlo tree search [39, 116], and, more recently, our work has addressed motion primitive design in this framework [76, 77]. Additionally, Chapter 7 will revisit objective design and draw connections between a number of relevant objectives for robotic exploration.
Some of the earliest works on exploration [198] emphasize the geometry of the environment: observations of unknown space by a robot in known free space all pass through the boundary between the known free space and unknown space, which is called the frontier. Given the weak assumption that robots are always able to observe nearby frontiers, navigating toward the nearest frontier can serve as a simple and reliable technique for exploring an environment to completion. Although frontiers can be useful for selecting local control actions [50, 62], they also model the distribution of information sources in exploration tasks: after mapping one part of an environment, a robot may traverse a significant distance to reach the next nearest frontier. In this sense, we can generalize frontiers as sets of sufficiently informative points in the robot state space [57], and robots can alternate between maximizing information gain locally and navigating to new information-rich regions of the environment. Moreover, in either case, designers may consider routing robots between information-rich regions or frontiers [111] to reduce travel distance or completion time.
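As a rough illustration of the frontier concept, the sketch below extracts frontier cells from a toy occupancy grid; the three-valued grid encoding, the 4-connected adjacency, and the example map are assumptions made only for this example and are not tied to any particular system in this thesis.

```python
import numpy as np

# Frontier extraction on a toy occupancy grid: cells are FREE, OCCUPIED, or UNKNOWN, and a
# frontier cell is a known-free cell adjacent to unknown space.
FREE, OCCUPIED, UNKNOWN = 0, 1, -1
grid = np.array([[ 0,  0,  0, -1, -1],
                 [ 0,  1,  0, -1, -1],
                 [ 0,  0,  0,  0, -1]])  # arbitrary illustrative map

def frontier_cells(grid):
    """Return known-free cells with at least one 4-connected unknown neighbor."""
    frontiers = set()
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= i < rows and 0 <= j < cols and grid[i, j] == UNKNOWN
                   for i, j in neighbors):
                frontiers.add((r, c))
    return frontiers

print(sorted(frontier_cells(grid)))  # [(0, 2), (1, 2), (2, 3)]
```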
Most works on multi-robot exploration focus on assignment of frontiers [29, 30] and environment regions [170] or maintaining maps via communication [40, 57]. Although authors do not typically apply arguments based on submodularity in this domain, greedy assignment mechanisms are common [29, 30], but at the level of task assignment rather than receding-horizon planning as in our approach (which we would expect to run at faster rates than global assignment processes).

As such, designers seeking to map an environment quickly may deploy large numbers of robots that interact on short time scales, and this thesis addresses the concomitant challenges via coordination over finite horizons. Our approach then complements works on exploration that focus on larger spatial and temporal scales [29, 30, 134], and such methods may run in outer loops or on centralized nodes to supplement the distributed planners we propose (as in Fig. 1.2).
Exploration scenarios, such as for search and rescue [138], have the potential to impair communication between robots via damage to infrastructure and harsh environments (e.g. due to stone and concrete). Teams of robots may then jointly plan to explore the environment and to maintain or regain communication links to communicate data to the operator and each other [13, 14, 181]. Such methods for maintaining communication are diverse: Banfi et al. [14] allow robots to become disconnected but enforce recurrent connectivity constraints to enable periodic communication; Tatum [181] proposes a method where robots jointly explore and place communication nodes with emphasis on subterranean environments; or designers may simply require the team to maintain connectivity at all times [128, 187].

This thesis does not address connectivity directly. However, the planners we propose are robust to loss of communication, and our approach could reasonably be extended by applying any of the cited connectivity-aware methods as part of a high-level planning component.
Having discussed a few sensing problems in robotics, let us delve into the foundations for quantifying uncertainty in states and information gained via sensing actions. To begin, entropy quantifies the uncertainty in a random variable X (such as the position of a robot or a target) with the average number of bits necessary to encode a realization of that variable and is denoted as

H(X) = Σ_i −P(X = i) log P(X = i).    (3.1)

The joint entropy then expresses the combined uncertainty of two (or more) variables X and Y and simply involves another sum:

H(X, Y) = Σ_i Σ_j −P(X = i, Y = j) log P(X = i, Y = j).    (3.2)

Then, conditional entropy encodes the expected uncertainty in X given that Y will be observed:

H(X | Y) = Σ_i Σ_j −P(X = i, Y = j) log P(X = i | Y = j).    (3.3)

This can be written as an expectation over possible outcomes for Y,

H(X | Y) = Σ_j P(Y = j) Σ_i −P(X = i | Y = j) log P(X = i | Y = j) = E_{j∼Y}[H(X | Y = j)],    (3.4)

or in terms of the joint entropy, H(X | Y) = H(X, Y) − H(Y).

The goal of exploration and other sensor planning problems is often to reduce uncertainty, and therefore entropy, of a target variable (say X) by obtaining observations Y. Mutual information quantifies that expected reduction of the entropy of X given an observation Y,

I(X; Y) = Σ_i Σ_j −P(X = i, Y = j) log [ P(X = i) P(Y = j) / P(X = i, Y = j) ],    (3.5)

and can be written in terms of entropies,

I(X; Y) = H(X) − H(X | Y) = H(X) + H(Y) − H(X, Y).    (3.6)

In some cases, we will also apply the conditional mutual information which simply involves modifying the definition to include conditioning on the given variable,

I(X; Y | Z) = H(X | Z) − H(X | Y, Z) = −H(Z) + H(X, Z) + H(Y, Z) − H(X, Y, Z).    (3.7)

Please refer to Cover and Thomas [58] for further detail on information theory and the properties of entropy and mutual information, including for the further discussion in this section, except when otherwise noted.
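As a quick illustration of these definitions, the snippet below computes entropy, conditional entropy, and mutual information for a small joint distribution; the numbers are arbitrary illustrative data, and the base-2 logarithm (bits) is an assumption made for the example.

```python
import numpy as np

# A small joint distribution P(X, Y) over discrete outcomes (rows index X, columns index Y).
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])

p_x = p_xy.sum(axis=1)   # marginal P(X)
p_y = p_xy.sum(axis=0)   # marginal P(Y)

def entropy(p):
    """Shannon entropy in bits, H = -sum p log2 p, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

h_x = entropy(p_x)                  # H(X), Eq. (3.1)
h_xy = entropy(p_xy.ravel())        # H(X, Y), Eq. (3.2)
h_x_given_y = h_xy - entropy(p_y)   # H(X | Y) = H(X, Y) - H(Y)
mi = h_x - h_x_given_y              # I(X; Y) = H(X) - H(X | Y), Eq. (3.6)

print(f"H(X) = {h_x:.3f} bits, H(X|Y) = {h_x_given_y:.3f} bits, I(X;Y) = {mi:.3f} bits")
assert mi >= -1e-12  # mutual information is non-negative
```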
The discussion at the beginning of this section focuses on information theory for the simple case of discrete variables. Analogous expressions exist for continuous variables and probability densities. These collectively refer to differential information. For example, the differential entropy can be written as

h(X) = ∫ −p(X = x) log p(X = x) dx,    (3.8)

where X is a continuous variable and p(X = x) is its probability density. The expression (3.8) is nearly identical to the original expression for entropy in (3.1) aside from replacing the sum with an integral. Likewise, differential information satisfies many of the same properties as the discrete form so that the two are frequently interchangeable. In particular, this thesis will make use of monotonicity properties such as submodularity which are discussed later in this chapter; both forms of information satisfy these properties equally. As such, this thesis will not typically disambiguate differential and discrete information, except in cases where the two behave differently. One such difference is that differential entropy may be negative, so h(X) ≥ 0 does not necessarily hold. By extension, I(X; Y) = h(X) − h(X | Y) is not necessarily at most h(X), so mutual information is also not upper-bounded by differential entropy. Fortunately, even though differential entropy may be negative, mutual information between continuous variables is always non-negative.

Information-theoretic expressions exhibit a number of intuitive properties from which this theory gains much of its elegance and generality. Entropy and mutual information satisfy chain rules [58, Theorem 2.5.1–2]. For entropy,

H(X_1, . . . , X_n) = H(X_n | X_1, . . . , X_{n−1}) + H(X_1, . . . , X_{n−1}),    (3.9)

and for mutual information,

I(X; Y_1, . . . , Y_n) = I(X; Y_n | Y_1, . . . , Y_{n−1}) + I(X; Y_1, . . . , Y_{n−1}).    (3.10)

These equations allow us to write both kinds of quantities in terms of sums of marginal gains (the conditional term in each expression). We will later find that we are able to take advantage of the behaviors of these marginal gains when optimizing information-theoretic quantities.

Entropy also decreases under conditioning so that H(X | Y) ≤ H(X), which expresses that new information cannot increase the uncertainty of a random variable. If we write the mutual information in terms of entropies, we can rearrange this inequality to prove that the mutual information between two variables is non-negative, I(X; Y) = H(X) − H(X | Y) ≥ 0. If X and Y are independent, then H(X) = H(X | Y) and therefore I(X; Y) = 0. Similar statements hold for conditioning when variables are conditionally independent. Additionally, these properties can be used to prove that common cases of mutual information are submodular, and we will come back to that later.

Having an upper bound on mutual information is often useful, such as when we would like to establish when a sensing task is almost complete. For discrete random variables,

I(X; Y) = H(X) − H(X | Y) ≤ H(X).    (3.11)

However, this bound does not hold in general for continuous variables: differential entropy may be negative, and, with enough observations, we may learn the value of a continuous variable to arbitrary precision and thereby gain unbounded information. However, if observations share a noisy channel, mutual information between the observation and target variable is bounded by the mutual information between the target variable and the output of the channel. Specifically, the data processing inequality establishes that if X → Y → Z, then

I(X; Z) ≤ I(X; Y).    (3.12)

(The expression X → Y → Z is a Bayes graph and states that the joint distribution of the random variables can be factored as P(X = x, Y = y, Z = z) = P(X = x) P(Y = y | X = x) P(Z = z | Y = y).) This will be important later for target tracking problems where we will discuss channel capacities between robots and targets.

This thesis applies techniques from information theory and combinatorial optimization to robot sensing and information gathering. As a robot moves through an environment it may obtain sensor data, reduce uncertainty in its environment model, and gain information rewards. Problems related to selecting actions to maximize quality of sensor data are prominent in robotics and control and studied with a wide variety of tools. The methods we propose for coordinating teams of robots complement algorithms for individual robots as both can contribute to improving completion times in time-sensitive sensing tasks. Here, we provide an overview of such techniques, moving from general applications (Sec. 3.3.1), to more formal informative path planning problems (Sec. 3.3.2), and theory that characterizes how and why robots benefit from replanning and updating their decisions based on incoming sensor data (Sec. 3.3.3).
Sensing and information gathering problems span a wide spectrum that includes trackingand localization problems [35, 59], robot state estimation [7, 93], sensing problems basedon manipulation and mechanical interactions [1, 21, 53], and active perception problemsfor control and reconstruction with image data [10, 153]. Although methods for decision-making vary, submodular functions and information-theoretic objectives not only generalizeacross these domains but are also applied regularly [7, 35, 53, 59, 93, 153]. While we focus ontime-sensitive mapping and sensing problems such as urban disaster response, the methodswe propose may also be applied more broadly. Likewise, we desire planning techniques thatapply across multiple sensing tasks as aspects of different tasks such as localization [35, 59]and mapping [37, 103] may arise simultaneously.
Algorithms for informative path planning seek to plan a path for a robot to maximize information gain or some other submodular function, subject to a limited travel budget. Such problems have been studied extensively. Performance guarantees exist for algorithms on graphs [41, 169, 200], and sampling-based methods exist for optimization of paths in continuous spaces [90].

Figure 3.1: (a) No obstacle; (b) Obstacle. The illustrations provide a typical example of adaptation (or replanning) during robotic exploration. Here, a robot will observe unknown space behind a boulder only as that robot navigates around to the back. This robot may benefit from quick adaptation by (a) navigating into the unknown space if there is no obstacle. Else, (b) the robot must be prepared to remain in free space if that unknown space turns out to be occupied.
These and related algorithms can also serve as suboptimal oracles to greedily optimizepaths for individual robots in multi-robot problems [169], and performance guarantees canbe rewritten in terms of the performance of the given planner. In this sense, path plannerscan be chosen independently of methods for coordinating the multi-robot team.
Decision processes such as Markov decision processes (MDPs), partially observable Markov decision processes (POMDPs), and variants thereof directly model continuing processes of selecting actions, obtaining observations, and selecting new actions to maximize some reward. Although rewards are typically functions of states and actions, specialized formulations permit rewards in terms of beliefs, e.g. based on mutual information or entropy [5, 171]. Greedy algorithms have been applied to solve combinatorial subproblems when actions are sets of sensors [159] and in multi-agent problems to optimize sequences of policies [112], and those lines of work are complementary to our own.

Instead, adaptive submodularity [79] seeks to directly extend techniques for submodular optimization to decision processes. However, such approaches require strict regularity on objectives and the sets of actions to maintain monotonicity and diminishing returns. Mobile robots generally violate these conditions as sets of available sensing actions change while the robot moves through the environment. Other problems in robotics, such as manipulation, are more amenable to the requirements of adaptive submodularity [45, 96].

Few methods for decision processes are applicable to mobile robots and large-scale problems. However, Monte-Carlo techniques for POMDPs provide an avenue to addressing large-scale decision processes [167]. These techniques and applications in robotic exploration [116] inspired our use of Monte-Carlo tree search for single-robot planning in this thesis. Otherwise, rather than seeking to optimize adaptation via policies (which is often intractable), we may instead consider the benefit of adaptation and the cost of not adapting [6, 78, 91]. Operation in unknown environments, as in urban disaster response, also induces rapidly changing beliefs. Robots moving through cluttered environments may benefit from quickly responding to observations of occluded objects [76, 77] (see the example in Fig. 3.1), and further, teams of robots should maintain this ability to adapt and replan as appropriate for the time-scales of a given sensing problem.
The sensing problems that we have discussed to this point also come with multi-robot counterparts. We are interested in scenarios where robots interact locally while operating in close proximity and observing nearby regions of an environment, and our methods seek to enable robots to react to new observations both individually and collectively over short time spans. The techniques we develop leverage simple greedy algorithms and are scalable while providing strong performance guarantees. However, various other techniques exist and are applicable to the problems we consider. Best et al. [18] propose a multi-robot planner that applies Monte-Carlo tree search as we do but uses a different mechanism for multi-robot coordination: probability collectives, a generic optimization approach based on game theory. In this case, our approach will provide stronger guarantees on solution quality for planning in finite time. We also note that Sung et al. [176] propose a max-min formulation for tracking problems that addresses similarly short time scales and satisfies performance guarantees before rounding. Sung et al. [176] also address a similar challenge as we do by developing an algorithm with computation time that is independent of the problem size. Although we study a slightly different class of sensing problems, we note that our mechanisms for coordination, based on sharing incremental solutions, are somewhat simpler and more general than the linear programming methods that they employ.
Singh et al. [169] are largely responsible for developing sequential planning methods for multi-robot sensing (based on methods for submodular maximization which we will discuss next in Sec. 3.5) and informative path planning. In these methods, robots plan in a sequence (as illustrated in Fig. 3.2), and each robot seeks to maximize a sensing objective given all prior decisions. For many objectives, solutions by sequential planning are also guaranteed to be within half of the optimal objective value, whereas obtaining optimal solutions is NP-Hard.

Figure 3.2: Four robots plan sequentially. Each robot selects an action to maximize a sensing objective given the decisions by the prior robots.

While Singh et al. [169] studied a mutual information objective, sequential planners can provide suboptimality guarantees for a wide variety of objective functions (which we will discuss in the next section). As a result, their work influenced not only a variety of works on informative planning [7, 151] but also problems with more varied objectives such as search for a moving target [89] and survivability-aware problems where robots may fail en route to their destinations [102]. While most of these authors study similar multi-robot path planning problems, Jorgensen et al. [102] and Williams et al. [194] also apply related greedy planning methods to more complex assignment problems such as with constraints on coverage, resources (such as a limited number of runways), and network connectivity.

Historical note: Singh et al. [169] are most commonly cited for specializing matroid-constrained submodular maximization to partition matroids for sensor planning and, more specifically, multi-robot path planning. Goundan and Schulz [81] observed around the same time that this kind of approach recreates a variant of the locally greedy algorithm described by Fisher et al. [69]. In fact, Singh et al. [169], Goundan and Schulz [81], and even also Williams [193] all appear to have independently (re)developed similar specializations to partition matroids (implicitly or explicitly) at around the same time.
A useful feature of sequential planners is that specialized planners like those in Sec. 3.3.2can be used to implement the maximization step while retaining similar approximationguarantees [169] (that is, they serve as maximization oracles ). This is particularly impor-tant because the alternative is to iterate over all possible solutions for a given robot—doingso is typically intractable for path planning problems as the number of possible solutionsis generally exponential in both the time horizon and the number of available actions ateach step. Moreover, this approach enables designers to take advantage of advances forplanning for individual robots and to apply those methods to planning for multiple robotswith minimal effort.
In non-adaptive settings (where robots do not replan after obtaining new information), solutions can be found offline and without strict constraints on time or computation. For example, Singh et al. [169] plan paths for a sensing task on a lake and compute the entire paths beforehand. However, this thesis studies adaptive settings (Sec. 2.1.1f) where the plan or sensing actions change in response to incoming sensor data during execution of the sensing task, and sequential planning forms a component of a solver for a receding-horizon optimization problem (Sec. 2.1.2) [20, 38, 116].

Sequential planners for these receding-horizon problems must then satisfy time constraints for replanning. When planning with large numbers of robots, meeting time constraints may require improving efficiency via parallel versions of these planners or by parallelizing the single-robot planners. The following sections will describe the theory for these sensing problems and provide more detail regarding parallel and distributed planning.

3.5 Submodular maximization
The sequential planning and informative path planning problems that arise in this thesis broadly fall into the framework of so-called submodular maximization problems. Specifically, many sensing objectives increase monotonically and exhibit diminishing returns (submodularity). These classes of functions have been studied extensively from the perspective of combinatorial optimization, and there are numerous results for exact and approximate optimization in various settings. In this way, monotonic and submodular functions studied in combinatorial optimization are analogous to the convex functions studied in nonlinear optimization: each has been studied intensively, and optimization problems involving either class of functions can be solved efficiently. Objective functions with these properties appear frequently in robotics and especially in sensing problems. For example, notions of monotonicity or diminishing returns intuitively describe how the value of a collection of sensing actions changes as actions are added or removed. Later, we will discuss submodular objective functions such as common cases of mutual information objectives and different notions of sensor coverage.

Going a step further, we are also interested in different kinds of constraints on these optimization problems, just as constraints are common in other kinds of optimization problems. Whereas this section discusses optimizing functions of sets, the constraints in question will determine which sets to consider. These constraints model concepts such as which robots can perform an action, how many actions a robot can perform at once, or which sets of actions will result in collisions between robots. In this sense, these constraints will describe which sets of actions teams of robots can feasibly execute.

Consider a finite (but possibly extremely large) set of sensing actions U, commonly referred to as the ground set. The ground set describes the universe of set elements that are available in a given optimization problem, e.g. the set of all sensing actions or finite-horizon plans available to any robot. A set function g : 2^U → R maps a collection of such actions to a real value (often a reward) where 2^U is the power set of the ground set, the set of all subsets of U.

Most objectives have zero value at the empty set, or else that value can be subtracted to normalize the objective.

Definition 1 (Normalized). A set function g is normalized if g(∅) = 0.

A function is monotonically increasing (or monotonic) if its value cannot decrease.

Definition 2 (Monotonically increasing). A set function g is monotonically increasing if for any A ⊆ B ⊆ U then

g(A) ≤ g(B).    (3.13)

(There are connections between submodular functions and both convex and concave functions, but those are unrelated to this analogy. Some texts describe the functions that we call monotonically increasing as being "non-decreasing" to emphasize the weak inequality in the definition.)
Definition 3 (Submodular (supermodular)). A set function g is submodular if for any A ⊆ B ⊆ U and C ⊆ U \ B then

g(A ∪ C) − g(A) ≥ g(B ∪ C) − g(B).    (3.14)

Here, the marginal gain for C decreases after adding elements to A to obtain B. The negation −g is supermodular.

Objective functions for robot sensing are frequently monotonic and submodular. Mutual information (3.5) and entropy (3.1) naturally characterize reduction of uncertainty with respect to an unknown environment, and robots may select sensing actions which reduce that uncertainty. Mutual information with respect to a target variable satisfies monotonicity and submodularity when observations are conditionally independent [107]. Specifically, if we associate elements of a ground set a ∈ U with conditionally independent observations Y_a, the set function g(A) = I(X; Y_A) is normalized, monotonic, and submodular [107]. However, designers may consider other kinds of objectives, either for different tasks or because mutual information can be expensive to evaluate. For example, we will also study simpler coverage-like objectives which satisfy the same properties and can be thought of as representing sums of rewards for observing objects in subsets of the environment.

For brevity, we will treat set functions as multi-variate functions with implicit unions so that g(A, B) = g(A ∪ B) where A, B ⊆ U. We will also implicitly convert elements of the ground set to subsets so that g(x) = g({x}) for x ∈ U. We also write incremental changes as g(A | B) = g(A, B) − g(B), which is analogous to conditioning for mutual information (3.7) and also expresses the discrete derivative of g at B with respect to A (discussed next in Sec. 3.5.1c). Using this notation, the expression for submodularity (3.14) can be written more concisely as

g(C | A) ≥ g(C | B),    (3.15)

recalling that A ⊆ B.

When dealing with sets and set functions, being able to concisely specify and index into sets will be advantageous. Often sets will be specified with an implied ordering as in X = {x_1, . . . , x_n}. Subscripts will then be used for indexing subsets as in the following range X_i = {x_1, . . . , x_i} for i ≤ n or sometimes using sets of indices so that X_A = {x_j | j ∈ A}.

The definitions of monotonic (3.13) and submodular (3.14) functions are closely related. The value of a monotonic function may only stay the same or increase, and the values of incremental changes of submodular functions may only stay the same or decrease.
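To make these definitions concrete, the sketch below checks normalization, monotonicity, and submodularity by brute force for a toy coverage objective in which each sensing action observes a handful of map cells; the action footprints are arbitrary illustrative data and not drawn from the experiments in this thesis.

```python
from itertools import combinations

# Toy coverage objective: each sensing action "covers" some map cells, and g(X) counts
# the cells covered by the union of the selected actions (footprints are illustrative).
actions = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {2, 6, 7},
}

def g(selected):
    """Coverage value: number of distinct cells observed by the selected actions."""
    return len(set().union(*(actions[x] for x in selected))) if selected else 0

def gain(x, prior):
    """Discrete derivative g(x | prior) = g(prior ∪ {x}) − g(prior)."""
    return g(set(prior) | {x}) - g(prior)

# Enumerate all subsets and verify normalization, monotonicity, and submodularity.
names = list(actions)
subsets = [set(c) for r in range(len(names) + 1) for c in combinations(names, r)]
assert g(set()) == 0                                   # normalized (Def. 1)
for A in subsets:
    for B in subsets:
        if A <= B:
            assert g(A) <= g(B)                        # monotone (Def. 2)
            for x in set(names) - B:
                assert gain(x, A) >= gain(x, B)        # submodular (Def. 3)
print("coverage objective is normalized, monotone, and submodular on this example")
```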
Figure 3.3: (a) g(A) = g(A | ∅), (b) g(B | A), (c) −g(B; C | A). In this figure, areas shown in gray are equal in size to the (a) zeroth, (b) first, and (c) second derivatives of a set function g, each evaluated at A. As will be discussed in more detail later, the derivatives of some set functions are monotonic, such as in this illustration of area coverage. If we interpret A as a subset of R^2, (a) area coverage increases monotonically in A; (b) marginal gains decrease monotonically; and (c) the second derivative increases monotonically (the negation of which, the highlighted area, decreases).

We can go further and define higher derivatives of set functions and corresponding monotonicity conditions. To assist in understanding, Fig. 3.3 provides examples of such derivatives for an area coverage objective.
Given disjoint variables A, B, C ⊆ U, the second derivative of g at C in directions A and B is

g(A; B | C) = g(A, B, C) − g(A, C) − g(B, C) + g(C).

Expressions of second derivatives will also arise with respect to the empty set. Such expressions take the form of

g(A; B) = g(A | B) − g(A) = g(A, B) − g(A) − g(B),

given that g(∅) = 0, and being able to recognize them will be useful. For higher derivatives, given X ⊆ U and Y_1, . . . , Y_n ⊆ U, all disjoint, we can define the n-th derivative of g with respect to Y_1, . . . , Y_n recursively as

g(Y_1; . . . ; Y_n | X) = g(Y_1; . . . ; Y_{n−1} | X, Y_n) − g(Y_1; . . . ; Y_{n−1} | X).    (3.16)

Note that, just as for continuous derivatives, derivatives of set functions are not sensitive to the order of the directions.

This notation for set functions and their discrete derivatives also intentionally overlaps with notation for entropy and mutual information. Continuing with this analogy, the mutual information could be thought of as the negation of the second derivative of entropy were it interpreted as a set function, as I(A; B | C) = H(A | C) − H(A | B, C). Therefore, one might adopt the colloquialism I(A; B | C) = −H(A; B | C), although no such notation will be used in this text. Foldes and Hammer [70] provide a slightly different definition of the derivative. As this thesis will reference their work repeatedly, we provide a brief proof of equivalence in Appendix B.1.

Figure 3.4: The above illustrates the 3-increasing condition for coverage. Let the area covered by a number of sets (or sensing actions) be a function g. The green area corresponds to an amount of sensing redundancy between two sensor coverage actions A and B conditional on a third observation C. The size of that area in green is equal to the negation of a second derivative, −g(A; B | C) = g(A | C) + g(B | C) − g(A, B | C), and the area decreases monotonically as the region represented by C increases. Then, second derivatives g(A; B | C) increase monotonically for coverage objectives, so we say that g is 3-increasing (as the first monotonicity condition corresponds to the zeroth derivative).

Foldes and Hammer [70] also describe monotonic and submodular set functions as being, respectively, 1-increasing and 2-decreasing and provide detailed discussion of these conditions. More generally, if the n-th derivative of a function is non-negative, then derivatives of order n − 1 increase monotonically, and the function is n-increasing. Then, if that derivative is non-positive, the function is instead n-decreasing. Later, we will apply this terminology in a requirement that certain objective functions are 3-increasing. In particular, weighted set coverage is 3-increasing, and we illustrate the intuition in Fig. 3.4. Later, we will provide more direct analysis when we encounter this condition again.

Higher-order monotonicity conditions, such as this, are not yet common in the literature on optimizing set functions but are experiencing increasing interest [44, 84, 106, 191]. Wang et al. [191] apply the 3-increasing condition to develop a variation of the continuous greedy algorithm, and Chen et al. [44] study functions with alternating monotonicity conditions, something we will also encounter. Measures of redundancy [68] and analogues of mutual information [94] also fit the form of second derivatives and are highly relevant to works on 3-increasing functions.
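The sketch below illustrates the second derivative g(A; B | C) and the 3-increasing condition for the same kind of toy coverage objective, verifying by brute force that second derivatives increase (toward zero) as the conditioning set grows; the footprints are again arbitrary illustrative data.

```python
from itertools import combinations

# Second derivative of a set function and the 3-increasing condition on a toy coverage objective.
actions = {"a": {1, 2, 3}, "b": {3, 4, 5}, "c": {5, 6}, "d": {2, 6, 7, 8}}

def g(selected):
    return len(set().union(*(actions[x] for x in selected))) if selected else 0

def d2(A, B, C):
    """Second derivative g(A; B | C) = g(A,B,C) − g(A,C) − g(B,C) + g(C) for disjoint A, B, C."""
    return g(A | B | C) - g(A | C) - g(B | C) + g(C)

# For coverage, −g(A; B | C) is the redundancy between A and B that C does not already cover,
# so g(A; B | C) should increase as C grows: the 3-increasing condition.
names = set(actions)
for a, b in combinations(names, 2):
    A, B = {a}, {b}
    rest = names - A - B
    subsets_of_rest = [set(c) for r in range(len(rest) + 1) for c in combinations(rest, r)]
    for C in subsets_of_rest:
        for D in subsets_of_rest:
            if C <= D:
                assert d2(A, B, C) <= d2(A, B, D)  # second derivative grows with conditioning
print("second derivatives of this coverage objective increase monotonically (3-increasing)")
```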
When proving bounds for submodular maximization techniques, we will often want to decompose derivatives with respect to sets (say A ⊆ U) in terms of the contributions of individual elements (i.e. a ∈ A). Chain rules allow us to do so. Previously, we have discussed such chain rules in the context of information theory (Sec. 3.2). Set functions and their derivatives satisfy analogous chain rules. Later, we will require chain rules for both first and second derivatives of set functions. Toward this end, the following defines a chain rule for any derivative of a set function.

Lemma 1 (Chain rule for derivatives of set functions). Consider sets Y_1, . . . , Y_n, X ⊆ U, all disjoint. Then, writing the elements of Y_n as Y_n = {y_{n,1}, . . . , y_{n,|Y_n|}}, the derivative of g can be rewritten in terms of derivatives with respect to the individual elements of Y_n as

g(Y_1; . . . ; Y_n | X) = Σ_{i=1}^{|Y_n|} g(Y_1; . . . ; Y_{n−1}; y_{n,i} | Y_{n,i−1}, X).    (3.17)

The proof of Lemma 1 is in Appendix B.2.

This thesis focuses primarily on multi-robot planning problems that can be described using combinatorial constraints. Such constraints that describe feasible solution sets are called set systems. However, a completely general constraint would not be useful for the purpose of optimization. Independence systems describe constraints that admit subsets of any feasible set and are the most general kind of set system that we will find useful.
Definition 4 (Independence system). A tuple (U, I), where I is composed of subsets of the ground set so that I ∈ I implies I ⊆ U, is an independence system if I satisfies a heredity property so that for all I′ ⊆ I with I ∈ I, then I′ ∈ I, and, additionally, I is non-empty (∅ ∈ I).

Multi-robot non-collision constraints form one example of an independence system; eliminating a robot from a non-colliding multi-robot plan will not cause the remaining robots to be in collision. A matroid is a special case of an independence system and generalizes the notion of linear independence from vector spaces to set systems. Matroids have been studied extensively in the optimization literature (discussed in more detail in following sub-sections) and find use in this work to model product spaces in multi-robot planning problems.
Definition 5 (Matroid). An independence system (U, I) that also satisfies the exchange property is a matroid; that is, for all I_1, I_2 ∈ I such that |I_1| > |I_2|, there exists some x ∈ I_1 \ I_2 such that I_2 ∪ {x} ∈ I.

The reader may think of elements that can be added to a given set as being effectively perpendicular to that set. For more detail on this topic, the reader may wish to refer to Schrijver [162]. A consequence of the exchange property is that all maximal independent sets (those to which we cannot add any more elements) have the same cardinality, which is referred to as the rank of the matroid or rank(I). Additionally, in this work, the rank will also be equal to the number of robots or other sensing agents.

Although matroids are the focus of numerous related works on optimization, a matroid is a somewhat more general structure than is necessary in this thesis, and we will gain in efficiency from using slightly more specialized constraints. A partition matroid is a special case of a matroid and is nearly directly analogous to a product space.

Definition 6 (Partition matroid). Consider a partitioning of the ground set into blocks B_i ⊆ U for all i ∈ {1 . . . n} such that B_i ∩ B_j = ∅ for all i ≠ j. Then (U, I) is a partition matroid if I = {I ⊆ U | a_i ≥ |I ∩ B_i|, ∀ i ∈ {1 . . . n}} where a_i ∈ Z_{>0} is the maximum number of elements that can be selected from a given block.

If a_i = 1 for all i, the partition matroid will be referred to as a simple partition matroid [162, Sec. 39.4]. These matroids will be used to model joint action spaces of multi-robot teams where B_i is the space of trajectories or actions available to robot i, assuming that there are no inter-robot constraints on action selection. Although the techniques developed in this thesis will all use simple partition matroids, they all extend to general partition matroids with slight modification.

Finally, we will discuss one more (and simpler) type of matroid, the uniform matroid, which expresses a cardinality constraint on sets in the set system.

Definition 7 (Uniform matroid). A uniform matroid is an independence system (U, I) that fits the form I = {I ⊆ U | n ≥ |I|} where n ∈ Z_{>0}.

The problems studied in this thesis involve selecting control actions to maximize notions of sensing quality such as mutual information with a map or sensor coverage. When such objectives are submodular (and generally also monotonic and normalized), the resulting optimization problems fit broadly into the realm of submodular maximization problems. Then, when these problems involve planning actions for a multi-robot team, as hinted in Sec. 3.5.2, we will often also introduce a matroid constraint of some form.
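The sketch below represents a simple partition matroid for a hypothetical three-robot team as per-robot blocks of candidate actions together with an independence test; the robot and action names are made up for illustration.

```python
# A simple partition matroid (Def. 6 with a_i = 1): the ground set is partitioned into
# per-robot blocks of candidate actions, and a set is independent if it contains at most
# one action per robot. All names here are hypothetical examples.
blocks = {
    "robot_1": {"r1_hover", "r1_forward", "r1_turn_left"},
    "robot_2": {"r2_hover", "r2_forward"},
    "robot_3": {"r3_forward", "r3_turn_right"},
}

def is_independent(selection):
    """Membership test for the simple partition matroid: at most one element per block."""
    return all(len(selection & block) <= 1 for block in blocks.values())

assert is_independent(set())                           # the empty set is feasible
assert is_independent({"r1_hover", "r2_forward"})      # one action per robot
assert not is_independent({"r1_hover", "r1_forward"})  # two actions for robot 1
# Heredity holds (any subset of an independent set is independent), the exchange property
# holds, and every maximal independent set has size 3, the rank of this matroid.
```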
Problem 1 (Submodular (monotonic, normalized) maximization problem). Let g : 2^U → R be a submodular, monotonic, and normalized set function and (U, I) a set system. Together, these define a submodular maximization problem

X* ∈ arg max_{X ∈ I} g(X)

with optimal solution(s) X*.

Now, unless otherwise specified, the objectives in any submodular maximization problems discussed in this text will also be monotonic and normalized. We will primarily only refer to other cases when discussing related work on other problems or in passing. Instead, we now focus on the form of the constraint I and, eventually, further conditions on g. A general set system may present 2^{|U|} feasible solutions. Although |I| will generally be smaller, solving submodular maximization problems by iteration is typically intractable. Recall that when (U, I) is an independence system (Def. 4), subsets of feasible solutions are also feasible. Solutions can then be constructed incrementally by searching U for new elements to add to partial solutions. In fact, algorithms that construct such solutions greedily often guarantee solutions within a constant factor of the optimal solution.

Algorithm 1 (Greedy submodular maximization). Greedy submodular maximization produces feasible (suboptimal) solutions to a submodular maximization problem (Prob. 1) where (U, I) is an independence system. Define an incremental solution X_i such that X_0 = ∅ and

X_i = X_{i−1} ∪ {x_i},  where x_i ∈ arg max_{x ∈ U : {x} ∪ X_{i−1} ∈ I} g(x | X_{i−1}).    (3.18)

This algorithm outputs a solution, designated X^g = X_n, once no more solution elements can be added, i.e. once there is no x ∈ U such that {x} ∪ X_n ∈ I. Each greedy step involves a search over the ground set U when using this algorithm, and keep in mind that solving (3.18) may still be intractable when U is large. Developing methods that avoid incurring costs of such sequential maximization steps is a recurring theme in related work and this thesis.
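A minimal sketch of Alg. 1 follows, with the set system supplied as a black-box independence test; the coverage objective and cardinality constraint in the usage example are illustrative assumptions rather than a problem instance from this thesis.

```python
def greedy_max(ground_set, g, is_independent):
    """A sketch of Alg. 1: repeatedly add the element with the largest marginal gain among
    those that keep the solution independent, stopping when no element can be added.
    `g` scores a set of elements; `is_independent` encodes the set system (U, I)."""
    solution = set()
    while True:
        candidates = [x for x in ground_set - solution if is_independent(solution | {x})]
        if not candidates:
            return solution
        # Greedy step (3.18): maximize the marginal gain g(x | X_{i-1}).
        best = max(candidates, key=lambda x: g(solution | {x}) - g(solution))
        solution.add(best)

# Example use: a coverage objective with a cardinality constraint (a uniform matroid of rank 2).
footprints = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {4, 5, 6}}
coverage = lambda X: len(set().union(*(footprints[x] for x in X))) if X else 0
print(greedy_max(set(footprints), coverage, lambda X: len(X) <= 2))  # {'a', 'd'}, six cells covered
```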
Cardinality-constrained submodular maximization forms a simple and particularly common submodular maximization problem. In this case, the statement that the cardinalities of feasible solutions are no greater than some value produces a uniform matroid (Def. 7) constraint.

Problem 2 (Cardinality-constrained submodular maximization). Any instance of Prob. 1 where (U, I) is a uniform matroid is an instance of cardinality-constrained submodular maximization.

Cardinality-constrained problems arise frequently in applications [32, 51, 101] due to the simplicity of the constraint as well as because of strong tightness and hardness guarantees. These tightness and hardness guarantees are important to us because uniform matroids are a special case of set systems such as partition matroids and because many results for cardinality-constrained problems also apply to problems with more complex constraints (such as by forming lower bounds). Nemhauser et al. [140] proved that greedy solutions to cardinality-constrained problems satisfy a constant-factor suboptimality guarantee g(X^g) ≥ (1 − 1/e) g(X*) ≈ 0.63 g(X*). Soon after, some of the authors of that work also proved that this approximation guarantee is optimal in the value oracle model (where g is treated as a black box) for algorithms that evaluate g a polynomial number of times [141]. Still, certain classes of these problems could be easier to solve or could be solved more quickly given complete access to the problem representation. Feige [66] proved that no better bound can be achieved by polynomial-time algorithms for set coverage objectives unless P=NP. (Set coverage refers to "covering" a set C: the elements of the ground set x ∈ U are subsets of this set, x ⊆ C, and the objective is to maximize the cardinality of their union, g(X) = |∪_{x ∈ X} x|.) Not only is this a stronger guarantee, but the simplicity of the set coverage objective enables reductions to other relevant objectives. This includes generalizations of coverage discussed later in this text and, thanks to Krause and Guestrin [107], Shannon mutual information on Bayes graphs.

While cardinality-constrained problems provide hardness results that are relevant to this thesis, matroid-constrained problems will generalize the multi-robot planning problems that we will discuss.
Problem 3 (Matroid-constrained submodular maximization). An instance of Prob. 1 where (U, I) is a matroid is an instance of matroid-constrained submodular maximization.

Fisher et al. [69] proved that when (U, I) is a matroid, solutions by Alg. 1 satisfy g(X^g) ≥ (1/2) g(X*). Although this thesis applies greedy algorithms, more recent works [31, 67] propose more advanced methods such as the continuous greedy algorithm [31, 185] that achieve guarantees approaching 1 − 1/e, which is the best possible guarantee given the hardness results for cardinality-constrained problems.

Although direct application of Alg. 1 provides solutions within half of optimal, as mentioned, the greedy step (3.18) involves iteration over the entirety of U, which may be expensive. This can be avoided in cases such as when the matroid constraint is a partition matroid. For simplicity, we will focus on the special case of simple partition matroids.

Problem 4 (Simple partition matroid-constrained submodular maximization). An instance of Prob. 1 where (U, I) is a simple partition matroid is an instance of partition matroid-constrained submodular maximization.

Problems constrained with simple partition matroids admit a relaxation of the greedy algorithm: rather than computing the incremental solutions by maximizing over the entire ground set, increments can be computed by maximizing over the blocks of the partition.
Algorithm 2 (Locally greedy algorithm for simple partition matroids). The locally greedy algorithm produces feasible (suboptimal) solutions to partition matroid-constrained submodular maximization problems (Prob. 4). Define solution elements x_i such that

x_i ∈ arg max_{x ∈ B_i} g(x | {x_1, . . . , x_{i−1}})    (3.19)

to produce the output X^g = {x_1, . . . , x_n} where n is the rank of the partition matroid.

Notably, this algorithm implements the same sequential planning process as in Sec. 3.4.1 and will gain guarantees on solution quality for submodular functions. This algorithm is also similar to the prior algorithm for greedy submodular maximization (Alg. 1). However, because the maximization steps are now over individual blocks, computation of a complete solution requires only a single pass over the ground set. Yet, crucially, solutions via Alg. 2 satisfy the same bounds as for Alg. 1: g(X^g) ≥ (1/2) g(X*).

The form of the locally greedy algorithm is particularly desirable for distributed implementations on multi-robot systems. Recall that a simple partition matroid can be used to model the joint space in a multi-robot planning problem. In this case, B_i is the local set of actions available to the i-th robot. Robots can then plan for themselves by searching over these local sets of actions, given access to prior robots' plans but without access to the entire ground set. However, dependency on prior solution elements is a barrier to parallel computation in distributed implementations. One of the contributions of this thesis is to eliminate that barrier in multi-robot sensing problems. A small sketch of this sequential process appears at the end of this subsection.

Monotonicity serves as the starting point in the proofs of many common performance bounds. If X* is an optimal solution to a submodular maximization problem (Prob. 1) and X ⊆ U, then

g(X*) ≤ g(X*, X).    (3.20)

Typically, X is the full solution obtained by the greedy algorithm, but that is not strictly necessary [82]. We can exchange an element of the optimal solution x* ∈ X* for an element of a greedy solution by applying the principle of greedy choice,

g(x* | X̂) ≤ g(x^g | X̂),    (3.21)

where X̂ is generally a subset of the greedy solution and x* ∈ B belongs to the feasible set B ⊆ U of a greedy maximization step that solves x^g ∈ arg max_{x ∈ B} g(x | X̂). Observe that if we instead have an expression for g(x | X̄) where X̂ ⊆ X̄ ⊆ U, we can apply submodularity to replace X̄ with X̂ because g(x | X̄) ≤ g(x | X̂). If the greedy solutions are also suboptimal, we may also apply the expression for the suboptimality to (3.21) to replace the optimal greedy solution elements with suboptimal ones.

As Minoux [132] observes, this intuition can be applied more generally to obtain online bounds [79, 109] for solutions obtained by any algorithm, such as by applying (3.20) and bounding the terms of the sum in g(X*, X) ≤ Σ_{x* ∈ X*} g(x* | X), e.g. by maximizing over the ground set U or an independence system I. These online bounds have been used to demonstrate that real solutions by greedy algorithms can often perform significantly better than worst-case bounds in practice [79, 109]. Similar bounds will later prove to be useful for characterizing solution quality for multi-robot exploration.
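Returning to Alg. 2, the following is a minimal sketch of the locally greedy (sequential) planning process, in which each robot maximizes its marginal gain over its own block given the prior robots' choices; the per-robot action footprints are arbitrary illustrative data.

```python
def locally_greedy(blocks, g):
    """A sketch of Alg. 2: robots decide one after another, each maximizing its marginal
    gain over its block B_i given prior choices. `blocks` maps each robot to its candidate
    actions; `g` scores a set of actions."""
    chosen = set()
    for robot, block in blocks.items():
        # Greedy step (3.19): search only this robot's block, conditioned on prior decisions.
        x_i = max(block, key=lambda x: g(chosen | {x}) - g(chosen))
        chosen.add(x_i)
    return chosen

# Example with a coverage objective; footprints and names are illustrative assumptions.
footprints = {
    "r1_left": {1, 2}, "r1_right": {2, 3},
    "r2_left": {3, 4}, "r2_right": {4, 5},
    "r3_left": {5, 6}, "r3_right": {1, 6},
}
blocks = {"robot_1": ["r1_left", "r1_right"],
          "robot_2": ["r2_left", "r2_right"],
          "robot_3": ["r3_left", "r3_right"]}
coverage = lambda X: len(set().union(*(footprints[x] for x in X))) if X else 0
plan = locally_greedy(blocks, coverage)
print(plan, coverage(plan))  # one action per robot; a single pass over each block
```

Each step searches a single block rather than the whole ground set, which is what makes the per-robot computation cheap; the sequential dependence between steps is exactly the barrier to parallelism discussed above.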
An important challenge in this thesis is to reduce computation time for submodular maximization in multi-robot sensor planning. Ideally, this computation time would be independent of the number of robots in the team, and we would achieve this by taking advantage of parallel computation. However, that is easier said than done due to sequential constraints on the decision process and constraints on information access. Further, although there is a growing body of work on parallel and distributed submodular maximization, many existing methods are impractical for multi-robot sensor planning. Still, a few works propose methods that are relevant to this thesis, and we will seek to take advantage of their contributions.
Figure 3.5 illustrates the distributed setting for the submodular maximization problems that we will study. In this work, robots will solve submodular maximization problems (Prob. 4) where each robot i ∈ R selects an action from a block of a partition matroid, x ∈ B_i. We assume robots have access to a locally approximate model of the environment, which we refer to here as θ_i (and which we will omit for most of this text), via a distributed perception system. With this model a robot can approximate marginal gains for its own actions g_{θ_i}(x | Y) given others' selections Y ⊆ U. However, a robot may not evaluate marginal gains for assignments to other robots because that robot may not have an up-to-date model of distant regions of an occupancy map or estimates of positions of distant targets.

Figure 3.5: Illustration of the model of computation for distributed sensor planning (receding-horizon planning supported by distributed perception). Each robot i ∈ R has access to a block of a partition matroid B_i which represents that robot's space of control actions. The perception system then provides a locally consistent model θ_i. Given this model, a robot can accurately approximate marginal gains in the sensing objective g_{θ_i}(x | Y) for its own actions x ∈ B_i given a set of others' actions Y (e.g. numbers of newly observed cells in an occupancy map for x or the reduction in uncertainty of positions of nearby targets). But, that robot cannot compute marginal gains for other, possibly distant, robots (and distant targets or regions of an occupancy map).

Additionally, while evaluating different parallel and distributed algorithms, we will sometimes write results in terms of the number of robots n_r = |R| based on the following assumption:

Assumption 1 (Proportional problem size). Consider a team of n_r robots and a submodular sensor planning problem (in the form of Prob. 4). The size of the ground set is proportional to the number of robots, |U| ∈ Θ(n_r); the rank of the matroid constraint is n_r; and the blocks of the partition matroid have constant size, |B_i| ∈ Θ(1) for all i ∈ R.

A number of properties of parallel and distributed submodular maximization algorithms can contribute to increases in computation time in multi-robot sensor planning. Consider a team of n_r robots that plan using O(n_r) onboard processors (or, for the purpose of this section, the robots may be supported by a centralized resource with the same number of processors). We are interested in the following quantities:
• Adaptivity: The number of sequential calls to the objective g. See the next section (Sec. 3.6.3).
• Parallelism: The number of parallel queries to the objective g that an algorithm assumes.
• Query number: The maximum number of queries to the objective g. Given an algorithm that runs with k queries, the n_r processors require time at least on the order of k/n_r.
• Query size: The cardinality of the largest set |X| that an algorithm uses to query the objective g. For derivatives (3.16) (such as the marginal gain), this includes elements of the sets representing the derivative directions. (The query size for R-lRSP depends on the size of the largest communication neighborhood.)

Previously, Sec. 3.5 mentioned that different algorithms take different numbers of passes over the ground set. In the same direction, the general greedy algorithm (Alg. 1) involves Θ(n_r^2) queries because it maximizes over the entire ground set at each step. On the other hand, the locally greedy algorithm (Alg. 2) only completes a single pass for Θ(n_r) queries. Parallelizing the general greedy algorithm over n_r processors may then obtain similar computation times as the local algorithm on a single processor.

The total number of queries for the continuous greedy algorithm is more subtle. However, we can note that each integration step requires at least Ω(n_r) queries (technically, this scales with the rank of the matroid, but the rank is n_r in all cases that we encounter). Algorithms that solve subproblems with the continuous greedy algorithm on single processors require at least Ω(n_r) time.

Additionally, we are not aware of any algorithm (other than what we propose) that obtains constant-factor suboptimality with a maximum query size that is less than the number of robots. Instead, we are able to achieve constant query size and suboptimality in certain settings by allowing robots to select actions for themselves given limited access to prior selections.
Nominally, greedy methods (as in Alg. 2) operate sequentially (e.g. with one step per robot) because each decision in the sequence takes all prior decisions into account. A few works that we will discuss later in more detail consider variations of such greedy algorithms where agents can ignore some prior decisions [75, 82]. However, if we fix the degree of suboptimality, the best algorithms in this class require a number of sequential steps that is proportional to the number of robots, just as for other greedy algorithms.

More broadly: How much can parallel computation reduce planning time for submodular maximization with any algorithm? To answer this question, recent works have begun to study the adaptive complexity of submodular maximization problems. Balkanski and Singer [11] describe the adaptive complexity as "the minimal number of sequential rounds required to achieve a constant factor approximation when polynomially-many queries [to the submodular objective] can be executed in parallel at each round." Moreover, adaptivity is the worst-case number of sequential steps for a given algorithm; an algorithm that executes in k steps is k-adaptive. By virtue of parallel execution, none of the queries to the objective during a given round can depend on another, and the set of queries for a given adaptive round is known at the beginning of the round.

Caution: Adaptivity is related to but distinct from computational complexity and the (parallel) duration of computation. Adaptivity provides a lower bound on the growth of the computation time, but the actual computation time may grow more quickly, depending on factors such as the number of available processors and the application context. Nevertheless, decreasing adaptivity is a necessary first step toward developing constant-time algorithms in this thesis.

Looking toward applications in multi-robot sensing, the best known algorithm for submodular maximization with a matroid constraint has adaptivity O(log |U| log(rank(I))) [12, 42]. Further, Balkanski and Singer [11] proved that no algorithm can obtain adaptivity better than Ω̃(log |U|) (lower-bounded by log |U| up to lower-order logarithmic terms). Critically, these results improve on prior algorithms for submodular maximization that obtain constant-factor guarantees, including the continuous greedy algorithm [31], which has adaptivity of at least Ω(|U|) [12, Sec. 3.1] in general. However, we caution that the continuous greedy algorithm may have better or even constant adaptivity on the class of problems that we study; the same analysis techniques that we apply to sequential planning in this thesis might also be applicable to the continuous greedy algorithm to obtain performance bounds where the number of integration steps is independent of the number of robots.

Further, one limitation of low-adaptivity methods is that converging to non-trivial constant-factor bounds (e.g. obtaining 1 − 1/e − ε for small ε) can require a large (polynomial) number of steps in both theory and practice as discussed by [24] and [119]. However, Breuer et al. [24] presented an algorithm for cardinality-constrained problems which obtains both low adaptivity and efficiency in practice and is the first to do so. Still, no such algorithm exists for more general matroid constraints.
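As a rough illustration of the adaptive-round model (not from the thesis; the batching structure, names, and toy objective are illustrative), the sketch below fixes a whole batch of marginal-gain queries at the start of each round so they could be evaluated simultaneously; adaptivity counts the number of such rounds rather than the total number of queries.

```python
def coverage(actions):
    """Toy submodular objective: number of cells covered by a set of actions."""
    return len(set().union(*actions)) if actions else 0

def one_adaptive_round(candidates, selected, objective):
    """All queries in a round are fixed up front and independent of one another,
    so they could execute simultaneously on parallel processors."""
    batch = [selected + [x] for x in candidates]                 # queries known at round start
    gains = [objective(s) - objective(selected) for s in batch]  # evaluable in parallel
    return gains

def greedy_with_round_count(candidates, k, objective):
    """k-adaptive toy selection: one adaptive round per greedy step."""
    selected, rounds = [], 0
    pool = list(candidates)
    for _ in range(k):
        rounds += 1
        gains = one_adaptive_round(pool, selected, objective)
        best = max(range(len(pool)), key=lambda i: gains[i])
        selected.append(pool.pop(best))
    return selected, rounds

if __name__ == "__main__":
    candidates = [frozenset({i, i + 1}) for i in range(10)]
    solution, rounds = greedy_with_round_count(candidates, 4, coverage)
    print("covered:", coverage(solution), "adaptive rounds:", rounds)
```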
Likewise, although low-adaptivity results characterize what is possible for submodular maximization algorithms, they also depend on shared-memory models of computation (Parallel Random Access Machines, PRAM) which makes them difficult to adapt to distributed computation on multi-robot teams.

Table 3.1 summarizes the results we have discussed in this section in terms of the number of robots n_r in a sensing problem. Moreover, the adaptive complexity of submodular maximization with a matroid constraint lies in Ω̃(log n_r). In order to develop algorithms with constant adaptivity, we will need to restrict the problem class further and rely on additional structural properties of the sensing problems we study.

Table 3.1: Adaptivity results (numbers of sequential steps) for submodular maximization on partition matroids within a constant factor of optimal in terms of the number of robots n_r. This thesis presents algorithms that achieve O(1) adaptivity on a slightly restricted class of submodular maximization problems.

Setting                                                              Lower bound    Upper bound
Any algorithm that achieves constant-factor performance
(general adaptive complexity) [11, 12]*                              Ω̃(log n_r)     O(log n_r)
Variations of the locally greedy algorithm (Alg. 2) that
allow robots to ignore prior decisions [75, 82]                      Ω(n_r)         O(n_r)

* The tilde in Ω̃ ("Soft-Omega") indicates that the expression absorbs lower-order logarithmic terms. The adaptivity of the continuous greedy algorithm depends on the number of discrete steps in an integral that are necessary to obtain a performance bound; Balkanski et al. [12] (informally) provide a lower bound on this number, and suboptimality guarantees [31] typically produce upper bounds on the adaptivity.

Many parallel algorithms for submodular maximization apply the popular
MapReduce model to solve problems with cardinality [133] and matroid [15, 16, 113] constraints. (The MapReduce model [60] is popular due to its simplicity and roughly consists of three steps: 1) distribute data across processors; 2) execute (map) some function on that data in parallel, say by squaring values; 3) combine (reduce) the outputs from the prior step, e.g. by summing the squares.) Although these algorithms may appear widely different at first glance, they share features that make them less suitable for multi-robot sensing. Each of these approaches [15, 16, 113, 133] seeks to reduce computation time by running greedy algorithms (sequential greedy algorithms or the continuous greedy algorithm) on small subsets of the ground set U. In doing so, they distribute such subsets across processors and run some greedy algorithm on each subset in parallel to produce candidate solutions. Although the mechanisms for producing subsets differ, those that run sequential greedy algorithms [15, 113, 133] on those subsets all have high adaptivity (at least O(n_r)) which means that computation times will inevitably increase and become intractable for large enough numbers of robots. Likewise, all approaches, including those that apply the continuous greedy algorithm [16], produce full-rank solutions when solving subproblems and so have Ω(n_r) queries and computation time on each processor.

Moreover, these approaches are also generally impractical in the multi-robot settings that we consider in this thesis as each robot would have to compute candidate solutions for the entire team. Additionally, distributing necessary information for that computation and collecting the candidate solutions would involve a significant amount of communication. By contrast, this thesis proposes approaches with low adaptivity where robots plan for themselves and communicate individual solution elements rather than full solutions.

A number of works have proposed distributed algorithms that are applicable to submodular maximization for different kinds of networked multi-robot and multi-agent systems [46, 75, 135, 154, 164]. Some of these consider settings where the objective is a sum of local objectives (i.e. g(X) = ∑_{i=1}^{n_r} g_i(X)), such as if the objective is derived from data that is spread out across all processors. Their approaches are interesting due to how they apply continuous consensus techniques [28, 143] to distribute the continuous greedy algorithm across processors. On the other hand, all processors operate on the entire ground set and constraint structure which requires Ω(n_r) time (much like algorithms in Sec. 3.6.4).

Instead, Robey et al. [154] extend the work of Mokhtari et al. [135] to the same setting that we apply for multi-robot sensing with the ground set and partition matroid distributed across the computation nodes. However, convergence still slows with increasing numbers of robots as evident from inspection of the error bound [154, Theorem 1] and due to limitations on adaptivity of the continuous greedy algorithm (refer to Sec. 3.6.3, [12]). Although we will not present novel variants of the continuous greedy algorithm, the methods for analysis of redundancy that we develop could possibly also be applied to algorithms like what Robey et al. [154] describe to provide constant-time guarantees for the problems we study. Otherwise, the primary challenge for applying distributed versions of the continuous greedy algorithm [154, 164] to the problems we study is adapting the continuous greedy algorithm to use single-robot planners as maximization oracles.
Doing so would require solving path planning problems on the multilinear extension of the objective [31] which requires costly sampling. Additionally, the large solution spaces endemic to path planning problems may adversely impact suboptimality as error bounds for the continuous greedy algorithm depend on the size of the ground set.

Multi-robot sensor planning is also closely related to task assignment as both can often be written in the form of similar submodular maximization problems. Specifically, there is a special case of submodular maximization with a partition matroid constraint (Prob. 4) that is relevant to task assignment, the submodular welfare problem. (Vondrák [185] provides an example of the reduction from a welfare problem to Prob. 4.) Seeking to solve such problems, Choi et al. [46] developed the consensus-based bundle algorithm (CBBA) which implements a distributed version of the general greedy algorithm (Alg. 1). Later, Williams et al. [194] demonstrated that this approach also applies to problems with more complex constraints such as intersections of matroids. Most importantly, several works have extended CBBA [98, 166], and distributed implementations have been demonstrated on aerial robots [146]. Although CBBA-like approaches can provide significant speedups on relevant problems compared to the general greedy algorithm (which these approaches implement), they still have O(n_r) adaptivity in the worst case. However, CBBA can also converge more quickly in practice which is an important point for comparison.

3.6.5b Game theory for distributed submodular maximization

Distributed submodular maximization has also been studied from the perspective of game theory. Specifically, various equilibria satisfy constant-factor suboptimality guarantees [155]. In fact, the strategy sets for games with submodular utility form a partition matroid, and Nash equilibria satisfy the same performance guarantees as greedy algorithms on matroid-constrained problems [184]. When agents employ algorithms that converge to such equilibria, game theoretic methods can be interpreted as producing algorithms for distributed optimization. Moreover, Goundan and Schulz [81] observed that one method of doing so would implement the locally greedy algorithm (Alg. 2) exactly. Additionally, this line of work has been applied to planning and sensing problems [112] and has produced interesting results for coverage on Voronoi decompositions [126]. Still, game theoretic methods do not yet provide tools for reducing the time to find a near-optimal solution which is a key challenge for this thesis.
Increasing numbers of robots, unreliable and constrained communication, and limited planning time all motivate a closer look at algorithms and models of computation for submodular maximization applied to multi-robot systems. A few works [75, 82] have begun to study the locally greedy algorithm (Alg. 2) on distributed computation models that are relevant to multi-robot sensor planning (where each agent in a network selects an action from a block of a partition matroid). Specifically, Gharesifard and Smith [75] consider variants of the locally greedy algorithm where agents have limited information about others' decisions according to a directed acyclic graph (DAG) (Fig. 3.6). Crucially, this framework can describe planners that employ parallel computation [175], and for this reason, we apply similar models throughout this thesis. Gharesifard and Smith [75] also provided worst-case analysis on this model (see also Sec. 3.6.3) which Grimsman et al. [82] then extended with tighter bounds. However, planners based on these worst-case results have Ω(n_r) adaptivity when the suboptimality is held constant. To address this, we will focus on developing planners that can take advantage of features of problem structure beyond submodularity in order to provide near-optimal results with a constant number of sequential steps.
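The following sketch (not from [75]; the graph encoding, names, and toy objective are illustrative) shows the basic structure of such DAG-constrained greedy planning: each agent selects its action greedily while conditioning only on the selections of its in-neighbors in a directed acyclic graph, so agents without edges between them can plan in parallel.

```python
def coverage(actions):
    """Toy submodular objective: cells covered by the chosen actions."""
    return len(set().union(*actions)) if actions else 0

def dag_greedy(blocks, in_neighbors, objective):
    """Each agent i picks from its block, seeing only decisions of in_neighbors[i].
    Agents are processed in an order consistent with the DAG (here: index order)."""
    choices = {}
    for i, candidates in enumerate(blocks):
        visible = [choices[j] for j in in_neighbors[i] if j in choices]
        best = max(candidates,
                   key=lambda x: objective(visible + [x]) - objective(visible))
        choices[i] = best
    return [choices[i] for i in range(len(blocks))]

if __name__ == "__main__":
    # Four agents, two candidate actions each.
    blocks = [[frozenset({i, i + 1}), frozenset({i + 1, i + 2})] for i in range(4)]
    fully_sequential = {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2]}   # standard locally greedy
    two_parallel_groups = {0: [], 1: [], 2: [0, 1], 3: [0, 1]}    # agents 0,1 (and 2,3) plan in parallel
    for name, dag in [("sequential", fully_sequential), ("two groups", two_parallel_groups)]:
        sol = dag_greedy(blocks, dag, coverage)
        print(f"{name}: value = {coverage(sol)}")
```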
(a) A full sequential communication graph. (b) A graph consisting of two cliques. (c) A maximal graph for parallel planning.
Figure 3.6:
This figure illustrates several directed acyclic communication graphs for submodular maximization using the model proposed by Gharesifard and Smith [75]. (a) The standard locally greedy algorithm (Alg. 2) corresponds to a complete directed acyclic graph; (b) although analysis by Gharesifard and Smith [75] and Grimsman et al. [82] leads to planners consisting of cliques, (c) algorithms proposed in this thesis use subsets of edges from maximal communication graphs with parallel computation.

Chapter 4
Toward Distributed Multi-Robot Exploration

This chapter (which originally appeared in [54, 56]) introduces an early version of the methods developed in this thesis in the context of a team of robots exploring an unknown environment. We will continue to use Monte-Carlo tree search for single-robot planning throughout this thesis as we also do in other related works [57, 76]. The parallel algorithm for submodular maximization (DSGA) that we describe is also a precursor to methods based on Randomized Sequential Partitions (RSP) which we will introduce later on. Although possibly interesting in its own right, an actual distributed implementation of DSGA would be more complex than the distributed implementation of RSP which we introduce in Chapter 8. Moreover, DSGA requires more communication than RSP methods and does not fully eliminate certain elements of sequential computation.

However, some of the contributions of this chapter are unique in this thesis. For example, this is the only chapter to address inter-robot collisions. Although this introduces some additional complexity, similar methods could be applied to the RSP planners that we discuss later. Similarly, this chapter provides the only results involving physical robots.
We pose multi-robot exploration as the problem of actively mapping environments by planning actions for a team of sensor-equipped robots to maximize informative sensor measurements. In this work, we address the problem of planning for exploration with large teams of robots using distributed computation and emphasize online planning and operation in confined and cluttered environments.

Informative planning problems of this form are known to be NP-Hard [110]. Rather than attempt to find an optimal solution in possibly exponential time, we seek approximate solutions with bounded suboptimality that can be found efficiently in practice. Sequential planning techniques (Alg. 2) are common in active sensing and exploration [7, 34, 151]. In this chapter, we will refer to Alg. 2 as performing sequential greedy assignment (SGA). Here, SGA assigns plans to robots in sequence using a single-robot planner to maximize
Figure 4.1:
An example exploration experiment. A multi-robot team explores a three-dimensional environment cluttered with numerous obstacles (cubes) while using an online distributed planner. Known empty space is gray, and occupied space is black. Robots are shown in blue with red trajectories and obtain rainbow-colored point-cloud observations from their depth cameras. (a) The robots begin with randomized initial positions near a lower edge of the exploration environment which is bounded by a cube. (b) After entering the environment, robots spread out to cover the bottom of the cubic environment (c) and then proceed upward to cover more of the volume. (d) Given enough time the robots explore the entire environment.

mutual information between a robot's future observations and the explored map given knowledge of plans assigned to prior robots in the planning sequence. Recall that the properties of mutual information objectives ensure that techniques such as SGA achieve suboptimality within half of optimal [69] and that suboptimal single-robot planners achieve similar bounds [169].

The sequential nature of SGA implies that planning time increases at least linearly with the number of robots and precludes online planning for large teams. Increasing computation time is especially relevant to exploration problems as new information about occupancy significantly affects both feasible and optimal plans and necessitates reactive planning to enable high rates of exploration. We propose a modified version of SGA, distributed sequential greedy assignment (DSGA), which consists of a fixed number of sequential planning rounds. At the beginning of each round, robots plan in parallel using a single-robot planner. A subset of those plans is chosen to minimize the difference between the information gain for the entire subset and the sum of the information gains for each robot individually, which does not consider redundancy between robots. We obtain a performance bound in terms of the result by Singh et al. [169] that explicitly describes the additional suboptimality due to parallel planning as the accrual of differences between actual and computed objective values during the planning process. In doing so, we reduce the distributed planning problem to selection of subsets of plans after each parallel planning phase that minimize these differences. Thus, we are able to take advantage of decoupled observations and distributed computation to improve computation times for online planning without compromising exploration performance.

However, robots may collide with each other and the environment, and aerial robots have non-trivial dynamics. Therefore, we also introduce inter-robot collision constraints and modify our algorithms to guarantee collision-free operation. We apply this approach to teams of aerial robots and demonstrate collision-free exploration both in simulation and experimentally in a motion capture arena.
This section describes the problem of distributed multi-robot exploration. We begin bydescribing the system and environment models and then introduce the planning problemas a finite-horizon optimization.
Consider a team of robots, R = {r_1, . . . , r_{n_r}}, engaged in exploration of some environment m. The dynamics and sensing are described by

x_t = f(x_{t−1}, u_t),   (4.1)
y_t = h(x_t, m) + ν   (4.2)

where x_t represents a robot's state at time t ∈ Z_{≥0} and u_t ∈ U is a control input selected from the finite set U. For aerial robots as studied here, x_t ∈ SE(3). The observation, y_t, is a function of both the state and the environment and is corrupted by noise, ν. We use capital letters to refer to random variables and lowercase for realizations so M and Y_t represent random variables associated with the environment and an observation, respectively.

The environment itself is unknown and is associated with the random variable M which is modeled as an occupancy grid [64] with associated approximations of mutual information for ranging sensors [37, 103, 201]. This occupancy grid is in turn discretized into cells, M = {C_1, . . . , C_{n_m}}, that are either occupied or free with some probability. Cell occupancy is independent such that the probability of a realization m is P(M = m) = ∏_{i=1}^{n_m} P(C_i = c_i) where c_i ∈ {0, 1}. The conditional probability of M given previous states and observations is then written

P(M = m | x_{1:T}, y_{1:T}) = ∏_{t=1}^{T} P(y_t | M = m, x_t) P(M = m) / ∑_{m′} ∏_{t=1}^{T} P(y_t | M = m′, x_t) P(M = m′)   (4.3)

where the sum is over the set {0, 1}^{n_m} of possible environments. As representing an unconstrained joint distribution between cells is intractable, the conditional probabilities of the cells given previous observations are also treated as being independent with probability p_{i,t} such that the conditional probability is

P(M = m | x_{1:t}, y_{1:t}) = ∏_{i=1}^{n_m} p_{i,t}   (4.4)

We denote the collection of probabilities as the belief, b_t = ∪_{i=1}^{n_m} p_{i,t}.

The robots considered in this work are equipped with depth cameras, a form of ranging sensor. Each sensor observation consists of a collection of range observations which each provide the distance along a ray from the sensor origin to the nearest occupied cell in the environment, subject to additive Gaussian noise. For more detail, we refer the reader to work by Charrow et al. [37].

For one robot and one time-step the optimal control input in terms of entropy reduction is

u⋆ = arg max_{u ∈ U} I(Y_{t+1}; M | b_t, x_t)   (4.5)

subject to the dynamics (4.1) and observation model (4.2). Consider an l-step horizon. The problem becomes a belief-dependent, partially observable Markov decision process (POMDP) as is discussed in more detail by Lauri and Ritala [116] and is an optimization problem over policies. We instead optimize over a fixed series of actions (see Sec. 2.1.2) which results in a simpler problem. To simplify notation, let Y_i indicate the space of possible observations available to robot i over the finite horizon induced by the finite set of control inputs, the dynamics (4.1), and the observation model (4.2). The optimal multi-robot, finite-horizon informative plan is then

Y⋆_{t+1:t+l, 1:n_r} = arg max_{Y_{1:l, 1:n_r} ∈ Y_{1:n_r}} I(Y_{1:l, 1:n_r}; M | b_t, x_{1:t, 1:n_r})   (4.6)

where the indexing x_{1:t, 1:n_r} represents values at times 1 through t and for robots 1 through n_r. In the following sections, we will drop the time and robot index as well as robot states and belief when appropriate.

Solutions to (4.6) will be constructed incrementally using greedy algorithms. Using the definitions in Sec.
3.5.2, the constraints in (4.6) can be interpreted as a simple partition matroid where U = ∪_{i=1}^{n_r} Y_i and I = {Y ⊆ U : |Y ∩ Y_i| ≤ 1, ∀i ∈ R}. We make the following assumptions regarding the exploration scenario:
1) all robots havethe same belief state, operate synchronously, and communicate via a fully connected net-work ;
2) incremental motions via f are bounded ; and
3) the sensor range is bounded. The first assumption simplifies analysis in the context of this work. Here we emphasize scenarios where large numbers of robots operate in close proximity leading to redundant observations. Extending the proposed algorithm to incorporate additional considerations such as communication constraints is left to future work. The second and third assumptions ensure that the mutual information between observations made by distant robots is zero. These assumptions simplify the problem structure and are the key reason that the proposed efficient algorithm comes with little to no reduction in solution quality.
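As a concrete illustration of the independent-cell belief in (4.4) and the map entropy that exploration seeks to reduce, the following sketch (not part of the thesis implementation; the log-odds update, sensor values, and cell counts are illustrative) maintains per-cell occupancy probabilities and computes the map entropy as a sum over cells.

```python
import math

def cell_entropy(p):
    """Entropy (bits) of a single Bernoulli occupancy cell."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def map_entropy(belief):
    """Under the independence assumption, map entropy is the sum of cell entropies."""
    return sum(cell_entropy(p) for p in belief)

def update_cell(p_prior, p_hit):
    """Log-odds Bayes update of one cell given an inverse sensor model value
    (assumes a uniform 0.5 static prior, as in the initial belief below)."""
    l = math.log(p_prior / (1 - p_prior)) + math.log(p_hit / (1 - p_hit))
    return 1.0 / (1.0 + math.exp(-l))

if __name__ == "__main__":
    belief = [0.5] * 10                      # b_t: one probability per cell, initially unknown
    print("initial entropy:", map_entropy(belief))
    for i in (2, 3, 4):                      # a hypothetical observation of cells 2-4 as likely free
        belief[i] = update_cell(belief[i], p_hit=0.2)
    print("updated entropy:", map_entropy(belief))
```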
4.3 Single-robot planning

We employ Monte-Carlo tree search [27, 39] for the single-robot planner as previously proposed for active perception and exploration [18, 116, 144] and in multi-robot active perception [18].

In order to ensure bounded and similarly scaled rewards, constant terms from (4.6) are dropped when planning for the i-th robot to obtain a local objective

I(Y_{t+1:t+l, i}; M | Y_{t+1:t+l, A}),   (4.7)

the mutual information between Y_{t+1:t+l, i} and the map conditional on observations Y_{t+1:t+l, A} obtained by some set of robots A such that A ∩ {i} = ∅. Denote solutions obtained from the Monte-Carlo tree search single-robot planner maximizing (4.7) as

Ŷ_i = SingleRobot(i, Y_A)   (4.8)

and assume this planner has suboptimality η ≥
1, such that

η I(M; Ŷ_i | Y_A) ≥ max_{Y ∈ Y_i} I(M; Y | Y_A)   (4.9)

as in the approach of Singh et al. [169]. Although Monte-Carlo tree search does not come with a suboptimality guarantee, some existing algorithms for informative path planning [41, 169] do. Monte-Carlo tree search is used instead on account of limited computation time and, later, for ease in incorporating inter-robot collision constraints which are time-varying.

The main contribution of this work is the design and analysis of a new distributed multi-robot planner that extends the single-robot planner discussed in Sec. 4.3, or any planner satisfying (4.9), to multi-robot exploration. In developing a distributed algorithm, we first present a commonly used algorithm, SGA, which provides suboptimality guarantees [169] but requires robots to plan sequentially. We then propose a similar distributed algorithm, DSGA, and analyze its performance in terms of time and suboptimality.
Consider an algorithm that plans for each robot in the team by maximizing (4.7) given all previously assigned plans and continues in this manner to sequentially assign plans to each robot. We will refer to this as sequential greedy assignment (SGA). Singh et al. [169] use the properties of mutual information discussed in Sec. 3.5.1 to establish that SGA obtains an objective value within 1 + η of the optimal solution. The greedy solution using an optimal single-robot planner can be defined inductively as Y^g = Y^g_{0:n_r}; here, we instead use a suboptimal planner as in (4.8) to obtain the solution Y^g = Y^g_{0:n_r} such that

Y^g_0 = ∅,
Y^g_i = SingleRobot(i, Y^g_{1:i−1}).   (4.10)

This algorithm satisfies the following suboptimality bound.

Theorem 2 (Suboptimality bound of sequential assignment [169]). SGA obtains a suboptimality bound of

I(M; Y⋆) ≤ (1 + η) I(M; Y^g).   (4.11)

This multi-robot planner is formulated as an extension of a generic single-robot planner and depends only on the suboptimality of the single-robot planner. As robots plan sequentially, this leads to large computation times as the number of robots grows.

Consider a scenario with spatially distributed robots such that the mutual information between any observations obtainable within a finite horizon by any pair of robots is zero. The union of solutions obtained for individuals independently is then equivalent to a solution to the combinatorial problem over all robots, Y⋆. A weaker version of this idea applies such that if the plans returned for a subset of robots are conditionally independent, those plans are optimal over that subset of robots regardless of the inter-robot distances. The distributed planner, DSGA, is designed according to this principle and allows all robots to plan at once and then selects a subset of those plans while minimizing suboptimality.

Algorithm 3 Distributed sequential greedy assignment (DSGA) from the perspective of robot i
 1:  n_d ← number of planning rounds
 2:  n_r ← number of robots
 3:  Y_F ← ∅                                      ▷ set of fixed trajectories
 4:  for l = 1, . . . , n_d do
 5:      Y_{i|Y_F} ← SingleRobot(i, Y_F)
 6:      I_{i,0} ← I(M; Y_{i|Y_F} | Y_F)           ▷ planner reward
 7:      I_{i,F} ← I_{i,0}                         ▷ updated reward
 8:      for k = 1, . . . , ⌈n_r/n_d⌉ do
 9:          j ← arg min_{j ∈ R} I_{j,0} − I_{j,F} ▷ computed by distributed reduction across robots
10:          if i = j then
11:              Transmit: Y_{i|Y_F}
12:              return Y_{i|Y_F}
13:          else
14:              Receive: Y_{j|Y_F}
15:              Y_F ← Y_F ∪ Y_{j|Y_F}
16:              I_{i,F} ← I(M; Y_{i|Y_F} | Y_F)   ▷ update reward

DSGA is defined in Alg. 3 from the perspective of robot i. Planning proceeds in a fixed number of rounds, n_d (line 4). Each round begins with a planning phase where each robot plans for itself given the set of plans that are assigned in previous rounds (line 5), stores the initial objective value, I_{i,0} (line 6), and copies this to a variable, I_{i,F} (line 7), that represents the updated value as more plans are assigned. The round ends with a selection phase (line 8) during which a subset of ⌈n_r/n_d⌉ plans are assigned to robots. The plans are assigned greedily to minimize the decrease in the objective values, I_{j,0} − I_{j,F}; the robot whose plan is to be assigned is selected using a distributed reduction across the multi-robot team (line 9) [114], and ties are broken arbitrarily. The chosen robot sends its plan to the other robots (line 11), and these robots store this plan (line 15) and update their objective values (line 16). Denote a planner with n_d planning rounds as DSGA_{n_d}.
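To make the round structure of Alg. 3 concrete, here is a centralized sketch of DSGA (not the thesis implementation; the single_robot_plan stub, the toy coverage objective, and all names are illustrative stand-ins). Each round plans for every unassigned robot against the fixed set and then greedily commits the ⌈n_r/n_d⌉ plans whose rewards decreased the least after conditioning on plans fixed earlier in the round.

```python
import math

def coverage(plans):
    """Toy stand-in for mutual information: number of cells covered."""
    return len(set().union(*plans)) if plans else 0

def marginal(plan, fixed):
    return coverage(fixed + [plan]) - coverage(fixed)

def single_robot_plan(candidates, fixed):
    """Stub single-robot planner: pick the candidate with the largest marginal gain."""
    return max(candidates, key=lambda p: marginal(p, fixed))

def dsga(robot_candidates, n_d):
    """Distributed sequential greedy assignment with n_d rounds (centralized sketch)."""
    n_r = len(robot_candidates)
    fixed, assigned = [], set()
    per_round = math.ceil(n_r / n_d)
    for _ in range(n_d):
        unassigned = [i for i in range(n_r) if i not in assigned]
        # Planning phase: every unassigned robot plans in parallel against the fixed set.
        plans = {i: single_robot_plan(robot_candidates[i], fixed) for i in unassigned}
        initial = {i: marginal(plans[i], fixed) for i in unassigned}   # I_{i,0}
        updated = dict(initial)                                        # I_{i,F}
        # Selection phase: commit the plans whose rewards decreased the least.
        for _ in range(min(per_round, len(unassigned))):
            j = min(updated, key=lambda i: initial[i] - updated[i])
            fixed.append(plans[j])
            assigned.add(j)
            del updated[j]
            for i in updated:                                          # refresh I_{i,F}
                updated[i] = marginal(plans[i], fixed)
    return fixed

if __name__ == "__main__":
    # Four robots, each with two overlapping candidate "observation sets".
    candidates = [[frozenset({i, i + 1}), frozenset({i + 1, i + 2})] for i in range(4)]
    plans = dsga(candidates, n_d=2)
    print("objective:", coverage(plans))
```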
Let D_i be the set of robots whose trajectories are assigned during the i-th distributed planning round and let F_i = ∪_{j=1}^{i} D_j be the set of all robots with trajectories assigned by that round. Denote solutions to this new distributed algorithm as Y^d. Then, let Ŷ_{r|Y^d_{F_i}} represent the approximate solution returned by the single-robot planner for robot r given previously assigned trajectories. The result of DSGA can then be written as Y^d_{D_{i,j}} = Ŷ_{D_{i,j}|Y^d_{F_{i−1}}} where D_{i,j} is the j-th robot assigned during round i. DSGA achieves a bound related to Theorem 2 with an additive term, based on the decrease in objective values from initial planning to assignment, that DSGA seeks to minimize (Alg. 3, line 9).

Theorem 3.
The excess suboptimality of the distributed algorithm (Alg. 3) compared to the bound for greedy sequential assignment is given by the sum of conditional mutual informations between each selected observation and prior selections for the same planning round:

I(M; Y⋆) ≤ (1 + η) I(M; Y^d) + ηψ   (4.12)

where ψ = ∑_{i=1}^{n_d} ∑_{j=1}^{|D_i|} I(Y^d_{D_{i,j}}; Y^d_{D_{i,1:j−1}} | Y^d_{F_{i−1}}) represents the excess suboptimality. The proof is provided in Appendix B.3. (Although we address multi-robot exploration, this result applies generally to informative planning problems and general monotone submodular maximization with partition matroid constraints, aside from notation and problem specialization.)

This is an online bound in the sense that it is parametrized by the planner solution. However, as will be shown in the results, ψ tends to be small in practice, indicating that DSGA produces results comparable to SGA. In this sense, small values of ψ serve to certify the greedy bound of 1 + η empirically without needing to obtain the objective value returned by SGA explicitly.

The main focus of this work is to investigate how aspects of problem structure can be used to achieve efficient planning with minimal impact on suboptimality. This contrasts starkly
to the worst-case suboptimality which is inversely proportional to the number of robots. While worst-case results by Gharesifard and Smith [75] and Mirzasoleiman et al. [133] are relevant, Grimsman et al. [82] recently proved a somewhat tighter bound that is readily applicable to Alg. 3.
Theorem 4 (Worst-case suboptimality). The following worst-case bound for Alg. 3 holds:

I(M; Y⋆) ≤ (1 + η⌈n_r/n_d⌉) I(M; Y^d).   (4.13)

The proof can be found in Appendix B.4.
According to this bound, the planner may entirely fail to take advantage of additionalrobots if additional planning rounds are not introduced. However, this does not accountfor locality which can lead distant robots to become decoupled in the solution. This boundcan then be interpreted as a limiting scenario for when locality does not hold such asdue to extremely close proximity or complex interactions between observations and theenvironment model.
The subset selection step of DSGA itself uses a greedy strategy. Looking at this more directly, the negation of the contribution of a single round is

I(M; Y^d_{D_i} | Y^d_{F_{i−1}}) − ∑_{j=1}^{|D_i|} I(M; Y^d_{D_{i,j}} | Y^d_{F_{i−1}}),   (4.14)

found by application of the chain rule of mutual information (see Sec. 3.2) to (B.7). Equation (4.14) is submodular and monotonically decreasing, unlike the monotonically increasing objectives considered previously. While we apply a heuristic approach and evaluate results empirically, other works have evaluated this setting more directly [74]. More generally, approaches that seek to minimize (4.14) over the course of individual rounds may fail for sufficiently unbalanced problems. For example, a large number of robots with zero contribution could cause all robots with non-zero objective values to be selected at once at significant detriment to solution quality. While such scenarios are not encountered in this work, this is an important direction for future work.

We compare the run time of DSGA to SGA for variable numbers of robots with run time defined as the time elapsed from when the first robot begins computation until the last robot is finished. We assume point-to-point communication over a fully connected network requiring a fixed amount of time per message. The messages have fixed sizes and correspond either to a finite-horizon plan or a difference in mutual information. Given these assumptions, broadcast and reduction steps each require O(log n_r) time [114]. Let α(n_r, b_t) bound the run time of the selected single-robot planner, noting the dependence on the number of robots and the environment. Similarly, let β(n_r, b_t) bound the cost of the mutual information computation in Alg. 3, line 16.

SGA consists of n_r planning steps, each with one broadcast step. The computation time of sequential greedy assignment is then

SGA: O(n_r α(n_r, b_t) + n_r log n_r).   (4.15)

Each of the n_d rounds in DSGA begins with a single planning phase, coming to a cost of O(n_d α(n_r, b_t)). The rest of the algorithm consists of subset selection, broadcast of the chosen plans, and computation of mutual information which cumulatively occur once per robot for a total cost of

DSGA_{n_d}: O(n_d α(n_r, b_t) + n_r β(n_r, b_t) + n_r log n_r).   (4.16)

If n_d grows slowly or is constant, the performance of DSGA relative to SGA depends on the relative cost of the mutual information computation (β). For the approximation developed by Charrow et al. [37], evaluation of mutual information is linear in the number of cells of the map that are observed. Given the assumption of bounded sensor range, evaluation of mutual information scales linearly in the number of robots so that β(n_r, b_t) ∈ O(n_r). When planning time is dominated by a constant number of mutual information evaluations, such as when using a Monte-Carlo tree search planner with a fixed number of sample trajectories, then α(n_r, b_t) ∈ O(n_r) so that both algorithms are quadratic. However, in practice, α includes a large constant factor leading to a significant speedup for DSGA_{n_d} compared to SGA. In general, α may also depend non-trivially on factors such as the length of the plan or the scale or complexity of the environment. This further emphasizes the significance of eliminating the n_r coefficient from the single-robot planning step.
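A small sketch of the cost models (4.15) and (4.16) follows (illustrative only; the constants and the linear forms assumed for α and β are hypothetical) to show how the n_r coefficient on single-robot planning dominates SGA while DSGA's planning cost stays fixed with n_d.

```python
import math

# Hypothetical cost models following (4.15) and (4.16); constants are made up.
C_PLAN, C_MI, C_MSG = 0.5, 0.002, 0.001   # seconds per planner call, MI evaluation, message

def alpha(n_r):
    """Assumed single-robot planning time: grows linearly with team size (bounded sensor range)."""
    return C_PLAN * (1 + 0.05 * n_r)

def beta(n_r):
    """Assumed mutual information evaluation time, also linear in team size."""
    return C_MI * n_r

def sga_time(n_r):
    return n_r * alpha(n_r) + n_r * C_MSG * math.log2(max(n_r, 2))

def dsga_time(n_r, n_d):
    return n_d * alpha(n_r) + n_r * beta(n_r) + n_r * C_MSG * math.log2(max(n_r, 2))

if __name__ == "__main__":
    for n_r in (4, 8, 16, 32):
        print(f"n_r={n_r:2d}  SGA={sga_time(n_r):6.2f}s  DSGA_3={dsga_time(n_r, 3):6.2f}s")
```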
Up to this point, the analysis has considered maximization of the mutual informationobjective over a joint space of finite-horizon trajectories which has the form of a partitionmatroid constraint. However, inter-robot collision constraints cannot be modeled using amatroid, and further challenges arise if the planner can fail to provide a solution or if nosolutions exist.
After including inter-robot collision constraints, the space of feasible joint plans no longer has the structure of a partition matroid or even a general matroid. Consider a trajectory that is free of collisions with one assignment of trajectories to a subset of robots. This trajectory is not necessarily free of collisions with another assignment of trajectories to the same subset of robots, and neither can a robot in the subset necessarily be assigned an additional trajectory from the first assignment. This violates the exchange property for matroids (Def. 5). Instead, the constraints can be modeled using an independence system (Def. 4), given that subsets of collision-free trajectories are also collision-free. In general, greedy algorithms perform arbitrarily poorly for general independence systems [69]. However, we can recover a lower bound of 1/(η n_r) by maximizing mutual information using the first selection in each planning round, which is otherwise under-determined because the differences in Alg. 3, line 9 are uniformly zero for the first plan selected in each round. The bound follows by observing that the maximal feasible subsets have cardinality of at most n_r and by applying suboptimality of the single-robot planner. This result will be used later during discussion of system liveness in Sec. 4.5.3.

Although suboptimality bounds change with the constraint structure, this is not necessarily indicative of typical performance in exploration. Mutual information objectives as used in exploration often favor configurations with large inter-robot distances that minimize redundancy in sensor observations. Robots are also often small compared to their sensor footprints which further mitigates detrimental effects of collision constraints.

Safety is interpreted as meaning that robots do not enter any occupied part of the environment and do not collide with each other, and each condition is implemented using thresholds on distances as discussed in more detail in Sec. 4.6. Safety is guaranteed by enforcing the invariant that the current joint plan is safe and terminates in a controlled invariant state as is common in model predictive control [150]. Plans are executed in a receding-horizon fashion so that robots do not necessarily enter the planned invariant states whereas in the case of planner failure, one or more robots may enter invariant states and remain in those states until a feasible plan can be found. Safety will be guaranteed under the following assumptions:
1) the initial state, at thestart of the exploration process, is an invariant state ; and
2) states that are believed not to be in collision with the environment are free of collisions with the environment and remain so for all time. The first assumption ensures safety by induction. The second prevents arbitrary changes in the conditions for safety. This latter assumption does not always hold in practice; approaches such as selectively relaxing constraints can address this if necessary.

Assume that the team of robots is tracking a plan that is safe for all time (i.e. meets the stated requirements for all time). We modify Alg. 3 to only commit to a single-robot plan if the joint plan that results from swapping the old single-robot plan with the new one is also safe for all time. The following constraints are also implemented in the single-robot planner:
1) plans terminate in an invariant state ; and
2) plans respect collision constraints with other robots and the environment according to the full and current joint plan at planning time. The joint plan which results from replacing a robot's plan with the output of the single-robot planner may not be safe, either due to failure to meet the above constraints or due to updates to other robots' plans. In this case, the system falls back to the existing safe joint plan. (In the implementation, robots are required to come to a stop and trajectories are then temporally extended as necessary. Although these conditions can be relaxed, such as to invariant sets, there do not appear to be clear or significant benefits from doing so.)
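The commit-or-fallback rule described above might be sketched as follows (illustrative; check_pairwise_clearance, the plan representation, and the threshold are hypothetical stand-ins rather than the thesis implementation): a robot's new plan is swapped into the joint plan only if the result remains safe, and otherwise the previously committed safe plan is kept.

```python
def check_pairwise_clearance(joint_plan, min_separation=0.5):
    """Hypothetical safety check: every pair of trajectories keeps a minimum separation
    at every (time-aligned) sample; trajectories are lists of (x, y, z) samples."""
    robots = list(joint_plan)
    for a in range(len(robots)):
        for b in range(a + 1, len(robots)):
            for p, q in zip(joint_plan[robots[a]], joint_plan[robots[b]]):
                dist = sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5
                if dist < min_separation:
                    return False
    return True

def try_commit(joint_plan, robot, new_plan, is_safe=check_pairwise_clearance):
    """Swap in a robot's new plan only if the resulting joint plan is safe;
    otherwise fall back to the existing safe joint plan."""
    candidate = dict(joint_plan)
    candidate[robot] = new_plan
    return candidate if is_safe(candidate) else joint_plan

if __name__ == "__main__":
    safe_plan = {"r1": [(0, 0, 1), (0, 1, 1)], "r2": [(2, 0, 1), (2, 1, 1)]}
    risky = [(0, 0, 1), (2, 0.9, 1)]     # ends too close to r2
    updated = try_commit(safe_plan, "r1", risky)
    print("committed new plan:", updated["r1"] is risky)
```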
Informally, liveness properties refer to guarantees that a system will make progress towarda goal. In exploration, a natural statement of liveness for some system state is that themulti-robot team will eventually select an action that reduces the entropy of the map.Although liveness is not guaranteed in general, the proposed system design does admita limited liveness guarantee. This prevents scenarios such as when a small number ofconflicting robots can create a persistent state of deadlock in the entire system.
Assume that some robot finds a feasible single-robot plan with a mutual information reward of at least ε. Then, using the results and modifications described in Sec. 4.5.1, the resulting joint plan also provides a mutual information reward of at least ε which in turn corresponds to ε entropy reduction in expectation. This condition guarantees liveness in exploration with high probability (i.e. barring infinite sequences of disinformative observations) so long as some robot can find an action that reduces the entropy of the map.
The results are divided into two main parts. Section 4.6.1 does not incorporate inter-robotcollisions and addresses scalability and performance of Alg. 3 as well as the suboptimalityin terms of Theorem 3 in the intended context of submodular maximization over a partitionmatroid. Section 4.6.2 introduces inter-robot collisions and the application to a team ofaerial vehicles both in simulation and on real hardware. These experiments incorporate theapproach described in Sec. 4.5 in order to address additional challenges related to dynamicsand collision prevention.
The proposed approach is first evaluated in a simplified scenario with a team of kinematicquadrotors, disregarding dynamics and inter-robot collisions, so the discussion in Sec. 4.5does not come into play. Performance is evaluated in terms of planner objective values,computation time, and entropy reduction with respect to the map.
Remark.
Chapter 7 revisits the results from this chapter. While developing that work we found that relative transformations for control actions were being applied a second time before computing the information gain, which affects the results in this chapter for the kinematic model (but not results for the dynamic model or with physical robots). Applying transforms twice has little effect for translations (as the field of view does not change much) but a significant effect for rotations. Although we include the original results, note that fixing this error improved completion time (Sec. 7.6.2) by about 18% for that version of the exploration system and eliminated some aberrant behavior.

4.6.1a Exploration scenario

We test the exploration methodology in a confined and cluttered environment with obstacles (cubes) of various sizes and with robot positions initialized randomly near a lower edge as depicted in Fig. 4.1. The environment is bounded by a 6 m × 6 m × 6 m cube. We compare DSGA_1 through DSGA_3 (noting that DSGA_{n_d} represents Alg. 3 with n_d planning rounds) to SGA for 4, 8, 16, and 32 robots over 3000 iterations (time-steps). Tests evaluating exploration performance are each run twenty times with randomized initializations. Experiments that evaluate computation time use a specially instrumented planner and a single trial for each configuration.

We evaluate the proposed algorithm in simulation and run experiments on a computer equipped with an Intel Xeon E5-2450 CPU (2.5 GHz). The planner and other components of the system are implemented using C++ and ROS [148]. When timing the distributed planner, planning steps that would normally be performed in parallel (single-robot planning during each planning round and information propagation) are executed serially. The duration of each such step is taken as the maximum duration of the serially computed steps in order to mimic distributed computation. In practice, computing the reduction to find the minimum excess term (Alg. 3, line 9) requires an insignificant amount of time. So, although the analysis assumes a logarithmic-time parallel reduction, we compute this by iteration over all elements in the implementation.

The simulated robots emulate kinematic quadrotors moving in a three-dimensional environment with planning, mapping, and mutual information computation each performed in 3D. The single-robot Monte-Carlo tree search planner is run for a fixed number of samples (200) using a discrete set of actions consisting of translations of a fixed step in the ±x, ±y, and ±z directions or heading changes of a fixed increment.
The simulated depth cameras provide observations at a coarse resolution (12 rays along one dimension) with a field of view of approximately 43° along the long axis, which is oriented vertically to promote effective sweeping motions. For efficient evaluation of the mutual information objective, we substitute the approximation of Cauchy-Schwarz quadratic mutual information (CSQMI) described by Charrow [33] (which is not necessarily submodular) for the Shannon mutual information (3.5) and downsample rays by two in each direction. Rather than the uniform prior for cell occupancy (50%) which is commonly used in mapping, we introduce a prior of
12.5% probability of occupancy during evaluation of mutual information [179]. This encourages selection of actions that observe larger volumes of unknown space. This set of experiments is not run in real time although the next is. The planner maintains the robot states internally and triggers the camera after each iteration.
(a) Information reward, n_r = 4. (b) ψ terms, n_r = 4. (c) Information reward, n_r = 8. (d) ψ terms, n_r = 8. (e) Information reward, n_r = 16. (f) ψ terms, n_r = 16. (g) Information reward, n_r = 32. (h) ψ terms, n_r = 32. (Axes: objective and ψ in bits versus robot-iteration; legend: DSGA_1, DSGA_2, DSGA_3, SGA.)
Figure 4.2:
Objective and ψ values for varying numbers of robots and distributed planning rounds: (left) Mutual information objective values for DSGA_2 and DSGA_3 closely track SGA while the performance of DSGA_1 (which ignores inter-robot interactions) degrades with increasing numbers of robots. (right) Variations in the excess, ψ, terms correspond with the objective values: although ψ values increase with the numbers of robots, they also decrease with increasing numbers of planning rounds and remain small compared to the objective for DSGA_3. Transparent patches indicate standard error.

Alg.     n_r   Objective (bits/robot)    ψ (bits/robot)         Entropy red. rate (bits/(robot·iter.))   Total entropy red. (bits)
               avg.       std. dev.      avg.       std. dev.   avg.       std. dev.                     avg.       std. dev.
DSGA_1   16    2.22e+01   7.47e+00       1.24e+01   8.14e+00    1.64e+02   1.11e+01                      1.60e+05   1.77e+03
DSGA_2   16    2.46e+01   8.08e+00       3.04e+00   2.58e+00    1.77e+02   9.52e+00                      1.66e+05   1.42e+03
DSGA_3   16    2.52e+01   8.17e+00       9.67e-01   1.30e+00    1.78e+02   8.97e+00                      1.67e+05   1.67e+03
SGA      16    2.48e+01   7.86e+00       0.00e+00   0.00e+00    1.74e+02   7.60e+00                      1.68e+05   1.65e+03
DSGA_1   32    1.69e+01   4.64e+00       2.00e+01   1.00e+01    1.26e+02   4.60e+00                      1.61e+05   1.75e+03
DSGA_2   32    1.94e+01   4.99e+00       6.00e+00   3.59e+00    1.27e+02   5.45e+00                      1.67e+05   1.44e+03
DSGA_3   32    2.02e+01   5.30e+00       2.44e+00   1.79e+00    1.29e+02   7.60e+00                      1.69e+05   9.94e+02
SGA      32    2.03e+01   4.63e+00       0.00e+00   0.00e+00    1.31e+02   8.73e+00                      1.72e+05   6.51e+02
Table 4.1:
Exploration performance results for the kinematic exploration scenario: The reduction rateis computed with respect to the map over the first 250 robot-iterations and is representative of nominalperformance. The total entropy reduction shows the entropy reduction with respect to the map at the endof the experimental trial (3000 robot-iterations).
Figure 4.2 and Table 4.1 show results for exploration experiments comparing DSGA_1 through DSGA_3 to SGA. The excess (ψ) is largest at the beginning of each exploration run as all robots start near the same location. As the robots spread out, all planners approach approximately steady-state conditions in terms of both excess suboptimality and objective values before decaying once the environment is mostly explored. The ψ terms remain relatively large for DSGA_1, which assigns all plans in a single round and does not reason about conditional dependencies. However, the ψ terms decrease monotonically with increasing numbers of planning rounds. These values remain small for DSGA_3 and are approximately one-eighth of the objective value on average for 32 robots whereas values for DSGA_1 exceed the value of the objective for the same number of robots. Decreasing values of ψ are reflected in plots of the mutual information objective. Whereas DSGA_2 and DSGA_3 closely track SGA, objective values for DSGA_1 degrade with increasing numbers of robots.

When interpreting these results and the entropy reduction results to be shown in Sec. 4.6.1e, it is helpful to note that the scale of the environment remains constant and that increasing the number of robots serves to increase a notion of the density of the distribution of robots in the environment. In this sense, similar performance can be expected of larger teams operating in larger environments up to constraints on computation and communication. Toward this end, the next subsection evaluates computation times for DSGA and demonstrates that it scales much better than SGA in practice.

4.6.1d Computational performance

While planning performance is largely consistent, DSGA is able to take advantage of distributed computation to provide significantly improved computation times (Fig. 4.3). Computation times for SGA scale approximately linearly in our simulation experiments. The average computation time for SGA increases more than seven times from n_r = 4 to n_r = 32 robots, from
2.25 s to over 16 s. DSGA provides a 2 to 8 times speedup compared to SGA. With 32 robots, the computation time increases by only a factor of 2.5 from DSGA_1 to DSGA_3, despite tripling the number of planning rounds. These results indicate that DSGA is able to provide multi-robot coordination for large numbers of robots while accommodating requirements for online performance well after doing the same with SGA becomes intractable.

Figure 4.4 compares DSGA to SGA for various numbers of robots and is summarized in Tab. 4.1. Exploration rates early in exploration runs remain consistent across planners and degrade by approximately 43% as the numbers of robots increase due to crowding. This translates to significant improvements in the total rate of exploration even as efficiency decreases. The entropy reduction performance of DSGA_2 and DSGA_3 closely matches SGA for each number of robots. Interestingly, the performance of DSGA_1 is worst for n_r = 4 robots where the rate of entropy reduction degrades after initially performing well. For larger numbers of robots, differences in entropy reduction performance are most apparent in the total entropy reduction. This is likely a result of the planners involving inter-robot coordination being able to distribute robots across the environment more effectively. Because we use a local finite-horizon planner, DSGA_1 then occasionally fails to allocate robots to, and explore, some portions of the environment. Given a spatially global planning strategy, longer exploration times could then be expected for DSGA_1 due to the cost of traveling to these unexplored regions of the environment.

This section addresses more realistic systems and involves inter-robot collision avoidance and non-trivial dynamics in the form of teams of real and simulated quadrotors. Unlike the previous section, robots now track dynamic trajectories while planning is performed in real time. Simulation results for a team of six robots demonstrate that DSGA retains performance comparable to SGA. Experiments using three real quadrotors serve to ground the simulation results by demonstrating comparable results on a similarly configured system in practice.
(a) Table of timing data: average and standard deviation of per-iteration single-robot planning, reward propagation, and total computation times for DSGA_1, DSGA_2, DSGA_3, and SGA with varying numbers of robots. (b) Timing for SGA and DSGA (duration in seconds).
Figure 4.3:
Computational performance (seconds) in terms of total computation time (time elapsed from when the first robot starts planning until the last robot stops). (a) Time per iteration spent in the single-robot planner, propagation of the information reward (DSGA only), and total computation time. (b) Comparison of the timing differences between SGA and DSGA.
(a) Information gain per robot (n_r = 4). (b) Information gain per robot (n_r = 8). (c) Information gain per robot (n_r = 16). (d) Information gain per robot (n_r = 32). (Axes: entropy reduction in bits versus robot-iteration; legend: DSGA_1, DSGA_2, DSGA_3, SGA.)
Figure 4.4:
Entropy reduction performance with different numbers of robots and planner configurations: DSGA_2 and DSGA_3 closely track SGA in all cases while DSGA_1, which does not incorporate inter-robot coordination, performs worst for n_r = 4 and has reduced total entropy reduction for n_r = 16 and n_r = 32. Transparent patches show standard error. Gray lines indicate the maximum mean entropy reduction (data continue through 3000 robot-iterations) and a 90% threshold for task completion which facilitates approximate comparison to results in Chapter 7.

4.6.2a Implementation details

As these experiments now involve inter-robot collision constraints, we now incorporate the details of Sec. 4.5 into both the SGA and DSGA planners. Rather than discrete steps, the Monte-Carlo tree search planner now uses a library of polynomial motion primitives similar to that described by Tabib et al. [179] to provide actions. Because trajectories are executed in real time, the Monte-Carlo tree search planner is run in a fully any-time fashion at a fixed replanning rate.

The simulation trials (Fig. 4.5) evaluate a team of six robots exploring a pseudo-planar rearrangement of the environment shown in Figure 4.1 using sequential planning (SGA), myopic planning (DSGA_1), and distributed planning with three rounds (DSGA_3), with twenty trials per planner. Single-robot planning steps (Monte-Carlo tree search) were run in parallel on the same computer as in Sec. 4.6.1b.

The simulation environment and results for entropy reduction are shown in Fig. 4.5. The exploration performance is largely similar across planners except that DSGA_1 exhibits a slight delay at the start due to a tendency for robots to choose conflicting trajectories. Because robots are initialized near each other toward the center of the environment (with blue lines marking the locations where robots took off), the single-robot planner returns plans that would result in inter-robot collisions most often early in each trial. This is evident in Fig. 4.6 which plots the cumulative number of times that the multi-robot planner has had to fall back to the existing safe trajectory due to either inter-robot collisions or failure of single-robot planning as described in Sec. 4.5.2. This is most pronounced for DSGA_1 due to the lack of coordination in the planner. Using the fallback trajectories is most detrimental at the very beginning of the trial when the fallback consists of staying stationary at the starting position, causing the delay. Later, once fallback trajectories have been populated with prior planning results, resorting to them becomes less significant.

An experimental trial with a team of three quadrotors is also included as depicted in Fig. 4.7. This experiment serves to ground the simulation results described in the prior subsections on a real system. Due to the small size of the team, only SGA is used for
Figure 4.5: (left) Six robots explore a pseudo-planar environment using motion-primitive trajectories. (middle) Robots (such as the green and blue) plan informative and collision-free trajectories using DSGA despite operating in close proximity. The portion of each robot's receding-horizon plan that is currently being executed is colored in green, and the rest of the horizon is magenta. (right) Planners perform similarly in terms of entropy reduction although the initial performance of DSGA_1 is impaired as it frequently resorts to using fallback trajectories due to conflicts in planned trajectories.
Figure 4.6:
Cumulative numbers of fallback trajectories selected by the multi-robot planners in simulation results with dynamic quadrotors: The multi-robot planners resort to fallback trajectories when the single-robot planner fails or returns a plan that would result in inter-robot collisions. DSGA_1 uses fallback trajectories frequently early in trials when robots are near each other and later after most of the environment has been explored.
Figure 4.7: (left) A team of three robots explores a motion capture arena occupied by a geometric object consisting of foam boxes using Monte-Carlo tree search and motion-primitive trajectories and coordination via SGA and (middle) produces a voxel grid map of the environment (shown from a different vantage point to also detail the robot in red which explores behind the obstacle). (right) The entropy of the map decreases as the robots move through the environment.

coordination as DSGA would be expected to provide little benefit (DSGA_3 is essentially equivalent to SGA for three robots). Planning is performed offboard on a laptop with an Intel i7-5600U CPU (2.6 GHz). Otherwise, the system configuration is identical to the prior set of simulation results. Whereas the simulation results demonstrate application of the approach described in Sec. 4.5 for exploration with collision constraints and dynamic robots using a significant number of robots, these hardware results connect the implementation and its configuration to a real system.

The experiment is performed in a motion capture flight arena populated with a pile of foam blocks of various shapes and sizes. During the course of the exploration trial, the team of robots is able to successfully navigate and map the environment as evident in the trajectories and cumulative entropy reduction. The rate of entropy reduction per robot at the beginning of the trial is approximately 400 bits per second per robot, similar to the simulation results. While the simulation results demonstrate efficiency of SGA and DSGA with and without collision constraints, these experimental results demonstrate that the planner design and configuration are also applicable to exploration with a real multi-robot team and perform similarly.

The proposed distributed algorithm (DSGA) efficiently approximates sequential greedy assignment (SGA) and is appropriate for implementation on large multi-robot teams using online planning. We have applied this algorithm to the problem of multi-robot exploration and demonstrated consistent entropy reduction performance in simulation for large numbers of robots exploring and mapping a complex three-dimensional environment. The results demonstrate that DSGA is able to effectively take advantage of parallel computation without degradation in solution quality at scales where application of SGA becomes intractable. We expect that this result will be instrumental in the development of physical multi-robot systems that take advantage of distributed computation for exploration and other online informative planning problems.

In order to address realistic scenarios, we incorporated robot dynamics and inter-robot collision constraints into the problem formulation. Although collision constraints result in increased problem complexity, we were able to modify the planning approach to guarantee safety and liveness under weak assumptions. The simulation results demonstrated consistent exploration performance and indicate that collision constraints have a relatively minor impact on exploration performance.
Further, including collision constraints makes this approach viable for application to real robots, and we demonstrated this on a team of aerial robots flying in a motion capture arena.

Although DSGA can take advantage of parallel computation, the actual assignments in the subset selection step are fully sequential, and the asymptotic computation time is identical to sequential greedy assignment even though we obtained significant improvements in practice. The bound on suboptimality (Theorem 3) also only provides a post-hoc guarantee, and optimization performance could degrade for large numbers of robots (Theorem 4). Instead, the RSP methods that we introduce in the following chapters eliminate all sequential computation involving the entire multi-robot team and obtain strong suboptimality guarantees for any number of robots.

Chapter 5
Scalable and Near-Optimal Planning for Multi-Agent Sensor Coverage
Many objective functions that arise in sensor planning problems such as mutual information objectives [169] (and as in Chapter 4), objectives for sensing in hazardous environments [102], and various notions of area, set, and sensor coverage [153] are submodular. Intuitively, submodularity implies diminishing returns when constructing sets of sensing actions. This work explores multi-agent planning problems with submodular objective functions and especially variants of set and sensor coverage. We emphasize problems that feature networks of large or even unspecified numbers of agents seeking to maximize a global submodular objective, and the planners we propose obtain constant-factor performance under mild conditions such as for agents with limited sensor range and spatially local sensing actions.

Chapter 4 began to address the challenge that sequential planning scales poorly as the number of agents increases. Further, solving agents' local planning subproblems can be time-consuming on its own as the space of actions may be nearly infinite [169]. At the same time, we have already seen that dynamic environments and beliefs motivate real-time planning so that efficient multi-agent coordination is critical when scaling to large numbers of agents. We address this issue by proposing efficient distributed planners that consist of fixed numbers of sequential planning steps and approach existing constant-factor performance bounds (in expectation) when known pairwise interactions between agents are proportional to objective values.

Several recent works [75, 82] also address the core challenge of this work: design of parallel variants of sequential planners for multi-agent systems. Gharesifard and Smith [75] define a class of distributed planners based on directed acyclic graphs where agents perform greedy planning steps using only a subset of the decisions made by prior agents. Although they provide worst-case bounds on suboptimality, Grimsman et al. [82] provide tighter results for the same framework. However, both works obtain bounds that are inversely proportional to the maximum number of agents that can plan in parallel. In contrast, Chapter 4 demonstrated that such planners can be effective when agents that plan in parallel can find sets of decoupled actions. However, that approach also only provided post-hoc bounds and did not scale to arbitrary numbers of agents as some steps remained fully sequential.
Figure 5.1: (a) Area coverage; (b) inter-agent redundancy.
Consider a team of robots maximizing (a) sensor area coverage. Intuitively, distant agents may make decisions independently with no loss in optimality. We exploit such conditions to enable efficient distributed planning. Specifically, (b) inter-agent redundancy quantifies the maximum coupling between two agents. The distributed algorithm that we propose requires planning time that is independent of the number of agents and guarantees near-optimal performance when redundancy decreases monotonically subject to prior observations as is true for coverage objectives.

We continue in the direction of the previous chapter by seeking to develop efficient distributed planners that exploit problem structure. As discussed in Fig. 5.1, our approach is inspired by the intuition that decisions by distant agents are often decoupled. We model this notion with a concept of inter-agent redundancy which describes how much one agent's marginal gain can decrease as a result of ignoring another agent. Our approach requires such redundancy to decrease monotonically in the presence of actions selected by previous agents in the planning sequence. This condition is equivalent to requiring the objective to satisfy the next higher-order monotonicity property after submodularity.

Together, these properties enable use of pairwise redundancy to bound the effect of ignoring an agent at any step of the planning process which relates total redundancy to suboptimality. We propose an algorithm, Randomized Sequential Partitions (RSP), that randomly partitions agents and obtains a constant-factor bound when the optimal solution is proportional to the cumulative pairwise redundancy between all agents. This condition is generally satisfied by problems involving limited sensing range and distributions of agents with bounded density, and the approach further admits features such as local adaptation and limits on communication range. We refer to the latter case as Range-limited RSP (or R-lRSP). Finally, we prove that a generalized variant of weighted set coverage satisfies higher-order monotonicity conditions and provide simulation results for two cases, area coverage and a probabilistic detection scenario.

This chapter originally appeared in [55].

5.1 Background discussion
Consider a set function g : 2^U → R where U is the ground set. Just as for the previous chapter, the functions we study in this one are normalized, monotonically increasing, and submodular according to the definitions in Sec. 3.5.1.

We now introduce the next higher-order monotonicity property (see Sec. 3.5.1d) [70]: the objectives in this chapter are 3-increasing. Given the definition of the derivative of a set function (3.16) and for A ⊆ B ⊆ U and disjoint subsets X, Y ⊆ U \ B, 3-increasing functions satisfy

    g(X; Y | A) ≤ g(X; Y | B).    (5.1)

When expanded and negated, this expression takes the form g(X | B) − g(X | B, Y) ≤ g(X | A) − g(X | A, Y). Given that A ⊆ B, this can be interpreted as stating that conditioning reduces redundancy, and expressions of the form −g(A; C) = g(A) − g(A | C) will be referred to as expressing the pairwise redundancy between A and C. Such higher-order monotonicity properties have not been used extensively in the literature on optimization of submodular functions although a few works study the same and similar properties [44, 70, 106, 149].

Previously, we referred to the 3-increasing condition as supermodularity of conditioning [55]. Wang et al. [191] also refer to the combination of this condition with submodularity as "strong submodularity."

Weighted set cover is an example of a submodular function that is 3-increasing, and such objectives have been studied extensively and used to prove hardness results for submodular maximization [66] and tightness results for distributed planning [82]. Because existing results already use functions that are 3-increasing, requiring this condition does not impact hardness of an optimization problem. At the same time, some common submodular objectives are not necessarily 3-increasing. Figure 5.2 describes one such scenario for a submodular mutual information objective.

Recall from Sec. 3.5.2 that partition matroids describe multi-agent problems where the joint action space is a product of local action spaces for each agent. The ground set of the partition matroid (U, I) is partitioned by a set of blocks {B_1, . . . , B_n}, and the admissible sets of actions are I = {X ⊆ U | |X ∩ B_i| ≤ ℓ_i} for ℓ_i ≥ 1. Consider a multi-agent planning problem with agents A = {1, . . . , n_a} where each agent i ∈ A is associated with a set of actions B_i which is also a block of the partition matroid. Agents may select at most one action so that |X ∩ B_i| ≤ 1 for admissible sets of actions X ∈ I.

Figure 5.2: This illustration depicts an example of a submodular objective that is not 3-increasing and generalizes common examples for when mutual information increases under conditioning [58]. In this example, boxes in the set C = {1, . . . , n} are colored green or gold independently and with equal probability. Sensors may observe either end directly, {Y_1, Y_n}, or differences in colors of adjacent boxes, {Y_{1\2}, . . . , Y_{n−1\n}}, to obtain a submodular mutual information reward g(Y) = I(C; Y) where Y is a set of such observations. Each individual observation provides one bit of information, and pairs obtain two bits and have no redundancy. This trend only changes once we consider all of the observations together: we can determine the color of the last box in the sequence by observing the first box and each subsequent change in color. This violates the 3-increasing condition (5.1) as the addition of the last observation signifies an increase in redundancy and equivalently a decrease in the second derivative of the objective. That is, g(Y_1; Y_n) = 0 but g(Y_1; Y_n | Y_{1\2}, . . . , Y_{n−1\n}) = −1.
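To make the failure mode in Fig. 5.2 concrete, the following short Python sketch (not part of the original text; the function names are illustrative) enumerates the 2^n equally likely colorings and evaluates g(Y) = I(C; Y) = H(Y), since the observations are noiseless functions of the coloring. It reproduces the values quoted in the caption: the second derivative g(Y_1; Y_n) is zero without conditioning and drops to −1 once all of the difference observations are included.

    import itertools
    from math import log2
    from collections import Counter

    def g(observations, n):
        # g(S) = I(C; S) = H(S): entropy of the joint observation values over the
        # 2^n equally likely colorings (observations are deterministic given C).
        counts = Counter()
        for colors in itertools.product((0, 1), repeat=n):
            counts[tuple(f(colors) for f in observations)] += 1
        total = 2 ** n
        return -sum(c / total * log2(c / total) for c in counts.values())

    def second_derivative(A, B, C, n):
        # g(A; B | C) = g(A + B + C) - g(A + C) - g(B + C) + g(C)
        return g(A + B + C, n) - g(A + C, n) - g(B + C, n) + g(C, n)

    n = 5
    y_first = [lambda c: c[0]]    # observe the color of box 1
    y_last = [lambda c: c[-1]]    # observe the color of box n
    diffs = [lambda c, i=i: c[i] != c[i + 1] for i in range(n - 1)]  # adjacent differences

    print(second_derivative(y_first, y_last, [], n))     # 0.0: no redundancy
    print(second_derivative(y_first, y_last, diffs, n))  # -1.0: redundancy appears under conditioning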
Further, the agents are engaged in a sensing task with an objective g that satisfies the conditions outlined in Sec. 5.1.1 and so seek to solve

    X* ∈ arg max_{X ∈ I} g(X).    (5.2)

Previously, Sec. 3.5.3c described the local greedy heuristic (sequential maximization, Alg. 2) which obtains approximate solutions by recursively applying a greedy maximization step. Further, solutions via this approach are guaranteed to be within half of optimal.

Sequential planning (Alg. 2) on a large network of agents is time-consuming as each agent must wait to receive the incremental solution from the previous agents before beginning computation. Gharesifard and Smith [75] propose a related class of planners where agents may ignore the decisions of previous agents according to a directed acyclic graph. Rather than planning with respect to all prior decisions as in Alg. 2, these planners obtain the solution X^d = {x^d_1, . . . , x^d_{n_a}} by evaluating

    x^d_i ∈ arg max_{x ∈ B_i} g(x | X^d_{N_i})    (5.3)

using incremental solutions from N_i ⊆ {1, . . . , i − 1}, the set of in-neighbors of agent i in the directed acyclic graph. This model can be used to design distributed planners where distant agents do not communicate or where subsets of agents execute their planning steps in parallel. Prior works studying such planners examine worst-case behavior for objectives that are submodular and monotonic [75, 82]. These fail to obtain constant-factor suboptimality when given only a fixed number of sequential planning steps but an arbitrary number of agents. Instead, this work examines sufficient conditions for a distributed planner with a fixed number of sequential planning steps to approach the performance of a sequential planner (half of optimal, see Sec. 3.5.3c). We begin by analyzing (5.3) based on the redundancy of sensing actions between pairs of agents.

The performance of the distributed planner will be analyzed by bounding decreases in marginal gains due to failure to condition on choices by prior agents, i.e., g(x^d_i | X^d_{N_i}) − g(x^d_i | X^d_{1:i−1}).
The 3-increasing condition enables derivation of bounds on such changes in marginal gains using pairwise redundancies between elements. Define the inter-agent redundancy graph as a weighted, undirected graph G = (A, E, W) with agents as vertices, edges E = {(i, j) | i, j ∈ A, i ≠ j}, and weights

    W(i, j) = w_ij = max_{x_i ∈ B_i, x_j ∈ B_j} −g(x_i; x_j).    (5.4)

This connects the notion of redundancy to the multi-agent planning problem via maximum (and undirected) inter-agent redundancies.
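As a concrete reading of (5.4), the sketch below (illustrative Python, not from the original text) computes the weights of the inter-agent redundancy graph from a black-box objective evaluated on small sets; for a normalized g, −g(x_i; x_j) = g({x_i}) + g({x_j}) − g({x_i, x_j}).

    from itertools import combinations

    def redundancy_weights(g, blocks):
        # w_ij = max over x_i in B_i and x_j in B_j of -g(x_i; x_j)   (eq. 5.4), where
        # g maps a frozenset of actions to its (normalized) objective value and
        # blocks maps each agent to its list of candidate actions.
        weights = {}
        for i, j in combinations(sorted(blocks), 2):
            weights[(i, j)] = max(
                g(frozenset([xi])) + g(frozenset([xj])) - g(frozenset([xi, xj]))
                for xi in blocks[i] for xj in blocks[j]
            )
        return weights

These weights feed directly into the α-redundancy condition and the planner designs that follow.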
Decreases in marginal gains can be bounded using the pairwise redundancies from the inter-agent redundancy graph using the following lemma.

Lemma 5 (Pairwise redundancy bound). Consider disjoint subsets A, B, C ⊆ U. Then

    g(A | B, C) − g(A | C) ≥ Σ_{b ∈ B} g(A; b).    (5.5)
Proof. Lemma 5 follows from the chain rule (Lemma 1) and g being 3-increasing. Given an ordering B = {b_1, . . . , b_{|B|}}, construct a telescoping sum

    g(A | B, C) − g(A | C) = g(B | A, C) − g(B | C)
                           = Σ_{i=1}^{|B|} ( g(b_i | B_{1:i−1}, A, C) − g(b_i | B_{1:i−1}, C) )
                           = Σ_{i=1}^{|B|} g(b_i; A | B_{1:i−1}, C).

Then, (5.5) follows because g is 3-increasing (5.1). Finally, note that, although the expression for (5.5) is tailored to its usage, the left-hand side has the form of a second derivative of g so that g(A; B | C) = g(A | B, C) − g(A | C). ∎

We say that an instance of (5.2) is α-redundant for α > 0 if

    α g(X*) ≥ Σ_{(i,j) ∈ E} w_ij,    (5.6)

where, the graph being undirected, (i, j) and (j, i) denote the same edge. Instances of (5.2) with finite objective values and numbers of agents are all α-redundant for some α although specific values are not guaranteed in general. We will use α-redundancy to absorb additive terms proportional to graph weights into constant-factor multiplicative bounds in terms of α.
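In practice, a valid α for (5.6) can be certified without knowing X*: any lower bound on g(X*), such as the value of a greedy solution, yields a conservative certificate. A minimal sketch (hypothetical argument names):

    def alpha_redundancy(weights, objective_lower_bound):
        # Any alpha >= (sum of edge weights) / g(X*) satisfies eq. (5.6); substituting a
        # lower bound on g(X*) only increases the returned value, so it remains valid.
        return sum(weights.values()) / objective_lower_bound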
The inter-agent redundancy graph defined in the previous section can be applied in the analysis of distributed planners (5.3) using a similar approach as in the previous chapter (Chapter 4). Let N̂_i = {1, . . . , i − 1} \ N_i be the set of preceding agents that are ignored at step i of the assignment process according to (5.3) so that the set of all deleted edges is Ẽ = {(i, j) | i ∈ A, j ∈ N̂_i}. Then the planner suboptimality can be bounded in terms of the weights of these deleted edges.

Theorem 6 (Suboptimality of distributed planning). The suboptimality of a planner obeying (5.3) can be bounded using the cumulative weight of deleted edges as

    g(X*) ≤ 2 g(X^d) + Σ_{(i,j) ∈ Ẽ} w_ij.    (5.7)

Proof. Theorem 6 follows from application of pairwise redundancy on the inter-agent redundancy graph to the standard proof technique for sequential maximization,

    g(X*) ≤ g(X*, X^d)
          = g(X^d) + Σ_{i=1}^{n_a} g(x*_i | X^d, X*_{1:i−1})
          ≤ g(X^d) + Σ_{i=1}^{n_a} g(x*_i | X^d_{N_i})
          ≤ g(X^d) + Σ_{i=1}^{n_a} g(x^d_i | X^d_{N_i})
          = 2 g(X^d) + Σ_{i=1}^{n_a} ( g(x^d_i | X^d_{N_i}) − g(x^d_i | X^d_{1:i−1}) ).    (5.8)

The first equality is a telescoping sum. The inequalities follow first from g being monotonically increasing, second from submodularity, and third by greedy choice according to (5.3). The last equality is another telescoping sum. The main result (5.7) follows from application of (5.5) and (5.4) to the sum in (5.8) and the definition of Ẽ. ∎

5.5 Randomized distributed planners

Let us now apply the analysis from the previous section to design of randomized distributed planners. A set of agents can execute planning steps in parallel if no pair of agents in the set shares an edge in the planner model (5.3). Considering this, we construct distributed planners by partitioning the agents and eliminating edges within blocks of the partition and then bound suboptimality for randomly assigned partitions. Finally, we present conditions for such planners to scale to an arbitrary number of agents and to admit features such as limited communication range.
Consider a planner where subsets of agents plan in parallel according to a partition {D_1, . . . , D_{n_d}} of agents A = ∪_{i=1}^{n_d} D_i. In such a planner, n_d corresponds to the maximum number of sequential planning steps. Let d_i map each agent i to its block in the partition so that i ∈ D_{d_i}, and let the total ordering of agents respect a partial ordering induced by ordering the blocks of the partition so that i < j implies d_i ≤ d_j. We construct a planner (5.3) from the partition and ordering of agents and blocks by eliminating neighbors that share the same block from the complete directed acyclic graph:

    N_i = {1, . . . , i − 1} \ D_{d_i},    N̂_i = {1, . . . , i − 1} ∩ D_{d_i}.    (5.9)

Ideally, the partition would minimize the cumulative weight of edges eliminated in the subgraphs of the blocks. However, that is equivalent to maximizing the weight of edges outside of the subgraphs, which is the Max k-Cut problem on the inter-agent redundancy graph. Finding exact solutions is intractable because Max k-Cut is NP-Complete [105]. Therefore, the next section proposes randomized approaches that produce approximate solutions.

As observed by Andersson [3], a random partition obtains a trivial (n_d − 1)/n_d approximation in expectation by observing that edges are removed uniformly at random. The approach presented here is similar and is presented from the perspective of individual agents. Consider a distributed planner as defined by (5.3) where agents share partial solutions with neighbors given a partition of the agents as in (5.9). Let each agent select its partition index d_i independently and uniformly at random from {1, . . . , k_i} so that n_d = max_{i ∈ A} k_i. We refer to such planners as implementing Randomized Sequential Partitions (RSP_{n_d}) as this approach partitions agents into n_d groups which plan sequentially over the same number of sequential steps.
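The following Python sketch (illustrative, with hypothetical helper names) implements this randomized construction for a fixed number of planning steps n_d: each agent draws its partition index uniformly at random, and the blocks then plan in sequence while agents within a block plan in parallel using only decisions published by earlier blocks.

    import random

    def rsp_plan(agents, blocks, marginal_gain, n_d, rng=random):
        # Randomized Sequential Partitions (RSP): blocks of the random partition plan in
        # sequence; agents sharing a block ignore each other, corresponding to (5.9).
        d = {i: rng.randrange(n_d) for i in agents}     # uniform partition index per agent
        selected = {}                                   # agent -> chosen action
        for step in range(n_d):                         # n_d sequential planning rounds
            prior = set(selected.values())              # decisions from earlier blocks only
            decisions = {
                i: max(blocks[i], key=lambda x: marginal_gain(x, prior))
                for i in agents if d[i] == step         # these agents could run in parallel
            }
            selected.update(decisions)                  # publish this block's decisions
        return selected

Here marginal_gain(x, prior) stands in for g(x | prior); the adaptive variants described next instead derive each agent's number of partition indices k_i from the redundancy weights.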
We consider two policies for selection of k_i based on the weights of the redundancy graph (5.4) and a per-agent budget for additive suboptimality γ > 0. For global adaptive planners, agents i ∈ A select from a fixed number of partition indices proportional to the total redundancy so that

    k_i = n_d = (1 / (n_a γ)) Σ_{(i,j) ∈ E} w_ij.    (5.10)

With local adaptive planners, k_i is proportional instead to the cumulative redundancy for that agent, which may be large compared to other agents but involves less global knowledge:

    k_i = (1 / (2γ)) Σ_{j ∈ A\{i}} w_ij.    (5.11)

Note that the factor of two in (5.11) arises because the sum in (5.11) counts each edge twice unlike (5.10). Both local and global planners respect the following bound.

Theorem 7 (Suboptimality of RSP planning). Given a budget γ > 0 for per-agent suboptimality and a planner defined according to (5.3) which partitions agents according to (5.9) by drawing partition indices d_i uniformly from {1, . . . , k_i} using (5.10) or (5.11), suboptimality is bounded in expectation as

    g(X*) ≤ 2 E[g(X^d)] + n_a γ.    (5.12)
Proof. The expectation of the cumulative weight of deleted edges for either planner is bounded as

    E[ Σ_{(i,j) ∈ Ẽ} w_ij ] = (1/2) Σ_{i=1}^{n_a} E[ Σ_{j ∈ D_{d_i}\{i}} w_ij ] ≤ Σ_{i=1}^{n_a} (1 / (2 k_i)) Σ_{j ∈ A\{i}} w_ij    (5.13)

where the inequality accounts for when d_j > k_i. That is, when another agent j selects a partition index d_j > k_i outside of the set considered by i, the corresponding edge cannot be deleted from the perspective of agent i. For global adaptive planners, k_i = n_d for all i, and (5.13) simplifies to (1/n_d) Σ_{(i,j) ∈ E} w_ij and holds with equality. Then (5.12) follows by applying (5.10) or (5.11) and substituting into the expectation of (5.7) over partitions of agents. ∎

Restating in terms of α-redundancy provides a stronger statement that is useful when varying the number of agents.

Corollary 7.1 (Constant-factor suboptimality). Problems with fixed α-redundancy satisfy the constant-factor bound

    (1/2 − ε) g(X*) ≤ E[g(X^d)]    (5.14)

for ε > 0 and a budget of

    γ = (2ε / (α n_a)) Σ_{(i,j) ∈ E} w_ij    (5.15)

by substituting (5.15) into (5.12), applying (5.6), and rearranging.

Corollary 7.2 (Fixed n_d for global planning). Given fixed α-redundancy, global adaptive planners (5.10) provide constant-factor suboptimality for n_d = ⌈α/(2ε)⌉ sequential planning steps, which follows by rearranging (5.15) to match (5.10).

In this section, we present sufficient conditions to preserve these guarantees when increasing the number of agents. These conditions correspond intuitively to scenarios where agents have access to local actions and where the environment volume and rewards scale with the number of agents. We say problems (5.2) with n_a agents exhibit β-linear scaling for β > 0 if

    g(X*) ≥ β n_a,    (5.16)

which expresses the condition that rewards scale with the number of agents. In order to express the relationship between inter-agent distances (or the distribution of agents) and inter-agent redundancy, define a function of inter-agent distance r : R_{≥0} → R_{≥0} so that

    w_ij ≤ r(||p_i − p_j||)    (5.17)

where ||·|| is some norm and p_i, p_j ∈ R^d are appropriately defined agent positions associated with the blocks of the partition matroid. The following theorem identifies sufficient conditions for problems to have finite α on average and in turn to satisfy Theorem 7 and corollaries, which implies a constant expected number of sequential steps (n_d) for the global planner design (5.10) and any number of agents.

Theorem 8 (Finite average redundancy). Consider a class of problems (5.2) with a distribution of agents in R^d with finite density of at most ρ that satisfies linear scaling (5.16) and has redundancy bounded in terms of inter-agent distance (5.17) for fixed β and r. If ∫_0^∞ r(s) s^{d−1} ds is finite, the average value of α, interpreted as a random variable, is also finite.

Proof. Let ρ be an upper bound on the marginal density of agents in R^d, and let A_d be the surface area of the unit sphere under the chosen norm. Because the distribution of agents has fixed maximum density ρ, we may obtain results for arbitrarily large numbers of agents by taking the limit as this distribution of agents covers the entire Euclidean space. By applying (5.17) and integrating over spheres centered on p_i for an arbitrarily large environment, the expected redundancy for a given agent is at most r̂ = ρ A_d ∫_0^∞ r(s) s^{d−1} ds ≥ E[ Σ_{j ∈ A\{i}} w_ij ], which is proportional to the integral in Theorem 8.
Treating α as a random variable so that (5.6) is tight and applying (5.16), we get E[α] ≤ E[ Σ_{(i,j) ∈ E} w_ij / g(X*) ] ≤ r̂/β, which is finite if r̂ is also finite given that β > 0. ∎

5.5.4 Limited communication range

Similar analysis can be used to analyze or design limits on communication range. Let r_c be the maximum communication range so that the set of agents that are in range is B_i = {j | r_c > ||p_i − p_j||}. Then, any existing communications graph can be readily modified by intersecting the set of in-neighbors with the set of agents that are in range to obtain a new set of neighbors N̂_i = B_i ∩ N_i. Doing so significantly reduces messaging: instead of each robot sending a number of messages proportional to the total number of robots, range limits reduce the number of messages per robot to the average size of B. To emphasize this difference, we refer to this approach as Range-limited RSP (or R-lRSP).

To account for ignoring agents past a given range, applying (5.17) to Theorem 6 demonstrates that each agent incurs at most

    Σ_{j ∈ A\({i} ∪ B_i)} w_ij ≤ Σ_{j ∈ A\({i} ∪ B_i)} r(||p_i − p_j||)    (5.18)

additional suboptimality (i.e., increase to γ in (5.12)). Then, as in Sec. 5.5.3, the expectation of (5.18) over the distribution of agents is upper-bounded by ρ A_d ∫_{r_c}^∞ r(s) s^{d−1} ds. This reduces the problem of limited communication range to a question of whether the additional suboptimality is acceptable and whether techniques such as multi-hop communication are necessary to extend the communication range. Note that even though the communication range can be designed to incur arbitrarily little additional suboptimality, limiting the communication range is not sufficient to reduce the number of sequential planning steps as those agents within range of any one agent may, in turn, depend on agents that are out of range of that agent and so on.
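A minimal sketch of the corresponding neighbor filter (assuming Euclidean positions; names are illustrative) simply intersects an agent's in-neighbors with the agents inside the communication range r_c:

    from math import dist

    def range_limited_neighbors(in_neighbors, positions, i, r_c):
        # Range-limited RSP (R-lRSP): keep only in-neighbors within communication range,
        # accepting the bounded additional suboptimality of eq. (5.18).
        return {j for j in in_neighbors if dist(positions[i], positions[j]) < r_c}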
1. The expected value of identified events given a set ofsensing actions X ⊆ U is then g ( X ) = (cid:88) e ∈ E (cid:18) − (cid:89) x ∈ X p ex (cid:19) v e (5.19)and is equivalent to the well-known weighted set cover objective in the deterministic case( p ex ∈ { , } ). 78he probabilistic sensor coverage objective is monotonically increasing, submodular,and 3-increasing and even satisfies alternating monotonicity conditions on higher deriva-tives [9, 44]. These conditions follow inductively by demonstrating that differences aresimilar in form to the original function. We note that similar results exist in the litera-ture for coverage Wang et al. [191, Section 4.1] and related generalizations thereof [157,Theorem 13].
Theorem 9 (Probabilistic coverage is monotonic, submodular, and 3-increasing). Coverage with sensor failure (5.19) and, by extension, weighted set cover satisfy alternating monotonicity conditions and are i-increasing and i-decreasing respectively for odd and even i. As such, each is monotonic, submodular, and 3-increasing.

Proof. The probabilistic coverage objective (5.19) can be written in the form

    g(X) = a − Σ_{e ∈ E} Π_{x ∈ X\A} p_ex v̂_e    (5.20)

for a ∈ R, A ⊆ U, v̂_e ∈ R_{≥0}, and 0 ≤ p_ex ≤ 1 for all e ∈ E and x ∈ U. Observe that g is monotonic due to the form of the product and because the terms of the product are probabilities. Recall the recursive definition of the derivative (3.16), and consider the derivative g(Y | X) in the direction Y ⊆ U evaluated at X ⊆ U \ Y, which has the form

    g(Y | X) = g(Y, X) − g(X)
             = −Σ_{e ∈ E} Π_{j ∈ (X∪Y)\A} p_ej v̂_e + Σ_{e ∈ E} Π_{k ∈ X\A} p_ek v̂_e
             = Σ_{e ∈ E} ( 1 − Π_{j ∈ Y\A} p_ej ) Π_{k ∈ X\A} p_ek v̂_e.    (5.21)

Because −g(Y | ·) has the same form as g, this first derivative g(Y | ·) is decreasing; that is, g is submodular. Applying the recursive definition of the derivative (3.16) to g(Y | ·) then produces the second derivative which is, in turn, a monotonically increasing function with the same form as g. Therefore, g is 3-increasing (5.1). Moreover, by induction, g also satisfies alternating monotonicity conditions on higher derivatives (Sec. 3.5.1d). Finally, probabilistic coverage (5.19), having the same form, satisfies the same monotonicity conditions. ∎

The proposed distributed planning approach is evaluated through two sets of simulation experiments, each using a variant of the objective function analyzed in Sec. 5.6. The first evaluates the performance of distributed planners (RSP) that use various numbers of sequential planning steps (agent partition size n_d) in an area coverage task. The second set of experiments evaluates adaptive planning and limits on communication range (RSP and R-lRSP) in a more complex problem with spatially varying rewards and probabilistic sensing.

Table 5.1: Agent (r_a) and sensor (r_s) radii as a function of the number of agents (n_a = 50 in this chapter).
Several aspects of experiment design are kept constant in each scenario. Each scenario is evaluated in 50 random trials and features a large number of agents (n_a = 50) so that the proposed distributed planners utilize many times fewer sequential planning steps than a standard sequential planner (Alg. 2). Agent positions are distributed uniformly at random over the unit square, and each agent has a choice of 10 available sensing actions (B_i) which are sampled from a uniform distribution over a disk with radius r_a centered on the agent position. Although the two sets of simulation experiments do not use the same sensor model, each is a function of sensor radius r_s. The sensor and agent radii used in each experiment are listed in Tab. 5.1. In each case, the objective is designed to take on values no greater than one. (Unless otherwise specified, each trial features both random planners and scenarios according to their respective designs. Agent and sensor radii are set according to a normalization over the number of agents and by using parameter search to minimize the ratio of the average performance of myopic and sequential planning to identify hard problem cases.)

The reward for the area coverage task is the area of the union of disks, each with radius r_s, intersected with the unit square. In terms of the sensor model defined in Sec. 5.6, this is equivalent to having a failure probability of one outside the disk and zero inside. An example of one simulation trial (using parameters tuned for visualization purposes) is depicted in Fig. 5.3. The experiments compare distributed planning with RSP and fixed partition sizes, k_i = n_d ∈ {2, 4, 8}, to sequential planning (Alg. 2) and two naive planners: completely random action selection and myopic maximization of the objective over the local space of sensing actions (wherein agents ignore each other entirely, equivalent to n_d = 1). Figure 5.4 shows the results of these experiments.

Figure 5.3: This figure depicts a maximum area coverage problem and a sequential solution. Agents each have a unique color and are distributed uniformly throughout the environment. Sensing actions (∗) are in turn distributed uniformly within agent radii (dashed lines). The solution consists of one selected action (⋆) for each agent; these actions are centered on the translucent disks which make up the covered area.

Our RSP planners perform well, although the performance bounds (Theorem 7) are technically degenerate for these instances because the deleted edge weight generally exceeds the maximum possible objective value (one unit of area), which is evident from the cumulative weights of the inter-agent redundancy graph. (Note that the bound ceases to be degenerate for larger values of n_d and that all results for suboptimality and scaling still apply.) However, the trend in performance is similar to what would be expected for increasing n_d as the objective values of our RSP planners approach the performance of sequential planning approximately with 1/n_d: the difference in area coverage compared to sequential planning decreases by approximately half each time n_d is doubled and by 9.9 times from n_d = 1 to n_d = 8. Overall, the performance of the distributed RSP planner represents a significant improvement over sequential planning. Given the number of agents, even the greatest value of n_d provides a 6-times improvement in the number of sequential planning steps. Then, by the scalability analysis in Sec. 5.5.3, similar performance can be expected for larger problems given similar densities of agents.
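As an illustration of this setup (a sketch only: the radii defaults below are placeholders apart from the 0.113 sensor radius listed for n_a = 50 in Tab. 5.1, and the Monte-Carlo area estimate stands in for an exact geometric computation), instances and the area objective can be generated and combined with the rsp_plan sketch from Sec. 5.5 as follows:

    import random
    from math import hypot

    def sample_instance(n_a=50, n_actions=10, r_a=0.2, rng=random):
        # Agents uniform on the unit square; candidate sensing actions uniform on a disk
        # of radius r_a about each agent (r_a here is an illustrative placeholder).
        positions = {i: (rng.random(), rng.random()) for i in range(n_a)}
        blocks = {}
        for i, (px, py) in positions.items():
            actions = []
            while len(actions) < n_actions:            # rejection-sample points in the disk
                dx, dy = rng.uniform(-r_a, r_a), rng.uniform(-r_a, r_a)
                if hypot(dx, dy) <= r_a:
                    actions.append((px + dx, py + dy))
            blocks[i] = actions
        return positions, blocks

    def area_coverage(selected, r_s=0.113, samples=20000, rng=random):
        # Monte-Carlo estimate of the union-of-disks area intersected with the unit square.
        hits = 0
        for _ in range(samples):
            px, py = rng.random(), rng.random()
            if any(hypot(px - cx, py - cy) <= r_s for cx, cy in selected):
                hits += 1
        return hits / samples

    positions, blocks = sample_instance()
    plan = rsp_plan(list(blocks), blocks,
                    lambda x, prior: area_coverage(prior | {x}) - area_coverage(prior),
                    n_d=4)   # a fixed sample set or exact geometry would reduce evaluation noise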
Figure 5.4:
Results for the area coverage problem (Fig. 5.3): (a) Objective values for myopic and random planning (no coordination), variations of the distributed RSP planner with n_d sequential planning steps, and fully sequential planning (intractable for large numbers of agents). The performance of the proposed distributed planner approaches sequential planning given many times fewer sequential planning steps. (b) Redundancies are computed for each pair of agents. Sensing actions (disks) for distant agents cannot overlap, resulting in many zero-weighted edges, and remaining edges are distributed according to varying degrees of overlap in potential sensing actions. (c) The total weight of the redundancy graph is largely between 13 and 17. Even though the planners perform well, our suboptimality bounds would no longer be meaningful as deleted edge weights would exceed maximum objective values.

The goal of the probabilistic sensing task is to maximize the value of correctly identified events (e.g., correct classification of objects moving through the environment). For each trial, n_e = 50 events, each worth a value of 1/n_e, are sampled from a fixed Gaussian mixture, rejecting samples outside the unit square. An example is shown in Fig. 5.5, although using agent parameters more appropriate for visualization purposes but the same Gaussian mixture and number of events. This results in a spatially varying distribution of reward and redundancy. The success probability of the sensor model is e^{−x⁴/r_s⁴}, where x is the distance from sensor to event location. This success probability effectively amounts to area coverage with soft edges.

This set of experiments evaluates adaptive planning and limited communication range. The budget for deleted edge weight per agent for the local and global planners is set to γ = 0.4/n_a = 8·10^{−3}. Range-limited RSP planners are obtained by deleting edges from the respective instances of the distributed planners with a communication range of r_c = 2r_a, which allows for a small amount of redundancy at the limits of the sensor range r_s. Random planning is not included in this set of experiments because it vastly under-performs myopic planning as the problem design ensures that a large fraction of sensing actions provide little value.

Fig. 5.6 shows the results of these experiments. Local and global adaptation each perform almost identically in terms of distributions of objective values, with distributions only slightly below that of sequential planning. Because the objective and actions are highly local, enforcing limits on the communication range has little impact on the planner performance in terms of either objective value or cumulative weight of deleted edges. The global adaptive RSP planner obtains consistent partition sizes by averaging over all agents. In contrast, with the local planners, agents sometimes select from as many as 33 planning steps. Sec. 5.5.3 provides some discussion of mitigation strategies to avoid such long planning times. More generally, mixing the extremes of the local and global RSP planning may be desirable and avoid computing averages over all agents.
Figure 5.5: (a) Scenario; (b) planner result.
This figure shows an example of a probabilistic sensing scenario and a sequential solution. Parameters for the scenario are identical to the experimental trials, but the parameters for the agents have been tuned for purposes of visualization. The goal of this task is to maximize the expected number of successful detections or identifications. (a) For each trial, events (x) are sampled from a mixture of Gaussians and are identified correctly with some probability dependent on the sensing actions. (b) Agents, each shown in a different color, are distributed uniformly throughout the environment. Sensing actions (∗) are distributed uniformly within the agent radius (dashed lines), and each agent selects a single sensing action (⋆) and successfully identifies events according to a soft-coverage sensing model. The resulting identification probability given selected actions is shown in the background; identification probability is high (yellow) near selected sensing actions and, for good selections, near the events.
Figure 5.6:
Results for the probabilistic sensing problem (Fig. 5.5): (a) The proposed local and global adaptive RSP planners along with their Range-limited RSP counterparts outperform myopic planning and approach the performance of the fully sequential planner (intractable for large numbers of agents) in terms of objective values. (b) The global adaptive planner uses 4 to 10 partitions in all trials while (c) the local adaptive planner occasionally uses local partition sizes exceeding 20 for agents with high redundancy. (d) The cumulative weight of deleted edges is on the same order as the objective value and is below the desired limit of 0.4. Range-limited planners are obtained by deleting edges from the associated adaptive planner at the cost of relatively little additional deleted edge weight.

Efficiently solving submodular maximization problems on sensor networks is challenging due to the inherent sequential structure of common planning strategies. Whereas prior works [75, 82] have shown that worst-case performance degrades rapidly when reducing the number of sequential planning steps, we show that constant-factor performance approaching that of the standard sequential algorithm can be obtained via randomized planning (RSP) so long as cumulative redundancy between agents is at most proportional to the objective values. Toward this end, the inter-agent redundancy graph expresses the degree of coupling between agents in the submodular maximization problem, and functions that are 3-increasing admit performance bounds in terms of this graph structure. Further, we have demonstrated that the bounds we obtain with this approach are readily applicable for planner design such as for adapting the number of sequential planning steps or via range limits on communication.

Later chapters will apply these results to the design of online anytime receding-horizon planners for target tracking tasks and exploration (as in Chapter 4). Likewise, this same approach is applicable to other multi-robot sensing tasks which the reader might encounter. Ultimately, planning in real time will also require increased attention to timing for planning steps such as reasoning about the impact of available planning time on anytime planning performance.

Additionally, we have noted that submodular objectives such as mutual information are not necessarily 3-increasing. Identifying when objective functions are 3-increasing (exactly or approximately) is central to broad application of the results in this chapter, and we will find ways to obtain similar results for mutual information objectives for both target tracking and exploration.

Finally, high-order monotonicity conditions have been useful in other problems involving submodular objectives [84, 106, 157, 191], and developing these applications would be an interesting direction for future study.

Chapter 6
Receding-Horizon Planning for Target Tracking with Sums of Submodular Functions
Planners for robotics and sensor planning problems frequently introduce notions of locality, coupling, or redundancy amongst robots, sensors, or locations [26, 108]. Chapter 5 introduced one such notion of redundancy using properties of 3-increasing functions in order to design scalable planners. In this chapter, we expand on the idea of redundancy as a tool for planner design by analyzing a class of factored mutual information objectives that are relevant to target tracking and form sums of submodular functions.

Chapter 5 provided performance bounds for a slight generalization of weighted set coverage but also presented a counterexample for mutual information with conditionally independent observations (i.e., for typical submodular mutual information objectives, see Sec. 3.5.1a). Mutual information is particularly important for perception and control [37, 103, 163] and is the focus of Chapter 4. As such, generalizing to a variety of objective functions, as we will begin to do here, is an important part of this thesis.

Additionally, this chapter applies the RSP and R-lRSP planners from the previous chapter to produce anytime, receding-horizon planners and includes additional analysis for this setting.
In target tracking problems, robots seek to observe a number of discrete objects, targets, whose states may evolve in time, such as for surveillance, monitoring wildlife [52], and intercepting rogue UAVs [165]. When a large quantity of targets are spread over more space than a single robot can cover, deploying teams of robots can improve tracking performance.

Even simple target tracking problems, such as with noisy range sensors, produce multi-modal posterior distributions that do not have closed-form solutions; planning and tracking systems may then approximate both the posterior and sensing utility [34, 35]. Realistic environments can also induce complex motion models which arise in search on road networks [145] and in indoor environments [89]. This motivates selection of sufficiently general objectives and path planners to capture the degree of complexity in these problems. To address this, we provide analysis that applies to general problems and even adversarial problem selection so long as the targets' dynamics do not depend on the tracking robots. (The latter condition excludes pursuit-evasion problems [49, 165].)

Systems for target tracking in multi-robot settings often rely on greedy algorithms [182, 202] for submodular maximization. However, existing results and those in Chapter 5 that seek to develop scalable planners that are relevant to target tracking problems are also limited to coverage-like objectives [177]. (Fig. 5.2 describes a mutual information objective that violates requirements on "coverage-like" objectives.) Analysis for sequential planners also typically assumes individual robots obtain either exact [7, 69] solutions or solutions within a constant factor of optimal [102, 169]. Although that analysis is appropriate for single-robot planners with guarantees on solution quality [41, 102, 169], assuming constant-factor suboptimality is less suited for approximate, anytime, and sampling-based planning [7, 90, 116], which we will address in this chapter.
This chapter presents a distributed planning algorithm, analysis that accounts for common algorithmic approximations, and design of such a planner along with simulation results.
The analysis in this chapter demonstrates that RSP planners are applicable to target tracking problems by providing guarantees on solution quality in terms of pairwise redundancy between robots' actions via an extension to sums of submodular functions. This extends our prior results for coverage-like objectives (Chapter 5) to include more general objectives such as for mutual information which can represent information gain with respect to targets that have complex states and dynamics.

We also account for the contributions of common sources of suboptimality (approximation of the objective and suboptimal single-robot planning). Our results affirm that methods for submodular maximization are applicable for target tracking in the presence of approximate objective values and anytime planners that may occasionally produce poor results or fail.
Finally, we apply these results to develop a planner for multi-robot multi-target tracking with a mutual information objective, and we show that distributed planning for target tracking with Range-limited RSP in constant time, independent of the number of robots, can guarantee suboptimality approaching that of fully sequential planning. Finally, simulation results support our claim that R-lRSP maintains consistent solution quality while planning for up to 96 robots. This produces over a 20× reduction in the number of sequential planning steps and an even further improvement in computation time (sequential planning is intractable at this scale).

Figure 6.1: A team of aerial robots R (black) plans over a receding horizon to track a number of targets T (red). In doing so, robots select sensing actions to minimize uncertainty in the target states which evolve independent of each other and the robots.

Consider a set of moving targets T = {1, . . . , n_t} and robots tracking those targets R = {1, . . . , n_r}, seeking to minimize uncertainty (entropy [58]), as illustrated in Fig. 6.1. Let x^r_{i,t} ∈ R^{d_r} and x^t_{j,t} ∈ R^{d_t} be the respective states of robot i ∈ R and target j ∈ T at time t ∈ {0, . . . , T}. The states of each evolve in discrete time according to known dynamics

    x^r_{i,t+1} = f^r(x^r_{i,t}, u_{i,t}),    x^t_{j,t+1} = f^t(x^t_{j,t}, ε^t_{j,t}),    (6.1)

where u_{i,t} ∈ U is a control input from the finite set of control inputs U and ε^t_{j,t} is a random disturbance. The robots receive noisy observations y_{i,j,t} of the target states according to

    y_{i,j,t} = h(x^r_{i,t}, x^t_{j,t}, ε^y_{i,j,t})    (6.2)

where ε^y_{i,j,t} is observation noise. We refer to states and observations collectively using boldface capitals as X^r_t, X^t_t, and Y_t, each at time t.

The robots plan to jointly maximize a mutual information (MI) objective over a receding horizon, starting at time t with duration l. Specifically, robots maximize a submodular, monotonic, and normalized objective g subject to a partition matroid constraint, and the optimal set of control actions is

    X* ∈ arg max_{X ∈ I} g(X)    (6.3)

where I is a partition matroid that represents assignment of sequences of control actions to robots such that

    B_i = {(i, u^l) | u^l ∈ U^l}  for all i ∈ R,    (6.4)

and g is the mutual information between observations and target states given the choice of control actions. Interpreting future states X^t_{t+1:t+l} and observations Y_{t+1:t+l} as random variables (induced by the process (6.1) and observation (6.2) noise), and writing Y_{t+1:t+l}(X) for X ⊆ U, the objective is

    g(X) = I(X^t_{t+1:t+l}; Y_{t+1:t+l}(X) | Y_t, X^r_{0:t})    (6.5)

where I(X; Y | Z) is the Shannon mutual information between X and Y conditional on Z and quantifies the reduction in uncertainty (entropy) of one random variable given another. Although we discuss properties of entropy and mutual information, we refer interested readers to Cover and Thomas [58] for detail and definitions. Critically, mutual information objectives are normalized, monotonic, and submodular (defined in Sect. 3.5.1) when observations are conditionally independent of target states [107] although higher-order monotonicity conditions may not apply (see discussion in Fig. 5.2).
Because the robot states are known and deterministic, individual targets and observations of the same are jointly independent so that (6.5) can be written as a sum over the targets,

    g(X) = Σ_{j ∈ T} I(X^t_{j,t+1:t+l}; Y_{j,t+1:t+l}(X) | Y_{j,t}, X^r_{0:t}),    (6.6)

because the mutual information is a difference of entropies [58, Eq. 2.45] which, in turn, decompose as sums over targets [58, Theorem 2.6.6].

Spatial locality in target tracking problems arises when the capacity to sense a target decreases with some measure of distance. The variations in the ability of each robot i ∈ R to sense different targets j ∈ T take the form of channel capacities C_{i,j} from information theory [58, Chapter 7] so that

    C_{i,j} = max_{x ∈ B_i} I(X^t_{j,t+1:t+l}; Y_{j,t+1:t+l}(x) | Y_{j,t}, X^r_{0:t}).    (6.7)

This channel capacity is itself an informative planning problem, although designers may apply existing results for channel capacities by relaxing (6.7) such as given bounded travel distances on the robot and target.

More formally, Y_{t+1:t+l}(X) is a random variable that encodes the noisy observations (6.2) that robot i receives after executing u^l according to (6.1) for each (i, u^l) ∈ X. When X ⊆ U includes multiple hypothetical assignments to the same robot (which arises in the analysis), duplicates obtain unique observations and observation noise. All analysis applies to discrete and continuous mutual information and transformations of target states, e.g., for MI with position but not derivatives.
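For intuition, the sum structure of (6.6) is easy to realize when each target is modeled as linear-Gaussian, in which case the per-target mutual information has a closed form. The sketch below (illustrative only, with assumed interfaces and reporting MI in nats; the thesis itself does not assume this model) evaluates a joint selection by summing per-target contributions.

    import numpy as np

    def target_mi(Sigma, H_list, R_list):
        # MI between one target's state x ~ N(mu, Sigma) and stacked linear observations
        # y = H x + v with v ~ N(0, R):  I = 0.5 * (logdet(H Sigma H^T + R) - logdet(R)).
        if not H_list:
            return 0.0
        H = np.vstack(H_list)
        R = np.zeros((H.shape[0], H.shape[0]))
        row = 0
        for Ri in R_list:                      # block-diagonal observation noise
            m = Ri.shape[0]
            R[row:row + m, row:row + m] = Ri
            row += m
        _, logdet_S = np.linalg.slogdet(H @ Sigma @ H.T + R)
        _, logdet_R = np.linalg.slogdet(R)
        return 0.5 * (logdet_S - logdet_R)

    def joint_objective(selection, targets, obs_model):
        # Eq. (6.6): independent targets, so the joint MI is a sum of per-target terms.
        # obs_model(j, x) returns (H, R) for target j under action x, or None if unobserved.
        total = 0.0
        for j, Sigma in targets.items():
            pairs = [p for p in (obs_model(j, x) for x in selection) if p is not None]
            total += target_mi(Sigma, [H for H, _ in pairs], [R for _, R in pairs])
        return total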
A common feature of distributed planning problems is limited access to information. The central assumption for our computational model is that each robot i ∈ R is able to approximate the objective for its own set of actions B_i. That is, each robot has access to an approximation of marginal gains g̃_i(x_i | A) for each action x_i ∈ B_i in its local set and for prior selections A ⊆ U. This may reflect both approximate evaluation of mutual information and constraints on access to sensor data such as allowing robots to ignore distant targets.

Robots also have limited access to the ground set U: they produce elements of their local sets B_i ⊆ U implicitly via local planners and obtain access to other robots' actions when they communicate their decisions.

Algorithm 4
Distributed algorithm for receding-horizon target tracking with Randomized Sequential Partitions (RSP) from the perspective of robot i ∈ R for execution starting at time t.
1: N_in ← in-neighbors of robot i
2: N_out ← out-neighbors of robot i
3: θ ← sensor data (or summary), accessible to robot i
4: Receive: X^d_{N_in} from N_in
5: g̃ ← approximation of g given θ
6: x^d ← PlanAnytime(g̃, x^r_t, X^d_{N_in})
7: Send: x^d to N_out
8: Execute: x^d starting at time t and until the beginning of the result of the next planning round

The distributed planner that we present in this section extends methods from Chapter 5. Algorithm 4 provides pseudo-code for this distributed RSP algorithm based on a directed acyclic graph where robots are vertices with incoming edges N_in (and outgoing N_out). Although we do not emphasize timing, this algorithm runs in synchronous epochs whereby the robots collectively maximize information gain to solve instances of the joint receding-horizon optimization problems (6.3). The output of this planner is then the collection of the robots' decisions X^d = {x^d_1, . . . , x^d_{n_r}}.

Each robot begins planning after receiving decisions from a set of in-neighbors and approximates the objective based on available data (θ) and computational resources (lines 4–6). Once the planner exits or runs out of time, the robot commits to an action which it sends to any out-neighbors and then executes in a receding-horizon fashion (lines 7–8). In this exposition, we assume the in- and out-neighbors are known for brevity. However, the planner graph can also be implicit based on the messages the robot has received when it starts planning. Through this process, the single-robot planners (PlanAnytime) collectively run in n_d sequential steps. We assume robots are randomly assigned to planning steps via the RSP methods presented in Chapter 5.

Aside from applying RSP to receding-horizon optimization, we seek to account for approximations (sampling-based planning and objective evaluation and ignoring distant targets) pursuant to constraints on computation time and information access. Specifically, the analysis (Sect. 6.5) will account for costs of approximations arising in distributed planning, objective evaluation, and anytime (single-robot) planning. These costs will then relate the suboptimality of Alg. 4 to the nominal bound for sequential planning.
Before moving on to the individual costs, let us present a generalized expression of costs for approximations in decision making. Any of the approximations we encounter can inhibit exact evaluation of maximization steps (PlanAnytime) in the distributed planner (Alg. 4). Rather than assume constant-factor suboptimality at each step [169], as would be appropriate if the local planner provided a consistent performance guarantee, we present a more flexible cost model that accounts for uncertain results that arise when planning in real time. Given an instance of (6.3) and a distributed solution X^d, the cost to robot i ∈ R for making a suboptimal decision x^d_i is the difference between the value of that decision and the true maximum over B_i,

    γ_i(g, X) = max_{x ∈ B_i} g(x | X) − g(x^d_i | X)    (6.8)

for the true objective g given specific prior decisions X ⊆ X^d_{1:i−1}.

In the nominal sequential greedy algorithm (Alg. 2), robot i ∈ R plans with access to all previous decisions by robots {1, . . . , i − 1}. However, in our approach (Alg. 4), robots only have access to a subset N_i ⊆ {1, . . . , i − 1} of these decisions (N_in in the algorithm description), which induces a directed acyclic graph with edges (j, i) for each robot j ∈ N_i whose decision i uses while planning [75, 82]. In a sense, the robots ignore decisions by the remaining robots N̂_i = {1, . . . , i − 1} \ N_i, and the cost of doing so is a second derivative (3.16):

    γ^dist_i = g(x^d_i | X^d_{N_i}) − g(x^d_i | X^d_{1:i−1}) = −g(x^d_i; X^d_{N̂_i} | X^d_{N_i}).    (6.9)

Later, we will upper bound the right-hand side in terms of N̂_i.

Although we do not focus on the communication and representation of sensor data, we assume robots have access to relevant data for exact or approximate evaluation of the mutual information objective (6.5). However, robots may ignore distant targets or approximate the objective via sampling. We say robot i ∈ R has access to a local approximation g̃_i of the objective, and the cost of this approximation (treating stochasticity implicitly) is at most the sum of the maximum over- and under-approximation of g,

    γ^obj_i = max_{x, x′ ∈ B_i} ( g̃_i(x | X_{N_i}) − g(x | X_{N_i}) + g(x′ | X_{N_i}) − g̃_i(x′ | X_{N_i}) ).    (6.10)

Here, x′ and x are respectively the points where g̃_i most under- and over-approximates g over all decisions B_i available to robot i.

Selecting sensing actions for individual robots produces informative path planning problems [41, 169]. Each robot has a limited amount of time available for planning and must terminate planning and transmit results soon enough so that later robots can make their own decisions before the plans go into effect (at time t in Algorithm 4). Although some existing methods provide performance guarantees [41, 169, 200], designers who apply these methods may have to vary replanning rates or tune problem parameters to satisfy constraints on planning time for operation in real time. On the other hand, methods such as randomized planning [90, 116] and gradient- and Newton-based trajectory generation [36, 93] converge to local or global maxima but provide no specific guarantees on solution quality before convergence for anytime planning. Along these lines, we will now provide analysis based on empirical performance and will apply Monte-Carlo tree search [27, 39] later for single-robot planning in the simulation results as in Chapter 4. The cost of approximate single-robot planning in terms of empirical performance is then

    γ^plan_i = γ_i(g̃_i, X^d_{N_i}).    (6.11)

This approach captures the inherent uncertainty of anytime planning and enables us to characterize collective performance in terms of the bulk suboptimality of single-robot planning.
6.5 Analysis of suboptimality of distributed planning
Theorem 10 (Suboptimality of Alg. 4 for target tracking tasks). Consider an instance of (6.3); any solution X^d that Alg. 4 produces satisfies

    g(X*) ≤ 2 g(X^d) + Σ_{i ∈ R} ( γ^dist_i + γ^obj_i + γ^plan_i ),    (6.12)

and the total cost of distributed planning is bounded by

    Σ_{i ∈ R} γ^dist_i ≤ Σ_{i ∈ R} Σ_{j ∈ N̂_i} Ŵ(i, j)    (6.13)

where Ŵ is a collection of edge weights, which we will define later, that describes redundancies that arise when pairs of robots may observe the same targets.

The proof of Theorem 10 is in Appendix B.6, and we provide a brief summary at the end of this section in Sect. 6.5.3 after introducing some preliminary results related to the first (Sect. 6.5.1) and second (Sect. 6.5.2) parts of this theorem.

Regarding the structure of Theorem 10, the bounds describe the performance of practical implementations of Alg. 4. An idealized version of this distributed algorithm would have access to exact objective values and maxima so that the associated costs γ^obj and γ^plan would all be zero. From this perspective, (6.12) describes how real implementations may deviate from this ideal and states that the total suboptimality is a simple accumulation of individual inefficiencies.

The following lemma expresses the suboptimality of any decision as a sum of costs of suboptimal decisions.
Lemma 11 (Suboptimality of general assignments). Given a submodular, monotonic, normalized objective g, any basis (assignment of actions to all robots) X^d ∈ I on a simple partition matroid satisfies

g(X^⋆) ≤ g(X^d) + ∑_{i=1}^{n_r} γ_i(g, X^d_{1:i−1}).   (6.14)

The proof of Lemma 11 is in Appendix B.5. Observe that if we obtain X^d via exact sequential maximization, the terms (γ_i) of the sum in (6.14) go to zero, and we obtain the original result by Fisher et al. [69]. Similarly, [169, Theorem 1] on constant factor suboptimal solvers follows after substituting that suboptimality into the cost model (6.8).
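As a sanity check on Lemma 11, the sketch below (a toy instance with made-up action sets, not the thesis implementation) evaluates the right-hand side of (6.14) for a sequential greedy assignment, where the per-step costs vanish, and for an arbitrary assignment, where they do not.

```python
# A minimal sketch of Lemma 11 on a tiny instance: robots choose one action
# each (a simple partition matroid), the objective counts distinct covered
# targets, and (6.14) bounds the optimum by the value of any assignment plus
# the per-robot costs of suboptimal decisions.
from itertools import product, chain

def g(X):
    return len(set(chain.from_iterable(X)))

def marginal(x, X):
    return g(list(X) + [x]) - g(list(X))

blocks = {                      # B_i: available actions for each robot
    1: [frozenset({1, 2}), frozenset({3})],
    2: [frozenset({2, 3}), frozenset({4})],
    3: [frozenset({1, 4}), frozenset({5, 6})],
}

def lemma11_rhs(assignment):
    """Right-hand side of (6.14): g(X^d) + sum_i gamma_i(g, X^d_{1:i-1})."""
    X, total_cost = [], 0.0
    for i in sorted(blocks):
        x_i = assignment[i]
        total_cost += max(marginal(x, X) for x in blocks[i]) - marginal(x_i, X)
        X.append(x_i)
    return g(X) + total_cost

# Exhaustive optimum for reference (feasible only for tiny instances).
optimum = max(g(list(combo)) for combo in product(*blocks.values()))

# Sequential greedy assignment: per-step costs vanish, recovering the bound
# of Fisher et al. [69]; an arbitrary assignment pays nonzero costs.
greedy, X = {}, []
for i in sorted(blocks):
    greedy[i] = max(blocks[i], key=lambda x: marginal(x, X))
    X.append(greedy[i])
arbitrary = {i: blocks[i][0] for i in blocks}

for name, assignment in [("greedy", greedy), ("arbitrary", arbitrary)]:
    print(name, "value:", g(list(assignment.values())),
          "Lemma 11 bound:", lemma11_rhs(assignment), ">= optimum:", optimum)
```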
6.5.2 Bounding the cost of distributed planning for target tracking problems

This section provides tools for characterizing the cost of distributed planning (6.13) in target tracking problems. We begin by discussing decomposition of the objective as a sum over targets and derive a bound in terms of the result. Applying this bound produces a collection of weights that relate the cost of distributed planning to the channel capacities (6.7) between the robots and targets.

The objectives of sensing problems can often be written as sums, as is true for our target tracking objective (6.5), which is a sum over information sources (6.6). Specifically, let G = {g_1, . . . , g_{n_t}} be a collection of set functions so that for j ∈ T and X ⊆ U,

g_j(X) = I(X^t_{j,t+1:t+l} ; Y_{j,t+1:t+l}(X) | Y_{j,1:t}, X^r_{0:t}).   (6.15)

This decomposition will enable characterization of interactions between distant robots in terms of each robot's capacity to sense near and distant targets.

Definition 8 (Sum decomposition). A set of submodular, monotonic, and normalized functions G = {g_1, . . . , g_n} decomposes a set function g if

g(X) = ∑_{ĝ ∈ G} ĝ(X), for all X ⊆ U.   (6.16)

Closure over sums [70] ensures that g is submodular, monotonic, and normalized if the same is true for each ĝ ∈ G as in Def. 8. Further, although some such sum decomposition always exists (G = {g}), the choice of decomposition will determine the tightness of our performance bounds; here, we are interested in decompositions that express how interactions vary with distance. Choosing G according to (6.15) captures spatial locality arising out of distributions of robots and targets.

Given some G that decomposes g, the second derivative (3.16) at X ⊆ U with respect to A, B ⊆ U, all disjoint, is

g(A ; B | X) = ∑_{ĝ ∈ G} ĝ(A ; B | X),   (6.17)

and likewise for all other derivatives, which are linear combinations of values of g at different points. This derivative has the same form as the cost of ignoring prior decisions during distributed planning, γ^{dist} (6.9). The rest of this section is devoted to obtaining upper bounds relating each robot's decision to the decisions they ignore (A and B) while eliminating the decisions each robot takes into account (X).

6.5.2c Bounding second derivatives using sum decompositions

Applying monotonicity and submodularity respectively provides a trivial bound on the second derivative of a set function

g(A ; B | X) = g(A | B, X) − g(A | X) ≥ −g(A | X) ≥ −g(A),   (6.18)

where A, B, X ⊆ U are disjoint subsets of the ground set. By symmetry,

g(A ; B | X) ≥ −min(g(A), g(B)).   (6.19)

Then, expressing the second derivative of g in terms of the sum decomposition (6.17) and bounding the derivatives of ĝ ∈ G yields

g(A ; B | X) ≥ ∑_{ĝ ∈ G} −min(ĝ(A), ĝ(B)).   (6.20)
Remark. Chapter 5 relies on g(A ; B | X) increasing monotonically in X, and we could make a similar statement here by writing the right-hand side of (6.19) in terms of marginal gains with respect to X. More broadly, generalizing (6.20) to include any relevant lower bounds on second derivatives of ĝ ∈ G unifies the results we present here with those for 3-increasing functions in Chapter 5. This enables development of multi-objective applications such as for robots covering an environment while simultaneously localizing objects.

Bounding the second derivative of g (6.20) leads to bounds on interactions between agents. The upper bounds on these interactions form a weighted undirected graph G = (R, E, W) connecting the robots with edges E = {(i, j) | i, j ∈ R, i ≠ j} and weights

W(i, j) = max_{x_i ∈ B_i, x_j ∈ B_j} ∑_{ĝ ∈ G} min(ĝ(x_i), ĝ(x_j)).   (6.21)

Evaluating the right-hand side of the above expression is difficult as doing so involves search over the product of two robots' action spaces. To make evaluation of the weights tractable, relaxing this expression by taking the pairwise minimum of the maximum values of each objective component produces an upper bound in terms of the channel capacities (6.7) that avoids search over a product space:

\widehat{W}(i, j) = ∑_{k ∈ T} min(C_{i,k}, C_{j,k})
                  = ∑_{ĝ ∈ G} min( max_{x_i ∈ B_i} ĝ(x_i), max_{x_j ∈ B_j} ĝ(x_j) )
                  ≥ max_{x_i ∈ B_i, x_j ∈ B_j} ∑_{ĝ ∈ G} min(ĝ(x_i), ĝ(x_j)) = W(i, j),   (6.22)

recalling that in (6.15) we chose G to decompose g by targets T so that the second equality follows from (6.7). As the terms of the sum are the smaller channel capacities, this bound captures the idea that if sensing quality decreases with distance, so do interactions between robots.

Figure 6.2: Visualizations of eight and sixteen robots tracking the same numbers of targets. Robots with dotted finite-horizon trajectories are blue and targets red. The background illustrates the sum of target probabilities at each grid space, increasing from purple to yellow.
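The relaxed weights are straightforward to evaluate from a table of robot-target channel capacities. The sketch below (hypothetical capacities and planner graph, for illustration only) computes \widehat{W}(i, j) as in (6.22) and accumulates the resulting bound on the cost of distributed planning from (6.13).

```python
# A hypothetical sketch of the relaxed weights (6.22) and the bound (6.13).
# The capacity matrix and the sets of ignored robots are invented; in the
# thesis the capacities come from (6.7).
import numpy as np

# C[i, k]: channel capacity of robot i for target k (the best achievable
# single-robot objective value for that target).
C = np.array([
    [2.0, 0.5, 0.0],
    [1.5, 1.0, 0.2],
    [0.0, 0.8, 2.5],
])
n_robots = C.shape[0]

def W_hat(i, j):
    """(6.22): sum over targets of the smaller of the two capacities."""
    return np.minimum(C[i], C[j]).sum()

# Ignored prior robots for each robot (hat N_i), e.g. from an RSP assignment
# into two rounds where robot 1 plans concurrently with robot 0.
ignored = {0: set(), 1: {0}, 2: set()}

bound_on_dist_cost = sum(W_hat(i, j) for i in range(n_robots) for j in ignored[i])
print("sum_i sum_{j in hat N_i} W_hat(i, j) =", bound_on_dist_cost)
```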
Theorem 10 consists of two parts. The first part, the effect of approximations on planning performance (6.12), follows by applying Lemma 11, on the suboptimality of general assignments, and substituting the definitions of the costs ((6.9), (6.10), and (6.11)). The second part (6.13) characterizes suboptimality due to distributed planning and follows by applying the chain rule to the definition of the cost of distributed planning (6.9) and substituting (6.20), (6.21), and (6.22). Please refer to Appendix B.6 for the full proof.
Algorithm 4 requires a number of sequential planning steps that depends on the structure of the planner graph (Sect. 6.4.2) and that is constant for RSP planners, whereas the sequential greedy algorithm (Alg. 2) requires one step per robot. Further, when the optimum is proportional to the sum of weights (ignoring the objective and planner costs), distributed planning with a constant number of steps independent of the number of robots can guarantee constant-factor suboptimality in expectation, approaching half of optimal (by extension of Theorem 7). Spatial locality—more specifically, ensuring that the robot-target channel capacities (6.7) decrease sufficiently quickly with distance—can enable robots to ignore distant robots and targets with bounded costs and provides optimization performance independent of the number of robots. Incorporating range limits in planning by ignoring robots and targets past a given distance and tracking targets using sparse filters can ensure that computation time for the single-robot planner is also constant. Likewise, regarding communication, robots send one message for each edge in the directed planner graph (Sect. 6.4.2), and ignoring distant robots reduces this to a constant number of messages per robot. Further, Appendix A.2 presents sufficient conditions for the cost of distributed planning γ^{dist} (6.9) to be bounded and so to provide constant optimization performance for any number of robots.

6.7 Results

To evaluate the approach, we provide simulation results (visualized in Fig. 6.2) for teams of robots tracking targets (one target per robot), with robots moving according to planner output and targets via a random walk, all on a four-connected grid (with side length proportional to √n_r). The robots estimate target locations using Bayesian filters given range observations to each target with mean d̂ = min(d, 20) and variance proportional to d, where d is the Euclidean distance to the target in grid cells. For the purpose of this chapter, all robots have access to all observations or, equivalently, have access to centralized filters. All simulation trials run for 100 time-steps; all initial states are uniformly random; and initial target locations are known. Additionally, all results ignore the first 20 steps of each trial to allow the system to converge to steady-state conditions.

Robots plan actions individually using Monte-Carlo tree search (MCTS, PlanAnytime) [39] with a two-step horizon and collectively according to the specified planner. To ensure tractability, we replace the original objective (6.5) with a sum of mutual information for each time-step:

g_{sim}(X) = ∑_{i=1}^{l} I(X^t_{t+i} ; Y_{t+1:t+i}(X) | Y_{1:t}, X^r_{0:t}), X ⊆ U.   (6.23)

This objective is equivalent to [156, (18)] and can be thought of as minimizing uncertainty at the time of each planning step. Being a sum, (6.23) remains submodular, monotonic, and normalized, so Theorem 10 still applies. Like Ryan and Hedrick [156], we evaluate this objective by simulating the system and computing the sample mean of the filter entropy. Because MCTS itself is sample-based, the planner estimates the objective implicitly by simulating the system once per MCTS rollout; by sampling the more valuable actions more often, MCTS produces increasingly accurate estimates for nearly optimal trajectories. Planners with sixteen or more robots use sparse filters with a small truncation threshold.

We compare the following planners: sequential planning (Alg. 2, one step per robot); the proposed distributed planner (Alg. 4) with each robot assigned randomly to one of n_d sequential steps according to the RSP methods from Chapter 5; myopic planning (MCTS without coordination, one step); and random selection of actions. Additionally, we provide some results for distributed planning with range limits (R-lRSP) where robots plan while ignoring targets further than 12 units away (in terms of mean position) and other robots further than 20 units. Given the use of sparse filters, this latter planner runs in constant time.

We evaluate the distributed planning approach in terms of task performance (average target entropy) for various numbers of robots (Fig. 6.3), objective values on a common set of subproblems (6.3), and the redundancy per robot (6.22) for increasing numbers of robots, which is proportional to 1/n_d times the bound on the cost of distributed planning (6.13) for randomized assignment in n_d rounds (see Chapter 5). The results for average target entropy—which captures the uncertainty in target locations [58]—are based on 20 simulations of target tracking for each configuration. Results for objective values and redundancy use planning subproblems (6.3) taken from the simulation trials for distributed planning in four rounds.
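For intuition about the simulated estimation process, the following sketch outlines a histogram filter of the kind described above: a random-walk prediction on a grid followed by an update with a range observation whose mean saturates at 20 cells. The grid size, noise model, and robot position are placeholders rather than the parameters used in the experiments.

```python
# Hypothetical sketch of a per-target histogram filter on a grid: predict
# with a random-walk motion model, then update with a Gaussian likelihood
# for a saturating range observation. The variance model is a placeholder.
import numpy as np

side = 20
xs, ys = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")

def predict_random_walk(belief):
    """Diffuse probability mass to 4-connected neighbors (wrapping at the
    grid edges for simplicity)."""
    out = belief.copy()
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        out += np.roll(belief, shift, axis=axis)
    return out / out.sum()

def update_range(belief, robot_xy, measured_range, noise_std):
    """Multiply by the likelihood of a saturating range observation."""
    d = np.hypot(xs - robot_xy[0], ys - robot_xy[1])
    expected = np.minimum(d, 20.0)
    likelihood = np.exp(-0.5 * ((measured_range - expected) / noise_std) ** 2)
    posterior = belief * likelihood
    return posterior / posterior.sum()

belief = np.full((side, side), 1.0 / side**2)       # uniform prior
belief = predict_random_walk(belief)
belief = update_range(belief, robot_xy=(2, 3), measured_range=6.0, noise_std=1.5)
entropy = -np.sum(belief * np.log2(belief + 1e-12))
print("posterior entropy (bits):", round(float(entropy), 3))
```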
[Figure 6.3 panels: (a) target entropy per target (bits) for increasing numbers of robots with RSP (n_d = 2, 4, 8), R-lRSP (n_d = 4), sequential, myopic, and random planners; (b) objective values (fraction of the maximum) by planner; (c) redundancy (6.22) and objective value per robot for increasing numbers of robots.]
Figure 6.3: (a) For target entropy (lower is better), which is the task performance criterion, distributed planning with RSP consistently improves upon myopic planning and approaches sequential planning given many times fewer sequential steps. (b) Objective values on common 16-robot subproblems (6.3) reflect a similar trend, and results for distributed planning (bold) are effectively equivalent to sequential planning given only four planning rounds. (c) Average objective value and total redundancy weight (6.22) per robot for RSP with n_d = 4. Traces depict values for trials from (b) and five additional trials with R-lRSP reaching up to 96 robots (planning at this scale is intractable for the other planner configurations). Values of each initially increase but appear to approach an asymptote, which indicates that the distributed planner approaches constant factor suboptimality at large scales.
The results for objective values by solver are normalized according to the maximum values across solvers for each planning problem and reflect trials with 16 robots. For the results on redundancy, an additional five trials for a four-round distributed planner with range limits demonstrate behavior for up to 96 robots.

The proposed distributed planners provide consistent improvements in target tracking performance (average target entropy) compared to myopic planning (Fig. 6.3a); distributed planning in eight rounds roughly matches sequential planning despite requiring as many as five times fewer planning steps and produces 5–13% better (lower) target entropy than planning myopically. The objective values (Fig. 6.3b) exhibit a similar trend, and all distributed planners closely match sequential planning, even in terms of the distributions of results. Although the redundancy per robot (Fig. 6.3c) initially increases with the number of robots, that redundancy eventually levels off. This is consistent with the analysis of scaling performance in Appendix B.5, which indicates that the suboptimality approaches a constant factor bound. Overall, the results indicate that a small amount of coordination that does not scale with the number of robots is sufficient to produce performance comparable to sequential planning in receding-horizon settings.
This chapter has presented a distributed planner for mutual information-based target tracking that mitigates the effects of the sequential structure of existing methods for submodular maximization. The analysis provides bounds on suboptimality for distributed planners that can be designed to run with fixed numbers of planning steps. By explicitly accounting for suboptimal local planning (e.g. anytime planning) and approximation of the objective, we affirm that the proposed approach is applicable to practical tracking systems. The results demonstrate that distributed planning improves tracking performance (in terms of target entropy) compared to planners with no coordination and that distributed planning with little coordination can even match fully sequential planning given a constant number of planning rounds. Although we focus on target tracking, the analysis applies to general multi-objective sensing problems and generalizes the results in Chapter 5 on coverage. In this sense, we present a first positive result for mutual information objectives where the second discrete derivative is only nearly monotonic.

Chapter 7
Time-Sensitive Exploration of Unknown Environments

The line of work that led to this thesis began with multi-robot exploration of unknown environments (Chapter 4), and we will now revisit this topic in greater depth. Obtaining improvements in performance for exploration—in terms of completion time—via submodular maximization had proven to be a challenge as the earlier results exhibited little variation in task performance across different methods for submodular maximization. The methods for analysis of redundancy that we have developed since then also do not immediately apply to mutual information objectives for exploration (see the discussion in Fig. 5.2). We address the former through more thorough development and evaluation of the exploration system. Toward the same end, we also consider the design of objectives for robotic exploration. In doing so, we also address the theoretical limitations related to mutual information in order to apply our existing analysis for RSP planning to multi-robot exploration.

Unlike the target tracking problem that we studied in the previous chapter (Chapter 6), mutual information and submodular maximization on their own do not address a number of key challenges for multi-robot exploration. First, environments for exploration do not reflect the common assumption that cells are occupied with independent probability. Such cell independence assumptions are central to existing information-theoretic objectives [37, 103, 201] which often strongly emphasize efficient computation [37, 87, 118, 201]. Although these independence assumptions can be limiting, they admit changes to the occupancy prior for unobserved space (typically decreasing the prior) [87, 88, 179] (see also Sec. 4.6.1b). Still, even though recent works [118, 201] establish that Shannon mutual information can be evaluated efficiently and accurately for an individual range observation along a ray, methods for evaluating the joint contributions of multiple rays and camera views depend on approximation by summing over rays [37]. Our analysis highlights connections between mutual information and coverage objectives and suggests that coverage can accurately represent joint contributions when the prior probability of occupancy is low.
Moreover, our results demonstrate that switching to a coverage objective can improve completion times by as much as 16%. The same connections to coverage support applying RSP planning to provide efficient, distributed submodular maximization for exploration with large numbers of robots. Toward this end, we investigate the design of exploration systems and the potential impacts of methods for submodular maximization on the time to explore an environment. First, we introduce a reward for reducing distance to sufficiently informative views [57] which ensures reliable completion of exploration tasks and mimics methods that obtain similar effects via navigation toward frontiers [37, 198]. We also provide simulation results for more than a thousand simulation trials which compare methods for multi-robot coordination via submodular maximization for different environments and numbers of robots. The results demonstrate that sequential (greedy) and RSP planning substantially improve suboptimality for receding-horizon planning for multi-robot exploration. However, decreasing the time to complete exploration tasks remains a challenge as the improvements in suboptimality translate primarily into increases in coverage rates early in the exploration process that are not often sustained through the end. Still, we remain optimistic that incorporating methods for planning and task assignment at larger spatial scales [134, 168] can realize sustained improvements in exploration performance.
The following problem description parallels and reiterates prior discussion in Sec. 2.1 and Sec. 2.1.3, which include additional details on the properties of exploration and related problems.

Consider a team of robots R = {1, . . . , n_r} seeking to map a discretized environment E = [c_1, . . . , c_{n_m}] consisting of cells C = {1, . . . , n_m} that are each either free (0) or occupied (1) (that is, c_i ∈ {0, 1} for i ∈ C) according to some probability distribution. Each robot r ∈ R moves through the environment with state x_{t,r} at some discrete time t according to the following dynamics

x_{t,r} = f(x_{t−1,r}, u_{t,r}),   (7.1)

where u_{t,r} ∈ U belongs to a finite set of control inputs U. Robots must remain in free space, which induces the constraint that they remain in a safe set

x_{t,r} ∈ X_{safe}(E).   (7.2)

The robots observe the environment with depth cameras (or other ray-based sensors such as lidar sensors). For the purpose of this chapter we assume that measurements are deterministic, without noise; robots can infer definitely that cells in the path of each ray are free up to the first occupied cell or the maximum range along the beam. For brevity, we abstract this process of inference and state that, at each time-step, robots observe sets of cells F_{cam}(x, E) ⊆ C with depth cameras and obtain their occupancy values via observations

y_{t,r} = h(x_{t,r}, E) = {(i, c_i) : i ∈ F_{cam}(x_{t,r}, E)}.   (7.3)

Through the process of exploration, robots seek to maximize the number of cells observed

J(X_t, Y_t) = | ⋃_{t' ∈ {1,...,t}, r ∈ R} F_{cam}(x_{t',r}, E) |.   (7.4)

We refer to this quantity as the environment coverage. Exploration is complete once the environment coverage reaches an environment-dependent quota B(E), when

J(X_T, Y_T) ≥ B(E),   (7.5)

and the robots seek to minimize the amount of time T required to reach that quota and complete the exploration process.

Like previous chapters, we propose planning with a receding-horizon approach whereby robots collectively maximize an objective g over a horizon with L steps:

max_{u'_{1:L,1:n_r}} g(X_{t:t+L,1:n_r})
s.t. x_{t+l,r} ∈ X_{safe}(Y_t)
     x_{t+l,r} = f(x_{t+l−1,r}, u'_{l,r})
     y_{t+l,r} = h(x_{t+l,r}, E)
     for all l ∈ {1 . . . L} and r ∈ R,   (7.6)

where X_{safe}(Y_t) refers to the subset of the state space that is known to be safe given available observations (unlike (7.2) which refers to the complete set of safe states). After solving (7.6), robots then execute the first control actions in the sequence u'_{1,1:n_r}.

Likewise, we can interpret g as a submodular, monotonic, and normalized set function and can rewrite (7.6) as a submodular maximization problem with a simple partition matroid constraint (Problem 4). From this perspective, the ground set consists of assignments of control actions to robots U = R × U^L, and the blocks of the partition matroid (Def. 6) are assignments {r} × U^L to robots r ∈ R.
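As a concrete reading of (7.4) and (7.5), the sketch below (toy grid and camera footprint; the footprint here ignores occlusions and is only a stand-in for the ray-based model) accumulates the set of observed cells over the robots' visited states and checks the coverage quota.

```python
# A minimal sketch (assumed names, toy footprint) of environment coverage
# (7.4) and the completion check (7.5) on a 2-D occupancy grid.
import numpy as np

env = np.zeros((30, 30), dtype=int)     # 0 = free, 1 = occupied
env[10:14, 12:16] = 1                   # a block of obstacles

def f_cam(state, env, max_range=6):
    """Toy stand-in for F_cam: all cells within a fixed range of the robot
    (ignoring occlusions; for illustration only)."""
    x, y = state
    visible = set()
    for i in range(env.shape[0]):
        for j in range(env.shape[1]):
            if (i - x) ** 2 + (j - y) ** 2 <= max_range ** 2:
                visible.add((i, j))
    return visible

def environment_coverage(states):
    """J(X_t, Y_t): number of distinct cells observed so far (7.4)."""
    observed = set()
    for s in states:
        observed |= f_cam(s, env)
    return len(observed)

visited = [(5, 5), (5, 11), (12, 20)]   # states of all robots over time
J = environment_coverage(visited)
quota = 0.9 * env.size                  # B(E): e.g. 90% of the cells
print("coverage:", J, "complete:", J >= quota)
```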
We propose an objective that consists of two components, g_view and g_dist, so that, given some assignment of actions X ⊆ U,

g(X) = g_view(X) + ∑_{r ∈ R} g_dist(X_r),   (7.7)

where X_r is the assignment to robot r ∈ R. The view (or information) reward seeks to capture the value of observations from camera views over the planning horizon while the distance component provides reward for moving toward regions of the environment (beyond the planning horizon) where valuable observations can be obtained. Note that g is normalized (g(∅) = 0) and will retain the monotonicity properties of g_view so long as g_dist is positive and zero when trajectories are not assigned, because the distance terms are additive.

Previously (in Chapter 4), we also used entropy to evaluate exploration performance; Appendix A.3 explains the reasoning behind this change. Being additive, marginal gains for the distance reward are fixed and do not depend on other robots. As such, aside from increasing monotonically, all higher-order monotonicity conditions hold because all second and higher-order derivatives (3.16) are zero.
7.3 Spatially local volumetric reward

The volumetric reward g_view seeks to capture the joint value of observations (camera views) in terms of the amount of unknown space that the robots will collectively observe. Ideally, g_view would correspond directly to the increment in environment coverage (7.4), except that the environment E and, in turn, future values of the environment coverage are unknown. To mitigate this issue, prior works predominantly either compute rewards given an assumed [20, 30, 62, 168] (or possibly learned) guess at the environment instantiation or else approximate information gain (i.e. mutual information (3.5)) for observations given a Bayesian prior [36, 37, 87, 103] (Chapter 4 applies such methods), so far always assuming independent (Bernoulli) cell occupancy. As well as presenting the approach for this chapter, the following discussion unifies and contrasts different kinds of exploration rewards and demonstrates that our analysis for RSP planning applies to a variety of volumetric rewards.

We will connect different exploration rewards in terms of expected coverage. Consider some distribution over possible environments E_guess. The only restriction is that any environment E' ∼ E_guess that E_guess assigns non-zero probability must be consistent with observations up to the current time Y_t (E_guess may even assign zero probability to all but one possible environment). Now, for convenience, define the set of future states that robots will visit while executing a set of actions X ⊆ U as Φ(X). Given non-negative weights on cells w_cell : C → ℝ_{≥0}, the expected weighted coverage is

g_cov(X) = E_{E' ∼ E_guess} ∑_{i ∈ C_cov(X, E')} w_cell(i),   (7.8)

where C_cov is the hypothetical set of cells that the robots may observe:

C_cov(X, E') = ⋃_{x ∈ Φ(X)} F_{cam}(x, E').   (7.9)

An obvious weighting scheme, which we apply often in this chapter, is to provide a uniform (fixed) reward for each newly observed cell:

w_new(i) = 0 if i ∈ ⋃_{t' ∈ {1,...,t}, r ∈ R} F_{cam}(x_{t',r}, E), and 1 otherwise (for newly observed cells).   (7.10)

However, other weighting schemes are also relevant. For example, Yoder and Scherer [199] propose a system for inspecting surfaces. Their surface frontier approach would be similar in spirit to a scheme that provides increased weight to unobserved cells that are near known occupied cells. Later, we will see that weighting cells by entropy produces a mutual information objective.
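Expected weighted coverage can be estimated by sampling environments from E_guess. The sketch below is a minimal illustration of (7.8)–(7.10) with an invented Bernoulli prior and a single-ray stand-in for F_cam; it is not the evaluation used in the experiments, which avoid sampling.

```python
# Hypothetical sketch of a sampled estimate of expected weighted coverage
# (7.8): draw environments from a guess distribution, accumulate the cells a
# set of planned views would observe in each sample, and average the weights.
import numpy as np

rng = np.random.default_rng(0)
shape = (20, 20)
p_occupied = 0.05                      # independent Bernoulli prior on cells

def sample_environment():
    return rng.random(shape) < p_occupied

def cells_observed(view, env, max_range=5):
    """Toy F_cam: a straight ray east of the view position, stopped by the
    first occupied cell (a stand-in for a full camera model)."""
    x, y = view
    observed = []
    for j in range(y, min(y + max_range, shape[1])):
        observed.append((x, j))
        if env[x, j]:
            break
    return observed

def expected_weighted_coverage(views, w_cell, n_samples=200):
    total = 0.0
    for _ in range(n_samples):
        env = sample_environment()
        covered = set()
        for v in views:
            covered.update(cells_observed(v, env))
        total += sum(w_cell(c) for c in covered)
    return total / n_samples

already_observed = {(3, j) for j in range(4)}
w_new = lambda c: 0.0 if c in already_observed else 1.0     # weighting (7.10)
print(expected_weighted_coverage([(3, 2), (7, 10)], w_new))
```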
Now let us discuss the properties of the expected coverage and the relationship to mutual information. The expected coverage (7.8) is normalized and satisfies alternating monotonicity conditions (including being monotonic, submodular, and 3-increasing). This follows because these conditions hold for weighted coverage (Theorem 9) and because the expectation forms a convex combination which preserves monotonicity conditions (and being normalized) [70]. Likewise, all the analysis for RSP planning in Chapter 5 applies to receding-horizon planning for exploration with expected coverage. Note that the expected coverage is not necessarily adaptive submodular [79]; see Appendix A.1 for more detail.
The previous section described how expected coverage has favorable properties for solving optimization problems. We now comment briefly on a property related to the exploration process. Specifically, the volumetric reward g_view should be non-zero if and only if the actual environment coverage (7.4) (given the true environment E) will increase after taking an observation. As such, the robot will never be rewarded by g_cov for visiting a state where it will not observe any unknown cells or obtain zero reward when it will observe unknown cells. A condition for this to hold for depth sensors is for the weight w_cell to be zero if and only if a cell i ∈ C has already been observed (i ∈ ⋃_{t' ∈ {1,...,t}, r ∈ R} F_{cam}(x_{t',r}, E)). Note that the uniform weighting scheme (7.10) satisfies this requirement by construction. This sanity condition follows, specifically for depth sensors, because each ray either:

1. Terminates (or reaches the maximum range) in known space for all environments that are consistent with prior observations, or

2. Terminates in unknown space, providing the occupancy value of at least one cell (even if the first unknown cell in the path of the ray is occupied).

If not for mutual information, we would define the weighted expected coverage (7.8), concisely, in terms of unobserved cells and the weight (7.10) as a constant for all cells. However, probabilistic occupancy does not lend itself to explicit distinctions between known and unknown values. In fact, the same argument is valid for Theorem 9 which describes the probabilistic coverage objective. Moreover, the expectation is itself weighted coverage, covering a possibly exponentially larger set. This is evident by observing that the sum over weighted coverage functions is equivalent to duplicating the set being covered for each summand and adjusting weights according to probability. Note that none of the complications that arise in Chapter 6 apply to this chapter, either for the 3-increasing condition or for the behavior of bounds on suboptimality for large numbers of robots, the latter because robots have clearly defined sensor ranges and are not observing objects with uncertain positions.

Most existing works on mutual information for occupancy mapping assume noisy measurements via either a simplified (often Gaussian) [37, 201] or more general [103] noise model. However, Zhang et al. [201] find that the choice of information metric and noise model has little impact on performance in exploration experiments. Likewise, Henderson et al. [87] observe that the sensor noise for modern lidar sensors and depth cameras is typically small compared to the maximum range. For these reasons, we assume that sensor noise is negligible for the purpose of evaluating mutual information for exploration and ignore sensor noise in this chapter. Additionally, prior works on mutual information for mapping [37, 103, 201] typically assume cell occupancy is independent according to the prior E_guess on the environment. The combination of cell independence and lack of sensor noise produces a special case of an expected coverage objective:

Theorem 12 (Noiseless mutual information with independent cells is 3-increasing). The mutual information I(E; Y(X)) between an environment E with uncertain occupancy and hypothetical future observations Y(X) can be written as

I(E; Y(X)) = E_{E' ∼ E_guess} [ ∑_{i ∈ C_cov(X, E')} H(c_i) ],   (7.11)

matching the form of (7.8), given that:
1. Cell occupancy probabilities E_guess are independent, and

2. There is no sensor noise.

This expression (7.11) has the form of expected weighted coverage (7.8) and is therefore 3-increasing.

The proof is included in Appendix B.7. Note that Theorem 12 implies that our results for RSP planning for 3-increasing functions apply to noiseless mutual information just as for expected coverage. This holds even though mutual information is not 3-increasing in general (see Fig. 5.2). Whether the sanity condition that Sec. 7.3.1b describes applies to mutual information depends on the updates to cell occupancy probabilities, as observed cells would have to be marked as occupied with probabilities of either zero or one to ensure zero entropy and reward. However, this can be attributed to pedagogy as this chapter puts relatively limited weight on probabilistic models. Instead, we note that Julian et al. [103, Theorem 2.5] prove similar properties for mutual information objectives in general.

This does not exclude the limiting case for weights w_cell approaching zero, which will occur for mutual information objectives if an unobserved cell is marked as free or occupied with probability approaching one. Alternatively, note that sensor noise is likely more relevant to perception tasks such as surface reconstruction.
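The weighting in Theorem 12 is easy to see numerically: under the stated assumptions, each cell a view reveals contributes its Bernoulli entropy, so mutual information is entropy-weighted coverage. The sketch below uses illustrative occupancy probabilities and, for simplicity, treats the set of revealed cells as fixed rather than dependent on the environment realization.

```python
# A small numeric sketch of the weighting in Theorem 12: with independent
# cells and noiseless sensing, the reward for a view is the sum of Bernoulli
# entropies of the cells it reveals (7.11). Probabilities are illustrative,
# and the set of revealed cells is treated as fixed for simplicity.
import numpy as np

def bernoulli_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

# Occupancy probabilities of the cells a candidate view would observe.
p_occ = [0.5, 0.5, 0.1, 0.0]      # the last cell is already known free

mutual_information = sum(bernoulli_entropy(p) for p in p_occ)       # entropy weights
unknown_cell_count = sum(1.0 for p in p_occ if 0.0 < p < 1.0)       # uniform weights
print("MI (bits):", round(mutual_information, 3),
      " unknown-cell count:", unknown_cell_count)
```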
7.3.2a Limitations of existing approximations for computing mutual information

Our presentation of mutual information (7.11) (and also expected coverage (7.8)) does not yet address computational challenges. While either could be evaluated via sampling, doing so would significantly increase computational costs for systems that need to react quickly to operate effectively [63, 76]. For this reason, many works on information-based exploration [37, 87, 118, 201] emphasize computational contributions and approximate evaluation. However, relatively little is known about how design decisions and approximations can affect decision-making and exploration performance. Moreover, Chapter 4 applied a CSQMI objective [37] while this chapter will apply a coverage-based objective instead. Our results for a small study indicate that this choice significantly improved exploration performance (by as much as 16%). However, we caution that we do not intend to establish conclusively that one objective is better than another and note that the optimistic coverage objective we compare to is itself a limiting case for mutual information objectives (see Sec. 7.3.3b). We believe that these connections between mutual information and coverage are useful for evaluating and improving mutual information objectives, and we highlight some of these points and provide additional results in Appendix A.4.
Let us now expand on the prior two sections (Sec. 7.3.1 and Sec. 7.3.2) which definevolumetric exploration objectives and their properties and discuss the ramifications ofthese observations for objective design.
For the purpose of this chapter, we select a degenerate prior over environments by optimistically assuming that unobserved space is empty, along with the uniform weighting scheme (7.10). This choice is similar in effect to numerous other works on exploration [30, 168] and is compatible with our definition and discussion of expected weighted coverage, despite failing to produce a meaningful distribution. Moreover, accurate evaluation of the optimistic coverage objective is trivial, even for joint observations by teams of robots.

Mapping applications and even numerous papers on exploration [37, 103, 201] frequently assume that unobserved cells are independent and occupied with a probability of 0.5. Given a prior on occupancy of p, a beam will terminate after traversing about 1/p cells, which works out to two for a prior of 0.5. However, the environments that robots explore may differ significantly from this assumption. Similarly, Henderson [88] provides detailed discussion and illustrations that demonstrate how the occupancy prior can affect decisions. In the limit, a prior with low occupancy probability assumes that all unobserved cells are unoccupied, like the optimistic coverage objective. In fact, optimistic coverage is equivalent to mutual information in this case after applying a scaling factor to normalize entropies of the unknown cells in (7.11) (and assuming entropies of observed cells are zero). This is a useful observation because it provides a special case of the mutual information that can be evaluated exactly and efficiently.

While writing this thesis, optimistic coverage served an important role to characterize the performance of submodular maximization for exploration. Because the objective values for optimistic coverage are not approximate and provide oracle rewards in some cases, we can better attribute potential causes for deficiencies in exploration performance.
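A quick simulation illustrates the effect of the occupancy prior on ray penetration; the numbers below are only a Monte-Carlo check of the 1/p claim, not an experiment from this chapter.

```python
# A quick Monte-Carlo check (illustrative only) of the claim that, under an
# independent occupancy prior p, a beam traverses about 1/p cells before it
# terminates: roughly two cells for p = 0.5 and far more as p decreases.
import random

def cells_until_hit(p, max_cells=10_000):
    n = 0
    while n < max_cells:
        n += 1
        if random.random() < p:      # this cell is occupied; the beam stops
            break
    return n

random.seed(0)
for p in (0.5, 0.1, 0.01):
    trials = [cells_until_hit(p) for _ in range(20_000)]
    print(f"p = {p:>4}: mean cells traversed = {sum(trials) / len(trials):.2f}"
          f"  (1/p = {1 / p:.0f})")
```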
Another useful property is for the objective to provide rewards for observations givenaccess to the environment E (i.e. ground truth) and thereby the true increment in theenvironment coverage (7.4). Comparing to an oracle is a useful tool for characterizinglimits on performance as having an oracle provides the planner with additional informationabout the environment. Prior works on problems related to exploration sometimesapply oracles similarly [158] while Choudhury et al. [48] use such oracles in a learningprocess to train an exploration policy.Notably, the optimistic coverage objective is an oracle for empty environments , and weprovide results for one such environment later in this chapter (Fig. 7.2). Likewise, a learned model may emulate an oracle [158, 190] and serve in a similar role aspart of an expected coverage objective (with a degenerate prior) or a mutual informationobjective (while introducing a small amount of uncertainty). Our results where the objec-tive is an oracle also provide insight into how well an exploration system with a learnedmodel could perform.
Finally, the prior E_guess might also encode a distribution over environments, such as one produced by a generative model where cells are not independent. Choudhury et al. [48] study expected coverage in a similar setting. For such general priors, mutual information and expected coverage may differ because cell occupancy may not be independent. This leads to a philosophical issue for robotic exploration: Should robots seek to observe portions of the environment which they are certain of but have not seen?
In this case, expected coverage encodes the affirmative (all unobserved cells are valuable), and mutual information the negative (only uncertain cells are valuable). Of course, this question is rhetorical, and answers will depend on the setting. However, we note that our analysis for RSP planning applies to the former, and the latter is a matter for future work.

Inspection reveals that some of this work [179] did not address the topic of priors in as much detail as remembered. Note that we still constrain the robot to operating in the known safe set X_safe(Y_t) despite having access to oracle rewards. An optimal policy for time-sensitive exploration (Sec. 7.1) would obtain strictly better performance with access to an oracle; given that we do not have access to an optimal policy, actual results may vary. Our implementation of the optimistic coverage objective includes access to the bounding box for exploration, so it is a true oracle for our Empty environment.

Figure 7.1: This visualization of sampled informative views in X_goal (7.12) (red arrows) and view distance (rainbow, with red corresponding to the least distance) demonstrates how sufficiently informative views near the boundary of the unobserved space (light gray) can play a similar role as frontiers for exploration.

In addition to being able to evaluate the value of nearby views over the duration of a planning horizon, robots should be able to reason about the value of visiting view points. A common approach is to navigate toward the nearest boundary of unoccupied and unknown space [198] (the frontier, Sec. 3.1.4), which is much like approaches that have robots navigate toward view points at or near frontiers [30, 36, 168]. The second term of the exploration reward (7.7) seeks to address this issue. Our distance reward g_dist is based on methods developed in our prior work [57]. This approach defines informative regions of the environment in terms of view value via a level set

X_goal = { x | g_view(x) ≥ ε_view, x ∈ X_safe(Y_t) },   (7.12)

where ε_view > 0 is a threshold on view value. We sample states from X_safe that meet this threshold (which we call informative views) similarly as described by [57]. Next, solving for shortest path distances from each point in X_safe to any point in X_goal produces a distance field d_goal : X_safe → ℝ_{≥0}. Figure 7.1 provides an example of such a distance field along with the sampled informative views.

Some notable changes are that we replace the novelty threshold with thresholds on yaw and translation distance and only count new views toward the sample limit. We also compute distances over the entire grid to obtain more idealized results, unlike our prior work which approximated values over a local sub-map [57].
80 robot-iters. (b)
448 robot-iters. (c)
816 robot-iters. (d)
Figure 7.2:
The images above visualize an example of the process of exploration of the Skylight environ-ment with 16 robots and RSP planning with n d = 6. Additionally, a video providing examples of the explo-ration process for each environment and number of robots is available at: https://youtu.be/B9j8LVIs384 a distance field along with the sampled informative views. Reducing distances in d goal thenbrings robots closer to valuable views in the goal region X goal . This distance field can becomputed efficiently (for low-dimensional problems as we do on a three-dimensional grid)using the fast marching method [183] which generalizes Dijkstra’s algorithm.The distance reward at time t is proportional to the greatest reduction in the pathdistance over the course of the trajectory g dist ( X r ) = α · max l ∈{ ,...,L } (cid:0) d goal ( x t ) − d goal ( x t + l ) (cid:1) . (7.13)Note that this value is non-negative. Selecting the maximum along the trajectory is in-tended to provide sane rewards for when robots are near the informative region (7.12) andmay exit the region by the end of the planning horizon.Also, just as for frontier methods, this distance term provides the system with a weaknotion of completeness. So long as there are states with observations worth a given value,robots will navigate to those states once local rewards from g view have decreased sufficiently.Finally, observe that the distance to the nearest view in (7.13) could be replaced with thedistance to a specific goal location. This provides an avenue for incorporating methods forgoal assignment (e.g. by solving a cardinality-constrained problem, Prob. 2) or schedulingwhich is common in other works related to multi-robot exploration [46, 134, 168] The following section describes the design of the exploration experiments for this chapter.We provide results for 10 trials per each configuration and environment and for variousnumbers of robots (4, 8, 16, and 32). For intuition, Fig. 7.2 visualizes an example of theexploration process and provides a link to a video providing examples for all environmentsand numbers of robots.
The following section describes the design of the exploration experiments for this chapter. We provide results for 10 trials per configuration and environment and for various numbers of robots (4, 8, 16, and 32). For intuition, Fig. 7.2 visualizes an example of the exploration process and provides a link to a video with examples for all environments and numbers of robots.

The robot dynamics and sensor model are the same as in the kinematic model in Sec. 4.6.1b. The set of control actions consists of fixed translations and yaw rotations, and the depth camera has a field of view of … ° × … °, facing forward, oriented with the long axis vertical. Unlike Chapter 4, we do not downsample rays when evaluating mutual information and coverage objectives.

Planner     | MCTS samples | c_p  | Horizon (L) | View Value Threshold (ε_view) | View Distance Factor (α) | Discount Factor
Sequential  | 200          | 1500 | 10          | 900                           | 500                      | 0.7
Myopic      | 200          | 1500 | 10          | 300                           | 700                      | 1.0

Table 7.1: Planner parameters for receding-horizon exploration. The myopic and sequential planners were tuned separately to maximize performance for 16 robots in the Boxes and Empty environments (Table 7.2). All RSP planners use the parameters for sequential planning. The parameter c_p belongs to the MCTS planner [27] and is set to roughly half the typical value of the objective for a single robot.

Like prior chapters, robots plan by collectively solving receding-horizon planning problems (7.6)—here with the optimistic coverage objective (Sec. 7.3.3a)—and plan individually via Monte-Carlo tree search (MCTS) and collectively via some method for submodular maximization. For submodular maximization, we compare sequential planning (Alg. 2), myopic planning (wherein robots plan via MCTS and ignore others' decisions), and RSP planning with 1, 3, and 6 rounds (n_d), except for the larger Office environment (see Sec. 7.5.3) for which we only provide results for n_d = 6. We selected parameters by iteratively varying values of individual parameters in simulation trials in the Boxes and Empty environments. Parameters were selected separately for myopic and sequential planners, with RSP inheriting the parameters for sequential planning. Table 7.1 lists the parameters for planning.
We also discount rewards and treat each robot as if it has an independent probability of failure after each time-step. This discounting strategy produces a distribution over the states which robots will visit that is independent of the realization of the environment and is compatible with the theory for the objectives we study (Sec. 7.3.1 and 7.3.2). Although we will not go into detail, evaluation of the optimistic coverage objective also remains straightforward. Ideally, discounting prevents pathological behaviors where robots indefinitely put off rewards to a future time-step and would weight one robot's early-horizon view above another's overlapping late-horizon view (which is ostensibly more uncertain). However, preliminary experiments demonstrated relatively minor impacts on performance.

Appendix A.5 compares suboptimality for RSP to DSGA (proposed for exploration in Chapter 4). Although DSGA generally outperforms RSP for a given number of planning rounds, RSP remains a more practical choice for distributed settings. Note that RSP with a single round is equivalent to myopic planning but uses the same parameters as sequential planning, so that any adverse impacts of parameter selection for the myopic planner will be evident.

7.5.3 Environments and simulation scenarios

The simulation results evaluate performance across a variety of environments (listed in Table 7.2). In each case, robots start with random yaw and slightly perturbed positions near a fixed starting location. We determined maximum coverage values and the lengths of the simulation trials (iterations per robot) through longer preliminary experiments with a low view value threshold (ε_view = 100) to encourage more complete exploration. Additionally, all maps use a 0.1 m resolution so that each cubic meter contains 1000 grid cells. The environments that we study include synthetic environments that encourage different kinds of motions and behaviors (e.g. upward motion is often slow because robots cannot easily observe space above them while moving upward), an empty environment which seeks to characterize maximal steady-state performance (and where optimistic coverage provides oracle rewards, Sec. 7.3.3c), and more complex office-like and subterranean environments.

The following describes how we evaluate the performance of the submodular maximization solvers (e.g. RSP) and overall exploration performance in terms of completion time.
While most of this thesis focuses on obtaining bounds on solution quality for broad classes of problems, submodularity also produces certain online bounds (Sec. 3.5.3e, [132]) which can provide tighter guarantees for individual solutions [79, 109, 117]. These bounds also apply to any feasible solution, which makes them suitable for comparing different kinds of planners. Most works that we are aware of study these online bounds in the context of cardinality-constrained problems. Adapting these bounds to simple partition matroids produces some slight differences compared to the cited works, which we present below. Specifically, maximization steps apply to blocks of the partition matroid instead of the ground set. Consider any—possibly incomplete—feasible solution X ∈ I to an instance of Problem 4 and an optimal solution X^⋆. The following holds, respectively, by monotonicity, submodularity, and greedy choice:

g(X^⋆) ≤ g(X, X^⋆) ≤ g(X) + ∑_{x^⋆ ∈ X^⋆} g(x^⋆ | X) ≤ g(X) + ∑_{r ∈ R} max_{x ∈ B_r} g(x | X).   (7.14)

We apply two instances of the above bound. For the first, X is the full solution returned by the planner (assigning actions to all robots), which we refer to simply as the online bound because it uses the current solution to bound the optimal solution.

Aside from the initial configuration, the planners also introduce stochasticity into the results.
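Both instances of (7.14) reduce to a single pass over the robots' action sets. The sketch below (toy coverage objective and action sets, not the experiment code) computes the online bound from a complete solution and the oblivious bound from the empty set.

```python
# Hypothetical sketch of the online and oblivious bounds from (7.14) for a
# toy coverage objective; the action sets are placeholders.
from itertools import chain

def g(X):
    return len(set(chain.from_iterable(X)))

def marginal(x, X):
    return g(list(X) + [x]) - g(list(X))

blocks = {                       # B_r: candidate trajectories per robot
    "r1": [frozenset({1, 2, 3}), frozenset({4})],
    "r2": [frozenset({3, 4}), frozenset({5})],
    "r3": [frozenset({1, 6}), frozenset({2, 3})],
}
solution = [frozenset({1, 2, 3}), frozenset({5}), frozenset({1, 6})]  # planner output

online = g(solution) + sum(max(marginal(x, solution) for x in B)
                           for B in blocks.values())
oblivious = sum(max(g([x]) for x in B) for B in blocks.values())
print("g(X):", g(solution), " online bound:", online, " oblivious bound:", oblivious)
print("suboptimality lower bound:", round(g(solution) / min(online, oblivious), 2))
```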
Boxes (bounding box volume 216 m³; exploration volume 199 m³): Scattered boxes cause occlusions in a 6 m cube. Robots start offset at the bottom and move upward.

Hallway-Boxes (bounding box volume 217 m³; exploration volume 202 m³): A rearrangement of the Boxes environment into a 12 m square prism. Robots start at one end and move toward the other.

Plane-Boxes (bounding box volume 227 m³; exploration volume 212 m³): A rearrangement of the Boxes environment into a 2 m tall square planar configuration, and robots start in the center. This highlights performance in common, primarily two-dimensional environments.

Empty (bounding box volume 500 m³; exploration volume 500 m³): Robots start on one end of a 20 m hallway-like square prism that is completely devoid of obstacles, which highlights steady-state performance in open space.

Skylight (bounding box volume N/A; exploration volume 220 m³): Mesh based on survey data [99, 100, 179, 196] from the Indian Tunnel skylight at Craters of the Moon National Park, scaled down to ∼35% of actual size. Robots start at the top of the mouth (now 7 m in diameter).

Office (bounding box volume 1300 m³): A simulated 36 m × 18 m × … office environment.

Table 7.2: This table lists details and descriptions for the different test environments. The bounding box volume provides the fixed volume of the bounding box for the experiment while the exploration volume lists the approximate maximum environment coverage volume for exploration. Images portray partially explored environments with unknown space (excluding subterranean environments) in gray and occupied space in black.
Figure 7.3:
The above illustrates a representative example for the online and oblivious bounds onsuboptimality (solution value over bound) with five trials for sequential planning and 16 robots, all startingnear the same position . Note that the online bound is tighter early when robots are close together, andthe oblivious bound becomes tighter later, as robots spread out, and more so even later, after robots haveobserved most of the environment (environment coverage nears its maximum at about 1000 robot-iterationsfor these trials). Note that this bound is not exact as we approximate maximization steps in (7.14) withMCTS. Shaded regions show the standard error, and dotted lines show individual trials. because it uses the current solution to bound the optimal solution. Next, we call the casewhere X = ∅ is the empty set the oblivious bound because this case can bound the optimalsolution without planning. This oblivious bound reduces to g ( X (cid:63) ) ≤ (cid:80) r ∈R max x ∈ B r g ( x )which would produce an optimal solution if g were additive (modular).Typically, the online case is tightest for solutions where robots observe all nearby cellssuch as when robots are operating in close proximity (illustrated in Fig. 7.3). On the otherhand, the oblivious case is generally loose when robots are close together with overlappingfields of view and tightest when robots are spread out and observing disparate regions ofthe environment. Regardless, observe that the tightest bound typically exceeds 70% whichis significantly tighter than the a-priori bound of 1/2 for sequential planning.Later, we will use these bounds to characterize solution quality across trials with dif-ferent planner configurations in lieu of comparison on common subproblems.
We evaluate task completion in terms of time (simulation time-steps which we refer to asiterations) to reach quotas for environment coverage (see Sec. 7.1). We provide results forquotas at 90% (completion the exploration task) and 30% (early progress in exploration)of the maximum coverage for each environment (Table 7.2). The latter seeks to provide a Note that (7.14) provides an upper bound on g ( X (cid:63) ) so we can characterize the suboptimality of othersolutions by comparing to the right-hand-side. Intuitively, the online bound might also be tight late when there is little left to explore. However,Fig. 7.3 demonstrates that such behavior can be uncommon in practice, primarily (but not exclusively)due to the distance term in the objective (7.7).
Figure 7.4 summarizes the exploration process from the simulation results in terms of environment coverage (7.4) and highlights the maximum coverage values (Table 7.2) and completion thresholds (Sec. 7.6.2). Although there is not always much variation across planner configurations, these plots also illustrate consistency in coverage rates, graceful degradation in performance, and reliably complete exploration. The latter, reliable task completion, is evident in the asymptotic convergence and decreasing variance at the end of each trial. This is a product of the distance reward (Sec. 7.4), and we note that Corah et al. [57] provide additional results related to this feature. The rest of this section will go into more detail on the former points regarding time to completion.
Figure 7.5 illustrates how coverage rates vary as robots reach the early progress and completion thresholds for different environments and numbers of robots. All environments exhibit some similar trends, as would be expected: slowing exploration over time and with increasing numbers of robots. Across all this data, coverage rates vary from 120 to 630 cells per robot per iteration, a factor of about 5×, and the Empty environment alone exhibits a variation of 3×. While not all of this slow-down is likely to be avoidable, there may be room to improve performance significantly through the end of each trial via more effective goal assignment and long-term planning. On the other hand, coverage rates for completing the exploration task are somewhat more consistent, excluding the Empty environment, and per-robot performance varies by a factor of about two. Still, Plane-Boxes—where robots can spread out quickly to cover the volume—is the only environment where new robots consistently maintain coverage rates (and contribute to improving completion times). On the other hand, robots slow down significantly after beginning exploration in the Skylight environment, possibly because the robots can spread out quickly in the central volume but slow down while covering the passages on either side.

This figure depicts results for RSP, which performs reliably and avoids the case for Sequential in the Hallway-Boxes environment in which not all trials completed the exploration task.
[Figure 7.4 panels: environment coverage (cells) versus robot-iterations for (a) Boxes, (b) Hallway-Boxes, (c) Plane-Boxes, (d) Empty, (e) Skylight, and (f) Office, with traces for myopic, sequential, and RSP planners and n_r ∈ {4, 8, 16, 32}.]
Figure 7.4:
Environment coverage results for each environment. Shaded regions delineate the standarderror. Black lines demarcate (from top to bottom) the maximum environment coverage, the completionthreshold, and the early progress threshold.
These plots highlight general performance trends. Other plots better highlight differences between planners.

[Figure 7.5 panels: coverage rate (cells per iteration, top; cells per robot-iteration, bottom) versus number of robots for each environment at (a) the early progress threshold and (b) complete exploration.]
Figure 7.5:
Coverage rates across environments up to (a) the threshold for early progress and (b) up to completing the exploration task, both per iteration (top, solid) and per robot per iteration (bottom, dashed). In general, the exploration process slows after the beginning of each trial, and contributions per robot decrease gradually as more are added.

[Figure 7.6 panels: lower bounds on suboptimality versus number of robots for (a) Boxes, (b) Hallway-Boxes, (c) Plane-Boxes, (d) Empty, (e) Skylight, and (f) Office, comparing sequential and RSP planners.]
Figure 7.6:
Lower bounds on suboptimality for receding-horizon planning for exploration in each environment. Plots provide mean values and standard error (w.r.t. the standard deviation of trial means) for data up to the completion time of each trial. These results demonstrate how RSP planning performance approaches that of sequential planning with increasing numbers of planning rounds from n_d = 1 to n_d = 6. We exclude the Myopic planner as the suboptimality bounds are not directly comparable to those of the other planners because the choice of parameters (Table 7.1) changes the objective values. Instead, RSP with n_d = 1 also plans myopically but has parameters that are comparable to the other planner configurations, and Table 7.3 includes data for all planners.

Later, Sec. 7.7.3 (which also includes completion times in Table 7.4) will go into more detail on exploration times as a function of the method of planning for multi-robot coordination.

Figure 7.6 presents results on mean values of the lower bound on suboptimality (the greater of the two bounds in Sec. 7.6.1), and Table 7.3 lists this data as well. These plots clearly demonstrate how the suboptimality of RSP planning approaches that of sequential planning (Alg. 2) with increasing numbers of planning rounds (n_d), as the performance bounds for these planners would suggest (Theorem 7).

Bounds are computed approximately via MCTS (for single-robot planning) and are only representative of suboptimality in multi-robot coordination.
Boxes .
90 0 .
015 0 .
96 0 .
010 0 .
95 0 .
008 0 .
96 0 .
012 0 .
96 0 . .
77 0 .
021 0 .
89 0 .
012 0 .
87 0 .
021 0 .
89 0 .
013 0 .
90 0 . .
69 0 .
008 0 .
80 0 .
016 0 .
76 0 .
020 0 .
79 0 .
010 0 .
80 0 . .
70 0 .
013 0 .
72 0 .
008 0 .
68 0 .
012 0 .
71 0 .
006 0 .
71 0 . Hallway-Boxes .
92 0 .
022 0 .
97 0 .
008 0 .
96 0 .
014 0 .
97 0 .
008 0 .
97 0 . .
83 0 .
019 0 .
94 0 .
008 0 .
90 0 .
014 0 .
93 0 .
010 0 .
93 0 . .
72 0 .
022 0 .
86 0 .
018 0 .
82 0 .
014 0 .
86 0 .
015 0 .
86 0 . .
70 0 .
011 0 .
79 0 .
018 0 .
73 0 .
018 0 .
78 0 .
015 0 .
80 0 . Plane-Boxes .
96 0 .
007 0 .
98 0 .
006 0 .
97 0 .
010 0 .
98 0 .
004 0 .
98 0 . .
88 0 .
011 0 .
96 0 .
009 0 .
94 0 .
006 0 .
96 0 .
007 0 .
96 0 . .
76 0 .
016 0 .
90 0 .
004 0 .
86 0 .
012 0 .
89 0 .
015 0 .
89 0 . .
69 0 .
008 0 .
79 0 .
016 0 .
74 0 .
015 0 .
78 0 .
011 0 .
78 0 . Num. Robot Myopic Sequential RSP RSP RSP Avg. Std. Avg. Std. Avg. Std. Avg. Std. Avg. Std.
Empty .
94 0 .
013 0 .
98 0 .
003 0 .
97 0 .
012 0 .
97 0 .
008 0 .
98 0 . .
87 0 .
022 0 .
95 0 .
008 0 .
92 0 .
012 0 .
95 0 .
008 0 .
94 0 . .
75 0 .
014 0 .
88 0 .
012 0 .
82 0 .
024 0 .
88 0 .
012 0 .
88 0 . .
68 0 .
013 0 .
79 0 .
020 0 .
71 0 .
026 0 .
80 0 .
023 0 .
79 0 . Skylight .
97 0 .
006 0 .
99 0 .
004 0 .
99 0 .
004 0 .
99 0 .
004 0 .
99 0 . .
90 0 .
016 0 .
96 0 .
008 0 .
95 0 .
011 0 .
96 0 .
008 0 .
96 0 . .
79 0 .
023 0 .
90 0 .
014 0 .
87 0 .
018 0 .
90 0 .
011 0 .
91 0 . .
71 0 .
021 0 .
83 0 .
009 0 .
78 0 .
018 0 .
82 0 .
012 0 .
82 0 . Office .
98 0 .
006 0 .
99 0 .
001 – – – – 0 .
99 0 . .
95 0 .
007 0 .
97 0 .
007 – – – – 0 .
97 0 . .
88 0 .
011 0 .
93 0 .
006 – – – – 0 .
93 0 . .
78 0 .
016 0 .
86 0 .
008 – – – – 0 .
85 0 . Table 7.3:
Lower bounds on suboptimality for receding-horizon planning for exploration in each environ-ment. See Fig. 7.6 for more detail. bounds for these planners would suggest (Theorem 7). Moreover, we see that actualsuboptimality is consistently better than the worst case bounds (even approaching onefor fewer robots) which is consistent with observations from related works [79, 109, 117].Likewise, the performance gaps widen with increasing suboptimality and larger numbersof robots (with the difference reaching 8% for in Empty). The decrease in these boundswith increasing numbers of robots is representative of robots operating in closer proximityand with overlapping observations over the planning horizon (see (7.6)). The picture for planning performance becomes more complex when we look at the explo-ration process which is affected by factors such as the design of the exploration objective(7.7), (including the view and distance components) and the use of a receding-horizon withfixed trajectories (7.6). This is evident from Table 7.4 which lists statistics for time to com-pletion and time to reach the early progress threshold (see Sec. 7.6.2). Most planners fora given environment and number of robots perform comparably both at the end of eachtrial and near the beginning.The exception for complete trials is the Plane-Boxes environment where both the My-opic and RSP planners perform worse for 32 robots (by about 6%). This is likely a productof how the small size and limited vertical mobility of the Plane-Boxes environment producesmore frequent and complex interactions between robots.Figure 7.7 compares performance across planners in terms of the rate of increase inenvironment coverage (via the data in Table 4.1). Although the variations in coveragerates are less significant than for suboptimality, the trends that the data does representare consistent with expectations: the Myopic and RSP planner configurations (which eachinvolve planning myopically, robots ignoring others’ decisions) perform worst wheneverthere are significant differences across planners (e.g. by 14% for 16 robots in the Hallway-Boxes environment).The differences in these coverage rates also narrow somewhat going from 16 to 32 robots,unlike the trends for suboptimality (Fig. 7.6) where the performance gaps widen. The pro- Strictly, the bounds for RSP planning only establish convergence to the same worst case suboptimalityas sequential planning (1/2), but we expect comparable suboptimality in practice. E a r l y C o v . R a t e ( ce ll s p e rr o b o t - i t e r a t i o n ) Myp.Seq.RSP RSP RSP (a) Boxes E a r l y C o v . R a t e ( ce ll s p e rr o b o t - i t e r a t i o n ) (b) Hallway-Boxes E a r l y C o v . R a t e ( ce ll s p e rr o b o t - i t e r a t i o n ) (c) Plane-Boxes E a r l y C o v . R a t e ( ce ll s p e rr o b o t - i t e r a t i o n ) (d) Empty E a r l y C o v . R a t e ( ce ll s p e rr o b o t - i t e r a t i o n ) (e) Skylight E a r l y C o v . R a t e ( ce ll s p e rr o b o t - i t e r a t i o n ) (f) Office
Figure 7.7: Coverage rates per robot up to the early progress threshold, for each environment ((a) Boxes, (b) Hallway-Boxes, (c) Plane-Boxes, (d) Empty, (e) Skylight, (f) Office), are generally similar across planners without many significant differences. Still, the differences that do exist consistently highlight deficiencies in the Myopic and RSP configurations (each plans myopically with different parameters). Shaded regions depict standard error.

(a) Early progress times

Num. Robots    Myopic          Sequential      RSP             RSP             RSP
               Avg.    Std.    Avg.    Std.    Avg.    Std.    Avg.    Std.    Avg.    Std.
Boxes
Hallway-Boxes
Plane-Boxes
Empty
Skylight
Office
               1,128   55      1,130   60      –       –       –       –       1,110   60
  16           1,685   25      1,566   70      –       –       –       –       1,580   49
  32           2,672   88      2,578   87      –       –       –       –       2,646   63

(b) Completion times

Num. Robots    Myopic          Sequential      RSP             RSP             RSP
               Avg.    Std.    Avg.    Std.    Avg.    Std.    Avg.    Std.    Avg.    Std.
Boxes
               1,198   56      1,214   46      1,276   86      1,254   65      1,243   56
Hallway-Boxes
  4            ∞       –       676     58      648     37      666     47
  8            737     58      701     48      709     34      725     56      707     35
  16           942     38      945     43      941     40      935     46      934     53
  32           1,495   55      1,457   55      1,491   67      1,507   53      1,497   58
Plane-Boxes
Empty
  4            1,078   24      1,039   29      1,038   32      1,036   33      1,067   36
  8            1,138   56      1,091   49      1,111   27      1,072   32      1,075   42
  16           1,399   55      1,388   43      1,436   47      1,389   36      1,391   43
  32           2,174   47      2,202   72      2,211   61      2,217   44      2,222   38
Skylight
  4            1,008   192     948     155     1,024   153
  8            907     72      908     158     896     69      928     90      915     74
  16           1,014   58      1,012   42      1,124   104     1,012   80      1,002   53
  32           1,418   90      1,415   60      1,452   52      1,434   41      1,423   66
Office
  4            3,675   1,162   3,696   312     –       –       –       –       ∞       –
  8            3,831   236     4,056   466     –       –       –       –       3,747   178
  16           4,538   276     4,482   222     –       –       –       –       4,623   616
  32           6,477   275     6,276   206     –       –       –       –       6,250   151

Table 7.4:
Completion times (in robot-iterations) and times for reaching the early progress threshold for each environment and planner (see Sec. 7.6.2).

The differences in these coverage rates also narrow somewhat going from 16 to 32 robots, unlike the trends for suboptimality (Fig. 7.6) where the performance gaps widen. The process for parameter selection (Table 4.1), which focused on completion times for 16 robots, partially explains this phenomenon. However, this may suggest that the receding-horizon optimization problems we formulate become less representative of actual performance with increasing numbers and crowding of robots. One possible cause is that the regions robots plan to observe toward the end of their planning horizons are frequently observed sooner by other robots. This produces uncertainty in future actions that the assumption of fixed trajectories does not account for but which might be alleviated by selecting a smaller discount factor.
The system design and results from this chapter significantly improved on the earlier work in Chapter 4. Adding a term for the distance to informative regions of the environment to the objective ensured reliable task completion, which was not true of our prior results (Fig. 4.4). Likewise, employing a coverage-based objective also improved performance (by as much as 16%, see Appendix A.4). Incidentally, more mundane changes also contributed. Running experiments on a 16-core AMD Ryzen processor enabled us to dismiss approximations such as down-sampling rays and to perform more effective parameter tuning.

This chapter also overcame an important challenge that arose from this thesis by resolving limitations on applying RSP planning to exploration. This came about by demonstrating that mutual information on occupancy grids without noise is 3-increasing. Moreover, this ensures that the suboptimality guarantees for RSP apply to receding-horizon planning for exploration: constant-factor suboptimality, approaching 1/2, for a fixed number of planning rounds (n_d) and any number of robots (subject to the extent of inter-robot interaction via the pairwise weights). To reiterate prior chapters, this provides an O(n_r) speed-up compared to sequential planning (Alg. 2) while approaching the same suboptimality guarantee. Additionally, while this chapter did not focus on distributed algorithms, the next will present a distributed implementation of RSP and its application to similar exploration tasks.

Still, realizing significant improvements in task performance via submodular maximization and RSP planning proves challenging. Our results narrow this gap by establishing significant improvements in suboptimality on the receding-horizon subproblems as well as improvements in coverage rates early in the exploration process.

There are a few clear avenues for further reducing completion time and sustaining improvements in coverage rates. Regarding solution quality, the online lower bounds suggest that, although solutions to the receding-horizon optimization subproblems predominantly outperform the worst-case bound (1/2), they may still be far from optimal (as low as 70%). Iteratively improving solutions [7, 184] or developing an applicable variant of the continuous greedy algorithm [31] could each improve suboptimality. However, there is no guarantee that doing so would significantly improve exploration performance; this is evident from the large gaps in suboptimality for trials with 32 robots which do not translate into commensurate improvements in completion times.

The exploration process also often slows after the beginning of a trial. In some of these cases, better routing and goal assignment [134, 168] could alleviate this issue. Such approaches could be implemented within our framework by swapping the goal regions of the distance reward (7.12) for distances to specific goals.

Chapter 8
Implementation of Distributed, Receding-Horizon Planning for Exploration
While the last chapter focused on task performance in exploration, this chapter focuses on the implementation of a distributed, receding-horizon planner. The planner is synchronous and runs in real time in an anytime fashion. Moreover, the implementation is simple and does not depend on information about other robots, aside from the sensor data that those robots use to construct maps of the environment.

A number of works propose distributed or decentralized algorithms for online multi-robot informative path planning problems [7, 18, 151, 160, 174] much like those that arise in this thesis (as opposed to offline, equivalently non-adaptive, problems where a plan is computed in its entirety before execution time [169]). Fewer present results for planning in real time in simulation or with real robots [38, 160, 174] as we do in Chapter 4, and typically with no more than three robots. Furthermore, actual implementations may still be centralized, again in line with our earlier work [160]. On the other hand, Sukkar et al. [174] provide results for two robots in an agricultural active perception task with a distributed implementation of Dec-MCTS. With this chapter, we seek to provide an implementation of a distributed sensor planning system that is suitable for large numbers of robots.

Auction algorithms can also solve submodular maximization problems [194] and are perhaps more mature: Ponda [146] provides results for a task assignment problem over wireless communication with both aerial and ground robots and as many as six robots at once. Those results demonstrate a version of the Consensus-Based Bundle Algorithm (CBBA) [46].

At a high level, agents in auction algorithms like CBBA compute marginal gains (bids) and iteratively communicate sets of assignments to neighbors, seeking to come to consensus on the maximization steps of Alg. 1. We provide a comparison to analogous auction algorithms in terms of solution quality and messaging costs. While auctions converge quickly given full access to objective information, our results indicate that constraints on information access slow convergence significantly.
Let us begin with a brief background on distributed algorithms and communication networks.
This chapter describes a partially synchronous implementation of R-lRSP based on message timing and with distributed memory [124]. For contrast, the directed graph structure of RSP also lends itself to asynchronous implementation. For asynchronous models, computation times, message arrival times, and message ordering may all vary. An asynchronous version of RSP could be implemented by providing access to neighbors in the directed graph structure of the planner and making each robot wait to start planning until receiving decisions from all in-neighbors.
Given a network of agents, there are many ways by which those agents may communicate. One of the most common is unicast, which we also refer to as point-to-point communication. For unicast communication, individual agents send messages to other individuals. On the other extreme, broadcast refers to sending messages from one agent to all others. The actual implementation we present effectively implements broadcasts via ROS. Communication neighborhoods for R-lRSP are also naturally amenable to multicast communication as the robots send identical messages to each of their neighbors.
This chapter presents a distributed algorithm for implementation on teams of mobile robots. However, operation in subterranean and urban environments can limit communication between robots [138]. Designers may then enable communication between robots that are not immediately connected by establishing a mesh network over the team of robots. One complication with this approach is that connectivity between robots can change as they move about the environment. Networks that address this challenge by allowing links to change and by reconstructing routes online are called mobile ad hoc networks (MANETs). One example of a MANET protocol that may be appropriate for use alongside the algorithm we propose is Better Approach To Mobile Adhoc Networking (BATMAN) [115]. However, we note that support for multicast communication appears incomplete, and implementations of RSP may be limited to unicast communication, depending on the underlying mesh network.
This section describes a distributed implementation of RSP with a focus on application to exploration problems much like those in prior chapters. Toward this end, consider a team of robots R = {1, . . . , n_r} exploring some environment. The robots plan in a distributed manner and collectively solve receding-horizon sensing problems (Prob. 4) via methods for submodular maximization while simultaneously maintaining distributed representations of the environment. The following describes that distributed planner and its operation.

The distributed algorithm in this chapter is based on a partially synchronous, distributed-memory model. The implementation itself is timing-based [124], and robots solve receding-horizon submodular maximization problems (Prob. 4) in epochs with fixed start and end times. For the purpose of presentation, we assume that robots transmit messages reliably and instantaneously, potentially on some mesh network. Likewise, we assume robots have access to synchronized clocks; although this work does not address clock synchronization, the time scales in this work (seconds) are much slower than the accuracy (milliseconds) that common methods such as the Network Time Protocol (NTP) can provide [131]. Regarding information access, robots have access to local models of the environment as in Sec. 3.6.1 and are only able to accurately approximate marginal gains for their own actions, e.g. due to latency in updates from distant robots.
Robots have access to some local approximation of the environment θ_i, e.g. an occupancy map. Maintaining models of the environment involves some sort of distributed perception system. If robots simply share sensor data, the number of messages sent is quadratic in the number of robots. Compressed representations [57] can alleviate this burden somewhat, as can modifying the system to avoid the need for complete models.

Along the lines of the definition of the R-lRSP planner in Sec. 5.5.4, we assume robots communicate with a local neighborhood N_i^c for each robot i ∈ R. This neighborhood may be based on metric distance or a number of hops in a communication graph. Then, aside from maintaining the local environment model, the only access to (or knowledge of) the multi-robot team and mesh network that we assume is the ability to communicate with these neighbors N_i^c. In our analysis, we assume point-to-point messaging on the mesh network, with messages possibly traveling for multiple hops. However, the design is amenable to other modalities such as multicast.

The exploration system in this chapter also includes a view distance reward like the one Sec. 7.4 describes. To avoid growth in computation time, we compute the distance field only on a sub-map around the robot as in our prior work [57]. More generally, a distributed road mapping strategy would be appropriate and in line with recent works on robotic exploration [186, 195].
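As a small illustration of the metric-distance option for these communication neighborhoods, the sketch below computes N_i^c from robot positions; the function name and the choice of Euclidean distance in three dimensions are assumptions made for illustration rather than details taken from the implementation.

```python
def communication_neighbors(positions, i, r_c):
    """Metric-distance communication neighborhood N_i^c for robot i.

    positions: list of (x, y, z) robot positions; r_c: communication range.
    A hop-based neighborhood would instead query the mesh-network topology.
    """
    xi, yi, zi = positions[i]
    return [j for j, (x, y, z) in enumerate(positions)
            if j != i and ((x - xi)**2 + (y - yi)**2 + (z - zi)**2) ** 0.5 <= r_c]
```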
Algorithm 5 describes the implementation of RSP. Robots solve instances of receding-horizon planning problems over the course of each epoch, once every Δ_d seconds, and planning for individual robots runs for Δ_p = Δ_d / n_d seconds (less some time for slack in practice), recalling that n_d refers to the number of sequential planning rounds for the RSP planner.

At the beginning of each epoch, robots sample their assignments to planning rounds uniformly at random (lines 8–9). Each robot i ∈ R then waits for the beginning of its respective round (line 10) and, meanwhile, listens for planned actions from robots in its neighborhood N_i^c and receives X^{d_i}_{N_i^in} (line 11). The robot then plans sensing actions (line 12) over a receding horizon in an anytime fashion (in our case via MCTS) given those received actions, its state x_i, and its local representation of the environment θ_i. Finally, robots send their planned actions to their neighbors and execute their plans (lines 13–14). The following sub-sections expound on the implementation and behavior of this distributed algorithm.

The messages that Alg. 5 uses to send and receive decisions consist of:
1. the unique id of the sender,
2. the number of the current planning epoch, and
3. a representation of the finite-horizon plan (e.g. the initial state and time and a sequence of action indices).
A typical decision message consists of about 130 bytes.
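For concreteness, a decision message with these three fields might be represented as follows; the class and field names are illustrative assumptions, not the message schema actually used in the implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DecisionMessage:
    """Planned-action message exchanged in Alg. 5 (illustrative field names)."""
    sender_id: int                           # unique id of the sending robot
    epoch: int                               # number of the current planning epoch
    start_state: Tuple[float, float, float]  # initial state of the finite-horizon plan
    start_time: float                        # time at which the plan begins
    action_indices: List[int]                # sequence of motion-primitive indices
```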
Algorithm 5  Synchronous, distributed implementation of Range-limited Randomized Sequential Partitions (R-lRSP) from the perspective of robot i
 1: n_e ← number of the current planning epoch
 2: n_d ← number of planning rounds
 3: Δ_d ← duration for execution of the distributed planner
 4: Δ_p ← Δ_d / n_d (planning time per robot)
 5: N_i^c ← robots (neighbors) within communication range
 6: θ_i ← belief (e.g. map) available to robot i
 7: x_i ← state (e.g. position) of robot i
 8: Wait for the start of the current planning epoch at time n_e Δ_d
 9: d_i ∼ {0, . . . , n_d − 1} (randomly select one of the n_d planning rounds)
10: At time n_e Δ_d + d_i Δ_p:
11:   Receive: X^{d_i}_{N_i^in} (listen for action selections from nearby robots N_i^in ⊆ N_i^c)
12:   x_i^{d_i} ← PlanAnytime(θ_i, x_i, X^{d_i}_{N_i^in}, Δ_p) (plan given available time)
13:   Send: x_i^{d_i} to N_i^c (send plan with epoch n_e for neighbors in later rounds)
14:   Execute: x_i^{d_i}

The implementation of Alg. 5 differs from the other distributed algorithms in this thesis (aside from being the only actual distributed implementation) because it avoids depending on objects related to the planning process. Instead, the implementation focuses on handling messages and scheduling planning times. As such, the planning step (PlanAnytime) is simply a callback with messages and planning time as inputs and outputs.

We chose this design to simplify the process of augmenting existing single-robot sensor planning systems with distributed reasoning. The main limitation of this approach is that the user becomes responsible for tracking any statistics related to planning, such as objective value or solution quality.
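A minimal sketch of this per-robot epoch loop appears below, assuming user-supplied callbacks for the anytime planner, messaging, and execution; these interfaces, and the use of wall-clock time for epoch boundaries, are assumptions made for illustration.

```python
import random
import time

def run_epoch(robot, n_d, delta_d, plan_anytime, receive_available, send, execute):
    """One epoch of the synchronous distributed R-lRSP planner (sketch of Alg. 5)."""
    delta_p = delta_d / n_d                    # planning time per round
    n_e = int(time.time() // delta_d) + 1      # upcoming planning epoch
    epoch_start = n_e * delta_d

    d_i = random.randrange(n_d)                # sample a planning round uniformly at random
    time.sleep(max(0.0, epoch_start + d_i * delta_p - time.time()))  # wait for our round

    # Use whatever neighbor decisions have arrived for this epoch; late or missing
    # messages do not block planning (they only loosen the suboptimality bound).
    neighbor_plans = receive_available(epoch=n_e)

    plan = plan_anytime(robot.map, robot.state, neighbor_plans, budget=delta_p)
    send(plan, neighbors=robot.neighbors, epoch=n_e)  # for neighbors in later rounds
    execute(plan)
```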
Additionally, the realization of the directed graph structure for the planning process (as described in Sec. 5.3) is implicit given the set of decisions each robot receives. When a robot begins planning for itself, it simply uses the messages that are available at that scheduled planning time. Messages that arrive late or fail to arrive (such as due to a crashed robot or loss of communication in a subterranean environment) do not prevent or delay the planning process. Instead, failures in messaging contribute to increasing suboptimality according to the realization of the directed planner graph (as in Theorem 6 or Theorem 10). This non-blocking behavior is valuable, considering that a naive implementation of a sequential planner that waits to receive messages from previous robots will not produce a complete plan on a disconnected network or if there are communication failures.

8.2.5d Toward preventing collisions between robots
The planner that we present in this chapter is also not aware of possible collisions between robots, unlike the planner in Chapter 4. Moreover, the approach to preventing inter-robot collisions in Chapter 4 depends on sequential reasoning to determine which robots execute planned trajectories and which execute fallbacks. However, a planner could be designed that allows robots to update their decisions in parallel because:

• Any robot can commit to a new plan if doing so avoids collisions with other robots' old and new plans.
• Local and global maxima (with respect to collision-neighbors) given any ordering can always commit to new plans, using the maxima to break symmetry.

(A collision-neighbor is a robot that is near enough that some pair of trajectories over the finite horizon could be in collision.) Such an approach would preserve most of the guarantees from Chapter 4 (e.g. liveness) with only local rules.

Alternatively, other collision-avoidance tools such as barrier methods [189] (which have also been specialized for aerial robots [121, 188]) can prevent collisions by augmenting planners that are not necessarily collision-aware. Barrier methods benefit from being minimally invasive (changing robots' plans only when necessary). At the same time, such changes may harm sensing performance along a robot's trajectory, so further study on the topic of inter-robot collisions would be beneficial.

Because messages contain the number of the planning epoch, robots are able to determine whether to accept messages (if they belong to the current epoch) or reject them (because they did not arrive in time for a previous epoch). This also provides a mechanism for evaluating whether timing mechanisms are working. Due to random assignment to planning rounds (line 9) and because robots should receive messages from prior rounds on time, the nominal message acceptance rate is

E[ n_accept / (n_accept + n_reject) ] = (1/2) (1 − 1/n_d).   (8.1)

(This calculation includes unnecessary messages sent during the last round; omitting those messages would require adaptation of this formula.) Tracking whether the empirical acceptance rate matches expectations then enables designers to determine whether the planning system is respecting timing constraints and whether latencies in messaging are preventing messages from being received in time.

Note that this is distinct from rejecting messages that are outside the communication range. Hypothetically, a robot might also receive and ignore messages from a distant robot but at later times due to multi-hop communication. Such messages would then have to be excluded from these statistics on acceptance rates (based on timing).
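A small helper along these lines can compute the nominal rate in (8.1) and flag timing problems; the function names and the tolerance value are illustrative assumptions.

```python
def nominal_acceptance_rate(n_d: int) -> float:
    """Expected fraction of received decision messages that arrive in time, per (8.1)."""
    return 0.5 * (1.0 - 1.0 / n_d)

def timing_ok(n_accept: int, n_reject: int, n_d: int, tol: float = 0.05) -> bool:
    """Check whether the empirical acceptance rate roughly matches the nominal value."""
    empirical = n_accept / max(1, n_accept + n_reject)
    return abs(empirical - nominal_acceptance_rate(n_d)) <= tol
```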
8.3 Asymptotic behavior for messaging

To begin, the number of sequential message transmissions (which we refer to as the communication span) for Alg. 5 is constant: n_d − 1, or n_d if including messages which are sent during the last round and so not used. The total number of messages sent is in O(n_r n_n) where n_n = max_{i ∈ R} |N_i^c| is the largest number of communication neighbors over all robots. The total number of message hops in a mesh network depends on the length d_n of the longest shortest path from any robot i to a communication neighbor j ∈ N_i^c and comes to O(n_r n_n d_n). The total communication volume, considering the number of decisions being sent, is the same. The numbers of messages, message hops, and communication volume are then constant per robot if n_n and d_n are bounded.

In general, distributed perception is more expensive. In the worst case, robots may share all sensor data (or summaries thereof) and exchange O(n_r^2) messages per unit time. Then, if the diameter of the mesh network (the length of the longest shortest path between any two robots) is d_m, the number of message hops and communication volume come to O(d_m n_r^2). Mechanisms such as compressed representations [53] can reduce these communication costs. Further improvement may be possible by maintaining spatially local models, but some sort of global modeling is necessary in the framework we present, such as to compute distances to long-term goals or views (Sec. 8.2.4).

The results for this chapter include two studies. The first set of results (Sec. 8.4.1) characterizes communication costs in relation to other possible distributed solvers in a setting based on the coverage experiments in Chapter 5. This supports the second set of results (Sec. 8.4.2), which demonstrates the synchronous, distributed implementation of RSP (Alg. 5) via a simulation of exploration with dynamic aerial robots (as in Chapter 4) that runs in real time but in a setting where communication costs are trivial.
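These asymptotic counts are easy to tabulate for a particular deployment; the sketch below simply evaluates the bounds above for given n_r, n_n, d_n, and n_d (the function name is an assumption for illustration).

```python
def messaging_costs(n_r: int, n_n: int, d_n: int, n_d: int) -> dict:
    """Upper-bound messaging costs for one epoch of Alg. 5 (Sec. 8.3).

    n_r: robots; n_n: max communication neighbors per robot;
    d_n: longest shortest path (hops) to any neighbor; n_d: planning rounds.
    """
    return {
        "span": n_d - 1,           # sequential transmissions (n_d if counting the last round)
        "messages": n_r * n_n,     # O(n_r n_n) point-to-point messages
        "hops": n_r * n_n * d_n,   # O(n_r n_n d_n) total hops on the mesh network
    }
```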
This study seeks to characterize communication costs for distributed submodular maximization on a mesh network in comparison to other relevant approaches. The submodular maximization problems are based on the coverage problems in Sec. 5.7.2. The objective of these problems is to maximize area coverage over a unit square. Actions cover circles with radius r_s and are distributed around agent centers within a radius of r_a. The radii are set so as to normalize the sum of the areas of all sensing actions for a given number of agents, as described in Table 5.1. The simulations vary the number of agents from 10 to 100 in increments of 10, and we provide results for 50 trials for each configuration.

Agents communicate decisions on an undirected communication graph with edges between any pair of agents within a radius of r_c = 3 r_a. Due to the choice of sensor and agent radii, this ensures that there is an edge between any two agents that may have non-zero redundancy. Unlike the experiments in Chapter 5, we require agents to form a connected communication graph (to avoid complications with comparisons to other solvers). To generate such agent positions, the position of the first agent center is sampled randomly, and the rest are generated by sampling a random agent and then a position within range of that agent and inside the unit square, as sketched below.

The solver configurations compare R-lRSP to sequential planning (Alg. 2) and two auction algorithms (which converge to results equivalent to Alg. 1). The results for R-lRSP encompass planning with different numbers of rounds n_d ∈ {4, 8, 16}. To reduce suboptimality and simplify comparisons, the communication range for R-lRSP is set to one step (or r_c). As such, the cost of ignoring decisions outside the communication range (5.18) is zero, and agents only communicate with their immediate neighbors during each round of the planning process. For these results, we assume that agents have access to their communication-neighbors' planning rounds d_j for j ∈ N_i^c and only send messages to agents within range and in later rounds (whereas Alg. 5 includes communication with all agents within range); this could be implemented by sharing the random seeds used to generate the assignments to planning rounds. For the sequential planner, agents plan in the order they were generated. Messages travel along shortest paths, and we take advantage of the structure of the assignment process to consolidate decisions into individual messages sent between adjacent agents in the planning sequence.

The auction algorithms are adapted from CBBA by Choi et al. [46] in order to be applicable to general submodular objectives. These two auction algorithms, detailed in Appendix A.6, differ in terms of information access: the global auction algorithm (Alg. 6) requires full access to the objective g and the ability to re-evaluate marginal gains for other agents' actions, while for the local auction algorithm (Alg. 7) agents only compute values of their own actions, and they communicate assignments in lists along with the corresponding marginal gains. While the global auction algorithm will converge faster, this comes at the cost of a strict requirement that all agents have complete and consistent access to the objective, which violates the assumptions on information access (Sec. 8.2.1); issues related to inconsistencies in the objective can also prevent convergence, which is an important topic for related works on auction algorithms [98]. The local auction converges more slowly but respects constraints on information access for local models. (We state that an auction algorithm has converged once all agents have a complete set of matching assignments.) These auction algorithms represent two extremes. Specialized implementations might interpolate between the two, such as by tracking whether assignments change within a given distance of each agent, somewhat like the specialized update rules for CBBA.
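The position-generation process just described might look roughly like the following; sampling uniformly within the disk of radius r_c around the chosen agent is an assumption, since the text only states that positions are sampled within range of that agent and inside the unit square.

```python
import math
import random

def sample_connected_positions(n_agents: int, r_c: float):
    """Sample agent centers in the unit square so the r_c-disk communication graph is connected."""
    positions = [(random.random(), random.random())]  # first agent placed uniformly at random
    while len(positions) < n_agents:
        ax, ay = random.choice(positions)             # pick an existing agent
        r = r_c * math.sqrt(random.random())          # uniform over the disk of radius r_c
        theta = 2.0 * math.pi * random.random()
        x, y = ax + r * math.cos(theta), ay + r * math.sin(theta)
        if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:       # keep only samples inside the unit square
            positions.append((x, y))
    return positions
```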
In the spirit of online planning, we also provide results for early stopping after a given number of steps (matching the numbers of rounds for R-lRSP) as well as for planning to convergence.

Figure 8.1 plots objective values and communication costs for varying numbers of agents. The numbers of messages include each hop on the communication graph; however, only the sequential algorithm sends messages that travel for multiple hops. The communication volume then weights messages and hops by the numbers of decisions being sent (ignoring the marginal gains that the local auction algorithm also sends). The span is the number of sequential message transmissions; R-lRSP with n_d = 4 has a span of 3, and the span for auction algorithms with early stopping is at most as much. (The auction algorithms produce large communication volumes and numbers of messages because all agents plan and communicate during each communication round and because agents communicate sets of decisions rather than individual decisions like RSP.)

While these results reflect a large variation in the number of agents, the smallest communication radius is still nearly half the length of the unit square. This may explain the slow growth in spans (Fig. 8.1b) for the global auction algorithm. Likewise, the large communication radii cause messaging costs to increase quickly compared to asymptotic rates for R-lRSP and both auction algorithms (due to increasing numbers of neighbors) and more slowly for sequential planning (as paths between sequential agents are short).

Overall, R-lRSP maintains constant span, small communication volume, and consistent performance in terms of objective values, and it requires roughly eight rounds to approach objective values for the global auction solver. Interestingly, the convergence times for global auctions grow slowly, but this may be a product of the small network diameters in our results (the diameter is the length of the longest shortest path between any two agents). The global auction solver may then be appropriate for sensor planning in similar settings if designers can accept greater communication costs and strict constraints on information access. On the other hand, the local auction solver better reflects our assumptions on information access, but early stopping significantly harms objective values. (The distributed exploration system exhibited in Sec. 8.4.2 would also violate the requirements of the global auction solver, simply because the robots query the maps at different times, so the models for planning are not entirely consistent with each other.) These results also exclusively address communication; even when either auction algorithm converges more quickly, the robot whose action is assigned last completes maximization steps for each prior assignment and incurs a computational cost equivalent to sequential planning for the entire team (Alg. 2). Conversely, for R-lRSP each robot completes only a single maximization step.
Having addressed the question of communication costs, let us now evaluate the distributed planner in the context of simulated multi-robot exploration (with Fig. 8.2 visualizing the exploration process for these experiments). For these results, the robot and sensor models are identical to the distributed model in Sec. 4.6.2. Likewise, individual robots plan with Monte-Carlo tree search [27, 39] with the same motion primitive library. Aside from the distributed implementation, the primary differences in the exploration system compared to Chapter 4 are that planners do not check for inter-robot collisions, the inclusion of a view distance term (see Secs. 8.2.4 and 7.4), and the use of the optimistic coverage objective (Sec. 7.3.3a). The distributed, receding-horizon planner (Alg. 5) runs in real time in an anytime fashion with an epoch duration of 3 seconds and a horizon of 4 seconds. Because the epoch duration is constant, robots planning with greater numbers of rounds (n_d) have decreasing amounts of time to plan for themselves. The simulation results include 30 trials with 8 robots and planning with distributed RSP for 1, 2, and 3 planning rounds.
Figure 8.1: Objective values and messaging statistics for various distributed submodular maximization algorithms (Sequential, R-lRSP, and the local and global auction solvers, each with 4, 8, or 16 rounds or early-stopping steps where applicable). Shaded regions (although small) show standard error. (a) Objective values (out of one) remain consistent with increasing numbers of agents, except for the local auction solver, which degrades significantly when querying solutions before convergence. RSP requires eight or more communication rounds to obtain performance comparable to the sequential solver or the global-information auction. (b) When considering the span (number of sequential message hops), only RSP obtains constant values, but the global-information auction solvers still retain a small span, peaking at slightly over nine. (c) Numbers of messages (including hops) and (d) total message volume (by numbers of decisions and hops traveled) all grow super-linearly as the smallest communication radius (0.48) covers nearly half the length of the environment. While sequential planning benefits from being able to consolidate messages, RSP obtains the smallest communication volumes and consistently outperforms all auction planners by at least an order of magnitude on both metrics.
Figure 8.2: The above images visualize exploration of an office environment with eight robots and RSP planning with n_d = 3, at (a) 27 seconds, (b) 418 seconds, (c) 809 seconds, and (d) later in the trial. Time stamps are approximate. A video of the exploration process can be found at: https://youtu.be/MMr9NxT_J8c
000 1 ,
200 1 , · Time (s) E n v i r o n m e n t C o v e r ag e ( ce ll s ) RSP RSP RSP (a) Environment coverage 2 300 . . Num. Planning Rounds ( n d ) A cce p t a n ce R a t e s ( b y r o b o t) (cid:16) − n d (cid:17) (b) Acceptance rates800 900 1 ,
000 1 ,
100 1 ,
200 1 ,
300 1 ,
400 1 , n d =1Msgs. Enabled, n d =1Msgs. Disabled, n d =2Msgs. Enabled, n d =2Msgs. Disabled, n d =3Msgs. Enabled, n d =3 Completion Time (s) (c) Completion times
Figure 8.3:
Results for exploration with the distributed RSP implementation. (a) Mean environmentcoverage and standard error (shaded regions) for different numbers of robots. Black lines demarcate themaximum environment coverage and a 90% threshold for task completion. (b) Message acceptance ratesclosely match predictions (8.1) which indicates that the synchronous execution of anytime planning roundsis functioning properly. (c) Planning with two or three rounds ( n d ) improves median completion times byabout 5% compared to the same planners after disabling communication of control actions. Note that thecases for n d = 1 effectively represent the same configuration. is 2687m .The implementation of the distributed planner takes advantage of ROS [148] for mes-saging, and planners for each robot run in separate processes (nodes). The simulationsthemselves run on a single desktop computer with a 16-core Ryzen 2950X processor. Whilethe processor provides ample capacity to run computation in parallel, communication costsare not well-represented. Providing projected communication costs can ameliorate this is-sue somewhat. Regarding messaging span and latencies, the largest span (for n d = 3)produces a span of two. Even significant latency, say 0 . . . . − . The simulatedrobots also maintain maps by assimilating camera data from the entire team. Still, allplanning runs on spatially local sub-maps, and our approach is compatible with meth-ods for distributed mapping that can significantly reduce communication costs such as viaGaussian mixture models [57, 178, 180]. Extrapolating from results for Gaussian mixturemapping [57, Fig. 7], maintaining distributed maps for eight robots, again by point-to-point communication, would require a total of about 400KiB s − of communication volume.Figure 8.3 plots results for environment coverage, message acceptance, and task comple-tion. The synchronous planner functions as designed (Fig. 8.3b) with message acceptancerates matching predictions (8.1). This is consistent with the requirement that robots haveaccess to decisions from prior rounds for RSP planning and subsequently to apply subopti-mality guarantees for RSP (Theorem 7). In our implementation, robots query their mapswhen they begin planning, at the beginning of the assigned round rather than the begin-ning of the epoch. Because of this the effective latency can vary, the average decreasingwith increasing n d . To address this, we ran a second set of experiments with communica-tion for RSP disabled which isolates the impact of distributed planning. This narrows thegap in task completion time to about 5%. Still, the scope of these simulation results isnarrow as the intent is to demonstrate functionality of the distributed planner rather thanto characterize task performance (which is the aim of Chapter 7).
This chapter has demonstrated a distributed implementation of RSP for receding-horizon sensor planning in the context of multi-robot exploration. The design is simple and requires no information about other robots aside from the sensor data that those robots use to construct maps of the environment.

Chapter 9
Conclusions and Future Work

This thesis has developed and advanced methods for receding-horizon sensor planning for teams of robots. Specifically, receding-horizon planning for problems such as exploration involves reasoning about likely observations at different camera views, reasoning about overlaps and redundancy between views, and collectively optimizing trajectories and views for teams of robots.

Existing works are able to address some of the challenges related to solving these problems via greedy algorithms for maximizing submodular objectives, submodularity being a common property amongst many sensing objectives [69, 169]. However, these algorithms are not particularly amenable to planning in real time for large numbers of robots, particularly in distributed settings: the numbers of computation and communication rounds both grow linearly with the number of robots.

Seeking to reduce computation time for submodular maximization problems leads to results indicating that constant-factor computation time (more specifically, adaptivity, Sec. 3.6.2) and solution quality cannot co-exist in general [11, 75, 82]. This led us to develop Randomized Sequential Partitions (RSP) and methods for analysis of pairwise redundancy for multi-robot sensing problems. In Chapter 5, we identified 3-increasing functions (which include general coverage objectives) as the class of functions where redundancy between robots' actions decreases monotonically given others' decisions. Later, we applied this result for 3-increasing functions to a case of mutual information for exploration in Chapter 7 via interpretation of the mutual information objective as expected coverage. Further, in Chapter 6, developing similar methods for redundancy analysis for sums of submodular functions enabled application of RSP to target tracking problems, demonstrating consistent task performance for as many as 96 robots in simulation results. Finally, we described a distributed implementation of RSP in Chapter 8 and provided simulation results for exploration in real time with eight robots.
This thesis work opens up numerous directions for future work, which can be divided roughly between submodular function maximization and active sensing for robotics.

9.1.1 Submodular function maximization
This thesis also did not thoroughly characterize classes of 3-increasing objectives or more general classes that satisfy bounds on redundancy, such as sums of submodular functions. For example, we did not identify any meaningful objectives that are 3-increasing but do not satisfy alternating monotonicity conditions as coverage objectives do.

Additionally, although we focused on greedy algorithms, the continuous greedy algorithm [31] could improve solution quality. While applying the continuous greedy algorithm to sensing problems, in particular informative path planning problems with large spaces of actions, may prove challenging, there is interest in applying such methods in robotics [154]. Our methods for redundancy analysis could possibly be used to develop versions of such algorithms that converge with small numbers of integration steps.

Continuous analogues of submodular functions have also been a recent topic of interest [19, 154, 197]. Reformulating our results for 3-increasing functions in continuous domains could produce interesting results. However, the nature of possible impacts in this area is unclear.

Future works may also view this text as providing existence proofs that characterize certain sensing and submodular maximization problems. For example, a corollary to our results is that the adaptive complexity (Sec. 3.6.3) of a wide variety of receding-horizon sensing problems is O(1). Likewise, our results could be interpreted as providing upper bounds on the amount of information access that is necessary to achieve a given bound on solution quality; doing so could inform the design of more complex optimization systems or machine learning methods.

To begin, this thesis studied two sensing problems: exploration and target tracking. However, RSP methods are applicable to a broad variety of sensing problems. An interesting property of the problems we studied is that equilibrium conditions generally perform well without explicit coordination (i.e. with myopic planning): so long as robots communicate observations to each other, they tend to distribute themselves evenly across the environment, automatically. Other sensing problems, one being target coverage, may not exhibit such equilibrium behavior and may benefit more significantly from coordinated sensor planning.

Online adaptation of the planner structure was discussed in Chapter 5 but not revisited afterward. Moreover, selecting the number of rounds produces a potentially challenging tradeoff between planning time and solution quality. Development of impactful methods for adaptation of RSP planning, particularly for planning in real time as in Chapter 8, is attractive. Ideal numbers of planning rounds may also vary spatially, and this may pose a challenging problem, particularly for planning in real time.

While Chapter 6 applied results for sums of submodular functions to target tracking, the methods can also apply more generally to multi-objective sensing problems. The analysis for redundancy could capture changes in dependency between robots through the course of a complex task. For example, robots searching for the source of a gas leak in a building may experience large amounts of redundancy that decrease after they localize the leak and focus, instead, on a mapping task with more local interactions.

The applications in this thesis also focused on receding-horizon planning with fixed trajectories. However, the problems we solve could be addressed more directly as variants of POMDPs. Satsangi et al.
[159] recursively apply greedy algorithms to obtain guarantees for certain submodular sensing processes. Similar results could be obtained for partition matroids in multi-robot problems, and the contributions of this thesis could be applied as well, possibly to develop efficient, distributed solvers for active perception processes.

In Chapter 7, we provided some incidental contributions toward objective design for exploration, and in line with Henderson et al. [87], we identified some limitations of prior works on mutual information objectives. In one such direction, the interpretation of mutual information without noise as expected coverage could produce more effective approximations of joint mutual information (see Appendix A.4). The comparison to CSQMI in the same appendix also strongly suggests that improving approximations for joint mutual information could be fruitful in terms of task performance. Such approximations should also be compared to recent work on approximating conditional mutual information [87].

As a whole, the methods for planning in this thesis are limited by the focus on receding-horizon planning. Further, realizing consistent improvements in task performance via RSP may require better and more complete planning methodologies. Chapter 7 cited spatially global routing and assignment, such as via distributed auctions, as likely directions for improving performance. Likewise, development of methods for planning with distributed road maps would be an important step toward improving our methods for computing view distance rewards.

Appendix A
Additional Technical Discussion
A.1 Expected coverage and mutual information for exploration with independent cells are not necessarily adaptive submodular
Although this thesis does not take advantage of adaptive submodularity, the work was developed in the wake of a number of works that apply adaptive submodularity to active sensing problems in robotics [45, 48, 96]. For this reason, considering whether and how adaptive submodularity might apply to problems such as exploration is important to this thesis.

In simple terms, adaptive submodularity [79] seeks to apply the concept of submodularity (and subsequent suboptimality guarantees) to adaptive settings, wherein agents obtain realizations of actions or observations (e.g. whether a measurement succeeded or the realization of an actual camera view) at each step of a greedy decision process (as in Alg. 1). In our work, the receding-horizon planning sub-problems which we formulate (Sec. 2.1.2) are non-adaptive, and the theory we develop primarily applies to the sub-problem solutions. However, the decision process by which the robots replan and explore is adaptive, and we do not provide guarantees for this process as a whole.

Methods based on adaptive submodularity provide the prospect of general suboptimality guarantees for the entire exploration process. Moreover, Choudhury et al. [48] study an exploration problem with an expected coverage objective equivalent to (7.8) and cite routing constraints (e.g. path planning) as the primary theoretic limitation for applying adaptive submodularity to exploration (versus existing results for cardinality constraints [79]). However, we note that they only assume that the objective is (or is approximately) adaptive submodular. Although this is true for some special cases [79, Sec. 7.1], such exploration objectives are not adaptive submodular in general, which we will prove by counterexample after introducing adaptive submodularity more formally.

Consider a function g : 2^U × O^U → R where O is a set of observations or outcomes. Here, X ⊆ U encodes the set of selections and Φ ∈ O^U encodes the outcomes for each observation. Further, agents receive an observation whenever they execute an action, and there is some probability distribution over the outcomes Φ for each action in U. Then, following Golovin and Krause [79], we can define the expected marginal gain as

∆(x | ψ) = E[ g(dom(ψ) ∪ {x}, Φ) − g(dom(ψ), Φ) | Φ ∼ ψ ]   (A.1)

where x ∈ U, ψ is a partial realization that consists of action-outcome pairs, dom(ψ) is the associated set of actions in U, and Φ ∼ ψ indicates that the full realization Φ is drawn conditional on the partial realization ψ. A function g is adaptive submodular with respect to the distribution over outcomes O^U if for all ψ, ψ′ ⊆ ψ, and x ∈ U,

∆(x | ψ) ≤ ∆(x | ψ′).   (A.2)

In other words, the expected marginal gains for all actions must not increase, regardless of the outcome of a selection.

Figure A.1: The above illustrates a counterexample which demonstrates that mutual information for ranging observations with independent cells is not necessarily adaptive submodular. The target variable, the environment, consists of four cells E = [C_1, . . . , C_4] which are each independent and free (0) or occupied (1) with probability 0.5 (each cell with 1 bit of entropy). Two available observations, B and A, provide distances to the nearest occupied cell along a given ray, as illustrated.

Now, let us move on to the counterexamples for mutual information and expected coverage.
Note that we now use more traditional notation for mutual information since that notation already includes mechanisms for encoding outcomes for observations and expected marginal gains which are equivalent to the ones Golovin and Krause [79] describe.
Theorem 13 (Noiseless mutual information for depth sensors with independent cells is not necessarily adaptive submodular). Consider an environment E which consists of Bernoulli cells that are each free or occupied with some probability and possible observations U from a depth sensor which provide ranges to the nearest occupied cell along collections of rays according to the descriptions in Chapter 7 and Sec. 7.3.2. The mutual information I(E; X) for X ⊆ U is not adaptive submodular in general. (The fact that mutual information is not adaptive submodular in general is well known [80]; we prove that the same holds more narrowly for exploration and as part of the proof for expected coverage. Note that this does not preclude some cases being exactly or approximately adaptive submodular as in the work of Choudhury et al. [48].)

Proof. Consider the scenario that Fig. A.1 illustrates and the resulting distribution over environments E and ground set U = {A, B}. The mutual information between B and E will violate adaptive submodularity. Now, note that B either observes the value of only one cell (if C_1 is occupied) or else all four. As such, the mutual information between the two is initially

I(B; E) = 1 + 0.5 · 3 = 2.5.   (A.3)

The mutual information (and so the expected marginal gain of B) increases if we observe A and determine that C_1 is free:

I(B; E | C_1 = 0) = 3.   (A.4)

This increase in mutual information violates adaptive submodularity (A.2), which completes the proof. □

Corollary 13.1 (Expected coverage for exploration is not necessarily adaptive submodular). Expected coverage as described in Sec. 7.3.1 is not necessarily adaptive submodular.

Proof.
This follows from Theorem 13 because noiseless mutual information with independent cells is an expected coverage objective (Theorem 12). □
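The counterexample from the proof of Theorem 13 can also be checked by direct enumeration. The short script below abstracts the geometry to the stated behavior (B reveals only C_1 when C_1 is occupied and all four cells otherwise) and reproduces the values in (A.3) and (A.4); the function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def obs_B(env):
    """B sees only C_1 when it is occupied, otherwise all four cells (abstracted geometry)."""
    return (env[0],) if env[0] == 1 else env

# All 16 equally likely environments over cells (C_1, C_2, C_3, C_4).
envs = list(product([0, 1], repeat=4))

# Because B is a deterministic (noiseless) function of E, I(E; B) = H(B).
prior_gain = entropy(Counter(obs_B(e) for e in envs))              # 2.5 bits, as in (A.3)
cond_gain = entropy(Counter(obs_B(e) for e in envs if e[0] == 0))  # 3.0 bits, as in (A.4)

print(prior_gain, cond_gain)  # the expected gain of B increases after observing C_1 = 0
```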
Still, there is room to apply adaptive submodularity or similar properties to exploration policies. In fact, theory for objectives based on the size of the hypothesis space still applies [45, 79, 96, 97]. Further, Gupta et al. [85] provide results for a routing problem in a similar setting.

However, reducing the size of the hypothesis space has its limitations. Consider the reduction in the hypothesis space h and the following hypothetical bound for a greedy maximization process:

h_g ≥ α h*.   (A.5)

For a uniform prior over n hypotheses, the relationship between a reduction in the hypothesis space h and the information gain I is

I = log( n / (n − h) ) = log(n) − log(n − h),   (A.6)

and

h = n − 2^{log(n) − I}.   (A.7)

Substituting into the bound,

I_g ≥ log( n / (n − α h*) ).   (A.8)

Consider if h* = n (such as for mapping an entire environment) so that

I_g ≥ log( 1 / (1 − α) ),   (A.9)

which works out to one bit for α = 1/2 or about 1.44 bits for α = 1 − 1/e.

In general, we can expect bounds on the reduction of the hypothesis space to be most useful for small hypothesis spaces (such as to distinguish between a small number of likely environments predicted by a learner or for localizing objects [96]). However, such bounds would be less impactful for the exploration problems that we study, which exhibit large information gain and exponentially large hypothesis spaces.
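The arithmetic behind (A.9) is easy to verify; the snippet below evaluates the bound for the two values of α mentioned above (the function name is illustrative).

```python
from math import e, log2

def guaranteed_bits(alpha: float) -> float:
    """Information-gain bound log2(1 / (1 - alpha)) when h* = n, per (A.9)."""
    return log2(1.0 / (1.0 - alpha))

print(guaranteed_bits(0.5))        # 1.0 bit
print(guaranteed_bits(1 - 1 / e))  # ~1.44 bits (log2 of e)
```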
A.2 Analysis for scaling target tracking to large numbers of robots

The analysis in this section establishes sufficient conditions for the cost of distributed planning γ_dist (6.9) for each robot to be constant (in expectation) for planners with a fixed number of sequential steps, independent of the number of robots. Afterward, we discuss how this analysis relates to the design and analysis of target tracking systems.

Consider a distribution of robots and targets on R^n with at most α robots and β targets on average per unit volume. Then assume that the channel capacities (6.7) between each robot i ∈ R and target j ∈ T satisfy a non-increasing upper bound φ : R_{≥0} → R_{≥0} (possibly in expectation) so that C_{i,j} ≤ φ(||p_i^r − p_j^t||), where p_i^r and p_j^t are the robot position and target mean position in R^n. Now, taking the expectation of the total weight for targets distributed on an n-ball centered around one robot and for robots on another ball centered on each target, each ball with radius R, produces an upper bound on the expectation for robots and targets within a radius of R/2. Denoting the n-ball of radius R as B, this expectation has the form

E[ Σ_{j ∈ R \ {i}} W(i, j) ] = ∫_B ∫_B αβ min(φ(||x||), φ(||y||)) dx dy.   (A.10)

By integrating over the surface of each ball (each an (n − 1)-sphere, with S_{n−1} denoting the surface area for unit radius), this becomes

= αβ ∫_0^R ∫_0^R S_{n−1}^2 r_1^{n−1} r_2^{n−1} min(φ(r_1), φ(r_2)) dr_1 dr_2.   (A.11)

Given that φ is non-increasing, separating the minimum produces

= αβ S_{n−1}^2 ( ∫_0^R ∫_0^{r_1} r_1^{n−1} r_2^{n−1} φ(r_1) dr_2 dr_1 + ∫_0^R ∫_{r_1}^R r_1^{n−1} r_2^{n−1} φ(r_2) dr_2 dr_1 ),   (A.12)

and by swapping the bounds of the second integral, combining, and evaluating the inner integral, we get

= αβ S_{n−1}^2 ( ∫_0^R ∫_0^{r_1} r_1^{n−1} r_2^{n−1} φ(r_1) dr_2 dr_1 + ∫_0^R ∫_0^{r_2} r_1^{n−1} r_2^{n−1} φ(r_2) dr_1 dr_2 )   (A.13)
= 2 αβ S_{n−1}^2 ∫_0^R ∫_0^{r_1} r_1^{n−1} r_2^{n−1} φ(r_1) dr_2 dr_1   (A.14)
= (2 αβ S_{n−1}^2 / n) ∫_0^R r_1^{2n−1} φ(r_1) dr_1.   (A.15)

The integral in (A.15) converges in the limit R → ∞ if φ ∈ O(1/x^{2n+ε}), or, most relevantly, on a plane this condition comes to φ ∈ O(1/x^{4+ε}). Given that sensor models purely with sensor noise proportional to distance do not fit this constraint, designers may wish to consider the effects of constraints on the total sensor range or the number of observations one robot can obtain at once.
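As a quick numerical sanity check of the reduction from (A.11) to (A.15), the snippet below compares the two expressions on the plane (n = 2) for an arbitrary example bound φ(r) = 1/(1 + r)^5 with α = β = 1; the example φ and the use of SciPy quadrature are assumptions for illustration only.

```python
import numpy as np
from scipy.integrate import dblquad, quad

n, R = 2, 1.0
S = 2 * np.pi                         # surface "area" S_{n-1} of the unit circle (n = 2)
phi = lambda r: 1.0 / (1.0 + r) ** 5  # example non-increasing capacity bound

# (A.11): integrate min(phi(r1), phi(r2)) over both balls in polar coordinates.
lhs, _ = dblquad(lambda r2, r1: S**2 * r1 * r2 * min(phi(r1), phi(r2)),
                 0.0, R, 0.0, R)

# (A.15): 2 S^2 / n times the integral of r^(2n-1) * phi(r).
rhs = 2 * S**2 / n * quad(lambda r: r ** (2 * n - 1) * phi(r), 0.0, R)[0]

print(lhs, rhs)  # the two values agree up to quadrature error
```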
A.2.1 Scaling and sensor models
The sensitivity to how quickly interactions between robots and targets fall off motivates attention to sensor design and modeling lest planners perform poorly for large numbers of robots or require additional computation time. For example, additive Gaussian noise with the standard deviation proportional to distance (as is common in range sensing models [35]) is insufficient, which is evident from the channel capacity of a Gaussian channel [58, Chap. 9]. At the same time, robots in realizable systems cannot obtain and process observations of unbounded numbers of targets, and features such as a maximum sensor range can model these limits. Still, the observation of whether a target is within range provides some information. Here, a narrow-tails assumption on the probability distributions for the targets, such as that the tails approach zero exponentially, ensures that φ decreases sufficiently quickly. (This requirement on interactions between robots and targets is stricter than the equivalent one between robots (see Theorem 8), as interactions between robots must decrease as O(1/x^{n+ε}).) As a result of this sensitivity to the tails, the scaling behavior (A.15) may be difficult to characterize a priori, even when known to be bounded.

A.3 An argument in favor of coverage over entropy reduction for evaluating exploration performance
Chapter 7 applies coverage-based performance measures while, previously in Chapter 4,we used entropy reduction to evaluate exploration performance. However, the systems forexploration that we study in this thesis are subject to little or no noise and are tunedso that cell occupancy values converge quickly. Anecdotally, we have observed significant This requirement on interactions between robots and targets is stricter than the equivalent one betweenrobots (see Theorem 8) as interactions between robots must decrease as O (1 /x n + (cid:15) ).
[Plot: entropy reduction (bit-cells) and coverage (cells) versus robot-iteration; curves show cell coverage, entropy reduction, and scaled entropy reduction.]
Figure A.2:
Compare entropy reduction and environment coverage for five exploration trials with 16 robots in the Boxes environment (jagged lines are individual trials). Notice that these coincide exactly when scaled to match.

Anecdotally, we have observed significant changes in maximum entropy reduction for exploration trials in the same environment as systems and parameters have changed over time. Moreover, entropy results for our systems closely match environment coverage when scaled by a constant factor (Fig. A.2), which likely reflects design decisions such as limits on occupancy certainty (which are common in practice [92, (4)]) rather than any feature of performance. Instead, a coverage-based metric provides a consistent measure of the amount of volume that robots have observed (given cell volume) and is less affected by design parameters such as limits on occupancy uncertainty.
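The constant scale factor between entropy reduction and coverage can be estimated directly from such an occupancy limit. As a rough illustration, the clamping threshold below is an assumed value in the style of OctoMap-like maps [92], not a parameter reported in this thesis:

    import math

    def H(p):  # binary entropy in bits
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    p_prior, p_clamp = 0.5, 0.97  # assumed prior and clamping threshold
    per_cell = H(p_prior) - H(p_clamp)
    print(per_cell)  # roughly 0.81 bits of entropy reduction per covered cell

With such a limit, every covered cell contributes approximately the same entropy reduction, so the total entropy reduction tracks coverage up to that constant.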
A.4 Evaluating the performance of an approximate mutual information objective
Initially, we chose the optimistic coverage objective in Chapter 7 primarily because exact evaluation is tractable and because optimistic coverage is 3-increasing. (We made this decision before identifying the connection between mutual information and expected coverage.) As such, we wish to clarify how this decision affects performance, particularly in comparison to the CSQMI objective applied in Chapter 4. Additionally, we are not aware of any other works that compare ray-based mutual information [37, 103] to coverage-like approaches which are also common [20, 62, 168]. On the other hand, Zhang and Vorobeychik [200] found that Shannon mutual information objectives behave similarly to CSQMI.
[Plots: coverage (cells) versus robot-iteration for (a) the Boxes and (b) the Empty environments; curves show the optimistic coverage and CSQMI objectives.]
Figure A.3:
The above plots compare exploration with optimistic coverage and CSQMI objectives in terms of environment coverage in the Boxes and Empty environments (for ten trials with sequential planning and 16 robots). Black lines demarcate (from top to bottom) the maximum environment coverage, the completion threshold, and the early progress threshold.

Figure A.3 presents results for exploration in the Boxes and Empty environments where optimistic coverage provides a 16% and 11% improvement in average completion time, respectively, compared to exploration with a CSQMI objective with an occupancy prior of 0.125. (Chapter 4 uses the same value for the prior.) Note that some improvement in performance is expected for the Empty environment because the optimistic coverage objective provides oracle values.

These results are somewhat limited. First, we do not vary the parameters of the CSQMI objective such as the prior and parameters for the independence test. In particular, the results suggest that one way to improve performance may be to select a lower occupancy prior (compared to 0.125). Additionally, we did not put the exploration system with the CSQMI objective through the same parameter tuning process as we did for optimistic coverage. Still, the average completion times for CSQMI in the Boxes and Empty environments (1067 and 1526 robot-iterations, respectively) exceed the average for any configuration of the optimistic coverage objective that we tested.

That said, these results indicate several ways by which exploration performance could be improved for systems based on mutual information objectives. To begin, selecting priors with low occupancy probabilities may improve performance (in agreement with Henderson et al. [87]) given that optimistic coverage is equivalent to mutual information with a prior approaching zero. Toward this end, normalizing the mutual information (such as by dividing by the entropy of unobserved cells) would mitigate the issue of the mutual information trending toward zero for small priors. The motivation for such normalization is self-evident from the form of (7.11), and we note that the optimistic coverage objective effectively realizes this normalization in the limit. Additionally, the approximation of joint mutual information for multiple rays and views (such as via the method of Charrow et al. [37]) may significantly affect accuracy for small occupancy priors. Notably, recent work by Henderson et al. [87] provides accurate values for individual camera views by treating the view as a continuous integral as well as new bounds for multiple views, and this approach could significantly improve approximations for multiple views and robots. For either case, optimistic coverage, having the advantage of being exact for a certain regime, could provide a useful point of comparison for evaluating the impact of approximation. (Accurate evaluation of joint mutual information is arguably much more important to this work than to others on exploration since we study the collective contributions of teams of robots. One case where the approximation of the joint by Charrow et al. [37] would perform poorly is when many rays traverse a single, previously unobserved cell but collectively observe many unobserved cells; in that case, the contributions of only one or a few rays would be counted.)
A.4.1 Upper and lower bounds on mutual information
As stated, approximations of mutual information for multiple rays and camera views frequently apply upper and lower bounds on joint mutual information [37, 88]. The interpretation of noiseless mutual information as expected coverage likewise leads to useful bounds on mutual information. Specifically, the expectation for the mutual information in (7.11) commutes and can be computed cell-wise. The expectations for individual cells can then be bounded via upper and lower bounds on the probabilities that an observation along a given ray will infer the occupancy value for that cell. (Such probabilities are trivial to compute [37, 103].) The maximum probability that any of several rays will observe a cell produces a lower bound on the mutual information while summing probabilities of observation (and capping at one) produces an upper bound. Notably, both of these bounds become tight as the prior probability of occupancy approaches zero because the probability of observing all cells in range approaches one.
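These bounds reduce to simple per-cell operations. The following is a minimal sketch with hypothetical per-ray observation probabilities; the array names and shapes are assumptions of this sketch, not an interface from the thesis.

    import numpy as np

    def mi_bounds(obs_prob, cell_entropy):
        """Cell-wise bounds on noiseless mutual information (expected coverage).

        obs_prob: array of shape (num_rays, num_cells); entry (r, c) is the
            probability that ray r observes (infers the occupancy of) cell c.
        cell_entropy: array of shape (num_cells,) with each cell's entropy.
        Returns (lower, upper): per cell, the maximum probability over rays
        gives a lower bound and the capped sum of probabilities an upper
        bound, each weighted by the cell entropy and summed.
        """
        p_lower = obs_prob.max(axis=0)                    # best single ray
        p_upper = np.minimum(obs_prob.sum(axis=0), 1.0)   # union bound, capped at one
        return float(p_lower @ cell_entropy), float(p_upper @ cell_entropy)

    # Toy example: three rays over four cells with made-up probabilities.
    obs_prob = np.array([[0.9, 0.5, 0.0, 0.1],
                         [0.8, 0.0, 0.6, 0.1],
                         [0.0, 0.4, 0.6, 0.1]])
    cell_entropy = np.ones(4)  # e.g. unobserved cells with uniform priors (1 bit)
    print(mi_bounds(obs_prob, cell_entropy))  # (2.1, 3.2)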
A.5 Comparison of suboptimality for RSP and DSGA
The proposed RSP planners overcome a number of limitations of DSGA (Chapter 4): DSGA does not provide offline suboptimality guarantees, distributed implementations would involve broadcasts and reductions over all robots during each round, and the assignments themselves are sequential. We provide the following comparison as a point of reference as DSGA may be impractical.

Figure A.4 compares lower bounds on suboptimality (Sec. 7.6.1) for DSGA and RSP given different numbers of planning rounds (n_d). On average, DSGA outperforms RSP planning in these results. Even though RSP provides stricter performance guarantees, this result is not surprising. While RSP partitions robots randomly, DSGA effectively partitions robots in the midst of the planning process. This could enable DSGA to better take advantage of problem structure or random improvements in planner results across rounds (as DSGA selects a subset of results from all robots during each round).
[Plot: suboptimality (lower bound) versus the number of planning rounds (n_d); curves show DSGA, RSP, and sequential planning.]
Figure A.4:
Comparison of lower bounds on suboptimality for planning for exploration with RSP and DSGA for ten simulation trials in the Boxes environment. The red line is based on suboptimality results for sequential planning in the same configuration from Table 7.3.
A.6 Description of auction methods for distributed submodular maximization
Chapter 8 compares RSP planning to auction methods much like those which are common for task assignment [46]. Such auction algorithms converge to greedy solutions (Alg. 1) and can even be extended to complex constraints [194]. However, the auction algorithms that Choi et al. [46] describe (e.g. CBBA) are specific to the assignment problems they study and do not immediately apply for general submodular objectives. Specifically, they take advantage of how the value of an assignment to one robot is independent of assignments to others to avoid recomputing marginal gains (bids) when updating assignments.

The following algorithms (Algs. 6 and 7) are similar and are suitable for the kinds of assignment problems we study in this thesis (e.g. (2.9)). The two differ based on assumptions on how agents evaluate objective values. The first requires full information, that is, the ability to evaluate marginal gains for any agents' actions, in order for agents to reorder existing assignments. This likely produces behavior that is comparable to CBBA, which can often avoid re-evaluating marginal gains. For simplicity we assume that both run synchronously on an undirected communication graph where each agent i has neighbors N_i. Both algorithms eventually converge to the maximum value across all agents at each step of the assignment process and so eventually obtain solutions equivalent to the general greedy algorithm (Alg. 1). Although we do not seek to prove this point, the arguments would be similar to those by Choi et al. [46]. (Like Choi et al. [46], we assume that there is a deterministic mechanism to break symmetry in comparisons, e.g. via agent and action indices.)

Algorithm 6 requires consistent, global access to the objective g. This algorithm pools the assignments received from neighbors with agent i's own available actions and reruns the greedy algorithm over the pooled set (see the listing and the sketch that follow). This greedy process can enable agents to collectively identify multiple elements of the final solution during a single synchronous communication round and sometimes converges in fewer synchronous communication rounds than the number of agents. However, this approach strictly requires global access to the objective g because the agents evaluate the value of others' assignments during the greedy decision process and to ensure that the outputs are consistent and deterministic. As such, this algorithm does not represent a practical solution to the problems we study as asymmetry in information access is common in distributed settings [46]. (For example, robots in Chapter 6 use different local approximations for target tracking, and for the distributed exploration implementation in Chapter 8, even though robots construct maps with sensor data from all robots, the maps are not synchronized and generally differ.)
Algorithm 6 Auction algorithm with global information. One synchronous iteration from the perspective of agent i.
  X_i ← current set of assignments for agent i
  B_i ← set of actions available to i
  N_i ← set of communication neighbors
  Send: X_i to N_i
  Receive: X_j from j for j ∈ N_i
  G ← ∪_{j ∈ N_i ∪ {i}} X_j ∪ B_i
  X_i ← Greedy(G)   (executing Alg. 1 for objective g and partition matroid I)
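For concreteness, a rough Python sketch of one such iteration appears below. The Action type, its attribute names, and the greedy loop are assumptions of this sketch rather than the thesis implementation, and g stands in for the globally shared objective.

    from collections import namedtuple

    # Hypothetical action type: each candidate assignment carries the index of
    # the agent it belongs to and an action index used for tie-breaking.
    Action = namedtuple("Action", ["agent", "index"])

    def auction_global_iteration(X_i, B_i, neighbor_assignments, g):
        """Rough sketch of one synchronous iteration of Alg. 6 for agent i.

        X_i: current set of assignments known to agent i
        B_i: set of actions available to agent i
        neighbor_assignments: assignment sets received from neighbors N_i
        g: set function returning the objective value of a set of assignments
        """
        # Pool local knowledge, neighbors' assignments, and agent i's own actions.
        ground = set(X_i) | set(B_i)
        for X_j in neighbor_assignments:
            ground |= set(X_j)

        # Rerun the general greedy algorithm (Alg. 1) over the pooled set while
        # respecting the partition matroid (at most one action per agent).
        solution, used_agents = set(), set()
        while True:
            candidates = [x for x in ground if x.agent not in used_agents]
            if not candidates:
                break
            # Deterministic tie-breaking by agent and action indices keeps the
            # outputs consistent across agents, as assumed in the text.
            best = max(candidates,
                       key=lambda x: (g(solution | {x}) - g(solution),
                                      -x.agent, -x.index))
            solution.add(best)
            used_agents.add(best.agent)
        return solution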
Algorithm 7, which performs auctions with local information, is somewhat more complex because agents are not allowed to evaluate marginal gains for others' assignments, so these values have to be communicated along with the decisions, and updates require more care. The main body (lines 5–8) of this algorithm is similar to the last in that agents exchange assignments with neighbors before updating the local solution. Except, this time, values and assignments are exchanged in lists in assignment order. The update itself is based on the first position where the assignments differ and consists of three cases: (line 12) the current assignments dominate (and already include an assignment to i); (line 14) the other's assignments dominate and include some assignment to i (in block B_i of the partition matroid); or finally (line 16) agent i must update the other's assignments with an assignment to itself, obtained via the procedure that follows. Seeking to emulate Alg. 1, agent i checks whether its best marginal gain beats the value of the previous winning assignment V_{o,m} at each point m in the decision process. Once identifying a winning assignment position (or implicitly beating a null assignment), the agent concatenates its assignment and value (using "·" to represent list concatenation) onto the sub-lists of prior assignments and values, as a new assignment by agent i invalidates any following marginal gains. This behavior also implies that, unlike Alg. 6, this auction algorithm requires at least one synchronous round per element of the final solution (i.e. one per robot) as the incremental construction of the solution in line 21 ensures that the agents also collectively produce the full solution sequentially.

Both of these algorithms can be implemented more efficiently from a computational perspective. But, given that the discussion in Chapter 8 focuses on communication costs, efficient computation will not be necessary for this text. Still, even efficient implementations of these auction algorithms will be more computation intensive than RSP (where agents together complete a single pass over each block B_i of the ground set). The auction algorithms both execute a complete pass over the ground set in the first synchronous round alone and may require many times more computation to converge. Likewise, auction algorithms incur greater communication costs because all agents exchange assignments during each round (rather than a subset) and because agents exchange sets of assignments rather than individual assignments as in RSP.

Algorithm 7 Auction algorithm with local information. One synchronous iteration from the perspective of agent i.
  X_i ← list of assignments for agent i (in assignment order)
  V_i ← list of marginal gains (values) for assignments X_i
  B_i ← set of actions available to i
  N_i ← set of communication neighbors
  Send: X_i, V_i to N_i
  Receive: X_j, V_j from j for j ∈ N_i
  for j ∈ N_i do
    X_i, V_i ← UpdateAssignments(X_i, V_i, X_j, V_j)   ▷ update local assignment in light of one from another agent
  procedure UpdateAssignments(X_ℓ, V_ℓ, X_o, V_o)
    n_diff ← min{n : V_{ℓ,n} ≠ V_{o,n}}
    if V_{ℓ,n_diff} ≥ V_{o,n_diff} then   (agent i's assignments dominate; return)
      return X_ℓ, V_ℓ
    else if B_i ∩ X_o ≠ ∅ then   (other's assignments reflect i's choices)
      return X_o, V_o
    else
      for m ∈ {n_diff, ..., |X_o|} do   (search for a viable assignment position)
        x ← arg max_{x' ∈ B_i} g(x' | X_{o,1:m−1})
        v ← g(x | X_{o,1:m−1})
        if v > V_{o,m} then   (i's bid wins at position m)
          return X_{o,1:m−1} · x, V_{o,1:m−1} · v   (values V_o valid up to m−1)
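The case structure of UpdateAssignments can likewise be sketched in Python. The list conventions and the marginal-gain oracle g(x, prefix) below are assumptions of this sketch, not an interface from the thesis.

    def update_assignments(X_l, V_l, X_o, V_o, B_i, g):
        """Rough sketch of UpdateAssignments from Alg. 7 for agent i.

        X_l, V_l: agent i's assignment and value lists (in assignment order)
        X_o, V_o: the lists received from another agent
        B_i: set of actions available to agent i
        g: marginal-gain oracle, g(x, prefix) = g(x | prefix)
        """
        # First position where the exchanged values differ (n_diff in Alg. 7).
        k = min(len(V_l), len(V_o))
        n = next((j for j in range(k) if V_l[j] != V_o[j]), k)

        # Case 1: agent i's assignments dominate at the first difference.
        if n < len(V_l) and (n == len(V_o) or V_l[n] >= V_o[n]):
            return X_l, V_l
        # Case 2: the other agent's assignments already include a choice by i.
        if any(x in B_i for x in X_o):
            return list(X_o), list(V_o)
        # Case 3: find the first position where agent i's best marginal gain
        # beats the previously winning value and bid there; values after the
        # bid position are invalidated by the new assignment.
        for m in range(n, len(X_o) + 1):
            prefix = X_o[:m]
            x = max(B_i, key=lambda a: g(a, prefix))
            v = g(x, prefix)
            if m == len(X_o) or v > V_o[m]:
                return prefix + [x], V_o[:m] + [v]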
Appendix B
Assorted Proofs

B.1 Proof for representations of derivatives of set functions

Equation (3.16) defines the nth derivative of a set function as

  g(Y_1; ...; Y_n | X) = g(Y_1; ...; Y_{n−1} | X, Y_n) − g(Y_1; ...; Y_{n−1} | X).   (B.1)

This is equivalent to the definition by Foldes and Hammer [70],

  g(Y_1; ...; Y_n | X) = Σ_{Y ⊆ {Y_1,...,Y_n}} (−1)^{n−|Y|} g(X ∪ Ŷ),   Ŷ = ∪_{Y' ∈ Y} Y',   (B.2)

with the exceptions that we define derivatives with respect to sets for convenience (where the extension follows from closure properties described by Foldes and Hammer [70]) and omit set differences by requiring disjoint sets.
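As an aside, the equivalence of (B.1) and (B.2) is easy to check numerically for a random set function over a small ground set; the following sketch (not part of the original text) takes the derivative with respect to singleton sets for simplicity.

    import itertools, random

    random.seed(0)
    ground = list(range(6))
    values = {frozenset(S): random.random()
              for k in range(len(ground) + 1)
              for S in itertools.combinations(ground, k)}
    g = lambda S: values[frozenset(S)]

    def deriv_recursive(Ys, X):
        # Recursive definition (B.1): g(Y1; ...; Yn | X).
        if len(Ys) == 1:
            return g(set(X) | set(Ys[0])) - g(set(X))
        return (deriv_recursive(Ys[:-1], set(X) | set(Ys[-1]))
                - deriv_recursive(Ys[:-1], X))

    def deriv_closed_form(Ys, X):
        # Closed form (B.2) following Foldes and Hammer [70].
        n = len(Ys)
        total = 0.0
        for k in range(n + 1):
            for combo in itertools.combinations(Ys, k):
                total += (-1) ** (n - k) * g(set(X).union(*combo))
        return total

    # Disjoint singleton sets, as required by the statement above.
    Ys, X = [{0}, {1}, {2}], {4}
    print(deriv_recursive(Ys, X), deriv_closed_form(Ys, X))  # values agree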
Proof. The proof follows by expanding the right hand side of (B.1) and combining the resulting expressions. Define the union of a collection of sets as φ(X) = ∪_{X' ∈ X} X'. Then

  g(Y_1; ...; Y_{n−1} | X, Y_n) − g(Y_1; ...; Y_{n−1} | X)
    = Σ_{Y ⊆ {Y_1,...,Y_{n−1}}} (−1)^{n−1−|Y|} g(X ∪ Y_n ∪ φ(Y)) − Σ_{Y ⊆ {Y_1,...,Y_{n−1}}} (−1)^{n−1−|Y|} g(X ∪ φ(Y))
    = Σ_{Y ⊆ {Y_1,...,Y_{n−1}}} (−1)^{n−(|Y|+1)} g(X ∪ Y_n ∪ φ(Y)) + Σ_{Y ⊆ {Y_1,...,Y_{n−1}}} (−1)^{n−|Y|} g(X ∪ φ(Y))
    = Σ_{Y ⊆ {Y_1,...,Y_n}} (−1)^{n−|Y|} g(X ∪ φ(Y)) = g(Y_1; ...; Y_n | X). □

B.2 Proof of Lemma 1, chain rule for derivatives of set functions
Proof. The proof follows by expanding the derivative (3.16), forming a telescoping sum, and rewriting the summands as individual derivatives:

  g(Y_1; ...; Y_n | X) = g(Y_1; ...; Y_{n−1} | Y_n, X) − g(Y_1; ...; Y_{n−1} | X)
    = Σ_{i=1}^{|Y_n|} ( g(Y_1; ...; Y_{n−1} | Y_{n,1:i}, X) − g(Y_1; ...; Y_{n−1} | Y_{n,1:i−1}, X) )
    = Σ_{i=1}^{|Y_n|} g(Y_1; ...; Y_{n−1}; y_{n,i} | Y_{n,1:i−1}, X).   (B.3) □

B.3 Proof of Theorem 3, post-hoc suboptimality of DSGA
Proof.
The proof of the suboptimality bound relating DSGA to SGA incorporates suboptimality of the single-robot planner and is similar to [169] or [7]. We obtain the following by monotonicity and by rearranging the resulting telescoping sum:

  I(M; Y*) ≤ I(M; Y*) + Σ_{i=1}^{n_r} I(M; Y^d_i | Y^d_{1:i−1}, Y*_{i+1:n_r})
           = I(M; Y^d) + Σ_{i=1}^{n_r} I(M; Y*_i | Y^d_{1:i−1}, Y*_{i+1:n_r}).   (B.4)

By submodularity,

  I(M; Y*_i | Y^d_{1:i−1}, Y*_{i+1:n_r}) ≤ I(M; Y*_i | Y^d_{1:i−1}).

Without loss of generality, assume that agent indices correspond to the selection order and rewrite in terms of the planning rounds such that I(M; Y*_i | Y^d_{1:i−1}) = I(M; Y*_{D_{j,k}} | Y^d_{D_{j,1:k−1} ∪ F_{j−1}}), and note that although Y* is formally a set, the mapping from elements to robots can be obtained by the intersections Y* ∩ Y_i given that the sets Y_i are disjoint. Then, by submodularity, I(M; Y*_i | Y^d_{1:i−1}) ≤ I(M; Y*_{D_{j,k}} | Y^d_{F_{j−1}}). By (4.9) and the greedy maximization step in Alg. 3,

  I(M; Y*_{D_{i,j}} | Y^d_{F_{i−1}}) ≤ η I(M; Y^d_{D_{i,j}} | Y^d_{F_{i−1}}).   (B.5)

Substitute (B.5) and the preceding inequalities into (B.4) to obtain

  I(M; Y*) ≤ I(M; Y^d) + η Σ_{i=1}^{n_d} Σ_{j=1}^{|D_i|} I(M; Y^d_{D_{i,j}} | Y^d_{F_{i−1}}).   (B.6)

The distributed objective can be rewritten as a sum so that I(M; Y^d) = Σ_{i=1}^{n_d} Σ_{j=1}^{|D_i|} I(M; Y^d_{D_{i,j}} | Y^d_{D_{i,1:j−1} ∪ F_{i−1}}) and substituted into (B.6) to obtain

  I(M; Y*) ≤ (1 + η) I(M; Y^d) + η Σ_{i=1}^{n_d} Σ_{j=1}^{|D_i|} ( I(M; Y^d_{D_{i,j}} | Y^d_{F_{i−1}}) − I(M; Y^d_{D_{i,j}} | Y^d_{D_{i,1:j−1} ∪ F_{i−1}}) ),   (B.7)

which expresses the suboptimality in terms of decreases in the conditional mutual information from when the plans were first obtained from the planner (Alg. 3, line 7) to when they were assigned (Alg. 3, line 9), potentially after other assignments in the same round. By rewriting mutual information in terms of entropies we can rearrange to obtain the following:

  I(M; Y_1) − I(M; Y_1 | Y_2) = I(Y_1; Y_2) − I(Y_1; Y_2 | M).

If Y_1 and Y_2 are conditionally independent given M, then the mutual information I(Y_1; Y_2 | M) = 0 and so I(M; Y_1) − I(M; Y_1 | Y_2) = I(Y_1; Y_2). By substitution into (B.7) we can obtain the slightly more concise and final expression for the suboptimality in terms of the mutual information between observations:

  I(M; Y*) ≤ (1 + η) I(M; Y^d) + η Σ_{i=1}^{n_d} Σ_{j=1}^{|D_i|} I(Y^d_{D_{i,j}}; Y^d_{D_{i,1:j−1}} | Y^d_{F_{i−1}}).   (B.8) □

B.4 Proof of Theorem 4, worst case suboptimality of DSGA
Proof.
Equation (4.13) follows from Theorem 1 by Grimsman et al. [82], which proves a (k + 1) bound where k is the clique cover number of a directed acyclic graph associated with the planner structure. In this directed graph, each robot represents a vertex, and the graph has a directed edge (a, b) between robots a, b ∈ R if b maximizes its objective (4.7) conditional on the sequence of observations selected by a. Here, a clique, which is a complete subgraph, is analogous to a set of robots that plan sequentially given the choices by all prior robots in the clique. For Alg. 3, any set of robots A ⊆ R with at most one robot from each planning round (|A ∩ D_i| ≤ 1 for i ∈ {1, ..., n_d}) forms a clique in the associated directed graph. A clique cover of size ⌈n_r/n_d⌉ = max_{i ∈ {1,...,n_d}} |D_i| can be obtained by selecting cliques with a single robot (as available) from each planning round (D_1, ..., D_{n_d}) without replacement. Then, (4.13) follows by substitution of (4.9) to obtain a factor of η. □

B.5 Proof of Lemma 11, suboptimality of general assignments
Proof. This result follows a similar approach as the other proofs related to sequential submodular maximization that arise throughout this thesis, with slight deviation to assist in book-keeping:

  g(X*) ≤ g(X^d, X*)   (B.9)
        = g(X^d) + Σ_{i=1}^{n_r} g(x*_i | X*_{1:i−1}, X^d)   (B.10)
        ≤ g(X^d) + Σ_{i=1}^{n_r} g(x*_i | X^d_{1:i−1})   (B.11)
        ≤ g(X^d) + Σ_{i=1}^{n_r} ( g(x^d_i | X^d_{1:i−1}) + γ_i(g, X^d_{1:i−1}) )   (B.12)
        = 2 g(X^d) + Σ_{i=1}^{n_r} γ_i(g, X^d_{1:i−1}).   (B.13)

Above, (B.9) follows from monotonicity; (B.10) expands a telescoping series; (B.11) follows from submodularity; (B.12) upper bounds the incremental values of optimal robot decisions with the incremental value of each actual decision and its suboptimality (6.8); and (B.13) collapses the telescoping series. □

B.6 Proof of Theorem 10, suboptimality of distributed planning for target tracking
Proof.
Theorem 10 consists of two parts, (6.12) and (6.13). We prove each in turn. Since the costs in both equations involve sums over robots, both proofs analyze costs with respect to some robot i ∈ R.

Proof of Theorem 10, part 1 (6.12)

According to the standard greedy algorithm (Alg. 2), robot i would plan conditional on decisions by robots {1, ..., i−1}. However, in Alg. 4 that robot instead plans conditional on a subset of these robots N_i ⊆ {1, ..., i−1} and ignores N̂_i = {1, ..., i−1} \ N_i. We can then write the cost of a suboptimal decision from (6.14) in terms of γ^dist_i as

  γ_i(g, X^d_{1:i−1}) ≤ γ_i(g, X^d_{N_i}) + g(x^d_i | X^d_{N_i}) − g(x^d_i | X^d_{1:i−1})
                     = γ_i(g, X^d_{N_i}) + γ^dist_i,   (B.14)

where the first step follows by substituting the cost model (6.8) and given that max_{x ∈ B_i} g(x | X^d_{1:i−1}) ≤ max_{x ∈ B_i} g(x | X^d_{N_i}) due to submodularity, and the second follows from the definition of the cost of distributed planning (6.9). To incorporate the cost of suboptimal planning γ^plan_i, observe that

  γ_i(g, X^d_{N_i}) = γ_i(g, X^d_{N_i}) + γ^plan_i − γ^plan_i = γ^plan_i + γ_i(g, X^d_{N_i}) − γ_i(g̃_i, X^d_{N_i}),   (B.15)

which follows from the definition of the planning cost (6.11). The cost of approximation of the objective γ^obj_i upper bounds the difference of the last two terms in (B.15):

  γ_i(g, X^d_{N_i}) − γ_i(g̃_i, X^d_{N_i}) = g̃_i(x^d_i | X^d_{N_i}) − g(x^d_i | X^d_{N_i}) + max_{x ∈ B_i} g(x | X^d_{N_i}) − max_{x ∈ B_i} g̃_i(x | X^d_{N_i})
      ≤ g̃_i(x^d_i | X^d_{N_i}) − g(x^d_i | X^d_{N_i}) + g(x̂ | X^d_{N_i}) − g̃_i(x̂ | X^d_{N_i}),  for x̂ ∈ arg max_{x ∈ B_i} g(x | X^d_{N_i})
      ≤ γ^obj_i,   (B.16)

by expanding and rearranging the costs (6.8) on the left-hand side, using an upper bound to match the arguments of the last two terms, and by using the definition of the objective cost (6.10) to bound the two differences. Then, the expression for the costs in (6.12),

  γ_i(g, X^d_{1:i−1}) ≤ γ^obj_i + γ^plan_i + γ^dist_i,   (B.17)

follows by substituting the prior three equations into each other: (B.15) into (B.14) and (B.16) into the result. Finally, substituting this inequality (B.17) into (6.14) from Lemma 11 on the suboptimality of general assignments yields the expression for (6.12), which completes the first part of this proof.

Proof of Theorem 10, part 2 (6.13)

The second part of Theorem 10 (6.13) follows by referring to the definition of γ^dist_i in (6.9), applying the chain rule (3.17), and substituting the bound on second derivatives (6.20) and the definitions of the weights (6.21) and (6.22) in turn:

  γ^dist_i = −g(x^d_i; X^d_{N̂_i} | X^d_{N_i})   (B.18)
           = −Σ_{j ∈ N̂_i} g(x^d_i; x^d_j | X^d_{N_i}, X^d_{N̂_i ∩ {1,...,j−1}})   (B.19)
           ≤ Σ_{j ∈ N̂_i} W(i, j) ≤ Σ_{j ∈ N̂_i} Ŵ(i, j).   (B.20)

Then, (6.13) follows by summing over R. This completes the second and last part of the proof of Theorem 10. □

B.7 Proof of Theorem 12, noiseless mutual information with independent cells is 3-increasing

The following proof takes advantage of cell independence liberally to write mutual information in terms of the expected entropy of the cells that the robots will observe.
Proof.
We can write the mutual information (3.6) between the environment E and future observations Y(X) in terms of entropies:

  I(E; Y(X)) = H(E) − H(E | Y(X)).   (B.21)

The conditional entropy can then be rewritten in terms of the expected entropy (3.4) given the direct observations of cell occupancy (7.3) associated with a hypothetical instantiation of the environment E′, while abbreviating observed cells as C′ = C_cov(X, E′):

  = H(E) − E_{E′ ∼ E_guess}[ H(E | E_{C′} = E′_{C′}) ],   (B.22)

and that conditional entropy is simply the entropy of the cells that have not yet been observed, D′ = C \ C′, due to independence:

  = H(E) − E_{E′ ∼ E_guess}[ H(E_{D′}) ].   (B.23)

Then, bringing the entropy of E into the expectation does not change its value, and separating the independent observed and unobserved cells simplifies the expression:

  = E_{E′ ∼ E_guess}[ H(E_{C′}) + H(E_{D′}) − H(E_{D′}) ]   (B.24)
  = E_{E′ ∼ E_guess}[ H(E_{C′}) ].   (B.25)

Finally, the joint entropy of the cells the robot will observe is the sum of their individual entropies:

  I(E; Y(X)) = E_{E′ ∼ E_guess}[ Σ_{i ∈ C_cov(X, E′)} H(c_i) ].   (B.26)

This expresses a weighted expected coverage objective (7.8) where the weight w_cell(i) of each cell i ∈ C is equal to its entropy H(c_i). Observing that weighted expected coverage is 3-increasing (Sec. 7.3.1a) completes the proof. □

Bibliography

[1] Ian Abraham, Anastasia Mavrommati, and Todd D Murphey. Data-driven measurement models for active localization in sparse environments. In Proc. of Robot.: Sci. and Syst. , Pittsburgh, PA, 2018. 3.3.1
[2] Aaron D. Ames, Xiangru Xu, Jessy W. Grizzle, and Paulo Tabuada. Control barrier function based quadratic programs for safety critical systems.
IEEE Transactions onAutomatic Control , 62(8):3861–3876, 2016. 2.1.1d[3] Gunnar Andersson. An approximation algorithm for Max p -Section. In AnnualSymposium on Theoretical Aspects of Computer Science , Trier, Germany, 1999. 5.5.2[4] Aristotle Arapostathis, Vivek S. Borkar, Emmanuel Fern´andez-Gaucherand, Mri-nal K. Ghosh, and Steven I. Marcus. Discrete-time controlled markov processes withaverage cost criterion: a survey.
SIAM Journal on Control and Optimization , 31(2):282–344, 1993. 2.2.1, 2.2.2[5] Mauricio Araya, Olivier Buffet, Vincent Thomas, and Fran¸ccois Charpillet. A pomdpextension with belief-dependent rewards. In
Advances in neural information process-ing systems , pages 64–72, 2010. 3.3.3[6] Arash Asadpour and Hamid Nazerzadeh. Maximizing stochastic monotone submod-ular functions.
Management Science , 62(8):2374–2391, 2015. 1, 3.3.3[7] Nikolay A. Atanasov, Jerome Le Ny, Kostas Daniilidis, and George J. Pappas. De-centralized active information acquisition: Theory and application to multi-robotSLAM. In
Proc. of the IEEE Intl. Conf. on Robot. and Autom. , Seattle, WA, May2015. 1.2, 3.1.2, 3.3.1, 3.4.1, 4.1, 6.1, 7.8.1, 8, B.3[8] Elif Ayvali, Hadi Salman, and Howie Choset. Ergodic coverage in constrained envi-ronments using stochastic trajectory optimization. In
Proc. of the IEEE/RSJ Intl.Conf. on Intell. Robots and Syst. , Vancouver, Canada, September 2017. 2.2.2[9] Francis Bach. Learning with submodular functions: A convex optimization perspec-tive.
Foundations and Trends in Machine Learning , 6(2-3), 2013. 5.6, 3[10] Ruzena Bajcsy. Active perception.
Proceedings of the IEEE , 76(8):966–1005, 1988.3.3.1[11] Eric Balkanski and Yaron Singer. The adaptive complexity of maximizing a sub-modular function. In
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages 1138–1151, 2018. 3.6.3, 3.1, 9 [12] Eric Balkanski, Aviad Rubinstein, and Yaron Singer. An optimal approximation for submodular maximization under a matroid constraint in the adaptive complexity model. In
Proceedings of the 51st Annual ACM SIGACT Symposium on Theory ofComputing , pages 66–77, 2019. 3.6.3, 8, 3.1, 3.6.5[13] Jacopo Banfi, Nicola Basilico, and Francesco Amigoni. Multirobot reconnection ongraphs: Problem, complexity, and algorithms.
IEEE Trans. Robotics , 34(5):1299–1314, 2018. 3.1.6[14] Jacopo Banfi, Alberto Quattrini Li, Ioannis Rekleitis, Francesco Amigoni, and NicolaBasilico. Strategies for coordinated multirobot exploration with recurrent connectiv-ity constraints.
IEEE Trans. Robotics , 42(4):875–894, 2018. 3.1.6[15] Rafael Barbosa, Alina Ene, Huy Nguyen, and Justin Ward. The power of random-ization: Distributed submodular maximization on massive datasets. In
InternationalConference on Machine Learning , pages 1236–1244, 2015. 3.6.4[16] Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. A newframework for distributed submodular maximization. In
Proc. of the IEEE Annu.Symp. Found. Comput. Sci. , New Brunswick, NJ, October 2016. 3.6.4[17] Jose A. J. Berni, Pablo J. Zarco-Tejada, Lola Su´arez, and Elias Fereres. Thermaland narrowband multispectral remote sensing for vegetation monitoring from anunmanned aerial vehicle.
IEEE Trans. Robotics , 47(3):722–738, 2009. 1[18] Graeme Best, Oliver M. Cliff, Timothy Patten, Ramgopal R. Mettu, and RobertFitch. Dec-MCTS: Decentralized planning for multi-robot active perception.
Intl.Journal of Robotics Research , 38(2-3):316–337, 2019. 3.4, 4.3, 8[19] Andrew An Bian, Baharan Mirzasoleiman, Joachim Buhmann, and Andreas Krause.Guaranteed non-convex optimization: Submodular maximization over continuousdomains. In
Proc. of the Intl. Conf. on Artif. Intell. and Stat. , Ft. Lauderdale, FL,2017. 9.1.1[20] Andreas Bircher, Mina Kamel, Kostas Alexis, Helen Oleynikova, and Roland Sieg-wart. Receding horizon path planning for 3D exploration and surface inspection.
Auton. Robots , 42(2):291–306, 2018. 2.1.2, 3.1.3, 3.4.1b, 7.3, A.4[21] Jeannette Bohg, Karol Hausman, Bharath Sankaran, Oliver Brock, Danica Kragic,Stefan Schaal, and Gaurav S. Sukhatme. Interactive perception: Leveraging action inperception and perception in action.
IEEE Trans. Robotics , 33(6):1273–1291, 2017.3.3.1[22] Rogerio Bonatti, Yanfu Zhang, Sanjiban Choudhury, Wenshan Wang, and SebastianScherer. Autonomous drone cinematographer: Using artistic principles to createsmooth, safe, occlusion-free trajectories for aerial filming. In
Proc. of the Intl. Sym.on Exp. Robot. , 2018. 1[23] Terry Bossomaier, Lionel Barnett, Michael Harr´e, and Joseph T Lizier.
An introduction to transfer entropy . Springer, 2016. 2.2.3[24] Adam Breuer, Eric Balkanski, and Yaron Singer. The FAST algorithm for submodular maximization. arXiv preprint arXiv:1907.06173 , 2019. 3.6.3[25] Rodney A Brooks. Intelligence without reason. 1991. 1[26] Robin Brown, Federico Rossi, Kiril Solovey, Michael T. Wolf, and Marco Pavone. Exploiting locality and structure for distributed optimization in multi-agent systems. In
Proc. of the Euro. Control Conf. , St. Petersburg, Russia, 2020. 6[27] Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowl-ing, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, andSimon Colton. A survey of Monte Carlo tree search methods.
IEEE Trans. onComput. Intell. and AI in Games , 4(1):1–43, 2012. 4.3, 6.4.4, 7.1, 8.4.2[28] Francesco Bullo.
Lectures on Network Systems . CreateSpace, 2018. 3.6.5[29] Wolfram Burgard, Mark Moors, Cyrill Stachniss, and Frank E. Schneider. Coordi-nated multi-robot exploration.
IEEE Trans. Robotics , 21(3):376–386, 2005. 3.1.3,3.1.5[30] Jonathan Butzke and Maxim Likhachev. Planning for multi-robot exploration withmultiple objective utility functions. In , pages 3254–3259. IEEE, 2011. 3.1.5, 7.3, 7.3.3a, 7.4[31] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondr´ak. Maximizing amonotone submodular function subject to a matroid constraint.
SIAM J. Comput. ,40(6):1740–1766, 2011. 3.5.3b, 3.6.3, 8, 3.6.5, 7.8.1, 9.1.1[32] Luca Carlone and Sertac Karaman. Attention and anticipation in fast visual-inertialnavigation.
IEEE Trans. Robotics , 35(1):1–20, 2018. 3.5.3a[33] Benjamin Charrow.
Information-Theoretic Active Perception for Multi-Robot Teams .PhD thesis, University of Pennsylvania, 2015. 4.6.1b[34] Benjamin Charrow, Vijay Kumar, and Nathan Michael. Approximate representationsfor multi-robot control policies that maximize mutual information.
Auton. Robots ,37(4):383–400, 2014. 4.1, 6.1[35] Benjamin Charrow, Nathan Michael, and Vijay Kumar. Cooperative multi-robot es-timation and control for radio source localization.
Intl. Journal of Robotics Research ,33(4):569–580, April 2014. 3.3.1, 6.1, A.2.1[36] Benjamin Charrow, Gregory Kahn, Sachini Patil, Sikang Liu, Ken Goldberg, PieterAbbeel, Nathan Michael, and Vijay Kumar. Information-theoretic planning withtrajectory optimization for dense 3D mapping. In
Proc. of Robot.: Sci. and Syst. ,Rome, Italy, July 2015. 1, 2.1.2, 3.1.3, 6.4.4, 7.3, 7.4[37] Benjamin Charrow, Sikang Liu, Nathan Michael, and Vijay Kumar. Information-theoretic mapping using Cauchy-Schwarz quadratic mutual information. In
Proc. ofthe IEEE Intl. Conf. on Robot. and Autom. , Seattle, WA, May 2015. 3.1, 3.1.3, 3.3.1,4.2.2, 4.2.2, 4.4.5, 6, 7, 7.3, 7.3.2, 7.3.2a, 7.3.3b, A.4, A.4.1, 7, 8[38] Benjamin Charrow, Nathan Michael, and Vijay Kumar. Active control strategies fordiscovering and localizing devices with range-only sensors. In
Algorithmic Found. Robot. , pages 55–71, 2015. 3.4.1b, 8 [39] Guillaume Chaslot.
Monte-Carlo Tree Search . PhD thesis, Universiteit Maastricht,2010. 3.1.3, 4.3, 6.4.4, 6.7, 8.4.2[40] Nesrine Mahdoui Chedly.
Communicating multi-UAV system for cooperative SLAM-based exploration . PhD thesis, Universit´e de Technologie de Compi`egne, 2018. 3.1.5[41] Chandra Chekuri and Pal Martin. A recursive greedy algorithm for walks in directedgraphs. In
Proc. of the IEEE Annu. Symp. Found. Comput. Sci. , pages 245–253,2005. 2, 2.1.1c, 2.1.2a, 3.3.2, 4.3, 6.1, 6.4.4[42] Chandra Chekuri and Kent Quanrud. Parallelizing greedy for submodular set func-tion maximization in matroids and beyond. In
Proceedings of the 51st Annual ACMSIGACT Symposium on Theory of Computing , pages 78–89, 2019. 3.6.3[43] Chandra Chekuri, Nitish Korula, and Martin P´al. Improved algorithms for orien-teering and related problems.
ACM Transactions on Algorithms (TALG) , 8(3):1–27,2012. 2.1.1c, 2.1.2a[44] Wei Chen, Qiang Li, Xiaohan Shan, Xiaoming Sun, and Jialin Zhang. Higher ordermonotonicity and submodularity of influence in social networks: from local to global. arXiv preprint arXiv:1803.00666 , 2018. 3.5.1d, 5.1.1, 5.6, 5.6[45] Yuxin Chen, Shervin Javdani, Amin Karbasi, J. Andrew Bagnell, Siddhartha S.Srinivasa, and Andreas Krause. Submodular surrogates for value of information. In
AAAI , pages 3511–3518, 2015. 2.1.1b, 3.3.3, A.1, A.1[46] Han-Lim Choi, Luc Brunet, and Jonathan P. How. Consensus-based decentralizedauctions for robust task allocation.
IEEE Trans. Robotics , 25(4):912–926, 2009. 3.6.5,3.6.5a, 7.4, 8, 8.4.1, A.6, 9[47] Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, and De-badeepta Dey. Adaptive information gathering via imitation learning. In
Proc. ofRobot.: Sci. and Syst. , 2017. 2.1.1c[48] Sanjiban Choudhury, Mohak Bhardwaj, Sankalp Arora, Ashish Kapoor, GireejaRanade, Sebastian Scherer, and Debadeepta Dey. Data-driven planning via imita-tion learning.
Intl. Journal of Robotics Research , 37(13-14):1632–1672, 2018. 7.3.3c,7.3.3d, A.1, 2[49] Timothy H. Chung, Geoffrey A. Hollinger, and Volkan Isler. Search and pursuit-evasion in mobile robotics.
Auton. Robots , 31(4):299, 2011. 1[50] Titus Cieslewski, Elia Kaufmann, and Davide Scaramuzza. Rapid exploration withmulti-rotors: A frontier selection method for high speed flight. In
Proc. of theIEEE/RSJ Intl. Conf. on Intell. Robots and Syst. , Vancouver, Canada, September2017. IEEE. 3.1.4[51] Andrew Clark and Radha Poovendran. A submodular optimization framework forleader selection in linear multi-agent systems. In , pages 3614–3621. IEEE,2011. 3.5.3a[52] Oliver M. Cliff, Robert Fitch, Salah Sukkarieh, Debra L. Saunders, and Robert160einsohn. Online localization of radio-tagged wildlife with an autonomous aerialrobot system. In
Proc. of Robot.: Sci. and Syst. , Rome, Italy, July 2015. 6.1[53] Micah Corah and Nathan Michael. Active estimation of mass properties for safecooperative lifting. In
Proc. of the IEEE Intl. Conf. on Robot. and Autom. , Singapore,May 2017. IEEE. 3.3.1, 8.3[54] Micah Corah and Nathan Michael. Efficient online multi-robot exploration via dis-tributed sequential greedy assignment. In
Proc. of Robot.: Sci. and Syst. , Cambridge,MA, 2017. 4[55] Micah Corah and Nathan Michael. Distributed submodular maximization on parti-tion matroids for planning on large sensor networks. In
Proc. of the IEEE Conf. onDecision and Control , Miami, FL, December 2018. 5, 1[56] Micah Corah and Nathan Michael. Distributed matroid-constrained submodularmaximization for multi-robot exploration: theory and practice.
Auton. Robots , 43(2):485–501, 2019. 4[57] Micah Corah, Cormac O’Meadhra, Kshitij Goel, and Nathan Michael.Communication-efficient planning and mapping for multi-robot exploration in largeenvironments.
IEEE Robot. Autom. Letters , 4(2):1715–1721, 2019. 1.2, 3.1.4, 3.1.5,4, 7, 7.4, 7.4, 14, 7.7, 8.2.2, 8.2.4, 8.4.2, 11[58] T. M. Cover and J. A. Thomas.
Elements of Information Theory . John Wiley &Sons, New York, NY, 2012. 2.2.3, 3.2, 3.2.2, 5.2, 6.2, 6.2.1, 6.2.1, 6.2.2, 6.7, A.2.1[59] Philip Dames and Vijay Kumar. Autonomous localization of an unknown number oftargets without data association using teams of mobile sensors.
IEEE Transactionson Automation Science and Engineering , 12(3):850–864, 2015. 3.3.1[60] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on largeclusters.
Communications of the ACM , 51(1):107–113, 2008. 10[61] Robin Deits and Russ Tedrake. Efficient mixed-integer planning for uavs in clutteredenvironments. In
Proc. of the IEEE Intl. Conf. on Robot. and Autom. , pages 42–49.IEEE, 2015. 1[62] Jeffrey Delmerico, Stefan Isler, Reza Sabzevari, and Davide Scaramuzza. A com-parison of volumetric information gain metrics for active 3D object reconstruction.
Auton. Robots , 42(2):197–208, 2018. 3.1.3, 3.1.4, 7.3, A.4[63] Mihir Dharmadhikari, Tung Dang, Lukas Solanka, Johannes Loje, Huan Nguyen,Nikhil Khedekar, and Kostas Alexis. Motion primitives-based path planning for fastand agile exploration using aerial robots. In
Proc. of the IEEE Intl. Conf. on Robot.and Autom. , Paris, France, May 2020. 7.3.2a[64] Alberto Elfes. Using occupancy grids for mobile robot perception and navigation.
IEEE Computer Society , 22(6):46–57, 1989. 4.2.2[65] FAA. FAA aerospace forecast: Fiscal years 2019–2039. , 2019. 1 [66] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM) , 45(4):634–652, 1998. 3.5.3a, 5.1.1[67] Yuval Filmus and Justin Ward. A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. In
Proc. of the IEEE Annu. Symp.Found. Comput. Sci. , New Brunswick, NJ, October 2012. 3.5.3b[68] Conor Finn. A new framework for decomposing multivariate information. 2020.3.5.1d[69] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximations formaximizing submodular set functions-II.
Polyhedral Combinatorics , 8:73–87, 1978.1.2, 1.4, 2, 3.5.3b, 4.1, 4.5.1, 6.1, 6.5.1, 9[70] Stephan Foldes and Peter L. Hammer. Submodularity, supermodularity, and higher-order monotonicities of pseudo-boolean functions.
Mathematics of Operations Re-search , 30(2):453–461, 2005. 3.5.1c, 3.5.1d, 5.1.1, 5.1.1, 5.6, 6.5.2a, 7.3.1a, B.1, B.1[71] David Fridovich-Keil, Jaime F. Fisac, and Claire J. Tomlin. Safe and complete real-time planning and exploration in unknown environments. In
Proc. of the IEEE Intl.Conf. on Robot. and Autom. , Montreal, Canada, May 2019. 3.1.1[72] Vibhav Nagaraj Ganesh. Robust distributed 3D mapping with communication con-straints. Technical Report CMU-RI-TR-17-32, Carnegie Mellon University, Pitts-burgh, PA, May 2017. 1.2[73] Stephen M. George, Wei Zhou, Harshavardhan Chenji, Myounggyu Won, Yong OhLee, Andria Pazarloglou, Radu Stoleru, and Prabir Barooah. DistressNet: a wire-less ad hoc and sensor network architecture for situation management in disasterresponse.
IEEE Communications Magazine , 48(3):128–136, 2010. 1.1[74] Shayan Oveis Gharan and Jan Vondr´ak. Submodular maximization by simulatedannealing. In
Proc. of the Symposium on Discrete Algorithms , Philadelphia, PA,January 2011. 4.4.4[75] Bahman Gharesifard and Stephen L. Smith. Distributed submodular maximizationwith limited information.
IEEE Trans. Control Netw. Syst. , 2017. to appear. 1.2,3.6.3, 3.1, 3.6.5, 3.6.5c, 3.6, 4.4.3, 5, 5.3, 5.3, 5.8, 6.4.2, 9[76] Kshitij Goel, Micah Corah, Curtis Boirum, and Nathan Michael. Fast explorationusing multirotors: Analysis, planning, and experimentation. In
Field and ServiceRobotics , Tokyo, Japan, August 2019. 1, 1.2, 2.1.1a, 3.1.1, 3.1.3, 3.3.3, 4, 7.3.2a[77] Kshitij Goel, Micah Corah, and Nathan Michael. Fast exploration using multiro-tors: Analysis, planning, and experimentation. Technical Report CMU-RI-TR-19-03, Carnegie Mellon University, Pittsburgh, PA, February 2019. 1, 1.2, 2.1.1d, 3.1.1,3.1.3, 3.3.3[78] Michel Goemans and Jan Vondr´ak. Stochastic covering and adaptivity. In
LatinAmerican symposium on theoretical informatics , pages 532–543. Springer, 2006. 3.3.3[79] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applica-tions in active learning and stochastic optimization.
Journal of Artificial Intelligence esearch , 42:427–486, 2011. 1, 2, 2.1, 2, 2.1.1b, 2.1.1c, 2.1.1f, 2.1.2, 2.1.3, 3.3.3,3.5.3e, 7.3.1a, 7.6.1, 7.7.2, A.1, A.1, A.1[80] Daniel Golovin, Andreas Krause, and Debajyoti Ray. Near-optimal bayesian activelearning with noisy observations. In
Adv. in Neural Inf. Process. Syst. , pages 766–774,2010. 2.1.1b, 1[81] Pranava R. Goundan and Andreas S. Schulz. Revisiting the greedy approach tosubmodular set function maximization.
Optim. online , (1984):1–25, 2007. 2, 3.6.5b[82] David Grimsman, Mohd Shabbir Ali, Joao P. Hespanha, and Jason R. Marden. Theimpact of information in greedy submodular maximization.
IEEE Transactions onControl of Network Systems , 2018. 1.2, 3.5.3d, 3.6.3, 3.1, 3.6.5, 3.6.5c, 3.6, 4.4.3, 5,5.1.1, 5.3, 5.8, 6.4.2, 9, B.4[83] David Grimsman, Jo˜ao P. Hespanha, and Jason R. Marden. Strategic informationsharing in greedy submodular maximization. In
Proc. of the IEEE Conf. on Decisionand Control , pages 2722–2727, Miami, FL, December 2018. IEEE. 1.2[84] Anupam Gupta and Roie Levin. Fully-dynamic submodular cover with boundedrecourse. arXiv preprint arXiv:2009.00800 , 2020. 3.5.1d, 5.8[85] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. Approximation algorithmsfor optimal decision trees and adaptive tsp problems.
Mathematics of OperationsResearch , 42(3):876–896, 2017. A.1[86] Panu Harmo, Tapio Taipalus, Jere Knuuttila, Jos´e Vallet, and Aarne Halme. Needsand solutions-home automation and service robots for the elderly and disabled. In
Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst. , pages 3201–3206.IEEE, 2005. 1[87] Theia Henderson, Vivienne Sze, and Sertac Karaman. An efficient and continuousapproach to information-theoretic exploration. In
Proc. of the IEEE Intl. Conf. onRobot. and Autom. , May 2020. 3, 7, 7.3, 7.3.2, 7.3.2a, 9.1.2, A.4[88] Trevor Henderson.
A continuous approach to information-theoretic exploration withrange sensors . PhD thesis, Massachusetts Institute of Technology, 2019. 7, 7.3.3b,A.4.1[89] Geoffrey Hollinger, Sanjiv Singh, Joseph Djugash, and Athanasios Kehagias. Efficientmulti-robot search for a moving target.
Intl. Journal of Robotics Research , 28(2):201–219, 2009. 1.2, 3.4.1, 6.1[90] Geoffrey A. Hollinger and Gaurav S. Sukhatme. Sampling-based robotic informationgathering algorithms.
Intl. Journal of Robotics Research , 33(9):1271–1287, 2014.2.1.2a, 3.3.2, 6.1, 6.4.4[91] Geoffrey A. Hollinger, Brendan Englot, Franz Hover, Urbashi Mitra, and Gaurav S.Sukhatme. Active planning for underwater inspection and the benefit of adaptivity.
Intl. Journal of Robotics Research , 32(1):3–18, 2013. 1, 3.3.3[92] Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and WolframBurgard. OctoMap: An efficient probabilistic 3D mapping framework based on oc-163rees.
Auton. Robots , 34(3):189–206, 2013. A.3[93] Vadim Indelman, Luca Carlone, and Frank Dellaert. Planning under uncertainty inthe continuous domain: a generalized belief space approach. In
Proc. of the IEEEIntl. Conf. on Robot. and Autom. , pages 6763–6770. IEEE, 2014. 1.2, 3.1.2, 3.3.1,6.4.4[94] Rishabh Iyer, Ninad Khargoankar, Jeff Bilmes, and Himanshu Asanani. Submodularcombinatorial information measures with applications in machine learning. arXivpreprint arXiv:2006.15412 , 2020. 3.5.1d[95] Lucas Janson, Tommy Hu, and Marco Pavone. Safe motion planning in unknownenvironments: Optimality benchmarks and tractable policies. In
Proc. of Robot.:Sci. and Syst. , Pittsburgh, PA, 2018. 2.1.1d, 3.1.1[96] Shervin Javdani, Matthew Klingensmith, J. Andrew Bagnell, Nancy S. Pollard, andSiddhartha S. Srinivasa. Efficient touch based localization through submodularity.In
Proc. of the IEEE Intl. Conf. on Robot. and Autom. , Karlsruhe, Germany, May2013. 3, 3.3.3, A.1, A.1, A.1[97] Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, Drew Bagnell, andSiddhartha Srinivasa. Near optimal bayesian active learning for decision making. In
Artificial Intelligence and Statistics , pages 430–438, 2014. A.1[98] Luke B. Johnson, Han-Lim Choi, Sameera S. Ponda, and Jonathan P. How. Decen-tralized task allocation using local information consistency assumptions.
J. Aero.Inf. Syst. , 14(2):103–122, 2017. 3.6.5a, 5[99] Heather Jones.
Using Planned View Trajectories to Build Good Models of Plane-tary Features under Transient Illumination . PhD thesis, Carnegie Mellon University,Pittsburgh, PA, May 2016. 7.2[100] Heather Jones, Wennie Tabib, and William L. Whittaker. Planning views to modelplanetary pits under transient illumination. In
IEEE Aerospace Conference . IEEE,2015. 7.2[101] Stefan Jorgensen, Robert H. Chen, Mark B. Milam, and Marco Pavone. The teamsurviving orienteers problem: Routing robots in uncertain environments with survivalconstraints. In , pages 227–234. IEEE, 2017. 3.5.3a[102] Stefan Jorgensen, Robert H. Chen, Mark B. Milam, and Marco Pavone. The matroidteam surviving orienteers problem: Constrained routing of heterogeneous teams withrisky traversal. In
Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst. ,Vancouver, Canada, September 2017. 1.2, 3.4.1, 5, 6.1[103] Brian J. Julian, Sertac Karaman, and Daniela Rus. On mutual information-basedcontrol of range sensing robots for mapping applications.
Intl. Journal of RoboticsResearch , 33(10):1357–1392, 2014. 3.1, 3.1.3, 3.3.1, 4.2.2, 6, 7, 7.3, 7.3.2, 7.3.2, 7.3.3b,A.4, 8[104] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning164nd acting in partially observable stochastic domains.
Artificial intelligence , 101(1-2):99–134, 1998. 2.1.1f[105] Richard M. Karp. Reducibility among combinatorial problems. In
Complexity ofcomputer computations , pages 85–103. Springer, 1972. 5.5.1[106] Nitish Korula, Vahab Mirrokni, and Morteza Zadimoghaddam. Online submodu-lar welfare maximization: Greedy beats 1/2 in random order.
SIAM Journal onComputing , 47(3):1056–1086, 2018. 3.5.1d, 5.1.1, 5.8[107] Andreas Krause and Carlos E. Guestrin. Near-optimal nonmyopic value of infor-mation in graphical models. In
Proc. of the Conf. on Uncertainty in Artif. Intell. ,Edinburgh, Scotland, 2005. 3.5.1a, 3.5.3a, 6.2.1[108] Andreas Krause, Carlos Guestrin, Anupam Gupta, and Jon Kleinberg. Near-optimalsensor placements: Maximizing information while minimizing communication cost.In
Proceedings of the 5th international conference on Information processing in sensornetworks , pages 2–10. ACM, 2006. 6[109] Andreas Krause, Jure Leskovec, Carlos Guestrin, Jeanne VanBriesen, and ChristosFaloutsos. Efficient sensor placement optimization for securing large water distri-bution networks.
Journal of Water Resources Planning and Management , 134(6):516–526, 2008. 3.5.3e, 7.6.1, 7.7.2[110] Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-optimal sensor placementsin Gaussian processes: Theory, efficient algorithms and empirical studies.
J. Mach.Learn. Res. , 9:235–284, 2008. 4.1[111] Miroslav Kulich, Jan Faigl, and Libor Pˇreuˇcil. On distance utility in the explorationtask. In
Proc. of the IEEE Intl. Conf. on Robot. and Autom. , Shanghai, China, May2011. 3.1.4[112] Rajiv Ranjan Kumar, Pradeep Varakantham, and Akshat Kumar. Decentralizedplanning in stochastic environments with submodular rewards. In
AAAI , pages 3021–3028, 2017. 3.3.3, 3.6.5b[113] Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fastgreedy algorithms in mapreduce and streaming.
ACM Transactions on Parallel Com-puting (TOPC) , 2(3):1–22, 2015. 3.6.4[114] Richard E. Ladner and Michael J. Fischer. Parallel prefix computation.
Journal ofthe ACM (JACM) , 27(4):831–838, 1980. 4.4.2, 4.4.5[115] Jean-Philippe Lang. Open-mesh. , 2020. Accessed: 08-07-2020. 8.1.3[116] Mikko Lauri and Risto Ritala. Planning for robotic exploration based on forwardsimulation.
Robot. Auton. Syst. , 83:15–31, 2016. 2.1.2, 3.1.1, 3.1.3, 3.3.3, 3.4.1b,4.2.3, 4.3, 6.1, 6.4.4[117] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne Van-Briesen, and Natalie Glance. Cost-effective outbreak detection in networks. In
Pro-ceedings of the 13th ACM SIGKDD international conference on Knowledge discovery nd data mining , pages 420–429, 2007. 7.6.1, 7.7.2[118] Peter Zhi Xuan Li, Zhengdong Zhang, Sertac Karaman, and Vivienne Sze. High-throughput computation of Shannon mutual information on chip. In
Proc. of Robot.:Sci. and Syst. , Freiburg, Germany, 2019. 7, 7.3.2a[119] Wenzheng Li, Paul Liu, and Jan Vondrak. A polynomial lower bound on adaptivecomplexity of submodular maximization. arXiv preprint arXiv:2002.09130 , 2020.3.6.3[120] Sikang Liu, Michael Watterson, Sarah Tang, and Vijay Kumar. High speed navigationfor quadrotors with limited onboard sensing. In
Proc. of the IEEE Intl. Conf. onRobot. and Autom. , Stockholm, Sweden, May 2016. 1, 2.1.1d[121] Wenhao Luo and Ashish Kapoor. Airborne collision avoidance systems with prob-abilistic safety barrier certificates. In
Proceedings of Workshop on Safety and Ro-bustness in Decision-making at the Thirty-third Conference on Neural InformationProcessing Systems (NeurIPS 2019) , December 2019. 2.1.1d, 8.2.5d[122] Wenhao Luo, Shehzaman S. Khatib, Sasanka Nagavalli, Nilanjan Chakraborty, andKatia Sycara. Distributed knowledge leader selection for multi-robot environmentalsampling under bandwidth constraints. In
Proc. of the IEEE/RSJ Intl. Conf. onIntell. Robots and Syst. , pages 5751–5757, Daejon, Korea, October 2016. IEEE. A.6[123] Wenhao Luo, Changjoo Nam, George Kantor, and Katia Sycara. Distributed envi-ronmental modeling and adaptive sampling for multi-robot sensor coverage. In
Proc.of the International Conference on Autonomous Agents and Multi-Agent Systems ,pages 1488–1496, 2019. 1.2[124] Nancy A. Lynch.
Distributed algorithms . Elsevier, 1996. 8.1.1, 8.2.1[125] Robert Mahony, Vijay Kumar, and Peter Corke. Multirotor aerial vehicles: Model-ing, estimation, and control of quadrotor.
IEEE Robot. Autom. Mag. , 19(3):20–32,September 2012. 4.6.2a[126] Jason R. Marden. The role of information in multiagent coordination. In
Proc. ofthe IEEE Conf. on Decision and Control , 2015. 3.6.5b[127] George Mathew and Igor Mezi´c. Metrics for ergodicity and design of ergodic dynamicsfor multi-agent systems.
Physica D: Nonlinear Phenomena , 240(4-5):432–442, 2011.2.2.2[128] Nathan Michael, Michael M. Zavlanos, Vijay Kumar, and George J. Pappas. Main-taining connectivity in mobile robot networks. In
Experimental Robotics , pages 117–126. Springer, 2009. 3.1.6[129] Lauren M. Miller, Yonatan Silverman, Malcolm A. MacIver, and Todd D. Murphey.Ergodic exploration of distributed information.
IEEE Trans. Robotics , 32(1):36–52,2015. 2.2.2[130] Stephen Miller, Jur Van Den Berg, Mario Fritz, Trevor Darrell, Ken Goldberg, andPieter Abbeel. A geometric approach to robotic laundry folding.
Intl. Journal ofRobotics Research , 31(2):249–267, 2012. 1166131] David L. Mills. Internet time synchronization: the network time protocol.
IEEETransactions on communications , 39(10):1482–1493, 1991. 8.2.1[132] Michel Minoux. Accelerated greedy algorithms for maximizing submodular set func-tions. In
Optimization techniques , pages 234–243. Springer, 1978. 3.5.3e, 7.6.1[133] Baharan Mirzasoleiman, Rik Sarkar, and Andreas Krause. Distributed submodularmaximization: Identifying representative elements in massive data. In
Adv. in NeuralInf. Process. Syst. , Stateline, Nevada, December 2013. 3.6.4, 4.4.3[134] Derek Mitchell and Nathan Michael. Persistent multi-robot mapping in an uncertainenvironment. In
Proc. of the IEEE Intl. Conf. on Robot. and Autom. , Montreal,Canada, May 2019. 3.1.5, 7, 7.4, 7.8.1[135] Aryan Mokhtari, Hamed Hassani, and Amin Karbasi. Decentralized submodularmaximization: Bridging discrete and continuous settings. In
Proc. of the Intl. Conf.on Machine Learning , 2018. 3.6.5[136] Robin R. Murphy and Alexander Kleiner. A community-driven roadmap for theadoption of safety security and rescue robots. In
IEEE Intl. Symp. on Safety, Secu-rity, and Rescue Robotics , pages 1–5. IEEE, 2013. 1.1[137] Robin R. Murphy, Satoshi Tadokoro, Daniele Nardi, Adam Jacoff, Paolo Fiorini,Howie Choset, and Aydan M. Erkmen. Search and rescue robotics.
Springer handbookof robotics , pages 1151–1173, 2008. 1.1, 3.1.2[138] Robin R. Murphy, Satoshi Tadokoro, and Alexander Kleiner. Disaster robotics.
Springer Handbook of Robotics , pages 1577–1604, 2016. 1, 1.1, 3.1.6, 8.1.3[139] Erik Nelson, Micah Corah, and Nathan Michael. Environment model adaptation formobile robot exploration.
Auton. Robots , 42(2):257–272, 2018. 3.1.3[140] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations formaximizing submodular set functions-I.
Math. Program. , 14(1):265–294, 1978. 1.2,3.5.3a[141] George L. Nemhauser and Laurence A. Wolsey. Best algorithms for approximatingthe maximum of a submodular set function.
Mathematics of operations research , 3(3):177–188, 1978. 3.5.3a[142] Gennaro Notomista, Sebastian F. Ruf, and Magnus Egerstedt. Persistification ofrobotic tasks using control barrier functions.
IEEE Robot. Autom. Letters , 3(2):758–763, 2018. 2.1.1d[143] Reza Olfati-Saber and Richard M. Murray. Consensus problems in networks of agentswith switching topology and time-delays.
IEEE Transactions on automatic control ,49(9):1520–1533, 2004. 3.6.5[144] Timothy Patten.
Active Object Classification from 3D Range Data with MobileRobots . PhD thesis, The University of Sydney, 2017. 4.3[145] Chiara Piacentini, Sara Bernardini, and J. Christopher Beck. Autonomous targetsearch with multiple coordinated UAVs.
Journal of Artificial Intelligence Research ,65:519–568, 2019. 6.1 167146] Sameera S. Ponda.
Robust distributed planning strategies for autonomous multi-agent teams. PhD thesis, Massachusetts Institute of Technology, 2012. 3.6.5a, 8
[147] Stephen Prajna and Ali Jadbabaie. Safety verification of hybrid systems using barrier certificates. In International Workshop on Hybrid Systems: Computation and Control, pages 477–492. Springer, 2004. 2.1.1d
[148] Morgan Quigley, Brian Gerkey, Ken Conley, Josh Faust, Tully Foote, Jeremy Leibs, Eric Berger, Rob Wheeler, and Andrew Ng. ROS: an open-source robot operating system. In ICRA Workshop on Open Source Software, Kobe, Japan, May 2009. 4.6.1b, 8.4.2
[149] Srikumar Ramalingam, Chris Russell, Ľubor Ladický, and Philip H. S. Torr. Efficient minimization of higher order submodular functions using monotonic boolean functions. Discrete Applied Mathematics, 220:1–19, 2017. 5.1.1, 5.6
[150] James B. Rawlings and Kenneth R. Muske. The stability of constrained receding horizon control. IEEE Trans. Autom. Control, 38(10):1512–1516, 1993. 2.1.1d, 2.1.2, 3.1.1, 4.5.2
[151] Tal Regev and Vadim Indelman. Multi-robot decentralized belief space planning in unknown environments via efficient re-evaluation of impacted paths. In Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst., Daejeon, Korea, October 2016. 1.2, 3.4.1, 4.1, 8
[152] Charles Richter and Nicholas Roy. Safe visual navigation via deep learning and novelty detection. In Proc. of Robot.: Sci. and Syst., Cambridge, MA, 2017. 3.1.1
[153] Mike Roberts, Debadeepta Dey, Anh Truong, Sudipta Sinha, Shital Shah, Ashish Kapoor, Pat Hanrahan, and Neel Joshi. Submodular trajectory optimization for aerial 3D scanning. arXiv preprint arXiv:1705.00703, 2017. 3.3.1, 5
[154] Alexander Robey, Arman Adibi, Brent Schlotfeldt, George J. Pappas, and Hamed Hassani. Optimal algorithms for submodular maximization with distributed constraints. arXiv preprint arXiv:1909.13676, 2019. 3.6.5, 9.1.1
[155] Tim Roughgarden. Intrinsic robustness of the price of anarchy. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 513–522. ACM, 2009. 3.6.5b
[156] Allison Ryan and J. Karl Hedrick. Particle filter based information-theoretic active sensing. Robot. Auton. Syst., 58(5):574–584, 2010. 2.2.4, 6.7
[157] Mahyar Salek, Shahin Shayandeh, and David Kempe. You share, I share: Network effects and economic incentives in P2P file-sharing systems. In International Workshop on Internet and Network Economics, pages 354–365. Springer, 2010. 5.6, 5.8
[158] Manish Saroya, Graeme Best, and Geoffrey A. Hollinger. Online exploration of tunnel networks leveraging topological CNN-based world predictions. In Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst., Las Vegas, NV, October 2020. 7.3.3c
[159] Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, and Matthijs T. J. Spaan. Exploiting submodular value functions for scaling up active perception.
Auton. Robots ,42(2):209–233, 2018. 3.3.3, 9.1.2[160] Brent Schlotfeldt, Dinesh Thakur, Nikolay Atanasov, Vijay Kumar, and George J.Pappas. Anytime planning for decentralized multi-robot active information gather-ing.
IEEE Robot. Autom. Letters , 3766(c):1–8, 2018. 2.1.2, 8[161] Brent Schlotfeldt, Vasileios Tzoumas, Dinesh Thakur, and George J. Pappas. Re-silient active information gathering with mobile robots. In
Proc. of the IEEE/RSJIntl. Conf. on Intell. Robots and Syst. , pages 4309–4316. IEEE, 2018. 1.2[162] Alexander Schrijver.
Combinatorial optimization: polyhedra and efficiency , vol-ume 24. Springer Science & Business Media, 2003. 3.5.2, 3.5.2[163] Mac Schwager, Philip Dames, Daniela Rus, and Vijay Kumar. A multi-robot controlpolicy for information gathering in the presence of unknown hazards. In
Proc. of theIntl. Sym. of Robot. Research , Flagstaff, AZ, August 2011. 6[164] Pau Segui-Gasco, Hyo-Sang Shin, Antonios Tsourdos, and V. J. Segui. Decentralisedsubmodular multi-robot task allocation. In
Proc. of the IEEE/RSJ Intl. Conf. onIntell. Robots and Syst. , pages 2829–2834, Hamburg, Germany, September 2015.IEEE. 3.6.5[165] Kunal Shah and Mac Schwager. GRAPE: Geometric risk-aware pursuit-evasion.
Robot. Auton. Syst. , 2019. 6.1, 1[166] Hyo-Sang Shin, Teng Li, and Pau Segui-Gasco. Sample greedy based task allocationfor multiple robot systems. arXiv preprint arXiv:1901.03258 , 2019. 3.6.5a[167] David Silver and Joel Veness. Monte-carlo planning in large pomdps. In
Adv. inNeural Inf. Process. Syst. , pages 1–9, 2010. 3.3.3[168] Reid Simmons, David Apfelbaum, Wolfram Burgard, Dieter Fox, Mark Moors, Se-bastian Thrun, and H˚akan Younes. Coordination for multi-robot exploration andmapping. In
Assoc. for Adv. of Artif. Intell. , pages 852–858, 2000. 7, 7.3, 7.3.3a, 7.4,7.4, 7.8.1, A.4[169] Amarjeet Singh, Andreas Krause, Carlos Guestrin, and William J. Kaiser. Efficientinformative sensing using multiple robots.
J. Artif. Intell. Res. , 34:707–755, 2009.1.2, 2, 2.1.2a, 3.3.2, 3.4.1, 3.4.1, 2, 3.4.1a, 3.4.1b, 4.1, 4.3, 4.4, 4.4.1, 2, 5, 6.1, 6.4.1,6.4.4, 6.5.1, 1, 9, B.3[170] Agusti Solanas and Miguel Angel Garcia. Coordinated multi-robot explorationthrough unsupervised clustering of unknown space. In
Proc. of the IEEE/RSJ Intl.Conf. on Intell. Robots and Syst. , volume 1, pages 717–721. IEEE, 2004. 3.1.5[171] Matthijs T. J. Spaan, Tiago S. Veiga, and Pedro U. Lima. Decision-theoretic plan-ning under uncertainty with information rewards for active cooperative perception.
Autonomous Agents and Multi-Agent Systems , 29(6):1157–1185, 2015. 3.3.3[172] Cyrill Stachniss, Giorgio Grisetti, and Wolfram Burgard. Information gain-basedexploration using Rao-Blackwellized particle filters. In
Proc. of Robot.: Sci. andSyst. , 2005. 3.1.1, 3.1.2 169173] Soumya Sudhakar, Sertac Karaman, and Vivienne Sze. Balancing actuation andcomputing energy in motion planning. 2[174] Fouad Sukkar, Graeme Best, Chanyeol Yoo, and Robert Fitch. Multi-robot region-of-interest reconstruction with Dec-MCTS. In
Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 9101–9107. IEEE, 2019. 8
[175] Haoyuan Sun, David Grimsman, and Jason R. Marden. Distributed submodular maximization with parallel execution. In Proc. of the Amer. Control Conf., Denver, CO, 2020. 3.6.5c
[176] Yoonchang Sung, Ashish Kumar Budhiraja, Ryan K. Williams, and Pratap Tokekar. Distributed simultaneous action and target assignment for multi-robot multi-target tracking. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., Brisbane, Australia, May 2018. IEEE. 3.4
[177] Yoonchang Sung, Ashish Kumar Budhiraja, Ryan K. Williams, and Pratap Tokekar. Distributed assignment with limited communication for multi-robot multi-target tracking. Auton. Robots, 2019. 6.1
[178] Wennie Tabib and Nathan Michael. Simultaneous localization and mapping of subterranean voids with Gaussian mixture models. In Field and Service Robotics, 2019. 8.4.2
[179] Wennie Tabib, Micah Corah, Nathan Michael, and Red Whittaker. Computationally efficient information-theoretic exploration of pits and caves. In Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst., Daejeon, Korea, October 2016. 2, 3.1.3, 4.6.1b, 4.6.2a, 7, 7.3.3b, 10, 7.2
[180] Wennie Tabib, Kshitij Goel, John Yao, Curtis Boirum, and Nathan Michael. Autonomous cave surveying with an aerial robot. arXiv preprint arXiv:2003.13883, 2020. 8.4.2
[181] Michael Tatum. Communications coverage in unknown underground environments. Master's thesis, Carnegie Mellon University, Pittsburgh, PA, June 2020. 3.1.6
[182] Pratap Tokekar, Volkan Isler, and Antonio Franchi. Multi-target visual tracking with aerial robots. In Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst., Chicago, IL, September 2014. 6.1
[183] Alberto Valero-Gómez, Javier V. Gómez, Santiago Garrido, and Luis Moreno. Fast marching methods in path planning. IEEE Robot. Autom. Mag., 2013. 7.4
[184] Adrian Vetta. Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, pages 416–425. IEEE, 2002. 3.6.5b, 7.8.1
[185] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 67–74. ACM, 2008. 3.5.3b, 11
[186] Chaoqun Wang, Delong Zhu, Teng Li, Max Q.-H. Meng, and Clarence W. de Silva. Efficient autonomous robotic exploration with semantic road map in indoor environments. IEEE Robot. Autom. Letters, 4(3):2989–2996, 2019. 8.2.4
[187] Li Wang, Aaron D. Ames, and Magnus Egerstedt. Multi-objective compositions for collision-free connectivity maintenance in teams of mobile robots. In Proc. of the IEEE Conf. on Decision and Control, pages 2659–2664, Las Vegas, Nevada, December 2016. IEEE. 3.1.6
[188] Li Wang, Aaron D. Ames, and Magnus Egerstedt. Safe certificate-based maneuvers for teams of quadrotors using differential flatness. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 3293–3298, Singapore, May 2017. IEEE. 8.2.5d
[189] Li Wang, Aaron D. Ames, and Magnus Egerstedt. Safety barrier certificates for collisions-free multirobot systems. IEEE Trans. Robotics, 33(3):661–674, 2017. 8.2.5d
[190] Yiming Wang, Stuart James, Elisavet Konstantina Stathopoulou, Carlos Beltrán-González, Yoshinori Konishi, and Alessio Del Bue. Autonomous 3-D reconstruction, mapping, and exploration of indoor environments with a robotic arm. IEEE Robot. Autom. Letters, 4(4):3340–3347, 2019. 7.3.3c
[191] Zengfu Wang, Bill Moran, Xuezhi Wang, and Quan Pan. An accelerated continuous greedy algorithm for maximizing strong submodular functions. Journal of Combinatorial Optimization, 30(4):1107–1124, 2015. 3.5.1d, 1, 5.4, 5.6, 5.8
[192] Michael Watterson and Vijay Kumar. Safe receding horizon control for aggressive MAV flight with limited range sensing. In , pages 3235–3240. IEEE, 2015. 2.1.1d
[193] Jason L. Williams.
Information Theoretic Sensor Management. PhD thesis, Massachusetts Institute of Technology, 2007. 2
[194] Ryan K. Williams, Andrea Gasparri, and Giovanni Ulivi. Decentralized matroid optimization for topology constraints in multi-robot allocation problems. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., Singapore, May 2017. 3.4.1, 3.6.5, 3.6.5a, 8, A.6
[195] Christian Witting, Marius Fehr, Rik Bähnemann, Helen Oleynikova, and Roland Siegwart. History-aware autonomous exploration in confined environments using MAVs. In Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst., Madrid, Spain, October 2018. IEEE. 8.2.4
[196] Uland Wong, Warren Whittaker, Heather Jones, and Red Whittaker. NASA planetary pits and caves analog dataset. https://ti.arc.nasa.gov/dataset/caves/, December 2014. Accessed: 07-01-2020. 7.2
[197] Jiahao Xie, Chao Zhang, Zebang Shen, Chao Mi, and Hui Qian. Decentralized gradient tracking for continuous DR-submodular maximization. In Proc. of the Intl. Conf. on Artif. Intell. and Stat., pages 2897–2906, 2019. 3.6.5, 9.1.1
[198] B. Yamauchi. A frontier-based approach for autonomous exploration. In Proc. of the Intl. Sym. on Comput. Intell. in Robot. and Autom., Monterey, CA, July 1997. 2.1, 2.1.1e, 3.1.1, 3.1.4, 7, 7.4
[199] Luke Yoder and Sebastian Scherer. Autonomous exploration for infrastructure modeling with a micro aerial vehicle. In Field and Service Robotics, pages 427–440. Springer, 2016. 7.3.1
[200] Haifeng Zhang and Yevgeniy Vorobeychik. Submodular optimization with routing constraints. In Assoc. for Adv. of Artif. Intell., 2016. 2.1.2a, 3.3.2, 6.4.4, A.4
[201] Zhengdong Zhang, Theia Henderson, Sertac Karaman, and Vivienne Sze. FSMI: Fast computation of Shannon mutual information for information-theoretic mapping. Intl. Journal of Robotics Research, 2020. 3.1.3, 4.2.2, 7, 7.3.2, 7.3.2a, 7.3.3b
[202] Lifeng Zhou and Pratap Tokekar. Sensor assignment algorithms to improve observability while tracking targets. IEEE Trans. Robotics, 35(5):1206–1219, 2019. 6.1
[203] Lifeng Zhou, Vasileios Tzoumas, George J. Pappas, and Pratap Tokekar. Resilient active target tracking with multiple robots.