A sub-modular receding horizon solution for mobile multi-agent persistent monitoring
Navid Rezazadeh and Solmaz S. Kia
Abstract—We study the problem of persistent monitoring of a finite number of inter-connected geographical nodes by a group of heterogeneous mobile agents. We assign to each geographical node a concave and increasing reward function that resets to zero after an agent's visit. Then, we design the optimal dispatch policy of what nodes to visit at what time and by what agent by finding a policy set that maximizes a utility defined as the total reward collected at visit times. We show that this optimization problem is NP-hard and that its computational complexity increases exponentially with the number of agents and the length of the mission horizon. By showing that the utility function is a monotone increasing and submodular set function of the agents' policies, we proceed to propose a suboptimal dispatch policy design with a known optimality gap. To reduce the time complexity of constructing the feasible search set and also to induce robustness to changes in the operational factors, we perform our suboptimal policy design in a receding horizon fashion. Then, to compensate for the shortsightedness of the receding horizon approach with respect to the reward distribution beyond the feasible policies of the agents over the receding horizon, we add a new term to our utility, which provides a measure of nodal importance beyond the receding horizon's sight. This term gives the policy design an incentive to steer the agents towards the nodes with higher rewards on the patrolling graph. Finally, we discuss how our proposed algorithm can be implemented in a decentralized manner. A simulation study demonstrates our results.
I. INTRODUCTION
In recent years, coordinating the movement of mobile sensors to cover areas that have not been adequately sampled/observed has been explored in the controls, wireless sensor and robotics communities through problems related to coverage, exploration, and deployment. Many of the proposed algorithms strive to spread sensors to desired positions to obtain a stationary configuration such that the coverage is optimized, see e.g., [1]–[3]. In this paper, however, instead of aiming to achieve an improved stationary network configuration as the end result of the sensors' movement, our objective is to explore context-aware mobility strategies that re-position mobile sensors to maximize their utilization and contribution over a mission horizon. Motivating applications include persistent patrolling to discover forest fires [4] or oil spillage in their early stages [5], locating endangered animals in a large habitat [6], and event detection in urban environments [7]. More specifically, we consider persistent patrolling/monitoring of a set $\mathcal{V}=\{1,\cdots,N\}$ of inter-connected geographical nodes via a set $\mathcal{A}=\{1,\cdots,M\}$ of mobile sensors/agents, where $N>M$. The mobile agents are confined to a set of pre-specified edges $\mathcal{E}\subset\mathcal{V}\times\mathcal{V}$, e.g., aerial or ground corridors, to traverse from one node to another, see Fig. 1.
This work is supported by NSF award IIS-SAS-1724331. A preliminary version of this work will appear in the proceedings of the 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems [29].
Fig. 1: Examples of a set of geographical nodes of interest and the edges between them. A finite number of nodes to monitor in a city can be restricted to some particular scanning zones (the picture on the left) or the cell-partitioned map of the city (the picture on the right).
Depending on their vehicle type, agents may have to take different edges to go from one node to another. Also, they may have different travel times along the same edge. The monitoring task is motivated by objectives such as data harvesting, event detection or health monitoring. We study the design of a dispatch policy that orchestrates the topological distribution of the mobile agents such that an optimized service for a global monitoring task is provided at a reasonable computational cost. To quantify the service objective we assign the reward function
$$R_v(t)=\begin{cases}0, & t=\bar{t}_v,\\ \psi_v(t-\bar{t}_v), & t>\bar{t}_v,\end{cases}\qquad(1)$$
to each node $v\in\mathcal{V}$, where $\psi_v(t)$ is a nonnegative concave and increasing function of time and $\bar{t}_v$ is the latest time node $v$ was scanned by an agent. For example, in data harvesting or health monitoring $\psi_v(\cdot)$ can be the weighted idle time of node $v$, while in event detection it can be the probability of at least one event taking place between visits. The goal then in optimal patrolling is to design a dispatch policy (what sequence of nodes to visit at what times by which agents) to score the maximum amount of collective reward for the team over a given mission horizon. However, as we explain below, this problem is NP-hard. Our aim then is to design a suboptimal solution that has polynomial time complexity.

Fig. 2: An agent has two possible routes to take over the designated receding horizon. The nodes' color intensity shows their reward value. The blue route offers a higher reward over the receding horizon but it puts the agent close to an area with a lower amount of reward, while the red route results in a lower total reward over the receding horizon but puts the agent near an area with a higher amount of reward.

Related work: Dispatch policy design for patrolling/monitoring of geographical nodes can be divided into two categories: the edges to travel between the nodes are not specified (design in continuous edge space) or otherwise (design in discrete edge space). When there are no prespecified inter-node edges, the optimal patrolling policy design also includes finding the optimal inter-node trajectories that the agents should follow without violating their mobility limits. Computing such trajectories requires solving two-point boundary value problems [8], whose solution, except in one-dimensional space [9], is generally intractable. Sub-optimal solutions are sought by limiting the trajectory space to elliptical or Fourier series functions [8] or by dividing the mission horizon into smaller time spans and finding the optimal trajectory over them [10]. In some applications, however, the mobile agents are confined to travel through pre-specified known edges between the nodes. For example, in a smart city setting, regulations can restrict the admissible routes between the geographical nodes. In dispatch policy design in discrete edge space, the complexity of finding the optimal policy for a single patrolling agent is the same as the complexity of solving the Traveling Salesman problem, where the computational complexity grows exponentially with the number of nodes [11].
In the case of multiple patrolling agents, the problem is even more complex, since each agent's policy design depends on the other agents' policies. This problem is formalized in earlier studies such as [12], [13]. Generally, when there are multiple edges to travel between every two nodes or when each node is connected to multiple other nodes, finding an optimal long-term patrolling scheme is not tractable. Constraining the agents to travel through specific edges to traverse among the geographical nodes allows seeking optimal solutions for the problem. For example, when the connection topology between the geographical nodes is a path or a cyclic graph, optimal solutions for the problem are proposed in [14]–[17]. To overcome the complexity issue on generic graphs, [18] explores forming different cycles in the graph and assigning agents to these cycles to patrol the nodes periodically, seeking to minimize the time that a node stays unvisited. Alternatively, [19] proposes that agents move to the most rewarding neighboring node based on their current location.
Statement of contribution: In this paper, we propose a robust and suboptimal solution to the long-term patrolling problem that we stated earlier. Instead of using the customary idle time, $\psi_v(t)=t$, as a reward function, which reduces the optimal dispatch policy design for collecting the maximum reward to the minimum latency problem [20], we consider reward functions described by an increasing concave function. This allows us to model a wider class of patrolling problems such as patrolling for event detection. We base the design of the utility function on the sum of the rewards collected over the mission horizon by the mobile agents. We discuss that the design of the optimal patrolling policy to maximize the utility over the mission horizon is an NP-hard problem. Specifically, we show that the complexity of finding the optimal policy increases exponentially with the mission horizon and the number of agents. Next, we show that the utility function is a monotone increasing and submodular set function. To establish this result, we develop a set of auxiliary lemmas, presented in the appendix, based on Karamata's inequality [21]. Given the submodularity of the utility function, we propose a receding horizon sequential greedy algorithm to compute a suboptimal dispatch policy with a polynomial computation cost and a guaranteed bound on optimality. The receding horizon nature of our solution induces robustness to uncertainties of the environment. Our next contribution is to add a new term to our utility function to compensate for the shortsightedness of the receding horizon approach, see Fig. 2. When agents patrol a large set of inter-connected nodes, this added term becomes useful by giving them a notion of the existing reward in the farther nodes. In recent years, submodular optimization has been widely used in sensor and actuator placement problems [22]–[27]. In comparison to the sensor/actuator placement problems, the challenge in our work is that the assigned policy per mobile agent over the receding horizon is a dynamic scheduling problem rather than a static sensor placement. To deal with this challenge, we use the matroid constraint approach [28] to design our suboptimal submodular-based policy. Finally, we discuss how the final algorithm can be implemented in a decentralized manner. A simulation study demonstrates our results. A preliminary version of this work appeared in [29]. Our notation is standard, though to avoid confusion certain concepts and notations are defined as the need arises.

II. PROBLEM FORMULATION
To formalize our objective we first introduce our notation and state our standing assumptions. For any node $v\in\mathcal{V}$, $\mathcal{N}_v$ is the set consisting of node $v$ and all the neighboring nodes that are connected to node $v$ via an edge in $\mathcal{E}$. If there exists a path connecting node $v\in\mathcal{V}$ to node $w\in\mathcal{V}$, we let $\tau^i_{v,w}\in\mathbb{R}_{>0}$ be the shortest travel time of agent $i\in\mathcal{A}$ from node $v$ to $w$.

Assumption 1:
Upon arrival of any agent $i\in\mathcal{A}$ at any time $\bar{t}\in\mathbb{R}_{>0}$ at node $v\in\mathcal{V}$, the agent immediately scans the node, the reward $R_v(\bar{t})$ is scored for the patrolling team $\mathcal{A}$, and $\bar{t}_v$ of node $v$ in (1) is set to $\bar{t}$. If more than one agent arrives at node $v\in\mathcal{V}$ and scans it at the same time $\bar{t}$, the reward collected for the team is still $R_v(\bar{t})$. Every agent $i\in\mathcal{A}$ must spend $\delta_i\in\mathbb{R}_{>0}$ amount of time at the node it visits to complete its measurement processing. During this time the agent cannot scan the node and therefore no reward can be collected.

Let the tuple $p=(V_p,T_p,a_p)$ be a dispatch policy of any agent $a_p\in\mathcal{A}$ over the given mission time horizon, where $V_p$ and $T_p$ are the vectors that specify the nodes and the corresponding visit times assigned to agent $a_p$. For any agent $i\in\mathcal{A}$, we let $\mathcal{P}_i$ be the set of all the admissible policies over the mission horizon. For any policy $p\in\mathcal{P}_i$, $i\in\mathcal{A}$, we let $n_p$ be the total number of nodes visited by agent $i=a_p$, i.e., $n_p=\dim(V_p)$. We refer to $n_p$ as the length of the policy $p$. We refer to $(V_p(l),T_p(l))$, $l\in\{1,\cdots,n_p\}$, as the $l$th step of policy $p$.

Assumption 2: For any policy $p$, we have $V_p(l+1)\in\mathcal{N}_{V_p(l)}$, for all $l\in\{1,\cdots,n_p-1\}$.

We let $\mathcal{P}=\bigcup_{i=1}^{M}\mathcal{P}_i$. Then, given any $\bar{\mathcal{P}}\subset\mathcal{P}$, the utility function $\mathcal{R}:2^{\mathcal{P}}\to\mathbb{R}_{>0}$ is
$$\mathcal{R}(\bar{\mathcal{P}})=\sum_{p\in\bar{\mathcal{P}}}\sum_{l=1}^{n_p}R_{V_p(l)}(T_p(l)).\qquad(2)$$
Given (2), the optimal policy set that maximizes the utility over a given mission horizon is given by
$$\mathcal{P}^{\star}=\underset{\bar{\mathcal{P}}\subset\mathcal{P}}{\operatorname{argmax}}\ \mathcal{R}(\bar{\mathcal{P}}),\quad\text{s.t.}\qquad(3a)$$
$$|\bar{\mathcal{P}}\cap\mathcal{P}_i|\leq 1,\quad i\in\{1,\ldots,M\},\qquad(3b)$$
where $|.|$ returns the cardinality of a set. The constraint condition (3b) is in the so-called partition matroid form [28] and restricts the choice of the optimal solution to be a set that contains at most one member from each of the disjoint sets $\mathcal{P}_1,\cdots,\mathcal{P}_M$. A set-valued optimization problem of the form (3) is known to be NP-hard [30]. Lemma 2.1 below, whose proof is given in the appendix, gives the cost of constructing the feasible set $\mathcal{P}$ and the time complexity of solving optimization problem (3).

Lemma 2.1 (Time complexity of problem (3)): The cost of constructing the feasible set $\mathcal{P}$ of optimization problem (3) is of order $O(\sum_{i=1}^{M}D^{\bar{n}_i})$, where $D=\max(|\mathcal{N}_1|,\cdots,|\mathcal{N}_N|)$ and $\bar{n}_i=\max\{n_p\}_{\forall p\in\mathcal{P}_i}$. Furthermore, the time complexity of solving optimization problem (3) is $O(\prod_{i=1}^{M}D^{\bar{n}_i})$.

If the system parameters, such as the number of the mobile agents or the nodes, or the parameters of $\psi_v(\cdot)$ of the reward function at any node $v$, change after the optimal policy design, the operation should be stopped and the design should be repeated for the remainder of the mission horizon under the new conditions. Our objective in this paper is to construct a suboptimal solution to the persistent patrolling problem given by (3) with polynomial time complexity. Moreover, we seek a solution that is robust to changes that can happen during the mission horizon. We note here that because of the time-varying nature of the various operational elements of the problem, periodic solutions are not suitable.

We close this section by introducing some definitions and notations used subsequently. For any set function $g:2^{\mathcal{Q}}\to\mathbb{R}$, we let
$$\Delta g(q|\bar{\mathcal{Q}})=g(\bar{\mathcal{Q}}\cup\{q\})-g(\bar{\mathcal{Q}}),$$
for all $\bar{\mathcal{Q}}\subset\mathcal{Q}$ and $q\in\mathcal{Q}$, where $\Delta g$ gives the increase in value of the set function $g$ going from the set $\bar{\mathcal{Q}}$ to $\bar{\mathcal{Q}}\cup\{q\}$.
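To make the objects above concrete, the short Python sketch below evaluates the utility (2) and the marginal gain $\Delta\mathcal{R}(q|\bar{\mathcal{P}})$ for a toy instance. It is an illustrative aid only: the function names, the choice $\psi_v(t)=\sqrt{t}$, and the data layout are our own assumptions, not part of the paper.

```python
import math

# A policy p = (V_p, T_p, a_p): visited nodes, visit times, owning agent.
# psi[v] is the concave increasing reward function of node v (here sqrt).
psi = {v: math.sqrt for v in range(4)}

def utility(policies, t_bar):
    """Utility (2): total reward collected at the visit times.
    t_bar[v] is the last time node v was visited before the horizon starts.
    A visit by any agent resets the node's reward for the whole team
    (Assumption 1), so all visit events are processed chronologically."""
    last_visit = dict(t_bar)
    total = 0.0
    events = sorted((t, v) for (V_p, T_p, a_p) in policies
                    for v, t in zip(V_p, T_p))
    for t, v in events:
        total += psi[v](t - last_visit[v])   # R_v(t) from (1); 0 if t == t_bar_v
        last_visit[v] = t
    return total

def marginal_gain(q, policies, t_bar):
    """Delta R(q | P_bar) = R(P_bar with q added) - R(P_bar)."""
    return utility(policies + [q], t_bar) - utility(policies, t_bar)

# Toy example: two agents on 4 nodes, all last visited at time 0.
t_bar = {v: 0.0 for v in range(4)}
p1 = ([0, 1, 2], [1.0, 2.0, 3.0], 0)
p2 = ([3, 2], [1.5, 3.5], 1)
print(utility([p1], t_bar), marginal_gain(p2, [p1], t_bar))
```

Note how the marginal gain of `p2` depends on `p1` through the reset times: this coupling between agents' policies is exactly what makes problem (3) a set-function optimization rather than independent per-agent planning.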
Recall that $g:2^{\mathcal{Q}}\to\mathbb{R}$ is submodular if and only if for any two sets $\mathcal{Q}_1$ and $\mathcal{Q}_2$ satisfying $\mathcal{Q}_1\subset\mathcal{Q}_2\subset\mathcal{Q}$, and for any $q\not\in\mathcal{Q}_2$, we have [28]
$$\Delta g(q|\mathcal{Q}_1)\geq\Delta g(q|\mathcal{Q}_2).$$
Submodularity is thus a property of set functions that reflects a diminishing reward as new members are introduced to the system. We say $g:2^{\mathcal{Q}}\to\mathbb{R}$ is monotone increasing if for all $\mathcal{Q}_1,\mathcal{Q}_2\subset\mathcal{Q}$ with $\mathcal{Q}_1\subset\mathcal{Q}_2$ we have [28]
$$g(\mathcal{Q}_1)\leq g(\mathcal{Q}_2).$$
We denote a sequence of $m$ real numbers $(t_1,\cdots,t_m)$ by $(t)_m$. Given two increasing (resp. decreasing) sequences $(t)_n$ and $(v)_m$, $(t)_n\oplus(v)_m$ is their concatenated increasing (resp. decreasing) sequence, i.e., for $(u)_{n+m}=(t)_n\oplus(v)_m$, any $u_k$, $k\in\{1,\cdots,n+m\}$, is either in $(t)_n$ or in $(v)_m$ or in both. We assume that $(u)_{n+m}$ preserves the relative ordering of $(t)_n$ and $(v)_m$, i.e., if $t_k$ and $t_{k+1}$, $k\in\{1,\cdots,n-1\}$ (resp. $v_k$ and $v_{k+1}$, $k\in\{1,\cdots,m-1\}$) correspond to $u_i$ and $u_j$ in $(u)_{n+m}$, then $i<j$.
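As a sanity check on these two definitions, the brute-force Python sketch below verifies monotonicity and submodularity of a small set function by enumerating all nested pairs $\mathcal{Q}_1\subset\mathcal{Q}_2$; the helper names and the coverage-style example are our own illustrative assumptions.

```python
from itertools import combinations

def powerset(ground):
    s = list(ground)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def is_monotone_and_submodular(g, ground):
    """Brute-force check of the definitions above:
    monotone:   Q1 subset of Q2  implies  g(Q1) <= g(Q2);
    submodular: Q1 subset of Q2, q not in Q2  implies
                g(Q1 with q) - g(Q1) >= g(Q2 with q) - g(Q2)."""
    subsets = powerset(ground)
    for Q1 in subsets:
        for Q2 in subsets:
            if not Q1 <= Q2:
                continue
            if g(Q1) > g(Q2):
                return False
            for q in ground - Q2:
                if g(Q1 | {q}) - g(Q1) < g(Q2 | {q}) - g(Q2) - 1e-12:
                    return False
    return True

# Example: coverage function g(Q) = size of the union of covered nodes,
# a classic monotone increasing and submodular set function.
cover = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c'}}
g = lambda Q: len(set().union(*(cover[q] for q in Q)) if Q else set())
print(is_monotone_and_submodular(g, frozenset(cover)))   # True
```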
III. SUBOPTIMAL POLICY DESIGN

According to Lemma 2.1, the time complexity of finding an optimal patrolling policy in (3) increases exponentially with the maximum length $\bar{n}_i$ of the admissible policies of any agent $i\in\mathcal{A}$ and also with the number of the exploring agents $M$. In light of this observation, to reduce the computational cost, we propose the following suboptimal policy design. Since the maximum policy length $\bar{n}_i$ is proportional to the length of the mission horizon, we first propose to trade in optimality and divide the planning horizon into multiple shorter horizons so that the policy design can be carried out in a consecutive manner over these shorter horizons. Then, to reduce the optimality gap and also to induce robustness to the online changes that can occur during the mission time, we propose to implement this approach in a receding horizon fashion, where we calculate the policy over a specified shorter horizon but execute only some of the initial steps of the policy, and then repeat the process. However, a receding horizon approach suffers from what we refer to as shortsightedness. That is, over large inter-connected geographical node sets, a receding horizon design is oblivious to the reward distribution of the nodes that are not in the feasible policy set in the planning horizon. Then, the optimal policy over the planning horizon can inadvertently steer the agents away from the distant nodes with a higher reward, see Fig. 2. To compensate for this shortcoming, we introduce the notion of nodal importance and augment the reward function (2) over the design horizon with an additional term that, given an admissible policy, provides a measure of how close an agent at the final step of the policy is to a cluster of geographical nodes with a high concentration of reward.

Let the augmented reward, whose exact form will be introduced below, over the planning horizon be $\bar{\mathcal{R}}$. Then, the optimal policy design over each receding horizon is
$$\mathcal{P}^{\star}=\underset{\bar{\mathcal{P}}\subset\mathcal{P}}{\operatorname{argmax}}\ \bar{\mathcal{R}}(\bar{\mathcal{P}}),\quad\text{s.t.}\ |\bar{\mathcal{P}}\cap\mathcal{P}_i|\leq 1,\qquad(4)$$
where hereafter $\mathcal{P}=\bigcup_{i=1}^{M}\mathcal{P}_i$ is the union of the admissible policy sets of the agents $\mathcal{P}_i$, $i\in\mathcal{A}$, over the planning horizon. Hereafter, we let $\bar{t}_v$ be the last time node $v\in\mathcal{V}$ was visited before a planning horizon starts.

Next, to reduce the computational cost further, we propose to use Algorithm 1, a sequential greedy algorithm with a polynomial cost in terms of the number of the agents, to obtain a suboptimal solution for (4). In what follows, we show that since the objective function of (4) is a submodular set function, Algorithm 1 comes with a known optimality gap. We also show that with a proper inter-agent communication coordination Algorithm 1 can be implemented in a decentralized manner; a sketch of the greedy step is given after Algorithm 1 below.

Algorithm 1 Sequential Greedy Algorithm
  1: procedure SGOpt($\mathcal{P}_1,\cdots,\mathcal{P}_M$, $M$)
  2:   Init: $\bar{\mathcal{P}}\leftarrow\emptyset$, $i\leftarrow 1$, $\{\bar{t}_v\}_{v\in\mathcal{V}}$
  3:   while $i\leq M$ do
  4:     $p^{i\star}=\operatorname{argmax}_{p\subset\mathcal{P}_i}\Delta\bar{\mathcal{R}}(p|\bar{\mathcal{P}})$
  5:     $\bar{\mathcal{P}}\leftarrow\bar{\mathcal{P}}\cup p^{i\star}$
  6:     $i\leftarrow i+1$
  7:   end while
  8:   Return $\bar{\mathcal{P}}$
  9: end procedure
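The following Python sketch mirrors Algorithm 1: each agent in turn greedily picks the admissible policy with the largest marginal gain of the (augmented) utility given the policies already committed. The reward oracle and the policy-set layout are our own illustrative stand-ins for $\bar{\mathcal{R}}$ and $\mathcal{P}_i$.

```python
def sequential_greedy(policy_sets, reward):
    """Algorithm 1 (SGOpt): for each agent i in turn, pick the admissible
    policy with the largest marginal gain given the policies already
    committed by agents 1..i-1."""
    chosen = []                       # the set P_bar, built incrementally
    for P_i in policy_sets:           # one admissible policy set per agent
        best = max(P_i, key=lambda p: reward(chosen + [p]) - reward(chosen))
        chosen.append(best)
    return chosen

# Toy use with the utility() sketch above as the reward oracle:
# policy_sets = [P_1, ..., P_M], each a list of (V_p, T_p, a_p) tuples.
# chosen = sequential_greedy(policy_sets, lambda P: utility(P, t_bar))
```

The cost is $M$ maximizations, one per agent, instead of a search over the $O(\prod_{i=1}^{M}D^{\bar{n}_i})$ joint policy sets of Lemma 2.1.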
For $v\in\mathcal{V}$, let $\mathcal{N}^r_v$ be the set of nodes, including $v$, that can be reached from node $v$ using at most $r$ edges in $\mathcal{E}$. Then, for every node $v\in\mathcal{V}$, we define the nodal importance with radius $r$ at time $\tau$ as $L(v,\tau,r)=\sum_{w\in\mathcal{N}^r_v}R_w(\tau)$. Next, given an agent $i\in\mathcal{A}$ that is at node $w\in\mathcal{V}$ at time $\hat{t}\in\mathbb{R}_{\geq 0}$, we define the relative nodal importance of a node $v\in\mathcal{V}$ with respect to agent $i$ as $L(v,w,\hat{t},i)=L(v,\hat{t}+\tau^i_{w,v},r)/\tau^i_{w,v}$. Then, $L(v,V_p(n_p),T_p(n_p),a_p)$ is a measure of the relative size of the reward concentration around any node $v\in\mathcal{V}$ that also takes into account the travel time of agent $a_p$ from the final step of policy $p=(V_p,T_p,a_p)\in\mathcal{P}$ to $v$. Let $L(v,p)$ be the shorthand notation for $L(v,V_p(n_p),T_p(n_p),a_p)$. To compensate for the shortsightedness of the receding horizon design, we then revise the utility function to
$$\bar{\mathcal{R}}(\bar{\mathcal{P}})=\mathcal{R}(\bar{\mathcal{P}})+\alpha\sum_{p\in\bar{\mathcal{P}}}\max_{v\in\bar{\mathcal{V}}}L(v,p),\quad\alpha\in\mathbb{R}_{\geq 0}.\qquad(5)$$
For computational efficiency, instead of incorporating the relative nodal importance of all the nodes, which can be achieved by setting $\bar{\mathcal{V}}$ equal to $\mathcal{V}$, we propose to use only a subset $\bar{\mathcal{V}}$ of the nodes. We refer to the nodes in $\bar{\mathcal{V}}$ as anchor nodes. The anchor nodes can be selected to be the nodes with higher reward return or to be a set of nodes that are scattered uniformly over the graph. It is interesting to note that the relative nodal importance term in (5) is reminiscent of the terminal cost used in model predictive control (MPC). In MPC, the terminal cost, which is used to achieve an infinite horizon control with closed-loop stability guarantees [31], in some way also compensates for the shortsightedness of the design over a finite planning horizon.
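A minimal sketch of the nodal importance terms follows, assuming an unweighted graph given as an adjacency list and a per-agent travel-time table; `reachable_within` and the variable names are our own, not the paper's.

```python
from collections import deque

def reachable_within(adj, v, r):
    """N_v^r: nodes reachable from v using at most r edges (depth-limited BFS)."""
    seen, frontier = {v}, deque([(v, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == r:
            continue
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return seen

def nodal_importance(adj, R, v, tau, r):
    """L(v, tau, r): total reward within graph radius r of v at time tau."""
    return sum(R(w, tau) for w in reachable_within(adj, v, r))

def relative_nodal_importance(adj, R, travel_time, v, w, t_hat, agent, r):
    """L(v, w, t_hat, i) = L(v, t_hat + tau^i_{w,v}, r) / tau^i_{w,v},
    where travel_time[agent][w][v] is agent i's shortest travel time."""
    tau_wv = travel_time[agent][w][v]
    return nodal_importance(adj, R, v, t_hat + tau_wv, r) / tau_wv
```

The division by $\tau^i_{w,v}$ makes nearby reward clusters more attractive than equally rich but more distant ones, which is precisely the terminal-cost-like bias discussed above.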
Next, we show that the reward function (5) is submodular over any given feasible policy set $\mathcal{P}$ in every planning horizon.

Theorem 3.1 (Submodularity of the reward function (5)): For any weighting factor $\alpha\in\mathbb{R}_{\geq 0}$, the reward function $\bar{\mathcal{R}}:2^{\mathcal{P}}\to\mathbb{R}_{>0}$ in (5) is a monotone increasing and submodular set function over $\mathcal{P}$.

Proof: Let $c(v,\mathcal{Q}):\mathcal{V}\times 2^{\mathcal{P}}\to\mathbb{Z}_{>0}$ be the total number of visits to the geographical node $v$, and $\mathcal{I}_{\mathcal{Q}}\subset\mathcal{V}$ be the set of the nodes that are visited when a policy set $\mathcal{Q}\subset\mathcal{P}$ is implemented. Furthermore, let the increasing sequence
$$(t^v(\mathcal{Q}))^{c(v,\mathcal{Q})}_1=(t^v_1(\mathcal{Q}),t^v_2(\mathcal{Q}),\cdots,t^v_{c(v,\mathcal{Q})}(\mathcal{Q}))$$
be the sequence of times node $v\in\mathcal{I}_{\mathcal{Q}}$ was visited when the agents implement $\mathcal{Q}$. Now consider the reward function $\bar{\mathcal{R}}$ in (5). The first summand of $\bar{\mathcal{R}}$ expands as
$$\mathcal{R}(\bar{\mathcal{P}})=\sum_{v\in\mathcal{I}_{\bar{\mathcal{P}}}}\Big(\sum_{j=1}^{c(v,\bar{\mathcal{P}})}\psi_v(\Delta t^v_j(\bar{\mathcal{P}}))\Big),$$
where $\Delta t^v_j(\bar{\mathcal{P}})=t^v_j(\bar{\mathcal{P}})-t^v_{j-1}(\bar{\mathcal{P}})$ is the time between two consecutive visits of node $v$, and $t^v_0(\bar{\mathcal{P}})=\bar{t}_v$. Next, consider the monitoring policy sets $\mathcal{Q}_1$, $\mathcal{Q}_2$ and a monitoring policy $q$ with $\mathcal{Q}_1\subset\mathcal{Q}_2\subset\mathcal{P}$, $q\in\mathcal{P}$, $q\not\in\mathcal{Q}_1$, and $q\not\in\mathcal{Q}_2$. Because $(t^v(\mathcal{Q}_1))^{c(v,\mathcal{Q}_1)}_1$ is a sub-sequence of $(t^v(\mathcal{Q}_2))^{c(v,\mathcal{Q}_2)}_1$, using Lemma A.2 and the fact that $\psi_v(\cdot)$ is a normalized increasing concave function, we conclude that
$$\sum_{j=1}^{c(v,\mathcal{Q}_1\cup q)}\psi_v(\Delta t^v_j(\mathcal{Q}_1\cup q))-\sum_{j=1}^{c(v,\mathcal{Q}_1)}\psi_v(\Delta t^v_j(\mathcal{Q}_1))\geq 0$$
for all $v\in\mathcal{I}_{\bar{\mathcal{P}}}$. Therefore $\Delta\mathcal{R}(q|\mathcal{Q}_1)\geq 0$, which shows that $\mathcal{R}(\bar{\mathcal{P}})$ is a monotone increasing set function. Furthermore, using Lemma A.3 we can write
$$\sum_{j=1}^{c(v,\mathcal{Q}_2\cup q)}\psi_v(\Delta t^v_j(\mathcal{Q}_2\cup q))-\sum_{j=1}^{c(v,\mathcal{Q}_2)}\psi_v(\Delta t^v_j(\mathcal{Q}_2))\leq\sum_{j=1}^{c(v,\mathcal{Q}_1\cup q)}\psi_v(\Delta t^v_j(\mathcal{Q}_1\cup q))-\sum_{j=1}^{c(v,\mathcal{Q}_1)}\psi_v(\Delta t^v_j(\mathcal{Q}_1)).$$
Hence $\Delta\mathcal{R}(q|\mathcal{Q}_1)\geq\Delta\mathcal{R}(q|\mathcal{Q}_2)$, which shows that $\mathcal{R}(\bar{\mathcal{P}})$ is a submodular set function. Then, since the second summand of $\bar{\mathcal{R}}$, $\alpha\sum_{p\in\bar{\mathcal{P}}}\max_{v\in\bar{\mathcal{V}}}L(v,p)$, is trivially positive and modular, the proof is concluded.

Due to Theorem 3.1, the suboptimal dispatch policy of Algorithm 1, which has a polynomial computational complexity, has the following well-defined optimality gap.

Theorem 3.2 (Optimality gap of Algorithm 1): Let $\mathcal{P}^{\star}$ be an optimal solution of (4) and $\bar{\mathcal{P}}$ be the output of Algorithm 1. Then, $\bar{\mathcal{R}}(\bar{\mathcal{P}})\geq\frac{1}{2}\,\bar{\mathcal{R}}(\mathcal{P}^{\star})$.

Proof: Since the objective function of (4) is monotone increasing and submodular over $\mathcal{P}$, the proof follows by invoking [28, Theorem 5.1].

Algorithm 2 Decentralized Implementation of Sequential Greedy Algorithm
  1: Init: $\bar{\mathcal{P}}\leftarrow\emptyset$, $i\leftarrow 1$, $\{\bar{t}_v\}_{v\in\mathcal{V}}$
  2: while $i\leq K$ do
  3:   if $s_i$ is being called for the first time then
  4:     agent $s_i$ computes $p^{s_i\star}=\operatorname{argmax}_{p\subset\mathcal{P}_{s_i}}\Delta\bar{\mathcal{R}}(p|\bar{\mathcal{P}})$
  5:     $\bar{\mathcal{P}}\leftarrow\bar{\mathcal{P}}\cup p^{s_i\star}$
  6:   end if
  7:   agent $s_i$ passes $\bar{\mathcal{P}}$ to agent $s_{i+1}$
  8:   $i\leftarrow i+1$
  9: end while
  10: agent $s_K$, based on the execution plan of the receding horizon operation, updates $\{\bar{t}_v\}_{v\in\mathcal{V}}$ and communicates it to the team $\mathcal{A}$

A. Comments on decentralized implementations of Algorithm 1
To implement Algorithm 1, given the current position of each agent and $\{\bar{t}_v\}_{v\in\mathcal{V}}$ at the beginning of each planning horizon, the admissible set of policies $\mathcal{P}_i$ for each agent $i\in\mathcal{A}$ should be calculated.

Let every agent know $\{\psi_v(t)\}_{v\in\mathcal{V}}$. A straightforward decentralized implementation of Algorithm 1 is then a multi-centralized solution. In this solution, agents transmit the feasible policy sets across the entire network until each agent knows the whole policy set $\{\mathcal{P}_1,\cdots,\mathcal{P}_M\}$ (flooding approach). Then, each agent acts as a central node and runs a copy of Algorithm 1 locally. Although reasonable for small-size networks, the communication and storage costs of this approach scale poorly with the network size. The sequential structure of Algorithm 1, however, offers an opportunity for a communicationally and computationally more efficient decentralized implementation, as described in steps 1 to 9 of Algorithm 2. Step 10 of Algorithm 2 is included for receding horizon implementation purposes, where the execution plan can be, for example, that one or all of the agents visit at least one node. To implement Algorithm 2, we assume that the agents $\mathcal{A}$ can form a bidirectional connected communication graph $\mathcal{G}_a=(\mathcal{A},\mathcal{E}_a)$, i.e., there is a path from every agent to every other agent on $\mathcal{G}_a$. Then, there always exists a route $\mathrm{SEQ}=s_1\to\cdots\to s_i\to\cdots\to s_K$, $s_k\in\mathcal{A}$, $k\in\{1,\cdots,K\}$, $K\geq M$, that visits all the agents (not necessarily only once), see Fig. 3(a). The agents follow $\mathrm{SEQ}$ to share their information while implementing Algorithm 2. The communication cost to execute Algorithm 2 can be optimized by picking $\mathrm{SEQ}$ to be the shortest path [32] that visits all the agents over the graph $\mathcal{G}_a$. If $\mathcal{G}_a$ has a Hamiltonian path, the optimal choice for $\mathrm{SEQ}$ is a Hamiltonian path. Recall that a Hamiltonian path is a path that visits every agent on $\mathcal{G}_a$ only once [33]. When there is a $\mathrm{SEQ}$ that visits every agent on $\mathcal{G}_a$, the directed information graph $\mathcal{G}_I=(\mathcal{A},\mathcal{E}_I)$ of Algorithm 2, which shows the information access of each agent while implementing Algorithm 2, is full, see Fig. 3. That is, each agent in $\mathrm{SEQ}$ is aware of the previous agents' decisions. Therefore, the solution obtained by Algorithm 2 is an exact sequential greedy algorithm and its optimality gap is $1/2$.
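A minimal sketch of this token-passing implementation of Algorithm 2 follows, assuming a precomputed route `seq` over the communication graph and a reward oracle in the style of the sketches above; all names are illustrative.

```python
def decentralized_sequential_greedy(seq, policy_sets, reward):
    """Algorithm 2 sketch: the partial solution P_bar is handed along the
    route seq = [s_1, ..., s_K] over the communication graph. An agent
    contributes its greedy policy only on its first appearance in seq,
    so the output equals the centralized sequential greedy solution."""
    chosen = []                     # the token P_bar
    contributed = set()
    for agent in seq:               # K >= M hops; agents may repeat
        if agent not in contributed:
            P_i = policy_sets[agent]
            best = max(P_i, key=lambda p: reward(chosen + [p]) - reward(chosen))
            chosen.append(best)
            contributed.add(agent)
        # here the token would be transmitted to the next agent in seq
    return chosen

# Example route over a line communication graph of 3 agents: 0 -> 1 -> 2.
# chosen = decentralized_sequential_greedy([0, 1, 2], policy_sets, reward)
```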
Fig. 3: The plot on the left shows the bidirectional communication graph $\mathcal{G}_a$ in black along with an example $\mathrm{SEQ}$ path in red. The plot on the right shows the complete information sharing graph $\mathcal{G}_I$ when the agents follow $\mathrm{SEQ}$ while implementing Algorithm 2. An arrow going from agent $i$ to agent $j$ means that agent $j$ receives agent $i$'s information.

We recall that the labeling order of the mobile agents does not affect the optimality gap guaranteed by Theorem 3.2 [34]. If an agent $i\in\mathcal{A}$ appears repeatedly in $\mathrm{SEQ}$ (e.g., the blue agent in Fig. 3), with a slight increase in computation cost, we can modify Algorithm 2 to allow agent $i$ to redesign and improve its suboptimal policy $p^{i\star}$ by re-executing step 4 of Algorithm 2.

Another form of decentralized implementation of Algorithm 1, which may be more relevant in urban environments, is through a client-server framework implemented over a cloud. In this framework, agents (clients) connect to a shared memory on a cloud (server) to download or upload information or use the cloud's computing power asynchronously. Let $\{\mathcal{T}_1,\cdots,\mathcal{T}_M\}$ be the sequence of time slots allotted respectively to the agents $\mathcal{A}$, see Fig. 4. To implement Algorithm 1, agent $i\in\mathcal{A}$ connects to the server at the beginning of $\mathcal{T}_i$ to check out $\bar{\mathcal{P}}$ and $\{\bar{t}_v\}_{v\in\mathcal{V}}$. Then, it completes steps 4 and 5 of Algorithm 1 (the greedy policy computation and the update of $\bar{\mathcal{P}}$) and checks the updated $\bar{\mathcal{P}}$ in to the server before $\mathcal{T}_i$ fully elapses. The last agent, $i=M$, based on the execution plan of the receding horizon operation, updates $\{\bar{t}_v\}_{v\in\mathcal{V}}$ and checks it in to the cloud memory for the next receding horizon planning. Since the time slots assigned to the agents do not overlap, agent $i$ has access to the policies $p^{k\star}$ of all agents $k$ with $k<i$. Thus, the information graph $\mathcal{G}_I$ is full, and the optimality gap of $1/2$ holds.

If there is a message dropout while executing Algorithm 2, or if in the decentralized server-client based operation an agent $j$ takes longer than $\mathcal{T}_j$ to complete and check in $\bar{\mathcal{P}}$ to the cloud, the information graph becomes incomplete, see for example Fig. 4. Then, the corresponding decentralized implementation deviates from the exact sequential greedy Algorithm 2. For such cases, [34] shows that the optimality gap, instead of $1/2$, becomes $\frac{1}{M-\omega(\mathcal{G}_I)+2}$, where $\omega(\mathcal{G}_I)$ is the clique number of $\mathcal{G}_I$ [34]. Recall that the clique number of a graph is the number of nodes in its largest complete subgraph [35].
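For completeness, the degraded bound can be evaluated directly from the realized information graph. The sketch below is our own illustration under the assumptions that networkx is available and that the clique number is taken over the undirected view of $\mathcal{G}_I$.

```python
import networkx as nx

def optimality_gap(num_agents, info_edges):
    """Gap from [34]: 1 / (M - omega(G_I) + 2), where omega is the clique
    number of the information graph; a complete graph recovers 1/2."""
    G = nx.Graph(info_edges)
    G.add_nodes_from(range(num_agents))
    omega = max(len(c) for c in nx.find_cliques(G))  # maximal cliques
    return 1.0 / (num_agents - omega + 2)

# Complete information graph over 3 agents recovers the 1/2 bound:
print(optimality_gap(3, [(0, 1), (0, 2), (1, 2)]))   # 0.5
```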
Fig. 4: $\{\mathcal{T}_i\}_{i=1}^{M}$ are the time slots allotted to each agent to connect to the cloud. The arrows show the time each agent took to do its calculations for an example scenario. Here, the associated information graph $\mathcal{G}_I$ is the incomplete graph on the right, with a clique number of 3.

IV. NUMERICAL EXAMPLE

We consider persistent monitoring by three agents for event detection over an area that is divided into a grid map as shown in Fig. 5(a). The geographical nodes of interest $\mathcal{V}$ are the centers of the cells in Fig. 5(a). The agents can travel from a cell to the neighboring cells on the right, left, bottom and top. The agents are homogeneous, and the travel times between any neighboring nodes are identical for all the agents. The agents start their patrolling task from the nodes where they are depicted in Fig. 5(a). We model the probability of discovering at least one event in each geographical node as a Poisson distribution and define our reward function at each node $v\in\mathcal{V}$ as (1) with $\psi_v(t)=1-e^{-\lambda_v t}$, where $\lambda_v\in\mathbb{R}_{>0}$ is the arrival rate of the events; for more details see [29]. Fig. 5(a) shows the reward value of the nodes at $t=120$ seconds when there is no monitoring. The color intensity of the cells in Fig. 5(a) is proportional to $\lambda_v$; the higher $\lambda_v$, the darker the color of node $v$. The region enclosed by the blue rectangle initially has a low reward, but partway through the mission its reward value is increased by raising the $\lambda_v$ of the corresponding cells. An animated depiction of the change in the reward map under the different dispatch policies we discuss below is available in [36]. We compare the performance of Algorithm 1, implemented in a receding horizon fashion, and a conventional greedy algorithm where each agent always moves to the neighboring node that has the instantaneous highest reward value. In implementing Algorithm 1 in a receding horizon fashion, we plan over a multi-second planning horizon but execute only the initial portion of each planned policy before re-planning. We consider both the case of including ($\alpha>0$) and excluding ($\alpha=0$) the nodal importance measure in the reward function (5). Fig. 5(b) shows that the traditional greedy cell selection performs poorly compared to the other two planning algorithms. The reason is that the three agents' decisions become the same after a while, i.e., they start choosing the same cell and moving together; therefore all three agents act as if one agent is patrolling (recall Assumption 1). The performance of Algorithm 1 is better than the traditional greedy cell selection because the effect of agent $i$'s patrolling policy is taken into account when agent $i+1$'s policy is designed. Therefore, the chance that all three agents go to the same cell and move together is small. Furthermore, we note that implementing Algorithm 1 with the nodal importance term delivers a better outcome. The reason is that without nodal importance, the agents are drawn to the region of high importance near them and stay there, as Fig. 5(c) shows. However, there are other important regions with higher values that are farther away, especially the area in the top left corner, which is separated by a low-rate strip from where the agents start. Incorporating nodal importance, as Fig. 5(d) shows, steers the agents to the regions with a higher rate of reward that are beyond the receding horizon's sight.
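The setup of this example is easy to reproduce; the sketch below builds the 4-connected grid and evaluates the Poisson-based reward $\psi_v(t)=1-e^{-\lambda_v t}$, with the grid size and rates chosen by us purely for illustration.

```python
import math
import random

def make_grid_adjacency(rows, cols):
    """4-connected grid: each cell is a node; its neighbors are the
    right, left, bottom and top cells. Usable with the nodal-importance
    sketch (reachable_within) given earlier."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            adj[v] = [rr * cols + cc
                      for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                      if 0 <= rr < rows and 0 <= cc < cols]
    return adj

def psi(lam, t):
    """Probability of at least one Poisson arrival of rate lam during
    an idle interval of length t since the last visit."""
    return 1.0 - math.exp(-lam * t)

rows = cols = 10                                        # illustrative grid size
rates = {v: random.uniform(0.01, 0.1) for v in range(rows * cols)}
adj = make_grid_adjacency(rows, cols)
# Reward of an unmonitored node at t = 120 s (cf. Fig. 5(a)):
print(psi(rates[0], 120.0))
```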
Fig. 5: Three agents patrol a field divided into a grid of cells. (a) Reward map. (b) The collected reward. (c) Agents' paths when they follow Algorithm 1 with $\alpha=0$. (d) Agents' paths when they follow Algorithm 1 with $\alpha>0$.

V. CONCLUSION
We proposed a solution for patrolling a finite number of inter-connected geographical nodes with the purpose of maximizing the gathered reward. We modeled the reward generation in each node as an increasing and concave function of time and tied this to the trajectory scheduling of the agents via a utility function. We discussed that maximizing the utility function is NP-hard. By showing that the reward function is a monotone increasing and submodular set function, we laid the ground to propose a suboptimal solution with a known optimality gap for this NP-hard problem. To induce robustness to changes in the problem parameters, we proposed our suboptimal solution in a receding horizon setting. Next, to compensate for the shortsightedness of the receding horizon approach, we added a new term, called the relative nodal importance, to the reward function as a measure to incorporate a notion of the importance of the regions beyond the feasible solution set of the receding horizon optimization problem. Finally, we discussed how our suboptimal solution can be implemented in a decentralized manner. Our future work is to investigate decentralized algorithms that allow agents to communicate synchronously with each other in order to reach a consensus on a policy with a known optimality gap.

REFERENCES
[1] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, "Coverage control for mobile sensing networks," IEEE Trans. on Robotics and Automation, vol. 20, no. 2, pp. 243–255, 2004.
[2] M. Schwager, D. Rus, and J. Slotine, "Decentralized, adaptive coverage control for networked robots," The Int. Journal of Robotics Research, vol. 28, no. 3, pp. 357–375, 2009.
[3] F. Bullo, R. Carli, and P. Frasca, "Gossip coverage control for robotic networks: Dynamical systems on the space of partitions," SIAM Journal on Control and Optimization, vol. 50, no. 1, pp. 419–447, 2012.
[4] C. Yuan, Y. Zhang, and Z. Liu, "A survey on technologies for automatic forest fire monitoring, detection, and fighting using unmanned aerial vehicles and remote sensing techniques," Canadian Journal of Forest Research, vol. 45, no. 7, pp. 783–792, 2015.
[5] N. Henry and O. Henry, "Wireless sensor networks based pipeline vandalisation and oil spillage monitoring and detection: main benefits for Nigeria oil and gas sectors," The SIJ Trans. on Computer Science Engineering & its Applications (CSEA), vol. 3, no. 1, pp. 1–6, 2015.
[6] R. Engler, A. Guisan, and L. Rechsteiner, "An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo-absence data," Journal of Applied Ecology, vol. 41, no. 2, pp. 263–274, 2004.
[7] T. Thomas and E. van Berkum, "Detection of incidents and events in urban networks," IET Intelligent Transport Systems, vol. 3, no. 2, pp. 198–205, 2009.
[8] X. Lin and C. Cassandras, "An optimal control approach to the multi-agent persistent monitoring problem in two-dimensional spaces," IEEE Trans. on Automatic Control, vol. 60, no. 6, pp. 1659–1664, 2014.
[9] N. Zhou, X. Yu, S. Andersson, and C. Cassandras, "Optimal event-driven multi-agent persistent monitoring of a finite set of data sources," IEEE Trans. on Automatic Control, 2018.
[10] W. Li and C. Cassandras, "A cooperative receding horizon controller for multivehicle uncertain environments," IEEE Trans. on Automatic Control, vol. 51, no. 2, pp. 242–257, 2006.
[11] R. Karp, "Reducibility among combinatorial problems," in Complexity of Computer Computations, pp. 85–103, Springer, 1972.
[12] A. Machado, G. Ramalho, J. Zucker, and A. Drogoul, "Multi-agent patrolling: An empirical analysis of alternative architectures," in International Workshop on Multi-Agent Systems and Agent-Based Simulation, pp. 155–170, 2002.
[13] A. Almeida, G. Ramalho, H. Santana, P. Tedesco, T. Menezes, V. Corruble, and Y. Chevaleyre, "Recent advances on multi-agent patrolling," in Brazilian Symposium on Artificial Intelligence, pp. 474–483, Springer, 2004.
[14] Y. Chevaleyre, "Theoretical analysis of the multi-agent patrolling problem," in Intelligent Agent Technology, pp. 302–308, IEEE, 2004.
[15] F. Pasqualetti, A. Franchi, and F. Bullo, "On cooperative patrolling: Optimal trajectories, complexity analysis, and approximation algorithms," IEEE Trans. on Robotics, vol. 28, no. 3, pp. 592–606, 2012.
[16] J. Yu, S. Karaman, and D. Rus, "Persistent monitoring of events with stochastic arrivals at multiple stations," IEEE Trans. on Robotics, vol. 31, no. 3, pp. 521–535, 2015.
[17] M. Donahue, G. Rosman, K. Kotowick, D. Rus, and C. Baykal, "Persistent surveillance of transient events with unknown statistics," tech. rep., MIT Lincoln Laboratory, Lexington, United States, 2016.
[18] A. Asghar, S. Smith, and S. Sundaram, "Multi-robot routing for persistent monitoring with latency constraints," arXiv preprint arXiv:1903.06105, 2019.
[19] A. Farinelli, L. Iocchi, and D. Nardi, "Distributed on-line dynamic task assignment for multi-robot patrolling," Autonomous Robots, vol. 41, no. 6, pp. 1321–1345, 2017.
[20] A. Blum, P. Chalasani, D. Coppersmith, B. Pulleyblank, P. Raghavan, and M. Sudan, "The minimum latency problem," in Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, pp. 163–171, 1994.
[21] Z. Kadelburg, D. Dukic, M. Lukic, and I. Matic, "Inequalities of Karamata, Schur and Muirhead, and some applications," The Teaching of Mathematics, vol. 8, no. 1, pp. 31–45, 2005.
[22] A. Krause, A. Singh, and C. Guestrin, "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies," Journal of Machine Learning Research, vol. 9, pp. 235–284, 2008.
[23] T. Summers, F. Cortesi, and J. Lygeros, "On submodularity and controllability in complex dynamical networks," IEEE Trans. on Control of Network Systems, vol. 3, no. 1, pp. 91–101, 2016.
[24] Z. Liu, A. Clark, P. Lee, L. Bushnell, D. Kirschen, and R. Poovendran, "Submodular optimization for voltage control," IEEE Trans. on Power Systems, vol. 33, no. 1, pp. 502–513, 2018.
[25] A. Clark, P. Lee, B. Alomair, L. Bushnell, and R. Poovendran, "Combinatorial algorithms for control of biological regulatory networks," IEEE Trans. on Control of Network Systems, vol. 5, no. 2, pp. 748–759, 2018.
[26] A. Krause and C. Guestrin, "Near-optimal observation selection using submodular functions," in American Association for Artificial Intelligence, vol. 7, pp. 1650–1654, 2007.
[27] A. Clark, L. Bushnell, and R. Poovendran, "A supermodular optimization framework for leader selection under link noise in linear multi-agent systems," IEEE Trans. on Automatic Control, vol. 59, no. 2, pp. 283–296, 2014.
[28] M. Fisher, G. Nemhauser, and L. Wolsey, "An analysis of approximations for maximizing submodular set functions II," in Polyhedral Combinatorics, pp. 73–87, Springer, 1978.
[29] N. Rezazadeh and S. S. Kia, "A sub-modular receding horizon approach to persistent monitoring for a group of mobile agents over an urban area," IFAC-PapersOnLine, vol. 52, no. 20, pp. 217–222, 2019.
[30] L. Lovász, "Submodular functions and convexity," in Mathematical Programming: The State of the Art, pp. 235–257, Springer, 1983.
[31] E. Garcia, D. Prett, and M. Morari, "Model predictive control: theory and practice – a survey," Automatica, vol. 25, no. 3, pp. 335–348, 1989.
[32] E. Lawler, J. Lenstra, A. H. G. Rinnooy Kan, and D. Shmoys, The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, Chichester, 1985.
[33] F. Rubin, "A search procedure for Hamilton paths and circuits," Journal of the ACM, vol. 21, no. 4, pp. 576–580, 1974.
[34] B. Gharesifard and S. Smith, "Distributed submodular maximization with limited information," IEEE Trans. on Control of Network Systems, vol. 5, no. 4, pp. 1635–1645, 2018.
[35] J. Bondy and U. Murty, Graph Theory with Applications. Macmillan, 1976.

APPENDIX

Proof of Lemma 2.1:
The time complexity of constructing the admissible policy set $\mathcal{P}_i$ is of the order of the number of possible paths that agent $i\in\mathcal{A}$ can traverse over the mission horizon while respecting Assumption 2, which is of order $D^{\bar{n}_i}$. Thus, the time complexity of constructing the feasible set $\mathcal{P}=\bigcup_{i=1}^{M}\mathcal{P}_i$ is $O(\sum_{i=1}^{M}D^{\bar{n}_i})$. Next, let $\bar{\mathcal{P}}$ be any subset of $\mathcal{P}$ that satisfies constraint (3b). Due to Assumption 1, the reward scored by implementing a policy $p=(V_p,T_p,a_p)\in\bar{\mathcal{P}}$ cannot be calculated independently of all the other policies in $\bar{\mathcal{P}}\backslash\{p\}$. Hence, to solve optimization problem (3), we need to evaluate all the possible policy sets $\bar{\mathcal{P}}$ satisfying constraint (3b). Since $\bar{\mathcal{P}}$ can have at most one policy from the policy set $\mathcal{P}_i$ of each $i\in\mathcal{A}$, and $\mathcal{P}_i$ has $O(D^{\bar{n}_i})$ members, $O(\prod_{i=1}^{M}D^{\bar{n}_i})$ different possibilities for $\bar{\mathcal{P}}$ exist, which determines the time complexity of solving (3). $\square$

We develop the auxiliary results below for use in the proof of Theorem 3.1. These results establish some properties of sums of evaluations of a concave and increasing function over increasing sequences and their concatenations. The decreasing sequence $(\delta t)_n$ majorizes the decreasing sequence $(\delta v)_n$ if
$$\delta t_1\geq\delta t_2\geq\cdots\geq\delta t_n,\qquad \delta v_1\geq\delta v_2\geq\cdots\geq\delta v_n,$$
$$\delta t_1+\cdots+\delta t_i\geq\delta v_1+\cdots+\delta v_i,\quad\forall i\in\{1,\cdots,n-1\},$$
and $\delta t_1+\cdots+\delta t_n=\delta v_1+\cdots+\delta v_n$ hold.
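The definition transcribes directly into code; the following checker, with the helper name `majorizes` being our own, is useful for sanity-checking the lemmas below.

```python
def majorizes(dt, dv, tol=1e-12):
    """Check that the decreasing sequence dt majorizes dv: equal lengths,
    both sorted decreasingly, every prefix sum of dt dominates the
    corresponding prefix sum of dv, and the total sums are equal."""
    if len(dt) != len(dv):
        return False
    if sorted(dt, reverse=True) != list(dt) or sorted(dv, reverse=True) != list(dv):
        return False
    st = sv = 0.0
    for a, b in zip(dt, dv):
        st += a
        sv += b
        if st < sv - tol:
            return False
    return abs(st - sv) <= tol

# Instance from the proof of Corollary 5.1 with a=1, b=2, c=3, d=4:
# (c+d, b, a) = (7, 2, 1) vs the decreasing ordering of (a+b, c, d) = (4, 3, 3).
print(majorizes([7, 2, 1], [4, 3, 3]))   # True
```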
Lemma A.1: Let $f:\mathbb{R}\to\mathbb{R}$ be a concave and increasing function with $f(0)=0$. If the sequences $(\delta t)_n$ and $(\delta v)_m$ with $n\leq m$ satisfy $\delta t_1+\cdots+\delta t_i\geq\delta v_1+\cdots+\delta v_i$ for all $i\in\{1,\cdots,n-1\}$, and $\delta t_1+\cdots+\delta t_n=\delta v_1+\cdots+\delta v_m$, then $f(\delta t_1)+\cdots+f(\delta t_n)\leq f(\delta v_1)+\cdots+f(\delta v_m)$ holds.

Proof: We note that the sequence $(\delta u)_m$ defined as $\delta u_i=\delta t_i$ for $i\in\{1,\cdots,n\}$ and $\delta u_i=0$ for $i\in\{n+1,\cdots,m\}$ majorizes any sequence $(\delta v)_m$ defined in the lemma statement. Then, since $f(0)=0$, the proof follows from Karamata's inequality [21].
Corollary 5.1: Let $f:\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ be a monotone increasing and concave function. Then, for any $a,b,c,d\in\mathbb{R}_{\geq 0}$ such that $0\leq a\leq c$ and $0\leq b\leq d$,
$$f(a)+f(b)-f(a+b)\leq f(c)+f(d)-f(c+d)$$
holds.

Proof: Since $a\leq c$ and $b\leq d$, we have $a+b\leq c+d$. Let $(\delta t)_3$ be the decreasing rearrangement of $(c+d,a,b)$ and $(\delta v)_3$ be the decreasing rearrangement of $(a+b,c,d)$. Since $c+d\geq a+b$, $c+d\geq c\geq a$ and $c+d\geq d\geq b$, there are two possible cases for $(\delta t)_3$:
$$(A1):\ \delta t_1=c+d,\ \delta t_2=a,\ \delta t_3=b,$$
$$(A2):\ \delta t_1=c+d,\ \delta t_2=b,\ \delta t_3=a,$$
and there are six possible cases for $(\delta v)_3$:
$$(B1):\ \delta v_1=a+b,\ \delta v_2=d,\ \delta v_3=c,$$
$$(B2):\ \delta v_1=a+b,\ \delta v_2=c,\ \delta v_3=d,$$
$$(B3):\ \delta v_1=d,\ \delta v_2=a+b,\ \delta v_3=c,$$
$$(B4):\ \delta v_1=c,\ \delta v_2=a+b,\ \delta v_3=d,$$
$$(B5):\ \delta v_1=c,\ \delta v_2=d,\ \delta v_3=a+b,$$
$$(B6):\ \delta v_1=d,\ \delta v_2=c,\ \delta v_3=a+b.$$
In every case we have $\delta t_1+\delta t_2+\delta t_3=\delta v_1+\delta v_2+\delta v_3=a+b+c+d$, and comparing any case of $A$ with any case of $B$ gives $\delta t_1\geq\delta v_1$. Take case (A1), for which $a\geq b$. Then $c+d+a\geq a+b+d$ (since $c\geq b$), $c+d+a\geq a+b+c$ (since $d\geq b$), and trivially $c+d+a\geq c+d$; hence, comparing case (A1) with any case of $B$, we have $\delta t_1+\delta t_2\geq\delta v_1+\delta v_2$. The same reasoning applies to case (A2). Therefore, for any cases of $A$ and $B$, $(\delta t)_3$ majorizes $(\delta v)_3$. Karamata's inequality for the concave function $f$ then yields $f(c+d)+f(a)+f(b)\leq f(a+b)+f(c)+f(d)$, and consequently $f(a)+f(b)-f(a+b)\leq f(c)+f(d)-f(c+d)$.
Lemma A.2: For any increasing sequence $(q)_l$, let
$$g((q)_l)=\sum_{i=1}^{l-1}f(\Delta q_i),$$
where $\Delta q_i=q_{i+1}-q_i$ and $f$ is a concave and increasing function with $f(0)=0$. Now, consider two increasing sequences $(t)_n$ and $(u)_l$, and their concatenation $(a)_{n+l}=(t)_n\oplus(u)_l$. Then, $g((a)_{n+l})-g((t)_n)\geq 0$ holds.

Proof: If $a_p=t_1$ and $a_q=t_n$, then since $(a)_{n+l}$ is an increasing sequence, $p<q$. Let the sub-sequence of $(a)_{n+l}$ ranging from index $p$ to $q$ be $(v)_m$, where $m\geq n$. Letting $\Delta v_i=v_{i+1}-v_i$ and $\Delta t_i=t_{i+1}-t_i$, we rearrange the $\Delta v_i$'s and $\Delta t_i$'s in descending order to form the sequences $(\delta v)_{m-1}$ and $(\delta t)_{n-1}$. Since $a_p=t_1$ and $a_q=t_n$, we have
$$\sum_{i=1}^{m-1}\Delta v_i=\sum_{i=1}^{m-1}\delta v_i=\sum_{i=1}^{n-1}\delta t_i=\sum_{i=1}^{n-1}\Delta t_i=t_n-t_1.$$
Because $(a)_{n+l}=(t)_n\oplus(u)_l$, each interval $\Delta t_i$ is a union of consecutive intervals among the $\Delta v_j$'s, i.e., for every $i\in\{1,\cdots,n-1\}$ there exists $S_i\subset\{1,\cdots,m-1\}$ such that $\sum_{j\in S_i}\delta v_j=\delta t_i$, where $S_i\cap S_k=\emptyset$ for $i\neq k$. Consequently, for any $r\in\{1,\cdots,m-1\}$, we have $\sum_{i=1}^{r}\delta v_i\leq\sum_{j\in S}\delta t_j$ for some $S\subset\{1,\cdots,n-1\}$ with $|S|\leq r$. Since $(\delta t)_{n-1}$ is a decreasing sequence, we can write
$$\sum_{i=1}^{r}\delta v_i\leq\sum_{i=1}^{r}\delta t_i.$$
Thus, $f(\delta t_1)+\cdots+f(\delta t_{n-1})\leq f(\delta v_1)+\cdots+f(\delta v_{m-1})$ holds as a result of Lemma A.1. Given that
$$f(\delta t_1)+\cdots+f(\delta t_{n-1})=\sum_{i=1}^{n-1}f(\Delta t_i)$$
and
$$f(\delta v_1)+\cdots+f(\delta v_{m-1})=\sum_{i=1}^{m-1}f(\Delta v_i)\leq\sum_{i=1}^{n+l-1}f(\Delta a_i),$$
we obtain $\sum_{i=1}^{n-1}f(\Delta t_i)\leq\sum_{i=1}^{n+l-1}f(\Delta a_i)$, which concludes the proof.
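A quick numeric sanity check of Lemma A.2, assuming $f(x)=\sqrt{x}$ for the concave increasing function with $f(0)=0$: merging extra visit times into a schedule never decreases the concave gap sum $g$.

```python
import math
import random

def g(q, f=math.sqrt):
    """g((q)_l): sum of f over the consecutive gaps of the increasing sequence q."""
    return sum(f(q[i + 1] - q[i]) for i in range(len(q) - 1))

random.seed(0)
for _ in range(1000):
    t = sorted(random.uniform(0, 10) for _ in range(5))
    u = sorted(random.uniform(0, 10) for _ in range(3))
    a = sorted(t + u)                    # the concatenation (t)_n + (u)_l
    assert g(a) - g(t) >= -1e-9          # Lemma A.2
print("Lemma A.2 check passed")
```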
Lemma A.3: For any increasing sequence $(q)_l$, let $g((q)_l)=\sum_{i=1}^{l-1}f(\Delta q_i)$, where $\Delta q_i=q_{i+1}-q_i$ and $f$ is a concave and increasing function with $f(0)=0$. Now, consider three increasing sequences $(t)_n$, $(v)_m$ and $(u)_l$ and the concatenations $(a)_{n+l}=(t)_n\oplus(u)_l$ and $(b)_{m+l}=(v)_m\oplus(u)_l$, where $(v)_m$ is a sub-sequence of $(t)_n$. Then,
$$\big(g((b)_{m+l})-g((v)_m)\big)-\big(g((a)_{n+l})-g((t)_n)\big)\geq 0.$$

Proof: Let the sequence $(u)_p$ be the first $p$ elements of $(u)_l$, with $(u)_0$ the empty sequence with no members. Then, we can form
$$\Delta S_p=\big(g((v)_m\oplus(u)_p)-g((v)_m\oplus(u)_{p-1})\big)-\big(g((t)_n\oplus(u)_p)-g((t)_n\oplus(u)_{p-1})\big).$$
Since $(v)_m$ is a sub-sequence of $(t)_n$ and $(u)_p$ has one more member than $(u)_{p-1}$, we have
$$\Delta S_p=\big(f(\Delta S_1)+f(\Delta S_2)-f(\Delta S_1+\Delta S_2)\big)-\big(f(\Delta S_3)+f(\Delta S_4)-f(\Delta S_3+\Delta S_4)\big),$$
with $0\leq\Delta S_3\leq\Delta S_1$ and $0\leq\Delta S_4\leq\Delta S_2$. Here $\Delta S_1+\Delta S_2$ is the gap of $(v)_m\oplus(u)_{p-1}$ that the new element splits into $\Delta S_1$ and $\Delta S_2$, and $\Delta S_3+\Delta S_4$ is the corresponding, no-larger gap of $(t)_n\oplus(u)_{p-1}$, which is finer since $(v)_m$ is a sub-sequence of $(t)_n$. From Corollary 5.1, we conclude that $\Delta S_p\geq 0$. Then, given
$$\sum_{p=1}^{l}\Delta S_p=\big(g((b)_{m+l})-g((v)_m)\big)-\big(g((a)_{n+l})-g((t)_n)\big)\geq 0,$$
the proof is concluded.