Event-Driven Receding Horizon Control of Energy-Aware Dynamic Agents For Distributed Persistent Monitoring
Shirantha Welikala and Christos G. Cassandras
Abstract — This paper addresses the persistent monitoring problem defined on a network where a set of nodes (targets) needs to be monitored by a team of dynamic energy-aware agents. The objective is to control the agents' motion to jointly optimize the overall agent energy consumption and a measure of overall node state uncertainty, evaluated over a finite period of interest. To achieve these objectives, we extend an established event-driven Receding Horizon Control (RHC) solution by adding an optimal controller to account for agent motion dynamics and associated energy consumption. The resulting RHC solution is computationally efficient, distributed and on-line. Finally, numerical results are provided highlighting improvements compared to an existing RHC solution that uses energy-agnostic first-order agents.
I. INTRODUCTION
We consider the problem of controlling a group of mobile agents deployed to monitor a finite set of "points of interest" (henceforth called targets) in a mission space. In particular, each agent follows second-order unicycle dynamics and each target has an "uncertainty" metric associated with its state that increases when no agent is monitoring (i.e., sensing or collecting information from) the target and decreases when one or more agents are monitoring it by dwelling in its vicinity. The goal is to optimally control each agent's motion so as to collectively minimize the overall agent energy consumption and a measure of target uncertainties, evaluated over a fixed period of interest. This problem setup is widely known as the persistent monitoring problem and it encompasses applications such as environmental sensing [1], surveillance [2], traffic monitoring [3], data collection [4], event detection [5] and energy management [6]. In order to suit different application scenarios, this persistent monitoring problem has been studied in the literature under different objective functions [7], agent dynamic models [8], [9] and target state dynamic models [10], [11].

A common way to categorize persistent monitoring problem setups is based on whether the shapes of trajectory segments (available for the agents to travel between targets) are predefined [4], [10] or not [11], [12]. In the latter case, the main challenge is to search for the optimal agent trajectory shapes. This is often achieved by restricting agent trajectory shapes to specific parametric families (elliptical, Fourier, etc. [12]) and optimizing the objective function of interest within these families. In contrast, when the shapes of trajectory segments are predefined, the challenge is to search for: 1) the optimal target visiting schedules of agents and 2) the optimal control laws to govern agents on corresponding trajectory segments, assuming an agent has to remain stationary on a target to monitor it. As introduced in [10] and illustrated in Fig. 1, this can be seen as a Persistent Monitoring on a Network (PMN) problem where targets and trajectory segments are modeled as nodes and edges of a network, respectively. Such PMN problems are significantly more complicated than the NP-hard traveling salesman problems [13] and thus have inspired many different solution approaches [8], [10], [14].

Fig. 1: The network abstraction.

The work in [14] proposes a centralized off-line greedy algorithm to determine the optimal target visiting schedules of agents (i.e., each agent's sequence of targets to visit and respective dwell-times to be spent at visited targets) in PMN problems. In contrast, for the same task, [10] proposes a gradient-based distributed on-line approach, which, however, requires a brief centralized off-line initialization stage to address non-convexities. An alternative approach is taken in the recent work [8], which exploits the event-driven nature of PMN systems to develop a distributed on-line solution based on event-driven Receding Horizon Control (RHC) [15].

⋆ Supported in part by NSF under grants ECCS-1931600, DMS-1664644, CNS-1645681, by AFOSR under grant FA9550-19-1-0158, by ARPA-E under grant DE-AR0001282, by the NEXTCAR program under grant DE-AR0000796 and by the MathWorks. The authors are with the Division of Systems Engineering and Center for Information and Systems Engineering, Boston University, Brookline, MA 02446, {shiran27,cgc}@bu.edu.
This RHC solution enjoys many promising features such as being computationally cheap, parameter-free, gradient-free and robust in the presence of various forms of state and system perturbations.

However, the work mentioned above [8], [10], [14] ignores agent dynamics by assuming each trajectory segment has a predefined transit-time value that an agent has to spend in order to travel on it. This assumption allows one to focus on determining the optimal target visiting schedules of agents, ignoring how the agents are governed during the transition periods where they travel on trajectory segments. In essence, it is identical to assuming each agent follows a first-order dynamic model controlled by its velocity.

In contrast, in this paper, we assume each agent follows a second-order dynamic model governed by acceleration rather than velocity. This leads to a better approximation of actual agent behaviors in practice and smoother agent state trajectories [9]. In particular, we incorporate agent energy consumption into the objective function to limit agent accelerations and velocities and also to motivate agents to make energy-efficient decisions. Under these modifications, we show how each agent needs to optimally select each transit-time value on its trajectory based on current local state information, instead of using a fixed set of predefined transit-time values. In particular, we explicitly derive optimal control laws to govern each agent on each trajectory segment. Finally, we not only compare the improvements achieved with respect to an existing RHC solution [8] that uses energy-agnostic first-order agents but also derive energy-aware optimal control laws for even such first-order agents.

In this paper, first, we show that each agent's trajectory is fully characterized by the sequence of decisions it makes at specific discrete event times in its trajectory.
Second, considering an agent at each such event-time, we formulate a Receding Horizon Control Problem (RHCP) that determines the agent's optimal immediate control decisions over an optimally determined planning horizon. These control decisions are subsequently executed over a shorter action horizon defined by the next event that the agent observes, and the same process is continued in this event-driven manner. As the third step, we show that this RHCP includes an optimal control component, and it is then solved considering energy-aware second-order agents. Finally, several different numerical examples (i.e., PMN problems) are used to compare the developed RHC solution with respect to the RHC solution proposed in [8] that uses energy-agnostic first-order agents.

This paper is organized as follows. Section II presents the problem formulation and an overview of the RHC approach. Sections III and IV present the formulation and solution of the RHCP with second-order agents and first-order agents, respectively. Numerical results are provided in Section V. Finally, Section VI concludes the paper.

II. PROBLEM FORMULATION
We consider a 2-dimensional mission space containing M targets (nodes) in the set T = {1, 2, ..., M} where the location of target i ∈ T is fixed at Y_i ∈ R². A team of N agents in the set A = {1, 2, ..., N} is deployed to monitor the targets. Each agent a ∈ A moves within this mission space where its location and orientation at time t are denoted by s_a(t) ∈ R² and θ_a(t) ∈ [0, 2π], respectively.

a) Target Model: Each target i ∈ T has an associated uncertainty state R_i(t) ∈ R which follows the dynamics [10]:

Ṙ_i(t) = A_i − B_i N_i(t)   if R_i(t) > 0 or A_i − B_i N_i(t) > 0,
Ṙ_i(t) = 0                  otherwise,    (1)

where N_i(t) = Σ_{a∈A} 1{s_a(t) = Y_i} (1{·} denotes the indicator function) is the number of agents present at target i at time t. According to (1): (i) R_i(t) increases at a rate A_i when no agent is visiting target i, (ii) R_i(t) decreases at a rate A_i − B_i N_i(t) where B_i is the uncertainty removal rate by an agent visiting the target i, and (iii) R_i(t) ≥ 0, ∀t.

b) Agent Model: The location and orientation (s_a(t), θ_a(t)) of an agent a ∈ A follow the second-order unicycle dynamics given by

ṡ_a(t) = v_a(t) [cos(θ_a(t)), sin(θ_a(t))]^T,   v̇_a(t) = u_a(t),   θ̇_a(t) = w_a(t),    (2)

where v_a(t) is the tangential velocity, u_a(t) is the tangential acceleration and w_a(t) is the angular velocity. We consider u_a(t) and w_a(t) as the agent control inputs.

Note that according to (1), the agent has to stay stationary on a target i ∈ T for some positive amount of time to contribute to decreasing a positive target uncertainty R_i(t). Therefore, during such a dwell-time period, the agent must enforce u_a(t) = v_a(t) = 0 and s_a(t) = Y_i.

c) Objective: Our aim is to minimize the composite objective J_T of the total energy spent J_e (called the energy objective) and the mean system uncertainty J_s (called the sensing objective) over a finite time interval [0, T]:

J_T ≜ α J_e + J_s = α ∫_0^T Σ_{a∈A} u_a²(t) dt + (1/T) ∫_0^T Σ_{i∈T} R_i(t) dt,    (3)

where the first term is J_e and the second is J_s, by controlling agent control inputs u_a(t), w_a(t), ∀a ∈ A, t ∈ [0, T]. Note that α in (3) is a weight factor that can also be manipulated to constrain the resulting optimal agent controls (details on selecting α to ensure proper normalization of the J_T components are provided in Appendix A). Note also that the cost of angular velocity (steering) control is not included in (3).
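The target dynamics (1) are easy to exercise numerically. The following is a minimal sketch (not from the paper) that forward-integrates (1) for a single target with hypothetical parameters A_i = 1, B_i = 3 and a prescribed visit indicator N_i(t):

```python
# Sketch: forward-Euler integration of the target uncertainty dynamics (1)
# for one target with hypothetical parameters and visit schedule.

def uncertainty_trajectory(A, B, N_of_t, T, dt=1e-3):
    """Integrate (1): dR/dt = A - B*N(t) when R > 0 or A - B*N(t) > 0, else 0."""
    R, traj = 0.0, []
    for k in range(int(T / dt)):
        rate = A - B * N_of_t(k * dt)
        if R > 0.0 or rate > 0.0:
            R = max(R + rate * dt, 0.0)   # enforces R_i(t) >= 0
        traj.append(R)
    return traj

# One agent dwells on the target during t in [2, 4): N(t) = 1 there.
traj = uncertainty_trajectory(A=1.0, B=3.0,
                              N_of_t=lambda t: 1 if 2.0 <= t < 4.0 else 0,
                              T=6.0)

# R grows at rate A = 1 for 2 s (peak ~2), decays at rate B - A = 2 until it
# hits 0, stays at 0 while the agent keeps dwelling, then grows again.
print(max(traj))   # ~2.0
```

The piecewise-linear behavior of R_i(t) seen here is exactly what Theorem 1 below exploits.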
The trade-off between the J_e and J_s components of (3) is clear from the fact that the aggressiveness of agent transitions in-between targets affects negatively the J_e component but positively the J_s component.

d) Graph Topology: We embed a directed graph topology G = (T, E) into the mission space so that the targets are represented by the graph vertices T = {1, 2, ..., M} and the inter-target trajectory segments are represented by the graph edges E ⊆ {(i, j) : i, j ∈ T} (see also Fig. 1). These trajectory segments may take arbitrary (prespecified) shapes so as to account for constraints in the mission space and agent motion. We use ρ_ij to denote the transit-time that an agent spends on a trajectory segment (i, j) ∈ E to reach target j from target i. In contrast to [10] and [8], where these transit-time values were treated as predefined, in this work they are considered as control-dependent. We also use P_ij to represent the transit-time interval (P_ij ⊂ [0, T] of length ρ_ij) corresponding to the transit-time ρ_ij.

The neighbor set and the neighborhood of a target i ∈ T are defined based on the available trajectory segments E as

N_i ≜ {j : (i, j) ∈ E}   and   N̄_i = N_i ∪ {i}.    (4)

e) Control: As stated earlier, when an agent a ∈ A dwells on a target i ∈ T, the agent control u_a(t) is zero. However, over such a dwell-time period, the agent control w_a(t) may or may not be zero (exact details will be provided later). Next, when the agent is ready to leave the target i, it needs to decide the next-visit target j ∈ N_i along with the corresponding control profiles u_a(t), w_a(t) to be used on the trajectory segment (i, j) ∈ E over t ∈ P_ij.

In essence, the overall control exerted on an agent can be seen as a sequence of: dwell-times δ_i ∈ R_{≥0}, next-visit targets j ∈ N_i and control profile segments {(u_a(τ), w_a(τ)) : τ ∈ P_ij}.
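The neighbor sets in (4) follow directly from the edge set E; a small sketch with a hypothetical three-target network:

```python
# Sketch: building the neighbor sets of (4) from a hypothetical edge list E.
# Targets are labeled 1..M; a directed edge (i, j) is a trajectory segment
# along which an agent can travel from target i to target j.

E = {(1, 2), (2, 1), (2, 3), (3, 1)}          # hypothetical directed segments
targets = {1, 2, 3}

N = {i: {j for (src, j) in E if src == i} for i in targets}   # N_i
N_bar = {i: N[i] | {i} for i in targets}                      # N̄_i = N_i ∪ {i}

print(N[2])      # {1, 3}
print(N_bar[3])  # {1, 3}
```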
Our goal is to determine (δ_i(t_s), j(t_s), {(u_a(τ), w_a(τ)) : τ ∈ P_ij(t_s)}) for any agent a ∈ A residing at any target i ∈ T at any time t_s ∈ [0, T], which is optimal in the sense of minimizing (3).

Clearly, this PMN problem is more complicated than the well-known NP-hard traveling salesman problem (TSP) [13] due to its inclusion of: (i) multiple agents, (ii) target dynamics, (iii) agent dynamics, (iv) target dwell-times and (v) repeated target visits. Even though one can still resort to dynamic programming techniques to solve this PMN problem, for all the above reasons, the problem is intractable, even for the most simplistic problem configurations.

f) Receding Horizon Control: As a solution to this PMN problem, inspired by the prior work [8] (where we dealt with first-order agents without agent energy concerns), this paper proposes an
Event-Driven Receding Horizon Controller (RHC) at each agent. The key idea behind RHC derives from Model Predictive Control (MPC). However, RHC exploits the problem's event-driven nature to significantly reduce the complexity by effectively decreasing the frequency of control updates. As introduced in [15] and extended later on in [16], [8], the RHC is invoked by the agents in a distributed manner at specific events of interest in their trajectories. Upon invoking it, RHC determines the agent controls that optimize the objective (3) over a planning horizon and subsequently executes the determined optimal controls over a shorter action horizon.

In particular, when the RHC is invoked at some event-time t_s ∈ [0, T] by an agent a ∈ A while residing at target i ∈ T, it determines: (i) the remaining dwell-time δ_i(t_s) at target i, (ii) the next-visit target j(t_s) ∈ N_i, (iii) the control profile segments {u_a(τ), w_a(τ) : τ ∈ P_ij(t_s)} and (iv) the dwell-time δ_j(t_s) at target j(t_s). These control decisions are jointly represented by U_ia(t_s), and its optimal value is determined by solving an optimization problem of the form:

U*_ia(t_s) = argmin_{U_ia(t_s) ∈ U(t_s)} J_H(X_ia(t_s), U_ia(t_s); H) + Ĵ_H(X_ia(t_s + H)),    (5)

where X_ia(t_s) is the current local state and U(t_s) is the feasible control set at time t_s (exact definitions are provided later). The term J_H(X_ia(t_s), U_ia(t_s); H) represents the immediate cost over the planning horizon [t_s, t_s + H] and Ĵ_H(X_ia(t_s + H)) is an estimate of the future cost based on the state at t_s + H.

In particular, we follow the variable horizon concept proposed in [8], where the planning horizon length is treated as an upper-bounded function of the control decisions w(U_ia(t_s)) ≤ H rather than an exogenously selected value H, and the Ĵ_H(X_ia(t_s + H)) term is ignored.
Hence, this approach incorporates the selection of the planning horizon length w(U_ia(t_s)) into the optimization problem (5), which can now be re-stated as

U*_ia(t_s) = argmin_{U_ia(t_s) ∈ U(t_s)} J_H(X_ia(t_s), U_ia(t_s); w(U_ia(t_s)))
subject to w(U_ia(t_s)) ≤ H.    (6)

A. Preliminary Results
According to (1), the target state (uncertainty) R_i(t) of a target i ∈ T is piece-wise linear, and its gradient Ṙ_i(t) changes only when one of the following (strictly local) events occurs: (i) an agent arrival at i, (ii) R_i(t) switches from positive to zero, denoted as [R_i(t) → 0^+], or (iii) an agent departure from i. Let us denote the sequence of such event times (associated with the target i) as t_i^k where k ∈ Z_{>0}, with t_i^0 = 0. Then, it is easy to see from (1) that

Ṙ_i(t) = Ṙ_i(t_i^k), ∀t ∈ [t_i^k, t_i^{k+1}).    (7)

Remark 1:
As pointed out in [17], [8] (and the references therein), allowing multiple agents to simultaneously reside on a target (known also as "simultaneous target sharing") is known to lead to solutions with poor performance levels. Thus, we enforce a constraint [8] on the controller to ensure:

N_i(t) ∈ {0, 1}, ∀t ∈ [0, T], ∀i ∈ T.    (8)

Clearly, this constraint only applies if N ≥ 2. Under (8), the gradient sequence {Ṙ_i(t_i^k)}_{k=1,2,...} is a cyclic order of three elements: {−(B_i − A_i), 0, A_i}. Next, in order to make sure that each agent is capable of enforcing the event [R_i → 0^+] at any target i ∈ T, we assume the following simple stability condition [8]:

Assumption 1:
Target uncertainty rate parameters A_i and B_i of each target i ∈ T satisfy 0 < A_i < B_i.

a) Decomposition of the Sensing Objective J_s: The following theorem provides a target-wise and temporal decomposition of the sensing objective J_s defined in (3).

Theorem 1: ([8, Th. 1]) The contribution to the term J_s in (3) by a target i ∈ T during a time period [t_0, t_1) ⊆ [t_i^k, t_i^{k+1}) for some k ∈ Z_{≥0} is (1/T) J_i(t_0, t_1), where

J_i(t_0, t_1) = ∫_{t_0}^{t_1} R_i(t) dt = (t_1 − t_0) [R_i(t_0) + ½ Ṙ_i(t_0)(t_1 − t_0)].    (9)

b) Local Sensing Objective Function: The local sensing objective function of a target i ∈ T over a period [t_0, t_1) ⊆ [0, T] is defined as

J̄_i(t_0, t_1) = Σ_{j ∈ N̄_i} J_j(t_0, t_1),    (10)

where each J_j(t_0, t_1) term is evaluated using Theorem 1.

c) Decomposition of the Energy Objective J_e: A similar decomposition result as Theorem 1 applies to the energy objective J_e defined in (3). However, this result is immediate from (3) and is as follows. The contribution to the term J_e in (3) by an agent a ∈ A from traversing a trajectory segment (i, j) ∈ E over the transit-time interval [t_o, t_f] ≜ P_ij is J_a(t_o, t_f), where

J_a(t_o, t_f) = ∫_{t_o}^{t_f} u_a²(t) dt.    (11)

Note that the agent makes no contribution to the J_e term during dwell-time intervals, as u_a(t) = 0 there.

d) Agent Angular Velocity Profile w_a(t): The control profile segment {w_a(t) : t ∈ P_ij} that needs to be used by an agent a ∈ A over the transit-time interval P_ij on the trajectory segment (i, j) ∈ E can be obtained using only the following information: (i) the agent tangential acceleration profile {u_a(t) : t ∈ P_ij} and (ii) the shape of the trajectory segment (i, j) given in a parametric form {(x(p), y(p)) : p ∈ [p_o, p_f]}.
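Since R_i(t) is linear between events by (7), the integral in (9) is just the area under a line segment. A quick numerical sanity check of this closed form, with hypothetical values R_i(t_0) = 2 and Ṙ_i(t_0) = 0.5:

```python
# Sketch: checking the closed form (9) against a midpoint-rule quadrature of
# the (linear) uncertainty R(t) = R0 + Rdot*(t - t0). Values are hypothetical.

def J_closed_form(R0, Rdot, t0, t1):
    """Closed form of (9): (t1 - t0) * (R0 + 0.5 * Rdot * (t1 - t0))."""
    dt = t1 - t0
    return dt * (R0 + 0.5 * Rdot * dt)

def J_numeric(R0, Rdot, t0, t1, n=10_000):
    """Midpoint-rule integral of R0 + Rdot*(t - t0) over [t0, t1]."""
    h = (t1 - t0) / n
    return sum((R0 + Rdot * (k + 0.5) * h) * h for k in range(n))

print(J_closed_form(2.0, 0.5, 1.0, 3.0))   # 2*(2 + 0.5*0.5*2) = 5.0
```

The midpoint rule is exact for linear integrands, so the two values agree up to floating-point rounding.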
Note that the parameter values p = p_o and p = p_f correspond to the terminal target locations Y_i ≡ (x(p_o), y(p_o)) and Y_j ≡ (x(p_f), y(p_f)), respectively. For notational convenience, let us denote x'_p = dx(p)/dp, y'_p = dy(p)/dp, x''_p = d²x(p)/dp² and y''_p = d²y(p)/dp².

First, we require a minor technical assumption regarding the said trajectory segment shape parameterization.

Assumption 2:
There exists an injective (i.e., one-to-one) function f : [p_o, p_f] → [0, y_ij] such that

f(p) ≜ ∫_{p_o}^{p} √((x'_p)² + (y'_p)²) dp,    (12)

with f(p_f) = y_ij and a corresponding inverse function f^{−1}.

This assumption simply means that we should be able to express the distance, say l, along the trajectory segment starting from (x(p_o), y(p_o)) to (x(p), y(p)) where p ∈ [p_o, p_f], explicitly in terms of the parameter p (i.e., l = f(p)) and vice versa (i.e., p = f^{−1}(l)). Clearly, this assumption holds if the distance l is used directly as the parameter p (i.e., p = l) that characterizes the trajectory segment shape.

Second, let us define a function F : [p_o, p_f] → R such that

F(p) ≜ (x'_p y''_p − y'_p x''_p) / ((x'_p)² + (y'_p)²)^{3/2}.    (13)

Finally, as shown in Fig. 2, let us denote by l_a(t), t ∈ P_ij, the total distance the agent has traveled on the trajectory segment (i, j) by time t. According to (2), v_a(t), t ∈ P_ij, represents the agent tangential velocity on the trajectory segment at time t. Considering the agent dynamics along the tangential direction to the trajectory segment, note that we can write

l_a(t) = ∫_{t_o}^{t} v_a(τ) dτ   and   v_a(t) = ∫_{t_o}^{t} u_a(τ) dτ,    (14)

for all t ∈ [t_o, t_f] ≜ P_ij (note also that the terminal conditions l_a(t_f) = y_ij and v_a(t_f) = 0 hold for any agent a ∈ A while traversing a trajectory segment (i, j) ∈ E).

Theorem 2:
The required agent angular velocity profile {w_a(t) : t ∈ P_ij} on the trajectory segment (i, j) ∈ E is

w_a(t) = F(f^{−1}(l_a(t))) v_a(t),    (15)

where f(·) and F(·) are as in (12) and (13), respectively.

Proof:
Provided in Appendix B.

For an example, if the trajectory segment (i, j) (between target locations Y_i and Y_j) takes a circular shape centered at C_ij ∈ R² with a radius r_ij, it can be represented by the parametric form {(x(p), y(p)) : p ∈ [p_o, p_f]} where (x(p), y(p)) ≡ C_ij + r_ij [cos(p), sin(p)]^T and p_o = arctan(Y_i − C_ij), p_f = arctan(Y_j − C_ij). Using (13) and (15), it can be shown that F(p) = 1/r_ij and w_a(t) = v_a(t)/r_ij, respectively. Similarly, if the trajectory segment (i, j) takes a linear shape, it can be shown that f(p) = p, F(p) = 0 and w_a(t) = 0.

Remark 2:
In robotics applications where line-following techniques can be used [18], an agent can use its line-following capabilities to control its angular velocity w_a(t) (instead of using (15)), irrespective of its tangential acceleration u_a(t).

In conclusion, Theorem 2 allows us to dispense with w_a(t) as a control input because it is always determined through u_a(t) (which gives v_a(t) via (14)) and the prespecified shape of the trajectory segment.

e) The Equivalent Dynamic Agent Model: Since we have now discussed how an agent a ∈ A can control its angular velocity w_a(t) (i.e., via (15)), we can omit the angular dynamics from (2) to construct an equivalent dynamic agent model that focuses only on the tangential dynamics on a trajectory segment (i, j) ∈ E. In particular, as a direct consequence of (14), upon taking the state vector as [l_a(t), v_a(t)]^T for t ∈ P_ij, we can express the corresponding state dynamics as a second-order single-input linear system:

[l̇_a(t); v̇_a(t)] = [0 1; 0 0] [l_a(t); v_a(t)] + [0; 1] u_a(t).    (16)

In the sequel, we use (16) and (15) to determine the optimal agent control profile segments {u_a(t) : t ∈ P_ij} and {w_a(t) : t ∈ P_ij}, respectively. This particular decomposition of unicycle agent dynamics is fundamentally similar to that proposed in [19].

B. ED-RHC Problem (RHCP) Formulation
Consider an agent a ∈ A residing on a target i ∈ T at some time t_s ∈ [0, T]. Recall that the control U_ia(t_s) in (6) includes dwell-time decisions δ_i and δ_j at the current target i and the next-visit target j ∈ N_i, respectively. As shown in Fig. 3, a dwell-time decision δ_i (or δ_j) can be divided into two interdependent decisions: (i) the active time τ_i (or τ_j) and (ii) the inactive (or idle) time τ̄_i (or τ̄_j). Therefore, the agent has to optimally choose the decision variables which form the control vector U_ia(t_s) = [τ_i, τ̄_i, j, {u_a(t)}, τ_j, τ̄_j]. Note that here we have: (i) omitted representing each of these decision variables' dependence on t_s, (ii) used the notation {u_a(t)} to represent {u_a(t) : t ∈ P_ij(t_s)} and (iii) omitted {w_a(t) : t ∈ P_ij(t_s)} as it can be found directly from {u_a(t)} and (15).

a) The Receding Horizon Control Problem (RHCP): Let us denote the real-valued component of the control vector U_ia(t_s) in (6) as U_iaj(t_s) = [τ_i, τ̄_i, {u_a(t)}, τ_j, τ̄_j]. The discrete component of U_ia(t_s) is simply the next-visit target j ∈ N_i. In this setting (see also Fig. 3), we define the planning horizon length w(U_ia(t_s)) in (6) as

w(U_iaj(t_s)) ≜ τ_i + τ̄_i + ρ_ij + τ_j + τ̄_j.    (17)

The current local state X_ia(t_s) in (6) is taken as X_ia(t_s) = [s_a, v_a, θ_a, {R_j : j ∈ N̄_i}] (again, omitting the dependence on t_s). Then, the optimal controls are obtained by solving (6), which can be re-stated as the following set of optimization problems, henceforth called the RHC Problem (RHCP):

U*_iaj = argmin_{U_iaj ∈ U} J_H(X_ia(t_s), U_iaj; w(U_iaj)), ∀j ∈ N_i,
subject to w(U_iaj) ≤ H,    (18)

j* = argmin_{j ∈ N_i} J_H(X_ia(t_s), U*_iaj; w(U*_iaj)).    (19)

Note that (18) requires solving |N_i| optimization problems, one for each neighboring target j ∈ N_i (|·| is the cardinality operator).
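The two-step structure of (18)-(19), one continuous problem per neighbor followed by a discrete comparison, can be sketched as follows; the quadratic cost and the grid search below are hypothetical stand-ins for the actual J_H and its analytical solution:

```python
# Sketch of the RHCP structure (18)-(19): for each neighbor j, minimize over
# the real-valued decisions U_iaj (here via a crude grid search standing in
# for the analytical solution), then pick the best neighbor.

import itertools

def solve_rhcp(neighbors, J_H, grid):
    """neighbors: candidate next-visit targets j in N_i.
    J_H(j, U): RHCP objective for neighbor j and real-valued decisions U.
    grid: iterable of candidate U vectors."""
    best = {}
    for j in neighbors:                                  # one problem per j, as in (18)
        best[j] = min(((J_H(j, U), U) for U in grid), key=lambda p: p[0])
    j_star = min(best, key=lambda j: best[j][0])         # the comparison in (19)
    return j_star, best[j_star][1]

# Hypothetical objective: neighbor 2 is cheapest; U = (tau_j, tau_bar_j).
J_H = lambda j, U: (j - 2) ** 2 + (U[0] - 1.0) ** 2 + U[1] ** 2
grid = list(itertools.product([0.0, 0.5, 1.0], repeat=2))
print(solve_rhcp({1, 2, 3}, J_H, grid))   # (2, (1.0, 0.0))
```

In the paper's setting, the inner minimization is solved in closed form (Section III) rather than by search, but the per-neighbor-then-compare skeleton is the same.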
The next step (19) is a simple comparison to determine the optimal next-visit target j*. Therefore, the final optimal controls of the RHCP are U*_ia(t_s) = [U*_iaj*, j*].

The objective function J_H(·) in (18) is chosen to reflect the contribution to the main objective J_T in (3) by the targets in the neighborhood N̄_i and by the agent a, over the planning horizon [t_s, t_s + w], as

J_H(X_ia(t_s), U_iaj; w) ≜ α_H J_a(t_o, t_f) + (1/w) J̄_i(t_s, t_s + w),    (20)

where the first term is J_eH, the second is J_sH, w = w(U_iaj) and α_H ≜ α (the weight factor used in (3)). In (20), the form of the J_sH component has been selected so that it is analogous to the J_s component in (3) (with T replaced by w). As illustrated in Fig. 3, note also that t_e ≜ t_s + w, [t_o, t_f] ≜ P_ij ⊆ [t_s, t_e] and ρ_ij ≜ t_f − t_o.

Fig. 3: Event timeline and control decisions in ED-RHC.

b) Planning Horizon: In conventional RHC methods, the RHCP objective function is evaluated over a fixed planning horizon length H, where H is selected exogenously. This makes the RHCP solution dependent on the choice of H. In contrast, through (20) and (17) above, we have made the RHCP solution (i.e., (18) and (19)) free of the parameter H, by using H only as an upper-bound on the actual planning horizon length w(U_iaj) in (17) and selecting H to be sufficiently large (e.g., H = T − t_s).

In fact, since the planning horizon length w(U_iaj) is a control variable, the above RHCP formulation simultaneously determines the optimal planning horizon length w* = w(U*_iaj*). Moreover, as shown in Fig. 3, the time to depart from the current target i (i.e., t_o), the time to arrive at the destination target j (i.e., t_f) and the corresponding transit-time ρ_ij = t_f − t_o are also control dependent.
Hence, this RHCP formulation also determines the optimal values of each of these quantities: t*_o, t*_f and ρ*_ij*, respectively.

c) Overview of the RHCP Solution Process: Looking back at (9) and (10), notice that the sensing component J_sH of the RHCP objective (20) does not explicitly depend on the agent control profile segment {u_a(t) : t ∈ P_ij}, but it depends on the agent's transit-time ρ_ij value and on the other control decisions in U_iaj: τ_i, τ̄_i, τ_j, τ̄_j. Therefore, let us denote J_sH as a function parameterized by ρ_ij: J_sH(τ_i, τ̄_i, τ_j, τ̄_j; ρ_ij). In contrast, based on (11), notice that the energy component J_eH of the RHCP objective (20) only depends on agent control profile segments, specifically on {u_a(t) : t ∈ P_ij}. Therefore, let us denote J_eH simply as J_eH({u_a(t)}).

As illustrated in Fig. 4, we exploit this property of the RHCP objective components (J_sH and J_eH) to solve the RHCP (18). In particular, we start by analytically solving the optimization problem which we label as the RHCP(ρ_ij):

J*_sH(ρ_ij) ≜ min_{(τ_i, τ̄_i, τ_j, τ̄_j) ∈ U_s(ρ_ij)} J_sH(τ_i, τ̄_i, τ_j, τ̄_j; ρ_ij).    (21)

For this purpose, we exploit a few results established in [8], where the RHCP(ρ_ij) (21) has already been solved while treating ρ_ij as a known constant.

Next, we use the J*_sH(ρ_ij) function obtained from (21) and the relationship ρ_ij = t_f − t_o to reformulate the problem of optimizing the RHCP objective (20) as an optimal control problem (OCP):

[t*_o, t*_f, {u*_a(t)}] = argmin_{t_o, t_f, {u_a(t)}} α_H J_e({u_a(t)}) + J*_sH(t_f − t_o).    (22)

Finally, as shown in Fig. 4, it is straightforward how the RHCP (18) solution U*_iaj can be constructed from the obtained solutions of the OCP (22) and the RHCP(ρ_ij) (21).

Fig. 4: Overview of the RHCP solution process when solving (18) for some next-visit target j ∈ N_i: 1. Solving the receding horizon control component of (18) (i.e., (21)); 2. Solving the optimal control component of (18) (i.e., (22)); 3. Constructing the final solution of (18).

d) Event-Driven Action Horizon: Each RHCP solution (i.e., U*_ia(t_s) = [U*_iaj*, j*] from (18)-(19)) obtained over a planning horizon w(U*_iaj*) ≤ H is generally executed over a shorter action horizon h ≤ w(U*_iaj*). In particular, the action horizon h is determined by the first event that takes place after t_s, where the RHCP was last solved. Such a subsequent event may be controllable if it results from executing the last solved RHCP solution or uncontrollable if it results from a random or an external event (if such events are allowed).

When executing the RHCP solution obtained by an agent at target i at time t_s, there are three mutually exclusive controllable events that may occur subsequently. They are:
1. Event [h → τ*_i]: This event is feasible only if τ*_i(t_s) > 0. It occurs at t = t_s + τ*_i(t_s). If R_i(t) > 0, it coincides with a departure event from target i. Otherwise, i.e., if R_i(t) = 0, it coincides with a [R_i → 0^+] event.

2. Event [h → τ̄*_i]: This event is feasible if τ*_i(t_s) = 0 (i.e., R_i(t_s) = 0) and τ̄*_i(t_s) ≥ 0. It occurs at t = t_s + τ̄*_i(t_s) and coincides with a departure event from target i.

3. Event [h → ρ_ij*]: This event is feasible only if a departure event (from target i) occurred at t_s. Clearly, this event coincides with an arrival event at target j*(t_s).

In an agent trajectory, at a given time instant, only one of these three controllable events is feasible. However, there are two uncontrollable events that may occur at an agent residing at a target i due to two specific controllable events at a neighboring target j ∈ N_i. These two types of events are aimed to enforce the "no simultaneous target sharing" condition (i.e., the control constraint (8)) and thus only apply to multi-agent problems. To enforce this condition, an agent at target i modifies its neighborhood N_i to N_i \ {j} when: (i) another agent already resides at target j or (ii) another agent is en route to visit target j. Therefore, we define the following two neighbor-induced events at target i due to a neighbor j ∈ N_i:

4. Covering Event C_j, j ∈ N_i: This event causes N_i to be modified to N_i \ {j}.

5. Uncovering Event C̄_j, j ∈ N_i: This event causes N_i to be modified to N_i ∪ {j}.

If one of these two events occurs while the agent is awaiting an event [h → τ*_i] or [h → τ̄*_i], the RHCP is re-solved to account for the updated neighborhood N_i.

e) Three Forms of RHCPs: The exact form of the RHCP ((18) and (19)) that needs to be solved at a certain time depends on the event that triggered the end of the previous action horizon. In particular, corresponding to the three controllable event types, there are three forms of RHCPs:
RHCP1: At a target i and time t_s, this particular problem form is solved upon: (i) an arrival event [h → ρ_ki] where k ∈ N_i, or (ii) a C_j (or a C̄_j) event that occurred when R_i(t_s) > 0, where j ∈ N_i. Since R_i(t_s) > 0, the full control vector U_ia(t_s) = [τ_i, τ̄_i, j, {u_a(t)}, τ_j, τ̄_j] with τ_i ≥ 0 has to be determined.

RHCP2: At a target i and time t_s, this particular problem form is solved when R_i(t_s) = 0 upon: (i) an event [h → τ*_i] or (ii) a C_j (or a C̄_j) event where j ∈ N_i. Since R_i(t_s) = 0, it is the same as RHCP1 but with τ_i = 0, hence simpler.

RHCP3: At a target i and time t_s, this particular problem form is solved upon: (i) an event [h → τ*_i] with R_i(t_s) > 0 or (ii) an event [h → τ̄*_i]. Simply, this problem form is solved whenever the agent is ready to depart from the target. Therefore, it is the same as RHCP1 but with τ_i = τ̄_i = 0.

III. SOLVING EVENT-DRIVEN RHCPs

In this section, we present the solutions to the three RHCP forms identified above. We begin with RHCP3.

A. Solution of RHCP3
RHCP3 is the simplest RHCP given that τ_i = τ̄_i = 0 in U_ia by default. Therefore, U_iaj (i.e., the real-valued component of U_ia used in (18)) is limited to U_iaj = [{u_a(t)}, τ_j, τ̄_j], and the planning horizon w(U_iaj) defined in (17) becomes w(U_iaj) = ρ_ij + τ_j + τ̄_j. Under these conditions, we next solve (18) (via solving RHCP(ρ_ij) (21) and OCP (22), as shown in Fig. 4) and (19) to obtain the RHCP3 solution.

a) Solution of RHCP(ρ_ij) (21): As mentioned before, RHCP(ρ_ij) has already been solved in [8], while treating ρ_ij as a known fixed value. In particular, the RHCP(ρ_ij) solution corresponding to RHCP3 takes a piecewise form [8, Th. 2]: the optimal pair (τ*_j, τ̄*_j) is given by one of five closed-form candidates, including the trivial pair (0, 0), selected by threshold conditions on the neighborhood parameters, and

J*_sH(ρ_ij) = J_sH(τ*_j, τ̄*_j; ρ_ij), (23)

where J_sH(τ_j, τ̄_j; ρ_ij) is a rational function of (τ_j, τ̄_j): the ratio of a quadratic polynomial in (τ_j, τ̄_j) and the planning horizon ρ_ij + τ_j + τ̄_j, whose coefficients are determined by the neighborhood parameters Ā = Σ_{m ∈ N̄_i} A_m, A_j, B_i, B_j, the horizon H, and the neighborhood states R̄(t_o) = Σ_{m ∈ N̄_i} R_m(t_o) and R_j(t_o); the exact forms of the candidates and coefficients are given in [8, Th. 2] and are omitted here due to space constraints. (24)

Note that in (23), not only J*_sH, but also τ*_j and τ̄*_j, are functions of the transit-time ρ_ij. To provide intuition about the form of the function J*_sH(ρ_ij), consider the first case in (23), where (τ*_j, τ̄*_j) = (0, 0), which results in

J*_sH(ρ_ij) = J_sH(0, 0; ρ_ij) = R̄(t_o) + (1/2) Ā ρ_ij, (25)

under the corresponding threshold condition. Using (24), it can be shown that this condition is equivalent to a threshold condition of the form ρ_ij > min{·, ·}, where the two thresholds depend on R_j(t_o), H and the parameters Ā, A_j, B_i, B_j. From this example, it is clear that the function J*_sH(ρ_ij) depends on the neighborhood parameters (e.g., Ā, B_j, B_i) as well as the current neighborhood state (e.g., R̄(t_o), R_j(t_o)).

b) Objective Function of OCP (22): Note that we have now solved RHCP(ρ_ij) and obtained the functions (of ρ_ij): τ*_i, τ̄*_i, τ*_j, τ̄*_j, and, most importantly, J*_sH. Based on the RHCP solution process outlined in Fig. 4, our next step is to formulate and solve the corresponding OCP (22).

As shown in (22), the sensing objective component of the OCP is J*_sH(t_f − t_o). We can now explicitly express this term using the obtained J*_sH(ρ_ij) function in (23) and the relationship ρ_ij = t_f − t_o. For notational convenience, taking into account that in RHCP3, t_o is the current event time at which the RHCP is solved (i.e., t_o = t_s, where t_s is fixed and known), let us denote this sensing objective component of the OCP as

φ(t_f) ≜ J*_sH(t_f − t_o). (26)

On the other hand, using (20) and (11), the energy objective component of the OCP (22) can be expressed as

J_eH({u_a(t)}) = (1/2) ∫_{t_o}^{t_f} u_a²(t) dt. (27)

c) Solution of OCP (22): In the following analysis, for notational convenience, we use ẋ = Ax(t) + Bu(t) with

A = [0 1; 0 0], B = [0; 1], x(t) = [l_a(t); v_a(t)], u(t) = u_a(t), (28)

to represent the agent dynamics stated in (16). Under this notation, using (26) and (27), the OCP (22) can be stated as

min_{t_f, {u(t)}} (α_H/2) ∫_{t_o}^{t_f} u²(t) dt + φ(t_f)

subject to: ẋ = Ax(t) + Bu(t), x(t_o) = [0, 0]^T, x(t_f) = [y_ij, 0]^T.
(29)

The last two constraints in (29) are simply terminal constraints for the agent motion on the trajectory segment (i, j). Note that (29) is a standard free final time, fixed initial and final state optimal control problem. Hence, there is an established solution procedure [20], as outlined next.

First, the Hamiltonian corresponding to (29) is written as

H(x(t), u(t), t) ≜ (α_H/2) u²(t) + λ^T(t)(Ax(t) + Bu(t)), (30)

where λ(t) represents the co-state variables. Next, the adjoined function that combines the terminal constraint on x(t_f) and the terminal cost φ(t_f) is written as Φ(x(t_f), t_f) ≜ φ(t_f) + ν^T(x(t_f) − [y_ij, 0]^T), where ν is a set of multipliers. Finally, the OCP in (29) can be solved for the corresponding optimal {x(t), u(t), λ(t): t ∈ [t_o, t_f]}, t_f and ν values by solving the following system of equations [20]:

∂H/∂u = α_H u(t) + λ^T(t)B = 0, (31)

λ̇ = −(∂H/∂x)^T = −A^T λ(t), λ(t_f) = (∂Φ/∂x(t_f))^T = ν, (32)

dΦ/dt_f + (α_H/2)u²(t_f) = dφ/dt_f + ν^T(Ax(t_f) + Bu(t_f)) + (α_H/2)u²(t_f) = 0. (33)

Lemma 1:
The optimal terminal time t*_f of the OCP (29) satisfies the equation

(t_f − t_o)⁴ (dφ(t_f)/dt_f) = 18 α_H y_ij², (34)

where φ(t_f) is known from (26), and the corresponding optimal control law u*(t) is given by

u*(t) = (6 y_ij/(t*_f − t_o)³) [t*_f + t_o − 2t], ∀t ∈ [t_o, t*_f]. (35)

Proof:
First, we take ν = [ν₁, ν₂]^T, λ(t) = [λ₁(t), λ₂(t)]^T and solve (32) for λ₁(t) and λ₂(t). This gives λ₁(t) = ν₁ and λ₂(t) = ν₂ + ν₁(t_f − t), ∀t ∈ P_ij (recall that P_ij = [t_o, t_f]). We then solve (31) for u(t) to obtain:

u(t) = −λ₂(t)/α_H = −(1/α_H)(ν₂ + ν₁(t_f − t)), ∀t ∈ P_ij. (36)

Next, we take x(t) = [x₁(t), x₂(t)]^T and solve the agent dynamics equation in (29) (also using (36)) for x₂(t). This results in:

x₂(t) = ((t_o − t)/α_H)(ν₂ + ν₁ t_f − (ν₁/2)(t_o + t)), ∀t ∈ P_ij.

Now, using the terminal constraint x₂(t_f) = 0, we get ν₂ = −ν₁(t_f − t_o)/2. Back-substituting this in the above x₂(t), we get the further simplified expression

x₂(t) = (ν₁/(2α_H))(t² − (t_o + t_f)t + t_o t_f), ∀t ∈ P_ij.

Applying this result in the relationship x₁(t) = ∫_{t_o}^{t} x₂(s) ds (i.e., the agent dynamics), we get:

x₁(t) = −(ν₁ (t − t_o)²/(12 α_H))(3t_f − t_o − 2t), ∀t ∈ P_ij.

Similar to before, using the terminal constraint x₁(t_f) = y_ij on the above (and back-substituting), we get

ν₁ = −12 α_H y_ij/(t_f − t_o)³ (and ν₂ = 6 α_H y_ij/(t_f − t_o)², u(t_f) = −6 y_ij/(t_f − t_o)²). (37)

Now we are ready to use (33) to solve for the optimal t_f value (i.e., t*_f). Note that (33) directly simplifies to the form dφ/dt_f + ν₂ u(t_f) + (α_H/2)u²(t_f) = 0, which we can further reduce (using (37)) to the form

dφ/dt_f − 18 α_H y_ij²/(t_f − t_o)⁴ = 0,

and obtain (34). Finally, the optimal control law u*(t) in (35) can be obtained by substituting (37) into (36).

Using the optimal terminal time t*_f and control u*(t) (i.e., u*_a(t)) proven in Lemma 1, the optimal energy objective component of this OCP (i.e., (27)) can be obtained as

J_eH({u*_a(t)}) = 6 y_ij²/(t*_f − t_o)³. (38)

The corresponding optimal sensing objective component (i.e., (26)) is directly given by φ(t*_f) = J*_sH(t*_f − t_o). Finally, the optimal transit-time value is ρ*_ij = t*_f − t_o.

d) Solution of RHCP (18) for U*_iaj: As outlined in Fig. 4, we can now conclude solving RHCP (18). First, we apply the determined ρ*_ij value in (23) to get the optimal control decisions τ*_j and τ̄*_j of the control vector U*_iaj (18).

Remark 3:
Note that τ*_j and τ̄*_j in (23) are piecewise functions of ρ_ij (with at most five cases). Hence, J*_sH(ρ_ij) in (23) is also a piecewise function of ρ_ij. Even though this presents a complication to the proposed RHCP (18) solution process, it can be resolved by considering one case (of J*_sH(ρ_ij)) at a time when the corresponding OCP (22) is solved. Then, the resulting optimal transit-time value ρ*_ij can be used to verify the validity as well as the optimality of the considered case of J*_sH(ρ_ij) (compared to the other cases).

Among the remaining control decisions in U*_iaj (18), we have already found the optimal tangential acceleration profile segment {u*_a(t): t ∈ P_ij}. Integrating this, the corresponding tangential velocity profile segment can be obtained as

v*_a(t) = (6 y_ij/(ρ*_ij)³)(t − t_o)(t_o + ρ*_ij − t), ∀t ∈ P_ij. (39)

Finally, the optimal angular velocity profile segment {w*_a(t): t ∈ P_ij} (required in U*_iaj) can be found using (39) in (15), together with the information about the shape of the trajectory segment (i, j).

Remark 4:
Note that the OCP (29) (or (22) in general) only requires the total length y_ij of the trajectory segment (i, j). The shape of (i, j) becomes important only when w*_a(t) has to be determined to facilitate the agent's departure from target i to reach target j (i.e., at the end of an RHCP3 solving process). Therefore, even though we initially assumed the shapes of the trajectory segments to be prespecified, the proposed RHC framework can adapt even if they change occasionally. For instance, a new class of external events (similar to C_j and C̄_j) can be defined based on such trajectory segment shape change events, to make agents react to them. This flexibility is an advantage, as the shape of a trajectory segment may have to be designed (by an upper-level trajectory planner) taking into account moving obstacles and other agents in the mission space, as well as the agent's own motion and controller constraints.

e) Solution of RHCP (19) for j*: We have now solved RHCP (18) and obtained the optimal control vector U*_iaj corresponding to the next-visit target j. Next, this process should be repeated for all the neighboring targets j ∈ N_i to get the control vectors {U*_iaj: j ∈ N_i}. Finally, the optimal next-visit target j* can be found from (19) as j* = arg min_{j ∈ N_i} J_H(X_ia(t_s), U*_iaj; w(U*_iaj)).

Upon solving RHCP3, agent a departs from target i and starts following the trajectory segment (i, j*), executing the obtained optimal agent controls until it arrives at target j*. According to the proposed RHC architecture, upon arrival, the agent will solve an instance of RHCP1.

B. Solution of
RHCP1
We now directly consider RHCP1, as it encompasses RHCP2 and is the most general form of the RHCP ((18)-(19)), in that no active or idle time is restricted to zero. In this case, the planning horizon w(U_iaj) is the same as in (17). Similar to before, we next solve (18) (via RHCP(ρ_ij) (21) and OCP (22)) and (19) to obtain the solution of RHCP1.

a) Solution of RHCP(ρ_ij) (21): As mentioned before, the RHCP(ρ_ij) corresponding to RHCP1 has already been solved in [8] to obtain:

(τ*_i, τ̄*_i, τ*_j, τ̄*_j) = arg min_{(τ_i, τ̄_i, τ_j, τ̄_j) ∈ U_s(ρ_ij)} J_sH(τ_i, τ̄_i, τ_j, τ̄_j; ρ_ij), J*_sH(ρ_ij) = J_sH(τ*_i, τ̄*_i, τ*_j, τ̄*_j; ρ_ij), (40)

where J_sH(τ_i, τ̄_i, τ_j, τ̄_j; ρ_ij) is a rational function: the ratio of a quadratic polynomial in (τ_i, τ̄_i, τ_j, τ̄_j) and the planning horizon τ_i + τ̄_i + ρ_ij + τ_j + τ̄_j, with coefficients determined by the neighborhood parameters Ā, A_i, A_j, B_i, B_j and the neighborhood states R̄(t_o), R_i(t_o), R_j(t_o) (analogously to (24)). (41)

Explicit expressions of τ*_i, τ̄*_i, τ*_j and τ̄*_j (each as a function of ρ_ij) are determined using the rational function optimization technique proposed in [8, App. A]. However, due to space constraints, we omit their exact forms.

b) Objective Function of OCP (22): The sensing objective component of the OCP, J*_sH(t_f − t_o), can now be expressed explicitly using the J*_sH(ρ_ij) function in (40) and the relationship ρ_ij = t_f − t_o. Note, however, that in RHCP1, both t_o and t_f are free. Therefore, let us denote this sensing objective component of the OCP as

φ(t_o, t_f) ≜ J*_sH(t_f − t_o).
(42)

However, the energy objective component of the OCP, J_eH({u_a(t)}), in RHCP1 takes the same form as in (27).

c) Solution of OCP (22): Similar to before, using ẋ = Ax(t) + Bu(t) with (28) to represent the agent dynamics (16), the OCP corresponding to RHCP1 can be stated as

min_{t_o, t_f, {u(t)}} (α_H/2) ∫_{t_o}^{t_f} u²(t) dt + φ(t_o, t_f)

subject to: ẋ = Ax(t) + Bu(t), x(t_o) = [0, 0]^T, x(t_f) = [y_ij, 0]^T. (43)

Note that (43) is a standard free initial and final time, fixed initial and final state optimal control problem. Hence, similar to (29), there is an established solution procedure [20], as outlined next.

First, note that the Hamiltonian corresponding to (43) takes the same form as in (30). Next, the adjoined function that combines the terminal constraints on x(t_o) and x(t_f) with the terminal cost φ(t_o, t_f) is written as

Φ(x(t_o), t_o, x(t_f), t_f) ≜ φ(t_o, t_f) + ν_o^T x(t_o) + ν_f^T(x(t_f) − [y_ij, 0]^T), (44)

where ν_o and ν_f are sets of multipliers. Finally, the OCP in (43) can be solved for the corresponding optimal {x(t), u(t), λ(t): t ∈ [t_o, t_f]}, t_o, t_f, ν_o and ν_f values by solving the following system of equations [20]:

∂H/∂u = α_H u(t) + λ^T(t)B = 0, (45)

λ̇ = −(∂H/∂x)^T = −A^T λ(t), λ(t_o) = −(∂Φ/∂x(t_o))^T = −ν_o, (46)

λ(t_f) = (∂Φ/∂x(t_f))^T = ν_f, (47)

∂Φ/∂t_o − H|_{t=t_o} = ∂φ/∂t_o − (α_H/2)u²(t_o) + ν_o^T(Ax(t_o) + Bu(t_o)) = 0, (48)

∂Φ/∂t_f + H|_{t=t_f} = ∂φ/∂t_f + (α_H/2)u²(t_f) + ν_f^T(Ax(t_f) + Bu(t_f)) = 0. (49)

Lemma 2:
The optimal transit time ρ*_ij = t*_f − t*_o of the OCP (43) satisfies the equation

ρ_ij⁴ (dφ(t_o, t_f)/dρ_ij) = 18 α_H y_ij², (50)

where φ(t_o, t_f) (42) is considered as a function of ρ_ij = t_f − t_o. Thus, the optimal terminal times t*_o and t*_f of (43) are

t*_o = t_s + τ*_i(ρ*_ij) + τ̄*_i(ρ*_ij) and t*_f = t*_o + ρ*_ij, (51)

respectively (τ*_i(ρ_ij) and τ̄*_i(ρ_ij) are as in (40)). The corresponding optimal control law u*(t) of (43) is given by

u*(t) = (6 y_ij/(t*_f − t*_o)³) [t*_f + t*_o − 2t], ∀t ∈ [t*_o, t*_f]. (52)

Proof:
The proof follows the same steps as that of Lemma 1 and is, therefore, omitted. However, the main steps are: (i) solve for λ(t), u(t) and x(t), ∀t ∈ [t_o, t_f], using (46), (45) and the agent dynamics, respectively, in that order, in terms of t_o, t_f, ν_o and ν_f; (ii) use the terminal constraint x(t_f) = [y_ij, 0]^T and (47) to determine ν_o, ν_f in terms of t_o, t_f; and (iii) solve for t_o and t_f using (48) and (49).

Using the optimal terminal times t*_f, t*_o and the control u*(t) (i.e., u*_a(t)) proven in Lemma 2, the optimal energy objective component of this OCP (43) can be evaluated as

J_eH({u*(t)}) = 6 y_ij²/(t*_f − t*_o)³. (53)

The corresponding sensing objective component (42) is directly given by φ(t*_o, t*_f) = J*_sH(t*_f − t*_o) = J*_sH(ρ*_ij).

d) Solution of RHCP (18) for U*_iaj: We now conclude the solution process of RHCP (18) (outlined in Fig. 4) by applying the determined optimal transit-time ρ*_ij in (40) to obtain the optimal control decisions τ*_i, τ̄*_i, τ*_j and τ̄*_j included in the control vector U*_iaj (18). Note that here it is not necessary to evaluate the optimal agent angular velocity profile {w*_a(t): t ∈ [t*_o, t*_f]} (unlike in RHCP3), as the agent does not plan to depart from the current target immediately.

e) Solution of RHCP (19) for j*: We have now solved the RHCP (18) and obtained the optimal control vector U*_iaj corresponding to a next-visit target j ∈ N_i. Executing this process for all j ∈ N_i and subsequently evaluating (19) gives the optimal next-visit target j* as j* = arg min_{j ∈ N_i} J_H(X_ia(t_s), U*_iaj; w(U*_iaj)).

Upon solving RHCP1, agent a remains active at target i for a duration of τ*_i; in the meantime, if any external event such as C_j or C̄_j for some j ∈ N_i occurs, it re-computes the remaining active time at target i.
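The event-driven dispatch among the three RHCP forms can be summarized in a small sketch; the function and event names below are our own illustrative choices, not from the paper, and the mapping follows the triggering rules stated for RHCP1-RHCP3:

```python
# Illustrative sketch (not the paper's code) of the event-to-RHCP dispatch:
#   arrival [h -> rho_ki]                    -> RHCP1
#   neighbor event C_j / C_bar_j, R_i > 0    -> RHCP1 (re-solve)
#   neighbor event C_j / C_bar_j, R_i = 0    -> RHCP2
#   active time elapses [h -> tau_i*], R_i=0 -> RHCP2 (decide idle time)
#   active time elapses [h -> tau_i*], R_i>0 -> RHCP3 (depart)
#   ready to depart [h -> tau_bar_i*]        -> RHCP3

def choose_rhcp(event: str, R_i: float) -> str:
    """Return which RHCP form the agent at target i must solve next."""
    if event == 'arrival':
        return 'RHCP1'
    if event == 'neighbor_change':          # a C_j or C_bar_j event
        return 'RHCP1' if R_i > 0 else 'RHCP2'
    if event == 'active_end':               # [h -> tau_i*]
        return 'RHCP2' if R_i == 0 else 'RHCP3'
    if event == 'depart_ready':             # [h -> tau_bar_i*]
        return 'RHCP3'
    raise ValueError(f'unknown event: {event}')
```

In an implementation, this dispatcher would sit in each agent's event loop, so that every event re-triggers only the (local, distributed) RHCP instance it warrants.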
However, if the agent completes executing the determined active time (i.e., if the corresponding event [h → τ*_i] occurs) with R_i(t_o + τ*_i) = 0, then the agent will subsequently have to solve an instance of RHCP2 to determine the remaining inactive time at target i. Otherwise (i.e., if the event [h → τ*_i] occurs with R_i(t_o + τ*_i) > 0), the agent will have to solve an instance of RHCP3 to determine the next-visit target and depart from target i.

Remark 5:
Upon solving RHCP1, over the subsequent active time at target i, the agent can choose to control its angular velocity w_a(t) (while keeping u_a(t) = 0) to adjust its heading θ_a(t) to accommodate the impending departure towards the next-visit target j* found in (19). However, the agent can also choose to keep w_a(t) = 0 and have the shapes of the trajectory segments {(i, j): j ∈ N_i} designed accordingly.

IV. OPTIMAL CONTROLS FOR FIRST-ORDER AGENTS
In the previous sections, we proposed an RHC-based solution for PMN problems that uses energy-aware second-order agents. In this section, for comparison purposes, we first present the details of a similar RHC solution [8] that uses energy-agnostic first-order agents. Subsequently, motivated by a few practical qualities that such first-order agent behaviors (controls) possess, we derive energy-aware control laws for governing actual second-order agents in a way that they imitate first-order agents.

In particular, this section explores how the agent controls {u*_a(t): t ∈ P_ij} derived for energy-aware second-order agents (2) (given in Lemmas 1 and 2) should be modified if we are to replace them with energy-agnostic first-order agents, or with energy-aware second-order agents that imitate first-order agents. Based on the proposed modular RHCP (18) solution process (outlined in Fig. 4), note that a change in the agent dynamic model will affect only the OCP (22) (not the RHCP(ρ_ij) (21)), specifically through the energy objective component J_eH({u_a(t)}). Therefore, in this section, our main focus is on re-stating (and solving) the OCP (22) assuming its sensing objective component J*_sH(ρ_ij) is given. Note also that such a change in the OCP objective will directly affect the resulting optimal transit-time ρ*_ij and, consequently, all the other control variables in U*_iaj (18).

Since we assume the J*_sH(ρ_ij) function to be given in this section, we keep the ensuing analysis independent of the exact RHCP form (i.e., RHCP1, RHCP2 or RHCP3). To this end, we start by generalizing, in a theorem, the optimal agent controls established for second-order agents in Lemmas 1 and 2. For convenience, let us label the PMN solution that uses energy-aware actual second-order agents (developed in the previous sections) as the "SO Method".
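As a quick numerical sanity check of the second-order transfer profile that the comparisons below build on (a sketch of ours, not code from the paper), the following snippet integrates the classical minimum-energy rest-to-rest double-integrator control, whose form matches (35) and (52), and recovers the transfer distance together with the peak velocity, peak acceleration and energy; the energy is measured as (1/2)∫u² dt, our assumed convention:

```python
# Sketch (not the paper's code): minimum-energy rest-to-rest transfer of a
# double integrator over distance y in transit time rho, using the classical
# optimal control u*(t) = (6*y/rho**2) * (1 - 2*t/rho). We integrate it
# numerically and read off the peak velocity/acceleration and the energy
# (1/2)*integral(u^2), which is the energy convention assumed here.

def so_profiles(y, rho, n=100_000):
    dt = rho / n
    x = v = 0.0
    vmax = umax = energy = 0.0
    for k in range(n):
        t = (k + 0.5) * dt                       # midpoint of the k-th step
        u = (6.0 * y / rho**2) * (1.0 - 2.0 * t / rho)
        energy += 0.5 * u * u * dt               # (1/2) * integral of u^2
        v += u * dt                              # v(t): integral of u
        x += v * dt                              # x(t): integral of v
        vmax = max(vmax, v)
        umax = max(umax, abs(u))
    return x, vmax, umax, energy

x_f, v_pk, u_pk, e = so_profiles(y=10.0, rho=4.0)
# Closed-form values for comparison: v_peak = 3*y/(2*rho) = 3.75,
# u_peak = 6*y/rho**2 = 3.75, energy = 6*y**2/rho**3 = 9.375.
```

Running this reproduces x_f ≈ y along with the closed-form peak and energy values, which is the behavior the SO-method quantities in this section summarize.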
a) SO Method: First, note that equations (34) and (50) are equivalent and thus can be written generally as

ρ_ij⁴ (dJ*_sH(ρ_ij)/dρ_ij) = 18 α_H y_ij². (54)

The ρ_ij value that satisfies this equation is the optimal transit-time to be used under the SO method (irrespective of the RHCP form). Second, note that (38) and (53) are also equivalent. This implies that the optimal agent energy consumption can be expressed independently of the RHCP form. Finally, note that (35) and (52) represent the same agent control profile, albeit with a shifted starting point t_o = t*_o (compared to t_o = t_s) in the latter. However, as shown in (51), this starting point t*_o depends only on sensing-objective-related quantities and the determined optimal transit-time value.

Fig. 5: Tangential velocity profiles on a trajectory segment (i, j) ∈ E of length y_ij under second-order (SO) and four different forms of (approximate) first-order (FO-0,1,2,3) agent models. Under the FO-n agent model, n ∈ {0, 1, 2, 3}, ρ_Fn ≜ y_ij/v_mn, where v_mn is the average velocity, v_Fn is the maximum velocity and u_Fn is the maximum absolute acceleration level.

Moreover, since our main focus in this section is on the agent controls {u_a(t): t ∈ P_ij}, without loss of generality, we can assume t_o = t*_o = 0 and P_ij = [0, ρ_ij].

With regard to the SO method, let us denote: (i) the optimal transit-time as ρ_SO; (ii) the optimal tangential velocity and acceleration as v*_a(t) and u*_a(t), respectively, for t ∈ [0, ρ_SO]; (iii) the maximum tangential velocity and acceleration as v_SO ≜ max{v*_a(t)} and u_SO ≜ max{u*_a(t)}, respectively; and (iv) the optimal agent energy consumption for the transition as E_SO.

Theorem 3:
Under the SO method, ρ_SO is given by (54),

v*_a(t) = (6 y_ij/ρ_SO³) t (ρ_SO − t), u*_a(t) = (6 y_ij/ρ_SO²)(1 − 2t/ρ_SO), (55)

for t ∈ [0, ρ_SO], and

v_SO = 3 y_ij/(2 ρ_SO), u_SO = 6 y_ij/ρ_SO², E_SO = 6 y_ij²/ρ_SO³. (56)

Proof:
The results stated in (55) directly follow from (35) and (39). The relationships given in (56) can be obtained using (55) (via calculus) and (38).

Figure 5 illustrates an example agent tangential velocity profile segment {v*_a(t): t ∈ [0, ρ_SO]}. In the subsequent subsections, we explore several alternative approaches to this SO method. In particular, each of these alternative methods has its root in the first-order agent model used in [8], [10], where each agent is assumed to travel at a fixed predefined velocity over each trajectory segment and thus does not involve an OCP in its RHCPs. We label this approach of controlling agents as the "FO-0 Method"; an example agent tangential velocity profile observed under it is shown in Fig. 5.

However, note that we can neither characterize the total energy consumption nor control a real-world agent over a tangential velocity profile such as the FO-0 curve in Fig. 5, due to the instantaneous infinite accelerations involved. Therefore, to facilitate a comparison between the SO and FO-0 methods, we propose to use actual second-order agents (instead of first-order ones) but enforce each agent controller to approximate a first-order agent behavior (FO-0). We label this approximate version of the FO-0 method as the "FO-1 Method"; a corresponding agent tangential velocity profile is shown in Fig. 5.

b) FO-1 Method: Under the FO-1 method, as shown in Fig. 5, each agent is assumed to go through a sequence of constant acceleration (of u_F1), constant velocity (of v_F1) and constant deceleration (of −u_F1) stages over a period of length ρ_F1 every time it travels on a trajectory segment. In particular, the acceleration/deceleration magnitude u_F1 and the average velocity value v_m = y_ij/ρ_F1 are assumed to be prespecified, commonly for all (i, j) ∈ E.
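Under our reading of the trapezoidal kinematics just described, the peak velocity on a segment follows from the distance constraint v_F1(ρ_F1 − v_F1/u_F1) = y_ij with ρ_F1 = y_ij/v_m. The sketch below (ours, with hypothetical names) solves the resulting quadratic for the peak velocity and falls back to a triangular profile when no trapezoid with average velocity v_m exists:

```python
import math

def vF_trapezoid(y, u_F, v_m):
    """Peak velocity of a trapezoidal velocity profile covering distance y
    with average velocity v_m and accel/decel magnitude u_F (a sketch of the
    relation behind (57); reconstructed, so verify against the paper).

    The trapezoid constraint v*(rho - v/u_F) = y with rho = y/v_m gives the
    quadratic v**2/u_F - v*rho + y = 0; the smaller root is the physical one.
    """
    rho = y / v_m                              # imposed transit time
    disc = (u_F * rho)**2 - 4.0 * u_F * y      # discriminant of the quadratic
    if disc >= 0.0:                            # trapezoid feasible (y >= 4*v_m**2/u_F)
        return (u_F * rho - math.sqrt(disc)) / 2.0
    return math.sqrt(y * u_F)                  # degenerate triangular profile
```

For example, with y = 10, u_F = 2 and v_m = 1 the peak velocity comes out slightly above the average v_m, and one can verify that it satisfies the distance constraint exactly.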
The resulting maximum velocity level on a trajectory segment (i, j) ∈ E is denoted as v^ij_F1 and can be expressed in terms of y_ij, u_F1 and v_m as

v^ij_F1(u_F1, v_m) = (u_F1/2)(y_ij/v_m − sqrt((y_ij/v_m)² − 4y_ij/u_F1)) if y_ij ≥ 4v_m²/u_F1, and sqrt(y_ij u_F1) otherwise. (57)

To conduct a fair comparison between the SO and FO-1 methods, the two parameters u_F1 and v_m that define the FO-1 method are selected as follows. First, let us define u^max_SO and v^max_SO as the respective maximum values of all the (empirical) u_SO and v_SO values observed in the PMN problem of interest. Then, we propose to enforce u_F1 = u^max_SO and

v_m = arg max_{v_m > 0} v_m subject to v^ij_F1(u^max_SO, v_m) ≤ v^max_SO, ∀(i, j) ∈ E, (58)

to ensure that the maximum velocity and acceleration values resulting from the FO-1 method are identical, or as close as possible, to those of the SO method.
The v_m expression given in (58) can be simplified into the form

v_m = min_{(i,j) ∈ E} [y_ij u^max_SO v^max_SO / ((v^max_SO)² + y_ij u^max_SO)], subject to y_ij ≥ (v^max_SO)²/u^max_SO. (59)

Proof:
Provided in Appendix C.

Note that, according to Proposition 1, we need to assume that the given u^max_SO and v^max_SO satisfy ∃(i, j) ∈ E with y_ij ≥ (v^max_SO)²/u^max_SO. If this assumption does not hold, we can simply use a lower v^max_SO value than its actual value when evaluating v_m in (59). Note also that the maximum velocity value observed in the FO-1 method is given by

v_F1 = max_{(i,j) ∈ E} v^ij_F1(u^max_SO, v_m), (60)

and the agent energy consumption on a trajectory segment (i, j) can be shown to be

E^ij_F1 = u^max_SO v^ij_F1(u^max_SO, v_m). (61)

We can now compare the FO-1 and SO methods, as we can compute the total agent energy consumption in the FO-1 method (numerical results are provided in Section V, e.g., Tab. I). We again highlight that the FO-1 method: (i) does not consider the agent energy when solving its RHCPs (i.e., its RHCPs do not involve an OCP) and (ii) uses actual second-order agents whose controllers are constrained to approximate first-order agent behaviors. We conclude our discussion of the FO-1 method with the following remark, which will motivate us to refine the proposed FO-1 method.

Remark 6:
Notice that the optimal second-order agent control u*_a(t) (55) (in the SO method) decreases linearly and includes a zero-crossing point (at t = (t_o + t_f)/2), unlike the piecewise constant control u_a(t) used in the FO-1 method.

c) FO-2 Method: Even though we have proposed a reasonable and consistent way to select the parameters involved in the FO-1 method (i.e., u_F1 and v_m), it is clear that such an approach is agnostic to the agent energy consumption. To address this concern, we next propose the FO-2 method, which, as shown in Fig. 5, is identical to the FO-1 method in many ways except for its choice of the acceleration/deceleration magnitude u_F2 and the average velocity value v_m. In particular, as opposed to selecting u_F1, v_m according to (58), here an energy-optimized approach is followed.

Theorem 4:
Under the FO-2 method, for a fixed average velocity v_m = y_ij/ρ_F2 on a trajectory segment (i, j) ∈ E, the optimal agent energy consumption is E_F2 = (27/4)v_m³/y_ij, and it is achieved when u_F2 = (9/2)v_m²/y_ij and v_F2 = (3/2)v_m are used.

Proof:
Since the total distance traveled by the (FO-2) agent over the period [0, ρ_F2] is y_ij, we can state that

(1/2)(ρ_F2 + (ρ_F2 − 2v_F2/u_F2)) v_F2 = y_ij. (62)

Over the same period, the corresponding total agent energy requirement (denoted as E_F2) can be evaluated by integrating the square of the acceleration profile used. This gives

E_F2 = (1/2) ∫_0^{ρ_F2} u²(t) dt = (1/2) u_F2² (2v_F2/u_F2) = u_F2 v_F2. (63)

This expression can be further simplified using (62) to obtain

E_F2 = v_F2³/(v_F2 ρ_F2 − y_ij). (64)

Recall that both y_ij and ρ_F2 (= y_ij/v_m) are fixed in this case. Therefore, E_F2 in (64) is a function of (only) v_F2. Thus, we can use calculus to determine the choice of v_F2 that minimizes E_F2. This (and back-substitution) reveals:

v_F2 = 3y_ij/(2ρ_F2), u_F2 = 9y_ij/(2ρ_F2²), E_F2 = 27y_ij²/(4ρ_F2³). (65)

Finally, this proof can be completed by replacing the ρ_F2 terms with y_ij/v_m in each of the above expressions.

Corollary 1: If v_m in the FO-2 method is such that y_ij/v_m = ρ_SO (i.e., ρ_F2 = ρ_SO), then v_F2 = v_SO, u_F2 = (3/4)u_SO and E_F2 = (9/8)E_SO.

Proof:
This result directly follows from comparing Theorem 3 (56) with (65).

Next, we use Theorem 4 to develop energy-optimized choices for the v_m and u_F2 parameters of the FO-2 method. However, similar to before, we also use the (empirical) v^max_SO and u^max_SO values as known inputs in this process to make sure the maximum velocity and acceleration values resulting from the FO-2 method are identical, or as close as possible, to those of the SO method.

Note that the optimal choices of v_F2 and u_F2 given in Theorem 4 depend on both (i, j) ∈ E and v_m. Therefore, let us denote them as functions:

v^ij_F2(v_m) = (3/2)v_m, u^ij_F2(v_m) = (9/2)v_m²/y_ij. (66)

Now, we propose to select the parameter v_m based on the above two relationships and the given v^max_SO, u^max_SO values as

v_m = arg max_{v_m > 0} v_m subject to v^ij_F2(v_m) ≤ v^max_SO, u^ij_F2(v_m) ≤ u^max_SO, ∀(i, j) ∈ E. (67)

Proposition 2:
The v_m expression given in (67) can be simplified into the form

v_m = min{ sqrt(2 y_min u^max_SO / 9), (2/3)v^max_SO }, (68)

where y_min = min_{(i,j) ∈ E} y_ij.

Proof:
The proof follows the same steps as that of Proposition 1 and is, therefore, omitted.

We point out that even though the average velocity v_m computed above is used commonly across all the trajectory segments, the acceleration/deceleration level of an agent on a trajectory segment (i, j) has to be selected as u^ij_F2(v_m) (66) so as to optimize the agent energy consumption. Hence, the overall maximum acceleration/deceleration level observed in the FO-2 method is (via (66))

u_F2 = max_{(i,j) ∈ E} u^ij_F2(v_m). (69)

Consider the scenario where ρ_F2 = ρ_SO on a certain trajectory segment. In such a case, sensing-objective-wise, the FO-2 and SO methods perform equally. However, Corollary 1 states that, energy-objective-wise, the FO-2 method shows a 12.5% loss (i.e., a higher energy consumption) compared to the SO method. Moreover, recall that the FO-2 method does not consider the energy expenditure when solving its RHCPs (i.e., no OCP is involved, similar to the FO-0 and FO-1 methods). To mitigate these two obvious disadvantages, we next propose the FO-3 method, where we optimize the energy objective further by compromising the sensing objective in an OCP.

d) FO-3 Method: As shown in Fig. 5, the FO-3 method has similarly shaped agent state trajectories to the FO-1 and FO-2 methods. However, as we will see next, the FO-3 method does not involve any parameter that needs to be selected based on external information like u^max_SO and v^max_SO.

On the other hand, note that in the FO-2 method, the optimal agent energy consumption E_F2 (65) is inversely proportional to the cube of the transit-time ρ_F2. Motivated by this, the FO-3 method proposes to use a larger transit-time ρ_F3 ≥ ρ_F2 (see Fig. 5), compromising the sensing objective so as to achieve a better (lower) energy objective. However, to make this trade-off a profitable one (in terms of the total objective (3)), we need to use the OCP (22).

Note that we assume J*_sH(ρ_ij) (i.e., the sensing objective component of the OCP (22)) to be a known function in this section. Therefore, the sensing objective component of the OCP (22) under the FO-3 method can be written as J*_sH(ρ_F3). On the other hand, under the FO-3 method, the energy objective component of the OCP can be written as J_eH({u_a(t)}) = E_F3 ≜ 27y_ij²/(4ρ_F3³) (using E_F2 in (65) and replacing ρ_F2 with ρ_F3). Hence, the objective function of the OCP (22) under the FO-3 method is

J_H = α_H E_F3 + J*_sH(ρ_F3) = (27/4) α_H y_ij²/ρ_F3³ + J*_sH(ρ_F3). (70)

Theorem 5:
Under the FO-3 method, the optimal transit-time is ρ_ij = ρ_F3 satisfying the equation

ρ_ij⁴ (dJ*_sH(ρ_ij)/dρ_ij) = (81/4) α_H y_ij². (71)

The corresponding optimal values of v_F3, u_F3 and E_F3 are

v_F3 = 3y_ij/(2ρ_F3), u_F3 = 9y_ij/(2ρ_F3²), E_F3 = 27y_ij²/(4ρ_F3³), (72)

i.e., v_F3 = (1/k)v_SO, u_F3 = (3/(4k²))u_SO and E_F3 = (9/(8k³))E_SO, where k ≜ ρ_F3/ρ_SO.

Proof:
The OCP objective $J_H$ given in (70) depends only on the choice of $\bar{\rho}_F$. Therefore, the optimal $\bar{\rho}_F$ value that minimizes $J_H$ can be found using the equation $\frac{dJ_H}{d\bar{\rho}_F} = 0$, which translates into (71). Since the FO-2 and FO-3 methods assume structurally similar velocity profiles, we can still use Theorem 4 in the context of the FO-3 method after replacing $\rho_F$ with $\bar{\rho}_F$. In this way, (72) (and the remaining results) can be obtained using (65) (and Theorem 3 (56)). ∎

Note that even though equations (71) and (54) are structurally similar, their subtle difference (in the coefficient on the right-hand side) causes the FO-3 method to have a different transit-time than the SO method. Based on the difference between (71) and (54), we can anticipate $\bar{\rho}_F > \rho_{SO}$ (as we intended in the first place). In such a case, the parameter $k$ defined in Theorem 5 satisfies $k > 1$. This implies (from Theorem 5) that $\bar{v}_F < v_{SO}$ and $\bar{u}_F < u_{SO}$; i.e., the FO-3 method requires smaller velocity and acceleration values compared to the SO method.

As shown in Appendices D and E, this analysis of the optimal (approximate) first- and second-order agent behaviors on trajectory segments extends directly to scenarios where agents have additional velocity and acceleration constraints.

V. NUMERICAL RESULTS
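Throughout this section, transit-time conditions such as (71) must be solved numerically. A minimal sketch, assuming a hypothetical sensing cost $J_{sH}^*(\rho) = c\rho^2$ (the actual $J_{sH}^*$ is problem-dependent and comes from the RHCP itself), using plain bisection:

```python
# Solve the FO-3 transit-time condition (71): rho^2 * dJ*_sH/drho = alpha * y^2.
# J*_sH(rho) = c * rho^2 is an assumed stand-in for illustration only.

def solve_transit_time(alpha, y, dJs, lo=1e-6, hi=1e3, tol=1e-9):
    """Bisection on f(rho) = rho^2 * dJs(rho) - alpha * y^2 (f is increasing)."""
    f = lambda rho: rho * rho * dJs(rho) - alpha * y * y
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

c = 0.5                           # assumed sensing-cost coefficient
dJs = lambda rho: 2.0 * c * rho   # derivative of the assumed J*_sH(rho) = c*rho^2
rho_bar = solve_transit_time(alpha=0.1, y=25.0, dJs=dJs)
```

For this particular $J_{sH}^*$, (71) reduces to $2c\bar{\rho}^3 = \alpha y^2$, so the computed root can be checked against the closed form $(\alpha y^2 / 2c)^{1/3}$.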
In this section, we first explore the nature of individual RHCP3 and RHCP1 solutions presented in Section III under second-order agents (i.e., the SO method). Then, we compare the performance metrics $J_T$, $J_e$ and $J_s$ defined in (3) obtained for several different PMN problem configurations (shown in Fig. 13) under the different agent control methods: SO, FO-1, FO-2 and FO-3.

A. Numerical Results for an RHCP3
To numerically evaluate quantities relating to an RHCP3 and its solution, we chose the target parameters $A_m$ and $B_m$ identically for all $m \in \bar{N}_i$, with $|\bar{N}_i| = 3$. Moreover, default values are assigned to $\alpha$, $R_j(t_s)$, $y_{ij}$ and $H$. Figures 6-8 respectively show how the RHCP3 solution (i.e., $v_a(t)$, $u_a(t)$, $J_{sH}$, $J_{eH}$ and $J_H$) changes when the three parameters $\alpha$, $R_j(t_s)$ and $y_{ij}$ are varied.

Figure 6 confirms that by increasing the weight factor $\alpha$ (i.e., by giving more weight to the energy objective), we can constrain the agent tangential velocity and acceleration profiles. A converse behavior can be seen in Fig. 7 with respect to the next-visit target $j$'s initial uncertainty $R_j(t_s)$. In particular, when $R_j(t_s)$ is high, the agent is required to arrive at target $j$ quickly (resulting in high tangential velocity and acceleration levels). In contrast, Fig. 8 reveals that when the trajectory segment length $y_{ij}$ is varied, the agent may not try to significantly regulate: (i) the arrival time at target $j$ (i.e., the transit-time $\rho_{ij} = t_f$) or (ii) the magnitude of the maximum tangential acceleration.

Fig. 6: RHCP3 solution under different weight factor (i.e., $\alpha$ in (3)) values. Panels: (a) tangential velocity $v_a(t)$; (b) tangential acceleration $u_a(t)$; (c) costs $J_{sH}$, $J_{eH}$, $J_H$.

Fig. 7: RHCP3 solution under different initial target uncertainty (i.e., $R_j(t_s)$) values. Panels as in Fig. 6.

Fig. 8: RHCP3 solution under different trajectory segment length (i.e., $y_{ij}$) values. Panels as in Fig. 6.

B. Numerical Results for an RHCP1
Similar to before, to numerically evaluate quantities relating to an RHCP1 and its solution, we use the same parameter values mentioned before, along with the additional default (initial) value $R_i(t_s) = 50$. Figures 9-12 respectively show how the RHCP1 solution (i.e., $v_a(t)$, $u_a(t)$, $J_{sH}$, $J_{eH}$ and $J_H$) changes when the four parameters $\alpha$, $R_j(t_s)$, $y_{ij}$ and $R_i(t_s)$ are varied.

The RHCP1 solution properties illustrated in Figs. 9-11 are identical to those of the RHCP3 (shown in Figs. 6-8), except for the fact that now $t_o > t_s$ (recall that $t_o$ is the planned time to leave the current target $i$). However, Figs. 9-11 imply that $t_o$ is independent of the $\alpha$, $R_j(t_s)$ and $y_{ij}$ values. In contrast, Fig. 12 reveals that $t_o$ is directly proportional to the $R_i(t_s)$ value. Moreover, Fig. 12 shows that the maximum values of tangential velocity and acceleration decrease by a small margin when $R_i(t_s)$ is increased. This implies that the agent plans to travel less urgently when it has to do more "sensing" at the current target $i$.

C. Overall Performance in PMN Problems
In this final subsection, we compare the performance metrics $J_T$, $J_e$ and $J_s$ defined in (3) obtained for several different PMN problem configurations using agents behaving under the: (i) SO, (ii) FO-1, (iii) FO-2 and (iv) FO-3 methods. In addition to $J_T$, $J_e$ and $J_s$, we also use the performance metrics
$$v^{max} \triangleq \max_{a \in A,\, t \in [0,T]} v_a(t) \quad \text{and} \quad u^{max} \triangleq \max_{a \in A,\, t \in [0,T]} |u_a(t)| \tag{73}$$
to represent the overall agent behaviors rendered by the different agent models.

Fig. 9: RHCP1 solution under different weight factor (i.e., $\alpha$ in (3)) values. Panels: (a) tangential velocity $v_a(t)$; (b) tangential acceleration $u_a(t)$; (c) costs $J_{sH}$, $J_{eH}$, $J_H$.

Fig. 10:
RHCP1 solution under different $R_j(t_s)$ values. Panels as in Fig. 9.

TABLE I: A comparison of the performance metrics $J_e$, $J_s$, $J_T$ (defined in (3)) and $v^{max}$, $u^{max}$ (defined in (73)) observed under the different agent control methods (SO, FO-1, FO-2 and FO-3) for each PMN Problem Configuration (PC) shown in Fig. 13. Each column group ($J_T$, $J_e/J_s$, $v^{max}$, $u^{max}$) reports the SO, FO-1, FO-2 and FO-3 values, with one row per PC (PC1 through PC8) plus their average. [Numerical table entries omitted.]

Fig. 11:
RHCP1 solution under different $y_{ij}$ values. Panels as in Fig. 9.

Fig. 12: RHCP1 solution under different $R_i(t_s)$ values. Panels as in Fig. 9.

Fig. 13: The PMN problem configurations: (a) PC1, (b) PC2, (c) PC3, (d) PC4, (e) PC5, (f) PC6, (g) PC7, (h) PC8, showing the target uncertainties $R_i(t)$, the agent objective contributions $J_a(\cdot, t)$ and the agent locations $s_a(t)$, respectively. Since these three quantities are time-dependent, only their terminal states (i.e., at $t = T$) are shown, under the highest-performing agent model (control method).

The parameters of each PC have been chosen as follows: the target parameters $A_i$, $B_i$ and $R_i(0)$ are set identically for all $i \in T$, and the target locations (i.e., $Y_i$) are specified in each PC figure. In all PCs, targets have been placed inside a 600 × 600 mission space. The initial agent locations were set to $s_a(0) = Y_i$ with $i = 1 + (a-1)\cdot\mathrm{round}(M/N)$. The time horizon and the upper bound on the planning horizon (i.e., $H$) were chosen such that $H = T = 250$, and the weight factor $\alpha$ in (3) was selected using the technique given in Appendix A.

The obtained comparative results are summarized in Tab. I. According to these results, on average, the energy-aware second-order agents (i.e., the SO method) have outperformed both the energy-agnostic and the energy-aware versions of first-order agents (i.e., the FO-1 and the FO-2/FO-3 methods, respectively) in terms of the sensing objective $J_s$, the energy objective $J_e$, as well as the total objective $J_T$.

However, the energy-aware (approximate) first-order agent control method FO-3 has shown performance levels relatively close (within 1.17% on average) to those of the SO method. Moreover, the FO-3 method has outperformed the SO method in terms of the performance metrics $u^{max}$ and $v^{max}$. This observation is reasonable because the motivation behind developing the FO-3 method was to improve the agent energy consumption. Recall also that Theorem 5 already established that $\bar{v}_F$ and $\bar{u}_F$ decrease with the factor $k \geq 1$.

VI. CONCLUSION
This paper considers the persistent monitoring problem defined on a network of targets that need to be monitored by a team of energy-aware dynamic agents. Starting from an existing event-driven receding horizon control (RHC) solution, we exploit optimal control techniques to incorporate agent dynamics and agent energy consumption into the RHC problem setup. The proposed overall RHC solution is computationally efficient, distributed, on-line and gradient-free. Numerical results are provided to highlight the improvements with respect to an RHC solution that uses energy-agnostic first-order agents. Ongoing work aims to combine the proposed solution with a path planning algorithm to address situations where the agent trajectory segment shapes have to be optimally determined.

APPENDIX
A. Selecting the Weight Factor α

The weight factor $\alpha$, present in both the main objective $J_T$ (3) and the RHCP objective $J_H$ (20), is an important factor that decides the trade-off between the energy objective and the sensing objective components (i.e., $J_{eH}$ and $J_{sH}$, respectively, in the latter case). Moreover, note that $\alpha$ can be used to bound the optimal agent velocities and accelerations resulting from the proposed RHC solution. Therefore, it is important to have an intuitive technique to select (and vary) $\alpha \in [0, \infty)$.

To develop such a technique, we use the RHCP form (20): $J_H = \alpha J_{eH} + J_{sH}$, rather than the main optimization problem form (3). A typical RHCP objective function that considers both the energy and sensing objectives (i.e., $J_{eH}$ and $J_{sH}$, respectively) can be written as
$$J_H = \beta \frac{J_{eH}}{E_H^{max}} + (1-\beta)\frac{J_{sH}}{S_H^{max}}, \tag{74}$$
where $E_H^{max}$ and $S_H^{max}$ are upper bounds on the terms $J_{eH}$ and $J_{sH}$, respectively, and $\beta \in [0,1]$ is a trade-off parameter. Next, let us re-arrange the above expression (scaling by $S_H^{max}/(1-\beta)$, which does not affect the minimizer) to isolate the sensing objective component as
$$J_H = \underbrace{\left[\frac{\beta}{1-\beta}\frac{S_H^{max}}{E_H^{max}}\right]}_{\alpha} J_{eH} + J_{sH}. \tag{75}$$
Now, if the ratio $S_H^{max}/E_H^{max}$ is known, a candidate $\alpha$ value can be obtained intuitively by selecting $\beta \in [0,1)$ appropriately; for example, selecting $\beta = 0.5$ gives $\alpha = S_H^{max}/E_H^{max}$.

To estimate $S_H^{max}$ and $E_H^{max}$, we can consider a simple RHCP that occurs when an agent $a$ is ready to leave a target $i$ with a (single) neighboring target $j$ connected through a trajectory segment $(i,j)$. For such a scenario, assuming steady-state operation and using Theorem 1, we can show that $S_H^{max} \propto \rho_{ij}$. Next, let us define the quantities $E_H^{max}$, $v^{max}$ and $u^{max}$ based on Theorem 3 (56) as
$$E_H^{max} \triangleq E_{SO} \propto \frac{y_{ij}^2}{\rho_{ij}^3}, \quad v^{max} \triangleq v_{SO} \propto \frac{y_{ij}}{\rho_{ij}}, \quad u^{max} \triangleq u_{SO} \propto \frac{y_{ij}}{\rho_{ij}^2}.$$
Combining $S_H^{max}$ and $E_H^{max}$ together with $v^{max}$ or (alternatively) $u^{max}$ stated above, we can show that
$$\frac{S_H^{max}}{E_H^{max}} \propto \frac{y_{ij}^2}{(v^{max})^4} \quad \text{or} \quad \frac{S_H^{max}}{E_H^{max}} \propto \frac{1}{(u^{max})^2}, \tag{76}$$
respectively.
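As a quick illustration of the selection rule (75), the following sketch computes a candidate $\alpha$ from the trade-off parameter $\beta$ and an estimate of the ratio $S_H^{max}/E_H^{max}$ (the ratio value used below is an assumed placeholder):

```python
# Candidate weight factor per (75): alpha = beta/(1-beta) * (S_maxH / E_maxH).
def select_alpha(beta, S_over_E):
    """beta in [0, 1) trades sensing vs. energy; S_over_E estimates S_maxH/E_maxH."""
    assert 0.0 <= beta < 1.0
    return beta / (1.0 - beta) * S_over_E

ratio = 2e-4                                     # assumed S_maxH/E_maxH estimate
alpha = select_alpha(beta=0.5, S_over_E=ratio)   # beta = 0.5 gives alpha = ratio
```

Increasing $\beta$ toward 1 places more weight on the energy objective and grows $\alpha$ without bound.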
Here, $v^{max}$ and $u^{max}$ can be thought of as the preferred tangential velocity and acceleration bounds for the agents, respectively, and $\bar{y}_{ij}$ as the mean trajectory segment length over all $(i,j) \in E$. Finally, neglecting the constants of proportionality in the above statements and using (75), we can state $\alpha$ as
$$\alpha = \frac{\beta}{1-\beta}\frac{\bar{y}_{ij}^2}{(v^{max})^4} \quad \text{or} \quad \alpha = \frac{\beta}{1-\beta}\frac{1}{(u^{max})^2}. \tag{77}$$
This result (77) provides a systematic way to select $\alpha$ while accounting for: (i) the relative balance between the sensing and energy objectives (via $\beta \in [0,1]$) and (ii) the preferred tangential velocity and acceleration bounds (via $v^{max}$ and $u^{max}$, respectively). For example, with $\beta = 0.5$, $\bar{y}_{ij} = 25$ and a chosen $v^{max}$, the first relationship in (77) directly yields a candidate $\alpha$ value.

B. Proof of Theorem 2
First, we transform the parametric form $\{(x(p), y(p)) : p \in [p_o, p_f]\}$ of the trajectory segment shape into the form $\{(x(l), y(l)) : l \in [0, y_{ij}]\}$, where $l$ represents the distance along the trajectory segment from $(x(p_o), y(p_o))$ to $(x(p), y(p))$, $p \in [p_o, p_f]$ (recall that $y_{ij}$ is the total length of the trajectory segment of interest $(i,j) \in E$). To achieve this transformation, we should be able to express the parameter $p$ explicitly in terms of the distance $l$. For this purpose, exploiting the geometry (see also Fig. 2), we can write the differential relationship
$$dl = \sqrt{(x'_p)^2 + (y'_p)^2}\, dp. \tag{78}$$
Under Assumption 2, (78) can be solved to obtain the explicit relationships $l = f(p)$ and $p = f^{-1}(l)$, where $f : [p_o, p_f] \to [0, y_{ij}]$ is as in (12). Thus, we can now express the trajectory segment shape in the form $\{(x(l), y(l)) : l \in [0, y_{ij}]\}$.

Second, according to Fig. 2, note that when agent $a \in A$ is at $s_a(t) \equiv (x(l), y(l))$, its orientation $\theta$ satisfies
$$\tan\theta = \frac{\dot{y}(l)}{\dot{x}(l)} = \frac{\frac{dy(l)}{dl}\frac{dl}{dt}}{\frac{dx(l)}{dl}\frac{dl}{dt}} = \frac{y'}{x'}. \tag{79}$$
In the above, the notation $'$ (without a subscript) represents the operator $\frac{d\cdot}{dl}$. The time derivative of this relationship gives
$$\sec^2\theta \,\frac{d\theta}{dt} = \frac{x'y'' - y'x''}{(x')^2}\,\frac{dl}{dt}. \tag{80}$$
Note that if $l = l_a(t)$ is used to represent the total distance the agent has traveled on the trajectory segment by time $t \in [t_o, t_f]$, we can also write $\frac{dl}{dt} = v_a(t)$ (i.e., the agent tangential velocity) and $\frac{d\theta}{dt} = w_a(t)$ (i.e., the agent angular velocity).

Therefore, using the above two relationships and the trigonometric identity $\sec^2\theta = 1 + \tan^2\theta$, we can obtain $w_a(t)$ for any $t \in [t_o, t_f]$ as
$$w_a(t) = \underbrace{\frac{x'y'' - y'x''}{(x')^2 + (y')^2}}_{G(l)}\, v_a(t). \tag{81}$$
Here, note that the first term $G(l)$ is a function of $l = l_a(t)$. Finally, we can transform this $G(l)$ term in (81) into a function of the parameter $p$, using the following relationships (from the chain rule and the fact that $l = f(p)$):
$$x' = \frac{x'_p}{f'_p}, \quad y' = \frac{y'_p}{f'_p}, \quad x'' = \frac{x''_p f'_p - x'_p f''_p}{(f'_p)^3}, \quad y'' = \frac{y''_p f'_p - y'_p f''_p}{(f'_p)^3}, \tag{82}$$
and (using (12))
$$f'_p = \sqrt{(x'_p)^2 + (y'_p)^2}. \tag{83}$$
Recall that, in the above, the notation $'$ with a subscript $p$ denotes the operator $\frac{d\cdot}{dp}$. Now, using (82) and (83), $G(l)$ in (81) can be written as
$$G(l) = G(f(p)) = \frac{x'_p y''_p - y'_p x''_p}{\left((x'_p)^2 + (y'_p)^2\right)^{3/2}}. \tag{84}$$
Comparing this result with (13), notice that $G(l) = F(p)$. Therefore, (81) can be written as $w_a(t) = F(p)\, v_a(t)$, where $p$ can now be replaced with $p = f^{-1}(l) = f^{-1}(l_a(t))$ to obtain (15): $w_a(t) = F(f^{-1}(l_a(t)))\, v_a(t)$, which completes the proof. ∎

C. Proof of Proposition 1
It is easy to show that $v_F^{ij}(u_{SO}^{max}, v_m)$ in (57) is a monotonically increasing function of $v_m$. In particular, if $v_m > \sqrt{y_{ij}\, u_{SO}^{max}}$, the function $v_F^{ij}(u_{SO}^{max}, v_m)$ plateaus at the level $\sqrt{y_{ij}\, u_{SO}^{max}}$. Therefore, the set of $v_m$ values that satisfies the inequality $v_F^{ij}(u_{SO}^{max}, v_m) \leq v_{SO}^{max}$ can be stated as
$$v_m \leq v_m^{ij} \triangleq \begin{cases} \dfrac{y_{ij}\, u_{SO}^{max}\, v_{SO}^{max}}{(v_{SO}^{max})^2 + y_{ij}\, u_{SO}^{max}} & \text{if } y_{ij} \geq \dfrac{(v_{SO}^{max})^2}{u_{SO}^{max}}, \\ \infty & \text{otherwise.} \end{cases} \tag{85}$$
According to (58), the inequality $v_F^{ij}(u_{SO}^{max}, v_m) \leq v_{SO}^{max}$ should hold for all $(i,j) \in E$. Therefore, the feasible set of $v_m$ in (58) is $v_m \leq \min_{(i,j) \in E} v_m^{ij}$. Again, using the monotonicity property of $v_F^{ij}(u_{SO}^{max}, v_m)$ (which is also the objective function of (58)), we can show that the optimal $v_m$ value of (58) is the maximum feasible $v_m$ value, i.e.,
$$v_m^* = \min_{(i,j) \in E} v_m^{ij} = \min_{(i,j) \in E} \frac{y_{ij}\, u_{SO}^{max}\, v_{SO}^{max}}{(v_{SO}^{max})^2 + y_{ij}\, u_{SO}^{max}}, \quad \text{subject to } y_{ij} \geq (v_{SO}^{max})^2 / u_{SO}^{max}. \tag{86}$$

D. Second-Order Agent Models with Constraints
In this section, we show how a second-order agent $a \in A$ should select its behavior (including the transit-time) on a trajectory segment when solving an RHCP under tangential velocity or acceleration bounds.

a) SO-V Method: The SO-V method assumes that the agent tangential velocity is bounded such that $|v_a(t)| \leq \bar{v}$, where $\bar{v}$ is predefined and satisfies $\bar{v} < v_{SO} = \frac{3y_{ij}}{2\rho_{SO}}$ (recall that $\rho_{SO}$ is the optimal transit-time found for the unconstrained SO method). Based on the optimal unconstrained velocity profile (55), we can expect the optimal constrained velocity profile to contain three different phases: two quadratic segments at the beginning and the end, and a constant-velocity segment in the middle, as shown in Fig. 14.

A generalized version of the optimal unconstrained velocity profile (55) can be written as $v(t) = \alpha t(\beta - t)$, $t \in [0, \beta]$, where $\beta$ can be thought of as a controllable parameter and $\alpha = \frac{6y_{ij}}{\beta^3}$ (enforcing the condition $\int_0^{\beta} v(t)\,dt = y_{ij}$). We next use this $v(t)$ profile to construct the optimal constrained velocity profile $v_a(t)$ as
$$v_a(t) \triangleq \begin{cases} v(t) & t \in [0, t_1) \\ \bar{v} & t \in [t_1, t_2) \\ v(t - (\rho_{SV} - \beta)) & t \in [t_2, \rho_{SV}], \end{cases} \tag{87}$$
where $t_1$ is such that $v(t_1) = \bar{v}$ (the existence of such a $t_1$ is guaranteed when $\beta \leq \rho_{SO}$), $t_2 = \rho_{SV} - t_1$ (from symmetry), and the transit-time $\rho_{SV}$ is such that $\int_0^{\rho_{SV}} v_a(t)\,dt = y_{ij}$. In particular, it can be shown that
$$t_1 = \frac{\beta}{2}\left(1 - \sqrt{1 - \frac{2\beta}{3t_v}}\right), \quad \rho_{SV} = \beta + t_v\left(1 - \frac{2\beta}{3t_v}\right)^{3/2}, \tag{88}$$
where $t_v \triangleq y_{ij}/\bar{v}$. We highlight that the agent velocity profile $v_a(t)$ defined in (87) depends only on the parameter $\beta$.

Fig. 14: Tangential velocity profiles on a trajectory segment $(i,j) \in E$ under the unconstrained (SO) and constrained (SO-V) second-order agent models.

Under the SO-V method, the sensing objective component of the OCP (22) is $J_{sH}^*(\rho_{SV})$, and the energy objective component of the OCP can be written as
$$E_{SV} \triangleq \int_0^{\rho_{SV}} \left(\frac{dv_a(t)}{dt}\right)^2 dt = \frac{12y_{ij}^2}{\beta^3}\left(1 - \left(1 - \frac{2\beta}{3t_v}\right)^{3/2}\right). \tag{89}$$
Therefore, the OCP objective that needs to be optimized in an RHCP under the SO-V method is
$$J_H = \alpha E_{SV} + J_{sH}^*(\rho_{SV}). \tag{90}$$
Thus, the optimal transit-time $\rho_{SV}$ (and hence the optimal $\beta$ value via (88)) can be found using
$$\frac{dJ_H}{d\rho_{SV}} = \alpha\,\frac{dE_{SV}}{d\beta}\Big/\frac{d\rho_{SV}}{d\beta} + \frac{dJ_{sH}^*(\rho_{SV})}{d\rho_{SV}} = 0. \tag{91}$$
As shown in Fig. 4, note that finding the optimal transit-time corresponding to the OCP (22) enables determining the remaining control inputs in $U_{iaj}^*$ of the RHCP (18).

b) SO-A Method: The SO-A method assumes that the agent tangential acceleration is bounded such that $|u_a(t)| \leq \bar{u}$, where $\bar{u}$ is predefined and satisfies $\bar{u} < u_{SO} = \frac{6y_{ij}}{\rho_{SO}^2}$. Based on the optimal unconstrained acceleration profile (55), we can expect the optimal constrained acceleration profile to be a composition of three stages: two constant-acceleration sessions at the beginning and the end, and a linearly decreasing acceleration session in the middle, as shown in Fig. 15.

In particular, the optimal constrained acceleration profile $u_a(t)$ can be written as
$$u_a(t) \triangleq \begin{cases} \bar{u} & t \in [0, t_1] \\ \bar{u} - \beta(t - t_1) & t \in [t_1, t_2] \\ -\bar{u} & t \in [t_2, \rho_{SA}], \end{cases} \tag{92}$$
where $t_1$ and $t_2 = t_1 + \frac{2\bar{u}}{\beta}$ are switching times such that $v_a(t_1) = v_a(t_2) = v_{SA}$, and $\rho_{SA}$ is the transit-time. Using the symmetry and the relationship $\int_0^{\rho_{SA}} v_a(t)\,dt = y_{ij}$, it can be shown that
$$v_{SA} = \sqrt{y_{ij}\bar{u} + \frac{\bar{u}^4}{3\beta^2}} - \frac{\bar{u}^2}{\beta}, \quad \rho_{SA} = 2\sqrt{\frac{y_{ij}}{\bar{u}} + \frac{\bar{u}^2}{3\beta^2}}. \tag{93}$$
Notice that $\beta$ (here, the slope of the middle stage) is a controllable parameter that fully defines the optimal constrained acceleration profile in (92).

Fig. 15: Tangential acceleration profiles on a trajectory segment $(i,j) \in E$ under the unconstrained (SO) and constrained (SO-A) second-order agent models.

Under the SO-A method, the sensing objective component of an RHCP is $J_{sH}^*(\rho_{SA})$, and the energy objective component of the OCP (22) can be written as
$$E_{SA} \triangleq \int_0^{\rho_{SA}} u_a^2(t)\,dt = \bar{u}^2\rho_{SA} - \frac{4\bar{u}^3}{3\beta}. \tag{94}$$
Therefore, the composite objective function of the OCP under the SO-A method is
$$J_H = \alpha E_{SA} + J_{sH}^*(\rho_{SA}). \tag{95}$$
Thus, the optimal transit-time $\rho_{SA}$ (and hence the optimal $\beta$ value via (93)) can be found using the equation
$$\frac{dJ_H}{d\rho_{SA}} = \alpha\,\frac{dE_{SA}}{d\beta}\Big/\frac{d\rho_{SA}}{d\beta} + \frac{dJ_{sH}^*(\rho_{SA})}{d\rho_{SA}} = 0. \tag{96}$$

E. First-Order Agent Models with Constraints
In this section, we investigate how a first-order agent $a \in A$ should select its behavior (including the transit-time) on a trajectory segment when solving an RHCP under tangential velocity or acceleration bounds.

a) FO-V Method: The FO-V method assumes that the agent tangential velocity is bounded such that $v_a(t) \leq \bar{v}$, where $\bar{v}$ is predefined and satisfies $\bar{v} < \bar{v}_F = y_{ij}/\bar{\rho}_F$ (recall that $\bar{\rho}_F$ is the transit-time found for the unconstrained FO-3 method). Under this constrained setting, the optimal agent tangential velocity profile is shown in Fig. 16, where $u_{FV}$ is a controllable parameter. Taking the corresponding transit-time as $\rho_{FV}$ and using the fact that $\int_0^{\rho_{FV}} v_a(t)\,dt = y_{ij}$, it can be shown that
$$u_{FV} = \frac{\bar{v}^2}{\bar{v}\rho_{FV} - y_{ij}}. \tag{97}$$
Similar to before, under the FO-V method, the sensing objective component of the OCP (22) is $J_{sH}^*(\rho_{FV})$, and the energy objective component of an RHCP can be written as
$$E_{FV} = \frac{2\bar{v}^3}{\bar{v}\rho_{FV} - y_{ij}}. \tag{98}$$
The composite objective function that needs to be optimized in the OCP (22) under the FO-V method is
$$J_H = \alpha E_{FV} + J_{sH}^*(\rho_{FV}). \tag{99}$$
Therefore, the optimal transit-time $\rho_{FV}$ (and hence the optimal $u_{FV}$ value via (97)) can be found using
$$\frac{dJ_H}{d\rho_{FV}} = \alpha\frac{dE_{FV}}{d\rho_{FV}} + \frac{dJ_{sH}^*(\rho_{FV})}{d\rho_{FV}} = 0. \tag{100}$$

Fig. 16: Tangential velocity profiles on a trajectory segment $(i,j) \in E$ under the unconstrained (FO-3) and constrained (FO-V) first-order agent models.

b) FO-A Method: The FO-A method assumes that the agent tangential acceleration is bounded such that $|u_a(t)| \leq \bar{u}$, where $\bar{u}$ is predefined and satisfies $\bar{u} < \bar{u}_F = y_{ij}/\bar{\rho}_F^{\,2}$. Under this constrained setting, the optimal agent tangential velocity profile is shown in Fig. 17, where $v_{FA}$ is a controllable parameter. Taking the corresponding transit-time as $\rho_{FA}$ and using the fact that $\int_0^{\rho_{FA}} v_a(t)\,dt = y_{ij}$, it can be shown that
$$v_{FA} = \frac{\bar{u}}{2}\left(\rho_{FA} - \sqrt{\rho_{FA}^2 - \frac{4y_{ij}}{\bar{u}}}\right). \tag{101}$$
Following the same procedure as before, under the FO-A method, the sensing objective component of the OCP is $J_{sH}^*(\rho_{FA})$, and the energy objective component of an RHCP can be written as
$$E_{FA} = \bar{u}^2\left(\rho_{FA} - \sqrt{\rho_{FA}^2 - \frac{4y_{ij}}{\bar{u}}}\right). \tag{102}$$
The composite objective function that needs to be optimized in an RHCP under the FO-A method is
$$J_H = \alpha E_{FA} + J_{sH}^*(\rho_{FA}). \tag{103}$$
Therefore, the optimal transit-time $\rho_{FA}$ (and hence the optimal $v_{FA}$ value via (101)) can be found using
$$\frac{dJ_H}{d\rho_{FA}} = \alpha\frac{dE_{FA}}{d\rho_{FA}} + \frac{dJ_{sH}^*(\rho_{FA})}{d\rho_{FA}} = 0. \tag{104}$$

Fig. 17: Tangential velocity profiles on a trajectory segment $(i,j) \in E$ under the unconstrained (FO-3) and constrained (FO-A) first-order agent models.

REFERENCES

[1] M. L. Elwin, R. A. Freeman, and K. M. Lynch, "Distributed Environmental Monitoring with Finite Element Robots,"
IEEE Trans. on Robotics, vol. 36, no. 2, pp. 380–398, 2020.
[2] D. Kingston, R. W. Beard, and R. S. Holt, "Decentralized Perimeter Surveillance Using a Team of UAVs," IEEE Trans. on Robotics, vol. 24, no. 6, pp. 1394–1404, 2008.
[3] R. Reshma, T. Ramesh, and P. Sathishkumar, "Security Situational Aware Intelligent Road Traffic Monitoring Using UAVs," in Proc. of 2nd IEEE Intl. Conf. on VLSI Systems, Architectures, Technology and Applications, 2016, pp. 1–6.
[4] S. L. Smith, M. Schwager, and D. Rus, "Persistent Monitoring of Changing Environments Using a Robot with Limited Range Sensing," in Proc. of IEEE Intl. Conf. on Robotics and Automation, 2011, pp. 5448–5455.
[5] J. Yu, S. Karaman, and D. Rus, "Persistent Monitoring of Events With Stochastic Arrivals at Multiple Stations," IEEE Trans. on Robotics, vol. 31, no. 3, pp. 521–535, 2015.
[6] N. Mathew, S. L. Smith, and S. L. Waslander, "Multirobot Rendezvous Planning for Recharging in Persistent Tasks," IEEE Trans. on Robotics, vol. 31, no. 1, pp. 128–142, 2015.
[7] S. K. Hari, S. Rathinam, S. Darbha, K. Kalyanam, S. G. Manyam, and D. Casbeer, "The Generalized Persistent Monitoring Problem," in Proc. of American Control Conf., 2019, pp. 2783–2788.
[8] S. Welikala and C. G. Cassandras, "Event-Driven Receding Horizon Control For Distributed Persistent Monitoring in Network Systems," Automatica, vol. 127, p. 109519, 2021.
[9] Y.-W. Wang, Y.-W. Wei, X.-K. Liu, N. Zhou, and C. G. Cassandras, "Optimal Persistent Monitoring Using Second-Order Agents with Physical Constraints," IEEE Trans. on Automatic Control, vol. 64, no. 8, pp. 3239–3252, 2017.
[10] N. Zhou, C. G. Cassandras, X. Yu, and S. B. Andersson, "Optimal Threshold-Based Distributed Control Policies for Persistent Monitoring on Graphs," in Proc. of American Control Conf., 2019, pp. 2030–2035.
[11] X. Lan and M. Schwager, "Planning Periodic Persistent Monitoring Trajectories for Sensing Robots in Gaussian Random Fields," in Proc. of IEEE Intl. Conf. on Robotics and Automation, 2013, pp. 2415–2420.
[12] X. Lin and C. G. Cassandras, "An Optimal Control Approach to The Multi-Agent Persistent Monitoring Problem in Two-Dimensional Spaces," IEEE Trans. on Automatic Control, IFAC-PapersOnLine, vol. 52, no. 20, 2019, pp. 217–222.
[15] W. Li and C. G. Cassandras, "A Cooperative Receding Horizon Controller for Multi-Vehicle Uncertain Environments," IEEE Trans. on Automatic Control, vol. 51, no. 2, pp. 242–257, 2006.
[16] R. Chen and C. G. Cassandras, "Optimal Assignments in Mobility-on-Demand Systems Using Event-Driven Receding Horizon Control," IEEE Trans. on Intelligent Transportation Systems, pp. 1–15, 2020. [Online]. Available: https://doi.org/10.1109/TITS.2020.3030218
[17] J. Yu, M. Schwager, and D. Rus, "Correlated Orienteering Problem and its Application to Persistent Monitoring Tasks," IEEE Trans. on Robotics, vol. 32, no. 5, pp. 1106–1118, 2016.
[18] M. Pakdaman and M. M. Sanaatiyan, "Design and Implementation of Line Follower Robot," in Proc. of Intl. Conf. on Computer and Electrical Engineering, vol. 2, 2009, pp. 585–590.
[19] T. Kim, C. Lee, and H. Shim, "Completely Decentralized Design of Distributed Observer for Linear Systems," IEEE Trans. on Automatic Control, vol. 65, no. 11, pp. 4664–4678, 2020.
[20] A. E. Bryson, Y. C. Ho, and D. P. Cantwell,