Event-Driven Receding Horizon Control of Energy-Aware Dynamic Agents For Distributed Persistent Monitoring
EEvent-Driven Receding Horizon Control of Energy-Aware DynamicAgents For Distributed Persistent Monitoring
Shirantha Welikala and Christos G. Cassandras
Abstract — This paper addresses the persistent monitoringproblem defined on a network where a set of nodes (targets)needs to be monitored by a team of dynamic energy-awareagents. The objective is to control the agents’ motion tojointly optimize the overall agent energy consumption anda measure of overall node state uncertainty, evaluated overa finite period of interest. To achieve these objectives, weextend an established event-driven Receding Horizon Control(RHC) solution by adding an optimal controller to account foragent motion dynamics and associated energy consumption. Theresulting RHC solution is computationally efficient, distributedand on-line. Finally, numerical results are provided highlightingimprovements compared to an existing RHC solution that usesenergy-agnostic first-order agents.
I. I
NTRODUCTION
We consider the problem of controlling a group of mobile agents deployed to monitor a finite set of “points of interest”(henceforth called targets ) in a mission space. In particular,each agent follows second-order unicycle dynamics and eachtarget has an “uncertainty” metric associated with its statethat increases when no agent is monitoring (i.e., sensingor collecting information from) the target and decreaseswhen one or more agents are monitoring it by dwelling inits vicinity. The goal is to optimally control each agent’smotion so as to collectively minimize the overall agentenergy consumption and a measure of target uncertainties -evaluated over a fixed period of interest. This problem setupis widely known as the persistent monitoring problem and itencompasses applications such as environmental sensing [1],surveillance [2], traffic monitoring [3], data collection [4],event detection [5] and energy management [6]. In order tosuit different application scenarios, this persistent monitoringproblem has been studied in the literature under differentobjective functions [7], agent dynamic models [8], [9] andtarget state dynamic models [10], [11].A common way to categorize persistent monitoring prob-lem setups is based on whether the shapes of trajectorysegments (available for the agents to travel between targets)are predefined [4], [10] or not [11], [12]. In the latter case, themain challenge is to search for the optimal agent trajectoryshapes. This is often achieved by restricting agent trajectoryshapes to specific parametric families (elliptical, Fourier, etc.[12]) and optimizing the objective function of interest withinthese families. In contrast, when the shapes of trajectory (cid:63)
Supported in part by NSF under grants ECCS-1931600, DMS-1664644,CNS-1645681, by AFOSR under grant FA9550-19-1-0158, by ARPA-Eunder grant DE-AR0001282 and by the NEXTCAR program under grantDE-AR0000796 and by the MathWorks.The authors are with the Division of Systems Engineering and Center forInformation and Systems Engineering, Boston University, Brookline, MA02446, { shiran27,cgc } @bu.edu . segments are predefined, the challenge is to search for: 1) theoptimal target visiting schedules of agents and 2) the optimalcontrol laws to govern agents on corresponding trajectorysegments - assuming an agent has to remain stationary on atarget to monitor it. As introduced in [10] and illustrated inFig. 1, this can be seen as a Persistent Monitoring on a Net-work (PMN) problem where targets and trajectory segmentsare modeled as nodes and edges of a network, respectively.Such PMN problems are significantly more complicated thanthe NP-hard traveling salesman problems [13] and thus haveinspired many different solution approaches [8], [10], [14].Fig. 1: The network abstraction.The work in [14] proposes a centralized off-line greedyalgorithm to determine the optimal target visiting schedulesof agents (i.e., each agent’s sequence of targets to visitand respective dwell-times to be spent at visited targets)in PMN problems. In contrast, for the same task, [10] pro-poses a gradient-based distributed on-line approach - which,however, requires a brief centralized off-line initializationstage to address non-convexities. An alternative approach istaken in the recent work [8] which exploits the event-drivennature of PMN systems to develop a distributed on-linesolution based on event-driven Receding Horizon Control(RHC) [15]. This RHC solution enjoys many promisingfeatures such as being computationally cheap, parameter-free, gradient-free and robust in the presence of various formsof state and system perturbations.However, the work mentioned above [8], [10], [14] ignoresagent dynamics by assuming each trajectory segment has apredefined transit-time value that an agent has to spend inorder to travel on it. This assumption allows one to focus ondetermining the optimal target visiting schedules of agents,ignoring how the agents are governed during the transitionperiods where they travel on trajectory segments. In essence,it is identical to assuming each agent follows a first-orderdynamic model controlled by its velocity.In contrast, in this paper, we assume each agent followsa second-order dynamic model governed by accelerationrather than velocity. This leads to a better approximationof actual agent behaviors in practice and smoother agentstate trajectories [9]. In particular, we incorporate agentenergy consumption into the objective function to limit agentaccelerations and velocities and also to motivate agents to a r X i v : . [ ee ss . S Y ] F e b ake energy-efficient decisions. Under these modifications,we show how each agent needs to optimally select eachtransit-time value on its trajectory based on current localstate information - instead of using a fixed set of predefinedtransit-time values. In particular, we explicitly derive optimalcontrol laws to govern each agent on each trajectory segment.Finally, we not only compare the improvements achievedwith respect to an existing RHC solution [8] that usesenergy-agnostic first-order agents but also derive energy-aware optimal control laws for even such first-order agents.In this paper, first, we show that each agent’s trajectory isfully characterized by the sequence of decisions it makesat specific discrete event times in its trajectory. Second,considering an agent at each such event-time, we formulatea Receding Horizon Control Problem (RHCP) that deter-mines the agent’s optimal immediate control decisions overan optimally determined planning horizon. These controldecisions are subsequently executed over a shorter actionhorizon defined by the next event that the agent observes, andthe same process is continued in this event-driven manner. Asthe third step, we show that this RHCP includes an optimalcontrol component and it is then solved considering energy-aware second-order agents. Finally, several different numer-ical examples (i.e., PMN problems) are used to compare thedeveloped RHC solution with respect to the RHC solutionproposed in [8] that uses energy-agnostic first-order agents.This paper is organized as follows. Section II presents theproblem formulation and overview of the RHC approach.Sections III and IV present the formulation and solution ofthe RHCP with second-order agents and first-order agents,respectively. Numerical results are provided in Section V.Finally, Section VI concludes the paper.II. P ROBLEM F ORMULATION
We consider a 2-dimensional mission space containing M targets (nodes) in the set T = { , , . . . , M } where thelocation of target i ∈ T is fixed at Y i ∈ R . A team of N agents in the set A = { , , . . . , N } is deployed to monitor thetargets. Each agent a ∈ A moves within this mission spacewhere its location and orientation at time t are denoted by s a ( t ) ∈ R and θ a ( t ) ∈ [ , π ] , respectively. a) Target Model : Each target i ∈ T has an associated uncertainty state R i ( t ) ∈ R which follows the dynamics [10]:˙ R i ( t ) = (cid:40) A i − B i N i ( t ) if R i ( t ) > A i − B i N i ( t ) >
00 otherwise, (1)where N i ( t ) = ∑ a ∈ A { s a ( t ) = Y i } ( {·} denotes the indicatorfunction) is the number of agents present at target i at time t . According to (1): (i) R i ( t ) increases at a rate A i whenno agent is visiting target i , (ii) R i ( t ) decreases at a rate A i − B i N i ( t ) where B i is the uncertainty removal rate by avisiting agent to the target i and (iii) R i ( t ) ≥ , ∀ t . b) Agent Model : The location and orientation ( s a ( t ) , θ a ( t )) of an agent a ∈ A follows the second-order unicycle dynamics given by˙ s a ( t ) = v a ( t ) (cid:2) cos ( θ a ( t )) sin ( θ a ( t )) (cid:3) T , ˙ v a ( t ) = u a ( t ) , ˙ θ a ( t ) = w a ( t ) , (2)where v a ( t ) is the tangential velocity, u a ( t ) is the tangentialacceleration and w a ( t ) is the angular velocity. We consider u a ( t ) and w a ( t ) as the agent control inputs.Note that according to (1), the agent has to stay stationaryon a target i ∈ T for some positive amount of time tocontribute to decreasing a positive target uncertainty R i ( t ) .Therefore, during such a dwell-time period, the agent mustenforce u a ( t ) = v a ( t ) = s a ( t ) = Y i . c) Objective : Our aim is to minimize the compositeobjective J T of the total energy spent J e (called the energyobjective ) and the mean system uncertainty J s (called the sensing objective ) over a finite time interval [ , T ] : J T (cid:44) α J e + J s = α (cid:90) T ∑ a ∈ A u a ( t ) dt (cid:124) (cid:123)(cid:122) (cid:125) (cid:44) J e + T (cid:90) T ∑ i ∈ T R i ( t ) dt (cid:124) (cid:123)(cid:122) (cid:125) (cid:44) J s , (3)by controlling agent control inputs u a ( t ) , w a ( t ) , ∀ a ∈ A , t ∈ [ , T ] . Note that α in (3) is a weight factor that can also bemanipulated to constrain the resulting optimal agent controls(details on selecting α to ensure proper normalization of the J T components are provided in Appendix A). Note also thatthe cost of angular velocity (steering) control is not includedin (3). The trade-off between J e and J s components of (3) isclear from the fact that the aggressiveness of agent transitionsin-between targets affects negatively the J e component butpositively the J s component. d) Graph Topology : We embed a directed graph topol-ogy G = ( T , E ) into the mission space so that the targets are represented by the graph vertices T = { , , . . . , M } andthe inter-target trajectory segments are represented by thegraph edges E ⊆ { ( i , j ) : i , j ∈ T } (see also Fig. 1). Thesetrajectory segments may take arbitrary (prespecified) shapesso as to account for constraints in the mission space andagent motion. We use ρ i j to denote the transit-time thatan agent spends on a trajectory segment ( i , j ) ∈ E to reachtarget j from target i . In contrast to [10] and [8] where thesetransit-time values were treated as predefined, in this workthey are considered as control-dependent. We also use P i j to represent the transit-time interval ( P i j ⊂ [ , T ] of length ρ i j ) corresponding to the transit-time ρ i j .The neighbor set and the neighborhood of a target i ∈ T are defined based on the available trajectory segments E as N i (cid:44) { j : ( i , j ) ∈ E } and ¯ N i = N i ∪ { i } . (4) e) Control : As stated earlier, when an agent a ∈ A dwells on a target i ∈ T , the agent control u a ( t ) is zero.However, over such a dwell-time period, the agent control w a ( t ) may or may not be zero (exact details will be providedlater). Next, when the agent is ready to leave the target i , itneeds to decide the next-visit target j ∈ N i along with theorresponding control profiles u a ( t ) , w a ( t ) to be used on thetrajectory segment ( i , j ) ∈ E over t ∈ P i j .In essence, the overall control exerted on an agent can beseen as a sequence of: dwell-times δ i ∈ R ≥ , next-visit targets j ∈ N i and control profile segments { ( u a ( τ ) , w a ( τ )) : τ ∈ P i j } . Our goal is to determine ( δ i ( t s ) , j ( t s ) , { ( u a ( τ ) , w a ( τ )) : τ ∈ P i j ( t s ) } ) for any agent a ∈ A residing at any target i ∈ T at any time t s ∈ [ , T ] , which is optimal in the senseof minimizing (3).Clearly, this PMN problem is more complicated than thewell known NP-Hard traveling salesman problem (TSP)[13] due to its inclusion of: (i) multiple agents, (ii) targetdynamics, (iii) agent dynamics, (iv) target dwell-times and(v) repeated target visits. Even though one can still resortto dynamic programming techniques to solve this PMNproblem, for all the above reasons, the problem is intractable- even for the most simplistic problem configurations. f) Receding Horizon Control : As a solution to thisPMN problem, inspired by the prior work [8] (where wedealt with first-order agents without agent energy concerns),this paper proposes an
Event-Driven Receding Horizon Con-troller (RHC) at each agent. The key idea behind RHCderives from Model Predictive Control (MPC). However,RHC exploits the problem’s event-driven nature to signifi-cantly reduce the complexity by effectively decreasing thefrequency of control updates. As introduced and extendedlater on in [15] and [16], [8] respectively, the RHC isinvoked by the agents in a distributed manner at specificevents of interest in their trajectories. Upon invoking it, RHCdetermines the agent controls that optimize the objective(3) over a planning horizon and subsequently executes thedetermined optimal controls over a shorter action horizon .In particular, when the RHC is invoked at some event-time t s ∈ [ , T ] by an agent a ∈ A while residing at target i ∈ T ,it determines: (i) the remaining dwell-time δ i ( t s ) at target i ,(ii) the next-visit target j ( t s ) ∈ N i , (iii) the control profilesegments { u a ( τ ) , w a ( τ ) : τ ∈ P i j ( t s ) } and (iv) the dwell-time δ j ( t s ) at target j ( t s ) . These control decisions are jointlyrepresented by U ia ( t s ) and its optimal value is determined bysolving an optimization problem of the form: U ∗ ia ( t s ) = arg min U ia ( t s ) ∈ U ( t s ) J H ( X ia ( t s ) , U ia ( t s ) ; H ) + ˆ J H ( X ia ( t s + H )) (5)where X ia ( t s ) is the current local state and U ( t s ) is the feasi-ble control set at time t s (exact definitions are provided later).The term J H ( X ia ( t s ) , U ia ( t s ) ; H ) represents the immediate costover the planning horizon [ t s , t s + H ] and ˆ J H ( X ia ( t s + H ) is anestimate of the future cost based on the state at t s + H .In particular, we follow the variable horizon concept pro-posed in [8] where the planning horizon length is treated asan upper-bounded function of control decisions w ( U ia ( t s )) ≤ H rather than an exogenously selected value H , and theˆ J H ( X ia ( t s + H ) term is ignored. Hence, this approach incorpo-rates the selection of planning horizon length w ( U ia ( t s )) into the optimization problem (5), which now can be re-stated as U ∗ ia ( t s ) = arg min U ia ( t s ) ∈ U ( t s ) J H ( X ia ( t s ) , U ia ( t s ) ; w ( U ia ( t s ))) subject to w ( U ia ( t s )) ≤ H . (6) A. Preliminary Results
According to (1), the target state (uncertainty) R i ( t ) ofa target i ∈ T is piece-wise linear and its gradient ˙ R i ( t ) changes only when one of the following (strictly local) events occurs: (i) An agent arrival at i , (ii) R i ( t ) switches frompositive to zero, denoted as [ R i ( t ) → + ] , or (iii) An agentdeparture from i . Let us denote the sequence of such eventtimes (associated with the target i ) as t ki where k ∈ Z > with t i =
0. Then, it is easy to see from (1) that˙ R i ( t ) = ˙ R i ( t ki ) , ∀ t ∈ [ t ki , t k + i ) . (7) Remark 1:
As pointed out in [17], [8] (and the referencestherein), allowing multiple agents to simultaneously resideon a target (known also as “simultaneous target sharing”)is known to lead to solutions with poor performance levels.Thus, we enforce a constraint [8] on the controller to ensure: N i ( t ) ∈ { , } , ∀ t ∈ [ , T ] , ∀ i ∈ T . (8)Clearly, this constraint only applies if N ≥ { ˙ R i ( t ki ) } k = , ,... is a cyclic order of three elements: {− ( B i − A i ) , , A i } . Next, in order to make sure that each agent iscapable of enforcing the event [ R i → + ] at any target i ∈ T ,we assume the following simple stability condition [8]: Assumption 1:
Target uncertainty rate parameters A i and B i of each target i ∈ T satisfy 0 < A i < B i . a) Decomposition of the Sensing Objective J s : Thefollowing theorem provides a target-wise and temporal de-composition of the sensing objective J s defined in (3). Theorem 1: ([8, Th.1]) The contribution to the term J s in(3) by a target i ∈ T during a time period [ t , t ) ⊆ [ t ki , t k + i ) for some k ∈ Z ≥ is T J i ( t , t ) , where, J i ( t , t ) = (cid:90) t t R i ( t ) dt = ( t − t ) (cid:2) R i ( t ) + ˙ R i ( t )( t − t ) (cid:3) . (9) b) Local Sensing Objective Function : The local sens-ing objective function of a target i ∈ T over a period [ t , t ) ⊆ [ , T ] is defined as¯ J i ( t , t ) = ∑ j ∈ ¯ N i J j ( t , t ) , (10)where each J j ( t , t ) term is evaluated using Theorem 1. c) Decomposition of the Energy Objective J e : Asimilar decomposition result as Theorem 1 applies to theenergy objective J e defined in (3). However, this result isimmediate from (3) and is as follows. The contribution to theterm J e in (3) by an agent a ∈ A from traversing a trajectorysegment ( i , j ) ∈ E over the transit-time interval [ t o , t f ] (cid:44) P i j is J a ( t o , t f ) , where, J a ( t o , t f ) = (cid:90) t f t o u a ( t ) dt . (11)ote that the agent does not have any contribution to the J e term during dwell-time intervals as u a ( t ) = d) Agent Angular Velocity Profile w a ( t ) : The controlprofile segment { w a ( t ) : t ∈ P i j } that needs to be used byan agent a ∈ A over the transit-time interval P i j on thetrajectory segment ( i , j ) ∈ E can be obtained using only thefollowing information: (i) the agent tangential accelerationprofile { u a ( t ) : t ∈ P i j } and (ii) the shape of the trajectorysegment ( i , j ) given in a parametric form { ( x ( p ) , y ( p )) : p ∈ [ p o , p f ] } . Note that the parameter values p = p o and p = p f correspond to the terminal target locations Y i ≡ ( x ( p o ) , y ( p o )) and Y j ≡ ( x ( p f ) , y ( p f )) , respectively. For notational conve-nience, let us denote x (cid:48) p = dx ( p ) dp , y (cid:48) p = dy ( p ) dp , x (cid:48)(cid:48) p = d x ( p ) dp and y (cid:48)(cid:48) p = d y ( p ) dp .First, we require a minor technical assumption regardingthe said trajectory segment shape parameterization. Assumption 2:
There exists an injective (i.e., one-to-one)function f : [ p o , p f ] → [ , y i j ] such that f ( p ) (cid:44) (cid:90) pp o (cid:113) ( x (cid:48) p ) + ( y (cid:48) p ) d p , (12)with f ( p f ) = y i j and a corresponding inverse function f − .This assumption simply means that we should be able toexpress the distance, say l , along the trajectory segment start-ing from ( x ( p o ) , y ( p o )) to ( x ( p ) , y ( p )) where p ∈ [ p o , p f ] ,explicitly in terms of the parameter p (i.e., l = f ( p ) ) andvice versa (i.e., p = f − ( l ) ). Clearly this assumption holdsif the distance l is used directly as the parameter p (i.e., p = l ) that characterizes the trajectory segment shape.Second, let us define a function F : [ p o , p f ] → R such that F ( p ) (cid:44) x (cid:48) p y (cid:48)(cid:48) p − y (cid:48) p x (cid:48)(cid:48) p (cid:0) ( x (cid:48) p ) + ( y (cid:48) p ) (cid:1) . (13)Finally, as shown in Fig. 2, let us denote by l a ( t ) , t ∈ P i j the total distance the agent has traveled on the trajectorysegment ( i , j ) by time t . According to (2), v a ( t ) , t ∈ P i j represents the agent tangential velocity on the trajectorysegment at time t . Considering the agent dynamics alongthe tangential direction to the trajectory segment, note thatwe can write l a ( t ) = (cid:90) tt o v a ( t ) dt and v a ( t ) = (cid:90) tt o u a ( t ) dt , (14)for all t ∈ [ t o , t f ] (cid:44) P i j (note also that the terminal conditions l a ( t f ) = y i j and v a ( t f ) = a ∈ A whiletraversing a trajectory segment ( i , j ) ∈ E . Theorem 2:
The required agent angular velocity profile { w a ( t ) : t ∈ P i j } on trajectory segment ( i , j ) ∈ E is w a ( t ) = F ( f − ( l a ( t ))) v a ( t ) , (15)where f ( · ) and F ( · ) are as in (12) and (13), respectively. Proof:
Provided in Appendix B.For an example, if the trajectory segment ( i , j ) (betweentarget locations Y i and Y j ) takes a circular shape centeredat C i j ∈ R with a radius r i j , it can be represented bythe parametric form: { ( x ( p ) , y ( p )) : p ∈ [ p o , p f ] } where ( x ( p ) , y ( p )) ≡ C i j + r i j [ cos ( p ) , sin ( p )] T and p o = arctan ( Y i − C i j ) , p f = arctan ( Y j − C i j ) . Using (13) and (15), it can beshown that F ( p ) = / r i j and w a ( t ) = v a ( t ) / r i j , respectively.Similarly, if the trajectory segment ( i , j ) takes a linearshape, it can be shown that f ( p ) = p and F ( p ) = w a ( t ) = Remark 2:
In robotics applications where line-followingtechniques can be used [18], an agent can use its line-following capabilities to control its angular velocity w a ( t ) (instead of using (15)) - irrespective of its tangential accel-eration u a ( t ) .In conclusion, Theorem 2 allows us to dispense of w a ( t ) as a control input because it is always determined through u a ( t ) (which gives v a ( t ) via (14)) and the prespecified shapeof the trajectory segment. e) The Equivalent Dynamic Agent Model : Since wenow have discussed how an agent a ∈ A can control itsangular velocity w a ( t ) (i.e., via (15)), we can omit angulardynamics from (2) to construct an equivalent dynamic agentmodel, that focuses only on the tangential dynamics on atrajectory segment ( i , j ) ∈ E . In particular, as a direct conse-quence of (14), upon taking the state vector as [ l a ( t ) , v a ( t )] T for some t ∈ P i j , we can express the corresponding statedynamics as a second-order single-input linear system: (cid:20) ˙ l a ( t ) ˙ v a ( t ) (cid:21) = (cid:20) (cid:21) (cid:20) l a ( t ) v a ( t ) (cid:21) + (cid:20) (cid:21) u a ( t ) . (16)In the sequel, we use (16) and (15) to determine the op-timal agent control profile segments { u a ( t ) : t ∈ P i j } and { w a ( t ) : t ∈ P i j } , respectively. This particular decompositionof unicycle agent dynamics is fundamentally similar to thatproposed in [19]. B. ED-RHC Problem (RHCP) Formulation
Consider an agent a ∈ A residing on a target i ∈ T at some time t s ∈ [ , T ] . Recall that control U ia ( t s ) in (6)includes dwell-time decisions δ i and δ j at the current target i and the next-visit target j ∈ N i , respectively. As shown inFig. 3, a dwell-time decision δ i (or δ j ) can be divided intotwo interdependent decisions: (i) the active time τ i (or τ j )and (ii) the inactive (or idle ) time ¯ τ i (or ¯ τ j ). Therefore, theagent has to optimally choose decision variables which formthe control vector U ia ( t s ) = [ τ i , ¯ τ i , j , { u a ( t ) } , τ j , ¯ τ j ] . Note thathere we have: (i) omitted representing each of these decisionvariable’s dependence on t s , (ii) used the notation { u a ( t ) } torepresent { u a ( t ) : t ∈ P i j ( t s ) } and (iii) omitted { w a ( t ) : t ∈ P i j ( t s ) } as it can be found directly from { u a ( t ) } and (15). ) The Receding Horizon Control Problem (RHCP) : Let us denote the real-valued component of the control vector U ia ( t s ) in (6) as U ia j ( t s ) = [ τ i , ¯ τ i , { u a ( t ) } , τ j , ¯ τ j ] . The discretecomponent of U ia ( t s ) is simply the next-visit target j ∈ N i . Inthis setting (see also Fig. 3), we define the planning horizonlength w ( U ia ( t s )) in (6) as w ( U ia j ( t s )) (cid:44) τ i + ¯ τ i + ρ i j + τ j + ¯ τ j . (17)The current local state X ia ( t s ) in (6) is considered as X ia ( t s ) =[ s a , v a , θ a , { R j : j ∈ ¯ N i } ] (again, omitting the dependence on t s ). Then, the optimal controls are obtained by solving (6),which can be re-stated as the following set of optimizationproblems, henceforth called the RHC Problem (RHCP): U ∗ ia j = arg min U iaj ∈ U J H ( X ia ( t s ) , U ia j ; w ( U ia j )) ; ∀ j ∈ N i , subject to w ( U ia j ) ≤ H (18) j ∗ = arg min j ∈ N i J H ( X ia ( t s ) , U ∗ ia j ; w ( U ∗ ia j )) . (19)Note that (18) requires solving | N i | optimization problems,one for each neighboring target j ∈ N i ( | · | is the cardinalityoperator). The next step (19) is a simple comparison todetermine the optimal next-visit target j ∗ . Therefore, the finaloptimal controls of the RHCP are U ∗ ia ( t s ) = [ U ∗ ia j ∗ , j ∗ ] .The objective function J H ( · ) in (18) is chosen to reflect thecontribution to the main objective J T in (3) by the targets inthe neighborhood ¯ N i and by the agent a , over the planninghorizon [ t s , t s + w ] as J H ( X ia ( t s ) , U ia j ; w ) (cid:44) α H J a ( t o , t f ) (cid:124) (cid:123)(cid:122) (cid:125) (cid:44) J eH + w ¯ J i ( t s , t s + w ) (cid:124) (cid:123)(cid:122) (cid:125) (cid:44) J sH . (20)where w = w ( U ia j ) and α H (cid:44) α (the weight factor used in(3)). In (20), the form of the J sH component has been selectedso that it is analogous to the J s component in (3) (with T replaced by w ). As illustrated in Fig. 3, note also that t e (cid:44) t s + w , [ t o , t f ] (cid:44) P i j ⊆ [ t s , t e ] and ρ i j (cid:44) t f − t o .Fig. 3: Event timeline and control decisions in ED-RHC. b) Planning Horizon : In conventional RHC methods,the RHCP objective function is evaluated over a fixed plan-ning horizon length H , where H is selected exogenously.This makes the RHCP solution dependent on the choiceof H . In contrast, through (20) and (17) above, we havemade the RHCP solution (i.e., (18) and (19)) free of theparameter H , by using H only as an upper-bound to theactual planning horizon length w ( U ia j ) in (17) and selecting H to be sufficiently large (e.g., H = T − t s ). In fact, since the planning horizon length w ( U ia j ) is acontrol variable, the above RHCP formulation simultane-ously determines the optimal planning horizon length w ∗ = w ( U ∗ ia j ∗ ) . Moreover, as shown in Fig. 3, the time to departfrom the current target i (i.e., t o ), the time to arrive at thedestination target j (i.e., t f ) and the corresponding transit-time ρ i j = t f − t o , are also control dependent. Hence, thisRHCP formulation also determines the optimal values ofeach of these quantities: t ∗ o , t ∗ f and ρ ∗ i j ∗ , respectively. c) Overview of the RHCP Solution Process : Lookingback at (9) and (10), notice that the sensing component J sH of the RHCP objective (20) does not explicitly depend onthe agent control profile segment { u a ( t ) : t ∈ P i j } , but, itdepends on the agent’s transit-time ρ i j value and on the othercontrol decisions in U ia j : τ i , ¯ τ i , τ j , ¯ τ j . Therefore, let us denote J sH as a function parameterized by ρ i j : J sH ( τ i , ¯ τ i , τ j , ¯ τ j ; ρ i j ) .In contrast, based on (11), notice that the energy compo-nent J eH of the RHCP objective (20) only depends on agentcontrol profile segments, specifically on { u a ( t ) : t ∈ P i j } .Therefore, let us denote J eH simply as J eH ( { u a ( t ) } ) .As illustrated in Fig. 4, we exploit this property of theRHCP objective components ( J sH and J eH ) to solve theRHCP (18). In particular, we start with analytically solvingthe optimization problem which we label as the RHCP( ρ i j ): J ∗ sH ( ρ i j ) (cid:44) min ( τ i , ¯ τ i , τ j , ¯ τ j ) ∈ U s ( ρ ij ) J sH ( τ i , ¯ τ i , τ j , ¯ τ j ; ρ i j ) . (21)For this purpose, we exploit a few results established in [8]where the RHCP( ρ i j ) (21) has already been solved whiletreating ρ i j as a known constant.Next, we use the J ∗ sH ( ρ i j ) function obtained from (21) andthe relationship ρ i j = t f − t o to reformulate the problem ofoptimizing the RHCP objective (20) as an optimal controlproblem (OCP): [ t ∗ o , t ∗ f , { u ∗ a ( t ) } ] = arg min t o , t f , { u a ( t ) } α H J e ( { u a ( t ) } ) + J ∗ sH ( t f − t o ) . (22)Finally, as shown in Fig. 4, it is straightforward howthe RHCP (18) solution U ∗ ia j can be constructed from theobtained solutions of the OCP (22) and the RHCP( ρ i j ) (21).Fig. 4: Overview of the RHCP Solution Process whensolving (18) for some next-visit target j ∈ N i . 1. Solvingthe receding horizon control component of (18) (i.e., (21));2. Solving the optimal control component of (18) (i.e., (22));3. Constructing the final solution of (18). ) Event-Driven Action Horizon : Each RHCP solution(i.e., U ∗ ia ( t s ) = [ U ∗ ia j ∗ , j ∗ ] from (18)-(19)) obtained over aplanning horizon w ( U ∗ ia j ∗ ) ≤ H is generally executed over ashorter action horizon h ≤ w ( U ∗ ia j ∗ ) . In particular, the actionhorizon h is determined by the first event that takes placeafter t s , where the RHCP was last solved. Such a subsequentevent may be controllable if it results from executing the lastsolved RHCP solution or uncontrollable if it results from arandom or an external event (if such events are allowed).When executing the RHCP solution obtained by an agentat target i at time t s , there are three mutually exclusivecontrollable events that may occur subsequently. They are:
1. Event [ h → τ ∗ i ] : This event is feasible only if τ ∗ i ( t s ) > t = t s + τ ∗ i ( t s ) . If R i ( t ) >
0, itcoincides with a departure event from target i . Otherwise,i.e., if R i ( t ) =
0, it coincides with a [ R i → + ] event.
2. Event [ h → ¯ τ ∗ i ] : This event is feasible if τ ∗ i ( t s ) = R i ( t s ) =
0) and ¯ τ ∗ i ( t s ) ≥
0. It occurs at t = t s + ¯ τ ∗ i ( t s ) and coincides with a departure event from target i .
3. Event [ h → ρ i j ∗ ] : This event is feasible only if adeparture event (from target i ) occurred at t s . Clearly thisevent coincides with an arrival event at target j ∗ ( t s ) .In an agent trajectory, at a given time instant, only oneof these three controllable events is feasible. However, thereare two uncontrollable events that may occur at an agentresiding in a target i due to two specific controllable eventsat a neighboring target j ∈ N i . These two types of eventsare aimed to enforce the “no simultaneous target sharing”condition (i.e., the control constraint (8)) and thus, onlyapplies to multi-agent problems. To enforce this condition,an agent at target i modifies its neighborhood N i to N i \{ j } when: (i) another agent already resides at target j or (ii)another agent is en-route to visit target j . Therefore, wedefine the following two neighbor induced events at target i due to a neighbor j ∈ N i :
4. Covering Event C j , j ∈ N i : This event causes N i tobe modified to N i \{ j } .
5. Uncovering Event ¯ C j , j ∈ N i : This event causes N i to be modified to N i ∪ { j } .If one of these two events occurs while the agent isawaiting an event [ h → τ ∗ i ] or [ h → ¯ τ ∗ i ] , the RHCP is resolvedto account for the updated neighborhood N i . e) Three Forms of RHCPs : The exact form of theRHCP ((18) and (19)) that needs to be solved at a certaintime depends on the event that triggered the end of theprevious action horizon. In particular, corresponding to thethree controllable event types, there are three forms ofRHCPs:
RHCP1:
At a target i and time t s , this particular problemform is solved upon: (i) an arrival event [ h → ρ ki ] where k ∈ N i or (ii) a C j (or a ¯ C j ) event occurred when R i ( t s ) > j ∈ N i . Since R i ( t s ) > U ia ( t s ) = [ τ i , ¯ τ j , j , { u a ( t ) } , τ j , ¯ τ j ] with τ i ≥ RHCP2:
At a target i and time t s , this particular problemform is solved when R i ( t s ) = [ h → τ ∗ i ] or (ii) a C j (or a ¯ C j ) event where j ∈ N i . Since R i ( t s ) = RHCP1 but with τ i =
0, hence simpler.
RHCP3:
At a target i and time t s , this particular problemform is solved upon: (i) an event [ h → τ ∗ i ] with R i ( t s ) > [ h → ¯ τ ∗ i ] . Simply, this problem form issolved whenever the agent is ready to depart from the target.Therefore, it is the same as RHCP1 but with τ i = τ i =
0. III. S
OLVING E VENT -D RIVEN
RHCP S In this section, we present the solutions to the three RHCPforms identified above. We begin with
RHCP3 . A. Solution of
RHCP3
RHCP3 is the simplest RHCP given that τ i = ¯ τ i = U ia by default. Therefore, U ia j (i.e., the real-valued component of U ia , used in (18)) is limited to U ia j = [ { u a ( t ) } , τ j , ¯ τ j ] and theplanning horizon w ( U ia j ) defined in (17) becomes w ( U ia j ) = ρ i j + τ j + ¯ τ j .Under these conditions, we next solve (18) (via solvingRHCP( ρ i j ) (21) and OCP (22), as shown in Fig. 4) and (19)to obtain the RHCP3 solution. a) Solution of RHCP( ρ i j ) (21) : As mentioned before,RHCP( ρ i j ) has already been solved in [8] - while treating ρ i j as a known fixed value. In particular, the RHCP( ρ i j ) solutioncorresponding to the RHCP3 takes the form [8, Th. 2]: ( τ ∗ i , ¯ τ ∗ i ) = ( , ) , ( τ ∗ j , ¯ τ ∗ j ) = ( , ) if ¯ A ≥ B j or D > D ( D , ) else if D < D ( D , ) else if B j > ¯ A ≥ B j (cid:20) − ρ ij ( ρ ij + D ) (cid:21) ( D , ¯ D ) else if ¯ D ≤ ¯ D ( D , ¯ D ) otherwise, J ∗ sH ( ρ i j ) = J sH ( τ ∗ j , ¯ τ ∗ j ; ρ i j ) , (23)where¯ A = ∑ m ∈ ¯ N i A m , D = ¯ A ρ i j B i − ¯ A , D = min { D , H − ρ i j } , D = R j ( t o ) B j − A j + A j B j − A j ρ i j , ¯ D = (cid:115) ( B j − A j )( ρ i j + D ) − B j ρ i j ¯ A j − ( ρ i j + D ) , ¯ A j = ¯ A − A j , ¯ D (cid:44) H − ( ρ i j + D ) , J sH ( τ j , ¯ τ j ; ρ i j ) = C τ j + C ¯ τ j + C τ j ¯ τ j + C τ j + C ¯ τ j + C ρ i j + τ j + ¯ τ j , C = [ ¯ A − B j ] , C = ¯ A j , C = ¯ A j , C = [ ¯ R ( t o ) + ¯ A ρ i j ] , ¯ R = ∑ m ∈ ¯ N i R m , C = [ ¯ R j ( t o ) + ¯ A j ρ i j ] , ¯ R j (cid:44) ¯ R − R j , C = ρ i j [ R ( t o ) + ¯ A ρ i j ] . (24)ote that in (23), not only J ∗ sH , but also τ ∗ j and ¯ τ ∗ j arefunctions of the transit-time ρ i j . To provide intuition aboutthe J ∗ sH ( ρ i j ) function form, let us consider the first case in(23) where ( τ ∗ j , ¯ τ ∗ j ) = ( , ) that results in J ∗ sH ( ρ i j ) = J sH ( , ρ i j ) = ¯ R ( t o ) +
12 ¯ A ρ i j , (25)under the condition ¯ A ≥ B j or D > D . Using (24), it canbe shown that D > D ⇐⇒ ρ i j > min { R j ( t o )( B i − ¯ A ) ¯ AB j − A j B i , H ( − ¯ AB i ) } . From this example, it is clear that the function J ∗ sH ( ρ i j ) isdependent on the neighborhood parameters (e.g., ¯ A , B j , B i ) aswell as the current neighborhood state (e.g., ¯ R ( t o ) , R j ( t o ) ). b) Objective Function of OCP (22) : Note that we nowhave solved the RHCP( ρ i j ) and have obtained the functions(of ρ i j ): τ ∗ i , ¯ τ ∗ i , τ ∗ j , ¯ τ ∗ j , and, most importantly, J ∗ sH . Based onthe RHCP solution process outlined in Fig. 4, our next stepis to formulate and solve the corresponding OCP (22).As shown in (22), the sensing objective component of OCPis J ∗ sH ( t f − t o ) . Note that we now can explicitly express thisterm using the obtained J ∗ sH ( ρ i j ) function in (23) and therelationship ρ i j = t f − t o . For notational convenience, takinginto account that in RHCP3 , t o is the the current event timewhen the RHCP is solved (i.e., t o = t s where t s is fixed andknown), let us denote this sensing objective component ofOCP as φ ( t f ) (cid:44) J ∗ sH ( t f − t o ) . (26)On the other hand, using (20) and (11), the energyobjective component of OCP (22) can be expressed as J eH ( { u a ( t ) } ) = (cid:90) t f t o u a ( t ) dt . (27) c) Solution of OCP (22) : In the following analysis, fornotational convenience, we use ˙ x = Ax ( t ) + Bu ( t ) with A = (cid:20) (cid:21) , B = (cid:20) (cid:21) , x ( t ) = (cid:20) l a ( t ) v a ( t ) (cid:21) , u ( t ) = u a ( t ) , (28)to represent the agent dynamics stated in (16). Under thisnotation, using (26) and (27), the OCP (22) can be stated asmin t f , { u ( t ) } α H (cid:90) t f t o u ( t ) dt + φ ( t f ) subject to ˙ x = Ax ( t ) + Bu ( t ) , x ( t o ) = [ , ] T , x ( t f ) = [ y i j , ] T . (29)The last two constraints in (29) are simply terminal con-straints for the agent motion on the trajectory segment ( i , j ) .Note that (29) is a standard free final time, fixed initialand final state optimal control problem. Hence, there is anestablished solution procedure [20] as outlined next.First, the Hamiltonian corresponding to (29) is written as H ( x ( t ) , u ( t ) , t ) (cid:44) α H u ( t ) + λ T ( t )( Ax ( t ) + Bu ( t )) , (30) where λ ( t ) represents the co-state variables. Next, the ad-joined function that combines the terminal constraint on x ( t f ) and the terminal cost φ ( t f ) is written as Φ ( x ( t f ) , t f ) (cid:44) φ ( t f ) + ν T ( x ( t f ) − [ y i j , ] T ) , where ν is a set of multipliers.Finally, the OCP in (29) can be solved to obtain thecorresponding optimal { x ( t ) , u ( t ) , λ ( t ) : t ∈ [ t o , t f ] } , t f and ν values by solving the following system of equations [20]: ∂ H ∂ u = α H u ( t ) + λ T ( t ) B = , (31)˙ λ = − (cid:18) ∂ H ∂ x (cid:19) T = − A T λ ( t ) , λ ( t f ) = (cid:18) ∂ Φ ∂ x ( t f ) (cid:19) T = ν , (32) d Φ dt f + α H u ( t f ) = d φ dt f + ν T ( Ax ( t f ) + Bu ( t f )) + α H u ( t f ) = Lemma 1:
The optimal terminal time t ∗ f of the OCP (29)satisfies the equation: ( t f − t o ) d φ ( t f ) dt f = α H y i j , (34)where φ ( t f ) is known from (26) and the correspondingoptimal control law u ∗ ( t ) is given by u ∗ ( t ) = y i j ( t ∗ f − t o ) (cid:20) t ∗ f + t o − t (cid:21) , ∀ t ≡ [ t o , t f ] . (35) Proof:
First, we take ν = [ ν , ν ] T , λ ( t ) = [ λ ( t ) , λ ( t )] T and solve (32) for λ ( t ) and λ ( t ) . This gives: λ ( t ) = ν and λ ( t ) = ν + ν ( t f − t ) , ∀ t ∈ P i j , (recall P i j = [ t o , t f ] ). We then solve (31) for u ( t ) to obtain: u ( t ) = − λ ( t ) α H = − α H (cid:0) ν + ν ( t f − t ) (cid:1) , ∀ t ∈ P i j . (36)Next, we take x ( t ) = [ x ( t ) , x ( t )] T and solve the agentdynamics equation in (29) (also using (36)) for x ( t ) . Thisresults in: x ( t ) = ( t o − t ) α H (cid:16) ν + ν t f − ν ( t o + t ) (cid:17) , ∀ t ∈ P i j . Now, using the terminal constraint x ( t f ) = ν = − ν ( t f − t o ) /
2. Back substituting this in above x ( t ) we get a further simplified expression for it as x ( t ) = ν α H (cid:0) t − ( t o + t f ) t + t o t f (cid:1) , ∀ t ∈ P i j . Applying this result in the relationship x ( t ) = (cid:82) tt o x ( t ) dt (i.e., agent dynamics) we get: x ( t ) = v ( t − t o ) α H (cid:0) t − ( t f + t o ) t + t o ( t f − t o ) (cid:1) , ∀ t ∈ P i j . imilar to before, using the terminal constraint x ( t f ) = y i j on the above (and via back substituting), we get ν = α H y i j ( t f − t o ) (and ν = − α H y i j ( t f − t o ) , u ( t f ) = y i j ( t f − t o ) ) . (37)Now we are ready to use (33) to solve for the optimal t f value (i.e., t ∗ f ). Note that (33) directly simplifies to the form: d φ dt f + ν u ( t f ) + α H u ( t f ) = , which we can further reduce to the form (using (37)): d φ dt f − α H y i j ( t f − t o ) = , and obtain (34). Finally, the optimal control law u ∗ ( t ) in (35)can be obtained by substituting (37) in (36).Using the optimal terminal time t ∗ f and control u ∗ ( t ) (i.e., u ∗ a ( t ) ) proven in Lemma 1, the optimal energy objectivecomponent of this OCP (i.e., (27)) can be obtained as J eH ( { u ∗ a ( t ) } ) = y i j ( t ∗ f − t o ) . (38)The corresponding optimal sensing objective component(i.e., (26)) is directly given by φ ( t ∗ f ) = J ∗ sH ( t ∗ f − t o ) . Finally,the optimal transit-time value is ρ ∗ i j = t ∗ f − t o . d) Solution of RHCP (18) for U ∗ ia j : As outlined in Fig.4, we now can conclude solving RHCP (18). First, we applythe determined ρ ∗ i j value in (23) to get the optimal controldecisions: τ ∗ j and ¯ τ ∗ j of the control vector U ∗ ia j (18). Remark 3:
Note that τ ∗ j and ¯ τ ∗ j in (23) are piece-wisefunctions of ρ i j (with at most five cases). Hence, J ∗ sH ( ρ i j ) in (23) is also a piece-wise function of ρ i j . Even thoughthis presents a complication to the proposed RHCP (18)solution process, it can be resolved by considering one case(of J ∗ sH ( ρ i j ) ) at a time when the corresponding OCP (22) issolved. Then, the resulting optimal transit-time value ρ ∗ i j canbe used to ensure the validity as well as the optimality ofthe considered case of J ∗ sH ( ρ i j ) (compared to other cases).Among the remaining control decisions in U ∗ ia j (18), wehave already found the optimal tangential acceleration profilesegment { u ∗ a ( t ) : t ∈ P i j } . Integrating this, the correspondingtangential velocity profile segment can be obtained as v ∗ a ( t ) = y i j ( ρ ∗ i j ) ( t − t o )( t o + ρ ∗ i j − t ) , ∀ t ∈ P i j . (39)Finally, the optimal angular velocity profile segment { w ∗ a ( t ) : t ∈ P i j } (required in U ∗ ia j ) can be found using (39) in(15) together with the information about the shape of thetrajectory segment ( i , j ) . Remark 4:
Note that the OCP (29) (or (22) in general)only requires the total length y i j value of the trajectorysegment ( i , j ) . The shape of ( i , j ) becomes important onlywhen w ∗ a ( t ) has to be determined to facilitate the agent’sdeparture from target i to reach target j (i.e., at the end of an RHCP3 solving process). Therefore, even though we initiallyassumed the shapes of trajectory segments as prespecified, the proposed RHC framework can adapt even if they changeoccasionally. For instance, a new class of external events(similar to C j and ¯ C j ) can be defined based on such trajectorysegment shape change events - to make agents react to suchevents. This flexibility is an advantage as the shape of atrajectory segment may have to be designed (by an upper-level trajectory planner) taking into account moving obstaclesand other agents in the mission space as well as the agent’sown motion and controller constraints. e) Solution of RHCP (19) for j ∗ : We now havesolved RHCP (18) and have obtained the optimal controlvector U ∗ ia j corresponding to the next-visit target j . Next, thisprocess should be repeated for all the neighboring targets j ∈ N i to get the control vectors: { U ia j : j ∈ N i } . Finally,the optimal next-visit target j ∗ can be found from (19) as j ∗ = arg min j ∈ N i J H ( X ia ( t s ) , U ∗ ia j ; w ( U ia j )) .Upon solving RHCP3 , agent a departs from the target i and starts following the trajectory segment ( i , j ) whileexecuting the obtained optimal agent controls until it arrivesat the target j ∗ . According to the proposed RHC architecture,upon arrival, the agent will solve an instance of RHCP1 . B. Solution of
RHCP1
We now directly consider
RHCP1 as it encompasses
RHCP2 and is the most general form of the RHCP ((18)-(19)) in that no active or idle time is restricted to zero. Inthis case, the planning horizon w ( U ia j ) is same as in (17).Similar to before, we next solve (18) (via RHCP( ρ i j ) (21)and OCP (22)) and (19) to obtain the solution of RHCP1 . a) Solution of RHCP( ρ i j ) (21) : As mentioned before,RHCP( ρ i j ) corresponding to the RHCP1 has already beensolved in [8] to obtain: ( τ ∗ i , ¯ τ ∗ i , τ ∗ j , ¯ τ ∗ j )= arg min ( τ i , ¯ τ i , τ j , ¯ τ j ) ∈ U s ( ρ ij ) J sH ( τ i , ¯ τ i , τ j , ¯ τ j ; ρ i j ) J ∗ sH ( ρ i j ) = J sH ( τ ∗ i , ¯ τ ∗ i , τ ∗ j , ¯ τ ∗ j ; ρ i j ) (40)where J H ( τ i , ¯ τ i , τ j , ¯ τ j ; ρ i j )= C τ i + C ¯ τ i + C τ j + C ¯ τ j + C τ i ¯ τ i + C τ i τ j + C τ i ¯ τ j + C ¯ τ i τ j + C ¯ τ i ¯ τ j + C τ j ¯ τ j + C τ i + C ¯ τ i + C τ j + C ¯ τ j + C τ i + ¯ τ i + ρ i j + τ j + ¯ τ j , C = ¯ A − B i , C = ¯ A i , ¯ A i = ¯ A − A i , C = ¯ A − B j , C = ¯ A j , C = C = ¯ A i , C = ¯ A − B i , C = ¯ A j − B i , C = ¯ A i j = ¯ A i − A j , C = ¯ A j , C = [ ¯ R ( t O ) + ( ¯ A − B i ) ρ i j ] , C = [ ¯ R i ( t O ) + ¯ A i ρ i j ] , ¯ R i = ¯ R − R i , C = [ ¯ R ( t O ) + ¯ A ρ i j ] , C = [ ¯ R j ( t O ) + ¯ A j ρ i j ] , C = ρ i j [ R ( t O ) + ¯ A ρ i j ] . (41)Explicit expressions of τ ∗ i , ¯ τ ∗ i , τ ∗ j and ¯ τ ∗ j (each as a function of ρ i j ) are determined using the rational function optimizationtechnique proposed in [8, App. A]. However, due to spaceconstraints, we omit giving their exact forms. ) Objective Function of OCP (22) : The sensingobjective component of OCP: J ∗ sH ( t f − t o ) now can be ex-pressed explicitly using the J ∗ sH ( ρ i j ) function in (40) andthe relationship ρ i j = t f − t o . Note however that in RHCP1 ,both t o and t f are free. Therefore, let us denote this sensingobjective component of OCP as φ ( t o , t f ) (cid:44) J ∗ sH ( t f − t o ) . (42)However, the energy objective component of OCP: J eH ( { u a ( t ) } ) in RHCP1 takes the same form as in (27). c) Solution of OCP (22) : Similar to before, using ˙ x = Ax ( t ) + Bu ( t ) with (28) to represent the agent dynamics (16),the OCP corresponding to the RHCP1 can be stated asmin t o , t f , { u ( t ) } α H (cid:90) t f t o u ( t ) dt + φ ( t o , t f ) subject to ˙ x = Ax ( t ) + Bu ( t ) , x ( t o ) = [ , ] T , x ( t f ) = [ y i j , ] T . (43)Note that (43) is a standard free initial and final time, fixedinitial and final state optimal control problem. Hence, similarto (29), there is an established solution procedure [20] asoutlined next.First, note that Hamiltonian corresponding to (43) takesthe same form as in (30). Next, the adjoined function thatcombines the terminal constraints on x ( t o ) and x ( t f ) with theterminal cost φ ( t o , t f ) are written as Φ ( x ( t o ) , t o , x ( t f ) , t f ) (cid:44) φ ( t o , t f )+ ν To x ( t o ) + ν Tf ( x ( t f ) − [ y i j , ] T ) (44)where ν o and ν f are sets of multipliers.Finally, the OCP in (43) can be solved to obtain thecorresponding optimal { x ( t ) , u ( t ) , λ ( t ) : t ∈ [ t o , t f ] } , t o , t f , ν o and ν f values by solving the following system of equations[20]: ∂ H ∂ u = α H u ( t ) + λ T ( t ) B = , (45)˙ λ = − (cid:18) ∂ H ∂ x (cid:19) T = − A T λ ( t ) , λ ( t o ) = − (cid:18) ∂ Φ ∂ x ( t o ) (cid:19) T = − ν o , (46) λ ( t f ) = (cid:18) ∂ Φ ∂ x ( t f ) (cid:19) T = ν f , (47) ∂ Φ ∂ t o − H | t = t o = ∂ φ∂ t o − α H u ( t o ) + ν To ( Ax ( t o ) + Bu ( t o )) = , (48) ∂ Φ ∂ t f + H | t = t f = ∂ φ∂ t f + α H u ( t f ) + ν Tf ( Ax ( t f ) + Bu ( t f )) = Lemma 2:
The optimal transit time ρ ∗ i j = t ∗ f − t ∗ o of theOCP (43) satisfies the equation ρ i j d φ ( t o , t f ) d ρ i j = α y i j , (50)where φ ( t o , t f ) (42) is considered as a function of ρ i j = t f − t o . Thus, the optimal terminal times t ∗ o and t ∗ f of (43) are t ∗ o = t s + τ ∗ i ( ρ ∗ i j ) + ¯ τ ∗ i ( ρ ∗ i j ) and t ∗ f = t o + ρ ∗ i j , (51)respectively ( τ ∗ i ( ρ i j ) and ¯ τ ∗ i ( ρ i j ) are as in (40)). The corre-sponding optimal control law u ∗ ( t ) of (43) is given by u ∗ ( t ) = y i j ( t ∗ f − t ∗ o ) (cid:20) t ∗ f + t ∗ o − t (cid:21) , ∀ t ∈ [ t o , t f ] . (52) Proof:
The proof follows the same steps as that ofLemma 1 and is, therefore, omitted. However, the mainsteps are: (i) solve for λ ( t ) , u ( t ) and x ( t ) , ∀ t ∈ [ t o , t f ] using(46), (45) and agent dynamics, respectively, in that order,in terms of t o , t f , ν o and ν f . (ii) use the terminal constraint x ( t f ) = [ y i j , ] T and (46) to determine ν o , ν f in terms of t o , t f and (iii) solve for t o and t f using (47) and (48).Using the optimal terminal times t ∗ f , t ∗ o and the control u ∗ ( t ) (i.e., u ∗ a ( t ) ) proven in Lemma 2, the optimal energyobjective component of this OCP (43) can be evaluated as J eH ( { u ∗ ( t ) } ) = y i j ( t ∗ f − t ∗ o ) . (53)The corresponding sensing objective component (42) is di-rectly given by φ ( t ∗ o , t ∗ f ) = J ∗ sH ( t ∗ f − t ∗ o ) = J sH ( ρ ∗ i j ) . d) Solution of RHCP (18) for U ∗ ia j : We now concludethe solution process of RHCP (18) (outlined in Fig. 4)by applying the determined optimal transit-time ρ ∗ i j in (40)to obtain the optimal control decisions τ ∗ i , ¯ τ ∗ i , τ ∗ j and ¯ τ ∗ j included in the control vector U ∗ ia j (18). Note that here it isnot necessary to evaluate the optimal agent angular velocityprofile { w ∗ a ( t ) : t ∈ [ t ∗ o , t ∗ f ] } (unlike in RHCP3 ) as the agentdoes not plan to depart from the current target immediately. e) Solution of RHCP (19) for j ∗ : We now havesolved the RHCP (18) and have obtained the optimal controlvector U ∗ ia j corresponding to a next-visit target j ∈ N i .Executing this process for all j ∈ N i and subsequentlyevaluating (19) gives the optimal next-visit target j ∗ as j ∗ = arg min j ∈ N i J H ( X ia ( t s ) , U ∗ ia j ; w ( U ∗ ia j )) .Upon solving RHCP1 , agent a remains active on target i for a duration of τ ∗ i , and in the meantime, if any otherexternal event such as C j or ¯ C j for some j ∈ N i occurs, itre-computes the remaining active time at target i . However,if the agent completes executing the determined active time(i.e., if the corresponding event [ h → τ ∗ i ] occurs) with R i ( t O + τ ∗ i ) =
0, then, the agent will have to subsequently solve aninstance of
RHCP2 to determine the remaining inactive timeat target i . Otherwise (i.e., if the event [ h → τ ∗ i ] occurs with R i ( t O + τ ∗ i ) > RHCP3 to determine the next-visit target anddepart from target i . Remark 5:
Upon solving
RHCP1 , over the subsequentactive time at target i , the agent can choose to control itsngular velocity w a ( t ) (while keeping u a ( t ) =
0) to adjustits heading θ a ( t ) to accommodate the impending departuretowards the next-visit target j ∗ found in (19). However, theagent can also choose to keep w a ( t ) = { ( i , j ) : j ∈ N i } accordingly.IV. O PTIMAL C ONTROLS FOR F IRST -O RDER A GENTS
In the previous sections, we have proposed an RHC basedsolution for PMN problems that uses energy-aware second-order agents. In this section, for comparison purposes, wefirst present the details of a similar RHC solution [8] thatuses energy-agnostic first-order agents. Subsequently, moti-vated by a few practical qualities that such first-order agentbehaviors (controls) possess, we derive energy-aware controllaws for governing actual second-order agents in a way thatthey imitate first-order agents.In particular, this section explores how the agent controls { u ∗ a ( t ) : t ∈ P i j } derived for energy-aware second-orderagents (2) (given in Lemmas 1, 2) should be modified if weare to replace them with energy-agnostic first-order agentsor with energy-aware second-order agents that imitate first-order agents. Based on the proposed modular RHCP (18)solution process (outlined in Fig. 4), note that, a change in theagent dynamic model will affect only the OCP (22) (not theRHCP( ρ i j ) (21)) - specifically through the energy objectivecomponent J eH ( { u a ( t ) } ) . Therefore, in this section, our mainfocus is on re-stating (and solving) the OCP (22) assumingits sensing objective component J ∗ sH ( ρ i j ) is given. Note alsothat such a change in the OCP objective will directly affectthe resulting optimal transit-time (i.e., ρ ∗ i j ) and consequentlywill also affect all the other control variables in U ∗ ia j (18).Since we assume the J ∗ sH ( ρ i j ) function is given in thissection, we keep the ensuing analysis independent of theexact RHCP form (i.e., RHCP1 , RHCP2 or RHCP3 ). Tothis end, we start with generalizing the optimal agent controlsestablished for second-order agents in Lemmas 1 and 2, in atheorem. For convenience, let us label the PMN solution thatuses energy-aware actual second-order agents (developed inthe previous sections) as the “SO Method”. a) SO Method : First, note that both equations (34) and(50) are equivalent and thus can be written generally as ρ i j J ∗ sH ( ρ i j ) d ρ i j = α y i j . (54)The ρ i j value that satisfies the above equation is the optimaltransit-time to be used under the SO method (irrespective ofthe RHCP form). Second, note that both (38) and (53) arealso equivalent. This implies that the optimal agent energyconsumption can be expressed independent of the RHCPform. Finally, note that both (35) and (52) represent the sameagent control profile albeit with a shifted starting point t o = t ∗ o (compared to being t o = t s ) in the latter. However, as shown in(51), this starting point t ∗ o depends only on sensing objectiverelated quantities and the determined optimal transit-timevalue. Moreover, since our main focus in this section is on Fig. 5: Tangential velocity profiles on a trajectory segment ( i , j ) ∈ E of length y i j under second-order (SO) and fourdifferent forms of (approximate) first-order (FO-0,1,2,3)agent models. Under the FO- n agent model n ∈ { , , , } , ρ Fn (cid:44) y i j / v mn where v mn is the average velocity, v Fn isthe maximum velocity and u Fn is the maximum absoluteacceleration level.agent controls { u a ( t ) : t ∈ P i j } , without loss of generality,we can assume t o = t ∗ o = P i j = [ , ρ i j ] .With regard to the SO method, let us denote: (i) optimaltransit-time as ρ SO , (ii) optimal tangential velocity andacceleration as v ∗ a ( t ) and u ∗ a ( t ) , respectively, for t ∈ [ , ρ SO ] ,(iii) maximum tangential velocity and acceleration as v SO (cid:44) max { v ∗ a ( t ) } and u SO (cid:44) max { u ∗ a ( t ) } , respectively, and, (iv)optimal agent energy consumption for the transition as E SO . Theorem 3:
Under the SO method: ρ SO is given by (54), v ∗ a ( t ) = y i j ρ SO t ( ρ SO − t ) , u ∗ a ( t ) = y i j ρ SO (cid:18) − t ρ SO (cid:19) , (55)for t ∈ [ , ρ SO ] and v SO = y i j ρ SO , u SO = y i j ρ SO , E SO = y i j ρ SO . (56) Proof:
The results stated in (55) directly follow from(35) and (39). The relationships given in (56) can be obtainedusing (55) (via calculus) and (38).Figure 5 illustrates an example agent tangential velocityprofile segment { v ∗ a ( t ) : t ∈ [ , ρ SO ] } . In the subsequent sub-sections, we explore several alternative approaches to this SOmethod. In particular, each of these alternative methods hasits root in the first-order agent model used in [8], [10] - whereeach agent is assumed to travel at a fixed predefined velocityover each trajectory segment, and thus, does not involve anOCP in RHCPs. We label this approach of controlling agentsas the “FO-0 Method” and an example agent tangentialvelocity profile observed is shown in Fig. 5.However, note that we can neither characterize the totalenergy consumption nor control a real-world agent over sucha tangential velocity profile as in the FO-0 curve in Fig.5 - due to the involved instantaneous infinite accelerations.Therefore, to facilitate a comparison between SO and FO-0 methods, we propose to use actual second-order agents(instead of first-order ones) but enforce each agent controllerto approximate a first-order agent behavior (FO-0). We labelthis approximate version of the FO-0 method as the “FO-1 Method” and a corresponding agent tangential velocityprofile is shown in Fig. 5. ) FO-1 Method : Under the FO-1 method, as shownin Fig. 5, each agent is assumed to go through a sequenceof: constant acceleration (of u F ), constant velocity (of v F )and constant deceleration (of − u F ) stages over a period oflength ρ F every time it travels on a trajectory segment. Inparticular, the acceleration/deceleration magnitude u F andthe average velocity value v m = y i j / ρ F are assumed tobe prespecified, commonly for all ( i , j ) ∈ E . The resultingmaximum velocity level on a trajectory segment ( i , j ) ∈ E isdenoted as v i jF and can be expressed in terms of y i j , u F and v m as v i jF ( u F , v m ) = y ij u F − (cid:113) y ij u F − v m y ij u F v m if y i j ≥ v m u F √ y i j u F otherwise.(57)To conduct a fair comparison between the SO and FO-1 methods, the two parameters u F and v m that define theFO-1 method are selected as follows. First, let us define u maxSO and v maxSO as the respective maximum values of all the u SO and v SO values (empirical) observed in the interested PMNproblem. Then, we propose to enforce: u F = u maxSO and v m = argmax v m > , ( i , j ) ∈ E v i jF ( u maxSO , v m ) subject to v i jF ( u maxSO , v m ) ≤ v maxSO , (58)to ensure the maximum velocity and acceleration valuesresulting from the FO-1 method are identical or as closeas possible to those of the SO method. Proposition 1:
The v m expression given in (58) can besimplified into the form v m = min ( i , j ) ∈ E y i j u maxSO v maxSO ( v maxSO ) + y i j u maxSO subject to y i j ≥ ( v maxSO ) / u maxSO . (59) Proof:
Provided in Appendix C.Note that according to Proposition 1, we need to assumethe given u maxSO and v maxSO satisfy that ∃ ( i , j ) ∈ E with y i j ≥ ( v maxSO ) / u maxSO v m . However, if this assumption does not hold,we simply can use a lower v maxSO value than its actual valuewhen evaluating v m in (59). Note also that the maximumvelocity value observed in the FO-1 method is given by v F = max ( i , j ) ∈ E v i jF ( u maxSO , v m ) , (60)and the agent energy consumption on a trajectory segment ( i , j ) can be proven to be E i jF = u maxSO v i jF ( u maxSO , v m ) . (61)We can now compare the FO-1 and SO methods as wecan compute the total agent energy consumption in the FO-1 method (numerical results are provided in Section V, e.g.,Tab I). We again highlight that the FO-1 method: (i) doesnot consider the agent energy when solving its RHCPs (i.e.,RHCPs do not involve an OCP) and (ii) uses actual second-order agents whose controllers constrained to approximatefirst-order agent behaviors. We conclude our discussion about the FO-1 method with the following remark - which willmotivate us to refine the proposed FO-1 method. Remark 6:
Notice that the optimal second-order agentcontrol u ∗ a ( t ) (55) (in the SO method) decreases linearlyand includes a zero-crossing point (at t = ( t o + t f ) / u a ( t ) = c) FO-2 Method : Even though we have proposed a rea-sonable and consistent way to select the parameters involvedin the FO-1 method (i.e., u F and v m ), it is clear that suchan approach is agnostic to the agent energy consumption.To address this concern, we next propose the FO-2 method,which as shown in Fig. 5, is identical to the FO-1 method inmany ways except for its choice of acceleration/decelerationmagnitude u F and average velocity value v m . In particular,as opposed to selecting u F , v m according to (58), here, anenergy-optimized approach is followed. Theorem 4:
Under the FO-2 method, for a fixed averagevelocity v m = y i j / ρ F , on a trajectory segment ( i , j ) ∈ E ,the optimal agent energy consumption is E F = v m y ij and itis achieved when u F = v m y ij and v F = v m are used. Proof:
Since the total distance traveled by the (FO-2)agent over the period [ , ρ F ] is y i j , we can state that12 (cid:18) ρ F + (cid:18) ρ F − v F u F (cid:19)(cid:19) v F = y i j . (62)Over the same period, the corresponding total agent energyrequirement (denoted as E F ) can be evaluated by integratingthe square of the acceleration profile used. This gives E F = (cid:90) ρ F u ( t ) dt = u F v F u F = u F v F . (63)This expression can be further simplified using (62) to obtain E F = v F v F ρ F − y i j . (64)Recall that both y i j and ρ F (= y i j / v m ) are fixed in thiscase. Therefore, E F in (64) is a function of (only) v F .Thus, we can use calculus to determine the choice of v F that minimizes E F . This (and back substitution) reveals: v F = y i j ρ F , u F = y i j ρ F , E F = y i j ρ F . (65)Finally, this proof can be completed by replacing ρ F termswith y i j / v m in each of the above expressions. Corollary 1: If v m in the FO-2 method is such that y ij v m = ρ SO (i.e., ρ F = ρ SO ), then v F = v SO , u F = u SO , E F = E SO . Proof:
This result directly follow from comparing The-orem 3 (56) with (65).ext, we use Theorem 4 to develop energy-optimizedchoices for the v m and u F parameters of the FO-2 method.However, similar to before, we also use v maxSO and u maxSO values(empirical) as known inputs in this process to make sure themaximum velocity and acceleration values resulting from theFO-2 method are identical or as close as possible to thoseof the SO method.Note that the optimal choices of v F and u F givenin Theorem 4 are dependent on both ( i , j ) ∈ E and v m .Therefore, let us denote those as functions: v i jF ( v m ) = v m , u i jF ( v m ) = v m y i j . (66)Now, we propose to select the parameter v m based on theabove two relationships and the given v maxSO , u maxSO values as v m = arg max v m > , ( i , j ) ∈ E v i jF ( v m ) subject to v i jF ( v m ) ≤ v maxSO , u i jF ( v m ) ≤ u maxSO . (67) Proposition 2:
The v m expression given in (67) can besimplified into the form v m = min { (cid:112) y min u maxSO , v maxSO } (68)where y min = min ( i , j ) ∈ E y i j . Proof:
The proof follows the same steps as that ofProposition 1 and is, therefore, omitted.We point out that even though the average velocity v m computed above is used commonly across all the trajectorysegments, the acceleration/deceleration level of an agent in atrajectory segment ( i , j ) has to be selected as u i jF ( v m ) (66)so as to optimize the agent energy consumption. Hence, theoverall maximum acceleration/deceleration level observed inthe FO-2 method is (via (66)) u F = max ( i , j ) ∈ E u i jF ( v m ) . (69)Consider the scenario where ρ F = ρ SO on a certain tra-jectory segment. In such a case, sensing objective wise, bothFO-2 and SO methods perform equally. However, Corollary1 states that energy objective wise, the FO-2 method showsa 12 .
5% loss (i.e., a higher energy consumption) comparedto the SO method. Moreover, recall that the FO-2 methoddoes not consider the energy expenditure when solving itsRHCPs (i.e., no OCP is involved, similar to FO-0 and FO-1methods). To mitigate these two obvious disadvantages, wenext propose the FO-3 method - where we try to optimize theenergy objective further compromising the sensing objectivein an OCP. d) FO-3 Method : As shown in Fig. 5, the FO-3 methodhas similarly shaped agent state trajectories to FO-1 and FO-2 methods. However, as we will see next, the FO-3 methoddoes not involve any parameter that needs to be selectedbased on external information like u maxSO and v maxSO .On the other hand, note that in the FO-2 method, theoptimal agent energy consumption E F (65) is inversely proportional to the transit-time ρ F . Motivated by this, theFO-3 method proposes to use a larger transit-time ρ F ≥ ρ F (see Fig. 5) compromising the sensing objective so as toachieve a better (lower) energy objective. However, to makethis trade-off a profitable one (in terms of the total objective(3)) we need to use the OCP (22).Note that we assume J ∗ sH ( ρ i j ) (i.e., the sensing objectivecomponent of the OCP (22)) as a known function in thissection. Therefore, the sensing objective component of OCP(22) under the FO-3 method can be written as J ∗ sH ( ρ F ) . Onthe other hand, under the FO-3 method, the energy objectivecomponent of the OCP can be written as J eH ( { u a ( t ) } ) = E F (cid:44) y ij ρ F (using E F in (65) and replacing ρ F with ρ F ).Hence, the objective function of the OCP (22) under the FO-3 method is J H = α H E F + J ∗ sH ( ρ F ) = α H y i j ρ F + J ∗ sH ( ρ F ) . (70) Theorem 5:
Under the FO-3 method, the optimal transit-time is ρ i j = ρ F that satisfies the equation: ρ i j dJ ∗ sH ( ρ i j ) d ρ i j = α y i j . (71)The corresponding optimal values of v F , u F and E F are v F = y i j ρ F , u F = y i j ρ F , E F = y i j ρ F , (72)i.e., v F = k v SO , u F = k u SO , E F = k E SO where k (cid:44) ρ F ρ SO . Proof:
The OCP objective J H given in (70) dependsonly on the choice of ρ F . Therefore, the optimal ρ F value that minimizes J H can be found using the equation dJ H d ρ F =
0, which translates into (71). Since both FO-2 andFO-3 methods assume structurally similar velocity profiles,we still can use Theorem 4 in the context of FO-3 afterreplacing ρ F with ρ F . In this way, (72) (and the remainingresults) can be obtained using (65) (and Theorem 3 (56)). (cid:4) Note that even though equations (71) and (54) are struc-turally similar, their subtle difference (in the coefficienton the RHS) causes the FO-3 method to have a differenttransit-time value compared to the SO method. Based on thedifference between (71) and (54), we can anticipate ρ F > ρ SO (also, as we intended in the first place). In such a case,the parameter k defined in Theorem 5 follows k >
1. Thisimplies that (from Theorem 5) v F < v SO and u F < u SO , i.e.,the FO-3 method requires smaller velocity and accelerationvalues compared to the SO method.As shown in Appendix D and E, this analysis regarding theoptimal (approximate) first and second-order agent behaviorson trajectory segments can be extended effortlessly for sce-narios where agents have additional velocity and accelerationconstraints. V. N UMERICAL R ESULTS
In this section, we first explore the nature of individual
RHCP3 and
RHCP1 solutions presented in Section III undersecond-order agents (i.e., SO method). Then, we compare theerformance metrics J T , J e and J s defined in (3) obtained forseveral different PMN problem configurations (shown in Fig.13) under different agent control methods: SO, FO-1, FO-2and FO-3. A. Numerical Results for a
RHCP3
To numerically evaluate quantities relating to a
RHCP3 and its solution, we chose target parameters as A m = B m = , ∀ m ∈ N i with | ¯ N i | =
3. Moreover, the default values of α , R j ( t s ) , y i j and H are chosen respectively as 0 . , , RHCP3 solution (i.e., v a ( t ) , u a ( t ) , J sH , J eH and J H ) changes when thethree parameters α , R j ( t s ) and y i j are varied.Figure 6 confirms that by increasing the weight factor α (i.e., by giving more weight to the energy objective), wecan constrain the agent tangential velocity and accelerationprofiles. A converse behavior can be seen in Fig. 7 withrespect to the next-visit target j ’s initial uncertainty R j ( t s ) . Inparticular, when R j ( t s ) is high, the agent is required to arriveat target j quickly (resulting in high tangential velocity andacceleration levels). In contrast, Fig. 8 reveals that when thetrajectory segment length y i j is varied, the agent may not (a) Tan. Vel.: v a ( t ) (b) Tan. Acc.: u a ( t ) (c) Cost: J sH , J eH , J H Fig. 6:
RHCP3 solution under different weight factors (i.e., α in(3)) values. (a) Tan. Vel.: v a ( t ) (b) Tan. Acc.: u a ( t ) (c) Cost: J sH , J eH , J H Fig. 7:
RHCP3 solution under different initial target uncer-tainty (i.e., R j ( t s ) ) values. (a) Tan. Vel.: v a ( t ) (b) Tan. Acc.: u a ( t ) (c) Cost: J sH , J eH , J H Fig. 8:
RHCP3 solution under different trajectory segmentlength (i.e., y i j ) values. try to significantly regulate: (i) the arrival time at target j (i.e., the transit-time ρ i j = t f ) or (ii) the magnitude of themaximum tangential acceleration. B. Numerical Results for a
RHCP1
Similar to before, to numerically evaluate quantities relat-ing to a
RHCP1 and its solution, we use the same parametervalues mentioned before, along with an additional default (initial) value R i ( t s ) =
50. Figures 9 - 12 respectively showhow the
RHCP1 solution (i.e., v a ( t ) , u a ( t ) , J sH , J eH and J H )changes when the four parameters α , R j ( t s ) , y i j and R i ( t o ) are varied.The RHCP1 solution properties illustrated in Figs. 9-11are identical to those of the
RHCP3 (shown in Figs. 6-8),except for the fact that now t o > t s = t o is theplanned time to leave the target i ). However, Figs. 9-11 implythat t o is independent of the α , R j ( t s ) or y i j value. In contrast,Fig. 12 reveals that t o is directly proportional to the R i ( t s ) value. Moreover, Fig. 12 shows that the maximum valuesof tangential velocity and acceleration decreases by a smallmargin when R i ( t s ) is increased. This implies that the agentplans to travel less urgently when it has to do more “sensing”at the current target i . C. Overall Performance in PMN Problems
In this final section, we compare the performance metrics J T , J e and J s defined in (3) obtained for several different PMNproblem configurations using agents behaving under: (i) SO,(ii) FO-1, (iii) FO-2 and (iv) FO-3 methods. In addition to J T , J e and J s , we also use the performance metrics: v max (cid:44) max a ∈ A , t ∈ [ , T ] v a ( t ) and u max (cid:44) max a ∈ A , t ∈ [ , T ] | u a ( t ) | , (73)to represent overall agent behaviors rendered by dif-ferent agent models. The proposed RHC solution to (a) Tan. Vel.: v a ( t ) (b) Tan. Acc.: u a ( t ) (c) Cost: J sH , J eH , J H Fig. 9:
RHCP1 solution under different α (in (3)) values. (a) Tan. Vel.: v a ( t ) (b) Tan. Acc.: u a ( t ) (c) Cost: J sH , J eH , J H Fig. 10:
RHCP1 solution under different R j ( t s ) values.ABLE I: A comparison of performance metrics: J e , J s , J T (defined in (3)) and v max , u max (defined in (73)) observed underdifferent agent control methods SO, FO-1, FO-2 and FO-3 for each PMN Problem Configuration (PC) shown in Fig. 13. PC J T J e / J s v max u max SO FO-1 FO-2 FO-3 SO FO-1 FO-2 FO-3 SO FO-1 FO-2 FO-3 SO FO-1 FO-2 FO-3 SO FO-1 FO-2 FO-3PC1
PC2 363.1 1084.5 252.9
PC3
PC4 705.7 1212.8 721.4
PC5
PC6
PC7
PC8
Avg. (a) Tan. Vel.: v a ( t ) (b) Tan. Acc.: u a ( t ) (c) Cost: J sH , J eH , J H Fig. 11:
RHCP1 solution under different y i j values. (a) Tan. Vel.: v a ( t ) (b) Tan. Acc.: u a ( t ) (c) Cost: J sH , J eH , J H Fig. 12:
RHCP1 solution under different R i ( t s ) values. (a) PC1 (b) PC2 (c) PC3 (d) PC4(e) PC5 (f) PC6 (g) PC7 (h) PC8 R i ( t ) , J a ( , t ) and s a ( t ) ), respectively. Since these threequantities are time-dependent, in the figures, only theirterminal state (i.e., at t = T ) is shown when the highestperforming agent model (control method) is used.The parameters of each PC have been chosen as follows: A i = , B i = , R i ( ) = . , ∀ i ∈ T and target locations(i.e., Y i ) are specified in each PC figure. In all PCs, targetshave been placed inside a 600 ×
600 mission space. The timehorizon was set to T = s a ( ) = Y i with i = + ( a − ) ∗ round ( M / N ) . The upper bound on the planning horizon(i.e., H ) was chosen as H = T =
250 and the weight factor α in (3) was chosen as α = . × − .Obtained comparative results are summarized in Tab. I.According to these results, on average, the energy-awaresecond-order agents (i.e., the SO method) have outperformedthe energy-agnostic and energy-aware versions of first-orderagents (i.e., FO-1 and FO-2, FO-3 methods, respectively) interms of sensing objective J s , energy objective J e as well asthe total objective J T .However, the energy-aware (approximate) first-order agentcontrol method FO-3 has shown relatively closer (within1 .
17% on average) performance levels to those of the SOmethod. Moreover, the FO-3 method has outperformed theSO method in terms of the performance metrics u max and v max . This observation is reasonable because the motivationbehind developing the FO-3 method was to improve theagent energy consumption. Recall also that we already haveproven in Theorem 5 that v F = k v SO , u F = k v SO with k ≥ ONCLUSION
This paper considers the persistent monitoring problemdefined on a network of targets that needs to be monitoredby a team of energy-aware dynamic agents. Starting froman existing event-driven receding horizon control (RHC)solution, we exploit optimal control techniques to incorporateagent dynamics and agent energy consumption into theHC problem setup. The proposed overall RHC solution iscomputationally efficient, distributed, on-line and gradient-free. Numerical results are provided to highlight the improve-ments with respect to an RHC solution that uses energy-agnostic first-order agents. Ongoing work aims to combinethe proposed solution with a path planning algorithm toaddress situations where the agent trajectory segment shapeshave to be optimally determined.A
PPENDIX
A. Selecting the Weight Factor: α The weight factor α present in both the main objective J T (3) and the RHCP objective J H (20) is an important factorthat decides the trade-off between energy objective and thesensing objective components (i.e., J eH and J sH , respectively,in the latter case). Moreover, note that α can be used tobound the resulting optimal agent velocities and accelerationsfrom the proposed RHC solution. Therefore, it is importantto have an intuitive technique to select (and vary) α ∈ [ , ∞ ) .To develop such a technique, we use the RHCP form(20): J H = α J eH + J sH , rather than the main optimizationproblem form (3). A typical RHCP objective function J H that considers both energy and sensing objectives (i.e., J eH and J sH , respectively) can be written as J H = β J eH E maxH + ( − β ) J sH S maxH , (74)where E maxH and S maxH are upper-bounds to the terms J eH and J sH respectively and β is a parameter such that β ∈ [ , ] .Next, let us re-arrange the above expression to isolate thesensing objective component as J H = (cid:20) β − β S maxH E maxH (cid:21)(cid:124) (cid:123)(cid:122) (cid:125) α J eH + J sH . (75)Now, if the ratio S maxH / E maxH is known, a candidate α valuecan be obtained intuitively by selecting β ∈ [ , ) appro-priately. For an example, selecting β = . S maxH and E maxH we canconsider a simple RHCP that occurs when an agent a isready to leave a target i with a (single) neighboring target j connected through a trajectory segment ( i , j ) . For such ascenario, assuming steady-state operation, using Theorem 1,we can show that S maxH ∝ ρ i j . Next, let us define quantities E maxH , v max and u max based on Theorem 3 (56) as E maxH (cid:44) E SO ∝ y i j ρ i j , v max (cid:44) v SO ∝ y i j ρ i j , u max (cid:44) u SO ∝ y i j ρ i j . Combining S maxH , E maxH together with v max or (alternatively) u max stated above, we can show that S maxH E maxH ∝ y i j v max or S maxH E maxH ∝ u max , (76)respectively. Here, v max and u max can be thought of as thepreferred tangential velocity and acceleration bounds for the agents, respectively. And y i j can be thought of as themean trajectory segment length over all ( i , j ) ∈ E . Finally,neglecting the constants of proportionality in the abovestatements and using (75), we can state α as α = β − β y i j v max or α = β − β u max . (77)This result (77) provides a systematic way to select α while accounting for: (i) the relative balance between sensingand energy objectives (via β ∈ [ , ] ) and (ii) the preferredtangential velocity and acceleration bounds (via v max and u max , respectively). For an example, with β = . , v max = , y i j =
25, the relationship in (77) gives α = . × − . B. Proof of Theorem 2
First, we transform the parametric form { ( x ( p ) , y ( p )) : p ∈ [ p o , p f ] } of the trajectory segment shape in to theform { ( x ( l ) , y ( l )) : l ∈ [ , y i j ] } where l represents the dis-tance along the trajectory segment from ( x ( p o ) , y ( p o )) to ( x ( p ) , y ( p )) , p ∈ [ p o , p f ] (recall that y i j is the total lengthof the interested trajectory segment ( i , j ) ∈ E ). To achievethe said transformation, we should be able to express theparameter p explicitly in terms of the distance l . For thispurpose, exploiting the geometry (see also Fig. 2), we canwrite a differential equation: dl = (cid:113) ( x (cid:48) p ) + ( y (cid:48) p ) d p . (78)Under assumption 2, (78) can be solved to obtain explicitrelationships: l = f ( p ) and p = f − ( l ) , where f : [ p o , p f ] → [ , y i j ] is as in (12). Thus, we now can express the trajectorysegment shape in the form: { ( x ( l ) , y ( l )) : l ∈ [ , y i j ] } .Second, according to Fig. 2, note that when the agent a ∈ A is at s a ( t ) ≡ ( x ( l ) , y ( l )) , its orientation θ satisfiestan θ = ˙ y ( l ) ˙ x ( l ) = dy ( l ) dl dldtdx ( l ) dl dldt = y (cid:48) x (cid:48) . (79)In the above, the notation “ (cid:48) ” (without a subscript) has beenused here to represent the d · dl operator. The time derivativeof this relationship givessec θ d θ dt = ( x (cid:48) y (cid:48)(cid:48) − y (cid:48) x (cid:48)(cid:48) )( x (cid:48) ) dldt . (80)Note that if l = l a ( t ) is used to represent the total distancethe agent traveled on the trajectory segment by the time t ∈ [ t o , t f ] , we can also write dldt = v a ( t ) (i.e., the agenttangential velocity) and d θ dt = w a ( t ) (i.e., the agent angularvelocity). Therefore, using the above two relationships andthe trigonometric identity: sec θ = + tan θ , we can obtain w a ( t ) for any t ∈ [ t o , t f ] as w a ( t ) = x (cid:48) y (cid:48)(cid:48) − y (cid:48) x (cid:48)(cid:48) ( x (cid:48) ) + ( y (cid:48) ) (cid:124) (cid:123)(cid:122) (cid:125) G ( l ) v a ( t ) . (81)Here, note that the first term G ( l ) is a function of l = l a ( t ) .inally, we can transform this G ( l ) term in (81) to obtaina function of parameter p , using the following relationships(from the chain rule and the fact that l = f ( p ) ): x (cid:48) = x (cid:48) p f (cid:48) p , y (cid:48) = y (cid:48) p f (cid:48) p , x (cid:48)(cid:48) = x (cid:48)(cid:48) p f (cid:48) p − x (cid:48) p f (cid:48)(cid:48) p ( f (cid:48) p ) , y (cid:48)(cid:48) = y (cid:48)(cid:48) p f (cid:48) p − y (cid:48) p f (cid:48)(cid:48) p ( f (cid:48) p ) , (82)and (using (12)) f (cid:48) p = (cid:113) ( x (cid:48) p ) + ( y (cid:48) p ) . (83)Recall that, in the above, we have used the notation “ (cid:48) ”(with a subscript p ) to denote the operator d · dp . Now, using(82) and (83), G ( l ) in (81) can be written as G ( l ) = G ( f ( p )) = x (cid:48) p y (cid:48)(cid:48) p − y (cid:48) p x (cid:48)(cid:48) p (cid:0) ( x (cid:48) p ) + ( y (cid:48) p ) (cid:1) . (84)Comparing the above result with (13), notice that G ( l ) = F ( p ) . Therefore, (81) can be written as w a ( t ) = F ( p ) v a ( t ) where now p can be replaced with p = f − ( l ) = f − ( l a ( t )) to obtain (15): w a ( t ) = F ( f − ( l a ( t ))) v a ( t ) , which completes the proof. C. Proof of Proposition 1
It is easy to show that v i jF ( u maxSO , v m ) (57) is a monoton-ically increasing function with respect to v m . In particular,if v m > (cid:112) y i j u maxSO , the function v i jF ( u maxSO , v m ) plateaus at alevel (cid:112) y i j u maxSO . Therefore, the set of v m values that satisfiesthe inequality v i jF ( u maxSO , v m ) ≤ v maxSO can be stated as v m ≤ v i jm (cid:44) y ij u maxSO v maxSO ( v maxSO ) + y ij u maxSO if y i j ≥ ( v maxSO ) u maxSO ∞ otherwise. (85)According to (58), the inequality v i jF ( u maxSO , v m ) ≤ v maxSO shouldhold for all ( i , j ) ∈ E . Therefore, the feasible set of v m in(58) is: v m ≤ min ( i , j ) ∈ E v i jm . Again, using the monotonicityproperty of v i jF ( u maxSO , v m ) (which is also the the objectivefunction of (58)), we can show that the optimal v m value(i.e., v m ) of (58) is the maximum feasible v m value, i.e., v m = min ( i , j ) ∈ E v i jm = min ( i , j ) ∈ E y i j u maxSO v maxSO ( v maxSO ) + y i j u maxSO subject to y i j ≥ ( v maxSO ) / u maxSO . (86) D. Second-Order Agent Models with Constraints
In this section, we show how a second-order agent a ∈ A should select its behavior (including the transit-time) on atrajectory segment when solving a RHCP under tangentialvelocity or acceleration bounds. a) SO-V Method : The SO-V method assumes that theagent tangential velocity is bounded such that | v a ( t ) | ≤ ¯ v where ¯ v is predefined and satisfies ¯ v < v SO = y ij ρ SO (recall that ρ SO is the optimal transit-time found for the unconstrainedSO method). Based on the optimal unconstrained velocityprofile (55), we can expect the optimal constrained velocityprofile to contain three different phases: two quadratic seg-ments at the beginning and the end and a constant velocitysegment in the middle, as shown in Fig. 14.A generalized version of the optimal unconstrained veloc-ity profile (55) can be written as v ( t ) = α t ( β − t ) , t ∈ [ , β ] where β can be thought of as controllable parameter and α = y ij β (enforcing the condition: (cid:82) β v ( t ) dt = y i j ). We nextuse this v ( t ) profile to construct the optimal constrainedvelocity profile v a ( t ) as v a ( t ) (cid:44) v ( t ) t ∈ [ , t ) ¯ v t ∈ [ t , t ) v ( t − ( ρ SV − β )) t ∈ [ t , ρ SV ] , (87)where t is such that v ( t ) = ¯ v (existence of such a t isguaranteed when β ≤ ρ SO ), t = ρ SV − t (from symmetry)and the transit-time ρ SV is such that (cid:82) ρ SV v a ( t ) dt = y i j . Inparticular, it can be shown that t = β (cid:16) − (cid:0) − β t v (cid:1) (cid:17) , ρ SV = β + t v (cid:0) − β t v (cid:1) , (88)where t v (cid:44) y ij v . We highlight that the agent velocity profile v a ( t ) defined in (87) depends only on the parameter β .Fig. 14: Tangential velocity profiles on a trajectory segment ( i , j ) ∈ E under unconstrained (SO) and constrained (SO-V)second-order agent models.Under the SO-V method, the sensing objective componentof the OCP (22) is J ∗ sH ( ρ SV ) and the energy objectivecomponent of the OCP can be written as E SV (cid:44) (cid:90) ρ SV ( dv a ( t ) dt ) dt = y i j β (cid:16)(cid:0) − β t v (cid:1) − (cid:17) . (89)Therefore, the OCP objective that needs to be optimized ina RHCP under the SO-V method is J H = α E SV + J ∗ sH ( ρ SV ) . (90)Thus, the optimal transit-time ρ SV (and hence the optimal β value via (88)) can be found using: dJ H d ρ SV = α dE SV d β / d ρ SV d β + dJ ∗ sH ( ρ SV ) d ρ SV = . (91)s shown in Fig. 4, note that finding the optimal transit-time corresponding to the OCP (22) enables determining theremaining control inputs in U ∗ ia j of the RHCP (18). b) SO-A Method : The SO-A method assumes that theagent tangential acceleration is bounded such that | u a ( t ) | ≤ ¯ u where ¯ u is predefined and satisfies ¯ u < u SO = y ij ρ SO . Based onthe optimal unconstrained acceleration profile (55), we canexpect the optimal constrained acceleration profile to be acomposition of three stages: two constant acceleration ses-sions at the beginning and the end and a linearly decreasingacceleration session in the middle, as shown in Fig. 15.In particular, the optimal constrained acceleration profile u a ( t ) can be written as u a ( t ) (cid:44) ¯ u t ∈ [ , t ] ¯ u − β ( t − t ) t ∈ [ t , t ] − ¯ u t ∈ [ t , ρ SA ] , (92)where t , t are switching times such that v a ( t ) = v SA and ρ SA is the transit-time. Using the symmetry and the relationship (cid:82) ρ SA v a ( t ) dt = y i j , it can be shown that v SA = (cid:115) y i j ¯ u − ¯ u β , ρ SA = u (cid:115) y i j ¯ u − ¯ u β + ¯ u β . (93)Notice that β is a controllable parameter that fully definesthe optimal constrained acceleration profile in (92).Fig. 15: Tangential acceleration profiles on a trajectorysegment ( i , j ) ∈ E under unconstrained (SO) and constrained(SO-A) second-order agent models.Under the SO-A method, the sensing objective componentof a RHCP is J ∗ sH ( ρ SA ) and the energy objective componentof the OCP (22) can be written as E SA (cid:44) (cid:90) ρ SA u a ( t ) dt = u (cid:115) y i j ¯ u − ¯ u β + ¯ u α . (94)Therefore, the composite objective function of the OCPunder the SO-A method is J H = α E SA + J ∗ sH ( ρ SA ) . (95)Thus, the optimal transit-time ρ SA (and hence the optimal β value via (93)) can be found using the equation: dJ H d ρ SA = α dE SA d β / d ρ SA d β + dJ ∗ sH ( ρ SA ) d ρ SA = . (96) E. First-Order Agent Models with Constraints
In this section, we investigate how a first-order agent a ∈ A should select its behavior (including the transit-time) on atrajectory segment when solving an RHCP under tangentialvelocity or acceleration bounds. a) FO-V Method : The FO-V method assumes that theagent tangential velocity is bounded such that v a ( t ) ≤ ¯ v where¯ v is predefined and satisfies ¯ v < v F = y ij ρ F (recall that ρ F isthe transit-time found for the unconstrained FO-3 method).Under this constrained setting, the optimal agent tangentialvelocity profile is shown in Fig. 16 where u FV is a control-lable parameter. Taking the corresponding transit-time as ρ FV and using the fact that (cid:82) ρ FV v a ( t ) dt = y i j , it can be shownthat u FV = ¯ v ¯ v ρ FV − y i j . (97)Similar to before, under this FO-V method, the sensingobjective component of the OCP (22) is J ∗ sH ( ρ FV ) and theenergy objective component of a RHCP can be written as E FV = v ¯ v ρ FV − y i j . (98)The composite objective function that needs to be optimizedthe OCP (22) under the FO-V method is J H = α E FV + J ∗ sH ( ρ FV ) . (99)Therefore, the optimal transit-time ρ FV (and hence the opti-mal u FV value via (97)) can be found using: dJ H d ρ FV = α dE FV d ρ FV + dJ ∗ sH ( ρ FV ) d ρ FV = . (100)Fig. 16: Tangential velocity profiles on a trajectory segment ( i , j ) ∈ E under unconstrained (FO-3) and constrained (FO-V) first-order agent models. b) FO-A Method : The FO-A method assumes that theagent tangential acceleration is bounded such that | u a ( t ) | ≤ ¯ u where ¯ u is predefined and satisfies ¯ u < u F = y ij ρ F .Under this constrained setting, the optimal agent tangentialvelocity profile is shown in Fig. 17 where v FA is a control-lable parameter. Taking the corresponding transit-time as ρ FA and using the fact that (cid:82) ρ FA v a ( t ) dt = y i j , it can be shown that v FA = ρ FA ¯ u − ¯ u (cid:114) ρ FA − y i j ¯ u . (101)Following the same procedure as before, under this FO-A method, the sensing objective component of the OCP is ∗ sH ( ρ FA ) and the energy objective component of a RHCPcan be written as E FA = ¯ u (cid:0) ρ FA − (cid:114) ρ FA − y i j ¯ u (cid:1) . (102)The composite objective function that needs to be optimizedin a RHCP under the FO-A method is J H = α E FA + J ∗ sH ( ρ FA ) . (103)Therefore, the optimal transit-time ρ FA (and hence the opti-mal v FA value via (101)) can be found using: dJ H d ρ FA = α dE FA d ρ FA + dJ ∗ sH ( ρ FA ) d ρ FA = . (104)Fig. 17: Tangential velocity profiles on a trajectory segment ( i , j ) ∈ E under unconstrained (FO-3) and constrained (FO-A) first-order agent models.R EFERENCES[1] M. L. Elwin, R. A. Freeman, and K. M. Lynch, “Distributed Envi-ronmental Monitoring with Finite Element Robots,”
IEEE Trans. onRobotics , vol. 36, no. 2, pp. 380–398, 2020.[2] D. Kingston, R. W. Beard, and R. S. Holt, “Decentralized PerimeterSurveillance Using a Team of UAVs,”
IEEE Trans. on Robotics ,vol. 24, no. 6, pp. 1394–1404, 2008.[3] R. Reshma, T. Ramesh, and P. Sathishkumar, “Security SituationalAware Intelligent Road Traffic Monitoring Using UAVs,” in
Proc. of2nd IEEE Intl. Conf. on VLSI Systems, Architectures, Technology andApplications , 2016, pp. 1–6.[4] S. L. Smith, M. Schwager, and D. Rus, “Persistent Monitoring ofChanging Environments Using a Robot with Limited Range Sensing,”in
Proc. of IEEE Intl. Conf. on Robotics and Automation , 2011, pp.5448–5455.[5] J. Yu, S. Karaman, and D. Rus, “Persistent Monitoring of Events WithStochastic Arrivals at Multiple Stations,”
IEEE Trans. on Robotics ,vol. 31, no. 3, pp. 521–535, 2015.[6] N. Mathew, S. L. Smith, and S. L. Waslander, “Multirobot RendezvousPlanning for Recharging in Persistent Tasks,”
IEEE Trans. on Robotics ,vol. 31, no. 1, pp. 128–142, 2015.[7] S. K. Hari, S. Rathinam, S. Darbha, K. Kalyanam, S. G. Manyam,and D. Casbeer, “The Generalized Persistent Monitoring Problem,” in
Proc. of American Control Conf. , vol. 2019-July, 2019, pp. 2783–2788.[8] S. Welikala and C. G. Cassandras, “Event-Driven Receding HorizonControl For Distributed Persistent Monitoring in Network Systems,”
Automatica , vol. 127, p. 109519, 2021.[9] Y.-W. Wang, Y.-W. Wei, X.-K. Liu, N. Zhou, and C. G. Cassandras,“Optimal Persistent Monitoring Using Second-Order Agents withPhysical Constraints,”
IEEE Trans. on Automatic Control , vol. 64,no. 8, pp. 3239–3252, 2017.[10] N. Zhou, C. G. Cassandras, X. Yu, and S. B. Andersson, “OptimalThreshold-Based Distributed Control Policies for Persistent Monitor-ing on Graphs,” in
Proc. of American Control Conf. , 2019, pp. 2030–2035.[11] X. Lan and M. Schwager, “Planning Periodic Persistent MonitoringTrajectories for Sensing Robots in Gaussian Random Fields,” in
InProc. of IEEE Intl. Conf. on Robotics and Automation , 2013, pp. 2415–2420. [12] X. Lin and C. G. Cassandras, “An Optimal Control Approach toThe Multi-Agent Persistent Monitoring Problem in Two-DimensionalSpaces,”
IEEE Trans. on Automatic Control
IFAC-PapersOnLine , vol. 52, no. 20, 2019,pp. 217–222.[15] W. Li and C. G. Cassandras, “A Cooperative Receding HorizonController for Multi-Vehicle Uncertain Environments,”
IEEE Trans.on Automatic Control , vol. 51, no. 2, pp. 242–257, 2006.[16] R. Chen and C. G. Cassandras, “Optimal Assignments in Mobility-on-Demand Systems Using Event-Driven Receding Horizon Control,”
IEEE Trans. on Intelligent Transportation Systems , pp. 1–15, 2020.[Online]. Available: https://doi.org/10.1109/TITS.2020.3030218[17] J. Yu, M. Schwager, and D. Rus, “Correlated Orienteering Problemand its Application to Persistent Monitoring Tasks,”
IEEE Trans. onRobotics , vol. 32, no. 5, pp. 1106–1118, 2016.[18] M. Pakdaman and M. M. Sanaatiyan, “Design and Implementationof Line Follower Robot,” in
Proc. of Intl. Conf. on Computer andElectrical Engineering , vol. 2, 2009, pp. 585–590.[19] T. Kim, C. Lee, and H. Shim, “Completely Decentralized Design ofDistributed Observer for Linear Systems,”
IEEE Trans. on AutomaticControl , vol. 65, no. 11, pp. 4664–4678, 2020.[20] A. E. Bryson, Y. C. Ho, Y. C. Ho, and D. P. Cantwell,