An Optimal Control Approach for the Data Harvesting Problem
Yasaman Khazaeni and Christos G. Cassandras
Division of Systems Engineering and Center for Information and Systems Engineering
Boston University, MA 02446
[email protected], [email protected]
Abstract — We propose a new method for trajectory planning to solve the data harvesting problem. In a two-dimensional mission space, N mobile agents are tasked with the collection of data generated at M stationary sources and delivery to a base, aiming at minimizing expected delays. An optimal control formulation of this problem provides some initial insights regarding its solution, but it is computationally intractable, especially in the case where the data generating processes are stochastic. We propose an agent trajectory parameterization in terms of general function families which can be subsequently optimized on line through the use of Infinitesimal Perturbation Analysis (IPA). Explicit results are provided for the case of elliptical and Fourier series trajectories, and some properties of the solution are identified, including robustness with respect to the data generation processes and scalability in the size of an event set characterizing the underlying hybrid dynamic system.

I. INTRODUCTION
Systems consisting of cooperating mobile agents are being continuously developed for a broad spectrum of applications such as environmental sampling [1],[2], surveillance [3], coverage control [4],[5],[6], persistent monitoring [7],[8], task assignment [9], and data harvesting and information collection [10],[11],[12]. The data harvesting problem arises in many settings, including "smart cities" where wireless sensor networks (WSNs) are being widely deployed for purposes of monitoring the environment, traffic, infrastructure for transportation and for energy distribution, surveillance, and a variety of other specialized purposes [13]. Although many efforts focus on the analysis of the vast amount of data gathered, we must first ensure the existence of robust means to collect all data in a timely fashion when the size of the sensor networks and the level of node interference do not allow for a fully wireless connected system. Sensors can locally gather and buffer data, while mobile elements (e.g., vehicles, aerial drones) retrieve the data from each part of the network. Similarly, mobile elements may themselves be equipped with sensors and visit specific points of interest to collect data which must then be delivered to a given base. These mobile agents should follow an optimal path (in some sense to be defined) which allows visiting each
data source frequently enough and within the constraints of a given environment like that of an urban setting.

(The authors' work is supported in part by NSF under grants CNS-1239021, ECCS-1509084, and IIP-1430145, by AFOSR under grant FA9550-12-1-0113, by ONR under grant N00014-09-1-1051, and by the Cyprus Research Promotion Foundation under Grant New Infrastructure Project/Strategic/0308/26.)

The data harvesting problem using mobile agents known as "message ferries" or "data mules" has been considered from several different perspectives. For a survey on different routing problems in WSNs see [14],[15] and references therein. In [16] algorithms are proposed for patrolling target points with the goal of balanced time intervals between consecutive visits. A weighted version of the algorithm improves the performance in cases with unequally valued targets. However, in this scenario the data need not be delivered to a base and visits to a recharging station are only necessary if the data mules are running out of energy. In [11] the problem is viewed as a polling system with a mobile server visiting data queues at fixed targets. Trajectories are designed for the mobile server in order to stabilize the system, keeping queue contents (modeled as fluid queues) uniformly bounded.

In this paper, we consider the data harvesting problem as an optimal control problem for a team of multiple cooperating mobile agents responsible for collecting data generated by arbitrary random processes at fixed target points and delivering these data to a base. The ultimate goal is for the data to be collected and delivered with minimum expected delay. Rather than looking at this problem as a scheduling task where visit times for each target are determined assuming agents only move in straight lines between targets, we aim to optimize a two-dimensional trajectory for each agent, which may be periodic and can collect data from a target once the agent is within a given range from that target.
Interestingly, the setting of the problem can also be viewed as an evacuation process where visits are needed to retrieve individuals from a set of target points which may be of non-uniform importance. In this paper, we limit ourselves to trajectories with no constraints due to obstacles or other factors. Clearly, in an urban environment this is generally not the case and the set of admissible trajectories will have to be restricted in subsequent work.

We formulate a finite-horizon optimal control problem in which the underlying dynamic system has hybrid (time-driven and event-driven) dynamics. We note that the specification of an appropriate objective function is nontrivial for the data harvesting problem, largely due to the fact that the agents act as mobile servers for the data sources and have their own dynamics. Since the control is applied to the motion of agents, the objective function must capture the agent behavior in addition to that of the data queues at the targets, the agents, and the base. The solution of this optimal control problem (even in the deterministic case) requires a Two Point Boundary Value Problem (TPBVP) numerical solver which is clearly not suited for on-line operation and yields only locally optimal solutions. Thus, the main contribution of the paper is to formulate and solve an optimal parametric agent trajectory problem. In particular, similar to the idea in [17], we represent an agent trajectory in terms of general function families characterized by a set of parameters that we seek to optimize, given an objective function. We consider elliptical trajectories as well as the much richer set of Fourier series trajectory representations. We then show that we can make use of Infinitesimal Perturbation Analysis (IPA) for hybrid systems [18] to determine gradients of the objective function with respect to these parameters and subsequently obtain (at least locally) optimal trajectories.
This approach also allows us to exploit (i) robustness properties of IPA allowing for stochastic data generation processes, (ii) the event-driven nature of the IPA gradient estimation process, which is scalable in the event set of the underlying hybrid dynamic system, and (iii) the on-line computation, which implies that trajectories adjust as operating conditions change (e.g., new targets).

In Section II we present an optimal control formulation for the data harvesting problem. In Section III we provide a Hamiltonian analysis leading to a TPBVP. In Section IV we formulate the alternative problem of determining optimal trajectories based on general function representations and provide solutions using a gradient-based algorithm using IPA for two particular function families. Sections V and VI present the numerical results and the conclusions respectively.

II. PROBLEM FORMULATION
We consider a data harvesting problem where N mobile agents collect data from M stationary targets in a two-dimensional rectangular mission space S = [0, L] × [0, L] ⊂ R². Each agent may visit one or more of the M targets, collect data from them, and deliver them to a base. It then continues visiting targets, possibly the same as before or new ones, and repeats this process. By cooperating in how data are collected and delivered, the objective of the agent team is to minimize a weighted sum of collection and delivery delays over all targets.

Let s_j(t) = [s_j^x(t), s_j^y(t)] be the position of agent j at time t, with s_j^x(t) ∈ [0, L] and s_j^y(t) ∈ [0, L]. The position of the agent follows single integrator dynamics:

    ṡ_j^x(t) = u_j(t) cos θ_j(t),   ṡ_j^y(t) = u_j(t) sin θ_j(t)   (1)

where u_j(t) is the scalar speed of the agent (normalized so that 0 ≤ u_j(t) ≤ 1) and θ_j(t) is the angle relative to the positive x direction, 0 ≤ θ_j(t) < 2π. Thus, we assume that the agent controls its orientation and speed. An agent is represented as a particle, so that we will omit the need for any collision avoidance control. The agent dynamics above could be more complicated without affecting the essence of our analysis, but we will limit ourselves here to (1).

Fig. 1. Data harvesting queueing model for M targets and N agents.

Consider a set of data sources as points w_i ∈ S, i = 1, ..., M, with associated ranges r_ij, so that agent j can collect data from w_i only if the Euclidean distance D_ij(t) = ‖w_i − s_j(t)‖ satisfies D_ij(t) ≤ r_ij. Similarly, there is a base at w_B ∈ S which receives all data collected by the agents. An agent can only deliver data to the base if the Euclidean distance D_Bj(t) = ‖w_B − s_j(t)‖ satisfies D_Bj(t) ≤ r_Bj.
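The single-integrator dynamics (1) and the range conditions above can be sketched as follows (a minimal illustration; the function names and the Euler discretization step are our own, not from the paper):

```python
import math

def step_agent(s, u, theta, dt):
    """One Euler step of the single-integrator dynamics (1):
    sx' = u*cos(theta), sy' = u*sin(theta), with 0 <= u <= 1."""
    x, y = s
    return (x + u * math.cos(theta) * dt,
            y + u * math.sin(theta) * dt)

def in_range(w_i, s_j, r_ij):
    """Agent j can collect from target i iff D_ij = ||w_i - s_j|| <= r_ij."""
    d = math.hypot(w_i[0] - s_j[0], w_i[1] - s_j[1])
    return d <= r_ij

s = (0.0, 0.0)
s = step_agent(s, u=1.0, theta=0.0, dt=0.1)   # moves 0.1 along the x axis
print(s)                                       # (0.1, 0.0)
print(in_range((0.15, 0.0), s, r_ij=0.1))      # True: distance 0.05 <= 0.1
```

The same `in_range` test with the base location w_B and radius r_Bj decides whether delivery is possible.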
We define a function P_ij(t) to be the normalized data collection rate from target i when the agent is at s_j(t):

    P_ij(t) = p(w_i, s_j(t))   (2)

and we assume that: (A1) it is monotonically non-increasing in the value of D_ij(t) = ‖w_i − s_j(t)‖, and (A2) it satisfies P_ij(t) = 0 if D_ij(t) > r_ij. Thus, P_ij(t) can model communication power constraints which depend on the distance between a data source and an agent equipped with a receiver (similar to the model used in [11]) or sensing range constraints if an agent collects data using on-board sensors. For simplicity, we will also assume that: (A3) P_ij(t) is continuous in D_ij(t). Similarly, we define:

    P_Bj(t) = p(w_B, s_j(t))   (3)

The data harvesting problem described above can be viewed as a polling system where mobile agents are serving the targets by collecting data and delivering them to the base. Figure 1 shows a queueing system in which each P_ij(t) is depicted as a switch activated when D_ij(t) ≤ r_ij to capture the finite range between agent j and target i. All queues are modeled as flow systems whose dynamics are given next (however, as we will see, the agent trajectory optimization is driven by events observed in the underlying system where queues contain discrete data packets, so that this modeling device has minimal effect on our analysis). As seen in Fig. 1, there are three sets of queues. The first set includes the data contents X_i(t) ∈ R⁺ at each target i = 1, ..., M, where we use σ_i(t) as the instantaneous inflow rate. In general, we treat {σ_i(t)} as a random process assumed only to be piecewise continuous; we will treat it as a deterministic constant only for the Hamiltonian analysis in the next section. Thus, at time t, X_i(t) is a random variable resulting from the random process {σ_i(t)}. The second set of queues consists of data contents Z_ij(t) ∈ R⁺ on board agent j collected from target i as long as P_ij(t) > 0.
The last set consists of queues Y_i(t) ∈ R⁺ containing data at the base, one queue for each target, delivered by some agent j as long as P_Bj(t) > 0. Note that {X_i(t)}, {Z_ij(t)} and {Y_i(t)} are also random processes, and the same applies to the agent states {s_j(t)}, j = 1, ..., N, since the controls are generally dependent on the random queue states. Thus, we ensure that all random processes are defined on a common probability space. The maximum rate of data collection from target i by agent j is μ_ij and the actual rate is μ_ij P_ij(t) if j is connected to i. We will assume that: (A4) only one agent at a time is connected to a target i, even if there are other agents l with P_il(t) > 0; this is not the only possible model, but we adopt it based on the premise that simultaneous downloading of packets from a common source creates problems of proper data reconstruction at the base. The dynamics of X_i(t), assuming that agent j is connected to it, are

    Ẋ_i(t) = 0                              if X_i(t) = 0 and σ_i(t) ≤ μ_ij P_ij(t)
             σ_i(t) − μ_ij P_ij(t)          otherwise                                (4)

Obviously, Ẋ_i(t) = σ_i(t) if P_ij(t) = 0 for all j = 1, ..., N. In order to express the dynamics of Z_ij(t), let

    μ̃_ij(t) = min( σ_i(t)/P_ij(t), μ_ij )   if X_i(t) = 0 and P_ij(t) > 0
               μ_ij                           otherwise                              (5)

This gives us the dynamics:

    Ż_ij(t) = 0                                            if Z_ij(t) = 0 and μ̃_ij(t)P_ij(t) − β_ij P_Bj(t) ≤ 0
              μ̃_ij(t)P_ij(t) − β_ij P_Bj(t)               otherwise                 (6)

where β_ij is the maximum rate of data from target i delivered by agent j. For simplicity, we assume that: (A5) ‖w_i − w_B‖ > r_ij + r_Bj for all i = 1, . . . , M and j = 1, . . .
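The flow dynamics (4)-(6) reduce to simple rate functions that can be integrated with any ODE scheme. A minimal sketch (function and argument names are ours, not from the paper):

```python
def xdot(X, sigma, mu, P):
    """Target queue dynamics (4): an empty queue stays empty while the
    collection rate mu*P dominates the inflow sigma."""
    if X == 0 and sigma <= mu * P:
        return 0.0
    return sigma - mu * P

def mu_tilde(X, sigma, mu, P):
    """Effective collection rate (5): when the target queue is empty the
    agent can only collect as fast as data arrive."""
    if X == 0 and P > 0:
        return min(sigma / P, mu)
    return mu

def zdot(Z, X, sigma, mu, P, beta, PB):
    """On-board queue dynamics (6); by (A5), P and PB are never positive
    at the same time, so collection and delivery never overlap."""
    rate = mu_tilde(X, sigma, mu, P) * P - beta * PB
    if Z == 0 and rate <= 0:
        return 0.0
    return rate

# Connected agent, empty target queue: collection is inflow-limited.
print(xdot(0.0, sigma=0.5, mu=1.0, P=1.0))                   # 0.0
print(zdot(1.0, 0.0, sigma=0.5, mu=1.0, P=1.0, beta=1.0, PB=0.0))  # 0.5
```

Euler-stepping these rates between events reproduces the flow-queue sample paths used throughout the rest of the paper.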
, N, i.e., the agent cannot collect and deliver data at the same time. Therefore, in (6) it is always the case that P_ij(t)P_Bj(t) = 0. Finally, the dynamics of Y_i(t) depend on Z_ij(t), the content of the on-board queue of each agent j from target i, as long as P_Bj(t) > 0. We define β_i(t) = Σ_{j=1}^N β_ij P_Bj(t) 1[Z_ij(t) > 0] to be the total instantaneous delivery rate for target i data, so that the dynamics of Y_i(t) are:

    Ẏ_i(t) = β_i(t)   (7)

Our objective is to maintain minimal values for all target and on-board agent data queues, while maximizing the contents of the delivered data at the base queues. Thus, we define J_1(X_1, ..., X_M, t) to be the weighted sum of expected target queue contents (recalling that {σ_i(t)} are random processes):

    J_1(X_1, ..., X_M, t) = Σ_{i=1}^M q_i E[X_i(t)]   (8)

where the weight q_i represents the importance factor of target i. Similarly, we define a weighted sum of expected base queue contents:

    J_2(Y_1, ..., Y_M, t) = Σ_{i=1}^M q_i E[Y_i(t)]   (9)

For simplicity, we will in the sequel assume that q_i = 1 for all i without affecting any aspect of our analysis. Therefore, our optimization objective may be a convex combination of (8) and (9). In addition, we need to ensure that the agents are controlled so as to maximize their utilization, i.e., the fraction of time spent performing a useful task by being within range of a target or the base. Equivalently, we aim to minimize the non-productive idling time of each agent during which it is not visiting any target or the base. Let

    D⁺_ij(t) = max(0, D_ij(t) − r_ij),   D⁺_Bj(t) = max(0, D_Bj(t) − r_Bj)   (10)

so that the idling time for agent j occurs when D⁺_ij(t) > 0 for all i and D⁺_Bj(t) > 0. We define the idling function I_j(t):

    I_j(t) = log( 1 + D⁺_Bj(t) Π_{i=1}^M D⁺_ij(t) )   (11)

This function has the following properties.
First, I_j(t) = 0 if and only if the product term inside the bracket is zero, i.e., agent j is visiting a target or the base; otherwise, I_j(t) > 0. Second, I_j(t) is monotonically nondecreasing in the number of targets M. The logarithmic function is selected so as to prevent the value of I_j(t) from dominating those of J_1(·) and J_2(·) when included in a single objective function. We define:

    J_3(t) = M_I Σ_{j=1}^N E[I_j(t)]   (12)

where M_I is a weight for the idling time effect relative to J_1(·) and J_2(·). Note that I_j(t) is also a random variable since it is a function of the agent states s_j(t), j = 1, ..., N. Finally, we define a terminal cost at T capturing the expected value of the amount of data left on board the agents, noting that the effect of this term vanishes as T goes to infinity as long as all E[Z_ij(T)] remain bounded:

    J_f(T) = (1/T) Σ_{i=1}^M Σ_{j=1}^N E[Z_ij(T)]   (13)

We can now formulate a stochastic optimization problem P1 where the control variables are the agent speeds and headings denoted by the vectors u(t) = [u_1(t), ..., u_N(t)] and θ(t) = [θ_1(t), ..., θ_N(t)] respectively (omitting their dependence on the full system state at t). We combine the objective function components in (8), (9), (12) and (13) to obtain:

    min_{u(t),θ(t)} J(T) = (1/T) ∫_0^T ( αJ_1(t) − (1 − α)J_2(t) + J_3(t) ) dt + J_f(T)   (14)

where α ∈ [0, 1] is a weight capturing the relative importance of collected data as opposed to delivered data, and 0 ≤ u_j(t) ≤ 1, 0 ≤ θ_j(t) < 2π. To simplify notation, we have also expressed J_1(X_1, ..., X_M, t) and J_2(Y_1, ..., Y_M, t) as J_1(t) and J_2(t). Since we are considering a finite time optimization problem, instability in the queues is not an issue.
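The idling term (11) that enters the objective (14) is directly computable from the agent position. A minimal sketch, using the log(1 + ·) form so that I_j = 0 exactly when some range factor vanishes (names are ours):

```python
import math

def idling(s_j, targets, ranges, w_B, r_B):
    """Idling function (11): zero iff the agent is within range of some
    target or the base; otherwise strictly positive and growing with the
    number of out-of-range targets."""
    def d_plus(w, r):
        # surplus distance (10): max(0, D - r)
        d = math.hypot(w[0] - s_j[0], w[1] - s_j[1])
        return max(0.0, d - r)

    prod = d_plus(w_B, r_B)
    for w_i, r_i in zip(targets, ranges):
        prod *= d_plus(w_i, r_i)
    return math.log(1.0 + prod)

# Agent sitting on the base: the base factor is zero, so I_j = 0.
print(idling((0, 0), [(5, 5)], [1.0], w_B=(0, 0), r_B=0.5))  # 0.0
```

Away from all targets and the base, the product of surplus distances is positive and the log keeps its magnitude comparable to the queue terms in (14).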
However, stability of such a system can indeed be an issue in the sense of guaranteeing that E[X_i(t)] < ∞, E[Z_ij(t)] < ∞ for all i, j under a particular control policy when t → ∞. This problem is considered in [11] for a simpler deterministic data harvesting model where target queues are required to be bounded. In this paper, we do not explicitly study this issue; however, given a certain number of agents, it is possible to stabilize a target queue by designing agent trajectories to ensure that the queue is visited frequently enough and periodically emptied.

III. OPTIMAL CONTROL SOLUTION
In this section, we address P1 in a setting where all data arrival processes are deterministic, so that all expectations in (8)-(13) degenerate to their arguments. We proceed with a standard Hamiltonian analysis leading to a Two Point Boundary Value Problem (TPBVP) [19] where the states and costates are known at t = 0 and t = T respectively. We define a state vector and the associated costate vector:

    X(t) = [X_1(t), ..., X_M(t), Y_1(t), ..., Y_M(t), Z_11(t), ..., Z_MN(t), s_1^x(t), s_1^y(t), ..., s_N^x(t), s_N^y(t)]
    λ(t) = [λ_1(t), ..., λ_M(t), γ_1(t), ..., γ_M(t), φ_11(t), ..., φ_MN(t), η_1^x(t), η_1^y(t), ..., η_N^x(t), η_N^y(t)]

The Hamiltonian is

    H(X, λ, u, θ) = (1/T)[ αJ_1(t) − (1 − α)J_2(t) + J_3(t) ] + Σ_i λ_i(t)Ẋ_i(t) + Σ_i γ_i(t)Ẏ_i(t) + Σ_i Σ_j φ_ij(t)Ż_ij(t) + Σ_j ( η_j^x(t)u_j(t)cos θ_j(t) + η_j^y(t)u_j(t)sin θ_j(t) )   (15)

where the costate equations are

    λ̇_i(t) = −∂H/∂X_i = −α/T,          λ_i(T) = 0
    γ̇_i(t) = −∂H/∂Y_i = (1 − α)/T,     γ_i(T) = 0
    φ̇_ij(t) = −∂H/∂Z_ij = 0,            φ_ij(T) = ∂J_f(t)/∂Z_ij |_T
    η̇_j^x(t) = −∂H/∂s_j^x = −[ (M_I/T) ∂I_j(t)/∂s_j^x + Σ_i λ_i(t) ∂Ẋ_i(t)/∂s_j^x + Σ_i γ_i(t) ∂Ẏ_i(t)/∂s_j^x + Σ_i φ_ij(t) ∂Ż_ij(t)/∂s_j^x ]
    η̇_j^y(t) = −∂H/∂s_j^y = −[ (M_I/T) ∂I_j(t)/∂s_j^y + Σ_i λ_i(t) ∂Ẋ_i(t)/∂s_j^y + Σ_i γ_i(t) ∂Ẏ_i(t)/∂s_j^y + Σ_i φ_ij(t) ∂Ż_ij(t)/∂s_j^y ]   (16)
    η_j^x(T) = η_j^y(T) = 0

From (15), after some trigonometric manipulations, we get

    H(X, λ, u, θ) = (1/T)[ αJ_1(t) − (1 − α)J_2(t) + J_3(t) ] + Σ_i λ_i(t)Ẋ_i(t) + Σ_i γ_i(t)Ẏ_i(t) + Σ_i Σ_j φ_ij(t)Ż_ij(t) + Σ_j u_j(t) sgn(η_j^y(t)) √( η_j^x(t)² + η_j^y(t)² ) sin( θ_j(t) + ψ_j(t) )   (17)

where tan ψ_j(t) = η_j^x(t)/η_j^y(t) for η_j^y(t) ≠ 0, and ψ_j(t) = sgn(η_j^x(t)) π/2 if η_j^y(t) = 0.
Applying the Pontryagin principle to (15), with (u*, θ*) being the optimal control, we have:

    H(X*, λ*, u*, θ*) = min_{u(t),θ(t)} H(X, λ, u, θ)   (18)

From (17) we easily see that we can always make the multiplier of u_j(t) negative; hence, recalling that 0 ≤ u_j(t) ≤ 1,

    u*_j(t) = 1   (19)

Following the Hamiltonian definition in (15) we have:

    ∂H/∂θ_j = −η_j^x(t)u_j(t)sin θ_j(t) + η_j^y(t)u_j(t)cos θ_j(t)   (20)

and setting ∂H/∂θ_j = 0, the optimal heading θ*_j(t) should satisfy:

    tan θ*_j(t) = η_j^y(t)/η_j^x(t)   (21)

Since u*_j(t) = 1, we only need to evaluate θ*_j(t) for all t ∈ [0, T]. This is accomplished by discretizing the problem in time and numerically solving a TPBVP with a forward integration of the state and a backward integration of the costate. Solving this problem quickly becomes intractable as the number of agents and targets grows. However, one of the insights this analysis provides is that under optimal control the data harvesting process operates as a hybrid system with discrete states (modes) defined by the dynamics of the flow queues in (4), (6), (7), while the agents maintain a fixed speed. The events that trigger mode transitions are defined in Table I (the superscript − denotes events causing a variable to reach a value of zero from above, and the superscript + denotes events causing a variable to become strictly positive from a zero value):

TABLE I
HYBRID SYSTEM EVENTS
Event Name / Description
1. ξ_i⁻ : X_i(t) hits 0, for i = 1, ..., M
2. ξ_i⁺ : X_i(t) leaves 0, for i = 1, ..., M
3. ζ_ij⁻ : Z_ij(t) hits 0, for i = 1, ..., M, j = 1, ..., N
4. δ_ij⁺ : D⁺_ij(t) leaves 0, for i = 1, ..., M, j = 1, ..., N
5. δ_ij⁻ : D⁺_ij(t) hits 0, for i = 1, ..., M, j = 1, ..., N
6. Δ_j⁺ : D⁺_Bj(t) leaves 0, for j = 1, ..., N
7. Δ_j⁻ : D⁺_Bj(t) hits 0, for j = 1, ..., N

Observe that each of these events causes a change in at least one of the state dynamics in (4), (6), (7). For example, ξ_i⁻ causes a switch in (4) from Ẋ_i(t) = σ_i(t) − μ_ij P_ij(t) to Ẋ_i(t) = 0. Also note that we have omitted an event ζ_ij⁺ for Z_ij(t) leaving 0, since this event is immediately induced by δ_ij⁻ when agent j comes within range of target i and starts collecting data, causing Z_ij(t) to become positive if Z_ij(t) = 0 and X_i(t) > 0. Finally, note that all events above are directly observable during the execution of any agent trajectory and they do not depend on our model of flow queues. For example, if X_i(t) becomes zero, this defines event ξ_i⁻ regardless of whether the corresponding queue is based on a flow or on discrete data packets; this observation is very useful in the sequel.

The fact that we are dealing with a hybrid dynamic system further complicates the solution of a TPBVP. On the other hand, it enables us to make use of Infinitesimal Perturbation Analysis (IPA) [18] to carry out the parametric trajectory optimization process discussed in the next section. In particular, we propose a parameterization of agent trajectories allowing us to utilize IPA to obtain a gradient of the objective function with respect to the trajectory parameters.

IV. AGENT TRAJECTORY PARAMETERIZATION AND OPTIMIZATION
The idea here is to represent each agent's trajectory through general parametric equations

    s_j^x(t) = f(Θ_j, ρ_j(t)),   s_j^y(t) = g(Θ_j, ρ_j(t))   (22)

where the function ρ_j(t) controls the position of the agent on its trajectory at time t and Θ_j is a vector of parameters controlling the shape and location of agent j's trajectory. Let Θ = [Θ_1, ..., Θ_N]. We now replace problem P1 in (14) by problem P2:

    min_{Θ ∈ F_Θ} J(Θ, T) = (1/T) ∫_0^T [ αJ_1(Θ, t) − (1 − α)J_2(Θ, t) + J_3(Θ, t) ] dt + J_f(Θ, T)   (23)

where we return to allowing arbitrary stochastic data arrival processes {σ_i(t)}, so that P2 is a parametric stochastic optimization problem with F_Θ appropriately defined depending on (22). The cost function in (23) is written as J(Θ, T; X(Θ, 0)) = E[L(Θ, T; X(Θ, 0))], where L(Θ, T; X(Θ, 0)) is a sample function defined over [0, T] and X(Θ, 0) is the initial value of the state vector. For convenience, in the sequel we will use L_1, L_2, L_3, L_f to denote sample functions of J_1, J_2, J_3 and J_f respectively. Note that in (23) we suppress the dependence of the four objective function components on the controls u(t) and θ(t) and stress instead their dependence on the parameter vector Θ. In the rest of the paper, we will consider two families of trajectories motivated by a similar approach used in the multi-agent persistent monitoring problem in [20]: elliptical trajectories and a Fourier series trajectory representation which is more general and better suited for non-uniform target topologies. The hybrid dynamics of the data harvesting system allow us to apply the theory of IPA [18] to obtain on line the gradient of the sample function L(Θ, T; X(Θ, 0)) with respect to Θ.
The value of the IPA approach is twofold: (i) the sample gradient ∇L(Θ, T) can be obtained on line based on observable sample path data only, and (ii) ∇L(Θ, T) is an unbiased estimate of ∇J(Θ, T) under mild technical conditions, as shown in [18]. Therefore, we can use ∇L(Θ, T) in a standard gradient-based stochastic optimization algorithm

    Θ^{l+1} = Θ^l − ν_l ∇L(Θ^l, T),   l = 0, 1, ...   (24)

to converge (at least locally) to an optimal parameter vector Θ* with a proper selection of a step-size sequence {ν_l} [21]. We emphasize that this process is carried out on line, i.e., the gradient is evaluated by observing a trajectory with given Θ over [0, T] and iteratively adjusting it until convergence is attained.
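The iteration (24) is a standard stochastic gradient scheme. A minimal sketch, where `ipa_gradient` stands in for the IPA estimate obtained by observing one trajectory over [0, T], and the diminishing step size ν_l = ν_0/(l+1) is one common choice satisfying the usual conditions (names and the surrogate cost are ours):

```python
def optimize(theta0, ipa_gradient, iters=100, nu0=0.5):
    """Gradient iteration (24): Theta_{l+1} = Theta_l - nu_l * grad L."""
    theta = list(theta0)
    for l in range(iters):
        grad = ipa_gradient(theta)          # one observed sample path
        nu = nu0 / (l + 1)                  # diminishing step size
        theta = [t - nu * g for t, g in zip(theta, grad)]
    return theta

# Sanity check on a smooth surrogate cost (theta - 2)^2:
result = optimize([0.0], lambda th: [2 * (th[0] - 2.0)], iters=100)
print(result[0])  # 2.0
```

In the data harvesting setting, `ipa_gradient` would be the event-driven estimator of ∇L(Θ, T) developed below, and each iteration corresponds to one observed mission horizon.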
1) IPA equations:
Based on the events defined earlier, we will specify event time derivative and state derivative dynamics for each mode of the hybrid system. In this process, we will use the IPA notation from [18], so that τ_k is the kth event time in an observed sample path of the hybrid system, and τ'_k = dτ_k/dΘ, X'(t) = dX/dΘ are the Jacobian matrices of partial derivatives with respect to all components of the controllable parameter vector Θ. Throughout the analysis we will be using (·)' to denote such derivatives. We will also use f_k(t) = dX/dt to denote the state dynamics in effect over an interevent time interval [τ_k, τ_{k+1}). We review next the three fundamental IPA equations from [18] based on which we will proceed. First, events may be classified as exogenous or endogenous. An event is exogenous if its occurrence time is independent of the parameter Θ, hence τ'_k = 0. Otherwise, an endogenous event takes place when a condition g_k(Θ, X) = 0 is satisfied, i.e., the state X(t) reaches a switching surface described by g_k(Θ, X). In this case, it is shown in [18] that

    τ'_k = −( (∂g_k/∂X) f_k(τ_k⁻) )^{−1} ( ∂g_k/∂Θ + (∂g_k/∂X) X'(τ_k⁻) )   (25)

as long as (∂g_k/∂X) f_k(τ_k⁻) ≠ 0. It is also shown in [18] that the state derivative X'(t) satisfies

    (d/dt) X'(t) = (∂f_k/∂X) X'(t) + ∂f_k/∂Θ,   t ∈ [τ_k, τ_{k+1})   (26)
    X'(τ_k⁺) = X'(τ_k⁻) + [ f_{k−1}(τ_k⁻) − f_k(τ_k⁺) ] τ'_k   (27)

Then, X'(t) for t ∈ [τ_k, τ_{k+1}) is calculated through

    X'(t) = X'(τ_k⁺) + ∫_{τ_k}^t (d/du) X'(u) du   (28)

Table I contains all possible endogenous event types for our hybrid system. To these, we add exogenous events κ_i, i = 1, ..., M, to allow for possible discontinuities (jumps) in the random processes {σ_i(t)} which affect the sign of σ_i(t) − μ_ij P_ij(t) in (4).
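For a scalar state, (25) and (27) reduce to one-line formulas. A minimal worked instance (names are ours; the example reproduces the queue-emptying reset of Proposition 1 below, where the derivative of an emptied queue is reset to zero):

```python
def event_time_derivative(dg_dx, f_minus, dg_dtheta, x_prime_minus):
    """Scalar instance of (25):
    tau' = -(dg/dx * f(tau-))^{-1} (dg/dTheta + dg/dx * x'(tau-)),
    valid when dg/dx * f(tau-) != 0."""
    denom = dg_dx * f_minus
    if denom == 0:
        raise ValueError("(25) requires dg/dx * f(tau-) != 0")
    return -(dg_dtheta + dg_dx * x_prime_minus) / denom

def state_derivative_jump(x_prime_minus, f_before, f_after, tau_prime):
    """State-derivative reset (27) across the event:
    x'(tau+) = x'(tau-) + [f_{k-1}(tau-) - f_k(tau+)] * tau'."""
    return x_prime_minus + (f_before - f_after) * tau_prime

# A queue draining at rate f = -1 hits the surface g = x = 0,
# with x'(tau-) = 0.5:
tp = event_time_derivative(dg_dx=1.0, f_minus=-1.0,
                           dg_dtheta=0.0, x_prime_minus=0.5)
print(tp)  # 0.5: a larger queue level delays the emptying event
print(state_derivative_jump(0.5, f_before=-1.0, f_after=0.0, tau_prime=tp))
# 0.0: after hitting empty, the state derivative resets to zero
```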
We will use the notation e(τ_k) to denote the event type occurring at t = τ_k, with e(τ_k) ∈ E, the event set consisting of all endogenous and exogenous events. Finally, we make the following assumption, which is needed to guarantee the unbiasedness of the IPA gradient estimates: (A6) two events occur at the same time w.p. 0, unless one is directly caused by the other.
2) Objective Function Gradient:
The sample function gradient ∇L(Θ, T) needed in (24) is obtained from (23), assuming a total of K events over [0, T] with τ_{K+1} = T and τ_0 = 0:

    ∇L(Θ, T; X(Θ, 0)) = (1/T) ∇[ ∫_0^T ( αL_1(Θ, t) − (1 − α)L_2(Θ, t) + L_3(Θ, t) ) dt ] + ∇L_f(Θ, T)
    = (1/T) ∇[ Σ_{k=0}^K ∫_{τ_k}^{τ_{k+1}} ( αL_1(Θ, t) − (1 − α)L_2(Θ, t) + L_3(Θ, t) ) dt ] + ∇L_f(Θ, T)
    = (1/T) [ Σ_{k=0}^K ( α( ∫_{τ_k}^{τ_{k+1}} ∇L_1(Θ, t) dt + L_1(Θ, τ_{k+1})τ'_{k+1} − L_1(Θ, τ_k)τ'_k )
      − (1 − α)( ∫_{τ_k}^{τ_{k+1}} ∇L_2(Θ, t) dt + L_2(Θ, τ_{k+1})τ'_{k+1} − L_2(Θ, τ_k)τ'_k )
      + ( ∫_{τ_k}^{τ_{k+1}} ∇L_3(Θ, t) dt + L_3(Θ, τ_{k+1})τ'_{k+1} − L_3(Θ, τ_k)τ'_k ) ) ] + ∇L_f(Θ, T)
    = (1/T) [ Σ_{k=0}^K ∫_{τ_k}^{τ_{k+1}} ( α∇L_1(Θ, t) − (1 − α)∇L_2(Θ, t) + ∇L_3(Θ, t) ) dt ] + ∇L_f(Θ, T)   (29)

The last step follows from the continuity of the state variables, which causes adjacent limit terms in the sum to cancel out. Therefore, ∇L(Θ, T) does not have any direct dependence on any τ'_k; this dependence is indirect through the state derivatives involved in the four individual gradient terms. Referring to (8), the first term involves ∇L_1(Θ, t), which is a sum of X'_i(t) derivatives. Similarly, ∇L_2(Θ, t) is a sum of Y'_i(t) derivatives, and ∇L_f(Θ, T) requires only Z'_ij(T). The third term, ∇L_3(Θ, t), requires derivatives of I_j(t) in (11), which depend on the derivatives of the max function in (10) and the agent state derivatives s'_j(t) with respect to Θ. Possible discontinuities in these derivatives occur when any of the last four events in Table I takes place.

In summary, the evaluation of (29) requires the state derivatives X'_i(t), Z'_ij(t), Y'_i(t), and s'_j(t). The latter are easily obtained for any specific choice of f and g in (22) and are shown in Appendix I.
The former require a rather laborious use of (25)-(27) which, however, reduces to a simple set of state derivative dynamics as shown next.

Proposition 1: After an event occurrence at t = τ_k, the state derivatives X'_i(τ_k⁺), Y'_i(τ_k⁺), Z'_ij(τ_k⁺) with respect to the controllable parameter Θ satisfy the following:

    X'_i(τ_k⁺) = 0                                    if e(τ_k) = ξ_i⁻
                 X'_i(τ_k⁻) − μ_il P_il(τ_k) τ'_k     if e(τ_k) = δ_ij⁺
                 X'_i(τ_k⁻)                           otherwise

where l = j with P_il(τ_k) > 0 if such l exists, and τ'_k = −( ∂D_ij(s_j)/∂s_j · ∂s_j/∂Θ )( ∂D_ij(s_j)/∂s_j · ṡ_j(τ_k) )^{−1}.

    Y'_i(τ_k⁺) = Y'_i(τ_k⁻) + Z'_ij(τ_k⁻)   if e(τ_k) = ζ_ij⁻
                 Y'_i(τ_k⁻)                  otherwise

    Z'_ij(τ_k⁺) = 0                          if e(τ_k) = ζ_ij⁻
                  Z'_ij(τ_k⁻) + X'_i(τ_k⁻)   if e(τ_k) = ξ_i⁻
                  Z'_ij(τ_k⁻)                otherwise

where e(τ_k) = ξ_i⁻ occurs when j is connected to target i.

Proof: See (59), (70), (78), (76), (62), (71), (73), (65) in Appendix III.

This result shows that only three of the events in E can actually cause discontinuous changes to the state derivatives. Further, note that X'_i(t) is reset to zero after a ξ_i⁻ event. Moreover, when such an event occurs, note that Z'_ij(t) is coupled to X'_i(t). Similarly for Z'_ij(t) and Y'_i(t) when event ζ_ij⁻ occurs, showing that perturbations in Θ can only propagate to an adjacent queue when that queue is emptied.

Proposition 2: After an event occurrence at t = τ_k, the state derivatives X'_i(τ_{k+1}⁻), Y'_i(τ_{k+1}⁻) with respect to the controllable parameter Θ satisfy the following:

    X'_i(τ_{k+1}⁻) = 0                                                 if e(τ_k) = ξ_i⁻
                     X'_i(τ_k⁺) − ∫_{τ_k}^{τ_{k+1}} μ_ij P'_ij(u) du   otherwise

    Y'_i(τ_{k+1}⁻) = Y'_i(τ_k⁺) + ∫_{τ_k}^{τ_{k+1}} β'_i(u) du

where j is such that P_ij(t) > 0, t ∈ [τ_k, τ_{k+1}).

Proof: See (58), (61) and (63) in Appendix III.
Proposition 3: After an event occurrence at t = τ_k, the state derivatives Z'_ij(τ_{k+1}⁻) with respect to the controllable parameter Θ satisfy the following:

i - If j is connected to target i,

    Z'_ij(τ_{k+1}⁻) = Z'_ij(τ_k⁺)                                        if e(τ_k) = ξ_i⁻, ζ_ij⁻ or δ_ij⁺
                      Z'_ij(τ_k⁺) + ∫_{τ_k}^{τ_{k+1}} μ_ij P'_ij(u) du   otherwise

ii - If j is connected to B with Z_ij(τ_k) > 0,

    Z'_ij(τ_{k+1}⁻) = Z'_ij(τ_k⁺) − ∫_{τ_k}^{τ_{k+1}} β_ij P'_Bj(u) du

iii - Otherwise, Z'_ij(τ_{k+1}⁻) = Z'_ij(τ_k⁺).

Proof: See (66), (67), (74) and (81) in Appendix III.
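The reset rules of Proposition 1 amount to a simple event dispatch. A minimal sketch of the derivative updates at an event time (function names, event labels and the scalar-parameter view are ours):

```python
def reset_X_prime(Xp_minus, event, mu_il=0.0, P_il=0.0, tau_prime=0.0):
    """Proposition 1 reset for X_i': zero after the target queue empties
    (xi-), a jump when the agent goes out of range (delta+), and
    unchanged otherwise."""
    if event == "xi_minus":
        return 0.0
    if event == "delta_plus":
        return Xp_minus - mu_il * P_il * tau_prime
    return Xp_minus

def reset_Z_prime(Zp_minus, Xp_minus, event):
    """Proposition 1 reset for Z_ij': zero when the on-board queue
    empties (zeta-); absorbs X_i' when the target empties while the
    agent is connected (xi-)."""
    if event == "zeta_minus":
        return 0.0
    if event == "xi_minus":
        return Zp_minus + Xp_minus
    return Zp_minus

def reset_Y_prime(Yp_minus, Zp_minus, event):
    """Proposition 1 reset for Y_i': absorbs Z_ij' when zeta- occurs."""
    if event == "zeta_minus":
        return Yp_minus + Zp_minus
    return Yp_minus
```

Between events, these values serve as initial conditions for the integrals of Propositions 2 and 3, which is what makes the whole estimator event-driven.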
Corollary 1:
The state derivatives X'_i(t), Z'_ij(t), Y'_i(t) with respect to the controllable parameter Θ are independent of the random data arrival processes {σ_i(t)}, i = 1, ..., M.

Proof: Follows directly from the three Propositions.

There are a few important consequences of these results. First, as the Corollary asserts, one can apply IPA regardless of the characteristics of the random processes {σ_i(t)}. This robustness property does not mean that these processes do not affect the values of X'_i(t), Z'_ij(t), Y'_i(t); this happens through the values of the event times τ_k, k = 1, 2, ..., which are observable and enter the computation of these derivatives as seen above. Second, the IPA estimation process is event-driven: X'_i(τ_k⁺), Y'_i(τ_k⁺), Z'_ij(τ_k⁺) are evaluated at event times and then used as initial conditions for the evaluations of X'_i(τ_{k+1}⁻), Y'_i(τ_{k+1}⁻), Z'_ij(τ_{k+1}⁻), along with the integrals appearing in Propositions 2 and 3, which can also be evaluated at t = τ_{k+1}. Consequently, this approach is scalable in the number of events in the system as the number of agents and targets increases. Third, despite the elaborate derivations in the Appendix, the actual implementation reflected by the three Propositions is simple. Finally, returning to (29), note that the integrals involving ∇L_1(Θ, t), ∇L_2(Θ, t) are directly obtained from X'_i(t), Y'_i(t), the integral involving ∇L_3(Θ, t) is obtained from straightforward differentiation of (11), and the final term is obtained from Z'_ij(T).
3) Objective Function Optimization:
This is carried out using (24) with an appropriate step-size sequence.
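A minimal sketch of this update step follows; the names `grad_J` and `eta0` are illustrative, and any step-size sequence satisfying the usual stochastic approximation conditions [21] may be substituted for the diminishing sequence used here.

```python
import numpy as np

def optimize_parameters(grad_J, theta0, steps=200, eta0=0.1):
    """Gradient-descent sketch: theta <- theta - eta_k * (IPA gradient estimate).

    grad_J(theta) is assumed to return the IPA estimate of dJ/dTheta obtained
    from one sample path; eta0/(k+1) is one admissible diminishing step size.
    """
    theta = np.asarray(theta0, dtype=float)
    for k in range(steps):
        eta_k = eta0 / (k + 1.0)              # diminishing step-size sequence
        theta = theta - eta_k * np.asarray(grad_J(theta))
    return theta
```

In the experiments of Section V the fixed sequence is replaced by an Armijo step-size rule.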
A. Elliptical Trajectories
Elliptical trajectories are described by their center coordinates, minor and major axes, and orientation. Agent $j$'s position $s_j(t) = [s_j^x(t), s_j^y(t)]$ follows the general parametric equation of the ellipse:
$$s_j^x(t) = A_j + a_j \cos\rho_j(t)\cos\phi_j - b_j \sin\rho_j(t)\sin\phi_j$$
$$s_j^y(t) = B_j + a_j \cos\rho_j(t)\sin\phi_j + b_j \sin\rho_j(t)\cos\phi_j \quad (30)$$
Here, $\Theta_j = [A_j, B_j, a_j, b_j, \phi_j]$, where $A_j$, $B_j$ are the coordinates of the center, $a_j$ and $b_j$ are the major and minor semi-axes respectively, and $\phi_j \in [0, \pi)$ is the ellipse orientation, defined as the angle between the $x$ axis and the major axis of the ellipse. The time-dependent parameter $\rho_j(t)$ is the eccentric anomaly of the ellipse. Since, from (19), the agent moves with constant speed 1 on this trajectory, we have $\dot s_j^x(t)^2 + \dot s_j^y(t)^2 = 1$, which gives
$$\dot\rho_j(t) = \Big[\big(a_j\sin\rho_j(t)\cos\phi_j + b_j\cos\rho_j(t)\sin\phi_j\big)^2 + \big(a_j\sin\rho_j(t)\sin\phi_j - b_j\cos\rho_j(t)\cos\phi_j\big)^2\Big]^{-1/2} \quad (31)$$
In the data harvesting problem, trajectories that do not pass through the base are inadmissible since no data are delivered. Therefore, we add a constraint to force the ellipse to pass through $w_B = [w_B^x, w_B^y]$, i.e., for some $\rho_j(t)$:
$$w_B^x = A_j + a_j\cos\rho_j(t)\cos\phi_j - b_j\sin\rho_j(t)\sin\phi_j$$
$$w_B^y = B_j + a_j\cos\rho_j(t)\sin\phi_j + b_j\sin\rho_j(t)\cos\phi_j \quad (32)$$
Using the fact that $\sin^2\rho_j(t) + \cos^2\rho_j(t) = 1$, we define a quadratic constraint term added to the objective function with a sufficiently large multiplier. This ensures the optimal path passes through the base location.
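Equations (30)-(31) can be sketched directly in code; the helpers below are illustrative (not the paper's implementation) and evaluate the position and the eccentric-anomaly rate for a given parameter vector.

```python
import numpy as np

def ellipse_pos(theta, rho):
    """Eq. (30): position on the ellipse for theta = [A, B, a, b, phi]."""
    A, B, a, b, phi = theta
    sx = A + a * np.cos(rho) * np.cos(phi) - b * np.sin(rho) * np.sin(phi)
    sy = B + a * np.cos(rho) * np.sin(phi) + b * np.sin(rho) * np.cos(phi)
    return sx, sy

def rho_dot(theta, rho):
    """Eq. (31): eccentric-anomaly rate keeping the agent at unit speed."""
    _, _, a, b, phi = theta
    dxdrho = -a * np.sin(rho) * np.cos(phi) - b * np.cos(rho) * np.sin(phi)
    dydrho = -a * np.sin(rho) * np.sin(phi) + b * np.cos(rho) * np.cos(phi)
    return 1.0 / np.hypot(dxdrho, dydrho)
```

For a circle ($a_j = b_j = r$), `rho_dot` reduces to $1/r$, the expected angular rate at unit speed.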
We define $C_j(\Theta_j)$, which appears in (34):
$$C_j(\Theta_j) = \big(1 - f_{1,j}\cos^2\phi_j - f_{2,j}\sin^2\phi_j - f_{3,j}\sin 2\phi_j\big)^2 \quad (33)$$
where
$$f_{1,j} = \Big(\frac{w_B^x - A_j}{a_j}\Big)^2 + \Big(\frac{w_B^y - B_j}{b_j}\Big)^2, \quad f_{2,j} = \Big(\frac{w_B^x - A_j}{b_j}\Big)^2 + \Big(\frac{w_B^y - B_j}{a_j}\Big)^2, \quad f_{3,j} = \frac{(b_j^2 - a_j^2)(w_B^x - A_j)(w_B^y - B_j)}{a_j^2 b_j^2}.$$
Multiple visits to the base may be needed during the mission time $[0, T]$. We capture this by allowing an agent trajectory to consist of a sequence of admissible ellipses. For each agent, we define $E_j$ as the number of ellipses in its trajectory. The parameter vector $\Theta_j^\kappa$, $\kappa = 1, \ldots, E_j$, defines the $\kappa$th ellipse in agent $j$'s trajectory, and $T_j^\kappa$ is the time at which agent $j$ completes ellipse $\kappa$. Therefore, the location of each agent is described through $\Theta_j^\kappa$ during $[T_j^{\kappa-1}, T_j^\kappa]$, where $T_j^0 = 0$. Since we cannot optimize over all possible $E_j$ for all agents, an iterative process needs to be performed in order to find the optimal number of segments in each agent's trajectory. At each step, we fix $E_j$ and find the optimal trajectory with that many segments. The process is stopped once the optimal trajectory with $E_j$ segments is no better than the optimal one with $E_j - 1$ segments (obviously, this is not a globally optimal solution). We can now formulate the parametric optimization problem P2$_e$, where $\Theta_j = [\Theta_j^1, \ldots, \Theta_j^{E_j}]$ and $\Theta = [\Theta_1, \ldots, \Theta_N]$:
$$\min_{\Theta\in F_\Theta} J_e = \frac{1}{T}\int_0^T \big[\alpha J_1(\Theta, t) - (1-\alpha)J_2(\Theta, t) + J_3(\Theta, t)\big]\,dt + M_C \sum_{j=1}^N C_j(\Theta_j) + J_f(\Theta, T) \quad (34)$$
where $M_C$ is a large multiplier. The evaluation of $\nabla C_j$ is straightforward and does not depend on any event (details are shown in Appendix I).

B. Fourier Series Trajectories
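The penalty term of (33) is cheap to evaluate; the sketch below (illustrative names, not the paper's code) returns zero exactly when the base lies on the ellipse and a positive quadratic penalty otherwise.

```python
import numpy as np

def base_penalty(theta, w_base):
    """C_j of Eq. (33): zero iff the base w_B lies on the ellipse theta."""
    A, B, a, b, phi = theta
    u, v = w_base[0] - A, w_base[1] - B
    f1 = (u / a) ** 2 + (v / b) ** 2
    f2 = (u / b) ** 2 + (v / a) ** 2
    f3 = (b ** 2 - a ** 2) * u * v / (a ** 2 * b ** 2)
    g = 1.0 - f1 * np.cos(phi) ** 2 - f2 * np.sin(phi) ** 2 - f3 * np.sin(2 * phi)
    return g ** 2    # quadratic penalty, scaled by M_C in (34)
```

Squaring makes the term differentiable and nonnegative, which is what allows the simple multiplier $M_C$ in (34) to enforce base passage.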
The elliptical trajectories are limited in shape and may not be able to cover many targets in a mission space. Thus, we next parameterize the trajectories using a Fourier series representation of closed curves [22]. Using a Fourier series function for $f$ and $g$ in (22), agent $j$'s trajectory can be described as follows, with base frequencies $f_j^x$ and $f_j^y$:
$$s_j^x(t) = a_{0,j} + \sum_{n=1}^{\Gamma_j^x} a_{n,j}\sin(2\pi n f_j^x \rho_j(t) + \phi_{n,j}^x)$$
$$s_j^y(t) = b_{0,j} + \sum_{n=1}^{\Gamma_j^y} b_{n,j}\sin(2\pi n f_j^y \rho_j(t) + \phi_{n,j}^y) \quad (35)$$
The parameter $\rho_j(t) \in [0, 2\pi]$, similar to elliptical trajectories, represents the position of the agent along the trajectory. In this case, forcing a Fourier series curve to pass through the base is easier. For simplicity, we assume a trajectory starts at the base and set $s_j^x(0) = w_B^x$, $s_j^y(0) = w_B^y$. Assuming $\rho_j(0) = 0$, with no loss of generality, we can calculate the zero-frequency terms by means of the remaining parameters:
$$a_{0,j} = w_B^x - \sum_{n=1}^{\Gamma_j^x} a_{n,j}\sin(\phi_{n,j}^x), \quad b_{0,j} = w_B^y - \sum_{n=1}^{\Gamma_j^y} b_{n,j}\sin(\phi_{n,j}^y) \quad (36)$$
The parameter vector for agent $j$ is $\Theta_j = [f_j^x, a_{1,j}, \ldots, a_{\Gamma_j^x,j}, b_{1,j}, \ldots, b_{\Gamma_j^y,j}, \phi_{1,j}^x, \ldots, \phi_{\Gamma_j^x,j}^x, \phi_{1,j}^y, \ldots, \phi_{\Gamma_j^y,j}^y]$ and $\Theta = [\Theta_1, \ldots, \Theta_N]$. Note that the shape of the curve is fully represented by the ratio $f_j^x/f_j^y$, so one of these can be kept constant. For the Fourier trajectories, the fact that $u_j^* = 1$ allows us to calculate $\dot\rho_j(t)$ as follows:
$$\dot\rho_j(t) = \left[\left(2\pi f_j^x \sum_{n=1}^{\Gamma_j^x} n\,a_{n,j}\cos(2\pi n f_j^x\rho_j(t) + \phi_{n,j}^x)\right)^2 + \left(2\pi f_j^y \sum_{n=1}^{\Gamma_j^y} n\,b_{n,j}\cos(2\pi n f_j^y\rho_j(t) + \phi_{n,j}^y)\right)^2\right]^{-1/2} \quad (37)$$
Problem P2$_f$ is the same as P2 but there are no additional constraints in this case:
$$\min_{\Theta\in F_\Theta} J_f = \frac{1}{T}\int_0^T \big(\alpha J_1(t) - (1-\alpha)J_2(t) + J_3(t)\big)\,dt + J_f(T) \quad (38)$$

V. NUMERICAL RESULTS
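A sketch of (35)-(36) follows; the dictionary layout for the parameters is an assumption for illustration only. The zero-frequency terms are eliminated via the base constraint, so the curve passes through the base at $\rho = 0$ by construction.

```python
import numpy as np

def fourier_pos(params, rho, w_base):
    """Eqs. (35)-(36): Fourier-series trajectory pinned to the base at rho = 0.

    params is assumed to hold 'fx', 'fy' (base frequencies), 'a', 'b'
    (coefficient arrays a_1..a_Gx, b_1..b_Gy) and 'phx', 'phy' (phases).
    """
    n_x = np.arange(1, len(params['a']) + 1)
    n_y = np.arange(1, len(params['b']) + 1)
    # zero-frequency terms from the base constraint, Eq. (36)
    a0 = w_base[0] - np.sum(params['a'] * np.sin(params['phx']))
    b0 = w_base[1] - np.sum(params['b'] * np.sin(params['phy']))
    sx = a0 + np.sum(params['a'] * np.sin(2 * np.pi * n_x * params['fx'] * rho + params['phx']))
    sy = b0 + np.sum(params['b'] * np.sin(2 * np.pi * n_y * params['fy'] * rho + params['phy']))
    return sx, sy
```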
In this section, numerical results are presented to illustrate our approach. We consider 8 targets, 2 agents, and a base as shown in Fig. 2. First, we assume a deterministic arrival process with $\sigma_i = 0.$ for all $i$. For (2) and (3) we have used $p(w, v) = \max(0, 1 - D(w, v)/r)$, where $r$ is the corresponding value of $r_{ij}$ or $r_{Bj}$. We set $\mu_{ij} = 50$ and $\beta_{ij} = 500$ for all $i$ and $j$. Other parameters used are $\alpha = 0.$, $r_{ij} = r_{Bj} = 1$, $M_I = 1$, and $T = 100$, except for the TPBVP case where $T = 30$. Fig. 2 shows the results of the TPBVP, which depend heavily on the initial trajectory; this is the best result among several initializations, obtained after 10,000 iterations of the TPBVP solver. Fig. 3 shows the (locally) optimal trajectory with two ellipses in each agent's trajectory ($E_j = 2$), and Fig. 4 shows a Fourier series representation with 5 terms in (35). Both methods converge in a few iterations, with each iteration taking less than a few seconds. We use the Armijo rule to update the step size in each iteration. The average queue lengths at targets for the TPBVP, the ellipse with $E_j = 2$, and the Fourier series are 52.13, 49.23, and 62.03 respectively, whereas the average throughputs for the three trajectories are 3.76, 4.2, and 3.56 respectively. Although the example is a very symmetric configuration, the benefit of the Fourier series trajectories shows when the targets are randomly positioned: initializing the TPBVP then becomes a very hard task, and ellipses cannot fit all targets.
Based on Corollary 1, our results are independent of the underlying random processes $\{\sigma_i(t)\}$. To verify this property, we model the exact same problem with a uniform distribution for $\sigma_i(t)$ as $U[0., 0.]$. Note that we keep $E[\sigma_i(t)] = 0.$, the same rate as in the deterministic setting. At each iteration we generate a random sample path using the random process with $\sigma_i(t) \sim U[0., 0.]$. The Fourier series trajectories for this stochastic optimization problem are shown in Fig. 5.
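The connectivity model used in these experiments is a linearly decaying probability of communication within range $r$; a minimal sketch (illustrative function name):

```python
import math

def comm_prob(w, v, r):
    """Connectivity quality used in the experiments: max(0, 1 - D(w, v)/r)."""
    d = math.dist(w, v)          # Euclidean distance D(w, v)
    return max(0.0, 1.0 - d / r)
```

The probability is 1 at zero distance, decays linearly, and vanishes at the range boundary, which is what makes $P_{ij}(\tau_k) = 0$ at the $\delta_{ij}^{\pm}$ events in Appendix III.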
Fig. 2. 8 targets, 2 agents, TPBVP trajectories ($T = 30$), $J^* = 15.$
Fig. 3. 8 targets, 2 agents, elliptical trajectories ($T = 100$), $J^* = -.$

$J^* = -.$ compared to $J^* = -.$. The objective function converges almost as quickly, but with some oscillations, as expected.

VI. CONCLUSIONS
We have developed a new method for trajectory planning in the data harvesting problem. An optimal control formulation provides initial insights into the solution, but it is computationally intractable, especially when the data generating processes are stochastic. We propose an agent trajectory parameterization in terms of general function families which are optimized on line through the use of IPA. Explicit results are provided for the cases of elliptical and Fourier series trajectories. We have shown robustness of the solution with respect to stochastic data generation processes by considering stochastic data arrivals at targets. Natural next steps include constraining trajectories to urban settings with obstacles in the mission space.
Fig. 4. 8 targets, 2 agents, Fourier series trajectories ($T = 100$), $J^* = -.$
Fig. 5. 8 targets, 2 agents, random data processes, Fourier series trajectories, $J^* = -.$

APPENDIX I
ELLIPTICAL TRAJECTORIES
In order to calculate the IPA derivatives we need the derivatives of the position state variables with respect to the full parameter vector $\Theta_j = [A_j, B_j, a_j, b_j, \phi_j]$ for all agents $j$. These derivatives do not depend on the events occurring in the system, since the trajectories of the agents are fixed at each iteration. For now we assume $E_j = 1$ for all $j = 1, \ldots, N$ and hence drop the superscript $\kappa$. We have:
$$\frac{\partial s_j^x}{\partial A_j} = 1, \quad \frac{\partial s_j^x}{\partial B_j} = 0 \quad (39)$$
$$\frac{\partial s_j^x}{\partial a_j} = \cos\rho_j(t)\cos\phi_j, \quad \frac{\partial s_j^x}{\partial b_j} = -\sin\rho_j(t)\sin\phi_j \quad (40)$$
$$\frac{\partial s_j^x}{\partial \phi_j} = -a_j\cos\rho_j(t)\sin\phi_j - b_j\sin\rho_j(t)\cos\phi_j \quad (41)$$
$$\frac{\partial s_j^y}{\partial A_j} = 0, \quad \frac{\partial s_j^y}{\partial B_j} = 1 \quad (42)$$
$$\frac{\partial s_j^y}{\partial a_j} = \cos\rho_j(t)\sin\phi_j, \quad \frac{\partial s_j^y}{\partial b_j} = \sin\rho_j(t)\cos\phi_j \quad (43)$$
$$\frac{\partial s_j^y}{\partial \phi_j} = a_j\cos\rho_j(t)\cos\phi_j - b_j\sin\rho_j(t)\sin\phi_j \quad (44)$$
Also, the time derivatives of the position state variables are calculated as:
$$\dot s_j^x(t) = -a_j\dot\rho_j(t)\sin\rho_j(t)\cos\phi_j - b_j\dot\rho_j(t)\cos\rho_j(t)\sin\phi_j \quad (45)$$
$$\dot s_j^y(t) = -a_j\dot\rho_j(t)\sin\rho_j(t)\sin\phi_j + b_j\dot\rho_j(t)\cos\rho_j(t)\cos\phi_j \quad (46)$$
The gradient of the last term of $J_e$ in (34) needs to be calculated separately.
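Partial derivatives of this kind are easy to verify numerically; the sketch below (illustrative names) evaluates (39)-(44) in closed form so they can be cross-checked against a finite difference of (30).

```python
import numpy as np

def ellipse_pos(theta, rho):
    """Eq. (30): agent position for theta = [A, B, a, b, phi]."""
    A, B, a, b, phi = theta
    sx = A + a * np.cos(rho) * np.cos(phi) - b * np.sin(rho) * np.sin(phi)
    sy = B + a * np.cos(rho) * np.sin(phi) + b * np.sin(rho) * np.cos(phi)
    return np.array([sx, sy])

def ellipse_pos_grad(theta, rho):
    """Rows are ds^x/dTheta_j and ds^y/dTheta_j, from Eqs. (39)-(44)."""
    A, B, a, b, phi = theta
    c, s = np.cos(rho), np.sin(rho)
    cp, sp = np.cos(phi), np.sin(phi)
    dsx = np.array([1.0, 0.0, c * cp, -s * sp, -a * c * sp - b * s * cp])
    dsy = np.array([0.0, 1.0, c * sp,  s * cp,  a * c * cp - b * s * sp])
    return np.vstack([dsx, dsy])
```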
For $j \neq l$ we have $\partial C_j/\partial\Theta_l = 0$, and for $j = l$, writing $g_j = 1 - f_{1,j}\cos^2\phi_j - f_{2,j}\sin^2\phi_j - f_{3,j}\sin 2\phi_j$ for the term squared in (33):
$$\frac{\partial C_j}{\partial A_j} = 2g_j\Big(-\cos^2\phi_j\frac{\partial f_{1,j}}{\partial A_j} - \sin^2\phi_j\frac{\partial f_{2,j}}{\partial A_j} - \sin 2\phi_j\frac{\partial f_{3,j}}{\partial A_j}\Big)$$
$$\frac{\partial C_j}{\partial B_j} = 2g_j\Big(-\cos^2\phi_j\frac{\partial f_{1,j}}{\partial B_j} - \sin^2\phi_j\frac{\partial f_{2,j}}{\partial B_j} - \sin 2\phi_j\frac{\partial f_{3,j}}{\partial B_j}\Big)$$
$$\frac{\partial C_j}{\partial a_j} = 2g_j\Big(-\cos^2\phi_j\frac{\partial f_{1,j}}{\partial a_j} - \sin^2\phi_j\frac{\partial f_{2,j}}{\partial a_j} - \sin 2\phi_j\frac{\partial f_{3,j}}{\partial a_j}\Big)$$
$$\frac{\partial C_j}{\partial b_j} = 2g_j\Big(-\cos^2\phi_j\frac{\partial f_{1,j}}{\partial b_j} - \sin^2\phi_j\frac{\partial f_{2,j}}{\partial b_j} - \sin 2\phi_j\frac{\partial f_{3,j}}{\partial b_j}\Big)$$
$$\frac{\partial C_j}{\partial \phi_j} = 2g_j\big((f_{1,j} - f_{2,j})\sin 2\phi_j - 2f_{3,j}\cos 2\phi_j\big)$$
where
$$\frac{\partial f_{1,j}}{\partial A_j} = -\frac{2(w_B^x - A_j)}{a_j^2}, \quad \frac{\partial f_{1,j}}{\partial B_j} = -\frac{2(w_B^y - B_j)}{b_j^2}$$
$$\frac{\partial f_{1,j}}{\partial a_j} = -\frac{2(w_B^x - A_j)^2}{a_j^3}, \quad \frac{\partial f_{1,j}}{\partial b_j} = -\frac{2(w_B^y - B_j)^2}{b_j^3}$$
$$\frac{\partial f_{2,j}}{\partial A_j} = -\frac{2(w_B^x - A_j)}{b_j^2}, \quad \frac{\partial f_{2,j}}{\partial B_j} = -\frac{2(w_B^y - B_j)}{a_j^2}$$
$$\frac{\partial f_{2,j}}{\partial a_j} = -\frac{2(w_B^y - B_j)^2}{a_j^3}, \quad \frac{\partial f_{2,j}}{\partial b_j} = -\frac{2(w_B^x - A_j)^2}{b_j^3}$$
$$\frac{\partial f_{3,j}}{\partial A_j} = -\frac{(b_j^2 - a_j^2)(w_B^y - B_j)}{a_j^2 b_j^2}, \quad \frac{\partial f_{3,j}}{\partial B_j} = -\frac{(b_j^2 - a_j^2)(w_B^x - A_j)}{a_j^2 b_j^2}$$
$$\frac{\partial f_{3,j}}{\partial a_j} = -\frac{2(w_B^x - A_j)(w_B^y - B_j)}{a_j^3}, \quad \frac{\partial f_{3,j}}{\partial b_j} = \frac{2(w_B^x - A_j)(w_B^y - B_j)}{b_j^3}$$

APPENDIX II
FOURIER SERIES TRAJECTORIES
We calculate the derivatives of agent $j$'s position with respect to all the Fourier parameters. The parameter vector is $\Theta_j = [f_j^x, a_{1,j}, \ldots, a_{\Gamma_j^x,j}, b_{1,j}, \ldots, b_{\Gamma_j^y,j}, \phi_{1,j}^x, \ldots, \phi_{\Gamma_j^x,j}^x, \phi_{1,j}^y, \ldots, \phi_{\Gamma_j^y,j}^y]$. We have:
$$\frac{\partial s_j^x}{\partial a_{0,j}} = 1, \quad \frac{\partial s_j^x}{\partial b_{0,j}} = 0 \quad (47)$$
$$\frac{\partial s_j^x}{\partial a_{n,j}} = \sin(2\pi n f_j^x\rho_j(t) + \phi_{n,j}^x), \quad \frac{\partial s_j^x}{\partial b_{n,j}} = 0 \quad (48)$$
$$\frac{\partial s_j^x}{\partial \phi_{n,j}^x} = a_{n,j}\cos(2\pi n f_j^x\rho_j(t) + \phi_{n,j}^x), \quad \frac{\partial s_j^x}{\partial \phi_{n,j}^y} = 0 \quad (49)$$
$$\frac{\partial s_j^x}{\partial f_j^x} = 2\pi\rho_j(t)\sum_{n=1}^{\Gamma_j^x} n\,a_{n,j}\cos(2\pi n f_j^x\rho_j(t) + \phi_{n,j}^x) \quad (50)$$
$$\frac{\partial s_j^y}{\partial b_{0,j}} = 1, \quad \frac{\partial s_j^y}{\partial a_{0,j}} = 0 \quad (51)$$
$$\frac{\partial s_j^y}{\partial b_{n,j}} = \sin(2\pi n f_j^y\rho_j(t) + \phi_{n,j}^y), \quad \frac{\partial s_j^y}{\partial a_{n,j}} = 0 \quad (52)$$
$$\frac{\partial s_j^y}{\partial \phi_{n,j}^y} = b_{n,j}\cos(2\pi n f_j^y\rho_j(t) + \phi_{n,j}^y), \quad \frac{\partial s_j^y}{\partial \phi_{n,j}^x} = 0 \quad (53)$$
$$\frac{\partial s_j^y}{\partial f_j^x} = 0 \quad (54)$$
Also, the time derivatives of the position state variables are calculated as:
$$\dot s_j^x(t) = \dot\rho_j(t)\sum_{n=1}^{\Gamma_j^x} 2\pi n f_j^x a_{n,j}\cos(2\pi n f_j^x\rho_j(t) + \phi_{n,j}^x) \quad (55)$$
$$\dot s_j^y(t) = \dot\rho_j(t)\sum_{n=1}^{\Gamma_j^y} 2\pi n f_j^y b_{n,j}\cos(2\pi n f_j^y\rho_j(t) + \phi_{n,j}^y) \quad (56)$$

APPENDIX III
IPA EVENTS AND DERIVATIVES
In this section, we derive all event time derivatives and state derivatives with respect to the controllable parameter $\Theta$ for each event by applying the IPA equations.

1. Event $\xi_i^0$: This event causes a transition from $X_i(t) > 0$, $t < \tau_k$, to $X_i(t) = 0$, $t \geq \tau_k$. The switching function is $g_k(\Theta, X) = X_i$, so $\partial g_k/\partial X_i = 1$. From (25) and (4):
$$\tau_k' = -\Big(\frac{\partial g_k}{\partial X_i} f_k(\tau_k^-)\Big)^{-1}\Big(\frac{\partial g_k}{\partial\Theta} + \frac{\partial g_k}{\partial X_i}X_i'(\tau_k^-)\Big) = \frac{-X_i'(\tau_k^-)}{\sigma_i(\tau_k) - \mu_{ij}P_{ij}(\tau_k)} \quad (57)$$
where agent $j$ is the one connected to $i$ at $t = \tau_k$, and we have used the assumption that two events occur at the same time w.p. 0, hence $\sigma_i(\tau_k^-) = \sigma_i(\tau_k)$. From (26)-(27), since $\dot X_i(t) = 0$ for $\tau_k \leq t < \tau_{k+1}$:
$$\frac{d}{dt}X_i'(t) = \frac{\partial \dot X_i(t)}{\partial X_i(t)}X_i'(t) + \frac{\partial \dot X_i(t)}{\partial\Theta} = 0 \quad (58)$$
$$X_i'(\tau_k^+) = X_i'(\tau_k^-) + \big[\sigma_i(\tau_k) - \mu_{ij}P_{ij}(\tau_k) - 0\big]\tau_k' = X_i'(\tau_k^-) - X_i'(\tau_k^-)\frac{\sigma_i(\tau_k) - \mu_{ij}P_{ij}(\tau_k)}{\sigma_i(\tau_k) - \mu_{ij}P_{ij}(\tau_k)} = 0 \quad (59)$$
For $X_r(t)$, $r \neq i$, the dynamics of $X_r(t)$ in (4) are unaffected and we have:
$$X_r'(\tau_k^+) = X_r'(\tau_k^-) \quad (60)$$
If $X_r(\tau_k) > 0$ and agent $l$ is connected to it, then
$$\frac{d}{dt}X_r'(t) = \frac{\partial \dot X_r(t)}{\partial X_r(t)}X_r'(t) + \frac{\partial \dot X_r(t)}{\partial\Theta} = \frac{\partial}{\partial\Theta}\big(\sigma_r(t) - \mu_{rl}P_{rl}(t)\big) = -\mu_{rl}P_{rl}'(t) \quad (61)$$
and if $X_r(t) = 0$ in $[\tau_k, \tau_{k+1}]$, or if no agent is connected to $r$, then $\frac{d}{dt}X_r'(t) = 0$.
For $Y_r(t)$, $r = 1, \ldots, M$, the dynamics of $Y_r(t)$ in (7) are not affected by the event $\xi_i^0$ at $\tau_k$, hence
$$Y_r'(\tau_k^+) = Y_r'(\tau_k^-) \quad (62)$$
and since $\dot Y_r(t) = \beta_r(t)$, for $\tau_k \leq t < \tau_{k+1}$:
$$\frac{d}{dt}Y_r'(t) = \frac{\partial \dot Y_r(t)}{\partial Y_r(t)}Y_r'(t) + \frac{\partial \dot Y_r(t)}{\partial\Theta} = \beta_r'(t) \quad (63)$$
For $Z_{ij}(t)$, we must have $Z_{ij}(\tau_k) > 0$ since $X_i(\tau_k^-) > 0$, hence $\tilde\mu_{ij}(\tau_k^-) > 0$, and from (27):
$$Z_{ij}'(\tau_k^+) = Z_{ij}'(\tau_k^-) + \big[\dot Z_{ij}(\tau_k^-) - \dot Z_{ij}(\tau_k^+)\big]\tau_k' = Z_{ij}'(\tau_k^-) + \big[\tilde\mu_{ij}(\tau_k^-) - \tilde\mu_{ij}(\tau_k^+)\big]P_{ij}(\tau_k)\tau_k' \quad (64)$$
Since $X_i(\tau_k^-) > 0$, from (5) we have $\tilde\mu_{ij}(\tau_k^-) = \mu_{ij}$. At $\tau_k^+$, $j$ remains connected to target $i$ with $\tilde\mu_{ij}(\tau_k^+) = \sigma_i(\tau_k^+)/P_{ij}(\tau_k) = \sigma_i(\tau_k)/P_{ij}(\tau_k)$, and we get
$$Z_{ij}'(\tau_k^+) = Z_{ij}'(\tau_k^-) + \frac{-X_i'(\tau_k^-)\big[\mu_{ij}P_{ij}(\tau_k) - \sigma_i(\tau_k)\big]}{\sigma_i(\tau_k) - \mu_{ij}P_{ij}(\tau_k)} = Z_{ij}'(\tau_k^-) + X_i'(\tau_k^-) \quad (65)$$
From (26), for $\tau_k \leq t < \tau_{k+1}$:
$$\frac{d}{dt}Z_{ij}'(t) = \frac{\partial \dot Z_{ij}(t)}{\partial Z_{ij}(t)}Z_{ij}'(t) + \frac{\partial \dot Z_{ij}(t)}{\partial\Theta} = \frac{\partial}{\partial\Theta}\big(\tilde\mu_{ij}(t)P_{ij}(t) - \beta_{ij}P_{Bj}(t)\big) \quad (66)$$
Since $\tilde\mu_{ij}(t) = \sigma_i(t)/P_{ij}(t)$ for the agent which remains connected to target $i$ after this event, it follows that $\frac{\partial}{\partial\Theta}[\tilde\mu_{ij}(t)P_{ij}(t)] = 0$. Moreover, $P_{Bj}(t) = 0$ by our assumption that agents cannot be within range of the base and targets at the same time, and we get
$$\frac{d}{dt}Z_{ij}'(t) = 0 \quad (67)$$
Otherwise, for $r \neq j$, we have $\tilde\mu_{ir}(t) = 0$ and we get:
$$\frac{d}{dt}Z_{ir}'(t) = -\beta_{ir}P_{Br}'(t) \quad (68)$$
Finally, for $Z_{rj}(t)$, $r \neq i$, we have $Z_{rj}'(\tau_k^+) = Z_{rj}'(\tau_k^-)$. If $Z_{rj}(t) = 0$ in $[\tau_k, \tau_{k+1})$, then $\frac{d}{dt}Z_{rj}'(t) = 0$; otherwise, we get $\frac{d}{dt}Z_{rj}'(t)$ from (66) with $i$ replaced by $r$.

2. Event $\xi_i^+$: This event causes a transition from $X_i(t) = 0$, $t \leq \tau_k$, to $X_i(t) > 0$, $t > \tau_k$.
Note that this transition can occur as an exogenous event when an empty queue $X_i(t)$ receives a new arrival, in which case we simply have $\tau_k' = 0$, since the exogenous event is independent of the controllable parameters. In the endogenous case, however, the switching function is $g_k(\Theta, X) = \sigma_i(t) - \mu_{ij}P_{ij}(t)$, where agent $j$ is connected to target $i$ at $t = \tau_k$. Writing $\frac{\partial s_j}{\partial\Theta} = \big[\frac{\partial s_j^x}{\partial\Theta}\ \frac{\partial s_j^y}{\partial\Theta}\big]^\top$ and $\dot s_j = [\dot s_j^x\ \dot s_j^y]^\top$, from (25):
$$\tau_k' = -\Big(\frac{\partial g_k}{\partial s_j}\frac{\partial s_j}{\partial\Theta}\Big)\Big(\frac{\partial g_k}{\partial s_j}\dot s_j(\tau_k)\Big)^{-1} \quad (69)$$
At $\tau_k$ we have $\sigma_i(\tau_k) = \mu_{ij}P_{ij}(\tau_k)$. Therefore, from (27):
$$X_i'(\tau_k^+) = X_i'(\tau_k^-) + [\dot X_i(\tau_k^-) - \dot X_i(\tau_k^+)]\tau_k' = X_i'(\tau_k^-) + \big(-\sigma_i(\tau_k) + \mu_{ij}P_{ij}(\tau_k)\big)\tau_k' = X_i'(\tau_k^-) \quad (70)$$
Since $X_i(t) > 0$ in $[\tau_k, \tau_{k+1})$, we know $\dot X_i(t) = \sigma_i(t) - \mu_{ij}P_{ij}(t)$; therefore, we can get $\frac{d}{dt}X_i'(t)$ from (61) with $r$ and $l$ replaced by $i$ and $j$. For $X_r(t)$, $r \neq i$, if $X_r(\tau_k) > 0$ and agent $l$ is connected to $r$, then $\dot X_r(\tau_k) = \sigma_r(\tau_k) - \mu_{rl}P_{rl}(\tau_k)$; therefore, we get $X_r'(\tau_k^+)$ from (60), while in $[\tau_k, \tau_{k+1})$ we have $\frac{d}{dt}X_r'(t)$ from (61). If $X_r(\tau_k) = 0$ or if no agent is connected to target $r$, then $\dot X_r(\tau_k) = 0$; thus, $X_r'(\tau_k^+) = X_r'(\tau_k^-)$ and $\frac{d}{dt}X_r'(t) = 0$.
For $Y_r(t)$, $r = 1, \ldots, M$, the dynamics of $Y_r(t)$ in (7) are not affected by the event at $\tau_k$; hence, we can get $Y_r'(\tau_k^+)$ and $\frac{d}{dt}Y_r'(t)$ in $[\tau_k, \tau_{k+1})$ from (62) and (63) respectively. For $Z_{ij}(t)$, with agent $j$ being the one connected to target $i$, we have:
$$Z_{ij}'(\tau_k^+) = Z_{ij}'(\tau_k^-) + [\dot Z_{ij}(\tau_k^-) - \dot Z_{ij}(\tau_k^+)]\tau_k' = Z_{ij}'(\tau_k^-) + [\tilde\mu_{ij}(\tau_k^-) - \tilde\mu_{ij}(\tau_k^+)]P_{ij}(\tau_k)\tau_k' = Z_{ij}'(\tau_k^-) \quad (71)$$
In the above equation, $\tilde\mu_{ij}(\tau_k^+) = \mu_{ij}$ because $X_i(\tau_k^+) > 0$; also, $\mu_{ij}P_{ij}(\tau_k) = \sigma_i(\tau_k)$ and $\tilde\mu_{ij}(\tau_k^-) = \sigma_i(\tau_k)/P_{ij}(\tau_k)$ yield $\tilde\mu_{ij}(\tau_k^-) = \mu_{ij}$, so the bracket vanishes. For $Z_{il}(t)$, $l \neq j$, agent $l$ cannot be connected to target $i$ at $\tau_k$, so we have $Z_{il}'(\tau_k^+) = Z_{il}'(\tau_k^-)$ and $\frac{d}{dt}Z_{il}'(t) = 0$ in $[\tau_k, \tau_{k+1})$. For $Z_{rl}(t)$, $r \neq i$ and $l \neq j$, using the assumption that two events occur at the same time w.p. 0, the dynamics of $Z_{rl}(t)$ are not affected at $\tau_k$; hence we get $\frac{d}{dt}Z_{rl}'(t)$ from (66) with $i$ and $j$ replaced by $r$ and $l$.

3. Event $\zeta_{ij}^0$: This event causes a transition from $Z_{ij}(t) > 0$ for $t < \tau_k$ to $Z_{ij}(t) = 0$ for $t \geq \tau_k$. The switching function is $g_k(\Theta, X) = Z_{ij}(t)$, so $\partial g_k/\partial Z_{ij} = 1$. From (25):
$$\tau_k' = -\Big(\frac{\partial g_k}{\partial Z_{ij}}f_k(\tau_k^-)\Big)^{-1}\Big(\frac{\partial g_k}{\partial\Theta} + \frac{\partial g_k}{\partial Z_{ij}}Z_{ij}'(\tau_k^-)\Big) = \frac{-Z_{ij}'(\tau_k^-)}{\tilde\mu_{ij}(\tau_k^-)P_{ij}(\tau_k^-) - \beta_{ij}P_{Bj}(\tau_k^-)} = \frac{Z_{ij}'(\tau_k^-)}{\beta_{ij}P_{Bj}(\tau_k)} \quad (72)$$
Since $Z_{ij}(t)$ is being emptied at $\tau_k$, by the assumption that agents cannot be within range of the base and targets at the same time, we have $P_{ij}(\tau_k) = 0$.
Then from (27):
$$Z_{ij}'(\tau_k^+) = Z_{ij}'(\tau_k^-) + \big[-\beta_{ij}P_{Bj}(\tau_k) - 0\big]\tau_k' = Z_{ij}'(\tau_k^-) - \beta_{ij}P_{Bj}(\tau_k)\frac{Z_{ij}'(\tau_k^-)}{\beta_{ij}P_{Bj}(\tau_k)} = 0 \quad (73)$$
Since $\dot Z_{ij}(t) = 0$ in $[\tau_k, \tau_{k+1})$:
$$\frac{d}{dt}Z_{ij}'(t) = \frac{\partial \dot Z_{ij}(t)}{\partial Z_{ij}(t)}Z_{ij}'(t) + \frac{\partial \dot Z_{ij}(t)}{\partial\Theta} = 0 \quad (74)$$
For $Z_{rl}(t)$, $r \neq i$ or $l \neq j$, the dynamics in (6) are not affected at $\tau_k$, hence:
$$Z_{rl}'(\tau_k^+) = Z_{rl}'(\tau_k^-) \quad (75)$$
If $Z_{rl}(\tau_k) > 0$, the value of $\frac{d}{dt}Z_{rl}'(t)$ is calculated from (66) with $r$ and $l$ replacing $i$ and $j$ respectively; if $Z_{rl}(\tau_k) = 0$, then $\frac{d}{dt}Z_{rl}'(t) = 0$.
For $Y_i(t)$ we have $\beta_i(\tau_k^+) = 0$, since the agent has emptied its queue, hence:
$$Y_i'(\tau_k^+) = Y_i'(\tau_k^-) + \big[\dot Y_i(\tau_k^-) - \dot Y_i(\tau_k^+)\big]\tau_k' = Y_i'(\tau_k^-) + \beta_{ij}P_{Bj}(\tau_k)\frac{Z_{ij}'(\tau_k^-)}{\beta_{ij}P_{Bj}(\tau_k)} = Y_i'(\tau_k^-) + Z_{ij}'(\tau_k^-) \quad (76)$$
In $[\tau_k, \tau_{k+1})$ we get $\frac{d}{dt}Y_i'(t) = 0$. For $Y_r(t)$, $r \neq i$, the dynamics of $Y_r(t)$ in (7) are not affected by the event at $\tau_k$; hence $Y_r'(\tau_k^+)$ and $\frac{d}{dt}Y_r'(t)$ in $[\tau_k, \tau_{k+1})$ are calculated from (62) and (63) respectively. The dynamics of $X_r(t)$, $r = 1, \ldots, M$, are not affected at $\tau_k$, since the event is happening at the base. We have $X_r'(\tau_k^+) = X_r'(\tau_k^-)$. If $X_r(\tau_k) > 0$, then we have $\frac{d}{dt}X_r'(t)$ from (61), and if $X_r(\tau_k) = 0$, then $\frac{d}{dt}X_r'(t) = 0$ in $[\tau_k, \tau_{k+1})$.

4. Event $\delta_{ij}^+$: This event causes a transition from $D_{ij}^+(t) = 0$ for $t \leq \tau_k$ to $D_{ij}^+(t) > 0$ for $t > \tau_k$; it is the moment when agent $j$ leaves target $i$'s range. The switching function is $g_k(\Theta, X) = D_{ij}(t) - r_{ij}$, and from (25):
$$\tau_k' = -\Big(\frac{\partial D_{ij}}{\partial s_j}\frac{\partial s_j}{\partial\Theta}\Big)\Big(\frac{\partial D_{ij}}{\partial s_j}\dot s_j(\tau_k)\Big)^{-1} \quad (77)$$
If agent $j$ was connected to target $i$ at $\tau_k$, then by leaving the target it is possible that another agent $l$ which is within range of target $i$ connects to that target. This means $\dot X_i(\tau_k^+) = \sigma_i(\tau_k) - \mu_{il}P_{il}(\tau_k)$ and $\dot X_i(\tau_k^-) = \sigma_i(\tau_k) - \mu_{ij}P_{ij}(\tau_k)$ with $P_{ij}(\tau_k) = 0$; from (27) we have
$$X_i'(\tau_k^+) = X_i'(\tau_k^-) + \mu_{il}P_{il}(\tau_k)\tau_k' \quad (78)$$
If $X_i(\tau_k) > 0$, $\frac{d}{dt}X_i'(t)$ in $[\tau_k, \tau_{k+1})$ is as in (61) with $r$ replaced by $i$, and if $X_i(\tau_k) = 0$, then $\frac{d}{dt}X_i'(t) = 0$. On the other hand, if agent $j$ was not connected to target $i$ at $\tau_k$, we know that some $l \neq j$ is already connected to target $i$. This means agent $j$ leaving target $i$ cannot affect the dynamics of $X_i(t)$, so we have $X_i'(\tau_k^+) = X_i'(\tau_k^-)$ and $\frac{d}{dt}X_i'(t)$ is calculated from (61) with $r$ replaced by $i$.
For $X_r(t)$, $r \neq i$, the dynamics in (4) are not affected by the event at $\tau_k$; hence, we get $X_r'(\tau_k^+)$ from (60). If $X_r(\tau_k) > 0$, the time derivative $\frac{d}{dt}X_r'(t)$ in $[\tau_k, \tau_{k+1})$ can be calculated from (61), and if $X_r(\tau_k) = 0$, then $\frac{d}{dt}X_r'(t) = 0$.
For $Y_r(t)$, $r = 1, \ldots, M$, the dynamics in (7) are also not affected by the event at $\tau_k$; hence, we get $Y_r'(\tau_k^+)$ from (62), and in $[\tau_k, \tau_{k+1})$, $\frac{d}{dt}Y_r'(t)$ is calculated from (63).
For $Z_{ij}(t)$, the dynamics in (6) are not affected at $\tau_k$, regardless of whether agent $j$ is connected to target $i$ or not. We have $\dot Z_{ij}(\tau_k^-) = \tilde\mu_{ij}(\tau_k)P_{ij}(\tau_k)$ with $P_{ij}(\tau_k) = 0$ and $\dot Z_{ij}(\tau_k^+) = 0$, hence from (27):
$$Z_{ij}'(\tau_k^+) = Z_{ij}'(\tau_k^-) + \big[\dot Z_{ij}(\tau_k^-) - \dot Z_{ij}(\tau_k^+)\big]\tau_k' = Z_{ij}'(\tau_k^-) + \tilde\mu_{ij}(\tau_k)P_{ij}(\tau_k)\tau_k' = Z_{ij}'(\tau_k^-) \quad (79)$$
and in $[\tau_k, \tau_{k+1})$ we have $\frac{d}{dt}Z_{ij}'(t) = 0$ from (66), knowing that $P_{ij}(\tau_k) = P_{Bj}(\tau_k) = 0$. For $Z_{rl}(t)$, $r \neq i$ or $l \neq j$, the dynamics of $Z_{rl}(t)$ are not affected at $\tau_k$, hence (75) holds, and in $[\tau_k, \tau_{k+1})$ we can again use (66) with $i$ and $j$ replaced by $r$ and $l$.

5. Event $\delta_{ij}^-$: This event causes a transition from $D_{ij}^+(t) > 0$ for $t < \tau_k$ to $D_{ij}^+(t) = 0$ for $t \geq \tau_k$.
It is the moment when agent $j$ enters target $i$'s range. The switching function is $g_k(\Theta, X) = D_{ij}(t) - r_{ij}$; from (25) we get $\tau_k'$ as in (77). If no other agent is already connected to target $i$, agent $j$ connects to it; otherwise, no connection is established. For $X_i(t)$, the dynamics in (4) are not affected in either case; hence (70) holds. If $X_i(t) > 0$ in $[\tau_k, \tau_{k+1})$, we calculate $\frac{d}{dt}X_i'(t)$ using (61) with $l$ being the agent connected to target $i$; if $X_i(\tau_k^-) = 0$, $\frac{d}{dt}X_i'(t) = 0$. For $X_r(t)$, $r \neq i$, the dynamics in (4) are not affected by the event at $\tau_k$; hence, we get $X_r'(\tau_k^+)$ from (60). If $X_r(\tau_k) > 0$, we calculate $\frac{d}{dt}X_r'(t)$ from (61) with $i$ replaced by $r$, and if $X_r(\tau_k) = 0$, then $\frac{d}{dt}X_r'(t) = 0$.
For $Y_r(t)$, $r = 1, \ldots, M$, the dynamics in (7) are again not affected at $\tau_k$, so both (62) and (63) hold.
For $Z_{ij}(t)$, whether agent $j$ is connected to target $i$ at $\tau_k$ or not, the dynamics of $Z_{ij}(t)$ are unaffected at $\tau_k$; hence (75) holds for $i$ and $j$, and in $[\tau_k, \tau_{k+1})$, $\frac{d}{dt}Z_{ij}'(t)$ is calculated through (66). For $Z_{rl}(t)$, $r \neq i$ or $l \neq j$, the dynamics are unaffected and (75) holds again; in $[\tau_k, \tau_{k+1})$, $\frac{d}{dt}Z_{rl}'(t)$ is given through (66) with $i$ and $j$ replaced by $r$ and $l$.

6. Event $\Delta_j^+$: This event causes a transition from $D_{Bj}^+(t) = 0$ for $t \leq \tau_k$ to $D_{Bj}^+(t) > 0$ for $t > \tau_k$. The switching function is $g_k(\Theta, X) = D_{Bj}(t) - r_{Bj}$, so
$$\tau_k' = -\Big(\frac{\partial D_{Bj}}{\partial s_j}\frac{\partial s_j}{\partial\Theta}\Big)\Big(\frac{\partial D_{Bj}}{\partial s_j}\dot s_j(\tau_k)\Big)^{-1} \quad (80)$$
Similar to the previous event, the dynamics of $X_i(t)$ are unaffected at $\tau_k$; hence $X_i'(\tau_k^+)$ is calculated from (70). If $X_i(t) > 0$ in $[\tau_k, \tau_{k+1})$, we calculate $\frac{d}{dt}X_i'(t)$ through (61), and if $X_i(\tau_k^-) = 0$, $\frac{d}{dt}X_i'(t) = 0$.
For $Y_r(t)$, $r = 1, \ldots, M$, the dynamics of $Y_r(t)$ in (7) are not affected at $\tau_k$; hence, we get $Y_r'(\tau_k^+)$ from (62), and in $[\tau_k, \tau_{k+1})$, $\frac{d}{dt}Y_r'(t)$ is calculated from (63).
For $Z_{ij}(t)$, using the fact that agent $j$ can only be connected to one target or the base, we have $\dot Z_{ij}(\tau_k^-) = -\beta_{ij}P_{Bj}(\tau_k)$ with $P_{Bj}(\tau_k) = 0$ and $\dot Z_{ij}(\tau_k^+) = 0$; hence (75) holds with $i$ and $j$ replacing $r$ and $l$. In $[\tau_k, \tau_{k+1})$, from (26):
$$\frac{d}{dt}Z_{ij}'(t) = \frac{\partial \dot Z_{ij}(t)}{\partial Z_{ij}(t)}Z_{ij}'(t) + \frac{\partial \dot Z_{ij}(t)}{\partial\Theta} = -\beta_{ij}P_{Bj}'(t) \quad (81)$$
As for $Z_{rl}(t)$, $r \neq i$ or $l \neq j$, the dynamics are unaffected, so (75) holds; in $[\tau_k, \tau_{k+1})$ we can calculate $\frac{d}{dt}Z_{rl}'(t)$ through (66) with $j$ replaced by $l$.

7. Event $\Delta_j^-$: This event causes a transition from $D_{Bj}^+(t) > 0$ for $t < \tau_k$ to $D_{Bj}^+(t) = 0$ for $t \geq \tau_k$. The switching function is $g_k(\Theta, X) = D_{Bj}(t) - r_{Bj}$; using (25) we get $\tau_k'$ from (80). As in the previous event, we have $X_i'(\tau_k^+)$ from (70); if $X_i(t) > 0$ we get $\frac{d}{dt}X_i'(t)$ from (61), and if $X_i(\tau_k^-) = 0$, then $\frac{d}{dt}X_i'(t) = 0$.
For $Y_r(t)$, $r = 1, \ldots, M$, we again follow the previous event's analysis, so (62) and (63) hold.
For $Z_{ij}(t)$, the analysis is similar to event $\Delta_j^+$, so we can calculate $Z_{ij}'(\tau_k^+)$ and $\frac{d}{dt}Z_{ij}'(t)$ in $[\tau_k, \tau_{k+1})$ from (71) and (66) respectively. Also, for $Z_{rl}(t)$, $r \neq i$ or $l \neq j$, (75) holds by the same reasoning as in the previous event; in $[\tau_k, \tau_{k+1})$ we calculate $\frac{d}{dt}Z_{rl}'(t)$ from (66).

REFERENCES
[1] P. Corke, T. Wark, R. Jurdak, W. Hu, P. Valencia, and D. Moore, "Environmental wireless sensor networks," in
Proc. of the IEEE, vol. 98, pp. 1903–1917, 2010.
[2] R. N. Smith, M. Schwager, S. L. Smith, D. Rus, and G. S. Sukhatme, "Persistent ocean monitoring with underwater gliders: Towards accurate reconstruction of dynamic ocean processes," in Proc. IEEE Int. Conf. on Robotics and Automation, pp. 1517–1524, 2011.
[3] Z. Tang and U. Özgüner, "Motion planning for multitarget surveillance with mobile sensor agents," IEEE Trans. on Robotics, vol. 21, pp. 898–908, 2005.
[4] M. Zhong and C. G. Cassandras, "Distributed coverage control and data collection with mobile sensor networks," IEEE Trans. on Automatic Control, vol. 56, no. 10, pp. 2445–2455, 2011.
[5] K. Chakrabarty, S. S. Iyengar, H. Qi, and E. Cho, "Grid coverage for surveillance and target location in distributed sensor networks," IEEE Trans. on Computers, vol. 51, no. 12, pp. 1448–1453, 2002.
[6] M. Cardei, M. T. Thai, Y. Li, and W. Wu, "Energy-efficient target coverage in wireless sensor networks," pp. 1976–1984, 2005.
[7] S. Alamdari, E. Fata, and S. L. Smith, "Persistent monitoring in discrete environments: Minimizing the maximum weighted latency between observations," The Int. J. of Robotics Research, 2013.
[8] C. G. Cassandras, X. Lin, and X. Ding, "An optimal control approach to the multi-agent persistent monitoring problem," IEEE Trans. on Automatic Control, vol. 58, pp. 947–961, April 2013.
[9] D. Panagou, M. Turpin, and V. Kumar, "Decentralized goal assignment and trajectory generation in multi-robot networks," CoRR, vol. abs/1402.3735, 2014.
[10] A. T. Klesh, P. T. Kabamba, and A. R. Girard, "Path planning for cooperative time-optimal information collection," Proc. of the American Control Conf., pp. 1991–1996, 2008.
[11] J. L. Ny, M. A. Dahleh, E. Feron, and E. Frazzoli, "Continuous path planning for a data harvesting mobile server," Proc. of the IEEE Conf. on Decision and Control, pp. 1489–1494, 2008.
[12] R. Moazzez-Estanjini and I. C. Paschalidis, "On delay-minimized data harvesting with mobile elements in wireless sensor networks," Ad Hoc Networks, vol. 10, pp. 1191–1203, 2012.
[13] M. Roscia, M. Longo, and G. Lazaroiu, "Smart City by multi-agent systems," no. October, pp. 20–23, 2013.
[14] K. Akkaya and M. Younis, "A survey on routing protocols for wireless sensor networks," Ad Hoc Networks, vol. 3, pp. 325–349, 2005.
[15] M. Liu, Y. Yang, and Z. Qin, "A survey of routing protocols and simulations in delay-tolerant networks," Lecture Notes in Computer Science, vol. 6843 LNCS, pp. 243–253, 2011.
[16] C. Chang, G. Yu, T. Wang, and C. Lin, "Path construction and visit scheduling for targets using data mules," IEEE Trans. on Systems, Man, and Cybernetics: Systems, vol. 44, no. 10, pp. 1289–1300, 2014.
[17] X. Lin and C. G. Cassandras, "Trajectory optimization for multi-agent persistent monitoring in two-dimensional spaces," in Proc. IEEE 53rd Annual Conf. on Decision and Control (CDC), 2014.
[18] C. G. Cassandras, Y. Wardi, C. G. Panayiotou, and C. Yao, "Perturbation analysis and optimization of stochastic hybrid systems," European Journal of Control, vol. 16, no. 6, pp. 642–661, 2010.
[19] A. E. Bryson and Y. C. Ho, Applied Optimal Control: Optimization, Estimation and Control. CRC Press, 1975.
[20] X. Lin and C. G. Cassandras, "An optimal control approach to the multi-agent persistent monitoring problem in two-dimensional spaces," IEEE Trans. on Automatic Control, vol. 60, pp. 1659–1664, June 2015.
[21] H. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2003.
[22] C. T. Zahn and R. Z. Roskies, "Fourier descriptors for plane closed curves,"