Task Planning on Stochastic Aisle Graphs for Precision Agriculture
Xinyue Kan, Thomas C. Thayer, Stefano Carpin, Konstantinos Karydis
TTask Planning on Stochastic Aisle Graphs for Precision Agriculture
Xinyue Kan, Thomas C. Thayer, Stefano Carpin, and Konstantinos Karydis Abstract — This work addresses task planning under uncer-tainty for precision agriculture applications whereby task costsare uncertain and the gain of completing a task is propor-tional to resource consumption (such as water consumption inprecision irrigation). The goal is to complete all tasks whileprioritizing those that are more urgent, and subject to diversebudget thresholds and stochastic costs for tasks. To describeagriculture-related environments that incorporate stochasticcosts to complete tasks, a new Stochastic-Vertex-Cost AisleGraph (SAG) is introduced. Then, a task allocation algorithm,termed Next-Best-Action Planning (NBA-P), is proposed. NBA-P utilizes the underlying structure enabled by SAG, and tacklesthe task planning problem by simultaneously determining theoptimal tasks to perform and an optimal time to exit (i.e.return to a base station), at run-time. The proposed approachis tested with both simulated data and real-world experimentaldatasets collected in a commercial vineyard, in both single- andmulti-robot scenarios. In all cases, NBA-P outperforms otherevaluated methods in terms of return per visited vertex, wastedresources resulting from aborted tasks (i.e. when a budgetthreshold is exceeded), and total visited vertices.
I. INTRODUCTIONAutonomous agricultural mobile robots are becoming in-creasingly more capable at performing persistent missions such as monitoring crop health indices [1] and sampling spec-imens [2] across extended spatio-temporal scales to enhanceefficiency and productivity in precision agriculture [3]. Anautonomous robot (or a team of them) needs to perform cer-tain tasks in distinct locations of the operating environmentsubject to a specific budget [4] on the actions the robot cantake (e.g., a maximum capacity of soil samples to carry [5]).During in-field operations, the actual costs to complete taskscan be uncertain whereas expected costs may be known. Inaddition, some tasks can be more urgent than others, hencethey will have to be prioritized. It is often the case [3, 6, 7]that there exists some prior information about a required task(e.g., older measurements of soil moisture [8]) that can biasrobot task assignment(s). Hence, it is necessary to developapproaches that utilize limited prior information to plan taskswith uncertain costs and priority level.There exist two key challenges for efficient robot task allo-cation in precision agriculture. First, prior maps can indicatebiases in task assignments, but may not be trustworthy. Thisis because conditions in the agricultural field can changerapidly [9], are dynamic [10, 11], and may be hard to predict Dept. of Electrical and Computer Engineering, University of California,Riverside. Email: { xkan001, karydis } @ucr.edu. Dept. of Computer Science and Engineering, University of California,Merced. Email: { tthayer, scarpin } @ucmerced.edu.We gratefully acknowledge the support of NSF under grants ahead of time [12]. Second, as the budget is being depleted,the robot needs to periodically return to a base station (e.g.,to drop collected samples and/or recharge). Addressing thesetwo challenges simultaneously poses a two-layer intertwineddecision making under uncertainty problem: How to performoptimal sampling given an approximate prior map, and howto decide an optimal stopping time (i.e. to return to base) toavoid exceeding a given task capacity?
This paper introducesa new stochastic task allocation algorithm to balance optimalsampling and optimal stopping when task costs are uncertain.A direct approach for persistent sampling (and/or moni-toring) is to survey the entire space and perform the desiredtask(s) sequentially [13–15]. The main drawback is that therobot would then exhaustively visit all desired samplinglocations in the environment without prioritizing locationsthat would yield a higher gain or would be more time-critical.Orienteering [16–19] can address part of this drawback bydetermining paths that maximize the cumulative gain under aconstant budget. The robot prioritizes visiting adjacent loca-tions if they jointly yield higher gains than isolated high-gainlocations, and provided that any budget constraints are notviolated [16, 17]. However, this strategy can be insufficientfor missions where some tasks are more urgent than others.For instance, several existing robot task allocation strategies,albeit for distinct application domains [20–23], typicallyconsider a deadline [24] or user-defined importance levels.In precision agriculture, overhead imagery (e.g., thermalimaging) can help pinpoint locations that appear to be underwater stress [9], in which case sampling leaves or soil inthose areas should be prioritized. We formalize the notion oftasks with distinct urgency (e.g., a closer deadline or greaterimportance) by assigning a priority level [25, 26] to tasks.Besides the task priority level, deciding a next task fora robot to complete is also dependent on available budget,which can be of multiple types. For instance, the numberof locations that a robot can visit and sample from in one‘trip’ is constrained by both the energy capacity to movebetween locations and the robot’s sample payload capacity.Exceeding the energy budget can prevent the robot fromreturning to the base station to recharge and drop collectedsamples, whereas exceeding the sample payload capacity maycause potential robot and sample damage. Here we consideran energy budget for the robot moving between locations, anda resource budget linked to task execution. The two budgetsare independent of each other, and both can be reset to theirinitial values when the robot returns to the base station. Theactual amount of resources consumed to execute a task candiffer from what is the expectation in practice. In fact, theactual amount of resources consumed for task execution isrevealed only after the task has been completed. To model a r X i v : . [ c s . R O ] F e b his, we consider the cost to complete a task to be a stochasticrandom variable that follows some known distribution. Thecost to move between locations, however, is considered to bedeterministic [8]. Specific details are given in the following.This paper introduces a new stochastic task allocation ap-proach, termed Next-Best-Action Planning (NBA-P), for taskplanning under uncertainty in precision agriculture. The paperalso contributes a new Stochastic-Vertex-Cost Aisle Graph(SAG). SAG is an extension of the aisle graph [8, 27], whichis often used to describe agriculture-related environments.The main novelty of SAG is that it can represent uncertaintask costs. Using SAG, our proposed NBA-P algorithm si-multaneously determines 1) how to optimally schedule whichtasks to perform at run-time, and 2) when to optimally stopperforming new tasks and return back to the base station alsoat run-time. NBA-P ensures that urgent tasks are prioritizedsubject to both energy and resource budgets. In addition, itcan be extended to multi-robot teams. We test our method insingle- and multi-robot scenarios using both simulated dataand 10 real-world datasets collected in a commercial vineyardat central California. In all cases, NBA-P achieves higherefficiency than naive lawnmower, informed lawnmower, andseries Greedy Partial Row planners [28–30] in terms of morereturn per visited vertices, less resources wasted because ofaborted tasks, and less total visited vertices.II. R ELATED W ORK
Aisle graphs [8, 27] can model motion constraints emerg-ing when robots navigate in structured environments suchas agricultural fields. Vertices denote possible task locations,and edges represent connections between locations. Any tworows connect to each other only via the two end vertices.Moving backwards is not allowed. Hence, if a robot enters arow, it will have to reach the row’s other end then to moveto any other row. In the original aisle graph formulation [8,27], vertices and edges are associated with known and con-stant reward and movement costs, respectively. Our proposedextension, SAG, can also represent uncertain task costs.
Orienteering can tackle persistent sampling on aislegraphs. Even with motion constraints introduced via aislegraphs, orienteering remains an NP-hard problem and thusgreedy heuristics are often employed [8]. Recent efforts onstochastic orienteering associate stochastic costs to graphedges and propose a time-aware policy for a robot to adjustits path to avoid exceeding a certain budget [31]. However,addressing cases that involve uncertain task cost on verticesfor aisle graphs remains open. Our proposed NBA-P tacklesthe problem by simultaneously considering uncertain taskcosts on vertices and deterministic costs on edges.The optimal stopping framework [32] can be used toinvestigate the (optimal) criteria to terminate a process whileincorporating uncertainty [33]. In most cases [34–36], dataarrive in sequence, and irrevocable decision has to be made asto when the expected return is maximized. Optimal stoppinghas been used in robotics applications like target tracking [37]and marine ecosystem monitoring [38]. However, no motionconstraints, like those imposed by aisle graphs, apply to robot actions, and hence existing methods cannot be ported overto operations on aisle graphs. Paths planned with NBA-P fillthe gap, as they directly apply to environments with motionconstraints captured by aisle graphs.The proposed method applies when: 1) the motion con-straints in the application environment can be captured bya SAG; 2) the cost of completing tasks follow exponentialdistributions; and 3) the obtained gain by completing a taskis proportional to the actual task cost.III. S
TOCHASTIC T ASK A LLOCATION P ROBLEM S ETUP
We first define the
Stochastic-Vertex-Cost Aisle Graph(SAG) , to incorporate uncertain task cost on vertices. Then,we present this paper’s problem setup utilizing SAG.
A. The Stochastic-Vertex-Cost Aisle Graph (SAG)
We propose SAG as a way to extend the original aislegraph [8, 27] to handle missions consisting of tasks withpriority levels and stochastic execution costs. There arethree main differences between SAG and the original aislegraphs. 1) SAG considers stochastic costs for task executionat vertices. 2) Vertices in SAG are associated with taskpriority levels. 3) The gain, which describes the benefit ofcompleting a task, is proportional to the actual resourceconsumption if the task is fully completed. With moreresource consumption, higher gain could be obtained, e.g.,higher quality information during soil sampling process, orbetter field hydration/irrigation results. Note that no gain willbe obtained if 1) the resource budget is exceeded during taskexecution, and the task is aborted, or 2) a robot only passesthrough a vertex on its way without performing a task. Incontrast, in the original aisle graph rewards are constant andcan be collected immediately when passing through vertices.Given a field that contains m rows and n columns (where n denotes the total number of possible sampling locationsin each row), its SAG representation is an (undirected) graph A s ( m, n +2) = ( V, E ) where V and E are the sets of verticesand edges, respectively. Note that in the graph representationwe add two additional ‘virtual’ columns at indices j = 0 and j = n + 1 that connect the m rows; virtual vertices carry nogain. An example of a A s (3 , graph is given in Fig. 1.The set of edges E is built as follows: • Vertices v i,j with i ∈ [1 , m ] and j ∈ [1 , n ] have twoedges, e i, ( j − + and e i,j + . • Vertices v i,j with i ∈ (1 , m ) and j ∈ { , n + 1 } havethree edges: if j = 0 then { e i, + , e ( i − + , , e i + , } andif j = n + 1 then { e i,n + , e ( i − + ,n +1 , e i + ,n +1 } . • The four corner vertices are v , , v m, , v ,n +1 , v m,n +1 ,each of which has two edges. Fig. 1. SAG A s (3 , with 3 rows and 5 columns. Grey nodes areend vertices to connect rows. Vertices with red edges are base stations. For clarity and completeness, we follow and partially adapt the definitionof the original aisle graphs from [8, 27]. et S contains all priority levels in A s . Let c v : V −→ R ≥ and c e : E −→ R ≥ be the costs for task execution at verticesand movement on edges, respectively. The actual resourceconsumption to complete a task at vertex v ∈ V followsan exponential distribution, c v ( v ) ∼ Exp ( ¯ w s ) , where ¯ w s isthe mean cost of all tasks with priority level s ∈ S . Theactual task cost is not known before task completion, and isindependent between tasks at different locations. The cost ofmovement on edges is a known constant. Function f : V −→ S returns the priority level of a vertex, and f ( v i,j ) = 0 , v i,j ∈ V indicates no tasks at a vertex. Condition f ( i ,j ) < f ( i ,j ) implies that the task at v i ,j is more urgent than the taskat v i ,j . In other words, if the same amount of resourcesis consumed at v i ,j and v i ,j , higher gain is obtained at v i ,j . Once a task is completed, its priority level is set to .Let r : V −→ R ≥ be the actual gain obtained whencompleting a task. Function µ : S −→ R > maps each prioritylevel to a deterministic positive value, which indicates thegain-to-cost ratio of completing a task of given prioritylevel. Then, r ( i ,j ) = µ ( f ( i ,j ) ) c v ( i ,j ) , and µ ( f ( i ,j ) ) <µ ( f ( i ,j ) ) if f ( i ,j ) < f ( i ,j ) for v i ,j , v i ,j ∈ V . Vertices v i,j with i ∈ [1 , m ] and j = n +1 are virtual nodes to connectrows, hence f ( i,j ) = 0 , c v ( i,j ) = 0 , r ( i,j ) = 0 . Edges e i, + and e i,n + with i ∈ [1 , m ] has c e ( e i, + ) = c e ( e i,n + ) = 0 .Priority levels can be user-defined or estimated via anyprior environment maps. The latter can be determined basedon collected data, e.g., difference between ideal and sampledsoil moisture levels [8]. However, prior information may beapproximate and thus lead to suboptimality if directly setas priority levels for vertices. A way to assign priority levelsfrom prior information is to set thresholds so that data withina range yield the same priority level. Only same types of taskswith same expected cost can be set at same priority level. B. Stochastic Task Allocation on SAGs
A mission on SAG A s ( m, n + 2) = ( V, E ) comprisestasks located at v ∈ V T ⊂ V . Given energy budget formoving along edges and resource budget for executing taskson vertices, to complete all tasks in V T so that: • C1: Tasks are prioritized according to priority level. • C2: The number of tasks being aborted because of ex-ceeding the resource budget (at run-time) are minimized.C1 enforces the time-critical decision making, whereas C2ensures efficiency of mission completion. When a task isaborted, no gain is obtained and both consumed resourcesand energy spent moving to that vertex are wasted. Abortingtasks will also cause delays on mission completion time. Toavoid exceeding the resource budget at run-time, the robotthus needs to determine an optimal stopping time. Its nextaction should be to either 1) perform another feasible task ofthe highest possible priority level (which we describe how toset next), or 2) stop performing tasks and return to the basestation. Since the actual cost is unknown before completing atask, the next action and corresponding paths are determinedin an adaptive manner based on remaining budget at run-time. For clarity, we will henceforth write f ( v i,j ) as f ( i,j ) . Any otherfunctions that take vertices as input will be shortened similarly. IV. P
ROPOSED T ASK P LANNING A LGORITHM
Our proposed Next-Best-Action Planning (NBA-P) ap-proach balances sampling feasible vertices on SAG and deter-mining when it is preferable to exit (i.e. return to base station)based on remaining resource and energy budgets. Whensampling feasible vertices, we use a three-phase approach.Phase 1: sample feasible vertices subject to resource budget;Phase 2: sample feasible vertices from phase 1 subject toenergy budget; Phase 3: select a row to proceed and plancorresponding paths. When sampling in phase 1, we startfrom the highest priority level that currently exists. If eitherphase 1 or phase 2 returns no feasible vertex, we decreasethe examined priority level until either feasible vertices arefound, or the examined priority level reaches 0, in whichcase it is optimal to exit. This strategy ensures that taskswith higher priority level are prioritized when possible.
A. Phase 1: Feasible Vertices Subject to Resource Budget
To tackle the stochastic task cost, we formulate the next-task selection subject to resource budget as an optimalstopping problem. We employ a one-stage-look-ahead rule:if it is better to return to base station directly than to performone more task of any priority level then return , then returnat current time. In this phase, we do not need to consider theactual robot position. Let p and q be the remaining resourcebudget and the total gain in the current ‘trip’ (i.e. operationsince last visit to a base station), respectively. If a task ofpriority level s ∈ S consumes x amount of resources, thereturn is µ ( s ) x . Then, in a dynamic programming framework,with ( p, q ) the state, the expected return function, Φ( p, q ) , is Φ( p, q ) = (cid:40) max s ∈ S (cid:8) (cid:82) p λ s e − λ s x Φ( p − x, q + µ ( s ) x ) dx (cid:9) ,if p > q , otherwise, (1)where λ s = w s . When p > (i.e. some resource is available),a task of priority level s ∈ S which maximizes the return isselected. Otherwise, no task can be completed and the totalreturn remains the same as q .
1) Single Priority Level for All Tasks:
We start with thecase that all tasks in the mission have the same priority level, | S | = 1 . In this case, we only need to determine the optimaltime to exit. According to (1), for s ∈ S , the state ( p (cid:48) , q (cid:48) ) ison the optimal stopping boundary if q (cid:48) = (cid:90) p (cid:48) λ s e − λ s x ( q (cid:48) + µ ( s ) x ) dx, (2)since continuing to perform another task will not result inhigher expected return. Hence, the robot should exit if thecurrent state ( p, q ) satisfies p < p (cid:48) and q > q (cid:48) , i.e. all tasksare infeasible. Solving (2) leads to q (cid:48) = µ ( s ) λ s ( e λ s p (cid:48) − − λ s p (cid:48) ) . (3)Defining function g : ( R ≥ , S ) → R ≥ , ( p (cid:48) , s ) (cid:55)→ q (cid:48) basedon (3) represents the optimal stopping boundary curve for agiven priority level. Thus, it is optimal to exit at state ( p (cid:48) , q (cid:48) ) when q (cid:48) ≥ g ( p (cid:48) , s ) given a priority level s ∈ A s . Gain q are set to 0 when the robot resets at the base station. efinition 1. A task of priority level s is feasible for thecurrent state ( p, q ) , if ( p, q ) lies below the optimal stoppingboundary curve g ( p (cid:48) , s ) (Fig. 2).
2) Multiple Priority Levels Across Tasks: If | S | > , therobot determines the candidates with highest possible prioritylevel allowed by the remaining budget. The optimal strategyis to examine the feasibility to perform a task of prioritylevel s = max ( S ) , and then decrease s until a feasible taskis found. If no feasible task exists until s = 0 , then theoptimal decision is to return back to the base station.When multiple priority levels exist, it is not always truethat tasks with higher priority levels must be performed be-fore any lower priority rank tasks. To maximize the expectedreturn in one ‘trip’ (i.e. between two times that a robot visitsthe base station), when the remaining resource budget isnot enough for high priority level tasks, a task with lowerpriority level can potentially be selected to be performed next.However, in some scenarios, a lower priority task will neverbe selected prior to a higher priority task. Lemma 1.
At state ( p, q ) , given that tasks with priority level s ∈ S are infeasible, then all tasks with s (cid:48) ∈ S and s (cid:48) < s must be infeasible if ¯ w s (cid:48) ≥ ¯ w s . Proof.
Let s , s ∈ S : ≤ s < s ≤ max ( S ) and µ ( s ) <µ ( s ) . The mean costs of s and s tasks are ¯ w s and ¯ w s ,respectively. If ¯ w s ≥ ¯ w s , for any p > , g ( p, s ) >g ( p, s ) . Hence, if a s priority level task is infeasible atstate ( p, q ) , i.e. q ≥ g ( p, s ) , then q ≥ g ( p, s ) > g ( p, s ) ,and s tasks are infeasible too (Fig. 2(a)). (cid:4) On the other hand, as shown in Fig. 2(b)(c), for s , s ∈ S such that ≤ s < s ≤ max ( S ) , if ¯ w s < ¯ w s , therelationship between boundary curves g ( p, s ) and g ( p, s ) is either Condition 1 (Fig. 2(b)) or Condition 2 (Fig. 2(c)). • Condition 1 : g ( p, s ) < g ( p, s ) , ∀ p > , (4) • Condition 2: ∃ p > , such that g ( p, s ) (cid:40) ≥ g ( p, s ) , if < p ≤ p < g ( p, s ) , if p > p (5)For Condition 1, a state ( p, q ) above the curve g ( p, s ) canbe still below the curve g ( p, s ) , e.g., point b in Fig. 2(b). Inthis case, s tasks should be performed next even if there stillexist s tasks. For Condition 2, when p > p , the situation isthe same as described above for Condition 1. When < p ≤ p we reduce to the conditions of Lemma 1, in which casegiven that s tasks are infeasible, s tasks must be infeasible(an example is point c in Fig. 2(c)).Let Q be the set containing all feasible vertices subjectto a given resource budget. We propose Algorithm 1 todetermine Q at a state ( p, q ) . If Q = ∅ , the robot returns tothe base station. Otherwise, all vertices in Q will continueto be examined in Phase 2 subject to a given energy budget. B. Phase 2: Feasible Vertices Subject to Energy Budget
From Q , we continue sampling vertices that satisfy theenergy budget constraint. Suppose a robot is at vertex v i c ,j c ∈ V , and a vertex v i,j ∈ Q is a candidate to be examined. Fig. 2. Illustrations of cases described in (a) Lemma 1, (b) Condition 1 and(c) Condition 2. Red points visualize sample states ( p, q ) . In (a)-(c), tasksof priority levels s or s are feasible if the current state ( p, q ) is belowthe curves g ( p (cid:48) , s ) or g ( p (cid:48) , s ) , respectively. Algorithm 1
SampleQone( ( p, q ) , s ) procedure D ETERMINE Q AT STATE ( p, q ) s ← min ( s, max ( S )) , Q ← ∅ while s > Q = ∅ do if q < g ( p, s ) then for v ∈ V T do if f ( v ) = s then Q ← Q ∪ { v } end if end for else while w s − ≥ w s do s ← s − end while s ← s − end if end while return Q , s end procedure Without loss of generality, suppose two base stations arelocated on row i d , each at one of the end vertices v i d , and v i d ,n +1 . The robot can reset at either one. Vertex v i (cid:48) ,j (cid:48) isfeasible if the current remaining energy budget T allows therobot to move to v i (cid:48) ,j (cid:48) then to any base station. Let t α ( i (cid:48) ) be the cost to move from v i c ,j c to an end node–either oncolumn or n +1 depending on the robot’s moving directionin current row i (cid:48) (recall backward motion is not allowed). Let t β ( i (cid:48) ) be the cost to move between two end vertices v i (cid:48) , and v i (cid:48) ,n +1 in row i (cid:48) , and t γ be the cost to move from the endvertex on row i (cid:48) closest to the robot along its direction ofmotion to the closest base station. Then, v i (cid:48) ,j (cid:48) is feasible if T ≥ t α ( i (cid:48) ) + t β ( i (cid:48) ) + t γ ( i (cid:48) ) . (6)If v i (cid:48) ,j (cid:48) can be reached, all vertices on row i (cid:48) must bereachable, since t α ( i (cid:48) ) + t β ( i (cid:48) ) + t γ ( i (cid:48) ) only depends onrow i (cid:48) . Costs t β and t γ are fixed for each row and canbe precomputed prior to deployment. Cost t α is computedat run-time. The set containing all vertices that satisfy bothbudgets is Q = { v i (cid:48) ,j (cid:48) ∈ Q | t α ( i (cid:48) ) + t β ( i (cid:48) ) + t γ ( i (cid:48) ) ≤ T } . C. Phase 3 and Proposed Algorithm
Phase 3 can be reached if Q (cid:54) = ∅ . Note that all tasks atvertices in Q have the same priority level s and hence thesame expected cost ¯ w s . Therefore, sampling the next vertexturns into selecting a row i which consists of one or moretasks of priority level s . Then, the robot will perform the firstencountered feasible task while moving along row i . Expressions to compute t α , t β and t γ are given in the Appendix. uppose the robot is currently at v i c ,j c with state ( p, q ) .The row i ∈ [1 , m ] is selected such that ∀ i (cid:48) ∈ [1 , m ] , i (cid:48) (cid:54) = i c , min {| Q ( i ) | , (cid:106) q ¯ w s (cid:107) } > min {| Q ( i (cid:48) ) | , (cid:106) q ¯ w s (cid:107) } , (7) t α ( i ) ≤ t α ( i (cid:48) ) , (8)where Q ( i ) = { v i (cid:48) ,j (cid:48) ∈ Q | i (cid:48) = i } , and ¯ w s is the mean costof feasible tasks. By (7), the robot is expected to completemore tasks in row i than any other row. Thus, row i shouldbe the row that contains the largest number of feasible taskspermitted by remaining budget q , according to the expectedcost ¯ w s . If multiple rows return a tie, then the row closest tothe robot’s current position will be selected as per (8).The proposed Next-Best-Action Planning (NBA-P) ap-proach is formalized in Algorithm 2. NBA-P can be extendedto apply to multi-robot teams by sequentially determiningthe next best action for each robot. In multi-robot imple-mentation, each robot runs NBA-P independently and inparallel, and exchanges information only about the row itcurrently occupies. For each robot, Q has to be modified byremoving all vertices in those rows that are occupied by otherrobots. Note that similar to [16], multiple robots can travelsimultaneously along the vertical columns and n + 1 , sincespace on the boundary of a field is typically much larger. Algorithm 2
NBA-P procedure D ETERMINE NEXT ACTION AT STATE ( p, q ) s ← max ( S ) , Q ← ∅ , Q ← ∅ while Q = ∅ do if s=0 then exit, return to base station end if Q , s ← SampleQone (( p, q ) , s ) if Q = ∅ then exit, return to base station else obtain Q from Eq. (6), s ← s − end if end while continue with Phase 3 end procedure V. R
ESULTS
To study the efficiency and effectiveness of the proposedapproach, we test with 1) simulated data in a 2-robot scenario,and 2) data collected from a real-world vineyard in 1- and5-robot scenarios. Testing with simulated data enables param-eter tuning so as to study the properties of NBA-P, whereastesting with real-world data reveals the spatial pattern of realtasks that exist in agricultural fields. In both cases, NBA-P is compared against lawnmower planner [28], which isoften seen in agriculture-related applications [29, 30]. Innaive lawnmower (N-LM), a robot follows meandering pathsto survey rows in sequence. When no budget constraint isconsidered, and when departing from a corner in a squareenvironment, lawnmower will generate the shortest path tosurvey the entire field in the sense that each vertex is visitedonly once. In experiments using real-world datasets, NBA-Pis also compared against informed lawnmower (I-LM) and Series Greedy Partial Row (S-GPR) [16]. I-LM attemptsto complete a task if the remaining budget is greater thanthe expected cost of a task. S-GPR is modified to useboth energy and resource budgets. In multi-robot cases, eachrobot runs N-LM, I-LM, and NBA-P independently and inparallel to each other; in S-GPR robots plan their trajectoriessequentially. All vertices in occupied rows turn infeasible forother robots so that each row has one robot performing tasks.
A. Testing with Simulated Data and Parametric Study
We consider a simulated environment A s (20 , of 20rows and 17 columns (including the two virtual columns).Base stations are located at v , and v , . The cost tomove on each edge is 1. Consider two robots deployed frombase station v , to complete all tasks. Each robot departswith energy budget 80, and resource budget 40. A vertexis considered “visited” if the robot stops at the vertex andattempts to perform a task, regardless whether the task isultimately completed or aborted. If the resource budget isexceeded before task completion, the task will be aborted,the resources already consumed for this task are consideredto be “wasted,” and no gain will be obtained. The total gainwill be the sum of the actual gain, which is proportional to theactual task cost, at all task-completed vertices. Two cases arestudied, 1) S = { } , i.e. all tasks have equal priority level,and 2) S = { , } , i.e. two priority levels exist, hence taskswith s = 2 will be prioritized. In case 1, µ (1) = 1 , ¯ w = 2 ;and in case 2, µ (1) = 1 , µ (2) = 2 , ¯ w = 1 . , ¯ w = 2 .For each case study, 10 trials are conducted. In each trial,225 tasks are randomly assigned to 225 vertices in A s , withrandomly generated task location and actual task cost. In eachtrial, the proposed method and the N-LM method are testedon the same simulated environment. Fig. 3. Example of (a) percentage of obtained gain and (b) wasted resourcesover visited vertices when | S | = 1 . Panels (c) and (d) contain the sameinformation when | S | = 2 . Figure 3 illustrates shows from a sample trial when | S | = 1 (top) and | S | = 2 (bottom). Figures 3(a) and (c) show thepercentage of obtained gain over ground truth total gainas a function of visited vertices (shortened as r/v ratio).Figures 3(b) and (d) show the total wasted resources because Values for both budgets are selected randomly for evaluation purposes.Expected task costs are selected to achieve a budget over expected cost ratioof 20, which helps reveal general patterns of the method. f aborted tasks as a function of visited vertices (shortened as w/v ratio). Total wasted resources are the sum of resourceconsumption for all aborted tasks. Total gain, total wastedresources and visited vertices correspond to the sum of thosevalues from both robots. Results suggest that all tasks arecompleted, and the robots return to the base station.Table I contains the mean and one standard deviation of r/v ratio, w/v ratio, and total visited vertices over 10 trials.Larger r/v suggests higher efficiency since more gain isobtained by visiting the same number of vertices, i.e. samenumber of attempts to execute tasks. Lower w/v indicateslower rates of aborted tasks, i.e. less resources are wastedby visiting the same number of vertices. Higher r/v , lower w/v , and less total visited vertices are desired, and theseconditions together indicate higher overall effectiveness.
TABLE IR
ESULTS FOR SIMULATED DATA OVER TRIALS r/v ratio w/v ratio vertices (10 − ) (10 − ) visited | S | = 1 NBA-P . ± . . ± . . ± .
02 1 . ± . . ± . . ± .
08 226 . ± . . ± . . ± . N-LM . ± .
18 9 . ± .
74 289 . ± . | S | = 2 NBA-P . ± . . ± . . ± .
02 0 . ± . . ± . . ± .
73 225 . ± . . ± . . ± . N-LM . ± . . ± .
63 266 . ± . Results in Fig. 3 and Table I suggest that, in both cases,NBA-P achieves higher r/v ratio, lower w/v ratio, and lesstotal visited vertices than N-LM. When | S | = 1 , since alltasks have the same priority level, the higher r/v ratio ofNBA-P is mainly due to the optimal stopping strategy thathelps prevent aborting tasks. When | S | = 2 , the higher r/v is due to both the optimal stopping and priority-drivenstrategies. This can be observed by the steep slope at thebeginning of the curve of our proposed method in Fig. 3(c),during which time tasks with priority level 2 are prioritized.The high rate of tasks being aborted in N-LM is the reasonwhy the total visited vertices for N-LM are more than forNBA-P. For N-LM, the robot will attempt to perform a taskif there is still remaining resource budget. However, if thebudget is exceeded during task execution, the task will beaborted and the vertex needs to be re-visited. Setting multiplepriorities may be useful but this needs to be carefully tunedas having too many priority levels can make the processinefficient in practice, forcing the robot to move across thefield to reach the tasks of the next highest priority level.Reducing the rate of aborted tasks can increase efficiency.We continue to study the influence of µ and ¯ w s to the rateof aborted tasks, in which case the energy budget does notneed to be considered. Starting with | S | = 1 , and assumingthere exist infinite tasks, we need to determine the optimaltime to stop performing more tasks. Figure 4 (left) showsthe relation between aborted tasks (ratio of occurrences over1000 trials) and the ratio of initial budget over mean task cost ¯ w , for different values of µ . Results suggest that abortinga task is barely influenced by the value of µ . If the initialbudget is close to the mean task cost, the rate of abortedtasks can be as high as , and the optimal stopping ruleis less effective. If the initial budget is more than timesthe mean task cost, the rate of aborted tasks reaches . Fig. 4. (Left) Rate of aborted tasks over the ratio of budget/ ¯ w when | S | = 1 . (Right) Rate of aborted tasks with respect to the ratio of budget/ ¯ w and budget/ ¯ w for the | S | = 2 case. Given that µ barely affects the rate of aborted tasks (Fig. 4(left)), we examine in the case that | S | = 2 the influenceof the ratio between initial budget and task mean cost foreach priority level. We assume there exist infinite tasks ofpriority level 1 and 2. The goal is to determine if it is betterto select another task of priority level 1 or 2 to perform,or to stop. Figure 4 (right) suggests that, regardless of therelation between ¯ w and ¯ w , the rate of aborted tasks ismore influenced by the ratio of initial budget over ¯ w . Thisis intuitive since tasks of priority level 2 are prioritizedover tasks of priority level 1. Since more s = 2 tasks areperformed if possible, it escalates its influence to the rateof aborting tasks. Thus, when higher priority tasks exist, theinitial budget can be set by considering expected cost of highpriority tasks, as the expected cost of low priority tasks do nothave much impact when energy is sufficient. The proposedmethod is more suitable when the mean cost of tasks is smallenough compared to the initial (resource) budget. B. Testing with Real-world Field Data
The real-world datasets used here contain soil moisturevalues collected in a commercial vineyard located in centralCalifornia. The structure of the vineyard imposes motionconstraints to ground robots moving therein. Irrigation linesare attached to metallic wires at about in from the groundand running parallel to the trellis. Thus, to move from onerow to another (even if adjacent), the robot must first exitthe row from either end (based on its direction), and then re-enter at the desired row. Samples were collected on a regulargrid with 275 rows and 214 columns uniformly covering thefield. Sampling locations were computed offline, and datawere collected with a Campbell H2SP soil moisture sensor.Suppose autonomous ground mobile robots are deployedto water plants in the vineyard. Vertices with moisture valuesless than a desired level are considered to contain a task (ofprecise watering). The ground truth task cost at any vertexis the moisture difference between collected moisture valuesand the desired level. An example in shown in Fig. 5. Therobots’ decision is constrained by the resource budget oftotal water carrying capacity, and the energy budget to movebetween locations. All tasks are considered to have the samepriority level, i.e. | S | = 1 . The location of tasks and themean cost of all tasks (averaged real costs of all tasks usingground truth) are available to the robot(s) prior to departure,whereas the actual cost for each task is unknown to therobot(s) before task completion. Without loss of generality,we consider the movement cost on edges to be 1 (all water ABLE IIR
ESULTS FOR FIELD EXPERIMENTAL DATASETS IN AND ROBOT SCENARIOS robotnumber data idx 1 2 3 4 5 6 7 8 9 10total tasks 39528 44607 58845 38190 33489 24075 12203 58553 58551 203451 r/v( − ) NBA-P 2.532.532.53 2.242.242.24 1.701.701.70 2.622.622.62 2.992.992.99 4.154.154.15 8.198.198.19 1.711.711.71 1.711.711.71 4.924.924.92N-LM 1.72 1.23 0.55 1.23 1.56 2.17 6.68 0.59 0.57 3.72I-LM 2.53 2.24 1.70 2.61 2.98 4.14 8.18 1.70 1.70 4.91S-GPR 2.52 2.22 1.67 2.59 2.95 4.10 8.15 1.68 1.68 4.89w/v( − ) NBA-P 0.150.150.15 000 000 0.610.610.61 000 000 1.611.611.61 000 000 1.441.441.44N-LM 5.88 11.9 23.42 22.07 16.96 21.37 6.66 19.45 24.53 4.91I-LM 4.62 9.98 16.77 28.87 19.10 23.47 7.31 17.23 27.18 4.00S-GPR 9.12 22.44 70.44 49.03 33.47 41.83 10.94 58.66 76.18 7.70visitedvertices NBA-P 395293952939529 446074460744607 588455884558845 381923819238192 334893348933489 240752407524075 122061220612206 585535855358553 585515855158551 203452034520345N-LM 58220 81277 181553 81083 64178 46144 14967 168360 176174 26914I-LM 39585 44703 58966 38349 33600 24159 12223 58689 58734 20372S-GPR 39731 45018 59933 38672 33847 24363 12263 59504 59587 204335 r/v( − ) NBA-P 2.532.532.53 2.242.242.24 1.701.701.70 2.622.622.62 2.992.992.99 4.154.154.15 8.198.198.19 1.711.711.71 1.711.711.71 4.914.914.91N-LM 1.77 1.3 0.62 1.37 1.69 2.33 6.81 6.52 6.03 3.81I-LM 2.53 2.24 1.70 2.61 2.98 4.14 8.18 1.70 1.70 4.91S-GPR 2.52 2.22 1.67 2.59 2.95 4.10 8.15 1.68 1.68 4.89w/v( − ) NBA-P 000 000 000 0.920.920.92 000 000 2.672.672.67 000 000 0.590.590.59N-LM 5.97 12.00 24.72 24.67 18.66 22.62 6.23 21.18 27.05 4.72I-LM 5.81 8.37 14.78 27.35 18.47 23.58 4.94 14.66 29.46 3.61S-GPR 9.08 22.66 70.37 49.03 33.09 41.31 10.26 58.66 76.18 7.79visitedvertices NBA-P 395283952839528 446074460744607 588455884558845 381933819338193 334893348933489 240752407524075 122091220912209 585535855358553 585515855158551 203482034820348N-LM 56543 77210 162420 73179 59253 42923 14688 153425 165869 26233I-LM 39599 44687 58952 38335 33596 24160 12218 58670 58753 20369S-GPR 39729 45020 59933 38672 33847 24363 12263 59504 59587 20433 emitters are located within the same interval distance). Wetest in 1- and 5-robot scenarios, where the robots depart withenergy budget and resource budget . The base stationsare located at v , and v , . Fig. 5. (Left) Instance of a wheeled mobile robot performing soil moisturemeasurements in a commercial vineyard. (Right) Sample ground truth cost(i.e. the moisture difference between collected moisture values and a desiredlevel) for one of the field experimental datasets used here. Low-moisture(dry) locations are indicated by higher differences. In these areas, morewater (the resource in this case) needs to be consumed to reach a desiredmoisture level, which is equivalent to a higher cost. Discretely-sampledvalues where mapped to a continuous contour illustrated here using thekriging interpolation algorithm. (However, we use the discrete values directlyon the SAG representation of the environment utilized by our algorithm.)
Table II shows results for 10 real-world datasets in 1-and 5-robot scenarios. The 10 datasets were collected duringa timespan of 5 months, hence task locations and costsdiffer among datasets. Results suggest that, in all cases,our proposed NBA-P algorithm achieves higher r/v ratio,lower w/v ratio, and less total visited vertices than N-LM. Even though I-LM and S-GPR achieve similar r/v ratio and total visited vertices as NBA-P, the w/v ratio ismuch higher compared to NBA-P. Results attempts only ifthe expected task cost is less than the remaining budgetyet it fails to consider the uncertainties in actual resourceconsumption, which can be much higher than the expectedvalue. In addition, the high actual cost may cause multiplefailed attempts at the same position. Thus, the total wasteof resource can be significant, evident by the averaged totalwaste over 10 datasets for 1-robot cases being 5, 1725, 671and 1796 for NBA-P, N-LM, I-LM and S-GPR, respectively. That is, NBA-P is able to handle uncertain task costs, andrequires less total resources to complete the same amount oftasks as compared to other methods. Importantly, no tasksare aborted in 13 out of 20 cases using NBA-P, where eachcase contains up to 60000 tasks. For datasets 3, 8, and 9, N-LM in fact visits three times more vertices than the proposedmethod. The total path lengths for evaluated methods differwithin around -range, and NBA-P achieves the shortestpath for datasets 6 and 10. That is, NBA-P plans paths ofsimilar length as lawnmower methods.In all, testing with experimental data validates the efficacyand efficiency of our proposed method, and demonstratespreliminary feasibility that it can scale both in terms of thesize of the environment and the number of robots in the team.VI. C ONCLUSIONS
Contributions and Key Findings : The paper contributesto stochastic task allocation in precision agriculture. Givenresource and energy budgets, our NBA-P algorithm returnsthe best action on a stochastic aisle graph (SAG) by simulta-neously determining optimal sampling locations and optimalstopping times to return to a base station, all at run-time. Theproposed algorithm is tested using both simulated data for a2-robot scenario and agricultural field experimental datasetsfor 1- and 5-robot scenarios. Results suggest that, by applyingNBA-P, all tasks are completed, and tasks with high prioritylevels are prioritized when possible. The rate of aborted tasksis minimal when the initial resource budget is more than50 times the mean task costs. NBA-P outperforms N-LM,I-LM and S-GRP methods in all simulated and real-worlddatasets, in terms of more gain per vertex visited, fewer tasksbeing aborted, and less total visited vertices to complete thesame number of tasks. In testing with real-world datasets,our method has no tasks aborted in 13 out of 20 cases withup to 60000 tasks in each case. Further, N-LM visits up to 3times more vertices than NBA-P to complete same numberof tasks, which leads to a significant waste of resources. irections for Future Work : At its current form, NBA-P is not suitable for scenarios where the task and movementcosts are correlated. Further, the overall paths using NBA-Pcan be longer than those derived via the lawnmower method,especially when multiple priority levels exist and tasks ofdifferent priority level are intertwined. Future directions ofresearch include 1) application of the proposed algorithm tophysical robots in the field, and 2) study of the scenario thatconsiders correlated cost for movement and task execution.A
PPENDIX
Let hl and hr represent the robot moving toward column and j + 1 , respectively. Suppose i = min ( i c , i (cid:48) ) , i = max ( i c , i (cid:48) ) , i = min ( i (cid:48) , i d ) , and i = max ( i (cid:48) , i d ) . Then, t α ( i (cid:48) ) = (cid:40)(cid:80) n − j = j c c e ( e i c ,j + ) + (cid:80) i − i = i c e ( e i + ,n +1 ) , if hl , (cid:80) j c − j =1 c e ( e i c ,j + ) + (cid:80) i − p = i c v ( e i + , ) , if hr .t β ( i (cid:48) ) = n − (cid:88) j =1 c e ( e i (cid:48) ,j + ) .t γ ( i (cid:48) ) = (cid:40)(cid:80) i − i = i c e ( e i + , ) , if hl , (cid:80) i − i = i c e ( e i + ,n +1 ) , if hr . R EFERENCES[1] M. Bietresato, G. Carabin, D. D’Auria, R. Gallo, G. Ristorto,F. Mazzetto, R. Vidoni, A. Gasparetto, and L. Scalera, “A trackedmobile robotic lab for monitoring the plants volume and health,” in
IEEE/ASME Int. Conf. on Mechatronic and Embedded Syst. and Apl.(MESA) , 2016, pp. 1–6.[2] F. A. Auat Cheein and R. Carelli, “Agricultural robotics: Unmannedrobotic service units in agricultural tasks,”
IEEE Industrial ElectronicsMagazine , vol. 7, no. 3, pp. 48–58, 2013.[3] J. J. Rold´an, J. del Cerro, D. Garz´on-Ramos, P. Garcia-Aunon,M. Garz´on, J. de Le´on, and A. Barrientos, “Robots in agriculture:State of art and practical experiences,”
Service Robots , 2018.[4] D. Bochtis, H. W. Griepentrog, S. Vougioukas, P. Busato, R. Berruto,and K. Zhou, “Route planning for orchard operations,”
Computers andElectronics in Agri. , vol. 113, pp. 51–60, 2015.[5] E. Vaeljaots, H. Lehiste, M. Kiik, T. Leemet, et al. , “Soil samplingautomation case-study using unmanned ground vehicle,”
Eng. RuralDev , vol. 17, pp. 982–987, 2018.[6] A. Pretto, S. Aravecchia, W. Burgard, N. Chebrolu, C. Dornhege,T. Falck, F. Fleckenstein, A. Fontenla, M. Imperoli, R. Khanna, et al. ,“Building an aerial-ground robotics system for precision farming,” arXiv preprint arXiv:1911.03098 , 2019.[7] S. Bonadies, A. Lefcourt, and S. A. Gadsden, “A survey of unmannedground vehicles with applications to agricultural and environmentalsensing,” in
Autonomous Air and Ground Sensing Syst. for Argi. Opt.and Phenotyping , vol. 9866, 2016, p. 98660.[8] T. C. Thayer, S. Vougioukas, K. Goldberg, and S. Carpin, “Routingalgorithms for robot assisted precision irrigation,” in
IEEE Int. Conf.on Robotics and Autom. (ICRA) , 2018, pp. 2221–2228.[9] J. Gago, C. Douthe, R. Coopman, P. Gallego, M. Ribas-Carbo,J. Flexas, J. Escalona, and H. Medrano, “UAVs challenge to assesswater stress for sustainable agriculture,”
Agri. Water Management , vol.153, pp. 9–19, 2015.[10] Z. Zheng, F. Zhang, F. Ma, X. Chai, Z. Zhu, J. Shi, and S. Zhang,“Spatiotemporal changes in soil salinity in a drip-irrigated field,”
Geoderma , vol. 149, no. 3-4, pp. 243–248, 2009.[11] H. G. Jones,
Plants and microclimate: a quantitative approach toenvironmental plant physiology . Cambridge university press, 2013.[12] J. Kelly, N. Kljun, P.-O. Olsson, L. Mihai, B. Liljeblad, P. Weslien,L. Klemedtsson, and L. Eklundh, “Challenges and best practices forderiving temperature data from an uncalibrated UAV thermal infraredcamera,”
Remote Sensing , vol. 11, no. 5, p. 567, 2019.[13] H. Viet, V.-H. Dang, M. Laskar, and T. Chung, “Ba: An onlinecomplete coverage algorithm for cleaning robots,”
Applied Intell. ,vol. 39, 2013. [14] X. Kan, H. Teng, and K. Karydis, “Multi-robot field exploration in hex-decomposed environments for dubins vehicles,” in
IEEE Int. Conf. onRobotics and Biomimetics (ROBIO) , 2019, pp. 449–455.[15] ——, “Online exploration and coverage planning in unknown obstacle-cluttered environments,”
IEEE Robotics and Autom. Letters , vol. 5,no. 4, pp. 5969–5976, 2020.[16] T. C. Thayer, S. Vougioukas, K. Goldberg, and S. Carpin, “Multi-robot routing algorithms for robots operating in vineyards,” in
IEEEInt. Conf. on Autom. Science and Engr. (CASE) , 2018, pp. 14–21.[17] ——, “Bi-objective routing for robotic irrigation and sampling invineyards,” in
IEEE Int. Conf. on Autom. Science and Engr. (CASE) ,2019, pp. 1481–1488.[18] A. Gunawan, H. C. Lau, and P. Vansteenwegen, “Orienteering problem:A survey of recent variants, solution approaches and applications,”
Eur.J. of Opr. Research , vol. 255, no. 2, pp. 315–332, 2016.[19] P. Tokekar, J. V. Hook, D. Mulla, and V. Isler, “Sensor planning fora symbiotic UAV and UGV system for precision agriculture,”
IEEETrans. on Robotics , vol. 32, no. 6, pp. 1498–1511, 2016.[20] F. Auat Cheein, M. Torres-Torriti, N. B. Hopfenblatt, ´A. J. Prado, andD. Calabi, “Agricultural service unit motion planning under harvestingscheduling and terrain constraints,”
J. of Field Robotics , vol. 34, no. 8,pp. 1531–1542, 2017.[21] P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundredsof cooperative, autonomous vehicles in warehouses,”
AI magazine ,vol. 29, no. 1, pp. 9–9, 2008.[22] M. Veloso, J. Biswas, B. Coltin, and S. Rosenthal, “Cobots: Robustsymbiotic autonomous mobile service robots,” in
Int. Joint Conf. onArtificial Intell. , 2015.[23] Y. Khaluf, S. Vanhee, and P. Simoens, “Local ant system for allocatingrobot swarms to time-constrained tasks,”
J. of Computational Sci. ,vol. 31, pp. 33–44, 2019.[24] H. Ma, G. Wagner, A. Felner, J. Li, T. K. S. Kumar, and S. Koenig,“Multi-agent path finding with deadlines,”
CoRR , vol. abs/1806.04216,2018. [Online]. Available: http://arxiv.org/abs/1806.04216[25] L. B. Becker, E. Nett, S. Schemmer, and M. Gergeleit, “Robustscheduling in team-robotics,”
J. of Syst. and Software , vol. 77, no. 1,pp. 3–16, 2005.[26] A. Maoudj, B. Bouzouia, A. Hentout, A. Kouider, and R. Toumi,“Distributed multi-agent approach based on priority rules and geneticalgorithm for tasks scheduling in multi-robot cells,” in , 2016, pp. 692–697.[27] F. Betti Sorbelli, S. Carpin, F. Cor`o, A. Navarra, and C. M. Pinotti,“Optimal routing schedules for robots operating in aisle-structures,” in
IEEE Int. Conf. on Robotics and Autom. (ICRA) , 2020, pp. 4927–4933.[28] A. Zelinsky, R. A. Jarvis, J. Byrne, S. Yuta, et al. , “Planning paths ofcomplete coverage of an unstructured environment by a mobile robot,”in
Int. conf. on Advanced Robotics , vol. 13, 1993, pp. 533–538.[29] I. A. Hameed, D. Bochtis, and C. A. Sørensen, “An optimized fieldcoverage planning approach for navigation of agricultural robots infields involving obstacle areas,”
Int. J. of Advanced Robotic Syst. ,vol. 10, no. 5, p. 231, 2013.[30] T. Oksanen and A. Visala, “Coverage path planning algorithms foragricultural field machines,”
J. of Field Robotics , vol. 26, no. 8, pp.651–668, 2009.[31] T. C. Thayer and S. Carpin, “Solving large-scale stochastic orienteeringproblems with aggregation,” in
IEEE/RSJ Int. Conf. on IntelligentRobots and Syst. (IROS) , 2020.[32] A. N. Shiryaev,
Optimal stopping rules . Springer Science & BusinessMedia, 2007, vol. 8.[33] T. H. Chung and J. W. Burdick, “Analysis of search decision makingusing probabilistic search strategies,”
IEEE Trans. on Robotics , vol. 28,no. 1, pp. 132–144, 2012.[34] F. B. Abdelaziz and S. Krichen, “Optimal stopping problems by two ormore decision makers: a survey,”
Computational Management Science ,vol. 4, no. 2, p. 89, 2007.[35] G. Peskir and A. Shiryaev,
Optimal stopping and free-boundaryproblems . Springer, 2006.[36] K. Chen and S. M. Ross, “An adaptive stochastic knapsack problem,”
Eur. J. of OPR. Research , vol. 239, no. 3, pp. 625–635, 2014.[37] G. Best, W. Martens, and R. Fitch, “Path planning with spatiotemporaloptimal stopping for stochastic mission monitoring,”
IEEE Trans. onRobotics , vol. 33, no. 3, pp. 629–646, 2017.[38] J. Das, F. Py, J. B. Harvey, J. P. Ryan, A. Gellene, R. Graham, D. A.Caron, K. Rajan, and G. S. Sukhatme, “Data-driven robotic samplingfor marine ecosystem monitoring,”