Iterative Refinement for Real-Time Multi-Robot Path Planning
Keisuke Okumura, Yasumasa Tamura, and Xavier Défago

Abstract — We study the iterative refinement of path planning for multiple robots, known as multi-agent pathfinding (MAPF). Given a graph, agents, their initial locations, and destinations, a solution of MAPF is a set of collision-free paths. Iterative refinement for MAPF is desirable for three reasons: 1) optimization is intractable, 2) sub-optimal solutions can be obtained instantly, and 3) it yields anytime planning, which is desired in online scenarios where time for deliberation is limited. Despite this demand, iterative refinement is under-explored in MAPF because it has been unclear so far how to find good neighborhood solutions. Our proposal uses a sub-optimal MAPF solver to obtain an initial solution quickly, then iterates two procedures: 1) select a subset of agents, and 2) use an optimal MAPF solver to refine the paths of the selected agents while keeping the other paths unchanged. Since the optimal solver is applied only to small instances of the problem, this scheme rapidly yields efficient-enough solutions while providing high scalability. We also present reasonable candidate rules for selecting the subset of agents. Evaluations in various scenarios show that the proposal is promising: convergence is fast and scalable with reasonable quality, and the scheme is practical because it can be interrupted whenever a solution is needed.
I. INTRODUCTION
Path planning for multiple robots is a fundamental problem in multi-robot coordination. The objective is to assign each robot on a graph a collision-free path to its destination. This problem is known under various names: multi-robot path planning, cooperative path finding, or multi-agent pathfinding (MAPF) [1]. Hereafter, this paper calls it MAPF, where robots are represented as agents moving on a graph. Applications of MAPF are inherently real-time systems with limited time for planning, e.g., automated warehouses [2], intersection management [3], airport surface operation [4], automated parking [5], and video games [6]. It is critical to obtain feasible solutions with sufficient quality before deadlines.

Contrary to the importance of efficient path planning, optimization is intractable. MAPF is known to be NP-hard for various optimization criteria [7]. This remains true when restricting fields to grid structures [8], or when approximating within any constant factor less than 4/3 [9]. Even state-of-the-art optimal algorithms only handle fewer than 300 agents [10]. On the other hand, we can create sub-optimal solutions in a very short time [11]–[16]. Although these often ignore solution quality, having feasible solutions is clearly better than having no solution as a result of waiting for optimal solvers, which are not guaranteed to return one within the deadlines.

The authors are with the School of Computing, Tokyo Institute of Technology, Tokyo, Japan. {okumura.k, tamura.y, defago.x}@coord.c.titech.ac.jp

With feasible solutions obtained quickly, we can use the remaining time until the deadlines for iterative refinement. This is the basic motivation of anytime algorithms [17], which can yield a feasible solution whenever interrupted, the quality of which improves as time passes. Iterative refinement is also fruitful for online situations where goals are allocated dynamically [18], [19]. The challenging part here is replanning.
Two intuitive approaches exist: 1) replan all paths, or 2) replan a single path for the one robot with a new goal while keeping the others unchanged. The first approach may return efficient solutions but is costly and typically inappropriate for online use. The second approach may return inefficient solutions but is nearly costless. We can apply the second approach to obtain an initial solution that we then gradually refine within the time constraints. Iterative refinement is promising though under-explored in MAPF because, so far, it was unclear how to incrementally improve a known solution. In the context of local search, this corresponds to finding a good neighborhood solution. Hence, we aim to propose an anytime framework of iterative refinement for MAPF.

State-of-the-art anytime MAPF:
Standley and Korf [20] extended their previous optimal MAPF algorithm [21] into an anytime version. Cohen et al. [22] studied an anytime algorithm based on a variant of A* and applied it to MAPF. X* [23] is an anytime MAPF solver assuming sparse scenarios, i.e., agents distributed sparsely in fields where the potential for collisions is rare. These three methods search for non-optimal solutions by relaxing some constraints, then eventually converge to optimal solutions by iteratively tightening the constraints. A drawback is that they are each tied to a specific solver, and that they may fail to obtain initial solutions in a reasonable time, thus returning nothing. Surynek [24] studied local repairing rules for pebble motion on graphs, which can be adapted to MAPF. Since improvements are done by ad-hoc local changes, redundancies of a priori unknown patterns remain in the solution.

Contributions:
We propose a generic framework that provides anytime MAPF based on an effective combination of existing solvers. Our framework first uses a sub-optimal MAPF solver to obtain an initial feasible solution very quickly; then, it uses an optimal MAPF solver to find good neighborhood solutions. Precisely, the framework refines the solution iteratively by selecting a subset of agents and using an optimal solver to refine their paths while keeping the other paths fixed. Although the refinement process uses an optimal solver, each refinement completes quickly because it solves a sub-problem whose size depends on the number of selected agents, typically much smaller than the original. We also present reasonable candidate rules for selecting a subset of agents.

We study the effectiveness of the approach in various benchmarks, and observe empirically that the framework converges almost optimally within a short time in small instances, and remains responsive even for very large instances (i.e., large environments and/or many agents). In other words, it brings many practical advantages over prior art. In a wider view, our study can also be seen as solving a very large-scale neighborhood search [25]. Closest to our concept, Balyo et al. [26] studied local replanning for domain-independent planning problems to optimize makespan. It repeats the following: create a sub-problem, obtain an optimal sub-solution by SAT-based techniques, and replace that part of the original solution with the new one.

Paper Organization:
Section II provides a formal definition of MAPF and reviews several MAPF solvers that we use. Section III describes the framework, including basic theoretical analysis. Section IV presents construction rules for a subset of agents. Section V evaluates the proposal on MAPF benchmarks. Section VI concludes the paper.

II. PRELIMINARIES
A. Problem Definition (MAPF)
The MAPF problem is defined as follows. Consider a set of agents A = {a_1, ..., a_n} evolving in an environment represented as a connected and undirected graph G = (V, E). Let π_i[t] ∈ V denote the location of agent a_i at discrete time t ∈ ℕ. At each timestep t, a_i can move to an adjacent node or stay at its current location, i.e., π_i[t+1] ∈ {v | (π_i[t], v) ∈ E} ∪ {π_i[t]}. Agents must avoid two types of conflicts: vertex conflicts, i.e., π_i[t] ≠ π_j[t], and swap conflicts, i.e., π_i[t] ≠ π_j[t+1] ∨ π_i[t+1] ≠ π_j[t]. Given a distinct initial location π_i[0] and a distinct goal g_i ∈ V for each agent a_i, a solution π = (π_1, ..., π_n) is a collection of paths (one for each agent), where π_i = (π_i[0], π_i[1], ..., π_i[T]) such that π_i[T] = g_i.

To evaluate solution quality, we use the sum-of-costs metric: Σ_{a_i ∈ A} T_i, where T_i is the minimum timestep such that π_i[T_i] = π_i[T_i + 1] = ... = π_i[T] = g_i. This is a commonly-used objective in MAPF studies [27]. Whenever obvious from context, we simply refer to sum-of-costs as "cost". We further use the following terms: dist(u, v) is the shortest path length between two nodes u, v ∈ V; cost(a_i, π) is the cost of agent a_i in a solution π, i.e., T_i.

B. MAPF Solvers
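To make the definitions of Sec. II-A concrete, the following is a minimal sketch (our own illustration, not the authors' code) of conflict checking and the sum-of-costs metric. Paths are lists of node ids, and an agent is assumed to wait at its final node forever:

```python
def has_conflict(pi_i, pi_j):
    """Check vertex and swap conflicts between two paths (Sec. II-A).
    Agents stay at their final location after finishing."""
    T = max(len(pi_i), len(pi_j))
    loc = lambda p, t: p[min(t, len(p) - 1)]   # pad by waiting at the end
    for t in range(T):
        if loc(pi_i, t) == loc(pi_j, t):
            return True                         # vertex conflict
        if t + 1 < T and loc(pi_i, t) == loc(pi_j, t + 1) \
                and loc(pi_i, t + 1) == loc(pi_j, t):
            return True                         # swap conflict
    return False

def cost(path):
    """T_i: minimum timestep after which the agent remains at its goal."""
    T = len(path) - 1
    while T > 0 and path[T - 1] == path[-1]:
        T -= 1
    return T

def sum_of_costs(paths):
    """Sum-of-costs objective over all agents."""
    return sum(cost(p) for p in paths)
```

For example, the paths (0, 1) and (1, 0) have a swap conflict, while an agent that reaches its goal early and waits contributes only its settling timestep to the sum-of-costs.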
This part explains several MAPF solvers that we will use later. Numerous solvers have been developed so far, and they can be categorized as: optimal or sub-optimal; complete or incomplete; search-based, prioritized planning, or rule-based. See [1], [28] for comprehensive reviews.

Conflict-based Search (CBS) [29], a popular optimal and complete MAPF solver, uses a two-level search. The high-level search manages conflicts between agents. When a conflict occurs between two agents at some time and location, two possible resolutions exist, depending on which agent gets to use the location at that time. Following this observation, CBS constructs a binary tree where each node includes constraints prohibiting certain agents from using certain space-time pairs. In the low-level search, each agent finds a single path constrained by the corresponding high-level node.

Many studies enhance CBS. Improved-CBS (ICBS) [30] prioritizes conflicts when splitting a high-level node with several conflicts. CBSH [31] adds an admissible heuristic for high-level nodes. Improved heuristics have also been proposed [32]. Enhanced CBS (ECBS) [33], a variant of CBS, is a complete and bounded sub-optimal solver, i.e., a returned solution is within a given sub-optimality bound. Instead of best-first search, ECBS uses focal search in both the high- and low-level searches. Focal search [34], a variant of A*, allows exploring efficient nodes not belonging to optimal solutions, e.g., in CBS, high-level nodes with few conflicts.

Anytime Focal Search (AFS) [22], an anytime version of focal search, iteratively refines a solution with guaranteed solution quality. Given enough time, AFS eventually converges to an optimal solution. Cohen et al. [22] applied AFS to the high-level search of CBS and realized an anytime MAPF solver.

Different from the search-based approaches explained so far, Hierarchical Cooperative A* (HCA*) [6], neither complete nor optimal, takes a decoupled approach. HCA* is a typical example of prioritized planning, i.e., it plans paths for agents sequentially while avoiding conflicts with previously planned paths. During the construction of prioritized paths, HCA* uses the shortest path length between two locations, ignoring collisions, as a heuristic. Windowed HCA* (WHCA*) [6] is a variant of HCA* that uses a limited lookahead window. In general, prioritized planning is fast, scalable, and practical with acceptable costs (e.g., [6], [35], [36]).

Čáp et al. [35] analyzed a sufficient condition under which sequential conflict-free paths can be constructed, and proposed Revisit Prioritized Planning (RPP), in which agents plan paths while avoiding the initial locations of all lower-priority agents. RPP is complete for well-formed instances: for each pair of start and goal, a path exists that traverses no other starts and goals. Note that well-formed instances are hard to realize in dense scenarios.

Push and Swap (PS) [14] is complete, sub-optimal, and an example of rule-based approaches. PS relies on two primitives: push, to move an agent towards its goal, and swap, to let two agents swap their locations without altering the configuration of other agents. It only allows a single agent or a pair of agents to move at each timestep. In general, rule-based approaches (e.g., [12]–[15]) are the fastest class for obtaining feasible solutions, but their quality is overlooked.

Priority Inheritance with Backtracking (PIBT) [16], incorporating both prioritized planning and rule-based approaches, plans the locations of all agents only for the next timestep and repeats this to obtain sub-optimal solutions. It ensures that all agents eventually reach their goals, but agents might not be at their goals simultaneously; hence it is incomplete for one-shot MAPF.

III. ITERATIVE REFINEMENT
The framework first obtains an initial solution from a sub-optimal MAPF solver, and then iteratively refines selected parts of the solution, namely the paths of a selected subset of the agents, using an optimal MAPF solver. We show the pseudocode in Algorithm 1.
Algorithm 1 The Framework of Iterative Refinement
Input: G, A, {π_1[0], ..., π_n[0]}, {g_1, ..., g_n}
Output: solution π or FAILURE
1: π ← initial solution obtained by an MAPF solver
2: if failed to obtain π then return FAILURE
3: while not interrupted do
4:   Create a modification set M ⊆ A using π
5:   π ← refined MAPF solution for M while fixing the others' paths in π
6: end while
7: return π

An initial feasible solution is quickly obtained by a sub-optimal solver [Line 1]. We refer to this sub-optimal MAPF solver as the initial solver. If the initial solver fails to obtain a solution, the framework ends with a failure [Line 2]; otherwise, the refinement starts [Lines 3–6]. The refinement iterates two procedures until interrupted: 1) create a modification set M ⊆ A using the current solution π [Line 4]; 2) refine the current solution π by changing the paths of the agents in M [Line 5] using an optimal MAPF solver. We call this solver the refinement solver. The refinement solver changes only the paths of agents in M; the paths of agents not in M are unchanged. The refinement continues until interrupted, e.g., by a timeout, by reaching a predetermined number of iterations, when no improvement is expected, by user interruption, etc. The framework eventually returns the final solution [Line 7].

The initial solver can be any sub-optimal MAPF solver, as long as it provides feasible solutions. As the refinement solver, it is desirable to use a version adapted from an optimal solver. The adaptation is simple: let it plan paths for the agents in M while regarding the others as dynamic obstacles. E.g., for CBS, solve MAPF only for the agents in M while prohibiting the low-level search from using any space-time pairs used by agents outside of M. In a precise sense, the refinement solver is not limited to optimal MAPF solvers. The requirement is that the refined solution never be worse than the original. Considering that the cost of paths for agents outside of M does not change, the requirement is that the cost of paths for agents in M be non-increasing before and after refinement.
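Algorithm 1 can be sketched in a few lines of Python. The three callables (initial solver, refinement solver, selection rule) are placeholders standing in for concrete MAPF solvers and rules, and the timeout stands in for "interrupted"; this is an illustration, not the authors' implementation:

```python
import time

def iterative_refinement(problem, initial_solver, refinement_solver,
                         select_rule, time_limit):
    """Sketch of Algorithm 1; the callables are assumptions of this sketch."""
    solution = initial_solver(problem)                 # [Line 1]
    if solution is None:
        return None                                    # FAILURE [Line 2]
    deadline = time.monotonic() + time_limit
    while time.monotonic() < deadline:                 # "not interrupted" [Line 3]
        M = select_rule(problem, solution)             # modification set [Line 4]
        refined = refinement_solver(problem, solution, M)  # others fixed [Line 5]
        if refined is not None:
            solution = refined                         # cost non-increasing (Cor. 1)
    return solution                                    # [Line 7]
```

Any combination of a sub-optimal initial solver and an adapted optimal refinement solver fits this interface, which is exactly the genericity the framework aims for.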
Corollary 1 (Monotonicity): For each iteration of Algorithm 1, the solution cost is non-increasing.

A key point is that the refinement solver recomputes the paths for a selected subset M of agents, rather than for the entire set A of all agents. Compared to solving the original problem directly with optimal solvers, the problem solved at each iteration by the refinement solver is significantly smaller, ensuring that the framework scales even to a large number of agents.

Fig. 1. An example of a local minimum (with a detour of length k). Goals are depicted by arrows.
A. Early Stop
Even though the sub-problems solved by the refinement solver are small compared to the original problem, the refinement may still take too long if |M| is too big. In such cases, it is preferable to abort the current refinement, keep the current solution, and start a new iteration with a new set M. The criterion can be a timeout or a threshold on the size of the search tree in the refinement solver.

B. Limitations
As a limitation, the framework may reach a local minimum, with no sub-optimality bound with respect to the optimal.
Proposition 1 (No sub-optimality bounds): Consider the optimal cost c*. In Algorithm 1, there is no w ≥ 1 such that c ≤ wc* always holds, unless A itself is selected as the modification set M, where c is the solution cost at each iteration.

Proof: Consider the example in Fig. 1. Assume that an initial solution assigns a_1 a clockwise path of cost k and a_2 a short counterclockwise path. For sufficiently large k, this is not optimal, because a_1 could take a counterclockwise path if a_2 temporarily moved off its goal. Unless M = A, refinement leaves this solution unchanged. Now assume some w ≥ 1 such that c ≤ wc*. Since k can be taken arbitrarily large, this contradicts the existence of w.

Corollary 2 (Existence of local minimum):
Depending on the initial solution, it may be impossible to reach the optimal solution unless A itself is selected as M. Note that when M = A, the refinement solver has to solve the original MAPF problem.

IV. DESIGN OF MODIFICATION SET

The modification set is an important component of the framework, and its design affects performance, such as computation time and solution quality. This section defines several selection rules to provide reasonable candidates.
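Each rule in this section maps the current solution to a modification set M ⊆ A. As a preview, the simplest rule, random selection (Sec. IV-A), can be sketched as follows (the function name is ours):

```python
import random

def random_rule(agents, size):
    """Pick a modification set M of a user-specified size uniformly at random."""
    # random.sample needs a sequence, hence the sorted() conversion
    return set(random.sample(sorted(agents), min(size, len(agents))))
```

The other rules below replace only the body of this function; the surrounding refinement loop stays the same.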
A. Random
One naive approach is to pick a subset of agents at random. The size of the modification set M is then a user-specified parameter. Note that a large |M| has a chance to reduce costs substantially in one iteration, but the refinement takes longer because the sub-problems become challenging.

B. Single Agent
This rule always picks a single agent as M; it can be regarded as a special case of the previous rule (random). Even with a single agent, the cost might be reduced by refinement. In this case, the refinement becomes just a single-agent pathfinding problem and can be computed efficiently without MAPF solvers, e.g., by A*.

Fig. 2. An example of local repair around goals.
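The single-agent refinement of Sec. IV-B can be done by a space-time search that treats the other agents' fixed paths as dynamic obstacles. A minimal sketch follows, handling vertex conflicts only; edge (swap) reservations and a distance heuristic are omitted for brevity, so this degenerates to a time-expanded breadth-first search. `reserved` is assumed to hold the (node, timestep) cells occupied by the fixed paths:

```python
import heapq

def spacetime_astar(adj, start, goal, reserved, max_t):
    """Plan one agent's path on graph `adj`, avoiding reserved (node, t) cells.
    Sketch only: vertex conflicts, no heuristic, horizon bounded by max_t."""
    open_ = [(0, start)]                  # (timestep, node)
    parent = {(start, 0): None}
    while open_:
        t, v = heapq.heappop(open_)
        # accept the goal only if the agent can then stay there until max_t
        if v == goal and all((goal, s) not in reserved for s in range(t, max_t + 1)):
            path, state = [], (v, t)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        if t >= max_t:
            continue
        for u in list(adj[v]) + [v]:      # move to a neighbor or wait
            if (u, t + 1) not in reserved and (u, t + 1) not in parent:
                parent[(u, t + 1)] = (v, t)
                heapq.heappush(open_, (t + 1, u))
    return None
```

For example, on a path graph 0–1–2 with `reserved = {(1, 1)}`, the agent waits one step and arrives at its goal at timestep 3, returning the path (0, 0, 1, 2).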
C. Focusing at Goals
Consider the example in Fig. 2. In the current solution, agent a_1 cannot achieve a shorter path because another agent a_2 occupies the goal of a_1 (i.e., g_1) at some timestep. In general, for a_i, one reason for a gap between the ideal cost dist(π_i[0], g_i) and the real cost cost(a_i, π) may be that another agent a_j uses a_i's goal g_i at a timestep t ≥ dist(π_i[0], g_i). Before t, at least, a_i cannot arrive at g_i and remain there. In this case, the paths of a_i and a_j must be refined jointly. This observation motivates the following simple rule, taking the current solution π and one agent a_i as input:

M ← { a_j | π_j[t] = g_i, dist(π_i[0], g_i) ≤ t ≤ cost(a_i, π) }

The rule for selecting a_i is arbitrary.

D. Local Repair around Goals
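The focusing-at-goals rule of Sec. IV-C can be sketched directly from its set definition (a minimal illustration; `dists[i]` is assumed to hold dist(π_i[0], g_i), and paths are lists of node ids with agents waiting at their final node):

```python
def focusing_at_goals(paths, goals, dists, i):
    """M = {a_j | π_j[t] = g_i, dist(π_i[0], g_i) <= t <= cost(a_i, π)} ∪ {a_i}."""
    def cost(p):                          # minimum settling timestep T_i
        T = len(p) - 1
        while T > 0 and p[T - 1] == p[-1]:
            T -= 1
        return T
    lo, hi = dists[i], cost(paths[i])
    M = {i}
    for j, pj in enumerate(paths):
        # a_j joins M if it occupies a_i's goal inside the critical window
        if j != i and any(pj[min(t, len(pj) - 1)] == goals[i]
                          for t in range(lo, hi + 1)):
            M.add(j)
    return M
```

The check is a single scan over the other agents' paths, which is why this rule is cheap compared to the MDD-based rule below.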
This is a special case of the previous rule (focusing-at-goals). Consider again the example in Fig. 2, where the modification set M for a_1 under focusing-at-goals is {a_1, a_2}; the refinement solver would therefore have to solve a sub-problem with two agents. This effort can be reduced. First, obtain a better path for a_1 ignoring π_2; around the goal, such a path can be obtained by local repair, without search. Next, compute a single path for a_2 while avoiding collisions with this new path and the other agents' paths. If the sum of costs of the two new paths is smaller than the original, replace π_1 and π_2 with the new paths. Since the search effort is reduced, the refinement is expected to finish faster. In general, this rule can be applied whenever π_i = (..., g_i, v, g_i, ..., g_i) with v ≠ g_i and another agent a_j uses g_i at that timestep.

E. Using MDD
Given a single-path cost c, the set of paths from π_i[0] to g_i with cost c can be compactly represented as a multi-valued decision diagram (MDD) [37]: a directed acyclic graph whose vertices are pairs of a location v ∈ V and a timestep t ∈ ℕ. Each vertex of an MDD satisfies two conditions: 1) the location is reachable from the start by that timestep, and 2) the goal is reachable from that location within the remaining timesteps. Let MDD_i^c be the MDD for a_i with cost c. Fig. 3 shows two examples. MDDs are commonly used in MAPF solvers [30], [38], [39].

Using MDD_i^c with dist(π_i[0], g_i) ≤ c < cost(a_i, π), a set of agents interfering with π_i can be detected. See the example in Fig. 3: to update the MDD of a_1 by π_2, remove the MDD vertices that collide with π_2, then remove all vertices that no longer satisfy the two conditions as a result. If no vertex remains at the final timestep, a_1 cannot reach its goal with cost c; in other words, π_2 prevents a_1 from achieving a smaller cost, hence π_1 and π_2 should be jointly refined. We describe the general procedure in Algorithm 2.

Fig. 3. Examples of MDDs.

Algorithm 2 using-MDD
Input: current solution π, an agent a_i ∈ A
Output: modification set M ⊆ A
1: M ← {a_i}
2: for dist(π_i[0], g_i) ≤ c < cost(a_i, π) do
3:   create MDD_i^c
4:   for a_j ∈ A \ {a_i} do
5:     update MDD_i^c by π_j
6:     if MDD_i^c was updated by π_j then M ← M ∪ {a_j}
7:   end for
8: end for
9: return M

F. Using Bottleneck Agent
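A simplified sketch of the MDD-based detection of Sec. IV-E: instead of maintaining the MDD explicitly, we test by forward reachability over its layers whether a_i can still achieve cost c once the cells occupied by a_j's path are removed. This handles vertex conflicts only and checks infeasibility rather than any change to the MDD, so it is a conservative variant of Algorithm 2; function names are ours:

```python
from collections import deque

def bfs_dists(adj, src):
    """Shortest path lengths from src (unit edge costs)."""
    d, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in d:
                d[u] = d[v] + 1
                q.append(u)
    return d

def mdd_feasible(adj, start, goal, c, path_j):
    """Can a_i still reach its goal within cost c after removing the
    space-time cells used by a_j's path? (vertex conflicts only)"""
    ds, dg = bfs_dists(adj, start), bfs_dists(adj, goal)
    # MDD membership: reachable from start by t, and goal reachable in c - t
    in_mdd = lambda v, t: ds.get(v, c + 1) <= t and dg.get(v, c + 1) <= c - t
    blocked = lambda v, t: path_j[min(t, len(path_j) - 1)] == v
    layer = {start} if in_mdd(start, 0) and not blocked(start, 0) else set()
    for t in range(c):
        layer = {u for v in layer for u in list(adj[v]) + [v]
                 if in_mdd(u, t + 1) and not blocked(u, t + 1)}
    return goal in layer
```

If `mdd_feasible(...)` is False for some a_j, that agent is blocking every cost-c path of a_i and is added to M.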
Consider the example of Fig. 3 again. If π_2 were removed, a_1 could take a shorter path, meaning that a_2 is a bottleneck for a_1. There is a chance to reduce the cost by jointly refining such a bottleneck agent together with the agents that could take shorter paths without it. We describe this concept in Algorithm 3.

Algorithm 3 using-bottleneck-agent
Input: current solution π, an agent a_i ∈ A
Output: modification set M ⊆ A
1: M ← {a_i}
2: for a_j ∈ A \ {a_i} do
3:   c ← cost of the best path for a_j while avoiding collisions with π \ {π_i, π_j}
4:   if c < cost(a_j, π) then M ← M ∪ {a_j}
5: end for
6: return M

In our implementation, the agent a_i is selected sequentially.

G. Composition

Each rule has situations it suits. E.g., the rule focusing-at-goals (Sec. IV-C) is nearly costless for creating modification sets, but it may fail to detect effective sets once solutions are already efficient to some extent. On the other hand, the rule using-MDD (Sec. IV-E) takes time, but it is highly likely to detect effective sets. Therefore, one promising direction is to compose these rules: execute the first rule until no improvement is expected, then switch to the second rule, and so on.

V. EVALUATION
The experiments consist of six parts: 1) comparing the agent-selection rules with inefficient initial solutions, 2) comparing the rules with efficient initial solutions, 3) evaluating dependencies on different initial solvers, 4) assessing costs compared to the optimal, 5) comparing with another anytime MAPF solver, and 6) tests in challenging scenarios, i.e., huge fields with many agents. We often use the sum-of-costs divided by Σ_{a_i ∈ A} dist(π_i[0], g_i) as the solution quality; smaller is better and the minimum is one. Even though optimal costs are hard to obtain, this score works as an upper bound on sub-optimality.

A. Experimental Setup
We carefully selected several four-connected grids from [27], [40] as the graph G, shown in Fig. 4; they are common in MAPF studies. In all settings, we used identical instances across solvers. All instances were created by choosing initial locations and destinations at random.

As the refinement solver, we used one adapted from ICBS [30], for the following reasons. First, CBS [29] is a promising and actively-studied optimal solver; however, it is sensitive to tie-breaking when choosing high-level nodes, making pure CBS poorly scalable. ICBS, an extension of CBS, improves this aspect. Though not state-of-the-art, ICBS is stable and has been used in many studies, e.g., [27], [31], [32]. We thus considered ICBS sufficient as a baseline for our experiments. Note that results heavily depend on the refinement solver; refinement might become much faster with a faster refinement solver.

In each setting, we introduced the early stop by timeout (Sec. III-A); the timeouts were adjusted to appropriate values before the experiments. In the refinement rule composition (Sec. IV-G), we sequentially used the rules local-repair-around-goals (IV-D), focusing-at-goals (IV-C), using-MDD (IV-E), and random (IV-A) with 30 agents. These rules were chosen according to preliminary results. A switch occurs when no improvement is achieved for any agent.

Implementations of AFS [22] and CBSH [32] were obtained from their respective authors; we used them directly. The simulator, including ICBS [30], ECBS [33], HCA* and WHCA* [6], RPP [35], PS [14], and PIBT [16], was developed in C++, and all experiments were run on a laptop with an Intel Core i9 2.3 GHz CPU and 16 GB RAM. It is available at https://kei18.github.io/mapf-IR
As a technical point, to obtain a fast, scalable, and complete sub-optimal solver with acceptable costs, we combined two solvers: PIBT and PS. PIBT, which repeats one-timestep planning for all agents, produces solutions with acceptable costs; however, it is incomplete. PS is complete but only allows one agent to move per timestep, resulting in poor outcomes compared to the optimal. Although we compress solutions from PS while preserving the temporal dependencies of the solution, inspired by techniques in [41], they are still too inefficient. We combine the two as follows. First, run PIBT until timestep max_{a_j ∈ A} dist(π_j[0], g_j), the minimum timestep needed for any solution. If some agents are not at their goals at that timestep, take this configuration as a new initial configuration and obtain the rest of the solution using PS. We call this solver PIBT+. Since most of the planning is computed by PIBT, we can expect much better outcomes than those of PS.

B. with Inefficient Initial Solutions
The first experiment assesses how each rule refines inefficient initial solutions. The initial solver was PIBT+. The refinements were stopped after 90 s in the small fields (random-32-32-20 and arena) and after 10 min in lak503d. The numbers of agents were fixed to 110, 300, and 500, respectively. These durations include the time required by the initial solver. The refinement timeout was 500 ms, except in lak503d.

Fig. 5 shows the average progress of the refinement over the instances. The rules single-agent and local-repair-around-goals reduce costs immediately but soon reach their limits, i.e., no further improvement despite room for refinement. The rule focusing-at-goals dramatically improves solution quality in each case, while the rule using-bottleneck-agent does not work as well as expected. Note that PIBT+ returned solutions within 500 ms even in the worst case (lak503d with 500 agents; see also Table I).
C. with Efficient Initial Solutions
Next, we tested the refinement starting from solutions that are already efficient to some extent, obtained by ECBS or RPP. The settings were otherwise the same as before.

Fig. 6 shows the results, which reveal a limitation of the rule focusing-at-goals in arena and random-32-32-20: it is difficult for this rule to refine already-efficient solutions. Rather, the rules using-MDD and random achieve smaller final costs. In lak503d, we often obtained initial solutions with little room for refinement, and the effect of refinement is subtle (see the y-axis). Even so, several rules still improve the solution quality. Throughout the two experiments so far, the rule composition successfully reduced costs at reasonable speed; we use this rule hereinafter.

D. with Different Initial Solvers
The third experiment evaluates dependencies on the initial solver. We used five initial solvers: PIBT+, HCA*, WHCA*, ECBS, and RPP. The refinement timeout was 500 ms, except in lak503d.

Fig. 4. Used maps with their sizes; |V| is shown in parentheses: random-32-32-20 (819), random-32-32-10 (922), random-64-64-20 (3,270), arena (2,054), lak307d (4,706), lak503d (17,953), brc202d (43,151), ost000a (130,478).

Fig. 5.
The average progress of the refinement with inefficient initial solutions.
The initial solver was PIBT+ (random-32-32-20, 110 agents; arena, 300 agents; lak503d, 500 agents). Compared rules: local repair, single agent, bottleneck, focusing at goals, MDD, random (10), random (30), and composition.

Fig. 6.
The average progress of the refinement with efficient initial solutions. In random-32-32-20 and arena, we used ECBS to obtain the initial solutions; its sub-optimality bounds were adjusted to balance runtime and solution quality. For lak503d, we prepared well-formed instances and used RPP; it is difficult to get such instances in the other settings because they are too dense. In lak503d, the y-axis does not start from one, so that differences between rules are visible; the improvements are tiny.

Fig. 7 shows the average progress. Table I summarizes the details; "cost" is the sum-of-costs divided by the lower bound. We show both initial and final scores. Some solvers failed on some instances, either because they returned failure due to incompleteness or because they failed to obtain solutions before the deadlines (90 s in random-64-64 and lak307d; 10 min in lak503d). "runtime" is when the initial solver returned its solution. All scores are averaged over the instances on which all initial solvers succeeded, except for lak503d, where WHCA* failed in most cases; there, we show the average scores without WHCA*.

The main observation is that, although the initial costs differ widely between the solvers, the final costs do not. This implies that any initial solver can be used, given enough time for refinement. In the following, we use PIBT+ because it instantly returns a feasible solution and thus fits the anytime property well.

E. v.s. Optimal Solutions
According to Proposition 1, the approximation ratio of refined solutions is unbounded. In practice, however, an estimate from empirical data is useful, so we evaluate this. We used two small settings (30 agents in random-32-32-20; 50 agents in random-32-32-10) because optimal solvers often fail to obtain solutions within a reasonable time in large fields or with many agents. Optimal solutions were obtained by CBSH. The refinement ran for a fixed duration, including the time for the initial solver.

Fig. 8 summarizes the results over 50 instances, showing the sum-of-costs divided by the optimal cost. The average runtime of CBSH was 710 ms in the 30-agent setting, while the refinement (PIBT+) obtained initial solutions much faster in all instances. Despite large gaps between the initial and optimal costs, the refinement dramatically reduces the gaps within a short time. Furthermore, most solutions reach the optimal.

TABLE I
THE DETAILED RESULTS WITH DIFFERENT INITIAL SOLVERS

                               PIBT+   HCA*    WHCA*   ECBS    RPP
random-64-64-20  cost (init)   1.219   1.069   1.096   1.037   1.035
(300 agents)     cost (last)   1.015   1.015   1.015   1.014   1.016
lak307d          cost (last)   1.003   1.003   1.003   1.003   1.003
(300 agents)
lak503d          cost (last)   1.019   1.018   -       1.018   1.019
(500 agents)

F. v.s. Other Anytime MAPF Solver
We next compared the proposal with another anytime MAPF solver, AFS, using random-32-32-20 while varying the number of agents. Note that AFS theoretically converges to the optimal eventually, while the proposal may not. The refinement timeout was 100 ms. We ran both algorithms for 30 s.

Fig. 9 shows the results. AFS failed to obtain solutions within the time limit for 2 instances with 90 agents. Clearly, the proposal has an advantage: it obtains initial solutions immediately while AFS does not, and its convergence is fast with better costs.

G. Challenging Scenarios
Finally, we tested the refinement with many agents on huge grids, namely 1500 agents in brc202d and 3000 agents in ost000a. The refinement timeouts were set per map (10 s for ost000a).
Fig. 7. The average progress of the refinement with different initial solvers (random-64-64-20, 300 agents; lak307d, 300 agents; lak503d, 500 agents; initial solvers: PIBT+, WHCA*, HCA*, ECBS, RPP). All instances were well-formed. The averages are over the instances on which all initial solvers succeeded. In lak503d, the scores were calculated excluding WHCA*, because WHCA* failed in most cases. The sub-optimality of ECBS and the window size of WHCA* were adjusted per map to balance success rate, cost, and runtime.

Fig. 8.
The results vs. optimal solutions. We show the suboptimality of 50 instances with initial scores, 0.1 s later, and at 1.0 s. The scores at 1.0 s are hard to recognize because most of them reach the optimal.

[Fig. 9 plots sum-of-costs vs. runtime (sec) on random-32-32-20 with 50, 70, and 90 agents; series: AFS, proposal.]
Fig. 9.
The results vs. another anytime MAPF solver. We omit scores after 10 s because they are almost flat. The improvements of AFS are subtle and hard to recognize.
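The scores in these plots are sum-of-costs values, normalized either by the optimal cost (Fig. 8) or by a lower bound, i.e., the sum of each agent's single-agent shortest-path length ignoring the other agents (Figs. 7 and 10). A small sketch of how such scores can be computed, assuming paths are given as vertex sequences padded with waits at the goal (the helper names are illustrative, not from the paper):

```python
def path_cost(path):
    """Cost of one agent: the time step after which it stays at its goal
    (trailing waits at the goal are not charged, a common MAPF convention)."""
    t = len(path) - 1
    while t > 0 and path[t] == path[t - 1]:
        t -= 1  # drop waits at the goal from the tail
    return t

def sum_of_costs(paths):
    """Sum-of-costs objective over all agents."""
    return sum(path_cost(p) for p in paths)

def suboptimality(paths, shortest_lengths):
    """cost / lower bound, where the bound sums each agent's
    single-agent shortest-path length (interactions ignored)."""
    return sum_of_costs(paths) / sum(shortest_lengths)
```

Because the lower bound ignores inter-agent collisions, a ratio of 1.0 certifies optimality, while the converse does not hold: an optimal solution in a congested instance can still have a ratio above 1.0.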
Fig. 10 shows the progress of the instances with one-hour refinement. The initial solutions were obtained for brc202d and 17 s for ost000a on average. The refinement gradually reduces costs; however, it is not particularly fast.

VI. DISCUSSION AND CONCLUSION
This paper presented the iterative refinement of pathfinding for multiple robots. The proposal uses two MAPF solvers as sub-procedures: a sub-optimal solver to obtain an initial solution and an optimal solver to refine the solution. Although the framework does not guarantee finding the optimal solution, the empirical results demonstrate its usefulness, i.e., the framework finds a solution with acceptable costs in a small computation time with high scalability. Furthermore, it is anytime planning, a desired property for real-time systems with severe deadlines.

According to the experimental results, the cost is reduced to near-optimal regardless of the initial solution; however, it is better to start with efficient enough solutions if available, because we can then get better solutions at an early stage. (We preliminarily tried other solvers, including CBSH and AFS, but most of them could not solve any instances. HCA* could sometimes yield solutions, but it requires about in brc202d and 15 min in ost000a. As a result, it is unlikely that efficient enough solutions can be obtained from the beginning in such huge scenarios.) Therefore, a practical anytime MAPF scheme would be the following. First, in parallel, start several initial solvers with different trade-offs between runtime and solution quality (e.g., PIBT+ and RPP). Then, apply refinement to the first solution obtained. If another initial solver later yields a better solution than the refined one, replace the current solution with the new one. In other words, this scheme compensates for the time lag of an efficient initial solver by using a fast but inefficient one.

As future directions, we describe the following two. 1) In the experiments, rule composition, i.e., combining several rules with different features, was successful. The rule composition itself is reasonable; however, since its components were chosen empirically, automatic selection of such rules depending on the situation might be promising. 2) In challenging scenarios (Sec. V-G), the refinement proceeds only gradually. Developing appropriate rules to achieve fast refinement for such scenarios remains open.

MAPF studies are very active, and our proposal can benefit from their developments. In particular, we used ICBS as the refinement solver in our experiments, but many studies enhance CBS, e.g., [31], [42], [43], and there are other promising optimal solvers [38], [44], [45]. The same holds for sub-optimal solvers as the initial solver, e.g., [36], [46]–[48]. With their effective use, we expect the framework to become better than presented here.

[Fig. 10 plots cost / lower bound vs. runtime (min) for brc202d with 1500 agents and ost000a with 3000 agents.]

Fig. 10. The results of challenging scenarios.

ACKNOWLEDGMENT
We are grateful to François Bonnet for his comments on the initial manuscript. We would like to thank Liron Cohen and Jiaoyang Li for sharing with us their implementations of AFS and CBSH, respectively. This work was partly supported by JSPS KAKENHI Grant Number 20J23011. Keisuke Okumura thanks the support of the Yoshida Scholarship Foundation.
REFERENCES

[1] R. Stern, “Multi-agent path finding – an overview,” in Artificial Intelligence – 5th RAAI Summer School, ser. LNCS, vol. 11866. Springer, 2019, pp. 96–115.
[2] P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,” AI Magazine, vol. 29, no. 1, p. 9, 2008.
[3] K. Dresner and P. Stone, “A multiagent approach to autonomous intersection management,” J. Artif. Intell. Res., vol. 31, pp. 591–656, 2008.
[4] R. Morris, C. S. Pasareanu, K. S. Luckow, W. Malik, H. Ma, T. S. Kumar, and S. Koenig, “Planning, scheduling and monitoring for airport surface operations,” in AAAI Workshop: Planning for Hybrid Systems, 2016.
[5] A. Okoso, K. Otaki, and T. Nishi, “Multi-agent path finding with priority for cooperative automated valet parking,” in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), 2019, pp. 2135–2140.
[6] D. Silver, “Cooperative pathfinding,” AIIDE, vol. 1, pp. 117–122, 2005.
[7] J. Yu and S. M. LaValle, “Structure and intractability of optimal multi-robot path planning on graphs,” in Proc. AAAI Conf. on Artificial Intelligence, 2013.
[8] J. Banfi, N. Basilico, and F. Amigoni, “Intractability of time-optimal multirobot path planning on 2D grid graphs with holes,” IEEE Robot. Autom. Lett. (RA-L), vol. 2, no. 4, pp. 1941–1947, 2017.
[9] H. Ma, C. Tovey, G. Sharon, T. S. Kumar, and S. Koenig, “Multi-agent path finding with payload transfers and the package-exchange robot-routing problem,” in Proc. AAAI Conf. on Artificial Intelligence, 2016.
[10] E. Lam and P. Le Bodic, “New valid inequalities in branch-and-cut-and-price for multi-agent path finding,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), vol. 30, 2020, pp. 184–192.
[11] K.-H. C. Wang, A. Botea, et al., “Fast and memory-efficient multi-agent pathfinding,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), 2008, pp. 380–387.
[12] P. Surynek, “A novel approach to path planning for multiple robots in bi-connected graphs,” in Proc. IEEE Intl. Conf. on Robotics and Automation (ICRA), 2009, pp. 3613–3619.
[13] K.-H. C. Wang and A. Botea, “MAPP: a scalable multi-agent path planning algorithm with tractability and completeness guarantees,” J. Artif. Intell. Res., vol. 42, pp. 55–90, 2011.
[14] R. Luna and K. E. Bekris, “Push and swap: Fast cooperative path-finding with completeness guarantees,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2011, pp. 294–300.
[15] B. de Wilde, A. W. ter Mors, and C. Witteveen, “Push and rotate: cooperative multi-agent path planning,” in Proc. Intl. Joint Conf. on Autonomous Agents & Multiagent Systems (AAMAS), 2013, pp. 87–94.
[16] K. Okumura, M. Machida, X. Défago, and Y. Tamura, “Priority inheritance with backtracking for iterative multi-agent path finding,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 535–542.
[17] S. Zilberstein, “Using anytime algorithms in intelligent systems,” AI Magazine, vol. 17, no. 3, pp. 73–83, 1996.
[18] H. Ma, J. Li, T. Kumar, and S. Koenig, “Lifelong multi-agent path finding for online pickup and delivery tasks,” in Proc. Intl. Joint Conf. on Autonomous Agents & Multiagent Systems (AAMAS), 2017, pp. 837–845.
[19] J. Švancara, M. Vlk, R. Stern, D. Atzmon, and R. Barták, “Online multi-agent pathfinding,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 33, 2019, pp. 7732–7739.
[20] T. Standley and R. Korf, “Complete algorithms for cooperative pathfinding problems,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2011, pp. 668–673.
[21] T. Standley, “Finding optimal solutions to cooperative pathfinding problems,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 24, no. 1, 2010.
[22] L. Cohen, M. Greco, H. Ma, C. Hernández, A. Felner, T. S. Kumar, and S. Koenig, “Anytime focal search with applications,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2018, pp. 1434–1441.
[23] K. Vedder and J. Biswas, “X*: Anytime multi-agent path finding for sparse domains using window-based iterative repairs,” Artif. Intell., vol. 291, p. 103417, 2021.
[24] P. Surynek, “Redundancy elimination in highly parallel solutions of motion coordination problems,” Int. J. on Artif. Intell. Tools, vol. 22, no. 05, p. 1360002, 2013.
[25] R. K. Ahuja, Ö. Ergun, J. B. Orlin, and A. P. Punnen, “A survey of very large-scale neighborhood search techniques,” Discrete Applied Mathematics, vol. 123, no. 1–3, pp. 75–102, 2002.
[26] T. Balyo, R. Barták, and P. Surynek, “Shortening plans by local re-planning,” in Proc. IEEE Intl. Conf. on Tools with Artificial Intelligence (ICTAI), 2012, pp. 1022–1028.
[27] R. Stern, N. Sturtevant, A. Felner, S. Koenig, H. Ma, T. Walker, J. Li, D. Atzmon, L. Cohen, T. Kumar, et al., “Multi-agent pathfinding: Definitions, variants, and benchmarks,” in Proc. Intl. Symp. on Combinatorial Search (SoCS), 2019, pp. 151–159.
[28] A. Felner, R. Stern, S. E. Shimony, E. Boyarski, M. Goldenberg, G. Sharon, N. Sturtevant, G. Wagner, and P. Surynek, “Search-based optimal solvers for the multi-agent pathfinding problem: Summary and challenges,” in Proc. Intl. Symp. on Combinatorial Search (SoCS), 2017.
[29] G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant, “Conflict-based search for optimal multi-agent pathfinding,” Artif. Intell., vol. 219, pp. 40–66, 2015.
[30] E. Boyarski, A. Felner, R. Stern, G. Sharon, D. Tolpin, O. Betzalel, and E. Shimony, “ICBS: Improved conflict-based search algorithm for multi-agent pathfinding,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2015.
[31] A. Felner, J. Li, E. Boyarski, H. Ma, L. Cohen, T. S. Kumar, and S. Koenig, “Adding heuristics to conflict-based search for multi-agent path finding,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), 2018.
[32] J. Li, A. Felner, E. Boyarski, H. Ma, and S. Koenig, “Improved heuristics for multi-agent path finding with conflict-based search,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 442–449.
[33] M. Barer, G. Sharon, R. Stern, and A. Felner, “Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem,” in Proc. Intl. Symp. on Combinatorial Search (SoCS), 2014.
[34] J. Pearl and J. H. Kim, “Studies in semi-admissible heuristics,” IEEE Trans. Pattern Anal. Mach. Intell., no. 4, pp. 392–399, 1982.
[35] M. Čáp, P. Novák, A. Kleiner, and M. Selecký, “Prioritized planning algorithms for trajectory coordination of multiple mobile robots,” IEEE Trans. Autom. Sci. Eng., vol. 12, no. 3, pp. 835–849, 2015.
[36] H. Ma, D. Harabor, P. J. Stuckey, J. Li, and S. Koenig, “Searching with consistent prioritization for multi-agent path finding,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 33, 2019, pp. 7643–7650.
[37] A. Srinivasan, T. Ham, S. Malik, and R. K. Brayton, “Algorithms for discrete function manipulation,” in IEEE Intl. Conf. on Computer-Aided Design (ICCAD), 1990, pp. 92–95.
[38] G. Sharon, R. Stern, M. Goldenberg, and A. Felner, “The increasing cost tree search for optimal multi-agent pathfinding,” Artif. Intell., vol. 195, pp. 470–495, 2013.
[39] O. Amir, G. Sharon, and R. Stern, “Multi-agent pathfinding as a combinatorial auction,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 29, no. 1, 2015.
[40] N. R. Sturtevant, “Benchmarks for grid-based pathfinding,” IEEE Trans. on Computational Intelligence and AI in Games, vol. 4, no. 2, pp. 144–148, 2012.
[41] W. Hönig, T. S. Kumar, L. Cohen, H. Ma, H. Xu, N. Ayanian, and S. Koenig, “Multi-agent path finding with kinematic constraints,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), 2016, pp. 477–485.
[42] J. Li, D. Harabor, P. J. Stuckey, H. Ma, and S. Koenig, “Symmetry-breaking constraints for grid-based multi-agent path finding,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 33, 2019, pp. 6087–6095.
[43] H. Zhang, J. Li, P. Surynek, S. Koenig, and T. S. Kumar, “Multi-agent path finding with mutex propagation,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), vol. 30, 2020, pp. 323–332.
[44] G. Wagner and H. Choset, “Subdimensional expansion for multirobot path planning,” Artif. Intell., vol. 219, pp. 1–24, 2015.
[45] E. Lam, P. Le Bodic, D. D. Harabor, and P. J. Stuckey, “Branch-and-cut-and-price for multi-agent pathfinding,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 1289–1296.
[46] K. Okumura, Y. Tamura, and X. Défago, “winPIBT: Extended prioritized algorithm for iterative multi-agent path finding,” arXiv preprint arXiv:1905.10149, 2019.
[47] S. D. Han and J. Yu, “DDM: Fast near-optimal multi-robot path planning using diversified-path and optimal sub-problem solution database heuristics,” IEEE Robot. Autom. Lett. (RA-L), vol. 5, no. 2, pp. 1350–1357, 2020.
[48] J. Li, W. Ruml, and S. Koenig, “EECBS: A bounded-suboptimal search for multi-agent path finding,” in