Iterative Refinement for Real-Time Multi-Robot Path Planning
Keisuke Okumura, Yasumasa Tamura, and Xavier Défago

Abstract — We study the iterative refinement of path planning for multiple robots, known as multi-agent pathfinding (MAPF). Given a graph, agents, their initial locations, and destinations, a solution of MAPF is a set of collision-free paths. Iterative refinement for MAPF is desirable for three reasons: 1) optimization is intractable, 2) sub-optimal solutions can be obtained instantly, and 3) it yields anytime planning, which is desired in online scenarios where time for deliberation is limited. Despite this demand, iterative refinement is under-explored in MAPF because it has been unclear so far how to find good neighborhood solutions. Our proposal uses a sub-optimal MAPF solver to obtain an initial solution quickly, then iterates two procedures: 1) select a subset of agents, and 2) use an optimal MAPF solver to refine the paths of the selected agents while keeping the other paths unchanged. Since the optimal solver is applied only to small instances of the problem, this scheme rapidly yields efficient-enough solutions while providing high scalability. We also present reasonable candidate rules for selecting the subset of agents. Evaluations in various scenarios show that the proposal is promising: convergence is fast and scalable with reasonable quality, and the scheme is practical because it can be interrupted whenever a solution is needed.
I. INTRODUCTION
Path planning for multiple robots is a fundamental problem in multi-robot coordination. The objective is to assign each robot on a graph a collision-free path to its destination. This problem is known under various names: multi-robot path planning, cooperative path finding, or multi-agent pathfinding (MAPF) [1]. Hereafter, this paper calls it MAPF, where robots are represented as agents moving on a graph. Applications of MAPF are inherently real-time systems with limited time for planning, e.g., automated warehouses [2], intersection management [3], airport surface operation [4], automated parking [5], and video games [6]. It is critical to obtain feasible solutions with sufficient quality before deadlines.

Contrary to the importance of efficient path planning, optimization is intractable. MAPF is known to be NP-hard for various optimization criteria [7]. This remains true when restricting fields to grid structures [8], or when approximating within any constant factor less than 4/3 [9]. Even state-of-the-art optimal algorithms only handle fewer than 300 agents [10]. On the other hand, we can create sub-optimal solutions in a very short time [11]–[16]. Although these often ignore solution quality, having feasible solutions is clearly better than having no solution as a result of waiting for optimal solvers, which are not guaranteed to return one within the deadlines.

The authors are with the School of Computing, Tokyo Institute of Technology, Tokyo, Japan. {okumura.k, tamura.y, defago.x}@coord.c.titech.ac.jp

With feasible solutions obtained quickly, we can use the remaining time until the deadlines for iterative refinement. This is the basic motivation of anytime algorithms [17], which can yield a feasible solution whenever interrupted, the quality of which improves as time passes. Iterative refinement is also fruitful for online situations where goals are allocated dynamically [18], [19]. The challenging part here is replanning.
Two intuitive approaches exist: 1) replan all paths, or 2) replan a single path for the one robot with a new goal while keeping the others unchanged. The first approach may return efficient solutions but is costly and typically inappropriate for online use. The second approach may return inefficient solutions but is nearly costless. We can apply the second approach to obtain an initial solution that we then gradually refine within the time constraints. Iterative refinement is promising though under-explored in MAPF because, so far, it was unclear how to incrementally improve a known solution. In the context of local search, this corresponds to finding a good neighborhood solution. Hence, we aim to propose an anytime framework of iterative refinement for MAPF.

State-of-the-art anytime MAPF:
Standley and Korf [20] extended their previous optimal MAPF algorithm [21] into an anytime version. Cohen et al. [22] studied an anytime algorithm based on a variant of A* and applied it to MAPF. X* [23] is an anytime MAPF solver assuming sparse scenarios, i.e., agents distributed sparsely in fields where the potential for collisions is rare. These three methods search for non-optimal solutions by relaxing some constraints, then eventually converge to optimal solutions by iteratively tightening the constraints. A drawback is that they are each tied to a specific solver, and that they may fail to obtain initial solutions in a reasonable time, thus returning nothing. Surynek [24] studied local repairing rules for pebble motion on graphs, which can be adapted to MAPF. Since improvements are done by ad-hoc local changes, redundancies of a priori unknown patterns remain in the solution.

Contributions:
We propose a generic framework that provides anytime MAPF based on an effective combination of existing solvers. Our framework first uses a sub-optimal MAPF solver to obtain an initial feasible solution very quickly; then, it uses an optimal MAPF solver to find good neighborhood solutions. Precisely, the framework refines the solution iteratively by selecting a subset of agents and using an optimal solver to refine their paths while keeping the other paths fixed. Although the refinement process uses an optimal solver, each refinement completes quickly because it solves a sub-problem whose size depends on the number of selected agents, typically much smaller than the original. We also present reasonable candidate rules for selecting a subset of agents.

We study the effectiveness of the approach in various benchmarks, and observe empirically that the framework converges almost optimally within a short time in small instances, and remains responsive even for very large instances (i.e., large environments and/or many agents). In other words, it brings many practical advantages over prior art. In a wider view, our study can also be seen as solving a very large-scale neighborhood search [25]. Closest to our concept, Balyo et al. [26] studied local replanning for domain-independent planning problems to optimize makespan. It repeats the following: create a sub-problem, obtain an optimal sub-solution by SAT-based techniques, and replace that part of the original solution with the new one.

Paper Organization:
Section II provides a formal definition of MAPF and reviews several MAPF solvers that we use. Section III describes the framework, including basic theoretical analysis. Section IV presents construction rules for a subset of agents. Section V evaluates the proposal on MAPF benchmarks. Section VI concludes the paper.

II. PRELIMINARIES
A. Problem Definition (MAPF)
The MAPF problem is defined as follows. Consider a set of agents A = {a_1, ..., a_n} evolving in an environment represented as a connected and undirected graph G = (V, E). Let π_i[t] ∈ V denote the location of agent a_i at discrete time t ∈ ℕ. At each timestep t, a_i can move to an adjacent node or stay at its current location, i.e., π_i[t+1] ∈ {v | (π_i[t], v) ∈ E} ∪ {π_i[t]}. Agents must avoid two types of conflicts: vertex conflicts, i.e., π_i[t] ≠ π_j[t], and swap conflicts, i.e., π_i[t] ≠ π_j[t+1] ∨ π_i[t+1] ≠ π_j[t]. Given a distinct initial location π_i[0] and a distinct goal g_i ∈ V for each agent a_i, a solution π = (π_1, ..., π_n) is a collection of paths (one for each agent), where π_i = (π_i[0], π_i[1], ..., π_i[T]) such that π_i[T] = g_i.

To evaluate solution quality, we use the sum-of-costs metric: Σ_{a_i ∈ A} T_i, where T_i is the minimum timestep such that π_i[T_i] = π_i[T_i + 1] = ... = π_i[T] = g_i. This is a commonly-used objective in MAPF studies [27]. Whenever obvious from context, we simply refer to sum-of-costs as "cost". We further use the following terms: dist(u, v) is the shortest path length between two nodes u, v ∈ V; cost(a_i, π) is the cost of agent a_i in a solution π, i.e., T_i.

B. MAPF Solvers
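To make the definitions of Sec. II-A concrete, the following is a minimal sketch (our own illustration, not the authors' code) of conflict checking and the sum-of-costs metric. Paths are lists of node ids, and an agent is assumed to wait at its final node forever:

```python
def has_conflict(pi_i, pi_j):
    """Check vertex and swap conflicts between two paths (Sec. II-A).
    Agents stay at their final location after finishing."""
    T = max(len(pi_i), len(pi_j))
    loc = lambda p, t: p[min(t, len(p) - 1)]   # pad by waiting at the end
    for t in range(T):
        if loc(pi_i, t) == loc(pi_j, t):
            return True                         # vertex conflict
        if t + 1 < T and loc(pi_i, t) == loc(pi_j, t + 1) \
                and loc(pi_i, t + 1) == loc(pi_j, t):
            return True                         # swap conflict
    return False

def cost(path):
    """T_i: minimum timestep after which the agent remains at its goal."""
    T = len(path) - 1
    while T > 0 and path[T - 1] == path[-1]:
        T -= 1
    return T

def sum_of_costs(paths):
    """Sum-of-costs objective over all agents."""
    return sum(cost(p) for p in paths)
```

For example, the paths (0, 1) and (1, 0) have a swap conflict, while an agent that reaches its goal early and waits contributes only its settling timestep to the sum-of-costs.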
This part explains several MAPF solvers that we will use later. Numerous solvers have been developed so far, and they can be categorized as: optimal or sub-optimal; complete or incomplete; search-based, prioritized planning, or rule-based. See [1], [28] for comprehensive reviews.

Conflict-based Search (CBS) [29], a popular optimal and complete MAPF solver, uses a two-level search. The high-level search manages conflicts between agents. When a conflict occurs between two agents at some time and location, two possible resolutions exist, depending on which agent gets to use the location at that time. Following this observation, CBS constructs a binary tree where each node includes constraints prohibiting certain agents from using certain space-time pairs. In the low-level search, each agent finds a single path constrained by the corresponding high-level node.

Many studies enhance CBS. Improved-CBS (ICBS) [30] prioritizes conflicts when splitting a high-level node with several conflicts. CBSH [31] adds an admissible heuristic for high-level nodes. Improved heuristics have also been proposed [32]. Enhanced CBS (ECBS) [33], a variant of CBS, is a complete and bounded sub-optimal solver, i.e., a returned solution is within a given sub-optimality bound. Instead of best-first search, ECBS uses focal search in both the high- and low-level searches. Focal search [34], a variant of A*, allows exploring efficient nodes not belonging to optimal solutions, e.g., in CBS, high-level nodes with few conflicts.

Anytime Focal Search (AFS) [22], an anytime version of focal search, iteratively refines a solution with guaranteed solution quality. Given enough time, AFS eventually converges to an optimal solution. Cohen et al. [22] applied AFS to the high-level search of CBS and realized an anytime MAPF solver.

Different from the search-based approaches explained so far, Hierarchical Cooperative A* (HCA*) [6], neither complete nor optimal, takes a decoupled approach. HCA* is a typical example of prioritized planning, i.e., it plans paths for agents sequentially while avoiding conflicts with previously planned paths. During the construction of prioritized paths, HCA* uses the shortest path length between two locations, ignoring collisions, as a heuristic. Windowed HCA* (WHCA*) [6] is a variant of HCA* that uses a limited lookahead window. In general, prioritized planning is fast, scalable, and practical with acceptable costs (e.g., [6], [35], [36]).

Čáp et al. [35] analyzed a sufficient condition under which sequential conflict-free paths can be constructed, and proposed Revisit Prioritized Planning (RPP), in which agents plan paths while avoiding the initial locations of all lower-priority agents. RPP is complete for well-formed instances: for each pair of start and goal, a path exists that traverses no other starts and goals. Note that well-formed instances are hard to realize in dense scenarios.

Push and Swap (PS) [14] is complete, sub-optimal, and an example of rule-based approaches. PS relies on two primitives: push, to move an agent towards its goal, and swap, to let two agents swap their locations without altering the configuration of other agents. It only allows a single agent or a pair of agents to move at each timestep. In general, rule-based approaches (e.g., [12]–[15]) are the fastest class for obtaining feasible solutions, but their quality is overlooked.

Priority Inheritance with Backtracking (PIBT) [16], incorporating both prioritized planning and rule-based approaches, plans the locations of all agents only for the next timestep and repeats this to obtain sub-optimal solutions. It ensures that all agents eventually reach their goals, but agents might not be at their goals simultaneously; hence it is incomplete for one-shot MAPF.

III. ITERATIVE REFINEMENT
The framework first obtains an initial solution from a sub-optimal MAPF solver, and then iteratively refines selected parts of the solution, namely the paths of a selected subset of the agents, using an optimal MAPF solver. We show the pseudocode in Algorithm 1.
Algorithm 1 The Framework of Iterative Refinement
Input: G, A, {π_1[0], ..., π_n[0]}, {g_1, ..., g_n}
Output: solution π or FAILURE
1: π ← initial solution obtained by an MAPF solver
2: if failed to obtain π then return FAILURE
3: while not interrupted do
4:   Create a modification set M ⊆ A using π
5:   π ← refined MAPF solution for M while fixing the others' paths in π
6: end while
7: return π

An initial feasible solution is quickly obtained by a sub-optimal solver [Line 1]. We refer to this sub-optimal MAPF solver as the initial solver. If the initial solver fails to obtain a solution, the framework ends with a failure [Line 2]; otherwise, the refinement starts [Lines 3–6]. The refinement iterates two procedures until interrupted: 1) create a modification set M ⊆ A using the current solution π [Line 4]; 2) refine the current solution π by changing the paths of the agents in M [Line 5] using an optimal MAPF solver. We call this solver the refinement solver. The refinement solver changes only the paths of agents in M; the paths of agents not in M are unchanged. The refinement continues until interrupted, e.g., by a timeout, by reaching a predetermined number of iterations, when no improvement is expected, by user interruption, etc. The framework eventually returns the final solution [Line 7].

The initial solver can be any sub-optimal MAPF solver, as long as it provides feasible solutions. As the refinement solver, it is desirable to use a version adapted from an optimal solver. The adaptation is simple: let it plan paths for the agents in M while regarding the others as dynamic obstacles. E.g., for CBS, solve MAPF only for the agents in M while prohibiting the low-level search from using any space-time pairs used by agents outside of M. In a precise sense, the refinement solver is not limited to optimal MAPF solvers. The requirement is that the refined solution never be worse than the original. Considering that the cost of paths for agents outside of M does not change, the requirement is that the cost of paths for agents in M be non-increasing before and after refinement.
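Algorithm 1 can be sketched in a few lines of Python. The three callables (initial solver, refinement solver, selection rule) are placeholders standing in for concrete MAPF solvers and rules, and the timeout stands in for "interrupted"; this is an illustration, not the authors' implementation:

```python
import time

def iterative_refinement(problem, initial_solver, refinement_solver,
                         select_rule, time_limit):
    """Sketch of Algorithm 1; the callables are assumptions of this sketch."""
    solution = initial_solver(problem)                 # [Line 1]
    if solution is None:
        return None                                    # FAILURE [Line 2]
    deadline = time.monotonic() + time_limit
    while time.monotonic() < deadline:                 # "not interrupted" [Line 3]
        M = select_rule(problem, solution)             # modification set [Line 4]
        refined = refinement_solver(problem, solution, M)  # others fixed [Line 5]
        if refined is not None:
            solution = refined                         # cost non-increasing (Cor. 1)
    return solution                                    # [Line 7]
```

Any combination of a sub-optimal initial solver and an adapted optimal refinement solver fits this interface, which is exactly the genericity the framework aims for.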
Corollary 1 (Monotonicity): For each iteration of Algorithm 1, the solution cost is non-increasing.

A key point is that the refinement solver recomputes the paths for a selected subset M of agents, rather than for the entire set A of all agents. Compared to solving the original problem directly with optimal solvers, the problem solved at each iteration by the refinement solver is significantly smaller, ensuring that the framework scales even to a large number of agents.

Fig. 1. An example of a local minimum (with a detour of length k). Goals are depicted by arrows.
A. Early Stop
Even though the sub-problems solved by the refinement solver are small compared to the original problem, the refinement may still take too long if |M| is too big. In such cases, it is preferable to abort the current refinement, keep the current solution, and start a new iteration with a new set M. The criterion can be a timeout or a threshold on the size of the search tree in the refinement solver.

B. Limitations
As a limitation, the framework may reach a local minimum, with no sub-optimality bound with respect to the optimal.
Proposition 1 (No sub-optimality bounds): Consider the optimal cost c*. In Algorithm 1, there is no w ≥ 1 such that c ≤ wc* always holds, unless A itself is selected as the modification set M, where c is the solution cost at each iteration.

Proof: Consider the example in Fig. 1. Assume that an initial solution assigns a_1 a clockwise path of cost k and a_2 a short counterclockwise path. For sufficiently large k, this is not optimal, because a_1 could take a counterclockwise path if a_2 temporarily moved off its goal. Unless M = A, refinement leaves this solution unchanged. Now assume some w ≥ 1 such that c ≤ wc*. Since k can be taken arbitrarily large, this contradicts the existence of w.

Corollary 2 (Existence of local minimum):
Depending on the initial solution, it may be impossible to reach the optimal solution unless A itself is selected as M. Note that when M = A, the refinement solver has to solve the original MAPF problem.

IV. DESIGN OF MODIFICATION SET

The modification set is an important component of the framework, and its design affects performance, such as computation time and solution quality. This section defines several selection rules to provide reasonable candidates.
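Each rule in this section maps the current solution to a modification set M ⊆ A. As a preview, the simplest rule, random selection (Sec. IV-A), can be sketched as follows (the function name is ours):

```python
import random

def random_rule(agents, size):
    """Pick a modification set M of a user-specified size uniformly at random."""
    # random.sample needs a sequence, hence the sorted() conversion
    return set(random.sample(sorted(agents), min(size, len(agents))))
```

The other rules below replace only the body of this function; the surrounding refinement loop stays the same.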
A. Random
One naive approach is to pick a subset of agents at random. The size of the modification set M is then a user-specified parameter. Note that a large |M| has a chance to reduce costs substantially in one iteration, but the refinement takes longer because the sub-problems become challenging.

B. Single Agent
This rule always picks a single agent as M; it can be regarded as a special case of the previous rule (random). Even with a single agent, the cost might be reduced by refinement. In this case, the refinement becomes just a single-agent pathfinding problem and can be computed efficiently without MAPF solvers, e.g., by A*.

Fig. 2. An example of local repair around goals.
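The single-agent refinement of Sec. IV-B can be done by a space-time search that treats the other agents' fixed paths as dynamic obstacles. A minimal sketch follows, handling vertex conflicts only; edge (swap) reservations and a distance heuristic are omitted for brevity, so this degenerates to a time-expanded breadth-first search. `reserved` is assumed to hold the (node, timestep) cells occupied by the fixed paths:

```python
import heapq

def spacetime_astar(adj, start, goal, reserved, max_t):
    """Plan one agent's path on graph `adj`, avoiding reserved (node, t) cells.
    Sketch only: vertex conflicts, no heuristic, horizon bounded by max_t."""
    open_ = [(0, start)]                  # (timestep, node)
    parent = {(start, 0): None}
    while open_:
        t, v = heapq.heappop(open_)
        # accept the goal only if the agent can then stay there until max_t
        if v == goal and all((goal, s) not in reserved for s in range(t, max_t + 1)):
            path, state = [], (v, t)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        if t >= max_t:
            continue
        for u in list(adj[v]) + [v]:      # move to a neighbor or wait
            if (u, t + 1) not in reserved and (u, t + 1) not in parent:
                parent[(u, t + 1)] = (v, t)
                heapq.heappush(open_, (t + 1, u))
    return None
```

For example, on a path graph 0–1–2 with `reserved = {(1, 1)}`, the agent waits one step and arrives at its goal at timestep 3, returning the path (0, 0, 1, 2).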
C. Focusing at Goals
Consider the example in Fig. 2. In the current solution, agent a_1 cannot achieve a shorter path because another agent a_2 occupies the goal of a_1 (i.e., g_1) at some timestep. In general, for a_i, one reason for a gap between the ideal cost dist(π_i[0], g_i) and the real cost cost(a_i, π) may be that another agent a_j uses a_i's goal g_i at a timestep t ≥ dist(π_i[0], g_i). Before t, at least, a_i cannot arrive at g_i and remain there. In this case, the paths of a_i and a_j must be refined jointly. This observation motivates the following simple rule, taking the current solution π and one agent a_i as input:

M ← { a_j | π_j[t] = g_i, dist(π_i[0], g_i) ≤ t ≤ cost(a_i, π) }

The rule for selecting a_i is arbitrary.

D. Local Repair around Goals
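The focusing-at-goals rule of Sec. IV-C can be sketched directly from its set definition (a minimal illustration; `dists[i]` is assumed to hold dist(π_i[0], g_i), and paths are lists of node ids with agents waiting at their final node):

```python
def focusing_at_goals(paths, goals, dists, i):
    """M = {a_j | π_j[t] = g_i, dist(π_i[0], g_i) <= t <= cost(a_i, π)} ∪ {a_i}."""
    def cost(p):                          # minimum settling timestep T_i
        T = len(p) - 1
        while T > 0 and p[T - 1] == p[-1]:
            T -= 1
        return T
    lo, hi = dists[i], cost(paths[i])
    M = {i}
    for j, pj in enumerate(paths):
        # a_j joins M if it occupies a_i's goal inside the critical window
        if j != i and any(pj[min(t, len(pj) - 1)] == goals[i]
                          for t in range(lo, hi + 1)):
            M.add(j)
    return M
```

The check is a single scan over the other agents' paths, which is why this rule is cheap compared to the MDD-based rule below.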
This is a special case of the previous rule (focusing-at-goals). Consider again the example in Fig. 2, where the modification set M for a_1 under focusing-at-goals is {a_1, a_2}; the refinement solver would therefore have to solve a sub-problem with two agents. This effort can be reduced. First, obtain a better path for a_1 ignoring π_2; around the goal, such a path can be obtained by local repair, without search. Next, compute a single path for a_2 while avoiding collisions with this new path and the other agents' paths. If the sum of costs of the two new paths is smaller than the original, replace π_1 and π_2 with the new paths. Since the search effort is reduced, the refinement is expected to finish faster. In general, this rule can be applied whenever π_i = (..., g_i, v, g_i, ..., g_i) with v ≠ g_i and another agent a_j uses g_i at that timestep.

E. Using MDD
Given a single-path cost c, the set of paths from π_i[0] to g_i with cost c can be compactly represented as a multi-valued decision diagram (MDD) [37]: a directed acyclic graph whose vertices are pairs of a location v ∈ V and a timestep t ∈ ℕ. Each vertex of an MDD satisfies two conditions: 1) the location is reachable from the start by that timestep, and 2) the goal is reachable from that location within the remaining timesteps. Let MDD_i^c be the MDD for a_i with cost c. Fig. 3 shows two examples. MDDs are commonly used in MAPF solvers [30], [38], [39].

Using MDD_i^c with dist(π_i[0], g_i) ≤ c < cost(a_i, π), a set of agents interfering with π_i can be detected. See the example in Fig. 3: to update the MDD of a_1 by π_2, remove the MDD vertices that collide with π_2, then remove all vertices that no longer satisfy the two conditions as a result. If no vertex remains at the final timestep, a_1 cannot reach its goal with cost c; in other words, π_2 prevents a_1 from achieving a smaller cost, hence π_1 and π_2 should be jointly refined. We describe the general procedure in Algorithm 2.

Fig. 3. Examples of MDDs.

Algorithm 2 using-MDD
Input: current solution π, an agent a_i ∈ A
Output: modification set M ⊆ A
1: M ← {a_i}
2: for dist(π_i[0], g_i) ≤ c < cost(a_i, π) do
3:   create MDD_i^c
4:   for a_j ∈ A \ {a_i} do
5:     update MDD_i^c by π_j
6:     if MDD_i^c was updated by π_j then M ← M ∪ {a_j}
7:   end for
8: end for
9: return M

F. Using Bottleneck Agent
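A simplified sketch of the MDD-based detection of Sec. IV-E: instead of maintaining the MDD explicitly, we test by forward reachability over its layers whether a_i can still achieve cost c once the cells occupied by a_j's path are removed. This handles vertex conflicts only and checks infeasibility rather than any change to the MDD, so it is a conservative variant of Algorithm 2; function names are ours:

```python
from collections import deque

def bfs_dists(adj, src):
    """Shortest path lengths from src (unit edge costs)."""
    d, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in d:
                d[u] = d[v] + 1
                q.append(u)
    return d

def mdd_feasible(adj, start, goal, c, path_j):
    """Can a_i still reach its goal within cost c after removing the
    space-time cells used by a_j's path? (vertex conflicts only)"""
    ds, dg = bfs_dists(adj, start), bfs_dists(adj, goal)
    # MDD membership: reachable from start by t, and goal reachable in c - t
    in_mdd = lambda v, t: ds.get(v, c + 1) <= t and dg.get(v, c + 1) <= c - t
    blocked = lambda v, t: path_j[min(t, len(path_j) - 1)] == v
    layer = {start} if in_mdd(start, 0) and not blocked(start, 0) else set()
    for t in range(c):
        layer = {u for v in layer for u in list(adj[v]) + [v]
                 if in_mdd(u, t + 1) and not blocked(u, t + 1)}
    return goal in layer
```

If `mdd_feasible(...)` is False for some a_j, that agent is blocking every cost-c path of a_i and is added to M.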
Consider the example of Fig. 3 again. If π_2 were removed, a_1 could take a shorter path, meaning that a_2 is a bottleneck for a_1. There is a chance to reduce the cost by jointly refining such a bottleneck agent together with the agents that could take shorter paths without it. We describe this concept in Algorithm 3.

Algorithm 3 using-bottleneck-agent
Input: current solution π, an agent a_i ∈ A
Output: modification set M ⊆ A
1: M ← {a_i}
2: for a_j ∈ A \ {a_i} do
3:   c ← cost of the best path for a_j while avoiding collisions with π \ {π_i, π_j}
4:   if c < cost(a_j, π) then M ← M ∪ {a_j}
5: end for
6: return M

In our implementation, the agent a_i is selected sequentially.

G. Composition

Each rule has situations it suits. E.g., the rule focusing-at-goals (Sec. IV-C) is nearly costless for creating modification sets, but it may fail to detect effective sets once solutions are already efficient to some extent. On the other hand, the rule using-MDD (Sec. IV-E) takes time, but it is highly likely to detect effective sets. Therefore, one promising direction is to compose these rules: execute the first rule until no improvement is expected, then switch to the second rule, and so on.

V. EVALUATION
The experiments consist of six parts: 1) comparing the agent-selection rules with inefficient initial solutions, 2) comparing the rules with efficient initial solutions, 3) evaluating dependencies on different initial solvers, 4) assessing costs compared to the optimal, 5) comparing with another anytime MAPF solver, and 6) tests in challenging scenarios, i.e., huge fields with many agents. We often use the sum-of-costs divided by Σ_{a_i ∈ A} dist(π_i[0], g_i) as the solution quality; smaller is better and the minimum is one. Even though optimal costs are hard to obtain, this score works as an upper bound on sub-optimality.

A. Experimental Setup
We carefully selected several four-connected grids from [27], [40] as the graph G, shown in Fig. 4; they are common in MAPF studies. In all settings, we used identical instances across solvers. All instances were created by choosing initial locations and destinations at random.

As the refinement solver, we used one adapted from ICBS [30], for the following reasons. First, CBS [29] is a promising and actively-studied optimal solver; however, it is sensitive to tie-breaking when choosing high-level nodes, making pure CBS poorly scalable. ICBS, an extension of CBS, improves this aspect. Though not state-of-the-art, ICBS is stable and has been used in many studies, e.g., [27], [31], [32]. We thus considered ICBS sufficient as a baseline for our experiments. Note that results heavily depend on the refinement solver; refinement might become much faster with a faster refinement solver.

In each setting, we introduced the early stop by timeout (Sec. III-A); the timeouts were adjusted to appropriate values before the experiments. In the refinement rule composition (Sec. IV-G), we sequentially used the rules local-repair-around-goals (IV-D), focusing-at-goals (IV-C), using-MDD (IV-E), and random (IV-A) with 30 agents. These rules were chosen according to preliminary results. A switch occurs when no improvement is achieved for any agent.

Implementations of AFS [22] and CBSH [32] were obtained from their respective authors; we used them directly. The simulator, including ICBS [30], ECBS [33], HCA* and WHCA* [6], RPP [35], PS [14], and PIBT [16], was developed in C++, and all experiments were run on a laptop with an Intel Core i9 2.3 GHz CPU and 16 GB RAM. It is available at https://kei18.github.io/mapf-IR
As a technical point, to obtain a fast, scalable, and complete sub-optimal solver with acceptable costs, we combined two solvers: PIBT and PS. PIBT, which repeats one-timestep planning for all agents, produces solutions with acceptable costs; however, it is incomplete. PS is complete but only allows one agent to move per timestep, resulting in poor outcomes compared to the optimal. Although we compress solutions from PS while preserving the temporal dependencies of the solution, inspired by techniques in [41], they are still too inefficient. We combine the two as follows. First, run PIBT until timestep max_{a_j ∈ A} dist(π_j[0], g_j), the minimum timestep needed for any solution. If some agents are not at their goals at that timestep, take this configuration as a new initial configuration and obtain the rest of the solution using PS. We call this solver PIBT+. Since most of the planning is computed by PIBT, we can expect much better outcomes than those of PS.

B. with Inefficient Initial Solutions
The first experiment assesses how each rule refines inefficient initial solutions. The initial solver was PIBT+. The refinements were stopped after 90 s in the small fields (random-32-32-20 and arena) and after 10 min in lak503d. The numbers of agents were fixed to 110, 300, and 500, respectively. These durations include the time required by the initial solver. The refinement timeout was 500 ms, except in lak503d.

Fig. 5 shows the average progress of the refinement over the instances. The rules single-agent and local-repair-around-goals reduce costs immediately but soon reach their limits, i.e., no further improvement despite room for refinement. The rule focusing-at-goals dramatically improves solution quality in each case, while the rule using-bottleneck-agent does not work as well as expected. Note that PIBT+ returned solutions within 500 ms even in the worst case (lak503d with 500 agents; see also Table I).
C. with Efficient Initial Solutions
Next, we tested the refinement starting from solutions that are already efficient to some extent, obtained by ECBS or RPP. The settings were otherwise the same as before.

Fig. 6 shows the results, which reveal a limitation of the rule focusing-at-goals in arena and random-32-32-20: it is difficult for this rule to refine already-efficient solutions. Rather, the rules using-MDD and random achieve smaller final costs. In lak503d, we often obtained initial solutions with little room for refinement, and the effect of refinement is subtle (see the y-axis). Even so, several rules still improve the solution quality. Throughout the two experiments so far, the rule composition successfully reduced costs at reasonable speed; we use this rule hereinafter.

D. with Different Initial Solvers
The third experiment evaluates dependencies on the initial solver. We used five initial solvers: PIBT+, HCA*, WHCA*, ECBS, and RPP. The refinement timeout was 500 ms, except in lak503d.

Fig. 4. Used maps with their sizes; |V| is shown in parentheses: random-32-32-20 (819), random-32-32-10 (922), random-64-64-20 (3,270), arena (2,054), lak307d (4,706), lak503d (17,953), brc202d (43,151), ost000a (130,478).

Fig. 5.
The average progress of the refinement with inefficient initial solutions.
The initial solver was PIBT+ (random-32-32-20, 110 agents; arena, 300 agents; lak503d, 500 agents). Compared rules: local repair, single agent, bottleneck, focusing at goals, MDD, random (10), random (30), and composition.

Fig. 6.
The average progress of the refinement with efficient initial solutions. In random-32-32-20 and arena, we used ECBS to obtain the initial solutions; its sub-optimality bounds were adjusted to balance runtime and solution quality. For lak503d, we prepared well-formed instances and used RPP; it is difficult to get such instances in the other settings because they are too dense. In lak503d, the y-axis does not start from one, so that differences between rules are visible; the improvements are tiny.

Fig. 7 shows the average progress. Table I summarizes the details; "cost" is the sum-of-costs divided by the lower bound. We show both initial and final scores. Some solvers failed on some instances, either because they returned failure due to incompleteness or because they failed to obtain solutions before the deadlines (90 s in random-64-64 and lak307d; 10 min in lak503d). "runtime" is when the initial solver returned its solution. All scores are averaged over the instances on which all initial solvers succeeded, except for lak503d, where WHCA* failed in most cases; there, we show the average scores without WHCA*.

The main observation is that, although the initial costs differ widely between the solvers, the final costs do not. This implies that any initial solver can be used, given enough time for refinement. In the following, we use PIBT+ because it instantly returns a feasible solution and thus fits the anytime property well.

E. v.s. Optimal Solutions
According to Proposition 1, the approximation ratio of refined solutions is unbounded. In practice, however, an estimate from empirical data is useful, so we evaluate this. We used two small settings (30 agents in random-32-32-20; 50 agents in random-32-32-10) because optimal solvers often fail to obtain solutions within a reasonable time in large fields or with many agents. Optimal solutions were obtained by CBSH. The refinement ran for a fixed duration, including the time for the initial solver.

Fig. 8 summarizes the results over 50 instances, showing the sum-of-costs divided by the optimal cost. The average runtime of CBSH was 710 ms in the 30-agent setting, while the refinement (PIBT+) obtained initial solutions much faster in all instances. Despite large gaps between the initial and optimal costs, the refinement dramatically reduces the gaps within a short time. Furthermore, most solutions reach the optimal.

TABLE I
THE DETAILED RESULTS WITH DIFFERENT INITIAL SOLVERS

                               PIBT+   HCA*    WHCA*   ECBS    RPP
random-64-64-20  cost (init)   1.219   1.069   1.096   1.037   1.035
(300 agents)     cost (last)   1.015   1.015   1.015   1.014   1.016
lak307d          cost (last)   1.003   1.003   1.003   1.003   1.003
(300 agents)
lak503d          cost (last)   1.019   1.018   -       1.018   1.019
(500 agents)

F. v.s. Other Anytime MAPF Solver
We next compared the proposal with another anytime MAPF solver, AFS, using random-32-32-20 while varying the number of agents. Note that AFS theoretically converges to the optimal eventually, while the proposal may not. The refinement timeout was 100 ms. We ran both algorithms for 30 s.

Fig. 9 shows the results. AFS failed to obtain solutions within the time limit for 2 instances with 90 agents. Clearly, the proposal has an advantage: it obtains initial solutions immediately while AFS does not, and its convergence is fast with better costs.

G. Challenging Scenarios
Finally, we tested the refinement with many agents on huge grids, namely 1500 agents in brc202d and 3000 agents in ost000a. The refinement timeouts were set per map (10 s for ost000a).
Fig. 7. The average progress of the refinement with different initial solvers (random-64-64-20, 300 agents; lak307d, 300 agents; lak503d, 500 agents; initial solvers: PIBT+, WHCA*, HCA*, ECBS, RPP). All instances were well-formed. The averages are over the instances on which all initial solvers succeeded. In lak503d, the scores were calculated excluding WHCA*, because WHCA* failed in most cases. The sub-optimality of ECBS and the window size of WHCA* were adjusted per map to balance success rate, cost, and runtime.

Fig. 8.
The results vs. optimal solutions. We show the suboptimality of 50 instances with initial scores, 0.1 s later, and at 1.0 s. The scores at 1.0 s are hard to recognize because most of them reach the optimal.

[Fig. 9 plots sum-of-costs vs. runtime (sec) on random-32-32-20 with 50, 70, and 90 agents; series: AFS, proposal.]
Fig. 9.
The results vs. another anytime MAPF solver. We omit scores after 10 s because they are almost flat. The improvements of AFS are subtle and hard to recognize.
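The scores in these plots are sum-of-costs values, normalized either by the optimal cost (Fig. 8) or by a lower bound, i.e., the sum of each agent's single-agent shortest-path length ignoring the other agents (Figs. 7 and 10). A small sketch of how such scores can be computed, assuming paths are given as vertex sequences padded with waits at the goal (the helper names are illustrative, not from the paper):

```python
def path_cost(path):
    """Cost of one agent: the time step after which it stays at its goal
    (trailing waits at the goal are not charged, a common MAPF convention)."""
    t = len(path) - 1
    while t > 0 and path[t] == path[t - 1]:
        t -= 1  # drop waits at the goal from the tail
    return t

def sum_of_costs(paths):
    """Sum-of-costs objective over all agents."""
    return sum(path_cost(p) for p in paths)

def suboptimality(paths, shortest_lengths):
    """cost / lower bound, where the bound sums each agent's
    single-agent shortest-path length (interactions ignored)."""
    return sum_of_costs(paths) / sum(shortest_lengths)
```

Because the lower bound ignores inter-agent collisions, a ratio of 1.0 certifies optimality, while the converse does not hold: an optimal solution in a congested instance can still have a ratio above 1.0.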
Fig. 10 shows the progress of the instances with one-hour refinement. The initial solutions were obtained for brc202d and 17 s for ost000a on average. The refinement gradually reduces costs; however, it is not particularly fast.

VI. DISCUSSION AND CONCLUSION
This paper presented the iterative refinement of pathfinding for multiple robots. The proposal uses two MAPF solvers as sub-procedures: a sub-optimal solver to obtain an initial solution and an optimal solver to refine the solution. Although the framework does not guarantee finding the optimal solution, the empirical results demonstrate its usefulness, i.e., the framework finds a solution with acceptable costs in a small computation time with high scalability. Furthermore, it is anytime planning, a desired property for real-time systems with severe deadlines.

According to the experimental results, the cost is reduced to near-optimal regardless of the initial solution; however, it is better to start with efficient enough solutions if available, because we can then get better solutions at an early stage. (We preliminarily tried other solvers, including CBSH and AFS, but most of them could not solve any instances. HCA* could sometimes yield solutions, but it requires about in brc202d and 15 min in ost000a. As a result, it is unlikely that efficient enough solutions can be obtained from the beginning in such huge scenarios.) Therefore, a practical anytime MAPF scheme would be the following. First, in parallel, start several initial solvers with different trade-offs between runtime and solution quality (e.g., PIBT+ and RPP). Then, apply refinement to the first solution obtained. If another initial solver later yields a better solution than the refined one, replace the current solution with the new one. In other words, this scheme compensates for the time lag of an efficient initial solver by using a fast but inefficient one.

As future directions, we describe the following two. 1) In the experiments, rule composition, i.e., combining several rules with different features, was successful. The rule composition itself is reasonable; however, since its components were chosen empirically, automatic selection of such rules depending on the situation might be promising. 2) In challenging scenarios (Sec. V-G), the refinement proceeds only gradually. Developing appropriate rules to achieve fast refinement for such scenarios remains open.

MAPF studies are very active, and our proposal can benefit from their developments. In particular, we used ICBS as the refinement solver in our experiments, but many studies enhance CBS, e.g., [31], [42], [43], and there are other promising optimal solvers [38], [44], [45]. The same holds for sub-optimal solvers as the initial solver, e.g., [36], [46]–[48]. With their effective use, we expect the framework to become better than presented here.

[Fig. 10 plots cost / lower bound vs. runtime (min) for brc202d with 1500 agents and ost000a with 3000 agents.]

Fig. 10. The results of challenging scenarios.

ACKNOWLEDGMENT
We are grateful to François Bonnet for his comments on the initial manuscript. We would like to thank Liron Cohen and Jiaoyang Li for sharing with us their implementations of AFS and CBSH, respectively. This work was partly supported by JSPS KAKENHI Grant Number 20J23011. Keisuke Okumura thanks the support of the Yoshida Scholarship Foundation.
REFERENCES

[1] R. Stern, “Multi-agent path finding – an overview,” in Artificial Intelligence – 5th RAAI Summer School, ser. LNCS, vol. 11866. Springer, 2019, pp. 96–115.
[2] P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,” AI Magazine, vol. 29, no. 1, p. 9, 2008.
[3] K. Dresner and P. Stone, “A multiagent approach to autonomous intersection management,” J. Artif. Intell. Res., vol. 31, pp. 591–656, 2008.
[4] R. Morris, C. S. Pasareanu, K. S. Luckow, W. Malik, H. Ma, T. S. Kumar, and S. Koenig, “Planning, scheduling and monitoring for airport surface operations,” in AAAI Workshop: Planning for Hybrid Systems, 2016.
[5] A. Okoso, K. Otaki, and T. Nishi, “Multi-agent path finding with priority for cooperative automated valet parking,” in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), 2019, pp. 2135–2140.
[6] D. Silver, “Cooperative pathfinding,” AIIDE, vol. 1, pp. 117–122, 2005.
[7] J. Yu and S. M. LaValle, “Structure and intractability of optimal multi-robot path planning on graphs,” in Proc. AAAI Conf. on Artificial Intelligence, 2013.
[8] J. Banfi, N. Basilico, and F. Amigoni, “Intractability of time-optimal multirobot path planning on 2D grid graphs with holes,” IEEE Robot. Autom. Lett. (RA-L), vol. 2, no. 4, pp. 1941–1947, 2017.
[9] H. Ma, C. Tovey, G. Sharon, T. S. Kumar, and S. Koenig, “Multi-agent path finding with payload transfers and the package-exchange robot-routing problem,” in Proc. AAAI Conf. on Artificial Intelligence, 2016.
[10] E. Lam and P. Le Bodic, “New valid inequalities in branch-and-cut-and-price for multi-agent path finding,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), vol. 30, 2020, pp. 184–192.
[11] K.-H. C. Wang, A. Botea, et al., “Fast and memory-efficient multi-agent pathfinding,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), 2008, pp. 380–387.
[12] P. Surynek, “A novel approach to path planning for multiple robots in bi-connected graphs,” in Proc. IEEE Intl. Conf. on Robotics and Automation (ICRA), 2009, pp. 3613–3619.
[13] K.-H. C. Wang and A. Botea, “MAPP: a scalable multi-agent path planning algorithm with tractability and completeness guarantees,” J. Artif. Intell. Res., vol. 42, pp. 55–90, 2011.
[14] R. Luna and K. E. Bekris, “Push and swap: Fast cooperative path-finding with completeness guarantees,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2011, pp. 294–300.
[15] B. de Wilde, A. W. ter Mors, and C. Witteveen, “Push and rotate: cooperative multi-agent path planning,” in Proc. Intl. Joint Conf. on Autonomous Agents & Multiagent Systems (AAMAS), 2013, pp. 87–94.
[16] K. Okumura, M. Machida, X. Défago, and Y. Tamura, “Priority inheritance with backtracking for iterative multi-agent path finding,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 535–542.
[17] S. Zilberstein, “Using anytime algorithms in intelligent systems,” AI Magazine, vol. 17, no. 3, pp. 73–83, 1996.
[18] H. Ma, J. Li, T. Kumar, and S. Koenig, “Lifelong multi-agent path finding for online pickup and delivery tasks,” in Proc. Intl. Joint Conf. on Autonomous Agents & Multiagent Systems (AAMAS), 2017, pp. 837–845.
[19] J. Švancara, M. Vlk, R. Stern, D. Atzmon, and R. Barták, “Online multi-agent pathfinding,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 33, 2019, pp. 7732–7739.
[20] T. Standley and R. Korf, “Complete algorithms for cooperative pathfinding problems,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2011, pp. 668–673.
[21] T. Standley, “Finding optimal solutions to cooperative pathfinding problems,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 24, no. 1, 2010.
[22] L. Cohen, M. Greco, H. Ma, C. Hernández, A. Felner, T. S. Kumar, and S. Koenig, “Anytime focal search with applications,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2018, pp. 1434–1441.
[23] K. Vedder and J. Biswas, “X*: Anytime multi-agent path finding for sparse domains using window-based iterative repairs,” Artif. Intell., vol. 291, p. 103417, 2021.
[24] P. Surynek, “Redundancy elimination in highly parallel solutions of motion coordination problems,” Int. J. on Artif. Intell. Tools, vol. 22, no. 05, p. 1360002, 2013.
[25] R. K. Ahuja, Ö. Ergun, J. B. Orlin, and A. P. Punnen, “A survey of very large-scale neighborhood search techniques,” Discrete Applied Mathematics, vol. 123, no. 1–3, pp. 75–102, 2002.
[26] T. Balyo, R. Barták, and P. Surynek, “Shortening plans by local re-planning,” in Proc. IEEE Intl. Conf. on Tools with Artificial Intelligence (ICTAI), 2012, pp. 1022–1028.
[27] R. Stern, N. Sturtevant, A. Felner, S. Koenig, H. Ma, T. Walker, J. Li, D. Atzmon, L. Cohen, T. Kumar, et al., “Multi-agent pathfinding: Definitions, variants, and benchmarks,” in Proc. Intl. Symp. on Combinatorial Search (SoCS), 2019, pp. 151–159.
[28] A. Felner, R. Stern, S. E. Shimony, E. Boyarski, M. Goldenberg, G. Sharon, N. Sturtevant, G. Wagner, and P. Surynek, “Search-based optimal solvers for the multi-agent pathfinding problem: Summary and challenges,” in Proc. Intl. Symp. on Combinatorial Search (SoCS), 2017.
[29] G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant, “Conflict-based search for optimal multi-agent pathfinding,” Artif. Intell., vol. 219, pp. 40–66, 2015.
[30] E. Boyarski, A. Felner, R. Stern, G. Sharon, D. Tolpin, O. Betzalel, and E. Shimony, “ICBS: Improved conflict-based search algorithm for multi-agent pathfinding,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2015.
[31] A. Felner, J. Li, E. Boyarski, H. Ma, L. Cohen, T. S. Kumar, and S. Koenig, “Adding heuristics to conflict-based search for multi-agent path finding,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), 2018.
[32] J. Li, A. Felner, E. Boyarski, H. Ma, and S. Koenig, “Improved heuristics for multi-agent path finding with conflict-based search,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 442–449.
[33] M. Barer, G. Sharon, R. Stern, and A. Felner, “Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem,” in Proc. Intl. Symp. on Combinatorial Search (SoCS), 2014.
[34] J. Pearl and J. H. Kim, “Studies in semi-admissible heuristics,” IEEE Trans. Pattern Anal. Mach. Intell., no. 4, pp. 392–399, 1982.
[35] M. Čáp, P. Novák, A. Kleiner, and M. Selecký, “Prioritized planning algorithms for trajectory coordination of multiple mobile robots,” IEEE Trans. Autom. Sci. Eng., vol. 12, no. 3, pp. 835–849, 2015.
[36] H. Ma, D. Harabor, P. J. Stuckey, J. Li, and S. Koenig, “Searching with consistent prioritization for multi-agent path finding,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 33, 2019, pp. 7643–7650.
[37] A. Srinivasan, T. Ham, S. Malik, and R. K. Brayton, “Algorithms for discrete function manipulation,” in IEEE Intl. Conf. on Computer-Aided Design (ICCAD), 1990, pp. 92–95.
[38] G. Sharon, R. Stern, M. Goldenberg, and A. Felner, “The increasing cost tree search for optimal multi-agent pathfinding,” Artif. Intell., vol. 195, pp. 470–495, 2013.
[39] O. Amir, G. Sharon, and R. Stern, “Multi-agent pathfinding as a combinatorial auction,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 29, no. 1, 2015.
[40] N. R. Sturtevant, “Benchmarks for grid-based pathfinding,” IEEE Trans. on Computational Intelligence and AI in Games, vol. 4, no. 2, pp. 144–148, 2012.
[41] W. Hönig, T. S. Kumar, L. Cohen, H. Ma, H. Xu, N. Ayanian, and S. Koenig, “Multi-agent path finding with kinematic constraints,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), 2016, pp. 477–485.
[42] J. Li, D. Harabor, P. J. Stuckey, H. Ma, and S. Koenig, “Symmetry-breaking constraints for grid-based multi-agent path finding,” in Proc. AAAI Conf. on Artificial Intelligence, vol. 33, 2019, pp. 6087–6095.
[43] H. Zhang, J. Li, P. Surynek, S. Koenig, and T. S. Kumar, “Multi-agent path finding with mutex propagation,” in Proc. Intl. Conf. on Automated Planning and Scheduling (ICAPS), vol. 30, 2020, pp. 323–332.
[44] G. Wagner and H. Choset, “Subdimensional expansion for multirobot path planning,” Artif. Intell., vol. 219, pp. 1–24, 2015.
[45] E. Lam, P. Le Bodic, D. D. Harabor, and P. J. Stuckey, “Branch-and-cut-and-price for multi-agent pathfinding,” in Proc. Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 1289–1296.
[46] K. Okumura, Y. Tamura, and X. Défago, “winPIBT: Extended prioritized algorithm for iterative multi-agent path finding,” arXiv preprint arXiv:1905.10149, 2019.
[47] S. D. Han and J. Yu, “DDM: Fast near-optimal multi-robot path planning using diversified-path and optimal sub-problem solution database heuristics,” IEEE Robot. Autom. Lett. (RA-L), vol. 5, no. 2, pp. 1350–1357, 2020.
[48] J. Li, W. Ruml, and S. Koenig, “EECBS: A bounded-suboptimal search for multi-agent path finding,” in