A Combination of Theta*, ORCA and Push and Rotate for Multi-agent Navigation
Stepan Dergachev, Konstantin Yakovlev, and Ryhor Prakapovich

National Research University Higher School of Economics, Moscow, Russia
Federal Research Center for Computer Science and Control of Russian Academy of Sciences, Moscow, Russia
United Institute of Informatics Problems of the National Academy of Sciences of Belarus, Minsk, Belarus
[email protected], [email protected], [email protected]
Abstract.
We study the problem of multi-agent navigation in static environments when no centralized controller is present. Each agent is controlled individually and relies on three algorithmic components to achieve its goal while avoiding collisions with the other agents and the obstacles: i) individual path planning, which is done by the Theta* algorithm; ii) collision avoidance while path following, which is performed by the ORCA* algorithm; iii) locally-confined multi-agent path planning, done by the Push and Rotate algorithm. The latter component is crucial for avoiding deadlocks in confined areas, such as narrow passages or doors. We describe how the suggested components interact and form a coherent navigation pipeline. We carry out an extensive empirical evaluation of this pipeline in simulation. The obtained results clearly demonstrate that the number of occurring deadlocks significantly decreases, enabling more agents to reach their goals compared to techniques that rely on collision avoidance only and do not include a multi-agent path planning component.
Keywords:
Multi-agent Systems · Path Planning · Collision Avoidance · Multi-agent Path Finding · Navigation · ORCA · Push and Rotate
This is a preprint of the paper accepted to ICR'20.

1 Introduction

When a group of mobile agents, such as service robots or video game characters, operates in a shared environment, one of the key challenges they face is arriving at their target locations while avoiding collisions with the obstacles and each other. In general, two approaches to solving this problem, i.e. multi-agent navigation, can be identified: centralized and decentralized.

The centralized approaches rely on a central planner that constructs a joint multi-agent plan utilizing the full knowledge of the agents' states. When the plan is constructed, each agent follows it, and if the execution is perfect or near-perfect, collisions are guaranteed not to happen. Centralized planning can be carried out in such a way that the joint plan is guaranteed to be optimal (typically, optimal centralized planners rely on a discretization of the workspace, thus the solutions they provide are optimal w.r.t. the used space/time discretization). Non-optimal yet computationally efficient centralized planners exist as well.

The decentralized approaches do not rely on a central planner. Instead, each agent plans its trajectory individually and resolves conflicts at the execution stage, i.e. an agent reactively avoids collisions relying on local observations and/or communication. Such an approach naturally fits numerous scenarios with heterogeneous agents, limited communication/visibility, etc. It also scales well to a large number of agents; however, it does not guarantee optimality. Moreover, no guarantees that each agent will reach its goal can be provided. The main reason is that each agent is egoistic in pursuing its goal, and this can lead to a deadlock. Consider the scenario shown in Fig. 1, where agents have to swap sides using a tight passage connecting the rooms. If they approach the passage at nearly the same time, they will be stuck. For all of them to move through safely, at least one of the agents has to step back and let the others go. Thus, some form of cooperative behaviour (which is explicitly or implicitly centralized) is vital for accomplishing the mission.

Fig. 1: An example of a possible deadlock in the vicinity of a narrow passage.

In this work we present an algorithmic pipeline that combines decentralized and centralized approaches to multi-agent navigation. Within this pipeline each agent constructs and follows its path individually. When a pattern leading to a potential deadlock is detected, a locally-confined multi-agent path finding is invoked, i.e. the agents that appear to be involved in a deadlock are identified and a joint safe plan for these agents is constructed. After executing this plan all involved agents switch back to the decentralized mode. We use Theta* [6], ORCA* [16], and Push and Rotate [7] for individual path planning, local collision avoidance, and multi-agent path finding respectively. Meanwhile, the suggested pipeline is not limited to these algorithms, and other algorithmic components may be used instead of the aforementioned methods. We extensively study the suggested approach in simulation and show that it leads to much better results compared to a fully decentralized method that does not utilize multi-agent path finding.
2 Related Work

The approaches to multi-agent navigation can be roughly divided into two groups: centralized and decentralized [18]. The centralized approaches assume that a central planner is available that possesses all the information about the agents and the environment and plans for a joint collision-free solution. Naive application of search algorithms such as A* [10], i.e. planning in the full joint configuration space, which is the Cartesian product of the individual configuration spaces, is impractical, as the search space grows exponentially in the number of agents. However, there exist techniques that greatly reduce the search effort, such as subdimensional expansion [17] and conflict-based search [12], to name a few. The latter technique is very popular nowadays, and there exists a large variety of CBS planners that improve the performance of the original algorithm or extend it in numerous ways: e.g., in [3] a sub-optimal version of CBS is presented, in [2] planning in continuous time is considered, etc. All these algorithms are designed with the aim of minimizing the cost of the resultant solution. At the same time, when a large solution cost is acceptable, one can use extremely fast polynomial algorithms, e.g. Push and Rotate [7]. Finally, a prioritized approach [4, 19] is also a common choice in numerous cases. Prioritized planners are fast and typically provide solutions of reasonably low cost (very close to optimal in many cases). Their main drawback is that they are incomplete in general.

The decentralized approaches do not rely on full knowledge of the agents' states and the environment and assume that each agent utilizes only the information available to it locally [9, 18]. One of the common ways to solve the multi-agent navigation problem in a decentralized setting is to plan individual paths for each agent, e.g. by a heuristic search algorithm such as Dijkstra's [8], A* [10] or Theta* [6], and then let all agents follow their paths. To avoid collisions, algorithms of the ORCA family are typically used [1, 13, 14, 16]. Other techniques, relying on buffered Voronoi cells [21] or various reinforcement learning methods [5, 11], can also be employed. In general, decentralized approaches are prone to deadlocks and do not guarantee that each agent will reach its destination.

In this work we combine both decentralized and centralized methods into a coherent multi-agent navigation pipeline, striving to combine the best of two worlds: the flexibility and scalability of decentralized navigation and the completeness of centralized planning.
3 Problem Statement

Consider a set of n agents moving through a 2D workspace W ⊂ R², composed of the free space and the obstacles: W = W_free ∪ O. Each obstacle o ∈ O is a bounded subset of W. Every agent i is modelled as a disk of radius r_i. The state of an agent is defined by its position and velocity: (p_i, V_i). The latter is bounded: ||V_i|| ≤ V_i^max ∈ R⁺, i.e. the maximum speed of each agent is given.

Let T = {0, ∆t, 2∆t, ...} be the discrete time (∆t = const). For the sake of simplicity assume that ∆t = 1, thus the timeline is T = {0, 1, 2, ...}. At each moment of time an agent can either move or wait (the latter can be considered as moving with zero velocity). We neglect inertial effects and assume that an agent moves from one location to another following a straight line defined by its current velocity. We also assume perfect localization and execution.

At each time step an agent knows i) its own position, ii) the positions of all the obstacles, iii) the states of the neighboring agents. The latter are the agents located within a radius R_i. This radius models the communication range of an agent, i.e. we assume that an agent i can communicate with the neighboring agents and acquire any information from them. For the sake of exposition we will refer to this range as the visibility range further on.

A spatio-temporal path for an agent is, formally, a mapping π : T → W_free. It can also be represented as a sequence of the agent's locations at each time step: π = {π_0, π_1, ...}. In this work we are interested in converging paths, i.e. paths by which an agent reaches a particular location and never moves away from it. This can be formulated as follows: ∃k ∈ T : π_{k+t} = π_k ∀t > 0.

A path π^i is valid w.r.t. the velocity constraints and the static obstacles iff the following conditions hold:

||π_{t+1} − π_t|| ≤ V_i^max · ∆t, ∀t ∈ [0, k−1],
ρ(π_t, o) > r_i, ∀o ∈ O, ∀t ∈ [0, k],

where ρ(π_t, o) stands for the distance to the closest point of the obstacle o.

Consider now two paths of distinct agents, π^i and π^j. They are called conflict-free if they are valid and the agents following them never collide, that is: ||π^i_t − π^j_t|| > r_i + r_j, ∀t ∈ [0, max(k_i, k_j)] (the shorter path is extended by repeating its final position). A set of paths Π = {π^1, ..., π^n}, one for each agent, is conflict-free if any pair of paths forming this set is conflict-free.

The problem we are considering in this paper can now be formulated as follows. Given n start and goal locations, at each time step choose a velocity for each agent s.t. each agent reaches its goal at some time step and never collides with the static obstacles or the other agents, i.e. the resultant set of paths Π is conflict-free.

4 Method

Our approach relies on three algorithmic components: individual planning, path following with collision avoidance, and locally-confined multi-agent path finding in case of potential deadlocks. We will now describe all these components.
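To make the conflict-free condition concrete, here is a minimal Python sketch of the same-time pairwise check (function names are ours; the paper only gives the mathematical definition). The shorter path is padded with its final position, since a converged agent stays at its goal:

```python
import math

def conflict_free(path_i, path_j, r_i, r_j):
    """Same-time collision check for two time-discretized paths.

    Paths are lists of (x, y) positions, one per time step; a shorter
    path is padded with its final position, since a converged agent
    stays at its goal. Returns False iff the disks ever touch/overlap.
    """
    horizon = max(len(path_i), len(path_j))
    for t in range(horizon):
        p = path_i[min(t, len(path_i) - 1)]
        q = path_j[min(t, len(path_j) - 1)]
        if math.dist(p, q) <= r_i + r_j:
            return False  # distance does not exceed r_i + r_j: conflict
    return True
```

A full set of paths is then conflict-free iff `conflict_free` holds for every unordered pair of agents.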
Fig. 2: Finding a path for an individual agent: (a) the workspace of an agent; (b) tessellation of the workspace into a grid and two paths on that grid: the one found by A* (in orange) and the one found by Theta* (in blue). We use Theta* paths in this work.
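The gap between the A* and Theta* paths in Fig. 2 comes from Theta*'s line-of-sight test between grid cells. A minimal Bresenham-style sketch of such a test (our own illustration under simplifying assumptions, not the authors' code, which walks every cell touched by the segment):

```python
def line_of_sight(blocked, x0, y0, x1, y1):
    """Bresenham-style visibility test between two grid cells.

    `blocked` is a 2D list of booleans (True = obstacle cell),
    indexed as blocked[y][x]. Returns True iff the rasterized
    segment from (x0, y0) to (x1, y1) crosses no blocked cell.
    """
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 > x0 else -1
    sy = 1 if y1 > y0 else -1
    err = dx - dy
    x, y = x0, y0
    while True:
        if blocked[y][x]:
            return False
        if (x, y) == (x1, y1):
            return True
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy
```

Theta* calls such a test whenever it tries to link a successor cell directly to its parent's parent, producing the any-angle shortcuts.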
4.1 Individual Path Planning

To find a valid path for an agent that avoids the static obstacles, we reduce path planning to graph search. We employ regular square grids [20] as the discretized workspace representation, as they are easy-to-construct and informative graph models widely used for navigation purposes. To create a grid, we tessellate the workspace into square cells and mark each cell either blocked or traversable, depending on whether this cell overlaps with any of the obstacles or not (see Fig. 2 on the right). To find a path on the grid we utilize the Theta* [6] algorithm. It is a heuristic search algorithm of the A* family which augments the edge set of the graph on the fly by allowing any-angle moves, i.e. moves between non-adjacent vertices, conditioned that such moves do not lead to collisions with the obstacles. As a result, paths found by Theta* are typically shorter than the ones found by A* and contain fewer waypoints. The difference between these two types of paths is clearly visible in Fig. 2: both paths are collision-free, while the A* path contains 7 intermediate waypoints and the Theta* path only 2.

4.2 Path Following with Collision Avoidance

Having the individual paths for all agents, the considered multi-agent navigation problem can be naively solved as follows. At each time step an agent chooses a velocity that is directed towards the next waypoint of its path and is maximal in magnitude. In other words, agents move along the constructed paths as fast as they can, without any temporal waits or spatial deviations. Obviously, such an approach leads to numerous collisions in practice. Instead, we rely on a more advanced method called Optimal Reciprocal Collision Avoidance (
ORCA) [16]. This method is designed to move an agent safely towards its local goal (the current waypoint of its path) by reactively adjusting the velocity profile at each time step. When choosing a velocity, ORCA relies on local observations: only the other moving agents located within the visibility range and the static obstacles are taken into account. We tie together individual path planning and path following with collision avoidance in the following fashion.

The waypoints that comprise an individual path (the one found by Theta*) form a sequence of local goals for ORCA, and an agent always moves towards its current local goal until the latter is reached. When this happens, the next waypoint of the path becomes the local goal and ORCA is re-invoked. It is noteworthy that when an agent moves to the current goal it may significantly deviate from the original path's segment (due to other moving agents forcing ORCA to do so). Thus, it may appear that the local goal can no longer be reached via a straight-line segment, which is a crucial condition for ORCA to operate appropriately. We detect such patterns and trigger re-planning from the current agent's position to the local goal with Theta*. New waypoints are added to the sequence of local goals as a result. The condition that there is a visible connection between any pair of consecutive waypoints now holds, and ORCA continues. An illustration of the overall process is presented in Fig. 3.

Fig. 3: Goal-setting for ORCA. (a) Waypoint A is the local goal of agent 1. (b) Agent 1 deviates from the original trajectory in order to avoid a collision with agent 2; the line-of-sight to the local goal is lost, re-planning is triggered, and a new waypoint, C, is added to the path; it becomes the current local goal. (c) C is reached, the next local goal is A. (d) The agent safely moves towards A without any deviation, as no other moving agents interfere with its plan.
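The local-goal bookkeeping described above can be sketched as follows (a hedged illustration: `visible` and `replan` stand in for the line-of-sight test and the Theta* call, and all names are ours, not the authors' API):

```python
import math

def update_local_goal(pos, waypoints, idx, visible, replan, eps=0.1):
    """Advance or repair the sequence of local goals fed to ORCA.

    pos       -- current agent position (x, y)
    waypoints -- list of Theta* waypoints (the local goals)
    idx       -- index of the current local goal
    visible   -- visible(p, q): straight-line test vs. static obstacles
    replan    -- replan(p, q): Theta* call returning patch waypoints
    Returns the possibly updated waypoint list and goal index.
    """
    goal = waypoints[idx]
    # Local goal reached: the next waypoint of the path becomes the goal.
    if math.dist(pos, goal) < eps and idx + 1 < len(waypoints):
        return waypoints, idx + 1
    # Collision avoidance pushed the agent off its segment and the goal
    # is no longer visible: splice in fresh waypoints from re-planning.
    if not visible(pos, goal):
        waypoints = waypoints[:idx] + replan(pos, goal) + waypoints[idx:]
    return waypoints, idx
```

After the splice, `waypoints[idx]` is the first re-planned waypoint (waypoint C in Fig. 3), so the agent heads there next while the original goal stays later in the sequence.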
4.3 Locally-Confined Multi-Agent Path Finding

The path planning and collision avoidance mechanisms described so far can be attributed as non-cooperative, i.e. each agent pursues its own goal and does not take the goals of the other agents into account. In general this can lead to deadlocks. A typical scenario in which such a deadlock occurs is depicted in Fig. 4 on the left: all four agents have to come through the tight passage at nearly the same time. In such a case ORCA is forced to set the velocity of each agent to zero to avoid collisions. Thus, no agent is moving and a deadlock occurs. To avoid this, we include a multi-agent path finding (MAPF) algorithm in the navigation pipeline with the aim of building a set of locally coordinated collision-free paths.

Conceptually, when an agent detects a pattern that leads to a potential deadlock, it switches to a coordinated path planning mode and forces its neighboring agents to do so. Upon entering this mode, the involved agents create a joint conflict-free plan and execute it. Afterwards they switch back to the independent execution mode.

Fig. 4: Switching to the coordinated mode. (a) Agent 1 detects its local goal, A, and 3 other agents within its visibility zone. (b) Based on the current positions and individual paths of agents 1-4, a local grid map for multi-agent path finding is identified, as well as start and goal locations for each agent.

Switching to the coordinated mode is triggered if both the local goal A and k other agents appear in the visibility range of some agent (here k is a predefined threshold set up by the user). The latter forms a list of coordinated agents that includes: itself, its neighboring agents, and their neighboring agents as well. All these agents now share the information about their current states (positions and velocities) and individual paths with each other. Thus each agent constructs the same information model and uses it for operation, so no explicit centralized controller is introduced.

Based on the current positions of the agents, the boundaries of the area in which the joint collision-free plan will be built are estimated. This is done as follows. The minimum and maximum x- and y-positions across all agents are identified. These four coordinates define a square which is inflated by the visibility radius of the agent that initiated the switch to the coordinated mode. This square is translated to the grid which was originally used for individual path planning. The resulting grid now becomes a local map for multi-agent path finding (it is depicted in Fig. 4b in grey). It might appear that two or more coordinated planning areas are constructed separately by different groups of agents close to each other.
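A minimal sketch of this area construction (helper names are ours; for simplicity we keep the inflated bounds as an axis-aligned box of cell indices, and also sketch the boundary clamping used when a goal waypoint falls outside the area):

```python
import math

def planning_area(positions, visibility_radius, cell_size=1.0):
    """Local MAPF area: bounding box of the coordinated agents'
    positions, inflated by the visibility radius of the initiating
    agent, returned as inclusive cell bounds (x_lo, y_lo, x_hi, y_hi)."""
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    x_lo, x_hi = min(xs) - visibility_radius, max(xs) + visibility_radius
    y_lo, y_hi = min(ys) - visibility_radius, max(ys) + visibility_radius
    to_cell = lambda v: int(math.floor(v / cell_size))
    return to_cell(x_lo), to_cell(y_lo), to_cell(x_hi), to_cell(y_hi)

def clamp_to_area(cell, area):
    """Project a waypoint's cell into the area by replacing any
    out-of-range coordinate with the nearest boundary value."""
    x_lo, y_lo, x_hi, y_hi = area
    x, y = cell
    return min(max(x, x_lo), x_hi), min(max(y, y_lo), y_hi)
```

The clamped cell is what a coordinated agent would use as its MAPF goal when its next waypoint lies outside the planning area.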
In this case those areas are combined into a single one and all the agents become members of a single coordinated group.

After the grid map for multi-agent path finding is constructed, each agent chooses its MAPF start and MAPF goal on that grid. The start cell is the one closest to the current location of the agent. If it coincides with the start of another agent or is blocked by an obstacle, a breadth-first search over the grid is invoked from this cell to find a close unblocked position, which becomes the start. Goal locations are identified in a similar way. First, the current local goal (path's waypoint) is identified. If it coincides with A, the next waypoint is chosen. If the selected waypoint lies inside the planning area, the cell which contains it becomes the MAPF goal. If the waypoint is outside the area, i.e. at least its x- or y-coordinate is greater/less than the corresponding maximum/minimum value of the planning area, then this coordinate is replaced by the maximum or minimum value of the corresponding coordinate.

After the start and goal positions of the agents are fixed, an appropriate MAPF solver is invoked to obtain a solution, i.e. a set of sequences of moves between the grid cells and, possibly, wait actions, one sequence per agent. For the sake of simplicity we assume that all MAPF actions, i.e. move and wait, are of uniform duration. This duration is chosen in such a way that the constraints on the agents' maximum moving speed are not violated. In this work we utilize Push and Rotate [7] as the multi-agent path planning algorithm, as it is very fast and scales well to a large number of agents. We did not choose an optimal solver, such as CBS [12], because of its high computational budget. Using a prioritized planner is also not an option, as prioritized MAPF solvers do not guarantee completeness.

When the MAPF problem is solved, each agent starts moving towards its chosen MAPF start position on the grid (with ORCA collision avoidance activated). When all agents are at their start positions, synchronous execution of the plan begins. At this stage no collisions are guaranteed to happen (assuming perfect execution), so ORCA is not used. When all agents reach their MAPF goals, they switch back to the individual mode, i.e. continue path following to their next waypoints on the global paths with ORCA as described in the previous section.

In case some agent i, which is not involved in the plan execution, gets inside the boundaries of the planning area and interferes with the coordinated agents, i.e. this agent appears in any coordinated agent's visibility range, the execution of the coordinated plan is stopped, agent i is added to the list of coordinated agents, and the plan is recomputed.

Similarly, if an agent i, which is involved in the plan execution, gets inside the visibility zone of an agent j, which is also involved in the execution of a plan but belongs to another group of coordinated agents, then these two coordinated groups are merged and re-planning is triggered.

5 Experimental Evaluation

The proposed method was implemented in C++ and its experimental evaluation was carried out on a laptop running macOS 10.14.6 based on an Intel Core i5-8259U (2.3 GHz) CPU with 16 GB of RAM. (Implementation: https://github.com/PathPlanning/ORCA-algorithm/tree/Deadlocks)

We used two types of grid maps, i.e. maps composed of free/blocked cells, for the experiments (see Fig. 5). First, we generated maps comprised of two open areas (rooms) separated by a wall with tight passages (doors) in it. The number of passages varied from 1 to 4, thus 4 different maps of that type were generated in total. The overall size of each map was 64 × 64. We
refer to these maps as the gaps maps. Second, we took two maps from the MovingAI [15] benchmark. They represent indoor environments composed of numerous rooms connected by passages. The first map of that type has a size of 64 × 64 and is comprised of 16 rooms (9 of them of size 15 × 15, 1 of size 14 × 14, and the rest of size 14 × 15 or 15 × 14). The second map has a size of 32 × 32 and is composed of 64 rooms of size 3 × 3. The latter two maps, referred to as the rooms maps further on, are depicted in Fig. 5b-c.

Fig. 5: Maps used for experimental evaluation. (a) A map comprised of two rooms connected by narrow passages. (b) and (c) Maps of indoor environments composed of numerous rooms connected by narrow passages.

Start/goal locations on each map were generated as follows. For the gaps maps we always generated half of the start/goal locations in the left open area and the other half in the right one. So, to accomplish the mission, the agents have to navigate between the two areas via the passages in the wall. For the rooms maps we picked start/goal locations randomly. For each map we created 250 different scenarios. Each scenario is a list of 40 start/goal locations for the agents. While testing, the scenarios were used in the following fashion. First, we invoked the algorithm on the first 5 start/goal pairs of a scenario; then we increased this number by 5 and invoked the algorithm again, and so on, until the number of agents reached 40. At this point the scenario was considered processed and we moved to the next one.

We compared the suggested approach, referred to as ORCA*+P&R, to the one that does not utilize deadlock detection and multi-agent path finding, referred to as ORCA*. For both versions we set the radius of each agent to 0.3 of the cell size. We also introduced an additional 0.19 safe-buffer for individual path finding and collision avoidance. For
ORCA*+P&R, the number of neighbours required for an agent to switch to the coordinated mode was set to k = 3.

At each run the maximum number of simulation steps was limited to 12 800. If by that time the agents failed to reach their goals, i.e. at least one agent was not at its target position or at least one agent collided with another agent or a static obstacle, the run was considered a failure. If all agents managed to safely reach their goals, the run was acknowledged a success. The main metric we were interested in was the success rate, i.e. the percentage of tasks that result in success out of all tasks attempted.

Fig. 6: Success rate for the gaps maps. Each panel (1, 2, 3, and 4 gaps) plots the percentage of successfully completed tasks for ORCA* and ORCA*+P&R as the number of agents grows from 5 to 40.

Fig. 7: Success rate for the rooms maps (the 64x64 map with 16 rooms and the 32x32 map with 64 rooms).

Success rates for
ORCA*+P&R and
ORCA* are shown in Fig. 6 and 7. As one can see, the selected environments are very challenging for ORCA*, and its success rate is very low for large numbers of agents on all the maps. The most challenging environments are the one with 1 passage among the gaps maps and the one with 64 rooms among the rooms maps. As expected, on all maps ORCA*+P&R successfully solved profoundly more instances compared to the baseline. For example, for the challenging gaps map with 1 passage the success rate of ORCA*+P&R was 95% for 20 agents, while the one of ORCA* was 0%. The difference in success rate for the rooms maps is also clearly visible but less pronounced. We hypothesise that the main reason for that, as well as the reason for ORCA*+P&R not achieving 100% success in general, is the imposed time limit. The plans created by the Push and Rotate algorithm may have contained numerous actions, and that prevented the agents from reaching their goals before the time-step limit was exhausted. One possible way to avoid this is substituting Push and Rotate with another MAPF solver that creates plans containing fewer actions. However, solvers that explicitly aim at minimizing the duration of the plan have a much higher computational budget, so finding the right substitute is a challenging task for future research.

Overall, the results of the experiments clearly show that adding locally-confined multi-agent path finding into the navigation pipeline significantly increases the chance that all agents reach their goal locations and the mission, thus, is accomplished.
6 Conclusion

In this paper we have studied the multi-agent navigation problem and suggested a decentralized approach that supplements individual path planning and collision avoidance with deadlock detection and multi-agent path finding. We implemented the proposed navigation pipeline and compared it to the baseline, showing that adding the aforementioned components significantly increases the chances of agents safely arriving at their target destinations, even in congested environments with tight passages. An appealing direction of future work is the advancement of the deadlock detection procedure by making it less ad hoc, as well as experimenting with different multi-agent path finding algorithms. Another direction for future research is evaluating the suggested approach on real robots.
Acknowledgements

The reported study was funded by RFBR and BRFBR, project number 20-57-00011. We would also like to thank Ilya Ivanashev for his implementation of Push and Rotate.

References
1. Alonso-Mora, J., Breitenmoser, A., Rufli, M., Beardsley, P., Siegwart, R.: Optimal reciprocal collision avoidance for multiple non-holonomic robots. In: Distributed Autonomous Robotic Systems, pp. 203–216. Springer (2013)
2. Andreychuk, A., Yakovlev, K., Atzmon, D., Stern, R.: Multi-agent pathfinding with continuous time. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019), pp. 39–45 (2019)
3. Barer, M., Sharon, G., Stern, R., Felner, A.: Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In: Proceedings of the 7th International Symposium on Combinatorial Search (SoCS 2014), vol. 263, pp. 961–962 (2014)
4. Čáp, M., Novák, P., Kleiner, A., Selecký, M.: Prioritized planning algorithms for trajectory coordination of multiple mobile robots. IEEE Transactions on Automation Science and Engineering (3), 835–849 (2015)
5. Chen, Y.F., Liu, M., Everett, M., How, J.P.: Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA 2017), pp. 285–292 (2017)
6. Daniel, K., Nash, A., Koenig, S., Felner, A.: Theta*: Any-angle path planning on grids. Journal of Artificial Intelligence Research, 533–579 (2010)
7. De Wilde, B., Ter Mors, A.W., Witteveen, C.: Push and Rotate: a complete multi-agent pathfinding algorithm. Journal of Artificial Intelligence Research, 443–492 (2014)
8. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik (1), 269–271 (1959)
9. Dimarogonas, D.V., Kyriakopoulos, K.J.: Decentralized navigation functions for multiple robotic agents with limited sensing capabilities. Journal of Intelligent and Robotic Systems (3), 411–433 (2007)
10. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics (2), 100–107 (1968)
11. Long, P., Fan, T., Liao, X., Liu, W., Zhang, H., Pan, J.: Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA 2018), pp. 6252–6259 (2018)
12. Sharon, G., Stern, R., Felner, A., Sturtevant, N.R.: Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 40–66 (2015)
13. Snape, J., Guy, S.J., Van Den Berg, J., Manocha, D.: Smooth coordination and navigation for multiple differential-drive robots. In: Experimental Robotics, pp. 601–613 (2014)
14. Snape, J., Van Den Berg, J., Guy, S.J., Manocha, D.: Smooth and collision-free navigation for multiple robots under differential-drive constraints. In: Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), pp. 4584–4589 (2010)
15. Stern, R., Sturtevant, N.R., Felner, A., Koenig, S., Ma, H., Walker, T.T., Li, J., Atzmon, D., Cohen, L., Kumar, T.S., et al.: Multi-agent pathfinding: Definitions, variants, and benchmarks. In: Proceedings of the 12th International Symposium on Combinatorial Search (SoCS 2019), pp. 151–158 (2019)
16. Van Den Berg, J., Guy, S.J., Lin, M., Manocha, D.: Reciprocal n-body collision avoidance. In: Robotics Research, pp. 3–19. Springer (2011)
17. Wagner, G., Choset, H.: Subdimensional expansion for multirobot path planning. Artificial Intelligence, 1–24 (2015)
18. Xuan, P., Lesser, V.: Multi-agent policies: from centralized ones to decentralized ones. In: Proceedings of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2002), part 3, pp. 1098–1105 (2002)
19. Yakovlev, K., Andreychuk, A., Vorobyev, V.: Prioritized multi-agent path finding for differential drive robots. In: Proceedings of the 2019 European Conference on Mobile Robots (ECMR 2019), pp. 1–6 (2019)
20. Yap, P.: Grid-based path-finding. In: Proceedings of the 15th Conference of the Canadian Society for Computational Studies of Intelligence (AI 2002), pp. 44–55 (2002)
21. Zhou, D., Wang, Z., Bandyopadhyay, S., Schwager, M.: Fast, on-line collision avoidance for dynamic vehicles using buffered Voronoi cells. IEEE Robotics and Automation Letters