MAPFAST: A Deep Algorithm Selector for Multi Agent Path Finding using Shortest Path Embeddings
Jingyao Ren, Vikraman Sathiyanarayanan, Eric Ewing, Baskin Senbaslar, Nora Ayanian
Jingyao Ren, University of Southern California, Los Angeles, [email protected]
Vikraman Sathiyanarayanan, University of Southern California, Los Angeles, [email protected]
Eric Ewing, University of Southern California, Los Angeles, [email protected]
Baskin Senbaslar, University of Southern California, Los Angeles, [email protected]
Nora Ayanian, University of Southern California, Los Angeles, [email protected]
ABSTRACT
Solving the Multi-Agent Path Finding (MAPF) problem optimally is known to be NP-Hard for both make-span and total arrival time minimization. While many algorithms have been developed to solve MAPF problems, there is no dominating optimal MAPF algorithm that works well in all types of problems and no standard guidelines for when to use which algorithm. In this work, we develop the deep convolutional network MAPFAST (Multi-Agent Path Finding Algorithm SelecTor), which takes a MAPF problem instance and attempts to select the fastest algorithm to use from a portfolio of algorithms. We improve the performance of our model by including single-agent shortest paths in the instance embedding given to our model and by utilizing supplemental loss functions in addition to a classification loss. We evaluate our model on a large and diverse dataset of MAPF instances, showing that it outperforms all individual algorithms in its portfolio as well as the state-of-the-art optimal MAPF algorithm selector. We also provide an analysis of algorithm behavior in our dataset to gain a deeper understanding of optimal MAPF algorithms' strengths and weaknesses to help other researchers leverage different heuristics in algorithm designs.
KEYWORDS
Multi-Agent Path Finding; Algorithm Selection; Deep Learning
ACM Reference Format:
Jingyao Ren, Vikraman Sathiyanarayanan, Eric Ewing, Baskin Senbaslar, and Nora Ayanian. 2021. MAPFAST: A Deep Algorithm Selector for Multi Agent Path Finding using Shortest Path Embeddings. In Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3-7, 2021, IFAAMAS, 9 pages.
1 INTRODUCTION
Multi-Agent Path Finding (MAPF) is the problem of finding collision-free paths for a team of agents traveling from start locations to goal locations on a map. Given a map as an undirected graph, an optimal MAPF algorithm computes paths for agents with the minimum cost, such that no two agents occupy the same vertex or traverse the same edge at the same timestep. MAPF is applicable to a wide variety of problems, including automated warehouses, self-driving vehicles, and game engines. In this work we study MAPF on 2D grids. Solving the MAPF problem optimally is known to be NP-Hard for both make-span and total arrival time minimization [1, 25]. Many types of optimal MAPF algorithms and their variants have been proposed, including Conflict Based Search (CBS) [14], a method based on branch-and-cut-and-price (BCP) [10], and a boolean satisfiability based algorithm (SAT) [19]. However, there is no optimal MAPF algorithm that dominates the others; each algorithm may perform well where others do poorly, and vice versa. This stems from the inherent differences of the algorithms. For example, as we show in Section 6.3, CBS performs well on instances where average path lengths are small, while it performs poorly on instances where agents need to traverse long paths, since the complexity of CBS increases as the number of potential conflicts between agents increases. On the other hand, BCP performs well on instances with long path lengths, but is outperformed by CBS when the path lengths are small. Moreover, many real-world instances may result in both long and short paths, increasing the difficulty of hand-picking an efficient algorithm. It is also often unclear whether MAPF algorithms have strengths or weaknesses for different map types, since they are usually tested on a relatively small number of maps. Finally, map features such as the density and arrangement of obstacles are not the only factors that affect the solving speed of MAPF algorithms; for example, the number and configuration of agents also influence the performance of different algorithms. The difficulty of hand-picking the fastest optimal MAPF algorithm for a particular instance necessitates the development of an algorithm selector that automatically selects the fastest optimal MAPF algorithm.
It has been shown that if a MAPF instance can be decomposed into multiple disjoint sub-problems, solving each sub-problem independently and combining their results can be significantly faster than solving the original problem with a MAPF algorithm [20]. Each sub-problem may have different map and agent characteristics, which will affect the runtime of whatever MAPF algorithm is chosen for each sub-problem. If it is possible to select an efficient algorithm for each sub-problem, we can greatly speed up the runtime of our overall algorithm. While many optimal MAPF algorithms can solve instances with hundreds of agents within several minutes [3, 10, 11], thousands of agents in complicated maps quickly make finding a solution intractable. Moreover, the total number of decomposed sub-problems may be too large, making it difficult and inefficient to hand-pick the best algorithms. We therefore seek to develop an algorithm selector that can select which algorithm to use given a MAPF instance.
In this paper, we propose MAPFAST (Multi-Agent Path Finding Algorithm SelecTor), a novel automatic optimal MAPF algorithm selector based on a convolutional neural network (CNN) with inception modules [21]. We show that MAPFAST outperforms all existing optimal MAPF algorithm selectors on a large dataset of diverse MAPF instances.
We propose two methods for improving the quality of our model over previous MAPF algorithm selectors: (1) augmenting MAPF instance encodings with single-agent shortest paths, and (2) using supplemental loss functions to train our model in addition to a classification loss. We empirically show that single-agent shortest paths, regardless of map topology, contain enough information to train an algorithm selector that outperforms any individual algorithm in our portfolio. Compared to other existing algorithm selectors [5, 16], we introduce a more selective and up-to-date algorithm portfolio that includes optimization, satisfiability, and search based algorithms (namely BCP, SAT, CBS and CBSH [3, 11]). We also provide insights into the instance characteristics that may lead algorithms to perform well in certain environments, using a dataset of more than 24,000 MAPF instances. These insights may help researchers leverage different heuristics in future algorithm designs.
2 RELATED WORK
Algorithm selection is the problem of selecting the best algorithm from a portfolio of algorithms to solve a given instance of a problem [8, 13]. Algorithm selection can be formulated as a prediction problem, where the goal is to predict the best algorithm from a portfolio for an input instance [17]. Such techniques have been successfully applied to many computational problems, including propositional satisfiability (SAT) and the traveling salesman problem (TSP) [6, 23, 24].
Although algorithm selection has been applied to many optimization problems, MAPF algorithm selectors have not been well studied in the literature. Sigurdson et al. [16] first proposed a classification model based on a convolutional neural network (CNN) by representing the MAPF instance as an RGB image. Their model is a version of AlexNet [9], which is modified and retrained from image classification to try to predict the fastest solver given an image input. Their model demonstrated that it is possible to predict the fastest algorithm for a MAPF instance, although it achieved relatively limited performance. Kaduri et al. [5] proposed two different models: one based on CNNs using VGGNet [21], and the other based on a tree-based learning algorithm named XGBoost [2]. Their work uses a MAPF algorithm portfolio that includes only optimal search-based algorithms. Based on our performance analysis in Section 3.1, the best search-based algorithm in our algorithm portfolio, namely CBSH, is the fastest algorithm for only 30% of test cases. Thus, omitting non-search based algorithms handicaps the performance of an algorithm selector. Their best model, XGBoost_Cl, requires hand-crafted MAPF features (e.g., number of agents, obstacle density) as input to their algorithm selector. Although the authors provide analysis of the impact of their hand-crafted features on the performance of their model, the performance of their algorithm selector may still be impaired by their small feature set, which may not include some important features of an instance.
3 ALGORITHM PORTFOLIO
The algorithm portfolio is a set of pre-selected candidate algorithms. When run, an algorithm selector will select a single algorithm from the portfolio to run on an instance and report the results of that algorithm. Given that MAPF instances are complex and diverse, a good algorithm portfolio must be diverse. Ideally, an algorithm in the portfolio should have strengths that cover for the weaknesses of the other algorithms. Many optimal MAPF algorithms are built on top of similar approaches with different heuristics (e.g., CBS and its variants). We seek to include optimal MAPF algorithms that are inherently different from each other to find the best algorithms for a variety of MAPF instances. We select the following four algorithms for our portfolio:
• Search-based: Conflict-Based Search (CBS) [14] and its state-of-the-art variant with improved heuristics, CBSH [3, 11];
• Optimization-based: Branch-and-Cut-and-Price (BCP) [10], a method based on the decomposition framework for mathematical optimization;
• Satisfiability-based: A reduction of the MAPF problem to the propositional satisfiability problem (SAT) [19].
We also tested other algorithms such as EPEA* [4] and ICTS [15], but removed them due to their limited performance compared to the algorithms in our portfolio.
3.1 Performance Analysis
Next, we show the capabilities and different characteristics of the algorithms in our portfolio by presenting a performance analysis for each individual algorithm. We use three metrics to evaluate the portfolio algorithms (a sketch of how we compute them follows the list):
• Accuracy is the proportion of instances in which an algorithm is the fastest in the portfolio.
• Coverage is the proportion of the instances that an algorithm successfully solves within the time limit (5 minutes).
• Runtime is the overall time taken by the algorithm, in minutes, to solve all instances. A default value of 5 minutes is added to the runtime when the algorithm doesn't solve the input instance within the time limit.
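To make the three metrics concrete, the following minimal sketch (our own illustration, not the authors' released code; the per-instance results layout is a hypothetical one) computes accuracy, coverage, and runtime with the 5-minute timeout penalty:

```python
from typing import Dict, Optional

TIME_LIMIT = 5.0  # minutes

def portfolio_metrics(results: Dict[str, Dict[str, Optional[float]]], algo: str):
    """results[instance][algorithm] = solving time in minutes, or None on timeout."""
    fastest = solved = 0
    total_time = 0.0
    for times in results.values():
        t = times[algo]
        if t is not None:
            solved += 1
            total_time += t
            # fastest among the algorithms that solved this instance
            if t <= min(x for x in times.values() if x is not None):
                fastest += 1
        else:
            total_time += TIME_LIMIT  # timeout penalty
    n = len(results)
    return fastest / n, solved / n, total_time  # accuracy, coverage, runtime
```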
Table 1: Performance analysis for portfolio algorithms on the entire dataset (solving time in minutes)

Algorithm   Accuracy   Coverage   Runtime   Mean    Median   StdDev
CBS         0.1908     0.40       77,091    3.088   5.000    2.344
CBSH        0.2953     …          …         …       …        …
BCP         …          …          …         …       …        …
SAT         …          …          …         …       …        …
Oracle      …          …          …         …       …        …
We use the MAPF Benchmarks [18] to analyze our portfolio algorithms. Our dataset of instances contains a wide variety of map types, including cities, video game maps, mazes, random maps and warehouses. When generating the instances, we only keep the instances that at least one algorithm from the portfolio can solve within the time limit (5 minutes). We generate 24,967 solvable instances with varying number and distribution of agents. The results of the performance analysis are shown in Table 1. We also include the mean, median, and standard deviation of the time needed (in minutes) for different algorithms to solve the instances. BCP and CBSH are successful in solving 90% of the instances. However, BCP, the algorithm that is fastest more often than any other algorithm in our portfolio, is only the fastest algorithm for 51% of instances.

Figure 1: Accuracy and coverage data for portfolio algorithms with respect to different map types. Random and Warehouse maps are labeled as Rand and Ware respectively.

Table 2: Accuracy and coverage data for game maps

Map        Accuracy (CBS / CBSH / BCP / SAT)    Coverage (CBS / CBSH / BCP / SAT)
brc202d    0.1059 / … / … / …                   … / … / … / …

Figure 2: Maps in Table 2. Left to right, first row: brc202d, orz900d, den312d; second row: den520d, lak303d, ost003d.
Figure 3: (a) MAPF instance marked only with start (green circles) and goal (blue squares) locations. (b), (c) Two different mappings of the start and goal locations with respect to the map in (a). Planned paths are marked in colored lines.

Figure 4: Encoding an instance map with (a) start and goal locations and (b) single-agent shortest paths.
Selecting only BCP to solve all the instances would take more than twice as long as selecting the best performing algorithm for each instance (shown as Oracle in Table 1). This further justifies the claim that there is no dominating optimal MAPF algorithm.
To answer whether a specific algorithm always performs well in certain types of maps, we present accuracy and coverage data with respect to the map types in Fig. 1. We also present the accuracy and coverage data for a subset of game map instances in Table 2. Overall, BCP has the highest accuracy in the game maps. However, neither BCP nor CBSH shows dominant performance over the other for the instances on these individual maps. CBSH outperforms BCP in brc202d, orz900d, den520d, and ost003d, but the gaps in accuracy are small. For den312d and lak303d, BCP is significantly better than CBSH in terms of accuracy. Even for maps that have similar topology (e.g., den312d and den520d), the fastest algorithms can still be different.
Based on this data, we do not find a clear relationship between map types and algorithm performance. Some of the maps in Fig. 2 have both narrow corridors and open spaces, making it challenging to manually choose the algorithm that works well on a certain map type (e.g., use CBS for maps with narrow corridors). Owing to the fact that map topologies are usually non-homogeneous, it is necessary to analyze MAPF instances on a case-by-case basis instead of categorizing them by map types.
4 INSTANCE ENCODING
Existing MAPF algorithm selectors encode instances using either hand-crafted features [5] or a 2D map with each cell marked as having a start or goal location or not [16]. Encoding an instance using hand-crafted features cannot capture as much information as feeding the full map into a deep learning model. However, representing a variable number of agents and goals on a map is a challenge. Sigurdson et al. [16] encode agents and goals as binary features for each cell, meaning a cell either has an agent/goal or does not. This method cannot distinguish between different permutations of agent start and goal positions: different assignments of start and goal locations to agents may lead to drastically different algorithmic performance. Take for instance the map shown in Fig. 3a and two permutations of agents and goals in Fig. 3b and Fig. 3c. CBS takes 0.02 s and BCP takes 0.03 s to solve the instance in Fig. 3b. With a different mapping for start and goal locations as shown in Fig. 3c, CBS doesn't finish within the five-minute time limit, but BCP successfully solves the instance in only 0.33 s. Any model trained on binary encodings of agents and goals will not be able to differentiate between these two very different instances. We can improve the performance of our models by encoding into our inputs a mapping between agents and goals. We will further justify this claim in Section 6.2.
We propose a new way of encoding MAPF instances that encodes a mapping between agents and their goals. In addition to marking the start and goal locations, we include another marking in our input which encodes single-agent shortest paths from each agent to its goal. A single-agent shortest path is an optimal path for an agent without considering collisions with other agents (Fig. 4b). For every agent, we add only a single shortest path, despite the fact that there may be many distinct shortest paths from every agent to its goal. We encode the shortest paths on our map, where each cell is marked if it lies on a single-agent shortest path.
We initially encoded our MAPF instance into a tensor with four layers. Each layer encoded a binary feature for each map cell: obstacle, agent, goal, shortest path. Models trained on this encoding performed relatively poorly in all metrics, perhaps requiring more training data than we generated. We then encoded our features in a different manner into a tensor with three layers by annotating every cell in our map with a distinct value triple:
• [·, ·, ·] if the cell contains an obstacle
• [·, ·, ·] if the cell lies on a shortest path from any agent to its goal
• [·, ·, ·] if the cell is the starting location of an agent
• [·, ·, ·] if the cell is the goal location of an agent
• [·, ·, ·] if the cell is both a start and goal location
• [·, ·, ·] if the cell is empty
Note that since every cell with a start/goal location is guaranteed to lie on a shortest path, we only mark that these cells contain a start/goal; we do not mark that they also lie on a shortest path, since that is already implied by the presence of a start/goal. This is the encoding we use for our CNN-based models.
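As an illustration of this encoding, the sketch below builds the three-layer tensor from a grid, computing one BFS shortest path per agent. The concrete triple values (the COLORS table) are our own placeholders: the paper assigns one distinct triple per cell type, and any six distinct triples serve for this sketch.

```python
from collections import deque
import numpy as np

# Hypothetical value triples; the paper's exact values are not reproduced here.
COLORS = {"obstacle": (0, 0, 0), "path": (1, 0, 0), "start": (0, 1, 0),
          "goal": (0, 0, 1), "start_goal": (0, 1, 1), "empty": (1, 1, 1)}

def bfs_path(grid, start, goal):
    """One single-agent shortest path on a 4-connected grid (1 = obstacle)."""
    h, w = grid.shape
    parent, frontier = {start: None}, deque([start])
    while frontier:
        cur = frontier.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path
        r, c = cur
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nb[0] < h and 0 <= nb[1] < w and grid[nb] == 0 and nb not in parent:
                parent[nb] = cur
                frontier.append(nb)
    return []  # goal unreachable

def encode_instance(grid, starts, goals):
    """grid: HxW array of 0/1; starts, goals: lists of (row, col), one per agent."""
    h, w = grid.shape
    tensor = np.tile(np.array(COLORS["empty"], float), (h, w, 1))
    tensor[grid == 1] = COLORS["obstacle"]
    for s, g in zip(starts, goals):
        for cell in bfs_path(grid, s, g):
            tensor[cell] = COLORS["path"]
    # Start/goal marks overwrite path marks, matching the note above.
    for s, g in zip(starts, goals):
        tensor[s] = COLORS["start_goal"] if s in goals else COLORS["start"]
        tensor[g] = COLORS["start_goal"] if g in starts else COLORS["goal"]
    return tensor  # H x W x 3 input to the CNN
```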
5 ALGORITHM SELECTION MODELS
We introduce the following algorithm selectors in this section: (1) CNN_Class, which naively treats algorithm selection as a classification task; (2) MAPFAST, an augmented version of CNN_Class that achieves significantly better results; and (3) G2V, a graph-embedding based model that offers insights into what information is required to perform algorithm selection on MAPF instances.
5.1 CNN_Class
In this work, we model algorithm selection as a classification task using a CNN. Our model takes as input an encoding of a MAPF instance and returns a prediction for the fastest algorithm in the portfolio. Inception modules are used to improve training speed and allow for a much deeper network than architectures like VGGNet [21]. Moreover, since the inception module contains multiple sizes of convolution kernels (Fig. 5), there is no need to decide the exact kernel size for each layer, as the network learns which kernel to use. The input to the model is a 320 × 320 × 3 tensor: standard image classification networks such as VGGNet take 224 × 224 inputs, while our map size is 320 × 320. We compute our classification loss $L_{Class}$ using categorical cross-entropy between our predictions and the ground truth fastest algorithm. This model is referred to as CNN_Class, as it is only trained with $L_{Class}$.

5.2 MAPFAST
To improve the quality of our predictions, we further augment our training process with two additional supplemental loss functions. We add a second output layer using four neurons with a sigmoid activation function to predict, for every algorithm, whether it will finish within the time limit or not. We compute our completion loss $L_{Comp}$ using cross-entropy loss between our second output layer and the ground truth algorithm completions.
We add an additional third output layer to predict pairwise relative performance of algorithms on an input instance. This is done using six output neurons. The first three output neurons predict whether BCP will be faster than CBS, CBSH, and SAT. The following two neurons predict whether CBS will be faster than CBSH and SAT. The final neuron predicts whether CBSH will be faster than SAT. Again, we compute our pairwise loss $L_{Pair}$ using cross-entropy. We train a model with the total loss $L_{Tot} = L_{Class} + L_{Comp} + L_{Pair}$ and refer to this combined model as MAPFAST (shown in Fig. 5).
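The three output heads and the combined loss can be sketched in PyTorch as follows. This is our own illustration: the backbone (the inception CNN of Fig. 5) and its feature dimension are placeholders, and BCE-with-logits is equivalent to the sigmoid outputs with cross-entropy described above.

```python
import torch
import torch.nn as nn

class MAPFASTHeads(nn.Module):
    """Output heads only; `backbone` is assumed to map a 3x320x320
    instance tensor to a flat feature vector of size `feat_dim`."""
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.cls_head = nn.Linear(feat_dim, 4)   # fastest of {BCP, CBS, CBSH, SAT}
        self.comp_head = nn.Linear(feat_dim, 4)  # per algorithm: finishes in time?
        self.pair_head = nn.Linear(feat_dim, 6)  # six pairwise "is faster" outputs

    def forward(self, x):
        f = self.backbone(x)
        return self.cls_head(f), self.comp_head(f), self.pair_head(f)

def total_loss(outputs, y_fastest, y_finish, y_pair):
    """L_Tot = L_Class + L_Comp + L_Pair. y_fastest: class indices;
    y_finish: (B, 4) 0/1 floats; y_pair: (B, 6) 0/1 floats."""
    cls_logits, comp_logits, pair_logits = outputs
    l_class = nn.functional.cross_entropy(cls_logits, y_fastest)
    l_comp = nn.functional.binary_cross_entropy_with_logits(comp_logits, y_finish)
    l_pair = nn.functional.binary_cross_entropy_with_logits(pair_logits, y_pair)
    return l_class + l_comp + l_pair
```

At selection time only the classification head is used: the argmax over its four outputs picks the algorithm (see Section 6).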
Figure 5: The CNN architecture of MAPFAST.

5.3 G2V
For the previous models, we encode our MAPF instance as a tensor that contains information about the map as well as single-agent shortest paths. We now demonstrate that the single-agent paths alone, regardless of map topology, contain enough information to outperform any individual algorithm in our portfolio. We utilize graph embedding techniques to convert the single-agent shortest paths for an instance into an embedding and train a model to make an algorithm prediction from this embedding.
For a MAPF instance, we construct our graph encoding as follows. First, we compute a single-agent shortest path for every agent. We then construct a graph consisting of only the nodes which lie on these shortest paths. We add edges between all nodes which are adjacent in our MAPF instance. This (possibly disjoint) graph serves as the encoding of the MAPF instance. The size of our graph encoding is often significantly smaller than the size of the original instance map, as our graph encoding only contains nodes which lie along shortest paths.
After encoding our MAPF instance as a graph, we use the unsupervised graph embedding algorithm Graph2Vec [12] to embed our graph into a vector. Graph2Vec takes as input a set of graphs and outputs a fixed-size vector representation for each graph. We feed Graph2Vec the graph representation of every instance in our dataset and it produces a vector of size 128 for each instance. While Graph2Vec requires access to every graph in our dataset (including the test set) a priori, it does not have access to information on algorithm performance while generating embeddings and can be seen as a data preprocessing step. We train an XGBoost [2] classifier on the embeddings generated from maps in our training set to predict the fastest algorithm in our portfolio. This model is referred to as G2V in our results and uses only $L_{Class}$, the cross-entropy loss, to optimize classification accuracy.
The drawback of this approach is that Graph2Vec needs all graphs before any embedding is calculated. This means that, once trained, the model cannot be used to create a new graph embedding for an unseen instance. Therefore, it is not a deployable algorithm selector in any reasonable sense. However, we believe the results from this model are very informative. Our G2V model performs well despite only having access to nodes on shortest paths, potentially a very small fraction of the total number of nodes in the instance. In the shortest paths, there is no explicit information on map type, obstacle density, or map size, heuristics which have previously been used to select algorithms [5, 20]. Despite this lack of map information, our G2V model performs quite well, better than any single algorithm in our portfolio and close in performance to our CNN-based models, which have access to much more information. This suggests that information on single-agent shortest paths alone may be enough to distinguish the performances of our portfolio algorithms.
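A compact sketch of this pipeline under stated assumptions: we use networkx for the path graph and the Graph2Vec implementation from the karateclub library (one public implementation; the paper does not name the library it used). The variables all_paths, labels, and n_train are hypothetical inputs, with training instances assumed to come first.

```python
import networkx as nx
from karateclub import Graph2Vec      # one public Graph2Vec implementation
from xgboost import XGBClassifier

def path_graph(paths):
    """Graph over cells on single-agent shortest paths; edges connect
    grid-adjacent cells. `paths` is a list of [(row, col), ...] per agent."""
    cells = {c for p in paths for c in p}
    g = nx.Graph()
    g.add_nodes_from(cells)
    for (r, c) in cells:
        for nb in ((r + 1, c), (r, c + 1)):   # down/right only: no duplicate edges
            if nb in cells:
                g.add_edge((r, c), nb)
    # karateclub expects nodes labeled 0..n-1
    return nx.convert_node_labels_to_integers(g)

# all_paths: shortest paths of every instance (train + test, as noted above);
# labels: index of the fastest algorithm for each training instance.
graphs = [path_graph(p) for p in all_paths]
embedder = Graph2Vec(dimensions=128)   # 128-d embedding, as in the paper
embedder.fit(graphs)                   # embeddings need all graphs a priori
X = embedder.get_embedding()
clf = XGBClassifier().fit(X[:n_train], labels)
pred = clf.predict(X[n_train:])        # predicted fastest algorithm per test instance
```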
6 RESULTS
6.1 Model Performance
We evaluate the three models presented in the previous section: CNN_Class, MAPFAST, and G2V. Of the 24,967 instances generated from the MAPF Benchmarks [18] for our evaluation, we used 80% for training, 10% for validation and 10% for testing. Before data collection, we built wrappers for the algorithms in our portfolio so that all the algorithms use a common method to read the input instances. The code can be found at this link. We train our model for 5 epochs with a learning rate of … .

Table 3: Simulation results

Algorithm    Accuracy   Coverage   Runtime
CBS          0.1888     0.41       7,714
CBSH         0.2810     0.90       …
BCP          …          …          …
SAT          …          …          …
XGBoost_Cl   …          …          …
CNN_Class    …          …          …
G2V          …          …          …
MAPFAST      …          …          …
Oracle       1.0        1.0        917

We use the classification outputs of the models to select the fastest algorithm. To select an algorithm, we take the argmax of the classification output and select the corresponding algorithm (if the chosen algorithm fails to solve the instance, another algorithm is not selected).
As mentioned in Section 3.1, we use accuracy, coverage, and runtime to evaluate the performance of portfolio algorithms. These metrics can also be used to analyze the performance of algorithm selectors, but with slightly different definitions.
The Accuracy metric gives the proportion of instances in which the algorithm selector correctly selects the fastest algorithm. Coverage is the proportion of instances where the algorithm selector selects an algorithm that solves the instance within the time limit. Runtime is the overall time taken for the selected algorithms, in minutes, to solve all the problems in the test set. A default value of 5 minutes was added to the runtime when the algorithm didn't solve the input instance within the time limit.
Table 3 shows the results from evaluating our models on the 2,484 MAPF instances in the test set. In the first four rows, we report results for using a single algorithm on all input instances. Our experiments show that BCP and CBSH were successful in solving 90% of the input instances. However, BCP, the best individual algorithm in accuracy and coverage, is the fastest for only 53% of instances and takes more than twice as long as selecting the fastest algorithm for each instance (shown as Oracle in Table 3).
The second part of the table shows the comparison of a state-of-the-art MAPF algorithm selector, XGBoost_Cl [5], and our approach. To generate these results, we train XGBoost_Cl with our algorithm portfolio and dataset. MAPFAST successfully selects the fastest algorithm for 77% of the input instances and has coverage of 97%, outperforming XGBoost_Cl, which has 67% accuracy and 95% coverage. Our model G2V has performance comparable to our CNN_Class model, with 71% accuracy and 96% coverage, also outperforming XGBoost_Cl.
The total runtime for the algorithms chosen by our models is significantly less than using the same algorithm for every instance. On average, it takes 1 second to annotate one input instance with single-agent shortest paths and 0.01 seconds for the trained model to select the fastest algorithm, both of which are negligible compared to the average runtime of the best portfolio algorithm (i.e., CBSH has an average runtime of 53 seconds). Our models also show a remarkable improvement in accuracy compared to all of the individual algorithms, further justifying our approach.
Table 4: Actual and predicted coverage for MAPFAST

                     CBS    CBSH   BCP    SAT
Actual Coverage      0.41   0.90   0.91   0.38
Predicted Coverage   0.42   0.89   0.87   0.40
Recall               0.90   0.95   0.95   0.91
Correctness          0.91   0.93   0.92   0.91

We use the following method to further analyze the performance of the four output neurons in MAPFAST that use the $L_{Comp}$ loss to predict whether an algorithm solves a given input instance or not. Let $T$ be the set of all test instances. For a particular algorithm, let $S$ be the set of test instances that it can solve within the time limit, and $Q$ be the set of test instances it cannot solve within the time limit, such that $\{S, Q\}$ is a partition of $T$. Let $\tilde{S}$ be the set of test instances our model predicts as solvable by the algorithm within the given time limit, and $\tilde{Q}$ be the instances that our model predicts as not solvable by the algorithm within the given time limit; $\{\tilde{S}, \tilde{Q}\}$ is another partition of $T$. The first row of Table 4 shows the actual coverage of the algorithms in the portfolio, i.e., $|S|/|T|$. The second row shows the predicted coverage of our model for each algorithm, i.e., $|\tilde{S}|/|T|$. The third row lists the recall of our model, which is the fraction of solvable instances that our model predicts as solvable: $|S \cap \tilde{S}|/|S|$. The final row lists the correctness of our model, which is the fraction of correct outputs: $|(S \cap \tilde{S}) \cup (Q \cap \tilde{Q})|/|T|$. Our model predicts whether an algorithm solves an instance or not with at least 91% correctness for each algorithm. This suggests that the neural network learns the inherent behavior of the algorithms for the given MAPF instances.
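These four quantities are straightforward set ratios; a small sketch (our own illustration) computes them from per-algorithm boolean vectors of actual and predicted solvability:

```python
import numpy as np

def coverage_stats(solved: np.ndarray, predicted: np.ndarray):
    """solved, predicted: boolean arrays over the test set T for one algorithm
    (membership in S and S~ respectively; Q and Q~ are their complements)."""
    actual_cov = solved.mean()                           # |S| / |T|
    predicted_cov = predicted.mean()                     # |S~| / |T|
    recall = (solved & predicted).sum() / solved.sum()   # |S ∩ S~| / |S|
    correctness = (solved == predicted).mean()           # |(S∩S~) ∪ (Q∩Q~)| / |T|
    return actual_cov, predicted_cov, recall, correctness
```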
We also use an additional metric, the speed award [22], which provides more information about relative performance among different algorithms, to further analyze our models. This metric gives a greater reward for solving tasks that not every algorithm solves and a smaller reward to fast algorithms when every algorithm takes around the same amount of time. For different algorithm selectors, it gives greater rewards to the models that correctly choose the fastest algorithm when other models fail to do so. It therefore provides more information on the relative performance of algorithms and algorithm selectors than the accuracy and the cumulative runtime.
To calculate the speed award, we first compute the speed factor:
$$\mathrm{speedFactor}(p, a_i) = \frac{300}{1 + \mathrm{timeUsed}(p, a_i)},$$
where $\mathrm{timeUsed}(p, a_i)$ is the time taken by algorithm $a_i$ to solve instance $p$ and 300 is the time limit for each instance. The speed factor shows how fast an algorithm can solve an instance; the faster an algorithm is, the higher the speed factor will be. If algorithm $a_i$ fails to solve instance $p$, the speed factor is set to 0.
Once we have the speed factor of all algorithms for a problem instance $p$, we compute the speed award for each algorithm $a_i$ to solve instance $p$ as follows:
$$\mathrm{speedAward}(p, a_i) = \frac{\mathrm{speedFactor}(p, a_i)}{\sum_{a_j \in \mathrm{algorithms}} \mathrm{speedFactor}(p, a_j)}.$$
Here, algorithms = {BCP, CBS, CBSH, SAT, XGBoost_Cl, G2V, CNN_Class, CNN_Agents, MAPFAST, Oracle}. The speed award for an algorithm has a higher value if the algorithm solves the instance faster than the other algorithms.
The final score for an algorithm on a set of problem instances is given by
$$\mathrm{score}(a_i) = \sum_{\forall p} \mathrm{speedAward}(p, a_i). \quad (1)$$
This custom score metric provides more information about the relative performance between different algorithm selectors than the runtime metric alone. In particular, if the algorithms selected by different algorithm selectors have very similar runtimes, then similar scores will be granted to these algorithm selectors, instead of giving all the credit to the fastest one.
The calculated custom scores are shown in Table 5; a sketch of the computation follows. Oracle is the model that always selects the fastest algorithm. CNN_Agents is used for model validation, which will be introduced in Section 6.2.
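A direct implementation of Equation 1 and the two definitions above (our own sketch; the speed-factor constant follows the stated 300-second time limit):

```python
TIME_LIMIT = 300.0  # seconds, per instance

def speed_factor(time_used):
    """speedFactor(p, a_i); `time_used` is None if a_i failed to solve p."""
    return 0.0 if time_used is None else TIME_LIMIT / (1.0 + time_used)

def scores(results):
    """results[p][a] = solving time in seconds, or None on failure.
    Returns score(a) = sum over p of speedAward(p, a) for every algorithm a."""
    totals = {}
    for times in results.values():
        factors = {a: speed_factor(t) for a, t in times.items()}
        denom = sum(factors.values())
        for a, f in factors.items():
            totals[a] = totals.get(a, 0.0) + (f / denom if denom > 0 else 0.0)
    return totals
```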
Table 5: Custom score results

Algorithm      SAT   CBS   CBSH   BCP   CNN_Agents   XGBoost_Cl   CNN_Class   G2V   MAPFAST   Oracle
Custom Score   …     …     …      …     …            …            …           …     320.43    …
Table 6: Ablation study for CNN models

Algorithm         Accuracy   Coverage   Runtime
CNN_Agents        …          …          …
CNN_Class,Pair    …          …          …
CNN_Pair,Comp     …          …          …
CNN_Class         …          …          …
CNN_Pair          …          …          …
CNN_Class,Comp    …          …          …
MAPFAST           …          …          …
Oracle            1.0        1.0        917

The best model should have the highest custom score. Based on the results in Table 5, all of the algorithm selectors outperform the portfolio algorithms. Moreover, all of our models outperform the state-of-the-art model, XGBoost_Cl. MAPFAST is ranked as the best algorithm selector by speedAward.

6.2 Ablation Study
We performed an ablation study to analyze architectural and design choices in our network by training variants of our model with different combinations of our loss functions.
We denote the loss functions we used as subscripts to our model; for example, CNN_Class,Comp is our model trained with only the classification loss $L_{Class}$ and the completion loss $L_{Comp}$. Note that with this notation, MAPFAST is equivalent to CNN_Class,Pair,Comp, but we refer to it as MAPFAST for simplicity. For models that have classification outputs, we select the algorithm with the highest value in the classifier output. For models that have pairwise comparison outputs and are trained with the pairwise performance loss $L_{Pair}$, but don't have classifier outputs, we select an algorithm according to the predicted relative performance of each algorithm. We do not evaluate a model that only uses $L_{Comp}$, as there is not a reasonable way to select an algorithm from predictions of algorithm completion. Additionally, we trained a model that only uses agents' start and goal locations and does not include single-agent shortest paths, referred to as CNN_Agents. CNN_Agents takes the same input encoding as [16]; however, due to our different map size, we trained a new model from scratch. Additionally, we saw a performance increase by using inception modules rather than their model architecture, so we present results for CNN_Agents with the same model architecture as MAPFAST, but with a different instance encoding. Our results for all models are presented in Table 6.
All models in Table 6 outperform each individual algorithm in our portfolio (cf. Table 3). Training the model to predict algorithm completion has mixed impact on accuracy, but improves coverage and runtime scores. Interestingly, combining $L_{Class}$ and $L_{Pair}$ decreases accuracy, until $L_{Comp}$ is also added to make MAPFAST, which causes accuracy to increase by 5%. Our model MAPFAST, using all three loss functions, has the best performance in all three metrics. We have two possible explanations for why the additional loss functions improved the performance of MAPFAST: (1) The classification loss function can be noisy, since sometimes the difference between the solution times of the two fastest algorithms is quite small. The additional loss functions may smooth out this noise and prevent the model from reaching a local minimum. (2) $L_{Comp}$ helps the model avoid selecting algorithms that do not finish, by predicting whether an algorithm finishes within the time limit or not. $L_{Pair}$ encodes more about the entire ordering, not just the fastest algorithm. Both can be seen as a means of extracting more information from each instance than using classification accuracy alone.
The custom score of CNN_Agents in Table 5 further shows its limitation and the necessity of using the shortest path embedding. Albeit having the same model architecture, the score for MAPFAST (320.43) is remarkably higher than that of CNN_Agents (about 287).
6.3 Dataset Analysis
In order to gain a deeper understanding of our MAPF algorithm portfolio, we analyze the performance of each algorithm for all the MAPF instances we generated for training and testing the algorithm selector. Although interpreting why MAPFAST chooses a certain algorithm is beyond the scope of this paper, we aim to provide more insight on when a specific algorithm might work well for a certain scenario.
Since the input for MAPFAST contains the single-agent shortest paths, there may be some corresponding patterns in these paths for the test cases that have the same fastest algorithms. Here we define SpaceRatio, which is equal to the number of map cells that are on the single-agent shortest paths divided by the number of map cells that have no obstacles in them (a sketch of the computation follows). The SpaceRatio not only represents how much free map space is used by the single-agent shortest paths, but also how spread out the start and goal locations are in a map. We present scatter plots of the average length of single-agent shortest paths with respect to the SpaceRatio of two different maps in Fig. 6a and 6c. Each data point is colored by the algorithm that solves the corresponding instance fastest. The distributions of the average single-agent shortest path lengths and SpaceRatio with respect to each algorithm are also shown on the top and right sides of the figures. We see that CBS tends to perform better than CBSH and BCP for the test cases having a shorter average single-agent shortest path length and smaller SpaceRatio. CBSH and BCP have similar performance across different average single-agent shortest path lengths. However, CBSH performs better than BCP on the test cases with higher SpaceRatios. Since the SpaceRatio is also affected by the total number of agents, we further present scatter plots of the number of agents with respect to SpaceRatio in Fig. 6b and Fig. 6d.
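The SpaceRatio of an instance can be computed directly from the grid and the single-agent shortest paths; a minimal sketch, reusing the hypothetical path representation from the Section 4 example:

```python
import numpy as np

def space_ratio(grid: np.ndarray, paths) -> float:
    """SpaceRatio = |cells on single-agent shortest paths| / |obstacle-free cells|.
    grid: HxW array of 0/1 (1 = obstacle); paths: list of [(row, col), ...]."""
    path_cells = {cell for path in paths for cell in path}
    free_cells = int((grid == 0).sum())
    return len(path_cells) / free_cells
```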
Figure 6: (a)-(d) Scatter plots of average single-agent shortest path length and number of agents with respect to SpaceRatio, for the Berlin and den520d maps. (e)-(j) Heat maps of the single-agent shortest paths with respect to different algorithms for (e)-(g) Berlin and (h)-(j) den520d.

When there are fewer agents in the map, CBS works better for smaller SpaceRatios, while BCP and CBSH dominate the test cases with larger SpaceRatios.
In Fig. 6e-6j, we show heat maps of the single-agent shortest paths for all of the test cases where a certain algorithm is ranked as the fastest one. The more a map cell is used by a single-agent shortest path, the brighter it is. The heat maps for CBS (Fig. 6e and 6h) have many scattered shortest paths compared to BCP and CBSH. This is because the solving speed of CBS is determined by the number of conflicts found during the search phase. Test cases with longer single-agent shortest paths tend to result in more potential conflicts, thus making CBS slower. On the other hand, the heat maps for CBSH and BCP are mostly dominated by longer paths. Although it seems that the heat maps for CBS occupy more map space than those of CBSH and BCP, this has no correlation with the SpaceRatio, since the heat maps contain paths from different test cases. The notable differences between CBS's heat maps and those of the other algorithms also demonstrate our motivation for adding single-agent shortest paths as an input tensor for MAPFAST. The differences in the heat maps are so readily apparent that a human can manually decide whether or not to use CBS without the help of an algorithm selector. However, the heat maps alone do not lead to any obvious suggestion about when to use BCP or CBSH. These two algorithms have very similar performance on test cases with different numbers of agents and SpaceRatios. Although the test cases in Fig. 6 indicate that CBSH works better for higher SpaceRatios, we have observed similar results for BCP in other maps which are not shown here.
Based on this dataset analysis, one interesting direction for future work is to develop hybrid MAPF algorithms that combine the strengths of different algorithms. For instance, one could use the number of agents or the SpaceRatio as an additional heuristic to help decide whether to use the basic version of an algorithm such as CBS or an improved version such as CBSH.
7 CONCLUSION
In this paper, we present MAPFAST, a deep learning based optimal MAPF algorithm selector that outperforms the current state-of-the-art model. We also introduce a new encoding method for MAPF instances that uses the single-agent shortest paths. In addition to using a classification loss in the CNN model, we show that adding multiple supplemental loss functions, such as the completion loss and the pairwise loss, further improves the performance of the algorithm selector. The performance of MAPFAST is evaluated and analyzed on a large and diverse dataset of MAPF instances. We empirically show that the single-agent shortest paths, even without map topology, contain enough information to train a model that outperforms all portfolio algorithms as well as the state-of-the-art model. We also provide insight into the inherent features of MAPF problems that may help future researchers improve their MAPF algorithm designs.
We propose the following problems for future work: (1) utilizing graph-based learning techniques to extend to MAPF on general graphs, (2) incorporating our algorithm selection model into MAPF problem decomposition to select an efficient solver for each sub-problem, and (3) incorporating sub-optimal MAPF algorithms into our portfolio and training a model that selects fast solvers with near-optimal cost.
ACKNOWLEDGMENTS
This research was supported by NSF awards IIS-1553726, IIS-1724392, IIS-1724399, and CNS-1837779, as well as a gift from Amazon.
REFERENCES
[1] Jacopo Banfi, Nicola Basilico, and Francesco Amigoni. 2017. Intractability of Time-Optimal Multirobot Path Planning on 2D Grid Graphs with Holes. IEEE Robotics and Automation Letters 2, 4 (2017), 1941-1947.
[2] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proc. of the 22nd ACM SIGKDD Intl Conf on Knowledge Discovery and Data Mining. 785-794.
[3] Ariel Felner, Jiaoyang Li, Eli Boyarski, Hang Ma, Liron Cohen, T. K. Satish Kumar, and Sven Koenig. 2018. Adding Heuristics to Conflict-Based Search for Multi-Agent Path Finding. In Proc. of the Intl Conf on Automated Planning and Scheduling, Vol. 28. 83-87.
[4] Meir Goldenberg, Ariel Felner, Roni Stern, Guni Sharon, Nathan Sturtevant, Robert C. Holte, and Jonathan Schaeffer. 2014. Enhanced Partial Expansion A*. Journal of Artificial Intelligence Research 50 (2014), 141-187.
[5] Omri Kaduri, Eli Boyarski, and Roni Stern. 2020. Algorithm Selection for Optimal Multi-Agent Pathfinding. In Proc. of the Intl Conf on Automated Planning and Scheduling, Vol. 30. 161-165.
[6] Pascal Kerschke, Lars Kotthoff, Jakob Bossek, Holger H. Hoos, and Heike Trautmann. 2018. Leveraging TSP Solver Complementarity Through Machine Learning. Evolutionary Computation 26, 4 (2018), 597-620.
[7] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014).
[8] Lars Kotthoff. 2016. Algorithm Selection for Combinatorial Search Problems: A Survey. In Data Mining and Constraint Programming. Springer, 149-190.
[9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems. 1097-1105.
[10] Edward Lam, Pierre Le Bodic, Daniel Damir Harabor, and Peter J. Stuckey. 2019. Branch-and-Cut-and-Price for Multi-Agent Pathfinding. In Proc. of the Intl Joint Conf on Artificial Intelligence. 1289-1296.
[11] Jiaoyang Li, Ariel Felner, Eli Boyarski, Hang Ma, and Sven Koenig. 2019. Improved Heuristics for Multi-Agent Path Finding with Conflict-Based Search. In Proc. of the Intl Joint Conf on Artificial Intelligence. 442-449.
[12] Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. 2017. Graph2Vec: Learning Distributed Representations of Graphs. arXiv preprint arXiv:1707.05005 (2017).
[13] John R. Rice. 1976. The Algorithm Selection Problem. In Advances in Computers. Vol. 15. Elsevier, 65-118.
[14] Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. 2015. Conflict-Based Search for Optimal Multi-Agent Pathfinding. Artificial Intelligence 219 (2015), 40-66.
[15] Guni Sharon, Roni Stern, Meir Goldenberg, and Ariel Felner. 2013. The Increasing Cost Tree Search for Optimal Multi-Agent Pathfinding. Artificial Intelligence 195 (2013), 470-495.
[16] Devon Sigurdson, Vadim Bulitko, Sven Koenig, Carlos Hernandez, and William Yeoh. 2019. Automatic Algorithm Selection in Multi-Agent Pathfinding. arXiv preprint arXiv:1906.03992 (2019).
[17] Kate A. Smith-Miles. 2009. Cross-Disciplinary Perspectives on Meta-Learning for Algorithm Selection. ACM Computing Surveys 41, 1 (2009), 1-25.
[18] Roni Stern, Nathan R. Sturtevant, Ariel Felner, Sven Koenig, Hang Ma, Thayne T. Walker, Jiaoyang Li, Dor Atzmon, Liron Cohen, T. K. Satish Kumar, Eli Boyarski, and Roman Bartak. 2019. Multi-Agent Pathfinding: Definitions, Variants, and Benchmarks. Symposium on Combinatorial Search (SoCS) (2019), 151-158.
[19] Pavel Surynek, Ariel Felner, Roni Stern, and Eli Boyarski. 2016. Efficient SAT Approach to Multi-Agent Path Finding Under the Sum of Costs Objective. In Proc. of the European Conf on Artificial Intelligence. 810-818.
[20] Jiri Svancara and Roman Bartak. 2019. Combining Strengths of Optimal Multi-Agent Path Finding Algorithms. In Proc. of the Intl Conf on Agents and Artificial Intelligence - Vol. 1. 226-231.
[21] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. In Proc. of the IEEE Conf on Computer Vision and Pattern Recognition. 1-9.
[22] Allen Van Gelder, Daniel Le Berre, Armin Biere, Oliver Kullmann, and Laurent Simon. 2005. Purse-Based Scoring for Comparison of Exponential-Time Programs. In Eighth Intl Conf on Theory and Applications of Satisfiability Testing (2005).
[23] Lin Xu, Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. 2012. Evaluating Component Solver Contributions to Portfolio-Based Algorithm Selectors. In International Conference on Theory and Applications of Satisfiability Testing. Springer, 228-241.
[24] Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. 2008. SATzilla: Portfolio-Based Algorithm Selection for SAT. Journal of Artificial Intelligence Research 32 (2008), 565-606.
[25] Jingjin Yu and Steven M. LaValle. 2013. Structure and Intractability of Optimal Multi-Robot Path Planning on Graphs. In Proc. of the AAAI Conf on Artificial Intelligence.