A Benchmark for Multi-UAV Task Assignment of an Extended Team Orienteering Problem
Kun Xiao, Junqi Lu, Ying Nie, Lan Ma, Xiangke Wang, Guohui Wang
Kun Xiao, Beijing Institute of Aerospace Systems Engineering, Beijing, China
Junqi Lu, College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
Ying Nie, Beijing Aerospace Automatic Control Institute, Beijing, China
Lan Ma, College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
Xiangke Wang, College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
Guohui Wang, China Academy of Launch Vehicle Technology, Beijing, China
Abstract: A benchmark for multi-UAV task assignment is presented in order to evaluate different algorithms. An extended Team Orienteering Problem is modeled for a kind of multi-UAV task assignment problem. Three intelligent algorithms, i.e., Genetic Algorithm, Ant Colony Optimization and Particle Swarm Optimization, are implemented to solve the problem. A series of experiments with different settings is conducted to evaluate the three algorithms. The modeled problem and the evaluation results constitute a benchmark, which can be used to evaluate other algorithms for multi-UAV task assignment problems.
Index Terms: multi-UAV, task assignment, benchmark, Team Orienteering Problem, intelligent algorithms
I. INTRODUCTION
Unmanned aerial vehicles (UAVs) are developing rapidly due to their large potential in both civilian and military uses, such as disaster rescue, reconnaissance and surveillance. Limited by its size and capability, a single UAV can hardly complete complex and persistent tasks [1]. Therefore, swarms of UAVs are emerging as a disruptive technology to enable highly reconfigurable, on-demand, distributed intelligent autonomous systems with high impact on many areas of science, technology, and society [2].
To achieve cooperation between UAVs, task assignment is necessary so that they conduct tasks in a good order and maximize total performance. The basic task assignment problem can be formulated as a Vehicle Routing Problem (VRP) [3]. The VRP asks for the optimal set of routes for a fleet of vehicles to traverse in order to deliver to a given set of customers. In the VRP, all targets must be reached and no time limit is set, which is unsuitable for many kinds of task assignment problems. Compared with the VRP, the Team Orienteering Problem (TOP) considers a time limit, and its goal is to maximize the total reward under that limit [4]. The conventional TOP assumes that all vehicles have the same speed, which is unsuitable for a heterogeneous UAV swarm. It also does not consider the time a UAV spends executing the task after reaching the target. To address these limitations, we extend the TOP so that different UAVs have different flight speeds and different targets have different time costs. Moreover, unlike in the VRP and TOP, a UAV does not need to return to the depot in our proposed problem. The objective of our proposed problem is to obtain as much reward as possible under a given time limit.
The extended TOP is suitable for a wide range of multi-UAV task assignment problems, such as reconnaissance and transportation. Therefore, it can serve as a benchmark to evaluate different algorithms.
In this paper, three intelligent algorithms, Genetic Algorithm (GA), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), are tested in a series of experiments. The experiment environment, settings and analysis, together with the implementation of the three algorithms, are open sourced. Researchers can use the benchmark to evaluate their own algorithms. Source code is available at https://gitee.com/robin_shaun/multi-uav-task-assignment-benchmark or https://github.com/robin-shaun/Multi-UAV-Task-Assignment-Benchmark
II. PROBLEM FORMATION
The extended TOP is built on a directed graph. A complete graph $G = (V, A)$ is given, where $V = \{0, ..., n\}$ is the set of vertices and $A$ is the set of arcs. Vertices in $N = V \setminus \{0\} = \{1, ..., n\}$ correspond to the targets, and vertex $0$ corresponds to the depot where the UAVs start. $d_{ij}$ is the distance from vertex $i \in V$ to vertex $j \in V$, with $d_{ij} = d_{ji}$. $r_i$ is the reward associated with target $i$; $r_i > 0$ when $i \neq 0$, while $r_0 = 0$ because the depot cannot supply any reward. $t_i$ is the time consumed to finish the mission at target $i$. $T_{max}$ is the time limit of the total task. If a UAV arrives at target $i$ but the remaining time is less than $t_i$, it cannot obtain the reward $r_i$.
Given a set $K$ of UAVs, the TOP calls for the determination of at most $|K|$ UAV routes that maximize the total collected reward while satisfying a maximum duration constraint [5]. The extended TOP has the same goal as the TOP. $y_{ik}$ is a binary variable equal to 1 if target $i \in V$ is visited by UAV $k \in K$, and 0 otherwise. $x_{ijk}$ is a binary variable equal to 1 if arc $(i, j) \in A$ is traversed by UAV $k$, and 0 otherwise. $s_k$ is the flight speed of UAV $k$, and $\delta^{+}(S)$ is the set of arcs leaving the vertex subset $S$.
The mathematical programming formulation for the extended TOP is as follows.
$$
\begin{aligned}
\text{maximize} \quad & \sum_{i \in V} r_i \sum_{k \in K} y_{ik} \\
\text{s.t.} \quad & \sum_{j \in V} x_{ijk} = y_{ik} && \forall i \in V,\ k \in K \\
& \sum_{j \in V} x_{jik} = y_{ik} && \forall i \in V,\ k \in K \\
& \sum_{k \in K} y_{0k} \le |K| \\
& \sum_{k \in K} y_{ik} \le 1 && \forall i \in V \setminus \{0\} \\
& \sum_{(i,j) \in \delta^{+}(S)} x_{ijk} \ge y_{bk} && \forall S \subseteq V \setminus \{0\},\ b \in S,\ k \in K \\
& \sum_{(i,j) \in A} \frac{d_{ij}}{s_k} x_{ijk} + \sum_{i \in V} t_i y_{ik} \le T_{max} && \forall k \in K \\
& y_{ik} \in \{0, 1\} && \forall i \in V,\ k \in K \\
& x_{ijk} \in \{0, 1\} && \forall (i, j) \in A,\ k \in K
\end{aligned}
$$
Even though a position coordinate system is unnecessary for the problem, one is built to visualize the result. Fig. 1 shows the extended TOP solved by GA. The red points are the targets not reached and the blue points are the targets reached. The black vertex is the depot. The size of each point is proportional to its reward. Lines with different colors are paths traversed by different UAVs.
III. DESIGN OF THREE INTELLIGENT ALGORITHMS
In this section, three intelligent algorithms, Genetic Algorithm, Ant Colony Optimization and Particle Swarm Optimization, are designed to solve the extended TOP.
A. Genetic Algorithm
Genetic algorithm (GA) is a method that searches for the optimal solution by simulating natural selection and the genetic mechanisms of biological evolution [6]. The algorithm transforms the process of solving a search problem into a process similar to the crossover and mutation of chromosomes during biological evolution. When dealing with complex combinatorial optimization problems with a large solution space, GA can obtain good results quickly.
Fig. 1. The extended TOP solved by GA
The first step is to determine a genetic representation of the solution domain and a fitness function to evaluate it. Assuming that the time limit is large enough for all targets to be reached, we can determine a string $\epsilon$ by arranging all the targets [7]. The length of string $\epsilon$ is equal to the total number of targets. We can then determine a string $\delta$ that divides string $\epsilon$ into $|K|$ groups [8]. The length of string $\delta$ is $|K| - 1$. The combination of string $\epsilon$ and string $\delta$ corresponds to a feasible solution. Fig. 2 shows the genetic representation. The fitness function is defined as the total reward.
Fig. 2. Genetic representation of the solution domain
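The representation above can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function names, the dictionary-based inputs, the depot at the origin, and the choice to stop a route at the first target that cannot be finished in time are all assumptions made for illustration. The sketch decodes a target permutation and $|K| - 1$ cut indices into one route per UAV, then sums the reward collected within the time limit, using flight time (distance divided by speed) plus the time cost at each target.

```python
import math

def decode(perm, cuts):
    """Split a target permutation into one route per UAV at the cut indices."""
    bounds = [0] + sorted(cuts) + [len(perm)]
    return [perm[a:b] for a, b in zip(bounds, bounds[1:])]

def route_reward(route, speed, pos, reward, service, t_max, depot=(0.0, 0.0)):
    """Reward collected by one UAV flying its route in order from the depot.

    Assumption: the UAV stops at the first target whose task it cannot
    finish before t_max, and it never has to return to the depot.
    """
    elapsed, here, total = 0.0, depot, 0.0
    for target in route:
        # Flight time to the target plus the time cost of its task.
        elapsed += math.dist(here, pos[target]) / speed + service[target]
        if elapsed > t_max:
            break  # remaining time is too short, so no reward is obtained
        total += reward[target]
        here = pos[target]
    return total

def fitness(perm, cuts, speeds, pos, reward, service, t_max):
    """Total reward of a solution: the sum over all UAV routes."""
    routes = decode(perm, cuts)
    return sum(route_reward(r, s, pos, reward, service, t_max)
               for r, s in zip(routes, speeds))

# Hypothetical toy instance: three targets, two UAVs with different speeds.
pos = {1: (3.0, 4.0), 2: (6.0, 8.0), 3: (0.0, 10.0)}
reward = {1: 5.0, 2: 7.0, 3: 2.0}
service = {1: 1.0, 2: 1.0, 3: 1.0}
print(fitness([1, 2, 3], [2], [1.0, 2.0], pos, reward, service, t_max=12.0))
```

Sorting the cut indices keeps every group contiguous, so any pair of a permutation and $|K| - 1$ cuts decodes to a valid assignment, with empty routes allowed.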
The flow chart of GA is shown in Fig. 3. In the selection operation, roulette selection is performed on the combined parent and offspring populations to generate a new parent population. In the crossover operation, any two gene codes in the new parent population exchange their codes with each other at a rate of 0.6. In the mutation operation, each code in the population changes within its value range at a rate of 0.05. After the crossover and mutation operations, a new offspring population is generated. To speed up the convergence of the genetic algorithm, the algorithm terminates when the maximum fitness of the population has not changed for 500 steps.
Fig. 3. Flow chart of GA
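The selection and termination rules can be sketched as follows. This is an illustrative sketch only: `roulette_select` and `stagnated` are hypothetical helper names, the uniform fallback for an all-zero fitness list is an assumption, and the only library call used is the standard `random.choices`.

```python
import random

def roulette_select(population, fitnesses, k, rng=random):
    """Draw k individuals with probability proportional to their fitness."""
    if sum(fitnesses) <= 0:
        # Assumption: fall back to uniform sampling when no reward was found.
        return [rng.choice(population) for _ in range(k)]
    return rng.choices(population, weights=fitnesses, k=k)

def stagnated(best_history, patience=500):
    """True once the best fitness has stayed the same for `patience` steps."""
    if len(best_history) <= patience:
        return False
    return len(set(best_history[-(patience + 1):])) == 1

# Usage: select a new parent population from parents plus offspring.
combined = ["a", "b", "c", "d"]
scores = [1.0, 3.0, 0.0, 6.0]
parents = roulette_select(combined, scores, k=2, rng=random.Random(0))
print(parents, stagnated([10.0] * 501))
```

Individuals with zero fitness are never drawn while at least one individual has positive fitness, which matches the proportional nature of roulette selection.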
B. Ant Colony Optimization
The idea of ant colony optimization (ACO) was first proposed in 1989 [9] and was gradually developed into a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs [10]. Currently, the great majority of problems attacked by ACO are those in which all the necessary information is available and does not change while the problem is being solved [11]. The extended TOP is such a problem, so ACO is well suited to it. The flow chart of ACO is shown in Fig. 4.
The ants in the ant colony are equally divided into $m$ groups. Since there are $|K|$ UAVs (with different speeds), the number of ants in each group is set as $|K|$. In other words, there are $|K|$ types of ants. Targets are not repeated within a group of ants, so the unvisited list is reset only after a whole group of ants has been traversed.
The next target of each ant is obtained by the roulette method. The reward function used for evaluating a solution is defined as the sum of the rewards obtained by all ants in the group, denoted by $r_{group}$. The reward function used for evaluating each ant is defined as the sum of the rewards obtained by that ant, denoted by $r_{ant}$. $r_{max}$ is the maximum of all the $r_{group}$. Because of the time limit, the heuristic function should be not only positively related to reward but also negatively related to time. Thus, the heuristic function $H$ is designed as
$$ H(ant, j) = \frac{s_{ant} \times r_j}{d_{j^{-} j} \times t_j} $$
where $j \in V$ and $d_{j^{-} j}$ is the distance to $j$ from the ant's current vertex $j^{-}$.
Fig. 4. Flow Chart of ACO
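A minimal sketch of the roulette choice of the next target is given below. It is not the authors' implementation: the pheromone dictionary keyed by arcs, the exponents `alpha` and `beta` (borrowed from the classical Ant System rule), and the small `eps` guard against division by zero are assumptions. The heuristic itself follows the form described above, with the ant's speed times the reward in the numerator and distance times time cost in the denominator.

```python
import math
import random

def heuristic(speed, reward_j, dist_to_j, service_j, eps=1e-9):
    """Heuristic value: higher for rewarding, close, quick targets."""
    return speed * reward_j / (max(dist_to_j, eps) * max(service_j, eps))

def choose_next(current, unvisited, speed, pos, reward, service,
                pheromone, alpha=1.0, beta=2.0, rng=random):
    """Roulette choice of the next target among the unvisited ones.

    Assumption: weights combine pheromone and heuristic as in the
    classical Ant System rule; arcs without an entry get pheromone 1.0.
    """
    weights = []
    for j in unvisited:
        d = math.dist(pos[current], pos[j])
        tau = pheromone.get((current, j), 1.0)
        weights.append(tau ** alpha
                       * heuristic(speed, reward[j], d, service[j]) ** beta)
    return rng.choices(unvisited, weights=weights, k=1)[0]

# Hypothetical toy data: vertex 0 is the depot, targets 1 and 2 are candidates.
pos = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (30.0, 40.0)}
reward = {1: 5.0, 2: 5.0}
service = {1: 1.0, 2: 1.0}
print(choose_next(0, [1, 2], 2.0, pos, reward, service, pheromone={},
                  rng=random.Random(0)))
```

With equal rewards and time costs, the nearer target receives a weight 100 times larger here, because the distance ratio of 10 is squared by `beta=2.0`.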
The number of iterations is set as a constant $iter$. The reward of a group is related to the ants in the group, while the pheromone of a type is related to the ants belonging to that type. The total number of ants in an iteration is $|K| \times m$, so a tremendous number of ants is needed to solve the problem. To improve the convergence speed, the volatilization factor (V) of each type of pheromone is determined by the reward obtained by that type of ants in one iteration.
$$ \mathrm{V}(type) = \sum_{ant \in type} \left( \frac{r_{ant}}{r_{max} - r_{group}} \right)^{\eta} / m $$
C. Particle Swarm Optimization
Particle swarm optimization (PSO) is a global random search algorithm that simulates the migration and swarm behavior of birds during foraging. Its core idea is to make use of the information shared by the individuals in the group, so that the movement of the whole group evolves from disorder to order in the problem-solving space [12].
The flow chart of PSO is shown in Fig. 5. The first step is to initialize the particle swarm according to the UAV number $|K|$ and target number $n$, which includes the number of particles and iterations, and the position and velocity of each particle. In our design, $PN = 2(n + |K| - 1)$ is the number of particles and $iter = 40(n + |K| - 1)$ is the number of iterations. Both the position and the velocity of the particle swarm are set to be $PN \times (n + |K| - 1)$ dimensional arrays. Similar to the GA described above, the first $n$ dimensions of a particle position represent the arrangement of targets, and the last $|K| - 1$ dimensions represent the way of dividing the targets.
Secondly, in the mutation part, there is a probability that the particle position will change. Referring to [13], the mutation probability of each iteration is set as 0.4, the proportion of particles mutated in each mutation is set as 0.5, and the proportion of positions mutated in each mutated particle is set as 0.5. Thirdly, we use local PSO, in which all particles are divided into small swarms and the optimization is done separately in each small swarm, to escape local maxima in the early period. Then, in the velocity updating part, the new velocity of each particle is generated according to the current global optimal particle position and the historical optimal particle position [14]. Next, in the position updating part, the new position of each particle is the current position plus the new velocity.
Then the reward of each particle is calculated. The reward is set as the total reward obtained by a particle. If the reward is greater than the historical optimal solution, the historical optimal particle position is updated to the current particle position; if the reward is also greater than the global optimal solution, the global optimal particle position is updated to the current particle position as well. The algorithm terminates when the number of iterations reaches the upper limit $iter$.
Fig. 5. Flow Chart of PSO
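The sizing rules and the update step can be sketched in Python as follows. The sizing function follows the stated formulas for the particle number and iteration number. The update uses the textbook PSO velocity rule, and the inertia weight `w` and acceleration coefficients `c1` and `c2` are assumed values, not parameters reported in this paper. How a continuous position is mapped back to a target arrangement and a division is not specified here, so that step is omitted.

```python
import random

def swarm_size(n_targets, n_uavs):
    """Dimension, particle number and iteration number for a given scale."""
    dim = n_targets + n_uavs - 1
    return {"dim": dim, "particles": 2 * dim, "iterations": 40 * dim}

def step(position, velocity, personal_best, global_best,
         w=0.7, c1=1.5, c2=1.5, rng=random):
    """One textbook PSO update of a single particle (w, c1, c2 are assumed)."""
    new_velocity = [
        w * v
        + c1 * rng.random() * (pb - x)   # pull towards the particle's own best
        + c2 * rng.random() * (gb - x)   # pull towards the swarm's best
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    # New position is the current position plus the new velocity.
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity

# Usage: sizes for 30 targets and 5 UAVs, then one update of a 3-D particle.
print(swarm_size(30, 5))
x, v = step([0.2, 0.8, 0.5], [0.0, 0.0, 0.0], [0.3, 0.7, 0.5], [0.9, 0.1, 0.5],
            rng=random.Random(0))
print(x, v)
```

For 30 targets and 5 UAVs this gives a 34-dimensional particle, 68 particles and 1360 iterations, so the search budget grows linearly with problem size in both swarm size and iteration count.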
IV. EXPERIMENT AND RESULT ANALYSIS
A. Experiment settings
Generally, intelligent algorithms cannot guarantee the global optimum and have a certain degree of randomness. To evaluate different algorithms fairly, a series of repeated experiments has been conducted.
The experiments are divided into three groups: small scale, medium scale and large scale. Different groups have different settings, as shown in Table I. Except for the numbers of UAVs and targets, the key parameters of the extended TOP are generated randomly, such as target positions, target rewards, time consumption at different targets and flight speeds. For each scale, 10 groups of parameters are generated randomly. Under each parameter setting, each algorithm solves the problem 10 times. An Intel Core i5-8250 CPU is used in the experiments.
TABLE I
EXPERIMENT SETTINGS FOR DIFFERENT SCALES
                Small scale   Medium scale   Large scale
UAV number      5             10             15
Target number   30            60             90
B. Result Analysis
The evaluation indices are the obtained reward and the computational time. The experiment results are shown in Fig. 6 and Fig. 7. For mean reward, ACO performs best in the large scale group but worst in the small scale group. On the whole, the three algorithms obtain similar rewards. However, for mean computational time usage, the three algorithms perform differently: GA performs best, PSO follows, and ACO performs worst. Considering both the obtained reward and the computational time, GA is recommended among the three algorithms for solving the extended TOP.
Fig. 6. Mean reward comparison among three algorithms
Fig. 7. Mean computational time usage comparison among three algorithms
V. CONCLUSION