A Benchmark for Multi-UAV Task Assignment of an Extended Team Orienteering Problem

Abstract

A benchmark for multi-UAV task assignment is presented in order to evaluate different algorithms. An extended Team Orienteering Problem is modeled for a kind of multi-UAV task assignment problem. Three intelligent algorithms, i.e., Genetic Algorithm, Ant Colony Optimization and Particle Swarm Optimization, are implemented to solve the problem. A series of experiments with different settings is conducted to evaluate the three algorithms. The modeled problem and the evaluation results constitute a benchmark, which can be used to evaluate other algorithms for multi-UAV task assignment problems.


Kun Xiao
Beijing Institute of Aerospace Systems Engineering, Beijing, China

Junqi Lu
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Ying Nie
Beijing Aerospace Automatic Control Institute, Beijing, China

Xiangke Wang
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Lan Ma
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Guohui Wang
China Academy of Launch Vehicle Technology, Beijing, China

Index Terms: multi-UAV, task assignment, benchmark, Team Orienteering Problem, intelligent algorithms

I. INTRODUCTION

Unmanned aerial vehicles (UAVs) are developing rapidly due to their large potential in both civilian and military uses, such as disaster rescue, reconnaissance and surveillance. Limited by its size and capability, a single UAV can hardly complete complex and persistent tasks [1]. Therefore, swarms of UAVs are emerging as a disruptive technology to enable highly reconfigurable, on-demand, distributed intelligent autonomous systems with high impact on many areas of science, technology, and society [2].

To achieve cooperation between UAVs, task assignment is necessary so that they conduct tasks in a good order and maximize total performance. The basic task assignment problem can be formulated as a Vehicle Routing Problem (VRP) [3]. VRP asks for the optimal set of routes for a fleet of vehicles to traverse in order to deliver to a given set of customers. In VRP, all targets must be reached and no time limit is set, which is unsuitable for many kinds of task assignment problems. Compared with VRP, the Team Orienteering Problem (TOP) considers a time limit, and its goal is to maximize the total reward under that limit [4]. Conventional TOP assumes that all vehicles have the same speed, which is unsuitable for a heterogeneous UAV swarm. It also ignores the time a UAV spends executing the task after reaching the target. To address these limitations, we extend TOP so that different UAVs have different flight speeds and different targets have different time costs. Moreover, unlike VRP and TOP, a UAV need not return to the depot in our proposed problem. The objective of our proposed problem is to obtain as much reward as possible under a given time limit.

The extended TOP is suitable for a wide range of multi-UAV task assignment problems, such as reconnaissance and transportation. Therefore, it can serve as a benchmark to evaluate different algorithms.
In this paper, three intelligent algorithms, Genetic Algorithm (GA), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), are tested in a series of experiments. The experiment environment, settings and analysis, together with the implementations of the three algorithms, are open sourced. Researchers can use the benchmark to evaluate their own algorithms.

Source code: https://gitee.com/robin_shaun/multi-uav-task-assignment-benchmark or https://github.com/robin-shaun/Multi-UAV-Task-Assignment-Benchmark

II. PROBLEM FORMULATION

The extended TOP is built on a directed graph. A complete graph $G = (V, A)$ is given, where $V = \{0, ..., n\}$ is the set of vertices and $A$ is the set of arcs. Vertices in $N = V \setminus \{0\} = \{1, ..., n\}$ correspond to the targets, and vertex 0 corresponds to the depot where the UAVs start. $d_{ij}$ is the distance from vertex $i \in V$ to vertex $j \in V$, and $d_{ij} = d_{ji}$. $r_i$ is the reward associated with target $i$, with $r_i > 0$ when $i \neq 0$ and $r_0 = 0$ because the depot cannot supply any reward. $t_i$ is the time needed to finish the mission at target $i$. $T_{max}$ is the time limit of the total task. If a UAV arrives at target $i$ but the remaining time is less than $t_i$, it cannot obtain the reward $r_i$.

Given a set $K$ of UAVs, the TOP calls for the determination of at most $|K|$ UAV routes that maximize the total collected reward while satisfying a maximum duration constraint [5]. The extended TOP has the same goal as TOP. $y_{ik}$ is a binary variable equal to 1 if target $i \in V$ is visited by UAV $k \in K$, and 0 otherwise. $x_{ijk}$ is a binary variable equal to 1 if arc $(i, j) \in A$ is traversed by UAV $k$, and 0 otherwise. $s_k$ is the flight speed of UAV $k$.

The mathematical programming formulation for the extended TOP is as follows.

$$
\begin{aligned}
\text{maximize} \quad & \sum_{i \in V} r_i \sum_{k \in K} y_{ik} \\
\text{s.t.} \quad & \sum_{j \in V} x_{ijk} = y_{ik} && \forall i \in V,\ k \in K \\
& \sum_{j \in V} x_{jik} = y_{ik} && \forall i \in V,\ k \in K \\
& \sum_{k \in K} y_{0k} \le |K| \\
& \sum_{k \in K} y_{ik} \le 1 && \forall i \in V \setminus \{0\} \\
& \sum_{(i,j) \in \delta^{+}(S)} x_{ijk} \ge y_{bk} && \forall S \subseteq V \setminus \{0\},\ b \in S,\ k \in K \\
& \sum_{(i,j) \in A} \frac{d_{ij}}{s_k} x_{ijk} + \sum_{i \in N} t_i y_{ik} \le T_{max} && \forall k \in K \\
& y_{ik} \in \{0, 1\} && \forall i \in V,\ k \in K \\
& x_{ijk} \in \{0, 1\} && \forall (i, j) \in A,\ k \in K
\end{aligned}
$$

Here $\delta^{+}(S)$ denotes the set of arcs leaving the vertex subset $S$.

Even though a position coordinate system is not required by the problem, one is built to visualize the result. Fig. 1 shows the extended TOP solved by GA. The red points are the targets not reached and the blue points are the targets reached. The black vertex is the depot. The size of each point is proportional to its reward. Lines with different colors are paths traversed by different UAVs.

III. DESIGN OF THREE INTELLIGENT ALGORITHMS

In this section, three intelligent algorithms, Genetic Algorithm, Ant Colony Optimization and Particle Swarm Optimization, are designed to solve the extended TOP.

A. Genetic Algorithm

The genetic algorithm (GA) searches for an optimal solution by simulating natural selection and the genetic mechanisms of biological evolution [6]. The algorithm transforms the process of solving a search problem into a process similar to the crossover and mutation of chromosomes during biological evolution. When dealing with complex combinatorial optimization problems with a large solution space, the genetic algorithm can obtain good results quickly.

Fig. 1. The extended TOP solved by GA

The first step is to determine a genetic representation of the solution domain and a fitness function to evaluate it. Assuming that the time limit is large enough that all targets can be reached, we can determine a string $\epsilon$ by arranging all the targets [7]. The length of string $\epsilon$ is equal to the total number of targets. Then we can determine a string $\delta$ that divides string $\epsilon$ into $|K|$ groups [8]. The length of string $\delta$ is $|K| - 1$. The combination of string $\epsilon$ and string $\delta$ corresponds to a feasible solution. Fig. 2 shows the genetic representation. The fitness function is defined as the total reward.

Fig. 2. Genetic representation of the solution domain
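The representation above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Instance` container, the function names, and the reading of $\delta$ as a list of cut indices into $\epsilon$ are assumptions made for illustration. The sketch also assumes that a UAV stops collecting reward at the first target whose task it cannot finish within $T_{max}$, and that UAVs do not return to the depot.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Instance:
    """Hypothetical container for one extended-TOP instance."""
    depot: Point
    targets: List[Point]         # coordinates of the n targets
    rewards: List[float]         # r_i for each target
    service_times: List[float]   # t_i for each target
    speeds: List[float]          # s_k for each UAV
    t_max: float                 # time limit T_max


def decode(order: Sequence[int], cuts: Sequence[int]) -> List[List[int]]:
    """Split the permutation `order` (epsilon) into |K| routes at `cuts` (delta)."""
    bounds = [0] + sorted(cuts) + [len(order)]
    return [list(order[a:b]) for a, b in zip(bounds, bounds[1:])]


def route_reward(inst: Instance, route: Sequence[int], speed: float) -> float:
    """Reward one UAV collects before running out of time (no return to depot)."""
    pos, elapsed, reward = inst.depot, 0.0, 0.0
    for tgt in route:
        nxt = inst.targets[tgt]
        elapsed += hypot(nxt[0] - pos[0], nxt[1] - pos[1]) / speed  # flight time
        elapsed += inst.service_times[tgt]                          # task time t_i
        if elapsed > inst.t_max:  # assumption: stop at the first unfinished task
            break
        reward += inst.rewards[tgt]
        pos = nxt
    return reward


def fitness(inst: Instance, order: Sequence[int], cuts: Sequence[int]) -> float:
    """Total reward of a chromosome: route k is flown by UAV k."""
    routes = decode(order, cuts)
    return sum(route_reward(inst, r, s) for r, s in zip(routes, inst.speeds))
```

With $|K|$ UAVs, `cuts` holds $|K| - 1$ indices, so `decode` always yields $|K|$ routes (some possibly empty), matching the string lengths described above.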

The flow chart of GA is shown in Fig. 3. In the selection operation, roulette selection is performed on the population formed by combining the parent population and the offspring population, to generate a new parent population. In the crossover operation, any two gene codes in the new parent population exchange their codes with each other at a rate of 0.6. In the mutation operation, each code in the population changes within its value range at a rate of 0.05. After the crossover and mutation operations, a new offspring population is generated. In order to speed up the convergence of the genetic algorithm, the algorithm terminates when the maximum fitness of the population has not changed for 500 steps.

Fig. 3. Flow chart of GA
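The selection, mutation and termination steps can be sketched as follows. This is an illustrative sketch, not the paper's code. The function names are hypothetical, and the swap-based mutation is an assumption chosen because it keeps a target permutation valid; the paper only states that each code changes within its value range.

```python
import random
from typing import List, Sequence

CROSSOVER_RATE = 0.6   # rate stated in the text
MUTATION_RATE = 0.05   # rate stated in the text


def roulette_select(population: Sequence, fitnesses: Sequence[float],
                    k: int, rng: random.Random) -> List:
    """Fitness-proportional (roulette) selection of k individuals, with replacement."""
    if sum(fitnesses) <= 0:  # degenerate case: fall back to uniform choice
        return [rng.choice(population) for _ in range(k)]
    return rng.choices(population, weights=fitnesses, k=k)


def swap_mutation(order: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Assumed mutation: each gene is swapped with a random position at `rate`."""
    child = list(order)
    for i in range(len(child)):
        if rng.random() < rate:
            j = rng.randrange(len(child))
            child[i], child[j] = child[j], child[i]
    return child


def stagnated(best_history: Sequence[float], patience: int = 500) -> bool:
    """True when the best fitness has not changed over the last `patience` steps."""
    if len(best_history) <= patience:
        return False
    recent = best_history[-(patience + 1):]
    return all(v == recent[0] for v in recent)
```

In a full loop, `roulette_select` would be applied to the combined parent and offspring populations, and `stagnated` would be checked once per generation against the recorded best fitness.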

B. Ant Colony Optimization

The idea of ant colony optimization (ACO) was first proposed in 1989 [9], and it has gradually been developed into a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs [10]. Currently, the great majority of problems attacked by ACO are those in which all the necessary information is available and does not change while the problem is being solved [11]. Hence, it is well suited to this problem. The flow chart of ACO is shown in Fig. 4.

The ants in the ant colony are equally divided into $m$ groups. Since there are $|K|$ UAVs (with different speeds), the number of ants in each group is set to $|K|$. In other words, there are $|K|$ types of ants. The target points of the ants within a group are not repeated, so the unvisited list is reset only when a whole group of ants has been traversed.

The next target of each ant is obtained by the roulette method. The reward function used for evaluating the solution is defined as the sum of the rewards obtained by all ants in the group, denoted by $r_{group}$. The reward function used for evaluating each ant is defined as the sum of the rewards obtained by that ant, denoted by $r_{ant}$. $r_{max}$ is the maximum of all the $r_{group}$. Because of the time limit, the heuristic function should be not only positively related to value but also negatively related to time. Thus, the heuristic function $H$ is designed as

$$H(ant, j) = \frac{s_{ant} \times r_j}{d_{j-1,j} \times t_j}, \quad j \in V$$

Fig. 4. Flow Chart of ACO
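A minimal sketch of how such a heuristic could drive the roulette choice of the next target is given below. It is an illustration under stated assumptions, not the authors' code: the distance term is read as the distance from the ant's current vertex to candidate $j$, the pheromone is combined with the heuristic through the exponents `alpha` and `beta` in the usual ant-system way, and all function and parameter names are hypothetical. Distances and task times are assumed to be strictly positive.

```python
import random
from math import hypot
from typing import List, Sequence, Tuple


def heuristic(speed: float, reward: float, dist: float, service_time: float) -> float:
    """H = s_ant * r_j / (d * t_j): grows with reward and speed, shrinks with time."""
    return speed * reward / (dist * service_time)


def choose_next(current: int, unvisited: List[int],
                coords: Sequence[Tuple[float, float]],
                rewards: Sequence[float], service_times: Sequence[float],
                speed: float, pheromone: Sequence[Sequence[float]],
                rng: random.Random, alpha: float = 1.0, beta: float = 1.0) -> int:
    """Roulette choice of the next target among `unvisited` (index 0 is the depot)."""
    weights = []
    for j in unvisited:
        d = hypot(coords[j][0] - coords[current][0], coords[j][1] - coords[current][1])
        h = heuristic(speed, rewards[j], d, service_times[j])
        # Assumed combination rule: pheromone^alpha * heuristic^beta.
        weights.append((pheromone[current][j] ** alpha) * (h ** beta))
    return rng.choices(unvisited, weights=weights, k=1)[0]
```

Because the speed of the ant enters the heuristic, ants of different types weight the same candidate differently, which reflects the heterogeneous UAV speeds in the extended TOP.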

The number of iterations is set as a constant $iter$. The reward of a group is related to the ants in the group, while the pheromone of a type is related to the ants belonging to that type. The total number of ants in an iteration is $|K| \times m$, so a tremendous number of ants is needed to solve the problem. In order to improve the convergence speed, the volatilization factor $V$ of each type of pheromone is determined by the reward obtained by that type of ants in one iteration:

$$V(type) = \sum_{ant \in type} \left( \frac{r_{ant}}{r_{max} - r_{group}} \right)^{\eta} / m$$

C. Particle Swarm Optimization

Particle swarm optimization (PSO) is a global random search algorithm that simulates the migration and swarming behavior of birds while foraging. Its core idea is to make use of the information shared by the individuals in the group, so that the movement of the whole group evolves from disorder to order in the problem-solving space [12].

The flow chart of PSO is shown in Fig. 5. The first step is to initialize the particle swarm according to the UAV number $|K|$ and the target number $n$, which includes initializing the number of particles and iterations, the particle positions and the particle velocities. In our design, $PN = 2(n + |K| - 1)$ is the number of particles and $iter = 40(n + |K| - 1)$ is the number of iterations. Both the position and the velocity of the particle swarm are $PN \times (n + |K| - 1)$ dimensional arrays. Similar to the GA described above, the first $n$ dimensions of a particle position represent the arrangement of targets, and the last $|K| - 1$ dimensions represent the way of dividing the targets.

Secondly, in the mutation part, there is a probability that the particle position will change. Following [13], the mutation probability in each iteration is set to 0.4, the proportion of particles mutated in each mutation is set to 0.5, and the proportion of positions mutated in each mutated particle is set to 0.5. Thirdly, we use local PSO, in which all particles are divided into small swarms and the optimization is done separately in each small swarm, to escape local maxima in the early period. Then, in the velocity updating part, the new velocity of each particle is generated according to the current global optimal particle position and the historical optimal particle position [14]. Then, in the position updating part, the new position of each particle is the current position plus the new velocity.

Then the reward of each particle is calculated. The reward is the total reward obtained by a particle. If the reward is greater than that of the historical optimal solution, the historical optimal particle position is updated to the current particle position; if the reward is also greater than that of the global optimal solution, the global optimal particle position is updated to the current particle position as well. The algorithm terminates when the number of iterations reaches the upper limit $iter$.

Fig. 5. Flow Chart of PSO
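The swarm sizing and the velocity and position updates can be sketched as below. The sizing formulas come from the text. The update rule is the textbook PSO form, and the inertia weight `w` and the coefficients `c1` and `c2` are illustrative assumptions, since the paper does not give them here. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def swarm_size(n_targets: int, n_uavs: int) -> Tuple[int, int]:
    """Return (PN, iter) = (2 * dim, 40 * dim) with dim = n + |K| - 1."""
    dim = n_targets + n_uavs - 1
    return 2 * dim, 40 * dim


def update_particle(x: Sequence[float], v: Sequence[float],
                    pbest: Sequence[float], gbest: Sequence[float],
                    rng: random.Random,
                    w: float = 0.7, c1: float = 1.5, c2: float = 1.5
                    ) -> Tuple[List[float], List[float]]:
    """One textbook PSO step; w, c1 and c2 are assumed values, not from the paper."""
    new_v = [
        w * vi + c1 * rng.random() * (pb - xi) + c2 * rng.random() * (gb - xi)
        for xi, vi, pb, gb in zip(x, v, pbest, gbest)
    ]
    # New position = current position + new velocity, as described in the text.
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

For example, with 30 targets and 5 UAVs, `swarm_size(30, 5)` gives 68 particles and 1360 iterations, each particle having 34 dimensions.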

IV. EXPERIMENT AND RESULT ANALYSIS

A. Experiment settings

Generally, intelligent algorithms cannot guarantee a globally optimal solution and have a certain degree of randomness. To evaluate different algorithms fairly, a series of repeated experiments has been conducted.

The experiments are divided into three groups: small scale, medium scale and large scale. Different groups have different settings, shown in Table I. Except for the numbers of UAVs and targets, the other key parameters of the extended TOP, such as target positions, target rewards, time consumption at different targets and flight speeds, are generated randomly. For each scale, 10 groups of parameters are generated randomly. Under each parameter setting, each algorithm solves the problem 10 times. An Intel Core i5-8250 CPU is used in the experiments.

TABLE I
EXPERIMENT SETTINGS FOR DIFFERENT SCALES

                 Small scale    Medium scale    Large scale
UAV number       5              10              15
Target number    30             60              90
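The repeated-experiment protocol (10 random parameter sets per scale, 10 runs per set) can be sketched as follows. Only the UAV and target counts come from Table I. The sampling ranges, the dictionary layout of an instance and the `solver(instance, rng)` interface are assumptions made for illustration and do not reflect the released benchmark code.

```python
import random
import time
from statistics import mean
from typing import Callable, Dict, Tuple

# (UAV number, target number) per scale, taken from Table I.
SCALES: Dict[str, Tuple[int, int]] = {
    "small": (5, 30),
    "medium": (10, 60),
    "large": (15, 90),
}


def random_instance(n_uav: int, n_target: int, rng: random.Random) -> dict:
    """Random parameter set; all sampling ranges here are illustrative assumptions."""
    return {
        "targets": [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_target)],
        "rewards": [rng.uniform(1, 10) for _ in range(n_target)],
        "service_times": [rng.uniform(1, 5) for _ in range(n_target)],
        "speeds": [rng.uniform(1, 3) for _ in range(n_uav)],
    }


def benchmark(solver: Callable[[dict, random.Random], float], scale: str,
              n_instances: int = 10, n_runs: int = 10, seed: int = 0
              ) -> Tuple[float, float]:
    """Return (mean reward, mean wall-clock seconds) over all instances and runs."""
    rng = random.Random(seed)
    n_uav, n_target = SCALES[scale]
    rewards, times = [], []
    for _ in range(n_instances):
        inst = random_instance(n_uav, n_target, rng)
        for _ in range(n_runs):
            start = time.perf_counter()
            rewards.append(solver(inst, rng))  # solver returns the reward it obtained
            times.append(time.perf_counter() - start)
    return mean(rewards), mean(times)
```

Any algorithm wrapped as a `solver` callable can then be compared on the same randomly generated instances by reusing the same `seed`.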

B. Result Analysis

The evaluation indices are the obtained reward and the time complexity. The experiment results are shown in Fig. 6 and Fig. 7. For mean reward, ACO performs best in the large-scale group but worst in the small-scale group. On the whole, the three algorithms obtain similar rewards. However, for mean computational time, the three algorithms perform differently: GA performs best, PSO follows, and ACO performs worst. Considering both obtained reward and time complexity, GA is recommended for solving the extended TOP among the three algorithms.

Fig. 6. Mean reward comparison among three algorithms

Fig. 7. Mean computational time usage comparison among three algorithms

V. CONCLUSION