Leader-follower based Coalition Formation in Large-scale UAV Networks, A Quantum Evolutionary Approach
Sajad Mousavi, Fatemeh Afghah, Jonathan D. Ashdown, Kurt Turck
Sajad Mousavi∗, Fatemeh Afghah∗, Jonathan D. Ashdown† and Kurt Turck†
∗School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, AZ, United States
Email: {sajadmousavi, fatemeh.afghah}@nau.edu
†Air Force Research Laboratory, Rome, NY, United States
Email: {jonathan.ashdown, kurt.turck}@us.af.mil

Abstract: The problem of decentralized detection of multiple Points of Interest (PoIs) and completion of the associated tasks in an unknown environment with multiple resource-constrained and self-interested Unmanned Aerial Vehicles (UAVs) is studied. The UAVs form several coalitions to efficiently complete compound tasks that cannot be performed individually. The objectives of such coalition formation are, first, to minimize resource consumption in completing the encountered tasks on time; second, to enhance the reliability of the coalitions; and third, to separate the most trusted UAVs from the self-interested ones. Whereas many previous publications have focused merely on minimizing costs, this study formulates coalition formation as a multi-objective optimization problem over these three objectives. A leader-follower-inspired coalition formation algorithm that combines the three objectives is proposed to address the computational complexity of coalition formation in large-scale UAV networks. The algorithm attempts to form coalitions that exceed the resources required by the encountered tasks as little as possible while maximizing the number of completed tasks. It is based on Quantum Evolutionary Algorithms (QEA), which combine quantum computing and evolutionary algorithms. Simulation results show that the proposed algorithm significantly outperforms existing coalition formation algorithms such as merge-and-split and a well-known multi-objective genetic algorithm called NSGA-II.
Keywords: Unmanned aerial vehicles, coalition formation, mission completion, evolutionary algorithms.
I. INTRODUCTION
A single-agent system is often unable to perform complex tasks because of the limited capabilities of an individual agent. Cooperative multi-agent systems (MASs) can offer a practical solution to this problem by assembling a complementary set of capabilities and resources from several agents. However, one of the key challenges in such cooperative MASs is forming optimal sub-groups of agents (i.e., coalition formation) in order to efficiently perform the existing tasks, especially in a distributed setting where no central controller is available. The coalition formation problem therefore concerns how different coalitions can be formed, given the tasks' requirements and the agents' capabilities, so that the collective goals of the tasks are reached in the most effective manner.

DISTRIBUTION A. Approved for public release: distribution unlimited. Case Number: 88ABW-2018-0096. Dated 10 Jan 2018.
A considerable amount of research has recently been carried out on solving coalition formation problems. This has produced several classic methods for forming stable coalitions that follow common stability concepts based on the Core, the Shapley value, the Bargaining Set, and the Kernel [1], [2], [3]. However, achieving such stability concepts often entails high computational complexity. Many researchers have addressed coalition formation in multi-agent systems with various approaches, including genetic algorithms [4], dynamic programming methods [5], graph theory [6], [7], iterative processes [8], cooperative Multi-Agent Reinforcement Learning (MARL) [9], and temporal-spatial abstraction MARL [10], [11], [12]. The majority of these studies have focused mainly on reducing costs, and only a few have considered multiple objectives. In this paper, a solution to the coalition formation problem in a heterogeneous, resource-constrained UAV network is proposed in which multiple objectives are considered in forming the optimal coalitions. The objectives of the proposed coalition formation method are: minimizing the cost associated with the resource consumption of the formed coalitions, maximizing the reliability of the formed coalitions, and selecting the most trustworthy UAVs among the available self-interested UAVs in the network.

Solving such a multi-objective coalition formation problem is NP-hard. Approaches such as mixed integer linear programming [13], [14] and dynamic network flow optimization [15] have been used to provide an exact solution to this problem. However, because these approaches seek an exact solution, applying them to a large-scale problem is computationally taxing.
More recently, metaheuristic algorithms such as Particle Swarm Optimization (PSO) [16], Ant Colony Optimization [17], Genetic Algorithms (GAs) [18], and Simulated Annealing (SA) [19] have offered reasonable solutions in practical running times for a variety of multi-objective optimization problems. Inspired by the success of evolutionary approaches, this paper presents a novel leader-follower-based coalition formation algorithm that uses a quantum evolutionary approach while considering the aforementioned objectives. An unknown, dynamic environment is assumed, in which no prior knowledge is available about the targets or Points of Interest (PoIs) and the UAVs need to gain knowledge of the environment dynamically [20], [21]. Potential applications of the proposed method include search and rescue, humanitarian relief, and public safety operations in unknown remote environments, as well as military operations in remote fields. In such environments, it can be safely assumed that the ground station does not have prior information about the PoIs and their positions. To evaluate the performance of the proposed method in such an unknown environment, several scenarios were simulated with different numbers of tasks, ranging from 4 to 24, and heterogeneous networks consisting of different numbers of UAVs, ranging from 8 to 124.

The rest of the paper is organized as follows: Section II presents the problem statement and the formulation of multi-UAV coalition formation as a multi-objective optimization problem. Section III describes the proposed coalition formation algorithm. Section IV reports the simulation results, followed by concluding remarks in Section V.

II. PROBLEM STATEMENT
In this section, the decentralized task allocation problem in a network of autonomous UAVs that form optimum coalitions is formulated. The possibility of selfish behavior by these self-interested UAVs is accounted for, and reputation guidelines for selecting the most reliable UAVs to participate in the formed coalitions are also defined. It is important to note that the proposed algorithm treats both minimizing the cost of coalition formation, in terms of overspending resources on particular tasks, and enhancing the reliability of the formed coalitions as paramount.

A heterogeneous network of $N$ UAVs, $U = \{u_1, u_2, \dots, u_N\}$, is considered, where each UAV $u_i$ can carry a different set of resources from the others. $R_{u_i} = \{r^{1}_{u_i}, r^{2}_{u_i}, \dots, r^{N_r}_{u_i}\}$ denotes the set of resources available at UAV $u_i$, where $N_r$ is the number of possible resources in the network. It is also assumed that there exist $n$ tasks in the environment, $T = \{T_1, T_2, \dots, T_n\}$. Each task $T_i$ requires a certain amount of each of the resources in $R_{u_i}$ in order to be completed. The vector of required resources for task $i$ is defined as follows:

$$\tau_i = \{\tau^{i}_{1}, \tau^{i}_{2}, \tau^{i}_{3}, \dots, \tau^{i}_{N_r}\}, \tag{1}$$

where $\tau^{i}_{j}$ is the required amount of resource $j$ for task $i$. It is assumed that each task is associated with one PoI (target) and that the PoIs can be located at different positions with a diverse set of resource requirements. All UAVs are able to search the unknown environment for new PoIs. The tasks are carried out by the formed coalitions $S = \{S_1, S_2, \dots, S_m\}$, where each coalition $S_i$ is responsible for one task. A large search space is considered, in which the PoIs are distributed far apart from each other. It can therefore be assumed that the formed coalitions are sufficiently far from one another that each UAV can only be a member of a single coalition, i.e., $S_k \cap S_l = \emptyset, \ \forall S_k, S_l \in S, \ k \neq l$. In other words, the coalitions are non-overlapping.
The capability of each coalition $S_i$ to complete its encountered task is defined as the value of the coalition, $v(S_i)$ ($v(S_i) \in \mathbb{R}$), as described in the next section. Moreover, $cost(S_i, T_i)$ is defined as the cost of coalition $S_i$ in performing task $T_i$. The cost function for coalition $S_i$ captures the cost imposed on all UAV members of this coalition (i.e., $\sum_{u_j \in S_i} cost(S_i, u_j)$), whose resources are shared to accomplish task $T_i$.

Another key contribution of the proposed model is that it considers a reliability factor in forming the optimal coalitions. The majority of existing techniques assume that the UAV members of the formed coalitions are perfectly operational during the mission lifetime. This is not a realistic assumption, as a UAV's operation can be interrupted for several reasons (e.g., exhaustion of the battery or of a particular resource). A practical scenario is considered here, in which each UAV is either fully operational or one of its capabilities (e.g., resources) may fail during the mission. Such failures in the various capabilities are considered to be statistically independent. Furthermore, involving different types of UAVs in a coalition may result in different execution times for the sub-tasks, as the UAVs have different sets of capabilities. For instance, a given UAV may be able to fulfill its duty in a shorter time than another. The cost and reliability of a coalition depend on these execution times, where lower execution times are favored because they incur lower execution costs. In the next section, these factors (i.e., cost, reliability, and reputation) are defined and formulated.

A. Definition of Cost, Reliability, and Reputation
A set of UAVs in the form of a coalition collaborate with one another to carry out the encountered task. Participation in such a coalition involves a cost of sharing and consuming resources for the member UAVs. Consider a given task $T_k$ with the required resources $\tau_k = \{\tau^{k}_{1}, \tau^{k}_{2}, \tau^{k}_{3}, \dots, \tau^{k}_{N_r}\}$, where the amount of resource $i$ required for task $k$ is denoted by $\tau^{k}_{i} \geq 0$. The execution cost of consuming resource $j$ of UAV $i$ is denoted by $e_{ij}$, where $e_{ij} = \mu_j r^{j}_{u_i} \ \forall i, j$ and $\mu_j$ is a constant coefficient that converts the amount of resource $j$ to a time-dependent value with the same unit as the execution time. Thus, the cost of coalition $S_k$, $C(S_k)$, can be calculated as follows:

$$C(S_k) = \sum_{i=1}^{N_k} \sum_{j=1}^{N_r} e_{ij} k_{ij} + a_{ij}, \tag{2}$$

where $k_{ij}$ is the execution time if UAV $i$ carrying resource $j$ is involved in task $k$, and $a_{ij}$ is the travel time of UAV $i$ to task $j$.

The possibility of defects in the UAVs' resources that may result in performance failure of these UAVs during the mission is also accounted for. Accordingly, the reliability of coalition $S_k$ for a given task $T_k$, denoted by $R(S_k)$, is defined as follows:

$$R(S_k) = \prod_{i=1}^{N_k} e^{-\sum_{j=1}^{N_r} \lambda_{ij} k_{ij}}, \tag{3}$$

where $\lambda_{ij}$ is the failure rate of resource $j$ of UAV $i$. For simplicity, the natural-logarithm transform of $R(S_k)$ is used, i.e., $\ln(R(S_k)) = -\sum_{i=1}^{N_k} \sum_{j=1}^{N_r} \lambda_{ij} k_{ij}$. The formulation of the reliability has been inspired by the works reported in [22] and [23]. Interested readers are referred to these papers for more details on the probability that a system can accomplish a particular task without failure.

Similar to other cognitive agents, the UAVs are expected to be self-interested in the sense that they prefer to save their limited resources and may act selfishly by not contributing enough resources during the mission.
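As a minimal sketch of (2) and (3), the snippet below evaluates the cost and the log-reliability of one coalition from nested lists indexed as `[UAV i][resource j]`. The function names and the numeric values are illustrative assumptions, not part of the original formulation, and (2) is read literally with the travel-time term $a_{ij}$ inside the double sum.

```python
import math


def coalition_cost(e, k, a):
    """Eq. (2), read literally: sum over UAVs i and resources j of e_ij * k_ij + a_ij."""
    return sum(
        e[i][j] * k[i][j] + a[i][j]
        for i in range(len(e))
        for j in range(len(e[i]))
    )


def log_reliability(lam, k):
    """ln R(S_k) = - sum_i sum_j lambda_ij * k_ij (log transform of Eq. (3))."""
    return -sum(
        lam[i][j] * k[i][j]
        for i in range(len(lam))
        for j in range(len(lam[i]))
    )


def reliability(lam, k):
    """Eq. (3): product over UAVs of exp(-sum_j lambda_ij * k_ij)."""
    return math.exp(log_reliability(lam, k))


# Hypothetical coalition: 2 UAVs, 2 resource types (all numbers are made up).
e = [[1.0, 2.0], [0.5, 1.0]]        # execution cost per unit time
k = [[2.0, 1.0], [4.0, 3.0]]        # execution times
a = [[1.0, 1.0], [2.0, 2.0]]        # travel times
lam = [[0.01, 0.02], [0.03, 0.04]]  # failure rates

print(coalition_cost(e, k, a))  # 15.0
print(log_reliability(lam, k), reliability(lam, k))
```

Working with `log_reliability` rather than the product keeps the reliability term additive, which is what allows it to be combined with the cost in a weighted sum.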
To monitor the cooperative behavior of these UAVs, an accumulative cooperative reputation related to the amount of resources that each UAV shares during the mission is defined. During each mission $t$ (i.e., accomplishing an assigned task), the cooperative reputation of each UAV $i$, $\rho_i$, is updated as follows:

$$\rho^{t}_{i} = \begin{cases} \rho^{t-1}_{i} + \Delta\rho^{t}_{i}, & \exists k \mid u_i \in S_k \\ \rho^{t-1}_{i}, & \text{otherwise} \end{cases} \tag{4}$$

where $\Delta\rho^{t}_{i}$ is the contribution of UAV $i$ to coalition $S_k$ in terms of sharing resources to carry out the assigned task $k$, and can be defined as follows:

$$\Delta\rho^{t}_{i} = \frac{\Upsilon_k}{\sum_{m \in S_k} f_m} f_i, \tag{5}$$

where $\Upsilon_k$ is the sum of the resource requirements of task $T_k$, i.e., $\Upsilon_k = \sum_{j=1}^{N_r} \tau^{k}_{j}$, and $f_i$ is the sum of the resource contributions of UAV $i$, defined as $f_i = \sum_{j=1}^{N_r} r^{j}_{u_i} \tau^{k}_{j}$. The coalition reputation of all UAVs involved in coalition $S_k$ is then computed as follows:

$$P(S_k) = \sum_{i=1}^{N_k} \rho^{t}_{i} \tag{6}$$

Table I: Notations
$a_{ij}$: Travel time of UAV $i$ to task $j$.
$e_{ij}$: Execution cost of involving resource $j$ of UAV $i$ in a coalition. It is computed per unit time.
$k_{ij}$: Execution time of involving resource $j$ of UAV $i$ in a coalition. It depends on the task and the capability of the UAV.
$\lambda_{ij}$: Failure rate of involving resource $j$ of UAV $i$ in a coalition.
$\rho_i$: Credit of UAV $i$.
$N_k$: Number of UAVs in coalition $k$.
$N_r$: Number of network resources.
$N$: Number of network UAVs.

B. Formulation of Multi-objective Optimization
To consider all of the aforementioned optimization criteria, namely reducing the coalition cost and increasing the reliability and the reputation of the formed coalitions, a Multi-Objective Optimization Problem (MOOP) is defined as a weighted sum of the three objectives. The multi-objective optimization and its constraints are defined as follows:

$$\min \ O(S_k) = C(S_k) - \eta_1 \ln R(S_k) - \eta_2 P(S_k) \tag{7}$$

$$\text{subject to} \quad \sum_{i=1}^{N_k} e_{ij} \geq \tau^{k}_{j} \quad \forall j = 1, 2, 3, \dots, N_r, \tag{8}$$

where $\eta_1$ and $\eta_2$ are weighting parameters that assign the desired importance to each objective and scale the objectives to comparable ranges. Constraint (8) expresses the requirement to secure enough resources in the formed coalition to complete the encountered task $T_k$. Table I summarizes the notations used throughout this paper.

III. PROPOSED ALGORITHM
Optimizing the objective function described in Section II-B is an NP-hard problem. Standard approaches such as dynamic programming and exact algorithms involve a computational complexity of $O(n)$ that could be intractable in large-scale networks. Hence, evolutionary algorithms such as genetic algorithms can be considered as potential options for finding feasible solutions to this problem. In this paper, we propose a coalition formation algorithm based on a version of the genetic algorithm called the Quantum-Inspired Genetic Algorithm (QIGA) to find the solution of the multi-objective problem in (7).

A. Review of Quantum-Inspired Genetic Algorithm (QIGA)
The idea behind QIGA is to take advantage of both GA and quantum computing mechanisms [22]. In quantum computation, the data representation is based on the qubit, which is the smallest unit of information. A qubit is a superposition of two different states $|0\rangle$ and $|1\rangle$ that can be denoted as:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{9}$$

where $\alpha$ and $\beta$ are complex numbers such that $|\alpha|^2 + |\beta|^2 = 1$. $|\alpha|^2$ and $|\beta|^2$ are the probabilities that the qubit is found in states $|0\rangle$ and $|1\rangle$, respectively. With $m$ qubits, the model can represent $2^m$ states simultaneously. However, when the value of a qubit is measured, the quantum state collapses to a single state (i.e., $|0\rangle$ or $|1\rangle$).

Similar to a standard genetic algorithm, a chromosome is represented as a string of $m$ information units. Thus, a chromosome can be defined as a string of $m$ qubits as:

$$\left\{ \begin{pmatrix} \alpha_0 \\ \beta_0 \end{pmatrix}, \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix}, \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix}, \dots, \begin{pmatrix} \alpha_{m-1} \\ \beta_{m-1} \end{pmatrix} \right\}, \tag{10}$$

where each pair $(\alpha_i, \beta_i)$, $i = 0, 1, 2, \dots, m-1$, denotes a gene of the chromosome.

To evaluate a quantum chromosome, a transfer function called measure is applied to convert the quantum states to a classical chromosome representation. Each pair $(\alpha_i, \beta_i)$ is converted to a value $c_i \in \{0, 1\}$, so that the $m$-qubit chromosome results in a binary string $\{c_0, c_1, c_2, \dots, c_{m-1}\}$. More specifically, each pair becomes 0 or 1 according to the corresponding qubit probabilities $|\alpha_i|^2$ and $|\beta_i|^2$. To perform this conversion, we use the measure function defined as follows:

function MEASURE($\alpha$)
    set $r$ to a random number between 0 and 1;
    if $r > |\alpha|^2$ then
        return 1;
    else
        return 0;
    end if
end function

Each binary string is a possible solution, which is evaluated via a problem-dependent fitness function. In the following section, the proposed fitness function is described.
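The measure step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `measure`, `observe`, and `uniform_chromosome` are hypothetical, a qubit is stored as a plain `(alpha, beta)` tuple, and the random source is passed in so that runs can be made reproducible.

```python
import math
import random


def measure(alpha, rng=random):
    """Collapse one qubit: return 0 with probability |alpha|^2, otherwise 1."""
    r = rng.random()  # uniform in [0, 1)
    return 1 if r > abs(alpha) ** 2 else 0


def observe(chromosome, rng=random):
    """Turn an m-qubit chromosome [(alpha, beta), ...] into a binary string."""
    return [measure(alpha, rng) for alpha, _beta in chromosome]


def uniform_chromosome(m):
    """m qubits in equal superposition, so each bit is 0 or 1 with probability 1/2."""
    amp = 1.0 / math.sqrt(2.0)
    return [(amp, amp) for _ in range(m)]


# Example: sample one classical candidate from an 8-qubit chromosome.
bits = observe(uniform_chromosome(8), random.Random(42))
print(bits)
```

Because `abs` also works on Python complex numbers, the same function accepts complex amplitudes, although real amplitudes are sufficient for the rotation-gate update used later.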
B. Fitness function
The fitness function, or evaluation function, determines how fit a solution is with respect to the constrained optimization problem. To build the fitness function, the negative of the objective function $O(S_k)$ defined in (7) is used. In addition, solutions that meet all of the constraints should obtain a higher fitness value, whereas solutions that violate some of the constraints should obtain a lower one. To discourage violations, a penalty function technique is applied that penalizes solutions according to the amount by which the constraints are violated. Considering these facts, the fitness function is proposed as:

$$F(S_k) = -(O(S_k) + g(S_k)), \tag{11}$$

where $g(S_k)$ is the penalty function, whose value is zero if there is no violation and positive otherwise. $g(S_k)$ is defined as:

$$g(S_k) = \gamma \times \sum_{j=1}^{N_r} \max\Big(0, \ \tau^{k}_{j} - \sum_{i=1}^{N_k} e_{ij}\Big), \tag{12}$$

where $\gamma$ is the penalty coefficient that controls the weight of the constraint violations.

Evolutionary operators (e.g., crossover and mutation) are often used in GAs to improve their performance. However, as stated in [22], applying the crossover and mutation operators does not significantly improve the performance of the QIGA. Therefore, a qubit rotation gate strategy is usually used in QIGAs. Qubit $(\alpha_i, \beta_i)$ of the $m$-qubit chromosome is updated by a rotation gate in order to assign more or less probability to states $|0\rangle$ and $|1\rangle$. At each time step $t$, the qubit is updated based on the following rotation gate matrix $L(\theta_i)$:

$$L(\theta_i) = \begin{pmatrix} \cos(\theta_i) & -\sin(\theta_i) \\ \sin(\theta_i) & \cos(\theta_i) \end{pmatrix}; \qquad \begin{pmatrix} \alpha^{t}_{i} \\ \beta^{t}_{i} \end{pmatrix} = L(\theta_i) \begin{pmatrix} \alpha^{t-1}_{i} \\ \beta^{t-1}_{i} \end{pmatrix}, \tag{13}$$

where $\theta_i$ is the rotation angle of qubit gate $i$. Algorithm 1 presents the pseudo-code for QIGA as described in [22].

Algorithm 1 Quantum-Inspired Genetic Algorithm
    $t \leftarrow 0$
    Initialize $Q(t)$ as the population
    Make $P(t)$ by measuring $Q(t)$
    Evaluate $P(t)$
    Store the best solution $b$ among $P(t)$
    repeat
        $t \leftarrow t + 1$
        Make $P(t)$ by measuring $Q(t-1)$
        Evaluate $P(t)$
        Update $Q(t)$ using quantum gates $L(t)$
        Store the best solution $b$ among $P(t)$
    until the termination-condition

C. Multi-UAV Coalition Formation
To establish a multi-UAV coalition, a leader-follower coalition formation method is followed. Initially, the UAVs are uniformly distributed in the search space to look for PoIs. When a PoI is detected by a UAV, this UAV computes the resource requirements of the detected PoI and serves as a leader to form an optimal coalition. After calculating the required resources, the leader UAV calls other UAVs within a certain distance to join it in forming a coalition. The UAVs that have at least one of the required resources can respond to this call by reporting the amount of resources that they are able to contribute to the task. It is also assumed that the UAVs are self-interested, meaning that if a UAV receives multiple requests, it will consider joining the coalition that offers the highest benefit. UAV $u_i$ measures the value of each request based on the travel time to reach the task and the expected cooperative reputation credit received. Thus, its utility value can be defined as:

$$U(u_i, S_k) = \rho_i - \delta a_{ik}, \tag{14}$$

where $\rho_i$ and $a_{ik}$ are the cooperation credit and the travel time of $u_i$ when it attempts to join coalition $k$, and $\delta$ is the weight indicating the relative significance of the travel time compared to the expected credit. Algorithm 2 shows the pseudo-code for the multi-UAV coalition formation.

Algorithm 2 Multi-UAV coalition formation using the leader-follower method and QIGA
    Search PoIs in the search space
    Initialize each leader as a singleton coalition committed to a single PoI
    coalition_members ← [ ]
    while there exists an idle UAV around do
        for all unsatisfied PoIs so far do
            coal_mems ← execute MOQIG method regarding Algorithm 1
            coalition_members.append(coal_mems)
        end for
        Send bids to UAVs as potential followers
        Calculate the utility values of the followers and receive bid responses
        Update coalition members of each leader with respect to the bid responses
    end while
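The QIGA loop of Algorithm 1, the penalized fitness of (11) and (12), and the rotation gate of (13) can be sketched together on a toy follower-selection problem. This is a simplified illustration, not the authors' implementation: the objective keeps only a per-UAV cost term (reliability and reputation are omitted), each qubit is rotated by a fixed angle `delta` toward the best solution found so far instead of using the lookup table of [22], and every name and number below is a hypothetical choice.

```python
import math
import random


def rotate(alpha, beta, theta):
    """Apply the rotation gate L(theta) of Eq. (13) to one qubit (real amplitudes)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta


def measure(alpha, rng):
    """Return 0 with probability alpha^2, otherwise 1."""
    return 1 if rng.random() > alpha ** 2 else 0


def fitness(bits, cost, resources, tau, gamma):
    """F = -(O + g): cost of the selected UAVs plus a penalty on resource shortfall."""
    total_cost = sum(c for c, b in zip(cost, bits) if b)
    shortfall = 0.0
    for j, need in enumerate(tau):
        supplied = sum(r[j] for r, b in zip(resources, bits) if b)
        shortfall += max(0.0, need - supplied)
    return -(total_cost + gamma * shortfall)


def qiga(cost, resources, tau, gamma=100.0, pop=10, iters=100,
         delta=0.05 * math.pi, seed=0):
    rng = random.Random(seed)
    m = len(cost)
    amp = 1.0 / math.sqrt(2.0)
    population = [[(amp, amp)] * m for _ in range(pop)]  # Q(t)
    best_bits, best_fit = None, -math.inf
    for _ in range(iters):
        for q in population:
            bits = [measure(a, rng) for a, _ in q]       # one member of P(t)
            fit = fitness(bits, cost, resources, tau, gamma)
            if fit > best_fit:
                best_bits, best_fit = bits, fit
            # Rotate every qubit whose measured bit differs from the best solution.
            for i, (a, b) in enumerate(q):
                if bits[i] == best_bits[i]:
                    continue
                toward_one = best_bits[i] == 1
                sign = 1.0 if (a * b >= 0) == toward_one else -1.0
                q[i] = rotate(a, b, sign * delta)
    return best_bits, best_fit


# Hypothetical task: 4 candidate UAVs, 2 resource types, requirement tau.
cost = [4, 3, 5, 1]
resources = [[3, 0], [0, 3], [2, 2], [1, 0]]
tau = [3, 3]
best_bits, best_fit = qiga(cost, resources, tau)
print(best_bits, best_fit)
```

In this toy instance a bit value of 1 means that the corresponding UAV joins the coalition. The sign rule chooses the rotation direction from the quadrant of $(\alpha, \beta)$, so that the probability of the best solution's bit value increases regardless of the signs of the amplitudes.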
IV. EXPERIMENTAL RESULTS
To evaluate the performance of the proposed QIGA-based coalition formation method, several scenarios with different numbers of UAVs and PoIs were simulated. It is assumed that the UAVs and PoIs are uniformly distributed across the region, and that the UAV closest to a PoI is the one that first detects the PoI and forms a coalition (as the coalition leader). It is also assumed that each UAV has five different types of resources. The values of the UAVs' resources are generated from a uniform random distribution, and the UAVs' resource failure rates are produced randomly in the range (5 × − , − ). Furthermore, the execution times of the identified tasks are computed randomly in the range between and , depending on the task and the capability of the UAV.

The performance of the proposed algorithm is compared with three well-known algorithms: i) the distance-based coalition formation method, in which the coalitions are formed with the UAVs closest to the leader (i.e., the leader only considers the UAVs within a certain distance of the PoI for the coalition and does not evaluate them in terms of cooperative reputation or available resources), ii) the common merge-and-split coalition formation [24], and iii) the Non-Dominated Sorting Genetic Algorithm (NSGA-II), which is a fast, elitist, heuristic-based multi-objective algorithm. Table II shows the corresponding parameters for these algorithms.

Table III reports statistics on the completed tasks and the resource violations for the different algorithms. It demonstrates that the proposed coalition formation method performs better than the other algorithms considered here in terms of the percentage of completed tasks and the average resource violations. We also compared the proposed method against the merge-and-split coalition formation algorithm.
The percentage of completed tasks for the merge-and-split method in 30 missions with different numbers of UAVs and tasks was between 46% and 50%, while, as shown in Table III, our method achieved a task completion rate of 90%.

Table II: Algorithms' parameters
Parameter (values listed for the methods Distance-Based, NSGA-II, MOQGA)
Population size: NA, 200, 200
Maximum number of iterations: NA, 500, 500
Number of objectives:
Mutation probability: NA, NA
Crossover probability: NA, NA
Distribution index for crossover: NA, NA
Distribution index for mutation: NA, NA
NA: Not Available
[Figure 1 plots, panels (a) 8-2, (b) 16-4, (c) 32-8, (d) 64-16: objective function 1 (Cost) on the horizontal axis versus objective function (Reliability) on the vertical axis, for NSGA-II and MOQGA.]

Figure 1: Comparison of the quality of the solutions provided by the proposed method (MOQGA) and NSGA-II.

Table III: Percentage of completed tasks and average resource violations for different algorithms in 30 missions (with different numbers of UAVs and tasks).
Columns: No. of UAVs and tasks; then Completed tasks and Resource violations for each of the methods Distance-Based, NSGA-II, and MOQGA.
8-2: 67% 1 . . % .
43% 3 . . % .
39% 6 . . % .
45% 11 . . % .
47% 20 . . % .

Table IV: The UAVs selected by the leaders in 10 missions, where the failure rate of the UAVs u and u is assumed to be 90%.
Time slot | Coalition 1 | Satisfied | Coalition 2 | Satisfied | Unreliable UAVs
L{u}; F{u, u, u} | Yes | L{u}; F{u, u} | Yes | {}
L{u}; F{u, u, u} | Yes | L{u}; F{u, u} | Yes | {}
L{u}; F{u, u, u} | Yes | L{u}; F{u} | Yes | {u}
L{u}; F{u, u, u} | Yes | L{u}; F{u, u, u} | No | {u, u}
L{u}; F{u, u, u} | Yes | L{u}; F{u, u} | Yes | {}
L{u}; F{u, u} | Yes | L{u}; F{u, u, u, u} | No | {u, u}
L{u}; F{u, u, u} | Yes | L{u}; F{u, u} | Yes | {}
L{u}; F{u, u, u} | Yes | L{u}; F{u, u} | Yes | {u}
L{u}; F{u, u, u, u} | Yes | L{u}; F{u, u} | Yes | {u}
L{u}; F{u, u, u} | No | L{u}; F{u, u} | Yes | {}
L{u}; F{u, u, u} | Yes | L{u}; F{u, u} | No | {}
L: Leader, F: Follower
[Figure 2 plot: Credit versus Time Slots.]

Figure 2: Changes in the UAVs' cooperative reputation over time, where UAVs u and u are assumed to be selfish.

Figure 1 compares the quality of the solutions of MOQGA with that of the NSGA-II algorithm. The plots demonstrate that the proposed method performs better than NSGA-II and yields solutions of superior quality in terms of lower cost and higher reliability. Figure 2 shows the changes in the UAVs' cooperative reputation over the course of time. It is assumed that UAVs u and u are not trustworthy. As seen in the figure, the reputations of these UAVs decrease at each time slot, and it therefore becomes less likely over time that these UAVs are selected by the leaders. The impact of reliability on coalition formation is also studied, where a pre-defined failure rate of 90% is considered for the resources of UAVs u and u. Table IV lists the UAVs (i.e., followers) selected by the leaders when there are two unreliable UAVs. As shown in the table, the proposed method avoids selecting unreliable UAVs in most cases. Unreliable UAVs are still selected in some cases because the problem is a MOOP and the method has to optimize the other objectives (i.e., cost and reputation) as well.

V. CONCLUSION
A leader-follower UAV coalition formation method was proposed to provide a practical solution for distributed task allocation in an unknown environment. Three critical aspects, namely cost minimization, reliability maximization, and the potential selfish behavior of the UAVs, were considered in this coalition formation, and a quantum-inspired genetic algorithm was proposed to find the optimal coalitions with a low level of computational complexity. The proposed approach led to promising results compared to existing solutions with respect to completing a higher number of tasks and minimally overspending the resources.

VI. ACKNOWLEDGMENT OF SUPPORT AND DISCLAIMER

The authors acknowledge the U.S. Government's support in the publication of this paper. This material is based upon work funded by AFRL. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the US government or AFRL.

REFERENCES

[1] T. W. Sandholm, "Distributed rational decision making," Multiagent systems: a modern approach to distributed artificial intelligence, pp. 201–258, 1999.
[2] S. P. Ketchpel, "Coalition formation among autonomous agents," in European Workshop on Modelling Autonomous Agents in a Multi-Agent World. Springer, 1993, pp. 73–88.
[3] J. Contreras, M. Klusch, and J. Yen, "Multi-agent coalition formation in power transmission planning: A bilateral Shapley value approach," in AIPS, 1998, pp. 19–26.
[4] J. Yang and Z. Luo, "Coalition formation mechanism in multi-agent systems based on genetic algorithms," Applied Soft Computing, vol. 7, no. 2, pp. 561–568, 2007.
[5] F. Cruz-Mencía, J. Cerquides, A. Espinosa, J. C. Moure, and J. A. Rodriguez-Aguilar, "Optimizing performance for coalition structure generation problems' IDP algorithm," in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2013, p. 706.
[6] F. Bistaffa, A. Farinelli, J. Cerquides, J. Rodríguez-Aguilar, and S. D. Ramchurn, "Anytime coalition structure generation on synergy graphs," in Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems. International Foundation for Autonomous Agents and Multiagent Systems, 2014, pp. 13–20.
[7] L. Sless, N. Hazon, S. Kraus, and M. Wooldridge, "Forming coalitions and facilitating relationships for completing tasks in social networks," in Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems. International Foundation for Autonomous Agents and Multiagent Systems, 2014, pp. 261–268.
[8] P. Janovsky and S. A. DeLoach, "Multi-agent simulation framework for large-scale coalition formation," in Web Intelligence (WI), 2016 IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 343–350.
[9] B. Ghazanfari and N. Mozayani, "Enhancing Nash Q-learning and team Q-learning mechanisms by using bottlenecks," Journal of Intelligent & Fuzzy Systems, vol. 26, no. 6, pp. 2771–2783, 2014.
[10] B. Ghazanfari and M. E. Taylor, "Autonomous extracting a hierarchical structure of tasks in reinforcement learning and multi-task reinforcement learning," arXiv preprint arXiv:1709.04579, 2017.
[11] B. Ghazanfari and N. Mozayani, "Extracting bottlenecks for reinforcement learning agent by holonic concept clustering and attentional functions," Expert Systems with Applications, vol. 54, pp. 61–77, 2016.
[12] S. S. Mousavi, B. Ghazanfari, N. Mozayani, and M. R. Jahed-Motlagh, "Automatic abstraction controller in reinforcement learning agent via automata," Applied Soft Computing, vol. 25, pp. 118–128, 2014.
[13] C. Schumacher, P. Chandler, M. Pachter, and L. Pachter, "UAV task assignment with timing constraints via mixed-integer linear programming," in Proc. AIAA 3rd Unmanned Unlimited Systems Conference, 2004.
[14] M. A. Darrah, W. M. Niland, and B. M. Stolarik, "Multiple UAV dynamic task allocation using mixed integer linear programming in a SEAD mission," Infotech@Aerospace, pp. 26–29, 2005.
[15] C. Schumacher, P. R. Chandler, and S. R. Rasmussen, "Task allocation for wide area search munitions," in American Control Conference, 2002. Proceedings of the 2002, vol. 3. IEEE, 2002, pp. 1917–1922.
[16] J. Kennedy, "Particle swarm optimization," in Encyclopedia of machine learning. Springer, 2011, pp. 760–766.
[17] M. Dorigo and L. M. Gambardella, "Ant colony system: a cooperative learning approach to the traveling salesman problem," IEEE Transactions on evolutionary computation, vol. 1, no. 1, pp. 53–66, 1997.
[18] D. E. Goldberg, Genetic algorithms. Pearson Education India, 2006.
[19] P. J. Van Laarhoven and E. H. Aarts, "Simulated annealing," in Simulated annealing: Theory and applications. Springer, 1987, pp. 7–15.
[20] A. Rovira-Sugranes and A. Razi, "Predictive routing for dynamic UAV networks," in , Oct 2017, pp. 43–47.
[21] A. Razi, F. Afghah, and J. Chakareski, "Optimal measurement policy for predicting UAV network topology," arXiv preprint arXiv:1710.11185, 2017.
[22] K.-H. Han and J.-H. Kim, "Genetic quantum algorithm and its application to combinatorial optimization problem," in Evolutionary Computation, 2000. Proceedings of the 2000 Congress on, vol. 2. IEEE, 2000, pp. 1354–1360.
[23] P.-Y. Yin, S.-S. Yu, P.-P. Wang, and Y.-T. Wang, "Multi-objective task allocation in distributed computing systems by hybrid particle swarm optimization," Applied Mathematics and Computation, vol. 184, no. 2, pp. 407–420, 2007.
[24] F. Afghah, M. Zaeri-Amirani, A. Razi, J. Chakareski, and E. Bentley, "A coalition formation approach to coordinated task allocation in heterogeneous UAV networks," arXiv preprint arXiv:1711.00214.