[PDF] A Game-Theoretic Utility Network for Cooperative Multi-Agent Decisions in Adversarial Environments

Abstract

Many underlying relationships among multi-agent systems (MAS) in various scenarios, especially agents working on dangerous, hazardous, and risky situations, can be represented in terms of game theory. In adversarial environments, the adversaries can be intentional or unintentional based on their needs and motivations. Agents will adopt suitable decision-making strategies to maximize their current needs and minimize their expected costs. In this paper, we propose a new network model called Game-Theoretic Utility Tree (GUT) to achieve cooperative decision-making for MAS in adversarial environments combining the core principles of game theory, utility theory, and probabilistic graphical models. Through calculating multi-level Game-Theoretic computation units, GUT can decompose high-level strategies into executable lower levels. Then, we design Explorers and Monsters Game to validate our model against a cooperative decision-making algorithm based on the state-of-the-art QMIX approach. Also, we implement different predictive models for MAS working with incomplete information to estimate adversaries' state. Our experimental results demonstrate that the GUT significantly enhances cooperation among MAS to successfully complete the assigned tasks with lower costs and higher winning probabilities against adversaries.

Full PDF

GGUT: A General Cooperative Multi-Agent Hierarchical DecisionArchitecture in Adversarial Environments

Qin Yang and Ramviyas Parasuraman ∗ Department of Computer Science, University of Georgia, Athens, GA 30605, USA

Abstract — Adversarial Robotics is a burgeoning researcharea in Swarms and Multi-Agent Systems. It mainly focuseson agents working on dangerous, hazardous and risky environ-ment, which will prevent robots achieve their tasks smoothly.In

Adversarial Environments , the adversaries can be intentionaland unintentional based on its needs and motivation. Agentsneed to adopt suitable strategies according to the currentsituation maximizing their utility or needs. In this paper, wedesign a game-like Exploration task, where both intentional(Monsters) and unintentional (Obstacles) adversaries challengethe Explorer robots in achieving their target. In order to mimicrational decision process of an intelligent agent, we propose anew

Game-Theoretic Utility Tree (GUT) architecture combiningthe core principles of game theory, utility theory, probabilisticgraphical models and tree structure decomposing the high levelstrategy to executable lower levels. We show through simulationexperiments that through the use of

GUT , the Explorer agentscan effectively cooperate between themselves and increase theutility of the individual agents and of the global system, andachieve higher success in task completion.

I. I

NTRODUCTION

Natural systems have been the key inspirations in thedesign, study, and analysis of multi-robot and multi-agentsystems (MAS) [1], [2], [3], [4]. For example, when a simpleindividual agent interacts with another agent or with theenvironment, it usually try to ﬁnd a suitable way adaptingto the current situation and satisfying its basic needs in theshort-term. But in a more complex system with multipleagents, individual agents usually builds a kind of cooperationalliance based on commonly agreed needs between eachother to maximize their beneﬁts, mitigate challenges in theenvironment, and achieve their long-term goals and globalobjectives. Cooperation in MAS can maximize system utilityand guarantee sustainable development for each group mem-ber [5]. On the other hand, it is also important to perceivethe environment and recognize the threats and adversaries inthe environment cooperatively among all agents in the MASteam. An

Adversary in the environment impairs the abilityof the individual agents and the global MAS to achieve theirtasks and also challenges their needs in certain scenarios [6],[7], [8].Following the examples from Information Systems [9], wecan classify an adversarial agent based on their needs andmotivations into two general categories: intentional (whichactively impair the MAS needs and capabilities such asan enemy or intelligent opponent agent) and unintentional (which might potentially or passively threaten MAS abilities,

Fig. 1. An illustrative game scenario where the paths (to Task) of theExplorer robots are blocked by Monster robots. like obstacles and weather). They can also be referred toas deliberate ( Monsters ) and accidental ( Obstacles ) adver-saries, respectively.Recently, researchers have been able to combine the dis-ciplines of Robotics and adversarial MAS into

AdversarialRobotics , which focus on autonomous agents and mobilerobots operating in adversarial environments [10], [11], [12],[6]. Generally, we can describe an adversarial environmentas the scenarios that combine intentional and unintentionaladversaries, which prevent robots from obtaining their needs,achieving their tasks.Moreover, in a dynamically changing environment, agentsfrequently decide to switch their behaviors and actionsaccording to the situation and needs. For example, Agent1 might be recognized as an adversary to Agent 2 in onescenario and when the situation lead to their needs’ changingin the future, they might develop a neutral relationship orbecome an ally and cooperatively perform a task.Most of past and current research focus on the uninten-tional adversaries in the environment, such as path planningavoiding static or dynamical obstacles, formation controlavoiding collision and so forth [13]. This is particularlyapplicable to urban search and rescue missions and robotsdeployed in disaster environments, where the robots are moreconcerned about unintentional threats such as radiation, ﬁre,water, etc.In this paper, we propose a general decision architectureto mimic rational thinking process of intelligent agents inan adversarial environment. We design a simple explorationgame (see Fig. 1) which contains both intentional and un-intentional adversaries. The main contributions of the paper a r X i v : . [ c s . M A ] A p r re outlined below. • First, we deﬁne the adversarial environment from arobot needs perspective and treat the two adversaries- unintentional and intentional separately. • Second, we propose a new

Game-Theoretic UtilityTree (GUT) , which combine the principles and meritsof

Game Theory [14],

Utility Theory [15],

ProbabilisticGraphical Model (PGM) [16], and

Tree Structure ex-ploiting its hierarchy. It can calculate the suitable tactics(behaviors) based on current utility in multiple levelsand decompose the high level strategies into low levelexecutable plans. • Third, to tackle the static unintentional adversaries inMAS, we present an efﬁcient distributed algorithmcalled ”

Adapting The Edge ”, which combines individ-ual adapting behaviours and group cooperation together. • Finally, we validate the proposed approach throughextensive simulation experiments demonstrating its util-ity in a simple

Exploration Game considering variousscenarios of Explorer to Monster ratios with distinct co-operative models under different environmental settings.II. R

ELATED W ORK

In majority of the Adversarial Robotics literature, theadversaries are not artiﬁcial intelligent agent [10], [11].They might be natural force like wind, ﬁre, rain or othercreatures’ aggressive behaviors. From the task’s perspective,some researchers categorize multi-robots adversarial roboticsinto four main class: • Adversarial Patrol [17] • Adversarial Coverage [10], [11] • Adversarial Formation [13] • Adversarial Navigation [18], [19], [20]But the most challenges usually come from the physicalworld, especially motion, dynamic and continuous spaces.When we model the uncertainty in adversarial environment,we need to consider how to build suitable and speciﬁc mod-eling for robots with respect to self and opponent perception,utility calculation, decision making, and motion planning.In the recent research, Lin [21] examined the problemof defending against a sequential attack in a knowledge-able adversarial environment. Prorok [22] did some studiesfocusing on multi-robot privacy with adversarial approach.From the swarm robotics perspective, Sanghvi and Sycara[12] identiﬁed a swarm vulnerability and studied how anadversary is take advantage of it. From a machine learningperspective, Paulos and Kumar [23] describe a architecturefor training teams of identical agents in a defense game.Some studies also focus on solving multi-player pursuitand evasion game problem [24], [6], [25], which mainlydeals with how to guide one or a group of pursuers to catchone or a group of moving evaders [26]. This problem coversformation keeping, conﬂict resolution, and optimum taskallocation [27]. The more recent works mainly concentrateon optimal evasion strategies and task allocation [28], [29]and predictive learning from agents’ behaviors [30]. In our

Explore Game problem, individual agent’s motivation is not to pursue or catch speciﬁc agents but based on their sharedneeds and cooperation with each other agents in the systemto explore an adversarial area while satisfying their taskrequirements and mission objectives.However, there is little research done in studying con-frontational strategies, preventive control and behaviors tomitigate intentional adversaries, which can be considered asactive, intelligent opponent agents.Intentional adversaries also play an important role in manyapplications such as military,/defense, where the multi-robotsystem cooperating with each other to achieve some globalmissions do not only have to consider the unintentionaladversaries like wind and obstacles impeding their pathtowards their targets, but also the intelligent adversarialagents such as enemy robots.In a MAS, the agents have to exhibit an awareness ofthe environment not only at an individual agent level butalso at a system level, wher computational game theoryprovide useful examples of the study in the area of machinebehaviour [31]. To address the gaps in the literature, webuild a general

GUT architecture combining

Game Theory [32], [14],

Utility Theory [15], [33],

Probabilistic GraphicalModel(PGM) [16], [34] and

Tree Structure to calculateand decompose the decision strategies speciﬁcally to tackleintentional adversaries. We also design an algorithm termed”Adapting The Edge” to help the MRS avoid static uninten-tional adversaries efﬁciently.III. P

ROBLEM S TATEMENT

Multi-robots or MAS working in adversarial environmentis a complex distributed system, especially multiple groupsof intelligent agents having different purposes interactingwith each other, which will present various relationshipsand behaviours. But the most important challenge in thisscenario is how to organize these robots to work togetherand adopt suitable strategies guaranteeing their maximumutility corresponding to adversaries’ tactics.We design a

Exploration Game mimicking a group ofExplorer agents go through an adversarial environment toexplore the treasure (Target) as shown the Fig. 1. In thisscenario, there are several Monsters (intelligent autonomousagents) representing intentional adversaries randomly dis-tributed in the path of treasure. Once the Monsters detect anyexplorers, they will prevent them from passing through. Twomountain-like Obstacles are considered to be unintentionaladversaries impeding the explorers’ movement task.In this whole process, we assume each Monster do notcommunicate between each other (acting independently) andthey all act based on their greedy self-interest ( individual ra-tionality ), which means each Monster always care about ownbeneﬁts. However, the Explorers can communicate betweenother explorer agents and can share information with eachother representing collective rationality . The problem to besolved is to device the cooperative strategies for the Exploreragents such that they all collectively reach the Target whiletackling both intentional and unintentional adversaries on theway. Through this problem, we aim to evaluate the individual ig. 2. General Individual Robot’s

GUT and system utility of Explorers and Monsters under differentstrategies, scenarios, and environments.IV. A

PPROACH O VERVIEW

To solve the above problem, the autonomous Exploreragents need to adopt various strategies and plans based ontheir current status (needs and utility) and optimize/guaranteethe utility of the individual robots and the collective MASsystem.For the intentional adversaries (Monsters), we design the

GUT architecture Fig. 2, which can calculate each level’stactics based on current utility and decompose the highlevel’s decision into lower levels. We also add two kinds ofassumption in the decision level: one is irrational decisionwhich means if the value of individual utility lower to certainlevel, it will present a kind of instinctive behaviors, suchas escaping, guaranteeing its safety. On the other hand, ifcurrent condition ﬁts the low level needs like safety or basicneeds, it will enter into rational decision process.For simplicity, we build a three level

GUT to elaborate it,which guarantee that the individual robot’s rational decisioncan be decomposed to executable level. The ﬁrst level (high-level) determine whether or not to attack (or defend). Then,the second level is to ﬁgure out the speciﬁc agent to beattacked (or defended from). According to the previousdecision, the last level (lowest level) decides how the agentsshould group themselves adapting the current situation.More speciﬁcally, in the ﬁrst level we deﬁne Explor-ers and Monsters both have two strategies:

Attack and

Def end , which are represented through

T riangle and

Regular P olygon formation shapes, respectively. Accordingto the payoff matrix Table. I in a zero-sum game, they cancalculate the strategy which can ﬁt for the current situation.Then, based on the precondition in the ﬁrst level, they needto decide attacking or defending the speciﬁc agent. Forexample, we assume that in the attacking model Explorersand Monsters have two kinds of behaviors: attacking the nearest agent or the attacking ability lowest agent in theformation. But in the defending model, they can choosedefending the nearest agent or the attacking ability lowest agent. Through the corresponding payoff matrix Table. II,they can conﬁrm the target sequence. Finally, in the lowestlevel, according to the corresponding tactics payoff matrixTable. III, individual can calculate the ﬁnal tactics, such

TABLE IL

EVEL

XPLORER & M

ONSTER T ACTICS P AYOFF M ATRIX

MBUtility EB Attack DefendAttack W AA W DA Defend W AD W DD as the number of groups in Explorers and the behavior offollowing others or not in Monsters.Through this process, we decompose the individual agent’sstrategy into three levels and each level focus on differentutilities corresponding to different needs or requirements. Inorder to simplify this calculation process, across every level,we respectively use Winning probability ( W ), the relative ex-pected cost of energy ( E ( e ) ) and HP (health power) ( E ( hp ) )– the expected utility difference of both sides – to measureevery level’s utility. In the ﬁrst level, the utility presents theWinning probability which relate to the current perceivedadversaries’ number and individual attacking and defendingability. In the second level, we consider the relative expectedenergy cost to describe this level’s utility, which depends onthe agents’ distribution and numbers. In the lowest level,we use the relative expected HP cost representing the utilitycaused by the individual and group’s current information,such as the number of groups and agent’s current energylevel. This decision decomposition process also disassemblethe individual needs into different level, which mimic theintelligent agent’s thinking process [35]. TABLE IIL

EVEL

XPLORER & M

ONSTER T ACTICS P AYOFF M ATRIX

MBUtility EB Nearest A Lowest A HighestNearest E ( e ) NN E ( e ) A L N E ( e ) A H N A Lowest E ( e ) NA L E ( e ) A L A L E ( e ) A H A L A Highest E ( e ) NA H E ( e ) A L A H E ( e ) A H A H )TABLE IIIL EVEL

XPLORER & M

ONSTER T ACTICS P AYOFF M ATRIX

MBUtility EB One Group Two Group Three GroupIndependent E ( hp ) I E ( hp ) I E ( hp ) I Dependent E ( hp ) D E ( hp ) D E ( hp ) D For the unintentional adversaries, we design the

AdaptingThe Edge algorithm, which can help individual agent tackle(static) unintentional adversaries and adapting their edge’strajectory until it ﬁnds a suitable route to the goal point.In this process, through the communication and informationsharing with other Explorer agents, individual agent can se-lect the direction having less possibility of potential collisionwith the unintentional adversaries to move. In our scenarios,the two mountains represent the unintentional adversariesand Explorers need to ﬁnd a path passing through them.Initially, the group of Explorers form the patrol formationo detect the unknown world and select the shortest path tothe treasure. When they perceive an unintentional adversaries(mountains), individual agents need to adapt to their currentsituation and combine perceived and shared information toﬁnd a route passing through them. If they detect intentionaladversaries (Monsters), each Explorer will compute its cur-rent strategy based on

GUT , then through negotiation andagreement cooperating with each other. These present a kindof global behaviors performing

Collective Rationality andcaring about

Group interest . In contrast, each Monster alsofollow the same process but do not cooperate with eachother and is

Self-interest . Additionally, in this process, ifindividual agent’s HP value lower to a threshold level, it willadopt instinct behavior (irrational behaviour) by escaping thecurrent status to ﬁt its safety needs.V. F

ORMALIZATION AND A LGORITHMS

Below, we formalize the adversarial environment and thedecision process in intentional and unintentional adversaries. A. Adversarial Environment

Supposing we have three agents R i , i ∈ in certainscenario. R needs to fulﬁl the task T satisfying its currentneed N . In order to quantify the agent’s need, we use utilityFunction. 1 to calculate it. N = U ( f , f , . . . f n ) , n ∈ Z + ; (1) f n presents the various factors involving the calculationsuch as time, energy and so forth. According to R ’scapabilities, it will have the solution space S i , i ∈ n to perform. So we assume that R complete T using the S j without any interruption(not considering R and R )and the corresponding utility value is max ( U | s j ) . Then,considering R and R , if R can not ﬁnd any solution in S i making the Equation. 2 be established. It will regard R and R to Adversary . ( U | s i ) = max ( U | s j ); (2)In additional, if R next s k corresponding to s i canincrease its current U , which means its expected utility lagerthan U Formula. 3, it can be regard to

Intentional Adversary .On the other hand, if R current solution s i does not impact U or U is always zero Formula. 4. 5, we consider R is Unintentional Adversary . E ( U | s k , s i ) > U ; (3) E ( U | s k , s i ) = U ; (4) U = 0; (5) B. Intentional Adversaries Decision

For the intentional decision, we have zero − sum game G in the each level of GUT

Fig. 2, which can be describedas Formula. 6: G = { A, B ; U } ; (6) Using U = ( a ij ) m × n present agent A ’s utility, and thetactics of agent A and B also can be present as Formula. 7and 8: A = { AT , AT , . . . , AT m } ; (7) B = { BT , BT , . . . , BT n } ; (8)In every ﬁnite games has a Pure Strategy Nash Equilibriumor a Mixed Strategy Nash Equilibrium, so the process canbe formalized as two step: a. Compute Pure Strategy Nash Equilibrium We can present agents’ utility matrix as Formula. 9:  a a · · · a n a a · · · a n ... ... . . . ... a m a m · · · a mn  (9)The row and column correspond to the utility of agent A and B separately. Through calculating the minimum valuelist of each row and maximum value list of each column, wecan compute the maximum and minimum values of the twolists separately. min ≤ j ≤ n max ≤ i ≤ m a ij = max ≤ i ≤ m min ≤ j ≤ n a ij (10)If the two value satisfy the Formula. 10, we can get theﬁrst level Pure Strategy Nash Equilibrium . P SN E = ( A i ∗ , B j ∗ ); (11) V G = a i ∗ j ∗ ; (12) b. Compute Mixed Strategy Nash Equilibrium The tactics’ probability of agent A present as Formula. 13. AX = ( x , x , . . . , x m ); x i ≥ , i = 1 , , . . . , m ; m (cid:88) i =1 x i = 1 (13)Similarly, we also can conclude agent B ’s tactics’ proba-bility as Formula. 14. BY = ( y , y , . . . , y n ); y j ≥ , i = 1 , , . . . , m ; m (cid:88) j =1 y i = 1 (14)As above discussion, ( X, Y ) can be deﬁned as M ixed Situation in certain status. Then, we can deducethe expected utility of agent A and B Formula. 15 and 16separately. E A ( X, Y ) = m (cid:88) i =1 n (cid:88) j =1 a ij x i y j = E ( X, Y ); (15) E B ( X, Y ) = − E ( X, Y ) (16)In the Game G = { A, B ; U } , if we can get all the M ixed T actics of agent A and B as Formula. 17 and18, we can deduct the G ’s M ixed Expansion as Formula.19. Then, if we can compute a tactic ( X ∗ , Y ∗ ) satisfyingormula. 20 and 21, we deﬁne this tactic is the optimalstrategy in current state and the game result is Formula. 22. S ∗ A = AX ; (17) S ∗ B = BX ; (18) G ∗ = { S ∗ A , S ∗ B ; E } ; (19) E ( X ∗ , Y ) ≥ V s A , ∀ Y ∈ S ∗ A ; (20) E ( X, Y ∗ ) ≤ V s B , ∀ X ∈ S ∗ B ; (21) V S A = V G = V S B (22)After computing the two steps, if the calculation is thePure Strategy Nash Equilibrium, individual agent can obtaina unique tactic entering into the next level, which meansthis tactic’s possibility is one hundred percent. Otherwise inthe Mixed Strategy Nash Equilibrium, it will present severaltactics with some certain probability. And the set of eachlevel’s feasible solution can present a speciﬁc possibilitybeing the next level’s conditional probability. For example,if one of feasible solution’s possibility in level i is P ij , onebranch set of feasible solution’s possibility in the next level i + 1 is P { k | ij } , k ∈ { . . . n } . We can represent this entireprocess as Fig. 2. Until get the lowest level, individual agentcan choose the most possible and suitable tactic set to adaptcurrent situation.To summary, according to the set of individual tactics ineach level, we can build a ﬁnite solution space S . Then,through computing each level’s Nash Equilibrium, we canget the corresponding GUT in current state.The most important and challenging part here is the

U tility F unction design, which connect the decision leveland the action(planning) level. It also determines whether ornot the agent or system can calculate a reasonable tactics tooptimize its performance. So in this step, we ﬁrst assumeeach robot has two basic attributes: Energy Level and HP.According to the discussion in

Section

IV, in the

GUT ﬁrstlevel

W inning P robability obey

Bernoulli Distribution ,so the utility present its expectation and we can formalize itas Formula. 23. W ( t ev , t mv , r ev , r mv , n, m ) = ( a a t ev + a r ev a t mv + a r mv ) mn ; (23)The second Level’s utility is described the relative ex-pected energy cost as Formula. 24, 25, 26 and 25. And weconsider three parts of energy cost in the whole process: walking , attacking and communication . E ( d, v, f, q, n, m, φ e , φ m ) = b + b (cid:90) + ∞−∞ ( n − m ) e d ( x ) p d ( x, d )d x + b ( + ∞ (cid:88) i =1 ne a e ( i, f ) p a m ( j, mφ m ) − + ∞ (cid:88) j =1 me a m ( j, q ) p a e ( i, nφ e ))+ b ∞ (cid:88) w =1 ne c ( w ) p c ( w, dv ); (24) e d ( x ) = b x ; (25) e a ( x, y ) = b xy ; (26) e c ( x ) = b x (27)In the entire attack − def end process, the agent’s ac-tion distance, the times of attacks and being attackedand the communication times obey N ormal Distribution , P oisson Distribution separately. So we can describe theirdistribution function as Formula. 28, 29, 30 and 31. p d ( x, d ) = 1 √ π e − ( x − d )22 ; (28) p a e ( x, λ e ) = e − λ e λ xe x ! ; (29) p a m ( x, λ m ) = e − λ m λ xm x ! ; (30) p a m ( x, dv ) = e − dv ( dv ) x x ! ; (31)In the lowest level, we use the expected HP cost to explainthe utility as Formula. 32, 33, 34 and 35. H ( k, t e , t m , r e , r m , g, φ e , φ m ) = c + c ( + ∞ (cid:88) i =1 kh ( t e , r e , i ) p h m ( i, φ m ) − + ∞ (cid:88) j =1 gh ( t m , r m , j ) p h e ( j, φ e )); (32) h ( x, y, z ) = ρz ( x + y ) (33) t e,m ( e e,m ) = γ e,m e e,m ; (34) r e,m ( e e,m ) = δ e,m e e,m ; (35) p h e ( i, φ e ) and p h m ( j, φ m ) are similar to the Formula. 29and 30 correspondingly. Here, n and m present the numberof Explorers and Monsters separately; d presents the groupaverage distance between two opponents; v presents theagent’s velocity; i and j present the times of attacks andbeing attacked; w presents Explorers’ communication times; f and q present the unit attacking energy cost of both sidesagents separately; t ev and t mv present average attackingability levels of both sides separately; r ev and r mv presentaverage defending ability levels of both sides separately; t e and t m present speciﬁc agent’s attacking ability levels ofboth sides separately; r e and r m present speciﬁc agent’sdefending ability levels of both sides separately; φ e and φ m present individual agent’s size; k presents the number ofExplorers’ attacking simultaneously; g presents the numberof Monsters’ attacking simultaneously; a , b , c , ρ , γ and δ present corresponding coefﬁcient; e e and e m present thecurrent energy level of Explorer and Monster; h present thecurrent HP level of Explorer and Monster; p presents theprobability corresponding to the different section.According to above discussion, we assume that Explorerand Monster have the same speed to move and Monstercan not communicate with each, then we can simplify theormula. 24 and 32 as Formula. 36 and 37. E ( d, v, f, q, n, m, φ e , φ m ) = b + b b ( n − m ) d + b b nm ( f φ m − qφ e )+ b b n dv ; (36) H ( k, t e , t m , r e , r m , g, φ e , φ m ) = c + c ρ [ kφ m e e ( γ e + δ e ) − gφ e e m ( γ m + δ m )]; (37)Through the Formula. 23, 36 and 37, individual can calcu-late the utility and get the Nash Equilibrium in each level’spayoff matrix. After computing the entire GUT , it needsto combine each level’s tactics and execute the integratedstrategy in the planning level. We can describe the decisionprocess as Alg. 1.

Algorithm 1:

Explore Game

GUT

Model

Input :

Explorers’ and Monsters’ states

Output : formation shape s ; current attacking target t ;number of groups g . set state = ” level one ” ; while the changing of Monster’s number == True And

Monster’s number != 0 do if state== ” level one ” then Compute the Nash Equilibrium; Get the most feasible formation shape s ; state = ” level two ” else if state== ” level two ” And s != Null then Compute the Nash Equilibrium; Get the most feasible attacking target t ; state = ” level three ” else if state== ” level three ” And s, t != Null then

Compute the Nash Equilibrium;

Get the most feasible number of groups g ; end end if Monster’s number == 0 then s = ” P atrol ” ; g = 1; end return s, t, g C. Unintentional Adversaries Decision

When Explorers perceive the mountains which can beregarded as the static unintentional adversary, they need toutilize the limited information through communication andperception passing through them. In our experiment, thescenario can be describe as Fig. 3. There are nine robotsand R and R detect the mountain. For robot R , in orderto avoid collision, it needs to switch current direction k totangent’s direction of the nearest collision point c . Since ithas two direction n and m , according to R ’s state, it shouldselect the direction n which has more non-collision robotscurrently.Speciﬁcally, considering the line L , which is vertical totangent through point c , is the boundary. For the direction n ,there are ﬁve robots but only four robots R , R , R and R do not have collision. We also can see the non-collisionrobots’ number is three in direction m . So currently, R will Fig. 3. Illustration of multi-robot ”Adapt The Edge” formation controlalgrithm for obstacle avoidance. select direction n to move certain distance ∆ d , then adjustthe direction to the goal point to go forward and loop theentire process until perceive no unintentional adversaries onits routing. Combining the two kinds of decision, we presentthe entire decision process as Alg. 2. For simplicity, we onlyselect the maximum feasible solution in each level enteringthe next level. Algorithm 2:

Adapting The Edge

Input :

Explorers’ and mountains’ states

Output : moving direction r and distance ∆ d while The nearest collision point c ! = Null do calculate the number n and m of non-collision agents inboth side of the line l passing through c andperpendicular c ’s tangent; if n > m then r = n side in line l ; ∆ d = one step of agent’s movement; else if n == m then agent stop; else if n < m then r = m side in line l ; ∆ d = one step of agent’s movement; end end return r = current position to goal point, ∆ d VI. E

VALUATION THROUGH S IMULATIONS

Considering cross-platform, scalability, efﬁciency and ex-tendability of the simulation, we chose “Unity” [36] tosimulate the

Explore Game and select Gamebit [37] forcalculating each level’s Nash Equilibrium. Because it isan open-source collection of tools for doing computationin game theory and can build, analyze, and explore gamemodels cross-platform.Our experimental evaluation is focused on the distributionof agents’ tactics in possible solution space and imple-menting different parameter’s predictive model to analysethe system utility and cost. In our model, when individualcalculate the utility, it relate to three main factors: thenumber of agents n , individual unit attacking energy cost ig. 4. Experiments on Monsters with Different Distribution Considering Unintentional Adversaries e u , agent’s current energy level e c . Since the parameters e u and e c might not clear for both sides, we design twokinds of experiments to evaluate the system’s performance.One is Complete Information , which means Explorer andMonster know each other’s status. On the other hand is

Incomplete Information and agents need to use predictivemodels predicting opponent states and parameters.In each experiment, we ﬁrst consider no unintentionaladversaries situation, and design 17 different scalabilityscenarios. Then we involve the unintentional adversaries(tworocks) and ﬁx the both sides’ number distributing the Mon-sters in different position with three proportion comparingthe system’s performance. We also implement

CollectiveRationality and

Individual Rationality in Explorer’s side forall the scenarios.We suppose each Explorer has same battery and HP levelsinitially and every moving step will cost . energy.Also, every communication round and per time attacking willcost . and . energy, respectively. If Explorer isbeing attacked by Monster, it will cost . HP per time.For the Monster, per time attacking energy and per timeattacked HP cost are . and . , respectively.In the complete information strategy, we assume that ifindividual agent can perceive the adversary, it will knowthe opponent status, such as unit attacking energy cost andenergy level, vice versa. For incomplete information part,we use two different kinds of predictive models (linear andnonlinear) based on individual HP cost to predict opponentunit attacking energy e u and agent’s current energy e c . a) Linear Predictive Model: In the linear predictivemodel, we use agent’s unit HP cost and average systemHP cost to predict opponent unit attacking energy cost andenergy level correspondingly. The model can be presented asFormula. E uc = a ∗ HP uc + b ∗ n and E el = 100 − HP asc ∗ c + b ∗ n ; b) Nonlinear Predictive Model: For the nonlinear pre-dictive model, we just replace the above formulas’ linearpart with the natural logarithm formula. And the modelsare shown as E uc = d ∗ ln ( HP uc ) + b ∗ n and E el =100 − ln ( HP asc ) ∗ e + b ∗ n ; . Here, a , b , c , d and e arethe coefﬁcients, n presents the noise following the normaldistribution N (0 , . A. Environments with only intentional adversary

In this setting, we consider three kinds of environmentswith ﬁxed number of Explorers (E=25) and Monsters (M=25)but with different Monster distributions (D2, D3, D4) asshown in Fig. 4. Through this experiment, we compare the Explorer HPcost and the number of Explorer losing of killing perMonster, which can evaluate the game’s difﬁculty level.And we also analysis the average HP and energy costof per Explorer completing the task, which can presentthe system’s performance and utility. We found that theperformance of Explorer cooperation is always better thanthe noncooperation’s, which means collective rationality canbring more beneﬁt and interest comparing self-interest whenmulti-agents work with each other. Also the number ofExplorer losing of killing per Monster and the Exploreraverage HP cost present linear positive correlation with theratio of Monster and Explorer.Comparing the strategy of Explorer and Monster in TableIV, we can notice that if the Monster’s main strategy isattacking, the Explorer will lose the game. On the hand, ifthe Explorers’ main strategy is attacking or the ratio of thetimes of attacking and defending higher than certain level,they will win the game.

TABLE IVS

TRATEGY C OMPARISON NO U NINTENTIONAL A DVERSARY . R A : R ATIOOF E XPLORERS TO M ONSTERS , R:

RESULT , √ : WIN , C OM : COMPLETEINFORMATION , L:

LINEAR , A:

ATTACKING , D:

DEFENDING , AD : FREQUENCY OF AGENTS ’ ATTACK / DEFEND BEHAVIORS . Ra Com Incom L Incom NonL NoncoopR A e D e A m D m R A e D e A m D m R A e D e A m D m R A e D e A m D m √ √ √ ∞ √ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ √ ∞ √ ∞ √ ∞ √ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ ∞ √ √ ∞ √ ∞ √ ∞ √ √ √ ∞ √ ∞ √ √ B. Environments with unintentional adversary

In this experiment, we consider 25 Monsters with four dif-ferent distribution Fig. 4. Through the system performance’sanalysis Table. V, we can clearly notice that the ﬁrst scenariohas better performance than others comparing with Exploreraverage energy cost av e and Explorer average HP cost hp e .Fig. 6 and 7 provide the strategies distribution on in thesolution space based on different scenarios, which follow thenormal distribution with different µ and σ . And if the σ is ig. 5. Experiments on Scalability and Complexity.Fig. 6. Normalized frequency of choosing Attack-Nearest-Three Group in Explorers. Normalized frequency of choosing different behaviors of Monstersin diffrent game strategies under different environments (M=E=25).Fig. 7. Normalized frequency of choosing Attack-Nearest-Three Group inExplorers. large, this means the current situation has lots of uncertaintyand the group tactics (Explorer) or the individual tactics(Monster) will represent more possibility combination in thesolution space.Our experiment verify that the collective rationality can bring more beneﬁt and interest comparing self-interest whenmulti-agents work with each other. Also through reducingthe solution space by the SGDT calculation, we analysethe correlation between system performance and the tacticsof group and individual. A video demonstration of theexperiments is available in the anonymized link at https://streamable.com/bmblm . TABLE VP

ERFORMANCE C OMPARISON WITH U NINTENTIONAL A DVERSARY . av e :E XPLORER AVERAGE ENERGY COST , hp e : E XPLORER AVERAGE HP COST , l e l m : THE PROPORTION OF LOSING NUMBER OF E AND M. Envir. Com Incom L Incom NonL Noncoop av e hp e l e l m av e hp e l e l m av e hp e l e l m av e hp e l e l m D1 48.72 82.18 0.91 52.08 91.72 1.05 50.53 80.09 0.90 6.60 - -D2 59.89 90.95 1.00 67.67 80.86 0.64 63.10 90.42 0.92 7.91 - -D3 63.51 80.62 0.65 62.18 89.87 0.95 62.85 89.18 1.10 5.66 - -D4 62.80 93.20 1.09 56.20 90.16 1.00 58.64 82.72 0.74 6.54 - -

VII. C

ONCLUSIONS

Our work introduces a general decision framework

GUT- Game-Theoretic Utility Tree to mimic intelligent agentthinking process in adversarial environment, which combinehe game theory, utility theory, probabilistic graphical modeland tree structure. We deﬁne and formal the

AdversarialEnvironment from robot’s needs perspective and classifyadversary into unintentional and intentional . In order totackle the static unintentional adversaries in multi-agentssystem, we present the

Adapting The Edge distributed algo-rithm. Finally, we validate our approach through extensivesimulation experiments.The proposed architecture provide the decision level basedon our previous work SRSS, which combine the low levelplanning framework causing intelligent agent adapting andcooperating dynamical environment through its decision.This approach also leave lots of future work for furtherresearch, such as individual learning from various scenar-ios helping the entire system upgrade, decreasing duplicatecalculation saving computing resource in decision level,the appropriate utility function designing, optimizing andbuilding suitable predictive model and parameters estimation.In addition, we also plan to implement our framework in thereal robots, which can help us to develop better veriﬁcationprocedures for computational models of these systems.R

EFERENCES[1] J. S. Shamma,

Cooperative control of distributed multi-agent systems .Wiley Online Library, 2007.[2] C. Blum and D. Merkle, “Swarm intelligence,”

Swarm Intelligence inOptimization; Blum, C., Merkle, D., Eds , pp. 43–85, 2008.[3] E. Bonabeau, M. Dorigo, and G. Theraulaz, “From natural to artiﬁcialswarm intelligence,” 1999.[4] M. Dorigo, M. Birattari, and M. Brambilla, “Swarm robotics,”

Schol-arpedia , vol. 9, no. 1, p. 1463, 2014.[5] J. Shen, X. Zhang, and V. Lesser, “Degree of local cooperation and itsimplication on global utility,” in

Proceedings of the Third InternationalJoint Conference on Autonomous Agents and Multiagent Systems-Volume 2 . IEEE Computer Society, 2004, pp. 546–553.[6] T. H. Chung, G. A. Hollinger, and V. Isler, “Search and pursuit-evasionin mobile robotics,”

Autonomous robots , vol. 31, no. 4, p. 299, 2011.[7] M. Jun and R. DAndrea, “Path planning for unmanned aerial vehiclesin uncertain and adversarial environments,” in

Cooperative control:models, applications and algorithms . Springer, 2003, pp. 95–110.[8] N. Agmon, S. Kraus, and G. A. Kaminka, “Uncertainties in adver-sarial patrol,” in

Proceedings of The 8th International Conference onAutonomous Agents and Multiagent Systems-Volume 2 . InternationalFoundation for Autonomous Agents and Multiagent Systems, 2009,pp. 1267–1268.[9] M. Jouini, L. B. A. Rabai, and A. B. Aissa, “Classiﬁcation of securitythreats in information systems,”

Procedia Computer Science , vol. 32,pp. 489–496, 2014.[10] N. Agmon, G. A. Kaminka, and S. Kraus, “Multi-robot adversarialpatrolling: facing a full-knowledge opponent,”

Journal of ArtiﬁcialIntelligence Research , vol. 42, pp. 887–916, 2011.[11] R. Yehoshua and N. Agmon, “Adversarial modeling in the roboticcoverage problem,” in

Proceedings of the 2015 International Confer-ence on Autonomous Agents and Multiagent Systems . InternationalFoundation for Autonomous Agents and Multiagent Systems, 2015,pp. 891–899.[12] N. Sanghvi, S. Nagavalli, and K. Sycara, “Exploiting robotic swarmcharacteristics for adversarial subversion in coverage tasks,” in

Pro-ceedings of the 16th Conference on Autonomous Agents and MultiA-gent Systems . International Foundation for Autonomous Agents andMultiagent Systems, 2017, pp. 511–519.[13] Y. Shapira and N. Agmon, “Path planning for optimizing survivabil-ity of multi-robot formation in adversarial environments,” in . IEEE, 2015, pp. 4544–4549.[14] R. B. Myerson,

Game theory . Harvard university press, 2013.[15] P. C. Fishburn, “Utility theory for decision making,” Research analysiscorp McLean VA, Tech. Rep., 1970. [16] D. Koller and N. Friedman,

Probabilistic graphical models: principlesand techniques . MIT press, 2009.[17] N. Agmon, S. Kraus, G. A. Kaminka, and V. Sadov, “Adversarialuncertainty in multi-robot patrol,” in

Twenty-First International JointConference on Artiﬁcial Intelligence , 2009.[18] O. Keidar and N. Agmon, “Safety ﬁrst: Strategic navigation inadversarial environments,” in

Proceedings of the 16th Conference onAutonomous Agents and MultiAgent Systems . International Foun-dation for Autonomous Agents and Multiagent Systems, 2017, pp.1581–1583.[19] N. Agmon, Y. Elmaliah, Y. Mor, and O. Slor, “Robot navigation withweak sensors,” in

International Conference on Autonomous Agents andMultiagent Systems . Springer, 2011, pp. 272–276.[20] O. Keidar and N. Agmon, “Safe navigation in adversarial environ-ments,”

Annals of Mathematics and Artiﬁcial Intelligence , vol. 83,no. 2, pp. 121–164, 2018.[21] E. S. Lin, N. Agmon, and S. Kraus, “Multi-robot adversarial patrolling:Handling sequential attacks,”

Artiﬁcial Intelligence , vol. 274, pp. 1–25,2019.[22] H. Zheng, J. Panerati, G. Beltrame, and A. Prorok, “An adversarialapproach to private ﬂocking in mobile robot teams,” arXiv preprintarXiv:1909.10387 , 2019.[23] J. Paulos, S. W. Chen, D. Shishika, and V. Kumar, “Decentralization ofmultiagent policies by learning what to communicate,” arXiv preprintarXiv:1901.08490 , 2019.[24] R. Vidal, S. Rashid, C. Sharp, O. Shakernia, J. Kim, and S. Sastry,“Pursuit-evasion games with unmanned ground and aerial vehicles,” in

Proceedings 2001 ICRA. IEEE International Conference on Roboticsand Automation (Cat. No. 01CH37164) , vol. 3. IEEE, 2001, pp.2948–2955.[25] A. Kolling and S. Carpin, “Multi-robot pursuit-evasion without maps,”in .IEEE, 2010, pp. 3045–3051.[26] P. Cheng, “A short survey on pursuit-evasion games,”

Department ofComputer Science, University of Illinois at Urbana-Champaign , 2003.[27] J. S. Jang and C. Tomlin, “Control strategies in multi-player pursuitand evasion game,” in

AIAA guidance, navigation, and control con-ference and exhibit , 2005, p. 6239.[28] W. L. Scott III, “Optimal evasive strategies for groups of interact-ing agents with motion constraints,” Ph.D. dissertation, PhD thesis,Princeton University, 2017.[29] V. R. Makkapati and P. Tsiotras, “Optimal evading strategies and taskallocation in multi-player pursuit–evasion problems,”

Dynamic Gamesand Applications , pp. 1–20, 2019.[30] S. Shivam, A. Kanellopoulos, K. G. Vamvoudakis, and Y. Wardi, “Apredictive deep learning approach to output regulation: The case ofcollaborative pursuit evasion,” arXiv preprint arXiv:1909.00893 , 2019.[31] I. Rahwan, M. Cebrian, N. Obradovich, J. Bongard, J.-F. Bonnefon,C. Breazeal, J. W. Crandall, N. A. Christakis, I. D. Couzin, M. O.Jackson et al. , “Machine behaviour,”

Nature , vol. 568, no. 7753, p.477, 2019.[32] J. F. Nash et al. , “Equilibrium points in n-person games,”

Proceedingsof the national academy of sciences , vol. 36, no. 1, pp. 48–49, 1950.[33] M. J. Kochenderfer,

Decision making under uncertainty: theory andapplication . MIT press, 2015.[34] M. I. Jordan, “An introduction to probabilistic graphical models,”2003.[35] Q. Yang, Z. Luo, W. Song, and R. Parasuraman, “Self-ReactivePlanning of Multi-Robots with Dynamic Task Assignments,” in

IEEEInternational Symposium on Multi-Robot and Multi-Agent Systems(MRS) 2019 , 2019, extended Abstract.[36] U. G. Engine, “Unity game engine-ofﬁcial site,”