A Game-Theoretic Utility Network for Cooperative Multi-Agent Decisions in Adversarial Environments
Qin Yang and Ramviyas Parasuraman ∗ Department of Computer Science, University of Georgia, Athens, GA 30605, USA
Abstract — Adversarial Robotics is a burgeoning research area in Swarms and Multi-Agent Systems. It mainly focuses on agents working in dangerous, hazardous, and risky environments, which prevent robots from achieving their tasks smoothly. In Adversarial Environments, the adversaries can be intentional or unintentional based on their needs and motivations. Agents need to adopt suitable strategies according to the current situation, maximizing their utility or satisfying their needs. In this paper, we design a game-like Exploration task, where both intentional (Monsters) and unintentional (Obstacles) adversaries challenge the Explorer robots in achieving their target. In order to mimic the rational decision process of an intelligent agent, we propose a new Game-Theoretic Utility Tree (GUT) architecture combining the core principles of game theory, utility theory, probabilistic graphical models, and a tree structure, decomposing the high-level strategy into executable lower levels. We show through simulation experiments that, through the use of GUT, the Explorer agents can effectively cooperate among themselves, increase the utility of the individual agents and of the global system, and achieve higher success in task completion.
I. INTRODUCTION
Natural systems have been the key inspirations in the design, study, and analysis of multi-robot and multi-agent systems (MAS) [1], [2], [3], [4]. For example, when a simple individual agent interacts with another agent or with the environment, it usually tries to find a suitable way to adapt to the current situation and satisfy its basic needs in the short term. But in a more complex system with multiple agents, individual agents usually build cooperation alliances based on commonly agreed needs to maximize their benefits, mitigate challenges in the environment, and achieve their long-term goals and global objectives. Cooperation in MAS can maximize system utility and guarantee sustainable development for each group member [5]. On the other hand, it is also important for all agents in the MAS team to cooperatively perceive the environment and recognize the threats and adversaries in it. An
Adversary in the environment impairs the ability of the individual agents and the global MAS to achieve their tasks, and also challenges their needs in certain scenarios [6], [7], [8]. Following examples from Information Systems [9], we can classify an adversarial agent into two general categories based on its needs and motivations: intentional (actively impairing the MAS needs and capabilities, such as an enemy or intelligent opponent agent) and unintentional (potentially or passively threatening MAS abilities, like obstacles and weather). They can also be referred to as deliberate (Monsters) and accidental (Obstacles) adversaries, respectively.

Fig. 1. An illustrative game scenario where the paths (to Task) of the Explorer robots are blocked by Monster robots.

Recently, researchers have combined the disciplines of Robotics and adversarial MAS into Adversarial Robotics, which focuses on autonomous agents and mobile robots operating in adversarial environments [10], [11], [12], [6]. Generally, we can describe an adversarial environment as a scenario that combines intentional and unintentional adversaries, which prevent robots from obtaining their needs and achieving their tasks.

Moreover, in a dynamically changing environment, agents frequently decide to switch their behaviors and actions according to the situation and their needs. For example, Agent 1 might be recognized as an adversary by Agent 2 in one scenario, and when the situation leads to a change of their needs in the future, they might develop a neutral relationship or become allies and cooperatively perform a task.

Most past and current research focuses on unintentional adversaries in the environment, such as path planning avoiding static or dynamic obstacles, formation control avoiding collisions, and so forth [13]. This is particularly applicable to urban search and rescue missions and robots deployed in disaster environments, where the robots are more concerned about unintentional threats such as radiation, fire, water, etc.

In this paper, we propose a general decision architecture to mimic the rational thinking process of intelligent agents in an adversarial environment. We design a simple exploration game (see Fig. 1) which contains both intentional and unintentional adversaries. The main contributions of the paper are outlined below.

• First, we define the adversarial environment from a robot needs perspective and treat the two adversaries, unintentional and intentional, separately.

• Second, we propose a new
Game-Theoretic Utility Tree (GUT), which combines the principles and merits of Game Theory [14], Utility Theory [15], Probabilistic Graphical Models (PGM) [16], and a Tree Structure exploiting their hierarchy. It calculates suitable tactics (behaviors) based on current utilities at multiple levels and decomposes high-level strategies into low-level executable plans.

• Third, to tackle static unintentional adversaries in MAS, we present an efficient distributed algorithm called "Adapting The Edge", which combines individual adapting behaviours and group cooperation.

• Finally, we validate the proposed approach through extensive simulation experiments demonstrating its utility in a simple Exploration Game, considering various scenarios of Explorer-to-Monster ratios with distinct cooperative models under different environmental settings.

II. RELATED WORK
In the majority of the Adversarial Robotics literature, the adversaries are not artificially intelligent agents [10], [11]. They might be natural forces like wind, fire, and rain, or other creatures' aggressive behaviors. From the task's perspective, some researchers categorize multi-robot adversarial robotics into four main classes:

• Adversarial Patrol [17]
• Adversarial Coverage [10], [11]
• Adversarial Formation [13]
• Adversarial Navigation [18], [19], [20]

But most of the challenges come from the physical world, especially motion and dynamic, continuous spaces. When we model the uncertainty in an adversarial environment, we need to consider how to build suitable and specific models for robots with respect to self and opponent perception, utility calculation, decision making, and motion planning.

In recent research, Lin [21] examined the problem of defending against a sequential attack in a knowledgeable adversarial environment. Prorok [22] studied multi-robot privacy with an adversarial approach. From the swarm robotics perspective, Sanghvi and Sycara [12] identified a swarm vulnerability and studied how an adversary can take advantage of it. From a machine learning perspective, Paulos and Kumar [23] describe an architecture for training teams of identical agents in a defense game.

Some studies also focus on the multi-player pursuit and evasion game problem [24], [6], [25], which mainly deals with how to guide one or a group of pursuers to catch one or a group of moving evaders [26]. This problem covers formation keeping, conflict resolution, and optimal task allocation [27]. More recent works mainly concentrate on optimal evasion strategies and task allocation [28], [29] and predictive learning from agents' behaviors [30]. In our
Explore Game problem, an individual agent's motivation is not to pursue or catch specific agents; rather, based on shared needs and cooperation with the other agents in the system, the agents explore an adversarial area while satisfying their task requirements and mission objectives.

However, little research has been done on confrontational strategies, preventive control, and behaviors to mitigate intentional adversaries, which can be considered active, intelligent opponent agents. Intentional adversaries also play an important role in many applications such as military and defense, where a multi-robot system cooperating to achieve global missions must consider not only unintentional adversaries like wind and obstacles impeding its path towards the targets, but also intelligent adversarial agents such as enemy robots.

In a MAS, the agents have to exhibit an awareness of the environment not only at an individual agent level but also at a system level, where computational game theory provides useful examples of studies in the area of machine behaviour [31]. To address the gaps in the literature, we build a general GUT architecture combining Game Theory [32], [14], Utility Theory [15], [33], Probabilistic Graphical Models (PGM) [16], [34], and a Tree Structure to calculate and decompose the decision strategies, specifically to tackle intentional adversaries. We also design an algorithm termed "Adapting The Edge" to help the MRS avoid static unintentional adversaries efficiently.

III. PROBLEM STATEMENT
Multi-robot systems or MAS working in adversarial environments are complex distributed systems, especially when multiple groups of intelligent agents with different purposes interact with each other, presenting various relationships and behaviours. The most important challenge in this scenario is how to organize these robots to work together and adopt suitable strategies guaranteeing their maximum utility against the adversaries' tactics.

We design an Exploration Game mimicking a group of Explorer agents going through an adversarial environment to explore a treasure (Target), as shown in Fig. 1. In this scenario, several Monsters (intelligent autonomous agents) representing intentional adversaries are randomly distributed along the path to the treasure. Once the Monsters detect any Explorers, they will prevent them from passing through. Two mountain-like Obstacles are considered unintentional adversaries impeding the Explorers' movement task.

In this whole process, we assume the Monsters do not communicate with each other (acting independently) and all act on greedy self-interest (individual rationality), which means each Monster cares only about its own benefit. However, the Explorers can communicate with the other Explorer agents and share information with each other, representing collective rationality. The problem to be solved is to devise cooperative strategies for the Explorer agents such that they all collectively reach the Target while tackling both intentional and unintentional adversaries on the way. Through this problem, we aim to evaluate the individual and system utility of Explorers and Monsters under different strategies, scenarios, and environments.

Fig. 2. General Individual Robot's GUT.

IV. APPROACH OVERVIEW
To solve the above problem, the autonomous Explorer agents need to adopt various strategies and plans based on their current status (needs and utility) and optimize or guarantee the utility of the individual robots and of the collective MAS.

For the intentional adversaries (Monsters), we design the GUT architecture (Fig. 2), which calculates each level's tactics based on the current utility and decomposes the high-level decision into lower levels. We also add two kinds of assumptions at the decision level: one is irrational decision-making, meaning that if an individual's utility drops below a certain level, it presents instinctive behaviors, such as escaping, guaranteeing its safety. On the other hand, if the current condition satisfies the low-level needs, like safety or basic needs, it enters a rational decision process.

For simplicity, we build a three-level GUT to illustrate this, which guarantees that the individual robot's rational decision can be decomposed to an executable level. The first level (high level) determines whether or not to attack (or defend). The second level figures out the specific agent to be attacked (or defended from). According to the previous decision, the last level (lowest level) decides how the agents should group themselves to adapt to the current situation.

More specifically, in the first level we define that Explorers and Monsters both have two strategies, Attack and Defend, which are represented through Triangle and Regular Polygon formation shapes, respectively. According to the payoff matrix of Table I in a zero-sum game, they can calculate the strategy that fits the current situation. Then, based on the precondition of the first level, they need to decide which specific agent to attack or defend against. For example, we assume that in the attacking mode Explorers and Monsters have two kinds of behaviors: attacking the nearest agent or the agent with the lowest attacking ability in the formation. In the defending mode, they can choose to defend the nearest agent or the agent with the lowest attacking ability. Through the corresponding payoff matrix of Table II, they can confirm the target sequence. Finally, in the lowest level, according to the corresponding tactics payoff matrix of Table III, each individual can calculate the final tactics, such as the number of groups among the Explorers and whether or not the Monsters follow others.

TABLE I
LEVEL 1 EXPLORER & MONSTER TACTICS PAYOFF MATRIX

  Utility     Attack    Defend
  Attack      W_AA      W_DA
  Defend      W_AD      W_DD

Through this process, we decompose the individual agent's strategy into three levels, and each level focuses on different utilities corresponding to different needs or requirements. In order to simplify this calculation, across the levels we respectively use the Winning probability (W), the relative expected cost of energy (E(e)), and the relative expected cost of HP (health power) (E(hp)), i.e., the expected utility difference between both sides, to measure each level's utility. In the first level, the utility is the Winning probability, which relates to the number of currently perceived adversaries and to individual attacking and defending abilities. In the second level, we consider the relative expected energy cost to describe the utility, which depends on the agents' distribution and numbers. In the lowest level, we use the relative expected HP cost to represent the utility, computed from the individual's and the group's current information, such as the number of groups and the agent's current energy level. This decision decomposition also disassembles the individual needs into different levels, which mimics the intelligent agent's thinking process [35].
TABLE II
LEVEL 2 EXPLORER & MONSTER TACTICS PAYOFF MATRIX

  Utility       Nearest         A. Lowest        A. Highest
  Nearest       E(e)_{NN}       E(e)_{A_L N}     E(e)_{A_H N}
  A. Lowest     E(e)_{N A_L}    E(e)_{A_L A_L}   E(e)_{A_H A_L}
  A. Highest    E(e)_{N A_H}    E(e)_{A_L A_H}   E(e)_{A_H A_H}

TABLE III
LEVEL 3 EXPLORER & MONSTER TACTICS PAYOFF MATRIX

  Utility       One Group    Two Groups    Three Groups
  Independent   E(hp)_I      E(hp)_I       E(hp)_I
  Dependent     E(hp)_D      E(hp)_D       E(hp)_D

For the unintentional adversaries, we design the
Adapting The Edge algorithm, which helps an individual agent tackle (static) unintentional adversaries by adapting its trajectory along their edge until it finds a suitable route to the goal point. In this process, through communication and information sharing with the other Explorer agents, an individual agent can select the moving direction with the lower probability of potential collision with the unintentional adversaries. In our scenarios, the two mountains represent the unintentional adversaries, and the Explorers need to find a path passing through them.

Initially, the group of Explorers forms a patrol formation to explore the unknown world and selects the shortest path to the treasure. When they perceive an unintentional adversary (a mountain), individual agents adapt to their current situation and combine perceived and shared information to find a route around it. If they detect intentional adversaries (Monsters), each Explorer computes its current strategy based on the GUT and then cooperates with the others through negotiation and agreement. These represent a kind of global behavior exhibiting Collective Rationality and caring about the Group interest. In contrast, each Monster follows the same process but does not cooperate with the others and is Self-interested. Additionally, in this process, if an individual agent's HP value drops below a threshold, it adopts an instinctive (irrational) behavior, escaping from the current situation to satisfy its safety needs.

V. FORMALIZATION AND ALGORITHMS
Below, we formalize the adversarial environment and the decision processes for intentional and unintentional adversaries.

A. Adversarial Environment

Suppose we have three agents R_i, i ∈ {1, 2, 3}, in a certain scenario. R_1 needs to fulfil the task T_1, satisfying its current need N_1. In order to quantify the agent's need, we use the utility function of Eq. (1):

N_1 = U_1(f_1, f_2, ..., f_n), n ∈ Z^+;   (1)

where the f_n are the various factors involved in the calculation, such as time, energy, and so forth. According to R_1's capabilities, it has a solution space S_i, i ∈ {1, ..., n}, for performing the task. We assume that R_1 completes T_1 using solution s_j without any interruption (not considering R_2 and R_3), with corresponding utility value max(U_1 | s_j). Then, considering R_2 and R_3, if R_1 cannot find any solution in S_i such that Eq. (2) holds, it regards R_2 and R_3 as an Adversary:

(U_1 | s_i) = max(U_1 | s_j);   (2)

In addition, if the adversary's (say R_2's) next solution s_k, in response to s_i, can increase its current utility U_2, i.e., its expected utility is larger than U_2 as in Eq. (3), it can be regarded as an Intentional Adversary. On the other hand, if the adversary's (say R_3's) current solution does not impact U_3, or U_3 is always zero, as in Eqs. (4) and (5), we consider R_3 an Unintentional Adversary.

E(U_2 | s_k, s_i) > U_2;   (3)
E(U_3 | s_k, s_i) = U_3;   (4)
U_3 = 0;   (5)
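To make the classification rule concrete, the following minimal Python sketch (our own illustration, not part of the paper; the scalar inputs are hypothetical stand-ins for the quantities in Eqs. (2)-(5)) applies the same logic: an agent that blocks R_1 from reaching its best utility is an adversary, and it is intentional only if interfering raises its own expected utility.

```python
def classify_adversary(u1_alone, u1_with_other, other_expected_gain, other_current_utility):
    """Classify another agent from R_1's perspective (illustrative sketch of Eqs. (2)-(5)).

    u1_alone            : max(U_1 | s_j), R_1's best utility ignoring the other agent.
    u1_with_other       : best utility R_1 can still reach with the other agent present.
    other_expected_gain : E(U | s_k, s_i), the other agent's expected utility after its next action.
    other_current_utility : the other agent's current utility U.
    """
    if u1_with_other >= u1_alone:
        return "not an adversary"          # Eq. (2) is still attainable
    if other_expected_gain > other_current_utility:
        return "intentional adversary"     # Eq. (3): interfering increases its utility
    if other_expected_gain == other_current_utility or other_current_utility == 0:
        return "unintentional adversary"   # Eqs. (4)-(5): e.g., a passive obstacle
    return "unclassified"
```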
B. Intentional Adversaries Decision

For the intentional adversary decision, we have a zero-sum game G at each level of the GUT (Fig. 2), which can be described as Eq. (6):

G = {A, B; U};   (6)

Letting U = (a_ij)_{m×n} denote agent A's utility, the tactics of agents A and B can be written as Eqs. (7) and (8):

A = {AT_1, AT_2, ..., AT_m};   (7)
B = {BT_1, BT_2, ..., BT_n};   (8)

Every finite game has a Pure Strategy Nash Equilibrium or a Mixed Strategy Nash Equilibrium, so the process can be formalized in two steps.

a. Compute the Pure Strategy Nash Equilibrium. We write the agents' utility matrix as Eq. (9):

    | a_11  a_12  ...  a_1n |
    | a_21  a_22  ...  a_2n |
    | ...   ...   ...  ...  |
    | a_m1  a_m2  ...  a_mn |   (9)

The rows and columns correspond to the utilities of agents A and B, respectively. By taking the minimum of each row and the maximum of each column, we compute the maximum of the row minima and the minimum of the column maxima. If the two values satisfy Eq. (10), we obtain this level's Pure Strategy Nash Equilibrium, Eqs. (11) and (12):

min_{1≤j≤n} max_{1≤i≤m} a_ij = max_{1≤i≤m} min_{1≤j≤n} a_ij   (10)

PSNE = (A_{i*}, B_{j*});   (11)
V_G = a_{i*j*};   (12)

b. Compute the Mixed Strategy Nash Equilibrium. The tactic probabilities of agent A are written as Eq. (13):

X = (x_1, x_2, ..., x_m);  x_i ≥ 0, i = 1, 2, ..., m;  Σ_{i=1}^{m} x_i = 1   (13)

Similarly, agent B's tactic probabilities are written as Eq. (14):

Y = (y_1, y_2, ..., y_n);  y_j ≥ 0, j = 1, 2, ..., n;  Σ_{j=1}^{n} y_j = 1   (14)

As discussed above, (X, Y) defines a Mixed Situation in a given state. Then, the expected utilities of agents A and B follow as Eqs. (15) and (16):

E_A(X, Y) = Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij x_i y_j = E(X, Y);   (15)
E_B(X, Y) = −E(X, Y)   (16)

In the game G = {A, B; U}, if we collect all the mixed tactics of agents A and B as Eqs. (17) and (18), we obtain G's mixed expansion, Eq. (19). Then, if we can compute a tactic (X*, Y*) satisfying Eqs. (20) and (21), we define this tactic as the optimal strategy in the current state, and the game value is given by Eq. (22).

S*_A = {X};   (17)
S*_B = {Y};   (18)
G* = {S*_A, S*_B; E};   (19)
E(X*, Y) ≥ V_{S_A}, ∀ Y ∈ S*_B;   (20)
E(X, Y*) ≤ V_{S_B}, ∀ X ∈ S*_A;   (21)
V_{S_A} = V_G = V_{S_B}   (22)

After computing these two steps, if the result is a Pure Strategy Nash Equilibrium, the individual agent obtains a unique tactic with which it enters the next level, i.e., this tactic's probability is one hundred percent. Otherwise, in a Mixed Strategy Nash Equilibrium, it obtains several tactics with certain probabilities, and each level's set of feasible solutions provides a specific probability that becomes the next level's conditional probability. For example, if one feasible solution's probability in level i is P_ij, the probability of one branch set of feasible solutions in the next level i+1 is P_{k|ij}, k ∈ {1, ..., n}. We represent this entire process in Fig. 2. Upon reaching the lowest level, the individual agent chooses the most probable and suitable tactic set to adapt to the current situation.

To summarize, according to the set of individual tactics at each level, we build a finite solution space S. Then, by computing each level's Nash Equilibrium, we obtain the corresponding GUT for the current state.
Section
IV, in the
GUT firstlevel
W inning P robability obey
Bernoulli Distribution ,so the utility present its expectation and we can formalize itas Formula. 23. W ( t ev , t mv , r ev , r mv , n, m ) = ( a a t ev + a r ev a t mv + a r mv ) mn ; (23)The second Level’s utility is described the relative ex-pected energy cost as Formula. 24, 25, 26 and 25. And weconsider three parts of energy cost in the whole process: walking , attacking and communication . E ( d, v, f, q, n, m, φ e , φ m ) = b + b (cid:90) + ∞−∞ ( n − m ) e d ( x ) p d ( x, d )d x + b ( + ∞ (cid:88) i =1 ne a e ( i, f ) p a m ( j, mφ m ) − + ∞ (cid:88) j =1 me a m ( j, q ) p a e ( i, nφ e ))+ b ∞ (cid:88) w =1 ne c ( w ) p c ( w, dv ); (24) e d ( x ) = b x ; (25) e a ( x, y ) = b xy ; (26) e c ( x ) = b x (27)In the entire attack − def end process, the agent’s ac-tion distance, the times of attacks and being attackedand the communication times obey N ormal Distribution , P oisson Distribution separately. So we can describe theirdistribution function as Formula. 28, 29, 30 and 31. p d ( x, d ) = 1 √ π e − ( x − d )22 ; (28) p a e ( x, λ e ) = e − λ e λ xe x ! ; (29) p a m ( x, λ m ) = e − λ m λ xm x ! ; (30) p a m ( x, dv ) = e − dv ( dv ) x x ! ; (31)In the lowest level, we use the expected HP cost to explainthe utility as Formula. 32, 33, 34 and 35. H ( k, t e , t m , r e , r m , g, φ e , φ m ) = c + c ( + ∞ (cid:88) i =1 kh ( t e , r e , i ) p h m ( i, φ m ) − + ∞ (cid:88) j =1 gh ( t m , r m , j ) p h e ( j, φ e )); (32) h ( x, y, z ) = ρz ( x + y ) (33) t e,m ( e e,m ) = γ e,m e e,m ; (34) r e,m ( e e,m ) = δ e,m e e,m ; (35) p h e ( i, φ e ) and p h m ( j, φ m ) are similar to the Formula. 29and 30 correspondingly. Here, n and m present the numberof Explorers and Monsters separately; d presents the groupaverage distance between two opponents; v presents theagent’s velocity; i and j present the times of attacks andbeing attacked; w presents Explorers’ communication times; f and q present the unit attacking energy cost of both sidesagents separately; t ev and t mv present average attackingability levels of both sides separately; r ev and r mv presentaverage defending ability levels of both sides separately; t e and t m present specific agent’s attacking ability levels ofboth sides separately; r e and r m present specific agent’sdefending ability levels of both sides separately; φ e and φ m present individual agent’s size; k presents the number ofExplorers’ attacking simultaneously; g presents the numberof Monsters’ attacking simultaneously; a , b , c , ρ , γ and δ present corresponding coefficient; e e and e m present thecurrent energy level of Explorer and Monster; h present thecurrent HP level of Explorer and Monster; p presents theprobability corresponding to the different section.According to above discussion, we assume that Explorerand Monster have the same speed to move and Monstercan not communicate with each, then we can simplify theormula. 24 and 32 as Formula. 36 and 37. E ( d, v, f, q, n, m, φ e , φ m ) = b + b b ( n − m ) d + b b nm ( f φ m − qφ e )+ b b n dv ; (36) H ( k, t e , t m , r e , r m , g, φ e , φ m ) = c + c ρ [ kφ m e e ( γ e + δ e ) − gφ e e m ( γ m + δ m )]; (37)Through the Formula. 23, 36 and 37, individual can calcu-late the utility and get the Nash Equilibrium in each level’spayoff matrix. After computing the entire GUT , it needsto combine each level’s tactics and execute the integratedstrategy in the planning level. We can describe the decisionprocess as Alg. 1.
Algorithm 1: Explore Game GUT Model
Input: Explorers' and Monsters' states
Output: formation shape s; current attacking target t; number of groups g
  set state = "level one";
  while the number of Monsters has changed And the number of Monsters != 0 do
    if state == "level one" then
      Compute the Nash Equilibrium;
      Get the most feasible formation shape s;
      state = "level two";
    else if state == "level two" And s != Null then
      Compute the Nash Equilibrium;
      Get the most feasible attacking target t;
      state = "level three";
    else if state == "level three" And s, t != Null then
      Compute the Nash Equilibrium;
      Get the most feasible number of groups g;
    end
  end
  if the number of Monsters == 0 then
    s = "Patrol"; g = 1;
  end
  return s, t, g
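A compact Python rendering of this loop (an illustrative sketch of ours, reusing the hypothetical solve_zero_sum_level helper from Section V-B; build_payoff stands in for the level utilities of Eqs. (23), (36), and (37)) might look as follows. As in Alg. 1, only the maximum-probability tactic of each level is propagated to the next.

```python
def gut_decide(explorer_states, monster_states, build_payoff):
    """Three-level GUT decision loop (illustrative sketch of Alg. 1).

    `build_payoff(level, explorer_states, monster_states)` is assumed to return
    the zero-sum payoff matrix of that level, with the Explorer as row player.
    Returns (formation shape s, attacking target t, number of groups g).
    """
    if len(monster_states) == 0:
        return "Patrol", None, 1

    # Level 1: formation shape from the winning probability, Eq. (23).
    x, _, _ = solve_zero_sum_level(build_payoff(1, explorer_states, monster_states))
    s = ["Triangle", "RegularPolygon"][int(x.argmax())]   # Attack vs. Defend formation

    # Level 2: which opponent to engage, from the expected energy cost, Eq. (36).
    x, _, _ = solve_zero_sum_level(build_payoff(2, explorer_states, monster_states))
    t = ["Nearest", "AttackLowest", "AttackHighest"][int(x.argmax())]

    # Level 3: number of sub-groups, from the expected HP cost, Eq. (37).
    x, _, _ = solve_zero_sum_level(build_payoff(3, explorer_states, monster_states))
    g = int(x.argmax()) + 1                                # one, two, or three groups
    return s, t, g
```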
C. Unintentional Adversaries Decision

When Explorers perceive the mountains, which are regarded as static unintentional adversaries, they need to use the limited information available through communication and perception to pass through them. In our experiment, the scenario can be described as in Fig. 3. There are nine robots, two of which detect the mountain. For such a robot R, in order to avoid collision, it needs to switch its current direction k to the tangent direction at the nearest collision point c. Since it has two candidate directions, n and m, according to R's state it should select the direction n, which currently has more non-colliding robots.

Specifically, consider the line l, which passes through c perpendicular to the tangent, as the boundary. In direction n there are five robots, but only four of them are free of collision, while the number of non-colliding robots in direction m is three. So currently, R selects direction n, moves a certain distance Δd, then adjusts its direction toward the goal point and moves forward, looping the entire process until it perceives no unintentional adversaries on its route. Combining the two kinds of decision, we present the entire decision process in Alg. 2. For simplicity, we only propagate the maximum-probability feasible solution of each level into the next level.

Fig. 3. Illustration of the multi-robot "Adapting The Edge" formation control algorithm for obstacle avoidance.

Algorithm 2: Adapting The Edge
Input: Explorers' and mountains' states
Output: moving direction r and distance Δd
  while the nearest collision point c != Null do
    calculate the numbers n and m of non-collision agents on each side of the line l passing through c and perpendicular to c's tangent;
    if n > m then
      r = the n side of line l; Δd = one step of the agent's movement;
    else if n == m then
      the agent stops;
    else if n < m then
      r = the m side of line l; Δd = one step of the agent's movement;
    end
  end
  return r = direction from current position to the goal point, Δd
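A minimal Python sketch of this side-selection rule (our own illustration; the tangent vector, teammate positions, and collision flags are assumed inputs from perception and communication) is given below. Teammates are split by the line l through c perpendicular to the obstacle tangent, and the agent moves one step toward the side with more non-colliding teammates.

```python
import numpy as np

def adapt_the_edge_step(c, tangent, teammates, colliding, step=0.5):
    """One side-selection step of 'Adapting The Edge' (illustrative sketch of Alg. 2).

    c        : 2D nearest collision point on the obstacle boundary.
    tangent  : unit tangent of the obstacle boundary at c.
    teammates: list of 2D teammate positions (shared via communication).
    colliding: parallel list of bools, True if that teammate also faces a collision.
    Returns (direction, distance) for the next motion step.
    """
    c = np.asarray(c, dtype=float)
    tangent = np.asarray(tangent, dtype=float)

    # Line l passes through c perpendicular to the tangent; the signed projection
    # of a teammate onto the tangent tells which side of l it lies on.
    side_pos = side_neg = 0
    for p, hit in zip(teammates, colliding):
        if hit:
            continue                                   # only non-colliding teammates count
        s = float(np.dot(np.asarray(p, dtype=float) - c, tangent))
        if s > 0:
            side_pos += 1
        elif s < 0:
            side_neg += 1

    if side_pos == side_neg:
        return np.zeros(2), 0.0                        # tie: the agent stops for this step
    direction = tangent if side_pos > side_neg else -tangent
    return direction, step   # move Delta d along the edge, then re-aim at the goal point
```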
VI. EVALUATION THROUGH SIMULATIONS
Considering cross-platform support, scalability, efficiency, and extendability of the simulation, we chose Unity [36] to simulate the Explore Game and selected Gambit [37] for calculating each level's Nash Equilibrium, since it is an open-source, cross-platform collection of tools for building, analyzing, and exploring game-theoretic models.

Our experimental evaluation focuses on the distribution of the agents' tactics in the possible solution space and on using different parametric predictive models to analyse the system utility and cost. In our model, when an individual calculates its utility, three main factors are involved: the number of agents n, the individual unit attacking energy cost e_u, and the agent's current energy level e_c. Since the parameters e_u and e_c might not be known to both sides, we design two kinds of experiments to evaluate the system's performance. One uses Complete Information, meaning Explorers and Monsters know each other's status. The other uses Incomplete Information, where agents need predictive models to estimate the opponent's states and parameters.

Fig. 4. Experiments on Monsters with different distributions, considering unintentional adversaries.

In each experiment, we first consider the situation without unintentional adversaries and design 17 different scalability scenarios. Then we introduce the unintentional adversaries (two rocks), fix the numbers on both sides, and distribute the Monsters in different positions with three proportions, comparing the system's performance. We also implement Collective Rationality and Individual Rationality on the Explorers' side in all scenarios.

We suppose each Explorer initially has the same battery and HP levels, and every moving step costs . energy. Also, every communication round and each attack cost . and . energy, respectively. If an Explorer is attacked by a Monster, it costs . HP each time. For the Monster, the per-attack energy cost and the per-hit HP cost are . and . , respectively.

In the complete information strategy, we assume that if an individual agent can perceive an adversary, it knows the opponent's status, such as its unit attacking energy cost and energy level, and vice versa. For the incomplete information setting, we use two different predictive models (linear and nonlinear), based on the individual HP cost, to predict the opponent's unit attacking energy e_u and current energy e_c.

a) Linear Predictive Model: In the linear predictive model, we use the agent's unit HP cost and the average system HP cost to predict the opponent's unit attacking energy cost and energy level, respectively. The model can be written as E_uc = a · HP_uc + b · n and E_el = 100 − HP_asc · c + b · n.

b) Nonlinear Predictive Model: For the nonlinear predictive model, we replace the linear part of the above formulas with the natural logarithm. The models are E_uc = d · ln(HP_uc) + b · n and E_el = 100 − ln(HP_asc) · e + b · n. Here a, b, c, d, and e are coefficients, and n is a noise term following the normal distribution N(0, σ²).
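The two predictive models can be sketched as follows (our own illustration; the coefficient values and the noise scale sigma are placeholders, since the concrete values are not recoverable from the text):

```python
import numpy as np

def predict_opponent(hp_unit_cost, hp_avg_cost, coeffs, nonlinear=False, sigma=1.0):
    """Predict the opponent's unit attacking energy cost and energy level from
    observed HP costs (illustrative sketch of the linear / nonlinear models).

    coeffs = (a, b, c, d, e) as in the text; sigma is an assumed noise scale.
    """
    a, b, c, d, e = coeffs
    n = np.random.normal(0.0, sigma)                     # noise term n ~ N(0, sigma^2)
    if nonlinear:
        e_uc = d * np.log(hp_unit_cost) + b * n          # unit attacking energy cost
        e_el = 100.0 - np.log(hp_avg_cost) * e + b * n   # remaining energy level
    else:
        e_uc = a * hp_unit_cost + b * n
        e_el = 100.0 - hp_avg_cost * c + b * n
    return e_uc, e_el
```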
A. Environments with only intentional adversary

In this setting, we consider three kinds of environments with a fixed number of Explorers (E=25) and Monsters (M=25) but with different Monster distributions (D2, D3, D4), as shown in Fig. 4.

Through this experiment, we compare the Explorer HP cost and the number of Explorers lost per Monster killed, which reflect the game's difficulty level. We also analyse the average HP and energy cost per Explorer for completing the task, which reflect the system's performance and utility. We found that the performance with Explorer cooperation is always better than without cooperation, which means collective rationality brings more benefit than self-interest when multiple agents work with each other. Also, the number of Explorers lost per Monster killed and the Explorer average HP cost show a positive linear correlation with the ratio of Monsters to Explorers.

Comparing the strategies of Explorers and Monsters in Table IV, we notice that if the Monsters' main strategy is attacking, the Explorers lose the game. On the other hand, if the Explorers' main strategy is attacking, or if the ratio of their attacking to defending frequencies is higher than a certain level, they win the game.
TABLE IV
STRATEGY COMPARISON WITH NO UNINTENTIONAL ADVERSARY. Ra: RATIO OF EXPLORERS TO MONSTERS, R: RESULT, √: WIN, Com: COMPLETE INFORMATION, L: LINEAR, A: ATTACKING, D: DEFENDING, A/D: FREQUENCY OF AGENTS' ATTACK/DEFEND BEHAVIORS. For each strategy (Com, Incom L, Incom NonL, Noncoop), the table lists R, A_e, D_e, A_m, and D_m per Explorer-to-Monster ratio.

B. Environments with unintentional adversary
In this experiment, we consider 25 Monsters with four different distributions (Fig. 4). From the system performance analysis in Table V, we can clearly see that the first scenario performs better than the others when comparing the Explorer average energy cost av_e and the Explorer average HP cost hp_e. Figs. 6 and 7 show the strategy distributions in the solution space for the different scenarios, which follow normal distributions with different µ and σ. A large σ means the current situation has a lot of uncertainty, and the group tactics (Explorers) or the individual tactics (Monsters) cover more possible combinations in the solution space.

Fig. 5. Experiments on scalability and complexity.
Fig. 6. Normalized frequency of choosing different behaviors of Monsters in different game strategies under different environments (M=E=25).
Fig. 7. Normalized frequency of choosing Attack-Nearest-Three Group in Explorers.

Our experiments verify that collective rationality brings more benefit than self-interest when multiple agents work with each other. Also, by reducing the solution space through the GUT calculation, we analyse the correlation between system performance and the group and individual tactics. A video demonstration of the experiments is available at the anonymized link https://streamable.com/bmblm.
TABLE V
PERFORMANCE COMPARISON WITH UNINTENTIONAL ADVERSARY. av_e: EXPLORER AVERAGE ENERGY COST, hp_e: EXPLORER AVERAGE HP COST, l_e/l_m: THE PROPORTION OF LOSING NUMBER OF E AND M.

  Envir.  Com                    Incom L                Incom NonL             Noncoop
          av_e   hp_e   l_e/l_m  av_e   hp_e   l_e/l_m  av_e   hp_e   l_e/l_m  av_e  hp_e  l_e/l_m
  D1      48.72  82.18  0.91     52.08  91.72  1.05     50.53  80.09  0.90     6.60  -     -
  D2      59.89  90.95  1.00     67.67  80.86  0.64     63.10  90.42  0.92     7.91  -     -
  D3      63.51  80.62  0.65     62.18  89.87  0.95     62.85  89.18  1.10     5.66  -     -
  D4      62.80  93.20  1.09     56.20  90.16  1.00     58.64  82.72  0.74     6.54  -     -
VII. CONCLUSIONS
Our work introduces a general decision framework, GUT (Game-Theoretic Utility Tree), to mimic the thinking process of intelligent agents in adversarial environments, combining game theory, utility theory, probabilistic graphical models, and a tree structure. We define and formalize the Adversarial Environment from the perspective of a robot's needs and classify adversaries into unintentional and intentional. In order to tackle static unintentional adversaries in multi-agent systems, we present the Adapting The Edge distributed algorithm. Finally, we validate our approach through extensive simulation experiments.

The proposed architecture provides the decision level on top of our previous work SRSS, combining it with the low-level planning framework so that intelligent agents adapt to and cooperate in dynamic environments through their decisions. This approach also leaves much room for future work, such as individual learning from various scenarios to help the entire system improve, reducing duplicate calculation to save computing resources at the decision level, designing appropriate utility functions, and optimizing and building suitable predictive models and parameter estimation. In addition, we plan to implement our framework on real robots, which can help us develop better verification procedures for computational models of these systems.
REFERENCES

[1] J. S. Shamma, Cooperative Control of Distributed Multi-Agent Systems. Wiley Online Library, 2007.
[2] C. Blum and D. Merkle, "Swarm intelligence," Swarm Intelligence in Optimization; Blum, C., Merkle, D., Eds., pp. 43–85, 2008.
[3] E. Bonabeau, M. Dorigo, and G. Theraulaz, "From natural to artificial swarm intelligence," 1999.
[4] M. Dorigo, M. Birattari, and M. Brambilla, "Swarm robotics," Scholarpedia, vol. 9, no. 1, p. 1463, 2014.
[5] J. Shen, X. Zhang, and V. Lesser, "Degree of local cooperation and its implication on global utility," in Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2. IEEE Computer Society, 2004, pp. 546–553.
[6] T. H. Chung, G. A. Hollinger, and V. Isler, "Search and pursuit-evasion in mobile robotics," Autonomous Robots, vol. 31, no. 4, p. 299, 2011.
[7] M. Jun and R. D'Andrea, "Path planning for unmanned aerial vehicles in uncertain and adversarial environments," in Cooperative Control: Models, Applications and Algorithms. Springer, 2003, pp. 95–110.
[8] N. Agmon, S. Kraus, and G. A. Kaminka, "Uncertainties in adversarial patrol," in Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 2009, pp. 1267–1268.
[9] M. Jouini, L. B. A. Rabai, and A. B. Aissa, "Classification of security threats in information systems," Procedia Computer Science, vol. 32, pp. 489–496, 2014.
[10] N. Agmon, G. A. Kaminka, and S. Kraus, "Multi-robot adversarial patrolling: facing a full-knowledge opponent," Journal of Artificial Intelligence Research, vol. 42, pp. 887–916, 2011.
[11] R. Yehoshua and N. Agmon, "Adversarial modeling in the robotic coverage problem," in Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2015, pp. 891–899.
[12] N. Sanghvi, S. Nagavalli, and K. Sycara, "Exploiting robotic swarm characteristics for adversarial subversion in coverage tasks," in Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2017, pp. 511–519.
[13] Y. Shapira and N. Agmon, "Path planning for optimizing survivability of multi-robot formation in adversarial environments," in . IEEE, 2015, pp. 4544–4549.
[14] R. B. Myerson, Game Theory. Harvard University Press, 2013.
[15] P. C. Fishburn, "Utility theory for decision making," Research Analysis Corp., McLean, VA, Tech. Rep., 1970.
[16] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[17] N. Agmon, S. Kraus, G. A. Kaminka, and V. Sadov, "Adversarial uncertainty in multi-robot patrol," in Twenty-First International Joint Conference on Artificial Intelligence, 2009.
[18] O. Keidar and N. Agmon, "Safety first: Strategic navigation in adversarial environments," in Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2017, pp. 1581–1583.
[19] N. Agmon, Y. Elmaliah, Y. Mor, and O. Slor, "Robot navigation with weak sensors," in International Conference on Autonomous Agents and Multiagent Systems. Springer, 2011, pp. 272–276.
[20] O. Keidar and N. Agmon, "Safe navigation in adversarial environments," Annals of Mathematics and Artificial Intelligence, vol. 83, no. 2, pp. 121–164, 2018.
[21] E. S. Lin, N. Agmon, and S. Kraus, "Multi-robot adversarial patrolling: Handling sequential attacks," Artificial Intelligence, vol. 274, pp. 1–25, 2019.
[22] H. Zheng, J. Panerati, G. Beltrame, and A. Prorok, "An adversarial approach to private flocking in mobile robot teams," arXiv preprint arXiv:1909.10387, 2019.
[23] J. Paulos, S. W. Chen, D. Shishika, and V. Kumar, "Decentralization of multiagent policies by learning what to communicate," arXiv preprint arXiv:1901.08490, 2019.
[24] R. Vidal, S. Rashid, C. Sharp, O. Shakernia, J. Kim, and S. Sastry, "Pursuit-evasion games with unmanned ground and aerial vehicles," in Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No. 01CH37164), vol. 3. IEEE, 2001, pp. 2948–2955.
[25] A. Kolling and S. Carpin, "Multi-robot pursuit-evasion without maps," in . IEEE, 2010, pp. 3045–3051.
[26] P. Cheng, "A short survey on pursuit-evasion games," Department of Computer Science, University of Illinois at Urbana-Champaign, 2003.
[27] J. S. Jang and C. Tomlin, "Control strategies in multi-player pursuit and evasion game," in AIAA Guidance, Navigation, and Control Conference and Exhibit, 2005, p. 6239.
[28] W. L. Scott III, "Optimal evasive strategies for groups of interacting agents with motion constraints," Ph.D. dissertation, Princeton University, 2017.
[29] V. R. Makkapati and P. Tsiotras, "Optimal evading strategies and task allocation in multi-player pursuit–evasion problems," Dynamic Games and Applications, pp. 1–20, 2019.
[30] S. Shivam, A. Kanellopoulos, K. G. Vamvoudakis, and Y. Wardi, "A predictive deep learning approach to output regulation: The case of collaborative pursuit evasion," arXiv preprint arXiv:1909.00893, 2019.
[31] I. Rahwan, M. Cebrian, N. Obradovich, J. Bongard, J.-F. Bonnefon, C. Breazeal, J. W. Crandall, N. A. Christakis, I. D. Couzin, M. O. Jackson et al., "Machine behaviour," Nature, vol. 568, no. 7753, p. 477, 2019.
[32] J. F. Nash et al., "Equilibrium points in n-person games," Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48–49, 1950.
[33] M. J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application. MIT Press, 2015.
[34] M. I. Jordan, "An introduction to probabilistic graphical models," 2003.
[35] Q. Yang, Z. Luo, W. Song, and R. Parasuraman, "Self-reactive planning of multi-robots with dynamic task assignments," in IEEE International Symposium on Multi-Robot and Multi-Agent Systems (MRS) 2019, 2019, extended abstract.
[36] U. G. Engine, "Unity game engine - official site,"