Cooperative coevolution of real predator robots and virtual robots in the pursuit domain
CCooperative coevolution of real predator robots andvirtual robots in the pursuit domain (cid:73)
Lijun Sun a , Chao Lyu a , Yuhui Shi a, ∗ a Shenzhen Key Laboratory of Computational Intelligence, Department of Computer Scienceand Engineering, Southern University of Science and Technology, China
Abstract
The pursuit domain, or predator-prey problem is a standard testbed for thestudy of coordination techniques. In spite that its problem setup is apparentlysimple, it is challenging for the research of the emerged swarm intelligence.This paper presents a particle swarm optimization (PSO) based cooperativecoevolutionary algorithm for the predator robots, called CCPSO-R, where realand virtual robots coexist for the first time in an evolutionary algorithm (EA).Virtual robots sample and explore the vicinity of the corresponding real robotand act as their action spaces, while the real robots consist of the real predatorsswarm who actually pursue the prey robot without fixed behavior rules underthe immediate guidance of the fitness function, which is designed in a modularmanner with very limited domain knowledges. In addition, kinematic limitsand collision avoidance considerations are integrated into the update rules ofrobots. Experiments are conducted on a scalable predator robots swarm with4 types of preys, the statistical results of which show the reliability, generality,and scalability of the proposed CCPSO-R. Finally, the codes of this paper arepublic availabe at: https://github.com/LijunSun90/pursuitCCPSO R. (cid:73)
This work is partially supported by National Key R&D Program of China under theGrant No. 2017YFC0804002, National Science Foundation of China under the Grant No.61761136008, Science and Technology Innovation Committee Foundation of Shenzhen un-der the Grant No. ZDSYS201703031748284, Shenzhen Peacock Plan under the GrantNo. KQTD2016112514355531, Program for Guangdong Introducing Innovative and En-trepreneurial Teams under the Grant No. 2017ZT07X386. ∗ Corresponding author: Yuhui Shi.
Email addresses: [email protected] (Lijun Sun), (Chao Lyu), [email protected] (Yuhui Shi)
Preprint submitted to Journal of L A TEX Templates January 24, 2019 a r X i v : . [ c s . R O ] J a n eywords: Swarm intelligence, Cooperative coevolution, Particle swarmoptimization, Pursuit domain, Virtual robot
1. Introduction
The pursuit domain, or predator-prey problem is a classical and interestingresearch domain which acts as one of the widely used fundamental testbedsfor coordination techniques since it was proposed by Benda et al. [1]. Onone hand, its apparently simple problem setup and flexibility in approaches orconcept evaluations lead to both its popular and the toy domain impression.On the other hand, it is challenging and thus a good domain for the researchof swarm intelligence emerged from the cooperations among robots or agents,which has drawn much attention of researchers on various versions of the pursuitdomain.At first, greedy coordination strategies were manually designed by Korf [2],part of which were improved by Haynes et al. [3]. After that, Haynes et al.[3–6] tried to improve the pursuit performance using evolutionary algorithms,such as genetic programming (GP) [7], strongly typed genetic programming(STGP) [8], and cases learning methods successivelly. In addition, Undegerand Polat [9] treated the multi-agent dynamic pursuing problem in partiallyobservable environments with obstacles as the dynamic path planning and taskallocation problem, and proposed the multi-agent real-time pursuit (MAPS)algorithm. Besides, much works have been done in the field of reinforcementlearning (RL). For example, Ishiwaka et al. [10] investigated the mechanism forthe emergence of the predators’ cooperative behaviors aiming to capture theprey in the continous world. Barrett et al. [11, 12] evaluated the designed adhoc teamwork performance in the pursuit domain as one benchmark task. Asresearches going on, the capture reliablity and the efficiency of approaches haveboth been improved. Finally, a detailed survey on the pursuit domain was givenby Stone and Veloso [13].In this paper, we deal with the dynamic pursuit domain problem with a scal-2ble predator robots and types of the prey in bounded diagonal grid worlds. Dif-ferent from prior work, this paper treats the pursuit domain as an optimizationproblem and proposes a particle swarm optimization (PSO) based cooperativecoevolutionary (CC) algorithm, called CCPSO-R (R is for robots), where, tothe best knowledge of authors, real and virtual robots coexist for the first timein an evolutionary algorithm (EA). In detail, we have n subpopulations, each ofwhich evolves independently with the same population size. The first individualof each subpopulation always corresponds to a unique real robot, which consistsof the cooperative real predator robots swarm pursuing the prey. The rest arevirtual robots, which are always deployed around the corresponding real robot,exploring the real robot’s vicinity in order to guide the real robot to a moreadvantageous position under the supervision of the fitness function defined onthe pursuit task. Hence, in the view of the multiagent system (MAS), thesevirtual robots can be seen as the action space for each real robot. Since thenumber of the virtual robots is less than the vicinity size of of a real predator,the exploration of virtual robots is actually a sampling rather than an exhaustedvicinity exploration, whch is guided by a proven efficiency swarm intelligencealgorithm – PSO. Therefore, the proposed CCPSO-R can be expected to bemore efficient and effective.In addition, the collision consideration among real robots is integrated intothe fitness function design, which not only separates the robotic considerationsfrom the EAs itself and is thus different from the robotic PSO (RPSO) [14] –the PSO variant specially designed for robots, but also enhances the flexibilityof the fitness function by modular design.Furthermore, unlike previous incrementally constructed EA based methodsand RL algorithms, the proposed CCPSO-R is actually an on-line algorithmwhich plans one step ahead for each robot and can reliably capture the prey evenwithout the training and learning stage under the immediate guidances of thefitness function. Meanwhile, similar to the common strategy in RL algorithms,the prey robot and the rest predator robots (agents in MAS) are treated asparts of the dyanmic environment to the current robot without any central3ommander/controller.The rest of the paper is organized as follows. First, the pursuit domain def-inition and details adopted here are explained in Section 2. Then the proposedCCPSO-R is described in Section 3. Experiments, corresponding results anddiscussions are presented in Section 4. Finally, conclusions and directions forfuture researches are contained in Section 5.
2. The pursuit / predator-prey domain
Generally speaking, the pursuit domain problem is a game where predatorstry to capture the prey with or without coordination. However, as summarizedin [13] and metioned above, the pursuit domain has various versions dependingon different combinations of its parameters, such as the type and size of theworld, definition of the capture, team size, legal moves and move orders forthe predators and prey, distance metirc, etc. In many researches, a toroidalworld, where a robot comes out of one edge will comes in immediately from theopposite edge, is selected to simulate an infinite world. However, this kind ofworld is not practical. As depicted in Figure 1a, if the red pentagram is a linearprey which moves in a straight line towards north and just escapes the nearlyencirclement of the predaotors (blue squares), which have the same speed asthe prey, in the real infinite world, the predators will never catch the prey insuch a situation. But in the toroidal world, if the predators move as shown inFigure 1b, they will capture the linear prey in the next step. Therefore, in thispaper, rather than toroidal worlds, bounded grid worlds are selected, which canat least represent part, although not all, of the real world scenarios, such as anindoor room or an outdoor park with boundaries, etc.Besides, as classfied by Korf [2], the game with a discrete world (grid worldhere) that only allows horizontal and vertical, totally 4 directions movements, iscalled the orthogonal game, while the one which allows the horizontal, verticaland diagonal 8 directions move is called the diagonal game. Again, towardsreal applications, the diagonal game is more realistic [2] and thus one of the as-4 a) (b)
Figure 1: One trick of the coordination strategy in a toroidal world. sumptions of this paper. In particular, no collisions are allowed and orthogonalobstacles will be considered when the real (predator or prey) robot moves diag-onally, as illustrated in Figure 2. Under these assumptions, the capture can bedefined as that each of all the available orthogonal neighbors of the prey robothas been occupied by a predator robot as shown in Figure 3. This may be dif-ferent from the definitions of some research work, espeically the RL algorithmsor path planning based methods, such as [9], where the capture is defined asthat the position of the prey is occupied by a predator.
Figure 2: Illustration of the orthogonal obstacles.
As for the other details of the pursuit game, the prey robot always movesfirst and then the predator robots move one by one in a fixed order, the naturalpriorities of which can make the collision avoidance control easier and more5 a) (b) (c)
Figure 3: Illustration of the capture, taking 4 predaotor robots as an example. reliable. Besides, the position of any real (prey or predator) robot is visible to allthe other real (prey or predator) robots, but there is no explicit communications,i.e. no explicit negotiations or coordinations, among the real predator robots. Inanother word, the coordination among real predator robots, or subpopulations,are implicit here. In addition, as will be seen later, no fixed behavior rules forthe predator robots exist due to the fact that the evolution, or the one stepahead plan, of a predator robot in the dynamic environment is only guided bythe fitness function.
3. Cooperative coevolution of real and virutal robots
Since the task is to encircle a prey robot using a swarm of predator robots,the fitness function should subject to the following metrics: • CLOSURE : the prey robot should locate inside the convex hull of thepredator robots’ positions; • SWARM EXPANSE : the predator robots swarm should concentrate aroundthe prey robot, i.e., a smaller swarm expanse of the predator robots is pre-ferred; • UNIFORMITY : the predator robots should distribute uniformly aroundthe prey robot; 6
COLLISION
AVOIDANCE: collisions among predator robots and prey-predator robots are not allowed in the practical sense.It is obvious that a single predator robot itself cannot form a solution to thetask. In CCPSO-R, a complete solution to the pursuit problem is composedby the positions of all the predator robots. So, the fitness function can beformulated as follows.
Definition of Convex Hull [15]:
The convex hull of the point set P ,denoted by conv( P ), is the intersection of all convex regions that contain P .Further, we define the function: inconv ( p, conv ( P )) def = , if point p is in conv ( P )0 . , if point p is on the edge of conv ( P )1 , if point p is outside of conv ( P ) (1)Hence, the fitness function for the j th ( j = 1 , ..., N p ) individual (robot) inthe i th ( i = 1 , ..., N s ) subpopulation p ijrobots is f ij = f ijrepel · ( f ijclosure + f ijexpanse + f ijuniformity ) (2)where f ijrepel = e − · ( NND ij − D min ) , if N N D ij < D min , else (3) f ijclosure = inconv ( p prey , conv ( p robots , ..., p ijrobots , ..., p N s robots )) (4) f ijexpanse = 1 N s ( N s (cid:88) k =1 ,k (cid:54) = i | p k robots − p prey | + | p ijrobots − p prey | ) (5)and f ijuniformity = std N N N N (6)in which p prey is the position of the prey robot, N N D ij represents the nearestneighbor distance which is the minimum of the pairwise Euclidean distancesbetween the j th individual in the i th subpopulation and all the real predatorrobots in the other subpopulations, std ( · ) stands for the standard deviation7unction, and N kh ( k = 1 , h = 1 ,
2) is the counts of the real predator robots inthe ( k, h )-th bin out of the overall 4 bins splitted by the horizontal and verticallines which intersect at the position of the prey robot, as shown in Figure 4. Notethat, the number of the real predator robots on the split lines is divided by 2 andequally assigned to the two adjacent bins. Hence, N = N = N = N = 1for the example in Figure 4b. However, since the formula (6) cannot always givethe right objective uniformity assessment which is consistent with a human’ssubjective judgement, as the deadlock phenomenon shown in Figure 5a, analternative space split strategy is performed as shown in Figure 5b, and thefollowing uniformity assessment will replace equation (6) in such situations: f ijuniformity = std ([ N , N , N , N ]) + std ([ N , N , N , N ]) , (7)the first and second part of which are the axial and diagonal uniformity assess-ment, respectively. (a) 4 bins splitted around theprey robot. (b) An example for the unifor-mity assessment. Figure 4: Illustration of the uniformity assessment.
To be more clearer, the fitness evaluation of p ijrobots , i.e., the j th ( j =1 , ..., N p ) individual (robot) in the i th ( i = 1 , ..., N s ) subpopulation, is illus-trated in Figure 6. 8 a) A deadlock attributed tothe uniformity assessment byequation (6). (b) An alternative splitmethod for the uniformityassessment of equation (7). Figure 5: Illustration of the alternative uniformity assessment.Figure 6: Illustration of the fitness evaluation for p ijrobots . .2. The proposed CCPSO-R algorithm In CCPSO-R, there are N s independently evolved subpopulations with sub-population size N p , and the first individual of each subpopulation represents aunique real robot while the others represent virtual robots. All the real robotsconsist of the predator robots swarm which actually pursue the prey robot in thegrid world, while the virtual robots are to explore the vicinity of the correspond-ing real predator robot in its subpopulation and guide the predator robot to abetter position. So in this sense, virtual robots can be seen as the action spaceof the corresponding real predator robot. The real predator robot chooses itslocally optimal action, but in terms of the global benefit of the whole predatorsswarm. That is, the evaluation of the position of a robot is conducted by consid-ering the positions of the rest real predator robots in the other subpopulations.Since the proposed algorithm works as the modes of cooperative coevolutionaryalgorithms (CCEAs), it is called the cooperative coevolutionary PSO for robots(CCPSO-R), as illustrated in Algorithm 1. Two update rules are designed separately for virtual and real robots:1. For a virtual robot j ( j ∈ { , ..., N p } ), the PSO update rules are as follows: v ijrobots = nnd ( w · v ijrobots + c · r · ( pi ijrobots − p ijrobots )+ c · r · ( pg irobots − p ijrobots )) (8) p ijrobots = nbn (( p ijrobots + v ijrobots ) , p i robots ) (9)where nnd ( v ) = arg min p n ∈ SN | ∠ p n − ∠ v | (10)and SN = { (1 , , (1 , , (0 , , ( − , , ( − , , ( − , − , (0 , − , (1 , − } . (11) nnd ( v ) outputs one of the 8 unit vectors in SN which has the minimum angledistance with the newly generated velocity v . By using the function nnd ( · ),10 lgorithm 1: CCPSO-R Initialization while the prey is not captured and time limit is not reached do for each subpopulation do Re-evaluate the subpopulation due to environmental changes for each virtual robot do Update its velocity and position using (8) and (9) Evaluate the fitness together with the rest real predator robots if unique virutal robots < T v then Re-initiate and re-evaluate the virtual robots Update the velocity and position of the real predator robot using(13) and (14) Evaluate the fitness of the real predator robot together with therest real predator robots if the real predator robot becomes the global best then Re-initiate and re-evaluate the virtual robots else if the predator robot gets trapped in the local optimum then Add a random noise to the real predator robot’s position Re-evaluate the whole population if the real predator robots swarm get trapped in the local optimum then Add random noises to all the real predator robots’ positions Re-evaluate the whole populationevery robot can only move one step by one step. In this way, unlike the multi-steps case in a general PSO, the path planning and the worry about collisionsin the half way to a destination are not ever necessary . v ijrobots is the velocityfor the j th individual (robot) in the i th subpopulation which has the position p ijrobots . In addition, pi ijrobots is the individual historical best position for the11 th individual (robot) in the i th subpopulation, while pg irobots is the global bestposition of the i th subpopulation. The coefficient w ∈ R is called the inertiaweight, c , c ∈ R + , and r , r are uniformly distributed random numbers in therange of (0 , nbn ( p ijrobots , p i robots ) = arg min p i b | ∠ ( p i b − p i robots ) − ∠ ( p ijrobots − p i robots ) | , if p ijrobots is out of the vicinity of p i robots p ijrobots , else (12)outputs the nearest boundary neighbor p i b in the constrainted vicinity of thereal predator robot p i robots . This function is illustrated in Figure 7, where theconstrained vicinity of p i robots is shown in a dashed square, which is determinedas the minimum one that can accommodate all the unique virtual robots. Notethat, the nbn ( p ijrobots , p i robots ) function in Equation (9) is very important becauseit can assure all the virtual robots p ijrobots ( j ≥
2) are in the constrained vicinityof the real predator robot p i robots , without which the subpopulation may losethe vicinity exploring capability for the real predator robot. Figure 7: Illustration of the function nbn ( p ijrobots , p i robots ) in Equation (12).
12. For the real robot ( j = 1), the PSO update rules are as follows: v i robots = nnd ( pg irobots − p i robots ) (13) p i robots = p i robots + v i robots (14)So, the real predator robot does not need to perform any exploring task, butjust quickly becomes the global best in its subpopulation.To summarize, by utilizing different optimization mechanisms for differentkinds of robots, virtual robots are responsible for exploring and finding potentialbetter positions in the vicinity of the real predator robot, while the real predatorrobot in each subpopulation just makes use of the achievements of the virtualrobots and becomes the global best. From the practical point of view, no collisions of any two real robots areallowed. Since we have totally N s subpopulations in the cooperative coevolutionpopulation, a priority scheduler is used to coordinate among them, as shown inFigure 8.To be as simple as possible, here we let the priorities be in consistent with theindexes of the subpopulations. In another word, after the prey robot moves, thesubpopulations evolve one-by-one and the newly updated real predator robot iscounted into the dynamics of the environment for the fitness evaluations of thefollowing subpopulations. So, if k > h , the predator p k robots always moves aheadof the predator p h robots .To evaluate the fitness of the j th individual in the i th subpopulation, acomplete solution should be first composed by replacing the i th real predatorrobot with p ijrobots from the real predator robots swarm: (cid:104) p robots , ..., p ( i − robots , p ijrobots , p ( i +1)1 robots , ..., p N s robots (cid:105) . Then, the fitness of a robot can be evaluated by (2).13 igure 8: Illustration of the priority scheduler for the CC based fitness evaluation.
When a swarm intelligence algorithm converges, all individuals may be at-tracted to the same position, no matter it is the global or local optimum. How-ever, for the pursuit case here, the convergence of virtual robots in a subpopu-lation brings the disadvantage that the capability of exploring potential betterpositions is getting worse. Therefore, if the number of unique virtual robotsin a subpopulation is defined as the subpopulation diversity, the diversity ofeach subpopulation must be maintained to keep its exploring capability. Be-sides, due to the existence of unexpected deadlocks, suitable strategies shouldbe integrated in the coordination algorithm to deal with such problems.Based on the above ideas, we propose the diversity maintainance mechanismswhich are performed as follows: • Update the population in each generation based on the scheme that thefitness of the newly generated individual is not worse than its parent robot,which will guide the robot to explore more positions with no harm to thefitness. • Redistribute the virtual robots once the number of unique virtual robots’14ositions in a subpopulation decreases below a threshold T v , i.e., the sub-population converges. That is, the subpopulation has found better solu-tions and all robots are attracted to the global best. In this situation,virtual robots should be redistributed to the space for better exploration.This strategy corresponds to the lines 8-9 in Algorithm 1. • Redistribute virtual robots once the real predator robot becomes the globalbest in the subpopulation. Because the role of virtual robots is to help thecorresponding real predator robot to find better positions, once this realpredator robot becomes the global best in its subpopulation, the object ofvirtual robots is reached and they should be redistributed to the space tofind potential better positions for the real predator robot. This strategycorresponds to the lines 12-13 in Algorithm 1. • Add a random noise to the position of the real predator robot if it is notthe global best in its subpopulation but abnormally keep stills for a longtime, in which it must get stuck in a deadlock. This strategy correspondsto the lines 14-16 in Algorithm 1. • Add random noises to the positions of all the real predator robots if theyconverges under the situation that the prey robot has not been captured.This strategy corresponds to the lines 17-19 in Algorithm 1.
4. Experiments
In this section, several experiments are conducted in a 30 ×
30 grid world toverify the performance of the proposed CCPSO-R. In particular, to verify thegenerality of CCPSO-R, four types of preys are implemented. The prey robotis initially put in the center of the world, but behaves differently according toits type defined as follows: • STILL PREY: the still prey keeps still in its initial position forever. • RANDOM PREY: the random prey randomly moves to a next positionaccording to the uniform distribution.15
LINEAR PREY: the linear prey chooses one of the 8 directions in whichthe number of predator robots is minimum, and moves in that direction ina straight line. However, when the way of the linear prey is blocked by apredator robot, it cannot move any more but wait for the other predatorrobots coming to encircle it. • SMARTER LINEAR PREY: the smarter linear prey, represented as lin-ear smart, is very similar to the linear prey. The only difference is thatwhen its way is blocked by a predator robot, it moves to an occuppiedneighbor which has the minimum angle distance with its current directionand after that it continues to move in the previous direction if there areobstacles.From the above descriptions, the capabilities of the preys can be intuitivelyranked as ”still prey < random prey < linear prey < linear smart prey”, whichwill be further verfied by the following experiments. Moreover, to verify thescalability of CCPPSO-R, various sizes of the predators swarm are used, fromwhich we can expect the advantages originated from the swarm intelligence ofthe swarm predator robots.The other implementation details are as follows: the initial real predatorrobots are deployed randomly in the whole grid world without overlapping, thepopulation size of each subpopulation is 20, the prey robot moves in 90% ofthe time, D min = 1 in equation (3), and parameters in equation (9) are set as w = 1 , c = c = 2.In addition, note the line 4 in Algorithm 1, i.e., when a subpopulation isre-evaluated due to the past changes in the environment, e.g., the changes ofthe real predator robots’ positions in the other subpopulations, the individualhistorical best position pi ijrobots will not be inherited, and the global best pg irobots will be re-calculated here. This is because, although the experimental results ofinheriting and not inheriting the inidvidual historical memoery pi ijrobots differ, itis hard to select either one due to their competitive performances.As for the performance metrics, we use the number of captures, the average16umber of moves to capture the prey, and their standard derivatives from 100randomly generated test cases given the maximum 1000 time steps, the randomseeds of which are set as 1 to 100. The simulation results are summarized in Table 1, from which it can be seenthat CCPSO-R is reliable with the capture rate being 100% in a limited time, nomatter what type of the prey it is. As expected before, to a swarm of predatorrobots, the difficulties, in terms of the average number of moves, to captureeach type of prey can be generally ranked as ”still prey < random prey < linearprey < linear smart prey”, which can be seen more clearly from Figure 9. Thisconclusion is in consistent with the common opinion in literatures (such as [5]and [13]) that compared with the random prey, the straight line moving preyis more effective because it breaks the movement locality. Hence, the straightline moving prey is more difficult to be captured, which leads to the low capturerates in previous work, such as the manually designed methods [2, 5], EA basedmethod [5] and the case learning method [6].In addition, we show the data of Table 1 in the manner of Figure 10, fromwhich an evident fact can be found that the more predator robots the moreefficient the pursuit is. Besides, from the decreasing standard derivates as morereal predator robots are involved, as shown in Figure 9, it can be concluded thatwith the swarm size of the predator robots gets larger, the pursuit performanceis gettting more and more stable and robust.To give a more intuitive impression to the pursuit process, several represen-tative episodes taking from an experiment of the linear smart prey are displayedin Figure 11.
5. Conclusions
This paper treated the pursuit domain as an optimization problem and pre-sented the cooperative coevolutionary algorithm – CCPSO-R, which, for the17 able 1: Number of captures, average number of moves, and their standard deriviations tocapture different preys with various number of predators out of 100 test cases.
Number of Predators PreyStill Random Linear Linear Smart4 Captures 100 100 100 100Avg. of moves 30.450 49.840 46.900 204.060Std. of moves 19.943 36.867 31.886 198.9278 Captures 100 100 100 100Avg. of moves 22.220 33.780 42.240 121.820Std. of moves 13.384 23.039 46.583 111.92212 Captures 100 100 100 100Avg. of moves 20.470 24.520 30.190 76.780Std. of moves 11.364 13.414 24.943 72.83916 Captures 100 100 100 100Avg. of moves 17.360 18.360 25.620 49.850Std. of moves 9.648 11.277 23.560 50.04824 Captures 100 100 100 100Avg. of moves 15.060 14.060 19.670 35.400Std. of moves 10.688 6.151 21.879 32.588first time, introduces the combination of the real robots and virtual robots intothe correspondences between the individual representation of an EA and therobots in an application. Before our work, an individual in an EA will be as-signed to a real robot. However, in our algorithm, only the first individual ineach subpopulation corresponds to a real robot, while the rest individuals areall vritual robots, who act as a kind of action space for real robots by samplingand exploring their vicinities.Besides, it should be noted that there are no fixed behavior rules for thepredator robots swarm. Instead, the robots swarm are guided immediately by18 igure 9: Box plot of the moves to capture a specific prey with a specific number of predatorrobots.Figure 10: Bar graph of the average moves to capture a specific prey with a specific numberof predator robots. the fitness funcion, which is designed in a modular manner by incorporatingvery limited domain knowledges. As one module, the collision avoidance con-sideration is integrated in the fitness function, which itself is another fitnessfunction for repelling and can be versatile by tuning its paramter D min . If the D min = 1, as it is in this paper, the robot swarm can capture the prey whilemoving without collisions. 19 a) (b) (c)(d) (e) (f)(g) (h) (i) Figure 11: Illustration of the pursuit process, taking the linear smart prey as an example. (a)is the initialization. From (a)-(b), the prey moves in the southeast direction in a straight line.In (c), the prey encounters an orthogonal real predator robot. So, in (d), the prey moves toa nearest unoccupied neighbor in the south. After that, from (d) to (e), the prey continuesto move in its previous straight line direction. Until (f), the prey reaches an edge. So, in (g),the prey re-selects the north as its new escape direction. In (h), the prey is captured. And in(i), the predator robots swarm converge.
Acknowledgements
We would like to thank Qiqi Duan for his help in this research work.
ReferencesReferences [1] M. Benda, V. Jagannathan, R. Dodhiawala, On optimal cooperationof knowledge sources-an empirical investigation, Technical Report BCS-G2010-28, Boeing Advanced Technology Center, Boeing Computing Ser-vices, Seattle, Washington.[2] R. E. Korf, A simple solution to pursuit games, in: Working Papers of The11th International Workshop on Distributed Artificial Intelligence, 1992,pp. 183–194. 213] T. Haynes, R. L. Wainwright, S. Sen, Evolving cooperation strategies., in:ICMAS, 1995, p. 450.[4] T. Haynes, R. L. Wainwright, S. Sen, D. A. Schoenefeld, Strongly typedgenetic programming in evolving cooperation strategies., in: ICGA, Vol. 95,1995, pp. 271–278.[5] T. Haynes, S. Sen, Evolving behavioral strategies in predators and prey, in:G. Weiß, S. Sen (Eds.), Adaption and Learning in Multi-Agent Systems,Springer Berlin Heidelberg, Berlin, Heidelberg, 1996, pp. 113–126.[6] T. Haynes, S. Sen, Learning cases to resolve conflicts and improve groupbehavior, International Journal of Human-Computer Studies 48 (1) (1998)31–49.[7] J. R. Koza, Genetic programming, MIT Press, Cambridge, MA, 1992.[8] D. J. Montana, Strongly typed genetic programming, Evolutionary Com-putation 3 (2) (1995) 199–230. arXiv:https://doi.org/10.1162/evco.1995.3.2.199 , doi:10.1162/evco.1995.3.2.199 .URL https://doi.org/10.1162/evco.1995.3.2.199 [9] C. Undeger, F. Polat, Multi-agent real-time pursuit, AutonomousAgents and Multi-Agent Systems 21 (1) (2010) 69–107. doi:10.1007/s10458-009-9102-0 .[10] Y. Ishiwaka, T. Sato, Y. Kakazu, An approach to the pursuit problem ona heterogeneous multiagent system using reinforcement learning, Roboticsand Autonomous Systems 43 (4) (2003) 245 – 256. doi:https://doi.org/10.1016/S0921-8890(03)00040-X .[11] S. Barrett, P. Stone, S. Kraus, Empirical evaluation of ad hoc teamwork inthe pursuit domain, in: The 10th International Conference on AutonomousAgents and Multiagent Systems - Volume 2, AAMAS ’11, InternationalFoundation for Autonomous Agents and Multiagent Systems, Richland,SC, 2011, pp. 567–574. 2212] S. Barrett, A. Rosenfeld, S. Kraus, P. Stone, Making friends on the fly:Cooperating with new teammates, Artificial Intelligence 242 (2017) 132 –171. doi:https://doi.org/10.1016/j.artint.2016.10.005 .[13] P. Stone, M. Veloso, Multiagent systems: A survey from a machine learningperspective, Autonomous Robots 8 (3) (2000) 345–383. doi:10.1023/A:1008942012299 .[14] M. S. Couceiro, R. P. Rocha, N. M. F. Ferreira, A novel multi-robot explo-ration approach based on particle swarm optimization algorithms, in: 2011IEEE International Symposium on Safety, Security, and Rescue Robotics,2011, pp. 327–332. doi:10.1109/SSRR.2011.6106751doi:10.1109/SSRR.2011.6106751