Multiple Pursuer Multiple Evader Differential Games
Eloy Garcia, David W. Casbeer, Alexander Von Moll, Meir Pachter
arXiv preprint [math.OC]
Abstract
In this paper an $N$-pursuer vs. $M$-evader team conflict is studied. The differential game of border defense is addressed and we focus on the game of degree in the region of the state space where the pursuers are able to win. This work extends classical differential game theory to simultaneously address weapon assignments and multi-player pursuit-evasion scenarios. Saddle-point strategies that provide guaranteed performance for each team regardless of the actual strategies implemented by the opponent are devised. The players' optimal strategies require the co-design of cooperative optimal assignments and optimal guidance laws. A representative measure of performance is proposed and the Value function of the game is obtained. It is shown that the Value function is continuous, continuously differentiable, and that it satisfies the Hamilton-Jacobi-Isaacs equation; the curse of dimensionality is overcome and the optimal strategies are obtained. The cases of $N = M$ and $N > M$ are considered. In the latter case, cooperative guidance strategies are also developed in order for the pursuers to exploit their numerical advantage. This work provides a foundation to formally analyze complex and high-dimensional conflicts between teams of $N$ pursuers and $M$ evaders by means of differential game theory.
I. INTRODUCTION
Differential game theory provides the right framework to analyze pursuit-evasion problems and, as a corollary, combat games. Pursuit-evasion scenarios involving multiple agents are important but challenging problems in aerospace, control, and robotics. Pursuit-evasion problems were first formulated in the seminal work [1]. Concerning many-player games, reference [2] addressed the interesting dynamic game of a fast pursuer trying to capture in minimal time two slower evaders in succession. Motivated by the work in [2], the paper [3] analyzed the case where the fast pursuer tries to capture multiple evaders. Reach and avoid differential games which include time-varying dynamics, targets and constraints were addressed in [4] by means of a modified Hamilton-Jacobi-Isaacs (HJI) equation in the form of a double-obstacle variational inequality. The work in [4] has interesting applications in collision avoidance, motion planning, and aircraft control. The authors of [5] provided a game formulation to solve reach and avoid problems involving nonlinear systems. Other approaches and applications regarding reach and avoid games are found in [6], [7].
This work has been supported in part by AFOSR LRIR No. 18RQCOR036.
E. Garcia, D. Casbeer, and A. Von Moll are with the Control Science Center of Excellence, Air Force Research Laboratory, Wright-Patterson AFB, OH 45433. Corresponding author [email protected]
M. Pachter is with the Department of Electrical Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH 45433.
November 12, 2019 DRAFT
The paper [8] considered a group of cooperative pursuers that try to capture a single evader within a bounded domain. The domain may also contain obstacles and the solution employs Voronoi partitions of the plane. Similar games concerning multiple pursuers that try to capture an evader have been addressed in [9]–[13]. In the references [14]–[18], cooperative behaviors within pursuit-evasion games are analyzed in order to protect or rescue teammates in the presence of adversarial entities.
Several papers have addressed pursuit-evasion scenarios and missile interception problems posed as differential games, see e.g. [19]–[22]. The Linear Quadratic Differential Game (LQDG) formulation is a particular instance of differential games and, owing to its analytical tractability, is well suited for maneuverable target interception using a homing missile and for missile-missile engagements. In their pioneering work Ho, Bryson, and Baron [23] introduced the LQDG formulation to specifically address pursuit-evasion problems. The flexibility of the LQDG formulation for addressing target interception problems by tuning the relevant weights was highlighted in [24]. The recent work in [25] developed a method for intercepting a moving target by formulating a linear quadratic differential game. In this vein, an LQDG for intercepting a missile and protecting a target was addressed in [26].
Turetsky and Shinar [27] compared the solutions of two differential games for target interception: LQDG and a norm differential game (NDG); in the latter the cost/payoff is only a function of the miss distance and a hard bound on the players' controls is imposed. Several observations were made in [27] including the smaller control effort required in the LQDG than in the NDG for the same initial conditions and problem parameters.
The scenario addressed in this paper is a multi-player Border Defense Differential Game (BDDG). The players are divided into two opposing teams: the pursuer team and the evader team (or $N$ vs. $M$, blue team vs. red team). The agents in each team cooperate in order to optimize the team's performance. We emphasize that while the members within a given team cooperate among themselves, the game is non-cooperative since opposing teams do not cooperate. Members of the pursuer team are tasked to pursue the members of the evader team and capture them before they can reach the border. Hence, the solution of the game should provide state feedback, optimal guidance strategies and also the optimal assignments of $N$ pursuers to $M$ evaders, which is a discrete decision problem with combinatorial overtones. In other words, the players (pursuers and evaders) need to dynamically determine their optimal headings/guidance/maneuvers over the continuum of space and time. Simultaneously, the team has to obtain the optimal assignments over a discrete space of possibilities.
The evaders' aim is to reach the border. In the case where the evaders are captured before reaching the border, the evaders try to minimize their combined terminal distance from the border. The pursuers strive to capture the evaders while maximizing the same metric.
In this paper, we provide a team cooperative optimal solution of this problem that can be implemented in real time and is thus able to exploit non-optimal adversary strategies and maneuvers.
The paper [28] addresses a similar differential game to the BDDG formulated in this paper but with only one invader and one defender. The work in [29] also addresses the problem of guarding a line segment and it extends to the case of one intruder and a number of defenders; however, the defenders are constrained to move only along a straight line. In [30] the authors propose a similar algorithm to the Rapidly-exploring Random Tree (RRT) and RRT* to compute solutions of particular pursuit-evasion problems where the evader is only aware of the initial state of the pursuer, while the pursuer has access to full information about the evader's trajectory.
The recent papers [31] and [32] present two of the most related scenarios and approaches to the problem discussed here. In the recent paper [31], the authors address the pursuit-evasion problem where a set of attackers tries to reach a target while avoiding a set of defenders. In the game proposed in that reference only open-loop strategies are considered where a given team is assigned to select its strategy first and the opposing team follows with its response. Such a scenario becomes a Stackelberg game [33]. Furthermore, due to the open-loop nature of the solution concept and the decomposition approach, the authors focus on computational approaches for solving Hamilton-Jacobi-Bellman (HJB) local equations (to avoid the curse of dimensionality) as an approximation of the HJI equation of the overall game over the high dimensional state space of all player states. Reference [32] focuses on approximating the solution of the Hamilton-Jacobi-Isaacs equation and for the players to implement "semi-open-loop" control strategies.
As stated in [32], Isaacs' method is the ideal approach to solve differential games, if it is attainable. A disadvantage of Isaacs' method is that it does not scale well as the dimension of the state increases. We will show in this paper that the curse of dimensionality is overcome for the game under analysis and Isaacs' method is applicable. The solution presented in this paper is not an approximation but the optimal solution of the game over the complete state space. In fact, we are able to obtain the (closed-form) optimal solution of the operationally relevant multi-player BDDG. We provide the complete solution of the BDDG: we derive state feedback optimal strategies for each player, the Value function $V(\mathbf{x})$ is obtained, and it is shown that $V(\mathbf{x})$ is $C^1$ and it satisfies the Hamilton-Jacobi-Isaacs (HJI) Partial Differential Equation (PDE).
The paper is organized as follows. The two-team multi-agent BDDG is formally stated in Section II. Section III addresses the case of two pursuers against two evaders. The more general case of $N$ pursuers vs. $M$ evaders is considered in Section IV. Examples are shown in Section V and extensions are discussed in Section VI. Concluding remarks are made in Section VII.
II. THE GAME
We consider a multi-agent pursuit-evasion differential game where each agent belongs to either one of two opposing teams. This problem presents unique challenges within classical differential game theory. In addition to computing state feedback optimal strategies, this game also requires the optimal assignment of pursuers to evaders to determine which pursuer captures which evader. In other words, we need to co-design the optimal guidance strategies, in the form of state feedback strategies, and the optimal assignments which are represented by discrete variables. The hybrid nature of the problem has rarely been addressed within the theory of differential games [31], [32], [34].
An $N$ vs. $M$ team differential game is considered with $N$ defenders/pursuers and $M$ attackers/evaders. It is assumed that $N \ge M$. The players in the pursuer team are denoted by $P_i$, $i = 1, \dots, N$ and their speeds are $v_{P_i} \in [\underline{v}_{P_i}, \bar{v}_{P_i}]$, where $\underline{v}_{P_i} > 0$ and $\bar{v}_{P_i}$ denote, respectively, the minimum and maximum speed of player $P_i$. Similarly, the players in the evader team are denoted by $E_j$, $j = 1, \dots, M$ and their speeds are denoted by $v_{E_j} \in [\underline{v}_{E_j}, \bar{v}_{E_j}]$. It can be shown that optimal strategies demand maximum speed by each player, hence, for simplicity, we denote $v_{P_i} = v^*_{P_i} = \bar{v}_{P_i}$ for $i = 1, \dots, N$ and $v_{E_j} = v^*_{E_j} = \bar{v}_{E_j}$ for $j = 1, \dots, M$. It is assumed that the pursuers are faster than the evaders, so the speed ratios satisfy $\alpha_{ij} = v_{E_j}/v_{P_i} < 1$, for $i = 1, \dots, N$ and $j = 1, \dots, M$. The obtained results can be extended to the case where a subset of pursuers are slower than a subset of evaders by imposing a constraint on the assignments where slow pursuers cannot be assigned to intercept faster evaders.
The states of $P_i$ and $E_j$ are given by their Cartesian coordinates $\mathbf{x}_{P_i} = (x_{P_i}, y_{P_i})$ and $\mathbf{x}_{E_j} = (x_{E_j}, y_{E_j})$.
The complete state of the differential game is defined by $\mathbf{x} := (x_{P_i}, y_{P_i}, x_{E_j}, y_{E_j}) \in \mathbb{R}^{2(N+M)}$. The players have simple motion, so the control variables of the pursuer team are given by the cooperative instantaneous heading angles of each player $P_i$, that is, $u_P = \{\psi_i\}$ for $i = 1, \dots, N$. The evader team controls the state of the system by cooperatively choosing the instantaneous headings of each evader $E_j$, that is, $u_E = \{\phi_j\}$ for $j = 1, \dots, M$. The dynamics $\dot{\mathbf{x}} = f(\mathbf{x}, u_E, u_P)$ are specified by the system of $2(N+M)$ differential equations
$$\begin{aligned}
\dot{x}_{P_i} &= v_{P_i}\cos\psi_i, & x_{P_i}(0) &= x_{P_{i0}} \\
\dot{y}_{P_i} &= v_{P_i}\sin\psi_i, & y_{P_i}(0) &= y_{P_{i0}} \\
\dot{x}_{E_j} &= v_{E_j}\cos\phi_j, & x_{E_j}(0) &= x_{E_{j0}} \\
\dot{y}_{E_j} &= v_{E_j}\sin\phi_j, & y_{E_j}(0) &= y_{E_{j0}}
\end{aligned} \tag{1}$$
where the admissible controls are the players' headings $\psi_i, \phi_j \in [-\pi, \pi)$. The initial state of the system is defined as $\mathbf{x}_0 := (x_{P_{i0}}, y_{P_{i0}}, x_{E_{j0}}, y_{E_{j0}}) = \mathbf{x}(t_0) \in \mathbb{R}^{2(N+M)}$. We consider the specific scenario of border defense where the border line is given by the $x$-axis of the Euclidean plane and the game is played in the half plane $y \ge 0$. Define the binary variable $\mu_{ij}$ such that $\mu_{ij} = 1$ if pursuer $i$ is assigned to capture evader $j$ and $\mu_{ij} = 0$ otherwise. For any pursuer-evader pair $i, j$ such that $\mu_{ij} = 1$, the game will terminate in two possible ways. The first termination criterion is met if $y_{E_j} = 0$, meaning that the evader reaches the border before being captured by the assigned pursuer. Otherwise the game will terminate when the pursuer captures the evader. We consider the case where a pursuer can be assigned to at most one evader, that is, $\sum_{j=1}^{M} \mu_{ij} \le 1$. In this paper, we consider point capture and we focus on the Game of Degree in the state space region $\mathcal{R}_P \subset \mathbb{R}^{2(N+M)}$ where capture of all evaders is guaranteed and thus the pursuers' team is the winner. However, the obtained strategies also provide the solution to the Game of Kind; this is discussed at the end of Section IV-B.
We define the state of binary variables $\mu = \{\mu_{ij}\}$, for $i = 1, \dots, N$ and $j = 1, \dots, M$. Also define the augmented state $\bar{\mathbf{x}} = [\mathbf{x}^T \ \mu^T]^T$. In the winning region of the pursuers, the terminal set is then given by
$$\mathcal{T} := \left\{ \bar{\mathbf{x}} \;\middle|\; \forall j \in \{1, \dots, M\},\ \exists i \in \{1, \dots, N\},\ \mu_{ij} = 1,\ x_{P_i} = x_{E_j},\ y_{P_i} = y_{E_j} \right\}. \tag{2}$$
Note that (2) includes the case $N > M$ where more than one pursuer could be assigned to an evader. In such a case the pursuers assigned to the same evader will also need to determine a cooperative pursuit strategy. This will be clarified in Section IV-B.
The terminal time $t_f$ is defined as the time instant when the state of the system satisfies (2), that is, the time instant when the last evader is captured. We define the individual terminal times $t_{f_{ij}}$ corresponding to the interception of $E_j$ by $P_i$. In order to guarantee regularity of the solutions we define $\dot{x}_{P_i} = \dot{y}_{P_i} = \dot{x}_{E_j} = \dot{y}_{E_j} = 0$ for $t \ge t_{f_{ij}}$. These definitions allow for the game to continue until all evaders are captured.
The terminal cost/payoff functional is
$$J(u_P(t), u_E(t); \mathbf{x}_0) = \Phi(\mathbf{x}_f) \tag{3}$$
where
$$\Phi(\mathbf{x}_f) := \sum_{j=1}^{M} y_{E_j}(t_f). \tag{4}$$
The cost/payoff functional depends only on the terminal state; the BDDG is a terminal cost/Mayer type game. Its Value is given by
$$V(\mathbf{x}_0) := \max_{u_P(\cdot)} \min_{u_E(\cdot)} J(u_P(\cdot), u_E(\cdot); \mathbf{x}_0) \tag{5}$$
subject to (1) and (2), where $u_P(\cdot)$ and $u_E(\cdot)$ are the teams' state feedback strategies.
The cost/payoff functional (4) represents an important measure of combat effectiveness; it represents the potential risk or threat to the strategic asset being defended. This risk is inversely proportional to the distance between the point of interception and the $x$-axis, a.k.a. the defended border. As such, the evaders, knowing that the initial conditions make them unable to reach the border, wish to be intercepted as close as possible to the border and increase the level of their threat to the border.
In case the pursuers err, a saddle point state feedback strategy for the evaders will allow them to end up closer to the border and, perhaps, reach it before being captured by the pursuers. The pursuers aim at intercepting the evaders as far as possible from the border. Similarly, a saddle point state feedback strategy for the pursuers will provide a robust strategy to capture the evaders, regardless of what (unknown to the pursuers) guidance law the evaders implement. Furthermore, the pursuers will only increase the total distance from the border to the terminal capture points when the evaders do not implement their optimal strategies. These objectives highlight the importance of saddle point state feedback strategies (the main result in this paper) which can be implemented on-line and provide robustness against any possible maneuver by the adversaries.
We will consider a firm commitment to the initial assignment by the pursuers; this means that $\mu_{ij}(t) = \mu_{ij}(t_0)$, that is, the pursuers do not switch assignments during the engagement. In addition to providing the foundation for a framework to analyze more complex scenarios, which include switching assignments and capture in succession, the case of commitment is useful by itself in several applications such as missile interception. When the evaders represent missiles trying to reach the border, the pursuers are then represented by interceptor missiles. Knowing the positions of the incoming missiles, the interceptors will solve the differential game and track/lock on the assigned missile and disregard the other missiles. Since interceptor $i$ is locked on missile $j$ it will not detonate its warhead until meeting its objective, which prevents collisions with other interceptors nearby.
Let the co-state be represented by
$$\lambda^T = (\lambda_{x_{P_1}}, \lambda_{y_{P_1}}, \dots, \lambda_{x_{P_N}}, \lambda_{y_{P_N}}, \lambda_{x_{E_1}}, \lambda_{y_{E_1}}, \dots, \lambda_{x_{E_M}}, \lambda_{y_{E_M}}) \in \mathbb{R}^{2(N+M)}. \tag{6}$$
The Hamiltonian of the differential game is
$$H = \sum_{i=1}^{N} v_{P_i}\left(\lambda_{x_{P_i}}\cos\psi_i + \lambda_{y_{P_i}}\sin\psi_i\right) + \sum_{j=1}^{M} v_{E_j}\left(\lambda_{x_{E_j}}\cos\phi_j + \lambda_{y_{E_j}}\sin\phi_j\right). \tag{7}$$
Theorem 1:
Consider the cooperative differential game (1)-(5). The headings of the players $E_j$ and $P_i$ are constant under optimal play and the optimal trajectories are straight lines.
Proof. The proof follows from the fact that the agents have simple motion and the cost is of Mayer type. □
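The reasoning behind Theorem 1 can be sketched with the standard co-state argument applied to the Hamiltonian (7). The block below is an illustrative outline, not the authors' own derivation; the sign convention (pursuers maximize $H$, evaders minimize it) is an assumption taken from the max-min ordering in (5), and the argument applies up to each capture time $t_{f_{ij}}$.

```latex
% Sketch only: standard necessary conditions applied to (7).
% H does not depend on the state x, so every co-state is constant:
\dot{\lambda} = -\frac{\partial H}{\partial \mathbf{x}} = 0 .
% H is linear in (cos, sin) of each heading, so the maximizing pursuer
% heading and the minimizing evader heading are (assumed convention):
\cos\psi_i^* = \frac{\lambda_{x_{P_i}}}{\sqrt{\lambda_{x_{P_i}}^2 + \lambda_{y_{P_i}}^2}}, \qquad
\sin\psi_i^* = \frac{\lambda_{y_{P_i}}}{\sqrt{\lambda_{x_{P_i}}^2 + \lambda_{y_{P_i}}^2}},
\cos\phi_j^* = -\frac{\lambda_{x_{E_j}}}{\sqrt{\lambda_{x_{E_j}}^2 + \lambda_{y_{E_j}}^2}}, \qquad
\sin\phi_j^* = -\frac{\lambda_{y_{E_j}}}{\sqrt{\lambda_{x_{E_j}}^2 + \lambda_{y_{E_j}}^2}} .
% Constant co-states give constant headings; with constant speeds in (1)
% the optimal trajectories are therefore straight lines.
```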
Apollonius circle. The Apollonius circle is the locus of points $S$ with a fixed ratio of distances to two given points which are called foci. Let the instantaneous positions of $E_j$ and $P_i$ be the foci, where the fixed ratio is $\alpha_{ij} = \overline{E_j S} / \overline{P_i S}$. Players $E_j$ and $P_i$ travel at constant speeds and with constant heading where $P_i$ aims at capturing $E_j$ at a point $I = (x_I, y_I)$ on the circumference of the Apollonius circle. The Apollonius circle divides the plane into two dominance regions. The interior of the circle is $E_j$'s dominance region: $E_j$ can reach any point inside the circle before $P_i$; on the other hand, any point outside the circle can be reached by $P_i$ before $E_j$ does.
At any point $S$ on the circle, the distance traveled by $E_j$ is equal to $\alpha_{ij}$ times the distance traveled by $P_i$. It is important to note that in a differential game the aimpoint of a player is not guessed by the adversary but it is determined by the solution of the differential game which provides the optimal strategies of each player. This means that each player, by solving the differential game, obtains the optimal strategies for itself and also its opponents. When a state feedback solution is obtained, actual use of non-optimal strategies is to the detriment of the player which does not implement its optimal strategy, and this benefits the adversary.
III. 2 VS. 2 DIFFERENTIAL GAME
In this section we will address the case of 2 pursuers versus 2 evaders. In the 2 vs. 2 BDDG the state is given by $\mathbf{x} := (x_{E_1}, y_{E_1}, x_{E_2}, y_{E_2}, x_{P_1}, y_{P_1}, x_{P_2}, y_{P_2}) \in \mathbb{R}^8$. Let us define in general
$$y_{ij}(\mathbf{x}) = \frac{y_{E_j} - \alpha_{ij}^2 y_{P_i} - \alpha_{ij}\sqrt{(x_{E_j} - x_{P_i})^2 + (y_{E_j} - y_{P_i})^2}}{1 - \alpha_{ij}^2}. \tag{8}$$
For the case of two pursuers and two evaders let
$$\begin{aligned}
y_{s_1}(\mathbf{x}) &= y_{11}(\mathbf{x}) + y_{22}(\mathbf{x}) \\
y_{s_2}(\mathbf{x}) &= y_{12}(\mathbf{x}) + y_{21}(\mathbf{x}).
\end{aligned} \tag{9}$$
The following theorem provides the solution of the 2 vs. 2 differential game: it dictates what the optimal assignment is and it also provides the state feedback optimal headings for each one of the four players.
Theorem 2:
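Consider the 2 vs. 2 BDDG (1)-(5) with $\alpha_{ij} = v_{E_j}/v_{P_i} < 1$, and where $\mathbf{x} \in \mathcal{R}_P$. The Value function is continuous, continuously differentiable (except at the dispersal surface $y_{s_1} = y_{s_2}$), and it satisfies the HJI equation. The Value function is explicitly given by $V(\mathbf{x}) = y_{s_1}(\mathbf{x})$ if $y_{s_1} > y_{s_2}$ and $V(\mathbf{x}) = y_{s_2}(\mathbf{x})$ if $y_{s_2} > y_{s_1}$. The optimal state feedback strategies are given by
$$\begin{aligned}
\cos\phi_1^* &= \frac{x^*_{E_1} - x_{E_1}}{\sqrt{(x^*_{E_1} - x_{E_1})^2 + (y^*_{E_1} - y_{E_1})^2}}, &
\sin\phi_1^* &= \frac{y^*_{E_1} - y_{E_1}}{\sqrt{(x^*_{E_1} - x_{E_1})^2 + (y^*_{E_1} - y_{E_1})^2}} \\
\cos\phi_2^* &= \frac{x^*_{E_2} - x_{E_2}}{\sqrt{(x^*_{E_2} - x_{E_2})^2 + (y^*_{E_2} - y_{E_2})^2}}, &
\sin\phi_2^* &= \frac{y^*_{E_2} - y_{E_2}}{\sqrt{(x^*_{E_2} - x_{E_2})^2 + (y^*_{E_2} - y_{E_2})^2}} \\
\cos\psi_1^* &= \frac{x^*_{P_1} - x_{P_1}}{\sqrt{(x^*_{P_1} - x_{P_1})^2 + (y^*_{P_1} - y_{P_1})^2}}, &
\sin\psi_1^* &= \frac{y^*_{P_1} - y_{P_1}}{\sqrt{(x^*_{P_1} - x_{P_1})^2 + (y^*_{P_1} - y_{P_1})^2}} \\
\cos\psi_2^* &= \frac{x^*_{P_2} - x_{P_2}}{\sqrt{(x^*_{P_2} - x_{P_2})^2 + (y^*_{P_2} - y_{P_2})^2}}, &
\sin\psi_2^* &= \frac{y^*_{P_2} - y_{P_2}}{\sqrt{(x^*_{P_2} - x_{P_2})^2 + (y^*_{P_2} - y_{P_2})^2}}
\end{aligned} \tag{10}$$
where the players' optimal aimpoints are
$$\begin{aligned}
x^*_{E_1} = x^*_{P_1} &= \frac{x_{E_1} - \alpha_{11}^2 x_{P_1}}{1 - \alpha_{11}^2}, &
y^*_{E_1} = y^*_{P_1} &= \frac{y_{E_1} - \alpha_{11}^2 y_{P_1} - \alpha_{11} d_{11}}{1 - \alpha_{11}^2} \\
x^*_{E_2} = x^*_{P_2} &= \frac{x_{E_2} - \alpha_{22}^2 x_{P_2}}{1 - \alpha_{22}^2}, &
y^*_{E_2} = y^*_{P_2} &= \frac{y_{E_2} - \alpha_{22}^2 y_{P_2} - \alpha_{22} d_{22}}{1 - \alpha_{22}^2}
\end{aligned} \tag{11}$$
if $y_{s_1} > y_{s_2}$, and
$$\begin{aligned}
x^*_{E_1} = x^*_{P_2} &= \frac{x_{E_1} - \alpha_{21}^2 x_{P_2}}{1 - \alpha_{21}^2}, &
y^*_{E_1} = y^*_{P_2} &= \frac{y_{E_1} - \alpha_{21}^2 y_{P_2} - \alpha_{21} d_{21}}{1 - \alpha_{21}^2} \\
x^*_{E_2} = x^*_{P_1} &= \frac{x_{E_2} - \alpha_{12}^2 x_{P_1}}{1 - \alpha_{12}^2}, &
y^*_{E_2} = y^*_{P_1} &= \frac{y_{E_2} - \alpha_{12}^2 y_{P_1} - \alpha_{12} d_{12}}{1 - \alpha_{12}^2}
\end{aligned} \tag{12}$$
if $y_{s_2} > y_{s_1}$, where
$$d_{ij} = \sqrt{(x_{E_j} - x_{P_i})^2 + (y_{E_j} - y_{P_i})^2} \tag{13}$$
for $i, j = 1, 2$.
Proof. In the 2 vs. 2 engagement, where the pursuers commit to their initial assignment, there are only two possible options for assignments and they are as follows.
$A_1$: $\mu_{11} = \mu_{22} = 1$, where $P_1$ is assigned to intercept $E_1$ and $P_2$ is assigned to intercept $E_2$; the cost/payoff is $y_{s_1}$.
$A_2$: $\mu_{12} = \mu_{21} = 1$, where $P_1$ is assigned to intercept $E_2$ and $P_2$ is assigned to intercept $E_1$; the cost/payoff is $y_{s_2}$.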
Fig. 1. BDDG example: two pursuers vs. two evaders
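In order to determine both the optimal assignment and the optimal headings, we look at the four Apollonius circles which are generated by pairing each pursuer with each evader; see Fig. 1. The center coordinates of each circle are denoted by $(x_{c_{ij}}, y_{c_{ij}})$ and the radius is denoted by $r_{ij}$, for $i, j = 1, 2$. For each pair $P_i E_j$ the optimal interception point is given by the lowest point on the corresponding Apollonius circle. This point is denoted by $y_{ij} = y_{c_{ij}} - r_{ij}$; the corresponding $x$-coordinate is $x_{ij} = x_{c_{ij}}$.
For the assignment $A_1$ the cost/payoff incurred is $y_{s_1} = y_{11} + y_{22}$. For the assignment $A_2$ the cost/payoff incurred is $y_{s_2} = y_{12} + y_{21}$. Both $y_{s_1}$ and $y_{s_2}$ can be explicitly written in terms of the state $\mathbf{x}$; they are given by (8)-(9). Finally, the optimal assignment is given by $\iota^* = \arg\max_{\iota = 1, 2} y_{s_\iota}$.
In order to show that $V(\mathbf{x}) = y_{s_1}(\mathbf{x})$ is continuously differentiable, we obtain the partial derivatives of the Value function with respect to each element of the state as follows
$$\begin{aligned}
\frac{\partial V}{\partial x_{E_1}} &= -\frac{\alpha_{11}}{1 - \alpha_{11}^2} \cdot \frac{x_{E_1} - x_{P_1}}{d_{11}}, &
\frac{\partial V}{\partial y_{E_1}} &= \frac{1}{1 - \alpha_{11}^2}\left(1 - \alpha_{11}\frac{y_{E_1} - y_{P_1}}{d_{11}}\right) \\
\frac{\partial V}{\partial x_{E_2}} &= -\frac{\alpha_{22}}{1 - \alpha_{22}^2} \cdot \frac{x_{E_2} - x_{P_2}}{d_{22}}, &
\frac{\partial V}{\partial y_{E_2}} &= \frac{1}{1 - \alpha_{22}^2}\left(1 - \alpha_{22}\frac{y_{E_2} - y_{P_2}}{d_{22}}\right) \\
\frac{\partial V}{\partial x_{P_1}} &= \frac{\alpha_{11}}{1 - \alpha_{11}^2} \cdot \frac{x_{E_1} - x_{P_1}}{d_{11}}, &
\frac{\partial V}{\partial y_{P_1}} &= \frac{\alpha_{11}}{1 - \alpha_{11}^2}\left(-\alpha_{11} + \frac{y_{E_1} - y_{P_1}}{d_{11}}\right) \\
\frac{\partial V}{\partial x_{P_2}} &= \frac{\alpha_{22}}{1 - \alpha_{22}^2} \cdot \frac{x_{E_2} - x_{P_2}}{d_{22}}, &
\frac{\partial V}{\partial y_{P_2}} &= \frac{\alpha_{22}}{1 - \alpha_{22}^2}\left(-\alpha_{22} + \frac{y_{E_2} - y_{P_2}}{d_{22}}\right)
\end{aligned} \tag{14}$$
where the terms in the denominators satisfy $d_{ij} > 0$ for $t < t_{f_{ij}}$.
We will now show that the Value function $V(\mathbf{x}) = y_{s_1}(\mathbf{x})$ satisfies the HJI equation. To do so we compute the following
$$\begin{aligned}
x^*_{E_1} - x_{E_1} &= -\alpha_{11} d_{11}\frac{\partial V}{\partial x_{E_1}}, &
y^*_{E_1} - y_{E_1} &= -\alpha_{11} d_{11}\frac{\partial V}{\partial y_{E_1}} \\
x^*_{E_2} - x_{E_2} &= -\alpha_{22} d_{22}\frac{\partial V}{\partial x_{E_2}}, &
y^*_{E_2} - y_{E_2} &= -\alpha_{22} d_{22}\frac{\partial V}{\partial y_{E_2}} \\
x^*_{P_1} - x_{P_1} &= \frac{d_{11}}{\alpha_{11}}\frac{\partial V}{\partial x_{P_1}}, &
y^*_{P_1} - y_{P_1} &= \frac{d_{11}}{\alpha_{11}}\frac{\partial V}{\partial y_{P_1}} \\
x^*_{P_2} - x_{P_2} &= \frac{d_{22}}{\alpha_{22}}\frac{\partial V}{\partial x_{P_2}}, &
y^*_{P_2} - y_{P_2} &= \frac{d_{22}}{\alpha_{22}}\frac{\partial V}{\partial y_{P_2}}.
\end{aligned} \tag{15}$$
The HJI equation for regular solutions is given by $-\frac{\partial V}{\partial t} = \frac{\partial V}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \psi^*, \phi^*) + g(t, \mathbf{x}, \psi^*, \phi^*)$. Note that in this problem $\frac{\partial V}{\partial t} = 0$ and $g(t, \mathbf{x}, \psi^*, \phi^*) = 0$. Using eqs. (1), (10), and (15) we obtain the following
$$\begin{aligned}
\frac{\partial V}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_1^*, \phi_2^*, \psi_1^*, \psi_2^*)
&= v_{E_1}\left(\frac{\partial V}{\partial x_{E_1}}\cos\phi_1^* + \frac{\partial V}{\partial y_{E_1}}\sin\phi_1^*\right)
+ v_{P_1}\left(\frac{\partial V}{\partial x_{P_1}}\cos\psi_1^* + \frac{\partial V}{\partial y_{P_1}}\sin\psi_1^*\right) \\
&\quad + v_{E_2}\left(\frac{\partial V}{\partial x_{E_2}}\cos\phi_2^* + \frac{\partial V}{\partial y_{E_2}}\sin\phi_2^*\right)
+ v_{P_2}\left(\frac{\partial V}{\partial x_{P_2}}\cos\psi_2^* + \frac{\partial V}{\partial y_{P_2}}\sin\psi_2^*\right) \\
&= -\alpha_{11} v_{P_1}\sqrt{\left(\frac{\partial V}{\partial x_{E_1}}\right)^2 + \left(\frac{\partial V}{\partial y_{E_1}}\right)^2}
+ v_{P_1}\sqrt{\left(\frac{\partial V}{\partial x_{P_1}}\right)^2 + \left(\frac{\partial V}{\partial y_{P_1}}\right)^2} \\
&\quad - \alpha_{22} v_{P_2}\sqrt{\left(\frac{\partial V}{\partial x_{E_2}}\right)^2 + \left(\frac{\partial V}{\partial y_{E_2}}\right)^2}
+ v_{P_2}\sqrt{\left(\frac{\partial V}{\partial x_{P_2}}\right)^2 + \left(\frac{\partial V}{\partial y_{P_2}}\right)^2}
\end{aligned} \tag{16}$$
where
$$\begin{aligned}
\sqrt{\left(\frac{\partial V}{\partial x_{E_1}}\right)^2 + \left(\frac{\partial V}{\partial y_{E_1}}\right)^2} &= \frac{1}{1 - \alpha_{11}^2}\sqrt{1 + \alpha_{11}^2 - 2\alpha_{11}\frac{y_{E_1} - y_{P_1}}{d_{11}}} \\
\sqrt{\left(\frac{\partial V}{\partial x_{P_1}}\right)^2 + \left(\frac{\partial V}{\partial y_{P_1}}\right)^2} &= \frac{\alpha_{11}}{1 - \alpha_{11}^2}\sqrt{1 + \alpha_{11}^2 - 2\alpha_{11}\frac{y_{E_1} - y_{P_1}}{d_{11}}} \\
\sqrt{\left(\frac{\partial V}{\partial x_{E_2}}\right)^2 + \left(\frac{\partial V}{\partial y_{E_2}}\right)^2} &= \frac{1}{1 - \alpha_{22}^2}\sqrt{1 + \alpha_{22}^2 - 2\alpha_{22}\frac{y_{E_2} - y_{P_2}}{d_{22}}} \\
\sqrt{\left(\frac{\partial V}{\partial x_{P_2}}\right)^2 + \left(\frac{\partial V}{\partial y_{P_2}}\right)^2} &= \frac{\alpha_{22}}{1 - \alpha_{22}^2}\sqrt{1 + \alpha_{22}^2 - 2\alpha_{22}\frac{y_{E_2} - y_{P_2}}{d_{22}}}.
\end{aligned} \tag{17}$$
Substituting (17) into (16) we obtain
$$\begin{aligned}
\frac{\partial V}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_1^*, \phi_2^*, \psi_1^*, \psi_2^*)
&= -\frac{1}{1 - \alpha_{11}^2}\sqrt{1 + \alpha_{11}^2 - 2\alpha_{11}\frac{y_{E_1} - y_{P_1}}{d_{11}}}\left(\alpha_{11} v_{P_1} - \alpha_{11} v_{P_1}\right) \\
&\quad - \frac{1}{1 - \alpha_{22}^2}\sqrt{1 + \alpha_{22}^2 - 2\alpha_{22}\frac{y_{E_2} - y_{P_2}}{d_{22}}}\left(\alpha_{22} v_{P_2} - \alpha_{22} v_{P_2}\right) = 0
\end{aligned}$$
and the Value function $V(\mathbf{x}) = y_{s_1}(\mathbf{x})$ satisfies the HJI equation.
Using $V(\mathbf{x}) = y_{s_2}(\mathbf{x})$ and the corresponding interception points shown in (12), it is possible to show that $V(\mathbf{x}) = y_{s_2}(\mathbf{x})$ is continuous, continuously differentiable, and it satisfies the HJI equation by following similar steps to (14)-(17).
Finally, the singular surface $y_{s_1}(\mathbf{x}) = y_{s_2}(\mathbf{x})$ corresponds to a dispersal surface where both assignments $A_1$ and $A_2$ are optimal. Clearly, at the dispersal surface, the Value function is continuous since $V(\mathbf{x}) = y_{s_1}(\mathbf{x}) = y_{s_2}(\mathbf{x})$; however, $V(\mathbf{x})$ is not continuously differentiable. For instance, in general,
$$\frac{\partial y_{s_1}}{\partial x_{E_1}} = -\frac{\alpha_{11}}{1 - \alpha_{11}^2} \cdot \frac{x_{E_1} - x_{P_1}}{d_{11}} \neq -\frac{\alpha_{21}}{1 - \alpha_{21}^2} \cdot \frac{x_{E_1} - x_{P_2}}{d_{21}} = \frac{\partial y_{s_2}}{\partial x_{E_1}}.$$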
Corresponding expressions hold for the remaining partial derivatives.
Similar to most dispersal surfaces in pursuit-evasion differential games, when presented with this scenario, the agents choose one of the two equally optimal assignments and the state of the system leaves the dispersal surface. Oddly, this dispersal surface benefits the pursuers, that is, the pursuers do not lose any performance by selecting a different assignment than the evaders. However, the evaders may see (a possibly large) increase in their combined cost if they assume the wrong assignment (i.e. the one not selected by the pursuers) since they will try to evade pursuers which are not actually pursuing them. □
Remark 1:
Although the pursuers commit to their initial assignment, that is, each one locks on a given evader and keeps pursuing it until capture occurs, the optimal headings in (10) are state feedback policies. As such, the pursuers are able to react to non-optimal strategies by the evaders by continuously recomputing their optimal headings given by (10). When an evader does not follow its prescribed optimal strategy, not only is it captured by the assigned pursuer, but the terminal cost/payoff also increases with respect to the Value of the game. This is of benefit to the pursuers.
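The 2 vs. 2 solution of Theorem 2 can be evaluated directly from the current state, which is what makes the recomputation described in Remark 1 cheap. The following Python sketch is illustrative only: the function names (`lowest_point`, `heading`, `solve_2v2`), the zero-based indexing, and the tie-breaking rule on the dispersal surface (the first assignment wins) are assumptions made here, not part of the paper.

```python
import math

def lowest_point(pursuer, evader, alpha):
    """Lowest point of the P-E Apollonius circle, i.e. the aimpoint of (11)/(12)
    whose y-coordinate is y_ij in (8). alpha = v_E / v_P < 1 is assumed."""
    (xp, yp), (xe, ye) = pursuer, evader
    d = math.hypot(xe - xp, ye - yp)          # d_ij in (13)
    k = 1.0 - alpha ** 2
    return ((xe - alpha ** 2 * xp) / k, (ye - alpha ** 2 * yp - alpha * d) / k)

def heading(position, aimpoint):
    """Heading angle toward the aimpoint, equivalent to the cos/sin pair in (10)."""
    return math.atan2(aimpoint[1] - position[1], aimpoint[0] - position[0])

def solve_2v2(pursuers, evaders, v_p, v_e):
    """Return (value, assignment, aimpoints) for the committed 2 vs. 2 game.

    assignment[i] is the (zero-based) evader index of pursuer i, and
    aimpoints[i] is shared by pursuer i and its assigned evader.
    On the dispersal surface the first candidate is returned (assumption).
    """
    candidates = []
    for assignment in ((0, 1), (1, 0)):        # A_1 and A_2
        aim = [lowest_point(pursuers[i], evaders[j], v_e[j] / v_p[i])
               for i, j in enumerate(assignment)]
        candidates.append((sum(a[1] for a in aim), assignment, aim))
    return max(candidates, key=lambda c: c[0])  # pursuers maximize y_s

# Hypothetical initial state: pursuers below the evaders, border at y = 0.
pursuers = [(0.0, 4.0), (10.0, 4.0)]
evaders = [(0.0, 10.0), (10.0, 10.0)]
value, assignment, aim = solve_2v2(pursuers, evaders, [1.0, 1.0], [0.5, 0.5])
for i, j in enumerate(assignment):
    print(f"P{i + 1} -> E{j + 1}: aimpoint {aim[i]}, "
          f"psi = {heading(pursuers[i], aim[i]):.3f}, "
          f"phi = {heading(evaders[j], aim[i]):.3f}")
print("Value:", value)
```

Calling `solve_2v2` again at each control step with the measured positions gives the state feedback behavior of Remark 1; under commitment, only the aimpoints of the initially selected assignment would be recomputed.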
Remark 2:
The regular saddle point solution to the 2 vs. 2 differential game can be computed by each agent individually without need for communication between teammates. Since all agents are aware of the state of the system, each agent can compute the complete solution: the optimal assignment and the optimal headings of every player. Then, each player implements its own optimal strategy. This is not the case when the state of the system resides on the dispersal surface $y_{s_1}(\mathbf{x}) = y_{s_2}(\mathbf{x})$. Since both assignments are equally optimal, both pursuers could assign themselves to intercept the same evader while leaving the remaining evader free to reach the border. Hence, deconfliction needs to occur through a single communication event, e.g., one pursuer is given priority and identifies the evader it has chosen to pursue and the second pursuer assigns itself to the remaining evader.
IV. MULTI-AGENT DIFFERENTIAL GAME
In this section we extend the BDDG to address the case of $N$ pursuers and $M$ evaders, for $N = M$ and for $N > M$. The case $N = M$ is presented first in order to introduce the enumeration of feasible assignments. Next, the more general case $N > M$ is addressed, which involves cooperative guidance between two pursuers in order to intercept an evader.
A. Case: $N = M$
We start by enumerating all feasible assignments $A_\iota$. Feasible assignments are those assignments where all evaders can be potentially captured. For instance, in Fig. 2, the Apollonius circle of one pursuer-evader pair (shown by the bold dot-dashed line) intersects the $x$-axis; hence, any assignment matching that pursuer with that evader is not feasible. Thus, only four of the $3! = 6$ one-to-one assignments in Fig. 2 are feasible; they are denoted $A_1, \dots, A_4$, each specifying three pairs with $\mu_{ij} = 1$.
In general, the number of feasible assignments is denoted by $\bar{\iota}$ so the assignment index is $\iota = 1, \dots, \bar{\iota}$. Define
$$y_{s_\iota}(\mathbf{x}) = \sum_{j=1}^{M} \mu^{\iota}_{ij}\, y_{ij}(\mathbf{x}) \tag{18}$$
for $\iota = 1, \dots, \bar{\iota}$, where the assignment variables $\mu^{\iota}_{ij}$ are specified by the corresponding assignment $A_\iota$. The optimal assignment variables are denoted by $\mu^*_{ij}$.
Theorem 3:
Consider the $N$ vs. $M$ BDDG where $N = M$, $\alpha_{ij} = v_{E_j}/v_{P_i} < 1$, and $\mathbf{x} \in \mathcal{R}_P$. The Value function is continuous, continuously differentiable (except at dispersal surfaces $y_{s_\iota} = y_{s_{\iota'}}$ for any $\iota, \iota' = 1, \dots, \bar{\iota}$), and it satisfies the HJI equation. The Value function is explicitly given by $V(\mathbf{x}) = \max_\iota y_{s_\iota}(\mathbf{x})$. The corresponding optimal assignment is $A_{\iota^*}$ with $\iota^* = \arg\max_\iota y_{s_\iota}(\mathbf{x})$. The optimal state feedback strategies are given by
$$\begin{aligned}
\cos\phi_j^* &= \frac{x^*_{E_j} - x_{E_j}}{\sqrt{(x^*_{E_j} - x_{E_j})^2 + (y^*_{E_j} - y_{E_j})^2}}, &
\sin\phi_j^* &= \frac{y^*_{E_j} - y_{E_j}}{\sqrt{(x^*_{E_j} - x_{E_j})^2 + (y^*_{E_j} - y_{E_j})^2}} \\
\cos\psi_i^* &= \frac{x^*_{P_i} - x_{P_i}}{\sqrt{(x^*_{P_i} - x_{P_i})^2 + (y^*_{P_i} - y_{P_i})^2}}, &
\sin\psi_i^* &= \frac{y^*_{P_i} - y_{P_i}}{\sqrt{(x^*_{P_i} - x_{P_i})^2 + (y^*_{P_i} - y_{P_i})^2}}
\end{aligned} \tag{19}$$
where the optimal aimpoints are
$$x^*_{E_j} = x^*_{P_i} = \frac{x_{E_j} - \alpha_{ij}^2 x_{P_i}}{1 - \alpha_{ij}^2}, \qquad
y^*_{E_j} = y^*_{P_i} = \frac{y_{E_j} - \alpha_{ij}^2 y_{P_i} - \alpha_{ij} d_{ij}}{1 - \alpha_{ij}^2} \tag{20}$$
for a pair $E_j/P_i$ such that $\mu^*_{ij} = 1$, where
$$d_{ij} = \sqrt{(x_{E_j} - x_{P_i})^2 + (y_{E_j} - y_{P_i})^2} \tag{21}$$
for $i, j = 1, \dots, N$.
Proof. The proof follows that of Theorem 2 and it is omitted here for brevity. □
Fig. 2. BDDG example: three pursuers vs. three evaders (axes: $x$, $y$)
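The enumeration behind Theorem 3 can be sketched as a brute-force search over one-to-one assignments. The Python code below is a minimal illustration under stated assumptions: the names `y_lowest` and `best_assignment` are hypothetical, indices are zero-based, and a pair is treated as infeasible whenever its Apollonius circle touches or crosses the $x$-axis, which is tested here as $y_{ij} \le 0$ (the tangent case is an assumption).

```python
import math
from itertools import permutations

def y_lowest(pursuer, evader, alpha):
    """y_ij of (8): lowest y-coordinate on the P_i-E_j Apollonius circle."""
    (xp, yp), (xe, ye) = pursuer, evader
    d = math.hypot(xe - xp, ye - yp)
    return (ye - alpha ** 2 * yp - alpha * d) / (1.0 - alpha ** 2)

def best_assignment(pursuers, evaders, v_p, v_e):
    """Enumerate one-to-one assignments for N = M and return (value, perm),
    where perm[i] is the evader index assigned to pursuer i.
    Returns None when no feasible assignment exists."""
    n = len(pursuers)
    y = [[y_lowest(pursuers[i], evaders[j], v_e[j] / v_p[i]) for j in range(n)]
         for i in range(n)]
    best = None
    for perm in permutations(range(n)):
        # Skip assignments containing a pair whose circle reaches the border.
        if any(y[i][j] <= 0.0 for i, j in enumerate(perm)):
            continue
        total = sum(y[i][j] for i, j in enumerate(perm))   # y_s for this assignment
        if best is None or total > best[0]:
            best = (total, perm)
    return best

# Hypothetical 3 vs. 3 state with all speed ratios equal to 0.5.
pursuers = [(0.0, 4.0), (10.0, 4.0), (20.0, 4.0)]
evaders = [(0.0, 10.0), (10.0, 10.0), (20.0, 10.0)]
print(best_assignment(pursuers, evaders, [1.0] * 3, [0.5] * 3))
```

The search visits all $N!$ permutations, which is adequate for small teams. Because $y_{s_\iota}$ is a sum of pairwise terms, a linear assignment solver could in principle replace the loop for larger $N$; that substitution is a suggestion of this sketch rather than something stated in the paper.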
B. Case: $N > M$
We now consider the multi-agent BDDG with $N$ pursuers and $M$ evaders with $N > M$. The numerical advantage is explicitly exploited by the pursuers by implementing cooperative pursuit against the evaders. Cooperative pursuit by two pursuers against one evader is beneficial for the pursuers because in most cases it will cause capture to occur farther away from the border than in the non-cooperative single pursuer single evader case [35]. It is also important since it allows us to consider scenarios where an evader is potentially able to reach the $x$-axis if only one pursuer is assigned to it, but it will not reach the $x$-axis if more than one pursuer cooperatively intercepts it. For instance, consider the two pursuers and one evader game in Fig. 3.a. If only $P_1$ is assigned to $E$ then the latter can reach the $x$-axis since the $EP_1$ Apollonius circle intersects the $x$-axis. A similar situation occurs if only $P_2$ is assigned. However, if both $P_1$ and $P_2$ cooperate to capture $E$ they can significantly decrease the region of dominance of $E$, which is now restricted to the lens shaped area of intersection of the two circles (since any point inside the $EP_1$ circle but outside the $EP_2$ circle can be reached by $P_2$ before $E$ and, similarly, any point inside the $EP_2$ circle but outside the $EP_1$ circle can be reached by $P_1$ before $E$). In the example shown in Fig. 3.a the point with smallest $y$-coordinate in the region of dominance of $E$ is now given by point $I$, the intersection point of the two Apollonius circles.
We now apply the cooperative guidance concept in order to obtain the saddle point solution to the multi-agent BDDG: when the pursuers outnumber the evaders, cooperation among a group of $N$ pursuers entails the best cooperative assignment together with the cooperatively designed heading strategy in order to maximize the team's payoff. The best strategy by the outnumbered evaders in order to minimize their combined cost is for each one to head to the lowest point in its dominance region, which is determined by the optimal assignment of pursuers to evaders. As expected, the solution of the game provides the optimal strategies for each agent.
In order to address isochronous or simultaneous capture we consider the following.
If an evader $E_j$ can be potentially captured simultaneously by two pursuers $P_i$ and $P_{i'}$ we use $E_j P_i$ and $E_j P_{i'}$ to denote the corresponding Apollonius circles. They are given, respectively, by
$$\begin{aligned}
(x - x_{c_{ij}})^2 + (y - y_{c_{ij}})^2 &= r_{ij}^2 \\
(x - x_{c_{i'j}})^2 + (y - y_{c_{i'j}})^2 &= r_{i'j}^2
\end{aligned} \tag{22}$$
where $x_{c_{ij}} = \frac{1}{1 - \alpha_{ij}^2}(x_{E_j} - \alpha_{ij}^2 x_{P_i})$, $y_{c_{ij}} = \frac{1}{1 - \alpha_{ij}^2}(y_{E_j} - \alpha_{ij}^2 y_{P_i})$, $r_{ij} = \frac{\alpha_{ij}}{1 - \alpha_{ij}^2} d_{ij}$, for $i, i'$.
Remark 3:
By construction of the Apollonius circle, the evader is always located inside both circles specified in eq. (22). If the circles do not intersect, then one of the circles is completely located inside the other circle and the evader is captured by only one pursuer. The constructed Apollonius circles are never mutually exclusive since $E$ has to be inside both circles. The intersection of the Apollonius circles is a necessary, but not sufficient, condition for simultaneous capture. Conversely, if, under optimal play, the evader is to be simultaneously captured by the two pursuers, then the circles intersect each other. Fig. 3.b shows an example of the former where the circles intersect but the lowest point in the evader dominance region is still the lowest point of one of the Apollonius circles, $(x_c, y_c - r)$ = (8. , . ). The lower point of intersection of the circles is given by (6. , . ), which has a higher value for the $y$-coordinate than the lowest point on that Apollonius circle. Hence, under optimal play, the evader is captured by only one pursuer.

Fig. 3. Cooperative pursuit against one evader. a) Lowest point of evader dominance region is at the intersection of the two Apollonius circles. b) Lowest point on evader dominance region is located on the arc of one of the Apollonius circles.

Fig. 4. Three pursuers and two evaders feasible assignments.

In general, the pursuers have different speeds and a third Apollonius circle can be constructed in terms of the positions of $P_i$ and $P_{i'}$, and in terms of their corresponding speed ratio $\alpha_{i'i}$. Without loss of generality, we consider $P_{i'}$ to be the faster of the two pursuers and we define the speed ratio $\alpha_{i'i} = v_{P_i}/v_{P_{i'}} < 1$.
The $P_{i'}P_i$ Apollonius circle is given by

$$(x - x_{c_{i'i}})^2 + (y - y_{c_{i'i}})^2 = r_{i'i}^2 \tag{23}$$

where $x_{c_{i'i}} = \frac{1}{1-\alpha_{i'i}^2}(x_{P_i} - \alpha_{i'i}^2 x_{P_{i'}})$, $y_{c_{i'i}} = \frac{1}{1-\alpha_{i'i}^2}(y_{P_i} - \alpha_{i'i}^2 y_{P_{i'}})$, $r_{i'i} = \frac{\alpha_{i'i}}{1-\alpha_{i'i}^2} d_{i'i}$, and $d_{i'i} = \sqrt{(x_{P_i} - x_{P_{i'}})^2 + (y_{P_i} - y_{P_{i'}})^2}$.

Similar to the case $N = M$, $A_\iota$ for $\iota = 1, ..., \bar{\iota}$ denotes the feasible assignments. In order to enumerate the feasible assignments we consider the choices where simultaneous capture helps the pursuers to increase their payoff. In the simple example in Fig. 4, with $N = 3$, $M = 2$, the feasible assignments are shown in Table I. In this table, the first column represents the assignment index, the second column shows the potential matching to be analyzed in the assignment, and the third column provides the resulting assignment variables. For example, in one of the assignments we look into the possible assignment of two pursuers to one evader, while the other evader is assigned to the remaining pursuer. Cooperation between the two pursuers helps to increase the payoff, that is, it helps to capture the evader farther away from the $x$-axis compared to the individual solutions of each of the two pursuers. Therefore, both pursuers are assigned to capture that evader and three assignment variables $\mu_{ij}$ are equal to 1. On the other hand, in another assignment, cooperation between the two candidate pursuers does not help to increase the payoff compared to the individual solution where only one of them captures the evader. Hence, only that pursuer is assigned to the evader (and another pursuer to the other evader) and only two assignment variables are equal to 1. Visually, this can be confirmed from Fig. 4.

TABLE I
FEASIBLE ASSIGNMENTS FOR 3P VS. 2E EXAMPLE

$A_\iota$ | Potential match | $\mu_{ij}$
$A_1$ | P P ⇒ E , P ⇒ E | µ = µ = µ = 1
$A_2$ | P ⇒ E , P P ⇒ E | µ = µ = µ = 1
$A_3$ | P P ⇒ E , P ⇒ E | µ = µ = 1
$A_4$ | P ⇒ E , P P ⇒ E | µ = µ = 1
$A_5$ | P P ⇒ E , P ⇒ E | µ = µ = 1
$A_6$ | P P ⇒ E , P ⇒ E | µ = µ = µ = 1

We will now provide the solution of the $N$ vs. $M$ BDDG, for the case
$N > M$. Let us define

$$y_{s_\iota}(\mathbf{x}) = \sum_{j=1}^{M} \mu^{\iota}_{ij}\, y_{ij} \tag{24}$$

where $y_{ij}(\mathbf{x})$ is given by (8) if, in assignment $A_\iota$, $\mu^{\iota}_{ij} = 1$ holds only for one pursuer $i$, that is, $E_j$ is captured by only one pursuer. Also define

$$y_{ij}(\mathbf{x}) = \frac{F_{ij} - \sqrt{(x_{c_{ij}} - x_{c_{i'j}})^2\, G_{ij}}}{D_{ij}} = V_s(\mathbf{x}) \tag{25}$$

if, in assignment $A_\iota$, $\mu^{\iota}_{ij} = 1$ holds for two pursuers $i, i'$, that is, $E_j$ is captured simultaneously by two pursuers, where

$$\begin{aligned}
F_{ij} &= y_{c_{i'i}}(x_{c_{ij}} - x_{c_{i'j}})^2 - (y_{c_{ij}} - y_{c_{i'j}})\left(\tfrac{R_{ij}}{2} - x_{c_{i'i}}(x_{c_{i'j}} - x_{c_{ij}})\right) \\
G_{ij} &= r_{i'i}^2 D_{ij} - \left(\tfrac{R_{ij}}{2} + x_{c_{i'i}}(x_{c_{ij}} - x_{c_{i'j}}) + y_{c_{i'i}}(y_{c_{ij}} - y_{c_{i'j}})\right)^2 \\
D_{ij} &= (x_{c_{ij}} - x_{c_{i'j}})^2 + (y_{c_{ij}} - y_{c_{i'j}})^2 \\
R_{ij} &= r_{ij}^2 - r_{i'j}^2 - x_{c_{ij}}^2 + x_{c_{i'j}}^2 - y_{c_{ij}}^2 + y_{c_{i'j}}^2 .
\end{aligned} \tag{26}$$

Theorem 4: Consider the $N$ vs. $M$ BDDG where $N > M$, $\alpha_{ij} = v_{E_j}/v_{P_i} < 1$, and $\mathbf{x} \in \mathcal{R}_P$. The Value function is continuous, continuously differentiable (except at dispersal surfaces $y_{s_\iota} = y_{s_{\iota'}}$ for any $\iota, \iota' = 1, ..., \bar{\iota}$), and it satisfies the HJI equation. The Value function is explicitly given by $V(\mathbf{x}) = \max_\iota y_{s_\iota}(\mathbf{x})$. The corresponding optimal assignment is $\iota^* = \arg\max_\iota A_\iota$. The optimal state feedback strategies are given by (19). The optimal aimpoints are given by (20) if $E_j$ is captured by only one pursuer and they are given by

$$x^* = x^*_{E_j} = x^*_{P_i} = x^*_{P_{i'}} = \frac{R_{ij} - 2(y_{c_{i'j}} - y_{c_{ij}})\, V_s(\mathbf{x})}{2(x_{c_{i'j}} - x_{c_{ij}})} \tag{27}$$

and $y^* = y^*_{E_j} = y^*_{P_i} = y^*_{P_{i'}} = V_s(\mathbf{x})$ as defined in (25) if $E_j$ is captured simultaneously by two pursuers.

Proof. We focus on the terms $V_s(\mathbf{x})$ of the Value function given by (25), which are associated with simultaneous capture of an evader by two pursuers. The remaining terms are of the form (8), which are associated with evaders being captured by a single pursuer, and they can be analyzed as in Theorem 2.

The term $V_s(\mathbf{x})$ is the point with smallest $y$-coordinate in the region of dominance of $E_j$, where $E_j$ will be captured simultaneously by two pursuers. This point is one of the intersections of the two Apollonius circles in (22). To obtain the intersection points we subtract the second equation from the first equation in (22), which gives the linear equation

$$2(x_{c_{i'j}} - x_{c_{ij}})\, x + 2(y_{c_{i'j}} - y_{c_{ij}})\, y = R_{ij}. \tag{28}$$

Equation (28) is used in (23) in order to obtain the quadratic equation in $y$

$$D_{ij}\, y^2 - 2F_{ij}\, y + \left[\tfrac{R_{ij}}{2} - x_{c_{i'i}}(x_{c_{i'j}} - x_{c_{ij}})\right]^2 + (y_{c_{i'i}}^2 - r_{i'i}^2)(x_{c_{i'j}} - x_{c_{ij}})^2 = 0 \tag{29}$$

where the applicable solution is given by (25).

We now proceed to obtain the partial derivatives of the term (25) with respect to each element of the state.
In order to simplify the notation we define: $F = F_{ij}$, $G = G_{ij}$, $D = D_{ij}$, $R = R_{ij}$, $x_i = x_{c_{ij}}$, $y_i = y_{c_{ij}}$, $x_{i'} = x_{c_{i'j}}$, $y_{i'} = y_{c_{i'j}}$, $x' = x_{c_{i'i}}$, $y' = y_{c_{i'i}}$, $r_i = r_{ij}$, $r_{i'} = r_{i'j}$, $r = r_{i'i}$, $\alpha_i = \alpha_{ij}$, $\alpha_{i'} = \alpha_{i'j}$, and $\alpha = \alpha_{i'i}$. Then, using the previous definitions, $V_s(\mathbf{x})$ can be written as follows

$$V_s(\mathbf{x}) = \frac{F - \sqrt{(x_i - x_{i'})^2\, G}}{D}.$$
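As a numerical illustration of (25)-(27), the following minimal Python sketch evaluates $V_s$ and the corresponding aimpoint $x^*$ for one evader and two pursuers. The function names, the argument layout, and the sample positions and speeds are assumptions introduced here for illustration; they are not taken from the paper's examples.

```python
import math


def apollonius(slow, fast, alpha):
    """Center (xc, yc) and radius r of the Apollonius circle of a slower agent
    `slow` against a faster agent `fast`, with alpha = v_slow / v_fast < 1."""
    k = 1.0 - alpha ** 2
    xc = (slow[0] - alpha ** 2 * fast[0]) / k
    yc = (slow[1] - alpha ** 2 * fast[1]) / k
    r = alpha * math.dist(slow, fast) / k
    return xc, yc, r


def simultaneous_capture_point(e, p_i, p_ip, v_e, v_pi, v_pip):
    """Lower intersection point (x*, V_s) of the E-P_i and E-P_i' circles.

    Assumes v_pip > v_pi > v_e, i.e. P_i' is the faster pursuer.
    """
    xi, yi, ri = apollonius(e, p_i, v_e / v_pi)        # E_j P_i circle, eq. (22)
    xip, yip, rip = apollonius(e, p_ip, v_e / v_pip)   # E_j P_i' circle, eq. (22)
    xp, yp, r = apollonius(p_i, p_ip, v_pi / v_pip)    # P_i' P_i circle, eq. (23)

    # Quantities of eq. (26)
    R = ri ** 2 - rip ** 2 - xi ** 2 + xip ** 2 - yi ** 2 + yip ** 2
    D = (xi - xip) ** 2 + (yi - yip) ** 2
    F = yp * (xi - xip) ** 2 - (yi - yip) * (R / 2 - xp * (xip - xi))
    G = r ** 2 * D - (R / 2 + xp * (xi - xip) + yp * (yi - yip)) ** 2
    if xi == xip or G < 0:
        raise ValueError("circles do not intersect, or eq. (27) is degenerate")

    v_s = (F - math.sqrt((xi - xip) ** 2 * G)) / D          # eq. (25)
    x_star = (R - 2 * (yip - yi) * v_s) / (2 * (xip - xi))  # eq. (27)
    return x_star, v_s


if __name__ == "__main__":
    # Hypothetical engagement: evader at (0, 4), pursuers at (-4, 6) and (5, 3).
    print(simultaneous_capture_point((0.0, 4.0), (-4.0, 6.0), (5.0, 3.0),
                                     0.5, 1.0, 1.25))
```

For these sample values the returned point is approximately $(-1.209, 1.776)$, and its distances to the evader and to each pursuer are in the ratios $\alpha_i = 0.5$ and $\alpha_{i'} = 0.4$, as required for a point lying on both circles in (22). Whether this intersection is also the lowest point of the evader's dominance region must still be checked as discussed in Remark 3.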
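We start by computing the following

$$\begin{aligned}
\frac{\partial F}{\partial x_{P_i}} &= \frac{\alpha_i^2\left[2y'(x_{i'} - x_i) + (y_i - y_{i'})(x' - x_{P_i})\right]}{1-\alpha_i^2} + \frac{(x_{i'} - x_i)(y_i - y_{i'})}{1-\alpha^2} \\
\frac{\partial F}{\partial x_{P_{i'}}} &= -\frac{\alpha_{i'}^2\left[2y'(x_{i'} - x_i) + (y_i - y_{i'})(x' - x_{P_{i'}})\right]}{1-\alpha_{i'}^2} - \frac{\alpha^2 (x_{i'} - x_i)(y_i - y_{i'})}{1-\alpha^2} \\
\frac{\partial F}{\partial x_{E_j}} &= \left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)\left[2y'(x_i - x_{i'}) + (y_i - y_{i'})(x_{E_j} - x')\right] \\
\frac{\partial F}{\partial y_{P_i}} &= \frac{\alpha_i^2\left[\tfrac{R}{2} + x'(x_i - x_{i'}) - (y_i - y_{i'})\, y_{P_i}\right]}{1-\alpha_i^2} + \frac{(x_i - x_{i'})^2}{1-\alpha^2} \\
\frac{\partial F}{\partial y_{P_{i'}}} &= -\frac{\alpha_{i'}^2\left[\tfrac{R}{2} + x'(x_i - x_{i'}) - (y_i - y_{i'})\, y_{P_{i'}}\right]}{1-\alpha_{i'}^2} - \frac{\alpha^2 (x_i - x_{i'})^2}{1-\alpha^2} \\
\frac{\partial F}{\partial y_{E_j}} &= \left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)\left[(y_i - y_{i'})\, y_{E_j} - \tfrac{R}{2} + x'(x_{i'} - x_i)\right].
\end{aligned} \tag{30}$$

Let us also obtain

$$\begin{aligned}
\frac{\partial G}{\partial x_{P_i}} &= -\frac{2\alpha_i^2\left((x_i - x_{i'})\, r^2 + (x_{P_i} - x')\sqrt{r^2 D - G}\right)}{1-\alpha_i^2} + \frac{2\left(\tfrac{\alpha^2}{1-\alpha^2}(x_{P_i} - x_{P_{i'}})\, D - (x_i - x_{i'})\sqrt{r^2 D - G}\right)}{1-\alpha^2} \\
\frac{\partial G}{\partial x_{P_{i'}}} &= \frac{2\alpha_{i'}^2\left((x_i - x_{i'})\, r^2 + (x_{P_{i'}} - x')\sqrt{r^2 D - G}\right)}{1-\alpha_{i'}^2} - \frac{2\alpha^2\left(\tfrac{x_{P_i} - x_{P_{i'}}}{1-\alpha^2}\, D - (x_i - x_{i'})\sqrt{r^2 D - G}\right)}{1-\alpha^2} \\
\frac{\partial G}{\partial x_{E_j}} &= 2\left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)\left((x_i - x_{i'})\, r^2 + (x_{E_j} - x')\sqrt{r^2 D - G}\right) \\
\frac{\partial G}{\partial y_{P_i}} &= -\frac{2\alpha_i^2\left((y_i - y_{i'})\, r^2 + (y_{P_i} - y')\sqrt{r^2 D - G}\right)}{1-\alpha_i^2} + \frac{2\left(\tfrac{\alpha^2}{1-\alpha^2}(y_{P_i} - y_{P_{i'}})\, D - (y_i - y_{i'})\sqrt{r^2 D - G}\right)}{1-\alpha^2} \\
\frac{\partial G}{\partial y_{P_{i'}}} &= \frac{2\alpha_{i'}^2\left((y_i - y_{i'})\, r^2 + (y_{P_{i'}} - y')\sqrt{r^2 D - G}\right)}{1-\alpha_{i'}^2} - \frac{2\alpha^2\left(\tfrac{y_{P_i} - y_{P_{i'}}}{1-\alpha^2}\, D - (y_i - y_{i'})\sqrt{r^2 D - G}\right)}{1-\alpha^2} \\
\frac{\partial G}{\partial y_{E_j}} &= 2\left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)\left((y_i - y_{i'})\, r^2 + (y_{E_j} - y')\sqrt{r^2 D - G}\right).
\end{aligned} \tag{31}$$

Additionally, we have that

$$\begin{aligned}
\frac{\partial D}{\partial x_{P_i}} &= -\frac{2\alpha_i^2}{1-\alpha_i^2}(x_i - x_{i'}), &
\frac{\partial D}{\partial x_{P_{i'}}} &= \frac{2\alpha_{i'}^2}{1-\alpha_{i'}^2}(x_i - x_{i'}), &
\frac{\partial D}{\partial x_{E_j}} &= 2\left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)(x_i - x_{i'}) \\
\frac{\partial D}{\partial y_{P_i}} &= -\frac{2\alpha_i^2}{1-\alpha_i^2}(y_i - y_{i'}), &
\frac{\partial D}{\partial y_{P_{i'}}} &= \frac{2\alpha_{i'}^2}{1-\alpha_{i'}^2}(y_i - y_{i'}), &
\frac{\partial D}{\partial y_{E_j}} &= 2\left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)(y_i - y_{i'}).
\end{aligned} \tag{32}$$

Then, we can write the gradient of $V_s(\mathbf{x})$ as follows

$$\begin{aligned}
\frac{\partial V_s}{\partial x_{P_i}} &= \frac{1}{D}\left(\frac{\partial F}{\partial x_{P_i}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\frac{\partial G}{\partial x_{P_i}} + \frac{\alpha_i^2 (x_i - x_{i'})}{1-\alpha_i^2}\left[2V_s + \sqrt{\tfrac{G}{(x_i - x_{i'})^2}}\right]\right) \\
\frac{\partial V_s}{\partial x_{P_{i'}}} &= \frac{1}{D}\left(\frac{\partial F}{\partial x_{P_{i'}}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\frac{\partial G}{\partial x_{P_{i'}}} - \frac{\alpha_{i'}^2 (x_i - x_{i'})}{1-\alpha_{i'}^2}\left[2V_s + \sqrt{\tfrac{G}{(x_i - x_{i'})^2}}\right]\right) \\
\frac{\partial V_s}{\partial x_{E_j}} &= \frac{1}{D}\left(\frac{\partial F}{\partial x_{E_j}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\frac{\partial G}{\partial x_{E_j}} - \left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)(x_i - x_{i'})\left[2V_s + \sqrt{\tfrac{G}{(x_i - x_{i'})^2}}\right]\right) \\
\frac{\partial V_s}{\partial y_{P_i}} &= \frac{1}{D}\left(\frac{\partial F}{\partial y_{P_i}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\frac{\partial G}{\partial y_{P_i}} + \frac{2\alpha_i^2}{1-\alpha_i^2}(y_i - y_{i'})\, V_s\right) \\
\frac{\partial V_s}{\partial y_{P_{i'}}} &= \frac{1}{D}\left(\frac{\partial F}{\partial y_{P_{i'}}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\frac{\partial G}{\partial y_{P_{i'}}} - \frac{2\alpha_{i'}^2}{1-\alpha_{i'}^2}(y_i - y_{i'})\, V_s\right) \\
\frac{\partial V_s}{\partial y_{E_j}} &= \frac{1}{D}\left(\frac{\partial F}{\partial y_{E_j}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\frac{\partial G}{\partial y_{E_j}} - 2\left(\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\right)(y_i - y_{i'})\, V_s\right).
\end{aligned} \tag{33}$$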
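The expressions in (33) follow from applying the quotient and chain rules to $V_s = \big(F - \sqrt{(x_i - x_{i'})^2 G}\big)/D$. As a sketch, for a generic state component $z$ (a placeholder symbol used only in this note):

```latex
\frac{\partial V_s}{\partial z}
= \frac{1}{D}\left(
    \frac{\partial F}{\partial z}
    - \frac{1}{2}\sqrt{\frac{(x_i - x_{i'})^2}{G}}\,\frac{\partial G}{\partial z}
    - (x_i - x_{i'})\sqrt{\frac{G}{(x_i - x_{i'})^2}}\,
      \frac{\partial (x_i - x_{i'})}{\partial z}
    - V_s\,\frac{\partial D}{\partial z}
  \right)
```

The third term vanishes for the $y$ components of the state because $x_i - x_{i'}$ depends only on the $x$ coordinates of the players. For $z = x_{P_i}$, using $\partial (x_i - x_{i'})/\partial x_{P_i} = -\alpha_i^2/(1-\alpha_i^2)$ together with the derivative of $D$ in (32), the third and fourth terms combine into the bracketed term of the first line of (33).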
We note that $V_s(\mathbf{x})$ is continuous in $\mathcal{R}_{P_{\iota^*}}$ ($\subset \mathcal{R}_P$), where in the optimal assignment $\iota^*$ there exists at least one evader $E_j$ such that $\mu_{ij} = \mu_{i'j} = 1$, that is, at least one evader is simultaneously captured by two pursuers. Consequently, there exists at least one term (25) contributing to the Value function. From (25) and the definition of $D$ in (26) we conclude that $D = 0$ only if $x_i = x_{i'}$ and $y_i = y_{i'}$, that is, the centers of the $E_jP_i$ and $E_jP_{i'}$ circles coincide. However, in such a case, the circles do not intersect (except when $r_i = r_{i'}$) and $E_j$ is captured by only one pursuer. (When the centers of both circles coincide and, in addition, $r_i = r_{i'}$, the circles are identical and the game morphs into a single pursuer differential game.) This means that for any $\{\mathbf{x} \mid x_i = x_{i'},\, y_i = y_{i'}\}$ the term (25) does not contribute to the Value function. Thus, $D \neq 0$ for any $\mathbf{x} \in \mathcal{R}_{P_{\iota^*}}$. The terms of the form (8) were previously analyzed; hence the Value function is continuous.

The term $V_s(\mathbf{x})$ in (25) is continuously differentiable in $\mathcal{R}_{P_{\iota^*}}$. Here, we also need to take into consideration the term $G$. Let $(x_{I_1}, y_{I_1})$ and $(x_{I_2}, y_{I_2})$ denote the coordinates of the two intersection points. From (25) and (29), $G = 0$ only when the two intersection points of the Apollonius circles have the same $y$-coordinate, that is, $y_{I_1} = y_{I_2}$. Since $E_j$ is always located inside both the $E_jP_i$ and the $E_jP_{i'}$ circles, the only case for both $y_{I_1} = y_{I_2}$ and $x_{I_1} = x_{I_2}$ to hold is when the circles are tangent to each other and one of them is completely contained inside the other; such a case can be analyzed as a single pursuer differential game. Now, in the case where $y_{I_1} = y_{I_2}$ and $x_{I_1} \neq x_{I_2}$, by convexity of the circles, the point of the evader's dominance region with lowest $y$-coordinate is located on the arc of one of the circles, not on either of the two intersection points. Then, the optimal strategy is for $E_j$ to be captured by only one pursuer, which means that for any $\{\mathbf{x} \mid G = 0\}$ the term (25) does not contribute to the Value function. Thus, $G \neq 0$ for any $\mathbf{x} \in \mathcal{R}_{P_{\iota^*}}$.

Finally, we will show that the Value function satisfies the HJI equation. Similar to previous sections, in the HJI we only need to consider the term $\frac{\partial V}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_j^*, \psi_i^*)$, for $i = 1, ..., N$, $j = 1, ..., M$. Furthermore, since the terms (8) were already analyzed in Theorem 2, we now only focus on the terms $V_s(\mathbf{x})$. Using (19) we obtain the following

$$\begin{aligned}
\frac{\partial V_s}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_j^*, \psi_i^*, \psi_{i'}^*) ={}& v_{P_i}\,\frac{\frac{\partial V_s}{\partial x_{P_i}}(x^* - x_{P_i}) + \frac{\partial V_s}{\partial y_{P_i}}(y^* - y_{P_i})}{\sqrt{(x^* - x_{P_i})^2 + (y^* - y_{P_i})^2}} \\
&+ v_{P_{i'}}\,\frac{\frac{\partial V_s}{\partial x_{P_{i'}}}(x^* - x_{P_{i'}}) + \frac{\partial V_s}{\partial y_{P_{i'}}}(y^* - y_{P_{i'}})}{\sqrt{(x^* - x_{P_{i'}})^2 + (y^* - y_{P_{i'}})^2}} \\
&+ v_{E_j}\,\frac{\frac{\partial V_s}{\partial x_{E_j}}(x^* - x_{E_j}) + \frac{\partial V_s}{\partial y_{E_j}}(y^* - y_{E_j})}{\sqrt{(x^* - x_{E_j})^2 + (y^* - y_{E_j})^2}}
\end{aligned} \tag{34}$$

Let $I^* = (x^*, y^*)$ and note that $\overline{I^*P_{i'}} = \frac{1}{\alpha}\,\overline{I^*P_i} = \frac{1}{\alpha_{i'}}\,\overline{I^*E_j}$. Hence, we use the common denominator $\overline{I^*P_{i'}} = \sqrt{(x^* - x_{P_{i'}})^2 + (y^* - y_{P_{i'}})^2}$ and the speed $v_{P_{i'}} = \frac{v_{P_i}}{\alpha} = \frac{v_{E_j}}{\alpha_{i'}}$ in (34).
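In addition, we substitute (33) into (34) to obtain the following

$$\begin{aligned}
\frac{\partial V_s}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_j^*, \psi_i^*, \psi_{i'}^*) = \frac{v_{P_{i'}}}{D \cdot \overline{I^*P_{i'}}} \times \Big( & \Big(\tfrac{\partial F}{\partial x_{P_i}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\tfrac{\partial G}{\partial x_{P_i}} + \tfrac{\alpha_i^2 (x_i - x_{i'})}{1-\alpha_i^2}\Big[2V_s + \sqrt{\tfrac{G}{(x_i - x_{i'})^2}}\Big]\Big)(x^* - x_{P_i}) \\
&+ \Big(\tfrac{\partial F}{\partial y_{P_i}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\tfrac{\partial G}{\partial y_{P_i}} + \tfrac{2\alpha_i^2 (y_i - y_{i'}) V_s}{1-\alpha_i^2}\Big)(V_s - y_{P_i}) \\
&+ \Big(\tfrac{\partial F}{\partial x_{P_{i'}}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\tfrac{\partial G}{\partial x_{P_{i'}}} - \tfrac{\alpha_{i'}^2 (x_i - x_{i'})}{1-\alpha_{i'}^2}\Big[2V_s + \sqrt{\tfrac{G}{(x_i - x_{i'})^2}}\Big]\Big)(x^* - x_{P_{i'}}) \\
&+ \Big(\tfrac{\partial F}{\partial y_{P_{i'}}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\tfrac{\partial G}{\partial y_{P_{i'}}} - \tfrac{2\alpha_{i'}^2 (y_i - y_{i'}) V_s}{1-\alpha_{i'}^2}\Big)(V_s - y_{P_{i'}}) \\
&+ \Big(\tfrac{\partial F}{\partial x_{E_j}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\tfrac{\partial G}{\partial x_{E_j}} - \Big[\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\Big][x_i - x_{i'}]\Big[2V_s + \sqrt{\tfrac{G}{(x_i - x_{i'})^2}}\Big]\Big)(x^* - x_{E_j}) \\
&+ \Big(\tfrac{\partial F}{\partial y_{E_j}} - \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\,\tfrac{\partial G}{\partial y_{E_j}} - 2\Big[\tfrac{1}{1-\alpha_i^2} - \tfrac{1}{1-\alpha_{i'}^2}\Big][y_i - y_{i'}]\, V_s\Big)(V_s - y_{E_j}) \Big).
\end{aligned} \tag{35}$$

Expanding the terms in (35) we have

$$\begin{aligned}
\frac{\partial V_s}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_j^*, \psi_i^*, \psi_{i'}^*) = \frac{v_{P_{i'}}}{D \cdot \overline{I^*P_{i'}}} \times \Big( & (x_i - x_{i'})^2\Big[2V_s + \sqrt{\tfrac{G}{(x_i - x_{i'})^2}}\Big] + 2(y_i - y_{i'})^2 V_s \\
&+ x^*\Big(\tfrac{\partial F}{\partial x_{P_i}} + \tfrac{\partial F}{\partial x_{P_{i'}}} + \tfrac{\partial F}{\partial x_{E_j}}\Big) - x_{P_i}\tfrac{\partial F}{\partial x_{P_i}} - x_{P_{i'}}\tfrac{\partial F}{\partial x_{P_{i'}}} - x_{E_j}\tfrac{\partial F}{\partial x_{E_j}} \\
&+ V_s\Big(\tfrac{\partial F}{\partial y_{P_i}} + \tfrac{\partial F}{\partial y_{P_{i'}}} + \tfrac{\partial F}{\partial y_{E_j}}\Big) - y_{P_i}\tfrac{\partial F}{\partial y_{P_i}} - y_{P_{i'}}\tfrac{\partial F}{\partial y_{P_{i'}}} - y_{E_j}\tfrac{\partial F}{\partial y_{E_j}} \\
&- \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\Big[x^*\Big(\tfrac{\partial G}{\partial x_{P_i}} + \tfrac{\partial G}{\partial x_{P_{i'}}} + \tfrac{\partial G}{\partial x_{E_j}}\Big) - x_{P_i}\tfrac{\partial G}{\partial x_{P_i}} - x_{P_{i'}}\tfrac{\partial G}{\partial x_{P_{i'}}} - x_{E_j}\tfrac{\partial G}{\partial x_{E_j}} \\
&\qquad + V_s\Big(\tfrac{\partial G}{\partial y_{P_i}} + \tfrac{\partial G}{\partial y_{P_{i'}}} + \tfrac{\partial G}{\partial y_{E_j}}\Big) - y_{P_i}\tfrac{\partial G}{\partial y_{P_i}} - y_{P_{i'}}\tfrac{\partial G}{\partial y_{P_{i'}}} - y_{E_j}\tfrac{\partial G}{\partial y_{E_j}}\Big] \Big).
\end{aligned} \tag{36}$$

It can be shown that

$$\begin{aligned}
\tfrac{\partial F}{\partial x_{P_i}} + \tfrac{\partial F}{\partial x_{P_{i'}}} + \tfrac{\partial F}{\partial x_{E_j}} &= 0, &
\tfrac{\partial F}{\partial y_{P_i}} + \tfrac{\partial F}{\partial y_{P_{i'}}} + \tfrac{\partial F}{\partial y_{E_j}} &= D, \\
\tfrac{\partial G}{\partial x_{P_i}} + \tfrac{\partial G}{\partial x_{P_{i'}}} + \tfrac{\partial G}{\partial x_{E_j}} &= 0, &
\tfrac{\partial G}{\partial y_{P_i}} + \tfrac{\partial G}{\partial y_{P_{i'}}} + \tfrac{\partial G}{\partial y_{E_j}} &= 0
\end{aligned} \tag{37}$$

and (36) simplifies to

$$\begin{aligned}
\frac{\partial V_s}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_j^*, \psi_i^*, \psi_{i'}^*) = \frac{v_{P_{i'}}}{D \cdot \overline{I^*P_{i'}}} \times \Big( & 3V_s D + (x_i - x_{i'})^2\sqrt{\tfrac{G}{(x_i - x_{i'})^2}} \\
&- x_{P_i}\tfrac{\partial F}{\partial x_{P_i}} - x_{P_{i'}}\tfrac{\partial F}{\partial x_{P_{i'}}} - x_{E_j}\tfrac{\partial F}{\partial x_{E_j}} - y_{P_i}\tfrac{\partial F}{\partial y_{P_i}} - y_{P_{i'}}\tfrac{\partial F}{\partial y_{P_{i'}}} - y_{E_j}\tfrac{\partial F}{\partial y_{E_j}} \\
&+ \tfrac{1}{2}\sqrt{\tfrac{(x_i - x_{i'})^2}{G}}\Big[x_{P_i}\tfrac{\partial G}{\partial x_{P_i}} + x_{P_{i'}}\tfrac{\partial G}{\partial x_{P_{i'}}} + x_{E_j}\tfrac{\partial G}{\partial x_{E_j}} + y_{P_i}\tfrac{\partial G}{\partial y_{P_i}} + y_{P_{i'}}\tfrac{\partial G}{\partial y_{P_{i'}}} + y_{E_j}\tfrac{\partial G}{\partial y_{E_j}}\Big] \Big).
\end{aligned} \tag{38}$$

Using (30)-(33) and performing the corresponding simplifications we obtain the following two equations

$$\begin{aligned}
x_{P_i}\tfrac{\partial F}{\partial x_{P_i}} + x_{P_{i'}}\tfrac{\partial F}{\partial x_{P_{i'}}} + x_{E_j}\tfrac{\partial F}{\partial x_{E_j}} + y_{P_i}\tfrac{\partial F}{\partial y_{P_i}} + y_{P_{i'}}\tfrac{\partial F}{\partial y_{P_{i'}}} + y_{E_j}\tfrac{\partial F}{\partial y_{E_j}}
&= 3y'(x_i - x_{i'})^2 - 3(y_i - y_{i'})\left[\tfrac{R}{2} - x'(x_{i'} - x_i)\right] \\
&= 3F
\end{aligned}$$

and

$$\begin{aligned}
x_{P_i}\tfrac{\partial G}{\partial x_{P_i}} + x_{P_{i'}}\tfrac{\partial G}{\partial x_{P_{i'}}} + x_{E_j}\tfrac{\partial G}{\partial x_{E_j}} + y_{P_i}\tfrac{\partial G}{\partial y_{P_i}} + y_{P_{i'}}\tfrac{\partial G}{\partial y_{P_{i'}}} + y_{E_j}\tfrac{\partial G}{\partial y_{E_j}}
&= 4\left[r^2 D - \left(\tfrac{R}{2} + x'(x_i - x_{i'}) + y'(y_i - y_{i'})\right)^2\right] \\
&= 4G.
\end{aligned}$$

Finally, the HJI equation can be written as follows

$$\begin{aligned}
\frac{\partial V_s}{\partial \mathbf{x}} \cdot f(\mathbf{x}, \phi_j^*, \psi_i^*, \psi_{i'}^*)
&= \frac{v_{P_{i'}}}{D \cdot \overline{I^*P_{i'}}}\Big(3\Big[F - \sqrt{(x_i - x_{i'})^2 G}\Big] + \sqrt{(x_i - x_{i'})^2 G} - 3F + 2\sqrt{(x_i - x_{i'})^2 G}\Big) \\
&= 0.
\end{aligned} \tag{39}$$

In conclusion, the Value function $V(\mathbf{x})$ is continuous, continuously differentiable, and it satisfies the HJI equation. □

Remark 4: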
We considered the cases $N = M$ and $N > M$. By formulation of the problem (a pursuer is eliminated from the game when it intercepts its assigned evader), the case $N < M$ implies that the evaders can win the game since at least $M' = M - N$ evaders can reach the border. Still, the ideas presented in this paper could be used by the pursuers in order to minimize the damage. This could be in the form of intercepting as many evaders as possible and/or choosing to maximize the remaining payoff by assuming that $M'$ evaders are destined to reach the border. The latter case will return the choice of the best $N$ evaders to intercept as far away as possible from the $x$-axis. This is directly related to the solution to the Game of Kind, that is, whether the border can be protected. Complete protection is automatically given by the solution of the initial assignment if $V > 0$. In more detail, if each $y_{ij} > 0$ in $y_{s_{\iota^*}}$ then all evaders can be captured before reaching the border. If some $y_{ij} < 0$ then the best assignment is the one that minimizes the number of evaders reaching the border, and the border is only partially protected in such a case.

Remark 5:
The solution of the BDDG derived in this paper scales well with respect to the number of players since this solution has been obtained in closed form. As the number of agents increases, the only increase in computation is determining the feasible assignments $A_\iota$ which, in the case of commitment by the pursuers, is only done once, at the beginning of the engagement. However, the state feedback optimal guidance strategies hold in the form summarized in Theorem 4 for the general case of $N > M$.

V. EXAMPLES
Example 1. Consider the 2 vs. 2 BDDG where the pursuers' initial positions are given by $P_1$ = ( − . , . ) and $P_2$ = (9. , . ). The initial positions of the evaders are $E_1$ = (4. , ) and $E_2$ = (5. , . ). The speeds of the agents are $v_{P_1}$ = 1 , $v_{P_2}$ = 1. , $v_{E_1}$ = 0. , and $v_{E_2}$ = 0. .
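For an $N = M$ engagement such as the 2 vs. 2 game of Example 1, the comparison between candidate assignments can be sketched numerically. The minimal Python sketch below scores each one-to-one assignment by summing, over the pairs, the $y$-coordinate of the lowest point of the corresponding Apollonius circle, $y_c - r$ (the same expression that appears as the aimpoint $y^*$ in (41)), and keeps the assignment with the largest sum. The function names and the sample positions and speeds are assumptions made for illustration and are not the values of Example 1; the search is a brute-force enumeration of permutations, not the Hungarian algorithm mentioned in the Appendix.

```python
import itertools
import math


def lowest_point_y(pursuer, evader, v_p, v_e):
    """y-coordinate of the lowest point (y_c - r) of the evader-pursuer
    Apollonius circle, for speed ratio alpha = v_e / v_p < 1."""
    alpha = v_e / v_p
    if alpha >= 1.0:
        raise ValueError("each pursuer is assumed to be faster than each evader")
    d = math.dist(pursuer, evader)
    return (evader[1] - alpha ** 2 * pursuer[1] - alpha * d) / (1.0 - alpha ** 2)


def best_assignment(pursuers, evaders, v_p, v_e):
    """Enumerate all one-to-one assignments (pursuer i -> evader perm[i])
    and return (best summed payoff, best permutation)."""
    best_value, best_perm = -math.inf, None
    for perm in itertools.permutations(range(len(evaders))):
        value = sum(
            lowest_point_y(pursuers[i], evaders[j], v_p[i], v_e[j])
            for i, j in enumerate(perm)
        )
        if value > best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm


if __name__ == "__main__":
    # Hypothetical symmetric scenario: each pursuer starts directly behind one evader.
    pursuers = [(-5.0, 10.0), (5.0, 10.0)]
    evaders = [(-5.0, 6.0), (5.0, 6.0)]
    print(best_assignment(pursuers, evaders, [1.0, 1.0], [0.5, 0.5]))
```

In this hypothetical scenario the sketch returns the payoff 4.0 with the permutation (0, 1), that is, each pursuer is matched with the evader directly below it and each capture occurs at $y = 2$. The enumeration visits $N!$ assignments, so it is only practical for small teams.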
Example 1.1. In order to determine the best assignment, the players need to compute and compare the terms $y_{s_1}(\mathbf{x})$ and $y_{s_2}(\mathbf{x})$, which are explicit functions of the state as shown in (9). In this example we have that $y_{s_1}(\mathbf{x})$ = 10. and $y_{s_2}(\mathbf{x})$ = 8. ; hence, the optimal assignment is given by $\mu_{11} = \mu_{22} = 1$ and we have $V(\mathbf{x}) = y_{s_1}(\mathbf{x})$ = 10. . The optimal guidance strategies are given by (10)-(11) where $x^*_{E_1} = x^*_{P_1}$ = 14. , $y^*_{E_1} = y^*_{P_1}$ = 3. , $x^*_{E_2} = x^*_{P_2}$ = 0. , and $y^*_{E_2} = y^*_{P_2}$ = 7. . The optimal trajectories are shown in the top left plot of Fig. 5. Note that the selection of assignments is done only once, at the beginning of the engagement, but the guidance strategies are computed in closed-loop form. Under optimal play, the optimal aimpoints are time-invariant: the calculation of the optimal aimpoints along the optimal trajectories provides the same result and the trajectories are straight lines, as expected.

Example 1.2. The rest of the plots in Fig. 5 show the players with the same initial positions and the same speeds but with non-optimal choices of strategies by one of the teams. For instance, in the top right plot of Fig. 5 the evaders implement a non-optimal strategy while the pursuers lock on the corresponding evader according to their optimal assignment ($P_1$ on $E_1$ and $P_2$ on $E_2$) and implement their optimal guidance law in a closed-loop manner. Since the evaders' trajectories are not optimal, the pursuers continuously update their aimpoints (which are now time-varying) by computing (10)-(11) and react to the non-optimal strategies of the evaders. The terminal cost/payoff is $y_{E_1}(t_f) + y_{E_2}(t_f)$ = 16. > $V$ = 10. and, as expected, the evaders are captured farther away from the $x$-axis since they did not follow their optimal strategy. This is true for any non-optimal evaders' strategy. In this example in particular, the evaders wrongly chose to aim at the lowest points on the $E_1P_2$ and $E_2P_1$ Apollonius circles; that is, they assume the wrong pursuer assignment.
The pursuers simply follow their combined optimal assignment/guidance to improve their performance, that is, to increase their payoff by capturing the evaders farther away from the border as they did in the top right plot of Fig. 5.

Example 1.3. The bottom left plot of Fig. 5 shows an example where the pursuers implement their optimal assignment but fail to implement their optimal guidance strategy. In particular, they implement Pure Pursuit (PP) guidance, each one on its assigned evader. In this case one pursuer is able to intercept its assigned evader, but closer to the $x$-axis. Even worse, the other pursuer is not able to capture its assigned evader before the latter reaches the border. Clearly, the pursuers' performance is significantly degraded by not using their optimal guidance, even when the assignment was correct. In this case the evaders, knowing that the pursuers implemented the correct assignment, only need to implement the same assignment along with the optimal guidance for that assignment. This means that they compute their optimal headings according to (10)-(11) and, by implementing this optimal strategy in a closed-loop manner, they are able to react to the pursuers' non-optimal guidance and increase their performance, that is, reduce their combined cost and be captured closer to the border or reach it if possible.

Fig. 5. Example 1. Top left: optimal play. Top right: evaders follow non-optimal assignment. Bottom left: pursuers follow non-optimal guidance. Bottom right: pursuers follow non-optimal assignment.

Example 1.4. The bottom right plot of Fig. 5 shows another example where the pursuers do not follow their optimal strategy. In this case they implement the incorrect assignment for this example ($P_1$ on $E_2$ and $P_2$ on $E_1$); however, they use the optimal guidance for that particular assignment, given by (10) and (12) in this case. The evaders, knowing that the pursuers implemented the incorrect assignment, respond by implementing their optimal guidance for that assignment and they also compute their aimpoints using (12). The combined cost/payoff is $y_{s_2} = y_{E_1}(t_f) + y_{E_2}(t_f)$ = 8. < $V$ = 10. and, as expected, the evaders are captured closer to the $x$-axis compared to the case where the pursuers implement their combined optimal assignment/guidance.

VI. EXTENSIONS
The differential game with two teams and multiple players could be extended to consider additional facets of combat scenarios: decoys, players willing to sacrifice themselves to benefit their teammates, and players with different levels of importance will be analyzed in future research.

An important extension addressed in this section is to analyze the same BDDG but without the prior commitment restriction. We will focus on the particular case considered in Section III of two pursuers versus two evaders. The following is a corollary to Theorem 2.
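Corollary 1: Consider the 2 vs. 2 BDDG (1)-(5) with $\alpha_{ij} = v_{E_j}/v_{P_i} < 1$, and where $\mathbf{x} \in \mathcal{R}_P$ and the pursuers do not commit to their initial assignment. The pursuers' strategies with commitment given by

$$\begin{aligned}
\cos\psi_1^* &= \frac{x^*_{P_1} - x_{P_1}}{\sqrt{(x^*_{P_1} - x_{P_1})^2 + (y^*_{P_1} - y_{P_1})^2}}, &
\sin\psi_1^* &= \frac{y^*_{P_1} - y_{P_1}}{\sqrt{(x^*_{P_1} - x_{P_1})^2 + (y^*_{P_1} - y_{P_1})^2}} \\
\cos\psi_2^* &= \frac{x^*_{P_2} - x_{P_2}}{\sqrt{(x^*_{P_2} - x_{P_2})^2 + (y^*_{P_2} - y_{P_2})^2}}, &
\sin\psi_2^* &= \frac{y^*_{P_2} - y_{P_2}}{\sqrt{(x^*_{P_2} - x_{P_2})^2 + (y^*_{P_2} - y_{P_2})^2}}
\end{aligned} \tag{40}$$

are robust state-feedback strategies for the game without commitment, where

$$\begin{aligned}
x^*_{P_1} &= \frac{x_{E_1} - \alpha_{11}^2 x_{P_1}}{1-\alpha_{11}^2}, &
y^*_{P_1} &= \frac{y_{E_1} - \alpha_{11}^2 y_{P_1} - \alpha_{11} d_{11}}{1-\alpha_{11}^2} \\
x^*_{P_2} &= \frac{x_{E_2} - \alpha_{22}^2 x_{P_2}}{1-\alpha_{22}^2}, &
y^*_{P_2} &= \frac{y_{E_2} - \alpha_{22}^2 y_{P_2} - \alpha_{22} d_{22}}{1-\alpha_{22}^2}
\end{aligned} \tag{41}$$

if $y_{s_1} > y_{s_2}$, and

$$\begin{aligned}
x^*_{P_1} &= \frac{x_{E_2} - \alpha_{12}^2 x_{P_1}}{1-\alpha_{12}^2}, &
y^*_{P_1} &= \frac{y_{E_2} - \alpha_{12}^2 y_{P_1} - \alpha_{12} d_{12}}{1-\alpha_{12}^2} \\
x^*_{P_2} &= \frac{x_{E_1} - \alpha_{21}^2 x_{P_2}}{1-\alpha_{21}^2}, &
y^*_{P_2} &= \frac{y_{E_1} - \alpha_{21}^2 y_{P_2} - \alpha_{21} d_{21}}{1-\alpha_{21}^2}
\end{aligned} \tag{42}$$

if $y_{s_2} > y_{s_1}$, where $d_{ij}$ is given by (13). The pursuers' guaranteed payoff is $y_s(\mathbf{x}) = y_{s_1}(\mathbf{x})$ if $y_{s_1} > y_{s_2}$ and $y_s(\mathbf{x}) = y_{s_2}(\mathbf{x})$ if $y_{s_2} > y_{s_1}$, where $y_{s_1}$ and $y_{s_2}$ are given by (9).

Proof. Note that for a given assignment, the evaders cannot do better than to head to the lowest point on the corresponding circles. The pursuers attain their best payoff under that assignment by aiming at the same point. This was proven in Theorem 2. Hence, the pursuers only need to choose their best possible assignment and, by sticking with this assignment, the evaders cannot unilaterally improve their performance. By following (40)-(42), the pursuers' lowest payoff is $y_s$ regardless of what strategy the evaders implement. □

Remark 6: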
An important problem is to determine under which conditions the pursuers can see a benefit by switching assignments. This is related to dispersal surfaces where another assignment may be better than the current one. The existence of a dispersal surface may perhaps be predicted from the start, and the evaders will look into other choices. Another aspect may include the existence of curved trajectories where the evaders try to avoid a dispersal surface. Also note that $y_s$ is not the value of this game but only a lower bound on the achievable payoff for the pursuers.

Remark 7:
By relaxing the restriction regarding initial commitment it is also possible to extend the region $\mathcal{R}_P$ by cooperation and switching. For example, when initially an evader is able to reach the border if only one pursuer is assigned to it, two pursuers cooperate in order to decrease the region of dominance of the evader so that it is intercepted farther away from the border. This makes it possible for one of them to eventually capture the evader single-handedly while the other one is free to switch its assignment and pursue a different opponent.

VII. CONCLUSIONS
In this paper large scale pursuit-evasion games were considered, and the joint optimal assignment of pursuers to evaders and optimal pursuit and evasion strategies in a multiplayer engagement were analyzed. The two-team multi-player scenario of border defense was posed as a differential game. Unlike classical differential games, where only state feedback strategies are sought, the results of this paper show how to solve this hybrid differential game and provide the complete solution over the joint set of continuous time state feedback strategies and discrete (binary) assignment variables. Simulation examples demonstrated the effectiveness and robustness of the solution under optimal play and also when one or more players do not follow their optimal strategies and/or optimal assignments. Finally, extensions to this game were delineated, emphasizing the importance of differential game theory to address pursuit-evasion problems where assignment of pursuers to evaders is required.

VIII. APPENDIX
A. Border defense with 3P and 1E
Consider the scenario shown in Fig. 6.a where three pursuers try to capture an evader and maximize the distance between the interception point and the closest point to the border. The evader aims at minimizing the same terminal distance. As before, the border is the $x$-axis of the Cartesian frame. In general, the interception point is given by the lowest point of the reachable region of the evader. Such a region is constructed using the corresponding segments of the three Apollonius circles.

Two cases exist: the evader is captured by only one pursuer or it is captured simultaneously by two pursuers. In the first case, the lowest point on the evader's reachable region is given by a point on an arc of the reachable region. In the second case, the lowest point is given by an intersection of two Apollonius circles. For instance, in Fig. 6.a the interception point under optimal play is $I$, and only two of the pursuers capture the evader. The third pursuer is not needed in this engagement. A particular case is when the lowest point on the evader's reachable region is given by the intersection of the three circles. An example of such a case is shown in Fig. 6.b. However, one of the pursuers is redundant since it can be removed from the scenario (or choose not to participate in pursuing the evader) and the lowest point on the evader's reachable region remains the same. In Fig. 6.b either of two of the pursuers can choose not to participate in the game and the interception point remains the same. Hence, assignment of a third pursuer to a single evader does not improve the payoff for the pursuers' group; such assignments do not need to be considered.

Fig. 6. a) 3P1E scenario. b) Particular case: all three circles share the same intersection point.

In general, the interception point is the lowest point in the evader's reachable region and such a point is unique.
Thus, there are no singular surfaces and saddle point state feedback strategies exist. In the similar problem [36], where the cost/payoff functional is capture time, it is possible for the evader to be captured simultaneously by the three pursuers and the interception point is located inside the reachable region of the evader. In this case, the evader maximizes capture time by determining the point inside its reachable region that is equidistant (when all pursuers have the same speed) to all pursuers.

B. Assignment problem
The multi-pursuer multi-evader assignment problem can be cast as a Linear Program (LP).Consider the case where N = M , then the optimal assignments are obtained by maximizing thefollowing J = N X i =1 N X j =1 y i,j µ i,j (43) November 12, 2019 DRAFT7 subject to the constraints N X i =1 µ i,j = 1 , j = 1 , . . . , N (44) N X j =1 µ i,j = 1 , i = 1 , . . . , N (45)where y i,j is given by (8) and µ ij = 1 if pursuer i is assigned to capture evader j and µ ij = 0 otherwise. The constraint in (44) requires evader j to be engaged by just one pursuer and (45)requires pursuer i to be assigned to just one evader. Problem (43)–(45) can be solved using theHungarian algorithm [37].If the pursuers commit to their initial assignment one can obtain saddle point state feedbackstrategies and the Value of the game exists. The objective of dynamic reassignment is to takeadvantage of evader’s errors but also of trajectories that may hit a dispersal surface. In this casethe Value of the game has not been found. however, evaders have a lower bound J for theircost. One can also use the assignment algorithm when there are more pursuers than evaders. Itis possible and advantageous to assign up to two (but not more) pursuers to one evader. Thiswas considered in Section IV-B. The distances from interception points to the border need to becalculated. If for example, N = M + 1 , one must calculate M ( M + 1) M ! distances. In general,one must calculate NM ! M ! NN − M ! = M ! N ![( N − M )!] (2 M − N ) If N = M , one must calculate N ! distances, as expected. Now ≤ N X i =1 µ i,j ≤ , j = 1 , . . . , N N X j =1 µ i,j = 1 , i = 1 , . . . , N R EFERENCES [1] R. Isaacs,
Differential Games . New York: Wiley, 1965.[2] J. Breakwell and P. Hagedorn, “Point capture of two evaders in succession,”
Journal of Optimization Theory andApplications , vol. 27, no. 1, pp. 89–97, 1979.[3] S.-Y. Liu, Z. Zhou, C. Tomlin, and K. Hedrick, “Evasion as a team against a faster pursuer,” in
American Control Conference(ACC), 2013 . IEEE, 2013, pp. 5368–5373.[4] J. F. Fisac, M. Chen, C. J. Tomlin, and S. S. Sastry, “Reach-avoid problems with time-varying dynamics, targets andconstraints,” in
Proceedings of the 18th international conference on hybrid systems: computation and control . ACM,2015, pp. 11–20.
[5] K. Margellos and J. Lygeros, “Hamilton–Jacobi formulation for reach–avoid differential games,” IEEE Transactions on Automatic Control, vol. 56, no. 8, pp. 1849–1861, 2011.
[6] J. Lorenzetti, M. Chen, B. Landry, and M. Pavone, “Reach-avoid games via mixed-integer second-order cone programming,” in , 2018, pp. 4409–4416.
[7] Z. Zhou, R. Takei, H. Huang, and C. Tomlin, “A general, open-loop formulation for reach-avoid games,” in , 2012, pp. 6501–6506.
[8] Z. Zhou, W. Zhang, J. Ding, H. Huang, D. M. Stipanović, and C. J. Tomlin, “Cooperative pursuit with Voronoi partitions,” Automatica, vol. 72, pp. 64–72, 2016.
[9] S. A. Ganebny, S. S. Kumkov, S. Le Ménec, and V. S. Patsko, “Model problem in a line with two pursuers and one evader,” Dynamic Games and Applications, vol. 2, no. 2, pp. 228–257, 2012.
[10] H. Huang, W. Zhang, J. Ding, D. M. Stipanovic, and C. J. Tomlin, “Guaranteed decentralized pursuit-evasion in the plane with multiple pursuers,” in , 2011, pp. 4835–4840.
[11] A. Von Moll, D. Casbeer, E. Garcia, and D. Milutinovic, “Pursuit-evasion of an evader by multiple pursuers,” in International Conference on Unmanned Aircraft Systems. IEEE, 2018, pp. 133–142.
[12] J. S. McGrew, J. P. How, B. Williams, and N. Roy, “Air-combat strategy using approximate dynamic programming,” Journal of Guidance, Control, and Dynamics, vol. 33, no. 5, pp. 1641–1654, 2010.
[13] E. Bakolas and P. Tsiotras, “Optimal pursuit of moving targets using dynamic Voronoi diagrams,” in , 2010, pp. 7431–7436.
[14] Z. E. Fuchs, P. P. Khargonekar, and J. Evers, “Cooperative defense within a single-pursuer, two-evader pursuit evasion differential game,” in , 2010, pp. 3091–3097.
[15] W. Scott and N. E. Leonard, “Pursuit, herding and evasion: A three-agent model of caribou predation,” in American Control Conference, 2013, pp. 2978–2983.
[16] E. Garcia, D. W. Casbeer, and M. Pachter, “Active target defense using first order models,” Automatica, vol. 78, pp. 139–143, 2017.
[17] D. W. Oyler, P. T. Kabamba, and A. R. Girard, “Pursuit–evasion games in the presence of obstacles,” Automatica, vol. 65, pp. 1–11, 2016.
[18] M. Coon and D. Panagou, “Control strategies for multiplayer target-attacker-defender differential games with double integrator dynamics,” in , 2017, pp. 1496–1502.
[19] E. Garcia, D. W. Casbeer, and M. Pachter, “Design and analysis of state-feedback optimal strategies for the differential game of active defense,” IEEE Transactions on Automatic Control, vol. 64, no. 2, pp. 553–568, 2019.
[20] L. Liang, F. Deng, Z. Peng, X. Li, and W. Zha, “A differential game for cooperative target defense,” Automatica, vol. 102, pp. 58–71, 2019.
[21] R. H. Venkatesan and N. K. Sinha, “A new guidance law for the defense missile of nonmaneuverable aircraft,” IEEE Transactions on Control Systems Technology, vol. 23, no. 6, pp. 2424–2431, 2015.
[22] D. Li and J. B. Cruz, “Defending an asset: a linear quadratic game approach,” IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1026–1044, 2011.
[23] Y. Ho, A. Bryson, and S. Baron, “Differential games and optimal pursuit-evasion strategies,” IEEE Transactions on Automatic Control, vol. 10, no. 4, pp. 385–389, 1965.
[24] J. Z. Ben-Asher, S. Levinson, J. Shinar, and H. Weiss, “Trajectory shaping in linear-quadratic pursuit-evasion games,” Journal of Guidance, Control, and Dynamics, vol. 27, no. 6, pp. 1102–1105, 2004.
[25] M. Levy, T. Shima, and S. Gutman, “Full-state autopilot-guidance design under a linear quadratic differential game formulation,” Control Engineering Practice, vol. 75, pp. 98–107, 2018.
[26] A. Perelman, T. Shima, and I. Rusnak, “Cooperative differential games strategies for active aircraft protection from a homing missile,” Journal of Guidance, Control, and Dynamics, vol. 34, no. 3, pp. 761–773, 2011.
[27] V. Turetsky and J. Shinar, “Missile guidance laws based on pursuit–evasion game formulations,” Automatica, vol. 39, no. 4, pp. 607–618, 2003.
[28] P. Kawecki, B. Kraska, K. Majcherek, and M. Zola, “Guarding a line segment,” Systems & Control Letters, vol. 58, no. 7, pp. 540–545, 2009.
[29] W. Rzymowski, “A problem of guarding line segment,” in Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference. IEEE, 2009, pp. 6444–6447.
[30] S. Karaman and E. Frazzoli, “Incremental sampling-based algorithms for a class of pursuit-evasion games,” in Algorithmic Foundations of Robotics IX. Springer, 2010, pp. 71–87.
[31] Z. Zhou, J. Ding, H. Huang, R. Takei, and C. Tomlin, “Efficient path planning algorithms in reach-avoid problems,” Automatica, vol. 89, pp. 28–36, 2018.
[32] M. Chen, Z. Zhou, and C. J. Tomlin, “Multiplayer reach-avoid games via pairwise outcomes,” IEEE Transactions on Automatic Control, vol. 62, no. 3, pp. 1451–1457, 2017.
[33] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. SIAM, 1999, vol. 23.
[34] D. Li, J. B. Cruz, G. Chen, C. Kwan, and M. Chang, “A hierarchical approach to multi-player pursuit-evasion differential games,” in , 2005, pp. 5674–5679.
[35] A. Von Moll, E. Garcia, D. Casbeer, S. Manickam, and S. C. Swar, “Multiple pursuer single evader border defense differential game,” in AIAA Scitech Forum, AIAA 2019-1162, 2019.
[36] A. Von Moll, D. Casbeer, E. Garcia, D. Milutinovic, and M. Pachter, “The multi-pursuer single-evader game,” Journal of Intelligent and Robotic Systems, 2019.
[37] R. E. Burkard, M. Dell’Amico, and S. Martello, Assignment Problems (Revised reprint). SIAM, 2012.