A Pursuit-Evasion Differential Game with Strategic Information Acquisition
Yunhan Huang and Quanyan Zhu

Abstract
In this paper, we study a two-person linear-quadratic-Gaussian pursuit-evasion differential game with costly but controlled information. One player can decide when to observe the other player's state, but each observation of the other player's state comes with two costs: the direct cost of observing and the implicit cost of exposing his/her own state. We call games of this type Pursuit-Evasion-Exposure-Concealment (PEEC) games. The PEEC game involves two types of strategies: control strategies and observation strategies. We fully characterize the Nash control strategies of the PEEC game using techniques such as completion of squares and the calculus of variations. We show that the derivation of the Nash observation strategies and the Nash control strategies can be decoupled. We develop a set of necessary conditions that facilitates the numerical computation of the Nash observation strategies. We show in theory that players with less maneuverability prefer concealment to exposure. We also show that when the game's horizon goes to infinity, the Nash observation strategy is to observe periodically. We conduct a series of numerical experiments to study the proposed PEEC game. We illustrate the numerical results using both figures and animation. The numerical results show that the pursuer can maintain high-grade performance even when the number of observations is limited. We also show that an evader with low maneuverability can still escape if the evader increases his/her stealthiness.
I. INTRODUCTION
Pursuit-Evasion (PE) refers to the problem in which one or more evaders try to escape from one or several pursuers.
Berge, in 1957, initiated a PE problem where evaders move along a prescribed trajectory and the pursuers track them under certain control constraints. In 1965, Isaacs, recognized as the father of differential games, bridged the problem of PE and the zero-sum differential game in his seminal work [1]. Propelled by early pioneers such as John Breakwell, Richard Bellman, Lev Pontryagin, and Yu-Chi Ho, the study of PE differential games, advancing in parallel with the theory of differential games and optimal control, has flourished over the past half-century [2]–[17]. The motive behind PE games is not limited to physical entities pursuing one another. Various formulations of PE games empower problem solving in other research areas such as robotics, sports/games, target defense, and cybersecurity [6], [14], [17]–[20].

Among differential game studies, particular attention is paid to the information patterns of Linear-Quadratic-Gaussian (LQG) differential games. The information pattern of a dynamic game describes the information available to each player at each state for sequential decision-making. Two classical information patterns are the open-loop pattern, under which players only know the initial state of the game, and the feedback pattern with full information [1], [2]. As far as information patterns are concerned, there are essentially three possibilities: no information, perfect (exact) information, or partial information. For two-player games, these possibilities lead to nine different cases in two categories, separated based on the symmetry of information between the players.

Y. Huang and Q. Zhu are with the Department of Electrical and Computer Engineering, New York University, 370 Jay St., Brooklyn, NY. {yh.huang, qz494}@nyu.edu
Many efforts have been dedicated to tackling different cases of information patterns [4], [21]–[26]. In studies of PE differential games, it is a common assumption that state information is available at any time to both players [2], [3], [6], [7], [9], [12]–[15], [17], [27]. However, in real-world applications, state information, especially information regarding one's opponent, is often not available and usually comes at a price. In military affairs, the innovation of more advanced and autonomous information and communication technologies has engendered a new revolution, making battles in cyberspace as crucial as the ones on physical battlefields. The abilities to remain stealthy and to be deceptive have become the most valued characteristics of battlefield things. However, frameworks that can capture the intricacy of the information interactions between players are missing in the existing literature.

To fill the void, in this work, we study the controlled information structure of LQG PE differential games with a finite horizon, where players can decide at each stage whether to attain information or not, which we call a Pursuit-Evasion-Exposure-Concealment (PEEC) game. Acquiring information is referred to as "making an observation" here, which is sometimes called "taking a measurement" in other references [24]–[26], [28]–[30]. Each observation comes with a cost that whoever makes the observation has to pay. Besides this quantitative price, the player who chooses to observe the other player also exposes his/her own state information. In real-world applications, the cost of observation may come from sensing and/or communication as well as stealth considerations. For example, a radar measurement can easily lead to megawatts of power usage and the measurer's exposure to the target.
In the PEEC game, each player has to decide when to observe by developing an observation strategy and how to control by designing a control strategy.

The problem of controlled observations with costs has been studied in the context of finite-horizon optimal control [31], infinite-horizon optimal control [30], and Markov decision processes [32]. Geert Jan Olsder studied costly observations in a discrete-time dynamic game setting [33], where each player at each step makes independent observation choices and obtains private observations. The author proposes a matrix game to solve for a Nash observation strategy, whose derivation becomes prohibitive when the game's horizon increases; hence, only a two-stage game problem is investigated there. In [34], the authors extended the framework of dynamic games with costly observations to the context of security problems in cyber-physical systems, where one player chooses whether to observe, and the other chooses whether to jam the observation. Both [33] and [34] focus on discrete-time dynamic games. Dipankar Maity et al. [28], [29] study dynamic games with controlled observations in a continuous-time setting, where each player can only choose to observe at a finite number of times. In [28] and [29], each player receives private observations, and one player's observation decision does not affect the other player's information set. Our work differs from [28], [29], [33], [34] in three ways. First, we study controlled observations in a PE differential game setting, where the two players have specific goals (one is chasing, the other is avoiding), and this situation can result in interesting interactions between the two players in terms of observation strategies.
Second, our work deals with an information pattern that previous works have not investigated: when one player chooses to observe, his/her information is also exposed to the opponent. Third, we fully characterize the Nash control strategies and develop a set of necessary conditions, which allows us to compute the Nash observation strategies numerically.

The contributions of this work are summarized as follows.

1) We propose a new type of PE differential game called the PEEC game, where neither the pursuer nor the evader knows the other's state information, and each can decide when to observe it by paying a cost. This framework introduces the concept of controlled information to PE differential games, which expands the interactions between the pursuer and the evader from the physical layer alone to the battlefield of information as well.

2) We first leverage Itô's formula and completion of squares to obtain the structure of the Nash control strategies. We show that the Nash control strategies are the same as would be obtained in a perfect feedback setting. Next, we fully characterize the Nash control strategies for any given observation strategies using the calculus of variations. We show that the derivation of the Nash observation strategies and the Nash control strategies can be decoupled. The Nash control strategies have the certainty equivalence property and satisfy the separation principle, and the observation strategies are determined only by the system characteristics. We show that players with less maneuverability prefer concealment to exposure and that the optimal number of observations within a finite horizon is inversely proportional to the cost of observation. We develop a set of necessary conditions that helps characterize the Nash observation strategies. The necessary conditions show that when the horizon of the game goes to infinity, it is optimal to observe/expose periodically.

3) Leveraging the theoretical results, we numerically characterize the observation strategies.
In the numerical studies, we illustrate the pursuer's and the evader's actions in the PEEC game using both figures and animation. The results show that a pursuer with higher maneuverability than the evader prefers more exposures/observations, but the pursuer can achieve reasonably good performance even when the number of observations is limited. The Nash observation strategy enables the pursuer to observe efficiently (observe less often while maintaining good performance). We also show that when only a limited number of observations is available, larger system disturbances give an evader with less maneuverability more of an advantage. A less maneuverable evader can still escape if he/she can avoid being detected by his/her opponent frequently by making it more expensive for the opponent to observe.
A. Notation
In this paper, R represents the set of real numbers and N refers to the set of natural numbers including zero. Given any vector or matrix x, x' denotes the transpose of x. Given any square matrix M, Tr(M) denotes the trace of M. Given any vector x and positive semi-definite matrix Q with proper dimensions, ‖x‖_Q = x'Qx. Note that, depending on the positive definiteness of Q, ‖·‖_Q is not necessarily a norm. Let M be any vector or matrix; Ṁ denotes the derivative of M with respect to time. Given any two square matrices M₁ and M₂, M₁ ≥ M₂ means M₁ − M₂ is positive semi-definite. Let n be a positive integer; Id_n is the identity matrix of dimension n × n.
II. FORMULATION
We consider a class of pursuit-evasion (PE) games described by the following linear stochastic differential equation

dx(t) = Ax(t)dt + B_p u_p(t)dt − B_e u_e(t)dt + C dw(t), with x(0) = x_0, (1)

where x_0 ∈ R^n is not random and is disclosed to both the pursuer and the evader, and x(t) ∈ R^n captures the states (locations) of both players at time t. The terms u_p(t) ∈ R^{m_p} and u_e(t) ∈ R^{m_e} denote respectively the control actions of the pursuer and the evader at time t. Here, w(t) is a q-dimensional real-valued standard Wiener process independent of x_0. The positive integers (n, m_p, m_e, q) are arbitrary. Moreover, A, B_p, B_e, and C are real-valued matrices with appropriate dimensions. Let I_p(t) and I_e(t) be respectively the information available to the pursuer and the evader at time instance t. The family of admissible strategies for u_p is U_p, where U_p is the set of all possible u_p such that u_p is progressively measurable with respect to I_p(t) and square-integrable on [0, T] almost surely. We define U_e in a similar way.

To characterize the objective of each player in classic PE games, we introduce a quadratic functional of x, u_p, and u_e over a finite time horizon [0, T]:

J(u_p, u_e) = E[ x(T)'Q_T x(T) + ∫_0^T ( x(t)'Q x(t) + u_p(t)'R_p u_p(t) − u_e(t)'R_e u_e(t) ) dt ],

where the expectation E[·] is over the statistics of {w(t)}; further, Q and Q_T are real-valued non-negative definite matrices, and R_p and R_e are real-valued positive definite matrices with appropriate dimensions. The objective of the pursuer is to find a u_p ∈ U_p that minimizes J, and the evader aims to do the opposite. In classic PE games, a common assumption is that the state history is fully observable to both players, i.e., I_p(t) = I_e(t) = {x(s), s ≤ t}. In our setting, we consider a controlled information structure.
Specifically, both the pursuer and the evader can decide when to observe over the time interval (0, T]. When a player decides to observe at time instance t, the player receives the state information x(t), suffers a non-negative cost, and at the same time exposes his/her state information to the other player. The cost per observation is O_p ∈ [0, ∞) for the pursuer and O_e ∈ [0, ∞) for the evader. Denoted by Ω_p = (N_p, T_p), the observation decisions of the pursuer can be characterized by N_p ∈ N (the number of observations made over the time interval (0, T]) and T_p = {t_{p,1}, t_{p,2}, ..., t_{p,N_p}} (the set of time instances when observations are made). We have Ω_e = (N_e, T_e) with N_e ∈ N and T_e = {t_{e,1}, t_{e,2}, ..., t_{e,N_e}} defined similarly. Let T = T_p ∪ T_e be the set of time instances when at least one of the players decides to observe. Without loss of generality, we suppose T = {t_1, t_2, ..., t_{N_p+N_e}}, where the time instances in T are ordered as 0 < t_1 ≤ t_2 ≤ ··· ≤ t_{N_p+N_e}. The information available to the pursuer and the evader at time t can be written as I(t) := I_p(t) = I_e(t) = {x(s) | 0 < s ≤ t, s ∈ T}. Therefore, the objective of the pursuer is to find an observation strategy Ω_p and a control strategy u_p that minimize the following cost functional

J(Ω_p, u_p, Ω_e, u_e) = E[ O_p N_p − O_e N_e + x(T)'Q_T x(T) + ∫_0^T ( x(t)'Q x(t) + u_p(t)'R_p u_p(t) − u_e(t)'R_e u_e(t) ) dt ]. (2)

Meanwhile, the evader aims to maximize J(Ω_p, u_p, Ω_e, u_e) with an optimal observation strategy Ω_e and an optimal control strategy u_e. The two players (the pursuer P and the evader E), their strategies (Ω_p, u_p) and (Ω_e, u_e), the cost functional J(Ω_p, u_p, Ω_e, u_e) in eq. (2), and the associated state dynamics given in eq. (1) constitute a linear-quadratic-Gaussian zero-sum differential game with a special controlled information structure, which we call a Pursuit-Evasion-Exposure-Concealment (PEEC) game.
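As a concrete illustration, a sample path of the dynamics in eq. (1) can be generated with a standard Euler–Maruyama discretization. The sketch below is our own illustration, not part of the formulation; the function name `simulate_state` and the scalar toy matrices in the example are arbitrary placeholders.

```python
import numpy as np

def simulate_state(A, Bp, Be, C, x0, up, ue, T=1.0, steps=1000, rng=None):
    """Euler-Maruyama discretization of eq. (1):
    dx = A x dt + Bp up dt - Be ue dt + C dw.
    `up` and `ue` are callables mapping a time t to a control vector."""
    rng = rng or np.random.default_rng(0)
    dt = T / steps
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for k in range(steps):
        t = k * dt
        dw = rng.normal(scale=np.sqrt(dt), size=C.shape[1])  # Wiener increment
        x = x + (A @ x + Bp @ up(t) - Be @ ue(t)) * dt + C @ dw
        traj.append(x.copy())
    return np.array(traj)

# Toy 1-D example with zero controls: the state is driven by noise alone.
A = np.array([[0.0]]); Bp = Be = np.array([[1.0]]); C = np.array([[0.1]])
traj = simulate_state(A, Bp, Be, C, [1.0],
                      lambda t: np.zeros(1), lambda t: np.zeros(1))
```

With A = 0 and zero controls, the simulated state remains a small random walk around its initial value, matching the role of C dw(t) as pure disturbance.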
Remark 1.
Our framework can capture pursuit-evasion differential games in various forms [3], [4], [6], [8], [12], [13], [16], [17]. In the pursuit-evasion differential games studied in these works, the pursuer and the evader usually have independent dynamics:

dx_p(t) = A_p x_p(t)dt + B̃_p u_p(t)dt + C̃_p dw_p(t),
dx_e(t) = A_e x_e(t)dt + B̃_e u_e(t)dt + C̃_e dw_e(t),

where x_p(t) ∈ R^m and x_e(t) ∈ R^m. These general dynamics can be captured by our framework by defining x = [x_p' x_e']', w = [w_p' w_e']',

Q = [ Id_m  −Id_m ; −Id_m  Id_m ],  C = [ C̃_p  0 ; 0  C̃_e ],

B_p = [B̃_p' 0]', and B_e = [0 −B̃_e']'. Note that this formulation yields x'Qx = ‖x_p − x_e‖², which describes the objectives of both the pursuer and the evader. This formulation has been adopted in [3], [6], [10], [17]. Another way of formulating is letting x = x_p − x_e when A_p = A_e. Letting Q = Id_m, we have x'Qx = ‖x_p − x_e‖². This formulation has been used in [4], [8], [12], [13].

Remark 2.
We consider a special information structure that is neither open-loop nor closed-loop. The players have symmetric information. Both players can influence the information, and one player's decision can affect the other player's information, hence the other player's control. The logic is that the two players' observation strategies (N_p, T_p) and (N_e, T_e) decide the set of time instances T = T_p ∪ T_e at which information will be available. This set T determines I(t), to which the controls have to be adapted. Apart from I(t), it is tacitly assumed that the system characteristics I_s = (A, B_p, B_e, C, x_0, Q_T, Q, R_p, R_e, O_p, O_e) are known to both players.

III. CHARACTERIZATION OF NASH STRATEGIES
In this section, we study the existence and the characterization of Nash strategies for the PEEC game. The Nash strategies involve the Nash observation strategies and the Nash control strategies selected by the two players. To characterize the Nash strategies, we first characterize the Nash control strategies for every possible pair of observation strategies. That is, for every possible Ω_p and Ω_e, we characterize the Nash control strategies (u_p*, u_e*) ∈ U_p × U_e such that

J(Ω_p, u_p*, Ω_e, u_e) ≤ J(Ω_p, u_p*, Ω_e, u_e*) ≤ J(Ω_p, u_p, Ω_e, u_e*),

for every u_p ∈ U_p and u_e ∈ U_e. Note that here the set of admissible control strategies U_p depends on the information structure I, which is controlled by both players P and E through (Ω_p, Ω_e). Hence, u_p* and u_e* also depend on (Ω_p, Ω_e). Then, we write

J̃(Ω_p, Ω_e) := J(Ω_p, u_p*(Ω_p, Ω_e), Ω_e, u_e*(Ω_p, Ω_e)), (3)

where we emphasize the dependence of the Nash control strategies on (Ω_p, Ω_e).
Next, we characterize the Nash observation strategies by finding a pair (Ω_p*, Ω_e*) such that

J̃(Ω_p*, Ω_e) ≤ J̃(Ω_p*, Ω_e*) ≤ J̃(Ω_p, Ω_e*),

for all possible Ω_p and Ω_e.

A. The Nash Control Strategies
Suppose that we are given an arbitrary pair of observation strategies (Ω_p, Ω_e). Due to the special information structure, instead of using dynamic programming techniques or Pontryagin-type approaches [5], [35], we resort to a direct method to characterize the Nash control strategies. The direct method, widely applied recently in certain types of differential games [7], [9], [27], [29], is to form a generic structure of the cost functional in eq. (2) by a standard completion of squares and to characterize the Nash control strategies by using calculus-of-variations-type techniques. The following lemma is a result of applying Itô's lemma [36] and a completion of squares to eq. (1) and eq. (2).

Lemma 1.
The cost functional J in eq. (2) associated with the state dynamics eq. (1) can be written as

J = E[ ∫_0^T ( ‖u_p(t) + R_p^{-1} B_p' K(t) x(t)‖_{R_p} − ‖u_e(t) + R_e^{-1} B_e' K(t) x(t)‖_{R_e} ) dt + O_p N_p − O_e N_e ] + ‖x_0‖_{K(0)} + ∫_0^T Tr(K(t) CC') dt, (4)

where (K(t), t ∈ [0, T]) is the symmetric non-negative solution of the Riccati equation

K̇(t) + Q + K(t)A + A'K(t) + K(t)( B_e R_e^{-1} B_e' − B_p R_p^{-1} B_p' )K(t) = 0, with K(T) = Q_T. (5)

Proof.
See Appendix A. ∎
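The Riccati equation (5) is a terminal-value problem and can be integrated backward in time. Below is a minimal numerical sketch (our own, using an assumed backward-Euler scheme); the scalar sanity check uses the closed-form solution K(t) = tanh(T − t) that eq. (5) admits when A = 0, B_e = 0, and B_p = R_p = Q = 1 with Q_T = 0.

```python
import numpy as np

def riccati_backward(A, Bp, Be, Q, QT, Rp, Re, T=1.0, steps=10000):
    """Backward-Euler integration of the game Riccati equation (5):
    dK/dt = -(Q + K A + A' K + K (Be Re^-1 Be' - Bp Rp^-1 Bp') K), K(T) = QT.
    Returns K on the grid t_k = k*T/steps, with index 0 corresponding to t = 0."""
    dt = T / steps
    S = Be @ np.linalg.inv(Re) @ Be.T - Bp @ np.linalg.inv(Rp) @ Bp.T
    K = np.array(QT, dtype=float)
    Ks = [K.copy()]
    for _ in range(steps):          # march from t = T down to t = 0
        K = K + dt * (Q + K @ A + A.T @ K + K @ S @ K)
        Ks.append(K.copy())
    return np.array(Ks[::-1])       # reorder so Ks[0] = K(0)

# Scalar sanity check: dK/dt = -1 + K^2, K(1) = 0, so K(t) = tanh(1 - t).
one = np.array([[1.0]]); zero = np.array([[0.0]])
Ks = riccati_backward(zero, one, zero, one, zero, one, one)
```

The computed K(0) should approach tanh(1) ≈ 0.7616 as the step size shrinks, which provides a quick correctness check for the integrator.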
To ensure the existence and well-definedness of the solution (K(t), t ∈ [0, T]) defined by eq. (5), i.e., that K(·) does not have a finite escape time in [0, T), we assume that B_p R_p^{-1} B_p' ≥ B_e R_e^{-1} B_e' [37]. The interpretation of this assumption in a PE game is that the pursuer has more maneuverability than the evader, which ensures that the cost J remains bounded.

In the classic PE game, the knowledge of the state x(t) for all t ∈ [0, T] is available to both players and there is no cost of observation; in that case we can obtain a pair of Nash strategies (u_p*(t), u_e*(t)) = (−R_p^{-1} B_p' K(t) x(t), −R_e^{-1} B_e' K(t) x(t)), which yields the cost ‖x_0‖_{K(0)} + ∫_0^T Tr(K CC') dt. However, in the PEEC game, the players have access to state information at only a finite number of time instances T. Note that T = T_p ∪ T_e depends on the observation strategies of both players. Recall that the observation strategies Ω_p, Ω_e can be characterized by the numbers of observations N_p, N_e and the time instances T_p, T_e at which an observation is made. The following theorem gives the Nash control strategies for every possible pair of observation strategies Ω_p, Ω_e of the two players. The proof of the theorem follows the idea of forming a static game with an infinite-dimensional action space and leveraging the Gâteaux derivative to check the first- and second-order conditions of a Nash equilibrium (a saddle point in this zero-sum game).
Theorem 1.
Given arbitrary Ω_p = (N_p, T_p) and Ω_e = (N_e, T_e), let T = T_p ∪ T_e = {t_1, t_2, ..., t_{N_p+N_e}} with 0 < t_1 ≤ t_2 ≤ ··· ≤ t_{N_p+N_e} < T. Let I(t) = {x(s) | 0 < s ≤ t, s ∈ T} be the information available to P and E at time t. The Nash control strategies of the PEEC game defined by eqs. (1) and (2) are

u_p*(t) = −R_p^{-1} B_p' K(t) x̂(t),
u_e*(t) = −R_e^{-1} B_e' K(t) x̂(t), (6)

where (K(t), t ∈ [0, T]) is the solution of the Riccati equation eq. (5) and (x̂(t), t ∈ [0, T]) is the solution of the following ordinary differential equation:

dx̂(t) = ( A − (B_p R_p^{-1} B_p' − B_e R_e^{-1} B_e') K(t) ) x̂(t) dt,  x̂(0) = x_0,  x̂(τ) = x(τ) for all τ ∈ T. (7)

Proof.
See Appendix B. ∎
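The estimator in eq. (7) evolves as a deterministic linear ODE between observations and is reset to the true state at each observation instance. A minimal forward-Euler sketch (ours; the closed-loop matrix is passed in as a generic callable `Acl` rather than assembled from A, B_p, B_e, R_p, R_e, and K):

```python
import numpy as np

def propagate_estimate(Acl, x0, observations, T=1.0, steps=1000):
    """Forward-Euler propagation of the estimator ODE in eq. (7),
    d x_hat = Acl(t) x_hat dt, with x_hat reset to the observed state
    whenever an observation is available.  `observations` maps a grid
    index k (time k*T/steps) to the observed state vector at that time."""
    dt = T / steps
    xh = np.array(x0, dtype=float)
    out = [xh.copy()]
    for k in range(steps):
        xh = xh + dt * Acl(k * dt) @ xh
        if k + 1 in observations:        # reset: estimate jumps to the true state
            xh = np.array(observations[k + 1], dtype=float)
        out.append(xh.copy())
    return np.array(out)

# Scalar example: Acl = -1 and no observations, so x_hat(1) is close to e^{-1}.
est = propagate_estimate(lambda t: np.array([[-1.0]]), [1.0], {})
```

Passing a non-empty `observations` dictionary reproduces the reset behavior of eq. (7): the estimate jumps to the observed value and the open-loop decay restarts from there.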
Remark 3.
Theorem 1 unveils the certainty equivalence property and the separation principle. If perfect feedback of state information is available, the Nash control strategies are the same as would be obtained in the absence of the additive disturbances. The missing feedback of state information is replaced by an estimate whose statistics are independent of the control. The separation principle also allows us to characterize the Nash observation strategies separately from the control strategies.
Remark 4.
As we can see from eq. (7), between two neighboring observation time instances (say t_i and t_{i+1}), the two players conduct open-loop control with initial condition x(t_i). But the control is not open-loop for the entire horizon [0, T]. Whenever an observation is made, a closed-loop information structure is formed at that particular time instance: the estimate is reset to the actual state and the variance of the estimation error becomes zero. In extreme cases, such as when N_p = N_e = 0 and thus T = ∅, the Nash control strategies become open-loop. When T = [0, T], the Nash control strategies have a closed-loop information structure. In Section III-B, we will discuss under what conditions these extreme cases are attained.

Under the Nash control strategies (u_p*, u_e*), the following corollary presents the cost functional J̃(Ω_p, Ω_e) for any given observation strategies (Ω_p, Ω_e).

Corollary 1.
Given arbitrary Ω_p = (N_p, T_p) and Ω_e = (N_e, T_e), under the Nash control strategies (u_p*, u_e*) given in eq. (6) of Theorem 1, the cost functional J̃(Ω_p, Ω_e) defined in eq. (3) is given by

J̃(Ω_p, Ω_e) = Σ_{i=0}^{N_p+N_e} ∫_{t_i}^{t_{i+1}} Tr[ Σ(t − t_i) φ(t) ] dt + O_p N_p − O_e N_e + ‖x_0‖_{K(0)} + ∫_0^T Tr(K(t) CC') dt, (8)

where

Σ(t) = ∫_0^t e^{A(t−s)} CC' e^{A(t−s)'} ds,  φ(t) = K(t)( B_p R_p^{-1} B_p' − B_e R_e^{-1} B_e' )K(t), (9)

and 0 = t_0 < t_1 ≤ t_2 ≤ ··· ≤ t_{N_p+N_e} < t_{N_p+N_e+1} = T.

Proof. See Appendix C. ∎
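The matrix Σ(t) in eq. (9) can be evaluated by simple quadrature of its defining integral. A sketch under the assumption that `scipy.linalg.expm` is available (the function name `sigma` and the trapezoid rule are our own choices):

```python
import numpy as np
from scipy.linalg import expm

def sigma(t, A, C, n=200):
    """Trapezoid-rule quadrature for eq. (9):
    Sigma(t) = int_0^t e^{A(t-s)} C C' e^{A(t-s)'} ds,
    i.e. the state covariance accumulated since the last observation."""
    if t == 0.0:
        return np.zeros((A.shape[0], A.shape[0]))
    s = np.linspace(0.0, t, n + 1)
    vals = np.array([expm(A * (t - si)) @ C @ C.T @ expm(A * (t - si)).T
                     for si in s])
    h = t / n
    return h * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])

# Scalar check: A = 0, C = 1 gives Sigma(t) = t exactly.
Sig = sigma(0.5, np.zeros((1, 1)), np.ones((1, 1)))
```

For stable scalar A = −a the quadrature should reproduce Σ(t) = (1 − e^{−2at})/(2a), which gives a second sanity check.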
Note that t_1, t_2, ..., t_{N_p+N_e} ∈ T are the ordered time instances at which at least one of the players chooses to observe. Now we can see how the observation strategies of player P and player E affect the cost functional. The choices of observation points T_p and T_e give T = T_p ∪ T_e, which is the set of time instances when state information will be available to both players, and hence determines the information set I. The control strategies, which are adapted to I, are affected accordingly. Since the last two terms in eq. (8) are constant, to study the Nash observation strategies, we only need to focus on the first three terms of eq. (8).

B. The Nash Observation Strategies
In this section, we focus on characterizing the Nash observation strategies (Ω_p*, Ω_e*). Following the results of Corollary 1, the problem of characterizing a Nash observation strategy reduces to solving the following problem:

min_{Ω_p} max_{Ω_e} J̃_o(Ω_p, Ω_e) := Σ_{i=0}^{N_p+N_e} ∫_{t_i}^{t_{i+1}} Tr[ Σ(t − t_i) φ(t) ] dt + O_p N_p − O_e N_e, (10)

where (Σ(t), t ∈ [0, T]) and (φ(t), t ∈ [0, T]) are defined in eq. (9). The observation strategy of player P involves N_p, the number of observations made in the time interval (0, T], and T_p = {t_{p,1}, t_{p,2}, ..., t_{p,N_p}}, the time instances at which an observation is made; the observation strategy of player E is analogous. The observation strategies of both players can be determined offline by solving the finite-dimensional minimax problem in eq. (10). The coupling between the two players' observation strategies arises because, if one player chooses to observe the other player's state, his/her own state information will be disclosed. To solve the problem in eq. (10), we first develop some structural results regarding its solution.

Proposition 1.
Suppose B_p R_p^{-1} B_p' ≥ B_e R_e^{-1} B_e'. Consider the Concealment-Exposure (CE) game defined in eq. (10). Let (Ω_p* = (N_p*, T_p*), Ω_e* = (N_e*, T_e*)) be a Nash strategy of the CE game. If B_p R_p^{-1} B_p' > B_e R_e^{-1} B_e', we have:

(i) No matter what the observation strategy of the pursuer is, the best observation strategy for the evader E is not to observe, i.e., N_e* = 0, T_e* = ∅ for all Ω_p.

(ii) When O_p = 0, it is optimal for the pursuer P to observe at every time, i.e., N_p* = ∞, T_p* = [0, T]. When O_p > 0, the optimal number of observations for the pursuer P is upper bounded and inversely proportional to the observation cost O_p, i.e.,

N_p* ≤ (1/O_p) ∫_0^T Tr( Σ(t) φ(t) ) dt. (11)

(iii) The optimal observation time instances T_p* for the pursuer P exist and need to satisfy

∫_{t*_{p,i−1}}^{t*_{p,i}} Tr[ e^{A(t*_{p,i} − t)} CC' e^{A(t*_{p,i} − t)'} φ(t*_{p,i}) ] dt = ∫_{t*_{p,i}}^{t*_{p,i+1}} Tr[ e^{A(t − t*_{p,i})} CC' e^{A(t − t*_{p,i})'} φ(t) ] dt, (12)

for i = 1, 2, ..., N_p*.

Proof. See Appendix D. ∎
Remark 5.
In Proposition 1, we focus on the case where B_p R_p^{-1} B_p' − B_e R_e^{-1} B_e' > 0. When B_p R_p^{-1} B_p' = B_e R_e^{-1} B_e', we have φ(t) = 0 for all t. In this case, the CE game becomes min_{N_p} max_{N_e} O_p N_p − O_e N_e, and the Nash observation strategies for both players are simply not to observe at all. When B_p R_p^{-1} B_p' < B_e R_e^{-1} B_e', the solution of the Riccati equation in eq. (5) admits a finite escape time [37]. That means the PEEC game admits an unbounded value. Hence, discussing the observation strategies becomes meaningless in this case.
Remark 6.
From Proposition 1 (i), we know that when the pursuer has stronger maneuverability than the evader (i.e., B_p R_p^{-1} B_p' − B_e R_e^{-1} B_e' > 0), the best observation strategy for the evader is to stay stealthy, i.e., not to observe and hence not to expose him/herself. The result in (ii) tells us that when there is no observation cost for the pursuer, i.e., O_p = 0, the pursuer, having better maneuverability, has no concerns about stealthiness and will observe as often as possible. When the cost of observation is not zero, i.e., O_p > 0, deterred by the cost of sensing and communication, the pursuer optimally observes only a finite number of times, and the optimal number of observations is inversely proportional to the observation cost O_p. When an arbitrary number of observations N_p is given, we show in (iii) that there always exists a set of optimal observation instances T_p* that minimizes the pursuer's cost. Since N_p* ∈ N is upper bounded, it is guaranteed that there exists a Nash observation strategy (Ω_p*, Ω_e*) with Ω_e* = (0, ∅) and that different Nash observation strategies will produce the same value (cost).

Remark 7.
Besides the results regarding the existence of a Nash observation strategy, (iii) also offers a set of necessary conditions that helps characterize a Nash observation strategy. From eq. (12), we can see that the optimal observation time instances are distributed evenly over the time horizon when K(t) becomes stationary, i.e., t*_{p,i+1} − t*_{p,i} ≈ t*_{p,i} − t*_{p,i−1} for t*_{p,i−1}, t*_{p,i}, t*_{p,i+1} in some horizon. An extreme case is when T → ∞. We know from [35] (Section 8.3) that when T → ∞, K(t) → K̃, where K̃ is the solution of the algebraic version of eq. (5). Hence, eq. (12) can be written as

∫_{t*_{p,i−1}}^{t*_{p,i}} Tr[ e^{A(t*_{p,i} − t)} CC' e^{A(t*_{p,i} − t)'} φ̃ ] dt = ∫_{t*_{p,i}}^{t*_{p,i+1}} Tr[ e^{A(t − t*_{p,i})} CC' e^{A(t − t*_{p,i})'} φ̃ ] dt,

where φ̃ = K̃( B_p R_p^{-1} B_p' − B_e R_e^{-1} B_e' )K̃ is positive definite. Hence, we have t*_{p,i+1} − t*_{p,i} = t*_{p,i} − t*_{p,i−1} for any i.

In Proposition 1, we show the existence of a Nash observation strategy and partially characterize it via theoretical analysis. More specifically, we characterize the evader's strategy, derive an upper bound on the pursuer's optimal number of observations N_p*, and develop a set of necessary conditions for the optimal observation time instances T_p*. We also show in Remark 7 that when T goes to infinity, the Nash strategy of the pursuer is to observe periodically. For a finite T, to fully characterize a Nash observation strategy Ω_p*, we need to solve the following finite-dimensional optimization problem:

F_p(N_p) := min_{t_{p,1}, t_{p,2}, ..., t_{p,N_p} ∈ [0,T]} Σ_{i=0}^{N_p} ∫_{t_{p,i}}^{t_{p,i+1}} Tr[ Σ(t − t_{p,i}) φ(t) ] dt + O_p N_p,
s.t. t_{p,0} = 0, t_{p,N_p+1} = T, t_{p,i} ≤ t_{p,i+1}, i = 0, 1, 2, ..., N_p, (13)

where F_p(N_p) is the optimal value of the CE game when the number of observations made is N_p. The first-order necessary conditions of this problem are provided in eq. (12).
In general, a closed-form solution to the optimization problem in eq. (13) is unattainable. Since the first- and second-order differentials of the objective function in eq. (13) can be expressed explicitly and the problem has only linear inequality constraints, we can resort to numerical methods to solve the problem in eq. (13). We leave the discussion of numerical methods to Section IV.
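To make the numerical approach concrete, consider the scalar toy instance A = 0, C = 1, φ ≡ 1 (our simplification, not a case treated in the paper), for which Σ(t − t_i) = t − t_i and each inter-observation interval of length d contributes d²/2 to the objective in eq. (13). A projected-gradient sketch with numerical gradients (step size and iteration count are arbitrary choices):

```python
import numpy as np

def cost(times, T=1.0):
    """Toy instance of the objective in eq. (13) with scalar A = 0, C = 1 and
    phi = 1: each gap of length d between consecutive observation points
    contributes d^2 / 2 (the O_p * N_p term is constant for fixed N_p)."""
    pts = np.concatenate(([0.0], np.sort(times), [T]))
    gaps = np.diff(pts)
    return 0.5 * np.sum(gaps ** 2)

def projected_gradient_descent(t0, T=1.0, lr=0.1, iters=500, h=1e-6):
    """Central-difference gradient descent on the observation times, projecting
    each iterate back onto the ordered set {0 <= t_1 <= ... <= t_N <= T}."""
    t = np.array(t0, dtype=float)
    for _ in range(iters):
        g = np.array([(cost(t + h * e) - cost(t - h * e)) / (2 * h)
                      for e in np.eye(len(t))])
        t = np.clip(np.sort(t - lr * g), 0.0, T)   # projection step
    return t

# One observation in [0, 1]: equal spacing (t* = 0.5) is optimal, consistent
# with the even-spacing result discussed in Remark 7.
t_star = projected_gradient_descent([0.9])
```

In this toy case the minimizer is the evenly spaced grid, which matches the periodic-observation behavior predicted for the stationary regime.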
Remark 8.
The discussion so far allows the pursuer to determine his/her observation strategy offline. To find an online implementation of the observation strategy, we can leverage dynamic programming techniques. We can first define

V(t) = min_{N_p} min_{t_{p,1}, t_{p,2}, ..., t_{p,N_p} ∈ [t,T]} Σ_{i=0}^{N_p} ∫_{t_{p,i}}^{t_{p,i+1}} Tr[ Σ(s − t_{p,i}) φ(s) ] ds + O_p N_p,

with t_{p,0} = t, t_{p,N_p+1} = T, and V(T) = 0. Then, we need to show that

V(t) = min_{Δt ≤ T−t} [ ∫_0^{Δt} Tr( Σ(s) φ(t + s) ) ds + O_p + V(t + Δt) ],

where V(·) can be characterized by using techniques like approximate dynamic programming. With V(·) characterized, whenever an observation is made, say at time t, the pursuer can determine online the optimal waiting time until the next observation, Δt*, by solving

Δt* = arg min_{Δt ≤ T−t} [ ∫_0^{Δt} Tr( Σ(s) φ(t + s) ) ds + O_p + V(t + Δt) ].

The analysis of the dynamic programming approach and the online implementation is outside the scope of this paper. We leave it for future work.
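The waiting-time recursion above can be sketched as a grid dynamic program. The toy model below is our own (it again uses the scalar case where a gap of length d between observations costs d²/2, and it adds a "never observe again" option to terminate); it is an illustration of the recursion, not the paper's proposed algorithm:

```python
import numpy as np

def value_function(Op, T=1.0, n=100):
    """Grid dynamic program for the waiting-time recursion in Remark 8, on the
    toy scalar model where an interval of length d between observations costs
    d^2 / 2.  V[k] is the optimal cost-to-go from grid time k*T/n; V[n] = 0."""
    dt = T / n
    V = np.zeros(n + 1)
    for k in range(n - 1, -1, -1):
        no_obs = 0.5 * ((n - k) * dt) ** 2   # never observe again until T
        options = [0.5 * ((j - k) * dt) ** 2 + Op + V[j]
                   for j in range(k + 1, n + 1)]
        V[k] = min([no_obs] + options)
    return V

V_cheap = value_function(Op=0.0)     # free observations: observe constantly
V_dear = value_function(Op=10.0)     # prohibitive cost: never observe
```

The two extremes mirror Proposition 1 (ii): with O_p = 0 the cost-to-go collapses toward zero by observing at every grid point, while a prohibitive O_p makes the no-observation open-loop cost optimal.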
IV. NUMERICAL EXPERIMENTS
To illustrate the PEEC game and the Nash strategies, we consider a one-pursuer one-evader game. The space is a planar surface for visualization purposes. Let y_p ∈ R² be the two-dimensional coordinates (position) of the pursuer, z_p = ẏ_p ∈ R² be the velocity vector, and u_p ∈ R² be the acceleration control vector. Let y¹ and y² be the names of the two coordinates. Let x_p = [y_p' z_p']' be the state of the pursuer, which includes the location and the velocity of the pursuer. The state of the pursuer is subject to a certain degree of disturbance, which is captured by a 4-dimensional standard Wiener process w_p(t) ∈ R⁴ for all t. By physical law, the state dynamics of the pursuer are

dx_p(t) = A x_p(t) dt + B_p u_p(t) dt + C_p dw_p(t),

where A = [ 0 1 ; 0 0 ] ⊗ Id₂, B_p = [0 1]' ⊗ Id₂, and C_p = c_p · Id₄. We define y_e ∈ R² as the coordinates of the evader. Similarly, we have z_e = ẏ_e and x_e = [y_e' z_e']'. The state dynamics of the evader can be described by

dx_e(t) = A x_e(t) dt + B_e u_e(t) dt + C_e dw_e(t),

where B_e = [0 1]' ⊗ Id₂ and C_e = c_e · Id₄. Define a new state x = x_p − x_e. We have

dx(t) = A x(t) dt + B_p u_p(t) dt − B_e u_e(t) dt + C dw(t),
where $c = \sqrt{c_p^2 + c_e^2}$ and $(w(t), t \ge 0)$ is a standard Wiener process.

Fig. 1: A realization of the PEEC game when $O_p = \infty$ and $c_p = c_e = \sqrt{\cdot}$. (a) Trajectories of the Pursuer and the Evader on a two-dimensional plane; (b) Trajectory of the relative positions between the Pursuer and the Evader; (c) The Euclidean norm of the estimation error over time; (d) The Euclidean norm of the relative positions between the Pursuer and the Evader over time.

The pursuer tries to minimize the distance between him/her and the evader; the evader tries to maximize it. Assume that acceleration on both axes requires the same amount of effort/energy. Hence, we have $Q = \cdot \otimes \mathrm{Id}$, $R_p = \gamma R_e$, and $R_e = \cdot\,\mathrm{Id}$, where $\gamma \le \cdot$. Let $Q_T = \cdot\, Q$ and $\gamma = \cdot$. Let the terminal time be $T = \cdot\,$s. We set the initial positions and the initial velocities of the two players to be $x_p(0) = [-20\ 5\ 10]'$ and $x_e(0) = [-50\ 10\ 1\ 10]'$. Parameters $c$, $\gamma$, $O_p$, and $O_e$ are subject to change. For the numerical computation of the Nash observation strategies, we know that when $\gamma = \cdot$, the evader has less maneuverability than the pursuer. Hence, the evader's observation strategy is not to observe, so as to avoid exposing himself/herself.
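A minimal simulation sketch of the relative dynamics above, using an Euler-Maruyama discretization, is given below. The feedback gains, noise level, horizon, and initial state here are illustrative assumptions; in particular, the gains are simple stabilizing gains, not the Nash gains $K(t)$ of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2)
A = np.kron(np.array([[0.0, 1.0], [0.0, 0.0]]), I2)   # position/velocity blocks
B = np.kron(np.array([[0.0], [1.0]]), I2)             # acceleration enters velocity
c = np.sqrt(2.0)                                      # assumed noise intensity
C = c * np.eye(4)

T, dt = 5.0, 1e-3
x = np.array([-20.0, 5.0, 0.0, 0.0])                  # illustrative initial relative state
for _ in range(int(T / dt)):
    u_p = -0.8 * x[:2] - 1.0 * x[2:]                  # assumed pursuit feedback (not the Nash gain)
    u_e = 0.1 * x[:2]                                 # assumed evasion feedback
    dw = rng.normal(scale=np.sqrt(dt), size=4)
    x = x + (A @ x + B @ u_p - B @ u_e) * dt + C @ dw

print(np.linalg.norm(x[:2]))   # relative distance at the terminal time
```

With these stabilizing (though suboptimal) gains, the relative position contracts toward the origin up to the noise floor, mirroring the qualitative behavior reported in the figures.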
Fig. 2: A realization of the PEEC game when $O_p = \cdot$ and $c_p = c_e = \sqrt{\cdot}$. (a) Trajectories of the Pursuer and the Evader on a two-dimensional plane; (b) Trajectory of the relative positions between the Pursuer and the Evader; (c) The Euclidean norm of the estimation error over time; (d) The Euclidean norm of the relative positions between the Pursuer and the Evader over time.

To compute the pursuer's strategy, we first leverage the result given in eq. (11) to compute the upper bound $\bar{N}_p^*$ on the optimal number of observations. Then, for every $N_p \le \bar{N}_p^*$, we solve the finite-dimensional optimization problem in eq. (13). Since the gradient and the Hessian of the objective function of eq. (13) can be computed explicitly, we use projected gradient descent to compute the minimizer and verify sufficiency by checking the second-order condition.

In Figures 1-3, we present realizations of the PEEC game under various costs of observation when the system noise level is $c_p = c_e = \sqrt{\cdot}$. In Fig. 4, we present a realization of the PEEC game when the optimal number of observations is $\cdot$ and the system noise level is $c_p = c_e = \sqrt{\cdot}$. To facilitate visualization, we use animation to show the moving trajectories of the pursuer and the evader at the link below. We also add time indices $t = \{\cdot\}$ to the figures to help readers visualize the moving trajectories.

https://github.com/Yun-Han/PE-DifferentialGame-StrategicInfo/tree/master/VideoSharing
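A minimal sketch of the projected-gradient-descent step is shown below, with the objective of eq. (13) replaced by an illustrative surrogate in which the accrued estimation-error cost on each inter-observation interval grows quadratically with the interval length. The horizon `T`, the number of observations `N_p`, the step size, and the surrogate are assumptions.

```python
import numpy as np

T, N_p, lr = 10.0, 3, 0.05

def cost(ts):
    # surrogate objective: sum over inter-observation intervals of an
    # accrued-error term (gap^2 / 2), standing in for the trace integrals
    knots = np.concatenate(([0.0], np.sort(ts), [T]))
    return sum((b - a) ** 2 / 2.0 for a, b in zip(knots[:-1], knots[1:]))

def grad(ts, eps=1e-6):
    # finite-difference gradient (closed forms exist via the Leibniz rule)
    g = np.zeros_like(ts)
    for i in range(len(ts)):
        e = np.zeros_like(ts)
        e[i] = eps
        g[i] = (cost(ts + e) - cost(ts - e)) / (2 * eps)
    return g

ts = np.array([1.0, 2.0, 3.0])          # initial guess
for _ in range(500):
    ts = ts - lr * grad(ts)
    ts = np.clip(np.sort(ts), 0.0, T)   # project onto the ordered box [0, T]

print(ts)   # for this surrogate, evenly spaced instants are optimal
```

For this surrogate the minimizer is the evenly spaced schedule, which the iteration recovers; for the true objective, the gradient would instead come from the Leibniz-rule expression derived in the appendix.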
Fig. 3: A realization of the PEEC game when $O_p = \cdot$ and $c_p = c_e = \sqrt{\cdot}$. (a) Trajectories of the Pursuer and the Evader on a two-dimensional plane; (b) Trajectory of the relative positions between the Pursuer and the Evader; (c) The Euclidean norm of the estimation error over time; (d) The Euclidean norm of the relative positions between the Pursuer and the Evader over time.

When the cost of observation is infinite, i.e., $O_p = \infty$, the optimal observation strategy for the pursuer is not to observe at all. As we can see in Fig. 1(a), the only observation point (marked by a blue cross marker) is the initial condition, which is assumed to be known to both players. In this case, the controls of both players are equivalent to the open-loop Nash control strategies in a deterministic setting. Since both players know each other's initial position, at the beginning the evader escapes in the exact opposite direction of where the pursuer is initially located. This is due to the fact that acceleration on the $y_1$ axis and the $y_2$ axis requires the same cost, i.e., $R_p$ and $R_e$ are identity matrices multiplied by some constants. As we can see from Fig. 1(d), the Euclidean distance between the pursuer and the evader narrows. But as the estimation error accumulates due to the lack of observations, the pursuer loses track of the evader and even goes beyond where the evader is actually located, $y_e(t)$.

When the cost of observation is $O_p = \cdot$, the optimal observation strategy for the pursuer is to observe two
Fig. 4: A realization of the PEEC game when $O_p = \cdot$ and $c_p = c_e = \sqrt{\cdot}$. (a) Trajectories of the Pursuer and the Evader on a two-dimensional plane; (b) Trajectory of the relative positions between the Pursuer and the Evader; (c) The Euclidean norm of the estimation error over time; (d) The Euclidean norm of the relative positions between the Pursuer and the Evader over time.

times, at time instances $\mathcal{T}_p = \{\cdot\,\mathrm{s}, \cdot\,\mathrm{s}\}$. When the pursuer observes, the evader also learns the pursuer's location at the same time. Hence, there are observation points for both players in Fig. 2(a), including the initial points. Based on the initial condition, as in Fig. 1(a), the evader runs away from the pursuer, and the pursuer chases after the evader in the same direction. At the first observation instant, the pursuer triggers the observation and both players observe each other's location. At this time, the relative position between the two players has almost the same angle as the relative position at time $0$, so the trajectories of the two players are almost straight lines until the next observation. At the second observation, the pursuer and the evader receive each other's location and realize that the relative angle between them has changed. Thus, after the observation, both players adjust their directions of chasing and evading, which causes a sharp turn in their trajectories. As we can see from Fig. 2(c), the estimate is refreshed to the actual state information and the estimation error is reset to zero when an observation arrives. From Fig. 2(b), the relative position between the two players is close to the origin near the terminal time. And as is shown in Fig. 2(d), the Euclidean norm of the relative position drops at the end to a relatively low value compared with the Euclidean distance at the initial positions. This indicates that when the disturbance level is $c_p = c_e = \sqrt{\cdot}$, it is not necessary to observe at all times to ensure good performance.
With an optimized set of observation time instances $\mathcal{T}_p$, the pursuer can still achieve fairly good performance. Hence, the Nash observation strategy can also be used to help the pursuer save sensing/communication costs while maintaining a certain level of performance.

If the cost of observation goes down to $O_p = \cdot$, it is optimal to observe at a much larger set of time instances $\mathcal{T}_p$. As we can see from Fig. 3(a), the pursuer follows behind the evader and the trajectories of the two players overlap. We refer the readers to the animation provided at the link for a clearer depiction of the trajectories. The pursuer senses frequently and, as a result, the evader receives observations frequently. Hence, the pursuer and the evader adapt their controls immediately when they realize that the angle of the relative position has changed. The estimation error remains low, as is shown in Fig. 3(c). From Fig. 3(b) and (d), we can see that with better maneuverability and frequent observations, the pursuer can easily narrow the distance to the evader to near zero before the terminal time.

We then increase the system disturbance level to $c_p = c_e = \sqrt{\cdot}$. Fig. 4 presents a realization of the PEEC game in this setting, where the optimal number of observations is $N_p^* = \cdot$. Compared with the lower-disturbance setting presented in Fig. 2, the pursuer fails to narrow his/her distance to the evader to near zero when the system disturbances are larger. This shows that larger system disturbances give more advantage to an evader with less maneuverability when the pursuer has to pay a large overhead to sense. Hence, even if an evader is less maneuverable than the pursuer, the evader can still escape if he/she can keep a high stealth level (making it more expensive for the pursuer to observe).
In military applications, this means stealth technologies are especially important for battlefield things with less maneuverability.

In conclusion, in this section we show that a pursuer with higher maneuverability than the evader prefers more observations (exposures), but the pursuer can achieve reasonably good performance even when the number of observations is low. The Nash observation strategy enables the pursuer to observe less often while maintaining good performance. We also show that when only a limited number of observations is available, larger system disturbances give an evader with less maneuverability more advantage. A less maneuverable evader can still escape if he/she can avoid being detected frequently, by making it more expensive for his/her opponent to observe.

V. CONCLUSIONS
This paper proposes a framework that introduces the concept of controlled information into PE differential games. This framework enriches the existing framework of PE differential games by capturing the interactions between the pursuer and the evader on the battlefield of information. We show that the Nash observation strategies depend only on the system characteristics. Players with less maneuverability will not observe at all for fear of exposing their own states. The proposed PEEC game has a symmetric information structure because when one player observes, the other player also obtains the information. With a symmetric information structure, we avoid the second-guessing problem, which may render the problem intractable. The framework also sparks several exciting ideas for future exploration: 1) when one player senses (detects) the state (location) of the other player, he/she may expose his/her own state (location), but the information received by the other player is noisier than what he/she receives; this scenario creates an asymmetric information game with noisy observations; 2) future work can focus on analyzing statistical aspects of the players' performance, such as the probability of capture within a given time.

APPENDIX

A. Proof of Lemma 1

Proof.
In this proof, we drop the time index of some variables for simplicity and readability. The proof follows the arguments in the proof of Theorem II.1 in [9]. Let $f(x,t) \coloneqq x(t)'K(t)x(t)$. An application of Itô's formula [36] gives
$$\begin{aligned} df(x,t) &= \frac{\partial f}{\partial t}(x,t)\,dt + \nabla_x f(x,t)'\,dx(t) + \frac{1}{2}\,dx(t)'\,\nabla_{xx} f(x,t)\,dx(t) + o(dt) \\ &= x'\dot{K}x\,dt + x'K\,dx + dx'Kx + dx'K\,dx + o(dt) \\ &= x'\big(\dot{K} + KA + A'K\big)x\,dt + x'K(B_pu_p - B_eu_e)\,dt + (B_pu_p - B_eu_e)'Kx\,dt \\ &\quad + x'KC\,dw(t) + dw(t)'C'Kx + dw(t)'C'KC\,dw(t) + o(dt), \end{aligned}$$
where $\nabla_x$ and $\nabla_{xx}$ are the gradient and Hessian operators with respect to $x$, respectively. Integrating from $0$ to $T$ and taking expectations (the $dw$ terms vanish in expectation), we immediately have
$$0 = \mathbb{E}\left[\int_0^T x'\big(\dot{K} + KA + A'K\big)x + x'K(B_pu_p - B_eu_e) + (B_pu_p - B_eu_e)'Kx\,dt\right] + \int_0^T \operatorname{Tr}(KCC')\,dt - \Big\{\mathbb{E}\big[f(x(T),T) - f(x(0),0)\big]\Big\}. \tag{14}$$
Adding the right-hand side of eq. (14) to $J$ in eq. (2) and completing the squares yield
$$\begin{aligned} J &= \mathbb{E}\big[x(0)'K(0)x(0) - x(T)'K(T)x(T) + x(T)'Q_Tx(T)\big] \\ &\quad + \mathbb{E}\left[\int_0^T x'\Big(\dot{K} + KA + A'K + K\big(B_eR_e^{-1}B_e' - B_pR_p^{-1}B_p'\big)K\Big)x\,dt\right] \\ &\quad + \mathbb{E}\left[\int_0^T \|u_p + R_p^{-1}B_p'Kx\|^2_{R_p} - \|u_e + R_e^{-1}B_e'Kx\|^2_{R_e} + \operatorname{Tr}(KCC')\,dt + O_pN_p - O_eN_e\right] \\ &= \|x_0\|^2_{K(0)} + \mathbb{E}\left[\int_0^T \|u_p + R_p^{-1}B_p'Kx\|^2_{R_p} - \|u_e + R_e^{-1}B_e'Kx\|^2_{R_e} + \operatorname{Tr}(KCC')\,dt + O_pN_p - O_eN_e\right]. \end{aligned}$$
This completes the proof. $\square$
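The completing-the-squares step above can be checked numerically on random matrices; the dimensions and random instances below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
x = rng.normal(size=n)
u_p, u_e = rng.normal(size=m), rng.normal(size=m)
K = rng.normal(size=(n, n))
K = K + K.T                                   # symmetric, as in the proof
B_p, B_e = rng.normal(size=(n, m)), rng.normal(size=(n, m))
R_p, R_e = np.eye(m) * 2.0, np.eye(m) * 3.0   # positive-definite weights

def sq(u, R, B):
    # completed square ||u + R^{-1} B' K x||^2_R
    v = u + np.linalg.solve(R, B.T @ K @ x)
    return v @ R @ v

# left side: cross terms and control costs as they appear before completion
lhs = (u_p @ R_p @ u_p - u_e @ R_e @ u_e
       + 2 * x @ K @ (B_p @ u_p - B_e @ u_e))
# right side: completed squares plus the quadratic correction in x
rhs = (sq(u_p, R_p, B_p) - sq(u_e, R_e, B_e)
       + x @ K @ (B_e @ np.linalg.solve(R_e, B_e.T)
                  - B_p @ np.linalg.solve(R_p, B_p.T)) @ K @ x)
print(abs(lhs - rhs) < 1e-9)   # True
```

The identity holds exactly for any symmetric $K$ and positive-definite $R_p$, $R_e$, which is what makes the quadratic-in-$x$ term absorbable into the Riccati equation for $K$.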
B. Proof of Theorem 1

Proof.
In this proof, we drop the time index of some variables for simplicity and readability. The proof follows a similar line of arguments as in [4], [28]. Given arbitrary $\Omega_p$ and $\Omega_e$, player P aims to minimize $J$, while player E aims to maximize $J$. From Lemma 1, we know that only the first two terms in eq. (4) depend on the choices of $u_p$ and $u_e$. Thus, the Nash control strategies can be obtained by solving
$$\min_{u_p \in \mathcal{U}_p} \max_{u_e \in \mathcal{U}_e} J_c(u_p, u_e),$$
where
$$J_c(u_p, u_e) \coloneqq \mathbb{E}\left[\int_0^T \|u_p(t) + R_p^{-1}B_p'K(t)x(t)\|^2_{R_p} - \|u_e(t) + R_e^{-1}B_e'K(t)x(t)\|^2_{R_e}\,dt\right].$$
From Proposition 3.2 of [28], we know that a necessary condition for a Nash control strategy is that $u_p$ lies in the range space of the linear operator $R_p^{-1}B_p'K$ and $u_e$ lies in the range space of the linear operator $R_e^{-1}B_e'K$. Moreover, $\mathcal{U}_p$ and $\mathcal{U}_e$ are the sets of admissible control strategies that are progressively measurable with respect to $\mathcal{I}$. Thus, the Nash control strategies take the form
$$\big(u_p(t), u_e(t)\big) = \Big(-R_p^{-1}B_p'K(t)\hat{x}_p(t),\; -R_e^{-1}B_e'K(t)\hat{x}_e(t)\Big),$$
where $\hat{x}_p(t)$ and $\hat{x}_e(t)$, chosen by player P and player E respectively, have to be $\mathcal{I}(t)$-measurable. The problem now becomes finding $\hat{x}_p$ and $\hat{x}_e$ that are progressively $\mathcal{I}$-measurable and solve
$$\min_{\hat{x}_p} \max_{\hat{x}_e} \tilde{J}_c(\hat{x}_p, \hat{x}_e) \coloneqq \int_0^T \mathbb{E}\left[\|x - \hat{x}_p\|^2_{KB_pR_p^{-1}B_p'K} - \|x - \hat{x}_e\|^2_{KB_eR_e^{-1}B_e'K} \,\Big|\, \mathcal{I}(t)\right] dt.$$
Next, we study the first- and second-order Gâteaux differentials of $\tilde{J}_c$ to characterize a Nash strategy $(\hat{x}_p, \hat{x}_e)$. First, let us calculate the first- and second-order Gâteaux differentials (pp. 120, [38]) of $\tilde{J}_c$ at $(\hat{x}_p, \hat{x}_e)$ with directions $(h_p, h_e)$:
$$\begin{aligned} d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) &\coloneqq \lim_{\epsilon \to 0} \frac{\tilde{J}_c(\hat{x}_p + \epsilon h_p, \hat{x}_e + \epsilon h_e) - \tilde{J}_c(\hat{x}_p, \hat{x}_e)}{\epsilon}, \\ d^2_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) &\coloneqq \lim_{\epsilon \to 0} \frac{\tilde{J}_c(\hat{x}_p + \epsilon h_p, \hat{x}_e + \epsilon h_e) - \tilde{J}_c(\hat{x}_p, \hat{x}_e) - \epsilon\, d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p, \hat{x}_e)}{\epsilon^2}. \end{aligned} \tag{15}$$
Note that given $\hat{x}_p$ and $\hat{x}_e$, the solution of eq.
(1) can be expressed as
$$x(t) = e^{At}x_0 - \int_0^t e^{A(t-s)}B_pR_p^{-1}B_p'K(s)\hat{x}_p(s)\,ds + \int_0^t e^{A(t-s)}B_eR_e^{-1}B_e'K(s)\hat{x}_e(s)\,ds + \int_0^t e^{A(t-s)}C\,dw(s). \tag{16}$$
Given the perturbations $\epsilon h_p$ and $\epsilon h_e$ on $\hat{x}_p$ and $\hat{x}_e$, the solution of eq. (1) becomes
$$\tilde{x}(t) = x(t) - \epsilon H_p[h_p](t) + \epsilon H_e[h_e](t), \tag{17}$$
where $H_p$ and $H_e$ are linear operators defined as
$$H_p[h_p](t) \coloneqq \int_0^t e^{A(t-s)}B_pR_p^{-1}B_p'K(s)h_p(s)\,ds, \qquad H_e[h_e](t) \coloneqq \int_0^t e^{A(t-s)}B_eR_e^{-1}B_e'K(s)h_e(s)\,ds.$$
Therefore, we have
$$\tilde{J}_c(\hat{x}_p + \epsilon h_p, \hat{x}_e + \epsilon h_e) = \int_0^T \mathbb{E}\left[\|\tilde{x} - \hat{x}_p - \epsilon h_p\|^2_{KB_pR_p^{-1}B_p'K} - \|\tilde{x} - \hat{x}_e - \epsilon h_e\|^2_{KB_eR_e^{-1}B_e'K} \,\Big|\, \mathcal{I}(t)\right] dt. \tag{18}$$
Using eqs. (16) to (18) in eq. (15), we have
$$\begin{aligned} d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) = \int_0^T \mathbb{E}\Big[ &-2\big[h_p(t) + H_p[h_p](t) - H_e[h_e](t)\big]'KB_pR_p^{-1}B_p'K(x - \hat{x}_p) \\ &+ 2\big[h_e(t) + H_p[h_p](t) - H_e[h_e](t)\big]'KB_eR_e^{-1}B_e'K(x - \hat{x}_e) \,\Big|\, \mathcal{I}(t)\Big]\,dt. \end{aligned} \tag{19}$$
The necessary condition for $(\hat{x}_p, \hat{x}_e)$ to be a Nash strategy is $d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) = 0$ for all possible directions $(h_p, h_e)$. Under this condition, neither player has an incentive to move away from $(\hat{x}_p, \hat{x}_e)$. Here, we consider
$$h_e(t) = -\int_0^t e^{(A + B_eR_e^{-1}B_e'K)(t-s)}B_pR_p^{-1}B_p'K(s)h_p(s)\,ds. \tag{20}$$
Hence, we have
$$\begin{aligned} H_e[h_e](t) &= -\int_0^t e^{A(t-s)}B_eR_e^{-1}B_e'K(s)\int_0^s e^{(A+B_eR_e^{-1}B_e'K)(s-\tau)}B_pR_p^{-1}B_p'K(\tau)h_p(\tau)\,d\tau\,ds \\ &= -\int_0^t \left[\int_\tau^t e^{A(t-s)}B_eR_e^{-1}B_e'K(s)\,e^{(A+B_eR_e^{-1}B_e'K)(s-\tau)}\,ds\right]B_pR_p^{-1}B_p'K(\tau)h_p(\tau)\,d\tau \\ &= -\int_0^t \left[\int_\tau^t \frac{d}{ds}\Big(e^{A(t-s)}e^{(A+B_eR_e^{-1}B_e'K)(s-\tau)}\Big)\,ds\right]B_pR_p^{-1}B_p'K(\tau)h_p(\tau)\,d\tau \\ &= -\int_0^t \Big[e^{(A+B_eR_e^{-1}B_e'K)(t-\tau)} - e^{A(t-\tau)}\Big]B_pR_p^{-1}B_p'K(\tau)h_p(\tau)\,d\tau \\ &= h_e(t) + H_p[h_p](t). \end{aligned}$$
That means that for any $h_p$, we can construct $h_e$ following eq. (20) such that $h_e = H_e[h_e] - H_p[h_p]$. Hence, for every possible $h_p$, with $h_e$ defined by eq. (20), eq. (19) reduces to
$$d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) = \int_0^T \mathbb{E}\left[-2(h_p - h_e)'KB_pR_p^{-1}B_p'K(x - \hat{x}_p) \,\big|\, \mathcal{I}(t)\right] dt.$$
Hence, the necessary condition that ensures $d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) = 0$ for all possible $h_p$ is
$$\mathbb{E}\big[x(t) - \hat{x}_p(t) \,\big|\, \mathcal{I}(t)\big] = 0, \quad \text{for all } t.$$
That means $\hat{x}_p(t) = \mathbb{E}[x(t)\,|\,\mathcal{I}(t)]$.
Similarly, for any $h_e$, we construct $h_p$ as
$$h_p(t) = \int_0^t e^{(A - B_pR_p^{-1}B_p'K)(t-s)}B_eR_e^{-1}B_e'K(s)h_e(s)\,ds, \tag{21}$$
which gives $h_p = H_e[h_e] - H_p[h_p]$. For any given $h_e$ and $h_p$ constructed by eq. (21), we have
$$d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) = \int_0^T \mathbb{E}\left[2(h_e - h_p)'KB_eR_e^{-1}B_e'K(x - \hat{x}_e) \,\big|\, \mathcal{I}(t)\right] dt.$$
Therefore, the necessary condition to guarantee that $d_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) = 0$ for all possible $h_e$ is
$$\mathbb{E}\big[x - \hat{x}_e \,\big|\, \mathcal{I}(t)\big] = 0, \quad \text{for all } t.$$
This implies $\hat{x}_p = \hat{x}_e = \mathbb{E}[x(t)\,|\,\mathcal{I}(t)]$. Note that $\mathcal{I}(t) = \{x(s)\,|\,0 < s \le t,\ s \in \mathcal{T}\}$, where $\mathcal{T} = \{t_1, t_2, \cdots, t_{N_p+N_e}\}$. Using the fact that $\mathbb{E}[\int_0^t e^{A(t-s)}C\,dw(s)\,|\,\mathcal{I}(t)]$ is a martingale [36], we obtain the following differential equation for $\hat{x}(t) \coloneqq \mathbb{E}[x(t)\,|\,\mathcal{I}(t)]$:
$$d\hat{x}(t) = \Big(A - \big(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e'\big)K\Big)\hat{x}(t)\,dt, \quad \hat{x}(0) = x_0, \quad \hat{x}(\tau) = x(\tau) \ \text{for all } \tau \in \mathcal{T}. \tag{22}$$
To show the sufficiency of $(\hat{x}_p, \hat{x}_e) = (\hat{x}, \hat{x})$ being a Nash equilibrium, we resort to the second-order Gâteaux differential defined in eq. (15). Following that definition, we calculate
$$d^2_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) = \int_0^T \mathbb{E}\left[\big\|h_p(t) + H_p[h_p](t) - H_e[h_e](t)\big\|^2_{KB_pR_p^{-1}B_p'K} - \big\|h_e(t) + H_p[h_p](t) - H_e[h_e](t)\big\|^2_{KB_eR_e^{-1}B_e'K} \,\Big|\, \mathcal{I}(t)\right] dt.$$
We need to show that at the point $(\hat{x}_p, \hat{x}_e) = (\hat{x}, \hat{x})$, there exist some directions $(h_p, h_e)$ such that $d^2_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) < 0$ and some other directions such that $d^2_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) > 0$. To show this, consider any $h_p \ne 0$ that is constant over time and $h_e$ constructed according to eq. (20). We then have $d^2_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) > 0$. Similarly, we can show that there exist some $(h_p, h_e)$ such that $d^2_{(h_p,h_e)}\tilde{J}_c(\hat{x}_p,\hat{x}_e) < 0$. This proves that $(\hat{x}_p, \hat{x}_e) = (\hat{x}, \hat{x})$, where $\hat{x}$ has the dynamics in eq. (22), constitutes a Nash control strategy of the PEEC game. $\square$

C. Proof of Corollary 1

Proof.
Using eq. (4) in Lemma 1 and the results in Theorem 1, we know that
$$\tilde{J}(\Omega_p, \Omega_e) \coloneqq J\big(\Omega_p, u_p^*(\Omega_p,\Omega_e), \Omega_e, u_e^*(\Omega_p,\Omega_e)\big) = \mathbb{E}\left[\int_0^T \|x - \hat{x}\|^2_{K(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e')K}\,dt\right] + O_pN_p - O_eN_e + \|x_0\|^2_{K(0)} + \int_0^T \operatorname{Tr}(K(t)CC')\,dt.$$
Using eq. (7) and eq. (1), we know that
$$\begin{aligned} dx - d\hat{x} &= \Big(Ax - B_pR_p^{-1}B_p'K\hat{x} + B_eR_e^{-1}B_e'K\hat{x}\Big)dt + C\,dw(t) - \Big(A - \big(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e'\big)K\Big)\hat{x}\,dt \\ &= A(x - \hat{x})\,dt + C\,dw(t), \end{aligned}$$
with refreshing points $\hat{x}(0) = x(0) = x_0$ and $\hat{x}(\tau) = x(\tau)$ for all $\tau \in \mathcal{T}$. Thus, for any $t \in (0,T]$, let $\tau = \max\{s \,|\, s \in \mathcal{T} \cup \{0\},\ s < t\}$. We have
$$p(t) \coloneqq x(t) - \hat{x}(t) = e^{A(t-\tau)}p(\tau) + \int_\tau^t e^{A(t-s)}C\,dw(s) = \int_\tau^t e^{A(t-s)}C\,dw(s).$$
Hence $\mathbb{E}[p(t)] = 0$. Let $P(t) \coloneqq \mathbb{E}[p(t)p(t)']$ be the variance of the estimation error. We have
$$P(t) = \int_\tau^t e^{A(t-s)}CC'e^{A(t-s)'}\,ds. \tag{23}$$
Hence, we have
$$\begin{aligned} \mathbb{E}\left[\int_0^T \|x - \hat{x}\|^2_{K(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e')K}\,dt\right] &= \int_0^T \mathbb{E}\Big[p(t)'\big[K\big(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e'\big)K\big]p(t)\Big]\,dt \\ &= \int_0^T \operatorname{Tr}\Big(P(t)K\big(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e'\big)K\Big)\,dt \\ &= \sum_{i=0}^{N_p+N_e} \int_{t_i}^{t_{i+1}} \operatorname{Tr}\left[\left(\int_{t_i}^t e^{A(t-s)}CC'e^{A(t-s)'}\,ds\right)K\big(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e'\big)K\right]dt \\ &= \sum_{i=0}^{N_p+N_e} \int_{t_i}^{t_{i+1}} \operatorname{Tr}\big[\Sigma(t - t_i)\varphi(t)\big]\,dt, \end{aligned} \tag{24}$$
where $\Sigma(t) = \int_0^t e^{A(t-s)}CC'e^{A(t-s)'}\,ds$, $\varphi(t) = K(t)\big(B_pR_p^{-1}B_p' - B_eR_e^{-1}B_e'\big)K(t)$, and $0 = t_0 < t_1 \le t_2 \le \cdots \le t_{N_p+N_e} < t_{N_p+N_e+1} = T$. Hence, we complete the proof by showing
$$\tilde{J}(\Omega_p, \Omega_e) = \sum_{i=0}^{N_p+N_e} \int_{t_i}^{t_{i+1}} \operatorname{Tr}\big[\Sigma(t - t_i)\varphi(t)\big]\,dt + O_pN_p - O_eN_e + \|x_0\|^2_{K(0)} + \int_0^T \operatorname{Tr}(K(t)CC')\,dt. \qquad \square$$

D. Proof of Proposition 1

Proof.
First, we state two claims that are useful in the proof.

Claim 1 (Proposition 8.5.12 of [39]). Consider two symmetric matrices $\Sigma_1$ and $\Sigma_2$, and a positive semi-definite matrix $\Phi$. If $\Sigma_1 \le \Sigma_2$, then $\operatorname{Tr}(\Sigma_1\Phi) \le \operatorname{Tr}(\Sigma_2\Phi)$.

Claim 2. Let $(P_1(t), t \in [0,T])$ be the variance of the estimation error defined in eq. (23) associated with $\mathcal{T}_1$, and let $(P_2(t), t \in [0,T])$ be the variance of the estimation error defined in eq. (23) associated with $\mathcal{T}_2$. If $\mathcal{T}_1 \subset \mathcal{T}_2$, then $P_1(t) \ge P_2(t)$ for all $t \in [0,T]$.

Here, Claim 2 is a direct result of the definition of $(P(t), t \in [0,T])$ in eq. (23). To prove (i), let $\Omega_p = (N_p, \mathcal{T}_p)$ be any observation strategy of the pursuer. Let $\Omega_e^{no} = (0, \emptyset)$ be the no-observation strategy for the evader, and let $\Omega_e$ be any other strategy such that $N_e \ne 0$ and $\mathcal{T}_e \ne \emptyset$. Let $(P_1(t), t \in [0,T])$ be the variance of the estimation error defined in eq. (23) associated with $\mathcal{T}_1 = \mathcal{T}_p \cup \emptyset$, and let $(P_2(t), t \in [0,T])$ be the one associated with $\mathcal{T}_2 = \mathcal{T}_p \cup \mathcal{T}_e$. Hence, we have $\mathcal{T}_1 \subset \mathcal{T}_2$. By Claim 2, we have $P_1(t) \ge P_2(t)$ for all $t \in [0,T]$. From eq. (24), we know
$$\tilde{J}_o(\Omega_p, \Omega_e^{no}) = \int_0^T \operatorname{Tr}\big(P_1(t)\varphi(t)\big)\,dt + N_pO_p, \qquad \tilde{J}_o(\Omega_p, \Omega_e) = \int_0^T \operatorname{Tr}\big(P_2(t)\varphi(t)\big)\,dt + N_pO_p - N_eO_e.$$
By Claim 1 and the fact that $\varphi(t)$ is positive definite for all $t$ (this is true when $B_pR_p^{-1}B_p' > B_eR_e^{-1}B_e'$), we have $\tilde{J}_o(\Omega_p, \Omega_e^{no}) > \tilde{J}_o(\Omega_p, \Omega_e)$ for any $\Omega_p$ and any $\Omega_e \ne \Omega_e^{no}$. Since the evader aims to maximize $\tilde{J}_o$, we conclude $\Omega_e^* = \Omega_e^{no}$.

Now we prove (ii). Since the optimal strategy for the evader is not to observe at all no matter what $\Omega_p$ is, the pursuer faces the following finite-dimensional optimization problem:
$$\min_{\Omega_p} \tilde{J}_o(\Omega_p, \Omega_e^{no}) = \sum_{i=0}^{N_p} \int_{t_{p,i}}^{t_{p,i+1}} \operatorname{Tr}\big[\Sigma(t - t_{p,i})\varphi(t)\big]\,dt + O_pN_p.$$
When $O_p = 0$, the best strategy is trivial, i.e., to observe all the time, and the optimal value is $0$. When $O_p \ne 0$, suppose $\Omega_p^* = (N_p^*, \mathcal{T}_p^*)$ is the optimal strategy. We have
$$\tilde{J}_o(\Omega_p^*, \Omega_e^{no}) \le \tilde{J}_o\big((0,\emptyset), \Omega_e^{no}\big) = \int_0^T \operatorname{Tr}\big[\Sigma(t)\varphi(t)\big]\,dt,$$
and
$$\tilde{J}_o(\Omega_p^*, \Omega_e^{no}) = \sum_{i=0}^{N_p^*} \int_{t_{p,i}^*}^{t_{p,i+1}^*} \operatorname{Tr}\big[\Sigma(t - t_{p,i}^*)\varphi(t)\big]\,dt + O_pN_p^* \ge O_pN_p^*.$$
Combining the two inequalities above, we have eq. (11).
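The matrix $\Sigma(t) = \int_0^t e^{A(t-s)}CC'e^{A(t-s)'}\,ds$ used above can be evaluated numerically. One standard recipe (our assumption; the paper does not prescribe a method) is Van Loan's augmented-matrix-exponential trick, checked here against the closed form for a one-axis double integrator.

```python
import numpy as np

def expm_taylor(M, terms=25):
    # plain Taylor-series matrix exponential; adequate here because
    # ||M|| is small (not a general-purpose expm)
    E = np.eye(M.shape[0])
    P = np.eye(M.shape[0])
    for k in range(1, terms):
        P = P @ M / k
        E = E + P
    return E

def sigma(A, C, t):
    # Van Loan's trick: expm of [[-A, CC'], [0, A']] * t has blocks
    # E12, E22 satisfying E22' @ E12 = int_0^t e^{A u} CC' e^{A' u} du
    n = A.shape[0]
    Q = C @ C.T
    M = np.block([[-A, Q], [np.zeros((n, n)), A.T]]) * t
    E = expm_taylor(M)
    return E[n:, n:].T @ E[:n, n:]

# closed-form check for one axis: A = [[0,1],[0,0]], C = c * I, so
# Sigma(t) = c^2 * [[t + t^3/3, t^2/2], [t^2/2, t]] (illustrative values)
c, t = np.sqrt(2.0), 0.7
A = np.array([[0.0, 1.0], [0.0, 0.0]])
Sig = sigma(A, c * np.eye(2), t)
exact = c**2 * np.array([[t + t**3 / 3, t**2 / 2], [t**2 / 2, t]])
```

The same routine supplies the $\operatorname{Tr}[\Sigma(t - t_{p,i})\varphi(t)]$ integrands needed when evaluating the objective in eq. (13) numerically.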
To prove (iii), note that for any given $N_p$, the optimal time instances $t_{p,i}^*$, $i = 1, 2, \cdots, N_p$, have to satisfy the first-order necessary condition for the optimization problem given in eq. (13). Taking the derivative of the objective function of eq. (13) with respect to $t_{p,i}$ and applying the Leibniz integral rule yield
$$\begin{aligned} \frac{d}{dt_{p,i}} &\left[\sum_{j=0}^{N_p} \int_{t_{p,j}}^{t_{p,j+1}} \operatorname{Tr}\big[\Sigma(t - t_{p,j})\varphi(t)\big]\,dt + O_pN_p\right] \\ &= \frac{d}{dt_{p,i}}\left[\int_{t_{p,i-1}}^{t_{p,i}} \operatorname{Tr}\big[\Sigma(t - t_{p,i-1})\varphi(t)\big]\,dt + \int_{t_{p,i}}^{t_{p,i+1}} \operatorname{Tr}\big[\Sigma(t - t_{p,i})\varphi(t)\big]\,dt\right] \\ &= \operatorname{Tr}\big[\Sigma(t_{p,i} - t_{p,i-1})\varphi(t_{p,i})\big] + \int_{t_{p,i}}^{t_{p,i+1}} \operatorname{Tr}\left[\frac{d}{dt_{p,i}}\Sigma(t - t_{p,i})\varphi(t)\right]dt \\ &= \operatorname{Tr}\big[\Sigma(t_{p,i} - t_{p,i-1})\varphi(t_{p,i})\big] - \int_{t_{p,i}}^{t_{p,i+1}} \operatorname{Tr}\Big[e^{A(t-t_{p,i})}CC'e^{A(t-t_{p,i})'}\varphi(t)\Big]\,dt \\ &= \int_{t_{p,i-1}}^{t_{p,i}} \operatorname{Tr}\Big[e^{A(t_{p,i}-t)}CC'e^{A(t_{p,i}-t)'}\varphi(t_{p,i})\Big]\,dt - \int_{t_{p,i}}^{t_{p,i+1}} \operatorname{Tr}\Big[e^{A(t-t_{p,i})}CC'e^{A(t-t_{p,i})'}\varphi(t)\Big]\,dt, \end{aligned}$$
where we used the fact that
$$\frac{d}{dt_{p,i}}\Sigma(t - t_{p,i}) = \frac{d}{dt_{p,i}}\int_{t_{p,i}}^{t} e^{A(t-s)}CC'e^{A(t-s)'}\,ds = -e^{A(t-t_{p,i})}CC'e^{A(t-t_{p,i})'}.$$
Since the objective function in eq. (13) is continuous in $t_{p,i}$ for every $i = 1, 2, \cdots, N_p$ and the constraint set is a closed and bounded subset of $\mathbb{R}^{N_p}$ (hence compact), by the Weierstrass extreme value theorem, there exists at least one minimizer for the optimization problem in eq. (13). Thus, we arrive at the conclusions in (iii). $\square$

REFERENCES

[1] R. Isaacs,
Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. New York: Wiley, 1965. [2] Y. Ho, A. Bryson, and S. Baron, "Differential games and optimal pursuit-evasion strategies,"
IEEE Transactions on Automatic Control ,vol. 10, no. 4, pp. 385–389, 1965.[3] M. Foley and W. Schmitendorf, “A class of differential games with two pursuers versus one evader,”
IEEE Transactions on AutomaticControl , vol. 19, no. 3, pp. 239–243, 1974.[4] A. Bagchi and G. J. Olsder, “Linear-quadratic stochastic pursuit-evasion games,”
Applied Mathematics and Optimization, vol. 7, no. 1, pp. 95–123, 1981. [5] T. Başar and G. J. Olsder,
Dynamic noncooperative game theory . SIAM, 1998.[6] D. Li and J. B. Cruz, “Defending an asset: a linear quadratic game approach,”
IEEE Transactions on Aerospace and Electronic Systems ,vol. 47, no. 2, pp. 1026–1044, 2011.[7] T. E. Duncan, “Linear-quadratic stochastic differential games with general noise processes,” in
Models and Methods in Economics andManagement Science . Springer, 2014, pp. 17–25.[8] V. Y. Glizer and V. Turetsky, “Linear-quadratic pursuit-evasion game with zero-order players’ dynamics and terminal constraint for theevader,”
IFAC-PapersOnLine , vol. 48, no. 25, pp. 22–27, 2015.[9] T. E. Duncan, “Linear exponential quadratic stochastic differential games,”
IEEE Transactions on Automatic Control, vol. 61, no. 9, pp. 2550–2552, 2015. [10] S. Y. Hayoun, M. Weiss, and T. Shima, "A mixed l2/lα differential game approach to pursuit-evasion guidance," IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 6, pp. 2775–2788, 2016. [11] D. W. Oyler, P. T. Kabamba, and A. R. Girard, "Pursuit–evasion games in the presence of obstacles,"
Automatica , vol. 65, pp. 1–11, 2016.
[12] A. Jagat and A. J. Sinclair, "Nonlinear control for spacecraft pursuit-evasion game using the state-dependent Riccati equation method,"
IEEE Transactions on Aerospace and Electronic Systems , vol. 53, no. 6, pp. 3032–3042, 2017.[13] S. Talebi, M. A. Simaan, and Z. Qu, “Cooperative, non-cooperative and greedy pursuers strategies in multi-player pursuit-evasion games,”in . IEEE, 2017, pp. 2049–2056.[14] M. Pachter, E. Garcia, and D. W. Casbeer, “Toward a solution of the active target defense differential game,”
Dynamic Games andApplications , vol. 9, no. 1, pp. 165–216, 2019.[15] V. G. Lopez, F. L. Lewis, Y. Wan, E. N. Sanchez, and L. Fan, “Solutions for multiagent pursuit-evasion games on communication graphs:Finite-time capture and asymptotic behaviors,”
IEEE Transactions on Automatic Control , vol. 65, no. 5, pp. 1911–1923, 2019.[16] I. E. Weintraub, M. Pachter, and E. Garcia, “An introduction to pursuit-evasion differential games,” in . IEEE, 2020, pp. 1049–1066.[17] E. Garcia, D. W. Casbeer, M. Pachter, J. W. Curtis, and E. Doucette, “A two-team linear quadratic differential game of defending a target,”in . IEEE, 2020, pp. 1665–1670.[18] A. Kehagias, D. Mitsche, and P. Prałat, “The role of visibility in pursuit/evasion games,”
Robotics , vol. 3, no. 4, pp. 371–399, 2014.[19] Y. Huang, J. Chen, L. Huang, and Q. Zhu, “Dynamic games for secure and resilient control system design,”
National Science Review, vol. 7, no. 7, pp. 1125–1141, 2020. [20] S. K. Singh and P. V. Reddy, "Dynamic network analysis of a target defense differential game with limited observations," arXiv preprint arXiv:2101.05592, 2020. [21] T. Basar, "On the uniqueness of the Nash solution in linear-quadratic differential games,"
International Journal of Game Theory , vol. 5,no. 2, pp. 65–90, 1976.[22] R. Behn and Y.-C. Ho, “On a class of linear stochastic differential games,”
IEEE Transactions on Automatic Control , vol. 13, no. 3, pp.227–240, 1968.[23] I. Rhodes and D. Luenberger, “Differential games with imperfect state information,”
IEEE Transactions on Automatic Control , vol. 14,no. 1, pp. 29–38, 1969.[24] P. Bernhard and A.-L. Colomb, “Saddle point conditions for a class of stochastic dynamical games with imperfect information,”
IEEE Transactions on Automatic Control, vol. 33, no. 1, pp. 98–101, 1988. [25] A. Gupta, A. Nayyar, C. Langbort, and T. Basar, "Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information,"
SIAM Journal on Control and Optimization, vol. 52, no. 5, pp. 3228–3260, 2014. [26] J. W. Clemens and J. L. Speyer, "On the LQG game with nonclassical information pattern using a direct solution method,"
IEEE Transactionson Automatic Control , vol. 65, no. 5, pp. 2078–2093, 2019.[27] T. E. Duncan and H. Tembine, “Linear–quadratic mean-field-type games: A direct method,”
Games , vol. 9, no. 1, p. 7, 2018.[28] D. Maity, A. Raghavan, and J. S. Baras, “Stochastic differential linear-quadratic games with intermittent asymmetric observations,” in , 2017, pp. 3670–3675.[29] D. Maity and J. S. Baras, “Linear quadratic stochastic differential games under asymmetric value of information,”
IFAC-PapersOnLine ,vol. 50, no. 1, pp. 8957–8962, 2017.[30] Y. Huang and Q. Zhu, “Infinite-horizon linear-quadratic-gaussian control with costly measurements,” arXiv preprint arXiv:2012.14925 ,2020.[31] C. Cooper and N. Hahi, “An optimal stochastic control problem with observation cost,”
IEEE Transactions on Automatic Control, vol. 16, no. 2, pp. 185–189, 1971. [32] Y. Huang, V. Kavitha, and Q. Zhu, "Continuous-time Markov decision processes with controlled observations," in . IEEE, 2019, pp. 32–39.
Differential Games and Applications .Springer, 1977, pp. 172–185.[34] Y. Huang, Z. Xiong, and Q. Zhu, “Cross-layer coordinated attacks on cyber-physical systems: A lqg game framework with controlledobservations,” arXiv preprint arXiv:2012.02384 , 2020.[35] J. Engwerda,
LQ dynamic optimization and differential games . John Wiley & Sons, 2005.[36] R. Durrett,
Probability: Theory and Examples. Cambridge University Press, 2019, vol. 49. [37] T. Başar and P. Bernhard,
H-infinity optimal control and related minimax design problems: a dynamic game approach . Springer Science& Business Media, 2008.[38] W. Cheney,
Analysis for applied mathematics . Springer Science & Business Media, 2001, vol. 208.
[39] D. S. Bernstein,