Mean Field Game Theory for Agents with Individual-State Partial Observations
NEVROZ ŞEN† AND PETER E. CAINES‡

Abstract.
Subject to reasonable conditions, in large population stochastic dynamic games where the agents are coupled by the system's mean field (i.e. the state distribution of the generic agent) through their nonlinear dynamics and their nonlinear cost functions, it can be shown that a best response control action exists for each agent which (i) depends only upon the individual agent's state observations and the mean field, and (ii) achieves an $\epsilon$-Nash equilibrium for the system. In this work we formulate a class of problems where each agent has only partial observations of its individual state. We employ nonlinear filtering theory and the Separation Principle in order to analyze the game in the asymptotically infinite population limit. The main result is that the $\epsilon$-Nash equilibrium property holds, where the best response control action of each agent depends upon the conditional density of its own state generated by a nonlinear filter, together with the system's mean field. Finally, comparing this MFG problem with state estimation to that found in the literature, where a major agent's partially observed state process is independent of the control action of any individual agent, it is seen that, in contrast, the partially observed state process of any agent in this work depends upon that agent's control action.

Key words. mean field games, partially observed stochastic control, nonlinear filtering, stochastic games.
AMS subject classifications.
1. Introduction.
For dynamical games of mean field type it has been demonstrated that when the agents are coupled through their dynamics and their cost functions, the best response control policies in the asymptotically infinite population limit depend only upon each agent's individual state and the system mean field. Furthermore, such policies generate approximate Nash equilibria when they are applied to a large finite population game; see [14], [17], [18] and [16], among others, by Huang, Malhamé and Caines, and [21], [22] and [23] by Lasry and Lions. A distinct consequence of such a result is that in the mean field games (MFG) set-up an individual agent does not derive a significant benefit from learning the state of any other agent, and therefore the estimation of any other agent's state process has negligible value. Nonetheless, in practical situations an agent does not have access to complete observations of its own state, and therefore in models of such partially observed (PO) MFG systems, where the agents' controls depend upon the agents' observation processes, the controls can only be represented as functions of the agents' states via estimates of those states. Such a model for linear quadratic Gaussian (LQG) MFG problems has been considered in [15], where an approximate Nash equilibrium is obtained on an extended state space. In this work, we consider the nonlinear MFG where an individual agent has noisy observations of its own state.

Recent works ([13] and [24]) consider MFGs involving a major agent and many minor agents (MM-MFG) where, by definition, a minor agent is an agent which,

∗ Some of the work in this paper was presented at the 55th IEEE Conference on Decision and Control, Las Vegas, NV, USA, December, 2016.
† ABB Inc., San Jose, CA. This author's work was performed at the Center for Intelligent Machines (CIM) and the Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada (email: [email protected]).
‡ Department of Electrical and Computer Engineering and CIM, McGill University, Montreal, QC, Canada (email: [email protected]). Work supported by research grants to this author from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Air Force Office of Scientific Research (AFOSR).
asymptotically as the population size goes to infinity, has a negligible influence on the overall system, while the overall population's effect on it is significant, and where a major agent is an agent which has an asymptotically non-vanishing influence on each minor agent as the population size goes to infinity. A fundamental feature of this setup is that, in contrast to the situation without a major agent, the mean field is stochastic due to the stochastic evolution of the state of the major agent, and the best response process of each minor agent depends on the state of the major agent. Motivated by this observation, state estimation problems in the nonlinear MFG with a major agent are considered in [10] (see [6] and [11] for the LQG case), where the major agent's state process is partially observed but the agents have complete observations of their own states. Adopting the approach of constructing an equivalent completely observed model via the application of nonlinear filtering, the MFG problem is analyzed in the space of conditional densities, and the existence of Nash equilibria in the infinite population limit and of $\epsilon$-Nash equilibria for the finite population game is obtained. We finally remark that, in addition to [13] and [24], the MFG setup with major and minor agents has also been considered in [5] and [8], where in the former the authors generalize the MM-MFG setup to the case where the mean field is determined by the control policy of the major agent, and in the latter a probabilistic approach is taken for MFG problems in which the major agent's state appears in both the state dynamics and the cost functions. We also refer to [7] for the analysis of MFGs with common noise.

The individual dynamics in the infinite population limit in an MFG setup are characterized by McKean-Vlasov (MV) type stochastic differential equations (SDEs). These SDEs have the property that the dynamics depend on the distribution of the state process.
Hence, a PO stochastic optimal control problem (SOCP) is formulated for MV type SDEs and, as a consequence, the filtering equations must first be developed for such SDEs; a theory for joint state and distribution estimation in the case where the measure is stochastic is developed in [26]. Following the standard approach in the literature, once the filtering equations in the form of normalized or unnormalized densities are obtained, it is possible to obtain a form of the Hamilton-Jacobi-Bellman (HJB) equation on a function space. This is the path that we take in this paper, using the unnormalized conditional densities.

It is also worthwhile to summarize the technical steps that one must carry out in a nonlinear PO MFG setup. We first remark that one can follow different approaches in order to prove the convergence properties of MFGs in the infinite population limit. Among these, the convergence of the dynamics of the controlled state process to MV type dynamics when feedback controls are applied (see [17]) greatly simplifies the analysis of the associated optimal control problem. In the partially observed setup we follow this approach; consequently, as the first step, we prove such a convergence result for the case where the control policies are in feedback form with respect to conditional densities. We next analyze the fixed point property on the Wasserstein space of probability measures. Recall, however, that the solution to the completely observed MFG problem is given by a coupled HJB equation and a Fokker-Planck-Kolmogorov (FPK) equation, which essentially requires analyzing the sensitivity of the solutions of the HJB equation with respect to the probability measure representing the mean field. In the PO MFG one needs to generalize such sensitivity results with respect to the conditional density component representing the information state. This is achieved by using the robustness property of the nonlinear filter.
In the final stage, we prove the approximate Nash equilibrium property of the best response control policies obtained as the solution to the HJB equation of the infinite population game.

Notation: For a matrix $A$, $A^T$, $\mathrm{tr}(A)$ and $A_{ij}$ denote the transpose, the trace and the corresponding entry, respectively. $\nabla_x$ and $\nabla_{xx}$ denote the gradient and Hessian operators with respect to the variable $x$; in a one-dimensional domain, $\partial_x$ and $\partial_{xx}$ will be used instead. Let $S$ be a metric space. Then $\mathcal{B}(S)$ denotes the Borel $\sigma$-algebra and $\mathcal{P}(S)$ denotes the space of probability measures, respectively, on $S$. Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ be a filtered probability space satisfying the usual conditions. Conditional expectation with respect to a $\sigma$-algebra $\mathcal{F}$ is denoted by $E(\cdot \mid \mathcal{F})$. For a Euclidean space $H$, we denote by $L^2_{\mathcal{G}}([0,T]; H)$ the set of all $\{\mathcal{G}_t\}$-adapted $H$-valued processes such that $E \int_0^T |f(t,\omega)|^2\, dt < \infty$.
2. Mean Field Games with Uniform Agents.
We consider a stochastic dynamic game with $N$ agents, $\{\mathcal{A}_i,\ 1 \le i \le N\}$, where the dynamics of the agents are given by the following controlled SDEs on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$:

$$dz_i^N(t) = \frac{1}{N} \sum_{j=1}^N f\big(t, z_i^N(t), u_i^N(t), z_j^N(t)\big)\, dt + \sigma\, dw_i(t), \qquad (1)$$

with terminal time $T \in (0,\infty)$ and initial conditions $z_i^N(0) = z_i(0)$, $1 \le i \le N$, where (i) $z_i^N(t) \in \mathbb{R}$, $u_i^N(t) \in U$, $0 \le t \le T$, are the state and control input of agent $\mathcal{A}_i$; (ii) $f: [0,T] \times \mathbb{R} \times U \times \mathbb{R} \to \mathbb{R}$ is a measurable function; (iii) $(w_i(t))_{t \ge 0}$, $1 \le i \le N$, are independent standard Brownian motions in $\mathbb{R}$; and (iv) $\sigma > 0$. For $1 \le j \le N$ we denote $u_{-j}^N := \{u_1^N, \ldots, u_{j-1}^N, u_{j+1}^N, \ldots, u_N^N\}$, where the agents' states and controls are taken to be scalar valued for simplicity of notation throughout the paper. The objective of each agent is to minimize its cost-coupling function given by

$$J_i^N(u_i^N, u_{-i}^N) := E \int_0^T \frac{1}{N} \sum_{j=1}^N L\big(z_i^N(t), u_i^N(t), z_j^N(t)\big)\, dt, \qquad (2)$$

where $L: \mathbb{R} \times U \times \mathbb{R} \to \mathbb{R}_+$. We remark that the above model can be generalized to the case where the diffusion coefficient depends on the mean field coupling, where the state processes take values in, say, $\mathbb{R}^m$, and where the cost functions are time varying. We assume the following:

(A0) The initial states $\{z_j(0),\ 1 \le j \le N\}$ are mutually independent, independent of all Brownian motions, and satisfy $\sup_{j \in \{1,\ldots,N\}} E|z_j(0)|^2 \le k < \infty$, where $k$ is independent of $N$. Furthermore, let $F_N(x) := (1/N) \sum_{i=1}^N \mathbf{1}\{E z_i(0) \le x\}$ denote the empirical distribution function of the initial mean values $\{E z_i(0)\}$.

3. Partially Observed Mean Field Games and Nonlinear Filtering for MV Systems. In this section we formulate the estimation problem associated with the MFG set-up described above.
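To make the finite-population model (1)-(2) of Section 2 concrete, the following sketch simulates the coupled system by Euler-Maruyama. The specific choices $f(t,z,u,y) = -(z-y) + u$ (control-affine, in the spirit of (A6)) and $L(z,u,y) = (z-y)^2 + u^2$ with the control held at $u = 0$ are illustrative assumptions for this example, not the paper's model:

```python
import numpy as np

def simulate_game(N=50, T=1.0, dt=0.01, sigma=0.3, seed=0):
    """Euler-Maruyama simulation of the N-agent system (1) with the
    illustrative (assumed) choices f(t, z, u, y) = -(z - y) + u and
    running cost L(z, u, y) = (z - y)^2 + u^2.  Controls are held at
    u = 0, so the averaged drift (1/N) sum_j f(t, z_i, 0, z_j)
    reduces to mean reversion toward the empirical population mean."""
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    z = rng.normal(0.0, 1.0, size=N)   # initial states z_i(0)
    cost = np.zeros(N)                 # running estimate of the cost (2)
    for _ in range(steps):
        zbar = z.mean()
        drift = -(z - zbar)            # (1/N) sum_j -(z_i - z_j)
        # accumulate (1/N) sum_j L(z_i, 0, z_j) dt for each agent i
        cost += dt * ((z[:, None] - z[None, :]) ** 2).mean(axis=1)
        z = z + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=N)
    return z, cost

z, cost = simulate_game()
print(z.mean(), cost.mean())
```

For this drift the agents contract toward their empirical mean, so the per-agent cost stays small; larger `sigma` spreads the population and raises the cost.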
Let agent $\mathcal{A}_i$ have access to a noisy observation of its own state via

$$dy_i(t) = h(t, z_i^o(t))\, dt + d\nu_i(t), \qquad (5)$$

where $(\nu_i(t))_{0 \le t \le T}$ is a Brownian motion independent of $\{z_i(0), (w_i(t))_{0 \le t \le T},\ 1 \le i \le N\}$ and of the other observation noise processes $\{(\nu_{-i}(t))_{0 \le t \le T}\}$. We assume the following.

(A5) The function $h: [0,T] \times \mathbb{R} \to \mathbb{R}$ belongs to $C^{1,2}_{t,x}([0,T] \times \mathbb{R})$, the space of functions which are differentiable in $t$ and twice differentiable in $x$, with $|\partial_x h(t,x)| + |\partial_{xx} h(t,x)| \le K$ and $|\partial_t h(t,x)| \le K(1 + |x|)$ for all $(t,x) \in [0,T] \times \mathbb{R}$.

Following the standard approach to the PO SOCP, we shall construct the associated completely observed system via the application of nonlinear filtering for the dynamics described in (4). Prior to that, however, we obtain an MV type approximation result for the state process controlled by filter-dependent policies, since under suitable assumptions the optimal control takes a feedback form given by the solution of an HJB equation on an infinite dimensional domain.

The nonlinear filtering equations that each agent needs to generate are defined as follows: given the history of observations $\mathcal{F}^{y_i}_t := \sigma\{y_i(s): s \le t\}$, determine a recursive expression for $E[\ell(z_i^o(t)) \mid \mathcal{F}^{y_i}_t]$ for $\ell \in C^2_b(\mathbb{R})$, the space of all bounded twice differentiable functions with bounded derivatives up to order 2. Note that the agent's state $z_i^o(t)$ has MV type dynamics and so, for a fixed measure flow, we have

$$f[t, z_i^o, u_i, \mu_t] = \int_{\mathbb{R}} f[t, z_i^o, u_i, x]\, \mu_t(dx) := f^*(t, z_i^o, u_i), \qquad (6)$$

where $f^*: [0,T] \times \mathbb{R} \times U \to \mathbb{R}$. Hence, consider the SDEs

$$dz_i^o(t) = f^*(t, z_i^o(t), u_i(t))\, dt + \sigma\, dw_i(t), \qquad (7)$$
$$dy_i(t) = h(t, z_i^o(t))\, dt + d\nu_i(t). \qquad (8)$$
The filtering problem for the MV system described by (7)-(8) has been analyzed in [26], where filtering equations generating conditional distributions are obtained. We can similarly obtain filtering equations in the form of conditional densities as follows. Define the innovation process $I_i(t) = y_i(t) - \int_0^t E[h(s, z_i^o(s)) \mid \mathcal{F}^{y_i}_s]\, ds$, which can be shown to be an $\mathcal{F}^{y_i}_t$-Brownian motion under the measure $P$. Let $\pi_i := P(z_i^o(t) \in \cdot \mid \mathcal{F}^{y_i}_t)$ and define

$$\mathcal{L}\ell := \frac{1}{2}\sigma^2 \partial_{xx}\ell + f^* \partial_x \ell. \qquad (9)$$

Define the adjoint operator on $C^2(\mathbb{R})$ as

$$\mathcal{L}^* \theta(x) = \frac{1}{2}\partial_{xx}\big(\sigma^2 \theta(x)\big) - \partial_x\big(f^* \theta(x)\big). \qquad (10)$$

Let $\varphi_i$ denote the probability density of $\pi_i$, i.e., for $A \in \mathcal{B}(\mathbb{R})$, $\pi_i(t, A) = \int_A \varphi_i(t,x)\, dx$, where $\varphi_i(\cdot)$ is $(t,x)$-measurable and $\mathcal{F}^{y_i}_t$-adapted for each $t \in [0,T]$. Then $\varphi_i(t,x)$ satisfies the following: for every $t$,

$$\varphi_i(t,x) = \varphi_i(0,x) + \int_0^t \mathcal{L}^* \varphi_i(s,x)\, ds + \int_0^t \varphi_i(s,x) \Big\{ h(s,x) - \int_{\mathbb{R}} h(s,x')\, \varphi_i(s,x')\, dx' \Big\}\, dI_i(s), \qquad (11)$$

for a.e. $x$ with probability 1, where $\varphi_i(0,x)$ is the initial conditional density and $I_i(t)$ is the innovation process, a Brownian motion, defined above. This can be shown, for instance, by following [19, Theorem 11.2.1]. Based on the consistency based approach to MFGs [17], we now provide a decoupling result which demonstrates that the closed loop dynamics of each agent in the infinite population limit are approximated by MV SDEs in the partially observed setup. Let $E$ be a vector space with norm $\|\cdot\|_E$ in which the process $\varphi_i(t)$, $0 \le t \le T$, satisfying (11) takes values. Recall also that the process $\varphi_i(t)$ is $\mathcal{F}^{y_i}_t$-adapted. Let $\alpha(t,p): [0,T] \times E \to U$ be an arbitrary measurable process and assume that

(M1) $\alpha(t,p) \in C_{\mathrm{Lip}(p)}([0,T] \times E;\ U)$ and $\alpha(t, 0) \in L^2_{\mathcal{F}^{y_i}_t}([0,T]; U)$.

Assume that the process $\alpha(t, \cdot)$ is used by agent $i$ as its control law in (1), so that $u_i = \alpha$ for $1 \le i \le N$.
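A minimal numerical counterpart of the conditional-density recursion (11) is a bootstrap particle filter, which approximates $\pi_i(t)$ by weighted samples. The drift $f^*(t,z,u) = -z$ (with $u = 0$) and sensor function $h(t,z) = z$ below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def particle_filter(y_inc, dt, sigma=0.3, M=2000, seed=1):
    """Bootstrap particle filter approximating the conditional
    distribution pi_i(t) of the state in (7) given discrete increments
    of the observation path (5).  Assumed illustrative model:
    f*(t, z, u) = -z with u = 0 and h(t, z) = z.  y_inc holds the
    observation increments dy over steps of length dt."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.0, size=M)        # particles ~ pi(0)
    for dy in y_inc:
        # propagate through the state SDE (7) by Euler-Maruyama
        z = z - z * dt + sigma * np.sqrt(dt) * rng.normal(size=M)
        # reweight by the Gaussian likelihood of the increment dy
        logw = -0.5 * (dy - z * dt) ** 2 / dt
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # multinomial resampling equalizes the weights
        z = z[rng.choice(M, size=M, p=w)]
    return z  # samples approximating pi(T)

# synthetic ground truth and observation increments
rng = np.random.default_rng(0)
dt, steps, sigma = 0.01, 200, 0.3
z_true, y_inc = 1.0, []
for _ in range(steps):
    z_true += -z_true * dt + sigma * np.sqrt(dt) * rng.normal()
    y_inc.append(z_true * dt + np.sqrt(dt) * rng.normal())
zf = particle_filter(y_inc, dt, sigma=sigma)
print(zf.mean(), z_true)
```

The particle cloud `zf` plays the role of $\varphi_i(t)$ in the feedback law $\alpha(t, \varphi_i(t))$: any control that is a functional of the conditional density can be evaluated on the empirical measure of the particles.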
We then obtain the following closed-loop dynamics:

$$dz_i^N(t) = \frac{1}{N}\sum_{j=1}^N f\big(t, z_i^N(t), \alpha(t, \varphi_i(t)), z_j^N(t)\big)\, dt + \sigma\, dw_i(t), \qquad z_i^N(0) = z_i(0). \qquad (12)$$

One can show that under assumptions (A1)-(A4) the system of equations given in (12) has a unique solution $(z_1^N, \ldots, z_N^N)$, by following steps similar to those in the proof of Theorem 6.16 of [28, p. 49] and by using the robustness (i.e., continuity with respect to the observation path) of the nonlinear filter; see Theorem 6. We now introduce the MV system for the generic agent, where the agent's MV system contains the estimate of its own state via the nonlinear filtering equations:

$$d\hat{z}(t) = f[t, \hat{z}(t), \alpha(t, \varphi(t)), \mu_t]\, dt + \sigma\, dw(t), \qquad (13)$$
$$dy(t) = h(t, \hat{z}(t))\, dt + d\nu(t), \qquad 0 \le t \le T, \qquad (14)$$

with the initial condition $\hat{z}(0) = z(0)$, where $(w(t), \nu(t))_{0 \le t \le T}$ are standard Brownian motions in $\mathbb{R}$ which are independent of each other and independent of the initial condition $z(0)$. Furthermore, we characterize $\mu_t$ by $P(\hat{z}(t) \le \alpha) = \int_{-\infty}^{\alpha} \mu_t(dx)$, $0 < t \le T$. Finally, $\varphi(t)$ is the $\mathcal{F}^y_t$-adapted solution to the filtering equation for the conditional density. We remark that under (A0)-(A4), (A5) and (M1) it can be shown that a unique consistent solution to the above MV system exists; see Theorem 6. Let us also introduce

$$d\hat{z}_i(t) = f[t, \hat{z}_i(t), \alpha(t, \varphi_i(t)), \mu_t]\, dt + \sigma\, dw_i(t), \qquad (15)$$
$$dy_i(t) = h(t, \hat{z}_i(t))\, dt + d\nu_i(t), \qquad 0 \le t \le T, \qquad (16)$$

where $(w_i(t), \nu_i(t))_{0 \le t \le T}$, $1 \le i \le N$, are Brownian motions in $\mathbb{R}$ which are independent of each other and independent of $(z_i(0),\ 1 \le i \le N)$, and $\mu_t$ is the law of $\hat{z}_i(t)$. These equations can be considered as $N$ independent copies of (13)-(14). We can now state the MV approximation result.

Theorem. Assume (A0)-(A4), (A5) and (M1) hold.
Then

$$\sup_{1 \le j \le N}\ \sup_{0 \le t \le T} E\,\big|z_j^N(t) - \hat{z}_j(t)\big| = O\Big(\frac{1}{\sqrt{N}}\Big), \qquad (17)$$

where $z_j^N(t)$ and $\hat{z}_j(t)$, $1 \le j \le N$, are given in (12) and (15), respectively, and the $O(1/\sqrt{N})$ term depends on $T$.

Proof. The proof is an extension of [17, Theorem 12] to the case where the control laws depend on the filtering processes. Consider first the $i$th agent and notice that

$$z_i^N(t) - \hat{z}_i(t) = \int_0^t \frac{1}{N}\sum_{j=1}^N f\big(s, z_i^N(s), \alpha(s, \varphi_i(s)), z_j^N(s)\big)\, ds - \int_0^t f[s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), \mu_s]\, ds. \qquad (18)$$

Let

$$D_i(s) := \frac{1}{N}\sum_{j=1}^N f\big(s, z_i^N(s), \alpha(s, \varphi_i(s)), z_j^N(s)\big) - \int_{\mathbb{R}} f\big(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), y\big)\, \mu_s(dy), \qquad (19)$$

and observe that $D_i(s) = D_i^1(s) + D_i^2(s) + D_i^3(s)$, where

$$D_i^1(s) := \frac{1}{N}\sum_{j=1}^N f\big(s, z_i^N(s), \alpha(s, \varphi_i(s)), z_j^N(s)\big) - \frac{1}{N}\sum_{j=1}^N f\big(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), z_j^N(s)\big),$$

$$D_i^2(s) := \frac{1}{N}\sum_{j=1}^N f\big(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), z_j^N(s)\big) - \frac{1}{N}\sum_{j=1}^N f\big(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), \hat{z}_j(s)\big),$$

$$D_i^3(s) := \frac{1}{N}\sum_{j=1}^N f\big(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), \hat{z}_j(s)\big) - \int_{\mathbb{R}} f\big(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), y\big)\, \mu_s(dy).$$

By the Lipschitz continuity of $f$ and $\alpha$, there exists a constant $C > 0$, independent of $N$, such that

$$\big|D_i^1 + D_i^2\big| \le C \sum_{j=1}^N (1/N)\big(|z_i^N - \hat{z}_i| + |z_j^N - \hat{z}_j|\big). \qquad (20)$$
From (18)-(20), it follows that

$$\sup_{0 \le s \le t} \big|z_i^N(s) - \hat{z}_i(s)\big| \le C \int_0^t \big|z_i^N(s) - \hat{z}_i(s)\big|\, ds + C \int_0^t (1/N)\sum_{j=1}^N \big|z_j^N(s) - \hat{z}_j(s)\big|\, ds + \int_0^t \big|D_i^3(s)\big|\, ds, \qquad (21)$$

which gives

$$\sum_{i=1}^N \sup_{0 \le s \le t} \big|z_i^N(s) - \hat{z}_i(s)\big| \le C \sum_{i=1}^N \int_0^t \big|z_i^N(s) - \hat{z}_i(s)\big|\, ds + \int_0^t \sum_{i=1}^N \big|D_i^3(s)\big|\, ds \le C \sum_{i=1}^N \int_0^t \sup_{0 \le \tau \le s} \big|z_i^N(\tau) - \hat{z}_i(\tau)\big|\, ds + \int_0^t \sum_{i=1}^N \big|D_i^3(s)\big|\, ds. \qquad (22)$$

We consider the last term in (22). We have that

$$E\,\Big|\int_0^t D_i^3(s)\, ds\Big|^2 \le t \int_0^t E\,\bigg|\frac{1}{N}\sum_{j=1}^N f\big(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), \hat{z}_j(s)\big) - \int_{\mathbb{R}} f[s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), y]\, \mu_s(dy)\bigg|^2 ds. \qquad (23)$$

Define now $g(s, \hat{z}_i, x) := f(s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), x) - f[s, \hat{z}_i(s), \alpha(s, \varphi_i(s)), \mu_s]$ and recall that $\varphi_i(t)$ depends on $\hat{z}_i(t)$ through (16). Therefore, for $j \ne k$, we have

$$E\big[g(s, \hat{z}_i, \hat{z}_j)\, g(s, \hat{z}_i, \hat{z}_k)\big] = 0, \qquad (24)$$

which implies that there are no cross terms in (23). Consequently, by the boundedness of $f$ and the inequality $\big(\sum_{i=1}^N x_i\big)^2 \le N \sum_{i=1}^N x_i^2$, we obtain

$$E\,\Big|\int_0^t D_i^3(s)\, ds\Big|^2 \le k(t)/N = O(1/N), \qquad (25)$$

where $k$ is an increasing function of $t$ but independent of $N$. Now by (22), (25) and Gronwall's lemma,

$$\sum_{i=1}^N E \sup_{0 \le t \le T} \big|z_i^N(t) - \hat{z}_i(t)\big| = O\big(\sqrt{N}\big), \qquad (26)$$

which, by the exchangeability of the agents, yields $E \sup_{0 \le t \le T} \big|z_i^N(t) - \hat{z}_i(t)\big| = O\big(1/\sqrt{N}\big)$.
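The coupling argument in the proof can be illustrated numerically: drive the finite system (12) and its MV copies (15) with the same Brownian motions and initial states, and the pathwise gap shrinks at roughly the $1/\sqrt{N}$ rate of (17). The linear drift $f(t,z,u,y) = -(z-y)$ with $u = 0$ is an assumption made for this sketch; for it the MV limit mean $m(t) = E\hat{z}(t)$ stays constant:

```python
import numpy as np

def coupling_error(N, T=1.0, dt=0.01, sigma=0.3, seed=0):
    """Simulate the coupled system (12) and its N McKean-Vlasov
    copies (15) driven by the SAME Brownian increments and initial
    states, returning the average over agents of sup_t |z^N - zhat|.
    Assumed drift: f(t, z, u, y) = -(z - y), u = 0, for which the MV
    drift is -(z - m(t)) with m(t) = m(0) constant in time."""
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    z0 = rng.normal(0.0, 1.0, size=N)
    m0 = 0.0                       # mean of the initial law (known)
    z, zh = z0.copy(), z0.copy()
    err = np.zeros(N)
    for _ in range(steps):
        dw = np.sqrt(dt) * rng.normal(size=N)
        z = z - (z - z.mean()) * dt + sigma * dw   # coupled system (12)
        zh = zh - (zh - m0) * dt + sigma * dw      # MV copies (15)
        err = np.maximum(err, np.abs(z - zh))
    return err.mean()

# average over seeds; the gap should drop by roughly sqrt(100) = 10
e50 = np.mean([coupling_error(50, seed=s) for s in range(20)])
e5000 = np.mean([coupling_error(5000, seed=s) for s in range(20)])
print(e50, e5000)
```

The dominant error source is the fluctuation of the empirical mean `z.mean()` around $m(t)$, which is $O(1/\sqrt{N})$, exactly the quantity bounded by $D_i^3$ in the proof.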
The widely adopted procedure in the literature for the construction of a completely observed stochastic optimal control problem from the partially observed one is to use the unnormalized conditional density in the separation principle, since it is known that the cost function under an equivalent measure is linear in the initial unnormalized conditional density. Furthermore, the dynamics of the unnormalized conditional density are also a linear functional of the initial density, and hence one can benefit significantly from the unnormalized construction, including the closed form computation of the first and second order functional (Fréchet) derivatives with respect to the density-valued state component. However, in order to proceed with the unnormalized form, following the standard assumptions in the literature (see [3], [12] and [4]), we restrict ourselves to state dynamics of the following form:

(A6) The function $f(t,x,u,y)$ is linear in the control: $f(t,x,u,y) = f^{\dagger}(t,x,y) + u$.

Recall that if the probability measure flow $(\mu_t)_{0 \le t \le T}$ is fixed, $f[t,x,u,\mu_t]$ and $L[x,u,\mu_t]$ become functions of $(t,x,u)$ and, as before, we denote

$$f^*(t,x,u) := f[t,x,u,\mu_t], \qquad L^*(x,u) := L[x,u,\mu_t].$$

We need a further condition on the measure flow so that the induced functions are well behaved; see Definition 3 and Proposition 4 of [17].

Definition. A probability measure flow $\mu_t$ on $[0,T]$ is in $\mathcal{M}_{[0,T]}$ if there exists $\beta \in (0,1]$ such that for any bounded and Lipschitz continuous function $\psi$ on $\mathbb{R}$,

$$\bigg| \int_{\mathbb{R}} \psi(y)\, \mu_{t'}(dy) - \int_{\mathbb{R}} \psi(y)\, \mu_{t''}(dy) \bigg| \le B\, |t' - t''|^{\beta}, \qquad (27)$$

for all $t', t'' \in [0,T]$, where for a given $\mu_t$, $B$ depends upon the Lipschitz coefficient of $\psi$.

In order to obtain the unnormalized filtering equations for the MV SDE, we first need to define an exponential martingale for the change of measure argument.
Consider first the following MV SDE:

$$dz^o(t) = f[t, z^o(t), \alpha(t), \mu_t]\, dt + \sigma\, dw(t), \qquad (28)$$
$$dy(t) = h(t, z^o(t))\, dt + d\nu(t), \qquad 0 \le t \le T, \qquad (29)$$

with $z^o(0) = z(0)$, $y(0) = 0$, where $(\alpha(t))_{0 \le t \le T}$ is an admissible control, i.e., an $\mathcal{F}^y_t$-adapted process taking values in $U$. In the rest of this section we assume that $(\mu_t) \in \mathcal{M}_{[0,T]}$ is fixed with exponent $\beta$, and we follow the approach presented in [3]. Hence, we define the process

$$w^-(t) = w(t) - \int_0^t u(s)\, ds \qquad (30)$$

and a measure $\tilde{P}$ equivalent to $P$ such that $\frac{dP}{d\tilde{P}} = M_0^T(u)$, where

$$M_s^t(u) := \exp\bigg\{ \int_s^t \big( u(\tau)\, dw^-(\tau) + h(\tau, z^o(\tau))\, d\nu(\tau) \big) - \frac{1}{2}\int_s^t \big( |u(\tau)|^2 + |h(\tau, z^o(\tau))|^2 \big)\, d\tau \bigg\}. \qquad (31)$$

It now follows from Girsanov's theorem that under $\tilde{P}$, $y(t)$ is a Brownian motion. Define now the backward differential operator and its adjoint as follows: for $a \in U$,

$$\mathcal{J}^a_t := \frac{1}{2}\sigma^2 \partial_{xx} + (f^{\dagger} + a)\, \partial_x, \qquad \mathcal{J}^{*a}_t := \frac{1}{2}\sigma^2 \partial_{xx} - (f^{\dagger} + a)\, \partial_x - \partial_x f^{\dagger}. \qquad (32)$$

Similarly, for a given control process $u \in \mathcal{U}$, where $\mathcal{U} := \big\{u(\cdot) \in U:\ u(t)$ is $\mathcal{F}^y_t$-adapted and $E \int_0^T |u(t)|^2\, dt < \infty\big\}$, we denote the family of operators by

$$\{\mathcal{J}^u_t := \mathcal{J}^{u_t}_t,\ \mathcal{J}^{*u}_t := \mathcal{J}^{*u_t}_t,\ 0 \le t \le T\}. \qquad (33)$$

Consider a random function $\{q(t,z;\tau,\kappa);\ \tau < t \le T\}$ with $(z,\kappa) \in \mathbb{R} \times \mathbb{R}$ and assume that it is a fundamental solution of the Zakai equation (which is known to exist [3]) given by

$$dq(t,z;\tau,\kappa) = \mathcal{J}^{*u}_t q(t,z;\tau,\kappa)\, dt + h(t,z)\, q(t,z;\tau,\kappa)\, dy(t), \qquad \lim_{t \downarrow \tau} q(t,z;\tau,\kappa) = \delta_{z-\kappa}, \quad \tau \le t \le T, \quad \tilde{P}\text{-a.s.} \qquad (34)$$

Let $p(z)$ denote the density of $z(0)$ and set $q_t(z;\kappa) := q(t,z;0,\kappa)$. Then by [3, Theorem 4.1] the function

$$p_t(z) = \frac{\int_{\mathbb{R}} q_t(z;\kappa)\, p(\kappa)\, d\kappa}{\int_{\mathbb{R}} \int_{\mathbb{R}} q_t(z;\kappa)\, p(\kappa)\, d\kappa\, dz} \qquad (35)$$

is a version of the conditional density of $P(z^o(t) \in \cdot \mid \mathcal{F}^y_t)$, i.e., for $\ell \in C_b(\mathbb{R})$,

$$E[\ell(z^o(T)) \mid \mathcal{F}^y_T] = \int_{\mathbb{R}} p_T(z)\, \ell(z)\, dz \qquad \tilde{P}\text{-a.s.} \qquad (36)$$
Finally, let us set $\tilde{\varphi}(t,z) := \int_{\mathbb{R}} q_t(z;\kappa)\, p(\kappa)\, d\kappa$. Then, by [3, Theorem 4.1], we obtain

$$d\tilde{\varphi}(t,z) = \mathcal{J}^{*u}_t \tilde{\varphi}(t,z)\, dt + h(t,z)\, \tilde{\varphi}(t,z)\, dy(t), \quad 0 \le t \le T, \qquad \tilde{\varphi}(0,z) = p(z), \qquad (37)$$

where (37) is the Zakai equation for the unnormalized conditional density, which will serve as the infinite dimensional state process of the completely observed optimal control problem. It is worth recalling at this point that the goal is to solve the partially observed SOCP at the infinite population limit, for which we aim to obtain an HJB equation on a function space by constructing the associated completely observed SOCP. We now proceed with this derivation, the first step of which requires one to define the cost in terms of the conditional density process and the new measure defined via (31). Indeed, consider the cost function and note that

$$J(u; p) = E \int_0^T L[z^o(t), u(t), \mu_t]\, dt = \tilde{E} \int_0^T \bigg( \int_{\mathbb{R}} \int_{\mathbb{R}} L[z, u(t), \mu_t]\, q_t(z;x)\, p(x)\, dx\, dz \bigg)\, dt, \qquad (38)$$

where $\tilde{E}$ denotes expectation with respect to $\tilde{P}$ and (38) follows from [3, Equation 5.1]. Note that we explicitly indicate the dependence on the initial condition. Define the following space of functions:

$$E_k := \Big\{ p \in L^1(\mathbb{R});\ \|p\|_k = \int_{\mathbb{R}} \big(1 + |z|^k\big)\, |p(z)|\, dz < \infty \Big\}. \qquad (39)$$

In the derivation of the HJB equation we consider the function space (39), where for the expected total cost incurred during $[T-\tau, T]$, $0 \le \tau \le T$, we assume that the initial condition satisfies $p_{T-\tau}(z) \in E_k$; hence, for a constant control $u_t = a \in U$ for all $0 \le t \le T$, we have $J: [0,T] \times E_k \to \mathbb{R}$.

Definition [3]. Consider a probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ and an $E_l$-valued stochastic process $(\eta(t,z))_{0 \le t \le T}$ adapted to the filtration $\mathcal{F}_t$, with $l \ge 0$. If

$$E \int_0^T \bigg( \int_{\mathbb{R}} \big(1 + |z|^l\big)\, |\eta(t,z)|\, dz \bigg)^j dt < \infty, \qquad (40)$$

then we say that $\eta(t,z) \in M^{l,j}[\mathcal{F}_t]$.
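For a linear-Gaussian instance (an assumption made only for this sketch: $f^{\dagger}(t,x,y) = -x$, $u = 0$, $h(t,x) = x$), the Zakai equation (37) can be integrated on a grid with a prediction/correction splitting, and the normalized posterior mean can be checked against the exact Kalman-Bucy filter:

```python
import numpy as np

def zakai_grid(y_inc, dt, x, sigma=0.5):
    """Explicit finite-difference integration of the Zakai equation
    (37) for the assumed linear model f_dagger = -x, u = 0, h(x) = x.
    Prediction applies the adjoint operator J*; correction multiplies
    by exp(h dy - 0.5 h^2 dt) (a standard splitting scheme)."""
    dx = x[1] - x[0]
    phi = np.exp(-0.5 * (x - 1.0) ** 2 / 0.04)   # prior ~ N(1, 0.2^2)
    phi /= phi.sum() * dx
    for dy in y_inc:
        # J* phi = 0.5 sigma^2 phi_xx + d/dx (x phi) for drift -x
        phi_xx = (np.roll(phi, -1) - 2 * phi + np.roll(phi, 1)) / dx**2
        phi = phi + dt * (0.5 * sigma**2 * phi_xx + np.gradient(x * phi, dx))
        phi *= np.exp(x * dy - 0.5 * x**2 * dt)   # Zakai correction
        phi = np.clip(phi, 0.0, None)
    return phi / (phi.sum() * dx)                 # normalize at the end

def kalman_bucy(y_inc, dt, m0=1.0, P0=0.04, sigma=0.5):
    """Kalman-Bucy filter for dz = -z dt + sigma dw, dy = z dt + dnu,
    discretized with the same step; the exact linear-Gaussian reference."""
    m, P = m0, P0
    for dy in y_inc:
        m += -m * dt + P * (dy - m * dt)
        P += (-2 * P + sigma**2 - P**2) * dt
    return m, P

rng = np.random.default_rng(3)
dt, steps, sigma = 0.001, 1000, 0.5
z, y_inc = 1.0, []
for _ in range(steps):
    z += -z * dt + sigma * np.sqrt(dt) * rng.normal()
    y_inc.append(z * dt + np.sqrt(dt) * rng.normal())
x = np.linspace(-4, 4, 401)
dens = zakai_grid(y_inc, dt, x, sigma)
m_grid = (x * dens).sum() * (x[1] - x[0])
m_kb, _ = kalman_bucy(y_inc, dt, sigma=sigma)
print(m_grid, m_kb)
```

Only the final normalization step is needed to recover the conditional density, mirroring (35): the evolution itself is carried out on the unnormalized $\tilde{\varphi}$.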
A continuous cost functional is next defined by setting

$$V^u(\tau, p) := E \int_{T-\tau}^T L[z^o(t), u(t), \mu_t]\, dt, \qquad (41)$$

where $(\tau, p) \in [0,T] \times E_k$ and $z^o(T-\tau)$ has a distribution with density $p$. We now recall the definition of the Fréchet derivative. A function $f: X \to Y$ is said to be Fréchet differentiable at $x$ if there exists $Df(x) \in L(X;Y)$, where $L(X;Y)$ denotes the space of bounded linear operators from $X$ to $Y$, such that

$$\lim_{\|h\|_X \to 0} \frac{\|f(x+h) - f(x) - Df(x) \cdot h\|_Y}{\|h\|_X} = 0.$$

One can define higher order Fréchet derivatives in a similar manner. For instance, the second order Fréchet derivative of $f$ at $x \in X$ satisfies $D^2 f(x) \in L(X; L(X;Y))$. We define the following assumptions.

(A7) The function $V(\tau, p): [0,T] \times E_k \to \mathbb{R}$ possesses a continuous first derivative in $\tau$ and first and second order Fréchet derivatives $DV(\tau,p)$ and $D^2V(\tau,p)$ with respect to $p$, in the form of a linear functional and a bilinear form, respectively, which are given by

$$DV(\tau,p)[\eta] = \int_{\mathbb{R}} V_p(\tau,p)(z)\, \eta(z)\, dz,$$
$$D^2V(\tau,p)[\eta,\theta] = \int_{\mathbb{R}} \int_{\mathbb{R}} V_{pp}(\tau,p)(z,z')\, \eta(z)\, \theta(z')\, dz\, dz', \qquad \eta(\cdot), \theta(\cdot) \in E_l,$$

where the kernels $V_p(\tau,p)(z)$ and $V_{pp}(\tau,p)(z,z')$ are continuous in their arguments and satisfy

$$|V_p(\tau,p)(z)| \le \zeta_1(\tau, \|p\|_l)\, \big(1 + |z|^l\big),$$
$$|V_{pp}(\tau,p)(z,z')| \le \zeta_2(\tau, \|p\|_l)\, \big(1 + |z|^l\big)\big(1 + |z'|^l\big), \qquad (42)$$

for $\zeta_1, \zeta_2$ continuous functions on $[0,T] \times \mathbb{R}_+$.

(A8) Consider (37). Assume that

$$\mathcal{J}^{*u}_t \tilde{\varphi}(t,z) \in M^{l,1}[\mathcal{F}^y_t] \cap M^{l,2}[\mathcal{F}^y_t], \qquad h(t,z)\, \tilde{\varphi}(t,z) \in M^{l,1}[\mathcal{F}^y_t] \cap M^{l,2}[\mathcal{F}^y_t], \qquad (43)$$

for some $l \ge 0$.

Fix a constant control $u_t = a$, $0 \le t \le T$, and define $N_t(x) := \int_{\mathbb{R}} L[z, a, \mu_t]\, q(T,z;t,x)\, dz$ and $s := T - \tau$. Notice that with this notation we have $V^a(\tau, p) = \tilde{E}(N_s, p)$, where we use the notational convention $(\alpha, \beta) := \int_{\mathbb{R}} \alpha(z)\, \beta(z)\, dz$.
Consider $V^a$ and note that, due to the linearity in the infinite dimensional component, for a fixed control $a$ the Fréchet derivatives satisfy

$$DV^a(\tau,p)[\eta] = \int_{\mathbb{R}} V^a_p(\tau,p)(x)\, \eta(x)\, dx, \qquad \eta(\cdot) \in E_{l-1},$$

where $V^a_p(\tau,p)(x) = \tilde{E} N_s(x)$. Similarly,

$$D^2V^a(\tau,p)[\eta,\theta] = \int_{\mathbb{R}}\int_{\mathbb{R}} V^a_{pp}(\tau,p)(x,x')\, \eta(x)\, \theta(x')\, dx\, dx',$$

where $\eta(\cdot), \theta(\cdot) \in E_{l-1}$ and $V^a_{pp}(\tau,p)(x,x') = 0$. Therefore, the first set of conditions of (A7) is already satisfied when the unnormalized conditional density is considered. We are now in a position to provide an HJB equation satisfied by the function given in (41).

Proposition. Consider the probability space $(\Omega, \mathcal{F}, \{\mathcal{F}^y_t\}_{t \ge 0}, P)$ and any admissible control process $\{u_t;\ 0 \le t \le T\} \in \mathcal{U}$ along with $(z^o(t), y(t), \nu(t), w(t))_{0 \le t \le T}$. Assume that (A1), (A2), (A3), (A5), (A6) and (A8) hold. If the equation

$$\frac{\partial V(\tau,p)}{\partial \tau} = \frac{1}{2}\, D^2V(\tau,p) \cdot [h(T-\tau)p,\ h(T-\tau)p] + \min_{\theta \in U} \Big\{ \big( \mathcal{J}^{\theta}_{T-\tau} DV(\tau,p)(\cdot),\ p \big) + \big( L[\cdot, \theta, \mu_{T-\tau}],\ p \big) \Big\}, \qquad V(0,p) = 0, \quad (\tau,p) \in [0,T] \times E_k, \qquad (44)$$

has a solution $V(\tau,p): [0,T] \times E_k \to \mathbb{R}$ which satisfies the assumptions defined in (A7), then $V(\tau,p)$ is a lower bound on the cost achieved under the control process $u$, i.e.,

$$V^u(\tau,p) := E \int_{T-\tau}^T L[z^o(t), u(t), \mu_t]\, dt \ge V(\tau,p), \qquad (45)$$

for any $(\tau,p) \in [0,T] \times E_k$.

Under assumptions (A1)-(A3), (A5)-(A8) and the condition that the measure flow $(\mu_t)_{0 \le t \le T} \in \mathcal{M}_{[0,T]}$ is fixed, the proof of this proposition follows from [3, Theorem 5.2]. Notice now that the PDE described in (44) is difficult to analyze (note the presence of a function space in the domain of $V$); indeed, the solution theory for such equations is not completely understood in the literature. Hence, in order to proceed with the analysis of the PO MFG system, we assume the following.
(A9) Equation (44) has a unique solution $V(t,p): [0,T] \times E_k \to \mathbb{R}$ with $V(t,p) \in C^{1,2}_{t,p}([0,T] \times E_k)$.

Under this assumption, the best response control process can now be given in the following separated form:

$$u^* = \{u^*(t,p) = a^*(T-t, p_t);\ 0 \le t \le T\},$$
$$a^*(\tau,p) = \arg\min_{a \in U} \Big\{ \big( \mathcal{J}^a_{T-\tau} DV(\tau,p)(\cdot),\ p \big) + \big( L[\cdot, a, \mu_{T-\tau}],\ p \big) \Big\}, \qquad (46)$$

provided the Zakai equation (37) is strongly solvable for an $\mathcal{F}^y_t$-adapted random function $\tilde{\varphi}(t,z)$ with $u(t) = a^*\big(T-t,\ \tilde{\varphi}(t)/(\tilde{\varphi}(t), 1)\big)$.

Following standard procedures, e.g., see [10, Theorem 4], one can also show that the value function $\bar{V}: [0,T] \times E_k \to \mathbb{R}$ defined as $\bar{V}(t,p) := \inf_{u \in \mathcal{U}} V^u(t,p)$ is a solution to the HJB equation given in (44).

To summarize: in order to obtain its optimal control, a generic agent solves its partially observed control problem defined by (28), (29) and (38), and obtains the optimal control law in the feedback form given in (46). However, as in the completely observed, and hence finite dimensional, case, the existence of feedback control policies is in general not sufficient for the validity of the fixed point argument. Therefore, we further assume the following.

(A10) For each $(\mu_t) \in \mathcal{M}_{[0,T]}$, $u^*\big(t, p \mid (\mu_t)_{0 \le t \le T}\big)$ is continuous in $(t,p) \in [0,T] \times E_k$ and Lipschitz continuous in $p \in E_k$.

Before we proceed with the fixed point analysis we remark the following.

Remark. For the stochastic partial differential equation (SPDE) defined in (37), a solution via a Sobolev space characterization is considered in [28], where the solution is defined on $H^1 := \big\{ f \in L^2(\mathbb{R}) : \frac{\partial f}{\partial x} \in L^2(\mathbb{R}) \big\}$ with the norm $\|k\|_{H^1} := \big\{ \int_{\mathbb{R}} \big( |k(x)|^2 + |\frac{\partial k}{\partial x}|^2 \big)\, dx \big\}^{1/2}$.
Therefore, the infinite dimensional state process takes values in $H^1$, and one can obtain a similar expression for the HJB equation, for which the stochastic calculus for Hilbert space-valued stochastic processes can be used.

4. Analysis of the Partially Observed MFG System. Following the widely used approach in MFG theory, it is now required to demonstrate that when the completely observed SOCP derived in Section 3.3 is considered and solved by each generic agent, the corresponding strategies collectively replicate the aggregate behavior, which is the system mean field. This corresponds to the fixed point argument of MFG analysis, which is also referred to as Nash Certainty Equivalence (NCE). For such an analysis, it suffices to prove that the MFG system has a unique solution, which can be achieved by proving that, starting with an exogenous measure $\mu^o_{(\cdot)}$, the composition map below has a fixed point in the space of probability measures:

$$\mu^o_{(\cdot)} \xrightarrow{\ \mathrm{MV}\ } z^o_{(\cdot)} \xrightarrow{\ \mathrm{NLF}\ } \tilde{\varphi}_{(\cdot)} \xrightarrow{\ \mathrm{HJB}\ } V(\cdot, p) \xrightarrow{\ \mathrm{BRC}\ } u^o(\cdot, p) \xrightarrow{\ \mathrm{MV}\ } z^o_{(\cdot)}.$$

We now introduce some preliminary material about metrics on a space of probability measures, which can be found in [17] and [27]. Let $C([0,T]; \mathbb{R})$ be the space of continuous functions on $[0,T]$. For $x, y \in C([0,T]; \mathbb{R})$ define the norm $\|x - y\| := \sup_{t \in [0,T]} |x(t) - y(t)|$. Then $(C([0,T]; \mathbb{R}), \|\cdot\|)$ is a Banach space. Consider also the metric $\rho(x,y) = \sup_{t \in [0,T]} |x(t) - y(t)| \wedge 1$; one can show that the metric space $(C([0,T]; \mathbb{R}), \rho)$ is complete and separable (Polish). Let $C_\rho := C([0,T]; \mathbb{R})$. On $(C([0,T]; \mathbb{R}), \|\cdot\|)$ we define the $\sigma$-algebra $\mathcal{F}$ generated by the cylindrical sets of the form $\{x(\cdot) \in C_\rho : x(t_i) \in B_i;\ t_i \in [0,T],\ i = 1, \ldots, l\}$, where each $B_i \in \mathcal{B}(\mathbb{R})$ and $l$ is a positive integer. Let $M(C_\rho)$ denote the space of all probability measures $m$ on $(C_\rho, \mathcal{F})$.
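The NCE fixed-point loop above can be sketched numerically in a tractable linear-quadratic special case (an illustrative assumption; the paper's setting is nonlinear and density-valued): posit a mean flow, solve the backward best-response problem, propagate the closed-loop mean forward, and iterate to a fixed point. The dynamics $dz = u\,dt + \sigma\,dw$ and cost $(z - \lambda m(t))^2 + u^2$ are assumed for the sketch:

```python
import numpy as np

def nce_fixed_point(lam=0.5, m0=1.0, T=1.0, n=200, iters=50, tol=1e-10):
    """Picard iteration for the NCE/MFG fixed point in an assumed
    scalar LQ example: dynamics dz = u dt + sigma dw, running cost
    (z - lam * m(t))^2 + u^2, m(t) the population mean.  For a
    posited mean flow, the best response is the LQ tracker
    u* = -(pi(t) z + s(t)) (backward Riccati sweep), and the
    closed-loop mean evolves as m' = -(pi m + s) (forward sweep)."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    # Riccati equation pi' = pi^2 - 1, pi(T) = 0 (mean-flow independent)
    pi = np.zeros(n + 1)
    for k in range(n, 0, -1):
        pi[k - 1] = pi[k] - dt * (pi[k] ** 2 - 1.0)
    m = np.full(n + 1, m0)          # initial guess for the mean flow
    gap = np.inf
    for _ in range(iters):
        c = lam * m                 # mean-field coupling in the cost
        s = np.zeros(n + 1)         # offset: s' = pi s + c, s(T) = 0
        for k in range(n, 0, -1):
            s[k - 1] = s[k] - dt * (pi[k] * s[k] + c[k])
        m_new = np.empty(n + 1)     # closed-loop mean: m' = -(pi m + s)
        m_new[0] = m0
        for k in range(n):
            m_new[k + 1] = m_new[k] - dt * (pi[k] * m_new[k] + s[k])
        gap = np.abs(m_new - m).max()
        m = m_new
        if gap < tol:
            break
    return t, m, gap

t, m, gap = nce_fixed_point()
print(gap, m[0], m[-1])
```

For $\lambda < 1$ and a short horizon the mean-flow map is a contraction, so the Picard iteration converges geometrically; this mirrors, in one dimension, the fixed point sought in $M(C_\rho)$.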
$M(C_\rho \times C_\rho)$ denotes the space of all probability measures on the product space. Define the canonical process $X$ with sample space $C_\rho$; i.e., $X_t(\xi) = \xi_t$, $\xi \in C_\rho$.

Definition. For $m_1, m_2 \in M(C_\rho)$, the Wasserstein metric is defined as follows:

$D_T(m_1, m_2) = \inf_{\Upsilon \in \Pi(m_1, m_2)} \int_{C_\rho \times C_\rho} \Big( \sup_{s \le T} |X_s(\xi_1) - X_s(\xi_2)| \wedge 1 \Big)\, d\Upsilon(\xi_1, \xi_2),$   (47)

where $\Pi(m_1, m_2) := \{ \Upsilon \in M(C_\rho \times C_\rho) : \Upsilon(A \times C([0,T];\mathbb{R})) = m_1(A) \text{ and } \Upsilon(C([0,T];\mathbb{R}) \times A) = m_2(A),\ A \in \mathcal{B}(C([0,T];\mathbb{R})) \}$. Note that the metric space $(M(C_\rho), D_T)$ is also Polish.

We continue with the existence and uniqueness proof for MV SDEs in the partially observed setup, which is based on a fixed point argument in the space $M(C_\rho)$. Hence, for $1 \le i \le N$, consider first the following SDEs:

$dz^o_i(t) = f[t, z^o_i(t), \alpha(t, \tilde\varphi_i(t)), \mu_t]\, dt + \sigma\, dw_i(t),$   (48)
$dy_i(t) = h(t, z^o_i(t))\, dt + d\nu_i(t), \quad 0 \le t \le T,$   (49)

where $z^o_i(0) = z_i(0)$, $\tilde\varphi_i$ is generated by the unnormalized nonlinear filter, and $\alpha$ is an admissible control. Let $m \in M(C_\rho)$ and observe that one can rewrite (48) by defining the random process $\vartheta_i(t)$ on $[0,T]$ as follows:

$\vartheta_i(t) = \int_0^t \int_{C_\rho} f(s, \vartheta_i(s), \alpha(s, \tilde\varphi_i(s)), \xi_s)\, dm(\xi)\, ds + z_i(0) + \int_0^t \sigma\, dw_i(s), \quad 0 \le t \le T.$   (50)

Let us denote the law of $\vartheta_i$ by $\Phi(m) \in M(C_\rho)$.

Although the results derived in the previous sections hold with time varying observation dynamics, it is simpler to handle the sensitivity analysis of the filtering equation when the observation dynamics are time invariant. We therefore assume the following in the rest of the paper.

(A11) The observation dynamics are time invariant: $h(t,x) = h(x)$.

Theorem 6. Under (A0)-(A3), (A5) and (A10), there exists a unique consistent solution pair $(z^o_i(t), \mu_t)$ with $\mu_t \in M_{[0,T]}$.

Proof.
The proof is a generalization of [17, Theorem 6], [24, Theorem 6.12] and [10, Theorem 13], and requires considering an unnormalized conditional density in the control law. For $m, \hat m \in M(C_\rho)$, let $\vartheta_i(t)$ and $\hat\vartheta_i(t)$ be defined by (50) corresponding to $m$ and $\hat m$, respectively, with the same initial condition $z_i(0)$. Similarly, let $\tilde\varphi(t)$ and $\hat{\tilde\varphi}(t)$ be generated by the unnormalized filtering equations for $\vartheta_i(t)$ and $\hat\vartheta_i(t)$, respectively. It follows that

$\sup_{0 \le s \le t} \big| \vartheta_i(s) - \hat\vartheta_i(s) \big| \le \int_0^t \Big| \int_{C_\rho} f(s, \vartheta_i(s), \alpha(s, \tilde\varphi(s)), \xi_s)\, dm(\xi) - \int_{C_\rho} f\big(s, \hat\vartheta_i(s), \alpha(s, \hat{\tilde\varphi}(s)), \xi_s\big)\, d\hat m(\xi) \Big|\, ds.$   (51)

For any $\bar m \in M(C_\rho \times C_\rho)$ with marginals $(m, \hat m)$, we have

$\Lambda_s = \Big| \int_{C_\rho} f(s, \vartheta_i(s), \alpha(s, \tilde\varphi(s)), \xi_s)\, dm(\xi) - \int_{C_\rho} f\big(s, \hat\vartheta_i(s), \alpha(s, \hat{\tilde\varphi}(s)), \xi_s\big)\, d\hat m(\xi) \Big|$
$= \Big| \int_{C_\rho \times C_\rho} f(s, \vartheta_i(s), \alpha(s, \tilde\varphi(s)), \xi_s)\, d\bar m(\xi, \hat\xi) - \int_{C_\rho \times C_\rho} f\big(s, \hat\vartheta_i(s), \alpha(s, \hat{\tilde\varphi}(s)), \hat\xi_s\big)\, d\bar m(\xi, \hat\xi) \Big|$   (52)
$\le C_1 \big( |\vartheta_i(s) - \hat\vartheta_i(s)| \wedge 1 \big) + C_2 \big( \|\tilde\varphi_i(s) - \hat{\tilde\varphi}_i(s)\|_{E_k} \wedge 1 \big) + \int_{C_\rho \times C_\rho} C_3 \big( |\xi_s - \hat\xi_s| \wedge 1 \big)\, d\bar m(\xi, \hat\xi),$   (53)

where (53) follows from the boundedness and the Lipschitz continuity of $f$ and $\alpha$; here and below, $C_1$, $C_2$ and $C_3$ denote generic positive constants.
We note that the essential difference with the completely observed MV SDEs is the presence of the conditional density terms, that is, the solutions of the nonlinear filtering equations in Zakai form, where the observation process $y(t)$ acts as the exogenous input process; these are handled through the robust representation of the filtering processes.

Recall that for $0 \le t \le T$,

$dy(t) = h\big(\vartheta_i(t)\big)\, dt + d\nu(t), \qquad d\hat y(t) = h\big(\hat\vartheta_i(t)\big)\, dt + d\nu(t),$   (54)

and hence the filtering processes $\tilde\varphi_i(t)$ and $\hat{\tilde\varphi}_i(t)$ are $\mathcal{F}^y_t$- and $\mathcal{F}^{\hat y}_t$-adapted, respectively. Let $\ell \in C_b(\mathbb{R})$ and for $0 \le t \le T$ consider the following unnormalized conditional expectations:

$\tilde E\big[ \ell(\vartheta_i(t)) M_t(\alpha) \,|\, \mathcal{F}^y_t \big], \qquad \tilde E\big[ \ell\big(\hat\vartheta_i(t)\big) M_t(\alpha) \,|\, \mathcal{F}^{\hat y}_t \big].$   (55)

Let us also define the path valued random variable $y_\cdot : \Omega \to C([0,t]; \mathbb{R})$ such that $y_\cdot(\omega) = (y(s,\omega),\ 0 \le s \le t)$. Hence, by [1, Theorem 5.12], there exists a function $\eta_\ell : C([0,t]; \mathbb{R}) \to \mathbb{R}$ such that

$\tilde E\big[ \ell(\vartheta_i(t)) M_t(\alpha) \,|\, \mathcal{F}^y_t \big] = \eta_\ell(y_\cdot), \qquad \tilde E\big[ \ell\big(\hat\vartheta_i(t)\big) M_t(\alpha) \,|\, \mathcal{F}^{\hat y}_t \big] = \eta_\ell(\hat y_\cdot), \qquad \tilde P\text{-a.s.}$   (56)

Furthermore, the function $\eta_\ell$ is locally Lipschitz in the sup-norm and locally bounded [1, Lemma 5.6]. To continue, recall that by [3, Theorem 4.1],

$\tilde E\big[ \ell(\vartheta_i(t)) M_t(\alpha) \,|\, \mathcal{F}^y_t \big] = \int_{\mathbb{R}} \ell(x)\, \tilde\varphi(x)\, dx, \qquad \tilde P\text{-a.s.}$   (57)

Take $\ell(x) = 1 + |x|^k$ for the $k$ that is set in (39), so that we have

$\Big| \tilde E\big[ \ell(\vartheta_i(t)) M_t(\alpha) \,|\, \mathcal{F}^y_t \big] - \tilde E\big[ \ell\big(\hat\vartheta_i(t)\big) M_t(\alpha) \,|\, \mathcal{F}^{\hat y}_t \big] \Big| = \Big| \int_{\mathbb{R}} \big( 1 + |x|^k \big) \big( \tilde\varphi(x) - \hat{\tilde\varphi}(x) \big)\, dx \Big| = \| \tilde\varphi(x) - \hat{\tilde\varphi}(x) \|_{E_k}.$
(58)

Notice that $\eta_{(1+|x|^k)}$ is only locally Lipschitz; however, since $y_\cdot$ and $\hat y_\cdot$ take values in $C([0,t]; \mathbb{R})$, there exists $R' > 0$ such that $\|y_\cdot\| \le R'$ for all $\omega \in \Omega$. Hence let $R > R'$; consequently, for any $\|y_\cdot\|, \|\hat y_\cdot\| \le R$, there exists a constant $C^R_f > 0$ such that

$\| \tilde\varphi(x) - \hat{\tilde\varphi}(x) \|_{E_k} = \big| \eta_{(1+|x|^k)}(y_\cdot) - \eta_{(1+|x|^k)}(\hat y_\cdot) \big| \le C^R_f \sup_{0 \le s \le t} |y(s) - \hat y(s)| \le C^R_f C_h \sup_{0 \le s \le t} \big| \vartheta_i(s) - \hat\vartheta_i(s) \big|,$   (59)

where $C_h$ is a constant obtained from the Lipschitz continuity of $h$. Substituting (59) in (53) yields

$\Lambda_s \le \big( C_1 + C_2 C^R_f C_h \big) \big( |\vartheta_i(s) - \hat\vartheta_i(s)| \wedge 1 \big) + \int_{C_\rho \times C_\rho} C_3 \big( |\xi_s - \hat\xi_s| \wedge 1 \big)\, d\bar m(\xi, \hat\xi).$   (60)

Clearly, by (60) the proof follows easily from [17, Theorem 6]; we herein provide the details for the sake of completeness. Notice first that (60) implies

$\Lambda_s \le \big( C_1 + C_2 C^R_f C_h \big) \big( |\vartheta_i(s) - \hat\vartheta_i(s)| \wedge 1 \big) + C_3 D_s(m, \hat m),$   (61)

since $\bar m$ is any measure with marginals $m$ and $\hat m$. (51) and (61) together imply

$\sup_{0 \le s \le t} \big| \vartheta_i(s) - \hat\vartheta_i(s) \big| \le \int_0^t \Big[ C_3 D_s(m, \hat m) + \big( C_1 + C_2 C^R_f C_h \big) \big( |\vartheta_i(s) - \hat\vartheta_i(s)| \wedge 1 \big) \Big]\, ds.$   (62)

Applying Gronwall's lemma yields

$\sup_{0 \le s \le t} \big| \vartheta_i(s) - \hat\vartheta_i(s) \big| \wedge 1 \le C^R_T \int_0^t C_3 D_s(m, \hat m)\, ds,$   (63)

where $C^R_T = \exp\big( ( C_1 + C_2 C^R_f C_h ) T \big)$. Notice that $\vartheta_i$ and $\hat\vartheta_i$ induce two probability distributions, denoted by $\Phi(m)$ and $\Phi(\hat m)$, respectively, on $C_\rho$. Likewise, the joint distribution of $(\vartheta_i, \hat\vartheta_i)$ gives a measure $\bar m_\Phi$ on $C_\rho \times C_\rho$. Taking expectations of both sides of (63) we obtain

$D_t\big( \Phi(m), \Phi(\hat m) \big) \le C^R_T \int_0^t C_3 D_s(m, \hat m)\, ds.$
(64)

The proof of existence and uniqueness is now completed by following the Cauchy argument in [27]. Finally, the claim that $\mu_t \in M_{[0,T]}$ follows directly from [17, Lemma 7].

By the analysis in Section 3.3, we first obtained the optimal control law for the generic agent by assuming that (i) it has access only to partial information on its own state, and (ii) the flow of probability measures is fixed. In the second step we proved that the closed loop MV dynamics of the generic agent have a unique solution when the agent uses the Lipschitz control strategy obtained as the solution of the HJB equation derived in Section 3.3. Furthermore, it follows that after all agents apply such Lipschitz control strategies, the resulting measure flow maintains a certain continuity. Hence, an agent can refine its strategy by solving the HJB equation of the partially observed control problem using this new measure. In the next section we discuss this in more detail.

Given $\mu^o_t \in M_{[0,T]}$, by Proposition 4 we obtain a solution for $V(\cdot)$ and subsequently, corresponding to each $(t,p) \in [0,T] \times E_k$, we get $u(t,p)$ as a well defined function minimizing (44). Consequently, we write the optimal control law in the feedback form

$u = u^*\big( t, p \,\big|\, (\mu^o_t)_{0 \le t \le T} \big), \quad (t,p) \in [0,T] \times E_k,$   (65)

for which we define the well defined map $\Upsilon : M_{[0,T]} \to C_{\mathrm{Lip}(p)}([0,T] \times E_k; U)$ with

$\Upsilon\big( (\mu^o_t)_{0 \le t \le T} \big) := u^*\big( t, p \,\big|\, (\mu^o_t)_{0 \le t \le T} \big).$   (66)

Notice that this map characterizes the interaction of the individual and the mass, and can be considered as the best response map: the individual optimal decision is obtained while the actions of all other agents are fixed.

We now consider the second component of the fixed point argument.
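The Cauchy argument just invoked, and the NCE fixed point analysis that follows, both rest on iterating a contractive map on a space of flows. The Python sketch below illustrates the resulting geometric convergence of such a fixed point iteration; the map acting on a time-discretized flow is a hypothetical stand-in with an explicit contraction modulus, not the actual MFG operator.

```python
import numpy as np

k = 0.45                           # contraction modulus, k < 1
t = np.linspace(0.0, 1.0, 50)
target = np.sin(t)                 # hypothetical fixed-point flow

def gamma(mu):
    # any map of this affine form is a contraction with modulus k,
    # and its unique fixed point is `target`
    return target + k * (mu - target)

mu = np.zeros_like(t)              # exogenous initial flow mu^o
for _ in range(60):
    mu = gamma(mu)
err = np.max(np.abs(mu - target))  # ~ k^60 * ||mu_0 - target||, i.e. negligible
```

After 60 iterations the error is of order $k^{60} \approx 10^{-21}$, which is the geometric rate the Banach fixed point theorem guarantees for any contraction.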
Given a function $\alpha(t,p) \in C_{\mathrm{Lip}(p)}([0,T] \times E_k; U)$, we implement it as a control law in (48), which leads to a well defined solution $(z^o(t), y(t))$, $0 \le t \le T$. Let us denote the law of $z^o$ by $m \in M(C_\rho)$. Then we define the map $\hat\Upsilon : C_{\mathrm{Lip}(p)}([0,T] \times E_k; U) \to M(C_\rho)$ by

$m = \hat\Upsilon(\alpha).$   (67)

Note that by Theorem 6, $(\mu_t)_{0 \le t \le T}$, the law of $z^o(t)$, is in $M_{[0,T]}$, and hence one can also specify the well defined map $\bar\Upsilon : C_{\mathrm{Lip}(p)}([0,T] \times E_k; U) \to M_{[0,T]}$ by

$(\mu_t)_{0 \le t \le T} = \bar\Upsilon(\alpha).$   (68)

Therefore, we obtain the following proposition as the partially observed equivalent of [17, Proposition 8].

Proposition. Assume (A0)-(A10) and $(\mu^o_t)_{0 \le t \le T} \in M_{[0,T]}$. We have $\bar\Upsilon \circ \Upsilon\big( (\mu^o_t)_{0 \le t \le T} \big) \in M_{[0,T]}$, i.e., $\Upsilon_M := \bar\Upsilon \circ \Upsilon : M_{[0,T]} \to M_{[0,T]}$.

It is now clear that, by the construction of $\Upsilon$ and $\bar\Upsilon$, we obtain a solution to the NCE system if we can find $\mu^o_t \in M_{[0,T]}$ that satisfies the fixed point equation

$\bar\Upsilon \circ \Upsilon\big( (\mu^o_t)_{0 \le t \le T} \big) = (\mu^o_t)_{0 \le t \le T}.$   (69)

As mentioned in [17], there exist several difficulties in demonstrating the existence of a fixed point for (69); most notably, the sensitivity of the optimal control policies with respect to the measure flow $(\mu^o_t)_{0 \le t \le T}$, and hence such a regularity condition is taken as an assumption in [17, see (37)]. By generalizing the approach presented in [24], we derive a similar sensitivity analysis for the partially observed setup.

Let $0 \le t \le T$, $s = T - t$, and consider the best response process given by the solution of the HJB equation in (44):

$u^*(t, p_t) = \arg\min_{a \in U} \big\{ \big( J^a_t D V(s, p_t)(\cdot), p_t \big) + \big( L[\cdot, a, \mu^o_t], p_t \big) \big\}$
$= \arg\min_{a \in U} \Big\{ a \int_{\mathbb{R}} \partial_x V_p(s, p_t)(x)\, p_t(x)\, dx + \int_{\mathbb{R}} L[x, a, \mu^o_t]\, p_t(x)\, dx \Big\}.$   (70)

Accordingly, define

$H\big( t, p, a, \partial_x V_p(s,p)(x) \big) := a \int_{\mathbb{R}} \partial_x V_p(s,p)(x)\, p(x)\, dx + \int_{\mathbb{R}} L[x, a, \mu^o_t]\, p(x)\, dx.$
(71)

We also define the closed loop dynamics by

$dz^o(t) = f[t, z^o(t), u(t), \mu^o_t]\, dt + \sigma\, dw(t), \quad z^o(0) = z(0),$
$dy(t) = h(z^o(t))\, dt + d\nu(t), \quad y(0) = 0.$   (72)

We assume the following.

(A11) For each $\mu_t \in M_{[0,T]}$, the set

$S(t, p, q) := \operatorname{argmin}_{a \in U} H(t, p, a, q)$   (73)

is a singleton, and the resulting $u^*$ as a function of $(t, p, q) \in [0,T] \times E_k \times \mathbb{R}$ is continuous in $t$ and Lipschitz continuous in $(p,q)$, uniformly with respect to $t$ and $\mu_t \in M_{[0,T]}$.

The conditions under which the above assumption holds in the partially observed case are beyond the scope of this work. However, in the completely observed situation such conditions can be satisfied under suitable convexity assumptions in the control variable; the reader is referred to [17] for more details.

Following [20], we define the Gâteaux derivative of a function $F(t, p, \mu) : [0,T] \times E_k \times \mathcal{P}(\mathbb{R}) \to \mathbb{R}$ with respect to the measure $\mu(y)$ as follows:

$\partial_{\mu(y)} F(t, p, \mu) = \lim_{\epsilon \to 0} \frac{ F(t, p, \mu + \epsilon \delta(y)) - F(t, p, \mu) }{ \epsilon },$   (74)

where $\delta$ is the Dirac delta function. We assume the following.

(A12) The Gâteaux derivatives of $f$ and $L$ with respect to $\mu$ exist, are $C^\infty(\mathbb{R})$, and are uniformly bounded. Let $V(t,p) : [0,T] \times E_k \to \mathbb{R}$ be the unique solution to the HJB equation in (44). Then $V(t,p)$ is uniformly bounded, and its Gâteaux derivative $D V_p(t,p)$ (or the kernel $V_p(t,p)(x)$) exists and is uniformly bounded with respect to all its parameters.

Theorem 8. Assume (A0)-(A12) hold, and in addition assume that the resulting control law is Lipschitz in $\mu$. Then, for given $(\mu_t)_{0 \le t \le T}, (\tilde\mu_t)_{0 \le t \le T} \in M_{[0,T]}$, there exists a constant $c_1$ such that

$\sup_{(t,p) \in [0,T] \times E_k} | u^*(t,p) - \tilde u^*(t,p) | \le c_1\, D_T(\mu, \tilde\mu).$   (75)

Proof.
Note that Assumption (A12) and the fact that the resulting $u^*$ is Lipschitz continuous in $\mu$ yield

$| u^*(t,p) - \tilde u^*(t,p) | \le k_1 D_t(\mu, \tilde\mu) + k_2 \big| \partial_x V^\mu_p(t,p)(x) - \partial_x V^{\tilde\mu}_p(t,p)(x) \big|,$   (76)

with positive constants $k_1$ and $k_2$, where $V^\mu_p(t,p)(x)$ and $V^{\tilde\mu}_p(t,p)(x)$ are the kernels defined in (42) corresponding to the measures $\mu$ and $\tilde\mu$.

The goal is to use the existence of the Gâteaux derivative of the kernel $\partial_x V_p(t,p)(x)$ with respect to the measure $\mu$, which satisfies the boundedness assumption, so that one can invoke the mean value theorem (MVT). Indeed, by the assumption that $\partial_\mu \partial_x V_p(t,p)(x)$ is uniformly bounded, the MVT gives

$\big| \partial_x V^\mu_p(t,p)(x) - \partial_x V^{\tilde\mu}_p(t,p)(x) \big| \le k_3 D_t(\mu, \tilde\mu).$   (77)

Consequently, we obtain

$| u^*(t,p) - \tilde u^*(t,p) | \le (k_1 + k_2 k_3) D_t(\mu, \tilde\mu),$   (78)

which completes the proof with $c_1 = k_1 + k_2 k_3$.

Remark. In Theorem 8 above, we assume the uniform boundedness of the Gâteaux derivative of the function that satisfies the HJB equation obtained for the partially observed setup. We note that one could follow an approach similar to [20, Section 6] in order to analyze the boundedness property of the solution of the HJB equation (44) (see also [24, Appendix F] for the finite dimensional case) and its Fréchet derivatives by analyzing the associated kernels. However, such an analysis would depend considerably upon the analysis of PDEs of the form (44), which is not well understood in the literature.

We now provide a sensitivity result for the distance between two measures with respect to the control policies by combining the general approach developed in [17], [24] and [10].

Lemma. Under (A0)-(A11) there exists a constant $c_2$ such that

$D_T(m, \hat m) \le c_2 \sup_{(t,p) \in [0,T] \times E_k} | u^*(t,p) - \hat u^*(t,p) |,$   (79)

where $m, \hat m \in M(C_\rho)$ are induced by (67) using $u^*, \hat u^* \in C_{\mathrm{Lip}(p)}([0,T] \times E_k; U)$.

Proof.
Denote the two solutions corresponding to $u^*$ and $\hat u^*$ by $z^o(t)$ and $\hat z^o(t)$. Hence,

$z^o(t) = \int_0^t \int_{C_\rho} f(s, z^o(s), u^*(s, \tilde\varphi(s)), \xi_s)\, dm(\xi)\, ds + z(0) + \int_0^t \sigma\, dw(s),$
$\hat z^o(t) = \int_0^t \int_{C_\rho} f\big(s, \hat z^o(s), \hat u^*(s, \hat{\tilde\varphi}(s)), \hat\xi_s\big)\, d\hat m(\hat\xi)\, ds + z(0) + \int_0^t \sigma\, dw(s).$   (80)

By the Lipschitz continuity of $f$, $u^*$ and $\hat u^*$ we obtain

$\big| f(s, z^o(s), u^*(s, \tilde\varphi(s)), \xi_s) - f\big(s, \hat z^o(s), \hat u^*(s, \hat{\tilde\varphi}(s)), \hat\xi_s\big) \big|$
$\le \big| f(s, z^o(s), u^*(s, \tilde\varphi(s)), \xi_s) - f\big(s, \hat z^o(s), u^*(s, \hat{\tilde\varphi}(s)), \hat\xi_s\big) \big| + \big| f\big(s, \hat z^o(s), u^*(s, \hat{\tilde\varphi}(s)), \hat\xi_s\big) - f\big(s, \hat z^o(s), \hat u^*(s, \hat{\tilde\varphi}(s)), \hat\xi_s\big) \big|$
$\le C_1 \big( |z^o(s) - \hat z^o(s)| \wedge 1 \big) + C_2 \big( \|\tilde\varphi(s) - \hat{\tilde\varphi}(s)\|_{E_k} \wedge 1 \big) + C_3 \big( |\xi_s - \hat\xi_s| \wedge 1 \big) + C_4 \sup_{(s,p) \in [0,T] \times E_k} | u^*(s,p) - \hat u^*(s,p) |.$   (81)

Following steps similar to those in the proof of Theorem 6 we get

$| z^o(t) - \hat z^o(t) | \le \int_0^t C_1 \big( |z^o(s) - \hat z^o(s)| \wedge 1 \big)\, ds + \int_0^t C_2 \big( \|\tilde\varphi(s) - \hat{\tilde\varphi}(s)\|_{E_k} \wedge 1 \big)\, ds + C_3 \int_0^t D_s(m, \hat m)\, ds + C_4\, t \sup_{(s,p) \in [0,T] \times E_k} | u^*(s,p) - \hat u^*(s,p) |.$   (82)

Now, by use of the robust representation of the nonlinear filter demonstrated in (54)-(60), we obtain

$| z^o(t) - \hat z^o(t) | \le \int_0^t \Big[ C_1 \big( |z^o(s) - \hat z^o(s)| \wedge 1 \big) + C_2 C^R_f C_h \sup_{0 \le r \le s} \big( |z^o(r) - \hat z^o(r)| \wedge 1 \big) \Big]\, ds + C_3 \int_0^t D_s(m, \hat m)\, ds + C_4\, t \sup_{(s,p) \in [0,T] \times E_k} | u^*(s,p) - \hat u^*(s,p) |,$   (83)

where $C^R_f$ and $C_h$ are defined in (59).
Applying Gronwall's lemma to (83) gives

$\sup_{0 \le s \le t} | z^o(s) - \hat z^o(s) | \wedge 1 \le \exp\big( ( C_1 + C_2 C^R_f C_h ) T \big) \Big( C_3 \int_0^t D_s(m, \hat m)\, ds + C_4\, t \sup_{(s,p) \in [0,T] \times E_k} | u^*(s,p) - \hat u^*(s,p) | \Big).$

Consequently,

$D_t(m, \hat m) \le \exp\big( ( C_1 + C_2 C^R_f C_h ) T \big) \Big( C_3 \int_0^t D_s(m, \hat m)\, ds + C_4\, t \sup_{(s,p) \in [0,T] \times E_k} | u^*(s,p) - \hat u^*(s,p) | \Big).$   (84)

Applying Gronwall's lemma to (84) completes the proof.

We can now present the main result of PO MFG theory.

Theorem 10 (Main Result). Assume (A0)-(A12) hold and consider the processes $V(t,p)$, $u^*(t)$, $z^o(t)$ and $y(t)$, which are defined in (44), (70) and (72), respectively. If the constants $(c_1, c_2)$ of (75) and (79) satisfy the gain condition $c_1 c_2 < 1$, then there exists a unique solution to (69), and hence a unique solution to the MFG system given by (44), (70) and (72).

Proof. The proof follows from the Banach fixed point theorem for the map $\bar\Upsilon \circ \Upsilon$ defined on the Polish space $M_{[0,T]}$, since the gain condition ensures that the map is a contraction.

5. $\epsilon$-Nash Equilibrium Property of the MFG Control Laws. Following the common methodology employed in the MFG literature, we shall investigate the performance of the best response control processes obtained in the previous section in a finite population setting. Consider the following dynamics described in (1):

$dz^N_i(t) = \frac{1}{N} \sum_{j=1}^N f\big( t, z^N_i(t), u^N_i(t), z^N_j(t) \big)\, dt + \sigma\, dw_i(t),$   (85)
$dy^N_i(t) = h\big( z^N_i(t) \big)\, dt + d\nu_i(t),$   (86)

with $z^N_i(0) = z_i(0)$, $y_i(0) = 0$, $1 \le i \le N$. Here $(w_i(t), \nu_i(t),\ 1 \le i \le N)$ are independent Brownian motions in $\mathbb{R}$. Similarly, define the following set of MV equations:

$dz^o_i(t) = f[t, z^o_i(t), u^o_i(t), \mu_t]\, dt + \sigma\, dw_i(t),$   (87)
$dy^o_i(t) = h(z^o_i(t))\, dt + d\nu_i(t),$   (88)

with the same initial conditions, where $\mu_t$ is the law of $z^o_i(t)$.
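The finite population system (85)-(86) is an interacting particle system whose empirical coupling approximates the decoupled MV dynamics (87)-(88) as $N$ grows. The Python sketch below simulates only the empirical-measure coupling in the state equation, under a hypothetical Lipschitz drift $f(t, z, u, \xi) = u - (z - \xi)$ and a fixed open-loop control; it illustrates the particle/mean-field structure, not the paper's filter-dependent best responses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, steps, sigma = 500, 200, 0.3
dt = 1.0 / steps
z = rng.normal(1.0, 0.5, size=N)     # initial states z_i(0) ~ F

def f(z_i, u, mean_field):
    # hypothetical drift: control plus attraction toward the coupling term
    return u - (z_i - mean_field)

for _ in range(steps):
    mf = z.mean()                    # empirical substitute for the coupling (1/N) sum_j
    z = z + f(z, 0.0, mf) * dt + sigma * np.sqrt(dt) * rng.normal(size=N)

# with u = 0 this drift preserves the population mean up to O(1/sqrt(N)) noise,
# the same order as the epsilon-Nash approximation error below
mf_final = z.mean()
```

The $O(1/\sqrt{N})$ fluctuation of the empirical mean around its limit is exactly the scale at which the $\epsilon$-Nash estimates of this section are stated.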
Recall that a unique solution exists to (85) when $u^N_i \in C_{\mathrm{Lip}(x)}([0,T] \times \mathbb{R}; U)$, and a unique consistent solution to (87) exists when $u^o_i \in C_{\mathrm{Lip}(p)}([0,T] \times E_k; U)$. Notice that, in contrast to the coupled processes in (85)-(86), the system in (87)-(88) gives $N$ decoupled pairs of processes. Let $\bar z(0) := \int_{\mathbb{R}} x\, dF(x)$ be the mean value of the initial states and define

$\epsilon_N := \Big| \int_{\mathbb{R}} x\, dF_N(x) - \bar z(0) \Big|,$   (89)

where $\lim_{N \to \infty} \epsilon_N = 0$. Consider the following $\sigma$-algebras:

$\mathcal{F}^{(y^o)}_t := \sigma\{ y^o_1(s), \dots, y^o_N(s);\ 0 \le s \le t \}, \qquad \mathcal{F}^{y^o_i}_t := \sigma\{ y^o_i(s);\ 0 \le s \le t \},$
$\mathcal{F}^{(y^N)}_t := \sigma\{ y^N_1(s), \dots, y^N_N(s);\ 0 \le s \le t \}, \qquad \mathcal{F}^{y^N_i}_t := \sigma\{ y^N_i(s);\ 0 \le s \le t \}.$   (90)

We now define the classes of admissible control policies. Let

$\mathcal{U}^o_i := \big\{ u(\cdot) : u(t) \text{ is adapted to } \mathcal{F}^{(y^o)}_t,\ u \in C_{\mathrm{Lip}(p^N)}([0,T] \times E^N_k; U) \text{ and } E \int_0^T |u(t)|^2\, dt < \infty \big\},$
$\mathcal{U}^{o,d}_i := \big\{ u(\cdot) : u(t) \text{ is adapted to } \mathcal{F}^{y^o_i}_t,\ u \in C_{\mathrm{Lip}(p)}([0,T] \times E_k; U) \text{ and } E \int_0^T |u(t)|^2\, dt < \infty \big\},$
$\mathcal{U}^N_i := \big\{ u(\cdot) : u(t) \text{ is adapted to } \mathcal{F}^{(y^N)}_t,\ u \in C_{\mathrm{Lip}(p^N)}([0,T] \times E^N_k; U) \text{ and } E \int_0^T |u(t)|^2\, dt < \infty \big\},$
$\mathcal{U}^{N,d}_i := \big\{ u(\cdot) : u(t) \text{ is adapted to } \mathcal{F}^{y^N_i}_t,\ u \in C_{\mathrm{Lip}(p)}([0,T] \times E_k; U) \text{ and } E \int_0^T |u(t)|^2\, dt < \infty \big\},$   (91)

where $1 \le i \le N$ and $C_{\mathrm{Lip}(p^N)}([0,T] \times E^N_k; U)$ denotes the space of $U$-valued continuous functions on $[0,T] \times E^N_k$ with Lipschitz coefficients in $E^N_k := \otimes_{j=1}^N E_{k,j}$, where for $1 \le j \le N$, $E_{k,j}$ is a copy of $E_k$.

In the above admissible control policies, $\mathcal{U}^o_i$ represents centralized information on the partially observed states in the infinite population game, whereas $\mathcal{U}^N_i$ represents centralized information on the partially observed states in the finite population game. On
the other hand, $\mathcal{U}^{o,d}_i$ represents decentralized control policies in the infinite population game, whereas $\mathcal{U}^{N,d}_i$ represents decentralized control policies in the finite population game.

For the dynamic game problem specified in (85), recall that the cost function for the $i$th agent is given as

$J^N_i(u^N_i, u^N_{-i}) = E \int_0^T \frac{1}{N} \sum_{j=1}^N L\big( z^N_i(t), u^N_i(t), z^N_j(t) \big)\, dt.$   (92)

Recall also the generic agent's PO SOCP:

Generic Agent's SOCP: For $0 \le t \le T$,
$dz^o(t) = f[t, z^o(t), u(t), \mu_t]\, dt + \sigma\, dw(t), \quad z^o(0) = z(0),$
$dy^o(t) = h(z^o(t))\, dt + d\nu(t), \quad y(0) = 0,$
$J(u) = E \int_0^T L[z^o(t), u(t), \mu_t]\, dt,$   (93)

where $J(u)$ is to be minimized over $\mathcal{U} := \{ u(\cdot) \in U : u(t) \text{ is } \mathcal{F}^{y^o}_t\text{-adapted and } E \int_0^T |u(t)|^2\, dt < \infty \}$.

The optimal control law for the above PO SOCP (and hence the best response control process for the MFG game) is characterized by (70), which we denote by $u^o(t)$. Recall that under (A6)-(A10), $u^o_i \in C_{\mathrm{Lip}(p)}([0,T] \times E_k; U) \subset \mathcal{U}^{o,d}_i$.

Definition [17]. Given $\epsilon > 0$, the set of admissible control laws $(u^o_1, \dots, u^o_N)$ generates an $\epsilon$-Nash equilibrium with respect to the costs $J^N_i$ if, for any $1 \le i \le N$,

$J^N_i\big( u^o_i; u^o_{-i} \big) - \epsilon \le \inf_{u_i \in \mathcal{U}^N_i} J^N_i\big( u_i; u^o_{-i} \big) \le J^N_i\big( u^o_i; u^o_{-i} \big).$   (94)

We now show that the MFG best responses for a finite $N$ population system (85)-(88) generate an $\epsilon$-Nash equilibrium with respect to the cost function defined in (92).

Theorem. Assume (A0)-(A12) hold and that there exists a unique solution to the MFG system such that the best response control process $u^o(t,p)$ is continuous in $(t,p)$ and Lipschitz continuous in $p$. Then $(u^o_1, u^o_2, \dots, u^o_N)$, where $u^o_i = u^o$, $1 \le i \le N$, generates an $O\big( \epsilon_N + 1/\sqrt{N} \big)$-Nash equilibrium with respect to the cost function (92), where $\lim_{N \to \infty} \epsilon_N = 0$.

Proof.
The proof involves several linked perturbation estimates involving the conditional density processes. We consider a strategy change for the first agent. Consider the following closed loop individual dynamics under the best response control policies $u^o_i = u^o$, $1 \le i \le N$, in the finite population:

$dz^{o,N}_i(t) = \frac{1}{N} \sum_{j=1}^N f\big( t, z^{o,N}_i(t), u^o(t, \tilde\varphi^{o,N}_i(t)), z^{o,N}_j(t) \big)\, dt + \sigma\, dw_i(t),$
$dy^{o,N}_i(t) = h\big( z^{o,N}_i(t) \big)\, dt + d\nu_i(t),$   (95)

where $z^{o,N}_i(0) = z_i(0)$, $y^{o,N}_i(0) = 0$, and $\tilde\varphi^{o,N}_i(t)$, $1 \le i \le N$, denote the associated unnormalized filtering processes. Similarly, consider the MV system

$dz^o_i(t) = f[t, z^o_i(t), u^o(t, \tilde\varphi^o_i(t)), \mu_t]\, dt + \sigma\, dw_i(t),$   (96)
$dy^o_i(t) = h(z^o_i(t))\, dt + d\nu_i(t),$   (97)

with the same initial conditions, where $\tilde\varphi^o_i(t)$ denotes the associated unnormalized filtering process. Following lines similar to the proof of Theorem 6 and using

$\big\| \tilde\varphi^{o,N}_i(t) - \tilde\varphi^o_i(t) \big\|_{E_k} \le C^R_f C_h \sup_{0 \le s \le t} \big| z^o_i(s) - z^{o,N}_i(s) \big|,$   (98)

we obtain, as a consequence of Gronwall's lemma,

$\sup_{1 \le j \le N} \sup_{0 \le t \le T} E \big| z^o_j(t) - z^{o,N}_j(t) \big| = O\big( 1/\sqrt{N} \big),$   (99)
where the right hand side depends on the terminal time $T$.

Assume now that, while each agent $j$, $2 \le j \le N$, is using the MFG best response control law $u^o(t,p)$, the first agent implements a strategy change from $u^o$ to $u_1 \in \mathcal{U}^N_1$, which yields

$dz^{o,N}_{1,c}(t) = \frac{1}{N} \sum_{j=1}^N f\big( t, z^{o,N}_{1,c}(t), u_1(t, \tilde\varphi^{o,N}_{N,c}(t)), z^{o,N}_{j,c}(t) \big)\, dt + \sigma\, dw_1(t),$
$dy^{o,N}_{1,c}(t) = h\big( z^{o,N}_{1,c}(t) \big)\, dt + d\nu_1(t),$   (100)

$dz^{o,N}_{i,c}(t) = \frac{1}{N} \sum_{j=1}^N f\big( t, z^{o,N}_{i,c}(t), u^o(t, \tilde\varphi^{o,N}_i(t)), z^{o,N}_{j,c}(t) \big)\, dt + \sigma\, dw_i(t),$
$dy^{o,N}_{i,c}(t) = h\big( z^{o,N}_{i,c}(t) \big)\, dt + d\nu_i(t),$   (101)

where the initial conditions are given by $z^{o,N}_{i,c}(0) = z_i(0)$, $y^{o,N}_{i,c}(0) = 0$, $\tilde\varphi^{o,N}_{i,c}(t)$, $1 \le i \le N$, denote the filtering processes, and $\tilde\varphi^{o,N}_{N,c}(t) := \big( \tilde\varphi^{o,N}_{1,c}(t), \dots, \tilde\varphi^{o,N}_{N,c}(t) \big)$. We also introduce the MV dynamics and their observations:

$d\hat z^o_1(t) = f\big[ t, \hat z^o_1(t), u_1(t, \tilde{\hat\varphi}^o_N(t)), \mu_t \big]\, dt + \sigma\, dw_1(t),$   (102)
$d\hat y^o_1(t) = h(\hat z^o_1(t))\, dt + d\nu_1(t),$   (103)
$d\hat z^o_i(t) = f\big[ t, \hat z^o_i(t), u^o(t, \tilde{\hat\varphi}^o_i(t)), \mu_t \big]\, dt + \sigma\, dw_i(t),$   (104)
$d\hat y^o_i(t) = h(\hat z^o_i(t))\, dt + d\nu_i(t),$   (105)

for $2 \le i \le N$, with the same initial conditions, where $\tilde{\hat\varphi}^o_i(t)$ denotes the unnormalized filtering process for the signal and observation pair $(\hat z^o_i(t), \hat y^o_i(t))$, $1 \le i \le N$. It now follows that

$\sup_{2 \le j \le N} \sup_{0 \le t \le T} E \big| z^{o,N}_{j,c}(t) - \hat z^o_j(t) \big| = O\big( 1/\sqrt{N} \big).$   (106)

Furthermore, by the robustness of the nonlinear filter, for $2 \le j \le N$ and $0 \le t \le T$,

$\big\| \tilde\varphi^{o,N}_j(t) - \tilde{\hat\varphi}^o_j(t) \big\|_{E_k} \le C^R_f C_h \sup_{0 \le s \le t} \big| z^{o,N}_{j,c}(s) - \hat z^o_j(s) \big|.$
(107)

Gronwall's lemma, (106) and (107) imply that

$\sup_{0 \le t \le T} E \big| z^{o,N}_{1,c}(t) - \hat z^o_1(t) \big| = O\big( 1/\sqrt{N} \big).$   (108)

Observe that (108) and the robustness of the filtering together imply that

$\sup_{0 \le t \le T} E \big\| \tilde{\hat\varphi}^o_1(t) - \tilde\varphi^{o,N}_{1,c}(t) \big\|_{E_k} = O\big( 1/\sqrt{N} \big).$   (109)

Consequently, under the modified strategy, we obtain

$J^N_1\big( u_1; u^o_{-1} \big) \equiv E \int_0^T \frac{1}{N} \sum_{j=1}^N L\big( z^{o,N}_{1,c}(t), u_1(t, \tilde\varphi^{o,N}_{N,c}(t)), z^{o,N}_{j,c}(t) \big)\, dt$   (110)
$\ge E \int_0^T \frac{1}{N} \sum_{j=1}^N L\big( \hat z^o_1(t), u_1(t, \tilde\varphi^{o,N}_{N,c}(t)), \hat z^o_j(t) \big)\, dt - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big)$   (111)
$\ge E \int_0^T \frac{1}{N} \sum_{j=1}^N L\Big( \hat z^o_1(t), u_1\big( t, \tilde\varphi^{o,N}_{1,c}(t), \tilde{\hat\varphi}^o_N(t) \big), \hat z^o_j(t) \Big)\, dt - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big)$   (112)
$\ge E \int_0^T \frac{1}{N} \sum_{j=1}^N L\Big( \hat z^o_1(t), u_1\big( t, \tilde{\hat\varphi}^o_1(t), \tilde{\hat\varphi}^o_N(t) \big), \hat z^o_j(t) \Big)\, dt - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big)$   (113)
$\ge E \int_0^T L\big[ \hat z^o_1(t), u_1\big( t, \tilde{\hat\varphi}^o_N(t) \big), \mu_t \big]\, dt - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big),$

where the inequalities follow from the estimates (99)-(109) and the Lipschitz continuity of $L$. Furthermore, by the construction of the generic agent's partially observed MFG system (93), we have

$E \int_0^T L\big[ \hat z^o_1(t), u_1\big( t, \tilde{\hat\varphi}^o_N(t) \big), \mu_t \big]\, dt \ge E \int_0^T L[z^o(t), u^o(t, \tilde\varphi^o(t)), \mu_t]\, dt.$   (114)

On the other hand, we have

$E \int_0^T L[z^o(t), u^o(t, \tilde\varphi^o(t)), \mu_t]\, dt \ge E \int_0^T \frac{1}{N} \sum_{j=1}^N L\big[ z^o_1(t), u^o(t, \tilde\varphi^o_1(t)), z^o_j(t) \big]\, dt - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big)$
$\ge E \int_0^T \frac{1}{N} \sum_{j=1}^N L\big[ z^{o,N}_1(t), u^o\big( t, \tilde\varphi^{o,N}_1(t) \big), z^{o,N}_j(t) \big]\, dt - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big)$
$\equiv J^N_1\big( u^o; u^o_{-1} \big) - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big).$
(115)

It now follows from (110)-(115) that

$J^N_1\big( u^o; u^o_{-1} \big) - O\Big( \epsilon_N + \frac{1}{\sqrt{N}} \Big) \le \inf_{u_1 \in \mathcal{U}^N_1} J^N_1\big( u_1; u^o_{-1} \big),$   (116)

which completes the proof for agent 1. The analysis for the other agents follows similarly.

6. An Explicit Example with Finite Dimensional Nonlinear Filters. In nonlinear filtering theory, initiated by the work [2], there has been considerable progress in representing a large class of nonlinear filters with finite dimensional quantities. Furthermore, in the case where these quantities are sufficient for the control, the infinite dimensional conditional density can be replaced by these finite dimensional sufficient statistics. Consequently, a rigorous proof of the verification theorem can be obtained for this finite dimensional, completely observed stochastic optimal control problem. The literature on such a theory is vast and we refer the reader to [9, Section I] for a succinct summary of the related works. In this section we consider such an explicit model whose nonlinear filters can be expressed with finite dimensional quantities and which hence yields a more tractable PO MFG system.

Consider the following MFG system, where $f$, $h$ and $L$ are motivated by [9, Section III.B-C] and given in the following form. Let $z^N_i(t) = [z^N_{i,1}(t), z^N_{i,2}(t)]^T$, $G_t := [G^1_t,\ G^2_t]^T$, $w_i(t) := (w_{i,1}(t), w_{i,2}(t))$ and

$dz^N_i(t) = \big[ g\big( t, z^N_{i,1}(t) \big),\ u^N_i(t, y_i) \big]^T dt + G_t\, dw_i(t),$
$dy_i(t) = H_t z^N_i(t)\, dt + N_t\, db_i(t),$
$z^N_i(0) = z_i(0), \quad y_i(0) = 0, \quad 1 \le i \le N,$   (117)

where (i) $z^N_{i,j}(t) : [0,T] \to \mathbb{R}$, $j \in \{1,2\}$, (ii) $(w_{i,1}(t), w_{i,2}(t), b_i(t);\ 0 \le t \le T)$ are independent Brownian motions in $\mathbb{R}$ which are also independent of $z_i(0)$, and (iii) $g$, $H_t$, $G_t$, $N_t$ and $h$ satisfy [9, Assumptions A2)-A6) and A9)].
We note that this example is a suitably specialized case of [9, Section III.B-C], where (A2)-(A4) and (A5) are satisfied by [9, Assumptions A2), A4), A9)]. The cost function is given by

$J^N_i(u^N_i; u^N_{-i}) := E \int_0^T \frac{1}{N} \sum_{j=1}^N l\big( t, z^N_i(t), u^N_i(t, y_i), z^N_j(t) \big)\, dt,$   (118)

where for any $z$, $l(\cdot, z)$ satisfies [9, Assumption A7]. Let $\hat{\mathcal{U}}_i := \{ u_i(\cdot) \in L^2_{\mathcal{F}^{y_i}_t}([0,T]; U) \}$. For the dynamics (117), we also consider the limiting system

$dz_i(t) = \big[ g\big( t, z_{i,1}(t) \big),\ u_i(t, y_i) \big]^T dt + G_t\, dw_i(t),$
$dy_i(t) = H_t z_i(t)\, dt + N_t\, db_i(t),$
$J_i := E \int_0^T l[t, z_i(t), u_i(t, y_i), \mu_t]\, dt,$
$z_i(0) = z_i(0), \quad y_i(0) = 0, \quad 1 \le i \le N,$   (119)

where we emphasize that the mean field enters only through the performance functions. Assume now that there exists a function $\phi(t,x) \in C^{1,2}_{t,x}([0,T] \times \mathbb{R})$ which satisfies

$\partial_t \phi(t,x) + \tfrac{1}{2} (G^1_t)^2 \partial_{xx} \phi(t,x) + \tfrac{1}{2} | G^1_t \partial_x \phi(t,x) |^2 = \tfrac{1}{2} \big( Q_t x^2 + 2 m_t x + \delta_t \big),$   (120)

where $Q$, $m$ and $\delta$ are arbitrary functions. Then, if for each $u \in \hat{\mathcal{U}}$ the following conditions are satisfied:

(E1) the random variable $z_i(0)$, $1 \le i \le N$, has density

$q_i(z_i) = \frac{ \exp\big( -\tfrac{1}{2} (z_i - \xi)^T P_0^{-1} (z_i - \xi) \big) }{ (2\pi)^{n/2} |P_0|^{1/2} }, \quad P_0 > 0,$   (121)

(E2) the drift satisfies

$g(t,x) = (G^1_t)^2\, \partial_x \phi(t,x),$   (122)

the system defined by (119) has an information state given by

$q^i_t = \exp\big( \phi(x,t) + \lambda_t \big)\, \frac{ \exp\big( -\tfrac{1}{2} (z_i - r_t)^T P_t^{-1} (z_i - r_t) \big) }{ (2\pi)^{n/2} |P_t|^{1/2} },$   (123)

where $P_t > 0$ for all $t \in [0,T]$, and $\lambda_t \in \mathbb{R}$ and $r_t \in \mathbb{R}$ are given by:

$dr_t = (P_t Q_t) r_t\, dt - P_t m_t\, dt + u_i(t)\, dt, \quad r(0) = \xi,$   (124)
$\frac{dP_t}{dt} = -P_t Q_t P_t + G_t G_t^T, \quad P(0) = P_0,$   (125)
$\lambda_t = \frac{1}{2} \int_0^t \big( Q_s r_s^2 + 2 m_s r_s + \delta_s + \mathrm{tr}(P_s Q_s) \big)\, ds.$
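The sufficient statistics $(r_t, P_t)$ of the information state evolve by the ODE pair (124)-(125), so the filter reduces to finite dimensional quantities. The Python sketch below integrates a scalar instance with hypothetical constant coefficients ($Q_t \equiv Q$, $G_t \equiv G$, $m_t \equiv 0$, $u_i \equiv 0$), assumed only for illustration.

```python
Q, G = 1.0, 1.0          # hypothetical constant coefficients
dt, T = 1e-3, 2.0
P, r = 2.0, 1.5          # P(0) = P0 > 0, r(0) = xi
for _ in range(int(T / dt)):
    r += (P * Q * r) * dt            # dr_t = (P_t Q_t) r_t dt - P_t m_t dt + u_i dt
    P += (-P * Q * P + G * G) * dt   # dP_t/dt = -P_t Q_t P_t + G_t G_t^T
# the Riccati flow settles near the positive root of -P^2 Q + G^2 = 0, i.e. P = 1
```

The point of the example is precisely this reduction: instead of propagating the full unnormalized density, the generic agent propagates $(r_t, P_t, \lambda_t)$ by ordinary (stochastic) differential equations.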
For the partially observed MFG problem defined by (117)-(118), the information state $q_t$ in (123) for the generic agent is given in a finite dimensional form if the PDE in (120) can be shown to admit an explicit solution. This is discussed in [9, Theorem 3.9], where it is shown that the solution is given by

$\phi(t,x) = \log \Gamma(t,x), \qquad \Gamma(t,x) = \tfrac{1}{2} \Delta_t x^2 + x \varsigma_t + \eta_t,$   (126)

where $\Delta(\cdot)$, $\varsigma(\cdot)$, $\eta(\cdot)$ are given in the statement of that theorem, which sets the function $g(t,x)$ to be

$g(t,x) = \frac{ (G^1_t)^2 }{ \Gamma(t,x) } \big( \Delta_t x + \varsigma_t \big).$   (127)

Consequently, by (117)-(127), we obtain a PO MFG model whose filtering equation has a finite dimensional solution.

Remark. Consider the following filtering problem:

$dz(t) = f(z(t))\, dt + dw(t),$   (128)
$y(t) = \int_0^t z(s)\, ds + \nu(t).$   (129)

It is shown in [2] that the unnormalized conditional density $\rho(t,x)$ of $P(z(t) \,|\, \mathcal{F}^y_t)$ is given in terms of ten sufficient statistics when $f$ satisfies the condition $f' + f^2 = ax^2 + bx + c$, $a \ge 0$. The model defined in (117) is a generalization of this result to the case where the control enters the dynamics, which is in general the case in MFG.

By the Separation Theorem discussed in Section 3.3, the optimal control process for a generic agent is given by the minimizer of the Hamiltonian defined earlier. First, we define the forward and backward operators as

$O^a_t := \tfrac{1}{2} \nabla_{xx} + (g, a)^T \nabla_x, \qquad O^{*a}_t := \tfrac{1}{2} \nabla_{xx} - (g, a)^T \nabla_x - \mathrm{tr}[\nabla_x (g, a)].$   (130)

Consequently, one can write the optimal control in the form

$u^* = \{ u^*(t,p) = a^*(T - t, q_t);\ 0 \le t \le T \},$
$a^*(\tau, p) = \arg\min_{a \in U} \big\{ \big( O^a_{T-\tau} D V(\tau, p)(\cdot), p \big) + \big( l[T-\tau, \cdot, a, \mu_{T-\tau}], p \big) \big\}.$   (131)

Notice that the control law given in the separated form $a^*(T - t, q_t)$ depends on $q_t$, which has the explicit solution given by (123)-(126) and hence admits a finite dimensional representation.
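When the drift does not have the Beneš-type structure above, no finite dimensional filter is available and the unnormalized density must be propagated numerically. The following Python sketch runs a grid-discretized Zakai-type splitting scheme: an explicit finite-difference Fokker-Planck prediction step followed by multiplication with the likelihood increment $\exp(h(x)\,\Delta y - \tfrac{1}{2} h(x)^2\, \Delta t)$. The scalar model ($f(z) = -z$, $h(z) = z$) and all step sizes are hypothetical choices for illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-4.0, 4.0, 81)
dx, dt, steps = x[1] - x[0], 0.005, 200
f = lambda z: -z                     # hypothetical signal drift
h = lambda z: z                      # hypothetical observation function

z = 1.0                              # true hidden state
phi = np.exp(-0.5 * (x - 1.0) ** 2)  # unnormalized initial density
for _ in range(steps):
    z += f(z) * dt + np.sqrt(dt) * rng.normal()
    dy = h(z) * dt + np.sqrt(dt) * rng.normal()
    # prediction: explicit finite-difference Fokker-Planck step
    phi = phi + (-np.gradient(f(x) * phi, dx)
                 + 0.5 * np.gradient(np.gradient(phi, dx), dx)) * dt
    # correction: Zakai-type multiplication by the likelihood increment
    phi *= np.exp(h(x) * dy - 0.5 * h(x) ** 2 * dt)

cond_mean = (x * phi).sum() / phi.sum()   # normalized conditional mean
```

Note that `phi` is normalized only at the very end, mirroring the fact that the Zakai equation propagates the unnormalized density $\tilde\varphi$ linearly in the observation path.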
In other words, the best response process of the agents in the above partially observed MFG example, which is $\mathcal{F}_t^y$-adapted, can be written as a function of $\big(G_t, g, y_t, H_t, \phi(t), P_0, \xi, Q_t, C_t^{-1}, m_t\big)$.

Finally, we remark that one can follow the rest of the analysis in Section 4.1 in order to obtain a sufficient condition under which the PO MFG system admits a unique solution. Alternatively, one can employ the models considered in [9, Section IV] and obtain an equivalent LQG MFG system for such nonlinear models. Such a path is currently under investigation and we will report further details in future work.

7. Comparison with MFG with a Partially Observed Major Agent. In the nonlinear MFG setup where there is a major agent, the best response control policy of each minor agent depends on the state of the major agent in addition to the mean field, which is stochastic; the mean field is adapted to the filtration generated by the Brownian motion of the major agent [24] (see also [13] for the LQG case). More explicitly, consider the following stochastic coefficient McKean-Vlasov (SMV) type dynamics:
$$dz_0(t) = f_0[t, z_0(t), u_0(t, \omega, z_0), \mu_t(\omega)]\, dt + \sigma_0[t, z_0(t), \mu_t(\omega)]\, dw_0(t, \omega), \tag{132}$$
$$dz(t) = f[t, z(t), u(t, \omega, z), \mu_t(\omega)]\, dt + \sigma[t, z(t), \mu_t(\omega)]\, dw(t), \tag{133}$$
with given initial conditions $z_0(0)$ and $z(0)$, where $\big(\mu_t(\omega)\big)_{0 \le t \le T}$ satisfies $P(z(t) \le \alpha \,|\, \mathcal{F}_t^{w_0}) = \int_{-\infty}^{\alpha} \mu_t(\omega, dx)$ for all $\alpha \in \mathbb{R}^n$. The SMV SDEs in (132) and (133) represent the closed-loop state dynamics (with $\mathcal{F}_t^{w_0} := \sigma\{w_0(s) : 0 \le s \le t\}$-adapted feedback control) of the major and the generic minor agents, respectively, in the infinite population limit. Let $U_0 := \{u(\cdot) \in U : u \text{ is adapted to } \mathcal{F}_t^{w_0} \text{ and } E\int_0^T |u(t)|^2\, dt < \infty\}$ and $U_i := \{u(\cdot) \in U : u \text{ is adapted to } \mathcal{F}_t^{w_0, w_i} \text{ and } E\int_0^T |u(t)|^2\, dt < \infty\}$.
Then we define two SOCPs as follows:

Major Agent's SOCP at Infinite Population:
$$dz_0(t) = f_0[t, z_0(t), u_0(t), \mu_t(\omega)]\, dt + \sigma_0[t, z_0(t), \mu_t(\omega)]\, dw_0(t),$$
$$J_0(u_0) = E \int_0^T L_0[z_0(t), u_0(t), \mu_t(\omega)]\, dt.$$

Generic Minor Agent's SOCP at Infinite Population:
$$dz_i(t) = f[t, z_i(t), u(t), \mu_t(\omega)]\, dt + \sigma[t, z_i(t), \mu_t(\omega)]\, dw_i(t),$$
$$J_i(u) = E \int_0^T L[z_i(t), z_0(t), u(t), \mu_t(\omega)]\, dt,$$
where $J_0$ is to be minimized over $U_0$, $J_i$ is to be minimized over $U_i$, $(w_i(t);\ 1 \le i \le N)$ are $N$ independent Brownian motions, and $u_0(t)$ and $u(t)$ are $\mathcal{F}_t^{w_0}$-adapted best response control processes (i.e., the unique solutions satisfying [24, (5.14)-(5.19)]). The set of control laws $u_0^N = u_0$, $u_i^N = u$, $i = 1, \dots, N$, when applied to a finite population game, generates an ǫ-Nash equilibrium [24, Theorem 7.2]. One essential point in this result is that the state of the major agent and the stochastic measure induced by the generic minor agent, $(z_0(t, \omega), \mu_t(\omega))_{0 \le t \le T}$, are completely observed by the minor agents. It is worth remarking that the solutions to these two SOCPs are given by certain backward SPDEs (BSPDEs) (see [10, Theorem 4]). This nonstandard feature is due to the fact that the dynamics and the cost functions have explicit dependence on the underlying probability space, i.e., each of $f_0$, $f$, $L_0$ and $L$ is $\mathcal{F}_t^{w_0}$-adapted. Such problems are referred to as SOCPs with random parameters, which were introduced in [25].

Motivated by the observation that the best response control policies depend upon the state of the major agent, the corresponding PO MM-MFG is examined in [10], where it is assumed that the minor agents have partial observations on the major agent's state in a distributed manner, while each agent has complete observation of its own state.

8. Conclusions.
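The stochastic mean field $\mu_t(\omega)$ above, being adapted to $\mathcal{F}_t^{w_0}$, can be visualized by Monte Carlo: freeze one major-agent noise path $w_0$ and average many minor agents driven by independent $w_i$. The sketch below (the linear drifts and all coefficient values are illustrative assumptions, not the source's model) checks that the empirical mean of the minor population tracks the conditional mean $m_t = E[z_i(t) \mid \mathcal{F}_t^{w_0}]$, which for this linear example solves $dm = (z_0 - m)\,dt$ along the same $w_0$ path:

```python
import random

random.seed(0)

def simulate(N=2000, T=1.0, dt=0.01, sigma=0.5):
    """One common w0 path; N minors with dz_i = (z0 - z_i) dt + sigma dw_i.

    Returns (empirical mean of the minors, conditional mean m_T) at time T.
    """
    z0 = 1.0            # major agent state, driven by the common noise w0
    z = [0.0] * N       # minor agent states, driven by idiosyncratic noises
    m = 0.0             # conditional mean E[z_i(t) | F^{w0}] for this w0 path
    for _ in range(int(T / dt)):
        dw0 = random.gauss(0.0, dt ** 0.5)
        z0 += -0.5 * z0 * dt + dw0                      # major dynamics
        z = [zi + (z0 - zi) * dt
             + sigma * random.gauss(0.0, dt ** 0.5) for zi in z]
        m += (z0 - m) * dt                              # conditional-mean ODE
    return sum(z) / N, m

emp, cond = simulate()
print(abs(emp - cond))  # O(1/sqrt(N)) fluctuation around the conditional mean
```

Across different $w_0$ paths the pair $(z_0, m_t)$ changes, reflecting the randomness of $\mu_t(\omega)$, while for each fixed path the population average concentrates on $m_t$ as $N \to \infty$.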
In this work we consider an MFG model where the agents have nonlinear dynamics and cost functions and have noisy measurements of their individual state dynamics. By constructing the associated completely observed model via an application of nonlinear filtering theory and the Separation Principle, a control problem with an infinite-dimensional state space is formulated and its solution is characterized via a solution to the HJB equation on this function space. The optimal control law obtained from the solution of the HJB equation is then applied by the agents in the infinite population limit. We show that the aggregate behaviour of the population under such policies collectively generates the mean field, and we then establish the ǫ-Nash property of such solutions.

9. Acknowledgement. The authors wish to thank the referees for their valuable comments, which greatly improved the quality of the manuscript.

REFERENCES

[1] A. Bain and D. Crisan, Fundamentals of Stochastic Filtering, Springer-Verlag, New York, NY, 2009.
[2] V. E. Beneš, Exact finite-dimensional filters for certain diffusions with nonlinear drift, Stochastics, 5 (1981), pp. 65–92.
[3] V. E. Beneš and I. Karatzas, On the relation of Zakai's and Mortensen's equations, SIAM J. Control Optim., 21 (1983), pp. 472–489.
[4] A. Bensoussan, Stochastic Control of Partially Observable Systems, Cambridge University Press, Cambridge, United Kingdom, 1992.
[5] A. Bensoussan, M. Chau, and P. Yam, Mean field games with a dominating player, Appl. Math. Optim., (2015), pp. 1–38.
[6] P. E. Caines and A. Kizilkale, Epsilon-Nash equilibria for partially observed LQG mean field games with major player, IEEE Trans. Autom. Control, 62 (2017), pp. 3225–3234.
[7] R. Carmona and F. Delarue, Probabilistic Theory of Mean Field Games with Applications I-II, Springer International Publishing, New York, NY, 2018.
[8] R. Carmona and X. Zhu, A probabilistic approach to mean field games with major and minor players, arXiv:1409.7141, (2015).
[9] C. D.
Charalambous and R. J. Elliott, Certain nonlinear partially observable stochastic optimal control problems with explicit control laws equivalent to LEQG/LQG problems, IEEE Trans. Autom. Control, 42 (1997), pp. 482–497.
[10] N. Şen and P. E. Caines, Mean field game theory with a partially observed major agent, SIAM J. Control Optim., 54 (2016), pp. 3174–3224.
[11] D. Firoozi and P. E. Caines, Epsilon-Nash equilibria for partially observed LQG mean field games with major agent: Partial observations by all agents, in Proc. 54th IEEE Conf. Dec. Cont., Osaka, Japan, 2015, pp. 4430–4437.
[12] W. H. Fleming, Nonlinear semigroup for controlled partially observed diffusions, SIAM J. Control Optim., 20 (1982), pp. 286–301.
[13] M. Huang, Large-population LQG games involving a major player: The Nash certainty equivalence principle, SIAM J. Control Optim., 48 (2010), pp. 3318–3353.
[14] M. Huang, P. E. Caines, and R. P. Malhamé, Individual and mass behaviour in large population stochastic wireless power control problems: Centralized and Nash equilibrium solutions, in Proc. 42nd IEEE Conf. Dec. Cont., Maui, HI, 2003, pp. 98–103.
[15] M. Huang, P. E. Caines, and R. P. Malhamé, Distributed multi-agent decision-making with partial observations: Asymptotic Nash equilibria, in Proc. 17th Intern. Symp. Math. Theo. Net. Syst., Kyoto, Japan, 2006, pp. 2725–2730.
[16] M. Huang, P. E. Caines, and R. P. Malhamé, Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behaviour and decentralized ǫ-Nash equilibria, IEEE Trans. Autom. Control, 52 (2007), pp. 1560–1571.
[17] M. Huang, R. P. Malhamé, and P. E. Caines, Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle, Commun. Inf. Syst., 6 (2006), pp. 221–252.
[18] M. Huang, R. P. Malhamé, and P. E.
Caines, Nash certainty equivalence in large population stochastic dynamic games: Connections with the physics of interacting particle systems, in Proc. 45th IEEE Conf. Dec. Cont., San Diego, CA, 2006, pp. 4921–4926.
[19] G. Kallianpur, Stochastic Filtering Theory, Springer-Verlag, New York, NY, 1980.
[20] V. N. Kolokoltsov, J. Li, and W. Yang, Mean field games and nonlinear Markov processes, arXiv:1112.3744, (2011).
[21] J.-M. Lasry and P.-L. Lions, Jeux à champ moyen. I - Le cas stationnaire, C. R. Math. Acad. Sci. Paris, 343 (2006), pp. 619–625.
[22] J.-M. Lasry and P.-L. Lions, Jeux à champ moyen. II - Horizon fini et contrôle optimal, C. R. Math. Acad. Sci. Paris, 343 (2006), pp. 679–684.
[23] J.-M. Lasry and P.-L. Lions, Mean field games, Japan J. Math., 2 (2007), pp. 229–260.
[24] M. Nourian and P. E. Caines, ǫ-Nash mean field game theory for nonlinear stochastic dynamical systems with major and minor agents, SIAM J. Control Optim., 51 (2013), pp. 3302–3331.
[25] S. Peng, Stochastic Hamilton-Jacobi-Bellman equations, SIAM J. Control Optim., 30 (1992), pp. 284–304.
[26] N. Şen and P. E. Caines, Nonlinear filtering theory for McKean-Vlasov type stochastic differential equations, SIAM J. Control Optim., 54 (2016), pp. 153–174.
[27] A. S. Sznitman, Topics in propagation of chaos, in École d'Été de Probabilités de Saint-Flour XIX-1989, P.-L. Hennequin, ed., Springer-Verlag, Berlin, Germany, 1991, pp. 165–251.
[28]