Mean-Field Games with Differing Beliefs for Algorithmic Trading

Abstract

Even when confronted with the same data, agents often disagree on a model of the real world. Here, we address the question of how interacting heterogeneous agents, who disagree on what model the real world follows, optimize their trading actions. The market has latent factors that drive prices, and agents account for the permanent impact they have on prices. This leads to a large stochastic game, where each agent's performance criteria are computed under a different probability measure. We analyse the mean-field game (MFG) limit of the stochastic game and show that the Nash equilibrium is given by the solution to a non-standard vector-valued forward-backward stochastic differential equation. Under some mild assumptions, we construct the solution in terms of expectations of the filtered states. Furthermore, we prove the MFG strategy forms an ϵ-Nash equilibrium for the finite player game. Lastly, we present a least-squares Monte Carlo based algorithm for computing the equilibria and show through simulations that increasing disagreement may increase price volatility and trading activity.

Forthcoming in Mathematical Finance

Philippe Casgrain, Sebastian Jaimungal

Department of Statistical Sciences, University of Toronto

1. Introduction

Financial markets are immensely complicated dynamic systems which incorporate the interactions of millions of individuals on a daily basis. Market participants vary immensely, both in terms of their trading objectives and in their beliefs on the assets they are trading. All of these participants compete with one another in an attempt to achieve their own personal objectives in the most efficient way possible. Traded assets may also be driven by latent factors, and agents must dynamically incorporate data into their trading decisions.

In this paper, we propose a game theoretic model in which a large population of heterogeneous agents all trade the same asset. This model considers heterogeneity not only from the point of view of an individual's trading objectives and risk appetite, but also from the point of view of each agent's beliefs regarding the performance of the asset they are trading. We pay particular attention to the information each agent is privy to, in an attempt to render the framework as realistic as possible, while maintaining analytical tractability. We study the equilibrium of these markets by using the theory of mean-field games (MFGs), which studies the system as the number of participating agents becomes arbitrarily large.

The general theory of mean-field games already has a large body of research associated with it. The original works stem from Huang et al. (2006), Huang et al. (2007), and Lasry and Lions (2007). Among the many extensions and generalizations which explore the broad theory of MFGs as well as their applications, we highlight the following works: Huang (2010) and Nourian and Caines (2013), who investigate MFGs with combinations of major and minor agents, Carmona and Delarue (2013), who develop a probabilistic analysis of MFGs, as well as the works of Cirant (2015) and Bensoussan et al. (2018), who introduce MFGs with heterogeneous populations of agents. This theory has seen applications in various financial contexts, such as Guéant et al. (2011), who explore various applications of MFGs in economics, Carmona et al. (2013) and Huang and Jaimungal (2017), who study systemic risk, Huang et al. (2019), who study algorithmic trading in the presence of a major agent and a population of minor agents, Cardaliaguet and Lehalle (2016), who investigate optimal execution, and Firoozi and Caines (2015) and Firoozi and Caines (2016), who look at MFGs with partial information on states and apply them to algorithmic trading.

Other works that study differing beliefs of market participants include Bayraktar and Munk (2017), who study a system where agents believe the asset price is an arithmetic Brownian motion with a latent (constant) drift, and agents disagree on the prior distribution of this latent drift, as well as on the temporary and permanent impact trading has on prices. The authors do not seek an equilibrium, but rather look at how the differences in belief may cause mini-flash crashes. Bouchard et al. (2018) study a model where agents, with differing risk aversion and who receive random endowments, trade assets whose drifts are determined in equilibrium. Under certain assumptions, the equilibrium results in asset prices having a permanent price impact component due to the existence of trading costs (temporary price impact). Choi et al. (2018) study how traders who penalize deviations from a target strategy, and have their own private information, form an equilibrium.

In contrast to other work on MFGs, as well as its specific application to algorithmic trading, here, motivated by Casgrain and Jaimungal (2016), we include latent states so that agents do not have full information about the system dynamics. Furthermore, motivated by Firoozi and Caines (2016) and Casgrain and Jaimungal (2018), who study a stochastic game with latent factors where agents have the same model beliefs, here, we study how varying beliefs among the agents affect the optimal trading behaviour. In our model, we express the belief of agents as a probability measure on the dynamics of the asset price process and of any latent processes that may be driving them. As far as the authors are aware, this is the first time that MFGs with varying beliefs have been treated in the literature. This generalization is quite non-trivial; nonetheless, we succeed in characterizing the model equilibrium as the solution to a non-standard forward-backward stochastic differential equation (FBSDE) defined across the collection of belief measures. We are able to present a closed form representation for the solution of the MFG, and it incorporates all of the market's differing beliefs into the decisions of the individual agents.

Our key result is that the optimal mean-field trading rate $\boldsymbol{\nu}^*_t$ for the collection of sub-populations (within which agents have the same belief) can be written as

$$\boldsymbol{\nu}^*_t = \boldsymbol{g}_{1,t} + \boldsymbol{g}_{2,t}\, \bar{\boldsymbol{q}}^{\boldsymbol{\nu}^*}_t ,$$

where $\boldsymbol{g}_{2,t}$ is a deterministic matrix-valued process, $\boldsymbol{g}_{1,t}$ is stochastic and encodes the various beliefs of the agents and their expectations of the future dynamics of the asset price, and $\bar{\boldsymbol{q}}^{\boldsymbol{\nu}^*}_t$ is the vector of corresponding mean-field inventories. Moreover, the individual agents' trading rates within a sub-population $k$ can be written as

$$\nu^{j,*}_t = \nu^{k,*}_t + \frac{1}{2 a_k}\, h^k_{2,t} \left( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\nu^{k,*}}_t \right),$$

where $h^k_{2,t}$ is a deterministic function of time. Hence, agents speed up or slow down relative to the mean-field trading rate depending on whether their current inventory is above or below their sub-population's mean-field inventory. The model setup does not penalize deviations from the mean-field, yet agents tend to revert to the mean-field; this is in contrast to many MFG formulations where the deviation from the mean-field is explicitly penalized.

We structure the remainder of the paper as follows. Section 2 introduces the market model and the stochastic game that agents participate in. Section 3 begins by introducing the MFG limit of the stochastic game and then proves the collection of optimal strategies in the MFG may be represented as the solution to a system of coupled FBSDEs. Next, we solve the system of FBSDEs, and find the mean-field strategy and each individual agent's strategy. Section 5 provides a specific example of a model where the assumptions in the key results are satisfied. In Section 4 we prove the solution to the MFG satisfies the ϵ-Nash equilibrium property in the finite population game. Lastly, Section 7 provides a least-squares Monte Carlo approach to computing certain expectations, as well as simulated examples of a market model with agents having differing beliefs.

Acknowledgements. SJ would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference numbers RGPIN-2018-05705 and RGPAS-2018-522715. Data sharing is not applicable to this article as no new data were created or analyzed in this study.

2. The Model

In this section, we provide the market model and the participating agents' performance criteria. Our model closely resembles the model for the stochastic game in Casgrain and Jaimungal (2018). The stochastic game here aims to characterize a population of agents with several sources of heterogeneity. As in Casgrain and Jaimungal (2018), here, agents have varying trading objectives. In addition, however, agents are also characterized by their beliefs regarding the model driving the asset price process. In the remainder of this section, we present the trading mechanics which each of the agents use to interact with the market, as well as the objectives each of the agents seek to achieve with their actions.

The market consists of a population of $N$ agents, indexed by $j \in \mathfrak{N} := \{1, \dots, N\}$. The total population of agents is divided into $K \in \{1, \dots, N\}$ disjoint sub-populations, which are indexed by $k \in \mathfrak{K} := \{1, \dots, K\}$. $K$ is assumed to be constant and independent of $N$. All agents within a fixed sub-population have homogeneous beliefs and performance criteria. The set

$$\mathcal{K}^{(N)}_k := \{ j \in \mathfrak{N} : j \text{ is in sub-population } k \}, \quad \forall\, k \in \mathfrak{K}, \tag{1}$$

denotes the set of agents within sub-population $k$, and the superscript $(N)$ indicates the explicit dependence on the total number of agents. We also define $N^{(N)}_k := |\mathcal{K}^{(N)}_k|$ to be the total number of agents within sub-population $k$. We further assume the number of agents contained in each of the sub-populations remains stable as we take the population limit to infinity. More specifically, we require that the proportion of agents contained within sub-population $k$ satisfies

$$\lim_{N \to \infty} p^{(N)}_k = p_k \in (0, 1), \quad \text{where} \quad p^{(N)}_k = \frac{N^{(N)}_k}{N}. \tag{2}$$

We work on the filtered probability space $(\Omega, \mathcal{G} = \{\mathcal{G}_t\}_{t \in [0,T]}, \mathbb{P})$ completed by the null sets of $\mathbb{P}$, where $T \in (0, \infty)$ is some fixed time horizon. All processes defined in the remainder of this section are $\mathcal{G}$-adapted, unless otherwise specified, and the notation $\mathbb{E}^{\mathbb{P}}[\,\cdot\,]$ represents expectation with respect to the measure $\mathbb{P}$.

All agents have the ability to buy and sell the asset over the fixed trading period $[0, T]$, after which all trading activity comes to a halt. Each agent $j \in \mathfrak{N}$ controls the amount they wish to purchase or sell at a continuous rate denoted $\nu^j = (\nu^j_t)_{t \in [0,T]}$, where $\nu^j_t > 0$ ($\nu^j_t < 0$) indicates the rate of buy (sell) orders the agent sends to the market. At the start of the trading period, each agent holds a random amount $Q^j$ of the asset. This may be interpreted as each agent having private information about their own holdings, whereas other market participants know only its distribution. Agents keep track of their holdings in the traded asset with the inventory process $q^{j,\nu^j} = (q^{j,\nu^j}_t)_{t \in [0,T]}$, where the superscript indicates the explicit dependence on the agent's controlled rate of trade. The relationship between agent-$j$'s trading rate and their inventory process is

$$q^{j,\nu^j}_t = Q^j + \int_0^t \nu^j_u \, du, \tag{3}$$

and may be interpreted as each agent buying or selling an amount $\epsilon\, \nu^j_t$ in each small time interval $[t, t + \epsilon)$.

Assumption 2.1. We make the technical assumption that the initial inventory holdings of all agents have a bounded variance, so that there exists $0 < C < \infty$ for which $\mathbb{E}^{\mathbb{P}}[(Q^j)^2] < C$ for all $j \in \mathfrak{N}$. Moreover, we assume that the mean of the starting inventory levels is the same within a given sub-population, so that $\mathbb{E}^{\mathbb{P}}[Q^j] = m_k$ for each $j \in \mathcal{K}^{(N)}_k$.

Buying and selling actions of agents impact the price of the traded asset in a manner to be specified below. As well, agents believe the asset midprice follows (potentially) different models. We incorporate differing beliefs into our model by assigning a probability measure $\mathbb{P}^k$ to each sub-population $k \in \mathfrak{K}$. The various measures correspond to the model that agents in a particular sub-population believe to represent the true dynamics of the asset price.

We define the asset price process $S^{\nu^{(N)}} = (S^{\nu^{(N)}}_t)_{t \in [0,T]}$, where the superscript $\nu^{(N)} = (\nu^j)_{j \in \mathfrak{N}}$ indicates the dependence of the price on the actions of all agents in the market. It is useful to define the average trading rate $\nu^{k,(N)} = (\nu^{k,(N)}_t)_{t \in [0,T]}$ of all agents within sub-population $k$ as

$$\nu^{k,(N)}_t = \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \nu^j_t. \tag{4}$$

Each agent in sub-population $k$ then believes the asset price process follows the dynamics

$$S^{\nu^{(N)}}_t = S_0 + \int_0^t \Big\{ A^k_u + \sum_{k' \in \mathfrak{K}} \lambda_{k,k'}\, p^{(N)}_{k'}\, \nu^{k',(N)}_u \Big\} \, du + M^k_t, \tag{5}$$

where for each $k \in \mathfrak{K}$, $A^k = (A^k_t)_{t \in [0,T]}$ is a $\mathcal{G}$-predictable process, $M^k = (M^k_t)_{t \in [0,T]}$ is a $\mathcal{G}$-adapted $\mathbb{P}^k$-martingale, and $\lambda_{k,k'} > 0$, $\forall\, k, k' \in \mathfrak{K}$, are constants. We also assume the initial inventory holdings of the agents, $\{Q^j\}_{j \in \mathfrak{N}}$, are all independent of both $\{A^k\}_{k \in \mathfrak{K}}$ and $\{M^k\}_{k \in \mathfrak{K}}$ in each measure $\mathbb{P}^k$. The measure $\mathbb{P}^k$ effectively specifies sub-population $k$'s asset price model through the processes $A^k$ and $M^k$, as well as the scale of the market impact of each sub-population, through the set of constants $\{\lambda_{k,k'}\}_{k' \in \mathfrak{K}}$.

Assumption 2.2. We make the technical assumptions that $A^k \in \mathbb{H}^{2,k}_T$ and $M^k \in \mathbb{L}^{2,k}_T$, where

$$\mathbb{H}^{2,k}_T = \Big\{ f : \Omega \times [0,T] \to \mathbb{R} \,:\, \mathbb{E}^{\mathbb{P}^k}\Big[ \int_0^T \| f_t \|^2 \, dt \Big] < \infty \Big\} \quad \text{and} \tag{6a}$$

$$\mathbb{L}^{2,k}_T = \Big\{ f : \Omega \times [0,T] \to \mathbb{R} \,:\, \mathbb{E}^{\mathbb{P}^k}\big[ \| f_t \|^2 \big] < \infty, \ \forall\, t \in [0,T] \Big\}, \tag{6b}$$

for each $k \in \mathfrak{K}$ and where $\|\cdot\|$ represents the Euclidean norm.

Assumption 2.3. We assume that $\mathbb{P}^k \sim \mathbb{P}$ for all $k \in \mathfrak{K}$ and that the law of $Q^j$ under each measure $\mathbb{P}^k$ is the same as that under the measure $\mathbb{P}$.

Assumption 2.4. We assume that for each $k \in \mathfrak{K}$, $A^k$ and $M^k$ are uncontrolled, i.e., are unaffected by the agents' actions.

Our asset price process model does not require explicitly specifying $A^k$ and/or $M^k$ in advance. Rather, we only require the integrability conditions in (6), hence there is great flexibility in the class of models our approach accommodates. For example, $A^k$ and $M^k$ can be discontinuous or non-Markov, as long as they satisfy the appropriate integrability conditions. The key assumption we make is that price impact from the order-flow of all agents' trading is linear.

Remark 2.5. Agents do not change their assigned belief measure, even after observing data from $(S^{\nu^{(N)}}_t)_{t \in [0,T]}$; however, their prior assumptions are updated to posterior estimates as time flows.

Each agent tracks their total accumulated cash process $X^{j,\nu^j} = (X^{j,\nu^j}_t)_{t \in [0,T]}$ throughout the trading period. When buying and selling the asset, each agent pays an instantaneous cost that is linearly proportional to the amount of shares transacted. This cost is expressed through the controlled dynamics of the cash process. For an agent $j \in \mathcal{K}^{(N)}_k$, their corresponding cash process is

$$X^{j,\nu^j}_t = X^j - \int_0^t \Big( S^{\nu^{(N)}}_u + a_k\, \nu^j_u \Big)\, \nu^j_u \, du, \tag{7}$$

where $a_k > 0$ is a constant for sub-population $k$ that sets the scale of the instantaneous cost. The penalty $-a_k \int_0^t (\nu^j_u)^2 \, du$ may be interpreted as a cost of trading too quickly, or a tractable proxy for the cost of crossing the bid-ask spread, as in Bouchard et al. (2018). It is also straightforward to include the influence of other agents in this cost, i.e., replace $S^{\nu^{(N)}}_u + a_k \nu^j_u$ by $S^{\nu^{(N)}}_u + \sum_{k \in \mathfrak{K}} a_k\, p^{(N)}_k\, \nu^{k,(N)}_u$.

In this market model, agents have restricted information over the course of the trading period. More specifically, agents have access only to the information generated by the paths of the asset price process $S^{\nu^{(N)}}$, their own inventory process $q^{j,\nu^j}$, and the average order flow of each sub-population, $(\nu^{k,(N)})_{k \in \mathfrak{K}}$. We express this information restriction in our model by restricting the sigma-algebra to which an agent's strategy may be adapted.
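
The mechanics in (2), (3), (4), (5) and (7) can be illustrated with a small Euler-scheme simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate_market`, the choice of a scaled Brownian motion `sigma * W` for the martingale $M^k$, and all numerical values are hypothetical and are not prescribed by the model above. Prices are simulated from the viewpoint of a single sub-population's belief, whose impact constants $\lambda_{k,k'}$ are passed as `lam_row`, and every sub-population is assumed to be non-empty.

```python
import numpy as np

def simulate_market(nu, labels, Q, X0, S0, A, lam_row, a, sigma, T, seed=0):
    """Euler scheme for inventories (3), the price (5) and cash (7).

    nu      : (n_steps, N) trading rates of all agents (given, not optimized)
    labels  : (N,) integer sub-population index of each agent, values in 0..K-1
    Q, X0   : (N,) initial inventories and cash
    A       : (n_steps,) drift path A^k under the chosen belief (assumption)
    lam_row : (K,) impact constants lambda_{k,k'} for the chosen belief k
    a       : (K,) instantaneous cost parameters a_k
    sigma   : volatility of the Brownian martingale used for M^k (assumption)
    """
    n_steps, N = nu.shape
    K = lam_row.shape[0]
    dt = T / n_steps
    rng = np.random.default_rng(seed)

    counts = np.bincount(labels, minlength=K)   # N_k^{(N)}, assumed positive
    p = counts / N                              # proportions p_k^{(N)}, eq. (2)
    member = np.eye(K)[labels]                  # (N, K) one-hot membership
    nu_bar = (nu @ member) / counts             # average rates, eq. (4)

    q = np.empty((n_steps + 1, N)); q[0] = Q
    X = np.empty((n_steps + 1, N)); X[0] = X0
    S = np.empty(n_steps + 1); S[0] = S0
    for i in range(n_steps):
        # drift of (5): A^k + sum_k' lambda_{k,k'} p_k' nu^{k'}
        drift = A[i] + lam_row @ (p * nu_bar[i])
        dW = np.sqrt(dt) * rng.standard_normal()
        S[i + 1] = S[i] + drift * dt + sigma * dW
        q[i + 1] = q[i] + nu[i] * dt                                 # eq. (3)
        X[i + 1] = X[i] - (S[i] + a[labels] * nu[i]) * nu[i] * dt    # eq. (7)
    return S, q, X, nu_bar, p

# Example: two sub-populations of two agents each, trading at constant rates.
rates = np.tile(np.array([1.0, 1.0, -2.0, -2.0]), (10, 1))
S, q, X, nu_bar, p = simulate_market(
    rates, labels=np.array([0, 0, 1, 1]), Q=np.zeros(4), X0=np.zeros(4),
    S0=100.0, A=np.zeros(10), lam_row=np.array([0.1, 0.2]),
    a=np.array([0.01, 0.02]), sigma=0.5, T=1.0)
```

Because the trading rates are supplied as an input, the sketch only reproduces the state dynamics; it says nothing about which rates are optimal.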
For each $j \in \mathfrak{N}$, we only allow agent-$j$ to choose strategies contained within the set of admissible strategies,

$$\mathcal{A}^j := \big\{ \omega \in \mathbb{H}^2_T \,:\, \omega \text{ is } \mathcal{F}^j\text{-predictable} \big\}, \tag{8}$$

where we define $\mathbb{H}^2_T = \bigcap_{k \in \mathfrak{K}} \mathbb{H}^{2,k}_T$, and

$$\mathcal{F}^j_t = \sigma\Big( \big( S^{\nu^{(N)}}_u, (\nu^{k,(N)}_u)_{k \in \mathfrak{K}} \big)_{u \in [0,t)} \Big) \vee \sigma\big( Q^j \big), \tag{9}$$

which is the sigma-algebra generated by the paths of the asset price process, the total order-flow process, and the starting inventory level for agent $j$. In definition (8), we deliberately restrict ourselves to processes in $\mathbb{H}^2_T$, to guarantee that $S^{\nu^{(N)}} \in \mathbb{H}^{2,k}_T$ for all $k \in \mathfrak{K}$.

Each agent chooses their trading strategy to maximize an objective functional that measures their performance over the course of the trading period $[0, T]$. For each $j \in \mathfrak{N}$ let $\mathcal{A}^{-j} := \times_{i \in \mathfrak{N},\, i \neq j}\, \mathcal{A}^i$. Each agent-$j$ within a sub-population $k \in \mathfrak{K}$ chooses a control $\nu^j \in \mathcal{A}^j$ to maximize a functional $H_j : \mathcal{A}^j \times \mathcal{A}^{-j} \to \mathbb{R}$ defined as follows:

$$H_j(\nu^j, \nu^{-j}) = \mathbb{E}^{\mathbb{P}^k}\Big[ X^{j,\nu^j}_T + q^{j,\nu^j}_T \Big( S^{\nu^{(N)}}_T - \Psi_k\, q^{j,\nu^j}_T \Big) - \phi_k \int_0^T \big( q^{j,\nu^j}_u \big)^2 \, du \Big], \tag{10}$$

where $\Psi_k > 0$ and $\phi_k \geq 0$ are constants, and where we use the notation $\nu^{-j} := (\nu^1, \dots, \nu^{j-1}, \nu^{j+1}, \dots, \nu^N)$ to indicate the dependence of the objective functional on the actions of all other agents in the population.

The objective functional corresponds to the agent trying to maximize a weighted average of three separate quantities. The first term, $X^{j,\nu^j}_T$, corresponds to the total amount of cash the agent has accumulated up until time $T$. The second term, $q^{j,\nu^j}_T \big( S^{\nu^{(N)}}_T - \Psi_k\, q^{j,\nu^j}_T \big)$, corresponds to the value of liquidating all of the agent's leftover inventory at time $T$, minus a liquidation penalty controlled by the parameter $\Psi_k$. The last term, $-\phi_k \int_0^T ( q^{j,\nu^j}_u )^2 \, du$, is a running risk-aversion penalty that is controlled by the parameter $\phi_k$, which incentivizes the agent to keep their market exposure low during the trading period. As demonstrated in Cartea et al. (2017), this term may also be interpreted as stemming from an agent's model uncertainty with respect to a continuum of measures, absolutely continuous with respect to the reference measure, where candidate measures are penalized with relative entropy.

Each agent within sub-population $k$ has an objective functional that is computed by taking expectations under the measure $\mathbb{P}^k$. Hence, agents incorporate their own beliefs on the asset price dynamics. Furthermore, each functional $H_j$ depends on the actions of all other players ($\nu^{-j}$) through the dynamics of the asset price $S^{\nu^{(N)}}_t$, which implicitly appear in the definition (10).

By expanding the dynamics of each of the state processes present in (10), and by using integration by parts, we may re-write the agent's objective functional as

$$H_j(\nu^j, \nu^{-j}) = C_j + \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T q^{j,\nu^j}_t \, dS^{\nu^{(N)}}_t - \int_0^T \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix}^{\!\top} \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix} \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix} dt \right], \tag{11}$$

where $C_j$ is a term that is constant with respect to $\nu^j$ and $\nu^{-j}$. Each agent's behaviour is characterized entirely by the objective functional they are trying to maximize. From (11), it is clear that the objective functional is parametric, so that the agent's preferences can be entirely described by the tuple $(a_k, \phi_k, \Psi_k, \mathbb{P}^k)$ and their starting inventory $Q^j$.

The market model defined above forms a stochastic game in which all participating agents are competing to maximize each of their own objectives. We wish to find and study this market at its Nash equilibrium. This equilibrium can be described more formally as the collection of admissible strategies $\{ \nu^{j,*} \in \mathcal{A}^j : j \in \mathfrak{N} \}$ which satisfies the condition

$$\nu^{j,*} = \arg\max_{\omega \in \mathcal{A}^j} H_j\big( \omega, \nu^{-j,*} \big), \quad \forall\, j \in \mathfrak{N}. \tag{12}$$

Obtaining this collection of strategies for the stochastic game with a finite number of players proves to be a difficult task.
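
For reference, the integration-by-parts step leading from (10) to (11) can be sketched as follows. This is a reconstruction under the assumption that the inventory has finite variation (so no cross-variation term appears) and that the stochastic integrals are well defined; the explicit form of the constant $C_j$ is derived here for illustration and is not quoted from the text above. Superscripts on $q$, $X$ and $S$ are dropped for readability.

```latex
% Product rule for q_T S_T, with dq_t = \nu^j_t dt:
q_T S_T = Q^j S_0 + \int_0^T q_t \, dS_t + \int_0^T S_t \, \nu^j_t \, dt ,
% Cash process (7) at time T:
X_T = X^j - \int_0^T S_t \, \nu^j_t \, dt - a_k \int_0^T (\nu^j_t)^2 \, dt ,
% Terminal penalty, using d(q_t^2) = 2 q_t \nu^j_t dt:
\Psi_k q_T^2 = \Psi_k (Q^j)^2 + 2 \Psi_k \int_0^T q_t \, \nu^j_t \, dt .
% Summing, the S_t \nu^j_t terms cancel:
X_T + q_T ( S_T - \Psi_k q_T ) - \phi_k \int_0^T q_t^2 \, dt
  = \underbrace{X^j + Q^j S_0 - \Psi_k (Q^j)^2}_{\text{independent of } \nu^j}
  + \int_0^T q_t \, dS_t
  - \int_0^T \big( a_k (\nu^j_t)^2 + 2 \Psi_k q_t \nu^j_t + \phi_k q_t^2 \big) \, dt .
```

Taking expectations under $\mathbb{P}^k$ gives (11), with $C_j$ equal to the expectation of the first group of terms, since the integrand of the last integral is exactly the quadratic form with matrix entries $a_k$, $\Psi_k$, $\Psi_k$, $\phi_k$.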
As we make no assumptions on $\{\nu^j\}$ beyond the measurability and integrability assumptions required in the definition of $\mathcal{A}^j$, and in particular do not use a feedback form, a Nash equilibrium satisfying (12) will be an open-loop equilibrium in general. One of the main obstacles in finding a solution to this problem is that each agent's strategy is adapted to a different filtration $\mathcal{F}^j$. Furthermore, each of the objective functionals defined in equation (11) is expressed under one of $K$ different measures from the collection of measures $\{\mathbb{P}^k\}_{k \in \mathfrak{K}}$, each representing the beliefs of a particular individual. These two features make the finite-population stochastic game difficult to solve directly. It is, however, possible to solve the stochastic game in the infinite population limit, and use the result as an approximation for the finite population game.

3. Solving the Mean-Field Stochastic Game

As the stochastic game presented in Section 2.4 presents obstacles when aiming to solve it directly, we now take a different avenue. In this section, we study the stochastic game as the population limit tends towards infinity. The resulting limit is that of a stochastic mean-field game (MFG) that we can solve. Although we do not explicitly solve the finite player game presented in Section 2, by establishing an ϵ-Nash equilibrium property in Section 4, we show that the equilibrium solution obtained for the MFG provides an approximation to the finite population game, provided that the population size is large enough.

This section begins by taking the population limit as $N \to \infty$, to obtain new objective functionals for the agents resulting in a stochastic MFG. Next, using convex analysis methods, we characterize the Nash equilibrium as the solution to a coupled system of FBSDEs. We then conclude by presenting a solution to this FBSDE problem, and thus an exact representation of each agent's optimal control at the Nash equilibrium.
Agent-$j$'s objective functional (10) only depends on the population size $N$ through the dynamics of the mid-price process $S^{\nu^{(N)}}_t$, which is given by the dynamics in equation (5).

Assumption 3.1. To proceed, we assume that the limiting trading rate exists; in particular, there exist processes $\nu^k = (\nu^k_t)_{t \in [0,T]}$ for $k \in \mathfrak{K}$ such that $\nu^k \in \mathbb{H}^2_T$ and

$$\lim_{N \to \infty} \nu^{k,(N)}_t = \nu^k_t, \quad \mathbb{P} \times \mu \ \text{a.e.}, \tag{13}$$

where $\mu$ is the Lebesgue measure on the Borel sigma-algebra $\mathcal{B}_{[0,T]}$, and where $\mathbb{P} \times \mu$ is the canonical product measure of $\mathbb{P}$ and $\mu$.

As each individual $\nu^j$ is $\mathcal{F}^j$-predictable, $\nu^k$ must be $\big( \bigvee_{j \in \mathfrak{N}} \mathcal{F}^j \big)$-predictable. Moreover, by our assumption that $\mathbb{P}^k \sim \mathbb{P}$ for each $k \in \mathfrak{K}$, the limit (13) also holds $\mathbb{P}^k \times \mu$ almost everywhere. From now on, we refer to each of the processes $\nu^k$ as the mean-field trading rate for sub-population $k$.

Using the assumption that $p^{(N)}_k \to p_k$ for all $k \in \mathfrak{K}$ along with (13), we find that in the infinite population limit, from the perspective of agent-$j$ from sub-population $k$, the dynamics of the asset price process is

$$S^{\boldsymbol{\nu}}_t = S_0 + \int_0^t \Big\{ A^k_u + \sum_{k' \in \mathfrak{K}} \lambda_{k,k'}\, p_{k'}\, \nu^{k'}_u \Big\} \, du + M^k_t. \tag{14}$$

In this limit, a single individual's impact on the price becomes negligible, thus the resulting mean-field trading rate $\nu^k$ is unaffected by a single agent's trading rate $\nu^j$. Therefore, in the limit, each agent's objective $H_j$ no longer depends on the whole collection of trading rates $\nu^{-j}$, but instead only depends on the collection of mean-field processes $\{\nu^k\}_{k \in \mathfrak{K}}$, which considerably simplifies the dependence structure within the game.
This can be interpreted as agents becoming 'price takers' in the mean-field limit resulting from the aggregate of agents' infinitesimal impact.

By using the objective functional representation in (11), expanding $dS^{\boldsymbol{\nu}}_t$ from (14), and noticing that the martingale components vanish under expectation, we may write the agent's objective functional in the infinite population limit as

$$H^{\boldsymbol{\nu}}_j(\nu^j) = C_j + \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T \left\{ q^{j,\nu^j}_t \Big( A^k_t + \boldsymbol{\lambda}_k^{\top} \boldsymbol{\nu}_t \Big) - \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix}^{\!\top} \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix} \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix} \right\} dt \right], \tag{15}$$

where for each $k \in \mathfrak{K}$ we define $\boldsymbol{\lambda}_k \in \mathbb{R}^K$ as $\boldsymbol{\lambda}_k = (\lambda_{k,k'}\, p_{k'})_{k' \in \mathfrak{K}}$ and where we define $\boldsymbol{\nu}_t \in \mathbb{R}^K$ as $\boldsymbol{\nu}_t = (\nu^k_t)_{k \in \mathfrak{K}}$. In the expression for $H^{\boldsymbol{\nu}}_j$ in (15) we suppress the argument $\nu^{-j}$ as, in this infinite population limit, their effect is felt through the mean-fields for each sub-population. We use the superscript in the notation for $H^{\boldsymbol{\nu}}_j$ to indicate the dependence on the set of mean-fields.

Our new objective is to obtain the Nash equilibrium in this newly defined mean-field game. The Nash equilibrium for the MFG consists of finding the infinite collection of controls $\{\nu^{j,*}\}_{j=1}^{\infty}$ that satisfies the optimality condition

$$\nu^{j,*} = \arg\max_{\omega \in \mathcal{A}^j} H^{\boldsymbol{\nu}^*}_j(\omega), \tag{16}$$

as well as the consistency condition

$$\nu^{k,*}_t = \lim_{N \nearrow \infty} \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \nu^{j,*}_t, \tag{17}$$

for all $k \in \mathfrak{K}$.

In the limit, the explicit dependence of an agent's actions in another agent's objective functional is replaced with an implicit dependence through the consistency condition.

To solve the optimization problem described in Section 3.1, we must determine what strategy maximizes the right-hand side of equation (16) for all agents. This is achieved by using tools from infinite dimensional convex analysis or variational calculus along the lines of Bank et al. (2017) (who investigate a single agent tracking problem that does not incorporate price information) and Casgrain and Jaimungal (2018) (who look at a multi-agent setting with price information, but where all agents use the same model). First, we demonstrate that each $H^{\boldsymbol{\nu}}_j$ is a strictly concave functional of $\nu^j$. Next, as $H^{\boldsymbol{\nu}}_j$ is a functional with an infinite-dimensional argument, we show that each functional $H^{\boldsymbol{\nu}}_j$ is Gâteaux differentiable within the space $\mathcal{A}^j$ and compute the Gâteaux derivative explicitly. General results in convex optimization then state that if the derivative vanishes at a point within the space $\mathcal{A}^j$, it must be the point at which $H^{\boldsymbol{\nu}}_j$ attains its supremum. The lemmas that follow give us the required properties for $H^{\boldsymbol{\nu}}_j$.

Lemma 3.2. The functional $H^{\boldsymbol{\nu}}_j$ defined in equation (15) is strictly concave in $\mathcal{A}^j$ up to $\mathbb{P} \times \mu$ null sets.

Proof. See Appendix A.1.

Lemma 3.3. For an agent-$j$ in sub-population $k$, the functional $H^{\boldsymbol{\nu}}_j$ defined in equation (15) is everywhere Gâteaux differentiable in $\mathcal{A}^j$. The Gâteaux derivative at a point $\nu \in \mathcal{A}^j$ in a direction $\omega \in \mathcal{A}^j$ can be expressed as

$$\Big\langle \mathcal{D} H^{\boldsymbol{\nu}}_j(\nu), \omega \Big\rangle = \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T \omega_t \left\{ -2 a_k \nu_t - 2 \Psi_k\, q^{j,\nu}_T + \int_t^T \Big( \mathbb{E}^{\mathbb{P}^k}\big[ A^k_u + \boldsymbol{\lambda}_k^{\top} \boldsymbol{\nu}_u \,\big|\, \mathcal{F}^j_u \big] - 2 \phi_k\, q^{j,\nu}_u \Big) du \right\} dt \right]. \tag{18}$$

Proof. See Appendix A.2.

Therefore, since $H^{\boldsymbol{\nu}}_j$ is concave, the supremum of $H^{\boldsymbol{\nu}}_j$ is attained at a point $\nu \in \mathcal{A}^j$ if and only if the expression (18) vanishes for all $\omega \in \mathcal{A}^j$. Moreover, the strict concavity of $H^{\boldsymbol{\nu}}_j$ guarantees that such a point is unique up to $\mathbb{P} \times \mu$ null sets. Indeed, as the following theorem shows, the collection of points $\{\nu^j\}_{j=1}^{\infty}$ that ensures (18) vanishes for all $j \in \mathfrak{N}$, and for all $\omega \in \mathcal{A}^j$, coincides with the solution of an infinite-dimensional system of FBSDEs.

Theorem 3.4. We have that

$$\nu^{j,*} := \arg\max_{\nu \in \mathcal{A}^j} H^{\boldsymbol{\nu}^*}_j(\nu) \tag{19}$$

for all $j \in \mathfrak{N}$ if and only if for each agent-$j$ in sub-population $k$, $\nu^{j,*} \in \mathbb{H}^2_T$ and $\nu^{j,*}$ is the unique strong solution to the FBSDE

$$\begin{cases} -d\big( 2 a_k \nu^{j,*}_t \big) = \Big( \mathbb{E}^{\mathbb{P}^k}\big[ A^k_t + \boldsymbol{\lambda}_k^{\top} \boldsymbol{\nu}^*_t \,\big|\, \mathcal{F}^j_t \big] - 2 \phi_k\, q^{j,\nu^{j,*}}_t \Big) dt - d\mathcal{M}^j_t, \\[4pt] 2 a_k \nu^{j,*}_T = -2 \Psi_k\, q^{j,\nu^{j,*}}_T, \end{cases} \tag{20}$$

where $\mathcal{M}^j \in \mathbb{H}^2_T$ is an $\mathcal{F}^j$-adapted $\mathbb{P}^k$-martingale and where

$$\nu^{k,*}_t = \lim_{N \to \infty} \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \nu^{j,*}_t, \tag{21}$$

for all $k \in \mathfrak{K}$.

Proof. See Appendix A.3.

Theorem 3.4 reduces the convex optimization problem (15), (16), and (17) to an infinite system of FBSDEs. The forward component comes from the latent drift processes $A^k$ and inventory processes $q^{j,\nu^{j,*}}$, while the backward component comes from the trading rates $\nu^{j,*}$. The coupling in this system appears through the mean-field processes $\boldsymbol{\nu}$, which average out all of the actions of other agents within the game. A few difficulties are immediately apparent in the FBSDE (20). Firstly, each individual FBSDE, corresponding to a particular agent's trading rate, is written in terms of a martingale that is specific to the agent's sub-population, and the measure under which the process is a martingale corresponds to the agent's belief about the drift process $A^k$. Secondly, the conditional expected value $\mathbb{E}^{\mathbb{P}^k}\big[ A^k_t + \boldsymbol{\lambda}_k^{\top} \boldsymbol{\nu}^*_t \,\big|\, \mathcal{F}^j_t \big]$ appears in the driver of the FBSDE. This is a projection of the mean-fields onto the agent's filtration, and appears because the agent cannot directly observe the strategies of other individuals. This projection of the mean-fields adds another layer of difficulty.

Recall that a solution to the FBSDE (20) for agent-$j$ consists of a pair of processes $(\nu^{j,*}, \mathcal{M}^j)$ that satisfies the SDE and terminal condition in (20) $\mathbb{P} \times \mu$ almost everywhere. For the requirements of Theorem 3.4 to be met, a solution must simultaneously meet the consistency condition (21) $\mathbb{P} \times \mu$ almost everywhere.
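
The derivative formula (18) can be sanity-checked numerically in a simplified setting. The sketch below is an illustration under strong assumptions rather than part of the proof. It takes the drift term $A^k_t + \boldsymbol{\lambda}_k^{\top} \boldsymbol{\nu}_t$ to be a known deterministic path `b`, so that all expectations drop out. It discretizes time on a uniform grid, and it writes the cross term $2 \Psi_k q \nu$ of (15) in its equivalent terminal form $-\Psi_k q_T^2$ (up to a constant). The function names and parameter values are hypothetical. Since the discretized objective is quadratic in the control, a central finite difference should agree with the discretized version of (18) up to rounding.

```python
import numpy as np

def inventory(nu, q0, dt):
    """Discrete inventory path q_0, ..., q_n with q_{i+1} = q_i + nu_i * dt."""
    return q0 + np.concatenate(([0.0], np.cumsum(nu) * dt))

def objective(nu, q0, b, a, psi, phi, dt):
    """Discretized objective (15) with deterministic drift b, constants dropped."""
    q = inventory(nu, q0, dt)
    running = np.sum(q[:-1] * b - a * nu**2 - phi * q[:-1]**2) * dt
    return running - psi * q[-1]**2

def gateaux_formula(nu, omega, q0, b, a, psi, phi, dt):
    """Discretized version of the directional derivative (18)."""
    q = inventory(nu, q0, dt)
    integrand = (b - 2.0 * phi * q[:-1]) * dt
    # tail[m] = sum of integrand[i] over grid points i > m (the inner integral)
    tail = np.cumsum(integrand[::-1])[::-1] - integrand
    bracket = -2.0 * a * nu - 2.0 * psi * q[-1] + tail
    return np.sum(omega * bracket) * dt

def gateaux_finite_difference(nu, omega, eps=1e-4, **params):
    """Central difference of the objective along the direction omega."""
    up = objective(nu + eps * omega, **params)
    down = objective(nu - eps * omega, **params)
    return (up - down) / (2.0 * eps)

# Example: random control, direction and drift path on a 50-step grid.
rng = np.random.default_rng(1)
n, T = 50, 1.0
params = dict(q0=1.0, b=rng.normal(size=n), a=0.1, psi=0.5, phi=0.2, dt=T / n)
nu, omega = rng.normal(size=n), rng.normal(size=n)
print(gateaux_formula(nu, omega, **params),
      gateaux_finite_difference(nu, omega, **params))
```

The same discretized objective also illustrates Lemma 3.2: for $a_k > 0$ it is a strictly concave quadratic in the control vector, so its value at the midpoint of two distinct controls exceeds the average of its values at the endpoints.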
If we can find a set of solutions, we can guarantee it is unique up to $\mathbb{P} \times \mu$ null sets due to the strict concavity of the objective functional and the 'if and only if' nature of the statement.

In this section, we solve the FBSDE (20), and hence provide an exact form for the Nash equilibrium for the infinite population mean-field game. The key to obtaining a solution lies in first postulating a structure for the solution of (20). This form then suggests a vector-valued FBSDE that the mean-field processes $\nu^k$ must satisfy, which is independent of any individual agent's strategy. The resulting non-standard FBSDE system is defined across the set of $K$ measures $\{\mathbb{P}^k\}_{k \in \mathfrak{K}}$, which introduces an obstacle in solving it directly. The key step in obtaining a solution lies in representing the FBSDE in terms of a single measure, and solving it there.

Due to the linear form of the FBSDE (20), it is natural to assume that the solution is affine. As such, for an agent-$j$ within a sub-population $k$, we seek optimal controls of the form

$$2 a_k \nu^{j,*}_t = 2 a_k \nu^{k,*}_t + h^k_{2,t} \Big( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\nu^{k,*}}_t \Big), \tag{22}$$

where $h^k_{2,t} : [0,T] \to \mathbb{R}$ is an unknown deterministic, continuously differentiable function of time, and where we define the mean-field inventory process $\bar{q}^{k,\nu^{k,*}} = (\bar{q}^{k,\nu^{k,*}}_t)_{t \in [0,T]}$ for sub-population $k$ as

$$\bar{q}^{k,\nu^{k,*}}_t = \bar{m}_k + \int_0^t \nu^{k,*}_u \, du.$$

Plugging this ansatz into (20) and simplifying, we find that

$$0 = \Big\{ \partial_t h^k_{2,t} + \tfrac{1}{2 a_k} \big( h^k_{2,t} \big)^2 - 2 \phi_k \Big\} \Big( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\nu^{k,*}}_t \Big) dt + \Big\{ d\big( 2 a_k \nu^{k,*}_t \big) + \Big( \mathbb{E}^{\mathbb{P}^k}\big[ A^k_t + \boldsymbol{\lambda}_k^{\top} \boldsymbol{\nu}^*_t \,\big|\, \mathcal{F}^j_t \big] - 2 \phi_k\, \bar{q}^{k,\nu^{k,*}}_t \Big) dt - d\mathcal{M}^j_t \Big\}, \tag{23}$$

along with the boundary condition that

$$0 = \big\{ h^k_{2,T} + 2 \Psi_k \big\} \Big( q^{j,\nu^{j,*}}_T - \bar{q}^{k,\nu^{k,*}}_T \Big) + \big\{ 2 a_k \nu^{k,*}_T + 2 \Psi_k\, \bar{q}^{k,\nu^{k,*}}_T \big\}, \tag{24}$$

which must both hold $\mathbb{P}^k \times \mu$ almost everywhere.
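
The substitution step behind (23) can be written out explicitly. The following is a short reconstruction, in which $h^k_{2,t}$ denotes the deterministic coefficient of the ansatz (22), $q^j_t$ abbreviates $q^{j,\nu^{j,*}}_t$ and $\bar{q}^k_t$ abbreviates $\bar{q}^{k,\nu^{k,*}}_t$; the intermediate lines are a sketch and are not quoted from the text.

```latex
% Differentiate the ansatz (22), using dq^j_t = \nu^{j,*}_t dt and d\bar q^k_t = \nu^{k,*}_t dt:
d(2 a_k \nu^{j,*}_t)
  = d(2 a_k \nu^{k,*}_t)
  + \partial_t h^k_{2,t} \, (q^j_t - \bar q^k_t) \, dt
  + h^k_{2,t} \, (\nu^{j,*}_t - \nu^{k,*}_t) \, dt .
% By (22) again, the difference of the trading rates is proportional to the inventory gap:
\nu^{j,*}_t - \nu^{k,*}_t = \frac{1}{2 a_k} \, h^k_{2,t} \, (q^j_t - \bar q^k_t) ,
% so that
d(2 a_k \nu^{j,*}_t)
  = d(2 a_k \nu^{k,*}_t)
  + \Big\{ \partial_t h^k_{2,t} + \frac{1}{2 a_k} (h^k_{2,t})^2 \Big\} (q^j_t - \bar q^k_t) \, dt .
% Split the inventory term in the driver of (20):
- 2 \phi_k q^j_t = - 2 \phi_k \bar q^k_t - 2 \phi_k (q^j_t - \bar q^k_t) .
% Moving every term of (20) to one side and grouping by (q^j_t - \bar q^k_t) gives (23).
```

The boundary condition (24) follows in the same way by inserting (22) at $t = T$ into the terminal condition of (20) and writing $q^j_T = \bar{q}^k_T + (q^j_T - \bar{q}^k_T)$.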
Therefore, to solve the FBSDE (20), it is sufficient for us to make the terms in the curly brackets of equation (23) and of the boundary condition (24) vanish independently of one another. Collecting these equations, we obtain a first-order Riccati-type ODE for $h^k_{2,t}$,
$$\begin{cases} \partial_t h^k_{2,t} + \dfrac{1}{2a_k}\left(h^k_{2,t}\right)^2 - 2\phi_k = 0, \\ h^k_{2,T} = -2\Psi_k, \end{cases} \tag{25}$$
as well as a linear FBSDE for the mean-field process $\bar\nu^{k,*}_t$,
$$\begin{cases} -d\left(2a_k\,\bar\nu^{k,*}_t\right) = \left(\mathbb E^{\mathbb P^k}\!\left[A^k_t + \lambda_k^\top\bar\nu^*_t \mid \mathcal F^j_t\right] - 2\phi_k\,\bar q^{k,\bar\nu^{k,*}}_t\right)dt - d\mathcal M^j_t, \\ 2a_k\,\bar\nu^{k,*}_T = -2\Psi_k\,\bar q^{k,\bar\nu^{k,*}}_T, \end{cases} \tag{26}$$
where $\mathcal M^j = (\mathcal M^j_t)_{t\in[0,T]}\in\mathbb H^2_T$ is an $\mathcal F^j$-adapted $\mathbb P^k$-martingale.

Let us point out here that the ansatz for $\nu^{j,*}$ found in equation (22) satisfies the consistency condition as long as there exist solutions to the equations (25) and (26). This can be most easily seen by taking the average of (22) over $j\in\mathcal K^{(N)}_k$ and taking the limit as $N\to\infty$.

The FBSDE (26) suggests that the solution $\bar\nu^{k,*}$ should be an $\mathcal F^j$-adapted process. Equation (26), however, holds for any agent-$j'$ for which $j'\in\mathcal K_k$; therefore, $\bar\nu^{k,*}$ must be $\mathcal F^{j'}$-adapted for any $j'\in\mathcal K_k$. Consequently, each $\bar\nu^{k,*}$ must be adapted to the filtration generated by the intersection $\bigcap_{j'\in\mathcal K_k}\mathcal F^{j'}_t$. Computing this intersection, we find that
$$\bigcap_{j'\in\mathcal K_k}\mathcal F^{j'}_t = \bigcap_{j'\in\mathcal K_k}\sigma\!\left((S_u,\bar\nu^*_u,q^{j',\nu^{j',*}}_u)_{u\in[0,t]}\right) \subseteq \sigma\!\left((S_u,\bar\nu^*_u)_{u\in[0,t]}\right),$$
which does not depend on the sub-population $k$. This is easy to see since (i) each $q^j$ is not measurable with respect to $\sigma(q^i)$ for any $i\neq j$ and (ii) for any $j\in\mathfrak N$, $\nu^j$ is not measurable with respect to $\sigma(\bar\nu^*)$ by definition from (21). Thus, for each $k\in\mathfrak K$, we have that $\bar\nu^{k,*}$ is an $\mathcal F$-adapted process, where we define $\mathcal F_t := \bigwedge_{j\in\mathcal K_k}\mathcal F^j_t = \sigma\!\left((S_u,\bar\nu^*_u)_{u\in[0,t]}\right)$.
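As a consequence, we find that $\bar\nu^{k,*}$ should satisfy the FBSDE
$$\begin{cases} -d\left(2a_k\,\bar\nu^{k,*}_t\right) = \left(\mathbb E^{\mathbb P^k}\!\left[A^k_t + \lambda_k^\top\bar\nu^*_t \mid \mathcal F_t\right] - 2\phi_k\,\bar q^{k,\bar\nu^{k,*}}_t\right)dt - d\mathcal M^k_t, \\ 2a_k\,\bar\nu^{k,*}_T = -2\Psi_k\,\bar q^{k,\bar\nu^{k,*}}_T, \end{cases} \tag{27}$$
where $\mathcal M^k = (\mathcal M^k_t)_{t\in[0,T]}$ is an $\mathcal F$-adapted $\mathbb P^k$-martingale, and the expectation appearing in the drift is conditional on $\mathcal F_t$, not $\mathcal F^j_t$.

By stacking the FBSDEs (27) over all values of $k\in\mathfrak K$, we may obtain a vector-valued FBSDE for the process $\bar\nu^*$. To this end, define the column vector of filtered drift processes $\widehat A = (\widehat A_t)_{t\in[0,T]}$, where $\widehat A_t = \left(\mathbb E^{\mathbb P^k}[A^k_t \mid \mathcal F_t]\right)_{k\in\mathfrak K}$. Next, as $\bar\nu^*_t$ is $\mathcal F_t$-measurable, stacking the FBSDEs (27) over all values of $k\in\mathfrak K$ gives
$$\begin{cases} -d\left(2a\,\bar\nu^*_t\right) = \left(\widehat A_t + \Lambda\,\bar\nu^*_t - 2\phi\,\bar q^{\bar\nu^*}_t\right)dt - d\mathcal M_t, \\ 2a\,\bar\nu^*_T = -2\Psi\,\bar q^{\bar\nu^*}_T, \end{cases} \tag{28}$$
where $a$, $\phi$, $\Psi$ and $\Lambda$ are all real-valued $K\times K$ matrices defined as
$$a = \operatorname{diag}(\{a_k\}_{k\in\mathfrak K}),\quad \phi = \operatorname{diag}(\{\phi_k\}_{k\in\mathfrak K}),\quad \Psi = \operatorname{diag}(\{\Psi_k\}_{k\in\mathfrak K}),\quad \Lambda = \begin{pmatrix} \lambda_{1,1}\,p_1 & \cdots & \lambda_{1,K}\,p_K \\ \vdots & \ddots & \vdots \\ \lambda_{K,1}\,p_1 & \cdots & \lambda_{K,K}\,p_K \end{pmatrix},$$
where $\bar q^{\bar\nu^*}_t = \bar m + \int_0^t \bar\nu^*_u\,du$, and $\mathcal M = (\mathcal M^k)_{k\in\mathfrak K}$ is a column vector of $\mathcal F$-adapted processes, where, as a reminder, $\mathcal M^k\in\mathbb H^2_T$ for all $k\in\mathfrak K$ and the $k$-th element $\mathcal M^k$ is a $\mathbb P^k$-martingale.

From the linear structure of the FBSDE (28), we can further simplify the problem by seeking affine solutions of the form
$$2a\,\bar\nu^*_t = g_{1,t} + g_{2,t}\,\bar q^{\bar\nu^*}_t, \tag{29}$$
where $g_{2}:[0,T]\to\mathbb R^{K\times K}$ is a deterministic and continuously differentiable function of time, and $g_1 = (g_{1,t})_{t\in[0,T]}\in\mathbb H^2_T$ is an $\mathbb R^K$-valued stochastic process.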
Plugging the ansatz into (28), and following through with the same logical steps as before, we find that the ansatz holds true so long as $g_2$ is the solution to the Riccati-type matrix-ODE
$$\begin{cases} -\partial_t g_{2,t} = \left(\Lambda + g_{2,t}\right)(2a)^{-1}g_{2,t} - 2\phi, \\ g_{2,T} = -2\Psi, \end{cases} \tag{30}$$
and when $g_{1}$ solves the BSDE
$$\begin{cases} -dg_{1,t} = \left(\widehat A_t + \left(\Lambda + g_{2,t}\right)(2a)^{-1}g_{1,t}\right)dt - d\mathcal M_t, \\ g_{1,T} = 0, \end{cases} \tag{31}$$
where $\mathcal M$ is the same vector of processes present in FBSDE (28).

At this point, we have succeeded in reducing the search for a Nash-equilibrium to solving (i) two deterministic ordinary differential equations (ODEs), (25) and (30), and (ii) a non-standard linear BSDE (31). The ODEs are straightforward to solve; the BSDE, however, poses some further challenges.

One of the primary obstacles in solving the BSDE (31) is that each component of $g_1$ incorporates a process that is a martingale under a different probability measure. Recall that the components of $\mathcal M = \{\mathcal M^k\}_{k\in\mathfrak K}$ are required to be martingales with respect to the $K$ different measures $\{\mathbb P^k\}_{k\in\mathfrak K}$. Each measure is what agents within sub-population $k$ use to compute expectations, and agents within that sub-population assume the asset has drift $A^k$ in excess of the order-flow from all agents. The key step in solving the BSDE is to re-cast it in terms of martingales under a single probability measure. This introduces non-trivial drift adjustments; however, we find that it is indeed possible to solve the modified BSDE explicitly.

Consider the $k$-th dimension of the BSDE (31),
$$-dg^k_{1,t} = \left(\widehat A^k_t + G^k_t\,g_{1,t}\right)dt - d\mathcal M^k_t, \tag{32}$$
where $\mathcal M^k$ is a $\mathbb P^k$-martingale, and where $G^k_t$ is defined as the $k$-th row of the deterministic matrix-valued function $G_t = \left(\Lambda + g_{2,t}\right)(2a)^{-1}$. The solution of BSDE (32) can be expressed implicitly as
$$g^k_{1,t} = \mathbb E^{\mathbb P^k}\!\left[\int_t^T\left\{\widehat A^k_u + G^k_u\,g_{1,u}\right\}du \,\middle|\, \mathcal F_t\right]. \tag{33}$$
Next, we aim to represent (33) in terms of an expectation under another measure $\mathbb Q$ such that $\mathbb Q\sim\mathbb P^k$ for all $k$. By the assumption that $\mathbb P^k\sim\mathbb P^{k'}$ for all $k,k'\in\mathfrak K$, there always exists such a measure; for example, $\mathbb Q=\mathbb P^k$ for some $k$. Given this measure, define the $\mathcal F$-adapted Radon-Nikodym derivative processes
$$Z^{\mathbb Q,k}_t = \left.\frac{d\mathbb P^k}{d\mathbb Q}\right|_{\mathcal F_t} := \mathbb E^{\mathbb Q}\!\left[\frac{d\mathbb P^k}{d\mathbb Q}\,\middle|\,\mathcal F_t\right],\quad \forall k\in\mathfrak K. \tag{34}$$
Using this process, we may write equation (33) as an expected value under the measure $\mathbb Q$ as
$$Z^{\mathbb Q,k}_t\,g^k_{1,t} = \mathbb E^{\mathbb Q}\!\left[\int_t^T\left\{Z^{\mathbb Q,k}_u\,\widehat A^k_u + Z^{\mathbb Q,k}_u\,G^k_u\,g_{1,u}\right\}du \,\middle|\, \mathcal F_t\right]. \tag{35}$$
Defining the diagonal $\mathbb R^{K\times K}$-valued process $Z^{\mathbb Q} = (Z^{\mathbb Q}_t)_{t\in[0,T]}$, where $Z^{\mathbb Q}_t = \operatorname{diag}(Z^{\mathbb Q,k}_t)_{k\in\mathfrak K}$, allows us to write a linear BSDE for $Z^{\mathbb Q}_t g_{1,t} = \left(Z^{\mathbb Q,k}_t g^k_{1,t}\right)_{k\in\mathfrak K}$ using a single measure $\mathbb Q$. More specifically, from (35), we have that
$$-d\left(Z^{\mathbb Q}_t\,g_{1,t}\right) = \left(Z^{\mathbb Q}_t\,\widehat A_t + Z^{\mathbb Q}_t\,G_t\,g_{1,t}\right)dt - d\tilde{\mathcal M}_t, \tag{36}$$
where $\tilde{\mathcal M} = (\tilde{\mathcal M}_t)_{t\in[0,T]}$ is an $\mathbb R^K$-valued $\mathbb Q$-martingale. The BSDE (36) is linear and its solution can be expressed in closed form. The following theorem provides a representation for the solution $g_1$ as well as $\{h^k_2\}_{k\in\mathfrak K}$ and $g_2$.

Theorem 3.5 (Solutions to the Mean-Field BSDEs).

I) Let $\mathbb Q$ be any probability measure such that $\mathbb Q\sim\mathbb P$. Then the BSDE (31) admits a closed form solution,
$$g_{1,t} = \mathbb E^{\mathbb Q}\!\left[\int_t^T (\mathcal E^{\mathbb Q}_t)^{-1}\,\mathcal E^{\mathbb Q}_u\,\widehat A_u\,du \,\middle|\, \mathcal F_t\right], \tag{37}$$
where $\mathcal E^{\mathbb Q}_t$ is the solution to the forward matrix-valued SDE
$$d\mathcal E^{\mathbb Q}_t = \mathcal E^{\mathbb Q}_t\left(G_t\,dt + (Z^{\mathbb Q}_t)^{-1}\,dZ^{\mathbb Q}_t\right),\quad \mathcal E^{\mathbb Q}_0 = Z^{\mathbb Q}_0, \tag{38}$$
where the deterministic matrix-valued function $G_t := \left(\Lambda + g_{2,t}\right)(2a)^{-1}$ and
$$Z^{\mathbb Q}_t = \operatorname{diag}\!\left(\left.\frac{d\mathbb P^k}{d\mathbb Q}\right|_{\mathcal F_t}\right)_{k\in\mathfrak K}. \tag{39}$$

II) There exists a unique solution $g_{2,t}$ to the matrix-valued ODE (30) that is bounded over the interval $[0,T]$. Moreover, let $Y:[0,T]\to\mathbb R^{2K\times K}$ be defined as
$$Y_t = e^{(T-t)B}\left(I_{(K\times K)},\,-2\Psi\right)^\top, \tag{40}$$
where $B\in\mathbb R^{2K\times 2K}$ is the block matrix
$$B = \begin{pmatrix} 0_{(K\times K)} & -(2a)^{-1} \\ -2\phi & \Lambda\,(2a)^{-1} \end{pmatrix}; \tag{41}$$
then, using the matrix partition $Y_t = (Y_{1,t},\,Y_{2,t})^\top$, where $Y_{1,t},Y_{2,t}\in\mathbb R^{K\times K}$, the function $g_{2,t}$ may be expressed as
$$g_{2,t} = Y_{2,t}\,Y_{1,t}^{-1}. \tag{42}$$

III) The ODE (25) admits the unique solution
$$h^k_{2,t} = -2\xi_k\left(\frac{\Psi_k\cosh\!\left(-\gamma_k(T-t)\right) - \xi_k\sinh\!\left(-\gamma_k(T-t)\right)}{\xi_k\cosh\!\left(-\gamma_k(T-t)\right) - \Psi_k\sinh\!\left(-\gamma_k(T-t)\right)}\right),\quad \forall k\in\mathfrak K, \tag{43}$$
where the constants $\gamma_k = \sqrt{\phi_k/a_k}$ and $\xi_k = \sqrt{\phi_k\,a_k}$. Moreover, $h^k_{2,t}\le 0$ for all $t\in[0,T]$.

Proof. See Appendix A.4.

This theorem shows that $g_1$ may be expressed in terms of any measure $\mathbb Q\sim\mathbb P$, which includes any of the $\{\mathbb P^k\}_{k\in\mathfrak K}$. The representations for $g_1$, $g_2$ and $h^k_2$ in (37), (42) and (43), respectively, together with the form of $\nu^{j,*}$ in (22), provide us with a candidate for the optimal control in the population limit. It only remains to ensure that this optimal control is indeed admissible, i.e., $\nu^{j,*}\in\mathcal A^j$. The following theorem provides sufficient conditions for this to hold.

Theorem 3.6.

Let us assume that $g_1\in\mathbb H^2_T$. Then the optimality equation (20) admits the solution
$$\nu^{j,*}_t = \bar\nu^{k,*}_t + \frac{1}{2a_k}\,h^k_{2,t}\left(q^{j,\nu^{j,*}}_t - \bar q^{k,\bar\nu^{k,*}}_t\right), \tag{44}$$
and the mean-field trading rate process $\bar\nu^* = \left(\bar\nu^{k,*}\right)_{k\in\mathfrak K}$ may be written
$$\bar\nu^*_t = (2a)^{-1}\left(g_{1,t} + g_{2,t}\,\bar q^{\bar\nu^*}_t\right), \tag{45}$$
where $g_1$, $g_2$ and $h^k_2$ are the functions defined in Theorem 3.5, and the mean-field inventory process $\bar q^{\bar\nu^*} = (\bar q^{k,\bar\nu^{k,*}})_{k\in\mathfrak K}$ is
$$\bar q^{\bar\nu^*}_t = \bar m + \int_0^t \bar\nu^*_u\,du .$$
Moreover, the collection of proposed optimal solutions satisfies
$$\nu^{j,*} = \arg\max_{\omega\in\mathcal A^j} H_j^{\bar\nu^*}(\omega) \tag{46}$$
for all $j\in\mathfrak N$.

Proof. See Appendix A.5.

Theorem 3.6 guarantees that, under the technical assumption that $g_1\in\mathbb H^2_T$, our proposed solution forms a Nash-equilibrium for the limiting mean-field game. Moreover, Theorem 3.4 guarantees that the solution is unique up to $\mathbb P\times\mu$ null sets. The condition $g_1\in\mathbb H^2_T$ holds for the class of models presented in Sections 5 and 7. While these models are not exhaustive, they provide an instructive class to study.

The optimal solution provided in Theorem 3.6 admits many interesting properties. Firstly, the mean-field trading rate in (45) contains two parts: (i) a ‘risk control’ portion $g_{2,t}\,\bar q^{\bar\nu^*}_t$, which is independent of the dynamics of the asset price process; and (ii) an ‘alpha trading’ or statistical arbitrage portion $g_1$.

The ‘risk control’ portion ($g_{2,t}\,\bar q^{\bar\nu^*}_t$) survives even when $A^k = 0$ for all $k\in\mathfrak K$, i.e., when the midprice process with total order-flow subtracted is a martingale, and it induces interactions between the various sub-populations due to their permanent impact. It can be shown through numerical examples that this function scales with the parameter matrices $\phi$ and $\Psi$ to make agents liquidate their inventories faster when either $\phi$ or $\Psi$ becomes large, thereby controlling the risk agents take while trading.

In the ‘alpha trading’ portion ($g_{1,t}$), agents adjust their trading based on a weighted average of $\widehat A$, the estimated drift of the asset price for all agents.
The weighting process $\mathcal E$ encodes both information about the ‘risk’ portion of the algorithm, $g_2$, as well as information about all other agents' measures through the process $Z$, which implicitly appears through the dynamics of $\mathcal E$. The weighting function compensates for the differing models agents use for the asset price, and adjusts the individual trading rates to account for the price impact due to ‘alpha trading’ of all other agents.

The Nash equilibrium provided in Theorem 3.6 resembles the one obtained in Casgrain and Jaimungal (2018), with the main differences lying in the expression for the value of the function $g_1$. The differences are important and reveal themselves in two ways. First, here, we have a stochastic weighting process $\mathcal E$, defined by the SDE (38), which replaces the deterministic time-ordered exponential function present in Casgrain and Jaimungal (2018). In fact, we can view $\mathcal E$ as the natural extension of the time-ordered exponential appearing in Casgrain and Jaimungal (2018) to the case of stochastic processes. Second, to determine the correction to trading, rather than weighting a single estimate of future alpha as in Casgrain and Jaimungal (2018), all posterior estimated alphas $\widehat A^k$ under all measures $\mathbb P^k$, $k\in\mathfrak K$, play a role. Finally, when $\mathbb P^k=\mathbb P^{k'}$ for all $k,k'\in\mathfrak K$, the optimal controls in Theorem 3.6 match the one presented in Casgrain and Jaimungal (2018).

Thus far, we have discussed the optimal mean-field strategy. The individual agents' trading rates also admit an interesting structure. An arbitrary agent trades at their own sub-population mean-field rate $\bar\nu^{k,*}$ plus a correction term proportional to the difference between their individual inventory and the mean-field inventory, $(q^{j,\nu^{j,*}} - \bar q^{k,\bar\nu^{k,*}})$. Since (44) implies $d(q^{j,\nu^{j,*}}_t - \bar q^{k,\bar\nu^{k,*}}_t) = \frac{1}{2a_k}h^k_{2,t}\,(q^{j,\nu^{j,*}}_t - \bar q^{k,\bar\nu^{k,*}}_t)\,dt$, this difference can be solved for in terms of the difference between the initial inventory of the agent and its sub-population prior mean:
$$\left(q^{j,\nu^{j,*}}_t - \bar q^{k,\bar\nu^{k,*}}_t\right) = \left(Q^j_0 - \bar m_k\right)\exp\!\left(\frac{1}{2a_k}\int_0^t h^k_{2,u}\,du\right), \tag{47}$$
where $h^k_{2,t}\le 0$ for all $t\in[0,T]$.
Therefore, the difference in inventories shrinks towards zero at a deterministic rate, and agents are consistently drawing their inventories closer to their sub-population's mean-field. As time elapses, the inventories of all agents in a sub-population come to resemble their sub-population's mean-field inventory.
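The closed form (43) and the deterministic shrinkage of inventory differences can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates (43), integrates the Riccati ODE (25) backwards with a classical Runge-Kutta scheme for comparison, and computes the decay factor $\exp\!\big(\tfrac{1}{2a_k}\int_0^t h^k_{2,u}\,du\big)$ that follows from integrating (44). All function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def h2_closed_form(t, T, a, phi, Psi):
    """Closed-form h^k_{2,t} of (43) for one sub-population."""
    gamma = math.sqrt(phi / a)
    xi = math.sqrt(phi * a)
    tau = T - t
    # cosh(-x) = cosh(x) and sinh(-x) = -sinh(x), so the sinh terms flip sign.
    num = Psi * math.cosh(gamma * tau) + xi * math.sinh(gamma * tau)
    den = xi * math.cosh(gamma * tau) + Psi * math.sinh(gamma * tau)
    return -2.0 * xi * num / den

def h2_numeric(t, T, a, phi, Psi, n_steps=2000):
    """RK4 integration of dh/dt = 2*phi - h^2/(2a), backwards from h(T) = -2*Psi."""
    def rhs(h):
        return 2.0 * phi - h * h / (2.0 * a)
    h = -2.0 * Psi
    dt = (T - t) / n_steps
    for _ in range(n_steps):
        # Each pass moves from time s to s - dt, hence the minus signs.
        k1 = rhs(h)
        k2 = rhs(h - 0.5 * dt * k1)
        k3 = rhs(h - 0.5 * dt * k2)
        k4 = rhs(h - dt * k3)
        h -= dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return h

def inventory_gap_factor(t, T, a, phi, Psi, n_steps=1000):
    """exp( (1/(2a)) * int_0^t h_{2,u} du ), approximated with the midpoint rule."""
    du = t / n_steps
    integral = du * sum(h2_closed_form((i + 0.5) * du, T, a, phi, Psi)
                        for i in range(n_steps))
    return math.exp(integral / (2.0 * a))

if __name__ == "__main__":
    params = dict(T=1.0, a=0.1, phi=0.5, Psi=1.0)  # hypothetical values
    for t in (0.0, 0.5, 0.9):
        print(t, h2_closed_form(t, **params), h2_numeric(t, **params),
              inventory_gap_factor(t, **params))
```

Because $h^k_{2,t}\le 0$, the decay factor lies in $(0,1]$ and decreases in $t$, which is the shrinkage of the inventory difference described above.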

4. The ε-Nash Equilibrium Property

In Section 3, we solve the stochastic game in the infinite population limit, and provide an exact representation of each agent's control at the Nash-equilibrium. One important question to ask is how the optimal MFG strategy performs in a finite-population game. We study the properties of the limiting strategy in the finite game by looking at how close the collection of limiting strategies, defined in Theorem 3.6, is to the true Nash-equilibrium of a game with only $N$ agents.

Let us consider a finite game with $N$ players, as described in Section 2. Let us assume that each of the agents in this population uses the strategy described in Theorem 3.6. Each agent computes the process $\bar\nu^*_t$ according to equation (45), and then uses these values to compute their own trading rate, $\nu^{j,*}_t$, according to equation (44). In the theorem that follows, we show that this collection of controls can serve as a quasi-Nash-equilibrium in a finite player game, provided that the population size is large enough.

Theorem 4.1 (ε-Nash equilibrium). Consider the collection of objective functionals $\{H_j : j\in\mathfrak N\}$ defined in equation (10) and the set of optimal mean-field controls $\{\nu^{j,*}\}_{j=1}^N$ defined in Theorem 3.6. Suppose that there exists a sequence $\{\delta_N\}_{N=1}^\infty$ such that $\delta_N\to 0$ and
$$\left|\frac{N^{(N)}_k}{N} - p_k\right| = o(\delta_N) \tag{48}$$
for all $k\in\mathfrak K$. Then
$$H_j(\nu^{j,*},\nu^{-j,*}) \le \sup_{\nu\in\mathcal A} H_j(\nu,\nu^{-j,*}) \le H_j(\nu^{j,*},\nu^{-j,*}) + o(N^{-1}) + o(\delta_N) \tag{49}$$
for each $j\in\mathfrak N$.

Proof. See Appendix A.8.

Theorem 4.1 shows that for any $\epsilon>0$, there exists $N_\epsilon$ such that for all $N>N_\epsilon$, agent-$j$ may improve their performance by at most $\epsilon$ by unilaterally deviating away from $\nu^{j,*}$. The statement of the theorem also reveals that the rate $N_\epsilon$ must be at least linear in $\epsilon^{-1}$ and is dependent on the rate at which $\delta_N$ vanishes in the limit. This theorem effectively demonstrates that the mean-field equilibrium $\{\nu^{j,*}\}_{j=1}^N$ serves as a viable alternative to the true finite-game equilibrium, provided the population size is large enough.

5. An Example Model of Disagreement

In this part, we provide an example model where the asset price process is modulated by a latent Markov chain, similarly to that in Casgrain and Jaimungal (2016). In our model, we assume the sub-populations disagree on the distribution of the initial value of the latent process, while they do agree on what the possible values of the latent state are, and agree on the transition rates between states. One can view this as all agents believing there are positive, neutral, and negative drift environments, but disagreeing on what the current environment is. We prove that the resulting optimal control presented in Theorem 3.6 exists and is well-defined, i.e., that $g_1\in\mathbb H^2_T$, under this general model assumption.

To this end, assume that the asset price satisfies the SDE
$$dS^{\nu^{(N)}}_t = \left(\sum_{i=1}^J \alpha^i_t\,\mathbb 1_{\{\Theta_t=\theta_i\}} + \sum_{k'\in\mathfrak K}\lambda_{k,k'}\,p^{(N)}_{k'}\,\bar\nu^{k',(N)}_t\right)dt + \sigma\,dW_t, \tag{50}$$
where $\Theta_t$ is a continuous-time Markov chain taking values in the set $\{\theta_i\}_{i\in\mathfrak J}$ ($\mathfrak J = \{1,\dots,J\}$) and where the processes $\alpha^i = (\alpha^i_t)_{t\in[0,T]}$ are $\mathcal F$-predictable processes satisfying $\alpha^i\in\mathbb H^2_T$ for all $i\in\mathfrak J$. In this model, agents across different sub-populations have different prior probabilities on the initial value of the latent process, so that under the measure $\mathbb P^k$, $\mathbb P^k(\Theta_0=\theta_i) = \pi_{k,i}\in(0,1)$ with $\sum_{i\in\mathfrak J}\pi_{k,i}=1$. We assume that under each measure $\mathbb P^k$, the latent Markov chain $\Theta_t$ has the same infinitesimal generator matrix $C$. Furthermore, we assume that $W$ is a standard Brownian motion in each measure $\mathbb P^k$, that $\sigma>0$, and that the permanent impact parameter $\lambda$ is the same in each measure, so that, in the notation of Section 2, we have $\lambda_k=\lambda$ for all $k\in\mathfrak K$.

This model may be interpreted as a case in which agents all agree on the dynamics of the asset price $S^{\nu}$ and the latent process $\Theta$ but disagree on the initial value of the latent process. The specification allows us to compute the expression for the processes $\{Z^{\mathbb P^k}_t\}_{k\in\mathfrak K}$, which are used to compute each agent's optimal strategy. With this model, we may compute the Radon-Nikodym derivative process $Z^{\mathbb P^k}_t$ for any measure $\mathbb P^k$.

Proposition 5.1.

Fix $\mathbb Q=\mathbb P^k$ for some $k\in\mathfrak K$. If the asset price dynamics follow the latent Markov chain model of equation (50), then $Z^{\mathbb P^k}_t$, defined in Theorem 3.5, may be expressed as
$$Z^{\mathbb P^k}_t = \sum_{j\in\mathfrak J} M^k_j\,\mathbb P^k\!\left(\Theta_0=\theta_j \,\middle|\, \mathcal F_t\right), \tag{51}$$
where for each $j\in\mathfrak J$ we define the diagonal matrix $M^k_j = \operatorname{diag}\!\left(\pi_{k',j}\big/\pi_{k,j}\right)_{k'\in\mathfrak K}$.

Proof. See Appendix A.6.

From expression (51), it is clear that $Z^{\mathbb P^k}$ is almost surely bounded, since $\mathbb P^k\!\left(\Theta_0=\theta_j \mid \mathcal F_t\right)\in[0,1]$ and $\pi_{k,i}\in(0,1)$ for all $k\in\mathfrak K$, $i\in\mathfrak J$. We use this fact in the proof of the following proposition.

Proposition 5.2.

Suppose that the asset price process is given by Equation (50). Then the solution $g_1$ defined in Theorem 3.5 satisfies $g_1\in\mathbb H^2_T$, and thus the results of Theorem 3.6 apply to the model described in this section.

Proof. See Appendix A.7.

Although we show that there exist models for which the mean-field optimal control presented in Theorem 3.5 is well defined, computing these controls presents us with another challenge. In particular, due to the complicated nature of the process $\mathcal E^{\mathbb Q}$, the conditional expected value appearing in the expression (37) for $g_1$ is difficult to compute. In Section 6, we address this issue by presenting a computational method to approximate such expressions.

The generator matrix $C\in\mathbb R^{J\times J}$ can be any matrix satisfying the conditions $C_{i,j}\ge 0$ for $i\ne j\in\mathfrak J$ and $C_{i,i} = -\sum_{j\ne i\in\mathfrak J} C_{i,j}$. $C$ defines the transition dynamics of the latent Markov chain $\Theta_t$ through the relation $\mathbb P^k\!\left(\Theta_{t+h}=\theta_i \mid \Theta_t=\theta_j\right) = \left(e^{hC}\right)_{i,j}$, where $e^{hC}$ represents the matrix exponential.

6. A Simulation-Based Computational Method

For most non-trivial models, evaluating the solution (37) of the BSDE for $g_{1,t}$ in closed form proves to be very difficult. To overcome this difficulty, we present a simulation-based computational method to approximate solutions. We propose a Least-Square-Monte-Carlo (LSMC) based method, which closely resembles the methods used to approximate solutions of BSDEs, as in Bender and Steiner (2012) and Gobet et al. (2005). Unlike these two methods, however, we do not concern ourselves with the computation of the martingale portion of the BSDE (36), since it is not required to compute $g_1$.

To this end, define the $M$-point uniform partition of the interval $[0,T]$, $\mathcal T := \{t_m := m\times\Delta,\ m=0,1,\dots,M\}$, where $M$ is a positive integer and where $\Delta := T/M$ is the discretization interval. We aim to approximate the process $g_1$ over the partition $\mathcal T$ with a discrete-time stochastic process $\hat g_1 = \{\hat g_{1,t_m}\}_{t_m\in\mathcal T}$, where each $\hat g_{1,t_m}\in\mathbb R^K$.

To derive an expression for $\hat g_1$, we first study the expression for $g_{1,t}$,
$$g_{1,t} = \mathbb E^{\mathbb Q}\!\left[\int_t^T (\mathcal E^{\mathbb Q}_t)^{-1}\,\mathcal E^{\mathbb Q}_u\,\widehat A_u\,du \,\middle|\, \mathcal F_t\right] \tag{52}$$
at the points $t_m\in\mathcal T$. This expression may be written recursively over $\mathcal T$ as follows:
$$g_{1,t_m} = \mathbb E^{\mathbb Q}\!\left[\int_{t_m}^{t_{m+1}} (\mathcal E^{\mathbb Q}_{t_m})^{-1}\,\mathcal E^{\mathbb Q}_u\,\widehat A_u\,du + (\mathcal E^{\mathbb Q}_{t_m})^{-1}\,\mathcal E^{\mathbb Q}_{t_{m+1}}\,g_{1,t_{m+1}} \,\middle|\, \mathcal F_{t_m}\right]. \tag{53}$$
Next, approximating the time-integral in the previous expression with its left end-point, we obtain the approximation
$$g_{1,t_m} \approx \mathbb E^{\mathbb Q}\!\left[\widehat A_{t_m}\Delta + (\mathcal E^{\mathbb Q}_{t_m})^{-1}\,\mathcal E^{\mathbb Q}_{t_{m+1}}\,g_{1,t_{m+1}} \,\middle|\, \mathcal F_{t_m}\right]. \tag{54}$$
A further simplification follows by approximating the term $(\mathcal E^{\mathbb Q}_{t_m})^{-1}\mathcal E^{\mathbb Q}_{t_{m+1}}$ for small values of $\Delta$. Using the definition of $\mathcal E^{\mathbb Q}$ in Equation (38), we may factor $\mathcal E^{\mathbb Q}$ as $\mathcal E^{\mathbb Q}_t = \tilde{\mathcal E}^{\mathbb Q}_t\,Z^{\mathbb Q}_t$, where $\tilde{\mathcal E}^{\mathbb Q}_t$ is the solution to the matrix-valued SDE $d\tilde{\mathcal E}^{\mathbb Q}_t = \tilde{\mathcal E}^{\mathbb Q}_t\left(Z^{\mathbb Q}_t\,G_t\,(Z^{\mathbb Q}_t)^{-1}\right)dt$ with initial condition $\tilde{\mathcal E}^{\mathbb Q}_0 = I_{(K\times K)}$. For $\Delta\ll 1$, we freeze the processes in parentheses at their $t_m$ values, so that $d\tilde{\mathcal E}^{\mathbb Q}_t \approx \tilde{\mathcal E}^{\mathbb Q}_t\left(Z^{\mathbb Q}_{t_m}\,G_{t_m}\,(Z^{\mathbb Q}_{t_m})^{-1}\right)dt$ over each interval $[t_m,t_{m+1})$, resulting in
$$(\tilde{\mathcal E}^{\mathbb Q}_{t_m})^{-1}\,\tilde{\mathcal E}^{\mathbb Q}_{t_{m+1}} \approx \exp\!\left\{Z^{\mathbb Q}_{t_m}\,G_{t_m}\,(Z^{\mathbb Q}_{t_m})^{-1}\Delta\right\} = Z^{\mathbb Q}_{t_m}\exp\{G_{t_m}\Delta\}\,(Z^{\mathbb Q}_{t_m})^{-1}, \tag{55}$$
where $\exp$ represents the matrix exponential. By plugging this last result into equation (54), we obtain an approximation $\hat g_1$ for the process $g_1$ at $t_m$ as
$$\hat g_{1,t_m} = \mathbb E^{\mathbb Q}\!\left[\widehat A_{t_m}\Delta + \exp\{G_{t_m}\Delta\}\,(Z^{\mathbb Q}_{t_m})^{-1}\,Z^{\mathbb Q}_{t_{m+1}}\,\hat g_{1,t_{m+1}} \,\middle|\, \mathcal F_{t_m}\right]. \tag{56}$$
The final step in obtaining values of $\hat g_1$ is to approximate the conditional expected value on the right-hand side of equation (56). As is often done, we project the conditional expectation onto a finite basis of stochastic processes. In particular, let $Y = (Y_t)_{t\in[0,T]}$ be a (vector-valued) stochastic process with $Y_t\in\mathbb R^L$, where $L$ is some positive integer, and write
$$\mathbb E^{\mathbb Q}\!\left[\widehat A_{t_m}\Delta + \exp\{G_{t_m}\Delta\}\,(Z^{\mathbb Q}_{t_m})^{-1}\,Z^{\mathbb Q}_{t_{m+1}}\,\hat g_{1,t_{m+1}} \,\middle|\, \mathcal F_{t_m}\right] \approx \left\langle Y_{t_m},\beta_{t_m}\right\rangle \tag{57}$$
for some collection $\{\beta_t\}_{t\in\mathcal T}$, where each $\beta_t\in\mathbb R^{L\times K}$, and where the process $Y$ can be chosen fairly arbitrarily. A common and sensible choice for $Y$ is a finite basis expansion of the state processes of the problem (i.e. $S^{\nu}_t$, $Z^{\mathbb Q}_t$, etc.) and combinations of them.

The algorithm then estimates the coefficients $\widehat\beta$ in a sequential manner. This is done by first simulating paths of $Y_t$ forward over the time partition $\mathcal T$ using the measure $\mathbb Q$, and then proceeding backwards in time from the boundary condition, solving a least-squares regression problem at each time step $t_m\in\mathcal T$ to obtain each of the coefficients $\widehat\beta$. The details of this algorithm are illustrated in Algorithm 1 below. Algorithm 1 is an application of the LSMC methods that already exist for BSDEs and we point the reader to Bender and Steiner (2012) and Gobet et al.
(2005) for more details on the convergence rates and error bounds.

    Data: Simulate M paths of $(Y_t, Z^{\mathbb Q}_t, \widehat A_t)$ over $\mathcal T$ using measure $\mathbb Q$
    Set $\widehat\beta_{t_M} = 0_{(L\times K)}$
    Set $\hat g_{1,t_M}(Y) = 0$
    for $m = M-1, M-2, \dots, 0$ do
        Set $\widehat\beta_{t_m} = \arg\min_\beta \sum_{n=1}^{M} \left\| \left\langle Y^n_{t_m},\beta\right\rangle - \left\{ \widehat A^n_{t_m}\Delta + \exp\{G_{t_m}\Delta\}\,(Z^{\mathbb Q,n}_{t_m})^{-1}\,Z^{\mathbb Q,n}_{t_{m+1}}\,\hat g_{1,t_{m+1}}(Y^n_{t_{m+1}}) \right\} \right\|^2$
        Set $\hat g_{1,t_m}(Y) = \left\langle Y, \widehat\beta_{t_m}\right\rangle$
    end

Algorithm 1: The LSMC algorithm used to approximate the value of the process $g_1$ given in Equation (37).

As the process $Z^{\mathbb Q}$ is defined as a diagonal matrix of Radon-Nikodym derivatives, it is possible to re-write the conditional expected value over $\mathbb Q$ in equation (56) in an element-wise fashion as
$$\hat g^k_{1,t_m} = \widehat A^k_{t_m}\Delta + \sum_{k'\in\mathfrak K}\left(\exp\{G_{t_m}\Delta\}\right)_{k,k'}\,\mathbb E^{\mathbb P^{k'}}\!\left[\hat g^{k'}_{1,t_{m+1}} \,\middle|\, \mathcal F_{t_m}\right],\quad \forall k\in\mathfrak K, \tag{58}$$
where $(\cdot)_{k,k'}$ represents element $(k,k')$ of the matrix. The above representation allows one to modify Algorithm 1 such that it eliminates the dependence on the process $Z^{\mathbb Q}$ in the LSMC procedure, but at the cost of having to simulate the basis process $Y$ across all measures $\{\mathbb P^k\}_{k\in\mathfrak K}$. We find that in examples where simulating the process $Z^{\mathbb Q}_t$ is straightforward, this is much less efficient than Algorithm 1 due to the need to simulate and store $K$ copies of the process $Y$. In cases where $Z^{\mathbb Q}$ is intractable, however, this modification may be a viable alternative for computing $\hat g_1$.

Equation (58) also provides additional insight into how the optimal policy is trading. As pointed out in Section 3.4, the process $g_1$ represents the ‘statistical arbitrage’ portion of the agent's optimal trading strategy.
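The backward regression of Algorithm 1 can be illustrated in a deliberately stripped-down setting. The sketch below is not the authors' implementation: it assumes a single sub-population with common beliefs (so the Radon-Nikodym ratio is identically one), a constant scalar $G$, a toy autoregressive process standing in for the filtered drift $\widehat A$, and the two-element basis $(1,\widehat A_{t_m})$. Under these assumptions the discretized recursion (56) has an exact linear solution, which gives a reference value for the regression slope. All names and parameter values are hypothetical.

```python
import math
import random

def ols_line(xs, ys):
    """Least-squares fit y ~ b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def lsmc_g1(n_paths=4000, n_steps=20, T=1.0, G=-0.5, kappa=1.0, sigma=1.0, seed=0):
    """Backward least-squares recursion for g_hat on the basis (1, A_hat)."""
    rng = random.Random(seed)
    dt = T / n_steps
    rho = math.exp(-kappa * dt)
    # Forward pass: simulate the toy drift process with a randomised start.
    paths = []
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)
        path = [x]
        for _ in range(n_steps):
            x = rho * x + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    growth = math.exp(G * dt)
    beta = (0.0, 0.0)  # terminal condition: g_hat at t_M is zero
    # Backward pass: regress the one-step target on the basis at t_m.
    for m in range(n_steps - 1, -1, -1):
        xs = [p[m] for p in paths]
        ys = [p[m] * dt + growth * (beta[0] + beta[1] * p[m + 1]) for p in paths]
        beta = ols_line(xs, ys)
    return beta  # (intercept, slope) of g_hat at t_0

def exact_discrete_slope(n_steps=20, T=1.0, G=-0.5, kappa=1.0):
    """Exact slope c_0 of the discretised recursion c_m = dt + e^{(G - kappa) dt} c_{m+1}."""
    dt = T / n_steps
    c = 0.0
    for _ in range(n_steps):
        c = dt + math.exp((G - kappa) * dt) * c
    return c

if __name__ == "__main__":
    print(lsmc_g1(), exact_discrete_slope())
```

In the multi-population case the regression target would also carry the factor $(Z^{\mathbb Q}_{t_m})^{-1}Z^{\mathbb Q}_{t_{m+1}}$ and a matrix exponential, and the basis would be enlarged accordingly; the structure of the backward loop is unchanged.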
Equation (58) further reveals that, over one step, agents of type-$k$ trade proportionally to the sum of their best estimate of the asset's drift ($\widehat A^k_{t_m}\Delta$) and a weighted average of the expected end-of-period ‘alpha’ from all sub-populations. Hence, agents trade based on expected exogenous price movements plus the effect they anticipate other traders' actions to have on the price. The weights generated by the matrix $\exp\{G_{t_m}\Delta\}$ serve to risk-adjust the agent's own alpha trading and to adjust for the impact of other agents based on the scale of their market impacts.

7. Numerical Experiments

This section showcases numerical experiments resulting from a particular model of differing beliefs. We first assess the performance of the LSMC algorithm presented in Section 6 by comparing, in the case of equal beliefs, to the analytical results in Casgrain and Jaimungal (2018). The algorithm is then used to approximate and simulate a finite collection of agents trading at the mean-field Nash-equilibrium when the agents have differing beliefs.

For the remainder of the section, we assume the asset price process follows a linear mean-reverting model of the type described in Section 5 with $K=2$ sub-populations. Define the un-impacted asset price process $F = (F_t)_{t\in[0,T]}$ to be the solution to the SDE
$$dF_t = \kappa\,(\Theta_t - F_t)\,dt + \sigma\,dW_t, \tag{59}$$
where $\kappa,\sigma>0$, $W = (W_t)_{t\in[0,T]}$ is a Wiener process in both measures $\mathbb P^1$ and $\mathbb P^2$, and $\Theta = (\Theta_t)_{t\in[0,T]}$ is a latent Markov chain with generator matrix $C$ which can take one of two values in the set $\{\theta_1,\theta_2\}$. The asset price process including the price impact is then defined as having the dynamics
$$dS^{\nu^{(N)}}_t = dF_t + \left(\lambda_1\,p^{(N)}_1\,\bar\nu^{1,(N)}_t + \lambda_2\,p^{(N)}_2\,\bar\nu^{2,(N)}_t\right)dt,$$
with $\lambda_1,\lambda_2>0$ and $(p^{(N)}_k)_{k\in\mathfrak K}$ defined in Section 2. We assume sub-population 1 believes the initial value of $\Theta$ has distribution $\pi_1$, while sub-population 2 believes the initial value has distribution $\pi_2$.
The dynamics of the asset price process cause it to mean-revert towards the value of $\Theta_t$, which may change over the course of the trading period $[0,T]$. Furthermore, this model falls into the class of models described in Section 5, which guarantees that the mean-field optimal solution from Theorem 3.6 exists and is well defined.

To assess the LSMC algorithm described in Section 6, we choose a special, non-trivial case where $g_1$ can be computed in closed form. The case we study is when $\mathbb P^1=\mathbb P^2=\mathbb P$. This reduces the market model to one where all agents agree on the dynamics of the asset price process. We may then assess the accuracy of the approximation by comparing the results produced by the LSMC algorithm to the closed-form solution of the optimal control in Casgrain and Jaimungal (2018).

For this particular experiment, we use the model presented in the previous section, but where the prior on the initial states of the latent process is the same for all agents. The two sub-populations of agents may, however, differ in their parameter triplet $(\Psi_k,\phi_k,a_k)$. For the experiments we use the parameters in Table 1. The parameters chosen for this experiment match the parameters used in the simulations in Section 5 of Casgrain and Jaimungal (2018). Due to the large value of the parameter $\Psi_k$, agents in both sub-populations are incentivized to fully liquidate their inventory positions before the end of the trading horizon. The risk-aversion parameter $\phi_k$ is 10 times larger in sub-population 2 than in sub-population 1. This can be interpreted as a model in which agents in sub-population 2 are averse to holding any inventory and are intent on liquidating their inventories as quickly as possible, while agents in sub-population 1 do not feel such urgency and are more open to trading on alpha.

Table 1: Population and impact parameters for the two sub-populations of agents (columns: $k$, $N_k$, $\Psi_k$, $\phi_k$, $a_k$).

We set $T=1$ to be the trading horizon for the model. The asset price process follows the Markov-modulated Ornstein-Uhlenbeck dynamics in (59), with parameters provided in Table 2. Table 2 also defines the parameters for the dynamics of the latent process $\Theta_t$. $\Theta_t$ is defined so that the asset price process mean-reverts either to $\theta=4.95$ or to $\theta=5.05$, depending on the state of $\Theta$. In this particular experiment, we set the distribution of $\Theta_0$ so that there is an equal chance of starting in each of the states. We also choose an asymmetric generator matrix so that the latent process is twice as likely to spend time in state 1 as in state 2.

Table 2: The parameters used for the asset price dynamics and for the latent process (entries: $\pi$, $C$, $\theta$, $\kappa$, $\sigma$, $\lambda_k$).

We run Algorithm 1 using 10 simulated paths over a partition of size 3600 over the interval $[0,T]$ and compare the results from the LSMC algorithm applied to these simulated paths to the closed form solution for $g_1$ in Casgrain and Jaimungal (2018). In this particular case, we set the basis process, $Y_t$, to be a second-order monomial expansion of $(S^{\bar\nu}_t,\pi_t)$ with product terms included, where we define $\pi^i_t = \mathbb P\!\left(\Theta_t=\theta_i \mid \mathcal F_t\right)$ for $i=1,2$. For details on how to compute such conditional probabilities, see Appendix B and (Casgrain and Jaimungal, 2016, Section 3).
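To give a feel for how such conditional probabilities behave, the following sketch implements a simple discrete-time Bayes filter for a two-state latent chain observed through the un-impacted price in (59). It is an Euler-type approximation written for illustration and is not the filter of Appendix B. It assumes the generator is stored with rows summing to zero, so that `C[i][j]` is the rate of jumping from state `i` to state `j`. The values of `kappa`, `sigma` and `C` used in the usage note are hypothetical, since only the two mean-reversion levels are taken from the text.

```python
import math

def stationary_two_state(C):
    """Stationary law of a 2-state chain; C[i][j] is the jump rate from i to j."""
    q01, q10 = C[0][1], C[1][0]
    return [q10 / (q01 + q10), q01 / (q01 + q10)]

def filter_step(pi, f_prev, f_next, dt, theta, kappa, sigma, C):
    """One filter step: Bayes update on the price increment, then chain prediction."""
    n = len(pi)
    dF = f_next - f_prev
    # Update: Gaussian likelihood of the increment under each latent state.
    weights = []
    for i in range(n):
        resid = dF - kappa * (theta[i] - f_prev) * dt
        weights.append(pi[i] * math.exp(-resid * resid / (2.0 * sigma * sigma * dt)))
    total = sum(weights)
    post = [w / total for w in weights]
    # Prediction: first-order transition matrix I + dt * C (needs dt * |C[i][i]| < 1).
    return [sum(post[i] * ((1.0 if i == j else 0.0) + dt * C[i][j]) for i in range(n))
            for j in range(n)]

def run_filter(path, pi0, dt, theta, kappa, sigma, C):
    """Run the filter along an observed price path and return the final posterior."""
    pi = list(pi0)
    for f_prev, f_next in zip(path[:-1], path[1:]):
        pi = filter_step(pi, f_prev, f_next, dt, theta, kappa, sigma, C)
    return pi

def noise_free_path(f0, state, n_steps, dt, theta, kappa):
    """Price path that follows the drift of a fixed latent state with no noise."""
    path = [f0]
    for _ in range(n_steps):
        path.append(path[-1] + kappa * (theta[state] - path[-1]) * dt)
    return path

if __name__ == "__main__":
    theta, kappa, sigma, dt = [4.95, 5.05], 5.0, 0.1, 0.01  # kappa, sigma hypothetical
    C = [[-1.0, 1.0], [2.0, -2.0]]                          # hypothetical generator
    print(stationary_two_state(C))
    path = noise_free_path(5.0, 1, 200, dt, theta, kappa)
    print(run_filter(path, [0.5, 0.5], dt, theta, kappa, sigma, C))
```

With the hypothetical generator above, the chain spends twice as much time in the first state as in the second, and a price path that drifts towards one level moves the posterior towards the corresponding state without ever reaching certainty, because the filter always allows for a recent jump.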

Figure 1: Error plots for the LSMC algorithm described in Section 6. In these plots, we compare the value of the LSMC estimate, $\hat g_1$, with the true value of $g_1$, in a special case where we can compute $g_1$ in closed form. The upper panel shows the standard deviation of the error, $\mathrm{SD}\!\left(\hat g^k_{1,t} - g^k_{1,t}\right)$, computed over 10 simulations, for each $k=1,2$ over $[0,T]$. The lower panel plots the quantity $\mathbb E\!\left[|g^k_{1,t} - \hat g^k_{1,t}|\right]\big/\,\mathbb E\!\left[|g^k_{1,t}|\right]$, computed over 10 simulations, and provides a measure of relative error.

Figure 1 shows that the LSMC algorithm performs well and with a high level of accuracy with this particular model. In particular, from the lower panel, we see that the largest relative error is about 1 . . 5% of the absolute size of $g_1$. We have also observed, as elsewhere in the LSMC literature, such as in Letourneau and Stentoft (2016) and Wang and Caflisch (2009), that randomizing the initial value of the state process, $(S_0,\pi_0)$, for the forward simulation portion of the algorithm significantly improves the estimates. Furthermore, the errors reported in this figure appear to be consistent across a wide variety of model parameters.

For the more general case in which there are different measures assigned to each population, we set $\mathbb Q$ to one of the sub-population measures and we enlarge the basis process $Y_t$ to be a monomial expansion of the forward state process $(S^{\bar\nu}_t, \{\pi^k_t\}_{k\in\mathfrak K}, \{Z^{\mathbb Q,k}_t\}_{k\in\mathfrak K})$. Expansions with respect to different bases, such as Laguerre or Hermite polynomials, are also possible; however, in our experiments, we find the monomial basis expansion performs well enough.

In this section, we simulate the full market with agents who hold differing beliefs. The example continues to use the model in Section 7.1 with the parameters in Tables 1 and 2, with the exception that the distribution on $\Theta_0$ now differs across the sub-populations. In particular, we assume agents in sub-population 1 believe that the prior distribution over initial states is $\pi_1$, while agents in sub-population 2 believe it is $\pi_2$. In other words, sub-population 1 believes that the latent process will much more likely begin in the higher state, while sub-population 2 assumes the reverse. In the simulation, we also assume the starting inventory of agents in sub-population $k$ has distribution $Q^j_0\sim\mathcal N(\bar\mu_k,\bar\sigma^2)$, where we set $\bar\mu_1=100$, $\bar\mu_2=0$ and $\bar\sigma=50$. The rationale is so that the risk-averse sub-population 1 begins the trading period long 100 shares on average, while agents in sub-population 2 begin the trading period with zero shares on average. Over the course of the simulation, we fix the path of the latent process to begin in the upper state and then jump down to the lower state at $t=0.5$.

To compute the trading strategy of each participating agent, we use the LSMC method from the preceding section to approximate the value of $g_1$ and then use this value in Theorem 3.6 to determine the optimal trading rate of the fictitious mean-field and then of each individual agent. At each time step, we compute the basis process $Y_t$ by using a fifth-order polynomial expansion of the state process $(S^{\bar\nu}_t, \{\pi^k_t\}_{k\in\mathfrak K}, \{Z^{\mathbb Q,k}_t\}_{k\in\mathfrak K})$, and use the coefficients obtained by the LSMC algorithm to obtain an approximation for $g_{1,t}$. Computing the values of $\pi^k_t$ and $Z^{\mathbb Q,k}_t$ requires the computation of a collection of posterior probabilities at each time step. To do this, we make use of the filtering and smoothing equations which are detailed in Appendix B.

Figure 2: State processes from a single simulated scenario of the market.

Left panel: inventory paths of all agents (sub-populations separated by color), the sub-population mean-field inventory process $\bar q^k$, and the sub-population empirical mean inventory $\bar q^{k,(N)}$. Right panel: (top) the unimpacted $F$ and impacted $S$ asset price processes, and the latent Markov chain $\Theta$; (middle) sub-population filters $\pi^{k,j}_t = \mathbb{P}^k(\Theta_t = \theta_j \mid \mathcal{F}_t)$ for the latent process state; (bottom) the Radon-Nikodym derivative process between the measures of the two sub-populations, restricted to $\mathcal{F}_t$.

Figure 2 shows one sample path of the simulation of all agents. The figure demonstrates a number of path-wise properties of the trading algorithm and of the beliefs of each sub-population of agents. Firstly, the left panel shows that the agents' inventory paths differ significantly between the sub-populations. As mentioned in Section 7.1, and resulting from the population parameters in Table 1, sub-population 1 is far more risk-averse than sub-population 2. This is reflected in the path-wise variance of their inventory.

Agents in sub-population 1 begin long the asset on average. As these agents are risk-averse, their main concern is to unwind their position quickly. They are, however, conscious of their own expectations of the future path of the asset price as well as the expectations of sub-population 2, which they use to adjust the rate at which their inventory is liquidated. This last effect can be seen through the variations of the inventory paths of sub-population 2 in Figure 2.

Agents in sub-population 2 are instead concerned with profiting from statistical arbitrage. They begin the trading period by incorrectly assigning a 90% probability that the latent process is in the upper state. Because of this, they expect the asset price to mean-revert downwards slightly, so they begin by taking a slight short position in the asset over the initial part of the trading period. By $t = 0.15$, the asset price has approximately reached the lower mean-reverting level. The agents then expect the asset price to revert upwards in the long run, since they expect the state of the latent process to switch. Because of this, they begin converting their short position into a long position in the asset over the subsequent time period. By $t = 0.4$, agents from group 2 are confident that the latent process is in the upper state. Moreover, using the same train of logic as before, they expect the price to mean-revert downwards in the long run, due to an expected switch in the latent process. Thus they gradually shift to a short position and repeat the same process. The magnitude of the long and short positions for sub-population 2 decreases as the end of the trading period approaches. This is because the agents are highly incentivized to completely liquidate their inventory before time $t = 1$, and therefore reduce their absolute exposure so that it is easier to do so.

From the center-right part of Figure 2, we also see that the posterior distributions over latent states for the two groups converge to one another as time progresses. This is because, although their priors are different, the agents are able to collect information so that the effect of the priors on the final posterior computation is negligible by a certain time. Furthermore, as was pointed out in the discussion following equation (58), the strategies of agents from different sub-populations feed into one another. This causes agents from different sub-populations to move synchronously with respect to one another, as seen in the left of Figure 2, where the upward and downward variations in agents' strategies happen simultaneously.

The actions of agents from both sub-populations demonstrate that the optimal control incorporates the beliefs of all agents and weighs them against their own. The filter paths in the middle right panel of Figure 2 show how both groups of agents eventually learn the true value of the latent state with high confidence. This occurs by observing the paths of the price process only, even if their beliefs on its initial state are incorrect.
The Radon-Nikodym derivative path in the bottom right panel of Figure 2 provides a sense of how far apart the measures for sub-populations 1 and 2 are. This process varies significantly over the course of the trading period since agents are constantly updating their estimate of the latent process by observing order-flow and the price paths. The variation in this process also demonstrates that there is a non-trivial interdependence between the actions of each agent and the beliefs of all other agents.

Using the same latent Markov model as in Section 7.2, we study the predicted behaviour of market prices and of market participants as we vary the degree of disagreement across sub-populations. We investigate the effect of disagreement on both the volatility of market prices and the total trading turnover of market participants.

We assume $K = 2$ sub-populations of equal size, each with $N_k = 30$ agents. The sub-populations have identical preferences but differ in their beliefs about the market. We set the agents' preference parameters to $\Psi_k = a_k$ = 10 − , $\phi_k$ = 5 × − for $k = 1, 2$ (the choice $\Psi_k = a_k$ means that agents are not necessarily forced to arrive at time $T$ with zero inventory). The initial inventory positions of agents are drawn from $Q^j_0 \sim N(0, \bar\sigma^2)$ for all $j \in N$, with $\bar\sigma = 50$.

The two-state latent Markov process $\Theta_t$ has generator matrix $C = 0$, and hence $\Theta_t = \Theta_0$ for all $t \in [0,T]$; however, $\Theta_0$ is random and inaccessible to agents. Table 3 lists the assumed parameters of the mean-reverting asset price process, as well as the latent process.

$S_0 = 5$, $\theta = \{4.95, 5.05\}$, $\kappa = 5$, $\sigma = 0.14$, and λ = λ = 10 − .

Table 3: The parameters used for the asset price dynamics to generate Figure 3.
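As a rough illustration of the setup around Table 3, the sketch below simulates an unimpacted mean-reverting price whose long-run level is a latent constant. It is a minimal sketch under stated assumptions, not the authors' simulation code: the dynamics $dF_t = \kappa(\Theta_0 - F_t)\,dt + \sigma\,dW_t$, the Euler-Maruyama discretization, the step count, and the function names `simulate_unimpacted_price` and `draw_latent_level` are all assumptions made for illustration, and the numerical defaults are only a reading of Table 3.

```python
import math
import random


def simulate_unimpacted_price(theta, s0=5.0, kappa=5.0, sigma=0.14,
                              horizon=1.0, n_steps=1000, rng=None):
    """Euler-Maruyama path of dF = kappa*(theta - F) dt + sigma dW.

    The dynamics and default values are assumptions for illustration only.
    """
    rng = rng or random.Random(0)
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        f = path[-1]
        shock = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        path.append(f + kappa * (theta - f) * dt + sigma * shock)
    return path


def draw_latent_level(levels=(4.95, 5.05), probs=(0.5, 0.5), rng=None):
    """Draw the latent level once; with generator C = 0 it never switches."""
    rng = rng or random.Random(0)
    return rng.choices(levels, weights=probs, k=1)[0]


rng = random.Random(42)
theta0 = draw_latent_level(rng=rng)
path = simulate_unimpacted_price(theta0, rng=rng)
print(f"latent level = {theta0}, terminal price = {path[-1]:.4f}")
```

Because the latent level is drawn once and then held fixed, agents can only infer it from the observed price path, which is what makes the choice of prior matter early in the trading period.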

To introduce disagreement into this setup, we assume that the sub-populations have different prior beliefs on the distribution of $\Theta_0$. In particular, we assume $\pi^1_0 = (0.5 + \Delta\pi,\ 0.5 - \Delta\pi)$ and $\pi^2_0 = (0.5 - \Delta\pi,\ 0.5 + \Delta\pi)$, where $\Delta\pi \in [0, 0.5)$ quantifies the level of disagreement across the two sub-populations. In simulations, we assume the true probability distribution of the latent process is $\mathbb{P}(\Theta_0 = 4.95) = \mathbb{P}(\Theta_0 = 5.05) = 0.5$.

Figure 3: Estimated statistics of the simulated market as the degree of disagreement $\Delta\pi$ varies. Left panel: standard deviation of the asset price. Center panel: absolute deviation of the asset price from the un-impacted price. Right panel: average absolute trading rate.

Figure 3 shows various statistics (standard deviation of price, average absolute deviation from the un-impacted price, and average absolute trading rate) of trading activity within the market resulting from 10 simulations. All three panels show a consistent increase in the plotted statistics as the level of disagreement increases. In particular, the right panel shows that trading volume increases, driving up the standard deviation of the asset price process (as seen in the left panel) and the net impact of trading (as shown in the center panel).

Extrapolating from the results of these experiments, we conclude that an increase in disagreement amongst a population of agents appears to increase market volume and asset price volatility. These observations are consistent with those of Bayraktar and Munk (2017), who also observe an increase in market activity as disagreement increases.
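The prior specification used in this experiment can be sketched in a few lines. The block below builds the two priors for a given $\Delta\pi$, runs a Bayesian filter for a constant latent level on a price path, and evaluates the likelihood ratio between the two sub-populations' measures as a prior-ratio-weighted sum of posterior probabilities, in the spirit of Proposition 5.1. It is a hedged sketch rather than the paper's filtering code: the Gaussian Euler increment likelihood, the ordering of the states as (4.95, 5.05), the parameter defaults, and every function name are assumptions introduced for illustration.

```python
import math


def gaussian_loglik(increment, level, price, kappa, sigma, dt):
    """Log-density of one Euler price increment if the latent level is `level`."""
    mean = kappa * (level - price) * dt
    var = sigma * sigma * dt
    return -0.5 * math.log(2.0 * math.pi * var) - (increment - mean) ** 2 / (2.0 * var)


def posterior(prior, path, levels, kappa=5.0, sigma=0.14, dt=0.001):
    """Posterior over a constant latent level after observing the whole path."""
    log_w = [math.log(p) for p in prior]
    for prev, nxt in zip(path[:-1], path[1:]):
        for i, level in enumerate(levels):
            log_w[i] += gaussian_loglik(nxt - prev, level, prev, kappa, sigma, dt)
    shift = max(log_w)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def disagreeing_priors(delta_pi):
    """Priors (0.5 + d, 0.5 - d) and (0.5 - d, 0.5 + d), with 0 <= d < 0.5."""
    return [0.5 + delta_pi, 0.5 - delta_pi], [0.5 - delta_pi, 0.5 + delta_pi]


def likelihood_ratio(prior_from, prior_to, post_from):
    """Sum over states of (prior_to / prior_from) * posterior under prior_from."""
    return sum(pt / pf * po for pf, pt, po in zip(prior_from, prior_to, post_from))


levels = (4.95, 5.05)
# Deterministic toy path drifting towards the upper level (noise switched off).
toy_path = [5.0]
for _ in range(1000):
    toy_path.append(toy_path[-1] + 5.0 * (5.05 - toy_path[-1]) * 0.001)

prior_1, prior_2 = disagreeing_priors(0.3)
post_1 = posterior(prior_1, toy_path, levels)
post_2 = posterior(prior_2, toy_path, levels)
z = likelihood_ratio(prior_1, prior_2, post_1)
print("posterior, sub-population 1:", post_1)
print("posterior, sub-population 2:", post_2)
print("likelihood ratio:", z)
```

On this toy path the two posteriors end up much closer to each other than the two priors, which mirrors the convergence of beliefs discussed for Figure 2; how quickly that happens in the full model depends on the noise level and on the agents' own price impact, neither of which is captured here.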

8. Conclusion

This paper introduced a stochastic game for a market in which sub-populations of agents have different risk preferences and beliefs on the model for the asset price process. By taking the infinite population limit of the model, we obtained a more tractable mean-field game (MFG) model for the market. Using tools from convex analysis, we provided an FBSDE characterization of the optimal control of each agent and thus of the Nash equilibrium of the MFG. This FBSDE is high dimensional, and non-standard as the martingale components for each dimension are martingales under different probability measures. Through change-of-measure techniques we obtained a solution to this FBSDE system and to the collection of mean-field optimal controls. We also demonstrated that the MFG optimal control satisfies the $\epsilon$-Nash property, which implies that the limiting Nash equilibrium can be arbitrarily close to the Nash equilibrium in the finite player game as long as the population size is large enough. Lastly, we provided an LSMC approximation to the MFG optimal control, and used it to study example simulations of markets near their Nash equilibrium. In a simulation setting, increasing disagreement among market participants appears to increase price volatility, price deviation from the un-impacted market price, and trading volume.

Appendix A. Proofs

Appendix A.1. Proof of Lemma 3.2

Proof.

To show that the claim holds, we need to show that for any $\rho \in (0,1)$,

$$H^{\boldsymbol\nu}_j(\rho\nu + (1-\rho)\omega) - \rho\, H^{\boldsymbol\nu}_j(\nu) - (1-\rho)\, H^{\boldsymbol\nu}_j(\omega) > 0 \tag{A.1}$$

for all $\nu, \omega \in \mathcal{A}^j$ where $\nu_t = \omega_t$ at most on $\mathbb{P}\times\mu$ null sets. By noting that

$$q^{j,\rho\nu+(1-\rho)\omega}_t = \rho\, q^{j,\nu}_t + (1-\rho)\, q^{j,\omega}_t, \tag{A.2}$$

we may compute the difference (A.1) using the representation (11) of $H^{\boldsymbol\nu}_j$ to obtain

$$\begin{aligned}
\text{LHS of (A.1)} &= \mathbb{E}^{\mathbb{P}^k}\Bigg[\int_0^T \Bigg\{ \rho \begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix} + (1-\rho) \begin{pmatrix}\omega_t\\ q^{j,\omega}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t\\ q^{j,\omega}_t\end{pmatrix} \\
&\qquad\quad - \Bigg(\rho \begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix} + (1-\rho) \begin{pmatrix}\omega_t\\ q^{j,\omega}_t\end{pmatrix}\Bigg)^{\!\top} \Gamma_k \Bigg(\rho \begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix} + (1-\rho) \begin{pmatrix}\omega_t\\ q^{j,\omega}_t\end{pmatrix}\Bigg) \Bigg\}\, dt \Bigg] \\
&= \mathbb{E}^{\mathbb{P}^k}\Bigg[\int_0^T \rho(1-\rho) \Bigg(\begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix} - \begin{pmatrix}\omega_t\\ q^{j,\omega}_t\end{pmatrix}\Bigg)^{\!\top} \Gamma_k \Bigg(\begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix} - \begin{pmatrix}\omega_t\\ q^{j,\omega}_t\end{pmatrix}\Bigg)\, dt \Bigg],
\end{aligned}$$

where the second equality follows from completing the square and where we define the matrix $\Gamma_k = \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix}$.

By defining the terms $\Delta_t = \nu_t - \omega_t$ and $q^{\Delta}_t = q^{j,\nu}_t - q^{j,\omega}_t$, we can expand the above expression to obtain

$$\text{LHS of (A.1)} = \rho(1-\rho)\, \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \left\{ a_k \Delta_t^2 + \phi_k \big(q^{\Delta}_t\big)^2 + 2\Psi_k \Delta_t\, q^{\Delta}_t \right\} dt\right]. \tag{A.3}$$

As $\rho \in (0,1)$ and $\phi_k \geq 0$, the middle term in (A.3) is $\geq 0$. Next, let us focus on the right-most term in (A.3). Because $q^{\Delta}_0 = 0$, we may write $q^{\Delta}_t = \int_0^t \Delta_u\, du$. Integrating by parts then yields

$$\mathbb{E}^{\mathbb{P}^k}\left[\int_0^T 2\Delta_t\, q^{\Delta}_t\, dt\right] = \mathbb{E}^{\mathbb{P}^k}\left[\big(q^{\Delta}_T\big)^2\right] \geq 0. \tag{A.4}$$

As $\Psi_k \geq 0$, this inequality implies that the right-most term in (A.3) is non-negative. Lastly, notice that if $(\mathbb{P}\times\mu)(\nu_t \neq \omega_t) > 0$, then $(\mathbb{P}^k\times\mu)(\nu_t \neq \omega_t) > 0$, so that

$$\mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \Delta_t^2\, dt\right] > 0. \tag{A.5}$$

As $a_k > 0$, this result, together with the inequalities for the other two terms, shows that (A.1) is strictly greater than zero.
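The completing-the-square step in this proof is a purely algebraic identity, $\rho\, x^\top \Gamma x + (1-\rho)\, y^\top \Gamma y - (\rho x + (1-\rho) y)^\top \Gamma (\rho x + (1-\rho) y) = \rho(1-\rho)(x-y)^\top \Gamma (x-y)$ for a symmetric matrix $\Gamma$, and it can be checked numerically. The sketch below does so on random inputs; the parameter values and the helper names `quad`, `concavity_gap` and `completed_square` are hypothetical and chosen only for illustration.

```python
import random


def quad(gamma, x):
    """Quadratic form x^T Gamma x for a 2x2 matrix and a 2-vector."""
    return sum(x[i] * gamma[i][j] * x[j] for i in range(2) for j in range(2))


def concavity_gap(gamma, x, y, rho):
    """rho*Q(x) + (1-rho)*Q(y) - Q(rho*x + (1-rho)*y)."""
    mix = [rho * xi + (1.0 - rho) * yi for xi, yi in zip(x, y)]
    return rho * quad(gamma, x) + (1.0 - rho) * quad(gamma, y) - quad(gamma, mix)


def completed_square(gamma, x, y, rho):
    """rho*(1-rho)*Q(x - y), the right-hand side of the identity."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return rho * (1.0 - rho) * quad(gamma, diff)


# Hypothetical preference parameters, not taken from the paper's tables.
a_k, psi_k, phi_k = 1e-3, 1e-3, 5e-4
gamma_k = [[a_k, psi_k], [psi_k, phi_k]]

rng = random.Random(1)
max_err = 0.0
for _ in range(1000):
    x = [rng.uniform(-10.0, 10.0), rng.uniform(-100.0, 100.0)]
    y = [rng.uniform(-10.0, 10.0), rng.uniform(-100.0, 100.0)]
    rho = rng.uniform(0.01, 0.99)
    err = abs(concavity_gap(gamma_k, x, y, rho) - completed_square(gamma_k, x, y, rho))
    max_err = max(max_err, err)
print(f"largest discrepancy: {max_err:.2e}")
```

With these hypothetical values the determinant $a_k\phi_k - \Psi_k^2$ is negative, so the quadratic form is not non-negative pointwise. This is consistent with the proof handling the cross term separately through the integration by parts in (A.4) rather than relying on $\Gamma_k$ being positive semi-definite.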

Appendix A.2. Proof of Lemma 3.3

Proof.

Using the definition of the Gâteaux derivative,

$$\left\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu), \omega \right\rangle = \lim_{\epsilon \searrow 0} \frac{H^{\boldsymbol\nu}_j(\nu + \epsilon\,\omega) - H^{\boldsymbol\nu}_j(\nu)}{\epsilon}, \tag{A.6}$$

we aim to show that this limit exists and is equal to the result provided in the lemma. Using the representation for the objective $H^{\boldsymbol\nu}_j$ in (11), canceling out the $t = 0$ terms, and using the linearity of the process $q^{j,\nu}_t - q^{j,\nu}_0$ in the variable $\nu$, we have

$$\begin{aligned}
H^{\boldsymbol\nu}_j(\nu + \epsilon\,\omega) - H^{\boldsymbol\nu}_j(\nu) &= \epsilon\, \mathbb{E}^{\mathbb{P}^k}\Bigg[\int_0^T \Bigg\{ (q^{j,\omega}_t - q^{j,\omega}_0)(A^k_t + \lambda_k^\top \boldsymbol\nu_t) - 2 \begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t\\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix} \Bigg\}\, dt\Bigg] \\
&\quad - \epsilon^2\, \mathbb{E}^{\mathbb{P}^k}\Bigg[\int_0^T \begin{pmatrix}\omega_t\\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t\\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix} dt\Bigg],
\end{aligned} \tag{A.7}$$

where $\Gamma_k = \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix}$. Dividing by $\epsilon$ and taking the limit yields

$$\left\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu), \omega \right\rangle = \mathbb{E}^{\mathbb{P}^k}\Bigg[\int_0^T \Bigg\{ (q^{j,\omega}_t - q^{j,\omega}_0)(A^k_t + \lambda_k^\top \boldsymbol\nu_t) - 2 \begin{pmatrix}\nu_t\\ q^{j,\nu}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t\\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix} \Bigg\}\, dt\Bigg]. \tag{A.8}$$

Expanding the right part of the integrand in (A.8) and re-grouping terms,

$$\left\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu), \omega \right\rangle = \mathbb{E}^{\mathbb{P}^k}\Bigg[\int_0^T (q^{j,\omega}_t - q^{j,\omega}_0)\Big(A^k_t + \lambda_k^\top \boldsymbol\nu_t - 2\big(\phi_k q^{j,\nu}_t + \Psi_k \nu_t\big)\Big)\, dt - \int_0^T 2\,\omega_t \big(a_k \nu_t + \Psi_k q^{j,\nu}_t\big)\, dt\Bigg]. \tag{A.9}$$

As $\nu, \omega \in \mathcal{A}^j$ and $\boldsymbol\nu, \widehat{A} \in \mathbb{H}^2_T$, the sufficient conditions for Fubini's theorem are met. Applying Fubini's theorem, the tower property and the fact that $\omega_t$ is $\mathcal{F}^j_t$-measurable,

$$\begin{aligned}
\left\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu), \omega \right\rangle
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\Big[\omega_t \Big(-2a_k\nu_t - 2\Psi_k q^{j,\nu}_T + \int_t^T \big\{A^k_u + \lambda_k^\top \boldsymbol\nu_u - 2\phi_k q^{j,\nu}_u\big\}\, du\Big)\Big]\, dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\Big[\omega_t \Big(-2a_k\nu_t - 2\Psi_k q^{j,\nu}_T + \mathbb{E}^{\mathbb{P}^k}\Big[\int_t^T \big\{A^k_u + \lambda_k^\top \boldsymbol\nu_u - 2\phi_k q^{j,\nu}_u\big\}\, du \,\Big|\, \mathcal{F}^j_t\Big]\Big)\Big]\, dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\Big[\omega_t \Big(-2a_k\nu_t - 2\Psi_k q^{j,\nu}_T + \int_t^T \mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u - 2\phi_k q^{j,\nu}_u \,\big|\, \mathcal{F}^j_t\big]\, du\Big)\Big]\, dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\Big[\omega_t \Big(-2a_k\nu_t - 2\Psi_k q^{j,\nu}_T + \int_t^T \mathbb{E}^{\mathbb{P}^k}\big[\mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u \,\big|\, \mathcal{F}^j_u\big] - 2\phi_k q^{j,\nu}_u \,\big|\, \mathcal{F}^j_t\big]\, du\Big)\Big]\, dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\Big[\omega_t \Big(-2a_k\nu_t - 2\Psi_k q^{j,\nu}_T + \int_t^T \big\{\mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u \,\big|\, \mathcal{F}^j_u\big] - 2\phi_k q^{j,\nu}_u\big\}\, du\Big)\Big]\, dt,
\end{aligned}$$

which gives the desired result.

Appendix A.3. Proof of Theorem 3.4

Proof.

By using Lemmas 3.2 and 3.3 we may apply the results of (Ekeland and Temam, 1999, Section 5), which state that, for each agent $j$,

$$\left\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu^{j,*}), \omega \right\rangle = 0, \ \forall \omega \in \mathcal{A}^j \quad \Longleftrightarrow \quad \nu^{j,*} = \arg\max_{\nu \in \mathcal{A}^j} H^{\boldsymbol\nu}_j(\nu). \tag{A.10}$$

Further, the strict concavity of $H^{\boldsymbol\nu}_j$ implies that $\nu^{j,*}$ is unique up to $\mathbb{P}\times\mu$ null sets. Therefore we need only demonstrate that $\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu^{j,*}), \omega \rangle = 0$, $\forall \omega \in \mathcal{A}^j$, if and only if $\nu^{j,*}$ is the solution to the FBSDE (21).

Sufficiency:

Suppose that $\nu^{j,*}$ is the solution to the FBSDE (21) and that $\nu^{j,*} \in \mathbb{H}^2_T$. We now show that $\nu^{j,*} \in \mathcal{A}^j$ and that $\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu^{j,*}), \omega \rangle = 0$, $\forall \omega \in \mathcal{A}^j$. First, the solution to the FBSDE may be represented implicitly as

$$2 a_k \nu^{j,*}_t = \mathbb{E}^{\mathbb{P}^k}\Big[-2\Psi_k q^{j,\nu^{j,*}}_T + \int_t^T \Big\{\mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u \,\big|\, \mathcal{F}^j_u\big] - 2\phi_k q^{j,\nu^{j,*}}_u\Big\}\, du \,\Big|\, \mathcal{F}^j_t\Big], \tag{A.11}$$

which demonstrates that $\nu^{j,*}$ is $\mathcal{F}^j$-adapted. Therefore, since $\nu^{j,*} \in \mathbb{H}^2_T$ and $\nu^{j,*}$ is $\mathcal{F}^j$-adapted, we have that $\nu^{j,*} \in \mathcal{A}^j$. Second, by inserting (A.11) into the expression for the Gâteaux derivative (18) from Lemma 3.3 and using the tower property, we find that it vanishes almost surely.

Necessity:

Suppose that $\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu^{j,*}), \omega \rangle = 0$, $\forall \omega \in \mathcal{A}^j$. Then

$$\mathbb{E}^{\mathbb{P}^k}\Big[-2a_k\nu^{j,*}_t - 2\Psi_k q^{j,\nu^{j,*}}_T + \int_t^T \Big\{\mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u \,\big|\, \mathcal{F}^j_u\big] - 2\phi_k q^{j,\nu^{j,*}}_u\Big\}\, du \,\Big|\, \mathcal{F}^j_t\Big] = 0, \quad \mathbb{P}\times\mu \text{ a.e.} \tag{A.12}$$

To see this, suppose that $\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu^{j,*}), \omega \rangle = 0$ for all $\omega \in \mathcal{A}^j$, but (A.12) does not hold. Then, choose $\tilde\omega = (\tilde\omega_t)_{t \in [0,T]}$ such that

$$\tilde\omega_t = \mathbb{E}^{\mathbb{P}^k}\Big[-2a_k\nu^{j,*}_t - 2\Psi_k q^{j,\nu^{j,*}}_T + \int_t^T \Big\{\mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u \,\big|\, \mathcal{F}^j_u\big] - 2\phi_k q^{j,\nu^{j,*}}_u\Big\}\, du \,\Big|\, \mathcal{F}^j_t\Big]. \tag{A.13}$$

Such $\tilde\omega$ is $\mathcal{F}^j$-adapted by its very definition. Second, as $\nu^k, \nu^{j,*}, A^k \in \mathbb{H}^2_T$, Jensen's inequality and the triangle inequality applied to (A.13) imply the bound

$$\mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T (\tilde\omega_t)^2\, dt\Big] \leq C_k \Big(\mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T (\nu^{j,*}_t)^2\, dt\Big] + \mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T \Big((A^k_t)^2 + (\lambda_k^\top \boldsymbol\nu_t)^2\Big)\, dt\Big]\Big) < \infty,$$

where the constant $C_k$ depends only on $a_k$, $\Psi_k$, $\phi_k$ and $T$. Hence, $\tilde\omega \in \mathbb{H}^2_T$ and therefore $\tilde\omega \in \mathcal{A}^j$. Inserting this choice of $\tilde\omega$ into the expression for the Gâteaux derivative (18), we see that $\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu^{j,*}), \tilde\omega \rangle > 0$, which contradicts the assumption that $\langle \mathcal{D} H^{\boldsymbol\nu}_j(\nu^{j,*}), \omega \rangle = 0$, $\forall \omega \in \mathcal{A}^j$.

Thus, using (A.12), noting that $\nu^{j,*}_t$ is $\mathcal{F}^j$-adapted, and using the tower property, we may write

$$2 a_k \nu^{j,*}_t = \mathbb{E}^{\mathbb{P}^k}\Big[-2\Psi_k q^{j,\nu^{j,*}}_T + \int_t^T \Big\{\mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u \,\big|\, \mathcal{F}^j_u\big] - 2\phi_k q^{j,\nu^{j,*}}_u\Big\}\, du \,\Big|\, \mathcal{F}^j_t\Big], \tag{A.14}$$

and

$$2 a_k M^j_t = \mathbb{E}^{\mathbb{P}^k}\Big[-2\Psi_k q^{j,\nu^{j,*}}_T + \int_0^T \Big\{\mathbb{E}^{\mathbb{P}^k}\big[A^k_u + \lambda_k^\top \boldsymbol\nu_u \,\big|\, \mathcal{F}^j_u\big] - 2\phi_k q^{j,\nu^{j,*}}_u\Big\}\, du \,\Big|\, \mathcal{F}^j_t\Big], \tag{A.15}$$

which solves the FBSDE in the statement of the proposition.

Appendix A.4. Proof of Theorem 3.5

We separate this proof into three parts, corresponding to each of the claims of the theorem.

Part (I):

To obtain the solution to $g_1$, we first compute the SDE for $\mathcal{E}^{\mathbb{Q}} g_1$, using the SDE for $g_1$ in Equation (31) and the SDE of $\mathcal{E}^{\mathbb{Q}}$ in Equation (38). After expanding the SDE for $\mathcal{E}^{\mathbb{Q}} g_1$ and grouping terms, we find

$$-d\big(\mathcal{E}^{\mathbb{Q}}_t g_{1,t}\big) = \mathcal{E}^{\mathbb{Q}}_t \widehat{A}_t\, dt - \mathcal{E}^{\mathbb{Q}}_t \Big\{ dM_t - (Z^{\mathbb{Q}}_t)^{-1} d\langle Z^{\mathbb{Q}}, M\rangle_t + (Z^{\mathbb{Q}}_t)^{-1} dZ^{\mathbb{Q}}_t\, \mathcal{E}^{\mathbb{Q}}_t g_{1,t} \Big\}. \tag{A.16}$$

As $Z^{\mathbb{Q}}$ is a Radon-Nikodym derivative process, it must be a $\mathbb{Q}$-martingale, and by extension, the term $\mathcal{E}^{\mathbb{Q}}_t (Z^{\mathbb{Q}}_t)^{-1} dZ^{\mathbb{Q}}_t\, \mathcal{E}^{\mathbb{Q}}_t g_{1,t}$ is the increment of a $\mathbb{Q}$-martingale. Next, by the Girsanov-Meyer theorem (Protter, 2005, Chapter III, Thm. 35), the remainder of the terms in the curly brackets of Equation (A.16) sum to the increment of a $\mathbb{Q}$-martingale. Because of this, we may re-write the BSDE for $\mathcal{E}^{\mathbb{Q}}_t g_{1,t}$ as

$$-d\big(\mathcal{E}^{\mathbb{Q}}_t g_{1,t}\big) = \mathcal{E}^{\mathbb{Q}}_t \widehat{A}_t\, dt - d\tilde{M}_t, \tag{A.17}$$

for some martingale term $\tilde{M}$. Using this last result, we may write out the implicit form of the solution as

$$\mathcal{E}^{\mathbb{Q}}_t g_{1,t} = \mathbb{E}^{\mathbb{Q}}\Big[\int_t^T \mathcal{E}^{\mathbb{Q}}_u \widehat{A}_u\, du \,\Big|\, \mathcal{F}^j_t\Big]. \tag{A.18}$$

Lastly, multiplying both sides by $(\mathcal{E}^{\mathbb{Q}}_t)^{-1}$, we obtain the stated solution.

Part (II):

The ODE (30) is a matrix-valued non-symmetric Riccati-type ODE. We prove the claims concerning the ODE (30) by applying theorems and tools for non-symmetric Riccati ODEs in Freiling et al. (2000) and Freiling (2002). Firstly, define $\tilde g_{2,t} = g_{2,T-t}$. We show that all of the claims hold for $\tilde g_{2,t}$, and hence also for $g_{2,t}$. From ODE (30),

$$\partial_t \tilde g_{2,t} = \big(\Lambda + \tilde g_{2,t}\big)(2a)^{-1} \tilde g_{2,t} - \phi, \qquad \tilde g_{2,0} = -2\Psi. \tag{A.19}$$

Next we aim to use Theorem 2.3 of Freiling et al. (2000) on $\tilde g_{2,t}$ to prove existence and boundedness of a solution. Using the notation in Freiling et al. (2000), define

$$B_{11} = 0, \quad B_{12} = -J, \quad B_{21} = -\phi, \quad B_{22} = \Lambda J, \tag{A.20}$$

and $W_0 = -2\Psi$, where $J = (2a)^{-1}$. To meet the requirements of Theorem 2.3 in Freiling et al. (2000), we must find $C, D \in \mathbb{R}^{K \times K}$, $C = C^\top$, so that $L + L^\top \leq 0$ and $C + DW_0 + W_0^\top D^\top > 0$, where

$$L = \begin{pmatrix} -D\phi & 0 \\ -CJ + D\Lambda J & -J^\top D \end{pmatrix}. \tag{A.21}$$

Let $D = I_{(K \times K)}$ and $C = 5\Psi$. With these choices of $C$ and $D$, and using the fact that $\Psi$ is a diagonal matrix with positive entries, we find that

$$C + DW_0 + W_0^\top D^\top = \Psi > 0, \tag{A.22}$$

which meets one of the necessary conditions. The choices of $C$ and $D$ also imply that the matrix $L$ takes the form

$$L = \begin{pmatrix} -\phi & 0 \\ -(5\Psi - \Lambda)J & -J \end{pmatrix}. \tag{A.23}$$

Next, as $\det(L) = \det(-\phi) \times \det(-J)$, the set of eigenvalues of $L$ is the union of the set of eigenvalues of $-\phi$ and those of $-J$. Because $-\phi \leq 0$ and $-J < 0$, all eigenvalues of $L$ are guaranteed to be non-positive, and at least one of them is guaranteed to be non-zero, implying that $L < 0$. Hence, $L + L^\top < 0$, so the conditions of the theorem are met and a solution exists. As $\tilde g_{2,t}$ exists and is continuous on the interval $[0,T]$, it follows that it is also bounded on this interval. Since the solution is guaranteed to exist and to be bounded, we may apply (Freiling, 2002, Thm 3.1), which guarantees that the solution is unique and takes the form (42), as desired.

Part (III):

The reader may verify that the presented solution for the Riccati ODE (25) is valid. Moreover, it is also easy to verify that the solution is bounded and continuous in the interval $[0,T]$. All that remains is to show that $h^k_{2,t} \leq 0$ for all $t \in [0,T]$. Notice that since $t < T$ and $\gamma_k \geq 0$, we have $\sinh(-\gamma_k(T-t)) \leq 0$ and $\cosh(-\gamma_k(T-t)) \geq 1$. As $\xi_k, \Psi_k \geq 0$, we find

$$\frac{\Psi_k \cosh(-\gamma_k(T-t)) - \xi_k \sinh(-\gamma_k(T-t))}{\xi_k \cosh(-\gamma_k(T-t)) - \Psi_k \sinh(-\gamma_k(T-t))} \geq 0, \tag{A.24}$$

and the desired result follows.

Appendix A.5. Proof of Theorem 3.6

To demonstrate the claim of the theorem, we need to show that the optimality conditions of Theorem 3.4 are fulfilled. As demonstrated in Section 3.3, if there exist solutions to the Riccati-type ODEs (25) for $\{h^k_{2,t}\}_{k \in K}$, the matrix-valued Riccati-type ODE (30) for $g_{2,t}$, as well as the vector-valued BSDE (31) for $g_{1,t}$, then the solution to the optimality FBSDE (20) follows the exact form presented in the statement of this theorem. In Theorem 3.5, we showed that there exist solutions to these equations, and hence the optimality FBSDE of Theorem 3.4 is solved.

All that remains to be shown is that the solution to the optimality FBSDE also belongs to an individual agent's set of admissible strategies, $\mathcal{A}^j$, and that the consistency conditions are met.

First, we show that $\nu^{j,*} \in \mathcal{A}^j$. To do this, we must demonstrate that $\nu^{j,*}$ is $\mathcal{F}^j$-predictable and contained in $\mathbb{H}^2_T$. By the definition of $Z^{\mathbb{Q}}$ in equation (39), it is an $\mathcal{F}$-adapted process, and by extension $\mathcal{E}^{\mathbb{Q}}_t$ must also be $\mathcal{F}$-predictable. Therefore, by the definition of the conditional expected value, the solution $g_{1,t}$ presented in Theorem 3.5 must be $\mathcal{F}$-predictable, and hence the mean-field processes $\{\nu^k\}_{k \in K}$ must all be $\mathcal{F}$-predictable as well. Lastly, since $\nu^{j,*}_t = \nu^k_t + \frac{h^k_{2,t}}{2a_k}\big(q^{j,\nu^{j,*}}_t - \bar q^{k,\nu^k}_t\big)$, since $\bar q^{k,\nu^k}_t$ is $\mathcal{F}$-predictable, and since $h^k_{2,t}$ is deterministic, we have that $\nu^{j,*}_t$ must be $\mathcal{F}^j$-adapted.

Next, we must show that $\nu^{j,*} \in \mathbb{H}^2_T$. Noting that $d\bar q^{\nu^*}_t = \nu^*_t\, dt = (g_{1,t} + g_{2,t}\, \bar q^{\nu^*}_t)\, dt$ and that $\bar q^{\nu^*}_0 = (\bar m_k)_{k \in K} = \bar m$, we can solve for $\bar q_t$ directly as

$$\bar q^{\nu^*}_t = \mathcal{E}\Big(\int_0^t g_{2,s}\, ds\Big)\, \bar m + \int_0^t \mathcal{E}\Big(\int_s^t g_{2,u}\, du\Big)\, g_{1,s}\, ds, \tag{A.25}$$

where $\mathcal{E}\big(\int_0^t g_{2,s}\, ds\big)$ is the time-ordered matrix exponential of $g_{2,s}$. Thus, by Young's inequality and the boundedness of $g_{2,t}$,

$$\begin{aligned}
\mathbb{E}^{\mathbb{P}^k} \int_0^T \|\bar q^{\nu^*}_u\|^2\, du &\leq 2\, \mathbb{E}^{\mathbb{P}^k}\Big( \|\bar m\|^2 \int_0^T \Big\|\mathcal{E}\Big(\int_0^t g_{2,s}\, ds\Big)\Big\|^2 dt + T \int_0^T \sup_{t \in [s,T]}\Big\|\mathcal{E}\Big(\int_s^t g_{2,u}\, du\Big)\Big\|^2 \|g_{1,s}\|^2\, ds \Big) && \text{(A.26)} \\
&\leq C_1 + C_2\, \mathbb{E}^{\mathbb{P}^k} \int_0^T \|g_{1,s}\|^2\, ds < \infty, && \text{(A.27)}
\end{aligned}$$

for some $C_1, C_2 > 0$, where $\|\cdot\|$ represents the $\ell^2$ operator norm. Hence, $\bar q^{\nu^*} \in \mathbb{H}^2_T$.

Next, using this last fact, if we compute the expected integrated squared norm of $\nu^*$ over $[0,T]$, we find that

$$\begin{aligned}
\mathbb{E}^{\mathbb{P}^k} \int_0^T \|\nu^*_u\|^2\, du &= \mathbb{E}^{\mathbb{P}^k} \int_0^T \big\|g_{1,u} + g_{2,u}\, \bar q^{\nu^*}_u\big\|^2\, du && \text{(A.28)} \\
&\leq 2\Big(\mathbb{E}^{\mathbb{P}^k} \int_0^T \|g_{1,u}\|^2\, du + \mathbb{E}^{\mathbb{P}^k} \int_0^T \|g_{2,u}\|^2\, \|\bar q^{\nu^*}_u\|^2\, du\Big) && \text{(A.29)} \\
&\leq C_1 + C_2\, \mathbb{E}^{\mathbb{P}^k} \int_0^T \|\bar q^{\nu^*}_u\|^2\, du < \infty, && \text{(A.30)}
\end{aligned}$$

for some (possibly different) constants $C_1, C_2 > 0$, where we use the fact that $g_{2,t}$ is bounded over the interval $[0,T]$ and the fact that $g_1 \in \mathbb{H}^2_T$ (as stated in the conditions of the theorem). Hence, $\nu^* \in \mathbb{H}^2_T$.

Next, notice that

$$\mathbb{E}^{\mathbb{P}^k} \int_0^T |\nu^{j,*}_u|^2\, du \leq 2\Big(\mathbb{E}^{\mathbb{P}^k} \int_0^T |\nu^{k,*}_u|^2\, du + \mathbb{E}^{\mathbb{P}^k} \int_0^T |\nu^{j,*}_u - \nu^{k,*}_u|^2\, du\Big). \tag{A.31}$$

As $\nu^{k,*} \in \mathbb{H}^2_T$, the above demonstrates that it is sufficient to show that $\nu^{j,*} - \nu^{k,*} \in \mathbb{H}^2_T$ to guarantee that $\nu^{j,*} \in \mathbb{H}^2_T$. Similarly to $\bar q_t$, if we notice that $d\big(q^{j,\nu^{j,*}}_t - \bar q^{k,\nu^{k,*}}_t\big) = (\nu^{j,*}_t - \nu^{k,*}_t)\, dt = \frac{h^k_{2,t}}{2a_k}\big(q^{j,\nu^{j,*}}_t - \bar q^{k,\nu^{k,*}}_t\big)\, dt$ and that $q^{j,\nu^{j,*}}_0 - \bar q^{k,\nu^{k,*}}_0 = Q^j_0 - \bar m_k$, we can solve exactly for this difference as

$$q^{j,\nu^{j,*}}_t - \bar q^{k,\nu^{k,*}}_t = \big(Q^j_0 - \bar m_k\big) \exp\Big(\int_0^t \frac{h^k_{2,u}}{2a_k}\, du\Big). \tag{A.32}$$

As $\mathbb{E}^{\mathbb{P}^k}\big[(Q^j_0)^2\big] < \infty$ and $h^k_{2,t} \leq 0$, we have $\big(q^{j,\nu^{j,*}}_t - \bar q^{k,\nu^{k,*}}_t\big) \in \mathbb{H}^2_T$. Using the solution to $\nu^{j,*}$ and the result above,

$$\mathbb{E}^{\mathbb{P}^k} \int_t^T |\nu^{j,*}_u - \nu^{k,*}_u|^2\, du \leq \frac{\sup_{t \in [0,T]} (h^k_{2,t})^2}{4a_k^2}\, \mathbb{E}^{\mathbb{P}^k} \int_t^T \big|q^{j,\nu^{j,*}}_u - \bar q^{k,\nu^{k,*}}_u\big|^2\, du < \infty, \tag{A.33}$$

where we use the boundedness of $h^k_{2,t}$. Hence $\nu^{j,*} - \nu^{k,*} \in \mathbb{H}^2_T$ and $\nu^{j,*} \in \mathbb{H}^2_T$. Thus we have demonstrated that $\nu^{j,*}$ is $\mathcal{F}^j$-predictable and that $\nu^{j,*} \in \mathbb{H}^2_T$. Therefore $\nu^{j,*} \in \mathcal{A}^j$.

Lastly, we demonstrate that the consistency conditions are met. In other words, we must show that

$$\nu^{k,*}_t = \lim_{N \to \infty} \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \nu^{j,*}_t \tag{A.34}$$

for all $t \in [0,T]$ and for all $k \in K$. Using the solution to $q^{j,\nu^{j,*}}_t - \bar q^{k,\nu^{k,*}}_t$, we find that

$$\lim_{N \to \infty} \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \big(\nu^{j,*}_t - \nu^{k,*}_t\big) = \frac{h^k_{2,t}}{2a_k} \exp\Big(\int_0^t \frac{h^k_{2,u}}{2a_k}\, du\Big) \lim_{N \to \infty} \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \big(Q^j_0 - \bar m_k\big). \tag{A.35}$$

Now since the $Q^j_0$ have bounded variance, the limit on the right vanishes as $N \to \infty$ by the law of large numbers. Hence, the consistency conditions are met.

The last statement follows from Theorem 3.4.

Appendix A.6. Proof of Proposition 5.1

Proof.

Let us first note that we may represent each element in $Z^{\mathbb{P}^k}_t$ as a Doob martingale since

$$\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\Big|_{\mathcal{F}_t} = \mathbb{E}^{\mathbb{P}^k}\Big[\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\Big|_{\mathcal{G}_T} \,\Big|\, \mathcal{F}_t\Big]. \tag{A.36}$$

Recall the global filtration $\mathcal{G} = (\mathcal{G}_t)_{t \in [0,T]}$ introduced in Section 2.2, with the property that $\mathcal{G}_t \supseteq \bigvee_{j \in N} \mathcal{F}^j_t$ for all $t \in [0,T]$. By this definition, we have that

$$Z^{\mathbb{P}^k}_t = \mathrm{diag}\Big(\mathbb{E}^{\mathbb{P}^k}\Big[\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\Big|_{\mathcal{G}_T} \,\Big|\, \mathcal{F}_t\Big]\Big)_{k' \in K}. \tag{A.37}$$

Each term $\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\big|_{\mathcal{G}_T}$ is in fact quite easy to compute. Recall that the only difference between the measures $\mathbb{P}^k$ and $\mathbb{P}^{k'}$ is the law of the initial value of the latent process, $\Theta_0$. For each $k \in K$, we have that $\mathbb{P}^k(\Theta_0 = \theta_j) = \pi^{k,j}_0$. Thus, we may write the expression for each Radon-Nikodym derivative conditional on $\mathcal{G}_T$ as

$$\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\Big|_{\mathcal{G}_T} = \sum_{i \in J} \frac{\pi^{k',i}_0}{\pi^{k,i}_0}\, \mathbb{1}_{\{\Theta_0 = \theta_i\}}. \tag{A.38}$$

As each ratio $\pi^{k',i}_0 / \pi^{k,i}_0$ is constant, taking the conditional expected value with respect to $\mathbb{P}^k$ yields

$$\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\Big|_{\mathcal{F}_t} = \sum_{i \in J} \frac{\pi^{k',i}_0}{\pi^{k,i}_0}\, \mathbb{P}^k\big(\Theta_0 = \theta_i \,\big|\, \mathcal{F}_t\big). \tag{A.39}$$

Assembling the $\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}$ terms above into a diagonal matrix, we find that the expression for $Z^{\mathbb{P}^k}_t$ follows the form in the statement of the proposition.

Appendix A.7. Proof of Proposition 5.2

Proof.

We need to show here that the expression for $g_{1,t}$ presented in Theorem 3.5 satisfies $g_1 \in \mathbb{H}^2_T$. In other words, we need to show that $\mathbb{E}^{\mathbb{P}^k}\big[\int_0^T \|g_{1,t}\|^2\, dt\big] < \infty$ for all $k \in K$.

The first step is to show that the operator norm of $\mathcal{E}^{\mathbb{P}^k}_t$ is almost surely bounded above when using the latent Markov chain model. For the remainder of this proof, we suppress the superscript $\mathbb{P}^k$ on $\mathcal{E}$ for ease of notation. Simply applying Itô's lemma, we find that $\mathcal{E}_t = \tilde{\mathcal{E}}_t Z^{\mathbb{P}^k}_t$, where $\tilde{\mathcal{E}}_t$ is the solution to the differential equation

$$d\tilde{\mathcal{E}}_t = \tilde{\mathcal{E}}_t\, Z^{\mathbb{P}^k}_t\, G_t\, (Z^{\mathbb{P}^k}_t)^{-1}\, dt \tag{A.40}$$

with the initial condition $\tilde{\mathcal{E}}_0 = I_{K \times K}$. Writing out the implicit solution of the differential equation and taking the operator norm we find that

$$\begin{aligned}
\big\|\tilde{\mathcal{E}}_t\big\| &= \Big\|I_{K \times K} + \int_0^t \tilde{\mathcal{E}}_u\, Z^{\mathbb{P}^k}_u\, G_u\, (Z^{\mathbb{P}^k}_u)^{-1}\, du\Big\| && \text{(A.41)} \\
&\leq 1 + \int_0^t \big\|\tilde{\mathcal{E}}_u\, Z^{\mathbb{P}^k}_u\, G_u\, (Z^{\mathbb{P}^k}_u)^{-1}\big\|\, du && \text{(A.42)} \\
&\leq 1 + \int_0^t \big\|\tilde{\mathcal{E}}_u\big\|\, \big\|Z^{\mathbb{P}^k}_u\big\|\, \big\|G_u\big\|\, \big\|(Z^{\mathbb{P}^k}_u)^{-1}\big\|\, du, && \text{(A.43)}
\end{aligned}$$

where we use the triangle inequality, Jensen's inequality and the sub-multiplicative property of the operator norm. As shown in Proposition 5.1, we know that $Z^{\mathbb{P}^k}_t$ is almost surely bounded over the interval $[0,T]$. From Theorem 3.5, we also know that $G_t$ is bounded over this same interval. Now, looking back to the definition of $Z_t$, we find that

$$Z^{-1}_t = \mathrm{diag}\Big(\frac{d\mathbb{P}^{k}}{d\mathbb{P}^{k'}}\Big|_{\mathcal{F}_t}\Big)_{k' \in K}, \tag{A.44}$$

which can also be expressed in the same way as presented in Proposition 5.1, which in turn implies that $Z^{-1}_t$ is almost surely bounded over $[0,T]$. Therefore, it follows that there exists a constant $C_1 > 0$ such that

$$\big\|\tilde{\mathcal{E}}_t\big\| \leq 1 + C_1 \int_0^t \big\|\tilde{\mathcal{E}}_u\big\|\, du. \tag{A.45}$$

Applying Grönwall's lemma to the above yields $\sup_{t \in [0,T]} \|\tilde{\mathcal{E}}_t\| \leq e^{C_1 T} < \infty$. Repeating the same analysis on $\|\tilde{\mathcal{E}}^{-1}_t\|$ yields the very same bound. Finally, since the operator norms of $Z^{\mathbb{P}^k}_t$, $(Z^{\mathbb{P}^k}_t)^{-1}$, $\tilde{\mathcal{E}}_t$ and $\tilde{\mathcal{E}}^{-1}_t$ are all bounded over $[0,T]$, there exists a constant $C_2 > 0$ such that $\sup_{t,u \in [0,T]} \|(\mathcal{E}_t)^{-1} \mathcal{E}_u\| < e^{T C_2}$.

Next, we wish to show that $\widehat{A} \in \mathbb{H}^2_T$. Under our model, we may compute $\widehat{A}^k$ as

$$\begin{aligned}
\widehat{A}^k_t &= \mathbb{E}^{\mathbb{P}^k}\Big[\sum_{i=1}^{J} \alpha^i_t\, \mathbb{1}_{\{\Theta_t = \theta_i\}} \,\Big|\, \mathcal{F}_t\Big] && \text{(A.46)} \\
&= \sum_{i \in J} \alpha^i_t\, \mathbb{P}^k\big(\Theta_t = \theta_i \,\big|\, \mathcal{F}_t\big). && \text{(A.47)}
\end{aligned}$$

Therefore, since all of the $\mathbb{P}^k$ terms in the above are bounded above by 1, we may use Young's inequality to write

$$\big\|\widehat{A}_t\big\|^2 \leq K \sum_{i \in J} \big\|\alpha^i_t\big\|^2. \tag{A.48}$$

As each $\alpha^i \in \mathbb{H}^2_T$, we get that $\widehat{A} \in \mathbb{H}^2_T$.

Now we can proceed to showing the main result. Using the bounds we derived above and Jensen's inequality, we may write

$$\begin{aligned}
\mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T \|g_{1,t}\|^2\, dt\Big] &\leq \mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T \Big\|\mathbb{E}^{\mathbb{P}^k}\Big[\int_t^T (\mathcal{E}_t)^{-1} \mathcal{E}_u \widehat{A}_u\, du \,\Big|\, \mathcal{F}_t\Big]\Big\|^2 dt\Big] && \text{(A.49)} \\
&\leq T\, \mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T \int_t^T \big\|(\mathcal{E}_t)^{-1} \mathcal{E}_u \widehat{A}_u\big\|^2\, du\, dt\Big] && \text{(A.50)} \\
&\leq T\, \mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T \int_t^T \big\|(\mathcal{E}_t)^{-1} \mathcal{E}_u\big\|^2\, \big\|\widehat{A}_u\big\|^2\, du\, dt\Big] && \text{(A.51)} \\
&\leq C_3\, \mathbb{E}^{\mathbb{P}^k}\Big[\int_0^T \big\|\widehat{A}_u\big\|^2\, du\Big] < \infty, && \text{(A.52)}
\end{aligned}$$

where $C_3$ is a constant depending only on $T$ and $C_2$, and where in the last line we use the fact that $\widehat{A} \in \mathbb{H}^2_T$. Thus, we find that $g_1 \in \mathbb{H}^2_T$, which verifies the claim of the proposition.

Appendix A.8. Proof of Theorem 4.1

We begin the proof of Theorem 4.1 by introducing a lemma regarding the distance between the mean-field game objective $H_j^{\nu^*}$ and the finite-player game objective $H_j$.

Lemma Appendix A.1. Let $\nu \in \mathcal{A}_j$ be some arbitrary admissible control and $\nu^{-j,*} \in \mathcal{A}_{-j}$ be the collection $\nu^{-j,*} := \big( \nu^{1,*}, \dots, \nu^{j-1,*}, \nu^{j+1,*}, \dots, \nu^{N,*} \big)$ of optimal controls defined by equation (44) in Theorem 3.6 for all agents except for $j$. Let us also assume that $\nu^* = \big( \nu^{k,*} \big)_{k \in K}$ follows the dynamics of equation (45) in Theorem 3.6. Then

$$ \Big| H_j(\nu, \nu^{-j,*}) - H_j^{\nu^*}(\nu) \Big| = o(\delta_N) + o\big( \tfrac{1}{N} \big) . \tag{A.53} $$

Proof.

Using the definitions of $H_j^{\nu^*}$ and $H_j$ and simplifying, we find that

\begin{align}
\Big| H_j(\nu, \nu^{-j,*}) - H_j^{\nu^*}(\nu) \Big| &= \left| \mathbb{E}^{\mathbb{P}^k}\left[ \sum_{k' \in K} \int_0^T \lambda_{k,k'} \Big( p^{(N)}_{k'} \nu^{k',(N)}_t - p_{k'} \nu^{k',*}_t \Big) \, dt \right] \right| \tag{A.54} \\
&\le \sum_{k' \in K} \lambda_{k,k'} \left| \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T \Big( p^{(N)}_{k'} \nu^{k',(N)}_t - p_{k'} \nu^{k',*}_t \Big) \, dt \right] \right| . \tag{A.55}
\end{align}

Therefore, it is sufficient to show that each of the expected values in the sum of (A.55) is $o(N^{-1}) + o(\delta_N)$.

Next, notice that using the definitions of $\nu^{k',(N)}_t$ and $p^{(N)}_{k'}$, we can decompose the difference of the mean-field rates into a part due to the agent's own rate and a part due to the rates of all others,

$$ p^{(N)}_{k'} \nu^{k',(N)}_t - p_{k'} \nu^{k',*}_t = \frac{1}{N} \big( \nu_t - \nu^{j,*}_t \big) + \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \Big( p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \nu^{k',*}_t \Big) , \tag{A.56} $$

where $\nu^{j,*}_t$ is the optimal control that agent $j$ would have taken in the limiting game.

Using the triangle inequality and Jensen's inequality along with the last result, we get that

$$ \text{(A.55)} \le \sum_{k' \in K} \lambda_{k,k'} \left( \frac{1}{N} \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T | \nu_t - \nu^{j,*}_t | \, dt \right] + \left| \mathbb{E}^{\mathbb{P}^k}\left[ \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \int_0^T \Big( p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \nu^{k',*}_t \Big) \, dt \right] \right| \right) . \tag{A.57} $$

It is clear that $\nu_t - \nu^{j,*}_t \in \mathcal{A}_j$, so we can guarantee that $\mathbb{E}^{\mathbb{P}^k}\big[ \int_0^T | \nu_t - \nu^{j,*}_t | \, dt \big]$ is bounded and independent of $N$. Therefore,

$$ \frac{1}{N} \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T | \nu_t - \nu^{j,*}_t | \, dt \right] = o\big( \tfrac{1}{N} \big) . \tag{A.58} $$

Therefore, all that is left to show is that the second term in the summand of (A.57) vanishes at an appropriate speed. By plugging in the decomposition

$$ p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \nu^{k',*}_t = \big( p^{(N)}_{k'} - p_{k'} \big) \nu^{j,*}_t + p_{k'} \big( \nu^{j,*}_t - \nu^{k',*}_t \big) \tag{A.59} $$

and using the triangle inequality and Jensen's inequality, we find that

\begin{align}
\left| \mathbb{E}^{\mathbb{P}^k}\left[ \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \int_0^T \Big( p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \nu^{k',*}_t \Big) \, dt \right] \right| &\le \Big| p^{(N)}_{k'} - p_{k'} \Big| \, \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T | \nu^{j,*}_t | \, dt \right] \tag{A.60} \\
&\quad + p_{k'} \left| \mathbb{E}^{\mathbb{P}^k}\left[ \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \int_0^T \big( \nu^{j,*}_t - \nu^{k',*}_t \big) \, dt \right] \right| . \tag{A.61}
\end{align}

As $\nu^{j,*} \in \mathcal{A}_j$, we find that $\mathbb{E}^{\mathbb{P}^k}\big[ \int_0^T | \nu^{j,*}_t | \, dt \big] < \infty$. Therefore, by the assumption of the theorem, we get

$$ \Big| p^{(N)}_{k'} - p_{k'} \Big| \, \mathbb{E}^{\mathbb{P}^k}\left[ \int_0^T | \nu^{j,*}_t | \, dt \right] = o(\delta_N) . \tag{A.62} $$

Next, using the structure of the solution for $\nu^{j,*}_t$ from Theorem 3.6, equation (A.32) and the fact that $h_{k,t}$ is bounded, we get

\begin{align}
\text{(A.61)} &= p_{k'} \left| \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \mathbb{E}^{\mathbb{P}^k}\left[ \big( Q^j - \bar{m}_k \big) \int_0^T e^{\int_0^t \frac{h_{k,u}}{a_k} \, du} \, dt \right] \right| \tag{A.63} \\
&\le C \left| \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \Big( \mathbb{E}^{\mathbb{P}^k}\big[ Q^j \big] - \bar{m}_k \Big) \right| = 0 . \tag{A.64}
\end{align}

Hence, the right-hand side of (A.57) is equal to $o(\delta_N) + o(N^{-1})$, and the claims of the lemma hold true.

Appendix A.8.1. Main Proof of Theorem 4.1

Proof.

We prove the result of the theorem by using Lemma Appendix A.1. First, let us note that by the definition of the supremum,

$$ H_j(\omega, \nu^{-j,*}) \le \sup_{\nu \in \mathcal{A}_j} H_j(\nu, \nu^{-j,*}) \tag{A.65} $$

holds for all $\omega \in \mathcal{A}_j$, and therefore the left-most inequality in the statement of Theorem 4.1 holds.

Next, we must show that the right-most inequality in the statement of Theorem 4.1 also holds. First, let us note that by Lemma Appendix A.1, for any $\nu \in \mathcal{A}_j$,

\begin{align}
H_j(\nu, \nu^{-j,*}) &\le H_j^{\nu^*}(\nu) + o(\delta_N) + o(N^{-1}) \tag{A.66} \\
&\le H_j^{\nu^*}(\nu^{j,*}) + o(\delta_N) + o(N^{-1}) , \tag{A.67}
\end{align}

where we use the fact that $H_j^{\nu^*}(\nu^{j,*}) = \sup_{\nu \in \mathcal{A}_j} H_j^{\nu^*}(\nu)$. Applying Lemma Appendix A.1 again, this time to the control $\nu^{j,*}$, we find that

$$ H_j(\nu, \nu^{-j,*}) \le H_j(\nu^{j,*}, \nu^{-j,*}) + 2\,o(\delta_N) + 2\,o(N^{-1}) . \tag{A.68} $$

As the above inequality holds for all $\nu \in \mathcal{A}_j$, we may take the supremum on the left and absorb the constants multiplying the little-$o$ terms to yield the final result,

$$ \sup_{\nu \in \mathcal{A}_j} H_j(\nu, \nu^{-j,*}) \le H_j(\nu^{j,*}, \nu^{-j,*}) + o(\delta_N) + o(N^{-1}) . \tag{A.69} $$

Appendix B. Filtering and Smoothing Equations

In Sections 5, 6 and 7 we refer to the Radon-Nikodym process $Z^{\mathbb{Q}}$ and to the $\mathcal{F}$-projected drift process $\hat{A}$, which are required for the approximation of the optimal control. This appendix provides the details on how these quantities are computed for the mean-reverting model used in the numerical experiments presented in Section 7.

Let us recall the model provided in Section 7. We assume that the un-impacted asset price process has the dynamics

$$ dF_t = \kappa \, ( \Theta_t - F_t ) \, dt + \sigma \, dW_t , $$

where $\Theta_t$ is a continuous-time Markov chain with generator matrix $C$ which takes values in the set $\{ \theta_i \}_{i=1}^{J}$. What varies across each measure $\mathbb{P}^k$ is the distribution over the initial state $\Theta_0$, where we assume that $\mathbb{P}^k( \Theta_0 = \theta_i ) = \pi^{k,i}_0$ for each $i \in \{ 1, 2, \dots, J \}$ and $k \in K$. Our first step will be to compute the $\mathcal{F}_t$-adapted process $\hat{A}_t = \big( \mathbb{E}^{\mathbb{P}^k}[ A_t \,|\, \mathcal{F}_t ] \big)_{k \in K}$. Using the dynamics of $F_t$, we get that

\begin{align}
\mathbb{E}^{\mathbb{P}^k}[ A_t \,|\, \mathcal{F}_t ] &= \mathbb{E}^{\mathbb{P}^k}[ \kappa ( \Theta_t - F_t ) \,|\, \mathcal{F}_t ] \\
&= \kappa \Big( \mathbb{E}^{\mathbb{P}^k}[ \Theta_t \,|\, \mathcal{F}_t ] - F_t \Big) \\
&= \kappa \left( \sum_{i=1}^{J} \theta_i \, \mathbb{P}^k\big( \Theta_t = \theta_i \,\big|\, \mathcal{F}_t \big) - F_t \right) .
\end{align}

Therefore, to compute $\hat{A}$ we need to compute the posterior probabilities of each state of $\Theta_t$, $\mathbb{P}^k\big( \Theta_t = \theta_i \,\big|\, \mathcal{F}_t \big)$. The lemma that follows gives an explicit way of computing these probabilities.

Lemma Appendix B.1 (Filtering Equation). Let us assume that the Novikov condition

$$ \mathbb{E}^{\mathbb{P}^k}\left[ \exp\left\{ \int_0^T ( A_u )^2 \, du \right\} \right] < \infty \tag{B.1} $$

holds for all $k \in K$. For each $i = 1, \dots, J$ and $k \in K$, let $\pi^{k,i}_t = \mathbb{P}^k\big( \Theta_t = \theta_i \,\big|\, \mathcal{F}_t \big)$, and define the processes $\Lambda^{k,i} = \big( \Lambda^{k,i}_t \big)_{t \in [0,T]}$, satisfying the dynamics

$$ d\Lambda^{k,i}_t = \Lambda^{k,i}_t \, \sigma^{-2} \kappa \, ( \theta_i - F_t ) \, dF_t + \sum_{j=1}^{J} C_{i,j} \, \Lambda^{k,j}_t \, dt , $$

along with the initial condition $\Lambda^{k,i}_0 = \pi^{k,i}_0$. Then the filters $\pi^{k,i}_t$ satisfy the relation

$$ \pi^{k,i}_t = \Lambda^{k,i}_t \Bigg/ \left( \sum_{j=1}^{J} \Lambda^{k,j}_t \right) . $$

Proof.

For the proof of this lemma, we refer the reader to the proof of a more general version of this statement found in (Casgrain and Jaimungal, 2016, Theorem 3.1).

The next task is to compute the process $Z^{\mathbb{Q}}$ for any choice of $\mathbb{Q} = \mathbb{P}^k$, $k \in K$. We can do this by applying Proposition 5.1 to the model dynamics that we have. Proposition 5.1 allows us to compute $Z^{\mathbb{P}^k}$, provided that we can compute the value of the time-0 smoothers for $\Theta$, $\mathbb{P}^k\big( \Theta_0 = \theta_i \,\big|\, \mathcal{F}_t \big)$. The following lemma provides an expression for the computation of these smoothers.

Lemma Appendix B.2 (Smoothing Equation). Assume that the Novikov condition (B.1) holds. For each $k \in K$ and $i, j \in \{ 1, 2, \dots, J \}$, let us define the process $\tilde{\Lambda}^{k,i,j} = \big( \tilde{\Lambda}^{k,i,j}_t \big)_{t \in [0,T]}$, where each $\tilde{\Lambda}^{k,i,j}$ satisfies the SDE

$$ d\tilde{\Lambda}^{k,i,j}_t = \tilde{\Lambda}^{k,i,j}_t \, \sigma^{-2} \kappa \, ( \theta_j - F_t ) \, dF_t + \sum_{\ell=1}^{J} C_{j,\ell} \, \tilde{\Lambda}^{k,i,\ell}_t \, dt , $$

and the initial condition $\tilde{\Lambda}^{k,i,j}_0 = \mathbb{1}_{\{ i = j \}}$. Then the time-0 smoother for $\Theta$ satisfies the equation

$$ \mathbb{P}^k\big( \Theta_0 = \theta_i \,\big|\, \mathcal{F}_t \big) = \left( \sum_{j=1}^{J} \pi^{k,i}_0 \, \tilde{\Lambda}^{k,i,j}_t \right) \Bigg/ \left( \sum_{i',\ell=1}^{J} \pi^{k,i'}_0 \, \tilde{\Lambda}^{k,i',\ell}_t \right) . $$

Proof.

For each $k \in K$, let us define the measure $\tilde{\mathbb{Q}}^k$, which is specified through the Radon-Nikodym derivative

$$ \zeta^k_t = \frac{d\mathbb{P}^k}{d\tilde{\mathbb{Q}}^k} \bigg|_{\mathcal{F}_t} = \exp\left\{ \int_0^t A_u \, \sigma^{-2} \, dF_u - \tfrac{1}{2} \int_0^t ( A_u )^2 \, \sigma^{-2} \, du \right\} . $$

The Radon-Nikodym derivative above is defined specifically so that under the measure $\tilde{\mathbb{Q}}^k$, $( F_t - F_0 ) \, \sigma^{-1}$ is a Brownian motion independent of $\Theta_t$, and so that the dynamics of $\Theta_t$ are left unchanged.

Using this new measure, we can re-represent the time-0 smoother we are looking for as

$$ \mathbb{P}^k\big( \Theta_0 = \theta_i \,\big|\, \mathcal{F}_t \big) = \frac{ \mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \mathbb{1}_{\{ \Theta_0 = \theta_i \}} \, \zeta^k_t \,\big|\, \mathcal{F}_t \big] }{ \mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \zeta^k_t \,\big|\, \mathcal{F}_t \big] } = \frac{ \mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \mathbb{1}_{\{ \Theta_0 = \theta_i \}} \, \zeta^k_t \,\big|\, \mathcal{F}_t \big] }{ \sum_{j=1}^{J} \mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \mathbb{1}_{\{ \Theta_0 = \theta_j \}} \, \zeta^k_t \,\big|\, \mathcal{F}_t \big] } . $$

Now, if we take a look at the term in the numerator, we can further expand it as

\begin{align}
\mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \mathbb{1}_{\{ \Theta_0 = \theta_i \}} \, \zeta^k_t \,\big|\, \mathcal{F}_t \big] &= \sum_{j=1}^{J} \mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \mathbb{1}_{\{ \Theta_0 = \theta_i \}} \, \mathbb{1}_{\{ \Theta_t = \theta_j \}} \, \zeta^k_t \,\big|\, \mathcal{F}_t \big] \\
&= \pi^{k,i}_0 \sum_{j=1}^{J} \mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \mathbb{1}_{\{ \Theta_t = \theta_j \}} \, \zeta^k_t \,\big|\, \mathcal{F}_t \vee \sigma( \Theta_0 = \theta_i ) \big] ,
\end{align}

where we use Bayes' rule to get to the last line.

Following the proof of (Casgrain and Jaimungal, 2016, Theorem 3.1), we find that

$$ \tilde{\Lambda}^{k,i,j}_t = \mathbb{E}^{\tilde{\mathbb{Q}}^k}\big[ \mathbb{1}_{\{ \Theta_t = \theta_j \}} \, \zeta^k_t \,\big|\, \mathcal{F}_t \vee \sigma( \Theta_0 = \theta_i ) \big] $$

satisfies the SDE found in the statement of the lemma, with the initial condition $\tilde{\Lambda}^{k,i,j}_0 = \mathbb{1}_{\{ i = j \}}$. Plugging this back into the previous expressions, we obtain the final result.

References

Bank, P., H. M. Soner, and M. Voß (2017). Hedging with temporary price impact. Mathematics and Financial Economics 11(2), 215–239.
Bayraktar, E. and A. Munk (2017). Mini-flash crashes, model risk, and optimal execution.
Bender, C. and J. Steiner (2012). Least-squares Monte Carlo for backward SDEs. In Numerical methods in finance, pp. 257–289. Springer.
Bensoussan, A., T. Huang, and M. Laurière (2018). Mean field control and mean field game models with several populations. arXiv preprint arXiv:1810.00783.
Bouchard, B., M. Fukasawa, M. Herdegen, and J. Muhle-Karbe (2018). Equilibrium returns with transaction costs. Finance and Stochastics 22(3), 569–601.
Cardaliaguet, P. and C.-A. Lehalle (2016). Mean field game of controls and an application to trade crowding. arXiv preprint arXiv:1610.09904.
Carmona, R. and F. Delarue (2013). Probabilistic analysis of mean-field games. SIAM Journal on Control and Optimization 51(4), 2705–2734.
Carmona, R., J.-P. Fouque, and L.-H. Sun (2013). Mean field games and systemic risk.
Cartea, Á., R. Donnelly, and S. Jaimungal (2017). Algorithmic trading with model uncertainty. SIAM Journal on Financial Mathematics 8(1), 635–671.
Casgrain, P. and S. Jaimungal (2016, Nov). Trading algorithms with learning in latent alpha models. Mathematical Finance, Forthcoming.
Casgrain, P. and S. Jaimungal (2018). Mean field games with partial information for algorithmic trading. arXiv preprint arXiv:1803.04094.
Choi, J. H., K. Larsen, and D. J. Seppi (2018). Smart TWAP trading in continuous-time equilibria. Available at SSRN 3146658.
Cirant, M. (2015). Multi-population mean field games systems with Neumann boundary conditions. Journal de Mathématiques Pures et Appliquées 103(5), 1294–1315.
Ekeland, I. and R. Temam (1999). Convex analysis and variational problems. SIAM.
Firoozi, D. and P. E. Caines (2015). ε-Nash equilibria for partially observed LQG mean field games with major agent: Partial observations by all agents. In Decision and Control (CDC), 2015 IEEE 54th Annual Conference on, pp. 4430–4437. IEEE.
Firoozi, D. and P. E. Caines (2016). Mean field game ε-Nash equilibria for partially observed optimal execution problems in finance. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pp. 268–275. IEEE.
Freiling, G. (2002). A survey of nonsymmetric Riccati equations. Linear algebra and its applications 351, 243–270.
Freiling, G., G. Jank, and A. Sarychev (2000). Non-blow-up conditions for Riccati-type matrix differential and difference equations. Resultate der Mathematik 37(1-2), 84–103.
Gobet, E., J.-P. Lemor, X. Warin, et al. (2005). A regression-based Monte Carlo method to solve backward stochastic differential equations. The Annals of Applied Probability 15(3), 2172–2202.
Guéant, O., J.-M. Lasry, and P.-L. Lions (2011). Mean field games and applications. Paris-Princeton lectures on mathematical finance 2010, 205–266.
Huang, M. (2010). Large-population LQG games involving a major player: the Nash certainty equivalence principle. SIAM Journal on Control and Optimization 48(5), 3318–3353.
Huang, M., P. E. Caines, and R. P. Malhamé (2007, Sep.). Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Autom. Control 52(9), 1560–1571.
Huang, M., R. P. Malhamé, P. E. Caines, et al. (2006). Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Communications in Information & Systems 6(3), 221–252.
Huang, X. and S. Jaimungal (2017). Robust stochastic games and systemic risk. Available at https://ssrn.com/abstract=3024021.
Huang, X., S. Jaimungal, and M. Nourian (2019). Mean-field game strategies for optimal execution. Applied Mathematical Finance 26(2), 153–185.
Lasry, J.-M. and P.-L. Lions (2007). Mean field games. Japanese journal of mathematics 2(1), 229–260.
Letourneau, P. and L. Stentoft (2016). Improved Greeks for American options using simulation.
Nourian, M. and P. E. Caines (2013). ε-Nash mean field game theory for nonlinear stochastic dynamical systems with major and minor agents. SIAM Journal on Control and Optimization 51(4), 3302–3331.
Protter, P. E. (2005). Stochastic differential equations. In Stochastic integration and differential equations. Springer.
Wang, Y. and R. Caflisch (2009). Pricing and hedging American-style options: a simple simulation-based approach.
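Supplementary sketch for Appendix B. The filtering and smoothing recursions of Lemmas Appendix B.1 and B.2 can be illustrated numerically along the following lines. This is a minimal sketch under stated assumptions, not the implementation used for the experiments in Section 7. The function names `simulate_price` and `filter_and_smoother` are hypothetical. The generator is assumed to follow the convention that `C[i][j]` is the jump rate from state j to state i (columns sum to zero), which is what makes the drift term $\sum_j C_{i,j} \Lambda^{k,j}_t$ a forward equation; a generator whose rows sum to zero would need to be transposed first. In place of a raw Euler step for the $dF_t$ term, the sketch uses the exponential (log-Euler) update implied by Itô's lemma, which keeps the weights positive, and it rescales the weights at every step, which is harmless because the recursions are linear and only ratios enter the filter and the smoother.

```python
import math
import random


def simulate_price(theta, C, state0, kappa, sigma, F0, T, n_steps, rng):
    """Euler scheme for dF = kappa*(Theta - F) dt + sigma dW with a hidden chain Theta.

    Assumed convention: C[i][j] is the jump rate from state j to state i.
    """
    J, dt = len(theta), T / n_steps
    F, state = [F0], state0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))
        F.append(F[-1] + kappa * (theta[state] - F[-1]) * dt + sigma * dW)
        # One-step transition probabilities out of the current state.
        probs = [(1.0 if i == state else 0.0) + C[i][state] * dt for i in range(J)]
        state = rng.choices(range(J), weights=probs)[0]
    return F


def filter_and_smoother(F, T, theta, C, pi0, kappa, sigma):
    """Return (filter, time-0 smoother) along the observed path F.

    M[j][i] plays the role of tilde-Lambda^{i,j}: started in state i, now in state j.
    """
    J, n_steps = len(theta), len(F) - 1
    dt = T / n_steps
    M = [[1.0 if i == j else 0.0 for i in range(J)] for j in range(J)]
    filt, smooth0 = [list(pi0)], [list(pi0)]
    for n in range(n_steps):
        dF = F[n + 1] - F[n]
        drift = [kappa * (th - F[n]) for th in theta]
        # Log-increment of the weights from the dF term, with the Ito correction.
        log_w = [a * dF / sigma**2 - 0.5 * a * a * dt / sigma**2 for a in drift]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        # Observation update: row j is reweighted by the likelihood of state j.
        M = [[w[j] * M[j][i] for i in range(J)] for j in range(J)]
        # Chain update: Euler step M <- (I + C dt) M for the generator term.
        M = [[M[j][i] + dt * sum(C[j][l] * M[l][i] for l in range(J))
              for i in range(J)] for j in range(J)]
        scale = sum(map(sum, M))
        M = [[m / scale for m in row] for row in M]
        # joint[j][i] is proportional to P(Theta_0 = i, Theta_t = j | F_t).
        joint = [[M[j][i] * pi0[i] for i in range(J)] for j in range(J)]
        total = sum(map(sum, joint))
        filt.append([sum(joint[j]) / total for j in range(J)])
        smooth0.append([sum(joint[j][i] for j in range(J)) / total for i in range(J)])
    return filt, smooth0
```

Running the same routine once per belief, with that belief's prior `pi0`, gives the posterior probabilities entering $\hat{A}$ and the time-0 smoothers entering $Z^{\mathbb{P}^k}$. Summing the joint weights over the initial state recovers the filter of Lemma Appendix B.1, so a single matrix recursion serves both lemmas.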