Mean-Field Games with Differing Beliefs for Algorithmic Trading
Forthcoming in Mathematical Finance
Philippe Casgrain (a), Sebastian Jaimungal (a)
(a) Department of Statistical Sciences, University of Toronto
Abstract
Even when confronted with the same data, agents often disagree on a model of the real world. Here, we address the question of how interacting heterogeneous agents, who disagree on what model the real world follows, optimize their trading actions. The market has latent factors that drive prices, and agents account for the permanent impact they have on prices. This leads to a large stochastic game, where each agent's performance criterion is computed under a different probability measure. We analyse the mean-field game (MFG) limit of the stochastic game and show that the Nash equilibrium is given by the solution to a non-standard vector-valued forward-backward stochastic differential equation. Under some mild assumptions, we construct the solution in terms of expectations of the filtered states. Furthermore, we prove the MFG strategy forms an $\epsilon$-Nash equilibrium for the finite player game. Lastly, we present a least-squares Monte Carlo based algorithm for computing the equilibria and show through simulations that increasing disagreement may increase price volatility and trading activity.
1. Introduction
Financial markets are immensely complicated dynamic systems which incorporate the interactions of millions of individuals on a daily basis. Market participants vary immensely, both in terms of their trading objectives and in their beliefs on the assets they are trading. All of these participants compete with one another in an attempt to achieve their own personal objectives in the most efficient way possible. Traded assets may also be driven by latent factors, and agents must dynamically incorporate data into their trading decisions.

In this paper, we propose a game theoretic model in which a large population of heterogeneous agents all trade the same asset. This model considers heterogeneity not only from the point of view of an individual's trading objectives and risk appetite, but also from the point of view of each agent's beliefs regarding the performance of the asset they are trading. We pay particular attention to the information each agent is privy to, in an attempt to render the framework as realistic as possible, while maintaining analytical tractability. We study the equilibrium of these markets by using the theory of mean-field games (MFGs), which studies the system as the number of participating agents becomes arbitrarily large. The general theory of mean-field games already has a large body of research associated with it. The original works stem from Huang et al. (2006), Huang et al. (2007),
SJ would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference numbers RGPIN-2018-05705 and RGPAS-2018-522715. Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Email addresses: [email protected] (Philippe Casgrain), [email protected] (Sebastian Jaimungal)

and Lasry and Lions (2007). Among the many extensions and generalizations which explore the broad theory of MFGs as well as their applications, we highlight the following works: Huang (2010) and Nourian and Caines (2013), who investigate MFGs with combinations of major and minor agents; Carmona and Delarue (2013), who develop a probabilistic analysis of MFGs; and Cirant (2015) and Bensoussan et al. (2018), who introduce MFGs with heterogeneous populations of agents. This theory has seen applications in various financial contexts, such as Guéant et al. (2011), who explore various applications of MFGs in economics; Carmona et al. (2013) and Huang and Jaimungal (2017), who study systemic risk; Huang et al. (2019), who study algorithmic trading in the presence of a major agent and a population of minor agents; Cardaliaguet and Lehalle (2016), who investigate optimal execution; and Firoozi and Caines (2015) and Firoozi and Caines (2016), who look at MFGs with partial information on states and apply them to algorithmic trading.

Other works that study differing beliefs of market participants include Bayraktar and Munk (2017), who study a system where agents believe the asset price is an arithmetic Brownian motion with a latent (constant) drift, and agents disagree on the prior distribution of this latent drift, as well as on the temporary and permanent impact trading has on prices. The authors do not seek an equilibrium, but rather look at how the differences in belief may cause mini-flash crashes. Bouchard et al. (2018) study a model where agents with differing risk aversion, who receive random endowments, trade assets whose drifts are determined in equilibrium. Under certain assumptions, the equilibrium results in asset prices having a permanent price impact component due to the existence of trading costs (temporary price impact). Choi et al. (2018) study how traders who penalize deviations from a target strategy, and have their own private information, form an equilibrium.

In contrast to other work on MFGs, as well as its specific application to algorithmic trading, here, motivated by Casgrain and Jaimungal (2016), we include latent states so that agents do not have full information about the system dynamics. Furthermore, motivated by Firoozi and Caines (2016) and Casgrain and Jaimungal (2018), who study a stochastic game with latent factors where agents have the same model beliefs, here, we study how varying beliefs among the agents affect the optimal trading behaviour. In our model, we express the belief of agents as a probability measure on the dynamics of the asset price process and of any latent processes that may be driving them. As far as the authors are aware, this is the first time that MFGs with varying beliefs have been treated in the literature. This generalization is quite non-trivial; nonetheless, we succeed in characterizing the model equilibrium as the solution to a non-standard forward-backward stochastic differential equation (FBSDE) defined across the collection of belief measures. We are able to present a closed form representation for the solution of the MFG, and it incorporates all of the market's differing beliefs into the decisions of the individual agents.

Our key result is that the optimal mean-field trading rate $\bar{\boldsymbol{\nu}}^*_t$ for the collection of sub-populations (within which agents have the same belief) can be written, up to the constant scaling by $2\boldsymbol{a}$ made explicit in (29), as
$$ \bar{\boldsymbol{\nu}}^*_t = \boldsymbol{g}_{1,t} + \boldsymbol{g}_{2,t}\, \bar{\boldsymbol{q}}^{\bar{\boldsymbol{\nu}}^*}_t , $$
where $\boldsymbol{g}_{2,t}$ is a deterministic matrix-valued process, $\boldsymbol{g}_{1,t}$ is stochastic and encodes the various beliefs of the agents and their expectations of the future dynamics of the asset price, and $\bar{\boldsymbol{q}}^{\bar{\boldsymbol{\nu}}^*}_t$ is the corresponding vector of mean-field inventories. Moreover, the individual agents' trading rates within sub-population $k$ can be written as
$$ \nu^{j,*}_t = \bar{\nu}^{k,*}_t + \frac{1}{2 a_k}\, h^k_{2,t} \left( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\bar{\nu}^{k,*}}_t \right) , $$
where $h^k_{2,t}$ is a deterministic function of time.
Hence, agents speed up or slow down relative to the mean-field trading rate depending on whether their current inventory is above or below their sub-population's mean-field inventory. The model setup does not penalize deviations from the mean-field, yet agents tend to revert to the mean-field; this is in contrast to many MFG formulations where the deviation from the mean-field is explicitly penalized.

We structure the remainder of the paper as follows. Section 2 introduces the market model and the stochastic game that agents participate in. Section 3 begins by introducing the MFG limit of the stochastic game and then proves that the collection of optimal strategies in the MFG may be represented as the solution to a system of coupled FBSDEs. Next, we solve the system of FBSDEs, and find the mean-field and each individual agent's strategy. In Section 4 we prove that the solution to the MFG satisfies the $\epsilon$-Nash equilibrium property in the finite population game. Section 5 provides a specific example of a model where the assumptions in the key results are satisfied. Lastly, Section 7 provides a least-squares Monte Carlo approach to computing certain expectations, as well as simulated examples of a market model with agents having differing beliefs.
2. The Model
In this section, we provide the market model and the participating agents' performance criteria. Our model closely resembles the model for the stochastic game in Casgrain and Jaimungal (2018). The stochastic game here aims to characterize a population of agents with several sources of heterogeneity. As in Casgrain and Jaimungal (2018), here, agents have varying trading objectives. In addition, however, agents are also characterized by their beliefs regarding the model driving the asset price process. In the remainder of this section, we present the trading mechanics which each of the agents use to interact with the market, as well as the objectives each of the agents seek to achieve with their actions.
The market consists of a population of $N$ agents, indexed by $j \in \mathfrak{N} := \{1, \dots, N\}$. The total population of agents is divided into $K \in \{1, \dots, N\}$ disjoint sub-populations, which are indexed by $k \in \mathfrak{K} := \{1, \dots, K\}$. The number of sub-populations $K$ is assumed to be constant and independent of $N$. All agents within a fixed sub-population have homogeneous beliefs and performance criteria. The set
$$ \mathcal{K}^{(N)}_k := \{\, j \in \mathfrak{N} : j \text{ is in sub-population } k \,\}, \quad \forall k \in \mathfrak{K}, \tag{1} $$
denotes the set of agents within sub-population $k$, and the superscript $(N)$ indicates the explicit dependence on the total number of agents. We also define $N^{(N)}_k := |\mathcal{K}^{(N)}_k|$ to be the total number of agents within sub-population $k$. We further assume the proportion of agents contained in each of the sub-populations remains stable as we take the population limit to infinity. More specifically, we require that the proportion of agents contained within sub-population $k$ satisfies
$$ \lim_{N \to \infty} p^{(N)}_k = p_k \in (0, 1), \quad \text{where} \quad p^{(N)}_k = \frac{N^{(N)}_k}{N} . \tag{2} $$
We work on the filtered probability space $(\Omega, \mathcal{G} = \{\mathcal{G}_t\}_{t \in [0,T]}, \mathbb{P})$ completed by the null sets of $\mathbb{P}$, where $T \in (0, \infty)$ is some fixed time horizon. All processes defined in the remainder of this section are $\mathcal{G}$-adapted, unless otherwise specified, and the notation $\mathbb{E}^{\mathbb{P}}[\,\cdot\,]$ represents expectation with respect to the measure $\mathbb{P}$.

All agents have the ability to buy and sell the asset over the fixed trading period $[0, T]$, after which all trading activity comes to a halt. Each agent $j \in \mathfrak{N}$ controls the amount they wish to purchase or sell at a continuous rate denoted $\nu^j = (\nu^j_t)_{t \in [0,T]}$, where $\nu^j_t > 0$ ($\nu^j_t < 0$) indicates the rate of buy (sell) orders the agent sends to the market. At the start of the trading period, each agent holds a random amount $Q^j_0$ of the asset. This may be interpreted as each agent having private information about their own holdings, whereas other market participants know only its distribution. Agents keep track of their holdings in the traded asset with the inventory process $q^{j,\nu^j} = (q^{j,\nu^j}_t)_{t \in [0,T]}$, where the superscript indicates the explicit dependence on the agent's controlled rate of trade. The relationship between agent-$j$'s trading rate and their inventory process is
$$ q^{j,\nu^j}_t = Q^j_0 + \int_0^t \nu^j_u \, du , \tag{3} $$
and may be interpreted as each agent buying or selling an amount $\epsilon\, \nu^j_t$ in each small time interval $[t, t + \epsilon)$.

Assumption 2.1. We make the technical assumption that the initial inventory holdings of all agents have a bounded variance, so that there exists $0 < C < \infty$ for which $\mathbb{E}^{\mathbb{P}}\big[(Q^j_0)^2\big] < C$, $\forall j \in \mathfrak{N}$. Moreover, we assume that the mean of the starting inventory levels is the same within a given sub-population, so that $\mathbb{E}^{\mathbb{P}}[Q^j_0] = m_k$ for each $j \in \mathcal{K}^{(N)}_k$.

Buying and selling actions of agents impact the price of the traded asset in a manner to be specified below. As well, agents believe the asset midprice follows (potentially) different models. We incorporate differing beliefs into our model by assigning a probability measure $\mathbb{P}^k$ to each sub-population $k \in \mathfrak{K}$. The various measures correspond to the model that agents in a particular sub-population believe to represent the true dynamics of the asset price.

We define the asset price process $S^{\nu^{(N)}} = (S^{\nu^{(N)}}_t)_{t \in [0,T]}$, where the superscript $\nu^{(N)} = (\nu^j)_{j \in \mathfrak{N}}$ indicates the dependence of the price on the actions of all agents in the market. It is useful to define the average trading rate $\bar{\nu}^{k,(N)} = (\bar{\nu}^{k,(N)}_t)_{t \in [0,T]}$ of all agents within sub-population $k$ as
$$ \bar{\nu}^{k,(N)}_t = \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \nu^j_t . \tag{4} $$
Each agent in sub-population $k$ then believes the asset price process follows the dynamics
$$ S^{\nu^{(N)}}_t = S_0 + \int_0^t \Big\{ A^k_u + \sum_{k' \in \mathfrak{K}} \lambda_{k,k'}\, p^{(N)}_{k'}\, \bar{\nu}^{k',(N)}_u \Big\} \, du + M^k_t , \tag{5} $$
where for each $k \in \mathfrak{K}$, $A^k = (A^k_t)_{t \in [0,T]}$ is a $\mathcal{G}$-predictable process, $M^k = (M^k_t)_{t \in [0,T]}$ is a $\mathcal{G}$-adapted $\mathbb{P}^k$-martingale, and $\lambda_{k,k'} > 0$, $\forall k, k' \in \mathfrak{K}$, are constants. We also assume the initial inventory holdings of the agents, $\{Q^j_0\}_{j \in \mathfrak{N}}$, are all independent of both $\{A^k\}_{k \in \mathfrak{K}}$ and $\{M^k\}_{k \in \mathfrak{K}}$ in each measure $\mathbb{P}^k$. The measure $\mathbb{P}^k$ effectively specifies sub-population $k$'s asset price model through the processes $A^k$ and $M^k$, as well as the scale of the market impact of each sub-population, through the set of constants $\{\lambda_{k,k'}\}_{k' \in \mathfrak{K}}$.

Assumption 2.2. We make the technical assumptions that $A^k \in \mathbb{H}^{2,k}_T$ and $M^k \in \mathbb{L}^{2,k}_T$, where
$$ \mathbb{H}^{2,k}_T = \Big\{ f : \Omega \times [0,T] \to \mathbb{R} \;:\; \mathbb{E}^{\mathbb{P}^k} \Big[ \int_0^T \|f_t\|^2 \, dt \Big] < \infty \Big\} \quad \text{and} \tag{6a} $$
$$ \mathbb{L}^{2,k}_T = \Big\{ f : \Omega \times [0,T] \to \mathbb{R} \;:\; \mathbb{E}^{\mathbb{P}^k} \big[ \|f_t\|^2 \big] < \infty, \ \forall t \in [0,T] \Big\} , \tag{6b} $$
for each $k \in \mathfrak{K}$ and where $\|\cdot\|$ represents the Euclidean norm.

Assumption 2.3. We assume that $\mathbb{P}^k \sim \mathbb{P}$ for all $k \in \mathfrak{K}$ and that the law of $Q^j_0$ under each measure $\mathbb{P}^k$ is the same as that under the measure $\mathbb{P}$.

Assumption 2.4. We assume that for each $k \in \mathfrak{K}$, $A^k$ and $M^k$ are uncontrolled, i.e., are unaffected by the agents' actions.

Our asset price process model does not require explicitly specifying $A^k$ and/or $M^k$ in advance. Rather, we only require the integrability conditions in (6); hence there is great flexibility in the class of models our approach accommodates. For example, $A^k$ and $M^k$ can be discontinuous or non-Markov, as long as they satisfy the appropriate integrability conditions. The key assumption we make is that price impact from the order-flow of all agents' trading is linear.

Remark 2.5.
Agents do not change their assigned belief measure, even after observing data from $(S^{\nu^{(N)}}_t)_{t \in [0,T]}$; however, their prior assumptions are updated to posterior estimates as time flows.

Each agent tracks their total accumulated cash process $X^{j,\nu^j} = (X^{j,\nu^j}_t)_{t \in [0,T]}$ throughout the trading period. When buying and selling the asset, each agent pays an instantaneous cost that is linearly proportional to the amount of shares transacted. This cost is expressed through the controlled dynamics of the cash process. For an agent $j \in \mathcal{K}^{(N)}_k$, their corresponding cash process is
$$ X^{j,\nu^j}_t = X^j_0 - \int_0^t \Big( S^{\nu^{(N)}}_u + a_k\, \nu^j_u \Big)\, \nu^j_u \, du , \tag{7} $$
where $a_k > 0$ is a constant specific to sub-population $k$ that sets the scale of the instantaneous cost. The penalty $-a_k \int_0^t (\nu^j_u)^2 \, du$ may be interpreted as a cost of trading too quickly, or a tractable proxy for the cost of crossing the bid-ask spread, as in Bouchard et al. (2018). It is also straightforward to include the influence of other agents in this cost, i.e., replace $S^{\nu^{(N)}}_u + a_k\, \nu^j_u$ by $S^{\nu^{(N)}}_u + \sum_{k' \in \mathfrak{K}} a_{k'}\, p^{(N)}_{k'}\, \bar{\nu}^{k',(N)}_u$.

In this market model, agents have restricted information over the course of the trading period. More specifically, agents have access only to the information generated by the paths of the asset price process $S^{\nu^{(N)}}$, their own inventory process $q^{j,\nu^j}$, and the average order flow of each sub-population, $\bar{\boldsymbol{\nu}}^{(N)} = \big(\bar{\nu}^{k,(N)}\big)_{k \in \mathfrak{K}}$. We express this information restriction in our model by restricting the sigma-algebra to which an agent's strategy may be adapted. For each $j \in \mathfrak{N}$, we only allow agent-$j$ to choose strategies contained within the set of admissible strategies,
$$ \mathcal{A}^j := \big\{ \omega \in \mathbb{H}^2_T \;:\; \omega \text{ is } \mathcal{F}^j\text{-predictable} \big\} , \tag{8} $$
where we define $\mathbb{H}^2_T = \bigcap_{k \in \mathfrak{K}} \mathbb{H}^{2,k}_T$, and
$$ \mathcal{F}^j_t = \sigma\Big( \big( S^{\nu^{(N)}}_u, \bar{\boldsymbol{\nu}}^{(N)}_u \big)_{u \in [0,t)} \Big) \vee \sigma\big( Q^j_0 \big) , \tag{9} $$
which is the sigma-algebra generated by the paths of the asset price process, the total order-flow process, and the starting inventory level for agent $j$. In definition (8), we deliberately restrict ourselves to processes in $\mathbb{H}^2_T$, to guarantee that $S^{\nu^{(N)}} \in \mathbb{H}^{2,k}_T$ for all $k \in \mathfrak{K}$.

Each agent chooses their trading strategy to maximize an objective functional that measures their performance over the course of the trading period $[0, T]$. For each $j \in \mathfrak{N}$ let $\mathcal{A}^{-j} := \times_{i \in \mathfrak{N},\, i \neq j}\, \mathcal{A}^i$. Each agent-$j$ within a sub-population $k \in \mathfrak{K}$ chooses a control $\nu^j \in \mathcal{A}^j$ to maximize a functional $H_j : \mathcal{A}^j \times \mathcal{A}^{-j} \to \mathbb{R}$ defined as
$$ H_j(\nu^j, \nu^{-j}) = \mathbb{E}^{\mathbb{P}^k} \Big[ X^{j,\nu^j}_T + q^{j,\nu^j}_T \Big( S^{\nu^{(N)}}_T - \Psi_k\, q^{j,\nu^j}_T \Big) - \phi_k \int_0^T \big( q^{j,\nu^j}_u \big)^2 \, du \Big] , \tag{10} $$
where $\Psi_k > 0$ and $\phi_k \geq 0$ are constants, and where we write $\nu^{-j} := \big( \nu^1, \dots, \nu^{j-1}, \nu^{j+1}, \dots, \nu^N \big)$ to indicate the dependence of the objective functional on the actions of all other agents in the population.

The objective functional corresponds to the agent trying to maximize a weighted average of three separate quantities. The first term, $X^{j,\nu^j}_T$, corresponds to the total amount of cash the agent has accumulated up until time $T$. The second term, $q^{j,\nu^j}_T \big( S^{\nu^{(N)}}_T - \Psi_k\, q^{j,\nu^j}_T \big)$, corresponds to the proceeds from liquidating all of the agent's leftover inventory at time $T$, minus a liquidation penalty controlled by the parameter $\Psi_k$. The last term, $-\phi_k \int_0^T (q^{j,\nu^j}_u)^2 \, du$, is a running risk-aversion penalty controlled by the parameter $\phi_k$, which incentivizes the agent to keep their market exposure low during the trading period. As demonstrated in Cartea et al. (2017), this term may also be interpreted as stemming from an agent's model uncertainty with respect to a continuum of measures, absolutely continuous with respect to the reference measure, where candidate measures are penalized by their relative entropy.

Each agent within sub-population $k$ has an objective functional that is computed by taking expectations under the measure $\mathbb{P}^k$. Hence, agents incorporate their own beliefs on the asset price dynamics. Furthermore, each functional $H_j$ depends on the actions of all other players ($\nu^{-j}$) through the dynamics of the asset price $S^{\nu^{(N)}}_t$, which implicitly appear in the definition (10).

By expanding the dynamics of each of the state processes present in (10), and by using integration by parts, we may re-write the agent's objective functional as
$$ H_j(\nu^j, \nu^{-j}) = C_j + \mathbb{E}^{\mathbb{P}^k} \Bigg[ \int_0^T \Bigg\{ q^{j,\nu^j}_t \, dS^{\nu^{(N)}}_t - \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix}^{\!\top} \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix} \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix} dt \Bigg\} \Bigg] , \tag{11} $$
where $C_j$ is a term that is constant with respect to $\nu^j$ and $\nu^{-j}$. Each agent's behaviour is characterized entirely by the objective functional they are trying to maximize. From (11), it is clear that the objective functional is parametric, so that the agent's preferences can be entirely described by the tuple $\big( a_k, \phi_k, \Psi_k, \mathbb{P}^k \big)$ and their starting inventory $Q^j_0$.

The market model defined above forms a stochastic game in which all participating agents are competing to maximize each of their own objectives. We wish to find and study this market at its Nash equilibrium. This equilibrium can be described more formally as the collection of admissible strategies $\{ \nu^{j,*} \in \mathcal{A}^j : j \in \mathfrak{N} \}$ which satisfies the condition
$$ \nu^{j,*} = \arg\max_{\omega \in \mathcal{A}^j} H_j\big( \omega, \nu^{-j,*} \big) , \quad \forall j \in \mathfrak{N} . \tag{12} $$
Obtaining this collection of strategies for the stochastic game with a finite number of players proves to be a difficult task.
As we make no assumptions on $\{\nu^j\}$ beyond the measurability and integrability assumptions required in the definition of $\mathcal{A}^j$, and in particular do not use a feedback form, a Nash equilibrium satisfying (12) will be an open-loop equilibrium in general. One of the main obstacles in finding a solution to this problem is that each agent's strategy is adapted to a different filtration $\mathcal{F}^j$. Furthermore, each of the objective functionals defined in equation (11) is expressed under one of $K$ different measures from the collection $\{\mathbb{P}^k\}_{k \in \mathfrak{K}}$, each representing the beliefs of the agents in a particular sub-population. These two features make the finite-population stochastic game difficult to solve directly. It is, however, possible to solve the stochastic game in the infinite population limit, and use the result as an approximation for the finite population game.

3. Solving the Mean-Field Stochastic Game

As the stochastic game presented in Section 2.4 presents obstacles when aiming to solve it directly, we now take a different avenue. In this section, we study the stochastic game as the population size tends towards infinity. The resulting limit is a stochastic mean-field game (MFG) that we can solve. Although we do not explicitly solve the finite player game presented in Section 2, by establishing an $\epsilon$-Nash equilibrium property in Section 4, we show that the equilibrium solution obtained for the MFG provides an approximation to the finite population game, provided that the population size is large enough.

This section begins by taking the population limit as $N \to \infty$, to obtain new objective functionals for the agents, resulting in a stochastic MFG. Next, using convex analysis methods, we characterize the Nash equilibrium as the solution to a coupled system of FBSDEs. We then conclude by presenting a solution to this FBSDE problem, and thus an exact representation of each agent's optimal control at the Nash equilibrium.
Agent-$j$'s objective functional (10) only depends on the population size $N$ through the dynamics of the mid-price process $S^{\nu^{(N)}}_t$, which is given by the dynamics in equation (5).

Assumption 3.1. To proceed, we assume that the limiting trading rate exists; in particular, there exist processes $\bar{\nu}^k = (\bar{\nu}^k_t)_{t \in [0,T]}$ for $k \in \mathfrak{K}$ such that $\bar{\nu}^k \in \mathbb{H}^2_T$ and
$$ \lim_{N \to \infty} \bar{\nu}^{k,(N)}_t = \bar{\nu}^k_t , \quad \mathbb{P} \times \mu \ \text{a.e.}, \tag{13} $$
where $\mu$ is the Lebesgue measure on the Borel sigma-algebra $\mathcal{B}_{[0,T]}$, and where $\mathbb{P} \times \mu$ is the canonical product measure of $\mathbb{P}$ and $\mu$.

As each individual $\nu^j$ is $\mathcal{F}^j$-predictable, $\bar{\nu}^k$ must be $\big( \bigvee_{j \in \mathfrak{N}} \mathcal{F}^j \big)$-predictable. Moreover, by our assumption that $\mathbb{P}^k \sim \mathbb{P}$ for each $k \in \mathfrak{K}$, the limit (13) also holds $\mathbb{P}^k \times \mu$ almost everywhere. From now on, we refer to each of the processes $\bar{\nu}^k$ as the mean-field trading rate for sub-population $k$.

Using the assumption that $p^{(N)}_k \to p_k$ for all $k \in \mathfrak{K}$ along with (13), we find that in the infinite population limit, from the perspective of agent-$j$ from sub-population $k$, the dynamics of the asset price process is
$$ S^{\bar{\boldsymbol{\nu}}}_t = S_0 + \int_0^t \Big\{ A^k_u + \sum_{k' \in \mathfrak{K}} \lambda_{k,k'}\, p_{k'}\, \bar{\nu}^{k'}_u \Big\} \, du + M^k_t . \tag{14} $$
In this limit, a single individual's impact on the price becomes negligible, thus the resulting mean-field trading rate $\bar{\nu}^k$ is unaffected by a single agent's trading rate $\nu^j$. Therefore, in the limit, each agent's objective $H_j$ no longer depends on the whole collection of trading rates $\nu^{-j}$, but instead only depends on the collection of mean-field processes $\{\bar{\nu}^k\}_{k \in \mathfrak{K}}$, which considerably simplifies the dependence structure within the game.
This can be interpreted as agents becoming 'price takers' in the mean-field limit resulting from the aggregate of agents' infinitesimal impact.

By using the objective functional representation in (11), expanding $dS^{\bar{\boldsymbol{\nu}}}_t$ from (14), and noticing that the martingale components vanish under expectation, we may write the agent's objective functional in the infinite population limit as
$$ H^{\bar{\boldsymbol{\nu}}}_j(\nu^j) = C_j + \mathbb{E}^{\mathbb{P}^k} \Bigg[ \int_0^T \Bigg\{ q^{j,\nu^j}_t \Big( A^k_t + \boldsymbol{\lambda}_k^{\top} \bar{\boldsymbol{\nu}}_t \Big) - \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix}^{\!\top} \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix} \begin{pmatrix} \nu^j_t \\ q^{j,\nu^j}_t \end{pmatrix} \Bigg\} \, dt \Bigg] , \tag{15} $$
where for each $k \in \mathfrak{K}$ we define $\boldsymbol{\lambda}_k \in \mathbb{R}^K$ as $\boldsymbol{\lambda}_k = (\lambda_{k,k'}\, p_{k'})_{k' \in \mathfrak{K}}$ and where we define $\bar{\boldsymbol{\nu}}_t \in \mathbb{R}^K$ as $\bar{\boldsymbol{\nu}}_t = (\bar{\nu}^k_t)_{k \in \mathfrak{K}}$. In the expression for $H^{\bar{\boldsymbol{\nu}}}_j$ in (15) we suppress the argument $\nu^{-j}$ as, in this infinite population limit, its effect is felt only through the mean-fields for each sub-population. We use the superscript in the notation for $H^{\bar{\boldsymbol{\nu}}}_j$ to indicate the dependence on the set of mean-fields.

Our new objective is to obtain the Nash equilibrium in this newly defined mean-field game. The Nash equilibrium for the MFG consists of finding the infinite collection of controls $\{\nu^{j,*}\}_{j=1}^{\infty}$ that satisfies the optimality condition
$$ \nu^{j,*} = \arg\max_{\omega \in \mathcal{A}^j} H^{\bar{\boldsymbol{\nu}}^*}_j(\omega) , \tag{16} $$
as well as the consistency condition
$$ \bar{\nu}^{k,*}_t = \lim_{N \nearrow \infty} \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \nu^{j,*}_t , \tag{17} $$
for all $k \in \mathfrak{K}$.

In the limit, the explicit dependence of an agent's actions in another agent's objective functional is replaced with an implicit dependence through the consistency condition.

To solve the optimization problem described in Section 3.1, we must determine what strategy maximizes the right-hand side of equation (16) for all agents. This is achieved by using tools from infinite dimensional convex analysis or variational calculus along the lines of Bank et al. (2017) (who investigate a single agent tracking problem that does not incorporate price information) and Casgrain and Jaimungal (2018) (who look at a multi-agent setting with price information, but where all agents use the same model). First, we demonstrate that each functional $H^{\bar{\boldsymbol{\nu}}}_j$ is a strictly concave functional of $\nu^j$. Next, as $H^{\bar{\boldsymbol{\nu}}}_j$ is a functional with an infinite-dimensional argument, we show that each functional $H^{\bar{\boldsymbol{\nu}}}_j$ is Gâteaux differentiable within the space $\mathcal{A}^j$ and compute the Gâteaux derivative explicitly. General results in convex optimization then state that if the derivative vanishes at a point within the space $\mathcal{A}^j$, it must be the point at which $H^{\bar{\boldsymbol{\nu}}}_j$ attains its supremum. The lemmas that follow give us the required properties for $H^{\bar{\boldsymbol{\nu}}}_j$.

Lemma 3.2.
The functional $H^{\bar{\boldsymbol{\nu}}}_j$ defined in equation (15) is strictly concave in $\mathcal{A}^j$ up to $\mathbb{P} \times \mu$ null sets.
Proof. See Appendix A.1.
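Lemma 3.2 can be illustrated numerically in a deterministic toy setting. The sketch below is not taken from the paper: it assumes a deterministic stand-in `drift` for $A^k_t + \boldsymbol{\lambda}_k^{\top} \bar{\boldsymbol{\nu}}_t$ (so the expectation in (15) is trivial), drops the constant $C_j$, uses a left-point Riemann sum, and picks arbitrary parameter values; the names `objective` and `concavity_gap` are hypothetical. The parameters are deliberately chosen with $\Psi_k^2 > a_k \phi_k$, so the $2 \times 2$ matrix in (15) is indefinite. Concavity still holds because $2 \Psi_k \int_0^T q_t \nu_t \, dt = \Psi_k (q_T^2 - Q_0^2)$, which is convex in the control.

```python
import math


def objective(nu, Q0, drift, a, Psi, phi, dt):
    """Left-point Riemann sum of the integrand in (15), deterministic toy case."""
    q, total = Q0, 0.0
    for nu_i, d_i in zip(nu, drift):
        quad = a * nu_i ** 2 + 2.0 * Psi * q * nu_i + phi * q ** 2
        total += (q * d_i - quad) * dt
        q += nu_i * dt  # inventory update, discretised version of (3)
    return total


def concavity_gap(n=1000):
    """Return H(midpoint) - average of H at two distinct toy controls."""
    dt = 1.0 / n
    drift = [0.1 * math.sin(2.0 * math.pi * i * dt) for i in range(n)]
    nu1 = [1.0] * n                              # constant buying
    nu2 = [-2.0 + 3.0 * i * dt for i in range(n)]  # sell first, then buy
    mid = [0.5 * (x + y) for x, y in zip(nu1, nu2)]
    # Arbitrary illustrative values: Q0, a, Psi, phi (note Psi**2 > a * phi).
    args = (0.5, drift, 0.1, 1.0, 0.01, dt)
    h_mid = objective(mid, *args)
    h_avg = 0.5 * (objective(nu1, *args) + objective(nu2, *args))
    return h_mid - h_avg


if __name__ == "__main__":
    # A strictly positive gap is what strict concavity predicts.
    print(concavity_gap())
```

A single pair of controls only illustrates the inequality and does not prove it; the proof is the one referenced in Appendix A.1.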
Lemma 3.3.
For an agent-$j$ in sub-population $k$, the functional $H^{\bar{\boldsymbol{\nu}}}_j$ defined in equation (15) is everywhere Gâteaux differentiable in $\mathcal{A}^j$. The Gâteaux derivative at a point $\nu \in \mathcal{A}^j$ in a direction $\omega \in \mathcal{A}^j$ can be expressed as
$$ \big\langle \mathcal{D} H^{\bar{\boldsymbol{\nu}}}_j(\nu), \omega \big\rangle = \mathbb{E}^{\mathbb{P}^k} \Bigg[ \int_0^T \omega_t \Bigg\{ -2 a_k\, \nu_t - 2 \Psi_k\, q^{j,\nu}_T + \int_t^T \Big( \mathbb{E}^{\mathbb{P}^k} \Big[ A^k_u + \boldsymbol{\lambda}_k^{\top} \bar{\boldsymbol{\nu}}_u \,\Big|\, \mathcal{F}^j_u \Big] - 2 \phi_k\, q^{j,\nu}_u \Big) du \Bigg\} \, dt \Bigg] . \tag{18} $$
Proof. See Appendix A.2.

Therefore, since $H^{\bar{\boldsymbol{\nu}}}_j$ is concave, the supremum of $H^{\bar{\boldsymbol{\nu}}}_j$ is attained at a point $\nu \in \mathcal{A}^j$ if and only if the expression (18) vanishes for all $\omega \in \mathcal{A}^j$. Moreover, the strict concavity of $H^{\bar{\boldsymbol{\nu}}}_j$ guarantees that such a point is unique up to $\mathbb{P} \times \mu$ null sets. Indeed, as the following theorem shows, the collection of points $\{\nu^j\}_{j=1}^{\infty}$ that ensures (18) vanishes for all $j \in \mathfrak{N}$, and for all $\omega \in \mathcal{A}^j$, coincides with the solution of an infinite-dimensional system of FBSDEs.

Theorem 3.4.
We have that
$$ \nu^{j,*} := \arg\max_{\nu \in \mathcal{A}^j} H^{\bar{\boldsymbol{\nu}}^*}_j(\nu) \tag{19} $$
for all $j \in \mathfrak{N}$ if and only if for each agent-$j$ in sub-population $k$, $\nu^{j,*} \in \mathbb{H}^2_T$ and $\nu^{j,*}$ is the unique strong solution to the FBSDE
$$ \begin{cases} -d\big( 2 a_k\, \nu^{j,*}_t \big) = \Big( \mathbb{E}^{\mathbb{P}^k} \Big[ A^k_t + \boldsymbol{\lambda}_k^{\top} \bar{\boldsymbol{\nu}}^*_t \,\Big|\, \mathcal{F}^j_t \Big] - 2 \phi_k\, q^{j,\nu^{j,*}}_t \Big) dt - d\mathcal{M}^j_t , \\[4pt] 2 a_k\, \nu^{j,*}_T = -2 \Psi_k\, q^{j,\nu^{j,*}}_T , \end{cases} \tag{20} $$
where $\mathcal{M}^j \in \mathbb{H}^2_T$ is an $\mathcal{F}^j$-adapted $\mathbb{P}^k$-martingale and where
$$ \bar{\nu}^{k,*}_t = \lim_{N \to \infty} \frac{1}{N^{(N)}_k} \sum_{j \in \mathcal{K}^{(N)}_k} \nu^{j,*}_t , \tag{21} $$
for all $k \in \mathfrak{K}$.
Proof. See Appendix A.3.

Theorem 3.4 reduces the convex optimization problem (15), (16), and (17) to an infinite system of FBSDEs. The forward component comes from the latent drift processes $A^k$ and inventory processes $q^{j,\nu^{j,*}}$, while the backward component comes from the trading rates $\nu^{j,*}$. The coupling in this system appears through the mean-field processes $\bar{\boldsymbol{\nu}}^*$, which average out all of the actions of other agents within the game. A few difficulties are immediately apparent in the FBSDE (20). Firstly, each individual FBSDE, corresponding to a particular agent's trading rate, is written in terms of a martingale that is specific to the agent's sub-population, and the measure under which the process is a martingale corresponds to the agent's belief about the drift process $A^k$. Secondly, the conditional expected value $\mathbb{E}^{\mathbb{P}^k} \big[ A^k_t + \boldsymbol{\lambda}_k^{\top} \bar{\boldsymbol{\nu}}^*_t \,\big|\, \mathcal{F}^j_t \big]$ appears in the driver of the FBSDE. This is a projection of the mean-fields onto the agent's filtration, and appears because the agent cannot directly observe the strategies of other individuals. This projection of the mean-fields adds another layer of difficulty.

Recall that a solution to the FBSDE (20) for agent-$j$ consists of a pair of processes $(\nu^{j,*}, \mathcal{M}^j)$ that satisfies the SDE and terminal condition in (20) $\mathbb{P} \times \mu$ almost everywhere. For the requirements of Theorem 3.4 to be met, a solution must simultaneously meet the consistency condition (21) $\mathbb{P} \times \mu$ almost everywhere.
If we can find a set of solutions, we can guarantee it is unique up to $\mathbb{P} \times \mu$ null sets due to the strict concavity of the objective functional and the 'if and only if' nature of the statement.

In this section, we solve the FBSDE (20), and hence provide an exact form for the Nash equilibrium for the infinite population mean-field game. The key to obtaining a solution lies in first postulating a structure for the solution of (20). This form then suggests a vector-valued FBSDE that the mean-field processes $\bar{\nu}^k$ must satisfy, which is independent of any individual agent's strategy. The resulting non-standard FBSDE system is defined across the set of $K$ measures $\{\mathbb{P}^k\}_{k \in \mathfrak{K}}$, which introduces an obstacle in solving it directly. The key step in obtaining a solution lies in representing the FBSDE in terms of a single measure, and solving it there.

Due to the linear form of the FBSDE (20), it is natural to assume that the solution is affine. As such, for an agent-$j$ within a sub-population $k$, we seek optimal controls of the form
$$ 2 a_k\, \nu^{j,*}_t = 2 a_k\, \bar{\nu}^{k,*}_t + h^k_{2,t} \Big( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\bar{\nu}^{k,*}}_t \Big) , \tag{22} $$
where $h^k_2 : [0, T] \to \mathbb{R}$ is an unknown deterministic, continuously differentiable function of time, and where we define the mean-field inventory process $\bar{q}^{k,\bar{\nu}^{k,*}} = (\bar{q}^{k,\bar{\nu}^{k,*}}_t)_{t \in [0,T]}$ for sub-population $k$ as
$$ \bar{q}^{k,\bar{\nu}^{k,*}}_t = m_k + \int_0^t \bar{\nu}^{k,*}_u \, du . $$
Plugging this ansatz into (20) and simplifying, we find that
$$ \begin{aligned} 0 = {} & \Big\{ \partial_t h^k_{2,t} + \tfrac{1}{2 a_k} \big( h^k_{2,t} \big)^2 - 2 \phi_k \Big\} \Big( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\bar{\nu}^{k,*}}_t \Big) dt \\ & + \Big\{ d\big( 2 a_k\, \bar{\nu}^{k,*}_t \big) + \Big( \mathbb{E}^{\mathbb{P}^k} \big[ A^k_t + \boldsymbol{\lambda}_k^{\top} \bar{\boldsymbol{\nu}}^*_t \,\big|\, \mathcal{F}^j_t \big] - 2 \phi_k\, \bar{q}^{k,\bar{\nu}^{k,*}}_t \Big) dt - d\mathcal{M}^j_t \Big\} , \end{aligned} \tag{23} $$
along with the boundary condition that
$$ 0 = \big\{ h^k_{2,T} + 2 \Psi_k \big\} \Big( q^{j,\nu^{j,*}}_T - \bar{q}^{k,\bar{\nu}^{k,*}}_T \Big) + \Big\{ 2 a_k\, \bar{\nu}^{k,*}_T + 2 \Psi_k\, \bar{q}^{k,\bar{\nu}^{k,*}}_T \Big\} , \tag{24} $$
which must both hold $\mathbb{P}^k \times \mu$ almost everywhere.
Therefore, to solve the FBSDE (20), it is sufficientfor us to make the terms in the curly brackets of equation (23) and in the boundary condition (3.3)vanish independently of one another. Collecting these equations, we obtain a first-order Riccati-type ODE for h k ,t , (cid:40) ∂ t h k ,t + a k ( h k ,t ) − φ k = 0 ,h k ,T = − k , (25)as well as a linear FBSDE for the mean-field process ν kt − d (2 a k ν k, ∗ t ) = (cid:16) E P k [ A kt + λ (cid:124) k ν ∗ t | F jt ] − φ k ¯ q k,ν k, ∗ t (cid:17) dt − d M jt , a k ν k, ∗ T = − k ¯ q k,ν k, ∗ T , (26)where M j = (cid:16) M jt (cid:17) t ∈ [0 ,T ] ∈ H T is an F j -adapted P k -martingale.Let us point out here that the ansatz for ν j, ∗ found in equation (22) satisfies the consistencycondition as long as there exist solutions to the equations (25) and (26). This can be most easilyseen by taking the average of (22) over j ∈ K Nk and taking the limit as N → ∞ .The FBSDE (26) suggests that the solution ν k, ∗ should be an F j -adapted process. Equation (26),however, holds for any agent- j (cid:48) for which j (cid:48) ∈ K k , therefore, ν k, ∗ must be F j (cid:48) -adapted for any j (cid:48) ∈ K k . Consequently, each ν k, ∗ must be adapted to the filtration generated by the intersection (cid:84) j (cid:48) ∈K k F j (cid:48) t . Computing this intersection, we find that (cid:84) j (cid:48) ∈K k F j (cid:48) t = (cid:84) j (cid:48) ∈K k σ (cid:16) ( S u , ν ∗ u , q j,ν j, ∗ u ) u ∈ [0 ,t ] (cid:17) ⊆ σ (cid:0) ( S u , ν ∗ u ) u ∈ [0 ,t ] (cid:1) , which does not depend on the sub-population k . This is easy to see since ( i )each q j is not measurable with respect to σ ( q i ) for any i (cid:54) = j and ( ii ) for any j ∈ N , ν j is notmeasurable with respect to σ ( ν ∗ ) by definition from (21). Thus, for each k ∈ K , we have that ν k, ∗ is an F -adapted process, where we define F t := (cid:86) j ∈K k F jt = σ (cid:0) ( S u , ν ∗ u ) u ∈ [0 ,t ] (cid:1) . 
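As a consequence, we find that $\nu^{k,*}$ should satisfy the FBSDE
$$- d(2 a_k \nu^{k,*}_t) = \left( \mathbb{E}^{\mathbb{P}^k}\!\left[ A^k_t + \lambda_k^\intercal \nu^*_t \,\middle|\, \mathcal{F}_t \right] - 2\phi_k \bar{q}^{k,\nu^{k,*}}_t \right) dt - d\mathcal{M}^k_t , \qquad 2 a_k \nu^{k,*}_T = -2\Psi_k \bar{q}^{k,\nu^{k,*}}_T , \tag{27}$$
where $\mathcal{M}^k = (\mathcal{M}^k_t)_{t \in [0,T]}$ is an $\mathcal{F}$-adapted $\mathbb{P}^k$-martingale, and the expectation appearing in the drift is conditional on $\mathcal{F}_t$, not $\mathcal{F}^j_t$.

By stacking the FBSDEs (27) over all values of $k \in \mathfrak{K}$, we may obtain a vector-valued FBSDE for the process $\nu^*$. To this end, define the column vector of filtered drift processes $\widehat{A} = (\widehat{A}_t)_{t \in [0,T]}$, where $\widehat{A}_t = \left( \mathbb{E}^{\mathbb{P}^k}[A^k_t \mid \mathcal{F}_t] \right)_{k \in \mathfrak{K}}$. Next, as $\nu^*_t$ is $\mathcal{F}_t$-measurable, stacking the FBSDEs (27) over all values of $k \in \mathfrak{K}$, we have
$$- d(2 \boldsymbol{a} \nu^*_t) = \left( \widehat{A}_t + \boldsymbol{\Lambda} \nu^*_t - 2\boldsymbol{\phi}\, \bar{q}^{\nu^*}_t \right) dt - d\mathcal{M}_t , \qquad 2 \boldsymbol{a} \nu^*_T = -2\boldsymbol{\Psi}\, \bar{q}^{\nu^*}_T , \tag{28}$$
where $\boldsymbol{a}$, $\boldsymbol{\phi}$, $\boldsymbol{\Psi}$ and $\boldsymbol{\Lambda}$ are all real-valued $K \times K$ matrices defined as
$$\boldsymbol{a} = \mathrm{diag}(\{a_k\}_{k \in \mathfrak{K}}), \quad \boldsymbol{\phi} = \mathrm{diag}(\{\phi_k\}_{k \in \mathfrak{K}}), \quad \boldsymbol{\Psi} = \mathrm{diag}(\{\Psi_k\}_{k \in \mathfrak{K}}), \quad \boldsymbol{\Lambda} = \begin{pmatrix} \lambda_{1,1}\, p_1 & \cdots & \lambda_{1,K}\, p_K \\ \vdots & \ddots & \vdots \\ \lambda_{K,1}\, p_1 & \cdots & \lambda_{K,K}\, p_K \end{pmatrix},$$
where $\bar{q}^{\nu^*}_t = \bar{m} + \int_0^t \nu^*_u \, du$, and $\mathcal{M} = (\mathcal{M}^k)_{k \in \mathfrak{K}}$ is a column vector of $\mathcal{F}$-adapted processes, where, as a reminder, $\mathcal{M}^k \in \mathbb{H}^2_T$ for all $k \in \mathfrak{K}$ and the $k$-th element $\mathcal{M}^k$ is a $\mathbb{P}^k$-martingale.

From the linear structure of the FBSDE (28), we can further simplify the problem by seeking affine solutions of the form
$$2 \boldsymbol{a} \nu^*_t = g_{1,t} + g_{2,t}\, \bar{q}^{\nu^*}_t , \tag{29}$$
where $g_{2,\cdot} : [0,T] \to \mathbb{R}^{K \times K}$ is a deterministic and continuously differentiable function of time, and $g_1 = (g_{1,t})_{t \in [0,T]} \in \mathbb{H}^2_T$ is an $\mathbb{R}^K$-valued stochastic process.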
Plugging the ansatz into (28), and following through with the same logical steps as before, we find that the ansatz holds true so long as $g_2$ is the solution to the Riccati-type matrix ODE
$$\begin{cases} -\partial_t g_{2,t} = \left( \boldsymbol{\Lambda} + g_{2,t} \right) (2\boldsymbol{a})^{-1} g_{2,t} - 2\boldsymbol{\phi} , \\ g_{2,T} = -2\boldsymbol{\Psi} , \end{cases} \tag{30}$$
and $g_{1,t}$ solves the BSDE
$$\begin{cases} -d g_{1,t} = \left( \widehat{A}_t + \left( \boldsymbol{\Lambda} + g_{2,t} \right) (2\boldsymbol{a})^{-1} g_{1,t} \right) dt - d\mathcal{M}_t , \\ g_{1,T} = 0 , \end{cases} \tag{31}$$
where $\mathcal{M}$ is the same vector of processes present in FBSDE (28). The sign convention in (30) is the one that follows from substituting (29) into (28); it reduces to (25) when $K = 1$ and $\boldsymbol{\Lambda} = 0$, and it agrees with the representation (40)-(42) below.

At this point, we have succeeded in reducing the search for a Nash-equilibrium to solving (i) two deterministic ordinary differential equations (ODEs), (25) and (30), and (ii) a non-standard linear BSDE (31). The ODEs are straightforward to solve; the BSDE, however, poses some further challenges.

One of the primary obstacles in solving the BSDE (31) is that each component of $g_1$ incorporates a process that is a martingale under a different probability measure. Recall that the components of $\mathcal{M} = \{\mathcal{M}^k\}_{k \in \mathfrak{K}}$ are required to be martingales with respect to the $K$ different measures $\{\mathbb{P}^k\}_{k \in \mathfrak{K}}$. Each measure is what agents within sub-population $k$ use to compute expectations, and agents within that sub-population assume the asset has drift $A^k$ in excess of the order-flow from all agents. The key step in solving the BSDE is to re-cast it in terms of martingales under a single probability measure. This introduces non-trivial drift adjustments; however, we find that it is indeed possible to solve the modified BSDE explicitly.

Consider the $k$-th dimension of the BSDE (31),
$$-d g^k_{1,t} = \left( \widehat{A}^k_t + G^k_t\, g_{1,t} \right) dt - d\mathcal{M}^k_t , \tag{32}$$
where $\mathcal{M}^k$ is a $\mathbb{P}^k$-martingale, and where $G^k_t$ is defined as the $k$-th row of the deterministic matrix-valued function $G_t = \left( \boldsymbol{\Lambda} + g_{2,t} \right) (2\boldsymbol{a})^{-1}$. The solution of BSDE (32) can be expressed implicitly as follows:
$$g^k_{1,t} = \mathbb{E}^{\mathbb{P}^k}\!\left[ \int_t^T \left\{ \widehat{A}^k_u + G^k_u\, g_{1,u} \right\} du \,\middle|\, \mathcal{F}_t \right]. \tag{33}$$
Next, we aim to represent (33) in terms of an expectation under another measure $\mathbb{Q}$ such that $\mathbb{Q} \sim \mathbb{P}^k$ for all $k$. By the assumption that $\mathbb{P}^k \sim \mathbb{P}^{k'}$ for all $k, k' \in \mathfrak{K}$, there always exists such a measure; for example, $\mathbb{Q} = \mathbb{P}^k$ for some $k$. Given this measure, define the $\mathcal{F}$-adapted Radon-Nikodym derivative processes
$$Z^{\mathbb{Q},k}_t = \frac{d\mathbb{P}^k}{d\mathbb{Q}} \bigg|_{\mathcal{F}_t} := \mathbb{E}^{\mathbb{Q}}\!\left[ \frac{d\mathbb{P}^k}{d\mathbb{Q}} \,\middle|\, \mathcal{F}_t \right], \quad \forall k \in \mathfrak{K} . \tag{34}$$
Using this process, we find that we may write equation (33) as an expected value under the $\mathbb{Q}$ measure as
$$Z^{\mathbb{Q},k}_t\, g^k_{1,t} = \mathbb{E}^{\mathbb{Q}}\!\left[ \int_t^T \left\{ Z^{\mathbb{Q},k}_u \widehat{A}^k_u + Z^{\mathbb{Q},k}_u G^k_u\, g_{1,u} \right\} du \,\middle|\, \mathcal{F}_t \right]. \tag{35}$$
Defining the diagonal $\mathbb{R}^{K \times K}$-valued process $Z^{\mathbb{Q}} = (Z^{\mathbb{Q}}_t)_{t \in [0,T]}$, where $Z^{\mathbb{Q}}_t = \mathrm{diag}(Z^{\mathbb{Q},k}_t)_{k \in \mathfrak{K}}$, allows us to write a linear BSDE for $Z^{\mathbb{Q}}_t g_{1,t} = \left( Z^{\mathbb{Q},k}_t g^k_{1,t} \right)_{k \in \mathfrak{K}}$ using a single measure $\mathbb{Q}$. More specifically, from (35), we have that
$$-d\left( Z^{\mathbb{Q}}_t g_{1,t} \right) = \left( Z^{\mathbb{Q}}_t \widehat{A}_t + Z^{\mathbb{Q}}_t G_t\, g_{1,t} \right) dt - d\tilde{\mathcal{M}}_t , \tag{36}$$
where $\tilde{\mathcal{M}} = (\tilde{\mathcal{M}}_t)_{t \in [0,T]}$ is an $\mathbb{R}^K$-valued $\mathbb{Q}$-martingale. The BSDE (36) is linear and its solution can be expressed in closed form. The following theorem provides a representation for the solution $g_1$ as well as for $\{h^k_2\}_{k \in \mathfrak{K}}$ and $g_2$.

Theorem 3.5 (Solutions to the Mean-Field BSDEs).

I) Let $\mathbb{Q}$ be any probability measure such that $\mathbb{Q} \sim \mathbb{P}$. Then the BSDE (31) admits a closed-form solution,
$$g_{1,t} = \mathbb{E}^{\mathbb{Q}}\!\left[ \int_t^T (\mathcal{E}^{\mathbb{Q}}_t)^{-1} \mathcal{E}^{\mathbb{Q}}_u \widehat{A}_u \, du \,\middle|\, \mathcal{F}_t \right], \tag{37}$$
where $\mathcal{E}^{\mathbb{Q}}_t$ is the solution to the forward matrix-valued SDE
$$d\mathcal{E}^{\mathbb{Q}}_t = \mathcal{E}^{\mathbb{Q}}_t \left( G_t \, dt + (Z^{\mathbb{Q}}_t)^{-1} dZ^{\mathbb{Q}}_t \right), \qquad \mathcal{E}^{\mathbb{Q}}_0 = Z^{\mathbb{Q}}_0 , \tag{38}$$
where the deterministic matrix-valued function $G_t := \left( \boldsymbol{\Lambda} + g_{2,t} \right) (2\boldsymbol{a})^{-1}$ and
$$Z^{\mathbb{Q}}_t = \mathrm{diag}\!\left( \frac{d\mathbb{P}^k}{d\mathbb{Q}} \bigg|_{\mathcal{F}_t} \right)_{k \in \mathfrak{K}} . \tag{39}$$

II) There exists a unique solution $g_{2,t}$ to the matrix-valued ODE (30) that is bounded over the interval $[0,T]$. Moreover, let $Y_t : [0,T] \to \mathbb{R}^{2K \times K}$ be defined as
$$Y_t = e^{(T-t) B} \left( I_{(K \times K)} , \; -2\boldsymbol{\Psi} \right)^\intercal , \tag{40}$$
where $B \in \mathbb{R}^{2K \times 2K}$ is the block matrix
$$B = \begin{pmatrix} 0_{(K \times K)} & -(2\boldsymbol{a})^{-1} \\ -2\boldsymbol{\phi} & \boldsymbol{\Lambda} (2\boldsymbol{a})^{-1} \end{pmatrix}; \tag{41}$$
then, using the matrix partition $Y_t = (Y_{1,t}, Y_{2,t})^\intercal$, where $Y_{1,t}, Y_{2,t} \in \mathbb{R}^{K \times K}$, the function $g_{2,t}$ may be expressed as
$$g_{2,t} = Y_{2,t}\, Y_{1,t}^{-1} . \tag{42}$$

III) The ODE (25) admits the unique solution
$$h^k_{2,t} = -2\xi_k \left( \frac{\Psi_k \cosh(-\gamma_k (T-t)) - \xi_k \sinh(-\gamma_k (T-t))}{\xi_k \cosh(-\gamma_k (T-t)) - \Psi_k \sinh(-\gamma_k (T-t))} \right), \quad \forall k \in \mathfrak{K} , \tag{43}$$
where the constants $\gamma_k = \sqrt{\phi_k / a_k}$ and $\xi_k = \sqrt{\phi_k a_k}$. Moreover, $h^k_{2,t} \le 0$ for all $t \in [0,T]$.

Proof. See Appendix A.4.

This theorem shows that $g_1$ may be expressed in terms of any measure $\mathbb{Q} \sim \mathbb{P}$, which includes any of the $\{\mathbb{P}^k\}_{k \in \mathfrak{K}}$. The representations for $g_1$, $g_2$ and $h^k_2$ in (37), (42) and (43), respectively, together with the form of $\nu^{j,*}$ in (22), provide us with a candidate for the optimal control in the population limit. It only remains to ensure that this optimal control is indeed admissible, i.e., $\nu^{j,*} \in \mathcal{A}^j$. The following theorem provides sufficient conditions for this to hold.

Theorem 3.6.
Let us assume that $g_1 \in \mathbb{H}^2_T$. Then the optimality equation (20) admits the solution
$$\nu^{j,*}_t = \nu^{k,*}_t + \frac{1}{2 a_k} h^k_{2,t} \left( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\nu^{k,*}}_t \right), \tag{44}$$
and the mean-field trading rate process $\nu^* = (\nu^{k,*})_{k \in \mathfrak{K}}$ may be written, as in the ansatz (29),
$$2\boldsymbol{a}\, \nu^*_t = g_{1,t} + g_{2,t}\, \bar{q}^{\nu^*}_t , \tag{45}$$
where $g_1$, $g_2$ and $h^k_2$ are the functions defined in Theorem 3.5, and the mean-field inventory process $\bar{q}^{\nu^*} = (\bar{q}^{k,\nu^{k,*}})_{k \in \mathfrak{K}}$ is
$$\bar{q}^{\nu^*}_t = \bar{m} + \int_0^t \nu^*_u \, du .$$
Moreover, the collection of proposed optimal solutions satisfies
$$\nu^{j,*} = \arg\max_{\omega \in \mathcal{A}^j} H^{\nu^*}_j(\omega) \tag{46}$$
for all $j \in \mathfrak{N}$.

Proof. See Appendix A.5.

Theorem 3.6 guarantees that, under the technical assumption $g_1 \in \mathbb{H}^2_T$, our proposed solution forms a Nash-equilibrium for the limiting mean-field game. Moreover, Theorem 3.4 guarantees that the solution is unique up to $\mathbb{P} \times \mu$ null sets. The condition $g_1 \in \mathbb{H}^2_T$ holds for the class of models presented in Sections 5 and 7. While these models are not exhaustive, they provide an instructive class to study.

The optimal solution provided in Theorem 3.6 admits many interesting properties. Firstly, the mean-field trading rate in (45) contains two parts: (i) a 'risk control' portion $g_{2,t}\, \bar{q}^{\nu^*}_t$, which is independent of the dynamics of the asset price process; and (ii) an 'alpha trading' or statistical arbitrage portion $g_1$.

The 'risk control' portion ($g_{2,t}\, \bar{q}^{\nu^*}_t$) survives even when $A^k = 0$ for all $k \in \mathfrak{K}$, i.e., when the midprice process net of total order-flow is a martingale, and it induces interactions between the various sub-populations due to their permanent impact. It can be shown through numerical examples that this function scales with the parameter matrices $\boldsymbol{\phi}$ and $\boldsymbol{\Psi}$ to make agents liquidate their inventories faster when either $\boldsymbol{\phi}$ or $\boldsymbol{\Psi}$ becomes large, thereby controlling the risk agents take while trading.

In the 'alpha trading' portion ($g_{1,t}$), agents adjust their trading based on a weighted average of $\widehat{A}$, the estimated drift of the asset price for all agents.
The weighting process $\mathcal{E}$ encodes both information about the 'risk' portion of the algorithm, $g_2$, and information about all other agents' measures through the process $Z$, which implicitly appears through the dynamics of $\mathcal{E}$. The weighting function compensates for the differing models agents use for the asset price, and adjusts the individual trading rates to account for the price impact due to 'alpha trading' of all other agents.

The Nash equilibrium provided in Theorem 3.6 resembles the one obtained in Casgrain and Jaimungal (2018), with the main differences lying in the expression for the value of the function $g_1$. The differences are important and reveal themselves in two ways.

First, here, we have a stochastic weighting process $\mathcal{E}$ defined by the SDE (38), which replaces the deterministic time-ordered exponential function present in Casgrain and Jaimungal (2018). In fact, we can view $\mathcal{E}$ as the natural extension of the time-ordered exponential appearing in Casgrain and Jaimungal (2018) to the case of stochastic processes. Second, to determine the correction to trading, rather than weighting a single estimate of future alpha as in Casgrain and Jaimungal (2018), all posterior estimated alphas $\widehat{A}^k$ under all measures $\mathbb{P}^k$, $k \in \mathfrak{K}$, play a role. Finally, when $\mathbb{P}^k = \mathbb{P}^{k'}$ for all $k, k' \in \mathfrak{K}$, the optimal controls in Theorem 3.6 match the one presented in Casgrain and Jaimungal (2018).

Thus far, we discussed the optimal mean-field strategy. The individual agents' trading rates also admit an interesting structure. An arbitrary agent trades at their own sub-population mean-field rate $\nu^{k,*}$ plus a correction term proportional to the difference between their individual inventory and the mean-field inventory: $(q^{j,\nu^{j,*}} - \bar{q}^{k,\nu^{k,*}})$.
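This difference can be solved for in terms of the difference between the initial inventory of the agent and its sub-population prior mean. Since (44) gives $d(q^{j,\nu^{j,*}}_t - \bar{q}^{k,\nu^{k,*}}_t) = \frac{1}{2 a_k} h^k_{2,t} (q^{j,\nu^{j,*}}_t - \bar{q}^{k,\nu^{k,*}}_t)\, dt$, we obtain
$$\left( q^{j,\nu^{j,*}}_t - \bar{q}^{k,\nu^{k,*}}_t \right) = \left( Q^j_0 - \bar{m}_k \right) \exp\!\left( \int_0^t \frac{1}{2 a_k} h^k_{2,u} \, du \right), \tag{47}$$
where $h^k_{2,t} \le 0$ for all $t \in [0,T]$. Therefore, the difference in inventories shrinks towards zero at a deterministic rate, and agents are consistently drawing their inventories closer to their sub-population's mean-field. As time elapses, all agents in a sub-population come to resemble their sub-population's mean-field.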
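As a numerical sanity check on parts II and III of Theorem 3.5 and on the decay in (47), the following minimal sketch evaluates the closed form (43), checks the Riccati equation (25) by finite differences, and confirms that, for a single sub-population with $\boldsymbol{\Lambda} = 0$, the matrix-exponential representation (40)-(42) returns the same function. The function names and the parameter values used in the checks are illustrative assumptions, not taken from the paper, and the $2 \times 2$ matrix exponential is a simple hand-rolled scaling-and-squaring routine rather than a library call.

```python
import math


def h2(t, T, a, phi, Psi):
    """Closed-form h^k_{2,t} of (43) for one sub-population (a, phi, Psi > 0)."""
    gamma = math.sqrt(phi / a)
    xi = math.sqrt(phi * a)
    tau = T - t
    c, s = math.cosh(gamma * tau), math.sinh(gamma * tau)
    # cosh is even and sinh is odd, so the ratio in (43) reduces to this form.
    return -2.0 * xi * (Psi * c + xi * s) / (xi * c + Psi * s)


def riccati_residual(t, T, a, phi, Psi, eps=1e-6):
    """Left-hand side of (25), d/dt h + h^2/(2a) - 2 phi, via central differences."""
    dh = (h2(t + eps, T, a, phi, Psi) - h2(t - eps, T, a, phi, Psi)) / (2.0 * eps)
    h = h2(t, T, a, phi, Psi)
    return dh + h * h / (2.0 * a) - 2.0 * phi


def expm2(A, squarings=10, terms=20):
    """exp(A) for a 2x2 matrix given as nested lists (scaling and squaring)."""
    def mul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
    S = [[x / 2 ** squarings for x in row] for row in A]
    E = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in mul(term, S)]
        E = [[E[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    for _ in range(squarings):
        E = mul(E, E)
    return E


def g2_scalar(t, T, a, phi, Psi, lam=0.0):
    """(40)-(42) with K = 1: g_2 = Y_2 / Y_1, where Y = exp((T - t) B) (1, -2 Psi)^T."""
    tau = T - t
    B = [[0.0, -1.0 / (2.0 * a)], [-2.0 * phi, lam / (2.0 * a)]]
    E = expm2([[tau * b for b in row] for row in B])
    y1 = E[0][0] - 2.0 * Psi * E[0][1]
    y2 = E[1][0] - 2.0 * Psi * E[1][1]
    return y2 / y1


def inventory_gap(t, T, a, phi, Psi, gap0, n=2000):
    """(47): gap0 * exp(int_0^t h_{2,u} / (2a) du), integral by the midpoint rule."""
    du = t / n
    integral = sum(h2((i + 0.5) * du, T, a, phi, Psi) for i in range(n)) * du
    return gap0 * math.exp(integral / (2.0 * a))
```

With, say, `a = 0.5`, `phi = 1.0`, `Psi = 2.0` and `T = 1.0`, `h2(T, ...)` returns the terminal value $-2\Psi_k$, `g2_scalar` agrees with `h2` on the whole interval, and `inventory_gap` decreases monotonically in `t`, which is the deterministic contraction described after (47).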
4. The $\epsilon$-Nash Equilibrium Property
In Section 3, we solve the stochastic game in the infinite population limit, and provide an exact representation of each agent's control at the Nash-equilibrium. One important question to ask is how the optimal MFG strategy performs in a finite-population game. We study the properties of the limiting strategy in the finite game by looking at how close the collection of limiting strategies defined in Theorem 3.6 is to the true Nash-equilibrium of a game with only $N$ agents.

Let us consider a finite game with $N$ players, as described in Section 2. Let us assume that each of the agents in this population uses the strategy described in Theorem 3.6. Each agent computes the process $\nu^*_t$ according to equation (45), and then uses these values to compute their own trading rate, $\nu^{j,*}_t$, according to equation (44). In the theorem that follows, we show that this collection of controls can serve as a quasi-Nash-equilibrium in a finite player game, provided that the population size is large enough.

Theorem 4.1 ($\epsilon$-Nash equilibrium). Consider the collection of objective functionals $\{H_j : j \in \mathfrak{N}\}$ defined in equation (10) and the set of optimal mean-field controls $\{\nu^{j,*}\}_{j=1}^N$ defined in Theorem 3.6. Suppose that there exists a sequence $\{\delta_N\}_{N=1}^\infty$ such that $\delta_N \to 0$ and
$$\left| \frac{N^{(N)}_k}{N} - p_k \right| = o(\delta_N) \tag{48}$$
for all $k \in \mathfrak{K}$; then
$$H_j(\nu^{j,*}, \nu^{-j,*}) \le \sup_{\nu \in \mathcal{A}^j} H_j(\nu, \nu^{-j,*}) \le H_j(\nu^{j,*}, \nu^{-j,*}) + o\!\left(\tfrac{1}{N}\right) + o(\delta_N) \tag{49}$$
for each $j \in \mathfrak{N}$.

Proof. See Appendix A.8.

Theorem 4.1 shows that for any $\epsilon > 0$, there exists $N_\epsilon$ such that for all $N > N_\epsilon$, agent $j$ may improve their performance by at most $\epsilon$ by unilaterally deviating away from $\nu^{j,*}$. The statement of the theorem also reveals that the rate $N_\epsilon$ must be at least linear in $\epsilon^{-1}$ and depends on the rate at which $\delta_N$ vanishes in the limit. This theorem effectively demonstrates that the mean-field equilibrium $\{\nu^{j,*}\}_{j=1}^N$ serves as a viable alternative to the true finite-game equilibrium, provided the population size is large enough.
5. An Example Model of Disagreement
In this part, we provide an example model where the asset price process is modulated by a latent Markov chain, similarly to that in Casgrain and Jaimungal (2016). In our model, we assume each sub-population disagrees on the distribution of the initial value of the latent process, while they do agree on what the possible values of the latent state are, and agree on the transition rates between states. One can view this as all agents believing there are positive, neutral, and negative drift environments, but disagreeing on which is the current environment. We prove that the resulting optimal control presented in Theorem 3.6 exists and is well-defined, i.e., that $g_1 \in \mathbb{H}^2_T$, under this general model assumption.

To this end, assume that the asset price satisfies the SDE
$$dS^{\nu^{(N)}}_t = \left( \sum_{i=1}^{J} \alpha^i_t \, \mathbb{1}_{\{\Theta_t = \theta_i\}} + \sum_{k' \in \mathfrak{K}} \lambda_{k,k'}\, p^{(N)}_{k'}\, \nu^{k',(N)}_t \right) dt + \sigma \, dW_t , \tag{50}$$
where $\Theta_t$ is a continuous-time Markov chain taking values in the set $\{\theta_i\}_{i \in \mathfrak{J}}$ ($\mathfrak{J} = \{1, \dots, J\}$) and where the processes $\alpha^i = (\alpha^i_t)_{t \in [0,T]}$ are $\mathcal{F}$-predictable processes satisfying $\alpha^i \in \mathbb{H}^2_T$ for all $i \in \mathfrak{J}$. In this model, agents across different sub-populations have different prior probabilities on the initial value of the latent process, so that $\mathbb{P}^k(\Theta_0 = \theta_i) = \pi_{k,i} \in (0,1)$ with $\sum_{i \in \mathfrak{J}} \pi_{k,i} = 1$. We assume that under each measure $\mathbb{P}^k$, the latent Markov chain $\Theta_t$ has the same infinitesimal generator matrix $C$. Furthermore, we assume that $W$ is a standard Brownian motion in each measure $\mathbb{P}^k$ and that $\sigma > 0$. The price-impact vector $\lambda$ is likewise the same in each measure, so that, in the notation of Section 2, we have $\lambda_k = \lambda$ for all $k \in \mathfrak{K}$.

This model may be interpreted as a case in which agents all agree on the dynamics of the asset price $S^\nu$ and the latent process $\Theta$ but disagree on the initial value of the latent process. The specification allows us to compute the expression for the processes $\{Z^{\mathbb{P}^k}_t\}_{k \in \mathfrak{K}}$, which are used to compute each agent's optimal strategy. With this model, we may compute the Radon-Nikodym derivative process $Z^{\mathbb{P}^k}_t$ for any measure $\mathbb{P}^k$.

Proposition 5.1.
Fix $\mathbb{Q} = \mathbb{P}^k$ for some $k \in \mathfrak{K}$. If the asset price dynamics follow the latent Markov chain model of equation (50), then $Z^{\mathbb{P}^k}_t$, defined in Theorem 3.5, may be expressed as
$$Z^{\mathbb{P}^k}_t = \sum_{j \in \mathfrak{J}} M^k_j \; \mathbb{P}^k\!\left( \Theta_0 = \theta_j \,\middle|\, \mathcal{F}_t \right), \tag{51}$$
where for each $j \in \mathfrak{J}$ we define the diagonal matrix $M^k_j = \mathrm{diag}\!\left( \pi_{k',j} / \pi_{k,j} \right)_{k' \in \mathfrak{K}}$.

Proof. See Appendix A.6.

From expression (51), it is clear that $Z^{\mathbb{P}^k}$ is almost surely bounded, since $\mathbb{P}^k\!\left( \Theta_0 = \theta_j \mid \mathcal{F}_t \right) \in [0,1]$ and $\pi_{k,i} \in (0,1)$ for all $k \in \mathfrak{K}$, $i \in \mathfrak{J}$. We use this fact in the proof of the following proposition.

Proposition 5.2.
Suppose that the asset price process is given by Equation (50); then the solution $g_1$ defined in Theorem 3.5 satisfies $g_1 \in \mathbb{H}^2_T$, and thus the results of Theorem 3.6 apply to the model described in this section.

Proof. See Appendix A.7.

Although we show that there exist models for which the mean-field optimal control presented in Theorem 3.5 is well defined, computing these controls presents us with another challenge. In particular, due to the complicated nature of the process $\mathcal{E}^{\mathbb{Q}}$, the conditional expected value appearing in the expression (37) for $g_1$ is difficult to compute. In Section 6, we address this issue by presenting a computational method to approximate such expressions.

Footnote (generator matrix). The generator matrix $C \in \mathbb{R}^{J \times J}$ can be any matrix satisfying the conditions $C_{i,j} \ge 0$ for all $i \ne j \in \mathfrak{J}$ and $C_{i,i} = -\sum_{j \ne i \in \mathfrak{J}} C_{i,j}$. $C$ defines the transition dynamics of the latent Markov chain $\Theta_t$ through the relation $\mathbb{P}^k\!\left( \Theta_{t+h} = \theta_i \mid \Theta_t = \theta_j \right) = \left( e^{hC} \right)_{i,j}$, where $e^{hC}$ represents the matrix exponential.

6. A Simulation-Based Computational Method

For most non-trivial models, obtaining a closed-form expression for the solution (37) of the BSDE for $g_{1,t}$ proves to be very difficult. To overcome this difficulty, we present a simulation-based computational method to approximate solutions. We propose a Least-Square-Monte-Carlo (LSMC) based method, which closely resembles the methods used to approximate solutions of BSDEs, as in Bender and Steiner (2012) and Gobet et al. (2005). Unlike these two methods, however, we do not concern ourselves with the computation of the martingale portion of the BSDE (36), since it is not required to compute $g_1$.

To this end, define the $M$-point uniform partition of the interval $[0,T]$, $\mathcal{T} := \{ t_m := m \times \Delta, \; m = 0, 1, \dots, M \}$, where $M$ is a positive integer and where $\Delta := T/M$ is the discretization interval. We aim to approximate the process $g_1$ over the partition $\mathcal{T}$ with a discrete-time stochastic process $\hat{g}_1 = \{ \hat{g}_{1,t_m} \}_{t_m \in \mathcal{T}}$, where each $\hat{g}_{1,t_m} \in \mathbb{R}^K$.

To derive an expression for $\hat{g}_1$, we first study the expression for $g_{1,t}$,
$$g_{1,t} = \mathbb{E}^{\mathbb{Q}}\!\left[ \int_t^T (\mathcal{E}^{\mathbb{Q}}_t)^{-1} \mathcal{E}^{\mathbb{Q}}_u \widehat{A}_u \, du \,\middle|\, \mathcal{F}_t \right] \tag{52}$$
at the points $t_m \in \mathcal{T}$. This expression may be written recursively over $\mathcal{T}$ as follows:
$$g_{1,t_m} = \mathbb{E}^{\mathbb{Q}}\!\left[ \int_{t_m}^{t_{m+1}} (\mathcal{E}^{\mathbb{Q}}_{t_m})^{-1} \mathcal{E}^{\mathbb{Q}}_u \widehat{A}_u \, du + (\mathcal{E}^{\mathbb{Q}}_{t_m})^{-1} \mathcal{E}^{\mathbb{Q}}_{t_{m+1}}\, g_{1,t_{m+1}} \,\middle|\, \mathcal{F}_{t_m} \right]. \tag{53}$$
Next, approximating the time-integral in the previous expression with its left end-point, we obtain the approximation
$$g_{1,t_m} \approx \mathbb{E}^{\mathbb{Q}}\!\left[ \widehat{A}_{t_m} \Delta + (\mathcal{E}^{\mathbb{Q}}_{t_m})^{-1} \mathcal{E}^{\mathbb{Q}}_{t_{m+1}}\, g_{1,t_{m+1}} \,\middle|\, \mathcal{F}_{t_m} \right]. \tag{54}$$
A further simplification follows by approximating the term $(\mathcal{E}^{\mathbb{Q}}_{t_m})^{-1} \mathcal{E}^{\mathbb{Q}}_{t_{m+1}}$ for small values of $\Delta$. Using the definition of $\mathcal{E}^{\mathbb{Q}}$ in Equation (38), we may factor $\mathcal{E}^{\mathbb{Q}}$ as $\mathcal{E}^{\mathbb{Q}}_t = \tilde{\mathcal{E}}^{\mathbb{Q}}_t Z^{\mathbb{Q}}_t$, where $\tilde{\mathcal{E}}^{\mathbb{Q}}_t$ is the solution to the matrix-valued SDE $d\tilde{\mathcal{E}}^{\mathbb{Q}}_t = \tilde{\mathcal{E}}^{\mathbb{Q}}_t \left( Z^{\mathbb{Q}}_t G_t (Z^{\mathbb{Q}}_t)^{-1} \right) dt$ with initial condition $\tilde{\mathcal{E}}^{\mathbb{Q}}_0 = I_{(K \times K)}$. For $\Delta \ll 1$, we freeze the processes in parentheses at their $t_m$ values, so that $d\tilde{\mathcal{E}}^{\mathbb{Q}}_t \approx \tilde{\mathcal{E}}^{\mathbb{Q}}_t \left( Z^{\mathbb{Q}}_{t_m} G_{t_m} (Z^{\mathbb{Q}}_{t_m})^{-1} \right) dt$ over each interval $[t_m, t_{m+1})$, resulting in
$$(\tilde{\mathcal{E}}^{\mathbb{Q}}_{t_m})^{-1} \tilde{\mathcal{E}}^{\mathbb{Q}}_{t_{m+1}} \approx \exp\!\left\{ Z^{\mathbb{Q}}_{t_m} G_{t_m} (Z^{\mathbb{Q}}_{t_m})^{-1} \Delta \right\} = Z^{\mathbb{Q}}_{t_m} \exp\{ G_{t_m} \Delta \} (Z^{\mathbb{Q}}_{t_m})^{-1} , \tag{55}$$
where $\exp$ represents the matrix exponential. By plugging this last result into equation (54), we obtain an approximation $\hat{g}_1$ for the process $g_1$ at $t_m$ as
$$\hat{g}_{1,t_m} = \mathbb{E}^{\mathbb{Q}}\!\left[ \widehat{A}_{t_m} \Delta + \exp\{ G_{t_m} \Delta \} (Z^{\mathbb{Q}}_{t_m})^{-1} Z^{\mathbb{Q}}_{t_{m+1}}\, \hat{g}_{1,t_{m+1}} \,\middle|\, \mathcal{F}_{t_m} \right]. \tag{56}$$
The final step in obtaining values of $\hat{g}_1$ is to approximate the conditional expected value on the right-hand side of equation (56). As is often done, we project the conditional expectation onto a finite basis of stochastic processes. In particular, we take a (vector-valued) stochastic process $Y = (Y_t)_{t \in [0,T]}$, with $Y_t \in \mathbb{R}^L$ where $L$ is some positive integer, and we write
$$\mathbb{E}^{\mathbb{Q}}\!\left[ \widehat{A}_{t_m} \Delta + \exp\{ G_{t_m} \Delta \} (Z^{\mathbb{Q}}_{t_m})^{-1} Z^{\mathbb{Q}}_{t_{m+1}}\, \hat{g}_{1,t_{m+1}} \,\middle|\, \mathcal{F}_{t_m} \right] \approx \left\langle Y_{t_m}, \beta_{t_m} \right\rangle \tag{57}$$
for some collection $\{\beta_t\}_{t \in \mathcal{T}}$, where each $\beta_t \in \mathbb{R}^{L \times K}$, and where the process $Y$ can be chosen fairly arbitrarily. A common and sensible choice for $Y$ is a finite basis expansion of the state processes of the problem (e.g. $S^\nu_t$, $Z^{\mathbb{Q}}_t$, etc.) and combinations of them.

The algorithm then estimates the coefficients $\hat{\beta}$ in a sequential manner. This is done by first simulating paths of $Y_t$ forward over the time partition $\mathcal{T}$ using the measure $\mathbb{Q}$, and then proceeding backwards in time from the boundary condition, solving a least-squares regression problem at each time step $t_m \in \mathcal{T}$ to obtain each of the coefficients $\hat{\beta}$. The details of this algorithm are given in Algorithm 1 below. Algorithm 1 is an application of the LSMC methods that already exist for BSDEs, and we point the reader to Bender and Steiner (2012) and Gobet et al. (2005) for more details on the convergence rates and error bounds.

Algorithm 1: The LSMC algorithm used to approximate the value of the process $g_1$ given in Equation (37).
  Data: Simulate $M$ paths of $(Y_t, Z^{\mathbb{Q}}_t, \widehat{A}_t)$ over $\mathcal{T}$ using measure $\mathbb{Q}$
  Set $\hat{\beta}_{t_M} = 0$
  Set $\hat{g}_{1,t_M}(Y) = 0$
  for $m = M-1, M-2, \dots, 0$ do
    Set $\hat{\beta}_{t_m} = \arg\min_{\beta} \sum_{n=1}^{M} \left\| \left\langle Y^n_{t_m}, \beta \right\rangle - \left\{ \widehat{A}^n_{t_m} \Delta + \exp\{ G_{t_m} \Delta \} (Z^{\mathbb{Q},n}_{t_m})^{-1} Z^{\mathbb{Q},n}_{t_{m+1}}\, \hat{g}_{1,t_{m+1}}(Y^n_{t_{m+1}}) \right\} \right\|^2$
    Set $\hat{g}_{1,t_m}(Y) = \left\langle Y, \hat{\beta}_{t_m} \right\rangle$
  end

As the process $Z^{\mathbb{Q}}$ is defined as a diagonal matrix of Radon-Nikodym derivatives, it is possible to re-write the conditional expected value over $\mathbb{Q}$ in equation (56) in an element-wise fashion as
$$\hat{g}^k_{1,t_m} = \widehat{A}^k_{t_m} \Delta + \sum_{k' \in \mathfrak{K}} \left( \exp\{ G_{t_m} \Delta \} \right)_{k,k'} \mathbb{E}^{\mathbb{P}^{k'}}\!\left[ \hat{g}^{k'}_{1,t_{m+1}} \,\middle|\, \mathcal{F}_{t_m} \right], \quad \forall k \in \mathfrak{K} , \tag{58}$$
where $(\cdot)_{k,k'}$ represents element $(k,k')$ of the matrix. The above representation allows one to modify Algorithm 1 such that it eliminates the dependence on the process $Z^{\mathbb{Q}}$ in the LSMC procedure, but at the cost of having to simulate the basis process $Y$ across all measures $\{\mathbb{P}^k\}_{k \in \mathfrak{K}}$. We find that in examples where simulating the process $Z^{\mathbb{Q}}_t$ is straightforward, this is much less efficient than Algorithm 1 due to the need to simulate and store $K$ copies of the process $Y$. In cases where $Z^{\mathbb{Q}}$ is intractable, however, this modification may be a viable alternative for computing $\hat{g}_1$.

Equation (58) also provides additional insight into how the optimal policy is trading. As pointed out in Section 3.4, the process $g_1$ represents the 'statistical arbitrage' portion of the agent's optimal trading strategy.
Equation (58) further reveals that, over one step, agents of type $k$ trade proportionally to the sum of their best estimate of the asset's drift ($\widehat{A}^k_{t_m} \Delta$) and a weighted average of the expected end-of-period 'alpha' from all sub-populations. Hence, agents trade based on expected exogenous price movements plus the effect they anticipate other traders' actions to have on the price. The weights generated by the matrix $\exp\{ G_{t_m} \Delta \}$ serve to risk-adjust the agent's own alpha trading and to adjust for the impact of other agents based on the scale of their market impacts.

7. Numerical Experiments

This section showcases numerical experiments resulting from a particular model of differing beliefs. We first assess the performance of the LSMC algorithm presented in Section 6 by comparing it, in the case of equal beliefs, to the analytical results in Casgrain and Jaimungal (2018). The algorithm is then used to approximate and simulate a finite collection of agents trading at the mean-field Nash-equilibrium when the agents have differing beliefs.

For the remainder of the section, we assume the asset price process follows a linear mean-reverting model of the type described in Section 5 with $K = 2$ sub-populations. Define the un-impacted asset price process $F = (F_t)_{t \in [0,T]}$ to be the solution to the SDE
$$dF_t = \kappa (\Theta_t - F_t) \, dt + \sigma \, dW_t , \tag{59}$$
where $\kappa, \sigma > 0$, $W = (W_t)_{t \in [0,T]}$ is a Wiener process in both measures $\mathbb{P}^1$ and $\mathbb{P}^2$, and $\Theta = (\Theta_t)_{t \in [0,T]}$ is a latent Markov chain with generator matrix $C$ which can take one of two values in the set $\{\theta_1, \theta_2\}$. The asset price process including the price impact is then defined as having the dynamics
$$dS^{\nu^{(N)}}_t = dF_t + \left( \lambda_1\, p^{(N)}_1\, \nu^{1,(N)}_t + \lambda_2\, p^{(N)}_2\, \nu^{2,(N)}_t \right) dt ,$$
with $\lambda_1, \lambda_2 > 0$ and $(p^{(N)}_k)_{k \in \mathfrak{K}}$ defined in Section 2. We assume sub-population 1 believes the initial value of $\Theta$ has distribution $\pi^1_0$, while sub-population 2 believes the initial value has distribution $\pi^2_0$.
The dynamics of the asset price process cause it to mean-revert towards the value of $\Theta_t$, which may change over the course of the trading period $[0,T]$. Furthermore, this model falls into the class of models described in Section 5, which guarantees that the mean-field optimal solution from Theorem 3.6 exists and is well defined.

To assess the LSMC algorithm described in Section 6, we choose a special, non-trivial case where $g_1$ can be computed in closed form. The case we study is when $\mathbb{P}^1 = \mathbb{P}^2 = \mathbb{P}$. This reduces the market model to one where all agents agree on the dynamics of the asset price process. We may then assess the accuracy of the approximation by comparing the results produced by the LSMC algorithm to the closed-form solution of the optimal control in Casgrain and Jaimungal (2018).

For this particular experiment, we use the model presented in the previous section, but where the prior on the initial states of the latent process is the same for all agents. The two sub-populations of agents may, however, differ in their parameter triplet $(\Psi_k, \phi_k, a_k)$. For the experiments we use the parameters in Table 1. The parameters chosen for this experiment match the parameters used in the simulations in Section 5 of Casgrain and Jaimungal (2018). Due to the large value of the parameter $\Psi_k$, agents in both sub-populations are incentivized to fully liquidate their inventory positions before the end of the trading horizon. The risk-aversion parameter $\phi_k$ is 10 times larger in sub-population 2 than in sub-population 1. This can be interpreted as a model in which agents in sub-population 2 are averse to holding any inventory and are intent on liquidating their inventories as quickly as possible, while agents in sub-population 1 do not feel such urgency and are more open to trading on alpha.

[Table 1 columns: $k$, $N_k$, $\Psi_k$, $\phi_k$, $a_k$; the numerical entries did not survive extraction.]
Table 1: Population and impact parameters for the two sub-populations of agents.
We set $T = 1$ to be the trading horizon for the model. The asset price process follows the Markov-modulated Ornstein-Uhlenbeck dynamics in (59), with parameters provided in Table 2. Table 2 also defines the parameters for the dynamics of the latent process $\Theta_t$. $\Theta_t$ is defined so that the asset price process either mean-reverts to $\theta_1 = 4.95$ or $\theta_2 = 5.05$, depending on the state of $\Theta$. In this particular experiment, we set the distribution of $\Theta_0$ so that there is an equal chance of starting in each of the states. We also choose an asymmetric generator matrix so that the latent process is twice as likely to spend time in state 1 as in state 2.

$\pi_0 = (0.5, 0.5)$, $C = [\,\dots\,]$, $\theta = \{4.95, 5.05\}$, $\kappa = 5.\dots$, $\sigma = 0.\dots$, $\lambda_k = 10^{-\dots}$
Table 2: The parameters used for the asset price dynamics and for the latent process.
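To make the price model (59) concrete, the following minimal sketch simulates one path of the two-state Markov-modulated Ornstein-Uhlenbeck process with an Euler scheme. Only $\theta = \{4.95, 5.05\}$, the equal prior and the 2:1 occupancy property come from the text; the volatility `sigma`, the generator `C`, the starting price `F0` and the function names are assumptions chosen for illustration, since the corresponding Table 2 entries are not legible here. The generator is stored with rows indexing the current state, which is the transpose of the convention in the paper's footnote.

```python
import math
import random


def stationary(C):
    """Stationary law of a two-state generator: proportional to (c21, c12)."""
    c12, c21 = C[0][1], C[1][0]
    return (c21 / (c12 + c21), c12 / (c12 + c21))


def simulate_mmou(T=1.0, M=1000, kappa=5.0, sigma=0.1,
                  theta=(4.95, 5.05), C=((-1.0, 1.0), (2.0, -2.0)),
                  pi0=(0.5, 0.5), F0=5.0, seed=0):
    """Euler scheme for dF = kappa (Theta - F) dt + sigma dW on t_m = m T / M.

    sigma, C and F0 are illustrative assumptions, not the paper's values.
    Returns (states, path) with states[m] in {0, 1} indexing theta.
    """
    rng = random.Random(seed)
    dt = T / M
    state = 0 if rng.random() < pi0[0] else 1
    F = F0
    states, path = [state], [F]
    for _ in range(M):
        # Leave the current state with probability (exit rate) * dt.
        if rng.random() < -C[state][state] * dt:
            state = 1 - state
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        F += kappa * (theta[state] - F) * dt + shock
        states.append(state)
        path.append(F)
    return states, path
```

With the assumed generator, `stationary(C)` returns weights of two thirds and one third, matching the statement that the chain spends twice as much time in state 1 as in state 2. Calling `simulate_mmou()` returns the latent state path and a price path that hovers around whichever of the two levels is currently active.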
We run Algorithm 1 using 10 simulated paths over a partition of size 3600 over the interval $[0,T]$ and compare the results from the LSMC algorithm applied to these simulated paths to the closed-form solution for $g_1$ in Casgrain and Jaimungal (2018). In this particular case, we set the basis process, $Y_t$, to be a second-order monomial expansion of $(S^\nu_t, \pi_t)$ with product terms included, where we define $\pi^i_t = \mathbb{P}\!\left( \Theta_t = \theta_i \mid \mathcal{F}_t \right)$ for $i = 1, 2$. For details on how to compute such conditional probabilities, see Appendix B and (Casgrain and Jaimungal, 2016, Section 3).
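The same kind of check can be reproduced on a toy problem. The sketch below runs the backward recursion (56) in a deliberately simplified setting that is an assumption of this example, not the paper's experiment: one sub-population ($K = 1$), a single measure so that $Z^{\mathbb{Q}} \equiv 1$, a constant scalar $G$ in place of the deterministic function $G_t$, and a filtered drift $\widehat{A}_t = X_t$ given by an Ornstein-Uhlenbeck process. With the basis $(1, x)$ the discrete recursion has a closed-form slope, so the regression output can be compared against it. All names and parameter values are hypothetical.

```python
import math
import random


def lsmc_g1(n_paths=4000, M=20, T=1.0, kappa=1.0, sigma=1.0, G=-0.5, seed=1):
    """Backward LSMC recursion (56) for K = 1, Z = 1, constant G, A_hat = X.

    X is an Ornstein-Uhlenbeck process simulated exactly on the grid and the
    regression basis is (1, x). Returns the fitted (intercept, slope) at t_0.
    """
    rng = random.Random(seed)
    dt = T / M
    rho = math.exp(-kappa * dt)
    sd = sigma * math.sqrt((1.0 - rho * rho) / (2.0 * kappa))
    # Forward pass: randomised initial state, then exact OU transitions.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_paths)]]
    for _ in range(M):
        X.append([rho * x + sd * rng.gauss(0.0, 1.0) for x in X[-1]])
    # Backward pass from the terminal condition g_hat = 0 (zero coefficients).
    b0, b1 = 0.0, 0.0
    disc = math.exp(G * dt)
    for m in range(M - 1, -1, -1):
        xs = X[m]
        # Regression target: A_hat * dt + exp(G dt) * g_hat at the next step.
        ys = [x * dt + disc * (b0 + b1 * xn) for x, xn in zip(xs, X[m + 1])]
        mx = sum(xs) / n_paths
        my = sum(ys) / n_paths
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        b1 = sxy / sxx
        b0 = my - b1 * mx
    return b0, b1


def exact_slope(M=20, T=1.0, kappa=1.0, G=-0.5):
    """Closed-form slope of the discrete recursion: dt * sum_i exp((G - kappa) dt)^i."""
    dt = T / M
    r = math.exp((G - kappa) * dt)
    return dt * (1.0 - r ** M) / (1.0 - r)
```

In this toy case $\hat{g}_{1,t_m}$ is linear in $X_{t_m}$, so the fitted slope at $t_0$ should be close to `exact_slope()` and the intercept close to zero, up to Monte Carlo error. The initial state is randomised in the forward pass, in the spirit of the randomisation the text reports as helpful.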
Figure 1: Error plots for the LSMC algorithm described in Section 6. In these plots, we compare the value of the LSMC estimate, $\hat{g}_1$, with the true value of $g_1$, in a special case where we can compute $g_1$ in closed form. The upper panel shows the standard deviation of the error, $\mathrm{SD}\!\left( \hat{g}^k_{1,t} - g^k_{1,t} \right)$, computed over 10 simulations, for each $k = 1, 2$ over $[0,T]$. The lower panel plots the quantity $\mathbb{E}\!\left[ | g^k_{1,t} - \hat{g}^k_{1,t} | \right] \big/ \mathbb{E}\!\left[ | g^k_{1,t} | \right]$ computed over 10 simulations, and provides a measure of relative error.

Figure 1 shows that the LSMC algorithm performs well and with a high level of accuracy with this particular model. In particular, from the lower panel, we see that the largest relative error is on the order of 1% of the absolute size of $g_1$. We have also observed, as elsewhere in the LSMC literature such as in Letourneau and Stentoft (2016) and Wang and Caflisch (2009), that randomizing the initial value of the state process, $(S_0, \pi_0)$, for the forward simulation portion of the algorithm significantly improves the estimates. Furthermore, the errors reported in this figure appear to be consistent across a wide variety of model parameters.

For the more general case in which there are different measures assigned to each population, we set $\mathbb{Q} = \mathbb{P}^1$ and we enlarge the basis process $Y_t$ to be a monomial expansion of the forward state process $(S^\nu_t, \{\pi^k_t\}_{k \in \mathfrak{K}}, \{Z^{\mathbb{P}^1,k}_t\}_{k \in \mathfrak{K}})$. Expansions with respect to different bases, such as Laguerre or Hermite polynomials, are also possible; however, in our experiments, we find the monomial basis expansion performs well enough.

In this section, we simulate the full market with agents of differing beliefs. The example continues to use the model in Section 7.1 with the parameters in Tables 1 and 2, with the exception that the distribution of $\Theta_0$ now differs across the sub-populations. In particular, we assume agents in sub-population 1 believe that the prior distribution over initial states is $\pi^1_0 = (\dots, \dots)$, while agents in sub-population 2 believe it is $\pi^2_0 = (\dots, \dots)$. In other words, sub-population 1 believes that the latent process will much more likely begin in the higher state, while sub-population 2 assumes the reverse. In the simulation, we also assume the starting inventory of agents in sub-population $k$ has distribution $Q^j_0 \sim \mathcal{N}(\bar{\mu}_k, \bar{\sigma}^2)$, where we set $\bar{\mu}_1 = 100$, $\bar{\mu}_2 = 0$ and $\bar{\sigma} = 50$. The rationale is so that the risk-averse sub-population 1 begins the trading period long 100 shares on average, while agents in sub-population 2 begin the trading period with zero shares on average. Over the course of the simulation, we fix the path of the latent process to begin in the upper state and then jump down to the lower state at $t = 0.5$. To compute the trading strategy of each participating agent, we use the LSMC method from the preceding section to approximate the value of $g_1$ and then use this value in Theorem 3.6 to determine the optimal trading rate of the fictitious mean-field and then of each individual agent. At each time step, we compute the basis process $Y_t$ by using a fifth-order polynomial expansion of the state process $(S^\nu_t, \{\pi^k_t\}_{k \in \mathfrak{K}}, \{Z^{\mathbb{P}^1,k}_t\}_{k \in \mathfrak{K}})$, and use the coefficients obtained by the LSMC algorithm to obtain an approximation for $g_{1,t}$. Computing the values of $\pi^k_t$ and $Z^{\mathbb{P}^1,k}_t$ requires the computation of a collection of posterior probabilities at each time step. To do this, we make use of the filtering and smoothing equations which are detailed in Appendix B.

Figure 2: State processes from a single simulated scenario of the market.
Left panel: inventory paths of all agents (sub-populations separated by color), the sub-population mean-field inventory process $\bar q^k$, and the sub-population empirical mean inventory $\bar q^{k,(N)}$. Right panel: (top) the unimpacted F and impacted S asset price processes, and the latent Markov chain $\Theta$; (middle) sub-population filters $\pi^{k,j}_t = P^k(\Theta_t = \theta_j \mid \mathcal{F}_t)$ for the latent process state; (bottom) the Radon-Nikodym derivative process between the measures of the two sub-populations on $\mathcal{F}_t$.

Figure 2 shows one sample path of the simulation of all agents. The figure demonstrates a number of path-wise properties of the trading algorithm and of the beliefs of each sub-population of agents. First, the left panel shows that the agents' inventory paths differ significantly between the sub-populations. As mentioned in Section 7.1, and as a result of the population parameters in Table 1, sub-population 1 is far more risk-averse than sub-population 2. This is reflected in the path-wise variance of their inventories.

Agents in sub-population 1 begin long the asset on average. As these agents are risk-averse, their main concern is to unwind their position quickly. They are, however, conscious of their own expectations of the future path of the asset price as well as the expectations of sub-population 2, which they use to adjust the rate at which their inventory is liquidated. This last effect can be seen through the variations of the inventory paths of sub-population 2 in Figure 2.

Agents in sub-population 2 are instead concerned with profiting from statistical arbitrage. They begin the trading period by incorrectly assigning a 90% probability that the latent process is in the upper state. Because of this, they expect the asset price to mean-revert downwards slightly, so they begin by taking a slight short position in the asset early in the trading period. By t = 0.15, the asset price has approximately reached the lower mean-reverting level. These agents then expect the asset price to revert upwards in the long run, since they expect the state of the latent process to switch. Because of this, they begin turning their short position into a long position over the period that follows. By t = 0.4, agents from sub-population 2 are confident that the latent process is in the upper state. Following the same logic as before, they expect the price to mean-revert downwards in the long run, due to an expected switch in the latent process. Thus they gradually shift to a short position and repeat the same process. The magnitude of the long and short positions of sub-population 2 decreases as the end of the trading period approaches. This is because these agents are highly incentivized to completely liquidate their inventory before time t = 1, and therefore reduce their absolute exposure to make full liquidation easier.

From the center-right part of Figure 2, we also see that the posterior distributions over latent states for each group converge to one another as time progresses. Although their priors differ, the agents collect enough information that the effect of the priors on the posterior is negligible after a certain time. Furthermore, as was pointed out in the discussion following equation (58), the strategies of agents from different sub-populations feed into one another. This causes agents from different sub-populations to move synchronously with one another, as seen in the left panel of Figure 2, where the upward and downward variations in agents' strategies happen simultaneously.

The actions of agents from both sub-populations demonstrate that the optimal control incorporates the beliefs of all agents and weighs them against their own. The filter paths in the middle-right panel of Figure 2 show how both sub-populations eventually learn the true value of the latent state with high confidence. This occurs by observing the paths of the price process only, even if their beliefs on its initial state are incorrect.
The Radon-Nikodym derivative path in the bottom-right panel of Figure 2 provides a sense of how far apart the measures of sub-populations 1 and 2 are. This process varies significantly over the course of the trading period, since agents are constantly updating their estimate of the latent process by observing order-flow and the price paths. The variation in this process also demonstrates that there is a non-trivial interdependence between the actions of each agent and the beliefs of all other agents.

Using the same latent Markov model as in Section 7.2, we study the predicted behaviour of market prices and of market participants as we vary the degree of disagreement across sub-populations. We investigate the effect of disagreement on both the volatility of market prices and the total trading turnover of market participants.

We assume K = 2 sub-populations of equal size, each with $N_k = 30$ agents. The sub-populations have identical preferences but differ in their beliefs about the market. We set the agents' preference parameters to Ψ k = a k = 10 − , φ k = 5 × − for k = 1 ,
2, and Ψ k = a k (so that agentsare not necessarily forced to arrive at time T with zero inventory). The initial inventory positionsof agents are drawn from Q j ∼ N (0 , ¯ σ ) for all j ∈ N , with ¯ σ = 50.The two-state latent Markov process Θ t has generator matrix C = 0, and hence Θ t = Θ for all t ∈ [0 , T ], however, Θ is random and inaccesible to agents. Table 3 lists the assumed parametersof the mean-reverting asset price pocess, as well as the latent process. S = 5, θ = { . , . } , κ = 5 . σ = 0 .
14, and λ = λ = 10 − . Table 3: The parameters used for the asset price dynamics to generate Figure 3.
To introduce disagreement into this setup, we assume that sub-populations have different prior be-liefs on the distribution of Θ . In particular, we assume π = (cid:16) . π . − ∆ π (cid:17) and π = (cid:16) . − ∆ π . π (cid:17) , where∆ π ∈ [0 , .
5) quantifies the level of disagreement across the two sub-populations. In simulations, weassume the true probability distribution of the latent process is P (Θ = 4 .
95) = P (Θ = 5 .
05) = 0 .

Figure 3: Estimated statistics of the simulated market as the degree of disagreement $\Delta\pi$ varies. Left panel: standard deviation of the asset price. Center panel: absolute deviation of the asset price from the un-impacted price. Right panel: average absolute trading rate.

Figure 3 shows various statistics (standard deviation of price, average absolute deviation from the un-impacted price, and average absolute trading rate) of trading activity within the market resulting from 10 simulations. All three panels show that each plotted statistic increases with the level of disagreement. In particular, the right panel shows that trading volume increases, which drives up the standard deviation of the asset price process (as seen in the left panel) and the net impact of trading (as shown in the center panel).

These experiments suggest that an increase in disagreement among a population of agents increases market volume and asset price volatility. These observations are consistent with those of Bayraktar and Munk (2017), who also observe an increase in market activity as disagreement increases.
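The mechanism behind the converging filters and the Radon-Nikodym path can be illustrated with a small sketch. It is not the simulation used for the figures: it assumes a static two-state latent level, an un-impacted price following a discretized mean-reverting diffusion, priors of the form $(0.5 + \Delta\pi, 0.5 - \Delta\pi)$ and $(0.5 - \Delta\pi, 0.5 + \Delta\pi)$, and illustrative parameter values. The function names `simulate_price` and `filter_latent_level` are hypothetical. The last line evaluates the density in (A.39) from the filter of sub-population 1.

```python
import numpy as np

def simulate_price(theta_true, f0, kappa, sigma, dt, n_steps, rng):
    """Euler scheme for dF = kappa * (theta - F) dt + sigma dW with a constant level theta."""
    f = np.empty(n_steps + 1)
    f[0] = f0
    for n in range(n_steps):
        noise = sigma * np.sqrt(dt) * rng.normal()
        f[n + 1] = f[n] + kappa * (theta_true - f[n]) * dt + noise
    return f

def filter_latent_level(f, prior, thetas, kappa, sigma, dt):
    """Posterior P(Theta = theta_i | F_0, ..., F_n) for every n, by recursive Bayes updates."""
    post = np.empty((len(f), len(thetas)))
    post[0] = prior
    for n in range(len(f) - 1):
        # Gaussian log-likelihood of the observed increment under each candidate level
        resid = (f[n + 1] - f[n]) - kappa * (thetas - f[n]) * dt
        log_lik = -0.5 * resid**2 / (sigma**2 * dt)
        weights = post[n] * np.exp(log_lik - log_lik.max())
        post[n + 1] = weights / weights.sum()
    return post

# Illustrative values only (assumptions, not the parameters of Table 3).
thetas = np.array([4.95, 5.05])
kappa, sigma, dt, n_steps = 5.0, 0.14, 1e-3, 1000
delta_pi = 0.4
prior_1 = np.array([0.5 + delta_pi, 0.5 - delta_pi])
prior_2 = np.array([0.5 - delta_pi, 0.5 + delta_pi])

rng = np.random.default_rng(1)
path = simulate_price(thetas[1], 5.0, kappa, sigma, dt, n_steps, rng)
post_1 = filter_latent_level(path, prior_1, thetas, kappa, sigma, dt)
post_2 = filter_latent_level(path, prior_2, thetas, kappa, sigma, dt)

# Density of measure 2 with respect to measure 1 on the observed filtration, as in (A.39):
# a prior-ratio-weighted sum of sub-population 1's posterior probabilities.
z_21 = post_1 @ (prior_2 / prior_1)
```

Because both sub-populations apply the same likelihood to the same price path, their posteriors differ only through the prior ratio, which is why `post_2` can be recovered from `post_1` and `z_21` by Bayes' rule. Plotting `post_1[:, 1]`, `post_2[:, 1]` and `z_21` against time gives a qualitative analogue of the middle and bottom right panels of Figure 2.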
8. Conclusion
This paper introduced a stochastic game for a market in which sub-populations of agents have different risk preferences and beliefs about the model for the asset price process. By taking the infinite-population limit of the model, we obtained a more tractable mean-field game (MFG) model for the market. Using tools from convex analysis, we provided an FBSDE characterization of the optimal control of each agent and thus of the Nash equilibrium of the MFG. This FBSDE is high-dimensional and non-standard, as the martingale components for each dimension are martingales under different probability measures. Through change-of-measure techniques, we obtained a solution to this FBSDE system and to the collection of mean-field optimal controls. We also demonstrated that the MFG optimal control satisfies the $\epsilon$-Nash property, which implies that the limiting Nash equilibrium can be arbitrarily close to the Nash equilibrium of the finite-player game as long as the population size is large enough. Lastly, we provided an LSMC approximation to the MFG optimal control and used it to study example simulations of markets near their Nash equilibrium. In a simulation setting, increasing disagreement among market participants appears to increase price volatility, price deviation from the un-impacted market price, and trading volume.

Appendix A. Proofs
Appendix A.1. Proof of Lemma 3.2
Proof.
To show that the claim holds, we need to show that for any $\rho \in (0,1)$,

$$H_j^{\bar\nu}(\rho\nu + (1-\rho)\omega) - \rho\, H_j^{\bar\nu}(\nu) - (1-\rho)\, H_j^{\bar\nu}(\omega) > 0 \tag{A.1}$$

for all $\nu, \omega \in \mathcal{A}^j$ where $\nu_t = \omega_t$ at most on $\mathbb{P} \times \mu$ null sets. By noting that

$$q^{j,\rho\nu+(1-\rho)\omega}_t = \rho\, q^{j,\nu}_t + (1-\rho)\, q^{j,\omega}_t, \tag{A.2}$$

we may compute the difference (A.1) using the representation (11) of $H_j^{\bar\nu}$ to obtain

$$\text{LHS of (A.1)} = \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \left\{ \rho \begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix} + (1-\rho) \begin{pmatrix}\omega_t \\ q^{j,\omega}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t \\ q^{j,\omega}_t\end{pmatrix} - \left(\rho \begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix} + (1-\rho) \begin{pmatrix}\omega_t \\ q^{j,\omega}_t\end{pmatrix}\right)^{\!\top} \Gamma_k \left(\rho \begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix} + (1-\rho) \begin{pmatrix}\omega_t \\ q^{j,\omega}_t\end{pmatrix}\right) \right\} dt\right]$$

and, completing the square,

$$\text{LHS of (A.1)} = \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \rho(1-\rho) \left(\begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix} - \begin{pmatrix}\omega_t \\ q^{j,\omega}_t\end{pmatrix}\right)^{\!\top} \Gamma_k \left(\begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix} - \begin{pmatrix}\omega_t \\ q^{j,\omega}_t\end{pmatrix}\right) dt\right],$$

where we define the matrix $\Gamma_k = \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix}$.

By defining the terms $\Delta_t = \nu_t - \omega_t$ and $q^{\Delta}_t = q^{j,\nu}_t - q^{j,\omega}_t$, we can expand the above expression to obtain

$$\text{LHS of (A.1)} = \rho(1-\rho)\, \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \left\{ a_k \Delta_t^2 + \phi_k \left(q^{\Delta}_t\right)^2 + 2\Psi_k \Delta_t\, q^{\Delta}_t \right\} dt\right]. \tag{A.3}$$

As $\rho \in (0,1)$ and $\phi_k \ge 0$, the middle term in (A.3) is $\ge 0$. Next, let us focus on the right-most term in (A.3). Because $q^{\Delta}_0 = 0$, we may write $q^{\Delta}_t = \int_0^t \Delta_u\, du$. Integrating by parts then yields

$$\mathbb{E}^{\mathbb{P}^k}\left[\int_0^T 2\, \Delta_t\, q^{\Delta}_t\, dt\right] = \mathbb{E}^{\mathbb{P}^k}\left[\left(q^{\Delta}_T\right)^2\right] \ge 0. \tag{A.4}$$

As $\Psi_k \ge 0$, this inequality implies that the right-most term in (A.3) is non-negative. Lastly, notice that if $(\mathbb{P} \times \mu)(\nu_t \neq \omega_t) > 0$, then $(\mathbb{P}^k \times \mu)(\nu_t \neq \omega_t) > 0$, and therefore

$$\mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \Delta_t^2\, dt\right] > 0. \tag{A.5}$$

As $a_k > 0$, this result, together with the inequalities for the other two terms, shows that (A.1) is strictly greater than zero.
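The two algebraic facts used in this proof can be checked numerically. The following is a minimal sketch under assumed, purely illustrative parameter values; the helper names `quad_form` and `concavity_gap` are hypothetical. It verifies the completing-the-square identity behind (A.3) for a single time point, and a discrete analogue of the integration by parts step (A.4).

```python
import numpy as np

def quad_form(x, gamma):
    """Return x^T Gamma x for a 2-vector x = (trading rate, inventory)."""
    return float(x @ gamma @ x)

def concavity_gap(x, y, rho, gamma):
    """rho*Q(x) + (1-rho)*Q(y) - Q(rho*x + (1-rho)*y), with Q(z) = z^T Gamma z."""
    mix = rho * x + (1.0 - rho) * y
    return (rho * quad_form(x, gamma)
            + (1.0 - rho) * quad_form(y, gamma)
            - quad_form(mix, gamma))

# Hypothetical parameter values, chosen only for illustration.
a_k, psi_k, phi_k = 1.0, 0.5, 0.2
gamma = np.array([[a_k, psi_k], [psi_k, phi_k]])

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
rho = 0.3

# Completing the square: the gap equals rho*(1-rho)*Q(x - y).
gap = concavity_gap(x, y, rho, gamma)
closed_form = rho * (1.0 - rho) * quad_form(x - y, gamma)

# Discrete analogue of (A.4): with q_0 = 0 and q_{i+1} = q_i + delta_i * dt,
# sum_i delta_i * (q_i + q_{i+1}) * dt telescopes to q_T^2.
dt = 1e-3
delta = rng.normal(size=1000)
q = np.concatenate(([0.0], np.cumsum(delta) * dt))
lhs = np.sum(delta * (q[:-1] + q[1:]) * dt)
rhs = q[-1] ** 2
```

Note that with these values $\Gamma_k$ is not positive semi-definite, so the pointwise quadratic form can be negative. This is why the proof treats the cross term separately through (A.4) rather than relying on the sign of the integrand.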
Appendix A.2. Proof of Lemma 3.3
Proof.
Using the definition of the Gâteaux derivative,

$$\left\langle \mathcal{D} H_j^{\bar\nu}(\nu), \omega \right\rangle = \lim_{\epsilon \searrow 0} \frac{H_j^{\bar\nu}(\nu + \epsilon\,\omega) - H_j^{\bar\nu}(\nu)}{\epsilon}, \tag{A.6}$$

we aim to show that this limit exists and is equal to the result provided in the lemma. Using the representation for the objective $H_j^{\bar\nu}$ in (11), canceling out the $t = 0$ terms, and using the linearity of the process $q^{j,\nu}_t - q^{j,\nu}_0$ in the variable $\nu$, we have

$$H_j^{\bar\nu}(\nu + \epsilon\,\omega) - H_j^{\bar\nu}(\nu) = \epsilon\, \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \left\{ (q^{j,\omega}_t - q^{j,\omega}_0)(A^k_t + \lambda_k^{\top} \bar\nu_t) - 2 \begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t \\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix} \right\} dt\right] - \epsilon^2\, \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \begin{pmatrix}\omega_t \\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t \\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix} dt\right], \tag{A.7}$$

where $\Gamma_k = \begin{pmatrix} a_k & \Psi_k \\ \Psi_k & \phi_k \end{pmatrix}$. Dividing by $\epsilon$ and taking the limit yields

$$\left\langle \mathcal{D} H_j^{\bar\nu}(\nu), \omega \right\rangle = \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T \left\{ (q^{j,\omega}_t - q^{j,\omega}_0)(A^k_t + \lambda_k^{\top} \bar\nu_t) - 2 \begin{pmatrix}\nu_t \\ q^{j,\nu}_t\end{pmatrix}^{\!\top} \Gamma_k \begin{pmatrix}\omega_t \\ q^{j,\omega}_t - q^{j,\omega}_0\end{pmatrix} \right\} dt\right]. \tag{A.8}$$

Expanding the right part of the integrand in (A.8) and re-grouping terms,

$$\left\langle \mathcal{D} H_j^{\bar\nu}(\nu), \omega \right\rangle = \mathbb{E}^{\mathbb{P}^k}\left[\int_0^T (q^{j,\omega}_t - q^{j,\omega}_0)\left( A^k_t + \lambda_k^{\top} \bar\nu_t - 2\left(\phi_k\, q^{j,\nu}_t + \Psi_k\, \nu_t\right)\right) dt - 2\int_0^T \omega_t \left( a_k \nu_t + \Psi_k\, q^{j,\nu}_t \right) dt\right]. \tag{A.9}$$

As $\nu, \omega \in \mathcal{A}^j$ and $\bar\nu, \widehat{A} \in \mathbb{H}^2_T$, the sufficient conditions for Fubini's theorem are met. Writing $q^{j,\omega}_t - q^{j,\omega}_0 = \int_0^t \omega_s\, ds$, and applying Fubini's theorem, the tower property, and the fact that $\omega_t$ is $\mathcal{F}^j_t$-measurable,

$$\begin{aligned}
\left\langle \mathcal{D} H_j^{\bar\nu}(\nu), \omega \right\rangle
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\left[\omega_t \left( -2a_k \nu_t - 2\Psi_k\, q^{j,\nu}_T + \int_t^T \left\{ A^k_u + \lambda_k^{\top} \bar\nu_u - 2\phi_k\, q^{j,\nu}_u \right\} du \right)\right] dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\left[\omega_t \left( -2a_k \nu_t - 2\Psi_k\, q^{j,\nu}_T + \mathbb{E}^{\mathbb{P}^k}\left[\int_t^T \left\{ A^k_u + \lambda_k^{\top} \bar\nu_u - 2\phi_k\, q^{j,\nu}_u \right\} du \,\Big|\, \mathcal{F}^j_t\right] \right)\right] dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\left[\omega_t \left( -2a_k \nu_t - 2\Psi_k\, q^{j,\nu}_T + \int_t^T \mathbb{E}^{\mathbb{P}^k}\left[ A^k_u + \lambda_k^{\top} \bar\nu_u - 2\phi_k\, q^{j,\nu}_u \,\Big|\, \mathcal{F}^j_t\right] du \right)\right] dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\left[\omega_t \left( -2a_k \nu_t - 2\Psi_k\, q^{j,\nu}_T + \int_t^T \mathbb{E}^{\mathbb{P}^k}\left[ \mathbb{E}^{\mathbb{P}^k}\left[ A^k_u + \lambda_k^{\top} \bar\nu_u \,\Big|\, \mathcal{F}^j_u\right] - 2\phi_k\, q^{j,\nu}_u \,\Big|\, \mathcal{F}^j_t\right] du \right)\right] dt \\
&= \int_0^T \mathbb{E}^{\mathbb{P}^k}\left[\omega_t \left( -2a_k \nu_t - 2\Psi_k\, q^{j,\nu}_T + \int_t^T \left\{ \mathbb{E}^{\mathbb{P}^k}\left[ A^k_u + \lambda_k^{\top} \bar\nu_u \,\Big|\, \mathcal{F}^j_u\right] - 2\phi_k\, q^{j,\nu}_u \right\} du \right)\right] dt,
\end{aligned}$$

which gives the desired result.

Appendix A.3. Proof of Theorem 3.4
Proof.
By using Lemmas 3.2 and 3.3 we may apply the results of (Ekeland and Temam, 1999, Section 5), which state that, for each $j \in J$,

$$\left\langle \mathcal{D} H_j^{\bar\nu}(\nu^{j,*}), \omega \right\rangle = 0, \;\; \forall\, \omega \in \mathcal{A}^j \quad \Leftrightarrow \quad \nu^{j,*} = \arg\max_{\nu \in \mathcal{A}^j} H_j^{\bar\nu}(\nu). \tag{A.10}$$

Further, the strict concavity of $H$ implies that $\nu^{j,*}$ is unique up to $\mathbb{P} \times \mu$ null sets. Therefore we need only demonstrate that $\langle \mathcal{D} H_j^{\bar\nu}(\nu^{j,*}), \omega \rangle = 0$, $\forall\, \omega \in \mathcal{A}^j$, if and only if $\nu^{j,*}$ is the solution to the FBSDE (21).

Sufficiency: Suppose that $\nu^{j,*}$ is the solution to the FBSDE (21) and that $\nu^{j,*} \in \mathbb{H}^2_T$. We now show that $\nu^{j,*} \in \mathcal{A}^j$ and that $\langle \mathcal{D} H_j^{\bar\nu}(\nu^{j,*}), \omega \rangle = 0$, $\forall\, \omega \in \mathcal{A}^j$.

First, the solution to the FBSDE may be represented implicitly as

$$2a_k\, \nu^{j,*}_t = \mathbb{E}^{\mathbb{P}^k}\left[ -2\Psi_k\, q^{j,\nu^{j,*}}_T + \int_t^T \left\{ \mathbb{E}^{\mathbb{P}^k}\left[ A^k_u + \lambda_k^{\top} \bar\nu_u \,\Big|\, \mathcal{F}^j_u \right] - 2\phi_k\, q^{j,\nu^{j,*}}_u \right\} du \,\Big|\, \mathcal{F}^j_t \right], \tag{A.11}$$

which demonstrates that $\nu^{j,*}$ is $\mathcal{F}^j$-adapted. Therefore, since $\nu^{j,*} \in \mathbb{H}^2_T$ and $\nu^{j,*}$ is $\mathcal{F}^j$-adapted, we have that $\nu^{j,*} \in \mathcal{A}^j$. Second, by inserting (A.11) into the expression for the Gâteaux derivative (18) from Lemma 3.3 and using the tower property, we find that it vanishes almost surely.

Necessity:
Suppose that (cid:104)D H ν j ( ν j, ∗ ) , ω (cid:105) = 0, ∀ ω ∈ A j , then E P k (cid:20) − a k ν j, ∗ t − k q j,ν j, ∗ T + (cid:90) Tt (cid:110) E P k (cid:104) A ku + λ (cid:124) k ν u | F ju (cid:105) − φ k q j,ν j, ∗ u (cid:111) du (cid:12)(cid:12)(cid:12)(cid:12) F jt (cid:21) = 0 , P × µ a.e. (A.12)To see this, suppose that (cid:104)D H ν j ( ν j, ∗ ) , ω (cid:105) = 0 for all ω ∈ A j , but (A.12) does not hold. Then, choose (cid:101) ω = ( (cid:101) ω t ) t ∈ [0 ,T ] s.t., (cid:101) ω t = E P k (cid:20) − a k ν j, ∗ t − k q j,ν j, ∗ T + (cid:90) Tt (cid:110) E P k (cid:104) A ku + λ (cid:124) k ν u | F ju (cid:105) − φ k q j,ν j, ∗ u (cid:111) du (cid:12)(cid:12)(cid:12)(cid:12) F jt (cid:21) . (A.13) uch (cid:101) ω is F j -adapted by its very definition. Second, as ν k , ν j, ∗ , A k ∈ H T , Jensen’s and the triangle inequalityapplied to (A.13) implies the bound E P k (cid:20)(cid:90) T ( (cid:101) ω t ) dt (cid:21) ≤ C k (cid:18) E P k (cid:20)(cid:90) T ( ν j, ∗ t ) dt (cid:21) + E P k (cid:20)(cid:90) T (cid:16) ( A kt ) + λ ( ν t ) (cid:17) dt (cid:21)(cid:19) < ∞ , where the constant C k = 4 (cid:0) a k + T Ψ k + T φ k (cid:1) . Hence, (cid:101) ω ∈ H T and therefore (cid:101) ω ∈ A j . Inserting this choice of (cid:101) ω into the expression for the Gˆateaux derivative (18), we see that (cid:104)D H ν j ( ν j, ∗ ) , (cid:101) ω (cid:105) >
0, and hence contradicts theassumption that (cid:104)D H ν j ( ν j, ∗ ) , ω (cid:105) = 0, ∀ ω ∈ A j .Thus, using (A.12) and noting that ν j, ∗ t is F j -adapted, using the tower property, we may write2 a k ν j, ∗ t = E P k (cid:20) − k q j,ν j, ∗ T + (cid:90) Tt (cid:110) E P k (cid:104) A ku + λ (cid:124) k ν u (cid:12)(cid:12)(cid:12) F ju (cid:105) − φ k q j,ν j, ∗ u (cid:111) du (cid:12)(cid:12)(cid:12)(cid:12) F jt (cid:21) , (A.14)and 2 a k M jt = E P k (cid:20) − k q j,ν j, ∗ T + (cid:90) T (cid:110) E P k (cid:104) A ku + λ (cid:124) k ν u (cid:12)(cid:12)(cid:12) F ju (cid:105) − φ k q j,ν j, ∗ u (cid:111) du (cid:12)(cid:12)(cid:12)(cid:12) F jt (cid:21) , (A.15)which solves the FBSDE in the statement of the proposition. Appendix A.4. Proof of Theorem 3.5
We separate this proof into three parts, corresponding to each of the claims of the proposition.
Part (I):
To obtain the solution to g , we first compute the SDE for E Q g , using the SDE for g in Equation (31) and theSDE of E Q in Equation (38). After expanding the SDE for E Q g , and grouping terms, we find − d (cid:16) E Q t g ,t (cid:17) = E Q t (cid:98) A t dt − E Q t (cid:110) d M t − ( Z Q t ) − d (cid:104) Z Q , M (cid:105) t + ( Z Q t ) − d Z Q t E Q t g ,t } (cid:111) . (A.16)As Z Q is a Radon-Nikodym derivative process, it must be a Q -martingale, and by extension, the term E Q t ( Z Q t ) − d Z Q t E Q t g ,t is the increment of a Q -martingale. Next, by the Girsanov-Meyer theorem Protter (2005)[Chapter III, Thm. 35], theremainder of the terms in the curly brackets of Equation (A.16) sum to the increment of a Q -martingale. Because ofthis, we may re-write the BSDE for E Q t g ,t as − d (cid:16) E Q t g ,t (cid:17) = E Q t (cid:98) A t dt − d ˜ M t , (A.17)for some martingale term ˜ M . Using this last result, we may write out the implicit form of the solution as E Q t g ,t = E Q (cid:20) (cid:90) Tt E Q u (cid:98) A u du (cid:12)(cid:12)(cid:12)(cid:12) F jt (cid:21) . (A.18)Lastly, multiplying the result on both sides by ( E Q t ) − , we obtain the stated solution. Part (II):
The ODE (30) is a matrix-valued non-symmetric Riccati-type ODE. We prove the claims concerning the ODE (30)by applying theorems and tools for non-symmetric Riccati ODEs in Freiling et al. (2000) and Freiling (2002). Firstly,define ˜ g ,t = g ,T − t . We show that all of the claims hold for ˜ g ,t , and hence also for g ,t .From ODE (30) (cid:26) ∂ t ˜ g ,t = (cid:0) Λ + ˜ g ,t (cid:1) (2 a ) − ˜ g ,t − φ ˜ g , = − Ψ . (A.19)Next we aim to use Theorem 2.3 of Freiling et al. (2000) on ˜ g ,t to prove existence and boundedness of a solution.Using the notation in Freiling et al. (2000), define B = , B = − J, B = − φ , B = Λ J , (A.20)and W = − Ψ , where J = (2 a ) − . To meet the requirements of Theorem 2.3 in Freiling et al. (2000), we must find C, D ∈ R K × K , C = C (cid:124) so that L + L (cid:124) ≤ C + DW + W (cid:124) D (cid:124) >
0, where L = (cid:18) − D φ − CJ + D Λ J − J (cid:124) D (cid:19) . (A.21)Let D = I ( K × K ) and C = 5 Ψ . With these choices of C, D , and using the fact that Ψ is a diagonal matrix with ositive entries, we find that C + DW + W (cid:124) D (cid:124) = Ψ > , (A.22)which meets one of the necessary conditions. The choices of C and D also imply that the matrix L takes the form L = (cid:18) − φ − (5 Ψ + Λ ) J − J (cid:19) . (A.23)Next, as det( L ) = det( − φ ) × det( − J ), the set of eigenvalues of L is the union of the set of eigenvalues of − φ andthose of − J . Because − φ ≤ − J <
0, all eigenvalues of L are guaranteed to be non-positive, and at least oneof them is guaranteed to be non-zero, implying that L <
0. Hence, L + L (cid:124) < g ,t exists and is continuous on the interval [0 , T ], it follows that it is also bounded on thisinterval. Since the solution is guaranteed to exist and to be bounded, we may apply (Freiling, 2002, Thm 3.1), whichguarantees that the solution is unique and takes the form (42), as desired. Part (III):
The reader may verify that the presented solution for the Ricatti ODE (25) is valid. Moreover, it is also easy toverify that the solution is bounded and continuous in the interval [0 , T ]. All that remains is to show that h k ,t ≤ t ∈ [0 , T ]. If we notice that since t < T and γ k ≥ − γ k ( T − t )) ≤ − γ k ( T − t )) ≥
1. As ξ k , Ψ k ≥ k cosh ( − γ k ( T − t ) ) − ξ k sinh ( − γ k ( T − t ) ) ξ k cosh ( − γ k ( T − t ) ) − Ψ k sinh ( − γ k ( T − t ) ) ≥ , (A.24)and the desired result follows. Appendix A.5. Proof of Theorem 3.6
To demonstrate the claim of the theorem, we need to show that the optimality conditions of Theorem (3.4) arefulfilled. As demonstrated in Section 3.3, if there exists solutions to the Ricatti-type ODEs (25) for { h k ,t } k ∈ K , amatrix-valued Ricatti-type ODE 30 for g ,t as well as the vector-valued BSDE (31) for g ,t , then the solution to theoptimality FBSDE (20) follows the exact form presented in the statement of this theorem. In Theorem 3.5, we showedthat there exist solutions to these FBSDEs, and hence the solution to the optimality FBSDE of Theorem (3.4) issolved.All that remains to be shown is that the solution to the optimality FBSDE also belongs to an individual agent’sset of admissible strategies, A j and that the consistency conditions are met.First, we show that ν j, ∗ ∈ A j . To do this, we must demonstrate that ν j, ∗ is F j -predictable and contained in H T .By the definition of Z Q in equation (39), it is an F -adapted process, and by extension E Q t must also be F -predictable.Therefore, by the definition of the conditional expected value, the solution to g ,t presented in Theorem 3.5 mustbe F -predictable, and hence the mean-field processes { ν k } k ∈ K must all be F -predictable as well. Lastly, since ν j, ∗ t = ν kt + h k ,t a k ( q j,ν j, ∗ t − ¯ q k,ν k t ) and since ¯ q k,ν k t is F -predictable, and since h k ,t is deterministic, we have that ν j, ∗ t must be F j -adapted.Next, we must show that ν j, ∗ ∈ H T . Noting that d ¯ q ν ∗ t = ν ∗ t dt = ( g ,t + g ,t ¯ q ν ∗ t ) dt and that ¯ q ν ∗ = ( ¯ m k ) k ∈ K = ¯ m ,we can solve for ¯ q t directly as ¯ q ν ∗ t = E (cid:18)(cid:90) t g ,s ds (cid:19) ¯ m + (cid:90) t E (cid:18)(cid:90) s g ,s ds (cid:19) g ,s ds , (A.25)where E (cid:16)(cid:82) t g ,s ds (cid:17) is the solution to the time-ordered matrix exponential of g ,s . 
Thus by Yonge’s inequality andthe boundedness of g ,t , E P k (cid:90) T (cid:107) ¯ q ν ∗ u (cid:107) du ≤ (cid:18) (cid:107) ¯ m (cid:107) (cid:90) T (cid:13)(cid:13) E (cid:18)(cid:90) t g ,s ds (cid:19)(cid:13)(cid:13) ds + T (cid:90) T (cid:13)(cid:13) E (cid:18)(cid:90) s g ,s ds (cid:19)(cid:13)(cid:13) (cid:13)(cid:13) g ,s (cid:13)(cid:13) ds (cid:19) (A.26) ≤ C + C (cid:90) T (cid:13)(cid:13) g ,s (cid:13)(cid:13) ds < ∞ , (A.27)for some C , C >
0, where (cid:107)·(cid:107) represents the (cid:96) operator norm, . Hence, ¯ q ν ∗ ∈ H T . ext, using this last fact, if we compute the expected integrated squared norm of ν over [0 , T ], we find that E P k (cid:90) T (cid:107) ν ∗ u (cid:107) du = E P k (cid:90) T (cid:13)(cid:13)(cid:13) g ,t + g ,t ¯ q ν u (cid:13)(cid:13)(cid:13) du (A.28) ≤ (cid:18) E P k (cid:90) T (cid:13)(cid:13) g ,t (cid:13)(cid:13) du + E P k (cid:90) T (cid:107) g ,t (cid:107) (cid:107) ¯ q ν u (cid:107) du (cid:19) (A.29) ≤ C + C E P k (cid:90) T (cid:107) ¯ q ν u (cid:107) du < ∞ , (A.30)for some constants C , C > g ,t is bounded over the interval [0 , T ] and the fact that g ∈ H T (as stated in the conditions of the theorem). Hence, ν ∈ H T .Next, notice that E P k (cid:90) T | ν j, ∗ u | du ≤ (cid:18) E P k (cid:90) T | ν k, ∗ u | du + E P k (cid:90) T | ν j, ∗ u − ν k, ∗ u | du (cid:19) . (A.31)As ν k, ∗ ∈ H T , the above demonstrates that it is sufficient to show that ν j, ∗ u − ν k, ∗ u ∈ H T to guarantee that ν j, ∗ ∈ H T .Similarly to ¯ q t , if we notice that d ( q j,ν j, ∗ t − ¯ q k,ν k, ∗ t ) = ( ν j, ∗ t − ν k, ∗ t ) dt = h k ,t a k ( q j,ν j, ∗ t − ¯ q k,ν k, ∗ t ) dt and that ( q j,ν j, ∗ − ¯ q k,ν k, ∗ ) = Q j − ¯ m k , we can solve exactly for this difference as q j,ν j, ∗ t − ¯ q k,ν k, ∗ t = (cid:16) Q j − ¯ m k (cid:17) e (cid:82) t hk ,t ak . (A.32)As E P k ( Q j ) < ∞ and h k ,t ≤ (cid:16) q j,ν j, ∗ t − ¯ q k,ν k, ∗ t (cid:17) ∈ H T .Using the solution to ν j, ∗ and using the result above, E P k (cid:90) Tt | ν j, ∗ u − ν k, ∗ u | du ≤ sup t ∈ [0 ,T ] ( h k ,t ) a k E P k (cid:90) Tt (cid:12)(cid:12)(cid:12) q j,ν j, ∗ t − ¯ q k,ν k, ∗ t (cid:12)(cid:12)(cid:12) du < ∞ , (A.33)where we use h k ,t < ν j, ∗ u − ν k, ∗ u ∈ H T and ν j, ∗ ∈ H T . Thus we have demonstrated that ν j, ∗ is F j -predictable, and that ν j, ∗ ∈ H T . Therefore ν j, ∗ ∈ H T .Lastly, we demonstrate that the consistency conditions are met. 
In other words, we must show that ν k, ∗ t = lim N →∞ N ( N ) k (cid:88) j ∈K ( N ) k ν j, ∗ t (A.34)for all t ∈ [0 , T ] and for all k ∈ K . Using the solution to q j,ν j, ∗ t − ¯ q k,ν k, ∗ t , we find thatlim N →∞ N ( N ) k (cid:88) j ∈K ( N ) k (cid:16) ν j, ∗ t − ν k, ∗ t (cid:17) = e (cid:82) t hk ,t ak lim N →∞ N ( N ) k (cid:88) j ∈K ( N ) k (cid:16) Q j − ¯ m k (cid:17) . (A.35)Now since the Q j have bounded variance, the limit on the right vanishes as N → ∞ by the law of large numbers.Hence, the consistency conditions are met.The last statement follows from Theorem 3.4. Appendix A.6. Proof of Proposition 5.1
Proof.
Let us first note that we may represent each element in $Z^{\mathbb{P}^k}_t$ as a Doob martingale, since

$$\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\bigg|_{\mathcal{F}_t} = \mathbb{E}^{\mathbb{P}^k}\left[ \frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\bigg|_{\mathcal{G}_T} \,\bigg|\, \mathcal{F}_t \right]. \tag{A.36}$$

Recall the global filtration $\mathcal{G} = (\mathcal{G}_t)_{t \in [0,T]}$ introduced in Section 2.2, with the property that $\mathcal{G}_t \supseteq \bigvee_{j \in N} \mathcal{F}^j_t$ for all $t \in [0,T]$. By this definition, we have that

$$Z^{\mathbb{P}^k}_t = \operatorname{diag}\left( \mathbb{E}^{\mathbb{P}^k}\left[ \frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\bigg|_{\mathcal{G}_T} \,\bigg|\, \mathcal{F}_t \right] \right)_{k' \in K}. \tag{A.37}$$

Each term $\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\big|_{\mathcal{G}_T}$ is in fact quite easy to compute. Recall that the only difference between the measures $\mathbb{P}^k$ and $\mathbb{P}^{k'}$ is the law of the initial value of the latent process, $\Theta_0$. For each $k \in K$, we have that $\mathbb{P}^k(\Theta_0 = \theta_j) = \pi^{k,j}_0$. Thus, we may write the expression for each Radon-Nikodym derivative conditional on $\mathcal{G}_T$ as

$$\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\bigg|_{\mathcal{G}_T} = \sum_{i \in J} \frac{\pi^{k',i}_0}{\pi^{k,i}_0}\, \mathbb{1}_{\{\Theta_0 = \theta_i\}}. \tag{A.38}$$

As each ratio $\pi^{k',i}_0 / \pi^{k,i}_0$ is constant, taking the conditional expected value with respect to $\mathbb{P}^k$ yields

$$\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}\bigg|_{\mathcal{F}_t} = \sum_{i \in J} \frac{\pi^{k',i}_0}{\pi^{k,i}_0}\, \mathbb{P}^k\left( \Theta_0 = \theta_i \,\big|\, \mathcal{F}_t \right). \tag{A.39}$$

Assembling the $\frac{d\mathbb{P}^{k'}}{d\mathbb{P}^k}$ terms above into a diagonal matrix, we find that the expression for $Z^{\mathbb{P}^k}_t$ follows the form in the statement of the proposition.

Appendix A.7. Proof of Proposition 5.2
Proof.
We will need to show here that the expression for g ,t presented in Theorem 3.5 satisfies g ∈ H T . In otherwords, we need to show that E P k (cid:104)(cid:82) T (cid:107) g ,t (cid:107) dt (cid:105) < ∞ for all k ∈ K .The first step will be to show that the operator norm of E P k t is almost surely bounded above when using the latentMarkov chain model. For the remainder of this proof, we suppress the superscript P k for ease of notation. Simplyapplying Itˆo’s lemma, we find that E t = ˜ E t Z P k t , where ˜ E t is the solution to the SDE d ˜ E t = ˜ E t Z P k t G t ( Z P k t ) − dt (A.40)with the initial condition ˜ E = I K × K . Writing out the implicit solution of the differential equation and taking theoperator norm we find that (cid:13)(cid:13) ˜ E t (cid:13)(cid:13) = (cid:13)(cid:13) I K × K + (cid:90) t ˜ E u Z P k u G u ( Z P k u ) − du (cid:13)(cid:13) (A.41) ≤ (cid:90) t (cid:13)(cid:13) ˜ E u Z P k u G u ( Z P k u ) − (cid:13)(cid:13) du (A.42) ≤ (cid:90) t (cid:13)(cid:13) ˜ E u (cid:13)(cid:13) (cid:13)(cid:13) Z P k u (cid:13)(cid:13) (cid:13)(cid:13) G u (cid:13)(cid:13) (cid:13)(cid:13) ( Z P k u ) − (cid:13)(cid:13) du , (A.43)where we use the triangle inequality, Jensen’s inequality and the property of the operator norm. As shown inProposition 5.1, we know that Z P k t is almost surely bounded over the interval [0 , T ]. From Theorem 3.5, we alsoknow that G t is bounded over this same interval. Now, looking back to the definition of Z t , we find that Z − t = diag (cid:32) d P k (cid:48) d P k (cid:12)(cid:12)(cid:12) F t (cid:33) , (A.44)which can also be expressed in the same way as presented in Proposition 5.1, which in turn implied that Z t is almostsurely bounded over [0 , T ]. Therefore, it follows that there exists a constant C > (cid:13)(cid:13) ˜ E t (cid:13)(cid:13) ≤ C (cid:90) t (cid:13)(cid:13) ˜ E u (cid:13)(cid:13) du . 
(A.45)Applying Gr¨onwall’s lemma to the above yields that sup t ∈ [0 ,T ] (cid:13)(cid:13) ˜ E t (cid:13)(cid:13) ≤ e C T < ∞ . Repeating the same analysis on (cid:13)(cid:13) ˜ E − t (cid:13)(cid:13) yields the very same bound. Finally, since the operator norms of Z P k t , ( Z P k t ) − , ˜ E t and ˜ E − t are all boundedover [0 , T ], we get that there exists a constant C > t,u ∈ [0 ,T ] (cid:13)(cid:13) ( E t ) − E u (cid:13)(cid:13) < e TC Next, we wish to show that (cid:98) A ∈ H T . Under our model, we may compute (cid:98) A k as (cid:98) A kt = E P k (cid:34) J (cid:88) i =1 α it { Θ t = θ i } (cid:12)(cid:12) F t (cid:35) (A.46)= (cid:88) i ∈ J α it P k (Θ t = θ i (cid:12)(cid:12) F t ) . (A.47) herefore, since all of the P k terms in the above are bounded above by 1, we may use Young’s inequality to write (cid:13)(cid:13)(cid:13) (cid:98) A (cid:13)(cid:13)(cid:13) ≤ K (cid:88) i ∈ J (cid:13)(cid:13)(cid:13) α it (cid:13)(cid:13)(cid:13) . (A.48)As each α it ∈ H T , we get that (cid:98) A ∈ H T .Now we can proceed to showing the main result. Using the bounds we derived above and Jensen’s inequality, wemay write E P k (cid:20)(cid:90) T (cid:107) g ,t (cid:107) dt (cid:21) ≤ E P k (cid:34)(cid:90) T (cid:13)(cid:13)(cid:13)(cid:13) E P k (cid:20)(cid:90) Tt ( E t ) − E u (cid:98) A u du (cid:12)(cid:12)(cid:12) F t (cid:21)(cid:13)(cid:13)(cid:13)(cid:13) dt (cid:35) (A.49) ≤ E (cid:20)(cid:90) T (cid:90) Tt (cid:13)(cid:13)(cid:13) ( E t ) − E u (cid:98) A u (cid:13)(cid:13)(cid:13) du dt (cid:21) (A.50) ≤ E (cid:20)(cid:90) T (cid:90) Tt (cid:13)(cid:13) ( E t ) − E u (cid:13)(cid:13) (cid:13)(cid:13)(cid:13) (cid:98) A u (cid:13)(cid:13)(cid:13) du dt (cid:21) (A.51) ≤ ( T + 1) e C T E P k (cid:20)(cid:90) T (cid:13)(cid:13)(cid:13) (cid:98) A u (cid:13)(cid:13)(cid:13) du (cid:21) < ∞ , (A.52)where in the last line, we use the fact that (cid:98) A ∈ H T . Thus, we find that g ∈ H T , which verifies the claim of theproposition. Appendix A.8. 
Proof of Theorem 4.1
We begin the proof of Theorem 4.1 by introducing a lemma regarding the distance between the mean-field gameobjective H j and the finite player game objective H j . Lemma Appendix A.1.
Let ν ∈ A j be some arbitrary admissible control and ν − j, ∗ ∈ A − j be the collection ν − j, ∗ := (cid:0) ν , ∗ , . . . , ν j − , ∗ , ν j +1 , ∗ , . . . , ν N, ∗ (cid:1) of optimal controls defined by equation (44) in Theorem 3.6 for all agentsexcept for j . Let us also assume that ν ∗ = (cid:0) ν k, ∗ (cid:1) k ∈ K follows the dynamics of equation (45) in Theorem 3.6. Then (cid:12)(cid:12)(cid:12) H j ( ν, ν − j, ∗ ) − H ν ∗ j ( ν ) (cid:12)(cid:12)(cid:12) = o ( δ N ) + o ( 1 N ) . (A.53) Proof.
Using the definitions of $\overline{H}^{\bar\nu^{*}}_j$ and $H_j$ and simplifying, we find that
\begin{align*}
\left| H_j(\nu, \nu^{-j,*}) - \overline{H}^{\bar\nu^{*}}_j(\nu) \right|
&= \left| \mathbb{E}^{\mathbb{P}^k}\!\left[ \sum_{k' \in K} \int_0^T \lambda_{k,k'} \left( p^{(N)}_{k'} \bar\nu^{k',(N)}_t - p_{k'} \bar\nu^{k',*}_t \right) dt \right] \right| \tag{A.54} \\
&\le \sum_{k' \in K} \lambda_{k,k'} \left| \mathbb{E}^{\mathbb{P}^k}\!\left[ \int_0^T \left( p^{(N)}_{k'} \bar\nu^{k',(N)}_t - p_{k'} \bar\nu^{k',*}_t \right) dt \right] \right| . \tag{A.55}
\end{align*}
Therefore it is sufficient to show that each of the expected values in the sum of (A.55) is $o(N^{-1}) + o(\delta_N)$.

Next, notice that using the definitions of $\bar\nu^{k',(N)}_t$ and $p^{(N)}_{k'}$, we can decompose the difference of the mean-field rates into the contribution of the agent's own rate and that of all others,
\[
p^{(N)}_{k'} \bar\nu^{k',(N)}_t - p_{k'} \bar\nu^{k',*}_t = \frac{1}{N} \left( \nu_t - \nu^{j,*}_t \right) + \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \left( p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \bar\nu^{k',*}_t \right), \tag{A.56}
\]
where $\nu^{j,*}_t$ is the optimal control that agent $j$ would have taken in the limiting game.

Using the triangle inequality and Jensen's inequality along with the last result, we get that
\[
\text{(A.55)} \le \sum_{k' \in K} \lambda_{k,k'} \left\{ \frac{1}{N} \mathbb{E}^{\mathbb{P}^k}\!\left[ \int_0^T \left| \nu_t - \nu^{j,*}_t \right| dt \right] + \left| \mathbb{E}^{\mathbb{P}^k}\!\left[ \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \int_0^T \left( p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \bar\nu^{k',*}_t \right) dt \right] \right| \right\}. \tag{A.57}
\]
It is clear that $\nu_t - \nu^{j,*}_t \in \mathcal{A}^j$, so we can guarantee that $\mathbb{E}^{\mathbb{P}^k}\!\left[ \int_0^T | \nu_t - \nu^{j,*}_t | \, dt \right]$ is bounded and independent of $N$. Therefore,
\[
\frac{1}{N} \mathbb{E}^{\mathbb{P}^k}\!\left[ \int_0^T \left| \nu_t - \nu^{j,*}_t \right| dt \right] = o\!\left( \tfrac{1}{N} \right). \tag{A.58}
\]
Therefore all that is left to show is that the right part of the summand of (A.57) vanishes at an appropriate speed.

By plugging in the decomposition
\[
p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \bar\nu^{k',*}_t = \left( p^{(N)}_{k'} - p_{k'} \right) \nu^{j,*}_t + p_{k'} \left( \nu^{j,*}_t - \bar\nu^{k',*}_t \right) \tag{A.59}
\]
and using the triangle inequality and Jensen's inequality, we find that
\begin{align*}
\left| \mathbb{E}^{\mathbb{P}^k}\!\left[ \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \int_0^T \left( p^{(N)}_{k'} \nu^{j,*}_t - p_{k'} \bar\nu^{k',*}_t \right) dt \right] \right|
&\le \left| p^{(N)}_{k'} - p_{k'} \right| \mathbb{E}^{\mathbb{P}^k}\!\left[ \int_0^T \left| \nu^{j,*}_t \right| dt \right] \tag{A.60} \\
&\quad + p_{k'} \left| \mathbb{E}^{\mathbb{P}^k}\!\left[ \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \int_0^T \left( \nu^{j,*}_t - \bar\nu^{k',*}_t \right) dt \right] \right| . \tag{A.61}
\end{align*}
As $\nu^{j,*} \in \mathcal{A}^j$, we find that $\mathbb{E}^{\mathbb{P}^k}\!\left[ \int_0^T | \nu^{j,*}_t | \, dt \right] < \infty$. Therefore, by the assumption of the theorem, we get
\[
\left| p^{(N)}_{k'} - p_{k'} \right| \mathbb{E}^{\mathbb{P}^k}\!\left[ \int_0^T \left| \nu^{j,*}_t \right| dt \right] = o(\delta_N). \tag{A.62}
\]
Next, using the structure of the solution for $\nu^{j,*}_t$ from Theorem 3.6, equation (A.32) and the fact that $h_{k,t}$ is bounded, we get
\begin{align*}
\text{(A.61)} &= p_{k'} \left| \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \mathbb{E}^{\mathbb{P}^k}\!\left[ \left( Q^{j}_0 - \bar m_k \right) \int_0^T e^{\int_0^t \frac{h_{k,u}}{a_k} \, du} \, dt \right] \right| \tag{A.63} \\
&\le C \left| \frac{1}{N^{(N)}_{k}} \sum_{i \in \mathcal{K}^{(N)}_{k'}} \left( \mathbb{E}^{\mathbb{P}^k}\!\left[ Q^{j}_0 \right] - \bar m_k \right) \right| = 0. \tag{A.64}
\end{align*}
Hence, the right part of equation (A.57) is equal to $o(\delta_N) + o(N^{-1})$ and the claims of the lemma hold true.

Appendix A.8.1. Main Proof of Theorem 4.1
Proof.
We prove the result of the theorem by using Lemma Appendix A.1. First, let us note that by the definition of the supremum,
\[
H_j(\omega, \nu^{-j,*}) \le \sup_{\nu \in \mathcal{A}^j} H_j(\nu, \nu^{-j,*}) \tag{A.65}
\]
holds for all $\omega \in \mathcal{A}^j$, and therefore the left-most inequality in the statement of Theorem 4.1 holds.

Next, we must show that the right-most inequality in the statement of Theorem 4.1 also holds. First let us note that by Lemma Appendix A.1, for any $\nu \in \mathcal{A}^j$,
\begin{align*}
H_j(\nu, \nu^{-j,*}) &\le \overline{H}^{\bar\nu^{*}}_j(\nu) + o(\delta_N) + o(N^{-1}) \tag{A.66} \\
&\le \overline{H}^{\bar\nu^{*}}_j(\nu^{j,*}) + o(\delta_N) + o(N^{-1}), \tag{A.67}
\end{align*}
where we use the fact that $\overline{H}^{\bar\nu^{*}}_j(\nu^{j,*}) = \sup_{\nu \in \mathcal{A}^j} \overline{H}^{\bar\nu^{*}}_j(\nu)$. Applying Lemma Appendix A.1 again, this time to the control $\nu^{j,*}$, we find that
\[
H_j(\nu, \nu^{-j,*}) \le H_j(\nu^{j,*}, \nu^{-j,*}) + 2\,o(\delta_N) + 2\,o(N^{-1}). \tag{A.68}
\]
As the above inequality holds for all $\nu \in \mathcal{A}^j$, we may take the supremum on the left and absorb the constant factors into the little-$o$ terms to yield the final result,
\[
\sup_{\nu \in \mathcal{A}^j} H_j(\nu, \nu^{-j,*}) \le H_j(\nu^{j,*}, \nu^{-j,*}) + o(\delta_N) + o(N^{-1}). \tag{A.69}
\]

Appendix B. Filtering and Smoothing Equations
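As a numerical companion to this appendix, the following Python sketch simulates the mean-reverting model $dF_t = \kappa(\Theta_t - F_t)\,dt + \sigma\,dW_t$ with a finite-state Markov chain $\Theta_t$, and runs a discretised version of the unnormalised filter of Lemma Appendix B.1 to obtain the posterior probabilities and the projected drift $\widehat{A}$. It is only an illustration under stated assumptions and not the authors' implementation. The function name `simulate_and_filter` and the parameters `F0`, `T`, `n_steps` and `seed` are hypothetical. The observation term is taken with the factor $\sigma^{-2}$. The generator is applied as $\sum_j C_{i,j}\Lambda^j$, which is the forward operator when `C[i, j]` is the rate of jumping from state $j$ to state $i$; whether this matches the paper's generator convention is an assumption. The observation term is applied in exponential form, and the filter is rescaled at every step for numerical stability, which leaves the normalised probabilities unchanged because the equation is linear.

```python
import numpy as np


def simulate_and_filter(kappa, sigma, thetas, C, pi0, F0=0.0,
                        T=1.0, n_steps=1000, seed=0):
    """Simulate the latent-state price model and filter the hidden state.

    Returns the price path F, the hidden-state index path, the posterior
    probabilities pi[n, i] (state i at time step n given the prices so far)
    and the projected drift A_hat[n] = kappa * (E[Theta | prices] - F[n]).
    """
    rng = np.random.default_rng(seed)
    thetas = np.asarray(thetas, dtype=float)
    C = np.asarray(C, dtype=float)
    pi0 = np.asarray(pi0, dtype=float)
    J = thetas.size
    dt = T / n_steps

    F = np.empty(n_steps + 1)
    states = np.empty(n_steps + 1, dtype=int)
    pi = np.empty((n_steps + 1, J))

    F[0] = F0
    states[0] = rng.choice(J, p=pi0)  # Theta_0 drawn from the prior pi0
    lam = pi0.copy()                  # initial condition Lambda_0 = pi0
    pi[0] = lam / lam.sum()

    for n in range(n_steps):
        s = states[n]
        # Euler step for the price: dF = kappa * (Theta - F) dt + sigma dW.
        dW = np.sqrt(dt) * rng.standard_normal()
        dF = kappa * (thetas[s] - F[n]) * dt + sigma * dW
        F[n + 1] = F[n] + dF

        # One-step transition of the chain. Assumed convention: C[i, j] is
        # the rate of jumping from state j to state i (columns sum to zero).
        p_next = C[:, s] * dt
        p_next[s] += 1.0
        states[n + 1] = rng.choice(J, p=p_next)

        # Observation term Lambda * sigma^-2 * kappa * (theta_i - F) dF,
        # applied in exponential form so that the weights stay positive.
        a = kappa * (thetas - F[n])
        lam = lam * np.exp(a * dF / sigma**2 - 0.5 * a**2 * dt / sigma**2)
        # Generator term: sum_j C[i, j] * Lambda^j dt.
        lam = lam + (C @ lam) * dt
        # Rescale; the ratios Lambda^i / sum_j Lambda^j are unaffected.
        lam = lam / lam.sum()
        pi[n + 1] = lam

    A_hat = kappa * (pi @ thetas - F)
    return F, states, pi, A_hat


# Illustrative run with a symmetric two-state generator (made-up numbers).
F, states, pi, A_hat = simulate_and_filter(
    kappa=5.0, sigma=0.2, thetas=[-1.0, 1.0],
    C=[[-0.5, 0.5], [0.5, -0.5]], pi0=[0.5, 0.5])
print(pi[-1], A_hat[-1])
```

Running the same price path through the filter with a different prior `pi0` gives the posterior of an agent from a different sub-population, which is the only place where the measures $\mathbb{P}^k$ differ in this model.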
In Sections 5, 6 and 7 we refer to the Radon-Nikodym process $Z^{\mathbb{Q}}$ and to the $\mathcal{F}$-projected drift process $\widehat{A}$, which are required for the approximation of the optimal control. This appendix provides the details on how these quantities are computed for the mean-reverting model used in the numerical experiments presented in Section 7.

Let us recall the model provided in Section 7. We assume that the un-impacted asset price process has the dynamics
\[
dF_t = \kappa \left( \Theta_t - F_t \right) dt + \sigma \, dW_t ,
\]
where $\Theta_t$ is a continuous-time Markov chain with generator matrix $C$ which takes values in the set $\{\theta_i\}_{i=1}^J$. What varies across each measure $\mathbb{P}^k$ is the distribution over the initial state, $\Theta_0$, where we assume that $\mathbb{P}^k(\Theta_0 = \theta_i) = \pi^{k,i}_0$ for each $i \in \{1, 2, \dots, J\}$ and $k \in K$.

Our first step will be to compute the $\mathcal{F}_t$-adapted process $\widehat{A}_t = \left( \mathbb{E}^{\mathbb{P}^k}\left[ A_t \mid \mathcal{F}_t \right] \right)_{k \in K}$. Using the dynamics of $F_t$, we get that
\begin{align*}
\mathbb{E}^{\mathbb{P}^k}\left[ A_t \mid \mathcal{F}_t \right]
&= \mathbb{E}^{\mathbb{P}^k}\left[ \kappa \left( \Theta_t - F_t \right) \mid \mathcal{F}_t \right] \\
&= \kappa \left( \mathbb{E}^{\mathbb{P}^k}\left[ \Theta_t \mid \mathcal{F}_t \right] - F_t \right) \\
&= \kappa \left( \sum_{i=1}^J \theta_i \, \mathbb{P}^k\left( \Theta_t = \theta_i \mid \mathcal{F}_t \right) - F_t \right).
\end{align*}
Therefore, to compute $\widehat{A}$ we need to compute the posterior probabilities of each state of $\Theta_t$, $\mathbb{P}^k\left( \Theta_t = \theta_i \mid \mathcal{F}_t \right)$. The lemma that follows gives an explicit way of computing these probabilities.

Lemma Appendix B.1 (Filtering Equation). Let us assume that the Novikov condition
\[
\mathbb{E}^{\mathbb{P}^k}\left[ \exp\left\{ \int_0^T (A_u)^2 \, du \right\} \right] < \infty \tag{B.1}
\]
holds for all $k \in K$. For each $i = 1, \dots, J$ and $k \in K$, let $\pi^{k,i}_t = \mathbb{P}^k\left( \Theta_t = \theta_i \mid \mathcal{F}_t \right)$, and define the processes $\Lambda^{k,i} = \left( \Lambda^{k,i}_t \right)_{t \in [0,T]}$ satisfying the dynamics
\[
d\Lambda^{k,i}_t = \Lambda^{k,i}_t \, \sigma^{-2} \kappa \left( \theta_i - F_t \right) dF_t + \sum_{j=1}^J C_{i,j} \Lambda^{k,j}_t \, dt ,
\]
along with the initial condition $\Lambda^{k,i}_0 = \pi^{k,i}_0$. Then the filters $\pi^{k,i}_t$ satisfy the relation
\[
\pi^{k,i}_t = \Lambda^{k,i}_t \Bigg/ \left( \sum_{j=1}^J \Lambda^{k,j}_t \right).
\]

Proof.
For the proof of this lemma, we refer the reader to the proof of a more general version of this statement found in (Casgrain and Jaimungal, 2016, Theorem 3.1).

The next task is to compute the process $Z^{\mathbb{Q}}$ for any choice of $\mathbb{Q} = \mathbb{P}^k$, $k \in K$. We can do this by applying Proposition 5.1 to the model dynamics that we have. Proposition 5.1 allows us to compute $Z^{\mathbb{P}^k}$, provided that we can compute the value of the time-0 smoothers for $\Theta$, $\mathbb{P}^k\left( \Theta_0 = \theta_i \mid \mathcal{F}_t \right)$. The following lemma provides an expression for the computation of these smoothers.

Lemma Appendix B.2 (Smoothing Equation). Assume that the Novikov condition (B.1) holds. For each $k \in K$ and $i, j \in \{1, 2, \dots, J\}$, let us define the process $\tilde\Lambda^{k,i,j} = \left( \tilde\Lambda^{k,i,j}_t \right)_{t \in [0,T]}$, where each $\tilde\Lambda^{k,i,j}$ satisfies the SDE
\[
d\tilde\Lambda^{k,i,j}_t = \tilde\Lambda^{k,i,j}_t \, \sigma^{-2} \kappa \left( \theta_j - F_t \right) dF_t + \sum_{\ell=1}^J C_{j,\ell} \tilde\Lambda^{k,i,\ell}_t \, dt ,
\]
and the initial condition $\tilde\Lambda^{k,i,j}_0 = \mathbb{1}_{\{i = j\}}$. Then the time-0 smoother for $\Theta$ satisfies the equation
\[
\mathbb{P}^k\left( \Theta_0 = \theta_i \mid \mathcal{F}_t \right) = \left( \sum_{j=1}^J \pi^{k,i}_0 \tilde\Lambda^{k,i,j}_t \right) \Bigg/ \sum_{i',\ell=1}^J \pi^{k,i'}_0 \tilde\Lambda^{k,i',\ell}_t .
\]

Proof.
For each $k \in K$, let us define the measure $\tilde{\mathbb{Q}}^k$ which is specified through the Radon-Nikodym derivative
\[
\zeta^k_t = \left. \frac{d\mathbb{P}^k}{d\tilde{\mathbb{Q}}^k} \right|_{\mathcal{F}_t} = \exp\left\{ \int_0^t A_u \sigma^{-2} \, dF_u - \tfrac{1}{2} \int_0^t (A_u)^2 \sigma^{-2} \, du \right\} .
\]
The Radon-Nikodym derivative above is defined specifically so that under the measure $\tilde{\mathbb{Q}}^k$, $(F_t - F_0)\,\sigma^{-1}$ is a Brownian motion independent of $\Theta_t$, and so that the dynamics of $\Theta_t$ are left unchanged.

Using this new measure, we can re-represent the time-0 smoother we are looking for as
\[
\mathbb{P}^k\left( \Theta_0 = \theta_i \mid \mathcal{F}_t \right)
= \frac{\mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \mathbb{1}_{\{\Theta_0 = \theta_i\}} \zeta^k_t \mid \mathcal{F}_t \right]}{\mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \zeta^k_t \mid \mathcal{F}_t \right]}
= \frac{\mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \mathbb{1}_{\{\Theta_0 = \theta_i\}} \zeta^k_t \mid \mathcal{F}_t \right]}{\sum_{j=1}^J \mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \mathbb{1}_{\{\Theta_0 = \theta_j\}} \zeta^k_t \mid \mathcal{F}_t \right]} .
\]
Now, if we take a look at the term in the numerator, we can further expand it as
\begin{align*}
\mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \mathbb{1}_{\{\Theta_0 = \theta_i\}} \zeta^k_t \mid \mathcal{F}_t \right]
&= \sum_{j=1}^J \mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \mathbb{1}_{\{\Theta_0 = \theta_i\}} \mathbb{1}_{\{\Theta_t = \theta_j\}} \zeta^k_t \mid \mathcal{F}_t \right] \\
&= \pi^{k,i}_0 \sum_{j=1}^J \mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \mathbb{1}_{\{\Theta_t = \theta_j\}} \zeta^k_t \mid \mathcal{F}_t \vee \sigma(\Theta_0 = \theta_i) \right] ,
\end{align*}
where we use Bayes' rule to get to the last line.

Following the proof of (Casgrain and Jaimungal, 2016, Theorem 3.1), we find that
\[
\tilde\Lambda^{k,i,j}_t = \mathbb{E}^{\tilde{\mathbb{Q}}^k}\left[ \mathbb{1}_{\{\Theta_t = \theta_j\}} \zeta^k_t \mid \mathcal{F}_t \vee \sigma(\Theta_0 = \theta_i) \right]
\]
satisfies the SDE found in the statement of the lemma, with the initial condition $\tilde\Lambda^{k,i,j}_0 = \mathbb{1}_{\{i = j\}}$. Plugging this back into the previous expressions, we obtain the final result.

References

Bank, P., H. M. Soner, and M. Voß (2017). Hedging with temporary price impact. Mathematics and Financial Economics 11(2), 215–239.

Bayraktar, E. and A. Munk (2017). Mini-flash crashes, model risk, and optimal execution.

Bender, C. and J. Steiner (2012). Least-squares Monte Carlo for backward SDEs. In Numerical methods in finance, pp. 257–289. Springer.

Bensoussan, A., T. Huang, and M. Laurière (2018). Mean field control and mean field game models with several populations. arXiv preprint arXiv:1810.00783.

Bouchard, B., M. Fukasawa, M. Herdegen, and J. Muhle-Karbe (2018). Equilibrium returns with transaction costs. Finance and Stochastics 22(3), 569–601.

Cardaliaguet, P. and C.-A. Lehalle (2016). Mean field game of controls and an application to trade crowding. arXiv preprint arXiv:1610.09904.

Carmona, R. and F. Delarue (2013). Probabilistic analysis of mean-field games. SIAM Journal on Control and Optimization 51(4), 2705–2734.

Carmona, R., J.-P. Fouque, and L.-H. Sun (2013). Mean field games and systemic risk.

Cartea, Á., R. Donnelly, and S. Jaimungal (2017). Algorithmic trading with model uncertainty. SIAM Journal on Financial Mathematics 8(1), 635–671.

Casgrain, P. and S. Jaimungal (2016, Nov). Trading algorithms with learning in latent alpha models. Mathematical Finance, Forthcoming.

Casgrain, P. and S. Jaimungal (2018). Mean field games with partial information for algorithmic trading. arXiv preprint arXiv:1803.04094.

Choi, J. H., K. Larsen, and D. J. Seppi (2018). Smart TWAP trading in continuous-time equilibria. Available at SSRN 3146658.

Cirant, M. (2015). Multi-population mean field games systems with Neumann boundary conditions. Journal de Mathématiques Pures et Appliquées 103(5), 1294–1315.

Ekeland, I. and R. Temam (1999). Convex analysis and variational problems. SIAM.

Firoozi, D. and P. E. Caines (2015). ε-Nash equilibria for partially observed LQG mean field games with major agent: Partial observations by all agents. In Decision and Control (CDC), 2015 IEEE 54th Annual Conference on, pp. 4430–4437. IEEE.

Firoozi, D. and P. E. Caines (2016). Mean field game ε-Nash equilibria for partially observed optimal execution problems in finance. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pp. 268–275. IEEE.

Freiling, G. (2002). A survey of nonsymmetric Riccati equations. Linear Algebra and its Applications 351, 243–270.

Freiling, G., G. Jank, and A. Sarychev (2000). Non-blow-up conditions for Riccati-type matrix differential and difference equations. Resultate der Mathematik 37(1-2), 84–103.

Gobet, E., J.-P. Lemor, X. Warin, et al. (2005). A regression-based Monte Carlo method to solve backward stochastic differential equations. The Annals of Applied Probability 15(3), 2172–2202.

Guéant, O., J.-M. Lasry, and P.-L. Lions (2011). Mean field games and applications. Paris-Princeton Lectures on Mathematical Finance 2010, 205–266.

Huang, M. (2010). Large-population LQG games involving a major player: the Nash certainty equivalence principle. SIAM Journal on Control and Optimization 48(5), 3318–3353.

Huang, M., P. E. Caines, and R. P. Malhamé (2007, Sep.). Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Autom. Control 52(9), 1560–1571.

Huang, M., R. P. Malhamé, P. E. Caines, et al. (2006). Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Communications in Information & Systems 6(3), 221–252.

Huang, X. and S. Jaimungal (2017). Robust stochastic games and systemic risk. Available at https://ssrn.com/abstract=3024021.

Huang, X., S. Jaimungal, and M. Nourian (2019). Mean-field game strategies for optimal execution. Applied Mathematical Finance 26(2), 153–185.

Lasry, J.-M. and P.-L. Lions (2007). Mean field games. Japanese Journal of Mathematics 2(1), 229–260.

Letourneau, P. and L. Stentoft (2016). Improved Greeks for American options using simulation.

Nourian, M. and P. E. Caines (2013). ε-Nash mean field game theory for nonlinear stochastic dynamical systems with major and minor agents. SIAM Journal on Control and Optimization 51(4), 3302–3331.

Protter, P. E. (2005). Stochastic differential equations. In Stochastic integration and differential equations. Springer.

Wang, Y. and R. Caflisch (2009). Pricing and hedging American-style options: a simple simulation-based approach.