Market making and incentives design in the presence of a dark pool: a deep reinforcement learning approach
Bastien Baldacci, Iuliia Manziuk, Thibaut Mastrolia, Mathieu Rosenbaum
December 4, 2019
Abstract
We consider the issue of a market maker acting at the same time in the lit and dark pools of an exchange. The exchange wishes to establish a suitable make-take fees policy to attract transactions on its venues. We first solve the stochastic control problem of the market maker without the intervention of the exchange. Then we derive the equations defining the optimal contract to be set between the market maker and the exchange. This contract depends on the trading flows generated by the market maker's activity on the two venues. In both cases, we show existence and uniqueness, in the viscosity sense, of the solutions of the Hamilton-Jacobi-Bellman equations associated with the market maker's and the exchange's problems. We finally design deep reinforcement learning algorithms enabling us to approximate efficiently the optimal controls of the market maker and the optimal incentives to be provided by the exchange.
Keywords:
Market making, dark pools, regulation, make-take fees, stochastic control, principal-agent problem, deep reinforcement learning, actor-critic method
This work benefits from the financial support of the Chaires Analytics and Models for Regulation, Financial Risk, and Finance and Sustainable Development. The authors would like to thank Charles-Albert Lehalle for fruitful discussions and remarks. Bastien Baldacci and Mathieu Rosenbaum gratefully acknowledge the financial support of the ERC Grant 679836 Staqamof. Thibaut Mastrolia gratefully acknowledges the support of the ANR project PACMAN ANR-16-CE05-0027.

Bastien Baldacci: École Polytechnique, CMAP, 91128, Palaiseau, France, [email protected].
Iuliia Manziuk: Université Paris 1 Panthéon-Sorbonne, Centre d'Economie de la Sorbonne, 106, boulevard de l'Hôpital, 75013 Paris, France, [email protected].
Thibaut Mastrolia: École Polytechnique, CMAP, 91128, Palaiseau, France, [email protected].
Mathieu Rosenbaum: École Polytechnique, CMAP, 91128, Palaiseau, France, [email protected].

Since the seminal work [1], a vast literature on optimal market making problems has emerged. A market maker is a liquidity provider whose role is to post orders on the bid and ask sides of the limit order book of an underlying asset. Various extensions of [1] have been considered, see for example [7, 12] and the books [6, 11] for further references. In most of these works, it is assumed that there is no make-take fees system on the market. The problem of relevant make-take fees is studied quantitatively in [3, 10]. In these papers, the policies are designed in the context of traditional liquidity venues, or so-called "lit pools". On these venues, the order book is visible to market participants, and transactions are fully transparent. Market takers can in particular monitor the quotes offered by market makers.

However, recent regulatory changes have induced a rise of different types of alternative trading mechanisms, notably "dark pools", which have gained a significant market share. Nowadays, many major exchanges, such as Bats-ChiX and Turquoise, have their dark pools in addition to their major trading platforms. Furthermore, several traditional exchanges such as NYSE and Euronext offer trading platforms whose functioning is inspired mainly by dark pools. Trading rules for dark pools are very diversified, but they share at least two important properties. The first one is the absence of a visible order book for market participants, which implies that investors have no information on the amount of liquidity posted by market makers. Second, aiming at improving prices for clients compared to the lit venue, dark pools usually set prices that are different from those in the lit pool. For example, many dark pools take the mid-price of the lit pool as their transaction price. Because of these two effects, it is presumed that trades in dark pools have no or less price impact. This feature enables market makers to mitigate their inventory risk. Finally, a remarkable phenomenon is that dark pools are prone to a latency effect: the price being monitored in the lit pool can change between the time of a request in the dark pool and that of the corresponding transaction.
Such price discrepancy due to latency is particularly frequent in the presence of high imbalance because the price is likely to move when liquidity is scarce on one side of the book.

As the market impact of trades on a dark pool is less important or delayed, market makers can also use it to liquidate large positions (note, however, that the transaction reporting imposed by regulation in most markets may still induce some delayed price impact). Therefore there is a trade-off between transacting in the dark pool at a less favorable price with low impact or in the lit pool at a better price with higher impact. Dark pools are also very attractive for market takers because of the reduced market impact and the possibility of being executed at a better price than in the lit pool.

To our knowledge, most studies treat the issue of trading in dark pools mainly from the point of view of optimal liquidation: a trader wishes to buy or sell a large number of shares of one or several stocks and needs to find an optimal order placement strategy between the lit and dark pools, see for example [15]. In this paper, we rather focus on the behavior of a market maker acting on both lit and dark venues. In the lit market, we assume that there is an efficient price $S_t$ and that the market maker always posts volumes on the bid and ask sides at prices $S_t + \mathcal T$ and $S_t - \mathcal T$, where $\mathcal T$ represents the half-tick of the market (we have in mind here a large tick asset for which the spread equals the tick size). The market maker also provides liquidity in the dark pool, where the transaction price is the efficient price $S_t$ (possibly with the latency effect). This can partially be seen as the dual problem of [10], without dark pools, where the posted volume is fixed at one unit and the market maker optimizes the quoted spread. In addition to market impact and latency phenomena, we also take into account transaction costs for market orders on both venues, which can be smaller in the dark pool. Thus, in our setting, a single market maker only needs to select the volumes to post on the bid and ask sides of both lit and dark pools.
An exchange managing the lit and dark venues wishes to attract transactions. Inspired by the work [10], we consider that the exchange offers a contract to the market maker whose remuneration at a terminal time is determined according to the executed transactions on both venues. This is a so-called principal-agent framework, first formalized in [8, 9, 17]. Here, the wealth of the exchange (the principal) depends on the market order flows, which are a function of the volumes posted by the market maker (the agent). However, the exchange cannot control those volumes and may only provide incentives to influence the market maker's behavior. These incentives take the form of a contract between the market maker and the exchange, whose payoff depends on observed trading flows.

To find an optimal contract and optimal volumes for the market maker in response to this contract, we need to solve a nonlinear Hamilton-Jacobi-Bellman (HJB for short) equation. The dimensionality (above four) and complexity of the resulting equations do not allow us to apply classical root-finding algorithms. Therefore we use a method based on neural networks to solve our HJB equations. Neural networks have been at the core of recent studies on high-dimensional PDE resolution. In [13], the authors introduce a deep learning-based methodology that can handle general high-dimensional parabolic PDEs. This approach relies on the reformulation of PDEs via Backward Stochastic Differential Equations, where neural networks approximate the gradients of the unknown solution. Since then, many extensions have been proposed, see for example [2, 14].

In our setting, the market maker has to fix volumes in response to the incentives of the exchange. These volumes are functions of the incentives (and of the market maker's inventory), which are the solution of a nonlinear equation.
The resolution of our principal-agent problem consists of two stages. The first stage is to represent the volumes posted by the market maker by a neural network. Taking into account the optimal response of the market maker to given incentives, the exchange needs to choose the contract maximizing its utility. So the second stage is to solve an HJB equation to obtain the optimal contract. However, the dimensionality and the high degree of nonlinearity of this equation make standard numerical methods hard to apply. We circumvent this difficulty by adopting a reinforcement learning method. More precisely, we use an actor-critic approach where not only the controls of the exchange, but also its value function are represented by neural networks. The essence of this method is the alternation of the learning phases of the controls and of the value function.

The paper is organized as follows. Market dynamics are introduced in Section 2. In Section 3, we first investigate the problem of a market maker acting on both lit and dark venues without any incentive policy from the exchange. His goal is to maximize his PnL process while managing his inventory risk. It is a stochastic control problem, where the corresponding HJB equation cannot be solved explicitly. We show existence and uniqueness of a viscosity solution for this equation.

In Section 4, we analyze the bi-level optimization problem associated with the issue of optimal contracting between the market maker and the exchange owning both lit and dark pools. Following recent works on make-take fees policies mentioned above, we first prove a representation theorem for the contract proposed to the market maker. We then establish existence and uniqueness of a viscosity solution for the HJB equation corresponding to the problem of the exchange.

A key difference with [3, 10] is the absence of a closed-form solution for the best response of the market maker to a given contract. Therefore, the HJB equation of the exchange cannot be solved explicitly.
In Section 5, we introduce a deep reinforcement learning method as a computational tool enabling us to address both the exchange's and the market maker's problems in practice. We conclude this section with numerical experiments illustrating various behaviors of the market maker under different market scenarios.
The framework considered throughout this paper is inspired by the article [1], in which the authors investigate the problem of optimal market making without intervention of an exchange. Let $T > 0$ be the final time and $\mathcal V^l, \mathcal V^d \subset \mathbb N$ the sets of possible values for volumes in the lit and dark pools, of cardinality $\#\mathcal V^l, \#\mathcal V^d$. We define $\Omega := \Omega_c \times \Omega_d^{2(\#\mathcal V^l + \#\mathcal V^d)}$, with $\Omega_c$ the set of continuous functions from $[0,T]$ into $\mathbb R$ and $\Omega_d$ the set of piecewise constant càdlàg functions from $[0,T]$ into $\mathbb N$. $\Omega$ is a subspace of the Skorokhod space $D([0,T], \mathbb R^{2(\#\mathcal V^l + \#\mathcal V^d)+1})$ of càdlàg functions from $[0,T]$ into $\mathbb R^{2(\#\mathcal V^l + \#\mathcal V^d)+1}$, and we write $\mathcal F$ for the trace Borel $\sigma$-algebra on $\Omega$, where the topology is the one associated with the usual Skorokhod distance on $D([0,T], \mathbb R^{2(\#\mathcal V^l + \#\mathcal V^d)+1})$.

We define $(X_t)_{t \in [0,T]} := (W_t, N^{i,j,k}_t)_{t \in [0,T],\, i \in \{a,b\},\, j \in \{l,d\},\, k \in \mathcal V^j}$ as the canonical process on $\Omega$, that is, for any $\omega := (w, n^{i,j,k}) \in \Omega$,
\[
W_t(\omega) := w(t), \qquad N^{i,j,k}_t(\omega) := n^{i,j,k}(t), \qquad i \in \{a,b\},\ j \in \{l,d\} \text{ and } k \in \mathcal V^j.
\]
For any $i \in \{a,b\}$, $j \in \{l,d\}$ and $k \in \mathcal V^j$, $N^{i,j,k}_t$ denotes the total number of trades of size $k$ made between time $0$ and time $t$, where $a$, $b$ stand for the ask and bid sides respectively and $l$, $d$ for the lit and dark pools respectively. Finally, the process $W$ represents the mid-price of the traded asset.

Then we define the probability $\mathbb P$ on $(\Omega, \mathcal F)$ under which $W_t$ and the $N^{i,j,k}_t$ are independent, $W_t$ is a one-dimensional Brownian motion and the $N^{i,j,k}_t$, $i \in \{a,b\}$, $j \in \{l,d\}$, $k \in \mathcal V^j$, are Poisson processes with intensity $\epsilon > 0$. (In other words, $\mathbb P$ is the product measure of the Wiener measure on $\Omega_c$ and the unique measure on $\Omega_d^{2(\#\mathcal V^l + \#\mathcal V^d)}$ such that the canonical process corresponds to a multidimensional homogeneous Poisson process with arbitrarily small intensity, representing a situation where no liquidity is available.) Finally, we endow the space $(\Omega, \mathcal F)$ with the ($\mathbb P$-completed) canonical filtration $\mathbb F := (\mathcal F_t)_{t \in [0,T]}$ generated by $(X_t)_{t \in [0,T]}$.

In this section, we formalize the connection between the volumes posted by the market maker and the arrival intensity of market orders on the ask and bid sides of both venues. We also take into account the market impact phenomenon and the latency effect in the dark pool.

Let $2\bar q \in \mathbb N$ represent a risk limit for the market maker, which corresponds to the maximum number of cumulated bid and ask orders the market maker can handle. We define the volume process
\[
(\mathcal L_t)_{t \in [0,T]} := (L^l_t, L^d_t)_{t \in [0,T]} \in (\mathcal V^l)^2 \times (\mathcal V^d)^2,
\]
where $L^l_t = (\ell^{a,l}_t, \ell^{b,l}_t)$ and $L^d_t = (\ell^{a,d}_t, \ell^{b,d}_t)$, with $\ell^{i,j}_t$ corresponding to the volume posted by the market maker at time $t$ on side $i \in \{a,b\}$ of pool $j \in \{l,d\}$. The set $\mathcal A$ of admissible controls of the market maker is therefore defined as
\[
\mathcal A := \Big\{ (\mathcal L_t)_{t \in [0,T]} \text{ predictable, s.t. for } i \in \{a,b\},\ \ell^{i,l} + \ell^{i,d} \in [0, 2\bar q] \Big\}.
\]
The market maker manages his inventory $Q_t$, defined as the aggregated sum of the volumes filled on both sides of the lit and dark pools, namely
\[
Q_t := \sum_{j \in \{l,d\}} \sum_{k \in \mathcal V^j} k \big( N^{b,j,k}_t - N^{a,j,k}_t \big).
\]

Remark 2.1. Note that we assume that there is no partial execution in our model. Therefore market orders consume the whole volume posted by the market maker on the considered side and pool.
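The bookkeeping behind the inventory definition and Remark 2.1 can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `PHI`, `inventory` and `record_fill`, and the use of a `Counter` keyed by (side, pool, size) to stand for the counting processes $N^{i,j,k}$, are assumptions made for illustration only.

```python
from collections import Counter

# Sign convention (assumed helper): +1 on the ask side, -1 on the bid side.
PHI = {"a": 1, "b": -1}


def inventory(counts):
    """Q_t = sum over pools j and sizes k of k * (N^{b,j,k}_t - N^{a,j,k}_t).

    `counts` maps (side, pool, size) to the number of trades of that size
    executed so far; it plays the role of the counting processes N^{i,j,k}.
    """
    return sum(-PHI[side] * size * n for (side, _pool, size), n in counts.items())


def record_fill(counts, side, pool, posted_volume):
    """Register one market order hitting the market maker's quote.

    No partial execution (Remark 2.1): the trade size is the whole posted
    volume, so the counter of index k = posted_volume is incremented.
    Returns the inventory after the fill.
    """
    counts[(side, pool, posted_volume)] += 1
    return inventory(counts)
```

For instance, starting from an empty `Counter()`, a bid fill of 2 units in the lit pool followed by an ask fill of 1 unit in the dark pool leaves the market maker long one unit.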
We define the function
\[
\psi^{i,j}(L^l_t) := \begin{cases} I^a(L^l_t) & \text{if } (i,j) \in \{(a,l), (b,d)\}, \\ I^b(L^l_t) & \text{if } (i,j) \in \{(b,l), (a,d)\}, \end{cases}
\]
where
\[
I^a(L^l_t) := \frac{\ell^{a,l}_t}{\ell^{a,l}_t + \ell^{b,l}_t}, \qquad I^b(L^l_t) := \frac{\ell^{b,l}_t}{\ell^{a,l}_t + \ell^{b,l}_t}
\]
represent the imbalances on the ask and bid sides of the lit pool respectively. To model the behavior of market takers, we define the intensities of the processes $N^{i,j,k}$ as
\[
\lambda^{\mathcal L,i,j,k}_t := \lambda^{i,j}(L^l_t)\, \mathbf 1_{\{\phi(i) Q_{t^-} > -\bar q,\ \ell^{i,j}_t = k\}}, \qquad \phi(i) := \begin{cases} 1 & \text{if } i = a, \\ -1 & \text{if } i = b, \end{cases}
\]
where
\[
\lambda^{i,j}(L^l_t) := A^j \exp\Big( -\frac{\theta^j}{\sigma}\, \psi^{i,j}(L^l_t) \Big) \mathbf 1_{\{L^l_t \neq (0,0)\}} + \epsilon\, \mathbf 1_{\{L^l_t = (0,0)\}},
\]
where $\sigma > 0$, $\theta^l, \theta^d > 0$ and $A^l, A^d > 0$ are constants. For $\mathcal L \in \mathcal A$, we introduce a new probability measure $\mathbb P^{\mathcal L}$ under which $W$ remains a one-dimensional Brownian motion and, for $i \in \{a,b\}$, $j \in \{l,d\}$, $k \in \mathcal V^j$, the
\[
N^{\mathcal L,i,j,k}_t := N^{i,j,k}_t - \int_0^t \lambda^{\mathcal L,i,j,k}_u \, \mathrm d u
\]
are martingales. This probability measure is defined by the corresponding Doléans-Dade exponential
\[
L^{\mathcal L}_t := \exp\Bigg( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} \sum_{k \in \mathcal V^j} \int_0^t \mathbf 1_{\{\phi(i) Q_{u^-} > -\bar q,\ \ell^{i,j}_u = k\}} \Big( \log\Big( \frac{\lambda^{i,j}(L^l_u)}{\epsilon} \Big) \mathrm d N^{i,j,k}_u - \big( \lambda^{i,j}(L^l_u) - \epsilon \big) \mathrm d u \Big) \Bigg),
\]
which is a true martingale since the volumes $\ell^{i,j}$ are uniformly bounded (the associated Novikov criterion is given in [18]). We can therefore set the Girsanov change of measure with $\frac{\mathrm d \mathbb P^{\mathcal L}}{\mathrm d \mathbb P}\Big|_{\mathcal F_t} = L^{\mathcal L}_t$ for all $t \in [0,T]$. In particular, all the probability measures $\mathbb P^{\mathcal L}$ indexed by $\mathcal L$ are equivalent. We write $\mathbb E^{\mathcal L}_t$ for the conditional expectation with respect to $\mathcal F_t$ under the probability measure $\mathbb P^{\mathcal L}$. We also define, for $i \in \{a,b\}$, $j \in \{l,d\}$, the processes
\[
N^{i,j}_t := \sum_{k \in \mathcal V^j} N^{i,j,k}_t
\]
of intensities $\lambda^{i,j}(L^l_t)\, \mathbf 1_{\{\phi(i) Q_{t^-} > -\bar q\}}$. These processes correspond to the total number of transactions executed on the bid or ask side of the lit or dark pools.

We define the efficient price of the underlying asset, observable by all market participants (in the sense that they can infer it), as
\[
\tilde S_t := \tilde S_0 + \sigma W_t,
\]
where $\tilde S_0 > 0$ is the initial price and $\sigma > 0$ is the volatility. The mid-price is defined for $t \in [0,T]$ by
\[
S_t := \tilde S_t + \sum_{j \in \{l,d\}} \int_0^t \Gamma^j \ell^{a,j}_u \, \mathrm d N^{a,j}_u - \Gamma^j \ell^{b,j}_u \, \mathrm d N^{b,j}_u, \tag{2.1}
\]
where $\Gamma^l, \Gamma^d > 0$ are the market impact parameters in the lit and dark pools.

Remark 2.2. The market impact parameters $\Gamma^l, \Gamma^d$ are taken small enough with respect to the tick size to discard obvious arbitrage opportunities. Moreover, as the market impact in the dark pool is usually smaller or delayed compared to the lit pool, we will take $\Gamma^l \geq \Gamma^d$.

We assume that in the lit pool, the best bid and best ask prices $P^{b,l}$ and $P^{a,l}$ satisfy
\[
P^{b,l}_t := S_t - \mathcal T, \qquad P^{a,l}_t := S_t + \mathcal T, \qquad t \in [0,T],
\]
where $\mathcal T > 0$ is the half-tick. In the dark pool, the bid and ask prices with and without latency are given by
\[
P^{b,d,\text{lat}}_t := S_t - \mathcal T, \quad P^{a,d,\text{lat}}_t := S_t + \mathcal T, \qquad P^{b,d,\text{non-lat}}_t := S_t, \quad P^{a,d,\text{non-lat}}_t := S_t.
\]
Recall that in most dark pools, market takers are supposed to be executed at the mid-price of the lit pool. However, the higher the imbalance on the ask (resp. bid) side of the lit pool, the higher the probability that the mid-price will move down (resp. up) quickly. To model the latency effect, we introduce Bernoulli random variables $\nu^a_t \sim \text{Ber}\big(I^a(L^l_t)\big)$, $\nu^b_t \sim \text{Ber}\big(I^b(L^l_t)\big)$, which are associated with each incoming market order in the dark pool. If $\nu_t = 1$, there is no latency, and conversely for $\nu_t = 0$. So we define
\[
N^{a,d,\text{lat}}_t := \int_0^t (1 - \nu^a_u) \, \mathrm d N^{a,d}_u, \qquad N^{a,d,\text{non-lat}}_t := \int_0^t \nu^a_u \, \mathrm d N^{a,d}_u,
\]
and
\[
N^{b,d,\text{lat}}_t := \int_0^t (1 - \nu^b_u) \, \mathrm d N^{b,d}_u, \qquad N^{b,d,\text{non-lat}}_t := \int_0^t \nu^b_u \, \mathrm d N^{b,d}_u.
\]
Note that for any $t \in [0,T]$, $N^{i,d,\text{lat}}_t + N^{i,d,\text{non-lat}}_t = N^{i,d}_t$ for $i \in \{a,b\}$. To our knowledge, our approach is the first one considering market making in the dark pool while taking into account the latency effect.

We address the problem of a market maker acting in the lit and dark pools, without intervention of the exchange. The profit and loss (PnL for short) of the market maker is defined as the sum of the cash earned from his executed orders and the value of his inventory. Thus it is expressed as
\[
PL^{\mathcal L}_t := \mathcal W^{\mathcal L}_t + Q_t S_t,
\]
where, at time $t \in [0,T]$,
\[
\begin{aligned}
\mathcal W^{\mathcal L}_t := {} & \int_0^t (S_u + \mathcal T)\, \ell^{a,l}_u \, \mathrm d N^{a,l}_u - \int_0^t (S_u - \mathcal T)\, \ell^{b,l}_u \, \mathrm d N^{b,l}_u \\
& + \int_0^t (S_u + \mathcal T)\, \ell^{a,d}_u \, \mathrm d N^{a,d,\text{lat}}_u + \int_0^t S_u\, \ell^{a,d}_u \, \mathrm d N^{a,d,\text{non-lat}}_u \\
& - \int_0^t (S_u - \mathcal T)\, \ell^{b,d}_u \, \mathrm d N^{b,d,\text{lat}}_u - \int_0^t S_u\, \ell^{b,d}_u \, \mathrm d N^{b,d,\text{non-lat}}_u
\end{aligned}
\]
represents his cash process and $Q_t S_t$ is the mark-to-market value of his inventory. (Note that for all $t \in [0,T]$, $\int_0^t \ell^{i,j}_u \, \mathrm d N^{i,j}_u = \sum_{k \in \mathcal V^j} k N^{i,j,k}_t$.) Note also that market making activity in the dark pool without latency does not generate PnL through spread collection.

We consider a risk averse market maker with exponential utility function and risk aversion parameter $\gamma > 0$. We define his optimization problem as
\[
V^{MM}_0 = \sup_{\mathcal L \in \mathcal A} J^{MM}_0(\mathcal L), \tag{3.1}
\]
where, for all $t \in [0,T]$,
\[
J^{MM}_t(\mathcal L) = \mathbb E^{\mathcal L}_t \Big[ -\exp\Big( -\gamma \big( PL^{\mathcal L}_T - PL^{\mathcal L}_t \big) \Big) \Big].
\]
Inspired by [10], we prove a dynamic programming principle for the control problem (3.1), see Section A.1, from which we derive the corresponding HJB equation. We define $\mathcal O = [0, 2\bar q]^4$. Similarly to [12], we use a change of variable (see Equation (4.14) for the form of the ansatz) to reduce the initial problem to the following HJB equation:
\[
\begin{aligned}
0 = {} & \partial_t v(t,q) + v(t,q)\, \frac{1}{2} \sigma^2 \gamma^2 q^2 \\
& + \sup_{\mathcal L \in \mathcal O} \sum_{(k^l, k^d) \in \mathcal V^l \times \mathcal V^d} \Bigg[ \sum_{i \in \{a,b\}} \lambda^{\mathcal L,i,l,k^l}_t \bigg( \exp\Big( -\gamma \ell^{i,l} \big( \mathcal T + \Gamma^l ( \phi(i) q - \ell^{i,l} ) \big) \Big) v\big(t, q - \phi(i) k^l\big) - v(t,q) \bigg) \\
& \qquad + \sum_{i \in \{a,b\}} \sum_{\kappa \in K} \lambda^{\mathcal L,i,d,k^d}_t \phi^d(i,\kappa) \bigg( \exp\Big( -\gamma \ell^{i,d} \big( \mathcal T \phi^{\text{lat}}(\kappa) + \Gamma^d ( \phi(i) q - \ell^{i,d} ) \big) \Big) v\big(t, q - \phi(i) k^d\big) - v(t,q) \bigg) \Bigg],
\end{aligned} \tag{3.2}
\]
with $K := \{\text{lat}, \text{non-lat}\}$,
\[
\phi^{\text{lat}}(\kappa) := \begin{cases} 1 & \text{if } \kappa = \text{lat}, \\ 0 & \text{if } \kappa = \text{non-lat}, \end{cases} \qquad
\phi^d(i,\kappa) := \begin{cases} I^b(L^l) & \text{if } (i,\kappa) \in \{(a,\text{lat}), (b,\text{non-lat})\}, \\ I^a(L^l) & \text{if } (i,\kappa) \in \{(a,\text{non-lat}), (b,\text{lat})\}, \end{cases}
\]
(with the convention $I^a(0,0) = I^b(0,0) = 0$) and terminal condition $v(T, \cdot) = -1$. We have the following theorem.
Theorem 1. There exists a unique viscosity solution to the HJB equation (3.2). It satisfies $V^{MM}_0 = v(0, Q_0)$. The supremum in (3.2) characterizes the optimal controls $\mathcal L^\star \in \mathcal A$.

The proof follows the same arguments as Theorem 3 in Section A.3.

We see that the supremum over $\mathcal L$ is not separable with respect to each control process as in [10, 12]. To the best of our knowledge, there is no explicit expression for the optimal controls of the market maker. Nevertheless, as shown in Section 5.2, we can solve PDE (3.2) numerically. More precisely, we make use of deep reinforcement learning techniques to approximate the optimal volumes posted by the market maker.

Let us now consider the case where a make-take fees system is in place and influences the amount of liquidity provided by the market maker on both lit and dark venues.

4.1 Modified PnL of the market maker
Following the principal-agent approach of [10], we now assume that the exchange gives the market maker a compensation $\xi$, defined as an $\mathcal F_T$-measurable random variable, which is added to his PnL process at terminal time $T$. This contract, designed by the exchange, aims at creating incentives so that the market maker attracts more transactions.

Therefore, the total payoff of the market maker at time $T$ is now given by $\mathcal W^{\mathcal L}_T + Q_T S_T + \xi$. The problem of the market maker then becomes
\[
V^{MM}_0(\xi) := \sup_{\mathcal L \in \mathcal A} J^{MM}_0(\mathcal L, \xi), \tag{4.1}
\]
with
\[
J^{MM}_t(\mathcal L, \xi) := \mathbb E^{\mathcal L}_t \Big[ -\exp\Big( -\gamma \big( PL^{\mathcal L}_T - PL^{\mathcal L}_t + \xi \big) \Big) \Big].
\]
To ensure that this functional is non-degenerate, we impose the following technical condition on $\xi$ (see the next section for the definition of an admissible contract):
\[
\sup_{\mathcal L \in \mathcal A} \mathbb E^{\mathcal L} \Big[ \exp\big( -\gamma' \xi \big) \Big] < +\infty, \quad \text{for some } \gamma' > \gamma, \tag{4.2}
\]
so that the optimization problem of the market maker is well-posed.

For a fixed compensation $\xi$, the optimal response $\mathcal L^\star$ associated with the market maker's problem (4.1) is defined by
\[
J^{MM}_0(\mathcal L^\star, \xi) = V^{MM}_0(\xi) \quad \text{for } \mathcal L^\star \in \mathcal A. \tag{OC}
\]
We now consider the problem of the exchange wishing to attract liquidity on its platforms.

We assume that the exchange receives fixed fees $c^l, c^d > 0$ per unit of volume traded in the lit and dark pools respectively, with $c^l, c^d$ independent of the price of the asset. The goal of the exchange is essentially to maximize the total number of market orders sent during the period of interest. As the arrival intensities of market orders are controlled by the market maker through $\mathcal L$, the contract $\xi$ should aim at increasing these intensities. Thus, the exchange subsidizes the agent at time $T$ with the compensation $\xi$, so that its PnL is given by
\[
\sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} c^j \int_0^T \ell^{i,j}_t \, \mathrm d N^{i,j}_t - \xi.
\]
We now need to specify the set of admissible contracts potentially offered by the exchange. We assume that the exchange has an exponential utility function with risk aversion parameter $\eta > 0$. The natural well-posedness condition for the problem of the exchange is
\[
\mathbb E^{\mathcal L^\star} \Big[ \exp\big( \eta' \xi \big) \Big] < +\infty, \quad \text{for some } \eta' > \eta, \tag{4.3}
\]
for any $\mathcal L^\star$ satisfying condition (OC).

Since the $N^{i,j}$ are point processes with bounded intensities, this condition, together with Hölder's inequality, ensures that the problem of the exchange is well-defined. We also assume that the market maker only accepts contracts $\xi$ such that $V^{MM}_0(\xi)$ is above some threshold value $R < 0$, that is, $\xi$ must satisfy
\[
V^{MM}_0(\xi) \geq R. \tag{R}
\]
This threshold, called the reservation utility of the agent, is the critical utility value under which the market maker has no interest in the contract. This quantity has to be taken into account carefully by the exchange when proposing a contract to the market maker. We can therefore define the space of admissible contracts $\mathcal C$ by
\[
\mathcal C := \Big\{ \xi \ \mathcal F_T\text{-measurable, s.t. (4.2), (4.3) and (R) are satisfied} \Big\}.
\]
Thus the contracting problem the exchange has to solve is
\[
V^E_0 := \sup_{\xi \in \mathcal C} \mathbb E^{\mathcal L^\star} \Bigg[ -\exp\Bigg( -\eta \bigg( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} c^j \int_0^T \ell^{i,j}_t \, \mathrm d N^{i,j}_t - \xi \bigg) \Bigg) \Bigg]. \tag{4.4}
\]
(Note that for fixed $\xi$, the control $\mathcal L^\star$ is not necessarily unique. However, numerical results seem to indicate its uniqueness. Otherwise we could also consider a supremum over all $\mathcal L^\star$ satisfying (OC), as is usually done in principal-agent theory, see for instance [9, Section 2.4].) In the next section, we characterize the form of an admissible contract $\xi \in \mathcal C$.

Inspired by [10], we prove in this section that, without loss of generality, we can consider a specific form of contracts, defined by some $Y_0 \in \mathbb R$ and a predictable process $Z = (Z^{\tilde S}, Z^{i,j,k})_{i \in \{a,b\},\, j \in \{l,d\},\, k \in \mathcal V^j}$ chosen by the principal. A contract $\xi$ of this form can be written as
\[
\xi = Y^{Y_0,Z}_T := Y_0 + \int_0^T \bigg( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} \sum_{k \in \mathcal V^j} Z^{i,j,k}_u \, \mathrm d N^{i,j,k}_u \bigg) + Z^{\tilde S}_u \, \mathrm d \tilde S_u + \Big( \frac{1}{2} \gamma \sigma^2 \big( Z^{\tilde S}_u + Q_u \big)^2 - H(Z_u, Q_u) \Big) \mathrm d u, \tag{4.5}
\]
where
\[
H(z,q) := \sup_{\mathcal L \in \mathcal A} h(\mathcal L, z, q), \tag{4.6}
\]
and $h : \mathbb R^4 \times \mathbb R^{2(\#\mathcal V^l + \#\mathcal V^d)} \times \mathbb Z \to \mathbb R$ is the Hamiltonian of the agent's problem. To ensure admissibility of the contract, the process $(Z_t)_{t \in [0,T]}$ has to satisfy the following technical conditions:
\[
\sup_{\mathcal L \in \mathcal A} \sup_{t \in [0,T]} \mathbb E^{\mathcal L} \Big[ \exp\big( -\gamma' Y^{0,Z}_t \big) \Big] < +\infty, \quad \text{for some } \gamma' > \gamma, \tag{4.7}
\]
and
\[
\int_0^T |Z^{\tilde S}_t|^2 + |H(Z_t, Q_t)| \, \mathrm d t < +\infty. \tag{4.8}
\]
Given this integrability condition, the process $(Y^{0,Z}_t)_{t \in [0,T]}$ is well-defined.
The contract consists of the following elements:
• The constant $Y_0$ is calibrated by the exchange to ensure that the reservation utility constraint (R) of the market maker is satisfied.
• The term $Z^{\tilde S}$ is the compensation given to the market maker with respect to the volatility risk induced by the efficient price $\tilde S$.
• Every time a trade of size $k$ occurs on the ask or bid side of the lit or dark pool, the market maker receives $Z^{i,j,k}$.
• The term $\frac{1}{2} \gamma \sigma^2 (Z^{\tilde S} + Q)^2 - H(Z, Q)$ is a continuous coupon given to the market maker.
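To make the four elements concrete, here is a minimal Python sketch of an Euler discretization of the contract along one simulated path. It is an illustration rather than the authors' implementation: the function name `contract_payoff`, the fixed time step `dt`, and the convention that the Hamiltonian values $H(Z_u, Q_u)$ are supplied as a precomputed list are all assumptions, as is the reading of the coupon as $\frac{1}{2}\gamma\sigma^2(Z^{\tilde S}+Q)^2 - H(Z,Q)$.

```python
def contract_payoff(y0, dt, z_s, d_s_tilde, q, hamiltonian, jump_payments,
                    gamma, sigma):
    """Discretized contract value: constant + trade payments + price term + coupon.

    y0            : constant Y_0 fixed by the exchange
    dt            : time step of the grid (assumed uniform)
    z_s           : values of Z^{S~} on the grid
    d_s_tilde     : increments of the efficient price S~ over each step
    q             : inventory of the market maker on the grid
    hamiltonian   : precomputed values of H(Z_u, Q_u) on the grid (assumption)
    jump_payments : one payment Z^{i,j,k} per executed trade
    """
    # Compensation for the volatility risk: sum of Z^{S~}_u * dS~_u.
    price_term = sum(z * ds for z, ds in zip(z_s, d_s_tilde))
    # Continuous coupon: (gamma * sigma^2 / 2) * (Z^{S~} + Q)^2 - H, times dt.
    coupon_term = sum(
        (0.5 * gamma * sigma ** 2 * (z + qi) ** 2 - h) * dt
        for z, qi, h in zip(z_s, q, hamiltonian)
    )
    return y0 + sum(jump_payments) + price_term + coupon_term
```

With two time steps, for example, `contract_payoff(1.0, 0.5, [0.0, -1.0], [0.2, -0.1], [0.0, 2.0], [0.1, 0.3], [0.05], 2.0, 1.0)` adds a constant of 1, one trade payment of 0.05, a price term of 0.1 and a coupon of 0.3.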
Remark 4.1. In our setting, the volumes of limit orders do not belong to the canonical process and so the principal does not contract on the volumes displayed by the market maker. This is very reasonable since in practice, a large part of the volumes sent by market makers are not executed or are rapidly canceled. Therefore it is clearly preferable to build contracts based on actual transactions. Moreover, note that $\tilde S$, and not the mid-price $S$ defined in (2.1), appears in the contract. This is not an issue since $S$ can be decomposed into elements of the canonical process.

Formally stated, the definition of the space $\Xi$ of contracts of the form (4.5) is
\[
\Xi = \Big\{ Y^{Y_0,Z}_T \ \mathcal F_T\text{-measurable},\ Y_0 \in \mathbb R,\ Z \in \mathcal Z, \text{ s.t. (R) holds} \Big\},
\]
where $\mathcal Z$ denotes the set of processes defined by
\[
\mathcal Z := \Big\{ (Z^{\tilde S}, Z^{i,j,k}),\ i \in \{a,b\},\ j \in \{l,d\},\ k \in \mathcal V^j, \text{ s.t. (4.3), (4.7), (4.8) are satisfied} \Big\}. \tag{4.9}
\]
For $(\mathcal L, z, q) \in (\mathcal V^l)^2 \times (\mathcal V^d)^2 \times \mathbb R^{2(\#\mathcal V^l + \#\mathcal V^d)} \times \mathbb Z$, we define the Hamiltonian of the market maker, which appears in the contract $Y^{Y_0,Z}_T$ via the continuous coupon $H(Z, Q)$, by
\[
\begin{aligned}
h(\mathcal L, z, q) := \sum_{(k^l, k^d) \in \mathcal V^l \times \mathcal V^d} \sum_{i \in \{a,b\}} \gamma^{-1} \Bigg[ & \bigg( 1 - \exp\Big( -\gamma \big( z^{i,l,k^l} + \ell^{i,l} ( \mathcal T + \phi(i) \Gamma^l q ) - \Gamma^l (\ell^{i,l})^2 \big) \Big) \bigg) \lambda^{\mathcal L,i,l,k^l}_t \\
& + \sum_{\kappa \in K} \bigg( 1 - \exp\Big( -\gamma \big( z^{i,d,k^d} + \ell^{i,d} ( \mathcal T \phi^{\text{lat}}(\kappa) + \phi(i) \Gamma^d q ) - \Gamma^d (\ell^{i,d})^2 \big) \Big) \bigg) \lambda^{\mathcal L,i,d,k^d}_t \phi^d(i,\kappa) \Bigg].
\end{aligned} \tag{4.10}
\]
The form (4.10) of $h$ appears naturally when applying the dynamic programming principle for the market maker's problem.

The next theorem shows that the sets $\mathcal C$ and $\Xi$ are, in fact, equal. Moreover, the contract representation (4.5) enables us to provide a solution to the market maker's problem (4.1). The proof is given in Section A.2.

Theorem 2. Any admissible contract can be written under the form (4.5), that is, $\mathcal C = \Xi$. Moreover, for any $Y^{Y_0,Z}_T \in \Xi$ we have
\[
V^{MM}_0\big( Y^{Y_0,Z}_T \big) = -\exp(-\gamma Y_0),
\]
and Condition (OC) with $\xi = Y^{Y_0,Z}_T$ is equivalent to the fact that $\mathcal L$ satisfies
\[
h(\mathcal L_t, Z_t, Q_t) = H(Z_t, Q_t) \quad \text{for any } t \in [0,T].
\]
In particular, from Theorem 2, $\hat Y_0 = -\gamma^{-1} \log(-R)$ ensures that the reservation utility constraint of the market maker is satisfied.

This theorem provides a tractable form of contracts for the design of a suitable make-take fees policy. Given the knowledge of the market maker's response to a given contract, we reformulate the problem of the exchange and prove the existence and uniqueness of the associated value function.
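Since the posted volumes take finitely many values, the optimality condition $h(\mathcal L, z, q) = H(z, q)$ of Theorem 2 can be illustrated, for small volume sets, by plain enumeration. The Python sketch below is a brute-force stand-in, not the neural-network method used in the paper. It rests on several labeled assumptions: the intensity is read as $A^j \exp(-\theta^j \psi^{i,j} / \sigma)$; only the term $k = \ell^{i,j}$ selected by the indicator in the intensity is kept; the parameter dictionary `p` and its keys (`tick` for $\mathcal T$, `cap` for the bound on $\ell^{i,l} + \ell^{i,d}$, and so on) are hypothetical names; and controls are ordered as `(ask lit, bid lit, ask dark, bid dark)`.

```python
import itertools
import math


def imbalances(la_l, lb_l):
    """Ask and bid imbalances of the lit pool, with I^a(0,0) = I^b(0,0) = 0."""
    total = la_l + lb_l
    return (0.0, 0.0) if total == 0 else (la_l / total, lb_l / total)


def hamiltonian_h(L, z, q, p):
    """Evaluate h(L, z, q); z maps (side, pool, size) to an incentive (default 0)."""
    la_l, lb_l, la_d, lb_d = L
    ia, ib = imbalances(la_l, lb_l)
    posted = {("a", "l"): la_l, ("b", "l"): lb_l, ("a", "d"): la_d, ("b", "d"): lb_d}
    total = 0.0
    for (side, pool), ell in posted.items():
        phi = 1 if side == "a" else -1
        if phi * q <= -p["q_bar"]:
            continue  # the indicator {phi(i) q > -q_bar} vanishes
        psi = ia if (side, pool) in {("a", "l"), ("b", "d")} else ib
        if la_l + lb_l > 0:
            lam = p["A"][pool] * math.exp(-p["theta"][pool] / p["sigma"] * psi)
        else:
            lam = p["eps"]  # no liquidity posted in the lit pool
        zk = z.get((side, pool, ell), 0.0)
        impact = p["Gamma"][pool]
        if pool == "l":
            cases = [(1.0, 1)]  # lit trades always earn the half-tick
        else:
            p_lat = ib if side == "a" else ia      # phi^d(i, lat)
            p_nonlat = ia if side == "a" else ib   # phi^d(i, non-lat)
            cases = [(p_lat, 1), (p_nonlat, 0)]
        for prob, earns_tick in cases:
            gain = zk + ell * (p["tick"] * earns_tick + phi * impact * q) - impact * ell ** 2
            total += (1.0 - math.exp(-p["gamma"] * gain)) / p["gamma"] * lam * prob
    return total


def best_response(z, q, p):
    """Return (H(z, q), L*) by enumerating all admissible volume quadruples."""
    best = None
    for L in itertools.product(p["V_l"], p["V_l"], p["V_d"], p["V_d"]):
        la_l, lb_l, la_d, lb_d = L
        if la_l + la_d > p["cap"] or lb_l + lb_d > p["cap"]:
            continue  # violates the per-side volume constraint
        value = hamiltonian_h(L, z, q, p)
        if best is None or value > best[0]:
            best = (value, L)
    return best
```

Enumeration scales as $(\#\mathcal V^l)^2 (\#\mathcal V^d)^2$ per pair $(z, q)$, which is only practical for small volume sets; it is mainly useful as a sanity check on a learned best response.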
Following Theorem 2, the contracting problem (4.4) is reduced to
\[
V^E_0 := \sup_{\substack{(Y_0, Z, \mathcal L) \in \mathbb R \times \mathcal Z \times \mathcal A \\ h(\mathcal L_t, Z_t, Q_t) = H(Z_t, Q_t),\ \forall t \in [0,T]}} \mathbb E^{\mathcal L} \Bigg[ -\exp\Bigg( -\eta \bigg( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} c^j \int_0^T \ell^{i,j}_t \, \mathrm d N^{i,j}_t - Y^{Y_0,Z}_T \bigg) \Bigg) \Bigg]. \tag{4.11}
\]
For a given contract $Y^{Y_0,Z}$, due to the form of (4.10), the market maker's optimal response does not depend on $Y_0$. Since the exchange's objective function is decreasing in $Y_0$, the maximization with respect to $Y_0$ is achieved at the level $\hat Y_0 = -\gamma^{-1} \log(-R)$. Therefore Problem (4.11) can be reduced to
\[
v^E_0 := \sup_{\substack{(Z, \mathcal L) \in \mathcal Z \times \mathcal A \\ h(\mathcal L_t, Z_t, Q_t) = H(Z_t, Q_t),\ \forall t \in [0,T]}} J(Z, \mathcal L), \tag{4.12}
\]
where
\[
J(Z, \mathcal L) = \mathbb E^{\mathcal L} \Bigg[ -\exp\Bigg( -\eta \bigg( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} c^j \int_0^T \ell^{i,j}_t \, \mathrm d N^{i,j}_t - Y^{0,Z}_T \bigg) \Bigg) \Bigg].
\]
We define $D := [0,T] \times \mathbb R \times \mathbb N^{2(\#\mathcal V^l + \#\mathcal V^d)} \times \mathbb N^{2(\#\mathcal V^l + \#\mathcal V^d)} \times \mathbb R$ and, for any vector $\mathbf p \in \mathbb N^p$ and $i \in \{1, \dots, p\}$, $\mathbf p_{-i} := (p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_p) \in \mathbb N^{p-1}$. As noted in connection with Equation (4.4), there might be more than one optimal response $\mathcal L^\star$ of the market maker. We show here how to solve the principal's problem for a specific optimal response $\mathcal L^\star$. (If there are several optimal responses $\mathcal L^\star$, the exchange should solve the HJB equation (4.13) for every $\mathcal L^\star$ and, according to principal-agent theory, choose the optimal response that maximizes its own utility.) Using a dynamic programming principle, we define the value function of the exchange, $v^E : D \to \mathbb R$, as
\[
v^E(t, \tilde S_t, \bar N_t, N_t, Y_t) := \sup_{Z \in \mathcal Z} \mathbb E^{\mathcal L^\star}_t \Bigg[ -\exp\Bigg( -\eta \bigg( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} \sum_{k \in \mathcal V^j} c^j \big( \bar N^{i,j,k}_T - \bar N^{i,j,k}_t \big) - Y^{0,Z}_T \bigg) \Bigg) \Bigg],
\]
with
\[
(\bar N_t, N_t) := \big( k N^{i,j,k}_t, N^{i,j,k}_t \big)_{i \in \{a,b\},\, j \in \{l,d\},\, k \in \mathcal V^j},
\]
and $\mathcal L^\star = \big( \ell^{\star b,l}(z,q), \ell^{\star a,l}(z,q), \ell^{\star b,d}(z,q), \ell^{\star a,d}(z,q) \big)$ the optimal response of the market maker, in the sense of (4.6), displayed at time $t$ for a given inventory $q$ and given incentives $z$ of the exchange. Recall that $Q_t = \sum_{j \in \{l,d\}} \sum_{k \in \mathcal V^j} \big( \bar N^{b,j,k}_t - \bar N^{a,j,k}_t \big)$.
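Usual arguments enable us to show that $v^E$ is a viscosity solution of the HJB equation defined on $D$ by
\[
\begin{aligned}
0 = {} & \partial_t v^E + \frac{1}{2} \sigma^2 \partial_{\tilde S \tilde S} v^E + \sup_{z^{\tilde S} \in \mathbb R} \Big\{ \frac{1}{2} \gamma \sigma^2 (z^{\tilde S} + q)^2 \partial_y v^E + \frac{1}{2} \sigma^2 (z^{\tilde S})^2 \partial_{yy} v^E + \sigma^2 z^{\tilde S} \partial_{\tilde S y} v^E \Big\} \\
& + \sup_{z \in \mathbb R^{2(\#\mathcal V^l + \#\mathcal V^d)}} \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} \sum_{k \in \mathcal V^j} \lambda^{\mathcal L^\star,i,j,k}_t \Big( \Delta^{i,j,k}(z) v^E - \partial_y v^E \, \mathcal E\big( z^{i,j,k}, \ell^{\star i,j}(z,q) \big) \Big),
\end{aligned} \tag{4.13}
\]
where, for $i \in \{a,b\}$, $j \in \{l,d\}$,
\[
\begin{aligned}
\Delta^{i,j,k}(z) v^E(t, \tilde s, \bar n, n, y) & := v^E\big( t, \tilde s, \bar n^{i,j,k} + k, \bar n_{-(i,j,k)}, n^{i,j,k} + 1, n_{-(i,j,k)}, y + z^{i,j,k} \big) - v^E(t, \tilde s, \bar n, n, y), \\
\mathcal E\big( z^{i,l,k}, \ell^{\star i,l}(z,q) \big) & := \frac{1}{\gamma} \bigg( 1 - \exp\Big( -\gamma \big( z^{i,l,k} + \ell^{\star i,l}(z,q) ( \mathcal T + \phi(i) \Gamma^l q ) - \Gamma^l \big( \ell^{\star i,l}(z,q) \big)^2 \big) \Big) \bigg), \\
\mathcal E\big( z^{i,d,k}, \ell^{\star i,d}(z,q) \big) & := \frac{1}{\gamma} \sum_{\kappa \in K} \bigg( 1 - \exp\Big( -\gamma \big( z^{i,d,k} + \ell^{\star i,d}(z,q) ( \mathcal T \phi^{\text{lat}}(\kappa) + \phi(i) \Gamma^d q ) - \Gamma^d \big( \ell^{\star i,d}(z,q) \big)^2 \big) \Big) \bigg) \phi^d(i,\kappa),
\end{aligned}
\]
and terminal condition
\[
v^E(T, \tilde s, \bar n, n, y) = -\exp\Big( -\eta \Big( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} \sum_{k \in \mathcal V^j} c^j \bar n^{i,j,k} - y \Big) \Big).
\]
Note that the best response of the market maker, for which we do not have an explicit expression, appears in the value function of the exchange. Inspired by [10, 12], we use the following ansatz for Equation (4.13):
\[
v^E(t, \tilde s, \bar n, n, y) = v(t,q) \exp\Big( -\eta \Big( \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} \sum_{k \in \mathcal V^j} c^j \bar n^{i,j,k} - y \Big) \Big), \tag{4.14}
\]
where $v$ is a solution of the following HJB equation
\[
\begin{cases}
0 = \partial_t v(t,q) + \mathcal H\big( q, \mathcal L^\star, v(t,\cdot) \big), & q \in \{-\bar q, \dots, \bar q\},\ t \in [0,T), \\
v(T,q) = -1,
\end{cases} \tag{4.15}
\]
with
\[
\mathcal H\big( q, \mathcal L^\star_t, v(t,\cdot) \big) := \sup_{z \in \mathbb R^{2(\#\mathcal V^l + \#\mathcal V^d)+1}} \mathcal U\big( z, q, \mathcal L^\star(z,q), v(t,\cdot) \big), \tag{4.16}
\]
and
\[
\begin{aligned}
\mathcal U\big( z, q, \mathcal L^\star(z,q), v(t,\cdot) \big) := {} & v(t,q) \Big( \frac{\eta \sigma^2 \gamma}{2} \big( z^{\tilde S} + q \big)^2 + \frac{\eta^2 \sigma^2}{2} \big( z^{\tilde S} \big)^2 \Big) \\
& + \sum_{\substack{i \in \{a,b\} \\ j \in \{l,d\}}} \sum_{k \in \mathcal V^j} \lambda^{\mathcal L^\star,i,j,k}_t \bigg( \exp\big( \eta ( z^{i,j,k} - k c^j ) \big) v\big( t, q - \phi(i) k \big) - v(t,q) \Big( 1 + \eta\, \mathcal E\big( z^{i,j,k}, \ell^{\star i,j}(z,q) \big) \Big) \bigg).
\end{aligned}
\]
This ansatz leads to a dimensionality reduction from five to two parameters. Using [4, Corollary 1.4.2], there exists a unique continuous viscosity solution associated with (4.15).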
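As a quick check of how the ansatz (4.14) collapses (4.13) into (4.15), one can substitute it term by term. The sketch below is a verification aid rather than part of the original argument: the symbol $\Theta$ for the exponential factor is introduced here only for this purpose, and the computation assumes the jump operator and coupon read as in (4.13) and (4.5).

```latex
% Write v^E = v(t,q) \Theta, with \Theta the exponential factor of (4.14).
\Theta := \exp\Big(-\eta\Big(\sum_{i,j}\sum_{k \in \mathcal V^j} c^j \bar n^{i,j,k} - y\Big)\Big) > 0 .

% Derivatives in y and \tilde s: the ansatz does not depend on \tilde s.
\partial_y v^E = \eta\, v^E, \qquad
\partial_{yy} v^E = \eta^2 v^E, \qquad
\partial_{\tilde S \tilde S} v^E = \partial_{\tilde S y} v^E = 0 .

% A trade of size k on side i moves \bar n^{i,j,k} by k, y by z^{i,j,k}
% and the inventory q by -\phi(i) k:
\Delta^{i,j,k}(z) v^E
  = \Theta \Big( e^{\eta (z^{i,j,k} - k c^j)}\, v\big(t, q - \phi(i) k\big) - v(t,q) \Big).

% Hence, for each (i,j,k),
\Delta^{i,j,k}(z) v^E - \partial_y v^E\, \mathcal E
  = \Theta \Big( e^{\eta (z^{i,j,k} - k c^j)}\, v\big(t, q - \phi(i) k\big)
      - v(t,q)\big(1 + \eta\, \mathcal E\big) \Big),

% and the diffusion part becomes
\tfrac12 \gamma \sigma^2 (z^{\tilde S} + q)^2\, \partial_y v^E
  + \tfrac12 \sigma^2 (z^{\tilde S})^2\, \partial_{yy} v^E
  = \Theta\, v(t,q) \Big( \tfrac{\eta \gamma \sigma^2}{2} (z^{\tilde S} + q)^2
      + \tfrac{\eta^2 \sigma^2}{2} (z^{\tilde S})^2 \Big).

% Dividing (4.13) by \Theta > 0 leaves an equation in (t,q) only, which is (4.15)-(4.16).
% Since v < 0, the supremum in z^{\tilde S} minimises the bracket; the first-order condition
% \eta \gamma \sigma^2 (z^{\tilde S} + q) + \eta^2 \sigma^2 z^{\tilde S} = 0
% gives z^{\tilde S} = -\frac{\gamma}{\gamma + \eta}\, q .
```

The state variables $\tilde s$, $\bar n$, $n$ and $y$ enter only through $\Theta$ and the inventory $q$, which is why the reduced unknown $v$ depends on $(t,q)$ alone.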
Remark 4.2.
Note that the supremum over $z^{\tilde S}$ is explicit and given by $z^{\tilde S} = -\frac{\gamma}{\gamma+\eta}\,q$, as in [10].

Making use of the ansatz $v$, the bi-level optimization problem (4.12) is reduced to solving the following system:

$$
\begin{cases}
\partial_t v(t,q) + H\big(q,\mathcal L^\star_t,v(t,\cdot)\big) = 0, \quad \text{with final condition } v(T,q) = -1,\\
h(\mathcal L^\star,z,q) = H(z,q), \quad q\in\{-\bar q,\dots,\bar q\}.
\end{cases}
\tag{4.17}
$$

We have the following theorem.

Theorem 3.
There exists a unique continuous viscosity solution to the HJB equation (4.15). It satisfies $v^E_0 = v(0,Q_0) = v^E(0,\tilde S_0,\bar N_0,N_0,Y_0)$. Moreover, the optimal incentives of the principal $Z^\star$ are solutions of the supremum in (4.15).

The proof can be found in Section A.3.

Theorem 3 allows us to use numerical methods to obtain the optimizers

$$
\Big(Z^\star(t,Q_{t^-}),\ \mathcal L^\star\big(Z^\star(t,Q_{t^-}),Q_{t^-}\big)\Big)_{t\in[0,T]}
\tag{4.18}
$$

of the bi-level problem (4.17). Moreover, the optimal contract is given by

$$
\xi^\star = \hat Y_0 + \int_0^T\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal V^j}Z^{\star,i,j,k}_u\,\mathrm dN^{i,j,k}_u\Big) + Z^{\star,\tilde S}_u\,\mathrm d\tilde S_u + \Big(\tfrac12\gamma\sigma^2\big(Z^{\star,\tilde S}_u+Q_u\big)^2 - H(Z^\star_u,Q_u)\Big)\mathrm du.
$$

The second problem in (4.17) is a classical optimization problem. Having found $\mathcal L^\star(z,q)$ numerically, we solve the Hamilton-Jacobi-Bellman equation (4.15) using neural networks.

Remark 4.3. Theorem 3 characterizes only the value function of the exchange and not the optimal incentives defined in (4.18), which are computed through deep reinforcement learning techniques. In particular, there is no guarantee of admissibility of the incentives $(Z^\star(t,Q_{t^-}))_{t\in[0,T]}$ solving (4.16). However, we observe numerically (see Figure 5) that these incentive parameters are essentially linear in the inventory $Q$ at any fixed time $t$, despite the nonlinear nature of neural networks. This is quite usual in the optimal market making literature, where an asymptotic expansion of the function $v$ is used, so $v$ should be regular enough (see [12, Section 4] or [1, Section 3.2]). The linearity of the incentives $Z^\star$ implies that they belong to the set of admissible contracts $\mathcal Z$ defined by (4.9).

We now turn to the description of our numerical method to solve (4.17). The optimization procedure consists of two stages. At the first stage, we optimize the controls of the market maker for all possible values of the incentives given by the exchange. At the second stage, we use an actor-critic approach to obtain both the optimal controls and the value function of the exchange. We conclude this section with numerical experiments showing the impact of incentives as well as that of market conditions on the volumes posted by the market maker on both lit and dark venues regulated by the exchange. Throughout these experiments, we assume the following:
Assumption 5.1.
For all $i\in\{a,b\}$, $j\in\{l,d\}$ and $k\in\mathcal V^j$, $Z^{i,j,k} = Z^{i,j}\in\mathbb R$.

This means that the principal provides incentives only with respect to the number of transactions on each side of each pool, independently of the volumes. In that case, the HJB equation (4.15) remains valid. Recall that the optimal incentives depend on time and on the market maker's inventory; therefore, they implicitly depend on the transacted volume.

There is an obvious bid-ask symmetry in our model with respect to the inventory of the market maker, as can be seen in the Hamiltonian (4.10). Thus, for our numerical experiments, we impose symmetry of the incentives with respect to $q$. As a consequence, given incentives satisfying the bid-ask symmetry property, the volumes posted by the market maker are also symmetric with respect to $q$.

The first step to tackle our principal-agent problem is to find the optimal volumes $\mathcal L^\star = (\ell^\star_{a,l},\ell^\star_{b,l},\ell^\star_{a,d},\ell^\star_{b,d})$ by solving, for any couple $(z,q)$, the maximization problem of the market maker (4.6). To do so, we introduce a continuous version of the Hamiltonian (4.10) with respect to $\mathcal L$, that is, we maximize the following functional:

$$
\begin{aligned}
\mathcal L\longmapsto h_c(\mathcal L,z,q) := \sum_{i\in\{a,b\}}\gamma^{-1}\Bigg[ & \bigg(1-\exp\Big(-\gamma\Big(z^{i,l}+\ell_{i,l}\big(\tfrac{\mathcal T}{2}+\phi(i)\Gamma^l q\big)-\Gamma^l(\ell_{i,l})^2\Big)\Big)\bigg)\lambda^{i,l}(\mathcal L^l_t)\\
& + \sum_{\kappa\in K}\bigg(1-\exp\Big(-\gamma\Big(z^{i,d}+\ell_{i,d}\big(\tfrac{\mathcal T}{2}\phi^{\mathrm{lat}}(\kappa)+\phi(i)\Gamma^d q\big)-\Gamma^d(\ell_{i,d})^2\Big)\Big)\bigg)\lambda^{i,d}(\mathcal L^l_t)\,\phi^d(i,\kappa)\Bigg].
\end{aligned}
\tag{5.1}
$$

For fixed incentives, we have that $\mathcal L^\star_{a,l}(q) = \mathcal L^\star_{b,l}(-q)$. Because of the intricate form of the function $h_c$, we cannot obtain an explicit solution to the first-order condition $\nabla_{\mathcal L}h_c = 0$, which is four-dimensional. Moreover, we have no a priori knowledge of the functional form of the optimizers $\mathcal L^\star:\mathbb R^4\times[-\bar q,\bar q]\to\mathbb R^4$, so we cannot apply canonical root-finding methods.
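To make the bid-ask symmetry concrete, the lit-pool reward term of (5.1) can be evaluated on both sides. This is a minimal sketch, not the authors' implementation: it assumes the reward takes the form $\gamma^{-1}(1-e^{-\gamma\,\mathrm{payoff}})$ with a half-tick gain, it assumes $\phi(a)=+1$ and $\phi(b)=-1$, and the numerical values of `GAMMA`, `TICK` and `IMPACT` are hypothetical. The intensities $\lambda^{i,j}$ are left out, so only the symmetry of the reward term is illustrated.

```python
import math


def phi(side):
    # Assumed sign convention: +1 on the ask side ("a"), -1 on the bid side ("b").
    return 1.0 if side == "a" else -1.0


def lit_reward(z, ell, q, side, gamma, tick, impact):
    """Lit-pool reward term (1/gamma) * (1 - exp(-gamma * payoff)).

    payoff = incentive + volume * (half tick + signed impact * inventory)
             - impact * volume^2
    """
    payoff = z + ell * (tick / 2.0 + phi(side) * impact * q) - impact * ell ** 2
    return (1.0 - math.exp(-gamma * payoff)) / gamma


# Hypothetical parameter values, chosen only for illustration.
GAMMA, TICK, IMPACT = 0.01, 0.01, 1e-3

# Same incentive and volume, mirrored inventory and mirrored side.
ask_reward = lit_reward(0.1, 20.0, 50.0, "a", GAMMA, TICK, IMPACT)
bid_reward = lit_reward(0.1, 20.0, -50.0, "b", GAMMA, TICK, IMPACT)
```

With these inputs the two rewards coincide, which is the term-level counterpart of the symmetry in $q$ imposed on the incentives.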
Therefore, to address this problem, we approximate the best response of the market maker by a neural network.

Although we do not use a purely grid-based method, we need to define a domain for the arguments $q$ and $z$. In our model, the inventory $q$ of the market maker is bounded and evolves between the risk limits $-\bar q$ and $\bar q$. We also define a bound $\bar z$ for the incentives $z\in\mathbb R^4$, so that $z\in[-\bar z,\bar z]^4$. This is also justified by [10], in which optimal incentives are proved to be bounded.

We approximate the best response function $\mathcal L^\star$ by a neural network $l[\omega_l]$, where $\omega_l$ are the weights of the neural network. The neural network $l[\omega_l]$ takes as inputs the principal's incentives and the market maker's current inventory $(z_{a,l},z_{b,l},z_{a,d},z_{b,d},q)$, which are normalized by $\bar z$ and $\bar q$ respectively. The network is composed of 2 hidden layers with 10 nodes each and ELU activation functions. The ELU activation function is of the form

$$
\mathrm{ELU}(x) = \begin{cases}\alpha(e^x-1), & \text{for } x\le 0,\\ x, & \text{for } x>0,\end{cases}
$$

where $\alpha$ is a non-negative parameter, usually taken equal to 1.

The final layer of the network contains four outputs, and the activation function is sigmoid (so that the outputs lie between 0 and 1). The output of $l[\omega_l]$ is then rescaled via multiplication by $\bar q$ to obtain volumes between 0 and $\bar q$.

To obtain the optimal volumes of the market maker, we minimize the opposite of the Hamiltonian function defined by Equation (5.1). We generate $K$ values $(z_k,q_k)$ of $z$ and $q$, and conduct several epochs of batch learning with the following weights update:

$$
\omega_l \leftarrow \omega_l + \frac{\mu_l}{K}\sum_{k=1}^K\nabla_{\omega_l}l[\omega_l](z_k,q_k)\bigg(\nabla_l h_c\big(l[\omega_l](z_k,q_k),z_k,q_k\big) - \rho\Big(\big(q_k+l[\omega_l]_{b,l}+l[\omega_l]_{b,d}-\bar q\big)^+ + \big(q_k-l[\omega_l]_{a,l}-l[\omega_l]_{a,d}+\bar q\big)^-\Big)\bigg),
$$

where $\mu_l$ is the learning rate. The term scaled by $\rho$ is a penalty used to force the quotes to stay in $\mathcal A$, so that $l[\omega_l]_{i,l}+l[\omega_l]_{i,d}\in[0,\bar q]$, $i\in\{a,b\}$. In our computations we use $\rho = 0.[\dots]$

We denote by $l^\star[\omega_l]$ the approximated optimal response function of the market maker $\mathcal L^\star$ (the result of the above optimization procedure). In Figure 1, we see an example of the best response $l^\star[\omega_l]$ as a function of $z_{a,l} = -z_{b,l}$, when the market maker's inventory is $q = 50$ and the other incentives are $z_{a,d} = z_{b,d} = 0.05$ (close to zero). Note that this choice of incentives is arbitrary and only aims at reflecting the main properties of $l^\star[\omega_l]$.

Here we slightly abuse notation, denoting by $l[\omega_l]$ the response of the market maker obtained via the neural network parametrized by the weights $\omega_l$.

[Plot: axes $z_{a,l}$ and $z_{b,l}$, each ranging from −2 to 2.]

Figure 1: Best response of the market maker as a function of $z_{a,l}$ and $z_{b,l}$, with $q = 50$.

The observed behavior has a quite natural interpretation. The incentive $z_{a,l}$ is a remuneration of the market maker when his limit order is executed on the ask side of the lit pool. When this incentive increases, the market maker makes sure to keep a small imbalance on the ask side of the lit pool so that he can earn $z_{a,l}$. Because of his positive inventory, the volume posted on the ask side of the dark pool is higher than on the bid side of the dark pool: the market maker wants to liquidate his long position. Similarly, when the incentive $z_{b,l}$ increases, the market maker wants to benefit from it when transacting on the bid side of the lit pool. This explains the small imbalance on the bid side of the lit pool for positive $z_{b,l}$. Mathematically speaking, the function $h_c$ is increasing in $z_{a,l}$. Thus, for a high $z_{a,l}$, the value of the term $\mathcal E(z_{a,l},l^\star[\omega_l]_{a,l})$ in the Hamiltonian is high. To benefit from the remuneration $z_{a,l}$, the intensity $\lambda^{a,l}$ must be high, which implies a small imbalance on the ask side, hence $I^a$ should be small. The same holds for $z_{b,l}$.

For $q = 150$, $z_{b,l} = -z_{a,l}$, and other incentives $z_{a,d} = z_{b,d} = 0.05$ (close to zero), we display the volumes in Figure 2:

[Plot: axes $z_{a,l}$ and $z_{b,l}$, each ranging from −2 to 2.]
Figure 2: Best response of the market maker as a function of $z_{a,l}$ and $z_{b,l}$, with $q = 150$.

As the market maker has a higher inventory, his quotes on the bid side of both pools decrease because of the inventory risk. Moreover, his quotes on the ask side of both pools increase to liquidate his long position. For high incentives $z_{b,l}$, a small volume on the bid side of the lit pool leads to a low imbalance on the bid side, hence a high probability of execution for passive ask orders in the dark pool, where the market maker tries to liquidate his position. Note that for high $z_{a,l}$, the imbalance is approximately equal to one half, because the market maker does not want to suffer from the latency effect (to be executed at the mid-price in the dark pool).

We now move to the problem of the principal, namely Equation (4.15). A numerical approximation of the optimal incentives $z^\star$ can be obtained by

1. solving (numerically) the static maximization problem (5.1), which provides the approximation of the optimal response $\mathcal L^\star$ of the market maker,
2. plugging this approximation into the continuous (with respect to $\mathcal L^\star$) version of the Hamilton-Jacobi-Bellman equation (4.15), that is to say:

$$
\begin{cases}
\partial_t v(t,q) + H_c\big(q,\mathcal L^\star,v(t,\cdot)\big) = 0, & q\in[-\bar q,\bar q],\ t\in[0,T),\\
v(T,q) = -1,
\end{cases}
$$

with

$$
H_c\big(q,\mathcal L^\star,v(t,\cdot)\big) := \sup_{z\in\mathbb R^5}U_c\big(z,q,\mathcal L^\star(z,q),v(t,\cdot)\big),
$$

and, abusing notation with $\mathcal L^\star$ denoting $\mathcal L^\star(z,q)$,

$$
\begin{aligned}
U_c\big(z,q,\mathcal L^\star,v(t,\cdot)\big) := {} & v(t,q)\Big(\frac{\eta\sigma^2\gamma}{2}\big(z^{\tilde S}+q\big)^2+\frac{\eta^2\sigma^2}{2}\big(z^{\tilde S}\big)^2\Big)\\
& + \sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\lambda^{i,j}(\mathcal L^{\star,l})\bigg(\exp\big(\eta(z^{i,j}-c^j\ell^\star_{i,j})\big)\,v\big(t,q-\phi(i)\ell^\star_{i,j}\big) - v(t,q)\Big(1+\eta\,\mathcal E\big(z^{i,j},\ell^\star_{i,j}\big)\Big)\bigg).
\end{aligned}
$$

We obtain explicitly $z^{\tilde S} = -\frac{\gamma}{\gamma+\eta}\,q$, so we are only interested in finding the optimal $(z_{a,l},z_{b,l},z_{a,d},z_{b,d})$.
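The closed form for $z^{\tilde S}$ can be checked numerically. The sketch below assumes that the bracket multiplying $v(t,q)$ in $U_c$ is $\tfrac{\eta\gamma\sigma^2}{2}(z^{\tilde S}+q)^2+\tfrac{\eta^2\sigma^2}{2}(z^{\tilde S})^2$ and that $v<0$, so that the supremum over $z^{\tilde S}$ amounts to minimizing this bracket. The parameter values and the grid are hypothetical.

```python
def quadratic_weight(z_s, q, gamma, eta, sigma):
    """Bracket multiplying v(t, q) in U_c (assumed form, see lead-in)."""
    return (0.5 * eta * gamma * sigma ** 2 * (z_s + q) ** 2
            + 0.5 * eta ** 2 * sigma ** 2 * z_s ** 2)


def optimal_z_s(q, gamma, eta):
    """Closed-form optimizer z = -gamma / (gamma + eta) * q."""
    return -gamma / (gamma + eta) * q


# Hypothetical parameter values, for illustration only.
gamma, eta, sigma, q = 0.01, 1.0, 0.3, 50.0

z_closed_form = optimal_z_s(q, gamma, eta)

# Brute-force minimization on a regular grid over [-q, q].
n_points = 20001
step = 2.0 * q / (n_points - 1)
grid = [-q + step * i for i in range(n_points)]
z_grid = min(grid, key=lambda z: quadratic_weight(z, q, gamma, eta, sigma))
```

The grid minimizer agrees with the closed form up to the grid step, which is why only the four incentives $(z_{a,l},z_{b,l},z_{a,d},z_{b,d})$ need to be learned.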
The classical method to solve the above problem is to obtain an approximation of the value function via a finite difference scheme on a grid. Since the size of the grid increases exponentially with the number of dimensions, this approach is not feasible in high dimension. Therefore, to address our five-dimensional optimization problem, we resort to neural networks.

We use an algorithm known in the reinforcement learning literature as the actor-critic method. The core of this approach is the representation of the value function and of the optimal controls with deep neural networks. The learning procedure itself consists of two stages: value function update (also called critic update) and controls update (actor update).

We first split our problem into sub-problems corresponding to different time steps. We consider a time step $\Delta t$. The first-order approximation of the value function at time $t$ gives

$$
v(t,\cdot)\approx v(t+\Delta t,\cdot) - \partial_t v(t+\Delta t,\cdot).
$$

For each time $t$, we represent the value function and the incentives with neural networks. Our procedure is backward in time, and we start from $T-\Delta t$, recalling that $v(T,\cdot) = -1$. Let us fix $t\in[0,T-\Delta t]$. The value function at time $t$ is represented by $v_t[\omega^v_t](\cdot)$, a feedforward neural network parametrized by the weights $\omega^v_t$, which approximates the value function corresponding to the current set of incentives approximated by the neural network $z_t[\omega^z_t](\cdot)$, parametrized by $\omega^z_t$. The critic's network is composed of 2 hidden layers with 20 nodes each and ELU activation functions. The final layer of the network contains one output, and the activation is affine. The actor's network is composed of 2 hidden layers with 20 nodes each and ELU activation functions. The final layer of the network contains four outputs, and the activation is tanh (this allows the output to stay between −1 and 1, before rescaling by $\bar z$). The first step is the following update of the value function network's weights $\omega^v_t$:

$$
\omega^v_t \leftarrow \omega^v_t + \frac{\mu^v}{K}\sum_{k=1}^K\nabla_{\omega^v_t}v_t[\omega^v_t](q_k)\Big(v_{t+\Delta t}[\omega^v_{t+\Delta t}](q_k) + U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega_l],v_{t+\Delta t}[\omega^v_{t+\Delta t}](\cdot)\big) - v_t[\omega^v_t](q_k)\Big),
$$

where $\mu^v$ is a learning rate and $U_c(z_t[\omega^z_t](q_k),q_k,l^\star[\omega_l],v_{t+\Delta t}[\omega^v_{t+\Delta t}](\cdot))$ corresponds to the function under the supremum of the Hamiltonian (4.16), calculated using the current controls $z_t[\omega^z_t]$. The quantities $q_k$, $k\in\{1,\dots,K\}$, are the elements of the training set, more precisely $K$ uniformly distributed elements of the interval $[-\bar q,\bar q]$. Since $\partial_t v = -H_c$ by the HJB equation, we use $U_c(z_t[\omega^z_t](q_k),q_k,l^\star[\omega_l],v_{t+\Delta t}[\omega^v_{t+\Delta t}](\cdot))$ as an approximation of $-\partial_t v(t+\Delta t,\cdot)$ to apply the first-order approximation of the value function described above.

When the value function's neural network approximates the value function corresponding to the current control $z_t[\omega^z_t]$, we can move to the stage of optimization over control values (also called policy update in the reinforcement learning literature).
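The two architectures just described can be sketched as plain forward passes. This is an illustration under stated assumptions, not the authors' code: the weights are random rather than trained, the inventory is normalized by `Q_BAR` before entering the networks (stated in the text only for the best-response network), and the bounds `Q_BAR` and `Z_BAR` are hypothetical values.

```python
import math
import random


def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(inputs, weights, biases, activation):
    """One fully connected layer followed by an element-wise activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def make_network(sizes, rng):
    """Random (untrained) weights for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(network, inputs, output_activation):
    """ELU on hidden layers, output_activation on the final layer."""
    hidden = inputs
    for weights, biases in network[:-1]:
        hidden = dense(hidden, weights, biases, elu)
    weights, biases = network[-1]
    return dense(hidden, weights, biases, output_activation)


Q_BAR, Z_BAR = 300.0, 2.0  # hypothetical bounds on inventory and incentives
rng = random.Random(0)
critic = make_network([1, 20, 20, 1], rng)  # 2 hidden layers, 20 nodes, 1 output
actor = make_network([1, 20, 20, 4], rng)   # 2 hidden layers, 20 nodes, 4 outputs


def critic_value(q):
    """Affine (identity) output: approximation of v(t, q)."""
    return forward(critic, [q / Q_BAR], lambda x: x)[0]


def actor_incentives(q):
    """tanh output in (-1, 1), rescaled to [-Z_BAR, Z_BAR]."""
    return [Z_BAR * out for out in forward(actor, [q / Q_BAR], math.tanh)]
```

Calling `actor_incentives(50.0)` returns four values, one per incentive $(z_{a,l},z_{b,l},z_{a,d},z_{b,d})$, each bounded by `Z_BAR` in absolute value by construction of the tanh output layer.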
Our policy update consists of two different procedures. The first one is an exploitation phase, where the weights are updated according to the best direction suggested by the gradient of the function $U_c(z_t[\omega^z_t](q_k),q_k,l^\star[\omega_l],v_t[\omega^v_t](\cdot))$:

$$
\omega^z_t \leftarrow \omega^z_t + \frac{\mu^z}{K}\sum_{k=1}^K\nabla_{\omega^z_t}z_t[\omega^z_t](q_k)\,\nabla_{z_t}U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega_l],v_t[\omega^v_t](\cdot)\big),
$$

where $\mu^z$ is a learning rate. This type of update is usually called policy gradient.

The other type of update we use in the learning procedure is an exploration phase. During this phase, we take the current values given by the neural network of controls and add noise to them, in order to explore values slightly different from those proposed by the neural network. The noise is normally distributed around 0 with a standard deviation chosen beforehand (in the following examples, we use the standard normal distribution). This phase can help escape local optima, in case the algorithm is trapped in one. The following updates characterize this phase:

$$
\omega^z_t \leftarrow \omega^z_t + \frac{\hat\mu^z}{K}\sum_{k=1}^K\varepsilon_k\nabla_{\omega^z_t}z_t[\omega^z_t](q_k)\Big(U_c\big(z_t[\omega^z_t](q_k)+\varepsilon_k,q_k,l^\star[\omega_l],v_t[\omega^v_t](\cdot)\big) - U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega_l],v_t[\omega^v_t](\cdot)\big)\Big),
$$

where $\varepsilon$ is a vector of length $K$ representing the introduced perturbations and $\hat\mu^z$ is a learning rate.

5.2 Numerical Results

In the following, we consider $\Delta t = 1$. Since time has little impact on the quotes chosen by the market maker (see [10, 12]), we present the results only for time $T-1$; the extension to earlier time steps is straightforward. As mentioned before, the optimization problems considered are symmetric with respect to the inventory variable $q$.

First, we present a reference model without the intervention of the exchange. We consider the following parameters:

• Risk aversion of the market maker: $\gamma = 0.[\dots]$
• Market impacts: $\Gamma^l = 10^{-[\dots]}$, $\Gamma^d = 5\times[\dots]$;
• Influence of the imbalance on the orders arrival: $\theta^l = \theta^d = 0.[\dots]$
• Volatility: $\sigma = 0.[\dots]$
• Fees: $c^l = 0.[\dots]$, $c^d = 0.[\dots]$
• Order flow intensity parameters: $A^l = 5\times[\dots]$, $A^d = 3\times[\dots]$.

In Figure 3, we present the optimal quotes of the market maker.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]
Figure 3: Optimal quotes of the market maker.

One can see that the market maker splits his orders evenly between the lit and dark pools when his inventory is near zero. However, when he has a very positive (resp. negative) inventory, he has a large imbalance on the ask (resp. bid) side of the lit pool, to liquidate his position in the dark pool. Such behavior shows that the market maker uses the dark pool as a way to liquidate a large position by adjusting the imbalance in the lit pool. Indeed, when he posts a high volume on the ask side of the lit pool, he encourages ask orders in the dark pool. Thus, as he prioritizes the execution of a large ask order, he accepts being executed at the mid-price in the dark pool. When $q = 300$, he does not post a sell order of size 300 in the dark pool, because of the quadratic variation between the mid-price and his inventory process (which can be seen as a quadratic penalty in the market maker's PnL process with respect to the volumes displayed). Because of the latency generated on the ask side of the lit pool, the market takers sending market orders on the bid side of the dark pool are likely to be executed at an unfavorable price. This is why the market maker posts a non-zero volume on the bid side of the dark pool. Note also that for small inventories, the market maker posts volumes on both the ask and bid sides of the dark pool, because he may accept increasing his inventory risk in exchange for being executed at a more favorable price in the dark pool due to the latency effect (the volumes displayed in the lit pool lead to a 50 percent chance of facing this effect on at least one of the sides of the dark pool). Note that the parameters $A^l > A^d$ reflect the fact that there are, on average, many more orders in the lit pool than in the dark pool.

In the following sections, we present several numerical experiments involving the incentive policy of the exchange.
In this section, we present a reference model with the exchange. We take the same parameters as in the case without the exchange, and we set the exchange's risk aversion: $\eta = 0.[\dots]$

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 4: Optimal quotes of the market maker.

[Plot: incentives versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 5: Optimal incentives of the exchange.

The presence of incentives has significant effects on the market maker's behavior. When the market maker has an inventory near zero, incentives lead to an increase of the volumes posted in the lit pool and a decrease of those in the dark pool compared to Figure 3. Thus the exchange improves the liquidity in the lit venue. Moreover, the strategy of the market maker for very positive or negative inventory is modified. When he has a very positive inventory, he posts a higher volume on the ask side of the dark pool than in the case without the exchange. In addition, he posts equal volumes (small but not negligible) on the ask and bid sides of the lit pool. So we see that the exchange prevents the market maker from artificially manipulating the market by creating a high imbalance on the ask side. As the imbalance is around 1/2, the market maker does not take advantage of the latency effect.

This assumption is consistent with the MiFID II regulation rolled out on January 3, 2018, which imposes a cap on volumes traded in the dark pools.
In Figure 5, we see that, even though our problem is much more intricate than those of [3, 10], the principal's incentives are essentially linear functions of the market maker's inventory.
We now investigate the impact of higher volatility on the posted volumes with and without the exchange. We take $\sigma = 0.4$, the other parameters being as previously.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 6: Optimal quotes of the market maker without the exchange.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 7: Optimal quotes of the market maker with the exchange.

In Figure 6 we see that, compared to Figure 3, higher volatility does not significantly change the strategy of the market maker without the exchange. We observe that the contract has a more limited influence in the case of high volatility, as the market maker follows the same strategy as without the exchange. In particular, he does not keep his imbalance equal to 1/2.

Here we show the volumes posted by the market maker and the incentives of the exchange when the lit and dark pools share the same characteristics. We consider the following set of parameters:

• Risk aversion of the market maker and of the exchange respectively: $\gamma = 0.[\dots]$, $\eta = 0.[\dots]$
• Market impacts: $\Gamma^l = \Gamma^d = 10^{-[\dots]}$;
• Influence of the imbalance on the orders arrival: $\theta^l = \theta^d = 0.[\dots]$
• Volatility: $\sigma = 0.[\dots]$
• Fees: $c^l = c^d = 0.[\dots]$
• Order flow intensity parameters: $A^l = A^d = 5\times[\dots]$.

In Figures 8 and 9, we see that the split of volumes between the lit and dark pools has not changed significantly compared to the reference case with and without contract. The main difference is that, in the absence of the exchange, the market maker posts higher volumes in the lit pool than in the dark one when he has a small inventory. This happens because the dark pool does not provide lower market impact and transaction costs, contrary to the reference case. Keeping his imbalance near 1/2 […]

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]
Figure 8: Optimal quotes of the market maker without the exchange.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 9: Optimal quotes of the market maker with the exchange.
We now show the volumes displayed by the market maker with and without the exchange when the parameters of the dark pool make it more appealing than the lit pool. In particular, the market impact in the dark pool is five times smaller than in the lit pool. We consider the following set of parameters:

• Risk aversion of the market maker and of the exchange respectively: $\gamma = 0.[\dots]$, $\eta = 0.[\dots]$
• Market impacts: $\Gamma^l = 10^{-[\dots]}$, $\Gamma^d = 2\times[\dots]$;
• Influence of the imbalance on the orders arrival: $\theta^l = \theta^d = 0.[\dots]$
• Volatility: $\sigma = 0.[\dots]$
• Fees: $c^l = 0.[\dots]$, $c^d = 0.[\dots]$
• Order flow intensity parameters: $A^l = 5\times[\dots]$, $A^d = 3\times[\dots]$.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 10: Optimal quotes of the market maker without the exchange.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 11: Optimal quotes of the market maker with the exchange.

In Figures 10 and 11, we see the influence of a higher market impact and higher transaction costs in the lit pool. Both with and without the intervention of the exchange, and for small inventories, the market maker posts higher volumes in the dark pool than in the lit pool. We recover a behavior of the displayed volumes similar to the reference case with and without the exchange in Figures 3 and 4.
In this last section, we show how the volumes are split between the lit and dark pools when the market impacts in the lit and dark pools are equal. We consider the following set of parameters:

• Risk aversion of the market maker and of the exchange respectively: $\gamma = 0.[\dots]$, $\eta = 0.[\dots]$
• Market impacts: $\Gamma^l = \Gamma^d = 2.[\dots]\times[\dots]$;
• Influence of the imbalance on the orders arrival: $\theta^l = \theta^d = 0.[\dots]$
• Volatility: $\sigma = 0.[\dots]$
• Fees: $c^l = 0.[\dots]$, $c^d = 0.[\dots]$
• Order flow intensity parameters: $A^l = 5\times[\dots]$, $A^d = 3\times[\dots]$.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 12: Optimal quotes of the market maker without the exchange.

[Plot: optimal volumes versus inventory (−300 to 300); curves: ask lit, bid lit, ask dark, bid dark.]

Figure 13: Optimal quotes of the market maker with the exchange.

In Figures 12 and 13, we see that a higher market impact reduces the volume posted on both lit and dark pools. We also recover a behavior similar to the reference case without the exchange. For the case with the exchange, for very positive (resp. negative) inventory, the market maker has an ask (resp. bid) imbalance slightly above 1/2, meaning that market takers on the bid (resp. ask) side of the dark pool are more likely to be executed at a price unfavorable for them due to the latency effect.
Appendix
A.1 Dynamic programming principle and contract representation
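For any $\mathbb F$-stopping time $\tau\in[t,T]$ and $\mathcal L\in\mathcal A_\tau$, we define:

$$
J_T(\tau,\mathcal L) = \mathbb E^{\mathcal L}_\tau\Big[-D_{\tau,T}(\mathcal L)\exp(-\gamma\xi)\Big], \qquad V_\tau = \sup_{\mathcal L\in\mathcal A_\tau}J_T(\tau,\mathcal L),
$$

where $\mathcal A_\tau$ denotes the restriction of $\mathcal A$ to controls on $[\tau,T]$ and

$$
D_{\tau,T}(\mathcal L) := \exp\Bigg(-\gamma\bigg(\int_\tau^T\sum_{i\in\{a,b\}}\Big(\tfrac{\mathcal T}{2}\ell^{i,l}_t\,\mathrm dN^{i,l}_t + \sum_{\kappa\in K}\phi^{\mathrm{lat}}(\kappa)\tfrac{\mathcal T}{2}\ell^{i,d}_t\,\mathrm dN^{i,d,\kappa}_t\Big) + Q_t\,\mathrm dS_t + \mathrm d\langle Q_\cdot,S_\cdot\rangle_t\bigg)\Bigg),
$$

where

$$
\mathrm d\langle Q_\cdot,S_\cdot\rangle_t = -\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal V^j}\Gamma^j k^2\,\mathrm dN^{i,j,k}_t.
$$

We now state the dynamic programming principle associated with the control problem (4.1).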
Lemma A.1.
Let $t\in[0,T]$ and $\tau$ be an $\mathbb F$-stopping time with values in $[t,T]$. Then

$$
V_t = \operatorname*{ess\,sup}_{\mathcal L\in\mathcal A}\mathbb E^{\mathcal L}_t\Big[-D_{t,\tau}(\mathcal L)V_\tau\Big].
$$

The proof can be found in [10, Lemma A.4].
A.2 Proof of Theorem 2
To prove that C = Ξ, we proceed in six steps. Our approach is largely inspired by [10]. However, for the sake of completeness, we provide here the details.

Step 1:
For
L ∈ A it follows from the dynamic programming principle of Lemma A.1 that the process U L t = V t D ,t ( L )defines a P L -supermartingale for any L ∈ A . By standard analysis, we may then consider it in itscàdlàg version (by taking right limits along rationals). By the Doob-Meyer decomposition, we canwrite U L t = M L t − A L t where M L is a P L -martingale and A L t = A L ,ct + A L ,dt is an integrable non-decreasing predictable process such that A L ,c = A L ,d = 0 with pathwise continuous component A L ,c and with A L ,d a piecewise constant predictable process.From the martingale representation theorem under P L , see Appendix A.1 in [10], there exists ˜ Z L =( ˜ Z L ,S , ˜ Z L ,i,j,k ) i ∈{ a,b } ,j ∈{ l,d } ,k ∈V j predictable, such that M L t = V + Z t ˜ Z L ,Sr d ˜ S r + X i ∈{ a,b } j ∈{ l,d } X k ∈V j Z t ˜ Z L ,i,jr d N L ,i,jr . tep 2: We now show that V is a negative process. Thanks to the uniform boundedness of L ∈ A and I a , I b ∈ [0 ,
1] we get that L L T L L t ≥ α t,T = exp − X j ∈{ l,d } θ j σ ( N a,jT − N a,jt + N b,jT − N b,jt ) − A j − ǫ )( T − t ) ! . Therefore using the definition of D t,T ( L ), we obtain V t ≤ E t (cid:20) − α t,T exp (cid:18) − γ (cid:16) T − Γ l − Γ d ) q (cid:16) X i ∈{ a,b } X j ∈{ l,d } N i,jT − N i,jt (cid:17) + Z Tt Q u d ˜ S u (cid:17)(cid:19) exp( − γξ ) (cid:21) < . Step 3:
Let Y be the process defined for any t ∈ [0 , T ] by V t = − exp( − γY t ). As A L ,d is a predictablepoint process and the jumps of N i,j,k , i ∈ { a, b } , j ∈ { l, d } , k ∈ V j are totally inaccessible stoppingtimes under P , we have D N i,j,k , A L ,d E t = 0 a.s. We obtain Y T = ξ and d Y t = (cid:18) X i ∈{ a,b } j ∈{ l,d } X k ∈V j Z i,j,kt d N i,j,kt (cid:19) + Z ˜ St d ˜ S t − dI t − d ˜ A dt . Ito’s formula yields to Z a,l,kt := − γ log (cid:18) Z L ,a,l,kt U L t − (cid:19) − ℓ a,lt (cid:18) T l Q t − (cid:19) + Γ l k ,Z b,l,kt := − γ log (cid:18) Z L ,b,l,kt U L t − (cid:19) − ℓ b,lt (cid:18) T − Γ l Q t − (cid:19) + Γ l k ,Z a,d,kt := − γ log (cid:18) Z L ,a,d,kt U L t − (cid:19) − ℓ a,dt (cid:18) T ν at =0 + Γ d Q t − (cid:19) + Γ d k ,Z b,d,kt := − γ log (cid:18) Z L ,b,d,kt U L t − (cid:19) − ℓ b,dt (cid:18) T ν bt =0 − Γ d Q t − (cid:19) + Γ d k ,Z ˜ St := − ˜ Z L ,St γU L t − − Q t − ,I t := Z t (cid:18) h ( L r , Z r , Q r )d r − γU L r dA L ,cr (cid:19) ,h ( L , Z t , Q t ) := h ( L , Z t , Q t ) − γσ ( Z ˜ St ) , ˜ A dt := 1 γ X s ≤ t log (cid:18) − ∆ A L ,dt U L t − (cid:19) . In particular, the last relation between ˜ A d and A L ,d shows that ∆ a t ≥ L ∈ A ,with a t = − A L ,dt U L t − and abusing notations slightly, ∆ a t = − ∆ A L ,dt U L t − .In order to complete the proof, we argue in the subsequent steps that Z ∈ Z and that, for t ∈ [0 , T ], A L ,dt = − P s ≤ t U L s − ∆ a s = 0 so that ˜ A dt = 0 and I t = R t H ( Z r , Q r )d r where26 ( Z t , Q t ) = H ( Z t , Q t ) − γσ ( Z ˜ St ) . Step 4:
Since V T = −
1, we get that0 = sup
L∈A E L [ U L T ] − V = sup L∈A E L [ U L T − M L T ]= γ sup L∈A E (cid:20) L L T Z T U L r − (d I r − h ( L , Z r , Q r )d r + d a r γ ) (cid:21) . Moreover, the controls being uniformly bounded, we have U L t ≤ − β t := V t exp (cid:18) − γ (cid:16) T − Γ l − Γ d ) q ( X i ∈{ a,b } X j ∈{ l,d } N i,jt ) + Z t Q r d ˜ S r (cid:17)(cid:19) < . Then, using A L ,d ≥ , U L ≤ I t − h ( L , Z t , Q t )d t ≥
0, obtain0 ≤ sup L∈A E (cid:20) α ,T Z T − β r − (cid:16) d I r − h ( L , Z r , Q r )d r + d a r γ (cid:17)(cid:21) = − E (cid:20) α ,T Z T β r − (cid:16) d I r − H ( Z r , Q r )d r + d a r γ (cid:17)(cid:21) . The quantities α ,T R T β r − ( dI r − H ( Z r , Q r ))d r and α ,T R T β r − da r γ being non-negative random vari-ables, the result follows.Moreover, if L is such that for any ( z, q ) ∈ R V l + V d ) × N we have h ( L , z, q ) = H ( z, q ), then Z T U L r − (cid:16) dI r − h ( L , Z r , Q r ) (cid:17) d r = 0 . Therefore, sup
L∈A E L [ U L T ] = V which implies that (OC) is satisfied. Conversely, if (OC) is satisfied,the supremum is directly attained. This provides the inclusion C ⊃ Ξ. Step 5: As ξ satisfies Conditions (4.2) and (4.3), to prove that Z ∈ Z it is enough to show that forsome p > L∈A sup t ∈ [0 ,T ] E L (cid:20) exp (cid:16) − γ ( p + 1) Y t (cid:17)(cid:21) < + ∞ . Using Hölder inequality together with the boundedness of the intensities of the N i,j,k , we have thatsup L∈A E L [ | U L T | p ′ +1 ] < + ∞ for some p ′ >
0. Thussup
L∈A sup t ∈ [0 ,T ] E L [ | U L t | p ′ +1 ] = sup L∈A E L [ | U L T | p ′ +1 ] < + ∞ , U L is a P L -negative supermartingale. The conclusion follows using again Hölder inequality,the uniform boundedness of the intensities of the N i,j and the fact thatexp( − γY t ) = U L t exp γ (cid:18) Z t X i ∈{ a,b } (cid:18) T ℓ i,lu d N i,lu + X κ ∈ K φ lat ( κ ) ℓ i,du d N i,d,κu (cid:19) + Q u d S u + d h Q · , S · i u (cid:19)! . Consequently,
C ⊂ Ξ and, using Step 4, we finally get C = Ξ.

Step 6:
We prove here uniqueness of the representation. Let ( Y , Z ) , ( Y ′ , Z ′ ) ∈ R × Z be such that ξ = Y Y ,ZT = Y Y ′ ,Z ′ T . By following the lines of the verification argument in second part of the proof ofthe theorem, we obtain the equality Y Y ,Zt = Y Y ′ ,Z ′ t using the fact that the value of the continuationutility of the market maker satisfies − exp( − γY Y ,Zt ) = − exp( − γY Y ′ ,Z ′ t ) = ess sup L∈A E L t (cid:20) − exp (cid:16) − γ ( P L L T − P L L t + ξ ) (cid:17)(cid:21) . This in turn implies that Z i,j,kt d N i,j,kt = Z ′ i,j,kt d N i,j,kt and Z ˜ St σ d t = Z ′ ,St σ d t = d h Y, S i t , t ∈ [0 , T ].Thus ( Y , Z ) = ( Y ′ , Z ′ ).We now prove the second part of Theorem 2. Let ξ = Y Y ,ZT with ( Y , Z ) ∈ R × Z . We first showthat for an arbitrary set of controls L ∈ A we have J MM0 ( L , ξ ) ≤ − exp( − γY ) where we recall that J MM0 ( L , ξ ) is such that V MM0 ( ξ ) = sup L∈A J MM0 ( L , ξ ). Then we will see that this inequality is in factan equality when the corresponding Hamiltonian h ( L , z, q ) is maximized. Let us write Y t := Y Y ,Zt + Z t T ℓ a,lu d N a,lu + Z t T ℓ b,lu d N b,lu + Z t Q u d S u + d h Q · , S · i u + Z t T ℓ a,du d N a,d, lat u + Z t T ℓ b,du d N b,d, lat u , with t ∈ [0 , T ]. An application of Ito’s formula leads tod (cid:16) exp( − γY t ) (cid:17) = γ exp( − γY t − ) − ( Q t + Z ˜ St )d ˜ S t + ( H ( Z t , Q t ) − h ( L , Z t , Q t ))d t − X ( k l ,k d ) ∈V l ×V d X i ∈{ a,b } γ − − exp (cid:18) − γ (cid:16) Z i,l,k l t + ℓ i,lt ( T l ( φ ( i ) Q t − − ℓ i,lt )) (cid:17)(cid:19)! d N L ,i,l,k l t − X κ ∈ K − exp (cid:18) − γ (cid:16) Z i,d,k d t + ℓ i,dt (cid:16) T φ lat ( κ )+Γ d ( φ ( i ) Q t − − ℓ i,dt ) (cid:17)(cid:17)(cid:19)! φ dt ( i, κ )d N L ,i,d,k d t . Therefore exp( − γY . ) is a P L -local submartingale. 
Thanks to Condition (4.7), the uniform boundedness of the intensities of the $N^{i,j,k}$, $i\in\{a,b\}$, $j\in\{l,d\}$, $k\in\mathcal{V}^j$, and Hölder's inequality, $\exp(-\gamma Y_\cdot)$ is uniformly integrable and hence a true submartingale. The Doob-Meyer decomposition then shows that the local martingale part of the dynamics of $\exp(-\gamma Y_\cdot)$ displayed above, that is, the integral of all the $d\tilde S_t$ and $dN^{\mathcal{L},i,j,k}_t$ terms (everything except the $(H-h)\,dt$ drift), is a true martingale. Thus
\[
J_0^{\mathrm{MM}}(\mathcal{L},\xi)=\mathbb{E}^{\mathcal{L}}\big[-\exp(-\gamma Y_T)\big]
=-\exp(-\gamma Y_0)-\mathbb{E}^{\mathcal{L}}\Big[\int_0^T \gamma\exp(-\gamma Y_{t^-})\big(H(Z_t,Q_t)-h(\mathcal{L},Z_t,Q_t)\big)\,dt\Big]
\le -\exp(-\gamma Y_0).
\]
Moreover, this inequality becomes an equality if and only if $\mathcal{L}$ is chosen as the maximizer of the Hamiltonian $h$. In that case, $J_0^{\mathrm{MM}}(\mathcal{L},\xi)=-\exp(-\gamma Y_0)$. Finally, we have $V_0^{\mathrm{MM}}(\xi)=-\exp(-\gamma Y_0)$, with optimal response $(\mathcal{L}^\star_t)_{t\in[0,T]}$ defined by (OC).

A.3 Proof of Theorem 3
We recall that, by [4, Corollary 1.4.2], the PDE (4.15) admits a unique continuous viscosity solution, denoted by $v$.

Let $(t_0,\tilde s_0,\bar n_0,n_0,y_0)\in D$, where $D=[0,T]\times\mathbb{R}\times\mathbb{N}^{2(\#\mathcal{V}^l+\#\mathcal{V}^d)}\times\mathbb{N}^{2(\#\mathcal{V}^l+\#\mathcal{V}^d)}\times\mathbb{R}$. We consider a test function $\Phi:D\to\mathbb{R}$, continuously differentiable in time, twice continuously differentiable with respect to $s$ and $y$, and continuous with respect to $\bar n$ and $n$, such that
\[
0=u(t_0,\tilde s_0,\bar n_0,n_0,y_0)-\Phi(t_0,\tilde s_0,\bar n_0,n_0,y_0)
=\max_{(t,\tilde s,\bar n,n,y)\in D}\exp\Big(-\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j}c^j\bar n^{i,j,k}-y\Big)\Big)\Big(v(t,q)-\Phi(t,\tilde s,\bar n,n,y)\exp\Big(\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j}c^j\bar n^{i,j,k}-y\Big)\Big)\Big).
\]
Therefore, for all $(t,\tilde s,\bar n,n,y)\in D$,
\[
0\ge v(t,q)-\Phi(t,\tilde s,\bar n,n,y)\exp\Big(\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j}c^j\bar n^{i,j,k}-y\Big)\Big),
\]
with equality at $(t_0,\tilde s_0,\bar n_0,n_0,y_0)$. Thus
\[
0=v(t_0,q_0)-\Psi(t_0,\tilde s_0,\bar n_0,n_0,y_0)=\max_{(t,\bar n)\in D}\big(v(t,q)-\Psi(t,\bar n)\big),
\]
where
\[
\Psi(t,\bar n):=\Phi(t,\tilde s_0,\bar n,n_0,y_0)\exp\Big(\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j}c^j\bar n^{i,j,k}-y_0\Big)\Big).
\]
As $v$ is the unique viscosity solution of (4.15), it is in particular a subsolution. Thus, for any $z\in\mathbb{R}^{2(\#\mathcal{V}^l+\#\mathcal{V}^d)+1}$, $\Psi$ satisfies
\[
0\ge\partial_t\Psi(t_0,\bar n_0)+U\big(z,q_0,\mathcal{L}^\star(z,q_0),\Psi(t_0,\cdot)\big),
\]
with $q_0:=\sum_{j\in\{l,d\}}\sum_{k\in\mathcal{V}^j}(\bar n_0^{b,j,k}-\bar n_0^{a,j,k})$, where $U$ is defined by (4.16) and $\mathcal{L}^\star$ is defined in Theorem 2. After computations, we deduce that
\[
\begin{aligned}
0\ge{}&\partial_t\Psi(t_0,\bar n_0)+\Psi(t_0,\bar n_0)\Big(\eta\sigma\gamma\big(z^{\tilde S}+q_0\big)+\eta\sigma\big(z^{\tilde S}\big)\Big)\\
&+\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j}\lambda^{\mathcal{L}^\star,i,j,k}\times\Big(\exp\big(\eta(z^{i,j,k}-kc^j)\big)\,\Phi\big(t_0,\tilde s_0,\bar n_0^{i,j,k}+k,\bar n_0^{-(i,j,k)},n_0,y_0\big)\exp\Big(\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j}c^j\bar n_0^{i,j,k}-y_0\Big)\Big)\\
&\qquad-\Psi(t_0,\tilde s_0,\bar n_0,n_0,y_0)\big(\eta E(z^{i,j,k},\ell^{\star i,j}(z,q_0))\big)\Big).
\end{aligned}
\]
Dividing both sides of the inequality by $\exp\Big(\eta\Big(\sum_{i\in\{a,b\},\,j\in\{l,d\}}\sum_{k\in\mathcal{V}^j}c^j\bar n_0^{i,j,k}-y_0\Big)\Big)>0$, we obtain
\[
\begin{aligned}
0\ge{}&\partial_t\Phi(t_0,\tilde s_0,\bar n_0,n_0,y_0)+\Phi_0\Big(\eta\sigma\gamma\big(z^{\tilde S}+q_0\big)+\eta\sigma\big(z^{\tilde S}\big)\Big)\\
&+\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j}\lambda^{\mathcal{L}^\star,i,j,k}\times\Big(\exp\big(\eta(z^{i,j,k}-kc^j)\big)\,\Phi\big(t_0,\tilde s_0,\bar n_0^{i,j,k}+k,\bar n_0^{-(i,j,k)},n_0,y_0\big)-\Phi_0\big(\eta E(z^{i,j,k},\ell^{\star i,j}(z,q_0))\big)\Big),
\end{aligned}
\]
where $\Phi_0:=\Phi(t_0,\tilde s_0,\bar n_0,n_0,y_0)$. Therefore, $u$ is a viscosity subsolution of (4.13). A similar argument shows that $u$ is also a viscosity supersolution of (4.13). Consequently, $u$ is a viscosity solution of (4.13). The uniqueness of $u$ follows from an application of [16, Theorem II.3], together with the continuity of $v$. Thus, we deduce that $v_0^E=u(0,\tilde S_0,\bar N_0,N_0,Y_0)=v(0,Q_0)$.

References

[1] M. Avellaneda and S. Stoikov. High-frequency trading in a limit order book.
Quantitative Finance, 8(3):217–224, 2008.
[2] A. Bachouch, C. Huré, N. Langrené, and H. Pham. Deep neural networks algorithms for stochastic control problems on finite horizon, part 2: numerical applications. arXiv preprint arXiv:1812.05916, 2018.
[3] B. Baldacci, D. Possamaï, and M. Rosenbaum. Optimal make take fees in a multi market maker environment. arXiv preprint arXiv:1907.11053, 2019.
[4] B. Bouchard. Introduction to stochastic control of mixed diffusion processes, viscosity solutions and applications in finance and insurance. Lecture Notes Preprint, 2007.
[5] J.-P. Bouchaud. Price impact. Encyclopedia of Quantitative Finance, 2010.
[6] A. Cartea, S. Jaimungal, and J. Penalva. Algorithmic and high-frequency trading. Cambridge University Press, 2015.
[7] A. Cartea, S. Jaimungal, and J. Ricci. Buy low, sell high: A high frequency trading perspective. SIAM Journal on Financial Mathematics, 5(1):415–444, 2014.
[8] J. Cvitanić, D. Possamaï, and N. Touzi. Moral hazard in dynamic risk management. Management Science, 63(10):3328–3346, 2016.
[9] J. Cvitanić, D. Possamaï, and N. Touzi. Dynamic programming approach to principal–agent problems. Finance and Stochastics, 22(1):1–37, 2018.
[10] O. El Euch, T. Mastrolia, M. Rosenbaum, and N. Touzi. Optimal make-take fees for market making regulation. 2018.
[11] O. Guéant. The Financial Mathematics of Market Liquidity: From optimal execution to market making. Chapman and Hall/CRC, 2016.
[12] O. Guéant, C.-A. Lehalle, and J. Fernandez-Tapia. Dealing with the inventory risk: a solution to the market making problem. Mathematics and Financial Economics, 7(4):477–507, 2013.
[13] J. Han, A. Jentzen, and E. Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
[14] N. Langrené, C. Huré, H. Pham, and A. Bachouch. Algorithmes probabilistes pour les équations de Hamilton-Jacobi-Bellman en dimension élevée.
[15] S. Laruelle, C.-A. Lehalle, and G. Pagès. Optimal split of orders across liquidity pools: a stochastic algorithm approach. SIAM Journal on Financial Mathematics, 2(1):1042–1076, 2011.
[16] P.-L. Lions. Hamilton-Jacobi-Bellman equations and the optimal control of stochastic systems. In Proceedings of the International Congress of Mathematicians, volume 1, page 2, 1983.
[17] Y. Sannikov. A continuous-time version of the principal-agent problem. The Review of Economic Studies, 75(3):957–984, 2008.
[18] A. Sokol. Optimal Novikov-type criteria for local martingales with jumps. Electronic Communications in Probability, 18, 2013.
[19] B. Toth, Z. Eisler, and J.-P. Bouchaud. The short-term price impact of trades is universal.