Market making and incentives design in the presence of a dark pool: a deep reinforcement learning approach

Abstract

We consider the issue of a market maker acting at the same time in the lit and dark pools of an exchange. The exchange wishes to establish a suitable make-take fees policy to attract transactions on its venues. We first solve the stochastic control problem of the market maker without the intervention of the exchange. Then we derive the equations defining the optimal contract to be set between the market maker and the exchange. This contract depends on the trading flows generated by the market maker's activity on the two venues. In both cases, we show existence and uniqueness, in the viscosity sense, of the solutions of the Hamilton-Jacobi-Bellman equations associated to the market maker and exchange's problems. We finally design deep reinforcement learning algorithms enabling us to approximate efficiently the optimal controls of the market maker and the optimal incentives to be provided by the exchange.


Bastien Baldacci†, Iuliia Manziuk‡, Thibaut Mastrolia§, Mathieu Rosenbaum¶

December 4, 2019

Keywords:

Market making, dark pools, regulation, make-take fees, stochastic control, principal-agent problem, deep reinforcement learning, actor-critic method

∗ This work benefits from the financial support of the Chaires Analytics and Models for Regulation, Financial Risk, and Finance and Sustainable Development. The authors would like to thank Charles-Albert Lehalle for fruitful discussions and remarks. Bastien Baldacci and Mathieu Rosenbaum gratefully acknowledge the financial support of the ERC Grant 679836 Staqamof. Thibaut Mastrolia gratefully acknowledges the support of the ANR project PACMAN ANR-16-CE05-0027.
† École Polytechnique, CMAP, 91128 Palaiseau, France.
‡ Université Paris 1 Panthéon-Sorbonne, Centre d'Economie de la Sorbonne, 106 boulevard de l'Hôpital, 75013 Paris, France.
§ École Polytechnique, CMAP, 91128 Palaiseau, France.
¶ École Polytechnique, CMAP, 91128 Palaiseau, France.

Since the seminal work [1], a vast literature on optimal market making problems has emerged. A market maker is a liquidity provider whose role is to post orders on the bid and ask sides of the limit order book of an underlying asset. Various extensions of [1] have been considered, see for example [7, 12] and the books [6, 11] for further references. In most of these works, it is assumed that there is no make-take fees system on the market. The problem of relevant make-take fees is studied quantitatively in [3, 10]. In these papers, the policies are designed in the context of traditional liquidity venues, or so-called "lit pools". On these venues, the order book is visible to market participants, and transactions are fully transparent. Market takers can in particular monitor the quotes offered by market makers.

However, recent regulatory changes have induced a rise of different types of alternative trading mechanisms, notably "dark pools", which have gained a significant market share. Nowadays, many major exchanges, such as Bats-ChiX and Turquoise, have their dark pools in addition to their major trading platforms. Furthermore, several traditional exchanges such as NYSE and Euronext offer trading platforms whose functioning is inspired mainly by dark pools. Trading rules for dark pools are very diversified, but they share at least two important properties. The first one is the absence of a visible order book for market participants, which implies that investors have no information on the amount of liquidity posted by market makers. Second, aiming at improving prices for clients compared to the lit venue, dark pools usually set prices that are different from those in the lit pool. For example, many dark pools take the mid-price of the lit pool as their transaction price. Because of these two effects, it is presumed that trades in dark pools have no or less price impact (note, however, that the transaction reporting imposed by regulation in most markets may still induce some delayed price impact). This feature enables market makers to mitigate their inventory risk. Finally, a remarkable phenomenon is that dark pools are prone to a latency effect: the price being monitored in the lit pool can change between the time of a request in the dark pool and that of the corresponding transaction. Such price discrepancy due to latency is particularly frequent in the presence of high imbalance because the price is likely to move when liquidity is scarce on one side of the book.

As the market impact of trades on a dark pool is less important or delayed, market makers can also use it to liquidate large positions. Therefore there is a trade-off between transacting in the dark pool at a worse price with low impact or in the lit pool at a better price with higher impact. Dark pools are also very attractive for market takers because of the reduced market impact and the possibility to be executed at a better price than in the lit pool.

To our knowledge, most studies treat the issue of trading in dark pools mainly from the point of view of optimal liquidation: a trader wishing to buy or sell a large number of shares of one or several stocks needs to find an optimal order placement strategy between the lit and dark pools, see for example [15]. In this paper, we rather focus on the behavior of a market maker acting on both lit and dark venues. In the lit market, we assume that there is an efficient price $S_t$ and that the market maker always posts volumes on the bid and ask sides at prices $S_t - \mathcal{T}$ and $S_t + \mathcal{T}$, where $\mathcal{T}$ represents the half-tick of the market (we have in mind here a large tick asset for which the spread equals the tick size). The market maker also provides liquidity in the dark pool, where the transaction price is the efficient price $S_t$ (possibly with the latency effect). This can partially be seen as the dual problem of [10], without dark pools, where the posted volume is fixed at one unit and the market maker optimizes the quoted spread. In addition to market impact and latency phenomena, we also take into account transaction costs for market orders on both venues, which can be smaller in the dark pool. Thus, in our setting, a single market maker only needs to select the volumes to post on the bid and ask sides of both lit and dark pools.

An exchange managing the lit and dark venues wishes to attract transactions. Inspired by the work [10], we consider that the exchange offers a contract to the market maker whose remuneration at a terminal time is determined according to the executed transactions on both venues. This is a so-called principal-agent framework, first formalized in [8, 9, 17]. Here, the wealth of the exchange (the principal) depends on the market order flows, which are a function of the volumes posted by the market maker (the agent). However, the exchange cannot control those volumes and may only provide incentives to influence the market maker's behavior. These incentives take the form of a contract between the market maker and the exchange, whose payoff depends on observed trading flows.

To find an optimal contract and optimal volumes for the market maker in response to this contract, we need to solve a nonlinear Hamilton-Jacobi-Bellman (HJB for short) equation. The dimensionality (above four) and complexity of the resulting equations do not allow us to apply classical root-finding algorithms. Therefore we use a method based on neural networks to solve our HJB equations. Neural networks have been at the core of recent studies on high-dimensional PDE resolution. In [13], the authors introduce a deep learning-based methodology that can handle general high-dimensional parabolic PDEs. This approach relies on the reformulation of PDEs via Backward Stochastic Differential Equations, where neural networks approximate the gradients of the unknown solution. Since then, many extensions have been proposed, see for example [2, 14].

In our setting, the market maker has to fix volumes in response to the incentives of the exchange. These volumes are functions of the incentives (and of the market maker's inventory), which are the solution of a nonlinear equation. The resolution of our principal-agent problem consists of two stages. The first stage is to represent the volumes posted by the market maker by a neural network. Taking into account the optimal response of the market maker to given incentives, the exchange needs to choose the contract maximizing its utility. So the second stage is to solve an HJB equation to obtain the optimal contract. However, the dimensionality and the high degree of nonlinearity of this equation make standard numerical methods hard to apply. We circumvent this difficulty by adopting a reinforcement learning method. More precisely, we use an actor-critic approach where not only the controls of the exchange, but also its value function, are represented by neural networks. The essence of this method is the alternation of the learning phases of the controls and of the value function.

The paper is organized as follows. Market dynamics are introduced in Section 2. In Section 3, we first investigate the problem of a market maker acting on both lit and dark venues without any incentive policy from the exchange. His goal is to maximize his PnL process while managing his inventory risk. It is a stochastic control problem, where the corresponding HJB equation cannot be solved explicitly. We show existence and uniqueness of a viscosity solution for this equation.

In Section 4, we analyze the bi-level optimization problem associated with the issue of optimal contracting between the market maker and the exchange owning both lit and dark pools. Following recent works on make-take fees policies mentioned above, we first prove a representation theorem for the contract proposed to the market maker. We then establish existence and uniqueness of a viscosity solution for the HJB equation corresponding to the problem of the exchange.

A key difference with [3, 10] is the absence of a closed-form solution for the best response of the market maker to a given contract. Therefore, the HJB equation of the exchange cannot be solved explicitly. In Section 5, we introduce a deep reinforcement learning method as a computational tool enabling us to address both the exchange's and the market maker's problems in practice. We conclude this section with numerical experiments, illustrating various behaviors of the market maker under different market scenarios.

The framework considered throughout this paper is inspired by the article [1], in which the authors investigate the problem of optimal market making without intervention of an exchange. Let $T > 0$ be a finite time horizon and $\mathcal{V}^l, \mathcal{V}^d \subset \mathbb{N}$ the sets of possible values for volumes in the lit and dark pools, of cardinality $\#\mathcal{V}^l, \#\mathcal{V}^d$. We define $\Omega := \Omega_c \times \Omega_d^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)}$, with $\Omega_c$ the set of continuous functions from $[0,T]$ into $\mathbb{R}$ and $\Omega_d$ the set of piecewise constant càdlàg functions from $[0,T]$ into $\mathbb{N}$. The space $\Omega$ is a subspace of the Skorokhod space $\mathcal{D}([0,T], \mathbb{R}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)+1})$ of càdlàg functions from $[0,T]$ into $\mathbb{R}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)+1}$, and we write $\mathcal{F}$ for the trace Borel $\sigma$-algebra on $\Omega$, where the topology is the one associated with the usual Skorokhod distance on $\mathcal{D}([0,T], \mathbb{R}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)+1})$.

We define $(X_t)_{t\in[0,T]} := (W_t, N_t^{i,j,k})_{t\in[0,T],\, i\in\{a,b\},\, j\in\{l,d\},\, k\in\mathcal{V}^j}$ as the canonical process on $\Omega$, that is, for any $\omega := (w, n^{i,j,k}) \in \Omega$,

$$W_t(\omega) := w(t), \qquad N_t^{i,j,k}(\omega) := n^{i,j,k}(t), \qquad i\in\{a,b\},\ j\in\{l,d\},\ k\in\mathcal{V}^j.$$

For any $i\in\{a,b\}$, $j\in\{l,d\}$ and $k\in\mathcal{V}^j$, $N_t^{i,j,k}$ denotes the total number of trades of size $k$ made between time $0$ and time $t$, where $a$, $b$ stand for the ask and bid side respectively and $l$, $d$ for the lit and dark pools respectively. Finally, the process $W$ represents the mid-price of the traded asset.

Then we define the probability $\mathbb{P}$ on $(\Omega,\mathcal{F})$ under which $W_t$ and the $N_t^{i,j,k}$ are independent, $W_t$ is a one-dimensional Brownian motion and the $N_t^{i,j,k}$, $i\in\{a,b\}$, $j\in\{l,d\}$, $k\in\mathcal{V}^j$, are Poisson processes with intensity $\epsilon > 0$. In other words, $\mathbb{P}$ is the product measure of the Wiener measure on $\Omega_c$ and the unique measure on $\Omega_d^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)}$ such that the canonical process corresponds to a multidimensional homogeneous Poisson process with arbitrarily small intensity, representing a situation where no liquidity is available. Finally, we endow the space $(\Omega,\mathcal{F})$ with the ($\mathbb{P}$-completed) canonical filtration $\mathbb{F} := (\mathcal{F}_t)_{t\in[0,T]}$ generated by $(X_t)_{t\in[0,T]}$.

In this section, we formalize the connection between the volumes posted by the market maker and the arrival intensity of market orders on the ask and bid sides of both venues. We also take into account the market impact phenomenon and the latency effect in the dark pool.

Let $2\bar q \in \mathbb{N}$ represent a risk limit for the market maker, which corresponds to the maximum number of cumulated bid and ask orders the market maker can handle. We define the volume process

$$(\mathcal{L}_t)_{t\in[0,T]} := (\mathcal{L}^l_t, \mathcal{L}^d_t)_{t\in[0,T]} \in (\mathcal{V}^l)^2 \times (\mathcal{V}^d)^2,$$

where $\mathcal{L}^l_t = (\ell^{a,l}_t, \ell^{b,l}_t)$ and $\mathcal{L}^d_t = (\ell^{a,d}_t, \ell^{b,d}_t)$, with $\ell^{i,j}_t$ corresponding to the volume posted by the market maker at time $t$ on side $i\in\{a,b\}$ of pool $j\in\{l,d\}$. The set $\mathcal{A}$ of admissible controls of the market maker is therefore defined as

$$\mathcal{A} := \Big\{ (\mathcal{L}_t)_{t\in[0,T]} \text{ predictable, s.t. for } i\in\{a,b\},\ \ell^{i,l} + \ell^{i,d} \in [0, \bar q] \Big\}.$$

The market maker manages his inventory $Q_t$, defined as the aggregated sum of the volumes filled on both sides of the lit and dark pools, namely

$$Q_t := \sum_{j\in\{l,d\}}\ \sum_{(k^{a,j},k^{b,j})\in(\mathcal{V}^j)^2} k^{b,j} N_t^{b,j,k} - k^{a,j} N_t^{a,j,k}.$$

Remark 2.1. Note that we assume that there is no partial execution in our model. Therefore market orders consume the whole volume posted by the market maker on the considered side and pool.
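The inventory bookkeeping described above can be illustrated with a short sketch. It is not taken from the paper: the dictionary layout, the function names `inventory` and `is_admissible`, and the sample numbers are assumptions made for illustration. The sketch uses the form $Q_t = \sum_j \sum_k k\,(N_t^{b,j,k} - N_t^{a,j,k})$ and, in line with Remark 2.1, treats every fill as consuming the full posted size. The per-side cap is taken to be $\bar q$, which is also an assumption about the exact bound.

```python
from typing import Dict, Tuple

# Hypothetical layout: (side, pool, size) -> number of fills so far,
# with side in {"a", "b"} and pool in {"l", "d"}.
FillCounts = Dict[Tuple[str, str, int], int]
# Hypothetical layout: (side, pool) -> posted volume.
Volumes = Dict[Tuple[str, str], int]


def inventory(counts: FillCounts) -> int:
    """Inventory implied by the fill counts: bid fills add, ask fills remove."""
    q = 0
    for (side, _pool, size), n_fills in counts.items():
        sign = 1 if side == "b" else -1
        q += sign * size * n_fills
    return q


def is_admissible(volumes: Volumes, q_bar: int) -> bool:
    """Check that lit plus dark volume on each side lies in [0, q_bar]."""
    for side in ("a", "b"):
        total = volumes[(side, "l")] + volumes[(side, "d")]
        if not 0 <= total <= q_bar:
            return False
    return True
```

For instance, three bid fills of size 2 in the lit pool and four ask fills of size 1 in the dark pool leave the market maker long 2 units.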

We define the function

$$\psi^{i,j}(\mathcal{L}^l_t) := \begin{cases} I^a(\mathcal{L}^l_t) & \text{if } (i,j)\in\{(a,l),(b,d)\},\\ I^b(\mathcal{L}^l_t) & \text{if } (i,j)\in\{(b,l),(a,d)\},\end{cases}$$

where

$$I^a(\mathcal{L}^l_t) := \frac{\ell^{a,l}_t}{\ell^{a,l}_t + \ell^{b,l}_t}, \qquad I^b(\mathcal{L}^l_t) := \frac{\ell^{b,l}_t}{\ell^{a,l}_t + \ell^{b,l}_t}$$

represent the imbalances on the ask and bid sides of the lit pool respectively. To model the behavior of market takers, we define the intensities of the processes $N^{i,j,k}$ as

$$\lambda_t^{\mathcal{L},i,j,k} := \lambda^{i,j}(\mathcal{L}^l_t)\,\mathbf{1}_{\{\phi(i)Q_{t^-} > -\bar q,\ \ell^{i,j}_t = k\}}, \qquad \phi(i) := \begin{cases} 1 & \text{if } i = a,\\ -1 & \text{if } i = b,\end{cases}$$

where

$$\lambda^{i,j}(\mathcal{L}^l_t) := A^j \exp\Big(-\frac{\theta^j}{\sigma}\,\psi^{i,j}(\mathcal{L}^l_t)\Big)\mathbf{1}_{\{\mathcal{L}^l_t \neq (0,0)\}} + \epsilon\,\mathbf{1}_{\{\mathcal{L}^l_t = (0,0)\}},$$

with $\sigma > 0$, $\theta^l, \theta^d > 0$ and $A^l, A^d > 0$.

For $\mathcal{L}\in\mathcal{A}$, we introduce a new probability measure $\mathbb{P}^{\mathcal{L}}$ under which $W$ remains a one-dimensional Brownian motion and, for $i\in\{a,b\}$, $j\in\{l,d\}$, $k\in\mathcal{V}^j$, the

$$N_t^{\mathcal{L},i,j,k} := N_t^{i,j,k} - \int_0^t \lambda_u^{\mathcal{L},i,j,k}\,\mathrm{d}u$$

are martingales. This probability measure is defined by the corresponding Doléans-Dade exponential

$$L_t^{\mathcal{L}} := \exp\Bigg(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}}\ \sum_{k\in\mathcal{V}^j} \int_0^t \mathbf{1}_{\{\phi(i)Q_{u^-} > -\bar q,\ \ell^{i,j}_u = k\}} \log\Big(\frac{\lambda^{i,j}(\mathcal{L}^l_u)}{\epsilon}\Big)\mathrm{d}N_u^{i,j,k} - \Big(\lambda^{i,j}(\mathcal{L}^l_u) - \epsilon\Big)\mathrm{d}u\Bigg),$$

which is a true martingale because the volumes $\ell^{i,j}$ are bounded (the associated Novikov criterion is given in [18]). We can therefore set the Girsanov change of measure with $\frac{\mathrm{d}\mathbb{P}^{\mathcal{L}}}{\mathrm{d}\mathbb{P}}\big|_{\mathcal{F}_t} = L_t^{\mathcal{L}}$ for all $t\in[0,T]$. In particular, all the probability measures $\mathbb{P}^{\mathcal{L}}$ indexed by $\mathcal{L}$ are equivalent. We write $\mathbb{E}_t^{\mathcal{L}}$ for the conditional expectation with respect to $\mathcal{F}_t$ under the probability measure $\mathbb{P}^{\mathcal{L}}$. We also define, for $i\in\{a,b\}$, $j\in\{l,d\}$, the processes

$$N_t^{i,j} := \sum_{k\in\mathcal{V}^j} N_t^{i,j,k}$$

of intensities $\lambda^{i,j}(\mathcal{L}^l_t)\,\mathbf{1}_{\{\phi(i)Q_{t^-} > -\bar q\}}$. These processes correspond to the total number of transactions executed on the bid or ask side of the lit or dark pools.

We define the efficient price of the underlying asset, observable by all market participants (in the sense that they can infer it), as $\tilde S_t := \tilde S_0 + \sigma W_t$, where $\tilde S_0 > 0$ and $\sigma > 0$. The mid-price $S_t$ is defined for $t\in[0,T]$ by

$$S_t := \tilde S_t + \sum_{j\in\{l,d\}} \int_0^t \Gamma^j \ell^{a,j}_u\,\mathrm{d}N_u^{a,j} - \Gamma^j \ell^{b,j}_u\,\mathrm{d}N_u^{b,j}, \tag{2.1}$$

where $\Gamma^l, \Gamma^d > 0$ are the market impact parameters.

Remark 2.2. The market impact parameters $\Gamma^l, \Gamma^d$ are taken small enough with respect to the tick size to discard obvious arbitrage opportunities. Moreover, as the market impact in the dark pool is usually smaller or delayed compared to the lit pool, we will take $\Gamma^l \geq \Gamma^d$.

We assume that in the lit pool, the best bid and best ask prices $P^{b,l}$ and $P^{a,l}$ satisfy

$$P_t^{b,l} := S_t - \mathcal{T}, \qquad P_t^{a,l} := S_t + \mathcal{T}, \qquad t\in[0,T],$$

where $\mathcal{T} > 0$ is the half-tick. In the dark pool, the bid and ask prices with and without latency are

$$P_t^{b,d,\mathrm{lat}} := S_t - \mathcal{T}, \quad P_t^{a,d,\mathrm{lat}} := S_t + \mathcal{T}, \quad P_t^{b,d,\text{non-lat}} := S_t, \quad P_t^{a,d,\text{non-lat}} := S_t.$$

Recall that in most dark pools, market takers are supposed to be executed at the mid-price of the lit pool. However, the higher the imbalance on the ask (resp. bid) side of the lit pool, the higher the probability that the mid-price will move down (resp. up) quickly. To model the latency effect, we introduce Bernoulli random variables $\nu^a_t \sim \mathrm{Ber}\big(I^a(\mathcal{L}^l_t)\big)$, $\nu^b_t \sim \mathrm{Ber}\big(I^b(\mathcal{L}^l_t)\big)$, which are associated to each incoming market order in the dark pool. If $\nu_t = 1$, there is no latency; if $\nu_t = 0$, there is latency. So we define

$$N_t^{a,d,\mathrm{lat}} := \int_0^t (1-\nu^a_u)\,\mathrm{d}N_u^{a,d}, \qquad N_t^{a,d,\text{non-lat}} := \int_0^t \nu^a_u\,\mathrm{d}N_u^{a,d},$$

and

$$N_t^{b,d,\mathrm{lat}} := \int_0^t (1-\nu^b_u)\,\mathrm{d}N_u^{b,d}, \qquad N_t^{b,d,\text{non-lat}} := \int_0^t \nu^b_u\,\mathrm{d}N_u^{b,d}.$$

Note that for any $t\in[0,T]$, $N_t^{i,d,\mathrm{lat}} + N_t^{i,d,\text{non-lat}} = N_t^{i,d}$ for $i\in\{a,b\}$. To our knowledge, our approach is the first one considering market making in the dark pool while taking into account the latency effect.

We address the problem of a market maker acting in the lit and dark pools, without intervention of the exchange. The profit and loss (PnL for short) of the market maker is defined as the sum of the cash earned from his executed orders and the value of his inventory. Thus it is expressed as

$$PL_t^{\mathcal{L}} := W_t^{\mathcal{L}} + Q_t S_t,$$

where, at time $t\in[0,T]$,

$$\begin{aligned} W_t^{\mathcal{L}} := {} & \int_0^t (S_u + \mathcal{T})\,\ell^{a,l}_u\,\mathrm{d}N_u^{a,l} - \int_0^t (S_u - \mathcal{T})\,\ell^{b,l}_u\,\mathrm{d}N_u^{b,l} \\ & + \int_0^t (S_u + \mathcal{T})\,\ell^{a,d}_u\,\mathrm{d}N_u^{a,d,\mathrm{lat}} + \int_0^t S_u\,\ell^{a,d}_u\,\mathrm{d}N_u^{a,d,\text{non-lat}} \\ & - \int_0^t (S_u - \mathcal{T})\,\ell^{b,d}_u\,\mathrm{d}N_u^{b,d,\mathrm{lat}} - \int_0^t S_u\,\ell^{b,d}_u\,\mathrm{d}N_u^{b,d,\text{non-lat}} \end{aligned}$$

represents his cash process and $Q_t S_t$ is the mark-to-market value of his inventory. Note that market making activity in the dark pool without latency does not generate PnL through spread collection. Here we take the convention $I^a(0,0) = I^b(0,0) = 0$, and we note that for all $t\in[0,T]$, $\int_0^t \ell^{i,j}_u\,\mathrm{d}N_u^{i,j} = \sum_{k\in\mathcal{V}^j} k N_t^{i,j,k}$.

We consider a risk-averse market maker with exponential utility function and risk aversion parameter $\gamma > 0$. We define his optimization problem as

$$V_0^{MM} = \sup_{\mathcal{L}\in\mathcal{A}} J_0^{MM}(\mathcal{L}), \tag{3.1}$$

where, for all $t\in[0,T]$,

$$J_t^{MM}(\mathcal{L}) = \mathbb{E}_t^{\mathcal{L}}\Big[-\exp\Big(-\gamma\big(PL_T^{\mathcal{L}} - PL_t^{\mathcal{L}}\big)\Big)\Big].$$

Inspired by [10], we prove a dynamic programming principle for the control problem (3.1), see Section A.1, from which we derive the corresponding HJB equation. We define $\mathcal{O} = [0,\bar q]^4$. Similarly to [12], we use a change of variable (see Equation (4.14) for the form of the ansatz) to reduce the initial problem to the following HJB equation:

$$\begin{aligned} 0 = {} & \partial_t v(t,q) + v(t,q)\,\tfrac{1}{2}\sigma^2\gamma^2 q^2 \\ & + \sup_{\mathcal{L}\in\mathcal{O}} \sum_{(k^l,k^d)\in\mathcal{V}^l\times\mathcal{V}^d} \Bigg[ \sum_{i\in\{a,b\}} \lambda_t^{\mathcal{L},i,l,k^l}\Big( \exp\Big(-\gamma\,\ell^{i,l}\big(\mathcal{T} + \Gamma^l(\phi(i)q - \ell^{i,l})\big)\Big)\, v\big(t, q - \phi(i)k^l\big) - v(t,q) \Big) \\ & \quad + \sum_{i\in\{a,b\}} \sum_{\kappa\in K} \lambda_t^{\mathcal{L},i,d,k^d}\,\phi^d(i,\kappa)\Big( \exp\Big(-\gamma\,\ell^{i,d}\big(\mathcal{T}\phi^{\mathrm{lat}}(\kappa) + \Gamma^d(\phi(i)q - \ell^{i,d})\big)\Big)\, v\big(t, q - \phi(i)k^d\big) - v(t,q) \Big) \Bigg], \end{aligned} \tag{3.2}$$

with $K := \{\mathrm{lat}, \text{non-lat}\}$,

$$\phi^{\mathrm{lat}}(\kappa) := \begin{cases} 1 & \text{if } \kappa = \mathrm{lat},\\ 0 & \text{if } \kappa = \text{non-lat},\end{cases} \qquad \phi^d(i,\kappa) := \begin{cases} I^b(\mathcal{L}^l) & \text{if } (i,\kappa)\in\{(a,\mathrm{lat}),(b,\text{non-lat})\},\\ I^a(\mathcal{L}^l) & \text{if } (i,\kappa)\in\{(a,\text{non-lat}),(b,\mathrm{lat})\},\end{cases}$$

and terminal condition $v(T,\cdot) = -1$. We have the following theorem.

Theorem 1. There exists a unique viscosity solution to the HJB equation (3.2). It satisfies $V_0^{MM} = v(0, Q_0)$. The supremum in (3.2) characterizes the optimal controls $\mathcal{L}^\star \in \mathcal{A}$.

The proof follows the same arguments as Theorem 3 in Section A.3. We see that the supremum over $\mathcal{L}$ is not separable with respect to each control process, in contrast to [10, 12]. To the best of our knowledge, there is no explicit expression for the optimal controls of the market maker. Nevertheless, as shown in Section 5.2, we can solve PDE (3.2) numerically. More precisely, we make use of deep reinforcement learning techniques to approximate the optimal volumes posted by the market maker.

Let us now consider the case where a make-take fees system is in place and influences the amount of liquidity provided by the market maker on both lit and dark venues.

4.1 Modified PnL of the market maker

Following the principal-agent approach of [10], we now assume that the exchange gives to the market maker a compensation $\xi$, defined as an $\mathcal{F}_T$-measurable random variable, which is added to his PnL process at terminal time $T$. This contract, designed by the exchange, aims at creating incentives so that the market maker attracts more transactions.

Therefore, the total payoff of the market maker at time $T$ is now given by $W_T^{\mathcal{L}} + Q_T S_T + \xi$. The problem of the market maker then becomes

$$V_0^{MM}(\xi) := \sup_{\mathcal{L}\in\mathcal{A}} J_0^{MM}(\mathcal{L},\xi), \tag{4.1}$$

with

$$J_t^{MM}(\mathcal{L},\xi) := \mathbb{E}_t^{\mathcal{L}}\Big[-\exp\Big(-\gamma\big(PL_T^{\mathcal{L}} - PL_t^{\mathcal{L}} + \xi\big)\Big)\Big].$$

To ensure that this functional is non-degenerate, we impose the following technical condition on $\xi$ (see the next section for the definition of an admissible contract):

$$\sup_{\mathcal{L}\in\mathcal{A}} \mathbb{E}^{\mathcal{L}}\big[\exp(-\gamma'\xi)\big] < +\infty, \quad \text{for some } \gamma' > \gamma, \tag{4.2}$$

so that the optimization problem of the market maker is well-posed. For a fixed compensation $\xi$, the optimal response $\mathcal{L}^\star$ associated with the market maker's problem (4.1) is defined by

$$J_0^{MM}(\mathcal{L}^\star,\xi) = V_0^{MM}(\xi) \quad \text{for } \mathcal{L}^\star\in\mathcal{A}. \tag{OC}$$

We now consider the problem of the exchange wishing to attract liquidity on its platforms. We assume that the exchange receives fixed fees $c^l, c^d > 0$ on the lit and dark pools, with $c^l, c^d$ independent of the price of the asset. The goal of the exchange is essentially to maximize the total number of market orders sent during the period of interest. As the arrival intensities of market orders are controlled by the market maker through $\mathcal{L}$, the contract $\xi$ should aim at increasing these intensities. Thus, the exchange subsidizes the agent at time $T$ with the compensation $\xi$ so that its PnL is given by

$$\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} c^j \int_0^T \ell^{i,j}_t\,\mathrm{d}N_t^{i,j} - \xi.$$
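As a small numerical illustration of the two quantities just introduced, the sketch below evaluates the exchange's PnL on a list of executed trades and the market maker's exponential utility of his PnL increment plus the compensation. The trade-list layout, the function names and the numbers are hypothetical and only meant to make the formulas concrete.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: one tuple (side, pool, volume) per executed market order.
Trade = Tuple[str, str, int]


def exchange_pnl(trades: List[Trade], fees: Dict[str, float], xi: float) -> float:
    """Fees c^j collected on the traded volume of each pool, minus the compensation xi."""
    collected = sum(fees[pool] * volume for _side, pool, volume in trades)
    return collected - xi


def market_maker_utility(pnl_increment: float, xi: float, gamma: float) -> float:
    """Exponential utility -exp(-gamma * (PnL increment + xi))."""
    return -math.exp(-gamma * (pnl_increment + xi))
```

With fees of 0.1 in the lit pool and 0.05 in the dark pool, a lit trade of volume 2 and a dark trade of volume 3 bring in 0.35 before the compensation is paid.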

We now need to specify the set of admissible contracts potentially offered by the exchange. We assume that the exchange has exponential utility function with risk aversion parameter $\eta > 0$. The natural well-posedness condition for the problem of the exchange is

$$\mathbb{E}^{\mathcal{L}^\star}\big[\exp(\eta'\xi)\big] < +\infty, \quad \text{for some } \eta' > \eta, \tag{4.3}$$

for any $\mathcal{L}^\star$ satisfying condition (OC). Since the $N^{i,j}$ are point processes with bounded intensities, this condition, together with Hölder's inequality, ensures that the problem of the exchange is well-defined. We also assume that the market maker only accepts contracts $\xi$ such that $V_0^{MM}(\xi)$ is above some threshold value $R < 0$, that is, $\xi$ must satisfy

$$V_0^{MM}(\xi) \geq R. \tag{R}$$

This threshold, called the reservation utility of the agent, is the critical utility value under which the market maker has no interest in the contract. This quantity has to be taken into account carefully by the exchange when proposing a contract to the market maker. We can therefore define the space of admissible contracts $\mathcal{C}$ by

$$\mathcal{C} := \Big\{ \xi\ \mathcal{F}_T\text{-measurable, s.t. (4.2), (4.3) and (R) are satisfied} \Big\}.$$

Thus the contracting problem the exchange has to solve is

$$V_0^E := \sup_{\xi\in\mathcal{C}} \mathbb{E}^{\mathcal{L}^\star}\Bigg[-\exp\Bigg(-\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} c^j \int_0^T \ell^{i,j}_t\,\mathrm{d}N_t^{i,j} - \xi\Big)\Bigg)\Bigg]. \tag{4.4}$$

Footnote to (4.4): for fixed $\xi$, the control $\mathcal{L}^\star$ is not necessarily unique. However, numerical results seem to indicate its uniqueness. Otherwise we could also consider a supremum over all $\mathcal{L}^\star$ satisfying (OC), as is usually done in principal-agent theory (see for instance [9, Section 2.4]).

In the next section, we characterize the form of an admissible contract $\xi\in\mathcal{C}$.

Inspired by [10], we prove in this section that, without loss of generality, we can consider a specific form of contracts, defined by some $Y_0\in\mathbb{R}$ and a predictable process $Z = (Z^{\tilde S}, Z^{i,j,k})_{i\in\{a,b\},\, j\in\{l,d\},\, k\in\mathcal{V}^j}$ chosen by the principal. A contract $\xi$ of this form can be written as

$$\xi = Y_T^{Y_0,Z} := Y_0 + \int_0^T \Bigg(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} \sum_{k\in\mathcal{V}^j} Z_u^{i,j,k}\,\mathrm{d}N_u^{i,j,k}\Bigg) + Z_u^{\tilde S}\,\mathrm{d}\tilde S_u + \Big(\tfrac{1}{2}\gamma\sigma^2\big(Z_u^{\tilde S} + Q_u\big)^2 - H(Z_u,Q_u)\Big)\mathrm{d}u, \tag{4.5}$$

where

$$H(z,q) := \sup_{\mathcal{L}\in\mathcal{A}} h(\mathcal{L},z,q), \tag{4.6}$$

and $h : \mathbb{R}^4 \times \mathbb{R}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)} \times \mathbb{Z} \to \mathbb{R}$ is the Hamiltonian of the agent's problem. To ensure admissibility of the contract, the process $(Z_t)_{t\in[0,T]}$ has to satisfy the following technical conditions:

$$\sup_{\mathcal{L}\in\mathcal{A}}\ \sup_{t\in[0,T]} \mathbb{E}^{\mathcal{L}}\Big[\exp\big(-\gamma' Y_t^{0,Z}\big)\Big] < +\infty, \quad \text{for some } \gamma' > \gamma, \tag{4.7}$$

and

$$\int_0^T \big|Z_t^{\tilde S}\big|^2 + \big|H(Z_t,Q_t)\big|\,\mathrm{d}t < +\infty. \tag{4.8}$$

Given this integrability condition, the process $(Y_t^{0,Z})_{t\in[0,T]}$ is well-defined. The contract consists of the following elements:

• The constant $Y_0$ is calibrated by the exchange to ensure that the reservation utility constraint (R) of the market maker is satisfied.
• The term $Z^{\tilde S}$ is the compensation given to the market maker with respect to the volatility risk induced by the efficient price $\tilde S$.
• Every time a trade of size $k$ occurs on the ask or bid side of the lit or dark pool, the market maker receives $Z^{i,j,k}$.
• The term $\tfrac{1}{2}\gamma\sigma^2(Z^{\tilde S} + Q)^2 - H(Z,Q)$ is a continuous coupon given to the market maker.
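The four elements listed above can be accumulated along a discretized path. The sketch below is a plain Euler-type sum under stated assumptions: the time grid, the argument names and the function `contract_payoff` are hypothetical, the coupon is taken to be $\tfrac{1}{2}\gamma\sigma^2(Z^{\tilde S}+Q)^2 - H(Z,Q)$, and the values of $H$ along the path are supplied by the caller rather than computed here.

```python
from typing import Sequence


def contract_payoff(
    y0: float,
    dt: float,
    gamma: float,
    sigma: float,
    z_s: Sequence[float],             # Z^S on each time step
    d_s: Sequence[float],             # efficient-price increments on each step
    q: Sequence[float],               # inventory at the start of each step
    h_values: Sequence[float],        # H(Z_u, Q_u) on each step (caller-supplied)
    trade_payments: Sequence[float],  # Z^{i,j,k} paid at each executed trade
) -> float:
    """Discretized Y_T: constant + per-trade payments + price term + coupon."""
    total = y0 + sum(trade_payments)
    for zs, ds, qu, hu in zip(z_s, d_s, q, h_values):
        coupon = 0.5 * gamma * sigma ** 2 * (zs + qu) ** 2 - hu
        total += zs * ds + coupon * dt
    return total
```

In this form the role of each term is visible separately: `y0` shifts the payoff to meet the reservation utility, `trade_payments` rewards executed flow, and the loop adds the price-risk compensation and the continuous coupon.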

Remark 4.1.

In our setting, the volumes of limit orders do not belong to the canonical process and so the principal does not contract on the volumes displayed by the market maker. This is very reasonable since, in practice, a large part of the volumes sent by market makers are not executed or are rapidly canceled. Therefore it is clearly preferable to build contracts based on actual transactions. Moreover, note that $\tilde S$, and not the mid-price $S$, appears in the contract. This is not an issue since $S$ can be decomposed into elements of the canonical process, see (2.1).

Formally stated, the definition of the space $\Xi$ of contracts of the form (4.5) is

$$\Xi = \Big\{ Y_T^{Y_0,Z} \in \mathcal{F}_T,\ Y_0\in\mathbb{R},\ Z\in\mathcal{Z},\ \text{s.t. (R) holds} \Big\},$$

where $\mathcal{Z}$ denotes the set of processes defined by

$$\mathcal{Z} := \Big\{ (Z^{\tilde S}, Z^{i,j,k}),\ i\in\{a,b\},\ j\in\{l,d\},\ k\in\mathcal{V}^j\ \text{s.t. (4.3), (4.7), (4.8) are satisfied} \Big\}. \tag{4.9}$$

For $(\mathcal{L},z,q) \in (\mathcal{V}^l)^2 \times (\mathcal{V}^d)^2 \times \mathbb{R}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)} \times \mathbb{Z}$, we define the Hamiltonian of the market maker, which appears in the contract $Y_T^{Y_0,Z}$ via the continuous coupon $H(Z,Q)$, by

$$\begin{aligned} h(\mathcal{L},z,q) := \sum_{(k^l,k^d)\in\mathcal{V}^l\times\mathcal{V}^d}\ \sum_{i\in\{a,b\}} \gamma^{-1}\Bigg[ & \Big(1 - \exp\Big(-\gamma\big(z^{i,l,k^l} + \ell^{i,l}(\mathcal{T} + \phi(i)\Gamma^l q) - \Gamma^l(\ell^{i,l})^2\big)\Big)\Big)\lambda_t^{\mathcal{L},i,l,k^l} \\ & + \sum_{\kappa\in K} \Big(1 - \exp\Big(-\gamma\big(z^{i,d,k^d} + \ell^{i,d}(\mathcal{T}\phi^{\mathrm{lat}}(\kappa) + \phi(i)\Gamma^d q) - \Gamma^d(\ell^{i,d})^2\big)\Big)\Big)\lambda_t^{\mathcal{L},i,d,k^d}\,\phi^d(i,\kappa) \Bigg]. \end{aligned} \tag{4.10}$$

This Hamiltonian term appears naturally when applying the dynamic programming principle for the market maker's problem.

The sets $\mathcal{C}$ and $\Xi$ are, in fact, equal. Moreover, the contract representation (4.5) enables us to provide a solution to the market maker's problem (4.1). The proof is given in Section A.2.

Theorem 2. Any admissible contract can be written under the form (4.5), that is, $\mathcal{C} = \Xi$. Moreover, for any $Y_T^{Y_0,Z}\in\Xi$ we have

$$V_0^{MM}\big(Y_T^{Y_0,Z}\big) = -\exp(-\gamma Y_0),$$

and Condition (OC) with $\xi = Y_T^{Y_0,Z}$ is equivalent to the fact that $\mathcal{L}$ satisfies $h(\mathcal{L}_t, Z_t, Q_t) = H(Z_t, Q_t)$ for any $t\in[0,T]$.

In particular, from Theorem 2, the choice $\hat Y_0 = -\gamma^{-1}\log(-R)$ ensures that the reservation utility constraint of the market maker is satisfied. This theorem provides a tractable form of contracts for the design of a suitable make-take fees policy. Given the knowledge of the market maker's response to a given contract, we reformulate the problem of the exchange and prove the existence and uniqueness of the associated value function.
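To make the characterization of the best response concrete, here is a rough sketch that evaluates the market maker's Hamiltonian for one set of posted volumes and then maximizes it by exhaustive enumeration. This brute-force search is an illustration only and is not the neural-network procedure used in the paper. The parameter dictionary `p`, its keys and all function names are hypothetical. Two modelling shortcuts are assumed and should be checked against (4.10): each (side, pool) contributes once, for the size actually posted, and a posted volume of zero contributes nothing.

```python
import itertools
import math

PHI = {"a": 1, "b": -1}  # phi(i): an ask fill lowers inventory, a bid fill raises it


def imbalances(l_ask, l_bid):
    """Lit-pool imbalances (I^a, I^b), with the convention I^a(0,0) = I^b(0,0) = 0."""
    total = l_ask + l_bid
    if total == 0:
        return 0.0, 0.0
    return l_ask / total, l_bid / total


def intensity(side, pool, l_ask, l_bid, p):
    """lambda^{i,j}: A^j exp(-theta^j / sigma * psi^{i,j}), or eps if nothing is posted in the lit pool."""
    if l_ask == 0 and l_bid == 0:
        return p["eps"]
    i_a, i_b = imbalances(l_ask, l_bid)
    psi = i_a if (side, pool) in (("a", "l"), ("b", "d")) else i_b
    return p["A"][pool] * math.exp(-p["theta"][pool] / p["sigma"] * psi)


def hamiltonian(L, z, q, p):
    """h(L, z, q). L maps (side, pool) -> volume; z maps (side, pool, size) -> payment (default 0)."""
    l_ask, l_bid = L[("a", "l")], L[("b", "l")]
    i_a, i_b = imbalances(l_ask, l_bid)
    g = p["gamma"]
    total = 0.0
    for side in ("a", "b"):
        if PHI[side] * q <= -p["q_bar"]:
            continue  # inventory limit reached: this side cannot be filled
        for pool in ("l", "d"):
            vol = L[(side, pool)]
            if vol == 0:
                continue  # assumption: an empty quote generates no term
            lam = intensity(side, pool, l_ask, l_bid, p)
            impact = p["Gamma"][pool]
            base = z.get((side, pool, vol), 0.0) + vol * PHI[side] * impact * q - impact * vol ** 2
            if pool == "l":
                cases = [(1.0, p["tick"])]
            else:
                # phi^d(i, kappa): probabilities of the latency / no-latency cases
                p_lat = i_b if side == "a" else i_a
                p_non = i_a if side == "a" else i_b
                cases = [(p_lat, p["tick"]), (p_non, 0.0)]
            for prob, spread in cases:
                total += lam * prob * (1.0 - math.exp(-g * (base + vol * spread))) / g
    return total


def best_response(z, q, vols_lit, vols_dark, p):
    """Enumerate admissible volumes and return (argmax, max) of the Hamiltonian."""
    best, best_val = None, -math.inf
    for la, lb in itertools.product(vols_lit, repeat=2):
        for da, db in itertools.product(vols_dark, repeat=2):
            if la + da > p["q_bar"] or lb + db > p["q_bar"]:
                continue
            L = {("a", "l"): la, ("b", "l"): lb, ("a", "d"): da, ("b", "d"): db}
            val = hamiltonian(L, z, q, p)
            if val > best_val:
                best, best_val = L, val
    return best, best_val
```

The enumeration is only practical for small volume sets, which is one reason a function approximator is attractive when the response must be known for all incentives $z$ and inventories $q$.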

Following Theorem 2, the contracting problem (4.4) is reduced to

$$V_0^E := \sup_{\substack{(Y_0,Z,\mathcal{L})\in\mathbb{R}\times\mathcal{Z}\times\mathcal{A}\\ h(\mathcal{L}_t,Z_t,Q_t) = H(Z_t,Q_t),\ \forall t\in[0,T]}} \mathbb{E}^{\mathcal{L}}\Bigg[-\exp\Bigg(-\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} c^j \int_0^T \ell^{i,j}_t\,\mathrm{d}N_t^{i,j} - Y_T^{Y_0,Z}\Big)\Bigg)\Bigg]. \tag{4.11}$$

For a given contract $Y^{Y_0,Z}$, due to the form of (4.10), the market maker's optimal response does not depend on $Y_0$. With the exchange's objective function being decreasing in $Y_0$, the maximization with respect to $Y_0$ is achieved at the level $\hat Y_0 = -\gamma^{-1}\log(-R)$. Therefore Problem (4.11) can be reduced to

$$v_0^E := \sup_{\substack{(Z,\mathcal{L})\in\mathcal{Z}\times\mathcal{A}\\ h(\mathcal{L}_t,Z_t,Q_t) = H(Z_t,Q_t),\ \forall t\in[0,T]}} J(Z,\mathcal{L}), \tag{4.12}$$

where

$$J(Z,\mathcal{L}) = \mathbb{E}^{\mathcal{L}}\Bigg[-\exp\Bigg(-\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} c^j \int_0^T \ell^{i,j}_t\,\mathrm{d}N_t^{i,j} - Y_T^{0,Z}\Big)\Bigg)\Bigg].$$

We define $D := [0,T]\times\mathbb{R}\times\mathbb{N}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)}\times\mathbb{N}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)}\times\mathbb{R}$ and, for any vector $\mathbf{p}\in\mathbb{N}^p$ and $i\in\{1,\dots,p\}$, $\mathbf{p}_{-i} = (p_1,\dots,p_{i-1},p_{i+1},\dots,p_p)\in\mathbb{N}^{p-1}$. By Equation (4.4) and the corresponding footnote, there might be more than one optimal response $\mathcal{L}^\star$ of the market maker. We show here how to solve the principal's problem for a specific optimal response $\mathcal{L}^\star$. (If there are several optimal responses $\mathcal{L}^\star$, the exchange should solve the HJB equation (4.13) for every $\mathcal{L}^\star$ and, according to principal-agent theory, choose the optimal response that maximizes its own utility.) Using a dynamic programming principle, we define the value function of the exchange, $v^E : D\to\mathbb{R}$, as

$$v^E(t,\tilde S_t,\bar N_t,N_t,Y_t) := \sup_{Z\in\mathcal{Z}} \mathbb{E}_t^{\mathcal{L}^\star}\Bigg[-\exp\Bigg(-\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} \sum_{k\in\mathcal{V}^j} c^j\big(\bar N_T^{i,j,k} - \bar N_t^{i,j,k}\big) - Y_T^{0,Z}\Big)\Bigg)\Bigg],$$

with

$$(\bar N_t, N_t) := \big(k N_t^{i,j,k},\, N_t^{i,j,k}\big)_{i\in\{a,b\},\, j\in\{l,d\},\, k\in\mathcal{V}^j},$$

and $\mathcal{L}^\star = \big(\ell^{\star b,l}(z,q), \ell^{\star a,l}(z,q), \ell^{\star b,d}(z,q), \ell^{\star a,d}(z,q)\big)$ the optimal response of the market maker, in the sense of (4.6), displayed at time $t$ for a given inventory $q$ and given incentives $z$ of the exchange. Recall that $Q_t = \sum_{j\in\{l,d\}} \sum_{k\in\mathcal{V}^j} \big(\bar N_t^{b,j,k} - \bar N_t^{a,j,k}\big)$.
Usual arguments enable us to show that $v^E$ is a viscosity solution of the HJB equation defined on $D$ by

$$\begin{aligned} 0 = {} & \partial_t v^E + \tfrac{1}{2}\sigma^2\partial_{\tilde S\tilde S} v^E + \sup_{z^{\tilde S}\in\mathbb{R}} \Big\{ \tfrac{1}{2}\gamma\sigma^2\big(z^{\tilde S} + q\big)^2\partial_y v^E + \tfrac{1}{2}\sigma^2\big(z^{\tilde S}\big)^2\partial_{yy} v^E + \sigma^2 z^{\tilde S}\partial_{\tilde S y} v^E \Big\} \\ & + \sup_{z\in\mathbb{R}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)}} \sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} \sum_{k\in\mathcal{V}^j} \lambda_t^{\mathcal{L}^\star,i,j,k}\Big(\Delta^{i,j,k}(z)v^E - \partial_y v^E\,\mathcal{E}\big(z^{i,j,k}, \ell^{\star i,j}(z,q)\big)\Big), \end{aligned} \tag{4.13}$$

where, for $i\in\{a,b\}$, $j\in\{l,d\}$,

$$\Delta^{i,j,k}(z)v^E(t,\tilde s,\bar n,n,y) := v^E\big(t,\tilde s,\bar n^{i,j,k} + k,\bar n^{-(i,j,k)},n^{i,j,k} + 1,n^{-(i,j,k)},y + z^{i,j,k}\big) - v^E(t,\tilde s,\bar n,n,y),$$

$$\mathcal{E}\big(z^{i,l,k},\ell^{\star i,l}(z,q)\big) := \frac{1}{\gamma}\Big(1 - \exp\Big(-\gamma\big(z^{i,l,k} + \ell^{\star i,l}(z,q)(\mathcal{T} + \phi(i)\Gamma^l q) - \Gamma^l(\ell^{\star i,l}(z,q))^2\big)\Big)\Big),$$

$$\mathcal{E}\big(z^{i,d,k},\ell^{\star i,d}(z,q)\big) := \frac{1}{\gamma}\sum_{\kappa\in K}\Big(1 - \exp\Big(-\gamma\big(z^{i,d,k} + \ell^{\star i,d}(z,q)(\mathcal{T}\phi^{\mathrm{lat}}(\kappa) + \phi(i)\Gamma^d q) - \Gamma^d(\ell^{\star i,d}(z,q))^2\big)\Big)\Big)\phi^d(i,\kappa),$$

and terminal condition

$$v^E(T,\tilde s,\bar n,n,y) = -\exp\Big(-\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} \sum_{k\in\mathcal{V}^j} c^j\bar n^{i,j,k} - y\Big)\Big).$$

Remark that the best response of the market maker, for which we do not have an explicit expression, appears in the value function of the exchange. Inspired by [10, 12], we use the following ansatz for Equation (4.13):

$$v^E(t,s,\bar n,n,y) = v(t,q)\exp\Big(-\eta\Big(\sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} \sum_{k\in\mathcal{V}^j} c^j\bar n^{i,j,k} - y\Big)\Big), \tag{4.14}$$

where $v$ is a solution of the following HJB equation:

$$\begin{cases} \partial_t v(t,q) + \mathcal{H}\big(q,\mathcal{L}^\star,v(t,\cdot)\big) = 0, & q\in\{-\bar q,\dots,\bar q\},\ t\in[0,T),\\ v(T,q) = -1, \end{cases} \tag{4.15}$$

with

$$\mathcal{H}\big(q,\mathcal{L}^\star_t,v(t,\cdot)\big) := \sup_{z\in\mathbb{R}^{2(\#\mathcal{V}^l + \#\mathcal{V}^d)+1}} \mathcal{U}\big(z,q,\mathcal{L}^\star(z,q),v(t,\cdot)\big), \tag{4.16}$$

and

$$\begin{aligned} \mathcal{U}\big(z,q,\mathcal{L}^\star(z,q),v(t,\cdot)\big) := {} & v(t,q)\Big(\tfrac{1}{2}\eta\gamma\sigma^2\big(z^{\tilde S} + q\big)^2 + \tfrac{1}{2}\eta^2\sigma^2\big(z^{\tilde S}\big)^2\Big) \\ & + \sum_{\substack{i\in\{a,b\}\\ j\in\{l,d\}}} \sum_{k\in\mathcal{V}^j} \lambda_t^{\mathcal{L}^\star,i,j,k}\Big(\exp\big(\eta(z^{i,j,k} - kc^j)\big)\,v\big(t,q - \phi(i)k\big) - v(t,q)\big(1 + \eta\,\mathcal{E}(z^{i,j,k},\ell^{\star i,j}(z,q))\big)\Big). \end{aligned}$$

This ansatz leads to a dimensionality reduction from five to two parameters. Using [4, Corollary 1.4.2], there exists a unique continuous viscosity solution associated to (4.15).

Remark 4.2.

Note that the supremum over $z^{\tilde S}$ is explicit and given by $z^{\tilde S} = -\frac{\gamma}{\gamma+\eta}\,q$ as in [10].

Making use of the ansatz $v$, the bi-level optimization problem (4.12) is reduced to solving the following system:

$$\begin{cases} \partial_t v(t,q) + H\big(q,\mathcal{L}^\star_t,v(t,\cdot)\big) = 0, \text{ with final condition } v(T,q) = -1, \\ h(\mathcal{L}^\star,z,q) = H(z,q), \quad q\in\{-\bar q,\dots,\bar q\}. \end{cases} \tag{4.17}$$

We have the following theorem.

Theorem 3.

There exists a unique continuous viscosity solution to HJB equation (4.15). It satisfies $v_0^E = v(0,Q_0) = v^E(0,\tilde S_0,\bar N_0,N_0,Y_0)$. Moreover, the optimal incentives of the principal $Z^\star$ are solutions of the supremum in (4.15).

The proof can be found in Section A.3. Theorem 3 allows us to use numerical methods to obtain the optimizers

$$\Big(Z^\star(t,Q_{t^-}),\ \mathcal{L}^\star\big(Z^\star(t,Q_{t^-}),Q_{t^-}\big)\Big)_{t\in[0,T]} \tag{4.18}$$

of the bi-level problem (4.17). Moreover, the optimal contract is given by

$$\xi^\star = \hat Y_0 + \int_0^T\Big(\sum_{\substack{i\in\{a,b\} \\ j\in\{l,d\}}}\sum_{k\in\mathcal{V}^j} Z_u^{\star i,j,k}\,\mathrm{d}N_u^{i,j,k}\Big) + Z_u^{\star\tilde S}\,\mathrm{d}\tilde S_u + \Big(\frac{1}{2}\gamma\sigma^2\big(Z_u^{\star\tilde S}+Q_u\big)^2 - H(Z_u^\star,Q_u)\Big)\mathrm{d}u.$$

The second problem in (4.17) is a classical optimization problem. Having found $\mathcal{L}^\star(z,q)$ numerically, we solve the Hamilton-Jacobi-Bellman equation (4.15) using neural networks.

Remark 4.3. Theorem 3 characterizes only the value function of the exchange and not the optimal incentives defined in (4.18), which are computed through deep reinforcement learning techniques. In particular, there is no guarantee of admissibility of the incentives $(Z^\star(t,Q_{t^-}))_{t\in[0,T]}$ solving (4.16). However, we observe numerically (see Figure 5) that these incentive parameters are essentially linear in the inventory $Q$ at any fixed time $t$, despite the nonlinear nature of neural networks. This result is quite usual in the optimal market making literature, where an asymptotic development of the function $v$ is used, so $v$ should be regular enough (see [12, Section 4] or [1, Section 3.2]). The linearity of the incentives $Z^\star$ implies that they belong to the set of admissible contracts $\mathcal{Z}$ defined by (4.9).

We now turn to the description of our numerical method to solve (4.17). The optimization procedure consists of two stages. At the first stage, we optimize the controls of the market maker for all possible values of the incentives given by the exchange.
At the second stage, we use an actor-critic approach to obtain both the optimal controls and the value function of the exchange. We conclude this section with numerical experiments showing the impact of incentives as well as that of market conditions on the volumes posted by the market maker on both lit and dark venues regulated by the exchange. Throughout these experiments, we assume the following:

Assumption 5.1.

For all $i\in\{a,b\}$, $j\in\{l,d\}$ and $k\in\mathcal{V}^j$, $Z^{i,j,k} = Z^{i,j}\in\mathbb{R}$.

This means that the principal provides incentives only with respect to the number of transactions on each side of each pool, independently of the volumes. In that case, HJB equation (4.15) remains valid. Recall that the optimal incentives depend on time and on the market maker's inventory; therefore, they implicitly depend on the transacted volume.

There is an obvious bid-ask symmetry in our model with respect to the inventory of the market maker, as can be seen in Hamiltonian (4.10). Thus for our numerical experiments we impose symmetry of the incentives with respect to $q$. As a consequence, we have symmetry of the volumes posted by the market maker with respect to $q$, given incentives satisfying the bid-ask symmetry property.

The first step to tackle our principal-agent problem is to find the optimal volumes $\mathcal{L}^\star = (\ell^{\star a,l},\ell^{\star b,l},\ell^{\star a,d},\ell^{\star b,d})$ by solving, for any couple $(z,q)$, the maximization problem of the market maker (4.6). To do so, we introduce a continuous version of the Hamiltonian (4.10) with respect to $\mathcal{L}$, that is, we maximize the following functional:

$$\begin{aligned} \mathcal{L}\longmapsto h_c(\mathcal{L},z,q) := \sum_{i\in\{a,b\}}\gamma^{-1}\Bigg[ & \Big(1-\exp\Big(-\gamma\Big(z^{i,l}+\ell^{i,l}\Big(\frac{\mathcal{T}}{2}+\phi(i)\Gamma^l q\Big)-\Gamma^l(\ell^{i,l})^2\Big)\Big)\Big)\lambda^{i,l}(\mathcal{L}^l_t) \\ & + \sum_{\kappa\in K}\Big(1-\exp\Big(-\gamma\Big(z^{i,d}+\ell^{i,d}\Big(\frac{\mathcal{T}}{2}\phi^{lat}(\kappa)+\phi(i)\Gamma^d q\Big)-\Gamma^d(\ell^{i,d})^2\Big)\Big)\Big)\lambda^{i,d}(\mathcal{L}^l_t)\,\phi^d(i,\kappa)\Bigg]. \end{aligned} \tag{5.1}$$

For fixed incentives, we have that $\mathcal{L}^{\star a,l}(q) = \mathcal{L}^{\star b,l}(-q)$. Because of the intricate form of the function $h_c$, we cannot have an explicit solution to the first order condition $\nabla_{\mathcal{L}}h_c = 0$, which is four-dimensional. Moreover, we do not have a priori knowledge of the functional form of the optimizers $\mathcal{L}^\star : \mathbb{R}^4\times[-\bar q,\bar q]\to\mathbb{R}^4$, so we cannot apply canonical root-finding methods.
Therefore, to address this problem, we approximate the best response of the market maker by a neural network.

Although we do not use a purely grid-based method, we need to define a domain for the arguments $q$ and $z$. In our model the inventory $q$ of the market maker is bounded and evolves between the risk limits $-\bar q$ and $\bar q$. We also define a bound $\bar z$ for the incentives $z\in\mathbb{R}^4$, so that $z\in[-\bar z,\bar z]^4$. This is in fact also justified by the paper [10], in which optimal incentives are proved to be bounded.

We approximate the best response function $\mathcal{L}^\star$ by a neural network $l[\omega^l]$, where $\omega^l$ are the weights of the neural network. The neural network $l[\omega^l]$ takes as inputs the principal's incentives and the market maker's current inventory $(z^{a,l},z^{b,l},z^{a,d},z^{b,d},q)$, which are normalized by $\bar z$ and $\bar q$ respectively. The network is composed of 2 hidden layers with 10 nodes in each of them and with ELU activation functions. The ELU activation function is of the form

$$\mathrm{ELU}(x) = \begin{cases} \alpha(e^x-1), & \text{for } x\le 0, \\ x, & \text{for } x>0, \end{cases}$$

where $\alpha$ is a non-negative parameter, usually taken equal to 1.

The final layer of the network contains four outputs, and the activation function is sigmoid (for the outputs to be between 0 and 1). The output of $l[\omega^l]$ is then renormalized via multiplication by $\bar q$ to obtain volumes between 0 and $\bar q$.

To obtain the optimal volumes of the market maker, we minimize the opposite of the Hamiltonian function defined by Equation (5.1). We generate $K>0$ samples of $z$ and $q$, and conduct several epochs of batch learning with the following weights update:

$$\omega^l \leftarrow \omega^l + \frac{\mu^l}{K}\sum_{k=1}^K \nabla_{\omega^l}l[\omega^l](z_k,q_k)\Big(\nabla_l h_c\big(l[\omega^l](z_k,q_k),z_k,q_k\big) - \rho\Big(\big(q_k+l[\omega^l]^{b,l}+l[\omega^l]^{b,d}-\bar q\big)_+ + \big(q_k-l[\omega^l]^{a,l}-l[\omega^l]^{a,d}+\bar q\big)_-\Big)\Big),$$

where $\mu^l$ is the learning rate. The term scaled by $\rho$ corresponds to a penalty employed to force quotes to stay in $\mathcal{A}$, so that $l[\omega^l]^{i,l}+l[\omega^l]^{i,d}\in[0,\bar q]$, $i\in\{a,b\}$. In our computations we use ρ = 0 .
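As an illustration of the architecture just described, here is a minimal pure-Python sketch of the forward pass of the best-response network together with the two positive/negative parts entering the penalty. The random weight initialisation, the helper names and the numerical values of $\bar z$, $\bar q$ and of the inputs are assumptions made for this sketch; no training is performed.

```python
import math
import random


def elu(x, alpha=1.0):
    """ELU(x) = alpha * (exp(x) - 1) for x <= 0, and x for x > 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    # Hypothetical initialisation: small uniform weights, zero biases.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def best_response(params, z, q, z_bar, q_bar):
    """Map (z_al, z_bl, z_ad, z_bd, q) to four volumes (l_al, l_bl, l_ad, l_bd)."""
    x = [zi / z_bar for zi in z] + [q / q_bar]   # inputs normalised by z_bar, q_bar
    (w1, b1), (w2, b2), (w3, b3) = params
    h = dense(x, w1, b1, elu)                    # hidden layer 1: 10 ELU nodes
    h = dense(h, w2, b2, elu)                    # hidden layer 2: 10 ELU nodes
    out = dense(h, w3, b3, sigmoid)              # four sigmoid outputs in [0, 1]
    return [q_bar * o for o in out]              # rescaled to volumes in [0, q_bar]


def inventory_penalty(volumes, q, q_bar):
    """(q + l_bl + l_bd - q_bar)_+ + (q - l_al - l_ad + q_bar)_-, before scaling by rho."""
    l_al, l_bl, l_ad, l_bd = volumes
    over = max(q + l_bl + l_bd - q_bar, 0.0)       # positive part
    under = max(-(q - l_al - l_ad + q_bar), 0.0)   # negative part
    return over + under


rng = random.Random(0)
params = [init_layer(5, 10, rng), init_layer(10, 10, rng), init_layer(10, 4, rng)]
# Hypothetical inputs: z_bar = 2, q_bar = 300, inventory q = 50.
volumes = best_response(params, [1.0, -1.0, 0.05, 0.05], 50.0, 2.0, 300.0)
```

The sigmoid output followed by multiplication by $\bar q$ enforces the bound on each individual volume by construction, which is why only the joint inventory constraint needs a penalty term.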
We denote by $l^\star[\omega^l]$ the approximated optimal response function of the market maker $\mathcal{L}^\star$ (the result of the above optimization procedure). In Figure 1, we see an example of the best response $l^\star[\omega^l]$ as a function of $z^{a,l} = -z^{b,l}$, when the market maker's inventory is $q = 50$ and the other incentives are $z^{a,d} = z^{b,d} = 0.05$ (close to zero). Note that this choice of incentives is arbitrary and only aims at reflecting the main properties of $l^\star[\omega^l]$.

Footnote: Here we slightly abuse notation, denoting by $l[\omega^l]$ the response of the market maker obtained via the neural network parametrized by the weights $\omega^l$.

Figure 1: Best response of the market maker as a function of $z^{a,l}$ and $z^{b,l}$ (both ranging from $-2$ to $2$), with $q = 50$.

The observed behavior has a quite natural interpretation. The incentive $z^{a,l}$ is a remuneration of the market maker when his limit order is executed on the ask side of the lit pool. When this incentive increases, the market maker makes sure to have a small imbalance on the ask side of the lit pool so that he can earn $z^{a,l}$. Because of his positive inventory, the volume posted on the ask side of the dark pool is higher than on the bid side of the dark pool: the market maker wants to liquidate his long position. Similarly, when the incentive $z^{b,l}$ increases, the market maker wants to benefit from it when transacting on the bid side of the lit pool. This explains the small imbalance on the bid side of the lit pool for positive $z^{b,l}$. Mathematically speaking, the function $h_c$ is increasing in $z^{a,l}$. Thus for a high $z^{a,l}$, the value of the term $\mathcal{E}(z^{a,l},l^\star[\omega^l]^{a,l})$ in the Hamiltonian is high. To benefit from the remuneration $z^{a,l}$, the intensity $\lambda^{a,l}$ must be high, which implies a small imbalance on the ask side, hence $I^a$ should be small. The same holds for $z^{b,l}$.

For $q = 150$, $z^{b,l} = -z^{a,l}$, and other incentives $z^{a,d} = z^{b,d} = 0.05$ (close to zero), we display the volumes in Figure 2.

Figure 2: Best response of the market maker as a function of $z^{a,l}$ and $z^{b,l}$, with $q = 150$.

As the market maker has a higher inventory, his quotes on the bid side of both pools decrease because of the inventory risk. Moreover, his quotes on the ask side of both pools increase to liquidate his long position. For high incentives $z^{b,l}$, a small volume on the bid side of the lit pool leads to a low imbalance on the bid side, hence a high probability of execution for passive ask orders in the dark pool, where the market maker tries to liquidate his position. Note that for high $z^{a,l}$, the imbalance is approximately equal to one half, because the market maker does not want to suffer from the latency effect (to be executed at the mid-price in the dark pool).

We now move to the problem of the principal (4.15). A numerical approximation of the optimal incentives $z^\star$ can be obtained by

1. solving (numerically) the static maximization problem (5.1), which provides the approximation of the optimal response $\mathcal{L}^\star$ of the market maker,
2. plugging this approximation into the continuous (with respect to $\mathcal{L}^\star$) version of the Hamilton-Jacobi-Bellman equation (4.15), that is to say:

$$\begin{cases} \partial_t v(t,q) + H_c\big(q,\mathcal{L}^\star,v(t,\cdot)\big) = 0, & q\in[-\bar q,\bar q],\ t\in[0,T), \\ v(T,q) = -1, \end{cases}$$

with

$$H_c\big(q,\mathcal{L}^\star,v(t,\cdot)\big) := \sup_{z\in\mathbb{R}^5} U_c\big(z,q,\mathcal{L}^\star(z,q),v(t,\cdot)\big),$$

and, abusing the notation with $\mathcal{L}^\star$ denoting $\mathcal{L}^\star(z,q)$,

$$\begin{aligned} U_c\big(z,q,\mathcal{L}^\star,v(t,\cdot)\big) := {} & v(t,q)\Big(\frac{\eta\sigma^2\gamma}{2}\big(z^{\tilde S}+q\big)^2+\frac{\eta^2\sigma^2}{2}\big(z^{\tilde S}\big)^2\Big) \\ & + \sum_{\substack{i\in\{a,b\} \\ j\in\{l,d\}}}\lambda^{i,j}(\mathcal{L}^{\star l})\Big(\exp\big(\eta(z^{i,j}-c^j\ell^{\star i,j})\big)\,v\big(t,q-\phi(i)\ell^{\star i,j}\big) - v(t,q)\big(1+\eta\,\mathcal{E}(z^{i,j},\ell^{\star i,j})\big)\Big). \end{aligned}$$

We obtain explicitly $z^{\tilde S} = -\frac{\gamma}{\gamma+\eta}\,q$, so we are only interested in finding the optimal $(z^{a,l},z^{b,l},z^{a,d},z^{b,d})$.
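The closed form for $z^{\tilde S}$ can be checked numerically. The sketch below assumes that the part of $U_c$ depending on $z^{\tilde S}$ is $v(t,q)$ times the bracket $\frac{1}{2}\eta\gamma\sigma^2(z^{\tilde S}+q)^2+\frac{1}{2}\eta^2\sigma^2(z^{\tilde S})^2$, and that $v(t,q)<0$ (as at the terminal time, where $v = -1$), so the supremum is attained where the bracket is smallest. The parameter values and function names are hypothetical.

```python
def diffusion_bracket(z_s, q, gamma, eta, sigma):
    """Bracket multiplying v(t, q) in U_c: the only part that depends on z^S."""
    return (0.5 * eta * gamma * sigma ** 2 * (z_s + q) ** 2
            + 0.5 * eta ** 2 * sigma ** 2 * z_s ** 2)


def z_s_closed_form(q, gamma, eta):
    """First-order condition gamma * (z + q) + eta * z = 0, solved for z."""
    return -gamma / (gamma + eta) * q


# Hypothetical parameter values, for illustration only.
gamma, eta, sigma, q = 0.01, 1.0, 0.3, 50.0
z_star = z_s_closed_form(q, gamma, eta)

# Brute-force check on a grid centred at the closed-form value: since v < 0,
# maximising v * bracket amounts to minimising the bracket.
grid = [z_star + 0.01 * k for k in range(-500, 501)]
z_grid = min(grid, key=lambda z: diffusion_bracket(z, q, gamma, eta, sigma))
```

Removing $z^{\tilde S}$ in this way leaves the four incentives $(z^{a,l},z^{b,l},z^{a,d},z^{b,d})$ as the only controls to be learned.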
The classical method to solve the above problem is to obtain an approximation of the value function via a finite difference scheme on a grid. Since the size of the grid increases exponentially with the number of dimensions, this approach is not feasible in high dimension. Therefore, to address our five-dimensional optimization problem, we resort to neural networks.

We use an algorithm known in the reinforcement learning literature as the actor-critic method. The core of this approach is the representation of the value function and of the optimal controls with deep neural networks. The learning procedure itself consists of two stages: value function update (also called critic update) and controls update (actor update).

We first split our problem into sub-problems corresponding to different time steps. We consider a time step $\Delta t$. The first-order approximation of the value function at time $t$ gives

$$v(t,\cdot)\approx v(t+\Delta t,\cdot)-\Delta t\,\partial_t v(t+\Delta t,\cdot).$$

At each time step $t$, we represent the value function and the incentives with neural networks. Our procedure is backward in time, and we start from $T-\Delta t$, recalling that $v(T,\cdot) = -1$.

Let us fix $t\in[0,T-\Delta t]$. The value function at time $t$ is represented by $v_t[\omega^v_t](\cdot)$, a feedforward neural network parameterized by weights $\omega^v_t$, which approximates the value function corresponding to the current set of incentives approximated by the neural network $z_t[\omega^z_t](\cdot)$, parametrized by $\omega^z_t$. The critic's network is composed of 2 hidden layers with 20 nodes in each of these layers, with ELU activation functions. The final layer of the network contains one output, and the activation is affine. The actor's network is composed of 2 hidden layers with 20 nodes in each of these layers, with ELU activation functions. The final layer of the network contains four outputs, and the activation is tanh (after multiplication by $\bar z$, this keeps the output between $-\bar z$ and $\bar z$). The first step is the following update of the value function network's weights $\omega^v_t$:

$$\omega^v_t \leftarrow \omega^v_t + \frac{\mu^v}{K}\sum_{k=1}^K \nabla_{\omega^v_t}v_t[\omega^v_t](q_k)\Big(v_{t+\Delta t}[\omega^v_{t+\Delta t}](q_k) + U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega^l],v_{t+\Delta t}[\omega^v_{t+\Delta t}](\cdot)\big) - v_t[\omega^v_t](q_k)\Big),$$

where $\mu^v$ is a learning rate and $U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega^l],v_t[\omega^v_t](\cdot)\big)$ corresponds to the function under the supremum of the Hamiltonian (4.16), calculated using the current controls $z_t[\omega^z_t]$. The quantities $q_k$, $k\in\{1,\dots,K\}$, are the elements of the training set, more precisely $K$ uniformly distributed elements from the interval $[-\bar q,\bar q]$. Since the HJB equation gives $\partial_t v = -H_c$, we use $U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega^l],v_{t+\Delta t}[\omega^v_{t+\Delta t}](\cdot)\big)$ as an approximation of $-\partial_t v(t+\Delta t,\cdot)$ to apply the first order approximation of the value function described above.

When the value function's neural network approximates the value function corresponding to the current control $z_t[\omega^z_t]$, we can move to the stage of optimization over control values (also called policy update in the reinforcement learning literature).
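The critic step can be sketched in a few lines of Python. To keep the example self-contained, the critic network is replaced by a hypothetical two-parameter model $v_t[\omega](q) = \omega_0 + \omega_1 q^2$ on normalised inventories, and $U_c$ under the current incentives is replaced by a toy function; both are assumptions of this sketch, not the paper's choices. Only the structure of the update (gradient of the critic times the gap to the target $v_{t+\Delta t} + \Delta t\,U_c$) mirrors the text.

```python
import random


def critic_features(q):
    # Hypothetical stand-in for the critic network: v_t[w](q) = w0 + w1 * q^2,
    # so that grad_w v_t[w](q) = (1, q^2).
    return (1.0, q * q)


def critic_value(w, q):
    return sum(wi * fi for wi, fi in zip(w, critic_features(q)))


def critic_epoch(w, v_next, u_c, samples, lr, dt=1.0):
    """One batch update: w <- w + lr / K * sum_k grad_w v(q_k) * (target_k - v(q_k))."""
    grad = [0.0] * len(w)
    for q in samples:
        target = v_next(q) + dt * u_c(q)              # first-order backward step
        error = target - critic_value(w, q)
        for j, fj in enumerate(critic_features(q)):   # grad_w v(q) = features(q)
            grad[j] += fj * error
    return [wi + lr * gj / len(samples) for wi, gj in zip(w, grad)]


def v_terminal(q):
    return -1.0                                       # v(T, q) = -1


def toy_u_c(q):
    return -0.1 * q * q                               # hypothetical U_c, incentives fixed


rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(2000):
    batch = [rng.uniform(-1.0, 1.0) for _ in range(64)]   # normalised inventories
    w = critic_epoch(w, v_terminal, toy_u_c, batch, lr=0.5)
```

With these toy choices the target $-1 - 0.1\,q^2$ is exactly representable, so the weights should approach $(-1, -0.1)$; with a neural critic the same update is applied to the network weights through backpropagation.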
Our policy update consists of two different procedures. The first one is an exploitation phase where the weights are updated according to the best direction suggested by the gradient of the function $U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega^l],v_t[\omega^v_t](\cdot)\big)$:

$$\omega^z_t \leftarrow \omega^z_t + \frac{\mu^z}{K}\sum_{k=1}^K \nabla_{\omega^z_t}z_t[\omega^z_t](q_k)\,\nabla_{z_t}U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega^l],v_t[\omega^v_t](\cdot)\big),$$

where $\mu^z$ is a learning rate. This type of update is usually called policy gradient.

Another type of update we use in the learning procedure is an exploration phase. During this phase, we take the current values given by the neural network of controls and add noise to them, in order to explore values slightly different from those proposed by the neural network. The noise is normally distributed around 0 with a standard deviation chosen beforehand (in the following examples, we use the standard normal distribution). This phase can help to escape a local optimum, in case the algorithm is trapped in one. The following updates characterize this phase:

$$\omega^z_t \leftarrow \omega^z_t + \frac{\hat\mu^z}{K}\sum_{k=1}^K \varepsilon_k\,\nabla_{\omega^z_t}z_t[\omega^z_t](q_k)\Big(U_c\big(z_t[\omega^z_t](q_k)+\varepsilon_k,q_k,l^\star[\omega^l],v_t[\omega^v_t](\cdot)\big) - U_c\big(z_t[\omega^z_t](q_k),q_k,l^\star[\omega^l],v_t[\omega^v_t](\cdot)\big)\Big),$$

where $\varepsilon$ is a vector of length $K$ representing the introduced perturbations and $\hat\mu^z$ is a learning rate.

5.2 Numerical Results

In the following we consider $\Delta t = 1$. Since time has little impact on the quotes chosen by the market maker (see [10, 12]), we present the results only for time $T-1$; the extension to earlier time steps is straightforward. As mentioned before, the optimization problems considered are symmetric with respect to the inventory variable $q$.

First, we present a reference model without the intervention of the exchange. We consider the following parameters:

• Risk aversion of the market maker: $\gamma$ = 0 .
• Market impacts: $\Gamma^l$ = 10 − , $\Gamma^d$ = 5 × − ;
• Influence of the imbalance on the orders arrival: $\theta^l = \theta^d$ = 0 .
• Volatility: $\sigma$ = 0 .
• Fees: $c^l$ = 0 . , $c^d$ = 0 .
• Order flow intensity parameter: $A^l$ = 5 × , $A^d$ = 3 × .

In Figure 3, we present the optimal quotes of the market maker.

[Plot: optimal volumes (ask lit, bid lit, ask dark, bid dark) against inventory, from −300 to 300.]

Figure 3: Optimal quotes of the market maker.

One can see that the market maker splits his orders equitably between the lit and dark pools when his inventory is near zero. However, when he has a very positive (resp. negative) inventory, he creates a large imbalance on the ask (resp. bid) side of the lit pool, in order to liquidate his position in the dark pool. Such behavior shows that the market maker uses the dark pool as a way to liquidate a large position by adjusting the imbalance in the lit pool. Indeed, when he posts a high volume on the ask side of the lit pool, he encourages ask orders in the dark pool. Thus, as he prioritizes the execution of a large ask order, he accepts being executed at the mid-price in the dark pool. When $q = 300$, he does not post a sell order of size 300 in the dark pool, because of the quadratic variation between the mid-price and his inventory process (which can be seen as a quadratic penalty in the market maker's PnL process with respect to the volumes displayed). Because of the latency generated on the ask side of the lit pool, the market takers sending market orders on the bid side of the dark pool are likely to be executed at an unfavorable price. This is why the market maker posts a non-zero volume on the bid side of the dark pool. Remark also that for small inventories, the market maker posts volumes on both ask and bid sides of the dark pool because he may accept to increase his inventory risk by being executed at a more favorable price in the dark pool due to the latency effect (the volumes displayed in the lit pool lead to a 50 percent chance of facing this effect on at least one of the sides of the dark pool). Note that the parameters $A^l > A^d$ describe the fact that there are, on average, many more orders in the lit pool than in the dark pool.

In the following sections, we present several numerical experiments involving the incentive policy of the exchange.

In this section, we present a reference model with the exchange. We take the same parameters as in the case without the exchange, and we set the exchange's risk aversion: $\eta$ = 0 .

[Plot: optimal volumes (ask lit, bid lit, ask dark, bid dark) against inventory, from −300 to 300.]

Figure 4: Optimal quotes of the market maker.

[Plot: incentives (ask lit, bid lit, ask dark, bid dark) against inventory, from −300 to 300.]

Figure 5: Optimal incentives of the exchange.

The presence of incentives has significant effects on the market maker's behavior. When the market maker has an inventory near zero, incentives lead to an increase of the volumes posted in the lit pool and a decrease of those in the dark pool compared to Figure 3. Thus the exchange improves the liquidity in the lit venue. Moreover, the strategy of the market maker for very positive or negative inventory is modified. When he has a very positive inventory, he posts a higher volume on the ask side of the dark pool than in the case without the exchange. In addition, he posts equal volumes (small but not negligible) on the ask and bid sides of the lit pool. So we see that the exchange prevents the market maker from artificially manipulating the market by creating a high imbalance on the ask side. As the imbalance is around 1/2, the market maker does not take advantage of the latency effect.

Footnote: This assumption is consistent with the MIFID II regulation rolled out on January 3, 2018, which imposes a cap on volumes traded in the dark pools.

In Figure 5, we see that, even if our problem is much more intricate than those of [3, 10], the principal's incentives are essentially linear functions of the market maker's inventory.

We now investigate the impact of higher volatility on the posted volumes with and without the exchange. We take $\sigma = 0.4$, the other parameters being as previously.

[Plot against inventory, from −300 to 300.]