Algorithmic trading in a microstructural limit order book model

Abstract

We propose a microstructural modeling framework for studying optimal market making policies in a FIFO (first in first out) limit order book (LOB). In this context, the arrivals of limit orders, market orders, and cancel orders in the LOB are modeled as Cox point processes with intensities that only depend on the state of the LOB. These are high-dimensional models which are realistic from a micro-structure point of view and have been recently developed in the literature. In this context, we consider a market maker who stands ready to buy and sell stock on a regular and continuous basis at a publicly quoted price, and identify the strategies that maximize her P&L penalized by her inventory. We apply the theory of Markov Decision Processes and the dynamic programming method to characterize analytically the solutions to our optimal market making problem. The second part of the paper deals with the numerical aspect of the high-dimensional trading problem. We use a control randomization method combined with a quantization method to compute the optimal strategies. Several computational tests are performed on simulated data to illustrate the efficiency of the computed optimal strategy. In particular, we simulated an order book with constant, symmetric, asymmetric, and state-dependent intensities, and compared the computed optimal strategy with naive strategies. Some codes are available on this https URL



Frédéric Abergel ∗, Côme Huré †, Huyên Pham ‡

May 3, 2019


Keywords: Limit order book, pure-jump controlled process, high-frequency trading, high-dimensional stochastic control, Markov Decision Process, quantization, local regression

∗ MICS Laboratory - CentraleSupelec, frederic.abergel at ecp.fr
† LPSM - Paris 7 Diderot University, hure at lpsm.paris
‡ LPSM - Paris 7 Diderot University and CREST - ENSAE, pham at lpsm.paris

Most markets use a limit order book (order book) mechanism to facilitate trade. Any market participant can interact with the order book by posting either market orders or limit orders. In this type of market, the market makers play a fundamental role by providing liquidity to other market participants, typically to impatient agents who are willing to cross the bid-ask spread. The profit made by a market making strategy comes from the alternation of buy and sell orders.

From the mathematical modeling point of view, the market making problem corresponds to the choice of an optimal strategy for the placement of orders in the order book. Such a strategy should maximize the expected utility function of the wealth of the market maker up to a penalization of her inventory. In the recent literature, several works focused on the problem of market making through stochastic control methods. The seminal paper by Avellaneda and Stoikov [AS07], inspired by the work of Ho and Stoll [HS79], proposes a framework for trading in an order-driven market. They modeled a reference price for the stock as a Wiener process, and the arrival of a buy or sell liquidity-consuming order at a distance δ from the reference price is described by a point process with an intensity in an exponential form decreasing with δ. They characterized the optimal market making strategies that maximize an exponential utility function of terminal wealth. Since this paper, other authors have worked on related market making problems. Gueant, Lehalle, and Fernandez-Tapia [GLFT12] generalized the market making problem of [AS07] by dealing with the inventory risk. Cartea and Jaimungal [CJ13] also designed algorithms that manage inventory risk. 
Fodra and Pham [FP15b] and [FP15a] considered a model designed to be a good compromise between accuracy and tractability, where the stock price is driven by a Markov Renewal Process, and solved the market making problem. Guilbaud and Pham [GP13] also considered a model for the mid-price, modeled the spread as a discrete Markov chain that jumps according to a stochastic clock, and studied the performance of the market making strategy both theoretically and numerically. Cartea and Jaimungal [CJ10] employed a hidden Markov model to examine the intra-day changes of dynamics of the order book. Very recently, Cartea, Penalva, and Jaimungal [CPJ15] and Gueant [Gu

6] published monographs in which they developed models for algorithmic trading in different contexts. Abergel and El Aoud [EAA15] extended the framework of Avellaneda and Stoikov to options market making. A common feature of all these works is that a model for the price and/or the spread is considered, and the order book is then built from these quantities. This approach leads to models that predict well the long-term behavior of the order book. The reason for this choice is that it is generally easier to solve the market making problem when the controlled process is low-dimensional.

Yet, some recent works have introduced accurate and sophisticated micro-structural order book models. These models reproduce accurately the short-term behavior of the market data. The focus is on conditional probabilities of events, given the state of the order book and the positions of the market maker. Abergel, Anane, Chakraborti, Jedidi, and Muni Toke [Abe+16] proposed models of the order book where the arrivals of orders are driven by Poisson processes or Hawkes processes. Stoikov, Talreja, and Cont [CST07] also modeled the order arrivals with Poisson processes. Lehalle, Rosenbaum and Huang [HLR15] proposed a queue-reactive model for the order book. In this model the arrivals of orders are driven by Cox point processes with intensities that only depend on the state of the order book (they are not time dependent). Other tractable dynamic models of order-driven markets are available (see e.g. Stoikov, Talreja, and Cont [CST07], Rosu [Ros08], Cartea, Jaimungal, Ricci [CJR14]).

In this paper we adopt the micro-structural model of order book in [Abe+16], and solve the associated trading problem. The problem is formulated in the general framework of Piecewise Deterministic Markov Decision Process (PDMDP), see Bauerle and Rieder [BR11]. Given the model of order book, the PDMDP formulation is natural. 
Indeed, between two jumps, the order book remains constant, so one can see the modeled order book as a point process where the time becomes a component of the state space. As for the control, the market maker fixes her strategy as a deterministic function of the time right after each jump time. We prove that the value function of the market making problem is equal to the value function of an associated infinite-horizon Markov decision process (MDP). This provides a characterization of the value function in terms of a fixed point dynamic programming equation. Jacquier and Liu in [JL18] recently followed a similar idea to solve an optimal liquidation problem, while Baradel et al. [BBEM18] and Lehalle et al. [LOR18] also tackled this problem of reward functional maximization in a micro-structure model of order book framework.

The second part of the paper deals with the numerical simulation of the value functions. The computation is challenging because the micro-structural model used to model the order book leads to a high-dimensional pure jump controlled process, so evaluating the value function is computationally intensive. We rely on control randomization and Markovian quantization methods to compute the value functions. Markovian quantization has been proved to be very efficient for solving control problems associated with high-dimensional Markov processes. We first quantize the jump times and then quantize the state space of the order book. See Pages, Pham, Printemps [PPP04] for a general description of quantization applied to controlled processes. The projections are time-consuming in the algorithm, but fast approximate nearest neighbors algorithms (see e.g. [ML09]) can be implemented to alleviate the procedure. We borrow the values of intensities of the arrivals of orders for the order book simulations from Huang et al. [HLR15] in order to test our optimal trading strategies.

The paper is organized as follows. 
The model setup is introduced in Section 2: we present the micro-structural model for the order book, and show how the market maker interacts with the market. In Section 3, we prove the existence and provide a characterization of the value function and optimal trading strategies. In Section 4, we introduce a quantization-based algorithm to numerically solve a general class of discrete-time control problems with finite horizon, and then apply it to our trading problem. We then present some results of numerical tests on simulated order books. Section 5 presents an extension of our model when order arrivals are driven by Hawkes processes, and finally the appendix collects some results used in the paper.

We consider a model of the order book inspired by the one introduced in chapter 6 of [Abe+16]. Fix $K \geq$ . An order book is supposed to be fully described by $K$ limits on the bid side and $K$ limits on the ask side. Denote by $pa_t$ the best ask at time $t$, which is the cheapest price at which a participant in the market is willing to sell a stock at time $t$, and by $pb_t$ the best bid at time $t$, which is the highest price at which a participant in the market is willing to buy a stock at time $t$. We use the pair of vectors $(a_t, b_t) = (a^1_t, \dots, a^K_t, b^1_t, \dots, b^K_t)$, where

• $a^i_t$ is the number of shares available $i$ ticks away from $pb_t$,
• $-b^i_t$ is the number of shares available $i$ ticks away from $pa_t$,

to describe the order book. The vectors $a_t$ and $b_t$ describe the ask and the bid sides at time $t$. The quantities $a^i_t$, $1 \leq i \leq K$, live in the discrete space $q\mathbb{N}$, where $q \in \mathbb{R}^*$ is the minimum order size on each specific market (lot size). The quantities $b^i_t$, $1 \leq i \leq K$, live in the discrete space $-q\mathbb{N}$. By convention, the $a^i$ are non-negative, and the $b^i$ are non-positive for $1 \leq i \leq K$. The tick size $\epsilon$ represents the smallest interval between price levels.

In the sequel we assume that the order arrivals have the same size $q = 1$, and set the tick size to $\epsilon = 1$ for simplicity.

Constant boundary conditions are imposed outside the moving frame of size $K$ in order to guarantee that both sides of the LOB are never empty: we assume that all the limits beyond the $K$-th ones are equal to $a_\infty$ in the ask side, and equal to $b_\infty$ in the bid side.

We shall assume some conditions on the structure of the order arrivals in the order book.

(Harrivals) The order arrivals from general market participants (market orders, limit orders and cancel orders) occur according to Markov jump processes whose intensities only depend on the state of the order book. Moreover, we assume that all the intensities are at most linear w.r.t. 
the couple $(a, b)$ and are constant between two events.

Under (Harrivals), let us define

• $\lambda^{M^+}$ the intensity of the buy-to-market orders flow $M^+_t$,
• $\lambda^{M^-}$ the intensity of the sell-to-market orders flow $M^-_t$,
• $\lambda^{L^+_i}$, $i \in \{0, \dots, K\}$, the intensity of the sell orders flow $L^+_i$ at the $i$-th limit of the ask side,
• $\lambda^{L^-_i}$, $i \in \{0, \dots, K\}$, the intensity of the buy orders flow $L^-_i$ at the $i$-th limit of the bid side,
• $\lambda^{C^+_i}$, $i \in \{0, \dots, K\}$, the intensity of the cancel orders flow $C^+_i$ at the $i$-th limit of the ask side,
• $\lambda^{C^-_i}$, $i \in \{0, \dots, K\}$, the intensity of the cancel orders flow $C^-_i$ at the $i$-th limit of the bid side,

and let $\lambda_L, \lambda_C, \lambda_M$ be such that
$$\sum_{i=0}^{K} \lambda^{L^\pm_i}(z) \leq \lambda_L \big(|a| + |b|\big), \qquad \sum_{i=0}^{K} \lambda^{C^\pm_i}(z) \leq \lambda_C \big(|a| + |b|\big), \qquad \lambda^{M^-}(z) + \lambda^{M^+}(z) \leq \lambda_M \big(|a| + |b|\big),$$
for all states $(a, b)$ of the LOB. We recall that $\lambda_L, \lambda_C, \lambda_M$ are well-defined under assumption (Harrivals).

Remark 2.1.

The linear conditions on the intensities are required to prove that the control problem is well-posed.

Remark 2.2.

We generalize the structure of the order arrivals in Section 5 by modeling them as Hawkes processes with exponential kernel.
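As an illustration of assumption (Harrivals), the sketch below simulates a toy book reduced to one queue per side. Because the intensities depend only on the current state and stay constant between two events, the waiting time to the next event is exponential with rate equal to the sum of the intensities, and the event type is drawn with probability proportional to its own intensity. The function name `simulate_book`, the one-level state `(a, b)` and the three rate parameters are hypothetical choices made for this example; they are not the calibrated intensities used in the paper.

```python
import random


def simulate_book(a, b, horizon, seed=0,
                  lam_limit=1.0, lam_cancel=0.2, lam_market=0.5):
    """Simulate a toy one-level order book up to time `horizon`.

    a, b: volumes waiting at the best ask and best bid (non-negative ints).
    The rate parameters are illustrative values (assumption of this sketch).
    Returns a list of (time, event, a, b) tuples, one per event.
    """
    rng = random.Random(seed)
    t, path = 0.0, []
    while True:
        # Intensities are functions of the current state (a, b) only.
        rates = {
            "L+": lam_limit,                       # new sell limit order
            "L-": lam_limit,                       # new buy limit order
            "C+": lam_cancel * a,                  # cancellation on the ask side
            "C-": lam_cancel * b,                  # cancellation on the bid side
            "M+": lam_market if a > 0 else 0.0,    # buy market order hits the ask
            "M-": lam_market if b > 0 else 0.0,    # sell market order hits the bid
        }
        total = sum(rates.values())
        # Rates are constant between events: exponential waiting time.
        t += rng.expovariate(total)
        if t > horizon:
            return path
        # Event type chosen with probability rate / total.
        event = rng.choices(list(rates), weights=list(rates.values()))[0]
        if event == "L+":
            a += 1
        elif event == "L-":
            b += 1
        elif event in ("C+", "M+"):
            a -= 1
        else:
            b -= 1
        path.append((t, event, a, b))
```

Calling `simulate_book(5, 5, horizon=10.0, seed=1)` returns the event history of one trajectory; events with zero intensity (for example a cancellation on an empty queue) are never drawn, so the queues stay non-negative.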

We provide in Figure 1 a graphical representation of an LOB that may help to get more familiar with the introduced notations.

[Figure 1: order book diagram. Axes: price; volume and rank in the queues. Labels: ask side, bid side, pa, pb, pa+2ε, bounding conditions, new limit order, cancellation of limit order, buy-to-market order.]

Figure 1 – Order book dynamics: in this example, K = 3, q = 1, a∞ = 4, b∞ = − , a = (8, , ), b = (− , − , − ). The spread is equal to 1.

At any time, the order book can receive limit orders, market orders or cancel orders. We assume that the market is governed by a FIFO (First In First Out) rule, which means that each limit of the order book is a queue where the first order in the queue is the first one to be executed. We consider a market maker who stands ready to send buy and sell limit orders on a regular and continuous basis at quoted prices. A usual assumption in stochastic control, in order to characterize the value function as solution of an HJB equation, is to constrain the control space to be compact. In this spirit, we shall make the following assumption on the market maker's decisions.

(Hcontrol)

Assume that at any time, the total number of limit orders placed by the market maker does not exceed a fixed (possibly large) integer $\bar{M}$.

The market maker can choose at any time to keep, cancel or take positions in the order book (as long as she does not hold more than $\bar{M}$ positions in the order book). Her positions are fully described by the following $\bar{M}$-dimensional vectors $ra_t$, $rb_t$, $na_t$, $nb_t$, where $ra$ (resp. $rb$) records the limits in which the market maker's sell (resp. buy) orders are located; and $na$ (resp. $nb$) records the ranks in the queues of each market maker's sell (resp. buy) orders. In order to guarantee that the strategy of the market maker is predictable w.r.t. the natural filtration generated by the order arrival processes, we shall make the following assumption.

(Harrivals2) The intensities do not depend on the control. Moreover, the market maker does not cross the spread.

We discuss in the Appendix how to control the intensity, by transferring the control on the probability measure, see Section A. To simplify the theoretical analysis, we also make the following assumption:

(Harrivals3)

Assume that the market maker does not change her strategy between two order arrivals of the order book. In other words, the market maker makes a decision right after one of the order arrival processes $L^\pm$, $C^\pm$, $M^\pm$ jumps, and keeps it until the next jump of an order arrival.

Note that assumption (Harrivals3) is mild if the order book jumps frequently, since the market maker can change her decisions frequently in such a case.

We provide in Figure 2 a graphical representation of the controlled LOB. Notice that the market maker interacts with the order book by placing orders at some limits. The latter have ranks that evolve after each order arrival.

Denote by $(T_n)_{n \in \mathbb{N}}$ the sequence of jump times of the order book. We denote by $\mathbb{A}$ the set of the admissible strategies, defined as the predictable processes $(ra_t, rb_t)_{t \leq T}$ such that:

• for all $n \in \mathbb{N}$, $(ra_t, rb_t) \in \{0, \dots, K\}^{\bar{M}} \times \{0, \dots, K\}^{\bar{M}}$ are constant on $(T_n, T_{n+1}]$,
• $ra_*, rb_* \geq a_{[\dots]}$ where, for every vector $a$: $a_* = \min_{[\dots] \leq i \leq K} \{a_i \text{ s.t. } a_i \neq -1\}$; and: $a_{[\dots]} = [\dots]_{\,[\dots] \leq i \leq K}\, (a_i \text{ s.t. } a_i > 0)$.

The control is the double vector of the positions of the $\bar{M}$ market maker's orders in the order book. By convention, we set $ra_i(t) = -1$ if the $i$-th market maker's order is not placed in the order book.

[Figure 2: diagram of the controlled order book. Axes: price; volume and rank in the queues. Labels: buy orders of the market maker, sell orders of the market maker, new market maker sell order.]

Figure 2 – Example of market maker's placements and decisions she might make. In this example: her positions are ra = (0, , − , ...), rb = (0, , − ...). The ranks vectors associated are na = (2, , − ...) and nb = (4, , − , ...). After each order arrival, she can send new limit orders, cancel some positions, or just keep the latter unchanged. 
We describe the controlled order book by the following state process $Z$:
$$Z_t := \big(X_t, Y_t, a_t, b_t, na_t, nb_t, pa_t, pb_t, ra_t, rb_t\big),$$
where, at time $t$:

• $X_t$ is the cash held by the market maker on a zero interest account.
• $Y_t$ is the inventory of the market maker, i.e. it is the (signed) number of shares held by the market maker.
• $pa_t$ is the ask price, i.e. the cheapest price at which a general market participant is willing to sell stock.
• $pb_t$ is the bid price, i.e. the highest price at which a general market participant is willing to buy stock.
• $a_t = (a_1(t), \dots, a_K(t))$ (resp. $b_t = (b_1(t), \dots, b_K(t))$) describes the ask (resp. bid) side: for $i \in \{1, \dots, K\}$, $a_i(t)$ (resp. $b_i(t)$) is the sum of all the general market participants' sell (resp. buy) orders which are $i$ ticks away from the bid (resp. ask) price.
• $ra_t$ (resp. $rb_t$) describes the market maker's orders in the ask (resp. bid) side: for $i \in \{1, \dots, \bar{M}\}$, $ra_t(i)$ (resp. $rb_t(i)$) is the number of ticks between the $i$-th market maker's sell (resp. buy) order and the bid (resp. ask) price. By convention, we set $ra_t(i) = -1$ (resp. $rb_t(i) = -1$) if the $i$-th sell (resp. buy) order of the market maker is not placed in the order book. As a result $ra_t(i), rb_t(i) \in \{0, \dots, K\} \cup \{-1\}$.
• $na_t$ (resp. $nb_t$) describes the ranks of the market maker's orders in the ask (resp. bid) side. For $i \in \{1, \dots, \bar{M}\}$, $na_t(i) \in \{-1, \dots, |a| + \bar{M}\}$ (resp. $nb_t(i) \in \{-1, \dots, |b| + \bar{M}\}$) is the rank of the $i$-th sell (resp. buy) order of the market maker in the queue. By convention, we assume that $na_t(i) = -1$ (resp. $nb_t(i) = -1$) if the $i$-th sell (resp. buy) order of the market maker is not placed in the order book.

The dynamics of $(Z_t)$ has been computed in the case where the set of admissible strategies is restricted to those where the market maker only makes orders at the two best limits in the bid and ask sides. 
We present the computations in Section B in the Appendix, for the case where the market maker can only send limit orders at the best-bid and best-ask. We only present the numerical results, in Section 4.4, in the case where the market maker can send limit orders at the two best limits in the ask and bid sides.

We denote by $V$ the value function for the market-making problem, defined as follows:
$$V(t, z) = \sup_{\alpha \in \mathbb{A}} \mathbb{E}^{\alpha}_{t,z} \left[ \int_t^T f(\alpha_s, Z_s)\, ds + g(Z_T) \right], \qquad (t, z) \in [0, T] \times E, \tag{3.1}$$
where:

• $\mathbb{A}$ is the set of the admissible strategies, defined in Section 2.2.1.
• $f$ and $g$ are respectively the running and terminal reward functions. A usual definition for $g$ is the market maker's wealth function, possibly with an inventory penalization, i.e. $g : z \mapsto x + L(y) - \eta y$, where $L$ returns the amount earned from the immediate liquidation of the inventory; where $\eta$ is the penalization parameter of the latter; and where we recall that $y$ stands for the (signed) market maker's inventory.
• $\mathbb{E}^{\alpha}_{t,z}$ stands for the expectation conditioned by $Z_t = z$ and when strategy $\alpha = (\alpha_s)_{t \leq s}$ […] if $y = 0$, for all states $z = (x, y, a, b, na, nb, pa, pb, ra, rb)$ of the order book, where:
$$[\dots] = \begin{cases} \min \big\{ j \;\big|\; \sum_{i=0}^{j} a_i > -y \big\} & \text{if } y < 0, \\ \min \big\{ j \;\big|\; \sum_{i=0}^{j} |b_i| > y \big\} & \text{if } y > 0. \end{cases}$$
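To make the terminal reward $g(z) = x + L(y) - \eta \cdot \text{penalty}(y)$ concrete, here is a minimal sketch of a liquidation function that unwinds the inventory with market orders by walking the opposite side of the book, level by level. The exact definition of $L$ is not reproduced in the text above, so this is only one plausible reading: the function names, the use of positive volumes for both sides, the indexing of levels from the best price outwards, and the quadratic default penalty are all assumptions of this example.

```python
def liquidation_value(y, ask_sizes, bid_sizes, pa, pb, tick=1.0):
    """Cash obtained by unwinding inventory y immediately with market orders.

    ask_sizes[i] / bid_sizes[i]: positive volume i ticks beyond the best
    ask / best bid (assumed layout). Raises if the listed depth is too small.
    """
    if y == 0:
        return 0.0
    remaining, cash = abs(y), 0.0
    sizes = bid_sizes if y > 0 else ask_sizes
    for i, size in enumerate(sizes):
        traded = min(remaining, size)
        if y > 0:
            cash += traded * (pb - i * tick)   # long inventory: sell into the bids
        else:
            cash -= traded * (pa + i * tick)   # short inventory: buy from the asks
        remaining -= traded
        if remaining == 0:
            return cash
    raise ValueError("not enough depth to liquidate the inventory")


def terminal_reward(x, y, ask_sizes, bid_sizes, pa, pb, eta,
                    penalty=lambda inv: inv * inv):
    """Wealth after liquidation minus an inventory penalty (assumed quadratic)."""
    liquidation = liquidation_value(y, ask_sizes, bid_sizes, pa, pb)
    return x + liquidation - eta * penalty(y)
```

For instance, with two bid levels of sizes 2 and 5 and a best bid of 100, selling an inventory of 3 fills 2 shares at 100 and 1 share at 99. Passing a different `penalty` callable lets the same sketch cover other penalization shapes.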

We shall assume conditions on the rewards to ensure the well-posedness of the market-making problem.

(Hrewards) The expected running reward is uniformly upper-bounded w.r.t. the strategies in $\mathbb{A}$, i.e.
$$\sup_{\alpha \in \mathbb{A}} \mathbb{E}^{\alpha}_{t,z} \left[ \int_t^T f^+(Z_s, \alpha_s)\, ds \right] < +\infty$$
holds. The terminal reward $g(Z_T)$ is a.s. no more than linear with respect to the number of events up to time $T$, denoted by $N_T$ in the sequel, i.e. there exists a constant $c > 0$ such that $g(Z_T) \leq c\, N_T$, a.s.

Remark 3.1.

Under Assumption (Hcontrol), Assumption (Hrewards) holds when $g$ is defined as the wealth of the market maker plus an inventory penalization. In particular, we have $g(Z_T) \leq N_T \bar{M}$, where $\bar{M}$ is the maximal number of orders that can be sent by the market maker, which holds a.s. since the best profit the market maker can make is when her buy (resp. sell) limit orders are all executed, and then the price keeps going to the right (resp. left) direction. Hence the second condition of (Hrewards) holds with $c = \bar{M}$.

The following Lemma 3.1 tackles the well-posedness of the control problem.

Lemma 3.1.

Under (Hrewards) and (Hcontrol), the value function is well-defined, i.e.
$$\sup_{\alpha \in \mathbb{A}} \mathbb{E}^{\alpha}_{t,z} \left[ g(Z_T) + \int_t^T f(\alpha_s, Z_s)\, ds \right] < +\infty,$$
where, as defined previously, $\mathbb{E}^{\alpha}_{t,z}[\,\cdot\,]$ stands for the expectation conditioned by the event $\{Z_t = z\}$, assuming that strategy $\alpha \in \mathbb{A}$ is followed in $[t, T]$.

Proof. Denote by $(N_t)_t$ the sum of all the arrivals of orders up to time $t$. Under (Hrewards), we can bound $\mathbb{E}^{\alpha}_{t,z}\big[\int_t^T f(\alpha_s, Z_s)\, ds + g(Z_T)\big]$, the reward functional at time $t$ associated to a strategy $\alpha \in \mathbb{A}$, as follows:
$$\mathbb{E}^{\alpha}_{t,z} \left[ \int_t^T f(\alpha_s, Z_s)\, ds + g(Z_T) \right] \leq \sup_{\alpha \in \mathbb{A}} \mathbb{E}^{\alpha}_{t,z}\big[g(Z_T)\big] + \sup_{\alpha \in \mathbb{A}} \mathbb{E}^{\alpha}_{t,z} \left[ \int_t^T f^+(Z_s, \alpha_s)\, ds \right] \leq c \sup_{\alpha \in \mathbb{A}} \mathbb{E}^{\alpha}_{t,0}\big[N_T\big] + \sup_{\alpha \in \mathbb{A}} \mathbb{E}^{\alpha}_{t,z} \left[ \int_t^T f^+(Z_s, \alpha_s)\, ds \right], \tag{3.2}$$
where once again, for every general process $M$ and all $m \in E$, $\mathbb{E}^{\alpha}_{t,m}[M_T]$ stands for the expectation of $M_T$ conditioned by $M_t = m$ and assuming that the market maker follows strategy $\alpha \in \mathbb{A}$ in $[t, T]$. Let us show that the first term in the r.h.s. of (3.2) is bounded. On one hand, we have:
$$\mathbb{E}^{\alpha}_{t,0}\big[N_T\big] \leq \|\lambda\|_\infty \int_0^T \mathbb{E}\big(|a|_t + |b|_t\big)\, dt, \tag{3.3}$$
where $\|\lambda\|_\infty := \lambda_L + \lambda_C + \lambda_M$ is a bound on the intensity rate of $N_t$. On the other hand, there exists a constant $c_1 > 0$ such that $d(|a| + |b|)_t \leq c_1\, dL_t$, so that, for some constant $c_2 > 0$:
$$\mathbb{E}^{\alpha}_{t,|a|+|b|}\big[|a|_t + |b|_t\big] \leq |a| + |b| + c_2 \int_0^t \mathbb{E}\big[|a|_s + |b|_s\big]\, ds.$$
Applying Gronwall's inequality, we then get:
$$\mathbb{E}^{\alpha}_{t,|a|+|b|}\big[|a|_t + |b|_t\big] \leq \big(|a| + |b|\big)\, e^{c_2 t}. \tag{3.4}$$
Plugging (3.4) into (3.3) finally leads to:
$$\mathbb{E}^{\alpha}_{t,0}\big[N_T\big] \leq c_3\, e^{c_4 T}$$
with $c_3$ and $c_4 > 0$ that do not depend on $\alpha$, which proves that the first term in the r.h.s. of (3.2) is bounded. Also, the second term in the r.h.s. of (3.2) is bounded under (Hrewards). 
Hence, the reward functional is bounded uniformly in $\alpha$, which proves that the value function of the considered market-making problem is well-defined.

In this section, we aim first at reformulating the market-making problem as a Markov Decision Process (MDP), and secondly at deriving a characterization of the value function as solution of a Bellman equation. We consider the Markov Decision Process (MDP) characterized by the following information
$$\underbrace{[0, T] \times E}_{\text{state space}}, \quad \underbrace{A_z}_{\text{market maker control}}, \quad \underbrace{\lambda}_{\text{intensity of the jump}}, \quad \underbrace{Q}_{\text{transitions kernel}}, \quad \underbrace{r}_{\text{reward}}$$
such that:

• $E := \mathbb{R} \times \mathbb{N} \times \mathbb{N}^K \times \mathbb{N}^K \times \mathbb{N}^{\bar{M}} \times \mathbb{N}^{\bar{M}} \times \mathbb{N}^{\bar{M}} \times \mathbb{N}^{\bar{M}} \times \mathbb{R} \times \mathbb{R}$ is the state space of $(Z_t)$. For $z \in E$, $z = (x, y, a, b, na, nb, ra, rb, pa, pb)$ where: $x$ is the cash held by the market maker, $y$ her inventory; $na$ (resp. $nb$) is the $\bar{M}$-dimensional vector of the ranks of the market maker's sell (resp. buy) orders in the queues; $ra$ (resp. $rb$) is the $\bar{M}$-dimensional vector of the number of ticks the $\bar{M}$ market maker's sell (resp. buy) orders are from the bid (resp. ask) price; $pa$ (resp. $pb$) is the ask-price (resp. bid-price).
• for every state $z \in E$, denote by $A_z$ the space of the admissible controls, which is the set of all the actions the market maker can take when the order book is at state $z$:
$$A_z = \Big\{ ra, rb \in \{0, \dots, K\}^{\bar{M}} \times \{0, \dots, K\}^{\bar{M}} \;\Big|\; rb_*, ra_* \geq a_{[\dots]} \Big\},$$
where we define $c_* = \min_i \{c_i \mid c_i \neq -1\}$ and $c_{[\dots]} = [\dots]_{\,[\dots] \leq i \leq K} \{c_i > 0\}$ for $c \in \mathbb{N}^{\bar{M}}$. The control is the vectors of positions of the market maker's orders. The condition for the control to be admissible comes from the assumption that the market maker is not allowed to cross the spread. 
• Given a market-making strategy $\alpha$, the stochastic evolution is given by a marked point process $(T_n, Z_n)$ where $(T_n)$ is the increasing sequence of jump times of the controlled order book with intensity $\lambda(Z_{n-1})$. Just after the jump at time $T_n$, the process can jump again, due to the decision of the market maker. Then it remains constant on $]T_n, T_{n+1}[$ since the market maker does not change her strategy between two jumps.

We denote by $\phi^a(z) \in E$ the state of the order book at time $t$ such that $T_n < t < T_{n+1}$, given that $Z_{T_n} = z$ and given that the strategy $a$ has been chosen by the market maker at time $T_n$.

• In the sequel, we denote $\big([0, T] \times E\big)^C := \big\{ (t, z, a) \in [0, T] \times E \times \{0, \dots, K\}^{M} \;\big|\; t \in [0, T],\ z \in E,\ a \in A_z \big\}$, and $E^C := \big\{ (z, a) \in E \times \{0, \dots, K\}^{M} \;\big|\; z \in E,\ a \in A_z \big\}$. $Q'$ is the stochastic kernel from $E^C$ to $E$ that describes the distribution of the jump goals, i.e., $Q'(B \mid z, u)$ is the probability that the order book jumps in the set $B$ given that it was at state $z \in E$ right before the jump, and the control action $u \in A_z$ has been chosen right after the jump time.

An admissible policy $\alpha = (\alpha_t)$ is entirely characterized by decision functions $f_n : [0, T] \times E \to A$ such that
$$\alpha_t = f_n(T_n, Z_n) \quad \text{for } t \in (T_n, T_{n+1}].$$
By abuse of notation, we denote in the sequel by $\alpha$ the sequence of controls $(f_n)_{n=0}^{\infty}$. The intensity of the controlled process $(Z_t)$ is:
$$\lambda(z) := \lambda^{M^+}(z) + \lambda^{M^-}(z) + \sum_{0 \leq j \leq K} \lambda^{L^+_j}(z) + \sum_{0 \leq j \leq K} \lambda^{L^-_j}(z) + \sum_{0 \leq j \leq K} \lambda^{C^+_j}(z) + \sum_{0 \leq j \leq K} \lambda^{C^-_j}(z).$$
It does not depend on the strategy $\alpha$ chosen by the market maker since we assumed that the general participants do not "see" the market maker's orders in the order book. 
The intensity of the order book process only depends on the vectors $a$ and $b$.

The transition kernel of the controlled order book, given a state $z$, is given by:
$$Q'(z' \mid z, u) = \begin{cases} \dfrac{\lambda^{M^+}(z)}{\lambda(z)} & \text{if } z' = e^{M^+}(\phi^u(z)), \\ \quad \vdots \\ \dfrac{\lambda^{C^+_K}(z)}{\lambda(z)} & \text{if } z' = e^{C^+_K}(\phi^u(z)), \end{cases}$$
where $\phi^u(z)$ is the new state of the controlled order book when decision $u$ has been taken and when the order book was at state $z$ before the decision; $e^{M^+}(z)$ is the new state of the order book right after it received a buy market order, given that it was at state $z$ before the jump; and $e^{C^\pm_i}(z)$ is the new state of the order book right after it received a cancel order from a general market participant on its $i$-th ask/bid limit, given that it was at state $z$.

Let us fix an admissible policy $\alpha = (f_n)_{n=0}^{\infty} \in \mathbb{A}$ and take $t \in [0, T]$. Then, for all Borel sets $B$ in $E$, it holds:
$$\mathbb{P}\big(T_{n+1} - T_n \leq t,\ Z_{n+1} \in B \mid T_0, Z_0, \dots, T_n, Z_n\big) = \lambda(Z_n) \int_0^t e^{-\lambda(Z_n) s}\, Q'\big(B \mid Z_{T_n}, \alpha_{T_n}\big)\, ds = \lambda(Z_n) \int_0^t e^{-\lambda(Z_n) s}\, Q'\big(B \mid Z_{T_n}, f_n(T_n, Z_n)\big)\, ds,$$
so that the stochastic kernel $Q$ of the MDP is defined as follows:
$$Q\big(B \times C \mid t, z, \alpha\big) := \lambda(z) \int_0^{T - t} e^{-\lambda(z) s}\, \mathbf{1}_B(t + s)\, Q'\big(C \mid \phi^{\alpha}(z), \alpha\big)\, ds + e^{-\lambda(z)(T - t)}\, \mathbf{1}_{T \in B,\, z \in C},$$
for all $B \subset \mathbb{R}_+$ and $C \subset E$, for all $(t, z) \in [0, T] \times E$, and for all $\alpha \in A$. Note that we restrict ourselves to the feedback controls here.

We denote by $(T_n, Z_n)_{n \in \mathbb{N}}$ the corresponding state of the controlled Markov chain. 
It remains to define the value function of this reformulated control problem.

Let $r$ be the running reward function $r : [0, T] \times E^C \to \mathbb{R}$ defined as:
$$r(t, z, a) := -c(z, a)\, e^{-\lambda(z)(T - t)} (T - t)\, \mathbf{1}_{t > T} + c(z, a) \left( \frac{1}{\lambda(z)} - \frac{e^{-\lambda(z)(T - t)}}{\lambda(z)} \right) + e^{-\lambda(z)(T - t)}\, g(z)\, \mathbf{1}_{t \leq T}, \tag{3.5}$$
and let us define the cumulated reward functional associated to the discrete-time Markov Decision Model for an admissible policy $(f_n)_{n=0}^{\infty}$ as:
$$V_{\infty, (f_n)}(t, z) = \mathbb{E}^{(f_n)}_{t,z} \left[ \sum_{n=0}^{\infty} r\big(T_n, Z_n, f_n(T_n, Z_n)\big) \right].$$
The value function associated to $(T_n, Z_n)_{n \in \mathbb{N}}$ is then defined as the supremum of the cumulated reward functional over all the admissible controls in $\mathbb{A}$, i.e.
$$V_{\infty}(t, z) = \sup_{\alpha = (f_n)_{n=0}^{\infty} \in \mathbb{A}} V_{\infty, \alpha}(t, z), \qquad (t, z) \in [0, T] \times E. \tag{3.6}$$
Notice that we used the same notation for admissible controls of the MDP and those of the continuous-time control problem.

Proposition 3.1.

The value function of the MDP defined by (3.6) coincides with (3.1), i.e. we have for all $(t, z) \in E'$:
$$V_{\infty}(t, z) = V(t, z). \tag{3.7}$$
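One way to read the running reward (3.5) behind this identity is through two elementary facts about an exponential waiting time $\tau$ with rate $\lambda$: the expected time spent in the current state before the next jump or the horizon is $\mathbb{E}[\min(\tau, s)] = (1 - e^{-\lambda s})/\lambda$ with $s = T - t$, and the probability of reaching the horizon without any jump is $e^{-\lambda s}$. The sketch below checks the first identity by Monte Carlo; the function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def expected_sojourn(lam, s):
    """Closed form of E[min(tau, s)] for tau exponential with rate lam."""
    return (1.0 - math.exp(-lam * s)) / lam


def monte_carlo_sojourn(lam, s, n=200_000, seed=0):
    """Monte Carlo estimate of E[min(tau, s)] from n exponential draws."""
    rng = random.Random(seed)
    return sum(min(rng.expovariate(lam), s) for _ in range(n)) / n


def no_jump_probability(lam, s):
    """Probability that no jump occurs during a period of length s."""
    return math.exp(-lam * s)
```

With a constant running reward rate in the current state, multiplying that rate by `expected_sojourn(lam, T - t)` gives the expected reward accumulated before the next event, and weighting the terminal reward by `no_jump_probability(lam, T - t)` accounts for the case where the horizon is reached first.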

Let us show that for all $\alpha = (f_n) \in \mathbb{A}$ and all $(t, z) \in E'$
$$V_{\alpha}(t, z) = V_{\infty, (f_n)}(t, z). \tag{3.8}$$
Let us first denote by $H_n := (T_0, Z_0, \dots, T_n, Z_n)$. Notice then that for all admissible strategies $\alpha$: $V_{\alpha}(t, z) = \mathbb{E}^{\alpha}_{t,z} \big[ \sum_{n=0}^{\infty} \mathbf{1}_{T > T_{n+1}} (T_{n+1} - T_n)\, c(Z_n, \alpha_n) + [\,T_n \leq T$ […] such that: $\forall (z, a) \in E \times A$, $|f(z, a)| \leq c\,(1 + |z|)$.

2. The terminal reward $g$ has no more than a quadratic growth, i.e. there exists $c > 0$ such that: $\forall z \in E$, $|g(z)| \leq c\,(1 + |z|^2)$.

Remark 3.2.

Assumption (HrewardsBis) holds in the case where $g$ is the terminal wealth of the market maker plus a penalization of her inventory, and where there is no running reward, i.e. $f = 0$.

The main result of this section is the following theorem that gives existence and uniqueness of a solution to (3.1), and moreover characterizes the latter as fixed point of the maximal reward operator defined in (3.10).

Theorem 3.1. $\mathcal{T}$ admits a unique fixed point $v$ which coincides with the value function of the MDP. Moreover we have:
$$v = V_{\infty} = V.$$
Denote by $f^*$ the maximizer of the operator $\mathcal{T}$. Then $(f^*, f^*, \dots)$ is an optimal stationary (in the MDP sense) policy.

Remark 3.3. Theorem 3.1 states that the optimal strategy is stationary in the MDP formulation of the problem, but of course, it is not stationary for the original time-continuous trading problem with finite horizon (3.1), since the time component is not a state variable anymore in the original formulation. Actually, given $n \in \mathbb{N}$ and the state of the order book $z$ at that time, the optimal decision to take at time $T_n$ is given by $f^*(T_n, z)$.

We devote the next section to the proof of Theorem 3.1.
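The fixed-point characterization can be illustrated on a small finite example. The sketch below applies a generic maximal reward operator, $(\mathcal{T}v)(s) = \max_a \big[ r(s, a) + \sum_{s'} Q(s' \mid s, a)\, v(s') \big]$, repeatedly until it stops changing, and reads off the stationary maximizer. It is not the operator of the paper, whose definition (3.10) acts on the continuous state space $[0, T] \times E$; the two-state toy model with an absorbing `end` state (standing in for the horizon being reached), the rewards and the transition probabilities are hypothetical values chosen for this example.

```python
def bellman_operator(v, rewards, transitions):
    """Apply the maximal reward operator once; return (T v, greedy policy)."""
    new_v, policy = {}, {}
    for s, actions in rewards.items():
        # Value of each action: immediate reward plus expected continuation.
        q = {a: r + sum(p * v[s2] for s2, p in transitions[s][a].items())
             for a, r in actions.items()}
        policy[s] = max(q, key=q.get)
        new_v[s] = q[policy[s]]
    return new_v, policy


def fixed_point(rewards, transitions, tol=1e-12, max_iter=10_000):
    """Iterate the operator from v = 0 until successive iterates agree."""
    v = {s: 0.0 for s in rewards}
    for _ in range(max_iter):
        new_v, policy = bellman_operator(v, rewards, transitions)
        if max(abs(new_v[s] - v[s]) for s in v) < tol:
            return new_v, policy
        v = new_v
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical toy model: every action leads to the absorbing state "end"
# with probability at least 0.5, which makes the iteration contract.
TOY_REWARDS = {
    "s0": {"keep": 1.0, "cancel": 0.5},
    "s1": {"keep": 0.0, "cancel": 2.0},
    "end": {"stay": 0.0},
}
TOY_TRANSITIONS = {
    "s0": {"keep": {"s1": 0.5, "end": 0.5}, "cancel": {"s0": 0.5, "end": 0.5}},
    "s1": {"keep": {"s0": 0.5, "end": 0.5}, "cancel": {"end": 1.0}},
    "end": {"stay": {"end": 1.0}},
}
```

Running `fixed_point(TOY_REWARDS, TOY_TRANSITIONS)` returns a value function that is left unchanged by one more application of `bellman_operator`, together with one decision per state, which mirrors the stationary policy $(f^*, f^*, \dots)$ of the theorem in this simplified setting.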

Remind ﬁrst that we deﬁned in the previous section E C := (cid:110)(cid:0) z, a (cid:1) ∈ E × { , . . . , K } M (cid:12)(cid:12) z ∈ E, a ∈ A z (cid:111) and (cid:0) [0 , T ] × E (cid:1) C := (cid:110)(cid:0) t, z, a (cid:1) ∈ [0 , T ] × E × { , . . . , K } M (cid:12)(cid:12) t ∈ [0 , T ] , z ∈ E, a ∈ A z (cid:111) .Let us deﬁne the bounding functions: Deﬁnition 3.1.

A measurable function $b : E \to \mathbb{R}_+$ is called a bounding function for the controlled process $(Z_t)$ if there exist positive constants $c_c$, $c_g$, $c_{Q'}$, $c_\phi$ such that:
1. $|f(z,a)| \le c_c\, b(z)$ for all $(z,a) \in E^C$.
2. $|g(z)| \le c_g\, b(z)$ for all $z$ in $E$.
3. $\int b(z')\, Q'(dz' \mid z,a) \le c_{Q'}\, b(z)$ for all $(z,a) \in E^C$.
4. $b(\phi^\alpha_t(z)) \le c_\phi\, b(z)$ for all $(t,z,\alpha) \in \big([0,T] \times E\big)^C$.

Proposition 3.2.

Let $b$ be such that: $\forall z \in E$, $b(z) := 1 + |z|$. Then $b$ is a bounding function for the controlled process $(Z_t)$, under Assumption (HrewardsBis).

Proof. Let us check that $b$ defined in Proposition 3.2 satisfies the four assertions in Definition 3.1.
• Assertions 1 and 2 of Definition 3.1 hold under (HrewardsBis).
• First notice that $ra$, $rb$ are bounded by $\sqrt{\bar M}\, K$ (where we recall that $K$ is the number of limits on each side of the order book, and $\bar M$ is the largest number of limit orders that the market maker is allowed to send to the market). Secondly, $pa' \in B(pa, K)$ and $pb' \in B(pb, K)$, where $B(x,r)$ is the ball centered at $x$ with radius $r > 0$, because of the boundary conditions that we imposed in our LOB model. Lastly, we can see that $|a'| \le |a| + a_\infty K$. These three bounds are linear w.r.t. $z$, so that assertion 3 holds.
• $\phi^\alpha(z) = z^\alpha$ only differs from $z$ by its $na$, $nb$, $ra$ and $rb$ components. But $|na| \le \sqrt{\bar M}\,\big(|a| + \bar M\big)$ and $|nb| \le \sqrt{\bar M}\,\big(|b| + \bar M\big)$ are bounded by a linear function of $(a,b)$, and $|ra|$ and $|rb|$ are bounded by the universal constant $\sqrt{\bar M}\, K$, so assertion 4 in Definition 3.1 holds.

Let us define

Λ := (4 K + 2)sup (cid:40) λ M ± | a | + | b | , λ L ± | a | + | b | , λ C ± | a ( z ) | + | b ( z ) | (cid:41) . Note that Λ is well-deﬁned under (Harrivals) . Proposition 3.3. If b is a bounding function for ( Z t ) , then b ( t, z ) := b ( z ) e γ ( z )( T − t ) , with γ ( z ) = γ (4 K + 2)Λ (cid:0) | a | + | b | (cid:1) and γ > is a bounding function for the MDP, i.e. for all t ∈ [0 , T ] , z ∈ E, a ∈ A z , we have: | r ( t, z, a ) | ≤ c g b ( t, z ) , (cid:90) b ( s, z (cid:48) ) Q ( ds, dz (cid:48) | t, z, a ) ≤ c φ c Q e C ( T − t )

11 + γ b ( t, z ) , with C = γ Λ K (4 K + 2) (cid:0) | a | ∞ + | b | ∞ (cid:1) .Proof. Let z (cid:48) = (cid:0) x (cid:48) , y (cid:48) , a (cid:48) , b (cid:48) , na (cid:48) , nb (cid:48) , ra (cid:48) , rb (cid:48) (cid:1) be the state of the order book after an exogenous jumpoccursn given that it was at state z before the jump. Since | a (cid:48) | ≤ | a | + a ∞ K and | b (cid:48) | ≤ | b | + b ∞ , where a ∞ and b ∞ are deﬁned as the border conditions of the order book, we have: γ ( z (cid:48) ) ≤ γ ( z ) + C, (3.11)with C = γ Λ K (4 K + 2)( a ∞ + b ∞ ) . Then, we get: (cid:90) b ( s, z (cid:48) ) Q ( ds, dz (cid:48) | t, φ α ( z ) , α ) = λ ( z ) (cid:90) T − t e − λ ( z ) s (cid:90) b ( t + s, z (cid:48) ) Q (cid:48) (cid:0) dz (cid:48) | φ αs ( z ) , α (cid:1) d s = λ ( z ) (cid:90) T − t e − λ ( z ) s (cid:90) b ( z (cid:48) ) e γ ( z (cid:48) )( T − ( t + s )) Q (cid:48) (cid:0) dz (cid:48) | φ αs ( z ) , α (cid:1) d s ≤ λ ( z ) (cid:90) T − t e − λ ( z ) s (cid:90) b ( z (cid:48) ) e ( γ ( z )+ C )( T − ( t + s )) Q (cid:48) (cid:0) dz (cid:48) | φ αs ( z ) , α (cid:1) ds ≤ λ ( z ) (cid:90) T − t e − λ ( z ) s e ( γ ( z )+ C )( T − ( t + s )) (cid:90) b ( z (cid:48) ) Q (cid:48) (cid:0) dz (cid:48) | φ αs ( z ) , α (cid:1) ds ≤ λ ( z ) (cid:90) T − t e − λ ( z ) s e ( γ ( z )+ C )( T − ( t + s )) c Q c φ b ( z ) ds ≤ λ ( z ) c Q c φ λ ( z ) + γ ( z ) + C e ( γ ( z )+ C )( T − t ) (cid:16) − e − ( T − t )( λ ( z )+ γ ( z )+ C ) (cid:17) b ( z ) ≤ c Q c φ λ ( z ) λ ( z ) + γ ( z ) + C e C ( T − t ) (cid:16) − e − ( T − t )( λ ( z )+ γ ( z )+ C ) (cid:17) b ( t, z ) , where we applied (3.11) at the thrid line. It remains to notice that λ ( z ) λ ( z ) + γ ( z ) + C = λ ( z ) λ ( z ) (cid:0) γ (cid:1) + γ (cid:2) Λ( | a | + | b | ) − λ ( z ) (cid:124) (cid:123)(cid:122) (cid:125) ≥ (cid:3) ≤

$\frac{1}{1+\gamma}$, to complete the proof of the Proposition.

Let us denote by $\|\cdot\|_b$ the weighted supremum norm such that for all measurable functions $v : E' \to \mathbb{R}$,
\[
\|v\|_b := \sup_{(t,z) \in E'} \frac{|v(t,z)|}{b(t,z)},
\]
and define the set:
\[
B_b := \big\{ v : E' \to \mathbb{R} \;\big|\; v \text{ is measurable and } \|v\|_b < \infty \big\}.
\]
Moreover let us define
\[
\alpha_b := \sup_{(t,z,\alpha) \in E' \times \mathcal{R}} \frac{\int b(s,z')\, Q(ds, dz' \mid t, \phi^\alpha(z), \alpha)}{b(t,z)}.
\]
From the preceding estimations we can bound $\alpha_b$ as follows:
\[
\alpha_b \le c_Q\, c_\phi\, \frac{1}{1+\gamma}\, e^{CT},
\]
so that, by taking $\gamma = c_Q c_\phi e^{CT}$, we get $\alpha_b < 1$. In the sequel, we then assume w.l.o.g. that $\alpha_b < 1$.

Recall that the maximal reward mapping for the MDP has been defined as:
\[
\mathcal{T}v : (t,z) \mapsto \sup_{a \in A_z} \left\{ r(t,z,a) + \lambda(z) \int_0^{T-t} e^{-\lambda(z)s} \int v(t+s, z')\, Q'\big(dz' \mid \phi^a(z), a\big)\, ds \right\}.
\]
It is straightforward to see that:
\[
\|\mathcal{T}v - \mathcal{T}w\|_b \le \alpha_b\, \|v - w\|_b, \tag{3.12}
\]
which implies that $\mathcal{T}$ is a contraction, since $\alpha_b < 1$.

Let $\mathcal{M}$ be the set of all continuous functions in $B_b$. Since $b$ is continuous, $(\mathcal{M}, \|\cdot\|_b)$ is a Banach space. $\mathcal{T}$ sends $\mathcal{M}$ to $\mathcal{M}$. Indeed, for every continuous function $v$ in $B_b$, the map
\[
(t,z,a) \mapsto r(t,z,a) + \lambda(z) \int_0^{T-t} e^{-\lambda(z)s} \int v(t+s, z')\, Q'\big(dz' \mid \phi^a(z), a\big)\, ds
\]
is continuous on $[0,T] \times E^C$. Since $A_z$ is finite, we get the continuity of the map:
\[
\mathcal{T}v : (t,z) \mapsto \sup_{a \in A_z} \left\{ r(t,z,a) + \lambda(z) \int_0^{T-t} e^{-\lambda(z)s} \int v(t+s, z')\, Q'\big(dz' \mid \phi^a(z), a\big)\, ds \right\}.
\]

Proposition 3.4.

There exists a maximizer for $\mathcal{T}$, i.e. for every $v \in \mathcal{M}$ there exists a Borel function $f : [0,T] \times E \to A$ such that for all $(t,z) \in E'$:
\[
\mathcal{T}v\big(t, z, f(t,z)\big) = \sup_{a \in A} \left\{ r(t,z,a) + \lambda(z) \int_0^{T-t} e^{-\lambda(z)s} \int v(t+s, z')\, Q'\big(dz' \mid \phi^a(z), a\big)\, ds \right\}.
\]
Proof. $D^*(t,z) = \big\{ a \in A \;\big|\; \mathcal{T}_a v(t,z) = \mathcal{T}v(t,z) \big\}$ is finite, so it is compact. Hence $(t,z) \mapsto D^*(t,z)$ is a compact-valued mapping. Since the map $(t,z,a) \mapsto \mathcal{T}_a v(t,z) - \mathcal{T}v(t,z)$ is continuous, we get that $D^* = \big\{ (t,z,a) \in E'^{C} \;\big|\; \mathcal{T}_a v(t,z) = \mathcal{T}v(t,z) \big\}$ is Borel. Applying the measurable selection theorem yields the existence of the maximizer (see [BR11] p.352).

Lemma 3.2. The following holds:
\[
\sup_{\alpha \in \mathcal{A}} \mathbb{E}^\alpha_{t,z}\left[ \sum_{k=n}^{\infty} |r(T_k, Z_k)| \right] \le \frac{\alpha_b^n}{1 - \alpha_b}\, b(t,z),
\]
and in particular, we have:
\[
\lim_{n \to \infty} \sup_{\alpha \in \mathcal{A}} \mathbb{E}^\alpha_{t,z}\left[ \sum_{k=n}^{\infty} \big| r(T_k, Z_k) \big| \right] = 0.
\]
Proof.

By conditioning we get $\mathbb{E}^\alpha_{t,z}\big[ |r(T_k, Z_k)| \big] \le c_g\, \alpha_b^k\, b(t,z)$ for $k \in \mathbb{N}$ and for all $\alpha \in \mathcal{A}$. It remains to sum this inequality to complete the proof of Lemma 3.2.

We can now prove Theorem 3.1.

Proof.

We divide the proof of Theorem 3.1 into four steps.

Step 1:

Inequality (3.12) and Proposition 3.3 imply that T is a stable and contracting operator deﬁned onthe Banach space M . Banach’s ﬁxed point theorem states that T admits a ﬁxed point, i.e. there existsa function v ∈ M such that v = T v , and moreover we have v = lim n →∞ T n . Notice that T N coincideswith v deﬁned recursively by the following Bellman equation: (cid:26) v N = 0 v n = T v n +1 for n = N − , ..., . (3.13)The solution of the Bellman equation is always larger than the value function of the MDP associated(see e.g. Theorem 2.3.7 p.22 in [BR11]). Then we have: T n ≥ sup ( f k ) E ( f k ) n (cid:20) (cid:80) n − k =0 r ( t k , X k ) (cid:21) =: J n , where J n is the value function of the MDP with ﬁnite horizon n and terminal reward 0, associated to (3.13).Moreover, by Lemma 7.1.4 p.197 in [BR11], we know that (cid:0) J n (cid:1) n converges as n → ∞ to a limit that wedenote by J . Passing at the limit in the previous inequality we get: lim n →∞ T n ≥ J , i.e. v ≥ J. (3.14) Step 2:

Let us ﬁx a strategy α ∈ A , and take n ∈ N . We denote J n ( α ) := E ( α k )0 (cid:20) (cid:80) n − k =0 r ( t k , X k ) (cid:21) , thereward functional associated to the control α on the discrete ﬁnite time horizon { , . . . , n } . By deﬁnition,we have J n ( α ) ≤ J n . We get by letting n → ∞ : lim n → + ∞ J n ( α ) =: J ∞ ( α ) ≤ J . Taking the supremumover all the admissible strategies α ﬁnally leads to: V ∞ ≤ J. (3.15) Step 3:

Let us denote by f a maximizer of T associated to v , which exists, as stated in Proposition 3.4. v is the ﬁxed point of T so that v = T nf ( v ) , for n ∈ N . Moreover v ≤ δ where δ := sup α ∈A E (cid:104) (cid:80) ∞ k =0 r + ( Z k , α k ) (cid:105) ,so that T nf ( v ) ≤ T nf T no δ , where T no δ = sup α E αn (cid:104) (cid:80) ∞ k = n r + ( t k , Z k ) (cid:105) . Lemma 3.2 implies that T no δ → as n → ∞ . Hence, we get: v ≤ J f . (3.16)17 tep 4: Conclusion. Since it holds J f ≤ V ∞ , (3.17)we get by combining (3.14), (3.15), (3.16) and (3.17): V ∞ ≤ J ≤ v ≤ J f ≤ V ∞ . (3.18)All the inequalities in (3.18) are then equalities, which completes the proof of Theorem 3.1. In this section, we ﬁrst introduce an algorithm to numerically solve a general class of discrete-time controlproblem with ﬁnite horizon, and then apply it on the trading problem (3.1).

Let us consider a general discrete-time stochastic control problem over a finite horizon $N \in \mathbb{N} \setminus \{0\}$. The dynamics of the controlled state process $Z^\alpha = (Z^\alpha_n)_n$ valued in $\mathbb{R}^d$ is given by
\[
Z^\alpha_{n+1} = F(Z^\alpha_n, \alpha_n, \varepsilon_{n+1}), \quad n = 0, \dots, N-1, \quad Z^\alpha_0 = z \in \mathbb{R}^d, \tag{4.1}
\]
where $(\varepsilon_n)_n$ is a sequence of i.i.d. random variables valued in some Borel space $(E, \mathcal{B}(E))$, and defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with the filtration $\mathbb{F} = (\mathcal{F}_n)_n$ generated by the noise $(\varepsilon_n)_n$ ($\mathcal{F}_0$ is the trivial $\sigma$-algebra), the control $\alpha = (\alpha_n)_n$ is an $\mathbb{F}$-adapted process valued in $A \subset \mathbb{R}^q$, and $F$ is a measurable function from $\mathbb{R}^d \times \mathbb{R}^q \times E$ into $\mathbb{R}^d$.

Given a running cost function $f$ defined on $\mathbb{R}^d \times \mathbb{R}^q$ and a terminal cost function $g$ defined on $\mathbb{R}^d$, the cost functional associated to a control process $\alpha$ is
\[
J(\alpha) = \mathbb{E}\left[ \sum_{n=0}^{N-1} f(Z^\alpha_n, \alpha_n) + g(Z^\alpha_N) \right].
\]
The set $\mathcal{A}$ of admissible controls is the set of control processes $\alpha$ satisfying some integrability conditions ensuring that the cost functional $J(\alpha)$ is well-defined and finite. The control problem, also called Markov decision process (MDP), is formulated as
\[
V_0(z) := \sup_{\alpha \in \mathcal{A}} J(\alpha), \tag{4.2}
\]
and the goal is to find an optimal control $\alpha^* \in \mathcal{A}$, i.e., attaining the optimal value: $V_0(z) = J(\alpha^*)$. Notice that problem (4.1)-(4.2) may also be viewed as the time discretization of a continuous-time stochastic control problem, in which case $F$ is typically the Euler scheme for a controlled diffusion process.

Problem (4.2) is tackled by the dynamic programming approach. For $n = N, \dots, 0$, the value function $V_n$ at time $n$ is characterized as solution of the following backward (Bellman) equation:
\[
\begin{cases}
V_N(z) = g(z), \\
V_n(z) = \sup_{a \in A} \big\{ f(z,a) + \mathbb{E}^a_{n,z}\big[ V_{n+1}(Z_{n+1}) \big] \big\}, \quad z \in \mathbb{R}^d.
\end{cases} \tag{4.3}
\]
Moreover, when the supremum is attained in the DP formula at any time $n$ by $a^*_n(z)$, we get an optimal control in feedback form given by $\alpha^* = (a^*_n(Z^*_n))_n$, where $Z^* = Z^{\alpha^*}$ is the Markov process defined by
\[
Z^*_{n+1} = F(Z^*_n, a^*_n(Z^*_n), \varepsilon_{n+1}), \quad n = 0, \dots, N-1, \quad Z^*_0 = z.
\]
Two approaches to solving (4.3) numerically have mainly been studied in the literature: some methods make use of quantization to discretize the state space and approximate the conditional expectations by cubature methods; another way is to rely on Monte Carlo regress-now or regress-later methods to regress the value functions $V_{n+1}$ at time $n$, for $n = 0, \dots, N-1$, on basis functions or neural networks. See e.g. [KLP14] for the regress-now and [BP17] for the regress-later methods for algorithms using basis functions, and e.g. [HPBL18] for regression on neural networks based on regress-now or regress-later techniques.

In this section, we present an algorithm based on k-nn estimates for local non-parametric regression of the value function, and optimal quantization to quantize the exogenous noise, in order to numerically solve (4.3).

Let us first introduce some ingredients of the quantization approximation:
• We denote by $\hat\varepsilon$ a $K$-quantizer of the $E$-valued random variable $\varepsilon_{n+1} \sim \varepsilon$, that is a discrete random variable on a grid $\Gamma = \{e_1, \dots, e_K\} \subset E^K$ defined by
\[
\hat\varepsilon = \mathrm{Proj}_\Gamma(\varepsilon) := \sum_{\ell=1}^{K} e_\ell\, \mathbf{1}_{\varepsilon \in C_\ell(\Gamma)},
\]
where $C_1(\Gamma), \dots, C_K(\Gamma)$ are Voronoi tessellations of $\Gamma$, i.e., Borel partitions of the Euclidean space $(E, |\cdot|)$ satisfying
\[
C_\ell(\Gamma) \subset \Big\{ e \in E : |e - e_\ell| = \min_{j=1,\dots,K} |e - e_j| \Big\}.
\]
The discrete law of $\hat\varepsilon$ is then characterized by
\[
\hat p_\ell := \mathbb{P}[\hat\varepsilon = e_\ell] = \mathbb{P}[\varepsilon \in C_\ell(\Gamma)], \quad \ell = 1, \dots, K.
\]
The grid points $(e_\ell)$ which minimize the $L^2$-quantization error $\|\varepsilon - \hat\varepsilon\|_2$ lead to the so-called optimal $L^2$-quantizer, and can be obtained by a stochastic gradient descent method, known as the Kohonen algorithm or competitive learning vector quantization (CLVQ) algorithm, which also provides as a byproduct an estimation of the associated weights $(\hat p_\ell)$.
• Recalling the dynamics (4.1), the conditional expectation operator is equal to
\[
P^{\hat a^M_n(z)} W(z) = \mathbb{E}\big[ W(Z^{\hat a^M_n}_{n+1}) \mid Z_n = z \big] = \mathbb{E}\big[ W(F(z, \hat a^M_n(z), \varepsilon)) \big]
\]
for every state $z$, and its quantized counterpart is defined by
\[
\hat P^{\hat a^M_n(z)} W(z) := \mathbb{E}\big[ W(F(z, \hat a^M_n(z), \hat\varepsilon)) \big] = \sum_{\ell=1}^{K} \hat p_\ell\, W\big(F(z, \hat a^M_n(z), e_\ell)\big).
\]

Let us secondly introduce the notion of training distribution that will be used to build the estimators of the value functions at time $n$, for $n = 0, \dots, N-1$. Let us consider a measure $\mu$ on the state space $E$. We refer to it in the sequel as the training measure. Let us take a large integer $M$ and, for $n = 0, \dots, N$, introduce $\Gamma_n = \big\{ Z^{(1)}_n, \dots, Z^{(M)}_n \big\}$, where $\big(Z^{(m)}_n\big)_{m=1}^{M}$ is an i.i.d. sequence of random variables following the law $\mu$. $\Gamma_n$ should be seen as a training sample used to estimate the value function $V_n$ at time $n$.

The proposed algorithm reads as:
\[
\begin{cases}
\hat V^Q_N(z) = g(z), & \text{for } z \in \Gamma_N, \\
\hat Q_n(z,a) = \sum_{\ell=1}^{K} \hat p_\ell \Big[ f(z,a) + \hat V^Q_{n+1}\big( \mathrm{Proj}_{n+1}\big( F(z, a, e_\ell) \big) \big) \Big], \\
\hat V^Q_n(z) = \sup_{a \in A} \hat Q_n(z,a), & \text{for } z \in \Gamma_n, \ n = 0, \dots, N-1,
\end{cases} \tag{4.4}
\]
where, for $n = 0, \dots, N$, $\mathrm{Proj}_n(z)$ stands for the closest neighbor of $z \in E$ in the grid $\Gamma_n$, i.e. the operator $z \mapsto \mathrm{Proj}_n(z)$ is the Euclidean projection on the grid $\Gamma_n$.
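The recursion (4.4) can be sketched in a few lines on a one-dimensional toy problem. Everything specific below is an assumption made for illustration and is not taken from the paper: the dynamics `F(z, a, e) = z + a + e`, the terminal reward `g(z) = -z**2`, the zero running reward, a noise uniform on [-1, 1] (whose midpoint grid with equal weights stands in for an optimal quantizer computed by CLVQ), and a training measure uniform on [-5, 5]. The nearest-neighbor projection is done by bisection on sorted grids.

```python
import bisect


def uniform_quantizer(k):
    """Midpoint grid and equal weights for a noise uniform on [-1, 1] (toy choice)."""
    grid = [-1.0 + (2 * l + 1) / k for l in range(k)]
    weights = [1.0 / k] * k
    return grid, weights


def project(z, sorted_grid):
    """Index of the nearest neighbour of z in a sorted one-dimensional grid."""
    i = bisect.bisect_left(sorted_grid, z)
    if i == 0:
        return 0
    if i == len(sorted_grid):
        return len(sorted_grid) - 1
    return i if sorted_grid[i] - z < z - sorted_grid[i - 1] else i - 1


def qknn(n_steps, grids, actions, noise_grid, weights, F, f, g):
    """Backward recursion (4.4): returns the values on grids[0] and the policy per step."""
    values = [g(z) for z in grids[n_steps]]
    policy = []
    for n in range(n_steps - 1, -1, -1):
        new_values, decisions = [], []
        for z in grids[n]:
            best_q, best_a = None, None
            for a in actions:
                # Quantized conditional expectation of the projected next value.
                q = f(z, a) + sum(
                    p * values[project(F(z, a, e), grids[n + 1])]
                    for e, p in zip(noise_grid, weights)
                )
                if best_q is None or q > best_q:
                    best_q, best_a = q, a
            new_values.append(best_q)
            decisions.append(best_a)
        values = new_values
        policy.insert(0, decisions)
    return values, policy


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    n_steps, m = 3, 500
    # One sorted training grid per time step, sampled from the toy training measure.
    grids = [sorted(rng.uniform(-5.0, 5.0) for _ in range(m))
             for _ in range(n_steps + 1)]
    noise_grid, weights = uniform_quantizer(10)
    values, policy = qknn(
        n_steps, grids, [-1.0, 0.0, 1.0], noise_grid, weights,
        F=lambda z, a, e: z + a + e, f=lambda z, a: 0.0, g=lambda z: -z * z,
    )
    i = project(2.0, grids[0])
    print(grids[0][i], values[i], policy[0][i])
```

In higher dimension the bisection in `project` would have to be replaced by a genuine nearest-neighbor search, which is where most of the computing time goes.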
Remark 4.1.

We could have generalized the operator

Proj n by considering z ∈ E (cid:55)→ ˆ z = k (cid:80) kj =1 w j Z ( j ) n ,with the weight w j such as w j ( z ) = (cid:12)(cid:12)(cid:12) z − Z ( j ) n (cid:12)(cid:12)(cid:12)(cid:80) ki =1 (cid:12)(cid:12)(cid:12) z − Z ( i ) n (cid:12)(cid:12)(cid:12) , and where Z ( j ) n stands for the j th nearest neighbors of z in Γ n , for j = 1 , . . . , k . This generalization bringscontinuity to the estimates.Others local generalizations of Proj n , based e.g. kernel methods, are available in the literature, and werefer to [BKS10] for more details. In the sequel, we refer to (4.4) as the Qknn algorithm.We shall make the following assumption on the transition probability of ( Z n ) ≤ n ≤ N , to guarantee theconvergence of the Qknn algorithm. (Htrans) Assume that the transition probability P ( Z n +1 ∈ A (cid:12)(cid:12) Z n = z, a ) conditioned by Z n = z whencontrol a is followed at time n admits a density r w.r.t. the training measure µ , which is uniformlybounded and lipschitz w.r.t. the state variable z , i.e. there exists (cid:107) r (cid:107) ∞ > such that for all z ∈ E andcontrol u taken at time n : | r ( y ; n, x, a ) | ≤ (cid:107) r (cid:107) ∞ and | r ( y ; n, x, a ) − r ( y ; n, x (cid:48) , a ) | ≤ [ r ] L | x − x (cid:48) | and r is deﬁned as follows: P ( Z n +1 ∈ O (cid:12)(cid:12) Z n = z, u ) = (cid:90) O r ( y ; n, x, a ) dµ ( y ) . [ r ] L the Lipschitz constant of r w.r.t. x .Denote by Supp( µ ) the support of µ . We shall assume smoothness conditions on µ and F to providea bound on the projection error. (H µ ) We assume

Supp( µ ) to be bounded, and denote by (cid:107) µ (cid:107) ∞ the smallest real such that Supp( µ ) ⊂ B (0 , (cid:107) µ (cid:107) ∞ ) . Moreover, we assume x ∈ E (cid:55)→ µ (cid:0) B ( x, η ) (cid:1) to be Lipschitz, uniformly w.r.t. η , and we denoteby [ µ ] L its Lipschitz constant. (HF) For x ∈ E and a ∈ A , assume F to be L -Lipschitz w.r.t. the noise component ε , i.e., there exists [ F ] L > such that for all x ∈ E and a ∈ A , for all r.v. ε and ε (cid:48) , we have: E (cid:2)(cid:12)(cid:12) F ( x, a, ε ) − F ( x, a, ε (cid:48) ) (cid:12)(cid:12)(cid:3) ≤ [ F ] L E (cid:2)(cid:12)(cid:12) ε − ε (cid:48) (cid:12)(cid:12)(cid:3) We now state the main result of this section whose proof is postponed in Appendix C.

Theorem 4.1.

Take K = M d points for the optimal quantization of the exogenous noise ε n , n =1 , . . . , N . There exist constants [ ˆ V Qn ] L > , that only depends on the Lipschitz coeﬃcients of f , g and F ,such that, under (Htrans) , it holds for n = 0 , ..., N − , as M → + ∞ : (cid:107) ˆ V Qn ( X n ) − V n ( X n ) (cid:107) ≤ N (cid:88) k = n +1 (cid:107) r (cid:107) N − k ∞ (cid:104) ˆ V Qk (cid:105) L (cid:16) ε projk + [ F ] L ε Qk (cid:17) + O (cid:18) M /d (cid:19) , (4.5) where ε Qk := (cid:107) ˆ ε k − ε k (cid:107) stands for the quantization error, and ε projn := sup a ∈ A (cid:107) Proj n +1 ( F ( X n , a, ˆ ε n )) − F ( X n , a, ˆ ε n ) (cid:107) stands for the projection error, when decision a is taken at time n . Remark 4.2.

The constants [ ˆ V Qn ] L > are deﬁned in (C.8) . From Theorem 4.1, we can deduce consistency and provide a rate of convergence for the estimator ˆ V Qn , n = 0 , . . . , N − , under some rather tough yet usual compactness conditions on the state space. Corollary 4.1.

Under (H µ ) and (HF) , the Qknn-estimator ˆ V Qn is consistent for n = 0 , . . . , N − , whentaking M d +1 points for the quantization; and moreover, we have for n = 0 , ..., N − , as M → + ∞ : (cid:107) ˆ V Qn ( X n ) − V n ( X n ) (cid:107) ≤ O (cid:18) M /d (cid:19) . Proof.

We postpone the proof of Theorem 4.1 to Appendix C.

4.3 Qknn algorithm applied to the order book control problem (3.1)

We recall the expression of the controlled order book, as described in Section 3:
\[
Z_t = \big( X_t, Y_t, a_t, b_t, na_t, nb_t, pa_t, pb_t, ra_t, rb_t \big).
\]
In Section 3.3, we proved that the value function $V$ is characterized as the unique solution of the Bellman equation (3.10). In this section, some implementation details on the Qknn algorithm are presented in order to numerically solve the market-making problem.

Training set design

Inspired by [FPS18], we use a product-quantization method and randomization techniques to build the training set $\Gamma_n$ on which we project $(T_n, Z_n)$, which lies in $[0,T] \times E$, where $T_n$ and $Z_n$ stand for the $n$-th jump time of $Z$ and the state of $Z$ at time $T_n$, i.e. $Z_n = Z_{T_n}$, for $n \ge 0$. The basic idea of Control Randomization consists in replacing, in the dynamics of $Z$, the endogenous control by an exogenous control $(I_{T_n})_{n \ge 0}$, as introduced in [KLP14]. In order to lighten the notation, we denote by $I_n$ the control taken at time $T_n$, for $n \ge 0$.

Initialization.

Set $\Gamma^E_0 = \{z\}$ and $\Gamma^T_0 = \{0\}$. Randomize the control, using e.g. the uniform distribution on $A$ at each time step, and then simulate $D$ randomized processes to generate $(T^k_n, Z^k_n)_{n=0,\,k=1}^{N,\,D}$. For all $n = 1, \dots, N$, set $\Gamma^T_n = \{T^k_n,\ 1 \le k \le D\}$, which stands for the grid associated to the quantization of the $n$-th jump time $T_n$, and set $\Gamma^E_n = \{Z^k_n,\ 1 \le k \le D\}$, which stands for the grid associated to the quantization of the state $Z_n$ of $Z$ at time $T_n$.

Remark 4.3.

The way we chose our training sets is often referred to as an exploration strategy in the reinforcement learning literature. Of course, if one has an idea or a good guess of where to optimally drive the controlled process, one should not follow an exploration-type strategy to build the training set, but should rather use this guess to build it, which is referred to as the exploitation strategy in the reinforcement learning and stochastic bandits literature. We refer to [Bal+19] for several other applications of the exploration strategy to build training sets.
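The initialization step described above (uniformly randomized controls, then one time grid and one state grid per jump index) can be sketched as follows. The dynamics are a stand-in and not the order book model: the state is a hypothetical integer inventory moving by plus or minus one at each jump, and the jump rate `toy_rate` depends on the randomized action. The function names and parameters are illustrative assumptions.

```python
import random


def build_training_sets(z0, n_jumps, n_paths, actions, rate, step, seed=0):
    """Simulate paths under uniformly randomized controls and collect the visited points.

    Returns (time_grids, state_grids), where time_grids[n][k] and state_grids[n][k]
    are the n-th jump time and the post-jump state of the k-th simulated path.
    """
    rng = random.Random(seed)
    time_grids = [[0.0]] + [[] for _ in range(n_jumps)]
    state_grids = [[z0]] + [[] for _ in range(n_jumps)]
    for _ in range(n_paths):
        t, z = 0.0, z0
        for n in range(1, n_jumps + 1):
            a = rng.choice(actions)            # exogenous control I_n, uniform on A
            t += rng.expovariate(rate(z, a))   # exponential waiting time to next jump
            z = step(z, a, rng)                # post-jump state
            time_grids[n].append(t)
            state_grids[n].append(z)
    return time_grids, state_grids


# Toy dynamics (assumption): action 1 means "quote" and speeds up the jumps;
# each jump moves the integer inventory by +1 or -1.
def toy_rate(z, a):
    return 1.0 + 0.5 * a


def toy_step(z, a, rng):
    return z + rng.choice((-1, 1))


time_grids, state_grids = build_training_sets(
    z0=0, n_jumps=5, n_paths=100, actions=(0, 1), rate=toy_rate, step=toy_step)
```

An exploitation-type design in the sense of Remark 4.3 would replace `rng.choice(actions)` by a guess of the optimal feedback control.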

Let F and G be the Borelian functions such that Z n = F (cid:0) Z n − , d n , I n (cid:1) and T n = G (cid:0) T n − , (cid:15) n , I n (cid:1) ,where (cid:15) n ∼ E (1) stands for the temporal noise, and d n is the state noise, for n ≥ .Let us ﬁx N ≥ and consider (cid:0) T (cid:86) n , Z (cid:86) n (cid:1) Nn =0 , the dimension-wise projection of (cid:0) T n , Z n (cid:1) Nn =0 on the grids Γ Tn × Γ En , n = 0 , . . . , N , i.e. T (cid:86) = 0 , Z (cid:86) = z , and  T (cid:86) n = Proj (cid:16) G (cid:0) T (cid:86) n − , (cid:15) n , I n (cid:1) , Γ Tn (cid:17) ,Z (cid:86) n = Proj (cid:16) F (cid:0) Z (cid:86) n − , d n , I n (cid:1) , Γ En (cid:17) , for n = 1 , . . . , N. (cid:0) T (cid:86) n , Z (cid:86) n , I n (cid:1) n ∈{ ,N } is a Markov chain, and its probability transition matrix at time n = 1 , ..., N reads: ˆ p ijk ( a ) = P (cid:104) ˆ t k = t jk , Z (cid:86) k = z jk (cid:12)(cid:12)(cid:12) ˆ t k − = t ik − , Z (cid:86) k − = z ik − , I k = a (cid:105) = ˆ β ijk ˆ p ik − , i = 1 , ..., N k − , j = 1 , ..., N k , a ∈ A ˆ p ik − = P (cid:2) ˆ t k − = t ik − , Z (cid:86) k − = z (cid:48) ik − (cid:3) = (cid:26) P (cid:2) F (cid:0) ˆ t k − , Z (cid:86) k − , (cid:15) k − , d k − (cid:1) ∈ C i (Γ k − × Γ Ek − ) (cid:3) if k ≥ if k = 1ˆ β ijk = P (cid:104) ˆ t k − = t ik − , Z (cid:86) k − = z ik − , ˆ t k = t jk , Z (cid:86) k = z (cid:48) jk (cid:105) = (cid:40) P (cid:104) F k (cid:0) ˆ t k − , Z (cid:86) k − , (cid:15) k − , d k − (cid:1) ∈ C i (Γ k − × Γ Ek − ); F k (cid:0) ˆ t k − , Z (cid:86) k − , (cid:15) k , d k (cid:1) ∈ C i (Γ k × Γ Ek ) (cid:105) if k ≥ if k = 1 and where, for all i, ≤ i ≤ D , for all k ∈ N , we denoted by C i (Γ k × Γ Ek ) the Voronoï cell associated tothe point ( T ik , z ik ) .Deﬁne then (cid:0) T (cid:86) Qn , Z (cid:86) Qn (cid:1) Nn =0 as temporal noise-quantized version of (cid:0) T (cid:86) n , Z (cid:86) n , I n (cid:1) Nn =0 . 
Note that we do notneed to quantize the spacial noise since this noise already takes a ﬁnite number of states. Let ˆ ε n be thequantized process associated to (cid:15) n . The process (cid:0) T (cid:86) Qn , Z (cid:86) Qn (cid:1) Nn =0 is then deﬁned as follows: Z (cid:86) Q = z , T (cid:86) Q = 0 and ∀ ≤ n ≤ N :  T (cid:86) Qn = Proj (cid:16) G (cid:0) ˆ t n − , ˆ ε n , I n (cid:1) , Γ Tn (cid:17) ,Z (cid:86) Qn = Proj (cid:16) F (cid:0) Z (cid:86) n − , d n , I n (cid:1) , Γ En (cid:17) . Denote by (cid:16) V (cid:86) Q, ( N,D ) n (cid:17) Nn =0 the solution of the Bellman equation associated to (cid:16) T (cid:86) Qn , Z (cid:86) Qn (cid:17) Nn =0 : ( B (cid:86) QN,D ) :  V (cid:86) Q, ( N,D ) N = 0 V (cid:86) Q, ( N,D ) n ( t, z ) = r ( t, z, a ) + sup a ∈ A (cid:110) E at,z (cid:104) V (cid:86) Q, ( N,D ) n +1 (cid:0) T (cid:86) Qn +1 , Z (cid:86) Qn +1 (cid:1)(cid:105)(cid:111) , for n = 0 , . . . , N, where E at,z [ . ] stands for the expectation conditioned by the events T (cid:86) Qn = t , Z (cid:86) Qn = z and when decision I n = a is taken at time t .We wrote the pseudo-code of the Qknn algorithm to compute ( B (cid:86) QN,D ) in Algorithm 1.We discuss in Remark 4.4 the reasons why we can apply Theorem 4.1. Remark 4.4.

When the number of jumps $N \ge 1$ of the LOB is fixed, the set of all states that the controlled order book can reach in fewer than $N$ jumps, denoted by $\mathcal{K}$ in the sequel, is finite. Hence, the reward function $r$, defined in (3.5), is bounded and Lipschitz on $\mathcal{K}$.

The following proposition states that $\hat V^{Q,(N,D)}_n$, built from the combination of time-discretization, $k$-nearest neighbors and optimal quantization methods, is a consistent estimator of the value function at time $T_n$, for $n = 0, \dots, N-1$. It provides a rate of convergence for the Qknn-estimations of the value functions.

Algorithm 1 Generic Qknn Algorithm

Inputs:
– $N$: number of time steps
– $z$: state in $E$ at time $T_0 = 0$
– $\Gamma^\varepsilon = \{e_1, \dots, e_L\}$ and $(p_\ell)_{\ell=1}^{L}$: the grid and the weights for the optimal quantization of $(\varepsilon_n)_{n=1}^{N}$
– $\Gamma^T_n$ and $\Gamma^E_n$: the grids for the projection of respectively the time and the state components at time $n$, for $n = 0, \dots, N$

for $n = N-1, \dots, 0$ do
Compute the approximated

Qknn -value at time n : ˆ Q n ( z, a ) = r ( T n , z, a )+ L (cid:88) (cid:96) =1 p (cid:96) (cid:98) V Qn +1 (cid:0) Proj (cid:0) G ( z, e (cid:96) , a ) , Γ Tn +1 (cid:1) , Proj (cid:0) F ( z, e (cid:96) , a ) , Γ En +1 (cid:1)(cid:1) , for ( z, a ) ∈ Γ n × A z ; Compute the optimal control at time n ˆ A n ( z ) ∈ argmin a ∈ A z ˆ Q n ( z, a ) , for z ∈ Γ n , where the argmin is easy to compute since A z is ﬁnite for all z ∈ E ; Estimate analytically by quantization the value function: (cid:98) V Qn ( z ) = ˆ Q n (cid:16) z, ˆ A n ( z ) (cid:17) , ∀ z ∈ Γ n ; end forOutput: – ( (cid:98) V Q ) : Estimate of V (0 , z ) ; 24 roposition 4.1. The estimators of the value functions provided by Qknn algorithm are consistent. More-over, it holds as M → + ∞ : (cid:13)(cid:13)(cid:13) V (cid:86) Q, ( N,M ) n (cid:16) T (cid:86) n , Z (cid:86) n (cid:17) − V n (cid:0) T n , Z n (cid:1)(cid:13)(cid:13)(cid:13) M, = O (cid:18) α N + 1 M /d (cid:19) , for n = 0 , . . . , N − , where we denote by (cid:107) . (cid:107) M, the L ( µ ) norm conditioned by the training sets that have been used to buildthe estimator V (cid:86) Q, ( N,M ) n +1 .Proof. Splitting the error of time cutting and quantization, we get: (cid:107) V n (cid:0) T n , Z n (cid:1) − V (cid:86) ( N,M ) n (cid:0) T (cid:86) n , Z (cid:86) n (cid:1) (cid:107) M, ≤ (cid:107) V n (cid:0) T n , Z n (cid:1) − V ( N ) n (cid:0) T n , Z n (cid:1) (cid:107) M, + (cid:107) V ( N ) n (cid:0) T n , Z n (cid:1) − V (cid:86) ( N,M ) n (cid:0) T (cid:86) n , Z (cid:86) n (cid:1) (cid:107) M, . (4.8) Step 1:

Applying Lemma 3.2, we get the following bound on the ﬁrst term in the r.h.s. of (4.8): (cid:107) V n (cid:0) T n , Z n (cid:1) − V ( N ) n (cid:0) T n , Z n (cid:1) (cid:107) M, ≤ α N − α (cid:107) b (cid:107) ∞ , (4.9)where (cid:107) b (cid:107) ∞ stands for the supremum of b over [0 , T ] × E . Step 2:

Note that the assumptions of Theorem 4.1 are met as noticed in Remark 4.4, so that the latterprovides the following bound for the second term in the r.h.s. of (4.8): (cid:13)(cid:13)(cid:13) V ( N ) n (cid:0) T n , Z n (cid:1) − V (cid:86) Q, ( N,M ) n (cid:0) T (cid:86) n , Z (cid:86) n (cid:1)(cid:13)(cid:13)(cid:13) M, = M →∞ O (cid:18) M /d (cid:19) . (4.10)It remains to plug (4.9) and (4.10) into (4.8) to complete the proof of Proposition 4.1.We provide a diagram in ﬁgure 3 to summarize the two main steps in the estimation of the valuefunction of the market-making problem deﬁned in (3.1). In this section, we propose several settings to test the eﬃciency of Qknn on simulated order books. We takeno running reward, i.e. f = 0 , and take the wealth of the market maker as terminal reward, i.e. g ( z ) = x .The intensities are taken constant in some tests, and state dependent on other tests. The values of theintensities are similar to the ones in [HLR15]. Although the intensities are assumed uncontrolled in section3 for predictability reasons, the latter are controlled processes in this section, i.e. the intensities of theorder arrivals depends on the orders in the order book from all the participant plus the ones of the marketmaker. The optimal trading strategies have been computed among two diﬀerent classes of strategies: insection 4.4.1, we tested the algorithm to approximate the optimal strategy among those where the marketmaker is only allowed to place orders only at the best bid and the best ask. The dynamics of the controlledorder book for such a class of controls are available in Section B in the Appendix. In Section 4.4.2, wecomputed the optimal trading strategy among the class of the strategies where the market maker allows25 , N → ∞ V ( N ) V = V ∞ V ( N,K ) N → ∞ K →∞ Figure 3 – Numerical resolution of the algorithmic control problem. 
We first bound the number of events and then quantize the state space.

herself to place orders on the two best limits on each side of the order book. Note that the second class of controls is more general than the first one.

The search for the $k$ nearest neighbors, which arises when estimating the conditional expectations using the Qknn algorithm, is very time-consuming, especially in the considered market-making problem, which is of dimension more than 10. The efficiency of Qknn then highly depends on the algorithm used to find the $k$ nearest neighbors in high dimension. The Qknn algorithm has been implemented using the Fast Library for Approximate Nearest Neighbors (FLANN), introduced in [ML09] and already available in libraries in C++, Python, Julia and many other languages. This algorithm is based on tree methods. Note that recent graph-based algorithms have also proved to perform well, and can also be used.

Denote by A1lim the class of controls where the placement of orders is allowed on the best ask and best bid exclusively. We implement the Qknn algorithm to compute the optimal strategy among those in A1lim. We then compare the optimal strategy with a naive strategy which consists in always placing one order at the best bid and one order at the best ask. The naive strategy is called 11 in the plots, and can be seen as a benchmark. The naive strategy is a good benchmark when the model for the intensities of order arrivals is symmetrical, i.e. the intensities for the bid and the ask sides are the same. Indeed, in this case, the market maker can expect to earn the spread on average.
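To make the remark on tree-based neighbor search concrete, here is a minimal exact k-d tree in plain Python. It is not FLANN and does not reproduce its approximate search or its API; it is only a sketch of the tree idea, in which a branch is skipped whenever the splitting plane is farther from the query than the best candidate found so far. The function names are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def build_kdtree(points, depth=0):
    """Recursively split the points at the median along cycling coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Exact nearest neighbour of target among the points stored in the tree."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or sq_dist(point, target) < sq_dist(best, target):
        best = point
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


# Tiny usage example on four points of the plane.
tree = build_kdtree([(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (-1.0, 3.0)])
closest = nearest(tree, (0.9, 0.8))
```

Exact k-d trees lose much of their advantage as the dimension grows, which is one reason approximate libraries are preferred for state spaces of dimension above 10.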

Numerical results:

In Figure 4, we take constant intensities to model the limit and market order arrivals, and a linear intensity to model the cancel orders. In this setting, as we can see in the figure, the strategy computed using the Qknn algorithm performs as well as the naive strategy. Note that, obviously, the market maker has to take enough points for the state quantization in order for the Qknn algorithm to perform well. In Figure 5, we plotted the P&L of the market maker when the latter computes the optimal strategy using only 6000 points for the state space discretization; for such a low number of grid points, the Qknn algorithm performs poorly.

Figure 4 – Symmetrical intensities. Size of the grids: 100000. Short terminal time: T=1. Notice that the Qknn strategy reduces the variance of the P&L, but the expected wealth when following the Qknn strategy (StratOpt2lim) is the same as the one following the naive strategy (Strat11).

In Figure 6, we plotted the empirical histogram of the P&L of the market maker using the Qknn-estimated optimal strategy, computed with grids of size N = 1000, 9000, 100000, 1000000 for the state space discretization, and the empirical histogram of the P&L of the market maker using the naive strategy. One can see that the larger the grids are, the better the Qknn estimation of the optimal strategy is. We plot in Figure 7 the results of simulations run taking a short terminal time T=1, and intensities that depend on the size of the queues. In this setting, notice that the naive strategy does not perform well anymore, but the Qknn algorithm still does well when the market maker takes enough points for the state space discretization.

In Figure 7, we plot the P&L of the market maker following the Qknn strategy and the naive strategy, and we took the same parameters as in Figure 6 to run the simulations, except for the terminal time, which we set to T=10.
As expected, the expected wealth of the market maker is larger when the terminal time is larger and when she follows the Qknn-estimated optimal strategy. (Note: the value function for the market-making problem is by definition a non-decreasing function w.r.t. the time component.) Note that her expected wealth remains the same when she follows the naive strategy.

We extend the class of admissible controls to the ones where the market maker places orders on the first two limits on the bid and ask sides of the order book. Denote by A2lim the latter. We run simulations to test the Qknn algorithm on A2lim. In Figure 8 and Figure 9, we plot the empirical distributions of the P&L when the market maker follows three different strategies:

• Qknn-estimated optimal strategy among those in A2lim (PLOpt2lim).
• Qknn-estimated optimal strategy among those in A1lim (PLOpt1lim).
• naive strategy, i.e. always place orders on the best bid and best ask queues (PL11).

Note that the P&L of the market maker is always at least as good when the class of admissible controls is extended, see Figure 8, but in some models of order books the extended set of controls does not improve the P&L, i.e.

$$\sup_{\alpha \in A_{1\mathrm{lim}}} V^{\alpha} = \sup_{\alpha \in A_{2\mathrm{lim}}} V^{\alpha}.$$

We consider in this section a market maker who aims at maximizing a function of her terminal wealth, penalizing her inventory at terminal time T, in the case where the order arrivals are driven by Hawkes processes. Let us first present the model with Hawkes processes for the LOB.

Model for the LOB:

We assume that the order book receives limit, cancel, and market orders. We denote by $L^+$ (resp. $L^-$) the limit order arrival processes on the ask (resp. bid) side; by $C^+$ (resp. $C^-$) the cancel order arrival processes on the ask (resp. bid) side; and by $M^+$ (resp. $M^-$) the buy (resp. sell) market order arrival processes. In this section, the limit order arrivals are assumed to follow Hawkes process dynamics, and moreover we assume the kernel to be exponential. The order arrivals are then modeled by a $(4K+2)$-variate Hawkes process $(N_t)$ with a vector of exogenous intensities $\lambda_0$ and exponential kernel $\phi$, i.e. $\phi_{ij}(t) = \alpha_{ij}\beta_{ij}e^{-\beta_{ij}t}\mathbb{1}_{t \geq 0}$. Note that in the presented model, the following holds:

(H$\lambda$) $\lambda_0$ is assumed to be independent of the control.

Denoting by $D = 4K+2$ the dimension of $(N_t)$, the $m$-th component of the intensity $\lambda$ of $N_t$ writes, under (H$\lambda$):

$$\lambda^m_t = \lambda^m_0 + \sum_{j=1}^{D} \alpha_{mj} \int_0^t e^{-\beta_{mj}(t-s)}\, dN^j_s, \quad \text{for } m = 1, \ldots, D,$$

or equivalently:

$$d\lambda^m_t = \sum_{j=1}^{D} \left[ -\beta_{mj}\left(\lambda^m_t - \lambda^m_0\right) dt + \alpha_{mj}\, dN^j_t \right], \quad \text{for } m = 1, \ldots, D,$$

with given initial conditions $\lambda^m_0 \in \mathbb{R}^*_+$ for $m = 1, \ldots, D$. It is well-known that for this choice of intensity, the couple $(N_t, \lambda_t)_{t \geq 0}$ becomes Markovian. See e.g. Lemma 6 in [Mas98] for a proof of this result.

Figure 6 – P&L when the intensities $\lambda^M$, $\lambda^{L_i}$ and $\lambda^{C_i}$ depend on the state of the order book. Panels (a), (b), (c) and (d) show the P&L of the market maker when following the Qknn-estimated optimal strategy computed with, respectively, 1000, 9000, 100000 and 1000000 points for the state space discretization. The reader can see that the market maker increases her expected terminal wealth by taking more and more points for the state space discretization. Also, the naive strategy is beaten when the intensities are state dependent.

Figure 7 – P&L of the market maker following the optimal strategy and following the naive strategy 11. Symmetrical state dependent intensities. Long terminal time: T=10. Notice that the Qknn strategy does better than the naive strategy when the intensities are state dependent.

Figure 8 – P&L of the market maker who follows optimal strategies and the naive strategy (PL11). Short terminal time. Asymmetrical intensities for the market order arrivals: the intensity for the buying market order process is taken higher than the one for the selling market order process. The wealth of the market maker is greater when she places orders on the two first limits of each side of the order book, rather than when she places orders only on the best limits at the bid and ask sides.

Figure 9 – P&L when following the optimal strategy or the naive strategy (PL11). Long terminal time. Symmetrical intensities for the arrival of market orders. 400000 points for the quantization. Notice that the Qknn strategy computed on the extended class of controls, i.e. order placements on the two first limits (StratOpt2lim), performs as well as the one computed on the original class of controls, i.e. order placements on the best bid and best ask (StratOpt1lim).

Figure 10 – P&L of the market maker who follows the optimal strategy and following the naive strategy 11. Long terminal time. Constant and symmetrical intensities for the arrivals of orders. Notice that the strategy computed by the Qknn algorithm when taking A2lim performs as well as the one computed on the two best limits of the order book exclusively. Hence, in this setting, placing orders only at the best ask and best bid seems to be the optimal strategy.

We can now rewrite the control problem (3.1) in the particular case where the order book is driven by Hawkes processes, there is no running reward, i.e. $f = 0$, and where the terminal reward $G$ stands for the terminal wealth of the market maker penalized by her inventory. We then consider the following problem in this section:

$$V(t, \lambda, z) := \sup_{\alpha \in \mathcal{A}} \mathbb{E}^{\alpha}_{t,z,\lambda}\left[ G\left(Z_T\right) \right], \tag{5.1}$$

where $G(z)$ denotes the wealth of the market maker when the controlled order book is at state $z$, plus a term of penalization of her inventory; and where $\mathcal{A}$ is the set of the admissible controls, i.e. the predictable decisions taken by the market maker until a terminal time $T > 0$.

We now present the main result of this section.

Theorem 5.1. $V$ is characterized as the unique solution of the following HJB equation:

$$\begin{cases} f(T, z, \lambda) = G(z), & \text{for } z \in E, \\ 0 = \dfrac{\partial f}{\partial t}(t, z, \lambda) - \displaystyle\sum_{m=1}^{D} \sum_{j=1}^{D} \beta_{mj}\left(\lambda^m - \lambda^m_0\right) \dfrac{\partial f}{\partial \lambda^m}(t, z, \lambda) + \displaystyle\sum_{m=1}^{D} \lambda^m \sup_{a \in A_z} \left[ f\left(t, e^a_m(z), \lambda + \alpha_m\right) - f\left(t, z, \lambda\right) \right], & \text{for } 0 \leq t < T, \end{cases} \tag{5.2}$$

and $(t, z, \lambda) \in \mathbb{R}_+ \times E \times (\mathbb{R}^*_+)^D$. Moreover, $V$ admits the following representation:

$$V(t, z, \lambda) = \sup_{\alpha \in \mathcal{A}} \sum_{n=0}^{\infty} \mathbb{E}^{\alpha}_{t,z,\lambda}\left[ \mathbb{1}_{T_n \leq T}\, G\left(Z^{\alpha}_{T_n}\right) \exp\left\{ -|\lambda_0|(T - T_n) + \sum_{m=1}^{D} \frac{\lambda^m_{T_n} - \lambda^m_0}{\sum_{j=1}^{D} \beta_{mj}} \left( e^{-\sum_{j=1}^{D} \beta_{mj}(T - T_n)} - 1 \right) \right\} \right], \tag{5.3}$$

where, for $n \geq 1$, $T_n$ stands for the $n$-th jump time of $Z$ after time $t$, and $(Z^{\alpha}_{T_n})_{n=0}^{\infty}$ is seen as an MDP controlled by $\alpha \in \mathcal{A}$; and where $\mathbb{E}^{\alpha}_{t,z,\lambda}[\,.\,]$ stands for the expectation conditioned on $Z_t = z$, $\lambda_t = \lambda$ when the control $\alpha$ is followed.

Remark 5.1.
$V$ is characterized in (5.3) as the value function associated to an MDP with infinite horizon, where the instantaneous reward reads:

$$r(t, z, \lambda) = \mathbb{1}_{t \leq T}\, G(z) \exp\left\{ -|\lambda_0|(T - t) + \sum_{m=1}^{D} \frac{\lambda^m - \lambda^m_0}{\sum_{j=1}^{D} \beta_{mj}} \left( e^{-\sum_{j=1}^{D} \beta_{mj}(T - t)} - 1 \right) \right\},$$

where $|\,.\,|$ denotes the $L^1\left(\mathbb{R}^D\right)$ norm.

Proof (of Theorem 5.1). Step 1:

Let us check that (5.3) holds, where $V$ is defined as solution of (5.1).

We want to show that (5.10) is the expression of the maximal reward operator associated to the PDMDP (3.9) that we will define later. First notice that $(\lambda_t, Z_t)_t$ is a PDMDP, since $(\lambda_t, Z_t)_t$ is deterministic between two jump times. We then aim at rewriting the expression of the value function defined in (5.1) as the value function associated to an infinite horizon control problem of the PDMDP $(\lambda_t, Z_t)_t$. To do so, we first notice that by conditioning on the jump times we get:

$$V(t, z, \lambda) = \sup_{\alpha \in \mathcal{A}} \mathbb{E}^{\alpha}_{t,z,\lambda}\left[ G\left(Z^{\alpha}_T\right) \right] = \sup_{\alpha \in \mathcal{A}} \mathbb{E}^{\alpha}_{t,z,\lambda}\left[ \sum_{n=0}^{\infty} \mathbb{1}_{T_n \leq T} \cdots \right]$$

Step 2:

Let us show that $V$ is the unique solution to (5.2).

Notice first that the solutions to the following HJB equation

$$\begin{cases} G(z) = f(T, z, \lambda), \\ 0 = \dfrac{\partial f}{\partial t} - \displaystyle\sum_{m=1}^{D} \sum_{j=1}^{D} \beta_{mj}\left(\lambda^m - \lambda^m_0\right) \dfrac{\partial f}{\partial \lambda^m} + \displaystyle\sum_{m=1}^{D} \lambda^m \sup_{a \in A_z} \left[ f\left(t, e^a_m(z), \lambda + \alpha_m\right) - f\left(t, z, \lambda\right) \right], & \text{for } 0 \leq t < T, \end{cases}$$

are the fixed points of the operator $\mathcal{T}_1 \circ \mathcal{T}_2$, where $\mathcal{T}_1$ and $\mathcal{T}_2$ are defined as follows:

$$\mathcal{T}_1 : F \mapsto f \text{ solution of } \begin{cases} \dfrac{\partial f}{\partial t} - \displaystyle\sum_{m=1}^{D} \sum_{j=1}^{D} \beta_{mj}\left(\lambda^m - \lambda^m_0\right) \dfrac{\partial f}{\partial \lambda^m} = F(t, z, \lambda), \\ f(T, z, \lambda) = G(z), \end{cases}$$

and:

$$\mathcal{T}_2 : f \mapsto -\sum_{m=1}^{D} \lambda^m \sup_{a \in A_z} \left[ f\left(t, e^a_m(z), \lambda + \alpha_m\right) - f\left(t, z, \lambda\right) \right].$$

We now use the method of characteristics to rewrite the image of $\mathcal{T}_1$. Let us take a function $F$, and define $f = \mathcal{T}_1(F)$. Let us fix $t \in [0, T]$ and $\lambda \in (\mathbb{R}_+)^D$, and denote by $g$ the function $g(s, z) = f(s, z, \lambda^1_s, \ldots, \lambda^D_s)$ where, for $m = 1, \ldots, D$, $s \mapsto \lambda^m_s$ is a differentiable function defined on $[t, T]$ as solution to the following ODE:

$$\begin{cases} \dfrac{d\lambda^m_s}{ds} = -\displaystyle\sum_{j=1}^{D} \beta_{mj}\left(\lambda^m_s - \lambda^m_0\right), & \text{for all } t < s \leq T, \\ \lambda^m_t = \lambda^m. \end{cases} \tag{5.7}$$

For $m = 1, \ldots, D$, basic ODE theory provides existence and uniqueness of a solution to (5.7), which is given by:

$$\lambda^m_s = \lambda^m_0 + \left(\lambda^m - \lambda^m_0\right) e^{-\sum_{j=1}^{D} \beta_{mj}(s - t)}, \quad \text{for } s \in [t, T], \text{ and } m = 1, \ldots, D.$$

Since $\frac{\partial g}{\partial s} = \frac{\partial f}{\partial s} + \sum_{m=1}^{D} \frac{d\lambda^m_s}{ds} \frac{\partial f}{\partial \lambda^m} = F$ along the characteristic, we get $g(t, z) = G(z) - \int_t^T F(s, z, \lambda_s)\, ds$, which finally leads to the following expression of $\mathcal{T}_1(F)$:

$$\mathcal{T}_1(F) = f(t, z, \lambda) = G(z) - \int_t^T F\left(s, z, \lambda_s\right) ds. \tag{5.8}$$

Replacing $F$ by $\mathcal{T}_2(f)$ in (5.8), we get that $f$ is a fixed point of $\mathcal{T}_1 \circ \mathcal{T}_2$ if and only if:

$$f(t, \lambda, z) + \sum_{m=1}^{D} \int_t^T \lambda^m_s f\left(s, z, \lambda_s\right) ds = G(z) + \sum_{m=1}^{D} \int_t^T \lambda^m_s \sup_{a \in A_z} f\left(s, e^a_m(z), \lambda_s + \alpha_m\right) ds.$$

Notice that

$$\frac{\partial}{\partial s}\left[ f(s, \lambda_s, z)\, e^{-\sum_{j=1}^{D} \int_t^s \lambda^j_u du} \right] = -\sum_{m=1}^{D} \lambda^m_s\, e^{-\sum_{j=1}^{D} \int_t^s \lambda^j_u du} \sup_{a \in A_z} f\left(s, e^a_m(z), \lambda_s + \alpha_m\right),$$

so that:

$$\begin{aligned} f(t, \lambda, z) &= G(z)\, e^{-\sum_{m=1}^{D} \int_t^T \lambda^m_s ds} + \sum_{m=1}^{D} \int_t^T \lambda^m_s\, e^{-\sum_{j=1}^{D} \int_t^s \lambda^j_u du} \sup_{a \in A_z} f\left(s, e^a_m(z), \lambda_s + \alpha_m\right) ds \\ &= G(z)\, e^{-\sum_{m=1}^{D} \int_t^T \lambda^m_s ds} + \sup_{a \in A_z} \mathbb{E}^{a}_{t,\lambda,z}\left[ f\left(T_1, Z_1, \lambda_{T_1} + \alpha_m\right) \right], \end{aligned} \tag{5.9}$$

where $T_1$ is the first jump time of $N$ larger than $t$, and we denote $Z_1 = Z_{T_1}$. Equation (5.9) shows that the fixed point of $\mathcal{T}_1 \circ \mathcal{T}_2$ is characterized as the fixed point of the operator $\mathcal{T}$ defined for any smooth enough function $f$ by:

$$\mathcal{T}(f) = G(z)\, e^{-\sum_{m=1}^{D} \int_t^T \lambda^m_s ds} + \sup_{a \in A_z} \mathbb{E}^{a}_{t,\lambda,z}\left[ f\left(T_1, Z_1, \lambda_{T_1} + \alpha_m\right) \right], \tag{5.10}$$

where $\mathbb{E}^{a}_{t,\lambda,z}[\,.\,]$ stands for the expectation conditioned on the events $\lambda_t = \lambda$ and $Z_t = z$, when decision $a$ is taken at time $t$. We recognize here the maximal reward operator of the value function defined in (5.6). Basic theory on PDMDP shows that the maximal reward operator $\mathcal{T}$ admits $V$ as unique fixed point, which completes Step 2.

Appendix A From uncontrolled to controlled intensity

Recall that the results stated in Section 3 hold under the assumption that the intensities of the order arrivals are uncontrolled. In particular, we assumed in that section that the market maker has no influence on the next exogenous event that will occur. This is a mild assumption if the market maker is a small player, but it does not hold when she is a large player.

In this section, we show how to relax Assumption (Harrivals2) by rewriting the initial control problem (3.1) with controlled intensities as a control problem with uncontrolled intensities under a new (controlled) probability measure. The results and proofs in this section are inspired by [Bré81].

Consider a LOB which can receive at any time limit, cancel, and market orders. Denote by $L^+$ (resp. $L^-$) the limit sell (resp. buy) order arrival process, received on the ask (resp. bid) side. Denote by $C^+$ (resp. $C^-$) the cancel order process on the ask (resp. bid) side. Denote by $M^+$ (resp. $M^-$) the buy (resp. sell) market order process. The order arrivals process is then a $(4K+2)$-dimensional process. Recall that $E$ is the state space of the order book. The order book is modeled by a jump process $Z : [0, T] \to E$ such that the order arrival processes have uncontrolled stochastic intensities $\lambda^i(a, b)$, for $i = 1, \ldots, 4K+2$, that only depend on the ask and bid sides $(a, b)$ of the order book, under $\mathbb{P}$. We underline that, by assumption, the intensities are uncontrolled under $\mathbb{P}$.

Let us fix $(\alpha_t)_{0 \leq t \leq T} \in \mathcal{A}$ an admissible control, i.e. a predictable process w.r.t. the natural filtration $\left(\mathcal{F}_t\right)_{t > 0}$ generated by the uncontrolled order arrival processes under $\mathbb{P}$.

(HarrivalsL): We assume in this section that the intensities are Lipschitz and bounded, i.e. there exist $[\lambda]_L > 0$ and $\|\lambda\|_\infty > 0$ such that

$$\left| \lambda^i(a, b) - \lambda^i\left(a', b'\right) \right| \leq [\lambda]_L \left( |a - a'| + |b - b'| \right), \qquad \lambda^i(a, b) \leq \|\lambda\|_\infty, \quad \text{for } i = 1, \ldots, 4K+2,$$

for $a, a' \in \mathbb{N}^K$ and $b, b' \in (-\mathbb{N})^K$.

We want to define the probability $\mathbb{P}^\alpha$ as the probability measure that is absolutely continuous w.r.t. $\mathbb{P}$ and whose Radon-Nikodym derivative writes:

$$L^\alpha_t = \left. \frac{d\mathbb{P}^\alpha}{d\mathbb{P}} \right|_{\mathcal{F}_t} = \prod_{i=1}^{4K+2} \prod_{n=1}^{\infty} \left( \mu^i_{\alpha, T^i_n} \right)^{\mathbb{1}_{T^i_n \leq t}} \exp\left\{ \int_0^t \left(1 - \mu^i_s\right) \lambda^i_s\, ds \right\}, \quad \text{for } 0 \leq t \leq T, \tag{A.1}$$

where for $i = 1, \ldots, 4K+2$, we denote by $\mu^i_{\alpha, T^i_n}$ the quotient of the controlled intensity at time $T^i_n$ of the $n$-th jump of the $i$-th process and the uncontrolled intensity, i.e. denoting by $a^\alpha$ and $b^\alpha$ the ask and bid sides where the market maker's orders are counted, we define:

$$\mu^i_\alpha = \frac{\lambda^i(a^\alpha, b^\alpha)}{\lambda^i(a, b)}.$$

Remark A.1.

Under (HarrivalsL), it holds:

$$\left| \lambda(a^\alpha, b^\alpha) - \lambda(a, b) \right| \leq [\lambda]_L M, \quad \text{for } a \in \mathbb{N}^K \text{ and } b \in (-\mathbb{N})^K, \tag{A.2}$$

where we recall that $M$ stands for the maximal number of orders that can be held by the market maker at the same time in the LOB.

Remark A.2.

From Remark A.1, it is straightforward to see that $\mu^i_\alpha$ is bounded under (HarrivalsL), and moreover:

$$\mu^i_\alpha \leq 1 + \frac{[\lambda]_L M}{\lambda_{\min}}, \quad \text{for } i = 1, \ldots, 4K+2,$$

where we denote $\lambda_{\min} = \inf_{i = 1, \ldots, 4K+2}\, \inf_{z \in E} \lambda^i(z)$, and assume the latter to be strictly positive. Note that the bound is uniform w.r.t. the control and the state variables.

Proposition A.1.

For every $\alpha \in \mathcal{A}$, it holds under (HarrivalsL):

$$\mathbb{E}\left[ L^\alpha_T \right] = 1, \tag{A.3}$$

which implies in particular that $\mathbb{P}^\alpha$ is well-defined. Moreover, the order arrivals admit the controlled intensities $\lambda^i(a^\alpha, b^\alpha)$, for $i = 1, \ldots, 4K+2$, under $\mathbb{P}^\alpha$, where we recall that $a^\alpha$ and $b^\alpha$ stand for the vectors of orders on the ask and the bid sides, where the market maker's orders are counted.

Proof. We divide the proof of Proposition A.1 into two steps.

Step 1:

We show (A.3). Let us fix $\alpha \in \mathcal{A}$ and write the integral representation of $(L^\alpha_t)_{0 \leq t \leq T}$: for $t \in [0, T]$,

$$L^\alpha_t = 1 + \sum_{i=1}^{4K+2} \int_0^t L^\alpha_{s^-} \left( \mu^i_{\alpha,s} - 1 \right) d\tilde{M}^i_s, \tag{A.4}$$

where $\tilde{M}$ stands for the local martingale whose dynamics read $d\tilde{M}^i_s = dN^i_s - \lambda^i(a_s, b_s)\, ds$. It is then sufficient to show that

$$\mathbb{E}\left[ \int_0^T L^\alpha_{s^-} \left( \mu^i_{\alpha,s} - 1 \right) \lambda^i_s\, ds \right] < +\infty, \quad \text{for } i = 1, \ldots, 4K+2, \tag{A.5}$$

to get that the processes $\left( \int_0^t L^\alpha_{s^-} \left( \mu^i_{\alpha,s} - 1 \right) d\tilde{M}^i_s \right)_{0 \leq t \leq T}$ are martingales for $i = 1, \ldots, 4K+2$ (as proved e.g. in [Bré81]), and complete the proof of Step 1, using (A.4).

Plugging (A.2) into (A.1), we get:

$$L_s \leq \|\mu\|_\infty^{A_s}\, e^{[\lambda]_L M T}, \quad \text{for } 0 \leq s \leq T, \tag{A.6}$$

where we denote $\|\mu\|_\infty := 1 + \frac{[\lambda]_L M}{\lambda_{\min}}$, and where $(A_t)_{t \in [0,T]}$ stands for the sum of all the order arrival processes up to time $t$, for $t \in [0, T]$.

Moreover, as stated in Remark A.1, we have for all $i = 1, \ldots, 4K+2$:

$$\left| \left( \mu^i_s - 1 \right) \lambda^i_s(z) \right| = \left| \lambda(a^\alpha, b^\alpha) - \lambda(a, b) \right| \leq [\lambda]_L M. \tag{A.7}$$

Plugging (A.7) and (A.6) into the l.h.s. of (A.5), we get:

$$\mathbb{E}\left[ \int_0^T L^\alpha_{s^-} \left( \mu^i_{\alpha,s} - 1 \right) \lambda^i_s\, ds \right] \leq \int_0^T \mathbb{E}\left[ \|\mu\|_\infty^{A_s} \right] e^{[\lambda]_L M T}\, [\lambda]_L M\, ds. \tag{A.8}$$

Notice that the intensity of $A$ is bounded by $\|\lambda\|_\infty$ under (HarrivalsL), so that:

$$\mathbb{E}\left[ \|\mu\|_\infty^{A_s} \right] \leq e^{-\|\lambda\|_\infty s} \sum_{n=0}^{+\infty} \frac{\|\mu\|_\infty^n \left( \|\lambda\|_\infty s \right)^n}{n!} \leq \exp\left\{ \|\lambda\|_\infty T \left( \|\mu\|_\infty - 1 \right) \right\}, \quad \text{for } s \in [0, T]. \tag{A.9}$$

Combining (A.8) and (A.9), we can prove that (A.5) holds, which completes the proof of Step 1.

Step 2:

We refer to the T3 Theorem in Chapter VI of [Bré81] for a proof of the second assertion in Proposition A.1.
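The series appearing in (A.9) is the probability generating function of a Poisson count, which sums in closed form to an exponential. The short Python check below is a sketch under stated assumptions: the function names and the numerical values of `mu`, `rate` and `s` are arbitrary and hypothetical, chosen only to compare a truncated version of the series with its closed form.

```python
import math


def poisson_pgf_series(mu, rate, s, terms=200):
    """Truncated series e^{-rate*s} * sum_n mu^n (rate*s)^n / n!."""
    x = rate * s
    total = 0.0
    term = 1.0  # holds (mu * x)^n / n!, starting at n = 0
    for n in range(terms):
        total += term
        term *= mu * x / (n + 1)
    return math.exp(-x) * total


def poisson_pgf_closed_form(mu, rate, s):
    """Closed form exp(rate * s * (mu - 1)) of the same series."""
    return math.exp(rate * s * (mu - 1.0))


if __name__ == "__main__":
    mu, rate, s = 1.5, 2.0, 0.7  # arbitrary illustrative values
    print(poisson_pgf_series(mu, rate, s))
    print(poisson_pgf_closed_form(mu, rate, s))
```

Since the ratio bound is at least 1, the closed form is non-decreasing in `s`, which is what allows the bound to be taken at the terminal time in the last inequality of (A.9).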

Appendix B Dynamics of the controlled order book (simplified version)

In this section, we give the expressions for the dynamics of the controlled order book process $(Z_t)$. The market maker's control has been simplified to a couple $(la_t, lb_t)$, where $la = 1$ (resp. 0) if the market maker holds (resp. does not hold) a sell order at the best ask limit, and $lb = 1$ (resp. 0) if the market maker holds (resp. does not hold) a buy order at the best bid limit. In other words, the market maker only considers placing orders at the best ask limit or at the best bid limit. In the numerical simulations that we ran, we also had to compute the dynamics of $(Z_t)$ for the set of generalized controls in which the market maker is allowed to post orders on the two first limits on the bid and ask sides. The expressions of the dynamics for the generalized controls are very similar to the ones for the simplified controls.

To understand the dynamics of the rank of the market maker's orders, we need a model for the cancellation of orders. Suppose for example that the market maker holds an order whose rank is $na$ in the queue, with $na < a_{A^{-1}(0)}$. Suppose that the cancel process $L^{C^+}_{A^{-1}(0)}$ jumps. Then two scenarios can occur:

• If the rank of the canceled order is greater than the one of the market maker, then $na_t$ stays constant.
• If the rank of the canceled order is smaller than the one of the market maker, then $na_t = na_{t^-} - 1$.

Model: We consider a Bernoulli variable $X_a$ with law:

$$\underbrace{\frac{na - 1}{a_{A^{-1}(0)}}}_{\alpha}\, \delta_1 + \underbrace{\frac{a_{A^{-1}(0)} + 1 - na}{a_{A^{-1}(0)}}}_{\beta}\, \delta_0.$$

We assume that the canceled order is in front of the market maker's order in the queue if $X_a = 1$, and behind it if $X_a = 0$.

We proceed for the bid side as we just did for the ask side. We consider a random variable $X_b$ following a Bernoulli law:

$$\underbrace{\frac{nb - 1}{|b_{B^{-1}(0)}|}}_{\alpha}\, \delta_1 + \underbrace{\frac{|b_{B^{-1}(0)}| + 1 - nb}{|b_{B^{-1}(0)}|}}_{\beta}\, \delta_0.$$
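The cancellation model for the rank of the market maker's order can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `rank_after_cancel` and the explicit uniform draw `u` are assumptions made so that the example is deterministic and easy to test. A cancellation in front of the market maker's order improves her rank by one, in line with the $-X_a$ term in the dynamics of $na_t$.

```python
def rank_after_cancel(na, queue_size, u):
    """Rank of the market maker's order after one cancellation in her queue.

    na         : current rank of her order (1 = front of the queue)
    queue_size : number of orders in the queue before the cancellation
    u          : uniform draw in [0, 1), passed explicitly (hypothetical
                 design choice) so that the function is deterministic
    """
    # Bernoulli parameter: probability that the canceled order is in front.
    p_front = (na - 1) / queue_size
    canceled_in_front = u < p_front  # X_a = 1 in the notation of the text
    # The rank improves by one only when the canceled order was in front.
    return na - 1 if canceled_in_front else na


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    print(rank_after_cancel(3, 4, rng.random()))
```

Passing `u` explicitly keeps the randomness outside the function, so the same code can be driven by any simulator of the order book.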
B.1 Dynamics of X t et Y t The dynamic of the amount hold by the market maker on a no-interest-bearing account ( X t ) t ∈ R + is asfollows: dX t = la t pa t − { na t − =1 } dM + t − lb t pb t − { nb t − =1 } dM − t The market maker’s inventory ( Y t ) follows the dynamic: dY t = − la t { na t − =1 } dM + t + { nb t − =1 } lb t dM − t where: • ˆ a = sup { a i : (cid:80) i − j =1 a j = 0 } et ˆ b = sup { b i : (cid:80) i − j =1 b j = 0 }}• M ± t are Cox processes with intensities λ M ± B.2 Dynamics of the a t et b t We remind that a i is the number of orders located i ticks away from the best buy order.We denote by J the shif t operator that re-index a side of the book when an event occurred on theopposite side. 38 L − i , i ∈ { , . . . , B − (0) } is the shift operator that shifts the bid side due to the jump of a L + i for i ∈ { , K } . We get: J L − i ( a ) =  a i +1 , ..., a K , a ∞ , . . . , a ∞ (cid:124) (cid:123)(cid:122) (cid:125) i times  Dynamics of a i : d a i = (1 − lb t ) d L + i + lb t d L + i − ( A − (0) − rb t − ) + (cid:104)(cid:0) − lb t (cid:1) + lb t { nb t − > } (cid:105)(cid:16) J M − (cid:0) a i (cid:1) − a i (cid:17) d M − ( t ) − (1 − lb t ) dC + i − lb t d C + i − ( A − (0) − rb t − ) + (1 − la t ) (cid:34) − { i = A − (0) } d M + t + (cid:0) J C − ( a i ) − a i (cid:1) d C − A − (0) + (1 − lb t ) A − (0) − (cid:88) j =1 ( J L − j , ( a i ) − a i ) d L − j ( t )+ lb t A − (0) − (cid:88) j =1 ( J L − j , ( a i ) − a i ) d L − j ( t ) (cid:35) + la t (cid:34) − { na t − > } { i = A − (0) } d M + t + (cid:0) J C − ( a i ) − a i (cid:1) d C − ra t − + lb t ra t − − (cid:88) j =1 ( J L − j , ( a i ) − a i ) d L − j ( t )+ (1 − lb t ) ra t − − (cid:88) j =1 ( J L − j , ( a i ) − a i ) d L − j ( t ) (cid:35) with J such that: J C − ( a i ) =  a ∞ si i > B − (1) − B − (0) + Ka i − (cid:0) B − (1) − B − (0) (cid:1) si i > (cid:0) B − (1) − B − (0)0 si i ≤ B − (1) − B − (0) J M − ( a i ) = (cid:40) a i − (cid:0) B − (1) − B − (0) (cid:1) si 
i > (cid:0) B − (1) − B − (0)0 si i ≤ B − (1) − B − (0) J L − j , ( a i ) = (cid:26) a i + j si i + j ≤ K si i + j < K J L − j , ( a i ) = (cid:26) a i + rb t − − j si i + rb t − − j ≤ Ka ∞ si i + rb t − − j > K J L − j , ( a i ) = (cid:26) a i + ra t − − j si i + ra t − − j ≤ Ka ∞ si i + ra t − − j > K L − j , ( a i ) =  si i + rb t − − j < a i + rb t − − j si i + rb t − − j ≤ Ka ∞ si i + rb t − − j > K We remind that b i is the number of buy order located i ticks away from the best sell order. Dynamics of b i : db i = − (1 − la t ) dL − i − la t d L − i − ( A − (0) − ra t − ) + (cid:104)(cid:0) − la t (cid:1) + la t { na t − > } (cid:105)(cid:16) J M + (cid:0) b i (cid:1) − b i (cid:17) d M + ( t )+ (1 − la t ) d C − i + la t d C − i − ( A − (0) − ra t − ) + (1 − lb t ) (cid:34) { i = A − (0) } d M − t + (cid:0) J C + ( b i ) − b i ) d C + A − (0) + (1 − la t ) B − (0) − (cid:88) j =1 ( J L + j , ( b i ) − b i ) d L + j ( t )+ la t B − (0) − (cid:88) j =1 ( J L + j , ( b i ) − b i ) d L + j ( t ) (cid:35) + lb t (cid:34) { nb t − > } { i = A − (0) } d M − t + (cid:0) J C + ( b i ) − b i ) d C + rb t − + la t rb t − − (cid:88) j =1 ( J L + j , ( b i ) − b i ) d L + j ( t )+ (1 − la t ) rb t − − (cid:88) j =1 ( J L + j , ( b i ) − b i ) d L + j ( t ) (cid:35) with J the shift operators: J C + ( b i ) =  b ∞ si i + A − (1) − A − (0) > Kb i − (cid:0) A − (1) − A − (0) (cid:1) si i > (cid:0) A − (1) − A − (0)0 si i ≤ A − (1) − A − (0) J M + ( b i ) =  b ∞ si i + A − (1) − A − (0) > Kb i − (cid:0) A − (1) − A − (0) (cid:1) si i > (cid:0) A − (1) − A − (0)0 si i ≤ A − (1) − A − (0) J L + j , ( b i ) = (cid:26) b i + j si i + j ≤ K si i + j > K J L + j , ( b i ) = (cid:26) b i − j + A − (0) si i − j + A − (0) ≤ Kb ∞ si i − j + A − (0) > K L + j , ( b i ) = (cid:26) b i + rb t − − j si i + rb t − − j ≤ Kb ∞ si i + rb t − − j > K J L + j , ( b i ) =  si i + rb t − − j < b i + rb t − − j si i + rb t − − j ≤ Kb ∞ si i + rb t − − j > K B.3 Dynamics of na t and 
nb t Dynamics of na t : X a has been introduced in part B. It models whether the canceled order is behind or in front of themarket maker’s order in the queue.We get: dna t = la t (cid:34) − X a (cid:16) (1 − lb t ) d C + A − (0) ( t ) + lb t d C + rb t − (cid:17) + (cid:16) − { na t − > } + (cid:0) a A − (0) + 1 − na t − (cid:1) { na t − =1 } (cid:17) d M + t + (2 − na t − ) (cid:110) (1 − lb t ) ra t − ( t ) − (cid:88) i =1 d L + i + lb t ra t − ( t ) − ( A − (0) − rb t − ) − (cid:88) i =1 d L + i (cid:111)(cid:35) + (1 − la t ) (cid:34)(cid:16) a A − (0) { a A − > } + ( a A − (1) + 1) { a A − =1 } − na t − (cid:17) d M + t + lb t (cid:20) (2 − na t − ) rb t − − (cid:88) j =1 dL + j + ( a A − (0) + 2 − na t − ) d L + rb t − + ( a A − (0) + 1 − na t − ) K (cid:88) j = rb t − +1 (cid:0) d L + j + dC + j (cid:1) + (cid:16) a A − (0) { a A − > } + ( a A − (1) + 1) { a A − =1 } − na t − (cid:17) d C + rb t − (cid:21) + (1 − lb t ) (cid:20) (2 − na t − ) B − (0) − (cid:88) j =1 d L + j + ( a A − (0) + 2 − na t − ) d L + A − (0) + ( a A − (0) + 1 − na t − ) K (cid:88) j = B − (0)+1 (cid:16) d L + j + d C + j (cid:17) + (cid:16) a A − (0) { a A − > } + ( a A − (1) + 1) { a A − =1 } − na t − (cid:17) d C + A − (0) (cid:21) + (cid:0) a A − (0) + 1 − na t − (cid:1)(cid:20) d M − t + K (cid:88) j =1 (cid:16) d L − j + d C − j (cid:17)(cid:21)(cid:35) na t = ( la t == 0)( − − na t − ) (cid:34) d M + t + d M − t + ( lb t ! 
= 1) K (cid:88) i =1 (cid:0) d L + i + d L − i + d C + i + d C − i (cid:1) + ( lb t == 1) (cid:20) K − ( A − (0) − rb t − ) (cid:88) i =0 (cid:0) dL + i + dC + i (cid:1) + K (cid:88) i =1 (cid:0) dL − i + d C − i (cid:1)(cid:21)(cid:35) + la t =1 (cid:40)(cid:16) na t − = − + na t − != − ra t − >A − (0) (cid:17)(cid:104)(cid:0) a A − (0) − na t − (cid:1)(cid:104) d M + t + lb t =1 d C + rb t − + lb t !=1 d C + A − (0) (cid:105) + (cid:0) a A − (0) + 1 − na t − (cid:1)(cid:104) lb t !=1 K (cid:88) i =1 (cid:0) dL + i + dC + i (cid:1) + lb t =1 K − ( A − (0) − rb t − ) (cid:88) i =1 (cid:0) d L + i + dC + i (cid:1)(cid:105)(cid:105)(cid:41) Dynamics of nb t : dnb t = lb t (cid:34) − X b (cid:16) (1 − la t ) d C − B − (0) ( t ) + la t d C − ra t − (cid:17) + (cid:16) − { nb t − > } + (cid:0) | b B − (0) | + 1 − nb t − (cid:1) { nb t − =1 } (cid:17) d M − t + (2 − nb t − ) (cid:110) (1 − la t ) rb t − ( t ) − (cid:88) i =1 d L − i + la t rb t − ( t ) − ( B − (0) − ra t − ) − (cid:88) i =1 d L − i (cid:111)(cid:35) + (1 − lb t ) (cid:34)(cid:16) | b A − (0) | {| b B − | > } + ( | b B − (1) | + 1) {| b B − | =1 } − nb t − (cid:17) d M − t + la t (cid:20) (2 − nb t − ) ra t − − (cid:88) j =1 dL − j + ( | b A − (0) | + 2 − nb t − ) d L − ra t − + ( | b A − (0) | + 1 − nb t − ) K (cid:88) j = ra t − +1 (cid:16) d L − j + d C − j (cid:17) + (cid:16) | b A − (0) | {| b B − | > } + ( | b B − (1) | + 1) {| b B − | =1 } − nb t − (cid:17) d C − ra t − (cid:21) + (1 − la t ) (cid:20) (2 − nb t − ) B − (0) − (cid:88) j =1 d L − j + ( | b B − (0) | + 2 − nb t − ) d L − A − (0) + ( | b B − (0) | + 1 − nb t − ) K (cid:88) j = B − (0)+1 (cid:16) d L − j + d C − j (cid:17) + (cid:16) | b A − (0) | {| b B − | > } + ( | b B − (1) | + 1) {| b B − | =1 } − nb t − (cid:17) d C − A − (0) (cid:21) + (cid:0) | b A − (0) | + 1 − nb t − (cid:1)(cid:20) d M + t + K (cid:88) j =1 (cid:16) d L + j + d C + j (cid:17)(cid:21) .4 Dynamics of pa and pb Dynamics of ( pa t ) t : Denoting by δ the tick, we 
have: d P At = δ (1 − la t ) (cid:34)(cid:16)(cid:0) A − (1) − ra t − (cid:1) d M + ( t )+ lb t (cid:34) − rb t − − (cid:88) i =1 (cid:20) rb t − − ( A − (0) − ra t − ) − j (cid:21) d L + i ( t ) + (cid:0) A − (0) − ra t − (cid:1) K (cid:88) j = rb t − dL + i + (cid:0) A − (1) − ra t − (cid:1) d C + rb t − + K (cid:88) j = rb t − +1 (cid:0) A − (0) − ra t − (cid:1) d C + j (cid:35) + (cid:16) − lb t (cid:17)(cid:34) − A − (0) − (cid:88) i =1 (cid:0) ra t − − j (cid:1) d L + i ( t ) + (cid:0) A − (0) − ra t − (cid:1) K (cid:88) j = A − (0) dL + i + (cid:0) A − (1) − ra t − (cid:1) d C + A − (0) + K (cid:88) j = A − (0)+1 (cid:0) A − (0) − ra t − (cid:1) d C + j (cid:35) + { ra t − (cid:54) = A − (0) } (cid:0) A − (0) − ra t − (cid:1)(cid:18) d M − t + K (cid:88) j =1 d L − t + K (cid:88) j =1 d C − t (cid:19)(cid:35) + δla t (cid:34) ( A − (0) − r At ) (cid:17) d M + ( t ) − lb t rb t − − ( A − (0) − ra t − ) − (cid:88) i =1 (cid:18) ra t − − (cid:0) j + A − (0) − rb t − (cid:1)(cid:19) d L + i ( t ) − (cid:16) − lb t (cid:17) ra t − − (cid:88) i =1 (cid:0) ra t − − j (cid:1) d L + i ( t ) (cid:35) Dynamics of ( pb t ) : P Bt = − δ (1 − lb t ) (cid:34)(cid:16)(cid:0) B − (1) − rb t − (cid:1) d M − ( t )+ la t (cid:34) − ra t − − (cid:88) i =1 (cid:20) ra t − − ( B − (0) − rb t − ) − j (cid:21) d L − i ( t ) + (cid:0) B − (0) − rb t − (cid:1) K (cid:88) j = ra t − dL − i + (cid:0) B − (1) − rb t − (cid:1) d C − rb t − + K (cid:88) j = ra t − +1 (cid:0) B − (0) − rb t − (cid:1) d C − j (cid:35) + (cid:16) − la t (cid:17)(cid:34) − B − (0) − (cid:88) i =1 (cid:0) rb t − − j (cid:1) d L − i ( t ) + (cid:0) B − (0) − rb t − (cid:1) K (cid:88) j = B − (0) dL − i + (cid:0) B − (1) − rb t − (cid:1) d C − A − (0) + K (cid:88) j = A − (0)+1 (cid:0) B − (0) − rb t − (cid:1) d C − j (cid:35) + { rb t − (cid:54) = A − (0) } (cid:0) A − (0) − rb t − (cid:1)(cid:18) d M + t + K (cid:88) j =1 dL + j + K (cid:88) j =1 dC + j (cid:19)(cid:35) − δlb t (cid:34) ( B − (0) − r Bt ) 
(cid:17) d M − ( t ) − la t ra t − − ( B − (0) − rb t − ) − (cid:88) i =1 (cid:18) rb t − − (cid:0) j + B − (0) − ra t − (cid:1)(cid:19) d L − i ( t ) − (cid:16) − la t (cid:17) rb t − − (cid:88) i =1 (cid:0) rb t − − j (cid:1) d L − i ( t ) (cid:35) B.5 Dynamics of ra and rb We remind that ra t denotes the number of ticks between the market maker’s order and the best buyorder in the order book. We assumed in this simpliﬁed control problem that the market maker is allowedto place no more than one order on the best ask and best bid limits. So ra and rb are vectors of size 1 here.44 ynamics of ra : d ra t = la t (cid:34) { na =1 } (cid:16) A − (0) − r At (cid:17) d M + t + (1 − lb t ) ra t − − (cid:88) i =1 (cid:16) i − ra t − (cid:17) d L + i + lb t rb t − − ( B − (0) − ra t − ) − (cid:88) i =1 (cid:16) i + B − (0) − rb t − − ra t − (cid:17) d L + i + ra t − − (cid:88) i =1 (cid:0) i − ra t − (cid:1) d L − i + (cid:0) B − (1) − B − (0) (cid:1) d C − ra t − (cid:18)(cid:16)(cid:0) − lb t (cid:1) + lb t { nb t − > } (cid:17)(cid:104) B − (1) − B − (0) (cid:105)(cid:19) d M − t (cid:35) + ( l − la t ) (cid:34) lb t (cid:20) rb t − − (cid:88) j =1 (cid:16) j + B − (0) − rb t − − ra t − (cid:17) d L + j + (cid:0) A − (0) − ra t − (cid:1) K (cid:88) j = rb t − d L + j (cid:0) A − (1) − ra t − (cid:1) d C + rb t − + (cid:0) A − (0) − ra t − (cid:1) K (cid:88) j = rb t − +1 d C + j (cid:21) + (1 − lb t ) (cid:20) B − (0) − (cid:88) j =1 (cid:16) j − ra t − (cid:17) d L + j + (cid:0) A − (0) − ra t − (cid:1) K (cid:88) j = B − (0) (cid:0) d C + j + d C − j (cid:1) + (cid:0) A − (1) − ra t − (cid:1) d C + A − (0) + K (cid:88) j = A − (0)+1 (cid:0) A − (0) − ra t − (cid:1) d C + j + A − (0) − (cid:88) j =1 ( j − ra t − ) d L − j + K (cid:88) j = A − (0) (cid:0) A − (0) − ra t − (cid:1) d L − j + (cid:0) B − (1) − ra t (cid:1) d C − B − (0) + (cid:0) A − (0) − ra t − (cid:1) K (cid:88) j = A − (0)+1 d C − j + (cid:20)(cid:16) (1 − lb t ) + lb t { nb t − > } (cid:17)(cid:18) B − 
(1) − ra t − (cid:19) + lb t { nb t − =1 } (cid:0) B − (0) − ra t − (cid:1)(cid:21) d M − t + (cid:0) A − (1) − ra t − (cid:1) d M + t We remind that rb t is the number of ticks between the market maker’s order and the best sell orderin the order book. 45 ynamics of rb : d rb t = lb t (cid:34) { nb =1 } (cid:16) B − (0) − rb (cid:17) d M − t + (1 − la t ) rb t − − (cid:88) i =1 (cid:16) i − rb t − (cid:17) d L − i + la t ra t − − ( A − (0) − rb t − ) − (cid:88) i =1 (cid:16) i + A − (0) − ra t − − rb t − (cid:17) d L − i + rb t − − (cid:88) i =1 (cid:0) i − rb t − (cid:1) d L + i + (cid:0) A − (1) − A − (0) (cid:1) d C + rb t − (cid:18)(cid:16)(cid:0) − la t (cid:1) + la t { na t − > } (cid:17)(cid:104) A − (1) − A − (0) (cid:105)(cid:19) d M + t (cid:35) + ( l − lb t ) (cid:34) la t (cid:20) ra t − − (cid:88) j =1 (cid:16) j + A − (0) − ra t − − rb t − (cid:17) d L − j + K (cid:88) j = ra t − (cid:0) B − (0) − rb t − (cid:1) d L − j (cid:0) B − (1) − rb t − (cid:1) d C − ra t − + K (cid:88) j = ra t − +1 (cid:0) B − (0) − rb t − (cid:1) d C − j (cid:21) + (1 − la t ) (cid:20) A − (0) − (cid:88) j =1 (cid:16) j − rb t − (cid:17) d L − j + K (cid:88) j = A − (0) (cid:0) B − (0) − rb t − (cid:1) d L − j + (cid:0) B − (1) − rb t − (cid:1) d C − B − (0) + K (cid:88) j = B − (0)+1 (cid:0) B − (0) − rb t − (cid:1) d C − j + B − (0) − (cid:88) j =1 ( j − rb t − ) d L + j + K (cid:88) j = B − (0) (cid:0) B − (0) − rb t − (cid:1) d L + j + (cid:0) A − (1) − rb t (cid:1) d C + A − (0) + (cid:0) B − (0) − rb t − (cid:1) K (cid:88) j = B − (0)+1 d C + j + (cid:20)(cid:16) (1 − la t ) + la t { na t − > } (cid:17)(cid:0) A − (1) − rb t − (cid:19) + la t { na t − =1 } (cid:0) A − (0) − rb t − (cid:1)(cid:21) d M + t + (cid:0) B − (1) − rb t − (cid:1) d M − t Appendix C Proof of Theorem 4.1 and Corollary 4.1

We divide the proofs of Theorem 4.1 and Corollary 4.1 into several lemmas, which we now state and prove.

Lemma C.1 aims at bounding the projection error. It relies on [GKKW02], see p. 93, as well as on Zador's theorem, stated in Appendix D for the sake of completeness.

Lemma C.1. Assume $d \geq 3$, and take $K = M^{d+2}$ points for the optimal quantization of $\varepsilon_n$. Then it holds under (Hµ) and (HF), as $M \to +\infty$,

$$
\varepsilon^{\mathrm{proj}}_n = O\Big( \frac{1}{M^{1/d}} \Big), \tag{C.1}
$$

where we recall that $\varepsilon^{\mathrm{proj}}_n := \sup_{a \in A} \big\| \mathrm{Proj}_{n+1}\big(F(X_n, a, \hat\varepsilon_n)\big) - F(X_n, a, \hat\varepsilon_n) \big\|_2$ stands for the average projection error.

Proof. Let us take $\eta > 0$, and observe that

$$
\mathbb{P}\Big( \big| \mathrm{Proj}_{n+1}\big[F(X_n, a, \hat\varepsilon_{n+1})\big] - F(X_n, a, \hat\varepsilon_{n+1}) \big|^2 > \eta \Big)
= \mathbb{E}\Big[ \prod_{m=1}^{M} \mathbb{E}\Big[ \mathbf{1}_{\big\{ |X^{t,(m)}_{n+1} - F(X_n, a, \hat\varepsilon_{n+1})| > \sqrt{\eta} \big\}} \,\Big|\, X_n, \hat\varepsilon_{n+1} \Big] \Big]
= \mathbb{E}\Big[ \Big( 1 - \mu\big[ B\big( F(X_n, a, \hat\varepsilon_{n+1}), \sqrt{\eta} \big) \big] \Big)^{M} \Big],
$$

where for all $x \in E$ and $\eta > 0$, $B(x, \eta)$ denotes the ball of center $x$ and radius $\eta$. Since $x \mapsto (1 - x)^M$ is $M$-Lipschitz, we get by application of Zador's theorem:

$$
\begin{aligned}
\mathbb{P}\Big( \big| \mathrm{Proj}_{n+1}\big[F(X_n, a, \hat\varepsilon_{n+1})\big] - F(X_n, a, \hat\varepsilon_{n+1}) \big|^2 > \eta \Big)
&\leq M [F]_L [\mu]_L \, \| \hat\varepsilon_{n+1} - \varepsilon_{n+1} \|_2 + \mathbb{E}\Big[ \Big( 1 - \mu\big( B\big( F(X_n, a, \varepsilon_{n+1}), \sqrt{\eta} \big) \big) \Big)^{M} \Big] \\
&= M [F]_L [\mu]_L \frac{1}{K^{1/d}} + \mathbb{E}\Big[ \Big( 1 - \mu\big( B\big( F(X_n, a, \varepsilon_{n+1}), \sqrt{\eta} \big) \big) \Big)^{M} \Big] + O\Big( \frac{M}{K^{1/d}} \Big),
\end{aligned}
$$

as the number $K$ of points for the quantization of the exogenous noise goes to $+\infty$, and where $M$ stands for the size of the grids $\Gamma_n$.

Let us introduce $A_1, \ldots, A_{N(\eta)}$, a cubic partition of $\mathrm{Supp}(\mu)$, which is bounded under (Hµ), such that for all $j = 1, \ldots, N(\eta)$, $A_j$ has diameter $\eta$. Also, notice that there exists $c > 0$, which only depends on $\mathrm{Supp}(\mu)$, such that

$$
N(\eta) \leq \frac{c}{\eta^{d}}. \tag{C.2}
$$

If $x \in A_j$, then $A_j \subset B(x, \eta)$, therefore:

$$
\mathbb{E}\Big[ \big( 1 - \mu( B(X_n, \eta) ) \big)^{M} \Big]
= \sum_{j=1}^{N(\eta)} \int_{A_j} \big( 1 - \mu( B(x, \eta) ) \big)^{M} \mu(dx)
\leq \sum_{j=1}^{N(\eta)} \int_{A_j} \big( 1 - \mu(A_j) \big)^{M} \mu(dx). \tag{C.3}
$$

Also notice that:

$$
\sum_{j=1}^{N(\eta)} \mu(A_j) \big( 1 - \mu(A_j) \big)^{M}
\leq \sum_{j=1}^{N(\eta)} \max_{z \in [0,1]} z (1 - z)^{M}
\leq \frac{N(\eta)}{e M}. \tag{C.4}
$$

Combining (C.3) and (C.4) leads to

$$
\mathbb{E}\Big[ \big( 1 - \mu( B(X_n, \eta) ) \big)^{M} \Big] \leq \frac{N(\eta)}{e M}. \tag{C.5}
$$

Let $L = 2 \| \mu \|_\infty$ denote the diameter of the support of $\mu$. We then get, as $M \to +\infty$,

$$
\begin{aligned}
\mathbb{E}\Big[ \big| \mathrm{Proj}_{n+1}\big[F(X_n, a, \hat\varepsilon_{n+1})\big] - F(X_n, a, \hat\varepsilon_{n+1}) \big|^2 \Big]
&= \int_0^{\infty} \mathbb{P}\Big( \big| \mathrm{Proj}_{n+1}\big[F(X_n, a, \hat\varepsilon_{n+1})\big] - F(X_n, a, \hat\varepsilon_{n+1}) \big|^2 > \eta \Big) \, d\eta \\
&\leq \int_0^{L^2} \Big( M [F]_L [\mu]_L \frac{1}{K^{1/d}} + \mathbb{E}\Big[ \Big( 1 - \mu\big( B\big( F(X_n, a, \varepsilon_{n+1}), \sqrt{\eta} \big) \big) \Big)^{M} \Big] \Big) \, d\eta \\
&\leq \int_0^{L^2} \min\Big( 1, \frac{N(\sqrt{\eta})}{e M} \Big) \, d\eta + O\Big( \frac{M}{K^{1/d}} \Big) \\
&\leq \int_0^{L^2} \min\Big( 1, \frac{c \, \eta^{-d/2}}{e M} \Big) \, d\eta + O\Big( \frac{M}{K^{1/d}} \Big) \\
&= \int_0^{(c/(eM))^{2/d}} d\eta + \int_{(c/(eM))^{2/d}}^{L^2} \frac{c \, \eta^{-d/2}}{e M} \, d\eta + O\Big( \frac{M}{K^{1/d}} \Big) \\
&\leq \frac{\tilde{c}^{\,2}}{M^{2/d}} + O\Big( \frac{M}{K^{1/d}} \Big),
\end{aligned} \tag{C.6}
$$

where $\tilde{c}$ is defined as $\tilde{c} := \sqrt{\frac{d}{d-2}} \big( \frac{c}{e} \big)^{1/d}$, and where we used (C.5) to go from the second to the third line and (C.2) to go from the third to the fourth line. It remains to take $K = M^{d+2}$ points for the optimal quantization of the exogenous noise, so that $M / K^{1/d} = M^{-2/d}$, and then to take the square root of (C.6), in order to derive (C.1).

Lemma C.2.

Assume $d \geq 3$, take $K = M^{d+2}$ points for the optimal quantization of $\varepsilon_n$, and let $x \in E$. Then it holds under (Hµ) and (HF), as $M \to +\infty$:

$$
\varepsilon^{\mathrm{proj}}_n(x) = O\Big( \frac{1}{M^{1/d}} \Big),
$$

where $\varepsilon^{\mathrm{proj}}_n(x)$, defined as $\varepsilon^{\mathrm{proj}}_n(x) := \sup_{a \in A} \big\| \mathrm{Proj}_{n+1}\big(F(x, a, \hat\varepsilon_n)\big) - F(x, a, \hat\varepsilon_n) \big\|_2$, stands for the later-projection error at state $x$.

Proof. Following the same steps as those used to prove Lemma C.1, we show that, as $K \to +\infty$:

$$
\mathbb{P}\Big( \big| \mathrm{Proj}_{n+1}\big[F(x, a, \hat\varepsilon_{n+1})\big] - F(x, a, \hat\varepsilon_{n+1}) \big|^2 > \eta \Big)
\leq M [F]_L [\mu]_L \frac{1}{K^{1/d}} + \mathbb{E}\Big[ \Big( 1 - \mu\big( B\big( F(x, a, \varepsilon_{n+1}), \sqrt{\eta} \big) \big) \Big)^{M} \Big] + O\Big( \frac{M}{K^{1/d}} \Big),
$$

and moreover,

$$
\mathbb{E}\Big[ \Big( 1 - \mu\big( B\big( F(x, a, \varepsilon_{n+1}), \sqrt{\eta} \big) \big) \Big)^{M} \Big] \leq \frac{N(\sqrt{\eta})}{e M}
$$

holds, which is enough to complete the proof of Lemma C.2.
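The two elementary estimates used in the proofs of Lemmas C.1 and C.2 can be checked numerically: the bound $\max_z z(1-z)^M \leq 1/(eM)$, and the fact that $\int_0^{L^2} \min\big(1, c\,\eta^{-d/2}/(eM)\big)\,d\eta$ is bounded by $\frac{d}{d-2}\,(c/(eM))^{2/d}$ and is asymptotically equivalent to it. The following is a minimal sketch only: the function names and the sample values of $c$, $d$, $L$ and $M$ are illustrative assumptions and do not come from the paper.

```python
import math


def max_z_term(M: int) -> float:
    """Exact maximum of z * (1 - z)**M over [0, 1], attained at z = 1 / (M + 1)."""
    z_star = 1.0 / (M + 1)
    return z_star * (1.0 - z_star) ** M


def truncated_integral(c: float, d: int, M: int, L: float) -> float:
    """Closed form of the integral over [0, L**2] of min(1, c * eta**(-d/2) / (e * M)), d >= 3."""
    a = c / (math.e * M)
    eta0 = a ** (2.0 / d)  # point where c * eta**(-d/2) / (e * M) equals 1
    upper = L ** 2
    if eta0 >= upper:  # the integrand equals 1 on the whole interval
        return upper
    # integral of a * eta**(-d/2) between eta0 and upper
    tail = a * (eta0 ** (1.0 - d / 2.0) - upper ** (1.0 - d / 2.0)) / (d / 2.0 - 1.0)
    return eta0 + tail


def leading_term(c: float, d: int, M: int) -> float:
    """Upper bound d / (d - 2) * (c / (e * M))**(2 / d) on the truncated integral."""
    return d / (d - 2.0) * (c / (math.e * M)) ** (2.0 / d)


if __name__ == "__main__":
    # hypothetical values: c = 1, d = 3, L = 2
    for M in (10, 1_000, 100_000):
        ratio = truncated_integral(1.0, 3, M, 2.0) / leading_term(1.0, 3, M)
        print(M, max_z_term(M) * math.e * M, ratio)
```

In this sketch the printed ratio approaches 1 as $M$ grows, which matches the $M^{-2/d}$ rate obtained before taking the square root.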

Lemma C.3. Under (HF), for $n = 0, \ldots, N$ there exists a constant $\big[\hat{V}^Q_n\big]_L > 0$ such that for $x, x' \in E$, it holds as $M \to \infty$:

$$
\big| \hat{V}^Q_n(x) - \hat{V}^Q_n(x') \big| \leq \big[\hat{V}^Q_n\big]_L \, |x - x'| + O\Big( \frac{1}{M^{1/d}} \Big). \tag{C.7}
$$

Moreover, the following bounds hold on $\big[\hat{V}^Q_n\big]_L$, for $n = 0, \ldots, N$:

$$
\begin{cases}
\big[\hat{V}^Q_N\big]_L \leq [g]_L, \\
\big[\hat{V}^Q_n\big]_L \leq [f]_L + [F]_L \big[\hat{V}^Q_{n+1}\big]_L, & \text{for } n = 0, \ldots, N-1.
\end{cases} \tag{C.8}
$$

Proof. Let us show by backward induction on $n$ that $\hat{V}^Q_n$ is Lipschitz, up to the remainder appearing in (C.7). First, notice that (C.7) holds at terminal time $n = N$, if one defines $\big[\hat{V}^Q_N\big]_L$ as $\big[\hat{V}^Q_N\big]_L = [g]_L$. Let us take $x, x' \in E$. Assume that

$$
\big| \hat{V}^Q_{n+1}(x) - \hat{V}^Q_{n+1}(x') \big| \leq \big[\hat{V}^Q_{n+1}\big]_L \, |x - x'| + O\Big( \frac{1}{M^{1/d}} \Big)
$$

holds for some $n = 0, \ldots, N-1$. Let us show that

$$
\big| \hat{V}^Q_n(x) - \hat{V}^Q_n(x') \big| \leq \big[\hat{V}^Q_n\big]_L \, |x - x'| + O\Big( \frac{1}{M^{1/d}} \Big),
$$

where $\big[\hat{V}^Q_n\big]_L$ is defined in (C.8). Notice that, by the dynamic programming principle and the triangle inequality, it holds:

$$
\begin{aligned}
\big| \hat{V}^Q_n(x) - \hat{V}^Q_n(x') \big|
&\leq [f]_L \, |x - x'| + \sup_{a} \mathbb{E}^a_n \Big[ \big| \hat{V}^Q_{n+1}\big( \mathrm{Proj}_{n+1}( F(x, a, \hat\varepsilon_{n+1}) ) \big) - \hat{V}^Q_{n+1}\big( \mathrm{Proj}_{n+1}( F(x', a, \hat\varepsilon_{n+1}) ) \big) \big| \Big] \\
&\leq [f]_L \, |x - x'| + \big[\hat{V}^Q_{n+1}\big]_L \sup_{a} \mathbb{E}\Big[ \big| \mathrm{Proj}_{n+1}( F(x, a, \hat\varepsilon_{n+1}) ) - \mathrm{Proj}_{n+1}( F(x', a, \hat\varepsilon_{n+1}) ) \big| \Big] + O\Big( \frac{1}{M^{1/d}} \Big) \\
&\leq \Big( [f]_L + \big[\hat{V}^Q_{n+1}\big]_L [F]_L \Big) |x - x'| + O\Big( \frac{1}{M^{1/d}} \Big) \\
&\leq \big[\hat{V}^Q_n\big]_L \, |x - x'| + O\Big( \frac{1}{M^{1/d}} \Big),
\end{aligned}
$$

where the third line follows from the triangle inequality, from Lemma C.2 applied at states $x$ and $x'$ to bound the two projection errors by $O(M^{-1/d})$, and from the Lipschitz continuity of $F$ with constant $[F]_L$. This completes the proof of (C.7).

We now proceed to the proof of Theorem 4.1.

Proof.
(of Theorem 4.1) Combining the inequality $|u_1 + u_2 + u_3|^2 \leq 3 \big( |u_1|^2 + |u_2|^2 + |u_3|^2 \big)$, which holds for all $u_1, u_2, u_3 \in \mathbb{R}$, with the inequality $\big| \sup_{i \in I} a_i - \sup_{i \in I} b_i \big| \leq \sup_{i \in I} |a_i - b_i|$, which holds for all families $(a_i)_{i \in I}$ and $(b_i)_{i \in I}$ of reals and every subset $I$ of $\mathbb{R}$, we have:

$$
\begin{aligned}
\big\| \hat{V}^Q_n(X_n) - V_n(X_n) \big\|_2^2 \leq 3\, \mathbb{E}\Big[
& \sup_{a \in A} \mathbb{E}_{n, X_n} \Big| \hat{V}^Q_{n+1}\big( \mathrm{Proj}_{n+1}( F(X_n, a, \hat\varepsilon_{n+1}) ) \big) - \hat{V}^Q_{n+1}\big( F(X_n, a, \hat\varepsilon_{n+1}) \big) \Big|^2 \\
& + \sup_{a \in A} \mathbb{E}_{n, X_n} \Big| \hat{V}^Q_{n+1}\big( F(X_n, a, \hat\varepsilon_{n+1}) \big) - \hat{V}^Q_{n+1}\big( F(X_n, a, \varepsilon_{n+1}) \big) \Big|^2 \\
& + \sup_{a \in A} \mathbb{E}_{n, X_n} \Big| \hat{V}^Q_{n+1}\big( F(X_n, a, \varepsilon_{n+1}) \big) - V_{n+1}\big( F(X_n, a, \varepsilon_{n+1}) \big) \Big|^2 \Big],
\end{aligned}
$$

where $\mathbb{E}_{n, X_n}$ stands for the expectation conditioned by the state $X_n$ at time $n$. It holds as $M \to +\infty$, using Lemma C.3:

$$
\begin{aligned}
\big\| \hat{V}^Q_n(X_n) - V_n(X_n) \big\|_2^2 \leq{}
& 3 \big[\hat{V}^Q_n\big]_L^2 \, \mathbb{E}\Big[ \sup_{a} \mathbb{E}_{n, X_n}\big[ | \mathrm{Proj}_{n+1}( F(X_n, a, \hat\varepsilon_{n+1}) ) - F(X_n, a, \hat\varepsilon_{n+1}) |^2 \big] \\
& \qquad\qquad + \sup_{a} \mathbb{E}_{n, X_n}\big[ | F(X_n, a, \hat\varepsilon_{n+1}) - F(X_n, a, \varepsilon_{n+1}) |^2 \big] \Big] \\
& + 3 \| r \|_\infty \, \mathbb{E}\Big[ \big| \hat{V}^Q_{n+1}(X_{n+1}) - V_{n+1}(X_{n+1}) \big|^2 \Big] + O\Big( \frac{1}{M^{2/d}} \Big).
\end{aligned} \tag{C.9}
$$

Under (HF), (C.9) can then be rewritten as:

$$
\big\| \hat{V}^Q_n(X_n) - V_n(X_n) \big\|_2^2 \leq 3 \big[\hat{V}^Q_n\big]_L^2 \Big( [F]_L^2 \, (\varepsilon^Q_n)^2 + (\varepsilon^{\mathrm{proj}}_n)^2 \Big) + 3 \| r \|_\infty \big\| \hat{V}^Q_{n+1}(X_{n+1}) - V_{n+1}(X_{n+1}) \big\|_2^2 + O\Big( \frac{1}{M^{2/d}} \Big).
$$

(4.5) then follows by induction, which completes the proof of Theorem 4.1.

Proof.
(of Corollary 4.1) Corollary 4.1 follows directly by plugging the bound on the projection error provided by Lemma C.1 and the bound on the quantization error provided by Zador's theorem into (4.5).
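The quantization rate supplied by Zador's theorem (Theorem D.1) can be illustrated on a toy case in which the optimal quantizer is known in closed form. The sketch below is an illustration under stated assumptions, not the quantization procedure of the paper: it takes $\varepsilon \sim U[0,1]$ in dimension $d = 1$, for which the optimal $K$-point $L^2$ quantizer is the set of midpoints of $K$ equal cells, and checks that $K^{1/d} \, \| \hat\varepsilon - \varepsilon \|_2$ stays at the constant $1/(2\sqrt{3})$. The function names and the grid size are hypothetical.

```python
import math


def uniform_quantizer(K: int) -> list:
    """Midpoints of K equal cells of [0, 1]: the optimal K-point L2 quantizer of U[0, 1]."""
    return [(2 * k + 1) / (2 * K) for k in range(K)]


def l2_quantization_error(centers: list, n_grid: int = 200_000) -> float:
    """Approximate ||eps_hat - eps||_2 for eps ~ U[0, 1] with a midpoint rule on n_grid points."""
    K = len(centers)
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) / n_grid
        # for equally spaced midpoints, the nearest center is the one of the cell containing x
        k = min(int(x * K), K - 1)
        total += (x - centers[k]) ** 2
    return math.sqrt(total / n_grid)


if __name__ == "__main__":
    target = 1.0 / (2.0 * math.sqrt(3.0))
    for K in (2, 8, 64):
        err = l2_quantization_error(uniform_quantizer(K))
        # with d = 1, K**(1/d) * err should match the constant 1 / (2 * sqrt(3))
        print(K, K * err, target)
```

Dimension one is used here only because the optimal grid is explicit; the projection-error lemmas above assume a higher-dimensional setting, where optimal grids have to be computed numerically.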

Appendix D Zador’s Theorem

Theorem D.1 (Zador's theorem). Let us take $n = 0, \ldots, N$, and denote by $K$ the number of points for the quantization of the exogenous noise $\varepsilon_n$. Assume that $\mathbb{E}\big[ |\varepsilon_n|^{2+\eta} \big] < +\infty$ for some $\eta > 0$. Then, there exists a universal constant $C > 0$ such that:

$$
\lim_{K \to +\infty} \Big( K^{1/d} \, \| \hat\varepsilon_n - \varepsilon_n \|_2 \Big) = C.
$$

Proof.

We refer to [GL00] for a proof of Theorem D.1.

References

[Abe+16] F. Abergel, A. Anane, A. Chakraborti, A. Jedidi, and I. Muni Toke. Limit Order Books. Cambridge University Press, 2016.

[AS07] M. Avellaneda and S. Stoikov. “High-frequency trading in a limit order book”. In: Quantitative Finance […].

[…] ESAIM Proceedings and Surveys 65 (2019).

[BBEM18] N. Baradel, B. Bouchard, D. Evangelista, and O. Mounjid. “Optimal inventory management and order book modeling”. In: arXiv:1802.08135 (2018).

[BKS10] D. Belomestny, A. Kolodko, and J. Schoenmakers. “Regression methods for stochastic control problems and their convergence analysis”. In: SIAM Journal on Control and Optimization […].

[…] eprint arXiv:1712.09705 (2017).

[BR11] N. Bäuerle and U. Rieder. Markov Decision Processes with Applications to Finance. Springer, 2011.

[Bré81] P. Brémaud. Point Processes and Queues: Martingale Dynamics. Springer, 1981.

[CJ10] A. Cartea and S. Jaimungal. “Modeling Asset Prices for Algorithmic and High Frequency Trading”. In: Applied Mathematical Finance […].

[…] Mathematical Finance […].

[…] SIAM Journal on Financial Mathematics (2014), pp. 415–444.

[CPJ15] A. Cartea, J. Penalva, and S. Jaimungal. Algorithmic and High-frequency trading. Cambridge University Press, 2015.

[CST07] R. Cont, S. Stoikov, and R. Talreja. “A stochastic model for order book dynamics”. In: Operations Research 58 (2007), pp. 549–563.

[EAA15] S. El Aoud and F. Abergel. “A stochastic control approach for options market making.” In: World Scientific Publishing Company […].

[…] SIAM Journal of Financial Mathematics […].

[…] Applied Mathematical Finance […].

[…] Methodology and Computing in Applied Probability (2018), pp. 1–32.

[GKKW02] L. Gyorfi, M. Kohler, A. Krzyzak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, 2002.

[GL00] S. Graf and H. Luschgy. Foundations of quantization for probability distributions. Vol. 1730. Springer-Verlag Berlin Heidelberg, 2000.

[GLFT12] O. Guéant, C.-A. Lehalle, and J. Fernandez-Tapia. “Dealing with the Inventory Risk”. In: Mathematics and Financial Economics […].

[…] Quantitative Finance […].

[…6] O. Guéant. The Financial Mathematics of Market Liquidity: From optimal execution to market making. Chapman and Hall/CRC, 2016.

[HLR15] W. Huang, C.-A. Lehalle, and M. Rosenbaum. “Simulating and analyzing order book data: The queue-reactive model”. In: Journal of the American Statistical Association […].

[…] arXiv:1812.04300 (2018).

[HS79] T. Ho and H. Stoll. “Optimal dealer pricing under transactions and return uncertainty”. In: Journal of Financial Economics […].

[…] SIFIN […].

[…] Monte Carlo Methods and Applications […] arXiv:1803.05690 (2018).

[Mas98] L. Massoulié. “Stability results for a general class of interacting point processes dynamics, and applications”. In: Stochastic Processes and their Applications […].

[…] International Conference on Computer Vision Theory and Applications (VISAPP) (2009).

[PPP04] G. Pagès, H. Pham, and J. Printems. “Optimal quantization methods and applications to numerical problems in finance”. In: Handbook on Numerical Methods in Finance. Ed. by Svetlozar T. Rachev and George A. Anastassiou. Boston: Birkhäuser, 2004. Chap. 7, pp. 253–298.

[Ros08] I. Rosu. “A dynamic model of the limit order book”. In: