Market Impact in Trader-Agents: Adding Multi-Level Order-Flow Imbalance-Sensitivity to Automated Trading Systems

Abstract

Financial markets populated by human traders often exhibit "market impact", where the traders' quote-prices move in the direction of anticipated change, before any transaction has taken place, as an immediate reaction to the arrival of a large (i.e., "block") buy or sell order in the market: e.g., traders in the market know that a block buy order will push the price up, and so they immediately adjust their quote-prices upwards. Most major financial markets now involve many "robot traders", autonomous adaptive software agents, rather than humans. This paper explores how to give such trader-agents a reliable anticipatory sensitivity to block orders, such that markets populated entirely by robot traders also show market-impact effects. In a 2019 publication Church & Cliff presented initial results from a simple deterministic robot trader, ISHV, which exhibits this market impact effect via monitoring a metric of imbalance between supply and demand in the market. The novel contributions of our paper are: (a) we critique the methods used by Church & Cliff, revealing them to be weak, and argue that a more robust measure of imbalance is required; (b) we argue for the use of multi-level order-flow imbalance (MLOFI: Xu et al., 2019) as a better basis for imbalance-sensitive robot trader-agents; and (c) we demonstrate the use of the more robust MLOFI measure in extending ISHV, and also the well-known AA and ZIP trading-agent algorithms (which have both been previously shown to consistently outperform human traders). We demonstrate that the new imbalance-sensitive trader-agents introduced here do exhibit market impact effects, and hence are better-suited to operating in markets where impact is a factor of concern or interest, but do not suffer the weaknesses of the methods used by Church & Cliff. The source-code for our work reported here is freely available on GitHub.



Zhen Zhang (a) and Dave Cliff (b), Department of Computer Science, University of Bristol, Bristol BS8 1UB

Keywords: Market Impact, Adaptive Trader Agents, Financial Markets, Multi-Agent Systems.

Abstract: Financial markets populated by human traders often exhibit so-called “market impact”, where the prices quoted by traders move in the direction of anticipated change, before any transaction has taken place, as an immediate reaction to the arrival of a large (i.e., “block”) buy or sell order in the market: traders in the market know that a block buy order is likely to push the price up, and that a block sell order is likely to push the price down, and so they immediately adjust their quote-prices accordingly. In most major financial markets nowadays very many of the participants are “robot traders”, autonomous adaptive software agents, rather than humans. This paper addresses the question of how to give such trader-agents a reliable anticipatory sensitivity to block orders, such that markets populated entirely by robot traders also show market-impact effects. This is desirable because impact-sensitive trader-agents will get a better price for their transactions when block orders arrive, and because such traders can also be used for more accurate simulation models of real-world financial markets. In a 2019 publication Church & Cliff presented initial results from a simple deterministic robot trader, called ISHV, which was the first such trader-agent to exhibit this market impact effect. ISHV does this via monitoring a metric of imbalance between supply and demand in the market. The novel contributions of our paper are: (a) we critique the methods used by Church & Cliff, revealing them to be weak, and argue that a more robust measure of imbalance is required; (b) we argue for the use of multi-level order-flow imbalance (MLOFI: Xu et al., 2019) as a better basis for imbalance-sensitive robot trader-agents; and (c) we demonstrate the use of the more robust MLOFI measure in extending ISHV, and also the well-known AA and ZIP trading-agent algorithms (which have both been previously shown to consistently outperform human traders). Our results demonstrate that the new imbalance-sensitive trader-agents introduced in this paper do exhibit market impact effects, and hence are better-suited to operating in markets where impact is a factor of concern or interest, but do not suffer the weaknesses of the methods used by Church & Cliff. We have made the source-code for our work reported here freely available on GitHub.

To be presented at the 13th International Conference on Agents and Artificial Intelligence (ICAART2021), Vienna, 4th–6th February 2021.

(a) ORCID: 0000-0002-4618-6934
(b) ORCID: 0000-0003-3822-9364

INTRODUCTION

Financial markets populated by human traders often exhibit so-called market impact, where the prices quoted by traders shift in the direction of anticipated change, as a reaction to the arrival of a large (i.e., “block”) buy or sell order for a particular asset. That is, mere knowledge of the presence of the block order is enough to trigger a change in the traders’ quote-prices, before any transaction has actually taken place, because the traders know that a block buy order is likely to push the price of the asset up, and a block sell order is likely to push the price down, and so they adjust their quote-prices accordingly, in anticipation of the shift in price that they foresee coming as a consequence of the block-trade completing. This is bad news for the trader trying to buy or sell a block order: the moment she reveals her intention to buy a block, the market-price goes up; the moment she reveals her intention to sell, the price goes down. From the perspective of a block-trader, the market price moves against her, whether she is buying or selling, and this happens not because of the price she is quoting, but because of the quantity that she is attempting to transact.

Block-traders’ collective desire to avoid market impact has long driven the introduction of automated trading techniques such as “VWAP engines” (which break block orders into a sequence of smaller sub-orders that are released into the market over a set period of time, with the intention of achieving a specific volume-weighted average price, hence VWAP), and has also driven the design of major new electronic exchanges such as London Stock Exchange’s (LSE’s) Turquoise Plato trading venue (London Stock Exchange Group, 2019), in which block-traders are allowed to obscure the size of their blocks in a so-called dark pool market, with LSE’s automated matching engine identifying one or more willing counterparties and only making full details of the block-trade known to all market participants after it has completed: see (Petrescu and Wedow, 2017) for further discussion.

Many of the world’s major financial markets now have very high levels of automated trading: in such markets most of the participants, the traders, are “robots” rather than humans: i.e., software systems for automated trading, empowered with the same legal sense of agency as a human trader, and hence “software agents” in the most literal sense of that phrase. Given that these software agents typically replace more than one human trader, and given that those human traders were widely regarded to have required a high degree of intelligence (and remuneration) to work well in a financial market, it is clear that the issue of designing well-performing robot traders presents challenges for research in agents and artificial intelligence, and hence is a research topic that is central to the themes of the ICAART conference.

This paper addresses the question of how best to give robot traders an appropriate anticipatory sensitivity to large orders, such that markets populated entirely by robot traders also show market-impact effects. This is desirable because the impact-sensitive robot traders will get a better price for their transactions when block orders do arrive, and also because simulated markets populated by impact-sensitive automated traders can be studied to explore the pros and cons of various impact-mitigation or avoidance techniques. We show here that well-known and long-established trader-agent strategies can be extended by giving them appropriately robust sensitivity to the imbalance between buy and sell orders issued by traders on the exchange, orders that are aggregated on the market’s limit order-book (LOB), the data-structure at the heart of most electronic exchanges.

To the best of our knowledge, the first paper to report on the use of an imbalance metric to give automated traders impact-sensitivity was the recent 2019 publication by Church & Cliff, in which they demonstrated how a minimal nonadaptive trader-agent called Shaver (which, following the convention of practice in this field, is routinely referred to in abbreviated form via a pseudo-ticker-symbol: “SHVR”) could be extended to show impact effects by addition of an imbalance metric, and Church & Cliff gave the name ISHV to their Imbalance-SHVR trader-agent (Church and Cliff, 2019). SHVR is a trader-agent strategy built-in to the popular open-source financial exchange simulator called BSE (BSE, 2012), which Church & Cliff used as the platform for their research. Although Church & Cliff deserve some credit for the proof-of-concept that ISHV provides, we argue here that the imbalance-metric they employed is too fragile for practical purposes because very minor changes in the supply and demand can cause their metric to swing wildly between the extremes of its range. One of the major contributions of our paper here is the demonstration that a much better, more robust, metric known as multi-level order-flow imbalance (MLOFI) can be used instead of the comparatively very fragile metric proposed by Church & Cliff. Another major contribution of this paper is our demonstration of the addition of MLOFI-based impact-sensitivity to the very well-known and widely cited public-domain adaptive trader-agent strategies ZIP (Cliff, 1997) and AA (Vytelingum, 2006; Vytelingum et al., 2008). Although our primary aim was to add impact sensitivity to these two machine-learning-based trader-agent strategies, we also demonstrate in this paper that ISHV can be altered/extended to use MLOFI, and our improvement of Church & Cliff’s work in that regard is an additional contribution of this paper.

The extended versions of the AA, ZIP, and ISHV trader-agent strategies that we introduce here are named ZZIAA, ZZIZIP, and ZZISHV respectively. In this paper, after our criticism of Church & Cliff’s methods, we describe more mathematically sophisticated approaches to measuring imbalance, which are more robust, and which we incorporate into our agent extensions. We then present results from testing our extended trader-agents on BSE, the same platform that was used in Church & Cliff’s work. Full details of the work reported here are available in (Zhang, 2020b), and all the relevant source-code has been made freely available on GitHub (Zhang, 2020a).

The structure of the rest of this paper is as follows. Section 2 presents a broad overview of relevant background material: readers already familiar with automated trading systems and contemporary electronic financial exchanges can safely skip ahead straight to Section 3. In Section 3 we give a brief summary of Church & Cliff’s 2019 work and then provide our detailed critique of their core method, which we demonstrate to be significantly lacking in robustness, and we then describe our MLOFI approach in detail. Section 4 describes the steps taken to add MLOFI-based impact-sensitivity to ZIP, AA, and ISHV, and the results from those extended trading algorithms are presented in Section 5. We discuss further work and draw our conclusions in Section 6.

Since the mid-1990s researchers in universities and in the research labs of major corporations such as IBM and Hewlett-Packard have published details of various strategies for autonomous trader-agents, often incorporating AI and/or machine learning (ML) methods so that the automated trader can adapt its behaviors to prevailing market conditions. Notable trading strategies in this body of literature include: Kaplan’s “Sniper” strategy (Rust et al., 1992); Gode & Sunder’s ZIC (Gode and Sunder, 1993); the ZIP strategy developed at Hewlett-Packard (Cliff, 1997); the GD strategy reported by Gjerstad & Dickhaut (Gjerstad and Dickhaut, 1997); the MGD and GDX automated traders developed by IBM researchers (Tesauro and Das, 2001; Tesauro and Bredin, 2002); Gjerstad’s HBL (Gjerstad, 2003); Vytelingum’s AA (Vytelingum, 2006; Vytelingum et al., 2008); and the Roth-Erev approach (see e.g. (Pentapalli, 2008)). However, for reasons discussed at length in a recent review of key papers in the field (Snashall and Cliff, 2019), this sequence of publications concentrated on the issue of developing trading strategies for orders that all had the same fixed-size quantity, and that quantity was always one. That is, none of the key papers listed here deal with trading strategies for outsize block orders, and none of them directly explore the issue of how an automated trader can best deal with, or avoid, market impact.

Trader-agent strategies such as Sniper, ZIC, ZIP, GD and MGD were all developed to operate in electronic markets that were based on old-school open-outcry trading pits, as were common on major financial exchanges until face-to-face human-to-human bargaining was replaced by negotiation of trades via electronic communication media. More recent work has concentrated on developing trading agents that issue bids and asks (i.e. quotations for orders to buy or to sell) to a centralised electronic exchange (such as a major stock-market like NYSE or NASDAQ or LSE) where the exchange’s matching engine then either matches the trader’s quote with a willing counterparty (in which case a transaction is recorded between the two counterparties, the buyer and the seller) or the quote is added to a data-structure called the Limit Order Book (LOB) that is maintained by the exchange and published to all traders whenever it changes. The LOB aggregates and anonymises all outstanding orders. It has two sides or halves: the bid-side and the ask-side. Each side of the book shows a summary of all outstanding orders, arranged from best to worst: this means that the bid-side is arranged in descending price-order, and the ask side is arranged in ascending price-order, such that at the “top of the book” on the two sides the best bid and ask are visible. For all orders currently sat on the LOB, if there are multiple orders at the same price then the quantities of those orders are aggregated together, and often multiple orders at the same price will be later matched with a counterparty in a sequence given by the orders’ arrival times, in a first-in-first-out fashion. The public LOB shows only, for each side of the book, the prices at which orders have been lodged with the exchange, and the total quantity available at each of those prices: if no orders are resting at the exchange for a particular price, then that price is usually omitted from the LOB rather than being shown with a corresponding quantity of zero. Illustrations of LOBs appear later in this paper, commencing with Figure 2.

The difference between the price of the best bid on the LOB at time $t$ and the price of the best ask at $t$ is known as the spread. The mid-point of the spread (i.e. the arithmetic mean of the best bid and the best ask) is known as the mid-price, which is denoted here by $P_{mid}$. The mid-price is very commonly used as a single-valued statistic to summarise the current state of the market, and as an estimate of what the next transaction price would be. However, the mid-price pays no attention to the quantities that are bid and offered. If the current best bid is for a quantity of one at a price of $10 and the current best ask is for a quantity of 200 at a price of $20 then the mid-price is $15, but that fails to capture that there is a much larger quantity being offered than being bid: basic microeconomics, the theory of supply and demand, would tell even the most casual observer that with such heavy selling pressure actual transaction prices are likely to trend down, in which case the mid-price of $15 is likely to be an overestimate of the next transaction price. Similarly, if the bid and ask prices remain the same but the imbalance between supply and demand is instead reversed, then the fact that there is a revealed desire for 200 units to be purchased but only one unit on sale at the current best ask would surely be a reasonable indication that transaction prices are likely to be pushed up by buying pressure, in which case the mid-price of $15 will turn out to be an underestimate. This lack of quantity-sensitivity in the mid-price calculation leads many market practitioners to instead monitor the micro-price, denoted here by $P_{micro}$, which is a quantity-weighted average of the best bid and best ask prices, and which does move in the direction indicated by imbalances between supply and demand at the top of the LOB: see, e.g., (Cartea et al., 2015).

To the best of our knowledge the first impact-sensitive trading algorithm was ISHV (Church and Cliff, 2019). ISHV is based on the SHVR trader built into the popular BSE public-domain financial-market simulator (BSE, 2012; Cliff, 2018). A SHVR trader simply posts a buy/sell order with its price set one penny higher/lower than the current best bid/ask. This single instruction gives it a parasitic nature, in the sense that it can mimic the price-convergence behaviour of other strategies being used by other traders in the market.

Instead of shaving the best bid or offer by one penny, Church & Cliff’s ISHV trader chooses to shave by an amount $\Delta_s$ which varies with $\Delta_m$, defined in Equation 1:

$$\Delta_m = P_{micro} - P_{mid} \tag{1}$$

The difference between the micro-price and the mid-price can identify the degree of supply/demand imbalance to a useful extent. If $\Delta_m \approx 0$, there is no obvious imbalance in the market. If $\Delta_m < 0$, then the quantities of the best bid and the best offer on the LOB indicate that supply exceeds demand and that subsequent transaction prices are likely to decrease; whereas $\Delta_m > 0$ indicates that demand exceeds supply and that subsequent transaction prices are likely to increase. ISHV maps $\Delta_m$ to $\Delta_s$ to determine how much it will shave off its price. For a buyer, if $\Delta_m < 0$, it knows the price will shift in its favour and shaves its price as little as possible (the exchange’s minimum tick-size $\Delta_p$, often one penny or one cent, is chosen). However, if $\Delta_m > 0$, ISHV “believes” the later prices will be worse and attempts to shave a large amount off ($C\Delta_p + M\Delta_m\Delta_p$). $C$ and $M$ are two constants that determine the SHVR’s response to the imbalance (they are the y-intercept and gradient of a linear response function; nonlinear response functions could be used instead). Church & Cliff showed that ISHV can identify and respond appropriately to the presence of a block order signal at the top of the LOB.

Figure 1: Pseudocode for the bidding behavior of ISHV, reproduced from (Church and Cliff, 2019).
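The buyer-side rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Church & Cliff's code or the BSE implementation: the names `TopOfBook`, `micro_price` and `ishv_buyer_shave` are hypothetical, the micro-price uses the common weighting in which each price is weighted by the quantity on the opposite side of the book, the values of `c` and `m` are arbitrary, and the treatment of $\Delta_m \le 0$ as the minimal one-tick shave is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TopOfBook:
    """Best bid and best ask (price and quantity) at the top of the LOB."""
    bid_price: float
    bid_qty: int
    ask_price: float
    ask_qty: int


def mid_price(tob: TopOfBook) -> float:
    """Arithmetic mean of the best bid and best ask prices."""
    return (tob.bid_price + tob.ask_price) / 2.0


def micro_price(tob: TopOfBook) -> float:
    """Quantity-weighted average of the best bid and best ask prices.

    Assumed weighting: each price is weighted by the quantity on the
    opposite side, so heavy selling pulls the value towards the bid.
    """
    total = tob.bid_qty + tob.ask_qty
    return (tob.ask_qty * tob.bid_price + tob.bid_qty * tob.ask_price) / total


def ishv_buyer_shave(tob: TopOfBook, tick: float = 0.01,
                     c: float = 2.0, m: float = 1.0) -> float:
    """Amount a buyer adds to the best bid; c and m are illustrative values."""
    delta_m = micro_price(tob) - mid_price(tob)  # Equation 1
    if delta_m > 0:
        # Demand exceeds supply: shave by the linear response C*dp + M*dm*dp.
        return c * tick + m * delta_m * tick
    # Supply exceeds demand (or no imbalance): shave by one tick only.
    return tick


if __name__ == "__main__":
    selling_pressure = TopOfBook(bid_price=10.0, bid_qty=1, ask_price=20.0, ask_qty=200)
    buying_pressure = TopOfBook(bid_price=10.0, bid_qty=200, ask_price=20.0, ask_qty=1)
    for book in (selling_pressure, buying_pressure):
        print(mid_price(book), micro_price(book), ishv_buyer_shave(book))
```

With the $10/$20 example from the text, the micro-price falls below the $15 mid-price when 200 units are offered against one bid, and rises above it when the quantities are reversed, so only the second book triggers the larger shave.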

Church & Cliff were careful to flag their ISHV trader as only a proof-of-concept (PoC): ISHV was developed to enable the study of coupled lit/dark trading pools such as LSE’s Turquoise Plato system in commercial operation in London, as mentioned in the Introduction to this paper. Without impact-sensitive trader-agents, it is not possible to build agent-based models of contemporary real-world trading venues such as LSE Turquoise Plato. Having experimented further with Church & Cliff’s PoC system, we came to realise that there are severe limitations in ISHV as described in Figure 1: these limitations stem from the fact that Equation 1, which is at the heart of ISHV, uses values only found at the top of the LOB. That is, Equation 1 involves only the price and quantity of the best bid and the best ask. As we will demonstrate in the next section, this makes the method introduced by Church & Cliff so fragile that it is unlikely to be usable in anything but the simplest of simulation studies: for real-world markets it is necessary to look deeper into the LOB, beyond the top of the book.

For brevity, we will limit ourselves here to presenting a qualitative illustrative example which demonstrates how fragile the Church & Cliff method is. For a longer and more detailed discussion, see Chapter 3 of (Zhang, 2020b).

Consider a situation in which the top of the LOB has a best bid price of $10 and a best ask price of $20, as before, and where the quantity at the best bid is 200 and at the best ask is 1. As we explained in the previous section, this huge imbalance between supply and demand at the top of the book indicates that the excess demand is likely to push transaction prices up in the immediate future. Church & Cliff’s ISHV does the right thing in this situation.

Now consider what happens if the next order to arrive at the exchange is a bid for $11 at a quantity of 1. Because this fresh bid is at a higher price than the current best bid, it is inserted at the top of the bid-side of the LOB. The previous best bid, for 200 at $10, gets shuffled down to the second layer of the LOB. At that point, the best bid and the best ask each show a quantity of one, and so ISHV acts as if there is no imbalance in the market, despite the fact that when viewing the whole LOB it is clear that the quantity bid is now 201 (i.e. 1 at $11 and 200 at $10) while the ask quantity is still only 1: if anything, the imbalance has increased, but ISHV reacts as if it had disappeared because ISHV looks only at the top of the LOB.

There is more that could be said, but this should be enough to convince the reader that any impact-sensitive trader-agent algorithm that looks only at the data at the top of the LOB is surely going to get it wrong very often, because it is ignoring the supply and demand information, the quantities and the prices, which lie deeper in the LOB. What we introduce in the rest of this paper addresses this problem.
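The collapse of the top-of-book signal in this example can be reproduced numerically. The sketch below is illustrative only: `delta_m` and `total_imbalance` are hypothetical helper names, the LOB is represented as plain lists of (price, quantity) pairs with the best price first, and the micro-price again assumes the opposite-side quantity weighting.

```python
def delta_m(bids, asks):
    """Micro-price minus mid-price (Equation 1), using only the top of the book.

    bids and asks are lists of (price, quantity) pairs, best price first.
    """
    (bid_price, bid_qty), (ask_price, ask_qty) = bids[0], asks[0]
    mid = (bid_price + ask_price) / 2.0
    micro = (ask_qty * bid_price + bid_qty * ask_price) / (ask_qty + bid_qty)
    return micro - mid


def total_imbalance(bids, asks):
    """Total quantity bid minus total quantity offered, over the whole book."""
    return sum(q for _, q in bids) - sum(q for _, q in asks)


asks = [(20.0, 1)]
bids_before = [(10.0, 200)]             # 200 bid at $10
bids_after = [(11.0, 1), (10.0, 200)]   # a bid for 1 at $11 arrives

if __name__ == "__main__":
    print(delta_m(bids_before, asks), total_imbalance(bids_before, asks))
    print(delta_m(bids_after, asks), total_imbalance(bids_after, asks))
```

Before the new bid arrives the top-of-book metric is strongly positive; afterwards it is exactly zero, even though the whole-book surplus of bids over asks has grown from 199 to 200.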

A reliable metric is needed to capture the quantity imbalance between the supply side and the demand side, at multiple levels in the LOB (i.e., not just the top), and which can quantitatively indicate how much the imbalance will affect the market. We first discuss the Order-Flow Imbalance (OFI) metric introduced by (Cont et al., 2014) and then describe the extension of this into a reliable Multi-Level OFI (MLOFI) metric very recently reported by (Xu et al., 2019). After that, we show how MLOFI can be used to give robust impact-sensitivity to ISHV (Church and Cliff, 2019), AA (Vytelingum, 2006; Vytelingum et al., 2008), and ZIP (Cliff, 1997). AA and ZIP are of particular interest because in previous papers published at IJCAI and at ICAART it was demonstrated that these two trader-agent strategies can each reliably outperform human traders (Das et al., 2001; De Luca and Cliff, 2011b; De Luca and Cliff, 2011a; De Luca et al., 2011).

Cont et al. argued that previous studies modelling impact are extremely complex, and that instead a single factor, the order flow imbalance (OFI), can adequately explain the impact ($R^2 = 67\%$ in their research) (Cont et al., 2014). They indicated that OFI has a positive linear relation with mid-price changes, and that the market depth $D$ is inversely proportional to the slope of the relationship. OFI means the net order flow at the bid-side and the ask-side, and the market depth, $D$, represents the size at each bid/ask quote price.

To calculate the OFI they focused on the “Level 1 order book”, i.e. the best bid and ask at the top of the LOB. Between any two events ($event_n$ and $event_{n-1}$), only one change happens in the LOB (check the conditions from top to bottom, and from left to right; in other words, we should compare the change of price first and, if the price does not change, then compare the change of quantity). Using $D\uparrow$ and $D\downarrow$ to respectively denote an increase and a reduction in demand, and $S\uparrow$ and $S\downarrow$ to denote an increase/decrease in supply, Cont et al. had:

$$\begin{aligned}
p^b_n > p^b_{n-1} \;\lor\; q^b_n > q^b_{n-1} &\implies D\uparrow \\
p^b_n < p^b_{n-1} \;\lor\; q^b_n < q^b_{n-1} &\implies D\downarrow \\
p^a_n < p^a_{n-1} \;\lor\; q^a_n > q^a_{n-1} &\implies S\uparrow \\
p^a_n > p^a_{n-1} \;\lor\; q^a_n < q^a_{n-1} &\implies S\downarrow
\end{aligned}$$

where $p^b$ is the best bid price; $q^b$ the size at the best bid price; $p^a$ the best ask price; and $q^a$ the size at the best ask price. The variable $e_n$ is defined to measure this tick change between two events ($event_n$ and $event_{n-1}$), as shown in Equation 2, where $I$ is an indicator function that can be regarded as a Boolean variable.

$$e_n = I_{\{p^b_n \ge p^b_{n-1}\}}\, q^b_n - I_{\{p^b_n \le p^b_{n-1}\}}\, q^b_{n-1} - I_{\{p^a_n \le p^a_{n-1}\}}\, q^a_n + I_{\{p^a_n \ge p^a_{n-1}\}}\, q^a_{n-1} \tag{2}$$

The rules for $I$ are as follows, and only one of them will happen between any two consecutive events:

1. if $p^b$ increases, $e_n = q^b_n$
2. if $p^b$ decreases, $e_n = -q^b_{n-1}$
3. if $p^a$ increases, $e_n = q^a_{n-1}$
4. if $p^a$ decreases, $e_n = -q^a_n$
5. if $p^b$ remains the same and $q^b_n \neq q^b_{n-1}$, $e_n = q^b_n - q^b_{n-1}$
6. if $p^a$ remains the same and $q^a_n \neq q^a_{n-1}$, $e_n = q^a_{n-1} - q^a_n$

If $N(t_k)$ is the number of events during $[0, t_k]$, then $OFI_k$ refers to the cumulative effect of the $e_n$ that have occurred over the time interval $[t_{k-1}, t_k]$, as shown in Equation 3.

$$OFI_k = \sum_{n = N(t_{k-1}) + 1}^{N(t_k)} e_n \tag{3}$$

After this, a linear regression equation can be built, per Equation 4, where $\Delta P_k = (P_k - P_{k-1}) / \delta$ and $\delta$ is the tick size (1 cent in Cont et al.’s experiments), $\beta$ is the price impact coefficient, and $\varepsilon_k$ is the noise term mainly caused by contributions from lower levels of the LOB:

$$\Delta P_k = \beta\, OFI_k + \varepsilon_k \tag{4}$$

Moreover, Cont et al. stated that the market depth, $D$, is an important contributing factor to the fluctuations, and is inversely proportional to mid-price changes. They defined the average market depth, $AD_k$, in the “Level 1 order book” as shown in Equation 5; and $\beta$ can be measured by $AD_k$, as shown in Equation 6, where $\lambda$ and $c$ are constants and $\nu_k$ is a noise term.

$$AD_k = \frac{1}{2\,\bigl(N(t_k) - N(t_{k-1})\bigr)} \sum_{n = N(t_{k-1}) + 1}^{N(t_k)} \bigl(q^b_n + q^a_n\bigr) \tag{5}$$

$$\beta_k = \frac{c}{AD_k^{\lambda}} + \nu_k \tag{6}$$

Given Equations 4 and 6, the relationship between $\Delta P$, $OFI$ and $AD$ is constructed as seen in Equation 7. Cont et al. ran this linear regression using 21 trading days of data from 50 randomly chosen US stocks, which gave the average $R^2$ reported above, and found that OFI is positive in relation to the change of mid-price. If $OFI > 0$, meaning a net inflow on the bid side or a net outflow on the ask side, the mid-price has a significantly increasing momentum, and the higher OFI is, the more the mid-price will increase. Conversely, if $OFI < 0$, meaning a net outflow on the bid side or a net inflow on the ask side, the mid-price has a significantly decreasing momentum, and the lower OFI is, the more the mid-price will decrease.

$$\Delta P_k = \frac{c}{AD_k^{\lambda}}\, OFI_k + \varepsilon_k \tag{7}$$

OFI is clearly a useful metric, but it operates only on values found at the top of the LOB, i.e. the best bid and ask. In that sense, it is as sensitive to changes at the top of the book as is the Church & Cliff $\Delta_m$ metric. Next we describe how OFI can be extended to be sensitive to values at multiple levels in the LOB, which gives us Multi-Level OFI, or MLOFI.

Fortunately, (Xu et al., 2019) demonstrated how to measure multi-level order flow imbalance (MLOFI). A quantity vector, $v$, is used to record the OFI at each discrete level in the LOB: see Equation 8, where $m$ denotes the depth of price level in the LOB. The level-$m$ bid-price refers to the $m$-th highest price among bids in the LOB, and the level-$m$ ask-price refers to the $m$-th lowest price among asks in the LOB.

$$v = \begin{pmatrix} MLOFI^1 \\ MLOFI^2 \\ \vdots \\ MLOFI^m \end{pmatrix} \tag{8}$$

The time when the $n$-th event occurs is denoted by $\tau_n$; $p^m_b(\tau_n)$ signifies the level-$m$ bid-price; $p^m_a(\tau_n)$ denotes the level-$m$ ask-price; $q^m_b(\tau_n)$ refers to the total quantity at the level-$m$ bid-price, and $q^m_a(\tau_n)$ refers to the total quantity at the level-$m$ ask-price.

Similar to the OFI defined in Section 4.1, the level-$m$ OFI between two consecutive events occurring at times $\tau_s$ and $\tau_n$ ($s = n - 1$) can be calculated as follows:

$$\Delta W^m(\tau_n) = \begin{cases}
q^m_b(\tau_n), & \text{if } p^m_b(\tau_n) > p^m_b(\tau_s) \\
q^m_b(\tau_n) - q^m_b(\tau_s), & \text{if } p^m_b(\tau_n) = p^m_b(\tau_s) \\
-q^m_b(\tau_s), & \text{if } p^m_b(\tau_n) < p^m_b(\tau_s)
\end{cases} \tag{9}$$

and

$$\Delta V^m(\tau_n) = \begin{cases}
-q^m_a(\tau_s), & \text{if } p^m_a(\tau_n) > p^m_a(\tau_s) \\
q^m_a(\tau_n) - q^m_a(\tau_s), & \text{if } p^m_a(\tau_n) = p^m_a(\tau_s) \\
q^m_a(\tau_n), & \text{if } p^m_a(\tau_n) < p^m_a(\tau_s)
\end{cases} \tag{10}$$

where $\Delta W^m(\tau_n)$ measures the order flow imbalance of the bid side at level $m$ and $\Delta V^m(\tau_n)$ measures the order flow imbalance of the ask side at level $m$.

From Equations 9 and 10, we can get the MLOFI at level $m$ over the time interval $[t_{k-1}, t_k]$:

$$MLOFI^m_k = \sum_{\{n \,\mid\, t_{k-1} < \tau_n < t_k\}} e^m(\tau_n) \tag{11}$$

where

$$e^m(\tau_n) = \Delta W^m(\tau_n) - \Delta V^m(\tau_n) \tag{12}$$

We now give four illustrative examples of the MLOFI mechanism in action. Figure 2 shows the situation of the LOB at time $t_{k-1}$; only one event occurs during the time interval $[t_{k-1}, t_k]$, and here we consider only the 3-level OFI.

Figure 2: The LOB at time $t_{k-1}$.

Case 1. A new buy order comes into the LOB and occupies the best-bid position, as shown in Figure 3.

Figure 3: The LOB at time $t_k$: a new buy order comes.

• Level-1: since $p^1_b(t_k) > p^1_b(t_{k-1})$ (the level-1 bid price is now 93), $MLOFI^1_k = q^1_b(t_k)$.
• Level-2: since $p^2_b(t_k) > p^2_b(t_{k-1})$ (the level-2 bid price is now 90), $MLOFI^2_k = q^2_b(t_k)$.
• Level-3: since $p^3_b(t_k) > p^3_b(t_{k-1})$ (the level-3 bid price is now 87), $MLOFI^3_k = q^3_b(t_k)$.

The $v_k$ is:

$$v_k = \begin{pmatrix} q^1_b(t_k) \\ q^2_b(t_k) \\ q^3_b(t_k) \end{pmatrix} \tag{13}$$

All three numbers in $v_k$ are positive, which indicates the upward trend of the price.

Case 2. A new sell limit order crosses the spread, or a buy limit order at the best-bid position cancels. Figure 4 shows the resultant LOB.

Figure 4: The LOB at time $t_k$: crossing the spread or a buy order cancellation.

For level 1, as $p^1_b(t_k) = p^1_b(t_{k-1})$ (the price is unchanged at 90), $MLOFI^1_k = q^1_b(t_k) - q^1_b(t_{k-1})$, which is negative because the quantity at the best bid has fallen. For level 2, as $p^2_b(t_k) = p^2_b(t_{k-1})$ (unchanged at 87), $MLOFI^2_k = q^2_b(t_k) - q^2_b(t_{k-1})$; and for level 3, as $p^3_b(t_k) = p^3_b(t_{k-1})$ (unchanged at 82), $MLOFI^3_k = q^3_b(t_k) - q^3_b(t_{k-1})$. Because the single event affects only the best bid, the quantities at levels 2 and 3 are unchanged and both of these differences are zero. The $v_k$ is:

$$v_k = \begin{pmatrix} q^1_b(t_k) - q^1_b(t_{k-1}) \\ 0 \\ 0 \end{pmatrix} \tag{14}$$

where the negative first entry indicates downward pressure on the price.

Case 3. This is similar to Case 2, but (as illustrated in Figure 5) assumes that all orders at Level 1 in the ask book ($A^1$) are transacted by an incoming buy order, or that the order in $A^1$ is cancelled. In this case, we need to consider the change on the ask side:

• $A^1$: $p^1_a(t_k) > p^1_a(t_{k-1})$ (the level-1 ask price is now 98) $\implies \Delta V^1(t_k) = -q^1_a(t_{k-1})$, so $MLOFI^1_k = -\Delta V^1(t_k) = q^1_a(t_{k-1})$.
• $A^2$: $p^2_a(t_k) > p^2_a(t_{k-1})$ (the level-2 ask price is now 100) $\implies \Delta V^2(t_k) = -q^2_a(t_{k-1})$, so $MLOFI^2_k = -\Delta V^2(t_k) = q^2_a(t_{k-1})$.
• $A^3$: $p^3_a(t_k) > p^3_a(t_{k-1})$ (the level-3 ask price is now 105) $\implies \Delta V^3(t_k) = -q^3_a(t_{k-1})$, so $MLOFI^3_k = -\Delta V^3(t_k) = q^3_a(t_{k-1})$.

Figure 5: The LOB at time $t_k$: crossing the spread or a sell order cancellation.

The $v_k$ shown in Equation 15 demonstrates that if the supply reduces or a buy has sufficient interest to transact, the price tends to go up.

$$v_k = \begin{pmatrix} q^1_a(t_{k-1}) \\ q^2_a(t_{k-1}) \\ q^3_a(t_{k-1}) \end{pmatrix} \tag{15}$$

Case 4. A new order arrives at level $m$ of the LOB.

Figure 6: The LOB at time $t_k$: crossing the spread or a sell order cancellation.

Assuming now that a new large-sized order comes to the level-2 bid, if we only consider order flow imbalance in the top level of the LOB, we cannot detect this new block order. This is the reason why we choose to use MLOFI.

As there is no change in the level-1 bid, $MLOFI^1_k = 0$. Because a new order comes to the second-level bid, $p^2_b(t_k) > p^2_b(t_{k-1})$ (i.e. 89 > 87) and $MLOFI^2_k = q^2_b(t_k) = 100$. Similarly, the previous level-2 bid moves down to level 3, so $MLOFI^3_k = q^3_b(t_k) = 2$. The $v_k$ is:

$$v_k = \begin{pmatrix} 0 \\ 100 \\ 2 \end{pmatrix} \tag{16}$$

If we only care about first-level order flow imbalance, we get $OFI = 0$. However, if we consider the second and third levels, we get $MLOFI^2_k = 100$ and $MLOFI^3_k = 2$, which indicate a huge surplus on the demand side. If a trader can obtain this information and take action accordingly, it may result in larger profits or smaller losses.
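The per-event calculation in Equations 9, 10 and 12 is compact enough to sketch directly. The Python below is a minimal illustration, not the authors' released code: the function names and the `Level` type alias are hypothetical, each LOB side is assumed to be a best-first list of (price, quantity) pairs with at least as many levels as are requested, and every quantity in the example except the level-2 and level-3 bid quantities (100 and 2, taken from the Case 4 discussion) is an invented placeholder.

```python
from typing import List, Tuple

Level = Tuple[float, int]  # (price, aggregated quantity) at one LOB level


def delta_w(prev: Level, curr: Level) -> int:
    """Bid-side change at one level between consecutive events (Equation 9)."""
    (p_prev, q_prev), (p_curr, q_curr) = prev, curr
    if p_curr > p_prev:
        return q_curr
    if p_curr == p_prev:
        return q_curr - q_prev
    return -q_prev


def delta_v(prev: Level, curr: Level) -> int:
    """Ask-side change at one level between consecutive events (Equation 10)."""
    (p_prev, q_prev), (p_curr, q_curr) = prev, curr
    if p_curr > p_prev:
        return -q_prev
    if p_curr == p_prev:
        return q_curr - q_prev
    return q_curr


def mlofi_event(prev_bids: List[Level], prev_asks: List[Level],
                curr_bids: List[Level], curr_asks: List[Level],
                levels: int) -> List[int]:
    """Per-level e^m = delta_w - delta_v for a single event (Equation 12)."""
    return [
        delta_w(prev_bids[m], curr_bids[m]) - delta_v(prev_asks[m], curr_asks[m])
        for m in range(levels)
    ]


if __name__ == "__main__":
    # Case 4: a bid for 100 at 89 arrives between the 90 and 87 levels.
    asks = [(95.0, 3), (98.0, 6), (100.0, 7)]          # placeholder ask side
    prev_bids = [(90.0, 5), (87.0, 2), (82.0, 4)]
    curr_bids = [(90.0, 5), (89.0, 100), (87.0, 2)]
    print(mlofi_event(prev_bids, asks, curr_bids, asks, levels=3))
```

Summing these per-event vectors over all events in an interval gives the $MLOFI^m_k$ of Equation 11, and restricting `levels` to 1 recovers the top-of-book OFI, which is zero for this event.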

In this section we describe how ZZIAA is created, by the addition of MLOFI-style imbalance-sensitivity to the original AA trader strategy. Our intention for ZZIAA was to develop an “impact-sensitive” module that is not deeply embedded into the original AA so that, if successful, this relatively independent module could also easily be applied to other trading algorithms. For this reason we chose the Widrow-Hoff delta rule to update the quote of ZZIAA towards an impact-sensitive quote, as shown in Equation 17. The price $p_{AA}(t+1)$ is derived from the long-term and short-term factors using the information at time $t$ (see Vytelingum et al., 2008), and $\tau(t)$ is the target price computed with consideration of MLOFI:

$$p_{IAA}(t+1) = p_{AA}(t+1) + \Delta(t) \tag{17}$$

where

$$\Delta(t) = \beta\,\bigl(\tau(t) - p_{AA}(t+1)\bigr) \tag{18}$$

and

$$\tau(t) = p_{\mathrm{benchmark}}(t) + o_{\mathrm{offset}}(t) \tag{19}$$

The core of the ZZIAA derivation is how to find $\tau(t)$, which consists of two parts: the benchmark price $p_{\mathrm{benchmark}}(t)$ and the offset $o_{\mathrm{offset}}(t)$. The benchmark depends on whether the mid-price exists. As Equation 20 shows, if the mid-price is available we set $p_{\mathrm{benchmark}}(t)$ to the mid-price; if it is not, we set $p_{\mathrm{benchmark}}(t)$ to $p_{AA}(t+1)$, which can be obtained at time $t$.

$$p_{\mathrm{benchmark}}(t) = \begin{cases} p_{\mathrm{mid}}(t), & \text{if } \exists\, p_{\mathrm{mid}} \\ p_{AA}(t+1), & \text{if } \nexists\, p_{\mathrm{mid}} \end{cases} \tag{20}$$

The offset $o_{\mathrm{offset}}(t)$ is derived from the MLOFI and the average depth. Assume that we consider MLOFI at $M$ levels of the LOB, as shown in Equation 21, and that the MLOFI at each level $m$ is accumulated over the last $N$ events, as shown in Equation 22.

$$\mathbf{a}(t) = \begin{pmatrix} \mathrm{MLOFI}_1(t) \\ \mathrm{MLOFI}_2(t) \\ \vdots \\ \mathrm{MLOFI}_M(t) \end{pmatrix} \tag{21}$$

where

$$\mathrm{MLOFI}_m(t) = \sum_{n=1}^{N} e_n^m \tag{22}$$

We can define the average market depth for the $M$ levels in a similar way:

$$\mathbf{d}(t) = \begin{pmatrix} \mathrm{AD}_1(t) \\ \mathrm{AD}_2(t) \\ \vdots \\ \mathrm{AD}_M(t) \end{pmatrix} \tag{23}$$

where:

$$\mathrm{AD}_m(t) = \frac{1}{N} \sum_{n=1}^{N} \frac{q_n^{a_m} + q_n^{b_m}}{2} \tag{24}$$

Because $\mathbf{a}(t)$ is a vector, we need a mechanism to convert it to a scalar. Similar to Equation 7, we define the offset as Equation 25.

$$o_{\mathrm{offset}}(t) = \sum_{i=0}^{M-1} \alpha^{i}\, \frac{c \cdot \mathrm{MLOFI}_{i+1}(t)}{\mathrm{AD}_{i+1}(t)} \tag{25}$$

where $\alpha$ is the decay factor (initialized as 0.8) and $c$ is a constant. If $\mathrm{AD}_m(t) = 0$ for some level $m$, the term $\alpha^{m-1}\, c \cdot \mathrm{MLOFI}_m(t) / \mathrm{AD}_m(t)$ is not counted.

To summarise, our work extends AA by introducing prior contributions to the econometrics of LOB imbalance from Cont et al. and Xu et al., but it differs from that prior work in the following ways:

• Cont et al. and Xu et al. run linear regressions to build their models and use statistical methods to test the significance of factors; constants such as $c$ come from modelling real-world data. In contrast, our version does not run a linear regression, and constants such as $c$ and $\alpha$ are set on the basis of previous studies (Cont et al., 2014; Xu et al., 2019). We can check the model’s performance by exploring different values of these constants.

• In the prior work, $\mathrm{MLOFI}_m(t)$ and $\mathrm{AD}_m(t)$ are influenced by the events within a specified time interval. In contrast, in our work, $\mathrm{MLOFI}_m(t)$ and $\mathrm{AD}_m(t)$ are calculated from the last $N$ events that occurred in the LOB, regardless of the length of the time interval between successive events.

RESULTS
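Before turning to the results, the quote update of Equations 17 to 20 and the offset of Equation 25 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released implementation: the function names are hypothetical, `c = 1.0` is a placeholder rather than the value used in the experiments, and the learning rate `beta` and all numbers in the example are invented.

```python
from typing import Optional, Sequence


def mlofi_offset(mlofi: Sequence[float], avg_depth: Sequence[float],
                 alpha: float = 0.8, c: float = 1.0) -> float:
    """Scalar offset in the style of Equation 25.

    mlofi[i] and avg_depth[i] hold MLOFI_{i+1}(t) and AD_{i+1}(t).
    c = 1.0 is a placeholder (assumption), not the paper's value.
    """
    offset = 0.0
    for i, (imbalance, depth) in enumerate(zip(mlofi, avg_depth)):
        if depth == 0:
            continue  # a level with zero average depth is not counted
        offset += (alpha ** i) * c * imbalance / depth
    return offset


def impact_sensitive_quote(p_base: float, mid_price: Optional[float],
                           offset: float, beta: float) -> float:
    """Move the base quote toward the MLOFI-based target (Eqs. 17 to 20)."""
    # Eq. 20: use the mid-price if it exists, else fall back to the base quote.
    benchmark = mid_price if mid_price is not None else p_base
    target = benchmark + offset          # Eq. 19
    delta = beta * (target - p_base)     # Eq. 18 (Widrow-Hoff delta rule)
    return p_base + delta                # Eq. 17


if __name__ == "__main__":
    # Three LOB levels; the third has zero depth and is skipped.
    off = mlofi_offset([30.0, 10.0, 0.0], [10.0, 5.0, 0.0])
    print(off)
    print(impact_sensitive_quote(100.0, 102.0, off, beta=0.5))
```

Because the module only needs the base quote, the mid-price (if any), and the per-level MLOFI and depth values, the same two functions could in principle wrap the quote of any underlying strategy, which is the non-intrusive design described above.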

Because our MLOFI-based “impact sensitive” module added to AA was deliberately developed in a non-intrusive way, it can easily be replicated in other algorithms. In this section we first show results from ZZIAA and then we follow those with results from adding the MLOFI module to ISHV (giving ZZISHV), and to ZIP (giving ZZIZIP). Because of space limitations, the performance comparisons shown here focus on situations where the imbalance would cause a problem for the non-imbalance-sensitive versions of the trader agents, and we demonstrate that our extended trader agents are indeed superior. Extensive sets of further results are presented in (Zhang, 2020b), which demonstrate that the extended trader-agents perform the same as the unextended versions in situations where there is no imbalance to be concerned about in the LOB.

For each A:B comparison we ran 100 trials in BSE (BSE, 2012), the same open-source simulator of a financial exchange that was used by Church & Cliff. Each trial involved creating a market where there were N traders of type A (e.g., ZIP) and N traders of type B (e.g., ZZIZIP) who were allocated the role of buyers, and similarly N of type A and N of type B who were allocated the role of sellers. Thus one market trial involved a total of 4N trader-agents: for the results presented here we used N = 10. As is commonplace in such experimental work, buyers were issued with assignments of cash, and sellers with assignments of items to sell, and each trader was given a private limit price: the price below which a seller could not sell and above which a buyer could not buy. The distribution of limit prices in the market determines that market’s supply and demand curves, and the intersection of those two curves indicates the competitive equilibrium price that microeconomic theory tells us to expect transaction prices to converge to.

Although very many of the previous trader-agent papers that we have cited here have monitored the efficiency of the traders’ activity in the market, we instead monitored profitability (which only differs from efficiency by some constant coefficient). Each individual market trial would allow the traders to interact via the LOB-based exchange in BSE for a fixed period of time, and at the end of the session the average profit of the Type A traders would be recorded, along with the average profit of the Type B traders. In the results presented here we conducted 100 independent and identically distributed market trials for each A:B comparison, giving us 100 pairs of profitability figures. To summarise those results we plot as box-and-whisker charts the distribution of profitability values for traders of Type A, the distribution of profitability values for traders of Type B, and the distribution of profitability-difference values (i.e., for each trial i of the 100, the difference between the profitability of Type A traders and the profitability of Type B traders in trial i). To determine whether the differences we observed were statistically significant, we used the Wilcoxon-Mann-Whitney U Test.

Figure 7 summarises the comparison data generated between AA and ZZIAA. The U test comparing ZZIAA with AA gave p = 0.007, which means that the profit difference between ZZIAA and AA was statistically significant.
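The significance test used for these A:B comparisons can be sketched without external libraries. The code below computes the Mann-Whitney U statistic with average ranks for ties and a two-sided p-value from the normal approximation, without tie or continuity correction. That choice of variant is an assumption of the sketch, since the text does not state which variant or software was used, and the profit figures in the example are invented purely for illustration.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for sample a, two-sided p-value).

    Ties receive average ranks. The p-value uses the normal
    approximation with no tie or continuity correction.
    """
    n_a, n_b = len(a), len(b)
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        count_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * count_a
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_sided


if __name__ == "__main__":
    # Invented per-trial average profits for two trader types (not real data).
    profit_a = [78.0, 80.5, 79.2, 81.0, 77.4, 80.1]
    profit_b = [84.3, 83.1, 85.0, 82.2, 86.4, 83.9]
    differences = [x - y for x, y in zip(profit_a, profit_b)]
    u, p = mann_whitney_u(profit_a, profit_b)
    print(differences)
    print(u, p)
```

With 100 trials per trader type, as in our experiments, the normal approximation is generally considered adequate; for very small samples an exact test would be the safer choice.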

We can see from Figure 8 that the profit generated by ZZISHV was much greater than that of ISHV. However, this only means that ZZISHV is better than ISHV under this particular market condition, and this might not be the case under other market conditions. In the test, the outperformance of ZZISHV can easily be explained: as a seller, when ISHV met favourable imbalances, it worked like SHVR and posted a price one penny lower than the current best ask; in contrast, under the same condition, ZZISHV chose to set its price $\Delta_p$ higher than the current best ask and seek transaction opportunities some time later. For example, assume that the current best ask is 70: ISHV will post an order priced at 69. If ZZISHV gets an offset value of 20 from the “impact-sensitive” module, its quoted price will be 90.

The aim of both ISHV and ZZISHV is the same: to be sensitive to imbalances in the market. The former uses a function that maps from $\Delta_m$ to $\Delta_s$ to achieve this objective, where $\Delta_m$ is generated from the mid- and micro-prices in the market. In contrast, the latter uses MLOFI to achieve the goal. The biggest difference between ISHV and ZZISHV is that ISHV can only be sensitive to imbalances at the top of the LOB, whereas the MLOFI mechanism helps ZZISHV to be sensitive to multi-level imbalances on the LOB and thus detect them earlier than ISHV in some cases. The drawback comes in the determination of appropriate parameter values for both ISHV and ZZISHV, where trial-and-error is the best current option. In the map function of ISHV ($\Delta_s = C\,\Delta_p \pm M \cdot \Delta_m \cdot \Delta_p$ if the imbalance is significant), the parameters C and M were set somewhat arbitrarily by (Church and Cliff, 2019). For ZZISHV, when quantifying MLOFI, we use Equation 25, and the key parameter c and decay factor α are set by hand; we use α = 0.8. The optimal values of these parameters are not known; poor choices of these constants may cause agents to perform badly.

Figure 7: Profit distributions from original AA tested against ZZIAA.

Figure 8: Performance of ZZISHV and ISHV when facing large-sized orders from the bid side.

Comparison of ZZIZIP and ZIP

ZZIZIP is ZIP with the addition of the MLOFI module. In the example we present here, sellers face an excess imbalance from the demand side. The box plots in Figure 9 illustrate the results: ZZIZIP has less variance than ZIP and its median profitability was slightly higher than that of ZIP; in the second plot, we can see that although there were some outliers at both the top and bottom, and the bottom whisker extended below zero, the whole box lay above zero. Employing the U Test, we got p = 0.002 and can therefore conclude that the profit generated by ZZIZIP was statistically significantly greater than that of ZIP. Despite this, it is worth noting that the average difference in profitability is less than half of the difference between AA and ZZIAA, given that other conditions remain unchanged. So, our next question is: what causes the smaller difference in profits between ZZIZIP and ZIP?

Figure 9: Performance of ZIP and ZZIZIP when facing large-sized orders from the bid side.

To answer this, we need to examine how ZIP works. ZIP uses the Widrow-Hoff delta rule to update its next quote-price towards its current target price. The current target price is based on the last quote price in the market. Because of this, the last quote price affects the bidding behaviour of ZIP considerably. In this test, on the ask side, the 10 ZIP sellers were not impact-sensitive and the 10 ZZIZIP sellers were. But, although the ZIP traders were not themselves impact-sensitive, they were affected by the quote prices coming from the ZZIZIP traders active in the same market, and so the ZIPs’ quote prices approached the ZZIZIPs’ to some extent. In other words, the adaptive mechanism within the non-impact-sensitive ZIP gave it a degree of impact-sensitivity, because it was influenced by the activities of the impact-sensitive traders in the market. In the test, if we treat ZZIZIP and ZIP as a group, the average profit generated is 84.82 (95% CI: [82.16, 87.48]). If we replace the 10 ZZIZIPs with 10 ZIPs (20 ZIP sellers in total), the average profit of ZIP is 79.21 (95% CI: [77.11, 81.31]). With ZZIZIP present, all sellers tend to make more profit.
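The way a ZIP quote drifts toward other traders’ quotes can be illustrated with the delta rule in isolation. The sketch below is a deliberate simplification and an assumption on our part: it applies only the Widrow-Hoff step with the last market quote used directly as the target, and omits the other components of the full ZIP algorithm. The function name, the learning rate of 0.3, and the prices are hypothetical.

```python
def delta_rule_step(quote: float, target: float, beta: float) -> float:
    """One Widrow-Hoff update: move quote a fraction beta toward target."""
    return quote + beta * (target - quote)


if __name__ == "__main__":
    # A seller quoting 70 repeatedly observes impact-sensitive quotes at 90
    # and is pulled upward even though it has no imbalance signal of its own.
    quote = 70.0
    for last_market_quote in [90.0, 90.0, 90.0]:
        quote = delta_rule_step(quote, last_market_quote, beta=0.3)
        print(round(quote, 2))
```

Each step closes a fixed fraction of the remaining gap, so a non-impact-sensitive trader inherits part of the price shift produced by the impact-sensitive traders around it, consistent with the group-level profit figures reported above.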

We know of no paper prior to (Church and Cliff, 2019) in which trader-agents are given a sensitivity to quantity imbalances between the bid and ask sides of the LOB. Such imbalances are often (but not always) caused by the arrival of one or more block orders on one side of the LOB. In this paper we have provided a constructive critique of Church & Cliff’s method, pointing out the extreme fragility of imbalance-sensitivity metrics like theirs that monitor only the top of the LOB. We then explained the OFI and MLOFI metrics of (Cont et al., 2014) and (Xu et al., 2019) respectively, and demonstrated how MLOFI could be integrated within Vytelingum’s AA trading-agent strategy to give ZZIAA. We demonstrated that ZZIAA performs extremely well: it performs the same as AA when there is no imbalance, and significantly outperforms AA in the presence of major LOB imbalance. We then showed how the imbalance-sensitivity mechanisms that we developed for ZZIAA can readily be incorporated into other trading-agent algorithms such as ZIP (Cliff, 1997) and SHVR (Cliff, 2018). Results from ZZIZIP and ZZISHV are similarly very good and further demonstrate that the mechanisms developed here have given robust imbalance-sensitivity to a range of trader-agent strategies. In future work we intend to explore the addition of MLOFI-based impact-sensitivity to contemporary adaptive trader-agents based on deep learning neural networks (le Calvez and Cliff, 2018; Wray et al., 2020). Complete details of the work described here are given in (Zhang, 2020b) and all of our relevant source-code for the system described here has been made freely available as open-source code on GitHub (Zhang, 2020a), enabling other researchers to examine, replicate, and extend our work.

REFERENCES

BSE (2012). Bristol Stock Exchange open-source financial exchange simulator. GitHub repository: https://github.com/davecliff/BristolStockExchange
Cartea, A., Jaimungal, S., and Penalva, J. (2015). Algorithmic and High-Frequency Trading. Cambridge University Press.
Church, G. and Cliff, D. (2019). A simulator for studying automated block trading on a coupled darks/lit financial exchange with reputation tracking. Proceedings of the European Modelling and Simulation Symposium.
Cliff, D. (1997). Minimal-intelligence agents for bargaining behaviors in market-based environments. Hewlett-Packard Labs Technical Report HPL-97-91.
Cliff, D. (2018). An open-source limit-order-book exchange for teaching and research. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI-2018), pages 1853–1860.
Cont, R., Kukanov, A., and Stoikov, S. (2014). The price impact of order book events. Journal of Financial Econometrics, 12(1):47–88.
Das, R., Hanson, J., Tesauro, G., and Kephart, J. (2001). Agent-human interactions in the continuous double auction. Proceedings IJCAI-2001, pages 1169–1176.
De Luca, M. and Cliff, D. (2011a). Agent-human interactions in the continuous double auction, redux. Proceedings ICAART-2011.
De Luca, M. and Cliff, D. (2011b). Human-agent auction interactions: Adaptive-aggressive agents dominate. Proceedings IJCAI-2011.
De Luca, M., Szostek, C., Cartlidge, J., and Cliff, D. (2011). Studies of interactions between human traders and algorithmic trading systems. Driver Review 13, UK Government Office for Science, Foresight Project on the Future of Computer Trading in Financial Markets. http://bit.ly/RoifIu
Gjerstad, S. (2003). The impact of pace in double auction bargaining. Working paper, Department of Economics, University of Arizona.
Gjerstad, S. and Dickhaut, J. (1997). Price formation in continuous double auctions. Games and Economic Behavior, 22(1):1–29.
Gode, D. K. and Sunder, S. (1993). Allocative efficiency of markets with zero-intelligence traders. Journal of Political Economy, 101(1):119–137.
le Calvez, A. and Cliff, D. (2018). Deep learning can replicate adaptive traders in a limit-order-book financial market. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI-2018), pages 1876–1883.
London Stock Exchange Group (2019). Turquoise trading service. Description Version 3.35.5.
Pentapalli, M. (2008). A comparative study of Roth-Erev and Modified Roth-Erev reinforcement learning algorithms for uniform-price double auctions. PhD thesis, Iowa State University.
Petrescu, M. and Wedow, M. (2017). Dark pools in European equity markets: emergence, competition and implications. ECB Occasional Paper, 193.
Rust, J., Palmer, R., and Miller, J. H. (1992). Behaviour of trading automata in a computerized double auction market. In The Double Auction Market: Theories and Evidence, pages 155–198. Addison Wesley.
Snashall, D. and Cliff, D. (2019). Adaptive-aggressive traders don’t dominate. In van den Herik, J., Rocha, A., and Steels, L., editors, Agents and Artificial Intelligence: Selected Papers from ICAART 2019, pages 246–269. Springer.
Tesauro, G. and Bredin, J. L. (2002). Strategic sequential bidding in auctions using dynamic programming. In Proc. First Int. Joint Conf. on Autonomous Agents and Multiagent Systems: part 2, pages 591–598.
Tesauro, G. and Das, R. (2001). High-performance bidding agents for the continuous double auction. Proc. 3rd ACM Conference on E-Commerce, pages 206–209.
Vytelingum, P., Cliff, D., and Jennings, N. (2008). Strategic bidding in continuous double auctions. Artificial Intelligence, 172(14):1700–1729.
Vytelingum, P. (2006). The structure and behaviour of the Continuous Double Auction. PhD thesis, University of Southampton.
Wray, A., Meades, M., and Cliff, D. (2020). Automated creation of a high-performing algorithmic trader via deep learning on level-2 limit order book data. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI-2020).
Xu, K., Gould, M., and Howison, S. (2019). Multi-level order-flow imbalance in a Limit Order Book. SSRN 3479741.
Zhang, Z. (2020a). GitHub repository: https://github.com/davecliff/BristolStockExchange/tree/master/ZhenZhang