Empirical Study of Market Impact Conditional on Order-Flow Imbalance
Anastasia Bugaenko
Supervisor: Dr Ankush Agarwal
Word Count: 14300
August 2019
A dissertation submitted in part requirement for the Master of Science in Quantitative Finance
Adam Smith Business School, University of Glasgow
[email protected]

Acknowledgements

I would like to express my gratitude to my family and my partner for their endless support, love and patience.

Abstract
In this research, we have empirically investigated the key drivers affecting liquidity in equity markets. We illustrated how theoretical models of agents' interplay in financial markets, such as Kyle's model, align with the phenomena observed in publicly available trades and quotes data. Specifically, we confirmed that for small signed order-flows, the price impact grows linearly with the order-flow imbalance. We have, further, implemented a machine learning algorithm to forecast market impact given a signed order-flow. Our findings suggest that machine learning models can be used in the estimation of financial variables, and that the predictive accuracy of such learning algorithms can surpass the performance of traditional statistical approaches.

Understanding the determinants of price impact is crucial for several reasons. From a theoretical stance, modelling the impact provides a statistical measure of liquidity. Practitioners adopt impact models as a pre-trade tool to estimate expected transaction costs and optimise the execution of their strategies. This further serves as a post-trade valuation benchmark, as sub-optimal execution can significantly deteriorate a portfolio's performance. More broadly, the price impact reflects the balance of liquidity across markets. This is of central importance to regulators, as it provides an all-encompassing explanation of the correlation between market design and systemic risk, enabling regulators to design more stable and efficient markets.
Keywords:
Market Impact, Liquidity, Order-Flow Imbalance, Machine Learning
Note:
This copy of the research does not include the source code. Please contact the author for access to the source code – Email: [email protected]

Contents
List of Tables
List of Figures
1 Introduction
2 Background and Literature Review
3 Data and Research Methodology
4 Empirical Study
5 Conclusion and Future Work
References
Appendix A
    A.1 Market and Funding Liquidity
Appendix B
    B.1 Selective Liquidity Taking Under Market Fragmentation
    B.2 Limit Orderbooks and Their Models
Appendix C
    C.1 The NASDAQ
    C.2 Orderbook Reconstruction
    C.3 Data Preprocessing
    C.4 Machine Learning
    C.5 The Gauss–Markov Theorem
    C.6 Statistical Inference
    C.7 AIC and BIC Goodness-of-Fit Measures
    C.8 Hidden Orders
    C.9 Empirical Studies Review
Appendix D
    D.1 Future Work
        D.1.1 Alternative Data
        D.1.2 Illiquid Markets
        D.1.3 Invariant Market Impact Function
        D.1.4 Optimal Execution
        D.1.5 Computational Resources
Appendix E
List of Tables

6   ⟨s⟩, the average spread; R(1), the lag-1 response function for all MOs; Σ_R, the standard deviation of price fluctuations around the average price impact of MOs – all expressed in dollar cents; N_MO, the total average number of MOs per day between 10:30 and 15:00 on each trading day from 2nd of January to 30th of June 2015, for four large- and small-tick stocks
7   ⟨s⟩, the average spread; R(1), the lag-1 response function for all MOs; Σ_R, the standard deviation of price fluctuations around the average price impact of MOs – all expressed in dollar cents; N_MO, the total average number of MOs per day between 10:30 and 15:00 on each trading day from 30th of June to 31st of December 2015, for four large- and small-tick stocks
8   In bold, empirical observations from the work of Bouchaud et al. (2018); in white, results of the current study using market data from LOBSTER (measurement units are the same as in Tables 6 and 7)
9   Separate LO executions representing the same MO at the same timestamp
10  Descriptive statistics: order-flow imbalance and aggregated market impact. Order-flow imbalance is measured in shares; aggregated market impact is measured in dollar cents
11  MSE cross-validation scores, run k × 10. MSE is measured in dollar cents

List of Figures

Computational time in seconds for estimation of lag-1 unconditional impact, 2nd of January to 30th of June 2015 (light grey) and 30th of June to 31st of December 2015 (dark grey)

1 Introduction
A security marketplace broadly refers to any venue where buyers and sellers come together to exchange resources, enabling prices to adapt to supply and demand (Bouchaud et al., 2018). Trading can take place in several possible ways: via broker-intermediated over-the-counter (OTC) deals, specialised broker-dealer networks, or decentralised internal chat rooms where traders engage in bilateral transactions, amongst others.

In traditional quote-driven markets, all trading is enabled by designated market makers (MMs, or specialist liquidity providers) who quote their prices with corresponding volumes (the quantity to be bought/sold), whilst other participants – market takers – submit their orders to either buy at the quoted ask price or sell at the bid price posted by the market maker. In this respect, market makers offer indicative prices to the whole market. However, today most modern markets operate electronically across multiple venues and centre around a continuous-time double-auction mechanism (where participants can simultaneously auction buy and sell orders) using a visible limit order book (LOB). The LOB mechanism allows any participant to quote bid/ask prices, and a transaction takes place whenever a buyer and a seller agree on the price. The London, New York (NYSE), Swiss and Tokyo Stock Exchanges, NASDAQ, Euronext, and other smaller markets operate using some kind of LOB. These cover a range of liquid (traded in large volume) products including stocks, futures, and foreign exchange. Market participants in such venues can see the proposed prices, submit their own offers and execute trades by sending relevant messages to the LOB. Owing to technological developments, traders across the globe can access information about the LOB's state in real time and incorporate their observations when deciding how to act. This transparency, combined with the low latency, high liquidity and low trading costs of electronic exchanges, appeals to many individual and institutional traders (Hautsch and Huang, 2011).

The quality of a security market is often characterised by its liquidity. Nevertheless, the term is not simple to define accurately, with precise definitions only existing in the context of particular models. Generally, liquidity is provided when counterparties enter into a firm commitment to trade. This ultimately results in an exchange of resources at a perceived free-market fair price (market clearing, as described by general equilibrium pricing). In this regard, the term captures the usual economic concept of price elasticity – in a highly liquid market (where many participants are willing to trade), a small shift in supply (respectively demand) does not result in a large price change (Hasbrouck, 2007). Kyle (1985) more adequately describes liquidity by identifying three key properties of a liquid market: tightness – "the cost of turning around a position over a short period of time"; depth – "the size of an order-flow innovation required to change the prices by a given amount", or the available volume at the quoted price; and resilience – "the speed with which prices recover from a random, uninformative shock".

Despite these simplistic yet elusive definitions, in the marketplace liquidity is a complex variable with multiple unobservable facets, and often the main contributor to the non-stationarity of financial time series (amongst other variables, e.g., volatility).
The difficulty in providing a more comprehensive definition of liquidity is exacerbated by the fact that academia has traditionally preferred to look at the world through the lens of a perfect, frictionless market with infinite liquidity at the market price. Nonetheless, the qualities associated with the word are sufficiently widely accepted and understood, making the term useful in practical discourse. In particular, practitioners discern market liquidity from funding liquidity. To capital market participants, liquidity generally refers to implicit or explicit transaction costs (arising from limited market depth in the security), the bid-ask spread (i.e., the quality spread – a difference in interest rates, or the difference in price at which one can buy or sell an asset) and price impact (a change in market price that follows a trade). This is colloquially referred to as market liquidity. Conversely, risk managers are often concerned with funding liquidity. This pertains to the ease with which a financial institution can raise funds/capital to meet cash shortfalls (Acharya, 2006).

1.1 Background
As outlined above, prior to the widespread adoption of LOBs, liquidity provision was traditionally designated to a small group of specialists. These specialists served as the exclusive source of liquidity for an entire market. This mechanism worked particularly well for quote-driven markets, granting these so-called MMs several privileges in exchange for immediate quotation and clearing services (i.e., ensuring settlement of transactions). To maintain efficiency under this market structure, dealers/MMs must maintain undesirably large inventories (long position – assets that have been bought; short position – assets borrowed against a deposit known as collateral), accumulated whilst providing liquidity. This is problematic for MMs, who typically aim to keep their net inventory as close to zero as possible, so as not to bear the risk of the assets' price declining (Bouchaud et al., 2018).

Alternatively, modern markets place no such restriction; in today's electronic markets, all agents can act as MMs by offering liquidity to other participants. This emerging complexity of electronic trading venues has intrinsically blurred the usual distinction between liquidity provider (MM) and consumer. Nonetheless, to assist our discussion we adopt a more concrete distinction of the types of participants, as outlined in the works of Cartea, Jaimungal and Penalva (2015) and Bouchaud et al. (2018):
1. Informed Traders – sophisticated traders who profit from leveraging statistical information (i.e., a private signal or prediction) about the future price of an asset, which may not be fully reflected in the asset's spot price.
2. Uninformed Traders – either unsophisticated traders with no access to information (or an inability to correctly/efficiently process it), or market participants who are driven by economic fundamentals outside of the exchange. These traders are often labelled noise traders, as a large fraction of their trades arise from portfolio management and risk-return trade-offs that carry very little short-term price information.
3. Market Makers (MMs) – (provisionally) uninformed professional traders who profit from facilitating the exchange of a particular security and exploiting their skills in executing trades.

Clearly, the notion of information fundamentally underpins our classification of each agent and defines their ability to accurately forecast price changes (Bouchaud et al., 2018). Considering the interactions and tensions amidst these groups provides useful insights into the origins of many interesting observed phenomena in modern financial markets.

1.1.2 Asymmetric Information and Adverse Selection
The rate at which information is incorporated/reflected in prices underpins the degree of efficiency in the market. In this regard, financial markets are not generally classified purely at two extremes (efficient or inefficient) but have been shown to exhibit various degrees of efficiency (McMillan et al., 2011). In this view, market efficiency is observed as a continuum between the extremes of completely efficient, at one end, and inefficient at the other. This is consistent with widespread empirical observations (see Finnerty (1976) and Seyhun (1986)), where strong-form efficiency has been shown not to hold in light of private information.

As private information can consist of signals about the terminal value of the security, information asymmetry is of fundamental importance to MMs (who often trade with highly informed participants) and is the prevailing consideration of our study. Whereas most small trades contain relatively little information and are thus innocuous for MMs providing liquidity, larger orders could be interpreted as stronger signals of an information advantage stemming from better predictive models.

An imperative consequence of such informed order-flows (trends in the direction of trading arising from more informed participants) is the resulting inventory imbalance, where MMs are forced to accumulate larger net positions in the short run – i.e., an MM receives many more buy orders than sell orders, with a high probability of being on the wrong side of the trade. This is known as adverse selection and may cause MMs huge losses as they are "picked off" by more informed traders when making binding quotes (Hasbrouck, 2007).

To compensate for this information asymmetry (thereby mitigating the risk of being adversely selected), MMs choose how much liquidity to reveal and look to efficiently process any new piece of information by updating their bid/ask quotes in response to the order-flow imbalance. Such market friction results in mean field games, where MMs adjust their bid/ask prices as more informed liquidity takers submit large trades. This leads to a worse execution price for the informed trader – the so-called market or price impact. Consequently, informed agents must selectively take liquidity using optimal execution strategies (i.e., split their large orders across time to match the liquidity volume revealed by MMs), as described in the work of Almgren and Chriss (2001); see Appendix B.1.

1.2 Motivation

In the wake of the 2007 global credit crisis, there has been a myriad of regulations requiring institutional investors (both on the buy side and the sell side) to meet several liquidity-related policies (see Table 1). This stems from the generally perceived reduction in the quality of liquidity across asset classes, as per the report produced by Bloomberg (2016).

Buy Side            Sell Side
Prudent Valuation   MiFID II
RRP                 SEC (22e-4)
ILAAP               AIFMD
Basel 3 (LCR)       UCITS
FRTB (Basel 4)      Form PF
Table 1: Financial Liquidity Regulations

Liquidity risk is of special importance to practitioners because it might cause a bank to fail despite no trading losses (Murphy, 2008). This risk pertains to a firm's ability to meet cash demands. These demands might be either known in advance, such as coupon payments, or unexpected, such as the early exercise of options or the need to liquidate large portfolio positions. Therefore, inadequate funding and market liquidity may impair a firm's ability to meet its payment obligations.

Moreover, excess transaction costs arising from liquidity concerns are important factors in determining investment firms' performance. These costs can become very high, reducing any trading profits. According to Jean-Philippe Bouchaud of Capital Fund Management, nearly two-thirds of trading profits can be lost because of market impact costs (Day, 2017). Whilst explicit transaction costs can be accounted for, the implicit costs (such as market impact) cannot be estimated directly, but they can be approximated by measuring liquidity and minimised by adopting an optimal trading strategy.

Within the microstructure of financial markets, an optimal liquidation/acquisition strategy delivers the minimum market impact for a particular order size and time horizon (the urgency with which an asset is to be bought/sold). In this respect, Kyle and Obizhaeva (2018) define market impact as the expected adverse price movement from a pre-trade benchmark (the decision/fair price at which a trader wishes to purchase an asset), upon execution. Consequently, accurate measurement of market impact is essential, possibly blurring the line between a profitable and an unprofitable strategy net of such transaction costs.

However, the effect of a firm's own trading activity on market prices is notoriously difficult to model, as there is no standard formula that applies to every financial asset or trading venue. Deriving such a formula is a challenging task due to the lack of trade activity and data in various asset classes. For instance, investment-grade fixed income securities are traded in quote-driven over-the-counter (OTC) markets with no transaction visibility, whereas large common stocks are often found in more liquid order-driven electronic exchanges. Hence, the functional form of the market impact would vary according to an asset's characteristics and trading pattern.

The market microstructure literature has discussed a number of market impact/cost functions, with theoretical studies arriving at a model of linear functional form, where price impact is said to be proportional to the volume of security traded in the market. On the other hand, an overwhelming number of practitioners have purported a square-root model, which suggests that the marginal price impact diminishes as the trade volume increases (Kyle and Obizhaeva, 2018). Despite the presence of some empirical evidence for the square-root model of market impact, both practitioners and academics agree that the model is not exact and is not aligned with the theoretical research (Bouchaud, 2009).

The purpose of this study, therefore, is to conduct a robust empirical analysis of the market impact functional form and validate it against existing models. The novelty of our methodology lies in the advanced statistical tools adopted. Specifically, the study will examine the application of machine learning techniques to the derivation of a market cost function.
Unlike traditional regression analysis, which suffers from limitations such as the "curse of dimensionality" (the model becomes mathematically intractable when dealing with a large number of explanatory variables), machine learning (ML) has been proven to provide robust results in many higher-dimensional financial applications. This is because of its ability to fit and predict using complex data sets (Park, Lee and Son, 2016). With the increasing availability of high-frequency market trading data, we are now at an acute juncture where we can begin to conduct meaningful studies of the relationship between order flow, liquidity and price impact in order-driven markets. In doing so, we hope to facilitate a better understanding of the market impact function, given the gap between current empirical findings and the theory.

Being able to model market impact more accurately is essential for a better understanding of how trades affect prices and how to quantify the degree of this impact as well as its dynamics (Guéant, 2016). This knowledge of the price formation process would empower both practitioners and academics to arrive at models that better depict observed market behaviour, further contributing to the efficiency and stability of modern market microstructure.
1.3 Research Aims

The focus of this research is to derive a functional form of market impact using a parametric ML algorithm. To achieve this, a detailed investigation of the key drivers affecting liquidity is required, with an emphasis on observing the consequences of executing large orders by exploiting data from a stock exchange.

A series of experiments will be carried out using the scientific method to statistically reconstruct the dynamics of the NASDAQ Limit Order Book (LOB). LOBs contain detailed information about the interplay between liquidity providers (i.e., market makers) and liquidity takers. This permits us to select the microstructure features (explanatory variables) that underpin price impact and thus need to be included in its function definition. These features will then serve as input features to the ML algorithm.

The work will be conducted in a controlled environment, examining liquid stocks. This allows for a vast and rich data set that can facilitate precise and robust numeric results. As parametric models are often prone to overfitting (and thus bad forecasting), we look to reduce both bias and variance errors by conducting cross-validation of the derived model. The latter involves dividing the training data set into random parts and fitting the model on each partition (known as out-of-sample testing).
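To make the cross-validation procedure just described concrete, the following minimal sketch illustrates k-fold out-of-sample testing for a generic model. The function names, the `fit`/`predict` interface and the use of mean squared error are illustrative assumptions, not the exact routine adopted in this thesis.

```python
# A minimal sketch of k-fold cross-validation under an assumed
# generic fit/predict model interface.
import numpy as np

def k_fold_mse(model_fit, model_predict, X, y, k=10, seed=0):
    """Return the average out-of-sample MSE over k random folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))               # shuffle once, then split
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        params = model_fit(X[train], y[train])              # fit on k-1 folds
        resid = y[test] - model_predict(X[test], params)    # held-out errors
        scores.append(np.mean(resid ** 2))
    return np.mean(scores)
```

With a linear model, `model_fit` could simply wrap a least-squares solver; the point of the sketch is that every observation is held out exactly once, giving an out-of-sample estimate of forecasting error.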
To summarise, the aims of this study are:
1. Investigate the key drivers affecting liquidity in equity markets
2. Derive a functional form of market impact using a machine learning algorithm
3. Compare the predictive performance of machine learning against traditional statistical models using cross-validation

1.4 Thesis Structure
The structure of this thesis is organised as follows:

• Chapter 2 – Background and Literature Review is a review of key terms and an introduction to the research environment.
• Chapter 3 – Data and Research Methodology describes the dataset and the approach adopted throughout the study.
• Chapter 4 – Empirical Study illustrates the outcomes of our experiments and discusses the implications of the findings.
• Chapter 5 – Conclusion and Future Work is a summary of our key results and suggestions for future work.
2 Background and Literature Review
No respectable model exists without an appropriate understanding of the system's rules and the challenges faced by domain practitioners, as well as the empirical facts. To facilitate a comprehensive study of the functional form of market impact, we must first consider several key concepts present in the market microstructure literature.

Market microstructure forms a long and rich history of differing viewpoints, with academics (economists, physicists, and mathematicians) and practitioners (regulatory policymakers and investors) typically residing at two distinct ends of the spectrum. As we will discuss, all such perspectives have their confines and intersect. Developing a coherent understanding of these themes is a long and complex endeavour. This chapter serves to situate these issues within the current research landscape.
The microstructure of a market is characterised by the interactions of the kinds of participants, and by the rules set by regulators. These rules focus on minimising any friction arising at the level of trading venues, as well as on how the exchange of assets takes place in very specific settings.

The term market microstructure was first coined by Garman (1976), in the paper of the same title, where he describes the moment-to-moment trading activities in asset markets. The field has since emerged as an effervescent research area of prominent importance. A substantial number of changes have occurred since the expression's first usage. For example, the price formation process has been impacted by the fragmentation of markets in major financial hubs such as the US and Europe (e.g., the introduction of Dark Pools – alternative trading systems with no visible liquidity, in which market activities take place away from public exchanges), no doubt due to the abundance of technological advances (i.e., the automation of trading and the development of execution algorithms). These modern market designs have prompted new questions for modellers.

However, information remains a key dimension at the heart of the prevailing microstructure studies. The first iterations of models embracing this notion of information were developed during the last quarter of the 20th century. Economists such as Kyle (1985) provided an in-depth analysis of how information is conveyed into prices, and of the impact of asymmetric information on liquidity in general. The notion that "market prices are an efficient way of transmitting the information required to arrive at a Pareto optimal allocation of resources" (Grossman, 1976) is a natural emerging property of microstructure studies that aim to identify how different trading conditions and rules promote, or hinder, price efficiency. That is, many classical (static and dynamic) microstructure models describe the process by which new information comes to be reflected in prices. This transmission of information into transactions and prices is deeply related to market impact, the provision of liquidity and the determinants of the bid-ask spread. These topics were often the focus of much of the earlier academic literature.

The first academic papers focussing on optimal execution were those of Bertsimas and Lo (1998), Almgren and Chriss (1999) and Almgren and Chriss (2001), with interest in the subject only truly proliferating beyond 2000 (Guéant, 2016). Models incorporating the use of limit orders (visible orders resting in the LOB) and dark pools soon followed. Appendix B.2 describes earlier LOB models and their evolution.

These new models featured more complex variables, such as trading volatility, and involve coefficients that need to be estimated using high-frequency datasets (time series of market data observed at extremely fine scales, i.e., milliseconds). Many statisticians are now acutely involved in the study of market microstructure, bringing with them advanced methods based on stochastic calculus that allow for better estimation of parameters given the data. More specifically, there are several important pieces of literature on high-frequency liquidity provision. This began in 2008 with the publication of Avellaneda and Stoikov (2008), who presented a model of market dynamics comprising a complex partial differential equation (PDE) that was solved by Guéant, Lehalle and Fernandez Tapia (2013).

Today, quantitative research on market microstructure is more concerned with the importance of pre- and post-trade transparency, the optimal tick size, the role of alternative trading venues, and the clearing and settlement of standardised products, amongst others.

Nonetheless, there remains room for improvement when it comes to more realistic dynamic market models that better depict widely observed, but still poorly understood, micro- and macro-structure phenomena (Bouchaud et al., 2018). To examine the relationship between market dynamics and exogenous variables such as volume and order-flow imbalance, we must first review the properties of prominent models of market impact.
As we have deliberated, the notion of liquidity in financial markets is an elusive concept. However, from a practical stance, one of its most important metrics is the response of price as a function of order-flow imbalance (i.e., excess volume with respect to the order sign). This response is known as market impact.

In much of the literature, there are three distinct strands of interpretation for the cause of market impact, which reflect the great divide between efficient-market enthusiasts and sceptics:

1. The Efficient Market Hypothesis (EMH) posits that all available information is reflected in prices, as rational agents immediately arbitrage away any deviation from the fair price. In the efficient-market framework, rational agents who believe that asset prices are always close to their fundamental value "successfully forecast short-term price movement". This can result in a measurable correlation between trade sign and subsequent price change (Bouchaud, Farmer and Lillo, 2009). Under this interpretation, as emphasised by Hasbrouck (2007), ". . . orders do not impact prices. It is more accurate to state that orders forecast prices"; thus, noise-induced trades carrying no information yield no long-term price impact, as prices would otherwise deviate from their fundamental value.

2. The second picture reinforces the first in that market impact is the apparatus by which prices adapt to new information, as illustrated in the aforementioned Glosten and Milgrom (1985) and Kyle (1985) models, thereby permitting information about the fundamental value of the asset to be incorporated into prices.

3. The third perspective resonates with that of the efficient-market sceptic. Here, in the absence of a fundamental price or private information, zero-intelligence models describe how price impact is a reaction to order-flow imbalance. In the Santa Fe model of Farmer, Patelli and Zovko (2005), a completely stochastic order-flow process is delineated, by which the act of trading itself is tautologically seen as the physical medium statistically interpreted as price impact.

Though all three interpretations result in a positive correlation between trade signs, volume and price impact (the response function), they are conceptually very distinct. In the first two pictures, trades reveal private information about the fundamental value of assets, resulting in a price discovery process. In the latter, mechanical interpretation, one should remain agnostic of the informational content of trades and should instead speak of price formation.

Trading impacts prices – this is an indisputable empirical observation. However, the interpretation of this impact is still widely debated; whether prices are formed or discovered remains a topic of discussion with no clear consensus at this stage. But because of the unclear distinction between true information and noise, one can assume reality lies somewhere between all three extremes (Bouchaud et al., 2018).

The concepts of adverse selection and market impact are often captured by asymmetric-information models describing why liquidity providers' actions should depend on the behaviour of other market participants. A key early reference on the subject is the seminal Kyle (1985), which provides an elegant explanation of how impact arises from liquidity providers' fears of adverse selection when trading against highly informed traders. Albeit not very realistic, the model is a concrete illustration of how private information comes to be reflected in the price of an asset. Moreover, economic models further provide important insights into the challenges faced by MMs. In this regard, the work of Glosten and Milgrom (1985) demonstrates how the bid-ask spread must compensate the MM for adverse selection when trading in a competitive market.
The market microstructure literature has studied a diverse range of market impact models. However, despite many decades of theoretical and empirical research, there remains a vast number of open questions regarding its functional form (Kyle and Obizhaeva, 2018). This is in part due to the varying characteristics of assets, as dictated by their market microstructure. Moreover, as outlined in Almgren et al. (2005), there are different classes of market impact that must be distinguished. Before presenting our empirical investigation, we consider some heuristic models of market price dynamics. These models intuitively encapsulate the strategic considerations of market participants by examining the delicate balance in liquidity maintained by ongoing competition between rational agents.

Linear and Permanent Impact: The Kyle Model
Kyle (1985) proposed a classic toy model which seeks to shed light on the mechanisms by which private information is gradually propagated into prices in an efficient market. The model was further extended in Kyle (1989) to account for the case of several competing informed traders. Kyle's original model assumes the simple case of normally distributed random variables and derives a single statistical measure of impact (Kyle's lambda, λ) that is both linear in traded volume (order size) and permanent in time. In this framework, market dynamics are juxtaposed as a contest between an inside trader (who holds unique information regarding the fair price of an asset) and a noise trader (who submits random orders in the absence of actual knowledge), who submit orders that are cleared by a Market Maker (MM) at every time step Δt. In the model, the price adjustment rule Δp of the MM must be linear in the total signed volume εV, i.e.,

    \Delta p = \lambda \epsilon V    (1)

This gauges the market impact of a trade as a consequence of volume-flow imbalance, where λ is a measure of impact and is inversely proportional to the liquidity of the market. The consecutive price adjustment is further permanent, i.e., the price change between time t = 0 and t = T = NΔt is:

    p_T = p_0 + \sum_{n=0}^{N-1} \Delta p_n = p_0 + \lambda \sum_{n=0}^{N-1} \epsilon_n V_n    (2)

Formula 2 assumes that the impact λε_nV_n of trades in the n-th time interval persists unconditionally, unabated, up to time T. It is clear that the signs of the trades must be serially uncorrelated if the price is to follow a stochastic path. Within the model's framework, the trading schedule of the informed insider is precisely such that the ε_n are uncorrelated (Kyle, 1985). Conversely, data from real markets reveal autocorrelation in the signs of the traded volumes over prolonged time frames, as empirically observed by Bouchaud et al. (2003).

In summary, Kyle's model elicits some elegant deep truths about how markets function, but also fails to capture the essence of important empirical properties of real markets. In the model, the mechanism by which information comes to be reflected in prices (price impact) is the MM's attempt to predict the informational content comprised in the order-flow and to adjust their prices accordingly. Additionally, price impact is said to be linear (i.e., price changes are proportional to the order-flow imbalance) and permanent (i.e., there is no decay of impact exhibited). In the context of LOBs, linear impact is a generic consequence of a finite liquidity density (of buy/sell orders) in the vicinity of the price, and permanent impact is the outcome of liquidity immediately refilling the consumption gap caused by the market order, as demonstrated by Obizhaeva and Wang (2013). Because of impact, informed traders restrict their trading volume to optimise returns. Thus, despite private information, the amount of profit that can be made by exploiting this insider knowledge is limited.
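As an illustration of equation (1), the sketch below recovers Kyle's λ from simulated trades by regressing price adjustments on the contemporaneous signed volume. The data-generating values are hypothetical and stand in for real order-flow data.

```python
# Illustrative estimation of Kyle's lambda via OLS on signed volume.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
eps = rng.choice([-1, 1], size=n)          # trade signs (uncorrelated, as Kyle assumes)
V = rng.exponential(scale=100.0, size=n)   # traded volumes
lam_true = 0.002                           # hypothetical impact coefficient
noise = rng.normal(scale=0.05, size=n)
dp = lam_true * eps * V + noise            # price adjustments, eq. (1)

signed_volume = eps * V
lam_hat = (signed_volume @ dp) / (signed_volume @ signed_volume)  # OLS slope, no intercept
print(f"estimated lambda: {lam_hat:.6f}")  # close to 0.002
```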
Concave Transient Impact: The Square Root Model

Kyle's original model assumes a linear dependency of impact on traded volume, but this requires a variety of idealised assumptions that may be violated in real markets (Zarinelli et al., 2015). A key finding is that impact is not only mechanical, but also dynamic, meaning it cannot simply be described by the revealed supply or demand of a visible LOB (Weber and Rosenow, 2005). The impact is rather related to the latent underlying liquidity – hidden supply and demand not reflected in the LOB. This stems from the fact that even highly liquid markets only offer very small volumes of liquidity for immediate execution (see Appendix B.1). Consequently, trades must be fragmented, creating long memory in the sign of the order-flow as private information is slowly incorporated into prices. This, however, is incompatible with permanent impact (as described by Kyle (1985)), which would otherwise lead to trends, i.e., strongly autocorrelated price changes (Bouchaud, 2009).

As an alternative, the square-root model of market impact was first proposed by Torre (1997), based on empirical regularities observed by Loeb (1983). Empirical studies have since ubiquitously found market impact to indeed be a non-linear concave function of the size of meta-orders (large orders fragmented into a number of smaller market orders), fading in time (Bouchaud, 2009). This concave nature of market impact is universally observed over several heterogeneous datasets in terms of markets (including equities, FX, futures, bitcoin and even OTC credit markets, as reported by Eisler, Bouchaud and Kockelkoren (2012)), epochs and execution styles. On this note, it is worth mentioning that some studies report empirical deviations from the square-root law, e.g., Almgren et al. (2005) and Zarinelli et al. (2015).

Nonetheless, in all former cases, the square-root model is consistently simple and empirically realistic. Let G denote the percentage cost of executing a meta-order of size Q shares of a stock with price P, expressed as a fraction of the value of the trade |PQ|. Let σ denote the asset's (daily) return volatility, and let V denote the asset's (average) daily traded volume in shares per day. The square-root law of market impact is thus described by the relationship

    G = g(\sigma, P, V; Q) \sim \sigma \sqrt{\frac{|Q|}{V}}    (3)

where g(·) defines the functional form of the model and the notation "∼" means "is proportional to". The market impact G is a dimensionless (absolute) quantity, which is the same regardless of the units of measurement for Q, V, and σ.

Even though model (3) seems a reasonable indication of transaction costs, there is still no consensus on whether or not price impact is indeed described by the square-root function. The measured exponent varies within a range of 0.4 to 0.7. For example, empirical findings by Almgren et al. (2005) and Kyle and Obizhaeva (2016) find an exponent closer to 0.6, but average observations suggest a power closer to 0.5 (square root). We note that the only conditional variable is the total traded volume Q. This is surprising, as it implies that the time taken to completely liquidate (respectively acquire) and the execution path are not important factors in determining market impact. Nevertheless, real data show that impact does depend on such dynamics; thus, the power law, as it is often referenced, should be seen as a good first-order approximation.
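For concreteness, the sketch below evaluates the square-root relationship (3). Since "∼" only asserts proportionality, we introduce a hypothetical stock-specific prefactor Y of order one, which the model itself leaves unspecified; all numeric inputs are illustrative.

```python
# A sketch of the square-root impact law (3) with an assumed prefactor Y.
import numpy as np

def sqrt_impact(Q, sigma_daily, V_daily, Y=1.0):
    """Expected fractional impact cost of a meta-order of Q shares."""
    return Y * sigma_daily * np.sqrt(abs(Q) / V_daily)

# Example: trading 1% of daily volume at 2% daily volatility
G = sqrt_impact(Q=50_000, sigma_daily=0.02, V_daily=5_000_000)
print(f"expected impact: {G * 1e4:.1f} basis points")  # ~20 bps
```

Note how the concavity works in practice: quadrupling Q only doubles the expected impact, in contrast with the linear Kyle model of the previous section.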
Whilst this result is in stark contrast with the classical economic literature (which asserts linear impact (Kyle, 1985)), it is perfectly in accordance with the fact that instantaneously observed liquidity is limited in real markets, further indicating that markets may be inherently fragile.

Our empirical study aims to quantify how liquidity providers react to the arrival of market orders. The following section provides the foundation of the statistical approaches employed in this research. By outlining the analysis methodology and exploring more recent developments in the area of data analytics, we hope to facilitate a practical solution for accurately measuring order-flow, therefore providing clarity on many of the obscurities surrounding price impact.

3 Data and Research Methodology
Digital markets, generally, facilitate trade by automatically matching those wanting to sell with those wanting to buy. Market participants express their willingness to trade a specific quantity at a specific price by submitting orders to the exchange. At a high level, all orders are classified by their type as Market Orders (MOs) or Limit Orders (LOs). MOs indicate an immediate need to execute the trade. LOs, on the other hand, are known as passive orders because they usually do not result in an instant execution. LOs are often submitted at a price worse than the current market price and, therefore, have to wait until either a new order arrives that matches the LO price or the LO is withdrawn. All active (non-cancelled) LOs are placed in a queue according to their corresponding price. This order queue is managed within the LOB, whilst the mapping of orders is conducted by the matching engine.

The LOB is defined on a fixed discrete grid of prices where submitted LOs are recorded in separate price-level queues. Figure 1 illustrates a sample snapshot of a LOB: vertical blue and red bars represent queues of LOs to buy and sell respectively. The length of each queue is defined by the number and sizes of orders that have been submitted at that price, but not yet matched for execution. The scale of the price and volume axes is defined by the LOB resolution parameters – the tick and lot sizes respectively. The tick size is the smallest possible change in the price of an asset. In other words, the tick size defines the precision of the quoted price. The lot size, on the other hand, defines the smallest amount (expressed as the number of shares) of the security that can be traded within the LOB.

When a new buy or sell LO comes in, it is added to the end of the corresponding price queue – on top of previous LOs at that level; see Figure 1. The difference between the best (lowest) ask and the best (highest) bid prices is known as the spread, while the arithmetic average of these best quotes is called the mid-price. The mid-price is often used to describe the LOB and its dynamics (as opposed to looking at the individual behaviour of the bid and ask prices). Therefore, when examining the market impact of orders, we are interested in how the mid-price has changed in response to the execution of MOs. Note that in some cases it is more appropriate to examine the micro-price – the weighted (by inverse volume) average of the bid and the ask. It becomes more useful when the imbalance of the orderbook is used for prediction of the sign of future price changes, as emphasised in Cartea, Jaimungal and Penalva (2015) and Bouchaud et al. (2018). In this study, however, we mainly use the mid-price.
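Both price summaries can be computed directly from Level-1 quotes, as in the following sketch; the quote values are illustrative only.

```python
# Mid-price and (inverse-volume-weighted) micro-price from Level-1 quotes.
def mid_price(bid, ask):
    return 0.5 * (bid + ask)

def micro_price(bid, bid_vol, ask, ask_vol):
    # Weighting each side by the opposite queue size means a large ask
    # queue pulls the price toward the bid, reflecting the imbalance.
    return (ask * bid_vol + bid * ask_vol) / (bid_vol + ask_vol)

bid, bid_vol = 215.88, 85
ask, ask_vol = 215.96, 100
print(mid_price(bid, ask))                      # 215.92
print(micro_price(bid, bid_vol, ask, ask_vol))  # below the mid: ask side is heavier
```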
Figure 1: Illustration of LOB dynamics (Bonart and Gould, 2017).

The matching engine, or matching algorithm, is a well-defined procedure for how orders are selected and executed. Predominantly, in electronic markets, the algorithm tries to map MOs first – if two MOs match, then these are executed immediately. If a new MO does not have an opposite-side matching MO at the time of submission, it is executed against LOs in price-time priority order. That is, an incoming MO is first mapped to the oldest LO at the opposite side's best price. Then, if the quantity demanded by the MO is not fulfilled, it is executed against the next LOs in the queue (still at the best available price). It is interesting to note that not all matching engines abide by a price-time priority queuing mechanism. Alternative matching engines, such as pro-rata rules, are often found in alternative market structures (e.g., in money markets). Under the pro-rata model, there is no explicit time-priority rule. MOs are instead matched against LOs posted at the best opposite price – proportionate to the quantity posted. Moreover, there are other markets that combine the two approaches, adopting both time priority and pro-rata (e.g., futures).

Traditionally, if the full size of an MO cannot be fulfilled by LOs resting at the best price, the matching algorithm would try to execute the rest of the MO against LOs at the second-, third- and so on best prices, until the full quantity of the MO is filled (walking the book). However, modern financial markets implement alternative procedures for handling situations where there is not enough liquidity at the best price against which the entire MO can be matched. For example, in the US, trading venues are obliged by regulators to provide participants with the best possible price for the asset. This means that, depending on the specific order type, the exchange may re-route the remaining units of an unfulfilled order to an alternative venue displaying the same best quote. While re-routing can certainly affect liquidity (and thus price impact) in the market, empirical observations such as Bouchaud et al. (2018) find that only a few orders are bigger than the available volume at the best quote. This indicates that traders try to adapt their order volumes to the available quotes.
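A toy implementation of the price-time priority step described above is sketched below. It is a simplified illustration, not any exchange's actual engine, and it deliberately omits re-routing or walking the book for the unfilled remainder.

```python
# A toy price-time priority matching step; the best-price queue is
# assumed to be stored oldest-first.
from collections import deque

def execute_market_order(mo_size, best_queue):
    """Match an incoming MO against resting LOs at the best price.
    Returns the (order_id, filled) executions and the unfilled rest."""
    fills = []
    remaining = mo_size
    while remaining > 0 and best_queue:
        order_id, lo_size = best_queue[0]      # oldest order first
        filled = min(remaining, lo_size)
        fills.append((order_id, filled))
        remaining -= filled
        if filled == lo_size:
            best_queue.popleft()               # LO fully consumed
        else:
            best_queue[0] = (order_id, lo_size - filled)
    return fills, remaining

queue = deque([(101, 50), (102, 25)])          # (order_id, shares), in time order
print(execute_market_order(60, queue))         # fills 50 from 101, then 10 from 102
```

Note that each fill against a separate LO would appear as a separate execution record with the same timestamp, which is exactly the pattern exploited later to reconstruct market orders from the data.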
3.2 The Dataset

In our research, we specifically analyse trades and quotes data from the NASDAQ electronic stock exchange (for more information on NASDAQ, please see Appendix C.1). The time-series data has been provisioned from the academic database LOBSTER (Limit Order Book System – The Efficient Reconstructor), which offers on-demand LOB data reconstructed from NASDAQ's Historical TotalView-ITCH files. TotalView-ITCH is a standard NASDAQ data feed which illustrates the full depth of the order book (resting limit orders), as well as all market events. This data feed is consumed in the form of message files which record every state change to the order book, as opposed to recording timely snapshots. TotalView-ITCH message data contains all visible order activities, such as the submission, cancellation, and matching of limit orders. Therefore, as outlined in the paper by the LOBSTER development team, the platform can reconstruct the LOB for a NASDAQ stock at any required book depth level for the specified period (Huang, Lehalle and Rosenbaum, 2015). For a detailed procedure on how LOBSTER reconstructs LOB data, please refer to Appendix C.2.

3.2.2 Output Format
Market activity for a stock on a trading day is organised into two LOBSTER output files:

1. The 'message' file contains every arriving market and limit order, as well as cancellations and updates – in other words, all trades data.
2. The 'orderbook' file depicts the evolving state of the LOB. It describes how the total volume of buy or sell orders at the corresponding price level changes after each market event. This information is displayed as a list of all successive quotes.
For every entry in the message file, there is a corresponding record in the orderbook file that describes how the order book advanced immediately after the message-file event.

Tables 2 and 3 correspond to snapshots of the message and orderbook files for the same stock during the same time (measured in the number of market events). An event of type 4 at the timestamp 43955.2426 (Table 2) illustrates the execution of a buy LO at the bid price 2158800. The size of this order is 10 shares; thus, we observe the corresponding Bid Volume entry in the orderbook decreasing by 10. Furthermore, the next arriving MO (represented by the LO execution at 43955.2426 in Table 2) consumes all available liquidity of 75 shares, and the best bid price changes permanently to 2158300 (the next best price level in the LOB). In this work, we implement an algorithm that detects this kind of price-changing event and explores the causal relationship between market impact and observed LOB features.
Time-stamp   Event Type   Order ID    Size   Order Price   Direction
43955.2422   4            140339446   5      2158800       1

Table 2: Sample entries in the 'message' file

Ask Price   Ask Volume   Bid Price   Bid Volume
2159600     100          2158800     85
2159600     100

Table 3: Sample entries in the 'orderbook' file at Level 1

In summary, the information contained in the output LOBSTER files has the following properties:
• All events have timestamps of seconds after midnight, with a precision of at least milliseconds (or nanoseconds, depending on the requested period). When a market order is matched against several limit orders, each matching is recorded separately (both in the message and orderbook files) but with the same timestamp. This allows the reconstruction of the initial market order volume (see the sketch after Table 4).

• There are seven types of market events recorded in LOBSTER data (see Table 4 below). For this study, we are interested in market orders, which correspond to event types 4 and 5 – the execution of either visible or hidden limit orders. A more detailed explanation of why other types of events are out of scope follows in the Empirical Findings chapter.

• Order ID corresponds to the unique order reference number. A zero reference number corresponds to a hidden limit order. Note that the order ID is distinct from an ID of the trader/broker who submitted the order. In other words, knowing the order ID does not allow one to reconstruct the ownership of trades, but it provides information about the lifespan of a single trade.

• Size of a trade is measured in the number of shares.

• Price is depicted in dollars times 10000. For instance, 2158800 corresponds to $215.88.

• Direction indicates whether a buy or a sell order has been executed:
  – -1: Execution of a sell LO; therefore, an MO to buy has been matched at the ask price
  – 1: Execution of a buy LO; therefore, an MO to sell has been matched at the bid price
  The prevailing majority of the microstructure literature that we have consulted during our study adopts the opposite nomenclature. To remain consistent with the subject expertise, we follow the earlier examples and use a trade sign of -1 to indicate a sell MO and 1 for a buy MO.

• Orderbook file entries are composed of the ask price and the corresponding volume, as well as the bid price and its volume. The best prices are the Level 1 offerings – the cheapest price to buy at (ask) and the highest price to sell at (bid), from the viewpoint of a trader submitting an MO. Typically, an orderbook has several price levels: from the best Level 1 to the second-best, the third-best and so on. However, as we recall from the earlier discussion of the LOB, in most electronic exchanges today an order is re-routed to other markets if the available volume at the best price is less than the order size. Hence, nowadays, an order rarely "walks the book" (being matched with LOs at a worse price on a different price level). Following suit, we will be looking primarily at the Level 1 price changes in the LOB.

Event Number   Event Type
1              Submission of a new limit order
2              Cancellation (partial deletion of a limit order)
3              Deletion (total deletion of a limit order)
4              Execution of a visible limit order
5              Execution of a hidden limit order
6              Cross Trade (Auction Trade)
7              Trading halt indicator
Table 4: Event Types in LOBSTER data
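Based on the file layout described above, the following sketch shows how same-timestamp executions (event types 4 and 5) can be grouped to reconstruct market orders. The file name and helper functions are hypothetical illustrations, though the column order follows the message-file format documented above.

```python
# Sketch: reconstruct initial market-order volumes from a LOBSTER
# message file by grouping executions sharing a timestamp and direction.
import csv
from itertools import groupby

def read_executions(path):
    with open(path, newline="") as f:
        for row in csv.reader(f):
            t, etype, oid, size, price, direction = row
            if int(etype) in (4, 5):           # visible or hidden LO executions
                yield float(t), int(direction), int(size), int(price)

def market_orders(path):
    """Yield (timestamp, MO sign, total shares) per reconstructed MO."""
    for (t, d), group in groupby(read_executions(path), key=lambda r: (r[0], r[1])):
        # The direction column refers to the executed LO; the MO sign is the
        # opposite, and we follow the thesis convention: -1 sell MO, +1 buy MO.
        yield t, -d, sum(r[2] for r in group)

# for t, sign, size in market_orders("AAPL_message.csv"):   # hypothetical file
#     print(t, sign, size)
```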
The analysis is conducted on four NASDAQ stocks – large-tick SIRI and EBAY, and small-tick TSLA and PCLN. Conventionally, the tick size is the smallest movement in the price of an asset. On NASDAQ, each asset is traded in its own book with a tick size of $0.01. The size of the tick is uniform across all NASDAQ-listed securities, despite their prices varying significantly across several orders of magnitude.
Relative tick size, then, is defined as the ratio between the dollar tick size and the price of a stock. Securities with smaller traded prices have a large relative tick size, while those that trade at higher prices per share have a smaller ratio between tick size and stock price. Large-tick stocks are known to have a bid-ask spread almost always equal to one tick, whilst smaller-tick securities usually have spreads of a few ticks (Eisler, Bouchaud and Kockelkoren, 2012). The relative tick size and spread indicate how actively an asset is traded on an exchange. We have chosen to work with stocks that differ in this regard so as to be able to quantitatively observe the influence of tick size on trading features.

The initial objective of our study was to use trades and quotes data for the selected stocks during the first six months of 2015 (2nd of January to 30th of June 2015). Depending on the stock, the average number of daily market events varies from 8,000 to over 200,000. Among these, we are interested in the price impact of MOs specifically. Table 5 lists the average daily number of MOs for each asset.

Ticker   Number of MOs
SIRI
EBAY
TSLA
PCLN
Table 5: Average daily number of MOs for each stock

During the first six months of 2015, there were 124 NASDAQ trading days, which gives us a total number of relevant MO events of the order of 10 . This implies that our dataset is sufficiently large, ensuring significant robustness in any statistical findings. In the later empirical discussion, we elaborate further on the nature of what we have learned from the data and how our results compare to other studies that use a similar LOBSTER dataset.

3.3 Method Development

The methods employed in this research are solely based on the principles of the scientific method. The process of the scientific method begins with the formulation of a question based on observations. This allows a hypothesis to be formed that may explain an observed phenomenon. In this instance, a hypothesis may be "do large orders impact prices?"
The null hypothesis is that large orders of size (traded volume) Q > Z do not impact prices, where Q denotes the order size.

After the formulation of a hypothesis, it is up to the scientist to disprove the null hypothesis and demonstrate that orders of a predefined (large) size do in fact impact prices. To carry this out, a prediction must be defined. To prove or disprove the hypothesis, the prediction is subject to testing.

The results of the testing procedure will provide a statistical answer as to whether the null hypothesis can be rejected at a certain level of confidence. If the null hypothesis cannot be rejected, which implies that there was no discernible relationship between the size of an order and the price impact, it is still possible that the hypothesis is (partially) true. A larger set of data, a transformation of variables or the incorporation of additional information can be used to optimise and improve the level of significance. This is the process of analysis, and it seeks to reject the null hypothesis after refinement. For more details on statistical inference, please refer to Appendix C.5 and Appendix C.6.

3.3.1 Machine Learning

Machine learning (ML) is a subfield of Artificial Intelligence (AI) that facilitates automated methods of data analysis. More broadly, ML defines a series of adaptive computational algorithms that automatically detect patterns in multi-dimensional datasets (panel or time-series data with a large number of observed variables) and make use of these uncovered patterns to predict values of interest or perform other kinds of decision-making under uncertainty (known as generalisation).

The conventional tool of statistical analysis in finance – econometrics – mainly focuses on multivariate linear regression. Linear regression does not have memory (standard econometric models do not learn) and thus fails to improve its performance with new observations. Consequently, econometric regression analysis often falls short in understanding the full spectrum of informational content present in the data. Moreover, when applied to complex problems, more traditional statistical tools often suffer from various limitations (i.e., the "curse of dimensionality" – the model becomes mathematically intractable when dealing with a large number of explanatory variables/features), whilst an ML algorithm can learn patterns in a high-dimensional space without being specifically directed. In this regard, ML methods do not replace the theory and conventional wisdom of econometrics, but simply enhance and guide them.

The defining attribute that differentiates ML from such conventional tools is the algorithm's ability to learn (i.e., improve predictive accuracy with time) from experience with respect to a task and a performance measure (Mitchell, 1997). Here, experience refers to the past information available to a learner (typically taking the form of electronic training data). Thus, ML algorithms learn patterns in a high-dimensional space without being explicitly directed (i.e., programmed with pre-specifications). Such algorithms are incredibly diverse, ranging from more traditional statistical models that emphasise inference to deep neural network architectures that excel at highly complex classification and predictive tasks. As the success of learning is greatly dependent on the data used, ML most closely relates to statistics and data mining, though it has a different emphasis and vocabulary.

ML tasks are generally classified into three broad areas:
Supervised Learning, Unsupervised Learning and Reinforcement Learning. In the current study, we specifically focus on Supervised Learning, which is the most commonly used class of ML. Please refer to Appendix C.4 for information on other forms of ML.

Supervised Learning describes a set of predictive learning models for which the presence of outcome variables guides the learning process. That is, data annotated with values such as categories (as in supervised classification) or numeric responses (as in supervised regression) facilitates external supervision of the learning process. The objective is to learn a mapping from inputs x to outputs y, given the labelled set of input-output pairs D = {(x_i, y_i)}_{i=1}^N. Here D represents the training set, and N is the number of training examples. Given the set of labelled examples, the algorithm is trained on the data and learns which predictive factors are most influential to the responses. When applied to new, unseen datasets, supervised learning algorithms attempt to make predictions based on their prior training experience. In the simplest setting, each training input x_i corresponds to a D-dimensional vector of values that correspond to various attributes of the observed phenomenon. These are known as features and could range from a time series of observations to more complex structured objects such as an image or a molecular shape.

Similarly, the form of the output dependent variable (known as the response variable) can in principle take on any structure; however, most methods assume that y_i is either a categorical variable from some finite set, y_i ∈ {1, . . . , C}, or a real-valued continuous scalar, y_i ∈ R. This research makes explicit use of regression-based supervised algorithms, for which we delineate the basic mechanisms below.

Multi-Linear Regression

Linear regression is a familiar and widely adopted statistical technique. It asserts that a continuous scalar response y is the result of a linear combination of its feature inputs x. Unlike multivariate linear regression, where the model predicts multiple correlated dependent variables given multiple input variables, multi-linear models output a single real-valued scalar. That is:

    y(x) = \beta^T x + \epsilon = \sum_{j=1}^{D} \beta_j x_j + \epsilon    (4)

where β, x ∈ R^{p+1} and ε ∼ N(μ, σ²). That is, β and x are both real-valued vectors of dimension p + 1, and β^T x represents the inner scalar product between the input vector x and the model's weight vector β. The residual error ε between our linear predictions and the true response is normally distributed with mean μ and variance σ². The multi-linear regression model makes the strong assumption of independent and identically distributed (IID) errors. That is, the distribution of the errors is identical across observations, and the errors are independent of each other (knowing one error does not assist in forecasting the next error). When plotted, such a distribution produces the well-known Gaussian (normal) distribution.

To make the connection between linear regression and Gaussian distributions more explicit, we can express the model in the form:

    p(y | x, \theta) = N(\mu(x), \sigma^2(x))    (5)

This makes it clear that the model is a conditional probability density. In the simplest case, we assume μ is a linear function of x, thus μ = β^T x, and that the noise (as measured by the variance) is fixed, σ²(x) = σ². In this case, θ = (β, σ²) are the parameters of the model.

Multi-linear regression forms the basis of many of the more sophisticated supervised ML techniques employed.
From the defined model specification, it is intuitive to ask how the coefficient vector β is estimated. Below, we introduce two methods for optimal estimation of β.

Least Squares

Least Squares is the most common approach to learning the parameters of a hyperplane, by approximating the coefficients that best fit the generated output for a specified input set. The difference between the model's prediction and the actual outcome for a given data point is the residual, whereas the deviation of the model from the true population output is called the error. The method of Ordinary Least Squares (OLS, a type of Least Squares used for estimating the parameters in a linear regression model) entails taking each vertical distance from a data point to the regression line (the residual), squaring this distance (which disregards its sign) and then minimising the total sum of squared residuals, see Figure 2. In more formal terms, the least-squares estimation method chooses the coefficient vector β to minimise the residual sum of squares (RSS, also known as the sum of squared errors, SSE) so that the model fits the data as closely as possible. Hence, the least-squares coefficients β_LS are computed as

β_LS = argmin_β RSS(β), where RSS(β) := Σ_{i=1}^{N} (y_i − β^T x_i)²    (6)

The vertical distances are usually minimised, as opposed to the horizontal distances or those taken perpendicular to the line. This arises as a result of the assumption that x is fixed in repeated samples, so that the problem becomes one of determining the appropriate model for y given (or conditional upon) the observed values of x.

For simplicity, the latter term can be expressed in matrix form. By defining the N × (p + 1) design matrix X, it is possible to write the RSS term as:

RSS(β) = (y − Xβ)^T (y − Xβ)    (7)

Figure 2: Method of OLS fitting a line (described by the model) to the data by minimising the sum of squared residuals (Murphy, 2012).

This term is now differentiated with respect to (w.r.t.) the parameter vector β:

∂RSS/∂β = −2 X^T (y − Xβ)    (8)

A key assumption about the data is made here: the matrix X^T X must be positive-definite, which is only true if there are more observational data points than there are dimensions. If this does not hold (as is often the case in high-dimensional data settings), then it is not possible to find a unique coefficient vector β, and the matrix equation below cannot be solved. Under the assumption of a positive-definite X^T X, the partial derivative is set to zero and solved for β:

X^T (y − Xβ) = 0    (9)

The solution to this matrix equation provides β̂_OLS:

β̂_OLS = (X^T X)^{-1} X^T y    (10)
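Continuing the sketch above, the closed form of Equation (10) can be computed directly; this is an illustrative sketch, not the dissertation's own code, and in practice a least-squares solver is preferred to an explicit matrix inverse for numerical stability.

# Closed-form OLS: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable equivalent: solve the least-squares problem directly
beta_hat_stable, *_ = np.linalg.lstsq(X, y, rcond=None)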
Maximum Likelihood

An alternative to OLS, the Maximum Likelihood Estimator (MLE) is an optimisation procedure that estimates the coefficients of a statistical model, given a particular batch of data, by maximising the likelihood function (which quantifies how likely a set of observations is to occur given the model parameters). The likelihood differs from a probability in that it is not normalised to range from 0 to 1. MLE is a directed algorithmic search through a high-dimensional set of possible parameter choices, attempting to answer the question: if the data were to have been generated by the model, what parameter choices were most likely to have been used? (Murphy, 2012).

This reduces to a conditional likelihood problem of seeing the dataset D given a specific set of parameters θ. The value sought is the θ that maximises p(D | θ). This can be framed as searching for the mode of p(D | θ), denoted by θ̂ and expressed as

θ̂ := argmax_θ log p(D | θ)    (11)

In linear regression problems, we often assume that the errors are IID. This simplifies the solution of the log-likelihood by making use of the properties of natural logarithms, permitting us to express it as

ℓ(θ) := log p(D | θ) = Σ_{i=1}^{N} log p(y_i | x_i, θ)    (12)

In the case of normally distributed residuals, maximising the log-likelihood function returns the same parameter solution as Least Squares.
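This equivalence can be checked numerically. A minimal sketch, assuming SciPy is available and reusing X and y from the earlier snippets; the optimiser choice and starting point are illustrative.

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, X, y):
    # params = (beta_0, ..., beta_p, log_sigma); sigma is log-parameterised to keep it positive
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    resid = y - X @ beta
    n = len(y)
    # Gaussian log-likelihood of Eq. (12): sum of log N(y_i | beta^T x_i, sigma^2)
    ll = -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum(resid**2) / (2 * sigma**2)
    return -ll

x0 = np.zeros(X.shape[1] + 1)
res = minimize(neg_log_likelihood, x0, args=(X, y), method="BFGS")
beta_mle = res.x[:-1]   # matches the OLS solution up to optimiser tolerance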
It is often desirable to measure how well the regression line (described by the model) fits the data (i.e., explains the variation in the outcome response function). Goodness-of-fit statistical measures rigorously assess the quality of the sample regression function (SRF) specification, permitting us to select model designs that best estimate the true relationship between observed variables. Prominent goodness-of-fit measures, including the coefficient of determination (R-squared, R²) and adjusted R-squared, are based on the outcome of least-squares estimates.

• R² measures the proportion of total variation in the observed data explained by the model and ranges from 0 to 1. R² is computed as R² = 1 − RSS/TSS, where TSS is the total sum of squared deviations of the outcome from its mean, given by

TSS = Σ_i (y_i − (1/N) Σ_{j=1}^{N} y_j)²    (13)

and RSS is the residual sum of squares (the sum of all the squared differences between the actual data points and the corresponding estimated values), expressed as

RSS = Σ_i (y_i − ŷ_i)²    (14)

The objective is to maximise R²: the closer the value of R² is to 1, the better the regression model fits the data (when compared to the mean of the observations). R², however, does not account for the number of explanatory variables; when more explanatory variables are added to the model, the alternative adjusted R² should be used.

• Adjusted R² penalises the expansion of the feature space with non-significant explanatory variables, by adjusting for the degrees of freedom consumed by each added regressor.

Both R² and adjusted R² are not ideal for time-series predictive modelling, as they only reflect the model's precision in fitting past values, which is not indicative of future predictions. Alternative prominent goodness-of-fit measures are discussed in Appendix C.7.
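Both measures reduce to a few lines of code. A minimal illustrative sketch of Equations (13)–(14), where p denotes the number of explanatory variables:

import numpy as np

def r_squared(y, y_hat):
    tss = np.sum((y - np.mean(y)) ** 2)   # total sum of squares, Eq. (13)
    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares, Eq. (14)
    return 1 - rss / tss

def adjusted_r_squared(y, y_hat, p):
    n = len(y)
    r2 = r_squared(y, y_hat)
    # penalise additional explanatory variables via the degrees of freedom
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)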
ML algorithms are only capable of learning if the training set contains sufficient relevant variables in the feature space and a minimum of irrelevant ones. A critical aspect of a successful ML initiative is producing a good set of features on which the algorithm can be trained. This process is known as feature engineering and involves:

1. Feature Extraction – combining existing features to produce a more useful variable.

2. Feature Selection – selecting the most useful/relevant features to train the algorithm on amongst an existing set of features.
It could be stated that the defining quality of a successful ML algorithm is its ability to generalise across future unseen datasets. To ensure accuracy and robustness of a model, it is necessary to define a loss function which measures the predictive precision of our model, as a function of the generalisation error. In a regression setting, a common loss function is the Mean Squared Error (MSE); a smaller MSE means the estimate is more accurate. It is defined as:

MSE := (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²    (15)

This states that the generalisation error of a model, given a particular set of data, is the average of squared differences between the training values y_i and their associated estimates ŷ_i. This function heavily penalises estimates that differ greatly from the true observations (by squaring the differences). A small value of the loss function signifies that the errors are not substantial and, thus, the model will, in theory, perform similarly when exposed to unseen data.

It is important to note that the standard MSE value is computed only on the training dataset (the data on which the model was fitted) and is, therefore, referred to as the training MSE. This value is of little practical importance, as it is merely representative of past predictive accuracy; we are more concerned about how well the model performs given new unseen data. This is known as generalisation performance. Given a new prediction input x and a true response y, we take the expectation across all such new prediction values, giving us the test MSE:

MSE_test := E[(y − f̂(x))²]    (16)

where the expectation is taken across all unseen predictive pairs (y, x). The objective is to select the model that yields the lowest test MSE. To do this, we can increase the flexibility of the model (the degrees of freedom available to the model to fit the training dataset). In this regard, a linear model is very inflexible (it only has 2 degrees of freedom), whereas a polynomial is highly flexible (it can have many degrees of freedom).

However, if the loss function is minimised too severely, then the generalisation performance of the model can decrease substantially. This is a major known concern, related to the bias-variance trade-off, indicating that the model has been over-fit to the training data and has not learned the general form of the response function. Whilst increasing the degrees of freedom helps the model adapt to more complex datasets, it is extremely important to recognise that there is an increased likelihood of over-fitting, a clear warning that the model is more closely aligned to minor variations in the training input set (noise) than to any underlying true signal.

A technique to mitigate the effects of the bias-variance trade-off is to use a separate validation dataset (different from the training data) by randomly dividing observations into two partitions: an in-sample set (used for model training) and an out-of-sample set (used to examine the predictive estimates against the true values). However, if we do not have a separate set of reference out-of-sample data, we can adopt a technique known as cross-validation (CV). In the current study we make use of a specific implementation of CV known as k-fold cross-validation. k-fold CV divides the n observations of the data into k mutually exclusive and approximately equal-sized folds (subsets). The procedure is repeated k times, with each iteration holding out one fold as a validation set whilst the remaining k − 1 folds are used for training. This yields k estimates of the test error, and CV_k is their average:

CV_k = (1/k) Σ_{i=1}^{k} MSE_i    (17)

Balancing computational expense against accuracy, the optimal value for k has empirically been shown to be k = 5 or k = 10.
Figure 3: Schematic of 5-fold cross-validation (Murphy, 2012).
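A minimal sketch of k-fold cross-validation with scikit-learn (a library we assume here; the study's own scripts are in Appendix E), using k = 5 as discussed above and reusing X and y from the earlier snippets:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

model = LinearRegression()
# 5-fold CV; scikit-learn reports negated MSE so that larger scores are always better
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
cv_k = -scores.mean()   # CV_k of Eq. (17): the average of the k fold-wise MSEs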
Empirical Study
In this section, we define the methodology for the price impact analysis, while simultaneously introducing several key properties that are relevant to our investigation. In the earlier background and literature review section, we discussed several popular market impact models, such as the linear Kyle model and the empirical Square Root functional form. Here, we develop a further understanding of price dynamics by exploring the functional relationships between the change in price (impact) and the state of the world before and after.

Please note that the data used for our statistical analysis has been pre-processed to remove outliers. We further reconstruct original MOs that are represented as several LO executions within the LOBSTER dataset. Appendix C.3 describes the applied pre-processing procedure in detail.
In broad terms, price impact can be defined as an incremental change in price caused by the execution of market buy orders, or a similar drop in price caused by sell orders. Statistically, market impact relates to the positive correlation between the incoming MO sign and the price change that follows immediately, or some time after, MO execution (Bouchaud et al., 2018).

Therefore, in our study of price impact, we focus on the consequences of MOs. We recognise that all kinds of market events may impact prices, but we choose to concentrate on the effect of MOs because it is conceptually and operationally intuitive to study initially – i.e., it suffices to use trades and quotes data from the exchange.

In combination with earlier remarks about submitted order sizes rarely exceeding the available volume at the opposite-side best quote (LOBSTER output discussion in Section 3.2.2), we postulate the following two assumptions for our empirical analysis:

1. Market orders never exceed the available liquidity in the market.

2. The impact of arrivals and cancellations of LOs can be neglected.

These assumptions will help us understand the high-level dynamics of interactions in the marketplace without introducing unnecessary complexity to the decision-making process. With this in mind, we can begin to measure the impact of trades on prices.

Unconditional Lag-1 Impact
Bouchaud et al. (2003), in their paper “Fluctuations and response in financial markets: the subtle nature of ‘random’ price changes”, postulate that the simplest quantity that can assist in the study of price changes is the mean squared fluctuation of the price between a given trade and the execution of the next one (corresponding to the execution of MOs in the context of LOB trading). They define this degree of fluctuation D(ℓ) as:

D(ℓ) := ⟨(m_{t+ℓ} − m_t)²⟩    (18)

where m_t is the mid-price immediately before the t-th MO:

m(t) := (1/2)(a(t) + b(t))    (19)

a(t) and b(t) are the corresponding ask and bid prices. Whilst D(ℓ) measures the degree of diffusive behaviour of market prices, the authors propose a better alternative to examine specifically the effect of trading on the change in prices: the lagged response function R(ℓ). R(1) is the lag-1 response function, which measures the average difference between the mid-price just before the arrival of an original MO and the mid-price just before the arrival of the next MO:

R(1) := ⟨ε_t · (m_{t+1} − m_t)⟩_t    (20)

In contrast to the previous diffusion estimate, the response function includes the order sign ε_t to account for the direction of the MO. For instance, if a market sell order (direction −1) causes an asset price to decrease, the change in mid-price will be negative. To compensate for this, we incorporate the order sign −1 (given by ε_t). This allows the correct estimation of the correlation between an MO and the subsequent price change. Specifically, R(1) measures how much, on average, the price increases given a buy order (or how much a sell order moves the price down).

The lag-1 response is calculated as the empirical average of R(1) over all consecutive MOs. This definition can be further extended to look at the relationship of MOs beyond lag-1, or conditioned on extra variables. Initially, however, we explore the statistical properties of R(1) as defined above.

Order Splitting

The lag-1 response function measures the market reaction to a single MO. In reality, traders wishing to execute large quantities have to split their orders into child MOs to reduce the impact by matching the available liquidity at the corresponding price level (please see Appendix B.1 for more detail). To study the dynamics of such meta-orders, it is necessary to know which child orders belong to the same meta-order. Such levels of information can only be found in specialised and/or proprietary datasets. Due to such restrictions, this paper solely focuses on the impact of single independent MOs.
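Returning to Equation (20), the lag-1 response translates directly into code. The following is a simplified sketch (not the study's uncond market impact.py script itself), assuming a pandas DataFrame trades with one row per MO, a Direction column holding the MO sign ε_t (+1 buy, −1 sell) and a Midprice column holding the mid-price just before each MO:

import pandas as pd

def lag1_response(trades: pd.DataFrame) -> float:
    # m_{t+1} - m_t: change between mid-prices prevailing just before consecutive MOs
    dm = trades["Midprice"].shift(-1) - trades["Midprice"]
    # R(1) = < epsilon_t * (m_{t+1} - m_t) >_t, averaged over all consecutive MO pairs
    return (trades["Direction"] * dm).mean()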
Engineering Response Function

We calculate the lag-1 response function for each of our four stocks during the first 6 months of 2015 using the uncond market impact.py script provided in Appendix E. The general idea is demonstrated in Algorithm 1 below:
Algorithm 1: Calculate lag-1 response

Input: A DataFrame of trades and quotes entries D = {d_1, d_2, ..., d_r}, where each entry d_t contains the following information about the market event: {Time, EventTypeCode, OrderID, Volume, Price, Direction, AskPrice, AskVolume, BidPrice, BidVolume, Mid-price, Spread}

Output: ⟨s⟩, R(1), Σ_r, N_MO

⟨s⟩, R(1), Σ_r, N_MO ← 0
m_t, m_{t+1}, V(1) ← 0
for i = 0, ..., length(D) do
    if d_i[EventType] = 4 or d_i[EventType] = 5 then
        N_MO += 1
        ⟨s⟩ += d_{i−1}[AskPrice] − d_{i−1}[BidPrice]
        if m_t = 0 then
            m_t = d_i[Midprice]
        m_{t+1} = d_{i+1}[Midprice]
        R = sign · (m_{t+1} − m_t)          (sign being the MO direction ε_t)
        R(1) += R
        V(1) += (m_{t+1} − m_t)²
        m_t = m_{t+1}
⟨s⟩ = ⟨s⟩ / N_MO
R(1) = R(1) / N_MO
V(1) = V(1) / N_MO
Σ_r = sqrt(V(1) − R(1)²)
return ⟨s⟩, R(1), Σ_r, N_MO

In addition to R(1), the script estimates:

• ⟨s⟩ – the average spread just before an MO;

• Σ_R – the standard deviation of price fluctuations around the average price impact of an MO;

• N_MO – the number of MOs recorded during the day (excluding the auctions).

An MO is said to be price-changing if its size is equal to or exceeds the volume at the corresponding opposite-side best price in the LOB at the time of MO arrival; that is, the MO execution immediately changes the current price of the asset to the next best bid (ask) respectively.

In summary, the above algorithmic procedure iterates over daily market events and, whenever it finds an MO execution (corresponding to an event of type 4 or 5), it records the various statistics. Appendix C.8 explains specifically how events of type 5 – executions of hidden orders – can affect our measurement of market impact. For each trading day's set of observations we calculate daily averages, and then use them to estimate descriptive statistics of our data for the entire period.

Empirical Observations

The results of our calculations can be found in Table 6 below:

Stock    ⟨s⟩    R(1)    Σ_R    N_MO
SIRI
EBAY
TSLA
PCLN

Table 6: ⟨s⟩ average spread; R(1) the lag-1 response function for all MOs; Σ_R standard deviation of price fluctuations around the average price impact of MOs – all expressed in dollar cents; N_MO total average number of MOs per day between 10:30 and 15:00 on each trading day from the 2nd of January to the 30th of June 2015, for four large- and small-tick stocks.

The following can be noted:

• The lag-1 unconditional response R(1) is strongly positive for all stocks. This implies that, on average, MOs are followed by a change in price.

• For the small-tick stocks TSLA and PCLN, R(1) appears proportional to the average spread ⟨s⟩. This could be explained by our earlier discussion of market makers' compensation for adverse selection – MMs attempt to earn the bid-ask spread but are challenged by the adverse price movement caused by the impact of MOs.

• For large-tick stocks, the spread is bounded below by (and is often equal to) a single tick, as can be noticed for SIRI and EBAY. When the bid-ask spread is this small, neither of the best prices can improve, as the two would otherwise merge. If we recall Figure 1 from the earlier discussion of the LOB, there was a gap of 4 ticks between the best ask and bid prices. In that case, submission of a new LO at a price lower than the current ask would update the best quote, and thus the mid-price. Large-tick stocks, on the other hand, do not have such a gap between the best bid and ask; therefore, their mid-price changes are not affected by LOs in the same manner. In this regard, Bonart and Gould (2017) argue that large-tick stocks are more suited to the analysis of price queue dynamics.

• For all stocks, the degree of dispersion Σ_R exceeds the mean impact R(1), meaning that R(1) varies greatly. This is due to the fact that, as we have observed from the data, MOs can not only change the price in the direction of demand, but have also been found, at times, not to affect the LOB at all. Moreover, an MO can be followed by a change in the mid-price in the opposite direction. In this respect, R(1) incorporates the effect of the full sequence of market events that caused the asset price to change, including submissions and cancellations of LOs. As pointed out earlier, small-tick stocks' mid-prices are prone to be affected by LOs to a larger degree than those of large-tick stocks.

2015 Financial Time Series

Our choice of observational period was partially influenced by the research of Bouchaud et al. (2018). The authors use LOBSTER data for 120 stocks traded on NASDAQ to measure and predict market impact, and they conducted comparable estimations for our selected securities during 2015.

When we initially benchmarked our findings for the first six months of 2015 against those of Bouchaud et al. (2018), our response function measurements were somewhat smaller. To understand this phenomenon, we conducted the same experiments on the dataset for the second half of 2015 (to obtain an entire-year average, as estimated in Bouchaud et al. (2018)).

The results, presented in Table 7, indicate that trade activity in the second half of 2015 was very distinct from the first six months. This is in line with several public news archives, which report major financial events that took place at the start of June 2015 and followed through to June 2016.
During that period, now referred to as the “2015-16 stock market sell-off”, stock prices around the world declined in value as traders were actively selling for a number of reasons, including: the slowing growth of Chinese GDP, falling petroleum prices, the Greek debt default, a sharp rise in bond yields and the UK's decision to leave the European Union, among other factors (Randall and Gaffen, 2016).

Stock    ⟨s⟩    R(1)    Σ_R    N_MO
SIRI
EBAY
TSLA
PCLN

Table 7: ⟨s⟩ average spread; R(1) the lag-1 response function for all MOs; Σ_R standard deviation of price fluctuations around the average price impact of MOs – all expressed in dollar cents; N_MO total average number of MOs per day between 10:30 and 15:00 on each trading day from the 30th of June to the 31st of December 2015, for four large- and small-tick stocks.

The data adheres to the occurrence of such events – the average spread and price dispersion (as measured by the standard deviation) are evidently higher for the period of June to December 2015 than for the first half. Although the following year of 2016 is out of the scope of our research, we are confident a similar trend continued through it.

When we average the outcomes for the two halves of 2015, our results closely resemble those of Bouchaud et al. (2018). Table 8 contrasts our findings with those outlined in “Trades, Quotes and Prices” (Bouchaud et al., 2018). We ascribe some minor discrepancy in the reported results to the implementation approach; further discussion of this is presented in Appendix C.9.

Stock    ⟨s⟩    R(1)    Σ_R    N_MO
SIRI
EBAY
TSLA
PCLN

Table 8: In bold are the empirical observations from the work of Bouchaud et al. (2018); in plain type are the results of the current study using market data from LOBSTER (measurement units are the same as in Tables 6 and 7).
In our earlier discussion of market impact, we looked at several economic reasons that drive this phenomenon – MOs revealing private information, and MOs' mechanical consumption of available liquidity – both leading to a change in the best-quoted price (immediate or subsequent). It is, thus, intuitive to explore how the size of an MO can influence the degree of impact. Therefore, we define R(υ, 1) as a volume-dependent lag-1 impact response function and further explore its functional form.

Feature Engineering: Extraction and Selection
In this respect, our feature selection process throughout the study has been guided by economic intuition. That is, following a comprehensive study of the subject literature, we have identified features that we believe can help explain market impact, as opposed to obtaining inferred relationships between the price impact and various observed market variables using feature extraction techniques.

To arrive at a point where we can examine the impact response function, we have also engineered several features, such as the mid-price, the spread, the standard deviation of price fluctuations, the normalised trade volume and the response function itself. Whilst the former features are not going to be used directly for defining the price impact's functional form, quantities like the spread and standard deviation illustrate important qualities of our dataset, which we highlighted in the previous section.
Normalised Volume
To account for daily trends, and to be able to compare response function behaviour among stocks with different average traded volume characteristics, we normalise the MO volume υ by the average volume at the opposite-side best quote, V̄_best. The mean volume used for normalisation is recalculated each day, ensuring the robustness of the relative scale.

Descriptive Statistics

For this exercise, we recorded the normalised volume of each MO and its corresponding lag-1 response function R(1). As part of the data pre-processing, we removed outliers and aggregated the average response per observed value of normalised volume. The Jupyter Notebook (a document that illustrates the code and its execution) Conditioning on Trade Volume in Appendix E describes the detailed procedure.

An initial analysis of the distribution of normalised volume showed that the majority of trades for both large-tick EBAY and small-tick PCLN have a relatively small normalised volume (example in Figure 4). This means we have many more observations for smaller trade volumes than for trades of a larger size. To regularise the data, we ordered the series by normalised volume and then split the dataset in two: trades with normalised volume less than 0.1, and the rest of the observations. We then subsampled each partition at different frequencies.
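A minimal pandas sketch of the normalisation and partitioning just described, assuming hypothetical columns Date, Volume and BestOppVolume (the volume at the opposite-side best quote) in a DataFrame of MOs:

# Average opposite-side best-quote volume, recomputed for each trading day
daily_mean = trades.groupby("Date")["BestOppVolume"].transform("mean")
trades["NormVolume"] = trades["Volume"] / daily_mean

# Partition as described above: small (< 0.1) versus larger normalised volumes
small = trades[trades["NormVolume"] < 0.1]
large = trades[trades["NormVolume"] >= 0.1]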
Subsampling
The LOBSTER data represents the evolution of the LOB throughout each trading day sampled by the arrival of new events, as opposed to sampling at regular time intervals. In our study, we are similarly interested in the series of events as they arrive at the exchange, and more specifically in MO executions.

Figure 4: Normalised volume distribution for PCLN. The yellow line represents the mean normalised volume.

In our initial analysis of market data, we conducted several data subsampling experiments where we regularised the time series by sampling at regular intervals in several feature spaces. The main purpose of subsampling is to transform a series of observations that arrive at irregular intervals (such as LOBSTER data) into a homogeneous series. Such a form is akin to a tabular representation and is often referred to as bars.
In the Jupyter Notebook Create Bars (see Appendix E), several standard bars are created, including time, tick, volume and dollar bars. Volume bars, for example, sample the data every time a certain threshold (number of shares) of the security has been exchanged. Similarly, the other types of bars sample daily market data at regular intervals in the tick, time and dollar dimensions.

To estimate the relationship between the response function and trade volume, however, we examine every market order – its corresponding normalised volume and lag-1 response. Therefore, a slightly different kind of regularisation applies in this case: for trades with smaller trade volume we sample the data at every 0.01 increment in the (ordered) series of normalised volume; for larger-volume trades we sample the data at every 0.1 increment in observed normalised volume. This approach to regularisation allows us to normalise the dataset according to the density of observations. The exact procedure is demonstrated in the Jupyter Notebook Conditioning on Trade Volume, which shows how we visualised the lag-1 response function conditioned on normalised trade volume (we collected the data during execution of uncond market impact.py – available in Appendix E).
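As an illustration of the volume-bar idea mentioned above, the following sketch samples the event stream every time a cumulative share threshold is crossed; the threshold value and column names are illustrative assumptions, not the notebook's exact settings.

def volume_bars(trades, threshold=10_000):
    # Assign each trade to a bar: a new bar starts whenever cumulative
    # traded volume crosses another multiple of the threshold
    bar_id = (trades["Volume"].cumsum() // threshold).astype(int)
    return trades.groupby(bar_id).agg(
        open=("Price", "first"), close=("Price", "last"), volume=("Volume", "sum")
    )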
Lag-1 Response Function Conditioned on Normalised Trade Volume

From our earlier discussions, it was intuitive to assume that larger MOs would affect the price more than smaller MOs. However, the empirical results in Figure 5 demonstrate that the effect of trade volume on the price is very faint: the small-tick stock PCLN exhibits an almost constant monotonic relationship between normalised trade volume and the price impact (as measured by R(υ, 1)).

Figure 5: Lag-1 response function conditioned on normalised trade volume for PCLN (left) and EBAY (right)

From this visual representation of lag-1 impact conditioned on trade volume, we note that the response function appears to be concave, and even almost constant for small-tick stocks such as PCLN. It would be necessary, however, to compare observations for a larger number of securities to confirm the robustness of such findings. We refer to the works of Bouchaud et al. (2018) and Zarinelli et al. (2015), who obtained similar results. The former attribute the concavity to a conditioning bias called selective liquidity taking. Appendix B.1 elaborates further on this phenomenon.
An alternative approach to measuring the impact of trades on the price is to observe market impact not at the trade-by-trade level, as we were doing before, but instead at the aggregated order-flow scale. In other words, in contrast to looking at the individual impact of trades, we aggregate multiple MOs' signed volumes into a single order-flow imbalance and measure how the price has changed before and after. Recalling ε as the order sign (positive meaning a buy MO, negative a sell MO) and υ as the volume of an individual MO, we denote the order-flow imbalance ∆V as:

∆V = Σ_{n ∈ [t, t+T)} ε_n υ_n    (21)

where T > 1 is the number of consecutive MOs over which the signed volumes are aggregated. In this regard, ∆V indicates the proportions of buy and sell orders that arrive during a given period (as measured in the number of MOs observed). A positive ∆V means there were more buy orders (in volume) during the specified period (buy orders have positive sign and volume, causing ∆V > 0), while a negative ∆V implies more recent sell orders, hence the price is expected to decline. We further study the aggregate impact conditioned on this volume imbalance, defined as:

R(∆V, T) := E[m_{t+T} − m_t | ∆V]    (22)

R(∆V, T) measures the change in the mid-price between immediately before the original MO and after a sequence of T MOs (m_{t+T} being the mid-price immediately before the next MO after T MOs).
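Equations (21)–(22) translate directly into code. A simplified sketch under our own choice of non-overlapping windows, again assuming a DataFrame of MOs with Direction, Volume and Midprice columns and a plain 0..N-1 integer index:

def aggregate_impact(trades, T=10):
    signed = trades["Direction"] * trades["Volume"]
    # Order-flow imbalance Delta V over non-overlapping windows of T market orders
    window = trades.index // T
    dV = signed.groupby(window).sum()
    # Mid-price change from just before the first MO of a window
    # to just before the first MO of the next window
    m_start = trades["Midprice"].groupby(window).first()
    dm = m_start.shift(-1) - m_start
    # Drop the final window, which has no "next" mid-price
    return dV[dm.notna()], dm.dropna()   # pairs (Delta V, impact) to be binned/averaged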
Aggregate Response Function Conditioned on Trade-Flow Imbalance
We choose to investigate the relationship between aggregate impact (measured in dollar cents) and the order-flow imbalance (measured in shares) over T = 5, 10, 20 and 50 for the TSLA stock. Our approach is demonstrated in the Jupyter Notebook Conditioning on Order-Flow Imbalance in Appendix E.

In this part of the study we did not normalise the aggregate impact or the imbalance (by the corresponding average daily statistics, as was done in the previous section when conditioning on trade volume); therefore, each experiment's results are presented in an individual subplot. The main purpose of this exercise was to visually understand the functional form of the relationship between order-flow imbalance and the aggregated price response function.

Figure 6: Aggregate impact (in dollar cents) of order-flow imbalance (in shares) over T = 5 (top left), T = 10 (top right), T = 20 (bottom left) and T = 50 (bottom right) for the TSLA stock

The following interesting properties can be noticed from Figure 6:

• The imbalance amplitude increases with T – as we aggregate the effect of more trades, the order-flow imbalance increases in absolute value as well. For instance, when observing the impact of T = 5 MOs, the imbalance ranges from around −1500 to 1500 shares. However, once T is increased to 10, the imbalance scope expands, reaching 6000 shares in absolute value. Interestingly, the functional form of the relationship seems the same for all T, despite the different scales.

• For smaller ∆V, the relationship with R(∆V, T) appears to be linear for all observed T = 5, 10, 20, 50.

• With an increase in |∆V| (the absolute value of order-flow imbalance), the relationship takes on a more concave (rather than linear) form, similar to our previous analysis of the lag-1 response function conditioned on normalised volume. Again, this may be attributed to the selective liquidity bias, as described in Appendix B.1.

In the previous section we stated that the dependency between order-flow imbalance and the corresponding aggregate price impact appears linear for smaller absolute values of ∆V. In this segment, we conduct statistical experiments, as outlined in the research methodology section, to quantitatively infer the functional form of market impact conditional on order-flow imbalance. Our approach is illustrated in the Jupyter Notebook Market Impact Functional Form in Appendix E.

Functional Form of Market Impact

Descriptive Statistics
We use the data collected for the TSLA stock over the entire year of 2015, with the order-flow imbalance observed over T = 10 MOs, giving a total of 47,473 observations of ∆V and R(∆V, T). We remove the outliers beyond 3 standard deviations and subsample the data to reduce the noise. Visually, our dataset is described by Figure 7. We also illustrate the distributions of R(∆V, T) – the aggregate impact – and ∆V – the order-flow imbalance – in Figure 8.

Figure 7: Aggregate impact (in dollar cents) conditioned on order-flow imbalance (in shares) for the TSLA stock during 2015, after data pre-processing

Figure 8: Aggregate impact (in dollar cents) and order-flow imbalance (in shares) distributions for TSLA during 2015. Red lines represent means; for the aggregate response, the two lines are the positive and negative means.

These are in accordance with the previous illustration – the imbalance distribution is uniform, meaning that an asymmetry (order-flow imbalance) of any size in either direction of trade is equally likely. The aggregate impact, on the other hand, seems to have more observations around a specific absolute value of about 10 dollar cents. This concentration of aggregated price impact can, again, be explained by the selective liquidity bias: larger orders (and therefore order-flow imbalances) only consume the liquidity available at the best opposite-side quote; hence, the impact does not increase beyond a certain margin (it is concave). We further explore the relationship between the two observed variables and calculate their correlation coefficient ρ.
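The outlier removal step mentioned above (dropping observations beyond three standard deviations) can be sketched as follows; the function and column names are hypothetical, not the notebook's exact code.

import numpy as np

def drop_outliers(df, cols, k=3.0):
    # Keep rows whose values lie within k standard deviations of the column mean
    mask = np.ones(len(df), dtype=bool)
    for c in cols:
        z = (df[c] - df[c].mean()) / df[c].std()
        mask &= z.abs() <= k
    return df[mask]

clean = drop_outliers(data, ["dV", "R"])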
Supervised Linear Regression

We first estimate the functional relationship between ∆V and R(∆V, T) using a Linear Regression model, which employs the OLS approach. We split our dataset randomly into two partitions: the training data, used to fit the model, and the test data, reserved to verify the predictive accuracy of our model on unseen data. To quantify this accuracy, the Mean Squared Error (MSE) is calculated to assess how the values estimated by the regression model differ from the true observations in the test dataset.

In the case of Linear Regression, the MSE is 3.86; that is, the average squared deviation between the predicted and true aggregated price impact is 3.86 (in squared dollar cents). We graphically illustrate the predictive linear model and the true observations in Figure 9.
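A minimal scikit-learn sketch of this step, reusing the dV and dm series produced by the earlier aggregation sketch; the 80/20 split ratio and random seed are illustrative, not the study's exact settings.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# X: order-flow imbalance (reshaped to a column), y: aggregate impact
X_train, X_test, y_train, y_test = train_test_split(
    dV.values.reshape(-1, 1), dm.values, test_size=0.2, random_state=42
)
lin = LinearRegression().fit(X_train, y_train)
mse_lin = mean_squared_error(y_test, lin.predict(X_test))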
Decision Tree

As an alternative to Linear Regression, we estimate a model using Decision Tree Regression – a supervised learning technique that predicts values of the regressand by learning decision rules derived from the observations. In contrast to Linear Regression, the Decision Tree model is not linear in its parameters; it is fitted by recursively partitioning the feature space so as to minimise the squared error within each partition. Similarly, we train our model on one partition of the data and test it on another, unseen, dataset. The MSE for the Decision Tree model is 2.5 – a better estimate than that of the Linear Regression model. A visual representation of the predicted and observed values can be seen in Figure 9.
Figure 9: Linear Regression model and true observations of market impact for TSLA (left); Decision Tree model and true observations of market impact for TSLA (right) (the Decision Tree Regression panel displays only the testing observations).
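The decision-tree counterpart in scikit-learn, a sketch under the same assumed split; capping the tree depth is our own regularisation choice to reduce over-fitting, not necessarily the study's setting.

from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(max_depth=5, random_state=42).fit(X_train, y_train)
mse_tree = mean_squared_error(y_test, tree.predict(X_test))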
Cross-Validation: Model Selection

Splitting the dataset into two partitions – one for training and one for testing – is a naïve approach to testing predictive accuracy. Thus, we employ cross-validation to determine which model produces the better estimates. Cross-validation is conducted using a helper function that splits the data into training and test sets, fits the model, and calculates the scores (MSE and R²) for each fold.
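The comparison can be sketched with scikit-learn's cross-validation helper, reusing the two models and the full imbalance/impact arrays from the previous snippets; this is an illustrative reconstruction, not the notebook's exact code.

from sklearn.model_selection import cross_val_score

X_all, y_all = dV.values.reshape(-1, 1), dm.values
for name, model in [("Linear", lin), ("Tree", tree)]:
    mse = -cross_val_score(model, X_all, y_all, cv=5, scoring="neg_mean_squared_error").mean()
    r2 = cross_val_score(model, X_all, y_all, cv=5, scoring="r2").mean()
    print(f"{name}: CV MSE = {mse:.2f}, CV R2 = {r2:.2f}")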
Figure 10: MSE results (left), with the average MSE for Linear Regression in blue and the average MSE for Decision Tree Regression in green; and R² results (right) for Linear Regression and Decision Tree Regression.

From the two figures it is evident that the Decision Tree Regression displayed better predictive performance: its average MSE is slightly lower than the MSE of the Linear Regression model, and its coefficient R² is higher.

The functional linearity between price impact and order-flow imbalance is an empirical illustration of Kyle's model, which was introduced in the earlier Literature Review chapter. As we recall, Kyle stated that the price adjustment must be linear in the total signed volume, expressed as:

∆p = λεV    (23)

Therefore, the slope of the linear relationship that we have observed for smaller volume imbalances is what is known as Kyle's lambda.
Model Estimation

In order to estimate the slope of this linear region of R(∆V, T), we train the model specifically on the subsection of our observations where the relationship appears linear (as illustrated in Figure 11).

Figure 11: Subsection of observations that demonstrates linearity between order-flow imbalance and price impact for TSLA in 2015.

Again, we use Linear Regression and Decision Tree Regression to fit the model and cross-validate it. The models and the corresponding statistics about prediction accuracy (MSE and R²) are illustrated in Figure 12.

Figure 12: Top: models of Linear and Decision Tree Regressions for predicting market impact conditioned on order-flow imbalance (the Decision Tree Regression panel displays only the testing observations); bottom: measures of goodness of fit for the two models (left: green line – average MSE for the Decision Tree, blue line – average MSE for the Linear Regression).

In this instance, both models perform well according to R²; the Decision Tree, however, again achieved a lower average MSE, by almost 0.50 dollar cents. This supports our earlier remark about the potential for more accurate ML predictive models.
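Restricting the fit to the (approximately) linear region and reading off the slope gives an estimate of Kyle's λ. A sketch continuing the snippets above, in which the ±500-share cut-off is a hypothetical illustration of how the linear subsection could be selected:

import numpy as np
from sklearn.linear_model import LinearRegression

linear_region = np.abs(dV) < 500          # assumed cut-off for the linear regime
X_lin = dV[linear_region].values.reshape(-1, 1)
y_lin = dm[linear_region].values

kyle = LinearRegression().fit(X_lin, y_lin)
kyle_lambda = kyle.coef_[0]    # slope of Delta p = lambda * epsilon * V, Eq. (23)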
Measure of Market Liquidity

The slope of our linear model, Kyle's λ, was estimated as 0.011. This is regarded as a measure of the (il)liquidity of a market. For TSLA, we can postulate that the market is quite liquid, since the value of λ is low. However, for more robust conclusions it would be necessary to compare measurements of λ across multiple stocks and venues.

Conclusion and Future Work
The focus of this research was to find a functional form of market impact using parametric machine learning algorithms. We began with an overview of financial market microstructure and the role of information (and, more importantly, information asymmetry) in the price formation (discovery) process. We then looked at a number of models of information efficiency in the market, and at how agents' interactions influence trading. Naturally, we arrived at the definition of market impact – the focal point of this study.

Subsequently, we used the trades and quotes dataset from the NASDAQ electronic exchange to measure the unconditional market impact, the price impact conditioned on normalised trade volume, and the aggregate price response as a function of order-flow imbalance. We learnt that the unconditional price impact of market orders is proportional to the average spread of the stock. Moreover, we observed little functional dependency between the price response function and normalised trade volume, which we partially attributed to the selective liquidity bias. In our analysis of the aggregate price impact conditioned on the order-flow imbalance, we discerned a sublinear relationship between these two observed variables.

Consequently, we employed Linear Regression and Decision Tree Regression models to quantitatively measure the functional form of this relationship. We used the two models to predict values of the aggregated price impact of order-flow imbalance after a number of consecutive market orders. Decision Tree Regression, a machine learning algorithm that is not linear in its parameters, illustrated better predictive accuracy than traditional Linear Regression during cross-validation.
We identify a number of ways in which our study can be further developed. The list below provides a summary of our suggestions; these are explained further in Appendix D.

• To explore the statistical and practical application of ML models in the estimation of market impact, it would be necessary to conduct a comparison with the performance of the current standard (for academics and practitioners) measures of price impact (i.e., Volume Weighted Average Price, VWAP), as well as with other theoretical and empirical formulas of market impact.

• Alternative data sources can be used for the study of the impact of meta-orders (see Appendix D.1.1).

• It would be of acute importance to study liquidity dynamics and the impact of trades in illiquid markets – for example, in stressed market conditions (see Appendix D.1.2).

• Furthermore, the study can be expanded to other financial products (not just equities) (see Appendix D.1.3).

• An accurate measure of liquidity (and thus market impact) facilitates the understanding of optimal execution, further contributing to the study of optimal trading strategies (see Appendix D.1.4).

• The experimental part of this research has been computationally intensive – it would be beneficial to adopt high-performance computational resources when analysing high-frequency data (see Appendix D.1.5).

References
Acharya, Viral (2006). “Liquidity Risk: Causes, Consequences and Implications for Risk Management”. In: Economic and Political Weekly 41 (6), pp. 460–463.

Almgren, Robert F. (2003). “Optimal execution with nonlinear impact functions and trading-enhanced risk”. In: Applied Mathematical Finance 10 (1), pp. 1–18.

Almgren, Robert and Neil Chriss (1999). “Value Under Liquidation”. In: Risk.

— (2000). “Optimal Execution of Portfolio Transactions”. In: Journal of Risk 3 (2), pp. 5–39.

Avellaneda, Marco and Sasha F. Stoikov (2008). “High Frequency Trading in a Limit Order Book”. In: Quantitative Finance 8 (3), pp. 217–224.

Bloomberg. Liquidity Assessment (LQA). url: https://data.bloomberglp.com/professional/sites/10/LQA-Fact-Sheet-1.pdf.

Bonart, Julius and Martin Gould (2017). “Latency and liquidity provision in a limit order book”. In: Quantitative Finance 17 (10), pp. 1601–1616.

Bouchaud, Jean-Philippe (2009). “Price Impact”. In: Encyclopedia of Quantitative Finance. arXiv:0903.2428 [q-fin.TR].

Bouchaud, Jean-Philippe, Julius Bonart, et al. (2018). Trades, Quotes and Prices: Financial Markets Under the Microscope. Cambridge University Press.

Bouchaud, Jean-Philippe, J. Doyne Farmer, and Fabrizio Lillo (2009). “How markets slowly digest changes in supply and demand”. In: Handbook of Financial Markets: Dynamics and Evolution, pp. 57–160.

Bouchaud, Jean-Philippe, Yuval Gefen, et al. (2003). “Fluctuations and response in financial markets: the subtle nature of ‘random’ price changes”. In: Quantitative Finance.

Brunnermeier, Markus K. and Lasse Heje Pedersen (2009). “Market Liquidity and Funding Liquidity”. In: Review of Financial Studies 22 (6), pp. 2201–2238.

Cartea, Álvaro, Sebastian Jaimungal, and José Penalva (2015). Algorithmic and High-Frequency Trading. Cambridge: Cambridge University Press.

Cont, Rama, Arseniy Kukanov, and Sasha Stoikov (2010). “The Price Impact of Order Book Events”. In: Journal of Financial Econometrics, pp. 1–26.

Cont, Rama, Sasha Stoikov, and Rishi Talreja (2010). “A Stochastic Model for Order Book Dynamics”. In: Operations Research 58 (3), pp. 549–563.

Day, Sebastian (2017). Risk.net.

Donier, Jonathan, Julius Bonart, Iacopo Mastromatteo, and Jean-Philippe Bouchaud (2015). “A fully consistent, minimal model for non-linear market impact”. In: Quantitative Finance 15 (7), pp. 1109–1121.

Eisler, Zoltán, Jean-Philippe Bouchaud, and Julien Kockelkoren (2012). “The price impact of order book events: market orders, limit orders and cancellations”. In: Quantitative Finance 12 (9), pp. 1395–1419.

Farmer, J. Doyne, Paolo Patelli, and Ilija I. Zovko (2005). “The predictive power of zero intelligence in financial markets”. In: Proceedings of the National Academy of Sciences of the United States of America 102 (6), pp. 2254–2259.

Finnerty, Joseph E. (1976). “Insiders and Market Efficiency”. In: The Journal of Finance 31 (4), pp. 1141–1148.

Garman, Mark (1976). “Market Microstructure”. In: Journal of Financial Economics 3, pp. 257–275.

Glosten, Lawrence R. and Paul R. Milgrom (1985). “Bid, ask and transaction prices in a specialist market with heterogeneously informed traders”. In: Journal of Financial Economics 14 (1), pp. 71–100.

Gould, Martin D. et al. (2013). “Limit order books”. In: Quantitative Finance.

Grossman, Sanford J. (1976). “On the Efficiency of Competitive Stock Markets Where Trades Have Diverse Information”. In: The Journal of Finance 31 (2), pp. 573–585.

Guéant, Olivier (2016). The Financial Mathematics of Market Liquidity: From Optimal Execution to Market Making. New York: Chapman and Hall/CRC.

Guéant, Olivier, Charles-Albert Lehalle, and Joaquin Fernandez-Tapia (2013). “Dealing with the Inventory Risk: A solution to the market making problem”. In: Mathematics and Financial Economics.

Harris, Larry (2003). Trading and Exchanges: Market Microstructure for Practitioners. New York: Oxford University Press.

Hasbrouck, Joel (2007). Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading. Oxford University Press.

Hautsch, Nikolaus and Ruihong Huang (2011). Limit Order Flow, Market Impact and Optimal Order Sizes: Evidence from NASDAQ TotalView-ITCH Data.

Huang, Ruihong and Tomas Polak (2011). LOBSTER: Limit Order Book Reconstruction System. url: http://dx.doi.org/10.2139/ssrn.1977207.

Huang, Weibing, Charles-Albert Lehalle, and Mathieu Rosenbaum (2015). “Simulating and Analyzing Order Book Data: The Queue-Reactive Model”. In: Journal of the American Statistical Association 110 (509), pp. 107–112.

Jansen, Stefan (2018). Hands-On Machine Learning for Algorithmic Trading. Packt Publishing.

Kyle, Albert S. (1985). “Continuous auctions and insider trading”. In: Econometrica 53 (6), pp. 1315–1335.

— (1989). “Informed Speculation with Imperfect Competition”. In: The Review of Economic Studies 56 (3), pp. 317–355.

Kyle, Albert S. and Anna A. Obizhaeva (2016). “Market Microstructure Invariance: Empirical Hypotheses”. In: Econometrica 84 (4), pp. 1345–1404.

— (2018). “The Market Impact Puzzle”. In: SSRN Electronic Journal, pp. 1–16.

Lehalle, Charles-Albert and Sophie Laruelle (2018). Market Microstructure in Practice. World Scientific Publishing.

Loeb, Thomas F. (1983). “Trading Cost: The Critical Link between Investment Information and Results”. In: Financial Analysts Journal 39 (3), pp. 39–44.

Mastromatteo, Iacopo, Bence Tóth, and Jean-Philippe Bouchaud (2014). “Agent-based models for latent liquidity and concave price impact”. In: Physical Review E 89 (4), p. 042805.

McMillan, Michael et al. (2011). Investments: Principles of Portfolio and Equity Analysis. Wiley.

Mitchell, Thomas M. (1997). Machine Learning. New York: McGraw-Hill, Inc.

Moro, Esteban et al. (2009). “Market impact and trading profile of large trading orders in stock markets”. arXiv:0908.0202 [q-fin.TR].

Murphy, David (2008). Understanding Risk: The Theory and Practice of Financial Risk Management. New York: Chapman and Hall/CRC.

Murphy, Kevin P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.

Obizhaeva, Anna A. and Jiang Wang (2013). “Optimal trading strategy and supply/demand dynamics”. In: Journal of Financial Markets 16 (1), pp. 1–32.

Park, Saerom, Jaewook Lee, and Youngdoo Son (2016). “Predicting Market Impact Costs Using Nonparametric Machine Learning Models”. In: PLoS ONE 11 (2), e0150243. DOI: 10.1371/journal.pone.0150243.

Randall, David and David Gaffen (2016). What's behind the global stock market selloff?

Smith, Eric et al. (2003). “Statistical theory of the continuous double auction”. In: Quantitative Finance 3 (6), pp. 481–514.

Torre, Nicolo (1997). Market Impact Model Handbook. Berkeley: BARRA Inc.

Weber, Philipp and Bernd Rosenow (2005). “Order book approach to price impact”. In: Quantitative Finance 5 (4), pp. 357–364.

Zarinelli, Elia et al. (2015). “Beyond the square root: Evidence for logarithmic dependence of market impact on size and participation rate”. In: Market Microstructure and Liquidity 1 (2).

Appendices

Appendix A
A.1 Market and Funding Liquidity
As outlined by Brunnermeier and Pedersen (2009), the concepts of market and funding liquidity are mutually reinforcing. Traders provide market liquidity, and their ability to do so depends on the availability of funding. Conversely, traders' funding needs, such as margin and capital requirements (i.e., the collateral needed to enter into a trade), depend on the market liquidity of the asset. More precisely, when funding liquidity becomes scarce, traders become reluctant to take on capital-intensive positions, which lowers market liquidity and increases margin and capital requirements. The increased risk, and thus the cost of financing a trade (the margin or haircut), is in turn dependent on efficient streams of market liquidity.

Appendix B
B.1 Selective Liquidity Taking Under Market Fragmentation
An important consequence of information asymmetry, crucial to understanding how markets operate, is that even highly liquid markets are in fact not all that liquid (Bouchaud et al., 2018). That is, to minimise the risk of adverse selection, the MMs providing liquidity operate in a regime of small revealed liquidity and large latent liquidity. Consequently, the available liquidity at any given moment only reflects a small fraction of the potentially huge underlying supply and demand. This is often captured by general equilibrium models, such as Kyle (1985) and Mastromatteo, Tóth and Bouchaud (2014), which are based on interactions between rational agents who take optimal decisions.

The scarcity of liquidity has an immediate and important consequence: large trades, or meta-orders, must be fragmented into smaller child orders that can be slowly digested by the markets. Empirical observations such as Bouchaud (2009) suggest that order splitting is a very common practice in a wide range of markets. In recent years, it has become common to trade assets on several different electronic platforms simultaneously. This increased fragmentation means it is necessary to split an order not just through time, but also through space – to selectively consume liquidity (submit trades no larger than the available liquidity at the corresponding price) across all available venues (Lehalle and Laruelle, 2018).

In the earlier discussions of the functional form of the relationship between the response function and normalised volume, as well as of the dependency of aggregate impact on order-flow imbalance, we attributed the concavity to selective liquidity taking. This can be partially explained by the fact that larger orders arrive only when there is a large volume available at the opposite-side best quote. In other words, larger-volume trades do not cause a bigger impact because they only consume the available liquidity and no more. Therefore, the price impact remains monotonic (concave) across trades of different sizes.

The study of meta-orders and hidden orders (hidden in the sense that the true order size is not made public, to minimise information leakage) yields surprising empirical findings for the functional form of market impact. However, the existing empirical literature on the subject is limited, with Torre (1997), Almgren (2003) and Moro et al. (2009) being among the few publications in this area. This is attributed to the difficulty of obtaining access to data that could facilitate the statistical reconstruction of transactions against trading-account member codes, which would otherwise permit us to identify the parent meta-order to which a child order belongs. The limitation of our study stems from this lack of proprietary datasets that clearly identify the source of large orders. For this reason, latent liquidity and the impact of meta-orders are beyond the scope of this project.

Figure 13: Latent order book in the presence of a meta-order, with bid orders (blue boxes) and ask orders (red boxes) sitting on opposite sides of the price line and subject to a stochastic evolution (Donier et al., 2015).
B.2 Limit Orderbooks and Their Models
The LOB keeps a record of order-flow events – the execution of trades (aggressive market orders) and the provision of liquidity (passive limit orders). These orders are managed by a matching engine in accordance with some well-defined algorithm (i.e., price-time priority or pro-rata rules). The unusually rich, detailed and high-quality historic LOB data facilitates the analysis of complex global phenomena that emerge as a result of the local interactions between many heterogeneous agents.

The earliest models of the LOB generally depicted the evolution of order flow according to simple stochastic processes with set parameters. Smith et al. (2003) introduced a model in which limit order arrivals, as well as the counter-executed market orders and cancellations, evolve as mutually independent Poisson processes with fixed-rate parameters. The model has since been extended by Cont, Stoikov and Talreja (2010), who account for varying cancellation and limit order rates as a function of fluctuating prices.

Though these so-called zero intelligence models (models in which order flows are assumed to be governed by specified stochastic processes) of LOBs perform relatively well at depicting some long-run statistical properties of genuine LOBs (see, e.g., Farmer, Patelli and Zovko (2005)), they are largely hindered by their exclusion of strategic considerations, whereby liquidity providers might adapt their flows in response to the actions of others. This constraint was recently alleviated in the work of Huang, Lehalle and Rosenbaum (2015). By incorporating agents' strategic behaviour, they provide a more realistic picture of empirically observed facts about the price discovery and price formation processes, as well as of the general market quality.

Appendix C
C.1 The NASDAQ
NASDAQ stands for “National Association of Securities Dealers Automated Quotation” and was originally founded in 1971 in New York, USA. The initial idea behind the NASDAQ stock exchange was to give dealers the ability to post quotes electronically, thus making the process of selling smaller stocks (previously sold OTC) more efficient. This electrification of the trading process proved hugely successful. Today NASDAQ is the second-largest electronic exchange in the world by market capitalisation – currently exceeding 10 trillion US dollars – and the first by overall trading volume.

Unlike other major exchanges such as the NYSE, NASDAQ does not have a physical location – it is an entirely digital marketplace. NASDAQ's normal trading hours are between 9:30 am and 4:00 pm Eastern Time on business days. Typically, NASDAQ operates around 253 days a year.
C.2 Orderbook Reconstruction
The order book dynamics are simulated by continuously updating the known information about its state. For example, given the previous day's state of the LOB for a stock, each new TotalView-ITCH event message changes the state of the book according to whether the message was an addition, update, cancellation or execution of a limit order. Figure 14 illustrates an incoming message “A” to add a limit order. In response, a new order is added to the pool and the LOB is updated at the corresponding price level. In contrast, when an update message arrives – for example, a cancellation “X” or a deletion “D” – the original order is found in the pool using its order ID and updated accordingly.
Figure 14: LOB reconstruction from the NASDAQ data feed (Huang and Polak, 2011)

For each LOBSTER data request, the LOB is reconstructed in full from the exchange message data for the stock during the specified period. By choosing an appropriate level of depth, only the number of levels requested is saved to the output file.

LOBSTER data has been widely popular among academics in recent years. We find the studies of Bouchaud et al. (2018), Cartea, Jaimungal and Penalva (2015), Bonart and Gould (2017) and others using LOBSTER data for microstructure research. The pre-processed output LOB files are available to download via the Internet, saving a lot of time and effort and thus enabling academics to focus on the economic study.
C.3 Data Preprocessing
Python Data Analysis
Once the data is downloaded from LOBSTER in CSV format, we define and deploy a Python script to analyse and clean it. Since each record in a LOBSTER message file corresponds to exactly one record in the orderbook file (and both files are ordered), we merge the data from the messages and orderbook files into one Python DataFrame object, so as to be able to manipulate and analyse the data more efficiently. A DataFrame is a two-dimensional tabular data structure with labelled rows and columns that can store data of various types – integer numbers, floating-point numbers, string characters and others. Throughout our empirical study we operate on the market data using Python, and more specifically the Python Pandas library, which contains an extensive number of data science methods and functions, implemented and freely available for use by academics and practitioners. Python has become very popular in recent years, and is increasingly recognised as the de facto data science language, because of its easy-to-read (and thus easy-to-use) syntax and multi-paradigm nature.
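A sketch of the merge described above, assuming the standard headerless LOBSTER message/orderbook column layout for a level-1 request; the file names are placeholders:

import pandas as pd

msg_cols = ["Time", "EventType", "OrderID", "Size", "Price", "Direction"]
book_cols = ["AskPrice", "AskVolume", "BidPrice", "BidVolume"]

messages = pd.read_csv("TSLA_message_1.csv", names=msg_cols, header=None)
orderbook = pd.read_csv("TSLA_orderbook_1.csv", names=book_cols, header=None)

# Row i of the message file describes the event producing row i of the book file,
# so a positional (index-aligned) concatenation merges the two
data = pd.concat([messages, orderbook], axis=1)
data["Midprice"] = (data["AskPrice"] + data["BidPrice"]) / 2
data["Spread"] = data["AskPrice"] - data["BidPrice"]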
Inspection and Cleaning
To begin with, we investigate the properties of typical daily data. Our researchobjective is to define how various environment features, such as trade volume,influence the parameter of interest – market impact. To achieve this, we estimatethe daily impact of MOs, and then calculate an average price impact for the total6 months set of observations (amounting to 248 trading days).When looking at the average daily data, we discover that a large portion ofevents during trading hours constitute submission and cancellation of LOs (eventsof type 1 and 3 respectively). These findings, illustrated in Figure 15, are alignedwith previous microstructure research (see Bouchaud et al. (2018)). Therefore,given our particular interest in how executions of MOs influence the price, onlya small fraction of a daily dataset is relevant for the purposes of our quantitativeanalysis. To compensate for this, the study examines daily dynamics throughouta sufficiently long period of 6 months. 100 igure 15:
Next, when we examine the average trade volume for TSLA, some very unusual values are evident (Figure 16). Most electronic markets are known to operate during set time intervals throughout the day, with the beginning and end of the day more generally reserved for auctions that result in atypical trade activity. This might explain our findings of trade-volume outliers.
Figure 16: Box plot of trade volumes during the full trading day for TSLA stock.
For each trading day, we remove the market activity of the first and the last hour, i.e. we only use data for 10:30–15:00. This eliminates some of the inconsistency noticed in the previous examination of the trade-volume range, as per Figure 17.
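A minimal sketch of this session filter, relying on LOBSTER timestamps being expressed as seconds after midnight:

```python
# Restrict each day to 10:30-15:00; LOBSTER timestamps are seconds
# after midnight, so a simple numeric mask suffices.
start, end = 10.5 * 3600, 15 * 3600
df_core = df[(df["time"] >= start) & (df["time"] < end)]
```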
Figure 17: Box plot of trade volumes during 10:30–15:00 for TSLA stock.
At this point, it is important to consider that whenever a single MO matches several LOs in the book, each LO execution is listed as a separate record in both the LOBSTER messages and orderbook files, albeit with the same timestamp. An example is illustrated in Table 9, where three consecutive executions of visible LOs (event type 4) have the same timestamp and therefore represent a single MO being matched with several LOs at that price. Interestingly, the execution of the last LO at the best ask price consumes all available liquidity at that level (33 shares), thus moving the ask price to the next best level – from 2058500 to 2058800. The next arriving MO to buy is most likely to be executed at this new price. This is an example of market impact; therefore, in our study, it is necessary to recognise that such multiple executions with the same timestamp correspond to a single MO, and that it is the aggregated MO size that may have caused the price change, not just the latest matched LO that absorbed liquidity at the best quote.
Table 9: Separate LO executions representing the same MO at the same timestamp (columns: Timestamp, Event Type, Order ID, Size, Direction, Ask Price, Ask Volume).
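A hedged sketch of this aggregation step follows, assuming the filtered `df_core` frame from above; grouping on the raw timestamp is illustrative only, since in principle two distinct MOs could share a timestamp.

```python
# Collapse consecutive type-4 records sharing a timestamp into one MO,
# attributing the impact to the aggregated size rather than the last fill.
execs = df_core[df_core["event_type"] == 4]
mos = (execs.groupby(["time", "direction"], as_index=False)
            .agg(mo_size=("size", "sum"),
                 last_price=("price", "last"),
                 n_fills=("size", "count")))
```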
C.4 Machine Learning
Unsupervised Learning
Unsupervised learning describes a set of algorithms that do not make use of labelled responses associated with the data. They instead exploit the underlying structure of the data to draw inference. In this respect, there are no correct target variables; we are given only the inputs, $\mathcal{D} = \{x_i\}_{i=1}^{N}$, from which to deduce meaningful properties of the probability density of the dataset (given no training sample). This is sometimes referred to as knowledge discovery and is a much less well-defined problem, as there is no means of evaluating the learner’s performance given no obvious target metric.

Reinforcement Learning
Reinforcement learning constitutes the third class of ML. Reinforcement Learning (RL) attempts to learn a policy of how to take actions in a dynamic environment so as to maximise a discounted future reward criterion. To collect information, the learner agent actively interacts with the environment (exploring unknown actions to gain new insight and exploiting the information already collated) and in some cases affects the system.

Parametric and Non-Parametric Models

• Parametric statistical learning involves a specified model form $f$ with a fixed number of parameters that define its behaviour. The canonical example of a parametric model is linear regression, which involves the estimation of $p+1$ coefficients forming the vector $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$, whereby the response $y$ is linearly proportional to each feature $x_j$. Parametric models hold the advantage of being efficient in use, but the disadvantage of making strong assumptions about the nature of the distribution of the dataset.

• An alternative is to consider a form for $f$ where the number of parameters grows with the size of the dataset. Non-parametric models may not involve any parameters at all, or may describe the distribution with a number of parameters that is not fixed in advance. They are more flexible but often computationally demanding, given the large quantity of observational training data required to compensate for the lack of assumptions. All models considered in this research fall under the parametric classification. As always, by increasing the number of parametric vectors in our feature space we risk overfitting the model, as the model may follow the “noise” too closely as opposed to the “signal” (see the sketch after this list).
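A toy contrast of the two classes on synthetic data; the models and data are illustrative, not those used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# The parametric model estimates p + 1 = 2 coefficients; the k-NN
# regressor's effective complexity grows with the sample instead.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * x[:, 0] + 0.1 * rng.standard_normal(200)

parametric = LinearRegression().fit(x, y)          # fixed parameter count
nonparametric = KNeighborsRegressor(5).fit(x, y)   # memorises the sample
print(parametric.intercept_, parametric.coef_)
```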
C.5 The Gauss–Markov Theorem

To assess the statistical significance of the model and conduct inference, we make assumptions about the residuals – the properties of the unexplained aspect of the input sample. The Gauss–Markov theorem (GMT) defines several assumptions required for OLS to generate unbiased estimates of the model parameters $\beta$. Under the standard GMT, the coefficients’ best (optimal) linear unbiased estimator is given by OLS if:

1. The data for the input variables $x_1, \ldots, x_k$ is a random sample from the population.
2. The errors have a conditional mean of zero given any of the inputs: $E[\epsilon \mid x_1, \ldots, x_k] = 0$.
3. Homoskedasticity: the error term $\epsilon$ has a constant variance given the inputs: $\operatorname{Var}(\epsilon \mid x_1, \ldots, x_k) = \sigma^2$.

The best estimator is the one that yields the lowest standard error (variance) of the estimates. In addition to these GMT assumptions, the classical linear model assumes normality – the population error is normally distributed and independent of the input variables. This implies that the output variables are normally distributed (conditional on the input variables), permitting the derivation of the exact distribution of the coefficients.

C.6 Statistical Inference
The functional relationship produced by supervised learning algorithms can be used for inference (that is, to gain new insights into how the outcomes are generated) or for prediction (that is, to generate accurate outcome estimates, represented by $\hat{y}$, for unknown or future inputs $x$).
In the context of regression, inference looks to draw conclusions about the true relationship of the population regression function (PRF) using the sample observations. This means running hypothesis tests about the significance of the overall relationship estimated by the SRF, as well as testing for the significance of particular coefficients. We also estimate confidence intervals for the estimated model.
An important component of statistical inference is a test statistic with a known distribution. Under the null hypothesis, the estimated statistic is assumed to be consistent with the true model, and we compute the probability (given by the $p$-value) of observing the value of this statistic in the sample. If the $p$-value drops below a significance threshold – usually five percent – we reject the null hypothesis.
In addition to the GMT assumptions, the following distributional characteristic of the test statistic holds exactly when normality holds. The test statistic for a hypothesis test on an individual coefficient $\beta_j$ is
\[ t_j = \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{v_j}} \sim t_{N-p-1}, \]
a $t$ distribution with $N-p-1$ degrees of freedom, where $v_j$ is the $j$-th diagonal element of $(X^{T}X)^{-1}$.
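For concreteness, the sketch below reproduces these quantities on synthetic data with statsmodels; the coefficients and sample are illustrative only.

```python
import numpy as np
import statsmodels.api as sm

# statsmodels reports beta_hat_j / se(beta_hat_j) and the corresponding
# p-value under t_{N-p-1} for each coefficient.
rng = np.random.default_rng(1)
X = sm.add_constant(rng.standard_normal((500, 2)))
y = X @ np.array([0.5, 1.0, 0.0]) + rng.standard_normal(500)

fit = sm.OLS(y, X).fit()
print(fit.tvalues)    # beta_hat_j / (sigma_hat * sqrt(v_j))
print(fit.pvalues)    # reject H0: beta_j = 0 when below the 5% threshold
```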
C.7 AIC and BIC Goodness-of-Fit Measures
Alternative prominent goodness-of-fit measures are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The goal is to minimise these, based on the MLE:

• $\mathrm{AIC} = -2\log(L^*) + 2k$, where $L^*$ denotes the value of the maximised likelihood function and $k$ is the number of parameters;
• $\mathrm{BIC} = -2\log(L^*) + k\log(N)$, where $N$ is the sample size.

Given a collection of estimated models, AIC determines the quality of a model by looking to find the model that best describes an unknown data-generating process, whereas BIC aims to find the best model amongst a finite set of candidates. Both provide a means of model selection, and each metric penalises complexity. While AIC has a lower penalty and thus might overfit, BIC imposes a larger penalty and so can underfit the data. In practice, AIC and BIC can be used jointly to guide the process of model selection if the objective is to fit in-sample; otherwise, cross-validation is preferable.
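A minimal sketch of the two criteria as functions of the maximised log-likelihood; for a Gaussian linear model the statsmodels fit used earlier exposes them directly as `fit.aic` and `fit.bic`.

```python
import numpy as np

def aic_bic(log_likelihood, k, n):
    """AIC and BIC from a maximised log-likelihood log(L*),
    k estimated parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    bic = -2.0 * log_likelihood + k * np.log(n)
    return aic, bic
```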
C.8 Hidden Orders
The LOBSTER data provides information on hidden LO executions – market events of type 5. When reconstructing an MO that has been matched against several LOs in the book, we often observe the MO initially execute against a hidden LO, only matching resting visible LOs at the best quote once it has consumed all hidden liquidity. Although we do not particularly speak of the effect of hidden orders in this text, it is important to understand that the response (effect) of some visible MOs is affected by hidden LOs at a better price. From the earlier description of the NASDAQ electronic exchange, we recall that orders are executed in a specific succession: orders with the best prices are matched first, with priority then given to visible orders and, finally, by the time of order submission. In other words, visible orders are always executed first unless a hidden LO has a better price. In such cases, hidden orders (hidden liquidity) dilute the effect of aggressive MOs at the price that matches the best in the LOB, as in the sketch below.
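A small sketch of this priority rule on the ask side, with illustrative orders only:

```python
# Matching priority on the ask side: best (lowest) price first, visible
# before hidden at the same price, then earliest submission time.
resting_asks = [
    {"price": 205.88, "visible": True,  "time": 34200.10},
    {"price": 205.85, "visible": False, "time": 34200.05},  # hidden inside the spread
]

def ask_priority(order):
    # False < True, so `not visible` ranks visible orders first at a given price
    return (order["price"], not order["visible"], order["time"])

queue = sorted(resting_asks, key=ask_priority)
# The hidden order at 205.85 executes first because its price improves on the
# best visible quote; at equal prices, visible orders would keep priority.
```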
C.9 Empirical Studies Review

In our implementation of the price-response algorithm we followed a number of research publications, prominent in the field of financial microstructure, that worked with LOBSTER data, such as Bouchaud et al. (2018) and Gould et al. (2013). Using the data for the same stocks, our results underestimate $R(1)$ for all four observed tickers: SIRI, EBAY, TSLA, PCLN. Since our findings report an average spread $\langle s \rangle$ exactly matching the results of Bouchaud et al. (2018), we assume that the discrepancy in the calculated values of the response function $R(1)$ arises from different approaches to dealing with an MO that matched a number of LOs in the LOB (and therefore appears as multiple entries at the same timestamp in the LOBSTER message file).

Table: Descriptive statistics (obs, min, max, mean, variance, skewness, kurtosis) for Order-Flow Imbalance and Aggregated Market Impact. Order-Flow Imbalance is measured in shares; Aggregated Market Impact is measured in dollar cents.

Table: MSE cross-validation scores (per fold, with mean and standard deviation) for the Linear Regression and Decision Tree models. MSE is measured in dollar cents.
D.1 Future Work
D.1.1 Alternative Data
When giving a functional definition of market impact we emphasised that in this work we look specifically at the impact of individual market orders. This is partially due to the specifics of the LOBSTER data set, which does not identify meta-orders. However, as we later discovered, there are other free resources, such as the NASDAQ OMX NORDIC (or just OMX) online database, where one can access trades and quotes data that identify market participants. This would enable the reconstruction of meta-orders and thus the study of latent liquidity.
Alternatively, as suggested by many similar empirical studies, having proprietary data makes a big difference for accurate research of financial market microstructure. Previous empirical studies of market impact employed traditional statistical techniques, with a few publications such as Nevmyvaka and Kearns (2016) and Lehalle (2018) discussing the application of ML methods. The majority of such studies were conducted on clean academic historical datasets that often omit the microstructure nuances required for a deeper investigation. Contrastingly, we seek collaboration with industry partners to support the vision of data-driven research by providing real-time proprietary microstructure data. This can be used to simulate interesting game-theoretic variations of agents’ behaviours in varying market conditions.
D.1.2 Illiquid Markets
One of the main distinguishing properties of different markets is the availability and visibility of liquidity. Moreover, liquidity can change drastically for an asset class in times of stressed markets. Such events, characterised as black-swan events, are challenging to model since they appear as outliers in the data, making it difficult for practitioners to decide on optimal actions when facing such scenarios. A second motivation for this research is therefore to investigate the key drivers affecting liquidity dynamics under stressed market conditions.
D.1.3 Invariant Market Impact Function
Kyle and Obizhaeva (2018), as well as Eisler and Bouchaud (2016), have researched a universal solution for price impact. In their paper “The Market Impact Puzzle”, the former propose two conditions on the drivers of the invariant formula which result in a tightly parameterised function. Eisler and Bouchaud (2016) have successfully applied the propagator technique to the estimation of price impact in the OTC credit index market, with their quantitative results being similar to those in more traditional asset classes, thus confirming that price impact is a universal phenomenon regardless of the specifications of market microstructure.
D.1.4 Optimal Execution
Having a more comprehensive understanding of liquidity (the ability to forecast liquidity more accurately) is imperative for measuring the market impact of trades and hence the associated transaction costs. This knowledge can help optimise a liquidation schedule for large block orders. Studies of optimal trading strategies have often focused solely on a single agent who desires to trade a certain amount of the security. When generalised to multiple agents, the resulting stochastic optimal control problem is extremely difficult to solve. One approach is to investigate a mean-field game framework in which the interactions of major and minor players are approximated by analysing how a finite subsample $N$ of the population of agents, who seek to adopt an optimal game strategy, are restricted in their behaviour as $N \to \infty$. Such situations constitute a Nash equilibrium.
In this context, an area of machine learning called Reinforcement Learning (which has roots in control theory) can be employed to solve the problem of optimised trade execution. Reinforcement Learning (RL) attempts to learn a policy of how to take actions in the environment in order to maximise a discounted future reward criterion. The key advantage of Reinforcement Learning over other Machine Learning techniques is that the agent learns directly how to make decisions, as opposed to predicting target values. In the framework of our stochastic control problem, the agent learns to optimise a trading path by dividing a specified order across time, and across venues with potentially different liquidity profiles.
To supplement the learning process, we will look to explore further applications of Deep Neural Networks. As the agent accumulates knowledge of which actions yield the highest reward, this information is continuously stored to assist future decision-making (the trade-off between exploration and exploitation). However, as the learning process continues, this knowledge space can become extremely large, making it unfeasible for the algorithm to process (both mathematically and computationally intractable for partially observable or stochastic environments). An alternative to stowing all previously encountered states and their corresponding best actions is to generalise past experiences by training a neural network to predict the reward for a given input. This approach is more tractable than standard knowledge-storing techniques, allowing RL to be applied to more complex problems where many network layers can capture the most intricate details, thus facilitating deep learning.
To affirm the validity of our findings, a series of experiments will be carried out using the scientific method. The work will be conducted in a controlled environment with initial offline verification of assumptions on historical data sets. Some properties of real-time market dynamics, however, cannot be verified offline and should therefore be validated at runtime. This might require high-performance computational resources. The success of the derived optimal trading behaviour will be benchmarked against traditional definitions of optimal execution such as arrival price, VWAP and participation percentage.
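As a purely illustrative sketch of the tabular starting point for such an agent, before any neural-network generalisation, the skeleton below runs Q-learning against a placeholder environment; the state and action discretisation, reward and dynamics are assumptions for illustration, not a proposed design.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 10, 4          # e.g. inventory buckets x child-order sizes
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1   # learning rate, discount, exploration

def step(state, action):
    # Placeholder environment: a market simulator or historical replay would
    # return the execution cost (negative reward) and the next inventory state.
    reward = -rng.uniform() * (action + 1)
    return reward, min(state + 1, n_states - 1)

for episode in range(1000):
    s = 0
    while s < n_states - 1:
        # epsilon-greedy action choice: explore with probability eps
        a = rng.integers(n_actions) if rng.uniform() < eps else int(Q[s].argmax())
        r, s_next = step(s, a)
        # Q-learning update: move towards reward plus discounted best future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```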
D.1.5 Computational Resources
The high frequency and detailed precision of LOBSTER data imply that the amount of information about certain stocks’ trade activity easily exceeds several gigabytes (GB). Such a high volume of data is best dealt with using high-performance computing resources that are optimised to improve processing times.
We strongly believe that the pace of learning from data can be greatly improved by the use of appropriate hardware designed for efficient analysis of large quantities of information.
The chart below illustrates the average computational time for the estimation of the lag-1 unconditional impact of trades for the four selected stocks. One of the challenges of the current study was the long time it took to test the algorithm and collect statistical information for further inference.
Figure 18: Average computational time for estimating the lag-1 unconditional impact for the four selected stocks.