On Detecting Spoofing Strategies in High Frequency Trading

Abstract

Spoofing is an illegal act of artificially modifying the supply to temporarily drive prices in a given direction for profit. In practice, detection of such an act is challenging due to the complexity of modern electronic platforms and the high frequency at which orders are channeled. We present a micro-structural study of spoofing in a simple static setting. We introduce a multilevel imbalance which influences the resulting price movement, and on this basis describe the optimization strategy of a potential spoofer. We provide conditions under which a market is more likely to admit spoofing behavior as a function of the characteristics of the market. We describe the optimal spoofing strategy, which allows us to quantify the resulting impact on the imbalance after spoofing. Based on these results we calibrate the model to real Level 2 datasets from TMX, and provide some monitoring procedures based on the Wasserstein distance to detect spoofing strategies in real time.


XUAN TAO, ANDREW DAY, LAN LING, AND SAMUEL DRAPEAU

Keywords: Spoofing, High Frequency Trading, Imbalance, Limit Order Book.

1. INTRODUCTION

The act of spoofing is a specific trading activity that aims at artificially modifying the supply on the market, without intent to trade, to move it away from its equilibrium. One might profit from the resulting short term price movement by canceling the previous supply while the market comes back to its equilibrium. Such a strategy implies that the spoofer should be able to act anonymously, fast, and in a market where all the other agents react to offer and demand. In this regard, with the recent rise of centrally cleared venues and high frequency algo-trading, the ground for the existence of spoofing schemes is growing, see Shorter and Miller [21]. In a competitive market where many potential spoofers are present, spoofing behavior might cancel out, but most regulations consider it illegal. For instance, the 2010 Dodd-Frank Act prohibits spoofing, defined as the activity of bidding or offering with the intent to cancel before execution, which can be prosecuted as “a felony punishable by up to $1 million in penalties and up to ten years in prison for each spoofing count”.

In 2010, trader Navinder Singh Sarao was accused of exacerbating a flash crash by placing thousands of E-mini S&P 500 stock index futures contract orders in one day and changing or moving those orders more than 20 million times before they were cancelled. In 2019, the high frequency company Tower Research Capital agreed to pay a fine of about $60 million over spoofing allegations. In 2020, JP Morgan settled a spoofing lawsuit alleging fraud for about $920 million.

Date: October 1, 2020. We also thank the Fields Institute for the organization of the many “Fields-China Joint Industrial Problem Solving Workshop” from which this problem stems. Finally, we thank TMX and their data analysis team for supporting this project with profound datasets, high performance computing facilities as well as precious market insights in high frequency trading. The financial support from the National Science Foundation of China, Grant Numbers 11971310 and 11671257, as well as from Shanghai Jiao Tong University, Grant Number AF0710020, is gratefully acknowledged. Andrew Day gratefully acknowledges the funding provided by Mitacs, the Ontario Graduate Scholarship (OGS), and the Natural Sciences and Engineering Research Council of Canada postgraduate scholarship (NSERC PGS).

Yet, for several reasons, detecting and prosecuting spoofing behavior is a challenging problem. First is the sheer amount of data produced from high frequency trading across many financial products and venues. Second, it is usually impossible to trace in real time who is behind every trade. From a CCP viewpoint, they mainly have access to the broker ID through which the trade has been channeled, resulting only in aggregated information. Furthermore, a potential spoofer might post those trades through different venues and brokers. Third, aside from a loose definition, it is unclear how a spoofing strategy differs quantitatively from other strategies and what the resulting impact is. Hence the difficulty of quantifying spoofing strategies and discriminating them from legitimate ones. Based on the above points, it seems difficult to provide an efficient way to monitor the market for spoofing behavior.
(This holds aside from obvious cases or exogenous approaches, as for insider prosecution.) However, a potential spoofer is also confronted with the constraints of modern electronic trading platforms. Indeed, from the basic spoofing description, the spoofer has to act rapidly in a complex and high frequency environment. Therefore, it must rely on fast, hence simple, algorithmic strategies which, due to the complexity of the dynamic structure of a limit order book, are based on aggregated signals.

Along these lines, we intentionally address a quantitative analysis of spoofing in a simple setup. As a basis for this study, let us consider a simple example. We suppose that in the next period the limit order book shifts up by one unit with a probability $\bar{p}$ and down by one unit otherwise. From the perspective of an agent whose objective is to purchase two shares, it faces the following three idealized situations.

1 - Immediately post a buy market order for a total cost of
$$\hat{C} = 10 + 11 = 21.$$

2 - Wait for the next period and then post a buy market order, for an average cost of
$$\tilde{C} = (1 - \bar{p})(9 + 10) + \bar{p}(11 + 12) = (1 - \bar{p})\,19 + \bar{p}\,23,$$
which is smaller than $\hat{C}$ if and only if $\bar{p} < 1/2$, hence a bearish market.

3 - Post a sell limit order of one share one unit away from the best ask price, which lowers the probability of an upward shift to $p < \bar{p}$, and buy in the next period. (We suppose that during this time period, if the market moves through limit order posting/canceling, the agent keeps its sell limit order one unit away from the best ask price by rapidly canceling and posting again.) Doing so, with a probability $q$, its sell limit order is executed through an incoming market order walking the limit order book beyond one unit. For this executed sell order, the agent receives an average price of $q((1 - p)10 + p\,12)$ while the number of shares it must buy increases on average by $q$. The cost of buying back this increased inventory minus the gain from selling its limit order results in an average net cost of
$$C = (1 - p)(9 + 10 + q\,11) + p(11 + 12 + q\,13) - q((1 - p)10 + p\,12) = (1 - p)\,19 + p\,23 + q.$$
Comparing with $\tilde{C}$, spoofing is profitable on average if and only if $q < 4(\bar{p} - p)$.

From this simple situation, a profitable spoofing situation depends on the value of $\bar{p}$, on how bearishly ($p < \bar{p}$) the market reacts to an increase of one share at a distance of one unit from the best ask price, as well as on the probability $q$ of being adversely executed through a market order during this time. The goal of this paper is to study the interplay between these different factors, in particular the impact on the price movement as a function of the spoofing size and depth in the limit order book. Based on this study, we present some approaches to track spoofing behavior and calibrate those to real market data.

We model the impact of the offer and demand on the price through the volume imbalance, often taken as the ratio of the volume of the best bid divided by the total volume on the best bid and ask. Since the spoofer never intends to have their orders executed, spoofing is more likely to happen beyond the top of the limit order book: at the top, the probability of getting executed is too high, which results in a negative payoff to the spoofer. To take this into account we weight the impact on the imbalance in terms of depth as follows
$$\bar{\imath} = \frac{\sum_j \bar{v}^-_j w_j}{\sum_j \left(\bar{v}^-_j + \bar{v}^+_j\right) w_j}$$
where $\bar{v}^\pm_j$ represents the volume on the bid/ask $j$ units away from the best bid/ask and $w_j$ is the relative impact on the imbalance at level $j$. If the agent posts a sell limit order $v$ on the ask side at tick level $k$, the imbalance moves to
$$\imath(v) = \frac{\sum_j \bar{v}^-_j w_j}{\sum_j \left(\bar{v}^-_j + \bar{v}^+_j\right) w_j + w_k v} \le \bar{\imath}.$$
If we denote by $dp_n$ the probability of a price deviation of $n$ units, the dependence on the imbalance $\imath$ is given as follows
$$dp_n = \imath\, dp^+_n + (1 - \imath)\, dp^-_n, \qquad n = \ldots, -2, -1, 0, 1, 2, \ldots$$
where $dp^\pm_n$ represents the price deviation distribution when the imbalance is at its extreme. When the imbalance $\imath$ is low/high, that is, when the offer/demand dominates, the price distribution is biased downwards/upwards through $dp^-$/$dp^+$.
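The two-share example and the depth-weighted imbalance above can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names `toy_costs` and `weighted_imbalance` and all numerical inputs are illustrative assumptions.

```python
from fractions import Fraction


def toy_costs(p_bar, p, q):
    """Average costs of the three idealized situations for buying two shares.

    Asks sit at 10, 11, 12, ...; the book shifts up by one unit with
    probability p_bar (p once the sell limit order is posted) and down
    otherwise; q is the probability that the sell limit order is executed.
    """
    immediate = 10 + 11
    delayed = (1 - p_bar) * (9 + 10) + p_bar * (11 + 12)
    spoof = ((1 - p) * (9 + 10 + q * 11) + p * (11 + 12 + q * 13)
             - q * ((1 - p) * 10 + p * 12))
    return immediate, delayed, spoof


def weighted_imbalance(bid, ask, w, v=0.0, k=0):
    """Depth-weighted imbalance, optionally after posting a sell order v at level k."""
    num = sum(wj * bj for wj, bj in zip(w, bid))
    den = sum(wj * (bj + aj) for wj, bj, aj in zip(w, bid, ask)) + w[k] * v
    return num / den


# Hypothetical inputs: exact fractions keep the arithmetic transparent.
p_bar, p, q = Fraction(1, 2), Fraction(2, 5), Fraction(1, 5)
immediate, delayed, spoof = toy_costs(p_bar, p, q)
print(immediate, delayed, spoof)
print(weighted_imbalance([100] * 3, [100] * 3, [0.5, 0.3, 0.2], v=50, k=1))
```

With these hypothetical inputs, $q < 4(\bar{p} - p)$ holds, so the spoofing cost comes out below the delayed cost, and the posted sell volume lowers the imbalance below its initial value of $1/2$.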
The agent can influence the price distribution in a nonlinear way through the volume it posts:
$$v \longmapsto dp(v) := \imath(v)\, dp^+ + (1 - \imath(v))\, dp^-.$$
Given the distribution $dq$ of incoming market orders hitting the limit order book up to a given level, the resulting net average cost of spoofing turns out to be
$$C(v) = \underbrace{pH + (1 - Q)G(H)}_{\text{Cost for optimal situation}} + \underbrace{H\mu^+(2\imath(v) - 1)}_{\text{Spoofing impact}} + \underbrace{QG(H + v) + v\nu}_{\text{Cost for being caught wrong way}}$$
where $H$ is the initial objective of shares to acquire, $G$ is the liquidity cost from walking through the limit order book, $\mu^+ > 0$ is the mean of $dp^+$, and $Q$ (resp. $\nu$) is the probability (resp. mean beyond $k$) of being executed beyond $k$. From this expression, there is a trade-off between the risk of being caught on the wrong side and the benefit of pushing $\imath(v)$ well below $1/2$ to get $H\mu^+(2\imath(v) - 1) \le 0$.

In a theoretical part we first provide conditions for the limit order book to admit spoofing manipulation as a function of the initial imbalance $\bar{\imath}$, the local sensitivity of the imbalance $w$, the overall price impact $\mu^+$, the liquidity cost $G$, as well as the initial objective $H$. In short, this model confirms several intuitive facts. Spoofing is more likely to occur
• if the probability $Q$ of the spoofing order being executed is small;
• if the local sensitivity $w_k$ or the overall price impact $\mu^+$ is large;
• if the amount of shares $H$ to buy is large with respect to the depth of the limit order book;
• if the initial market imbalance is close to $1/2$;
• away from the top of the limit order book.
As for the depth of the limit order book, that is, how liquid the market is, the results are inconclusive. For this to be taken into account, one should account for how the above mentioned parameters depend on the liquidity of the limit order book.
The subsequent empirical study shows that it is the case, but we cannot derive conclusions from this model as in Shorter and Miller [21], where illiquid markets seem more likely to be prone to spoofing. We then address the impact of spoofing on the resulting imbalance $\imath_{spoof} = \imath(v_{spoof})$ after spoofing and discuss its dependence on the aforementioned parameters, as the imbalance modified by a spoofer can be compared to the steady state one. We characterize and discuss the deviation of the imbalance as a function of the different parameters, in particular as a function of the depth where the spoofing order is posted. We finally address the situation of a market maker using spoofing strategies for a positive round trip payoff.

Based on this study, we can theoretically discriminate a spoofed imbalance from a steady state one. The main idea for potential spoofing detection is to compare the imbalance before and after each market order. If the market is in steady state, the imbalance after a legitimate market order should come back to its current steady state $\bar{\imath}$. On the other hand, if the market order is the result of a spoofing pattern as described in the simplified aforementioned model, the imbalance before the market order should be of the form
$$\imath^* \approx \frac{b}{b + a + wv} < \frac{b}{a + b} = \bar{\imath}$$
while returning to its market steady state as soon as the spoofed volumes are canceled. The theoretical part provides a first approximation of how $\imath^*$ deviates from $\bar{\imath}$ as a function of the market parameters.

We calibrate the parameters of this model as well as the weighted imbalance on several Level 2 data sets provided by TMX. Based on the theoretical results we provide possible quantification procedures using real time monitoring of a conditional Wasserstein distance of short term imbalance histories against the long run imbalance history in order to track possible spoofing behavior.

Before addressing the relevant literature, let us expose the shortcomings and modeling choices of this approach.
The micro-structure dynamic of the limit order book at high frequency is highly complex. To extract some key impacts of spoofing behavior we deliberately concentrate on a static situation where the dynamic of the market is ignored. (Since we consider the limit order book beyond its top, a dynamic version of the present approach would result in a fairly complex and high dimensional dynamic programming problem.) For instance, we do not consider situations where compound spoofing behavior happens. Furthermore, we assume that there exists a single potential spoofer and that the market is infinitely reactive in the sense that it comes back to its steady state driven by the imbalance. There is no implicit game between the market and the spoofer where the market acknowledges the potential existence of a spoofer. There is also no competitive game between two or more spoofers. Also, even though we briefly address the situation of a round trip for a market maker and the resulting optimal spoofing behavior, we take the viewpoint of a market taker willing to purchase/sell a given amount of shares, for whom a spoofing manipulation means, in other words, doing better than an immediate or delayed market order. The overall goal is a basic understanding of spoofing in its simplest form, a quantification of the resulting impact, and the derivation of potential detection procedures. Refinements of this approach, other takes on it, as well as more adequate quantification procedures are topics of further studies.

1.1. Literature review.

There exists a solid stream of research showing that even rational speculative activities might destabilize prices, have an adverse effect on market efficiency or eventually lead to different forms of arbitrage: from market speculation based on various forms of information asymmetries, for instance Hart and Kreps [14], Allen and Gale [3] or Jarrow [15], to price manipulation in limit order books using different market impact assumptions and trading strategies, as studied in Alfonsi and Schied [1], Alfonsi et al. [2], Gatheral [10], Gatheral and Schied [11]. The specific case of spoofing behavior has not yet been the subject of much theoretical study.

Although many high frequency trading strategies are legitimate, Shorter and Miller [21] point out that high frequency trading firms may engage in potentially manipulative strategies involving the usage of quote cancellations. Lee et al. [16] empirically study the change in spoofing behavior following a change in volume disclosure rules on the Korean Exchange (KRX) at the start of 2002. Up to the end of 2001 the KRX disclosed the total volume of shares on both sides of the book and also the volumes at each tick up to 5 ticks from the best ask/bid. At the start of 2002 the KRX stopped disclosing the total volume on both sides in an effort to stop spoofing, but increased the disclosed volumes at each tick from the first 5 to the first 10 ticks from the best ask/bid. They show that spoofing is profitable and that spoofers tend to prefer stocks with higher return volatility, lower market capitalization, lower price and lower managerial transparency. This study suggests the importance of the depth of book on spoofing strategies and potential price manipulation being carried out through a form of “volume imbalance”. Wang [24] followed a similar methodology in empirically studying spoofing on Taiwan’s index futures market.
They found consistent results on the impact of spoofing on the market, but without the novel testing ground on changes in the disclosure of volumes deeper in the limit order book.

Some other studies to detect price manipulations are mainly based on learning algorithms. Among others, Cao et al. [5, 6], based on the definition of spoofing in [17], use K-nearest neighbour, one-class support vector machine and adaptive hidden Markov models to classify the data. Miranda et al. [18] characterize spoofing and pinging as full and partial observability of Markov decision processes. Under a reinforcement learning framework, they find that in order to maximise the investment growth, a trader will always employ spoofing or pinging orders except when the market adds extra transaction costs or fines. In contrast to these empirical studies, our approach focuses on the micro economic features of spoofing behavior, in particular using our main stylized factor, the imbalance, which measures the difference between bid and ask side.

Concerning the impact of the imbalance on the direction of the price movement: Lipton et al. [17] use the definition on the top of the book for the imbalance and study the impact on the trade arrival dynamic and resulting price movement. They fit a stochastic model for this behavior on real market data. Cartea et al. [7] employ volume imbalance as a signal to improve profits on the liquidation of a collection of shares in a dynamic high-frequency trading environment. Gould and Bonart [12] fit logistic regressions between the imbalance and the direction of the subsequent mid-price movement for each of 10 liquid stocks on Nasdaq, and illustrate the existence of a statistically significant relationship. Xu et al. [26] compute the imbalance at multiple positions in the limit order book and fit a linear relationship between this imbalance and the mid-price change. They find that the goodness-of-fit is considerably stronger for large-tick stocks than it is for small-tick stocks.
The impact of order imbalance on prices has also been studied by Cont et al. [9] and Bechler and Ludkovski [4], for example. Bechler and Ludkovski also found that including characteristics of deeper parts of the book may be necessary for forecasting price impact. However, due to the nature of their dataset, they were only able to look at an aggregated form of the depth of book, while we are able to use the exact volumes at all depths in the book. Sirignano [22] also used the book volumes beyond the touch to model price movements in a deep learning setting, further suggesting the impact of depth of book on predicting future price movements.

Finally, the closest work to the present one in terms of quantitative analysis of spoofing behavior in relationship with imbalance is from Cartea et al. [8]. They adopt a dynamic approach where the trader influences the imbalance to derive the optimal strategy. They calibrate their model to market data and provide trading trajectories for the spoofer showing that spoofing considerably increases the revenues from liquidating a position. While being in a dynamic setting, in contrast to the present study, everything happens at the top of the limit order book for the imbalance to be manipulated. Furthermore, we do not focus here on the resulting gains of the spoofer, but rather on the impact of the spoofer on the imbalance, for detection purposes from a regulatory viewpoint.

1.2. Organization of the paper.

The first Section introduces the model, the imbalance and the dependence of the price movement on that imbalance. The second Section presents the spoofing strategies, addresses the theoretical conditions for spoofing behavior to happen and provides the resulting imbalance after spoofing together with numerical illustrations. It also addresses the situation of a round trip from a market maker viewpoint in this model. The third Section is dedicated to the calibration procedure of the model on real Level 2 market data from TMX. The last Section discusses and introduces some quantitative approaches to track spoofing behavior in real time, illustrated on real datasets.

2. LIMIT ORDER BOOK, LIQUIDITY COSTS AND IMBALANCE

The ask price is denoted by $p$ and the limit order book on the ask side by $\bar{v} = (\bar{v}_0, \ldots, \bar{v}_N)$, that is, $\bar{v}_0$ is the volume posted at ask price $p$, $\bar{v}_1$ the volume posted at $p + \delta$, etc., where $\delta$ is the tick size. We denote by $\bar{v}^- = (\bar{v}^-_0, \ldots, \bar{v}^-_N)$ the limit order book on the bid side, that is, $\bar{v}^-_0$ is the volume posted at bid price $p^- < p$, $\bar{v}^-_1$ the volume posted at $p^- - \delta$, etc. Given a limit order book inventory $\bar{v}$ on the ask side, we define for an amount of shares $H \ge 0$ the function
$$F(H) := \inf\left\{ x \in \mathbb{N} : \sum_{k=0}^{x} \bar{v}_k \ge H \right\}$$
which represents how many positive price tick deviations an order of size $H$ generates. Given an amount of shares $H$ to buy, an ask price $p$ and an ask limit order book $\bar{v}$, the resulting cost of the market order is
$$pH + \sum_{k=0}^{F(H)} k\delta \bar{v}_k - \delta F(H) \left( \sum_{k=0}^{F(H)} \bar{v}_k - H \right) = pH + \delta G(H).$$
The term $G$ on the right hand side represents the liquidity costs, depending only on $\bar{v}$.

Remark 2.1. Throughout the theoretical part of this work we assume that the limit order book is block shaped with an amount $a > 0$ of shares at each price level of the ask side. We then get the continuous approximation
$$F(H) \approx \frac{H}{a} \quad \text{and} \quad G(H) \approx \frac{H^2}{2a}.$$

As for the imbalance of the limit order book, a measure of the difference between offer and demand, we proceed as follows. Let $w_0, \ldots, w_N$ with $\sum_{k=0}^{N} w_k = 1$ and $w_k > 0$ be a weight for each tick level $k$, and $(\bar{v}^-, \bar{v})$ a limit order book inventory on the bid and ask respectively. We denote by
$$\bar{B} = \sum_{k=0}^{N} w_k \bar{v}^-_k = \langle w, \bar{v}^- \rangle, \qquad \bar{A} = \sum_{k=0}^{N} w_k \bar{v}_k = \langle w, \bar{v} \rangle$$
the weighted average bid and ask volumes. We define the imbalance as
$$\bar{\imath} := \frac{\bar{B}}{\bar{B} + \bar{A}} \in (0, 1).$$

Remark 2.2.

In a block shaped setting, if $b$ denotes the amount of orders on every price level on the bid side, we get
$$\bar{\imath} = \frac{\sum_{k=0}^{N} w_k b}{\sum_{k=0}^{N} w_k (a + b)} = \frac{b}{a + b}$$
which yields $b = a\bar{\imath}/(1 - \bar{\imath})$.

The price deviation in the next period can be triggered by two events: the posting and cancellation of incoming limit orders, and the posting of market orders. We distinguish between both, since the former does not have an impact on the execution of existing limit orders while the latter has. We generically denote by
$$dp = \{dp_{-N}, \ldots, dp_0, \ldots, dp_N\}, \qquad \mu = \sum_{x=-N}^{N} x\, dp_x,$$
$$dq = \{dq_{-N-1}, \ldots, dq_0, \ldots, dq_{N+1}\}, \qquad \nu = \sum_{y=-N-1}^{N+1} y\, dq_y$$
the distribution and mean, respectively, of the two possible price movements in the next period. For the sake of simplicity, we assume that the imbalance does not have an impact on incoming market orders and that the price deviation with respect to the market orders is neutral, that is $\nu = 0$. We furthermore assume that the two price movements are independent of each other. To reflect the fact that the imbalance, as an indicator of the offer and demand on the market, has an impact on the market makers, we consider a parametrization $\imath \mapsto dp(\imath)$ of the price movement driven by limit orders as a function of the imbalance $\imath$. Since the imbalance moves between $0$ and $1$, we assume that the distribution $dp$ moves as a convex combination in $\imath$ between the distribution $dp^-$ (the distribution when the imbalance is close to $0$, that is, highly skewed to the left) and the distribution $dp^+$ (the distribution when the imbalance is close to $1$, that is, highly skewed to the right).
Mathematically:
$$dp(\imath) = \imath\, dp^+ + (1 - \imath)\, dp^-.$$
From the skewness assumptions and symmetry of the imbalance indicator, we assume that $dp^+_x \ge dp^-_x$ for every $x \ge 0$ and $dp^+_x = dp^-_{-x}$ for every $x$, which implies that
$$\mu(\imath) := \sum_{x=-N}^{N} x\, dp_x(\imath) = (2\imath - 1) \sum_{x=-N}^{N} x\, dp^+_x = \mu^+(2\imath - 1).$$
(The subsequent theoretical study adapts to a possible joint distribution of the price movements due to limit and market orders, also jointly dependent on the imbalance. The exposition is then no longer explicit, but the problem can be solved numerically.) By assumption, $\mu^+$ is positive, showing that $\mu(\imath)$ moves between $-\mu^+$ and $\mu^+$ and is equal to $0$ for an imbalance of $1/2$, when offer equals demand.

3. SPOOFING STRATEGY

Suppose that at a given time we are given an ask price $p$ and a limit order book inventory $(\bar{v}^-, \bar{v})$. A trader willing to buy an amount $H$ of shares faces the following three options.
• Immediate market order: for a total cost of
$$pH + \delta G(H).$$
• Delayed market order: for an average total cost of
$$\sum_{x=-N}^{N} \sum_{y=-N-1}^{N+1} \left[ (p + \delta(x + y))H + \delta G(H) \right] dp_x(\bar{\imath})\, dq_y = pH + \delta \left( G(H) + H\mu^+(2\bar{\imath} - 1) \right).$$
Clearly if $\bar{\imath} < 1/2$, then this second option is better than a direct buy.
• Spoofing and delayed market order:

First book an ask limit order $v$ at a depth $k$ in $\{0, 1, \ldots, N\}$ on top of the existing ask volume $\bar{v}_k$ to increase the liquidity on the ask side and signal a surge in supply to the market. In the next period the price deviates from $p$ to $p + \delta(x + y)$ and two situations may happen:
– $y \le k$: no market order of sufficient magnitude hits the limit order book and therefore this limit order is not executed against an incoming market order. The previous limit order is canceled and the amount $H$ of shares is acquired for a cost of
$$(p + \delta(x + y))H + \delta G(H).$$
– $y > k$: the limit order is executed against an incoming market order at a price level $p + \delta(x + k)$. The new objective moves to $H + v$, resulting in a net cost of
$$(p + \delta(x + y))(H + v) + \delta G(H + v) - (p + \delta(x + k))v = (p + (x + y)\delta)H + \delta G(H + v) + \delta(y - k)v.$$
It follows that the spoofing net cost for a price deviation from $p$ to $p + \delta(x + y)$ is given by
$$C_k(v, x, y) = (p + (x + y)\delta)H + \delta G\left(H + v 1_{\{y > k\}}\right) + \delta(y - k)v 1_{\{y > k\}}. \tag{3.1}$$
However, the posting of the selling limit order modifies the imbalance from $\bar{\imath}$ to
$$\imath_k(v) := \frac{\bar{B}}{\bar{A} + \bar{B} + w_k v}.$$
In other words, the imbalance will move downwards, shifting the distribution $dp$ to more favorable outcomes.
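The liquidity cost $G$ and the net cost (3.1) lend themselves to a direct computation. The sketch below is illustrative only: the helper names `ticks_walked`, `liquidity_cost` and `spoof_net_cost`, the example book, and the simplification that the ask volumes keep the same shape after the price move are assumptions, not part of the paper.

```python
def ticks_walked(H, ask):
    """F(H): smallest x such that ask[0] + ... + ask[x] >= H."""
    total = 0
    for x, volume in enumerate(ask):
        total += volume
        if total >= H:
            return x
    raise ValueError("order size exceeds the displayed ask volume")


def liquidity_cost(H, ask):
    """G(H) in tick units: sum_{k<=F} k*ask[k] - F*(sum_{k<=F} ask[k] - H)."""
    F = ticks_walked(H, ask)
    filled = sum(ask[:F + 1])
    return sum(k * ask[k] for k in range(F + 1)) - F * (filled - H)


def spoof_net_cost(v, x, y, k, H, p, delta, ask):
    """Net cost (3.1) for a limit-order move x and a market-order move y."""
    executed = y > k                      # the spoofing order at depth k is hit
    target = H + v if executed else H     # shares to buy back after execution
    penalty = delta * (y - k) * v if executed else 0
    return (p + (x + y) * delta) * H + delta * liquidity_cost(target, ask) + penalty


# Hypothetical block-shaped book with a = 100 shares per level.
ask = [100] * 5
print(liquidity_cost(250, ask))                       # cost of walking 2 ticks
print(spoof_net_cost(50, -1, 0, 1, 250, 10, 1, ask))  # order not executed
print(spoof_net_cost(50, -1, 2, 1, 250, 10, 1, ask))  # order executed
```

On a discrete book the exact $G(H)$ differs from the continuous approximation $H^2/(2a)$ of Remark 2.1, which is only used for the theoretical results.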
Since we assume that $\nu = 0$, it follows that the average net costs are given by
$$C_k(v) := \sum_{x=-N}^{N} \sum_{y=-N-1}^{N+1} C_k(v, x, y)\, dp_x(\imath_k(v))\, dq_y$$
$$= \sum_{x=-N}^{N} (p + \delta x) H\, dp_x(\imath_k(v)) + \delta(1 - Q_k)G(H) + \delta Q_k G(H + v) + \delta v \sum_{y=k+1}^{N+1} (y - k)\, dq_y$$
$$= \underbrace{pH + \delta(1 - Q_k)G(H)}_{\text{Cost for optimal situation}} + \underbrace{\delta H\mu^+(2\imath_k(v) - 1)}_{\text{Spoofing impact}} + \underbrace{\delta Q_k G(H + v) + \delta v\nu_k}_{\text{Cost for being caught wrong way}}$$
where
$$Q_k = \sum_{y=k+1}^{N+1} dq_y \quad \text{and} \quad \nu_k = \sum_{y=k+1}^{N+1} (y - k)\, dq_y.$$
Note that this cost functional is convex in $v$ since $G$ and $\imath_k$ are convex functions. Note also that in order to take advantage of this spoofing impact, it is necessary to drive the imbalance $\imath_k(v)$ below $1/2$.

Remark 3.1.

Note that we implicitly assume that the spoofing only happens at a given depth $k$. It is possible to spoof simultaneously at different depths, resulting in a slightly more complex cost function that can be solved numerically. The conclusions do not change qualitatively and we use the more general multi-depth spoofing for the analysis of data in the subsequent sections.

3.1. Existence of Spoofing Manipulation.

The question is whether it is possible to push the imbalance sufficiently towards $0$ in order to offset the costs of posting sell orders, having them executed, and paying the liquidity costs of buying them back.

Definition 3.2.

We say that the limit order book $(\bar{v}^-, \bar{v})$ admits a (market taker) spoofing manipulation if there exists $v > 0$ and $k \in \{0, 1, \ldots, N\}$ such that
$$\begin{cases} C_k(v) < pH + \delta \left( G(H) + \mu^+ H(2\bar{\imath} - 1) \right) & \text{if } \bar{\imath} \le 1/2 \\ C_k(v) < pH + \delta G(H) & \text{if } \bar{\imath} > 1/2 \end{cases} \tag{3.2}$$
According to the average costs of spoofing, these two inequalities turn into
$$Q_k \left[ G(H + v) - G(H) \right] < 2\mu^+ H \left( \bar{\imath} \wedge \frac{1}{2} - \imath_k(v) \right) - v\nu_k. \tag{3.3}$$
The following result concerns the existence of spoofing manipulation in a block shaped setting where the volume on the ask side of the limit order book is $a$ everywhere. For scaling reasons let $H = \rho a$, where $\rho$ represents the ratio of the shares to purchase to the depth of the limit order book.

Proposition 3.3.

In a block shaped setting, where the volume on the ask side of the limit order book is $a$ everywhere, the following assertions hold:
• If $\bar{\imath} \le 1/2$, the limit order book admits no spoofing manipulation if and only if (3.4) holds;
• If $\bar{\imath} > 1/2$, the limit order book admits no spoofing manipulation if (3.4) holds;
where $H = \rho a$ and
$$2\rho\mu^+ (1 - \bar{\imath})\,\bar{\imath}\, w_k \le Q_k \rho + \nu_k \quad \text{for all } k. \tag{3.4}$$

Proof.

Let $H = \rho a$, and the imbalance $\imath_k(v) = b/(b + a + w_k v)$ with $b = a\bar{\imath}/(1 - \bar{\imath})$. It follows that the gradient of $\imath_k(v)$ is given by
$$\nabla \imath_k(v) = -\frac{b}{(a + b + w_k v)^2} w_k = -\frac{1 - \bar{\imath}}{a\bar{\imath}}\, \imath_k(v)^2 w_k.$$
If $\bar{\imath} \le 1/2$, from the previous equations, since $G(x) = x^2/(2a)$, it follows from (3.3) that there is no spoofing manipulation if and only if
$$f(v) := Q_k \frac{v^2}{2a} + Q_k \rho v + v\nu_k - 2\rho a\mu^+ (\bar{\imath} - \imath_k(v)) \ge 0$$
for any $v \ge 0$. Taking the gradient of this function yields
$$\nabla f(v) = Q_k \frac{v}{a} + Q_k \rho + \nu_k - 2\rho\mu^+ \frac{1 - \bar{\imath}}{\bar{\imath}}\, \imath_k(v)^2 w_k$$
which is monotone increasing in $v$, so that $f$ is convex. Since $f(0) = 0$, it follows that $f(v) \ge 0$ for any $v$ if and only if $\nabla f(0) \ge 0$, which is equivalent to
$$Q_k \rho + \nu_k \ge 2\rho\mu^+ (1 - \bar{\imath})\,\bar{\imath}\, w_k.$$
If $\bar{\imath} > 1/2$, there is no spoofing manipulation if and only if
$$f(v) := Q_k \frac{v^2}{2a} + Q_k \rho v + v\nu_k - 2\rho a\mu^+ (1/2 - \imath_k(v)) \ge 0$$
for any $v \ge 0$, the gradient of which is given by
$$\nabla f(v) = Q_k \frac{v}{a} + Q_k \rho + \nu_k - 2\rho\mu^+ \frac{1 - \bar{\imath}}{\bar{\imath}}\, \imath_k(v)^2 w_k.$$
Since $f(0) > 0$, as previously argued, it follows that $f(v) \ge 0$ as soon as $\nabla f(0) \ge 0$, which yields the same condition. □

From this proposition, we deduce that price manipulation is more likely to occur
• if $Q_k$ is small, and as a byproduct $\nu_k$. If the probability to get a spoofing order executed is small, there is relatively little downside to spoofing.
• if $\bar{\imath}$ is close to $1/2$. If the imbalance is close to $1/2$, then $\bar{\imath}(1 - \bar{\imath})$ is maximum. The impact of moving the price in one's favor is maximal there.
• if $\mu^+$ is large: $\mu^+$ represents the mean deviation sensitivity as a function of the imbalance. The more sensitive the price movement is with respect to the imbalance, the more likely spoofing strategies may occur.
• if $w_k$ is large: $w_k$ represents the relative impact at tick level $k$ of a spoofing volume on the imbalance. If one of the $w_k$ is large with respect to the corresponding $Q_k$, then spoofing is more likely to occur there.
• if $\rho$ is relatively large. If the amount of orders to buy relative to the overall offer is very large, spoofing is more likely to happen.
Figure 1 represents the spoofing condition in terms of the initial imbalance $\bar{\imath}$ with varying market parameters.

FIGURE 1. Spoofing condition (3.4) as a function of $\bar{\imath}$ and in (a) $\mu^+ = 1$, $\rho = 1$, $k = 3$, w_k = 0. , dq_i = 0. for all $i \ge k$. One parameter is increased each time with respect to (a) where (b): dq_i = 0. for all $i \ge k$; (c): $\mu^+ = 2$; (d): $\rho = 2$; (e): w_k = 0. ; (f): $k = 4$.

3.2. Optimal Spoofing and Resulting Imbalance Impact.

Let us now address the problem of finding the optimal spoofing strategy, in particular as a function of the depth at which the spoofing order is placed.

Proposition 3.4. The optimal spoofing volume $v_{spoof}$ at a given level $k$ and resulting imbalance $\imath_{spoof}$, adopting the notations $w := w_k$, $Q := Q_k$ and $\nu := \nu_k$, are given by:
$$v_{spoof} = \frac{a}{Q} \left[ 2\rho w\mu^+ \frac{1 - \bar{\imath}}{\bar{\imath}}\, \imath_{spoof}^2 - (Q\rho + \nu) \right]^+$$
where $\imath_{spoof}$ is the unique cubic root solution in $(0, \bar{\imath}]$ of
$$\frac{\bar{\imath}}{\imath} = 1 + \frac{(1 - \bar{\imath}) w}{Q} \left[ 2\rho w\mu^+ \frac{1 - \bar{\imath}}{\bar{\imath}}\, \imath^2 - (Q\rho + \nu) \right]^+.$$

Proof.

Adopting the notations Q := Q k , ν = ν k , w = w k , H = ρa , the goal is to optimizeover v ≥ the objective function f ( v ) = (1 − Q ) ( ρa ) a + Q ( ρa + v ) a + ρaµ + (2 ı ( v ) −

1) + vν = ( ρa ) a + Q v a + ( Qρ + ν ) v + ρaµ + (2 ı ( v ) − First order condition with Lagrangian λ yields Q va + ( Qρ + ν ) − ρwµ + − ¯ ı ¯ ı ı = λ where ı := ı ( v ) . Solving as a function of ı in (0 , , we get λ ( ı ) = (cid:20) ( Qρ + ν ) − ρwµ + − ¯ ı ¯ ı ı (cid:21) + v ( ı ) = aQ (cid:20) ρwµ + − ¯ ı ¯ ı ı − ( Qρ + ν ) (cid:21) + Given now the optimal v ( ı ) as a function of ı , we solve for ı such that ı = a + b + wv ( ı ) b = 1¯ ı + w − ¯ ıa ¯ ı v ( ı ) = 1¯ ı + wQ − ¯ ı ¯ ı (cid:20) ρwµ + − ¯ ı ¯ ı ı − ( Qρ + ν ) (cid:21) + Since the left hand side in strictly decreasing on from ∞ to ı on (0 , ¯ ı ] and the right handside is increasing from ı , on (0 , ¯ ı ] , there exists a unique solution which is a cubic root. (cid:3) Though the solution is implicit, we can inspect the spooﬁng behavior as a function ofthe distance to the top of the limit order book. Note ﬁrst that ı spoof = ¯ ı if and only if ρwµ + (1 − ¯ ı )¯ ı ≤ Qρ + ν which results into v spoof = 0 . This coincide with the nospooﬁng condition of the previous proposition. We are interested on the relative spooﬁngsize as a function of these parameters. From the deﬁnition of the imbalance, ı ( v ) increasesif and only if v decreases, so we get more spooﬁng volume as ı spoof gets smaller. Nowfrom the implicit function it holds that ı = 1¯ ı + wQ − ¯ ı ¯ ı (cid:20) ρwµ + − ¯ ı ¯ ı ı − ( Qρ + ν ) (cid:21) + =: f (cid:0) w, µ + , ν, Q, ¯ ı, ρ, ı (cid:1) where the function f is an increasing function of ı greater than / ¯ ı . • Since f is increasing as a function of w and µ + , it follows that ı spoof is decreasingas a function of w and µ + . Hence, spooﬁng behavior increases as a function of theimpact w at level k on the imbalance as well as a function of the overall sensitivity µ + of the price movement with respect to the imbalance. • Since f is decreasing as a function of Q and ν , it follows that ı spoof is increasingas a function of Q and ν . 
From an empirical viewpoint, Q = Q k as well as ν = ν k decreases as a function of the depth k . It follows that spooﬁng behavior ismore likely to happen and increase deeper in the limit order book. However, thisconclusion is short of the fact that local sensitivity of the imbalance w = w k alsodepends on the depth with an inverse impact. According to empirical analysis, it urns out that w does not exert this decreasing behavior as a function of k at leastwithin a reasonable depth in the limit order book. It seems that spooﬁng behavioris more likely to happen at a reasonable distance from the top of the limit orderbook.The behavior of the resulting imbalance ı spoof as a function of the initial imbalance ¯ ı ismore difﬁcult to stress out. We know that ı spoof ≤ ¯ ı and for the same reasons as before itis increasing as a function of ¯ ı . The same holds for the dependence on the relative numberof shares to purchase ρ . Figure 2 represents the curves of the spoofed imbalance ı spoof (¯ ı ) as a function of the initial imbalance ¯ ı for different depths with varying market parameters.F IGURE ı as a function of ¯ ı and in ( a ) µ + = 1 ρ = 1 , w k = 0 . , dq i =0 . for all i ≥ k . One parameter is increased each time with respectto ( a ) where ( b ) : dq i = 0 . for all i ≥ k ; ( c ) : µ + = 3 ; ( d ) : ρ = 3 ; ( e ) : w k = 0 . ; Blue line : k = 0 ; red line: k = 2 ; orange line: k = 4 .3.3. Round Trip Situation.

In this paper we mainly focus on the spoofing behavior from a market taker's viewpoint. For a market maker, spoofing behavior might be rewarding as well. However, as seen in the following subsection, the rewards from spoofing are intertwined with the ones from pure market making.

We present a simple situation together with the numerical analysis in a block-shaped setting with the same model assumptions as before. We assume that the potential market maker spoofer acts as follows: At the first stage it decides to spoof with a volume $v$ at depth $k$ on the ask side to drive the price down and acquire an amount $H$ of shares after this price movement. When the market comes back to its steady state, it liquidates $H$ and eventually buys back $v$ if it has been executed. We assume that $v$ and $H$ are decided at the very beginning.

This stylised situation makes strong assumptions and simplifications. First, $H$ is decided at time 0 even if it is executed after the price movement. This is to prevent conditional optimization. Second, the liquidation of the inventory $H$ and $v$ occurs separately, once again to provide a simplified optimization problem, while we could numerically consider a liquidation of the net inventory $H - v$. Finally, a second spoofing could happen at the second stage, as in the previous section, to liquidate the inventory.

After spoofing a volume $v$ at level $k$, as soon as the price moves the spoofer executes its market order $H$ for a revenue of

$$-\left(H - v 1_{\{y>k\}}\right)\left(p + \delta\Delta + \delta(x + y)\right) - \delta G_a(H) - \delta(y - k)\, v 1_{\{y>k\}}$$

where $G_a(H) = H^2/(2a)$, $p = (p_+ + p_-)/2$, and $\Delta = (p_+ - p_-)/(2\delta)$ is the effective spread in ticks. The spoofer then waits for the market to return to its steady state and liquidates the resulting inventory with market orders. For ease of computation, we assume that it executes two market orders, one for $H$ and one for $v$ if it has been executed (combining both in terms of $H - v 1_{\{y>k\}}$ is cost effective but complicates the exposition of the result), for a revenue of

$$Hp - \delta\Delta H - \delta G_b(H) - 1_{\{y>k\}}\left(v(p + \delta\Delta) + \delta G_a(v)\right).$$

Adding both and integrating yields an average net revenue of

$$\frac{R(H, v)}{\delta} = -H\left(2\Delta + \mu_+(2\imath_k(v) - 1)\right) - G_a(H) - G_b(H) + Q_k\left[\left(k + \mu_+(2\imath_k(v) - 1)\right)v - G_a(v)\right]$$

$$= -H\left(2\Delta + \mu_+(2\imath_k(v) - 1)\right) - \frac{1}{2a\bar{\imath}}H^2 + Q_k v\left[k + \mu_+(2\imath_k(v) - 1)\right] - \frac{Q_k}{2a}v^2.$$

From this equation, we can derive the following remarks concerning the decision of the spoofer:
• If $v = 0$: This corresponds to the classical situation where a market maker takes advantage of the temporary market movement to execute a market order and cash out at a later time when the market comes back to its steady state. Clearly, it gets a positive gain if and only if $\bar{\imath} \le \frac{1}{2} - \frac{\Delta}{\mu_+}$. In particular, if the effective spread $\Delta$ is large, or if $\mu_+$ is small, then this is impossible or the initial imbalance should be very small. In the case where this happens, $\bar{H}^*$ is given by
$$\bar{H}^* = a\bar{\imath}\left(2\Delta + \mu_+(2\bar{\imath} - 1)\right)^-$$
with corresponding revenue of
$$\bar{R}^* = \frac{a\bar{\imath}}{2}\left[\left(2\Delta + \mu_+(2\bar{\imath} - 1)\right)^-\right]^2.$$
• If $H = 0$: This corresponds to the classical situation where a market maker posts limit orders at a given depth to gain from possible fluctuations. This results in a corresponding average revenue given $v$ of
$$\hat{R}(v) = Q_k v\left(k + \mu_+(2\imath_k(v) - 1)\right) - \frac{Q_k}{2a}v^2.$$
From this equation, even if the spoofer gets a positive gain of $k$ ticks by executing its order, it will drive the imbalance $\imath_k(v)$ below $1/2$ and face an adverse price movement that will offset its gains. The optimal $\hat{v}^* = \hat{v}^*(\bar{\imath})$ in that situation is not explicit, but can easily be computed numerically and corresponds to an optimal revenue of
$$\hat{R}^* = Q_k \hat{v}^*\left(k + \mu_+(2\imath_k(\hat{v}^*) - 1)\right) - \frac{Q_k}{2a}(\hat{v}^*)^2.$$

In general, solving for the optimal $H$ is straightforward with

$$H^* = a\bar{\imath}\left(2\Delta + \mu_+(2\imath_k(v) - 1)\right)^-$$

and corresponding average revenue

$$\frac{R(v)}{\delta} = \frac{a\bar{\imath}}{2}\left[\left(2\Delta + \mu_+(2\imath_k(v) - 1)\right)^-\right]^2 + Q_k v\left(k + \mu_+(2\imath_k(v) - 1)\right) - \frac{Q_k}{2a}v^2.$$

These two effects are difficult to disentangle from a true spoofing gain when $H$ as well as $v$ are strictly positive. However, this can be done numerically and the results are presented in Figure 3, where the spoofing region, $H > 0$ as well as $v > 0$, is indicated. We also provide the plots of $\bar{H}^*$ as well as $\hat{v}^*$ against $H^*$ and $v^*$.

We can however draw some stylised facts about the spoofing behavior from this market maker viewpoint. The impacts of the different parameters (initial imbalance $\bar{\imath}$, probability of getting executed $Q_k$, local sensitivity of the imbalance on the price impact $w = w_k$, as well as overall price deviation $\mu_+$) are similar to the previous case. However, in addition to the previous part, the effective spread $\Delta$ acts negatively on the spoofing opportunity in that context. Indeed, a positive market order $H$ is only triggered if $\mu_+(2\imath^* - 1) \le -2\Delta$, which requires a spoofed imbalance satisfying

$$\imath_{\mathrm{spoof}} \le -\frac{\Delta}{\mu_+} + \frac{1}{2}.$$

If $\Delta$ is too large or $\mu_+$ too low, a spoofing strategy is no longer rewarding.

Figure 3. $\imath_{\mathrm{spoof}}$ as a function of $\bar{\imath}$, with in (a) $\mu_+ = 3$, $\Delta = 0$, $k = 1$, $w_k = 0.$, $dq_i = 0.$ for all $i \ge k$. One parameter is increased each time with respect to (a), where (b): $dq_i = 0.$ for all $i \ge k$; (c): $\mu_+ = 4$; (d): $\Delta = 2$; (e): $w_k = 0.$; (f): $k = 4$. Red area: $v^* = 0$, $H^* = 0$; Dark blue area: $v^* = 0$, $H^* > 0$; Light blue area: $v^* > 0$, $H^* > 0$; White area: $v^* > 0$, $H^* = 0$.

4. CALIBRATION

According to this model, we calibrate the imbalance as well as $dp$ and $dq$ on real data provided by TMX. These datasets consist of Level 2 data from June to September 2017. Among the 1500 equities we selected 9 varying in company background, market capitalization as well as trading frequency. The Level 2 datasets include time, order price, volume, type (buy/sell; booked/cancelled/traded), order ID and counterpart order ID in case of a trade, see Figure 4.

Figure 4. Original Level 2 dataset of stock AEM provided by TMX.

Since these data are provided in diff form, a cumulative aggregation allows us to construct the full limit order book at any time, as in Figure 5.
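The cumulative aggregation can be illustrated on a toy event stream. The sketch below is not the Spark pipeline used on the TMX cluster; it assumes a hypothetical schema in which each event carries a time stamp, a side, a price in ticks and a signed volume change (positive when an order is booked, negative when it is cancelled or traded). The column names and the data are invented for illustration.

```python
import pandas as pd


def book_at(events: pd.DataFrame, t: float) -> pd.Series:
    """Outstanding volume per (side, price) at time t, from signed volume diffs."""
    past = events[events["time"] <= t]
    # Cumulative aggregation: sum all volume changes seen so far at each price level.
    book = past.groupby(["side", "price_ticks"])["delta"].sum()
    # Levels whose volume has been fully cancelled or traded are dropped.
    return book[book > 0]


# Hypothetical diff-form events: +volume when booked, -volume when cancelled/traded.
events = pd.DataFrame(
    {
        "time": [1, 2, 3, 4, 5],
        "side": ["bid", "ask", "bid", "ask", "bid"],
        "price_ticks": [999, 1001, 999, 1002, 998],
        "delta": [100, 200, -40, 150, 300],
    }
)

snapshot = book_at(events, t=4)
print(snapshot)
```

On real data the same aggregation has to be repeated for every sampling time, which is what makes the operation expensive.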

Figure 5. Generation of the full limit order book out of the Level 2 data.

This operation is computationally heavy and has therefore been carried out on a distributed data cluster of TMX with Spark.

With the full limit order books at hand we divide the calibration into three steps:
• Find a normalized sample frequency and depth $N$;
• Calibrate the imbalance generically, that is, as a function of the weights $w = \{w_k\}_{k=0,\ldots,N}$;
• Estimate $dq$, $dp^{\pm}$ and weights $w = \{w_k\}_{k=0,\ldots,N}$.

The sample frequency should be large enough such that there is enough variance in the price change, see Figure 6.

Figure 6. Left panel: Histogram of original AEM price change. Right panel: Histogram of AEM price change after sampling.

To compare across markets with different trading activity (and possibly time during the day) we fix a target variance $\sigma$ for the price movement and select the optimal frequency $f$ for each stock so as to minimize the squared distance between $\sigma_f$ and $\sigma$. For a target variance $\sigma = 2$, Table 1 gives the sample frequency for different stocks together with summary statistics of the average volume and arrival rate for market and limit orders on the bid and ask side. As for the depth $N$, we take the 99% quantile of the empirical sampled price change distribution.

Stock    f    N    Market Orders                       Limit Orders
                   Buy             Sell                Buy             Sell
                   Vol    Rate     Vol    Rate         Vol    Rate     Vol    Rate
AEM      2    5    119    0.076    120    0.070        148    6.655    148    6.856
BMO      11   4    172    0.110    168    0.130        150    2.480    152    2.560
CNR      6    4    141    0.104    134    0.100        139    2.400    142    2.401
CPG      53   5    311    0.131    95     0.116        873    1.766    885    1.746
HVU      16   4    518    0.079    464    0.078        579    1.823    577    1.810
HXU      26   3    845    0.004    902    0.004        2498   0.905    2578   1.195
PPL      26   4    142    0.101    152    0.110        167    2.312    184    2.470
SSO      28   3    133    0.031    38     0.033        353    1.992    360    1.916
VUN      50   2    305    0.001    473    0.001        2561   0.809    2690   0.738
XEG      213  3    1422   0.020    1587   0.018        3573   1.854    3659   1.926

Table 1. Stock data from June 5, 2017 to June 9, 2017. The Vol columns correspond to the average volume per second and the Rate columns correspond to the number of orders per second.

With the sampling frequency $f$ and depth $N$, we define the average imbalance at time $t$ as $\hat{\imath}(w, t) = \sum_{k \le N} \sum_{t - f \le s \ldots}$

Figure 7. Distribution of the average imbalance for different weights for the BMO stock.

Nevertheless, for each weight vector, the resulting distribution is close to a skew normal distribution. For a given weight $w$, using maximum likelihood, we fit the empirical distribution to the corresponding skew normal distribution $SN(\alpha(w), \xi(w), \omega(w))$, the fit of which is particularly good; see Figure 8 for an example.
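Such a maximum likelihood fit can be sketched with SciPy, one of the libraries used for this work. The example below runs on synthetic data rather than the BMO imbalance, and it assumes that $(\alpha, \xi, \omega)$ correspond to the shape, location and scale parameters of `scipy.stats.skewnorm`; this mapping and the parameter values are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the average imbalance sample of one weight vector.
rng = np.random.default_rng(0)
sample = stats.skewnorm.rvs(a=3.0, loc=0.4, scale=0.15, size=5000, random_state=rng)

# Maximum likelihood fit; scipy returns (shape, location, scale).
alpha, xi, omega = stats.skewnorm.fit(sample)

# Fitted density evaluated on a grid, e.g. to overlay on a histogram.
grid = np.linspace(sample.min(), sample.max(), 200)
density = stats.skewnorm.pdf(grid, alpha, loc=xi, scale=omega)
print(alpha, xi, omega)
```

The fitted density is the quantity that would later enter the likelihood as the marginal density of the average imbalance.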

Figure 8. Histogram of BMO average imbalance and fitted skew normal distribution: $w = [0., ., ., ., .]$.

The third step is to determine $dq$, $dp^{\pm}$ with the optimal weights $w$. As for $dq$, it is the probability that the price moves by $k$ ticks triggered by market orders. Thus for each market order, we compute

$$F(H) = \inf\left\{x \in \mathbb{N} : \sum_{k=0}^{x} v_k \ge H\right\}$$

where $H$ is the volume of the market order. This represents exactly how many positive ticks of price deviation an order of size $H$ will produce. We derive $dq$ from the empirical distribution.

As for $dp^{\pm}$ and $w$, a maximum likelihood estimation is implemented to solve

$$(4.1) \qquad dp^*, w^* = \arg\min_{dp^+, w}\left[-\frac{1}{M}\sum_{m=1}^{M} \log p(x_m, \hat{\imath}_m, w)\right]$$

where $x_m$ is the empirical price change, $\hat{\imath}_m$ the average imbalance for a given weight $w$, and

$$p(x_m, \hat{\imath}_m) = dp_{x_m}(\hat{\imath}_m)\, p(\hat{\imath}_m)$$

where $dp_{x_m}(\hat{\imath}_m) = \hat{\imath}_m\, dp^+_{x_m} + (1 - \hat{\imath}_m)\, dp^-_{x_m}$ represents the conditional probability of a price change equal to $x_m$ given $\hat{\imath}_m$, and $p(\hat{\imath}_m)$ is the density of the fitted skew normal distribution for a weight $w$ evaluated at $\hat{\imath}_m$.

Figure 9, illustrating the value of the optimal weight $w$ for selected stocks, shows different patterns. Overall, it turns out that the relative impact of the imbalance on the price distribution is more important away from the top of the limit order book.

We also performed this calibration procedure on stock BMO weekly from June 5th to June 30th, as well as for the first hour of trading monthly from June to September. Figure 10 provides the optimal weights in each case for BMO.

Figure 9. $w$ for stock AEM, BMO, CNR, CPG, HVU, HXU, PPL, SSO, XEG from June 5th, 2017 to June 9th, 2017.
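The tick deviation $F(H)$ used in the third step to estimate $dq$ can be sketched as follows. This is a minimal illustration, assuming that `volumes[k]` holds the resting volume $v_k$ at tick $k$ from the best price; the function names, the toy book and the order sizes are hypothetical, and a single book snapshot is reused for all orders for simplicity.

```python
import numpy as np


def ticks_consumed(volumes, H):
    """F(H) = inf{x : v_0 + ... + v_x >= H} for resting volumes v_k at tick k."""
    cumulative = np.cumsum(volumes)
    # First index whose cumulative volume reaches H.
    x = int(np.searchsorted(cumulative, H, side="left"))
    if x >= len(cumulative):
        raise ValueError("market order exceeds the displayed depth")
    return x


def empirical_dq(volumes, order_sizes):
    """Empirical distribution of the tick deviation over a set of market orders."""
    deviations = [ticks_consumed(volumes, H) for H in order_sizes]
    counts = np.bincount(deviations, minlength=len(volumes))
    return counts / counts.sum()


book = [100, 50, 200]  # hypothetical resting volumes at ticks 0, 1, 2
dq = empirical_dq(book, order_sizes=[80, 120, 160, 100])
print(dq)
```

In practice each market order would be matched against the book observed right before its execution.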

Figure 10. Left panel: $w$ for stock BMO each week in June, 2017. Right panel: $w$ for stock BMO each month in June, 2017 only using first hour trading data.

Notice that for the first hour of trading the optimal weights are more consistent across time, but all show that the weight impact on the price movement happens deeper in the limit order book.

As for the corresponding $dp^+$ and $dq$, they are represented in Figure 11 for stock BMO from June 5th, 2017 to June 9th, 2017. As expected, $dp^+$, representing the price movement when the imbalance is large, is skewed to the right.

Figure 11. $dp^+$ and $dq$ for BMO from June 5th to June 9th.

5. APPROACHES TO SPOOFING DETECTION

For reasons mentioned in the introduction, it is difficult from a regulatory viewpoint to figure out a posteriori whether or not spoofing happened. According to the theoretical part, the act of spoofing will influence the resulting imbalance. However, monitoring the imbalance directly is akin to contemplating pure noise, as shown in Figure 12.

Figure 12. Imbalance of stock BMO from 09:30 to 16:00 on June 7, 2017.

In the following, we propose some possible ways to perform such a monitoring based on the theoretical results. The strategy comes from the following observation: For a spoofing strategy to be successfully fulfilled, a market order has to be executed (in this paper, we do not consider spoofing strategies involving only limit orders). Hence when observing an executed market order two situations may happen:
• The market order is a legitimate one. In that case, the imbalance before this market order $\imath_-$ and after $\imath_+$ should follow statistically the classical long run behavior. In other words, in a legitimate situation, we should observe statistically the pair $(\imath_-, \imath_+)$ for each market order.
• The market order is the result of a spoofing behavior. The implicit equilibrium without spoofing would be the pair $(\imath_-, \imath_+)$. After the market order is executed, the market imbalance should be back to its equilibrium $\imath_+$. However before the market order, the spoofer observes the implicit imbalance $\imath_-$ and decides to spoof according to this information, sending to the market $\imath_{\mathrm{spoof}}(\imath_-)$ instead of $\imath_-$. The resulting observation for those spoofed market orders is therefore the pair $(\imath_{\mathrm{spoof}}(\imath_-), \imath_+)$.

Furthermore, spoofing strategies are supposed to happen sporadically but intensively within a short time horizon. Before presenting some strategies, let us fix some notations:
• $\Pi = \{t_1 < t_2 < \ldots < t_M\}$ represents the time stamps of each (buy) market order in a long sample (several weeks, either for the full day or a selected time of the day). For ease of notation, we relabel it by $\{1, \ldots, M\}$.
• $\hat{\imath}_-(t)$ and $\hat{\imath}_+(t)$ represent the imbalance before and after the market order happening at time $t$ in $\Pi$.
• $(\imath_-, \imath_+)$ represents the joint distribution of the imbalance right before and after each market order, fitted to the overall data. We assume that this represents the stable behavior of the market without spoofing.
• $(\hat{\imath}^N_-(t), \hat{\imath}^N_+(t))$ represents the empirical distribution generated by the last $s = t, \ldots, t - N + 1$ observed imbalances $(\hat{\imath}_-(s), \hat{\imath}_+(s))$, where $N \ll M$ is a short horizon sample size.
• The previous theoretical part, even if not explicit in terms of solution, allows us to compute numerically $\imath_{\mathrm{spoof}}(\imath_-)$ for a given implicit imbalance $\imath_-$.

5.1. Monitoring $\hat{\imath}^N_-$. A first idea is to monitor the short term behavior $\hat{\imath}^N_-$ as time passes to test whether it is statistically different from the equilibrium $\imath_-$. This is however not adequate for the following reasons. First, this is not related to spoofing behavior and might reflect some other market patterns. It is also not clear how to derive the magnitude of a potential spoofing behavior. Second, and more importantly, the sequence of $\hat{\imath}_-(t)$ for each market order is highly dependent. Indeed, there might exist market conditions (bullish, bearish, etc.) such that a short horizon sample $\hat{\imath}^N_-$ differs strongly from the long term behavior. Figure 13 provides empirical evidence about the sequential dependence of the imbalance $\hat{\imath}_-$ as well as $\hat{\imath}_+$ over time.
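The sequential dependence mentioned above is typically assessed with the sample autocorrelation and its 95% band under the white noise hypothesis, $\pm 1.96/\sqrt{n}$. The following sketch uses a synthetic autoregressive series as a stand-in for the sequence of $\hat{\imath}_-(t)$; the series, its persistence parameter and the function name are assumptions for illustration and not the BMO data.

```python
import numpy as np


def autocorr(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - h], x[h:]) / denom for h in range(max_lag + 1)])


# Synthetic persistent series (AR(1) with coefficient 0.8), standing in for the imbalance.
rng = np.random.default_rng(1)
noise = rng.standard_normal(2000)
series = np.empty_like(noise)
series[0] = noise[0]
for t in range(1, len(noise)):
    series[t] = 0.8 * series[t - 1] + noise[t]

acf = autocorr(series, max_lag=20)
band = 1.96 / np.sqrt(len(series))  # 95% band if the series were white noise
significant_lags = np.nonzero(np.abs(acf[1:]) > band)[0] + 1
print(significant_lags)
```

Lags whose autocorrelation falls outside the band indicate that consecutive observations cannot be treated as independent.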

Figure 13. Left panel: $\hat{\imath}_-$ autocorrelation of stock BMO on June 7, 2017. Right panel: $\hat{\imath}_+$ autocorrelation of stock BMO on June 7, 2017. Red area is the 95% confidence interval of the autocorrelation.

5.2. Monitoring $(\hat{\imath}^N_-, \hat{\imath}^N_+)$. The statistical link to spoofing is the additional observation of the imbalance after the spoofing happens. This provides statistical a posteriori information about the implicit market equilibrium before spoofing, which in the case of spoofing cannot be observed. Figure 14 shows on the left panel the joint distribution $(\imath_-, \imath_+)$, while the right panel represents, based on the model of the theoretical part and the calibration, what the joint distribution $(\imath_{\mathrm{spoof}}(\imath_-), \imath_+)$ would be in the case of spoofing. The spoofed joint distribution is skewed to the left in comparison to the non-spoofed one, in accordance with the theoretical analysis that spoofing decreases the imbalance, in the buy order case, before a market order.
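The map $\imath_- \mapsto \imath_{\mathrm{spoof}}(\imath_-)$ used to build the spoofed joint distribution has no closed form, but it can be computed by root finding. The sketch below assumes the implicit equation of Proposition 3.4 in the form $\bar{\imath}/\imath = 1 + (1-\bar{\imath})\frac{w}{Q}[2\rho w\mu_+\frac{1-\bar{\imath}}{\bar{\imath}}\imath^2 - (Q\rho+\nu)]^+$ and uses Brent's method from SciPy; the function names and all parameter values are hypothetical.

```python
from scipy.optimize import brentq


def _bracket(i, i_bar, Q, nu, w, rho, mu_plus):
    """Positive part appearing in Proposition 3.4 (assumed form)."""
    return max(2.0 * rho * w * mu_plus * (1.0 - i_bar) / i_bar * i**2 - (Q * rho + nu), 0.0)


def spoofed_imbalance(i_bar, Q, nu, w, rho, mu_plus):
    """Solve i_bar / i = 1 + (1 - i_bar) * (w / Q) * [ ... ]^+ for i in (0, i_bar]."""
    if _bracket(i_bar, i_bar, Q, nu, w, rho, mu_plus) == 0.0:
        return i_bar  # no-spoofing condition holds: the imbalance is left unchanged

    def g(i):
        return i_bar / i - 1.0 - (1.0 - i_bar) * w / Q * _bracket(i, i_bar, Q, nu, w, rho, mu_plus)

    # g is strictly decreasing, positive near 0 and negative at i_bar: unique root.
    return brentq(g, 1e-12, i_bar)


def spoof_volume(i_spoof, i_bar, a, Q, nu, w, rho, mu_plus):
    """Optimal spoofing volume associated with the spoofed imbalance."""
    return a / Q * _bracket(i_spoof, i_bar, Q, nu, w, rho, mu_plus)


params = dict(Q=0.2, nu=0.1, w=1.0, rho=1.0, mu_plus=1.0)  # illustrative values only
i_s = spoofed_imbalance(0.5, **params)
v_s = spoof_volume(i_s, 0.5, a=1000.0, **params)
print(i_s, v_s)
```

Applying `spoofed_imbalance` to each point of a sample of $\imath_-$ gives a sample of the spoofed imbalance under the chosen parameters.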

Figure 14. Left panel: Empirical joint distribution of $(\imath_-, \imath_+)$. Right panel: Joint distribution of $(\imath_{\mathrm{spoof}}(\imath_-), \imath_+)$.

A possible way to detect spoofing is therefore to compare the long run distribution $(\imath_-, \imath_+)$ with the short term empirical distribution $(\hat{\imath}^N_-, \hat{\imath}^N_+)$. These two distributions encode the possibility to disentangle legitimate market behavior from spoofed behavior. However, as in the previous approach, the sequence of joint observations is once again not iid, as previously mentioned and pictured in Figure 13. Over a short time horizon, the market may be legitimate, though far away from the long run distribution.

5.3. Monitoring $\hat{\imath}^N_-$ conditioned on $\hat{\imath}^N_+$. To overcome the previous problem, the next approach is to monitor $\hat{\imath}^N_-$ conditioned on the current $\hat{\imath}^N_+$. From our hypothesis, $\imath_+$ represents the current steady state of the market at equilibrium. It turns out that, conditioned on $\hat{\imath}_+(t)$, the sequence of $\hat{\imath}_-(t)$ is closer to iid.

Figure 15. Left panel: Autocorrelation of $\hat{\imath}^N_- \mid \hat{\imath}^N_+ \in [0., .]$ of stock BMO on June 7, 2017. Right panel: Autocorrelation of $\hat{\imath}^N_- \mid \hat{\imath}^N_+ \in [0., .]$ of stock BMO on June 7, 2017. Red area is the 95% confidence interval of the autocorrelation.

In order to detect spoofing behavior, instead of adopting a statistical test for which some parametric assumptions on the distribution have to be made, we measure the distance between $\imath_-$ and $\hat{\imath}^N_-$ conditioned on the current observed imbalance $\hat{\imath}^N_+$ using a conditional 2-Wasserstein distance.

For two distributions $\mu$ and $\nu$ the 2-Wasserstein distance is defined as

$$W(\mu, \nu) = \left(\inf\left\{\int (x - y)^2\, \pi(dx, dy) : \pi_1 \sim \mu,\ \pi_2 \sim \nu\right\}\right)^{1/2} = \left(\int_0^1 \left(q_\mu(\alpha) - q_\nu(\alpha)\right)^2 d\alpha\right)^{1/2}.$$

From a generic perspective, the conditional distance we consider is as follows: If we assume that

$$(\imath_-, \imath_+) \sim K(y, dx) \otimes \mu(dy)$$

where $\mu \sim \imath_+$ and $K(y, \cdot) \sim \imath_- \mid \imath_+ = y$, it follows that

$$(\imath_{\mathrm{spoof}}(\imath_-), \imath_+) \sim K_{\mathrm{spoof}}(y, dx) \otimes \mu(dy)$$

where $K_{\mathrm{spoof}}(y, \cdot) = K(y, \cdot) \circ \imath_{\mathrm{spoof}}^{-1}$. Hence given $\imath_+ = y$, we have

$$W\left(K(y, \cdot), K_{\mathrm{spoof}}(y, \cdot)\right) = \left(\int_0^1 \left(q_{K(y,\cdot)}(\alpha) - \imath_{\mathrm{spoof}}\left(q_{K(y,\cdot)}(\alpha)\right)\right)^2 d\alpha\right)^{1/2}.$$

Heuristically we wish to monitor the following two quantities:
• $W\left(\imath_-, \hat{\imath}^N_-\right) \mid \hat{\imath}^N_+$, the distance from the short term imbalance $\hat{\imath}^N_-$ to the equilibrium imbalance $\imath_-$ given that $\imath_+ \sim \hat{\imath}^N_+$;
• $W\left(\imath_{\mathrm{spoof}}(\imath_-), \hat{\imath}^N_-\right) \mid \hat{\imath}^N_+$, the distance from the short term imbalance $\hat{\imath}^N_-$ to the spoofed imbalance $\imath_{\mathrm{spoof}}$ given that $\imath_+ \sim \hat{\imath}^N_+$.

From the data, we can calibrate the joint distribution $(\imath_-, \imath_+)$ as well as $(\imath_{\mathrm{spoof}}(\imath_-), \imath_+)$. Hence, we have a parametrization of $K(y, dx)$ and $K_{\mathrm{spoof}}(y, dx)$ for every $y$. However, for each value $\hat{\imath}_+(l)$ from the discrete distribution $\hat{\imath}^N_+$ we only have a single sample point $\hat{\imath}_-(l)$ at hand.
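For one-dimensional samples, the quantile representation of the 2-Wasserstein distance given above lends itself to a direct numerical approximation. The sketch below evaluates empirical quantiles on a regular grid of levels $\alpha$ and applies a midpoint rule to the integral; the function name, the grid size and the use of linearly interpolated quantiles from `numpy.quantile` are implementation choices made for illustration, not the authors' code.

```python
import numpy as np


def wasserstein2(x, y, n_grid=1000):
    """Approximate the 2-Wasserstein distance between two one-dimensional samples.

    Uses W^2 = int_0^1 (q_x(alpha) - q_y(alpha))^2 d alpha with a midpoint rule.
    """
    alphas = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of a regular grid on (0, 1)
    qx = np.quantile(x, alphas)
    qy = np.quantile(y, alphas)
    return float(np.sqrt(np.mean((qx - qy) ** 2)))


# Synthetic example: a sample and a copy shifted by 0.3 are at distance 0.3.
rng = np.random.default_rng(2)
reference = rng.normal(loc=0.5, scale=0.1, size=400)
shifted = reference + 0.3
print(wasserstein2(reference, shifted))
```

Because the distance only involves quantile functions, the two samples do not need to have the same size.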
To overcome the fact that only a single sample point $\hat{\imath}_-(l)$ is available for each observed value $\hat{\imath}_+(l)$, we bucket the values of $\hat{\imath}_+(l)$ in the sample of $\hat{\imath}^N_+$ to get a kernel approximation of $\hat{\imath}^N_-$.

The monitoring strategy at a given time $t$ in $\Pi$ is as follows:
1 - Consider the discrete short term joint distribution $(\hat{\imath}^N_-(t), \hat{\imath}^N_+(t))$ given by the sample $(\hat{\imath}_-(s), \hat{\imath}_+(s))$, $s = t, \ldots, t - N + 1$, of the last $N$ pairs of imbalances before time $t$.
2 - We define the following $L$ buckets of equal cardinality $N/L$
$$J_l = \left\{s : s = t, \ldots, t - N + 1,\ q_{\hat{\imath}^N_+}\left(\frac{l-1}{L}\right) \le \hat{\imath}_+(s) < q_{\hat{\imath}^N_+}\left(\frac{l}{L}\right)\right\}, \quad l = 1, \ldots, L$$
as well as the mid point of each
$$\imath_l = \frac{L}{N}\sum_{s \in J_l} \hat{\imath}_+(s).$$
3 - For each $l$, we generate a random sample $\imath^{N,l}_-$ and $\imath^{N,l}_{\mathrm{spoof}}$ of $N/L$ points each, drawn from $K(\imath_l, \cdot)$ and $K_{\mathrm{spoof}}(\imath_l, \cdot)$, respectively.
4 - For each $l$ we compute the Wasserstein distances
$$W\left(\imath^{N,l}_-, \hat{\imath}^{N,l}_-\right) \quad \text{and} \quad W\left(\imath^{N,l}_{\mathrm{spoof}}, \hat{\imath}^{N,l}_-\right)$$
where $\hat{\imath}^{N,l}_-$ is the discrete distribution out of the sample $\hat{\imath}_-(s)$ for $s$ in $J_l$. These are approximations of the Wasserstein distances $W\left(\imath_-, \hat{\imath}^N_-\right) \mid \hat{\imath}^N_+ \approx \imath_l$ and $W\left(\imath_{\mathrm{spoof}}, \hat{\imath}^N_-\right) \mid \hat{\imath}^N_+ \approx \imath_l$.
5 - We average over the buckets to obtain the indicators
$$S\left(\imath_-, \hat{\imath}^N_- \mid \hat{\imath}^N_+\right) := \frac{1}{L}\sum_{l=1}^{L} W\left(\imath^{N,l}_-, \hat{\imath}^{N,l}_-\right), \qquad S\left(\imath_{\mathrm{spoof}}, \hat{\imath}^N_- \mid \hat{\imath}^N_+\right) := \frac{1}{L}\sum_{l=1}^{L} W\left(\imath^{N,l}_{\mathrm{spoof}}, \hat{\imath}^{N,l}_-\right).$$

Remark 5.1. To enhance the accuracy of this indicator, we run steps 3 to 5 a couple of times with different samples and average again.

Regarding the calibrated joint distributions, $(\imath_-, \imath_+)$ fits well with a joint normal distribution, while $(\imath_{\mathrm{spoof}}(\imath_-), \imath_+)$ fits well with a skew normal distribution, see Figure 14. Other parametrizations could also be used.

Figure 16. $S\left(\imath_{\mathrm{spoof}}, \hat{\imath}^N_- \mid \hat{\imath}^N_+\right)$ from June 5, 2017 to June 9, 2017. From top to bottom are stock BMO, CNR, CPG, PPL respectively. Blue area is the confidence interval of $S\left(\imath_-, \hat{\imath}^N_- \mid \hat{\imath}^N_+\right)$. Red area is where $S\left(\imath_{\mathrm{spoof}}, \hat{\imath}^N_- \mid \hat{\imath}^N_+\right) \le S\left(\imath_-, \hat{\imath}^N_- \mid \hat{\imath}^N_+\right)$.

6. CONCLUSION

In this paper we address the question of quantitatively assessing potential spoofing behavior in high frequency trading. In a one period setting we present how a spoofing strategy from a market taker or maker is designed by manipulating the imbalance at different depth levels to impact the subsequent price movement. We provide and discuss the conditions for the market to allow for spoofing manipulations. We subsequently solve the optimization problem from a spoofer's perspective and derive and discuss the resulting imbalance after spoofing as a function of the market parameters. We calibrate the weighted imbalance and price movement impact to Level 2 data provided by TMX. Using these results we propose a quantification instrument to monitor in real time potential spoofing behavior on the market using a conditional Wasserstein distance. We illustrate these results on the data provided by TMX.

This approach is by no means a definitive answer to spoofing detection but rather a first take on the problem. The dynamic structure of the limit order book and strategy, the memory dependence of the parameters over time, as well as the specificities of one market with respect to another are left to further study. Furthermore, there might be alternative approaches subject to new research directions (monitoring arrival rates of orders, frequency of booking/cancelling, etc.) that could complement such a monitoring approach.

7. CREDITS

We thank the Fields Institute for the organization of the series "Fields-China Joint Industrial Problem Solving Workshop" through which we got involved in this problem. Thanks to TMX and their data analysis team for providing unique access to large datasets, high performance computing facilities as well as precious insights concerning high frequency trading. Finally, the data analysis could not have been performed without the outstanding work and dedication of the open source community for the development of scientific computing and visualisation libraries and platforms such as NumPy [13], SciPy [23], Mystic [19], Pandas [20, 25], Apache Spark and Plotly, to name the most relevant for this work.

APPENDIX A. APPENDIX

A.1. Sampling Frequency $f$. In real markets there exists a large amount of orders which are posted "far away" from the best bid/ask price or cancelled right after being booked. These orders have little impact on the price, so if the best bid/ask price change is computed as the best bid/ask price difference between 2 consecutive orders, more than 96% of the time the price does not move, as shown in the left panel of Figure 6. Instead, we extract orders at regular intervals of a given number of seconds and compute price changes in this "new" dataframe. The sample frequency $f$ is the smallest number of seconds for which the variance of sampled price changes $\sigma_f$ is the closest to a given benchmark $\sigma$,

$$\arg\min_{f > 0} |\sigma - \sigma_f|.$$

When $f = 2\,$s, stock AEM, which is one of the most active stocks on the TSX, has a wider price change distribution and its variance is approximately equal to 2. So taking $\sigma = 2$, we conduct tests for different stocks and do see improvements of the price change distribution, as shown in the right panel of Figure 6.

A.2. Computation of $\imath_{\mathrm{spoof}}$. For $t \in \Pi$, $\hat{\imath}_-(t)$ and $\hat{\imath}_+(t)$ are computed as $\hat{\imath}_-(t) = \sum_{k \le N} \sum_{t - f \le s \ldots}$

SSRN Electronic Journal, 02 2010.
[2] A. Alfonsi, A. Fruth, and A. Schied. Optimal execution strategies in limit order books with general shape functions. Quantitative Finance, 10, 08 2007.
[3] F. Allen and D. Gale. Stock-price manipulation. Review of Financial Studies, 5:503–29, 02 1992.
[4] K. Bechler and M. Ludkovski. Order flows and limit order book resiliency on the meso-scale. Market Microstructure and Liquidity, 3(03n04):1850006, 2017.
[5] Y. Cao, Y. Li, S. Coleman, A. Belatreche, and T. M. McGinnity. Detecting price manipulation in the financial market. In , pages 77–84, 2014.
[6] Y. Cao, Y. Li, S. Coleman, A. Belatreche, and T. M. McGinnity. Adaptive hidden Markov model with anomaly states for price manipulation detection. IEEE Transactions on Neural Networks and Learning Systems, 26(2):318–330, 2015.
[7] Á. Cartea, R. Donnelly, and S. Jaimungal. Enhancing trading strategies with order book signals. Applied Mathematical Finance, 25(1):1–35, 2018.
[8] Á. Cartea, S. Jaimungal, and Y. Wang. Spoofing and price manipulation in order-driven markets. Applied Mathematical Finance, pages 1–32, 02 2020. doi:10.1080/1350486X.2020.1726783.
[9] R. Cont, A. Kukanov, and S. Stoikov. The price impact of order book events. Journal of Financial Econometrics, 12(1):47–88, 2014.
[10] J. Gatheral. No-dynamic-arbitrage and market impact. Quantitative Finance, 10(7):749–759, 2010.
[11] J. Gatheral and A. Schied. Dynamical models of market impact and algorithms for order execution. SSRN Electronic Journal, 01 2013.
[12] M. Gould and J. Bonart. Queue imbalance as a one-tick-ahead price predictor in a limit order book. Market Microstructure and Liquidity, 12 2015.
[13] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020. ISSN 1476-4687.
[14] O. Hart and D. Kreps. Price destabilizing speculation. Journal of Political Economy, 94(5):927–52, 1986.
[15] R. Jarrow. Market manipulation, bubbles, corners, and short squeezes. Journal of Financial and Quantitative Analysis, 27:311–336, 09 1992.
[16] E. J. Lee, K. S. Eom, and K. S. Park. Microstructure-based manipulation: Strategic behavior and performance of spoofing traders. Journal of Financial Markets, 16(2):227–252, 2013.
[17] A. Lipton, U. Pesavento, and M. Sotiropoulos. Trade arrival dynamics and quote imbalance in a limit order book. Preprint, 12 2013.
[18] E. M. Miranda, P. McBurney, and M. J. Howard. Learning unfair trading: A market manipulation analysis from the reinforcement learning perspective. , pages 103–109, 2016.
[19] M. M. McKerns, L. Strand, T. Sullivan, A. Fang, and M. A. G. Aivazis. Building a framework for predictive science. Proceedings of the 10th Python in Science Conference, 2011.
[20] J. Reback, W. McKinney, jbrockmendel, J. V. den Bossche, T. Augspurger, P. Cloud, gfyoung, Sinhrks, A. Klein, M. Roeschke, S. Hawkins, J. Tratner, C. She, W. Ayd, T. Petersen, M. Garcia, J. Schendel, A. Hayden, MomIsBestFriend, V. J. L. Rechenzentrum, P. Battiston, S. Seabold, chris b1, h vetinari, S. Hoyer, W. Overmeire, alimcmaster1, K. Dong, C. Whelan, and M. Mehyar. pandas-dev/pandas: Pandas. Software, Feb. 2020.
[21] G. Shorter and R. Miller. High-frequency trading: Background, concerns, and regulatory developments. Technical report, Congressional Research Service, 04 2015.
[22] J. A. Sirignano. Deep learning for limit order books. Quantitative Finance, 19(4):549–570, 2019.
[23] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, A. Vijaykumar, A. P. Bardelli, A. Rothberg, A. Hilboll, A. Kloeckner, A. Scopatz, A. Lee, A. Rokem, C. N. Woods, C. Fulton, C. Masson, C. Häggström, C. Fitzgerald, D. A. Nicholson, D. R. Hagen, D. V. Pasechnik, E. Olivetti, E. Martin, E. Wieser, F. Silva, F. Lenders, F. Wilhelm, G. Young, G. A. Price, G.-L. Ingold, G. E. Allen, G. R. Lee, H. Audren, I. Probst, J. P. Dietrich, J. Silterra, J. T. Webber, J. Slavič, J. Nothman, J. Buchner, J. Kulick, J. L. Schönberger, J. V. de Miranda Cardoso, J. Reimer, J. Harrington, J. L. C. Rodríguez, J. Nunez-Iglesias, J. Kuczynski, K. Tritz, M. Thoma, M. Newville, M. Kümmerer, M. Bolingbroke, M. Tartre, M. Pak, N. J. Smith, N. Nowaczyk, N. Shebanov, O. Pavlyk, P. A. Brodtkorb, P. Lee, R. T. McGibbon, R. Feldbauer, S. Lewis, S. Tygier, S. Sievert, S. Vigna, S. Peterson, S. More, T. Pudlik, T. Oshima, T. J. Pingel, T. P. Robitaille, T. Spura, T. R. Jones, T. Cera, T. Leslie, T. Zito, T. Krauss, U. Upadhyay, Y. O. Halchenko, Y. Vázquez-Baeza, and SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, March 2020. ISSN 1548-7105.
[24] Y. Wang. Strategic spoofing order trading by different types of investors in the futures markets. Wall Street Journal, 2015.
[25] Wes McKinney. Data Structures for Statistical Computing in Python. In Stéfan van der Walt and Jarrod Millman, editors, Proceedings of the 9th Python in Science Conference, pages 56–61, 2010. doi: 10.25080/Majora-92bf1922-00a.
[26] K. Xu, M. Gould, and S. Howison. Multi-level order-flow imbalance in a limit order book. Preprint, 07 2019.

SHANGHAI JIAO TONG UNIVERSITY, SHANGHAI, CHINA
E-mail address: [email protected]

WESTERN UNIVERSITY, CANADA
E-mail address: [email protected]

SHANGHAI JIAO TONG UNIVERSITY (NOW AT PING AN TECHNOLOGY), SHANGHAI, CHINA
E-mail address: [email protected]
E-mail address: [email protected]

SHANGHAI JIAO TONG UNIVERSITY, SHANGHAI, CHINA
E-mail address: [email protected]