A continuous and efficient fundamental price on the discrete order book grid

Abstract

This paper develops a model of liquidity provision in financial markets by adapting the Madhavan, Richardson, and Roomans (1997) price formation model to realistic order books with quote discretization and liquidity rebates. We postulate that liquidity providers observe a fundamental price which is continuous, efficient, and can assume values outside the interval spanned by the best quotes. We confirm the predictions of our price formation model with extensive empirical tests on large high-frequency datasets of 100 liquid Nasdaq stocks. Finally we use the model to propose an estimator of the fundamental price based on the rebate adjusted volume imbalance at the best quotes and we empirically show that it outperforms other simpler estimators.


Julius Bonart ∗, Fabrizio Lillo

August 8, 2016

1: Department of Computer Science, University College London and CFM-Imperial Institute of Quantitative Finance, Imperial College London
2: Scuola Normale Superiore, Piazza dei Cavalieri 7, Pisa, Italy


Keywords— price formation; liquidity provision; tick size; market microstructure

JEL classification— G10

∗ Corresponding author: [email protected]

Introduction

We say that a market price is efficient if it unambiguously reflects all available information, is arbitrage-free, and is unpredictable. But different types of microstructural frictions prevent the observed price from freely reflecting the efficient price. Among these, the price discretization implemented in most modern markets plays a major role. As a result, there is no reason to expect that either best quote, or the mid-price (their average), coincides with an efficient fundamental market price.

Efficient price dynamics are somewhat paradoxical because many other financial time series display long memory: A striking example is the auto-correlation of the transaction signs [Hasbrouck and Ho, 1987, Biais et al., 1995, Bouchaud et al., 2004, Farmer and Lillo, 2004]. A central endeavour in market microstructure is to reconcile the predictability of trade signs with price efficiency, i.e. to uncover the adequate price formation process. Following Glosten and Milgrom [1985], Madhavan et al. [1997] (henceforth MRR) developed a simple theory of price formation: Prices are impacted proportionally to the innovation in the transaction history (order flow). This alone removes all price predictability when transaction signs are correlated. Included in their theory is Glosten and Milgrom [1985]’s assumption that market makers avoid ex-post regrets by setting a finite spread. Accordingly, the theory yields relationships between price impact, trade sign correlations, and the bid-ask spread.

Because MRR’s model disregards the role of price discretization, it is not a priori clear that MRR’s predicted relationships between market variables also hold for large tick stocks. Hitherto, only some of these predictions have been empirically assessed in the literature. We perform here the first systematic test of MRR’s general relationships and explicitly study how the tick size affects their performance.
Although MRR’s model assumed quotes and prices to be continuous, we find that some of the predicted relationships between market variables are surprisingly accurate even when the tick is large and discretization plays, a priori, an important role.

More specifically, in this paper we shall argue that MRR’s original idea can be adapted to large tick stocks by postulating the existence of an underlying efficient continuous market price. From this assumption we derive price formation equations for large tick stocks which reproduce, on average, the classical MRR relations. Our framework also predicts that price discretization does become important for second order price statistics (i.e. covariances and correlations). Thus, our model of liquidity provision explains how relationships between quantities depending linearly on the price hold regardless of the price discretization, whereas the tick size heavily influences quantities depending quadratically on the price.

In our model, limit order queues deplete when the fundamental price moves outside a certain interval around the mid price. We find that the length of this characteristic interval is typically larger than one tick and that the mid-price dynamics are characterised by an interesting “stickiness”. We show that this behaviour can be entirely explained by the existing rebate structure offered by the exchange to liquidity providers.

Our framework provides a straightforward way to reconcile the assumption of an efficient fundamental price with the inherent properties of financial markets. Whereas many empirical studies have focused on market dynamics on ultra high frequency time scales, little attention has been paid to the spatial dimension of price dynamics in modern financial markets. Because most liquidity provision in financial markets is nowadays channelled through high frequency market makers, important aspects of market quality are naturally related to the properties of price changes on the smallest temporal and spatial scales.
Our work takes an important step towards a unified theory of price formation and liquidity provision in small and large tick order books. Our point of view also differs significantly from the perspective taken in some of the previous literature on the subject. We believe that our empirical observations suggest that price changes are mostly governed by strategic considerations of liquidity providers who agree on a hidden fundamental price. For example, we observe that a queue depletion at the best quotes entails, with high probability, a permanent price change. Order book models with purely stochastic dynamics, so-called “zero intelligence models”, are not capable of reproducing this simple fact unless they benefit from additional assumptions.

In large tick stocks the volumes at the best quotes contain a significant amount of information about the direction of the market [Gould and Bonart, 2015]. The second part of our paper hence proceeds by defining an approximate fundamental price in large tick stocks by using the squares of the available volumes at the best bid and ask. This proxy can take continuous values within and beyond the region spanned by the bid and ask. We argue that this quantity performs better, as a proxy of the fundamental price, than the generalized linearly volume weighted prices previously discussed in the literature. We assess the performance of our proxy by showing that (i) it incorporates to a large extent the information conveyed by the state of the order book, and (ii) it behaves very much like the fundamental price, in that it approximately follows the MRR price formation rule. Having this proxy at hand allows us to reach further than much of the previous price formation literature, in that we can filter out exogenous public information shocks from the fundamental price dynamics and study their effects on the future order flow.

Footnote: In a large tick stock the ratio between tick size and price is relatively large and the spread is almost always equal to one tick.
An interesting empirical finding of the second part of this paper is that the sign of the next trade is positively correlated with the exogenous information shocks.

Tick sizes are not expected to decrease in the near future, since regulation authorities express their explicit desire to “prevent a race to the bottom” [Government Office for Science, 2015]. European regulation agencies currently aim to establish a pan-European tick size regime “to prevent the use of the tick size as a competition tool” [MiFID II Wholesale Firms Conference, 2015] within the MiFID. The introduction of MiFID II requires that standardized tick size bands, which depend on the valuation of the regulated security, must be adopted across Europe. In the United States, the SEC announced in a press release on May 6 2015 that it “approved a proposal [...] for a two-year pilot program that would widen the minimum quoting and trading increments...”, i.e. the tick size, for stocks of some smaller companies [Securities and Exchange Commission, 2015]. This reflects a general mood inclined towards freezing the tick sizes or even increasing them. This stance of the regulation agencies is supported by some academic research, although no clear consensus seems to have emerged [Biais et al., 2005]. To summarize, we expect that the problem studied in this paper will remain relevant for a relatively long time and continue to be a challenge to regulators and market participants for the foreseeable future.

This paper is structured as follows: In Sec. 2 we discuss the relevant literature. Sec. 3 reviews the functioning of limit order books. Sec. 4 presents the empirical datasets used in this study. We analyse the empirical implications of the MRR price formation model in Sec. 5. Sec. 6 presents our generalization of the MRR model to large tick LOBs; within it, Sec. 6.1 discusses the efficiency of the mid price in large tick LOBs and the role of liquidity rebates. We analyse first order and second order price statistics in small and large tick LOBs.
A proxy of the fundamental price is developed in Sec. 7. Finally, in Sec. 8 we draw some conclusions.

Footnote: The Markets in Financial Instruments Directive (MiFID) is the EU legislation regulating firms providing services to clients linked to ’financial instruments’ (shares, bonds, units in collective investment schemes, and derivatives), and the venues where those instruments are traded.

The MRR model proposed a price formation process which accommodates the strong serial correlations of the trade signs [Hasbrouck and Ho, 1987, Biais et al., 1995, Bouchaud et al., 2004, Farmer and Lillo, 2004, Toth et al., 2015, Gould et al., 2015] with the very weak auto-correlation of price returns. The MRR price formation model relies heavily on the theory of market making developed in Glosten and Milgrom [1985]. In recent years, Glosten and Milgrom [1985]’s work was significantly refined, for example in Wyart et al. [2008], Bouchaud et al. [2009]. However, despite the considerable recent effort in this field, no convincing theory for large tick stocks has emerged so far. History dependent impact was discussed again in Farmer and Lillo [2004], Farmer et al. [2006]. The expressions “asymmetric/adaptive liquidity provision” emerged to account for the impact asymmetry as a function of the order sign predictor [Taranto et al., 2014]. MRR’s transaction dependent price formation model was merged with this framework in Bouchaud et al. [2009], albeit with some reluctance to use the concept of a “fundamental price”.

In the econometrics literature the discrepancy between the efficient price and the quoted price is called market microstructural noise. The past literature mostly addressed the problem of filtering the market microstructural noise to uncover underlying quantities, such as feedback reactions between market participants [Bacry et al., 2013], fundamental price jumps [Lee and Mykland, 2015], the volatility [Andersen et al., 2011, Bandi and Russel, 2008, Ghysels and Sinko, 2011, Jacod et al., 2009], and the influence of the tick size on the noise [Large, 2011, Curato and Lillo, 2015]. Ball and Chordia [2001] discussed the influence of the tick size on the adverse selection component of bid-ask spreads.

Robert and Rosenbaum [2011], Huang et al. [2013] proposed an explicit method to reconstruct an efficient price from discrete high frequency data. Some aspects of our paper superficially resemble their work, in that we also propose a method to reconstruct a fundamental price from high frequency LOB data. However, in contrast to the method presented in Robert and Rosenbaum [2011], we achieve the reconstruction by using the best quotes and the displayed volumes in the LOB. Robert and Rosenbaum [2011] use only the previous return and an ad-hoc quantity “η” to infer an efficient price. The parameter η corresponds to the excess width of “uncertainty regions” where the efficient price is undefined. While this leads to mid price stickiness similar to what we observe in our model, we do not believe that the characteristic intervals around the mid-price reflect any uncertainty of liquidity providers about the fundamental price. Rather, we suggest that market making can be profitable even when the distance between the fundamental price and the mid price is larger than a half tick because of market making rebates offered by the exchanges; volumes are therefore not immediately cancelled when the fundamental price moves outside the interval between the bid and the ask, and the mid price becomes sticky. We follow here Harris [2013], who discusses the consequences of market making and taking rebates/fees for quote setting in limit order markets.
Finally, Jaisson [2015] developed a model in which a fundamental price emerges as a consequence of a fair pricing condition for market makers.

Alternatives to MRR’s price formation theory emerged as well. Bouchaud et al. [2004] introduced the “propagator model”, postulating a transient impact which was independent of the transaction history. Whereas Bouchaud et al. [2009] showed that the transaction dependent propagator model was equivalent to the MRR theory, the multi event propagator model [Eisler et al., 2012] truly differs from the MRR framework [Taranto et al., 2016a]. The propagator model is an alternative to the MRR theory in that it completely lacks the notion of a fundamental price.

Most classical price formation models, including the MRR framework and Bouchaud et al. [2004]’s propagator model, have in common that they take the exogeneity of the order flow for granted. Our paper calculates explicitly the correlation between a public information shock and the future trade sign in large tick order books. Therefore, our work contributes significantly to the understanding of the interplay between liquidity provision and consumption in financial markets. Superficially, this aspect of our study resembles Hasbrouck [1991]’s vector regression model of price returns and order flow and its recent generalization to large tick stocks [Taranto et al., 2016b]. While we are able to infer directly the correlation between trade and public information shocks by using the proxy of the fundamental price, Hasbrouck [1991] performs a regression on price returns and order flow omitting the effect of public information shocks.

Most modern financial markets operate limit order books (LOBs), where financial institutions interact via the submission of orders.
A buy order (sell order) is a commitment to buy (sell, respectively) a maximum quantity of the asset for a price no larger than its limit price (no smaller than its limit price, respectively). Whenever an institution submits a buy (sell) order, the LOB’s trade-matching algorithm checks whether the order can be matched against previously submitted but still unmatched sell (buy) orders. If so, an immediate transaction occurs. If the order cannot be matched, it remains active in the LOB until it is matched against a future incoming sell (buy) order or cancelled by its owner. Orders that do not match upon arrival are called limit orders. Orders that match upon arrival are called market orders.

For a given LOB, the bid price \(b_t\) is the highest price among active buy orders at time \(t\). Similarly, the ask price \(a_t\) is the lowest price among active sell orders at time \(t\). The bid and ask prices are collectively known as the best quotes. Their difference \(s_t = a_t - b_t\) is called the bid-ask spread, and their mean \(x_t = (a_t + b_t)/2\) is called the mid price.

Figure 1: Schematic of an LOB. The horizontal lines within the blocks at each price level denote the different active orders at each price. (Horizontal axis: price; vertical axis: depth available. Labelled elements: buy and sell limit orders, bid side, ask side, bid-price, ask-price, mid-price, spread.)
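As a minimal illustration of the definitions above, the following Python sketch computes the best quotes, the spread and the mid price from a toy list of active limit orders. The `LimitOrder` class, the `best_quotes` helper and the example book are hypothetical names and data introduced only for this illustration; prices are stored as integer cents so that the arithmetic is exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitOrder:
    side: str     # "buy" or "sell"
    price: int    # limit price in cents (integer multiple of the tick)
    volume: int   # number of shares


def best_quotes(orders):
    """Return (bid, ask, spread, mid) in cents for a list of active limit orders."""
    buys = [o.price for o in orders if o.side == "buy"]
    sells = [o.price for o in orders if o.side == "sell"]
    if not buys or not sells:
        raise ValueError("both sides of the book must contain at least one order")
    bid = max(buys)          # b_t: highest price among active buy orders
    ask = min(sells)         # a_t: lowest price among active sell orders
    spread = ask - bid       # s_t = a_t - b_t
    mid = (ask + bid) / 2    # x_t = (a_t + b_t) / 2
    return bid, ask, spread, mid


# Toy book with a one-cent tick (hypothetical data).
book = [
    LimitOrder("buy", 998, 300),
    LimitOrder("buy", 999, 100),
    LimitOrder("sell", 1001, 200),
    LimitOrder("sell", 1003, 500),
]
bid, ask, spread, mid = best_quotes(book)
print(bid, ask, spread, mid)
```

In this toy book the spread is two ticks, so the mid price falls on the tick grid; with a one-tick spread it would sit half a tick between the best quotes.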

LOBs enforce a minimum price increment which is called the tick size. Hence, institutions must choose prices for their orders which are integer multiples of the tick size specified by the platform. Because LOBs implement a tick size \(\tau > 0\), it is common for several different limit orders to reside at the same price at a given time. A trade-matching algorithm therefore needs a secondary rule to decide which limit order among the limit orders at the same price is executed first. To determine the queueing priority for orders at a given price, most exchanges implement a price–time priority rule. That is, for buy (respectively, sell) limit orders, priority is given to the limit orders with the highest (respectively, lowest) price, and ties are broken by selecting the limit order with the earliest submission time. LOBs with price–time priority resemble in many respects a queueing system, where limit orders have a rank and wait for execution. Figure 1 shows a schematic of an LOB at some instant in time, illustrating the definitions in this section.

The rules that govern order matching also dictate how prices evolve through time. Price changes can occur for the following reasons: The remaining volume at the best queue can be cancelled by its owner, or it can be executed by an incoming market order. Finally, if \(s_t > \tau\), a buy or a sell limit order can be submitted inside the spread. In each case, the mid price changes by a certain multiple of \(\tau/2\). Both liquidity providers and liquidity consumers can therefore change the price in a LOB.

When the tick size is small, the effects of the discretization are weak and can often be neglected. When the tick size is large, the effects of the discretization are important and constrain the price dynamics. For example, the spread of large tick LOBs is usually locked and equal to the allowed minimum of one tick, whereas the spread of small tick LOBs is usually a large multiple of \(\tau\). Because price fluctuations are proportional to the price itself, it is not the bare tick size but the ratio \(\tau/x_t\) that determines whether an asset is a large or small tick stock. We call this ratio the relative tick size \(\tilde{\tau}\).

Finally, market participants pay fees to the exchange when they trade in a LOB. Some exchanges (e.g. Nasdaq) also offer rebates to liquidity providers: Owners of limit orders receive a negative fee (i.e. credit) from the exchange upon the execution of their orders as a reward for liquidity provision.

Footnote: A trader is a liquidity provider if he/she submits limit orders. A trader is a liquidity consumer if he/she submits market orders.

Data

In our empirical investigation we use a pool of 100 highly liquid stocks traded on Nasdaq during the whole year 2015. The data that we study originates from the LOBSTER database [Limit Order Book System: The Efficient Reconstructor], which lists all market order arrivals, limit order arrivals, and cancellations that occur on the Nasdaq platform during the normal trading hours of 09:30 to 16:00 on each trading day. Trading does not occur on weekends or public holidays, so we exclude these days from our analysis. We also exclude market activity during the first and last hour of each trading day, to remove any abnormal trading behaviour that can occur shortly after the opening auction or shortly before the closing auction.

On the Nasdaq platform, each stock is traded in a separate LOB with price–time priority, with a tick size of \(\tau = \$0.01\). Although this tick size is the same for all stocks on the platform, the prices of different stocks vary across several orders of magnitude (from about $1 to more than $1000). Therefore, the relative tick size similarly varies considerably across different stocks. The bid-ask spread of large tick LOBs is typically close to one tick (its minimum possible value). In this paper we define large tick stocks by the condition \(E[s_t] < \ldots\,\tau\), with \(E[s_t]\) the observed time averaged bid-ask spread. Conversely, the bid-ask spread of small tick LOBs is typically a large multiple of the tick. In this paper we define small tick stocks by the condition \(E[s_t] > 0.04\$ = 4\tau\). All other stocks are called medium tick stocks.

Nasdaq imposes fees of approximately 0 . .

Price formation describes the process whereby information, liquidity, and the order flow impact the market price. The price impact mechanism cannot be trivial: It is a very well known empirical fact that the order flow is highly correlated [Hasbrouck and Ho, 1987, Biais et al., 1995] whereas the resulting mid prices are (nearly) uncorrelated.

Footnote: The main reason for the long memory in the market order flow is order splitting. Institutions tend to split large orders into small orders and execute them incrementally over long periods of time. Order flow correlations are detectable on time scales beyond hours and even days [Farmer and Lillo, 2004, Bouchaud et al., 2004, Toth et al., 2015, Lillo et al., 2005, Bouchaud et al., 2009].

Madhavan et al. [1997] developed a simple structural model of price formation. They postulated the existence of a fundamental price \(p_t\) of the stock. Here \(t\) denotes the transaction time, i.e. \(p_t\) is the fundamental price immediately before the \(t\)-th trade. If a transaction at time \(t\) is buyer-initiated, it is assigned the indicator \(\epsilon_t = 1\), while if it is seller-initiated it is assigned the indicator \(\epsilon_t = -1\). MRR further assumed that the revision in beliefs about \(p_t\) is positively correlated with the innovation in the order flow. Formally, this can be written as

\[ p_{t+1} - p_t = G[\epsilon_t - \hat{\epsilon}_t] + W_t, \tag{1} \]

where

\[ \hat{\epsilon}_t = E_{t-1}[\epsilon_t] \tag{2} \]

is the expected transaction sign at \(t\) given the public information up to \(t\). Public information enters the market through the past trading history \(\{\epsilon_{t-1}, \epsilon_{t-2}, \cdots\}\) and the shocks \(\{W_{t-1}, W_{t-2}, \cdots\}\) which describe public information derived from external news. We assume that \(W_t\) is a white noise with zero mean which is uncorrelated with the past transaction history. Private information enters Eq. (1) through the term \(G[\epsilon_t - \hat{\epsilon}_t]\). Because the information on the past order flow is public, the information content of a trade can only depend on its unexpected part \(\epsilon_t - \hat{\epsilon}_t\). By construction, this price formation mechanism ensures price efficiency, \(E_{t-1}[p_{t+1}] = p_t\), regardless of the correlations of \(\{\epsilon_t\}\).

Footnote: More precisely, MRR define \(p_t\) as the “post trade expected value of the stock conditioned upon public information and the trade information variable”.

Market makers share a common belief about \(p_t\) and set ask and bid quotes. Following Glosten and Milgrom [1985] they seek to avoid ex-post regrets. In a competitive market, the bid and ask are therefore

\[ a_t = p_t + G[1 - \hat{\epsilon}_t], \tag{3} \]

\[ b_t = p_t + G[-1 - \hat{\epsilon}_t], \tag{4} \]

which ensure a zero average gain for the market maker.
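Equations (1)-(4) can be illustrated with a short simulation. The sketch below is not MRR's estimation procedure. It assumes that trade signs follow a two-state Markov chain with lag-one correlation `rho`, which is consistent with MRR's original assumption that \(\hat{\epsilon}_t\) depends only on \(\epsilon_{t-1}\) and gives \(\hat{\epsilon}_t = \rho\,\epsilon_{t-1}\), and it assumes Gaussian shocks \(W_t\). The function name `simulate_mrr` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_mrr(n, G=0.005, rho=0.5, sigma_w=0.002, p0=10.0, seed=0):
    """Simulate Eqs. (1)-(4) for n trades (all parameter values are illustrative)."""
    rng = np.random.default_rng(seed)
    # Markov-chain signs: each sign repeats the previous one with probability (1 + rho) / 2.
    flip = np.where(rng.random(n) < (1 + rho) / 2, 1.0, -1.0)
    flip[0] = 1.0 if rng.random() < 0.5 else -1.0          # random first sign
    eps = np.cumprod(flip)
    # Eq. (2) under the Markov assumption: eps_hat_t = rho * eps_{t-1}; no history at t = 0.
    eps_hat = np.concatenate(([0.0], rho * eps[:-1]))
    w = rng.normal(0.0, sigma_w, n)                        # public information shocks W_t
    dp = G * (eps - eps_hat) + w                           # Eq. (1): p_{t+1} - p_t
    p = p0 + np.concatenate(([0.0], np.cumsum(dp)[:-1]))   # p_t just before trade t
    ask = p + G * (1.0 - eps_hat)                          # Eq. (3)
    bid = p + G * (-1.0 - eps_hat)                         # Eq. (4)
    return eps, eps_hat, p, bid, ask


eps, eps_hat, p, bid, ask = simulate_mrr(200_000)
returns = np.diff(p)
sign_autocorr = np.corrcoef(eps[:-1], eps[1:])[0, 1]            # should be close to rho
return_autocorr = np.corrcoef(returns[:-1], returns[1:])[0, 1]  # should be close to zero
print(f"sign autocorrelation:   {sign_autocorr:.3f}")
print(f"return autocorrelation: {return_autocorr:.3f}")
```

Under these assumptions the trade signs are strongly correlated while the increments of the fundamental price are not, which is the efficiency property stated below Eq. (2). The simulated quotes are always separated by \(2G\), because the quotes in Eqs. (3)-(4) are continuous in this sketch.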
Accordingly, the LOB’s mid-price and spread are

\[ x_t = \frac{a_t + b_t}{2} = p_t - G\hat{\epsilon}_t, \tag{5} \]

\[ s_t = a_t - b_t = 2G. \tag{6} \]

Because market makers need to anticipate the impact of the future market order, the mid price at \(t\) is not equal to \(p_t\). More specifically, its evolution equation reads

\[ x_{t+1} - x_t = G[\epsilon_t - \hat{\epsilon}_{t+1}] + W_t. \tag{7} \]

MRR studied this evolution equation and compared the implied spread to actually observed spreads. In their conclusion, they pointed out that price discreteness was a serious limitation to the applicability of their model. Surprisingly, this has hitherto been taken for granted in the market microstructure literature. Moreover, confusion later arose as to whether Eq. (1) described a fundamental price or the observed mid-price [Bouchaud et al., 2009], since the consequences are manifestly different. Hence, the empirical implications of the evolution equation (7) have not been fully analysed in the literature. Our goal is therefore to transform Eq. (7) into easily observable quantities.

We first note that the response function of a transaction, defined as [Bouchaud et al., 2004]

\[ R(\ell) = E[\epsilon_t \cdot (x_{t+\ell} - x_t)], \tag{8} \]

is related to the correlations in the order flow according to

\[ R(\ell) = G\,E[1 - \epsilon_t \hat{\epsilon}_{t+\ell}] = G - G\,E[\epsilon_t \cdot E_t(\hat{\epsilon}_{t+\ell} - \epsilon_{t+\ell})] - G\,E[\epsilon_t \epsilon_{t+\ell}] = G[1 - C(\ell)]. \tag{9} \]

We exploited the fact that by definition \(E_t[\hat{\epsilon}_{t+\ell} - \epsilon_{t+\ell}] = 0\) and defined the auto-correlation function of the trade signs

\[ C(\ell) = E[\epsilon_{t+\ell}\,\epsilon_t]. \tag{10} \]

Both \(C(\ell)\) and \(R(\ell)\) are easily measurable on empirical data. A testable parameter-free prediction of the MRR model is that

\[ R(1) = R(\ell)\,\frac{1 - C(1)}{1 - C(\ell)}. \tag{11} \]

Footnote (on the expected sign \(\hat{\epsilon}_t\)): MRR originally assumed that \(\hat{\epsilon}_t\) only depended on \(\epsilon_{t-1}\). We consider a more general case in this paper as in [Farmer and Lillo, 2004, Taranto et al., 2014].

Footnote (on the shocks \(W_t\)): The future trade signs can be correlated with \(W_t\) because market participants might adapt their trading strategy to the observed price dynamics. We return to this problem in Sec. 6.2.

Footnote (on the zero average gain): Zero average gain is the idealized state of perfect competition. In reality, market makers need to set a slightly larger bid-ask spread to achieve a minimum profit, although this profit is arguably very low.

Footnote (on the response function): The following equation does not appear in the original MRR paper, but it is mentioned in Wyart et al. [2008].

Table 1 (only the first row survives; column labels and the remaining entries are not recoverable): \(R(1)\): 0.000588 $, 0.00277 $. The remaining rows list the rescaled responses \(R(\ell)\,\frac{1 - C(1)}{1 - C(\ell)}\) for \(\ell = 2, 3, 4, 5, 10, 20\).

The results of this empirical test are displayed in Fig. 2. Our dataset reproduces the relationship (11) accurately over three orders of magnitude. This is highly surprising, as the dataset contains both small and large tick stocks, and price discretization implies that the equations for \(x_t\) and \(s_t\) (Eqs. (5)-(6)) cannot be exact. Yet, Eq. (11) holds also for large tick stocks. Even the stock with the largest relative tick in our dataset, Sirius XM Holdings, satisfies Eq. (11) approximately, see Table 1.

In the following we show with minimal assumptions why the testable MRR predictions (11) are correct also on large ticks. Our framework will show that while Eq. (7) is certainly incorrect as it stands, a very similar version remains true on average.

We postulate the existence of a fundamental price which takes continuous values and satisfies the MRR equation (1). When the tick size is large, the best bid and ask are separated by one tick: \(a_t - b_t = \tau\). Eqs. (3) and (4) are now incorrect because they do not account for price discretization.

In large tick LOBs the mid price changes when limit order queues at the best quotes deplete. We assume that liquidity providers own volume at the best quotes as long as it is marginally profitable, i.e. as long as its expected execution price is equal to the fundamental price immediately after the transaction, adjusted for the liquidity rebate. A limit order at the best quotes remains profitable as long as

\[ p_t \in \left( b_t - r - G[-1 - \hat{\epsilon}_t],\; a_t + r - G[1 - \hat{\epsilon}_t] \right), \tag{12} \]

where \(r > 0\) is the liquidity rebate, \(p_t + G[-1 - \hat{\epsilon}_t]\) is the expected fundamental price after the execution of the buy limit order at \(b_t\), and \(p_t + G[1 - \hat{\epsilon}_t]\) is the expected fundamental price after the execution of the sell limit order at \(a_t\). Nasdaq offers rebates for liquidity providers of approximately 0. [...] \((b_t, a_t)\). We confirm this prediction empirically in Sec. 6.1 by showing that both the impact of an extreme volume imbalance and the impact of a queue depletion on the mid price are approximately equal to the rebate adjusted half spread.

Price discretization prevents MRR’s dynamic equation for \(x_t\), Eq. (7), from being correct, but we can show that it is still true on average. We assume that the distribution of \(p_t\) within the characteristic interval is symmetric with respect to its center. In this case the expected fundamental price, given the mid price, is, according to Eq. (12), equal to the average of the edges of the characteristic interval around \(x_t\):

\[ E[p_t \mid x_t] = x_t + G\hat{\epsilon}_t. \tag{13} \]

Figure 2: Empirical test of the prediction (11) of the MRR model on a pool of 100 large, medium and small tick stocks. Each cloud of points, consisting of a circle (rescaled \(R(2)\)), a right triangle (rescaled \(R(3)\)), a left triangle (rescaled \(R(4)\)), a square (rescaled \(R(5)\)), and a pentagon (rescaled \(R(10)\)), corresponds to a single stock, averaged over the year 2015. Red clouds depict large tick stocks (which we define by \(E[s_t] < \ldots\)); the remaining clouds depict medium tick stocks (\(\ldots < E[s_t] < \ldots\)) and small tick stocks (\(E[s_t] > \ldots\)). The line (\(y = x\)) is a guide to the eye. (Horizontal axis: \(R(1)\) [$]; vertical axis: \(R(k) \times (1 - C(1))/(1 - C(k))\) [$].)

We can apply the conditional expectation to the above relation, given the past until \(t' \le t\), and average over \(x_t\). This is particularly useful, since Eq. (1) implies that \(E_{t'}[p_{t+1} - p_t] = G\,E_{t'}[\epsilon_t - \hat{\epsilon}_t]\) for \(t' \le t\). This allows us to express the return of the mid price in terms of known quantities:

\[ G\,E_{t'}[\epsilon_t - \hat{\epsilon}_t] = E_{t'}[x_{t+1} - x_t] + G\,E_{t'}[\hat{\epsilon}_{t+1} - \hat{\epsilon}_t], \]

or simply

\[ E_{t'}[x_{t+1} - x_t] = G\,E_{t'}[\epsilon_t - \hat{\epsilon}_{t+1}]. \]

Note that \(t' \le t\) is arbitrary. Therefore, we can sum this equation over the \(\ell\) steps from \(t\) to \(t + \ell - 1\); the intermediate terms \(E_t[\epsilon_s - \hat{\epsilon}_s]\) with \(s > t\) vanish, and we obtain

\[ E_t[x_{t+\ell} - x_t] = G\,E_t[\epsilon_t - \hat{\epsilon}_{t+\ell}]. \tag{14} \]

We can now proceed by calculating the response function of a transaction,

\[ R(\ell) = E[\epsilon_t \cdot (x_{t+\ell} - x_t)] = G\,E\left(\epsilon_t \cdot E_t[\epsilon_t - \hat{\epsilon}_{t+\ell}]\right) = G\,E\left(E_t[1 - \epsilon_t \epsilon_{t+\ell}]\right) = G[1 - C(\ell)]. \tag{15} \]

This is exactly the MRR relation (9), which we have so successfully tested on empirical data. Because MRR assumed a continuous price scale, it was not clear why its predictions turned out to be true for large tick LOBs as well. Whereas price discretization implies that the MRR equation (7) is incorrect as it stands, our Eq. (14) demonstrates that it nevertheless holds true on average: Relationships between quantities which depend linearly on the mid price are thus not affected by the tick size.

In Sec. 6.2, however, we introduce the auto-covariance function of mid price returns. Because the auto-covariance depends quadratically on the mid price, the mathematical expectation in Eq. (14) leads to results which are very different from what the original MRR Eq. (7) predicts.
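The relations (8)-(11) and (15) translate directly into sample averages. The sketch below defines two hypothetical helpers, `sign_autocorrelation` and `response_function`, and applies them to synthetic data generated from the continuous-price MRR equations (1) and (5) with Markov-chain trade signs. The synthetic data and all parameter values are assumptions made for illustration; this is not the paper's Nasdaq data or the authors' code.

```python
import numpy as np


def sign_autocorrelation(eps, lag):
    """Sample estimate of C(l) = E[eps_{t+l} * eps_t], Eq. (10)."""
    return float(np.mean(eps[lag:] * eps[:-lag]))


def response_function(eps, mid, lag):
    """Sample estimate of R(l) = E[eps_t * (x_{t+l} - x_t)], Eq. (8).

    mid[t] is the mid price just before the t-th trade, whose sign is eps[t].
    """
    return float(np.mean(eps[:-lag] * (mid[lag:] - mid[:-lag])))


# Synthetic MRR data (illustrative parameters, not estimated from real stocks).
rng = np.random.default_rng(1)
n, G, rho, sigma_w = 400_000, 0.005, 0.5, 0.002
flip = np.where(rng.random(n) < (1 + rho) / 2, 1.0, -1.0)
eps = np.cumprod(flip)                                  # Markov-chain trade signs
eps_hat = np.concatenate(([0.0], rho * eps[:-1]))       # expected sign before each trade
dp = G * (eps - eps_hat) + rng.normal(0.0, sigma_w, n)  # Eq. (1)
p = np.concatenate(([0.0], np.cumsum(dp)[:-1]))         # fundamental price (arbitrary origin)
mid = p - G * eps_hat                                   # Eq. (5)

r1 = response_function(eps, mid, 1)
c1 = sign_autocorrelation(eps, 1)
for lag in (2, 3, 5, 10):
    cl = sign_autocorrelation(eps, lag)
    rescaled = response_function(eps, mid, lag) * (1 - c1) / (1 - cl)
    print(f"lag {lag:2d}: R(1) = {r1:.5f}, rescaled R(lag) = {rescaled:.5f}")
```

On real data, `eps` and `mid` would be the observed trade signs and the mid prices sampled immediately before each trade. The rescaled responses should then scatter around \(R(1)\), as in the test of Eq. (11).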
If we accept that a fundamental price should reflect publicly available information, then it is easy to show empirically that the mid price cannot be the efficient fundamental price. Define the volume imbalance

\[ \iota(t) = \frac{V_b(t) - V_a(t)}{V_b(t) + V_a(t)}, \tag{16} \]

where $V_a(t)$ and $V_b(t)$ denote the available volumes at the ask and bid price. This volume imbalance has attracted some attention in the recent literature [Gould and Bonart, 2015, Cartea et al., 2015, Cartea and Jaimungal, 2015, Huang et al., 2015]. The quantity $\iota$ can be interpreted as the pressure that liquidity providers put on the price, in that when $\iota > 0$, more limit orders have been submitted and not canceled at the buy side, and prices are expected to go up. When $\iota < 0$, more limit orders have been submitted at the sell side and prices are expected to go down. We can confirm this intuition quantitatively by considering the price impact of an imbalance, namely

\[ R(\ell \mid \iota) = \mathbb{E}[x_{t+\ell} - x_t \mid \iota(t) = \iota], \tag{17} \]

which is the expected price change after $\ell$ transactions given that an imbalance $\iota$ has been observed at $t$.

If $x_t$ were efficient, all price predictors constructed with public data would have zero impact: in particular, the response function conditioned on the queue imbalance $\iota$ would vanish, $R(\ell \mid \iota) = 0$. This is however in stark contrast with empirical observations: Fig. 3 shows $R(\ell \mid \iota)$ as a function of $\ell$ for different intervals of values of $\iota$. As soon as $\iota \neq 0$ the subsequent expected price change is non-zero. Interestingly, for large $|\iota|$ the expected absolute price change exceeds a half tick. We also find empirically that $R(\infty \mid \iota = 1) = -R(\infty \mid \iota = -1)$ is approximately equal to the average absolute impact of a queue depletion on the mid price, denoted by $r_\infty(t)$, in agreement with intuition, see Table 2.

Figure 3: The symmetrized impact of an imbalance on $x_t$ for Microsoft and $|\iota|$ contained in 10 equidistant bins; $|\iota| \in [0.1 \times k,\ 0.1 \times (k+1))$, for $k = 0, 1, \dots, 9$; $k$ increases from the bottom to the top line. (Axes: time; impact [$].)

The existence of (at least) one price predictor $\iota$ which can have an impact larger than $\tau/2$ […] $b_t$ and $a_t$ because $\mathbb{E}[p_{t+\ell} \mid \iota(t)] = p_t$ irrespective of $\iota(t)$ by construction and $\mathbb{E}[x_\infty \mid p_t] = p_t$. Therefore, the absolute permanent impact of a queue depletion at $t$ is given by

\[ r_\infty(t) = |p_{t^+} - x_t| = |p_t + G(\epsilon_t - \hat\epsilon_t) - x_t| = \left| \pm\left(\frac{\tau}{2} + r\right) - G(\epsilon_t - \hat\epsilon_t) + G(\epsilon_t - \hat\epsilon_t) \right| = \frac{\tau}{2} + r, \tag{18} \]

with $p_{t^+}$ the post-trade fundamental price and $x_t$ the pre-trade mid price. Note that $r_\infty(t)$ can be larger than a half tick. Nasdaq offers rebates for liquidity providers of approximately 0.[…] […]

\[ r_\infty = \frac{\tau}{2} + r = 0.[\ldots] \tag{19} \]

In Table 2 we compare the impact of queue depletion to the half tick adjusted by the market making rebate.

In conclusion, since $r_\infty$ is in general larger than $\tau/2$, the fundamental price can lie outside the interval $(b_t, a_t)$. The mid price is therefore "sticky": it remains constant as long as $p_t$ is within a certain interval around $x_t$. Because these characteristic intervals reside on the price scale with periodicity $\tau$, but overlap for different $x_t$, the mid price $x_t$ can take two different values for the same $p_t$, depending on the past dynamics of $p_t$ (see Robert and Rosenbaum [2011] for a similar observation in the context of high-frequency econometrics).

We have shown that the mid price is not efficient in large tick LOBs because it does not incorporate the information conveyed by the volume imbalance. When the tick size is small, the volume imbalance loses its predictive power [Gould and Bonart, 2015]. A direct way of assessing the efficiency of the mid price in small tick LOBs is to consider the empirical covariance function of mid price returns

\[ \mathrm{cov}_x(\ell) = \mathbb{E}[(x_{t+1+\ell} - x_{t+\ell})(x_{t+1} - x_t)]. \tag{20} \]

The mid price is not efficient, neither in small tick nor in large tick LOBs, because we observe that the covariance function of lag one is in most cases negative, i.e. the mid price mean reverts after a non-zero return.

Table 2: Comparison between the permanent price impact of a queue depletion and the permanent price impact of a large imbalance ($|\iota| > 0.9$) for all large tick stocks in our dataset, defined by the condition $\mathbb{E}[s_t] <$ […]. (Columns: ticker; permanent impact of depletion; permanent impact of imbalance.)

In this section we show that MRR's relation (7) allows us to forecast the covariance of mid price returns for small tick stocks. But the mathematical expectation in the corresponding equation (14) of our model for large ticks prevents us from forecasting the covariance of mid price returns in large tick stocks, and we shall explicitly observe this empirically.

The continuous price MRR model can be easily adapted to finite liquidity rebates. When market makers earn rebates they avoid ex-post regrets by setting quotes according to $a_t = p_t + G[1 - \hat\epsilon_t] - r$ and $b_t = p_t + G[-1 - \hat\epsilon_t] + r$. Yet, the mid price $x_t = \frac{1}{2}[b_t + a_t]$ is unaffected by $r$ and we can use Eq. (7) to calculate

\[ \mathrm{cov}_x(\ell) = G^2 \big( C(\ell) - C(\ell+1) - \mathbb{E}[\epsilon_{t+\ell}\,\hat\epsilon_{t+1}] + \mathbb{E}[\epsilon_{t+\ell+1}\,\hat\epsilon_{t+1}] \big) + G\,\mathbb{E}[\epsilon_{t+\ell} W_t] - G\,\mathbb{E}[\epsilon_{t+\ell+1} W_t]. \]

It is important to realize that we do not assume that the future order flow is uncorrelated with $W_t$. In fact, it is intuitively clear that market participants react to past returns. Order flow and price dynamics are therefore connected through a feedback loop, as shown empirically in [Hasbrouck, 1991] and more recently with a different approach in [Taranto et al., 2016a,b].

Despite this difficulty it is possible to obtain a non-parametric prediction of the covariance of mid price returns. We define the trade sign shifted response function as

\[ R_k(\ell) = \mathbb{E}[\epsilon_{t+k}\,(x_{t+\ell} - x_t)]. \tag{21} \]

$R_k(\ell)$ is easily measurable on empirical data. Surprisingly, by using the MRR equation (7) it is possible to express the covariance of mid price returns entirely in terms of trade sign shifted response functions:

\[ \mathrm{cov}_x(\ell) = G\,[R_\ell(1) - R_{\ell+1}(1)] = R(\infty)\,[R_\ell(1) - R_{\ell+1}(1)]. \]
(22)

This relation is a subtle test of Eq. (7) in that it is valid irrespective of the correlations of the price dynamics (i.e. the noise $W_t$) with the future order flow.

When the relative tick size is large, our framework predicts that Eq. (7) holds only on average. Because the covariance function is a second order statistic, we therefore do not expect Eq. (22) to be true for large tick stocks. In Fig. 4 we test Eq. (22) on our empirical dataset. We observe that Eq. (22) performs relatively well for small tick stocks and relatively badly for large tick stocks, as we have predicted.

In summary, we conclude that, first, the linear dependence between the price evolution and private and public information in the classical MRR model describes the price dynamics in small tick LOBs accurately, in that it performs very well with respect to linear and moderately well with respect to quadratic price statistics. Second, this statement is true regardless of the correlation between the information shocks and the future order flow. Finally, our generalized model for large tick LOBs correctly predicts that the price discreteness is irrelevant regarding linear price statistics, but has a significant influence on quadratic price statistics.

Footnotes:

We find empirically that $\mathrm{cov}_x(1) <$ […] $\hat\epsilon_{t+1} = C(1)\,\epsilon_t$, which implies $\mathrm{cov}_x(1) = G^2\, C(1)[1 - C(1)] > 0$, in contradiction with our empirical findings. On the other hand, if we assume that the trade sign process is exogenous, but allows for a "perfect" predictor, i.e. $\hat\epsilon_{t+1} = \epsilon_{t+1}$, we find that $\mathrm{cov}_x(1) = G^2\,[2C(\ell) - C(\ell+1) - C(\ell-1)]$, which is negative when $C(\ell)$ is convex. Therefore, even a purely exogenous order flow is compatible with a negative auto-covariance of returns.

Note that $R_0(\ell) = R(\ell)$ is the standard response function of a transaction. $R_\ell(\ell) = -R(-\ell)$ is minus the response function with negative lag considered in Taranto et al. [2016a].

Note that in reality the transaction sizes are not constant. Because large transactions have in general a larger impact than small transactions, the prefactor $G$ depends on the size and fluctuates in time. Whereas this does not play a role in linear price statistics, the time correlations of $G$ do change the second order statistics. Our model neglects this effect.

Figure 4: The auto-covariance of mid price returns, rescaled by the permanent market order impact $R(\infty)$, for (left figure) all small tick stocks (defined by a spread $\mathbb{E}[s_t] >$ […]) […] $\mathbb{E}[s_t] <$ […] […] $\mathrm{cov}_x(1)$ is marked as a circle, the $\mathrm{cov}_x(2)$ are marked as right triangles, the $\mathrm{cov}_x(3)$ are marked as left triangles, the $\mathrm{cov}_x(4)$ are marked as squares, and each stock's $\mathrm{cov}_x(5)$ is marked as a down triangle. (Axes: $\mathrm{cov}_x(\ell)/R(\infty)$ and $[R_\ell(1) - R_{\ell+1}(1)]/R(\infty)$.)

The question remains if and how the fundamental price can be reconstructed from publicly available LOB data. This section introduces a proxy $\hat p_t$ of the fundamental price in large tick LOBs, which we construct by using the squared volumes at the best:

\[ \hat p_t = \frac{V_a^2(t)\,(b_t - r) + V_b^2(t)\,(a_t + r)}{V_a^2(t) + V_b^2(t)}. \tag{23} \]

$\hat p_t$ has the following desirable properties: it is easy to calculate and continuous. Its values can lie outside the interval $(b_t, a_t)$ as required in our framework. Finally, in a balanced LOB, i.e. when $V_a(t) = V_b(t)$, it coincides with the mid price.

Below we present several additional criteria which motivate our choice of using Eq. (23). To substantiate our choice of $\hat p_t$, we compare its performance to the alternative proxy

\[ \hat p'_t = \frac{V_a(t)\,(b_t - r) + V_b(t)\,(a_t + r)}{V_a(t) + V_b(t)}, \tag{24} \]

which is the simplest linear generalization of the standard volume weighted price

\[ \hat p''_t = \frac{V_a(t)\,b_t + V_b(t)\,a_t}{V_a(t) + V_b(t)}. \tag{25} \]

Note that $\hat p''_t$ cannot lie outside the interval $(b_t, a_t)$. By showing that $\hat p_t$ performs better than $\hat p''_t$ we can thus substantiate one of the key aspects of our model.

We choose to measure the performance of $\hat p_t$, $\hat p'_t$, and $\hat p''_t$ by considering: (i) the impact of an imbalance on the price, (ii) the unconditional response function, and (iii) the autocorrelation of returns, which we also investigate by looking at the signature plot of volatility. Our empirical results can be summarized as follows.

First, $\hat p_t$ has incorporated to a large extent the information conveyed by the volume imbalance between the ask and bid.
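The three proxies of Eqs. (23) to (25) can be computed directly from the best quotes and the volumes resting there. The following is a minimal sketch rather than the authors' implementation: it reads Eq. (23) with squared volumes, as the text describes, and the function name, argument names, and the numerical values (in particular the rebate) are hypothetical choices made for illustration.

```python
def fundamental_price_proxies(bid, ask, vol_bid, vol_ask, rebate):
    """Return (p_hat, p_hat_prime, p_hat_double_prime).

    p_hat              : squared-volume weighting of rebate-adjusted quotes
    p_hat_prime        : linear-volume weighting of rebate-adjusted quotes
    p_hat_double_prime : standard volume weighted price, no rebate adjustment
    """
    if vol_bid < 0 or vol_ask < 0 or vol_bid + vol_ask == 0:
        raise ValueError("volumes must be non-negative and not both zero")
    low, high = bid - rebate, ask + rebate  # rebate-adjusted quotes
    wa2, wb2 = vol_ask ** 2, vol_bid ** 2
    p_hat = (wa2 * low + wb2 * high) / (wa2 + wb2)
    p_hat_prime = (vol_ask * low + vol_bid * high) / (vol_ask + vol_bid)
    p_hat_double_prime = (vol_ask * bid + vol_bid * ask) / (vol_ask + vol_bid)
    return p_hat, p_hat_prime, p_hat_double_prime


# Hypothetical one-tick-spread book; the rebate value is illustrative only.
bid, ask, rebate = 10.00, 10.01, 0.003

# Balanced book: all three proxies collapse to the mid price.
print(fundamental_price_proxies(bid, ask, 500, 500, rebate))

# Heavy bid-side volume: the rebate-adjusted proxies move above the ask,
# while the standard volume weighted price stays inside (bid, ask).
print(fundamental_price_proxies(bid, ask, 900, 100, rebate))
```

The second call illustrates the property the comparison relies on: only the rebate-adjusted proxies can leave the interval spanned by the best quotes.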
To demonstrate this, we calculate the impact of an imbalance on $\hat p_t$ for Microsoft, the most liquid large tick stock in our dataset. We use Nasdaq's liquidity rebate of approximately $r = 0.[\ldots]\,\tau$. The result is depicted in Fig. 5. We observe that the impact of imbalances on $\hat p_t$ is only $\approx 20\%$ as large as the impact on the mid price $x_t$. We interpret the missing impact as the information which is already included in $\hat p_t$. Thus, $\hat p_t$ captures approximately 80% of the information conveyed by the volume imbalance. We also calculate the impact of an imbalance on the alternative proxies $\hat p'_t$ and $\hat p''_t$. While $\hat p'_t$ also captures approximately 80% of the information of the imbalance, the proxy $\hat p''_t$ performs much worse. We interpret this fact as an indication that proxies of the fundamental price which are adjusted for liquidity rebates (i.e. which can lie outside the region spanned by the bid and ask) are significantly more efficient with respect to $\iota$.

Figure 5: The symmetrized impact of imbalances $|\iota| \in [0, 1]$ on the proxy $\hat p_t$ of the fundamental price defined in Eq. (23) (region below red solid), on the proxy $\hat p'_t$ defined in Eq. (24) (region below violet dotted), on the proxy $\hat p''_t$ defined in Eq. (25) (region below green dash-dotted), and on the mid price $x_t$ (region below blue dashed). (Axes: time; impact of imbalance [$].)

Second, $\hat p_t$ behaves approximately as the efficient fundamental price in the MRR framework, in that it approximately satisfies Eq. (1). Define the lagged response of a transaction on $\hat p_t$,

\[ R^{(\hat p)}(\ell) = \mathbb{E}[\epsilon_t \cdot (\hat p_{t+\ell} - \hat p_t)]. \]

If $\hat p_t$ were exactly equal to the fundamental price, $R^{(\hat p)}(\ell)$ would coincide with $R^{(\hat p)}(1)$ for all $\ell \ge 1$. […] $R^{(\hat p)}(\ell)$, $R^{(\hat p')}(\ell)$, $R^{(\hat p'')}(\ell)$, and $R(\ell)$. Whereas the standard permanent impact, $R(\infty)$, differs by a factor of 2.[…] from $R(1)$, we observe that the difference between $R^{(\hat p)}(\infty)$ and $R^{(\hat p)}(1)$ is reduced to $\approx$ […]. […] $\hat p_t$ is much weaker than the permanent impact on $x_t$. We suggest that this difference is observed because $\hat p_t$ incorporates a large part of the past and future order flow correlations. In fact, if $\hat p$ is the fundamental price, Eq.
(1) predicts

\[ R^{(\hat p)}(\infty) = R^{(\hat p)}(1) = \mathbb{E}[\epsilon_t \cdot (p_\infty - p_t)] = G\,(1 - \mathbb{E}[\epsilon_t \cdot \hat\epsilon_t]), \]

which is, depending on the predictor $\hat\epsilon_t$, significantly smaller than $R(\infty) = G$.

Figure 6: The response function of a trade on the proxy of the fundamental price Eq. (23), $R^{(\hat p)}(\ell) = \mathbb{E}[\epsilon_t \cdot (\hat p_{t+\ell} - \hat p_t)]$, for Microsoft (red solid) compared to the response function $R(\ell)$ calculated with the mid price (blue dashed), the response function $R^{(\hat p')}$ calculated with the alternative proxy Eq. (24) (violet dotted), and the response function $R^{(\hat p'')}$ calculated with the alternative proxy Eq. (25) (green dash-dotted). (Axes: time lag; impact.)

Third, to test for statistical efficiency of $\hat p_t$, we calculate the correlation function of returns separated by a time lag $\ell$:

\[ \mathrm{corr}_{\hat p}(\ell) = \frac{\mathbb{E}[(\hat p_{t+\ell+1} - \hat p_{t+\ell})(\hat p_{t+1} - \hat p_t)]}{\mathbb{E}[(\hat p_{t+1} - \hat p_t)^2]}. \tag{26} \]

We also consider the correlation functions $\mathrm{corr}_{\hat p'}(\ell)$, $\mathrm{corr}_{\hat p''}(\ell)$, and $\mathrm{corr}_x(\ell)$, which are defined accordingly. Fig. 7 shows that $\mathrm{corr}_{\hat p}(\ell)$ is approximately zero for all $\ell$. Therefore, $\hat p_t$ is virtually statistically unpredictable with linear methods, as required by Eq. (1). Fig. 7 also shows the auto-correlation functions of the returns of the alternative proxies $\hat p'_t$, $\hat p''_t$ and the mid price $x_t$. We observe that both $\hat p'_t$ and $\hat p''_t$ suffer from a relatively large positive auto-correlation, reflecting our previous observation that the response functions $R^{(\hat p')}$ and $R^{(\hat p'')}$ continue to grow significantly after a transaction (see Fig. 6). Table 3 summarizes our findings on the auto-correlations of $\hat p_t$, $\hat p'_t$, $\hat p''_t$ and $x_t$ for the large tick stocks. Section 6.2 discusses the causes of the negative auto-correlation function of the mid price returns.

Figure 7: The correlation function of returns for Microsoft, calculated with the proxy $\hat p_t$ of the fundamental price defined in Eq. (23) (red solid), the alternative proxy $\hat p'_t$ defined in Eq. (24) (violet dotted), the alternative proxy $\hat p''_t$ defined in Eq. (25) (green dash-dotted), and the mid price $x_t$ (blue dashed). (Axes: time lag; correlation of returns.)

A related method to test the statistical efficiency of $\hat p_t$ is to investigate its signature plot, which displays the time-normalized volatility defined as the mean square displacement of lag $\ell$ rescaled by $\ell$:

\[ \sigma_{\hat p}(\ell) = \sqrt{\frac{\mathbb{E}[(\hat p_{t+\ell} - \hat p_t)^2]}{\ell}}. \]

The time-normalized volatility of a martingale is independent of $\ell$, whereas an increasing time-normalized volatility indicates a positive auto-correlation of returns (trend following) and a decreasing time-normalized volatility indicates a negative auto-correlation of returns (mean-reversion). We also calculate the equivalent time-normalized volatilities for the other proxies $\hat p'_t$, $\hat p''_t$, as well as for $x_t$. Fig. 8 shows the signature plot of $\hat p_t$, $\hat p'_t$, $\hat p''_t$, and $x_t$. We observe again that $\hat p_t$ performs better than the alternative proxies in that its time-normalized volatility depends much less on the time lag.

Figure 8: Signature plot of the proxies $\hat p_t$ (red solid), $\hat p'_t$ (violet dotted), $\hat p''_t$ (green dash-dotted), and the mid price $x_t$ (blue dashed) for Microsoft during the year 2015. (Axes: time lag $\ell$; $\sigma(\ell)$.)

ticker | corr_p̂(1) | corr_p̂′(1) | corr_p̂″(1) | corr_x(1)
AMAT | 0.021 (0.003) | 0.114 (0.003) | 0.107 (0.003) | -0.030 (0.003)
ATVI | 0.027 (0.003) | 0.111 (0.004) | 0.108 (0.003) | -0.026 (0.004)
CA | 0.041 (0.004) | 0.127 (0.004) | 0.125 (0.004) | -0.014 (0.004)
CMCSA | 0.033 (0.003) | 0.135 (0.003) | 0.145 (0.003) | -0.021 (0.003)
CSCO | -0.006 (0.002) | 0.086 (0.003) | 0.115 (0.002) | -0.004 (0.002)
CSX | 0.040 (0.003) | 0.126 (0.003) | 0.119 (0.003) | -0.033 (0.003)
DISCA | 0.064 (0.004) | 0.139 (0.004) | 0.138 (0.004) | 0.014 (0.004)
EBAY | 0.043 (0.003) | 0.138 (0.003) | 0.143 (0.003) | 0.000 (0.003)
FOX | 0.038 (0.003) | 0.126 (0.004) | 0.128 (0.003) | -0.031 (0.004)
GE | -0.045 (0.004) | 0.046 (0.005) | 0.085 (0.004) | -0.058 (0.003)
INTC | 0.018 (0.002) | 0.112 (0.002) | 0.119 (0.002) | -0.030 (0.002)
JPM | 0.023 (0.003) | 0.124 (0.003) | 0.140 (0.003) | -0.031 (0.003)
MAT | 0.059 (0.004) | 0.143 (0.004) | 0.136 (0.004) | -0.011 (0.004)
MDLZ | 0.033 (0.003) | 0.127 (0.004) | 0.128 (0.003) | -0.021 (0.003)
MSFT | 0.015 (0.002) | 0.117 (0.002) | 0.126 (0.002) | -0.040 (0.002)
MU | 0.009 (0.004) | 0.100 (0.004) | 0.114 (0.004) | -0.035 (0.003)
NVDA | 0.040 (0.003) | 0.130 (0.003) | 0.126 (0.003) | -0.018 (0.003)
ORCL | 0.026 (0.003) | 0.129 (0.004) | 0.137 (0.003) | -0.025 (0.003)
QCOM | 0.031 (0.003) | 0.127 (0.003) | 0.134 (0.003) | -0.017 (0.003)
SYMC | 0.037 (0.003) | 0.128 (0.003) | 0.119 (0.003) | -0.023 (0.003)
SIRI | -0.03 (0.006) | 0.013 (0.007) | 0.065 (0.006) | -0.015 (0.006)
TXN | 0.033 (0.003) | 0.119 (0.003) | 0.124 (0.003) | -0.018 (0.003)
VOD | 0.004 (0.003) | 0.087 (0.004) | 0.124 (0.003) | -0.038 (0.004)
YHOO | 0.022 (0.004) | 0.120 (0.004) | 0.133 (0.004) | -0.032 (0.004)

Table 3: Auto-correlation functions at lag 1 for the returns of the proxies $\hat p_t$, $\hat p'_t$ and $\hat p''_t$ of the fundamental price, and the auto-correlation at lag 1 for the returns of the mid price for all large tick stocks (defined by $\mathbb{E}[s_t] <$ […]).

Finally, we remark that $\hat p_t$ coincides, by construction, with the average true fundamental price in our model when a queue depletion is imminent, i.e. when either $V_b(t)$ or $V_a(t)$ is very close to zero. While we stress that the goal of this paper is not to seek the best proxy of the fundamental price, we believe that the above analysis demonstrates that $\hat p_t$, as it is defined in Eq. (23), can serve as a good approximation to the true fundamental price.

Having a proxy of the fundamental price at hand is useful for several reasons. First, it is a natural measure of the future direction of the market, in that the future average mid price is expected to increase if $x_t < \hat p_t$ and expected to decrease if $x_t > \hat p_t$. Thus, while the mid price does not coincide with the fundamental price due to market frictions (here the tick size), it nevertheless converges on average to the latter in the future. Deviations between the mid price and the fundamental price are thus transient, an intuition that we share with Hasbrouck [1991].

Second, $\hat p_t$ allows us to study the covariance between the next trade sign $\epsilon_{t+1}$ and the public information $W_t$ at $t$. Because the information process is exogenous and uncorrelated with the past order flow, this covariance captures a genuine innovation of the beliefs shared amongst liquidity takers due to the arrival of new information. The theory of price formation has been hitherto mostly concerned with the reaction of liquidity providers, i.e. quote setters, to new information (for new developments see however [Taranto et al., 2016a,b]). By using again the trade sign shifted response function, defined in Eq. (21), we have

\[ R^{(p)}(1) = G\,(1 - \mathbb{E}[\epsilon_t \hat\epsilon_t]) = R_0(1) + R_1(1) - \mathbb{E}[\epsilon_{t+1} W_t], \tag{27} \]

which allows us to measure the covariance between $W_t$ and $\epsilon_{t+1}$ by using the proxy $\hat p_t$ for $p_t$. Table 4 contains our estimates of $\mathbb{E}[\epsilon_{t+1} W_t]$ for all large tick stocks in our dataset. We empirically observe that $\mathbb{E}[\epsilon_{t+1} W_t] >$ […]

Table 4: Implied covariance between public news $W_t$ and the sign of the next transaction $\epsilon_{t+1}$ for all large tick stocks (defined by their spread $\mathbb{E}[s_t] <$ […]). (Columns: ticker; $\mathbb{E}[\epsilon_{t+1} W_t]$.)

While price discreteness is an inherent property of financial markets, the interplay between liquidity provision, spread dynamics, and information asymmetry has hitherto often been analysed under the assumption of a continuous price scale.

We believe that this assumption is unsustainable in the light of the modern functioning of trading platforms. First, liquidity is nowadays mostly provided by high frequency market makers who seek to make a profit from tiny inefficiencies. Assuming that these inefficiencies are necessarily of the order of the LOB price resolution would be a gross mistake. Our paper shows precisely that changes and mismatches in the consensus price can be much smaller than the tick. Second, assuming continuous prices excludes the possibility of analysing the modern market design and its influences on the trading environment.
The advantages and disadvantages of certain order priority rules, the influence of latency on the investor's welfare, and the profitability of hidden liquidity are some of the regulatory issues which can be fully addressed only by considering finite tick sizes.

This paper has overcome these limitations by developing a consistent model of liquidity provision in large tick LOBs. Its implications are diverse. Our generalization of MRR implies that the traders' consensus defines a fundamental price on scales below the tick size. Changes in the fundamental price are reflected in changes of the volumes in the LOB, which allows us to define a proxy of the fundamental price based on the liquidity at the best. This is in stark contrast to the zero intelligence approach embraced by a different branch of the literature on the subject [Farmer et al., 2005, Cont et al., 2010].

Several empirical tests on high frequency data support our model's core results (in particular Eq. (14)), in that the predicted relationship between price impact and trade sign correlations holds accurately far beyond the LOB's price resolution. Whereas the core of our framework is very successful, we are also aware that the performance of the proxy $\hat p_t$ defined in Sec. 7 is acceptable but not overwhelming. While the dynamics of $\hat p_t$ are roughly in line with what we expect from the fundamental price, i.e. Eq. (1), $\hat p_t$ incorporates $\approx 80\%$ of the information contained in the volume imbalance at the best; this is encouraging but not outstanding.

Other aspects of our paper call for further research as well. Whereas we are able to link the dynamics of the volume imbalance to the location of the approximate fundamental price, our work does not develop a criterion to determine the absolute level of liquidity in the LOB. The relationship between overall liquidity and the tick size remains an important unsolved problem in applied market microstructure. Second, this paper has ignored MRR's second equation (6), which relates the spread to the response function according to $s = 2R(1)/(1 - C(1))$. Whereas the implied spread overestimates the realized spread a little, we do not observe a significant difference between small and large tick stocks. This is again very surprising and suggests that the permanent impact of a market order, $R(\infty) = G$, is on large tick LOBs approximately equal to $\tau/2$, irrespective of other properties of the stock, such as its price or volatility. Do liquidity takers only submit market orders if the value of their private information exceeds the trading costs? Why do order flow correlations and the response function conspire in such a way that Eq. (6) is approximately satisfied, even on large tick stocks? We must leave these questions unanswered here.

This paper postulates the existence of a fundamental price which is by construction totally unpredictable. Is this consensus price generated by a crowd of equally rational agents? This does not seem to be the case. By zooming into the instant when a queue depletes and the price changes, one finds that in some cases (approximately 25%) the depleted queue is immediately refilled: liquidity providers do not always agree whether a mid price change is acceptable, and the diverging opinions create significant fluctuations. How this variety of perceptions eventually merges into a consensus we cannot tell; we do not derive our model from first principles. But as a phenomenological model it can improve our understanding of financial markets.

Acknowledgements

We thank Jonathan Donier, Jean-Philippe Bouchaud and Charles-Albert Lehalle for their many important comments and insights. We acknowledge discussions with Mathieu Rosenbaum, Martin Gould, Rama Cont and Arseniy Kukanov.

Table […]: Summary statistics of the pool of small, medium and large tick stocks traded on Nasdaq during 2015. The stocks are ordered with ascending price. Averages are calculated in transaction time by using the prevailing quotes at market order arrivals. (Columns: ticker; mean price [$]; mean spread [$]; trade sign correlation $\mathbb{E}[\epsilon_t \epsilon_{t+1}]$.)

Tickers, in table order: SIRI, AMAT, MU, SYMC, NVDA, MAT, ATVI, GE, CSCO, JD, CA, DISCA, CSX, FOX, INTC, VOD, AMTD, TMUS, IBKR, LMCA, YHOO, EBAY, WFM, MDLZ, ORCL, FAST, XLNX, LLTC, AAL, MSFT, PAYX, NDAQ, TXN, CINF, NCLH, ADSK, VIAB, MYL, KLAC, CMCSA, ADI, PCAR, QCOM, SBUX, CTSH, CTRP, JPM, ROST, CERN, DISH, SNDK, CHRW, CTXS, NTRS, DLTR, MAR, LRCX, TROW, ADBE, CHKP, XOM, WDC, FISV, ADP, WBA, ESRX, TSCO, FB, SWKS, NXPI, CME, INTU, INCY, GILD, EXPE, BMRN, CELG, AAPL, VRTX, AVGO, NTES, MNST, HSIC, COST, ULTA, AMGN, ALXN, CHTR, BIDU, NFLX, ILMN, TSLA, ORLY, EQIX, BIIB, AMZN, REGN, ISRG, GOOG, PCLN.

References

T. G. Andersen, T. Bollerslev, and N. Meddahi. Market microstructure noise and realized volatility forecasting.

Journal of Econometrics, 160:220–234, 2011.

E. Bacry, S. Delattre, M. Hoffmann, and J. F. Muzy. Modelling microstructure noise with mutually exciting Hawkes processes. Quantitative Finance, 13:65–77, 2013.

C. A. Ball and T. Chordia. True spreads and equilibrium prices. The Journal of Finance, 56:1801–1835, 2001.

F. Bandi and J. Russell. Microstructure noise, realized variance and optimal sampling. Review of Economic Studies, 75:339–369, 2008.

B. Biais, P. Hillion, and C. Spatt. An empirical analysis of the limit order book and the order flow in the Paris Bourse. The Journal of Finance, 50:1655–1689, 1995.

B. Biais, L. Glosten, and C. Spatt. Market microstructure: A survey of microfoundations, empirical results, and policy implications. The Journal of Financial Markets, 8:217–264, 2005.

J.-P. Bouchaud, Y. Gefen, M. Potters, and M. Wyart. Fluctuations and response in financial markets: The subtle nature of random price changes. Quantitative Finance, 4(2):176–190, 2004.

J. P. Bouchaud, J. D. Farmer, and F. Lillo. How markets slowly digest changes in supply and demand. In T. Hens and K. R. Schenk-Hoppé, editors, Handbook of Financial Markets: Dynamics and Evolution, pages 57–160. North-Holland, Amsterdam, The Netherlands, 2009.

A. Cartea and S. Jaimungal. Incorporating order-flow into optimal execution. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2557457, 2015.

A. Cartea, R. F. Donnelly, and S. Jaimungal. Enhanced trading strategies with order book signals. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2668277, 2015.

R. Cont, S. Stoikov, and R. Talreja. A stochastic model for order book dynamics. Operations Research, 58:549–563, 2010.

G. Curato and F. Lillo. Modeling the coupled return-spread high frequency dynamics of large tick assets. Journal of Statistical Mechanics-Theory and Applications, page P01028, 2015.

Z. Eisler, J.-P. Bouchaud, and J. Kockelkoren. The price impact of order book events: market orders, limit orders and cancellations, 2012.

J. D. Farmer and F. Lillo. The long memory of the efficient market. Studies in Nonlinear Dynamics and Econometrics, 8:1–33, 2004.

J. D. Farmer, P. Patelli, and I. I. Zovko. The predictive power of zero intelligence in financial markets. Proceedings of the National Academy of Sciences of the United States of America, 102(6):2254–2259, 2005.

J. D. Farmer, A. Gerig, F. Lillo, and S. Mike. Market efficiency and the long memory of supply and demand: is price impact variable and permanent or fixed and temporary. Quantitative Finance, 6:107–112, 2006.

E. Ghysels and A. Sinko. Volatility prediction and microstructural noise. Journal of Econometrics, 160:257–271, 2011.

L. Glosten and P. Milgrom. Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. Journal of Financial Economics, 14:71–100, 1985.

M. Gould, M. A. Porter, and S. D. Howison. The long memory process of order flow in the foreign exchange spot market. http://arxiv.org/pdf/1504.04354.pdf, 2015.

M. D. Gould and J. Bonart. Queue imbalance as a one-tick-ahead price predictor in a limit order book. arXiv:1512.03492, 2015.

Government Office for Science. Tick size regulation: costs, benefits and risks. 2015.

L. Harris. Maker-taker pricing effects on market quotations. http://bschool.huji.ac.il/.upload/hujibusiness/Maker-taker.pdf, 2013.

J. Hasbrouck. Measuring the information content of stock trades. The Journal of Finance, 46:179–206, 1991.

J. Hasbrouck and T. S. Y. Ho. Order arrival, quote behaviour, and the return-generating process. The Journal of Finance, 42:1035–1048, 1987.

W. Huang, C.-A. Lehalle, and M. Rosenbaum. Large tick assets: implicit spread and optimal tick size. 2013.

W. Huang, C.-A. Lehalle, and M. Rosenbaum. Simulating and analyzing order book data: The queue-reactive model. Journal of the American Statistical Association, 110:107–122, 2015.

J. Jacod, Y. Li, P. A. Mykland, M. Podolskij, and M. Vetter. Microstructure noise in the continuous case: The pre-averaging approach. Stochastic Processes and Their Applications, 119:2249–2276, 2009.

T. Jaisson. Liquidity and impact in fair markets. Market Microstructure and Liquidity, 1:1550010, 2015.

J. Large. Estimating quadratic variation when quoted prices change by a constant increment. Journal of Econometrics, 160:2–11, 2011.

S. S. Lee and P. A. Mykland. Jumps in equilibrium prices and market microstructural noise. Journal of Econometrics, 168:396–406, 2015.

F. Lillo, S. Mike, and J. Farmer. Theory for long memory in supply and demand. Phys. Rev. E, 71:066122, 2005.

Limit Order Book System: The Efficient Reconstructor. https://lobsterdata.com/index.php.

A. Madhavan, M. Richardson, and M. Roomans. Why do security prices change? A transaction-level analysis of NYSE stocks. The Review of Financial Studies, 10:1035–1064, 1997.

MiFID II Wholesale Firms Conference. 2015.

Order Execution and Routing section of the NASDAQ Rule Book. http://nasdaqtrader.com/Trader.aspx?id=PriceListTrading2.

C. Y. Robert and M. Rosenbaum. A new approach for the dynamics of ultra-high-frequency data: The model with uncertainty zones. Journal of Financial Econometrics, 9:344–366, 2011.

Securities and Exchange Commission. 2015.

D. Taranto, G. Bormetti, and F. Lillo. The adaptive nature of liquidity taking in limit order books. Journal of Statistical Mechanics-Theory and Applications, 85:P06002, 2014.

D. E. Taranto, G. Bormetti, J.-P. Bouchaud, F. Lillo, and B. Toth. Linear models for the impact of order flow on prices I. Propagators: Transient vs. history dependent impact. http://arxiv.org/abs/1602.02735, 2016a.

D. E. Taranto, G. Bormetti, J.-P. Bouchaud, F. Lillo, and B. Toth. Linear models for the impact of order flow on prices I. Propagators: The mixture transition distribution model. http://arxiv.org/abs/1602.07556, 2016b.

B. Toth, I. Palit, F. Lillo, and J. D. Farmer. Why is equity flow so persistent. Journal of Economic Dynamics and Control, 51:218–239, 2015.

M. Wyart, J.-P. Bouchaud, J. Kockelkoren, M. Potters, and M. Vettorazzo. Relation between bid-ask spread, impact and volatility in order-driven markets. Quantitative Finance, 8:41–57, 2008.